Summary:
* rename to `LinearActivationQuantizedTensor`
* use the `implements` util to register the `__torch_function__` and `__torch_dispatch__` overrides (a sketch of the pattern follows below)

Test Plan:
CI

Reviewers:

Subscribers:

Tasks:

Tags:
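The `implements` util mentioned above follows a common tensor-subclass dispatch pattern: a decorator registers an override function in a table keyed by the torch op, and `__torch_function__` looks the op up there, falling through to default behavior when nothing is registered. Below is a minimal sketch of that pattern, assuming a plain dict registry; the names are illustrative, not torchao's exact code.

```python
import torch

# Illustrative registry mapping a torch op to its override; not torchao's exact code.
_TORCH_FN_TABLE = {}

def implements(torch_ops):
    """Register the decorated function as the override for each op in `torch_ops`."""
    def decorator(fn):
        for op in torch_ops:
            _TORCH_FN_TABLE[op] = fn
        return fn
    return decorator

class LinearActivationQuantizedTensor(torch.Tensor):
    @classmethod
    def __torch_function__(cls, func, types, args=(), kwargs=None):
        kwargs = kwargs or {}
        # dispatch to a registered override if one exists for this op
        if func in _TORCH_FN_TABLE:
            return _TORCH_FN_TABLE[func](func, types, args, kwargs)
        # otherwise fall back to the default tensor behavior
        with torch._C.DisableTorchFunctionSubclass():
            return func(*args, **kwargs)

@implements([torch.nn.functional.linear])
def _linear_override(func, types, args, kwargs):
    # a real override would quantize the activation and call the packed kernel;
    # elided here since this is only a sketch of the registration mechanism
    ...
```

The same table-lookup shape works for `__torch_dispatch__` with aten ops as keys; any op without a registered override falls through, rather than each subclass hand-writing a chain of `if func is ...` branches.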
```diff
-    # Note: we only added cpu path here for 8da4w, this is for executorch, in the future
-    # 1. we'll add cpu/cuda version (int4mm etc.)
-    # 2. we'll need to hide the 8da4w executorch version under things like layouts (we also have multiple impls for the cpu kernel as Michael mentioned), so it will be something like
-    # cpu device + et layout --> gives current 8da4w executorch representation
-    # cpu device + avx layout --> gives optimized kernel for 8da4w in avx cpu etc.
-    # cuda device + some layout --> gives cuda kernel
-
-    # two scenarios where we currently fall back to vanilla mm:
-    # 1 - when tensor is on CUDA: we'll add this later, we'll also enable dispatching to optimized
-    # kernels in CPU as well, see the note above
-    # 2 - we're given non-floats - quantizing long to int8 is crazy
+    # Note: we only added cpu path here for 8da4w, this is for executorch, in the future
+    # 1. we'll add cpu/cuda version (int4mm etc.)
+    # 2. we'll need to hide the 8da4w executorch version under things like layouts (we also have multiple impls for the cpu kernel as Michael mentioned), so it will be something like
+    # cpu device + et layout --> gives current 8da4w executorch representation
+    # cpu device + avx layout --> gives optimized kernel for 8da4w in avx cpu etc.
+    # cuda device + some layout --> gives cuda kernel

-    raise NotImplementedError(
-        f"AffineQuantizedTensor dispatch: attempting to run {func}, this is not supported"
-    )
+    # two scenarios where we currently fall back to vanilla mm:
+    # 1 - when tensor is on CUDA: we'll add this later, we'll also enable dispatching to optimized
+    # kernels in CPU as well, see the note above
+    # 2 - we're given non-floats - quantizing long to int8 is crazy
```
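The layout idea in the comments above amounts to keying kernel selection on a (device, layout) pair, with the vanilla-mm fallback guarding the two unsupported scenarios. Here is a rough sketch of that dispatch shape under stated assumptions: the `layout_name` attribute, the `dequantize()` method, and both kernel functions are hypothetical placeholders, not torchao's API.

```python
import torch

def _8da4w_executorch_linear(x, w, b):
    # hypothetical stand-in for the executorch-layout 8da4w kernel
    raise NotImplementedError

def _8da4w_avx_linear(x, w, b):
    # hypothetical stand-in for an avx-optimized cpu kernel
    raise NotImplementedError

def _quantized_linear_dispatch(input_tensor, weight_tensor, bias=None):
    # the two fallback scenarios from the comments above:
    # 1 - tensor on CUDA (optimized kernels not wired up yet)
    # 2 - non-float input (quantizing long to int8 makes no sense)
    if input_tensor.is_cuda or not input_tensor.is_floating_point():
        # assumes the quantized weight subclass exposes a dequantize() method
        return torch.nn.functional.linear(input_tensor, weight_tensor.dequantize(), bias)

    # key kernel selection on (device, layout); `layout_name` is an assumed attribute
    key = (weight_tensor.device.type, getattr(weight_tensor, "layout_name", "et"))
    if key == ("cpu", "et"):
        return _8da4w_executorch_linear(input_tensor, weight_tensor, bias)
    if key == ("cpu", "avx"):
        return _8da4w_avx_linear(input_tensor, weight_tensor, bias)
    raise NotImplementedError(f"no quantized linear kernel for {key}")
```

Routing on (device, layout) keeps the executorch representation as just one layout among several, so adding an avx or cuda path later is a new table entry rather than a new tensor subclass.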