Summary:
* rename to LinearActivationQuantizedTensor
* use the `implements` util to implement torch function and torch dispatch overrides (a minimal sketch of the pattern follows after the commit message)
Test Plan:
CI
Reviewers:
Subscribers:
Tasks:
Tags:
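
For context, here is a minimal sketch of the registration pattern the summary refers to: a decorator that maps torch functions (or aten ops) to override implementations, consulted from `__torch_function__`. The registry name, the stubbed subclass body, and the override body are illustrative assumptions, not the exact torchao code.

```python
import torch

# Hypothetical registry: torch function/op -> override implementation.
# The real torchao `implements` util follows this registration pattern;
# the names here are assumptions for the sketch.
_TORCH_FN_TABLE = {}

def implements(torch_fns):
    """Register the decorated function as the override for each given op."""
    if not isinstance(torch_fns, (list, tuple)):
        torch_fns = [torch_fns]

    def decorator(func):
        for fn in torch_fns:
            _TORCH_FN_TABLE[fn] = func
        return func

    return decorator

class LinearActivationQuantizedTensor(torch.Tensor):
    # Stubbed subclass: only the dispatch plumbing is shown here.
    @classmethod
    def __torch_function__(cls, func, types, args=(), kwargs=None):
        kwargs = kwargs or {}
        if func in _TORCH_FN_TABLE:
            return _TORCH_FN_TABLE[func](func, types, args, kwargs)
        with torch._C.DisableTorchFunctionSubclass():
            return func(*args, **kwargs)

@implements(torch.nn.functional.linear)
def _(func, types, args, kwargs):
    input_tensor, weight_tensor = args[0], args[1]
    bias = args[2] if len(args) > 2 else None
    # a real override would quantize the activation and call the
    # quantized mm kernel here
    raise NotImplementedError("sketch only")
```

The same table-lookup idea extends to `__torch_dispatch__` with aten ops, which is what lets the subclass intercept both Python-level torch functions and lower-level dispatch calls through one registration mechanism.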
-    # Note: we only added cpu path here for 8da4w, this is for executorch, in the future
-    # 1. we'll add cpu/cuda version (int4mm etc.)
-    # 2. we'll need to hide the 8da4w executorch version under things like layouts (we also have multiple impls for the cpu kernel, as Michael mentioned), so it will be something like
-    # cpu device + et layout --> gives current 8da4w executorch representation
-    # cpu device + avx layout --> gives optimized kernel for 8da4w on avx cpus etc.
-    # cuda device + some layout --> gives cuda kernel
-
-    # two scenarios where we currently fall back to vanilla mm:
-    # 1 - when tensor is on CUDA: we'll add this later, we'll also enable dispatching to optimized
-    #     kernels on CPU as well, see the note above
-    # 2 - we're given non-floats - quantizing long to int8 is crazy
+    # Note: we only added cpu path here for 8da4w, this is for executorch, in the future
+    # 1. we'll add cpu/cuda version (int4mm etc.)
+    # 2. we'll need to hide the 8da4w executorch version under things like layouts (we also have multiple impls for the cpu kernel, as Michael mentioned), so it will be something like
+    # cpu device + et layout --> gives current 8da4w executorch representation
+    # cpu device + avx layout --> gives optimized kernel for 8da4w on avx cpus etc.
+    # cuda device + some layout --> gives cuda kernel

-    raise NotImplementedError(
-        f"AffineQuantizedTensor dispatch: attempting to run {func}, this is not supported"
-    )
+    # two scenarios where we currently fall back to vanilla mm:
+    # 1 - when tensor is on CUDA: we'll add this later, we'll also enable dispatching to optimized
+    #     kernels on CPU as well, see the note above
+    # 2 - we're given non-floats - quantizing long to int8 is crazy
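
To make the comments in the diff concrete, the sketch below shows what dispatch keyed on (device, layout), plus the vanilla-mm fallback, could look like. Everything here is an assumption drawn from the comment text: `layout_type`, the `"et"`/`"avx"` layout tags, and the per-backend kernel helpers are hypothetical placeholders, not torchao's actual API.

```python
import torch

def _8da4w_executorch_linear(input_tensor, weight_tensor, bias):
    ...  # placeholder: current 8da4w executorch kernel

def _8da4w_avx_linear(input_tensor, weight_tensor, bias):
    ...  # placeholder: avx-optimized 8da4w cpu kernel

def _quantized_linear_impl(input_tensor, weight_tensor, bias=None):
    # `layout_type` is a hypothetical attribute standing in for whatever
    # layout abstraction ends up hiding the per-backend representations.
    device = weight_tensor.device.type
    layout = getattr(weight_tensor, "layout_type", None)

    # two scenarios where we fall back to vanilla mm (per the comment above):
    # 1 - tensor is on CUDA: optimized kernels not wired up yet
    # 2 - non-float input: don't try to quantize e.g. int64 activations to int8
    if device == "cuda" or not input_tensor.is_floating_point():
        # assumes the weight subclass provides a dequantize() method
        return torch.nn.functional.linear(
            input_tensor, weight_tensor.dequantize(), bias
        )

    if device == "cpu" and layout == "et":
        # cpu device + et layout --> current 8da4w executorch representation
        return _8da4w_executorch_linear(input_tensor, weight_tensor, bias)
    if device == "cpu" and layout == "avx":
        # cpu device + avx layout --> optimized 8da4w kernel for avx cpus
        return _8da4w_avx_linear(input_tensor, weight_tensor, bias)

    raise NotImplementedError(
        f"no quantized kernel for device={device}, layout={layout}"
    )
```

The design point the diff comments are making is that callers should never branch on backend themselves: the (device, layout) pair selects the kernel, and anything unsupported degrades to plain `linear` on dequantized weights instead of raising.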