
Commit cca3a36

James Reed authored and facebook-github-bot committed
Don't zero out buffers in dynamic linear (pytorch#27002)
Summary: Pull Request resolved: pytorch#27002

Zero-initializing these buffers was taking a significant amount of time in my benchmarks with larger output sizes (e.g. the final output projection in a language classification model).

Test Plan: Imported from OSS

Differential Revision: D17641765

Pulled By: jamesr66a

fbshipit-source-id: b0ef30767eec9774fc503bb51fed039222026bba
1 parent: 09f0e94 · commit: cca3a36

1 file changed: +2 −2 lines changed

aten/src/ATen/native/quantized/cpu/qlinear_dynamic.cpp

@@ -127,8 +127,8 @@ class QLinearDynamicInt8 final : public torch::OperatorKernel {
   std::vector<int64_t> out_sizes = input.sizes().vec();
   out_sizes.back() = N;
   // Allocate output Tensor and a buffer for fbgemmPacked to use
-  auto output = at::zeros(out_sizes, input.options().dtype(at::kFloat));
-  auto buffer = at::zeros_like(output, output.options().dtype(at::kInt));
+  auto output = at::empty(out_sizes, input.options().dtype(at::kFloat));
+  auto buffer = at::empty_like(output, output.options().dtype(at::kInt));
 
   if (pack_ptr.q_scheme == kPerTensorAffine) {
     // Process the per tensor quantization.
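
For context, below is a minimal standalone sketch (not part of the commit) contrasting at::zeros with at::empty: at::zeros pays an O(numel) memset on allocation, while at::empty returns uninitialized memory. The swap is safe here because fbgemmPacked fully overwrites both the output tensor and the int32 accumulation buffer before anything reads them. The tensor names and sizes in the sketch are hypothetical.

// Sketch only (assumes an ATen/libtorch build; names and sizes are hypothetical).
#include <ATen/ATen.h>
#include <vector>

int main() {
  // e.g. the final output projection of a language classification model
  std::vector<int64_t> out_sizes = {256, 30522};

  // at::zeros allocates and then writes 0 to every element -- the O(numel)
  // initialization pass the commit found significant for large outputs.
  auto zeroed = at::zeros(out_sizes, at::kFloat);

  // at::empty only allocates; the contents are uninitialized. That is safe
  // whenever every element is written before it is read, as fbgemmPacked
  // does for the output and its int32 accumulation buffer.
  auto output = at::empty(out_sizes, at::kFloat);
  auto buffer = at::empty_like(output, output.options().dtype(at::kInt));

  // Stand-in for the kernel writing the whole output before any read.
  output.fill_(0.5f);
  buffer.zero_();

  return 0;
}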
