
Commit f94ef0c

[SW-227433] Revert "Update PatchedVLLMKVCache for deepseek performance (#194)" (#231)

Remove the workaround.

1 parent ac21933 · commit f94ef0c

File tree

1 file changed: +1 −5 lines

neural_compressor/torch/algorithms/fp8_quant/_quant_common/helper_modules.py

Lines changed: 1 addition & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -965,11 +965,7 @@ def forward_measure(self, input, cache, *args, **kwargs):
965965
return output_cache
966966

967967
def fetch_from_cache(self, cache, blocks):
968-
# TODO: Remove this workaround in next release [SW-221595]
969-
if cache.dtype != self.lp_dtype:
970-
quant_cache = self.quant_input(cache)
971-
else:
972-
quant_cache = cache
968+
quant_cache = self.quant_input(cache)
973969
output_cache = self.orig_mod.fetch_from_cache(quant_cache, blocks)
974970
return self.dequant_output(output_cache)
975971
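The reverted hunk makes `fetch_from_cache` quantize the incoming cache unconditionally, instead of skipping quantization when the cache is already in the low-precision dtype. A minimal, hypothetical sketch of that control flow, with toy stand-ins for the real `quant_input`, `dequant_output`, and `orig_mod.fetch_from_cache` (those internals are assumptions modeled only on the diff, not on the actual FP8 implementation):

```python
class OrigKVCacheStub:
    """Toy stand-in for the wrapped module: a cache is a dict of blocks."""

    def fetch_from_cache(self, cache, blocks):
        return [cache[b] for b in blocks]


class PatchedKVCacheSketch:
    """Sketch of the post-revert PatchedVLLMKVCache.fetch_from_cache path."""

    def __init__(self, orig_mod, scale=0.5):
        self.orig_mod = orig_mod
        self.scale = scale  # toy quantization scale, not the real FP8 scale

    def quant_input(self, cache):
        # Toy "quantization": scale every cached value down.
        return {k: v * self.scale for k, v in cache.items()}

    def dequant_output(self, out):
        # Inverse of the toy quantization above.
        return [v / self.scale for v in out]

    def fetch_from_cache(self, cache, blocks):
        # Post-revert behavior: always quantize the incoming cache;
        # the dtype check from workaround [SW-221595] is gone.
        quant_cache = self.quant_input(cache)
        output_cache = self.orig_mod.fetch_from_cache(quant_cache, blocks)
        return self.dequant_output(output_cache)


cache = {0: 2.0, 1: 4.0}
mod = PatchedKVCacheSketch(OrigKVCacheStub())
print(mod.fetch_from_cache(cache, [1, 0]))  # quantize → fetch → dequantize
```

The quantize/dequantize pair round-trips the values, so the fetch returns the original magnitudes in the requested block order.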
