Commit 6168a75

Update PatchedVLLMKVCache for deepseek performance (#194)
Co-authored-by: Linoy Buchnik <[email protected]>
Parent: 35d6ad0

File tree: 1 file changed (+5 / -1 lines)

neural_compressor/torch/algorithms/fp8_quant/_quant_common/helper_modules.py

Lines changed: 5 additions & 1 deletion
@@ -1067,7 +1067,11 @@ def forward_measure(self, input, cache, *args, **kwargs):
         return output_cache
 
     def fetch_from_cache(self, cache, blocks, permutations=None):
-        quant_cache = self.quant_input(cache)
+        # TODO: Remove this workaround in next release [SW-221595]
+        if cache.dtype != self.lp_dtype:
+            quant_cache = self.quant_input(cache)
+        else:
+            quant_cache = cache
         if permutations:
             output_cache = self.orig_mod.fetch_from_cache(quant_cache, blocks, permutations)
             for i in range(len(output_cache)):
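
For context, a minimal standalone sketch of the behavior this change gives fetch_from_cache. It assumes a wrapper in the spirit of PatchedVLLMKVCache, where quant_input converts a cache tensor to the low-precision dtype lp_dtype and orig_mod is the wrapped vLLM KV-cache module; the class name, constructor, and default dtype below are illustrative assumptions, not the actual helper_modules.py implementation, and the post-processing of output_cache from the truncated context lines is omitted since this commit does not touch it.

    import torch

    class KVCacheDtypeGuardSketch:
        """Illustrative wrapper showing the dtype guard added in this commit."""

        def __init__(self, orig_mod, quant_input, lp_dtype=torch.float8_e4m3fn):
            self.orig_mod = orig_mod        # wrapped vLLM KV-cache module (assumed)
            self.quant_input = quant_input  # callable that quantizes a tensor to lp_dtype (assumed)
            self.lp_dtype = lp_dtype        # low-precision cache dtype, e.g. FP8 (assumed default)

        def fetch_from_cache(self, cache, blocks, permutations=None):
            # Workaround [SW-221595]: skip quantization when the cache is already
            # stored in the low-precision dtype, so fetches on the DeepSeek path
            # do not redo work; otherwise quantize as before.
            if cache.dtype != self.lp_dtype:
                quant_cache = self.quant_input(cache)
            else:
                quant_cache = cache
            if permutations:
                return self.orig_mod.fetch_from_cache(quant_cache, blocks, permutations)
            return self.orig_mod.fetch_from_cache(quant_cache, blocks)

The design choice is a simple guard rather than a change to the cache layout: when the KV cache tensor already holds lp_dtype values, re-running quant_input would be redundant, which is what the commit title attributes the DeepSeek performance gain to.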
