
Commit eeb985b

mengniwang95 authored and linoybu committed

Update PatchedVLLMKVCache for deepseek performance (#194)
Co-authored-by: Linoy Buchnik <[email protected]>
1 parent db43fa8 commit eeb985b

File tree

1 file changed: +5 −1 lines changed


neural_compressor/torch/algorithms/fp8_quant/_quant_common/helper_modules.py

Lines changed: 5 additions & 1 deletion
```diff
@@ -1071,7 +1071,11 @@ def forward_measure(self, input, cache, *args, **kwargs):
         return output_cache

     def fetch_from_cache(self, cache, blocks, permutations=None):
-        quant_cache = self.quant_input(cache)
+        # TODO: Remove this workaround in next release [SW-221595]
+        if cache.dtype != self.lp_dtype:
+            quant_cache = self.quant_input(cache)
+        else:
+            quant_cache = cache
         if permutations:
             output_cache = self.orig_mod.fetch_from_cache(quant_cache, blocks, permutations)
             for i in range(len(output_cache)):
```
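The change above guards the quantization call on the cache's dtype: if the KV cache is already stored in the low-precision dtype, re-quantizing it is redundant work on the hot path. A minimal sketch of that guard pattern, assuming a simplified stand-in class (`FakeQuantKVCache`, its `quant_input` method, and the use of `torch.float16` as the low-precision dtype are all illustrative, not the actual `neural_compressor` API):

```python
import torch

class FakeQuantKVCache:
    """Illustrative stand-in for a quantized KV-cache wrapper."""

    def __init__(self, lp_dtype=torch.float16):
        # Low-precision dtype the cache is expected to be stored in.
        self.lp_dtype = lp_dtype

    def quant_input(self, cache):
        # Stand-in for the real quantization op; here just a dtype cast.
        return cache.to(self.lp_dtype)

    def fetch_from_cache(self, cache):
        # Guard pattern from the commit: only quantize when the cache
        # is not already in the low-precision dtype.
        if cache.dtype != self.lp_dtype:
            quant_cache = self.quant_input(cache)
        else:
            quant_cache = cache  # already low-precision: reuse as-is
        return quant_cache

kv = FakeQuantKVCache()
hp = torch.zeros(2, 4, dtype=torch.float32)   # high-precision cache
lp = torch.zeros(2, 4, dtype=torch.float16)   # already low-precision

assert kv.fetch_from_cache(hp).dtype == kv.lp_dtype  # gets quantized
assert kv.fetch_from_cache(lp) is lp                 # returned untouched
```

The second assertion shows why this helps performance: for a cache already in `lp_dtype`, no copy or quantization kernel is launched at all, which is the redundancy the workaround removes for the deepseek path.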
