Added memory optimization for ONNX transforms #538
base: main
Conversation
```python
from typing import Optional, Tuple

import numpy as np
from onnx import ModelProto, external_data_helper, numpy_helper

from QEfficient.utils.constants import ONNX_TRANSFROM_MEMORY_CLEANUP_INTERVAL
```
NIT: spell check - `ONNX_TRANSFROM_MEMORY_CLEANUP_INTERVAL` should read TRANSFORM, not TRANSFROM.
```python
    :param model: The ONNX model to check
    :returns: True if external data is already loaded, False otherwise
    """
    for tensor in external_data_helper._get_all_tensors(model):
```
Can we skip this extra loop that checks whether external data has been loaded for every tensor? At the place where we load the external data we can maintain a flag: it defaults to False and is set to True once all the external data is loaded. The rest of the code then only needs to check the flag, and this function may not be needed at all if the flag is used directly.
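A minimal sketch of that suggestion, assuming a class-level flag; the names `_external_data_loaded` and `_load_external_data_once` are illustrative, not from the PR:

```python
from onnx import ModelProto, external_data_helper


class OnnxTransform:
    # Hypothetical flag (illustrative name): False until external data is in memory.
    _external_data_loaded: bool = False

    @classmethod
    def _load_external_data_once(cls, model: ModelProto, onnx_base_dir: str) -> None:
        # Load external tensor data a single time; subsequent calls are no-ops,
        # so no per-tensor scan is needed to detect whether data is loaded.
        if not cls._external_data_loaded:
            external_data_helper.load_external_data_for_model(model, onnx_base_dir)
            cls._external_data_loaded = True
```

One caveat with this shape: if the transform can be applied to more than one model, the state would need to be tracked per model rather than on the class.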
```diff
@@ -61,6 +89,15 @@ def apply(cls, model: ModelProto, *, onnx_base_dir: Optional[str] = None, **kwar
 tensor.CopyFrom(new_tensor)
 transformed = True

+del neg_inf_mask, clipped_tensor, new_tensor
```
You can check and update the flag within this loop itself.
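An illustrative way to do that, folding the bookkeeping into the existing per-tensor loop instead of running a separate pass (the helper name is hypothetical, and the flag is the one from the sketch above):

```python
from onnx import ModelProto, external_data_helper


def process_and_track(model: ModelProto) -> bool:
    """Do the per-tensor work and report whether all external data is loaded."""
    all_loaded = True
    for tensor in external_data_helper._get_all_tensors(model):
        if external_data_helper.uses_external_data(tensor):
            all_loaded = False  # this tensor's raw data still lives on disk
        # ... existing per-tensor clipping work, including the `del` above ...
    return all_loaded  # caller stores this in the hypothetical class-level flag
```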
Signed-off-by: Rishin Raj <[email protected]>
force-pushed from 570aad1 to 85c8a0d
Added periodic memory cleanup to FP16ClipTransform and SplitTensorsTransform to reduce peak memory usage while processing large tensors. Also skips reloading external data when it is already present in the model.
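A minimal sketch of the periodic-cleanup pattern the description refers to, assuming an interval constant like the PR's ONNX_TRANSFROM_MEMORY_CLEANUP_INTERVAL; the function name and the loop body are illustrative, not the PR's actual code:

```python
import gc

from onnx import ModelProto, external_data_helper

CLEANUP_INTERVAL = 100  # stand-in for ONNX_TRANSFROM_MEMORY_CLEANUP_INTERVAL


def transform_with_periodic_cleanup(model: ModelProto) -> None:
    """Apply per-tensor work, forcing garbage collection every N tensors."""
    for idx, tensor in enumerate(external_data_helper._get_all_tensors(model)):
        # ... per-tensor work: clip out-of-range fp16 values, split tensors, ...
        # intermediate numpy buffers are del'd in the loop body (as in the PR)
        if (idx + 1) % CLEANUP_INTERVAL == 0:
            gc.collect()  # reclaim the buffers released so far
```

Collecting on every iteration would dominate runtime on models with many tensors, which is presumably why the cleanup is gated behind an interval constant rather than run unconditionally.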