Torch and Pickle Safe Load Fixes #8566
Conversation
Signed-off-by: Eric Kerfoot <[email protected]>
Walkthrough

Replaces many `torch.load` calls with `weights_only=True` across the nnU-Net bundle integration, CheckpointLoader, StateCacher, datasets, and tests. Persistent/LMDB cache I/O now stores tensors; LMDBDataset gains `_safe_serialize`/`_safe_deserialize`, and LMDBDataset/CacheNTransDataset default `pickle_protocol` to `torch.serialization.DEFAULT_PROTOCOL`. The meta-serialization safe-globals list is expanded to include MetaObj, MetaTensor, MetaKeys, and SpaceKeys. PICKLE_KEY_SUFFIX and pickle_operations are removed; `list_data_collate`/`decollate_batch` no longer perform pickle-based (de)serialization. `convert_to_tensor` gains a `convert_numeric` parameter. Tests are updated to match these changes. A few signatures changed (`LMDBDataset.__init__`, `CacheNTransDataset.__init__`, `convert_to_tensor`) and new LMDB methods were added.

Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes

Pre-merge checks and finishing touches

❌ Failed checks (2 warnings)
✅ Passed checks (3 passed)
✨ Finishing touches
🧪 Generate unit tests
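The change the walkthrough centers on is `torch.load(..., weights_only=True)`, which restricts unpickling to tensors, primitives, and allowlisted globals. The underlying idea can be illustrated with a stdlib-only sketch: a toy allowlist unpickler (torch's real implementation is more involved, and the allowed set here is just a stand-in for its tensor types):

```python
import io
import pickle
from collections import OrderedDict

class AllowlistUnpickler(pickle.Unpickler):
    """Toy analogue of weights_only=True: only allowlisted named globals
    may be resolved; primitives never reach find_class at all."""

    ALLOWED = {("builtins", "complex")}  # stand-in for torch's tensor classes

    def find_class(self, module, name):
        if (module, name) in self.ALLOWED:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"global '{module}.{name}' is not allowlisted")

def safe_loads(blob: bytes):
    return AllowlistUnpickler(io.BytesIO(blob)).load()

# primitives and allowlisted globals round-trip fine
print(safe_loads(pickle.dumps({"lr": 0.1, "z": 1 + 2j})))
# non-allowlisted globals (e.g. OrderedDict) are rejected at load time
try:
    safe_loads(pickle.dumps(OrderedDict(a=1)))
except pickle.UnpicklingError as exc:
    print("blocked:", exc)
```

This mirrors why the PR expands the safe-globals list with `add_safe_globals`: each additional class is one more entry in the allowlist.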
Actionable comments posted: 1
🧹 Nitpick comments (11)
monai/utils/state_cacher.py (2)
127-127: Load on CPU and fix lint.

Use explicit CPU mapping and drop the lambda to satisfy Ruff ARG005 and avoid accidental GPU allocations.

```diff
- data_obj = torch.load(fn, map_location=lambda storage, location: storage, weights_only=True)
+ data_obj = torch.load(fn, map_location=torch.device("cpu"), weights_only=True)
```
67-70: Docstring default is stale (protocol).

Defaults are now `torch.serialization.DEFAULT_PROTOCOL`; update the text.

```diff
- pickle_protocol: can be specified to override the default protocol, default to `2`.
+ pickle_protocol: can be specified to override the default protocol, default to `torch.serialization.DEFAULT_PROTOCOL`.
```

```diff
- pickle_protocol: can be specified to override the default protocol, default to `self.pickle_protocol`.
+ pickle_protocol: can be specified to override the default protocol, default to `self.pickle_protocol` (defaults to `torch.serialization.DEFAULT_PROTOCOL`).
```

Also applies to: 95-97
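For reference, the protocol default being discussed maps onto stdlib constants; a quick sketch using plain `pickle` (where `torch.serialization.DEFAULT_PROTOCOL` plays the analogous role for `torch.save`):

```python
import pickle

# The protocol number is written into the stream itself (PROTO opcode),
# so a reader can tell which protocol produced a payload.
data = {"epoch": 3}
blob_v2 = pickle.dumps(data, protocol=2)
blob_hi = pickle.dumps(data, protocol=pickle.HIGHEST_PROTOCOL)

assert blob_v2[:2] == b"\x80\x02"  # PROTO opcode 0x80 followed by protocol byte 2
assert pickle.loads(blob_v2) == pickle.loads(blob_hi) == data
print(pickle.DEFAULT_PROTOCOL, pickle.HIGHEST_PROTOCOL)
```

Any supported protocol round-trips the same data; the choice only affects stream format and cross-version compatibility, which is why pinning the documented default matters.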
monai/data/dataset.py (1)
378-380: Map cache loads to CPU explicitly.

Avoid device surprises when reading user caches.

```diff
- return torch.load(hashfile, weights_only=False)
+ return torch.load(hashfile, map_location=torch.device("cpu"), weights_only=False)
```

monai/handlers/checkpoint_loader.py (1)
125-125: Default to CPU map_location for safe, predictable loads.

Uses CPU unless the caller overrides `map_location`.

```diff
- checkpoint = torch.load(self.load_path, map_location=self.map_location, weights_only=True)
+ checkpoint = torch.load(
+     self.load_path, map_location=(self.map_location or torch.device("cpu")), weights_only=True
+ )
```

tests/utils/test_state_cacher.py (1)
30-36: Good switch to DEFAULT_PROTOCOL. Consider a protocol matrix.

Add a parametrized case for `pickle.HIGHEST_PROTOCOL` to guard against regressions across protocols.

monai/apps/nnunet/nnunet_bundle.py (4)
136-136: Map pretrained load to CPU.

Prevents accidental CUDA allocations during file I/O.

```diff
- state_dict = torch.load(pretrained_model, weights_only=True)
+ state_dict = torch.load(pretrained_model, map_location=torch.device("cpu"), weights_only=True)
```
390-395: CPU-map nnU-Net checkpoint reads.

Consistent with the other loads here.

```diff
- nnunet_checkpoint_final = torch.load(
-     Path(nnunet_model_folder).joinpath(f"fold_{fold}", "checkpoint_final.pth"), weights_only=True
- )
- nnunet_checkpoint_best = torch.load(
-     Path(nnunet_model_folder).joinpath(f"fold_{fold}", "checkpoint_best.pth"), weights_only=True
- )
+ nnunet_checkpoint_final = torch.load(
+     Path(nnunet_model_folder).joinpath(f"fold_{fold}", "checkpoint_final.pth"),
+     map_location=torch.device("cpu"),
+     weights_only=True,
+ )
+ nnunet_checkpoint_best = torch.load(
+     Path(nnunet_model_folder).joinpath(f"fold_{fold}", "checkpoint_best.pth"),
+     map_location=torch.device("cpu"),
+     weights_only=True,
+ )
```
481-483: CPU-map optional model checkpoint.

```diff
- state_dict = torch.load(model_ckpt, weights_only=True)
+ state_dict = torch.load(model_ckpt, map_location=torch.device("cpu"), weights_only=True)
```
545-546: CPU-map bundle/MONAI checkpoint reads.

```diff
- nnunet_checkpoint: dict = torch.load(f"{bundle_root_folder}/models/nnunet_checkpoint.pth", weights_only=True)
+ nnunet_checkpoint: dict = torch.load(
+     f"{bundle_root_folder}/models/nnunet_checkpoint.pth", map_location=torch.device("cpu"), weights_only=True
+ )
```

```diff
- monai_last_checkpoint: dict = torch.load(
-     f"{bundle_root_folder}/models/fold_{fold}/checkpoint_epoch={final_epoch}.pt", weights_only=True
- )
+ monai_last_checkpoint: dict = torch.load(
+     f"{bundle_root_folder}/models/fold_{fold}/checkpoint_epoch={final_epoch}.pt",
+     map_location=torch.device("cpu"),
+     weights_only=True,
+ )
```

```diff
- monai_best_checkpoint: dict = torch.load(
-     f"{bundle_root_folder}/models/fold_{fold}/checkpoint_key_metric={best_key_metric}.pt", weights_only=True
- )
+ monai_best_checkpoint: dict = torch.load(
+     f"{bundle_root_folder}/models/fold_{fold}/checkpoint_key_metric={best_key_metric}.pt",
+     map_location=torch.device("cpu"),
+     weights_only=True,
+ )
```

Also applies to: 556-557, 569-570
monai/data/meta_tensor.py (1)
613-615: Allowlist looks right; add LazyAttr too for pending-ops safety.

Pending ops frequently store LazyAttr keys; without allowlisting LazyAttr, loads may fail for objects carrying pending metadata. Extend the list:

```diff
-if hasattr(torch.serialization, "add_safe_globals"):
-    torch.serialization.add_safe_globals([MetaObj, MetaTensor, MetaKeys, SpaceKeys])
+if hasattr(torch.serialization, "add_safe_globals"):
+    torch.serialization.add_safe_globals([MetaObj, MetaTensor, MetaKeys, SpaceKeys, LazyAttr])
```

tests/data/meta_tensor/test_meta_tensor.py (1)
248-248: Gate the weights_only kwarg for older PyTorch.

Directly passing `weights_only` breaks on torch versions that don't support the kwarg. Make the test tolerant:

```diff
- m2 = torch.load(fname, weights_only=True)
+ import inspect
+ if "weights_only" in inspect.signature(torch.load).parameters:
+     m2 = torch.load(fname, weights_only=True)
+ else:
+     m2 = torch.load(fname)
```

If your CI matrix is 2.6+ only, feel free to skip this change; otherwise this avoids a TypeError on older torch.
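The gating pattern in that suggestion generalizes to any optional keyword. A small stdlib sketch of signature-based kwarg gating (the two loader functions are hypothetical stand-ins for an older and a newer `torch.load`):

```python
import inspect

def load_v2(path, *, weights_only=True):
    """Stand-in for a newer torch.load that accepts weights_only."""
    return ("loaded", path, weights_only)

def load_v1(path):
    """Stand-in for an older torch.load without the kwarg."""
    return ("loaded", path)

def safe_load(loader, path):
    """Pass weights_only=True only when the callable accepts it."""
    if "weights_only" in inspect.signature(loader).parameters:
        return loader(path, weights_only=True)
    return loader(path)

print(safe_load(load_v2, "m.pt"))  # kwarg is forwarded
print(safe_load(load_v1, "m.pt"))  # kwarg is silently dropped
```

Checking `inspect.signature` once at import time and caching the result avoids per-call overhead if this runs in a hot path.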
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Knowledge Base: Disabled due to Reviews > Disable Knowledge Base setting
📒 Files selected for processing (7)
- monai/apps/nnunet/nnunet_bundle.py (8 hunks)
- monai/data/dataset.py (2 hunks)
- monai/data/meta_tensor.py (1 hunks)
- monai/handlers/checkpoint_loader.py (1 hunks)
- monai/utils/state_cacher.py (1 hunks)
- tests/data/meta_tensor/test_meta_tensor.py (1 hunks)
- tests/utils/test_state_cacher.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
⚙️ CodeRabbit configuration file
Review the Python code for quality and correctness. Ensure variable names adhere to PEP8 style guides, are sensible and informative in regards to their function, though permitting simple names for loop and comprehension variables. Ensure routine names are meaningful in regards to their function and use verbs, adjectives, and nouns in a semantically appropriate way. Docstrings should be present for all definition which describe each variable, return value, and raised exception in the appropriate section of the Google-style of docstrings. Examine code for logical error or inconsistencies, and suggest what may be changed to addressed these. Suggest any enhancements for code improving efficiency, maintainability, comprehensibility, and correctness. Ensure new or modified definitions will be covered by existing or new unit tests.
Files:
monai/data/meta_tensor.py
tests/data/meta_tensor/test_meta_tensor.py
tests/utils/test_state_cacher.py
monai/utils/state_cacher.py
monai/data/dataset.py
monai/handlers/checkpoint_loader.py
monai/apps/nnunet/nnunet_bundle.py
🪛 Ruff (0.12.2)
monai/utils/state_cacher.py
127-127: Unused lambda argument: `location` (ARG005)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (19)
- GitHub Check: min-dep-pytorch (2.5.1)
- GitHub Check: min-dep-py3 (3.10)
- GitHub Check: min-dep-py3 (3.12)
- GitHub Check: min-dep-pytorch (2.8.0)
- GitHub Check: min-dep-py3 (3.11)
- GitHub Check: min-dep-pytorch (2.7.1)
- GitHub Check: min-dep-pytorch (2.6.0)
- GitHub Check: min-dep-py3 (3.9)
- GitHub Check: min-dep-os (ubuntu-latest)
- GitHub Check: min-dep-os (macOS-latest)
- GitHub Check: min-dep-os (windows-latest)
- GitHub Check: flake8-py3 (mypy)
- GitHub Check: quick-py3 (windows-latest)
- GitHub Check: packaging
- GitHub Check: quick-py3 (ubuntu-latest)
- GitHub Check: quick-py3 (macOS-latest)
- GitHub Check: build-docs
- GitHub Check: flake8-py3 (codeformat)
- GitHub Check: flake8-py3 (pytype)
This MR fixes part of #8565
…ing to tensors before saving.
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
monai/data/dataset.py (1)
376-389: Fix misleading comment and treat legacy/incompatible cache files as misses.

The comment says weights_only=False but the code uses True; also handle UnpicklingError to auto-rebuild.

```diff
- try:
-     # Loading with weights_only=False is expected to be safe as these should be the user's own cached data
-     return torch.load(hashfile, weights_only=True)
+ try:
+     # Load with weights_only=True for safety. Legacy cache files created with older MONAI/PyTorch
+     # may be incompatible; they will be deleted and recomputed below.
+     return torch.load(hashfile, weights_only=True)
  except PermissionError as e:
      if sys.platform != "win32":
          raise e
- except RuntimeError as e:
-     if "Invalid magic number; corrupt file" in str(e):
+ except (pickle.UnpicklingError, RuntimeError) as e:
+     # Treat incompatible or corrupt cache as a cache miss.
+     if "Invalid magic number; corrupt file" in str(e) or isinstance(e, pickle.UnpicklingError):
          warnings.warn(f"Corrupt cache file detected: {hashfile}. Deleting and recomputing.")
          hashfile.unlink()
      else:
          raise e
```
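The delete-and-recompute behaviour being suggested can be sketched generically with the stdlib (plain `pickle` and a hypothetical `compute` callback stand in for `torch.load` and the transform pipeline):

```python
import pickle
import tempfile
import warnings
from pathlib import Path

def load_or_recompute(cachefile: Path, compute):
    """Serve from cache; on an unreadable cache file, delete it and recompute."""
    if cachefile.exists():
        try:
            return pickle.loads(cachefile.read_bytes())
        except (pickle.UnpicklingError, EOFError):
            warnings.warn(f"Corrupt cache file detected: {cachefile}. Deleting and recomputing.")
            cachefile.unlink()
    result = compute()
    cachefile.write_bytes(pickle.dumps(result))
    return result

with tempfile.TemporaryDirectory() as d:
    f = Path(d) / "item.pt"
    f.write_bytes(b"\x00corrupt")  # simulate a corrupt/legacy cache file
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")
        print(load_or_recompute(f, lambda: [1, 2, 3]))  # recomputes
    print(load_or_recompute(f, lambda: None))           # now served from cache
```

The key property is that a bad cache file degrades to a cache miss rather than a hard failure, which is exactly what the review asks for legacy weights-only-incompatible files.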
♻️ Duplicate comments (1)
monai/data/dataset.py (1)
210-213: Tighten wording; capitalize NumPy; add a legacy-cache note.

Clarify the allowed types and call out that old cache files may need deletion.

```diff
- Cached data is expected to be tensors, primitives, or dictionaries keying to these values. Numpy arrays will
- be converted to tensors, however any other object type returned by transforms will not be loadable since
- `torch.load` will be used with `weights_only=True` to prevent loading of potentially malicious objects.
+ Cached data is expected to be tensors, primitives, or dictionaries mapping to these types. NumPy arrays are
+ converted to tensors; other object types returned by transforms are not loadable because `torch.load` uses
+ `weights_only=True` to prevent deserializing arbitrary objects. If you have legacy cache files created before
+ this change, delete them to force a rebuild.
```
🧹 Nitpick comments (1)
monai/data/dataset.py (1)
1678-1678: Surface a clearer error when the meta cache fails to load.

Wrap the load to emit an actionable message suggesting cache cleanup.

```diff
- return torch.load(self.cache_dir / meta_hash_file_name, weights_only=True)
+ try:
+     return torch.load(self.cache_dir / meta_hash_file_name, weights_only=True)
+ except (pickle.UnpicklingError, RuntimeError) as e:
+     raise RuntimeError(
+         f"Failed to load meta cache '{meta_hash_file_name}' with weights_only=True. "
+         f"Please delete cached files under '{self.cache_dir}' and retry."
+     ) from e
```
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Knowledge base: Disabled due to the Reviews -> Disable Knowledge Base setting
📒 Files selected for processing (1)
- monai/data/dataset.py (5 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
⚙️ CodeRabbit configuration file (same review instructions as quoted above)
Files:
monai/data/dataset.py
⏰ Context from checks skipped due to timeout of 90000ms (19): the same GitHub checks as listed above.
🔇 Additional comments (2)
monai/data/dataset.py (2)
400-400: Good change: tensorizing the cached payload makes weights-only reload safe.

This aligns cache writes with safe loads.
1658-1659: Confirm meta contents are tensors/primitives only before tensorization.

If MetaTensor.meta ever carries non-primitive objects, a weights-only load will fail. Add/verify a unit test covering typical meta (affine, spacing, orientation, strings, numbers).
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
monai/data/dataset.py (3)
377-389: Harden the cache-read path; preserve the traceback; handle legacy/unsafe payloads.

- Use a bare `raise` (TRY201).
- Treat EOFError and "weights-only" failures as a cache miss: delete and recompute.
- Optionally pin to CPU on load.

```diff
- try:
-     return torch.load(hashfile, weights_only=True)
- except PermissionError as e:
-     if sys.platform != "win32":
-         raise e
- except (pickle.UnpicklingError, RuntimeError) as e:  # corrupt or unloadable cached files are recomputed
-     if "Invalid magic number; corrupt file" in str(e) or isinstance(e, pickle.UnpicklingError):
-         warnings.warn(f"Corrupt cache file detected: {hashfile}. Deleting and recomputing.")
-         hashfile.unlink()
-     else:
-         raise e
+ try:
+     return torch.load(hashfile, map_location="cpu", weights_only=True)
+ except PermissionError:
+     if sys.platform != "win32":
+         raise
+ except (pickle.UnpicklingError, EOFError, RuntimeError) as e:
+     msg = str(e)
+     if (
+         isinstance(e, pickle.UnpicklingError)
+         or "Invalid magic number; corrupt file" in msg
+         or "weights only load failed" in msg.lower()
+         or "only tensors and" in msg.lower()
+     ):
+         warnings.warn(f"Invalid or legacy cache detected: {hashfile}. Deleting and recomputing.")
+         try:
+             hashfile.unlink()
+         except FileNotFoundError:
+             pass
+     else:
+         raise
```
1658-1662: Don't tensorize the metadata dict; keep `shape` as a tuple for reshape.

Tensorizing the meta dict turns `shape` into a Tensor, breaking downstream `reshape` expectations. Save plain Python primitives.

```diff
  torch.save(
-     obj=convert_to_tensor(self._meta_cache[meta_hash_file_name]),
+     obj=self._meta_cache[meta_hash_file_name],
      f=temp_hash_file,
      pickle_module=look_up_option(self.pickle_module, SUPPORTED_PICKLE_MOD),
      pickle_protocol=self.pickle_protocol,
  )
```
1606-1614: Fix the list-of-dicts GPU path: wrong reshape target and wrong value saved.

Two correctness bugs:

- Line 1612 reshapes `item[i]` (a dict) instead of the loaded array.
- Lines 1634-1639 save the whole dict `_item` instead of the per-key value.

```diff
  for i, _item in enumerate(item_transformed):
      for k in _item:
          meta_i_k = self._load_meta_cache(meta_hash_file_name=f"{hashfile.name}-{k}-meta-{i}")
          item_k = kvikio_numpy.fromfile(
              f"{hashfile}-{k}-{i}", dtype=meta_i_k["dtype"], like=cp.empty(())
          )
-         item_k = convert_to_tensor(item[i].reshape(meta_i_k["shape"]), device=f"cuda:{self.device}")
+         shape = tuple(meta_i_k["shape"]) if not isinstance(meta_i_k["shape"], tuple) else meta_i_k["shape"]
+         item_k = convert_to_tensor(item_k.reshape(shape), device=f"cuda:{self.device}")
          item[i].update({k: item_k, f"{k}_meta_dict": meta_i_k})
```

```diff
  for i, _item in enumerate(_item_transformed):
-     for k in _item:
+     for k, v in _item.items():
          data_hashfile = f"{hashfile}-{k}-{i}"
          meta_hash_file_name = f"{hashfile.name}-{k}-meta-{i}"
-         self._create_new_cache(_item, data_hashfile, meta_hash_file_name)
+         self._create_new_cache(v, data_hashfile, meta_hash_file_name)
```

Also applies to: 1634-1639
🧹 Nitpick comments (2)
monai/data/dataset.py (2)
210-213: Tighten wording; confirm the intended policy for PersistentDataset.

Minor phrasing/capitalization. Also, this class now uses weights_only=True, which conflicts with the PR summary stating PersistentDataset would retain non-weights-only loads. Please confirm the intent.

```diff
- Cached data is expected to be tensors, primitives, or dictionaries keying to these values. Numpy arrays will
- be converted to tensors, however any other object type returned by transforms will not be loadable since
- `torch.load` will be used with `weights_only=True` to prevent loading of potentially malicious objects.
- Legacy cache files may not be loadable and may need to be recomputed.
+ Cached data must be tensors, simple Python primitives, or dicts mapping to these values. NumPy arrays will
+ be converted to tensors; any other object type returned by transforms will not be loadable because
+ `torch.load` is invoked with `weights_only=True` to prevent loading potentially malicious objects.
+ Legacy cache files may not be loadable and may need to be recomputed.
```
263-263: Doc typo: TraceKeys.

Use the correct constant casing in the docs.

```diff
- reset_ops_id: whether to set `TraceKeys.ID` to ``Tracekys.NONE``, defaults to ``True``.
+ reset_ops_id: whether to set `TraceKeys.ID` to ``TraceKeys.NONE``, defaults to ``True``.
```

```diff
- reset_ops_id: whether to set `TraceKeys.ID` to ``Tracekeys.NONE``, defaults to ``True``.
+ reset_ops_id: whether to set `TraceKeys.ID` to ``TraceKeys.NONE``, defaults to ``True``.
```

Also applies to: 564-565
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Knowledge base: Disabled due to the Reviews -> Disable Knowledge Base setting
📒 Files selected for processing (1)
- monai/data/dataset.py (5 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
⚙️ CodeRabbit configuration file (same review instructions as quoted above)
Files:
monai/data/dataset.py
🪛 Ruff (0.12.2)
monai/data/dataset.py
382-382: Use `raise` without specifying the exception name; remove the exception name (TRY201)
⏰ Context from checks skipped due to timeout of 90000ms (19): the same GitHub checks as listed above.
🔇 Additional comments (3)
monai/data/dataset.py (3)
400-404: Confirm the intentional metadata drop.

`convert_to_tensor(_item_transformed)` will drop MetaTensor metadata. That matches the docstring but is a behavior change. Confirm this is desired for PersistentDataset caches.
1678-1678: LGTM: safe load for the meta cache.

The weights-only load here is consistent with saving plain primitives. Good.
371-405: Add tests for cache compatibility and the GPU list-of-dicts path.

Please add unit tests to cover:
- Legacy cache files containing non-tensor objects → load error → file deleted → recompute path.
- GDSDataset list-of-dicts GPU branch (per-key save/load, reshape correctness).
- PersistentDataset behavior change (no MetaTensor meta persisted).
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
monai/data/dataset.py (3)
380-389: Handle weights-only failures and preserve the traceback.

- Re-raising with `raise e` is flagged by Ruff (TRY201); use a bare `raise` to re-raise the active exception unchanged.
- Treat "Weights only load failed" as a recompute signal to avoid hard failures on legacy caches.

Apply:

```diff
- except PermissionError as e:
-     if sys.platform != "win32":
-         raise e
- except (pickle.UnpicklingError, RuntimeError) as e:  # corrupt or unloadable cached files are recomputed
-     if "Invalid magic number; corrupt file" in str(e) or isinstance(e, pickle.UnpicklingError):
+ except PermissionError:
+     if sys.platform != "win32":
+         raise
+ except (pickle.UnpicklingError, RuntimeError) as e:  # corrupt or unloadable cached files are recomputed
+     if (
+         "Invalid magic number; corrupt file" in str(e)
+         or "Weights only load failed" in str(e)
+         or isinstance(e, pickle.UnpicklingError)
+     ):
          warnings.warn(f"Corrupt cache file detected: {hashfile}. Deleting and recomputing.")
          hashfile.unlink()
      else:
-         raise e
+         raise
```
1623-1627: Bug: wrong variable in reshape causes an exception.

Reshape `item_k`, not `item[i]`:

```diff
- item_k = convert_to_tensor(item[i].reshape(meta_i_k["shape"]), device=f"cuda:{self.device}")
+ item_k = convert_to_tensor(item_k.reshape(meta_i_k["shape"]), device=f"cuda:{self.device}")
```
1649-1651: Bug: the wrong value is written to the cache for list-of-dicts.

Pass `_item[k]` (the tensor/array), not the dict `_item`:

```diff
- self._create_new_cache(_item, data_hashfile, meta_hash_file_name)
+ self._create_new_cache(_item[k], data_hashfile, meta_hash_file_name)
```
🧹 Nitpick comments (3)
monai/utils/type_conversion.py (1)
120-121: Clarify and harden the convert_numeric behavior.

The recursion doesn't thread `convert_numeric`, so nested numerics in lists/tuples/dicts will always be tensorized. If that is intentional, document it explicitly; otherwise, pass the flag through the recursion.

Apply if you want nested behavior to mirror the top-level flag:

```diff
  elif isinstance(data, list):
-     list_ret = [convert_to_tensor(i, dtype=dtype, device=device, track_meta=track_meta) for i in data]
+     list_ret = [
+         convert_to_tensor(i, dtype=dtype, device=device, track_meta=track_meta, convert_numeric=convert_numeric)
+         for i in data
+     ]
  elif isinstance(data, tuple):
-     tuple_ret = tuple(convert_to_tensor(i, dtype=dtype, device=device, track_meta=track_meta) for i in data)
+     tuple_ret = tuple(
+         convert_to_tensor(i, dtype=dtype, device=device, track_meta=track_meta, convert_numeric=convert_numeric)
+         for i in data
+     )
  elif isinstance(data, dict):
-     return {k: convert_to_tensor(v, dtype=dtype, device=device, track_meta=track_meta) for k, v in data.items()}
+     return {
+         k: convert_to_tensor(v, dtype=dtype, device=device, track_meta=track_meta, convert_numeric=convert_numeric)
+         for k, v in data.items()
+     }
```

Also applies to: 140-141, 172-173
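The threading issue is easy to demonstrate with a toy recursive converter (pure Python; the `("tensor", x)` tuple is a stand-in for a real `torch.as_tensor` call):

```python
def tensorize(data, convert_numeric=True):
    """Toy converter mirroring the review point: the flag must be threaded
    through recursive calls, or nested numerics ignore the caller's choice."""
    if isinstance(data, dict):
        return {k: tensorize(v, convert_numeric=convert_numeric) for k, v in data.items()}
    if isinstance(data, (list, tuple)):
        container = type(data)
        return container(tensorize(v, convert_numeric=convert_numeric) for v in data)
    if isinstance(data, (int, float)) and convert_numeric:
        return ("tensor", data)  # stand-in for torch.as_tensor(data)
    return data

print(tensorize(5))                                   # top-level numeric converted
print(tensorize({"a": [1, 2]}, convert_numeric=False))  # nested numerics left alone
```

If the recursive calls omitted `convert_numeric=convert_numeric`, the second call would still tensorize the nested `1` and `2` — exactly the inconsistency the review flags.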
monai/data/dataset.py (2)
593-594
: Avoid print in library code; use logger.Use logging instead of stdout.
Apply:
- print(f"Accessing lmdb file: {self.db_file.absolute()}.") + logging.getLogger(__name__).info("Accessing lmdb file: %s", self.db_file.absolute())
400-405: Consistency: use the configured pickle module in all torch.save calls.

Some saves use `pickle_module`; keep this consistent across every `torch.save` call to avoid environment-specific issues:

```diff
  torch.save(
      obj=convert_to_tensor(_item_transformed, convert_numeric=False),
      f=temp_hash_file,
+     pickle_module=look_up_option(self.pickle_module, SUPPORTED_PICKLE_MOD),
      pickle_protocol=self.pickle_protocol,
  )
```

Also applies to: 1670-1675
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Knowledge base: Disabled due to the Reviews -> Disable Knowledge Base setting
📒 Files selected for processing (4)
- monai/data/__init__.py (1 hunks)
- monai/data/dataset.py (9 hunks)
- monai/data/utils.py (4 hunks)
- monai/utils/type_conversion.py (3 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
⚙️ CodeRabbit configuration file (same review instructions as quoted above)
Files:
monai/data/__init__.py
monai/utils/type_conversion.py
monai/data/utils.py
monai/data/dataset.py
🪛 Ruff (0.12.2)
monai/data/dataset.py
383-383: Use `raise` without specifying the exception name; remove the exception name (TRY201)
⏰ Context from checks skipped due to timeout of 90000ms (19): the same GitHub checks as listed above.
🔇 Additional comments (2)
monai/data/utils.py (1)
502-504: Pickle path removed; update the docs and ensure invertibility paths don't regress.

Commenting out config.USE_META_DICT and pickle_operations changes collation/decollation behavior. Confirm that docs and tests cover applied_operations/inversion flows without the pickle round-trip.

Also applies to: 656-666
monai/data/dataset.py (1)
211-215: Docs conflict with the PR summary; align the policy.

The text says PersistentDataset uses weights_only=True; the PR summary claims persistent datasets keep non-weights-only loads. Please reconcile and update the docs/implementation accordingly.
Actionable comments posted: 0
♻️ Duplicate comments (1)
monai/data/dataset.py (1)
604-613: Align the LMDB safe (de)serialization with PersistentDataset and the pickle module registry.

Use the registered pickle module, avoid converting numerics to tensors, and be explicit about CPU on load.

```diff
- def _safe_serialize(self,val):
-     out=BytesIO()
-     torch.save(convert_to_tensor(val), out, pickle_protocol =self.pickle_protocol)
-     out.seek(0)
-     return out.read()
+ def _safe_serialize(self, val):
+     out = BytesIO()
+     torch.save(
+         convert_to_tensor(val, convert_numeric=False),
+         out,
+         pickle_module=look_up_option(self.pickle_module, SUPPORTED_PICKLE_MOD),
+         pickle_protocol=self.pickle_protocol,
+     )
+     out.seek(0)
+     return out.read()

- def _safe_deserialize(self,val):
-     out=BytesIO(val)
-     return torch.load(out,weights_only=True)
+ def _safe_deserialize(self, val):
+     out = BytesIO(val)
+     return torch.load(out, map_location="cpu", weights_only=True)
```

Also consider brief Google-style docstrings for these two helpers.
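The helper pair amounts to an in-memory serialize/deserialize round trip. A stdlib sketch of the same shape (plain `pickle` standing in for `torch.save`/`torch.load`, without the weights-only allowlisting the real helpers rely on):

```python
import io
import pickle

def safe_serialize(val, protocol=pickle.DEFAULT_PROTOCOL):
    """Serialize a value to bytes via an in-memory buffer,
    mirroring torch.save into a BytesIO before an LMDB put."""
    out = io.BytesIO()
    pickle.dump(val, out, protocol=protocol)
    return out.getvalue()

def safe_deserialize(blob):
    """Rehydrate a value from bytes, mirroring torch.load from a BytesIO."""
    return pickle.load(io.BytesIO(blob))

payload = {"weights": [0.1, 0.2], "step": 7}
print(safe_deserialize(safe_serialize(payload)))
```

Using `getvalue()` (rather than `seek(0)` + `read()`) is a slightly tidier way to extract the buffer contents, one of the small cleanups the review's diff also makes implicitly.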
🧹 Nitpick comments (5)
monai/utils/state_cacher.py (1)
127-131: Use an explicit CPU map_location and fix Ruff ARG005.

Avoid the unused lambda argument and be explicit about CPU loads.

```diff
- data_obj = torch.load(fn, map_location=lambda storage, location: storage, weights_only=True)
+ data_obj = torch.load(fn, map_location="cpu", weights_only=True)
```

monai/data/dataset.py (4)
211-215: Nit: wording/typo.

Use "NumPy" and tighten the punctuation.

```diff
- Cached data is expected to be tensors, primitives, or dictionaries keying to these values. Numpy arrays will
- be converted to tensors, however any other object type returned by transforms will not be loadable since
+ Cached data is expected to be tensors, primitives, or dictionaries keying to these values. NumPy arrays will
+ be converted to tensors; however, any other object type returned by transforms will not be loadable since
  `torch.load` will be used with `weights_only=True` to prevent loading of potentially malicious objects.
  Legacy cache files may not be loadable and may need to be recomputed.
```
380-380
: Prefer explicit CPU load.Be explicit to avoid accidental device restores when reading caches.
- return torch.load(hashfile, weights_only=True) + return torch.load(hashfile, map_location="cpu", weights_only=True)
383-383
: Use bareraise
to preserve traceback.Fixes Ruff TRY201 and keeps context.
- raise e + raise
1692-1693
: Be explicit about CPU when loading meta.Avoid any accidental GPU allocations on metadata load.
- return torch.load(self.cache_dir / meta_hash_file_name, weights_only=True) + return torch.load(self.cache_dir / meta_hash_file_name, map_location="cpu", weights_only=True)
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Knowledge base: Disabled due to Reviews -> Disable Knowledge Base
setting
📒 Files selected for processing (2)
monai/data/dataset.py (13 hunks)
monai/utils/state_cacher.py (2 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
⚙️ CodeRabbit configuration file
Review the Python code for quality and correctness. Ensure variable names adhere to PEP8 style guides, are sensible and informative in regards to their function, though permitting simple names for loop and comprehension variables. Ensure routine names are meaningful in regards to their function and use verbs, adjectives, and nouns in a semantically appropriate way. Docstrings should be present for all definition which describe each variable, return value, and raised exception in the appropriate section of the Google-style of docstrings. Examine code for logical error or inconsistencies, and suggest what may be changed to addressed these. Suggest any enhancements for code improving efficiency, maintainability, comprehensibility, and correctness. Ensure new or modified definitions will be covered by existing or new unit tests.
Files:
monai/data/dataset.py
monai/utils/state_cacher.py
🪛 Ruff (0.12.2)
monai/data/dataset.py
383-383: Use `raise` without specifying exception name; remove exception name (TRY201)
monai/utils/state_cacher.py
127-127: Unused lambda argument: `location` (ARG005)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (18)
- GitHub Check: min-dep-pytorch (2.6.0)
- GitHub Check: min-dep-pytorch (2.8.0)
- GitHub Check: min-dep-pytorch (2.5.1)
- GitHub Check: min-dep-pytorch (2.7.1)
- GitHub Check: min-dep-py3 (3.12)
- GitHub Check: min-dep-py3 (3.11)
- GitHub Check: min-dep-py3 (3.9)
- GitHub Check: min-dep-py3 (3.10)
- GitHub Check: min-dep-os (ubuntu-latest)
- GitHub Check: min-dep-os (windows-latest)
- GitHub Check: min-dep-os (macOS-latest)
- GitHub Check: quick-py3 (macOS-latest)
- GitHub Check: packaging
- GitHub Check: flake8-py3 (mypy)
- GitHub Check: quick-py3 (windows-latest)
- GitHub Check: flake8-py3 (pytype)
- GitHub Check: quick-py3 (ubuntu-latest)
- GitHub Check: flake8-py3 (codeformat)
🔇 Additional comments (6)
monai/utils/state_cacher.py (1)

Lines 67-69: Docstring tweak LGTM. Accurately documents DEFAULT_PROTOCOL usage.

monai/data/dataset.py (5)

Lines 15-18: Imports LGTM. BytesIO and UnpicklingError are used below.

Lines 401-405: Safe save path LGTM. Consistent with safer tensor-based serialization and DEFAULT_PROTOCOL.

Lines 540-541: Default protocol doc LGTM. Matches torch.serialization.DEFAULT_PROTOCOL.

Lines 685-687: Use of helpers is good. The switch to `_safe_deserialize` is consistent with the new policy.

Lines 211-215: Confirm intent: persistent datasets now use weights_only=True. monai/data/dataset.py (lines 211-215) and docs use weights_only=True, but the PR summary claims persistent datasets remain non-weights-only; confirm the intended behavior and update the PR description/docs for consistency. Found torch.load calls without weights_only in tests/bundle/test_bundle_download.py:{335,339,420,424} and monai/apps/nnunet/nnunet_bundle.py:{184,197,390,393,555,568}; reconcile or document the exceptions.
Signed-off-by: Eric Kerfoot <[email protected]>
Signed-off-by: Eric Kerfoot <[email protected]>
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
monai/utils/type_conversion.py (1)

Lines 172-181: Propagate convert_numeric into recursive calls. Currently nested numerics still convert even when convert_numeric=False; propagate the flag for consistency.

Apply this diff:

```diff
-    elif isinstance(data, list):
-        list_ret = [convert_to_tensor(i, dtype=dtype, device=device, track_meta=track_meta) for i in data]
+    elif isinstance(data, list):
+        list_ret = [
+            convert_to_tensor(i, dtype=dtype, device=device, track_meta=track_meta, convert_numeric=convert_numeric)
+            for i in data
+        ]
         return _convert_tensor(list_ret, dtype=dtype, device=device) if wrap_sequence else list_ret
-    elif isinstance(data, tuple):
-        tuple_ret = tuple(convert_to_tensor(i, dtype=dtype, device=device, track_meta=track_meta) for i in data)
+    elif isinstance(data, tuple):
+        tuple_ret = tuple(
+            convert_to_tensor(i, dtype=dtype, device=device, track_meta=track_meta, convert_numeric=convert_numeric)
+            for i in data
+        )
         return _convert_tensor(tuple_ret, dtype=dtype, device=device) if wrap_sequence else tuple_ret
-    elif isinstance(data, dict):
-        return {k: convert_to_tensor(v, dtype=dtype, device=device, track_meta=track_meta) for k, v in data.items()}
+    elif isinstance(data, dict):
+        return {
+            k: convert_to_tensor(v, dtype=dtype, device=device, track_meta=track_meta, convert_numeric=convert_numeric)
+            for k, v in data.items()
+        }
```

monai/data/dataset.py (1)

Lines 1621-1627: Bug: reshape uses the dict instead of the array (crashes). Should reshape item_k, not item[i].

Apply:

```diff
- item_k = convert_to_tensor(item[i].reshape(meta_i_k["shape"]), device=f"cuda:{self.device}")
+ item_k = convert_to_tensor(item_k.reshape(meta_i_k["shape"]), device=f"cuda:{self.device}")
```
♻️ Duplicate comments (2)
monai/data/utils.py (1)

Lines 58-61: Restore PICKLE_KEY_SUFFIX for backward compatibility (deprecated alias). Removing this symbol is breaking; the PR is labeled non-breaking. Restore it in `__all__` and as an alias to TraceKeys.KEY_SUFFIX with a deprecation note.

Suggested changes (outside the current diff):

```diff
 __all__ = [
     "AFFINE_TOL",
     "SUPPORTED_PICKLE_MOD",
+    "PICKLE_KEY_SUFFIX",
     ...
 ]

+# Deprecated alias for backward compatibility; remove in a future major release.
+PICKLE_KEY_SUFFIX = TraceKeys.KEY_SUFFIX  # deprecated
```

monai/data/dataset.py (1)

Lines 604-612: Honor pickle_module and avoid converting top-level numerics. Use the registered pickle module and convert_numeric=False; also mirror pickle_module on load.

Apply:

```diff
-    def _safe_serialize(self, val):
-        out = BytesIO()
-        torch.save(convert_to_tensor(val), out, pickle_protocol=self.pickle_protocol)
-        out.seek(0)
-        return out.read()
+    def _safe_serialize(self, val):
+        out = BytesIO()
+        torch.save(
+            convert_to_tensor(val, convert_numeric=False),
+            out,
+            pickle_module=look_up_option(self.pickle_module, SUPPORTED_PICKLE_MOD),
+            pickle_protocol=self.pickle_protocol,
+        )
+        out.seek(0)
+        return out.read()

-    def _safe_deserialize(self, val):
-        return torch.load(BytesIO(val), map_location="cpu", weights_only=True)
+    def _safe_deserialize(self, val):
+        return torch.load(
+            BytesIO(val),
+            map_location="cpu",
+            weights_only=True,
+            pickle_module=look_up_option(self.pickle_module, SUPPORTED_PICKLE_MOD),
+        )
```

🧹 Nitpick comments (3)

monai/utils/type_conversion.py (1)

Lines 140-142: Docstring: clarify the scope of convert_numeric. State whether convert_numeric applies recursively. If we propagate it (see next comment), update the docstring accordingly.

Apply:

```diff
- convert_numeric: if `True`, convert numeric Python values to tensors.
+ convert_numeric: if `True`, convert numeric Python values to tensors. This flag is propagated
+   recursively into nested lists/tuples/dicts for consistent behavior.
```

monai/data/utils.py (1)

Lines 419-426: Typo: "collage_meta_tensor" should be "collate_meta_tensor". Fix the docstring reference.

Apply:

```diff
- Collate a sequence of meta tensor into a single batched metatensor. This is called by `collage_meta_tensor`
+ Collate a sequence of meta tensor into a single batched metatensor. This is called by `collate_meta_tensor`
```

monai/data/dataset.py (1)

Lines 1691-1692: Pass the configured pickle_module when loading the meta cache. Ensure symmetry with save and support custom pickle modules.

Apply:

```diff
- return torch.load(self.cache_dir / meta_hash_file_name, weights_only=True)
+ return torch.load(
+     self.cache_dir / meta_hash_file_name,
+     weights_only=True,
+     pickle_module=look_up_option(self.pickle_module, SUPPORTED_PICKLE_MOD),
+     map_location="cpu",
+ )
```
📒 Files selected for processing (4)
monai/data/__init__.py (0 hunks)
monai/data/dataset.py (13 hunks)
monai/data/utils.py (2 hunks)
monai/utils/type_conversion.py (3 hunks)
💤 Files with no reviewable changes (1)
- monai/data/__init__.py
🔇 Additional comments (3)
monai/data/dataset.py (3)

Lines 211-215: PersistentDataset docs conflict with the PR summary; confirm intent. Docs say caches load with weights_only=True; the PR summary states PersistentDataset should allow full unpickling (weights_only=False). Reconcile and align code/docs. If keeping weights_only=True, update the PR description and migration notes; if reverting, change the load sites accordingly and adjust the docstring.

Lines 401-405: LGTM: safe save path. Saving via convert_to_tensor(..., convert_numeric=False) with the configured pickle module and protocol looks correct.

Lines 684-686: LGTM: LMDB cache uses safe deserialization. Loading via weights_only=True is aligned with the new strategy.
Signed-off-by: Eric Kerfoot <[email protected]>
Actionable comments posted: 2
🧹 Nitpick comments (2)
monai/utils/type_conversion.py (2)

Lines 127-142: Docstring: clarify scope and edge cases. Tighten the wording and document the wrap_sequence interaction and NumPy scalars.

```diff
@@
- dtype: target data type to when converting to Tensor.
+ dtype: target data type when converting to Tensor.
@@
- convert_numeric: if `True`, convert numeric Python values to tensors.
+ convert_numeric: if `True`, convert numeric Python values (int, float, bool) to tensors.
+   Note: NumPy scalars (e.g., np.float32(1)) are also treated as numeric when converting.
+   Note: If `wrap_sequence=True`, sequences may become a single tensor regardless of `convert_numeric`.
```

Lines 179-186: wrap_sequence x convert_numeric nuance: document or assert. With `wrap_sequence=True`, a list/tuple of numerics becomes a single tensor even if `convert_numeric=False`. If this is intentional, add a short note to the docstring; otherwise, gate the wrap with `convert_numeric`.
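To make the propagation question concrete, here is a simplified stand-in for `convert_to_tensor` (not the MONAI implementation) that threads `convert_numeric` through nested containers, which is the behavior the review asks for:

```python
import torch


def to_tensor(data, convert_numeric=True):
    """Simplified recursive converter; the numeric branch covers bool/int/float."""
    if isinstance(data, dict):
        return {k: to_tensor(v, convert_numeric) for k, v in data.items()}
    if isinstance(data, (list, tuple)):
        return type(data)(to_tensor(v, convert_numeric) for v in data)
    if isinstance(data, (bool, int, float)):
        return torch.as_tensor(data) if convert_numeric else data
    return data


nested = {"a": [1, 2.5], "b": {"c": True}}
kept = to_tensor(nested, convert_numeric=False)
assert kept == nested  # numerics untouched at every depth
converted = to_tensor(nested, convert_numeric=True)
assert isinstance(converted["b"]["c"], torch.Tensor)
```

Without passing the flag into the recursive calls, only the top-level value would respect `convert_numeric=False`, which is exactly the inconsistency flagged above.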
📒 Files selected for processing (1)
monai/utils/type_conversion.py (4 hunks)
🔇 Additional comments (1)
monai/utils/type_conversion.py (1)

Lines 162-164: Good: propagate dtype/device/track_meta/convert_numeric into recursion. Excluding `wrap_sequence` here is correct, to avoid collapsing nested sequences.
Signed-off-by: Eric Kerfoot <[email protected]>
/build |
2 similar comments
/build |
/build |
cc @ericspod |
Signed-off-by: Eric Kerfoot <[email protected]>
This is likely related to what pickle protocol is used, I'm changing defaults in tests now. |
for more information, see https://pre-commit.ci
Actionable comments posted: 1
🧹 Nitpick comments (2)
tests/data/test_gdsdataset.py (1)

Lines 88-90: Good switch to torch.serialization.DEFAULT_PROTOCOL; clarify the TODO and tie it to the issue. The intent is clear; anchor the TODO to the tracked issue and PyTorch behavior for future maintainers.

```diff
- # TODO: was pickle.HIGHEST_PROTOCOL but this wasn't compatible with torch.load, need to improve compatibility
+ # TODO(#8355): previously used pickle.HIGHEST_PROTOCOL; incompatible with torch.load(weights_only=True) in PyTorch>=2.6.
+ # Audit/document supported protocols and add a regression test covering cache read/write under weights_only=True.
```

tests/data/test_persistentdataset.py (1)

Line 20: Importing torch solely for DEFAULT_PROTOCOL is acceptable here. To avoid a heavyweight import in tests, alias once at module scope and reuse:

```diff
-import torch
+import torch
+# Optional: TORCH_PICKLE_PROTOCOL = torch.serialization.DEFAULT_PROTOCOL
```
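For background on the protocol mismatch this TODO refers to: each pickle stream records its protocol in its leading opcode, and `pickle.HIGHEST_PROTOCOL` (5 on modern Python) is newer than the protocol 2 that `torch.serialization.DEFAULT_PROTOCOL` has historically mapped to (the exact torch default is an assumption here). A stdlib-only sketch:

```python
import pickle

payload = {"epoch": 3, "lr": 1e-4}

old = pickle.dumps(payload, protocol=2)
new = pickle.dumps(payload, protocol=pickle.HIGHEST_PROTOCOL)

# The stdlib loader understands every protocol version it shipped with.
assert pickle.loads(old) == payload == pickle.loads(new)

# Protocol >= 2 streams begin with the PROTO opcode (0x80) followed by the protocol number.
assert old[:2] == b"\x80\x02"
assert new[:2] == bytes([0x80, pickle.HIGHEST_PROTOCOL])
```

A reader pinned to an older protocol cannot parse frames produced with a newer one, which is why the tests standardize on the protocol that `torch.load` is known to accept.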
📒 Files selected for processing (2)
tests/data/test_gdsdataset.py (1 hunks)
tests/data/test_persistentdataset.py (2 hunks)
/build |
Fixes #8355.
Description
This modifies the use of `torch.load` to load weights-only everywhere. This will change some behaviour in that data needs to be converted to tensors where possible, i.e. converting NumPy arrays to tensors. This issue was reported by @h3rrr in GHSA-6vm5-6jv9-rjpj.

This will also replace uses of `pickle` for saving and loading with `torch.save/load`. This also requires that NumPy arrays be converted, but otherwise uses the restriction of weights-only to enforce safe loading. This issue was reported by @h3rrr in GHSA-p8cm-mm2v-gwjm.
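To illustrate the safety property being relied on, a small sketch showing `weights_only=True` refusing an arbitrary pickled object (the exact exception type has varied across PyTorch releases, so the catch is kept broad):

```python
from io import BytesIO

import torch


class Arbitrary:
    """Stands in for any non-tensor object a transform might emit."""


buf = BytesIO()
torch.save(Arbitrary(), buf)  # torch.save will happily pickle this object
buf.seek(0)

rejected = False
try:
    torch.load(buf, weights_only=True)
except Exception:
    rejected = True  # the restricted unpickler refuses non-allow-listed classes

assert rejected
```

This is the failure mode the "legacy cache files may need to be recomputed" note describes: caches written with arbitrary Python objects cannot pass the restricted loader.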
Types of changes

- Integration tests passed locally by running `./runtests.sh -f -u --net --coverage`.
- Quick tests passed locally by running `./runtests.sh --quick --unittests --disttests`.
- Documentation updated, tested with the `make html` command in the `docs/` folder.