Skip to content

Conversation

h3rrr
Copy link
Contributor

@h3rrr h3rrr commented Sep 12, 2025

The path traversal issue in the previously submitted vulnerability report has been fixed and has been verified and fixed locally.
GHSA-x6ww-pf9m-m73m

Description

Added a security function to fix a security issue caused by directly using extractall in zip_file.extractall(output_dir).
This has been verified locally and does not affect previous functionality.
image
image
Figure 1: Malicious zip file loaded, Figure 2: Normal zip file loaded

Types of changes

  • Non-breaking change (fix or new feature that would not break existing functionality).
  • Breaking change (fix or new feature that would cause existing functionality to change).
  • New tests added to cover the changes.
  • Integration tests passed locally by running ./runtests.sh -f -u --net --coverage.
  • Quick tests passed locally by running ./runtests.sh --quick --unittests --disttests.
  • In-line docstrings updated.
  • Documentation updated, tested make html command in the docs/ folder.

h3rrr and others added 11 commits August 27, 2025 10:52
Signed-off-by: h3rrr <[email protected]>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Signed-off-by: h3rrr <[email protected]>
Co-authored-by: Eric Kerfoot <[email protected]>
Signed-off-by: h3rrr <[email protected]>
Path traversal security issue fix Zip Slip

Signed-off-by: h3rrr <[email protected]>
Signed-off-by: h3rrr <[email protected]>
Changed to previous content, the fix will be filed in a new PR

Signed-off-by: h3rrr <[email protected]>
The path traversal issue in the previously submitted vulnerability report has been fixed and has been verified and fixed locally.

Signed-off-by: h3rrr <[email protected]>
Copy link
Contributor

coderabbitai bot commented Sep 12, 2025

Walkthrough

Adds SECURITY.md with reporting, coordinated disclosure, and acknowledgements. Updates monai/apps/utils.py to introduce an internal safe_extract_member helper and replace archive-wide extractall with per-member, context-managed ZIP and TAR extraction that normalizes paths, disallows absolute or parent-traversal entries, and blocks symlinks/hardlinks while streaming files to disk. Adds tests in tests/apps/test_download_and_extract.py (TestPathTraversalProtection) covering valid extraction and malicious ZIP/TAR cases: path traversal, absolute paths, symlinks, and hardlinks. Public APIs unchanged.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Pre-merge checks and finishing touches

❌ Failed checks (2 warnings)
Check name Status Explanation Resolution
Description Check ⚠️ Warning The description contains a useful narrative, a "Description" section, and the Types of changes checklist, but it does not fill the template's required top-line "Fixes # ." field and leaves checklist items (notably "New tests added") unchecked despite tests being present in the changeset, so it does not fully comply with the repository template. The security advisory link and images are helpful, but the missing "Fixes" reference and unchecked test status are required template omissions that should be corrected. Update the PR to populate the "Fixes #" line (for example reference GHSA-x6ww-pf9m-m73m or the relevant issue number), mark the Types of changes checklist to reflect that new tests were added and their test-run status, and add a short sentence listing which tests were added and what they validate.
Docstring Coverage ⚠️ Warning Docstring coverage is 71.43% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (1 passed)
Check name Status Explanation
Title Check ✅ Passed The title "Path traversal issue security fix" is concise and accurately describes the primary change—fixing a path traversal/security issue in archive extraction; it focuses on the main intent without noisy file lists or emojis. It is short, relevant to the changeset (code + tests), and readable to reviewers scanning PR history.
✨ Finishing touches
  • 📝 Generate Docstrings
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
monai/apps/utils.py (1)

123-147: Consider custom exception types for better error handling.

The current ValueError usage works but custom exceptions would improve error categorization.

class ArchiveSecurityError(Exception):
    """Raised when archive extraction poses security risks."""
    pass

def safe_extract_member(member, extract_to):
    """Securely verify compressed package member paths to prevent path traversal attacks"""
    # Get member path (handle different compression formats)
    if hasattr(member, 'filename'):
        member_path = member.filename  # zipfile
    elif hasattr(member, 'name'):
        member_path = member.name  # tarfile
    else:
        member_path = str(member)

    member_path = os.path.normpath(member_path)

    if os.path.isabs(member_path) or '..' in member_path.split(os.sep):
        raise ArchiveSecurityError(f"Unsafe path detected in archive: {member_path}")

    full_path = os.path.join(extract_to, member_path)
    full_path = os.path.normpath(full_path)

    extract_to_abs = os.path.abspath(extract_to)
    full_path_abs = os.path.abspath(full_path)

    if not (full_path_abs == extract_to_abs or full_path_abs.startswith(extract_to_abs + os.sep)):
        raise ArchiveSecurityError(f"Path traversal attack detected: {member_path}")

    return full_path
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting

📥 Commits

Reviewing files that changed from the base of the PR and between b5bc69d and f793740.

📒 Files selected for processing (2)
  • SECURITY.md (1 hunks)
  • monai/apps/utils.py (3 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.md

⚙️ CodeRabbit configuration file

Remember that documentation must be updated with the latest information.

Files:

  • SECURITY.md
**/*.py

⚙️ CodeRabbit configuration file

Review the Python code for quality and correctness. Ensure variable names adhere to PEP8 style guides, are sensible and informative in regards to their function, though permitting simple names for loop and comprehension variables. Ensure routine names are meaningful in regards to their function and use verbs, adjectives, and nouns in a semantically appropriate way. Docstrings should be present for all definition which describe each variable, return value, and raised exception in the appropriate section of the Google-style of docstrings. Examine code for logical error or inconsistencies, and suggest what may be changed to addressed these. Suggest any enhancements for code improving efficiency, maintainability, comprehensibility, and correctness. Ensure new or modified definitions will be covered by existing or new unit tests.

Files:

  • monai/apps/utils.py
🪛 Ruff (0.12.2)
monai/apps/utils.py

136-136: Avoid specifying long messages outside the exception class

(TRY003)


145-145: Avoid specifying long messages outside the exception class

(TRY003)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (19)
  • GitHub Check: min-dep-os (windows-latest)
  • GitHub Check: min-dep-os (ubuntu-latest)
  • GitHub Check: min-dep-py3 (3.9)
  • GitHub Check: min-dep-os (macOS-latest)
  • GitHub Check: min-dep-pytorch (2.8.0)
  • GitHub Check: min-dep-py3 (3.11)
  • GitHub Check: min-dep-py3 (3.10)
  • GitHub Check: min-dep-pytorch (2.6.0)
  • GitHub Check: min-dep-pytorch (2.5.1)
  • GitHub Check: min-dep-py3 (3.12)
  • GitHub Check: min-dep-pytorch (2.7.1)
  • GitHub Check: flake8-py3 (mypy)
  • GitHub Check: build-docs
  • GitHub Check: flake8-py3 (codeformat)
  • GitHub Check: flake8-py3 (pytype)
  • GitHub Check: quick-py3 (windows-latest)
  • GitHub Check: quick-py3 (ubuntu-latest)
  • GitHub Check: quick-py3 (macOS-latest)
  • GitHub Check: packaging
🔇 Additional comments (5)
SECURITY.md (1)

1-19: Good security policy documentation.

Clear structure and appropriate content for vulnerability reporting. Documentation follows best practices for coordinated disclosure.

monai/apps/utils.py (4)

14-15: Import order updated appropriately.

Adding os and shutil imports to support the new path traversal security fixes.


123-147: Excellent path traversal prevention implementation.

Robust security function properly validates archive member paths. Handles both zipfile and tarfile formats correctly. Path normalization and traversal checks are comprehensive.


314-323: Safe ZIP extraction implementation correctly replaces vulnerable extractall.

Secure per-member extraction prevents path traversal attacks. Directory creation and file streaming implementation is correct.


325-336: Safe TAR extraction implementation correctly replaces vulnerable extractall.

Secure per-member extraction with proper file-only filtering. Correct use of extractfile and streaming copy to prevent directory traversal attacks.

@h3rrr h3rrr mentioned this pull request Sep 12, 2025
@ericspod
Copy link
Member

This PR includes the security file as well but this should go away once we merge the other PR. This happened because you made your h3rr-patch-1 branch from the dev branch you were using with the previous. No worries, this will work out eventually.

@h3rrr
Copy link
Contributor Author

h3rrr commented Sep 12, 2025

Hi, @ericspod the pull request for SECURITY.md mentioned the need for testing.
I've already tested this locally, and the image above shows the results.
You can see that when a zip file contains malicious content, an error is thrown.
Normal zip files, however, are successfully extracted, with the same results as before.

@ericspod
Copy link
Member

Hi, @ericspod the pull request for SECURITY.md mentioned the need for testing. I've already tested this locally, and the image above shows the results. You can see that when a zip file contains malicious content, an error is thrown. Normal zip files, however, are successfully extracted, with the same results as before.

What I mean to say is that we need unit tests added to the tests directory which will demonstrate this is the case for a valid and a malicious zip file. You can find tests for download_and_extract here. I suggest we add two more tests in a new class in this file, one constructs a valid zip file in a temporary directory and shows extractall does not raise an exception, and a second with a maliciously defined zip file which does raise an exception as expected. This should suffice to capture the condition we're trying to cover with this change. You can use the tempfile and zipfile standard libraries to do this.

@ericspod ericspod mentioned this pull request Sep 12, 2025
46 tasks
h3rrr and others added 4 commits September 12, 2025 22:57
Sorry... I chose the wrong branch before. I added some anti-bypass measures to make the decompression operation safer.

Signed-off-by: h3rrr <[email protected]>
New test cases added as required

Signed-off-by: h3rrr <[email protected]>
@h3rrr
Copy link
Contributor Author

h3rrr commented Sep 12, 2025

Hi, @ericspod the pull request for SECURITY.md mentioned the need for testing. I've already tested this locally, and the image above shows the results. You can see that when a zip file contains malicious content, an error is thrown. Normal zip files, however, are successfully extracted, with the same results as before.你好,关于 SECURITY.md 中提到的拉取请求,提到了需要测试。我已经在本地上进行了测试,上面的图片显示了结果。你可以看到,当 zip 文件包含恶意内容时,会抛出错误。然而,正常的 zip 文件则成功提取,结果与之前相同。

What I mean to say is that we need unit tests added to the tests directory which will demonstrate this is the case for a valid and a malicious zip file. You can find tests for download_and_extract here. I suggest we add two more tests in a new class in this file, one constructs a valid zip file in a temporary directory and shows extractall does not raise an exception, and a second with a maliciously defined zip file which does raise an exception as expected. This should suffice to capture the condition we're trying to cover with this change. You can use the tempfile and zipfile standard libraries to do this.我的意思是,我们需要在 tests 目录中添加单元测试,以证明对于有效和恶意的 zip 文件,情况都是如此。你可以在这里的 download_and_extract 中找到测试。我建议我们在这个文件中添加两个新的测试,在一个新类中,一个在临时目录中构建一个有效的 zip 文件并显示 extractall 不会抛出异常,另一个使用恶意定义的 zip 文件,它确实会抛出异常,正如预期的那样。这应该足以捕捉我们通过这个更改想要覆盖的条件。你可以使用 tempfilezipfile 标准库来完成这个任务。

I understand what you mean. In the latest pull request, I fixed some potential exploits and added the corresponding testing module.
It seems the first submission was submitted as a previous pull request... Sorry.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (9)
tests/apps/test_download_and_extract.py (5)

88-103: Avoid broad exception catch in success-path test (Ruff BLE001).

Let the test fail naturally; no need for try/except.

-            try:
-                extractall(str(zip_path), str(extract_dir))
+            extractall(str(zip_path), str(extract_dir))
@@
-            except Exception as e:
-                self.fail(f"Valid zip extraction should not raise exception: {e}")

143-157: Avoid broad exception catch in success-path test (Ruff BLE001).

Same here; remove try/except.

-            try:
-                extractall(str(tar_path), str(extract_dir))
+            extractall(str(tar_path), str(extract_dir))
@@
-            except Exception as e:
-                self.fail(f"Valid tar extraction should not raise exception: {e}")

219-224: Make assertion resilient to message variants.

Current code raises “Symbolic link …”; test looks for “symlink”. Either keep as-is if you change code to include “symlink” (see utils.py comment), or loosen the check to accept both.

-            error_msg = str(context.exception).lower()
-            self.assertTrue("unsafe path" in error_msg or "symlink" in error_msg)
+            error_msg = str(context.exception).lower()
+            self.assertTrue(
+                any(x in error_msg for x in ("unsafe path", "symlink", "symbolic link"))
+            )

245-249: Same resilience for hard link message.

Match “hardlink” or “hard link”.

-            error_msg = str(context.exception).lower()
-            self.assertTrue("unsafe path" in error_msg or "hardlink" in error_msg)
+            error_msg = str(context.exception).lower()
+            self.assertTrue(
+                any(x in error_msg for x in ("unsafe path", "hardlink", "hard link"))
+            )

71-199: Optional: add a ZIP symlink test for parity with TAR cases.

ZIP symlinks can be represented via external_attr bits; add a test to ensure they’re rejected or handled consistently. I can open a follow-up PR if preferred.

def test_malicious_zip_symlink_protection(self):
    import stat
    with tempfile.TemporaryDirectory() as tmp_dir:
        zip_path = Path(tmp_dir) / "malicious_symlink.zip"
        extract_dir = Path(tmp_dir) / "extract"
        extract_dir.mkdir()
        with zipfile.ZipFile(zip_path, "w") as zf:
            zi = zipfile.ZipInfo("link_to_outside")
            zi.create_system = 3  # UNIX
            zi.external_attr = (stat.S_IFLNK | 0o777) << 16
            zf.writestr(zi, "../../../etc/passwd")
        with self.assertRaises(ValueError) as ctx:
            extractall(str(zip_path), str(extract_dir))
        self.assertIn("symlink", str(ctx.exception).lower())
monai/apps/utils.py (4)

140-151: Harden path check using commonpath.

startswith is okay; os.path.commonpath is more robust.

-    if not (full_path_abs == extract_to_abs or full_path_abs.startswith(extract_to_abs + os.sep)):
-        raise ValueError(f"Path traversal attack detected: {member_path}")
+    if os.path.commonpath([extract_to_abs, full_path_abs]) != extract_to_abs:
+        raise ValueError(f"Unsafe path: traversal {member_path}")

123-152: Docstring: add Args/Raises per guidelines.

Enhance discoverability and consistency with project style.

-def safe_extract_member(member, extract_to):
-    """Securely verify compressed package member paths to prevent path traversal attacks"""
+def safe_extract_member(member, extract_to):
+    """Validate and resolve a safe extraction path for an archive member.
+
+    Args:
+        member: ZipInfo or TarInfo (or path-like name) representing the archive entry.
+        extract_to: Directory where the member should be extracted.
+
+    Returns:
+        str: A normalized path under `extract_to` where the member may be safely written.
+
+    Raises:
+        ValueError: If the member is a symlink/hardlink, is absolute, attempts traversal,
+            or would resolve outside of `extract_to`.
+    """

14-25: Optional: detect ZIP symlinks via external_attr.

ZIP symlinks aren’t caught by TarInfo checks. If desired, block them explicitly.

 import os
 import shutil
+import stat
@@
 def safe_extract_member(member, extract_to):
@@
-    else:
-        member_path = str(member)
+    else:
+        member_path = str(member)
+
+    # ZIP symlink detection via UNIX mode bits
+    if hasattr(member, "external_attr"):
+        mode = (member.external_attr >> 16) & 0xFFFF
+        if stat.S_IFMT(mode) == stat.S_IFLNK:
+            raise ValueError(f"Unsafe path: symlink {member_path}")

319-329: ZIP extraction OK; consider preserving permissions (optional).

Current code writes bytes but drops exec bits/timestamps. If that matters, apply mode and mtime from metadata.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting

📥 Commits

Reviewing files that changed from the base of the PR and between f793740 and 7a9f479.

📒 Files selected for processing (2)
  • monai/apps/utils.py (3 hunks)
  • tests/apps/test_download_and_extract.py (2 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

⚙️ CodeRabbit configuration file

Review the Python code for quality and correctness. Ensure variable names adhere to PEP8 style guides, are sensible and informative in regards to their function, though permitting simple names for loop and comprehension variables. Ensure routine names are meaningful in regards to their function and use verbs, adjectives, and nouns in a semantically appropriate way. Docstrings should be present for all definition which describe each variable, return value, and raised exception in the appropriate section of the Google-style of docstrings. Examine code for logical error or inconsistencies, and suggest what may be changed to addressed these. Suggest any enhancements for code improving efficiency, maintainability, comprehensibility, and correctness. Ensure new or modified definitions will be covered by existing or new unit tests.

Files:

  • tests/apps/test_download_and_extract.py
  • monai/apps/utils.py
🪛 Ruff (0.12.2)
tests/apps/test_download_and_extract.py

101-101: Do not catch blind exception: Exception

(BLE001)


155-155: Do not catch blind exception: Exception

(BLE001)

monai/apps/utils.py

134-134: Avoid specifying long messages outside the exception class

(TRY003)


136-136: Avoid specifying long messages outside the exception class

(TRY003)


141-141: Avoid specifying long messages outside the exception class

(TRY003)


150-150: Avoid specifying long messages outside the exception class

(TRY003)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (18)
  • GitHub Check: min-dep-pytorch (2.8.0)
  • GitHub Check: min-dep-pytorch (2.6.0)
  • GitHub Check: min-dep-pytorch (2.5.1)
  • GitHub Check: min-dep-py3 (3.10)
  • GitHub Check: min-dep-pytorch (2.7.1)
  • GitHub Check: min-dep-py3 (3.12)
  • GitHub Check: min-dep-py3 (3.11)
  • GitHub Check: min-dep-py3 (3.9)
  • GitHub Check: min-dep-os (ubuntu-latest)
  • GitHub Check: min-dep-os (windows-latest)
  • GitHub Check: quick-py3 (macOS-latest)
  • GitHub Check: flake8-py3 (codeformat)
  • GitHub Check: quick-py3 (windows-latest)
  • GitHub Check: flake8-py3 (pytype)
  • GitHub Check: build-docs
  • GitHub Check: flake8-py3 (mypy)
  • GitHub Check: quick-py3 (ubuntu-latest)
  • GitHub Check: packaging
🔇 Additional comments (1)
monai/apps/utils.py (1)

331-343: tar.gz extraction is broken: use transparent read mode.

tarfile.open(..., 'r') won't read gzip members; tests using .tar.gz will fail. Use r:*.

-        with tarfile.open(filepath, 'r') as tar_file:
+        with tarfile.open(filepath, 'r:*') as tar_file:

Likely an incorrect or invalid review comment.

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Signed-off-by: h3rrr <[email protected]>
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (1)
monai/apps/utils.py (1)

133-136: Exception messages now match prior review guidance. LGTM.

“Unsafe path: symlink … / hardlink …” aligns with tests.

🧹 Nitpick comments (5)
monai/apps/utils.py (5)

123-151: Path traversal mitigation is solid; add types + full docstring.

Give this helper a typed signature and a Google-style docstring per guidelines.

-def safe_extract_member(member, extract_to):
-    """Securely verify compressed package member paths to prevent path traversal attacks"""
+def safe_extract_member(member: Any, extract_to: PathLike) -> str:
+    """Validate an archive member path and return a safe extraction target.
+
+    Args:
+        member: A member object from zipfile (ZipInfo) or tarfile (TarInfo).
+        extract_to: Directory to extract into.
+
+    Returns:
+        The normalized safe filesystem path for this member inside extract_to.
+
+    Raises:
+        ValueError: If the member is a symlink/hardlink, absolute path, contains
+            parent traversal, or resolves outside extract_to.
+    """

133-136: Keep exact messages; silence Ruff TRY003 inline.

Tests likely depend on these messages; add noqa rather than changing text.

-    if hasattr(member, 'issym') and member.issym():
-        raise ValueError(f"Unsafe path: symlink {member_path}")
-    if hasattr(member, 'islnk') and member.islnk():
-        raise ValueError(f"Unsafe path: hardlink {member_path}")
+    if hasattr(member, 'issym') and member.issym():
+        raise ValueError(f"Unsafe path: symlink {member_path}")  # noqa: TRY003
+    if hasattr(member, 'islnk') and member.islnk():
+        raise ValueError(f"Unsafe path: hardlink {member_path}")  # noqa: TRY003
@@
-    if os.path.isabs(member_path) or '..' in member_path.split(os.sep):
-        raise ValueError(f"Unsafe path detected in archive: {member_path}")
+    if os.path.isabs(member_path) or '..' in member_path.split(os.sep):
+        raise ValueError(f"Unsafe path detected in archive: {member_path}")  # noqa: TRY003
@@
-    if not (full_path_abs == extract_to_abs or full_path_abs.startswith(extract_to_abs + os.sep)):
-        raise ValueError(f"Path traversal attack detected: {member_path}")
+    if not (full_path_abs == extract_to_abs or full_path_abs.startswith(extract_to_abs + os.sep)):
+        raise ValueError(f"Path traversal attack detected: {member_path}")  # noqa: TRY003

Also applies to: 139-141, 148-149


126-137: Optionally detect ZIP symlinks too.

Zip symlinks aren’t caught by issym/islnk; check external_attr to reject them early.

     if hasattr(member, 'filename'):
         member_path = member.filename  # zipfile
@@
-    if hasattr(member, 'issym') and member.issym():
+    # ZIP symlink detection (posix zips set S_IFLNK in external_attr)
+    if isinstance(member, zipfile.ZipInfo):
+        import stat  # local import to avoid global dependency if unused
+        if member.create_system == 3 and stat.S_ISLNK(member.external_attr >> 16):
+            raise ValueError(f"Unsafe path: symlink {member_path}")  # noqa: TRY003
+
+    if hasattr(member, 'issym') and member.issym():
         raise ValueError(f"Unsafe path: symlink {member_path}")

Note: hardlinks don’t exist in ZIP; this keeps behavior consistent while tightening checks.


330-342: TAR extraction: use r: and restore metadata.*

Be explicit about compression auto-detect and preserve mode/mtime.

-        with tarfile.open(filepath, 'r') as tar_file:
+        with tarfile.open(filepath, 'r:*') as tar_file:
@@
                         with open(safe_path, 'wb') as target:
                             shutil.copyfileobj(source, target)
+                        # restore basic metadata
+                        try:
+                            os.chmod(safe_path, member.mode & 0o777)
+                            os.utime(safe_path, (member.mtime, member.mtime))
+                        except Exception:
+                            pass

318-328: ZIP extraction: preserve perms/mtime to avoid subtle regressions.

Restore basic metadata (unix permission bits and mtime) after writing each zip member so extracted files keep executable bits/timestamps.

Apply the patch at monai/apps/utils.py (around lines 318–328):

                 with zip_file.open(member) as source:
                     with open(safe_path, 'wb') as target:
                         shutil.copyfileobj(source, target)
+                # restore basic metadata
+                perms = (member.external_attr >> 16) & 0o777
+                if perms:
+                    os.chmod(safe_path, perms)
+                from datetime import datetime
+                try:
+                    mtime = datetime(*member.date_time).timestamp()
+                    os.utime(safe_path, (mtime, mtime))
+                except Exception:
+                    pass

Also update monai/bundle/scripts.py (uses ZipFile.extractall at ~lines 290–293) to either call this helper or apply equivalent metadata restoration. Run tests that exercise download_and_extract/extractall (tests/apps/test_download_and_extract.py and related integration tests) after the change.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting

📥 Commits

Reviewing files that changed from the base of the PR and between 7a9f479 and d07a269.

📒 Files selected for processing (1)
  • monai/apps/utils.py (3 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

⚙️ CodeRabbit configuration file

Review the Python code for quality and correctness. Ensure variable names adhere to PEP8 style guides, are sensible and informative in regards to their function, though permitting simple names for loop and comprehension variables. Ensure routine names are meaningful in regards to their function and use verbs, adjectives, and nouns in a semantically appropriate way. Docstrings should be present for all definition which describe each variable, return value, and raised exception in the appropriate section of the Google-style of docstrings. Examine code for logical error or inconsistencies, and suggest what may be changed to addressed these. Suggest any enhancements for code improving efficiency, maintainability, comprehensibility, and correctness. Ensure new or modified definitions will be covered by existing or new unit tests.

Files:

  • monai/apps/utils.py
🪛 Ruff (0.12.2)
monai/apps/utils.py

134-134: Avoid specifying long messages outside the exception class

(TRY003)


136-136: Avoid specifying long messages outside the exception class

(TRY003)


140-140: Avoid specifying long messages outside the exception class

(TRY003)


149-149: Avoid specifying long messages outside the exception class

(TRY003)

🔇 Additional comments (1)
monai/apps/utils.py (1)

14-15: Imports are appropriate.

Adding os and shutil is necessary for the secure extraction logic.


with zipfile.ZipFile(zip_path, 'w') as zf:
# Try to use absolute path
zf.writestr("/etc/passwd", "malicious content")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These tests all look good, but please change this to be something other than "/etc/passwd" just in case someone somehow overwrites that file. Perhaps use "/etc/passwd_bad" here and elsewhere instead.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

@ericspod
Copy link
Member

This looks good now, just run ./runtests.sh --autofix from the root MONAI directory to fix the failing test and we should be good to go.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
monai/apps/utils.py (1)

135-139: Standardize messages to include “unsafe path”; silence TRY003 locally.

Aligns with tests and prior feedback; avoids linter noise.

-    if hasattr(member, "issym") and member.issym():
-        raise ValueError(f"Symbolic link detected in archive: {member_path}")
-    if hasattr(member, "islnk") and member.islnk():
-        raise ValueError(f"Hard link detected in archive: {member_path}")
+    if hasattr(member, "issym") and member.issym():
+        raise ValueError(f"Unsafe path: symlink {member_path}")  # noqa: TRY003
+    if hasattr(member, "islnk") and member.islnk():
+        raise ValueError(f"Unsafe path: hardlink {member_path}")  # noqa: TRY003
🧹 Nitpick comments (6)
monai/apps/utils.py (4)

142-144: Unify absolute/parent check message and quiet TRY003.

Keep the check; just normalize the wording.

-    if os.path.isabs(member_path) or ".." in member_path.split(os.sep):
-        raise ValueError(f"Unsafe path detected in archive: {member_path}")
+    if os.path.isabs(member_path) or ".." in member_path.split(os.sep):
+        raise ValueError(f"Unsafe path: {member_path}")  # noqa: TRY003

125-131: Docstring + typing: clarify contract of the helper.

Add types and Google-style sections per repo guidelines.

-def safe_extract_member(member, extract_to):
-    """Securely verify compressed package member paths to prevent path traversal attacks"""
+def safe_extract_member(member: Any, extract_to: PathLike) -> str:
+    """
+    Securely validate an archive member's target path and return a safe path within `extract_to`.
+
+    Args:
+        member: zipfile.ZipInfo or tarfile.TarInfo describing the entry.
+        extract_to: Destination directory for extraction.
+
+    Returns:
+        The validated filesystem path for this member within `extract_to`.
+
+    Raises:
+        ValueError: If the member is a symlink/hardlink or resolves outside `extract_to`.
+    """

322-332: ZIP extraction looks good; consider a lightweight zip-bomb guard.

Optional: cap total uncompressed bytes per archive.

-        with zipfile.ZipFile(filepath, "r") as zip_file:
-            for member in zip_file.infolist():
+        with zipfile.ZipFile(filepath, "r") as zip_file:
+            total_uncompressed = 0
+            for member in zip_file.infolist():
                 safe_path = safe_extract_member(member, output_dir)
                 if member.is_dir():
                     continue
-
+                total_uncompressed += getattr(member, "file_size", 0)
+                if total_uncompressed > MAX_TOTAL_UNCOMPRESSED:  # define at module level
+                    raise ValueError("Unsafe path: archive too large")  # noqa: TRY003
                 os.makedirs(os.path.dirname(safe_path), exist_ok=True)
                 with zip_file.open(member) as source:
                     with open(safe_path, "wb") as target:
                         shutil.copyfileobj(source, target)

Add near the top of this module (outside this hunk):

# Max total uncompressed bytes allowed when extracting archives (defense-in-depth)
MAX_TOTAL_UNCOMPRESSED = 2 * 1024 * 1024 * 1024  # 2 GiB

334-346: Open tar with auto-detected compression.

Be explicit; improves readability and handles gz/xz transparently.

-        with tarfile.open(filepath, "r") as tar_file:
+        with tarfile.open(filepath, "r:*") as tar_file:
tests/apps/test_download_and_extract.py (2)

74-103: Drop broad exception catch; let the test fail naturally (fixes BLE001).

Catching Exception isn’t needed; if it raises, the test already fails.

-            # This should not raise any exception
-            try:
-                extractall(str(zip_path), str(extract_dir))
-
-                # Verify files were extracted correctly
-                self.assertTrue((extract_dir / "normal_file.txt").exists())
-                self.assertTrue((extract_dir / "subdir" / "nested_file.txt").exists())
-                self.assertTrue((extract_dir / "another_file.json").exists())
-
-                # Verify content
-                with open(extract_dir / "normal_file.txt") as f:
-                    self.assertEqual(f.read(), "This is a normal file")
-
-            except Exception as e:
-                self.fail(f"Valid zip extraction should not raise exception: {e}")
+            extractall(str(zip_path), str(extract_dir))
+            # Verify files were extracted correctly
+            self.assertTrue((extract_dir / "normal_file.txt").exists())
+            self.assertTrue((extract_dir / "subdir" / "nested_file.txt").exists())
+            self.assertTrue((extract_dir / "another_file.json").exists())
+            # Verify content
+            with open(extract_dir / "normal_file.txt") as f:
+                self.assertEqual(f.read(), "This is a normal file")

124-157: Same here: remove blanket catch in the TAR happy-path test.

Keeps the test simpler and fixes BLE001.

-            # This should not raise any exception
-            try:
-                extractall(str(tar_path), str(extract_dir))
-
-                # Verify files were extracted correctly
-                self.assertTrue((extract_dir / "normal_file.txt").exists())
-                self.assertTrue((extract_dir / "subdir" / "nested_file.txt").exists())
-
-                # Verify content
-                with open(extract_dir / "normal_file.txt") as f:
-                    self.assertEqual(f.read(), "This is a normal file")
-
-            except Exception as e:
-                self.fail(f"Valid tar extraction should not raise exception: {e}")
+            extractall(str(tar_path), str(extract_dir))
+            # Verify files were extracted correctly
+            self.assertTrue((extract_dir / "normal_file.txt").exists())
+            self.assertTrue((extract_dir / "subdir" / "nested_file.txt").exists())
+            # Verify content
+            with open(extract_dir / "normal_file.txt") as f:
+                self.assertEqual(f.read(), "This is a normal file")
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting

📥 Commits

Reviewing files that changed from the base of the PR and between d07a269 and e204524.

📒 Files selected for processing (2)
  • monai/apps/utils.py (2 hunks)
  • tests/apps/test_download_and_extract.py (2 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

⚙️ CodeRabbit configuration file

Review the Python code for quality and correctness. Ensure variable names adhere to PEP8 style guides, are sensible and informative in regards to their function, though permitting simple names for loop and comprehension variables. Ensure routine names are meaningful in regards to their function and use verbs, adjectives, and nouns in a semantically appropriate way. Docstrings should be present for all definition which describe each variable, return value, and raised exception in the appropriate section of the Google-style of docstrings. Examine code for logical error or inconsistencies, and suggest what may be changed to addressed these. Suggest any enhancements for code improving efficiency, maintainability, comprehensibility, and correctness. Ensure new or modified definitions will be covered by existing or new unit tests.

Files:

  • tests/apps/test_download_and_extract.py
  • monai/apps/utils.py
🪛 Ruff (0.12.2)
tests/apps/test_download_and_extract.py

101-101: Do not catch blind exception: Exception

(BLE001)


155-155: Do not catch blind exception: Exception

(BLE001)

monai/apps/utils.py

136-136: Avoid specifying long messages outside the exception class

(TRY003)


138-138: Avoid specifying long messages outside the exception class

(TRY003)


143-143: Avoid specifying long messages outside the exception class

(TRY003)


152-152: Avoid specifying long messages outside the exception class

(TRY003)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (19)
  • GitHub Check: min-dep-py3 (3.10)
  • GitHub Check: min-dep-pytorch (2.6.0)
  • GitHub Check: min-dep-pytorch (2.5.1)
  • GitHub Check: min-dep-pytorch (2.7.1)
  • GitHub Check: min-dep-py3 (3.12)
  • GitHub Check: min-dep-pytorch (2.8.0)
  • GitHub Check: min-dep-py3 (3.11)
  • GitHub Check: min-dep-py3 (3.9)
  • GitHub Check: min-dep-os (macOS-latest)
  • GitHub Check: min-dep-os (windows-latest)
  • GitHub Check: min-dep-os (ubuntu-latest)
  • GitHub Check: quick-py3 (macOS-latest)
  • GitHub Check: flake8-py3 (mypy)
  • GitHub Check: quick-py3 (ubuntu-latest)
  • GitHub Check: flake8-py3 (codeformat)
  • GitHub Check: flake8-py3 (pytype)
  • GitHub Check: packaging
  • GitHub Check: quick-py3 (windows-latest)
  • GitHub Check: build-docs
🔇 Additional comments (3)
tests/apps/test_download_and_extract.py (3)

181-198: Good change using “/etc/passwd_bad”.

Safer test target; aligns with review feedback.


199-224: Symlink protection test LGTM.

Covers the critical case; message assertion is flexible.


225-250: Hardlink protection test LGTM.

Completes coverage with links.

h3rrr and others added 2 commits September 13, 2025 12:37
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Signed-off-by: h3rrr <[email protected]>
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (2)
monai/apps/utils.py (2)

135-139: Standardize messages and detect ZIP symlinks (tests currently fail).

Messages must include “Unsafe path:” and explicit link type; also detect symlinks in ZIP via external_attr. This aligns with tests and hardens ZIP handling.

-    if hasattr(member, "issym") and member.issym():
-        raise ValueError(f"Symbolic link detected in archive: {member_path}")
-    if hasattr(member, "islnk") and member.islnk():
-        raise ValueError(f"Hard link detected in archive: {member_path}")
+    # tar: symbolic/hard links
+    if hasattr(member, "issym") and member.issym():
+        raise ValueError(f"Unsafe path: symlink {member_path}")  # noqa: TRY003
+    if hasattr(member, "islnk") and member.islnk():
+        raise ValueError(f"Unsafe path: hardlink {member_path}")  # noqa: TRY003
+    # zip: symlink via external attributes (UNIX)
+    if hasattr(member, "external_attr"):
+        mode = (member.external_attr >> 16) & 0xFFFF
+        try:
+            import stat  # local import to avoid top-level dependency where not needed
+            if stat.S_ISLNK(mode):
+                raise ValueError(f"Unsafe path: symlink {member_path}")  # noqa: TRY003
+        except Exception:
+            pass

142-144: Unify path-traversal error text to match tests.

Use the same “Unsafe path: path traversal …” message for both checks.

-    if os.path.isabs(member_path) or ".." in member_path.split(os.sep):
-        raise ValueError(f"Unsafe path detected in archive: {member_path}")
+    if os.path.isabs(member_path) or ".." in member_path.split(os.sep):
+        raise ValueError(f"Unsafe path: path traversal {member_path}")  # noqa: TRY003
@@
-    if os.path.commonpath([extract_root, target_real]) != extract_root:
-        raise ValueError(f"Unsafe path: path traversal {member_path}")
+    if os.path.commonpath([extract_root, target_real]) != extract_root:
+        raise ValueError(f"Unsafe path: path traversal {member_path}")  # noqa: TRY003

Also applies to: 151-152

🧹 Nitpick comments (4)
monai/apps/utils.py (4)

154-154: Return the canonical path.

Return target_real to reflect the resolved path you validated.

-    return full_path
+    return target_real

125-127: Docstring: expand to Google style with Args/Returns/Raises.

Improves maintainability and meets repo guidelines.

-def safe_extract_member(member, extract_to):
-    """Securely verify compressed package member paths to prevent path traversal attacks"""
+def safe_extract_member(member, extract_to):
+    """Resolve and validate a member's extraction path to prevent path traversal.
+
+    Args:
+        member: A zipfile.ZipInfo or tarfile.TarInfo-like object describing an archive entry.
+        extract_to: Destination root directory for extraction.
+
+    Returns:
+        str: Canonical absolute path within ``extract_to`` where the member should be written.
+
+    Raises:
+        ValueError: If the member is a symlink/hardlink, has an absolute path, or escapes ``extract_to``.
+    """

321-331: Mitigate TOCTOU: use a safe, exclusive writer (no symlink follow) for files.

Reduces symlink race risk on the leaf path and prevents clobbering existing files.

-                with zip_file.open(member) as source:
-                    with open(safe_path, "wb") as target:
-                        shutil.copyfileobj(source, target)
+                with zip_file.open(member) as source:
+                    with _open_for_write_safely(safe_path) as target:
+                        shutil.copyfileobj(source, target)

Add this helper once at module scope:

def _open_for_write_safely(path: str):
    """
    Create a new file exclusively and avoid following final-path symlinks where supported.
    """
    import os
    if os.name == "posix":
        flags = os.O_WRONLY | os.O_CREAT | os.O_EXCL
        if hasattr(os, "O_NOFOLLOW"):
            flags |= os.O_NOFOLLOW
        fd = os.open(path, flags, 0o644)
        return os.fdopen(fd, "wb")
    # Fallback for platforms without O_NOFOLLOW
    return open(path, "xb")

333-345: Same safe writer for TAR; and be explicit with mode.

Use the safe writer; consider r:* for clarity.

-        with tarfile.open(filepath, "r") as tar_file:
+        with tarfile.open(filepath, "r:*") as tar_file:
@@
-                if source is not None:
-                    with source:
-                        with open(safe_path, "wb") as target:
-                            shutil.copyfileobj(source, target)
+                if source is not None:
+                    with source:
+                        with _open_for_write_safely(safe_path) as target:
+                            shutil.copyfileobj(source, target)
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting

📥 Commits

Reviewing files that changed from the base of the PR and between e204524 and e432609.

📒 Files selected for processing (1)
  • monai/apps/utils.py (2 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

⚙️ CodeRabbit configuration file

Review the Python code for quality and correctness. Ensure variable names adhere to PEP8 style guides, are sensible and informative in regards to their function, though permitting simple names for loop and comprehension variables. Ensure routine names are meaningful in regards to their function and use verbs, adjectives, and nouns in a semantically appropriate way. Docstrings should be present for all definition which describe each variable, return value, and raised exception in the appropriate section of the Google-style of docstrings. Examine code for logical error or inconsistencies, and suggest what may be changed to addressed these. Suggest any enhancements for code improving efficiency, maintainability, comprehensibility, and correctness. Ensure new or modified definitions will be covered by existing or new unit tests.

Files:

  • monai/apps/utils.py
🪛 Ruff (0.12.2)
monai/apps/utils.py

136-136: Avoid specifying long messages outside the exception class

(TRY003)


138-138: Avoid specifying long messages outside the exception class

(TRY003)


143-143: Avoid specifying long messages outside the exception class

(TRY003)


152-152: Avoid specifying long messages outside the exception class

(TRY003)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (19)
  • GitHub Check: min-dep-os (ubuntu-latest)
  • GitHub Check: min-dep-pytorch (2.6.0)
  • GitHub Check: min-dep-pytorch (2.8.0)
  • GitHub Check: min-dep-py3 (3.10)
  • GitHub Check: min-dep-py3 (3.12)
  • GitHub Check: min-dep-py3 (3.11)
  • GitHub Check: min-dep-pytorch (2.7.1)
  • GitHub Check: min-dep-pytorch (2.5.1)
  • GitHub Check: min-dep-py3 (3.9)
  • GitHub Check: min-dep-os (windows-latest)
  • GitHub Check: min-dep-os (macOS-latest)
  • GitHub Check: quick-py3 (windows-latest)
  • GitHub Check: flake8-py3 (pytype)
  • GitHub Check: packaging
  • GitHub Check: quick-py3 (ubuntu-latest)
  • GitHub Check: build-docs
  • GitHub Check: flake8-py3 (mypy)
  • GitHub Check: quick-py3 (macOS-latest)
  • GitHub Check: flake8-py3 (codeformat)

Signed-off-by: h3rrr <[email protected]>
Signed-off-by: h3rrr <[email protected]>
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

♻️ Duplicate comments (2)
monai/apps/utils.py (2)

154-155: Return the canonical path resolved via realpath.

Prevents later consumers from following symlinked parents.

-    return full_path
+    return target_real

135-154: Unify error messages to "Unsafe path" + link type; silence TRY003.

Matches test expectations and quiets Ruff.

-    if hasattr(member, "issym") and member.issym():
-        raise ValueError(f"Symbolic link detected in archive: {member_path}")
-    if hasattr(member, "islnk") and member.islnk():
-        raise ValueError(f"Hard link detected in archive: {member_path}")
+    if hasattr(member, "issym") and member.issym():
+        raise ValueError(f"Unsafe path: symlink {member_path}")  # noqa: TRY003
+    if hasattr(member, "islnk") and member.islnk():
+        raise ValueError(f"Unsafe path: hardlink {member_path}")  # noqa: TRY003
@@
-    if os.path.isabs(member_path) or ".." in member_path.split(os.sep):
-        raise ValueError(f"Unsafe path detected in archive: {member_path}")
+    if os.path.isabs(member_path) or ".." in member_path.split(os.sep):
+        raise ValueError(f"Unsafe path: path traversal {member_path}")  # noqa: TRY003
@@
-    if os.path.commonpath([extract_root, target_real]) != extract_root:
-        raise ValueError(f"Unsafe path: path traversal {member_path}")
+    if os.path.commonpath([extract_root, target_real]) != extract_root:
+        raise ValueError(f"Unsafe path: path traversal {member_path}")  # noqa: TRY003
🧹 Nitpick comments (1)
monai/apps/utils.py (1)

125-132: Add type hints and a proper Google-style docstring.

Improves clarity and meets guidelines.

-def safe_extract_member(member, extract_to):
-    """Securely verify compressed package member paths to prevent path traversal attacks"""
+def safe_extract_member(member: Any, extract_to: PathLike) -> str:
+    """Validate an archive member and return a safe, canonical extraction path.
+
+    Args:
+        member: ZipInfo or TarInfo-like object for an archive entry.
+        extract_to: Target directory into which the entry should be extracted.
+
+    Returns:
+        Absolute, canonical file path under ``extract_to`` that is safe to write.
+
+    Raises:
+        ValueError: If the entry is a symlink/hardlink or attempts path traversal.
+    """
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting

📥 Commits

Reviewing files that changed from the base of the PR and between e432609 and fcfa561.

📒 Files selected for processing (1)
  • monai/apps/utils.py (2 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

⚙️ CodeRabbit configuration file

Review the Python code for quality and correctness. Ensure variable names adhere to PEP8 style guides, are sensible and informative in regards to their function, though permitting simple names for loop and comprehension variables. Ensure routine names are meaningful in regards to their function and use verbs, adjectives, and nouns in a semantically appropriate way. Docstrings should be present for all definition which describe each variable, return value, and raised exception in the appropriate section of the Google-style of docstrings. Examine code for logical error or inconsistencies, and suggest what may be changed to addressed these. Suggest any enhancements for code improving efficiency, maintainability, comprehensibility, and correctness. Ensure new or modified definitions will be covered by existing or new unit tests.

Files:

  • monai/apps/utils.py
🪛 Ruff (0.12.2)
monai/apps/utils.py

136-136: Avoid specifying long messages outside the exception class

(TRY003)


138-138: Avoid specifying long messages outside the exception class

(TRY003)


143-143: Avoid specifying long messages outside the exception class

(TRY003)


152-152: Avoid specifying long messages outside the exception class

(TRY003)

🔇 Additional comments (1)
monai/apps/utils.py (1)

334-334: tar.gz extraction broken: use tarfile.open(..., 'r:*').

Mode "r" won’t read gzip/bzip2/xz tars. Use auto-detect.

-        with tarfile.open(filepath, "r") as tar_file:
+        with tarfile.open(filepath, "r:*") as tar_file:

Likely an incorrect or invalid review comment.

Signed-off-by: h3rrr <[email protected]>
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (5)
monai/apps/utils.py (5)

135-139: Zip symlink bypass; unify messages with tests; add noqa.

issym/islnk only cover tar; zip symlinks (external_attr) slip through. Also align error text to “Unsafe path: symlink/hardlink …”.

-    if hasattr(member, "issym") and member.issym():
-        raise ValueError(f"Symbolic link detected in archive: {member_path}")
-    if hasattr(member, "islnk") and member.islnk():
-        raise ValueError(f"Hard link detected in archive: {member_path}")
+    # Reject symlinks and hardlinks (tar)
+    if hasattr(member, "issym") and member.issym():
+        raise ValueError(f"Unsafe path: symlink {member_path}")  # noqa: TRY003
+    if hasattr(member, "islnk") and member.islnk():
+        raise ValueError(f"Unsafe path: hardlink {member_path}")  # noqa: TRY003
+    # Reject symlinks in ZIP (mode is encoded in external_attr)
+    if hasattr(member, "external_attr"):
+        mode = (member.external_attr >> 16)
+        if stat.S_ISLNK(mode):
+            raise ValueError(f"Unsafe path: symlink {member_path}")  # noqa: TRY003

Add import:

# top of file
import stat

142-144: Be explicit: absolute vs traversal; add noqa.

Improves testability and lints.

-    if os.path.isabs(member_path) or ".." in member_path.split(os.sep):
-        raise ValueError(f"Unsafe path detected in archive: {member_path}")
+    if os.path.isabs(member_path):
+        raise ValueError(f"Unsafe path: absolute path {member_path}")  # noqa: TRY003
+    if ".." in member_path.split(os.sep):
+        raise ValueError(f"Unsafe path: path traversal {member_path}")  # noqa: TRY003

277-287: ZIP: create dirs, preserve permissions, and use canonical path.

Restores parity with extractall and avoids losing +x; also handles explicit dir entries.

     with zipfile.ZipFile(filepath, "r") as zip_file:
         for member in zip_file.infolist():
             safe_path = safe_extract_member(member, output_dir)
-            if member.is_dir():
-                continue
-            os.makedirs(os.path.dirname(safe_path), exist_ok=True)
+            if member.is_dir():
+                os.makedirs(safe_path, exist_ok=True)
+                perm = (member.external_attr >> 16) & 0o777
+                if perm:
+                    try:
+                        os.chmod(safe_path, perm)
+                    except OSError:
+                        pass
+                continue
+            os.makedirs(os.path.dirname(safe_path), exist_ok=True)
             with zip_file.open(member) as source:
                 with open(safe_path, "wb") as target:
                     shutil.copyfileobj(source, target)
+            # Preserve file permissions
+            perm = (member.external_attr >> 16) & 0o777
+            if perm:
+                try:
+                    os.chmod(safe_path, perm)
+                except OSError:
+                    pass

289-301: TAR: handle directory entries and preserve permissions.

Matches tarfile.extractall behavior for modes.

     with tarfile.open(filepath, "r") as tar_file:
         for member in tar_file.getmembers():
             safe_path = safe_extract_member(member, output_dir)
-            if not member.isfile():
-                continue
+            if member.isdir():
+                os.makedirs(safe_path, exist_ok=True)
+                perm = member.mode & 0o777
+                if perm:
+                    try:
+                        os.chmod(safe_path, perm)
+                    except OSError:
+                        pass
+                continue
+            if not member.isfile():
+                continue
             os.makedirs(os.path.dirname(safe_path), exist_ok=True)
             source = tar_file.extractfile(member)
             if source is not None:
                 with source:
                     with open(safe_path, "wb") as target:
                         shutil.copyfileobj(source, target)
+                # Preserve file permissions
+                perm = member.mode & 0o777
+                if perm:
+                    try:
+                        os.chmod(safe_path, perm)
+                    except OSError:
+                        pass

154-154: Return canonical path to reduce TOCTOU risk.

Use the resolved path for subsequent file ops.

-    return full_path
+    return target_real

Note: consider using os.open(..., O_NOFOLLOW) on POSIX for the final write to further mitigate races.

🧹 Nitpick comments (2)
monai/apps/utils.py (2)

125-132: Docstring: specify args/returns/raises (Google-style).

Clarify behavior and exceptions raised.

-def safe_extract_member(member, extract_to):
-    """Securely verify compressed package member paths to prevent path traversal attacks"""
+def safe_extract_member(member, extract_to):
+    """Validate an archive member path and return a safe, canonical destination path.
+
+    Args:
+        member: Archive entry (zipfile.ZipInfo or tarfile.TarInfo).
+        extract_to: Root directory under which to extract.
+
+    Returns:
+        str: Canonical absolute path (symlinks resolved) to write the member.
+
+    Raises:
+        ValueError: If the member path is unsafe (absolute path, traversal, symlink, hardlink).
+    """

148-153: Lint and consistency.

Keep the second containment check; add noqa to satisfy Ruff TRY003.

-    if os.path.commonpath([extract_root, target_real]) != extract_root:
-        raise ValueError(f"Unsafe path: path traversal {member_path}")
+    if os.path.commonpath([extract_root, target_real]) != extract_root:
+        raise ValueError(f"Unsafe path: path traversal {member_path}")  # noqa: TRY003
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting

📥 Commits

Reviewing files that changed from the base of the PR and between fcfa561 and 98e5a0e.

📒 Files selected for processing (1)
  • monai/apps/utils.py (3 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

⚙️ CodeRabbit configuration file

Review the Python code for quality and correctness. Ensure variable names adhere to PEP8 style guides, are sensible and informative in regards to their function, though permitting simple names for loop and comprehension variables. Ensure routine names are meaningful in regards to their function and use verbs, adjectives, and nouns in a semantically appropriate way. Docstrings should be present for all definition which describe each variable, return value, and raised exception in the appropriate section of the Google-style of docstrings. Examine code for logical error or inconsistencies, and suggest what may be changed to addressed these. Suggest any enhancements for code improving efficiency, maintainability, comprehensibility, and correctness. Ensure new or modified definitions will be covered by existing or new unit tests.

Files:

  • monai/apps/utils.py
🪛 Ruff (0.12.2)
monai/apps/utils.py

136-136: Avoid specifying long messages outside the exception class

(TRY003)


138-138: Avoid specifying long messages outside the exception class

(TRY003)


143-143: Avoid specifying long messages outside the exception class

(TRY003)


152-152: Avoid specifying long messages outside the exception class

(TRY003)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (19)
  • GitHub Check: flake8-py3 (mypy)
  • GitHub Check: build-docs
  • GitHub Check: flake8-py3 (codeformat)
  • GitHub Check: quick-py3 (ubuntu-latest)
  • GitHub Check: packaging
  • GitHub Check: quick-py3 (windows-latest)
  • GitHub Check: flake8-py3 (pytype)
  • GitHub Check: quick-py3 (macOS-latest)
  • GitHub Check: min-dep-pytorch (2.6.0)
  • GitHub Check: min-dep-pytorch (2.8.0)
  • GitHub Check: min-dep-os (windows-latest)
  • GitHub Check: min-dep-pytorch (2.7.1)
  • GitHub Check: min-dep-pytorch (2.5.1)
  • GitHub Check: min-dep-os (macOS-latest)
  • GitHub Check: min-dep-py3 (3.12)
  • GitHub Check: min-dep-py3 (3.9)
  • GitHub Check: min-dep-py3 (3.10)
  • GitHub Check: min-dep-os (ubuntu-latest)
  • GitHub Check: min-dep-py3 (3.11)
🔇 Additional comments (1)
monai/apps/utils.py (1)

347-352: LGTM: safer per-member extraction dispatch.

Type routing reads cleanly and keeps public API stable.

Comment on lines +125 to +155
def safe_extract_member(member, extract_to):
"""Securely verify compressed package member paths to prevent path traversal attacks"""
# Get member path (handle different compression formats)
if hasattr(member, "filename"):
member_path = member.filename # zipfile
elif hasattr(member, "name"):
member_path = member.name # tarfile
else:
member_path = str(member)

if hasattr(member, "issym") and member.issym():
raise ValueError(f"Symbolic link detected in archive: {member_path}")
if hasattr(member, "islnk") and member.islnk():
raise ValueError(f"Hard link detected in archive: {member_path}")

member_path = os.path.normpath(member_path)

if os.path.isabs(member_path) or ".." in member_path.split(os.sep):
raise ValueError(f"Unsafe path detected in archive: {member_path}")

full_path = os.path.join(extract_to, member_path)
full_path = os.path.normpath(full_path)

extract_root = os.path.realpath(extract_to)
target_real = os.path.realpath(full_path)
# Ensure the resolved path stays within the extraction root
if os.path.commonpath([extract_root, target_real]) != extract_root:
raise ValueError(f"Unsafe path: path traversal {member_path}")

return full_path

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Verification agent

🧩 Analysis chain

Quick check: lints and tests.

After applying, run Ruff and the targeted tests.

Also applies to: 277-301


🏁 Script executed:

#!/bin/bash
set -euo pipefail
# Lint the changed file
python -m pip install ruff pytest
ruff check monai/apps/utils.py

# Run only the new extraction tests
pytest -q tests/apps/test_download_and_extract.py -k "PathTraversalProtection or extract"

Length of output: 14852


Fix Ruff errors in monai/apps/utils.py and re-run the targeted tests

  • Ruff failed: 21 issues (1 auto-fixable); ruff exited before pytest ran.
  • High-priority fixes:
    • RUF022: all not sorted (line 45).
    • S310: unsafe URL opens (urlretrieve / urlopen) — lines 115, 119, 247.
    • BLE001: broad except Exception in check_hash (line 180).
    • TRY400 / TRY201: use logging.exception and avoid raise e (lines 121–122, 181).
    • B028: add stacklevel to warnings.warn (line 118).
    • Multiple TRY003 / TRY002: long/generic exception messages and generic Exception re-raises — shorten messages or use specific exception types (lines 136–152, 232, 242, 260, 271, 342, 378, 420).
  • Action: run ruff --fix where appropriate, address the security/broad-except issues, then re-run:
    ruff check monai/apps/utils.py && pytest -q tests/apps/test_download_and_extract.py -k "PathTraversalProtection or extract"
🧰 Tools
🪛 Ruff (0.12.2)

136-136: Avoid specifying long messages outside the exception class

(TRY003)


138-138: Avoid specifying long messages outside the exception class

(TRY003)


143-143: Avoid specifying long messages outside the exception class

(TRY003)


152-152: Avoid specifying long messages outside the exception class

(TRY003)

🤖 Prompt for AI Agents
monai/apps/utils.py lines ~45 and 115–420: ruff reported multiple issues — sort
the __all__ export at line ~45 (RUF022); replace unsafe urlretrieve/urlopen
usages at lines ~115, 119, 247 with safer, validated downloads or use requests
with explicit timeouts and URL validation (S310); in check_hash around line 180
narrow the broad "except Exception" to specific exceptions (e.g.,
FileNotFoundError, OSError, hashlib-related errors) and avoid re-raising with
"raise e" (use plain "raise"); replace logging.error + raise with
logging.exception where stack traces are useful (TRY400/TRY201 at ~121–122,
~181); add stacklevel to warnings.warn calls at ~118 (B028); shorten long
generic exception messages and replace generic excepts with specific exception
types at the other listed locations (~136–152, 232, 242, 260, 271, 342, 378,
420) to provide concise messages and proper exception handling. After changes
run ruff --fix for auto-fixable issues and then run: ruff check
monai/apps/utils.py && pytest -q tests/apps/test_download_and_extract.py -k
"PathTraversalProtection or extract" to verify.

@KumoLiu
Copy link
Contributor

KumoLiu commented Sep 17, 2025

/build

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants