Skip to content

Conversation

@KRRT7
Copy link
Contributor

@KRRT7 KRRT7 commented May 4, 2025

PR Type

Enhancement, Tests


Description

  • Add SQLite-based test discovery cache

  • Integrate caching into test processing

  • Define cache directory via platformdirs

  • Enhance cleanup for generated and temp files


Changes walkthrough 📝

Relevant files
Enhancement
code_utils.py
Support directory cleanup in cleanup_paths                             

codeflash/code_utils/code_utils.py

  • import shutil for directory handling
  • update cleanup_paths to remove directories recursively
  • +5/-1     
    function_optimizer.py
    Unify cleanup for generated test files                                     

    codeflash/optimization/function_optimizer.py

  • add cleanup_generated_files method
  • use cleanup_paths for cleanup
  • remove manual file deletions
  • +22/-14 
    optimizer.py
    Simplify optimizer cleanup logic                                                 

    codeflash/optimization/optimizer.py

  • import cleanup_paths and get_run_tmp_file
  • remove shutil import and manual cleanup
  • call cleanup_generated_files instead
  • +2/-18   
    Configuration changes
    compat.py
    Configure codeflash cache directory                                           

    codeflash/code_utils/compat.py

  • import user_config_dir from platformdirs
  • add codeflash_cache_dir definition
  • +5/-0     
    Tests
    discover_unit_tests.py
    Implement SQLite-based test discovery cache                           

    codeflash/discovery/discover_unit_tests.py

  • add TestsCache SQLite caching for tests
  • integrate caching in process_test_files
  • compute file hash for cache keys
  • update return types and type hints
  • +149/-67

    Need help?
  • Type /help how to ... in the comments thread for any questions about PR-Agent usage.
  • Check out the documentation for more information.
  • @github-actions
    Copy link

    github-actions bot commented May 4, 2025

    PR Reviewer Guide 🔍

    Here are some key observations to aid the review process:

    ⏱️ Estimated effort to review: 4 🔵🔵🔵🔵⚪
    🧪 No relevant tests
    🔒 No security concerns identified
    ⚡ Recommended focus areas for review

    Cache duplication

    The cache table never clears existing entries for the same file and hash before inserting, leading to duplicate rows on repeated runs.

    class TestsCache:
        def __init__(self) -> None:
            self.connection = sqlite3.connect(codeflash_cache_dir / "tests_cache.db")
            self.cur = self.connection.cursor()
    
            self.cur.execute(
                """
                CREATE TABLE IF NOT EXISTS discovered_tests(
                    file_path TEXT,
                    file_hash TEXT,
                    qualified_name_with_modules_from_root TEXT,
                    function_name TEXT,
                    test_class TEXT,
                    test_function TEXT,
                    test_type TEXT,
                    line_number INTEGER,
                    col_number INTEGER
                )
                """
    MD5 compatibility

    Using hashlib.md5 with the parameter usedforsecurity=False may not be supported in all Python versions and could raise errors.

    def compute_file_hash(path: str) -> str:
        h = hashlib.md5(usedforsecurity=False)
        with Path(path).open("rb") as f:
            while True:
                chunk = f.read(8192)
                if not chunk:
                    break
                h.update(chunk)
        return h.hexdigest()
    SQLite operations

    Database operations are not wrapped in try/except, so any SQLite error could crash the test discovery process.

    def __init__(self) -> None:
        self.connection = sqlite3.connect(codeflash_cache_dir / "tests_cache.db")
        self.cur = self.connection.cursor()
    
        self.cur.execute(
            """
            CREATE TABLE IF NOT EXISTS discovered_tests(
                file_path TEXT,
                file_hash TEXT,
                qualified_name_with_modules_from_root TEXT,
                function_name TEXT,
                test_class TEXT,
                test_function TEXT,
                test_type TEXT,
                line_number INTEGER,
                col_number INTEGER
            )
            """
        )
        self.cur.execute(
            """
            CREATE INDEX IF NOT EXISTS idx_discovered_tests_file_path_hash
            ON discovered_tests (file_path, file_hash)
            """
        )

    @github-actions
    Copy link

    github-actions bot commented May 4, 2025

    PR Code Suggestions ✨

    Explore these optional code suggestions:

    CategorySuggestion                                                                                                                                    Impact
    Security
    Handle symlinks in cleanup_paths

    Avoid following symlinks when deleting directories by explicitly checking for
    symlinks first. This prevents shutil.rmtree from traversing and deleting the target
    of a symlink. Add a branch to unlink symlinks safely.

    codeflash/code_utils/code_utils.py [122-125]

    -if path.is_dir():
    +if path.is_symlink():
    +    path.unlink(missing_ok=True)
    +elif path.is_dir():
         shutil.rmtree(path, ignore_errors=True)
     else:
         path.unlink(missing_ok=True)
    Suggestion importance[1-10]: 7

    __

    Why: The suggestion correctly adds a branch to properly unlink symlinks before deleting directories, improving security by avoiding unintended rmtree traversal.

    Medium
    Possible issue
    Delete old cache rows first

    Prevent duplicate cache entries by clearing any existing rows for the same file and
    hash before inserting new records. This avoids table bloat and ensures cache
    consistency. Execute a DELETE for (file_path, file_hash) at the start of the method.

    codeflash/discovery/discover_unit_tests.py [70-97]

     def insert_test(
         self,
         file_path: str,
         file_hash: str,
         qualified_name_with_modules_from_root: str,
         function_name: str,
         test_class: str,
         test_function: str,
         test_type: TestType,
         line_number: int,
         col_number: int,
     ) -> None:
         test_type_value = test_type.value if hasattr(test_type, "value") else test_type
    +    self.cur.execute(
    +        "DELETE FROM discovered_tests WHERE file_path = ? AND file_hash = ?",
    +        (file_path, file_hash),
    +    )
         self.cur.execute(
             "INSERT INTO discovered_tests VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)",
             (
                 file_path,
                 file_hash,
                 qualified_name_with_modules_from_root,
                 function_name,
                 test_class,
                 test_function,
                 test_type_value,
                 line_number,
                 col_number,
             ),
         )
         self.connection.commit()
    Suggestion importance[1-10]: 6

    __

    Why: Deleting existing rows prevents duplicate entries in the discovered_tests table, ensuring cache consistency and avoiding unbounded growth.

    Low
    General
    Remove unsupported md5 flag

    The usedforsecurity parameter is not supported in older Python versions and can
    cause errors. Drop this argument and use hashlib.md5() directly for compatibility
    across environments. The default MD5 constructor is sufficient here.

    codeflash/discovery/discover_unit_tests.py [118-125]

     @staticmethod
     def compute_file_hash(path: str) -> str:
    -    h = hashlib.md5(usedforsecurity=False)
    +    h = hashlib.md5()
         with Path(path).open("rb") as f:
             while True:
                 chunk = f.read(8192)
                 if not chunk:
                     break
                 h.update(chunk)
         return h.hexdigest()
    Suggestion importance[1-10]: 5

    __

    Why: Dropping the usedforsecurity argument ensures compatibility with older Python versions without affecting the MD5 calculation.

    Low

    @KRRT7 KRRT7 enabled auto-merge May 6, 2025 01:17
    @KRRT7 KRRT7 requested a review from misrasaurabh1 May 6, 2025 01:17
    @codeflash-ai
    Copy link
    Contributor

    codeflash-ai bot commented May 6, 2025

    ⚡️ Codeflash found optimizations for this PR

    📄 17% (0.17x) speedup for ask_for_telemetry in codeflash/cli_cmds/cmd_init.py

    ⏱️ Runtime : 105 microseconds 89.6 microseconds (best of 132 runs)

    I created a new dependent PR with the suggested changes. Please review:

    If you approve, it will be merged into this PR (branch cache_discovered_tests).

    @KRRT7 KRRT7 merged commit 88c671a into main May 7, 2025
    15 checks passed
    @KRRT7 KRRT7 deleted the cache_discovered_tests branch May 13, 2025 22:58
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

    Projects

    None yet

    Development

    Successfully merging this pull request may close these issues.

    4 participants