Skip to content

Conversation

@alvin-r
Copy link
Contributor

@alvin-r alvin-r commented Apr 17, 2025

PR Type

  • Enhancement
  • Tests

Description

  • Add unused definitions removal in code context extraction.

  • Introduce new CST-based module for cleaning unused code.

  • Update context extractor to filter out dead code.

  • Add extensive tests for removal functionality.


Changes walkthrough 📝

Relevant files
Enhancement
import_test.py
Add test function to call main.fetch_and_transform_data. 

code_to_optimize/code_directories/retriever/import_test.py

  • Added import from main module.
  • Added function_to_optimize wrapper.
  • +5/-0     
    code_context_extractor.py
    Integrate unused definitions removal into context extraction.

    codeflash/context/code_context_extractor.py

  • Imported remove_unused_definitions_by_function_names.
  • Integrated removal call in string context functions.
  • Cleaned up code context extraction logic.
  • +9/-1     
    unused_definition_remover.py
    Implement unused definition removal using libcst.               

    codeflash/context/unused_definition_remover.py

  • New module implementing CST-based removal.
  • Added functions to extract names and definitions.
  • Implemented dependency collection and recursive removal.
  • Provided debugging print utility.
  • +476/-0 
    Tests
    test_code_context_extractor.py
    Update tests for code context extraction behavior.             

    tests/test_code_context_extractor.py

  • Updated expected context outputs in tests.
  • Added new test for direct module import.
  • Added new test for comfy module import.
  • +419/-25
    test_remove_unused_definitions.py
    Add comprehensive tests for unused definitions removal.   

    tests/test_remove_unused_definitions.py

  • Added tests for variable removal.
  • Tested class variable handling and type annotations.
  • Verified complex dependency and control block removals.
  • +424/-0 
    Formatting
    test_code_replacement.py
    Clean up redundant environment config in test.                     

    tests/test_code_replacement.py

  • Removed environment variable configuration.
  • Simplified test code for replacement.
  • +0/-4     

    Need help?
  • Type /help how to ... in the comments thread for any questions about PR-Agent usage.
  • Check out the documentation for more information.
  • @github-actions
    Copy link

    github-actions bot commented Apr 17, 2025

    PR Reviewer Guide 🔍

    (Review updated until commit 218f6bd)

    Here are some key observations to aid the review process:

    ⏱️ Estimated effort to review: 4 🔵🔵🔵🔵⚪
    🧪 PR contains tests
    🔒 No security concerns identified
    ⚡ Recommended focus areas for review

    Performance

    The recursive removal logic in the CST transformer may lead to performance issues or potential stack overflows on very large inputs. It would be beneficial to validate its scalability and consider adding safeguards or optimizations.

    # Skip import statements
    if isinstance(node, (cst.Import, cst.ImportFrom)):
        return node, True
    
    # Never remove function definitions
    if isinstance(node, cst.FunctionDef):
        return node, True
    
    # Never remove class definitions
    if isinstance(node, cst.ClassDef):
        class_name = node.name.value
    
        # Check if any methods or variables in this class are used
        method_or_var_used = False
        class_has_dependencies = False
    
        # Check if class itself is marked as used
        if class_name in definitions and definitions[class_name].used_by_qualified_function:
            class_has_dependencies = True
    
        if hasattr(node, "body") and isinstance(node.body, cst.IndentedBlock):
            updates = {}
            new_statements = []
    
            for statement in node.body.body:
                # Keep all function definitions
                if isinstance(statement, cst.FunctionDef):
                    method_name = f"{class_name}.{statement.name.value}"
                    if method_name in definitions and definitions[method_name].used_by_qualified_function:
                        method_or_var_used = True
                    new_statements.append(statement)
                # Only process variable assignments
                elif isinstance(statement, (cst.Assign, cst.AnnAssign, cst.AugAssign)):
                    var_used = False
    
                    # Check if any variable in this assignment is used
                    if isinstance(statement, cst.Assign):
                        for target in statement.targets:
                            names = extract_names_from_targets(target.target)
                            for name in names:
                                class_var_name = f"{class_name}.{name}"
                                if class_var_name in definitions and definitions[class_var_name].used_by_qualified_function:
                                    var_used = True
                                    method_or_var_used = True
                                    break
                    elif isinstance(statement, (cst.AnnAssign, cst.AugAssign)):
                        names = extract_names_from_targets(statement.target)
                        for name in names:
                            class_var_name = f"{class_name}.{name}"
                            if class_var_name in definitions and definitions[class_var_name].used_by_qualified_function:
                                var_used = True
                                method_or_var_used = True
                                break
    
                    if var_used or class_has_dependencies:
                        new_statements.append(statement)
                else:
                    # Keep all other statements in the class
                    new_statements.append(statement)
    
            # Update the class body
            new_body = node.body.with_changes(body=new_statements)
            updates["body"] = new_body
    
            return node.with_changes(**updates), True
    
        return node, method_or_var_used or class_has_dependencies
    
    # Handle assignments (Assign and AnnAssign)
    if isinstance(node, cst.Assign):
        for target in node.targets:
            names = extract_names_from_targets(target.target)
            for name in names:
                if name in definitions and definitions[name].used_by_qualified_function:
                    return node, True
        return None, False
    
    if isinstance(node, (cst.AnnAssign, cst.AugAssign)):
        names = extract_names_from_targets(node.target)
        for name in names:
            if name in definitions and definitions[name].used_by_qualified_function:
                return node, True
        return None, False
    
    # For other nodes, recursively process children
    section_names = get_section_names(node)
    if not section_names:
        return node, False
    
    updates = {}
    found_used = False
    
    for section in section_names:
        original_content = getattr(node, section, None)
        if isinstance(original_content, (list, tuple)):
            new_children = []
            section_found_used = False
    
            for child in original_content:
                filtered, used = remove_unused_definitions_recursively(child, definitions)
                if filtered:
                    new_children.append(filtered)
                section_found_used |= used
    
            if new_children or section_found_used:
                found_used |= section_found_used
                updates[section] = new_children
        elif original_content is not None:
            filtered, used = remove_unused_definitions_recursively(original_content, definitions)
            found_used |= used
            if filtered:
                updates[section] = filtered
    if not found_used:
        return None, False
    if updates:
        return node.with_changes(**updates), found_used
    Set Operation Validity

    Several new hooks combine qualified function name sets (using union operations) before removing unused definitions. Please ensure that the union of sets correctly captures all intended names and that no necessary definitions are dropped.

                original_code,
                code_context_type,
                qualified_function_names,
                helpers_of_helpers_qualified_names,
                remove_docstrings,
            )
            code_context = remove_unused_definitions_by_function_names(code_context, qualified_function_names | helpers_of_helpers_qualified_names)
        except ValueError as e:
            logger.debug(f"Error while getting read-only code: {e}")
            continue
        if code_context.strip():
            final_code_string_context += f"\n{code_context}"
            final_code_string_context = add_needed_imports_from_module(
                src_module_code=original_code,
                dst_module_code=final_code_string_context,
                src_path=file_path,
                dst_path=file_path,
                project_root=project_root_path,
                helper_functions=list(helpers_of_fto.get(file_path, set()) | helpers_of_helpers.get(file_path, set())),
            )
    if code_context_type == CodeContextType.READ_WRITABLE:
        return CodeString(code=final_code_string_context)
    # Extract code from file paths containing helpers of helpers
    for file_path, helper_function_sources in helpers_of_helpers_no_overlap.items():
        try:
            original_code = file_path.read_text("utf8")
        except Exception as e:
            logger.exception(f"Error while parsing {file_path}: {e}")
            continue
        try:
            qualified_helper_function_names = {func.qualified_name for func in helper_function_sources}
            code_context = parse_code_and_prune_cst(
                original_code, code_context_type, set(), qualified_helper_function_names, remove_docstrings
            )
            code_context = remove_unused_definitions_by_function_names(code_context, qualified_helper_function_names)

    @github-actions
    Copy link

    PR Code Suggestions ✨

    Explore these optional code suggestions:

    CategorySuggestion                                                                                                                                    Impact
    Possible issue
    Fix isinstance union syntax

    Replace the bitwise OR operator with a tuple when checking multiple types in
    isinstance.

    codeflash/context/unused_definition_remover.py [74]

    -if isinstance(node, cst.AnnAssign | cst.AugAssign):
    +if isinstance(node, (cst.AnnAssign, cst.AugAssign)):
    Suggestion importance[1-10]: 9

    __

    Why: The suggestion correctly addresses a bug by replacing the bitwise OR with tuple syntax in the isinstance check, ensuring that both cst.AnnAssign and cst.AugAssign are properly evaluated.

    High
    Correct import isinstance check

    Use tuple syntax in isinstance to correctly check for multiple import types.

    codeflash/context/unused_definition_remover.py [321]

    -if isinstance(node, cst.Import | cst.ImportFrom):
    +if isinstance(node, (cst.Import, cst.ImportFrom)):
    Suggestion importance[1-10]: 9

    __

    Why: This modification fixes the misuse of the bitwise OR operator in an isinstance check for import types, ensuring correct type checking by using a tuple, which is a necessary bug fix.

    High

    @alvin-r alvin-r marked this pull request as ready for review April 18, 2025 19:45
    @github-actions
    Copy link

    Persistent review updated to latest commit 218f6bd

    @github-actions
    Copy link

    PR Code Suggestions ✨

    No code suggestions found for the PR.

    @alvin-r alvin-r requested a review from misrasaurabh1 April 18, 2025 20:27
    @misrasaurabh1 misrasaurabh1 merged commit 761f1ba into main Apr 19, 2025
    16 checks passed
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

    Projects

    None yet

    Development

    Successfully merging this pull request may close these issues.

    3 participants