@codeflash-ai codeflash-ai bot commented Oct 28, 2025

📄 16% (0.16x) speedup for remove_indices in inference/core/workflows/execution_engine/v1/executor/execution_data_manager/step_input_assembler.py

⏱️ Runtime : 277 microseconds → 240 microseconds (best of 410 runs)

📝 Explanation and details

The optimized code achieves a **15% speedup** through three key changes that reduce Python's type-checking overhead:

**1. Type checking optimization:** Replaced `isinstance(value, dict)` and `isinstance(value, list)` with `type(value) is dict` and `type(value) is list`. The `type() is` comparison is faster because it performs exact type matching without walking the inheritance hierarchy, while `isinstance()` must also check for subclasses (a micro-benchmark sketch follows this list).

**2. Control flow restructuring:** Changed from multiple independent `if` statements to an `if-elif-else` chain. This eliminates redundant type checks: once a condition matches, the remaining checks are skipped entirely.

**3. Function call streamlining:** Removed the explicit `value=` keyword argument in recursive calls, which slightly reduces Python's function call overhead.
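
The difference in point 1 is easy to see in isolation. Below is a hypothetical micro-benchmark (not part of the PR) comparing the two checks on a plain dict; `timeit`, the sample value, and the iteration count are all illustrative choices:

```python
# Illustrative timing sketch: `type(x) is dict` is a single identity
# comparison, while isinstance() must also account for subclasses.
import timeit

value = {"a": 1}

t_isinstance = timeit.timeit(lambda: isinstance(value, dict), number=1_000_000)
t_exact_type = timeit.timeit(lambda: type(value) is dict, number=1_000_000)

print(f"isinstance(value, dict): {t_isinstance:.3f} s")
print(f"type(value) is dict:     {t_exact_type:.3f} s")
```

On CPython the exact-type check is usually somewhat faster for plain dicts and lists; the trade-off is that dict/list subclasses no longer take those branches, which the passing test suite suggests is acceptable for this function.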

**Performance characteristics from tests:**

- **Large-scale operations benefit most**: Tests with large lists/dicts of batches show 15-22% improvements (e.g., `test_remove_indices_large_list_of_batches`: 21.3% faster)
- **Complex nested structures**: Nested dict/list traversals see 6-17% gains
- **Simple Batch operations**: Slight slowdown (3-15%) due to the additional `elif` check, but these are typically fast operations anyway

The optimization is most effective when processing large collections with many recursive calls, where the cumulative savings from faster type checking compound significantly. The `isinstance(value, Batch)` check remains unchanged to preserve compatibility with potential Batch subclasses.
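
Putting the three changes together, a hypothetical sketch of the resulting control flow is shown below. This is not the repository source: the `Batch` stub mirrors the one used in the generated tests further down, and the real class lives in the inference workflows package.

```python
from typing import Any, Set


class Batch:
    """Minimal stand-in for the workflows Batch type (illustrative only)."""

    def __init__(self, items):
        self.items = list(items)

    def remove_by_indices(self, indices_to_remove: Set[int]) -> "Batch":
        # Keep only items whose positional index is not scheduled for removal.
        return Batch(item for i, item in enumerate(self.items) if i not in indices_to_remove)

    def __repr__(self):
        return f"Batch({self.items!r})"


def remove_indices(value: Any, indices_to_remove: Set[int]) -> Any:
    if isinstance(value, Batch):
        # isinstance() is kept here so Batch subclasses still match.
        return value.remove_by_indices(indices_to_remove)
    elif type(value) is dict:
        # Exact type check: no subclass walk, and the if-elif chain stops
        # evaluating as soon as one branch matches.
        return {k: remove_indices(v, indices_to_remove) for k, v in value.items()}
    elif type(value) is list:
        # Recursive calls no longer pass the explicit value= keyword argument.
        return [remove_indices(v, indices_to_remove) for v in value]
    else:
        return value


if __name__ == "__main__":
    data = {"detections": [Batch([1, 2, 3]), Batch([4, 5, 6])]}
    print(remove_indices(data, {1}))  # each Batch keeps indices 0 and 2
```

Under these assumptions the sketch prints `{'detections': [Batch([1, 3]), Batch([4, 6])]}`.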

Correctness verification report:

| Test | Status |
|------|--------|
| ⚙️ Existing Unit Tests | 20 Passed |
| 🌀 Generated Regression Tests | 45 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 🔘 None Found |
| 📊 Tests Coverage | 100.0% |
⚙️ Existing Unit Tests and Runtime
| Test File::Test Function | Original ⏱️ | Optimized ⏱️ | Speedup |
|---|---|---|---|
| workflows/unit_tests/execution_engine/executor/execution_data_manager/test_step_input_assembler.py::test_remove_empty_indices | 12.6μs | 12.1μs | 4.30% ✅ |
🌀 Generated Regression Tests and Runtime
from typing import Any, Set

# imports
import pytest  # used for our unit tests
from inference.core.workflows.execution_engine.v1.executor.execution_data_manager.step_input_assembler import \
    remove_indices


# Dummy classes to simulate Batch and DynamicBatchIndex for testing
class DynamicBatchIndex(int):
    pass

class Batch:
    def __init__(self, items):
        self.items = list(items)
    def remove_by_indices(self, indices_to_remove: Set[DynamicBatchIndex]):
        # Remove items at the specified indices
        return Batch([item for idx, item in enumerate(self.items) if idx not in indices_to_remove])
    def __eq__(self, other):
        return isinstance(other, Batch) and self.items == other.items
    def __repr__(self):
        return f"Batch({self.items!r})"

# unit tests

# 1. Basic Test Cases

def test_remove_indices_batch_basic():
    # Remove single index from Batch
    batch = Batch([1, 2, 3, 4])
    codeflash_output = remove_indices(batch, {DynamicBatchIndex(1)}); result = codeflash_output # 514ns -> 693ns (25.8% slower)

def test_remove_indices_batch_multiple_indices():
    # Remove multiple indices from Batch
    batch = Batch([10, 20, 30, 40, 50])
    codeflash_output = remove_indices(batch, {DynamicBatchIndex(0), DynamicBatchIndex(3)}); result = codeflash_output # 468ns -> 486ns (3.70% slower)

def test_remove_indices_batch_no_indices():
    # No indices to remove; should return identical Batch
    batch = Batch([5, 6, 7])
    codeflash_output = remove_indices(batch, set()); result = codeflash_output # 474ns -> 510ns (7.06% slower)

def test_remove_indices_batch_all_indices():
    # Remove all indices; should return empty Batch
    batch = Batch(['a', 'b', 'c'])
    codeflash_output = remove_indices(batch, {DynamicBatchIndex(0), DynamicBatchIndex(1), DynamicBatchIndex(2)}); result = codeflash_output # 438ns -> 493ns (11.2% slower)

def test_remove_indices_dict_with_batch():
    # Remove indices from Batch inside dict
    batch = Batch([1, 2, 3])
    value = {'batch': batch, 'other': 42}
    codeflash_output = remove_indices(value, {DynamicBatchIndex(2)}); result = codeflash_output # 2.01μs -> 1.93μs (4.36% faster)

def test_remove_indices_list_of_batches():
    # Remove indices from each Batch in a list
    batches = [Batch([1, 2]), Batch([3, 4, 5])]
    codeflash_output = remove_indices(batches, {DynamicBatchIndex(1)}); result = codeflash_output # 1.44μs -> 1.38μs (4.51% faster)

def test_remove_indices_list_of_dicts():
    # Remove indices from Batch inside dicts in a list
    value = [{'batch': Batch([10, 20, 30])}, {'batch': Batch([40, 50])}]
    codeflash_output = remove_indices(value, {DynamicBatchIndex(0)}); result = codeflash_output # 2.53μs -> 2.41μs (4.98% faster)

# 2. Edge Test Cases

def test_remove_indices_empty_batch():
    # Remove indices from empty Batch
    batch = Batch([])
    codeflash_output = remove_indices(batch, {DynamicBatchIndex(0)}); result = codeflash_output # 431ns -> 475ns (9.26% slower)

def test_remove_indices_empty_list():
    # Remove indices from empty list
    value = []
    codeflash_output = remove_indices(value, {DynamicBatchIndex(0)}); result = codeflash_output # 776ns -> 762ns (1.84% faster)

def test_remove_indices_empty_dict():
    # Remove indices from empty dict
    value = {}
    codeflash_output = remove_indices(value, {DynamicBatchIndex(0)}); result = codeflash_output # 899ns -> 924ns (2.71% slower)

def test_remove_indices_indices_out_of_range():
    # Indices to remove are out of range
    batch = Batch([1, 2])
    codeflash_output = remove_indices(batch, {DynamicBatchIndex(10)}); result = codeflash_output # 436ns -> 498ns (12.4% slower)

def test_remove_indices_non_batch_non_collection():
    # Value is not batch, list, or dict; should return unchanged
    value = 123
    codeflash_output = remove_indices(value, {DynamicBatchIndex(0)}); result = codeflash_output # 542ns -> 609ns (11.0% slower)

def test_remove_indices_nested_dict_list_batch():
    # Deeply nested structure
    value = {
        'level1': [
            {'level2': Batch([1, 2, 3])},
            {'level2': Batch([4, 5, 6])}
        ]
    }
    codeflash_output = remove_indices(value, {DynamicBatchIndex(1)}); result = codeflash_output # 3.27μs -> 3.05μs (7.27% faster)

def test_remove_indices_non_integer_indices():
    # Indices are not integers (should not raise, but nothing is removed)
    batch = Batch([1, 2, 3])
    codeflash_output = remove_indices(batch, {'a', 'b'}); result = codeflash_output # 456ns -> 515ns (11.5% slower)

def test_remove_indices_batch_with_duplicate_indices():
    # Indices set contains duplicates (should behave as set)
    batch = Batch([1, 2, 3, 4])
    codeflash_output = remove_indices(batch, {DynamicBatchIndex(1), DynamicBatchIndex(1), DynamicBatchIndex(3)}); result = codeflash_output # 439ns -> 491ns (10.6% slower)

def test_remove_indices_batch_with_negative_indices():
    # Negative indices should not remove anything (since Batch expects non-negative)
    batch = Batch([1, 2, 3])
    codeflash_output = remove_indices(batch, {DynamicBatchIndex(-1)}); result = codeflash_output # 445ns -> 505ns (11.9% slower)

# 3. Large Scale Test Cases

def test_remove_indices_large_batch():
    # Remove indices from a large Batch
    size = 1000
    batch = Batch(list(range(size)))
    indices_to_remove = set(DynamicBatchIndex(i) for i in range(0, size, 2))  # Remove even indices
    codeflash_output = remove_indices(batch, indices_to_remove); result = codeflash_output # 495ns -> 512ns (3.32% slower)
    expected = Batch([i for i in range(size) if i % 2 != 0])

def test_remove_indices_large_list_of_batches():
    # Remove indices from a large list of Batches
    batches = [Batch([i, i+1, i+2]) for i in range(0, 1000, 3)]
    indices_to_remove = {DynamicBatchIndex(1)}
    codeflash_output = remove_indices(batches, indices_to_remove); result = codeflash_output # 39.5μs -> 32.5μs (21.3% faster)
    expected = [Batch([i, i+2]) for i in range(0, 1000, 3)]

def test_remove_indices_large_nested_structure():
    # Remove indices from a large nested dict/list structure
    value = {
        'group1': [Batch(list(range(10))) for _ in range(50)],
        'group2': [Batch(list(range(20))) for _ in range(50)]
    }
    indices_to_remove = {DynamicBatchIndex(0), DynamicBatchIndex(5)}
    codeflash_output = remove_indices(value, indices_to_remove); result = codeflash_output # 14.5μs -> 12.3μs (17.4% faster)
    # Check a few samples for correctness
    for b in result['group1']:
        pass
    for b in result['group2']:
        pass

def test_remove_indices_large_dict():
    # Remove indices from batches in a large dict
    value = {f'key_{i}': Batch([i, i+1, i+2, i+3]) for i in range(0, 1000, 4)}
    indices_to_remove = {DynamicBatchIndex(2)}
    codeflash_output = remove_indices(value, indices_to_remove); result = codeflash_output # 38.0μs -> 32.9μs (15.3% faster)
    for k, v in result.items():
        orig = int(k.split('_')[1])

def test_remove_indices_large_batch_remove_none():
    # Large batch, remove no indices
    size = 1000
    batch = Batch(list(range(size)))
    codeflash_output = remove_indices(batch, set()); result = codeflash_output # 502ns -> 576ns (12.8% slower)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
#------------------------------------------------
import pytest  # used for our unit tests
from inference.core.workflows.execution_engine.v1.executor.execution_data_manager.step_input_assembler import \
    remove_indices


# Minimal stubs for Batch and DynamicBatchIndex to allow testing
class DynamicBatchIndex(int):
    pass

class Batch:
    def __init__(self, items):
        self.items = list(items)

    def remove_by_indices(self, indices_to_remove):
        # Drop items whose positional index is in indices_to_remove
        # (a set of DynamicBatchIndex); out-of-range indices are simply ignored.
        new_items = [item for i, item in enumerate(self.items) if i not in indices_to_remove]
        return Batch(new_items)

    def __eq__(self, other):
        if not isinstance(other, Batch):
            return False
        return self.items == other.items

    def __repr__(self):
        return f"Batch({self.items!r})"

    def __str__(self):
        return repr(self)

# unit tests

# ------------------------------
# Basic Test Cases
# ------------------------------

def test_remove_indices_batch_basic():
    # Remove a single index from a Batch
    batch = Batch([1, 2, 3, 4])
    indices = {DynamicBatchIndex(1)}
    codeflash_output = remove_indices(batch, indices); result = codeflash_output # 534ns -> 590ns (9.49% slower)

def test_remove_indices_batch_multiple_indices():
    # Remove multiple indices from a Batch
    batch = Batch(['a', 'b', 'c', 'd'])
    indices = {DynamicBatchIndex(0), DynamicBatchIndex(2)}
    codeflash_output = remove_indices(batch, indices); result = codeflash_output # 469ns -> 514ns (8.75% slower)

def test_remove_indices_batch_no_indices():
    # Remove no indices from a Batch
    batch = Batch([10, 20, 30])
    indices = set()
    codeflash_output = remove_indices(batch, indices); result = codeflash_output # 505ns -> 573ns (11.9% slower)

def test_remove_indices_list_of_batches():
    # Remove indices from a list of Batch objects
    batches = [Batch([1, 2]), Batch([3, 4, 5])]
    indices = {DynamicBatchIndex(1)}
    codeflash_output = remove_indices(batches, indices); result = codeflash_output # 1.43μs -> 1.35μs (6.24% faster)

def test_remove_indices_dict_of_batches():
    # Remove indices from a dict of Batch objects
    batches = {'x': Batch([1, 2, 3]), 'y': Batch([4, 5, 6])}
    indices = {DynamicBatchIndex(2)}
    codeflash_output = remove_indices(batches, indices); result = codeflash_output # 1.66μs -> 1.63μs (1.72% faster)

def test_remove_indices_nested_dict_and_list():
    # Remove indices from nested dict and list containing Batch objects
    value = {
        'a': [Batch([1, 2, 3]), Batch([4, 5, 6])],
        'b': Batch([7, 8, 9])
    }
    indices = {DynamicBatchIndex(1)}
    codeflash_output = remove_indices(value, indices); result = codeflash_output # 2.32μs -> 2.32μs (0.302% faster)

def test_remove_indices_non_batch_value():
    # Value is not Batch, list, or dict; should be returned unchanged
    value = 42
    indices = {DynamicBatchIndex(0)}
    codeflash_output = remove_indices(value, indices); result = codeflash_output # 440ns -> 491ns (10.4% slower)

def test_remove_indices_empty_list():
    # Value is an empty list
    value = []
    indices = {DynamicBatchIndex(0)}
    codeflash_output = remove_indices(value, indices); result = codeflash_output # 770ns -> 765ns (0.654% faster)

def test_remove_indices_empty_dict():
    # Value is an empty dict
    value = {}
    indices = {DynamicBatchIndex(0)}
    codeflash_output = remove_indices(value, indices); result = codeflash_output # 845ns -> 892ns (5.27% slower)

# ------------------------------
# Edge Test Cases
# ------------------------------

def test_remove_indices_batch_all_indices():
    # Remove all indices from a Batch
    batch = Batch([1, 2, 3])
    indices = {DynamicBatchIndex(0), DynamicBatchIndex(1), DynamicBatchIndex(2)}
    codeflash_output = remove_indices(batch, indices); result = codeflash_output # 468ns -> 465ns (0.645% faster)

def test_remove_indices_batch_out_of_bounds_indices():
    # Remove indices that are out of bounds (should ignore them)
    batch = Batch(['a', 'b'])
    indices = {DynamicBatchIndex(5), DynamicBatchIndex(-1)}
    codeflash_output = remove_indices(batch, indices); result = codeflash_output # 466ns -> 493ns (5.48% slower)

def test_remove_indices_batch_duplicate_indices():
    # Duplicate indices in the set should have no effect
    batch = Batch([1, 2, 3])
    indices = {DynamicBatchIndex(1), DynamicBatchIndex(1)}
    codeflash_output = remove_indices(batch, indices); result = codeflash_output # 436ns -> 491ns (11.2% slower)

def test_remove_indices_list_with_non_batch_elements():
    # List contains Batch and non-Batch elements
    value = [Batch([1, 2, 3]), 99, 'hello']
    indices = {DynamicBatchIndex(2)}
    codeflash_output = remove_indices(value, indices); result = codeflash_output # 1.63μs -> 1.55μs (5.11% faster)

def test_remove_indices_dict_with_non_batch_elements():
    # Dict contains Batch and non-Batch elements
    value = {'x': Batch([1, 2]), 'y': 'foo', 'z': 123}
    indices = {DynamicBatchIndex(0)}
    codeflash_output = remove_indices(value, indices); result = codeflash_output # 1.81μs -> 1.82μs (0.165% slower)

def test_remove_indices_nested_empty_structures():
    # Nested empty dicts/lists should be handled correctly
    value = {'a': [], 'b': {}, 'c': Batch([])}
    indices = {DynamicBatchIndex(0)}
    codeflash_output = remove_indices(value, indices); result = codeflash_output # 2.43μs -> 2.38μs (2.14% faster)

def test_remove_indices_batch_negative_index():
    # Negative index should be ignored
    batch = Batch([1, 2, 3])
    indices = {DynamicBatchIndex(-1)}
    codeflash_output = remove_indices(batch, indices); result = codeflash_output # 441ns -> 465ns (5.16% slower)

def test_remove_indices_batch_large_index():
    # Large index should be ignored
    batch = Batch([1, 2, 3])
    indices = {DynamicBatchIndex(100)}
    codeflash_output = remove_indices(batch, indices); result = codeflash_output # 436ns -> 502ns (13.1% slower)

def test_remove_indices_batch_empty_indices_set():
    # Empty indices set should not remove anything
    batch = Batch([1, 2, 3])
    indices = set()
    codeflash_output = remove_indices(batch, indices); result = codeflash_output # 418ns -> 491ns (14.9% slower)

def test_remove_indices_nested_mixed_types():
    # Nested structures with mixed types
    value = {
        'a': [Batch([1, 2]), 'x', 5],
        'b': {'c': Batch([3, 4, 5]), 'd': [Batch([6, 7])]}
    }
    indices = {DynamicBatchIndex(1)}
    codeflash_output = remove_indices(value, indices); result = codeflash_output # 3.92μs -> 3.71μs (5.60% faster)

# ------------------------------
# Large Scale Test Cases
# ------------------------------

def test_remove_indices_large_batch():
    # Remove indices from a large Batch
    items = list(range(1000))
    batch = Batch(items)
    indices = {DynamicBatchIndex(i) for i in range(0, 1000, 2)}  # Remove even indices
    codeflash_output = remove_indices(batch, indices); result = codeflash_output # 509ns -> 587ns (13.3% slower)
    expected = [i for i in range(1000) if i % 2 != 0]

def test_remove_indices_large_list_of_batches():
    # Remove indices from a large list of Batch objects
    batches = [Batch([i, i + 1, i + 2]) for i in range(0, 1000, 10)]
    indices = {DynamicBatchIndex(1)}
    codeflash_output = remove_indices(batches, indices); result = codeflash_output # 13.2μs -> 11.1μs (18.7% faster)
    expected = [Batch([i, i + 2]) for i in range(0, 1000, 10)]

def test_remove_indices_large_nested_structure():
    # Remove indices from a large nested dict/list structure
    value = {
        'a': [Batch(list(range(100))), Batch(list(range(100, 200)))],
        'b': {'x': Batch(list(range(200, 300))), 'y': [Batch(list(range(300, 400)))]}
    }
    indices = {DynamicBatchIndex(i) for i in range(0, 100, 10)}  # Remove every 10th item
    codeflash_output = remove_indices(value, indices); result = codeflash_output # 3.46μs -> 3.24μs (6.98% faster)
    def expected_batch(start):
        return Batch([i for i in range(start, start + 100) if (i - start) % 10 != 0])

def test_remove_indices_large_dict_of_batches():
    # Remove indices from a large dict of Batch objects
    value = {str(i): Batch([i, i + 1, i + 2, i + 3]) for i in range(0, 1000, 50)}
    indices = {DynamicBatchIndex(2), DynamicBatchIndex(3)}
    codeflash_output = remove_indices(value, indices); result = codeflash_output # 4.50μs -> 4.09μs (10.2% faster)
    expected = {str(i): Batch([i, i + 1]) for i in range(0, 1000, 50)}

def test_remove_indices_large_list_mixed_types():
    # Large list with mixed types, including Batch objects
    value = [Batch([i, i + 1]) if i % 3 == 0 else i for i in range(1000)]
    indices = {DynamicBatchIndex(1)}
    codeflash_output = remove_indices(value, indices); result = codeflash_output # 113μs -> 92.6μs (22.4% faster)
    expected = [Batch([i]) if i % 3 == 0 else i for i in range(1000)]
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

To edit these changes, run `git checkout codeflash/optimize-remove_indices-mh9v8vpa` and push your updates.

@codeflash-ai codeflash-ai bot requested a review from mashraf-222 October 28, 2025 01:06
@codeflash-ai codeflash-ai bot added the ⚡️ codeflash and 🎯 Quality: High labels on Oct 28, 2025