@codeflash-ai codeflash-ai bot commented Oct 28, 2025

📄 22% (0.22x) speedup for `Batch.init` in `inference/core/workflows/execution_engine/entities/base.py`

⏱️ Runtime : 46.8 microseconds → 38.2 microseconds (best of 320 runs)

📝 Explanation and details

The optimization removes keyword arguments from the constructor call in the `init` method, changing `cls(content=content, indices=indices)` to `cls(content, indices)`.

This eliminates the overhead of Python's keyword argument handling mechanism, which involves:

  • Creating a dictionary to map argument names to values
  • Additional parameter binding logic in the interpreter
  • Extra function call overhead for keyword processing
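
The difference is visible at the bytecode level. Here is a standalone sketch using generic functions (not the actual `Batch` code; the exact opcodes vary by CPython version):

```python
import dis

def kw_call(cls, content, indices):
    # Keyword call: the interpreter must bind argument names to parameters.
    return cls(content=content, indices=indices)

def pos_call(cls, content, indices):
    # Positional call: arguments are passed straight through.
    return cls(content, indices)

# The compiled bytecode differs: the keyword version carries the argument
# names (e.g. via KW_NAMES on CPython 3.11, CALL_FUNCTION_KW earlier).
dis.dis(kw_call)
dis.dis(pos_call)
```

Both functions produce identical objects; only the calling convention, and hence the per-call overhead, differs.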

The 22% speedup is achieved because object instantiation becomes more direct: Python can pass arguments positionally without the extra dictionary creation and lookup steps. This optimization is particularly effective for frequently called factory methods like `init`.

The test results show consistent 20-35% improvements across all scenarios, with the best gains on simpler cases (empty lists: 36.1%, basic operations: 25-30%). Even complex scenarios with large datasets maintain 15-30% improvements, demonstrating that the optimization scales well regardless of content size or complexity.

Since the constructor signature remains unchanged and arguments are passed in the same order, this is a pure performance optimization with no behavioral changes.
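
The effect can be reproduced in isolation with a micro-benchmark. This is a standalone sketch using a stand-in class, not the actual `Batch` implementation; absolute numbers will vary by machine and interpreter version:

```python
import timeit

class Pair:
    """Stand-in for a two-field container like Batch."""
    def __init__(self, content, indices):
        self.content = content
        self.indices = indices

n = 100_000
kw = timeit.timeit(lambda: Pair(content=[1], indices=[(0,)]), number=n)
pos = timeit.timeit(lambda: Pair([1], [(0,)]), number=n)
print(f"keyword:    {kw / n * 1e9:.0f} ns/call")
print(f"positional: {pos / n * 1e9:.0f} ns/call")
```

On CPython the positional form is typically somewhat faster per call, consistent with the per-test deltas reported below, though the exact gap depends on the interpreter version.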

Correctness verification report:

| Test | Status |
| --- | --- |
| ⚙️ Existing Unit Tests | 54 Passed |
| 🌀 Generated Regression Tests | 70 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 🔘 None Found |
| 📊 Tests Coverage | 100.0% |
⚙️ Existing Unit Tests and Runtime

| Test File::Test Function | Original ⏱️ | Optimized ⏱️ | Speedup |
| --- | --- | --- | --- |
| `workflows/unit_tests/execution_engine/entities/test_base.py::test_broadcast_batch_when_requested_size_is_equal_to_batch_size` | 1.12μs | 840ns | 33.7% ✅ |
| `workflows/unit_tests/execution_engine/entities/test_base.py::test_broadcast_batch_when_requested_size_is_invalid` | 1.25μs | 1.09μs | 14.7% ✅ |
| `workflows/unit_tests/execution_engine/entities/test_base.py::test_broadcast_batch_when_requested_size_is_valid_and_batch_size_is_not_matching` | 1.17μs | 974ns | 19.6% ✅ |
| `workflows/unit_tests/execution_engine/entities/test_base.py::test_broadcast_batch_when_requested_size_is_valid_and_batch_size_is_one` | 1.29μs | 1.23μs | 5.47% ✅ |
| `workflows/unit_tests/execution_engine/entities/test_base.py::test_filtering_out_batch_elements` | 1.43μs | 1.22μs | 16.7% ✅ |
| `workflows/unit_tests/execution_engine/entities/test_base.py::test_getting_batch_element_when_valid_element_is_chosen` | 1.07μs | 896ns | 19.6% ✅ |
| `workflows/unit_tests/execution_engine/entities/test_base.py::test_getting_batch_element_when_valid_invalid_element_is_chosen` | 1.07μs | 937ns | 14.1% ✅ |
| `workflows/unit_tests/execution_engine/entities/test_base.py::test_getting_batch_length` | 1.21μs | 1.06μs | 13.8% ✅ |
| `workflows/unit_tests/execution_engine/entities/test_base.py::test_initialising_batch_with_misaligned_indices` | 1.00μs | 999ns | 0.200% ✅ |
| `workflows/unit_tests/execution_engine/entities/test_base.py::test_standard_iteration_through_batch` | 1.37μs | 1.18μs | 15.7% ✅ |
| `workflows/unit_tests/execution_engine/entities/test_base.py::test_standard_iteration_through_batch_with_indices` | 1.12μs | 968ns | 16.1% ✅ |
🌀 Generated Regression Tests and Runtime
from typing import Generic, Iterator, List, Optional, Tuple, TypeVar

# imports
import pytest
from inference.core.workflows.execution_engine.entities.base import Batch

B = TypeVar("B")

# unit tests

# ----------- Basic Test Cases -----------

def test_basic_init_with_ints():
    # Test with simple integer content and matching indices
    content = [1, 2, 3]
    indices = [(0,), (1,), (2,)]
    codeflash_output = Batch.init(content, indices); batch = codeflash_output # 975ns -> 768ns (27.0% faster)

def test_basic_init_with_strings():
    # Test with string content
    content = ["a", "b", "c"]
    indices = [(10,), (11,), (12,)]
    codeflash_output = Batch.init(content, indices); batch = codeflash_output # 924ns -> 725ns (27.4% faster)

def test_basic_init_with_empty_lists():
    # Test with empty content and indices
    content = []
    indices = []
    codeflash_output = Batch.init(content, indices); batch = codeflash_output # 920ns -> 676ns (36.1% faster)

def test_basic_init_with_multiple_indices_per_item():
    # Test with indices as tuples of length > 1
    content = [1, 2]
    indices = [(0, 1), (2, 3)]
    codeflash_output = Batch.init(content, indices); batch = codeflash_output # 899ns -> 677ns (32.8% faster)

# ----------- Edge Test Cases -----------

def test_edge_init_mismatched_lengths_raises():
    # Test with content and indices of different lengths
    content = [1, 2, 3]
    indices = [(0,), (1,)]
    with pytest.raises(ValueError) as excinfo:
        Batch.init(content, indices) # 895ns -> 860ns (4.07% faster)

def test_edge_init_indices_with_empty_tuples():
    # Test with indices containing empty tuples
    content = ["x"]
    indices = [()]
    codeflash_output = Batch.init(content, indices); batch = codeflash_output # 1.01μs -> 809ns (25.0% faster)

def test_edge_init_with_nested_indices():
    # Test with indices containing tuples of length > 2
    content = ["a", "b"]
    indices = [(0, 1, 2), (3, 4, 5)]
    codeflash_output = Batch.init(content, indices); batch = codeflash_output # 909ns -> 740ns (22.8% faster)

def test_edge_init_with_none_content():
    # Test with None as a content element
    content = [None]
    indices = [(5,)]
    codeflash_output = Batch.init(content, indices); batch = codeflash_output # 918ns -> 728ns (26.1% faster)

def test_edge_init_with_non_integer_indices():
    # Test with indices containing non-integer types
    content = ["a"]
    indices = [("str",)]
    codeflash_output = Batch.init(content, indices); batch = codeflash_output # 896ns -> 716ns (25.1% faster)

def test_edge_init_with_large_tuple_indices():
    # Test with indices as very large tuples
    content = ["a"]
    indices = [tuple(range(100))]
    codeflash_output = Batch.init(content, indices); batch = codeflash_output # 967ns -> 712ns (35.8% faster)

def test_edge_init_with_duplicate_indices():
    # Test with duplicate indices
    content = ["a", "b"]
    indices = [(0,), (0,)]
    codeflash_output = Batch.init(content, indices); batch = codeflash_output # 949ns -> 710ns (33.7% faster)

# ----------- Large Scale Test Cases -----------

def test_large_scale_init_1000_elements():
    # Test with 1000 elements in content and indices
    n = 1000
    content = list(range(n))
    indices = [(i,) for i in range(n)]
    codeflash_output = Batch.init(content, indices); batch = codeflash_output # 1.09μs -> 888ns (22.4% faster)

def test_large_scale_init_1000_elements_multi_tuple_indices():
    # Test with 1000 elements and multi-length tuple indices
    n = 1000
    content = [str(i) for i in range(n)]
    indices = [(i, i+1) for i in range(n)]
    codeflash_output = Batch.init(content, indices); batch = codeflash_output # 1.13μs -> 888ns (27.3% faster)

def test_large_scale_init_with_large_content_objects():
    # Test with large objects in content
    n = 1000
    content = [{"val": i, "data": [i]*10} for i in range(n)]
    indices = [(i,) for i in range(n)]
    codeflash_output = Batch.init(content, indices); batch = codeflash_output # 1.33μs -> 1.15μs (15.9% faster)

def test_large_scale_init_performance():
    # Test that init is performant for large input
    import time
    n = 1000
    content = list(range(n))
    indices = [(i,) for i in range(n)]
    start = time.time()
    codeflash_output = Batch.init(content, indices); batch = codeflash_output # 1.17μs -> 956ns (22.0% faster)
    end = time.time()

# ----------- Additional Edge Cases -----------

def test_edge_init_with_non_list_content_and_indices():
    # Test with content and indices as other iterables (should fail)
    content = (1, 2, 3)  # tuple, not list
    indices = [(0,), (1,), (2,)]
    with pytest.raises(TypeError):
        # Should raise TypeError in __init__ due to type mismatch
        Batch.init(content, indices)

def test_edge_init_with_non_list_indices():
    # Test with indices as a tuple (should fail)
    content = [1, 2, 3]
    indices = ((0,), (1,), (2,))
    with pytest.raises(TypeError):
        Batch.init(content, indices)

def test_edge_init_with_non_tuple_indices_elements():
    # Test with indices elements not being tuples (should still work, as per type hints)
    content = [1, 2]
    indices = [0, 1]  # not tuples
    codeflash_output = Batch.init(content, indices); batch = codeflash_output # 1.36μs -> 1.11μs (22.3% faster)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
#------------------------------------------------
from typing import Generic, Iterator, List, Optional, Tuple, TypeVar

# imports
import pytest  # used for our unit tests
from inference.core.workflows.execution_engine.entities.base import Batch

B = TypeVar("B")

# unit tests

# 1. Basic Test Cases

def test_init_basic_int_content():
    # Test with simple integer content and matching indices
    content = [1, 2, 3]
    indices = [(0,), (1,), (2,)]
    codeflash_output = Batch.init(content, indices); batch = codeflash_output # 1.06μs -> 815ns (29.8% faster)

def test_init_basic_str_content():
    # Test with string content and matching indices
    content = ["a", "b", "c"]
    indices = [(10,), (11,), (12,)]
    codeflash_output = Batch.init(content, indices); batch = codeflash_output # 921ns -> 740ns (24.5% faster)

def test_init_basic_tuple_indices():
    # Test with multi-dimensional tuple indices
    content = [1, 2]
    indices = [(0, 1), (1, 2)]
    codeflash_output = Batch.init(content, indices); batch = codeflash_output # 978ns -> 770ns (27.0% faster)

def test_init_basic_empty():
    # Test with empty content and indices
    content = []
    indices = []
    codeflash_output = Batch.init(content, indices); batch = codeflash_output # 928ns -> 725ns (28.0% faster)

def test_init_basic_single_element():
    # Test with a single element
    content = ["x"]
    indices = [(42,)]
    codeflash_output = Batch.init(content, indices); batch = codeflash_output # 900ns -> 715ns (25.9% faster)

# 2. Edge Test Cases

def test_init_mismatched_lengths_raises():
    # Test with mismatched lengths of content and indices
    content = [1, 2, 3]
    indices = [(0,), (1,)]
    with pytest.raises(ValueError) as excinfo:
        Batch.init(content, indices) # 879ns -> 872ns (0.803% faster)

def test_init_indices_with_empty_tuples():
    # Test where indices contain empty tuples
    content = [1, 2]
    indices = [(), ()]
    codeflash_output = Batch.init(content, indices); batch = codeflash_output # 1.01μs -> 794ns (26.8% faster)

def test_init_indices_with_varied_tuple_lengths():
    # Test indices with tuples of different lengths
    content = [1, 2, 3]
    indices = [(0,), (1, 2), (3, 4, 5)]
    codeflash_output = Batch.init(content, indices); batch = codeflash_output # 950ns -> 748ns (27.0% faster)

def test_init_content_with_none():
    # Test content containing None
    content = [None, 2]
    indices = [(0,), (1,)]
    codeflash_output = Batch.init(content, indices); batch = codeflash_output # 922ns -> 707ns (30.4% faster)

def test_init_indices_with_negative_and_large_numbers():
    # Test indices with negative and large numbers
    content = ["a", "b"]
    indices = [(-1,), (999999,)]
    codeflash_output = Batch.init(content, indices); batch = codeflash_output # 860ns -> 691ns (24.5% faster)

def test_init_indices_with_non_int_tuples():
    # Test indices with tuples containing non-integers
    content = ["x"]
    indices = [("a",)]
    codeflash_output = Batch.init(content, indices); batch = codeflash_output # 873ns -> 686ns (27.3% faster)

def test_init_content_with_mutable_types():
    # Test content with mutable types (lists)
    content = [[1, 2], [3, 4]]
    indices = [(0,), (1,)]
    codeflash_output = Batch.init(content, indices); batch = codeflash_output # 880ns -> 664ns (32.5% faster)

def test_init_indices_with_duplicate_tuples():
    # Test indices with duplicate tuples
    content = [1, 2, 3]
    indices = [(0,), (0,), (1,)]
    codeflash_output = Batch.init(content, indices); batch = codeflash_output # 912ns -> 689ns (32.4% faster)

def test_init_indices_with_large_tuple():
    # Test indices with a tuple of maximum reasonable length
    content = [1]
    indices = [tuple(range(100))]
    codeflash_output = Batch.init(content, indices); batch = codeflash_output # 910ns -> 701ns (29.8% faster)

# 3. Large Scale Test Cases

def test_init_large_content_and_indices():
    # Test with large content and indices lists (1000 elements)
    content = list(range(1000))
    indices = [(i,) for i in range(1000)]
    codeflash_output = Batch.init(content, indices); batch = codeflash_output # 1.13μs -> 857ns (31.9% faster)

def test_init_large_content_and_multi_indices():
    # Test with large content and multi-dimensional indices
    content = ["x"] * 1000
    indices = [(i, i+1, i+2) for i in range(1000)]
    codeflash_output = Batch.init(content, indices); batch = codeflash_output # 1.10μs -> 856ns (28.5% faster)

def test_init_large_content_with_empty_indices():
    # Test with large content and all indices are empty tuples
    content = [None] * 1000
    indices = [() for _ in range(1000)]
    codeflash_output = Batch.init(content, indices); batch = codeflash_output # 1.06μs -> 809ns (30.9% faster)

def test_init_large_content_and_indices_performance():
    # Test that the function works efficiently for large inputs
    content = list(range(1000))
    indices = [(i,) for i in range(1000)]
    codeflash_output = Batch.init(content, indices); batch = codeflash_output # 1.05μs -> 832ns (26.1% faster)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

To edit these changes, run `git checkout codeflash/optimize-Batch.init-mh9y6sa3` and push.

@codeflash-ai codeflash-ai bot requested a review from mashraf-222 October 28, 2025 02:28
@codeflash-ai codeflash-ai bot added the ⚡️ codeflash (Optimization PR opened by Codeflash AI) and 🎯 Quality: High (Optimization Quality according to Codeflash) labels Oct 28, 2025