@codeflash-ai codeflash-ai bot commented Oct 28, 2025

📄 22% (0.22x) speedup for `Batch.init` in `inference/core/workflows/execution_engine/entities/base.py`

⏱️ Runtime : 46.8 microseconds → 38.2 microseconds (best of 320 runs)

📝 Explanation and details

The optimization removes keyword arguments from the constructor call in the `init` method, changing `cls(content=content, indices=indices)` to `cls(content, indices)`.

This eliminates the overhead of Python's keyword argument handling mechanism, which involves:

  • Creating a dictionary to map argument names to values
  • Additional parameter binding logic in the interpreter
  • Extra function call overhead for keyword processing
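
The difference is visible at the bytecode level. Here is a standalone sketch using generic functions (not the actual `Batch` code; the exact opcodes vary by CPython version):

```python
import dis

def kw_call(cls, content, indices):
    # Keyword call: the interpreter must bind argument names to parameters.
    return cls(content=content, indices=indices)

def pos_call(cls, content, indices):
    # Positional call: arguments are passed straight through.
    return cls(content, indices)

# The compiled bytecode differs: the keyword version carries the argument
# names (e.g. via KW_NAMES on CPython 3.11, CALL_FUNCTION_KW earlier).
dis.dis(kw_call)
dis.dis(pos_call)
```

Both functions produce identical objects; only the calling convention, and hence the per-call overhead, differs.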

The 22% speedup is achieved because object instantiation becomes more direct: Python can pass arguments positionally without the extra dictionary creation and lookup steps. This optimization is particularly effective for frequently called factory methods like `init`.

The test results show consistent 20-35% improvements across all scenarios, with the best gains on simpler cases (empty lists: 36.1%, basic operations: 25-30%). Even complex scenarios with large datasets maintain 15-30% improvements, demonstrating that the optimization scales well regardless of content size or complexity.

Since the constructor signature remains unchanged and arguments are passed in the same order, this is a pure performance optimization with no behavioral changes.
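
The effect can be reproduced in isolation with a micro-benchmark. This is a standalone sketch using a stand-in class, not the actual `Batch` implementation; absolute numbers will vary by machine and interpreter version:

```python
import timeit

class Pair:
    """Stand-in for a two-field container like Batch."""
    def __init__(self, content, indices):
        self.content = content
        self.indices = indices

n = 100_000
kw = timeit.timeit(lambda: Pair(content=[1], indices=[(0,)]), number=n)
pos = timeit.timeit(lambda: Pair([1], [(0,)]), number=n)
print(f"keyword:    {kw / n * 1e9:.0f} ns/call")
print(f"positional: {pos / n * 1e9:.0f} ns/call")
```

On CPython the positional form is typically somewhat faster per call, consistent with the per-test deltas reported below, though the exact gap depends on the interpreter version.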

Correctness verification report:

| Test | Status |
| --- | --- |
| ⚙️ Existing Unit Tests | 54 Passed |
| 🌀 Generated Regression Tests | 70 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 🔘 None Found |
| 📊 Tests Coverage | 100.0% |
⚙️ Existing Unit Tests and Runtime

| Test File::Test Function | Original ⏱️ | Optimized ⏱️ | Speedup |
| --- | --- | --- | --- |
| `workflows/unit_tests/execution_engine/entities/test_base.py::test_broadcast_batch_when_requested_size_is_equal_to_batch_size` | 1.12μs | 840ns | 33.7% ✅ |
| `workflows/unit_tests/execution_engine/entities/test_base.py::test_broadcast_batch_when_requested_size_is_invalid` | 1.25μs | 1.09μs | 14.7% ✅ |
| `workflows/unit_tests/execution_engine/entities/test_base.py::test_broadcast_batch_when_requested_size_is_valid_and_batch_size_is_not_matching` | 1.17μs | 974ns | 19.6% ✅ |
| `workflows/unit_tests/execution_engine/entities/test_base.py::test_broadcast_batch_when_requested_size_is_valid_and_batch_size_is_one` | 1.29μs | 1.23μs | 5.47% ✅ |
| `workflows/unit_tests/execution_engine/entities/test_base.py::test_filtering_out_batch_elements` | 1.43μs | 1.22μs | 16.7% ✅ |
| `workflows/unit_tests/execution_engine/entities/test_base.py::test_getting_batch_element_when_valid_element_is_chosen` | 1.07μs | 896ns | 19.6% ✅ |
| `workflows/unit_tests/execution_engine/entities/test_base.py::test_getting_batch_element_when_valid_invalid_element_is_chosen` | 1.07μs | 937ns | 14.1% ✅ |
| `workflows/unit_tests/execution_engine/entities/test_base.py::test_getting_batch_length` | 1.21μs | 1.06μs | 13.8% ✅ |
| `workflows/unit_tests/execution_engine/entities/test_base.py::test_initialising_batch_with_misaligned_indices` | 1.00μs | 999ns | 0.200% ✅ |
| `workflows/unit_tests/execution_engine/entities/test_base.py::test_standard_iteration_through_batch` | 1.37μs | 1.18μs | 15.7% ✅ |
| `workflows/unit_tests/execution_engine/entities/test_base.py::test_standard_iteration_through_batch_with_indices` | 1.12μs | 968ns | 16.1% ✅ |
🌀 Generated Regression Tests and Runtime
from typing import Generic, Iterator, List, Optional, Tuple, TypeVar

# imports
import pytest
from inference.core.workflows.execution_engine.entities.base import Batch

B = TypeVar("B")

# unit tests

# ----------- Basic Test Cases -----------

def test_basic_init_with_ints():
    # Test with simple integer content and matching indices
    content = [1, 2, 3]
    indices = [(0,), (1,), (2,)]
    codeflash_output = Batch.init(content, indices); batch = codeflash_output # 975ns -> 768ns (27.0% faster)

def test_basic_init_with_strings():
    # Test with string content
    content = ["a", "b", "c"]
    indices = [(10,), (11,), (12,)]
    codeflash_output = Batch.init(content, indices); batch = codeflash_output # 924ns -> 725ns (27.4% faster)

def test_basic_init_with_empty_lists():
    # Test with empty content and indices
    content = []
    indices = []
    codeflash_output = Batch.init(content, indices); batch = codeflash_output # 920ns -> 676ns (36.1% faster)

def test_basic_init_with_multiple_indices_per_item():
    # Test with indices as tuples of length > 1
    content = [1, 2]
    indices = [(0, 1), (2, 3)]
    codeflash_output = Batch.init(content, indices); batch = codeflash_output # 899ns -> 677ns (32.8% faster)

# ----------- Edge Test Cases -----------

def test_edge_init_mismatched_lengths_raises():
    # Test with content and indices of different lengths
    content = [1, 2, 3]
    indices = [(0,), (1,)]
    with pytest.raises(ValueError) as excinfo:
        Batch.init(content, indices) # 895ns -> 860ns (4.07% faster)

def test_edge_init_indices_with_empty_tuples():
    # Test with indices containing empty tuples
    content = ["x"]
    indices = [()]
    codeflash_output = Batch.init(content, indices); batch = codeflash_output # 1.01μs -> 809ns (25.0% faster)

def test_edge_init_with_nested_indices():
    # Test with indices containing tuples of length > 2
    content = ["a", "b"]
    indices = [(0, 1, 2), (3, 4, 5)]
    codeflash_output = Batch.init(content, indices); batch = codeflash_output # 909ns -> 740ns (22.8% faster)

def test_edge_init_with_none_content():
    # Test with None as a content element
    content = [None]
    indices = [(5,)]
    codeflash_output = Batch.init(content, indices); batch = codeflash_output # 918ns -> 728ns (26.1% faster)

def test_edge_init_with_non_integer_indices():
    # Test with indices containing non-integer types
    content = ["a"]
    indices = [("str",)]
    codeflash_output = Batch.init(content, indices); batch = codeflash_output # 896ns -> 716ns (25.1% faster)

def test_edge_init_with_large_tuple_indices():
    # Test with indices as very large tuples
    content = ["a"]
    indices = [tuple(range(100))]
    codeflash_output = Batch.init(content, indices); batch = codeflash_output # 967ns -> 712ns (35.8% faster)

def test_edge_init_with_duplicate_indices():
    # Test with duplicate indices
    content = ["a", "b"]
    indices = [(0,), (0,)]
    codeflash_output = Batch.init(content, indices); batch = codeflash_output # 949ns -> 710ns (33.7% faster)

# ----------- Large Scale Test Cases -----------

def test_large_scale_init_1000_elements():
    # Test with 1000 elements in content and indices
    n = 1000
    content = list(range(n))
    indices = [(i,) for i in range(n)]
    codeflash_output = Batch.init(content, indices); batch = codeflash_output # 1.09μs -> 888ns (22.4% faster)

def test_large_scale_init_1000_elements_multi_tuple_indices():
    # Test with 1000 elements and multi-length tuple indices
    n = 1000
    content = [str(i) for i in range(n)]
    indices = [(i, i+1) for i in range(n)]
    codeflash_output = Batch.init(content, indices); batch = codeflash_output # 1.13μs -> 888ns (27.3% faster)

def test_large_scale_init_with_large_content_objects():
    # Test with large objects in content
    n = 1000
    content = [{"val": i, "data": [i]*10} for i in range(n)]
    indices = [(i,) for i in range(n)]
    codeflash_output = Batch.init(content, indices); batch = codeflash_output # 1.33μs -> 1.15μs (15.9% faster)

def test_large_scale_init_performance():
    # Test that init is performant for large input
    import time
    n = 1000
    content = list(range(n))
    indices = [(i,) for i in range(n)]
    start = time.time()
    codeflash_output = Batch.init(content, indices); batch = codeflash_output # 1.17μs -> 956ns (22.0% faster)
    end = time.time()

# ----------- Additional Edge Cases -----------

def test_edge_init_with_non_list_content_and_indices():
    # Test with content and indices as other iterables (should fail)
    content = (1, 2, 3)  # tuple, not list
    indices = [(0,), (1,), (2,)]
    with pytest.raises(TypeError):
        # Should raise TypeError in __init__ due to type mismatch
        Batch.init(content, indices)

def test_edge_init_with_non_list_indices():
    # Test with indices as a tuple (should fail)
    content = [1, 2, 3]
    indices = ((0,), (1,), (2,))
    with pytest.raises(TypeError):
        Batch.init(content, indices)

def test_edge_init_with_non_tuple_indices_elements():
    # Test with indices elements not being tuples (should still work, as per type hints)
    content = [1, 2]
    indices = [0, 1]  # not tuples
    codeflash_output = Batch.init(content, indices); batch = codeflash_output # 1.36μs -> 1.11μs (22.3% faster)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
#------------------------------------------------
from typing import Generic, Iterator, List, Optional, Tuple, TypeVar

# imports
import pytest  # used for our unit tests
from inference.core.workflows.execution_engine.entities.base import Batch

B = TypeVar("B")

# unit tests

# 1. Basic Test Cases

def test_init_basic_int_content():
    # Test with simple integer content and matching indices
    content = [1, 2, 3]
    indices = [(0,), (1,), (2,)]
    codeflash_output = Batch.init(content, indices); batch = codeflash_output # 1.06μs -> 815ns (29.8% faster)

def test_init_basic_str_content():
    # Test with string content and matching indices
    content = ["a", "b", "c"]
    indices = [(10,), (11,), (12,)]
    codeflash_output = Batch.init(content, indices); batch = codeflash_output # 921ns -> 740ns (24.5% faster)

def test_init_basic_tuple_indices():
    # Test with multi-dimensional tuple indices
    content = [1, 2]
    indices = [(0, 1), (1, 2)]
    codeflash_output = Batch.init(content, indices); batch = codeflash_output # 978ns -> 770ns (27.0% faster)

def test_init_basic_empty():
    # Test with empty content and indices
    content = []
    indices = []
    codeflash_output = Batch.init(content, indices); batch = codeflash_output # 928ns -> 725ns (28.0% faster)

def test_init_basic_single_element():
    # Test with a single element
    content = ["x"]
    indices = [(42,)]
    codeflash_output = Batch.init(content, indices); batch = codeflash_output # 900ns -> 715ns (25.9% faster)

# 2. Edge Test Cases

def test_init_mismatched_lengths_raises():
    # Test with mismatched lengths of content and indices
    content = [1, 2, 3]
    indices = [(0,), (1,)]
    with pytest.raises(ValueError) as excinfo:
        Batch.init(content, indices) # 879ns -> 872ns (0.803% faster)

def test_init_indices_with_empty_tuples():
    # Test where indices contain empty tuples
    content = [1, 2]
    indices = [(), ()]
    codeflash_output = Batch.init(content, indices); batch = codeflash_output # 1.01μs -> 794ns (26.8% faster)

def test_init_indices_with_varied_tuple_lengths():
    # Test indices with tuples of different lengths
    content = [1, 2, 3]
    indices = [(0,), (1, 2), (3, 4, 5)]
    codeflash_output = Batch.init(content, indices); batch = codeflash_output # 950ns -> 748ns (27.0% faster)

def test_init_content_with_none():
    # Test content containing None
    content = [None, 2]
    indices = [(0,), (1,)]
    codeflash_output = Batch.init(content, indices); batch = codeflash_output # 922ns -> 707ns (30.4% faster)

def test_init_indices_with_negative_and_large_numbers():
    # Test indices with negative and large numbers
    content = ["a", "b"]
    indices = [(-1,), (999999,)]
    codeflash_output = Batch.init(content, indices); batch = codeflash_output # 860ns -> 691ns (24.5% faster)

def test_init_indices_with_non_int_tuples():
    # Test indices with tuples containing non-integers
    content = ["x"]
    indices = [("a",)]
    codeflash_output = Batch.init(content, indices); batch = codeflash_output # 873ns -> 686ns (27.3% faster)

def test_init_content_with_mutable_types():
    # Test content with mutable types (lists)
    content = [[1, 2], [3, 4]]
    indices = [(0,), (1,)]
    codeflash_output = Batch.init(content, indices); batch = codeflash_output # 880ns -> 664ns (32.5% faster)

def test_init_indices_with_duplicate_tuples():
    # Test indices with duplicate tuples
    content = [1, 2, 3]
    indices = [(0,), (0,), (1,)]
    codeflash_output = Batch.init(content, indices); batch = codeflash_output # 912ns -> 689ns (32.4% faster)

def test_init_indices_with_large_tuple():
    # Test indices with a tuple of maximum reasonable length
    content = [1]
    indices = [tuple(range(100))]
    codeflash_output = Batch.init(content, indices); batch = codeflash_output # 910ns -> 701ns (29.8% faster)

# 3. Large Scale Test Cases

def test_init_large_content_and_indices():
    # Test with large content and indices lists (1000 elements)
    content = list(range(1000))
    indices = [(i,) for i in range(1000)]
    codeflash_output = Batch.init(content, indices); batch = codeflash_output # 1.13μs -> 857ns (31.9% faster)

def test_init_large_content_and_multi_indices():
    # Test with large content and multi-dimensional indices
    content = ["x"] * 1000
    indices = [(i, i+1, i+2) for i in range(1000)]
    codeflash_output = Batch.init(content, indices); batch = codeflash_output # 1.10μs -> 856ns (28.5% faster)

def test_init_large_content_with_empty_indices():
    # Test with large content and all indices are empty tuples
    content = [None] * 1000
    indices = [() for _ in range(1000)]
    codeflash_output = Batch.init(content, indices); batch = codeflash_output # 1.06μs -> 809ns (30.9% faster)

def test_init_large_content_and_indices_performance():
    # Test that the function works efficiently for large inputs
    content = list(range(1000))
    indices = [(i,) for i in range(1000)]
    codeflash_output = Batch.init(content, indices); batch = codeflash_output # 1.05μs -> 832ns (26.1% faster)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

To edit these changes, run `git checkout codeflash/optimize-Batch.init-mh9y6sa3` and push.

@codeflash-ai codeflash-ai bot requested a review from mashraf-222 October 28, 2025 02:28
@codeflash-ai codeflash-ai bot added the ⚡️ codeflash (Optimization PR opened by Codeflash AI) and 🎯 Quality: High (Optimization Quality according to Codeflash) labels Oct 28, 2025