codeflash-ai bot commented Oct 28, 2025

📄 17% (1.17x) speedup for create_classes_index in inference/core/workflows/core_steps/formatters/vlm_as_detector/v2.py

⏱️ Runtime: 416 microseconds → 355 microseconds (best of 197 runs)
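The headline percentage appears to be the runtime gain measured against the optimized runtime, because that reading reproduces the reported 17%. Measured against the original runtime, the same change is roughly a 15% reduction in time:

$$
\frac{416\ \mu\text{s}}{355\ \mu\text{s}} \approx 1.17, \qquad
\frac{416 - 355}{355} \approx 0.17\ (17\%), \qquad
\frac{416 - 355}{416} \approx 0.15\ (15\%)
$$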

📝 Explanation and details

The optimization replaces a dictionary comprehension with a dict(zip(...)) call. This yields a 17% speedup by moving the per-item loop out of interpreted Python bytecode and into C.

Key changes:

  • Original: {class_name: idx for idx, class_name in enumerate(classes)}, a dictionary comprehension over enumerate()
  • Optimized: dict(zip(classes, range(len(classes)))), which passes the pairs produced by zip() to the built-in dict() constructor
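To check the equivalence in isolation, here is a minimal, self-contained sketch of the two variants. The function names are hypothetical, since the real module defines a single create_classes_index. The two bodies are the expressions quoted above.

```python
from typing import Dict, List


def create_classes_index_comprehension(classes: List[str]) -> Dict[str, int]:
    # Hypothetical name. Original form: the loop runs as interpreted bytecode
    return {class_name: idx for idx, class_name in enumerate(classes)}


def create_classes_index_zip(classes: List[str]) -> Dict[str, int]:
    # Hypothetical name. Optimized form: zip() pairs each name with its position,
    # and dict() consumes the pairs without a Python-level loop
    return dict(zip(classes, range(len(classes))))


if __name__ == "__main__":
    sample = ["cat", "dog", "cat", "bird"]
    a = create_classes_index_comprehension(sample)
    b = create_classes_index_zip(sample)
    print(a)  # {'cat': 2, 'dog': 1, 'bird': 3}
    # Same values and same key order: a repeated name keeps its first-seen
    # position in the dict but ends up mapped to its last index
    assert a == b and list(a) == list(b)
```

Both forms overwrite earlier entries when a name repeats, so duplicate handling is unchanged by the rewrite.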

Why this is faster:

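  • Dictionary comprehensions run their loop in Python bytecode. Every item costs several interpreted instructions: advance the iterator, unpack the (index, name) tuple, store the loop variables, and insert the entry. Before Python 3.12 the comprehension also runs as a separate nested function, which adds a fixed cost to each call.
  • In CPython, dict(zip(...)) runs the whole loop in C. zip() and range() are C-level iterators, and the dict() constructor consumes the (key, value) pairs without executing any bytecode per item.
  • range(len(classes)) is not faster than enumerate() on its own, because zip() still yields a (name, index) tuple for every item. The gain comes from dict() unpacking those tuples in C, whereas the comprehension unpacks enumerate()'s tuples with bytecode.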
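One way to see this difference is to scan the compiled bytecode with the standard dis module. The comprehension contains a per-item loop, while the dict(zip(...)) version compiles to only a handful of calls. This is a sketch: the opcode names FOR_ITER and MAP_ADD are CPython implementation details and may change between versions. The helper names are hypothetical.

```python
import dis
import types
from typing import Callable, Set


def comprehension_version(classes):
    # Hypothetical name for the original expression
    return {class_name: idx for idx, class_name in enumerate(classes)}


def zip_version(classes):
    # Hypothetical name for the optimized expression
    return dict(zip(classes, range(len(classes))))


def opnames(code: types.CodeType) -> Set[str]:
    # Collect opcode names, recursing into nested code objects
    # (before Python 3.12 a comprehension compiles to its own nested code object)
    names = {instr.opname for instr in dis.get_instructions(code)}
    for const in code.co_consts:
        if isinstance(const, types.CodeType):
            names |= opnames(const)
    return names


def has_bytecode_loop(func: Callable) -> bool:
    # A Python-level loop shows up as FOR_ITER; dict comprehensions insert entries with MAP_ADD
    ops = opnames(func.__code__)
    return "FOR_ITER" in ops and "MAP_ADD" in ops


print(has_bytecode_loop(comprehension_version))  # True
print(has_bytecode_loop(zip_version))            # False
```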

Performance characteristics from tests:

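  • Small lists (0-5 items in these tests): about 11-48% slower. Both versions finish in under 2 μs. Here the fixed cost of the extra builtin calls (len(), range(), zip(), dict()) likely outweighs the per-item savings.
  • 1,000-item lists: about 8-43% faster, because per-item loop overhead dominates. The smallest gain (8%) is for class names longer than 100 characters. This is presumably because hashing those strings is a cost both versions pay equally.
  • Best gains: lists with many duplicates (39-43% faster). With only one or two distinct keys, each insertion is a cheap overwrite of an existing entry and the dict never grows. The eliminated loop overhead is therefore a larger share of the total time.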
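The figures above come from Codeflash's generated tests. A rough way to reproduce the trend locally is a timeit comparison like the sketch below. The function names and case sizes are illustrative assumptions, and both absolute and relative timings will vary with CPython version and hardware.

```python
import timeit
from typing import Dict, List, Tuple


def comprehension_version(classes: List[str]) -> Dict[str, int]:
    # Hypothetical name for the original expression
    return {class_name: idx for idx, class_name in enumerate(classes)}


def zip_version(classes: List[str]) -> Dict[str, int]:
    # Hypothetical name for the optimized expression
    return dict(zip(classes, range(len(classes))))


def compare(classes: List[str], number: int = 10_000) -> Tuple[float, float]:
    # Best-of-5 total time, in seconds, for `number` calls of each variant
    t_comp = min(timeit.repeat(lambda: comprehension_version(classes), number=number, repeat=5))
    t_zip = min(timeit.repeat(lambda: zip_version(classes), number=number, repeat=5))
    return t_comp, t_zip


cases = {
    "small (3 items)": ["cat", "dog", "mouse"],
    "1000 unique": [f"class_{i}" for i in range(1000)],
    "1000 duplicates": ["cat"] * 500 + ["dog"] * 500,
}

if __name__ == "__main__":
    for label, classes in cases.items():
        t_comp, t_zip = compare(classes, number=1_000)
        # Gain relative to the zip version, as in the report; negative means the zip version is slower
        gain = (t_comp - t_zip) / t_zip
        print(f"{label:>15}: comprehension {t_comp:.4f}s, zip {t_zip:.4f}s, gain {gain:+.1%}")
```

Taking the minimum over several repeats reduces noise from other processes. The single-run numbers in the report are best-of-N for the same reason.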

This optimization pays off for larger class lists (hundreds to thousands of entries, as can occur in machine learning workflows). For short lists it is slightly slower, though by less than 1 μs per call in these tests.

Correctness verification report:

| Test | Status |
| --- | --- |
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | 37 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 🔘 None Found |
| 📊 Tests Coverage | 100.0% |
🌀 Generated Regression Tests and Runtime
```python
# imports
import pytest  # used for our unit tests
from inference.core.workflows.core_steps.formatters.vlm_as_detector.v2 import \
    create_classes_index

# unit tests

# -----------------------------
# Basic Test Cases
# -----------------------------

def test_basic_single_class():
    # Test with a single class
    codeflash_output = create_classes_index(["cat"])  # 1.03μs -> 1.99μs (48.2% slower)
    assert codeflash_output == {"cat": 0}

def test_basic_multiple_classes():
    # Test with multiple distinct class names
    codeflash_output = create_classes_index(["cat", "dog", "mouse"])  # 1.12μs -> 1.40μs (19.7% slower)
    assert codeflash_output == {"cat": 0, "dog": 1, "mouse": 2}

def test_basic_empty_list():
    # Test with an empty list
    codeflash_output = create_classes_index([])  # 844ns -> 1.19μs (29.3% slower)
    assert codeflash_output == {}

def test_basic_class_names_with_spaces():
    # Test with class names that contain spaces
    codeflash_output = create_classes_index(["red apple", "green apple"])  # 1.07μs -> 1.35μs (20.6% slower)
    assert codeflash_output == {"red apple": 0, "green apple": 1}

def test_basic_class_names_with_special_characters():
    # Test with class names containing special characters
    codeflash_output = create_classes_index(["c@t", "d#g", "m!ouse"])  # 1.07μs -> 1.23μs (13.3% slower)
    assert codeflash_output == {"c@t": 0, "d#g": 1, "m!ouse": 2}

# -----------------------------
# Edge Test Cases
# -----------------------------

def test_edge_duplicate_class_names():
    # Test with duplicate class names; only the last occurrence should be in the result
    codeflash_output = create_classes_index(["cat", "dog", "cat", "mouse"])  # 1.16μs -> 1.35μs (13.6% slower)
    assert codeflash_output == {"cat": 2, "dog": 1, "mouse": 3}

def test_edge_all_duplicates():
    # Test with all class names being the same
    codeflash_output = create_classes_index(["cat", "cat", "cat"])  # 1.05μs -> 1.27μs (17.8% slower)
    assert codeflash_output == {"cat": 2}

def test_edge_class_names_with_empty_strings():
    # Test with empty string as a class name
    codeflash_output = create_classes_index(["", "cat", ""])  # 974ns -> 1.28μs (23.8% slower)
    assert codeflash_output == {"": 2, "cat": 1}

def test_edge_class_names_with_whitespace_strings():
    # Test with whitespace strings as class names
    codeflash_output = create_classes_index([" ", "cat", "  "])  # 1.03μs -> 1.23μs (16.1% slower)
    assert codeflash_output == {" ": 0, "cat": 1, "  ": 2}

def test_edge_class_names_with_numeric_strings():
    # Test with numeric strings as class names
    codeflash_output = create_classes_index(["1", "2", "3"])  # 1.38μs -> 1.84μs (25.1% slower)
    assert codeflash_output == {"1": 0, "2": 1, "3": 2}

def test_edge_class_names_with_unicode():
    # Test with unicode class names
    codeflash_output = create_classes_index(["猫", "犬", "鼠"])  # 1.12μs -> 1.50μs (25.4% slower)
    assert codeflash_output == {"猫": 0, "犬": 1, "鼠": 2}

# -----------------------------
# Large Scale Test Cases
# -----------------------------

def test_large_scale_unique_classes():
    # Test with 1000 unique class names
    classes = [f"class_{i}" for i in range(1000)]
    result = codeflash_output = create_classes_index(classes)  # 56.9μs -> 49.9μs (13.9% faster)
    # Assert correct mapping
    assert len(result) == 1000
    for i, name in enumerate(classes):
        assert result[name] == i

def test_large_scale_duplicate_classes():
    # Test with 1000 class names, each distinct name appearing twice
    classes = [f"class_{i // 2}" for i in range(1000)]
    result = codeflash_output = create_classes_index(classes)  # 49.6μs -> 41.6μs (19.4% faster)
    # Each class should map to its last occurrence index: class_k sits at 2k and 2k + 1
    assert len(result) == 500
    for i in range(500):
        assert result[f"class_{i}"] == 2 * i + 1

def test_large_scale_all_same_class():
    # Test with 1000 identical class names
    classes = ["cat"] * 1000
    result = codeflash_output = create_classes_index(classes)  # 23.3μs -> 16.8μs (38.7% faster)
    assert result == {"cat": 999}

def test_large_scale_long_class_names():
    # Test with long class names
    classes = [f"class_{'x' * 100}_{i}" for i in range(1000)]
    result = codeflash_output = create_classes_index(classes)  # 86.4μs -> 80.1μs (7.97% faster)
    assert len(result) == 1000
    for i, name in enumerate(classes):
        assert result[name] == i

def test_large_scale_edge_empty_string_classes():
    # Test with 1000 empty strings as class names
    classes = [""] * 1000
    result = codeflash_output = create_classes_index(classes)  # 23.1μs -> 16.6μs (39.8% faster)
    assert result == {"": 999}

# -----------------------------
# Mutation Testing Guards
# -----------------------------

def test_mutation_missing_enumerate():
    # If names are not paired with their list positions, the indices will be wrong
    classes = ["a", "b", "c"]
    result = codeflash_output = create_classes_index(classes)  # 1.18μs -> 1.45μs (18.4% slower)
    assert result == {"a": 0, "b": 1, "c": 2}

def test_mutation_wrong_value_assignment():
    # If the function assigns the same value to all keys, this will fail
    classes = ["x", "y", "z"]
    result = codeflash_output = create_classes_index(classes)  # 1.07μs -> 1.29μs (17.1% slower)
    assert result == {"x": 0, "y": 1, "z": 2}

def test_mutation_wrong_key_assignment():
    # If the function uses indices as keys, this will fail
    classes = ["foo", "bar"]
    result = codeflash_output = create_classes_index(classes)  # 1.02μs -> 1.17μs (12.5% slower)
    assert result == {"foo": 0, "bar": 1}
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
```
```python
# imports
import pytest  # used for our unit tests
from inference.core.workflows.core_steps.formatters.vlm_as_detector.v2 import \
    create_classes_index

# unit tests

# 1. Basic Test Cases

def test_basic_single_class():
    # Test with a single class
    classes = ["cat"]
    expected = {"cat": 0}
    codeflash_output = create_classes_index(classes)  # 1.06μs -> 1.47μs (27.4% slower)
    assert codeflash_output == expected

def test_basic_multiple_classes():
    # Test with multiple distinct classes
    classes = ["cat", "dog", "bird"]
    expected = {"cat": 0, "dog": 1, "bird": 2}
    codeflash_output = create_classes_index(classes)  # 1.10μs -> 1.33μs (17.8% slower)
    assert codeflash_output == expected

def test_basic_empty_list():
    # Test with an empty list
    classes = []
    expected = {}
    codeflash_output = create_classes_index(classes)  # 885ns -> 1.21μs (26.6% slower)
    assert codeflash_output == expected

def test_basic_classes_with_spaces():
    # Test with class names containing spaces
    classes = ["red fox", "gray wolf", "polar bear"]
    expected = {"red fox": 0, "gray wolf": 1, "polar bear": 2}
    codeflash_output = create_classes_index(classes)  # 1.08μs -> 1.28μs (15.6% slower)
    assert codeflash_output == expected

def test_basic_classes_with_special_characters():
    # Test with class names containing special characters
    classes = ["cat$", "dog#", "@bird"]
    expected = {"cat$": 0, "dog#": 1, "@bird": 2}
    codeflash_output = create_classes_index(classes)  # 1.09μs -> 1.30μs (16.4% slower)
    assert codeflash_output == expected

# 2. Edge Test Cases

def test_edge_duplicate_class_names():
    # Test with duplicate class names; only the last occurrence should be kept
    classes = ["cat", "dog", "cat", "bird", "dog"]
    expected = {"cat": 2, "dog": 4, "bird": 3}
    codeflash_output = create_classes_index(classes)  # 1.11μs -> 1.27μs (12.5% slower)
    assert codeflash_output == expected

def test_edge_all_duplicates():
    # Test with all class names the same
    classes = ["cat", "cat", "cat"]
    expected = {"cat": 2}
    codeflash_output = create_classes_index(classes)  # 1.07μs -> 1.20μs (10.8% slower)
    assert codeflash_output == expected

def test_edge_empty_string_class_names():
    # Test with empty string class names
    classes = ["", "cat", ""]
    expected = {"": 2, "cat": 1}
    codeflash_output = create_classes_index(classes)  # 1.01μs -> 1.18μs (14.7% slower)
    assert codeflash_output == expected

def test_edge_numeric_class_names():
    # Test with numeric class names (as strings)
    classes = ["1", "2", "3"]
    expected = {"1": 0, "2": 1, "3": 2}
    codeflash_output = create_classes_index(classes)  # 1.33μs -> 1.78μs (25.5% slower)
    assert codeflash_output == expected

def test_edge_unicode_class_names():
    # Test with unicode class names
    classes = ["猫", "狗", "鸟"]
    expected = {"猫": 0, "狗": 1, "鸟": 2}
    codeflash_output = create_classes_index(classes)  # 1.33μs -> 1.81μs (26.7% slower)
    assert codeflash_output == expected

def test_edge_long_class_names():
    # Test with very long class names
    classes = ["a" * 100, "b" * 200, "c" * 300]
    expected = {"a" * 100: 0, "b" * 200: 1, "c" * 300: 2}
    codeflash_output = create_classes_index(classes)  # 1.08μs -> 1.32μs (18.0% slower)
    assert codeflash_output == expected

def test_edge_class_names_with_newlines_and_tabs():
    # Test with class names containing newline, tab, and carriage-return characters
    classes = ["cat\n", "dog\t", "bird\r"]
    expected = {"cat\n": 0, "dog\t": 1, "bird\r": 2}
    codeflash_output = create_classes_index(classes)  # 1.09μs -> 1.35μs (19.0% slower)
    assert codeflash_output == expected

# 3. Large Scale Test Cases

def test_large_scale_1000_unique_classes():
    # Test with 1000 unique class names
    classes = [f"class_{i}" for i in range(1000)]
    expected = {f"class_{i}": i for i in range(1000)}
    codeflash_output = create_classes_index(classes)  # 53.5μs -> 47.0μs (13.8% faster)
    assert codeflash_output == expected

def test_large_scale_1000_duplicates():
    # Test with 1000 class names, all the same
    classes = ["cat"] * 1000
    expected = {"cat": 999}
    codeflash_output = create_classes_index(classes)  # 23.3μs -> 16.7μs (39.1% faster)
    assert codeflash_output == expected

def test_large_scale_mixed_duplicates():
    # Test with 500 "cat", 500 "dog"
    classes = ["cat"] * 500 + ["dog"] * 500
    expected = {"cat": 499, "dog": 999}
    codeflash_output = create_classes_index(classes)  # 23.3μs -> 16.4μs (41.7% faster)
    assert codeflash_output == expected

def test_large_scale_alternating_duplicates():
    # Test with alternating duplicate class names
    classes = ["cat" if i % 2 == 0 else "dog" for i in range(1000)]
    # Last occurrence of "cat" is at index 998, "dog" at 999
    expected = {"cat": 998, "dog": 999}
    codeflash_output = create_classes_index(classes)  # 22.9μs -> 16.0μs (42.9% faster)
    assert codeflash_output == expected

def test_large_scale_long_names_and_duplicates():
    # Test with long class names and duplicates
    classes = ["a" * 50] * 500 + ["b" * 60] * 500
    expected = {"a" * 50: 499, "b" * 60: 999}
    codeflash_output = create_classes_index(classes)  # 23.5μs -> 16.4μs (43.2% faster)
    assert codeflash_output == expected

# Additional edge case: input is not a list

def test_edge_input_none():
    # Test with input being None; should raise TypeError
    classes = None
    with pytest.raises(TypeError):
        create_classes_index(classes)  # 1.49μs -> 1.26μs (18.0% faster)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
```
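The generated tests record codeflash_output so that the original and optimized outputs can be compared. A similar differential check can be written by hand. The sketch below is an assumption about how one might do this, not Codeflash's actual harness, and its function names are hypothetical. It runs both expressions on random lists drawn from a tiny alphabet, so duplicates and empty strings are common, and it also compares key order.

```python
import random
import string
from typing import Dict, List


def reference_index(classes: List[str]) -> Dict[str, int]:
    # Hypothetical name: original implementation (comprehension over enumerate)
    return {class_name: idx for idx, class_name in enumerate(classes)}


def optimized_index(classes: List[str]) -> Dict[str, int]:
    # Hypothetical name: optimized implementation (dict over zip)
    return dict(zip(classes, range(len(classes))))


def check_equivalence(trials: int = 500, seed: int = 0) -> None:
    rng = random.Random(seed)
    for _ in range(trials):
        # Names of length 0-2 over "abc", so repeats are very likely
        size = rng.randint(0, 50)
        classes = [
            "".join(rng.choices(string.ascii_lowercase[:3], k=rng.randint(0, 2)))
            for _ in range(size)
        ]
        expected = reference_index(classes)
        actual = optimized_index(classes)
        # dict equality ignores order, so check key order separately
        assert actual == expected, classes
        assert list(actual) == list(expected), classes


if __name__ == "__main__":
    check_equivalence()
    print("original and optimized outputs match")
```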

To edit these changes, run `git checkout codeflash/optimize-create_classes_index-mh9xb07z` and push.

codeflash-ai bot requested a review from mashraf-222 on October 28, 2025 02:04
codeflash-ai bot added the labels ⚡️ codeflash (Optimization PR opened by Codeflash AI) and 🎯 Quality: High (Optimization Quality according to Codeflash) on Oct 28, 2025