
Conversation


@codeflash-ai codeflash-ai bot commented Oct 27, 2025

📄 14% (0.14x) speedup for get_all_outputs_kind_major in inference/core/workflows/execution_engine/introspection/connections_discovery.py

⏱️ Runtime : 1.07 milliseconds → 943 microseconds (best of 254 runs)

📝 Explanation and details

Optimization summary

  • Method caching: Cache the frequently used kind_major_step_outputs[WILDCARD_KIND.name].add bound method outside the loop, avoiding a repeated dict and method lookup on every iteration.
  • Attribute lookup reduction: Read block.block_class once per block and reuse a local variable inside the inner loops, since attribute lookup is slower than local variable access.
  • Minor local variable cache: Alias kind_major_step_outputs to a local variable in the inner loop; this micro-optimization cuts one name lookup per iteration.
  • No behavioral or signature changes: The function returns exactly the same dictionary structure, preserves all behavior, and keeps the same input and output types. No external state is mutated.
  • No excessive new comments or code style changes: The original code style is maintained.

These micro-optimizations improve tight-loop speed, especially when blocks_description.blocks and their output manifests are reasonably large.
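The optimized function body is not shown in this view, so here is a hedged sketch reconstructed from the summary above and the generated tests: `Kind`, `WILDCARD_NAME`, and the `_sketch` suffix are stand-ins, not the real identifiers from the inference package.

```python
from collections import defaultdict

class Kind:
    """Minimal stand-in for the engine's kind objects (only .name is used)."""
    def __init__(self, name):
        self.name = name

WILDCARD_NAME = "*"  # stand-in for WILDCARD_KIND.name

def get_all_outputs_kind_major_sketch(blocks_description):
    kind_major_step_outputs = defaultdict(set)
    # (1) Method caching: bind the wildcard bucket's .add once, outside the loop.
    register_wildcard = kind_major_step_outputs[WILDCARD_NAME].add
    for block in blocks_description.blocks:
        # (2) Attribute lookup reduction: hoist block.block_class into a local.
        block_class = block.block_class
        register_wildcard(block_class)
        # (3) Local alias for the mapping, saving one name lookup per inner iteration.
        outputs_by_kind = kind_major_step_outputs
        for output in block.outputs_manifest:
            for kind in output.kind:
                outputs_by_kind[kind.name].add(block_class)
    return kind_major_step_outputs
```

Under this reading, every block class lands in the wildcard bucket, and each kind name maps to the set of block classes that emit it, which matches the behavior the regression tests below exercise.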

Correctness verification report:

Test                           Status
⚙️ Existing Unit Tests         🔘 None Found
🌀 Generated Regression Tests  29 Passed
⏪ Replay Tests                🔘 None Found
🔎 Concolic Coverage Tests     🔘 None Found
📊 Tests Coverage              100.0%
🌀 Generated Regression Tests and Runtime
from collections import defaultdict
from typing import Dict, Set, Type

# imports
import pytest
from inference.core.workflows.execution_engine.introspection.connections_discovery import \
    get_all_outputs_kind_major

# --- Begin: Minimal stubs for dependencies ---

# Stub for WILDCARD_KIND
class KindStub:
    def __init__(self, name):
        self.name = name

WILDCARD_KIND = KindStub("WILDCARD")

# Stub for WorkflowBlock
class WorkflowBlock:
    pass

# Stub for OutputManifest
class OutputManifestStub:
    def __init__(self, kind):
        self.kind = kind  # List of KindStub

# Stub for BlockDescription
class BlockDescriptionStub:
    def __init__(self, block_class, outputs_manifest):
        self.block_class = block_class
        self.outputs_manifest = outputs_manifest  # List of OutputManifestStub

# Stub for BlocksDescription
class BlocksDescription:
    def __init__(self, blocks):
        self.blocks = blocks  # List of BlockDescriptionStub

# unit tests

# ----------- Basic Test Cases -----------

def test_single_block_single_output_single_kind():
    # One block, one output, one kind
    class BlockA(WorkflowBlock): pass
    kind1 = KindStub("kind1")
    output = OutputManifestStub([kind1])
    block = BlockDescriptionStub(BlockA, [output])
    blocks_desc = BlocksDescription([block])
    codeflash_output = get_all_outputs_kind_major(blocks_desc); result = codeflash_output # 2.27μs -> 2.46μs (7.74% slower)

def test_multiple_blocks_multiple_kinds():
    # Two blocks, each with different output kinds
    class BlockA(WorkflowBlock): pass
    class BlockB(WorkflowBlock): pass
    kind1 = KindStub("kind1")
    kind2 = KindStub("kind2")
    outputA = OutputManifestStub([kind1])
    outputB = OutputManifestStub([kind2])
    blockA = BlockDescriptionStub(BlockA, [outputA])
    blockB = BlockDescriptionStub(BlockB, [outputB])
    blocks_desc = BlocksDescription([blockA, blockB])
    codeflash_output = get_all_outputs_kind_major(blocks_desc); result = codeflash_output # 2.08μs -> 2.15μs (3.26% slower)

def test_block_with_multiple_output_kinds():
    # One block, one output, multiple kinds
    class BlockA(WorkflowBlock): pass
    kind1 = KindStub("kind1")
    kind2 = KindStub("kind2")
    output = OutputManifestStub([kind1, kind2])
    block = BlockDescriptionStub(BlockA, [output])
    blocks_desc = BlocksDescription([block])
    codeflash_output = get_all_outputs_kind_major(blocks_desc); result = codeflash_output # 1.74μs -> 1.91μs (8.95% slower)

def test_block_with_multiple_outputs_each_with_kinds():
    # One block, multiple outputs, each with different kind
    class BlockA(WorkflowBlock): pass
    kind1 = KindStub("kind1")
    kind2 = KindStub("kind2")
    output1 = OutputManifestStub([kind1])
    output2 = OutputManifestStub([kind2])
    block = BlockDescriptionStub(BlockA, [output1, output2])
    blocks_desc = BlocksDescription([block])
    codeflash_output = get_all_outputs_kind_major(blocks_desc); result = codeflash_output # 1.71μs -> 1.85μs (8.03% slower)

# ----------- Edge Test Cases -----------

def test_no_blocks():
    # No blocks in description
    blocks_desc = BlocksDescription([])
    codeflash_output = get_all_outputs_kind_major(blocks_desc); result = codeflash_output # 710ns -> 1.12μs (36.8% slower)

def test_block_with_no_outputs():
    # Block with no outputs_manifest
    class BlockA(WorkflowBlock): pass
    block = BlockDescriptionStub(BlockA, [])
    blocks_desc = BlocksDescription([block])
    codeflash_output = get_all_outputs_kind_major(blocks_desc); result = codeflash_output # 1.36μs -> 1.41μs (3.27% slower)

def test_output_with_empty_kind_list():
    # Block with output whose kind list is empty
    class BlockA(WorkflowBlock): pass
    output = OutputManifestStub([])
    block = BlockDescriptionStub(BlockA, [output])
    blocks_desc = BlocksDescription([block])
    codeflash_output = get_all_outputs_kind_major(blocks_desc); result = codeflash_output # 1.36μs -> 1.45μs (6.55% slower)

def test_duplicate_kinds_across_outputs():
    # Block with multiple outputs, same kind in both
    class BlockA(WorkflowBlock): pass
    kind1 = KindStub("kind1")
    output1 = OutputManifestStub([kind1])
    output2 = OutputManifestStub([kind1])
    block = BlockDescriptionStub(BlockA, [output1, output2])
    blocks_desc = BlocksDescription([block])
    codeflash_output = get_all_outputs_kind_major(blocks_desc); result = codeflash_output # 1.79μs -> 1.90μs (5.74% slower)

def test_multiple_blocks_same_kind():
    # Multiple blocks, all outputting the same kind
    class BlockA(WorkflowBlock): pass
    class BlockB(WorkflowBlock): pass
    kind1 = KindStub("kind1")
    outputA = OutputManifestStub([kind1])
    outputB = OutputManifestStub([kind1])
    blockA = BlockDescriptionStub(BlockA, [outputA])
    blockB = BlockDescriptionStub(BlockB, [outputB])
    blocks_desc = BlocksDescription([blockA, blockB])
    codeflash_output = get_all_outputs_kind_major(blocks_desc); result = codeflash_output # 1.82μs -> 1.99μs (8.21% slower)

def test_kind_name_collision_with_wildcard():
    # Kind with name 'WILDCARD' (collision)
    class BlockA(WorkflowBlock): pass
    kind_wildcard = KindStub("WILDCARD")
    output = OutputManifestStub([kind_wildcard])
    block = BlockDescriptionStub(BlockA, [output])
    blocks_desc = BlocksDescription([block])
    codeflash_output = get_all_outputs_kind_major(blocks_desc); result = codeflash_output # 1.57μs -> 1.75μs (10.1% slower)

def test_block_class_identity():
    # Use two blocks with same name but different classes
    class BlockA1(WorkflowBlock): pass
    class BlockA2(WorkflowBlock): pass
    kind1 = KindStub("kind1")
    output1 = OutputManifestStub([kind1])
    output2 = OutputManifestStub([kind1])
    block1 = BlockDescriptionStub(BlockA1, [output1])
    block2 = BlockDescriptionStub(BlockA2, [output2])
    blocks_desc = BlocksDescription([block1, block2])
    codeflash_output = get_all_outputs_kind_major(blocks_desc); result = codeflash_output # 1.85μs -> 1.96μs (5.58% slower)

# ----------- Large Scale Test Cases -----------

def test_large_number_of_blocks_and_kinds():
    # Many blocks, each with a unique kind
    num_blocks = 500
    block_classes = []
    blocks = []
    for i in range(num_blocks):
        # Dynamically create block classes
        block_class = type(f"Block{i}", (WorkflowBlock,), {})
        block_classes.append(block_class)
        kind = KindStub(f"kind{i}")
        output = OutputManifestStub([kind])
        block = BlockDescriptionStub(block_class, [output])
        blocks.append(block)
    blocks_desc = BlocksDescription(blocks)
    codeflash_output = get_all_outputs_kind_major(blocks_desc); result = codeflash_output # 144μs -> 132μs (8.80% faster)
    # Each kind should map to its block
    for i in range(num_blocks):
        assert block_classes[i] in result[f"kind{i}"]

def test_large_number_of_kinds_per_block():
    # One block, many kinds
    class BlockA(WorkflowBlock): pass
    num_kinds = 500
    kinds = [KindStub(f"kind{i}") for i in range(num_kinds)]
    output = OutputManifestStub(kinds)
    block = BlockDescriptionStub(BlockA, [output])
    blocks_desc = BlocksDescription([block])
    codeflash_output = get_all_outputs_kind_major(blocks_desc); result = codeflash_output # 61.6μs -> 58.1μs (6.08% faster)
    # Each kind should map to BlockA
    for i in range(num_kinds):
        assert BlockA in result[f"kind{i}"]

def test_large_number_of_blocks_with_shared_kinds():
    # Many blocks, all output the same set of kinds
    num_blocks = 300
    num_kinds = 5
    shared_kinds = [KindStub(f"kind{i}") for i in range(num_kinds)]
    block_classes = []
    blocks = []
    for i in range(num_blocks):
        block_class = type(f"Block{i}", (WorkflowBlock,), {})
        block_classes.append(block_class)
        output = OutputManifestStub(shared_kinds)
        block = BlockDescriptionStub(block_class, [output])
        blocks.append(block)
    blocks_desc = BlocksDescription(blocks)
    codeflash_output = get_all_outputs_kind_major(blocks_desc); result = codeflash_output # 109μs -> 94.1μs (16.8% faster)
    # Each kind should contain all blocks
    for i in range(num_kinds):
        assert result[f"kind{i}"] == set(block_classes)

def test_performance_large_empty_outputs():
    # Many blocks, all with empty outputs_manifest
    num_blocks = 800
    block_classes = []
    blocks = []
    for i in range(num_blocks):
        block_class = type(f"Block{i}", (WorkflowBlock,), {})
        block_classes.append(block_class)
        block = BlockDescriptionStub(block_class, [])
        blocks.append(block)
    blocks_desc = BlocksDescription(blocks)
    codeflash_output = get_all_outputs_kind_major(blocks_desc); result = codeflash_output # 81.3μs -> 62.5μs (30.0% faster)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
#------------------------------------------------
from collections import defaultdict
from typing import Dict, Set, Type

# imports
import pytest
from inference.core.workflows.execution_engine.introspection.connections_discovery import \
    get_all_outputs_kind_major


# Mocks for dependencies
class MockKind:
    def __init__(self, name):
        self.name = name

class MockOutput:
    def __init__(self, kinds):
        self.kind = kinds  # List[MockKind]

class MockWorkflowBlock:
    pass

class MockWorkflowBlockA(MockWorkflowBlock):
    pass

class MockWorkflowBlockB(MockWorkflowBlock):
    pass

class MockBlock:
    def __init__(self, block_class, outputs_manifest):
        self.block_class = block_class
        self.outputs_manifest = outputs_manifest  # List[MockOutput]

class MockBlocksDescription:
    def __init__(self, blocks):
        self.blocks = blocks

# Simulate WILDCARD_KIND from imported module
class MockWildcardKind:
    name = "__wildcard__"

WILDCARD_KIND = MockWildcardKind()

# unit tests

# 1. Basic Test Cases

def test_single_block_single_output_single_kind():
    # One block, one output, one kind
    kind1 = MockKind("image")
    output = MockOutput([kind1])
    block = MockBlock(MockWorkflowBlockA, [output])
    blocks_description = MockBlocksDescription([block])
    codeflash_output = get_all_outputs_kind_major(blocks_description); result = codeflash_output # 2.12μs -> 2.32μs (8.69% slower)

def test_single_block_multiple_outputs_multiple_kinds():
    # One block, two outputs, each with different kinds
    kind1 = MockKind("image")
    kind2 = MockKind("text")
    output1 = MockOutput([kind1])
    output2 = MockOutput([kind2])
    block = MockBlock(MockWorkflowBlockA, [output1, output2])
    blocks_description = MockBlocksDescription([block])
    codeflash_output = get_all_outputs_kind_major(blocks_description); result = codeflash_output # 1.78μs -> 2.02μs (11.7% slower)

def test_multiple_blocks_shared_and_unique_kinds():
    # Two blocks, some shared kinds, some unique
    kind1 = MockKind("image")
    kind2 = MockKind("text")
    kind3 = MockKind("audio")
    outputA = MockOutput([kind1, kind2])
    outputB = MockOutput([kind2, kind3])
    blockA = MockBlock(MockWorkflowBlockA, [outputA])
    blockB = MockBlock(MockWorkflowBlockB, [outputB])
    blocks_description = MockBlocksDescription([blockA, blockB])
    codeflash_output = get_all_outputs_kind_major(blocks_description); result = codeflash_output # 2.32μs -> 2.24μs (3.85% faster)

def test_block_with_no_outputs():
    # Block with empty outputs_manifest
    block = MockBlock(MockWorkflowBlockA, [])
    blocks_description = MockBlocksDescription([block])
    codeflash_output = get_all_outputs_kind_major(blocks_description); result = codeflash_output # 1.30μs -> 1.38μs (5.23% slower)

# 2. Edge Test Cases

def test_no_blocks():
    # Empty blocks_description
    blocks_description = MockBlocksDescription([])
    codeflash_output = get_all_outputs_kind_major(blocks_description); result = codeflash_output # 715ns -> 1.15μs (37.8% slower)

def test_output_with_empty_kind_list():
    # Block with output whose kind is empty list
    output = MockOutput([])
    block = MockBlock(MockWorkflowBlockA, [output])
    blocks_description = MockBlocksDescription([block])
    codeflash_output = get_all_outputs_kind_major(blocks_description); result = codeflash_output # 1.43μs -> 1.45μs (1.17% slower)

def test_duplicate_kinds_in_output():
    # Block with output whose kind list has duplicates
    kind1 = MockKind("image")
    output = MockOutput([kind1, kind1])
    block = MockBlock(MockWorkflowBlockA, [output])
    blocks_description = MockBlocksDescription([block])
    codeflash_output = get_all_outputs_kind_major(blocks_description); result = codeflash_output # 1.82μs -> 1.82μs (0.385% faster)

def test_multiple_blocks_same_class():
    # Two blocks, same class, different outputs
    kind1 = MockKind("image")
    kind2 = MockKind("text")
    block1 = MockBlock(MockWorkflowBlockA, [MockOutput([kind1])])
    block2 = MockBlock(MockWorkflowBlockA, [MockOutput([kind2])])
    blocks_description = MockBlocksDescription([block1, block2])
    codeflash_output = get_all_outputs_kind_major(blocks_description); result = codeflash_output # 1.96μs -> 2.06μs (5.19% slower)

def test_kind_name_collision():
    # Two different kind objects with same name
    kind1a = MockKind("image")
    kind1b = MockKind("image")
    blockA = MockBlock(MockWorkflowBlockA, [MockOutput([kind1a])])
    blockB = MockBlock(MockWorkflowBlockB, [MockOutput([kind1b])])
    blocks_description = MockBlocksDescription([blockA, blockB])
    codeflash_output = get_all_outputs_kind_major(blocks_description); result = codeflash_output # 1.83μs -> 2.00μs (8.52% slower)

def test_block_with_multiple_outputs_some_empty():
    # Block with some outputs having empty kind lists
    kind1 = MockKind("image")
    output1 = MockOutput([kind1])
    output2 = MockOutput([])
    block = MockBlock(MockWorkflowBlockA, [output1, output2])
    blocks_description = MockBlocksDescription([block])
    codeflash_output = get_all_outputs_kind_major(blocks_description); result = codeflash_output # 1.55μs -> 1.63μs (5.32% slower)

# 3. Large Scale Test Cases

def test_many_blocks_many_kinds():
    # 100 blocks, each with 5 outputs, each output with 2 unique kinds
    num_blocks = 100
    num_outputs = 5
    num_kinds_per_output = 2
    blocks = []
    kind_names = [f"kind_{i}" for i in range(num_blocks * num_outputs * num_kinds_per_output)]
    kind_objs = [MockKind(name) for name in kind_names]
    kind_idx = 0
    block_classes = []
    for b in range(num_blocks):
        class_name = f"Block_{b}"
        block_class = type(class_name, (MockWorkflowBlock,), {})
        block_classes.append(block_class)
        outputs = []
        for o in range(num_outputs):
            kinds = kind_objs[kind_idx:kind_idx+num_kinds_per_output]
            kind_idx += num_kinds_per_output
            outputs.append(MockOutput(kinds))
        blocks.append(MockBlock(block_class, outputs))
    blocks_description = MockBlocksDescription(blocks)
    codeflash_output = get_all_outputs_kind_major(blocks_description); result = codeflash_output # 174μs -> 157μs (10.6% faster)
    # Each kind should have exactly one block class
    for i, kind in enumerate(kind_objs):
        # Find which block it should belong to
        block_num = i // (num_outputs * num_kinds_per_output)
        block_class = block_classes[block_num]
        assert result[kind.name] == {block_class}

def test_large_number_of_kinds_per_block():
    # One block, 500 outputs, each output with a unique kind
    num_outputs = 500
    kind_objs = [MockKind(f"kind_{i}") for i in range(num_outputs)]
    outputs = [MockOutput([kind]) for kind in kind_objs]
    block = MockBlock(MockWorkflowBlockA, outputs)
    blocks_description = MockBlocksDescription([block])
    codeflash_output = get_all_outputs_kind_major(blocks_description); result = codeflash_output # 83.0μs -> 81.2μs (2.18% faster)
    # Each kind should have the block class
    for kind in kind_objs:
        assert MockWorkflowBlockA in result[kind.name]

def test_large_number_of_blocks_with_shared_kind():
    # 250 blocks, all output to the same kind
    num_blocks = 250
    shared_kind = MockKind("shared")
    block_classes = []
    blocks = []
    for i in range(num_blocks):
        class_name = f"Block_{i}"
        block_class = type(class_name, (MockWorkflowBlock,), {})
        block_classes.append(block_class)
        outputs = [MockOutput([shared_kind])]
        blocks.append(MockBlock(block_class, outputs))
    blocks_description = MockBlocksDescription(blocks)
    codeflash_output = get_all_outputs_kind_major(blocks_description); result = codeflash_output # 47.6μs -> 39.1μs (21.6% faster)

def test_performance_with_maximum_elements():
    # 1000 blocks, each with 1 output, each output with 1 unique kind
    num_blocks = 1000
    block_classes = []
    blocks = []
    for i in range(num_blocks):
        class_name = f"Block_{i}"
        block_class = type(class_name, (MockWorkflowBlock,), {})
        block_classes.append(block_class)
        kind = MockKind(f"kind_{i}")
        outputs = [MockOutput([kind])]
        blocks.append(MockBlock(block_class, outputs))
    blocks_description = MockBlocksDescription(blocks)
    codeflash_output = get_all_outputs_kind_major(blocks_description); result = codeflash_output # 337μs -> 279μs (20.7% faster)
    # Each kind includes its block class
    for i in range(num_blocks):
        assert block_classes[i] in result[f"kind_{i}"]
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

To edit these changes, run git checkout codeflash/optimize-get_all_outputs_kind_major-mh9pcucu and push.

Codeflash

@codeflash-ai codeflash-ai bot requested a review from mashraf-222 October 27, 2025 22:21
@codeflash-ai codeflash-ai bot added the ⚡️ codeflash Optimization PR opened by Codeflash AI label Oct 27, 2025