⚡️ Speed up function get_empty_batch_elements_indices by 61%
#588
📄 61% (0.61x) speedup for get_empty_batch_elements_indices in inference/core/workflows/execution_engine/v1/executor/execution_data_manager/step_input_assembler.py
⏱️ Runtime: 393 microseconds → 245 microseconds (best of 337 runs)
📝 Explanation and details
The optimized code replaces recursive function calls with an iterative approach using an explicit stack, delivering a roughly 60% speedup. The main reasons for the gain are:
Key Optimization: Recursive to Iterative Conversion
Performance Impact Analysis:
Eliminates expensive set unions: The original code performed result.union(value_result) operations (5-6.2% of total time), creating a new set object on every call. The optimized version adds indices directly to a single result set.
Reduces function call overhead: The line profiler shows the original made 2,251 recursive calls (lines with 1023+1228 hits), while the optimized version uses simple stack operations with no recursive call overhead.
Better memory efficiency: Instead of creating intermediate result sets that get merged, the optimized version maintains one result set and one stack.
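The three points above can be illustrated with a minimal sketch. It is not the code from this PR: ToyBatch is a hypothetical stand-in for the project's batch type, and both function names are made up for illustration. The first function merges per-call result sets with union, the second walks the same structure with an explicit stack and one shared result set.

```python
from typing import Any, Iterator, List, Set, Tuple

Index = Tuple[int, ...]


class ToyBatch:
    """Hypothetical stand-in for the project's batch type (an assumption)."""

    def __init__(self, content: List[Any], indices: List[Index]) -> None:
        self.content = content
        self.indices = indices

    def iter_with_indices(self) -> Iterator[Tuple[Index, Any]]:
        return zip(self.indices, self.content)


def empty_indices_recursive(value: Any) -> Set[Index]:
    # Recursive shape: every call builds its own set, and the caller
    # merges it with union(), which allocates a new set each time.
    result: Set[Index] = set()
    if isinstance(value, dict):
        for item in value.values():
            result = result.union(empty_indices_recursive(item))
    elif isinstance(value, list):
        for item in value:
            result = result.union(empty_indices_recursive(item))
    elif isinstance(value, ToyBatch):
        for index, element in value.iter_with_indices():
            if isinstance(element, ToyBatch):
                result = result.union(empty_indices_recursive(element))
            elif element is None:
                result.add(index)
    return result


def empty_indices_iterative(value: Any) -> Set[Index]:
    # Iterative shape: one result set and one explicit stack of
    # containers that still need to be visited.
    result: Set[Index] = set()
    stack: List[Any] = [value]
    while stack:
        current = stack.pop()
        if isinstance(current, dict):
            stack.extend(current.values())
        elif isinstance(current, list):
            stack.extend(current)
        elif isinstance(current, ToyBatch):
            for index, element in current.iter_with_indices():
                if isinstance(element, ToyBatch):
                    stack.append(element)
                elif element is None:
                    result.add(index)
    return result


# Small nested input: a dict holding a batch and a list with a nested batch.
inner = ToyBatch([None, 2], [(2, 0), (2, 1)])
nested = {
    "a": ToyBatch([1, None], [(0,), (1,)]),
    "b": [ToyBatch([inner], [(2,)])],
}
recursive_result = empty_indices_recursive(nested)
iterative_result = empty_indices_iterative(nested)
```

Because the result is a set, the order in which the stack visits containers does not affect the output, which is what makes the conversion behavior-preserving in this sketch.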
Test Case Performance Patterns:
The optimization is most beneficial for workloads with deeply nested or large collections of batches, where the original recursive approach created significant call stack and memory allocation overhead.
✅ Correctness verification report:
⚙️ Existing Unit Tests and Runtime
workflows/unit_tests/execution_engine/executor/execution_data_manager/test_step_input_assembler.py::test_get_empty_batch_elements_indices_from_dict_of_batches
workflows/unit_tests/execution_engine/executor/execution_data_manager/test_step_input_assembler.py::test_get_empty_batch_elements_indices_from_list_of_batches
workflows/unit_tests/execution_engine/executor/execution_data_manager/test_step_input_assembler.py::test_get_empty_batch_elements_indices_from_non_batch_elements
workflows/unit_tests/execution_engine/executor/execution_data_manager/test_step_input_assembler.py::test_get_empty_batch_elements_indices_from_single_batch
🌀 Generated Regression Tests and Runtime
To edit these changes, run git checkout codeflash/optimize-get_empty_batch_elements_indices-mh9v16vh and push.