⚡️ Speed up function reconstruct_multicond_batch by 9%
#8
📄 9% (0.09x) speedup for `reconstruct_multicond_batch` in `modules/prompt_parser.py`
⏱️ Runtime: 1.73 milliseconds → 1.60 milliseconds (best of 224 runs)
📝 Explanation and details
The optimized code achieves an 8% speedup through two main improvements:
1. Smarter tensor padding in `stack_conds`:
   - An `any(tc != token_count for tc in token_counts)` check skips the padding pass entirely when all tensors already share the same token count.
   - `tensor[-1:].repeat([rows_to_add, 1])` is replaced with `tensor[-1:].expand(rows_to_add, -1)`; `expand` creates a view without copying data, while `repeat` allocates new memory.
   - Padded tensors are written back into `tensors` in place, avoiding potential memory fragmentation.
2. Micro-optimizations in schedule lookup:
   - Local `schedules` and `n_schedules` variables avoid repeated attribute lookups in the tight loop.

These optimizations are particularly effective for large batches and for prompts whose token counts vary.
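The schedule-lookup micro-optimization amounts to hoisting attribute and `len` lookups into locals before a hot loop. A minimal standalone sketch (the `Composable` class and `find_schedule` function here are hypothetical stand-ins, not the actual `modules/prompt_parser.py` code):

```python
# Hypothetical illustration of caching attribute lookups outside a tight loop.
class Composable:
    def __init__(self, schedules):
        # schedules: list of (end_at_step, cond) pairs, ordered by end_at_step
        self.schedules = schedules

def find_schedule(composable, step):
    # Before: composable.schedules and len(...) would be re-evaluated on every
    # iteration. Caching them in locals lets Python resolve them via fast
    # local-variable access instead of repeated attribute lookups.
    schedules = composable.schedules
    n_schedules = len(schedules)
    target = 0
    for i in range(n_schedules):
        end_at_step, _ = schedules[i]
        target = i
        if step <= end_at_step:
            break
    return schedules[target]
```

The speedup per iteration is tiny, but `reconstruct_multicond_batch` runs this lookup once per composable prompt per batch, so the savings accumulate.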
The performance gains come from reducing memory allocations and copies, especially beneficial when processing large batches or when tensor shapes vary significantly, which are common in prompt conditioning workflows.
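The padding change can be sketched with NumPy, whose `np.broadcast_to` returns a view much like torch's `expand`. This is an illustrative stand-in under assumed shapes, not the actual `stack_conds` implementation:

```python
import numpy as np

def stack_conds_sketch(tensors):
    # Pad shorter cond tensors by repeating their last row, then stack.
    token_counts = [t.shape[0] for t in tensors]
    token_count = max(token_counts)
    # Skip the padding pass entirely when all lengths already match.
    if any(tc != token_count for tc in token_counts):
        for i, t in enumerate(tensors):
            rows_to_add = token_count - t.shape[0]
            if rows_to_add > 0:
                # broadcast_to creates a read-only view (like torch's expand);
                # np.repeat would allocate and copy new memory instead.
                last_row = np.broadcast_to(t[-1:], (rows_to_add, t.shape[1]))
                tensors[i] = np.vstack([t, last_row])  # update the list in place
    return np.stack(tensors)
```

With inputs of shapes `(2, 4)` and `(3, 4)`, the shorter tensor is padded to three rows by repeating its last row, and the result has shape `(2, 3, 4)`; when all inputs already match, the padding loop never runs.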
✅ Correctness verification report:
🌀 Generated Regression Tests and Runtime
To edit these changes, run `git checkout codeflash/optimize-reconstruct_multicond_batch-mh9zb8d9` and push.