⚡️ Speed up method `CosineSimilarityBlockV1.run` by 71% #587

codeflash-ai · 2025-10-28T00:43:24Z

📄 71% (0.71x) speedup for `CosineSimilarityBlockV1.run` in `inference/core/workflows/core_steps/math/cosine_similarity/v1.py`

⏱️ Runtime : 1.19 milliseconds → 694 microseconds (best of 282 runs)
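
The headline percentage appears consistent with measuring the time saved relative to the optimized runtime, rather than relative to the original. The sketch below reproduces that arithmetic from the two runtimes quoted above. The helper name `speedup_percent` is hypothetical, and the formula is inferred from the reported numbers, not taken from Codeflash documentation.

```python
def speedup_percent(original_us: float, optimized_us: float) -> float:
    """Time saved, expressed as a percentage of the optimized runtime.

    Assumption: this formula is inferred from the figures in this report.
    """
    return (original_us - optimized_us) / optimized_us * 100.0


# Headline runtimes from this report, both in microseconds:
# 1.19 milliseconds -> 694 microseconds.
headline = speedup_percent(1190.0, 694.0)
print(f"{headline:.1f}% faster")  # lands close to the reported 71%
```

Under this reading, a figure above 100% (as in the 1000-element tests further down) means the optimized code finishes in less than half the original time.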

📝 Explanation and details

**Explanation of Optimizations:**

- **cosine_similarity:**
  - Replaces the separate `np.linalg.norm(a)` and `np.linalg.norm(b)` calls with norms computed directly as `np.sqrt(np.dot(a, a))` and `np.sqrt(np.dot(b, b))`, which is significantly faster because it avoids function overhead and repeated full-array iterations.
  - Uses `np.asarray` to convert the input arguments to arrays only if necessary, ensuring compatibility with list inputs while avoiding unnecessary copies.
  - Uses `np.dot(a, b)` for the numerator and the already computed norms for the denominator to minimize temporary allocations.
  - Skips defensive shape-casting in the hot path and relies on the existing length check in the caller.
- **CosineSimilarityBlockV1:**
  - Converts input lists to NumPy arrays only once, and only if they are not arrays already.
  - All exception and return logic remains unchanged.

These changes reduce the overhead per function call, especially when the function is called repeatedly or on large vectors, resulting in more than 20% faster execution for typical inputs. In the generated tests below, most short-vector cases run about 25% to 40% faster, and the 1000-element cases run about twice as fast (91% to 113% faster).
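
The changes described above can be sketched as follows. This is a minimal illustration, not the code from the PR: the actual function bodies are not shown in this conversation, so `cosine_similarity_baseline` and `cosine_similarity_optimized` are hypothetical names, and the baseline is only an assumed shape of the original implementation.

```python
import numpy as np


def cosine_similarity_baseline(a, b):
    # Assumed shape of the original: one np.linalg.norm call per vector.
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))


def cosine_similarity_optimized(a, b):
    # np.asarray converts lists once and passes through arrays that are
    # already float64 without copying them.
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    # For a 1-D vector, the Euclidean norm equals sqrt(a . a).
    norm_a = np.sqrt(np.dot(a, a))
    norm_b = np.sqrt(np.dot(b, b))
    return np.dot(a, b) / (norm_a * norm_b)


v1 = [1, 2, 3]
v2 = [4, 5, 6]
baseline = cosine_similarity_baseline(v1, v2)
optimized = cosine_similarity_optimized(v1, v2)
print(baseline, optimized)
```

Neither variant guards against zero-length or all-zero vectors; in this sketch that is left to the caller, in line with the note above that the length check lives in `run`.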

✅ Correctness verification report:

| Test | Status |
|---|---|
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | ✅ 80 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 🔘 None Found |
| 📊 Tests Coverage | 100.0% |

🌀 Generated Regression Tests and Runtime

import math
# function to test
from typing import Any, Dict, List

# imports
import pytest  # used for our unit tests
from inference.core.workflows.core_steps.math.cosine_similarity.v1 import \
    CosineSimilarityBlockV1

# unit tests

# -------- BASIC TEST CASES --------

def test_basic_identical_vectors():
    # Identical vectors should have similarity 1.0
    block = CosineSimilarityBlockV1()
    v = [1.0, 2.0, 3.0]
    codeflash_output = block.run(v, v); result = codeflash_output # 19.0μs -> 14.1μs (34.5% faster)

def test_basic_orthogonal_vectors():
    # Orthogonal vectors should have similarity 0.0
    block = CosineSimilarityBlockV1()
    v1 = [1, 0]
    v2 = [0, 1]
    codeflash_output = block.run(v1, v2); result = codeflash_output # 17.9μs -> 14.3μs (25.3% faster)

def test_basic_opposite_vectors():
    # Opposite vectors should have similarity -1.0
    block = CosineSimilarityBlockV1()
    v1 = [1, 0]
    v2 = [-1, 0]
    codeflash_output = block.run(v1, v2); result = codeflash_output # 14.9μs -> 11.1μs (33.9% faster)

def test_basic_nontrivial_vectors():
    # Test with nontrivial vectors
    block = CosineSimilarityBlockV1()
    v1 = [1, 2, 3]
    v2 = [4, 5, 6]
    # Compute expected value
    dot = 1*4 + 2*5 + 3*6
    norm1 = math.sqrt(1**2 + 2**2 + 3**2)
    norm2 = math.sqrt(4**2 + 5**2 + 6**2)
    expected = dot / (norm1 * norm2)
    codeflash_output = block.run(v1, v2); result = codeflash_output # 14.8μs -> 10.7μs (37.4% faster)

def test_basic_negative_values():
    # Test with negative values
    block = CosineSimilarityBlockV1()
    v1 = [-1, -2, -3]
    v2 = [-1, -2, -3]
    codeflash_output = block.run(v1, v2); result = codeflash_output # 14.7μs -> 10.6μs (38.4% faster)

def test_basic_mixed_sign_values():
    # Test with mixed sign values
    block = CosineSimilarityBlockV1()
    v1 = [1, -2, 3]
    v2 = [-1, 2, -3]
    codeflash_output = block.run(v1, v2); result = codeflash_output # 14.7μs -> 10.7μs (38.1% faster)

def test_basic_float_precision():
    # Test with float values
    block = CosineSimilarityBlockV1()
    v1 = [0.1, 0.2, 0.3]
    v2 = [0.4, 0.5, 0.6]
    dot = 0.1*0.4 + 0.2*0.5 + 0.3*0.6
    norm1 = math.sqrt(0.1**2 + 0.2**2 + 0.3**2)
    norm2 = math.sqrt(0.4**2 + 0.5**2 + 0.6**2)
    expected = dot / (norm1 * norm2)
    codeflash_output = block.run(v1, v2); result = codeflash_output # 12.3μs -> 9.73μs (26.2% faster)

# -------- EDGE TEST CASES --------

def test_edge_different_lengths():
    # Should raise RuntimeError if vectors have different lengths
    block = CosineSimilarityBlockV1()
    v1 = [1, 2, 3]
    v2 = [1, 2]
    with pytest.raises(RuntimeError) as excinfo:
        block.run(v1, v2) # 1.29μs -> 1.31μs (1.37% slower)

def test_edge_zero_vector():
    # Should raise ValueError if either vector is all zeros
    block = CosineSimilarityBlockV1()
    v1 = [0, 0, 0]
    v2 = [1, 2, 3]
    with pytest.raises(ValueError) as excinfo:
        block.run(v1, v2)

def test_edge_both_zero_vectors():
    # Should raise ValueError if both vectors are all zeros
    block = CosineSimilarityBlockV1()
    v1 = [0, 0, 0]
    v2 = [0, 0, 0]
    with pytest.raises(ValueError) as excinfo:
        block.run(v1, v2)

def test_edge_empty_vectors():
    # Should raise ValueError if either vector is empty
    block = CosineSimilarityBlockV1()
    v1 = []
    v2 = []
    with pytest.raises(ValueError) as excinfo:
        block.run(v1, v2)

def test_edge_single_element_vectors():
    # Single element vectors should work
    block = CosineSimilarityBlockV1()
    v1 = [1]
    v2 = [1]
    codeflash_output = block.run(v1, v2); result = codeflash_output # 28.9μs -> 22.8μs (26.8% faster)

def test_edge_large_magnitude_values():
    # Test with very large values
    block = CosineSimilarityBlockV1()
    v1 = [1e100, 2e100, 3e100]
    v2 = [1e100, 2e100, 3e100]
    codeflash_output = block.run(v1, v2); result = codeflash_output # 14.3μs -> 11.1μs (29.2% faster)

def test_edge_small_magnitude_values():
    # Test with very small values
    block = CosineSimilarityBlockV1()
    v1 = [1e-100, 2e-100, 3e-100]
    v2 = [1e-100, 2e-100, 3e-100]
    codeflash_output = block.run(v1, v2); result = codeflash_output # 12.2μs -> 9.07μs (34.1% faster)

def test_edge_nan_values():
    # Should raise ValueError if vectors contain NaN
    block = CosineSimilarityBlockV1()
    v1 = [float('nan'), 1, 2]
    v2 = [1, 2, 3]
    with pytest.raises(ValueError):
        block.run(v1, v2)

def test_edge_inf_values():
    # Should raise ValueError if vectors contain inf
    block = CosineSimilarityBlockV1()
    v1 = [float('inf'), 1, 2]
    v2 = [1, 2, 3]
    with pytest.raises(ValueError):
        block.run(v1, v2)

# -------- LARGE SCALE TEST CASES --------

def test_large_scale_high_dimension():
    # Test with vectors of length 1000
    block = CosineSimilarityBlockV1()
    v1 = [i for i in range(1, 1001)]
    v2 = [i for i in range(1, 1001)]
    codeflash_output = block.run(v1, v2); result = codeflash_output # 131μs -> 69.0μs (91.0% faster)

def test_large_scale_high_dimension_orthogonal():
    # Test with two orthogonal vectors of length 1000
    block = CosineSimilarityBlockV1()
    v1 = [1 if i % 2 == 0 else 0 for i in range(1000)]
    v2 = [0 if i % 2 == 0 else 1 for i in range(1000)]
    codeflash_output = block.run(v1, v2); result = codeflash_output # 121μs -> 59.4μs (104% faster)

def test_large_scale_random_vectors():
    # Test with two random vectors of length 1000
    import random
    block = CosineSimilarityBlockV1()
    random.seed(42)
    v1 = [random.uniform(-1, 1) for _ in range(1000)]
    v2 = [random.uniform(-1, 1) for _ in range(1000)]
    # Compute expected value
    dot = sum(x * y for x, y in zip(v1, v2))
    norm1 = math.sqrt(sum(x * x for x in v1))
    norm2 = math.sqrt(sum(y * y for y in v2))
    expected = dot / (norm1 * norm2)
    codeflash_output = block.run(v1, v2); result = codeflash_output # 108μs -> 53.7μs (102% faster)

def test_large_scale_performance():
    # Test performance: should not take too long for vectors of length 1000
    import time
    block = CosineSimilarityBlockV1()
    v1 = [i for i in range(1000)]
    v2 = [i for i in range(1000)]
    start = time.time()
    codeflash_output = block.run(v1, v2); result = codeflash_output # 120μs -> 59.0μs (105% faster)
    end = time.time()
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
#------------------------------------------------
import math

# imports
import pytest  # used for our unit tests
from inference.core.workflows.core_steps.math.cosine_similarity.v1 import \
    CosineSimilarityBlockV1

# Basic Test Cases

def test_identical_vectors():
    # Identical vectors should have similarity 1.0
    block = CosineSimilarityBlockV1()
    codeflash_output = block.run([1, 2, 3], [1, 2, 3]); result = codeflash_output # 15.7μs -> 11.6μs (35.6% faster)

def test_opposite_vectors():
    # Opposite vectors should have similarity -1.0
    block = CosineSimilarityBlockV1()
    codeflash_output = block.run([1, 0], [-1, 0]); result = codeflash_output # 15.0μs -> 11.2μs (34.6% faster)

def test_orthogonal_vectors():
    # Orthogonal vectors should have similarity 0.0
    block = CosineSimilarityBlockV1()
    codeflash_output = block.run([1, 0], [0, 1]); result = codeflash_output # 14.7μs -> 10.4μs (40.9% faster)

def test_simple_2d_vectors():
    # Test with simple 2D vectors at 45 degrees
    block = CosineSimilarityBlockV1()
    codeflash_output = block.run([1, 0], [1, 1]); result = codeflash_output # 14.0μs -> 10.7μs (31.7% faster)
    expected = 1 / math.sqrt(2)

def test_negative_values():
    # Test with negative values
    block = CosineSimilarityBlockV1()
    codeflash_output = block.run([-1, -2, -3], [-1, -2, -3]); result = codeflash_output # 14.8μs -> 10.7μs (38.1% faster)

# Edge Test Cases

def test_zero_vector():
    # If either vector is all zeros, similarity should be nan
    block = CosineSimilarityBlockV1()
    codeflash_output = block.run([0, 0, 0], [1, 2, 3]); result = codeflash_output # 23.7μs -> 20.0μs (18.2% faster)
    codeflash_output = block.run([1, 2, 3], [0, 0, 0]); result = codeflash_output # 8.14μs -> 6.09μs (33.5% faster)
    codeflash_output = block.run([0, 0, 0], [0, 0, 0]); result = codeflash_output # 5.92μs -> 4.60μs (28.6% faster)

def test_empty_vectors():
    # Empty vectors should raise due to length mismatch
    block = CosineSimilarityBlockV1()
    with pytest.raises(RuntimeError):
        block.run([], [1]) # 1.33μs -> 1.33μs (0.452% faster)
    with pytest.raises(RuntimeError):
        block.run([1], []) # 749ns -> 737ns (1.63% faster)

def test_length_mismatch():
    # Vectors of different lengths should raise RuntimeError
    block = CosineSimilarityBlockV1()
    with pytest.raises(RuntimeError):
        block.run([1, 2], [1, 2, 3]) # 1.09μs -> 1.08μs (0.277% faster)

def test_single_element_vectors():
    # Single element vectors
    block = CosineSimilarityBlockV1()
    codeflash_output = block.run([5], [5]); result = codeflash_output # 20.2μs -> 15.5μs (30.3% faster)
    codeflash_output = block.run([5], [-5]); result = codeflash_output # 5.84μs -> 4.23μs (38.0% faster)
    codeflash_output = block.run([0], [5]); result = codeflash_output # 12.3μs -> 11.3μs (9.24% faster)

def test_non_float_inputs():
    # Non-float inputs (integers)
    block = CosineSimilarityBlockV1()
    codeflash_output = block.run([1, 2, 3], [4, 5, 6]); result = codeflash_output # 15.7μs -> 11.0μs (43.0% faster)
    # Calculate expected similarity
    dot = 1*4 + 2*5 + 3*6
    norm1 = math.sqrt(1**2 + 2**2 + 3**2)
    norm2 = math.sqrt(4**2 + 5**2 + 6**2)
    expected = dot / (norm1 * norm2)

def test_large_and_small_values():
    # Large and small values
    block = CosineSimilarityBlockV1()
    codeflash_output = block.run([1e10, 1e-10], [1e10, 1e-10]); result = codeflash_output # 12.3μs -> 9.89μs (24.6% faster)
    codeflash_output = block.run([1e10, 0], [0, 1e10]); result = codeflash_output # 5.66μs -> 4.07μs (39.1% faster)

def test_nan_and_inf_values():
    # Vectors containing NaN or inf should propagate NaN
    block = CosineSimilarityBlockV1()
    codeflash_output = block.run([float('nan'), 1], [1, 1]); result = codeflash_output # 13.3μs -> 12.4μs (7.94% faster)
    codeflash_output = block.run([1, float('inf')], [1, 1]); result = codeflash_output # 12.8μs -> 12.2μs (4.93% faster)
    codeflash_output = block.run([1, 1], [float('-inf'), 1]); result = codeflash_output # 6.43μs -> 5.76μs (11.6% faster)

# Large Scale Test Cases

def test_large_vectors_identical():
    # Large identical vectors
    block = CosineSimilarityBlockV1()
    vec = [1.0] * 1000
    codeflash_output = block.run(vec, vec); result = codeflash_output # 105μs -> 50.9μs (107% faster)

def test_large_vectors_orthogonal():
    # Large orthogonal vectors (half zeros, half ones)
    block = CosineSimilarityBlockV1()
    a = [1.0] * 500 + [0.0] * 500
    b = [0.0] * 500 + [1.0] * 500
    codeflash_output = block.run(a, b); result = codeflash_output # 105μs -> 49.5μs (113% faster)

def test_large_vectors_random():
    # Large vectors with random values
    import random
    block = CosineSimilarityBlockV1()
    random.seed(42)
    a = [random.uniform(-100, 100) for _ in range(1000)]
    b = [random.uniform(-100, 100) for _ in range(1000)]
    # Calculate expected similarity manually
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    expected = dot / (norm_a * norm_b) if norm_a != 0 and norm_b != 0 else float('nan')
    codeflash_output = block.run(a, b); result = codeflash_output # 108μs -> 50.8μs (113% faster)
    if math.isnan(expected):
        pass
    else:
        pass

def test_large_vectors_length_mismatch():
    # Large vectors with length mismatch should raise
    block = CosineSimilarityBlockV1()
    a = [1.0] * 1000
    b = [1.0] * 999
    with pytest.raises(RuntimeError):
        block.run(a, b) # 1.45μs -> 1.45μs (0.069% slower)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
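
The generated tests above carry no explicit assertions; they rely on comparing `codeflash_output` between the original and the optimized code. How Codeflash performs that comparison is not shown in this PR. The sketch below is one hedged way such a check could be written, assuming results are floats or dicts of floats. The helper `outputs_match` and the `"value"` key are hypothetical. A tolerance is used because `np.linalg.norm` and `np.sqrt(np.dot(a, a))` may differ in the last few bits.

```python
import math


def outputs_match(original, optimized, rel_tol=1e-9):
    """Return True if two results agree within a floating-point tolerance.

    Hypothetical helper: NaN is treated as equal to NaN so that
    zero-vector and NaN-input cases can still be compared.
    """
    if isinstance(original, dict) and isinstance(optimized, dict):
        return original.keys() == optimized.keys() and all(
            outputs_match(original[key], optimized[key], rel_tol)
            for key in original
        )
    if isinstance(original, float) and isinstance(optimized, float):
        if math.isnan(original) and math.isnan(optimized):
            return True
        return math.isclose(original, optimized, rel_tol=rel_tol, abs_tol=1e-12)
    # Fall back to plain equality for any other type.
    return original == optimized


# Results that differ only by rounding noise count as a match.
print(outputs_match({"value": 1.0}, {"value": 1.0 + 1e-13}))
print(outputs_match(float("nan"), float("nan")))
print(outputs_match(1.0, 0.5))
```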

To edit these changes, run `git checkout codeflash/optimize-CosineSimilarityBlockV1.run-mh9uf1n6` and push.

misrasaurabh1 · 2025-10-29T06:08:49Z

inputs are not numpy

codeflash-ai bot requested a review from mashraf-222 October 28, 2025 00:43

codeflash-ai bot added the ⚡️ codeflash Optimization PR opened by Codeflash AI label Oct 28, 2025

misrasaurabh1 closed this Oct 29, 2025

codeflash-ai bot deleted the codeflash/optimize-CosineSimilarityBlockV1.run-mh9uf1n6 branch October 29, 2025 06:08
