
Conversation

@codeflash-ai codeflash-ai bot commented Oct 28, 2025

📄 64% (0.64x) speedup for combine_risk_key in gs_quant/risk/core.py

⏱️ Runtime : 460 microseconds → 280 microseconds (best of 511 runs)

📝 Explanation and details

The optimization eliminates function call overhead and reduces attribute access operations. The original code uses a nested get_field_value function that calls getattr twice per field (once for each key comparison), resulting in 12 total getattr calls for 6 fields. The optimized version replaces this with direct attribute access (key_1.provider, key_2.provider, etc.), which is significantly faster in Python.

Key changes:

  • Removed nested function: Eliminates function definition overhead and 6 function calls
  • Direct attribute access: Replaced 12 getattr calls with 12 direct attribute accesses
  • Inline comparisons: Each field comparison is now a single line with direct attribute access
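The difference can be sketched as follows. This is a minimal reconstruction from the description above, not the actual gs_quant source: the real `RiskKey` is more elaborate, and the six-field layout is assumed from the tests below.

```python
from collections import namedtuple

# Assumed stand-in for gs_quant's RiskKey (six fields, per the tests below)
RiskKey = namedtuple('RiskKey', ('provider', 'date', 'market', 'params',
                                 'scenario', 'risk_measure'))


def combine_risk_key_original(key_1, key_2):
    # Original shape: a nested helper that calls getattr twice per field,
    # i.e. 12 dynamic lookups for 6 fields
    def get_field_value(field):
        value = getattr(key_1, field)
        return value if value == getattr(key_2, field) else None

    return RiskKey(*(get_field_value(f) for f in RiskKey._fields))


def combine_risk_key_optimized(key_1, key_2):
    # Optimized shape: direct attribute access, one inline comparison per field
    return RiskKey(
        key_1.provider if key_1.provider == key_2.provider else None,
        key_1.date if key_1.date == key_2.date else None,
        key_1.market if key_1.market == key_2.market else None,
        key_1.params if key_1.params == key_2.params else None,
        key_1.scenario if key_1.scenario == key_2.scenario else None,
        key_1.risk_measure if key_1.risk_measure == key_2.risk_measure else None,
    )


k1 = RiskKey("GS", "2024-06-01", "NY", None, "base", "VAR")
k2 = RiskKey("GS", "2024-06-01", "LON", None, "base", "ES")
# Fields that differ (market, risk_measure) collapse to None; equal fields survive
print(combine_risk_key_optimized(k1, k2))
```

Both variants are semantically identical; only the lookup mechanism changes.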

Why this is faster:

  • getattr() has overhead for dynamic attribute lookup and error handling
  • Direct attribute access (obj.attr) is optimized at the bytecode level
  • Eliminating the nested function removes call stack overhead
  • The line profiler shows the original version spent 56.4% of time on the return statement with multiple get_field_value calls
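The `getattr` overhead is easy to observe with a quick `timeit` micro-benchmark (a sketch — absolute numbers vary by interpreter and machine; the `Obj` class here is hypothetical):

```python
import timeit


class Obj:
    def __init__(self):
        self.provider = "GS"


obj = Obj()

# Both timings include lambda-call overhead; the delta isolates the lookup cost
direct = timeit.timeit(lambda: obj.provider, number=1_000_000)
dynamic = timeit.timeit(lambda: getattr(obj, 'provider'), number=1_000_000)
print(f"direct: {direct:.3f}s, getattr: {dynamic:.3f}s")
```

On CPython, the `getattr` path is typically measurably slower because it goes through a general-purpose builtin call rather than the specialized attribute-load bytecode.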

Performance characteristics:
The optimization provides consistent 20-80% speedup across all test cases, with the best improvements (40-80%) on scenarios with many field differences or identical keys. Even complex cases with large data structures see meaningful gains (1-8% for very large objects), making this a universally beneficial optimization for any usage pattern of combine_risk_key.

Correctness verification report:

Test Status
⚙️ Existing Unit Tests 🔘 None Found
🌀 Generated Regression Tests 438 Passed
⏪ Replay Tests 🔘 None Found
🔎 Concolic Coverage Tests 🔘 None Found
📊 Tests Coverage 100.0%
🌀 Generated Regression Tests and Runtime
from collections import namedtuple

# imports
import pytest
from gs_quant.risk.core import combine_risk_key


# Define a minimal RiskKey class for testing, mimicking the required fields and equality
class RiskKey:
    def __init__(self, provider, date, market, params, scenario, risk_measure):
        self.provider = provider
        self.date = date
        self.market = market
        self.params = params
        self.scenario = scenario
        self.risk_measure = risk_measure

    def __eq__(self, other):
        if not isinstance(other, RiskKey):
            return False
        return (self.provider == other.provider and
                self.date == other.date and
                self.market == other.market and
                self.params == other.params and
                self.scenario == other.scenario and
                self.risk_measure == other.risk_measure)

    def __repr__(self):
        return (f"RiskKey(provider={self.provider!r}, date={self.date!r}, market={self.market!r}, "
                f"params={self.params!r}, scenario={self.scenario!r}, risk_measure={self.risk_measure!r})")

# -----------------------
# Unit Tests Start Here
# -----------------------

# 1. BASIC TEST CASES

def test_combine_identical_keys():
    # All fields are the same
    key = RiskKey("GS", "2024-06-01", "NY", {"foo": 1}, "base", "VAR")
    codeflash_output = combine_risk_key(key, key); combined = codeflash_output # 2.15μs -> 1.75μs (23.0% faster)

def test_combine_all_fields_different():
    # All fields differ
    k1 = RiskKey("A", "2024-01-01", "LON", {"a": 1}, "up", "ES")
    k2 = RiskKey("B", "2024-02-02", "NY", {"b": 2}, "down", "VAR")
    codeflash_output = combine_risk_key(k1, k2); combined = codeflash_output # 2.20μs -> 1.61μs (36.2% faster)

def test_combine_some_fields_equal():
    # Some fields are equal, some are not
    k1 = RiskKey("GS", "2024-06-01", "NY", {"foo": 1}, "base", "VAR")
    k2 = RiskKey("GS", "2024-06-01", "LON", {"foo": 1}, "base", "ES")
    codeflash_output = combine_risk_key(k1, k2); combined = codeflash_output # 2.04μs -> 1.46μs (39.7% faster)

def test_combine_with_none_fields():
    # Some fields are None in both, should treat None == None as equal
    k1 = RiskKey(None, "2024-06-01", None, None, "base", "VAR")
    k2 = RiskKey(None, "2024-06-01", None, None, "base", "VAR")
    codeflash_output = combine_risk_key(k1, k2); combined = codeflash_output # 2.00μs -> 1.38μs (44.9% faster)

def test_combine_with_one_none_one_value():
    # If one field is None and the other is not, should be None
    k1 = RiskKey(None, "2024-06-01", "NY", {"foo": 1}, "base", "VAR")
    k2 = RiskKey("GS", "2024-06-01", "NY", {"foo": 1}, "base", "VAR")
    codeflash_output = combine_risk_key(k1, k2); combined = codeflash_output # 2.07μs -> 1.56μs (32.8% faster)

# 2. EDGE TEST CASES

def test_combine_empty_dict_params():
    # params are both empty dicts
    k1 = RiskKey("GS", "2024-06-01", "NY", {}, "base", "VAR")
    k2 = RiskKey("GS", "2024-06-01", "NY", {}, "base", "VAR")
    codeflash_output = combine_risk_key(k1, k2); combined = codeflash_output # 1.95μs -> 1.47μs (33.2% faster)

def test_combine_params_different_content():
    # params are dicts with different content
    k1 = RiskKey("GS", "2024-06-01", "NY", {"foo": 1}, "base", "VAR")
    k2 = RiskKey("GS", "2024-06-01", "NY", {"foo": 2}, "base", "VAR")
    codeflash_output = combine_risk_key(k1, k2); combined = codeflash_output # 2.08μs -> 1.59μs (30.8% faster)

def test_combine_params_different_types():
    # params are different types (dict vs list)
    k1 = RiskKey("GS", "2024-06-01", "NY", {"foo": 1}, "base", "VAR")
    k2 = RiskKey("GS", "2024-06-01", "NY", ["foo", 1], "base", "VAR")
    codeflash_output = combine_risk_key(k1, k2); combined = codeflash_output # 2.07μs -> 1.55μs (33.0% faster)

def test_combine_with_empty_strings():
    # Some fields are empty strings
    k1 = RiskKey("", "2024-06-01", "", {}, "", "")
    k2 = RiskKey("", "2024-06-01", "", {}, "", "")
    codeflash_output = combine_risk_key(k1, k2); combined = codeflash_output # 1.97μs -> 1.53μs (29.3% faster)

def test_combine_with_different_types():
    # market is int in one, str in the other
    k1 = RiskKey("GS", "2024-06-01", 1, {}, "base", "VAR")
    k2 = RiskKey("GS", "2024-06-01", "1", {}, "base", "VAR")
    codeflash_output = combine_risk_key(k1, k2); combined = codeflash_output # 2.10μs -> 1.49μs (41.2% faster)

def test_combine_with_mutable_fields():
    # params is a mutable object, but values are same object
    params = {"foo": [1, 2, 3]}
    k1 = RiskKey("GS", "2024-06-01", "NY", params, "base", "VAR")
    k2 = RiskKey("GS", "2024-06-01", "NY", params, "base", "VAR")
    codeflash_output = combine_risk_key(k1, k2); combined = codeflash_output # 2.02μs -> 1.42μs (42.0% faster)

def test_combine_with_one_field_missing():
    # Simulate a RiskKey with a missing attribute using a subclass
    class PartialRiskKey(RiskKey):
        def __init__(self, provider, date, market, params, scenario):
            super().__init__(provider, date, market, params, scenario, None)
            del self.risk_measure  # Remove attribute

    k1 = PartialRiskKey("GS", "2024-06-01", "NY", {}, "base")
    k2 = RiskKey("GS", "2024-06-01", "NY", {}, "base", "VAR")
    # Should raise AttributeError
    with pytest.raises(AttributeError):
        combine_risk_key(k1, k2) # 2.44μs -> 1.99μs (22.5% faster)

def test_combine_with_non_riskkey_object():
    # Second argument is not a RiskKey
    k1 = RiskKey("GS", "2024-06-01", "NY", {}, "base", "VAR")
    class Dummy: pass
    k2 = Dummy()
    with pytest.raises(AttributeError):
        combine_risk_key(k1, k2) # 1.85μs -> 1.34μs (38.5% faster)

# 3. LARGE SCALE TEST CASES

def test_combine_large_identical_keys():
    # Test with large but identical params dict
    big_dict = {str(i): i for i in range(500)}
    k1 = RiskKey("GS", "2024-06-01", "NY", big_dict, "base", "VAR")
    k2 = RiskKey("GS", "2024-06-01", "NY", big_dict.copy(), "base", "VAR")
    codeflash_output = combine_risk_key(k1, k2); combined = codeflash_output # 7.57μs -> 7.43μs (1.93% faster)

def test_combine_large_different_params():
    # Large dicts with one difference
    big_dict1 = {str(i): i for i in range(500)}
    big_dict2 = big_dict1.copy()
    big_dict2["unique"] = 999
    k1 = RiskKey("GS", "2024-06-01", "NY", big_dict1, "base", "VAR")
    k2 = RiskKey("GS", "2024-06-01", "NY", big_dict2, "base", "VAR")
    codeflash_output = combine_risk_key(k1, k2); combined = codeflash_output # 2.19μs -> 1.75μs (24.9% faster)

def test_combine_many_keys_pairwise():
    # Pairwise combine a list of similar keys, only one field differs per pair
    keys = []
    for i in range(10):
        keys.append(RiskKey("GS", "2024-06-01", f"market{i}", {"foo": i}, "base", "VAR"))
    for i in range(1, len(keys)):
        codeflash_output = combine_risk_key(keys[0], keys[i]); combined = codeflash_output # 10.3μs -> 6.50μs (57.9% faster)

def test_combine_large_keys_with_none():
    # Large params, but one field is None in one key and not in the other
    big_dict = {str(i): i for i in range(500)}
    k1 = RiskKey("GS", "2024-06-01", "NY", big_dict, "base", "VAR")
    k2 = RiskKey(None, "2024-06-01", "NY", big_dict, "base", "VAR")
    codeflash_output = combine_risk_key(k1, k2); combined = codeflash_output # 7.18μs -> 6.68μs (7.39% faster)

def test_combine_keys_with_large_string_fields():
    # Very long string fields
    long_str = "A" * 500
    k1 = RiskKey(long_str, "2024-06-01", long_str, {}, long_str, long_str)
    k2 = RiskKey(long_str, "2024-06-01", long_str, {}, long_str, long_str)
    codeflash_output = combine_risk_key(k1, k2); combined = codeflash_output # 1.95μs -> 1.44μs (35.2% faster)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
#------------------------------------------------
from dataclasses import dataclass

# imports
import pytest
from gs_quant.risk.core import combine_risk_key


# Minimal RiskKey class definition for testing
@dataclass(frozen=True)
class RiskKey:
    provider: str
    date: str
    market: str
    params: str
    scenario: str
    risk_measure: str

# --------------------- UNIT TESTS ---------------------

# 1. BASIC TEST CASES

def test_combine_all_fields_equal():
    # All fields are the same in both keys
    key1 = RiskKey("GS", "2023-01-01", "NY", "p1", "s1", "rm1")
    key2 = RiskKey("GS", "2023-01-01", "NY", "p1", "s1", "rm1")
    codeflash_output = combine_risk_key(key1, key2); result = codeflash_output # 2.25μs -> 1.82μs (23.6% faster)

def test_combine_one_field_differs():
    # Only one field differs
    key1 = RiskKey("GS", "2023-01-01", "NY", "p1", "s1", "rm1")
    key2 = RiskKey("GS", "2023-01-01", "NY", "p1", "s1", "rm2")
    codeflash_output = combine_risk_key(key1, key2); result = codeflash_output # 2.08μs -> 1.43μs (45.6% faster)

def test_combine_multiple_fields_differ():
    # Multiple fields differ
    key1 = RiskKey("GS", "2023-01-01", "NY", "p1", "s1", "rm1")
    key2 = RiskKey("MS", "2023-01-01", "LDN", "p2", "s1", "rm1")
    codeflash_output = combine_risk_key(key1, key2); result = codeflash_output # 2.06μs -> 1.49μs (38.2% faster)

def test_combine_all_fields_differ():
    # All fields differ
    key1 = RiskKey("GS", "2023-01-01", "NY", "p1", "s1", "rm1")
    key2 = RiskKey("MS", "2024-02-02", "LDN", "p2", "s2", "rm2")
    codeflash_output = combine_risk_key(key1, key2); result = codeflash_output # 1.98μs -> 1.48μs (33.8% faster)

def test_combine_with_none_fields():
    # One or both keys have None fields
    key1 = RiskKey("GS", None, "NY", None, "s1", "rm1")
    key2 = RiskKey("GS", None, "NY", "p1", "s2", "rm1")
    codeflash_output = combine_risk_key(key1, key2); result = codeflash_output # 2.14μs -> 1.55μs (37.5% faster)

# 2. EDGE TEST CASES

def test_combine_empty_strings():
    # Fields are empty strings
    key1 = RiskKey("", "", "", "", "", "")
    key2 = RiskKey("", "", "", "", "", "")
    codeflash_output = combine_risk_key(key1, key2); result = codeflash_output # 1.92μs -> 1.32μs (45.4% faster)

def test_combine_empty_vs_nonempty():
    # One key has empty strings, the other has values
    key1 = RiskKey("", "", "", "", "", "")
    key2 = RiskKey("GS", "2023", "NY", "p", "s", "rm")
    codeflash_output = combine_risk_key(key1, key2); result = codeflash_output # 1.92μs -> 1.30μs (47.8% faster)

def test_combine_none_vs_value():
    # One key has None, the other has a value
    key1 = RiskKey(None, None, None, None, None, None)
    key2 = RiskKey("GS", "2023", "NY", "p", "s", "rm")
    codeflash_output = combine_risk_key(key1, key2); result = codeflash_output # 1.89μs -> 1.27μs (48.6% faster)

def test_combine_identical_none_keys():
    # Both keys are all None
    key1 = RiskKey(None, None, None, None, None, None)
    key2 = RiskKey(None, None, None, None, None, None)
    codeflash_output = combine_risk_key(key1, key2); result = codeflash_output # 1.94μs -> 1.33μs (45.7% faster)

def test_combine_mixed_types():
    # Fields have mixed types (should be strings or None, but test robustness)
    key1 = RiskKey("GS", 20230101, "NY", None, "s1", 123)
    key2 = RiskKey("GS", 20230101, "NY", None, "s1", 123)
    codeflash_output = combine_risk_key(key1, key2); result = codeflash_output # 2.10μs -> 1.44μs (46.0% faster)

def test_combine_type_mismatch():
    # Type mismatch in fields
    key1 = RiskKey("GS", "2023-01-01", "NY", "p1", "s1", "rm1")
    key2 = RiskKey("GS", 20230101, "NY", "p1", "s1", "rm1")
    codeflash_output = combine_risk_key(key1, key2); result = codeflash_output # 2.04μs -> 1.52μs (34.4% faster)

def test_combine_partial_overlap():
    # Some fields are the same, some are None, some differ
    key1 = RiskKey("GS", None, "NY", "p1", None, "rm1")
    key2 = RiskKey("GS", None, "LDN", "p1", "s2", "rm1")
    codeflash_output = combine_risk_key(key1, key2); result = codeflash_output # 2.10μs -> 1.51μs (39.1% faster)

# 3. LARGE SCALE TEST CASES

def test_combine_many_unique_keys():
    # Test combining 100 different keys with only one matching field
    for i in range(100):
        key1 = RiskKey("GS", f"2023-01-{i:02d}", "NY", f"p{i}", "s1", "rm1")
        key2 = RiskKey("GS", f"2023-01-{i:02d}", "LDN", f"p{i+1}", "s2", "rm2")
        codeflash_output = combine_risk_key(key1, key2); result = codeflash_output # 94.2μs -> 57.3μs (64.3% faster)

def test_combine_large_batch_all_equal():
    # Test combining many identical keys
    base = RiskKey("GS", "2023-01-01", "NY", "p1", "s1", "rm1")
    for _ in range(100):
        codeflash_output = combine_risk_key(base, base); result = codeflash_output # 97.5μs -> 53.2μs (83.3% faster)

def test_combine_large_batch_all_different():
    # Test combining many keys with no matching fields
    for i in range(100):
        key1 = RiskKey(f"GS{i}", f"2023-01-{i:02d}", f"NY{i}", f"p{i}", f"s{i}", f"rm{i}")
        key2 = RiskKey(f"MS{i}", f"2024-02-{i:02d}", f"LDN{i}", f"q{i}", f"t{i}", f"sm{i}")
        codeflash_output = combine_risk_key(key1, key2); result = codeflash_output # 90.3μs -> 53.5μs (68.8% faster)

def test_combine_large_batch_some_fields_equal():
    # Test combining keys where only some fields are equal across a batch
    for i in range(100):
        key1 = RiskKey("GS", f"2023-01-{i:02d}", "NY", f"p{i}", "s1", "rm1")
        key2 = RiskKey("GS", f"2023-01-{i:02d}", "NY", f"q{i}", "s1", "rm2")
        codeflash_output = combine_risk_key(key1, key2); result = codeflash_output # 97.1μs -> 54.9μs (77.0% faster)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

To edit these changes, run `git checkout codeflash/optimize-combine_risk_key-mhazaa7e` and push.

Codeflash

@codeflash-ai codeflash-ai bot requested a review from mashraf-222 October 28, 2025 19:47
@codeflash-ai codeflash-ai bot added ⚡️ codeflash Optimization PR opened by Codeflash AI 🎯 Quality: High Optimization Quality according to Codeflash labels Oct 28, 2025