
Conversation

@codeflash-ai codeflash-ai bot commented Oct 28, 2025

📄 64% (0.64x) speedup for combine_risk_key in gs_quant/risk/core.py

⏱️ Runtime : 460 microseconds → 280 microseconds (best of 511 runs)

📝 Explanation and details

The optimization eliminates function call overhead and reduces attribute access operations. The original code uses a nested get_field_value function that calls getattr twice per field (once for each key comparison), resulting in 12 total getattr calls for 6 fields. The optimized version replaces this with direct attribute access (key_1.provider, key_2.provider, etc.), which is significantly faster in Python.

Key changes:

  • Removed nested function: Eliminates function definition overhead and 6 function calls
  • Direct attribute access: Replaced 12 getattr calls with 12 direct attribute accesses
  • Inline comparisons: Each field comparison is now a single line with direct attribute access
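The difference can be sketched as follows. This is a minimal reconstruction from the description above, not the actual gs_quant source: the real `RiskKey` is more elaborate, and the six-field layout is assumed from the tests below.

```python
from collections import namedtuple

# Assumed stand-in for gs_quant's RiskKey (six fields, per the tests below)
RiskKey = namedtuple('RiskKey', ('provider', 'date', 'market', 'params',
                                 'scenario', 'risk_measure'))


def combine_risk_key_original(key_1, key_2):
    # Original shape: a nested helper that calls getattr twice per field,
    # i.e. 12 dynamic lookups for 6 fields
    def get_field_value(field):
        value = getattr(key_1, field)
        return value if value == getattr(key_2, field) else None

    return RiskKey(*(get_field_value(f) for f in RiskKey._fields))


def combine_risk_key_optimized(key_1, key_2):
    # Optimized shape: direct attribute access, one inline comparison per field
    return RiskKey(
        key_1.provider if key_1.provider == key_2.provider else None,
        key_1.date if key_1.date == key_2.date else None,
        key_1.market if key_1.market == key_2.market else None,
        key_1.params if key_1.params == key_2.params else None,
        key_1.scenario if key_1.scenario == key_2.scenario else None,
        key_1.risk_measure if key_1.risk_measure == key_2.risk_measure else None,
    )


k1 = RiskKey("GS", "2024-06-01", "NY", None, "base", "VAR")
k2 = RiskKey("GS", "2024-06-01", "LON", None, "base", "ES")
# Fields that differ (market, risk_measure) collapse to None; equal fields survive
print(combine_risk_key_optimized(k1, k2))
```

Both variants are semantically identical; only the lookup mechanism changes.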

Why this is faster:

  • getattr() has overhead for dynamic attribute lookup and error handling
  • Direct attribute access (obj.attr) is optimized at the bytecode level
  • Eliminating the nested function removes call stack overhead
  • The line profiler shows the original version spent 56.4% of time on the return statement with multiple get_field_value calls
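The `getattr` overhead is easy to observe with a quick `timeit` micro-benchmark (a sketch — absolute numbers vary by interpreter and machine; the `Obj` class here is hypothetical):

```python
import timeit


class Obj:
    def __init__(self):
        self.provider = "GS"


obj = Obj()

# Both timings include lambda-call overhead; the delta isolates the lookup cost
direct = timeit.timeit(lambda: obj.provider, number=1_000_000)
dynamic = timeit.timeit(lambda: getattr(obj, 'provider'), number=1_000_000)
print(f"direct: {direct:.3f}s, getattr: {dynamic:.3f}s")
```

On CPython, the `getattr` path is typically measurably slower because it goes through a general-purpose builtin call rather than the specialized attribute-load bytecode.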

Performance characteristics:
The optimization provides consistent 20-80% speedup across all test cases, with the best improvements (40-80%) on scenarios with many field differences or identical keys. Even complex cases with large data structures see meaningful gains (1-8% for very large objects), making this a universally beneficial optimization for any usage pattern of combine_risk_key.

Correctness verification report:

Test Status
⚙️ Existing Unit Tests 🔘 None Found
🌀 Generated Regression Tests 438 Passed
⏪ Replay Tests 🔘 None Found
🔎 Concolic Coverage Tests 🔘 None Found
📊 Tests Coverage 100.0%
🌀 Generated Regression Tests and Runtime
from collections import namedtuple

# imports
import pytest
from gs_quant.risk.core import combine_risk_key


# Define a minimal RiskKey class for testing, mimicking the required fields and equality
class RiskKey:
    def __init__(self, provider, date, market, params, scenario, risk_measure):
        self.provider = provider
        self.date = date
        self.market = market
        self.params = params
        self.scenario = scenario
        self.risk_measure = risk_measure

    def __eq__(self, other):
        if not isinstance(other, RiskKey):
            return False
        return (self.provider == other.provider and
                self.date == other.date and
                self.market == other.market and
                self.params == other.params and
                self.scenario == other.scenario and
                self.risk_measure == other.risk_measure)

    def __repr__(self):
        return (f"RiskKey(provider={self.provider!r}, date={self.date!r}, market={self.market!r}, "
                f"params={self.params!r}, scenario={self.scenario!r}, risk_measure={self.risk_measure!r})")

# -----------------------
# Unit Tests Start Here
# -----------------------

# 1. BASIC TEST CASES

def test_combine_identical_keys():
    # All fields are the same
    key = RiskKey("GS", "2024-06-01", "NY", {"foo": 1}, "base", "VAR")
    codeflash_output = combine_risk_key(key, key); combined = codeflash_output # 2.15μs -> 1.75μs (23.0% faster)

def test_combine_all_fields_different():
    # All fields differ
    k1 = RiskKey("A", "2024-01-01", "LON", {"a": 1}, "up", "ES")
    k2 = RiskKey("B", "2024-02-02", "NY", {"b": 2}, "down", "VAR")
    codeflash_output = combine_risk_key(k1, k2); combined = codeflash_output # 2.20μs -> 1.61μs (36.2% faster)

def test_combine_some_fields_equal():
    # Some fields are equal, some are not
    k1 = RiskKey("GS", "2024-06-01", "NY", {"foo": 1}, "base", "VAR")
    k2 = RiskKey("GS", "2024-06-01", "LON", {"foo": 1}, "base", "ES")
    codeflash_output = combine_risk_key(k1, k2); combined = codeflash_output # 2.04μs -> 1.46μs (39.7% faster)

def test_combine_with_none_fields():
    # Some fields are None in both, should treat None == None as equal
    k1 = RiskKey(None, "2024-06-01", None, None, "base", "VAR")
    k2 = RiskKey(None, "2024-06-01", None, None, "base", "VAR")
    codeflash_output = combine_risk_key(k1, k2); combined = codeflash_output # 2.00μs -> 1.38μs (44.9% faster)

def test_combine_with_one_none_one_value():
    # If one field is None and the other is not, should be None
    k1 = RiskKey(None, "2024-06-01", "NY", {"foo": 1}, "base", "VAR")
    k2 = RiskKey("GS", "2024-06-01", "NY", {"foo": 1}, "base", "VAR")
    codeflash_output = combine_risk_key(k1, k2); combined = codeflash_output # 2.07μs -> 1.56μs (32.8% faster)

# 2. EDGE TEST CASES

def test_combine_empty_dict_params():
    # params are both empty dicts
    k1 = RiskKey("GS", "2024-06-01", "NY", {}, "base", "VAR")
    k2 = RiskKey("GS", "2024-06-01", "NY", {}, "base", "VAR")
    codeflash_output = combine_risk_key(k1, k2); combined = codeflash_output # 1.95μs -> 1.47μs (33.2% faster)

def test_combine_params_different_content():
    # params are dicts with different content
    k1 = RiskKey("GS", "2024-06-01", "NY", {"foo": 1}, "base", "VAR")
    k2 = RiskKey("GS", "2024-06-01", "NY", {"foo": 2}, "base", "VAR")
    codeflash_output = combine_risk_key(k1, k2); combined = codeflash_output # 2.08μs -> 1.59μs (30.8% faster)

def test_combine_params_different_types():
    # params are different types (dict vs list)
    k1 = RiskKey("GS", "2024-06-01", "NY", {"foo": 1}, "base", "VAR")
    k2 = RiskKey("GS", "2024-06-01", "NY", ["foo", 1], "base", "VAR")
    codeflash_output = combine_risk_key(k1, k2); combined = codeflash_output # 2.07μs -> 1.55μs (33.0% faster)

def test_combine_with_empty_strings():
    # Some fields are empty strings
    k1 = RiskKey("", "2024-06-01", "", {}, "", "")
    k2 = RiskKey("", "2024-06-01", "", {}, "", "")
    codeflash_output = combine_risk_key(k1, k2); combined = codeflash_output # 1.97μs -> 1.53μs (29.3% faster)

def test_combine_with_different_types():
    # market is int in one, str in the other
    k1 = RiskKey("GS", "2024-06-01", 1, {}, "base", "VAR")
    k2 = RiskKey("GS", "2024-06-01", "1", {}, "base", "VAR")
    codeflash_output = combine_risk_key(k1, k2); combined = codeflash_output # 2.10μs -> 1.49μs (41.2% faster)

def test_combine_with_mutable_fields():
    # params is a mutable object, but values are same object
    params = {"foo": [1, 2, 3]}
    k1 = RiskKey("GS", "2024-06-01", "NY", params, "base", "VAR")
    k2 = RiskKey("GS", "2024-06-01", "NY", params, "base", "VAR")
    codeflash_output = combine_risk_key(k1, k2); combined = codeflash_output # 2.02μs -> 1.42μs (42.0% faster)

def test_combine_with_one_field_missing():
    # Simulate a RiskKey with a missing attribute using a subclass
    class PartialRiskKey(RiskKey):
        def __init__(self, provider, date, market, params, scenario):
            super().__init__(provider, date, market, params, scenario, None)
            del self.risk_measure  # Remove attribute

    k1 = PartialRiskKey("GS", "2024-06-01", "NY", {}, "base")
    k2 = RiskKey("GS", "2024-06-01", "NY", {}, "base", "VAR")
    # Should raise AttributeError
    with pytest.raises(AttributeError):
        combine_risk_key(k1, k2) # 2.44μs -> 1.99μs (22.5% faster)

def test_combine_with_non_riskkey_object():
    # Second argument is not a RiskKey
    k1 = RiskKey("GS", "2024-06-01", "NY", {}, "base", "VAR")
    class Dummy: pass
    k2 = Dummy()
    with pytest.raises(AttributeError):
        combine_risk_key(k1, k2) # 1.85μs -> 1.34μs (38.5% faster)

# 3. LARGE SCALE TEST CASES

def test_combine_large_identical_keys():
    # Test with large but identical params dict
    big_dict = {str(i): i for i in range(500)}
    k1 = RiskKey("GS", "2024-06-01", "NY", big_dict, "base", "VAR")
    k2 = RiskKey("GS", "2024-06-01", "NY", big_dict.copy(), "base", "VAR")
    codeflash_output = combine_risk_key(k1, k2); combined = codeflash_output # 7.57μs -> 7.43μs (1.93% faster)

def test_combine_large_different_params():
    # Large dicts with one difference
    big_dict1 = {str(i): i for i in range(500)}
    big_dict2 = big_dict1.copy()
    big_dict2["unique"] = 999
    k1 = RiskKey("GS", "2024-06-01", "NY", big_dict1, "base", "VAR")
    k2 = RiskKey("GS", "2024-06-01", "NY", big_dict2, "base", "VAR")
    codeflash_output = combine_risk_key(k1, k2); combined = codeflash_output # 2.19μs -> 1.75μs (24.9% faster)

def test_combine_many_keys_pairwise():
    # Pairwise combine a list of similar keys, only one field differs per pair
    keys = []
    for i in range(10):
        keys.append(RiskKey("GS", "2024-06-01", f"market{i}", {"foo": i}, "base", "VAR"))
    for i in range(1, len(keys)):
        codeflash_output = combine_risk_key(keys[0], keys[i]); combined = codeflash_output # 10.3μs -> 6.50μs (57.9% faster)

def test_combine_large_keys_with_none():
    # Large params, but one field is None in one key and not in the other
    big_dict = {str(i): i for i in range(500)}
    k1 = RiskKey("GS", "2024-06-01", "NY", big_dict, "base", "VAR")
    k2 = RiskKey(None, "2024-06-01", "NY", big_dict, "base", "VAR")
    codeflash_output = combine_risk_key(k1, k2); combined = codeflash_output # 7.18μs -> 6.68μs (7.39% faster)

def test_combine_keys_with_large_string_fields():
    # Very long string fields
    long_str = "A" * 500
    k1 = RiskKey(long_str, "2024-06-01", long_str, {}, long_str, long_str)
    k2 = RiskKey(long_str, "2024-06-01", long_str, {}, long_str, long_str)
    codeflash_output = combine_risk_key(k1, k2); combined = codeflash_output # 1.95μs -> 1.44μs (35.2% faster)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
#------------------------------------------------
from dataclasses import dataclass

# imports
import pytest
from gs_quant.risk.core import combine_risk_key


# Minimal RiskKey class definition for testing
@dataclass(frozen=True)
class RiskKey:
    provider: str
    date: str
    market: str
    params: str
    scenario: str
    risk_measure: str

# --------------------- UNIT TESTS ---------------------

# 1. BASIC TEST CASES

def test_combine_all_fields_equal():
    # All fields are the same in both keys
    key1 = RiskKey("GS", "2023-01-01", "NY", "p1", "s1", "rm1")
    key2 = RiskKey("GS", "2023-01-01", "NY", "p1", "s1", "rm1")
    codeflash_output = combine_risk_key(key1, key2); result = codeflash_output # 2.25μs -> 1.82μs (23.6% faster)

def test_combine_one_field_differs():
    # Only one field differs
    key1 = RiskKey("GS", "2023-01-01", "NY", "p1", "s1", "rm1")
    key2 = RiskKey("GS", "2023-01-01", "NY", "p1", "s1", "rm2")
    codeflash_output = combine_risk_key(key1, key2); result = codeflash_output # 2.08μs -> 1.43μs (45.6% faster)

def test_combine_multiple_fields_differ():
    # Multiple fields differ
    key1 = RiskKey("GS", "2023-01-01", "NY", "p1", "s1", "rm1")
    key2 = RiskKey("MS", "2023-01-01", "LDN", "p2", "s1", "rm1")
    codeflash_output = combine_risk_key(key1, key2); result = codeflash_output # 2.06μs -> 1.49μs (38.2% faster)

def test_combine_all_fields_differ():
    # All fields differ
    key1 = RiskKey("GS", "2023-01-01", "NY", "p1", "s1", "rm1")
    key2 = RiskKey("MS", "2024-02-02", "LDN", "p2", "s2", "rm2")
    codeflash_output = combine_risk_key(key1, key2); result = codeflash_output # 1.98μs -> 1.48μs (33.8% faster)

def test_combine_with_none_fields():
    # One or both keys have None fields
    key1 = RiskKey("GS", None, "NY", None, "s1", "rm1")
    key2 = RiskKey("GS", None, "NY", "p1", "s2", "rm1")
    codeflash_output = combine_risk_key(key1, key2); result = codeflash_output # 2.14μs -> 1.55μs (37.5% faster)

# 2. EDGE TEST CASES

def test_combine_empty_strings():
    # Fields are empty strings
    key1 = RiskKey("", "", "", "", "", "")
    key2 = RiskKey("", "", "", "", "", "")
    codeflash_output = combine_risk_key(key1, key2); result = codeflash_output # 1.92μs -> 1.32μs (45.4% faster)

def test_combine_empty_vs_nonempty():
    # One key has empty strings, the other has values
    key1 = RiskKey("", "", "", "", "", "")
    key2 = RiskKey("GS", "2023", "NY", "p", "s", "rm")
    codeflash_output = combine_risk_key(key1, key2); result = codeflash_output # 1.92μs -> 1.30μs (47.8% faster)

def test_combine_none_vs_value():
    # One key has None, the other has a value
    key1 = RiskKey(None, None, None, None, None, None)
    key2 = RiskKey("GS", "2023", "NY", "p", "s", "rm")
    codeflash_output = combine_risk_key(key1, key2); result = codeflash_output # 1.89μs -> 1.27μs (48.6% faster)

def test_combine_identical_none_keys():
    # Both keys are all None
    key1 = RiskKey(None, None, None, None, None, None)
    key2 = RiskKey(None, None, None, None, None, None)
    codeflash_output = combine_risk_key(key1, key2); result = codeflash_output # 1.94μs -> 1.33μs (45.7% faster)

def test_combine_mixed_types():
    # Fields have mixed types (should be strings or None, but test robustness)
    key1 = RiskKey("GS", 20230101, "NY", None, "s1", 123)
    key2 = RiskKey("GS", 20230101, "NY", None, "s1", 123)
    codeflash_output = combine_risk_key(key1, key2); result = codeflash_output # 2.10μs -> 1.44μs (46.0% faster)

def test_combine_type_mismatch():
    # Type mismatch in fields
    key1 = RiskKey("GS", "2023-01-01", "NY", "p1", "s1", "rm1")
    key2 = RiskKey("GS", 20230101, "NY", "p1", "s1", "rm1")
    codeflash_output = combine_risk_key(key1, key2); result = codeflash_output # 2.04μs -> 1.52μs (34.4% faster)

def test_combine_partial_overlap():
    # Some fields are the same, some are None, some differ
    key1 = RiskKey("GS", None, "NY", "p1", None, "rm1")
    key2 = RiskKey("GS", None, "LDN", "p1", "s2", "rm1")
    codeflash_output = combine_risk_key(key1, key2); result = codeflash_output # 2.10μs -> 1.51μs (39.1% faster)

# 3. LARGE SCALE TEST CASES

def test_combine_many_unique_keys():
    # Test combining 100 different keys with only one matching field
    for i in range(100):
        key1 = RiskKey("GS", f"2023-01-{i:02d}", "NY", f"p{i}", "s1", "rm1")
        key2 = RiskKey("GS", f"2023-01-{i:02d}", "LDN", f"p{i+1}", "s2", "rm2")
        codeflash_output = combine_risk_key(key1, key2); result = codeflash_output # 94.2μs -> 57.3μs (64.3% faster)

def test_combine_large_batch_all_equal():
    # Test combining many identical keys
    base = RiskKey("GS", "2023-01-01", "NY", "p1", "s1", "rm1")
    for _ in range(100):
        codeflash_output = combine_risk_key(base, base); result = codeflash_output # 97.5μs -> 53.2μs (83.3% faster)

def test_combine_large_batch_all_different():
    # Test combining many keys with no matching fields
    for i in range(100):
        key1 = RiskKey(f"GS{i}", f"2023-01-{i:02d}", f"NY{i}", f"p{i}", f"s{i}", f"rm{i}")
        key2 = RiskKey(f"MS{i}", f"2024-02-{i:02d}", f"LDN{i}", f"q{i}", f"t{i}", f"sm{i}")
        codeflash_output = combine_risk_key(key1, key2); result = codeflash_output # 90.3μs -> 53.5μs (68.8% faster)

def test_combine_large_batch_some_fields_equal():
    # Test combining keys where only some fields are equal across a batch
    for i in range(100):
        key1 = RiskKey("GS", f"2023-01-{i:02d}", "NY", f"p{i}", "s1", "rm1")
        key2 = RiskKey("GS", f"2023-01-{i:02d}", "NY", f"q{i}", "s1", "rm2")
        codeflash_output = combine_risk_key(key1, key2); result = codeflash_output # 97.1μs -> 54.9μs (77.0% faster)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

To edit these changes, run `git checkout codeflash/optimize-combine_risk_key-mhazaa7e` and push.

Codeflash

@codeflash-ai codeflash-ai bot requested a review from mashraf-222 October 28, 2025 19:47
@codeflash-ai codeflash-ai bot added ⚡️ codeflash Optimization PR opened by Codeflash AI 🎯 Quality: High Optimization Quality according to Codeflash labels Oct 28, 2025