
codeflash-ai bot commented Oct 28, 2025

📄 15% speedup (1.15x) for `risk_by_class_handler` in `gs_quant/risk/result_handlers.py`

⏱️ Runtime: 7.14 ms → 6.20 ms (best of 85 runs)

📝 Explanation and details

The optimized code achieves a 15% speedup through several key data structure and algorithmic improvements:

**1. Single-pass input materialization**: Both `__dataframe_handler` and `__dataframe_handler_unsorted` now convert the input `result` iterable to a list up front (`result_list = list(result)`). The rows can then be scanned more than once without re-iterating a possibly one-shot iterator, and emptiness is tested with `if not result_list:` instead of by consuming a generator.

**2. Efficient column filtering**: In `__dataframe_handler`, the original code used enumeration with boolean indexing (`indices[idx] = True`) and tuple concatenation in a loop. The optimized version precomputes the column selection once with a list comprehension (`[src in mappings_lookup for src in first_row_keys]`) and builds each row tuple directly, reducing per-row overhead.

**3. Set-based skip tracking**: In `risk_by_class_handler`, the original code collected skipped indices in a `skip` list and tested `if idx not in skip` for every class. Each test is `O(k)` for `k` skipped entries, so filtering `n` classes costs `O(n·k)`. The optimized version uses a `set`, making each membership test `O(1)` on average; the gain grows with the number of SPIKE/JUMP entries.

**4. Direct dictionary assignment**: Replaced `clazz.update({'value': value})` with `clazz['value'] = value`, avoiding a temporary dictionary and an `update` call for a single-key write.

**5. Hoisted lookups**: Frequently accessed values are read once into locals (for example `rc_classes = result['classes']`) to avoid repeated dictionary lookups.
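Items 1 and 2 can be illustrated with a small, self-contained sketch. It is not the gs_quant code. `select_columns` is a hypothetical name, `mappings` is assumed to be a sequence of `(output_column, source_key)` pairs, and every row is assumed to share the first row's keys. The sketch materializes the input once, returns early on empty input, and computes the column mask from the first row before the per-row loop.

```python
from typing import Iterable, List, Sequence, Tuple


def select_columns(
    result: Iterable[dict],
    mappings: Sequence[Tuple[str, str]],
) -> Tuple[List[str], List[tuple]]:
    """Hypothetical sketch (not gs_quant code) of precomputed column filtering.

    Assumes `mappings` holds (output_column, source_key) pairs and that every
    row has the same keys, in the same order, as the first row.
    """
    # Item 1: materialize once, so emptiness is a cheap check and the rows
    # can be scanned again even if `result` was a one-shot generator.
    result_list = list(result)
    if not result_list:
        return [], []

    # Item 2: decide once, from the first row, which source keys survive.
    mappings_lookup = {src: dst for dst, src in mappings}
    first_row_keys = list(result_list[0].keys())
    mask = [src in mappings_lookup for src in first_row_keys]
    selected = [src for src, keep in zip(first_row_keys, mask) if keep]
    columns = [mappings_lookup[src] for src in selected]

    # Per-row work is now a single tuple build over the selected keys.
    records = [tuple(row[src] for src in selected) for row in result_list]
    return columns, records


if __name__ == '__main__':
    rows = (
        {'type': 'IRDelta', 'asset': 'USD', 'value': 1.0, 'extra': 'x'},
        {'type': 'IRDelta', 'asset': 'EUR', 'value': 2.0, 'extra': 'y'},
    )
    mappings = (('mkt_type', 'type'), ('mkt_asset', 'asset'), ('value', 'value'))
    cols, recs = select_columns(iter(rows), mappings)
    print(cols)  # ['mkt_type', 'mkt_asset', 'value']
    print(recs)  # [('IRDelta', 'USD', 1.0), ('IRDelta', 'EUR', 2.0)]
```

The mask is computed once rather than per row, so the per-row cost no longer includes index bookkeeping or repeated tuple concatenation.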

The gains are largest on **large-scale test cases**: set-based skip tracking makes the 500-entry SPIKE/JUMP aggregation case 155% faster (646 μs → 253 μs), and the other large cases improve by 11-15%. Small inputs are mixed, ranging from about 6% slower to about 8% faster, so the optimizations pay off mainly for inputs with many classes or many SPIKE/JUMP entries.
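The set-based skip tracking behind the 155% result can be sketched as follows. This is a hypothetical illustration, not gs_quant's `risk_by_class_handler`: the name `fold_spike_jump` is made up, and the folding rule (SPIKE/JUMP rows are dropped and their values added to the first CROSSES row, if any) is inferred from the regression-test comments rather than taken from the handler's source. Like the `clazz.update(...)` call it replaces, the sketch writes `'value'` into the input dicts in place.

```python
from typing import List, Optional, Sequence


def fold_spike_jump(classes: List[dict], values: Sequence[float]) -> List[dict]:
    """Hypothetical sketch (not gs_quant code) of SPIKE/JUMP folding.

    Assumed behaviour: each class receives its value under 'value'; SPIKE/JUMP
    rows are dropped and their values are added to the first CROSSES row.
    Mutates the dicts in `classes`.
    """
    if len(classes) != len(values):
        raise ValueError('classes and values must have the same length')

    # Item 4: direct assignment instead of clazz.update({'value': value}).
    for clazz, value in zip(classes, values):
        clazz['value'] = value

    # Index of the first CROSSES row, or None if there is none.
    crosses_idx: Optional[int] = next(
        (i for i, c in enumerate(classes) if c['type'] == 'CROSSES'), None)

    # Item 3: a set gives O(1) average membership tests; with a list of k
    # skipped indices, the filter below would cost O(n * k).
    skip = set()
    for idx, clazz in enumerate(classes):
        mkt_type = clazz['type']
        if 'SPIKE' in mkt_type or 'JUMP' in mkt_type:
            skip.add(idx)
            if crosses_idx is not None:
                classes[crosses_idx]['value'] += clazz['value']

    return [c for idx, c in enumerate(classes) if idx not in skip]


if __name__ == '__main__':
    # Mirrors test_large_scale_spike_jump_aggregation: one CROSSES + 500 SPIKE.
    n = 500
    classes = [{'type': 'CROSSES', 'asset': 'USD'}] + [
        {'type': 'SPIKE', 'asset': f'A{i}'} for i in range(n)]
    rows = fold_spike_jump(classes, [1.0] * (n + 1))
    print(len(rows), rows[0]['value'])  # 1 501.0
```

In the 500-entry test nearly every class is a SPIKE, so `k` is close to `n` and a list-based `idx not in skip` check makes the filter grow roughly quadratically; the set keeps it linear.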

Correctness verification report:

| Test | Status |
| --- | --- |
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | 23 Passed |
| ⏪ Replay Tests | 3 Passed |
| 🔎 Concolic Coverage Tests | 🔘 None Found |
| 📊 Tests Coverage | 100.0% |
🌀 Generated Regression Tests and Runtime
import pytest  # used for our unit tests
from gs_quant.risk.result_handlers import risk_by_class_handler


# Stub classes for building the handler's arguments (risk_by_class_handler itself is the real gs_quant function)
class RiskMeasure:
    def __init__(self, name):
        self.name = name

class RiskKey:
    def __init__(self, risk_measure):
        self.risk_measure = risk_measure

class InstrumentBase:
    pass

class DataFrameWithInfo:
    def __init__(self, records=None, risk_key=None, request_id=None):
        self._records = list(records) if records is not None else []
        self.risk_key = risk_key
        self.request_id = request_id
        self.columns = []
        # for unsorted handler, allow dict-like access
        self._data = {col: [row[i] for row in self._records] for i, col in enumerate(self.columns)} if self.columns else {}

    def __getitem__(self, item):
        # mimic pandas DataFrame column access for the unsorted handler
        idx = self.columns.index(item)
        return [row[idx] for row in self._records]

    def __eq__(self, other):
        return (
            isinstance(other, DataFrameWithInfo)
            and self._records == other._records
            and self.columns == other.columns
        )

    def __repr__(self):
        return f"DataFrameWithInfo(records={self._records}, columns={self.columns})"

class FloatWithInfo:
    def __init__(self, risk_key, value, unit=None, request_id=None):
        self.risk_key = risk_key
        self.value = value
        self.unit = unit
        self.request_id = request_id

    def __eq__(self, other):
        return (
            isinstance(other, FloatWithInfo)
            and self.value == other.value
            and self.unit == other.unit
        )

    def __repr__(self):
        return f"FloatWithInfo(value={self.value}, unit={self.unit})"

class PnlExplain:
    pass

# ---------------------- UNIT TESTS ----------------------

# Basic Test Cases

def test_basic_floatwithinfo_single_class():
    # Only one class, measure in external_risk_by_class_val
    result = {
        'classes': [{'type': 'IRDeltaParallel', 'asset': 'USD'}],
        'values': [1.5],
        'unit': 'USD'
    }
    risk_key = RiskKey(RiskMeasure('IRDeltaParallel'))
    codeflash_output = risk_by_class_handler(result, risk_key, InstrumentBase()); out = codeflash_output # 7.61μs -> 7.23μs (5.21% faster)

def test_basic_floatwithinfo_two_classes_same_type():
    # Two classes, same type, measure in external_risk_by_class_val
    result = {
        'classes': [{'type': 'IRDeltaParallel', 'asset': 'USD'}, {'type': 'IRDeltaParallel', 'asset': 'EUR'}],
        'values': [1.5, 2.5],
        'unit': 'USD'
    }
    risk_key = RiskKey(RiskMeasure('IRDeltaParallel'))
    codeflash_output = risk_by_class_handler(result, risk_key, InstrumentBase()); out = codeflash_output # 5.06μs -> 5.40μs (6.31% slower)

def test_basic_dataframe_handler_multiple_classes():
    # Multiple classes, different types, not in external_risk_by_class_val
    result = {
        'classes': [
            {'type': 'IRDelta', 'asset': 'USD'},
            {'type': 'IRDelta', 'asset': 'EUR'},
            {'type': 'IRDelta', 'asset': 'JPY'}
        ],
        'values': [1.0, 2.0, 3.0],
        'unit': 'USD'
    }
    risk_key = RiskKey(RiskMeasure('IRDelta'))
    codeflash_output = risk_by_class_handler(result, risk_key, InstrumentBase()); out = codeflash_output # 197μs -> 193μs (2.02% faster)


def test_edge_empty_classes():
    # No classes in result
    result = {
        'classes': [],
        'values': [],
        'unit': 'USD'
    }
    risk_key = RiskKey(RiskMeasure('IRDeltaParallel'))
    codeflash_output = risk_by_class_handler(result, risk_key, InstrumentBase()); out = codeflash_output # 182μs -> 180μs (1.40% faster)
    # Should be nan for FloatWithInfo
    if isinstance(out, FloatWithInfo):
        pass

def test_edge_classes_with_spike_and_crosses():
    # SPIKE class should be skipped, value added to CROSSES
    result = {
        'classes': [
            {'type': 'CROSSES', 'asset': 'USD'},
            {'type': 'SPIKE', 'asset': 'USD'},
            {'type': 'IRDelta', 'asset': 'EUR'}
        ],
        'values': [5.0, 2.0, 1.0],
        'unit': 'USD'
    }
    risk_key = RiskKey(RiskMeasure('IRDelta'))
    codeflash_output = risk_by_class_handler(result, risk_key, InstrumentBase()); out = codeflash_output # 187μs -> 178μs (5.10% faster)
    # SPIKE skipped, CROSSES value increased
    crosses_row = ('CROSSES', 'USD', 7.0)
    ir_row = ('IRDelta', 'EUR', 1.0)

def test_edge_classes_with_jump_and_no_crosses():
    # JUMP class should be skipped, but no CROSSES to add to
    result = {
        'classes': [
            {'type': 'JUMP', 'asset': 'USD'},
            {'type': 'IRDelta', 'asset': 'EUR'}
        ],
        'values': [3.0, 1.0],
        'unit': 'USD'
    }
    risk_key = RiskKey(RiskMeasure('IRDelta'))
    codeflash_output = risk_by_class_handler(result, risk_key, InstrumentBase()); out = codeflash_output # 161μs -> 157μs (2.66% faster)
    # JUMP skipped, only IRDelta remains
    ir_row = ('IRDelta', 'EUR', 1.0)

def test_edge_classes_with_mixed_types_and_external_measure():
    # Classes with different types, but measure in external_risk_by_class_val
    result = {
        'classes': [
            {'type': 'IRDeltaParallel', 'asset': 'USD'},
            {'type': 'IRDelta', 'asset': 'EUR'}
        ],
        'values': [1.0, 2.0],
        'unit': 'USD'
    }
    risk_key = RiskKey(RiskMeasure('IRDeltaParallel'))
    codeflash_output = risk_by_class_handler(result, risk_key, InstrumentBase()); out = codeflash_output # 154μs -> 149μs (3.50% faster)

def test_edge_values_missing():
    # values missing from result
    result = {
        'classes': [{'type': 'IRDeltaParallel', 'asset': 'USD'}],
        # 'values' missing
        'unit': 'USD'
    }
    risk_key = RiskKey(RiskMeasure('IRDeltaParallel'))
    codeflash_output = risk_by_class_handler(result, risk_key, InstrumentBase()); out = codeflash_output # 6.43μs -> 6.45μs (0.310% slower)


def test_edge_classes_with_duplicate_types():
    # Classes with duplicate types, not in external_risk_by_class_val
    result = {
        'classes': [
            {'type': 'IRDelta', 'asset': 'USD'},
            {'type': 'IRDelta', 'asset': 'USD'}
        ],
        'values': [1.0, 2.0],
        'unit': 'USD'
    }
    risk_key = RiskKey(RiskMeasure('IRDelta'))
    codeflash_output = risk_by_class_handler(result, risk_key, InstrumentBase()); out = codeflash_output # 201μs -> 195μs (3.19% faster)

# Large Scale Test Cases

def test_large_floatwithinfo_many_classes():
    # 1000 classes, same type, measure in external_risk_by_class_val
    n = 1000
    result = {
        'classes': [{'type': 'IRDeltaParallel', 'asset': f'CUR{i}'} for i in range(n)],
        'values': [float(i) for i in range(n)],
        'unit': 'USD'
    }
    risk_key = RiskKey(RiskMeasure('IRDeltaParallel'))
    codeflash_output = risk_by_class_handler(result, risk_key, InstrumentBase()); out = codeflash_output # 1.17ms -> 1.02ms (14.6% faster)

def test_large_dataframe_handler_many_classes():
    # 1000 classes, different types, not in external_risk_by_class_val
    n = 1000
    result = {
        'classes': [{'type': f'TYPE{i%10}', 'asset': f'CUR{i}'} for i in range(n)],
        'values': [float(i) for i in range(n)],
        'unit': 'USD'
    }
    risk_key = RiskKey(RiskMeasure('IRDelta'))
    codeflash_output = risk_by_class_handler(result, risk_key, InstrumentBase()); out = codeflash_output # 1.30ms -> 1.14ms (13.8% faster)
    # Spot check a few records
    for i in [0, 500, 999]:
        pass


#------------------------------------------------
import pytest
from gs_quant.risk.result_handlers import risk_by_class_handler


# Minimal stubs for building the handler's arguments (risk_by_class_handler itself is imported from gs_quant above)
class RiskMeasure:
    def __init__(self, name):
        self.name = name

class RiskKey:
    def __init__(self, risk_measure):
        self.risk_measure = risk_measure

class InstrumentBase:
    pass

class DataFrameWithInfo(list):
    def __init__(self, records=None, risk_key=None, request_id=None):
        super().__init__(records if records is not None else [])
        self.risk_key = risk_key
        self.request_id = request_id
        self.columns = []
    def __getitem__(self, key):
        # Support for column access if key is a string
        if isinstance(key, str):
            idx = self.columns.index(key)
            return [row[idx] for row in self]
        return super().__getitem__(key)
    def __setitem__(self, key, values):
        # Support for column assignment if key is a string
        if isinstance(key, str):
            idx = self.columns.index(key)
            for i, row in enumerate(self):
                row[idx] = values[i]
        else:
            super().__setitem__(key, values)
    def map(self, func):
        return [func(x) for x in self]

class FloatWithInfo(float):
    def __new__(cls, risk_key, value, unit=None, request_id=None):
        obj = float.__new__(cls, value)
        obj.risk_key = risk_key
        obj.unit = unit
        obj.request_id = request_id
        return obj

class PnlExplain:
    pass

# ========== UNIT TESTS ==========

# ----------- BASIC TEST CASES ------------

def test_basic_floatwithinfo_single_class():
    # Normal case: IRDeltaParallel, single class, returns FloatWithInfo
    result = {
        'classes': [{'type': 'IRDeltaParallel', 'asset': 'USD'}],
        'values': [10.0],
        'unit': 'USD'
    }
    risk_key = RiskKey(RiskMeasure('IRDeltaParallel'))
    instrument = InstrumentBase()
    codeflash_output = risk_by_class_handler(result, risk_key, instrument); out = codeflash_output # 7.27μs -> 7.38μs (1.44% slower)

def test_basic_floatwithinfo_multiple_classes_same_type():
    # Multiple classes, but same type, <=2, returns FloatWithInfo
    result = {
        'classes': [{'type': 'IRDeltaParallel', 'asset': 'USD'},
                    {'type': 'IRDeltaParallel', 'asset': 'EUR'}],
        'values': [10.0, 5.0],
        'unit': 'USD'
    }
    risk_key = RiskKey(RiskMeasure('IRDeltaParallel'))
    instrument = InstrumentBase()
    codeflash_output = risk_by_class_handler(result, risk_key, instrument); out = codeflash_output # 5.13μs -> 5.30μs (3.13% slower)

def test_basic_dataframe_withinfo_multiple_types():
    # Multiple types, returns DataFrameWithInfo
    result = {
        'classes': [{'type': 'CROSSES', 'asset': 'USD'},
                    {'type': 'SPIKE', 'asset': 'EUR'}],
        'values': [10.0, 5.0],
        'unit': 'USD'
    }
    risk_key = RiskKey(RiskMeasure('OtherType'))
    instrument = InstrumentBase()
    codeflash_output = risk_by_class_handler(result, risk_key, instrument); out = codeflash_output # 202μs -> 194μs (4.07% faster)


def test_edge_empty_classes():
    # Empty classes list
    result = {
        'classes': [],
        'values': [],
        'unit': 'USD'
    }
    risk_key = RiskKey(RiskMeasure('OtherType'))
    instrument = InstrumentBase()
    codeflash_output = risk_by_class_handler(result, risk_key, instrument); out = codeflash_output # 182μs -> 176μs (3.12% faster)

def test_edge_missing_values_key():
    # No 'values' key in result
    result = {
        'classes': [{'type': 'IRDeltaParallel', 'asset': 'USD'}],
        'unit': 'USD'
    }
    risk_key = RiskKey(RiskMeasure('IRDeltaParallel'))
    instrument = InstrumentBase()
    codeflash_output = risk_by_class_handler(result, risk_key, instrument); out = codeflash_output # 6.53μs -> 6.66μs (1.98% slower)

def test_edge_spike_and_jump_handling():
    # Multiple SPIKE/JUMP types, with CROSSES present
    result = {
        'classes': [
            {'type': 'CROSSES', 'asset': 'USD'},
            {'type': 'SPIKE', 'asset': 'EUR'},
            {'type': 'JUMP', 'asset': 'GBP'}
        ],
        'values': [10.0, 5.0, 2.0],
        'unit': 'USD'
    }
    risk_key = RiskKey(RiskMeasure('OtherType'))
    instrument = InstrumentBase()
    codeflash_output = risk_by_class_handler(result, risk_key, instrument); out = codeflash_output # 193μs -> 183μs (5.58% faster)

def test_edge_no_crosses_for_spike_jump():
    # SPIKE/JUMP present, but no CROSSES to aggregate
    result = {
        'classes': [
            {'type': 'SPIKE', 'asset': 'EUR'},
            {'type': 'JUMP', 'asset': 'GBP'}
        ],
        'values': [5.0, 2.0],
        'unit': 'USD'
    }
    risk_key = RiskKey(RiskMeasure('OtherType'))
    instrument = InstrumentBase()
    codeflash_output = risk_by_class_handler(result, risk_key, instrument); out = codeflash_output # 155μs -> 156μs (0.835% slower)

def test_edge_mixed_types_not_parallel():
    # Mixed types, not in external_risk_by_class_val, returns DataFrameWithInfo
    result = {
        'classes': [{'type': 'CROSSES', 'asset': 'USD'},
                    {'type': 'IRDeltaParallel', 'asset': 'EUR'}],
        'values': [10.0, 5.0],
        'unit': 'USD'
    }
    risk_key = RiskKey(RiskMeasure('OtherType'))
    instrument = InstrumentBase()
    codeflash_output = risk_by_class_handler(result, risk_key, instrument); out = codeflash_output # 171μs -> 165μs (3.38% faster)

def test_edge_floatwithinfo_parallel_two_types():
    # external_risk_by_class_val, but types are not all the same
    result = {
        'classes': [{'type': 'IRDeltaParallel', 'asset': 'USD'},
                    {'type': 'IRVegaParallel', 'asset': 'EUR'}],
        'values': [10.0, 5.0],
        'unit': 'USD'
    }
    risk_key = RiskKey(RiskMeasure('IRDeltaParallel'))
    instrument = InstrumentBase()
    codeflash_output = risk_by_class_handler(result, risk_key, instrument); out = codeflash_output # 160μs -> 148μs (8.11% faster)

# ----------- LARGE SCALE TEST CASES ------------

def test_large_scale_floatwithinfo_sum():
    # Large number of classes, all same type, returns FloatWithInfo
    n = 500
    result = {
        'classes': [{'type': 'IRDeltaParallel', 'asset': f'A{i}'} for i in range(n)],
        'values': [float(i) for i in range(n)],
        'unit': 'USD'
    }
    risk_key = RiskKey(RiskMeasure('IRDeltaParallel'))
    instrument = InstrumentBase()
    codeflash_output = risk_by_class_handler(result, risk_key, instrument); out = codeflash_output # 678μs -> 600μs (12.9% faster)

def test_large_scale_dataframe_withinfo():
    # Large number of classes, mixed types, returns DataFrameWithInfo
    n = 500
    result = {
        'classes': [{'type': 'CROSSES' if i % 2 == 0 else 'IRDeltaParallel', 'asset': f'A{i}'} for i in range(n)],
        'values': [float(i) for i in range(n)],
        'unit': 'USD'
    }
    risk_key = RiskKey(RiskMeasure('OtherType'))
    instrument = InstrumentBase()
    codeflash_output = risk_by_class_handler(result, risk_key, instrument); out = codeflash_output # 683μs -> 613μs (11.4% faster)

def test_large_scale_spike_jump_aggregation():
    # Large number of SPIKE/JUMP, one CROSSES, all SPIKE/JUMP values should aggregate into CROSSES
    n = 500
    result = {
        'classes': [{'type': 'CROSSES', 'asset': 'USD'}] +
                   [{'type': 'SPIKE', 'asset': f'A{i}'} for i in range(n)],
        'values': [1.0] + [1.0 for _ in range(n)],
        'unit': 'USD'
    }
    risk_key = RiskKey(RiskMeasure('OtherType'))
    instrument = InstrumentBase()
    codeflash_output = risk_by_class_handler(result, risk_key, instrument); out = codeflash_output # 646μs -> 253μs (155% faster)
⏪ Replay Tests and Runtime

| Test File::Test Function | Original ⏱️ | Optimized ⏱️ | Speedup |
| --- | --- | --- | --- |
| `test_pytest_gs_quanttestapitest_content_py_gs_quanttestanalyticstest_workspace_py_gs_quanttesttimeseriest__replay_test_0.py::test_gs_quant_risk_result_handlers_risk_by_class_handler` | 478μs | 462μs | 3.57%✅ |

To edit these changes, `git checkout codeflash/optimize-risk_by_class_handler-mhb2xqpo` and push.

@codeflash-ai codeflash-ai bot requested a review from mashraf-222 October 28, 2025 21:29
@codeflash-ai codeflash-ai bot added ⚡️ codeflash Optimization PR opened by Codeflash AI 🎯 Quality: High Optimization Quality according to Codeflash labels Oct 28, 2025