@codeflash-ai codeflash-ai bot commented Oct 28, 2025

📄 10% (0.10x) speedup for mdapi_table_handler in gs_quant/risk/result_handlers.py

⏱️ Runtime: 17.5 milliseconds → 15.9 milliseconds (best of 34 runs)
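As a rough check on the headline figure, the sketch below recomputes the percentage from the two runtimes above. It assumes the speedup is defined as original time divided by optimized time, minus one, which is consistent with the per-test figures in this report; the function names are illustrative and not part of Codeflash.

```python
def speedup(original_ms: float, optimized_ms: float) -> float:
    """Relative speedup, assumed here to be original / optimized - 1."""
    return original_ms / optimized_ms - 1.0


def runtime_reduction(original_ms: float, optimized_ms: float) -> float:
    """Fraction of the original wall time that was saved."""
    return (original_ms - optimized_ms) / original_ms


if __name__ == "__main__":
    print(f"speedup:           {speedup(17.5, 15.9):.1%}")            # 10.1%
    print(f"runtime reduction: {runtime_reduction(17.5, 15.9):.1%}")  # 9.1%
```

The two definitions differ by about one percentage point here, so a figure of 9% and a figure of 10% can both describe the same pair of measurements.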

📝 Explanation and details

The optimizations deliver a roughly 10% speedup (17.5 ms down to 15.9 ms, about 9% less wall time) through several key improvements:

1. Eliminated redundant data structure operations in __dataframe_handler:

  • Replaced boolean index array ([False] * len(...)) with direct index collection using append(), avoiding unnecessary list pre-allocation and boolean flag tracking
  • Changed tuple concatenation (columns += ((mappings_lookup[src]),)) to list append operations, which are significantly faster for building collections incrementally
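The difference between the two column-selection styles can be sketched as follows. This is a minimal illustration and not the code in `gs_quant`: the function names, the sample row, and the contents of `mappings_lookup` are assumptions made for the example.

```python
def select_with_mask(first_row: dict, mappings_lookup: dict):
    """Older style: pre-allocated boolean mask plus tuple concatenation."""
    indices = [False] * len(first_row)
    columns = ()
    for idx, src in enumerate(first_row):
        if src in mappings_lookup:
            indices[idx] = True
            columns += (mappings_lookup[src],)  # allocates a new tuple each time
    kept = [i for i, keep in enumerate(indices) if keep]
    return kept, columns


def select_with_append(first_row: dict, mappings_lookup: dict):
    """Newer style: collect the wanted positions and names directly."""
    kept = []
    columns = []
    for idx, src in enumerate(first_row):
        if src in mappings_lookup:
            kept.append(idx)
            columns.append(mappings_lookup[src])
    return kept, tuple(columns)  # convert to a tuple once, at the end


if __name__ == "__main__":
    # Hypothetical row and lookup, chosen only to exercise both functions.
    row = {"type": "Swap", "extraKey": "x", "asset": "USD", "point": "10Y"}
    lookup = {"type": "mkt_type", "asset": "mkt_asset", "point": "mkt_point"}
    assert select_with_mask(row, lookup) == select_with_append(row, lookup)
    print(select_with_append(row, lookup))
```

Both functions return the same positions and column names; the second simply avoids the mask and the intermediate tuples.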

2. Optimized iteration patterns:

  • Used explicit iterator management (result_iter = iter(result)) to avoid re-creating iterators when processing the remaining data after extracting the first row
  • Implemented a generator function _filtered_rows() that processes both the first row and remaining rows in a single pass, eliminating the need to reconstruct the full dataset for filtering
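A single-pass version of this pattern might look like the sketch below. The name `filtered_rows` and its `wanted` parameter are hypothetical; the real `_filtered_rows()` helper is not shown in this report, so treat this only as an illustration of the iterator handling.

```python
from typing import Iterable, Iterator


def filtered_rows(result: Iterable[dict], wanted: set) -> Iterator[tuple]:
    """Yield the wanted values of every row, consuming `result` only once."""
    result_iter = iter(result)            # one iterator, reused below
    first_row = next(result_iter, None)
    if first_row is None:                 # empty input: nothing to yield
        return

    # Positions are taken from the first row; this assumes every later row
    # has the same keys in the same order.
    kept = [i for i, key in enumerate(first_row) if key in wanted]

    def pick(row: dict) -> tuple:
        values = list(row.values())
        return tuple(values[i] for i in kept)

    yield pick(first_row)                 # the row already pulled off the iterator
    for row in result_iter:               # continues after the first row
        yield pick(row)


if __name__ == "__main__":
    rows = iter([{"a": 1, "b": 2, "c": 3}, {"a": 4, "b": 5, "c": 6}])
    print(list(filtered_rows(rows, {"a", "c"})))  # [(1, 3), (4, 6)]
```

Because the first row is yielded from the same iterator that feeds the loop, the function also works on one-shot iterators, which cannot be restarted.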

3. Reduced dictionary operations in mdapi_table_handler:

  • Eliminated multiple update() calls on the coordinate dictionary by using direct assignment (coordinate['point'] = point vs coordinate.update({'point': point}))
  • Cached the coordinates.append method reference to avoid repeated attribute lookups in the tight loop
  • Added rows = result['rows'] to avoid repeated dictionary access
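The loop described above can be sketched roughly as follows. The function name `flatten_rows` is hypothetical, and the default of `''` for a missing `point` is an assumption; the `';'` join, the `None` for a missing value, and the `KeyError` on missing permissions follow the behaviour described in the generated tests.

```python
def flatten_rows(result: dict) -> list:
    """Fold value and permissions into each row's coordinate dict."""
    rows = result["rows"]                 # look the list up once
    coordinates = []
    append = coordinates.append           # cache the bound method for the loop

    for r in rows:
        coordinate = r["coordinate"]      # KeyError if the key is missing
        raw_point = coordinate.get("point", "")  # assumed default for a missing point
        # Direct assignment instead of coordinate.update({...}) avoids
        # building a throwaway dict per field.
        coordinate["point"] = ";".join(raw_point) if isinstance(raw_point, list) else raw_point
        coordinate["value"] = r.get("value")          # None when absent
        coordinate["permissions"] = r["permissions"]  # KeyError when absent
        append(coordinate)

    return coordinates


if __name__ == "__main__":
    sample = {
        "rows": [
            {"coordinate": {"type": "Swap", "point": ["2Y", "Spot"]},
             "value": 88.8, "permissions": ["READ"]},
            {"coordinate": {"type": "Swap", "point": "10Y"}, "permissions": []},
        ]
    }
    for c in flatten_rows(sample):
        print(c)
```

Note that this sketch mutates the coordinate dictionaries it is given, so callers that reuse the input afterwards would see the added keys.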

4. Memory access optimizations:

  • Pre-converted keys to a list (key_list = list(first_row.keys())) to avoid repeated dictionary key iteration
  • Used list operations instead of tuple concatenation during the filtering phase, only converting to tuple once at the end
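To see how much the list-then-tuple change matters on a given machine, a small `timeit` harness such as the one below can be used. It is an illustrative micro-benchmark with made-up function names, not a measurement of the handler itself, and its numbers will vary between runs and interpreters.

```python
import timeit


def build_tuple_concat(n: int) -> tuple:
    """Grow a tuple with +=, copying the whole tuple on every step."""
    cols = ()
    for i in range(n):
        cols += (i,)
    return cols


def build_list_then_tuple(n: int) -> tuple:
    """Append to a list and convert to a tuple once at the end."""
    cols = []
    for i in range(n):
        cols.append(i)
    return tuple(cols)


def compare(n: int = 2000, number: int = 20, repeat: int = 5):
    """Return the best-of-`repeat` time in seconds for each strategy."""
    t_concat = min(timeit.repeat(lambda: build_tuple_concat(n), number=number, repeat=repeat))
    t_list = min(timeit.repeat(lambda: build_list_then_tuple(n), number=number, repeat=repeat))
    return t_concat, t_list


if __name__ == "__main__":
    t_concat, t_list = compare()
    print(f"tuple +=        : {t_concat:.6f} s")
    print(f"list then tuple : {t_list:.6f} s")
```

With only a handful of columns per row, as in the tests in this report, the difference from this particular change is likely to be small; the copying cost of tuple concatenation grows with the length of the collection.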

The gains are concentrated in the large-scale test cases (500 to 1,000 rows), which improve by roughly 13-18%; single-row cases change by only a few percent in either direction, which is within measurement noise. Correctness is maintained across all edge cases, including empty data, missing values, and varied data types. The performance gains come primarily from reducing Python object creation overhead and eliminating redundant operations in tight loops.

Correctness verification report:

| Test | Status |
|---|---|
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | ✅ 34 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 🔘 None Found |
| 📊 Tests Coverage | 100.0% |
🌀 Generated Regression Tests and Runtime
import pytest
from gs_quant.risk.result_handlers import mdapi_table_handler

# --- Minimal local stubs for the handler's arguments (the handler itself is imported from gs_quant) ---

class RiskKey:
    def __init__(self, key=None):
        self.key = key

class InstrumentBase:
    pass

class DataFrameWithInfo:
    def __init__(self, records=None, risk_key=None, request_id=None):
        self.records = records if records is not None else []
        self.risk_key = risk_key
        self.request_id = request_id
        self.columns = ()

    def __eq__(self, other):
        return (
            isinstance(other, DataFrameWithInfo) and
            self.records == other.records and
            self.columns == other.columns and
            self.risk_key == other.risk_key and
            self.request_id == other.request_id
        )

# --- Unit Tests ---

# Helper to build one input row in the shape mdapi_table_handler expects.
# The first parameter is named type_ to avoid shadowing the built-in type().
def build_row(type_, asset, assetClass, point, quotingStyle, value, permissions):
    return {
        'coordinate': {
            'type': type_,
            'asset': asset,
            'assetClass': assetClass,
            'point': point,
            'quotingStyle': quotingStyle
        },
        'value': value,
        'permissions': permissions
    }

# ---- BASIC TEST CASES ----


def test_basic_multiple_rows():
    """Test with multiple rows, different points and values"""
    result = {
        'rows': [
            build_row('Swap', 'USD', 'Rates', '5Y', 'Clean', 99.9, ['READ']),
            build_row('Swap', 'USD', 'Rates', '10Y', 'Clean', 101.1, ['READ'])
        ]
    }
    rk = RiskKey('rk2')
    codeflash_output = mdapi_table_handler(result, rk, InstrumentBase()); df = codeflash_output # 301μs -> 301μs (0.087% faster)

def test_basic_point_as_list():
    """Test where the point is a list, should join with ';'"""
    result = {
        'rows': [
            build_row('Swap', 'EUR', 'Rates', ['2Y', 'Spot'], 'Dirty', 88.8, ['READ', 'WRITE'])
        ]
    }
    rk = RiskKey('rk3')
    codeflash_output = mdapi_table_handler(result, rk, InstrumentBase()); df = codeflash_output # 255μs -> 255μs (0.083% faster)

def test_basic_permissions():
    """Test that permissions are copied correctly"""
    perms = ['READ', 'TRADE']
    result = {
        'rows': [
            build_row('Bond', 'GBP', 'Credit', '2025', 'Clean', 200.0, perms)
        ]
    }
    rk = RiskKey('rk4')
    codeflash_output = mdapi_table_handler(result, rk, InstrumentBase()); df = codeflash_output # 245μs -> 244μs (0.532% faster)

# ---- EDGE TEST CASES ----

def test_edge_empty_rows():
    """Test with empty 'rows' list"""
    result = {'rows': []}
    rk = RiskKey('rk_empty')
    codeflash_output = mdapi_table_handler(result, rk, InstrumentBase()); df = codeflash_output # 152μs -> 151μs (0.361% faster)

def test_edge_missing_value():
    """Test where a row does not have a 'value' key"""
    row = {
        'coordinate': {
            'type': 'Swap',
            'asset': 'USD',
            'assetClass': 'Rates',
            'point': '10Y',
            'quotingStyle': 'Clean'
        },
        # 'value' is missing
        'permissions': ['READ']
    }
    result = {'rows': [row]}
    rk = RiskKey('rk_missing_value')
    codeflash_output = mdapi_table_handler(result, rk, InstrumentBase()); df = codeflash_output # 195μs -> 199μs (1.78% slower)

def test_edge_point_is_empty_list():
    """Test where point is an empty list"""
    result = {
        'rows': [
            build_row('Swap', 'USD', 'Rates', [], 'Clean', 50.0, ['READ'])
        ]
    }
    rk = RiskKey('rk_empty_point')
    codeflash_output = mdapi_table_handler(result, rk, InstrumentBase()); df = codeflash_output # 260μs -> 255μs (1.86% faster)

def test_edge_point_is_none():
    """Test where point is None"""
    row = build_row('Swap', 'USD', 'Rates', None, 'Clean', 70.0, ['READ'])
    result = {'rows': [row]}
    rk = RiskKey('rk_none_point')
    codeflash_output = mdapi_table_handler(result, rk, InstrumentBase()); df = codeflash_output # 253μs -> 252μs (0.173% faster)

def test_edge_permissions_empty():
    """Test where permissions is an empty list"""
    result = {
        'rows': [
            build_row('Swap', 'USD', 'Rates', '10Y', 'Clean', 123.4, [])
        ]
    }
    rk = RiskKey('rk_no_perms')
    codeflash_output = mdapi_table_handler(result, rk, InstrumentBase()); df = codeflash_output # 242μs -> 241μs (0.196% faster)

def test_edge_permissions_none():
    """Test where permissions is None"""
    row = build_row('Swap', 'USD', 'Rates', '10Y', 'Clean', 123.4, None)
    result = {'rows': [row]}
    rk = RiskKey('rk_none_perms')
    codeflash_output = mdapi_table_handler(result, rk, InstrumentBase()); df = codeflash_output # 246μs -> 250μs (1.30% slower)

def test_edge_missing_coordinate_keys():
    """Test where coordinate is missing some keys"""
    row = {
        'coordinate': {
            'type': 'Swap',
            'asset': 'USD',
            # 'assetClass' missing
            'point': '10Y',
            # 'quotingStyle' missing
        },
        'value': 10.0,
        'permissions': ['READ']
    }
    result = {'rows': [row]}
    rk = RiskKey('rk_missing_keys')
    codeflash_output = mdapi_table_handler(result, rk, InstrumentBase()); df = codeflash_output # 225μs -> 216μs (3.92% faster)

def test_edge_point_is_integer():
    """Test where point is an integer"""
    result = {
        'rows': [
            build_row('Swap', 'USD', 'Rates', 2025, 'Clean', 10.0, ['READ'])
        ]
    }
    rk = RiskKey('rk_int_point')
    codeflash_output = mdapi_table_handler(result, rk, InstrumentBase()); df = codeflash_output # 257μs -> 263μs (2.19% slower)

def test_edge_unusual_types_in_permissions():
    """Test where permissions is a string instead of list"""
    result = {
        'rows': [
            build_row('Swap', 'USD', 'Rates', '10Y', 'Clean', 10.0, 'READ')
        ]
    }
    rk = RiskKey('rk_str_perms')
    codeflash_output = mdapi_table_handler(result, rk, InstrumentBase()); df = codeflash_output # 252μs -> 244μs (3.26% faster)

def test_edge_extra_keys_in_coordinate():
    """Test where coordinate contains extra keys not in mappings"""
    row = {
        'coordinate': {
            'type': 'Swap',
            'asset': 'USD',
            'assetClass': 'Rates',
            'point': '10Y',
            'quotingStyle': 'Clean',
            'extraKey': 'extraValue'
        },
        'value': 10.0,
        'permissions': ['READ']
    }
    result = {'rows': [row]}
    rk = RiskKey('rk_extra')
    codeflash_output = mdapi_table_handler(result, rk, InstrumentBase()); df = codeflash_output # 246μs -> 247μs (0.136% slower)

# ---- LARGE SCALE TEST CASES ----

def test_large_scale_many_rows():
    """Test with 1000 rows, points as numbers, values increasing"""
    N = 1000
    rows = [
        build_row('Swap', 'USD', 'Rates', str(i), 'Clean', float(i), ['READ'])
        for i in range(N)
    ]
    result = {'rows': rows}
    rk = RiskKey('rk_large')
    codeflash_output = mdapi_table_handler(result, rk, InstrumentBase()); df = codeflash_output # 1.84ms -> 1.57ms (17.6% faster)
    # Intended check: rows should come back sorted by point

def test_large_scale_varied_types():
    """Test with 500 rows, alternating types/assets/permissions"""
    N = 500
    rows = [
        build_row(
            'Swap' if i % 2 == 0 else 'Bond',
            'USD' if i % 3 == 0 else 'EUR',
            'Rates' if i % 4 == 0 else 'Credit',
            [str(i), 'Spot'] if i % 5 == 0 else str(i),
            'Clean' if i % 2 == 0 else 'Dirty',
            i * 2.5,
            ['READ'] if i % 7 != 0 else ['TRADE', 'READ']
        )
        for i in range(N)
    ]
    result = {'rows': rows}
    rk = RiskKey('rk_varied')
    codeflash_output = mdapi_table_handler(result, rk, InstrumentBase()); df = codeflash_output # 1.13ms -> 1.00ms (13.1% faster)

def test_large_scale_permissions_none_and_empty():
    """Test with 100 rows, alternating permissions None and []"""
    N = 100
    rows = [
        build_row('Swap', 'USD', 'Rates', str(i), 'Clean', i, None if i % 2 == 0 else [])
        for i in range(N)
    ]
    result = {'rows': rows}
    rk = RiskKey('rk_perm_large')
    codeflash_output = mdapi_table_handler(result, rk, InstrumentBase()); df = codeflash_output # 441μs -> 413μs (6.80% faster)



#------------------------------------------------
import pytest
from gs_quant.risk.result_handlers import mdapi_table_handler


# Minimal local stubs for the handler's arguments (avoids depending on pandas/numpy in the tests)
class RiskKey:
    def __init__(self, name='test'):
        self.name = name

class InstrumentBase:
    pass

class DataFrameWithInfo:
    def __init__(self, data=None, risk_key=None, request_id=None):
        self.data = data if data is not None else []
        self.risk_key = risk_key
        self.request_id = request_id
        self.columns = ()
    def __eq__(self, other):
        return (
            isinstance(other, DataFrameWithInfo)
            and self.data == other.data
            and self.columns == other.columns
            and self.risk_key == other.risk_key
            and self.request_id == other.request_id
        )
    def __repr__(self):
        return f"DF(data={self.data}, cols={self.columns}, risk_key={self.risk_key}, request_id={self.request_id})"

# ========== UNIT TESTS ==========

# -------- BASIC TEST CASES --------

def test_basic_single_row_flat_point():
    # Single row, point is string
    result = {
        'rows': [
            {
                'coordinate': {
                    'type': 'TypeA',
                    'asset': 'Asset1',
                    'assetClass': 'Equity',
                    'point': '123',
                    'quotingStyle': 'Style1'
                },
                'value': 42,
                'permissions': ['read']
            }
        ]
    }
    rk = RiskKey()
    codeflash_output = mdapi_table_handler(result, rk, InstrumentBase()); df = codeflash_output # 305μs -> 301μs (1.17% faster)

def test_basic_single_row_list_point():
    # Single row, point is list
    result = {
        'rows': [
            {
                'coordinate': {
                    'type': 'TypeB',
                    'asset': 'Asset2',
                    'assetClass': 'FX',
                    'point': ['EUR', 'USD'],
                    'quotingStyle': 'Style2'
                },
                'value': 99,
                'permissions': ['write']
            }
        ]
    }
    rk = RiskKey()
    codeflash_output = mdapi_table_handler(result, rk, InstrumentBase()); df = codeflash_output # 262μs -> 252μs (3.73% faster)

def test_basic_multiple_rows_sorted():
    # Multiple rows, check sort order by point
    result = {
        'rows': [
            {
                'coordinate': {
                    'type': 'TypeA',
                    'asset': 'Asset1',
                    'assetClass': 'Equity',
                    'point': '2',
                    'quotingStyle': 'Style1'
                },
                'value': 20,
                'permissions': ['read']
            },
            {
                'coordinate': {
                    'type': 'TypeA',
                    'asset': 'Asset1',
                    'assetClass': 'Equity',
                    'point': '1',
                    'quotingStyle': 'Style1'
                },
                'value': 10,
                'permissions': ['read']
            }
        ]
    }
    rk = RiskKey()
    codeflash_output = mdapi_table_handler(result, rk, InstrumentBase()); df = codeflash_output # 246μs -> 255μs (3.63% slower)

def test_basic_permissions_list():
    # Permissions as list
    result = {
        'rows': [
            {
                'coordinate': {
                    'type': 'TypeA',
                    'asset': 'Asset1',
                    'assetClass': 'Equity',
                    'point': '1',
                    'quotingStyle': 'Style1'
                },
                'value': 5,
                'permissions': ['read', 'write']
            }
        ]
    }
    rk = RiskKey()
    codeflash_output = mdapi_table_handler(result, rk, InstrumentBase()); df = codeflash_output # 251μs -> 250μs (0.398% faster)

# -------- EDGE TEST CASES --------

def test_edge_empty_rows():
    # No rows in result
    result = {'rows': []}
    rk = RiskKey()
    codeflash_output = mdapi_table_handler(result, rk, InstrumentBase()); df = codeflash_output # 152μs -> 152μs (0.395% faster)

def test_edge_missing_point_and_value():
    # Missing 'point' and 'value'
    result = {
        'rows': [
            {
                'coordinate': {
                    'type': 'TypeC',
                    'asset': 'Asset3',
                    'assetClass': 'Rates',
                    # 'point' missing
                    'quotingStyle': 'Style3'
                },
                # 'value' missing
                'permissions': ['read']
            }
        ]
    }
    rk = RiskKey()
    codeflash_output = mdapi_table_handler(result, rk, InstrumentBase()); df = codeflash_output # 195μs -> 191μs (2.19% faster)

def test_edge_missing_permissions():
    # Missing permissions key (should raise KeyError)
    result = {
        'rows': [
            {
                'coordinate': {
                    'type': 'TypeD',
                    'asset': 'Asset4',
                    'assetClass': 'Commodities',
                    'point': 'Oil',
                    'quotingStyle': 'Style4'
                },
                'value': 77
                # 'permissions' missing
            }
        ]
    }
    rk = RiskKey()
    with pytest.raises(KeyError):
        mdapi_table_handler(result, rk, InstrumentBase()) # 2.04μs -> 1.67μs (22.3% faster)

def test_edge_point_is_none():
    # Point is explicitly None
    result = {
        'rows': [
            {
                'coordinate': {
                    'type': 'TypeE',
                    'asset': 'Asset5',
                    'assetClass': 'Credit',
                    'point': None,
                    'quotingStyle': 'Style5'
                },
                'value': 88,
                'permissions': ['read']
            }
        ]
    }
    rk = RiskKey()
    codeflash_output = mdapi_table_handler(result, rk, InstrumentBase()); df = codeflash_output # 266μs -> 270μs (1.13% slower)

def test_edge_point_is_empty_list():
    # Point is empty list
    result = {
        'rows': [
            {
                'coordinate': {
                    'type': 'TypeF',
                    'asset': 'Asset6',
                    'assetClass': 'FX',
                    'point': [],
                    'quotingStyle': 'Style6'
                },
                'value': 101,
                'permissions': ['read']
            }
        ]
    }
    rk = RiskKey()
    codeflash_output = mdapi_table_handler(result, rk, InstrumentBase()); df = codeflash_output # 257μs -> 256μs (0.239% faster)

def test_edge_extra_coordinate_fields():
    # Extra fields in coordinate, should be ignored
    result = {
        'rows': [
            {
                'coordinate': {
                    'type': 'TypeG',
                    'asset': 'Asset7',
                    'assetClass': 'Rates',
                    'point': '5',
                    'quotingStyle': 'Style7',
                    'extraField': 'should_ignore'
                },
                'value': 55,
                'permissions': ['read']
            }
        ]
    }
    rk = RiskKey()
    codeflash_output = mdapi_table_handler(result, rk, InstrumentBase()); df = codeflash_output # 254μs -> 246μs (3.09% faster)
    # Extra field not present in output

def test_edge_permissions_is_empty_list():
    # Permissions is empty list
    result = {
        'rows': [
            {
                'coordinate': {
                    'type': 'TypeH',
                    'asset': 'Asset8',
                    'assetClass': 'Equity',
                    'point': '7',
                    'quotingStyle': 'Style8'
                },
                'value': 33,
                'permissions': []
            }
        ]
    }
    rk = RiskKey()
    codeflash_output = mdapi_table_handler(result, rk, InstrumentBase()); df = codeflash_output # 242μs -> 246μs (1.63% slower)

def test_edge_permissions_is_none():
    # Permissions is None (the handler passes it through; no exception is expected)
    result = {
        'rows': [
            {
                'coordinate': {
                    'type': 'TypeI',
                    'asset': 'Asset9',
                    'assetClass': 'FX',
                    'point': 'GBP',
                    'quotingStyle': 'Style9'
                },
                'value': 100,
                'permissions': None
            }
        ]
    }
    rk = RiskKey()
    codeflash_output = mdapi_table_handler(result, rk, InstrumentBase()); df = codeflash_output # 250μs -> 241μs (3.41% faster)

def test_edge_missing_coordinate():
    # Row missing 'coordinate' key (should raise KeyError)
    result = {
        'rows': [
            {
                # 'coordinate' missing
                'value': 123,
                'permissions': ['read']
            }
        ]
    }
    rk = RiskKey()
    with pytest.raises(KeyError):
        mdapi_table_handler(result, rk, InstrumentBase()) # 1.04μs -> 1.16μs (10.4% slower)

def test_edge_coordinate_not_dict():
    # Coordinate is not a dict (should raise AttributeError)
    result = {
        'rows': [
            {
                'coordinate': 'not_a_dict',
                'value': 123,
                'permissions': ['read']
            }
        ]
    }
    rk = RiskKey()
    with pytest.raises(AttributeError):
        mdapi_table_handler(result, rk, InstrumentBase()) # 1.59μs -> 1.64μs (3.17% slower)

# -------- LARGE SCALE TEST CASES --------

def test_large_scale_1000_rows():
    # 1000 rows, points as increasing numbers
    rows = []
    for i in range(1000):
        rows.append({
            'coordinate': {
                'type': 'TypeBulk',
                'asset': f'Asset{i%10}',
                'assetClass': 'Equity',
                'point': str(i),
                'quotingStyle': 'StyleBulk'
            },
            'value': i,
            'permissions': ['read']
        })
    result = {'rows': rows}
    rk = RiskKey()
    codeflash_output = mdapi_table_handler(result, rk, InstrumentBase()); df = codeflash_output # 2.06ms -> 1.78ms (15.8% faster)

def test_large_scale_points_are_lists():
    # 1000 rows, points as lists
    rows = []
    for i in range(1000):
        pt = [str(i), str(999-i)]
        rows.append({
            'coordinate': {
                'type': 'TypeBulk',
                'asset': f'Asset{i%10}',
                'assetClass': 'Equity',
                'point': pt,
                'quotingStyle': 'StyleBulk'
            },
            'value': i,
            'permissions': ['read']
        })
    result = {'rows': rows}
    rk = RiskKey()
    codeflash_output = mdapi_table_handler(result, rk, InstrumentBase()); df = codeflash_output # 2.13ms -> 1.83ms (16.3% faster)

def test_large_scale_permissions_varied():
    # 1000 rows, permissions alternate between lists
    rows = []
    for i in range(1000):
        perms = ['read'] if i % 2 == 0 else ['write', 'read']
        rows.append({
            'coordinate': {
                'type': 'TypeBulk',
                'asset': f'Asset{i%10}',
                'assetClass': 'Equity',
                'point': str(i),
                'quotingStyle': 'StyleBulk'
            },
            'value': i,
            'permissions': perms
        })
    result = {'rows': rows}
    rk = RiskKey()
    codeflash_output = mdapi_table_handler(result, rk, InstrumentBase()); df = codeflash_output # 2.03ms -> 1.76ms (15.1% faster)

def test_large_scale_missing_value():
    # 1000 rows, every 10th row missing value
    rows = []
    for i in range(1000):
        row = {
            'coordinate': {
                'type': 'TypeBulk',
                'asset': f'Asset{i%10}',
                'assetClass': 'Equity',
                'point': str(i),
                'quotingStyle': 'StyleBulk'
            },
            'permissions': ['read']
        }
        if i % 10 != 0:
            row['value'] = i
        rows.append(row)
    result = {'rows': rows}
    rk = RiskKey()
    codeflash_output = mdapi_table_handler(result, rk, InstrumentBase()); df = codeflash_output # 2.01ms -> 1.74ms (15.5% faster)
    # Intended check: every 10th row's value is None; all other rows keep their value

To edit these changes, run git checkout codeflash/optimize-mdapi_table_handler-mhb49ope, commit your edits, and push.

@codeflash-ai codeflash-ai bot requested a review from mashraf-222 October 28, 2025 22:06
@codeflash-ai codeflash-ai bot added the ⚡️ codeflash (Optimization PR opened by Codeflash AI) and 🎯 Quality: Medium (Optimization Quality according to Codeflash) labels Oct 28, 2025