@codeflash-ai codeflash-ai bot commented Oct 28, 2025

📄 9,805% (98.05x) speedup for _ext_use_mathjax in src/bokeh/embed/bundle.py

⏱️ Runtime: 41.8 milliseconds → 422 microseconds (best of 116 runs)

📝 Explanation and details

The optimization removes the main bottleneck in the original code, a full scan of the model registry for every unique module name, by caching models per module and looking them up directly.

What was optimized:

  • Caching strategy: Added a one-time cache (_module_models_map) that groups models by their top-level module name, stored as a function attribute
  • Selective iteration: Instead of iterating through ALL models in HasProps.model_class_reverse_map.values() for every unique module name (causing O(n×m) complexity), the optimized version directly looks up only the relevant models for each module
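
The caching strategy described above can be sketched roughly as follows. This is a minimal, self-contained illustration rather than the code in the PR: `REGISTRY`, `PlainModel`, `MathModel`, and `models_for` are hypothetical stand-ins, and only the `_module_models_map` attribute name is taken from the description.

```python
from collections import defaultdict


# Hypothetical stand-ins for registered model classes (not Bokeh classes).
class PlainModel: ...
class MathModel: ...

PlainModel.__module__ = "alpha.models"
MathModel.__module__ = "beta.models.text"

# Hypothetical registry: class name -> class.
REGISTRY = {"PlainModel": PlainModel, "MathModel": MathModel}


def models_for(name):
    """Return registered classes whose top-level module is exactly `name`."""
    cache = getattr(models_for, "_module_models_map", None)
    if cache is None:
        # Built once: a single pass over the registry, grouped by the
        # first component of each class's __module__.
        grouped = defaultdict(list)
        for model in REGISTRY.values():
            grouped[model.__module__.split(".")[0]].append(model)
        cache = models_for._module_models_map = dict(grouped)
    return cache.get(name, [])


print(models_for("beta"))   # classes grouped under "beta"
print(models_for("gamma"))  # no match: empty list
```

Two things are worth checking in review under this design. Grouping by exact top-level name is not identical to a `startswith()` prefix check (a prefix test on `custom1` would also match a module named `custom10`), and a cache built once will not see classes added to the registry after the first call.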

Key performance gains:

  • Original bottleneck: Lines showing 704K+ hits iterating through all models and 702K+ startswith() checks (61.2% of total time)
  • Optimization result: Cache built once with 467 iterations, then direct lookups via module_models_map.get(name)
  • Runtime improvement: 41.8ms → 422μs (98x speedup)
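
The headline figures can be reproduced from the two runtimes. The sketch below assumes the percentage is computed relative to the optimized runtime, i.e. (old − new) / new; that convention is inferred from the numbers in this report, not taken from Codeflash documentation.

```python
def percent_faster(old_us, new_us):
    """Speedup as a percentage of the new (optimized) runtime."""
    return (old_us - new_us) / new_us * 100


old_us = 41_800.0  # 41.8 milliseconds, in microseconds
new_us = 422.0     # 422 microseconds

ratio = old_us / new_us
gain = percent_faster(old_us, new_us)
print(f"{ratio:.2f}x ratio, {gain:.0f}% faster")  # 99.05x ratio, 9805% faster
```

Under that assumption, "98.05x" is the gain expressed as a multiple of the new runtime, while the plain old/new ratio is about 99x.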

Why this works:

  • The original code repeatedly scanned the entire model registry for each unique module name found in all_objs
  • The cache groups models by module prefix upfront, converting expensive O(n×m) nested loops into O(n+m) preprocessing + O(1) lookups
  • The gain scales with the number of unique module names: test_large_set_with_no_mathtext_returns_false (500 unique names) drops from 13.7ms to 99.1μs, while test_large_set_with_duplicate_module_names (a single unique name) still improves by 113%
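
To make the complexity argument concrete, the following sketch counts how many registry entries each approach touches. It is an illustration under stated assumptions, not a benchmark of the PR: the function names are hypothetical, the 467-entry registry size echoes the cache-build iteration count quoted above, and the 500 unique names mirror the large-scale tests.

```python
def scan_visits(unique_names, registry_modules):
    """Entries visited when every unique name triggers a full registry scan."""
    visits = 0
    for name in unique_names:
        for module in registry_modules:
            visits += 1
            module.startswith(name)  # the per-entry prefix check
    return visits


def grouped_visits(unique_names, registry_modules):
    """Entries visited when the registry is grouped once, then looked up."""
    visits = 0
    grouped = {}
    for module in registry_modules:  # one preprocessing pass
        visits += 1
        grouped.setdefault(module.split(".")[0], []).append(module)
    for name in unique_names:        # dictionary lookup per name
        visits += len(grouped.get(name, ()))
    return visits


names = [f"custom{i}" for i in range(500)]   # 500 unique module names
modules = ["bokeh.models.text"] * 467        # illustrative registry contents

print(scan_visits(names, modules))     # 500 * 467 = 233500
print(grouped_visits(names, modules))  # 467
```

With a single unique name the first function visits only 467 entries as well, which is consistent with the much smaller gain reported for the duplicate-module-name test.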

Best for: Scenarios where the input objects span many distinct top-level module names, since the original performs one full-registry scan per unique name and those scans dominate the runtime.

Correctness verification report:

| Test | Status |
|---|---|
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | 18 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 🔘 None Found |
| 📊 Tests Coverage | 88.2% |

🌀 Generated Regression Tests and Runtime
import pytest  # used for our unit tests
from bokeh.embed.bundle import _ext_use_mathjax

#-----------------------------------------------------------------------------
# Mocking required Bokeh classes for testing
#-----------------------------------------------------------------------------

class HasProps:
    """Minimal mock of bokeh.core.has_props.HasProps for testing."""
    model_class_reverse_map = {}

    def __init__(self):
        # These are used by _query_extensions
        self.__view_module__ = "bokeh"
        # __implementation__ is optional

class MathText(HasProps):
    """Mock of bokeh.models.text.MathText for testing."""
    pass

class DummyText(HasProps):
    """A dummy class not related to MathText."""
    pass

class OtherMathText(MathText):
    """A subclass of MathText, should be detected by issubclass."""
    pass

class NotMathText(HasProps):
    """A class unrelated to MathText."""
    pass

#-----------------------------------------------------------------------------
# Basic Test Cases
#-----------------------------------------------------------------------------

def test_empty_set_returns_false():
    """Test with empty input set."""
    codeflash_output = _ext_use_mathjax(set()) # 4.89μs -> 5.23μs (6.55% slower)

def test_no_mathtext_models_returns_false():
    """Test with objects not related to MathText."""
    obj1 = DummyText()
    obj1.__view_module__ = "custom"
    HasProps.model_class_reverse_map = {"DummyText": DummyText}
    codeflash_output = _ext_use_mathjax({obj1}) # 81.3μs -> 5.24μs (1453% faster)

def test_single_mathtext_model_returns_true():
    """Test with a single MathText-related model."""
    obj1 = MathText()
    obj1.__view_module__ = "custom"
    HasProps.model_class_reverse_map = {"MathText": MathText}
    codeflash_output = _ext_use_mathjax({obj1}) # 51.9μs -> 4.78μs (987% faster)

def test_multiple_models_with_mathtext_returns_true():
    """Test with multiple models, one of which is MathText."""
    obj1 = DummyText()
    obj1.__view_module__ = "custom"
    obj2 = MathText()
    obj2.__view_module__ = "custom"
    HasProps.model_class_reverse_map = {
        "DummyText": DummyText,
        "MathText": MathText
    }
    codeflash_output = _ext_use_mathjax({obj1, obj2}) # 50.2μs -> 5.00μs (903% faster)

def test_multiple_models_without_mathtext_returns_false():
    """Test with multiple models, none of which is MathText."""
    obj1 = DummyText()
    obj1.__view_module__ = "custom"
    obj2 = NotMathText()
    obj2.__view_module__ = "custom"
    HasProps.model_class_reverse_map = {
        "DummyText": DummyText,
        "NotMathText": NotMathText
    }
    codeflash_output = _ext_use_mathjax({obj1, obj2}) # 49.3μs -> 4.57μs (978% faster)

#-----------------------------------------------------------------------------
# Edge Test Cases
#-----------------------------------------------------------------------------

def test_object_with_implementation_is_skipped():
    """Test that objects with __implementation__ are skipped."""
    obj1 = MathText()
    obj1.__view_module__ = "custom"
    obj1.__implementation__ = True  # Should be skipped
    HasProps.model_class_reverse_map = {"MathText": MathText}
    codeflash_output = _ext_use_mathjax({obj1}) # 3.92μs -> 3.71μs (5.50% faster)

def test_object_with_bokeh_module_is_skipped():
    """Test that objects with __view_module__ 'bokeh' are skipped."""
    obj1 = MathText()
    obj1.__view_module__ = "bokeh"
    HasProps.model_class_reverse_map = {"MathText": MathText}
    codeflash_output = _ext_use_mathjax({obj1}) # 3.76μs -> 3.87μs (2.82% slower)

def test_duplicate_module_names_are_skipped():
    """Test that duplicate module names are only checked once."""
    obj1 = MathText()
    obj1.__view_module__ = "custom"
    obj2 = DummyText()
    obj2.__view_module__ = "custom"
    HasProps.model_class_reverse_map = {
        "MathText": MathText,
        "DummyText": DummyText
    }
    # Should only check "custom" once, but MathText is present
    codeflash_output = _ext_use_mathjax({obj1, obj2}) # 58.8μs -> 4.67μs (1159% faster)

def test_subclass_of_mathtext_is_detected():
    """Test that subclasses of MathText are detected."""
    obj1 = OtherMathText()
    obj1.__view_module__ = "custom"
    HasProps.model_class_reverse_map = {"OtherMathText": OtherMathText}
    codeflash_output = _ext_use_mathjax({obj1}) # 49.6μs -> 4.17μs (1089% faster)

def test_model_with_non_matching_module_is_skipped():
    """Test that models with __module__ not starting with name are skipped."""
    obj1 = MathText()
    obj1.__view_module__ = "custom"
    # Set model_class_reverse_map with __module__ not matching 'custom'
    class FakeMathText(MathText):
        pass
    FakeMathText.__module__ = "othermodule"
    HasProps.model_class_reverse_map = {"FakeMathText": FakeMathText}
    codeflash_output = _ext_use_mathjax({obj1}) # 50.7μs -> 4.18μs (1113% faster)

def test_object_with_missing_view_module_attribute_raises():
    """Test that missing __view_module__ raises AttributeError."""
    obj1 = MathText()
    del obj1.__view_module__
    HasProps.model_class_reverse_map = {"MathText": MathText}
    with pytest.raises(AttributeError):
        _ext_use_mathjax({obj1}) # 4.49μs -> 4.21μs (6.60% faster)

def test_object_with_non_string_view_module_attribute():
    """Test that non-string __view_module__ raises AttributeError."""
    obj1 = MathText()
    obj1.__view_module__ = None
    HasProps.model_class_reverse_map = {"MathText": MathText}
    with pytest.raises(AttributeError):
        _ext_use_mathjax({obj1}) # 4.24μs -> 4.53μs (6.41% slower)

#-----------------------------------------------------------------------------
# Large Scale Test Cases
#-----------------------------------------------------------------------------

def test_large_set_with_no_mathtext_returns_false():
    """Test with a large set of objects, none of which are MathText."""
    objs = set()
    for i in range(500):
        obj = DummyText()
        obj.__view_module__ = f"custom{i}"
        objs.add(obj)
    HasProps.model_class_reverse_map = {f"DummyText{i}": DummyText for i in range(500)}
    codeflash_output = _ext_use_mathjax(objs) # 13.7ms -> 99.1μs (13766% faster)

def test_large_set_with_one_mathtext_returns_true():
    """Test with a large set of objects, one of which is MathText."""
    objs = set()
    for i in range(499):
        obj = DummyText()
        obj.__view_module__ = f"custom{i}"
        objs.add(obj)
    math_obj = MathText()
    math_obj.__view_module__ = "custom_math"
    objs.add(math_obj)
    HasProps.model_class_reverse_map = {f"DummyText{i}": DummyText for i in range(499)}
    HasProps.model_class_reverse_map["MathText"] = MathText
    codeflash_output = _ext_use_mathjax(objs) # 13.8ms -> 96.4μs (14202% faster)

def test_large_set_with_many_mathtext_models_returns_true():
    """Test with a large set of objects, many of which are MathText."""
    objs = set()
    for i in range(250):
        obj = MathText()
        obj.__view_module__ = f"custom{i}"
        objs.add(obj)
    for i in range(250):
        obj = DummyText()
        obj.__view_module__ = f"custom{i+250}"
        objs.add(obj)
    HasProps.model_class_reverse_map = {f"MathText{i}": MathText for i in range(250)}
    HasProps.model_class_reverse_map.update({f"DummyText{i}": DummyText for i in range(250)})
    codeflash_output = _ext_use_mathjax(objs) # 13.7ms -> 94.9μs (14347% faster)

def test_large_set_with_duplicate_module_names():
    """Test with a large set of objects with duplicate module names."""
    objs = set()
    for i in range(500):
        obj = DummyText()
        obj.__view_module__ = "custom"
        objs.add(obj)
    math_obj = MathText()
    math_obj.__view_module__ = "custom"
    objs.add(math_obj)
    HasProps.model_class_reverse_map = {"DummyText": DummyText, "MathText": MathText}
    codeflash_output = _ext_use_mathjax(objs) # 110μs -> 51.8μs (113% faster)

def test_large_set_with_all_skipped_by_implementation():
    """Test with a large set of objects, all skipped due to __implementation__."""
    objs = set()
    for i in range(500):
        obj = MathText()
        obj.__view_module__ = f"custom{i}"
        obj.__implementation__ = True
        objs.add(obj)
    HasProps.model_class_reverse_map = {f"MathText{i}": MathText for i in range(500)}
    codeflash_output = _ext_use_mathjax(objs) # 21.1μs -> 20.1μs (5.20% faster)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
#------------------------------------------------
from types import SimpleNamespace

# imports
import pytest  # used for our unit tests
from bokeh.embed.bundle import _ext_use_mathjax


# Simulate bokeh.core.has_props.HasProps and related machinery
class HasProps:
    # Simulate the model_class_reverse_map as required by _query_extensions
    model_class_reverse_map = {}

    def __init__(self):
        pass

# Simulate MathText class for use in the tests
class MathText(HasProps):
    pass

# Simulate another class not related to MathText
class NotMathText(HasProps):
    pass

# Simulate a third class, a subclass of MathText
class MathTextSub(MathText):
    pass

# Simulate the module structure for model_class_reverse_map
MathText.__module__ = "foo.models.text"
NotMathText.__module__ = "bar.models.text"
MathTextSub.__module__ = "foo.models.text"


# Helper to create fake HasProps objects with required attributes
def make_obj(view_module, implementation=False):
    obj = SimpleNamespace()
    obj.__view_module__ = view_module
    if implementation:
        obj.__implementation__ = True
    return obj

# ---------------------------
# Basic Test Cases
# ---------------------------


def test_edge_empty_objs():
    # No objects at all
    objs = set()
    codeflash_output = _ext_use_mathjax(objs) # 4.86μs -> 5.43μs (10.5% slower)

To edit these changes, run git checkout codeflash/optimize-_ext_use_mathjax-mhb7l5kk and push.

@codeflash-ai codeflash-ai bot requested a review from mashraf-222 October 28, 2025 23:39
@codeflash-ai codeflash-ai bot added ⚡️ codeflash Optimization PR opened by Codeflash AI 🎯 Quality: High Optimization Quality according to Codeflash labels Oct 28, 2025