@codeflash-ai codeflash-ai bot commented Oct 28, 2025

📄 33% (0.33x) speedup for markdown_to_marimo in marimo/_convert/utils.py

⏱️ Runtime : 84.8 microseconds → 63.9 microseconds (best of 146 runs)

📝 Explanation and details

The optimization achieves a 33% speedup through two key changes:

1. Replaced textwrap.indent with direct string formatting

  • Changed textwrap.indent(text, INDENT) to f"{INDENT}{text}" in the indent_text function
  • This eliminates the overhead of textwrap.indent, which splits the text into lines and checks each line before prefixing it; that is overkill for simple single-line indentation. The two forms agree only for a single non-blank line, which holds for the 'r"""' argument passed in this function
  • The line profiler shows indent_text execution time dropped from 71,310ns to 7,907ns (an 89% reduction)
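
To make the equivalence concrete, here is a minimal sketch comparing the two indentation strategies. The helper names indent_with_textwrap and indent_with_fstring are hypothetical and exist only for this illustration; INDENT is assumed to be four spaces, as in the generated tests.

```python
import textwrap

INDENT = "    "  # assumed: four spaces, as in the generated tests


def indent_with_textwrap(text: str) -> str:
    # Prefixes every non-blank line of the text.
    return textwrap.indent(text, INDENT)


def indent_with_fstring(text: str) -> str:
    # Prefixes the string once, regardless of how many lines it has.
    return f"{INDENT}{text}"


# For a single non-blank line the two helpers agree.
single = 'r"""'
assert indent_with_textwrap(single) == indent_with_fstring(single)

# For multi-line input they differ: textwrap.indent prefixes each
# non-blank line, while the f-string only touches the first one.
multi = "a\n\nb"
print(repr(indent_with_textwrap(multi)))  # '    a\n\n    b'
print(repr(indent_with_fstring(multi)))   # '    a\n\nb'
```

The replacement is therefore safe only as long as indent_text keeps receiving single-line input.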

2. Pre-computed the indented string to avoid repeated function calls

  • Instead of calling codegen.indent_text('r"""') inside the list construction for join(), the result is now computed once and stored in indented_r_triple_quote
  • This reduces function call overhead during list construction in the multi-line code path
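
The second change can be sketched roughly as follows. This is a hypothetical reconstruction inferred from the expected strings in the generated tests, not the actual marimo source: the function name markdown_to_marimo_sketch, the escaping step, and the output layout are all assumptions. Only indent_text and indented_r_triple_quote are names taken from the description above.

```python
INDENT = "    "  # assumed: four spaces, as in the generated tests


def indent_text(text: str) -> str:
    # Optimized form described above: a single prefix, no textwrap.
    return f"{INDENT}{text}"


def markdown_to_marimo_sketch(source: str) -> str:
    # Assumption: triple quotes are escaped so they cannot close the
    # raw string literal early (the tests expect \"\"\" in the output).
    source = source.replace('"""', '\\"\\"\\"')

    # Single-line path: indent_text is never called.
    if "\n" not in source:
        return f'mo.md(r"""{source}""")'

    # Multi-line path: compute the indented opener once, up front,
    # then reference the local variable when building the list.
    indented_r_triple_quote = indent_text('r"""')
    return "\n".join(
        ["mo.md(", indented_r_triple_quote, source, '"""', ")"]
    )


print(markdown_to_marimo_sketch("Hello,\nworld!"))
```

Under these assumptions, only the multi-line branch touches indent_text, which matches the pattern in the per-test timings.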

Why these optimizations work:

  • textwrap.indent is optimized for complex multi-line indentation with various edge cases, but here we only need to prepend a constant string
  • Moving the function call outside the list eliminates redundant work during string joining
  • Direct string concatenation via f-strings is one of Python's fastest string operations
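
The relative cost of the two approaches can be checked locally with a small timeit micro-benchmark. This is a sketch, not the harness Codeflash used: the helper names via_textwrap, via_fstring, and benchmark are hypothetical, and absolute timings will vary by machine and Python version, so no expected numbers are shown.

```python
import textwrap
import timeit

INDENT = "    "  # assumed: four spaces, as in the generated tests
TEXT = 'r"""'    # the single-line argument discussed above


def via_textwrap() -> str:
    return textwrap.indent(TEXT, INDENT)


def via_fstring() -> str:
    return f"{INDENT}{TEXT}"


def benchmark(number: int = 20_000) -> dict[str, float]:
    # Take the best of 5 repeats to reduce noise from other processes.
    return {
        "textwrap.indent": min(timeit.repeat(via_textwrap, number=number, repeat=5)),
        "f-string": min(timeit.repeat(via_fstring, number=number, repeat=5)),
    }


if __name__ == "__main__":
    for name, seconds in benchmark().items():
        print(f"{name}: {seconds:.6f}s")
```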

Test case performance patterns:

  • Single-line cases show minimal change (they don't use indent_text)
  • Multi-line cases show significant improvements (50-100% faster) where indent_text is actually called
  • The optimization scales well with larger inputs, as seen in the large multi-line test cases
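
The percentages in this report appear to follow the formula original / optimized − 1. That formula is an inference from the reported numbers, not something the report states, and the function name speedup_percent is hypothetical. A short worked check:

```python
def speedup_percent(original: float, optimized: float) -> float:
    # Assumed definition: how much faster the optimized run is,
    # relative to the optimized runtime.
    return (original / optimized - 1.0) * 100.0


# Overall runtime: 84.8 microseconds -> 63.9 microseconds
print(round(speedup_percent(84.8, 63.9), 1))  # 32.7

# A multi-line test case: 3.06 microseconds -> 1.53 microseconds
print(round(speedup_percent(3.06, 1.53), 1))  # 100.0
```

Recomputing from the displayed timings can differ slightly from the reported percentage (for example 3.35μs → 1.69μs gives about 98.2% rather than 98.4%), presumably because the report works from unrounded measurements.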

Correctness verification report:

| Test | Status |
|---|---|
| ⚙️ Existing Unit Tests | 5 Passed |
| 🌀 Generated Regression Tests | 34 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 2 Passed |
| 📊 Tests Coverage | 100.0% |

⚙️ Existing Unit Tests and Runtime

| Test File::Test Function | Original ⏱️ | Optimized ⏱️ | Speedup |
|---|---|---|---|
| _convert/test_convert_utils.py::test_markdown_to_marimo | 4.93μs | 3.20μs | 53.9%✅ |

🌀 Generated Regression Tests and Runtime
import textwrap

# imports
import pytest  # used for our unit tests
from marimo._convert.utils import markdown_to_marimo

INDENT = "    "

# unit tests

# --- Basic Test Cases ---

def test_basic_single_line():
    # Test a simple single-line markdown
    codeflash_output = markdown_to_marimo("Hello, world!") # 690ns -> 706ns (2.27% slower)

def test_basic_multiline():
    # Test a simple multi-line markdown
    input_md = "Hello,\nworld!"
    expected = "\n".join([
        "mo.md(",
        INDENT + 'r"""',
        input_md,
        '"""',
        ")"
    ])
    codeflash_output = markdown_to_marimo(input_md) # 3.35μs -> 1.69μs (98.4% faster)

def test_basic_triple_quotes_in_single_line():
    # Test single-line markdown with triple quotes
    input_md = 'Hello """ world'
    expected = 'mo.md(r"""Hello \\"\\"\\" world""")'
    codeflash_output = markdown_to_marimo(input_md) # 1.18μs -> 1.18μs (0.766% faster)

def test_basic_triple_quotes_in_multiline():
    # Test multi-line markdown with triple quotes
    input_md = 'Hello\n"""\nworld'
    expected = "\n".join([
        "mo.md(",
        INDENT + 'r"""',
        'Hello',
        '\\"\\"\\"',
        'world',
        '"""',
        ")"
    ])
    codeflash_output = markdown_to_marimo(input_md) # 3.65μs -> 2.13μs (71.6% faster)

# --- Edge Test Cases ---

def test_edge_empty_string():
    # Test empty string input
    codeflash_output = markdown_to_marimo("") # 634ns -> 628ns (0.955% faster)

def test_edge_only_newline():
    # Test input with only a newline
    input_md = "\n"
    expected = "\n".join([
        "mo.md(",
        INDENT + 'r"""',
        input_md,
        '"""',
        ")"
    ])
    codeflash_output = markdown_to_marimo(input_md) # 3.14μs -> 1.59μs (98.0% faster)

def test_edge_only_spaces():
    # Test input with only spaces
    input_md = "    "
    expected = 'mo.md(r"""    """)'
    codeflash_output = markdown_to_marimo(input_md) # 696ns -> 723ns (3.73% slower)

def test_edge_only_triple_quotes():
    # Test input with only triple quotes
    input_md = '"""'
    expected = 'mo.md(r"""\\"\\"\\"""")'
    codeflash_output = markdown_to_marimo(input_md) # 1.00μs -> 1.02μs (1.77% slower)

def test_edge_six_quotes_in_row():
    # Test input with six quotes in a row
    input_md = '""""""'
    expected = 'mo.md(r"""\\"\\"\\"\\"\\"\\"""")'
    codeflash_output = markdown_to_marimo(input_md) # 1.07μs -> 1.08μs (1.30% slower)

def test_edge_mixed_quotes_and_newlines():
    # Test input with triple quotes and newlines
    input_md = '"""\n"""\n'
    expected = "\n".join([
        "mo.md(",
        INDENT + 'r"""',
        '\\"\\"\\"',
        '\\"\\"\\"',
        '',
        '"""',
        ")"
    ])
    codeflash_output = markdown_to_marimo(input_md) # 3.79μs -> 2.17μs (74.5% faster)

def test_edge_unicode_characters():
    # Test input with unicode characters
    input_md = "你好,世界!"
    expected = 'mo.md(r"""你好,世界!""")'
    codeflash_output = markdown_to_marimo(input_md) # 1.40μs -> 1.33μs (5.49% faster)

def test_edge_backslashes():
    # Test input with backslashes
    input_md = "foo\\bar"
    expected = 'mo.md(r"""foo\\bar""")'
    codeflash_output = markdown_to_marimo(input_md) # 694ns -> 722ns (3.88% slower)

def test_edge_leading_and_trailing_whitespace():
    # Test input with leading and trailing whitespace
    input_md = "  hello world  "
    expected = 'mo.md(r"""  hello world  """)'
    codeflash_output = markdown_to_marimo(input_md) # 709ns -> 735ns (3.54% slower)

def test_edge_newline_at_end():
    # Test input with newline at the end
    input_md = "hello world\n"
    expected = "\n".join([
        "mo.md(",
        INDENT + 'r"""',
        "hello world",
        "",
        '"""',
        ")"
    ])
    codeflash_output = markdown_to_marimo(input_md) # 3.57μs -> 1.88μs (90.1% faster)

def test_edge_newline_at_start():
    # Test input with newline at the start
    input_md = "\nhello world"
    expected = "\n".join([
        "mo.md(",
        INDENT + 'r"""',
        "",
        "hello world",
        '"""',
        ")"
    ])
    codeflash_output = markdown_to_marimo(input_md) # 3.06μs -> 1.53μs (100% faster)

def test_edge_multiple_consecutive_newlines():
    # Test input with multiple consecutive newlines
    input_md = "a\n\nb"
    expected = "\n".join([
        "mo.md(",
        INDENT + 'r"""',
        "a",
        "",
        "b",
        '"""',
        ")"
    ])
    codeflash_output = markdown_to_marimo(input_md) # 2.93μs -> 1.57μs (86.7% faster)

# --- Large Scale Test Cases ---

def test_large_single_line():
    # Test a very long single line
    input_md = "x" * 1000
    expected = f'mo.md(r"""{"x"*1000}""")'
    codeflash_output = markdown_to_marimo(input_md) # 1.09μs -> 1.08μs (0.369% faster)

def test_large_multiline():
    # Test a large multiline markdown (1000 lines)
    lines = [f"line {i}" for i in range(1000)]
    input_md = "\n".join(lines)
    expected = "\n".join([
        "mo.md(",
        INDENT + 'r"""',
        input_md,
        '"""',
        ")"
    ])
    codeflash_output = markdown_to_marimo(input_md) # 5.11μs -> 3.38μs (50.9% faster)

def test_large_triple_quotes():
    # Test a large input with many triple quotes
    input_md = '"""\n' * 500  # 500 lines, each with triple quotes
    # Each triple quote should be escaped
    expected_lines = ['\\\"\\\"\\\"' for _ in range(500)]
    expected = "\n".join([
        "mo.md(",
        INDENT + 'r"""',
        "\n".join(expected_lines),
        '"""',
        ")"
    ])
    codeflash_output = markdown_to_marimo(input_md.rstrip('\n')) # 12.0μs -> 10.4μs (14.5% faster)

def test_large_mixed_content():
    # Test a large input with mixed content
    input_md = "\n".join([
        "### Header",
        "Some text.",
        '"""',
        "More text.",
        '"""',
        "End."
    ] * 150)  # 150 blocks, total ~900 lines
    # Build expected output
    expected_lines = []
    for _ in range(150):
        expected_lines.extend([
            "### Header",
            "Some text.",
            '\\\"\\\"\\\"',
            "More text.",
            '\\\"\\\"\\\"',
            "End."
        ])
    expected = "\n".join([
        "mo.md(",
        INDENT + 'r"""',
        "\n".join(expected_lines),
        '"""',
        ")"
    ])
    codeflash_output = markdown_to_marimo(input_md) # 8.87μs -> 7.47μs (18.8% faster)

def test_large_unicode():
    # Test large input with unicode characters
    input_md = "你好\n" * 1000
    expected = "\n".join([
        "mo.md(",
        INDENT + 'r"""',
        input_md.rstrip('\n'),
        '"""',
        ")"
    ])
    codeflash_output = markdown_to_marimo(input_md.rstrip('\n')) # 4.19μs -> 2.67μs (57.0% faster)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
#------------------------------------------------
import textwrap

# imports
import pytest  # used for our unit tests
from marimo._convert.utils import markdown_to_marimo

# unit tests

# ------------------- Basic Test Cases -------------------

def test_single_line_markdown():
    # Test: basic single line markdown
    input_str = "Hello, world!"
    expected = 'mo.md(r"""Hello, world!""")'
    codeflash_output = markdown_to_marimo(input_str) # 746ns -> 755ns (1.19% slower)

def test_single_line_markdown_with_special_chars():
    # Test: single line with markdown special characters
    input_str = "# Header *italic* **bold**"
    expected = 'mo.md(r"""# Header *italic* **bold**""")'
    codeflash_output = markdown_to_marimo(input_str) # 739ns -> 761ns (2.89% slower)

def test_single_line_with_triple_quotes():
    # Test: single line containing triple quotes
    input_str = 'This contains """ triple quotes'
    expected = 'mo.md(r"""This contains \\"\\"\\" triple quotes""")'
    codeflash_output = markdown_to_marimo(input_str) # 1.29μs -> 1.21μs (6.76% faster)



def test_empty_string():
    # Test: empty string input
    input_str = ""
    expected = 'mo.md(r""" """)'
    codeflash_output = markdown_to_marimo(input_str) # 840ns -> 780ns (7.69% faster)



def test_string_with_escaped_quotes():
    # Test: input with escaped quotes
    input_str = 'This has \\" escaped quote'
    expected = 'mo.md(r"""This has \\" escaped quote""")'
    codeflash_output = markdown_to_marimo(input_str) # 936ns -> 969ns (3.41% slower)

def test_string_with_six_quotes():
    # Test: input with six quotes in a row
    input_str = '""""""'
    expected = 'mo.md(r"""\\"\\"\\"\\"\\"\\"""")'
    codeflash_output = markdown_to_marimo(input_str) # 1.17μs -> 1.19μs (1.76% slower)


def test_string_with_only_spaces():
    # Test: input is only spaces
    input_str = "   "
    expected = 'mo.md(r"""   """)'
    codeflash_output = markdown_to_marimo(input_str) # 875ns -> 861ns (1.63% faster)

def test_string_with_leading_trailing_whitespace():
    # Test: input with leading and trailing whitespace
    input_str = "  Hello world!  "
    expected = 'mo.md(r"""  Hello world!  """)'
    codeflash_output = markdown_to_marimo(input_str) # 751ns -> 781ns (3.84% slower)

def test_string_with_backslashes():
    # Test: input contains backslashes
    input_str = r"This is a backslash: \\"
    expected = 'mo.md(r"""This is a backslash: \\\\""")'
    codeflash_output = markdown_to_marimo(input_str) # 728ns -> 727ns (0.138% faster)

def test_string_with_unicode():
    # Test: input contains unicode characters
    input_str = "Unicode: ☀️ 🌙"
    expected = 'mo.md(r"""Unicode: ☀️ 🌙""")'
    codeflash_output = markdown_to_marimo(input_str) # 1.46μs -> 1.41μs (4.13% faster)

def test_string_with_tabs():
    # Test: input contains tabs
    input_str = "Tab\tseparated\tvalues"
    expected = 'mo.md(r"""Tab\tseparated\tvalues""")'
    codeflash_output = markdown_to_marimo(input_str) # 720ns -> 764ns (5.76% slower)

def test_string_with_carriage_return():
    # Test: input contains carriage return
    input_str = "Line1\rLine2"
    expected = 'mo.md(r"""Line1\rLine2""")'
    codeflash_output = markdown_to_marimo(input_str) # 652ns -> 722ns (9.70% slower)



def test_large_single_line():
    # Test: very long single line string (1000 chars)
    input_str = "a" * 1000
    expected = f'mo.md(r"""{"a" * 1000}""")'
    codeflash_output = markdown_to_marimo(input_str) # 1.31μs -> 1.27μs (3.47% faster)





#------------------------------------------------
from marimo._convert.utils import markdown_to_marimo

def test_markdown_to_marimo():
    markdown_to_marimo('\n')

def test_markdown_to_marimo_2():
    markdown_to_marimo('')
🔎 Concolic Coverage Tests and Runtime

| Test File::Test Function | Original ⏱️ | Optimized ⏱️ | Speedup |
|---|---|---|---|
| codeflash_concolic_hg3s6k0k/tmpt59_l6a8/test_concolic_coverage.py::test_markdown_to_marimo | 4.21μs | 2.08μs | 102%✅ |
| codeflash_concolic_hg3s6k0k/tmpt59_l6a8/test_concolic_coverage.py::test_markdown_to_marimo_2 | 643ns | 666ns | -3.45%⚠️ |

To edit these changes git checkout codeflash/optimize-markdown_to_marimo-mhb5x0fa and push.

@codeflash-ai codeflash-ai bot requested a review from mashraf-222 October 28, 2025 22:53
@codeflash-ai codeflash-ai bot added the ⚡️ codeflash (Optimization PR opened by Codeflash AI) and 🎯 Quality: High (Optimization Quality according to Codeflash) labels Oct 28, 2025