
codeflash-ai bot commented Oct 28, 2025

📄 18% (1.18x) speedup for sql_to_marimo in marimo/_convert/utils.py

⏱️ Runtime: 797 microseconds → 677 microseconds (best of 358 runs)

📝 Explanation and details

The optimization replaces Python's textwrap.indent() with a custom implementation in the indent_text() helper, which makes that helper roughly 40% faster.

Key changes:

  • Removed textwrap dependency: Indentation is implemented directly in indent_text(), so the module no longer imports textwrap and each call skips the indirection through textwrap.indent()
  • Optimized empty text handling: Added an early return for empty strings to avoid unnecessary processing
  • Direct string operations: Uses splitlines(keepends=True) and a generator expression with "".join() instead of the more general-purpose textwrap.indent()
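As a rough illustration of the changes listed above, here is a minimal sketch of an inline indent_text(). It assumes the original helper was a thin wrapper around textwrap.indent(text, INDENT) with INDENT = "    "; the body below is an illustration, not the code from this PR. It deliberately keeps textwrap.indent()'s default behaviour of leaving whitespace-only lines unprefixed, so its output should match the original for any input.

```python
import textwrap

INDENT = "    "  # four spaces, the prefix used by the converter


def indent_text(text: str) -> str:
    """Prefix every non-blank line of `text` with INDENT.

    Sketch of an inline replacement for textwrap.indent(text, INDENT).
    """
    # Early return: an empty string has nothing to indent.
    if not text:
        return text
    # keepends=True preserves the original line terminators, so joining
    # with "" reproduces the input layout exactly.
    return "".join(
        # Mirror textwrap.indent()'s default predicate: leave
        # whitespace-only lines unprefixed.
        INDENT + line if line.strip() else line
        for line in text.splitlines(keepends=True)
    )


sql = "SELECT *\nFROM foo\n\nWHERE bar = 1"
print(indent_text(sql))
# Same result as the stdlib helper for this input.
assert indent_text(sql) == textwrap.indent(sql, INDENT)
```

The blank line in the example stays empty, exactly as textwrap.indent() would leave it.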

Why it's faster:

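  • Avoids the call into textwrap.indent() and its extra indirection on every invocation (the import itself is a one-time, module-load cost and does not affect per-call timings)
  • The custom implementation is more targeted: it only handles the specific indentation pattern needed (a fixed 4-space prefix) rather than textwrap's general-purpose logic, which accepts an arbitrary prefix and an optional per-line predicate
  • The generator expression with join likely does less work per line than textwrap.indent(), which calls a predicate function on every line to decide whether to prefix it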
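One way to check these per-call overhead claims locally is a small timeit comparison. This is a sketch: indent_with_textwrap and indent_inline are hypothetical stand-ins for the before and after versions of indent_text(), not the PR's code, and absolute numbers will vary by machine and Python version.

```python
import functools
import textwrap
import timeit

INDENT = "    "


def indent_with_textwrap(text: str) -> str:
    # "Before": delegate to the general-purpose stdlib helper.
    return textwrap.indent(text, INDENT)


def indent_inline(text: str) -> str:
    # "After": early return plus a single generator expression.
    if not text:
        return text
    return "".join(
        INDENT + line if line.strip() else line
        for line in text.splitlines(keepends=True)
    )


CASES = {
    "empty": "",
    "one line": "SELECT 1",
    "1000 lines": "\n".join(f"SELECT {i} FROM t{i};" for i in range(1000)),
}

CALLS = 200
for name, sql in CASES.items():
    # Timings are only meaningful if both versions agree on the output.
    assert indent_inline(sql) == indent_with_textwrap(sql)
    # Best of 5 repeats to damp scheduler noise.
    before = min(timeit.repeat(functools.partial(indent_with_textwrap, sql), number=CALLS, repeat=5))
    after = min(timeit.repeat(functools.partial(indent_inline, sql), number=CALLS, repeat=5))
    print(
        f"{name:>10}: textwrap {before * 1e6 / CALLS:8.2f} us/call, "
        f"inline {after * 1e6 / CALLS:8.2f} us/call, "
        f"speedup {before / after - 1:+.0%}"
    )
```

The relative gain is likely largest on the empty input, where the early return skips all work; the exact percentages will differ from the ones reported in this PR.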

Test case performance:

  • Best gains on empty SQL input (32-38% faster), consistent with the new early return for empty strings
  • Consistent 15-25% speedup across most small and medium test cases regardless of SQL complexity
  • Large-scale tests (500-1000 lines) still see roughly 12-21% improvements, showing the optimization scales well
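The percentages in this report appear to be relative speedups, i.e. original time divided by optimized time, minus one. That is an inference from the reported numbers rather than documented Codeflash behaviour, but it reproduces the headline figure: 797 μs → 677 μs gives about 18%, even though the share of runtime actually removed is closer to 15%. Small last-digit differences for individual tests come from rounding of the displayed timings.

```python
def speedup_pct(original: float, optimized: float) -> float:
    """Relative speedup: how much faster the optimized run is, in percent."""
    return (original / optimized - 1.0) * 100.0


def time_saved_pct(original: float, optimized: float) -> float:
    """Share of the original runtime that was eliminated, in percent."""
    return (original - optimized) / original * 100.0


# Timings in microseconds, taken from the report.
print(f"{speedup_pct(797, 677):.1f}%")     # 17.7%, the headline "18%"
print(f"{time_saved_pct(797, 677):.1f}%")  # 15.1%
print(f"{speedup_pct(7.76, 5.90):.1f}%")   # 31.5%, the concolic test
```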

The speedup comes primarily from removing the textwrap.indent() overhead rather than from an algorithmic change (both versions make a single linear pass over the lines), which is why it is effective across all input sizes.

Correctness verification report:

| Test | Status |
|---|---|
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | 45 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 1 Passed |
| 📊 Tests Coverage | 100.0% |
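As the comments in the generated tests note, codeflash_output is used to check that the optimized code returns the same output as the original. A minimal sketch of that kind of differential check, applied to the indentation helper alone, assuming the original delegates to textwrap.indent(text, INDENT); naive_indent and careful_indent are hypothetical candidates. The edge cases matter: textwrap.indent() leaves whitespace-only lines unprefixed, so a replacement that prefixes every line diverges on inputs such as "\n\n" (compare test_edge_sql_with_newlines_only).

```python
import textwrap

INDENT = "    "


def reference_indent(text: str) -> str:
    # Original behaviour: textwrap.indent() skips whitespace-only lines.
    return textwrap.indent(text, INDENT)


def naive_indent(text: str) -> str:
    # Hypothetical candidate that prefixes every line, blank or not.
    return "".join(INDENT + line for line in text.splitlines(keepends=True))


def careful_indent(text: str) -> str:
    # Hypothetical candidate that mirrors textwrap.indent()'s default predicate.
    if not text:
        return text
    return "".join(
        INDENT + line if line.strip() else line
        for line in text.splitlines(keepends=True)
    )


EDGE_CASES = [
    "",
    "SELECT 1",
    "SELECT *\nFROM foo\n",
    "\n\n",
    "   SELECT * FROM foo   ",
    "a\n   \nb",
    "SELECT '你好' FROM foo",
]


def mismatches(candidate):
    """Return the inputs on which `candidate` disagrees with the reference."""
    return [s for s in EDGE_CASES if candidate(s) != reference_indent(s)]


print(mismatches(careful_indent))  # []
print(mismatches(naive_indent))    # ['\n\n', 'a\n   \nb']
```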
🌀 Generated Regression Tests and Runtime
from __future__ import annotations

import textwrap

# imports
import pytest  # used for our unit tests
from marimo._convert.utils import sql_to_marimo


# unit tests

# Basic Test Cases

def test_basic_sql_to_marimo_minimal():
    # Test with minimal SQL and table name
    codeflash_output = sql_to_marimo("SELECT 1", "my_table"); result = codeflash_output # 4.75μs -> 3.99μs (19.1% faster)

def test_basic_sql_to_marimo_hide_output_true():
    # Test with hide_output True
    codeflash_output = sql_to_marimo("SELECT * FROM foo", "t", hide_output=True); result = codeflash_output # 5.62μs -> 4.62μs (21.5% faster)

def test_basic_sql_to_marimo_with_engine():
    # Test with engine parameter
    codeflash_output = sql_to_marimo("SELECT x FROM y", "tbl", engine="duckdb"); result = codeflash_output # 5.80μs -> 4.79μs (21.0% faster)

def test_basic_sql_to_marimo_hide_output_and_engine():
    # Test with both hide_output and engine
    codeflash_output = sql_to_marimo("SELECT * FROM z", "table1", hide_output=True, engine="sqlite"); result = codeflash_output # 6.46μs -> 5.43μs (18.9% faster)

def test_basic_sql_to_marimo_multiline_sql():
    # Test with multi-line SQL
    sql = "SELECT *\nFROM foo\nWHERE bar = 1"
    codeflash_output = sql_to_marimo(sql, "t"); result = codeflash_output # 5.26μs -> 4.29μs (22.7% faster)

# Edge Test Cases

def test_edge_empty_sql():
    # Test with empty SQL string
    codeflash_output = sql_to_marimo("", "empty_table"); result = codeflash_output # 4.40μs -> 3.32μs (32.4% faster)

def test_edge_empty_table_name():
    # Test with empty table name
    codeflash_output = sql_to_marimo("SELECT 1", ""); result = codeflash_output # 4.38μs -> 3.65μs (20.1% faster)

def test_edge_sql_with_quotes_and_special_chars():
    # Test SQL containing quotes and special characters
    sql = "SELECT \"foo\", 'bar', `baz`, [qux] FROM t WHERE x = 'y';"
    codeflash_output = sql_to_marimo(sql, "special_table"); result = codeflash_output # 4.33μs -> 3.49μs (24.2% faster)

def test_edge_sql_with_unicode():
    # Test SQL containing unicode characters
    sql = "SELECT café, π FROM α WHERE β = 'γ'"
    codeflash_output = sql_to_marimo(sql, "unicode_table"); result = codeflash_output # 5.36μs -> 4.36μs (22.8% faster)

def test_edge_hide_output_false_and_engine_none():
    # Explicitly test hide_output=False and engine=None
    codeflash_output = sql_to_marimo("SELECT 1", "t", hide_output=False, engine=None); result = codeflash_output # 4.85μs -> 3.98μs (22.0% faster)

def test_edge_sql_with_leading_and_trailing_whitespace():
    # SQL with leading/trailing whitespace should be preserved
    sql = "   SELECT * FROM foo   "
    codeflash_output = sql_to_marimo(sql, "whitespace_table"); result = codeflash_output # 4.41μs -> 3.74μs (17.8% faster)

def test_edge_table_name_with_spaces_and_special_chars():
    # Table name with spaces and special characters
    codeflash_output = sql_to_marimo("SELECT 1", "table name!@#"); result = codeflash_output # 4.24μs -> 3.50μs (20.9% faster)

def test_edge_sql_with_newlines_only():
    # SQL with only newlines
    sql = "\n\n"
    codeflash_output = sql_to_marimo(sql, "newline_table"); result = codeflash_output # 4.77μs -> 4.09μs (16.6% faster)

def test_edge_sql_with_long_line():
    # SQL with a very long line (but not large scale)
    sql = "SELECT " + ", ".join(f"col{i}" for i in range(50)) + " FROM t"
    codeflash_output = sql_to_marimo(sql, "long_line_table"); result = codeflash_output # 4.45μs -> 3.64μs (22.2% faster)
    for i in range(50):
        assert f"col{i}" in result

# Large Scale Test Cases

def test_large_scale_sql_to_marimo_many_columns():
    # SQL with many columns (up to 1000)
    columns = [f"col{i}" for i in range(1000)]
    sql = "SELECT " + ", ".join(columns) + " FROM big_table"
    codeflash_output = sql_to_marimo(sql, "large_table"); result = codeflash_output # 8.08μs -> 7.28μs (11.1% faster)
    # Check that all columns are present
    for col in columns:
        assert col in result

def test_large_scale_sql_to_marimo_long_table_name():
    # Table name with 500 characters
    table_name = "t" * 500
    codeflash_output = sql_to_marimo("SELECT 1", table_name); result = codeflash_output # 4.76μs -> 3.89μs (22.4% faster)

def test_large_scale_sql_to_marimo_long_sql():
    # SQL with 1000 lines
    sql = "\n".join(f"SELECT {i} FROM t{i};" for i in range(1000))
    codeflash_output = sql_to_marimo(sql, "multi_line_table"); result = codeflash_output # 131μs -> 118μs (11.7% faster)

def test_large_scale_sql_to_marimo_hide_output_and_engine():
    # Large SQL with both hide_output and engine
    sql = "\n".join(f"SELECT {i}" for i in range(500))
    codeflash_output = sql_to_marimo(sql, "big_table", hide_output=True, engine="duckdb"); result = codeflash_output # 60.5μs -> 50.6μs (19.5% faster)

def test_large_scale_sql_to_marimo_all_options():
    # Large SQL, long table name, hide_output, engine
    table_name = "T" * 100
    sql = "\n".join(f"SELECT {i}" for i in range(1000))
    codeflash_output = sql_to_marimo(sql, table_name, hide_output=True, engine="sqlite"); result = codeflash_output # 111μs -> 92.0μs (21.0% faster)

# Mutation detection: ensure output=False and engine only appear if requested

def test_mutation_hide_output_and_engine_exclusivity():
    # If hide_output and engine are not set, they should not appear
    codeflash_output = sql_to_marimo("SELECT 1", "t"); result = codeflash_output # 4.42μs -> 3.76μs (17.5% faster)

def test_mutation_hide_output_only():
    # Only hide_output should appear
    codeflash_output = sql_to_marimo("SELECT 1", "t", hide_output=True); result = codeflash_output # 5.30μs -> 4.51μs (17.6% faster)

def test_mutation_engine_only():
    # Only engine should appear
    codeflash_output = sql_to_marimo("SELECT 1", "t", engine="duckdb"); result = codeflash_output # 5.56μs -> 4.72μs (17.7% faster)

def test_mutation_hide_output_and_engine():
    # Both should appear
    codeflash_output = sql_to_marimo("SELECT 1", "t", hide_output=True, engine="duckdb"); result = codeflash_output # 6.29μs -> 5.26μs (19.7% faster)

# Indentation checks

def test_indentation_of_sql_and_options():
    # All lines after the first should be indented
    sql = "SELECT 1"
    codeflash_output = sql_to_marimo(sql, "t", hide_output=True, engine="duckdb"); result = codeflash_output # 6.36μs -> 5.28μs (20.5% faster)
    lines = result.splitlines()
    # The next lines should be indented
    for line in lines[1:-1]:  # skip first and last
        if line.strip() != "":
            assert line.startswith("    ")

# Output structure checks

def test_output_structure():
    # Output should have the expected structure
    sql = "SELECT 1"
    codeflash_output = sql_to_marimo(sql, "my_table", hide_output=True, engine="duckdb"); result = codeflash_output # 6.39μs -> 5.15μs (24.1% faster)
    lines = result.splitlines()
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
#------------------------------------------------
from __future__ import annotations

import textwrap

# imports
import pytest  # used for our unit tests
from marimo._convert.utils import sql_to_marimo

# unit tests

# ----------- Basic Test Cases -----------

def test_basic_minimal_input():
    # Test with minimal SQL and table name
    source = "SELECT 1"
    table = "result"
    expected = (
        "result = mo.sql(\n"
        "    f\"\"\"\n"
        "    SELECT 1\n"
        "    \"\"\"\n"
        ")"
    )
    codeflash_output = sql_to_marimo(source, table) # 4.37μs -> 3.54μs (23.2% faster)

def test_basic_hide_output_true():
    # Test with hide_output True
    source = "SELECT * FROM foo"
    table = "my_table"
    expected = (
        "my_table = mo.sql(\n"
        "    f\"\"\"\n"
        "    SELECT * FROM foo\n"
        "    \"\"\"\n"
        "    output=False\n"
        ")"
    )
    codeflash_output = sql_to_marimo(source, table, hide_output=True) # 5.30μs -> 4.49μs (18.0% faster)

def test_basic_with_engine():
    # Test with engine specified
    source = "SELECT * FROM bar"
    table = "tbl"
    engine = "duckdb"
    expected = (
        "tbl = mo.sql(\n"
        "    f\"\"\"\n"
        "    SELECT * FROM bar\n"
        "    \"\"\"\n"
        "    engine=duckdb\n"
        ")"
    )
    codeflash_output = sql_to_marimo(source, table, engine=engine) # 5.70μs -> 4.71μs (21.0% faster)

def test_basic_hide_output_and_engine():
    # Test with both hide_output True and engine specified
    source = "SELECT * FROM baz"
    table = "tbl2"
    engine = "sqlite"
    expected = (
        "tbl2 = mo.sql(\n"
        "    f\"\"\"\n"
        "    SELECT * FROM baz\n"
        "    \"\"\"\n"
        "    output=False\n"
        "    engine=sqlite\n"
        ")"
    )
    codeflash_output = sql_to_marimo(source, table, hide_output=True, engine=engine) # 6.43μs -> 5.26μs (22.3% faster)

# ----------- Edge Test Cases -----------

def test_edge_empty_sql():
    # Test with empty SQL string
    source = ""
    table = "empty_tbl"
    expected = (
        "empty_tbl = mo.sql(\n"
        "    f\"\"\"\n"
        "    \n"
        "    \"\"\"\n"
        ")"
    )
    codeflash_output = sql_to_marimo(source, table) # 4.55μs -> 3.29μs (38.2% faster)

def test_edge_empty_table_name():
    # Test with empty table name
    source = "SELECT 1"
    table = ""
    expected = (
        " = mo.sql(\n"
        "    f\"\"\"\n"
        "    SELECT 1\n"
        "    \"\"\"\n"
        ")"
    )
    codeflash_output = sql_to_marimo(source, table) # 4.23μs -> 3.56μs (18.8% faster)

def test_edge_sql_with_newlines_and_spaces():
    # Test SQL with multiple newlines and spaces
    source = "SELECT *\nFROM foo\nWHERE bar = 1\n"
    table = "tbl"
    expected = (
        "tbl = mo.sql(\n"
        "    f\"\"\"\n"
        "    SELECT *\nFROM foo\nWHERE bar = 1\n\n"
        "    \"\"\"\n"
        ")"
    )
    codeflash_output = sql_to_marimo(source, table) # 5.15μs -> 4.26μs (21.0% faster)

def test_edge_sql_with_special_characters():
    # Test SQL with special characters
    source = "SELECT '\"\\n\\t' FROM foo"
    table = "tbl"
    expected = (
        "tbl = mo.sql(\n"
        "    f\"\"\"\n"
        "    SELECT '\"\\n\\t' FROM foo\n"
        "    \"\"\"\n"
        ")"
    )
    codeflash_output = sql_to_marimo(source, table) # 4.34μs -> 3.56μs (21.9% faster)

def test_edge_table_name_with_special_characters():
    # Table name with special characters
    source = "SELECT 1"
    table = "tbl$#@!"
    expected = (
        "tbl$#@! = mo.sql(\n"
        "    f\"\"\"\n"
        "    SELECT 1\n"
        "    \"\"\"\n"
        ")"
    )
    codeflash_output = sql_to_marimo(source, table) # 4.26μs -> 3.56μs (19.6% faster)

def test_edge_long_engine_name():
    # Engine name is unusually long
    source = "SELECT 1"
    table = "tbl"
    engine = "verylongenginename1234567890"
    expected = (
        "tbl = mo.sql(\n"
        "    f\"\"\"\n"
        "    SELECT 1\n"
        "    \"\"\"\n"
        "    engine=verylongenginename1234567890\n"
        ")"
    )
    codeflash_output = sql_to_marimo(source, table, engine=engine) # 5.74μs -> 4.84μs (18.5% faster)

def test_edge_hide_output_false_explicit():
    # hide_output explicitly set to False
    source = "SELECT 1"
    table = "tbl"
    expected = (
        "tbl = mo.sql(\n"
        "    f\"\"\"\n"
        "    SELECT 1\n"
        "    \"\"\"\n"
        ")"
    )
    codeflash_output = sql_to_marimo(source, table, hide_output=False) # 4.51μs -> 3.78μs (19.3% faster)

def test_edge_engine_empty_string():
    # Engine is empty string
    source = "SELECT 1"
    table = "tbl"
    engine = ""
    expected = (
        "tbl = mo.sql(\n"
        "    f\"\"\"\n"
        "    SELECT 1\n"
        "    \"\"\"\n"
        "    engine=\n"
        ")"
    )
    codeflash_output = sql_to_marimo(source, table, engine=engine) # 4.56μs -> 3.83μs (19.2% faster)

def test_edge_sql_with_indentations():
    # SQL with leading indentation
    source = "    SELECT 1"
    table = "tbl"
    expected = (
        "tbl = mo.sql(\n"
        "    f\"\"\"\n"
        "        SELECT 1\n"
        "    \"\"\"\n"
        ")"
    )
    codeflash_output = sql_to_marimo(source, table) # 4.58μs -> 3.72μs (23.1% faster)

def test_edge_sql_with_unicode_characters():
    # SQL with unicode characters
    source = "SELECT '你好' FROM foo"
    table = "tbl"
    expected = (
        "tbl = mo.sql(\n"
        "    f\"\"\"\n"
        "    SELECT '你好' FROM foo\n"
        "    \"\"\"\n"
        ")"
    )
    codeflash_output = sql_to_marimo(source, table) # 5.14μs -> 4.19μs (22.9% faster)

# ----------- Large Scale Test Cases -----------

def test_large_sql_query():
    # Large SQL query (500 lines)
    source = "\n".join([f"SELECT {i} FROM tbl{i};" for i in range(500)])
    table = "large_tbl"
    expected = (
        "large_tbl = mo.sql(\n"
        "    f\"\"\"\n"
        f"    {source}\n"
        "    \"\"\"\n"
        ")"
    )
    codeflash_output = sql_to_marimo(source, table) # 60.6μs -> 51.6μs (17.6% faster)

def test_large_table_name():
    # Large table name (250 chars)
    table = "tbl_" + "x" * 246
    source = "SELECT 1"
    expected = (
        f"{table} = mo.sql(\n"
        "    f\"\"\"\n"
        "    SELECT 1\n"
        "    \"\"\"\n"
        ")"
    )
    codeflash_output = sql_to_marimo(source, table) # 4.59μs -> 3.70μs (24.0% faster)

def test_large_engine_name():
    # Large engine name (500 chars)
    engine = "engine_" + "y" * 492
    table = "tbl"
    source = "SELECT 1"
    expected = (
        "tbl = mo.sql(\n"
        "    f\"\"\"\n"
        "    SELECT 1\n"
        "    \"\"\"\n"
        f"    engine={engine}\n"
        ")"
    )
    codeflash_output = sql_to_marimo(source, table, engine=engine) # 6.43μs -> 5.32μs (21.0% faster)

def test_large_sql_and_hide_output_and_engine():
    # Large SQL, hide_output True, large engine
    source = "\n".join([f"SELECT {i} FROM tbl{i};" for i in range(1000)])
    engine = "engine_" + "z" * 990
    table = "big_tbl"
    expected = (
        "big_tbl = mo.sql(\n"
        "    f\"\"\"\n"
        f"    {source}\n"
        "    \"\"\"\n"
        "    output=False\n"
        f"    engine={engine}\n"
        ")"
    )
    codeflash_output = sql_to_marimo(source, table, hide_output=True, engine=engine) # 144μs -> 125μs (15.5% faster)

def test_large_sql_with_newlines_and_spaces():
    # Large SQL with lots of newlines and spaces
    source = "\n".join(["   " * (i % 5) + f"SELECT {i}" for i in range(500)])
    table = "tbl"
    expected = (
        "tbl = mo.sql(\n"
        "    f\"\"\"\n"
        f"    {source}\n"
        "    \"\"\"\n"
        ")"
    )
    codeflash_output = sql_to_marimo(source, table) # 62.8μs -> 53.2μs (18.0% faster)

def test_large_sql_with_unicode():
    # Large SQL with unicode characters
    source = "\n".join([f"SELECT '{chr(0x4e00 + i)}' FROM tbl{i};" for i in range(100)])
    table = "tbl"
    expected = (
        "tbl = mo.sql(\n"
        "    f\"\"\"\n"
        f"    {source}\n"
        "    \"\"\"\n"
        ")"
    )
    codeflash_output = sql_to_marimo(source, table) # 20.9μs -> 17.9μs (16.5% faster)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
#------------------------------------------------
from marimo._convert.utils import sql_to_marimo

def test_sql_to_marimo():
    sql_to_marimo('', '', hide_output=True, engine='ᚁ')
🔎 Concolic Coverage Tests and Runtime
| Test File::Test Function | Original ⏱️ | Optimized ⏱️ | Speedup |
|---|---|---|---|
| codeflash_concolic_hg3s6k0k/tmpzi0lkfnp/test_concolic_coverage.py::test_sql_to_marimo | 7.76μs | 5.90μs | 31.5% ✅ |

To edit these changes, run `git checkout codeflash/optimize-sql_to_marimo-mhb61dak` and push.

codeflash-ai bot requested a review from mashraf-222 on October 28, 2025 22:56
codeflash-ai bot added the ⚡️ codeflash (Optimization PR opened by Codeflash AI) and 🎯 Quality: High (Optimization Quality according to Codeflash) labels on Oct 28, 2025