misrasaurabh1
Contributor

Hi all, I am building Codeflash.ai, an automated performance optimizer for Python codebases. I tried optimizing sentry and found several optimizations that I would like to contribute. I would love to collaborate with your team to get them reviewed and merged. Let me know the best way to get in touch.

📄 44% (0.44x) speedup for _get_db_span_description in sentry_sdk/integrations/redis/modules/queries.py

⏱️ Runtime: 586 microseconds → 408 microseconds (best of 269 runs)

📝 Explanation and details

The optimization achieves a 44% speedup by eliminating redundant function calls inside the loop in _get_safe_command().

Key optimizations applied:

  1. Cached should_send_default_pii() call: The original code called this function inside the loop for every non-key argument (up to 146 times in profiling). The optimized version calls it once before the loop and stores the result in send_default_pii, reducing expensive function calls from O(n) to O(1).

  2. Pre-computed name.lower(): The original code computed name.lower() inside the loop for every argument (204 times in profiling). The optimized version computes it once before the loop and reuses the name_low variable.
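The two changes above amount to hoisting loop-invariant work out of the loop. The following is a minimal sketch of that pattern, not the SDK's actual code: `safe_command_before`, `safe_command_after`, the `send_pii` callable, and the constants `MAX_NUM_ARGS`, `SENSITIVE_COMMANDS`, and `FILTERED` are illustrative stand-ins whose values are assumed from the tests in this PR.

```python
from typing import Any, Callable, Sequence

# Hypothetical stand-ins; the real constants live in sentry_sdk and may differ.
MAX_NUM_ARGS = 10
SENSITIVE_COMMANDS = {"auth"}
FILTERED = "[Filtered]"


def safe_command_before(
    name: str, args: Sequence[Any], send_pii: Callable[[], bool]
) -> str:
    """Original shape: lower() and the PII check both run inside the loop."""
    parts = [name]
    for i, arg in enumerate(args):
        if i > MAX_NUM_ARGS:
            break
        if name.lower() in SENSITIVE_COMMANDS:
            parts.append(FILTERED)
        elif i == 0 or send_pii():  # the key is always shown; other args need PII consent
            parts.append(repr(arg))
        else:
            parts.append(FILTERED)
    return " ".join(parts)


def safe_command_after(
    name: str, args: Sequence[Any], send_pii: Callable[[], bool]
) -> str:
    """Optimized shape: both loop-invariant values are computed once, up front."""
    parts = [name]
    name_low = name.lower()
    send_default_pii = send_pii()
    for i, arg in enumerate(args):
        if i > MAX_NUM_ARGS:
            break
        if name_low in SENSITIVE_COMMANDS:
            parts.append(FILTERED)
        elif i == 0 or send_default_pii:
            parts.append(repr(arg))
        else:
            parts.append(FILTERED)
    return " ".join(parts)


if __name__ == "__main__":
    args = ("mykey", "value1", "value2")
    print(safe_command_before("GET", args, lambda: False))
    print(safe_command_after("GET", args, lambda: False))
```

Both functions return the same string for the same inputs; only the number of `lower()` and `send_pii()` evaluations differs.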

Performance impact from profiling:

  • The should_send_default_pii() calls dropped from 1.40ms (65.2% of total time) to 625μs (45.9% of total time)
  • The name.lower() calls were eliminated from the loop entirely, removing 99μs of redundant computation
  • Overall _get_safe_command execution time improved from 2.14ms to 1.36ms (a 36% reduction in time, or about 57% faster)
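The percentages quoted in this description can be cross-checked with a few lines of arithmetic. "Percent faster" (old time divided by new time, minus one) and "percent of time saved" (one minus new time divided by old time) are different ratios, so it helps to compute both. The helper names below are illustrative only.

```python
def percent_faster(before: float, after: float) -> float:
    """Speedup expressed as how much more work fits in the same time."""
    return (before / after - 1.0) * 100.0


def percent_time_saved(before: float, after: float) -> float:
    """Reduction in elapsed time relative to the original."""
    return (1.0 - after / before) * 100.0


# Headline runtime: 586 microseconds -> 408 microseconds
print(round(percent_faster(586, 408), 1))        # 43.6

# _get_safe_command profile: 2.14 ms -> 1.36 ms
print(round(percent_time_saved(2.14, 1.36), 1))  # 36.4
print(round(percent_faster(2.14, 1.36), 1))      # 57.4
```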

Test case patterns where this optimization excels:

  • Multiple arguments: Commands with many arguments see dramatic improvements (up to 262% faster for large arg lists)
  • Large-scale operations: Tests passing roughly 1,000 arguments (of which only the first _MAX_NUM_ARGS + 1 are processed) show 171-223% speedups
  • Frequent Redis commands: Any command processing multiple values benefits significantly

The optimization is most effective when processing Redis commands with multiple arguments, which is common in batch operations and complex data manipulations.
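The per-test timings in the generated regression tests further down also report calls with zero or one argument as slower after the change. One plausible reading, which is an assumption and not something stated in this PR, is that the hoisted PII check is now paid once per call even when the loop would never have needed it. The sketch below simply counts how many PII checks each strategy performs for a non-sensitive command; the function names and the default of 10 for `max_num_args` are illustrative.

```python
def pii_checks_in_loop(num_args: int, max_num_args: int = 10) -> int:
    """Checks made when the PII lookup runs for each processed non-key argument."""
    processed = min(num_args, max_num_args + 1)  # the loop stops after max_num_args + 1 items
    return max(processed - 1, 0)                 # the first argument (the key) needs no check


def pii_checks_hoisted(num_args: int) -> int:
    """Checks made when the PII lookup runs once before the loop, unconditionally."""
    return 1


for n in (0, 1, 3, 1000):
    print(n, pii_checks_in_loop(n), pii_checks_hoisted(n))
```

Under these assumptions the hoisted version does one extra check for zero or one argument, breaks even at two arguments, and saves up to nine checks per call at the argument cap.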

Correctness verification report:

| Test | Status |
|------|--------|
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | 48 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 🔘 None Found |
| 📊 Tests Coverage | 100.0% |

🌀 Generated Regression Tests and Runtime
import pytest
from sentry_sdk.integrations.redis.modules.queries import \
    _get_db_span_description

_MAX_NUM_ARGS = 10

# Dummy RedisIntegration class for testing
class RedisIntegration:
    def __init__(self, max_data_size=None):
        self.max_data_size = max_data_size

# Module-level flag toggled by the "PII on" tests below. Nothing in this file
# wires it into sentry_sdk's should_send_default_pii(), so it does not change
# what _get_db_span_description returns.
_send_pii = False

# --- Basic Test Cases ---

def test_basic_no_args():
    """Test command with no arguments."""
    integration = RedisIntegration()
    codeflash_output = _get_db_span_description(integration, "PING", ()); desc = codeflash_output # 2.55μs -> 7.76μs (67.2% slower)

def test_basic_single_arg_pii_false():
    """Test command with one argument, PII off."""
    integration = RedisIntegration()
    codeflash_output = _get_db_span_description(integration, "GET", ("mykey",)); desc = codeflash_output # 3.62μs -> 7.86μs (54.0% slower)

def test_basic_single_arg_pii_true():
    """Test command with one argument, PII on."""
    global _send_pii
    _send_pii = True
    integration = RedisIntegration()
    codeflash_output = _get_db_span_description(integration, "GET", ("mykey",)); desc = codeflash_output # 3.28μs -> 7.40μs (55.7% slower)

def test_basic_multiple_args_pii_false():
    """Test command with multiple args, PII off."""
    integration = RedisIntegration()
    codeflash_output = _get_db_span_description(integration, "GET", ("mykey", "value1", "value2")); desc = codeflash_output # 12.6μs -> 8.24μs (52.8% faster)

def test_basic_multiple_args_pii_true():
    """Test command with multiple args, PII on."""
    global _send_pii
    _send_pii = True
    integration = RedisIntegration()
    codeflash_output = _get_db_span_description(integration, "GET", ("mykey", "value1", "value2")); desc = codeflash_output # 9.92μs -> 8.47μs (17.0% faster)

def test_basic_sensitive_command():
    """Test sensitive command: should always filter after command name."""
    integration = RedisIntegration()
    codeflash_output = _get_db_span_description(integration, "SET", ("mykey", "secret")); desc = codeflash_output # 7.96μs -> 7.56μs (5.33% faster)

def test_basic_sensitive_command_case_insensitive():
    """Test sensitive command with different casing."""
    integration = RedisIntegration()
    codeflash_output = _get_db_span_description(integration, "set", ("mykey", "secret")); desc = codeflash_output # 7.77μs -> 7.84μs (0.881% slower)

def test_basic_max_num_args():
    """Test that args beyond _MAX_NUM_ARGS are ignored."""
    integration = RedisIntegration()
    args = tuple(f"arg{i}" for i in range(_MAX_NUM_ARGS + 2))
    codeflash_output = _get_db_span_description(integration, "GET", args); desc = codeflash_output # 28.0μs -> 9.43μs (197% faster)
    # Only up to _MAX_NUM_ARGS+1 args are processed (the first arg is key)
    expected = "GET 'arg0'" + " [Filtered]" * _MAX_NUM_ARGS

# --- Edge Test Cases ---

def test_edge_empty_command_name():
    """Test with empty command name."""
    integration = RedisIntegration()
    codeflash_output = _get_db_span_description(integration, "", ("key",)); desc = codeflash_output # 3.22μs -> 7.46μs (56.9% slower)

def test_edge_empty_args():
    """Test with empty args tuple."""
    integration = RedisIntegration()
    codeflash_output = _get_db_span_description(integration, "DEL", ()); desc = codeflash_output # 2.09μs -> 6.73μs (69.0% slower)

def test_edge_none_arg():
    """Test with None argument."""
    integration = RedisIntegration()
    codeflash_output = _get_db_span_description(integration, "GET", (None,)); desc = codeflash_output # 3.37μs -> 7.57μs (55.5% slower)

def test_edge_mixed_types_args():
    """Test with mixed argument types."""
    integration = RedisIntegration()
    args = ("key", 123, 45.6, True, None, ["a", "b"], {"x": 1})
    codeflash_output = _get_db_span_description(integration, "GET", args); desc = codeflash_output # 19.9μs -> 8.46μs (136% faster)

def test_edge_sensitive_command_with_pii_true():
    """Sensitive commands should always filter, even if PII is on."""
    global _send_pii
    _send_pii = True
    integration = RedisIntegration()
    codeflash_output = _get_db_span_description(integration, "AUTH", ("user", "pass")); desc = codeflash_output # 3.40μs -> 7.50μs (54.7% slower)

def test_edge_max_data_size_truncation():
    """Test truncation when description exceeds max_data_size."""
    integration = RedisIntegration(max_data_size=15)
    codeflash_output = _get_db_span_description(integration, "GET", ("verylongkeyname", "value")); desc = codeflash_output # 9.20μs -> 8.72μs (5.57% faster)
    # "GET 'verylongkeyname' [Filtered]" is longer than 15
    # Truncate to 15-len("...") = 12, then add "..."
    expected = "GET 'verylon..."

def test_edge_max_data_size_exact_length():
    """Test a description shorter than max_data_size (no truncation expected)."""
    integration = RedisIntegration(max_data_size=23)
    codeflash_output = _get_db_span_description(integration, "GET", ("shortkey",)); desc = codeflash_output # 3.33μs -> 7.63μs (56.4% slower)

def test_edge_max_data_size_less_than_ellipsis():
    """Test when max_data_size is less than length of ellipsis."""
    integration = RedisIntegration(max_data_size=2)
    codeflash_output = _get_db_span_description(integration, "GET", ("key",)); desc = codeflash_output # 4.07μs -> 8.65μs (52.9% slower)

def test_edge_args_are_empty_strings():
    """Test when args are empty strings."""
    integration = RedisIntegration()
    codeflash_output = _get_db_span_description(integration, "GET", ("", "")); desc = codeflash_output # 8.52μs -> 7.74μs (10.1% faster)

def test_edge_command_name_is_space():
    """Test when command name is a space."""
    integration = RedisIntegration()
    codeflash_output = _get_db_span_description(integration, " ", ("key",)); desc = codeflash_output # 3.09μs -> 7.34μs (57.9% slower)

# --- Large Scale Test Cases ---

def test_large_many_args_pii_false():
    """Test with a large number of arguments, PII off."""
    integration = RedisIntegration()
    args = tuple(f"arg{i}" for i in range(1000))
    codeflash_output = _get_db_span_description(integration, "GET", args); desc = codeflash_output # 32.3μs -> 10.3μs (213% faster)
    # Only first arg shown, rest are filtered, up to _MAX_NUM_ARGS
    expected = "GET 'arg0'" + " [Filtered]" * min(len(args)-1, _MAX_NUM_ARGS)

def test_large_many_args_pii_true():
    """Test with a large number of arguments, PII on."""
    global _send_pii
    _send_pii = True
    integration = RedisIntegration()
    args = tuple(f"arg{i}" for i in range(1000))
    # Only up to _MAX_NUM_ARGS are processed
    expected = "GET " + " ".join([repr(f"arg{i}") for i in range(_MAX_NUM_ARGS+1)])
    codeflash_output = _get_db_span_description(integration, "GET", args); desc = codeflash_output # 28.1μs -> 9.55μs (194% faster)

def test_large_long_command_name_and_args():
    """Test with very long command name and args."""
    integration = RedisIntegration()
    cmd = "LONGCOMMAND" * 10
    args = tuple("X"*100 for _ in range(_MAX_NUM_ARGS+1))
    expected = cmd + " " + " ".join([repr("X"*100) if i == 0 else "[Filtered]" for i in range(_MAX_NUM_ARGS+1)])
    codeflash_output = _get_db_span_description(integration, cmd, args); desc = codeflash_output # 34.2μs -> 9.45μs (262% faster)

def test_large_truncation():
    """Test truncation with very large description."""
    integration = RedisIntegration(max_data_size=50)
    args = tuple("X"*20 for _ in range(_MAX_NUM_ARGS+1))
    codeflash_output = _get_db_span_description(integration, "GET", args); desc = codeflash_output # 28.3μs -> 10.0μs (182% faster)

def test_large_sensitive_command():
    """Test large sensitive command, all args filtered."""
    integration = RedisIntegration()
    args = tuple(f"secret{i}" for i in range(1000))
    codeflash_output = _get_db_span_description(integration, "SET", args); desc = codeflash_output # 28.0μs -> 10.1μs (178% faster)
    # Only up to _MAX_NUM_ARGS+1 args are processed, all filtered
    expected = "SET" + " [Filtered]" * (_MAX_NUM_ARGS+1)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
#------------------------------------------------
import pytest  # used for our unit tests
from sentry_sdk.integrations.redis.modules.queries import \
    _get_db_span_description

_MAX_NUM_ARGS = 10

# Minimal RedisIntegration stub for testing
class RedisIntegration:
    def __init__(self, max_data_size=None):
        self.max_data_size = max_data_size

# Minimal Scope and client stub for should_send_default_pii
class ClientStub:
    def __init__(self, send_pii):
        self._send_pii = send_pii
    def should_send_default_pii(self):
        return self._send_pii

class Scope:
    _client = ClientStub(send_pii=False)
    @classmethod
    def get_client(cls):
        return cls._client

def should_send_default_pii():
    return Scope.get_client().should_send_default_pii()

# --- Begin: Unit Tests ---

# 1. Basic Test Cases

def test_basic_single_arg_no_pii():
    # Test a simple command with one argument, PII disabled
    Scope._client = ClientStub(send_pii=False)
    integration = RedisIntegration()
    codeflash_output = _get_db_span_description(integration, "GET", ("mykey",)); result = codeflash_output # 3.46μs -> 7.84μs (55.9% slower)

def test_basic_multiple_args_no_pii():
    # Test a command with multiple arguments, PII disabled
    Scope._client = ClientStub(send_pii=False)
    integration = RedisIntegration()
    codeflash_output = _get_db_span_description(integration, "SET", ("mykey", "myvalue")); result = codeflash_output # 8.35μs -> 8.05μs (3.70% faster)

def test_basic_multiple_args_with_pii():
    # Test a command with multiple arguments, PII enabled
    Scope._client = ClientStub(send_pii=True)
    integration = RedisIntegration()
    codeflash_output = _get_db_span_description(integration, "SET", ("mykey", "myvalue")); result = codeflash_output # 7.97μs -> 7.63μs (4.39% faster)

def test_basic_sensitive_command():
    # Test a sensitive command, should always be filtered
    Scope._client = ClientStub(send_pii=True)
    integration = RedisIntegration()
    codeflash_output = _get_db_span_description(integration, "AUTH", ("user", "password")); result = codeflash_output # 3.40μs -> 7.46μs (54.4% slower)

def test_basic_no_args():
    # Test a command with no arguments
    Scope._client = ClientStub(send_pii=False)
    integration = RedisIntegration()
    codeflash_output = _get_db_span_description(integration, "PING", ()); result = codeflash_output # 2.16μs -> 6.63μs (67.4% slower)

# 2. Edge Test Cases

def test_edge_max_num_args():
    # Test with more than _MAX_NUM_ARGS arguments, should truncate at _MAX_NUM_ARGS
    Scope._client = ClientStub(send_pii=True)
    integration = RedisIntegration()
    args = tuple(f"arg{i}" for i in range(_MAX_NUM_ARGS + 2))
    codeflash_output = _get_db_span_description(integration, "SET", args); result = codeflash_output # 32.4μs -> 9.05μs (258% faster)
    # Only up to _MAX_NUM_ARGS should be included
    expected = "SET " + " ".join(
        [repr(args[0])] + [repr(arg) for arg in args[1:_MAX_NUM_ARGS+1]]
    )

def test_edge_empty_string_key():
    # Test with an empty string as key
    Scope._client = ClientStub(send_pii=False)
    integration = RedisIntegration()
    codeflash_output = _get_db_span_description(integration, "GET", ("",)); result = codeflash_output # 3.42μs -> 7.51μs (54.5% slower)

def test_edge_none_key():
    # Test with None as key
    Scope._client = ClientStub(send_pii=False)
    integration = RedisIntegration()
    codeflash_output = _get_db_span_description(integration, "GET", (None,)); result = codeflash_output # 3.25μs -> 7.42μs (56.2% slower)

def test_edge_non_string_key():
    # Test with integer as key
    Scope._client = ClientStub(send_pii=False)
    integration = RedisIntegration()
    codeflash_output = _get_db_span_description(integration, "GET", (12345,)); result = codeflash_output # 3.24μs -> 7.62μs (57.5% slower)

def test_edge_sensitive_command_case_insensitive():
    # Test sensitive command with mixed case
    Scope._client = ClientStub(send_pii=True)
    integration = RedisIntegration()
    codeflash_output = _get_db_span_description(integration, "AuTh", ("user", "password")); result = codeflash_output # 3.57μs -> 7.72μs (53.8% slower)

def test_edge_truncation_exact():
    # Description "GET 'mykey'" (11 characters) is shorter than max_data_size, so no truncation is expected
    Scope._client = ClientStub(send_pii=True)
    integration = RedisIntegration(max_data_size=13)
    codeflash_output = _get_db_span_description(integration, "GET", ("mykey",)); result = codeflash_output # 3.61μs -> 8.05μs (55.1% slower)

def test_edge_truncation_needed():
    # Test truncation where description exceeds max_data_size
    Scope._client = ClientStub(send_pii=True)
    integration = RedisIntegration(max_data_size=10)
    codeflash_output = _get_db_span_description(integration, "GET", ("mykey",)); result = codeflash_output # 4.32μs -> 7.96μs (45.8% slower)

def test_edge_truncation_with_filtered():
    # Truncation with filtered data
    Scope._client = ClientStub(send_pii=False)
    integration = RedisIntegration(max_data_size=10)
    codeflash_output = _get_db_span_description(integration, "SET", ("mykey", "myvalue")); result = codeflash_output # 10.3μs -> 8.92μs (15.7% faster)

def test_edge_args_are_bytes():
    # Test arguments are bytes
    Scope._client = ClientStub(send_pii=True)
    integration = RedisIntegration()
    codeflash_output = _get_db_span_description(integration, "GET", (b"mykey",)); result = codeflash_output # 3.42μs -> 7.54μs (54.7% slower)

def test_edge_args_are_mixed_types():
    # Test arguments are mixed types
    Scope._client = ClientStub(send_pii=True)
    integration = RedisIntegration()
    args = ("key", 123, None, b"bytes")
    codeflash_output = _get_db_span_description(integration, "SET", args); result = codeflash_output # 13.7μs -> 8.31μs (65.1% faster)
    expected = "SET 'key' 123 None b'bytes'"

def test_edge_args_are_empty_tuple():
    # Test arguments is empty tuple
    Scope._client = ClientStub(send_pii=True)
    integration = RedisIntegration()
    codeflash_output = _get_db_span_description(integration, "PING", ()); result = codeflash_output # 2.14μs -> 6.67μs (67.9% slower)

def test_edge_args_are_list():
    # Test arguments as a list (should still work as sequence)
    Scope._client = ClientStub(send_pii=True)
    integration = RedisIntegration()
    codeflash_output = _get_db_span_description(integration, "SET", ["key", "val"]); result = codeflash_output # 8.54μs -> 7.96μs (7.30% faster)


def test_edge_args_are_dict():
    # Test arguments as a dict (should treat as sequence of keys)
    Scope._client = ClientStub(send_pii=True)
    integration = RedisIntegration()
    args = {"a": 1, "b": 2}
    codeflash_output = _get_db_span_description(integration, "SET", args); result = codeflash_output # 7.87μs -> 7.86μs (0.102% faster)

def test_edge_args_are_long_string():
    # Test argument is a very long string (truncation)
    Scope._client = ClientStub(send_pii=True)
    integration = RedisIntegration(max_data_size=20)
    long_str = "x" * 100
    codeflash_output = _get_db_span_description(integration, "SET", (long_str,)); result = codeflash_output # 4.46μs -> 8.43μs (47.1% slower)

# 3. Large Scale Test Cases

def test_large_many_args_no_pii():
    # Test with large number of arguments, PII disabled
    Scope._client = ClientStub(send_pii=False)
    integration = RedisIntegration()
    args = tuple(f"key{i}" for i in range(999))
    codeflash_output = _get_db_span_description(integration, "MGET", args); result = codeflash_output # 28.6μs -> 10.6μs (171% faster)
    # Only first is shown, rest are filtered (up to _MAX_NUM_ARGS)
    expected = "MGET 'key0'" + " [Filtered]" * _MAX_NUM_ARGS

def test_large_many_args_with_pii():
    # Test with large number of arguments, PII enabled
    Scope._client = ClientStub(send_pii=True)
    integration = RedisIntegration()
    args = tuple(f"key{i}" for i in range(999))
    codeflash_output = _get_db_span_description(integration, "MGET", args); result = codeflash_output # 30.9μs -> 9.87μs (213% faster)
    # Only up to _MAX_NUM_ARGS are shown
    expected = "MGET " + " ".join([repr(arg) for arg in args[:_MAX_NUM_ARGS+1]])

def test_large_truncation():
    # Test truncation with large description
    Scope._client = ClientStub(send_pii=True)
    integration = RedisIntegration(max_data_size=50)
    args = tuple("x" * 10 for _ in range(20))
    codeflash_output = _get_db_span_description(integration, "MGET", args); result = codeflash_output # 31.0μs -> 10.4μs (198% faster)

def test_large_sensitive_command():
    # Test large sensitive command, should always be filtered
    Scope._client = ClientStub(send_pii=True)
    integration = RedisIntegration()
    args = tuple("x" * 10 for _ in range(20))
    codeflash_output = _get_db_span_description(integration, "AUTH", args); result = codeflash_output # 5.42μs -> 9.30μs (41.8% slower)

def test_large_args_are_large_numbers():
    # Test with large integer arguments
    Scope._client = ClientStub(send_pii=True)
    integration = RedisIntegration()
    args = tuple(10**6 + i for i in range(_MAX_NUM_ARGS + 1))
    codeflash_output = _get_db_span_description(integration, "MGET", args); result = codeflash_output # 27.6μs -> 9.38μs (194% faster)
    expected = "MGET " + " ".join([repr(arg) for arg in args[:_MAX_NUM_ARGS+1]])

def test_large_args_are_large_bytes():
    # Test with large bytes arguments
    Scope._client = ClientStub(send_pii=True)
    integration = RedisIntegration()
    args = tuple(b"x" * 100 for _ in range(_MAX_NUM_ARGS + 1))
    codeflash_output = _get_db_span_description(integration, "MGET", args); result = codeflash_output # 30.2μs -> 9.35μs (223% faster)
    expected = "MGET " + " ".join([repr(arg) for arg in args[:_MAX_NUM_ARGS+1]])
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
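
Several of the tests above describe truncation against `max_data_size` in their comments ("Truncate to 15-len("...") = 12, then add "..."") but do not assert on the result. The following is a minimal sketch of that rule as the comments describe it, so the expected strings can be checked by hand. `truncate_description` is a hypothetical helper written for this illustration, not the SDK's function.

```python
from typing import Optional

ELLIPSIS = "..."


def truncate_description(description: str, max_data_size: Optional[int]) -> str:
    """Cut the description so that, with a trailing ellipsis, it fits max_data_size."""
    if max_data_size and len(description) > max_data_size:
        return description[: max_data_size - len(ELLIPSIS)] + ELLIPSIS
    return description


full = "GET 'verylongkeyname' [Filtered]"
print(truncate_description(full, 15))             # GET 'verylon...
print(truncate_description("GET 'mykey'", 13))    # GET 'mykey'
print(truncate_description("GET 'mykey'", None))  # GET 'mykey'
```

The first result is exactly 15 characters long: 12 characters of the original description followed by the three-character ellipsis.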

To edit these changes git checkout codeflash/optimize-_get_db_span_description-mg9vzvxu and push.

@misrasaurabh1 misrasaurabh1 requested a review from a team as a code owner October 14, 2025 19:19
Contributor

@sentrivana sentrivana left a comment


Thanks @misrasaurabh1!

@sentrivana sentrivana enabled auto-merge (squash) October 15, 2025 07:41
@sentrivana sentrivana merged commit 749e409 into getsentry:master Oct 15, 2025
112 checks passed
Member

@szokeasaurusrex szokeasaurusrex left a comment


😳

3 participants