⚡️ Speed up method `PPTParser.convert_ppt_to_pptx` by 126% #272

codeflash-ai · 2025-10-30T03:16:51Z

📄 126% (1.26x) speedup for `PPTParser.convert_ppt_to_pptx` in `backend/python/app/modules/parsers/pptx/ppt_parser.py`

⏱️ Runtime: 2.94 milliseconds → 1.30 milliseconds (best of 190 runs)

📝 Explanation and details

The optimization introduces **caching for LibreOffice availability checking** using a class-level attribute `_libreoffice_found`. This eliminates the repeated `subprocess.run(["which", "libreoffice"])` call that was executed on every method invocation.

**Key changes:**

- **LibreOffice check caching**: The availability check now runs only once per class lifetime, storing the result in `self.__class__._libreoffice_found`
- **Direct return optimization**: Removed the intermediate variable `pptx_content`; the file content is now returned directly
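The caching pattern described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `ParserSketch`, `_which`, and `_check_libreoffice` are hypothetical names, and `shutil.which` stands in for the `subprocess.run(["which", "libreoffice"])` call. It also assumes that only a successful check is cached, so a failed lookup is retried on the next call.

```python
import shutil
from typing import ClassVar, Optional


class ParserSketch:
    """Hypothetical stand-in for PPTParser; only the caching pattern is shown."""

    # None means "not checked yet"; True is stored after the first successful check.
    _libreoffice_found: ClassVar[Optional[bool]] = None

    # Lookup function kept as a class attribute so it can be swapped out in tests.
    _which = staticmethod(shutil.which)

    def _check_libreoffice(self) -> None:
        cls = self.__class__
        if cls._libreoffice_found:
            # Cached success: skip the PATH lookup entirely.
            return
        if cls._which("libreoffice") is None:
            # Failures are not cached (an assumption of this sketch).
            raise RuntimeError("LibreOffice is not installed or not on PATH")
        cls._libreoffice_found = True
```

Because the flag lives on the class rather than on `self`, a second `ParserSketch()` instance created later in the same process skips the lookup as well.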

**Performance impact:**
The line profiler shows the LibreOffice check (`subprocess.run`) takes ~3.7ms and represents 78-93% of total execution time. By caching this check, subsequent calls skip this expensive operation entirely. The optimization is most effective for:

- **Batch processing scenarios**: When converting multiple PPT files in sequence, only the first call pays the LibreOffice check cost
- **Repeated conversions**: Applications that perform multiple conversions benefit immediately after the first successful check
- **High-frequency usage**: Services processing many PPT files see cumulative time savings

The 126% speedup (2.94ms → 1.30ms) is most valuable in production environments where PPT conversion happens repeatedly within the same process. Because the cache is stored on the class, every parser instance shares it, not only the instance that ran the first check.
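For reference, the headline percentage can be reproduced from the two reported runtimes. The sketch below assumes the speedup is computed as the time saved relative to the optimized runtime; that formula is an assumption, not something stated in this PR, although it does yield the 126% (1.26x) figure in the title. `speedup_percent` is a hypothetical helper.

```python
def speedup_percent(before_ms: float, after_ms: float) -> float:
    """Time saved, expressed as a percentage of the optimized runtime (assumed formula)."""
    return (before_ms - after_ms) / after_ms * 100


# Runtimes reported above: 2.94 ms before, 1.30 ms after.
print(f"{speedup_percent(2.94, 1.30):.0f}%")  # prints 126%
```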

✅ Correctness verification report:

| Test | Status |
|------|--------|
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | ✅ 12 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 🔘 None Found |
| 📊 Tests Coverage | 63.6% |

🌀 Generated Regression Tests and Runtime

import os
import shutil
import subprocess
import tempfile

# imports
import pytest
from app.modules.parsers.pptx.ppt_parser import PPTParser

# unit tests

# Helper to check if LibreOffice is installed
def libreoffice_installed():
    try:
        subprocess.run(["which", "libreoffice"], check=True, capture_output=True)
        return True
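    except (subprocess.CalledProcessError, FileNotFoundError):
        # CalledProcessError: `which` found no match; FileNotFoundError: `which` itself is unavailable
        return False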

# Helper to create a minimal PPT-like payload (not a real presentation: only the OLE signature plus padding)
def minimal_ppt_bytes():
    # The 8-byte OLE Compound File signature that .ppt files start with.
    # Everything after it is zero padding, so LibreOffice may reject it as a presentation.
    return (
        b'\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1'  # OLE signature
        + b'\x00' * 512  # zero padding
    )

# Helper to create a large fake PPT file
def large_ppt_bytes(size=1024*512):
    # Start with valid header, then pad
    return minimal_ppt_bytes() + b'A' * (size - len(minimal_ppt_bytes()))

@pytest.mark.skipif(not libreoffice_installed(), reason="LibreOffice not installed")
class TestConvertPptToPptxBasic:
    def setup_method(self):
        self.parser = PPTParser()

    
#------------------------------------------------
import os
import shutil
import subprocess
import tempfile

# imports
import pytest  # used for our unit tests
from app.modules.parsers.pptx.ppt_parser import PPTParser

# unit tests

# Helper function to check if LibreOffice is installed
def libreoffice_installed():
    return shutil.which("libreoffice") is not None

# Helper: create a minimal valid .ppt file (binary)
def minimal_ppt_bytes():
    # Minimal valid .ppt files start with D0 CF 11 E0 A1 B1 1A E1 (OLE header)
    # This is not a real PPT but enough for LibreOffice to attempt conversion
    return b'\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1' + b'\x00' * 512

# Helper: create a large fake .ppt payload by repeating the minimal bytes
def large_ppt_bytes(size=1000):
    # Repeats the 520-byte minimal payload `size` times (about 520 KB by default); not a real PPT, only a stress input
    return minimal_ppt_bytes() * size

# Helper: create a corrupted .ppt file (invalid header)
def corrupted_ppt_bytes():
    return b'not_a_valid_ppt_file'

# Helper: create a valid but empty .ppt file (OLE header only)
def empty_ppt_bytes():
    return b'\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1'

@pytest.mark.skipif(not libreoffice_installed(), reason="LibreOffice must be installed for these tests")
class TestConvertPptToPptx:
    # 1. Basic Test Cases

    
#------------------------------------------------
from app.modules.parsers.pptx.ppt_parser import PPTParser
from crosshair.auditwall import SideEffectDetected  # assumed import path for CrossHair's exception
import pytest

def test_PPTParser_convert_ppt_to_pptx():
    with pytest.raises(SideEffectDetected, match='We\'ve\\ blocked\\ a\\ file\\ writing\\ operation\\ on\\ "/tmp/rahaxd23"\\.\\ CrossHair\\ should\\ not\\ be\\ run\\ on\\ code\\ with\\ side\\ effects'):
        PPTParser.convert_ppt_to_pptx(PPTParser(), b'')

To edit these changes, run `git checkout codeflash/optimize-PPTParser.convert_ppt_to_pptx-mhcus5al` and push.


codeflash-ai bot requested a review from mashraf-222 October 30, 2025 03:16

codeflash-ai bot added the `⚡️ codeflash` (Optimization PR opened by Codeflash AI) and `🎯 Quality: High` (Optimization Quality according to Codeflash) labels Oct 30, 2025
