Automated SQL Server to Snowflake Conversion Tool

A comprehensive tool for automatically converting SQL Server T-SQL code to Snowflake SQL with integrated QA testing and Git workflow management.

Features

  • Automated Code Conversion: Uses an LLM (Anthropic Claude, OpenAI GPT, or models via OpenRouter) to convert T-SQL to Snowflake SQL
  • Iterative Review Process: Multiple conversion iterations with LLM-based code review
  • Comprehensive QA Testing: Integrated testing using the existing snowflake-qa system
  • Git Workflow Management: Automatic feature branch creation, commits, and pull requests
  • Database Integration: Direct connection to SQL Server and Snowflake databases
  • Object Discovery: Automatic discovery of stored procedures, functions, views, and DDL scripts
  • Error Handling & Recovery: Robust error handling with automatic recovery strategies
  • Monitoring & Logging: Comprehensive logging and progress monitoring
  • Configuration Management: Flexible YAML-based configuration

Prerequisites

Required Tools

  • Python 3.8+
  • GitHub CLI (gh command)
  • Git
  • SQL Server access (with integrated security or username/password)
  • Snowflake account with SSO or API key authentication

Required API Keys

  • Anthropic API key (for Claude) OR
  • OpenAI API key (for GPT) OR
  • OpenRouter API key (for multiple models)

Installation

  1. Clone or download the tool:

    git clone <repository-url>
    cd automated-conversion
  2. Install Python dependencies:

    pip install -r requirements.txt
  3. Set up environment variables:

    # For Anthropic (Claude)
    export ANTHROPIC_API_KEY="your-api-key-here"
    
    # For OpenAI (GPT)
    export OPENAI_API_KEY="your-api-key-here"
    
    # For OpenRouter
    export OPENROUTER_API_KEY="your-api-key-here"
  4. Configure the tool:

    • Edit config.yaml with your specific settings
    • Update repository paths, database connections, and LLM preferences

Configuration

The tool uses a comprehensive YAML configuration file (config.yaml) with the following main sections:

Global Settings

global:
  max_review_iterations: 3
  conversion_timeout_seconds: 300
  qa_timeout_seconds: 600
  log_level: "INFO"
  dry_run: false

Repository Paths

repositories:
  source_repo: "C:/code/db-staging"      # Bitbucket T-SQL objects
  target_repo: "C:/gh_code/db-staging"   # GitHub Snowflake SQL
  qa_repo: "C:/code/snowflake-qa"       # Centralized QA configs

Database Connections

databases:
  sql_server:
    server: "DB10"
    use_integrated_security: true
    backup_path: "C:/backups/db-staging.bak"
  
  snowflake:
    account: "nmb10465.us-east-1.privatelink"
    warehouse: "wh_compute_xm"
    role: "DATA_DEVELOPER"
    authenticator: "externalbrowser"

LLM Configuration

llm:
  provider: "anthropic"  # anthropic, openai, openrouter
  model: "claude-3-5-sonnet-20241022"
  api_key_env: "ANTHROPIC_API_KEY"
  max_tokens: 4000
  temperature: 0.1

Usage

Basic Commands

  1. Check configuration:

    python main.py check-config
  2. List available objects:

    python main.py list-objects
  3. Convert all objects:

    python main.py convert
  4. Convert specific objects:

    python main.py convert --objects "dbo.proc1" "dbo.func1"
  5. Dry run (no actual conversion):

    python main.py convert --dry-run
  6. Check conversion status:

    python main.py status

Advanced Usage

  1. Use custom configuration:

    python main.py convert --config custom_config.yaml
  2. Specify output directory:

    python main.py convert --output-dir /path/to/output

Workflow

The tool follows this automated workflow:

Step 1: Setup

  1. Validates configuration and database connections
  2. Discovers SQL Server objects to convert
  3. Initializes Git repositories and QA system

Step 2: Conversion Loop

For each object:

  1. Create feature branch with object name
  2. Retrieve source code from database or files
  3. Convert using LLM with iterative review (up to 3 iterations)
  4. Save converted code to target repository
  5. Run QA tests (if not dry-run)
  6. Commit changes with descriptive message
  7. Create pull request (if QA passes)

Step 3: QA Validation

  1. Setup database objects in both SQL Server and Snowflake
  2. Deploy converted code to respective databases
  3. Execute procedures and compare outputs
  4. Run summary/KPI checks and record results
  5. Run row-level checks and record differences
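
The row-level check in step 5 amounts to a set difference between the two systems' outputs. A minimal sketch in Snowflake SQL, assuming the SQL Server result has been staged in Snowflake under a hypothetical _SQLSERVER_REF suffix:

    -- Rows in the SQL Server reference that the converted code did not produce
    SELECT * FROM STG.ORDERS_SQLSERVER_REF
    MINUS
    SELECT * FROM STG.ORDERS;

    -- Rows the converted code produced that the reference lacks
    SELECT * FROM STG.ORDERS
    MINUS
    SELECT * FROM STG.ORDERS_SQLSERVER_REF;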

Step 4: Git Workflow

  1. Commit changes to feature branch
  2. Push branch to remote repository
  3. Create pull request with conversion details
  4. Clean up temporary objects (if QA passes)

Object Types Supported

  • Stored Procedures: Converted to Snowflake stored procedures
  • Functions: Converted to Snowflake UDFs
  • Views: Converted to Snowflake views
  • DDL Scripts: Table creation and modification scripts
  • ETL Scripts: Data manipulation scripts
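
For orientation, a converted stored procedure typically lands in Snowflake Scripting form. The sketch below uses hypothetical object names and is not actual tool output:

    CREATE OR REPLACE PROCEDURE STG.SP_LOAD_ORDERS(P_BATCH_ID NUMBER)
    RETURNS VARCHAR
    LANGUAGE SQL
    AS
    $$
    BEGIN
        -- Copy one batch from a hypothetical source table
        INSERT INTO STG.ORDERS (ORDER_ID, AMOUNT, CREATED_AT)
        SELECT ORDER_ID, AMOUNT, CREATED_AT
        FROM RAW.ORDERS
        WHERE BATCH_ID = :P_BATCH_ID;

        -- SQLROWCOUNT holds the row count of the preceding DML statement
        RETURN 'Loaded ' || SQLROWCOUNT || ' rows';
    END;
    $$;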

Conversion Rules

The tool applies comprehensive conversion rules including:

Syntax Conversion

  • DECLARE → LET
  • SET @variable = value → LET variable := value
  • IF EXISTS → IF (EXISTS())
  • T-SQL BEGIN/END blocks → Snowflake Scripting block syntax
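
To make the first two rules concrete, a hedged before/after sketch with a hypothetical variable (exact output depends on the LLM):

    -- T-SQL source
    DECLARE @row_count INT;
    SET @row_count = 100;

    -- Snowflake Scripting result
    LET row_count INTEGER := 100;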

Data Type Mapping

  • VARCHAR(MAX) → VARCHAR
  • DATETIME → TIMESTAMP_NTZ
  • BIT → BOOLEAN
  • UNIQUEIDENTIFIER → VARCHAR(36)
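
Applied to a table definition, the mapping reads as follows (illustrative columns, not taken from the tool's output):

    -- T-SQL source
    CREATE TABLE dbo.Events (
        EventId   UNIQUEIDENTIFIER,
        Payload   VARCHAR(MAX),
        CreatedAt DATETIME,
        IsActive  BIT
    );

    -- Snowflake result
    CREATE TABLE STG.EVENTS (
        EVENT_ID   VARCHAR(36),
        PAYLOAD    VARCHAR,
        CREATED_AT TIMESTAMP_NTZ,
        IS_ACTIVE  BOOLEAN
    );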

Function Conversion

  • ISNULL() → IFNULL()
  • LEN() → LENGTH()
  • CHARINDEX() → POSITION()
  • GETDATE() → CURRENT_TIMESTAMP()
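
In a query, these substitutions apply term by term; a short sketch with hypothetical columns:

    -- T-SQL source
    SELECT ISNULL(Name, 'n/a') AS DisplayName, LEN(Name) AS NameLength
    FROM dbo.Customers
    WHERE CHARINDEX('@', Email) > 0 AND Created < GETDATE();

    -- Snowflake result
    SELECT IFNULL(NAME, 'n/a') AS DISPLAY_NAME, LENGTH(NAME) AS NAME_LENGTH
    FROM STG.CUSTOMERS
    WHERE POSITION('@', EMAIL) > 0 AND CREATED < CURRENT_TIMESTAMP();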

Naming Conventions

  • Schema prefix: STG
  • Procedure prefix: SP_
  • Function prefix: FN_
  • View prefix: VW_
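
Taken together, object names are expected to map along these lines (hypothetical names; exact casing is configuration dependent):

    -- SQL Server source       ->  Snowflake target
    -- dbo.usp_LoadOrders      ->  STG.SP_LOAD_ORDERS
    -- dbo.fn_OrderTotal       ->  STG.FN_ORDER_TOTAL
    -- dbo.vw_OpenOrders       ->  STG.VW_OPEN_ORDERS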

QA Testing

The tool integrates with the existing snowflake-qa system to provide:

  • Syntax Validation: Basic SQL syntax checks
  • Functionality Tests: Parameter and return type consistency
  • Performance Tests: Query structure and optimization checks
  • Data Validation: Data type and constraint consistency
  • Integration Tests: Full end-to-end testing
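
The summary/KPI checks reduce to running the same aggregate on both systems and diffing the results; a sketch against a hypothetical orders table:

    -- Execute on SQL Server (dbo.Orders) and Snowflake (STG.ORDERS), then compare
    SELECT COUNT(*)        AS ROW_COUNT,
           SUM(AMOUNT)     AS TOTAL_AMOUNT,
           MAX(CREATED_AT) AS LATEST_RECORD
    FROM STG.ORDERS;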

Error Handling

The tool includes comprehensive error handling:

  • Automatic Recovery: Retry strategies for different error types
  • Fallback Mechanisms: Alternative approaches when primary methods fail
  • Error Categorization: Classification by severity and type
  • Detailed Logging: Complete error tracking and reporting

Monitoring

Real-time monitoring includes:

  • Progress Tracking: Current object and completion percentage
  • Performance Metrics: Processing speed and resource usage
  • LLM Usage: Token consumption and API costs
  • Error Statistics: Error rates and recovery success

Output Files

The tool generates several output files:

  • conversion_log.json: Detailed conversion log
  • error_log.json: Error tracking and recovery attempts
  • progress_report.json: Performance and progress metrics
  • Converted SQL files in the target repository

Troubleshooting

Common Issues

  1. Database Connection Errors:

    • Verify the SQL Server host name and authentication settings
    • Check Snowflake account and credentials
    • Ensure network connectivity
  2. LLM API Errors:

    • Verify API key is set correctly
    • Check API rate limits and quotas
    • Ensure model availability
  3. Git Workflow Errors:

    • Verify GitHub CLI is installed and authenticated
    • Check repository permissions
    • Ensure target repository exists
  4. QA Test Failures:

    • Check database object creation
    • Verify data consistency
    • Review conversion accuracy

Debug Mode

Enable debug logging by setting:

logging:
  level: "DEBUG"

Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Add tests if applicable
  5. Submit a pull request

License

This project is licensed under the MIT License - see the LICENSE file for details.

Support

For issues and questions:

  1. Check the troubleshooting section
  2. Review the logs for error details
  3. Create an issue with detailed information
  4. Include configuration (with sensitive data redacted) and error logs
