Skip to content

Conversation

@kyungsoo-datahub
Copy link
Contributor

Implements fallback parsing for MSSQL/TSQL stored procedures containing TRY/CATCH blocks and other control flow syntax that sqlglot doesn't support. The parser extracts and parses individual DML statements separately, then aggregates the lineage results.

…ures with control flow

Implements fallback parsing for MSSQL/TSQL stored procedures containing TRY/CATCH blocks and other control flow syntax that sqlglot doesn't support. The parser extracts and parses individual DML statements separately, then aggregates the lineage results.
@github-actions github-actions bot added the ingestion PR or Issue related to the ingestion of metadata label Nov 18, 2025
@codecov
Copy link

codecov bot commented Nov 18, 2025

Codecov Report

❌ Patch coverage is 90.19608% with 5 lines in your changes missing coverage. Please review.
✅ All tests successful. No failed tests found.

Files with missing lines Patch % Lines
...gestion/src/datahub/sql_parsing/sqlglot_lineage.py 90.19% 5 Missing ⚠️

📢 Thoughts on this report? Let us know!

@datahub-cyborg datahub-cyborg bot added the needs-review Label for PRs that need review from a maintainer. label Nov 18, 2025
Use single TSQL_CONTROL_FLOW_KEYWORDS for both detection and filtering.
Enables support for stored procedures with IF, WHILE, and other control flow.
This is necessary because sqlglot doesn't support TSQL control flow syntax like
TRY/CATCH blocks, which causes the entire procedure to be unparseable.
"""
from datahub.sql_parsing.split_statements import split_statements
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we move import to the top?

"""
from datahub.sql_parsing.split_statements import split_statements

logger.info(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be on debug level

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Am I right? I'm just thinking from user's perspective if it makes sense to show this on info level.

return create


def _is_stored_procedure_with_unsupported_syntax(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we match the dialect as well? This is an mssql specific syntax, right? Can we check for the dialect as well?

@datahub-cyborg datahub-cyborg bot added pending-submitter-response Issue/request has been reviewed but requires a response from the submitter and removed needs-review Label for PRs that need review from a maintainer. labels Nov 19, 2025
… stored procedure parser

- Add dialect check to ensure TSQL control flow detection only applies to MSSQL/TSQL
- Fix infinite recursion bug by adding _disable_fallback_parser flag to prevent recursive fallback parser calls
- Add test coverage for non-MSSQL dialects (PostgreSQL, MySQL)

This addresses a code review comment and resolves a critical recursion issue where split_statements could return chunks still matching the CREATE PROCEDURE pattern, causing infinite loops.
@datahub-cyborg datahub-cyborg bot added needs-review Label for PRs that need review from a maintainer. and removed pending-submitter-response Issue/request has been reviewed but requires a response from the submitter labels Nov 21, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ingestion PR or Issue related to the ingestion of metadata needs-review Label for PRs that need review from a maintainer.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants