
Conversation


@askumar27 askumar27 commented Nov 18, 2025

📋 Summary

Quote database identifiers in Fivetran connector to align with Snowflake connector pattern and ensure consistent case-sensitive handling across all identifiers. Includes backward compatibility for valid unquoted identifiers to support existing Snowflake databases and schemas.

🎯 Motivation

This change addresses a regression identified in the postmortem: database names were not quoted, while schema names have been quoted since v1.2.0.7. The inconsistency created confusion and potential issues for users with case-sensitive or special-character database names, and the earlier schema-quoting change also introduced a backward incompatibility for Snowflake configurations that relied on unquoted identifiers.

Root Cause:

Why This Matters:

  • Consistency: Aligns Fivetran connector with the Snowflake connector's pattern of always quoting identifiers
  • Special Characters: Enables proper handling of database names with hyphens, spaces, dots, and other special characters
  • Case Sensitivity: Ensures database names preserve exact case, matching the behavior already established for schema names
  • Cross-Platform Support: Required for proper SQL transpilation to BigQuery and Databricks destinations
  • Backward Compatibility: Automatically handles existing Snowflake databases created with unquoted identifiers

Related Context:

🏗️ Architecture/Design Notes

Identifier Handling Pattern:

  • Follows Snowflake's quoted identifier convention (double quotes)
  • All identifiers (database and schema) are now consistently quoted
  • Quote escaping: embedded double quotes are doubled (`""`) so database/schema names containing quotes are handled safely (see the sketch after this list)
  • SQL queries are written in Snowflake SQL and transpiled to target dialects at runtime
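
As a rough illustration of this quoting pattern, here is a minimal sketch; the helper name `quote_identifier` is an assumption for illustration, not the connector's actual code:

```python
def quote_identifier(name: str) -> str:
    # Wrap the identifier in double quotes, escaping any embedded
    # double quote by doubling it (Snowflake's escaping rule):
    #   my"db  ->  "my""db"
    return '"' + name.replace('"', '""') + '"'
```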

Backward Compatibility Strategy:

  • Valid unquoted identifiers are detected using _is_valid_unquoted_identifier():
    • Must start with letter (A-Z) or underscore (_)
    • Must contain only letters, numbers, and underscores
    • Cannot be already quoted (starts/ends with double quotes)
  • For Snowflake destinations, valid unquoted identifiers are automatically uppercased before quoting
  • This matches Snowflake's behavior: unquoted identifiers are auto-converted to uppercase
  • Ensures existing databases created without quotes continue to work seamlessly (a sketch of this logic follows the list)
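
The validation and normalization described above can be sketched roughly as follows. `_is_valid_unquoted_identifier` is the name used in the PR, but this exact implementation and the `normalize_snowflake_identifier` wrapper (which reuses the `quote_identifier` sketch above) are assumptions:

```python
import re

# Letters, digits, and underscores only; must not start with a digit.
_UNQUOTED_RE = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")

def _is_valid_unquoted_identifier(name: str) -> bool:
    # Names already wrapped in double quotes are not unquoted identifiers.
    if name.startswith('"') and name.endswith('"'):
        return False
    return bool(_UNQUOTED_RE.match(name))

def normalize_snowflake_identifier(name: str) -> str:
    # Snowflake folds unquoted identifiers to uppercase, so valid unquoted
    # names are uppercased before quoting; anything else is quoted as-is,
    # preserving case and special characters.
    if _is_valid_unquoted_identifier(name):
        name = name.upper()
    return quote_identifier(name)
```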

Alignment with Snowflake Connector:

  • Matches the pattern used in src/datahub/ingestion/source/snowflake/snowflake_query.py
  • Both connectors now use: `use database "{db_name}"` and `"{schema_name}".table_name` (see the usage sketch after this list)
  • Ensures consistent behavior across DataHub's Snowflake-related connectors
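
For concreteness, a usage sketch built on the hypothetical helpers above; the query strings illustrate the pattern rather than the connector's exact SQL:

```python
db = normalize_snowflake_identifier("fivetran_log")      # '"FIVETRAN_LOG"'
schema = normalize_snowflake_identifier("my-schema.v2")  # '"my-schema.v2"' (special chars: quoted as-is)

use_db_query = f"use database {db}"  # use database "FIVETRAN_LOG"
table_ref = f"{schema}.connector"    # "my-schema.v2".connector
```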

📊 Impact Assessment

Breaking Changes:

  • NO
  • Backward compatibility is automatically handled for valid unquoted identifiers
  • Users with standard identifiers (letters, numbers, underscores only) require no configuration changes
  • Users with special characters or case-sensitive requirements must ensure exact case matching; support for such identifiers is a new enhancement in this PR

Migration Impact:

  • No Action Required for standard identifiers (letters, numbers, underscores only):

    • Valid unquoted identifiers are automatically uppercased to match existing Snowflake objects
    • Example: fivetran_logs → automatically becomes "FIVETRAN_LOGS" (more worked cases below)
    • Existing configurations continue to work without changes
  • Recommended Best Practice: Maintain exact case in configuration for consistency:

    • If database is stored as MYDATABASE in Snowflake → use MYDATABASE in config
    • If database is stored as mydatabase in Snowflake → use mydatabase in config
    • This ensures consistency and makes configuration more explicit
  • Timeline: schema quoting has been in effect for all CLI versions since v1.2.0.7; this change extends the same quoting to database names
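
To make the mapping concrete, a few worked cases using the hypothetical `normalize_snowflake_identifier` sketch above:

```python
# Valid unquoted identifiers: uppercased, then quoted (no action required).
assert normalize_snowflake_identifier("fivetran_logs") == '"FIVETRAN_LOGS"'
assert normalize_snowflake_identifier("_staging") == '"_STAGING"'

# Special characters or embedded quotes: quoted as-is, case preserved.
assert normalize_snowflake_identifier("my-database") == '"my-database"'
assert normalize_snowflake_identifier('odd"name') == '"odd""name"'
```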

Risk Level: Low

  • Backward compatibility automatically handles existing configurations
  • Comprehensive test coverage (unit + integration) reduces risk of regressions
  • Change follows established pattern already used for schema names
  • Low technical risk - change is straightforward and well-tested

🔗 References

…se quoted identifiers for Snowflake compatibility
@github-actions github-actions bot added ingestion PR or Issue related to the ingestion of metadata docs Issues and Improvements to docs labels Nov 18, 2025
@datahub-cyborg datahub-cyborg bot added the needs-review Label for PRs that need review from a maintainer. label Nov 18, 2025

codecov bot commented Nov 18, 2025

❌ 1 Tests Failed:

| Tests completed | Failed | Passed | Skipped |
| --------------- | ------ | ------ | ------- |
| 5496            | 1      | 5495   | 38      |
View the top 1 failed test(s) by shortest run time
tests.integration.kafka-connect-confluent-cloud.test_kafka_connect_confluent_cloud::test_kafka_connect_confluent_cloud_ingest
Stack Traces | 60.8s run time
docker_compose_runner = <function docker_compose_runner.<locals>.run at 0x7fdd6b41f6a0>
pytestconfig = <_pytest.config.Config object at 0x7fde49cc9590>
test_resources_dir = PosixPath('.../tests/integration/kafka-connect-confluent-cloud')

    @pytest.fixture(scope="module")
    def confluent_cloud_mock_runner(
        docker_compose_runner, pytestconfig, test_resources_dir
    ):
        docker_compose_file = [str(test_resources_dir / "docker-compose.override.yml")]
    
>       with docker_compose_runner(
            docker_compose_file, "kafka-connect-confluent-cloud"
        ) as docker_services:

.../integration/kafka-connect-confluent-cloud/test_kafka_connect_confluent_cloud.py:45: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
.../hostedtoolcache/Python/3.11.14....../x64/lib/python3.11/contextlib.py:137: in __enter__
    return next(self.gen)
           ^^^^^^^^^^^^^^
.../datahub/testing/docker_utils.py:65: in run
    with pytest_docker.plugin.get_docker_services(
.../hostedtoolcache/Python/3.11.14....../x64/lib/python3.11/contextlib.py:137: in __enter__
    return next(self.gen)
           ^^^^^^^^^^^^^^
venv/lib/python3.11........./site-packages/pytest_docker/plugin.py:212: in get_docker_services
    docker_compose.execute(command)
venv/lib/python3.11........./site-packages/pytest_docker/plugin.py:140: in execute
    return execute(command, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

command = 'docker compose --parallel -1 -f ".../tests/integration/kafka-connect-confluent-cloud/docker-compose.override.yml" -p "pytest5034-kafka-connect-confluent-cloud" up --build --wait'
success_codes = (0,), ignore_stderr = False

    def execute(command: str, success_codes: Iterable[int] = (0,), ignore_stderr: bool = False) -> Union[bytes, Any]:
        """Run a shell command."""
        try:
            stderr_pipe = subprocess.DEVNULL if ignore_stderr else subprocess.STDOUT
            output = subprocess.check_output(command, stderr=stderr_pipe, shell=True)
            status = 0
        except subprocess.CalledProcessError as error:
            output = error.output or b""
            status = error.returncode
            command = error.cmd
    
        if status not in success_codes:
>           raise Exception(
                'Command {} returned {}: """{}""".'.format(command, status, output.decode("utf-8"))
            )
E           Exception: Command docker compose --parallel -1 -f ".../tests/integration/kafka-connect-confluent-cloud/docker-compose.override.yml" -p "pytest5034-kafka-connect-confluent-cloud" up --build --wait returned 1: """#1 [internal] load local bake definitions
E           #1 reading from stdin 578B done
E           #1 DONE 0.0s
E           
E           #2 [internal] load build definition from Dockerfile
E           #2 transferring dockerfile: 364B done
E           #2 DONE 0.0s
E           
E           #3 [internal] load metadata for docker.io/library/python:3.11-slim
E           #3 DONE 0.2s
E           
E           #4 [internal] load .dockerignore
E           #4 transferring context: 2B done
E           #4 DONE 0.0s
E           
E           #5 [internal] load build context
E           #5 transferring context: 18.80kB done
E           #5 DONE 0.0s
E           
E           #6 [1/4] FROM docker.io/library/python:3.11-slim@sha256:8ef21a26e7c342e978a68cf2d6b07627885930530064f572f432ea422a8c0907
E           #6 resolve docker.io/library/python:3.11-slim@sha256:8ef21a26e7c342e978a68cf2d6b07627885930530064f572f432ea422a8c0907 done
E           #6 sha256:c4116d4d7ea9320db352f6516001262753529edf1e20b2c6609a6b9a49cc6be4 1.75kB / 1.75kB done
E           #6 sha256:040af88f5bce9e9239b7e739e86304d26964d1d55ada56b9297a3d891e91634d 5.48kB / 5.48kB done
E           #6 sha256:0e4bc2bd6656e6e004e3c749af70e5650bac2258243eb0949dea51cb8b7863db 7.34MB / 29.78MB 0.1s
E           #6 sha256:22b63e76fde1e200371ed9f3cee91161d192063bcff65c9ab6bf63819810a974 1.29MB / 1.29MB 0.1s done
E           #6 sha256:b3dd773c329649f22e467ae63d1c612a039a0559dec99ffb9ada904ab5c60c55 6.29MB / 14.36MB 0.1s
E           #6 sha256:1771569cc1299abc9cc762fc4419523e721b11a3927ef968ae63ba0a4a88f2da 0B / 251B 0.1s
E           #6 sha256:8ef21a26e7c342e978a68cf2d6b07627885930530064f572f432ea422a8c0907 10.37kB / 10.37kB done
E           #6 sha256:0e4bc2bd6656e6e004e3c749af70e5650bac2258243eb0949dea51cb8b7863db 27.26MB / 29.78MB 0.3s
E           #6 sha256:b3dd773c329649f22e467ae63d1c612a039a0559dec99ffb9ada904ab5c60c55 14.36MB / 14.36MB 0.2s done
E           #6 sha256:1771569cc1299abc9cc762fc4419523e721b11a3927ef968ae63ba0a4a88f2da 251B / 251B 0.1s done
E           #6 sha256:0e4bc2bd6656e6e004e3c749af70e5650bac2258243eb0949dea51cb8b7863db 29.78MB / 29.78MB 0.3s done
E           #6 extracting sha256:0e4bc2bd6656e6e004e3c749af70e5650bac2258243eb0949dea51cb8b7863db 0.1s
E           #6 extracting sha256:0e4bc2bd6656e6e004e3c749af70e5650bac2258243eb0949dea51cb8b7863db 0.5s done
E           #6 extracting sha256:22b63e76fde1e200371ed9f3cee91161d192063bcff65c9ab6bf63819810a974 0.1s done
E           #6 extracting sha256:b3dd773c329649f22e467ae63d1c612a039a0559dec99ffb9ada904ab5c60c55
E           #6 extracting sha256:b3dd773c329649f22e467ae63d1c612a039a0559dec99ffb9ada904ab5c60c55 0.4s done
E           #6 extracting sha256:1771569cc1299abc9cc762fc4419523e721b11a3927ef968ae63ba0a4a88f2da
E           #6 extracting sha256:1771569cc1299abc9cc762fc4419523e721b11a3927ef968ae63ba0a4a88f2da done
E           #6 DONE 1.7s
E           
E           #7 [2/4] WORKDIR /app
E           #7 DONE 0.0s
E           
E           #8 [3/4] COPY confluent_cloud_mock_api.py ./
E           #8 DONE 0.0s
E           
E           #9 [4/4] RUN pip install --no-cache-dir flask
E           #9 1.613 Collecting flask
E           #9 1.627   Downloading flask-3.1.2-py3-none-any.whl.metadata (3.2 kB)
E           #9 1.639 Collecting blinker>=1.9.0 (from flask)
E           #9 1.643   Downloading blinker-1.9.0-py3-none-any.whl.metadata (1.6 kB)
E           #9 1.659 Collecting click>=8.1.3 (from flask)
E           #9 1.661   Downloading click-8.3.1-py3-none-any.whl.metadata (2.6 kB)
E           #9 1.670 Collecting itsdangerous>=2.2.0 (from flask)
E           #9 1.673   Downloading itsdangerous-2.2.0-py3-none-any.whl.metadata (1.9 kB)
E           #9 1.686 Collecting jinja2>=3.1.2 (from flask)
E           #9 1.688   Downloading jinja2-3.1.6-py3-none-any.whl.metadata (2.9 kB)
E           #9 1.727 Collecting markupsafe>=2.1.1 (from flask)
E           #9 1.730   Downloading markupsafe-3.0.3-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl.metadata (2.7 kB)
E           #9 1.750 Collecting werkzeug>=3.1.0 (from flask)
E           #9 1.753   Downloading werkzeug-3.1.3-py3-none-any.whl.metadata (3.7 kB)
E           #9 1.764 Downloading flask-3.1.2-py3-none-any.whl (103 kB)
E           #9 1.767    ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 103.3/103.3 kB 53.3 MB/s eta 0:00:00
E           #9 1.770 Downloading blinker-1.9.0-py3-none-any.whl (8.5 kB)
E           #9 1.772 Downloading click-8.3.1-py3-none-any.whl (108 kB)
E           #9 1.773    ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 108.3/108.3 kB 579.6 MB/s eta 0:00:00
E           #9 1.775 Downloading itsdangerous-2.2.0-py3-none-any.whl (16 kB)
E           #9 1.778 Downloading jinja2-3.1.6-py3-none-any.whl (134 kB)
E           #9 1.779    ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 134.9/134.9 kB 607.9 MB/s eta 0:00:00
E           #9 1.781 Downloading markupsafe-3.0.3-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl (22 kB)
E           #9 1.784 Downloading werkzeug-3.1.3-py3-none-any.whl (224 kB)
E           #9 1.785    ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 224.5/224.5 kB 657.7 MB/s eta 0:00:00
E           #9 1.820 Installing collected packages: markupsafe, itsdangerous, click, blinker, werkzeug, jinja2, flask
E           #9 2.066 Successfully installed blinker-1.9.0 click-8.3.1 flask-3.1.2 itsdangerous-2.2.0 jinja2-3.1.6 markupsafe-3.0.3 werkzeug-3.1.3
E           #9 2.067 WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
E           #9 2.114 
E           #9 2.114 [notice] A new release of pip is available: 24.0 -> 25.3
E           #9 2.114 [notice] To update, run: pip install --upgrade pip
E           #9 DONE 2.3s
E           
E           #10 exporting to image
E           #10 exporting layers
E           #10 exporting layers 0.5s done
E           #10 writing image sha256:0cc303943d4c48bd103d1aaff9a5a09541bf80c5ad2b9381fec888cf42f000f8 done
E           #10 naming to docker.io/library/pytest5034-kafka-connect-confluent-cloud-confluent-cloud-mock done
E           #10 DONE 0.5s
E           
E           #11 resolving provenance for metadata file
E           #11 DONE 0.0s
E            confluent-cloud-mock  Built
E            Network pytest5034-kafka-connect-confluent-cloud_default  Creating
E            Network pytest5034-kafka-connect-confluent-cloud_default  Created
E            Container test_confluent_cloud_mock  Creating
E            Container test_confluent_cloud_mock  Created
E            Container test_confluent_cloud_mock  Starting
E            Container test_confluent_cloud_mock  Started
E            Container test_confluent_cloud_mock  Waiting
E           container test_confluent_cloud_mock is unhealthy
E           """.

venv/lib/python3.11........./site-packages/pytest_docker/plugin.py:37: Exception


@datahub-cyborg datahub-cyborg bot added pending-submitter-merge and removed needs-review Label for PRs that need review from a maintainer. labels Nov 18, 2025
…ntifiers in Snowflake

- Added logic to handle unquoted database and schema names by converting them to uppercase quoted identifiers.
- Introduced a validation method for unquoted identifiers to ensure compliance with Snowflake's requirements.
- Updated tests to cover various cases for database and schema names, ensuring correct handling and transpilation for Snowflake, BigQuery, and Databricks.
…sensitivity in Snowflake

- Removed outdated information regarding unquoted identifiers and emphasized the importance of using quoted identifiers for database and schema names.
- Added details on backward compatibility for valid unquoted identifiers and recommended best practices for configuration.
- Clarified action required for users to ensure case sensitivity in their Fivetran source configurations.