@carlopi (Collaborator) commented Jul 3, 2025

This integrates #58 and #76, since the two interact in ways that needed some care.

With duckdb/duckdb#18107 now landed in duckdb/duckdb and the duckdb submodule moved to a recent commit on v1.3-ossivalis, this PR makes it possible to switch the HTTP client implementation at runtime through the newly added httpfs config option httpfs_client_implementation:

D SET logging_storage=stdout;
D PRAGMA enable_logging('HTTP');
D SET httpfs_client_implementation='default';
D select count(*)
  FROM 'https://github.com/duckdb/duckdb-data/releases/download/v1.0/event_baserunning_advance_attempt.parquet';
[LOG] 2025-07-03 10:06:18.479, HTTP, DEBUG, {'request': {'type': HEAD, 'url': 'https://github.com/duckdb/duckdb-data/releases/download/v1.0/event_baserunning_advance_attempt.parquet', 'headers': {}}, 'response': {'status': OK_200, 'reason': OK, 'headers': {X-Timer='S1751537178.169255,VS0,VE1', x-ms-request-id=91401b99-a01e-000c-7caa-dd765d000000, X-Served-By='cache-iad-kcgs7200132-IAD, cache-rtm-ehrd2290029-RTM', Fastly-Restarts=1, x-ms-lease-status=unlocked, x-ms-creation-time='Tue, 17 Jan 2023 21:28:40 GMT', x-ms-blob-type=BlockBlob, x-ms-blob-content-md5='0OgVgULsCWfa1mU4BlFEbg==', X-Cache-Hits='3730, 1', Via='1.1 varnish, 1.1 varnish', Server=Windows-Azure-Blob/1.0 Microsoft-HTTPAPI/2.0, x-ms-version=2025-05-05, X-Cache='HIT, HIT', ETag='"0x8DAF8D1CD43CA79"', Connection=keep-alive, x-ms-lease-state=available, Last-Modified='Tue, 17 Jan 2023 21:28:40 GMT', Date='Thu, 03 Jul 2025 10:06:18 GMT', Content-Length=21916382, Content-Type=application/octet-stream, Content-Disposition='attachment; filename=event_baserunning_advance_attempt.parquet', x-ms-server-encrypted=true, Age=806, Accept-Ranges=bytes}}}, CONNECTION, 2, 11, NULL
┌────────────────┐
│  count_star()  │
│     int64      │
├────────────────┤
│    9388600     │
│ (9.39 million) │
└────────────────┘
D SET httpfs_client_implementation='curl';
D select count(*)
  FROM 'https://github.com/duckdb/duckdb-data/releases/download/v1.0/event_baserunning_advance_attempt.parquet';
[LOG] 2025-07-03 10:06:30.247, HTTP, DEBUG, {'request': {'type': HEAD, 'url': 'https://github.com/duckdb/duckdb-data/releases/download/v1.0/event_baserunning_advance_attempt.parquet', 'headers': {}}, 'response': {'status': OK_200, 'reason': '', 'headers': {content-type=application/octet-stream, x-ms-lease-state=available, last-modified='Tue, 17 Jan 2023 21:28:40 GMT', x-ms-request-id=91401b99-a01e-000c-7caa-dd765d000000, accept-ranges=bytes, x-ms-version=2025-05-05, server=Windows-Azure-Blob/1.0 Microsoft-HTTPAPI/2.0, x-ms-creation-time='Tue, 17 Jan 2023 21:28:40 GMT', x-ms-blob-content-md5='0OgVgULsCWfa1mU4BlFEbg==', x-cache='HIT, HIT', __RESPONSE_STATUS__='HTTP/2 200 ', etag='"0x8DAF8D1CD43CA79"', x-ms-blob-type=BlockBlob, x-ms-server-encrypted=true, age=818, x-ms-lease-status=unlocked, x-served-by='cache-iad-kcgs7200132-IAD, cache-rtm-ehrd2290021-RTM', fastly-restarts=1, via='1.1 varnish, 1.1 varnish', date='Thu, 03 Jul 2025 10:06:30 GMT', x-cache-hits='3730, 1', content-disposition='attachment; filename=event_baserunning_advance_attempt.parquet', x-timer='S1751537190.940711,VS0,VE1', content-length=21916382}}}, CONNECTION, 2, 13, NULL
┌────────────────┐
│  count_star()  │
│     int64      │
├────────────────┤
│    9388600     │
│ (9.39 million) │
└────────────────┘
D SET httpfs_client_implementation='httplib';
D select count(*)
  FROM 'https://github.com/duckdb/duckdb-data/releases/download/v1.0/event_baserunning_advance_attempt.parquet';
[LOG] 2025-07-03 10:07:45.552, HTTP, DEBUG, {'request': {'type': HEAD, 'url': 'https://github.com/duckdb/duckdb-data/releases/download/v1.0/event_baserunning_advance_attempt.parquet', 'headers': {}}, 'response': {'status': OK_200, 'reason': OK, 'headers': {X-Timer='S1751537265.144944,VS0,VE0', x-ms-request-id=91401b99-a01e-000c-7caa-dd765d000000, X-Served-By='cache-iad-kcgs7200132-IAD, cache-rtm-ehrd2290047-RTM', Fastly-Restarts=1, x-ms-lease-status=unlocked, x-ms-creation-time='Tue, 17 Jan 2023 21:28:40 GMT', x-ms-blob-type=BlockBlob, x-ms-blob-content-md5='0OgVgULsCWfa1mU4BlFEbg==', X-Cache-Hits='3730, 0', Via='1.1 varnish, 1.1 varnish', Server=Windows-Azure-Blob/1.0 Microsoft-HTTPAPI/2.0, x-ms-version=2025-05-05, X-Cache='HIT, HIT', ETag='"0x8DAF8D1CD43CA79"', Connection=keep-alive, x-ms-lease-state=available, Last-Modified='Tue, 17 Jan 2023 21:28:40 GMT', Date='Thu, 03 Jul 2025 10:07:45 GMT', Content-Length=21916382, Content-Type=application/octet-stream, Content-Disposition='attachment; filename=event_baserunning_advance_attempt.parquet', x-ms-server-encrypted=true, Age=893, Accept-Ranges=bytes}}}, CONNECTION, 2, 15, NULL
┌────────────────┐
│  count_star()  │
│     int64      │
├────────────────┤
│    9388600     │
│ (9.39 million) │
└────────────────┘
D SET httpfs_client_implementation='something_else';
Invalid Input Error:
Unsupported option for httpfs_client_implementation, only `curl`, `httplib` and `default` are currently supported

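The response headers confirm that a different implementation serves each request. For example, with curl the header names are all lowercase (etag, content-length), the reason field is empty, and an extra __RESPONSE_STATUS__='HTTP/2 200 ' entry appears, while the default and httplib runs keep the server's casing (ETag, Content-Length) and report the reason as OK.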
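To check which client a session is currently using, or to go back to the default, the option can presumably be read and reset like any other DuckDB setting. This is a minimal sketch, assuming httpfs_client_implementation is exposed to current_setting() and RESET in the same way as built-in options; the PR itself does not exercise either.

```sql
-- Assumption: httpfs is installed; loading it registers httpfs_client_implementation.
LOAD httpfs;

-- Pick an implementation explicitly for this session.
SET httpfs_client_implementation='curl';

-- Read back the value currently in effect.
SELECT current_setting('httpfs_client_implementation');

-- Return the option to the extension's default, then read it again.
RESET httpfs_client_implementation;
SELECT current_setting('httpfs_client_implementation');
```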

Please see the original PR from @Tmonster, #58, for all the relevant details; this PR only adds a setting and resolves conflicts with ongoing work.
The best path is probably to cherry-pick the commit back into the original PR, but that can be discussed separately.

@carlopi (Collaborator, Author) commented Jul 3, 2025

As a note, I gave this some light testing and found a difference in behaviour when running:

FORCE INSTALL non_existing_extension;

which returns an empty std::exception instead of the relevant error message.

Also, switching the setting is currently largely untested.
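One hedged way to narrow down which client produces the empty exception, while also exercising the switch within a single session, is to repeat the failing statement under each value of the setting. This assumes that, once httpfs is loaded, extension downloads go through the client selected by httpfs_client_implementation, which the comment above suggests but does not state.

```sql
-- Assumption: with httpfs loaded, FORCE INSTALL downloads through the selected client.
LOAD httpfs;

-- Each install is expected to fail; the point is to compare the error messages.
SET httpfs_client_implementation='default';
FORCE INSTALL non_existing_extension;

SET httpfs_client_implementation='httplib';
FORCE INSTALL non_existing_extension;

SET httpfs_client_implementation='curl';
FORCE INSTALL non_existing_extension;
```

If only one of the three runs reports an empty message, that points at the corresponding client's error propagation.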

@Tmonster (Contributor)

Superseded by #96; this can be closed.

@carlopi closed this Aug 20, 2025
2 participants