Skip to content

Enhance SEA HTTP Client #618

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 453 commits into from
Aug 4, 2025
Merged

Enhance SEA HTTP Client #618

merged 453 commits into from
Aug 4, 2025

Conversation

varun-edachali-dbx
Copy link
Contributor

@varun-edachali-dbx varun-edachali-dbx commented Jun 27, 2025

What type of PR is this?

  • Feature

Description

Enhance the current, simplified SEA Http Client by introducing retries. Most of the auth related functionality is pulled from the Thrift backend. All of the retry-related tests are expected to pass since they represent user-facing functionality (in terms of exceptions to be thrown in particular cases).

We continue using a low-level connection pool instead of using the higher level requests module because the latter does not have first-class support for passing a password to a client certificate, which we want to support. There are workarounds involving subclassing the HttpAdapter and invoking init_poolmanager, but the docstring of the method states that it should not be invoked from client code, so we do not consider it a long term solution.

The exceptions thrown by the SEA and Thrift backends are not completely the same (yet), because this PR passes None to the context and session id of the RequestError object. This misparity will be logged and addressed later. Here, the context is expected to have the following keys as per the existing class definition:

class RequestError(OperationalError):
    """Thrown if there was a error during request to the server.
    Its context will have the following keys:
    "method": The RPC method name that failed
    "session-id": The Thrift session guid
    "query-id": The Thrift query guid (if available)
    "http-code": HTTP response code to RPC request (if available)
    "error-message": Error message from the HTTP headers (if available)
    "original-exception": The Python level original exception
    "no-retry-reason": Why the request wasn't retried (if available)
    "bounded-retry-delay": The maximum amount of time an error will be retried before giving up
    "attempt": current retry number / maximum number of retries
    "elapsed-seconds": time that has elapsed since first attempting the RPC request
    """

    pass

How?

In _make_request(), the client sets up the retry context before making the HTTP request:

  • Determines the operation type (EXECUTE_STATEMENT, GET_OPERATION_STATUS, etc.) based on the API path and HTTP method
  • Sets the command type on the DatabricksRetryPolicy - this influences retry decisions (e.g., EXECUTE_STATEMENT is only retried for 429/503 status codes)
  • Starts the retry timer to track total request duration across all retry attempts
  • Makes the HTTP request through urllib3 with the retry policy attached

The retry policy then handles all retry logic automatically - determining when to retry based on status codes and command types, calculating backoff delays, and enforcing attempt/duration limits. If retries are exhausted, urllib3 raises a MaxRetryError which gets caught and wrapped in a RequestError.

How is this tested?

  • Unit tests
  • E2E Tests
  • Manually
  • N/A

Related Tickets & Documents

Design Doc

varun-edachali-dbx and others added 30 commits June 18, 2025 05:46
Signed-off-by: varun-edachali-dbx <[email protected]>
Signed-off-by: varun-edachali-dbx <[email protected]>
* [squash from exec-sea] bring over execution phase changes

Signed-off-by: varun-edachali-dbx <[email protected]>

* remove excess test

Signed-off-by: varun-edachali-dbx <[email protected]>

* add docstring

Signed-off-by: varun-edachali-dbx <[email protected]>

* remvoe exec func in sea backend

Signed-off-by: varun-edachali-dbx <[email protected]>

* remove excess files

Signed-off-by: varun-edachali-dbx <[email protected]>

* remove excess models

Signed-off-by: varun-edachali-dbx <[email protected]>

* remove excess sea backend tests

Signed-off-by: varun-edachali-dbx <[email protected]>

* cleanup

Signed-off-by: varun-edachali-dbx <[email protected]>

* re-introduce get_schema_desc

Signed-off-by: varun-edachali-dbx <[email protected]>

* remove SeaResultSet

Signed-off-by: varun-edachali-dbx <[email protected]>

* clean imports and attributes

Signed-off-by: varun-edachali-dbx <[email protected]>

* pass CommandId to ExecResp

Signed-off-by: varun-edachali-dbx <[email protected]>

* remove changes in types

Signed-off-by: varun-edachali-dbx <[email protected]>

* add back essential types (ExecResponse, from_sea_state)

Signed-off-by: varun-edachali-dbx <[email protected]>

* fix fetch types

Signed-off-by: varun-edachali-dbx <[email protected]>

* excess imports

Signed-off-by: varun-edachali-dbx <[email protected]>

* reduce diff by maintaining logs

Signed-off-by: varun-edachali-dbx <[email protected]>

* fix int test types

Signed-off-by: varun-edachali-dbx <[email protected]>

* [squashed from exec-sea] init execution func

Signed-off-by: varun-edachali-dbx <[email protected]>

* remove irrelevant changes

Signed-off-by: varun-edachali-dbx <[email protected]>

* remove ResultSetFilter functionality

Signed-off-by: varun-edachali-dbx <[email protected]>

* remove more irrelevant changes

Signed-off-by: varun-edachali-dbx <[email protected]>

* remove more irrelevant changes

Signed-off-by: varun-edachali-dbx <[email protected]>

* even more irrelevant changes

Signed-off-by: varun-edachali-dbx <[email protected]>

* remove sea response as init option

Signed-off-by: varun-edachali-dbx <[email protected]>

* exec test example scripts

Signed-off-by: varun-edachali-dbx <[email protected]>

* formatting (black)

Signed-off-by: varun-edachali-dbx <[email protected]>

* [squashed from sea-exec] merge sea stuffs

Signed-off-by: varun-edachali-dbx <[email protected]>

* remove excess changes

Signed-off-by: varun-edachali-dbx <[email protected]>

* remove excess removed docstring

Signed-off-by: varun-edachali-dbx <[email protected]>

* remove excess changes in backend

Signed-off-by: varun-edachali-dbx <[email protected]>

* remove excess imports

Signed-off-by: varun-edachali-dbx <[email protected]>

* remove accidentally removed _get_schema_desc

Signed-off-by: varun-edachali-dbx <[email protected]>

* remove unnecessary init with sea_response tests

Signed-off-by: varun-edachali-dbx <[email protected]>

* rmeove unnecessary changes

Signed-off-by: varun-edachali-dbx <[email protected]>

* formatting (black)

Signed-off-by: varun-edachali-dbx <[email protected]>

* improved models and filters from cloudfetch-sea branch

Signed-off-by: varun-edachali-dbx <[email protected]>

* filters stuff (align with JDBC)

Signed-off-by: varun-edachali-dbx <[email protected]>

* backend from cloudfetch-sea

Signed-off-by: varun-edachali-dbx <[email protected]>

* remove filtering, metadata ops

Signed-off-by: varun-edachali-dbx <[email protected]>

* raise NotImplementedErrror for metadata ops

Signed-off-by: varun-edachali-dbx <[email protected]>

* change to valid table name

Signed-off-by: varun-edachali-dbx <[email protected]>

* remove un-necessary changes

covered by #588

Signed-off-by: varun-edachali-dbx <[email protected]>

* simplify test module

Signed-off-by: varun-edachali-dbx <[email protected]>

* logging -> debug level

Signed-off-by: varun-edachali-dbx <[email protected]>

* change table name in log

Signed-off-by: varun-edachali-dbx <[email protected]>

* remove un-necessary changes

Signed-off-by: varun-edachali-dbx <[email protected]>

* remove un-necessary backend cahnges

Signed-off-by: varun-edachali-dbx <[email protected]>

* remove un-needed GetChunksResponse

Signed-off-by: varun-edachali-dbx <[email protected]>

* remove un-needed GetChunksResponse

only relevant in Fetch phase

Signed-off-by: varun-edachali-dbx <[email protected]>

* reduce code duplication in response parsing

Signed-off-by: varun-edachali-dbx <[email protected]>

* reduce code duplication

Signed-off-by: varun-edachali-dbx <[email protected]>

* more clear docstrings

Signed-off-by: varun-edachali-dbx <[email protected]>

* introduce strongly typed ChunkInfo

Signed-off-by: varun-edachali-dbx <[email protected]>

* remove is_volume_operation from response

Signed-off-by: varun-edachali-dbx <[email protected]>

* add is_volume_op and more ResultData fields

Signed-off-by: varun-edachali-dbx <[email protected]>

* add test scripts

Signed-off-by: varun-edachali-dbx <[email protected]>

* Revert "Merge branch 'exec-models-sea' into exec-phase-sea"

This reverts commit be1997e, reversing
changes made to 37813ba.

* change logging level

Signed-off-by: varun-edachali-dbx <[email protected]>

* remove un-necessary changes

Signed-off-by: varun-edachali-dbx <[email protected]>

* remove excess changes

Signed-off-by: varun-edachali-dbx <[email protected]>

* remove excess changes

Signed-off-by: varun-edachali-dbx <[email protected]>

* remove _get_schema_bytes (for now)

Signed-off-by: varun-edachali-dbx <[email protected]>

* redundant comments

Signed-off-by: varun-edachali-dbx <[email protected]>

* remove fetch phase methods

Signed-off-by: varun-edachali-dbx <[email protected]>

* reduce code repetititon + introduce gaps after multi line pydocs

Signed-off-by: varun-edachali-dbx <[email protected]>

* remove unused imports

Signed-off-by: varun-edachali-dbx <[email protected]>

* move description extraction to helper func

Signed-off-by: varun-edachali-dbx <[email protected]>

* formatting (black)

Signed-off-by: varun-edachali-dbx <[email protected]>

* add more unit tests

Signed-off-by: varun-edachali-dbx <[email protected]>

* streamline unit tests

Signed-off-by: varun-edachali-dbx <[email protected]>

* test getting the list of allowed configurations

Signed-off-by: varun-edachali-dbx <[email protected]>

* reduce diff

Signed-off-by: varun-edachali-dbx <[email protected]>

* reduce diff

Signed-off-by: varun-edachali-dbx <[email protected]>

* house constants in enums for readability and immutability

Signed-off-by: varun-edachali-dbx <[email protected]>

* add note on hybrid disposition

Signed-off-by: varun-edachali-dbx <[email protected]>

* remove redundant note on arrow_schema_bytes

Signed-off-by: varun-edachali-dbx <[email protected]>

* remove invalid import

Signed-off-by: varun-edachali-dbx <[email protected]>

* add strong typing for manifest in _extract_description

Signed-off-by: varun-edachali-dbx <[email protected]>

* remove un-necessary column skipping

Signed-off-by: varun-edachali-dbx <[email protected]>

* remove parsing in backend

Signed-off-by: varun-edachali-dbx <[email protected]>

* fix: convert sea statement id to CommandId type

Signed-off-by: varun-edachali-dbx <[email protected]>

* make polling interval a separate constant

Signed-off-by: varun-edachali-dbx <[email protected]>

* align state checking with Thrift implementation

Signed-off-by: varun-edachali-dbx <[email protected]>

* update unit tests according to changes

Signed-off-by: varun-edachali-dbx <[email protected]>

* add unit tests for added methods

Signed-off-by: varun-edachali-dbx <[email protected]>

* add spec to description extraction docstring, add strong typing to params

Signed-off-by: varun-edachali-dbx <[email protected]>

* add strong typing for backend parameters arg

Signed-off-by: varun-edachali-dbx <[email protected]>

---------

Signed-off-by: varun-edachali-dbx <[email protected]>
…nd forward refs, remove some unused imports

Signed-off-by: varun-edachali-dbx <[email protected]>
Signed-off-by: varun-edachali-dbx <[email protected]>
This reverts commit dabba55, reversing
changes made to dd7dc6a.

Signed-off-by: varun-edachali-dbx <[email protected]>
Signed-off-by: varun-edachali-dbx <[email protected]>
This reverts commit 3a999c0, reversing
changes made to a1f9b9c.
This reverts commit dabba55, reversing
changes made to dd7dc6a.
Signed-off-by: varun-edachali-dbx <[email protected]>
Signed-off-by: varun-edachali-dbx <[email protected]>
Signed-off-by: varun-edachali-dbx <[email protected]>
Signed-off-by: varun-edachali-dbx <[email protected]>
Signed-off-by: varun-edachali-dbx <[email protected]>
@databricks databricks deleted a comment from github-actions bot Jul 28, 2025
@databricks databricks deleted a comment from github-actions bot Jul 28, 2025
Copy link
Contributor

@jayantsing-db jayantsing-db left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Few comments

@jayantsing-db
Copy link
Contributor

The exceptions thrown by the SEA and Thrift backends are not completely the same (yet), because this PR passes None to the context and session id of the RequestError object. This misparity will be logged and addressed later.

Could you add details on this? Is there a follow-up PR? what is the context and session id?

Signed-off-by: varun-edachali-dbx <[email protected]>
Signed-off-by: varun-edachali-dbx <[email protected]>
Signed-off-by: varun-edachali-dbx <[email protected]>
Copy link
Contributor

@jayantsing-db jayantsing-db left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@databricks databricks deleted a comment from github-actions bot Aug 4, 2025
@varun-edachali-dbx varun-edachali-dbx merged commit 3b0c882 into main Aug 4, 2025
22 of 23 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants