streaming ingestion support for PUT operation #643


Open. sreekanth-db wants to merge 4 commits into base: main.

Conversation

@sreekanth-db commented Jul 21, 2025

What type of PR is this?

  • Refactor
  • Feature
  • Bug Fix
  • Other

Description

Currently, if users want to ingest an in-memory object or input stream (which they might have generated by some processing) using the PUT command, they first need to save it to a file on disk and then pass the file path to the PUT command. This involves unnecessary disk I/O for writing and then reading back from disk. This PR provides an interface to pass the in-memory object directly to the command, removing the disk I/O.

Related issue: #435

Example usage: https://github.com/sreekanth-db/databricks-sql-python/blob/4cd2cee3d793760ebab964191136bbd19b33d399/examples/streaming_put.py (used for manual testing as well)
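For illustration, a minimal sketch of the intended usage, modeled on that example; the `__input_stream__` placeholder, connection parameters, and volume path are taken from or modeled on the linked script and may differ from the merged API:

import io
import os

from databricks import sql

# Modeled on examples/streaming_put.py from this PR; connection
# parameters and the volume path are illustrative.
with sql.connect(
    server_hostname=os.getenv("DATABRICKS_SERVER_HOSTNAME"),
    http_path=os.getenv("DATABRICKS_HTTP_PATH"),
    access_token=os.getenv("DATABRICKS_TOKEN"),
) as connection:
    with connection.cursor() as cursor:
        stream = io.BytesIO(b"col1,col2\n1,2\n")
        cursor.execute(
            "PUT '__input_stream__' INTO '/Volumes/main/default/my_volume/data.csv' OVERWRITE",
            input_stream=stream,
        )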

Unit tests and e2e tests are also included in the changed files

Successful integration-test run: https://github.com/databricks/databricks-sql-python/actions/runs/16444053628

How is this tested?

  • Unit tests
  • E2E Tests
  • Manually
  • N/A

Related Tickets & Documents

import io
import os
from databricks import sql

Contributor:

nit: We don't define an executable file in the examples; as in the other examples, just the connect-and-execute part is enough. The main function, etc. are not needed.

"""

# Prepare headers
http_headers = dict(headers) if headers else {}
Contributor:

This check is redundant; we already check and pass a dict. Alternatively, you can define it in the function signature as headers: dict = {}, because the expectation is to either not pass headers or pass a dict, never None.
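As a sketch: the suggestion as written (headers: dict = {}) would share one dict instance across calls, so a common variant keeps the Optional default and normalizes inside (the function name here is hypothetical):

from typing import Optional

# Hypothetical wrapper name, for illustration only. Defaulting to None
# and normalizing inside avoids Python's shared-mutable-default pitfall
# while preserving the "pass a dict or nothing, never None" contract.
def _upload_to_presigned_url(presigned_url: str, stream, headers: Optional[dict] = None) -> None:
    http_headers = headers or {}
    ...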

try:
    # Stream directly to presigned URL
    response = requests.put(
        url=presigned_url,
Contributor:

Please try to integrate the pre-existing HTTP client:

class DatabricksHttpClient:
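For illustration only, a sketch of that integration, assuming DatabricksHttpClient exposes a singleton accessor and a context-manager execute(method, url, **kwargs), and that HttpMethod defines PUT; verify the exact names against the driver's common HTTP module:

# Sketch under the stated assumptions; not the PR's actual code.
from databricks.sql.common.http import DatabricksHttpClient, HttpMethod

http_client = DatabricksHttpClient.get_instance()
with http_client.execute(
    HttpMethod.PUT, presigned_url, data=input_stream, headers=http_headers
) as response:
    # Reuse the shared response-code handling here.
    ...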

Comment on lines 748 to 759
# Check response codes
OK = requests.codes.ok  # 200
CREATED = requests.codes.created  # 201
ACCEPTED = requests.codes.accepted  # 202
NO_CONTENT = requests.codes.no_content  # 204

if response.status_code not in [OK, CREATED, NO_CONTENT, ACCEPTED]:
    raise OperationalError(
        f"Staging operation over HTTP was unsuccessful: {response.status_code}-{response.text}",
        session_id_hex=self.connection.get_session_id_hex(),
    )

Contributor:

This part is repeated in both the streaming and the existing flow; can we separate it out into a volume error-handling util?
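For example, a hypothetical helper along these lines (name and placement are illustrative, not the PR's code):

import requests
from databricks.sql.exc import OperationalError

# Status codes treated as success by staging operations.
STAGING_SUCCESS_CODES = (
    requests.codes.ok,          # 200
    requests.codes.created,     # 201
    requests.codes.accepted,    # 202
    requests.codes.no_content,  # 204
)

def check_staging_response(response: requests.Response, session_id_hex: str) -> None:
    """Raise OperationalError if a staging HTTP operation did not succeed."""
    if response.status_code not in STAGING_SUCCESS_CODES:
        raise OperationalError(
            f"Staging operation over HTTP was unsuccessful: "
            f"{response.status_code}-{response.text}",
            session_id_hex=session_id_hex,
        )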

Comment on lines 898 to 905
# Store stream data if provided
self._input_stream_data = None
if input_stream is not None:
    # Validate stream has required methods
    if not hasattr(input_stream, "read"):
        raise TypeError(
            "input_stream must be a binary stream with read() method"
        )
Contributor:

I don't feel we need to be guarding against this; any runtime error, such as the user providing incorrect data, should fail naturally.

"""Unit tests for streaming PUT functionality."""

def setUp(self):
    """Set up test fixtures."""
Contributor:

Use pytest fixtures
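For reference, a minimal pytest-style equivalent of the setUp above; the fixture bodies are illustrative, assuming the tests only need mocked collaborators around the cursor under test:

import io
from unittest.mock import Mock

import pytest

@pytest.fixture
def cursor():
    # Illustrative: build the cursor under test with mocked collaborators.
    from databricks.sql.client import Cursor
    return Cursor(Mock(), Mock())

@pytest.fixture
def sample_stream():
    return io.BytesIO(b"test data")

def test_streaming_put_cleanup(cursor, sample_stream):
    # Fixtures are injected by name; no setUp/tearDown needed.
    ...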

mock_handler.assert_called_once()

# Verify the finally block cleanup
self.assertIsNone(self.cursor._input_stream_data)
Contributor:

Same thoughts as for the try/catch cleanup in execute.

Comment on lines 644 to 648
if not self._input_stream_data:
    raise ProgrammingError(
        "No input stream provided for streaming operation",
        session_id_hex=self.connection.get_session_id_hex(),
    )
Contributor:

If needed, this check should live inside the handle_staging_put_stream function, as this error handling is that function's concern.


if response.status_code == ACCEPTED:
    logger.debug(
        f"Response code {ACCEPTED} from server indicates upload was accepted "
Contributor:

Can you use lazy logging in all the logger calls? Please refer to the other logging examples.
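For reference, lazy %-style logging defers string interpolation until the record is actually emitted:

# Eager: the f-string is built even when DEBUG is disabled.
logger.debug(f"Response code {ACCEPTED} from server indicates upload was accepted")

# Lazy: interpolation happens only if the record is emitted.
logger.debug("Response code %s from server indicates upload was accepted", ACCEPTED)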

@@ -783,6 +856,7 @@ def execute(
    self,
    operation: str,
    parameters: Optional[TParameterCollection] = None,
    input_stream: Optional[BinaryIO] = None,
Contributor:

A newly introduced argument should go last (after enforce_embedded_schema); otherwise this will break existing users' positional calls to the function.
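A sketch of the suggested ordering, with the new parameter appended after the existing ones; making it keyword-only is an extra safeguard beyond the review comment, and the surrounding parameter names follow the driver's current signature:

def execute(
    self,
    operation: str,
    parameters: Optional[TParameterCollection] = None,
    enforce_embedded_schema_correctness: bool = False,
    *,
    input_stream: Optional[BinaryIO] = None,
) -> "Cursor":
    ...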
