Contributor

@madhav-db madhav-db commented May 30, 2025

  • Introduced _arrow_pandas_type_override and _arrow_to_pandas_kwargs in Connection class for customizable dtype mapping and DataFrame construction parameters.
  • Updated ResultSet to utilize these new options during conversion from Arrow tables to Pandas DataFrames.
  • Added unit tests to validate the new functionality, including scenarios for type overrides and additional kwargs handling.

What type of PR is this?

  • Refactor
  • Feature
  • Bug Fix
  • Other

Description

How is this tested?

  • Unit tests
  • E2E Tests
  • Manually
  • N/A

Related Tickets & Documents

#578

Thanks for your contribution! To satisfy the DCO policy in our contributing guide every commit message must include a sign-off message. One or more of your commits is missing this message. You can reword previous commit messages with an interactive rebase (git rebase -i main).

@madhav-db madhav-db changed the title from "Enhance Arrow to Pandas conversion with type overrides and additional kwargs" to "[#578] Enhance Arrow to Pandas conversion with type overrides and additional kwargs" May 30, 2025

@j-bennet

@madhav-db Very interested in seeing this pass. 🤞

@vikrantpuppala vikrantpuppala removed their request for review July 8, 2025 03:36
@@ -1361,13 +1370,35 @@ def _convert_arrow_table(self, table):
            pyarrow.string(): pandas.StringDtype(),
        }

        arrow_pandas_type_override = self.connection._arrow_pandas_type_override
        if not isinstance(arrow_pandas_type_override, dict):

This if block is not needed; let it fail right here. We don't want the user to pass something incorrect and have everything appear to work. This is a new change, so there is nothing to keep backward compatible.
Just leave it as: arrow_pandas_type_override = self.connection._arrow_pandas_type_override

        }

        arrow_to_pandas_kwargs = self.connection._arrow_to_pandas_kwargs
        if isinstance(arrow_to_pandas_kwargs, dict):

Same here: let it fail when the input format is incorrect. The default type-mismatch error from the Python interpreter is enough.
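
The fail-fast behaviour requested in these two comments needs no explicit isinstance guard: merging a non-mapping with ** already raises TypeError in plain Python, and the same holds for func(**kwargs). A minimal sketch, where merge_overrides, the DEFAULTS dictionary, and its string keys are hypothetical stand-ins for the connector's real mapping:

```python
# Stand-in defaults; the real mapping is keyed by pyarrow types.
DEFAULTS = {"int64": "Int64", "string": "string"}


def merge_overrides(override):
    """Hypothetical helper: merge user overrides into the defaults, unguarded."""
    # No isinstance check: a non-mapping override raises TypeError right here.
    return {**DEFAULTS, **override}


merged = merge_overrides({"int64": "Float64"})

try:
    merge_overrides(["int64", "Float64"])  # a list is not a mapping
    failed_fast = False
except TypeError:
    failed_fast = True
```

The error surfaces at the point where the bad value is first used, so an incorrect option cannot be silently replaced by the defaults.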

        self.assertIsInstance(result_default[0].col_ts, datetime.datetime)


if __name__ == "__main__":

This is not needed, as we run tests using pytest. Also, can you move everything to pytest and remove unittest?


@pytest.mark.skipif(pa is None, reason="PyArrow is not installed")
class ArrowConversionTests(unittest.TestCase):
    @staticmethod

Move this to use fixtures

        conn._arrow_to_pandas_kwargs = {}
        return conn

    @staticmethod

Use fixtures or plain functions; static methods are not needed.
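
One way to follow this suggestion is to build the mock connection in a plain function and expose it through a pytest fixture. This is a sketch only: make_mock_connection, the fixture name, and the test body are hypothetical, while the two attribute names are the ones introduced by this PR. unittest.mock is used here solely for the Mock object, not for the test framework.

```python
from unittest.mock import Mock

import pytest


def make_mock_connection():
    """Plain function (no staticmethod) returning a mock connection."""
    conn = Mock()
    conn._arrow_pandas_type_override = {}
    conn._arrow_to_pandas_kwargs = {}
    return conn


@pytest.fixture
def mock_connection():
    # pytest injects this into any test that names it as an argument.
    return make_mock_connection()


def test_defaults_are_empty(mock_connection):
    assert mock_connection._arrow_pandas_type_override == {}
    assert mock_connection._arrow_to_pandas_kwargs == {}
```

Each test that lists mock_connection as a parameter gets a fresh mock, so one test's overrides cannot leak into another.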

        self.assertEqual(result[0].col_int, 1.0)
        self.assertEqual(result[0].col_str, "a")

    def test_convert_arrow_table_to_pandas_kwargs(self):

Too much code duplication. Can you create a parameterized test in which mock_connection._arrow_to_pandas_kwargs (for example {"timestamp_as_object": False}) and the expected assertIsInstance types are the parameters? As it stands, the checking part is the same everywhere and is copied repeatedly.

        er.is_staging_operation = False
        return er

    def test_convert_arrow_table_default(self):

test_convert_arrow_table_default, test_convert_arrow_table_disable_pandas and test_convert_arrow_table_type_override are essentially the same test flow with different arguments. Please use pytest's parameterized tests for cases like this, where only the arguments change.
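
A parametrized version of such tests might look like the sketch below. The convert function is a deliberately simplified stand-in for the real Arrow-to-pandas conversion (it only honours a timestamp_as_object flag and returns either the datetime or its ISO string), and the test name and cases are hypothetical; pytest.mark.parametrize is the mechanism being shown.

```python
import datetime

import pytest


def convert(value, kwargs):
    """Simplified stand-in for the conversion under test (an assumption)."""
    # Mimic the idea of a flag that switches the returned Python type.
    if kwargs.get("timestamp_as_object", True):
        return value
    return value.isoformat()


@pytest.mark.parametrize(
    "kwargs, expected_type",
    [
        ({}, datetime.datetime),  # default behaviour
        ({"timestamp_as_object": True}, datetime.datetime),  # explicit default
        ({"timestamp_as_object": False}, str),  # overridden
    ],
)
def test_convert_respects_kwargs(kwargs, expected_type):
    ts = datetime.datetime(2025, 5, 30, 12, 0, 0)
    assert isinstance(convert(ts, kwargs), expected_type)
```

The shared setup and assertion live in one test body, and each new scenario becomes one more tuple in the parameter list.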

@jprakash-db
Contributor

Please merge master once, as the last commit is pretty old.

3 participants