-
Notifications
You must be signed in to change notification settings - Fork 120
Enhance Arrow to Pandas conversion with type overrides and additional kwargs #579
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
… kwargs * Introduced _arrow_pandas_type_override and _arrow_to_pandas_kwargs in Connection class for customizable dtype mapping and DataFrame construction parameters. * Updated ResultSet to utilize these new options during conversion from Arrow tables to Pandas DataFrames. * Added unit tests to validate the new functionality, including scenarios for type overrides and additional kwargs handling.
Thanks for your contribution! To satisfy the DCO policy in our contributing guide every commit message must include a sign-off message. One or more of your commits is missing this message. You can reword previous commit messages with an interactive rebase ( |
Thanks for your contribution! To satisfy the DCO policy in our contributing guide every commit message must include a sign-off message. One or more of your commits is missing this message. You can reword previous commit messages with an interactive rebase ( |
Thanks for your contribution! To satisfy the DCO policy in our contributing guide every commit message must include a sign-off message. One or more of your commits is missing this message. You can reword previous commit messages with an interactive rebase ( |
Thanks for your contribution! To satisfy the DCO policy in our contributing guide every commit message must include a sign-off message. One or more of your commits is missing this message. You can reword previous commit messages with an interactive rebase ( |
Thanks for your contribution! To satisfy the DCO policy in our contributing guide every commit message must include a sign-off message. One or more of your commits is missing this message. You can reword previous commit messages with an interactive rebase ( |
@madhav-db Very interested in seeing this pass. 🤞 |
@@ -1361,13 +1370,35 @@ def _convert_arrow_table(self, table): | |||
pyarrow.string(): pandas.StringDtype(), | |||
} | |||
|
|||
arrow_pandas_type_override = self.connection._arrow_pandas_type_override | |||
if not isinstance(arrow_pandas_type_override, dict): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This if block is not needed, let it fail here itself. Don't want the user to give something incorrect and then everything works. This is a new change and nothing to be backward compatible.
just leave it at this - arrow_pandas_type_override = self.connection._arrow_pandas_type_override
} | ||
|
||
arrow_to_pandas_kwargs = self.connection._arrow_to_pandas_kwargs | ||
if isinstance(arrow_to_pandas_kwargs, dict): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same let it fail when the input format is incorrect. The default python interpreter error of type mismatch is enough
self.assertIsInstance(result_default[0].col_ts, datetime.datetime) | ||
|
||
|
||
if __name__ == "__main__": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is not needed, as we run tests using pytest. Also can you move everything to pytest and remove unittest
|
||
@pytest.mark.skipif(pa is None, reason="PyArrow is not installed") | ||
class ArrowConversionTests(unittest.TestCase): | ||
@staticmethod |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Move this to use fixtures
conn._arrow_to_pandas_kwargs = {} | ||
return conn | ||
|
||
@staticmethod |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use fixtures or just normal functions, don't need static methods
self.assertEqual(result[0].col_int, 1.0) | ||
self.assertEqual(result[0].col_str, "a") | ||
|
||
def test_convert_arrow_table_to_pandas_kwargs(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Too much code duplication. Can you create a parameterized test, where in this mock_connection._arrow_to_pandas_kwargs = {"timestamp_as_object": False}
and the assertIsInstance values are parameterized. Otherwise the checking part looks the same and is copied repeatedly
er.is_staging_operation = False | ||
return er | ||
|
||
def test_convert_arrow_table_default(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The test_convert_arrow_table_deafult, test_convert_arrow_table_disable_pandas and test_convert_arrow_table_type_override are essentially the same test flow just with different arguments. Plz use pytest's parameterized tests for such tests where only arguments change
Plz merge the master once, as the last commit is pretty old |
What type of PR is this?
Description
How is this tested?
Related Tickets & Documents
#578