-
Notifications
You must be signed in to change notification settings - Fork 2.9k
Add support for arrow iterable when concatenating or interleaving #7771
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for arrow iterable when concatenating or interleaving #7771
Conversation
Seeing the following numbers on the script shared in the original issue. (MacBook Pro M4)
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool ! Love the speed up :)
Note that __iter__
is used to iterate on python objects though, not pyarrow tables.
Would it be possible to take that into account and implement proper _iter_arrow
methods when necessary instead of using __iter__
, as in your first proposition in the original issue ?
You can probably still minimize the duplication between iter and iter_arrow methods by making __iter__
and _iter_arrow
share some code.
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update. |
@lhoestq I've implemented the iteration on arrow as separate methods, can you take another look/trigger ci? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
awesome ! lgtm :)
@lhoestq Any idea why the integration tests are failing, is this expected? Anything I can do on my side? |
They seem unrelated to your changes. Merging :) |
Fixes a case when concatenating or interleaving datasets with
with_format(...)
call was slower.Details here: #6637
@lhoestq I tried to minimize the duplication between iter and iter_arrow methods, not sure if this is against the design, can separate those if needed.