-
Notifications
You must be signed in to change notification settings - Fork 1.1k
Fix errors when reading nested Lists with pushdown predicates. #8866
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
c5f8afe to
0956202
Compare
| } | ||
|
|
||
| #[tokio::test] | ||
| async fn test_nested_lists() -> Result<()> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is directly adapted from the reproducer provided by @lewiszlw on
| } else { | ||
| Some(ProjectionMask::leaves(schema, included_leaves)) | ||
| } | ||
| mask.without_nested_types(self.metadata.file_metadata().schema_descr()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I moved the logic into ProjectionMask
| if self.leaf_included(leaf_idx) { | ||
| let root = schema.get_column_root(leaf_idx); | ||
| let root_idx = schema.get_column_root_idx(leaf_idx); | ||
| if root_leaf_counts[root_idx] == 1 && !root.is_list() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The only code difference here is to add the check for !root.is_list()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
makes sense to me!
| #[test] | ||
| fn test_mask_from_column_names() { | ||
| let message_type = " | ||
| let schema = parse_schema( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just reduced some duplication
| } | ||
|
|
||
| #[test] | ||
| fn test_projection_mask_without_nested_list() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this test fails without the code change
|
FYI @XiangpengHao |
etseidl
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wow, nice sleuthing @alamb!
|
|
||
| writer.close().await?; | ||
|
|
||
| println!("Parquet file written successfully!"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes good call! That was (also) left over from the test 🤦
|
I plan to merge this tomorrow in case @lewiszlw would also like a chance to review |
|
Thank you for the fix! I added this patch to parquet 56 and verified in my project, everything works fine. |
|
Thank you @etseidl @XiangpengHao and @lewiszlw -- sorry this one took so long |
Note this PR contains a 1 line fix, and the rest is tests , comments, and reorganization to support the tests
Which issue does this PR close?
error: item_reader def levels are Nonewhen reading nested field with row filter #8657Rationale for this change
#8657 is a regression
The check for "is this column nested" did not work correctly for Lists in Parquet, due to the somewhat wacky way Lists are encoded
What changes are included in this PR?
ProjectionMask::without_nested_types, mostly so I could write better tests for itAre these changes tested?
Yes, both the reproducer from #8657 and a bunch of tests are added
Are there any user-facing changes?
There is a new API