-
Notifications
You must be signed in to change notification settings - Fork 1k
Fix RowNumberReader when not all row groups are selected
#8863
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…r, not according to the ordinal order
| /// # Ok(()) | ||
| /// # } | ||
| /// ``` | ||
| pub fn with_virtual_columns(self, virtual_columns: Vec<FieldRef>) -> Self { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also made this comment in the original PR, but maybe we could change this to
pub fn set_virtual_columns(&mut self, virtual_columns: Vec<FieldRef>) -> Result<()> {to avoid the panic. If we squeak this in before 57.1.0 ships it wouldn't be an API change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Attaching a link for easier discussion: #8715 (comment)
Not having an ability to chain seems a bit degrading, though users can just use ? operator for that (in which case it just panics, but that's fine, it's a choice).
Having a proper builder would be nice, but it's then definitely an API change, and a bit less quick fix for the main thing that this PR is intended to solve.
I'm fine with changing to Result(). Would you prefer that? Are others on board?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is Result<Self> bad, sorry? Call it try_with_xxx if naming conventions are the problem?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I was upthumbing
Not having an ability to chain seems a bit degrading, though users can just use ? operator for that
Meaning I'd be on board with Result<Self>.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm fine with changing to Result(). Would you prefer that? Are others on board?
I agree that sounds better to me
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| ) | ||
| })?; | ||
|
|
||
| let offset = ordinal_to_offset.get(&ordinal).ok_or_else(|| { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tiny nit, to simplify L68 below?
| let offset = ordinal_to_offset.get(&ordinal).ok_or_else(|| { | |
| let offset = *ordinal_to_offset.get(&ordinal).ok_or_else(|| { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @vustef -- I think this is a nice fix. I also wrote another "end to end" style test (passing in row numbers to the reader options) which I will push directly to this branch
Thank you @scovich for finding this bug before release 🙏
Given the suggestions from @etseidl and @scovich about avoiding panics, which seems reasonable, I will try and make a follow on PR to do that, as I think this one stands alone as is (no reason to also include the API change there too)
Update: PR is
RowNumberReaderRowNumberReader when not all row groups are selected
Which issue does this PR close?
Closes #8864.
Rationale for this change
#8715 introduced row numbers feature last week. However, it had a bug, which luckily for us @scovich pointed out soon after the merge. The issue is that the row numbers are produced in ordinal-based order of the row groups, instead of user-requested order of row groups. The former is wrong, and is being fixed here by switching to user-requested order.
What changes are included in this PR?
Just fixing the bug as explained above, and adding test. Also addressing two small comments from post-merge review: #8715 (review)
Are these changes tested?
Yes.
Are there any user-facing changes?
No, this wasn't released yet.