[SPARK-22092] Reallocation in OffHeapColumnVector.reserveInternal corrupts struct and array data #19308
protected void reserveInternal(int newCapacity) {
  int oldCapacity = (this.data == 0L) ? 0 : capacity;
  if (this.resultArray != null) {
    oldCapacity = (this.lengthData == 0L) ? 0 : capacity;
Structs have a similar problem: only `nulls` is used, and `data == 0`. Should we also fix these here?
A related question: maybe we should use `nulls` instead of `data` or `lengthData` to detect whether we are resizing the column or creating a new one.
Fair point. I'll fix it.
Test build #82038 has finished for PR 19308 at commit
@hvanhovell How about this?
hvanhovell left a comment
LGTM - pending jenkins
Test build #82077 has finished for PR 19308 at commit
LGTM - merging to master. Thanks!
@ala Can you backport this one to 2.2?
…rupts struct and array data

`OffHeapColumnVector.reserveInternal()` will only copy already inserted values during reallocation if `data != null`. In vectors containing arrays or structs this is incorrect, since there field `data` is not used at all. We need to check `nulls` instead.

Adds new tests to `ColumnVectorSuite` that reproduce the errors.

Author: Ala Luszczak <[email protected]>

Closes apache#19308 from ala/vector-realloc.

(cherry picked from commit d2b2932)
Signed-off-by: Ala Luszczak <[email protected]>
What changes were proposed in this pull request?
`OffHeapColumnVector.reserveInternal()` will only copy already inserted values during reallocation if `data != null`. In vectors containing arrays or structs this is incorrect, since the `data` field is not used there at all. We need to check `nulls` instead.
How was this patch tested?
Adds new tests to `ColumnVectorSuite` that reproduce the errors.
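To make the failure mode in the description concrete, here is a minimal sketch in plain Java. It is not Spark's code: `ReallocSketch`, its fields, and the `checkNulls` switch are hypothetical names invented for illustration, and heap arrays stand in for the off-heap buffers. It assumes, as the description states, that struct and array vectors never allocate `data`, so a resize check based on `data` always concludes that there is nothing to copy.

```java
/** Toy model of the resize check; hypothetical, not Spark's OffHeapColumnVector. */
final class ReallocSketch {
    byte[] nulls;            // null flags, allocated for every kind of vector
    int[] data;              // values; stays unallocated for struct-like vectors (assumption)
    int capacity;
    final boolean usesData;  // false models a struct or array vector
    final boolean checkNulls; // true models the fixed check, false the original one

    ReallocSketch(int initialCapacity, boolean usesData, boolean checkNulls) {
        this.usesData = usesData;
        this.checkNulls = checkNulls;
        reserve(initialCapacity);
    }

    void reserve(int newCapacity) {
        // Decide whether this is a resize (copy old contents) or a first allocation.
        boolean resizing = checkNulls ? (nulls != null) : (data != null);
        int oldCapacity = resizing ? capacity : 0;

        // Only the first oldCapacity entries are carried over; the rest start zeroed.
        byte[] newNulls = new byte[newCapacity];
        if (oldCapacity > 0) {
            System.arraycopy(nulls, 0, newNulls, 0, oldCapacity);
        }
        nulls = newNulls;

        if (usesData) {
            int[] newData = new int[newCapacity];
            if (oldCapacity > 0) {
                System.arraycopy(data, 0, newData, 0, oldCapacity);
            }
            data = newData;
        }
        capacity = newCapacity;
    }

    /** Marks row 1 as null, grows the vector, and reports whether the flag survived. */
    static boolean nullFlagSurvivesGrowth(boolean usesData, boolean checkNulls) {
        ReallocSketch v = new ReallocSketch(4, usesData, checkNulls);
        v.nulls[1] = 1;
        v.reserve(8);
        return v.nulls[1] == 1;
    }

    public static void main(String[] args) {
        System.out.println("struct-like, check data:  " + nullFlagSurvivesGrowth(false, false));
        System.out.println("struct-like, check nulls: " + nullFlagSurvivesGrowth(false, true));
        System.out.println("primitive,   check data:  " + nullFlagSurvivesGrowth(true, false));
    }
}
```

In this toy model the struct-like vector loses its null flag when the check looks at `data`, and keeps it when the check looks at `nulls`. A vector that does use `data` is unaffected either way, which is consistent with the bug surfacing only for structs and arrays.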