[SPARK-20474] Fixing OnHeapColumnVector reallocation #17773
ok to test
Test build #76184 has finished for PR 17773 at commit
  if (this.arrayLengths != null) {
-   System.arraycopy(this.arrayLengths, 0, newLengths, 0, elementsAppended);
-   System.arraycopy(this.arrayOffsets, 0, newOffsets, 0, elementsAppended);
+   System.arraycopy(this.arrayLengths, 0, newLengths, 0, capacity);
Nice catch. Do we also need to fix reserveInternal in OffHeapColumnVector? Additionally, after this change, do we even need elementsAppended anymore?
elementsAppended is still necessary: it keeps track of the tail position used by append<TYPE>().
add to whitelist
Test build #76191 has finished for PR 17773 at commit
Test build #76192 has finished for PR 17773 at commit
Merging in master/branch-2.2.
## What changes were proposed in this pull request?

OnHeapColumnVector reallocation copies data to the new storage only up to 'elementsAppended'. This variable is updated only by the ColumnVector.appendX API, while ColumnVector.putX is more commonly used, so values written with putX can be lost when the vector grows.

## How was this patch tested?

Tested using existing unit tests.

Author: Michal Szafranski <[email protected]>

Closes #17773 from michal-databricks/spark-20474.

(cherry picked from commit a277ae8)
Signed-off-by: Reynold Xin <[email protected]>
Do we need similar changes for
Actually yes, I missed it because
Yes, I think it should see
## What changes were proposed in this pull request?

As #17773 revealed, `OnHeapColumnVector` may copy only a part of the original storage on reallocation. `OffHeapColumnVector` reallocation has the same problem: it copies data to the new storage only up to `elementsAppended`. This variable is updated only by the `ColumnVector.appendX` API, while `ColumnVector.putX` is more commonly used. This PR makes `OffHeapColumnVector` copy data up to the previously allocated size instead.

## How was this patch tested?

Existing test suites

Author: Kazuaki Ishizaki <[email protected]>

Closes #17811 from kiszk/SPARK-20537.

(cherry picked from commit afb21bf)
Signed-off-by: Wenchen Fan <[email protected]>
What changes were proposed in this pull request?
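OnHeapColumnVector reallocation copies data to the new storage only up to 'elementsAppended'. This variable is updated only by the ColumnVector.appendX API, while ColumnVector.putX is more commonly used, so values written with putX can be lost when the vector grows. The fix copies up to the previously allocated capacity instead.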
How was this patch tested?
Tested using existing unit tests.
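
To illustrate the failure mode described above, here is a minimal, self-contained sketch. It is not Spark's code: `TinyIntVector`, the `copyWholeCapacity` flag, and `valueAfterGrow` are hypothetical names invented for this example, and the class models only a single `int[]` buffer. It shows why bounding the copy by `elementsAppended` drops values written through a put-style call, and why bounding it by the old capacity does not.

```java
class ReallocSketch {
    /** Hypothetical, simplified stand-in for an on-heap column vector of ints. */
    static final class TinyIntVector {
        private int[] data;
        private int capacity;
        private int elementsAppended = 0;          // advanced only by appendInt()
        private final boolean copyWholeCapacity;   // true = behavior after the fix

        TinyIntVector(int capacity, boolean copyWholeCapacity) {
            this.capacity = capacity;
            this.data = new int[capacity];
            this.copyWholeCapacity = copyWholeCapacity;
        }

        /** Random-access write; does NOT touch elementsAppended. */
        void putInt(int rowId, int value) {
            data[rowId] = value;
        }

        /** Tail write; grows the buffer if needed and advances elementsAppended. */
        void appendInt(int value) {
            if (elementsAppended == capacity) {
                reserve(Math.max(1, capacity * 2));
            }
            data[elementsAppended++] = value;
        }

        int getInt(int rowId) {
            return data[rowId];
        }

        /** Grows the backing array, copying either the appended prefix or the full old buffer. */
        void reserve(int newCapacity) {
            if (newCapacity <= capacity) {
                return;
            }
            int[] newData = new int[newCapacity];
            int toCopy = copyWholeCapacity ? capacity : elementsAppended;
            System.arraycopy(data, 0, newData, 0, toCopy);
            data = newData;
            capacity = newCapacity;
        }
    }

    /** Writes 42 at row 3 via putInt, grows the vector, and reads row 3 back. */
    static int valueAfterGrow(boolean copyWholeCapacity) {
        TinyIntVector v = new TinyIntVector(4, copyWholeCapacity);
        v.putInt(3, 42);
        v.reserve(8);
        return v.getInt(3);
    }

    public static void main(String[] args) {
        // Copy bounded by elementsAppended (still 0 here): the value is lost.
        System.out.println("copy up to elementsAppended: " + valueAfterGrow(false)); // prints 0
        // Copy bounded by the old capacity: the value survives.
        System.out.println("copy up to capacity: " + valueAfterGrow(true));          // prints 42
    }
}
```

In this sketch `elementsAppended` is still needed by `appendInt` to locate the tail, which matches the reply in the review thread; only the bound used for the copy changes.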