Skip to content

Conversation

@michal-databricks
Copy link
Contributor

What changes were proposed in this pull request?

OnHeapColumnVector reallocation copies to the new storage data up to 'elementsAppended'. This variable is only updated when using the ColumnVector.appendX API, while ColumnVector.putX is more commonly used.

How was this patch tested?

Tested using existing unit tests.

@hvanhovell
Copy link
Contributor

ok to test

@SparkQA
Copy link

SparkQA commented Apr 26, 2017

Test build #76184 has finished for PR 17773 at commit 3914766.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

if (this.arrayLengths != null) {
System.arraycopy(this.arrayLengths, 0, newLengths, 0, elementsAppended);
System.arraycopy(this.arrayOffsets, 0, newOffsets, 0, elementsAppended);
System.arraycopy(this.arrayLengths, 0, newLengths, 0, capacity);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch. Do we also need to fix reserveInternal in OffHeapColumnVector? Additionally, after this change, do we even need elementsAppended anymore?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

elementsAppended is necessary to keep the tail position by append<TYPE>().

@sameeragarwal
Copy link
Member

add to whitelist

1 similar comment
@gatorsmile
Copy link
Member

add to whitelist

@SparkQA
Copy link

SparkQA commented Apr 26, 2017

Test build #76191 has finished for PR 17773 at commit 3914766.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Apr 26, 2017

Test build #76192 has finished for PR 17773 at commit 3914766.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@rxin
Copy link
Contributor

rxin commented Apr 26, 2017

Merging in master/branch-2.2.

asfgit pushed a commit that referenced this pull request Apr 26, 2017
## What changes were proposed in this pull request?
OnHeapColumnVector reallocation copies to the new storage data up to 'elementsAppended'. This variable is only updated when using the ColumnVector.appendX API, while ColumnVector.putX is more commonly used.

## How was this patch tested?
Tested using existing unit tests.

Author: Michal Szafranski <[email protected]>

Closes #17773 from michal-databricks/spark-20474.

(cherry picked from commit a277ae8)
Signed-off-by: Reynold Xin <[email protected]>
@asfgit asfgit closed this in a277ae8 Apr 26, 2017
@kiszk
Copy link
Member

kiszk commented Apr 27, 2017

Do we need similar changes for OffHeapColumnVector?

@michal-databricks
Copy link
Contributor Author

Actually yes, I missed it becauseOffHeapColumnVector uses Platform.reallocateMemory, but this still has a parameter to indicate how much to copy (and it is currently based on elementsAppended.

@kiszk
Copy link
Member

kiszk commented Apr 27, 2017

Yes, I think it should see capacity. I will create a PR tomorrow for OffHeapColumnVector.

asfgit pushed a commit that referenced this pull request May 2, 2017
## What changes were proposed in this pull request?

As #17773 revealed `OnHeapColumnVector` may copy a part of the original storage.

`OffHeapColumnVector` reallocation also copies to the new storage data up to 'elementsAppended'. This variable is only updated when using the `ColumnVector.appendX` API, while `ColumnVector.putX` is more commonly used.
This PR copies the new storage data up to the previously-allocated size in`OffHeapColumnVector`.

## How was this patch tested?

Existing test suites

Author: Kazuaki Ishizaki <[email protected]>

Closes #17811 from kiszk/SPARK-20537.

(cherry picked from commit afb21bf)
Signed-off-by: Wenchen Fan <[email protected]>
asfgit pushed a commit that referenced this pull request May 2, 2017
## What changes were proposed in this pull request?

As #17773 revealed `OnHeapColumnVector` may copy a part of the original storage.

`OffHeapColumnVector` reallocation also copies to the new storage data up to 'elementsAppended'. This variable is only updated when using the `ColumnVector.appendX` API, while `ColumnVector.putX` is more commonly used.
This PR copies the new storage data up to the previously-allocated size in`OffHeapColumnVector`.

## How was this patch tested?

Existing test suites

Author: Kazuaki Ishizaki <[email protected]>

Closes #17811 from kiszk/SPARK-20537.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants