Skip to content

Conversation

@ala
Copy link
Contributor

@ala ala commented Sep 21, 2017

What changes were proposed in this pull request?

OffHeapColumnVector.reserveInternal() will only copy already inserted values during reallocation if data != null. In vectors containing arrays or structs this is incorrect, since there field data is not used at all. We need to check nulls instead.

How was this patch tested?

Adds new tests to ColumnVectorSuite that reproduce the errors.

@ala
Copy link
Contributor Author

ala commented Sep 21, 2017

@hvanhovell

protected void reserveInternal(int newCapacity) {
int oldCapacity = (this.data == 0L) ? 0 : capacity;
if (this.resultArray != null) {
oldCapacity = (this.lengthData == 0L) ? 0 : capacity;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Structs have a similar problem, only nulls is used and data == 0. Should we also fix these here?

A related question, maybe we should use nulls instead of data or length to detect if we are resizing the column or creating a new one.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair point. I'll fix it.

@SparkQA
Copy link

SparkQA commented Sep 21, 2017

Test build #82038 has finished for PR 19308 at commit 8ce2a68.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@ala ala changed the title [SPARK-22092] Reallocation in OffHeapColumnVector.reserveInternal corrupts array data [SPARK-22092] Reallocation in OffHeapColumnVector.reserveInternal corrupts struct and array data Sep 22, 2017
@ala
Copy link
Contributor Author

ala commented Sep 22, 2017

@hvanhovell How about this?

Copy link
Contributor

@hvanhovell hvanhovell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM - pending jenkins

@SparkQA
Copy link

SparkQA commented Sep 22, 2017

Test build #82077 has finished for PR 19308 at commit 0ea5793.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@hvanhovell
Copy link
Contributor

LGTM - merging to master. Thanks!

@asfgit asfgit closed this in d2b2932 Sep 22, 2017
@hvanhovell
Copy link
Contributor

@ala Can you backport this one to 2.2?

ala added a commit to ala/spark that referenced this pull request Sep 22, 2017
…rupts struct and array data

`OffHeapColumnVector.reserveInternal()` will only copy already inserted values during reallocation if `data != null`. In vectors containing arrays or structs this is incorrect, since there field `data` is not used at all. We need to check `nulls` instead.

Adds new tests to `ColumnVectorSuite` that reproduce the errors.

Author: Ala Luszczak <[email protected]>

Closes apache#19308 from ala/vector-realloc.

(cherry picked from commit d2b2932)
Signed-off-by: Ala Luszczak <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants