@rxin rxin commented Oct 8, 2015

UnsafeRow points to its data in memory with three pieces of information: a base object, a base offset, and a length. When the row is serialized with Java or Kryo serialization, the base offset can be wrong on the receiving side, because the in-memory object layout differs between two machines whose JVMs use different pointer widths (oops, ordinary object pointers).

To reproduce, launch Spark with compressed oops disabled on the executors:

MASTER=local-cluster[2,1,1024] bin/spark-shell --conf "spark.executor.extraJavaOptions=-XX:-UseCompressedOops"

Then run the following query:

scala> sql("select 1 xx").collect()
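
As a rough illustration of the idea behind the fix (not the actual patch), the sketch below serializes only the payload bytes and recomputes the offset on the reading side. The class `PortableRow` and the field `localArrayOffset` are hypothetical stand-ins: `localArrayOffset` simulates the JVM-specific `byte[]` base offset, which is typically 16 with compressed oops and 24 without on 64-bit HotSpot.

```java
import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectOutput;
import java.util.Arrays;

// Hypothetical stand-in for UnsafeRow; this is not Spark's actual class.
class PortableRow implements Externalizable {
    // Assumption: simulates the JVM-specific byte[] base offset.
    static long localArrayOffset = 16;

    private byte[] base;   // backing object
    private long offset;   // JVM-specific: localArrayOffset + start index
    private int length;    // payload size in bytes

    // Externalizable requires a public no-arg constructor.
    public PortableRow() {}

    PortableRow(byte[] base, int start, int length) {
        this.base = base;
        this.offset = localArrayOffset + start;
        this.length = length;
    }

    // Copy out only the payload bytes, independent of any header layout.
    byte[] getBytes() {
        int start = (int) (offset - localArrayOffset);
        return Arrays.copyOfRange(base, start, start + length);
    }

    @Override
    public void writeExternal(ObjectOutput out) throws IOException {
        // Write the length and the raw payload; the offset is never written.
        byte[] bytes = getBytes();
        out.writeInt(bytes.length);
        out.write(bytes);
    }

    @Override
    public void readExternal(ObjectInput in) throws IOException {
        length = in.readInt();
        base = new byte[length];
        in.readFully(base);
        // Recompute the offset for the local JVM instead of trusting the sender.
        offset = localArrayOffset;
    }
}
```

Because the stream carries only a length and the payload, the receiving JVM never sees the sender's offset, so a difference in object header size cannot corrupt the row.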

…ifferent Oops size.

(cherry picked from commit 157b2a818d3993b1321cc41fb7b30407bd13490b)
Signed-off-by: Reynold Xin <[email protected]>

rxin commented Oct 8, 2015

cc @davies and @JoshRosen

rxin (review comment):

going to revert this

davies commented Oct 8, 2015

LGTM. We have already done this for UTF8String (without Kryo support).

@cloud-fan Should we also do this for UnsafeArrayData?

Contributor (review comment):

can this just call writeExternal(out)?

@cloud-fan

I think we should apply this to UnsafeArrayData too.

SparkQA commented Oct 8, 2015

Test build #1860 has finished for PR 9030 at commit 9b79e6f.

  • This patch fails MiMa tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Oct 8, 2015

Test build #43414 has finished for PR 9030 at commit 9b79e6f.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • public final class UnsafeRow extends MutableRow implements Externalizable, KryoSerializable

SparkQA commented Oct 8, 2015

Test build #1861 has finished for PR 9030 at commit 9b79e6f.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • public final class UnsafeRow extends MutableRow implements Externalizable, KryoSerializable

yhuai commented Oct 8, 2015

test this please

SparkQA commented Oct 9, 2015

Test build #43434 has finished for PR 9030 at commit 9b79e6f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • public final class UnsafeRow extends MutableRow implements Externalizable, KryoSerializable

rxin commented Oct 9, 2015

Merging this in master & branch-1.5.

asfgit pushed a commit that referenced this pull request Oct 9, 2015
…ifferent Oops size.

Author: Reynold Xin <[email protected]>

Closes #9030 from rxin/SPARK-10914.

(cherry picked from commit 84ea287)
Signed-off-by: Reynold Xin <[email protected]>
@asfgit asfgit closed this in 84ea287 Oct 9, 2015
cloud-fan pushed a commit that referenced this pull request Apr 9, 2019
…ines have different Oops size

## What changes were proposed in this pull request?
ApproxCountDistinctForIntervals holds an UnsafeArrayData instance that it uses to initialize its endpoints. When that UnsafeArrayData is serialized with Java serialization, the BYTE_ARRAY_OFFSET it relies on can differ between two machines whose JVMs use different pointer widths (oops).

This PR fixes the issue in the same way as #9030.

## How was this patch tested?
Manually tested in our TPC-DS environment; a corresponding unit test case has been added as well.

Closes #24317 from pengbo/SPARK-27406.

Authored-by: mingbo_pb <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
cloud-fan pushed a commit that referenced this pull request Apr 17, 2019
## What changes were proposed in this pull request?
Finish the remaining work from #24317 and #9030:
a. Implement Kryo serialization for UnsafeArrayData.
b. Fix the UnsafeMapData Java/Kryo serialization issue that occurs when two machines have different oops sizes.
c. Move the duplicated "getBytes()" code to Utils.
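
To make item c more concrete, here is a minimal sketch of what a shared byte-extraction helper could look like. The class name `ByteRegionUtils` and its signature are hypothetical, and it uses plain array indices for simplicity, whereas the Spark classes discussed here address data by base object and base offset.

```java
import java.util.Arrays;

// Hypothetical shared helper; the name and signature are assumptions for illustration.
final class ByteRegionUtils {
    private ByteRegionUtils() {}

    // Returns the bytes of the region [start, start + length) of base.
    static byte[] getBytes(byte[] base, int start, int length) {
        if (start < 0 || length < 0 || length > base.length - start) {
            throw new IllegalArgumentException("region out of bounds");
        }
        // The region covers the whole array, so no copy is needed.
        if (start == 0 && length == base.length) {
            return base;
        }
        // Otherwise copy just the requested sub-range.
        return Arrays.copyOfRange(base, start, start + length);
    }
}
```

Keeping this logic in one place means every serializer (Java and Kryo, for rows, arrays, and maps) extracts the payload the same way. Returning the backing array without copying when the region spans it entirely is a design choice of this sketch; callers must then treat the result as read-only.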

## How was this patch tested?
Corresponding unit tests have been added and run.

Closes #24357 from pengbo/SPARK-27416_new.

Authored-by: pengbo <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
JoshRosen pushed a commit to JoshRosen/spark that referenced this pull request Jul 22, 2019
dongjoon-hyun pushed a commit that referenced this pull request Jul 22, 2019
…erialization …

This is a Spark 2.4.x backport of #24357 by pengbo. Original description follows below:

---

## What changes were proposed in this pull request?
Finish the remaining work from #24317 and #9030:
a. Implement Kryo serialization for UnsafeArrayData.
b. Fix the UnsafeMapData Java/Kryo serialization issue that occurs when two machines have different oops sizes.
c. Move the duplicated "getBytes()" code to Utils.

## How was this patch tested?
Corresponding unit tests have been added and run.

Closes #25223 from JoshRosen/SPARK-27416-2.4.

Authored-by: pengbo <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
rluta pushed a commit to rluta/spark that referenced this pull request Sep 17, 2019
kai-chi pushed a commit to kai-chi/spark that referenced this pull request Sep 26, 2019
jaltekruse added a commit to jaltekruse/spark that referenced this pull request Dec 10, 2019
…erialization

This is a Spark 2.3.x backport of a 2.4.x backport of apache#24357 by pengbo. Original description follows below:

---

Finish the remaining work from apache#24317 and apache#9030:
a. Implement Kryo serialization for UnsafeArrayData.
b. Fix the UnsafeMapData Java/Kryo serialization issue that occurs when two machines have different oops sizes.
c. Move the duplicated "getBytes()" code to Utils.

Corresponding unit tests have been added and run.

Authored-by: pengbo <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>

Signed-off-by: Jason Altekruse <[email protected]>