Skip to content

Conversation

@pengbo
Copy link
Contributor

@pengbo pengbo commented Apr 12, 2019

What changes were proposed in this pull request?

Finish the rest work of #24317, #9030
a. Implement Kryo serialization for UnsafeArrayData
b. fix UnsafeMapData Java/Kryo Serialization issue when two machines have different Oops size
c. Move the duplicate code "getBytes()" to Utils.

How was this patch tested?

According Units has been added & tested

…breaks when two machines have different Oops size
@pengbo
Copy link
Contributor Author

pengbo commented Apr 12, 2019

@srowen

I have put Unsafe data class in KryoSerializer.newKryo.
I reopen the PR with only 1 commit for further discussion.

Please recheck when available. Thanks

@srowen
Copy link
Member

srowen commented Apr 12, 2019

CC @cloud-fan

*/
final class UnsafeDataUtils {

private UnsafeDataUtils() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: use 2-indent

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for pointing it out.

import org.apache.spark.sql.catalyst.util.MapData;
import org.apache.spark.unsafe.Platform;

import java.io.Externalizable;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks incorrect import order. org.apache should be last.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

val valueArray = new UnsafeArrayData()
Platform.putLong(baseObject, offset + 8 + keyArray.getSizeInBytes, 1)
valueArray.pointTo(baseObject, offset + 8 + keyArray.getSizeInBytes, keyArraySize)
valueArray.setLong(0, 19285)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It may be good to use different values among key and value.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually I don't think that makes any different, can you please explain a little more?
Thanks.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We hope so. For testing typical use cases, I think that it is good to allocate a separate array for map and value.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe I am missing sth, the key/value array are currently different arrays. The value set is the same (0 -> 19285).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about using a different value for key and value, not just 19285 both times? that would make a slightly better test. Otherwise you'd possibly miss a weird bug where key and value arrays are swapped

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. Thanks @kiszk @srowen

@cloud-fan
Copy link
Contributor

ok to test

}
byte[] bytes = new byte[sizeInBytes];
Platform.copyMemory(baseObject, baseOffset, bytes, Platform.BYTE_ARRAY_OFFSET,
sizeInBytes);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

super nit: probably 2-indent

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Thanks

@SparkQA
Copy link

SparkQA commented Apr 15, 2019

Test build #104590 has finished for PR 24357 at commit ca2bf06.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@pengbo
Copy link
Contributor Author

pengbo commented Apr 16, 2019

retest this please.

@SparkQA
Copy link

SparkQA commented Apr 16, 2019

Test build #104621 has finished for PR 24357 at commit 357144c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan cloud-fan changed the title [Spark-27416][SQL]UnsafeMapData & UnsafeArrayData Kryo serialization … [SPARK-27416][SQL]UnsafeMapData & UnsafeArrayData Kryo serialization … Apr 16, 2019
@cloud-fan
Copy link
Contributor

thanks, merging to master!

@cloud-fan cloud-fan closed this in 54b0d1e Apr 17, 2019
JoshRosen pushed a commit to JoshRosen/spark that referenced this pull request Jul 22, 2019
Finish the rest work of apache#24317, apache#9030
a. Implement Kryo serialization for UnsafeArrayData
b. fix UnsafeMapData Java/Kryo Serialization issue when two machines have different Oops size
c. Move the duplicate code "getBytes()" to Utils.

According Units has been added & tested

Closes apache#24357 from pengbo/SPARK-27416_new.

Authored-by: pengbo <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
dongjoon-hyun pushed a commit that referenced this pull request Jul 22, 2019
…erialization …

This is a Spark 2.4.x backport of #24357 by pengbo. Original description follows below:

---

## What changes were proposed in this pull request?
Finish the rest work of #24317, #9030
a. Implement Kryo serialization for UnsafeArrayData
b. fix UnsafeMapData Java/Kryo Serialization issue when two machines have different Oops size
c. Move the duplicate code "getBytes()" to Utils.

## How was this patch tested?
According Units has been added & tested

Closes #25223 from JoshRosen/SPARK-27416-2.4.

Authored-by: pengbo <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
rluta pushed a commit to rluta/spark that referenced this pull request Sep 17, 2019
…erialization …

This is a Spark 2.4.x backport of apache#24357 by pengbo. Original description follows below:

---

## What changes were proposed in this pull request?
Finish the rest work of apache#24317, apache#9030
a. Implement Kryo serialization for UnsafeArrayData
b. fix UnsafeMapData Java/Kryo Serialization issue when two machines have different Oops size
c. Move the duplicate code "getBytes()" to Utils.

## How was this patch tested?
According Units has been added & tested

Closes apache#25223 from JoshRosen/SPARK-27416-2.4.

Authored-by: pengbo <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
kai-chi pushed a commit to kai-chi/spark that referenced this pull request Sep 26, 2019
…erialization …

This is a Spark 2.4.x backport of apache#24357 by pengbo. Original description follows below:

---

## What changes were proposed in this pull request?
Finish the rest work of apache#24317, apache#9030
a. Implement Kryo serialization for UnsafeArrayData
b. fix UnsafeMapData Java/Kryo Serialization issue when two machines have different Oops size
c. Move the duplicate code "getBytes()" to Utils.

## How was this patch tested?
According Units has been added & tested

Closes apache#25223 from JoshRosen/SPARK-27416-2.4.

Authored-by: pengbo <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
jaltekruse added a commit to jaltekruse/spark that referenced this pull request Dec 10, 2019
…erialization

This is a Spark 2.3.x backport of a 2.4.x backport of apache#24357 by pengbo. Original description follows below:

---

Finish the rest work of apache#24317, apache#9030
a. Implement Kryo serialization for UnsafeArrayData
b. fix UnsafeMapData Java/Kryo Serialization issue when two machines have different Oops size
c. Move the duplicate code "getBytes()" to Utils.

According Units has been added & tested

Authored-by: pengbo <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>

Signed-off-by: Jason Altekruse <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants