-
Notifications
You must be signed in to change notification settings - Fork 28.9k
[SPARK-27416][SQL]UnsafeMapData & UnsafeArrayData Kryo serialization … #24357
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…breaks when two machines have different Oops size
|
I have put Unsafe data class in KryoSerializer.newKryo. Please recheck when available. Thanks |
|
CC @cloud-fan |
| */ | ||
| final class UnsafeDataUtils { | ||
|
|
||
| private UnsafeDataUtils() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: use 2-indent
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for pointing it out.
| import org.apache.spark.sql.catalyst.util.MapData; | ||
| import org.apache.spark.unsafe.Platform; | ||
|
|
||
| import java.io.Externalizable; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks incorrect import order. org.apache should be last.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/UnsafeMapSuite.scala
Show resolved
Hide resolved
| val valueArray = new UnsafeArrayData() | ||
| Platform.putLong(baseObject, offset + 8 + keyArray.getSizeInBytes, 1) | ||
| valueArray.pointTo(baseObject, offset + 8 + keyArray.getSizeInBytes, keyArraySize) | ||
| valueArray.setLong(0, 19285) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It may be good to use different values among key and value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually I don't think that makes any different, can you please explain a little more?
Thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We hope so. For testing typical use cases, I think that it is good to allocate a separate array for map and value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe I am missing sth, the key/value array are currently different arrays. The value set is the same (0 -> 19285).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about using a different value for key and value, not just 19285 both times? that would make a slightly better test. Otherwise you'd possibly miss a weird bug where key and value arrays are swapped
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
|
ok to test |
| } | ||
| byte[] bytes = new byte[sizeInBytes]; | ||
| Platform.copyMemory(baseObject, baseOffset, bytes, Platform.BYTE_ARRAY_OFFSET, | ||
| sizeInBytes); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
super nit: probably 2-indent
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. Thanks
|
Test build #104590 has finished for PR 24357 at commit
|
|
retest this please. |
|
Test build #104621 has finished for PR 24357 at commit
|
|
thanks, merging to master! |
Finish the rest work of apache#24317, apache#9030 a. Implement Kryo serialization for UnsafeArrayData b. fix UnsafeMapData Java/Kryo Serialization issue when two machines have different Oops size c. Move the duplicate code "getBytes()" to Utils. According Units has been added & tested Closes apache#24357 from pengbo/SPARK-27416_new. Authored-by: pengbo <[email protected]> Signed-off-by: Wenchen Fan <[email protected]>
…erialization … This is a Spark 2.4.x backport of #24357 by pengbo. Original description follows below: --- ## What changes were proposed in this pull request? Finish the rest work of #24317, #9030 a. Implement Kryo serialization for UnsafeArrayData b. fix UnsafeMapData Java/Kryo Serialization issue when two machines have different Oops size c. Move the duplicate code "getBytes()" to Utils. ## How was this patch tested? According Units has been added & tested Closes #25223 from JoshRosen/SPARK-27416-2.4. Authored-by: pengbo <[email protected]> Signed-off-by: Dongjoon Hyun <[email protected]>
…erialization … This is a Spark 2.4.x backport of apache#24357 by pengbo. Original description follows below: --- ## What changes were proposed in this pull request? Finish the rest work of apache#24317, apache#9030 a. Implement Kryo serialization for UnsafeArrayData b. fix UnsafeMapData Java/Kryo Serialization issue when two machines have different Oops size c. Move the duplicate code "getBytes()" to Utils. ## How was this patch tested? According Units has been added & tested Closes apache#25223 from JoshRosen/SPARK-27416-2.4. Authored-by: pengbo <[email protected]> Signed-off-by: Dongjoon Hyun <[email protected]>
…erialization … This is a Spark 2.4.x backport of apache#24357 by pengbo. Original description follows below: --- ## What changes were proposed in this pull request? Finish the rest work of apache#24317, apache#9030 a. Implement Kryo serialization for UnsafeArrayData b. fix UnsafeMapData Java/Kryo Serialization issue when two machines have different Oops size c. Move the duplicate code "getBytes()" to Utils. ## How was this patch tested? According Units has been added & tested Closes apache#25223 from JoshRosen/SPARK-27416-2.4. Authored-by: pengbo <[email protected]> Signed-off-by: Dongjoon Hyun <[email protected]>
…erialization This is a Spark 2.3.x backport of a 2.4.x backport of apache#24357 by pengbo. Original description follows below: --- Finish the rest work of apache#24317, apache#9030 a. Implement Kryo serialization for UnsafeArrayData b. fix UnsafeMapData Java/Kryo Serialization issue when two machines have different Oops size c. Move the duplicate code "getBytes()" to Utils. According Units has been added & tested Authored-by: pengbo <[email protected]> Signed-off-by: Dongjoon Hyun <[email protected]> Signed-off-by: Jason Altekruse <[email protected]>
What changes were proposed in this pull request?
Finish the rest work of #24317, #9030
a. Implement Kryo serialization for UnsafeArrayData
b. fix UnsafeMapData Java/Kryo Serialization issue when two machines have different Oops size
c. Move the duplicate code "getBytes()" to Utils.
How was this patch tested?
According Units has been added & tested