[Spark-27416][SQL]UnsafeMapData & UnsafeArrayData Kryo serialization breaks when two machines have different Oops size #24340
cc @cloud-fan @sandeep-katta @rxin
Can one of the admins verify this patch?
/**
 * General utilities available for unsafe data
 */
public class UnsafeDataUtils {
Package-private? Also make it final, with a private constructor for good measure.
Personally, I'd prefer to keep this class public: as "UnsafeDataUtils", it may later hold other public utilities. It could become package-private if it were named UnsafeSerializationUtils instead.
Your comments would be appreciated.
It can be made public later if it needs to be. Otherwise it becomes something apps can inadvertently depend on in their code. We want to discourage that.
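The reviewer's suggestion can be sketched roughly as follows. The class name UnsafeDataUtilsSketch and its copyRange helper are hypothetical placeholders, not the code in this patch; the sketch only illustrates the shape being proposed: no public modifier (package-private), final, and a private constructor so the class can be neither instantiated nor subclassed.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch of a package-private utility holder (not the actual Spark class).
 * Without the public modifier, only code in the same package can depend on it.
 */
final class UnsafeDataUtilsSketch {

  // Private constructor: the class only exposes static helpers,
  // so there is no reason to ever create an instance.
  private UnsafeDataUtilsSketch() {}

  /** Hypothetical static helper: returns a copy of bytes [from, from + length). */
  static byte[] copyRange(byte[] src, int from, int length) {
    return Arrays.copyOfRange(src, from, from + length);
  }
}
```

If a public use case appears later, widening the visibility is a compatible change; narrowing it after release is not.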
    && baseOffset == Platform.BYTE_ARRAY_OFFSET
    && (((byte[]) baseObject).length == sizeInBytes)) {
  return (byte[]) baseObject;
} else {
Not that it matters, but if you're changing this anyway, this 'else' is redundant.
Thanks for pointing that out; happy to remove a line of code.
It's really trivial, no need to change it. This is more just a matter of code-style preference.
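A minimal sketch of a getBytes() helper written without the redundant 'else', assuming an on-heap byte[] base object only. BYTE_ARRAY_OFFSET is a hypothetical stand-in for Spark's Platform.BYTE_ARRAY_OFFSET, and Arrays.copyOfRange replaces Platform.copyMemory so the example runs without Spark; the real helper also has to handle other base objects.

```java
import java.util.Arrays;

/** Hypothetical sketch; not the code merged in this PR. */
final class GetBytesSketch {
  // Assumed value standing in for Platform.BYTE_ARRAY_OFFSET. The real offset is
  // JVM-dependent (typically 16 with compressed oops vs 24 without on 64-bit HotSpot).
  static final long BYTE_ARRAY_OFFSET = 16;

  private GetBytesSketch() {}

  static byte[] getBytes(Object baseObject, long baseOffset, int sizeInBytes) {
    // Fast path: the data spans an entire byte[], so return it without copying.
    if (baseObject instanceof byte[]
        && baseOffset == BYTE_ARRAY_OFFSET
        && ((byte[]) baseObject).length == sizeInBytes) {
      return (byte[]) baseObject;
    }
    // No 'else' needed because the branch above returns.
    // Assumption: baseObject is a byte[] and baseOffset points inside it.
    byte[] src = (byte[]) baseObject;
    int start = (int) (baseOffset - BYTE_ARRAY_OFFSET);
    return Arrays.copyOfRange(src, start, start + sizeInBytes);
  }
}
```

The early return keeps the common zero-copy case at the top; the copying path then reads as the method's normal flow rather than a nested branch.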
@srowen Updated as discussed; please take another look when you have time. Thanks!
srowen left a comment
OK. Do we also need to add these classes to the list of classes that are automatically registered with Kryo by default?
I'm not quite sure what the main advantage would be. If it's for performance, I don't think it makes a difference in this case.
What changes were proposed in this pull request?
Finish the remaining work of #24317 and #9030:
a. Implement Kryo serialization for UnsafeArrayData
b. Fix UnsafeMapData Java/Kryo serialization when two machines have different oops sizes (e.g. one JVM running with compressed oops and one without)
c. Move the duplicated getBytes() code into a shared utility class (UnsafeDataUtils).
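One way to make the serialized form independent of the oops size is to write only the payload (length plus bytes) and recompute the base offset on the receiving JVM, instead of shipping the sender's baseOffset. The sketch below shows that idea with java.io.Externalizable on a hypothetical OffsetBackedData class; it is not the patch's actual code. The mutable localByteArrayOffset field is an assumption used only to simulate two JVMs with different offsets inside one process. A Kryo version would follow the same pattern through KryoSerializable's write(Kryo, Output) and read(Kryo, Input) methods.

```java
import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectOutput;

/** Hypothetical UnsafeArrayData-like value: (baseObject, baseOffset, sizeInBytes). */
final class OffsetBackedData implements Externalizable {
  private static final long serialVersionUID = 1L;

  // Hypothetical stand-in for the *local* JVM's Platform.BYTE_ARRAY_OFFSET.
  // It differs between JVMs, so it must be recomputed, never serialized.
  static long localByteArrayOffset = 16;

  Object baseObject;
  long baseOffset;
  int sizeInBytes;

  // Externalizable requires a public no-arg constructor for deserialization.
  public OffsetBackedData() {}

  OffsetBackedData(byte[] bytes) {
    this.baseObject = bytes;
    this.baseOffset = localByteArrayOffset;
    this.sizeInBytes = bytes.length;
  }

  @Override
  public void writeExternal(ObjectOutput out) throws IOException {
    // Assumption: on-heap byte[] base. Write the length and raw bytes only.
    byte[] bytes = (byte[]) baseObject;
    int start = (int) (baseOffset - localByteArrayOffset);
    out.writeInt(sizeInBytes);
    out.write(bytes, start, sizeInBytes);
  }

  @Override
  public void readExternal(ObjectInput in) throws IOException {
    sizeInBytes = in.readInt();
    byte[] bytes = new byte[sizeInBytes];
    in.readFully(bytes);
    baseObject = bytes;
    // Use this JVM's offset rather than whatever the sender used.
    baseOffset = localByteArrayOffset;
  }
}
```

Serializing with localByteArrayOffset = 16 and deserializing after setting it to 24 yields baseOffset == 24 with identical bytes, whereas a serializer that copied every field as-is would have carried the sender's 16 across and pointed into the wrong memory.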
How was this patch tested?
Corresponding unit tests have been added and run.