[SPARK-23381][CORE] Murmur3 hash generates a different value from other implementations #20630
public static int hashUnsafeBytes(Object base, long offset, int lengthInBytes, int seed) {
  // This is not compatible with original and another implementations.
  // But remain it for backward compatibility for the components existing before 2.3.
NIT: touch this up. The sentence reads a bit oddly. For example: "However, we retain this implementation for backward compatibility with pre-existing (pre-2.3) components."
Will correct it in the follow-up PR.
  Assert.assertEquals(-2106506049, hasher.hashLong(Long.MAX_VALUE));
}

// SPARK-23381 Check whether the hash of the byte array is the same as another implementations
We could add a randomized (disabled) test here. That might increase the confidence we have in the new hash.
Yeah, we can do it in the follow-up PR.
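A randomized check along the lines suggested in this thread might look roughly like the sketch below. It is not code from this PR. It compares two independently written Murmur3 x86_32 routines on random byte arrays of random length, using a fixed RNG seed so that any failure is reproducible. The class and method names (`Murmur3RandomizedCheck`, `candidate`, `reference`, `countMismatches`) are hypothetical, and the hand-written `reference` only stands in for an external implementation such as Guava so that the example stays self-contained.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Random;

// Hypothetical sketch: none of these names come from the PR.
final class Murmur3RandomizedCheck {
    // Mixing steps of the published MurmurHash3 x86_32 algorithm.
    static int mixK1(int k1) {
        k1 *= 0xcc9e2d51;
        k1 = Integer.rotateLeft(k1, 15);
        return k1 * 0x1b873593;
    }

    static int mixH1(int h1, int k1) {
        h1 = Integer.rotateLeft(h1 ^ k1, 13);
        return h1 * 5 + 0xe6546b64;
    }

    static int fmix(int h1, int length) {
        h1 ^= length;
        h1 ^= h1 >>> 16;
        h1 *= 0x85ebca6b;
        h1 ^= h1 >>> 13;
        h1 *= 0xc2b2ae35;
        return h1 ^ (h1 >>> 16);
    }

    // Candidate: reads blocks through a little-endian ByteBuffer and packs the tail in a loop.
    static int candidate(byte[] data, int seed) {
        ByteBuffer buf = ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN);
        int h1 = seed;
        while (buf.remaining() >= 4) {
            h1 = mixH1(h1, mixK1(buf.getInt()));
        }
        int k1 = 0;
        for (int shift = 0; buf.hasRemaining(); shift += 8) {
            k1 ^= (buf.get() & 0xFF) << shift;
        }
        h1 ^= mixK1(k1); // no-op when there is no tail, because mixK1(0) == 0
        return fmix(h1, data.length);
    }

    // Reference stand-in: mirrors the switch-with-fall-through layout of the original C++ routine.
    static int reference(byte[] data, int seed) {
        int h1 = seed;
        int nblocks = data.length / 4;
        for (int i = 0; i < nblocks; i++) {
            int p = i * 4;
            int k1 = (data[p] & 0xFF) | (data[p + 1] & 0xFF) << 8
                | (data[p + 2] & 0xFF) << 16 | (data[p + 3] & 0xFF) << 24;
            h1 = mixH1(h1, mixK1(k1));
        }
        int tail = nblocks * 4;
        int k1 = 0;
        switch (data.length & 3) {
            case 3: k1 ^= (data[tail + 2] & 0xFF) << 16; // fall through
            case 2: k1 ^= (data[tail + 1] & 0xFF) << 8;  // fall through
            case 1: k1 ^= (data[tail] & 0xFF);
                    h1 ^= mixK1(k1);
        }
        return fmix(h1, data.length);
    }

    // Hashes `trials` random arrays of length 0..maxLength and counts disagreements.
    static int countMismatches(long rngSeed, int trials, int maxLength) {
        Random rng = new Random(rngSeed);
        int mismatches = 0;
        for (int t = 0; t < trials; t++) {
            byte[] data = new byte[rng.nextInt(maxLength + 1)];
            rng.nextBytes(data);
            int seed = rng.nextInt();
            if (candidate(data, seed) != reference(data, seed)) {
                mismatches++;
            }
        }
        return mismatches;
    }

    public static void main(String[] args) {
        System.out.println("mismatches: " + countMismatches(23381L, 10_000, 64));
    }
}
```

Drawing the length at random matters here, since the bug only shows up for lengths that are not a multiple of 4. In a real test suite the reference would more plausibly be a library call than a second hand-written routine.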
hvanhovell left a comment
LGTM - pending jenkins
The ML changes LGTM. Thanks!
LGTM
Test build #87514 has finished for PR 20630 at commit
Thanks! Merged to master/2.3
…er implementations

## What changes were proposed in this pull request?
Murmur3 hash generates a different value from the original and other implementations (like Scala standard library and Guava or so) when the length of a bytes array is not multiple of 4.

## How was this patch tested?
Added a unit test.

**Note: When we merge this PR, please give all the credits to Shintaro Murakami.**

Author: Shintaro Murakami <mrkm4ntrgmail.com>
Author: gatorsmile <[email protected]>
Author: Shintaro Murakami <[email protected]>

Closes #20630 from gatorsmile/pr-20568.

(cherry picked from commit d5ed210)
Signed-off-by: gatorsmile <[email protected]>
What changes were proposed in this pull request?
Murmur3 hash generates a different value from the original and other implementations (such as the Scala standard library and Guava) when the length of a byte array is not a multiple of 4.
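To illustrate how such a divergence can arise, here is a minimal, self-contained sketch that is not taken from the diff. `hashStandard` packs the one to three trailing bytes into a single word and mixes it once, as the published MurmurHash3 x86_32 algorithm does. `hashLegacy` assumes, purely for illustration, that each trailing byte is mixed as if it were its own block. All names here (`Murmur3TailDemo`, `hashBlocks`, `hashStandard`, `hashLegacy`) are hypothetical, and the methods take a plain `byte[]` rather than the `(Object base, long offset)` pair used by `hashUnsafeBytes`.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: names and the legacy tail behaviour are assumptions, not the PR's code.
final class Murmur3TailDemo {
    // Mixing steps of the published MurmurHash3 x86_32 algorithm.
    static int mixK1(int k1) {
        k1 *= 0xcc9e2d51;
        k1 = Integer.rotateLeft(k1, 15);
        return k1 * 0x1b873593;
    }

    static int mixH1(int h1, int k1) {
        h1 = Integer.rotateLeft(h1 ^ k1, 13);
        return h1 * 5 + 0xe6546b64;
    }

    static int fmix(int h1, int length) {
        h1 ^= length;
        h1 ^= h1 >>> 16;
        h1 *= 0x85ebca6b;
        h1 ^= h1 >>> 13;
        h1 *= 0xc2b2ae35;
        return h1 ^ (h1 >>> 16);
    }

    // Mixes the whole 4-byte little-endian blocks; shared by both variants.
    static int hashBlocks(byte[] data, int alignedLength, int seed) {
        int h1 = seed;
        for (int i = 0; i < alignedLength; i += 4) {
            int k1 = (data[i] & 0xFF) | (data[i + 1] & 0xFF) << 8
                | (data[i + 2] & 0xFF) << 16 | (data[i + 3] & 0xFF) << 24;
            h1 = mixH1(h1, mixK1(k1));
        }
        return h1;
    }

    // Standard tail: pack the trailing bytes into one word and XOR its mix into h1.
    static int hashStandard(byte[] data, int seed) {
        int aligned = data.length - data.length % 4;
        int h1 = hashBlocks(data, aligned, seed);
        int k1 = 0;
        for (int i = aligned, shift = 0; i < data.length; i++, shift += 8) {
            k1 ^= (data[i] & 0xFF) << shift;
        }
        h1 ^= mixK1(k1);
        return fmix(h1, data.length);
    }

    // ASSUMED legacy tail: every trailing byte (sign-extended to int) gets a full block mix.
    static int hashLegacy(byte[] data, int seed) {
        int aligned = data.length - data.length % 4;
        int h1 = hashBlocks(data, aligned, seed);
        for (int i = aligned; i < data.length; i++) {
            h1 = mixH1(h1, mixK1(data[i]));
        }
        return fmix(h1, data.length);
    }

    public static void main(String[] args) {
        byte[] aligned = "abcd".getBytes(StandardCharsets.US_ASCII);    // length 4
        byte[] unaligned = "abcde".getBytes(StandardCharsets.US_ASCII); // length 5
        System.out.println("aligned agree:   " + (hashStandard(aligned, 42) == hashLegacy(aligned, 42)));
        System.out.println("unaligned agree: " + (hashStandard(unaligned, 42) == hashLegacy(unaligned, 42)));
    }
}
```

When the length is a multiple of 4 there is no tail, so both variants run the same code and must agree. Any difference comes only from how the trailing bytes are folded in.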
How was this patch tested?
Added a unit test.
Note: When we merge this PR, please give all the credits to Shintaro Murakami.
Author: Shintaro Murakami [email protected]