Skip to content

Conversation

@gatorsmile
Copy link
Member

@gatorsmile gatorsmile commented Feb 16, 2018

What changes were proposed in this pull request?

Murmur3 hash generates a different value from the original and other implementations (like Scala standard library and Guava or so) when the length of a bytes array is not multiple of 4.

How was this patch tested?

Added a unit test.

Note: When we merge this PR, please give all the credits to Shintaro Murakami.

Author: Shintaro Murakami [email protected]

@gatorsmile
Copy link
Member Author


public static int hashUnsafeBytes(Object base, long offset, int lengthInBytes, int seed) {
// This is not compatible with original and another implementations.
// But remain it for backward compatibility for the components existing before 2.3.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NIT: touch this up. The sentence is a bit weird, i.e.: However we retain this implementation for backwards compatibility with pre-existing (pre 2.3) components.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will correct it in the follow-up PR.

Assert.assertEquals(-2106506049, hasher.hashLong(Long.MAX_VALUE));
}

// SPARK-23381 Check whether the hash of the byte array is the same as another implementations
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could add a randomized (disabled) test here. That might increase the confidence we have in the new hash.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, we can do it in the follow-up PR.

Copy link
Contributor

@hvanhovell hvanhovell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM - pending jenkins

@jkbradley
Copy link
Member

The ML changes LGTM. Thanks!

@viirya
Copy link
Member

viirya commented Feb 17, 2018

LGTM

@SparkQA
Copy link

SparkQA commented Feb 17, 2018

Test build #87514 has finished for PR 20630 at commit c406f98.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile
Copy link
Member Author

Thanks! Merged to master/2.3

asfgit pushed a commit that referenced this pull request Feb 17, 2018
…er implementations

## What changes were proposed in this pull request?
Murmur3 hash generates a different value from the original and other implementations (like Scala standard library and Guava or so) when the length of a bytes array is not multiple of 4.

## How was this patch tested?
Added a unit test.

**Note: When we merge this PR, please give all the credits to Shintaro Murakami.**

Author: Shintaro Murakami <mrkm4ntrgmail.com>

Author: gatorsmile <[email protected]>
Author: Shintaro Murakami <[email protected]>

Closes #20630 from gatorsmile/pr-20568.

(cherry picked from commit d5ed210)
Signed-off-by: gatorsmile <[email protected]>
@asfgit asfgit closed this in d5ed210 Feb 17, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants