[SPARK-12937][SQL] bloom filter serialization #10920
Bloom filter and count-min sketch can have different version values, but they can share the same version class.
I think we should move it back, because:
- The version enum is actually the best place to document the binary protocol.
- It will be really confusing when BloomFilter has a v2 and yet count-min sketch has only a v1.
- The amount of code duplication you save is teeny (you probably actually added more LOC by adding an Apache license header).
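To make the first point concrete, here is a minimal Java sketch of a per-structure version enum whose Javadoc doubles as the spec of the binary protocol. The holder class `BloomFilterFormat`, the `fromNumber` helper, and the listed field layout are hypothetical illustrations, not the code merged in this PR. A count-min sketch would get its own enum, so its version numbers can evolve independently (the second point).

```java
/** Hypothetical holder class; the nested enum is the part that matters. */
final class BloomFilterFormat {

  private BloomFilterFormat() {}

  enum Version {
    /**
     * Binary format version 1. Assumed layout for illustration only, all values
     * big-endian as written by java.io.DataOutputStream:
     * <ul>
     *   <li>Version number, always 1 (32 bit)</li>
     *   <li>Number of hash functions (32 bit)</li>
     *   <li>Number of 64-bit words in the bit array (32 bit)</li>
     *   <li>The words themselves (numWords * 64 bit)</li>
     * </ul>
     */
    V1(1);

    private final int versionNumber;

    Version(int versionNumber) {
      this.versionNumber = versionNumber;
    }

    int getVersionNumber() {
      return versionNumber;
    }

    /** Hypothetical helper: maps a number read from a stream back to a known Version. */
    static Version fromNumber(int versionNumber) {
      for (Version v : values()) {
        if (v.versionNumber == versionNumber) {
          return v;
        }
      }
      throw new IllegalArgumentException(
          "Unexpected bloom filter version number (" + versionNumber + ")");
    }
  }
}
```

Keeping the layout next to the constant means a future `V2` would get its own Javadoc block right beside `V1`.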
cc @liancheng on point 1 - the best place to document the binary protocol is in Version!
cc @rxin @liancheng
also include the current value
Test build #50081 has finished for PR 10920 at commit
cc @liancheng, I removed this because the design doc says users should not need to care about which version is being used.
Test build #50084 has finished for PR 10920 at commit
Test build #50086 has finished for PR 10920 at commit
I'm going to merge this first. Please move the num hash function thing into your next PR. Thanks.
This PR adds serialization support for BloomFilter.
A version number is included in the serialized binary format so that the format itself is versioned.
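A minimal sketch of what versioned serialization can look like, assuming a layout of version number, hash-function count, word count, then the 64-bit words, all written big-endian by `java.io.DataOutputStream`. `VersionedBloomFilter`, `VERSION_V1`, `toByteArray` and `fromByteArray` are hypothetical names for illustration, not Spark's `BloomFilter` API; only the stream calls are standard Java. Note that the caller never chooses a version, in line with the point that users should not care which version is used.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;

/** Hypothetical minimal bloom filter state, used only to illustrate versioned serialization. */
final class VersionedBloomFilter {
  // Hypothetical format version; bump it whenever the binary layout changes.
  static final int VERSION_V1 = 1;

  final int numHashFunctions;
  final long[] words; // underlying bit array, 64 bits per word

  VersionedBloomFilter(int numHashFunctions, long[] words) {
    this.numHashFunctions = numHashFunctions;
    this.words = words;
  }

  void setBit(long index) {
    // Java masks a long shift distance to its low 6 bits, i.e. index % 64.
    words[(int) (index >>> 6)] |= 1L << index;
  }

  boolean getBit(long index) {
    return (words[(int) (index >>> 6)] & (1L << index)) != 0;
  }

  /** Writes the version number first, then the payload; DataOutputStream is big-endian. */
  void writeTo(OutputStream out) throws IOException {
    DataOutputStream dos = new DataOutputStream(out);
    dos.writeInt(VERSION_V1); // always the current version; callers never pick one
    dos.writeInt(numHashFunctions);
    dos.writeInt(words.length);
    for (long w : words) {
      dos.writeLong(w);
    }
    dos.flush();
  }

  /** Reads the version number first and rejects any version this code does not understand. */
  static VersionedBloomFilter readFrom(InputStream in) throws IOException {
    DataInputStream dis = new DataInputStream(in);
    int version = dis.readInt();
    if (version != VERSION_V1) {
      throw new IOException("Unexpected bloom filter version number (" + version + ")");
    }
    int numHashFunctions = dis.readInt();
    long[] words = new long[dis.readInt()];
    for (int i = 0; i < words.length; i++) {
      words[i] = dis.readLong();
    }
    return new VersionedBloomFilter(numHashFunctions, words);
  }

  /** Convenience wrapper; any I/O or format error is rethrown as UncheckedIOException. */
  byte[] toByteArray() {
    ByteArrayOutputStream bos = new ByteArrayOutputStream();
    try {
      writeTo(bos);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return bos.toByteArray();
  }

  /** Convenience wrapper; any I/O or format error is rethrown as UncheckedIOException. */
  static VersionedBloomFilter fromByteArray(byte[] bytes) {
    try {
      return readFrom(new ByteArrayInputStream(bytes));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

Putting the version number first means a reader can reject an unknown layout before interpreting any other bytes.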