Skip to content

Conversation

@liancheng
Copy link
Contributor

This PR adds serialization support for CountMinSketch.

A version number is added to version the serialized binary format.

@SparkQA
Copy link

SparkQA commented Jan 25, 2016

Test build #49967 has finished for PR 10893 at commit e97d7f9.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • throw new CountMinSketchMergeException(\"Cannot merge estimator of class \" + other.getClass().getName());
    • public class CountMinSketchMergeException extends Exception

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's put the implementation in CountMinSketchImpl, and simply delegate to a static method there

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also somewhere in CountMinSketchImpl we should document the serialization format.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we move this out so that we can also use it in bloom filter?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be count min sketch specific, because the two binary protocols can and should evolve separately.

@SparkQA
Copy link

SparkQA commented Jan 25, 2016

Test build #50023 has finished for PR 10893 at commit 9acfe33.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jan 25, 2016

Test build #50025 has finished for PR 10893 at commit 4636af1.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jan 25, 2016

Test build #50020 has finished for PR 10893 at commit 4abc4e0.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we move this comment near writeTo?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually I couldn't decide whether to put it before writeTo or readFrom, and then ended up here...

@cloud-fan
Copy link
Contributor

LGTM overall

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should add some check here to throw an exception if the version number is not 1.

@rxin
Copy link
Contributor

rxin commented Jan 25, 2016

Going to merge this. Thanks. Please address my (one) feedback in your next pr.

@asfgit asfgit closed this in 6f0f1d9 Jan 25, 2016
@liancheng liancheng deleted the cms-serialization branch January 25, 2016 23:09
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be closed before returning from the method, right ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But DataInputStream.close() also closes the underlying input stream, which isn't our expected behavior, is it?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants