Skip to content

Conversation

@polyfractal
Copy link
Contributor

This PR consolidates HDR/TDigest factories into a single factory, and consolidates most of the HDR/TDigest builder into an abstract builder. The only difference in percentiles/ranks builder is the parse field for the percents or values field. It does this by introduce a new PercentileMethod.Config object which carries both algo method and algo-specific settings. This has several advantages, but a few disadvantages too.

I'm about 70/30 on this PR. I like the refactor, but it turned a bit more verbose than my liking. Open to suggestions, or just shelving this PR and ignoring the slightly trapping percentiles behavior today as a "wont-fix" due to complexity of changing the API.

Pros

  • Builders and factories share most code now, much less c/p. Some of this is already merged in the VS refactor too
  • Single object to hold method and algo-specific settings (again, already in VS refactor)
  • Consistent behavior between ranks and percentiles. Before Percentiles would reject a request with hdr and tdigest, while ranks would accept it and just use the last-parsed object
  • Disallows java users from configuring a setting that will be ignored (e.g. configuring compression on an HDRHisto, which previously would be silently ignored)
  • Removes the one-off private "InternalBuilder' on percentiles

Cons

  • Breaking-java: we now throw an exception if the user configures cross-algo settings. This is, however, a client-side exception so it should be caught with simple unit tests.
  • To maintain BWC, there are some extra gymnastics (method(), compression(), numSigFig() are retained but deprecated, default handling from old semantics is retained but a little tricky, etc)
  • PercentileMethod.Config is a bit of a strange arrangement, but we have to keep the Method enum around for BWC so it seemed reasonable to put the new config object there too. It's also a bit "bloated" for essentially a method + single setting.

- Consolidates HDR/TDigest factories into a single factory
- Consolidates most HDR/TDigest builder into an abstract builder
- Deprecates method(), compression(), numSigFig() in favor of a new
unified PercentileMethod.Config object
- Disallows setting algo options that don't apply to current algo

The unified config method carries both the method and algo-specific
setting. This provides a mechanism to reject settings that apply
to the wrong algorithm.  For BWC the old methods are retained
but marked as deprecated, and can be removed in future versions.
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-analytics-geo (:Analytics/Aggregations)

Copy link
Member

@not-napoleon not-napoleon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As mentioned here and discusses in Slack, I'd really like to see the switching on enum logic refactored into enum methods, or at least document why we don't want to do that.

Someone besides me should look at the parser changes. I read them, but I don't know the parser well enough to feel confidant saying that's the right way to do it.

/**
* Return the current algo configuration, or a default (Tdigest) otherwise
*
* This is needed because builders don't have a "build" or "finalize" method, but
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is needed because builders don't have a "build"...method

😢 😢 😢

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

😭


public static Config fromStream(StreamInput in) throws IOException {
PercentilesMethod method = PercentilesMethod.readFromStream(in);
if (method.equals(PercentilesMethod.TDIGEST)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really don't like this if(method.equals(TDIGEST) {...} else if (method.equals(HDR) {...} else { throw } pattern. We use it in a lot of places, and if we ever add another percentile method, we'll have to find and touch each of them. Plus, the throw is always unreachable now, since we'd have already thrown trying to create a bogus enum instance.

In this case, it's pretty trivial to put a method on the enum that takes a StreamInput and returns a Config, and I think we should do that. We should at least think about applying this logic to the other places we have this type of construct.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pushed a change that takes care of the factory/builder instances of these as discussed on slack. I'm not sure this particular one is easily solved though. Note that this is a static method on the abstract Config object, one level "inside" the PercentilesMethod enum

The higher-level PercentilesMethod has a readFromStream() which is in use today to serialize the enum (as well as both compression and numSigFig in the builder). The builder already has to incorporate version checks for wire BWC, so I was hesitant to lump the Config serialization into the existing PercentilesMethod because it would also need version checks.

(Also, conceptually we're in a slightly odd place where the "new way" is to serialize the Config object which holds a PercentileMethod enum, so asking the enum to deserialize a Config seems incorrect. But that's more of an artifact of the current hierarchy which should be cleaner once things are removed post-deprecation)

Otherwise, I'm happy to move this up a level... but I think we'll still need these if/else at this point in the code. Since this isn't a NamedWriteable style situation, we have to inspect the serialized method to determine what kind of object needs to be built next. I don't see any way around that?

@polyfractal
Copy link
Contributor Author

Thanks @not-napoleon. Looks like i have some json parsing issues anyhow, based on the test failures.

@nik9000
Copy link
Member

nik9000 commented Feb 6, 2020

Someone besides me should look at the parser changes. I read them, but I don't know the parser well enough to feel confidant saying that's the right way to do it.

I'll take a look soon. Sorry for the delay.

builder = configOrDefault().toXContent(builder, params);
return builder;
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like there should be some parsing code around here somewhere.

double[] values = ((List<Double>) objects[0]).stream().mapToDouble(Double::doubleValue).toArray();
PercentilesMethod.Config percentilesConfig;

if (objects[1] != null && objects[2] != null) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you have a method to et the config method you could use the flavor of ObjectParser that takes a categoryClass and parse these two an named xcontent. I don't believe that flavor is supported by ContructingObjectParser hough. This'll do the trick, but it is less nice.


PARSER.declareDoubleArray(ConstructingObjectParser.constructorArg(), PERCENTS_FIELD);
PARSER.declareBoolean(PercentilesAggregationBuilder::keyed, KEYED_FIELD);
PARSER.declareObject(ConstructingObjectParser.optionalConstructorArg(), PercentilesMethod.TDIGEST_PARSER,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should have something in the superclass that helps with the parsing. I'm not sure what though.

Copy link
Member

@nik9000 nik9000 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do really like the change, even though BWC is tricky!

@polyfractal
Copy link
Contributor Author

Pushed some test fixes, addressed some review comments, and a bit of fiddling with how things work.

The config object is now a top-level class. Wasn't liking how it was an internal class, and this should make deprecating easier since the PercentileMethod can become an internal, private thing eventually

Adjusted how the parsers work, so now the bulk of it is handled by the super-class. Had to settle on a not-ideal ConstructingObjectParser situation (described in comments) due to the differences in Percentiles and Ranks, but I don't think it is too bad.

The exception message changed slightly, and rather than make
two tests with version gates we can just look for a 4xx exception.
Copy link
Member

@not-napoleon not-napoleon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, once CI is green.

* This is a compromise, in that it is a ConstructingOP which accepts all optional arguments, and then we sort
* out the behavior from there
*/
ConstructingObjectParser<T, String> parser = new ConstructingObjectParser<>(aggName, false, (objects, name) -> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Can we just add something to the javadoc for what we expect the contents of objects to be? Especially, it looks like we're passing in two configs, at most one of which should be not null, but it's not clear if order matters for them.

* a single unified constructor for both algos, but internally switch execution
* depending on which algo is selected
*/
public abstract class PercentilesConfig implements ToXContent, Writeable {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like how this looks as a top level. Makes more sense this way.

@polyfractal
Copy link
Contributor Author

@elasticmachine run elasticsearch-ci/1

Copy link
Member

@nik9000 nik9000 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left a few small things but LGTM regardless.

@polyfractal
Copy link
Contributor Author

Thanks @not-napoleon @nik9000 for the reviews! <3

@polyfractal polyfractal merged commit ba9c4fb into elastic:master Feb 11, 2020
polyfractal added a commit to polyfractal/elasticsearch that referenced this pull request Mar 31, 2020
…c#51887)

- Consolidates HDR/TDigest factories into a single factory
- Consolidates most HDR/TDigest builder into an abstract builder
- Deprecates method(), compression(), numSigFig() in favor of a new
unified PercentileConfig object
- Disallows setting algo options that don't apply to current algo

The unified config method carries both the method and algo-specific
setting. This provides a mechanism to reject settings that apply
to the wrong algorithm.  For BWC the old methods are retained
but marked as deprecated, and can be removed in future versions.

Co-authored-by: Mark Tozzi <[email protected]>
polyfractal added a commit that referenced this pull request Apr 2, 2020
#54537)

- Consolidates HDR/TDigest factories into a single factory
- Consolidates most HDR/TDigest builder into an abstract builder
- Deprecates method(), compression(), numSigFig() in favor of a new
unified PercentileConfig object
- Disallows setting algo options that don't apply to current algo

The unified config method carries both the method and algo-specific
setting. This provides a mechanism to reject settings that apply
to the wrong algorithm.  For BWC the old methods are retained
but marked as deprecated, and can be removed in future versions.

Co-authored-by: Mark Tozzi <[email protected]>

Co-authored-by: Mark Tozzi <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants