-
Notifications
You must be signed in to change notification settings - Fork 25.6k
Refactor Percentiles/Ranks aggregation builders and factories #51887
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor Percentiles/Ranks aggregation builders and factories #51887
Conversation
- Consolidates HDR/TDigest factories into a single factory - Consolidates most HDR/TDigest builder into an abstract builder - Deprecates method(), compression(), numSigFig() in favor of a new unified PercentileMethod.Config object - Disallows setting algo options that don't apply to current algo The unified config method carries both the method and algo-specific setting. This provides a mechanism to reject settings that apply to the wrong algorithm. For BWC the old methods are retained but marked as deprecated, and can be removed in future versions.
|
Pinging @elastic/es-analytics-geo (:Analytics/Aggregations) |
not-napoleon
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As mentioned here and discusses in Slack, I'd really like to see the switching on enum logic refactored into enum methods, or at least document why we don't want to do that.
Someone besides me should look at the parser changes. I read them, but I don't know the parser well enough to feel confidant saying that's the right way to do it.
...ava/org/elasticsearch/search/aggregations/metrics/AbstractPercentilesAggregationBuilder.java
Outdated
Show resolved
Hide resolved
| /** | ||
| * Return the current algo configuration, or a default (Tdigest) otherwise | ||
| * | ||
| * This is needed because builders don't have a "build" or "finalize" method, but |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is needed because builders don't have a "build"...method
😢 😢 😢
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
😭
...ava/org/elasticsearch/search/aggregations/metrics/AbstractPercentilesAggregationBuilder.java
Outdated
Show resolved
Hide resolved
...ain/java/org/elasticsearch/search/aggregations/metrics/PercentileRanksAggregatorFactory.java
Outdated
Show resolved
Hide resolved
|
|
||
| public static Config fromStream(StreamInput in) throws IOException { | ||
| PercentilesMethod method = PercentilesMethod.readFromStream(in); | ||
| if (method.equals(PercentilesMethod.TDIGEST)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I really don't like this if(method.equals(TDIGEST) {...} else if (method.equals(HDR) {...} else { throw } pattern. We use it in a lot of places, and if we ever add another percentile method, we'll have to find and touch each of them. Plus, the throw is always unreachable now, since we'd have already thrown trying to create a bogus enum instance.
In this case, it's pretty trivial to put a method on the enum that takes a StreamInput and returns a Config, and I think we should do that. We should at least think about applying this logic to the other places we have this type of construct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pushed a change that takes care of the factory/builder instances of these as discussed on slack. I'm not sure this particular one is easily solved though. Note that this is a static method on the abstract Config object, one level "inside" the PercentilesMethod enum
The higher-level PercentilesMethod has a readFromStream() which is in use today to serialize the enum (as well as both compression and numSigFig in the builder). The builder already has to incorporate version checks for wire BWC, so I was hesitant to lump the Config serialization into the existing PercentilesMethod because it would also need version checks.
(Also, conceptually we're in a slightly odd place where the "new way" is to serialize the Config object which holds a PercentileMethod enum, so asking the enum to deserialize a Config seems incorrect. But that's more of an artifact of the current hierarchy which should be cleaner once things are removed post-deprecation)
Otherwise, I'm happy to move this up a level... but I think we'll still need these if/else at this point in the code. Since this isn't a NamedWriteable style situation, we have to inspect the serialized method to determine what kind of object needs to be built next. I don't see any way around that?
|
Thanks @not-napoleon. Looks like i have some json parsing issues anyhow, based on the test failures. |
I'll take a look soon. Sorry for the delay. |
| builder = configOrDefault().toXContent(builder, params); | ||
| return builder; | ||
| } | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel like there should be some parsing code around here somewhere.
| double[] values = ((List<Double>) objects[0]).stream().mapToDouble(Double::doubleValue).toArray(); | ||
| PercentilesMethod.Config percentilesConfig; | ||
|
|
||
| if (objects[1] != null && objects[2] != null) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you have a method to et the config method you could use the flavor of ObjectParser that takes a categoryClass and parse these two an named xcontent. I don't believe that flavor is supported by ContructingObjectParser hough. This'll do the trick, but it is less nice.
|
|
||
| PARSER.declareDoubleArray(ConstructingObjectParser.constructorArg(), PERCENTS_FIELD); | ||
| PARSER.declareBoolean(PercentilesAggregationBuilder::keyed, KEYED_FIELD); | ||
| PARSER.declareObject(ConstructingObjectParser.optionalConstructorArg(), PercentilesMethod.TDIGEST_PARSER, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should have something in the superclass that helps with the parsing. I'm not sure what though.
nik9000
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do really like the change, even though BWC is tricky!
|
Pushed some test fixes, addressed some review comments, and a bit of fiddling with how things work. The config object is now a top-level class. Wasn't liking how it was an internal class, and this should make deprecating easier since the Adjusted how the parsers work, so now the bulk of it is handled by the super-class. Had to settle on a not-ideal ConstructingObjectParser situation (described in comments) due to the differences in Percentiles and Ranks, but I don't think it is too bad. |
The exception message changed slightly, and rather than make two tests with version gates we can just look for a 4xx exception.
not-napoleon
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, once CI is green.
| * This is a compromise, in that it is a ConstructingOP which accepts all optional arguments, and then we sort | ||
| * out the behavior from there | ||
| */ | ||
| ConstructingObjectParser<T, String> parser = new ConstructingObjectParser<>(aggName, false, (objects, name) -> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Can we just add something to the javadoc for what we expect the contents of objects to be? Especially, it looks like we're passing in two configs, at most one of which should be not null, but it's not clear if order matters for them.
| * a single unified constructor for both algos, but internally switch execution | ||
| * depending on which algo is selected | ||
| */ | ||
| public abstract class PercentilesConfig implements ToXContent, Writeable { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like how this looks as a top level. Makes more sense this way.
|
@elasticmachine run elasticsearch-ci/1 |
nik9000
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left a few small things but LGTM regardless.
...ava/org/elasticsearch/search/aggregations/metrics/AbstractPercentilesAggregationBuilder.java
Show resolved
Hide resolved
...ava/org/elasticsearch/search/aggregations/metrics/AbstractPercentilesAggregationBuilder.java
Outdated
Show resolved
Hide resolved
...ava/org/elasticsearch/search/aggregations/metrics/AbstractPercentilesAggregationBuilder.java
Outdated
Show resolved
Hide resolved
...in/java/org/elasticsearch/search/aggregations/metrics/PercentileRanksAggregationBuilder.java
Outdated
Show resolved
Hide resolved
|
Thanks @not-napoleon @nik9000 for the reviews! <3 |
…c#51887) - Consolidates HDR/TDigest factories into a single factory - Consolidates most HDR/TDigest builder into an abstract builder - Deprecates method(), compression(), numSigFig() in favor of a new unified PercentileConfig object - Disallows setting algo options that don't apply to current algo The unified config method carries both the method and algo-specific setting. This provides a mechanism to reject settings that apply to the wrong algorithm. For BWC the old methods are retained but marked as deprecated, and can be removed in future versions. Co-authored-by: Mark Tozzi <[email protected]>
#54537) - Consolidates HDR/TDigest factories into a single factory - Consolidates most HDR/TDigest builder into an abstract builder - Deprecates method(), compression(), numSigFig() in favor of a new unified PercentileConfig object - Disallows setting algo options that don't apply to current algo The unified config method carries both the method and algo-specific setting. This provides a mechanism to reject settings that apply to the wrong algorithm. For BWC the old methods are retained but marked as deprecated, and can be removed in future versions. Co-authored-by: Mark Tozzi <[email protected]> Co-authored-by: Mark Tozzi <[email protected]>
This PR consolidates HDR/TDigest factories into a single factory, and consolidates most of the HDR/TDigest builder into an abstract builder. The only difference in percentiles/ranks builder is the parse field for the
percentsorvaluesfield. It does this by introduce a newPercentileMethod.Configobject which carries both algo method and algo-specific settings. This has several advantages, but a few disadvantages too.I'm about 70/30 on this PR. I like the refactor, but it turned a bit more verbose than my liking. Open to suggestions, or just shelving this PR and ignoring the slightly trapping percentiles behavior today as a "wont-fix" due to complexity of changing the API.
Pros
hdrandtdigest, while ranks would accept it and just use the last-parsed objectcompressionon an HDRHisto, which previously would be silently ignored)Cons
method(),compression(),numSigFig()are retained but deprecated, default handling from old semantics is retained but a little tricky, etc)PercentileMethod.Configis a bit of a strange arrangement, but we have to keep the Method enum around for BWC so it seemed reasonable to put the new config object there too. It's also a bit "bloated" for essentially a method + single setting.