Add parsing for percentiles ranks #23974

tlrx · 2017-04-07T15:44:26Z

This is an attempt to parse InternalHDRPercentilesRanks and InternalTDigestPercentilesRanks aggregations.

cbuescher

@tlrx thanks, I had a first look. I think it looks great, given that this is already a very complicated aggregation to parse. I left a few comments; most are just questions or suggestions.

cbuescher · 2017-04-10T08:55:48Z

core/src/main/java/org/elasticsearch/search/aggregations/ParsedAggregation.java

Already mentioned somewhere else, but adding again for reference: I think we need to implement the same logic here as in InternalAggregation (either treat null/empty() differently here or change this in InternalAggregation).

cbuescher · 2017-04-10T08:58:23Z

...c/main/java/org/elasticsearch/search/aggregations/metrics/percentiles/ParsedPercentiles.java

As far as I understand, the type will be constant. I don't think we should make it a ctor argument; instead, use some constant like PercentileRanks.TYPE_NAME in getType() directly.

cbuescher · 2017-04-10T09:00:17Z

...c/main/java/org/elasticsearch/search/aggregations/metrics/percentiles/ParsedPercentiles.java

Already mentioned above, can we use PercentileRanks.TYPE_NAME here directly?

cbuescher · 2017-04-10T09:01:41Z

...c/main/java/org/elasticsearch/search/aggregations/metrics/percentiles/ParsedPercentiles.java

Maybe it's just me disliking Tuple, but if we could keep two maps (e.g. percentiles, percentilesAsString) that would avoid a lot of the v1(), v2() usage that I find a bit confusing to read later in this class.

cbuescher · 2017-04-10T09:08:35Z

...c/main/java/org/elasticsearch/search/aggregations/metrics/percentiles/ParsedPercentiles.java

Yuck! I think I see now why these response keys make using ObjectParser almost impossible. Nothing to do about this unfortunately, I think.

cbuescher · 2017-04-10T09:19:18Z

...c/main/java/org/elasticsearch/search/aggregations/metrics/percentiles/ParsedPercentiles.java

Nice, didn't know map.compute(), seems to be handy. I still wonder if this logic could be a bit simpler to read if we had two maps instead of Map<Double, Tuple<Double, String>>.
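
For illustration only, the two-map alternative suggested above could look roughly like the sketch below. It is not the code from this PR: the class name `TwoMapPercentiles`, the method `addField`, and the `_as_string` key suffix are assumptions made for the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical holder, not the class from this PR: numeric values and their
// formatted strings live in two maps instead of Map<Double, Tuple<Double, String>>.
final class TwoMapPercentiles {
    private static final String AS_STRING_SUFFIX = "_as_string"; // assumed key suffix

    private final Map<Double, Double> percentiles = new LinkedHashMap<>();
    private final Map<Double, String> percentilesAsString = new LinkedHashMap<>();

    // Routes one response field to the matching map based on its name.
    void addField(String name, Object value) {
        if (name.endsWith(AS_STRING_SUFFIX)) {
            String key = name.substring(0, name.length() - AS_STRING_SUFFIX.length());
            percentilesAsString.put(Double.valueOf(key), String.valueOf(value));
        } else {
            percentiles.put(Double.valueOf(name), ((Number) value).doubleValue());
        }
    }

    // Returns NaN when the key is unknown.
    double percentile(double key) {
        return percentiles.getOrDefault(key, Double.NaN);
    }

    // Falls back to the plain numeric rendering when no formatted value was parsed.
    String percentileAsString(double key) {
        String formatted = percentilesAsString.get(key);
        return formatted != null ? formatted : String.valueOf(percentile(key));
    }

    public static void main(String[] args) {
        TwoMapPercentiles parsed = new TwoMapPercentiles();
        parsed.addField("1.0", 5.0);
        parsed.addField("1.0_as_string", "5.00");
        System.out.println(parsed.percentile(1.0) + " / " + parsed.percentileAsString(1.0));
    }
}
```

With this shape, readers of the class see `percentiles.get(key)` and `percentilesAsString.get(key)` rather than `v1()` and `v2()`, at the cost of keeping two maps in sync.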

cbuescher · 2017-04-10T09:27:41Z

.../org/elasticsearch/search/aggregations/metrics/percentiles/hdr/ParsedHDRPercentileRanks.java

Okay, I see the use of the type in the ctor now. I'd still consider individual constants in ParsedHDRPercentileRanks and ParsedTDigestPercentileRanks and overriding getType() separately as an alternative. Maybe just different tastes, nothing big.

You're right, overriding getType() would be more readable, thanks.
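
As a rough sketch of the alternative agreed on here, each concrete class can expose its own constant and override `getType()`, so no type string is passed through a constructor. The class names and the constant values below are stand-ins chosen for the example, not the classes or names from this PR.

```java
// Hypothetical stand-ins; the real classes in the PR extend ParsedAggregation.
abstract class ParsedRanksSketch {
    // Each concrete subclass reports its own constant type name.
    abstract String getType();
}

final class ParsedHdrRanksSketch extends ParsedRanksSketch {
    static final String TYPE_NAME = "hdr_percentile_ranks"; // assumed value

    @Override
    String getType() {
        return TYPE_NAME;
    }
}

final class ParsedTDigestRanksSketch extends ParsedRanksSketch {
    static final String TYPE_NAME = "tdigest_percentile_ranks"; // assumed value

    @Override
    String getType() {
        return TYPE_NAME;
    }

    public static void main(String[] args) {
        ParsedRanksSketch[] parsed = { new ParsedHdrRanksSketch(), new ParsedTDigestRanksSketch() };
        for (ParsedRanksSketch ranks : parsed) {
            System.out.println(ranks.getType());
        }
    }
}
```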

cbuescher · 2017-04-10T09:33:59Z

core/src/test/java/org/elasticsearch/search/aggregations/InternalAggregationTestCase.java

Instead of providing the instance parser by overwriting this method in the subtests, can subtests simply provide their own xContentRegistry with the appropriate parser by overwriting getNamedWriteableRegistry() from AbstractWireSerializingTestCase? I might be missing something though.

NamedWriteableRegistry and NamedXContentRegistry are two different beasts, so maybe your suggestion is to have an abstract method like getNamedXContentRegistry in each test?

Sorry, my bad. What I meant was overwriting xContentRegistry() from ESTestCase, I mixed that up. That way you can e.g. provide the parser that you need in the test without the need of instanceParser(). I don't mind either way, feel free to use it or not.

Oh ok, I see what you meant.

Yes, that's the way to go. I didn't think about overriding xContentRegistry() and that will help. I used instanceParser() to mimic the instanceReader() we already have, but your suggestion is better. I'll update the pull request.

cbuescher · 2017-04-10T09:34:33Z

core/src/test/java/org/elasticsearch/search/aggregations/InternalAggregationTestCase.java

nit: reactivate randomization

woops, thanks :)

cbuescher · 2017-04-10T09:35:57Z

test/framework/src/main/java/org/elasticsearch/bootstrap/BootstrapForTesting.java

What was wrong here?

Nothing, it should not have been committed. (I used IntelliJ 2017.1 and it has classpath issues with Gradle.)

tlrx · 2017-04-10T18:57:51Z

@cbuescher Thanks a lot for your review. I think I implemented almost all your comments except the one about tests. Please let me know what you think

cbuescher · 2017-04-10T19:34:23Z

core/src/test/java/org/elasticsearch/search/aggregations/InternalAggregationTestCase.java

I believe this can be provided by overwriting ESTestCase#xContentRegistry().

Oh I see. I didn't think about this method as I wanted to mimic the already existing instanceReader() one, but it makes sense.

I'll update the PR.

cbuescher · 2017-04-10T19:36:05Z

@tlrx thanks a lot, I left one small clarification, I wasn't clear on my part. The rest LGTM from my side.

tlrx · 2017-04-11T07:58:39Z

@cbuescher Thanks! I updated again to use the xContentRegistry() method. Would you like to have another look?

javanna

left some comments, looks great though

javanna · 2017-04-07T22:31:31Z

test/framework/src/main/java/org/elasticsearch/bootstrap/BootstrapForTesting.java

oh oh what happened here? :)

Intellij 2017.1 with gradle :)

javanna · 2017-04-07T22:43:04Z

...ch/search/aggregations/metrics/percentiles/tdigest/InternalTDigestPercentilesRanksTests.java

nit: rename context to name, given that it's a simple string which holds the name. I would love not to need the cast to String here, but I didn't find a way...

javanna · 2017-04-10T20:10:49Z

core/src/main/java/org/elasticsearch/search/aggregations/ParsedAggregation.java

I think you can use the setter that will be added by Christoph's PR once that gets merged

javanna · 2017-04-10T22:36:32Z

...c/main/java/org/elasticsearch/search/aggregations/metrics/percentiles/ParsedPercentiles.java

We are reusing InternalPercentile on purpose here I think. I wonder if it makes sense to rename that class later, as it differs quite a bit from the other internal classes and it can be shared between hl client and es core. Or maybe Percentile should be a class instead of an interface given that it has a single straightforward impl.

I also thought about it when I wrote this. I think it makes sense to use InternalPercentile here and I'm in favor of merging the implementation with its interface in core.

javanna · 2017-04-11T12:35:39Z

.../org/elasticsearch/search/aggregations/metrics/percentiles/hdr/ParsedHDRPercentileRanks.java

shouldn't this implement PercentileRanks ?

Totally, I wonder where it has gone :/

javanna · 2017-04-11T14:17:39Z

core/src/test/java/org/elasticsearch/search/aggregations/InternalAggregationTestCase.java

I think I saw this in Christoph's PR too. Hopefully you don't need it.

javanna · 2017-04-11T14:21:01Z

core/src/test/java/org/elasticsearch/search/aggregations/InternalAggregationTestCase.java

cool! this means that we don't have to write unit tests for each single agg anymore, as this one is generic enough?

That's the idea, yes.

shall we make it final then and remove InternalCardinalityTests#testFromXContent ?

javanna · 2017-04-11T14:22:12Z

core/src/test/java/org/elasticsearch/search/aggregations/InternalAggregationTestCase.java

shall we use //norelease instead of TODOs here just to make sure? Though I don't think those fail the build anymore, so maybe it is the same after all. I am a bit paranoid on forgetting these. I would love to merge this branch back with no TODOs left.

javanna · 2017-04-11T14:31:39Z

...sticsearch/search/aggregations/metrics/percentiles/hdr/InternalHDRPercentilesRanksTests.java

I am wondering if we should have a single final impl for this in the base test class that contains all of the aggs that we can parse. That would be useful later when we will parse nested aggs and bucket aggs.

I think so, but I think it is OK for now and should be easy to change later. I'm in favor of keeping it like this until we have many parsable aggregations.

I would insist on doing it now. Adding a line to an existing method for each agg is going to cost less than having to do it for all aggs afterwards. Any reason why we should postpone this? are we not sure it is the way to go?

Ok. I wanted to change that in a follow up method to limit the set of changes of this PR but OK, let's do it now.

I am ok with a followup as well. but in the previous comments you said "when we have many parsable aggregations". I would do it earlier than that :)

javanna · 2017-04-11T14:51:42Z

...c/main/java/org/elasticsearch/search/aggregations/metrics/percentiles/ParsedPercentiles.java

Nice, was the idea to reuse this also for percentiles (InternalHDRPercentiles and InternalTDigestPercentiles)? That looks quite straight-forward.

Yes, that was the idea. To be honest I have to give it a try but it should be doable

tlrx · 2017-04-13T13:39:47Z

@javanna @cbuescher Thanks for your reviews.

I looked at it again and found a few issues in my previous code, so I updated it again. It now correctly implements the PercentileRanks interface and shares the tests in an InternalPercentilesRanksTestCase class, which randomizes the formatter and also adds more percentile keys/values. To make it work, DocValueFormat has to implement equals/hashCode.

It also shares the parsing logic in an AbstractParsedPercentiles class because I expect it to be reused when parsing percentiles aggregations.

That would be awesome if you could have another look.

cbuescher

Thanks for the update, the additions look good to me. I left a few suggestions and questions. This is really a tough one when it gets to the details, but I think it's almost there.

cbuescher · 2017-04-13T15:27:18Z

...ava/org/elasticsearch/search/aggregations/metrics/percentiles/AbstractParsedPercentiles.java

-        if (percentile == null) {
-            return Double.NaN;
+    InternalPercentile getPercentile(double percent) {
+        for (InternalPercentile percentile : percentiles) {


Does it make sense to keep a Map in addition to the List to make lookups like this faster? I don't know if it's worth it, though, if the expected number of elements in the list is small.

cbuescher · 2017-04-13T15:27:54Z

...ava/org/elasticsearch/search/aggregations/metrics/percentiles/AbstractParsedPercentiles.java

+            }
        }
-        return percentile;
+        return null;


Before this you returned Double.NaN, was that wrong?

cbuescher · 2017-04-13T15:32:55Z

...ava/org/elasticsearch/search/aggregations/metrics/percentiles/AbstractParsedPercentiles.java

                            }
                        }
                    }
-                    if (key != null) {


Can you explain why we don't need the check anymore?

I should not have reverted this, sorry.

cbuescher · 2017-04-13T15:42:10Z

...sticsearch/search/aggregations/metrics/percentiles/tdigest/ParsedTDigestPercentileRanks.java


    private static ObjectParser<ParsedTDigestPercentileRanks, Void> PARSER =
-            new ObjectParser<>("ParsedTDigestPercentileRanks", true, ParsedTDigestPercentileRanks::new);
+            new ObjectParser<>(ParsedTDigestPercentileRanks.class.getSimpleName(), true, ParsedTDigestPercentileRanks::new);


I used the NAME constant used in getWritableName as name for the parser. I guess it doesn't matter much, but maybe we want to stay consistent with this? I think the parser name currently is only used for error messages though.

cbuescher · 2017-04-13T15:46:17Z

.../elasticsearch/search/aggregations/metrics/percentiles/InternalPercentilesRanksTestCase.java

+
+    @Override
+    protected void assertFromXContent(T aggregation, ParsedAggregation parsedAggregation) {
+        super.assertFromXContent(aggregation, parsedAggregation);


Do we need this when the intention is to make super.assertFromXContent abstract?

agreed, I'd remove this line

cbuescher · 2017-04-13T15:52:56Z

.../elasticsearch/search/aggregations/metrics/percentiles/InternalPercentilesRanksTestCase.java

+        for (Percentile percentile : percentileRanks) {
+            Double value = percentile.getValue();
+            assertEquals(percentileRanks.percent(value), parsedPercentileRanks.percent(value), 0);
+            if (format !=  DocValueFormat.RAW) {


I think this assertion should also work if the formatter in the original aggregation is DocValueFormat.RAW. The original object's method returns a value formatted with DocValueFormat.RAW, and I think the parsed version should do the same.

good point.

Good catch, thanks

cbuescher · 2017-04-13T15:56:58Z

.../elasticsearch/search/aggregations/metrics/percentiles/InternalPercentilesRanksTestCase.java

+
+public abstract class InternalPercentilesRanksTestCase<T extends InternalAggregation> extends InternalAggregationTestCase<T> {
+
+    private final boolean keyed = randomBoolean();


This can probably be moved to a local variable in createTestInstance(...)

cbuescher · 2017-04-13T15:57:43Z

.../elasticsearch/search/aggregations/metrics/percentiles/InternalPercentilesRanksTestCase.java

+public abstract class InternalPercentilesRanksTestCase<T extends InternalAggregation> extends InternalAggregationTestCase<T> {
+
+    private final boolean keyed = randomBoolean();
+    private final DocValueFormat format = randomDocValueFormat();


This can probably be moved to a local variable in createTestInstance(...) if it is not needed anymore in assertFromXContent() (see comment below)

cbuescher · 2017-04-13T16:31:05Z

.../elasticsearch/search/aggregations/metrics/percentiles/InternalPercentilesRanksTestCase.java

+
+    private DocValueFormat randomDocValueFormat() {
+        if (randomBoolean()) {
+            return new DocValueFormat.DateTime(DateFieldMapper.DEFAULT_DATE_TIME_FORMATTER, DateTimeZone.UTC);


I wonder if we really need the DateTime formatter in this case. Although I think it's theoretically possible to use it with percentiles aggregations, I doubt that the output would make much sense (I think it truncates any double value from the InternalPercentile to a long and then makes a date out of it). Maybe I missed something, but otherwise could we test with DocValueFormat.Decimal() here instead? I implemented equals/hashCode for that formatter in #24085.

Agreed, thanks

javanna

left a few comments, nothing major. LGTM otherwise.

javanna · 2017-04-14T09:04:06Z

...ava/org/elasticsearch/search/aggregations/metrics/percentiles/AbstractParsedPercentiles.java

+    }
+
+    InternalPercentile getPercentile(double percent) {
+        for (InternalPercentile percentile : percentiles) {


I would rather use a LinkedHashMap for this, why a list that we have to iterate through when retrieving stuff per key?

Oh right that would be better. Not sure why I moved back to a list of internal percentile.
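
A minimal sketch of the point made here, using a hypothetical `OrderedLookupSketch` class rather than the PR's `AbstractParsedPercentiles`: a `LinkedHashMap` gives direct lookup by key while still iterating in insertion order, so the parsed order of the entries is kept without scanning a list.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical class for illustration; names are not taken from the PR.
final class OrderedLookupSketch {
    private final Map<Double, Double> percentiles = new LinkedHashMap<>();

    void add(double key, double value) {
        percentiles.put(key, value);
    }

    // Direct lookup by key; returns null when the key was never added.
    Double get(double key) {
        return percentiles.get(key);
    }

    // Keys in the order they were added, which LinkedHashMap preserves.
    List<Double> keysInOrder() {
        return new ArrayList<>(percentiles.keySet());
    }

    public static void main(String[] args) {
        OrderedLookupSketch sketch = new OrderedLookupSketch();
        sketch.add(99.0, 0.5);
        sketch.add(1.0, 0.1);
        System.out.println(sketch.keysInOrder() + " -> " + sketch.get(1.0));
    }
}
```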

javanna · 2017-04-14T09:06:31Z

...ava/org/elasticsearch/search/aggregations/metrics/percentiles/AbstractParsedPercentiles.java

+                builder.field(String.valueOf(key), percentile.getValue());
+
+                String valueAsString = percentileAsString(key);
+                if (valueAsString != null && valueAsString.isEmpty() == false) {


Strings.hasLength? But when can it happen that the key is empty? isn't that an issue?

javanna · 2017-04-14T09:08:23Z

...ava/org/elasticsearch/search/aggregations/metrics/percentiles/AbstractParsedPercentiles.java

+                    builder.field(CommonFields.KEY.getPreferredName(), key);
+                    builder.field(CommonFields.VALUE.getPreferredName(), percentile.getValue());
+                    String valueAsString = percentileAsString(key);
+                    if (valueAsString != null && valueAsString.isEmpty() == false) {


same as above

javanna · 2017-04-14T09:15:20Z

core/src/test/java/org/elasticsearch/search/aggregations/InternalAggregationTestCase.java

shall we make it final then and remove InternalCardinalityTests#testFromXContent ?

javanna · 2017-04-14T09:18:18Z

.../elasticsearch/search/aggregations/metrics/percentiles/InternalPercentilesRanksTestCase.java

+
+    @Override
+    protected void assertFromXContent(T aggregation, ParsedAggregation parsedAggregation) {
+        super.assertFromXContent(aggregation, parsedAggregation);


agreed, I'd remove this line

javanna · 2017-04-14T09:19:00Z

.../elasticsearch/search/aggregations/metrics/percentiles/InternalPercentilesRanksTestCase.java

+        for (Percentile percentile : percentileRanks) {
+            Double value = percentile.getValue();
+            assertEquals(percentileRanks.percent(value), parsedPercentileRanks.percent(value), 0);
+            if (format !=  DocValueFormat.RAW) {


good point.

javanna · 2017-04-14T09:20:03Z

...sticsearch/search/aggregations/metrics/percentiles/hdr/InternalHDRPercentilesRanksTests.java

I would insist on doing it now. Adding a line to an existing method for each agg is going to cost less than having to do it for all aggs afterwards. Any reason why we should postpone this? are we not sure it is the way to go?

javanna · 2017-04-14T09:34:04Z

.../elasticsearch/search/aggregations/metrics/percentiles/InternalPercentilesRanksTestCase.java

+            cdfValues[i] = randomCdfValues.get(i);
+        }
+
+        TDigestState state = new TDigestState(100);


question: I see this method moved to this new base class. But is TDigestState also used in HDR percentile ranks? I thought it applied only to standard percentile ranks, maybe I got this wrong though.

Also, if this method is not supposed to be overridden, make it final?

This is a leftover, sorry for the confusion. I marked the method final +1

javanna · 2017-04-14T09:34:22Z

.../elasticsearch/search/aggregations/metrics/percentiles/InternalPercentilesRanksTestCase.java

+                                            double[] cdfValues, boolean keyed, DocValueFormat format);
+
+    @Override
+    protected void assertFromXContent(T aggregation, ParsedAggregation parsedAggregation) {


make it final?

javanna

left two minors, LGTM otherwise

javanna · 2017-04-14T14:58:54Z

...ava/org/elasticsearch/search/aggregations/metrics/percentiles/AbstractParsedPercentiles.java

+
+public abstract class AbstractParsedPercentiles extends ParsedAggregation implements Iterable<Percentile>  {
+
+    private final Map<Double, Double> percentiles = new LinkedHashMap<>();


nit: (could also be a followup) is there a test that fails if we switch to the wrong impl? Maybe we should compare the iterator returned by the internal object and the iterator returned by the parsed object?

nit: (could also be a followup) is there a test that fails if we switch to the wrong impl?

No, but I'm not sure I understand what you mean.

Maybe we should compare the iterator returned by the internal object and the iterator returned by the parsed object?

I thought it was the case, but I looked again and added more verifications in #24160: it compares both iterators obtained from the internal aggregation and the parsed one. It revealed a bug in the parsed implementations where the Percentile objects retrieved from the iterators were not equal. I looked at the internal implementation again and the percents/values are inverted (compared to the internal percentiles, NOT ranks, aggregations), so I changed the parsed implementation to behave the same.

the parsing side of things where key & value are "inverted" (compared to the percentiles NOT rank aggregations)

perfect, thanks for doing that. always nice when adding tests leads to finding bugs :)
I previously meant that I would like to see a test fail if we lose ordering, by going back to HashMap rather than LinkedHashMap. I think that should be the case with #24160 .

OK, got it. Yes it is now checked in #24160.
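
The kind of check discussed in this thread can be sketched as an element-by-element comparison of two iterators, which fails as soon as the ordering differs. This is only an illustration with a hypothetical helper name (`sameSequence`); the actual verification was added in #24160 and is not reproduced here.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical helper for illustration; not the test code from #24160.
final class IterationOrderCheck {
    // Returns true only if both iterators yield equal elements in the same order.
    static <T> boolean sameSequence(Iterator<? extends T> expected, Iterator<? extends T> actual) {
        while (expected.hasNext() && actual.hasNext()) {
            if (!Objects.equals(expected.next(), actual.next())) {
                return false;
            }
        }
        // Both must be exhausted, otherwise one side had extra elements.
        return !expected.hasNext() && !actual.hasNext();
    }

    public static void main(String[] args) {
        Map<Double, Double> internal = new LinkedHashMap<>();
        internal.put(99.0, 0.5);
        internal.put(1.0, 0.1);
        Map<Double, Double> parsed = new LinkedHashMap<>(internal);
        System.out.println(sameSequence(internal.entrySet().iterator(), parsed.entrySet().iterator()));
    }
}
```

A comparison like this is order-sensitive, unlike `Map.equals`, which is why it can catch a switch to a map implementation that does not guarantee iteration order.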

javanna · 2017-04-14T15:00:51Z

core/src/test/java/org/elasticsearch/search/aggregations/InternalAggregationTestCase.java

+    static List<NamedXContentRegistry.Entry> getNamedXContents() {
+        Map<String, ContextParser<Object, ? extends Aggregation>> namedXContents = new HashMap<>();
+        namedXContents.put(InternalHDRPercentileRanks.NAME, (p, c) -> ParsedHDRPercentileRanks.fromXContent(p, (String) c));
+        namedXContents.put(InternalTDigestPercentileRanks.NAME, (p, c) -> ParsedTDigestPercentileRanks.fromXContent(p, (String) c));


shouldn't we have cardinality here too? I hope the cardinality testFromXContent fails otherwise?

Yes, I added it here. No, the test doesn't fail because it allows the testFromXContent() test to fail without errors for all the aggregations that are not parsable yet (see the try/catch block in the test method)

oh right. I think we should make this test a bit smarter, otherwise it's trappy. Can we run the test only for known aggs, maybe using an assume? or ignore the exception only when it's correct to do so?

I always forget about assume. This is a good suggestion, thanks. I fixed this in #24160

tlrx · 2017-04-18T08:19:49Z

Thanks a lot @cbuescher @javanna

This commit adds the logic for parsing the percentiles ranks aggregations.

tlrx added :Java High Level REST Client WIP labels Apr 7, 2017

tlrx requested review from cbuescher and javanna April 7, 2017 15:44

javanna changed the base branch from master to feature/client_aggs_parsing April 7, 2017 22:30

cbuescher requested changes Apr 10, 2017

tlrx force-pushed the add-parsing-for-percentiles-ranks branch from a0e083d to f41909e Compare April 10, 2017 18:52

tlrx added review v6.0.0-alpha1 and removed WIP labels Apr 10, 2017

tlrx changed the title ~~WIP: Add parsing for percentiles ranks~~ Add parsing for percentiles ranks Apr 10, 2017

cbuescher reviewed Apr 10, 2017

cbuescher approved these changes Apr 10, 2017

javanna removed the v6.0.0-alpha1 label Apr 10, 2017

javanna requested changes Apr 11, 2017

tlrx added 4 commits April 13, 2017 09:26

First shot of parsing percentiles aggregations

d3ecaaf

Update after Christoph review

b8e1bb1

Override xContentRegistry()

b5f279f

Update after Luca review

4e7b1a8

tlrx force-pushed the add-parsing-for-percentiles-ranks branch from 7a8589a to 4e7b1a8 Compare April 13, 2017 13:35

cbuescher reviewed Apr 13, 2017

javanna approved these changes Apr 14, 2017

Apply review feedback

040023d

javanna approved these changes Apr 14, 2017

Add InternalCardinality

8e75076

tlrx merged commit c0036d8 into elastic:feature/client_aggs_parsing Apr 18, 2017

tlrx deleted the add-parsing-for-percentiles-ranks branch April 18, 2017 08:20

tlrx mentioned this pull request Apr 19, 2017

Add parsing methods for Percentiles aggregations #24183

Merged

javanna mentioned this pull request May 22, 2017

Add aggs parsers for high level REST Client #24824

Merged

javanna pushed a commit to javanna/elasticsearch that referenced this pull request May 23, 2017

Add parsing for percentiles ranks (elastic#23974)

b450c9d

This commit adds the logic for parsing the percentiles ranks aggregations.

javanna mentioned this pull request May 23, 2017

Backport aggs parsers for high level REST Client #24844

Merged

