Skip to content

Conversation

@tlrx
Copy link
Member

@tlrx tlrx commented Apr 7, 2017

This is an attempt to parse InternalHDRPercentilesRanks and InternalTDigestPercentilesRanks aggregation.

@tlrx tlrx requested review from cbuescher and javanna April 7, 2017 15:44
@javanna javanna changed the base branch from master to feature/client_aggs_parsing April 7, 2017 22:30
Copy link
Member

@cbuescher cbuescher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tlrx thanks, I had a first look, I think it looks great given that this is already a very complicated aggregation to parse. I left a few comments, most are just considered to be questions or suggestions.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Already mentioned somewhere else, but adding again for reference: I think we need to implement the same logic here as in InternalAggregation (either treat null/empty() different here or change this in InternalAggregation).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I understand, the type will be constant. I don't think we should make it a stor argument, instead use some constant like PercentileRanks.TYPE_NAME in getType() directly.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Already mentioned above, can we use PercentileRanks.TYPE_NAME here directly?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe its just me disliking Tuple, but if we could keep two maps (e.g. percentiles, percentilesAsString) that would avoid a lot of the v1(), v2() usage that I find a bit confusing to read later in this class.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yuck! I think I see now why these response keys make using ObjectParser almost impossible. Nothing to do about this unfortunately, I think.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, didn't know map.compute(), seems to be handy. I still wonder if this logic could be a bit simpler to read if we had two maps instead of Map<Double, Tuple<Double, String>>.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I see the use of the type in the ctor now. I'd still consider individual contants in ParsedHDRPercentileRanks and ParsedTDigestPercentileRanks and overwriting getType() separately as an alternative. Maybe just different tastes, nothing big.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right, override getType would be more readable, thanks

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of providing the instance parser by overwriting this method in the subtests, can subtest simply provide their own xContentRegistry with the appropriate parser by overwriting getNamedWriteableRegistry() from AbstractWireSerializingTestCase? I might be missing something though.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NamedWriteableRegistry and NamedXContentRegistry are two different beasts, so maybe you're suggestion to have an abstract method like getNamedXContentRegistry in each test?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, my bad. What I meant was overwriting xContentRegistry() from ESTestCase, I mixed that up. That way you can e.g. provide the parser that you need in the test without the need of instanceParser(). I don't mind either way, feel free to use it or not.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh ok, I see what you meant.

Yes, that's the way to go, I didn't think about overriding xContentRegistry() and that will help. I usd instanceParser() to mimic instanceReader() we already have, but you suggestion is better. I'll update the pull request.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: reactivate randomization

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

woops, thanks :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What was wrong here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nothing, it should not have been commited. (I used intellij 2017.1 and it has classpath issues with Gradle)

@tlrx tlrx force-pushed the add-parsing-for-percentiles-ranks branch from a0e083d to f41909e Compare April 10, 2017 18:52
@tlrx tlrx changed the title WIP: Add parsing for percentiles ranks Add parsing for percentiles ranks Apr 10, 2017
@tlrx
Copy link
Member Author

tlrx commented Apr 10, 2017

@cbuescher Thanks a lot for your review. I think I implemented almost all your comments except the one about tests. Please let me know what you think

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this can be provided by overwriting EsTestCase#xContentRegistry().

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh I see. I didn't think about this method as I wanted to mimic the already existing instanceReader() one, but it makes sense.

I'll update the PR.

@cbuescher
Copy link
Member

@tlrx thanks a lot, I left one small clarification, I wasn't clear on my part. The rest LGTM from my side.

@tlrx
Copy link
Member Author

tlrx commented Apr 11, 2017

@cbuescher Thanks! I updated again to use the xContentRegistry() method. Would you like to have another look?

Copy link
Member

@javanna javanna left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

left some comments, looks great though

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh oh what happened here? :)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Intellij 2017.1 with gradle :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: call context name given that it's a simple string which holds the name. I would love not to need the cast here to String but I didn't find a way...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can use the setter that will be added by Christoph's PR once that gets merged

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are reusing InternalPercentile on purpose here I think. I wonder if it makes sense to rename that class later, as it differs quite a bit from the other internal classes and it can be shared between hl client and es core. Or maybe Percentile should be a class instead of an interface given that it has a single straightforward impl.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also thought about it when I wrote this. I think it makes sense to use InternalPercentile here and I'm in favor of merging the implementation with its interface in core.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't this implement PercentileRanks ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Totally, I wonder where it has gone :/

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I saw this in Christoph's PR too. Hopefully you don't need it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cool! this means that we don't have to write unit tests for each single agg anymore, as this one is generic enough?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's the idea, yes.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shall we make it final then and remove InternalCardinalityTests#testFromXContent ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Absolutely

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shall we use //norelease instead of TODOs here just to make sure? Though I don't think those fail the build anymore, so maybe it is the same after all. I am a bit paranoid on forgetting these. I would love to merge this branch back with no TODOs left.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am wondering if we should have a single final impl for this in the base test class that contains all of the aggs that we can parse. That would be useful later when we will parse nested aggs and bucket aggs.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think so, but I think it is OK for now and should be easy to change later. I'm in favor of keeping it like this until we have many parsable aggregations.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would insist on doing it now. Adding a line to an existing method for each agg is going to cost less than having to do it for all aggs afterwards. Any reason why we should postpone this? are we not sure it is the way to go?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. I wanted to change that in a follow up method to limit the set of changes of this PR but OK, let's do it now.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am ok with a followup as well. but in the previous comments you said "when we have many parsable aggregations". I would do it earlier than that :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, was the idea to reuse this also for percentiles (InternalHDRPercentiles and InternalTDigestPercentiles)? That looks quite straight-forward.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that was the idea. To be honest I have to give it a try but it should be doable

@tlrx tlrx force-pushed the add-parsing-for-percentiles-ranks branch from 7a8589a to 4e7b1a8 Compare April 13, 2017 13:35
@tlrx
Copy link
Member Author

tlrx commented Apr 13, 2017

@javanna @cbuescher Thanks for your reviews.

I looked at it again and I found few issues in my previous code, so I updated it again. It now correctly implement the PercentileRanks interface and mutualize the tests in a InternalPercentilesRanksTestCase class which randomized the formatter and also adds more percentiles keys/values. To make it work, DocValueFormat has to implements equals/hascode.

It also mutualize the parsing logic in a AbstractParsedPercentiles class because I expect it to be reused when parsing percentiles aggregations.

That would be awesome if you could have another look.

Copy link
Member

@cbuescher cbuescher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the update, the additions look good to me. I left a few suggestions and questions, this is really a tough one when it gets to the details but I think its almost there.

if (percentile == null) {
return Double.NaN;
InternalPercentile getPercentile(double percent) {
for (InternalPercentile percentile : percentiles) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does is make sense to keep a Map in addition to the List to make lookups like this faster? I don't know if its worth it though if the expected number of elements in the list is small.

}
}
return percentile;
return null;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before this you returnes Double.NaN, was that wrong?

}
}
}
if (key != null) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain why we don't need the check anymore?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I should not have reverted this, sorry.


private static ObjectParser<ParsedTDigestPercentileRanks, Void> PARSER =
new ObjectParser<>("ParsedTDigestPercentileRanks", true, ParsedTDigestPercentileRanks::new);
new ObjectParser<>(ParsedTDigestPercentileRanks.class.getSimpleName(), true, ParsedTDigestPercentileRanks::new);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I used the NAME constant used in getWritableName as name for the parser. I guess it doesn't matter much, but maybe we want to stay consistent with this? I think the parser name currently is only used for error messages though.


@Override
protected void assertFromXContent(T aggregation, ParsedAggregation parsedAggregation) {
super.assertFromXContent(aggregation, parsedAggregation);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need this when the intention is to make super.assertFromXContent abstract?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agreed, I'd remove this line

for (Percentile percentile : percentileRanks) {
Double value = percentile.getValue();
assertEquals(percentileRanks.percent(value), parsedPercentileRanks.percent(value), 0);
if (format != DocValueFormat.RAW) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this assertion should also work if the formatter in the original aggregation is DocValueFormat.RAW. The original objects method returns a value formatted with DocValueFormat.RAW, I think the parsed version should do the same.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good point.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, thanks


public abstract class InternalPercentilesRanksTestCase<T extends InternalAggregation> extends InternalAggregationTestCase<T> {

private final boolean keyed = randomBoolean();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can probably moved to a local variable in createTestInstance(...)

public abstract class InternalPercentilesRanksTestCase<T extends InternalAggregation> extends InternalAggregationTestCase<T> {

private final boolean keyed = randomBoolean();
private final DocValueFormat format = randomDocValueFormat();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can probably moved to a local variable in createTestInstance(...) if it is not needed anymore in assertFromXContent() (see comment below)


private DocValueFormat randomDocValueFormat() {
if (randomBoolean()) {
return new DocValueFormat.DateTime(DateFieldMapper.DEFAULT_DATE_TIME_FORMATTER, DateTimeZone.UTC);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if in this case we really need the DateTime formatter. Although I think its theoretically possible to use it with percentiles aggregations I doubt that the output would make much sense (I think it truncates any double value from the InternalPercentile to a long and then make a date out of it, but that output doesn't make much sense). Maybe I missed something, but otherwise we could test with DocValueFormat.Decimal() here instead? I implemented equals/hashCode for that formatter in #24085.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, thanks

Copy link
Member

@javanna javanna left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

left a few comments, nothing major. LGTM otherwise.

}

InternalPercentile getPercentile(double percent) {
for (InternalPercentile percentile : percentiles) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would rather use a LinkedHashMap for this, why a list that we have to iterate through when retrieving stuff per key?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh right that would be better. Not sure why I moved back to a list of internal percentile.

builder.field(String.valueOf(key), percentile.getValue());

String valueAsString = percentileAsString(key);
if (valueAsString != null && valueAsString.isEmpty() == false) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Strings.hasLength? But when can it happen that the key is empty? isn't that an issue?

builder.field(CommonFields.KEY.getPreferredName(), key);
builder.field(CommonFields.VALUE.getPreferredName(), percentile.getValue());
String valueAsString = percentileAsString(key);
if (valueAsString != null && valueAsString.isEmpty() == false) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same as above

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shall we make it final then and remove InternalCardinalityTests#testFromXContent ?


@Override
protected void assertFromXContent(T aggregation, ParsedAggregation parsedAggregation) {
super.assertFromXContent(aggregation, parsedAggregation);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agreed, I'd remove this line

for (Percentile percentile : percentileRanks) {
Double value = percentile.getValue();
assertEquals(percentileRanks.percent(value), parsedPercentileRanks.percent(value), 0);
if (format != DocValueFormat.RAW) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good point.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would insist on doing it now. Adding a line to an existing method for each agg is going to cost less than having to do it for all aggs afterwards. Any reason why we should postpone this? are we not sure it is the way to go?

cdfValues[i] = randomCdfValues.get(i);
}

TDigestState state = new TDigestState(100);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question: I see this method moved to this new base class. But is TDigestState also used in HDR percentile ranks? I thought it applied only to standard percentile ranks, maybe I got this wrong though.

Also, if this method is not supposed to be overridden, make it final?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a left over, sorry for the confusion. I marked the method final +1

double[] cdfValues, boolean keyed, DocValueFormat format);

@Override
protected void assertFromXContent(T aggregation, ParsedAggregation parsedAggregation) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

make it final?

Copy link
Member

@javanna javanna left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

left two minors, LGTM otherwise


public abstract class AbstractParsedPercentiles extends ParsedAggregation implements Iterable<Percentile> {

private final Map<Double, Double> percentiles = new LinkedHashMap<>();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: (could also be a followup) is there a test that fails if we switch to the wrong impl? Maybe we should compare the iterator returned but the internal object and the iterator returned by the parsed object?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: (could also be a followup) is there a test that fails if we switch to the wrong impl?

No, but I'm not sure to understand what you mean.

Maybe we should compare the iterator returned but the internal object and the iterator returned by the parsed object?

I thought it was the case but I looked again and added more verifications in #24160 - it compares both iterators obtained from the internal aggregation and the parsed one. It raised a bug in the parsed implementations where the Percentile retrieved from iterators where not equals. I looked at the internal implementation again and the percents/values are inverted (compared to the internal percentiles NOT ranks aggregations) so I changed the parsed implementation to behave the same.

the parsing side of things where key & value are "inverted" (compared to the percentiles NOT rank aggregations)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

perfect, thanks for doing that. always nice when adding tests leads to finding bugs :)
I previously meant that I would like to see a test fail if we lose ordering, by going back to HashMap rather than LinkedHashMap. I think that should be the case with #24160 .

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, got it. Yes it is now checked in #24160.

static List<NamedXContentRegistry.Entry> getNamedXContents() {
Map<String, ContextParser<Object, ? extends Aggregation>> namedXContents = new HashMap<>();
namedXContents.put(InternalHDRPercentileRanks.NAME, (p, c) -> ParsedHDRPercentileRanks.fromXContent(p, (String) c));
namedXContents.put(InternalTDigestPercentileRanks.NAME, (p, c) -> ParsedTDigestPercentileRanks.fromXContent(p, (String) c));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't we have cardinality here too? I hope the cardinality testFromXContent fails otherwise?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I added it here. No, the test doesn't fail because it allows the testFromXContent() test to fail without errors for all the aggregations that are not parsable yet (see the try/catch block in the test method)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh right. I think we should make this test a bit smarter otherwise it's trappy. Can we run the test only for known aggs, may using an assume? or ignore the exception only when it's correct to do so?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I always forgot about assume. This is a good suggestion, thanks. I fixed this in #24160

@tlrx tlrx merged commit c0036d8 into elastic:feature/client_aggs_parsing Apr 18, 2017
@tlrx
Copy link
Member Author

tlrx commented Apr 18, 2017

Thanks a lot @cbuescher @javanna

@tlrx tlrx deleted the add-parsing-for-percentiles-ranks branch April 18, 2017 08:20
javanna pushed a commit to javanna/elasticsearch that referenced this pull request May 23, 2017
This commit adds the logic for parsing the percentiles ranks aggregations.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants