-
Notifications
You must be signed in to change notification settings - Fork 25.6k
Add parsing to Significant Terms aggregations #24682
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add parsing to Significant Terms aggregations #24682
Conversation
javanna
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
left a few minors, LGTM otherwise
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can this happen in practice?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, I should have changed that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same as above, just wondering if this can really happen, when can a key be null?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, we can return key.utf8ToString() directly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
++
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this method needed? I think it isn't used anywhere?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems it's not used but I prefer to leave it in core - I saw something about it might be needed in scripted aggs or something. I'll try to figure this out, and if it's unused then I'll create a separate PR in core.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sounds good.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Created #24714 to remove the method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove one of these empty lines? ;)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that to make this work as part of AggregationsTests you need to rename this method to setUp and remove before annotation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The AggregationsTests passed so I didn't catch it. I changed that, thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok weird that it worked. does the format have a default value in case this method is not called?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No... but the output does not depend on the format (or heuristic) so a null format produces a valid XContent output that is parseable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this one we set although it is never printed out as part of toXContent?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We render the doc count which is in fact the subsetDf.
cbuescher
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, I left a few minor comments that might either be addressed, ignored or handled as follow ups, as you prefer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we reuse the constants used in InternalSignificantTerms by making them public? I think we did that elsewhere and also do this with the CommonFields.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure - I changed them in core specially and after that I forgot to use them :/ Thanks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we lazily compute a bucketMap like in InternalMappedSignificantTerms and then be able reuse it when this method gets called many times? I have no strong opinions on this though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't have too... I tend to think that it can be done on caller's side if a lot of bucket are going to be retrieve by their keys. I'll add one for the sake of coherency with the internal implementations and other parsed aggregations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also added a test about this method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We render a subsetSize as the DOC_COUNT field in the surrounding aggregation, I'm not sure if this is equivalent with the bucket subset size but maybe it could be used here? Probably would need some checking with somebody who knows the aggregation better though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure how they are related to be honest. Since we don't render the superset size and subset size I think it's ok to throw an unsupported operation exception here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Discussion that gives more background around where these fields come from: #5146 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was wondering if getSuperset/SubsetSize is part of the Bucket interface but not rendered via the Rest response, should we either add rendering of these values to the bucket response or remove it from the interface to get equivalent behaviour of functionality of the transport client with the high level rest client here? I think this can be done in a separate issue though, maybe its not needed at all.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe @markharwood has an opinion on this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm okay with the UnsupportedOperationException for now if we can track this question (whether we can reach consistency between the functionality the transport client provides via the SignificantTerms.Bucket interface with the rest response) in a separate issue
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This took me some time to wrap my head around this, but I finally created #24865 to address this.
|
Thanks @javanna @cbuescher ! I updated a bit, would you like to have another look please? |
|
Update looks good to me. |
a861bd7 to
cafdb94
Compare
|
Thanks @javanna @cbuescher ! |
This pull request adds parsing methods for the
SignificantStringTermsandSignificantLongTermsaggregations.