-
Notifications
You must be signed in to change notification settings - Fork 25.6k
Add parsing method for Matrix Stats #24746
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add parsing method for Matrix Stats #24746
Conversation
cc3a6aa to
c65c4ea
Compare
c65c4ea to
d6abe69
Compare
|
@cbuescher @javanna This is ready to be reviewed. |
| @Override | ||
| public XContentBuilder doXContentBody(XContentBuilder builder, Params params) throws IOException { | ||
| if (results != null && results.getFieldCounts().keySet().isEmpty() == false) { | ||
| if (results != null) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had to change this so that at parsing time we can make the distinction between no existing results and empty results.
When there is no results, the rendered output will be something like "matrix_stats":{} and empty results will be "matrix_stats":{"fields":[]}. This allows us to reproduce the same behavior of MatrixStats methods in the transport client and parsed implementations.
I'm not sure that "no results" is a real use case but InternalMatrixStats has specific checks at several for results being null (and it's also randomized in tests) so it let me think that it can happen.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at InternalMatrixStats#doReduce() it seems to me that at least after the aggregations are reduced (which I think should be always the case for aggregations that are rendered to REST) the results should always be set. Maybe we can check with somebody who knows more about the internals and work with that assumption on the client side instead of making this change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, after looking again results is never null once the aggregations are reduced. It can though be null at the transport level when shard results are sent over the wire before being reduced on the coordinating node.
It explains why results is randomly set to null in InternalMatrixStats because serialization is tested there. I reverted this change and only test the normal case.
cbuescher
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tlrx thanks, I left a few questions and comments, especially around whether we need to test the extra "no result" case. Maybe this is something to can check with somebody who knows the internals of this aggregation better. It would simplify this PR a bit I think.
| @Override | ||
| public XContentBuilder doXContentBody(XContentBuilder builder, Params params) throws IOException { | ||
| if (results != null && results.getFieldCounts().keySet().isEmpty() == false) { | ||
| if (results != null) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at InternalMatrixStats#doReduce() it seems to me that at least after the aggregations are reduced (which I think should be always the case for aggregations that are rendered to REST) the results should always be set. Maybe we can check with somebody who knows more about the internals and work with that assumption on the client side instead of making this change.
|
|
||
| @Override | ||
| public long getDocCount() { | ||
| throw new UnsupportedOperationException(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we don't render this to the REST response, but its part of the MatrixStats interface, should we either add this value to the output or remove the method from the interface?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it makes sense to add this field to the REST response. I can do that in another PR against master.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
++
| if (covariances == null) { | ||
| return Double.NaN; | ||
| } | ||
| return checkedGet(checkedGet(covariances, fieldX), fieldY); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I'm not entirely sure this does exactly what getValFromUpperTriangularMatrix() does (e.g. I think swapping the fieldX, fieldY arguments should return the same result in the original implementation, I'm not entirely sure though). It should be possible to declare that method as static in MatrixStatsResults and then use it from here to get the same behaviour.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea, thanks
| if (fieldX != null && fieldX.equals(fieldY)) { | ||
| return 1.0; | ||
| } | ||
| return checkedGet(checkedGet(correlations, fieldX), fieldY); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here about using getValFromUpperTriangularMatrix() if possible.
| Map<String, Object> metaData) { | ||
| public void setUp() throws Exception { | ||
| super.setUp(); | ||
| hasMatrixStatsResults = randomBoolean(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Based on the discussion above (whether we need to account for the "has no results" case at all after "reduce", I would opt for at least testing it less frequent, and thest the "result" case mostly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure
| } | ||
|
|
||
| final String unknownField = randomAlphaOfLength(3); | ||
| final String fieldX = randomFrom(unknownField, randomAlphaOfLength(3)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure I understand what this is doing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, it's not really readable. I changed that. The idea is to test getCovariance/getCorrelation with various unknown fields.
| matrixStats.skewness = new LinkedHashMap<>(size); | ||
| matrixStats.kurtosis = new LinkedHashMap<>(size); | ||
| matrixStats.covariances = new LinkedHashMap<>(size); | ||
| matrixStats.correlations = new LinkedHashMap<>(size); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we need to maintain insertion order in all these maps?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, but at least on counts so that we can render the parsed aggregation with the same order of fields stats.
|
Thanks @cbuescher and @javanna for your reviews. I updated the code according to your comments. I think this PR could be merged without #24776. Just in case I added a //norelease comment to not forget about updating the parsing logic and test once the change in core is merged. |
cbuescher
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
| runningStats.add(fields, values); | ||
| MatrixStatsResults matrixStatsResults = hasMatrixStatsResults ? new MatrixStatsResults(runningStats) : null; | ||
| return new InternalMatrixStats("_name", 1L, runningStats, matrixStatsResults, Collections.emptyList(), Collections.emptyMap()); | ||
| return new InternalMatrixStats(name, 1L, runningStats, matrixStatsResults, Collections.emptyList(), Collections.emptyMap()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch ;-)
javanna
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
|
Thanks @javanna and @cbuescher |
This PR adds the parsing logic for the
InternalMatrixStatsaggregation.