Add parsing method for Matrix Stats #24746

tlrx · 2017-05-17T16:21:57Z

This PR adds the parsing logic for the InternalMatrixStats aggregation.

tlrx · 2017-05-18T11:32:38Z

@cbuescher @javanna This is ready to be reviewed.

tlrx · 2017-05-18T11:37:51Z

...ts/src/main/java/org/elasticsearch/search/aggregations/matrix/stats/InternalMatrixStats.java

    @Override
    public XContentBuilder doXContentBody(XContentBuilder builder, Params params) throws IOException {
-        if (results != null && results.getFieldCounts().keySet().isEmpty() == false) {
+        if (results != null) {


I had to change this so that at parsing time we can make the distinction between no existing results and empty results.

When there are no results, the rendered output will be something like "matrix_stats":{}, and empty results will be "matrix_stats":{"fields":[]}. This allows us to reproduce the same behavior of the MatrixStats methods in both the transport client and the parsed implementations.
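To make the two shapes concrete, here is a minimal sketch of how a parser could tell "no results" apart from "empty results". It uses a plain Map as a stand-in for the parsed JSON object, and parseFields is a hypothetical name, not part of ParsedMatrixStats.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;

public class MatrixStatsShapes {

    /** Hypothetical helper: returns null for "no results", an empty list for "empty results". */
    static List<Object> parseFields(Map<String, Object> matrixStatsObject) {
        // "matrix_stats":{} -> no "fields" key at all -> no results
        if (matrixStatsObject.containsKey("fields") == false) {
            return null;
        }
        // "matrix_stats":{"fields":[]} -> results exist but contain no field stats
        @SuppressWarnings("unchecked")
        List<Object> fields = (List<Object>) matrixStatsObject.get("fields");
        return fields;
    }

    public static void main(String[] args) {
        Map<String, Object> noResults = Collections.emptyMap();
        Map<String, Object> emptyResults = Collections.singletonMap("fields", Collections.emptyList());
        System.out.println(parseFields(noResults));    // null
        System.out.println(parseFields(emptyResults)); // []
    }
}
```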

I'm not sure that "no results" is a real use case, but InternalMatrixStats checks for null results in several places (and it's also randomized in tests), which led me to think that it can happen.

Looking at InternalMatrixStats#doReduce() it seems to me that at least after the aggregations are reduced (which I think should be always the case for aggregations that are rendered to REST) the results should always be set. Maybe we can check with somebody who knows more about the internals and work with that assumption on the client side instead of making this change.

I agree: after looking again, results is never null once the aggregations are reduced. It can, however, be null at the transport level, when shard results are sent over the wire before being reduced on the coordinating node.

This explains why results is randomly set to null in the InternalMatrixStats tests: serialization is tested there. I reverted this change and now only test the normal case.

cbuescher

@tlrx thanks, I left a few questions and comments, especially around whether we need to test the extra "no result" case. Maybe this is something we can check with somebody who knows the internals of this aggregation better. I think it would simplify this PR a bit.

cbuescher · 2017-05-18T13:13:32Z

...ts/src/main/java/org/elasticsearch/search/aggregations/matrix/stats/InternalMatrixStats.java

    @Override
    public XContentBuilder doXContentBody(XContentBuilder builder, Params params) throws IOException {
-        if (results != null && results.getFieldCounts().keySet().isEmpty() == false) {
+        if (results != null) {


Looking at InternalMatrixStats#doReduce() it seems to me that at least after the aggregations are reduced (which I think should be always the case for aggregations that are rendered to REST) the results should always be set. Maybe we can check with somebody who knows more about the internals and work with that assumption on the client side instead of making this change.

cbuescher · 2017-05-18T13:17:41Z

...tats/src/main/java/org/elasticsearch/search/aggregations/matrix/stats/ParsedMatrixStats.java

+
+    @Override
+    public long getDocCount() {
+        throw new UnsupportedOperationException();


If we don't render this to the REST response, but it's part of the MatrixStats interface, should we either add this value to the output or remove the method from the interface?

I think it makes sense to add this field to the REST response. I can do that in another PR against master.

cbuescher · 2017-05-18T13:32:11Z

...tats/src/main/java/org/elasticsearch/search/aggregations/matrix/stats/ParsedMatrixStats.java

+        if (covariances == null) {
+            return Double.NaN;
+        }
+        return checkedGet(checkedGet(covariances, fieldX), fieldY);


nit: I'm not sure this does exactly what getValFromUpperTriangularMatrix() does. For example, I think swapping the fieldX and fieldY arguments returns the same result in the original implementation, though I'm not certain. It should be possible to declare that method as static in MatrixStatsResults and then use it from here to get the same behaviour.
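For illustration, here is a minimal sketch of a symmetric lookup in which swapping fieldX and fieldY returns the same value. It assumes each unordered pair is stored only once, in either orientation; getCovariance here is a hypothetical stand-in, not the actual getValFromUpperTriangularMatrix() code. The null check mirrors the NaN behaviour in the diff above.

```java
import java.util.HashMap;
import java.util.Map;

public class UpperTriangularLookup {

    /** Hypothetical sketch: look up (x, y), then fall back to (y, x). */
    static double getCovariance(Map<String, Map<String, Double>> covariances, String fieldX, String fieldY) {
        if (covariances == null) {
            // Same convention as the parsed implementation: no stats means NaN.
            return Double.NaN;
        }
        Map<String, Double> row = covariances.get(fieldX);
        if (row != null && row.containsKey(fieldY)) {
            return row.get(fieldY);
        }
        row = covariances.get(fieldY);
        if (row != null && row.containsKey(fieldX)) {
            return row.get(fieldX);
        }
        throw new IllegalArgumentException("unknown field pair [" + fieldX + ", " + fieldY + "]");
    }

    public static void main(String[] args) {
        Map<String, Map<String, Double>> covariances = new HashMap<>();
        covariances.computeIfAbsent("a", k -> new HashMap<>()).put("b", 2.5);
        System.out.println(getCovariance(covariances, "a", "b")); // 2.5
        System.out.println(getCovariance(covariances, "b", "a")); // 2.5
    }
}
```

Reusing one shared static helper, as suggested, avoids having two lookups that might disagree on argument order.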

Good idea, thanks

cbuescher · 2017-05-18T13:32:30Z

...tats/src/main/java/org/elasticsearch/search/aggregations/matrix/stats/ParsedMatrixStats.java

+        if (fieldX != null && fieldX.equals(fieldY)) {
+            return 1.0;
+        }
+        return checkedGet(checkedGet(correlations, fieldX), fieldY);


Same here about using getValFromUpperTriangularMatrix() if possible.
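A similar sketch for correlations, combining the diagonal shortcut from the diff (a field correlated with itself is 1.0) with an order-insensitive lookup. The storage layout and the exception for unknown pairs are assumptions for illustration, not the actual ParsedMatrixStats behaviour.

```java
import java.util.HashMap;
import java.util.Map;

public class CorrelationLookup {

    /** Hypothetical sketch: diagonal is implied, off-diagonal pairs are stored once. */
    static double getCorrelation(Map<String, Map<String, Double>> correlations, String fieldX, String fieldY) {
        if (fieldX != null && fieldX.equals(fieldY)) {
            // A field is perfectly correlated with itself.
            return 1.0;
        }
        // Try (x, y) first, then (y, x), so argument order does not matter.
        String[][] orientations = {{fieldX, fieldY}, {fieldY, fieldX}};
        for (String[] pair : orientations) {
            Map<String, Double> row = correlations.get(pair[0]);
            if (row != null && row.containsKey(pair[1])) {
                return row.get(pair[1]);
            }
        }
        throw new IllegalArgumentException("unknown field pair [" + fieldX + ", " + fieldY + "]");
    }

    public static void main(String[] args) {
        Map<String, Map<String, Double>> correlations = new HashMap<>();
        correlations.computeIfAbsent("a", k -> new HashMap<>()).put("b", 0.8);
        System.out.println(getCorrelation(correlations, "a", "a")); // 1.0
        System.out.println(getCorrelation(correlations, "b", "a")); // 0.8
    }
}
```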

cbuescher · 2017-05-18T13:43:15Z

...c/test/java/org/elasticsearch/search/aggregations/matrix/stats/InternalMatrixStatsTests.java

-                                                     Map<String, Object> metaData) {
+    public void setUp() throws Exception {
+        super.setUp();
+        hasMatrixStatsResults = randomBoolean();


Based on the discussion above (whether we need to account for the "has no results" case at all after reduce), I would opt for testing it less frequently and mostly testing the "has results" case.
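One way to bias the randomization toward the "has results" case is sketched below with plain java.util.Random. The 10% probability is a hypothetical choice, and hasMatrixStatsResults(Random) is an illustrative helper rather than the test's actual code.

```java
import java.util.Random;

public class BiasedRandomization {

    // Hypothetical probability: build the "no results" instance in roughly 1 of 10 runs.
    static final double NO_RESULTS_PROBABILITY = 0.1;

    static boolean hasMatrixStatsResults(Random random) {
        return random.nextDouble() >= NO_RESULTS_PROBABILITY;
    }

    public static void main(String[] args) {
        Random random = new Random(42);
        int runs = 10_000;
        int withResults = 0;
        for (int i = 0; i < runs; i++) {
            if (hasMatrixStatsResults(random)) {
                withResults++;
            }
        }
        // Expect roughly 90% of runs to exercise the normal "has results" path.
        System.out.println(withResults + " of " + runs + " runs had results");
    }
}
```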

cbuescher · 2017-05-18T13:50:09Z

...c/test/java/org/elasticsearch/search/aggregations/matrix/stats/InternalMatrixStatsTests.java

+        }
+
+        final String unknownField = randomAlphaOfLength(3);
+        final String fieldX = randomFrom(unknownField, randomAlphaOfLength(3));


I'm not sure I understand what this is doing?

Yeah, it's not really readable. I changed that. The idea is to test getCovariance/getCorrelation with various unknown fields.
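A more explicit way to express "various unknown fields" is to enumerate every combination that involves at least one unknown field and check that each is rejected. This is a hypothetical sketch: getCovariance here only models rejecting unknown fields with IllegalArgumentException, which is an assumption about the real method.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class UnknownFieldCases {

    // Hypothetical lookup that rejects fields it has no statistics for.
    static double getCovariance(Set<String> knownFields, String fieldX, String fieldY) {
        if (knownFields.contains(fieldX) == false || knownFields.contains(fieldY) == false) {
            throw new IllegalArgumentException("unknown field pair [" + fieldX + ", " + fieldY + "]");
        }
        return 0.0; // the actual value is irrelevant for this sketch
    }

    // Every combination that involves at least one unknown field.
    static List<String[]> unknownFieldPairs(String known, String unknown) {
        return Arrays.asList(
            new String[] {known, unknown},
            new String[] {unknown, known},
            new String[] {unknown, unknown});
    }

    public static void main(String[] args) {
        Set<String> known = new HashSet<>(Arrays.asList("a", "b"));
        for (String[] pair : unknownFieldPairs("a", "zzz")) {
            try {
                getCovariance(known, pair[0], pair[1]);
                throw new AssertionError("expected failure for " + Arrays.toString(pair));
            } catch (IllegalArgumentException expected) {
                System.out.println("rejected " + Arrays.toString(pair));
            }
        }
    }
}
```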

javanna · 2017-05-18T14:52:43Z

...tats/src/main/java/org/elasticsearch/search/aggregations/matrix/stats/ParsedMatrixStats.java

+            matrixStats.skewness = new LinkedHashMap<>(size);
+            matrixStats.kurtosis = new LinkedHashMap<>(size);
+            matrixStats.covariances = new LinkedHashMap<>(size);
+            matrixStats.correlations = new LinkedHashMap<>(size);


do we need to maintain insertion order in all these maps?

No, but we do need it on counts, so that the parsed aggregation renders the field stats in the same order.
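As a quick illustration of why LinkedHashMap matters for the counts map: its iteration order follows insertion order, so re-rendering the parsed stats keeps the fields in the order they were read. The field names and counts below are made-up values.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class FieldOrder {

    // Insert counts in the order the fields were parsed; LinkedHashMap keeps that order.
    static Map<String, Long> countsInOrder(String[] fields, long[] counts) {
        Map<String, Long> orderedCounts = new LinkedHashMap<>(fields.length);
        for (int i = 0; i < fields.length; i++) {
            orderedCounts.put(fields[i], counts[i]);
        }
        return orderedCounts;
    }

    public static void main(String[] args) {
        String[] fields = {"price", "amount", "discount"};
        long[] counts = {10L, 10L, 7L};
        System.out.println(countsInOrder(fields, counts).keySet()); // [price, amount, discount]
    }
}
```

With a plain HashMap the iteration order is unspecified, so the rendered fields could come back in a different order than the original response.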

tlrx · 2017-05-19T08:56:40Z

Thanks @cbuescher and @javanna for your reviews. I updated the code according to your comments.

I think this PR could be merged without #24776. Just in case, I added a //norelease comment so we don't forget to update the parsing logic and test once the change in core is merged.

cbuescher

LGTM

cbuescher · 2017-05-19T09:17:37Z

...c/test/java/org/elasticsearch/search/aggregations/matrix/stats/InternalMatrixStatsTests.java

        runningStats.add(fields, values);
        MatrixStatsResults matrixStatsResults = hasMatrixStatsResults ? new MatrixStatsResults(runningStats) : null;
-        return new InternalMatrixStats("_name", 1L, runningStats, matrixStatsResults, Collections.emptyList(), Collections.emptyMap());
+        return new InternalMatrixStats(name, 1L, runningStats, matrixStatsResults, Collections.emptyList(), Collections.emptyMap());


Good catch ;-)

javanna

LGTM

tlrx · 2017-05-19T10:23:09Z

Thanks @javanna and @cbuescher

Related to elastic#23331

tlrx added :Java High Level REST Client WIP labels May 17, 2017

tlrx force-pushed the add-parsing-matrix-stats branch 2 times, most recently from cc3a6aa to c65c4ea May 18, 2017 11:29

Add parsing method for Matrix Stats aggregation

d6abe69

tlrx force-pushed the add-parsing-matrix-stats branch from c65c4ea to d6abe69 May 18, 2017 11:30

tlrx requested review from cbuescher and javanna May 18, 2017 11:31

tlrx added review and removed WIP labels May 18, 2017

tlrx commented May 18, 2017

tlrx changed the title ~~[WIP] Add parsing method for Matrix Stats~~ Add parsing method for Matrix Stats May 18, 2017

javanna mentioned this pull request May 18, 2017

Java High Level REST Client plan for first release #23331

Closed

58 tasks

cbuescher requested changes May 18, 2017

javanna reviewed May 18, 2017

tlrx mentioned this pull request May 18, 2017

Add document count to Matrix Stats aggregation response #24776

Merged

tlrx added 2 commits May 19, 2017 10:49

Apply feedback

e4ef85d

add norelease for doc_count field

d96474e

cbuescher approved these changes May 19, 2017

javanna approved these changes May 19, 2017

tlrx merged commit dd731d9 into elastic:feature/client_aggs_parsing May 19, 2017

tlrx deleted the add-parsing-matrix-stats branch May 19, 2017 10:23

javanna mentioned this pull request May 22, 2017

Add aggs parsers for high level REST Client #24824

Merged

javanna pushed a commit to javanna/elasticsearch that referenced this pull request May 23, 2017

Add parsing method for Matrix Stats (elastic#24746)

24a9ee9

Related to elastic#23331

javanna mentioned this pull request May 23, 2017

Backport aggs parsers for high level REST Client #24844

Merged
