Skip to content

Conversation

@tlrx
Copy link
Member

@tlrx tlrx commented May 17, 2017

This PR adds the parsing logic for the InternalMatrixStats aggregation.

@tlrx tlrx force-pushed the add-parsing-matrix-stats branch 2 times, most recently from cc3a6aa to c65c4ea Compare May 18, 2017 11:29
@tlrx tlrx force-pushed the add-parsing-matrix-stats branch from c65c4ea to d6abe69 Compare May 18, 2017 11:30
@tlrx tlrx requested review from cbuescher and javanna May 18, 2017 11:31
@tlrx tlrx added review and removed WIP labels May 18, 2017
@tlrx
Copy link
Member Author

tlrx commented May 18, 2017

@cbuescher @javanna This is ready to be reviewed.

@Override
public XContentBuilder doXContentBody(XContentBuilder builder, Params params) throws IOException {
if (results != null && results.getFieldCounts().keySet().isEmpty() == false) {
if (results != null) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had to change this so that at parsing time we can make the distinction between no existing results and empty results.

When there is no results, the rendered output will be something like "matrix_stats":{} and empty results will be "matrix_stats":{"fields":[]}. This allows us to reproduce the same behavior of MatrixStats methods in the transport client and parsed implementations.

I'm not sure that "no results" is a real use case but InternalMatrixStats has specific checks at several for results being null (and it's also randomized in tests) so it let me think that it can happen.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at InternalMatrixStats#doReduce() it seems to me that at least after the aggregations are reduced (which I think should be always the case for aggregations that are rendered to REST) the results should always be set. Maybe we can check with somebody who knows more about the internals and work with that assumption on the client side instead of making this change.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, after looking again results is never null once the aggregations are reduced. It can though be null at the transport level when shard results are sent over the wire before being reduced on the coordinating node.

It explains why results is randomly set to null in InternalMatrixStats because serialization is tested there. I reverted this change and only test the normal case.

@tlrx tlrx changed the title [WIP] Add parsing method for Matrix Stats Add parsing method for Matrix Stats May 18, 2017
Copy link
Member

@cbuescher cbuescher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tlrx thanks, I left a few questions and comments, especially around whether we need to test the extra "no result" case. Maybe this is something to can check with somebody who knows the internals of this aggregation better. It would simplify this PR a bit I think.

@Override
public XContentBuilder doXContentBody(XContentBuilder builder, Params params) throws IOException {
if (results != null && results.getFieldCounts().keySet().isEmpty() == false) {
if (results != null) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at InternalMatrixStats#doReduce() it seems to me that at least after the aggregations are reduced (which I think should be always the case for aggregations that are rendered to REST) the results should always be set. Maybe we can check with somebody who knows more about the internals and work with that assumption on the client side instead of making this change.


@Override
public long getDocCount() {
throw new UnsupportedOperationException();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we don't render this to the REST response, but its part of the MatrixStats interface, should we either add this value to the output or remove the method from the interface?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it makes sense to add this field to the REST response. I can do that in another PR against master.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

++

if (covariances == null) {
return Double.NaN;
}
return checkedGet(checkedGet(covariances, fieldX), fieldY);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I'm not entirely sure this does exactly what getValFromUpperTriangularMatrix() does (e.g. I think swapping the fieldX, fieldY arguments should return the same result in the original implementation, I'm not entirely sure though). It should be possible to declare that method as static in MatrixStatsResults and then use it from here to get the same behaviour.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea, thanks

if (fieldX != null && fieldX.equals(fieldY)) {
return 1.0;
}
return checkedGet(checkedGet(correlations, fieldX), fieldY);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here about using getValFromUpperTriangularMatrix() if possible.

Map<String, Object> metaData) {
public void setUp() throws Exception {
super.setUp();
hasMatrixStatsResults = randomBoolean();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on the discussion above (whether we need to account for the "has no results" case at all after "reduce", I would opt for at least testing it less frequent, and thest the "result" case mostly.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure

}

final String unknownField = randomAlphaOfLength(3);
final String fieldX = randomFrom(unknownField, randomAlphaOfLength(3));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I understand what this is doing?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it's not really readable. I changed that. The idea is to test getCovariance/getCorrelation with various unknown fields.

matrixStats.skewness = new LinkedHashMap<>(size);
matrixStats.kurtosis = new LinkedHashMap<>(size);
matrixStats.covariances = new LinkedHashMap<>(size);
matrixStats.correlations = new LinkedHashMap<>(size);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need to maintain insertion order in all these maps?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, but at least on counts so that we can render the parsed aggregation with the same order of fields stats.

@tlrx
Copy link
Member Author

tlrx commented May 19, 2017

Thanks @cbuescher and @javanna for your reviews. I updated the code according to your comments.

I think this PR could be merged without #24776. Just in case I added a //norelease comment to not forget about updating the parsing logic and test once the change in core is merged.

Copy link
Member

@cbuescher cbuescher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

runningStats.add(fields, values);
MatrixStatsResults matrixStatsResults = hasMatrixStatsResults ? new MatrixStatsResults(runningStats) : null;
return new InternalMatrixStats("_name", 1L, runningStats, matrixStatsResults, Collections.emptyList(), Collections.emptyMap());
return new InternalMatrixStats(name, 1L, runningStats, matrixStatsResults, Collections.emptyList(), Collections.emptyMap());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch ;-)

Copy link
Member

@javanna javanna left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@tlrx tlrx merged commit dd731d9 into elastic:feature/client_aggs_parsing May 19, 2017
@tlrx
Copy link
Member Author

tlrx commented May 19, 2017

Thanks @javanna and @cbuescher

@tlrx tlrx deleted the add-parsing-matrix-stats branch May 19, 2017 10:23
javanna pushed a commit to javanna/elasticsearch that referenced this pull request May 23, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants