Skip to content

Conversation

@martijnvg
Copy link
Member

The Writeble representation is less heavy to parse and that will benefit percolate performance and throughput. The performance gain of this change depends on the complexity of the stored percolator queries. A percolator query with only simple term query will not benefit as much as bool query with many clauses. A simple test with percolator queries having bool query with two should clauses showed a 35% improvement in query time. Percolator queries do need to be re-indexed in order to benefit from this change.

The change looks bigger than it is. The main change is in PercolatorFieldMapper and PercolateQueryBuilder. The changes in the other files are related to make sure that NamedWriteableRegistry is available in the PercolateQueryBuilder.

@martijnvg martijnvg added :Search Relevance/Percolator Reverse search: find queries that match a document >enhancement review v6.0.0 labels Jun 28, 2017
@martijnvg martijnvg mentioned this pull request Jun 28, 2017
9 tasks
@jpountz
Copy link
Contributor

jpountz commented Jul 4, 2017

I think this is a better representation indeed. In terms of backward compat, I think we have several options:

  • extend the compatibility guarantees for the binary representation of query builders
  • store both the xcontent and binary representation of the query and use the binary representation whenever possible and the xcontent as a fallback
  • only store the binary representation of the query in doc values and fall back to the _source if the index creation version is earlier than the min compatibility version of the node?

@martijnvg
Copy link
Member Author

@jpountz Thanks for looking! I'll label the pr with discuss label to discuss this change with a wider audience.

@martijnvg martijnvg force-pushed the percolator_store_queries_as_writable branch 2 times, most recently from c0aa339 to 51500ca Compare July 6, 2017 11:35
@martijnvg martijnvg removed the discuss label Jul 7, 2017
@martijnvg
Copy link
Member Author

martijnvg commented Jul 7, 2017

I removed the discuss label. At least from 6.0, the binary format has the same the same bwc guarantees as the xcontent format. So there this PR doesn't have make trace bwc guarantees for better performance .

@martijnvg martijnvg force-pushed the percolator_store_queries_as_writable branch from 51500ca to 99c29cc Compare July 12, 2017 13:31
@martijnvg
Copy link
Member Author

My previous statement is incorrect. The binary format is backwards compatible with the latest minor release of the previous major release line and that is different than the backwards compatibility of the xcontent format.

I've updated the PR to take this into account, but instead of adding a fallback parsing mechanism to parse to _source or remain to store the xcontent next to the binary format, the pr now fails to execute the percolate query and index percolator queries if the created index version of the index the percolator queries reside in is before the latest minor release of the previous release line. I believe that this behavior is better than if percolating becomes slower after an upgrading without any clear warning.

If percolator queries are no longer compatible or if upgrading to the next version means that percolator queries are not going to be compatible then percolator indices should re-indexed using the reindex api. This shouldn't be a problem for indices only containing percolator queries, because the number of documents tends to be much lower compared to indices with actual data.

@frankkoornstra
Copy link

Don't know too much about the internals but from a user perspective a performance improvement in percolators would be awesome! Just speaking from my use case: percolators are a projection from another source of truth so backwards compatibility is a non-issue for me. We would just remove the percolators index and re-create it. Thanks for looking into this!

@martijnvg martijnvg force-pushed the percolator_store_queries_as_writable branch 2 times, most recently from 4d3823e to ff31123 Compare July 18, 2017 14:11
@martijnvg
Copy link
Member Author

martijnvg commented Jul 18, 2017

I've updated the pr. Removed the checks that prohibits using the percolate query on a too old index and indexing percolator queries into an too old index. Instead the backwards compatability of the binary representation of query builders is extended to be the same as the xcontent representation of query builders.

To ensure this updated backwards compatability of query builders' binary representation; a qa test has been added that writes percolator queries with an older version and then reads them with the current version. Currently the qa test only tests against the current version, because there isn't a release that included the changes in this pr. I'm going to test more query builders (currently 2 are tested), but just wanted to check what others think about the testing approach.

@martijnvg martijnvg force-pushed the percolator_store_queries_as_writable branch 2 times, most recently from 31ac8ac to c32d291 Compare July 24, 2017 08:36
Copy link
Contributor

@jpountz jpountz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left some comments but it looks good overall.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not necessary anymore?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

left over...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe we could try to reuse the inputs across calls since this class is not supposed to be used by multiple threads?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe I'm not seeing it, but the input classes here don't have some kind of a reset method.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just checking: the percolator field type only supports a single value per doc? If yes I think it'd be nice to have a comment here

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm actually why don't you pull binary doc values like we used to do before?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this change is a side effect of making the the query builder field inside the percolator field accessible via doc values fields. I needed that in order to make the query builder bwc qa test to work.

When field values via doc values fields it uses ScriptDocValues to access doc values. (DocValueFieldsFetchSubPhase.java line 88) This uses a special encoding to deal with multi values binary values.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I changed this in the latest commit to now not use our field data apis, but BinaryDocValues directly.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

even if we don't need the values, can you leave comments about what these ints store?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will add a comment. The reason two vints are read is because of the format for binary fields.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would be nice to have an assert after this line that we consumed all bytes

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we'll only run this qa module with versions >= 6.0 I think?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why CURRENT? it should be the index created version?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's also make sure that we read all the bytes?

@martijnvg martijnvg force-pushed the percolator_store_queries_as_writable branch from ee61b71 to 3ecd820 Compare July 26, 2017 10:02
@martijnvg martijnvg force-pushed the percolator_store_queries_as_writable branch from 3ecd820 to 5748912 Compare July 27, 2017 05:58
Copy link
Contributor

@jpountz jpountz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

Maybe we should discuss whether we should get this change in 6.0 in order to be able to remove the bw compat logic when we move to 8.0. cc @s1monw @clintongormley

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this check should not be necessary?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

now I think about it: no :)

@martijnvg martijnvg force-pushed the percolator_store_queries_as_writable branch from 5748912 to 4dcf938 Compare July 27, 2017 12:24
@martijnvg martijnvg force-pushed the percolator_store_queries_as_writable branch 3 times, most recently from f94fac7 to dd1c084 Compare July 28, 2017 08:47
…of its XContent representation.

The Writeble representation is less heavy to parse and that will benefit percolate performance and throughput.

The query builder's binary format has now the same bwc guarentees as the xcontent format.

Added a qa test that verifies that percolator queries written in older versions are still readable by the current version.
@martijnvg martijnvg force-pushed the percolator_store_queries_as_writable branch from dd1c084 to 7c3735b Compare July 28, 2017 10:24
@martijnvg martijnvg merged commit 7c3735b into elastic:master Jul 28, 2017
@martijnvg
Copy link
Member Author

@jpountz I labelled this issue backport pending in order to discuss whether this should be backported to 6.0.

@martijnvg
Copy link
Member Author

We are going to backport this change after 6.0.0-beta1 has been released.

@martijnvg
Copy link
Member Author

This has been backported to the 6.0 branch: 77c73c2

martijnvg added a commit that referenced this pull request Oct 9, 2017
@lcawl lcawl removed the v6.1.0 label Dec 12, 2017
@jimczi jimczi added v7.0.0-beta1 and removed v7.0.0 labels Feb 7, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

>enhancement :Search Relevance/Percolator Reverse search: find queries that match a document v6.0.0-beta2 v7.0.0-beta1

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants