
Conversation

@dnhatn (Member) commented May 28, 2018

Today we don't limit the number of hits when reading changes from a Lucene
index. If both the index and the requested seq# range are large, the
searcher may consume a huge amount of memory.

This commit uses a fixed-size batch with search_after to avoid the
problem.
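The pattern, roughly: instead of one unbounded top-hits search over the whole seq# range, issue repeated fixed-size searches, using the last hit of each batch as the anchor for the next one (what Lucene's `IndexSearcher.searchAfter` does against a sorted index). A minimal sketch of the idea in plain Java, with the index stood in by a sorted array of seq#s — the class and method names here are illustrative, not from the PR:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of search_after-style batching: fetch a bounded
// window of hits at a time instead of all hits in the seq# range at once.
public class SearchAfterSketch {
    static final int BATCH_SIZE = 100; // bounds memory used per fetch

    // Stand-in for one "search": return up to BATCH_SIZE seq#s in
    // [fromSeqNo, toSeqNo] that are strictly greater than `after`.
    static List<Long> searchOperations(long[] index, long after, long fromSeqNo, long toSeqNo) {
        List<Long> batch = new ArrayList<>();
        for (long seqNo : index) {
            if (seqNo > after && seqNo >= fromSeqNo && seqNo <= toSeqNo) {
                batch.add(seqNo);
                if (batch.size() == BATCH_SIZE) break;
            }
        }
        return batch;
    }

    // Drain the whole range batch by batch; no single search ever
    // returns more than BATCH_SIZE hits.
    static List<Long> readAll(long[] index, long fromSeqNo, long toSeqNo) {
        List<Long> all = new ArrayList<>();
        long after = fromSeqNo - 1; // nothing seen yet
        while (true) {
            List<Long> batch = searchOperations(index, after, fromSeqNo, toSeqNo);
            if (batch.isEmpty()) break;
            all.addAll(batch);
            after = batch.get(batch.size() - 1); // last hit anchors the next search
        }
        return all;
    }

    public static void main(String[] args) {
        long[] index = new long[1000];
        for (int i = 0; i < index.length; i++) index[i] = i;
        List<Long> ops = readAll(index, 250, 749);
        System.out.println(ops.size());              // 500
        System.out.println(ops.get(0));              // 250
        System.out.println(ops.get(ops.size() - 1)); // 749
    }
}
```

In the actual change the batches come from Lucene (`IndexSearcher.searchAfter(after, query, batchSize, sort)` with the results sorted by seq#), but the bounded-window loop is the same shape.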

@dnhatn dnhatn added >feature :Distributed Indexing/Engine Anything around managing Lucene and the Translog in an open shard. labels May 28, 2018
@dnhatn dnhatn requested review from bleskes, martijnvg and s1monw May 28, 2018 15:01
@elasticmachine (Collaborator)

Pinging @elastic/es-distributed

@bleskes (Contributor) left a comment


I left a question. I was also wondering whether we already have a test in place which makes sure that we overflow SEARCH_BATCH_SIZE (as I don't see tests here)?

.add(LongPoint.newRangeQuery(SeqNoFieldMapper.NAME, fromSeqNo, toSeqNo), BooleanClause.Occur.FILTER)
.build();
private TopDocs searchOperations(ScoreDoc after) throws IOException {
final Query rangeQuery = LongPoint.newRangeQuery(SeqNoFieldMapper.NAME, fromSeqNo, toSeqNo);
Contributor:

why did we drop the DocValuesFieldExistsQuery(SeqNoFieldMapper.PRIMARY_TERM_NAME) part?

Member Author:

DocValuesFieldExistsQuery(SeqNoFieldMapper.PRIMARY_TERM_NAME) was in the temporary fix (7004d2c). This clause was used to eliminate nested docs.

@martijnvg (Member) left a comment

Thanks @dnhatn! I left a comment; also, as @bleskes mentioned, we need to test fetching of the next window in LuceneChangesSnapshotTests.

* A {@link Translog.Snapshot} from changes in a Lucene index
*/
final class LuceneChangesSnapshot implements Translog.Snapshot {
static final int SEARCH_BATCH_SIZE = 100;
Member:

Can this be specified as a parameter via the newLuceneChangesSnapshot(...) method instead of a constant? Then in ccr we can make this configurable.

Also 100 feels on the low side to me. Maybe default to 1024?

@dnhatn (Member Author) commented May 28, 2018

@bleskes and @martijnvg I've added a test that verifies reading multiple batches. Could you please have another look? Thank you!

@martijnvg (Member) left a comment

Left one comment. Otherwise LGTM

Searcher searcher = acquireSearcher(source, SearcherScope.INTERNAL);
try {
LuceneChangesSnapshot snapshot = new LuceneChangesSnapshot(searcher, mapperService, minSeqNo, maxSeqNo, requiredFullRange);
final int batchSize = preferredSearchBatchSize <= 0 ? Engine.LUCENE_HISTORY_DEFAULT_SEARCH_BATCH_SIZE : preferredSearchBatchSize;
Member:

I think it is better to define this default just in ccr (ShardChanges.java)?
(And let the newLuceneChangesSnapshot(...) methods just not allow values lower than 1.)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we do this (and I'm not convinced we should), can we please not use a magic number for the parameter? People should just say what they want or use this constant directly.

@bleskes (Contributor) left a comment

I left some more minor comments and one question I will reach out to discuss.


public static final String SYNC_COMMIT_ID = "sync_id";
public static final String HISTORY_UUID_KEY = "history_uuid";
// The default number of hits that one search should return when reading Lucene history
public static final int LUCENE_HISTORY_DEFAULT_SEARCH_BATCH_SIZE = 1024;
Contributor:

Can you move this to LuceneChangesSnapshot? This will allow using a short name like DEFAULT_BATCH_SIZE.

return operations;
}

private int randomSearchBatchSize() {
Contributor:

We only need a method for this because the default's name is long. Can we use a short default name or a constant and just inline this?

}
}

public void testSearchMultipleBatches() throws Exception {
Contributor:

Since we now randomize the batching in tests, I don't think we need this dedicated test any more?

@dnhatn (Member Author) commented May 29, 2018

I discussed this with @martijnvg and agreed to back out the extra parameter. @bleskes Could you please take a look? Thank you!

@dnhatn dnhatn requested a review from bleskes May 29, 2018 13:05
@s1monw (Contributor) left a comment

LGTM

@dnhatn (Member Author) commented May 29, 2018

@elasticmachine test this please

@bleskes (Contributor) left a comment

LGTM2

@dnhatn (Member Author) commented May 30, 2018

Thanks everyone!

@dnhatn dnhatn merged commit 8793ebc into elastic:ccr May 30, 2018
@dnhatn dnhatn deleted the lucene-changes-fixed-batch branch May 30, 2018 22:36
dnhatn added a commit that referenced this pull request May 31, 2018