
Conversation

@dnhatn (Member) commented May 28, 2018

Today we don't limit the number of hits when reading changes from a Lucene
index. If both the index and the requested seq# range are large, the
searcher may consume a huge amount of memory.

This commit uses a fixed-size batch with search_after to avoid the
problem.
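The pattern, roughly: instead of one unbounded top-hits search over the whole seq# range, issue repeated fixed-size searches, using the last hit of each batch as the anchor for the next one (what Lucene's `IndexSearcher.searchAfter` does against a sorted index). A minimal sketch of the idea in plain Java, with the index stood in by a sorted array of seq#s — the class and method names here are illustrative, not from the PR:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of search_after-style batching: fetch a bounded
// window of hits at a time instead of all hits in the seq# range at once.
public class SearchAfterSketch {
    static final int BATCH_SIZE = 100; // bounds memory used per fetch

    // Stand-in for one "search": return up to BATCH_SIZE seq#s in
    // [fromSeqNo, toSeqNo] that are strictly greater than `after`.
    static List<Long> searchOperations(long[] index, long after, long fromSeqNo, long toSeqNo) {
        List<Long> batch = new ArrayList<>();
        for (long seqNo : index) {
            if (seqNo > after && seqNo >= fromSeqNo && seqNo <= toSeqNo) {
                batch.add(seqNo);
                if (batch.size() == BATCH_SIZE) break;
            }
        }
        return batch;
    }

    // Drain the whole range batch by batch; no single search ever
    // returns more than BATCH_SIZE hits.
    static List<Long> readAll(long[] index, long fromSeqNo, long toSeqNo) {
        List<Long> all = new ArrayList<>();
        long after = fromSeqNo - 1; // nothing seen yet
        while (true) {
            List<Long> batch = searchOperations(index, after, fromSeqNo, toSeqNo);
            if (batch.isEmpty()) break;
            all.addAll(batch);
            after = batch.get(batch.size() - 1); // last hit anchors the next search
        }
        return all;
    }

    public static void main(String[] args) {
        long[] index = new long[1000];
        for (int i = 0; i < index.length; i++) index[i] = i;
        List<Long> ops = readAll(index, 250, 749);
        System.out.println(ops.size());              // 500
        System.out.println(ops.get(0));              // 250
        System.out.println(ops.get(ops.size() - 1)); // 749
    }
}
```

In the actual change the batches come from Lucene (`IndexSearcher.searchAfter(after, query, batchSize, sort)` with the results sorted by seq#), but the bounded-window loop is the same shape.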

@dnhatn dnhatn added >feature :Distributed Indexing/Engine Anything around managing Lucene and the Translog in an open shard. labels May 28, 2018
@dnhatn dnhatn requested review from bleskes, martijnvg and s1monw May 28, 2018 15:01
@elasticmachine (Collaborator)

Pinging @elastic/es-distributed

@bleskes (Contributor) left a comment


I left a question. I was also wondering whether we already have a test in place which makes sure that we overflow SEARCH_BATCH_SIZE (as I don't see tests here)?

.add(LongPoint.newRangeQuery(SeqNoFieldMapper.NAME, fromSeqNo, toSeqNo), BooleanClause.Occur.FILTER)
.build();
private TopDocs searchOperations(ScoreDoc after) throws IOException {
final Query rangeQuery = LongPoint.newRangeQuery(SeqNoFieldMapper.NAME, fromSeqNo, toSeqNo);
Contributor:

why did we drop the DocValuesFieldExistsQuery(SeqNoFieldMapper.PRIMARY_TERM_NAME) part?

Member Author:

DocValuesFieldExistsQuery(SeqNoFieldMapper.PRIMARY_TERM_NAME) was in the temporary fix (7004d2c). This clause was used to eliminate nested docs.

@martijnvg (Member) left a comment

Thanks @dnhatn! I left a comment; also, as @bleskes mentioned, we need to test fetching of the next window in LuceneChangesSnapshotTests.

* A {@link Translog.Snapshot} from changes in a Lucene index
*/
final class LuceneChangesSnapshot implements Translog.Snapshot {
static final int SEARCH_BATCH_SIZE = 100;
Member:

Can this be specified as a parameter via the newLuceneChangesSnapshot(...) method instead of a constant? Then in ccr we can make this configurable.

Also 100 feels on the low side to me. Maybe default to 1024?

@dnhatn (Member Author) commented May 28, 2018

@bleskes and @martijnvg I've added a test that verifies reading multiple batches. Could you please have another look? Thank you!

@martijnvg (Member) left a comment

Left one comment. Otherwise LGTM

Searcher searcher = acquireSearcher(source, SearcherScope.INTERNAL);
try {
LuceneChangesSnapshot snapshot = new LuceneChangesSnapshot(searcher, mapperService, minSeqNo, maxSeqNo, requiredFullRange);
final int batchSize = preferredSearchBatchSize <= 0 ? Engine.LUCENE_HISTORY_DEFAULT_SEARCH_BATCH_SIZE : preferredSearchBatchSize;
Member:

I think it is better to define this default just in ccr (ShardChanges.java)?
(And let the newLuceneChangesSnapshot(...) methods just not allow values lower than 1.)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we do this (and I'm not convinced we should), can we please not use a magic number for the parameter? People should just say what they want or use this constant directly.

@bleskes (Contributor) left a comment

I left some more minor comments and one question I will reach out to discuss.


public static final String SYNC_COMMIT_ID = "sync_id";
public static final String HISTORY_UUID_KEY = "history_uuid";
// The default number of hits that one search should return when reading Lucene history
public static final int LUCENE_HISTORY_DEFAULT_SEARCH_BATCH_SIZE = 1024;
Contributor:

Can you move this to LuceneChangesSnapshot? This will allow using a short name like DEFAULT_BATCH_SIZE.

return operations;
}

private int randomSearchBatchSize() {
Contributor:

We only need a method for this because the default's name is long. Can we use a short default name or a constant and just inline this?

}
}

public void testSearchMultipleBatches() throws Exception {
Contributor:

Since we now randomize the batching in tests, I don't think we need this dedicated test any more?

@dnhatn (Member Author) commented May 29, 2018

I discussed this with @martijnvg and agreed to back out the extra parameter. @bleskes Could you please take a look? Thank you!

@dnhatn dnhatn requested a review from bleskes May 29, 2018 13:05
@s1monw (Contributor) left a comment

LGTM

@dnhatn (Member Author) commented May 29, 2018

@elasticmachine test this please

@bleskes (Contributor) left a comment

LGTM2

@dnhatn (Member Author) commented May 30, 2018

Thanks everyone!

@dnhatn dnhatn merged commit 8793ebc into elastic:ccr May 30, 2018
@dnhatn dnhatn deleted the lucene-changes-fixed-batch branch May 30, 2018 22:36
dnhatn added a commit that referenced this pull request May 31, 2018