Skip to content

NPE in slice query when running DeleteByQuery #42996

@benwtrent

Description

@benwtrent

Elasticsearch version (bin/elasticsearch --version):
Versions v8.0.0 and v7.3.0

JVM version (java -version): Most likely numerous versions. Specifically tested against openjdk 11.0.2

OS version (uname -a if on a Unix-like system): Darwin Kernel Version 18.6.0

Description of the problem including expected versus actual behavior:

When running a DeleteByQuery with a terms query, an NPE is thrown. While the DBQ is running, a document is being indexed that would have been included in the DBQ. Meaning, we are calling PUT <index>/_doc/<document_id> where document_id should be deleted via the DBQ.

Steps to reproduce:

We can get it to reproduce pretty reliably. This is obviously some sort of race condition. Simply try to index documents with the same ID as the ones that are being deleted via the DBQ. Potentially, it may be as simple as a newly indexed document contains the same term as the one being used in the DBQ. I am not 100% sure.

Provide logs (if relevant):
Stacktrace:

[2019-06-07T11:51:28,473][DEBUG][o.e.a.s.TransportSearchAction] [ml-qa-ubuntu-2] [.data-frame-internal-1][0], node[qSY9saZrRjm_7YULzSw-RQ], [P], s[STARTED], a[id=HTY4jpS1TzqOyOvpTVQauA]: Failed to execute [SearchRequest{searchType=QUERY_THEN_FETCH, indices=[.data-frame-internal-1], indicesOptions=IndicesOptions[ignore_unavailable=false, allow_no_indices=true, expand_wildcards_open=true, expand_wildcards_closed=false, allow_aliases_to_multiple_indices=true, forbid_closed_indices=true, ignore_aliases=false, ignore_throttled=true], routing='null', preference='null', requestCache=null, scroll=Scroll{keepAlive=5m}, maxConcurrentShardRequests=0, batchedReduceSize=512, preFilterShardSize=128, allowPartialSearchResults=true, localClusterAlias=null, getOrCreateAbsoluteStartMillis=-1, ccsMinimizeRoundtrips=true, source={"size":1000,"query":{"term":{"id":{"value":"test_dft_2","boost":1.0}}},"version":false,"seq_no_primary_term":true,"_source":false,"sort":[{"_doc":{"order":"asc"}}],"slice":{"field":"_id","id":4,"max":5}}}]
org.elasticsearch.transport.RemoteTransportException: [ml-qa-ubuntu-2][10.0.4.102:9300][indices:data/read/search[phase/query]]
Caused by: org.elasticsearch.search.query.QueryPhaseExecutionException: Query Failed [Failed to execute main query]
	at org.elasticsearch.search.query.QueryPhase.execute(QueryPhase.java:305) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
	at org.elasticsearch.search.query.QueryPhase.execute(QueryPhase.java:113) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
	at org.elasticsearch.search.SearchService.loadOrExecuteQueryPhase(SearchService.java:335) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
	at org.elasticsearch.search.SearchService.executeQueryPhase(SearchService.java:360) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
	at org.elasticsearch.search.SearchService.lambda$executeQueryPhase$1(SearchService.java:340) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
	at org.elasticsearch.action.ActionListener.lambda$map$2(ActionListener.java:145) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
	at org.elasticsearch.action.ActionListener$1.onResponse(ActionListener.java:62) [elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
	at org.elasticsearch.search.SearchService$2.doRun(SearchService.java:1052) [elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
	at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
	at org.elasticsearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:44) [elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
	at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:757) [elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
	at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
	at java.lang.Thread.run(Thread.java:835) [?:?]
Caused by: java.lang.NullPointerException
	at org.elasticsearch.search.slice.TermsSliceQuery.build(TermsSliceQuery.java:81) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
	at org.elasticsearch.search.slice.TermsSliceQuery$1.scorer(TermsSliceQuery.java:63) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
	at org.apache.lucene.search.Weight.scorerSupplier(Weight.java:147) ~[lucene-core-8.1.0.jar:8.1.0 dbe5ed0b2f17677ca6c904ebae919363f2d36a0a - ishan - 2019-05-09 19:34:03]
	at org.apache.lucene.search.LRUQueryCache$CachingWrapperWeight.scorerSupplier(LRUQueryCache.java:727) ~[lucene-core-8.1.0.jar:8.1.0 dbe5ed0b2f17677ca6c904ebae919363f2d36a0a - ishan - 2019-05-09 19:34:03]
	at org.elasticsearch.indices.IndicesQueryCache$CachingWeightWrapper.scorerSupplier(IndicesQueryCache.java:157) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
	at org.apache.lucene.search.BooleanWeight.scorerSupplier(BooleanWeight.java:374) ~[lucene-core-8.1.0.jar:8.1.0 dbe5ed0b2f17677ca6c904ebae919363f2d36a0a - ishan - 2019-05-09 19:34:03]
	at org.apache.lucene.search.BooleanWeight.scorer(BooleanWeight.java:340) ~[lucene-core-8.1.0.jar:8.1.0 dbe5ed0b2f17677ca6c904ebae919363f2d36a0a - ishan - 2019-05-09 19:34:03]
	at org.apache.lucene.search.Weight.bulkScorer(Weight.java:181) ~[lucene-core-8.1.0.jar:8.1.0 dbe5ed0b2f17677ca6c904ebae919363f2d36a0a - ishan - 2019-05-09 19:34:03]
	at org.apache.lucene.search.BooleanWeight.bulkScorer(BooleanWeight.java:334) ~[lucene-core-8.1.0.jar:8.1.0 dbe5ed0b2f17677ca6c904ebae919363f2d36a0a - ishan - 2019-05-09 19:34:03]
	at org.apache.lucene.search.LRUQueryCache$CachingWrapperWeight.bulkScorer(LRUQueryCache.java:808) ~[lucene-core-8.1.0.jar:8.1.0 dbe5ed0b2f17677ca6c904ebae919363f2d36a0a - ishan - 2019-05-09 19:34:03]
	at org.elasticsearch.indices.IndicesQueryCache$CachingWeightWrapper.bulkScorer(IndicesQueryCache.java:163) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
	at org.elasticsearch.search.internal.ContextIndexSearcher$1.bulkScorer(ContextIndexSearcher.java:166) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
	at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:649) ~[lucene-core-8.1.0.jar:8.1.0 dbe5ed0b2f17677ca6c904ebae919363f2d36a0a - ishan - 2019-05-09 19:34:03]
	at org.apache.lucene.search.XIndexSearcher.search(XIndexSearcher.java:44) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.1.0 dbe5ed0b2f17677ca6c904ebae919363f2d36a0a - ishan - 2019-05-09 19:34:03]
	at org.elasticsearch.search.internal.ContextIndexSearcher.search(ContextIndexSearcher.java:177) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
	at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:443) ~[lucene-core-8.1.0.jar:8.1.0 dbe5ed0b2f17677ca6c904ebae919363f2d36a0a - ishan - 2019-05-09 19:34:03]
	at org.elasticsearch.search.query.QueryPhase.execute(QueryPhase.java:270) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
	at org.elasticsearch.search.query.QueryPhase.execute(QueryPhase.java:113) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
	at org.elasticsearch.search.SearchService.loadOrExecuteQueryPhase(SearchService.java:335) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
	at org.elasticsearch.search.SearchService.executeQueryPhase(SearchService.java:360) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
	at org.elasticsearch.search.SearchService.lambda$executeQueryPhase$1(SearchService.java:340) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
	at org.elasticsearch.action.ActionListener.lambda$map$2(ActionListener.java:145) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
	at org.elasticsearch.action.ActionListener$1.onResponse(ActionListener.java:62) [elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
	at org.elasticsearch.search.SearchService$2.doRun(SearchService.java:1052) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
	at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
	at org.elasticsearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:44) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
	at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:757) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
	at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) ~[?:?]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) ~[?:?]
	at java.lang.Thread.run(Thread.java:835) ~[?:?]

** Additional Notes **
This PR introduced the bug: #40741

I have verified reverting:

        if (softDeleteEnabled) {
            mergePolicy = new RecoverySourcePruneMergePolicy(SourceFieldMapper.RECOVERY_SOURCE_NAME, softDeletesPolicy::getRetentionQuery,
-                new SoftDeletesRetentionMergePolicy(Lucene.SOFT_DELETES_FIELD, softDeletesPolicy::getRetentionQuery, mergePolicy));
+                new SoftDeletesRetentionMergePolicy(Lucene.SOFT_DELETES_FIELD, softDeletesPolicy::getRetentionQuery,
+                    new PrunePostingsMergePolicy(mergePolicy, IdFieldMapper.NAME)));
        }

removes the issue.

Metadata

Metadata

Assignees

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions