
Commit 0c4b256

ESQL: Honor skip_unavailable setting for nonmatching indices errors at planning time (#116348) (#116824)
Adds support to ES|QL planning time (EsqlSession code) for handling non-matching indices, how that relates to the remote cluster skip_unavailable setting, and how to handle missing indices on the local cluster (if included in the query).

For clusters included in an ES|QL query:

• `skip_unavailable=true` means: if no data is returned from that cluster (due to cluster-not-connected, no matching indices, a missing concrete index or shard failures during searches), it is not a "fatal" error that causes the entire query to fail. Instead it is just a failure on that particular cluster, and partial data should be returned from other clusters.
• `skip_unavailable=false` means: if no data is returned from that cluster (for the same reasons enumerated above), then the whole query should fail. This allows users to ensure that data is returned from a "required" cluster.
• For the local cluster, ES|QL assumes `allow_no_indices=true` and the skip_unavailable setting does not apply (in part because there is no way for a user to set skip_unavailable for the local cluster).

Based on discussions with ES|QL team members, we defined the following rules to be enforced with respect to non-matching index expressions:

**Rules enforced at planning time**

P1. fail the query if there are no matching indices on any cluster (VerificationException)
P2. fail the query if a skip_unavailable:false cluster has no matching indices (the local cluster already has this rule enforced at planning time)
P3. fail the query if the local cluster has no matching indices and a concrete index was specified

**Rules enforced at execution time**

For missing concrete (no wildcards present) index expressions:

E1. fail the query when it was specified for the local cluster or a skip_unavailable=false remote cluster
E2. on skip_unavailable=true clusters: an error fails the query on that cluster, but not the entire query (data from other clusters is still returned)

**Notes on the rules**

P1: this already happens, no new code needed in this PR.

P2: The reason we need to enforce rule P2 at planning time is that when field caps returns no matching indices for a cluster, the EsIndex that is created (and passed into IndexResolution.valid) leaves that cluster out of the list. At execution time that cluster is therefore not queried at all, so execution time will not catch missing concrete indices. And even if the cluster were queried at execution time, the query wouldn't fail on wildcard-only index expressions where none of them matched.

P3: Right now `FROM remote:existent,nomatch` does NOT throw a failure (for the same reason described in rule P2 above), so that needs to be enforced in this PR.

This PR deals with enforcing and testing the planning time rules: P1, P2 and P3. A follow-on PR will address changes needed for handling the execution time rules.

**Notes on PR scope**

This PR covers nonsecured clusters (`xpack.security.enabled: false`) and security using certs ("RCS1"). In my testing I've found that API-key based security ("RCS2") does not behave the same way, so that work has been deferred to a follow-on PR.

Partially addresses #114531
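
As a reading aid, planning-time rules P2 and P3 can be condensed into a small decision function. This is a minimal standalone sketch, not code from this PR: `PlanningRulesSketch`, `Outcome` and `onNoMatchingIndices` are hypothetical names, and the empty string is assumed to stand for the local cluster alias.

```java
// Hypothetical, simplified model of planning-time rules P2 and P3 (not the PR's code).
final class PlanningRulesSketch {

    enum Outcome { FAIL_QUERY, MARK_SKIPPED, MARK_SUCCESSFUL }

    // Assumption for this sketch: the local cluster is identified by an empty alias.
    static final String LOCAL_CLUSTER = "";

    /** Decides what happens when a requested cluster resolved no matching indices. */
    static Outcome onNoMatchingIndices(String clusterAlias, boolean skipUnavailable, boolean concreteIndexRequested) {
        if (LOCAL_CLUSTER.equals(clusterAlias)) {
            // P3: skip_unavailable does not apply locally; only a missing concrete index is fatal.
            return concreteIndexRequested ? Outcome.FAIL_QUERY : Outcome.MARK_SUCCESSFUL;
        }
        // P2: a skip_unavailable=false remote with no matching indices fails the whole query.
        return skipUnavailable ? Outcome.MARK_SKIPPED : Outcome.FAIL_QUERY;
    }

    public static void main(String[] args) {
        System.out.println(onNoMatchingIndices("", false, true));         // FAIL_QUERY
        System.out.println(onNoMatchingIndices("", false, false));        // MARK_SUCCESSFUL
        System.out.println(onNoMatchingIndices("remote1", true, true));   // MARK_SKIPPED
        System.out.println(onNoMatchingIndices("remote1", false, false)); // FAIL_QUERY
    }
}
```

Note that for a remote cluster the outcome in this sketch depends only on skip_unavailable, whether or not the index expression contained wildcards.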
1 parent 37ef5f2 commit 0c4b256

10 files changed: +1668 -303 lines changed

docs/changelog/116348.yaml

Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
+pr: 116348
+summary: "ESQL: Honor skip_unavailable setting for nonmatching indices errors at planning time"
+area: ES|QL
+type: enhancement
+issues: [ 114531 ]

x-pack/plugin/esql/src/internalClusterTest/java/org/elasticsearch/xpack/esql/action/CrossClustersQueryIT.java

Lines changed: 839 additions & 230 deletions
Large diffs are not rendered by default.

x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/index/IndexResolution.java

Lines changed: 53 additions & 10 deletions
@@ -12,22 +12,37 @@
 import java.util.Collections;
 import java.util.Map;
 import java.util.Objects;
+import java.util.Set;

 public final class IndexResolution {

-    public static IndexResolution valid(EsIndex index, Map<String, FieldCapabilitiesFailure> unavailableClusters) {
+    /**
+     * @param index EsIndex encapsulating requested index expression, resolved mappings and index modes from field-caps.
+     * @param resolvedIndices Set of concrete indices resolved by field-caps. (This information is not always present in the EsIndex).
+     * @param unavailableClusters Remote clusters that could not be contacted during planning
+     * @return valid IndexResolution
+     */
+    public static IndexResolution valid(
+        EsIndex index,
+        Set<String> resolvedIndices,
+        Map<String, FieldCapabilitiesFailure> unavailableClusters
+    ) {
         Objects.requireNonNull(index, "index must not be null if it was found");
+        Objects.requireNonNull(resolvedIndices, "resolvedIndices must not be null");
         Objects.requireNonNull(unavailableClusters, "unavailableClusters must not be null");
-        return new IndexResolution(index, null, unavailableClusters);
+        return new IndexResolution(index, null, resolvedIndices, unavailableClusters);
     }

+    /**
+     * Use this method only if the set of concrete resolved indices is the same as EsIndex#concreteIndices().
+     */
     public static IndexResolution valid(EsIndex index) {
-        return valid(index, Collections.emptyMap());
+        return valid(index, index.concreteIndices(), Collections.emptyMap());
     }

     public static IndexResolution invalid(String invalid) {
         Objects.requireNonNull(invalid, "invalid must not be null to signal that the index is invalid");
-        return new IndexResolution(null, invalid, Collections.emptyMap());
+        return new IndexResolution(null, invalid, Collections.emptySet(), Collections.emptyMap());
     }

     public static IndexResolution notFound(String name) {
@@ -39,12 +54,20 @@ public static IndexResolution notFound(String name) {
     @Nullable
     private final String invalid;

+    // all indices found by field-caps
+    private final Set<String> resolvedIndices;
     // remote clusters included in the user's index expression that could not be connected to
     private final Map<String, FieldCapabilitiesFailure> unavailableClusters;

-    private IndexResolution(EsIndex index, @Nullable String invalid, Map<String, FieldCapabilitiesFailure> unavailableClusters) {
+    private IndexResolution(
+        EsIndex index,
+        @Nullable String invalid,
+        Set<String> resolvedIndices,
+        Map<String, FieldCapabilitiesFailure> unavailableClusters
+    ) {
         this.index = index;
         this.invalid = invalid;
+        this.resolvedIndices = resolvedIndices;
         this.unavailableClusters = unavailableClusters;
     }

@@ -64,8 +87,8 @@ public EsIndex get() {
     }

     /**
-     * Is the index valid for use with ql? Returns {@code false} if the
-     * index wasn't found.
+     * Is the index valid for use with ql?
+     * @return {@code false} if the index wasn't found.
      */
     public boolean isValid() {
         return invalid == null;
@@ -75,10 +98,17 @@ public boolean isValid() {
      * @return Map of unavailable clusters (could not be connected to during field-caps query). Key of map is cluster alias,
      * value is the {@link FieldCapabilitiesFailure} describing the issue.
      */
-    public Map<String, FieldCapabilitiesFailure> getUnavailableClusters() {
+    public Map<String, FieldCapabilitiesFailure> unavailableClusters() {
         return unavailableClusters;
     }

+    /**
+     * @return all indices found by field-caps (regardless of whether they had any mappings)
+     */
+    public Set<String> resolvedIndices() {
+        return resolvedIndices;
+    }
+
     @Override
     public boolean equals(Object obj) {
         if (obj == null || obj.getClass() != getClass()) {
@@ -87,16 +117,29 @@ public boolean equals(Object obj) {
         IndexResolution other = (IndexResolution) obj;
         return Objects.equals(index, other.index)
             && Objects.equals(invalid, other.invalid)
+            && Objects.equals(resolvedIndices, other.resolvedIndices)
             && Objects.equals(unavailableClusters, other.unavailableClusters);
     }

     @Override
     public int hashCode() {
-        return Objects.hash(index, invalid, unavailableClusters);
+        return Objects.hash(index, invalid, resolvedIndices, unavailableClusters);
     }

     @Override
     public String toString() {
-        return invalid != null ? invalid : index.name();
+        return invalid != null
+            ? invalid
+            : "IndexResolution{"
+                + "index="
+                + index
+                + ", invalid='"
+                + invalid
+                + '\''
+                + ", resolvedIndices="
+                + resolvedIndices
+                + ", unavailableClusters="
+                + unavailableClusters
+                + '}';
     }
 }

x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/plugin/ComputeService.java

Lines changed: 11 additions & 13 deletions
@@ -251,19 +251,17 @@ private static void updateExecutionInfoAfterCoordinatorOnlyQuery(EsqlExecutionIn
         if (execInfo.isCrossClusterSearch()) {
             assert execInfo.planningTookTime() != null : "Planning took time should be set on EsqlExecutionInfo but is null";
             for (String clusterAlias : execInfo.clusterAliases()) {
-                // took time and shard counts for SKIPPED clusters were added at end of planning, so only update other cases here
-                if (execInfo.getCluster(clusterAlias).getStatus() != EsqlExecutionInfo.Cluster.Status.SKIPPED) {
-                    execInfo.swapCluster(
-                        clusterAlias,
-                        (k, v) -> new EsqlExecutionInfo.Cluster.Builder(v).setTook(execInfo.overallTook())
-                            .setStatus(EsqlExecutionInfo.Cluster.Status.SUCCESSFUL)
-                            .setTotalShards(0)
-                            .setSuccessfulShards(0)
-                            .setSkippedShards(0)
-                            .setFailedShards(0)
-                            .build()
-                    );
-                }
+                execInfo.swapCluster(clusterAlias, (k, v) -> {
+                    var builder = new EsqlExecutionInfo.Cluster.Builder(v).setTook(execInfo.overallTook())
+                        .setTotalShards(0)
+                        .setSuccessfulShards(0)
+                        .setSkippedShards(0)
+                        .setFailedShards(0);
+                    if (v.getStatus() == EsqlExecutionInfo.Cluster.Status.RUNNING) {
+                        builder.setStatus(EsqlExecutionInfo.Cluster.Status.SUCCESSFUL);
+                    }
+                    return builder.build();
+                });
             }
         }
     }
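
The status handling in this hunk can be sketched in isolation. In the sketch below, `Status` is a simplified stand-in for `EsqlExecutionInfo.Cluster.Status` (its member list here is an assumption) and `finalStatus` is a hypothetical helper. It only illustrates that a status already assigned during planning, such as SKIPPED, is kept, while a cluster that is still RUNNING is marked SUCCESSFUL.

```java
// Hypothetical, simplified model of the status update after a coordinator-only query.
final class CoordinatorOnlyStatusSketch {

    // Stand-in for EsqlExecutionInfo.Cluster.Status; the members listed are an assumption.
    enum Status { RUNNING, SUCCESSFUL, SKIPPED, PARTIAL, FAILED }

    /** Only a cluster that is still RUNNING is promoted to SUCCESSFUL; other statuses are preserved. */
    static Status finalStatus(Status current) {
        return current == Status.RUNNING ? Status.SUCCESSFUL : current;
    }

    public static void main(String[] args) {
        System.out.println(finalStatus(Status.RUNNING)); // SUCCESSFUL
        System.out.println(finalStatus(Status.SKIPPED)); // SKIPPED
    }
}
```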

x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/session/EsqlSession.java

Lines changed: 1 addition & 1 deletion
@@ -309,7 +309,7 @@ private <T> void preAnalyze(
                 // resolution to updateExecutionInfo
                 if (indexResolution.isValid()) {
                     EsqlSessionCCSUtils.updateExecutionInfoWithClustersWithNoMatchingIndices(executionInfo, indexResolution);
-                    EsqlSessionCCSUtils.updateExecutionInfoWithUnavailableClusters(executionInfo, indexResolution.getUnavailableClusters());
+                    EsqlSessionCCSUtils.updateExecutionInfoWithUnavailableClusters(executionInfo, indexResolution.unavailableClusters());
                     if (executionInfo.isCrossClusterSearch()
                         && executionInfo.getClusterStateCount(EsqlExecutionInfo.Cluster.Status.RUNNING) == 0) {
                         // for a CCS, if all clusters have been marked as SKIPPED, nothing to search so send a sentinel

x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/session/EsqlSessionCCSUtils.java

Lines changed: 75 additions & 24 deletions
@@ -17,6 +17,7 @@
 import org.elasticsearch.transport.ConnectTransportException;
 import org.elasticsearch.transport.RemoteClusterAware;
 import org.elasticsearch.transport.RemoteTransportException;
+import org.elasticsearch.xpack.esql.VerificationException;
 import org.elasticsearch.xpack.esql.action.EsqlExecutionInfo;
 import org.elasticsearch.xpack.esql.analysis.Analyzer;
 import org.elasticsearch.xpack.esql.index.IndexResolution;
@@ -33,7 +34,6 @@ class EsqlSessionCCSUtils {

     private EsqlSessionCCSUtils() {}

-    // visible for testing
     static Map<String, FieldCapabilitiesFailure> determineUnavailableRemoteClusters(List<FieldCapabilitiesFailure> failures) {
         Map<String, FieldCapabilitiesFailure> unavailableRemotes = new HashMap<>();
         for (FieldCapabilitiesFailure failure : failures) {
@@ -75,10 +75,10 @@ public void onFailure(Exception e) {

     /**
      * Whether to return an empty result (HTTP status 200) for a CCS rather than a top level 4xx/5xx error.
-     *
+     * <p>
      * For cases where field-caps had no indices to search and the remotes were unavailable, we
      * return an empty successful response (200) if all remotes are marked with skip_unavailable=true.
-     *
+     * <p>
      * Note: a follow-on PR will expand this logic to handle cases where no indices could be found to match
      * on any of the requested clusters.
      */
@@ -132,7 +132,6 @@ static void updateExecutionInfoToReturnEmptyResult(EsqlExecutionInfo executionIn
         }
     }

-    // visible for testing
     static String createIndexExpressionFromAvailableClusters(EsqlExecutionInfo executionInfo) {
         StringBuilder sb = new StringBuilder();
         for (String clusterAlias : executionInfo.clusterAliases()) {
@@ -181,39 +180,91 @@ static void updateExecutionInfoWithUnavailableClusters(EsqlExecutionInfo execInf
         }
     }

-    // visible for testing
     static void updateExecutionInfoWithClustersWithNoMatchingIndices(EsqlExecutionInfo executionInfo, IndexResolution indexResolution) {
         Set<String> clustersWithResolvedIndices = new HashSet<>();
         // determine missing clusters
-        for (String indexName : indexResolution.get().indexNameWithModes().keySet()) {
+        for (String indexName : indexResolution.resolvedIndices()) {
             clustersWithResolvedIndices.add(RemoteClusterAware.parseClusterAlias(indexName));
         }
         Set<String> clustersRequested = executionInfo.clusterAliases();
         Set<String> clustersWithNoMatchingIndices = Sets.difference(clustersRequested, clustersWithResolvedIndices);
-        clustersWithNoMatchingIndices.removeAll(indexResolution.getUnavailableClusters().keySet());
+        clustersWithNoMatchingIndices.removeAll(indexResolution.unavailableClusters().keySet());
+
+        /**
+         * Rules enforced at planning time around non-matching indices
+         * P1. fail query if no matching indices on any cluster (VerificationException) - that is handled elsewhere (TODO: document where)
+         * P2. fail query if a skip_unavailable:false cluster has no matching indices (the local cluster already has this rule
+         *     enforced at planning time)
+         * P3. fail query if the local cluster has no matching indices and a concrete index was specified
+         */
+        String fatalErrorMessage = null;
         /*
          * These are clusters in the original request that are not present in the field-caps response. They were
-         * specified with an index or indices that do not exist, so the search on that cluster is done.
+         * specified with an index expression matched no indices, so the search on that cluster is done.
          * Mark it as SKIPPED with 0 shards searched and took=0.
          */
         for (String c : clustersWithNoMatchingIndices) {
-            // TODO: in a follow-on PR, throw a Verification(400 status code) for local and remotes with skip_unavailable=false if
-            // they were requested with one or more concrete indices
-            // for now we never mark the local cluster as SKIPPED
-            final var status = RemoteClusterAware.LOCAL_CLUSTER_GROUP_KEY.equals(c)
-                ? EsqlExecutionInfo.Cluster.Status.SUCCESSFUL
-                : EsqlExecutionInfo.Cluster.Status.SKIPPED;
-            executionInfo.swapCluster(
-                c,
-                (k, v) -> new EsqlExecutionInfo.Cluster.Builder(v).setStatus(status)
-                    .setTook(new TimeValue(0))
-                    .setTotalShards(0)
-                    .setSuccessfulShards(0)
-                    .setSkippedShards(0)
-                    .setFailedShards(0)
-                    .build()
-            );
+            final String indexExpression = executionInfo.getCluster(c).getIndexExpression();
+            if (missingIndicesIsFatal(c, executionInfo)) {
+                String error = Strings.format(
+                    "Unknown index [%s]",
+                    (c.equals(RemoteClusterAware.LOCAL_CLUSTER_GROUP_KEY) ? indexExpression : c + ":" + indexExpression)
+                );
+                if (fatalErrorMessage == null) {
+                    fatalErrorMessage = error;
+                } else {
+                    fatalErrorMessage += "; " + error;
+                }
+            } else {
+                // handles local cluster (when no concrete indices requested) and skip_unavailable=true clusters
+                EsqlExecutionInfo.Cluster.Status status;
+                ShardSearchFailure failure;
+                if (c.equals(RemoteClusterAware.LOCAL_CLUSTER_GROUP_KEY)) {
+                    status = EsqlExecutionInfo.Cluster.Status.SUCCESSFUL;
+                    failure = null;
+                } else {
+                    status = EsqlExecutionInfo.Cluster.Status.SKIPPED;
+                    failure = new ShardSearchFailure(new VerificationException("Unknown index [" + indexExpression + "]"));
+                }
+                executionInfo.swapCluster(c, (k, v) -> {
+                    var builder = new EsqlExecutionInfo.Cluster.Builder(v).setStatus(status)
+                        .setTook(new TimeValue(0))
+                        .setTotalShards(0)
+                        .setSuccessfulShards(0)
+                        .setSkippedShards(0)
+                        .setFailedShards(0);
+                    if (failure != null) {
+                        builder.setFailures(List.of(failure));
+                    }
+                    return builder.build();
+                });
+            }
         }
+        if (fatalErrorMessage != null) {
+            throw new VerificationException(fatalErrorMessage);
+        }
+    }
+
+    // visible for testing
+    static boolean missingIndicesIsFatal(String clusterAlias, EsqlExecutionInfo executionInfo) {
+        // missing indices on local cluster is fatal only if a concrete index requested
+        if (clusterAlias.equals(RemoteClusterAware.LOCAL_CLUSTER_GROUP_KEY)) {
+            return concreteIndexRequested(executionInfo.getCluster(clusterAlias).getIndexExpression());
+        }
+        return executionInfo.getCluster(clusterAlias).isSkipUnavailable() == false;
+    }
+
+    private static boolean concreteIndexRequested(String indexExpression) {
+        for (String expr : indexExpression.split(",")) {
+            if (expr.charAt(0) == '<' || expr.startsWith("-<")) {
+                // skip date math expressions
+                continue;
+            }
+            if (expr.indexOf('*') < 0) {
+                return true;
+            }
+        }
+        return false;
     }

     // visible for testing
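
To see how `concreteIndexRequested` classifies index expressions, its logic can be copied into a standalone class and run on a few inputs. The wrapper class `ConcreteIndexSketch` and the sample expressions are hypothetical; the method body mirrors the one added in this diff. The sketch assumes every comma-separated segment is non-empty, since `charAt(0)` would throw on an empty segment.

```java
// Standalone copy of the concreteIndexRequested logic from the diff, for experimentation only.
final class ConcreteIndexSketch {

    /** Returns true if any segment is neither a date math expression nor a wildcard pattern. */
    static boolean concreteIndexRequested(String indexExpression) {
        for (String expr : indexExpression.split(",")) {
            if (expr.charAt(0) == '<' || expr.startsWith("-<")) {
                // date math expressions are skipped, as in the diff
                continue;
            }
            if (expr.indexOf('*') < 0) {
                // no wildcard in this segment, so it is treated as a concrete index
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(concreteIndexRequested("logs-*"));          // false
        System.out.println(concreteIndexRequested("logs-*,nomatch"));  // true
        System.out.println(concreteIndexRequested("<logs-{now/d}>"));  // false
        System.out.println(concreteIndexRequested("nomatch"));         // true
    }
}
```

Under rule P3, only the expressions for which this returns true make a non-matching local cluster fatal.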

x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/session/IndexResolver.java

Lines changed: 12 additions & 8 deletions
@@ -7,6 +7,7 @@
 package org.elasticsearch.xpack.esql.session;

 import org.elasticsearch.action.ActionListener;
+import org.elasticsearch.action.fieldcaps.FieldCapabilitiesFailure;
 import org.elasticsearch.action.fieldcaps.FieldCapabilitiesIndexResponse;
 import org.elasticsearch.action.fieldcaps.FieldCapabilitiesRequest;
 import org.elasticsearch.action.fieldcaps.FieldCapabilitiesResponse;
@@ -143,21 +144,24 @@ public IndexResolution mergedMappings(String indexPattern, FieldCapabilitiesResp
             fields.put(name, field);
         }

+        Map<String, FieldCapabilitiesFailure> unavailableRemotes = EsqlSessionCCSUtils.determineUnavailableRemoteClusters(
+            fieldCapsResponse.getFailures()
+        );
+
+        Map<String, IndexMode> concreteIndices = Maps.newMapWithExpectedSize(fieldCapsResponse.getIndexResponses().size());
+        for (FieldCapabilitiesIndexResponse ir : fieldCapsResponse.getIndexResponses()) {
+            concreteIndices.put(ir.getIndexName(), ir.getIndexMode());
+        }
+
         boolean allEmpty = true;
         for (FieldCapabilitiesIndexResponse ir : fieldCapsResponse.getIndexResponses()) {
             allEmpty &= ir.get().isEmpty();
         }
         if (allEmpty) {
             // If all the mappings are empty we return an empty set of resolved indices to line up with QL
-            return IndexResolution.valid(new EsIndex(indexPattern, rootFields, Map.of()));
-        }
-
-        Map<String, IndexMode> concreteIndices = Maps.newMapWithExpectedSize(fieldCapsResponse.getIndexResponses().size());
-        for (FieldCapabilitiesIndexResponse ir : fieldCapsResponse.getIndexResponses()) {
-            concreteIndices.put(ir.getIndexName(), ir.getIndexMode());
+            return IndexResolution.valid(new EsIndex(indexPattern, rootFields, Map.of()), concreteIndices.keySet(), unavailableRemotes);
         }
-        EsIndex esIndex = new EsIndex(indexPattern, rootFields, concreteIndices);
-        return IndexResolution.valid(esIndex, EsqlSessionCCSUtils.determineUnavailableRemoteClusters(fieldCapsResponse.getFailures()));
+        return IndexResolution.valid(new EsIndex(indexPattern, rootFields, concreteIndices), concreteIndices.keySet(), unavailableRemotes);
     }

     private boolean allNested(List<IndexFieldCapabilities> caps) {
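
One way to see why the resolved index names are now passed separately from the EsIndex: when all mappings are empty the EsIndex carries an empty index map, and deriving "clusters with no matching indices" from that map would flag every requested cluster. The sketch below is a simplified standalone model under stated assumptions: `NoMatchingClustersSketch` and `clusterAliasOf` are hypothetical (the latter stands in for cluster alias parsing, with the empty string meaning the local cluster), and plain `java.util` sets replace the real types.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical, simplified model of computing clusters with no matching indices.
final class NoMatchingClustersSketch {

    // Stand-in for alias parsing: "remote1:logs" -> "remote1", "logs" -> "" (local cluster).
    static String clusterAliasOf(String indexName) {
        int sep = indexName.indexOf(':');
        return sep < 0 ? "" : indexName.substring(0, sep);
    }

    /** Requested clusters minus those with resolved indices, minus those that were unavailable. */
    static Set<String> clustersWithNoMatchingIndices(Set<String> requested, Set<String> resolvedIndices, Set<String> unavailable) {
        Set<String> withResolved = new HashSet<>();
        for (String name : resolvedIndices) {
            withResolved.add(clusterAliasOf(name));
        }
        Set<String> result = new TreeSet<>(requested);
        result.removeAll(withResolved);
        result.removeAll(unavailable);
        return result;
    }

    public static void main(String[] args) {
        Set<String> requested = Set.of("", "remote1", "remote2");
        // Names reported by field-caps, kept even if their mappings are empty.
        Set<String> resolved = Set.of("logs-1", "remote1:logs-1");
        System.out.println(clustersWithNoMatchingIndices(requested, resolved, Set.of())); // [remote2]
        // Using an empty index map instead would make every cluster look non-matching.
        System.out.println(clustersWithNoMatchingIndices(requested, Set.of(), Set.of()).size()); // 3
    }
}
```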
