ESQL: Honor skip_unavailable setting for nonmatching indices errors at planning time #116348

quux00 · 2024-11-06T18:18:43Z

Adds planning-time support in ES|QL (the EsqlSession code) for handling non-matching indices. It covers
how they interact with the remote cluster skip_unavailable setting and how missing indices on the
local cluster (if it is included in the query) are handled.

For clusters included in an ES|QL query:

• skip_unavailable=true means:
if no data is returned from that cluster (due to cluster-not-connected, no matching indices, a missing
concrete index or shard failures during searches), it is not a "fatal" error that causes the entire query to fail.
Instead it is just a failure on that particular cluster and partial data should be returned from other clusters.

• skip_unavailable=false means:
if no data is returned from that cluster (for the same reasons enumerated above), then the whole query
should fail. This allows users to ensure that data is returned from a "required" cluster.

• For the local cluster, ES|QL assumes allow_no_indices=true and the skip_unavailable setting does not apply
(in part because there is no way for a user to set skip_unavailable for the local cluster)

Based on discussions with ES|QL team members, we defined the following rules to be enforced with respect to non-matching index expressions:

Rules enforced at planning time

P1. fail the query if there are no matching indices on any cluster (VerificationException)
P2. fail the query if a skip_unavailable=false cluster has no matching indices (the local cluster already has this rule enforced at planning time)
P3. fail the query if the local cluster has no matching indices and a concrete index was specified
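
As a rough illustration, the three planning-time rules could be expressed as one check over the clusters named in the query. This is a minimal sketch under stated assumptions, not the code in this PR: `ClusterTarget`, `PlanningRulesSketch`, and the message strings are hypothetical names, and `matchingIndices` stands in for whatever field-caps reports for each cluster.

```java
import java.util.List;
import java.util.Optional;

/** Hypothetical stand-in for one cluster's share of a FROM clause (not a real ES|QL type). */
record ClusterTarget(String alias, boolean local, boolean skipUnavailable,
                     boolean hasConcreteIndex, int matchingIndices) {}

final class PlanningRulesSketch {
    /** Returns a failure message if the query should fail at planning time, otherwise empty. */
    static Optional<String> check(List<ClusterTarget> targets) {
        // P1: nothing matched on any cluster.
        if (targets.stream().noneMatch(t -> t.matchingIndices() > 0)) {
            return Optional.of("P1: no matching indices on any cluster");
        }
        for (ClusterTarget t : targets) {
            if (t.matchingIndices() > 0) {
                continue; // this cluster contributes data, nothing to enforce
            }
            // P2: a "required" remote (skip_unavailable=false) matched nothing.
            if (!t.local() && !t.skipUnavailable()) {
                return Optional.of("P2: no matching indices on required cluster " + t.alias());
            }
            // P3: the local cluster matched nothing although a concrete index was named.
            if (t.local() && t.hasConcreteIndex()) {
                return Optional.of("P3: missing concrete index on the local cluster");
            }
        }
        return Optional.empty();
    }
}
```

In this model, a query such as FROM remote:existent,nomatch (remote matches, local concrete index missing) trips P3, while a skip_unavailable=true remote with no matches passes the check.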

Rules enforced at execution time

For missing concrete (no wildcards present) index expressions:
E1. fail the query when it was specified for the local cluster or a skip_unavailable=false remote cluster
E2. on skip_unavailable=true clusters: an error fails the query on that cluster, but not the entire query
(data from other clusters is still returned)
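
The E1/E2 split comes down to a two-input decision. The sketch below only illustrates that decision; `ExecutionRulesSketch` and its `Outcome` enum are hypothetical names and do not reflect how the follow-on PR implements it.

```java
final class ExecutionRulesSketch {
    /** What a missing concrete index means for the overall query. */
    enum Outcome { FAIL_QUERY, FAIL_CLUSTER_ONLY }

    static Outcome onMissingConcreteIndex(boolean localCluster, boolean skipUnavailable) {
        // E1: the local cluster and skip_unavailable=false remotes are "required".
        if (localCluster || !skipUnavailable) {
            return Outcome.FAIL_QUERY;
        }
        // E2: a skip_unavailable=true remote only fails its own part of the query.
        return Outcome.FAIL_CLUSTER_ONLY;
    }
}
```

The skip_unavailable argument is ignored for the local cluster, matching the statement above that the setting does not apply there.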

Notes on the rules

P1: this already happens, so no new code is needed in this PR.
P2: We need to enforce P2 at planning time because, when field caps returns no matching indices for a
cluster, the EsIndex that is created (and passed into IndexResolution.valid) leaves that cluster out of the
list. At execution time that cluster is therefore never queried, so execution time cannot catch
missing concrete indices. And even if the cluster were queried at execution time, the query would not fail on
wildcard-only index expressions where none of them matched.
P3: Right now FROM remote:existent,nomatch does NOT throw a failure (for the same reason described under P2 above),
so that needs to be enforced in this PR.

This PR deals with enforcing and testing the planning time rules: P1, P2 and P3. A follow-on PR will address changes needed for handling the execution time rules.

Notes on PR scope

This PR covers nonsecured clusters (xpack.security.enabled: false) and security using certs ("RCS1").
In my testing I've found that API-key-based security ("RCS2") is not behaving as expected, so that
work has been deferred to a follow-on PR.

Partially addresses #114531

elasticsearchmachine · 2024-11-06T18:19:08Z

Hi @quux00, I've created a changelog YAML for you.

…des in EsIndex to always have the list of indices resolved by field-caps because when there are no mappings found in that index we pass in an empty map (which seems wrong). I tried to stop doing that but soon hit layers of ESQL I don't understand, so I tried the simpler approach of adding a new (somewhat redundant) field to IndexResolver: Set<String> resolvedIndices. This lets the CCS handler code in EsqlSession.preAnalyze (which calls EsqlSessionCCSUtils.updateExecutionInfoWithClustersWithNoMatchingIndices) determine whether a cluster had no matching indices from the field-caps call, so that either an error can be thrown or the CCS metadata can be updated.
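
To make the role of resolvedIndices concrete, here is a minimal sketch of how a set of resolved index names could reveal which requested clusters matched nothing. It assumes remote indices are qualified as `alias:index` and local indices carry no prefix; `NoMatchClustersSketch`, `clusterOf`, and the `(local)` label are hypothetical and are not taken from EsqlSessionCCSUtils.

```java
import java.util.LinkedHashSet;
import java.util.Set;

final class NoMatchClustersSketch {
    /** Hypothetical label for the local cluster in this sketch. */
    static final String LOCAL = "(local)";

    /** Assumes remote indices look like "alias:index"; anything without ':' is local. */
    static String clusterOf(String indexName) {
        int sep = indexName.indexOf(':');
        return sep < 0 ? LOCAL : indexName.substring(0, sep);
    }

    /** Clusters that were requested but contributed no index to the field-caps result. */
    static Set<String> clustersWithNoMatch(Set<String> requestedClusters, Set<String> resolvedIndices) {
        Set<String> remaining = new LinkedHashSet<>(requestedClusters);
        for (String index : resolvedIndices) {
            remaining.remove(clusterOf(index));
        }
        return remaining;
    }
}
```

Each cluster left in the result is a candidate for either a planning-time error or a CCS metadata update, depending on its skip_unavailable setting.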

costin · 2024-11-13T05:32:03Z

x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/index/IndexResolution.java


-    public static IndexResolution valid(EsIndex index) {
-        return valid(index, Collections.emptyMap());
+    public static IndexResolution valid(EsIndex index, Set<String> resolvedIndices) {


Nit: to avoid the noise in the PR, keep the old method in place and extract the name from EsIndex:

public static IndexResolution valid(EsIndex index) { return valid(index, Collections.singleton(index.name())); }

Good idea. Except we should use index.concreteIndices(), not index name, since the latter is the original unresolved user requested index expression. Fixed in next push.
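
The delegating overload discussed here might look like the following sketch. `EsIndexStub` and `IndexResolutionStub` are hypothetical stand-ins, reduced to the two fields that matter for this point; the real EsIndex and IndexResolution classes carry more state.

```java
import java.util.Set;

/** Hypothetical stand-in for EsIndex: the user's expression plus the indices it resolved to. */
record EsIndexStub(String name, Set<String> concreteIndices) {}

/** Hypothetical stand-in for IndexResolution. */
final class IndexResolutionStub {
    private final EsIndexStub index;
    private final Set<String> resolvedIndices;

    private IndexResolutionStub(EsIndexStub index, Set<String> resolvedIndices) {
        this.index = index;
        this.resolvedIndices = Set.copyOf(resolvedIndices);
    }

    /** Old single-argument entry point, kept so existing call sites do not change. */
    static IndexResolutionStub valid(EsIndexStub index) {
        // Use the resolved concrete indices, not index.name(): the name is the
        // original, unresolved expression the user typed (for example "logs-*").
        return valid(index, index.concreteIndices());
    }

    static IndexResolutionStub valid(EsIndexStub index, Set<String> resolvedIndices) {
        return new IndexResolutionStub(index, resolvedIndices);
    }

    EsIndexStub index() {
        return index;
    }

    Set<String> resolvedIndices() {
        return resolvedIndices;
    }
}
```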

costin · 2024-11-13T05:32:56Z

x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/index/IndexResolution.java

+    /**
+     * @return all indices found by field-caps (regardless of whether they had any mappings)
+     */
+    public Set<String> getResolvedIndices() {


Nit: since there's no setter, you can drop the get -> resolvedIndices()

OK. Fixed in next push. I also changed getUnavailableClusters() to unavailableClusters().

costin · 2024-11-13T05:33:26Z

x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/index/IndexResolution.java

+        return Objects.hash(index, invalid, resolvedIndices, unavailableClusters);
    }

    @Override


You probably want to have the resolved indices in toString()

Good idea. Updated in next push.
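
Taken together, the review nits (accessor without a `get` prefix, resolved indices in `hashCode` and `toString`) amount to treating the resolution as a small value object. The sketch below shows that shape with a hypothetical class `ResolutionValueSketch` that has only two fields; the real class also tracks `invalid` and `unavailableClusters`, as the diff above shows.

```java
import java.util.Objects;
import java.util.Set;
import java.util.TreeSet;

final class ResolutionValueSketch {
    private final String index; // stand-in for the EsIndex field
    private final Set<String> resolvedIndices;

    ResolutionValueSketch(String index, Set<String> resolvedIndices) {
        this.index = index;
        this.resolvedIndices = Set.copyOf(resolvedIndices);
    }

    /** No setter exists, so the accessor drops the "get" prefix. */
    Set<String> resolvedIndices() {
        return resolvedIndices;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof ResolutionValueSketch)) {
            return false;
        }
        ResolutionValueSketch other = (ResolutionValueSketch) o;
        return index.equals(other.index) && resolvedIndices.equals(other.resolvedIndices);
    }

    @Override
    public int hashCode() {
        return Objects.hash(index, resolvedIndices);
    }

    @Override
    public String toString() {
        // Sort the names so the output is stable regardless of set iteration order.
        return index + " resolvedIndices=" + new TreeSet<>(resolvedIndices);
    }
}
```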

costin

Minor drive-by comments.

quux00 · 2024-11-13T15:46:43Z

@elasticsearchmachine run elasticsearch-ci/part-1

elasticsearchmachine · 2024-11-13T17:59:26Z

Hi @quux00, I've updated the changelog YAML for you.

elasticsearchmachine · 2024-11-14T15:44:58Z

💚 Backport successful

Status	Branch	Result
✅	8.x

…time other than disconnected exceptions (#120236)

For ES|QL, we are moving to limit the scope of the skip_unavailable setting for remote clusters. Going forward, skip_unavailable will be considered for two scenarios:
1) inability to connect to a remote cluster ("unavailable")
2) whether to fail on execution time errors or not (in line with the upcoming allow_partial_search_results work for ES|QL).

This PR reverses the special plan-time handling for skip_unavailable=true clusters that was added in #116348. Remote clusters, regardless of their skip_unavailable setting, will now use the same logic as the local cluster for index expression analysis at plan time, namely:
1) If any concrete index specified is missing from the cluster, a VerificationException will be thrown
2) If no matching index/alias/datastream was found on any cluster (even if all were specified with a wildcard), a VerificationException will be thrown

Thus, we no longer require at least one matching index expression for skip_unavailable=false clusters either, as was done in the previous PR referenced above.
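
For contrast with the rules introduced in #116348, the uniform plan-time logic described in that follow-up could be sketched as below. `IndexExpression` and `UniformPlanTimeRulesSketch` are hypothetical names, an expression is treated as concrete simply when it contains no `*`, and the returned strings stand in for the VerificationException that would be thrown.

```java
import java.util.List;
import java.util.Optional;

/** Hypothetical: one index expression from FROM, with the number of indices it matched. */
record IndexExpression(String cluster, String pattern, int matches) {
    boolean isConcrete() {
        return pattern.indexOf('*') < 0;
    }
}

final class UniformPlanTimeRulesSketch {
    /** Returns a failure message if planning should fail, otherwise empty. */
    static Optional<String> check(List<IndexExpression> expressions) {
        // 1) A missing concrete index fails the query on any cluster, local or remote.
        for (IndexExpression e : expressions) {
            if (e.isConcrete() && e.matches() == 0) {
                return Optional.of("missing concrete index [" + e.pattern() + "] on cluster [" + e.cluster() + "]");
            }
        }
        // 2) Nothing matched anywhere, even if every expression was a wildcard.
        if (expressions.stream().noneMatch(e -> e.matches() > 0)) {
            return Optional.of("no matching index on any cluster");
        }
        return Optional.empty();
    }
}
```

No skip_unavailable flag appears in this version: under the follow-up's description, the setting no longer influences index expression analysis at plan time.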

quux00 added the >enhancement, Team:Analytics, auto-backport, :Analytics/ES|QL, v9.0.0 and v8.17.0 labels Nov 6, 2024

quux00 changed the title ~~DRAFT: ESQL honors skip_unavailable setting for nonmatching indices errors at planning time~~ DRAFT: ES|QL honors skip_unavailable setting for nonmatching indices errors at planning time Nov 6, 2024

quux00 force-pushed the esql-ccs/skip_unavailable-missing-indices-t2 branch from 4b574b4 to d8bba72 Compare November 6, 2024 18:30

smalyshev reviewed Nov 6, 2024

.../src/internalClusterTest/java/org/elasticsearch/xpack/esql/action/CrossClustersEnrichIT.java (outdated, resolved)

smalyshev reviewed Nov 6, 2024

x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/session/EsqlSessionCCSUtils.java (outdated, resolved)

quux00 force-pushed the esql-ccs/skip_unavailable-missing-indices-t2 branch from 8770e2c to b642e6d Compare November 7, 2024 15:47

quux00 added 6 commits November 7, 2024 11:20

Intmd commit - going to split into refactoring ticket

de67736

updateExecutionInfoWithClustersWithNoMatchingIndices updated to follo…

0669837

…w new rules; tests updated

Update docs/changelog/116348.yaml

d89845b

Fixed checkstyle issues

1b0dc2e

Fixed forbiddenAPI issues in CrossClustersQueryIT

33cd137

quux00 force-pushed the esql-ccs/skip_unavailable-missing-indices-t2 branch from b642e6d to 33cd137 Compare November 7, 2024 16:20

quux00 added 7 commits November 8, 2024 10:45

Added CrossClusterEsqlRCS1MissingIndicesIT

c5bf23e

Merge remote-tracking branch 'elastic/main' into esql-ccs/skip_unavai…

3b55eed

…lable-missing-indices-t2

Added index aliases and filtered alias to CrossClustersQueryIT for no…

d72f22a

…nmatching indices tests

Merge remote-tracking branch 'elastic/main' into esql-ccs/skip_unavai…

0693265

…lable-missing-indices-t2

Added tests against index with no mappings to ensure correct error me…

a619698

…ssage is returned Adjusted RemoteClusterSecurityEsqlIT so it passes for skip_unavailable=true, although the error message returned for skip_unavailable=false is still wrong.

I have decided that this ticket will NOT try to handle missing enrich…

3016c28

… policies for skip_un=true clusters; requires modifying the Analyzer

Merge remote-tracking branch 'elastic/main' into esql-ccs/skip_unavai…

1c0e8ca

…lable-missing-indices-t2

quux00 marked this pull request as ready for review November 8, 2024 22:09

quux00 requested review from astefan, nik9000 and pawankartik-elastic November 8, 2024 22:09

costin reviewed Nov 13, 2024

quux00 added 3 commits November 13, 2024 08:51

Merge remote-tracking branch 'elastic/main' into esql-ccs/skip_unavai…

3172e79

…lable-missing-indices-t2

PR feedback: Removed IndexResolution.valid(EsIndex index,Set<String> …

690670a

…resolvedIndices) to avoid PR noise, since it can be inferred from EsIndex contents

Merge remote-tracking branch 'elastic/main' into esql-ccs/skip_unavai…

931f83d

…lable-missing-indices-t2

quux00 requested a review from nik9000 November 13, 2024 15:12

Merge remote-tracking branch 'elastic/main' into esql-ccs/skip_unavai…

6c06c68

…lable-missing-indices-t2

quux00 changed the title ~~ES|QL honors skip_unavailable setting for nonmatching indices errors at planning time~~ ESQL: Honor skip_unavailable setting for nonmatching indices errors at planning time Nov 13, 2024

Update docs/changelog/116348.yaml

8d4d89b

quux00 added 3 commits November 13, 2024 13:53

Changing IndexResolution toString for invalid cases causes tests to b…

13c5c96

…reak, so reverting original model for invalid case

Merge remote-tracking branch 'elastic/main' into esql-ccs/skip_unavai…

f6b9ec5

…lable-missing-indices-t2

Fix changelong error

6e99656

nik9000 approved these changes Nov 13, 2024

quux00 merged commit cca7c15 into elastic:main Nov 14, 2024
16 checks passed

quux00 mentioned this pull request Nov 14, 2024

[8.x] ESQL: Honor skip_unavailable setting for nonmatching indices errors at planning time (#116348) #116824

Merged

quux00 mentioned this pull request Nov 14, 2024

ESQL: CCS skip_unavailable testing for non-matching index expressions under RCS2 #116846

Merged

quux00 mentioned this pull request Nov 22, 2024

ESQL: Missing enrich policies on skip_unavailable=true clusters no longer fail the query #116972

Closed

quux00 mentioned this pull request Jan 16, 2025

No special handling rules for skip_unavailable=true clusters at plan time other than disconnected exceptions #120236

Merged
