Skip to content

Conversation

@quux00
Copy link
Contributor

@quux00 quux00 commented Nov 6, 2024

Adds support to ES|QL planning time (EsqlSession code) for dealing with non-matching indices and
how that relates to the remote cluster skip_unavailable setting and also how to deal with missing
indices on the local cluster (if included in the query).

For clusters included in an ES|QL query:

skip_unavailable=true means:
if no data is returned from that cluster (due to cluster-not-connected, no matching indices, a missing
concrete index or shard failures during searches), it is not a "fatal" error that causes the entire query to fail.
Instead it is just a failure on that particular cluster and partial data should be returned from other clusters.

skip_unavailable=false means:
if no data is returned from that cluster (for the same reasons enumerated above), then the whole query
should fail. This allows users to ensure that data is returned from a "required" cluster.

• For the local cluster, ES|QL assumes allow_no_indices=true and the skip_unavailable setting does not apply
(in part because there is no way for a user to set skip_unavailable for the local cluster)

Based on discussions with ES|QL team members, we defined the following rules to be enforced with respect to non-matching index expressions:

Rules enforced at planning time

P1. fail the query if there are no matching indices on any cluster (VerificationException)
P2. fail the query if a skip_unavailable:false cluster has no matching indices (the local cluster already has this rule enforced at planning time)
P3. fail query if the local cluster has no matching indices and a concrete index was specified

Rules enforced at execution time

For missing concrete (no wildcards present) index expressions:
E1. fail the query when it was specified for the local cluster or a skip_unavailable=false remote cluster
E2: on skip_unavailable=true clusters: an error fails the query on that cluster, but not the entire query
(data from other clusters still returned)

Notes on the rules

P1: this already happens, no new code needed in this PR
P2: The reason we need to enforce rule 2 at planning time is that when there are no matching indices from
field caps the EsIndex that is created (and passed into IndexResolution.valid) leaves that cluster out of the
list, so at execution time it will not attempt to query that cluster at all, so execution time will not catch
missing concrete indices. And even if it did get queried at execution time it wouldn't fail on wildcard only
indices where none of them matched.
P3: Right now FROM remote:existent,nomatch does NOT throw a failure (for same reason described in rule 2 above)
so that needs to be enforced in this PR.

This PR deals with enforcing and testing the planning time rules: P1, P2 and P3. A follow-on PR will address changes needed for handling the execution time rules.

Notes on PR scope

This PR covers nonsecured clusters (xpack.security.enabled: false) and security using certs ("RCS1).
In my testing I've founding that api-key based security ("RCS2") is not behaving as expected, so that
work has been deferred to a follow-on PR.

Partially addresses #114531

@quux00 quux00 added >enhancement Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) auto-backport Automatically create backport pull requests when merged :Analytics/ES|QL AKA ESQL v9.0.0 v8.17.0 labels Nov 6, 2024
@elasticsearchmachine
Copy link
Collaborator

Hi @quux00, I've created a changelog YAML for you.

@quux00 quux00 changed the title DRAFT: ESQL honors skip_unavailable setting for nonmatching indices errors at planning time DRAFT: ES|QL honors skip_unavailable setting for nonmatching indices errors at planning time Nov 6, 2024
@quux00 quux00 force-pushed the esql-ccs/skip_unavailable-missing-indices-t2 branch from 4b574b4 to d8bba72 Compare November 6, 2024 18:30
@quux00 quux00 force-pushed the esql-ccs/skip_unavailable-missing-indices-t2 branch from 8770e2c to b642e6d Compare November 7, 2024 15:47
…des in EsIndex to always have the list

of indices resolved by field-caps because when there are no mappings found in that index we pass in an empty
map (which seems wrong). I tried to stop doing that but soon hit layers of ESQL I don't understand, so tried
the simplier approach of adding a new (somewhat redundant) new field to IndexResolver: Set<String> resolvedIndices.

This solves the problem of allowing the CCS handler code in EsqlSession.preAnalyze (which calls
EsqlSessionCCSUtils.updateExecutionInfoWithClustersWithNoMatchingIndices) to determine whether or not a cluster
had no matching indices from the field-caps call, allowing either an error to be thrown or for the CCS metadata to
be updated.
@quux00 quux00 force-pushed the esql-ccs/skip_unavailable-missing-indices-t2 branch from b642e6d to 33cd137 Compare November 7, 2024 16:20
…ssage is returned

Adjusted RemoteClusterSecurityEsqlIT so it passes for skip_unavailable=true, although
the error message returned for skip_unavailable=false is still wrong.
… policies for skip_un=true clusters; requires modifying the Analyzer
@quux00 quux00 marked this pull request as ready for review November 8, 2024 22:09

public static IndexResolution valid(EsIndex index) {
return valid(index, Collections.emptyMap());
public static IndexResolution valid(EsIndex index, Set<String> resolvedIndices) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: to avoid the noise in the PR, keep the old method in place and extract the name from EsIndex:

valid(EsIndex index) { return valid(index,  singletonSet(index.name())}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea. Except we should use index.concreteIndices(), not index name, since the latter is the original unresolved user requested index expression. Fixed in next push.

/**
* @return all indices found by field-caps (regardless of whether they had any mappings)
*/
public Set<String> getResolvedIndices() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: since there's no setter, you can drop the get -> resolvedIndices()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. Fixed in next push. I also changed getUnavailableClusters() to unavailableClusters().

return Objects.hash(index, invalid, resolvedIndices, unavailableClusters);
}

@Override
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You probably want to have the resolved indices in toString()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea. Updated in next push.

Copy link
Member

@costin costin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor drive-by comments.

@quux00 quux00 requested a review from nik9000 November 13, 2024 15:12
@quux00
Copy link
Contributor Author

quux00 commented Nov 13, 2024

@elasticsearchmachine run elasticsearch-ci/part-1

@quux00 quux00 changed the title ES|QL honors skip_unavailable setting for nonmatching indices errors at planning time ESQL: Honor skip_unavailable setting for nonmatching indices errors at planning time Nov 13, 2024
@elasticsearchmachine
Copy link
Collaborator

Hi @quux00, I've updated the changelog YAML for you.

@quux00 quux00 merged commit cca7c15 into elastic:main Nov 14, 2024
16 checks passed
@elasticsearchmachine
Copy link
Collaborator

💚 Backport successful

Status Branch Result
8.x

quux00 added a commit to quux00/elasticsearch that referenced this pull request Nov 14, 2024
…t planning time (elastic#116348)

Adds support to ES|QL planning time (EsqlSession code) for dealing with non-matching indices and
how that relates to the remote cluster skip_unavailable setting and also how to deal with missing
indices on the local cluster (if included in the query).

For clusters included in an ES|QL query:

• `skip_unavailable=true` means:
  if no data is returned from that cluster (due to cluster-not-connected, no matching indices, a missing
  concrete index or shard failures during searches), it is not a "fatal" error that causes the entire query to fail. 
  Instead it is just a failure on that particular cluster and partial data should be returned from other clusters.

• `skip_unavailable=false` means: 
   if no data is returned from that cluster (for the same reasons enumerated above), then the whole query
should fail. This allows users to ensure that data is returned from a "required" cluster.					

• For the local cluster, ES|QL assumes `allow_no_indices=true` and the skip_unavailable setting does not apply
  (in part because there is no way for a user to set skip_unavailable for the local cluster)

Based on discussions with ES|QL team members, we defined the following rules to be enforced with respect to non-matching index expressions:

**Rules enforced at planning time**

P1. fail the query if there are no matching indices on any cluster (VerificationException)
P2. fail the query if a skip_unavailable:false cluster has no matching indices (the local cluster already has this rule enforced at planning time)
P3. fail query if the local cluster has no matching indices and a concrete index was specified

**Rules enforced at execution time**

For missing concrete (no wildcards present) index expressions:
E1. fail the query when it was specified for the local cluster or a skip_unavailable=false remote cluster
E2: on skip_unavailable=true clusters: an error fails the query on that cluster, but not the entire query
    (data from other clusters still returned)

**Notes on the rules**

P1: this already happens, no new code needed in this PR
P2: The reason we need to enforce rule 2 at planning time is that when there are no matching indices from
  field caps the EsIndex that is created (and passed into IndexResolution.valid) leaves that cluster out of the
  list, so at execution time it will not attempt to query that cluster at all, so execution time will not catch
  missing concrete indices. And even if it did get queried at execution time it wouldn't fail on wildcard only
  indices where none of them matched.
P3: Right now `FROM remote:existent,nomatch` does NOT throw a failure (for same reason described in rule 2 above)
so that needs to be enforced in this PR.

This PR deals with enforcing and testing the planning time rules: P1, P2 and P3. A follow-on PR will address changes needed for handling the execution time rules.

**Notes on PR scope**

This PR covers nonsecured clusters (`xpack.security.enabled: false`) and security using certs ("RCS1). 
In my testing I've founding that api-key based security ("RCS2") is not behaving the same, so that
work has been deferred to a follow-on PR.

Partially addresses elastic#114531
elasticsearchmachine pushed a commit that referenced this pull request Nov 14, 2024
…t planning time (#116348) (#116824)

Adds support to ES|QL planning time (EsqlSession code) for dealing with non-matching indices and
how that relates to the remote cluster skip_unavailable setting and also how to deal with missing
indices on the local cluster (if included in the query).

For clusters included in an ES|QL query:

• `skip_unavailable=true` means:
  if no data is returned from that cluster (due to cluster-not-connected, no matching indices, a missing
  concrete index or shard failures during searches), it is not a "fatal" error that causes the entire query to fail. 
  Instead it is just a failure on that particular cluster and partial data should be returned from other clusters.

• `skip_unavailable=false` means: 
   if no data is returned from that cluster (for the same reasons enumerated above), then the whole query
should fail. This allows users to ensure that data is returned from a "required" cluster.					

• For the local cluster, ES|QL assumes `allow_no_indices=true` and the skip_unavailable setting does not apply
  (in part because there is no way for a user to set skip_unavailable for the local cluster)

Based on discussions with ES|QL team members, we defined the following rules to be enforced with respect to non-matching index expressions:

**Rules enforced at planning time**

P1. fail the query if there are no matching indices on any cluster (VerificationException)
P2. fail the query if a skip_unavailable:false cluster has no matching indices (the local cluster already has this rule enforced at planning time)
P3. fail query if the local cluster has no matching indices and a concrete index was specified

**Rules enforced at execution time**

For missing concrete (no wildcards present) index expressions:
E1. fail the query when it was specified for the local cluster or a skip_unavailable=false remote cluster
E2: on skip_unavailable=true clusters: an error fails the query on that cluster, but not the entire query
    (data from other clusters still returned)

**Notes on the rules**

P1: this already happens, no new code needed in this PR
P2: The reason we need to enforce rule 2 at planning time is that when there are no matching indices from
  field caps the EsIndex that is created (and passed into IndexResolution.valid) leaves that cluster out of the
  list, so at execution time it will not attempt to query that cluster at all, so execution time will not catch
  missing concrete indices. And even if it did get queried at execution time it wouldn't fail on wildcard only
  indices where none of them matched.
P3: Right now `FROM remote:existent,nomatch` does NOT throw a failure (for same reason described in rule 2 above)
so that needs to be enforced in this PR.

This PR deals with enforcing and testing the planning time rules: P1, P2 and P3. A follow-on PR will address changes needed for handling the execution time rules.

**Notes on PR scope**

This PR covers nonsecured clusters (`xpack.security.enabled: false`) and security using certs ("RCS1). 
In my testing I've founding that api-key based security ("RCS2") is not behaving the same, so that
work has been deferred to a follow-on PR.

Partially addresses #114531
salvatore-campagna pushed a commit to salvatore-campagna/elasticsearch that referenced this pull request Nov 18, 2024
…t planning time (elastic#116348)

Adds support to ES|QL planning time (EsqlSession code) for dealing with non-matching indices and
how that relates to the remote cluster skip_unavailable setting and also how to deal with missing
indices on the local cluster (if included in the query).

For clusters included in an ES|QL query:

• `skip_unavailable=true` means:
  if no data is returned from that cluster (due to cluster-not-connected, no matching indices, a missing
  concrete index or shard failures during searches), it is not a "fatal" error that causes the entire query to fail. 
  Instead it is just a failure on that particular cluster and partial data should be returned from other clusters.

• `skip_unavailable=false` means: 
   if no data is returned from that cluster (for the same reasons enumerated above), then the whole query
should fail. This allows users to ensure that data is returned from a "required" cluster.					

• For the local cluster, ES|QL assumes `allow_no_indices=true` and the skip_unavailable setting does not apply
  (in part because there is no way for a user to set skip_unavailable for the local cluster)

Based on discussions with ES|QL team members, we defined the following rules to be enforced with respect to non-matching index expressions:

**Rules enforced at planning time**

P1. fail the query if there are no matching indices on any cluster (VerificationException)
P2. fail the query if a skip_unavailable:false cluster has no matching indices (the local cluster already has this rule enforced at planning time)
P3. fail query if the local cluster has no matching indices and a concrete index was specified

**Rules enforced at execution time**

For missing concrete (no wildcards present) index expressions:
E1. fail the query when it was specified for the local cluster or a skip_unavailable=false remote cluster
E2: on skip_unavailable=true clusters: an error fails the query on that cluster, but not the entire query
    (data from other clusters still returned)

**Notes on the rules**

P1: this already happens, no new code needed in this PR
P2: The reason we need to enforce rule 2 at planning time is that when there are no matching indices from
  field caps the EsIndex that is created (and passed into IndexResolution.valid) leaves that cluster out of the
  list, so at execution time it will not attempt to query that cluster at all, so execution time will not catch
  missing concrete indices. And even if it did get queried at execution time it wouldn't fail on wildcard only
  indices where none of them matched.
P3: Right now `FROM remote:existent,nomatch` does NOT throw a failure (for same reason described in rule 2 above)
so that needs to be enforced in this PR.

This PR deals with enforcing and testing the planning time rules: P1, P2 and P3. A follow-on PR will address changes needed for handling the execution time rules.

**Notes on PR scope**

This PR covers nonsecured clusters (`xpack.security.enabled: false`) and security using certs ("RCS1). 
In my testing I've founding that api-key based security ("RCS2") is not behaving the same, so that
work has been deferred to a follow-on PR.

Partially addresses elastic#114531
alexey-ivanov-es pushed a commit to alexey-ivanov-es/elasticsearch that referenced this pull request Nov 28, 2024
…t planning time (elastic#116348)

Adds support to ES|QL planning time (EsqlSession code) for dealing with non-matching indices and
how that relates to the remote cluster skip_unavailable setting and also how to deal with missing
indices on the local cluster (if included in the query).

For clusters included in an ES|QL query:

• `skip_unavailable=true` means:
  if no data is returned from that cluster (due to cluster-not-connected, no matching indices, a missing
  concrete index or shard failures during searches), it is not a "fatal" error that causes the entire query to fail. 
  Instead it is just a failure on that particular cluster and partial data should be returned from other clusters.

• `skip_unavailable=false` means: 
   if no data is returned from that cluster (for the same reasons enumerated above), then the whole query
should fail. This allows users to ensure that data is returned from a "required" cluster.					

• For the local cluster, ES|QL assumes `allow_no_indices=true` and the skip_unavailable setting does not apply
  (in part because there is no way for a user to set skip_unavailable for the local cluster)

Based on discussions with ES|QL team members, we defined the following rules to be enforced with respect to non-matching index expressions:

**Rules enforced at planning time**

P1. fail the query if there are no matching indices on any cluster (VerificationException)
P2. fail the query if a skip_unavailable:false cluster has no matching indices (the local cluster already has this rule enforced at planning time)
P3. fail query if the local cluster has no matching indices and a concrete index was specified

**Rules enforced at execution time**

For missing concrete (no wildcards present) index expressions:
E1. fail the query when it was specified for the local cluster or a skip_unavailable=false remote cluster
E2: on skip_unavailable=true clusters: an error fails the query on that cluster, but not the entire query
    (data from other clusters still returned)

**Notes on the rules**

P1: this already happens, no new code needed in this PR
P2: The reason we need to enforce rule 2 at planning time is that when there are no matching indices from
  field caps the EsIndex that is created (and passed into IndexResolution.valid) leaves that cluster out of the
  list, so at execution time it will not attempt to query that cluster at all, so execution time will not catch
  missing concrete indices. And even if it did get queried at execution time it wouldn't fail on wildcard only
  indices where none of them matched.
P3: Right now `FROM remote:existent,nomatch` does NOT throw a failure (for same reason described in rule 2 above)
so that needs to be enforced in this PR.

This PR deals with enforcing and testing the planning time rules: P1, P2 and P3. A follow-on PR will address changes needed for handling the execution time rules.

**Notes on PR scope**

This PR covers nonsecured clusters (`xpack.security.enabled: false`) and security using certs ("RCS1). 
In my testing I've founding that api-key based security ("RCS2") is not behaving the same, so that
work has been deferred to a follow-on PR.

Partially addresses elastic#114531
quux00 added a commit that referenced this pull request Jan 22, 2025
…time other than disconnected exceptions (#120236)

For ES|QL, we are moving to limit the scope of the skip_unavailable setting for remote clusters.
Going forward, skip_unavailable will be considered for two scenarios:

1) inability to connect to a remote cluster ("unavailable")
2) whether to fail on execution time errors or not (inline with the upcoming 
allow_partial_search_results work for ES|QL).

This PR reverses the special plan-time handling for skip_unavailable=true clusters
that was added in #116348. Remote clusters, regardless of their skip_unavailable setting, 
will now use the same logic as the local cluster for index expression analysis at plan time, namely:

1) If any concrete index specified is missing from the cluster, a VerificationException will be thrown
2) If no matching index/alias/datastream was found on any cluster (even if all were specified with a wildcard), 
a VerificationException will be thrown

Thus, we no longer require at least one matching index expression for skip_unavailable=false 
clusters either, as was done in the previous PR referenced above.
quux00 added a commit to quux00/elasticsearch that referenced this pull request Jan 22, 2025
…time other than disconnected exceptions (elastic#120236)

For ES|QL, we are moving to limit the scope of the skip_unavailable setting for remote clusters.
Going forward, skip_unavailable will be considered for two scenarios:

1) inability to connect to a remote cluster ("unavailable")
2) whether to fail on execution time errors or not (inline with the upcoming 
allow_partial_search_results work for ES|QL).

This PR reverses the special plan-time handling for skip_unavailable=true clusters
that was added in elastic#116348. Remote clusters, regardless of their skip_unavailable setting, 
will now use the same logic as the local cluster for index expression analysis at plan time, namely:

1) If any concrete index specified is missing from the cluster, a VerificationException will be thrown
2) If no matching index/alias/datastream was found on any cluster (even if all were specified with a wildcard), 
a VerificationException will be thrown

Thus, we no longer require at least one matching index expression for skip_unavailable=false 
clusters either, as was done in the previous PR referenced above.
elasticsearchmachine pushed a commit that referenced this pull request Jan 22, 2025
…time other than disconnected exceptions (#120236) (#120628)

For ES|QL, we are moving to limit the scope of the skip_unavailable setting for remote clusters.
Going forward, skip_unavailable will be considered for two scenarios:

1) inability to connect to a remote cluster ("unavailable")
2) whether to fail on execution time errors or not (inline with the upcoming 
allow_partial_search_results work for ES|QL).

This PR reverses the special plan-time handling for skip_unavailable=true clusters
that was added in #116348. Remote clusters, regardless of their skip_unavailable setting, 
will now use the same logic as the local cluster for index expression analysis at plan time, namely:

1) If any concrete index specified is missing from the cluster, a VerificationException will be thrown
2) If no matching index/alias/datastream was found on any cluster (even if all were specified with a wildcard), 
a VerificationException will be thrown

Thus, we no longer require at least one matching index expression for skip_unavailable=false 
clusters either, as was done in the previous PR referenced above.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

:Analytics/ES|QL AKA ESQL auto-backport Automatically create backport pull requests when merged >enhancement Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v8.17.0 v9.0.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants