Skip to content

Conversation

@dimitris-athanasiou
Copy link
Contributor

When a datafeed encounters errors extracting data, often
the error is an instance of SearchPhaseExecutionException.
In that case the top level error message is Partial shards failure
which is not very informative.

This commit refactors a transform util method from
ExceptionRootCauseFinder, which unwraps exceptions with special
handling for SearchPhaseExecutionException, and makes use of
it from datafeed ProblemTracker in order to provide a more
useful error message.

When a datafeed encounters errors extracting data, often
the error is an instance of `SearchPhaseExecutionException`.
In that case the top level error message is `Partial shards failure`
which is not very informative.

This commit refactors a transform util method from
`ExceptionRootCauseFinder`, which unwraps exceptions with special
handling for `SearchPhaseExecutionException`, and makes use of
it from datafeed `ProblemTracker` in order to provide a more
useful error message.
@elasticmachine
Copy link
Collaborator

Pinging @elastic/ml-core (:ml)

Copy link
Member

@benwtrent benwtrent left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have some minor concerns. Overall, this greatly improves things

Comment on lines 130 to 136
for (ShardSearchFailure shardFailure : searchPhaseException.shardFailures()) {
Throwable unwrappedShardFailure = org.elasticsearch.ExceptionsHelper.unwrapCause(shardFailure.getCause());

if (unwrappedShardFailure instanceof ElasticsearchException) {
return unwrappedShardFailure;
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Anyway to add the rest of the shard failures as suppressed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Trying something like that can make the message unreadable. Note we're already logging the SearchPhaseExecutionException.

I think there are 2 scenarios:

  1. the search fails because of some user error (e.g. time field doesn't exist in some index). In this scenario there will be a single cause and we'll bubble it up.
  2. the search fails due to the cluster being unstable. Here, we will have many shard failures. I think in this scenario we'd have to look at the logs anyhow to debug what's happening.

Having said that, I'm open to suggestions if you see another way.

Copy link

@hendrikmuhs hendrikmuhs Dec 10, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't explored this for search, but for indexing I use a similar method that tries to get the most important error. That means you could loop through the shard failures and get the most problematic one, e.g. IllegalArgumentException > RuntimeException, because the IAE is irrecoverable, the RE might just be a 429 (too manu requests).

Having that said: "it depends" on what you are aiming for with this

* @param t raw Throwable
* @return unwrapped throwable if possible
*/
public static Throwable unwrapCause(Throwable t) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is a bad name as it collides with org.elasticsearch.ExceptionsHelper.unwrapCause

Not saying getRootCauseException is a good one, maybe something that indicates the intent: unwrapping search responses, e.g. unwrapSearchException or findSearchExceptionRootCause.

I think this method should not become a general exception handling method, but only for handling search errors. Otherwise the chances are high that code sharing lead to undesired bugs.

For that reason I think it would be good to move it out of the ml subpackage (despite this is just a logical but not technically limiting) into a more common area.

Transform should use this without indirection.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is basically an unwrapCause that also handles SearchPhaseExecutionException. The problem with SearchPhaseExecutionException is that it doesn't implement ElasticsearchWrapperException. Let's discuss this further offline.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I have refactored the code to keep a method named findSearchExceptionRootCause instead of replacing unwrapCause.

* @see org.elasticsearch.xpack.core.ml.utils.ExceptionsHelper#unwrapCause(Throwable)
*/
public static Throwable unwrapCause(Throwable t) {
return org.elasticsearch.xpack.core.ml.utils.ExceptionsHelper.unwrapCause(t);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see my 1st comment, this makes no sense to me

@dimitris-athanasiou
Copy link
Contributor Author

@elasticmachine update branch

Copy link

@hendrikmuhs hendrikmuhs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@dimitris-athanasiou dimitris-athanasiou merged commit f8bda89 into elastic:master Dec 14, 2020
@dimitris-athanasiou dimitris-athanasiou deleted the report-cause-when-datafeed-extraction-encounters-error branch December 14, 2020 17:24
dimitris-athanasiou added a commit to dimitris-athanasiou/elasticsearch that referenced this pull request Dec 14, 2020
…stic#66167)

When a datafeed encounters errors extracting data, often
the error is an instance of `SearchPhaseExecutionException`.
In that case the top level error message is `Partial shards failure`
which is not very informative.

This commit refactors a transform util method from
`ExceptionRootCauseFinder`, which unwraps exceptions with special
handling for `SearchPhaseExecutionException`, and makes use of
it from datafeed `ProblemTracker` in order to provide a more
useful error message.

Backport of elastic#66167
dimitris-athanasiou added a commit that referenced this pull request Dec 14, 2020
) (#66277)

When a datafeed encounters errors extracting data, often
the error is an instance of `SearchPhaseExecutionException`.
In that case the top level error message is `Partial shards failure`
which is not very informative.

This commit refactors a transform util method from
`ExceptionRootCauseFinder`, which unwraps exceptions with special
handling for `SearchPhaseExecutionException`, and makes use of
it from datafeed `ProblemTracker` in order to provide a more
useful error message.

Backport of #66167
jasontedor added a commit to jasontedor/elasticsearch that referenced this pull request Dec 14, 2020
* elastic/master: (33 commits)
  Add searchable snapshot cache folder to NodeEnvironment (elastic#66297)
  [DOCS] Add dynamic runtime fields to docs (elastic#66194)
  Add HDFS searchable snapshot integration (elastic#66185)
  Support canceling cross-clusters search requests (elastic#66206)
  Mute testCacheSurviveRestart (elastic#66289)
  Fix cat tasks api params in spec and handler (elastic#66272)
  Snapshot of a searchable snapshot should be empty (elastic#66162)
  [ML] DFA _explain API should not fail when none field is included (elastic#66281)
  Add action to decommission legacy monitoring cluster alerts (elastic#64373)
  move rollup_index param out of RollupActionConfig (elastic#66139)
  Improve FieldFetcher retrieval of fields (elastic#66160)
  Remove unsed fields in `RestAnalyzeAction` (elastic#66215)
  Simplify searchable snapshot CacheKey (elastic#66263)
  Autoscaling remove feature flags (elastic#65973)
  Improve searchable snapshot mount time (elastic#66198)
  [ML] Report cause when datafeed extraction encounters error (elastic#66167)
  Remove suggest reference in some API specs (elastic#66180)
  Fix warning when installing a plugin for different ESversion (elastic#66146)
  [ML] make `xpack.ml.max_ml_node_size` and `xpack.ml.use_auto_machine_memory_percent` dynamically settable (elastic#66132)
  [DOCS] Add `require_alias` to Bulk API (elastic#66259)
  ...
* @see org.elasticsearch.xpack.core.ml.utils.ExceptionsHelper#findSearchExceptionRootCause(Throwable)
*/
public static Throwable findSearchExceptionRootCause(Throwable t) {
return org.elasticsearch.xpack.core.ml.utils.ExceptionsHelper.findSearchExceptionRootCause(t);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this cross-package dependency ok?
I'd consider introducing ExceptionsHelper class under org.elasticsearch.xpack.core.common instead.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We discussed this with Hendrik. It's not great but harmless otherwise. Adding another ExceptionsHelper class is marginally worse. There is much to improve in our exceptions handling framework. I am going to raise an issue that once fixed should make this method unnecessary.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants