-
Notifications
You must be signed in to change notification settings - Fork 25.6k
[ML] Report cause when datafeed extraction encounters error #66167
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[ML] Report cause when datafeed extraction encounters error #66167
Conversation
When a datafeed encounters errors extracting data, often the error is an instance of `SearchPhaseExecutionException`. In that case the top level error message is `Partial shards failure` which is not very informative. This commit refactors a transform util method from `ExceptionRootCauseFinder`, which unwraps exceptions with special handling for `SearchPhaseExecutionException`, and makes use of it from datafeed `ProblemTracker` in order to provide a more useful error message.
|
Pinging @elastic/ml-core (:ml) |
benwtrent
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have some minor concerns. Overall, this greatly improves things
x-pack/plugin/ml/src/main/java/org/elasticsearch/xpack/ml/datafeed/DatafeedJob.java
Outdated
Show resolved
Hide resolved
x-pack/plugin/ml/src/main/java/org/elasticsearch/xpack/ml/datafeed/DatafeedJob.java
Outdated
Show resolved
Hide resolved
| for (ShardSearchFailure shardFailure : searchPhaseException.shardFailures()) { | ||
| Throwable unwrappedShardFailure = org.elasticsearch.ExceptionsHelper.unwrapCause(shardFailure.getCause()); | ||
|
|
||
| if (unwrappedShardFailure instanceof ElasticsearchException) { | ||
| return unwrappedShardFailure; | ||
| } | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Anyway to add the rest of the shard failures as suppressed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Trying something like that can make the message unreadable. Note we're already logging the SearchPhaseExecutionException.
I think there are 2 scenarios:
- the search fails because of some user error (e.g. time field doesn't exist in some index). In this scenario there will be a single cause and we'll bubble it up.
- the search fails due to the cluster being unstable. Here, we will have many shard failures. I think in this scenario we'd have to look at the logs anyhow to debug what's happening.
Having said that, I'm open to suggestions if you see another way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't explored this for search, but for indexing I use a similar method that tries to get the most important error. That means you could loop through the shard failures and get the most problematic one, e.g. IllegalArgumentException > RuntimeException, because the IAE is irrecoverable, the RE might just be a 429 (too manu requests).
Having that said: "it depends" on what you are aiming for with this
…feed/DatafeedJob.java Co-authored-by: Benjamin Trent <[email protected]>
…feed/DatafeedJob.java Co-authored-by: Benjamin Trent <[email protected]>
| * @param t raw Throwable | ||
| * @return unwrapped throwable if possible | ||
| */ | ||
| public static Throwable unwrapCause(Throwable t) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is a bad name as it collides with org.elasticsearch.ExceptionsHelper.unwrapCause
Not saying getRootCauseException is a good one, maybe something that indicates the intent: unwrapping search responses, e.g. unwrapSearchException or findSearchExceptionRootCause.
I think this method should not become a general exception handling method, but only for handling search errors. Otherwise the chances are high that code sharing lead to undesired bugs.
For that reason I think it would be good to move it out of the ml subpackage (despite this is just a logical but not technically limiting) into a more common area.
Transform should use this without indirection.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is basically an unwrapCause that also handles SearchPhaseExecutionException. The problem with SearchPhaseExecutionException is that it doesn't implement ElasticsearchWrapperException. Let's discuss this further offline.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I have refactored the code to keep a method named findSearchExceptionRootCause instead of replacing unwrapCause.
| * @see org.elasticsearch.xpack.core.ml.utils.ExceptionsHelper#unwrapCause(Throwable) | ||
| */ | ||
| public static Throwable unwrapCause(Throwable t) { | ||
| return org.elasticsearch.xpack.core.ml.utils.ExceptionsHelper.unwrapCause(t); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see my 1st comment, this makes no sense to me
x-pack/plugin/ml/src/main/java/org/elasticsearch/xpack/ml/datafeed/DatafeedJob.java
Show resolved
Hide resolved
x-pack/plugin/ml/src/main/java/org/elasticsearch/xpack/ml/datafeed/DatafeedJob.java
Show resolved
Hide resolved
|
@elasticmachine update branch |
hendrikmuhs
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
…stic#66167) When a datafeed encounters errors extracting data, often the error is an instance of `SearchPhaseExecutionException`. In that case the top level error message is `Partial shards failure` which is not very informative. This commit refactors a transform util method from `ExceptionRootCauseFinder`, which unwraps exceptions with special handling for `SearchPhaseExecutionException`, and makes use of it from datafeed `ProblemTracker` in order to provide a more useful error message. Backport of elastic#66167
) (#66277) When a datafeed encounters errors extracting data, often the error is an instance of `SearchPhaseExecutionException`. In that case the top level error message is `Partial shards failure` which is not very informative. This commit refactors a transform util method from `ExceptionRootCauseFinder`, which unwraps exceptions with special handling for `SearchPhaseExecutionException`, and makes use of it from datafeed `ProblemTracker` in order to provide a more useful error message. Backport of #66167
* elastic/master: (33 commits) Add searchable snapshot cache folder to NodeEnvironment (elastic#66297) [DOCS] Add dynamic runtime fields to docs (elastic#66194) Add HDFS searchable snapshot integration (elastic#66185) Support canceling cross-clusters search requests (elastic#66206) Mute testCacheSurviveRestart (elastic#66289) Fix cat tasks api params in spec and handler (elastic#66272) Snapshot of a searchable snapshot should be empty (elastic#66162) [ML] DFA _explain API should not fail when none field is included (elastic#66281) Add action to decommission legacy monitoring cluster alerts (elastic#64373) move rollup_index param out of RollupActionConfig (elastic#66139) Improve FieldFetcher retrieval of fields (elastic#66160) Remove unsed fields in `RestAnalyzeAction` (elastic#66215) Simplify searchable snapshot CacheKey (elastic#66263) Autoscaling remove feature flags (elastic#65973) Improve searchable snapshot mount time (elastic#66198) [ML] Report cause when datafeed extraction encounters error (elastic#66167) Remove suggest reference in some API specs (elastic#66180) Fix warning when installing a plugin for different ESversion (elastic#66146) [ML] make `xpack.ml.max_ml_node_size` and `xpack.ml.use_auto_machine_memory_percent` dynamically settable (elastic#66132) [DOCS] Add `require_alias` to Bulk API (elastic#66259) ...
| * @see org.elasticsearch.xpack.core.ml.utils.ExceptionsHelper#findSearchExceptionRootCause(Throwable) | ||
| */ | ||
| public static Throwable findSearchExceptionRootCause(Throwable t) { | ||
| return org.elasticsearch.xpack.core.ml.utils.ExceptionsHelper.findSearchExceptionRootCause(t); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this cross-package dependency ok?
I'd consider introducing ExceptionsHelper class under org.elasticsearch.xpack.core.common instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We discussed this with Hendrik. It's not great but harmless otherwise. Adding another ExceptionsHelper class is marginally worse. There is much to improve in our exceptions handling framework. I am going to raise an issue that once fixed should make this method unnecessary.
When a datafeed encounters errors extracting data, often
the error is an instance of
SearchPhaseExecutionException.In that case the top level error message is
Partial shards failurewhich is not very informative.
This commit refactors a transform util method from
ExceptionRootCauseFinder, which unwraps exceptions with specialhandling for
SearchPhaseExecutionException, and makes use ofit from datafeed
ProblemTrackerin order to provide a moreuseful error message.