[ML] Individual bulk index failures in data frame should be treated as index failures  #44101

@hendrikmuhs

Follow up from #43194

Steps to reproduce:

  1. Create a dataset
  2. Create data frame transform
  3. Create destination index with conflicting mappings
  4. Start data frame transform
  5. Notice that the transform finishes without reporting any indexing failures in the _stats API, even though the ES logs show that 100% of the docs failed to index.

Bug:
There are 2 issues:

  • the indexer does not fail
  • the error count is wrong (we should count the bulk index failure as a whole, not the individual index failures)
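
To illustrate the second point, here is a minimal sketch of counting a failed bulk request once rather than once per document. It assumes a bulk response shaped like the Elasticsearch bulk API response (an `items` list with one action key per item); the function names `failed_items` and `index_failure_increment` are hypothetical and not taken from the data frame code.

```python
def failed_items(bulk_response):
    """Return the per-document results that carry an error."""
    failures = []
    for item in bulk_response.get("items", []):
        # Each item has a single action key, e.g. "index".
        for result in item.values():
            if "error" in result:
                failures.append(result)
    return failures


def index_failure_increment(bulk_response):
    """Count a bulk request with any failed document as ONE index failure."""
    return 1 if failed_items(bulk_response) else 0


# Example: three documents rejected by a conflicting mapping.
conflict = {"type": "mapper_parsing_exception", "reason": "failed to parse field"}
response = {
    "errors": True,
    "items": [{"index": {"status": 400, "error": conflict}} for _ in range(3)],
}
stats_delta = index_failure_increment(response)
```

Here `stats_delta` is the amount added to the failure counter for the whole page, regardless of how many documents in that page were rejected.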

Solution:
The error handling should use the same error handler that we use for search failures, meaning it should classify the issue into:

  • temporary failures (e.g. individual shard failures due to trouble on the node holding the shard)
  • irrecoverable failures (e.g. mapping conflict)

Temporary failures shall be retried; irrecoverable failures set the state to failed without a retry.

A retry re-creates and re-indexes the full page as a whole, not the individual index requests (retrying those individually would be very complex).
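
The proposed behaviour could be sketched roughly as follows. This is an illustration under stated assumptions, not the actual indexer code: `classify`, `index_page`, `TransformFailed`, the `max_retries` parameter, and the set of error types treated as irrecoverable are all hypothetical.

```python
from enum import Enum


class FailureKind(Enum):
    TEMPORARY = "temporary"
    IRRECOVERABLE = "irrecoverable"


# Assumption: error types that a retry cannot fix (e.g. mapping conflicts).
IRRECOVERABLE_ERROR_TYPES = {
    "mapper_parsing_exception",
    "strict_dynamic_mapping_exception",
}


class TransformFailed(Exception):
    """Raised when the transform should move to the failed state."""


def classify(error_types):
    """Classify the error types of one bulk request."""
    if any(t in IRRECOVERABLE_ERROR_TYPES for t in error_types):
        return FailureKind.IRRECOVERABLE
    return FailureKind.TEMPORARY


def index_page(build_page, send_bulk, max_retries=3):
    """Index one page; return the number of retries that were needed.

    build_page() returns the documents of the page.
    send_bulk(docs) returns a list of error type strings (empty on success).
    """
    retries = 0
    while True:
        docs = build_page()  # re-create the full page on every attempt
        errors = send_bulk(docs)
        if not errors:
            return retries
        if classify(errors) is FailureKind.IRRECOVERABLE:
            raise TransformFailed("irrecoverable bulk failure: %s" % errors)
        retries += 1
        if retries > max_retries:
            raise TransformFailed("bulk indexing still failing after retries")
```

Retrying at page granularity keeps the retry path identical to the normal path: the same search produces the same page, and no bookkeeping per index request is needed.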

/cc @dolaru @sophiec20

Closes #43194
