[ML] Unnecessary transform warning message is logged very often #48379

@dolaru

Spotted in 7.4.0

In a multi-node environment, when running a continuous transform, the following warning is logged repeatedly:

[instance-0000000009] [some_transform_id] data frame transform encountered an exception: 
java.lang.RuntimeException: Failed to retrieve checkpoint due to Failed to create checkpoint
	at org.elasticsearch.xpack.dataframe.transforms.DataFrameTransformTask$ClientDataFrameIndexer.lambda$createCheckpoint$17(DataFrameTransformTask.java:1084) [data-frame-7.4.0.jar:7.4.0]
	at org.elasticsearch.action.ActionListener$1.onFailure(ActionListener.java:70) [elasticsearch-7.4.0.jar:7.4.0]
...

After @hendrikmuhs investigated, we found that this is caused by a mismatch between the global checkpoints reported by different copies (replicas) of the same shard. Such a mismatch is by design and nothing to worry about, but the transform is overly strict and throws an exception. It should be safe to ignore the mismatch and, for example, take the max of all global checkpoints reported for the shard.
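
The "take the max" idea could be sketched roughly as below. This is only an illustration, not the actual transform code: `CheckpointMerge`, `ShardCopy`, and `maxGlobalCheckpointPerShard` are hypothetical names, and the sketch assumes each shard copy reports a shard id together with its global checkpoint.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch; none of these names exist in the Elasticsearch code base.
final class CheckpointMerge {

    /** Assumed shape of one shard copy's report (primary or replica). */
    static final class ShardCopy {
        final int shardId;
        final long globalCheckpoint;

        ShardCopy(int shardId, long globalCheckpoint) {
            this.shardId = shardId;
            this.globalCheckpoint = globalCheckpoint;
        }
    }

    /**
     * Collapses the per-copy reports into one value per shard.
     * Copies of the same shard that disagree are not treated as an error;
     * the highest reported global checkpoint is kept instead.
     */
    static Map<Integer, Long> maxGlobalCheckpointPerShard(List<ShardCopy> copies) {
        Map<Integer, Long> result = new TreeMap<>();
        for (ShardCopy copy : copies) {
            // merge() stores the value if the shard is new, otherwise keeps the larger one
            result.merge(copy.shardId, copy.globalCheckpoint, Long::max);
        }
        return result;
    }
}
```

With this approach, a replica that lags behind the primary for shard 0 (say 40 versus 42) would simply yield 42 for that shard instead of failing checkpoint creation.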

We should therefore remove this warning, as it is unnecessary.
