Closed
Labels
:ml/Transform
Description
Spotted in 7.4.0
In a multi-node environment, when running a continuous transform, the following warning occasionally floods the logs:
[instance-0000000009] [some_transform_id] data frame transform encountered an exception:
java.lang.RuntimeException: Failed to retrieve checkpoint due to Failed to create checkpoint
at org.elasticsearch.xpack.dataframe.transforms.DataFrameTransformTask$ClientDataFrameIndexer.lambda$createCheckpoint$17(DataFrameTransformTask.java:1084) [data-frame-7.4.0.jar:7.4.0]
at org.elasticsearch.action.ActionListener$1.onFailure(ActionListener.java:70) [elasticsearch-7.4.0.jar:7.4.0]
...
After @hendrikmuhs investigated, we found that this is caused by a mismatch between the global checkpoints reported for the same shard by its different copies (replicas). The mismatch is by design and nothing to worry about, but the transform is overly strict and throws an exception. It should be safe to ignore the mismatch and, for example, take the max of all global checkpoints reported for each shard.
As a result, we should remove this message as it is unnecessary.
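To illustrate the "take the max" idea, here is a minimal, self-contained sketch. It is not the actual transform code: `CheckpointMerge`, `ShardCopy`, and `maxGlobalCheckpoints` are hypothetical names invented for this example, and the input is assumed to be one (shard id, global checkpoint) pair per shard copy.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class CheckpointMerge {

    /** Hypothetical stand-in for the stats reported by one copy of a shard. */
    static final class ShardCopy {
        final int shardId;
        final long globalCheckpoint;

        ShardCopy(int shardId, long globalCheckpoint) {
            this.shardId = shardId;
            this.globalCheckpoint = globalCheckpoint;
        }
    }

    /**
     * Returns one checkpoint per shard, ordered by shard id. When copies of
     * the same shard disagree, the highest value wins instead of failing.
     */
    static long[] maxGlobalCheckpoints(List<ShardCopy> copies) {
        Map<Integer, Long> maxByShard = new TreeMap<>();
        for (ShardCopy copy : copies) {
            // merge() keeps the larger of the stored and the new value
            maxByShard.merge(copy.shardId, copy.globalCheckpoint, Math::max);
        }
        return maxByShard.values().stream().mapToLong(Long::longValue).toArray();
    }

    public static void main(String[] args) {
        List<ShardCopy> copies = Arrays.asList(
            new ShardCopy(0, 42L), // one copy of shard 0
            new ShardCopy(0, 40L), // another copy of shard 0, slightly behind
            new ShardCopy(1, 17L),
            new ShardCopy(1, 17L));
        System.out.println(Arrays.toString(maxGlobalCheckpoints(copies))); // prints [42, 17]
    }
}
```

With this approach a lagging replica no longer produces an error; the checkpoint simply reflects the most advanced copy of each shard.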