[ML] Job in index: Datafeed node selector #34194
Closed
Breaks the `DatafeedNodeSelector`'s dependency on datafeed and job configuration defined in the cluster state. Similar to #33994, the required configuration is added to the persistent task's parameters.

`TransportStartDatafeedAction` now collects the required config and adds it to `StartDatafeedAction.DatafeedParams`. The new fields added to the params cannot be streamed in a BWC-safe way, but this isn't an issue as `TransportStartDatafeedAction` is a master node action and `StartDatafeedPersistentTasksExecutor.getAssignment` is also only called on the master node (by the `PersistentTasksClusterService`).

`DatafeedManager.run` is changed to accept the config as a parameter rather than reading it from the cluster state. This has the additional benefit that the validated configs are used, rather than configs re-read at some point after validation. However, the BWC breaks down here: the datafeed may start on a node that isn't upgraded (mixed cluster), and the required config will not have been streamed in `StartDatafeedAction.DatafeedParams`.

Discuss
I can think of 2 solutions to this: either re-read the missing config in `DatafeedManager`, or prevent the datafeed from starting on an old node or in a mixed cluster state. I prefer the latter as I think it will simplify the migration process. That change can sensibly be done as part of the migration/upgrade work.
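To make the second option concrete, here is a minimal, self-contained sketch of a version gate on node assignment. It is not the code in this PR and does not use the Elasticsearch API: `DatafeedAssignmentSketch`, `Node`, `selectNode`, the integer versions, and the `requireFullyUpgraded` flag are all hypothetical names chosen for illustration.

```java
import java.util.List;
import java.util.Optional;

public class DatafeedAssignmentSketch {

    /** Hypothetical stand-in for a cluster node: an id and a comparable integer version. */
    public static final class Node {
        final String id;
        final int version;

        public Node(String id, int version) {
            this.id = id;
            this.version = version;
        }
    }

    /**
     * Picks a node for the datafeed task, or returns empty if none is acceptable.
     *
     * @param minVersion           assumed first version that can read the config from the task params
     * @param requireFullyUpgraded if true, refuse assignment while any node is older than minVersion
     *                             (the "mixed cluster" variant); if false, only skip the old nodes
     */
    public static Optional<String> selectNode(List<Node> nodes, int minVersion, boolean requireFullyUpgraded) {
        boolean mixedCluster = nodes.stream().anyMatch(n -> n.version < minVersion);
        if (requireFullyUpgraded && mixedCluster) {
            // Stricter variant: no datafeed starts until every node is upgraded.
            return Optional.empty();
        }
        // Looser variant: old nodes are never candidates, upgraded nodes still are.
        return nodes.stream()
                .filter(n -> n.version >= minVersion)
                .map(n -> n.id)
                .findFirst();
    }
}
```

With `requireFullyUpgraded` set to `false` the datafeed can still run during a rolling upgrade, but only on upgraded nodes; with `true` it stays unassigned until the upgrade completes. Either way the decision is made on the master node, so an old node never receives params it cannot read.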