[ML] Jindex: Rolling upgrade tests #35700
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -158,6 +158,10 @@ static PersistentTasksCustomMetaData.Assignment selectLeastLoadedMlNode(String j | |
| int maxMachineMemoryPercent, | ||
| MlMemoryTracker memoryTracker, | ||
| Logger logger) { | ||
| if (job == null) { | ||
| logger.debug("[{}] select node job is null", jobId); | ||
| } | ||
|
maybe assert instead?
|
||
|
|
||
| String resultsIndexName = job != null ? job.getResultsIndexName() : null; | ||
| List<String> unavailableIndices = verifyIndicesPrimaryShardsAreActive(resultsIndexName, clusterState); | ||
| if (unavailableIndices.size() != 0) { | ||
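For readers following the null handling in this hunk, here is a minimal sketch of the ternary guard, assuming a hypothetical stand-in `Job` class with a single accessor (the real `Job` has many more fields). It only illustrates that a null job produces a null index name instead of a `NullPointerException`. The `assert job != null` suggested in the review would behave differently: it fails fast when the JVM runs with `-ea` and is skipped otherwise.

```java
class NullJobSketch {
    /** Hypothetical stand-in for the real Job class; only the accessor used here. */
    static final class Job {
        private final String resultsIndexName;

        Job(String resultsIndexName) {
            this.resultsIndexName = resultsIndexName;
        }

        String getResultsIndexName() {
            return resultsIndexName;
        }
    }

    /** Mirrors the ternary in the diff: a null job yields a null index name. */
    static String resultsIndexNameOrNull(Job job) {
        return job != null ? job.getResultsIndexName() : null;
    }

    public static void main(String[] args) {
        System.out.println(resultsIndexNameOrNull(new Job("shared-results")));
        System.out.println(resultsIndexNameOrNull(null));
    }
}
```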
|
|
@@ -236,6 +240,16 @@ static PersistentTasksCustomMetaData.Assignment selectLeastLoadedMlNode(String j | |
| reasons.add(reason); | ||
| continue; | ||
| } | ||
|
|
||
| boolean jobConfigIsStoredInIndex = job.getJobVersion().onOrAfter(Version.V_6_6_0); | ||
| if (jobConfigIsStoredInIndex && node.getVersion().before(Version.V_6_6_0)) { | ||
| String reason = "Not opening job [" + jobId + "] on node [" + nodeNameOrId(node) | ||
| + "] version [" + node.getVersion() + "], because this node does not support " + | ||
| "jobs of version [" + job.getJobVersion() + "]"; | ||
| logger.trace(reason); | ||
|
nit: debug seems more suited to me

Remember that on a 100-node cluster each allocation will generate 100 messages similar to this one, which would be significant log spam. They get concatenated into the overall reason, which is stored in the cluster state if the persistent task exists (and returned in the error message in the case of this being called prior to opening). All the other possible reasons for ruling out a node in this method also currently log at the trace level. I think they should all log at the same level; otherwise someone reading the logs could get a misleading picture of what is happening. I would leave this as trace to match the others.
||
| reasons.add(reason); | ||
| continue; | ||
| } | ||
| } | ||
|
|
||
| long numberOfAssignedJobs = 0; | ||
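The version gate in this hunk can be sketched in isolation. The following is a simplified illustration, not the PR's code: versions are modelled as plain integers (a hypothetical encoding where 6.6.0 is `60600`) instead of `org.elasticsearch.Version`, and `Node` and `eligibleNodes` are made-up names. It shows how a job whose config is stored in the index rules out each pre-6.6.0 node and records one reason per excluded node, which is why the per-node log level matters on large clusters.

```java
import java.util.ArrayList;
import java.util.List;

class VersionGateSketch {
    // Hypothetical integer encoding of 6.6.0; the real code compares Version objects.
    static final int V_6_6_0 = 60600;

    /** Hypothetical stand-in for a cluster node: a name and a version. */
    static final class Node {
        final String name;
        final int version;

        Node(String name, int version) {
            this.name = name;
            this.version = version;
        }
    }

    /**
     * Returns the names of nodes that may run the job and appends one
     * reason string to {@code reasons} for every node that is ruled out.
     */
    static List<String> eligibleNodes(String jobId, int jobVersion, List<Node> nodes, List<String> reasons) {
        boolean jobConfigIsStoredInIndex = jobVersion >= V_6_6_0;
        List<String> eligible = new ArrayList<>();
        for (Node node : nodes) {
            if (jobConfigIsStoredInIndex && node.version < V_6_6_0) {
                reasons.add("Not opening job [" + jobId + "] on node [" + node.name
                    + "] version [" + node.version + "], because this node does not support "
                    + "jobs of version [" + jobVersion + "]");
                continue;
            }
            eligible.add(node.name);
        }
        return eligible;
    }

    public static void main(String[] args) {
        List<Node> nodes = List.of(new Node("old-node", 60500), new Node("new-node", 60600));
        List<String> reasons = new ArrayList<>();
        System.out.println(eligibleNodes("job-1", 60600, nodes, reasons));
        reasons.forEach(System.out::println);
    }
}
```

A job created before 6.6.0 passes the gate on every node, so mixed-version clusters can still open older jobs anywhere during a rolling upgrade.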
|
|
@@ -820,8 +834,16 @@ public OpenJobPersistentTasksExecutor(Settings settings, ClusterService clusterS | |
|
|
||
| @Override | ||
| public PersistentTasksCustomMetaData.Assignment getAssignment(OpenJobAction.JobParams params, ClusterState clusterState) { | ||
| Job foundJob = params.getJob(); | ||
| if (foundJob == null) { | ||
| // The job was added to the persistent task parameters in 6.6.0 | ||
| // if the field is not present the task was created before 6.6.0. | ||
| // In which case the job should still be in the clusterstate | ||
| foundJob = MlMetadata.getMlMetadata(clusterState).getJobs().get(params.getJobId()); | ||
| } | ||
|
|
||
| PersistentTasksCustomMetaData.Assignment assignment = selectLeastLoadedMlNode(params.getJobId(), | ||
| - params.getJob(), | ||
| + foundJob, | ||
| clusterState, | ||
| maxConcurrentJobAllocations, | ||
| fallbackMaxNumberOfOpenJobs, | ||
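The fallback described in the code comment above (use the job carried in the task parameters, otherwise look it up in the cluster state) can be sketched as follows. This is an illustration under assumptions: `Job`, `JobParams`, and `resolveJob` are hypothetical stand-ins, and a plain `Map` takes the place of the jobs held in `MlMetadata`.

```java
import java.util.HashMap;
import java.util.Map;

class JobLookupSketch {
    /** Hypothetical stand-in for the real Job class. */
    static final class Job {
        final String id;

        Job(String id) {
            this.id = id;
        }
    }

    /** Hypothetical stand-in for OpenJobAction.JobParams. */
    static final class JobParams {
        final String jobId;
        final Job job; // null when the task was created before 6.6.0

        JobParams(String jobId, Job job) {
            this.jobId = jobId;
            this.job = job;
        }
    }

    /**
     * Prefers the job stored in the task parameters; falls back to the
     * cluster-state map for tasks created before the field existed.
     */
    static Job resolveJob(JobParams params, Map<String, Job> clusterStateJobs) {
        Job foundJob = params.job;
        if (foundJob == null) {
            foundJob = clusterStateJobs.get(params.jobId);
        }
        return foundJob;
    }

    public static void main(String[] args) {
        Map<String, Job> clusterStateJobs = new HashMap<>();
        clusterStateJobs.put("legacy-job", new Job("legacy-job"));
        Job resolved = resolveJob(new JobParams("legacy-job", null), clusterStateJobs);
        System.out.println(resolved.id);
    }
}
```

Note that the result can still be null if the job is in neither place, which is the case the null check in `selectLeastLoadedMlNode` has to tolerate.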
|
|
||
nit: isn't it stringified to "[id1, id2, ...]" anyway? + less garbage for non-debug?
Fair point. I'll remedy in a later commit as this is merged now.
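The code this nit refers to is not shown here, so the following is only a sketch of the general point, with hypothetical names throughout. A `List<String>` already renders as `[id1, id2]` via `toString()`, and passing the list as a logging argument defers that rendering until the level is known to be enabled. The `debug` helper below is a toy stand-in for a parameterized logger, not a real logging API.

```java
import java.util.List;

class ListLoggingSketch {
    /** Wraps a list and counts how often it is rendered to a string. */
    static final class CountingIds {
        final List<String> ids;
        int toStringCalls = 0;

        CountingIds(List<String> ids) {
            this.ids = ids;
        }

        @Override
        public String toString() {
            toStringCalls++;
            return ids.toString();
        }
    }

    /**
     * Toy parameterized logger: formats the message only when the level is
     * enabled, so the argument's toString() is skipped otherwise.
     * Returns the formatted message, or null when nothing was logged.
     */
    static String debug(boolean debugEnabled, String template, Object arg) {
        if (!debugEnabled) {
            return null;
        }
        return template.replace("{}", String.valueOf(arg));
    }

    public static void main(String[] args) {
        CountingIds ids = new CountingIds(List.of("id1", "id2"));
        debug(false, "closing jobs {}", ids);
        System.out.println(ids.toStringCalls);
        System.out.println(debug(true, "closing jobs {}", ids));
    }
}
```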