-
Notifications
You must be signed in to change notification settings - Fork 25.6k
[ML] BWC tests for job_stats.timing_stats field #43267
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[ML] BWC tests for job_stats.timing_stats field #43267
Conversation
|
Pinging @elastic/ml-core |
droberts195
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test should be OK for the master branch. However, when backported to 7.x there will be a problem.
The 7.x test might run with an old cluster version that supports timing stats or it might run with an old cluster version that does not support timing stats. (After 7.x moves on to 7.4.)
So the 7.x version of the test will need to skip the timing stats sections if the old version is before 7.3.0 and execute them if the old version is 7.3.0 or above.
You can only skip a whole test based on version, not individual asserts. Therefore, to prevent differences between master and 7.x that will make backporting other changes harder, it would be good to move all the assertions relating to timing stats into separate tests so that those entire tests can be skipped when the old version doesn't support timing stats.
513e86e to
1ac725c
Compare
Done for "mixed_cluster" and "upgraded_cluster" but I was not sure what to do with "old_cluster" as separating out the test would imply creating a separate job (unless there is a mechanism to control the order of executing tests which I doubt). Should I create a separate job for this new test case or just skip the test in "old_cluster"? |
I think it's reasonable not to assert on the timing stats at all in the old cluster. The old cluster consists entirely of nodes on the old version, and if that old version has timing stats it's reasonable to assume that testing for timing stats in a cluster purely containing that old version has been done in other tests running against that old version. So I think the best thing is to delete the timing stats assertions from the old cluster tests and just leave them in the mixed and upgraded cluster tests. |
|
GET job stats is a task action that will redirect to the node the job is running on if the job is open else the coordinating node will handle the request. For the mixed node tests it will be random which node the request hits and whether that node is upgraded so you can't rely on getting the new response fields in a mixed cluster. For master this is ok as the BWC tests run against the latest 7.x which does have the code so every node in the old and mixed cluster will return the new response. But when you backport to 7.x the assertions can only be made in the upgraded cluster. |
The timing stats are generated in the old cluster, so will only exist if the old cluster was on version 7.3.0 or above. Therefore I think it's OK to assert in both the mixed cluster and the upgraded cluster, but the new tests for both need to be skipped entirely if the old cluster was older than 7.3.0. |
There is a "skip" clause in which I can provide the range of versions in which the test can run. Does this "skip" version refer to the version of the old cluster? |
For the mixed cluster I think the version is the minimum node version in the cluster which will be the old cluster version so the skip version test should work. These are all changes you will need to make for the backport. I would raise a separate PR for the backport and make the changes there and let CI run it a few times. For the master branch this PR looks good |
Ok, I'll raise a separate PR once this one is merged in. Thanks! |
droberts195
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. As @davidkyle said, raise a separate backport PR to add the necessary skip version directives for 7.x and make sure they work.
No description provided.