
Conversation


@xylar xylar commented Apr 30, 2021

The run() method previously included validation of variables and timers, but things are cleaner in a few ways if running steps and validation are kept separate:

  1. For test cases that only need to validate variables or timers, there is no need to override the run() method at all, saving a (somewhat confusing) call to super().run(). Since TestCase.validate() does nothing, there is no need to call super().validate().
  2. In test cases that do override run(), it is now clearer that super().run() should be called at the end of the new run() method to run the steps (e.g. after altering the number of cores that each step should run with based on config options).

All test cases have been updated with this split.
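
To make the new pattern concrete, here is a minimal sketch of a hypothetical test case (the class, config section, step and file names are made up, and the exact compare_variables() signature is only indicative of the compass API):

from compass.testcase import TestCase
from compass.validate import compare_variables


class ExampleThreadsTest(TestCase):
    """
    Hypothetical test case showing the split between run() and validate()
    (names, attributes and the compare_variables() signature are indicative
    only).
    """

    def run(self):
        # run() only needs to be overridden if something must happen before
        # the steps are run, e.g. altering the number of cores each step
        # uses based on config options ...
        cores = self.config.getint('example_threads_test', 'cores')
        for step in self.steps.values():
            step.cores = cores
        # ... and then the steps themselves are run by calling super().run()
        # at the end
        super().run()

    def validate(self):
        # there is no need to call super().validate(), since it does nothing;
        # internal validation compares output between two steps of this test
        compare_variables(variables=['temperature', 'salinity'],
                          config=self.config, work_dir=self.work_dir,
                          filename1='1thread/output.nc',
                          filename2='2thread/output.nc')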

The output from test suites has been altered to indicate whether the test execution, the test validation, and the baseline comparison passed:

ocean/baroclinic_channel/10km/default
  test execution:      SUCCESS
ocean/baroclinic_channel/10km/threads_test
  test execution:      SUCCESS
  test validation:     PASS
  baseline comparison: PASS
ocean/baroclinic_channel/10km/decomp_test
  test execution:      SUCCESS
  test validation:     PASS
  baseline comparison: PASS

The documentation has been updated to reflect these changes.

closes #74

@xylar xylar added the clean-up and python package labels Apr 30, 2021
@xylar xylar self-assigned this Apr 30, 2021

xylar commented Apr 30, 2021

Testing

I ran the nightly and sia_integration test suites with this change, comparing against a baseline from master. All tests other than the RK4 restart test passed (this needs to be fixed in MPAS-Ocean, unrelated to this PR).


xylar commented Apr 30, 2021

@vanroekel and @mark-petersen, this was one of @matthewhoffman's suggestions during the review of #28. Especially after implementing it, I'm quite convinced that it's a good change.

I'm not looking for a particularly detailed review from any of you. I mostly want to get your thumbs up about this split and make sure you're aware of it going forward.

Also, I want your thumbs up on the change to the test suite output described above. It's a bit more verbose, but I think it will be helpful to know whether a failure is in the test itself, in internal validation (e.g. between outputs from different steps, as in restart or decomposition testing), or in the comparison against the baseline. I've also found the single-line output introduced in #63 hard to read, so I think this multiline format works better for me.

Here's an example where the additional info is potentially helpful:

ocean/global_ocean/QU240/PHC/RK4/restart_test
  test:                PASS
  internal validation: FAIL
  baseline comparison: PASS

@matthewhoffman
Member

@xylar, this looks really nice to me. I also like the multiline output format and separating the 3 pieces of information. I'm a little fuzzy on the meaning of the 'test:' line: is this simply that the steps ran to completion without error? And then is 'internal validation' that the 'goal' of the test succeeded? If so, what about 'test execution' and 'test validation' instead? (I think 'baseline comparison' is good for the third part.) I also suggest 'test execution' have values of 'SUCCESS/ERROR' while the other two use 'PASS/FAIL'. This is all kind of semantics, so feel free to take it or leave it.


xylar commented Apr 30, 2021

@matthewhoffman, yes, your interpretation is exactly right and I agree that your wording is an improvement. I'll make that change.

xylar added a commit: "Also, switch to SUCCESS/ERROR in test execution"

xylar commented Apr 30, 2021

I do think that's an improvement; see the main description above.


xylar commented Apr 30, 2021

Ooh, ooh, fun! I just figured out how to make PASS and SUCCESS green and FAIL and ERROR red. That will help a lot, I think.
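
For reference, one simple way to get colored results is with ANSI escape codes; this is only a sketch, and the actual change in this PR may do it differently:

def colored_result(result):
    # green for PASS/SUCCESS, red for FAIL/ERROR, then reset the color
    green, red, reset = '\033[92m', '\033[91m', '\033[0m'
    color = green if result in ('PASS', 'SUCCESS') else red
    return f'{color}{result}{reset}'

print(f'  test execution:      {colored_result("SUCCESS")}')
print(f'  test validation:     {colored_result("FAIL")}')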

@matthewhoffman
Member

Ha! I was almost going to suggest that but decided not to go there.

@vanroekel
Collaborator

@xylar I'm struggling a little bit with what each line means. If test validation means the goal of the test succeeded, what is baseline comparison? I guess I'm confused about what sits under "validation". I'm stuck on thinking of validation as whether the model reproduces what we expect, but then I can't separate out what baseline comparison is.


xylar commented May 2, 2021

@vanroekel, I think we are open to suggestions for how to clarify further.

Some test cases only supply one file (filename1) for validation. This file is compared against a baseline if you supply one to make sure results are bit-for-bit with the baseline. This is baseline comparison.

Other test cases like restart, thread and decomposition tests compare output between two different steps. This is what is meant by test validation. (The results of each step are also compared with the corresponding steps in the baseline, so all tests with test validation will also have baseline comparison if a baseline is provided.)
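
In rough code terms, the difference comes down to whether one or two files are passed for comparison inside a test case's validate() method (the file names here are hypothetical and the compare_variables() signature is only indicative):

    def validate(self):
        # baseline comparison only: a single file from this test case is
        # compared against the same file in the baseline (if one was supplied)
        compare_variables(variables=['temperature', 'salinity'],
                          config=self.config, work_dir=self.work_dir,
                          filename1='forward/output.nc')

        # test validation (internal): output from two different steps of the
        # same test case is compared; each file is also compared against the
        # baseline if one was supplied
        compare_variables(variables=['temperature', 'salinity'],
                          config=self.config, work_dir=self.work_dir,
                          filename1='full_run/output.nc',
                          filename2='restart_run/output.nc')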

Previously, even if no internal validation was performed, the "test validation" message was showing up. We only want a message when a test case includes validation between internal steps (as opposed to against a baseline).

xylar commented May 2, 2021

@vanroekel, in the process of looking into your question, I realized I made a mistake in how test validation was being determined. I was checking for exceptions when validate() was called, which would have been consistent with how validation worked until earlier this week. The details aren't important unless you're curious. But I wonder if the bug might have been part of the reason for your confusion: tests that do not have internal validation were producing test validation output.

With my last commit, output of the nightly suite looks like this:

nightly output:
$ python -m compass run nightly
ocean/baroclinic_channel/10km/default
  test execution:      SUCCESS
ocean/baroclinic_channel/10km/threads_test
  test execution:      SUCCESS
  test validation:     PASS
  baseline comparison: PASS
ocean/baroclinic_channel/10km/decomp_test
  test execution:      SUCCESS
  test validation:     PASS
  baseline comparison: PASS
ocean/baroclinic_channel/10km/restart_test
  test execution:      SUCCESS
  test validation:     PASS
  baseline comparison: PASS
ocean/global_ocean/QU240/mesh
  test execution:      SUCCESS
  baseline comparison: PASS
ocean/global_ocean/QU240/PHC/init
  test execution:      SUCCESS
  baseline comparison: PASS
ocean/global_ocean/QU240/PHC/performance_test
  test execution:      SUCCESS
  baseline comparison: PASS
ocean/global_ocean/QU240/PHC/restart_test
  test execution:      SUCCESS
  test validation:     PASS
  baseline comparison: PASS
ocean/global_ocean/QU240/PHC/decomp_test
  test execution:      SUCCESS
  test validation:     PASS
  baseline comparison: PASS
ocean/global_ocean/QU240/PHC/threads_test
  test execution:      SUCCESS
  test validation:     PASS
  baseline comparison: PASS
ocean/global_ocean/QU240/PHC/analysis_test
  test execution:      SUCCESS
  baseline comparison: PASS
ocean/global_ocean/QU240/PHC/RK4/performance_test
  test execution:      SUCCESS
  baseline comparison: PASS
ocean/global_ocean/QU240/PHC/RK4/restart_test
  test execution:      SUCCESS
  test validation:     FAIL
  baseline comparison: PASS
  see: case_outputs/ocean_global_ocean_QU240_PHC_RK4_restart_test.log
ocean/global_ocean/QU240/PHC/RK4/decomp_test
  test execution:      SUCCESS
  test validation:     PASS
  baseline comparison: PASS
ocean/global_ocean/QU240/PHC/RK4/threads_test
  test execution:      SUCCESS
  test validation:     PASS
  baseline comparison: PASS
ocean/global_ocean/QU240/EN4_1900/init
  test execution:      SUCCESS
  baseline comparison: PASS
ocean/global_ocean/QU240/EN4_1900/performance_test
  test execution:      SUCCESS
  baseline comparison: PASS
ocean/global_ocean/QU240/PHC_BGC/init
  test execution:      SUCCESS
  baseline comparison: PASS
ocean/global_ocean/QU240/PHC_BGC/performance_test
  test execution:      SUCCESS
  baseline comparison: PASS
ocean/ice_shelf_2d/5km/restart_test
  test execution:      SUCCESS
  test validation:     PASS
  baseline comparison: PASS
ocean/ziso/20km/default
  test execution:      SUCCESS
  baseline comparison: PASS
ocean/ziso/20km/with_frazil
  test execution:      SUCCESS
  baseline comparison: PASS
Test Runtimes:
00:10 PASS ocean_baroclinic_channel_10km_default
00:15 PASS ocean_baroclinic_channel_10km_threads_test
00:20 PASS ocean_baroclinic_channel_10km_decomp_test
00:17 PASS ocean_baroclinic_channel_10km_restart_test
00:30 PASS ocean_global_ocean_QU240_mesh
00:35 PASS ocean_global_ocean_QU240_PHC_init
00:46 PASS ocean_global_ocean_QU240_PHC_performance_test
01:25 PASS ocean_global_ocean_QU240_PHC_restart_test
01:32 PASS ocean_global_ocean_QU240_PHC_decomp_test
01:35 PASS ocean_global_ocean_QU240_PHC_threads_test
00:55 PASS ocean_global_ocean_QU240_PHC_analysis_test
00:42 PASS ocean_global_ocean_QU240_PHC_RK4_performance_test
01:23 FAIL ocean_global_ocean_QU240_PHC_RK4_restart_test
01:18 PASS ocean_global_ocean_QU240_PHC_RK4_decomp_test
01:13 PASS ocean_global_ocean_QU240_PHC_RK4_threads_test
00:33 PASS ocean_global_ocean_QU240_EN4_1900_init
00:39 PASS ocean_global_ocean_QU240_EN4_1900_performance_test
00:49 PASS ocean_global_ocean_QU240_PHC_BGC_init
00:57 PASS ocean_global_ocean_QU240_PHC_BGC_performance_test
00:43 PASS ocean_ice_shelf_2d_5km_restart_test
00:38 PASS ocean_ziso_20km_default
00:20 PASS ocean_ziso_20km_with_frazil
Total runtime 17:26
FAIL: 1 test failed, see above.

@vanroekel
Collaborator

Ah, I see now, @xylar. Now that I understand, I think what you have for each line is perfect. I was just confused because, in my interpretation, we in effect overload things like the threads test with a BFB test. I think this could prove quite useful, though; we could potentially track down BFB issues more efficiently by seeing which tests fail baseline comparison.

If no baseline is provided the nightly output would have a pass/fail for baseline comparison or test validation, but not both, right?


xylar commented May 3, 2021

If no baseline is provided the nightly output would have a pass/fail for baseline comparison or test validation, but not both, right?

If no baseline is provided, baseline comparison will not appear at all in the output. test validation will appear as before.

To illustrate this, currently, the only test that is failing is the RK4 restart test. It fails because of test validation (the full run is not bit-for-bit with the restart run). If you compare it with a baseline, baseline comparison passes because both the full run and the restart run are bit-for-bit with the same step in the baseline (even though they are not bit-for-bit with each other).

@vanroekel
Collaborator

Thanks again @xylar for taking time to explain this, sorry for the obvious questions. This makes complete sense to me now.


@vanroekel vanroekel left a comment


I think this is a great change. I really like the new, more detailed output in the nightly suite for seeing whether a test fails in setup/execution or in validation. Thanks @xylar!


@mark-petersen mark-petersen left a comment


This is great! I love the colors!

I tested the nightly regression suite with and without baseline comparisons, and made it fail in various ways. It all worked as expected. This is a big help in understanding what PASS meant in previous testing.


xylar commented May 8, 2021

@mark-petersen, thanks for taking the time to review this (and the others) on a weekend!

@vanroekel and @matthewhoffman, thank you as well for your prompt reviews!

@xylar xylar merged commit f68c908 into MPAS-Dev:master May 8, 2021
@xylar xylar deleted the split_into_run_and_validate branch May 8, 2021 17:45