
Conversation


@xylar xylar commented Apr 30, 2021

The run() method previously included validation of variables and timers, but things are cleaner in a few ways if running steps and validation are kept separate:

  1. For test cases that only need to validate variables or timers, there is no need to override the run() method at all, saving a (somewhat confusing) call to super().run(). Since TestCase.validate() does nothing, there is no need to call super().validate().
  2. In test cases that do override run(), it is now clearer that super().run() should be called at the end of the new run() method to run the steps (e.g. after altering the number of cores that each step should run with based on config options).

All test cases have been updated with this split.
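
To make the new pattern concrete, here is a minimal sketch of a hypothetical test case (the class, config section, step and file names are made up, and the exact compare_variables() signature is only indicative of the compass API):

from compass.testcase import TestCase
from compass.validate import compare_variables


class ExampleThreadsTest(TestCase):
    """
    Hypothetical test case showing the split between run() and validate()
    (names, attributes and the compare_variables() signature are indicative
    only).
    """

    def run(self):
        # run() only needs to be overridden if something must happen before
        # the steps are run, e.g. altering the number of cores each step
        # uses based on config options ...
        cores = self.config.getint('example_threads_test', 'cores')
        for step in self.steps.values():
            step.cores = cores
        # ... and then the steps themselves are run by calling super().run()
        # at the end
        super().run()

    def validate(self):
        # there is no need to call super().validate(), since it does nothing;
        # internal validation compares output between two steps of this test
        compare_variables(variables=['temperature', 'salinity'],
                          config=self.config, work_dir=self.work_dir,
                          filename1='1thread/output.nc',
                          filename2='2thread/output.nc')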

The output from test suites has been altered to indicate whether the test execution, the test validation, and the baseline comparison passed:

ocean/baroclinic_channel/10km/default
  test execution:      SUCCESS
ocean/baroclinic_channel/10km/threads_test
  test execution:      SUCCESS
  test validation:     PASS
  baseline comparison: PASS
ocean/baroclinic_channel/10km/decomp_test
  test execution:      SUCCESS
  test validation:     PASS
  baseline comparison: PASS

The documentation has been updated to reflect these changes.

closes #74

@xylar xylar added the clean-up and python package labels Apr 30, 2021
@xylar xylar self-assigned this Apr 30, 2021

xylar commented Apr 30, 2021

Testing

I ran the nightly and sia_integration test suites with this change, comparing against a baseline from master. All tests other than the RK4 restart test passed (this needs to be fixed in MPAS-Ocean, unrelated to this PR).


xylar commented Apr 30, 2021

@vanroekel and @mark-petersen, this was one of @matthewhoffman's suggestions during the review of #28. Especially after implementing it, I'm quite convinced that it's a good change.

I'm not looking for a particularly detailed review from any of you. I mostly want to get your thumbs up about this split and make sure you're aware of it going forward.

Also, I want your thumbs up on the change to the test suite output described above. It's a bit more verbose, but I think it will be helpful to know whether a failure is in the test itself, in internal validation (e.g. between outputs from different steps, as in restart or decomposition testing), or in the comparison against the baseline. I've also found the single-line output introduced in #63 hard to read, so I think this multiline format works better for me.

Here's an example where the additional info is potentially helpful:

ocean/global_ocean/QU240/PHC/RK4/restart_test
  test:                PASS
  internal validation: FAIL
  baseline comparison: PASS

@matthewhoffman
Member

@xylar, this looks really nice to me. I also like the multiline output format and separating the 3 pieces of information. I'm a little fuzzy on the meaning of the 'test:' line: is this simply that the steps ran to completion without error? And then is 'internal validation' that the 'goal' of the test succeeded? If so, what about 'test execution' and 'test validation' instead? (I think 'baseline comparison' is good for the third part.) I also suggest 'test execution' have values of 'SUCCESS/ERROR' while the other two use 'PASS/FAIL'. This is all kind of semantics, so feel free to take it or leave it.


xylar commented Apr 30, 2021

@matthewhoffman, yes, your interpretation is exactly right and I agree that your wording is an improvement. I'll make that change.

xylar added a commit: "Also, switch to SUCCESS/ERROR in test execution"

xylar commented Apr 30, 2021

I do think that's an improvement; see the main description above.


xylar commented Apr 30, 2021

Ooh, ooh, fun! I just figured out how to make PASS and SUCCESS green and FAIL and ERROR red. That will help a lot, I think.
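
For reference, one simple way to get colored results is with ANSI escape codes; this is only a sketch, and the actual change in this PR may do it differently:

def colored_result(result):
    # green for PASS/SUCCESS, red for FAIL/ERROR, then reset the color
    green, red, reset = '\033[92m', '\033[91m', '\033[0m'
    color = green if result in ('PASS', 'SUCCESS') else red
    return f'{color}{result}{reset}'

print(f'  test execution:      {colored_result("SUCCESS")}')
print(f'  test validation:     {colored_result("FAIL")}')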

@matthewhoffman
Member

Ha! I was almost going to suggest that but decided not to go there.

@vanroekel
Collaborator

@xylar I'm struggling a little bit with what each line means. If test validation means the goal of the test succeeded, what is baseline comparison? I guess I'm confused about what sits under "validation". I'm stuck on thinking of validation as whether the model reproduces what we expect, but then I can't separate out what baseline comparison is.


xylar commented May 2, 2021

@vanroekel, I think we are open to suggestions for how to clarify further.

Some test cases only supply one file (filename1) for validation. This file is compared against a baseline if you supply one to make sure results are bit-for-bit with the baseline. This is baseline comparison.

Other test cases like restart, thread and decomposition tests compare output between two different steps. This is what is meant by test validation. (The results of each step are also compared with the corresponding steps in the baseline, so all tests with test validation will also have baseline comparison if a baseline is provided.)
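
In rough code terms, the difference comes down to whether one or two files are passed for comparison inside a test case's validate() method (the file names here are hypothetical and the compare_variables() signature is only indicative):

    def validate(self):
        # baseline comparison only: a single file from this test case is
        # compared against the same file in the baseline (if one was supplied)
        compare_variables(variables=['temperature', 'salinity'],
                          config=self.config, work_dir=self.work_dir,
                          filename1='forward/output.nc')

        # test validation (internal): output from two different steps of the
        # same test case is compared; each file is also compared against the
        # baseline if one was supplied
        compare_variables(variables=['temperature', 'salinity'],
                          config=self.config, work_dir=self.work_dir,
                          filename1='full_run/output.nc',
                          filename2='restart_run/output.nc')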

Previously, even if no internal validation was performed, the "test validation" message was showing up. We only want a message when a test case includes validation between internal steps (as opposed to against a baseline).

xylar commented May 2, 2021

@vanroekel, in the process of looking into your question, I realized I made a mistake in how test validation was being determined. I was checking for exceptions when validate() was called, which would have been consistent with how validation worked until earlier this week. The details aren't important unless you're curious. But I wonder if the bug might have been part of the reason for your confusion: tests that do not have internal validation were producing test validation output.

With my last commit, output of the nightly suite looks like this:

nightly output:
$ python -m compass run nightly
ocean/baroclinic_channel/10km/default
  test execution:      SUCCESS
ocean/baroclinic_channel/10km/threads_test
  test execution:      SUCCESS
  test validation:     PASS
  baseline comparison: PASS
ocean/baroclinic_channel/10km/decomp_test
  test execution:      SUCCESS
  test validation:     PASS
  baseline comparison: PASS
ocean/baroclinic_channel/10km/restart_test
  test execution:      SUCCESS
  test validation:     PASS
  baseline comparison: PASS
ocean/global_ocean/QU240/mesh
  test execution:      SUCCESS
  baseline comparison: PASS
ocean/global_ocean/QU240/PHC/init
  test execution:      SUCCESS
  baseline comparison: PASS
ocean/global_ocean/QU240/PHC/performance_test
  test execution:      SUCCESS
  baseline comparison: PASS
ocean/global_ocean/QU240/PHC/restart_test
  test execution:      SUCCESS
  test validation:     PASS
  baseline comparison: PASS
ocean/global_ocean/QU240/PHC/decomp_test
  test execution:      SUCCESS
  test validation:     PASS
  baseline comparison: PASS
ocean/global_ocean/QU240/PHC/threads_test
  test execution:      SUCCESS
  test validation:     PASS
  baseline comparison: PASS
ocean/global_ocean/QU240/PHC/analysis_test
  test execution:      SUCCESS
  baseline comparison: PASS
ocean/global_ocean/QU240/PHC/RK4/performance_test
  test execution:      SUCCESS
  baseline comparison: PASS
ocean/global_ocean/QU240/PHC/RK4/restart_test
  test execution:      SUCCESS
  test validation:     FAIL
  baseline comparison: PASS
  see: case_outputs/ocean_global_ocean_QU240_PHC_RK4_restart_test.log
ocean/global_ocean/QU240/PHC/RK4/decomp_test
  test execution:      SUCCESS
  test validation:     PASS
  baseline comparison: PASS
ocean/global_ocean/QU240/PHC/RK4/threads_test
  test execution:      SUCCESS
  test validation:     PASS
  baseline comparison: PASS
ocean/global_ocean/QU240/EN4_1900/init
  test execution:      SUCCESS
  baseline comparison: PASS
ocean/global_ocean/QU240/EN4_1900/performance_test
  test execution:      SUCCESS
  baseline comparison: PASS
ocean/global_ocean/QU240/PHC_BGC/init
  test execution:      SUCCESS
  baseline comparison: PASS
ocean/global_ocean/QU240/PHC_BGC/performance_test
  test execution:      SUCCESS
  baseline comparison: PASS
ocean/ice_shelf_2d/5km/restart_test
  test execution:      SUCCESS
  test validation:     PASS
  baseline comparison: PASS
ocean/ziso/20km/default
  test execution:      SUCCESS
  baseline comparison: PASS
ocean/ziso/20km/with_frazil
  test execution:      SUCCESS
  baseline comparison: PASS
Test Runtimes:
00:10 PASS ocean_baroclinic_channel_10km_default
00:15 PASS ocean_baroclinic_channel_10km_threads_test
00:20 PASS ocean_baroclinic_channel_10km_decomp_test
00:17 PASS ocean_baroclinic_channel_10km_restart_test
00:30 PASS ocean_global_ocean_QU240_mesh
00:35 PASS ocean_global_ocean_QU240_PHC_init
00:46 PASS ocean_global_ocean_QU240_PHC_performance_test
01:25 PASS ocean_global_ocean_QU240_PHC_restart_test
01:32 PASS ocean_global_ocean_QU240_PHC_decomp_test
01:35 PASS ocean_global_ocean_QU240_PHC_threads_test
00:55 PASS ocean_global_ocean_QU240_PHC_analysis_test
00:42 PASS ocean_global_ocean_QU240_PHC_RK4_performance_test
01:23 FAIL ocean_global_ocean_QU240_PHC_RK4_restart_test
01:18 PASS ocean_global_ocean_QU240_PHC_RK4_decomp_test
01:13 PASS ocean_global_ocean_QU240_PHC_RK4_threads_test
00:33 PASS ocean_global_ocean_QU240_EN4_1900_init
00:39 PASS ocean_global_ocean_QU240_EN4_1900_performance_test
00:49 PASS ocean_global_ocean_QU240_PHC_BGC_init
00:57 PASS ocean_global_ocean_QU240_PHC_BGC_performance_test
00:43 PASS ocean_ice_shelf_2d_5km_restart_test
00:38 PASS ocean_ziso_20km_default
00:20 PASS ocean_ziso_20km_with_frazil
Total runtime 17:26
FAIL: 1 test failed, see above.

@vanroekel
Collaborator

Ah, I see now, @xylar. Now that I understand, I think what you have for each line is perfect. I was just confused because, in my interpretation, we in effect overload things like the threads test with a BFB test. I think this could prove quite useful, though; we could potentially track down BFB issues more efficiently by seeing which tests fail baseline comparison.

If no baseline is provided the nightly output would have a pass/fail for baseline comparison or test validation, but not both, right?


xylar commented May 3, 2021

If no baseline is provided the nightly output would have a pass/fail for baseline comparison or test validation, but not both, right?

If no baseline is provided, baseline comparison will not appear at all in the output. test validation will appear as before.

To illustrate this, currently, the only test that is failing is the RK4 restart test. It fails because of test validation (the full run is not bit-for-bit with the restart run). If you compare it with a baseline, baseline comparison passes because both the full run and the restart run are bit-for-bit with the same step in the baseline (even though they are not bit-for-bit with each other).

@vanroekel
Collaborator

Thanks again @xylar for taking time to explain this, sorry for the obvious questions. This makes complete sense to me now.


@vanroekel vanroekel left a comment


I think this is a great change. I really like the new, more detailed output in the nightly suite for seeing whether a test fails in setup/execution or in validation. Thanks @xylar!


@mark-petersen mark-petersen left a comment


This is great! I love the colors!

I tested the nightly regression suite with and without baseline comparisons, and made it fail in various ways. It all worked as expected. This is a big help in understanding what PASS meant in previous testing.


xylar commented May 8, 2021

@mark-petersen, thanks for taking the time to review this (and the others) on a weekend!

@vanroekel and @matthewhoffman, thank you as well for your prompt reviews!

@xylar xylar merged commit f68c908 into MPAS-Dev:master May 8, 2021
@xylar xylar deleted the split_into_run_and_validate branch May 8, 2021 17:45