Functional tests use GoogleTest now #26

KADichev · 2024-09-23T09:07:38Z

All functional tests are now GoogleTest tests.

The main changes for CMake are:

reorganize CMake to use GTest CMake package for compilation, and gtest_add_tests clause to define Gtests;
remove all use of the run.sh script;
make the list of tests explicit and do not use file(GLOB, ...) clauses to look for all files in test directories, as recommended by CMake.

The main changes for source files are:

every test file was modified to include gtest.h instead of the internal Test.h, and to use the assert/equal macros of GoogleTest
all death tests are slightly modified now to expect FAIL() after the expected fail condition
some modernization of C to C++ was needed for a number of files, incl. use of new/delete instead of malloc/free, or string manipulation instead of C char * manipulation

A new Python wrapper script, test_launcher.py, is now used for more flexible checking of return codes, which is needed for death tests and normal tests alike. The expected return codes are still parsed by scanning the source files, as before.

The TEST_LAUNCHER variable is important: it sets our wrapper test_launcher.py and its parameters for each test individually, which is needed because each test has its own engine, parameters, and expected return code.
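To illustrate the return-code checking described above, here is a minimal sketch of what such a wrapper could look like. It is not the actual test_launcher.py: the function names and the `Exit code: N` annotation format are assumptions made for illustration only.

```python
import re
import subprocess
import sys
from typing import Optional, Sequence

# Assumed annotation format; the real test sources may mark the
# expected return code differently.
_EXIT_CODE_RE = re.compile(r"Exit code:\s*(\d+)")


def parse_expected_exit_code(source_text: str) -> Optional[int]:
    """Return the expected exit code found in a test source, or None."""
    match = _EXIT_CODE_RE.search(source_text)
    return int(match.group(1)) if match else None


def run_and_check(command: Sequence[str], expected: int) -> bool:
    """Run the command and report whether it exited with the expected code."""
    completed = subprocess.run(command, check=False)
    return completed.returncode == expected


if __name__ == "__main__":
    # Hypothetical stand-ins for a test source file and a test binary.
    demo_source = "/* \\return Exit code: 3 */"
    expected = parse_expected_exit_code(demo_source)
    child = [sys.executable, "-c", "import sys; sys.exit(3)"]
    ok = expected is not None and run_and_check(child, expected)
    sys.exit(0 if ok else 1)
```

Comparing against an expected code, rather than only against zero, lets one wrapper handle both normal tests and tests that are expected to abort.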

…quests in a queue pair), not just relying on device information about max_qp_wr, but actually trying to create QPs via ibv_create_qp with different max_send_wr until we find the largest still working number (via binary search). This becomes the updated m_maxSrs. Independently, the 100K element test manyPuts needs to be downgraded to 5K for our cluster, as our count is just over 10K, but actually 10K does not work as well (not sure why?)
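The binary search mentioned above can be sketched as follows. This is an illustration only: `works` is a hypothetical predicate standing in for an attempt to create a queue pair via ibv_create_qp with a given max_send_wr, and the sketch assumes that if a value works, every smaller value works too.

```python
from typing import Callable


def largest_working(upper: int, works: Callable[[int], bool]) -> int:
    """Return the largest n in [1, upper] for which works(n) holds, or 0 if none.

    Assumes monotonicity: if works(n) is True, so is works(m) for all m < n.
    """
    lo, hi = 0, upper  # invariant: lo is 0 or known to work; all n > hi fail
    while lo < hi:
        mid = (lo + hi + 1) // 2  # round up so the loop always makes progress
        if works(mid):
            lo = mid
        else:
            hi = mid - 1
    return lo


if __name__ == "__main__":
    # Hypothetical device: reports a large limit, but creation fails above 10321.
    reported_limit = 1 << 20
    print(largest_working(reported_limit, lambda n: n <= 10321))
```

The number of probes grows only logarithmically with the reported limit, which matters when each probe creates and destroys a real queue pair.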

…y, while it works, this didn't solve the problem. Mpirun still is used with pthreads, so it changes the std::abort signal to 134. This is why now I changed the launcher. Still having issues with some hybrid tests though.
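As background on the value 134: it equals 128 + 6, the common shell convention for reporting a process killed by signal 6 (SIGABRT on Linux). Python's subprocess module reports such a death as a negative return code instead. The sketch below shows one way a launcher might normalize the two views; the helper name is hypothetical and this is not the project's launcher code.

```python
import subprocess
import sys


def shell_style_exit_code(returncode: int) -> int:
    """Map a subprocess return code to the 128+signal convention used by shells.

    subprocess reports death by signal N as -N; regular exit codes pass through.
    """
    if returncode < 0:
        return 128 + (-returncode)
    return returncode


if __name__ == "__main__":
    # On POSIX systems, os.abort() raises SIGABRT in the child process.
    child = subprocess.run([sys.executable, "-c", "import os; os.abort()"], check=False)
    print(child.returncode, shell_style_exit_code(child.returncode))
```

Because the observed value depends on which layer reports it (the process itself, a shell, or an MPI launcher), comparing against a plain exit code such as 6 avoids depending on that mapping.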

…y, we get all tests at the moment via gtest_add_tests. It would be good to replace gtest_add_tests with gtest_discover_tests in the future though, because the current one takes 60-90 seconds to configure. Also, there is a horrible bug now where if I specify a high CMake version (e.g. the needed 3.29.0), the GoogleTests would simply not compile at all

anyzelman · 2024-11-28T18:07:59Z

At this point, all tests pass on:

x86_64, CentOS Stream 9, GCC 11.5, MPICH 4.1.1
x86_64, CentOS Stream 9, GCC 11.5, OpenMPI 4.1.1
ARM, Ubuntu 24.01 LTS, GCC 11.4, MPICH 4.3.0b1

With OpenMPI, the only exception is that the following two tests could fail due to not enough slots available to OpenMPI:

The following tests FAILED:
        285 - mpimsg_API.func_lpf_register_local_parallel_multiple (Failed)
        378 - mpirma_API.func_lpf_register_local_parallel_multiple (Failed)

The fix would be to pass --oversubscribe to the test launcher, or to configure OpenMPI to have hardware threads double as slots. Issue #37 has been raised to make this easier; I suggest not changing anything related to this in this MR.

… which were declared in the MPI directory for the MPI engine, while engines in general also define unit tests that need call add_gtest. We break the cycle by pulling engine-specific cflags into the main CMakeLists.txt

anyzelman

This MR has no regressions in functionality and adds all benefits that come with full Google Test integration. As a side effect of retaining past functionality, a new feature was also added: hybrid tests now also build if libibverbs is not available (they then rely on the mpirma engine if available, and on the mpimsg engine if not).

This MR also resolves bugs that have appeared on more modern cluster deployments compared to the previous main tag of LPF. This is hence a priority MR to base all further extensions and evolutions of LPF on.

At this point, this MR has been tested successfully on:

x86_64, CentOS Stream 9, GCC 11.5.0, MPICH 4.1.1 (caveat 2)
x86_64, CentOS Stream 9, GCC 11.5.0, OpenMPI 4.1.1 (caveats 1, 2, and 5)
x86_64, Fedora, GCC 9.3.1, MPICH 3.3.2 (caveat 2)
x86_64, Ubuntu 23.04, GCC 13.1.0, OpenMPI 4.1.6 (caveat 5)
x86_64, openEuler 20.03 LTS, GCC 7.3, MPICH 3.2.1
x86_64, Ubuntu 22.04 LTS, GCC 11.4.0, MPICH 4.3.0b1 (caveat 3, transient)
ARM, Ubuntu 22.04 LTS, GCC 11.4.0, MPICH 4.3.0b1 (caveats 3 & 4)

These are under different variations of calls to bootstrap.sh to test all the above-described changes.

The following caveats apply:

1. With OpenMPI and default arguments to its mpirun, oversubscription to larger process counts may be limited, causing some tests to fail. Issue #37 has been raised to pass the appropriate flags to the OpenMPI mpirun during testing.
2. The make install post-install checks fail (as they should) when 1) tests are disabled, 2) ibverbs dev libraries are present, but 3) no IB card is present. On the machines with IB we tested on, all post-install checks succeed.
3. The hybrid backend on MPICH 4.3.0b1 does not pass the debug layer tests due to a segfault in MPICH's MPI_Abort. Issue #41 has been raised to track this. This error is presently consistently reproducible only on our cluster's ARM nodes and transient on our cluster's x86_64 nodes. It has not been observed outside our cluster.
4. With MPICH 4.3.0b1, we furthermore hit issues #43 and #44. These issues are also only reproducible on ARM nodes.
5. With OpenMPI 4.1.6 and OpenMPI 4.1.1 on x86_64, we hit issue #45.
6. At present, we have no CI flow that can run all build variations relevant to this MR. Issue #40 has been raised to (partially) address this, as well as to track it.

At present, I suggest merging first and considering the known issues above separately.

This MR fails with the following deprecated/unsupported MPI implementations:

OpenMPI 4.0.2 (tested on x86_64, Fedora, GCC 9.3.1)

I suggest, since these are unsupported MPI implementations, we file this under "won't fix".

anyzelman · 2024-12-02T13:02:56Z

Concept release notes:

This MR makes all LPF functional tests use GoogleTest.

The main changes for CMake are:

reorganize CMake to use GTest CMake package for compilation, and gtest_add_tests clause to define Gtests;
remove all use of the run.sh script;
make the list of tests explicit and do not use file(GLOB, ...) clauses to look for all files in test directories, as recommended by CMake.

The main changes for source files are:

every test file was modified to include gtest.h instead of the internal Test.h, and to use the assert/equal macros of GoogleTest
all death tests are slightly modified now to expect FAIL() after the expected fail condition
some modernization of C to C++ was needed for a number of files, incl. use of new/delete instead of malloc/free, or string manipulation instead of C char * manipulation

A new Python wrapper script, test_launcher.py, is now used for more flexible checking of return codes, which is needed for death tests and normal tests alike. The expected return codes are still parsed by scanning the source files, as before.

For the complete history, see GitHub PR #26.

This MR also resolves bugs that have appeared on more modern cluster deployments compared to the previous main tag of LPF. This is hence a priority MR to base all further extensions and evolutions of LPF on. This MR was tested successfully for:

x86_64, CentOS Stream 9, GCC 11.5.0, MPICH 4.1.1 (caveat 2)
x86_64, CentOS Stream 9, GCC 11.5.0, OpenMPI 4.1.1 (caveats 1, 2, and 5)
x86_64, Fedora, GCC 9.3.1, MPICH 3.3.2 (caveat 2)
x86_64, Ubuntu 23.04, GCC 13.1.0, OpenMPI 4.1.6 (caveat 5)
x86_64, openEuler 20.03 LTS, GCC 7.3, MPICH 3.2.1
x86_64, Ubuntu 22.04 LTS, GCC 11.4.0, MPICH 4.3.0b1 (caveat 3, transient)
ARM, Ubuntu 22.04 LTS, GCC 11.4.0, MPICH 4.3.0b1 (caveats 3 & 4)

These are under different variations of calls to bootstrap.sh to test all the above-described changes.

The following caveats apply:

1. With OpenMPI and default arguments to its mpirun, oversubscription to larger process counts may be limited, causing some tests to fail. Issue #37 ("Make it easier to pass arguments for mpirun to test launcher") has been raised to pass the appropriate flags to the OpenMPI mpirun during testing.
2. The make install post-install checks fail (as they should) when 1) tests are disabled, 2) ibverbs dev libraries are present, but 3) no IB card is present. On the machines with IB we tested on, all post-install checks succeed.
3. The hybrid backend on MPICH 4.3.0b1 does not pass the debug layer tests due to a segfault in MPICH's MPI_Abort. Issue #41 ("The debug tests for hybrid engine fails for MPICH 4.3.0b1 on call to MPI_Abort") has been raised to track this. This error is presently consistently reproducible only on our cluster's ARM nodes and transient on our cluster's x86_64 nodes. It has not been observed outside our cluster.
4. With MPICH 4.3.0b1, we furthermore hit issues #43 ("With MPICH 4.3.0b1, func_bsplib_hpsend_many_hybrid_*_debug segfaults") and #44 ("With MPICH 4.3.0b1, func_lpf_hook_tcp_timeout.mpirma returns wrong error code"). These issues are also only reproducible on ARM nodes.
5. With OpenMPI 4.1.6 and OpenMPI 4.1.1 on x86_64, we hit issue #45 ("func_lpf_probe_parallel_nested tests UB?").
6. At present, we have no CI flow that can run all build variations relevant to this MR. Issue #40 ("Bring up a basic GitHub CI") has been raised to (partially) address this, as well as to track it.

KADichev added 30 commits August 20, 2024 11:39

Decrease the number of messages to use for same reason as the decreas…

219372a

…e in manyPuts -- the device does not support having too many messages in the send WR QP

Trying to modernize LPF to use FindGTest/GoogleTest combination, but …

37cd6eb

…it is very complicated to fix these tests - they seem all over the place, not working, but commiting it

Make tests compile again

2886606

In a middle of a big mess of changes, which I hope will end well

c7dbc7d

Working my way through Gtest-ifying the tests

9eab088

Finished porting functional tests to gtest

47294a7

Added a script converting the source line with P into a process count…

359c512

… (only the minimum) but still have many failing tests without explanation, and not tested at all properly

Fixing how the default process count is parsed (some parsing errors) …

7319765

…in the execute command. Also, reduce some example message count as it does not work with IB Verbs with very large tests on the ARM machine

Fix reading in probe argument, plus use lpfrun now instead of mpirun …

02b14d8

…as launcher

Finished with the tests/functional directory, tests passing. Now star…

e8052c1

…ted with the debug subdirectory

Commit current state as I can't deal with this enormous change

71c1785

Compiles again, will not run okay because of EXPECT_DEATH + MPI

2c98499

Slow progress, now I need to implement in my Python script the differ…

e2a758d

…ent logic if the bloody Gtest wants to just list the tests or run them

Almost got it, now need to fix the debug tests not to issue EXPECT_DE…

7220e76

…ATH, and that should be it

Use GoogleTest but without death tests

6f84f8f

Got IB Verbs tests to work again by setting LPF_MPI_AUTO_INITIALIZE=0…

80f0392

…. Also, using Setup and TearDown for entire test suite now, pretty neat, no code duplication each time

All tests passing now - omitting the huge runs

758b8de

Rename c99 folder to collectives, as the folder tests collectives, an…

4021f1f

…d remove c99 requirement and turn them into C++ tests

First step towards making it work for many engines

703fde7

Go back to only ibverbs for now, have to think how to fix this

d6eebab

This version runs all tests, but fails because I need one more fix --…

7081c78

… I need to make MPI engines in debug layer call MPI_Abort, and pthread engine in debug layer call std::abort

Oops, missing test

3cd577a

I think I figured how to tell hybrid engine to call MPI abort without…

4db403d

… actually contaminating the hybrid code

Request CMake 3.29 if building with tests, and clean up a bit bootstr…

6efc47e

…ap script from pre-existing googletest messages

Improve Pthread abort to return exit(6) instead of calling std::abort…

2f42d4c

… which internally is non-portably converted to 134. This also simplifies the launcher script. Also fix some incorrect delete's for arrays in the collectives

Eliminate remaining DEATH statements in debug folder

15aea88

A very annoying bug that took ages to find.

bfafa5f

KADichev and others added 5 commits November 22, 2024 10:02

Fix some incorrectly refactored tests. Also, add RUN_SERIAL to dynami…

2f1db74

…chook tests to ensure they don't block each other using the same port

Disable bad_pattern test, since it has certain assumptions - 1024 pro…

0d17ddc

…cs expose the issue, and that number is capped with current CMake.

Fix for #36

7cb92df

For engine-specific checks, make sure the target engines are enabled …

c88b20f

…before adding it as a target

Make bootstrap --with-mpiexec work with modern CMake FindMPI

6890dcb

Code review: quick fixes to some formatting

7f31ddd

anyzelman requested changes Nov 28, 2024

tests/functional/func_lpf_put_parallel_bad_pattern.cpp

anyzelman added 3 commits November 28, 2024 20:16

Final code review: prevent passing empty strings to CMake

6e9baaf

Let hybrid engine be built even if no ibverbs engine is available -- …

0905568

…use mpirma and mpimsg as fallback in such cases

Allow the previous behaviour of building the ibverbs engine when poss…

74e9f41

…ible (even if no IB card is present). With GTest integration now, this is only possible when tests are disabled

anyzelman mentioned this pull request Nov 29, 2024

Improve and re-enable func_lpf_put_parallel_bad_pattern test #38

Open

Remove disabled test-- retained in issue #38 on GitHub

76fa81d

anyzelman mentioned this pull request Nov 29, 2024

Bring up a basic GitHub CI #40

Open

anyzelman added 5 commits November 29, 2024 13:15

Make lpfcc compiler front-end compatible with CMake once more

cfc2db8

Clarification re last commit

f156f53

Fix post-install check, and make sure it checks only what was intende…

b3c4e5d

…d (finding the header does not need pass by link stage)

Move var declarations closer to where they are to be used, and ensure…

faa3b33

… that the variables are populated by dependent engines

Populate / append lpf_cflags and lpf_lib_link_flags only once for MPI…

635bdfe

… engines

anyzelman mentioned this pull request Nov 29, 2024

post install tasks are broken #28

Open

anyzelman added 5 commits November 29, 2024 13:57

Revert "Move var declarations closer to where they are to be used, an…

94a6f6e

…d ensure that the variables are populated by dependent engines" This reverts commit faa3b33.

Avoid one warning on modern CMake

0edb521

Past code review comment: put back the OS X flag on rdynamic-- we sti…

66f0635

…ll have no test on OS X, but at least we will not break something else for that target

Code review comments from Kiril - remove unnecessary CMake statements

09f8142

anyzelman approved these changes Nov 29, 2024

anyzelman mentioned this pull request Dec 1, 2024

func_lpf_probe_parallel_nested tests UB? #45

Open

anyzelman merged commit 14ee22a into master Dec 2, 2024
