CI for testing if EESSI stack is available is only checking single architecture #349

@trz42

The CI defined in https://github.com/EESSI/software-layer/blob/2023.06/.github/workflows/test_eessi.yml is not working as intended. It should test whether all packages defined by the easystack files are available for all supported architectures. Instead, it appears to always run the check against the software directory that best matches the CPU microarchitecture of the host running the CI script.

For example, for the EESSI pilot 2023.06, tests are run against the x86_64/intel/haswell builds regardless of whether the software installation being checked is for another x86_64 microarchitecture or even for aarch64. See screenshot below.

This seems to have two causes: the step sets EESSI_SOFTWARE_SUBDIR from the matrix rather than EESSI_SOFTWARE_SUBDIR_OVERRIDE, and EESSI_SOFTWARE_SUBDIR_OVERRIDE is not set at the time the init script is sourced (source /cvmfs/pilot.eessi-hpc.org/versions/${{matrix.EESSI_VERSION}}/init/bash).

The relevant part of the test_eessi.yml file is:

        - name: Test check_missing_installations.sh script
          run: |
              source /cvmfs/pilot.eessi-hpc.org/versions/${{matrix.EESSI_VERSION}}/init/bash
              module load EasyBuild
              eb --version
              export EESSI_PREFIX=/cvmfs/pilot.eessi-hpc.org/versions/${{matrix.EESSI_VERSION}}
              export EESSI_OS_TYPE=linux
              export EESSI_SOFTWARE_SUBDIR=${{matrix.EESSI_SOFTWARE_SUBDIR}}
              env | grep ^EESSI | sort
              echo "just run check_missing_installations.sh (should use eessi-${{matrix.EESSI_VERSION}}.yml)"
              ./check_missing_installations.sh ${{matrix.EASYSTACK_FILE}}

The result can be seen in the screenshot below.

Screenshot 2023-10-03 at 13 34 05

The marked rectangles show the following:

  1. Test should be run against the software directory for aarch64/generic.
  2. However, archspec determines that the best-matching software directory for the CPU microarchitecture of the CI host is intel/haswell, and that directory is used to set up the environment.
  3. EESSI_SOFTWARE_SUBDIR is set according to the definition of the test (but it's the wrong variable and/or it is set too late).
  4. The message shows that this test is run for aarch64/generic.
  5. However, the log message from EasyBuild shows that the test is actually run for intel/haswell.
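
The ordering problem described in points 2 and 3 can be reproduced without CVMFS. The sketch below is only an illustration: mock_init is a hypothetical stand-in for the real init script, the detected value is hard-coded rather than obtained from archspec, and ACTIVE_SUBDIR is an invented name for whatever the init script uses internally. The one behaviour taken from this issue is that the subdirectory is chosen at source time, and only EESSI_SOFTWARE_SUBDIR_OVERRIDE can change that choice.

```shell
#!/usr/bin/env bash

# Hypothetical stand-in for the init script: the software subdirectory is
# chosen once, when the script runs, from the override if it is set and
# otherwise from the detected host CPU (hard-coded here for illustration).
mock_init() {
    local detected="x86_64/intel/haswell"
    ACTIVE_SUBDIR="${EESSI_SOFTWARE_SUBDIR_OVERRIDE:-$detected}"
}

# Ordering of the failing CI step: init first, then export the plain variable.
broken_order() {
    unset EESSI_SOFTWARE_SUBDIR_OVERRIDE
    mock_init
    export EESSI_SOFTWARE_SUBDIR="$1"   # too late, and not the override
    echo "$ACTIVE_SUBDIR"
}

# Ordering of the proposed fix: export the override, then init.
fixed_order() {
    export EESSI_SOFTWARE_SUBDIR_OVERRIDE="$1"
    mock_init
    echo "$ACTIVE_SUBDIR"
}

broken_order aarch64/generic   # prints x86_64/intel/haswell
fixed_order aarch64/generic    # prints aarch64/generic
```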

Adding the line

              export EESSI_SOFTWARE_SUBDIR_OVERRIDE=${{matrix.EESSI_SOFTWARE_SUBDIR}}

before the init script is sourced, fixes the issue. See changes applied in https://github.com/NorESSI/software-layer/pull/167/files#diff-39e6e5e8c8c229d5ef64936a450e3e0a162dba5c72427a25e6d9e918e0a7d699 and https://github.com/NorESSI/software-layer/actions/runs/6338385822/job/17215379514#logs for the results.
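
To keep this kind of mismatch from going unnoticed again, the CI step could fail early when the environment does not point at the requested target. The following is a minimal sketch, not part of the linked pull request. check_subdir is a hypothetical helper, and it assumes that the software path set up by the init script (for example EESSI_SOFTWARE_PATH) ends with the selected software subdirectory.

```shell
#!/usr/bin/env bash

# Hypothetical guard: succeed only if the software path ends with the
# requested subdirectory; otherwise report the mismatch on stderr.
check_subdir() {
    local requested="$1" software_path="$2"
    case "$software_path" in
        */"$requested")
            return 0
            ;;
        *)
            echo "ERROR: requested ${requested} but environment uses ${software_path}" >&2
            return 1
            ;;
    esac
}

# Example call with an illustrative path; in the CI step the arguments would
# be the matrix value and the path exported by the init script.
check_subdir "aarch64/generic" \
    "/cvmfs/pilot.eessi-hpc.org/versions/2023.06/software/linux/aarch64/generic"
```

In the workflow, such a check would run right after the init script is sourced and before check_missing_installations.sh is called.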

Apparently, with this change the test uses the compat layer for x86_64 (because EESSI_CPU_FAMILY is set from uname -m in https://github.com/EESSI/software-layer/blob/2023.06/init/minimal_eessi_env#L20) together with the EasyBuild installation from the software layer selected by the matrix.
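
This mixed situation (compat layer from the host CPU family, software layer from the override) can be detected by comparing the first path component of the requested subdirectory with the output of uname -m. The sketch below is an assumption-laden illustration: family_matches_host is a hypothetical helper, and it assumes the subdirectory always starts with the CPU family, as in x86_64/intel/haswell or aarch64/generic.

```shell
#!/usr/bin/env bash

# Hypothetical helper: succeed if the CPU family encoded in the software
# subdirectory (its first path component) equals the host CPU family.
# The second argument defaults to `uname -m` and exists mainly for testing.
family_matches_host() {
    local subdir="$1"
    local host_family="${2:-$(uname -m)}"
    [ "${subdir%%/*}" = "$host_family" ]
}

# Example: warn when an aarch64 target is checked on a host of another family.
if ! family_matches_host "aarch64/generic"; then
    echo "WARNING: software subdir and host CPU family differ" >&2
fi
```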
