
@casparvl commented Nov 12, 2025

Currently, we construct a ReFrame partition name based on the software subdirectory that is being used. For example, on an Intel Skylake node the constructed partition name would be x86_64_intel_skylake_avx512. However, this leads to failures in the test step when cross-compiling:

ERROR: failed to load configuration: could not find a configuration entry for the requested system/partition combination: 'BotBuildTests:aarch64_neoverse_v1_accel_nvidia_cc90'

This happens because ReFrame looks for a partition named aarch64_neoverse_v1_accel_nvidia_cc90 in the ReFrame config, and no such partition exists. We could of course define five identical partitions aarch64_neoverse_v1_accel_nvidia_ccXX (with XX = 70, 80, 90, 100, 120), but there's no point: they would all represent the same physical node, with the same physical properties (in particular, it has no GPU!).
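To make the failure concrete, here is a minimal sketch of the old naming scheme, using the same parameter expansions as the fallback code in this PR. The helper old_partition_name is hypothetical, and the example values (x86_64/intel/skylake_avx512, aarch64/neoverse_v1, accel/nvidia/ccXX) are assumed to follow the EESSI software and accelerator subdirectory layout.

```bash
#!/usr/bin/env bash
# Sketch of the old (deprecated) naming scheme: take the software subdir,
# append the accelerator override if one is set, and replace every '/' by '_'.
# old_partition_name is a hypothetical helper; the example values are assumed.
old_partition_name() {
    local software_subdir="$1"
    local accel_override="${2:-}"
    local name="${software_subdir//\//_}"
    if [ -n "$accel_override" ]; then
        name="${name}_${accel_override//\//_}"
    fi
    echo "$name"
}

# Native build on a Skylake node: prints x86_64_intel_skylake_avx512
old_partition_name x86_64/intel/skylake_avx512

# One Neoverse V1 build node cross-compiling for five GPU targets
# yields five different partition names for the same hardware.
for cc in 70 80 90 100 120; do
    old_partition_name aarch64/neoverse_v1 "accel/nvidia/cc${cc}"
done
```

Because the accelerator target ends up in the name, every cross-compilation target needs its own partition entry, even though the build node itself never changes.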

Rather than duplicating partitions, we should have a single ReFrame partition that corresponds to the actual hardware in the node, and make sure that this partition config is used whenever that node is used for building. When I changed arch_target_map to node_type_map on the bot side, I silently introduced a property in the cfg/job.cfg file that allows us to do this: it stores the node_type, i.e. the corresponding key in the node_type_map. Using it means that the partition names in the ReFrame config file have to match the node type names in app.cfg, which I think is intuitive.
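A minimal sketch of the new approach: read node_type from cfg/job.cfg and use it directly as the partition name. It assumes node_type sits in an INI section called [architecture]; that section name, the other keys, and all example values are hypothetical, so check the job.cfg your bot actually writes. get_cfg_value is likewise a hypothetical helper, not part of the bot.

```bash
#!/usr/bin/env bash
# Sketch: use the node_type stored in cfg/job.cfg as the ReFrame partition name.
# Section name "architecture", the keys and the values below are assumptions.
set -euo pipefail

# Hypothetical helper: print the value of <key> in [<section>] of an INI-style file.
get_cfg_value() {
    local file="$1" section="$2" key="$3"
    awk -F '=' -v section="$section" -v key="$key" '
        /^\[.*\]$/ { in_section = ($0 == "[" section "]"); next }
        in_section {
            k = $1; gsub(/[ \t]+/, "", k)
            if (k == key) {
                v = substr($0, index($0, "=") + 1)
                gsub(/^[ \t]+|[ \t]+$/, "", v)
                print v
                exit
            }
        }
    ' "$file"
}

# Hypothetical job.cfg for a cross-compiling job, written to a temp file for this demo.
job_cfg="$(mktemp)"
trap 'rm -f "$job_cfg"' EXIT
cat > "$job_cfg" <<'EOF'
[architecture]
software_subdir = aarch64/neoverse_v1
accelerator = nvidia/cc90
node_type = aarch64_neoverse_v1
EOF

node_type="$(get_cfg_value "$job_cfg" architecture node_type)"
# The partition name no longer depends on the cross-compilation target.
export RFM_SYSTEM="BotBuildTests:${node_type}"
echo "$RFM_SYSTEM"
```

Whatever GPU target the job builds for, RFM_SYSTEM stays the same, so one ReFrame partition per node type is enough.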

I implemented this by retrieving the node_type value in bot/test.sh and passing it as an argument to test_suite.sh. I also made sure that it still works with the old configuration, i.e. with ReFrame partitions named after the software subdirectory.
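The hand-off between the two scripts could look roughly like the sketch below. The option name --node-type and the function parse_test_suite_args are hypothetical stand-ins, not the actual interface of test_suite.sh; the branch without a node type assumes an older bot whose job.cfg does not contain node_type, in which case RFM_SYSTEM is left unset so the old naming scheme can take over.

```bash
#!/usr/bin/env bash
# Sketch of passing the node type from bot/test.sh to test_suite.sh.
# "--node-type" and parse_test_suite_args are hypothetical names.

# Stand-in for the argument handling in test_suite.sh.
parse_test_suite_args() {
    local node_type=""
    while [ $# -gt 0 ]; do
        case "$1" in
            --node-type)
                node_type="${2:-}"
                shift
                # Only shift the value if it was actually given.
                if [ $# -gt 0 ]; then shift; fi
                ;;
            *)
                # Other arguments are ignored in this sketch.
                shift
                ;;
        esac
    done
    if [ -n "$node_type" ]; then
        export RFM_SYSTEM="BotBuildTests:${node_type}"
    else
        # No node type (e.g. an older bot): leave it to the old naming scheme.
        unset RFM_SYSTEM
    fi
}

# In bot/test.sh this would correspond to: ./test_suite.sh --node-type "${node_type}"
parse_test_suite_args --node-type aarch64_neoverse_v1
echo "RFM_SYSTEM=${RFM_SYSTEM}"
```

Passing the value explicitly keeps test_suite.sh independent of the job.cfg layout; only bot/test.sh needs to know where node_type is stored.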

Proof that this works:

@casparvl marked this pull request as ready for review November 12, 2025 14:59
Comment on lines +186 to +203
```bash
# Check if the partition specified by RFM_SYSTEM is in the config file
# Redirect to /dev/null because we don't want to print an ERROR, we want to try a fallback
reframe --show-config | grep -v "could not find a configuration entry for the requested system/partition combination" > /dev/null
if [[ $? -eq 1 ]]; then
    # grep -v selected no lines (exit status 1), i.e. there was no output other than
    # the error message, so ReFrame failed to find the system/partition combination.
    # Try the previous approach for backwards compatibility.
    # This fallback can be scrapped once all bots have adopted the new naming convention
    # (i.e. using the node_type name from app.cfg) for ReFrame partitions.
    # Get the correct partition name
    echo "Falling back to old naming scheme for REFRAME_PARTITION_NAME."
    echo "This naming scheme is deprecated, please update your partition names in the ReFrame config file."
    REFRAME_PARTITION_NAME=${EESSI_SOFTWARE_SUBDIR//\//_}
    if [ -n "$EESSI_ACCELERATOR_TARGET_OVERRIDE" ]; then
        REFRAME_PARTITION_NAME=${REFRAME_PARTITION_NAME}_${EESSI_ACCELERATOR_TARGET_OVERRIDE//\//_}
    fi
    echo "Constructed partition name based on EESSI_SOFTWARE_SUBDIR and EESSI_ACCELERATOR_TARGET_OVERRIDE: ${REFRAME_PARTITION_NAME}"
    export RFM_SYSTEM="BotBuildTests:${REFRAME_PARTITION_NAME}"
fi
```

@casparvl (author):

Note that all of this is only there to keep backwards compatibility.
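The check on the reframe --show-config output relies on how grep -v sets its exit status, which is easy to misread: a status of 1 means grep -v selected no lines, not that the pattern matched. The sketch below uses only grep and stand-in input strings (not real ReFrame output) to show the three cases; grep_v_status is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Shows the grep exit status that the fallback check relies on.
# grep -v PATTERN exits with 0 if at least one input line does NOT match,
# and with 1 if no line was selected (every line matched, or input was empty).
pattern="could not find a configuration entry for the requested system/partition combination"

# Hypothetical helper: run the same grep -v as the check and print its exit status.
grep_v_status() {
    printf '%s' "$1" | grep -v "$pattern" > /dev/null
    echo $?
}

# Only the error message: nothing selected, status 1 -> fall back to old naming
grep_v_status "ERROR: failed to load configuration: ${pattern}: 'BotBuildTests:aarch64_neoverse_v1_accel_nvidia_cc90'"
# Stand-in for a successful configuration dump: lines selected, status 0
grep_v_status $'first line of configuration output\nsecond line\n'
# No output at all on stdout: nothing selected, status 1
grep_v_status ""
```

One consequence worth keeping in mind: if reframe --show-config wrote nothing to stdout, the check would also take the fallback path.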

@laraPPr self-requested a review November 12, 2025 15:11