[OpenMP] offload hierarchical parallelism gives wrong results. #156805

@ye-luo

Caused by #146404.

Reproduced with the OvO test suite (https://github.com/TApplencourt/OvO). Run:

CXX="/soft/compilers/llvm/main-patched/bin/clang++" CXXFLAGS="-fopenmp --offload-arch=sm_90 -O3" ./ovo.sh run test_src/cpp/hierarchical_parallelism

Results with commit 87db8e9:

./ovo.sh report --summary --tablefmt github
>> Overall result for test_result/2025-09-04_00-08_hopper01
|   pass rate(%) |   test(#) |   success(#) |   compilation error(#) |   runtime error(#) |   wrong value(#) |   timeout(#) |
|----------------|-----------|--------------|------------------------|--------------------|------------------|--------------|
|          74.2% |       310 |          230 |                      0 |                  0 |               80 |            0 |

 >> Summary
| language   | category                 | name                         |   pass rate(%) |   test(#) |   success(#) |   compilation error(#) |   runtime error(#) |   wrong value(#) |   timeout(#) |
|------------|--------------------------|------------------------------|----------------|-----------|--------------|------------------------|--------------------|------------------|--------------|
| cpp        | hierarchical_parallelism | memcopy-complex_double       |          64.4% |        45 |           29 |                      0 |                  0 |               16 |            0 |
| cpp        | hierarchical_parallelism | memcopy-float                |          64.4% |        45 |           29 |                      0 |                  0 |               16 |            0 |
| cpp        | hierarchical_parallelism | atomic_add-float             |          77.8% |        72 |           56 |                      0 |                  0 |               16 |            0 |
| cpp        | hierarchical_parallelism | reduction_add-complex_double |          78.4% |        74 |           58 |                      0 |                  0 |               16 |            0 |
| cpp        | hierarchical_parallelism | reduction_add-float          |          78.4% |        74 |           58 |                      0 |                  0 |               16 |            0 |

Results before commit 87db8e9:

>> Overall result for test_result/2025-09-04_00-24_hopper01
|   pass rate(%) |   test(#) |   success(#) |   compilation error(#) |   runtime error(#) |   wrong value(#) |   timeout(#) |
|----------------|-----------|--------------|------------------------|--------------------|------------------|--------------|
|         100.0% |       310 |          310 |                      0 |                  0 |                0 |            0 |

 >> Summary
| language   | category                 | name                         |   pass rate(%) |   test(#) |   success(#) |   compilation error(#) |   runtime error(#) |   wrong value(#) |   timeout(#) |
|------------|--------------------------|------------------------------|----------------|-----------|--------------|------------------------|--------------------|------------------|--------------|
| cpp        | hierarchical_parallelism | atomic_add-float             |         100.0% |        72 |           72 |                      0 |                  0 |                0 |            0 |
| cpp        | hierarchical_parallelism | memcopy-complex_double       |         100.0% |        45 |           45 |                      0 |                  0 |                0 |            0 |
| cpp        | hierarchical_parallelism | memcopy-float                |         100.0% |        45 |           45 |                      0 |                  0 |                0 |            0 |
| cpp        | hierarchical_parallelism | reduction_add-complex_double |         100.0% |        74 |           74 |                      0 |                  0 |                0 |            0 |
| cpp        | hierarchical_parallelism | reduction_add-float          |         100.0% |        74 |           74 |                      0 |                  0 |                0 |            0 |

Running with LIBOMPTARGET_DEBUG=1 shows that the old code launches the kernels in generic-SPMD mode, while the new code launches them in generic mode.
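For readers unfamiliar with the pattern, here is a minimal sketch of a hierarchical-parallelism reduction: an outer loop distributed over teams with an inner loop shared among the threads of each team. It is not taken from OvO and has not been confirmed to trigger this bug; the names `hierarchical_sum` and `run_check` and the loop sizes are assumptions chosen for illustration.

```cpp
#include <cstdio>

// Hypothetical example: sums 1.0f over an outer x inner iteration space.
// The outer loop is distributed across teams, the inner loop across the
// threads of each team. Without offload support the pragmas fall back to
// the host (or are ignored) and the result is the same.
float hierarchical_sum(int outer, int inner) {
  float sum = 0.0f;
  #pragma omp target teams distribute reduction(+ : sum) map(tofrom : sum)
  for (int i = 0; i < outer; ++i) {
    #pragma omp parallel for reduction(+ : sum)
    for (int j = 0; j < inner; ++j) {
      sum += 1.0f;
    }
  }
  return sum;
}

// Compares the computed sum with outer * inner. The sizes used here are
// small enough that the expected value is exactly representable in float.
bool run_check(int outer, int inner) {
  const float expected = static_cast<float>(outer) * static_cast<float>(inner);
  const float got = hierarchical_sum(outer, inner);
  std::printf("expected %.1f, got %.1f\n", expected, got);
  return got == expected;
}
```

Calling `run_check(64, 64)` from a `main` and building with the flags from the command above (`-fopenmp --offload-arch=sm_90 -O3`) gives a small standalone check; a mismatch would correspond to the "wrong value" column in the OvO report.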
