[Flang][OpenMP] Appropriately emit present/load/store in all cases in MapInfoFinalization #150311

agozillon · 2025-07-23T20:24:46Z

Currently, we return early whenever we've already generated an allocation for intermediate descriptor variables (required in certain cases when we can't directly access the base address of a passes in descriptor function argument due to HLFIR/FIR restrictions). This unfortunately, skips over the presence check and load/store required to set the intermediate descriptor allocations values/data. This is fine in most cases, but if a function happens to have a series of branches with seperate target regions capturing the same input argument, we'd emit the present/load/store into the first branch with the first target inside of it, the secondary (or any preceding) branches would not have the present/load/store, this would lead to the subsequent mapped values in that branch being empty and then leading to a memory access violation on device.

The fix for the moment is to emit a present/load/store at the relevant location of every target utilising the input argument, this likely will also lead to fixing possible issues with the input argument being manipulated inbetween target regions (primarily resizing, the data should remain the same as we're just copying an address around, in theory at least). There's possible optimizations/simplifications to emit less load/stores such as by raising the load/store out of the branches when we can, but I'm inclined to leave this sort of optimization to lower level passes such as an LLVM pass (which very possibly already covers it).

llvmbot · 2025-07-23T20:25:24Z

@llvm/pr-subscribers-offload
@llvm/pr-subscribers-flang-fir-hlfir

@llvm/pr-subscribers-flang-openmp

Author: None (agozillon)

Changes

Currently, we return early whenever we've already generated an allocation for intermediate descriptor variables (required in certain cases when we can't directly access the base address of a passes in descriptor function argument due to HLFIR/FIR restrictions). This unfortunately, skips over the presence check and load/store required to set the intermediate descriptor allocations values/data. This is fine in most cases, but if a function happens to have a series of branches with seperate target regions capturing the same input argument, we'd emit the present/load/store into the first branch with the first target inside of it, the secondary (or any preceding) branches would not have the present/load/store, this would lead to the subsequent mapped values in that branch being empty and then leading to a memory access violation on device.

The fix for the moment is to emit a present/load/store at the relevant location of every target utilising the input argument, this likely will also lead to fixing possible issues with the input argument being manipulated inbetween target regions (primarily resizing, the data should remain the same as we're just copying an address around, in theory at least). There's possible optimizations/simplifications to emit less load/stores such as by raising the load/store out of the branches when we can, but I'm inclined to leave this sort of optimization to lower level passes such as an LLVM pass (which very possibly already covers it).

Full diff: https://github.com/llvm/llvm-project/pull/150311.diff

2 Files Affected:

(modified) flang/lib/Optimizer/OpenMP/MapInfoFinalization.cpp (+31-22)
(added) flang/test/Lower/OpenMP/optional-argument-map-3.f90 (+46)

diff --git a/flang/lib/Optimizer/OpenMP/MapInfoFinalization.cpp b/flang/lib/Optimizer/OpenMP/MapInfoFinalization.cpp
index f052cf8d22d1e..de1a98cd678ea 100644
--- a/flang/lib/Optimizer/OpenMP/MapInfoFinalization.cpp
+++ b/flang/lib/Optimizer/OpenMP/MapInfoFinalization.cpp
@@ -137,32 +137,41 @@ class MapInfoFinalizationPass
         !fir::factory::isOptionalArgument(descriptor.getDefiningOp()))
       return descriptor;
 
-    mlir::Value &slot = localBoxAllocas[descriptor.getDefiningOp()];
-    if (slot) {
-      return slot;
+    mlir::Value &alloca = localBoxAllocas[descriptor.getDefiningOp()];
+    mlir::Location loc = boxMap->getLoc();
+
+    if (!alloca) {
+      // The fir::BoxOffsetOp only works with !fir.ref<!fir.box<...>> types, as
+      // allowing it to access non-reference box operations can cause some
+      // problematic SSA IR. However, in the case of assumed shape's the type
+      // is not a !fir.ref, in these cases to retrieve the appropriate
+      // !fir.ref<!fir.box<...>> to access the data we need to map we must
+      // perform an alloca and then store to it and retrieve the data from the
+      // new alloca.
+      mlir::OpBuilder::InsertPoint insPt = builder.saveInsertionPoint();
+      mlir::Block *allocaBlock = builder.getAllocaBlock();
+      assert(allocaBlock && "No alloca block found for this top level op");
+      builder.setInsertionPointToStart(allocaBlock);
+
+      mlir::Type allocaType = descriptor.getType();
+      if (fir::isBoxAddress(allocaType))
+        allocaType = fir::unwrapRefType(allocaType);
+      alloca = builder.create<fir::AllocaOp>(loc, allocaType);
+      builder.restoreInsertionPoint(insPt);
     }
 
-    // The fir::BoxOffsetOp only works with !fir.ref<!fir.box<...>> types, as
-    // allowing it to access non-reference box operations can cause some
-    // problematic SSA IR. However, in the case of assumed shape's the type
-    // is not a !fir.ref, in these cases to retrieve the appropriate
-    // !fir.ref<!fir.box<...>> to access the data we need to map we must
-    // perform an alloca and then store to it and retrieve the data from the new
-    // alloca.
-    mlir::OpBuilder::InsertPoint insPt = builder.saveInsertionPoint();
-    mlir::Block *allocaBlock = builder.getAllocaBlock();
-    mlir::Location loc = boxMap->getLoc();
-    assert(allocaBlock && "No alloca block found for this top level op");
-    builder.setInsertionPointToStart(allocaBlock);
-
-    mlir::Type allocaType = descriptor.getType();
-    if (fir::isBoxAddress(allocaType))
-      allocaType = fir::unwrapRefType(allocaType);
-    auto alloca = builder.create<fir::AllocaOp>(loc, allocaType);
-    builder.restoreInsertionPoint(insPt);
     // We should only emit a store if the passed in data is present, it is
     // possible a user passes in no argument to an optional parameter, in which
     // case we cannot store or we'll segfault on the emitted memcpy.
+    // TODO: We currently emit a present -> load/store every time we use a
+    // mapped value that requires a local allocation, this isn't the most
+    // efficient, although, it is more correct in a lot of situations. One
+    // such situation is emitting a this series of instructions in separate
+    // segments of a branch (e.g. two target regions in separate else/if branch
+    // mapping the same function argument), however, it would be nice to be able
+    // to optimize these situations e.g. raising the load/store out of the
+    // branch if possible. But perhaps this is best left to lower level
+    // optimisation passes.
     auto isPresent =
         builder.create<fir::IsPresentOp>(loc, builder.getI1Type(), descriptor);
     builder.genIfOp(loc, {}, isPresent, false)
@@ -171,7 +180,7 @@ class MapInfoFinalizationPass
           builder.create<fir::StoreOp>(loc, descriptor, alloca);
         })
         .end();
-    return slot = alloca;
+    return alloca;
   }
 
   /// Function that generates a FIR operation accessing the descriptor's
diff --git a/flang/test/Lower/OpenMP/optional-argument-map-3.f90 b/flang/test/Lower/OpenMP/optional-argument-map-3.f90
new file mode 100644
index 0000000000000..f1c1ac2d0bad0
--- /dev/null
+++ b/flang/test/Lower/OpenMP/optional-argument-map-3.f90
@@ -0,0 +1,46 @@
+!RUN: %flang_fc1 -emit-hlfir -fopenmp %s -o - | FileCheck %s
+
+module mod
+contains
+      subroutine foo( dt, switch )
+        implicit none
+        real(4), dimension(:), INTENT(INOUT) :: dt
+        logical, INTENT(IN) :: switch
+        integer :: dim, idx
+
+        if ( switch ) then
+!$omp target teams distribute parallel do
+            do idx = 1, 100
+            dt(idx) = 20
+            end do
+        else
+!$omp target teams distribute parallel do
+            do idx = 1, 100
+            dt(idx) = 30
+            end do
+        end if
+      end subroutine foo
+end module
+
+! CHECK-LABEL:   func.func @{{.*}}(
+! CHECK-SAME:      %[[ARG0:.*]]: !fir.box<!fir.array<?xf32>> {fir.bindc_name = "dt"},
+! CHECK:           %[[VAL_0:.*]] = fir.alloca !fir.box<!fir.array<?xf32>>
+! CHECK:           %[[VAL_1:.*]]:2 = hlfir.declare %[[ARG0]] dummy_scope {{.*}}
+! CHECK:           fir.if %{{.*}} {
+! CHECK:             %[[VAL_2:.*]] = fir.is_present %[[VAL_1]]#1 : (!fir.box<!fir.array<?xf32>>) -> i1
+! CHECK:             fir.if %[[VAL_2]] {
+! CHECK:               fir.store %[[VAL_1]]#1 to %[[VAL_0]] : !fir.ref<!fir.box<!fir.array<?xf32>>>
+! CHECK:             }
+! CHECK:             %[[VAL_3:.*]] = fir.box_offset %[[VAL_0]] base_addr : (!fir.ref<!fir.box<!fir.array<?xf32>>>) -> !fir.llvm_ptr<!fir.ref<!fir.array<?xf32>>>
+! CHECK:             %[[VAL_4:.*]] = omp.map.info var_ptr(%[[VAL_0]] : !fir.ref<!fir.box<!fir.array<?xf32>>>, f32) map_clauses(implicit, tofrom) capture(ByRef) var_ptr_ptr(%[[VAL_3]] : !fir.llvm_ptr<!fir.ref<!fir.array<?xf32>>>) bounds(%{{.*}}) -> !fir.llvm_ptr<!fir.ref<!fir.array<?xf32>>> {name = ""}
+! CHECK:             %[[VAL_5:.*]] = omp.map.info var_ptr(%[[VAL_0]] : !fir.ref<!fir.box<!fir.array<?xf32>>>, !fir.box<!fir.array<?xf32>>) map_clauses(implicit, to) capture(ByRef) members(%[[VAL_4]] : [0] : !fir.llvm_ptr<!fir.ref<!fir.array<?xf32>>>) -> !fir.ref<!fir.array<?xf32>> {name = "dt"}
+! CHECK:             omp.target host_eval({{.*}}) map_entries({{.*}}%[[VAL_5]] -> {{.*}}, %[[VAL_4]] -> {{.*}} : {{.*}}) {
+! CHECK:           } else {
+! CHECK:             %[[VAL_6:.*]] = fir.is_present %[[VAL_1]]#1 : (!fir.box<!fir.array<?xf32>>) -> i1
+! CHECK:             fir.if %[[VAL_6]] {
+! CHECK:               fir.store %[[VAL_1]]#1 to %[[VAL_0]] : !fir.ref<!fir.box<!fir.array<?xf32>>>
+! CHECK:             }
+! CHECK:             %[[VAL_7:.*]] = fir.box_offset %[[VAL_0]] base_addr : (!fir.ref<!fir.box<!fir.array<?xf32>>>) -> !fir.llvm_ptr<!fir.ref<!fir.array<?xf32>>>
+! CHECK:             %[[VAL_8:.*]] = omp.map.info var_ptr(%[[VAL_0]] : !fir.ref<!fir.box<!fir.array<?xf32>>>, f32) map_clauses(implicit, tofrom) capture(ByRef) var_ptr_ptr(%[[VAL_7]] : !fir.llvm_ptr<!fir.ref<!fir.array<?xf32>>>) bounds(%{{.*}}) -> !fir.llvm_ptr<!fir.ref<!fir.array<?xf32>>> {name = ""}
+! CHECK:             %[[VAL_9:.*]] = omp.map.info var_ptr(%[[VAL_0]] : !fir.ref<!fir.box<!fir.array<?xf32>>>, !fir.box<!fir.array<?xf32>>) map_clauses(implicit, to) capture(ByRef) members(%[[VAL_8]] : [0] : !fir.llvm_ptr<!fir.ref<!fir.array<?xf32>>>) -> !fir.ref<!fir.array<?xf32>> {name = "dt"}
+! CHECK:             omp.target host_eval({{.*}}) map_entries({{.*}}, %[[VAL_9]] ->{{.*}}, %[[VAL_8]] -> {{.*}} : {{.*}}) {

mjklemm · 2025-07-24T13:06:22Z

offload/test/offloading/fortran/optional-mapped-arguments-3.f90

Shouldn't this test rather go into the llvm test repository? That appears to me to be a runtime functionality test.

Everything in this directory is an end to end runtime tests (executed via check-offload if you have the right pre-requisites), they're just OpenMP Fortran specific :-) There's CUDA and C++ OpenMP in here as well a directory or so above (there also used to be host specific runtime tests, but since this is the offload project now they're likely still in the OpenMP subdirectory). It could also be added to the llvm test repository, but there's already plenty of precedent for adding these tests in this (and parent/sibling) directories! I'm also not so sure the llvm test repository is setup/configured to run offload tests!

Ah, OK. I was under the impression that runtime tests should not be part of the llvm-project tests. I'll take it back in that case. :-)

It's a really unusual case I think and perhaps an unfortunate one as I think a few of the wider range of OpenMP/offload tests have a tendency to hang (depending on device/environment/feeling of the GPU/CPU on that particular day) makes it hard to get the tests ran in the buildbots that ping people as they need to consistently be green enough :-)

mjklemm · 2025-07-24T13:31:18Z

flang/test/Lower/OpenMP/optional-argument-map-3.f90

Please use some more consistent formatting for the test code.

Will do, I now have the power of fprettify at my finger tips... :-)

mjklemm · 2025-07-24T13:32:14Z

flang/test/Lower/OpenMP/optional-argument-map-3.f90

Suggested change

module mod

contains

subroutine foo( dt, switch )

implicit none

real(4), dimension(:), INTENT(INOUT) :: dt

logical, INTENT(IN) :: switch

integer :: dim, idx

if ( switch ) then

!$omp target teams distribute parallel do

do idx = 1, 100

dt(idx) = 20

end do

else

!$omp target teams distribute parallel do

do idx = 1, 100

dt(idx) = 30

end do

end if

end subroutine foo

end module

module mod

contains

subroutine foo( dt, switch )

implicit none

real(4), dimension(:), INTENT(INOUT) :: dt

logical, INTENT(IN) :: switch

integer :: dim, idx

if ( switch ) then

!$omp target teams distribute parallel do

do idx = 1, 100

dt(idx) = 20

end do

else

!$omp target teams distribute parallel do

do idx = 1, 100

dt(idx) = 30

end do

end if

end subroutine foo

end module

agozillon · 2025-07-24T22:16:24Z

rebased and formatted the tests (hopefully a bit better) with the last commit.

mjklemm

LGTM. Thanks!

agozillon · 2025-07-25T08:44:30Z

Thank you very much Michael for the review!

I'll look to land this at the end of the EU workday today (gives me a bit of time to babysit the buildbot for breakages), after some further downstream testing comes back just to be 100% sure, so please if anyone has any further review comments don't hesitate to add them :-)

… MapInfoFinalization Currently, we return early whenever we've already generated an allocation for intermediate descriptor variables (required in certain cases when we can't directly access the base address of a passes in descriptor function argument due to HLFIR/FIR restrictions). This unfortunately, skips over the presence check and load/store required to set the intermediate descriptor allocations values/data. This is fine in most cases, but if a function happens to have a series of branches with seperate target regions capturing the same input argument, we'd emit the present/load/store into the first branch with the first target inside of it, the secondary (or any preceding) branches would not have the present/load/store, this would lead to the subsequent mapped values in that branch being empty and then leading to a memory access violation on device. The fix for the moment is to emit a present/load/store at the relevant location of every target utilising the input argument, this likely will also lead to fixing possible issues with the input argument being manipulated inbetween target regions (primarily resizing, the data should remain the same as we're just copying an address around, in theory at least). There's possible optimizations/simplifications to emit less load/stores such as by raising the load/store out of the branches when we can, but I'm inclined to leave this sort of optimization to lower level passes such as an LLVM pass (which very possibly already covers it).

… MapInfoFinalization (llvm#150311) Currently, we return early whenever we've already generated an allocation for intermediate descriptor variables (required in certain cases when we can't directly access the base address of a passes in descriptor function argument due to HLFIR/FIR restrictions). This unfortunately, skips over the presence check and load/store required to set the intermediate descriptor allocations values/data. This is fine in most cases, but if a function happens to have a series of branches with seperate target regions capturing the same input argument, we'd emit the present/load/store into the first branch with the first target inside of it, the secondary (or any preceding) branches would not have the present/load/store, this would lead to the subsequent mapped values in that branch being empty and then leading to a memory access violation on device. The fix for the moment is to emit a present/load/store at the relevant location of every target utilising the input argument, this likely will also lead to fixing possible issues with the input argument being manipulated inbetween target regions (primarily resizing, the data should remain the same as we're just copying an address around, in theory at least). There's possible optimizations/simplifications to emit less load/stores such as by raising the load/store out of the branches when we can, but I'm inclined to leave this sort of optimization to lower level passes such as an LLVM pass (which very possibly already covers it).

agozillon requested review from DominikAdamski, dpalermo, mjklemm and skatrak July 23, 2025 20:24

llvmbot added flang Flang issues not falling into any other category flang:fir-hlfir flang:openmp labels Jul 23, 2025

agozillon force-pushed the missing-load-store-in-branch-crash branch from 8296423 to a2e7f1f Compare July 23, 2025 20:25

llvmbot added the offload label Jul 23, 2025

mjklemm reviewed Jul 24, 2025

View reviewed changes

agozillon force-pushed the missing-load-store-in-branch-crash branch 2 times, most recently from 2c9c0d9 to e35f2c1 Compare July 24, 2025 22:15

mjklemm approved these changes Jul 25, 2025

View reviewed changes

agozillon added 2 commits July 25, 2025 03:57

Tidy up tests

e3538b8

agozillon force-pushed the missing-load-store-in-branch-crash branch from e35f2c1 to e3538b8 Compare July 25, 2025 09:15

agozillon merged commit 73272d6 into llvm:main Jul 25, 2025
9 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[Flang][OpenMP] Appropriately emit present/load/store in all cases in MapInfoFinalization #150311

[Flang][OpenMP] Appropriately emit present/load/store in all cases in MapInfoFinalization #150311

Uh oh!

agozillon commented Jul 23, 2025

Uh oh!

llvmbot commented Jul 23, 2025 •

edited

Loading

Uh oh!

mjklemm Jul 24, 2025

Uh oh!

agozillon Jul 24, 2025

Uh oh!

mjklemm Jul 25, 2025

Uh oh!

agozillon Jul 25, 2025

Uh oh!

mjklemm Jul 24, 2025

Uh oh!

agozillon Jul 24, 2025

Uh oh!

mjklemm Jul 24, 2025

Uh oh!

agozillon commented Jul 24, 2025

Uh oh!

mjklemm left a comment

Uh oh!

agozillon commented Jul 25, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

[Flang][OpenMP] Appropriately emit present/load/store in all cases in MapInfoFinalization #150311

[Flang][OpenMP] Appropriately emit present/load/store in all cases in MapInfoFinalization #150311

Uh oh!

Conversation

agozillon commented Jul 23, 2025

Uh oh!

llvmbot commented Jul 23, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

mjklemm Jul 24, 2025

Choose a reason for hiding this comment

Uh oh!

agozillon Jul 24, 2025

Choose a reason for hiding this comment

Uh oh!

mjklemm Jul 25, 2025

Choose a reason for hiding this comment

Uh oh!

agozillon Jul 25, 2025

Choose a reason for hiding this comment

Uh oh!

mjklemm Jul 24, 2025

Choose a reason for hiding this comment

Uh oh!

agozillon Jul 24, 2025

Choose a reason for hiding this comment

Uh oh!

mjklemm Jul 24, 2025

Choose a reason for hiding this comment

Uh oh!

agozillon commented Jul 24, 2025

Uh oh!

mjklemm left a comment

Choose a reason for hiding this comment

Uh oh!

agozillon commented Jul 25, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

llvmbot commented Jul 23, 2025 •

edited

Loading