@bhandarkar-pranav bhandarkar-pranav commented Aug 26, 2025

This PR adds support for translating the private clause on deferred target tasks, that is, omp.target operations with the nowait clause.

An offloading call for a deferred target task is non-blocking: the offloading host task continues its execution after issuing the call. The key problem, therefore, is to ensure that the data needed to initialize private variables in the target task persists even after the host task has completed.
We do this in a new pass called PrepareForOMPOffloadPrivatizationPass. For a privatized variable that needs its host counterpart for initialization (such as the shape of the data from the descriptor when an allocatable is privatized, or the value of the data when an allocatable is firstprivatized),

  • the pass allocates memory on the heap.
  • it then initializes this memory by using the init and copy (for firstprivate) regions of the corresponding omp::PrivateClauseOp.
  • finally, the memory allocated on the heap is freed using the dealloc region of the same omp::PrivateClauseOp instance. This step is not straightforward, though, because we cannot simply free memory that is going to be used by another thread without any synchronization. So, for deallocation, we create an omp.task after the omp.target and synchronize the two with a dummy dependency (using the depend clause). The deallocation happens in this newly created omp.task.
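
To make the lifetime issue concrete, here is a minimal standalone C++ sketch of the three steps above. It is an analogy, not the pass's code: HostData, TaskQueue, hostTask and drain are hypothetical names invented for illustration, and a FIFO queue stands in for both the deferred execution of omp.target nowait and the ordering that the depend clause enforces between the target and the deallocation task.

```cpp
#include <cassert>
#include <functional>
#include <queue>
#include <vector>

// Hypothetical stand-in for the host-side data that a private copy is
// initialized from (for example, an allocatable's descriptor and values).
struct HostData {
  std::vector<int> values;
};

// Toy "runtime": deferred tasks are queued now and executed later in FIFO
// order. The FIFO order is an assumption that models the dummy dependency.
using TaskQueue = std::queue<std::function<void()>>;

// Models the host task that issues a deferred target task and then returns.
void hostTask(TaskQueue &deferred, int &result, bool &freed) {
  HostData local{{1, 2, 3, 4}}; // lives on the host task's stack

  // Steps 1 and 2: allocate on the heap and initialize from the host
  // variable (the role played by the init/copy regions).
  HostData *heapCopy = new HostData(local);

  // The deferred "target task" reads the heap copy, never `local`.
  deferred.push([heapCopy, &result] {
    int sum = 0;
    for (int v : heapCopy->values)
      sum += v;
    result = sum;
  });

  // Step 3: a second task, ordered after the target task, frees the copy.
  deferred.push([heapCopy, &freed] {
    delete heapCopy;
    freed = true;
  });
} // `local` is destroyed here, before any deferred task has run.

// Runs all queued tasks in submission order.
void drain(TaskQueue &deferred) {
  while (!deferred.empty()) {
    deferred.front()();
    deferred.pop();
  }
}
```

Calling hostTask and then drain runs the target task after the host's stack variable is gone, which is only safe because the task captured the heap copy. Had the task captured a pointer to local instead, it would read a dangling pointer.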

llvmbot commented Aug 26, 2025

@llvm/pr-subscribers-mlir-llvm
@llvm/pr-subscribers-flang-fir-hlfir

@llvm/pr-subscribers-flang-driver

Author: Pranav Bhandarkar (bhandarkar-pranav)

Changes

This PR adds support for translating the private clause on deferred target tasks, that is, omp.target operations with the nowait clause.

An offloading call for a deferred target task is non-blocking: the offloading host task continues its execution after issuing the call. The key problem, therefore, is to ensure that the data needed to initialize private variables in the target task persists even after the host task has completed.
We do this in a new pass called PrepareForOMPOffloadPrivatizationPass. For a privatized variable that needs its host counterpart for initialization (such as the shape of the data from the descriptor when an allocatable is privatized, or the value of the data when an allocatable is firstprivatized),

  • the pass allocates memory on the heap.
  • it then initializes this memory by copying the contents of the host variable to the newly allocated location on the heap.
  • then, the pass updates all the omp.map.info operations that pointed to the host variable so that they point to the copy on the heap.

The pass uses a rewrite pattern applied with the greedy pattern rewrite driver, which in turn performs some constant folding and DCE. As a result, a number of lit tests had to be updated. In GEPs, constants get folded into the indices and truncated to i32. In some tests, sequences of insertvalue and extractvalue instructions cancel out, so the corresponding CHECK lines needed to be updated too.
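
As a rough illustration of why unrelated CHECK lines change, the following standalone C++ toy models two of the simplifications visible in the test updates: constant operands of a commutative operation moving to the right-hand side (mul i64 1, %x becoming mul i64 %x, 1), and an extractvalue of a just-inserted element folding to the inserted value. This is not MLIR code; Operand, InsertValue, canonicalizeCommutative and foldExtractOfInsert are hypothetical names, and the sketch only assumes that the driver's folding behaves this way for the cases shown in the diff.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical toy operand: an SSA name or a literal, plus a constant flag.
struct Operand {
  std::string name;
  bool isConstant;
};

// For a commutative operation, move constant operands behind the
// non-constant ones while keeping the relative order within each group.
void canonicalizeCommutative(std::vector<Operand> &operands) {
  std::stable_partition(operands.begin(), operands.end(),
                        [](const Operand &o) { return !o.isConstant; });
}

// Hypothetical toy model of `insertvalue aggregate, value, index`.
struct InsertValue {
  std::string aggregate;
  std::string value;
  int index;
};

// extractvalue(insertvalue(agg, v, i), i) folds to v. Returns a pointer to
// the inserted value's name when the indices match, or nullptr when this
// simple fold does not apply.
const std::string *foldExtractOfInsert(const InsertValue &ins,
                                       int extractIndex) {
  return ins.index == extractIndex ? &ins.value : nullptr;
}
```

Once such folds fire, intermediate insertvalue results can become unused and be removed by DCE, which matches the CHECK lines that disappear from tests such as boxproc.fir.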


Patch is 79.33 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/155348.diff

30 Files Affected:

  • (modified) flang/include/flang/Optimizer/Passes/Pipelines.h (+1)
  • (modified) flang/lib/Optimizer/Passes/Pipelines.cpp (+6)
  • (modified) flang/test/Driver/tco-emit-final-mlir.fir (+2-2)
  • (modified) flang/test/Driver/tco-test-gen.fir (+2-3)
  • (modified) flang/test/Fir/alloc-32.fir (+1-1)
  • (modified) flang/test/Fir/alloc.fir (+9-8)
  • (modified) flang/test/Fir/arrexp.fir (+2-2)
  • (modified) flang/test/Fir/basic-program.fir (+2)
  • (modified) flang/test/Fir/box.fir (+3-3)
  • (modified) flang/test/Fir/boxproc.fir (+4-12)
  • (modified) flang/test/Fir/embox.fir (+3-3)
  • (modified) flang/test/Fir/omp-reduction-embox-codegen.fir (+3-3)
  • (modified) flang/test/Fir/optional.fir (+1-2)
  • (modified) flang/test/Fir/pdt.fir (+3-3)
  • (modified) flang/test/Fir/rebox.fir (+9-9)
  • (modified) flang/test/Fir/select.fir (+1-1)
  • (modified) flang/test/Fir/target.fir (-4)
  • (modified) flang/test/Fir/tbaa-codegen2.fir (+3-9)
  • (modified) flang/test/Integration/OpenMP/map-types-and-sizes.f90 (+7-7)
  • (modified) flang/test/Lower/allocatable-polymorphic.f90 (-4)
  • (modified) flang/test/Lower/forall/character-1.f90 (+2-2)
  • (added) mlir/include/mlir/Dialect/LLVMIR/Transforms/OpenMPOffloadPrivatizationPrepare.h (+23)
  • (modified) mlir/include/mlir/Dialect/LLVMIR/Transforms/Passes.td (+12)
  • (modified) mlir/include/mlir/Dialect/OpenMP/OpenMPOps.td (+2-2)
  • (modified) mlir/lib/Dialect/LLVMIR/Transforms/CMakeLists.txt (+2)
  • (added) mlir/lib/Dialect/LLVMIR/Transforms/OpenMPOffloadPrivatizationPrepare.cpp (+425)
  • (modified) mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp (+2-9)
  • (modified) mlir/lib/Tools/mlir-opt/MlirOptMain.cpp (+1)
  • (added) mlir/test/Dialect/LLVMIR/omp-offload-privatization-prepare.mlir (+167)
  • (modified) mlir/test/Target/LLVMIR/openmp-todo.mlir (-18)
diff --git a/flang/include/flang/Optimizer/Passes/Pipelines.h b/flang/include/flang/Optimizer/Passes/Pipelines.h
index a3f59ee8dd013..17d48f46e4b9b 100644
--- a/flang/include/flang/Optimizer/Passes/Pipelines.h
+++ b/flang/include/flang/Optimizer/Passes/Pipelines.h
@@ -22,6 +22,7 @@
 #include "mlir/Conversion/SCFToControlFlow/SCFToControlFlow.h"
 #include "mlir/Dialect/GPU/IR/GPUDialect.h"
 #include "mlir/Dialect/LLVMIR/LLVMAttrs.h"
+#include "mlir/Dialect/LLVMIR/Transforms/OpenMPOffloadPrivatizationPrepare.h"
 #include "mlir/Pass/PassManager.h"
 #include "mlir/Transforms/GreedyPatternRewriteDriver.h"
 #include "mlir/Transforms/Passes.h"
diff --git a/flang/lib/Optimizer/Passes/Pipelines.cpp b/flang/lib/Optimizer/Passes/Pipelines.cpp
index ca8e820608688..6a11461cd8380 100644
--- a/flang/lib/Optimizer/Passes/Pipelines.cpp
+++ b/flang/lib/Optimizer/Passes/Pipelines.cpp
@@ -403,6 +403,12 @@ void createMLIRToLLVMPassPipeline(mlir::PassManager &pm,
 
   // Add codegen pass pipeline.
   fir::createDefaultFIRCodeGenPassPipeline(pm, config, inputFilename);
+
+  // Run a pass to prepare for translation of delayed privatization in the
+  // context of deferred target tasks.
+  addNestedPassConditionally<mlir::LLVM::LLVMFuncOp>(pm, disableFirToLlvmIr,[&]() {
+    return mlir::LLVM::createPrepareForOMPOffloadPrivatizationPass();
+  });
 }
 
 } // namespace fir
diff --git a/flang/test/Driver/tco-emit-final-mlir.fir b/flang/test/Driver/tco-emit-final-mlir.fir
index 75f8f153127af..177810cf41378 100644
--- a/flang/test/Driver/tco-emit-final-mlir.fir
+++ b/flang/test/Driver/tco-emit-final-mlir.fir
@@ -13,7 +13,7 @@
 // CHECK: llvm.return
 // CHECK-NOT: func.func
 
-func.func @_QPfoo() {
+func.func @_QPfoo() -> !fir.ref<i32> {
   %1 = fir.alloca i32
-  return
+  return %1 : !fir.ref<i32>
 }
diff --git a/flang/test/Driver/tco-test-gen.fir b/flang/test/Driver/tco-test-gen.fir
index 38d4e50ecf3aa..15483f7ee3534 100644
--- a/flang/test/Driver/tco-test-gen.fir
+++ b/flang/test/Driver/tco-test-gen.fir
@@ -42,11 +42,10 @@ func.func @_QPtest(%arg0: !fir.ref<i32> {fir.bindc_name = "num"}, %arg1: !fir.re
 // CHECK-SAME:      %[[ARG2:.*]]: !llvm.ptr {fir.bindc_name = "ub", llvm.nocapture},
 // CHECK-SAME:      %[[ARG3:.*]]: !llvm.ptr {fir.bindc_name = "step", llvm.nocapture}) {
 
+// CMPLX:           %[[VAL_3:.*]] = llvm.mlir.constant(0 : index) : i64
+// CMPLX:           %[[VAL_2:.*]] = llvm.mlir.constant(1 : index) : i64
 // CMPLX:           %[[VAL_0:.*]] = llvm.mlir.constant(1 : i64) : i64
 // CMPLX:           %[[VAL_1:.*]] = llvm.alloca %[[VAL_0]] x i32 {bindc_name = "i"} : (i64) -> !llvm.ptr
-// CMPLX:           %[[VAL_2:.*]] = llvm.mlir.constant(1 : index) : i64
-// CMPLX:           %[[VAL_3:.*]] = llvm.mlir.constant(0 : index) : i64
-// CMPLX:           %[[VAL_4:.*]] = llvm.mlir.constant(1 : i64) : i64
 
 // SIMPLE:          %[[VAL_3:.*]] = llvm.mlir.constant(0 : index) : i64
 // SIMPLE:          %[[VAL_2:.*]] = llvm.mlir.constant(1 : index) : i64
diff --git a/flang/test/Fir/alloc-32.fir b/flang/test/Fir/alloc-32.fir
index a3cbf200c24fc..f57f6ce6fcf5e 100644
--- a/flang/test/Fir/alloc-32.fir
+++ b/flang/test/Fir/alloc-32.fir
@@ -19,7 +19,7 @@ func.func @allocmem_scalar_nonchar() -> !fir.heap<i32> {
 // CHECK-LABEL: define ptr @allocmem_scalar_dynchar(
 // CHECK-SAME: i32 %[[len:.*]])
 // CHECK: %[[mul1:.*]] = sext i32 %[[len]] to i64
-// CHECK: %[[mul2:.*]] = mul i64 1, %[[mul1]]
+// CHECK: %[[mul2:.*]] = mul i64 %[[mul1]], 1
 // CHECK: %[[cmp:.*]] = icmp sgt i64 %[[mul2]], 0
 // CHECK: %[[sz:.*]] = select i1 %[[cmp]], i64 %[[mul2]], i64 1
 // CHECK: %[[trunc:.*]] = trunc i64 %[[sz]] to i32
diff --git a/flang/test/Fir/alloc.fir b/flang/test/Fir/alloc.fir
index 8da8b828c18b9..0d3ce323d0d7c 100644
--- a/flang/test/Fir/alloc.fir
+++ b/flang/test/Fir/alloc.fir
@@ -86,7 +86,7 @@ func.func @alloca_scalar_dynchar_kind(%l : i32) -> !fir.ref<!fir.char<2,?>> {
 // CHECK-LABEL: define ptr @allocmem_scalar_dynchar(
 // CHECK-SAME: i32 %[[len:.*]])
 // CHECK: %[[mul1:.*]] = sext i32 %[[len]] to i64
-// CHECK: %[[mul2:.*]] = mul i64 1, %[[mul1]]
+// CHECK: %[[mul2:.*]] = mul i64 %[[mul1]], 1
 // CHECK: %[[cmp:.*]] = icmp sgt i64 %[[mul2]], 0
 // CHECK: %[[size:.*]] = select i1 %[[cmp]], i64 %[[mul2]], i64 1
 // CHECK: call ptr @malloc(i64 %[[size]])
@@ -98,7 +98,7 @@ func.func @allocmem_scalar_dynchar(%l : i32) -> !fir.heap<!fir.char<1,?>> {
 // CHECK-LABEL: define ptr @allocmem_scalar_dynchar_kind(
 // CHECK-SAME: i32 %[[len:.*]])
 // CHECK: %[[mul1:.*]] = sext i32 %[[len]] to i64
-// CHECK: %[[mul2:.*]] = mul i64 2, %[[mul1]]
+// CHECK: %[[mul2:.*]] = mul i64 %[[mul1]], 2
 // CHECK: %[[cmp:.*]] = icmp sgt i64 %[[mul2]], 0
 // CHECK: %[[size:.*]] = select i1 %[[cmp]], i64 %[[mul2]], i64 1
 // CHECK: call ptr @malloc(i64 %[[size]])
@@ -185,7 +185,7 @@ func.func @alloca_dynarray_of_nonchar2(%e: index) -> !fir.ref<!fir.array<?x?xi32
 
 // CHECK-LABEL: define ptr @allocmem_dynarray_of_nonchar(
 // CHECK-SAME: i64 %[[extent:.*]])
-// CHECK: %[[prod1:.*]] = mul i64 12, %[[extent]]
+// CHECK: %[[prod1:.*]] = mul i64 %[[extent]], 12
 // CHECK: %[[cmp:.*]] = icmp sgt i64 %[[prod1]], 0
 // CHECK: %[[size:.*]] = select i1 %[[cmp]], i64 %[[prod1]], i64 1
 // CHECK: call ptr @malloc(i64 %[[size]])
@@ -196,7 +196,7 @@ func.func @allocmem_dynarray_of_nonchar(%e: index) -> !fir.heap<!fir.array<3x?xi
 
 // CHECK-LABEL: define ptr @allocmem_dynarray_of_nonchar2(
 // CHECK-SAME: i64 %[[extent:.*]])
-// CHECK: %[[prod1:.*]] = mul i64 4, %[[extent]]
+// CHECK: %[[prod1:.*]] = mul i64 %[[extent]], 4
 // CHECK: %[[prod2:.*]] = mul i64 %[[prod1]], %[[extent]]
 // CHECK: %[[cmp:.*]] = icmp sgt i64 %[[prod2]], 0
 // CHECK: %[[size:.*]] = select i1 %[[cmp]], i64 %[[prod2]], i64 1
@@ -227,7 +227,7 @@ func.func @alloca_dynarray_of_char2(%e : index) -> !fir.ref<!fir.array<?x?x!fir.
 
 // CHECK-LABEL: define ptr @allocmem_dynarray_of_char(
 // CHECK-SAME: i64 %[[extent:.*]])
-// CHECK: %[[prod1:.*]] = mul i64 60, %[[extent]]
+// CHECK: %[[prod1:.*]] = mul i64 %[[extent]], 60
 // CHECK: %[[cmp:.*]] = icmp sgt i64 %[[prod1]], 0
 // CHECK: %[[size:.*]] = select i1 %[[cmp]], i64 %[[prod1]], i64 1
 // CHECK: call ptr @malloc(i64 %[[size]])
@@ -238,7 +238,7 @@ func.func @allocmem_dynarray_of_char(%e : index) -> !fir.heap<!fir.array<3x?x!fi
 
 // CHECK-LABEL: define ptr @allocmem_dynarray_of_char2(
 // CHECK-SAME: i64 %[[extent:.*]])
-// CHECK: %[[prod1:.*]] = mul i64 20, %[[extent]]
+// CHECK: %[[prod1:.*]] = mul i64 %[[extent]], 20
 // CHECK: %[[prod2:.*]] = mul i64 %[[prod1]], %[[extent]]
 // CHECK: %[[cmp:.*]] = icmp sgt i64 %[[prod2]], 0
 // CHECK: %[[size:.*]] = select i1 %[[cmp]], i64 %[[mul2]], i64 1
@@ -286,7 +286,7 @@ func.func @allocmem_dynarray_of_dynchar(%l: i32, %e : index) -> !fir.heap<!fir.a
 // CHECK-LABEL: define ptr @allocmem_dynarray_of_dynchar2(
 // CHECK-SAME: i32 %[[len:.*]], i64 %[[extent:.*]])
 // CHECK: %[[a:.*]] = sext i32 %[[len]] to i64
-// CHECK: %[[prod1:.*]] = mul i64 2, %[[a]]
+// CHECK: %[[prod1:.*]] = mul i64 %[[a]], 2
 // CHECK: %[[prod2:.*]] = mul i64 %[[prod1]], %[[extent]]
 // CHECK: %[[prod3:.*]] = mul i64 %[[prod2]], %[[extent]]
 // CHECK: %[[cmp:.*]] = icmp sgt i64 %[[prod3]], 0
@@ -366,12 +366,13 @@ func.func @allocmem_array_with_holes_dynchar(%arg0: index, %arg1: index) -> !fir
 // CHECK:    %[[VAL_0:.*]] = alloca { ptr, i64, i32, i8, i8, i8, i8, ptr, [1 x i64] }, i64 1
 // CHECK:    %[[VAL_3:.*]] = alloca { ptr, i64, i32, i8, i8, i8, i8, [1 x [3 x i64]], ptr, [1 x i64] }, i64 1
 // CHECK:    %[[VAL_2:.*]] = alloca { ptr, i64, i32, i8, i8, i8, i8, ptr, [1 x i64] }, i64 1
-
+func.func private @foo(%0: !fir.ref<!fir.class<none>>, %1: !fir.ref<!fir.class<!fir.array<?xnone>>>, %2: !fir.ref<!fir.box<none>>, %3: !fir.ref<!fir.box<!fir.array<?xnone>>>)
 func.func @alloca_unlimited_polymorphic_box() {
   %0 = fir.alloca !fir.class<none>
   %1 = fir.alloca !fir.class<!fir.array<?xnone>>
   %2 = fir.alloca !fir.box<none>
   %3 = fir.alloca !fir.box<!fir.array<?xnone>>
+  fir.call @foo(%0, %1, %2, %3) : (!fir.ref<!fir.class<none>>, !fir.ref<!fir.class<!fir.array<?xnone>>>, !fir.ref<!fir.box<none>>, !fir.ref<!fir.box<!fir.array<?xnone>>>) -> ()
   return
 }
 // Note: allocmem of fir.box are not possible (fir::HeapType::verify does not
diff --git a/flang/test/Fir/arrexp.fir b/flang/test/Fir/arrexp.fir
index e8ec8ac79e0c2..2eb717228d998 100644
--- a/flang/test/Fir/arrexp.fir
+++ b/flang/test/Fir/arrexp.fir
@@ -143,9 +143,9 @@ func.func @f6(%arg0: !fir.box<!fir.array<?xf32>>, %arg1: f32) {
   %c9 = arith.constant 9 : index
   %c10 = arith.constant 10 : index
 
-  // CHECK: %[[EXT_GEP:.*]] = getelementptr {{.*}} %[[A]], i32 0, i32 7, i64 0, i32 1
+  // CHECK: %[[EXT_GEP:.*]] = getelementptr {{.*}} %[[A]], i32 0, i32 7, i32 0, i32 1
   // CHECK: %[[EXTENT:.*]] = load i64, ptr %[[EXT_GEP]]
-  // CHECK: %[[SIZE:.*]] = mul i64 4, %[[EXTENT]]
+  // CHECK: %[[SIZE:.*]] = mul i64 %[[EXTENT]], 4
   // CHECK: %[[CMP:.*]] = icmp sgt i64 %[[SIZE]], 0
   // CHECK: %[[SZ:.*]] = select i1 %[[CMP]], i64 %[[SIZE]], i64 1
   // CHECK: %[[MALLOC:.*]] = call ptr @malloc(i64 %[[SZ]])
diff --git a/flang/test/Fir/basic-program.fir b/flang/test/Fir/basic-program.fir
index c9fe53bf093a1..6bad03dded24d 100644
--- a/flang/test/Fir/basic-program.fir
+++ b/flang/test/Fir/basic-program.fir
@@ -158,4 +158,6 @@ func.func @_QQmain() {
 // PASSES-NEXT:  LowerNontemporalPass
 // PASSES-NEXT: FIRToLLVMLowering
 // PASSES-NEXT: ReconcileUnrealizedCasts
+// PASSES-NEXT: 'llvm.func' Pipeline
+// PASSES-NEXT: PrepareForOMPOffloadPrivatizationPass
 // PASSES-NEXT: LLVMIRLoweringPass
diff --git a/flang/test/Fir/box.fir b/flang/test/Fir/box.fir
index c0cf3d8375983..760fbd4792122 100644
--- a/flang/test/Fir/box.fir
+++ b/flang/test/Fir/box.fir
@@ -57,7 +57,7 @@ func.func @fa(%a : !fir.ref<!fir.array<100xf32>>) {
 // CHECK-SAME: ptr {{[^%]*}}%[[res:.*]], ptr {{[^%]*}}%[[arg0:.*]], i64 %[[arg1:.*]])
 func.func @b1(%arg0 : !fir.ref<!fir.char<1,?>>, %arg1 : index) -> !fir.box<!fir.char<1,?>> {
   // CHECK: %[[alloca:.*]] = alloca { ptr, i64, i32, i8, i8, i8, i8 }
-  // CHECK: %[[size:.*]] = mul i64 1, %[[arg1]]
+  // CHECK: %[[size:.*]] = mul i64 %[[arg1]], 1
   // CHECK: insertvalue {{.*}} undef, i64 %[[size]], 1
   // CHECK: insertvalue {{.*}} i32 20240719, 2
   // CHECK: insertvalue {{.*}} ptr %[[arg0]], 0
@@ -89,7 +89,7 @@ func.func @b2(%arg0 : !fir.ref<!fir.array<?x!fir.char<1,5>>>, %arg1 : index) ->
 func.func @b3(%arg0 : !fir.ref<!fir.array<?x!fir.char<1,?>>>, %arg1 : index, %arg2 : index) -> !fir.box<!fir.array<?x!fir.char<1,?>>> {
   %1 = fir.shape %arg2 : (index) -> !fir.shape<1>
   // CHECK: %[[alloca:.*]] = alloca { ptr, i64, i32, i8, i8, i8, i8, [1 x [3 x i64]] }
-  // CHECK: %[[size:.*]] = mul i64 1, %[[arg1]]
+  // CHECK: %[[size:.*]] = mul i64 %[[arg1]], 1
   // CHECK: insertvalue {{.*}} i64 %[[size]], 1
   // CHECK: insertvalue {{.*}} i32 20240719, 2
   // CHECK: insertvalue {{.*}} i64 %[[arg2]], 7, 0, 1
@@ -108,7 +108,7 @@ func.func @b4(%arg0 : !fir.ref<!fir.array<7x!fir.char<1,?>>>, %arg1 : index) ->
   %c_7 = arith.constant 7 : index
   %1 = fir.shape %c_7 : (index) -> !fir.shape<1>
   // CHECK: %[[alloca:.*]] = alloca { ptr, i64, i32, i8, i8, i8, i8, [1 x [3 x i64]] }
-  // CHECK:   %[[size:.*]] = mul i64 1, %[[arg1]]
+  // CHECK:   %[[size:.*]] = mul i64 %[[arg1]], 1
   // CHECK: insertvalue {{.*}} i64 %[[size]], 1
   // CHECK: insertvalue {{.*}} i32 20240719, 2
   // CHECK: insertvalue {{.*}} i64 7, 7, 0, 1
diff --git a/flang/test/Fir/boxproc.fir b/flang/test/Fir/boxproc.fir
index 97d9b38ed6f40..d4c36a4f5b213 100644
--- a/flang/test/Fir/boxproc.fir
+++ b/flang/test/Fir/boxproc.fir
@@ -82,12 +82,8 @@ func.func @_QPtest_proc_dummy_other(%arg0: !fir.boxproc<() -> ()>) {
 // CHECK:         store [1 x i8] c" ", ptr %[[VAL_18]], align 1
 // CHECK:         call void @llvm.init.trampoline(ptr %[[VAL_20]], ptr @_QFtest_proc_dummy_charPgen_message, ptr %[[VAL_2]])
 // CHECK:         %[[VAL_23:.*]] = call ptr @llvm.adjust.trampoline(ptr %[[VAL_20]])
-// CHECK:         %[[VAL_25:.*]] = insertvalue { ptr, i64 } undef, ptr %[[VAL_23]], 0
-// CHECK:         %[[VAL_26:.*]] = insertvalue { ptr, i64 } %[[VAL_25]], i64 10, 1
 // CHECK:         %[[VAL_27:.*]] = call ptr @llvm.stacksave.p0()
-// CHECK:         %[[VAL_28:.*]] = extractvalue { ptr, i64 } %[[VAL_26]], 0
-// CHECK:         %[[VAL_29:.*]] = extractvalue { ptr, i64 } %[[VAL_26]], 1
-// CHECK:         %[[VAL_30:.*]] = call { ptr, i64 } @_QPget_message(ptr %[[VAL_0]], i64 40, ptr %[[VAL_28]], i64 %[[VAL_29]])
+// CHECK:         %[[VAL_30:.*]] = call { ptr, i64 } @_QPget_message(ptr %[[VAL_0]], i64 40, ptr %[[VAL_23]], i64 10)
 // CHECK:         %[[VAL_32:.*]] = call i1 @_FortranAioOutputAscii(ptr %{{.*}}, ptr %[[VAL_0]], i64 40)
 // CHECK:         call void @llvm.stackrestore.p0(ptr %[[VAL_27]])
 
@@ -115,14 +111,10 @@ func.func @_QPtest_proc_dummy_other(%arg0: !fir.boxproc<() -> ()>) {
 // CHECK-LABEL: define { ptr, i64 } @_QPget_message(ptr
 // CHECK-SAME:                  %[[VAL_0:.*]], i64 %[[VAL_1:.*]], ptr %[[VAL_2:.*]], i64
 // CHECK-SAME:                                                 %[[VAL_3:.*]])
-// CHECK:         %[[VAL_4:.*]] = insertvalue { ptr, i64 } undef, ptr %[[VAL_2]], 0
-// CHECK:         %[[VAL_5:.*]] = insertvalue { ptr, i64 } %[[VAL_4]], i64 %[[VAL_3]], 1
-// CHECK:         %[[VAL_7:.*]] = extractvalue { ptr, i64 } %[[VAL_5]], 0
-// CHECK:         %[[VAL_8:.*]] = extractvalue { ptr, i64 } %[[VAL_5]], 1
 // CHECK:         %[[VAL_9:.*]] = call ptr @llvm.stacksave.p0()
-// CHECK:         %[[VAL_10:.*]] = alloca i8, i64 %[[VAL_8]], align 1
-// CHECK:         %[[VAL_12:.*]] = call { ptr, i64 } %[[VAL_7]](ptr %[[VAL_10]], i64 %[[VAL_8]])
-// CHECK:         %[[VAL_13:.*]] = add i64 %[[VAL_8]], 12
+// CHECK:         %[[VAL_10:.*]] = alloca i8, i64 %[[VAL_3]], align 1
+// CHECK:         %[[VAL_12:.*]] = call { ptr, i64 } %[[VAL_2]](ptr %[[VAL_10]], i64 %[[VAL_3]])
+// CHECK:         %[[VAL_13:.*]] = add i64 %[[VAL_3]], 12
 // CHECK:         %[[VAL_14:.*]] = alloca i8, i64 %[[VAL_13]], align 1
 // CHECK:         call void @llvm.memmove.p0.p0.i64(ptr %[[VAL_14]], ptr {{.*}}, i64 12, i1 false)
 // CHECK:         %[[VAL_18:.*]] = phi i64
diff --git a/flang/test/Fir/embox.fir b/flang/test/Fir/embox.fir
index 0f304cff2c79e..11f7457b6873c 100644
--- a/flang/test/Fir/embox.fir
+++ b/flang/test/Fir/embox.fir
@@ -11,7 +11,7 @@ func.func @_QPtest_callee(%arg0: !fir.box<!fir.array<?xi32>>) {
 func.func @_QPtest_slice() {
 // CHECK:  %[[a1:.*]] = alloca { ptr, i64, i32, i8, i8, i8, i8, [1 x [3 x i64]] }, align 8
 // CHECK:  %[[a2:.*]] = alloca [20 x i32], i64 1, align 4
-// CHECK:  %[[a3:.*]] = getelementptr [20 x i32], ptr %[[a2]], i64 0, i64 0
+// CHECK:  %[[a3:.*]] = getelementptr [20 x i32], ptr %[[a2]], i32 0, i64 0
 // CHECK:  %[[a4:.*]] = insertvalue { ptr, i64, i32, i8, i8, i8, i8, [1 x [3 x i64]] }
 // CHECK:  { ptr undef, i64 4, i32 20240719, i8 1, i8 9, i8 0, i8 0, [1 x [3 x i64]]
 // CHECK: [i64 1, i64 5, i64 8]] }, ptr %[[a3]], 0
@@ -38,7 +38,7 @@ func.func @_QPtest_dt_callee(%arg0: !fir.box<!fir.array<?xi32>>) {
 func.func @_QPtest_dt_slice() {
 // CHECK:  %[[a1:.*]] = alloca { ptr, i64, i32, i8, i8, i8, i8, [1 x [3 x i64]] }, align 8
 // CHECK:  %[[a3:.*]] = alloca [20 x %_QFtest_dt_sliceTt], i64 1, align 8
-// CHECK:  %[[a4:.*]] = getelementptr [20 x %_QFtest_dt_sliceTt], ptr %[[a3]], i64 0, i64 0, i32 0
+// CHECK:  %[[a4:.*]] = getelementptr [20 x %_QFtest_dt_sliceTt], ptr %[[a3]], i32 0, i64 0, i32 0
 // CHECK: %[[a5:.*]] = insertvalue { ptr, i64, i32, i8, i8, i8, i8, [1 x [3 x i64]] }
 // CHECK-SAME: { ptr undef, i64 4, i32 20240719, i8 1, i8 9, i8 0, i8 0, [1 x [3 x i64]]
 // CHECK-SAME: [i64 1, i64 5, i64 16
@@ -73,7 +73,7 @@ func.func @emboxSubstring(%arg0: !fir.ref<!fir.array<2x3x!fir.char<1,4>>>) {
   %0 = fir.shape %c2, %c3 : (index, index) -> !fir.shape<2>
   %1 = fir.slice %c1, %c2, %c1, %c1, %c3, %c1 substr %c1_i64, %c2_i64 : (index, index, index, index, index, index, i64, i64) -> !fir.slice<2>
   %2 = fir.embox %arg0(%0) [%1] : (!fir.ref<!fir.array<2x3x!fir.char<1,4>>>, !fir.shape<2>, !fir.slice<2>) -> !fir.box<!fir.array<?x?x!fir.char<1,?>>>
-  // CHECK: %[[addr:.*]] = getelementptr [3 x [2 x [4 x i8]]], ptr %[[arg0]], i64 0, i64 0, i64 0, i64 1
+  // CHECK: %[[addr:.*]] = getelementptr [3 x [2 x [4 x i8]]], ptr %[[arg0]], i32 0, i64 0, i64 0, i32 1
   // CHECK: insertvalue {[[descriptorType:.*]]} { ptr undef, i64 2, i32 20240719, i8 2, i8 40, i8 0, i8 0
   // CHECK-SAME: [2 x [3 x i64]] [{{\[}}3 x i64] [i64 1, i64 2, i64 4], [3 x i64] [i64 1, i64 3, i64 8]] }
   // CHECK-SAME: ptr %[[addr]], 0
diff --git a/flang/test/Fir/omp-reduction-embox-codegen.fir b/flang/test/Fir/omp-reduction-embox-codegen.fir
index 1645e1a407ad4..e517b1352ff5c 100644
--- a/flang/test/Fir/omp-reduction-embox-codegen.fir
+++ b/flang/test/Fir/omp-reduction-embox-codegen.fir
@@ -23,14 +23,14 @@ omp.declare_reduction @test_reduction : !fir.ref<!fir.box<i32>> init {
   omp.yield(%0 : !fir.ref<!fir.box<i32>>)
 }
 
-func.func @_QQmain() attributes {fir.bindc_name = "reduce"} {
+func.func @_QQmain()  -> !fir.ref<!fir.box<i32>> attributes {fir.bindc_name = "reduce"} {
   %4 = fir.alloca !fir.box<i32>
   omp.parallel reduction(byref @test_reduction %4 -> %arg0 : !fir.ref<!fir.box<i32>>) {
     omp.terminator
   }
-  return
+  return %4: !fir.ref<!fir.box<i32>>
 }
 
 // basically we are testing that there isn't a crash
-// CHECK-LABEL: define void @_QQmain
+// CHECK-LABEL: define ptr @_QQmain
 // CHECK-NEXT:    alloca { ptr, i64, i32, i8, i8, i8, i8 }, i64 1, align 8
diff --git a/flang/test/Fir/optional.fir b/flang/test/Fir/optional.fir
index bded8b5332a30..66ff69f083467 100644
--- a/flang/test/Fir/optional.fir
+++ b/flang/test/Fir/optional.fir
@@ -37,8 +37,7 @@ func.func @bar2() -> i1 {
 
 // CHECK-LABEL: @foo3
 func.func @foo3(%arg0: !fir.boxchar<1>) -> i1 {
-  // CHECK: %[[extract:.*]] = extractvalue { ptr, i64 } %{{.*}}, 0
-  // CHECK: %[[ptr:.*]] = ptrtoint ptr %[[extract]] to i64
+  // CHECK: %[[ptr:.*]] = ptrtoint ptr %0 to i64
   // CHECK: icmp ne i64 %[[ptr]], 0
   %0 = fir.is_present %arg0 : (!fir.boxchar<1>) -> i1
   return %0 : i1
diff --git a/flang/test/Fir/pdt.fir b/flang/test/Fir/pdt.fir
index a200cd7e7cc03..411927aae6bdf 100644
--- a/flang/test/Fir/pdt.fir
+++ b/flang/test/Fir/pdt.fir
@@ -96,13 +96,13 @@ func.func @_QTt1P.f2.offset(%0 : i32, %1 : i32) -> i32 {
 
 func.func private @bar(!fir.ref<!fir.char<1,?>>)
 
-// CHECK-LABEL: define void @_QPfoo(i32 %0, i32 %1)
-func.func @_QPfoo(%arg0 : i32, %arg1 : i32) {
+// CHECK-LABEL: define ptr @_QPfoo(i32 %0, i32 %1)
+func.func @_QPfoo(%arg0 : i32, %arg1 : i32) -> !fir.ref<!fir.type<_QTt1>> {
   // CHECK: %[[size:.*]] = call i64 @_QTt1P.mem.size(i32 %0, i32 %1)
   // CHECK: %[[alloc:.*]] = alloca i8, i64 %[[size]]
   %0 = fir.alloca !fir.type<_QTt1(p1:i32,p2:i32){f1:!fir.char<1,?>,f2:!fir.char<1,?>}>(%arg0, %arg1 : i32, i32)
   //%2 = fir.coordinate_of %0, f2 : (!fir.ref<!fir.type<_QTt1>>) -> !fir.ref<!fir.char<1,?>>
   %2 = fir.zero_bits !fir.ref<!fir.char<1,?>>
   fir.call @bar(%2) : (!fir.ref<!fir.char<1,?>>) -> ()
-  return
+  return %0 : !fir.ref<!fir.type<_QTt1>>
 }
diff --git a/flang/test/Fir/rebox.fir b/flang/test/Fir/rebox.fir
index 0c9f6d9bb94ad..d858adfb7c45d 100644
--- a/flang/test/Fir/rebox.fir
+++ b/flang/test/Fir/rebox.fir
@@ -36,7 +36,7 @@ func.func @test_rebox_1(%arg0: !fir.box<!fir.array<?x?xf32>>) {
   // CHECK: %[[VOIDBASE0:.*]] = getelementptr i8, ptr %[[INBASE]], i64 %[[OFFSET_0]]
   // CHECK: %[[OFFSET_1:.*]] = mul i64 2, %[[INSTRIDE_1]]
   // CHECK: %[[VOIDBASE1:.*]] = getelementptr i8, ptr %[[VOIDBASE0]], i64 %[[OFFSET_1]]
-  // CHECK: %[[OUTSTRIDE0:.*]] = mul i64 3, %[[INSTRIDE_1]]
+  // CHECK: %[[OUTSTRIDE0:.*]] = mul i64 %[[INSTRIDE_1]], 3
   // CHECK: %[[OUTBOX1:.*]] = insertvalue { ptr, i64, i32, i8, i8, i8, i8, [1 x [3 x i64]] } %{{.*}}, i64 %[[OUTSTRIDE0]], 7, 0, 2
   // CHECK: %[[OUTBOX2:.*]] = insertvalue { ptr, i64, i32, i8, i8, i8, i8, [1 x [3 x i64]] } %[[OUTBOX1]], ptr %[[VOIDBASE1]], 0
   // CHECK: store { ptr, i64, i32, i8, i8, i8, i8, [1 x [3 x i64]] } %[[OUTBOX2]], ptr %[[OUTBOX_ALLOC]], align 8
@@ -63,7 +63,7 @@ func.func @test_rebox_2(%arg0: !fir.box<!fir.array<?x?x!fir.char<1,?>>>) {
   // CHECK: %[[OUTBOX:.*]] = alloca { ptr, i64, i32, i8, i8, i8, i8, [2 x [3 x i64]] }
   // CHECK: %[[LEN_GEP:.*]] = getelementptr { ptr, i64, i32, i8, i8, i8, i8, [2 x [3 x i64]] }, ptr %[[INBOX]], i32 0, i32 1
   // CHECK: %[[LEN:...
[truncated]

llvmbot commented Aug 26, 2025

@llvm/pr-subscribers-mlir-openmp

@@ -185,7 +185,7 @@ func.func @alloca_dynarray_of_nonchar2(%e: index) -> !fir.ref<!fir.array<?x?xi32
 
 // CHECK-LABEL: define ptr @allocmem_dynarray_of_nonchar(
 // CHECK-SAME: i64 %[[extent:.*]])
-// CHECK: %[[prod1:.*]] = mul i64 12, %[[extent]]
+// CHECK: %[[prod1:.*]] = mul i64 %[[extent]], 12
 // CHECK: %[[cmp:.*]] = icmp sgt i64 %[[prod1]], 0
 // CHECK: %[[size:.*]] = select i1 %[[cmp]], i64 %[[prod1]], i64 1
 // CHECK: call ptr @malloc(i64 %[[size]])
@@ -196,7 +196,7 @@ func.func @allocmem_dynarray_of_nonchar(%e: index) -> !fir.heap<!fir.array<3x?xi
 
 // CHECK-LABEL: define ptr @allocmem_dynarray_of_nonchar2(
 // CHECK-SAME: i64 %[[extent:.*]])
-// CHECK: %[[prod1:.*]] = mul i64 4, %[[extent]]
+// CHECK: %[[prod1:.*]] = mul i64 %[[extent]], 4
 // CHECK: %[[prod2:.*]] = mul i64 %[[prod1]], %[[extent]]
 // CHECK: %[[cmp:.*]] = icmp sgt i64 %[[prod2]], 0
 // CHECK: %[[size:.*]] = select i1 %[[cmp]], i64 %[[prod2]], i64 1
@@ -227,7 +227,7 @@ func.func @alloca_dynarray_of_char2(%e : index) -> !fir.ref<!fir.array<?x?x!fir.
 
 // CHECK-LABEL: define ptr @allocmem_dynarray_of_char(
 // CHECK-SAME: i64 %[[extent:.*]])
-// CHECK: %[[prod1:.*]] = mul i64 60, %[[extent]]
+// CHECK: %[[prod1:.*]] = mul i64 %[[extent]], 60
 // CHECK: %[[cmp:.*]] = icmp sgt i64 %[[prod1]], 0
 // CHECK: %[[size:.*]] = select i1 %[[cmp]], i64 %[[prod1]], i64 1
 // CHECK: call ptr @malloc(i64 %[[size]])
@@ -238,7 +238,7 @@ func.func @allocmem_dynarray_of_char(%e : index) -> !fir.heap<!fir.array<3x?x!fi
 
 // CHECK-LABEL: define ptr @allocmem_dynarray_of_char2(
 // CHECK-SAME: i64 %[[extent:.*]])
-// CHECK: %[[prod1:.*]] = mul i64 20, %[[extent]]
+// CHECK: %[[prod1:.*]] = mul i64 %[[extent]], 20
 // CHECK: %[[prod2:.*]] = mul i64 %[[prod1]], %[[extent]]
 // CHECK: %[[cmp:.*]] = icmp sgt i64 %[[prod2]], 0
 // CHECK: %[[size:.*]] = select i1 %[[cmp]], i64 %[[mul2]], i64 1
@@ -286,7 +286,7 @@ func.func @allocmem_dynarray_of_dynchar(%l: i32, %e : index) -> !fir.heap<!fir.a
 // CHECK-LABEL: define ptr @allocmem_dynarray_of_dynchar2(
 // CHECK-SAME: i32 %[[len:.*]], i64 %[[extent:.*]])
 // CHECK: %[[a:.*]] = sext i32 %[[len]] to i64
-// CHECK: %[[prod1:.*]] = mul i64 2, %[[a]]
+// CHECK: %[[prod1:.*]] = mul i64 %[[a]], 2
 // CHECK: %[[prod2:.*]] = mul i64 %[[prod1]], %[[extent]]
 // CHECK: %[[prod3:.*]] = mul i64 %[[prod2]], %[[extent]]
 // CHECK: %[[cmp:.*]] = icmp sgt i64 %[[prod3]], 0
@@ -366,12 +366,13 @@ func.func @allocmem_array_with_holes_dynchar(%arg0: index, %arg1: index) -> !fir
 // CHECK:    %[[VAL_0:.*]] = alloca { ptr, i64, i32, i8, i8, i8, i8, ptr, [1 x i64] }, i64 1
 // CHECK:    %[[VAL_3:.*]] = alloca { ptr, i64, i32, i8, i8, i8, i8, [1 x [3 x i64]], ptr, [1 x i64] }, i64 1
 // CHECK:    %[[VAL_2:.*]] = alloca { ptr, i64, i32, i8, i8, i8, i8, ptr, [1 x i64] }, i64 1
-
+func.func private @foo(%0: !fir.ref<!fir.class<none>>, %1: !fir.ref<!fir.class<!fir.array<?xnone>>>, %2: !fir.ref<!fir.box<none>>, %3: !fir.ref<!fir.box<!fir.array<?xnone>>>)
 func.func @alloca_unlimited_polymorphic_box() {
   %0 = fir.alloca !fir.class<none>
   %1 = fir.alloca !fir.class<!fir.array<?xnone>>
   %2 = fir.alloca !fir.box<none>
   %3 = fir.alloca !fir.box<!fir.array<?xnone>>
+  fir.call @foo(%0, %1, %2, %3) : (!fir.ref<!fir.class<none>>, !fir.ref<!fir.class<!fir.array<?xnone>>>, !fir.ref<!fir.box<none>>, !fir.ref<!fir.box<!fir.array<?xnone>>>) -> ()
   return
 }
 // Note: allocmem of fir.box are not possible (fir::HeapType::verify does not
diff --git a/flang/test/Fir/arrexp.fir b/flang/test/Fir/arrexp.fir
index e8ec8ac79e0c2..2eb717228d998 100644
--- a/flang/test/Fir/arrexp.fir
+++ b/flang/test/Fir/arrexp.fir
@@ -143,9 +143,9 @@ func.func @f6(%arg0: !fir.box<!fir.array<?xf32>>, %arg1: f32) {
   %c9 = arith.constant 9 : index
   %c10 = arith.constant 10 : index
 
-  // CHECK: %[[EXT_GEP:.*]] = getelementptr {{.*}} %[[A]], i32 0, i32 7, i64 0, i32 1
+  // CHECK: %[[EXT_GEP:.*]] = getelementptr {{.*}} %[[A]], i32 0, i32 7, i32 0, i32 1
   // CHECK: %[[EXTENT:.*]] = load i64, ptr %[[EXT_GEP]]
-  // CHECK: %[[SIZE:.*]] = mul i64 4, %[[EXTENT]]
+  // CHECK: %[[SIZE:.*]] = mul i64 %[[EXTENT]], 4
   // CHECK: %[[CMP:.*]] = icmp sgt i64 %[[SIZE]], 0
   // CHECK: %[[SZ:.*]] = select i1 %[[CMP]], i64 %[[SIZE]], i64 1
   // CHECK: %[[MALLOC:.*]] = call ptr @malloc(i64 %[[SZ]])
diff --git a/flang/test/Fir/basic-program.fir b/flang/test/Fir/basic-program.fir
index c9fe53bf093a1..6bad03dded24d 100644
--- a/flang/test/Fir/basic-program.fir
+++ b/flang/test/Fir/basic-program.fir
@@ -158,4 +158,6 @@ func.func @_QQmain() {
 // PASSES-NEXT:  LowerNontemporalPass
 // PASSES-NEXT: FIRToLLVMLowering
 // PASSES-NEXT: ReconcileUnrealizedCasts
+// PASSES-NEXT: 'llvm.func' Pipeline
+// PASSES-NEXT: PrepareForOMPOffloadPrivatizationPass
 // PASSES-NEXT: LLVMIRLoweringPass
diff --git a/flang/test/Fir/box.fir b/flang/test/Fir/box.fir
index c0cf3d8375983..760fbd4792122 100644
--- a/flang/test/Fir/box.fir
+++ b/flang/test/Fir/box.fir
@@ -57,7 +57,7 @@ func.func @fa(%a : !fir.ref<!fir.array<100xf32>>) {
 // CHECK-SAME: ptr {{[^%]*}}%[[res:.*]], ptr {{[^%]*}}%[[arg0:.*]], i64 %[[arg1:.*]])
 func.func @b1(%arg0 : !fir.ref<!fir.char<1,?>>, %arg1 : index) -> !fir.box<!fir.char<1,?>> {
   // CHECK: %[[alloca:.*]] = alloca { ptr, i64, i32, i8, i8, i8, i8 }
-  // CHECK: %[[size:.*]] = mul i64 1, %[[arg1]]
+  // CHECK: %[[size:.*]] = mul i64 %[[arg1]], 1
   // CHECK: insertvalue {{.*}} undef, i64 %[[size]], 1
   // CHECK: insertvalue {{.*}} i32 20240719, 2
   // CHECK: insertvalue {{.*}} ptr %[[arg0]], 0
@@ -89,7 +89,7 @@ func.func @b2(%arg0 : !fir.ref<!fir.array<?x!fir.char<1,5>>>, %arg1 : index) ->
 func.func @b3(%arg0 : !fir.ref<!fir.array<?x!fir.char<1,?>>>, %arg1 : index, %arg2 : index) -> !fir.box<!fir.array<?x!fir.char<1,?>>> {
   %1 = fir.shape %arg2 : (index) -> !fir.shape<1>
   // CHECK: %[[alloca:.*]] = alloca { ptr, i64, i32, i8, i8, i8, i8, [1 x [3 x i64]] }
-  // CHECK: %[[size:.*]] = mul i64 1, %[[arg1]]
+  // CHECK: %[[size:.*]] = mul i64 %[[arg1]], 1
   // CHECK: insertvalue {{.*}} i64 %[[size]], 1
   // CHECK: insertvalue {{.*}} i32 20240719, 2
   // CHECK: insertvalue {{.*}} i64 %[[arg2]], 7, 0, 1
@@ -108,7 +108,7 @@ func.func @b4(%arg0 : !fir.ref<!fir.array<7x!fir.char<1,?>>>, %arg1 : index) ->
   %c_7 = arith.constant 7 : index
   %1 = fir.shape %c_7 : (index) -> !fir.shape<1>
   // CHECK: %[[alloca:.*]] = alloca { ptr, i64, i32, i8, i8, i8, i8, [1 x [3 x i64]] }
-  // CHECK:   %[[size:.*]] = mul i64 1, %[[arg1]]
+  // CHECK:   %[[size:.*]] = mul i64 %[[arg1]], 1
   // CHECK: insertvalue {{.*}} i64 %[[size]], 1
   // CHECK: insertvalue {{.*}} i32 20240719, 2
   // CHECK: insertvalue {{.*}} i64 7, 7, 0, 1
diff --git a/flang/test/Fir/boxproc.fir b/flang/test/Fir/boxproc.fir
index 97d9b38ed6f40..d4c36a4f5b213 100644
--- a/flang/test/Fir/boxproc.fir
+++ b/flang/test/Fir/boxproc.fir
@@ -82,12 +82,8 @@ func.func @_QPtest_proc_dummy_other(%arg0: !fir.boxproc<() -> ()>) {
 // CHECK:         store [1 x i8] c" ", ptr %[[VAL_18]], align 1
 // CHECK:         call void @llvm.init.trampoline(ptr %[[VAL_20]], ptr @_QFtest_proc_dummy_charPgen_message, ptr %[[VAL_2]])
 // CHECK:         %[[VAL_23:.*]] = call ptr @llvm.adjust.trampoline(ptr %[[VAL_20]])
-// CHECK:         %[[VAL_25:.*]] = insertvalue { ptr, i64 } undef, ptr %[[VAL_23]], 0
-// CHECK:         %[[VAL_26:.*]] = insertvalue { ptr, i64 } %[[VAL_25]], i64 10, 1
 // CHECK:         %[[VAL_27:.*]] = call ptr @llvm.stacksave.p0()
-// CHECK:         %[[VAL_28:.*]] = extractvalue { ptr, i64 } %[[VAL_26]], 0
-// CHECK:         %[[VAL_29:.*]] = extractvalue { ptr, i64 } %[[VAL_26]], 1
-// CHECK:         %[[VAL_30:.*]] = call { ptr, i64 } @_QPget_message(ptr %[[VAL_0]], i64 40, ptr %[[VAL_28]], i64 %[[VAL_29]])
+// CHECK:         %[[VAL_30:.*]] = call { ptr, i64 } @_QPget_message(ptr %[[VAL_0]], i64 40, ptr %[[VAL_23]], i64 10)
 // CHECK:         %[[VAL_32:.*]] = call i1 @_FortranAioOutputAscii(ptr %{{.*}}, ptr %[[VAL_0]], i64 40)
 // CHECK:         call void @llvm.stackrestore.p0(ptr %[[VAL_27]])
 
@@ -115,14 +111,10 @@ func.func @_QPtest_proc_dummy_other(%arg0: !fir.boxproc<() -> ()>) {
 // CHECK-LABEL: define { ptr, i64 } @_QPget_message(ptr
 // CHECK-SAME:                  %[[VAL_0:.*]], i64 %[[VAL_1:.*]], ptr %[[VAL_2:.*]], i64
 // CHECK-SAME:                                                 %[[VAL_3:.*]])
-// CHECK:         %[[VAL_4:.*]] = insertvalue { ptr, i64 } undef, ptr %[[VAL_2]], 0
-// CHECK:         %[[VAL_5:.*]] = insertvalue { ptr, i64 } %[[VAL_4]], i64 %[[VAL_3]], 1
-// CHECK:         %[[VAL_7:.*]] = extractvalue { ptr, i64 } %[[VAL_5]], 0
-// CHECK:         %[[VAL_8:.*]] = extractvalue { ptr, i64 } %[[VAL_5]], 1
 // CHECK:         %[[VAL_9:.*]] = call ptr @llvm.stacksave.p0()
-// CHECK:         %[[VAL_10:.*]] = alloca i8, i64 %[[VAL_8]], align 1
-// CHECK:         %[[VAL_12:.*]] = call { ptr, i64 } %[[VAL_7]](ptr %[[VAL_10]], i64 %[[VAL_8]])
-// CHECK:         %[[VAL_13:.*]] = add i64 %[[VAL_8]], 12
+// CHECK:         %[[VAL_10:.*]] = alloca i8, i64 %[[VAL_3]], align 1
+// CHECK:         %[[VAL_12:.*]] = call { ptr, i64 } %[[VAL_2]](ptr %[[VAL_10]], i64 %[[VAL_3]])
+// CHECK:         %[[VAL_13:.*]] = add i64 %[[VAL_3]], 12
 // CHECK:         %[[VAL_14:.*]] = alloca i8, i64 %[[VAL_13]], align 1
 // CHECK:         call void @llvm.memmove.p0.p0.i64(ptr %[[VAL_14]], ptr {{.*}}, i64 12, i1 false)
 // CHECK:         %[[VAL_18:.*]] = phi i64
diff --git a/flang/test/Fir/embox.fir b/flang/test/Fir/embox.fir
index 0f304cff2c79e..11f7457b6873c 100644
--- a/flang/test/Fir/embox.fir
+++ b/flang/test/Fir/embox.fir
@@ -11,7 +11,7 @@ func.func @_QPtest_callee(%arg0: !fir.box<!fir.array<?xi32>>) {
 func.func @_QPtest_slice() {
 // CHECK:  %[[a1:.*]] = alloca { ptr, i64, i32, i8, i8, i8, i8, [1 x [3 x i64]] }, align 8
 // CHECK:  %[[a2:.*]] = alloca [20 x i32], i64 1, align 4
-// CHECK:  %[[a3:.*]] = getelementptr [20 x i32], ptr %[[a2]], i64 0, i64 0
+// CHECK:  %[[a3:.*]] = getelementptr [20 x i32], ptr %[[a2]], i32 0, i64 0
 // CHECK:  %[[a4:.*]] = insertvalue { ptr, i64, i32, i8, i8, i8, i8, [1 x [3 x i64]] }
 // CHECK:  { ptr undef, i64 4, i32 20240719, i8 1, i8 9, i8 0, i8 0, [1 x [3 x i64]]
 // CHECK: [i64 1, i64 5, i64 8]] }, ptr %[[a3]], 0
@@ -38,7 +38,7 @@ func.func @_QPtest_dt_callee(%arg0: !fir.box<!fir.array<?xi32>>) {
 func.func @_QPtest_dt_slice() {
 // CHECK:  %[[a1:.*]] = alloca { ptr, i64, i32, i8, i8, i8, i8, [1 x [3 x i64]] }, align 8
 // CHECK:  %[[a3:.*]] = alloca [20 x %_QFtest_dt_sliceTt], i64 1, align 8
-// CHECK:  %[[a4:.*]] = getelementptr [20 x %_QFtest_dt_sliceTt], ptr %[[a3]], i64 0, i64 0, i32 0
+// CHECK:  %[[a4:.*]] = getelementptr [20 x %_QFtest_dt_sliceTt], ptr %[[a3]], i32 0, i64 0, i32 0
 // CHECK: %[[a5:.*]] = insertvalue { ptr, i64, i32, i8, i8, i8, i8, [1 x [3 x i64]] }
 // CHECK-SAME: { ptr undef, i64 4, i32 20240719, i8 1, i8 9, i8 0, i8 0, [1 x [3 x i64]]
 // CHECK-SAME: [i64 1, i64 5, i64 16
@@ -73,7 +73,7 @@ func.func @emboxSubstring(%arg0: !fir.ref<!fir.array<2x3x!fir.char<1,4>>>) {
   %0 = fir.shape %c2, %c3 : (index, index) -> !fir.shape<2>
   %1 = fir.slice %c1, %c2, %c1, %c1, %c3, %c1 substr %c1_i64, %c2_i64 : (index, index, index, index, index, index, i64, i64) -> !fir.slice<2>
   %2 = fir.embox %arg0(%0) [%1] : (!fir.ref<!fir.array<2x3x!fir.char<1,4>>>, !fir.shape<2>, !fir.slice<2>) -> !fir.box<!fir.array<?x?x!fir.char<1,?>>>
-  // CHECK: %[[addr:.*]] = getelementptr [3 x [2 x [4 x i8]]], ptr %[[arg0]], i64 0, i64 0, i64 0, i64 1
+  // CHECK: %[[addr:.*]] = getelementptr [3 x [2 x [4 x i8]]], ptr %[[arg0]], i32 0, i64 0, i64 0, i32 1
   // CHECK: insertvalue {[[descriptorType:.*]]} { ptr undef, i64 2, i32 20240719, i8 2, i8 40, i8 0, i8 0
   // CHECK-SAME: [2 x [3 x i64]] [{{\[}}3 x i64] [i64 1, i64 2, i64 4], [3 x i64] [i64 1, i64 3, i64 8]] }
   // CHECK-SAME: ptr %[[addr]], 0
diff --git a/flang/test/Fir/omp-reduction-embox-codegen.fir b/flang/test/Fir/omp-reduction-embox-codegen.fir
index 1645e1a407ad4..e517b1352ff5c 100644
--- a/flang/test/Fir/omp-reduction-embox-codegen.fir
+++ b/flang/test/Fir/omp-reduction-embox-codegen.fir
@@ -23,14 +23,14 @@ omp.declare_reduction @test_reduction : !fir.ref<!fir.box<i32>> init {
   omp.yield(%0 : !fir.ref<!fir.box<i32>>)
 }
 
-func.func @_QQmain() attributes {fir.bindc_name = "reduce"} {
+func.func @_QQmain()  -> !fir.ref<!fir.box<i32>> attributes {fir.bindc_name = "reduce"} {
   %4 = fir.alloca !fir.box<i32>
   omp.parallel reduction(byref @test_reduction %4 -> %arg0 : !fir.ref<!fir.box<i32>>) {
     omp.terminator
   }
-  return
+  return %4: !fir.ref<!fir.box<i32>>
 }
 
 // basically we are testing that there isn't a crash
-// CHECK-LABEL: define void @_QQmain
+// CHECK-LABEL: define ptr @_QQmain
 // CHECK-NEXT:    alloca { ptr, i64, i32, i8, i8, i8, i8 }, i64 1, align 8
diff --git a/flang/test/Fir/optional.fir b/flang/test/Fir/optional.fir
index bded8b5332a30..66ff69f083467 100644
--- a/flang/test/Fir/optional.fir
+++ b/flang/test/Fir/optional.fir
@@ -37,8 +37,7 @@ func.func @bar2() -> i1 {
 
 // CHECK-LABEL: @foo3
 func.func @foo3(%arg0: !fir.boxchar<1>) -> i1 {
-  // CHECK: %[[extract:.*]] = extractvalue { ptr, i64 } %{{.*}}, 0
-  // CHECK: %[[ptr:.*]] = ptrtoint ptr %[[extract]] to i64
+  // CHECK: %[[ptr:.*]] = ptrtoint ptr %0 to i64
   // CHECK: icmp ne i64 %[[ptr]], 0
   %0 = fir.is_present %arg0 : (!fir.boxchar<1>) -> i1
   return %0 : i1
diff --git a/flang/test/Fir/pdt.fir b/flang/test/Fir/pdt.fir
index a200cd7e7cc03..411927aae6bdf 100644
--- a/flang/test/Fir/pdt.fir
+++ b/flang/test/Fir/pdt.fir
@@ -96,13 +96,13 @@ func.func @_QTt1P.f2.offset(%0 : i32, %1 : i32) -> i32 {
 
 func.func private @bar(!fir.ref<!fir.char<1,?>>)
 
-// CHECK-LABEL: define void @_QPfoo(i32 %0, i32 %1)
-func.func @_QPfoo(%arg0 : i32, %arg1 : i32) {
+// CHECK-LABEL: define ptr @_QPfoo(i32 %0, i32 %1)
+func.func @_QPfoo(%arg0 : i32, %arg1 : i32) -> !fir.ref<!fir.type<_QTt1>> {
   // CHECK: %[[size:.*]] = call i64 @_QTt1P.mem.size(i32 %0, i32 %1)
   // CHECK: %[[alloc:.*]] = alloca i8, i64 %[[size]]
   %0 = fir.alloca !fir.type<_QTt1(p1:i32,p2:i32){f1:!fir.char<1,?>,f2:!fir.char<1,?>}>(%arg0, %arg1 : i32, i32)
   //%2 = fir.coordinate_of %0, f2 : (!fir.ref<!fir.type<_QTt1>>) -> !fir.ref<!fir.char<1,?>>
   %2 = fir.zero_bits !fir.ref<!fir.char<1,?>>
   fir.call @bar(%2) : (!fir.ref<!fir.char<1,?>>) -> ()
-  return
+  return %0 : !fir.ref<!fir.type<_QTt1>>
 }
diff --git a/flang/test/Fir/rebox.fir b/flang/test/Fir/rebox.fir
index 0c9f6d9bb94ad..d858adfb7c45d 100644
--- a/flang/test/Fir/rebox.fir
+++ b/flang/test/Fir/rebox.fir
@@ -36,7 +36,7 @@ func.func @test_rebox_1(%arg0: !fir.box<!fir.array<?x?xf32>>) {
   // CHECK: %[[VOIDBASE0:.*]] = getelementptr i8, ptr %[[INBASE]], i64 %[[OFFSET_0]]
   // CHECK: %[[OFFSET_1:.*]] = mul i64 2, %[[INSTRIDE_1]]
   // CHECK: %[[VOIDBASE1:.*]] = getelementptr i8, ptr %[[VOIDBASE0]], i64 %[[OFFSET_1]]
-  // CHECK: %[[OUTSTRIDE0:.*]] = mul i64 3, %[[INSTRIDE_1]]
+  // CHECK: %[[OUTSTRIDE0:.*]] = mul i64 %[[INSTRIDE_1]], 3
   // CHECK: %[[OUTBOX1:.*]] = insertvalue { ptr, i64, i32, i8, i8, i8, i8, [1 x [3 x i64]] } %{{.*}}, i64 %[[OUTSTRIDE0]], 7, 0, 2
   // CHECK: %[[OUTBOX2:.*]] = insertvalue { ptr, i64, i32, i8, i8, i8, i8, [1 x [3 x i64]] } %[[OUTBOX1]], ptr %[[VOIDBASE1]], 0
   // CHECK: store { ptr, i64, i32, i8, i8, i8, i8, [1 x [3 x i64]] } %[[OUTBOX2]], ptr %[[OUTBOX_ALLOC]], align 8
@@ -63,7 +63,7 @@ func.func @test_rebox_2(%arg0: !fir.box<!fir.array<?x?x!fir.char<1,?>>>) {
   // CHECK: %[[OUTBOX:.*]] = alloca { ptr, i64, i32, i8, i8, i8, i8, [2 x [3 x i64]] }
   // CHECK: %[[LEN_GEP:.*]] = getelementptr { ptr, i64, i32, i8, i8, i8, i8, [2 x [3 x i64]] }, ptr %[[INBOX]], i32 0, i32 1
   // CHECK: %[[LEN:...
[truncated]

@llvmbot
Member

llvmbot commented Aug 26, 2025

@llvm/pr-subscribers-flang-openmp

Author: Pranav Bhandarkar (bhandarkar-pranav)

Changes

This PR adds support for translation of the private clause on deferred target tasks - that is omp.target operations with the nowait clause.

An offloading call for a deferred target task is not blocking: the offloading host task continues its execution after issuing the offloading call. Therefore, the key problem we need to solve is to ensure that the data needed to initialize private variables in the target task persists even after the host task has completed.
We do this in a new pass called PrepareForOMPOffloadPrivatizationPass. For a privatized variable that needs its host counterpart for initialization (such as the shape of the data from the descriptor when an allocatable is privatized or the value of the data when an allocatable is firstprivatized),

  • the pass allocates memory on the heap.
  • it then initializes this memory by copying the contents of the host variable to the newly allocated location on the heap.
  • Then, the pass updates all the omp.map.info operations that pointed to the host variable to now point to the one located in the heap.
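The three steps above can be sketched outside of MLIR as a plain C++ analogy. This is only an illustration of the lifetime problem, not the pass's implementation: `HostVar`, `DeferredTask`, `makeDeferredTask` and `hostTask` are hypothetical names, a `std::function` stands in for the deferred target task, and a `std::shared_ptr` stands in for the heap allocation that the pass creates.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <numeric>
#include <vector>

// Hypothetical stand-in for a host variable that a privatized
// variable needs for its initialization.
struct HostVar {
  std::vector<int> values;
};

// Hypothetical stand-in for a deferred (nowait) target task: it runs
// later, possibly after the host task that created it has returned.
using DeferredTask = std::function<int()>;

DeferredTask makeDeferredTask(const HostVar &hostVar) {
  // Steps 1 and 2: allocate on the heap and copy the host contents there.
  auto heapCopy = std::make_shared<const HostVar>(hostVar);
  // Step 3: the task refers to the heap copy, never to the host variable.
  return [heapCopy]() {
    HostVar privateCopy = *heapCopy; // firstprivate-style initialization
    return std::accumulate(privateCopy.values.begin(),
                           privateCopy.values.end(), 0);
  };
}

DeferredTask hostTask() {
  HostVar local{{1, 2, 3}};       // lives on the host task's stack
  return makeDeferredTask(local); // `local` is destroyed on return
}
```

Calling `hostTask()()` runs the task after `local` is gone and still reads valid data, because the task only touches the heap copy. Capturing `local` by reference instead would leave the task with a dangling reference, which is the situation the pass is meant to avoid.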

The pass uses a rewrite pattern applied using the greedy pattern matcher, which in turn does some constant folding and DCE. Because of this, a number of lit tests had to be updated. In GEPs, constants get folded into the indices and truncated to i32. In some tests, sequences of insertvalue and extractvalue instructions cancel out, so those checks needed to be updated too.
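To make the test churn easier to read, here is a toy C++ model of two rewrites that are visible in the updated checks: a constant operand of a commutative `mul` ends up on the right-hand side, and an `extractvalue` of a just-inserted field is replaced by the inserted value. The types and functions (`Operand`, `canonicalMul`, `InsertValue`, `foldExtract`) are hypothetical and only mimic the observable effect; they are not MLIR APIs.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical operand: its printed form and whether it is a constant.
struct Operand {
  std::string text;
  bool isConstant;
};

// Mimics the operand order seen in the updated checks: for a commutative
// op, a constant on the left is moved to the right.
std::string canonicalMul(Operand lhs, Operand rhs) {
  if (lhs.isConstant && !rhs.isConstant)
    std::swap(lhs, rhs);
  return "mul i64 " + lhs.text + ", " + rhs.text;
}

// Hypothetical record of an insertvalue: `value` was stored at `index`.
struct InsertValue {
  int index;
  std::string value;
};

// Mimics the insertvalue/extractvalue cancellation: extracting the field
// that was just inserted yields the inserted value directly.
bool foldExtract(const InsertValue &insert, int index, std::string &result) {
  if (insert.index != index)
    return false; // different field, nothing to fold in this toy model
  result = insert.value;
  return true;
}
```

With this model, `canonicalMul({"1", true}, {"%len", false})` prints `mul i64 %len, 1`, matching the shape of the updated `CHECK` lines in `alloc.fir` and `box.fir`, and folding an extract of field 1 from an insert of `i64 10` yields `i64 10`, as in the simplified `_QPget_message` call in `boxproc.fir`.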


Patch is 79.33 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/155348.diff

30 Files Affected:

  • (modified) flang/include/flang/Optimizer/Passes/Pipelines.h (+1)
  • (modified) flang/lib/Optimizer/Passes/Pipelines.cpp (+6)
  • (modified) flang/test/Driver/tco-emit-final-mlir.fir (+2-2)
  • (modified) flang/test/Driver/tco-test-gen.fir (+2-3)
  • (modified) flang/test/Fir/alloc-32.fir (+1-1)
  • (modified) flang/test/Fir/alloc.fir (+9-8)
  • (modified) flang/test/Fir/arrexp.fir (+2-2)
  • (modified) flang/test/Fir/basic-program.fir (+2)
  • (modified) flang/test/Fir/box.fir (+3-3)
  • (modified) flang/test/Fir/boxproc.fir (+4-12)
  • (modified) flang/test/Fir/embox.fir (+3-3)
  • (modified) flang/test/Fir/omp-reduction-embox-codegen.fir (+3-3)
  • (modified) flang/test/Fir/optional.fir (+1-2)
  • (modified) flang/test/Fir/pdt.fir (+3-3)
  • (modified) flang/test/Fir/rebox.fir (+9-9)
  • (modified) flang/test/Fir/select.fir (+1-1)
  • (modified) flang/test/Fir/target.fir (-4)
  • (modified) flang/test/Fir/tbaa-codegen2.fir (+3-9)
  • (modified) flang/test/Integration/OpenMP/map-types-and-sizes.f90 (+7-7)
  • (modified) flang/test/Lower/allocatable-polymorphic.f90 (-4)
  • (modified) flang/test/Lower/forall/character-1.f90 (+2-2)
  • (added) mlir/include/mlir/Dialect/LLVMIR/Transforms/OpenMPOffloadPrivatizationPrepare.h (+23)
  • (modified) mlir/include/mlir/Dialect/LLVMIR/Transforms/Passes.td (+12)
  • (modified) mlir/include/mlir/Dialect/OpenMP/OpenMPOps.td (+2-2)
  • (modified) mlir/lib/Dialect/LLVMIR/Transforms/CMakeLists.txt (+2)
  • (added) mlir/lib/Dialect/LLVMIR/Transforms/OpenMPOffloadPrivatizationPrepare.cpp (+425)
  • (modified) mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp (+2-9)
  • (modified) mlir/lib/Tools/mlir-opt/MlirOptMain.cpp (+1)
  • (added) mlir/test/Dialect/LLVMIR/omp-offload-privatization-prepare.mlir (+167)
  • (modified) mlir/test/Target/LLVMIR/openmp-todo.mlir (-18)
diff --git a/flang/include/flang/Optimizer/Passes/Pipelines.h b/flang/include/flang/Optimizer/Passes/Pipelines.h
index a3f59ee8dd013..17d48f46e4b9b 100644
--- a/flang/include/flang/Optimizer/Passes/Pipelines.h
+++ b/flang/include/flang/Optimizer/Passes/Pipelines.h
@@ -22,6 +22,7 @@
 #include "mlir/Conversion/SCFToControlFlow/SCFToControlFlow.h"
 #include "mlir/Dialect/GPU/IR/GPUDialect.h"
 #include "mlir/Dialect/LLVMIR/LLVMAttrs.h"
+#include "mlir/Dialect/LLVMIR/Transforms/OpenMPOffloadPrivatizationPrepare.h"
 #include "mlir/Pass/PassManager.h"
 #include "mlir/Transforms/GreedyPatternRewriteDriver.h"
 #include "mlir/Transforms/Passes.h"
diff --git a/flang/test/Fir/boxproc.fir b/flang/test/Fir/boxproc.fir
index 97d9b38ed6f40..d4c36a4f5b213 100644
--- a/flang/test/Fir/boxproc.fir
+++ b/flang/test/Fir/boxproc.fir
@@ -82,12 +82,8 @@ func.func @_QPtest_proc_dummy_other(%arg0: !fir.boxproc<() -> ()>) {
 // CHECK:         store [1 x i8] c" ", ptr %[[VAL_18]], align 1
 // CHECK:         call void @llvm.init.trampoline(ptr %[[VAL_20]], ptr @_QFtest_proc_dummy_charPgen_message, ptr %[[VAL_2]])
 // CHECK:         %[[VAL_23:.*]] = call ptr @llvm.adjust.trampoline(ptr %[[VAL_20]])
-// CHECK:         %[[VAL_25:.*]] = insertvalue { ptr, i64 } undef, ptr %[[VAL_23]], 0
-// CHECK:         %[[VAL_26:.*]] = insertvalue { ptr, i64 } %[[VAL_25]], i64 10, 1
 // CHECK:         %[[VAL_27:.*]] = call ptr @llvm.stacksave.p0()
-// CHECK:         %[[VAL_28:.*]] = extractvalue { ptr, i64 } %[[VAL_26]], 0
-// CHECK:         %[[VAL_29:.*]] = extractvalue { ptr, i64 } %[[VAL_26]], 1
-// CHECK:         %[[VAL_30:.*]] = call { ptr, i64 } @_QPget_message(ptr %[[VAL_0]], i64 40, ptr %[[VAL_28]], i64 %[[VAL_29]])
+// CHECK:         %[[VAL_30:.*]] = call { ptr, i64 } @_QPget_message(ptr %[[VAL_0]], i64 40, ptr %[[VAL_23]], i64 10)
 // CHECK:         %[[VAL_32:.*]] = call i1 @_FortranAioOutputAscii(ptr %{{.*}}, ptr %[[VAL_0]], i64 40)
 // CHECK:         call void @llvm.stackrestore.p0(ptr %[[VAL_27]])
 
@@ -115,14 +111,10 @@ func.func @_QPtest_proc_dummy_other(%arg0: !fir.boxproc<() -> ()>) {
 // CHECK-LABEL: define { ptr, i64 } @_QPget_message(ptr
 // CHECK-SAME:                  %[[VAL_0:.*]], i64 %[[VAL_1:.*]], ptr %[[VAL_2:.*]], i64
 // CHECK-SAME:                                                 %[[VAL_3:.*]])
-// CHECK:         %[[VAL_4:.*]] = insertvalue { ptr, i64 } undef, ptr %[[VAL_2]], 0
-// CHECK:         %[[VAL_5:.*]] = insertvalue { ptr, i64 } %[[VAL_4]], i64 %[[VAL_3]], 1
-// CHECK:         %[[VAL_7:.*]] = extractvalue { ptr, i64 } %[[VAL_5]], 0
-// CHECK:         %[[VAL_8:.*]] = extractvalue { ptr, i64 } %[[VAL_5]], 1
 // CHECK:         %[[VAL_9:.*]] = call ptr @llvm.stacksave.p0()
-// CHECK:         %[[VAL_10:.*]] = alloca i8, i64 %[[VAL_8]], align 1
-// CHECK:         %[[VAL_12:.*]] = call { ptr, i64 } %[[VAL_7]](ptr %[[VAL_10]], i64 %[[VAL_8]])
-// CHECK:         %[[VAL_13:.*]] = add i64 %[[VAL_8]], 12
+// CHECK:         %[[VAL_10:.*]] = alloca i8, i64 %[[VAL_3]], align 1
+// CHECK:         %[[VAL_12:.*]] = call { ptr, i64 } %[[VAL_2]](ptr %[[VAL_10]], i64 %[[VAL_3]])
+// CHECK:         %[[VAL_13:.*]] = add i64 %[[VAL_3]], 12
 // CHECK:         %[[VAL_14:.*]] = alloca i8, i64 %[[VAL_13]], align 1
 // CHECK:         call void @llvm.memmove.p0.p0.i64(ptr %[[VAL_14]], ptr {{.*}}, i64 12, i1 false)
 // CHECK:         %[[VAL_18:.*]] = phi i64
diff --git a/flang/test/Fir/embox.fir b/flang/test/Fir/embox.fir
index 0f304cff2c79e..11f7457b6873c 100644
--- a/flang/test/Fir/embox.fir
+++ b/flang/test/Fir/embox.fir
@@ -11,7 +11,7 @@ func.func @_QPtest_callee(%arg0: !fir.box<!fir.array<?xi32>>) {
 func.func @_QPtest_slice() {
 // CHECK:  %[[a1:.*]] = alloca { ptr, i64, i32, i8, i8, i8, i8, [1 x [3 x i64]] }, align 8
 // CHECK:  %[[a2:.*]] = alloca [20 x i32], i64 1, align 4
-// CHECK:  %[[a3:.*]] = getelementptr [20 x i32], ptr %[[a2]], i64 0, i64 0
+// CHECK:  %[[a3:.*]] = getelementptr [20 x i32], ptr %[[a2]], i32 0, i64 0
 // CHECK:  %[[a4:.*]] = insertvalue { ptr, i64, i32, i8, i8, i8, i8, [1 x [3 x i64]] }
 // CHECK:  { ptr undef, i64 4, i32 20240719, i8 1, i8 9, i8 0, i8 0, [1 x [3 x i64]]
 // CHECK: [i64 1, i64 5, i64 8]] }, ptr %[[a3]], 0
@@ -38,7 +38,7 @@ func.func @_QPtest_dt_callee(%arg0: !fir.box<!fir.array<?xi32>>) {
 func.func @_QPtest_dt_slice() {
 // CHECK:  %[[a1:.*]] = alloca { ptr, i64, i32, i8, i8, i8, i8, [1 x [3 x i64]] }, align 8
 // CHECK:  %[[a3:.*]] = alloca [20 x %_QFtest_dt_sliceTt], i64 1, align 8
-// CHECK:  %[[a4:.*]] = getelementptr [20 x %_QFtest_dt_sliceTt], ptr %[[a3]], i64 0, i64 0, i32 0
+// CHECK:  %[[a4:.*]] = getelementptr [20 x %_QFtest_dt_sliceTt], ptr %[[a3]], i32 0, i64 0, i32 0
 // CHECK: %[[a5:.*]] = insertvalue { ptr, i64, i32, i8, i8, i8, i8, [1 x [3 x i64]] }
 // CHECK-SAME: { ptr undef, i64 4, i32 20240719, i8 1, i8 9, i8 0, i8 0, [1 x [3 x i64]]
 // CHECK-SAME: [i64 1, i64 5, i64 16
@@ -73,7 +73,7 @@ func.func @emboxSubstring(%arg0: !fir.ref<!fir.array<2x3x!fir.char<1,4>>>) {
   %0 = fir.shape %c2, %c3 : (index, index) -> !fir.shape<2>
   %1 = fir.slice %c1, %c2, %c1, %c1, %c3, %c1 substr %c1_i64, %c2_i64 : (index, index, index, index, index, index, i64, i64) -> !fir.slice<2>
   %2 = fir.embox %arg0(%0) [%1] : (!fir.ref<!fir.array<2x3x!fir.char<1,4>>>, !fir.shape<2>, !fir.slice<2>) -> !fir.box<!fir.array<?x?x!fir.char<1,?>>>
-  // CHECK: %[[addr:.*]] = getelementptr [3 x [2 x [4 x i8]]], ptr %[[arg0]], i64 0, i64 0, i64 0, i64 1
+  // CHECK: %[[addr:.*]] = getelementptr [3 x [2 x [4 x i8]]], ptr %[[arg0]], i32 0, i64 0, i64 0, i32 1
   // CHECK: insertvalue {[[descriptorType:.*]]} { ptr undef, i64 2, i32 20240719, i8 2, i8 40, i8 0, i8 0
   // CHECK-SAME: [2 x [3 x i64]] [{{\[}}3 x i64] [i64 1, i64 2, i64 4], [3 x i64] [i64 1, i64 3, i64 8]] }
   // CHECK-SAME: ptr %[[addr]], 0
diff --git a/flang/test/Fir/omp-reduction-embox-codegen.fir b/flang/test/Fir/omp-reduction-embox-codegen.fir
index 1645e1a407ad4..e517b1352ff5c 100644
--- a/flang/test/Fir/omp-reduction-embox-codegen.fir
+++ b/flang/test/Fir/omp-reduction-embox-codegen.fir
@@ -23,14 +23,14 @@ omp.declare_reduction @test_reduction : !fir.ref<!fir.box<i32>> init {
   omp.yield(%0 : !fir.ref<!fir.box<i32>>)
 }
 
-func.func @_QQmain() attributes {fir.bindc_name = "reduce"} {
+func.func @_QQmain()  -> !fir.ref<!fir.box<i32>> attributes {fir.bindc_name = "reduce"} {
   %4 = fir.alloca !fir.box<i32>
   omp.parallel reduction(byref @test_reduction %4 -> %arg0 : !fir.ref<!fir.box<i32>>) {
     omp.terminator
   }
-  return
+  return %4: !fir.ref<!fir.box<i32>>
 }
 
 // basically we are testing that there isn't a crash
-// CHECK-LABEL: define void @_QQmain
+// CHECK-LABEL: define ptr @_QQmain
 // CHECK-NEXT:    alloca { ptr, i64, i32, i8, i8, i8, i8 }, i64 1, align 8
diff --git a/flang/test/Fir/optional.fir b/flang/test/Fir/optional.fir
index bded8b5332a30..66ff69f083467 100644
--- a/flang/test/Fir/optional.fir
+++ b/flang/test/Fir/optional.fir
@@ -37,8 +37,7 @@ func.func @bar2() -> i1 {
 
 // CHECK-LABEL: @foo3
 func.func @foo3(%arg0: !fir.boxchar<1>) -> i1 {
-  // CHECK: %[[extract:.*]] = extractvalue { ptr, i64 } %{{.*}}, 0
-  // CHECK: %[[ptr:.*]] = ptrtoint ptr %[[extract]] to i64
+  // CHECK: %[[ptr:.*]] = ptrtoint ptr %0 to i64
   // CHECK: icmp ne i64 %[[ptr]], 0
   %0 = fir.is_present %arg0 : (!fir.boxchar<1>) -> i1
   return %0 : i1
diff --git a/flang/test/Fir/pdt.fir b/flang/test/Fir/pdt.fir
index a200cd7e7cc03..411927aae6bdf 100644
--- a/flang/test/Fir/pdt.fir
+++ b/flang/test/Fir/pdt.fir
@@ -96,13 +96,13 @@ func.func @_QTt1P.f2.offset(%0 : i32, %1 : i32) -> i32 {
 
 func.func private @bar(!fir.ref<!fir.char<1,?>>)
 
-// CHECK-LABEL: define void @_QPfoo(i32 %0, i32 %1)
-func.func @_QPfoo(%arg0 : i32, %arg1 : i32) {
+// CHECK-LABEL: define ptr @_QPfoo(i32 %0, i32 %1)
+func.func @_QPfoo(%arg0 : i32, %arg1 : i32) -> !fir.ref<!fir.type<_QTt1>> {
   // CHECK: %[[size:.*]] = call i64 @_QTt1P.mem.size(i32 %0, i32 %1)
   // CHECK: %[[alloc:.*]] = alloca i8, i64 %[[size]]
   %0 = fir.alloca !fir.type<_QTt1(p1:i32,p2:i32){f1:!fir.char<1,?>,f2:!fir.char<1,?>}>(%arg0, %arg1 : i32, i32)
   //%2 = fir.coordinate_of %0, f2 : (!fir.ref<!fir.type<_QTt1>>) -> !fir.ref<!fir.char<1,?>>
   %2 = fir.zero_bits !fir.ref<!fir.char<1,?>>
   fir.call @bar(%2) : (!fir.ref<!fir.char<1,?>>) -> ()
-  return
+  return %0 : !fir.ref<!fir.type<_QTt1>>
 }
diff --git a/flang/test/Fir/rebox.fir b/flang/test/Fir/rebox.fir
index 0c9f6d9bb94ad..d858adfb7c45d 100644
--- a/flang/test/Fir/rebox.fir
+++ b/flang/test/Fir/rebox.fir
@@ -36,7 +36,7 @@ func.func @test_rebox_1(%arg0: !fir.box<!fir.array<?x?xf32>>) {
   // CHECK: %[[VOIDBASE0:.*]] = getelementptr i8, ptr %[[INBASE]], i64 %[[OFFSET_0]]
   // CHECK: %[[OFFSET_1:.*]] = mul i64 2, %[[INSTRIDE_1]]
   // CHECK: %[[VOIDBASE1:.*]] = getelementptr i8, ptr %[[VOIDBASE0]], i64 %[[OFFSET_1]]
-  // CHECK: %[[OUTSTRIDE0:.*]] = mul i64 3, %[[INSTRIDE_1]]
+  // CHECK: %[[OUTSTRIDE0:.*]] = mul i64 %[[INSTRIDE_1]], 3
   // CHECK: %[[OUTBOX1:.*]] = insertvalue { ptr, i64, i32, i8, i8, i8, i8, [1 x [3 x i64]] } %{{.*}}, i64 %[[OUTSTRIDE0]], 7, 0, 2
   // CHECK: %[[OUTBOX2:.*]] = insertvalue { ptr, i64, i32, i8, i8, i8, i8, [1 x [3 x i64]] } %[[OUTBOX1]], ptr %[[VOIDBASE1]], 0
   // CHECK: store { ptr, i64, i32, i8, i8, i8, i8, [1 x [3 x i64]] } %[[OUTBOX2]], ptr %[[OUTBOX_ALLOC]], align 8
@@ -63,7 +63,7 @@ func.func @test_rebox_2(%arg0: !fir.box<!fir.array<?x?x!fir.char<1,?>>>) {
   // CHECK: %[[OUTBOX:.*]] = alloca { ptr, i64, i32, i8, i8, i8, i8, [2 x [3 x i64]] }
   // CHECK: %[[LEN_GEP:.*]] = getelementptr { ptr, i64, i32, i8, i8, i8, i8, [2 x [3 x i64]] }, ptr %[[INBOX]], i32 0, i32 1
   // CHECK: %[[LEN:...
[truncated]

@github-actions

github-actions bot commented Aug 26, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

Contributor

@tblah tblah left a comment

Why did you decide to do this in a pass instead of handling it in MLIR -> LLVM conversion as is done for omp task?

@bhandarkar-pranav
Contributor Author

> Why did you decide to do this in a pass instead of handling it in MLIR -> LLVM conversion as is done for omp task?

I did anticipate this question, especially because MLIR -> LLVM translation is where I first started out, with the intention of extending your work on omp.task to omp.target. In hindsight, I should have included the explanation in my commit message itself.

A couple of reasons make MLIR -> LLVM IR translation too late a point to do this. Too late not in the sense of impossible, but arguably harder to get correct and to maintain thereafter. Essentially, what we need to do is:

  1. Copy the privatized variable from the stack to the heap.
  2. Fix up any captures of the address of the privatized variable that are used by the omp.target. Typically, these would be omp.map.info operations.

At the time of MLIR -> LLVM conversion, the address of the private variable is "captured" into in-memory data structures such as those that process map operations (MapInfoData). MapInfoData is then used to codegen the arrays of pointers to be offloaded (offload_baseptrs and offload_ptrs).

Now, to allocate heap memory for the private variable, we'd have two options:

  1. Create the allocation after the omp.map.info operations are processed to create the MapInfoData data structures, but before OMPIRBuilder codegens offload_baseptrs and offload_ptrs. This would involve going back into the MapInfoData structures and replacing the pointers to private variables with the heap-allocated addresses.
  2. Create the allocation after the arrays of offloaded pointers have been created by OMPIRBuilder. In this case, we'd have to keep track of the relevant entries, then go back and create LLVM IR (.ll) to overwrite some index of offload_baseptrs with the heap-allocated address. For instance, this is roughly what it would look like if index 2 of a 4-element offload_baseptrs held the address of a private variable:
%priv_var = alloca ...
%0 = getelementptr [4 x ptr], ptr %offload_baseptrs, i32 0, i32 2
store ptr %priv_var, ptr %0
...
...
%priv_var_heap = call ptr @malloc(i64 %size_of_priv_var)
%1 = getelementptr [4 x ptr], ptr %offload_baseptrs, i32 0, i32 2
store ptr %priv_var_heap, ptr %1

This requires additional bookkeeping and coordination between OpenMPToIRTranslation and OMPIRBuilder to record that index 2 of offload_baseptrs needs to be updated just before the offloading call is made. I feel option 1 is better than option 2 because the data structures to be updated are not in LLVM IR, but simply in memory. But if we instead update the original omp.map.info operations themselves to use heap memory (i.e., do the allocation before MLIR -> LLVM IR translation), then the rest of the process proceeds smoothly without us having to go back and update anything. We achieve a clean separation of concerns this way.
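
The trade-off above can be pictured with a minimal C++ sketch. It is a toy model, not the real MLIR or OMPIRBuilder code: `MapInfo`, `buildOffloadBasePtrs`, `moveToHeap`, and `demo` are hypothetical names introduced only for illustration, and the sketch assumes that translation simply copies each mapped address into an array. Under that assumption, redirecting the map entries to a heap copy before translation means the generated array never needs to be patched afterwards.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Toy stand-in for an omp.map.info operation: it only records which host
// address is mapped. (Hypothetical type, not an MLIR class.)
struct MapInfo {
  const int *varPtr;
};

// Toy stand-in for the translation step that materializes the array of base
// pointers handed to the offloading call.
std::vector<const int *>
buildOffloadBasePtrs(const std::vector<MapInfo> &maps) {
  std::vector<const int *> basePtrs;
  basePtrs.reserve(maps.size());
  for (const MapInfo &m : maps)
    basePtrs.push_back(m.varPtr);
  return basePtrs;
}

// Redirect every map entry that refers to the stack variable to a heap copy
// *before* translation runs, so nothing has to be back-patched later.
std::unique_ptr<int> moveToHeap(std::vector<MapInfo> &maps,
                                const int *stackVar) {
  auto heapCopy = std::make_unique<int>(*stackVar);
  for (MapInfo &m : maps)
    if (m.varPtr == stackVar)
      m.varPtr = heapCopy.get();
  return heapCopy;
}

// Returns true if, after the rewrite, slot 2 of the translated array refers
// to the heap copy (with the same value) rather than to the stack variable.
bool demo() {
  int privVar = 42;
  int other = 7;
  std::vector<MapInfo> maps = {{&other}, {&other}, {&privVar}, {&other}};
  std::unique_ptr<int> heap = moveToHeap(maps, &privVar);
  std::vector<const int *> basePtrs = buildOffloadBasePtrs(maps);
  return basePtrs[2] == heap.get() && *basePtrs[2] == 42 &&
         basePtrs[2] != &privVar;
}
```

In this model the translation step (`buildOffloadBasePtrs`) is unaware that any rewrite happened, which is the separation of concerns argued for above.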

…get-tasks

This patch adds support for translation of the private clause on deferred
target tasks - that is `omp.target` operations with the `nowait` clause.

An offloading call for a deferred target-task is not blocking - the offloading
host task continues its execution after issuing the offloading call. Therefore,
the key problem we need to solve is to ensure that the data needed for private
variables to be initialized in the target task persists even after the host
task has completed.
We do this in a new pass called PrepareForOMPOffloadPrivatizationPass. For a privatized
variable that needs its host counterpart for initialization (such as the shape
of the data from the descriptor when an allocatable is privatized or the value of
the data when an allocatable is firstprivatized),
  - the pass allocates memory on the heap.
  - it then initializes this memory by copying the contents of the host variable to
    the newly allocated location on the heap.
  - Then, the pass updates all the `omp.map.info` operations that pointed to the
    host variable to now point to the one located in the heap.

The pass uses a rewrite pattern applied using the greedy pattern rewrite driver, which
in turn does some constant folding and DCE. Because of this, a number of lit tests
had to be updated. In GEPs, constants get folded into indices and truncated to
i32 types. In some tests, sequences of insertvalue and extractvalue instructions
cancel out. So, these needed to be updated too.
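
The insertvalue/extractvalue cancellation mentioned above can be pictured with a small C++ toy model. This is only a sketch under stated assumptions: `Aggregate`, `insertValue`, and `foldExtractValue` are hypothetical names, not LLVM or MLIR APIs, and the model tracks SSA names as strings. It mirrors the boxproc.fir change, where the `{ ptr, i64 }` pair built from `%VAL_23` and `10` is no longer materialized and the call receives both values directly.

```cpp
#include <cassert>
#include <map>
#include <string>

// Toy aggregate value: remembers which value was inserted at each index.
// (Hypothetical model; not an LLVM or MLIR data structure.)
struct Aggregate {
  std::map<unsigned, std::string> fields;
};

// Models `insertvalue`: returns a new aggregate with `value` at `index`.
Aggregate insertValue(Aggregate agg, unsigned index,
                      const std::string &value) {
  agg.fields[index] = value;
  return agg;
}

// Models folding of `extractvalue`: if the value at `index` is known from an
// earlier insert, return it directly. An empty string means nothing is known
// and a real extractvalue instruction would still be needed.
std::string foldExtractValue(const Aggregate &agg, unsigned index) {
  auto it = agg.fields.find(index);
  return it == agg.fields.end() ? std::string() : it->second;
}
```

For example, building the pair with `insertValue(insertValue(Aggregate{}, 0, "%VAL_23"), 1, "10")` and then calling `foldExtractValue` on index 0 and index 1 yields `%VAL_23` and `10`, so once every extract is folded the inserts have no remaining uses and can be removed as dead code.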
@bhandarkar-pranav bhandarkar-pranav force-pushed the flang/delayed_priv_def_tgt_tasks_translation branch from ff8afbd to c859bbc Compare August 26, 2025 22:16
@tblah
Contributor

tblah commented Aug 27, 2025

Ahh I see what you mean. This is different because as well as being (first)private, these variables may also be mapped, which adds another layer of complexity. I haven't followed much about mapping so I will take your word for it and leave it for experts in offloading to give their opinions.


// Allocate heap memory that corresponds to the type of memory
// pointed to by varPtr
// TODO: For boxchars this likely wont be a pointer.
Contributor

Yes, you can see in the code for tasks that boxchars are a hack, because you can't really have a !fir.ref<!fir.boxchar<>> in the FIR type system. This is handled for tasks, so you can see what I did there.

Contributor Author

Thank you, I'll take a look.

// Copy the value of the local variable into the heap-allocated location.
mlir::Location loc = chainOfOps.front()->getLoc();
mlir::Type varType = getElemType(varPtr);
auto loadVal = rewriter.create<LLVM::LoadOp>(loc, varType, varPtr);
Contributor

What about more complex types e.g. arrays, derived types?

For firstprivate you can use the copy region in the privatizer. For plain private you just need to use an init region to initialise non-trivial types but don't need to copy. This initialisation and copying must happen synchronously.

Contributor Author

You are right that there is a problem here. The problem is that derived types could have any number of pointers, and therefore deep copies will be needed. Are you suggesting that I use the copy region of the privatizer by cloning it to get the deep copy that I need? Before that, though, I'd have to allocate memory for each pointer inside the derived type. I was hoping to tackle derived types in a subsequent PR, which I should have made clear in this PR.

Contributor

If you intend to handle these in a later PR, please catch this and create a useful error message (similar to the not-yet-implemented messages generated from OpenMP MLIR dialect to LLVM IR conversion).

Yes, you can do the initial allocation/initialization for the derived type by inlining the init region from the privatizer. If it is firstprivate, then you can inline the copy region to get a type-appropriate copy. So far as I know, init+copy should be sufficient without any extra allocation.
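
As a rough illustration of the init+copy suggestion, here is a minimal C++ sketch. It is a toy analogy, not the privatizer implementation: `Derived`, `Privatizer`, `privatizeOnHeap`, `makePrivatizer`, and `makeHost` are hypothetical names, and the `init` and `copy` callbacks merely stand in for the init and copy regions. The assumption is that `init` sets up the private copy (including storage for its component) from the host variable's shape, and that `copy` exists only for firstprivate.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>

// Hypothetical derived type with one dynamically sized component.
struct Derived {
  std::size_t size = 0;
  std::unique_ptr<int[]> data;
};

// Toy privatizer: `init` shapes the private copy after the host variable;
// `copy` is left empty for plain private and set for firstprivate.
struct Privatizer {
  std::function<void(const Derived &host, Derived &priv)> init;
  std::function<void(const Derived &host, Derived &priv)> copy;
};

// Create the private copy on the heap, running init and then (if present)
// copy synchronously, before any asynchronous work would start.
std::unique_ptr<Derived> privatizeOnHeap(const Derived &host,
                                         const Privatizer &p) {
  auto priv = std::make_unique<Derived>();
  if (p.init)
    p.init(host, *priv);
  if (p.copy)
    p.copy(host, *priv);
  return priv;
}

Privatizer makePrivatizer(bool firstprivate) {
  Privatizer p;
  p.init = [](const Derived &host, Derived &priv) {
    priv.size = host.size;
    priv.data = std::make_unique<int[]>(host.size); // zero-initialized
  };
  if (firstprivate)
    p.copy = [](const Derived &host, Derived &priv) {
      std::copy(host.data.get(), host.data.get() + host.size,
                priv.data.get());
    };
  return p;
}

// Small host variable used in the usage note below.
Derived makeHost() {
  Derived d;
  d.size = 3;
  d.data = std::make_unique<int[]>(3);
  d.data[0] = 1;
  d.data[1] = 2;
  d.data[2] = 3;
  return d;
}
```

With `makePrivatizer(true)` the heap copy holds the host's values in its own storage (a deep copy). With `makePrivatizer(false)` only the shape is reproduced and the values are not copied.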

Member

@ergawy ergawy left a comment

Sorry for the delay here Pranav. Went through the whole PR. Awesome work, just a number of not so large comments.

@bhandarkar-pranav
Contributor Author

> Sorry for the delay here Pranav. Went through the whole PR. Awesome work, just a number of not so large comments.

No problem at all, thank you for the review. I'll take care of your comments today.

@bhandarkar-pranav
Contributor Author

@ergawy @tblah - I have made the changes you pointed out. Could you please give this a look again? Thank you

Member

@ergawy ergawy left a comment

LGTM! Thanks for taking care of my comments Pranav.

Contributor

@tblah tblah left a comment

LGTM. Thanks Pranav. Please wait for another approval before merging. I see you already have another approval since I started reviewing!

@bhandarkar-pranav bhandarkar-pranav merged commit e2ad554 into llvm:main Oct 22, 2025
10 checks passed
@llvm-ci
Collaborator

llvm-ci commented Oct 22, 2025

LLVM Buildbot has detected a new failure on builder sanitizer-x86_64-linux-fast running on sanitizer-buildbot4 while building flang,mlir at step 2 "annotate".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/169/builds/16264

Here is the relevant piece of the build log for the reference
Step 2 (annotate) failure: 'python ../sanitizer_buildbot/sanitizers/zorg/buildbot/builders/sanitizers/buildbot_selector.py' (failure)
...
llvm-lit: /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:531: note: using lld-link: /home/b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/bin/lld-link
llvm-lit: /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:531: note: using ld64.lld: /home/b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/bin/ld64.lld
llvm-lit: /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:531: note: using wasm-ld: /home/b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/bin/wasm-ld
llvm-lit: /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:531: note: using ld.lld: /home/b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/bin/ld.lld
llvm-lit: /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:531: note: using lld-link: /home/b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/bin/lld-link
llvm-lit: /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:531: note: using ld64.lld: /home/b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/bin/ld64.lld
llvm-lit: /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:531: note: using wasm-ld: /home/b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/bin/wasm-ld
llvm-lit: /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/utils/lit/lit/main.py:74: note: The test suite configuration requested an individual test timeout of 0 seconds but a timeout of 900 seconds was requested on the command line. Forcing timeout to be 900 seconds.
-- Testing: 93112 tests, 64 workers --
Testing:  0.. 10.. 20.. 30.. 40.. 50.. 60.. 70.. 80.. 90..
FAIL: MLIR :: Dialect/OpenMP/omp-offload-privatization-prepare.mlir (93103 of 93112)
******************** TEST 'MLIR :: Dialect/OpenMP/omp-offload-privatization-prepare.mlir' FAILED ********************
Exit Code: 2

Command Output (stdout):
--
# RUN: at line 1
/home/b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/bin/mlir-opt --mlir-disable-threading -omp-offload-privatization-prepare --split-input-file /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/mlir/test/Dialect/OpenMP/omp-offload-privatization-prepare.mlir | /home/b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/bin/FileCheck /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/mlir/test/Dialect/OpenMP/omp-offload-privatization-prepare.mlir
# executed command: /home/b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/bin/mlir-opt --mlir-disable-threading -omp-offload-privatization-prepare --split-input-file /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/mlir/test/Dialect/OpenMP/omp-offload-privatization-prepare.mlir
# .---command stderr------------
# | PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace and instructions to reproduce the bug.
# | Stack dump:
# | 0.	Program arguments: /home/b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/bin/mlir-opt --mlir-disable-threading -omp-offload-privatization-prepare --split-input-file /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/mlir/test/Dialect/OpenMP/omp-offload-privatization-prepare.mlir
# |  #0 0x00005b8f9043b6f6 ___interceptor_backtrace /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/compiler-rt/lib/asan/../sanitizer_common/sanitizer_common_interceptors.inc:4530:13
# |  #1 0x00005b8f906c8c18 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Support/Unix/Signals.inc:834:13
# |  #2 0x00005b8f906c23b9 llvm::sys::RunSignalHandlers() /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Support/Signals.cpp:0:5
# |  #3 0x00005b8f906cad2e SignalHandler(int, siginfo_t*, void*) /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Support/Unix/Signals.inc:426:38
# |  #4 0x0000767c276458d0 (/lib/x86_64-linux-gnu/libc.so.6+0x458d0)
# |  #5 0x0000767c276a49bc pthread_kill (/lib/x86_64-linux-gnu/libc.so.6+0xa49bc)
# |  #6 0x0000767c2764579e raise (/lib/x86_64-linux-gnu/libc.so.6+0x4579e)
# |  #7 0x0000767c276288cd abort (/lib/x86_64-linux-gnu/libc.so.6+0x288cd)
# |  #8 0x00005b8f904be3dc (/home/b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/bin/mlir-opt+0x134013dc)
# |  #9 0x00005b8f904bc27e __sanitizer::Die() /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/compiler-rt/lib/sanitizer_common/sanitizer_termination.cpp:52:5
# | #10 0x00005b8f9049cedb push_back /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/compiler-rt/lib/asan/../sanitizer_common/sanitizer_common.h:543:7
# | #11 0x00005b8f9049cedb __asan::ScopedInErrorReport::~ScopedInErrorReport() /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/compiler-rt/lib/asan/asan_report.cpp:193:29
# | #12 0x00005b8f9049ed6d __asan::ReportGenericError(unsigned long, unsigned long, unsigned long, unsigned long, bool, unsigned long, unsigned int, bool) /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/compiler-rt/lib/asan/asan_report.cpp:536:1
# | #13 0x00005b8f9049fab6 __asan_report_load4 /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/compiler-rt/lib/asan/asan_rtl.cpp:130:1
# | #14 0x00005b8f9e3ea3ba getTrailingObjectsImpl /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/include/llvm/Support/TrailingObjects.h:132:72
# | #15 0x00005b8f9e3ea3ba getTrailingObjects<mlir::detail::OpProperties> /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/include/llvm/Support/TrailingObjects.h:273:12
# | #16 0x00005b8f9e3ea3ba getTrailingObjects<mlir::detail::OpProperties> /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/include/llvm/Support/TrailingObjects.h:283:53
# | #17 0x00005b8f9e3ea3ba getPropertiesStorageUnsafe /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/mlir/include/mlir/IR/Operation.h:915:34
# | #18 0x00005b8f9e3ea3ba getProperties<mlir::omp::MapInfoOp> /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/mlir/include/mlir/IR/OpDefinition.h:1981:19
# | #19 0x00005b8f9e3ea3ba mlir::omp::MapInfoOp::getODSOperandIndexAndLength(unsigned int) /home/b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/tools/mlir/include/mlir/Dialect/OpenMP/OpenMPOps.cpp.inc:10133:40
# | #20 0x00005b8f921d35be getOperation /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/mlir/include/mlir/IR/OpDefinition.h:112:38
# | #21 0x00005b8f921d35be mlir::omp::MapInfoOp::getODSOperands(unsigned int) /home/b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/tools/mlir/include/mlir/Dialect/OpenMP/OpenMPOps.h.inc:7004:23
# | #22 0x00005b8f921cdd7c getMembers /home/b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/tools/mlir/include/mlir/Dialect/OpenMP/OpenMPOps.h.inc:0:12
# | #23 0x00005b8f921cdd7c (anonymous namespace)::PrepareForOMPOffloadPrivatizationPass::runOnOperation()::'lambda'(mlir::omp::TargetOp)::operator()(mlir::omp::TargetOp) const /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/mlir/lib/Dialect/OpenMP/Transforms/OpenMPOffloadPrivatizationPrepare.cpp:258:38
# | #24 0x00005b8f90bf5729 void mlir::detail::walk<mlir::ForwardIterator>(mlir::Operation*, llvm::function_ref<void (mlir::Operation*)>, mlir::WalkOrder) /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/mlir/include/mlir/IR/Visitors.h:0:9
# | #25 0x00005b8f90bf5729 void mlir::detail::walk<mlir::ForwardIterator>(mlir::Operation*, llvm::function_ref<void (mlir::Operation*)>, mlir::WalkOrder) /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/mlir/include/mlir/IR/Visitors.h:0:9
Step 10 (stage2/asan_ubsan check) failure: stage2/asan_ubsan check (failure)
...
llvm-lit: /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:531: note: using lld-link: /home/b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/bin/lld-link
llvm-lit: /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:531: note: using ld64.lld: /home/b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/bin/ld64.lld
llvm-lit: /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:531: note: using wasm-ld: /home/b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/bin/wasm-ld
llvm-lit: /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:531: note: using ld.lld: /home/b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/bin/ld.lld
llvm-lit: /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:531: note: using lld-link: /home/b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/bin/lld-link
llvm-lit: /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:531: note: using ld64.lld: /home/b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/bin/ld64.lld
llvm-lit: /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:531: note: using wasm-ld: /home/b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/bin/wasm-ld
llvm-lit: /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/utils/lit/lit/main.py:74: note: The test suite configuration requested an individual test timeout of 0 seconds but a timeout of 900 seconds was requested on the command line. Forcing timeout to be 900 seconds.
-- Testing: 93112 tests, 64 workers --
Testing:  0.. 10.. 20.. 30.. 40.. 50.. 60.. 70.. 80.. 90..
FAIL: MLIR :: Dialect/OpenMP/omp-offload-privatization-prepare.mlir (93103 of 93112)
******************** TEST 'MLIR :: Dialect/OpenMP/omp-offload-privatization-prepare.mlir' FAILED ********************
Exit Code: 2

Command Output (stdout):
--
# RUN: at line 1
/home/b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/bin/mlir-opt --mlir-disable-threading -omp-offload-privatization-prepare --split-input-file /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/mlir/test/Dialect/OpenMP/omp-offload-privatization-prepare.mlir | /home/b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/bin/FileCheck /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/mlir/test/Dialect/OpenMP/omp-offload-privatization-prepare.mlir
# executed command: /home/b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/bin/mlir-opt --mlir-disable-threading -omp-offload-privatization-prepare --split-input-file /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/mlir/test/Dialect/OpenMP/omp-offload-privatization-prepare.mlir
# .---command stderr------------
# | PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace and instructions to reproduce the bug.
# | Stack dump:
# | 0.	Program arguments: /home/b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/bin/mlir-opt --mlir-disable-threading -omp-offload-privatization-prepare --split-input-file /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/mlir/test/Dialect/OpenMP/omp-offload-privatization-prepare.mlir
# |  #0 0x00005b8f9043b6f6 ___interceptor_backtrace /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/compiler-rt/lib/asan/../sanitizer_common/sanitizer_common_interceptors.inc:4530:13
# |  #1 0x00005b8f906c8c18 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Support/Unix/Signals.inc:834:13
# |  #2 0x00005b8f906c23b9 llvm::sys::RunSignalHandlers() /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Support/Signals.cpp:0:5
# |  #3 0x00005b8f906cad2e SignalHandler(int, siginfo_t*, void*) /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Support/Unix/Signals.inc:426:38
# |  #4 0x0000767c276458d0 (/lib/x86_64-linux-gnu/libc.so.6+0x458d0)
# |  #5 0x0000767c276a49bc pthread_kill (/lib/x86_64-linux-gnu/libc.so.6+0xa49bc)
# |  #6 0x0000767c2764579e raise (/lib/x86_64-linux-gnu/libc.so.6+0x4579e)
# |  #7 0x0000767c276288cd abort (/lib/x86_64-linux-gnu/libc.so.6+0x288cd)
# |  #8 0x00005b8f904be3dc (/home/b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/bin/mlir-opt+0x134013dc)
# |  #9 0x00005b8f904bc27e __sanitizer::Die() /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/compiler-rt/lib/sanitizer_common/sanitizer_termination.cpp:52:5
# | #10 0x00005b8f9049cedb push_back /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/compiler-rt/lib/asan/../sanitizer_common/sanitizer_common.h:543:7
# | #11 0x00005b8f9049cedb __asan::ScopedInErrorReport::~ScopedInErrorReport() /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/compiler-rt/lib/asan/asan_report.cpp:193:29
# | #12 0x00005b8f9049ed6d __asan::ReportGenericError(unsigned long, unsigned long, unsigned long, unsigned long, bool, unsigned long, unsigned int, bool) /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/compiler-rt/lib/asan/asan_report.cpp:536:1
# | #13 0x00005b8f9049fab6 __asan_report_load4 /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/compiler-rt/lib/asan/asan_rtl.cpp:130:1
# | #14 0x00005b8f9e3ea3ba getTrailingObjectsImpl /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/include/llvm/Support/TrailingObjects.h:132:72
# | #15 0x00005b8f9e3ea3ba getTrailingObjects<mlir::detail::OpProperties> /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/include/llvm/Support/TrailingObjects.h:273:12
# | #16 0x00005b8f9e3ea3ba getTrailingObjects<mlir::detail::OpProperties> /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/include/llvm/Support/TrailingObjects.h:283:53
# | #17 0x00005b8f9e3ea3ba getPropertiesStorageUnsafe /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/mlir/include/mlir/IR/Operation.h:915:34
# | #18 0x00005b8f9e3ea3ba getProperties<mlir::omp::MapInfoOp> /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/mlir/include/mlir/IR/OpDefinition.h:1981:19
# | #19 0x00005b8f9e3ea3ba mlir::omp::MapInfoOp::getODSOperandIndexAndLength(unsigned int) /home/b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/tools/mlir/include/mlir/Dialect/OpenMP/OpenMPOps.cpp.inc:10133:40
# | #20 0x00005b8f921d35be getOperation /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/mlir/include/mlir/IR/OpDefinition.h:112:38
# | #21 0x00005b8f921d35be mlir::omp::MapInfoOp::getODSOperands(unsigned int) /home/b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/tools/mlir/include/mlir/Dialect/OpenMP/OpenMPOps.h.inc:7004:23
# | #22 0x00005b8f921cdd7c getMembers /home/b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/tools/mlir/include/mlir/Dialect/OpenMP/OpenMPOps.h.inc:0:12
# | #23 0x00005b8f921cdd7c (anonymous namespace)::PrepareForOMPOffloadPrivatizationPass::runOnOperation()::'lambda'(mlir::omp::TargetOp)::operator()(mlir::omp::TargetOp) const /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/mlir/lib/Dialect/OpenMP/Transforms/OpenMPOffloadPrivatizationPrepare.cpp:258:38
# | #24 0x00005b8f90bf5729 void mlir::detail::walk<mlir::ForwardIterator>(mlir::Operation*, llvm::function_ref<void (mlir::Operation*)>, mlir::WalkOrder) /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/mlir/include/mlir/IR/Visitors.h:0:9
# | #25 0x00005b8f90bf5729 void mlir::detail::walk<mlir::ForwardIterator>(mlir::Operation*, llvm::function_ref<void (mlir::Operation*)>, mlir::WalkOrder) /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/mlir/include/mlir/IR/Visitors.h:0:9
Step 14 (stage2/msan check) failure: stage2/msan check (failure)
...
llvm-lit: /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:531: note: using lld-link: /home/b/sanitizer-x86_64-linux-fast/build/llvm_build_msan/bin/lld-link
llvm-lit: /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:531: note: using ld64.lld: /home/b/sanitizer-x86_64-linux-fast/build/llvm_build_msan/bin/ld64.lld
llvm-lit: /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:531: note: using wasm-ld: /home/b/sanitizer-x86_64-linux-fast/build/llvm_build_msan/bin/wasm-ld
llvm-lit: /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:531: note: using ld.lld: /home/b/sanitizer-x86_64-linux-fast/build/llvm_build_msan/bin/ld.lld
llvm-lit: /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:531: note: using lld-link: /home/b/sanitizer-x86_64-linux-fast/build/llvm_build_msan/bin/lld-link
llvm-lit: /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:531: note: using ld64.lld: /home/b/sanitizer-x86_64-linux-fast/build/llvm_build_msan/bin/ld64.lld
llvm-lit: /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:531: note: using wasm-ld: /home/b/sanitizer-x86_64-linux-fast/build/llvm_build_msan/bin/wasm-ld
llvm-lit: /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/utils/lit/lit/main.py:74: note: The test suite configuration requested an individual test timeout of 0 seconds but a timeout of 900 seconds was requested on the command line. Forcing timeout to be 900 seconds.
-- Testing: 93109 tests, 64 workers --
Testing:  0.. 10.. 20.. 30.. 40.. 50.. 60.. 70.. 80.. 90..
FAIL: MLIR :: Dialect/OpenMP/omp-offload-privatization-prepare-by-value.mlir (93103 of 93109)
******************** TEST 'MLIR :: Dialect/OpenMP/omp-offload-privatization-prepare-by-value.mlir' FAILED ********************
Exit Code: 2

Command Output (stdout):
--
# RUN: at line 1
/home/b/sanitizer-x86_64-linux-fast/build/llvm_build_msan/bin/mlir-opt --mlir-disable-threading -omp-offload-privatization-prepare --split-input-file /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/mlir/test/Dialect/OpenMP/omp-offload-privatization-prepare-by-value.mlir | /home/b/sanitizer-x86_64-linux-fast/build/llvm_build_msan/bin/FileCheck /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/mlir/test/Dialect/OpenMP/omp-offload-privatization-prepare-by-value.mlir
# executed command: /home/b/sanitizer-x86_64-linux-fast/build/llvm_build_msan/bin/mlir-opt --mlir-disable-threading -omp-offload-privatization-prepare --split-input-file /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/mlir/test/Dialect/OpenMP/omp-offload-privatization-prepare-by-value.mlir
# .---command stderr------------
# | PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace and instructions to reproduce the bug.
# | Stack dump:
# | 0.	Program arguments: /home/b/sanitizer-x86_64-linux-fast/build/llvm_build_msan/bin/mlir-opt --mlir-disable-threading -omp-offload-privatization-prepare --split-input-file /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/mlir/test/Dialect/OpenMP/omp-offload-privatization-prepare-by-value.mlir
# |  #0 0x000055555df615f2 ___interceptor_backtrace /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/compiler-rt/lib/msan/../sanitizer_common/sanitizer_common_interceptors.inc:4530:13
# |  #1 0x000055555e09c2c6 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Support/Unix/Signals.inc:0:13
# |  #2 0x000055555e0997f8 llvm::sys::RunSignalHandlers() /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Support/Signals.cpp:0:5
# |  #3 0x000055555e09d687 SignalHandler(int, siginfo_t*, void*) /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Support/Unix/Signals.inc:426:38
# |  #4 0x000055555df9523e IsInInterceptorScope /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/compiler-rt/lib/msan/msan_interceptors.cpp:78:10
# |  #5 0x000055555df9523e SignalAction(int, void*, void*) /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/compiler-rt/lib/msan/msan_interceptors.cpp:1167:3
# |  #6 0x00007ffff7a458d0 (/lib/x86_64-linux-gnu/libc.so.6+0x458d0)
# |  #7 0x00007ffff7aa49bc pthread_kill (/lib/x86_64-linux-gnu/libc.so.6+0xa49bc)
# |  #8 0x00007ffff7a4579e raise (/lib/x86_64-linux-gnu/libc.so.6+0x4579e)
# |  #9 0x00007ffff7a288cd abort (/lib/x86_64-linux-gnu/libc.so.6+0x288cd)
# | #10 0x000055555df1f87c (/home/b/sanitizer-x86_64-linux-fast/build/llvm_build_msan/bin/mlir-opt+0x89cb87c)
# | #11 0x000055555df1d71e __sanitizer::Die() /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/compiler-rt/lib/sanitizer_common/sanitizer_termination.cpp:52:5
# | #12 0x000055555df34d63 (/home/b/sanitizer-x86_64-linux-fast/build/llvm_build_msan/bin/mlir-opt+0x89e0d63)
# | #13 0x000055555e26593d bool llvm::all_equal<llvm::ArrayRef<int>&>(llvm::ArrayRef<int>&) /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/include/llvm/ADT/STLExtras.h:2100:0
# | #14 0x00005555658b5be0 mlir::omp::MapInfoOp::getODSOperandIndexAndLength(unsigned int) /home/b/sanitizer-x86_64-linux-fast/build/llvm_build_msan/tools/mlir/include/mlir/Dialect/OpenMP/OpenMPOps.cpp.inc:10137:11
# | #15 0x000055555eeea33f mlir::omp::MapInfoOp::getODSOperands(unsigned int) /home/b/sanitizer-x86_64-linux-fast/build/llvm_build_msan/tools/mlir/include/mlir/Dialect/OpenMP/OpenMPOps.h.inc:0:23
# | #16 0x000055555eee6e56 operator() /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/mlir/lib/Dialect/OpenMP/Transforms/OpenMPOffloadPrivatizationPrepare.cpp:258:38
# | #17 0x000055555eee6e56 operator() /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/mlir/include/mlir/IR/Visitors.h:304:7
# | #18 0x000055555eee6e56 void llvm::function_ref<void (mlir::Operation*)>::callback_fn<std::__1::enable_if<!llvm::is_one_of<mlir::omp::TargetOp, mlir::Operation*, mlir::Region*, mlir::Block*>::value && std::is_same<void, void>::value, void>::type mlir::detail::walk<(mlir::WalkOrder)1, mlir::ForwardIterator, (anonymous namespace)::PrepareForOMPOffloadPrivatizationPass::runOnOperation()::'lambda'(mlir::omp::TargetOp), mlir::omp::TargetOp, void>(mlir::Operation*, (anonymous namespace)::PrepareForOMPOffloadPrivatizationPass::runOnOperation()::'lambda'(mlir::omp::TargetOp)&&)::'lambda'(mlir::Operation*)>(long, mlir::Operation*) /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/include/llvm/ADT/STLFunctionalExtras.h:46:12
# | #19 0x000055555e37a21e void mlir::detail::walk<mlir::ForwardIterator>(mlir::Operation*, llvm::function_ref<void (mlir::Operation*)>, mlir::WalkOrder) /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/mlir/include/mlir/IR/Visitors.h:153:1
# | #20 0x000055555e37a1db void mlir::detail::walk<mlir::ForwardIterator>(mlir::Operation*, llvm::function_ref<void (mlir::Operation*)>, mlir::WalkOrder) /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/mlir/include/mlir/IR/Visitors.h:145:27
# | #21 0x000055555e37a1db void mlir::detail::walk<mlir::ForwardIterator>(mlir::Operation*, llvm::function_ref<void (mlir::Operation*)>, mlir::WalkOrder) /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/mlir/include/mlir/IR/Visitors.h:145:27
# | #22 0x000055555eee4b97 (anonymous namespace)::PrepareForOMPOffloadPrivatizationPass::runOnOperation() /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/mlir/lib/Dialect/OpenMP/Transforms/OpenMPOffloadPrivatizationPrepare.cpp:336:3
# | #23 0x00005555695be9ec operator() /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/mlir/lib/Pass/Pass.cpp:610:22
# | #24 0x00005555695be9ec void llvm::function_ref<void ()>::callback_fn<mlir::detail::OpToOpPassAdaptor::run(mlir::Pass*, mlir::Operation*, mlir::AnalysisManager, bool, unsigned int)::$_3>(long) /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/include/llvm/ADT/STLFunctionalExtras.h:46:12
# | #25 0x00005555695b4385 mlir::detail::OpToOpPassAdaptor::run(mlir::Pass*, mlir::Operation*, mlir::AnalysisManager, bool, unsigned int) /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/mlir/lib/Pass/Pass.cpp:0:21

rewriter.setInsertionPoint(cloneModifyAndErase(mapInfoOp));

// Fix any members that may use varPtr to now use heapMem
for (auto member : mapInfoOp.getMembers()) {

I think there is a heap use-after-free here: mapInfoOp should not be accessed here because cloneModifyAndErase (line 255) erased it.

(This was identified by a buildbot: https://lab.llvm.org/buildbot/#/builders/52/builds/12179/steps/12/logs/stdio)

cc @fmayer

Proposed fix: #164712

Thank you for the fix @thurstond

thurstond added a commit to thurstond/llvm-project that referenced this pull request Oct 22, 2025
mapInfoOp should not be accessed here because cloneModifyAndErase (line 255) erased it. Fix the issue by replacing mapInfoOp with the cloned op.
thurstond added a commit that referenced this pull request Oct 22, 2025
`mapInfoOp.getMembers()` on line 258 is use-after-free, because
cloneModifyAndErase (line 255) erased `mapInfoOp`. Fix the issue by
replacing subsequent `mapInfoOp` usages with `clonedOp`.

Similarly, update `memberMapInfoOp` to avoid subsequent use-after-free.
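
The pattern behind this fix can be sketched without MLIR. The following is a minimal, self-contained illustration, not the actual pass code: `ToyOp`, `ToyBlock`, and `countMembersAfterRewrite` are hypothetical stand-ins, and this `cloneModifyAndErase` only mimics the behavior described above (clone, rewrite, erase the original). The point is that every later access goes through the returned clone.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for an operation owned by its parent block.
struct ToyOp {
  std::string varPtr;
  std::vector<std::string> members;
};

// Hypothetical stand-in for a block that owns its operations.
struct ToyBlock {
  std::vector<std::unique_ptr<ToyOp>> ops;

  ToyOp *create(std::string varPtr, std::vector<std::string> members) {
    ops.push_back(std::make_unique<ToyOp>(
        ToyOp{std::move(varPtr), std::move(members)}));
    return ops.back().get();
  }

  // Destroys `op`; any pointer to it dangles afterwards.
  void erase(ToyOp *op) {
    ops.erase(std::remove_if(ops.begin(), ops.end(),
                             [op](const std::unique_ptr<ToyOp> &p) {
                               return p.get() == op;
                             }),
              ops.end());
  }
};

// Clones `op` with a new varPtr, erases the original, and returns the clone.
ToyOp *cloneModifyAndErase(ToyBlock &block, ToyOp *op,
                           const std::string &heapMem) {
  ToyOp *cloned = block.create(heapMem, op->members);
  block.erase(op); // `op` must not be used after this line
  return cloned;
}

std::size_t countMembersAfterRewrite(ToyBlock &block, ToyOp *mapInfoOp) {
  ToyOp *clonedOp = cloneModifyAndErase(block, mapInfoOp, "heapMem");
  // Reading mapInfoOp->members here would be a use-after-free.
  return clonedOp->members.size();
}
```

Calling `countMembersAfterRewrite` on a block holding one op leaves the block with a single op whose `varPtr` is `"heapMem"`, and the original pointer is never dereferenced after the erase.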
mikolaj-pirog pushed a commit to mikolaj-pirog/llvm-project that referenced this pull request Oct 23, 2025
…4712)

cota added a commit to cota/llvm-project that referenced this pull request Oct 23, 2025
…llvm#155348

This fixes strict weak ordering checks violations from llvm#155348
when running these two tests:

    mlir/test/Dialect/OpenMP/omp-offload-privatization-prepare.mlir
    mlir/test/Dialect/OpenMP/omp-offload-privatization-prepare-by-value.mlir

Sample error:

    /stable/src/libcxx/include/__debug_utils/strict_weak_ordering_check.h:50: libc++ Hardening assertion !__comp(*(__first + __a), *(__first + __b)) failed: Your comparator is not a valid strict-weak ordering

This is because (x < x) should be false, not true, to meet
the irreflexivity property. (Note that .dominates(x, x) returns true.)

I'm afraid that even after this commit we can't guarantee a
strict weak ordering, because we can't guarantee transitivity of
equivalence by sorting with a strict dominance function. However the tests
are not failing anymore, and I am not at all familiar with this code so
I will leave this concern up to the original author for consideration.
(Ideas without any further context: I would consider a topological sort
or walking a dominator tree.)

Reference on std::sort and strict weak ordering:
  https://danlark.org/2022/04/20/changing-stdsort-at-googles-scale-and-beyond/
cota added a commit that referenced this pull request Oct 24, 2025
…#155348 (#164833)

This fixes strict weak ordering checks violations from #155348 when
running these two tests:

    mlir/test/Dialect/OpenMP/omp-offload-privatization-prepare.mlir
    mlir/test/Dialect/OpenMP/omp-offload-privatization-prepare-by-value.mlir

Sample error:

    /stable/src/libcxx/include/__debug_utils/strict_weak_ordering_check.h:50: libc++ Hardening assertion !__comp(*(__first + __a), *(__first + __b)) failed: Your comparator is not a valid strict-weak ordering

This is because (x < x) should be false, not true, to meet the
irreflexivity property. (Note that .dominates(x, x) returns true.)

I'm afraid that even after this commit we can't guarantee a strict weak
ordering, because we can't guarantee transitivity of equivalence by
sorting with a strict dominance function. However the tests are not
failing anymore, and I am not at all familiar with this code so I will
leave this concern up to the original author for consideration. (Ideas
without any further context: I would consider a topological sort or
walking a dominator tree.)

Reference on std::sort and strict weak ordering:

  https://danlark.org/2022/04/20/changing-stdsort-at-googles-scale-and-beyond/
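
Both points in this commit message can be checked on a toy example. The sketch below is an illustration under stated assumptions, not MLIR's `DominanceInfo`: `ToyDomTree` is a hypothetical parent-array tree, and its `dominates` / `properlyDominates` methods only mimic the reflexive and strict relations discussed above. It shows that the reflexive relation fails irreflexivity, and that even the strict relation leaves "incomparable" non-transitive.

```cpp
#include <cassert>
#include <vector>

// Hypothetical toy dominator tree: parent[i] is the immediate dominator of
// node i, and the root is its own parent.
struct ToyDomTree {
  std::vector<int> parent;

  // Reflexive relation, like the dominates(x, x) == true noted above.
  bool dominates(int a, int b) const {
    while (true) {
      if (a == b)
        return true;
      if (parent[b] == b)
        return false; // reached the root without meeting a
      b = parent[b];
    }
  }

  // Strict variant: a node never properly dominates itself.
  bool properlyDominates(int a, int b) const {
    return a != b && dominates(a, b);
  }

  // Two nodes count as equivalent for a sort when neither orders first.
  bool incomparable(int a, int b) const {
    return !properlyDominates(a, b) && !properlyDominates(b, a);
  }
};

// Returns true when cmp(x, x) is false for every node, as std::sort requires.
template <typename Cmp>
bool isIrreflexive(const ToyDomTree &tree, Cmp cmp) {
  for (int n = 0; n < static_cast<int>(tree.parent.size()); ++n)
    if (cmp(n, n))
      return false;
  return true;
}
```

With the tree `{0, 0, 0, 1}` (node 0 is the root, nodes 1 and 2 are its children, node 3 is a child of 1), nodes 3 and 2 are incomparable and so are 2 and 1, yet 1 properly dominates 3. That is the transitivity-of-equivalence gap the message warns about, which is why a topological sort or a dominator-tree walk may be the more robust choice.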
dvbuka pushed a commit to dvbuka/llvm-project that referenced this pull request Oct 27, 2025
…get-tasks (llvm#155348)

This PR adds support for translation of the private clause on deferred
target tasks - that is `omp.target` operations with the `nowait` clause.

An offloading call for a deferred target-task is not blocking - the
offloading (target-generating) host task continues its execution after issuing the offloading
call. Therefore, the key problem we need to solve is to ensure that the
data needed for private variables to be initialized in the target task
persists even after the host task has completed.
We do this in a new pass called `PrepareForOMPOffloadPrivatizationPass`.
For a privatized variable that needs its host counterpart for
initialization (such as the shape of the data from the descriptor when
an allocatable is privatized or the value of the data when an
allocatable is firstprivatized),
- the pass allocates memory on the heap.
- it then initializes this memory by using the `init` and `copy` (for
  firstprivate) regions of the corresponding `omp::PrivateClauseOp`.
- Finally, the memory allocated on the heap is freed using the `dealloc`
  region of the same `omp::PrivateClauseOp` instance. This step is not
  straightforward though, because we cannot simply free the memory that's
  going to be used by another thread without any synchronization. So, for
  deallocation, we create an `omp.task` after the `omp.target` and
  synchronize the two with a dummy dependency (using the `depend` clause).
  In this newly created `omp.task` we do the deallocation.
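
The three steps above can be modeled in plain C++. This is a rough analogy only, assuming a single-threaded toy scheduler in place of the OpenMP runtime; `ToyTask`, `runAll`, `Outcome`, and `deferredTargetWithCleanup` are hypothetical names, not part of the pass. The scheduler deliberately prefers the most recently created ready task, so only the explicit dependency keeps the deallocation after the target task.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical task record: a body plus the indices of tasks it depends on.
struct ToyTask {
  std::function<void()> body;
  std::vector<std::size_t> dependsOn;
};

// Toy scheduler: always runs the newest task whose dependencies are done.
void runAll(std::vector<ToyTask> &tasks) {
  std::vector<bool> done(tasks.size(), false);
  for (std::size_t finished = 0; finished < tasks.size();) {
    bool progressed = false;
    for (std::size_t i = tasks.size(); i-- > 0;) {
      if (done[i])
        continue;
      bool ready = true;
      for (std::size_t dep : tasks[i].dependsOn)
        ready = ready && done[dep];
      if (!ready)
        continue;
      tasks[i].body();
      done[i] = true;
      ++finished;
      progressed = true;
      break;
    }
    assert(progressed && "dependency cycle");
  }
}

struct Outcome {
  int observed = 0;
  std::vector<std::string> order;
};

Outcome deferredTargetWithCleanup() {
  Outcome out;
  std::vector<ToyTask> tasks;
  {
    int hostValue = 42;                 // host variable with a short lifetime
    int *heapCopy = new int(hostValue); // allocate on the heap and initialize
    // Deferred "target" task: reads the heap copy, not the host variable.
    tasks.push_back({[&out, heapCopy] {
                       out.observed = *heapCopy;
                       out.order.push_back("target");
                     },
                     {}});
    // Cleanup task: depends on task 0, so the free cannot run first.
    tasks.push_back({[&out, heapCopy] {
                       delete heapCopy;
                       out.order.push_back("dealloc");
                     },
                     {std::size_t{0}}});
  } // hostValue is gone before any task runs
  runAll(tasks);
  return out;
}
```

Without the `dependsOn` entry on the cleanup task, this scheduler would run the deallocation first and the target task would read freed memory, which is the hazard the dummy dependency is meant to rule out.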
dvbuka pushed a commit to dvbuka/llvm-project that referenced this pull request Oct 27, 2025
…4712)

dvbuka pushed a commit to dvbuka/llvm-project that referenced this pull request Oct 27, 2025
…llvm#155348 (llvm#164833)

Lukacma pushed a commit to Lukacma/llvm-project that referenced this pull request Oct 29, 2025
…get-tasks (llvm#155348)

Lukacma pushed a commit to Lukacma/llvm-project that referenced this pull request Oct 29, 2025
…4712)

Lukacma pushed a commit to Lukacma/llvm-project that referenced this pull request Oct 29, 2025
…llvm#155348 (llvm#164833)

aokblast pushed a commit to aokblast/llvm-project that referenced this pull request Oct 30, 2025
…get-tasks (llvm#155348)

aokblast pushed a commit to aokblast/llvm-project that referenced this pull request Oct 30, 2025
…4712)

aokblast pushed a commit to aokblast/llvm-project that referenced this pull request Oct 30, 2025
…llvm#155348 (llvm#164833)

Labels

flang:driver, flang:fir-hlfir, flang:openmp, flang, mlir:core, mlir:llvm, mlir:openmp, mlir

8 participants