-
Notifications
You must be signed in to change notification settings - Fork 14.6k
[Offload] Introduce ATTACH map-type support for pointer attachment. #149036
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
[Offload] Introduce ATTACH map-type support for pointer attachment. #149036
Conversation
This patch introduces libomptarget support for the ATTACH map-type, which can be used to implement OpenMP conditional compliant pointer attachment, based on whether the pointer/pointee is newly mapped on a given construct. For example, for the following: ```c int *p; #pragma omp target enter data map(p[1:10]) ``` The following maps can be emitted by clang: ``` (A) &p[0], &p[1], 10 * sizeof(p[1]), TO | FROM &p, &p[1], sizeof(p), ATTACH ``` Without this map-type, the two possible maps emitted by clang: ``` (B) &p[0], &p[1], 10 * sizeof(p[1]), TO | FROM (C) &p, &p[1], 10 * sizeof(p[1]), TO | FROM | PTR_AND_OBJ ```` (B) does not perform any pointer attachment, while (C) also maps the pointer p, which are both incorrect. In terms of implementation, maps with the ATTACH map-type are handled after all other maps have been processed, as it requires knowledge of which new allocations happened as part of the construct. As per OpenMP 5.0, an attachment should happen only when either the pointer or the pointee was newly mapped while handling the construct. Maps with ATTACH map-type-bit do not increase/decrease the ref-count. With OpenMP 6.1, `attach(always/never)` can be used to force/prevent attachment. For `attach(always)`, the compiler will insert the ALWAYS map-type, which would let libomptarget bypass the check about one of the pointer/pointee being new. With `attach(never)`, the ATTACH map will not be emitted at all. The size argument of the ATTACH map-type can specify values greater than `sizeof(void*)` which can be used to support pointer attachment on Fortran descriptors. Note that this also requires shadow-pointer tracking to also support them. That has not been implemented in this patch. This was worked upon in coordination with Ravi Narayanaswamy, who has since retired. Happy retirement, Ravi!
✅ With the latest revision this PR passed the C/C++ code formatter. |
Out of interest, will this change break any of the existing possible mappings clang used to emit (such as B and C in your examples in the description)? I ask as Flang's mappings are based on the way Clang currently handles its mapping, so if this breaks any pre-existing map behaviour, then there's a possibility it'll break a chunk of Flang OpenMP's offload capabilities. It's of course more than fine if it does, but would be good to know ahead of time if there's things that we'd need to address on the Flang side immediately or if we can more slowly phase in the attach system once this PR has landed :-) There's a chunk of fortran OpenMP map tests in check-offload (not a comprehensive set, but it is a good cursory view on if anything might be broken by the changeset) that can be ran to help verify if things are reasonably okay. As an unrelated side note, would be nice if we could align the mapping systems in Flang/Clang one day... |
This change will only affect cases where the compiler emits ATTACH maps. So, the existing PTR_AND_OBJ maps still work as-is. The only potential impact is that we will be keeping track of any new allocations and their sizes in a hashmap. But that information will not be used if no ATTACH map is encountered. The initial PR for emitting ATTACH maps for clang (abhinavgaba#1), which this change was verified with, only uses the ATTACH maps for when the base-pointer that is eligible for attachment is a scalar variable that's not a member of a struct (like the example in the description). For other cases, like: And the map from case |
Thank you very much for clarifying! Excited for the addition, just not so much any potential work moving flang over to it ;-) but has to be done and I imagine it'll give us more flexibility. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @adurang, for adding the plugin interface/changes.
if (IsFirstPointerAttachment) { | ||
IsFirstPointerAttachment = false; | ||
DP("Inserting a data fence before the first pointer attachment.\n"); | ||
Ret = Device.dataFence(AsyncInfo); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can be delayed until just before the submitData
call in performPointerAttachment
, but the code-flow would become a bit uglier and less obvious to read, as the flag would have to be passed by reference into performPointerAttachment
, and set to false
in there.
@llvm/pr-subscribers-offload @llvm/pr-subscribers-backend-amdgpu Author: Abhinav Gaba (abhinavgaba) ChangesThis patch introduces libomptarget support for the ATTACH map-type, which can be used to implement OpenMP conditional compliant pointer attachment, based on whether the pointer/pointee is newly mapped on a given construct. For example, for the following: int *p;
#pragma omp target enter data map(p[1:10]) The following maps can be emitted by clang:
Without this map-type, these two possible maps could be emitted by clang:
(B) does not perform any pointer attachment, while (C) also maps the pointer p, which are both incorrect. In terms of implementation, maps with the ATTACH map-type are handled after all other maps have been processed, as it requires knowledge of which new allocations happened as part of the construct. As per OpenMP 5.0, an attachment should happen only when either the pointer or the pointee was newly mapped while handling the construct. Maps with ATTACH map-type-bit do not increase/decrease the ref-count. With OpenMP 6.1, The size argument of the ATTACH map-type can specify values greater than This was worked upon in coordination with Ravi Narayanaswamy, who has since retired. Happy retirement, Ravi! Co-authored-by: Alex Duran <[email protected]> Patch is 33.17 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/149036.diff 11 Files Affected:
diff --git a/offload/include/OpenMP/Mapping.h b/offload/include/OpenMP/Mapping.h
index b9f5c16582931..93c1e56905ae4 100644
--- a/offload/include/OpenMP/Mapping.h
+++ b/offload/include/OpenMP/Mapping.h
@@ -417,12 +417,42 @@ struct MapperComponentsTy {
typedef void (*MapperFuncPtrTy)(void *, void *, void *, int64_t, int64_t,
void *);
+/// Structure to store information about a single ATTACH map entry.
+struct AttachMapInfo {
+ void *PointerBase;
+ void *PointeeBegin;
+ int64_t PointerSize;
+ int64_t MapType;
+ map_var_info_t Pointername;
+
+ AttachMapInfo(void *PointerBase, void *PointeeBegin, int64_t Size,
+ int64_t Type, map_var_info_t Name)
+ : PointerBase(PointerBase), PointeeBegin(PointeeBegin), PointerSize(Size),
+ MapType(Type), Pointername(Name) {}
+};
+
+/// Structure to track ATTACH entries and new allocations across recursive calls
+/// (for handling mappers) to targetDataBegin for a given construct.
+struct AttachInfoTy {
+ /// ATTACH map entries for deferred processing.
+ llvm::SmallVector<AttachMapInfo> AttachEntries;
+
+ /// Key: host pointer, Value: allocation size.
+ llvm::DenseMap<void *, int64_t> NewAllocations;
+
+ AttachInfoTy() = default;
+
+ // Delete copy constructor and copy assignment operator to prevent copying
+ AttachInfoTy(const AttachInfoTy &) = delete;
+ AttachInfoTy &operator=(const AttachInfoTy &) = delete;
+};
+
// Function pointer type for targetData* functions (targetDataBegin,
// targetDataEnd and targetDataUpdate).
typedef int (*TargetDataFuncPtrTy)(ident_t *, DeviceTy &, int32_t, void **,
void **, int64_t *, int64_t *,
map_var_info_t *, void **, AsyncInfoTy &,
- bool);
+ AttachInfoTy *, bool);
void dumpTargetPointerMappings(const ident_t *Loc, DeviceTy &Device,
bool toStdOut = false);
@@ -431,20 +461,26 @@ int targetDataBegin(ident_t *Loc, DeviceTy &Device, int32_t ArgNum,
void **ArgsBase, void **Args, int64_t *ArgSizes,
int64_t *ArgTypes, map_var_info_t *ArgNames,
void **ArgMappers, AsyncInfoTy &AsyncInfo,
+ AttachInfoTy *AttachInfo = nullptr,
bool FromMapper = false);
int targetDataEnd(ident_t *Loc, DeviceTy &Device, int32_t ArgNum,
void **ArgBases, void **Args, int64_t *ArgSizes,
int64_t *ArgTypes, map_var_info_t *ArgNames,
void **ArgMappers, AsyncInfoTy &AsyncInfo,
- bool FromMapper = false);
+ AttachInfoTy *AttachInfo = nullptr, bool FromMapper = false);
int targetDataUpdate(ident_t *Loc, DeviceTy &Device, int32_t ArgNum,
void **ArgsBase, void **Args, int64_t *ArgSizes,
int64_t *ArgTypes, map_var_info_t *ArgNames,
void **ArgMappers, AsyncInfoTy &AsyncInfo,
+ AttachInfoTy *AttachInfo = nullptr,
bool FromMapper = false);
+// Process deferred ATTACH map entries collected during targetDataBegin.
+int processAttachEntries(DeviceTy &Device, AttachInfoTy &AttachInfo,
+ AsyncInfoTy &AsyncInfo);
+
struct MappingInfoTy {
MappingInfoTy(DeviceTy &Device) : Device(Device) {}
diff --git a/offload/include/device.h b/offload/include/device.h
index f4b10abbaa3fd..1e85bb1876c83 100644
--- a/offload/include/device.h
+++ b/offload/include/device.h
@@ -98,6 +98,10 @@ struct DeviceTy {
int32_t dataExchange(void *SrcPtr, DeviceTy &DstDev, void *DstPtr,
int64_t Size, AsyncInfoTy &AsyncInfo);
+ // Insert a data fence between previous data operations and the following
+ // operations if necessary for the device.
+ int32_t dataFence(AsyncInfoTy &AsyncInfo);
+
/// Notify the plugin about a new mapping starting at the host address
/// \p HstPtr and \p Size bytes.
int32_t notifyDataMapped(void *HstPtr, int64_t Size);
diff --git a/offload/include/omptarget.h b/offload/include/omptarget.h
index 6971780c7bdb5..9e4bfd2f9cfbe 100644
--- a/offload/include/omptarget.h
+++ b/offload/include/omptarget.h
@@ -80,6 +80,9 @@ enum tgt_map_type {
// the structured region
// This is an OpenMP extension for the sake of OpenACC support.
OMP_TGT_MAPTYPE_OMPX_HOLD = 0x2000,
+ // Attach pointer and pointee, after processing all other maps.
+ // Applicable to map-entering directives. Does not change ref-count.
+ OMP_TGT_MAPTYPE_ATTACH = 0x4000,
// descriptor for non-contiguous target-update
OMP_TGT_MAPTYPE_NON_CONTIG = 0x100000000000,
// member of struct, member given by [16 MSBs] - 1
diff --git a/offload/libomptarget/device.cpp b/offload/libomptarget/device.cpp
index f88e30ae9e76b..6585286bf4285 100644
--- a/offload/libomptarget/device.cpp
+++ b/offload/libomptarget/device.cpp
@@ -191,6 +191,10 @@ int32_t DeviceTy::dataExchange(void *SrcPtr, DeviceTy &DstDev, void *DstPtr,
DstPtr, Size, AsyncInfo);
}
+int32_t DeviceTy::dataFence(AsyncInfoTy &AsyncInfo) {
+ return RTL->data_fence(RTLDeviceID, AsyncInfo);
+}
+
int32_t DeviceTy::notifyDataMapped(void *HstPtr, int64_t Size) {
DP("Notifying about new mapping: HstPtr=" DPxMOD ", Size=%" PRId64 "\n",
DPxPTR(HstPtr), Size);
diff --git a/offload/libomptarget/interface.cpp b/offload/libomptarget/interface.cpp
index ea354400f2e99..1a65262f9dcda 100644
--- a/offload/libomptarget/interface.cpp
+++ b/offload/libomptarget/interface.cpp
@@ -165,12 +165,27 @@ targetData(ident_t *Loc, int64_t DeviceId, int32_t ArgNum, void **ArgsBase,
OMPT_GET_RETURN_ADDRESS);)
int Rc = OFFLOAD_SUCCESS;
+
+ // Only allocate AttachInfo for targetDataBegin
+ AttachInfoTy *AttachInfo = nullptr;
+ if (TargetDataFunction == targetDataBegin)
+ AttachInfo = new AttachInfoTy();
+
Rc = TargetDataFunction(Loc, *DeviceOrErr, ArgNum, ArgsBase, Args, ArgSizes,
- ArgTypes, ArgNames, ArgMappers, AsyncInfo,
- false /*FromMapper=*/);
+ ArgTypes, ArgNames, ArgMappers, AsyncInfo, AttachInfo,
+ /*FromMapper=*/false);
- if (Rc == OFFLOAD_SUCCESS)
- Rc = AsyncInfo.synchronize();
+ if (Rc == OFFLOAD_SUCCESS) {
+ // Process deferred ATTACH entries BEFORE synchronization
+ if (AttachInfo && !AttachInfo->AttachEntries.empty())
+ Rc = processAttachEntries(*DeviceOrErr, *AttachInfo, AsyncInfo);
+
+ if (Rc == OFFLOAD_SUCCESS)
+ Rc = AsyncInfo.synchronize();
+ }
+
+ if (AttachInfo)
+ delete AttachInfo;
handleTargetOutcome(Rc == OFFLOAD_SUCCESS, Loc);
}
diff --git a/offload/libomptarget/omptarget.cpp b/offload/libomptarget/omptarget.cpp
index 5b25d955dd320..eebfa340c8472 100644
--- a/offload/libomptarget/omptarget.cpp
+++ b/offload/libomptarget/omptarget.cpp
@@ -293,7 +293,8 @@ void targetUnlockExplicit(void *HostPtr, int DeviceNum, const char *Name) {
int targetDataMapper(ident_t *Loc, DeviceTy &Device, void *ArgBase, void *Arg,
int64_t ArgSize, int64_t ArgType, map_var_info_t ArgNames,
void *ArgMapper, AsyncInfoTy &AsyncInfo,
- TargetDataFuncPtrTy TargetDataFunction) {
+ TargetDataFuncPtrTy TargetDataFunction,
+ AttachInfoTy *AttachInfo = nullptr) {
DP("Calling the mapper function " DPxMOD "\n", DPxPTR(ArgMapper));
// The mapper function fills up Components.
@@ -324,17 +325,172 @@ int targetDataMapper(ident_t *Loc, DeviceTy &Device, void *ArgBase, void *Arg,
MapperArgsBase.data(), MapperArgs.data(),
MapperArgSizes.data(), MapperArgTypes.data(),
MapperArgNames.data(), /*arg_mappers*/ nullptr,
- AsyncInfo, /*FromMapper=*/true);
+ AsyncInfo, AttachInfo, /*FromMapper=*/true);
return Rc;
}
+/// Utility function to perform a pointer attachment operation.
+///
+/// For something like:
+/// ```cpp
+/// int *p;
+/// ...
+/// #pragma omp target enter data map(to:p[10:10])
+/// ```
+///
+/// for which the attachment operation gets represented using:
+/// ```
+/// &p, &p[10], sizeof(p), ATTACH
+/// ```
+///
+/// (Hst|Tgt)PtrAddr represents &p
+/// (Hst|Tgt)PteeBase represents &p[0]
+/// (Hst|Tgt)PteeBegin represents &p[10]
+///
+/// This function first computes the expected TgtPteeBase using:
+/// `<Select>TgtPteeBase = TgtPteeBegin - (HstPteeBegin - HstPteeBase)`
+///
+/// and then attaches TgtPteeBase to TgtPtrAddr.
+///
+/// \p HstPtrSize represents the size of the pointer p. For C/C++, this
+/// should be same as "sizeof(void*)" (say 8).
+///
+/// However, for Fortran, pointers/allocatables, which are also eligible for
+/// "pointer-attachment", may be implemented using descriptors that contain the
+/// address of the pointee in the first 8 bytes, but also contain other
+/// information such as lower-bound/upper-bound etc in their subsequent fields.
+///
+/// For example, for the following:
+/// ```fortran
+/// integer, allocatable :: x(:)
+/// integer, pointer :: p(:)
+/// ...
+/// p => x(10: 19)
+/// ...
+/// !$omp target enter data map(to:p(:))
+/// ```
+///
+/// The map should trigger a pointer-attachment (assuming the pointer-attachment
+/// conditions as noted on processAttachEntries are met) between the descriptor
+/// for p, and its pointee data.
+///
+/// Since only the first 8 bytes of the descriptor contain the address of the
+/// pointee, an attachment operation on device descriptors involves:
+/// * Setting the first 8 bytes of the device descriptor to point the device
+/// address of the pointee.
+/// * Copying the remaining information about bounds/offset etc. from the host
+/// descriptor to the device descriptor.
+///
+/// The function also handles pointer-attachment portion of PTR_AND_OBJ maps,
+/// like:
+/// ```
+/// &p, &p[10], 10 * sizeof(p[10]), PTR_AND_OBJ
+/// ```
+/// by using `sizeof(void*)` as \p HstPtrSize.
+static int performPointerAttachment(DeviceTy &Device, AsyncInfoTy &AsyncInfo,
+ void **HstPtrAddr, void *HstPteeBase,
+ void *HstPteeBegin, void **TgtPtrAddr,
+ void *TgtPteeBegin, int64_t HstPtrSize,
+ TargetPointerResultTy &PtrTPR) {
+ assert(PtrTPR.getEntry() &&
+ "Need a valid pointer entry to perform pointer-attachment");
+
+ int64_t VoidPtrSize = sizeof(void *);
+ assert(HstPtrSize >= VoidPtrSize && "PointerSize is too small");
+
+ uint64_t Delta = (uint64_t)HstPteeBegin - (uint64_t)HstPteeBase;
+ void *TgtPteeBase = (void *)((uint64_t)TgtPteeBegin - Delta);
+
+ // Add shadow pointer tracking
+ // TODO: Support shadow-tracking of larger than VoidPtrSize pointers,
+ // to support restoration of Fortran descriptors. Currently, this check
+ // would return false, even if the host Fortran descriptor had been
+ // updated since its previous map, and we should have updated its
+ // device counterpart. e.g.
+ //
+ // !$omp target enter data map(x(1:100)) ! (1)
+ // p => x(10: 19)
+ // !$omp target enter data map(p, p(:)) ! (2)
+ // p => x(5: 9)
+ // !$omp target enter data map(attach(always): p(:)) ! (3)
+ //
+ // While PtrAddr(&desc_p) and PteeBase(&p(1)) are same for (2) and (3), the
+ // pointer attachment for (3) needs to update the bounds information
+ // in the descriptor of p on device.
+ if (!PtrTPR.getEntry()->addShadowPointer(
+ ShadowPtrInfoTy{HstPtrAddr, HstPteeBase, TgtPtrAddr, TgtPteeBase}))
+ return OFFLOAD_SUCCESS;
+
+ DP("Update pointer (" DPxMOD ") -> [" DPxMOD "]\n", DPxPTR(TgtPtrAddr),
+ DPxPTR(TgtPteeBase));
+
+ // Lambda to handle submitData result and perform final steps.
+ auto HandleSubmitResult = [&](int SubmitResult) -> int {
+ if (SubmitResult != OFFLOAD_SUCCESS) {
+ REPORT("Failed to update pointer on device.\n");
+ return OFFLOAD_FAIL;
+ }
+
+ if (PtrTPR.getEntry()->addEventIfNecessary(Device, AsyncInfo) !=
+ OFFLOAD_SUCCESS)
+ return OFFLOAD_FAIL;
+
+ return OFFLOAD_SUCCESS;
+ };
+
+ bool IsPtrAFortranDescriptor = HstPtrSize > VoidPtrSize;
+ if (!IsPtrAFortranDescriptor) {
+ // For "regular" pointers, we can use the VoidPtrLocation from AsyncInfo as
+ // the buffer space for the submission.
+ void *&BufferElement = AsyncInfo.getVoidPtrLocation();
+ BufferElement = TgtPteeBase;
+
+ // Submit the updated pointer value to device
+ return HandleSubmitResult(Device.submitData(
+ TgtPtrAddr, &BufferElement, VoidPtrSize, AsyncInfo, PtrTPR.getEntry()));
+ }
+
+ // For larger "pointers" (like Fortran's descriptors), we create a dynamic
+ // buffer, which will be eventually destroyed by AsyncInfo's post-processing
+ // callback.
+ char *DataBuffer = new char[HstPtrSize];
+
+ // For such descriptors, to the first VoidPtrSize bytes, we store the
+ // pointee's device address.
+ std::memcpy(DataBuffer, &TgtPteeBase, sizeof(void *));
+
+ // And to the remaining bytes, we copy the remaining contents of the host
+ // descriptor after the initial VoidPtrSize bytes.
+ uint64_t HstDescriptorFieldsSize = HstPtrSize - VoidPtrSize;
+ void *HstDescriptorFieldsAddr = (char *)HstPtrAddr + VoidPtrSize;
+ std::memcpy(DataBuffer + VoidPtrSize, HstDescriptorFieldsAddr,
+ HstDescriptorFieldsSize);
+
+ DP("Updating %" PRId64 " bytes of descriptor (" DPxMOD ") (pointer + %" PRId64
+ " additional bytes from host descriptor " DPxMOD ")\n",
+ HstPtrSize, DPxPTR(TgtPtrAddr), HstDescriptorFieldsSize,
+ DPxPTR(HstDescriptorFieldsAddr));
+
+ // Submit the entire buffer to device
+ int SubmitResult = Device.submitData(TgtPtrAddr, DataBuffer, HstPtrSize,
+ AsyncInfo, PtrTPR.getEntry());
+
+ AsyncInfo.addPostProcessingFunction([DataBuffer]() -> int {
+ delete[] DataBuffer;
+ return OFFLOAD_SUCCESS;
+ });
+ return HandleSubmitResult(SubmitResult);
+}
+
/// Internal function to do the mapping and transfer the data to the device
int targetDataBegin(ident_t *Loc, DeviceTy &Device, int32_t ArgNum,
void **ArgsBase, void **Args, int64_t *ArgSizes,
int64_t *ArgTypes, map_var_info_t *ArgNames,
void **ArgMappers, AsyncInfoTy &AsyncInfo,
- bool FromMapper) {
+ AttachInfoTy *AttachInfo, bool FromMapper) {
+ assert(AttachInfo && "AttachInfo must be available for targetDataBegin for "
+ "handling ATTACH map-types.");
// process each input.
for (int32_t I = 0; I < ArgNum; ++I) {
// Ignore private variables and arrays - there is no mapping for them.
@@ -352,7 +508,7 @@ int targetDataBegin(ident_t *Loc, DeviceTy &Device, int32_t ArgNum,
map_var_info_t ArgName = (!ArgNames) ? nullptr : ArgNames[I];
int Rc = targetDataMapper(Loc, Device, ArgsBase[I], Args[I], ArgSizes[I],
ArgTypes[I], ArgName, ArgMappers[I], AsyncInfo,
- targetDataBegin);
+ targetDataBegin, AttachInfo);
if (Rc != OFFLOAD_SUCCESS) {
REPORT("Call to targetDataBegin via targetDataMapper for custom mapper"
@@ -369,6 +525,18 @@ int targetDataBegin(ident_t *Loc, DeviceTy &Device, int32_t ArgNum,
int64_t DataSize = ArgSizes[I];
map_var_info_t HstPtrName = (!ArgNames) ? nullptr : ArgNames[I];
+ // ATTACH map-types are supposed to be handled after all mapping for the
+ // construct is done. Defer their processing.
+ if (ArgTypes[I] & OMP_TGT_MAPTYPE_ATTACH) {
+ AttachInfo->AttachEntries.emplace_back(
+ /*PointerBase=*/HstPtrBase, /*PointeeBegin=*/HstPtrBegin,
+ /*PointerSize=*/DataSize, /*MapType=*/ArgTypes[I],
+ /*PointeeName=*/HstPtrName);
+
+ DP("Deferring ATTACH map-type processing for argument %d\n", I);
+ continue;
+ }
+
// Adjust for proper alignment if this is a combined entry (for structs).
// Look at the next argument - if that is MEMBER_OF this one, then this one
// is a combined entry.
@@ -434,6 +602,11 @@ int targetDataBegin(ident_t *Loc, DeviceTy &Device, int32_t ArgNum,
: "device failure or illegal mapping");
return OFFLOAD_FAIL;
}
+
+ // Track new allocation, for eventual use in attachment decision-making.
+ if (PointerTpr.Flags.IsNewEntry && !IsHostPtr)
+ AttachInfo->NewAllocations[HstPtrBase] = sizeof(void *);
+
DP("There are %zu bytes allocated at target address " DPxMOD " - is%s new"
"\n",
sizeof(void *), DPxPTR(PointerTgtPtrBegin),
@@ -464,6 +637,11 @@ int targetDataBegin(ident_t *Loc, DeviceTy &Device, int32_t ArgNum,
: "device failure or illegal mapping");
return OFFLOAD_FAIL;
}
+
+ // Track new allocation, for eventual use in attachment decision-making.
+ if (TPR.Flags.IsNewEntry && !IsHostPtr && TgtPtrBegin)
+ AttachInfo->NewAllocations[HstPtrBegin] = DataSize;
+
DP("There are %" PRId64 " bytes allocated at target address " DPxMOD
" - is%s new\n",
DataSize, DPxPTR(TgtPtrBegin), (TPR.Flags.IsNewEntry ? "" : " not"));
@@ -476,30 +654,12 @@ int targetDataBegin(ident_t *Loc, DeviceTy &Device, int32_t ArgNum,
}
if (ArgTypes[I] & OMP_TGT_MAPTYPE_PTR_AND_OBJ && !IsHostPtr) {
-
- uint64_t Delta = (uint64_t)HstPtrBegin - (uint64_t)HstPtrBase;
- void *ExpectedTgtPtrBase = (void *)((uint64_t)TgtPtrBegin - Delta);
-
- if (PointerTpr.getEntry()->addShadowPointer(ShadowPtrInfoTy{
- (void **)PointerHstPtrBegin, HstPtrBase,
- (void **)PointerTgtPtrBegin, ExpectedTgtPtrBase})) {
- DP("Update pointer (" DPxMOD ") -> [" DPxMOD "]\n",
- DPxPTR(PointerTgtPtrBegin), DPxPTR(TgtPtrBegin));
-
- void *&TgtPtrBase = AsyncInfo.getVoidPtrLocation();
- TgtPtrBase = ExpectedTgtPtrBase;
-
- int Ret =
- Device.submitData(PointerTgtPtrBegin, &TgtPtrBase, sizeof(void *),
- AsyncInfo, PointerTpr.getEntry());
- if (Ret != OFFLOAD_SUCCESS) {
- REPORT("Copying data to device failed.\n");
- return OFFLOAD_FAIL;
- }
- if (PointerTpr.getEntry()->addEventIfNecessary(Device, AsyncInfo) !=
- OFFLOAD_SUCCESS)
- return OFFLOAD_FAIL;
- }
+ int Ret = performPointerAttachment(
+ Device, AsyncInfo, (void **)PointerHstPtrBegin, HstPtrBase,
+ HstPtrBegin, (void **)PointerTgtPtrBegin, TgtPtrBegin, sizeof(void *),
+ PointerTpr);
+ if (Ret != OFFLOAD_SUCCESS)
+ return OFFLOAD_FAIL;
}
// Check if variable can be used on the device:
@@ -515,6 +675,187 @@ int targetDataBegin(ident_t *Loc, DeviceTy &Device, int32_t ArgNum,
return OFFLOAD_SUCCESS;
}
+/// Process deferred ATTACH map entries collected during targetDataBegin.
+///
+/// From OpenMP's perspective, when mapping something that has a base pointer,
+/// such as:
+/// ```cpp
+/// int *p;
+/// #pragma omp enter target data map(to: p[10:20])
+/// ```
+///
+/// a pointer-attachment between p and &p[10] should occur if both p and
+/// p[10] are present on the device after doing all allocations for all maps
+/// on the construct, and one of the following is true:
+///
+/// * The pointer p was newly allocated while handling the construct
+/// * The pointee p[10:20] was newly allocated while handling the construct
+/// * attach(always) map-type modifier was specified (OpenMP 6.1)
+///
+/// That's why we collect all attach entries and new memory allocations during
+/// targetDataBegin, and use that information to make the decision of whether
+/// to perform a pointer-attachment or not here, after maps have been handled.
+///
+/// Additionally, once we decide that a pointer-attachment should be performed,
+/// we need to make sure that it happens after any previously submitted data
+/// transfers have completed, to...
[truncated]
|
offload/libomptarget/omptarget.cpp
Outdated
assert(HstPtrSize >= VoidPtrSize && "PointerSize is too small"); | ||
|
||
uint64_t Delta = (uint64_t)HstPteeBegin - (uint64_t)HstPteeBase; | ||
void *TgtPteeBase = (void *)((uint64_t)TgtPteeBegin - Delta); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No C-style casts please. And I'm guessing ptee
is pointee
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, Ptee is pointee.
// For larger "pointers" (like Fortran's descriptors), we create a dynamic | ||
// buffer, which will be eventually destroyed by AsyncInfo's post-processing | ||
// callback. | ||
char *DataBuffer = new char[HstPtrSize]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can probably also use unique pointer for this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't sumbitData be asynchronous? In that case, we need to ensure that this stays alive until that is done.
That's why it's currently being deleted in a post-processing callback, similar to
delete &EntriesInfo; |
/// | ||
/// For this purpose, we insert a data_fence before the first | ||
/// pointer-attachment, (3), to ensure that all pending transfers finish first. | ||
int processAttachEntries(DeviceTy &Device, AttachInfoTy &AttachInfo, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This smells like the type of action you'd put in a queue with a dependency on the kernel. Is it possible we can do that instead of calling it directly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The pointer-attachments have to be done before the kernel launch. But after handling the other maps.
Additionally, these also apply to the other directives like target enter data
and target data
, that don't have an associated kernel to launch.
Also, this function is doing similar work as targetDataBegin
-- looking at the maps and queuing up any data-transfers that apply, so calling it just after that seemed like the right thing to do.
What's your opinion?
@@ -2538,6 +2538,10 @@ struct AMDGPUDeviceTy : public GenericDeviceTy, AMDGenericDeviceTy { | |||
getAgent(), (uint64_t)Size); | |||
} | |||
|
|||
Error dataFence(__tgt_async_info *Async) override { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are all just stubs, why do we need these in this patch? I'm not familiar with that this is supposed to do either, I figured fencing like this came from the signals.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The requirement for pointer-attachments is that we should wait for any pending data-transfers before submitting those for the pointer-attachments, so as not to let the previous transfers clobber a pointer-attachment.
@adurang added the plugin changes, so he can probably clarify, but the implementation of dataFence
is supposed to vary from device to device. For example, if a device supports device-side barriers, then such a barrier can be inserted into its data-transfer queue, so that anything that precedes it happens before anything after the fence.
For devices that do in-order data-transfers by default, this is a no-op, hence, the dataFence
simply returns.
But if a device supports out-of-order data-transfers, then it should be implemented for that device, with whichever mechanism the device supports.
If you think the plugin changes are better off in a separate PR, then Alex can help create one, and we can rebase this once the plugins PR is merged.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, that's part of why I'm wondering if you could just set this up to follow the previous action somehow, since lgoically that's what a queue should handle implicitly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you assuming things are ordered in the queue? Currently, libomptarget makes no assumptions about tha so it always has the proper logic to work in-order and out-of-order (we do a lot of out-of-order operations in our plugin).
Or are you talking about connecting the previous operations with events somehow? That would be great but I don't think the current interfaces are great for that and I feel it would require way more changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for chiming in, Alex.
Joseph, are you suggesting creating a new queue for the ATTACH data-transfers? If that is the case, there is no guarantee that the previous data-transfers would have finished. Which would require us doing an explicit synchronize
between the two queues, which would not be good.
Otherwise, if we are to reuse the same queue as previous submitData
s, then as Alex mentioned, we cannot assume that the queue is doing in-order transfers.
If we are talking aboud connecting previous submitData
s with the ATTACH-specific submitData
s using events, that seems like an overkill, because that would require us to create events for every submitData
done by targetDataBegin
, and then selectively wait for those that the ATTACH entries may overlap with, when doing a submitData
for them. That's because when targetDataBegin
is being run, we don't know which data-transfers may eventually collide with an ATTACH-related transfer.
The data_fence
approach, should be the lowest-impact, most asynchronous approac, since if the underlying queue is in-order, it would be a no-op, and if it's not, and the queue supports inserion of barriers, then that would be the next option, otherwise, the device can choose to do a full synchronize if it doesn't support fences.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the review!
offload/libomptarget/omptarget.cpp
Outdated
assert(HstPtrSize >= VoidPtrSize && "PointerSize is too small"); | ||
|
||
uint64_t Delta = (uint64_t)HstPteeBegin - (uint64_t)HstPteeBase; | ||
void *TgtPteeBase = (void *)((uint64_t)TgtPteeBegin - Delta); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, Ptee is pointee.
// For larger "pointers" (like Fortran's descriptors), we create a dynamic | ||
// buffer, which will be eventually destroyed by AsyncInfo's post-processing | ||
// callback. | ||
char *DataBuffer = new char[HstPtrSize]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't sumbitData be asynchronous? In that case, we need to ensure that this stays alive until that is done.
That's why it's currently being deleted in a post-processing callback, similar to
delete &EntriesInfo; |
/// | ||
/// For this purpose, we insert a data_fence before the first | ||
/// pointer-attachment, (3), to ensure that all pending transfers finish first. | ||
int processAttachEntries(DeviceTy &Device, AttachInfoTy &AttachInfo, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The pointer-attachments have to be done before the kernel launch. But after handling the other maps.
Additionally, these also apply to the other directives like target enter data
and target data
, that don't have an associated kernel to launch.
Also, this function is doing similar work as targetDataBegin
-- looking at the maps and queuing up any data-transfers that apply, so calling it just after that seemed like the right thing to do.
What's your opinion?
@@ -2538,6 +2538,10 @@ struct AMDGPUDeviceTy : public GenericDeviceTy, AMDGenericDeviceTy { | |||
getAgent(), (uint64_t)Size); | |||
} | |||
|
|||
Error dataFence(__tgt_async_info *Async) override { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The requirement for pointer-attachments is that we should wait for any pending data-transfers before submitting those for the pointer-attachments, so as not to let the previous transfers clobber a pointer-attachment.
@adurang added the plugin changes, so he can probably clarify, but the implementation of dataFence
is supposed to vary from device to device. For example, if a device supports device-side barriers, then such a barrier can be inserted into its data-transfer queue, so that anything that precedes it happens before anything after the fence.
For devices that do in-order data-transfers by default, this is a no-op, hence, the dataFence
simply returns.
But if a device supports out-of-order data-transfers, then it should be implemented for that device, with whichever mechanism the device supports.
If you think the plugin changes are better off in a separate PR, then Alex can help create one, and we can rebase this once the plugins PR is merged.
__tgt_async_info *AsyncInfo) { | ||
auto Err = getDevice(DeviceId).dataFence(AsyncInfo); | ||
if (Err) { | ||
REPORT("Failure to place data fence on device %d: %s\n", DeviceId, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Error messages should start with lowercase
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This currently aligns with the casing of messages like line 2226, 1981 etc. Is lower-case the new style going forward? @adurang, FYI.
This patch introduces libomptarget support for the ATTACH map-type, which can be used to implement OpenMP conditional compliant pointer attachment, based on whether the pointer/pointee is newly mapped on a given construct.
For example, for the following:
The following maps can be emitted by clang:
Without this map-type, these two possible maps could be emitted by clang:
(B) does not perform any pointer attachment, while (C) also maps the pointer p, which are both incorrect.
In terms of implementation, maps with the ATTACH map-type are handled after all other maps have been processed, as it requires knowledge of which new allocations happened as part of the construct. As per OpenMP 5.0, an attachment should happen only when either the pointer or the pointee was newly mapped while handling the construct.
Maps with ATTACH map-type-bit do not increase/decrease the ref-count.
With OpenMP 6.1,
attach(always/never)
can be used to force/prevent attachment. Forattach(always)
, the compiler will insert the ALWAYS map-type, which would let libomptarget bypass the check about one of the pointer/pointee being new. Withattach(never)
, the ATTACH map will not be emitted at all.The size argument of the ATTACH map-type can specify values greater than
sizeof(void*)
which can be used to support pointer attachment on Fortran descriptors. Note that this also requires shadow-pointer tracking to also support them. That has not been implemented in this patch.This was worked upon in coordination with Ravi Narayanaswamy, who has since retired. Happy retirement, Ravi!
Co-authored-by: Alex Duran [email protected]