Memory patcher conflicts with OMPI and libfabric

We recently [ported OMPI’s patcher code](https://github.com/ofiwg/libfabric/pull/6264) to libfabric so that its registration cache receives notifications about memory events. Libfabric also has a userfaultfd notifier, but it defaults to the patcher-based “memhooks” notifier because of the additional coverage it provides. Because patcher uses jump instructions to patch the various calls, multiple patcher hooks cannot really be stacked on top of each other. When testing the OFI BTL with EFA, we observed silent data corruption with some benchmarks. We root-caused it to libfabric using stale registrations: once OMPI’s patcher took over the hooks, libfabric stopped receiving memory events and failed to invalidate its cache entries.
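To make the failure mode concrete, here is a minimal, self-contained model of it. It is not the patcher code from either project: the single `patched_target` slot, the hook functions, and `demo_stale_registration()` are all hypothetical names that only illustrate the assumption that a patched call has one jump target, so the last installer wins.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: a patched symbol has a single jump target, so the
 * last patcher to install itself is the only one that sees events. */
typedef void (*munmap_hook_fn)(void *addr, size_t len);

static munmap_hook_fn patched_target = NULL;  /* the one "jump" slot */

static int libfabric_cache_valid = 1;  /* 1 = cached registration still trusted */
static int ompi_events_seen = 0;

static void libfabric_hook(void *addr, size_t len)
{
    (void)addr; (void)len;
    libfabric_cache_valid = 0;         /* would evict the cached registration */
}

static void ompi_hook(void *addr, size_t len)
{
    (void)addr; (void)len;
    ompi_events_seen++;                /* only OMPI's side is notified */
}

/* Installing a patch overwrites the slot; nothing chains to the old hook. */
static void install_patch(munmap_hook_fn hook)
{
    assert(hook != NULL);
    patched_target = hook;
}

/* Stand-in for an intercepted munmap(): control goes to the current target only. */
static void intercepted_munmap(void *addr, size_t len)
{
    if (patched_target != NULL)
        patched_target(addr, len);
}

/* Returns 1 if libfabric is left trusting a registration it should have evicted. */
static int demo_stale_registration(void)
{
    char buf[64];
    install_patch(libfabric_hook);     /* libfabric's memhooks installs first */
    install_patch(ompi_hook);          /* OMPI's patcher takes over afterwards */
    intercepted_munmap(buf, sizeof buf);
    return libfabric_cache_valid;      /* still 1: the eviction never ran */
}
```

In this model the unmap event reaches only `ompi_hook`, which mirrors the stale-registration behavior described above.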

UCX has a mechanism to take in external events from applications, `ucm_set_external_event()`. OMPI uses this mechanism in the UCX PML to invoke `ucm_vm_munmap()` from an unmap callback driven by its internal memory hook events. We can achieve a similar workflow with libfabric's `FI_MR_MMU_NOTIFY` mode. While this mode was designed with a different use case in mind (allowing registrations that are not backed by physical pages), we should be able to use it in conjunction with `fi_mr_refresh()` to take external events from applications like OMPI. To be specific, libfabric providers can make `FI_MR_MMU_NOTIFY` a soft requirement. If an application does not support it, providers can continue using memhooks as is. If an application like OMPI does support that mode, providers can rely on `fi_mr_refresh()` notifications in place of the internal monitor (or perhaps in addition to userfaultfd) to determine when to evict an entry from the cache.
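A rough sketch of what the provider-side logic could look like, under stated assumptions: `DEMO_MR_MMU_NOTIFY` is a placeholder bit standing in for `FI_MR_MMU_NOTIFY` (the real value comes from the libfabric headers), and `demo_pick_monitor()`, `demo_cache_entry`, and `demo_on_refresh()` are hypothetical names, not existing libfabric internals.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder bit standing in for FI_MR_MMU_NOTIFY (assumption, not the real value). */
#define DEMO_MR_MMU_NOTIFY (UINT64_C(1) << 0)

enum demo_monitor { DEMO_MONITOR_MEMHOOKS, DEMO_MONITOR_APP_REFRESH };

/* Soft requirement: use application notifications only if the application
 * opted in through its mr_mode bits; otherwise keep memhooks as is. */
static enum demo_monitor demo_pick_monitor(uint64_t app_mr_mode)
{
    return (app_mr_mode & DEMO_MR_MMU_NOTIFY) ? DEMO_MONITOR_APP_REFRESH
                                              : DEMO_MONITOR_MEMHOOKS;
}

/* One cached registration covering [base, base + len). */
struct demo_cache_entry {
    uintptr_t base;
    size_t    len;
    int       valid;
};

/* Hypothetical handler for a refresh notification: evict every valid entry
 * that overlaps [addr, addr + len). Returns the number of entries evicted. */
static size_t demo_on_refresh(struct demo_cache_entry *entries, size_t n,
                              uintptr_t addr, size_t len)
{
    size_t evicted = 0;

    assert(entries != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        struct demo_cache_entry *e = &entries[i];
        int overlaps = addr < e->base + e->len && e->base < addr + len;

        if (e->valid && overlaps) {
            e->valid = 0;
            evicted++;
        }
    }
    return evicted;
}
```

Whether the refresh path replaces the internal monitor or runs alongside userfaultfd is left open here, as it is in the proposal.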

On the Open MPI side, the provider query logic will have to set the `FI_MR_MMU_NOTIFY` mode and call `fi_mr_refresh()` on unmap events. Rcache uses the OPAL memory hooks, so it registers a callback (`mca_rcache_base_mem_cb`) to react to memory events. For what the OFI components need, we can register another OFI-specific callback from the common OFI code used by both the BTL and the MTL (which will eventually want to use a cache for CUDA buffers, given the cost of querying CUDA buffer attributes). This OFI-specific callback can then pass the notification directly to libfabric via `fi_mr_refresh()`. This will all be provider-agnostic.
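The callback described above might look roughly like the following. This is a sketch with stand-ins so that it compiles without libfabric or OPAL: `demo_mr` and `demo_mr_refresh()` are hypothetical substitutes for `struct fid_mr` and `fi_mr_refresh(mr, iov, count, flags)`, and `demo_ofi_mem_release_cb()` assumes a release-hook signature of `(buf, length, cbdata, from_alloc)`. How the real callback would locate the affected registrations is not specified in the proposal; here the MR is simply passed through `cbdata`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/uio.h>   /* struct iovec */

/* Hypothetical stand-in for a libfabric MR handle (struct fid_mr in real code). */
struct demo_mr {
    int          refresh_calls;
    struct iovec last;         /* last range reported to the provider */
};

/* Stand-in shaped like fi_mr_refresh(mr, iov, count, flags); it only records the call. */
static int demo_mr_refresh(struct demo_mr *mr, const struct iovec *iov,
                           size_t count, uint64_t flags)
{
    (void)flags;
    assert(mr != NULL && iov != NULL && count == 1);
    mr->last = iov[0];
    mr->refresh_calls++;
    return 0;
}

/* Hypothetical OFI-common callback: forwards an unmap event for
 * [buf, buf + length) to the provider. cbdata carries the MR (assumption). */
static void demo_ofi_mem_release_cb(void *buf, size_t length, void *cbdata,
                                    bool from_alloc)
{
    struct demo_mr *mr = cbdata;
    struct iovec iov = { .iov_base = buf, .iov_len = length };

    (void)from_alloc;
    if (mr != NULL && length > 0)
        (void)demo_mr_refresh(mr, &iov, 1, 0);
}
```

Because the callback only forwards an address range, nothing in it depends on which provider is in use.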

@open-mpi/ofi @hppritcha @shefty, thoughts?

Memory patcher conflicts with OMPI and libfabric #8822
