As described on the NVIDIA forum (https://devtalk.nvidia.com/default/topic/790095/embedded-systems/cuda-aware-openmpi-fails-on-jetson-k1/), CUDA-aware Open MPI fails with cuMemHostRegister errors after the program has otherwise finished correctly. I don't know whether this is a Jetson, CUDA, or Open MPI bug. It does not occur on a regular x86 desktop PC running Ubuntu.