Discussion in https://github.com/pytorch/pytorch/issues/15864 The operator should have both CPU and CUDA implementations.