`sycl::half` operators such as `*this = operator float() + static_cast<float>(rhs);` convert their operands to `float` to perform the operation.
We can specialize the CUDA backend case to call the appropriate f16 instructions instead, but I opened this issue to raise awareness for other backends and, potentially, to get a general solution that lowers to backend-specific instructions.
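To make the pattern concrete, here is a minimal host-only sketch of what I mean. It is not the DPC++ implementation: `half_sketch`, the two conversion helpers, and the `SKETCH_NATIVE_F16_ADD` hook are all hypothetical names made up for illustration. The fallback branch mirrors the current float round trip, and the guarded branch marks where a backend could plug in a native f16 add.

```cpp
#include <cstdint>
#include <cstring>

// Convert a float to IEEE 754 binary16 bits, rounding to nearest even.
inline std::uint16_t float_to_half_bits(float f) {
  std::uint32_t x;
  std::memcpy(&x, &f, sizeof x);
  const std::uint32_t sign = (x >> 16) & 0x8000u;
  const std::uint32_t exp = (x >> 23) & 0xFFu;
  std::uint32_t mant = x & 0x7FFFFFu;
  if (exp == 0xFFu)  // infinity or NaN
    return static_cast<std::uint16_t>(sign | 0x7C00u | (mant ? 0x200u : 0u));
  const int e = static_cast<int>(exp) - 127 + 15;  // rebias the exponent
  if (e >= 31) return static_cast<std::uint16_t>(sign | 0x7C00u);  // overflow
  std::uint32_t h, rem, halfway;
  if (e <= 0) {  // result is a half subnormal or zero
    if (e < -10) return static_cast<std::uint16_t>(sign);
    mant |= 0x800000u;  // make the implicit leading 1 explicit
    const int shift = 14 - e;
    h = mant >> shift;
    rem = mant & ((1u << shift) - 1u);
    halfway = 1u << (shift - 1);
  } else {  // result is a normal half
    h = (static_cast<std::uint32_t>(e) << 10) | (mant >> 13);
    rem = mant & 0x1FFFu;
    halfway = 0x1000u;
  }
  if (rem > halfway || (rem == halfway && (h & 1u))) ++h;  // ties to even
  return static_cast<std::uint16_t>(sign | h);
}

// Convert IEEE 754 binary16 bits to float (always exact).
inline float half_bits_to_float(std::uint16_t hb) {
  const std::uint32_t sign = (hb & 0x8000u) << 16;
  const std::uint32_t exp = (hb >> 10) & 0x1Fu;
  std::uint32_t mant = hb & 0x3FFu;
  std::uint32_t x;
  if (exp == 0) {
    if (mant == 0) {
      x = sign;  // signed zero
    } else {     // subnormal: shift until the leading 1 reaches bit 10
      int e = -1;
      do { mant <<= 1; ++e; } while (!(mant & 0x400u));
      x = sign | (static_cast<std::uint32_t>(112 - e) << 23) |
          ((mant & 0x3FFu) << 13);
    }
  } else if (exp == 31) {
    x = sign | 0x7F800000u | (mant << 13);  // infinity or NaN
  } else {
    x = sign | ((exp + 112u) << 23) | (mant << 13);
  }
  float f;
  std::memcpy(&f, &x, sizeof f);
  return f;
}

// Hypothetical stand-in for a half type; only += is shown.
struct half_sketch {
  std::uint16_t bits;
  explicit half_sketch(float f) : bits(float_to_half_bits(f)) {}
  explicit operator float() const { return half_bits_to_float(bits); }

  half_sketch &operator+=(const half_sketch &rhs) {
#if defined(SKETCH_NATIVE_F16_ADD)
    // Hypothetical hook: a backend supplies a native f16 add on raw bits.
    bits = SKETCH_NATIVE_F16_ADD(bits, rhs.bits);
#else
    // Generic fallback, same shape as the current operators:
    // widen to float, add, then narrow back to half.
    *this = half_sketch(static_cast<float>(*this) + static_cast<float>(rhs));
#endif
    return *this;
  }
};
```

With the fallback, `half_sketch a(1.5f); a += half_sketch(2.25f);` leaves `a` holding 3.75. The point of the guard is only to show that the choice could live in one place per operator, so each backend can opt in without changing the generic path.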