This repository was archived by the owner on Aug 7, 2024. It is now read-only.

Description
Summary
There are two components to this: non-saturated casting and saturated casting.
Non-Saturated Casting
- We currently use bit logic to cast from fp32 to fp8, whereas intrinsics exist that perform the same conversion; see Nikita's comment below.
- Currently, for fp16 -> fp8 casting, we first convert fp16 to fp32 and then cast the result to fp8.
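
To make "bit logic" concrete, here is a minimal sketch of a non-saturating fp16 -> fp8 cast that works directly on the fp16 bit pattern, without going through fp32. It is an illustration only, not the code in this repository. It assumes the e5m2 fp8 format (the issue does not say which format is meant) and round-to-nearest-even, and the helper names `fp16_bits`, `fp16_bits_to_e5m2`, and `e5m2_to_float` are hypothetical. Because e5m2 has the same sign and exponent layout as fp16, the cast reduces to rounding away the low 8 mantissa bits.

```python
import struct


def fp16_bits(x: float) -> int:
    """Return the IEEE binary16 bit pattern of x (x must fit in fp16)."""
    return struct.unpack("<H", struct.pack("<e", x))[0]


def fp16_bits_to_e5m2(h: int) -> int:
    """Cast an fp16 bit pattern to an e5m2 byte (assumed format).

    Non-saturating: values that round past the largest finite e5m2
    value become infinity instead of being clamped.
    """
    if (h & 0x7FFF) > 0x7C00:
        # NaN: keep sign and exponent, force a non-zero mantissa bit
        # so the result cannot collapse to infinity.
        return (h >> 8) | 0x02
    # Round to nearest, ties to even, on the 8 bits being dropped.
    lsb = (h >> 8) & 1
    return ((h + 0x7F + lsb) >> 8) & 0xFF


def e5m2_to_float(b: int) -> float:
    """Decode an e5m2 byte by widening it to fp16 (exact)."""
    return struct.unpack("<e", struct.pack("<H", (b & 0xFF) << 8))[0]


if __name__ == "__main__":
    for value in (1.0, 1.125, 1.375, 65504.0):
        byte = fp16_bits_to_e5m2(fp16_bits(value))
        print(value, hex(byte), e5m2_to_float(byte))
```

In this sketch, 65504.0 (the largest finite fp16 value) rounds to infinity, which is the non-saturating behavior; a saturating cast would clamp it to the largest finite e5m2 value instead.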
Saturated Casting