Deprecate MultiHeadAttention now that tensorflow/tensorflow@f32c80b has been merged.
I'm not sure what our roadmap is here. Should we alias/wrap the functionality onto core TF, or just remove it? In the gelu case, the default argument changed, and for MultiHeadAttention the input signature is entirely different.
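If we go the alias/wrap route, one option is a thin deprecation shim that forwards to the core TF symbol while warning users to migrate. A minimal sketch (the `deprecated_alias` helper and its messages are hypothetical, not an existing TFA utility; in practice `new_fn` would be e.g. `tf.nn.gelu`):

```python
import functools
import warnings


def deprecated_alias(new_fn, old_name, new_name):
    """Hypothetical helper: wrap a core-TF callable so the old Addons
    name keeps working but emits a DeprecationWarning pointing at the
    replacement symbol."""
    @functools.wraps(new_fn)
    def wrapper(*args, **kwargs):
        warnings.warn(
            f"{old_name} is deprecated; use {new_name} instead. "
            "Note that defaults or signatures may differ.",
            DeprecationWarning,
            stacklevel=2,
        )
        return new_fn(*args, **kwargs)
    return wrapper


# Stand-in callable for illustration; a real shim would forward to
# the core TF implementation (e.g. tf.nn.gelu).
gelu = deprecated_alias(lambda x: x, "tfa.activations.gelu", "tf.nn.gelu")
```

This kind of shim only helps when the new API is call-compatible; it doesn't resolve the MultiHeadAttention case, where the input signature differs, so removal (with a migration note) may be the only clean option there.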