@@ -106,34 +106,54 @@ floatmax(::Type{T}) where {T <: FixedPoint} = typemax(T)
106 106
107 107
108 108 """
109- floattype(::Type{T})
109+ floattype(::Type{T})::Type{<:AbstractFloat}
110 110
111- Return the minimum float type that represents `T` without overflow to `Inf`.
111+ Return a minimal type suitable for performing computations with instances of type `T` without integer overflow.
112 112
113- # Example
113+ The fallback definition of `floattype(T)` applies only to `T<:AbstractFloat`.
114+ However, it is permissible to extend `floattype` to return types that are not subtypes of
115+ `AbstractFloat`; the key characteristic is that the return type should support computation without integer overflow.
116+
117+ In general, the returned type should have the minimum bitwidth needed to encode the full precision of the input type.
118+ However, priority should be placed on computational efficiency; consequently, types like `Float16` should be avoided
119+ except in scenarios where they are guaranteed to have hardware support.
120+
121+ # Examples
114 122
115 123 A classic usage is to avoid overflow behavior by promoting `FixedPoint` to `AbstractFloat`:
116 124
117- ```julia
125+ ```jldoctest
118 126 julia> x = N0f8(1.0)
119 127 1.0N0f8
120 128
121 129 julia> x + x # overflow
122 130 0.996N0f8
123 131
124- julia> float_x = floattype(eltype(x))(x)
125- 1.0f0
132+ julia> T = floattype(x)
133+ Float32
126 134
127- julia> float_x + float_x
135+ julia> T(x) + T(x)
128 136 2.0f0
129 137 ```
138+
139+ The following represents a valid extension of `floattype` to non-AbstractFloats:
140+
141+ ```julia
142+ julia> using FixedPointNumbers, ColorTypes
143+
144+ julia> floattype(RGB{N0f8})
145+ RGB{Float32}
146+ ```
147+
148+ `RGB` itself is not a subtype of `AbstractFloat`, but unlike `RGB{N0f8}`, operations with `RGB{Float32}` are not subject to integer overflow.
130 149 """
131- floattype(::Type{T}) where {T <: Real} = T # fallback
150+ floattype(::Type{T}) where {T <: AbstractFloat} = T # fallback (we want a MethodError if no method producing AbstractFloat is defined)
132 151 floattype(::Type{T}) where {T <: Union{ShortInts, Bool}} = Float32
133 152 floattype(::Type{T}) where {T <: Integer} = Float64
134 153 floattype(::Type{T}) where {T <: LongInts} = BigFloat
135 154 floattype(::Type{X}) where {T <: ShortInts, X <: FixedPoint{T}} = Float32
136 155 floattype(::Type{X}) where {T <: Integer, X <: FixedPoint{T}} = Float64
156+ floattype(::Type{X}) where {T <: Integer, X <: Rational{T}} = typeof(zero(T)/oneunit(T))
137 157 floattype(::Type{X}) where {T <: LongInts, X <: FixedPoint{T}} = BigFloat
138 158
139 159 float(x::FixedPoint) = convert(floattype(x), x)
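
The effect of narrowing the fallback and adding the `Rational` method can be sketched with Base alone. This is a rough illustration, not the package's code: `demo_floattype` and `SmallInts` are hypothetical names introduced here, `SmallInts` only approximates what `ShortInts` is assumed to contain, and `BigInt` stands in for `LongInts`.

```julia
# Hypothetical stand-in for `floattype`, written against Base only so that it
# runs without FixedPointNumbers. `SmallInts` is an assumed approximation of
# the package's `ShortInts`.
const SmallInts = Union{Int8, UInt8, Int16, UInt16}

# Narrowed fallback: only AbstractFloat types map to themselves.
demo_floattype(::Type{T}) where {T <: AbstractFloat} = T
demo_floattype(::Type{T}) where {T <: Union{SmallInts, Bool}} = Float32
demo_floattype(::Type{T}) where {T <: Integer} = Float64
demo_floattype(::Type{BigInt}) = BigFloat

# Rational: reuse whatever type integer division already produces for T.
demo_floattype(::Type{X}) where {T <: Integer, X <: Rational{T}} =
    typeof(zero(T) / oneunit(T))

demo_floattype(Rational{Int8})    # Float64, since Int8 / Int8 yields Float64
demo_floattype(Rational{BigInt})  # BigFloat

# With no `T <: Real` fallback, unsupported types raise instead of passing through.
unsupported_throws = try
    demo_floattype(String)
    false
catch err
    err isa MethodError
end
```

Under these assumptions, a type with no float-producing method fails loudly with a `MethodError`, which appears to be the intent stated in the fallback's comment.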