Fix (log)(c)cdf with Inf, -Inf and NaN
#1348
|
Note: The simplified implementation for |
|
Is there something to look out for in the review? |
|
Hmm I'm not sure. Hopefully I could convey the main ideas in the comment above and then I guess it depends on which points you want to examine or discuss more carefully. I assume that src/univariates.jl should show the main structure. The changes in src/truncate.jl, src/univariate/locationscale.jl and src/mixtures/... show how it simplifies (and fixes) the existing implementation. And finally the changes in src/univariate/... fix and simplify different (log)(c)cdf implementations. I extended the tests and (log)(c)cdf are tested for all modified distributions (I only changed these distributions since tests failed), so I am quite confident that the changes are correct. But it's always better if someone else checks it as well 🙂 |
Codecov Report
@@ Coverage Diff @@
## master #1348 +/- ##
==========================================
+ Coverage 82.23% 82.73% +0.50%
==========================================
Files 116 116
Lines 6635 6672 +37
==========================================
+ Hits 5456 5520 +64
+ Misses 1179 1152 -27
|
|
The LogExpFunctions PR was merged and released, so tests pass now. |
This PR simplifies the `(log)(c)cdf` implementations and fixes the evaluation with `-Inf`, `Inf`, and `NaN`.

There are some problems with the current implementation:

- […] `NaN` consistently: e.g. `cdf(d::DiscreteUnivariateDistribution, x::Real) = cdf(d, floor(Int, x))` etc. and a default implementation for `cdf(::DiscreteUnivariateDistribution, x::Integer)`. This is problematic since
  - both `cdf(d, x::Real)` and `cdf(d, x::Integer)` have to be implemented for e.g. discrete `LocationScale`, `Truncated`, `MixtureModel`, and `DiscreteNonParametric` to avoid method ambiguity errors
  - a default implementation of `cdf` for integers is defined that might not make sense or, in the worst case, silently produces incorrect results, e.g. `logcdf(Dirac(3.4), 3.5)` is forwarded to `logcdf(Dirac(3.4), 3)`, which then calls `log(cdf(Dirac(3.4), 3)) = log(0) = -Inf`. These bugs are difficult to discover and to avoid when implementing a discrete distribution (I fixed some of these for `DiscreteNonParametric` in "Generalize `LocationScale` to discrete distributions" #1286).
- […] `cdf(d, -Inf)`, `cdf(d, Inf)`, or `cdf(d, NaN)` with these definitions: e.g. […]. For `Poisson` this can be fixed by generalizing the StatsFuns macro to inputs of type `Real`, but this is still an issue for other native implementations such as `Categorical`, for which the same errors are thrown. The evaluation of `cdf(d, -Inf)` or `cdf(d, Inf)` is useful e.g. in the construction of truncated distributions: it allows removing the current type heuristics (not included in this PR).

The PR fixes these problems by

- […] `cdf(d::DiscreteUnivariateDistribution, x::Real)` but not `cdf(::DiscreteUnivariateDistribution, x::Integer)`, similar to "[Breaking] Fix inconsistent fallback behaviour of logpdf and pdf" #1171
- […] `cdf_int(d, x)` which assumes integer-valued support but not inputs of type `Int`
  - `cdf_int(d, x)` handles `Inf`, `-Inf`, and `NaN` and calls `cdf(d, floor(Int, x))` for other values (not implemented!)
  - […] `cdf(d, ::Int)` and gets support for real values including `Inf`, `-Inf`, and `NaN` for free
  - […] `cdf(d, ::Real)` but not `cdf(d, ::Int)` (and there exists no incorrect default implementation)
  - […] `cdf(d, ::Int)` explicitly
- […] `integerunitrange_cdf` etc. instead of implementing `cdf(d, ::Int)` from scratch if the distribution has a unitrange of integers as support (currently the default implementation of `cdf(d, ::Int)`)
- […] `Truncated`, `LocationScale` and `MixtureModel` can be simplified.
- […] `cdf(d, x::Real)` etc. instead of `cdf(d, x::Int)` for discrete distributions
- […] `NaN`, `Inf`, and `-Inf` correctly and can deal with non-integer inputs (hopefully soon there will be many more native implementations: "Use julia implementations for pdfs and some cdf-like functions" StatsFuns.jl#113)
- […] `(log)(c)cdf` of many univariate distributions
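
To make the two ideas above concrete, here is a minimal, self-contained sketch. It does not use Distributions.jl and is not the code from this PR: `ToyUniformInt`, `ToyPointMass`, `toy_cdf`, and `toy_cdf_int` are hypothetical names invented for illustration. The first part mimics an opt-in helper in the spirit of `cdf_int`, the second part reproduces the `Dirac(3.4)` pitfall with a toy point mass.

```julia
# Toy discrete distribution with equal weights on the integers lo:hi.
# Hypothetical type for illustration only; not part of Distributions.jl.
struct ToyUniformInt
    lo::Int
    hi::Int
end

# The only method the distribution itself provides: the cdf at an integer.
function toy_cdf(d::ToyUniformInt, x::Int)
    x < d.lo && return 0.0
    x >= d.hi && return 1.0
    return (x - d.lo + 1) / (d.hi - d.lo + 1)
end

# Helper in the spirit of `cdf_int` (assumed behaviour, see lead-in):
# handle NaN and infinities first, then floor finite inputs to an Int.
# Finite values outside the Int range would still throw in `floor`.
function toy_cdf_int(d, x::Real)
    isnan(x) && return NaN
    isinf(x) && return x < 0 ? 0.0 : 1.0
    return toy_cdf(d, floor(Int, x))
end

# The distribution opts in explicitly for real-valued inputs.
toy_cdf(d::ToyUniformInt, x::Real) = toy_cdf_int(d, x)

# Toy point mass at a non-integer value. It defines only the Real method,
# so nothing floors the input behind its back.
struct ToyPointMass
    value::Float64
end
toy_cdf(d::ToyPointMass, x::Real) = x < d.value ? 0.0 : 1.0

u = ToyUniformInt(1, 4)
toy_cdf(u, 2.5)    # 0.5, same as toy_cdf(u, 2)
toy_cdf(u, -Inf)   # 0.0
toy_cdf(u, Inf)    # 1.0
toy_cdf(u, NaN)    # NaN

p = ToyPointMass(3.4)
log(toy_cdf(p, 3.5))              # 0.0, the correct log-cdf
log(toy_cdf(p, floor(Int, 3.5)))  # -Inf, what blanket flooring would give
```

The special cases come first because `floor(Int, Inf)` and `floor(Int, NaN)` throw an `InexactError` in Julia, so a bare `floor(Int, x)` fallback cannot serve those inputs. Keeping the helper opt-in means a distribution with non-integer support, like the toy point mass, is never routed through it by accident.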