
@ChrisRackauckas (Member)

Opening this so it's easier to track the printing, since AppVeyor doesn't run non-PR builds by default.

@ChrisRackauckas (Member Author)

println(u[60:164] - u2[60:164])
[1.0e-10, 1.5e-10, -1.0e-10, 1.3e-9, -4.78e-9, 6.3e-9, -9.993e-9, -7.61008e-9, 1.82e-11, 8.9333e-9, 7.27848e-9, -2.05748e-9, -1.91583e-9, 2.0198e-11, 1.66183e-10, 1.42968e-11, -1.33063e-11, -4.85546e-12, 9.48617e-13, 3.32006e-13, -7.16681e-14, -9.91256e-16, 1.02102e-15, -1.43654e-15, -5.80553e-15, -1.67086e-14, -2.6155e-14, -1.30171e-14, 1.30146e-14, 2.15714e-14, 1.57331e-14, 8.51615e-15, 3.41339e-15, 4.3932e-16, -1.24052e-15, -2.10277e-15, -2.21162e-15, -1.648e-15, -7.19e-16, 1.52e-16, 6.77e-16, 8.087e-16, 6.61e-16, 3.878e-16, 1.116e-16, -9.45e-17, -2.017e-16, -2.16e-16, -1.649e-16, -8.51e-17, -9.0e-18, 4.28e-17, 6.43e-17, 6.01e-17, 4.06e-17, 1.67e-17, -3.3e-18, -1.54e-17, -1.89e-17, -1.59e-17, -9.4e-18, -2.7e-18, 2.5e-18, 5.1e-18, 5.4e-18, 4.1e-18, 2.1e-18, 2.0e-19, -1.0e-18, -1.6e-18, -1.4e-18, -1.0e-18, -4.0e-19, 1.0e-19, 4.0e-19, 5.0e-19, 4.0e-19, 2.0e-19, 0.0, -1.0e-19, -1.0e-19, -2.0e-19, -1.0e-19, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 3.01652e-15]

println(t[60:164]-t2[60:164])
[0.0, 0.0, 0.0, -4.0e-5, 0.00024, 0.00671, -0.00564, -0.01308, -0.00187, -0.07467, -0.22029, -0.27111, -0.4239, -0.44788, -0.44809, -0.41139, -0.42428, -0.61093, -0.4925, -0.7261, -0.7047, -1.0126, -1.0454, -1.4039, -1.4039, -1.3675, -1.2387, -1.0465, -0.8673, -0.7418, -0.681, -0.6776, -0.7136, -0.7654, -0.8115, -0.8404, -0.8501, -0.8446, -0.8311, -0.816, -0.8042, -0.7976, -0.7963, -0.7988, -0.8031, -0.8072, -0.8102, -0.8116, -0.8114, -0.8105, -0.8092, -0.8081, -0.8074, -0.8072, -0.8073, -0.8076, -0.808, -0.8083, -0.8084, -0.8085, -0.8084, -0.8083, -0.8082, -0.8082, -0.8081, -0.8081, -0.8081, -0.8082, -0.8081, -0.8081, -0.8082, -0.8082, -0.8082, -0.8082, -0.8082, -0.8082, -0.8081, -0.8081, -0.8082, -0.8082, -0.8082, -0.8082, -0.8081, -0.8081, -0.8082, -0.8082, -0.8082, -0.8082, -0.8081, -0.8081, -0.8082, -0.8082, -0.8082, -0.8082, -0.8081, -0.8081, -0.8082, -0.8082, -0.8082, -0.8082, -0.8081, -0.8081, -0.8082, -0.8082, -0.2011]

The us diverge at index 60 while the ts diverge at index 63: this points to a difference in the u calculation and not in the timestepping.
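
(A quick way to locate these indices, assuming u, u2, t, t2 are the full solution vectors the slices above came from:)

    findfirst(!iszero, u .- u2)   # 60: first index where the us differ
    findfirst(!iszero, t .- t2)   # 63: first index where the ts differ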

@devmotion (Member)

I also did some tests and printing here: https://ci.appveyor.com/project/devmotion/delaydiffeq-jl-p4v3l/build/job/mh7o7u4kwx0470xx. It seems that

 alg4 = MethodOfSteps(DP5(), constrained=false, max_fixedpoint_iters=100,
                      fixedpoint_abstol=1e-12, fixedpoint_reltol=1e-12)
 sol4 = solve(prob, alg4)

yields the same time steps and approximately the same values, whereas

alg1 = MethodOfSteps(Tsit5(), constrained=false, max_fixedpoint_iters=100,
                      fixedpoint_abstol=1e-12, fixedpoint_reltol=1e-12)
sol1 = solve(prob, alg1)

leads to different time steps and different values. Accordingly, the other error margins of this group on my computer

abs(sol1[end] - sol2[end]) = 2.418942752323824e-11
abs(sol1[end] - sol3[end]) = 1.2170867856831081e-14
abs(sol1[end] - sol4[end]) = 7.581097726346565e-15

are all different from the errors on Windows 32bit:

abs(sol1[end] - sol2[end]) = 2.4185782736594985e-11
abs(sol1[end] - sol3[end]) = 8.526081213579137e-15
abs(sol1[end] - sol4[end]) = 1.1225840113190541e-14

Interestingly, the errors of the group above with one delay are, on my computer,

abs(sol1[end] - sol2[end]) = 0.0005272600653263293
abs(sol1[end] - sol3[end]) = 4.101525119537541e-8
abs(sol1[end] - sol4[end]) = 0.0002801627663326922

and on Windows 32bit

abs(sol1[end] - sol2[end]) = 0.0005272600653263293
abs(sol1[end] - sol3[end]) = 4.101525119537541e-8
abs(sol1[end] - sol4[end]) = 0.0002801627663326922

but this example is rather boring since the solution stays constant most of the time.

@ChrisRackauckas (Member Author)

So yeah, I think that means it's Tsit5-specific. It must be in one of these two spots:

https://github.com/JuliaDiffEq/OrdinaryDiffEq.jl/blob/master/src/integrators/low_order_rk_integrators.jl#L240
https://github.com/JuliaDiffEq/OrdinaryDiffEq.jl/blob/master/src/dense/interpolants.jl#L166

I don't see any integers involved though...

@devmotion (Member) commented Aug 1, 2017

I did some quite detailed printing, which ended up with these absurdly long logs at https://ci.appveyor.com/project/devmotion/delaydiffeq-jl-p4v3l/build/job/so57rm2m0m4d9cnp and https://ci.appveyor.com/project/devmotion/delaydiffeq-jl-p4v3l/build/job/1nhtli7vngkn69qi for Windows 32bit and Windows 64bit, respectively. Before every calculation of the next step I printed integrator.t and integrator.dt, and immediately after every step, among other things, integrator.u and integrator.k.
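
Roughly, the instrumentation was of this form (a sketch of the placement only, not the committed code):

    @show (integrator.t, integrator.dt)   # before computing the next step
    # ... perform_step! ...
    @show integrator.u                    # immediately after the step
    @show integrator.k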

For the example of

alg1 = MethodOfSteps(Tsit5(), constrained=false, max_fixedpoint_iters=100,
                      fixedpoint_abstol=1e-12, fixedpoint_reltol=1e-12)
sol1 = solve(prob, alg1)

I discovered the first different numbers in line 3452: on Windows 32bit I got

(integrator.t, integrator.dt) = (1.845212045373377, 0.19346083182274398)

versus

(integrator.t, integrator.dt) = (1.845212045373377, 0.193460831822744)

on Windows 64bit, leading to slightly different integrator.u in the next step.

For the example of

 alg4 = MethodOfSteps(DP5(), constrained=false, max_fixedpoint_iters=100,
                      fixedpoint_abstol=1e-12, fixedpoint_reltol=1e-12)
 sol4 = solve(prob, alg4)

(which yields approximately the same time steps and values) the first difference can be found in line 6723: on Windows 32bit we get

(integrator.t, integrator.dt) = (3.344248354743839, 0.6194676352549794)

and on Windows 64bit

(integrator.t, integrator.dt) = (3.344248354743839, 0.6194676352549793)

which in turn then also leads to slightly different values of integrator.u in the next step.

So is it possible that these slightly different numbers are caused by https://github.com/JuliaDiffEq/OrdinaryDiffEq.jl/blob/master/src/integrators/integrator_utils.jl#L165, and in particular by @fastmath? This might also explain why for a different problem these algorithms lead to the same results on Windows 32bit and 64bit...

I'll try to disable @fastmath with --math-mode=ieee.
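
(For intuition: @fastmath lets LLVM reassociate floating-point operations, and different associations round differently. A minimal illustration of the effect, not code from the package:)

    plain(a, b, c) = (a + b) + c
    @fastmath fast(a, b, c) = (a + b) + c   # LLVM may reassociate, e.g. to a + (b + c)
    plain(1e16, -1e16, 1.0)                 # 1.0
    fast(1e16, -1e16, 1.0)                  # may differ (e.g. 0.0), depending on codegen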

@devmotion (Member)

No, that does not seem to work. Running the test scripts with C:\projects\julia\bin\julia --math-mode=ieee -e "Pkg.test(\"DelayDiffEq\")" leads to exactly the same numbers.

@ChrisRackauckas (Member Author)

Bizarre. That area does define a few constants using rationals

https://github.com/JuliaDiffEq/OrdinaryDiffEq.jl/blob/master/src/alg_utils.jl#L231

but those numbers are all so far away from overflowing that it really shouldn't be due to that. I am utterly confused, since those all look like simple floating-point operations. As you point out, the only notable fact there is that it's one of the few areas using @fastmath.

@ChrisRackauckas (Member Author)

Hmm, can we try running it without @fastmath directly? We can make a branch, check out that branch, and run it from there. Maybe C:\projects\julia\bin\julia --math-mode=ieee -e "Pkg.test(\"DelayDiffEq\")" isn't actually doing what we want.
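
With the old package manager that would be something like the following, where remove-fastmath is a hypothetical branch name with the macro stripped out:

    Pkg.checkout("OrdinaryDiffEq", "remove-fastmath")   # hypothetical branch without @fastmath
    Pkg.test("DelayDiffEq")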

@ChrisRackauckas (Member Author)

The test failures will go away because of df05da0

Interesting, there was a flip there. Talking with Simon, it sounds like @fastmath could be causing this.

@codecov (bot) commented Aug 2, 2017

Codecov Report

Merging #22 into master will decrease coverage by 0.05%.
The diff coverage is n/a.


@@            Coverage Diff             @@
##           master      #22      +/-   ##
==========================================
- Coverage   82.37%   82.31%   -0.06%     
==========================================
  Files           9        9              
  Lines         295      294       -1     
==========================================
- Hits          243      242       -1     
  Misses         52       52
Impacted Files Coverage Δ
src/integrator_interface.jl 62.5% <0%> (-0.43%) ⬇️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update 1c0d652...f7ad5f3.

@coveralls (bot) commented Aug 2, 2017

Coverage Status

Coverage decreased (-0.06%) to 82.313% when pulling 7b4a8e1 on win32_tests into 1c0d652 on master.

@coveralls (bot) commented Aug 2, 2017

Coverage Status

Coverage decreased (-0.06%) to 82.313% when pulling c51935f on win32_tests into 1c0d652 on master.

@coveralls (bot)

Coverage Status

Coverage decreased (-0.06%) to 82.313% when pulling f7ad5f3 on win32_tests into 1c0d652 on master.


@ChrisRackauckas (Member Author)

With @fastmath off I get

println((u-u2)[60:end])

[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.8e-13, 0.0, -2.0e-13, -4.0e-14, 2.0e-14, 1.0e-14, 4.0e-15, -1.0e-15, -7.3e-16, -5.0e-17, 0.0, -1.0e-18, 1.0e-19, 7.8e-20, 0.0, 6.0e-21, 6.0e-20, 1.9e-19, 4.0e-19, 3.0e-19, -4.0e-19, -6.0e-19, -4.0e-19, -2.0e-19, -1.0e-19, -1.0e-20, 3.0e-20, 4.0e-20, 4.0e-20, 0.0, 1.0e-19, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, -1.0e-19]

and the ts are the same. So I think a large part of this is due to @fastmath. But talking to a few different people, it sounds like we can't expect LLVM to optimize the same at this level between two separate computers: the SIMD instructions can be different, the number of registers can be different, etc. There may even be something in the calculation of the power that comes out different here. So I think this level of variation is not something we can control, and to get identical results here we'd need to pull off @fastmath, pull off @muladd, and probably reduce the optimization level.

Instead of pursuing how to control the optimizations to get the same numerical results, I think we should instead focus on increasing the numerical stability of the error estimate.

atmp[i] = (utilde[i]-u[i])/(atol+max(abs(uprev[i]),abs(u[i]))*rtol)

That actually has more cancellation than it needs to. What we could do is define btilde_i = b_i - a7_i (and comment out the b_i in the tableaus for future reference, since they are unnecessary to keep) and then redefine

utilde[i] = dt*(btilde1*k1[i] + btilde2*k2[i] + btilde3*k3[i] + btilde4*k4[i] + btilde5*k5[i] + btilde6*k6[i] + btilde7*k7[i])

with

atmp[i] = utilde[i]/(atol+max(abs(uprev[i]),abs(u[i]))*rtol)

By getting rid of the uprev[i] cancellation between utilde and u, we decrease the largest value involved, and thus more digits of the difference remain floating-point representable.
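
A rough sketch of the difference, with made-up coefficients rather than the actual Tsit5 tableau:

    # Hypothetical 7-stage weights, only to illustrate the cancellation.
    b  = (0.10, 0.20, 0.30, 0.20, 0.10, 0.06, 0.04)
    a7 = (0.09, 0.21, 0.29, 0.21, 0.09, 0.07, 0.04)
    btilde = b .- a7                           # formed once, when the tableau is defined

    k = (1.0, 1.1, 0.9, 1.05, 0.95, 1.0, 1.02)
    uprev, dt = 1.0, 0.1
    u = uprev + dt * sum(a7 .* k)              # the accepted step

    err_old = (uprev + dt * sum(b .* k)) - u   # utilde - u: two nearly equal numbers cancel
    err_new = dt * sum(btilde .* k)            # the small difference is formed directly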

ChrisRackauckas deleted the win32_tests branch on August 2, 2017 05:56
@ChrisRackauckas (Member Author)

We can also sort the numbers in summations from smallest to largest in magnitude. That's actually mentioned here:

http://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html#1070

and I've seen that some other implementations, like Hairer's, do this every once in a while. This may be worth considering. Compensated summation can also help, but that's overkill in many cases and does have an overhead.
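
For reference, a minimal compensated (Kahan) summation sketch; sorting by magnitude, sum(sort(xs, by = abs)), is the cheaper alternative mentioned above:

    function kahan_sum(xs)
        s = zero(eltype(xs))
        c = zero(eltype(xs))      # running compensation for lost low-order bits
        for x in xs
            y = x - c             # subtract the error carried from the previous add
            t = s + y
            c = (t - s) - y       # recover what was lost in s + y
            s = t
        end
        return s
    end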

I made an issue for discussing this further: SciML/OrdinaryDiffEq.jl#93

@devmotion (Member)

Good that you finally found the reason for these differences! But I'm still surprised that --math-mode=ieee did not disable @fastmath...

@ChrisRackauckas (Member Author)

When I asked the Julia devs on Slack, they didn't seem too surprised it didn't work. It must be something that is partially broken.
