
@ChrisRackauckas (Member)

Opening this so it's easier to track the printing, since AppVeyor doesn't run non-PR builds by default.

@ChrisRackauckas (Member Author)

println(u[60:164] - u2[60:164])
[1.0e-10, 1.5e-10, -1.0e-10, 1.3e-9, -4.78e-9, 6.3e-9, -9.993e-9, -7.61008e-9, 1.82e-11, 8.9333e-9, 7.27848e-9, -2.05748e-9, -1.91583e-9, 2.0198e-11, 1.66183e-10, 1.42968e-11, -1.33063e-11, -4.85546e-12, 9.48617e-13, 3.32006e-13, -7.16681e-14, -9.91256e-16, 1.02102e-15, -1.43654e-15, -5.80553e-15, -1.67086e-14, -2.6155e-14, -1.30171e-14, 1.30146e-14, 2.15714e-14, 1.57331e-14, 8.51615e-15, 3.41339e-15, 4.3932e-16, -1.24052e-15, -2.10277e-15, -2.21162e-15, -1.648e-15, -7.19e-16, 1.52e-16, 6.77e-16, 8.087e-16, 6.61e-16, 3.878e-16, 1.116e-16, -9.45e-17, -2.017e-16, -2.16e-16, -1.649e-16, -8.51e-17, -9.0e-18, 4.28e-17, 6.43e-17, 6.01e-17, 4.06e-17, 1.67e-17, -3.3e-18, -1.54e-17, -1.89e-17, -1.59e-17, -9.4e-18, -2.7e-18, 2.5e-18, 5.1e-18, 5.4e-18, 4.1e-18, 2.1e-18, 2.0e-19, -1.0e-18, -1.6e-18, -1.4e-18, -1.0e-18, -4.0e-19, 1.0e-19, 4.0e-19, 5.0e-19, 4.0e-19, 2.0e-19, 0.0, -1.0e-19, -1.0e-19, -2.0e-19, -1.0e-19, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 3.01652e-15]

println(t[60:164]-t2[60:164])
[0.0, 0.0, 0.0, -4.0e-5, 0.00024, 0.00671, -0.00564, -0.01308, -0.00187, -0.07467, -0.22029, -0.27111, -0.4239, -0.44788, -0.44809, -0.41139, -0.42428, -0.61093, -0.4925, -0.7261, -0.7047, -1.0126, -1.0454, -1.4039, -1.4039, -1.3675, -1.2387, -1.0465, -0.8673, -0.7418, -0.681, -0.6776, -0.7136, -0.7654, -0.8115, -0.8404, -0.8501, -0.8446, -0.8311, -0.816, -0.8042, -0.7976, -0.7963, -0.7988, -0.8031, -0.8072, -0.8102, -0.8116, -0.8114, -0.8105, -0.8092, -0.8081, -0.8074, -0.8072, -0.8073, -0.8076, -0.808, -0.8083, -0.8084, -0.8085, -0.8084, -0.8083, -0.8082, -0.8082, -0.8081, -0.8081, -0.8081, -0.8082, -0.8081, -0.8081, -0.8082, -0.8082, -0.8082, -0.8082, -0.8082, -0.8082, -0.8081, -0.8081, -0.8082, -0.8082, -0.8082, -0.8082, -0.8081, -0.8081, -0.8082, -0.8082, -0.8082, -0.8082, -0.8081, -0.8081, -0.8082, -0.8082, -0.8082, -0.8082, -0.8081, -0.8081, -0.8082, -0.8082, -0.8082, -0.8082, -0.8081, -0.8081, -0.8082, -0.8082, -0.2011]

The us diverge at index 60 while the ts diverge at index 63: this points to a difference in the u calculation and not in the timestepping.
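
(A quick way to locate these indices, assuming u, u2, t, t2 are the full solution vectors the slices above came from:)

    findfirst(!iszero, u .- u2)   # 60: first index where the us differ
    findfirst(!iszero, t .- t2)   # 63: first index where the ts differ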

@devmotion (Member)

I also did some tests and printing here: https://ci.appveyor.com/project/devmotion/delaydiffeq-jl-p4v3l/build/job/mh7o7u4kwx0470xx. It seems that

 alg4 = MethodOfSteps(DP5(), constrained=false, max_fixedpoint_iters=100,
                      fixedpoint_abstol=1e-12, fixedpoint_reltol=1e-12)
 sol4 = solve(prob, alg4)

yields the same time steps and approximately the same values, whereas

alg1 = MethodOfSteps(Tsit5(), constrained=false, max_fixedpoint_iters=100,
                      fixedpoint_abstol=1e-12, fixedpoint_reltol=1e-12)
sol1 = solve(prob, alg1)

leads to different time steps and different values. Accordingly, the other error margins of this group on my computer

abs(sol1[end] - sol2[end]) = 2.418942752323824e-11
abs(sol1[end] - sol3[end]) = 1.2170867856831081e-14
abs(sol1[end] - sol4[end]) = 7.581097726346565e-15

are all different from the errors on Windows 32bit:

abs(sol1[end] - sol2[end]) = 2.4185782736594985e-11
abs(sol1[end] - sol3[end]) = 8.526081213579137e-15
abs(sol1[end] - sol4[end]) = 1.1225840113190541e-14

Interestingly, the errors of the group above with one delay are, on my computer,

abs(sol1[end] - sol2[end]) = 0.0005272600653263293
abs(sol1[end] - sol3[end]) = 4.101525119537541e-8
abs(sol1[end] - sol4[end]) = 0.0002801627663326922

and on Windows 32bit

abs(sol1[end] - sol2[end]) = 0.0005272600653263293
abs(sol1[end] - sol3[end]) = 4.101525119537541e-8
abs(sol1[end] - sol4[end]) = 0.0002801627663326922

but this example is rather boring since the solution stays constant most of the time.

@ChrisRackauckas (Member Author)

So yeah, I think that means it's Tsit5-specific. It must be in one of these two spots:

https://github.com/JuliaDiffEq/OrdinaryDiffEq.jl/blob/master/src/integrators/low_order_rk_integrators.jl#L240
https://github.com/JuliaDiffEq/OrdinaryDiffEq.jl/blob/master/src/dense/interpolants.jl#L166

I don't see any integers involved though...

@devmotion (Member) commented Aug 1, 2017

I did some quite detailed printing, which ended up with these absurdly long logs at https://ci.appveyor.com/project/devmotion/delaydiffeq-jl-p4v3l/build/job/so57rm2m0m4d9cnp and https://ci.appveyor.com/project/devmotion/delaydiffeq-jl-p4v3l/build/job/1nhtli7vngkn69qi for Windows 32bit and Windows 64bit, respectively. Before every calculation of the next step I printed integrator.t and integrator.dt, and immediately after every step, among other things, integrator.u and integrator.k.
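
Roughly, the instrumentation was of this form (a sketch of the placement only, not the committed code):

    @show (integrator.t, integrator.dt)   # before computing the next step
    # ... perform_step! ...
    @show integrator.u                    # immediately after the step
    @show integrator.k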

For the example of

alg1 = MethodOfSteps(Tsit5(), constrained=false, max_fixedpoint_iters=100,
                      fixedpoint_abstol=1e-12, fixedpoint_reltol=1e-12)
sol1 = solve(prob, alg1)

I discovered the first different numbers in line 3452: on Windows 32bit I got

(integrator.t, integrator.dt) = (1.845212045373377, 0.19346083182274398)

versus

(integrator.t, integrator.dt) = (1.845212045373377, 0.193460831822744)

on Windows 64bit, leading to slightly different integrator.u in the next step.

For the example of

 alg4 = MethodOfSteps(DP5(), constrained=false, max_fixedpoint_iters=100,
                      fixedpoint_abstol=1e-12, fixedpoint_reltol=1e-12)
 sol4 = solve(prob, alg4)

(which yields approximately the same time steps and values) the first difference can be found in line 6723: on Windows 32bit we get

(integrator.t, integrator.dt) = (3.344248354743839, 0.6194676352549794)

and on Windows 64bit

(integrator.t, integrator.dt) = (3.344248354743839, 0.6194676352549793)

which in turn then also leads to slightly different values of integrator.u in the next step.

So is it possible that these slightly different numbers are caused by https://github.com/JuliaDiffEq/OrdinaryDiffEq.jl/blob/master/src/integrators/integrator_utils.jl#L165, and in particular by @fastmath? This might also explain why for a different problem these algorithms lead to the same results on Windows 32bit and 64bit...

I'll try to disable @fastmath with --math-mode=ieee.
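
(For intuition: @fastmath lets LLVM reassociate floating-point operations, and different associations round differently. A minimal illustration of the effect, not code from the package:)

    plain(a, b, c) = (a + b) + c
    @fastmath fast(a, b, c) = (a + b) + c   # LLVM may reassociate, e.g. to a + (b + c)
    plain(1e16, -1e16, 1.0)                 # 1.0
    fast(1e16, -1e16, 1.0)                  # may differ (e.g. 0.0), depending on codegen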

@devmotion (Member)

No, that does not seem to work. Running the test scripts with C:\projects\julia\bin\julia --math-mode=ieee -e "Pkg.test(\"DelayDiffEq\")" leads to exactly the same numbers.

@ChrisRackauckas (Member Author)

Bizarre. That area does define a few constants using rationals

https://github.com/JuliaDiffEq/OrdinaryDiffEq.jl/blob/master/src/alg_utils.jl#L231

but those numbers are all so far away from overflowing that it really shouldn't be due to that. I am utterly confused, since those all look like simple floating-point operations. As you point out, the only notable fact there is that it's one of the few areas using @fastmath.

@ChrisRackauckas (Member Author)

Hmm, can we try running it without @fastmath directly? We can make a branch, check out that branch, and run it from there. Maybe C:\projects\julia\bin\julia --math-mode=ieee -e "Pkg.test(\"DelayDiffEq\")" isn't actually doing what we want.
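
With the old package manager that would be something like the following, where remove-fastmath is a hypothetical branch name with the macro stripped out:

    Pkg.checkout("OrdinaryDiffEq", "remove-fastmath")   # hypothetical branch without @fastmath
    Pkg.test("DelayDiffEq")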

@ChrisRackauckas (Member Author)

The test failures will go away because of df05da0

Interesting, there was a flip there. Talking with Simon, it sounds like @fastmath could be causing this.

@codecov (bot) commented Aug 2, 2017

Codecov Report

Merging #22 into master will decrease coverage by 0.05%.
The diff coverage is n/a.


@@            Coverage Diff             @@
##           master      #22      +/-   ##
==========================================
- Coverage   82.37%   82.31%   -0.06%     
==========================================
  Files           9        9              
  Lines         295      294       -1     
==========================================
- Hits          243      242       -1     
  Misses         52       52
Impacted Files Coverage Δ
src/integrator_interface.jl 62.5% <0%> (-0.43%) ⬇️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update 1c0d652...f7ad5f3.

@coveralls (bot) commented Aug 2, 2017

Coverage Status

Coverage decreased (-0.06%) to 82.313% when pulling 7b4a8e1 on win32_tests into 1c0d652 on master.

@coveralls (bot) commented Aug 2, 2017

Coverage Status

Coverage decreased (-0.06%) to 82.313% when pulling c51935f on win32_tests into 1c0d652 on master.

@coveralls (bot)

Coverage Status

Coverage decreased (-0.06%) to 82.313% when pulling f7ad5f3 on win32_tests into 1c0d652 on master.


@ChrisRackauckas (Member Author)

With @fastmath off I get

println((u-u2)[60:end])

[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.8e-13, 0.0, -2.0e-13, -4.0e-14, 2.0e-14, 1.0e-14, 4.0e-15, -1.0e-15, -7.3e-16, -5.0e-17, 0.0, -1.0e-18, 1.0e-19, 7.8e-20, 0.0, 6.0e-21, 6.0e-20, 1.9e-19, 4.0e-19, 3.0e-19, -4.0e-19, -6.0e-19, -4.0e-19, -2.0e-19, -1.0e-19, -1.0e-20, 3.0e-20, 4.0e-20, 4.0e-20, 0.0, 1.0e-19, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, -1.0e-19]

and the ts are the same. So I think a large part of this is due to @fastmath. But talking to a few different people, it sounds like we can't expect LLVM to optimize the same at this level between two separate computers: the SIMD instructions can be different, the number of registers can be different, etc. There may even be something in the calculation of the power that comes out different here. So I think this level of variation is not something we can control, and to get identical results here we'd need to pull off @fastmath, pull off @muladd, and probably reduce the optimization level.

Instead of pursuing how to control the optimizations to get the same numerical results, I think we should instead focus on increasing the numerical stability of the error estimate.

atmp[i] = (utilde[i]-u[i])/(atol+max(abs(uprev[i]),abs(u[i]))*rtol)

That actually has more cancellation than it needs to. What we could do is define btilde_i = b_i - a7_i (and comment out the b_i in the tableaus for future reference, since they are unnecessary to keep) and then redefine

utilde[i] = dt*(btilde1*k1[i] + btilde2*k2[i] + btilde3*k3[i] + btilde4*k4[i] + btilde5*k5[i] + btilde6*k6[i] + btilde7*k7[i])

with

atmp[i] = utilde[i]/(atol+max(abs(uprev[i]),abs(u[i]))*rtol)

By getting rid of the uprev[i] cancellation between utilde and u, we decrease the largest value involved, and thus more digits of the difference remain floating-point representable.
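
A rough sketch of the difference, with made-up coefficients rather than the actual Tsit5 tableau:

    # Hypothetical 7-stage weights, only to illustrate the cancellation.
    b  = (0.10, 0.20, 0.30, 0.20, 0.10, 0.06, 0.04)
    a7 = (0.09, 0.21, 0.29, 0.21, 0.09, 0.07, 0.04)
    btilde = b .- a7                           # formed once, when the tableau is defined

    k = (1.0, 1.1, 0.9, 1.05, 0.95, 1.0, 1.02)
    uprev, dt = 1.0, 0.1
    u = uprev + dt * sum(a7 .* k)              # the accepted step

    err_old = (uprev + dt * sum(b .* k)) - u   # utilde - u: two nearly equal numbers cancel
    err_new = dt * sum(btilde .* k)            # the small difference is formed directly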

ChrisRackauckas deleted the win32_tests branch on August 2, 2017 05:56
@ChrisRackauckas (Member Author)

We can also sort the numbers in summations from smallest to largest in magnitude. That's actually mentioned here:

http://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html#1070

and I've seen that some other implementations, like Hairer's, do this every once in a while. This may be worth considering. Compensated summation can also help, but that's overkill in many cases and does have an overhead.
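
For reference, a minimal compensated (Kahan) summation sketch; sorting by magnitude, sum(sort(xs, by = abs)), is the cheaper alternative mentioned above:

    function kahan_sum(xs)
        s = zero(eltype(xs))
        c = zero(eltype(xs))      # running compensation for lost low-order bits
        for x in xs
            y = x - c             # subtract the error carried from the previous add
            t = s + y
            c = (t - s) - y       # recover what was lost in s + y
            s = t
        end
        return s
    end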

I made an issue for discussing this further: SciML/OrdinaryDiffEq.jl#93

@devmotion (Member)

Good that you finally found the reason for these differences! But I'm still surprised that --math-mode=ieee did not disable @fastmath...

@ChrisRackauckas (Member Author)

When I asked the Julia devs on Slack, they didn't seem too surprised it didn't work. It must be something that is partially broken.
