This repository was archived by the owner on May 6, 2021. It is now read-only.

Conversation


@jbrea (Contributor) commented on Jun 24, 2020

Experiments on Pong seem stable now (trained for 6M steps). The expected duration of the whole experiment is 8 days on a Tesla V100-SXM2. Is this fast or slow? (I guess I should compare it to Ray's RLlib at some point.)

@jbrea requested a review from @findmyway on June 24, 2020 10:08
Comment on lines +187 to +190
aₜ = argmax(mean(zₜ, dims = 2), dims = 1)
aₜ = aₜ .+ typeof(aₜ)(CartesianIndices((0, 0:N′-1, 0)))
qₜ = reshape(zₜ[aₜ], :, batch_size)
target = reshape(r, 1, batch_size) .+ learner.γ * reshape(1 .- t, 1, batch_size) .* qₜ # reshape to allow broadcast
Member

👍 👍 👍
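Editorial note: the four reviewed lines gather the quantile values of the greedy action and build the one-step TD target. Below is a minimal, self-contained sketch of the same computation; the shapes and names (`n_actions`, `N′`, `batch_size`, `γ`, and the assumption that `zₜ` has shape `(n_actions, N′, batch_size)` while `r` and `t` are the batch reward and terminal-flag vectors) are illustrative assumptions, not taken verbatim from the PR.

```julia
# Sketch of the reviewed target computation (hypothetical shapes and values).
using Statistics: mean

n_actions, N′, batch_size, γ = 3, 4, 2, 0.99f0
zₜ = rand(Float32, n_actions, N′, batch_size)  # target-network quantile values
r  = rand(Float32, batch_size)                 # rewards for the batch
t  = Float32[0, 1]                             # terminal flags for the batch

# Greedy action per batch element, chosen on the mean over quantiles: (1, 1, batch_size)
aₜ = argmax(mean(zₜ, dims = 2), dims = 1)
# Shift the quantile index so all N′ quantiles of the greedy action are
# selected: the result has shape (1, N′, batch_size)
aₜ = aₜ .+ CartesianIndex.(0, reshape(0:N′-1, 1, :, 1), 0)
qₜ = reshape(zₜ[aₜ], :, batch_size)            # (N′, batch_size)
# Broadcast reward and (1 - terminal) over the quantile axis
target = reshape(r, 1, batch_size) .+ γ .* reshape(1 .- t, 1, batch_size) .* qₜ
```

Selecting the greedy action on the quantile mean and then offsetting the resulting `CartesianIndex`es is what lets the code gather every quantile of that action without an explicit loop; the final `reshape` calls only exist so the length-`batch_size` reward and terminal vectors broadcast against the `(N′, batch_size)` quantile matrix.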

@findmyway (Member)

> The expected duration of the whole experiment is 8 days on a Tesla V100-SXM2. Is this fast or slow?

Not so bad.


By the way, I fixed a subtle but important bug at JuliaReinforcementLearning/ReinforcementLearningCore.jl#83

According to my tests, the results with Rainbow should be aligned with those in Dopamine: https://google.github.io/dopamine/baselines/plots.html

@findmyway merged commit c8ebdd9 into master on Jun 24, 2020