This repository was archived by the owner on May 6, 2021. It is now read-only.

Conversation


@jbrea (Contributor) commented on Jun 24, 2020

Experiments on Pong seem stable now (trained for 6M steps). The expected duration of the whole experiment is 8 days on a Tesla V100-SXM2. Is this fast or slow? (I guess I should compare it to Ray's RLlib at some point.)

@jbrea requested a review from @findmyway on June 24, 2020 10:08
Comment on lines +187 to +190
aₜ = argmax(mean(zₜ, dims = 2), dims = 1)
aₜ = aₜ .+ typeof(aₜ)(CartesianIndices((0, 0:N′-1, 0)))
qₜ = reshape(zₜ[aₜ], :, batch_size)
target = reshape(r, 1, batch_size) .+ learner.γ * reshape(1 .- t, 1, batch_size) .* qₜ # reshape to allow broadcast
Member

👍 👍 👍
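Editorial note: the four reviewed lines gather the quantile values of the greedy action and build the one-step TD target. Below is a minimal, self-contained sketch of the same computation; the shapes and names (`n_actions`, `N′`, `batch_size`, `γ`, and the assumption that `zₜ` has shape `(n_actions, N′, batch_size)` while `r` and `t` are the batch reward and terminal-flag vectors) are illustrative assumptions, not taken verbatim from the PR.

```julia
# Sketch of the reviewed target computation (hypothetical shapes and values).
using Statistics: mean

n_actions, N′, batch_size, γ = 3, 4, 2, 0.99f0
zₜ = rand(Float32, n_actions, N′, batch_size)  # target-network quantile values
r  = rand(Float32, batch_size)                 # rewards for the batch
t  = Float32[0, 1]                             # terminal flags for the batch

# Greedy action per batch element, chosen on the mean over quantiles: (1, 1, batch_size)
aₜ = argmax(mean(zₜ, dims = 2), dims = 1)
# Shift the quantile index so all N′ quantiles of the greedy action are
# selected: the result has shape (1, N′, batch_size)
aₜ = aₜ .+ CartesianIndex.(0, reshape(0:N′-1, 1, :, 1), 0)
qₜ = reshape(zₜ[aₜ], :, batch_size)            # (N′, batch_size)
# Broadcast reward and (1 - terminal) over the quantile axis
target = reshape(r, 1, batch_size) .+ γ .* reshape(1 .- t, 1, batch_size) .* qₜ
```

Selecting the greedy action on the quantile mean and then offsetting the resulting `CartesianIndex`es is what lets the code gather every quantile of that action without an explicit loop; the final `reshape` calls only exist so the length-`batch_size` reward and terminal vectors broadcast against the `(N′, batch_size)` quantile matrix.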

@findmyway (Member)

> The expected duration of the whole experiment is 8 days on a Tesla V100-SXM2. Is this fast or slow?

Not so bad.


By the way, I fixed a subtle but important bug at JuliaReinforcementLearning/ReinforcementLearningCore.jl#83

According to my tests, the results with Rainbow should be aligned with those in Dopamine: https://google.github.io/dopamine/baselines/plots.html

@findmyway merged commit c8ebdd9 into master on Jun 24, 2020