Description
Sometimes one wants to give a transformed reward to the learner but keep the true reward reported by the environment for evaluation purposes. For example, Dopamine clips all rewards to [-1, 1], and I believe some of our methods are unstable in the Atari domain because we don't clip the rewards. Where would it be best to transform rewards? Should we add a POST_OBSERVE hook, apply the transformation when transitions are put into buffers, or apply it just before the actual learning takes place?
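
As a point of reference, here is a minimal sketch of the "transform at the environment boundary" option: a wrapper that clips the reward the learner sees while stashing the unclipped reward for evaluation. It assumes a Gym-style `step(action) -> (obs, reward, done, info)` interface; the class and the `raw_reward` key are hypothetical names, not part of this codebase. The same clipping could instead live in the adder/buffer code or right before the loss computation, which would keep environment wrappers untouched.

```python
import numpy as np


class ClippedRewardWrapper:
    """Learner-facing rewards are clipped to [low, high]; the true reward
    is preserved in `info["raw_reward"]` for evaluation/logging."""

    def __init__(self, env, low=-1.0, high=1.0):
        self._env = env
        self._low = low
        self._high = high

    def reset(self, *args, **kwargs):
        return self._env.reset(*args, **kwargs)

    def step(self, action):
        obs, reward, done, info = self._env.step(action)
        # Keep the unclipped reward so evaluation can report the true return.
        info = dict(info, raw_reward=reward)
        clipped = float(np.clip(reward, self._low, self._high))
        return obs, clipped, done, info

    def __getattr__(self, name):
        # Delegate everything else to the wrapped environment.
        return getattr(self._env, name)
```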