1 parent 7c68af8 commit 4bd1164
intermediate_source/reinforcement_q_learning.py
@@ -388,7 +388,7 @@ def plot_durations():
 # single step of the optimization. It first samples a batch, concatenates
 # all the tensors into a single one, computes :math:`Q(s_t, a_t)` and
 # :math:`V(s_{t+1}) = \max_a Q(s_{t+1}, a)`, and combines them into our
-# loss. By defition we set :math:`V(s) = 0` if :math:`s` is a terminal
+# loss. By definition we set :math:`V(s) = 0` if :math:`s` is a terminal
 # state. We also use a target network to compute :math:`V(s_{t+1})` for
 # added stability. The target network has its weights kept frozen most of
 # the time, but is updated with the policy network's weights every so often.
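For context, the comment edited above describes the tutorial's optimization step. The following is a minimal sketch of what that step looks like, assuming the names used elsewhere in the tutorial (`memory`, `Transition`, `policy_net`, `target_net`, `optimizer`, `BATCH_SIZE`, `GAMMA`); it is illustrative, not a verbatim copy of the file.

```python
import torch
import torch.nn as nn

def optimize_model():
    # `memory`, `Transition`, `policy_net`, `target_net`, `optimizer`,
    # `BATCH_SIZE`, and `GAMMA` are assumed to be defined by the
    # surrounding tutorial code.
    if len(memory) < BATCH_SIZE:
        return
    transitions = memory.sample(BATCH_SIZE)
    # Transpose the batch: a list of Transitions -> a Transition of batches.
    batch = Transition(*zip(*transitions))

    # Mask of transitions whose next state is non-terminal.
    non_final_mask = torch.tensor(
        tuple(map(lambda s: s is not None, batch.next_state)), dtype=torch.bool)
    non_final_next_states = torch.cat(
        [s for s in batch.next_state if s is not None])

    state_batch = torch.cat(batch.state)
    action_batch = torch.cat(batch.action)
    reward_batch = torch.cat(batch.reward)

    # Q(s_t, a_t): values of the actions actually taken, from the policy net.
    state_action_values = policy_net(state_batch).gather(1, action_batch)

    # V(s_{t+1}) = max_a Q(s_{t+1}, a), computed with the frozen target net;
    # terminal states keep the default value of 0, as the comment states.
    next_state_values = torch.zeros(BATCH_SIZE)
    with torch.no_grad():
        next_state_values[non_final_mask] = target_net(non_final_next_states).max(1)[0]

    # Combine into the expected Q values and the loss.
    expected_state_action_values = (next_state_values * GAMMA) + reward_batch
    criterion = nn.SmoothL1Loss()
    loss = criterion(state_action_values,
                     expected_state_action_values.unsqueeze(1))

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```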