Commit 4bd1164

tom-doerr and holly1238 authored
Fix typo (#891)
Co-authored-by: holly1238 <[email protected]>
1 parent 7c68af8 commit 4bd1164

File tree

1 file changed: +1 addition, -1 deletion

intermediate_source/reinforcement_q_learning.py

Lines changed: 1 addition & 1 deletion
@@ -388,7 +388,7 @@ def plot_durations():
 # single step of the optimization. It first samples a batch, concatenates
 # all the tensors into a single one, computes :math:`Q(s_t, a_t)` and
 # :math:`V(s_{t+1}) = \max_a Q(s_{t+1}, a)`, and combines them into our
-# loss. By defition we set :math:`V(s) = 0` if :math:`s` is a terminal
+# loss. By definition we set :math:`V(s) = 0` if :math:`s` is a terminal
 # state. We also use a target network to compute :math:`V(s_{t+1})` for
 # added stability. The target network has its weights kept frozen most of
 # the time, but is updated with the policy network's weights every so often.
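For context, a minimal sketch of the optimization step that this docstring describes, assuming the definitions from elsewhere in reinforcement_q_learning.py (a Transition namedtuple, a replay memory with a sample() method, policy_net/target_net models, and terminal next states stored as None); batch size and gamma are illustrative defaults, not taken from this commit:

    import torch
    import torch.nn as nn
    from collections import namedtuple

    # Matches the namedtuple defined earlier in the tutorial.
    Transition = namedtuple('Transition',
                            ('state', 'action', 'next_state', 'reward'))

    def optimize_model(policy_net, target_net, memory, optimizer,
                       batch_size=128, gamma=0.999):
        if len(memory) < batch_size:
            return
        # Sample a batch and transpose it:
        # list of Transitions -> Transition of batched tensors.
        transitions = memory.sample(batch_size)
        batch = Transition(*zip(*transitions))

        # By definition V(s) = 0 if s is terminal, so mask out
        # final states (assumed to be stored as None).
        non_final_mask = torch.tensor(
            tuple(s is not None for s in batch.next_state), dtype=torch.bool)
        non_final_next_states = torch.cat(
            [s for s in batch.next_state if s is not None])

        state_batch = torch.cat(batch.state)
        action_batch = torch.cat(batch.action)
        reward_batch = torch.cat(batch.reward)

        # Q(s_t, a_t): the policy net's value for the action actually taken.
        state_action_values = policy_net(state_batch).gather(1, action_batch)

        # V(s_{t+1}) = max_a Q(s_{t+1}, a), computed with the frozen
        # target network for added stability.
        next_state_values = torch.zeros(batch_size)
        with torch.no_grad():
            next_state_values[non_final_mask] = \
                target_net(non_final_next_states).max(1)[0]

        # Combine into the expected Q values and the Huber loss.
        expected_state_action_values = (next_state_values * gamma) + reward_batch
        loss = nn.SmoothL1Loss()(state_action_values,
                                 expected_state_action_values.unsqueeze(1))

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()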
