
Commit f3531fd

Merge branch 'master' into master
2 parents bbcc282 + 0445e9c commit f3531fd


6 files changed, +25 -23 lines changed


advanced_source/cpp_extension.rst

Lines changed: 14 additions & 14 deletions
@@ -115,13 +115,13 @@ PyTorch has no knowledge of the *algorithm* you are implementing. It knows only
 of the individual operations you use to compose your algorithm. As such, PyTorch
 must execute your operations individually, one after the other. Since each
 individual call to the implementation (or *kernel*) of an operation, which may
-involve launch of a CUDA kernel, has a certain amount of overhead, this overhead
-may become significant across many function calls. Furthermore, the Python
-interpreter that is running our code can itself slow down our program.
+involve the launch of a CUDA kernel, has a certain amount of overhead, this
+overhead may become significant across many function calls. Furthermore, the
+Python interpreter that is running our code can itself slow down our program.
 
 A definite method of speeding things up is therefore to rewrite parts in C++ (or
 CUDA) and *fuse* particular groups of operations. Fusing means combining the
-implementations of many functions into a single functions, which profits from
+implementations of many functions into a single function, which profits from
 fewer kernel launches as well as other optimizations we can perform with
 increased visibility of the global flow of data.
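As an aside (not part of this commit), a minimal Python sketch of the overhead argument in this hunk: the same pointwise math written as separate ops launches one kernel per op when run eagerly, while ``torch.jit.script`` may fuse such chains on supported backends::

    import torch

    # Three separate pointwise ops -> three separate kernel launches when run eagerly.
    def unfused(x, y):
        return torch.tanh(x) * torch.sigmoid(y) + x

    # Scripting may fuse simple pointwise chains like this; whether it does,
    # and how much it helps, depends on the backend and PyTorch version.
    fused = torch.jit.script(unfused)

    x, y = torch.randn(1024, 1024), torch.randn(1024, 1024)
    print(torch.allclose(unfused(x, y), fused(x, y)))  # same result either way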

@@ -509,12 +509,12 @@ and with our new C++ version::
   Forward: 349.335 us | Backward 443.523 us
 
 We can already see a significant speedup for the forward function (more than
-30%). For the backward function a speedup is visible, albeit not major one. The
-backward pass I wrote above was not particularly optimized and could definitely
-be improved. Also, PyTorch's automatic differentiation engine can automatically
-parallelize computation graphs, may use a more efficient flow of operations
-overall, and is also implemented in C++, so it's expected to be fast.
-Nevertheless, this is a good start.
+30%). For the backward function, a speedup is visible, albeit not a major one.
+The backward pass I wrote above was not particularly optimized and could
+definitely be improved. Also, PyTorch's automatic differentiation engine can
+automatically parallelize computation graphs, may use a more efficient flow of
+operations overall, and is also implemented in C++, so it's expected to be
+fast. Nevertheless, this is a good start.
 
 Performance on GPU Devices
 **************************
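As an aside, numbers like the ones quoted above come from a simple wall-clock loop. A minimal sketch (not the tutorial's exact benchmark; ``module`` and ``inputs`` stand in for whichever implementations you are comparing)::

    import time
    import torch

    def benchmark(module, inputs, iters=1000):
        forward_t = backward_t = 0.0
        for _ in range(iters):
            start = time.time()
            out = module(*inputs)
            forward_t += time.time() - start

            # Reduce the output to a scalar so the backward pass can be timed too.
            loss = out.sum() if torch.is_tensor(out) else sum(o.sum() for o in out)
            start = time.time()
            loss.backward()
            backward_t += time.time() - start
        print('Forward: {:.3f} us | Backward {:.3f} us'.format(
            forward_t / iters * 1e6, backward_t / iters * 1e6))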
@@ -571,7 +571,7 @@ And C++/ATen::
 
 That's a great overall speedup compared to non-CUDA code. However, we can pull
 even more performance out of our C++ code by writing custom CUDA kernels, which
-we'll dive into soon. Before that, let's dicuss another way of building your C++
+we'll dive into soon. Before that, let's discuss another way of building your C++
 extensions.
 
 JIT Compiling Extensions
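As an aside, the JIT route that section covers boils down to ``torch.utils.cpp_extension.load``, which compiles and imports an extension at runtime instead of through a ``setup.py`` build. A minimal sketch, where ``lltm.cpp`` is a placeholder for your own source file::

    from torch.utils.cpp_extension import load

    # Compiles the sources on first use and caches the resulting module.
    lltm_cpp = load(name="lltm_cpp", sources=["lltm.cpp"], verbose=True)
    # The loaded module exposes whatever functions the C++ sources bind,
    # e.g. lltm_cpp.forward(...) if a "forward" binding was registered.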
@@ -851,7 +851,7 @@ and ``Double``), you can use ``AT_DISPATCH_ALL_TYPES``.
 
 Note that we perform some operations with plain ATen. These operations will
 still run on the GPU, but using ATen's default implementations. This makes
-sense, because ATen will use highly optimized routines for things like matrix
+sense because ATen will use highly optimized routines for things like matrix
 multiplies (e.g. ``addmm``) or convolutions which would be much harder to
 implement and improve ourselves.
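As an aside, the ``addmm`` mentioned there is itself a small fusion: a bias add folded into a matrix multiply. A quick illustration with made-up shapes (not from this commit)::

    import torch

    bias = torch.randn(3)
    x = torch.randn(4, 5)
    w = torch.randn(3, 5)
    out = torch.addmm(bias, x, w.t())            # bias + x @ w.t() in one call
    assert torch.allclose(out, bias + x @ w.t())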

@@ -903,7 +903,7 @@ You can see in the CUDA kernel that we work directly on pointers with the right
 type. Indeed, working directly with high level type agnostic tensors inside cuda
 kernels would be very inefficient.
 
-However, this comes at a cost of ease of use and readibility, especially for
+However, this comes at a cost of ease of use and readability, especially for
 highly dimensional data. In our example, we know for example that the contiguous
 ``gates`` tensor has 3 dimensions:
 
@@ -920,7 +920,7 @@ arithmetic.
   gates.data<scalar_t>()[n*3*state_size + row*state_size + column]
 
 
-In addition to being verbose, this expression needs stride to be explicitely
+In addition to being verbose, this expression needs stride to be explicitly
 known, and thus passed to the kernel function within its arguments. You can see
 that in the case of kernel functions accepting multiple tensors with different
 sizes you will end up with a very long list of arguments.
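As an aside, the index arithmetic quoted in that hunk is just the row-major flattening of a contiguous ``(batch, 3, state_size)`` tensor. A quick Python check with arbitrary example sizes::

    import torch

    batch, state_size = 4, 5
    gates = torch.randn(batch, 3, state_size)
    flat = gates.reshape(-1)                     # contiguous, so this is just a view

    n, row, column = 2, 1, 3
    # Same element, addressed flat vs. with multi-dimensional indexing.
    assert flat[n * 3 * state_size + row * state_size + column] == gates[n, row, column]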

beginner_source/blitz/cifar10_tutorial.py

Lines changed: 3 additions & 3 deletions
@@ -43,15 +43,15 @@
 
 We will do the following steps in order:
 
-1. Load and normalizing the CIFAR10 training and test datasets using
+1. Load and normalize the CIFAR10 training and test datasets using
    ``torchvision``
 2. Define a Convolutional Neural Network
 3. Define a loss function
 4. Train the network on the training data
 5. Test the network on the test data
 
-1. Loading and normalizing CIFAR10
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+1. Load and normalize CIFAR10
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
 Using ``torchvision``, it’s extremely easy to load CIFAR10.
 """
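As an aside, a minimal sketch of that first step; the normalization constants and batch size are illustrative choices, not taken from this commit::

    import torch
    import torchvision
    import torchvision.transforms as transforms

    # Convert images to tensors and normalize each RGB channel.
    transform = transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),
    ])

    trainset = torchvision.datasets.CIFAR10(root='./data', train=True,
                                            download=True, transform=transform)
    trainloader = torch.utils.data.DataLoader(trainset, batch_size=4,
                                              shuffle=True, num_workers=2)

    testset = torchvision.datasets.CIFAR10(root='./data', train=False,
                                           download=True, transform=transform)
    testloader = torch.utils.data.DataLoader(testset, batch_size=4,
                                             shuffle=False, num_workers=2)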

beginner_source/blitz/neural_networks_tutorial.py

Lines changed: 2 additions & 2 deletions
@@ -58,7 +58,7 @@ def __init__(self):
     def forward(self, x):
         # Max pooling over a (2, 2) window
         x = F.max_pool2d(F.relu(self.conv1(x)), (2, 2))
-        # If the size is a square you can only specify a single number
+        # If the size is a square, you can specify with a single number
         x = F.max_pool2d(F.relu(self.conv2(x)), 2)
         x = x.view(-1, self.num_flat_features(x))
         x = F.relu(self.fc1(x))
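As an aside, a quick check of what that comment says: for a square pooling window, passing a single number is equivalent to passing the tuple::

    import torch
    import torch.nn.functional as F

    x = torch.randn(1, 1, 8, 8)
    assert torch.equal(F.max_pool2d(x, 2), F.max_pool2d(x, (2, 2)))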
@@ -176,7 +176,7 @@ def num_flat_features(self, x):
 # -> loss
 #
 # So, when we call ``loss.backward()``, the whole graph is differentiated
-# w.r.t. the loss, and all Tensors in the graph that has ``requires_grad=True``
+# w.r.t. the loss, and all Tensors in the graph that have ``requires_grad=True``
 # will have their ``.grad`` Tensor accumulated with the gradient.
 #
 # For illustration, let us follow a few steps backward:
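As an aside, a tiny standalone illustration of that sentence: ``backward()`` populates ``.grad`` on leaf tensors with ``requires_grad=True``, and repeated calls accumulate into it::

    import torch

    w = torch.randn(3, requires_grad=True)
    (w * 2).sum().backward()
    print(w.grad)            # gradient of sum(2*w) w.r.t. w: all twos
    (w * 2).sum().backward()
    print(w.grad)            # accumulated: all fours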

beginner_source/nlp/README.txt

Lines changed: 2 additions & 2 deletions
@@ -14,9 +14,9 @@ Deep Learning for NLP with Pytorch
    https://pytorch.org/tutorials/beginner/nlp/word_embeddings_tutorial.html
 
 4. sequence_models_tutorial.py
-   Sequence Models and Long-Short Term Memory Networks
+   Sequence Models and Long Short-Term Memory Networks
    https://pytorch.org/tutorials/beginner/nlp/sequence_models_tutorial.html
 
 5. advanced_tutorial.py
    Advanced: Making Dynamic Decisions and the Bi-LSTM CRF
-   https://pytorch.org/tutorials/beginner/nlp/advanced_tutorial.html
+   https://pytorch.org/tutorials/beginner/nlp/advanced_tutorial.html

beginner_source/nlp/sequence_models_tutorial.py

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 # -*- coding: utf-8 -*-
 r"""
-Sequence Models and Long-Short Term Memory Networks
+Sequence Models and Long Short-Term Memory Networks
 ===================================================
 
 At this point, we have seen various feed-forward networks. That is,

beginner_source/nlp/word_embeddings_tutorial.py

Lines changed: 3 additions & 1 deletion
@@ -268,6 +268,8 @@ def forward(self, inputs):
     losses.append(total_loss)
 print(losses)  # The loss decreased every iteration over the training data!
 
+# To get the embedding of a particular word, e.g. "beauty"
+print(model.embeddings.weight[word_to_ix["beauty"]])
 
 ######################################################################
 # Exercise: Computing Word Embeddings: Continuous Bag-of-Words
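As an aside, a self-contained sketch of the lookup those added lines perform; the toy vocabulary and embedding size here are made up::

    import torch
    import torch.nn as nn

    word_to_ix = {"beauty": 0, "truth": 1}
    embeddings = nn.Embedding(len(word_to_ix), 10)   # 2 words, 10-dim vectors

    # Reading the weight row directly, as the added lines do...
    print(embeddings.weight[word_to_ix["beauty"]])
    # ...gives the same vector the module returns for that index.
    print(embeddings(torch.tensor([word_to_ix["beauty"]]))[0])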
@@ -277,7 +279,7 @@ def forward(self, inputs):
 # learning. It is a model that tries to predict words given the context of
 # a few words before and a few words after the target word. This is
 # distinct from language modeling, since CBOW is not sequential and does
-# not have to be probabilistic. Typcially, CBOW is used to quickly train
+# not have to be probabilistic. Typically, CBOW is used to quickly train
 # word embeddings, and these embeddings are used to initialize the
 # embeddings of some more complicated model. Usually, this is referred to
 # as *pretraining embeddings*. It almost always helps performance a couple
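As an aside, a rough sketch of the CBOW model that paragraph describes; the tutorial leaves it as an exercise, so this is one possible shape, not the official solution::

    import torch
    import torch.nn as nn

    class CBOW(nn.Module):
        def __init__(self, vocab_size, embedding_dim):
            super().__init__()
            self.embeddings = nn.Embedding(vocab_size, embedding_dim)
            self.linear = nn.Linear(embedding_dim, vocab_size)

        def forward(self, context_idxs):                       # (context_size,)
            embeds = self.embeddings(context_idxs).sum(dim=0)  # (embedding_dim,)
            return self.linear(embeds)                         # (vocab_size,) scores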
