Merged
2 changes: 0 additions & 2 deletions _typos.toml
@@ -69,11 +69,9 @@ intput = "intput"
 lable = "lable"
 learing = "learing"
 legth = "legth"
-lengthes = "lengthes"
 lenth = "lenth"
 leran = "leran"
 libary = "libary"
-likey = "likey"
 mantained = "mantained"
 matrics = "matrics"
 mdule = "mdule"
2 changes: 1 addition & 1 deletion docs/design/dist_train/distributed_training_review.md
@@ -30,7 +30,7 @@ Synchronous training usually faces scalability and performance issues, if not ca
 Similar to asynchronous training, the benefit of synchronous training diminishes quickly. Depending on the models, increasing the number of trainers (effectively batch size) beyond a point won’t delivers faster converge time or better final model quality.
 
 # Codistillation
-Codistillation is a technique that tries to scale the training further. A few training instance (each training instance can be distributed) are performed during the same period. Each training instance has extra losses that comes from the prediction of other training instances. (likey teacher and student) The training process converges faster and usually converge to a better model quality. [4]
+Codistillation is a technique that tries to scale the training further. A few training instance (each training instance can be distributed) are performed during the same period. Each training instance has extra losses that comes from the prediction of other training instances. (likely teacher and student) The training process converges faster and usually converge to a better model quality. [4]
 
 
 # Reference
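The codistillation scheme described in the hunk above (each training instance adds a loss term derived from the other instances' predictions, acting as teacher and student for each other) can be sketched as follows. This is an illustrative NumPy sketch under assumed names (`codistillation_loss`, `alpha`), not PaddlePaddle's or the paper's implementation:

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the last axis.
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def codistillation_loss(logits_self, logits_peer, labels, alpha=0.5):
    """Cross-entropy on the true labels plus a distillation term that
    pulls this instance's predictions toward a peer's predictions."""
    p_self = softmax(logits_self)
    p_peer = softmax(logits_peer)
    n = labels.shape[0]
    task = -np.log(p_self[np.arange(n), labels]).mean()
    # KL(peer || self): zero when the two instances already agree.
    distill = (p_peer * (np.log(p_peer) - np.log(p_self))).sum(axis=-1).mean()
    return task + alpha * distill

logits = np.array([[2.0, 0.0, -1.0]])
labels = np.array([0])
# With identical peers the distillation term vanishes, leaving only
# the ordinary task loss, regardless of alpha.
loss = codistillation_loss(logits, logits, labels)
```

In a real run each instance would use the peer's most recent checkpoint predictions rather than live logits, so instances can train asynchronously.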
2 changes: 1 addition & 1 deletion docs/design/network/deep_speech_2.md
@@ -127,7 +127,7 @@ Key ingredients about the layers:
 - **Data Layers**:
   - Frame sequences data of audio **spectrogram** (with FFT).
   - Token sequences data of **transcription** text (labels).
-  - These two type of sequences do not have the same lengthes, thus a CTC-loss layer is required.
+  - These two type of sequences do not have the same lengths, thus a CTC-loss layer is required.
 - **2D Convolution Layers**:
   - Not only temporal convolution, but also **frequency convolution**. Like a 2D image convolution, but with a variable dimension (i.e. temporal dimension).
   - With striding for only the first convlution layer.
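The corrected line above notes that spectrogram frames and transcription tokens have different lengths, which is why a CTC loss is needed: CTC sums the probability of every frame-level alignment (with blanks) that collapses to the target transcript. A minimal NumPy sketch of the CTC forward algorithm, with an assumed function name `ctc_prob` (real systems compute this in log space for stability):

```python
import numpy as np

def ctc_prob(probs, labels, blank=0):
    """CTC probability of `labels` given per-frame class probabilities
    `probs` of shape (T, num_classes), via the forward algorithm."""
    T = probs.shape[0]
    # Extended label sequence: blanks between and around the labels.
    ext = [blank]
    for l in labels:
        ext += [l, blank]
    S = len(ext)
    alpha = np.zeros((T, S))
    alpha[0, 0] = probs[0, blank]
    if S > 1:
        alpha[0, 1] = probs[0, ext[1]]
    for t in range(1, T):
        for s in range(S):
            a = alpha[t - 1, s]                      # stay on same symbol
            if s >= 1:
                a += alpha[t - 1, s - 1]             # advance one position
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                a += alpha[t - 1, s - 2]             # skip a blank
            alpha[t, s] = a * probs[t, ext[s]]
    # Valid endings: final label or the trailing blank.
    return alpha[T - 1, S - 1] + (alpha[T - 1, S - 2] if S > 1 else 0.0)

# Toy check: 3 frames, classes {0: blank, 1: 'a'}, uniform predictions.
probs = np.full((3, 2), 0.5)
p_empty = ctc_prob(probs, [])        # only path: blank, blank, blank
```

With 3 uniform frames the probabilities of all collapsible transcripts ("", "a", "aa", "aaa") sum to 1, since every length-3 path collapses to exactly one of them.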