Merged
2 changes: 0 additions & 2 deletions _typos.toml
@@ -69,11 +69,9 @@ intput = "intput"
 lable = "lable"
 learing = "learing"
 legth = "legth"
-lengthes = "lengthes"
 lenth = "lenth"
 leran = "leran"
 libary = "libary"
-likey = "likey"
 mantained = "mantained"
 matrics = "matrics"
 mdule = "mdule"
2 changes: 1 addition & 1 deletion docs/design/dist_train/distributed_training_review.md
@@ -30,7 +30,7 @@ Synchronous training usually faces scalability and performance issues, if not ca
 Similar to asynchronous training, the benefit of synchronous training diminishes quickly. Depending on the models, increasing the number of trainers (effectively batch size) beyond a point won’t delivers faster converge time or better final model quality.
 
 # Codistillation
-Codistillation is a technique that tries to scale the training further. A few training instance (each training instance can be distributed) are performed during the same period. Each training instance has extra losses that comes from the prediction of other training instances. (likey teacher and student) The training process converges faster and usually converge to a better model quality. [4]
+Codistillation is a technique that tries to scale the training further. A few training instance (each training instance can be distributed) are performed during the same period. Each training instance has extra losses that comes from the prediction of other training instances. (likely teacher and student) The training process converges faster and usually converge to a better model quality. [4]
 
 
 # Reference
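The codistillation scheme described in the hunk above (each training instance adds a loss term derived from the other instances' predictions, acting as teacher and student for each other) can be sketched as follows. This is an illustrative NumPy sketch under assumed names (`codistillation_loss`, `alpha`), not PaddlePaddle's or the paper's implementation:

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the last axis.
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def codistillation_loss(logits_self, logits_peer, labels, alpha=0.5):
    """Cross-entropy on the true labels plus a distillation term that
    pulls this instance's predictions toward a peer's predictions."""
    p_self = softmax(logits_self)
    p_peer = softmax(logits_peer)
    n = labels.shape[0]
    task = -np.log(p_self[np.arange(n), labels]).mean()
    # KL(peer || self): zero when the two instances already agree.
    distill = (p_peer * (np.log(p_peer) - np.log(p_self))).sum(axis=-1).mean()
    return task + alpha * distill

logits = np.array([[2.0, 0.0, -1.0]])
labels = np.array([0])
# With identical peers the distillation term vanishes, leaving only
# the ordinary task loss, regardless of alpha.
loss = codistillation_loss(logits, logits, labels)
```

In a real run each instance would use the peer's most recent checkpoint predictions rather than live logits, so instances can train asynchronously.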
2 changes: 1 addition & 1 deletion docs/design/network/deep_speech_2.md
@@ -127,7 +127,7 @@ Key ingredients about the layers:
 - **Data Layers**:
   - Frame sequences data of audio **spectrogram** (with FFT).
   - Token sequences data of **transcription** text (labels).
-  - These two type of sequences do not have the same lengthes, thus a CTC-loss layer is required.
+  - These two type of sequences do not have the same lengths, thus a CTC-loss layer is required.
 - **2D Convolution Layers**:
   - Not only temporal convolution, but also **frequency convolution**. Like a 2D image convolution, but with a variable dimension (i.e. temporal dimension).
   - With striding for only the first convlution layer.
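The corrected line above notes that spectrogram frames and transcription tokens have different lengths, which is why a CTC loss is needed: CTC sums the probability of every frame-level alignment (with blanks) that collapses to the target transcript. A minimal NumPy sketch of the CTC forward algorithm, with an assumed function name `ctc_prob` (real systems compute this in log space for stability):

```python
import numpy as np

def ctc_prob(probs, labels, blank=0):
    """CTC probability of `labels` given per-frame class probabilities
    `probs` of shape (T, num_classes), via the forward algorithm."""
    T = probs.shape[0]
    # Extended label sequence: blanks between and around the labels.
    ext = [blank]
    for l in labels:
        ext += [l, blank]
    S = len(ext)
    alpha = np.zeros((T, S))
    alpha[0, 0] = probs[0, blank]
    if S > 1:
        alpha[0, 1] = probs[0, ext[1]]
    for t in range(1, T):
        for s in range(S):
            a = alpha[t - 1, s]                      # stay on same symbol
            if s >= 1:
                a += alpha[t - 1, s - 1]             # advance one position
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                a += alpha[t - 1, s - 2]             # skip a blank
            alpha[t, s] = a * probs[t, ext[s]]
    # Valid endings: final label or the trailing blank.
    return alpha[T - 1, S - 1] + (alpha[T - 1, S - 2] if S > 1 else 0.0)

# Toy check: 3 frames, classes {0: blank, 1: 'a'}, uniform predictions.
probs = np.full((3, 2), 0.5)
p_empty = ctc_prob(probs, [])        # only path: blank, blank, blank
```

With 3 uniform frames the probabilities of all collapsible transcripts ("", "a", "aa", "aaa") sum to 1, since every length-3 path collapses to exactly one of them.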