Example pipeline with wav2letter #632
Conversation
(force-pushed 2da10fd to 728f6c8)

`examples/pipeline/wav2letter.py` (outdated):
```python
    return len(self._iterable)


class MapMemoryCache(torch.utils.data.Dataset):
```
This is great! Seems like a pretty generic object that should also be useful for core. It's effectively a readthrough cache backed by RAM similar to how diskcache_iterator is a readthrough cache backed by disk.
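For readers skimming the thread, the object under discussion looks roughly like this sketch (reconstructed from the excerpts quoted later in this review, assuming cached items are never legitimately None):

```python
import torch


class MapMemoryCache(torch.utils.data.Dataset):
    """Wrap a map-style dataset and cache each item in RAM on first access."""

    def __init__(self, dataset):
        self.dataset = dataset
        self._cache = [None] * len(dataset)

    def __getitem__(self, n):
        # Read-through: fill the cache on first access, serve from RAM after.
        if self._cache[n] is None:
            self._cache[n] = self.dataset[n]
        return self._cache[n]

    def __len__(self):
        return len(self.dataset)
```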
        
          
`examples/pipeline/wav2letter.py` (outdated):
```python
# return create(["train-clean-100", "train-clean-360", "train-other-500"]), create(["dev-clean", "dev-other"]), None


def which_set(filename, validation_percentage, testing_percentage):
```
Why is this necessary as opposed to a seeded shuffle + split?
This is the script recommended for splitting train/dev/test in SpeechCommands' README, adapted to this use case. I suggest we include it with SpeechCommands in torchaudio.
The advantage of this approach is that words and speakers are better distributed between the different splits.
How are they better distributed in comparison to a random shuffle?
Sorry, the real advantage of this approach is listed in the docstring:

> We want to keep files in the same training, validation, or testing sets even if new ones are added over time. This makes it less likely that testing samples will accidentally be reused in training when long runs are restarted, for example.
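For context, the hash-based assignment from the SpeechCommands README is roughly the following (a sketch adapted from the TensorFlow speech_commands tutorial that the README points to; the constants and regex come from that source, not from this PR):

```python
import hashlib
import os
import re

MAX_NUM_WAVS_PER_CLASS = 2 ** 27 - 1  # ~134M; stabilizes the hash bucketing


def which_set(filename, validation_percentage, testing_percentage):
    """Deterministically assign a file to "training", "validation", or "testing".

    The assignment depends only on the file name, so it stays stable when new
    files are added to the dataset later.
    """
    base_name = os.path.basename(filename)
    # Strip the "_nohash_" suffix so all clips from the same speaker/word
    # land in the same split.
    hash_name = re.sub(r"_nohash_.*$", "", base_name)
    hash_name_hashed = hashlib.sha1(hash_name.encode("utf-8")).hexdigest()
    percentage_hash = (int(hash_name_hashed, 16) % (MAX_NUM_WAVS_PER_CLASS + 1)) * (
        100.0 / MAX_NUM_WAVS_PER_CLASS
    )
    if percentage_hash < validation_percentage:
        return "validation"
    if percentage_hash < validation_percentage + testing_percentage:
        return "testing"
    return "training"
```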
        
          
`examples/pipeline/wav2letter.py` (outdated):
```python
if c is None:
    c = count
else:
    c = c + count
```
I think you could also use `update` and initialize `c` as an empty `Counter`. Creating a new `Counter` and adding it on each iteration might be quite slow in comparison.
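Something along these lines (a sketch; `counts` is a stand-in for whatever yields the per-item Counter objects):

```python
from collections import Counter

c = Counter()
for count in counts:  # hypothetical iterable of per-item Counters
    c.update(count)   # in-place accumulation, no intermediate Counter objects
```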
        
          
`examples/pipeline/wav2letter.py` (outdated):
```python
    return output[:, 0, :]


def levenshtein_distance_list(r, h):
```
This seems unused
Indeed, only one version is needed. The list version is faster than the PyTorch version, though.
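For reference, a plain-Python two-row dynamic-programming version looks roughly like this (a sketch of the standard algorithm, not necessarily the PR's exact code):

```python
def levenshtein_distance_list(r, h):
    """Edit distance between two sequences via two rolling rows: O(len(r) * len(h))."""
    prev = list(range(len(h) + 1))
    for i, r_i in enumerate(r, start=1):
        curr = [i] + [0] * len(h)
        for j, h_j in enumerate(h, start=1):
            curr[j] = min(
                prev[j] + 1,                 # deletion
                curr[j - 1] + 1,             # insertion
                prev[j - 1] + (r_i != h_j),  # substitution (free if equal)
            )
        prev = curr
    return prev[len(h)]
```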
        
          
`examples/pipeline/wav2letter.py` (outdated):
```python
    data = LIBRISPEECH(
        root, tag, folder_in_archive=folder_in_archive, download=False)
else:
    data = torch.utils.data.ConcatDataset([LIBRISPEECH(
```
This could also be done using `sum`, since a `ConcatDataset` can be created via `__add__`.
Good point
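A sketch of that suggestion (assuming `tags`, `root`, and `folder_in_archive` from the surrounding code):

```python
import functools
import operator

from torchaudio.datasets import LIBRISPEECH

# torch.utils.data.Dataset defines __add__, which returns a ConcatDataset,
# so the per-tag datasets can simply be summed together.
datasets = [
    LIBRISPEECH(root, tag, folder_in_archive=folder_in_archive, download=False)
    for tag in tags
]
data = functools.reduce(operator.add, datasets)
```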
Looks pretty good :)
Codecov Report

```
@@           Coverage Diff           @@
##           master     #632   +/-   ##
=======================================
  Coverage   89.99%   89.99%
=======================================
  Files          35       35
  Lines        2719     2719
=======================================
  Hits         2447     2447
  Misses        272      272
```

Continue to review the full report at Codecov.
I'd add "split into multiple files" as another todo as well.
        
          
`examples/pipeline/metrics.py` (outdated):
```python
import torch


def levenshtein_distance(r: str, h: str, device: Optional[str] = None):
```
This already seems worth a separate PR. Do you agree? In particular, with the C++ extension we can create a JIT-able, fast version of this.
        
          
`examples/pipeline/wav2letter.py` (outdated):
```python
    return args


def signal_handler(a, b):
```
The need for functions like this worries me, because I'd imagine most users are not aware of their necessity or purpose.
They are not "needed" :) I'll remove them to avoid confusion.
(force-pushed c9b4d6c to d347fa4)
```python
        weight_decay=args.weight_decay,
    )
else:
    raise ValueError("Selected optimizer not supported")
```
Repeat the given option, i.e. `"Selected optimizer {} not supported".format(args.optimizer)`, if you're going for this type of input sanitization, to make it easier for the user to debug.
Also, this code is unreachable from the CLI, so `NotImplementedError` makes more sense: the only way to reach it is to add a new choice to the CLI parser and forget to add the actual implementation.
Also, I would extract this into a helper function, e.g. `_get_optimizer(...)`, so that the main logic becomes more readable.
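A sketch of such a helper, combining the suggestions above (the argument names and optimizer choices are assumptions, not the PR's exact code):

```python
import torch


def _get_optimizer(model, args):
    if args.optimizer == "sgd":
        return torch.optim.SGD(
            model.parameters(),
            lr=args.learning_rate,
            momentum=args.momentum,
            weight_decay=args.weight_decay,
        )
    if args.optimizer == "adadelta":
        return torch.optim.Adadelta(
            model.parameters(),
            lr=args.learning_rate,
            weight_decay=args.weight_decay,
        )
    # Unreachable from the CLI when argparse restricts choices; reaching it
    # means a parser choice was added without a matching implementation.
    raise NotImplementedError(
        "Selected optimizer {} not supported".format(args.optimizer)
    )
```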
```python
        device=tensors[0].device,
    )

tensors = torch.nn.utils.rnn.pad_sequence(tensors, batch_first=True)
```
A wrapped / generalized version of this could form a useful torchaudio function.
pad_sequence requires transposes as it is, since in torchaudio it is the last dimension that we want to pad. I re-implemented pad_sequence for this use case.
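A sketch of what a last-dimension variant might look like (the name and signature are assumptions; this is not the PR's implementation):

```python
import torch


def pad_last_dim(tensors, padding_value=0.0):
    """Pad a list of tensors along the last dimension to a common length."""
    max_len = max(t.shape[-1] for t in tensors)
    return torch.stack([
        # F.pad's (left, right) pair applies to the last dimension.
        torch.nn.functional.pad(t, (0, max_len - t.shape[-1]), value=padding_value)
        for t in tensors
    ])
```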
```python
def encode(self, iterable):
    if isinstance(iterable, list):
        return [self.encode(i) for i in iterable]
    else:
```
What if I pass an iterable that yields lists? What's the base-case type here? Maybe that's an easier case to branch on. Also, as a very minor nit, I actually like using returns to avoid `else`, so you could write:

```python
if isinstance(iterable, list):
    return [self.encode(i) for i in iterable]
return [self.mapping[i] + self.mapping[self.char_blank] for i in iterable]
```
```python
from torch import topk


class GreedyDecoder:
```
You could generalize this file, call it "decoders.py", and also fold in things such as `compute_error_rates`.
This class is stateless. Can it be a function?
It could be a functional corresponding to a transform, but really it's a step towards our beam-search work.
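As a sketch of the stateless functional version (assuming `outputs` has shape (batch, time, num_tokens); this is not the PR's exact class):

```python
import torch


def greedy_decode(outputs):
    """Return the index of the most likely token at every time step."""
    _, indices = torch.topk(outputs, k=1, dim=-1)
    return indices[..., 0]
```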
```python
metric["dataset length"] += metric["batch size"]
metric["iteration"] += 1
metric["loss"] = loss.item()
metric["cumulative loss"] += metric["loss"]
```
I'd abstract this accumulation for both training and evaluation and merge it into a single function. That way you'll always be sure that both training and evaluation use the exact same calculations, since that's the last place you'd want to be buggy.
Second that; all this logging logic should live on the logger side as a method. That will make the training loop more readable and achieve better decoupling.
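A sketch of moving the bookkeeping onto the logger (the method name and fields are assumptions based on the excerpt above, not the PR's code):

```python
from collections import defaultdict


class MetricLogger(defaultdict):
    def __init__(self):
        super().__init__(float)

    def record_batch(self, loss, batch_size):
        # One shared code path for training and evaluation.
        self["batch size"] = batch_size
        self["dataset length"] += batch_size
        self["iteration"] += 1
        self["loss"] = loss
        self["cumulative loss"] += loss
        self["average loss"] = self["cumulative loss"] / self["iteration"]
```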
```python
logging.info("Start time: %s", datetime.now())

# Explicitly set seed to make sure models created in separate processes
```
If this is for distributed training I'd worry that this isn't already happening. Did you have a case where this became necessary in order to avoid a bug?
```python
    collate_fn=collate_fn_train,
    **loader_training_params,
)
loader_validation = DataLoader(
```
For validation, "drop_last" is usually undesired because you can end up not running on the entire dataset.
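I.e., something like this sketch (the dataset, args, and collate names are assumed from the surrounding code):

```python
from torch.utils.data import DataLoader

loader_validation = DataLoader(
    dataset_validation,
    batch_size=args.batch_size,
    shuffle=False,
    drop_last=False,  # keep the final partial batch so every sample is scored
    collate_fn=collate_fn_valid,
)
```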
```python
    "Checkpoint: loaded '%s' at epoch %s", args.checkpoint, checkpoint["epoch"]
)
else:
    logging.info("Checkpoint: not found")
```
This seems like a case I'd error on. If the user intends to resume from this checkpoint and it wasn't found, that's probably a mistake.
Also, the logic here is strange. If the user does not pass the checkpoint option (training from scratch), there is no need to say "not found".
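A sketch of the suggested behavior, silent when no checkpoint is requested and a hard error when a requested one is missing (the function name and checkpoint keys are assumptions based on the excerpt):

```python
import logging
import os

import torch


def maybe_load_checkpoint(model, checkpoint_path):
    """Load a checkpoint if one was requested; stay silent otherwise."""
    if not checkpoint_path:
        return  # training from scratch: no "not found" message
    if not os.path.isfile(checkpoint_path):
        raise FileNotFoundError("Checkpoint not found: {}".format(checkpoint_path))
    checkpoint = torch.load(checkpoint_path)
    model.load_state_dict(checkpoint["state_dict"])
    logging.info(
        "Checkpoint: loaded '%s' at epoch %s", checkpoint_path, checkpoint["epoch"]
    )
```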
```python
    not args.reduce_lr_valid,
)

if not (epoch + 1) % args.print_freq or epoch == args.epochs - 1:
```
nit: I sometimes like to save on the indent and write something like:

```python
if epoch < args.epochs - 1 and (epoch + 1) % args.print_freq:
    continue
```
```python
class UnsqueezeFirst(torch.nn.Module):
    def forward(self, tensor):
        return tensor.unsqueeze(0)
```
This is, in my opinion, the sort of issue that makes me dislike using `nn.Sequential` over a function: you end up wrapping simple, small commands in modules.
However, if you write one (or two) collate functions, you'll probably end up writing function factories that essentially do the same.
```python
def save_checkpoint(state, is_best, filename, disable):
    """
    Save the model to a temporary file first,
    then copy it to filename, in case the signal interrupts
```
Has this happened? I think the scheduler is supposed to signal you and then you get a bunch of time to catch the signal and shutdown gracefully.
Also, in my opinion this logic does not correspond to the name of the function: `save_checkpoint` should do saving and only saving. The logic for handling a temporary file for the sake of interruption solves a different concern and should live in a different function.
I'm not aware of this happening. I'll remove this logic.
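If the temporary-file behavior were kept, a sketch of separating the two concerns might look like this (the helper name and the best-model file name are placeholders, not the PR's code):

```python
import os
import shutil
import tempfile

import torch


def _atomic_save(obj, filename):
    # Write to a temporary file in the same directory, then rename;
    # os.replace is atomic when source and destination share a filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=os.path.dirname(filename) or ".")
    os.close(fd)
    torch.save(obj, tmp_path)
    os.replace(tmp_path, filename)


def save_checkpoint(state, is_best, filename):
    # Saving and only saving; atomicity lives in _atomic_save.
    _atomic_save(state, filename)
    if is_best:
        shutil.copyfile(filename, "model_best.pth.tar")  # placeholder name
```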
I have concerns about the accuracy metrics. I strongly believe that the WER computation is incorrect, and we should not reinvent the wheel; we should use SCTK or something similar.
```python
        self.dataset = dataset
        self._cache = [None] * len(dataset)

    def __getitem__(self, n):
```
This can be simplified:

```python
if self._cache[n] is None:
    self._cache[n] = self.dataset[n]
return self._cache[n]
```
```python
    def __len__(self):
        return len(self.dataset)

    def process_datapoint(self, item):
```
This operation is not generic and requires a specific item type, and since it uses index slicing it is very difficult to understand what it does. Please add a docstring.
```python
if isinstance(transforms, list):
    transform_list = transforms
else:
    transform_list = [transforms]
```
Since this is example code, and the helper functions exist to keep the main code simple, making the helpers more specific helps maintainability. Instead of allowing multiple types, it's simpler to accept only one type and do the equivalent type conversion in the client code.
```python
def collate_fn(batch):

    tensors = [transforms(b[0]) for b in batch if b]
```
It is very difficult to understand what is being transformed here.

- `for b in batch if b`: why is there a case where an item in the batch (denoted `b`) can be an invalid sample?
- What does `b[0]` represent?
- `if b` is no longer needed; removed :)
- `b[0]` is the waveform from the processed data point tuple; added a comment.
```python
        self.char_space = char_space
        self.char_blank = char_blank

        labels = [l for l in labels]
```
Can this not be `labels = list(labels)`? What is the expected type of the input `labels`?
Good catch. Yes, it's just a string.
```python
from typing import List, Union


def levenshtein_distance(r: Union[str, List[str]], h: Union[str, List[str]]):
```
If this moves into the library, the docstring needs to be improved with the equation.
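For reference, the standard recurrence such a docstring could state, with $d_{i,j}$ the distance between the first $i$ tokens of $r$ and the first $j$ tokens of $h$:

```latex
d_{i,j} =
\begin{cases}
\max(i, j) & \text{if } \min(i, j) = 0, \\
\min\bigl( d_{i-1,j} + 1,\; d_{i,j-1} + 1,\; d_{i-1,j-1} + [r_i \neq h_j] \bigr) & \text{otherwise,}
\end{cases}
```

so that the distance between $r$ and $h$ is $d_{|r|,|h|}$, where $[\cdot]$ is the Iverson bracket.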
```python
class MetricLogger(defaultdict):
    def __init__(self, name, print_freq=1, disable=False):
        super().__init__(lambda: 0.0)
```
I think `super().__init__(float)` is better (`float()` returns `0.0`, so it works as the default factory without a lambda).
```python
    """

    if disable:
        return
```
I do not think this should be part of the `save_checkpoint` function; it goes against the single-responsibility principle. It is the caller's responsibility to decide when to save.
```python
    return

if filename == "":
    return
```
I think giving an empty string as the file location should be an error.
(force-pushed d8c6de6 to c378c48)
This PR is getting very, very large and has been up for a long time. Let's merge it, since it's already working, and revisit some of these suggested improvements on a PR-by-PR basis. It'll also help us share code across examples etc.
(force-pushed c378c48 to a2b6ad2)
Following comment, reverted to Aug 19, merging, and moved follow-up to vincentqb#3.
As mentioned in the README, we can get less than 13.8% "cer over target length" after 30 epochs. See sample output grepped for validation:
We implement a reference pipeline using the wav2letter model to train on LibriSpeech. The structure is inspired by torchvision's reference implementation.

As discussed here, this code was initially implemented in this python script, which was converted from this notebook to be run with SLURM using a bash script and sbatch. There are at least a few more things to do:

- Add an option to activate torch.autograd.set_detect_anomaly(True)
- Bring back the viterbi decoder
- Add 10 ms shift data augmentation
- Publish pre-trained weights

Note: see also the post by assemblyai, and internal.

cc @zhangguanheng66 for pytorch/text#767