
Commit 203cc7c

Add ConvTasNet main training script
1 parent 725f8b0 commit 203cc7c

9 files changed: +1034 -10 lines changed

Lines changed: 168 additions & 0 deletions
@@ -0,0 +1,168 @@
# Source Separation Example

This directory contains reference implementations for source separation. For details on each model, please check out the following:

- [Conv-TasNet](./conv_tasnet/README.md)

## Usage

### Overview

To train a model, you can use [`train.py`](./train.py). The script takes the form of
`train.py [parameters for distributed training] -- [parameters for model/training]`

```
python train.py \
    [--worker-id WORKER_ID] \
    [--device-id DEVICE_ID] \
    [--num-workers NUM_WORKERS] \
    [--sync-protocol SYNC_PROTOCOL] \
    -- \
    <model specific training parameters>

# For details of the distributed training parameters, run:
python train.py --help

# For details of the model/training parameters, run:
python train.py -- --help
```

If you would like to just try out the training script, run it without any parameters
for distributed training: `train.py -- --sample-rate 8000 --batch-size <BATCH_SIZE> --dataset-dir <DATASET_DIR> --save-dir <SAVE_DIR>`

This script runs training in the Distributed Data Parallel (DDP) framework and has two major
operation modes. The mode depends on whether the `--worker-id` argument is given.

1. (`--worker-id` is not given) Launches training worker subprocesses that perform the actual training.
2. (`--worker-id` is given) Performs training as one worker of a distributed training job.

When launching the script without any distributed training parameters (operation mode 1),
the script checks the number of GPUs available on the local system and spawns the same
number of training subprocesses (each running in operation mode 2). You can reduce the number of GPUs used with
`--num-workers`. If no GPU is available, only one subprocess is launched, and providing
`--num-workers` larger than 1 results in an error.

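For example, the following sketch (with `<DATASET_DIR>` and `<SAVE_DIR>` as placeholders) would spawn two training subprocesses on a machine with two or more GPUs:

```bash
# Sketch: let train.py spawn 2 local worker subprocesses (assumes >= 2 GPUs).
# --batch-size is per worker, so 2 workers x 8 = 16 total,
# matching the default Conv-TasNet recipe.
python train.py --num-workers 2 -- \
    --sample-rate 8000 \
    --batch-size 8 \
    --dataset-dir <DATASET_DIR> \
    --save-dir <SAVE_DIR>
```
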
When launching the script as a worker process of a distributed training job, you need to configure
how the workers coordinate.

- `--num-workers` is the total number of training processes being launched.
- `--worker-id` is the process rank (must be unique across all the processes).
- `--device-id` is the GPU device ID (should be unique within a node).
- `--sync-protocol` is how the worker processes communicate and synchronize.
  If the training is carried out on a single node, then the default `"env://"` should do.
  If the training processes span multiple nodes, then you need to provide a protocol that
  can communicate over the network. If you know where the master node is located, you can use
  `"env://"` in combination with the `MASTER_ADDR` and `MASTER_PORT` environment variables. If you do
  not know where the master node is located beforehand, you can use the `"file://..."` protocol to
  point to a file that all the worker processes can access. For other
  protocols, please refer to the official documentation.

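For illustration, the following sketch launches two worker processes manually on a single node with two GPUs (placeholders in angle brackets; depending on how `train.py` initializes `"env://"`, you may also need to export `MASTER_ADDR` and `MASTER_PORT`):

```bash
# Sketch: manually launch 2 workers (ranks 0 and 1) on one node.
# export MASTER_ADDR=localhost   # uncomment if required by your environment
# export MASTER_PORT=29500
python train.py --worker-id 0 --device-id 0 --num-workers 2 --sync-protocol "env://" -- \
    --sample-rate 8000 --batch-size 8 --dataset-dir <DATASET_DIR> --save-dir <SAVE_DIR> &
python train.py --worker-id 1 --device-id 1 --num-workers 2 --sync-protocol "env://" -- \
    --sample-rate 8000 --batch-size 8 --dataset-dir <DATASET_DIR> --save-dir <SAVE_DIR>
wait
```
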
### Distributed Training Notes

<details><summary>Quick overview of DDP (distributed data parallel)</summary>

DDP is a single-program, multiple-data training paradigm.
With DDP, the model is replicated in every process,
and every replica is fed a different set of input data samples.

- **Process**: A worker process (as in a Linux process). There are `P` processes per node.
- **Node**: A machine. There are `N` machines, each of which holds `P` processes.
- **World**: The network of nodes, composed of `N` nodes and `N * P` processes.
- **Rank**: Global process ID (unique across nodes), in `[0, N * P)`.
- **Local Rank**: Local process ID (unique only within a node), in `[0, P)`.

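The relationship between these IDs can be summarized as follows (an illustrative snippet, not part of `train.py`):

```bash
# Illustration: with N nodes and P processes per node,
#   world_size = N * P
#   rank       = node_index * P + local_rank
# e.g. Node 1, Local Rank 2, P=8  ->  Rank 10
N=3; P=8; node_index=1; local_rank=2
echo "world_size=$(( N * P )) rank=$(( node_index * P + local_rank ))"
```
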
```
Node 0 Node 1 Node N-1
┌────────────────────────┐┌─────────────────────────┐ ┌───────────────────────────┐
│╔══════════╗ ┌─────────┐││┌───────────┐ ┌─────────┐│ │┌─────────────┐ ┌─────────┐│
│║ Process ╟─┤ GPU: 0 ││││ Process ├─┤ GPU: 0 ││ ││ Process ├─┤ GPU: 0 ││
│║ Rank: 0 ║ └─────────┘│││ Rank:P │ └─────────┘│ ││ Rank:NP-P │ └─────────┘│
│╚══════════╝ ││└───────────┘ │ │└─────────────┘ │
│┌──────────┐ ┌─────────┐││┌───────────┐ ┌─────────┐│ │┌─────────────┐ ┌─────────┐│
││ Process ├─┤ GPU: 1 ││││ Process ├─┤ GPU: 1 ││ ││ Process ├─┤ GPU: 1 ││
││ Rank: 1 │ └─────────┘│││ Rank:P+1 │ └─────────┘│ ││ Rank:NP-P+1 │ └─────────┘│
│└──────────┘ ││└───────────┘ │ ... │└─────────────┘ │
│ ││ │ │ │
│ ... ││ ... │ │ ... │
│ ││ │ │ │
│┌──────────┐ ┌─────────┐││┌───────────┐ ┌─────────┐│ │┌─────────────┐ ┌─────────┐│
││ Process ├─┤ GPU:P-1 ││││ Process ├─┤ GPU:P-1 ││ ││ Process ├─┤ GPU:P-1 ││
││ Rank:P-1 │ └─────────┘│││ Rank:2P-1 │ └─────────┘│ ││ Rank:NP-1 │ └─────────┘│
│└──────────┘ ││└───────────┘ │ │└─────────────┘ │
└────────────────────────┘└─────────────────────────┘ └───────────────────────────┘
```

</details>

### SLURM

When launched as a SLURM job, the following environment variables correspond to the script's distributed training parameters:

- **SLURM_PROCID**: `--worker-id` (Rank)
- **SLURM_NTASKS** (or the legacy **SLURM_NPROCS**): the total number of processes (`--num-workers` == world size)
- **SLURM_LOCALID**: `--device-id` (Local Rank, mapped to the GPU index*)

* Even when the GPU resource is allocated with `--gpus-per-task=1`, if there are multiple
tasks allocated on the same node (and thus multiple GPUs of the node are allocated to the job),
each task can see all the GPUs allocated to the job. Therefore we need to use
`SLURM_LOCALID` to tell each task which GPU it should use.

<details><summary>Example scripts for running the training on a SLURM cluster</summary>

- **launch_job.sh**

```bash
#!/bin/bash

#SBATCH --job-name=source_separation

#SBATCH --output=/checkpoint/%u/jobs/%x/%j.out

#SBATCH --error=/checkpoint/%u/jobs/%x/%j.err

#SBATCH --nodes=1

#SBATCH --ntasks-per-node=8

#SBATCH --cpus-per-task=8

#SBATCH --mem-per-cpu=16G

#SBATCH --gpus-per-task=1

#srun env
srun wrapper.sh "$@"
```

- **wrapper.sh**

```bash
#!/bin/bash
num_speakers=2
this_dir="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )"
save_dir="/checkpoint/${USER}/jobs/${SLURM_JOB_NAME}/${SLURM_JOB_ID}"
dataset_dir="/dataset/wsj0-mix/${num_speakers}speakers/wav8k/min"

if [ "${SLURM_JOB_NUM_NODES}" -gt 1 ]; then
  protocol="file:///checkpoint/${USER}/jobs/source_separation/${SLURM_JOB_ID}/sync"
else
  protocol="env://"
fi

mkdir -p "${save_dir}"

python -u \
  "${this_dir}/train.py" \
  --worker-id "${SLURM_PROCID}" \
  --num-workers "${SLURM_NTASKS}" \
  --device-id "${SLURM_LOCALID}" \
  --sync-protocol "${protocol}" \
  -- \
  --num-speakers "${num_speakers}" \
  --sample-rate 8000 \
  --dataset-dir "${dataset_dir}" \
  --save-dir "${save_dir}" \
  --batch-size $((16 / SLURM_NTASKS))
```

</details>
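
Assuming the two scripts above are saved next to `train.py` (the paths and resource values are illustrative and cluster-specific), the job could then be submitted with something like:

```bash
# Sketch: submit the batch job defined by launch_job.sh above,
# then follow its log (the path follows the #SBATCH --output directive).
sbatch launch_job.sh
tail -f /checkpoint/${USER}/jobs/source_separation/<JOB_ID>.out
```
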
Lines changed: 52 additions & 0 deletions
@@ -0,0 +1,52 @@
# Conv-TasNet

This is a reference implementation of Conv-TasNet.

> Luo, Yi, and Nima Mesgarani. "Conv-TasNet: Surpassing Ideal Time-Frequency Magnitude Masking for Speech Separation." IEEE/ACM Transactions on Audio, Speech, and Language Processing 27.8 (2019): 1256-1266. Crossref. Web.

This implementation is based on [arXiv:1809.07454v3](https://arxiv.org/abs/1809.07454v3) and [the reference implementation](https://github.com/naplab/Conv-TasNet) provided by the authors.

For usage, please check out the [source separation README](../README.md).

## (Default) Training Configurations

The default training/model configuration follows the best non-causal configuration reported in the paper. (The causal configuration is not implemented.) An example invocation with these defaults is sketched after the list below.

- Sample rate: 8000 Hz
- Batch size: total of 16 over the distributed training workers
- Epochs: 100
- Initial learning rate: 1e-3
- Gradient clipping: maximum L2 norm of 5.0
- Optimizer: Adam
- Learning rate scheduling: halved after 3 epochs of no improvement in validation accuracy
- Objective function: SI-SNRi (The paper uses SI-SNR as the objective function, which differs only by a constant value, so it should practically yield the same result.)
- Reported metrics: SI-SNRi, SDRi
- Sample audio length: 4 seconds (first 4 seconds)
- Encoder/Decoder feature dimension (N): 512
- Encoder/Decoder convolution kernel size (L): 16
- TCN bottleneck/output feature dimension (B): 128
- TCN hidden feature dimension (H): 512
- TCN skip connection feature dimension (Sc): 128
- TCN convolution kernel size (P): 3
- Number of TCN convolution block layers (X): 8
- Number of TCN convolution blocks (R): 3

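For reference, a single-node run with these defaults might look like the following sketch (`<DATASET_DIR>` and `<SAVE_DIR>` are placeholders; the per-worker batch size assumes 8 local GPUs so that the total batch size is 16):

```bash
# Sketch: train with the default configuration above on one 8-GPU machine.
# train.py spawns one worker per GPU; per-worker batch size 2 x 8 workers = 16 total.
python ../train.py -- \
    --num-speakers 2 \
    --sample-rate 8000 \
    --batch-size 2 \
    --dataset-dir <DATASET_DIR> \
    --save-dir <SAVE_DIR>
```
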
## Evaluation

The following are the evaluation results of training the model on the WSJ0-2mix and WSJ0-3mix datasets.

### wsj0-mix 2speakers

| | SI-SNRi (dB) | SDRi (dB) | Epoch |
|:------------------:|-------------:|----------:|------:|
| Reference | 15.3 | 15.6 | |
| Validation dataset | 13.3 | 13.3 | 86 |
| Evaluation dataset | 11.3 | 11.3 | 86 |

### wsj0-mix 3speakers

| | SI-SNRi (dB) | SDRi (dB) | Epoch |
|:------------------:|-------------:|----------:|------:|
| Reference | 12.7 | 13.1 | |
| Validation dataset | 11.5 | 11.5 | 87 |
| Evaluation dataset | 8.7 | 8.6 | 87 |
Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
from . import (
    train,
    trainer,
)
