28 changes: 14 additions & 14 deletions CHANGELOG.md
@@ -9,12 +9,12 @@ For each Pull Request, the affected code parts should be briefly described and a
Once a release is done, the "Upcoming" section becomes the release changelog, and a new empty "Upcoming" should be
created.


## Upcoming

### Added

- ([#671](https://github.com/microsoft/InnerEye-DeepLearning/pull/671)) Remove sequence models and unused variables. Simplify README.
- ([#693](https://github.com/microsoft/InnerEye-DeepLearning/pull/693)) Improve instructions for HelloWorld model in AzureML.
- ([#678](https://github.com/microsoft/InnerEye-DeepLearning/pull/678)) Add function to get log level name and use it for logging.
- ([#666](https://github.com/microsoft/InnerEye-DeepLearning/pull/666)) Replace RadIO with TorchIO for patch-based inference.
- ([#643](https://github.com/microsoft/InnerEye-DeepLearning/pull/643)) Test for recovery of SSL job. Tracks learning rate and train
@@ -160,7 +160,6 @@ in inference-only runs when using lightning containers.
- ([#633](https://github.com/microsoft/InnerEye-DeepLearning/pull/633)) Model fields `recovery_checkpoint_save_interval` and `recovery_checkpoints_save_last_k` have been retired.
Recovery checkpoint handling is now controlled by `autosave_every_n_val_epochs`.
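
Like other model configuration fields, `autosave_every_n_val_epochs` should also be settable from the commandline. A purely hypothetical sketch (model name and value are placeholders, not taken from this changelog):

```shell
# Hypothetical override: write a recovery checkpoint after every validation epoch
python InnerEye/ML/runner.py --model=HelloWorld --autosave_every_n_val_epochs=1
```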


## 0.3 (2021-06-01)

### Added
@@ -291,6 +290,7 @@ console for easier diagnostics.
container models on machines with >1 GPU

### Removed

- ([#439](https://github.com/microsoft/InnerEye-DeepLearning/pull/439)) Deprecated `start_epoch` config argument.
- ([#450](https://github.com/microsoft/InnerEye-DeepLearning/pull/450)) Delete unused `classification_report.ipynb`.
- ([#455](https://github.com/microsoft/InnerEye-DeepLearning/pull/455)) Removed the AzureRunner conda environment.
@@ -307,11 +307,11 @@ console for easier diagnostics.

- ([#323](https://github.com/microsoft/InnerEye-DeepLearning/pull/323)) There are new model configuration fields
(and hence, commandline options), in particular for controlling PyTorch Lightning (PL) training (see the hypothetical sketch below):
- `max_num_gpus` controls how many GPUs are used at most for training (default: all GPUs, value -1).
- `pl_num_sanity_val_steps` controls the PL trainer flag `num_sanity_val_steps`
- `pl_deterministic` controls the PL trainer flags `benchmark` and `deterministic`
- `generate_report` controls if a HTML report will be written (default: True)
- `recovery_checkpoint_save_interval` determines how often a checkpoint for training recovery is saved.
- ([#336](https://github.com/microsoft/InnerEye-DeepLearning/pull/336)) New extensions of
SegmentationModelBases `HeadAndNeckBase` and `ProstateBase`. Use these classes to build your own Head&Neck or Prostate
models, by just providing a list of foreground classes.
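
The PL-related fields listed above are plain config fields and hence commandline options. A purely hypothetical sketch of combining them (model name and values are placeholders, not commands taken from this changelog):

```shell
# Hypothetical combination of the new Lightning-related options
python InnerEye/ML/runner.py --model=HelloWorld \
    --max_num_gpus=1 \
    --pl_num_sanity_val_steps=2 \
    --generate_report=False
```
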
@@ -326,17 +326,17 @@

- ([#323](https://github.com/microsoft/InnerEye-DeepLearning/pull/323)) The codebase has undergone a massive
refactoring, to use PyTorch Lightning as the foundation for all training. As a consequence of that:
- Training is now using Distributed Data Parallel with synchronized `batchnorm`. The number of GPUs to use can be
controlled by a new commandline argument `max_num_gpus`.
- Several classes, like `ModelTrainingSteps*`, have been removed completely.
- The final model is now always the one that is written at the end of all training epochs.
- The old code that offered options to run full image inference at multiple epochs (i.e., multiple checkpoints) has
been removed, alongside the respective commandline options `save_start_epoch`, `save_step_epochs`,
`epochs_to_test`, `test_diff_epochs`, `test_step_epochs`, `test_start_epoch`.
- The commandline option `register_model_only_for_epoch` is now called `only_register_model`, and is boolean.
- All metrics are written to AzureML and Tensorboard in a unified format. A training Dice score for 'bladder' would
previously be called Train_Dice/bladder, now it is train/Dice/bladder.
- Due to a different checkpoint format, it is no longer possible to use checkpoints written by the previous version
of the code.
- The arguments of the `score.py` script changed: `data_root` -> `data_folder`, it no longer assumes a fixed
`data` subfolder. `project_root` -> `model_root`, `test_image_channels` -> `image_files`.
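
As a purely hypothetical sketch of the renamed `score.py` arguments (all paths and file names below are placeholders):

```shell
# Hypothetical inference call using the renamed arguments
python score.py --model_root=<folder_with_the_model_files> \
    --data_folder=<folder_with_the_input_images> \
    --image_files=<image_channel1.nii.gz>,<image_channel2.nii.gz>
```
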
7 changes: 2 additions & 5 deletions InnerEye/ML/configs/segmentation/HelloWorld.py
@@ -28,11 +28,8 @@ class HelloWorld(SegmentationModelBase):

* This model can be trained from the commandline: python InnerEye/runner.py --model=HelloWorld

* If you want to test that your AzureML workspace is working:
- Upload to datasets storage account for your AzureML workspace: Test/ML/test_data/dataset.csv and
Test/ML/test_data/train_and_test_data and name the folder "hello_world"
- If you have set up AzureML then parameter search can be performed for this model by running:
python InnerEye/ML/ runner.py --model=HelloWorld --azureml=True --hyperdrive=True
* If you want to test that your AzureML workspace is working, please follow the instructions in
<repo_root>/docs/hello_world_model.md.

In this example, the model is trained on 2 input image channels channel1 and channel2, and
predicts 2 foreground classes region, region_1.
82 changes: 73 additions & 9 deletions docs/hello_world_model.md
@@ -1,17 +1,81 @@
# Training a Hello World segmentation model

In the configs folder, you will find a config file called [HelloWorld.py](../InnerEye/ML/configs/segmentation/HelloWorld.py)
We have created this file to demonstrate how to:

1. Subclass SegmentationModelBase which is the base config for all segmentation model configs
1. Configure the UNet3D implemented in this package
1. Configure Azure HyperDrive based parameter search

* This model can be trained from the commandline, from the root of the repo: `python InnerEye/ML/runner.py --model=HelloWorld`
* If you want to test your AzureML workspace with the HelloWorld model:
* Make sure your AzureML workspace has been set up. You should have inside the folder InnerEye a settings.yml file
that specifies the datastore, the resource group, and the workspace on which to run
* Upload to datasets storage account for your AzureML workspace: `Tests/ML/test_data/dataset.csv` and
`Test/ML/test_data/train_and_test_data` and name the folder "hello_world"
* If you have set up AzureML then parameter search can be performed for this model by running:
`python InnerEye/ML/runner.py --model=HelloWorld --azureml --hyperdrive`
This model can be trained from the commandline from the root of the repo: `python InnerEye/ML/runner.py --model=HelloWorld`.
When used like this, it will train on dummy 3D scans that are included in this repository. Training will run
on your local dev machine.
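
For example, assuming you have created and activated the `InnerEye` Conda environment as described in the repository setup, a local training run is simply:

```shell
conda activate InnerEye
python InnerEye/ML/runner.py --model=HelloWorld
```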

In order to get this model to train in AzureML, you need to upload the data to blob storage. This can be done via
[Azure Storage Explorer](https://azure.microsoft.com/en-gb/features/storage-explorer/) or via the
[Azure commandline tools](https://docs.microsoft.com/en-us/cli/azure/). Please find the detailed instructions for both
options below.

Before uploading, you need to know which storage account you have set up to hold the data for your AzureML workspace, see
[Step 4 in the Azure setup](setting_up_aml.md). For the upload, you will need the name of that storage account.
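
If you are unsure of the storage account name, one way to look it up (assuming the Azure CLI is installed and you are logged in, see Option 2 below) is:

```shell
# List the names of all storage accounts visible in the current subscription
az storage account list --query "[].name" --output table
```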

## Option 1: Upload via Azure Storage explorer

First install [Azure Storage Explorer](https://azure.microsoft.com/en-gb/features/storage-explorer/).

When starting Storage Explorer, you need to [log in to Azure](https://docs.microsoft.com/en-gb/azure/vs-azure-tools-storage-manage-with-storage-explorer?tabs=windows).

* Select your subscription in the left-hand navigation, and then the storage account that you set up earlier.
* There should be a section "Blob Containers" for that account.
* Right-click on "Blob Containers", and choose "Create Blob Container". Give that container the name "datasets"
* Click on the newly created container "datasets". You should see no files present.
* Press "Upload" / "Upload folder"
* As the folder to upload, select `<repo_root>/Tests/ML/test_data/train_and_test_data`
* As the destination directory, select `/hello_world`.
* Start the upload. Press the "Refresh" button after a couple of seconds; you should now see a folder `hello_world`, and inside of it, a subfolder `train_and_test_data`.
* Press "Upload" / "Upload files".
* Choose `<repo_root>/Tests/ML/test_data/dataset.csv`, and `/hello_world` as the destination directory.
* Start the upload and refresh.
* Verify that you now have files `/hello_world/dataset.csv` and `/hello_world/train_and_test_data/id1_channel1.nii.gz`

## Option 2: Upload via the Azure CLI

First, install the [Azure commandline tools](https://docs.microsoft.com/en-us/cli/azure/).

Run the following in the command prompt:

```shell
az login
az account list
```

If the `az account list` command returns more than one subscription, run `az account set --subscription "your subscription name"` to select the one you want to use.

The code below assumes that you are uploading to a storage account named
`stor_acct`; please replace this with your actual storage account name.

```shell
cd <your_repository_root>
az storage container create --account-name stor_acct --name datasets
az storage blob upload --account-name stor_acct --container-name datasets --file ./Tests/ML/test_data/dataset.csv --name hello_world/dataset.csv
az storage blob upload-batch --account-name stor_acct --destination datasets --source ./Tests/ML/test_data/train_and_test_data --destination-path hello_world/train_and_test_data
```
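
To double-check that the upload succeeded, you can list the uploaded blobs (again assuming the storage account name `stor_acct`):

```shell
az storage blob list --account-name stor_acct --container-name datasets --prefix hello_world --output table
```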

## Create an AzureML datastore

A "datastore" in AzureML lingo is an abstraction for the ML systems to access files that can come from different places. In our case, the datastore points to a storage container to which we have just uploaded the data.

Instructions to create the datastore are given
[in the AML setup instructions](setting_up_aml.md) in step 5.

## Run the HelloWorld model in AzureML

Double-check that you have copied your Azure settings into the settings file, as described
[in the AML setup instructions](setting_up_aml.md) in step 6.

Then execute:

```shell
conda activate InnerEye
python InnerEye/ML/runner.py --model=HelloWorld --azureml
```