
Commit 96d1cb1

improve TensorBoard instructions in README
ghstack-source-id: 7dc4a80
Pull Request resolved: #96
1 parent 5a1689f commit 96d1cb1

1 file changed (+5, -5 lines)

README.md

Lines changed: 5 additions & 5 deletions
@@ -32,21 +32,21 @@ run the llama debug model locally to verify the setup is correct:
 
 # TensorBoard
 
-To visualize training metrics on TensorBoard:
+To visualize TensorBoard metrics of models trained on a remote server via a local web browser:
 
-1. (by default) set `enable_tensorboard = true` in `torchtrain/train_configs/train_config.toml`
+1. Make sure `metrics.enable_tensorboard` option is set to true in model training (either from a .toml file or from CLI).
 
-2. set up SSH tunneling
+2. Set up SSH tunneling, by running the following from local CLI
 ```
 ssh -L 6006:127.0.0.1:6006 [username]@[hostname]
 ```
 
-3. then in the torchtrain repo
+3. Inside the SSH tunnel that logged into the remote server, go to the torchtrain repo, and start the TensorBoard backend
 ```
 tensorboard --logdir=./torchtrain/outputs/tb
 ```
 
-4. go to the URL it provides OR to http://localhost:6006/
+4. In the local web browser, go to the URL it provides OR to http://localhost:6006/.
 
 ## Multi-Node Training
 For training on ParallelCluster/Slurm type configurations, you can use the multinode_trainer.slurm file to submit your sbatch job.</br>
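The new step 1 in the diff only says that `metrics.enable_tensorboard` must be true. As a rough, unofficial sketch, the bash function below checks a training `.toml` file for that setting before a run is launched. It assumes the dotted option name corresponds to an `enable_tensorboard = true` line inside a `[metrics]` table; `tb_enabled_in_toml` is a hypothetical name, the check is line-based rather than a real TOML parser, and it cannot see a value set from the CLI.

```bash
#!/usr/bin/env bash
# Hypothetical helper: exit 0 if the given .toml file sets
# enable_tensorboard = true inside its [metrics] table, exit 1 otherwise.
# Assumption: metrics.enable_tensorboard maps to that table/key pair.
# Line-based check, not a full TOML parser; CLI overrides are not visible here.
tb_enabled_in_toml() {
  awk '
    # Any table header switches sections; track whether we are in [metrics].
    /^[ \t]*\[/ { in_metrics = ($0 ~ /^[ \t]*\[metrics\][ \t]*(#.*)?$/) }
    # Inside [metrics], look for the key set to the boolean true.
    in_metrics && /^[ \t]*enable_tensorboard[ \t]*=[ \t]*true[ \t]*(#.*)?$/ { found = 1 }
    # Exit status 0 when the setting was found, 1 when it was not.
    END { exit !found }
  ' "$1"
}
```

For example, `tb_enabled_in_toml path/to/train_config.toml || echo "TensorBoard logging is not enabled in this config"`, where the path is a placeholder for whichever config file the run uses.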

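Steps 2 through 4 can likewise be wrapped in small helpers. The sketch below is illustrative only: `tb_tunnel`, `tb_wait_for_port`, and the `DRY_RUN` variable are hypothetical and not part of torchtrain. `tb_tunnel` runs the same `ssh -L` command as step 2, with optional port arguments for when local port 6006 is already taken; `tb_wait_for_port` polls the local end of the tunnel using bash's `/dev/tcp`. Note that the SSH client accepts connections on the forwarded local port as soon as the tunnel is up, so a successful check does not prove that TensorBoard itself is running on the remote side.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for the TensorBoard steps; names and defaults are illustrative.

# Step 2: open the SSH tunnel. With -L, connections to LOCAL_PORT on this
# machine are forwarded over SSH to 127.0.0.1:REMOTE_PORT on the remote host.
# Set DRY_RUN=1 to print the command instead of running it.
tb_tunnel() {
  local user="$1" host="$2" local_port="${3:-6006}" remote_port="${4:-6006}"
  local cmd=(ssh -L "${local_port}:127.0.0.1:${remote_port}" "${user}@${host}")
  if [[ -n "${DRY_RUN:-}" ]]; then
    printf '%s\n' "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}

# Step 4 helper: return 0 as soon as 127.0.0.1:PORT accepts a TCP connection,
# or 1 after TRIES attempts spaced one second apart.
tb_wait_for_port() {
  local port="${1:-6006}" tries="${2:-30}" i
  for ((i = 1; i <= tries; i++)); do
    # Opening /dev/tcp/HOST/PORT is a bash feature; the subshell keeps fd 3 local.
    if (exec 3<>"/dev/tcp/127.0.0.1/${port}") 2>/dev/null; then
      return 0
    fi
    # Skip the sleep after the final failed attempt.
    (( i < tries )) && sleep 1
  done
  return 1
}
```

For example, with the placeholder account `alice@gpu-node`, `tb_tunnel alice gpu-node 16006 6006` forwards local port 16006 to TensorBoard's default port 6006 on the remote host, and the page would then be at http://localhost:16006/. If TensorBoard is started on the remote host with its `--port` flag (e.g. `--port 6007`), pass that value as the fourth argument.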