
Commit f401a2f (parent b65f0e4)

Update distributed.md

Uncomment the section about the generate subcommand with distributed inference after review by @mreso. Also add HF login to make the document fully self-contained.

docs/distributed.md: 16 additions, 7 deletions
@@ -21,29 +21,38 @@ source .venv/bin/activate

[shell default]: ./install/install_requirements.sh
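The hunk context above shows only the tail of the environment setup. For completeness, a minimal sketch of that setup, assuming a standard Python venv as the `source .venv/bin/activate` context line suggests:

```bash
# Create and activate a virtual environment, then install
# torchchat's requirements (script path taken from the context above).
python3 -m venv .venv
source .venv/bin/activate
./install/install_requirements.sh
```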

## Download Weights

Most models use Hugging Face as the distribution channel, so you will need to create a Hugging Face account. Create a Hugging Face user access token, as documented here, with the `write` role.

Log into Hugging Face:

[prefix default]: HF_TOKEN="${SECRET_HF_TOKEN_PERIODIC}"

```
huggingface-cli login
```
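Once logged in, the weights themselves still need to be fetched. The sketch below is an assumption on two counts: that the installed `huggingface-cli` version supports the `--token` flag for non-interactive login, and that torchchat's `download` subcommand accepts the `llama3.1` alias used in the examples that follow:

```bash
# Non-interactive login, reusing the token already exported as HF_TOKEN
# (assumes this huggingface-cli version supports --token).
huggingface-cli login --token "$HF_TOKEN"

# Fetch the model weights via torchchat's download subcommand
# (alias matches the generate/chat examples below).
python3 torchchat.py download llama3.1
```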
34+
2435
## Enabling Distributed torchchat Inference
2536

2637
To enable distributed inference, use the option `--distributed`. In addition, `--tp <num>` and `--pp <num>`
2738
allow users to specify the types of parallelism to use (where tp refers to tensor parallelism and pp to pipeline parallelism).
2839
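The examples below run on 4 GPUs with `--tp 2 --pp 2`, consistent with the total rank count being the product tp × pp (an assumption inferred from these examples, not stated explicitly in this document). Under that assumption, an 8-GPU run might look like:

```bash
# Hypothetical 8-GPU layout: 4-way tensor parallelism inside each of
# 2 pipeline stages, so 4 x 2 = 8 ranks in total.
python3 torchchat.py generate llama3.1 --distributed --tp 4 --pp 2 \
    --prompt "write me a story about a boy and his bear"
```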

## Generate Output with Distributed torchchat Inference

To generate output using distributed inference with 4 GPUs, you can use:
```
python3 torchchat.py generate llama3.1 --distributed --tp 2 --pp 2 --prompt "write me a story about a boy and his bear"
```

## Chat with Distributed torchchat Inference

This mode allows you to chat with an LLM in an interactive fashion with distributed inference. The following example uses 4 GPUs:

[skip default]: begin
```bash
python3 torchchat.py chat llama3.1 --max-new-tokens 10 --distributed --tp 2 --pp 2
```
[skip default]: end
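Note that `--max-new-tokens 10` keeps each reply short, which is useful for a quick smoke test of the distributed setup; for real conversations you would raise the limit or omit the flag (assuming torchchat's usual default when it is not set).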
