
Commit fe7e5b2 (1 parent: 900b6d4)

README: Add notes about device specification for AOTI inference (#956)

File tree: 1 file changed, +4 -2 lines

README.md

Lines changed: 4 additions & 2 deletions
````diff
@@ -256,7 +256,7 @@ python3 torchchat.py export llama3 --output-dso-path exportedModels/llama3.so
 
 > [!NOTE]
 > If your machine has cuda add this flag for performance
-`--quantize config/data/cuda.json` when exporting. You'll also need to tell generate to use `--device cuda` and the runner to use `-d CUDA`
+`--quantize config/data/cuda.json` when exporting.
 
 
 ### Run in a Python Enviroment
````
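For concreteness, the note that survives this hunk implies an export command along these lines; the flag ordering relative to the command in the hunk header is an assumption, not taken from the commit:

```bash
# Export llama3 with the CUDA quantization config (flag order is illustrative)
python3 torchchat.py export llama3 --quantize config/data/cuda.json --output-dso-path exportedModels/llama3.so
```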
````diff
@@ -266,6 +266,7 @@ To run in a python enviroment, use the generate subcommand like before, but incl
 ```
 python3 torchchat.py generate llama3 --dso-path exportedModels/llama3.so --prompt "Hello my name is"
 ```
+**Note:** Depending on which accelerator is used to generate the .dso file, the command may need the device specified: `--device (cuda | cpu)`.
 
 
 ### Run using our C++ Runner
````
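Applying the added note, generating with a CUDA-exported DSO would look roughly like this; the position of `--device` among the other flags is an assumption:

```bash
# Match the generate device to the accelerator the DSO was exported for
python3 torchchat.py generate llama3 --dso-path exportedModels/llama3.so --device cuda --prompt "Hello my name is"
```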
````diff
@@ -275,10 +276,11 @@ To run in a C++ enviroment, we need to build the runner binary.
 scripts/build_native.sh aoti
 ```
 
-Then run the compiled executable, with the exported DSO from earlier:
+Then run the compiled executable, with the exported DSO from earlier.
 ```bash
 cmake-out/aoti_run exportedModels/llama3.so -z `python3 torchchat.py where llama3`/tokenizer.model -l 3 -i "Once upon a time"
 ```
+**Note:** Depending on which accelerator is used to generate the .dso file, the runner may need the device specified: `-d (CUDA | CPU)`.
 
 ## Mobile Execution
 
````
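Likewise, a sketch of the runner invocation with the new `-d` flag for a CUDA-exported DSO; where the flag sits among the existing arguments is an assumption:

```bash
# Tell the C++ runner which accelerator the exported DSO targets (flag order is illustrative)
cmake-out/aoti_run exportedModels/llama3.so -z `python3 torchchat.py where llama3`/tokenizer.model -d CUDA -l 3 -i "Once upon a time"
```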