For now this job simply runs `NGPU=4 ./run_llama_train.sh`, but I
verified that it at least catches problems.
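
A minimal sketch of what such a smoke-test step could look like, assuming the job only needs to fail when the training script exits non-zero. The `run_smoke_test` helper is hypothetical and not part of this change; only `NGPU` and `run_llama_train.sh` come from the description above.

```shell
#!/usr/bin/env bash
# Hypothetical smoke-test wrapper (illustrative only, not the actual CI config).

# Run the given command with NGPU set, and propagate its exit status
# so that the CI step fails whenever the command fails.
run_smoke_test() {
    local ngpu="$1"
    shift
    if NGPU="$ngpu" "$@"; then
        echo "smoke test passed (NGPU=$ngpu)"
    else
        local status=$?
        echo "smoke test failed with exit code $status (NGPU=$ngpu)" >&2
        return "$status"
    fi
}

# Example CI usage (left commented out so sourcing this file has no side effects):
# run_smoke_test 4 ./run_llama_train.sh
```

Taking the command as arguments would let the same wrapper be reused for other launch configurations later.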
As a follow-up, we should integrate the multi-GPU (mgpu) test infra from
pytorch and set up actual unit tests to run in this job.
We should probably also keep testing the `run_llama_train.sh` script, and
add other combinations of 2D parallelism to ensure they all keep
working.
<img width="2120" alt="image"
src="https://github.com/pytorch/torchtrain/assets/4984825/2c235e9a-04ed-4f2d-9915-67de39d78e1c">