Merged
1 change: 0 additions & 1 deletion create_seed_checkpoint.sh
@@ -18,7 +18,6 @@

set -ex

-export USE_LIBUV=1
TRAINER_DIR=${1:-/home/$USER/local/torchtitan}
NGPU=1
LOG_RANK=0
1 change: 0 additions & 1 deletion multinode_trainer.slurm
@@ -53,7 +53,6 @@ export NCCL_SOCKET_IFNAME="eth0,en,eth,em,bond"
export NCCL_BUFFSIZE=2097152
#export TORCH_DIST_INIT_BARRIER=1
export FI_EFA_SET_CUDA_SYNC_MEMOPS=0
-#export USE_LIBUV=1
CONFIG_FILE=${CONFIG_FILE:-"./train_configs/llama2_13b.toml"}

dcgmi profile --pause
3 changes: 0 additions & 3 deletions run_llama_train.sh
@@ -7,9 +7,6 @@

set -ex

-# libUV is a scalable backend for TCPStore which is used in processGroup
-# rendezvous. This is the recommended backend for distributed training.
-export USE_LIBUV=1
TRAINER_DIR=${TRAINER_DIR:-/home/$USER/local/torchtitan}
Contributor
May I ask what this line is doing?
My repo sits in /home/$USER/torchtitan; would that be a problem?

Contributor Author
I don't think this line is doing anything at all. I actually have a later PR to delete it.
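
For reference, the `${TRAINER_DIR:-...}` form on the line in question is standard shell default-value expansion: the variable keeps a value the caller already set, and falls back to the given path only when it is unset or empty. Below is a minimal sketch of that behavior. The helper name `resolve_trainer_dir` is hypothetical and not part of the repo, and whether the script reads `TRAINER_DIR` anywhere afterwards is not visible in this hunk.

```shell
#!/bin/sh
# Sketch of the ${VAR:-default} expansion used on the TRAINER_DIR line.
# resolve_trainer_dir is a hypothetical helper, for illustration only.
resolve_trainer_dir() {
    # Falls back to the default when TRAINER_DIR is unset or empty.
    echo "${TRAINER_DIR:-/home/$USER/local/torchtitan}"
}

# Case 1: TRAINER_DIR is unset, so the default path is used.
unset TRAINER_DIR
default_dir=$(resolve_trainer_dir)

# Case 2: the caller sets TRAINER_DIR, so that value wins.
TRAINER_DIR="/home/$USER/torchtitan"
override_dir=$(resolve_trainer_dir)

echo "default:  $default_dir"
echo "override: $override_dir"
```

Under that reading, a checkout at /home/$USER/torchtitan would only need `TRAINER_DIR` set in the environment before launching, assuming the script uses the variable at all.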


# use envs as local overrides for convenience