@wwwjn wwwjn commented Jun 19, 2025

Command to run: NGPU=1 CONFIG_FILE="./torchtitan/models/deepseek_v3/train_configs/debug_model.toml" ./run_train.sh
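
The command above passes its two settings to `./run_train.sh` through environment variables. As a minimal sketch of the same launch from Python, assuming it is started from the repository root: `build_launch` and `as_shell_line` are hypothetical helper names introduced here for illustration, not part of torchtitan.

```python
import os
import shlex


def build_launch(ngpu, config_file, script="./run_train.sh"):
    """Return (env, argv) mirroring `NGPU=<n> CONFIG_FILE=<path> ./run_train.sh`.

    Hypothetical helper: it only prepares the launch, it does not run it.
    """
    if ngpu < 1:
        raise ValueError("ngpu must be at least 1")
    env = dict(os.environ)             # inherit the current environment
    env["NGPU"] = str(ngpu)            # environment values must be strings
    env["CONFIG_FILE"] = config_file
    return env, [script]


def as_shell_line(ngpu, config_file, script="./run_train.sh"):
    """Render the equivalent one-line shell command, quoting the path if needed."""
    return f"NGPU={ngpu} CONFIG_FILE={shlex.quote(config_file)} {script}"


if __name__ == "__main__":
    config = "./torchtitan/models/deepseek_v3/train_configs/debug_model.toml"
    env, argv = build_launch(1, config)
    print(as_shell_line(1, config))
    # To actually launch (assumption: the current directory is the repo root):
    #   import subprocess
    #   subprocess.run(argv, env=env, check=True)
```

Keeping the launch preparation separate from execution makes it easy to point the same script at a different config or GPU count without editing the shell command by hand.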

Context

  1. Added model args for 4 model settings, and a training config for the debug model.
  2. Debugged the forward pass; the backward pass works out of the box.
  3. Reused the c4-test dataset and the tiktokenizer from the llama3 model for current testing.

Screenshot 2025-06-20 at 11 52 49 AM

@facebook-github-bot facebook-github-bot added the CLA Signed label Jun 19, 2025
@wwwjn wwwjn requested review from H-Huang and tianyu-l June 19, 2025 15:28
@tianyu-l tianyu-l left a comment

looks quite good! left some comments

@wwwjn wwwjn requested review from H-Huang and tianyu-l June 20, 2025 19:09
@H-Huang H-Huang left a comment

Looks good to me! Thanks for getting this working so quickly!

@wwwjn wwwjn merged commit 968a889 into deepseek-v3 Jun 23, 2025
5 checks passed
@tianyu-l tianyu-l deleted the dsv3-configs branch June 24, 2025 03:59
H-Huang pushed a commit to H-Huang/torchtitan that referenced this pull request Jun 26, 2025
Command to run: `NGPU=1 CONFIG_FILE="./torchtitan/models/deepseek_v3/train_configs/debug_model.toml" ./run_train.sh`

## Context
1. Added model args for 4 model settings, and a training config for the debug model.
2. Debugged the forward pass; the backward pass works out of the box.
3. Reused the c4-test dataset and the tiktokenizer from the llama3 model for current testing.

![Screenshot 2025-06-20 at 11 52 49 AM](https://github.com/user-attachments/assets/81d938a2-9a85-4e8c-b8e1-7f9510d785c2)
wwwjn added a commit that referenced this pull request Jul 1, 2025
wwwjn added a commit that referenced this pull request Jul 1, 2025
wwwjn added a commit that referenced this pull request Jul 2, 2025
H-Huang pushed a commit to H-Huang/torchtitan that referenced this pull request Jul 3, 2025
H-Huang pushed a commit to H-Huang/torchtitan that referenced this pull request Jul 8, 2025
wwwjn added a commit that referenced this pull request Jul 8, 2025
wwwjn added a commit that referenced this pull request Jul 10, 2025