From NVIDIA Megatron-LM for visibility #18

RaymondLi0 · 2023-01-24T20:01:13Z

No description provided.

…model grads refactor Co-authored-by: Pranav Prashant Thombre <[email protected]> Co-authored-by: yaoyu-33 <[email protected]> Co-authored-by: Mcore Bot <[email protected]> Co-authored-by: root <[email protected]> Co-authored-by: Hongbin Liu <[email protected]> Co-authored-by: Yu Yao <[email protected]>

…rallel inference Co-authored-by: Mcore Bot <[email protected]> Co-authored-by: Oliver Koenig <[email protected]> Co-authored-by: Youngeun Kwon <[email protected]> Co-authored-by: Helen Ngo <[email protected]> Co-authored-by: Shifang Xu <[email protected]> Co-authored-by: James Shen <[email protected]> Co-authored-by: Kunlun Li <[email protected]> Co-authored-by: Slawek Kierat <[email protected]> Co-authored-by: Zijie Yan <[email protected]> Co-authored-by: Li Tao <[email protected]> Co-authored-by: Mikolaj Blaz <[email protected]> Co-authored-by: Charlie Truong <[email protected]> Co-authored-by: Dong Hyuk Chang <[email protected]> Co-authored-by: Chenjie Luo <[email protected]>

RaymondLi0 changed the base branch from multi-query-attention to before-merge June 20, 2023 20:12

RaymondLi0 changed the base branch from before-merge to multi-query-attention June 20, 2023 20:12

ko3n1g and others added 28 commits August 16, 2025 11:36

ADLR/megatron-lm!3829 - ci: Add copy-pr-bot

254ef23

Merge branch 'ko3n1g/ci/copy-pr-bot' into 'main'

c769b67

ci: Add copy-pr-bot See merge request ADLR/megatron-lm!3829

ci(hotfix): Restart on malloc(): unaligned tcache chunk detected

69b65e0

Signed-off-by: oliver könig <[email protected]>

Merge branch 'yash/p2p_class' into 'main'

de512dc

M4 p2p communication, schedules and finalize model grads refactor See merge request ADLR/megatron-lm!3378

ADLR/megatron-lm!3809 - feat(moe): Add MoE router fusion

c08d89b

Co-authored-by: Tong Liu <[email protected]>

Merge branch 'denliu/router_fusoin' into 'main'

d93743a

feat(moe): Add MoE router fusion See merge request ADLR/megatron-lm!3809

chore: Version bump

79d04be

ADLR/megatron-lm!3814 - Apex.contrib.nccl_allocator migration

d3df238

Merge branch 'remove-apex-nccl-allocator' into 'main'

66a1dfc

Apex.contrib.nccl_allocator migration See merge request ADLR/megatron-lm!3814

chore: Version bump 0.15.0rc0

8e11c52

Signed-off-by: oliver könig <[email protected]>

ci: Fix segfaults (maybe)

551b734

Signed-off-by: oliver könig <[email protected]>

ci: DEV tests from A100 to H100 cluster

f778f7b

Signed-off-by: oliver könig <[email protected]>

ADLR/megatron-lm!3465 - perf(MoE): Support recomputation for FP8 laye…

781e765

…rnorm/moe_act/shared_experts

Merge branch 'hongxiaob/save_original_input' into 'main'

6850cc6

perf(MoE): Support recomputation for FP8 layernorm/moe_act/shared_experts See merge request ADLR/megatron-lm!3465

Merge branch 'dynamic-inference-parallelism' into 'main'

c47cf0a

ZMQ based communication of requests during parallel inference See merge request ADLR/megatron-lm!3757

ADLR/megatron-lm!3815 - Add is_cg_capturable flag to CrossEntropyLoss…

8d9dbed

… to support full CUDA graph capture

Merge branch 'add_is_cg_capturable_flag' into 'main'

4cd81e8

Add is_cg_capturable flag to CrossEntropyLoss to support full CUDA graph capture See merge request ADLR/megatron-lm!3815

ADLR/megatron-lm!3443 - [FSDP] Decouple Custom FSDP to make it indepe…

af28b5a

…ndently installable Co-authored-by: jianbinc <[email protected]> Co-authored-by: Youngeun Kwon <[email protected]> Co-authored-by: Cory Ye <[email protected]> Co-authored-by: Boxiang Wang <[email protected]>

Merge branch 'nvfsdp_convergence' into 'main'

4dd2f2b

[FSDP] Decouple Custom FSDP to make it independently installable See merge request ADLR/megatron-lm!3443

ADLR/megatron-lm!3842 - This fixes the bug where not using full_valid…

237080b

…ation gives an error about missing eval_iters

Merge branch 'fix_full_validation_bug' into 'main'

4db3c78

This fixes the bug where not using full_validation gives an error about missing eval_iters See merge request ADLR/megatron-lm!3842

ADLR/megatron-lm!3824 - Fix cuda graph when VPP is used

7139518

Merge branch 'fix_cuda_graph_with_vpp' into 'main'

c6aab54

Fix cuda graph when VPP is used See merge request ADLR/megatron-lm!3824

ADLR/megatron-lm!3834 - chore: Upgrade dependencies (2025-08-18)

eb0c03e

Co-authored-by: oliver könig <[email protected]> Co-authored-by: Mcore Bot <[email protected]>

Merge branch 'ci-bot/build/upgrade-dependencies-2025-08-18' into 'main'

09ca1d2

chore: Upgrade dependencies (2025-08-18) See merge request ADLR/megatron-lm!3834

ADLR/megatron-lm!3600 - fix sync save utility

d6d094a

gdengk and others added 30 commits September 19, 2025 11:31

ADLR/megatron-lm!3991 - Fix the print err when torch is not intialized

c223178

ADLR/megatron-lm!3855 - Enabling mixing SWA with full attention [gpt-…

e5bc924

…oss 2/5] Co-authored-by: Mcore Bot <[email protected]>

ADLR/megatron-lm!4019 - [Flux] Fix Full Iter CUDA Graph Issues

8479eb3

Co-authored-by: Wil Kong <[email protected]>

chore: Version bump

c78fb08

ADLR/megatron-lm!3858 - Enable bias in expert mlp [gpt-oss 5/5]

a329dd6

Co-authored-by: Mcore Bot <[email protected]>

ADLR/megatron-lm!4040 - chore: Upgrade dependencies (2025-09-22)

82f564b

Co-authored-by: Mcore Bot <[email protected]>

ADLR/megatron-lm!4041 - ci: Prevent usage of print-statement

148284d

ADLR/megatron-lm!3602 - Bridge Communicator: Enable joint training of…

6245b58

… independetly parallel modules. Co-authored-by: Mcore Bot <[email protected]> Co-authored-by: Pranav Prashant Thombre <[email protected]>

ADLR/megatron-lm!3907 - Update get_tensor_shapes function whose signa…

ef8708e

…ture changed and wasn't refactored

ADLR/megatron-lm!4021 - Enable KD support with Hybrid model train loop

48d7275

ADLR/megatron-lm!3999 - bugfix: Fix convergence bug of --reuse-grad-b…

c2c36f7

…uf-for-mxfp8-param-ag

ADLR/megatron-lm!4046 - ci: Enable dev branch

d7ad48f

ADLR/megatron-lm!4038 - Add support for gradient accumulation fusion

dd7e131

Co-authored-by: Selvaraj Anandaraj <[email protected]>

ADLR/megatron-lm!4049 - chore: Add post-training group

1b40eb4

ADLR/megatron-lm!3866 - Add setting to support using either Adam or A…

03fd0b4

…damW

ADLR/megatron-lm!4039 - Added average in collective support

d91daa0

Co-authored-by: Selvaraj Anandaraj <[email protected]>

ADLR/megatron-lm!4043 - chore: Add github workflows

f32b273

Co-authored-by: Mcore Bot <[email protected]>

ADLR/megatron-lm!4055 - Disable blank issues

55c0a76

ADLR/megatron-lm!4063 - ci: Add main/dev branching to queuemanager

4fea992

Co-authored-by: Mcore Bot <[email protected]>

ADLR/megatron-lm!4053 - Add fp8_dpa option for CurrentScaling FP8 recipe

61047e6

ADLR/megatron-lm!4066 - ci(fix): Run inference tests

16e19d0

ADLR/megatron-lm!3904 - Cudagraph code refactor

c173615

Revert "ADLR/megatron-lm!3904 - Cudagraph code refactor"

793b89a

This reverts commit c173615.

ADLR/megatron-lm!4047 - fix(FSDP): avoid redundant meta device materi…

cabf5d1

…alization for Megatron FSDP

Replay (!3904) Cudagraph code refactor

98b6f0e

Author: Robin Zhang <[email protected]> Signed-off-by: oliver könig <[email protected]>

ADLR/megatron-lm!3972 - Handle pre-forward and post-forward hooks in …

8503180

…TE fused MLP Co-authored-by: Mcore Bot <[email protected]>

ADLR/megatron-lm!4062 - Fix mis-set of model_auto_sync and add basic …

03045f2

…gradient existence assertion to fully_shard tests.

ADLR/megatron-lm!4067 - Add a throughput test for the dynamic engine …

4cf968c

…case with decode-only graphs

Revert "ADLR/megatron-lm!4067 - Add a throughput test for the dynamic…

56021bd

… engine case with decode-only graphs" This reverts commit 4cf968c.

ADLR/megatron-lm!4079 - build: Test TE2.7 wheel

ce8185c
