Commit d60dad6
dongfengy and byshiue authored

[None][fix] Update deployment guide and cherry-pick CI test fix from main (#7623)

Signed-off-by: Dongfeng Yu <[email protected]>
Signed-off-by: bhsueh <[email protected]>
Co-authored-by: bhsueh_NV <[email protected]>

1 parent 75745c7 · commit d60dad6

4 files changed: +17 −13 lines changed

docs/source/deployment-guide/quick-start-recipe-for-gpt-oss-on-trtllm.md
Lines changed: 6 additions & 2 deletions

@@ -72,7 +72,7 @@ cuda_graph_config:
   max_batch_size: 720
 moe_config:
   backend: TRTLLM
-stream_interval: 10
+stream_interval: 20
 num_postprocess_workers: 4
 EOF
 ```
@@ -89,8 +89,12 @@ cuda_graph_config:
   max_batch_size: 720
 moe_config:
   backend: CUTLASS
-stream_interval: 10
+stream_interval: 20
 num_postprocess_workers: 4
+attention_dp_config:
+  enable_balance: true
+  batching_wait_iters: 50
+  timeout_iters: 1
 EOF
 ```
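
Both recipes write an extra LLM API options YAML via a heredoc; the commit raises stream_interval from 10 to 20 and, for the CUTLASS recipe only, turns on attention-DP balancing. As a quick sanity check, here is a minimal sketch (assuming PyYAML and a hypothetical file name extra_llm_config.yml; neither is specified by the guide excerpt) that loads the file and asserts the updated values:

```python
# sanity_check_config.py -- illustrative only: the file name and the expected
# values come from the diff above, not from the deployment guide itself.
import yaml  # requires PyYAML

with open("extra_llm_config.yml") as f:  # hypothetical path
    cfg = yaml.safe_load(f)

# Both recipes now set stream_interval to 20 (was 10).
assert cfg["stream_interval"] == 20

# Only the CUTLASS recipe adds attention-DP load balancing.
adp = cfg.get("attention_dp_config")
if adp is not None:
    assert adp["enable_balance"] is True
    assert adp["batching_wait_iters"] == 50
    assert adp["timeout_iters"] == 1
```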

tests/integration/defs/accuracy/references/gsm8k.yaml
Lines changed: 2 additions & 0 deletions

@@ -189,5 +189,7 @@ GPT-OSS/MXFP4:
     accuracy: 90.3
   - quant_algo: W4A8_MXFP4_FP8
     accuracy: 90.3
+  - quant_algo: W4A16_MXFP4
+    accuracy: 90.3
 LGAI-EXAONE/EXAONE-4.0-32B:
   - accuracy: 88.36
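
The references file maps each model (and quantization algorithm) to its expected GSM8K score, so the new W4A16_MXFP4 entry gives the un-skipped test_w4a16 run a baseline to check against. The real lookup lives in the accuracy harness; a minimal illustrative version, assuming only the list-of-entries structure visible above:

```python
# Illustrative lookup only; the accuracy harness's real implementation differs.
import yaml

def reference_accuracy(references, model, quant_algo=None):
    """Return the reference accuracy for a (model, quant_algo) pair."""
    for entry in references[model]:
        if entry.get("quant_algo") == quant_algo:
            return entry["accuracy"]
    raise KeyError(f"no reference for {model} / {quant_algo}")

with open("gsm8k.yaml") as f:  # path relative to the references directory
    refs = yaml.safe_load(f)

print(reference_accuracy(refs, "GPT-OSS/MXFP4", "W4A16_MXFP4"))  # -> 90.3
```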

tests/integration/defs/accuracy/test_llm_api_pytorch.py
Lines changed: 9 additions & 7 deletions

@@ -2866,8 +2866,6 @@ class TestGPTOSS(LlmapiAccuracyTestHarness):
     extra_evaluator_kwargs = {
         "fewshot_as_multiturn": True,
         "apply_chat_template": True,
-        "scores_filter": "exact_match,flexible-extract",
-        "MAX_OUTPUT_LEN": 8192
     }

     MODEL_PATH = f"{llm_models_root()}/gpt_oss/gpt-oss-120b"
@@ -2881,7 +2879,9 @@ class TestGPTOSS(LlmapiAccuracyTestHarness):
         (True, True),
     ])
     def test_w4_1gpu(self, moe_backend, cuda_graph, overlap_scheduler, mocker):
-        pytest.skip("https://nvbugs/5481087")
+        mocker.patch.object(GSM8K, "MAX_OUTPUT_LEN", 8192)
+        mocker.patch.dict(GSM8K.EVALUATE_KWARGS,
+                          {"scores_filter": "exact_match,flexible-extract"})
         if moe_backend == "TRITON" and not IS_TRITON_KERNELS_AVAILABLE:
             pytest.skip("Triton kernels are not available")

@@ -2899,7 +2899,6 @@ def test_w4_1gpu(self, moe_backend, cuda_graph, overlap_scheduler, mocker):

         with llm:
             model_name = "GPT-OSS/MXFP4"
-            mocker.patch.object(GSM8K, "MAX_OUTPUT_LEN", 8192)
             task = GSM8K(model_name)
             task.evaluate(llm,
                           extra_evaluator_kwargs=self.extra_evaluator_kwargs)
@@ -2919,7 +2918,9 @@ def test_w4_1gpu(self, moe_backend, cuda_graph, overlap_scheduler, mocker):
         ids=["tp4", "ep4", "dp4"])
     def test_w4_4gpus(self, moe_backend, tp_size, pp_size, ep_size,
                       attention_dp, cuda_graph, overlap_scheduler, mocker):
-        pytest.skip("https://nvbugs/5481087")
+        mocker.patch.object(GSM8K, "MAX_OUTPUT_LEN", 8192)
+        mocker.patch.dict(GSM8K.EVALUATE_KWARGS,
+                          {"scores_filter": "exact_match,flexible-extract"})
         if moe_backend == "TRITON":
             if not IS_TRITON_KERNELS_AVAILABLE:
                 pytest.skip("Triton kernels are not available")
@@ -2940,7 +2941,6 @@ def test_w4_4gpus(self, moe_backend, tp_size, pp_size, ep_size,
         with llm:
             model_name = "GPT-OSS/MXFP4"
             task = GSM8K(model_name)
-            mocker.patch.object(GSM8K, "MAX_OUTPUT_LEN", 8192)
             task.evaluate(llm,
                           extra_evaluator_kwargs=self.extra_evaluator_kwargs)

@@ -2952,6 +2952,9 @@ def test_w4_4gpus(self, moe_backend, tp_size, pp_size, ep_size,
         ids=["dp4"])
     def test_w4a16(self, tp_size, pp_size, ep_size, attention_dp, cuda_graph,
                    overlap_scheduler, monkeypatch, mocker):
+        mocker.patch.object(GSM8K, "MAX_OUTPUT_LEN", 8192)
+        mocker.patch.dict(GSM8K.EVALUATE_KWARGS,
+                          {"scores_filter": "exact_match,flexible-extract"})
         if not IS_TRITON_KERNELS_AVAILABLE:
             pytest.skip("Triton kernels are not available")
         monkeypatch.setenv("OVERRIDE_QUANT_ALGO", "W4A16_MXFP4")
@@ -2971,7 +2974,6 @@ def test_w4a16(self, tp_size, pp_size, ep_size, attention_dp, cuda_graph,
         with llm:
             model_name = "GPT-OSS/BF16"
             task = GSM8K(model_name)
-            mocker.patch.object(GSM8K, {"MAX_OUTPUT_LEN": 8192})
             task.evaluate(llm,
                           extra_evaluator_kwargs=self.extra_evaluator_kwargs)
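
This is the substance of the CI fix: the blanket pytest.skip("https://nvbugs/5481087") lines are dropped, and the GSM8K output-length and score-filter settings move out of extra_evaluator_kwargs (and out of the with llm: blocks) into pytest-mock patches applied on the first lines of each test, before any GSM8K task is built. The removed test_w4a16 line was also malformed, passing a dict where patch.object expects an attribute name and a value. A self-contained sketch of the same pattern (assumes pytest and pytest-mock; DummyEval is a stand-in for the real GSM8K class):

```python
# test_patch_pattern.py -- illustrative pytest-mock usage mirroring the diff;
# DummyEval is a stand-in, not the real GSM8K evaluator class.
class DummyEval:
    MAX_OUTPUT_LEN = 2048
    EVALUATE_KWARGS = {"fewshot_as_multiturn": True}

def test_patches_apply_before_use(mocker):
    # Patch a class attribute and merge a key into a class-level dict;
    # pytest-mock reverts both when the test finishes.
    mocker.patch.object(DummyEval, "MAX_OUTPUT_LEN", 8192)
    mocker.patch.dict(DummyEval.EVALUATE_KWARGS,
                      {"scores_filter": "exact_match,flexible-extract"})
    assert DummyEval.MAX_OUTPUT_LEN == 8192
    assert DummyEval.EVALUATE_KWARGS["scores_filter"] == (
        "exact_match,flexible-extract")
```

Because mocker undoes both patches at teardown, the widened MAX_OUTPUT_LEN and the extra scores_filter key cannot leak into other tests that share the GSM8K class.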

tests/integration/test_lists/waives.txt
Lines changed: 0 additions & 4 deletions

@@ -331,10 +331,6 @@ accuracy/test_cli_flow.py::TestPhi4MiniInstruct::test_tp2 SKIP (https://nvbugs/5
 accuracy/test_cli_flow.py::TestLongAlpaca7B::test_auto_dtype SKIP (https://nvbugs/5481075)
 accuracy/test_llm_api.py::TestPhi4MiniInstruct::test_fp8 SKIP (https://nvbugs/5465143)
 accuracy/test_llm_api_pytorch.py::TestDeepSeekR1::test_fp8_blockscale[throughput] SKIP (https://nvbugs/5471106)
-accuracy/test_llm_api_pytorch.py::TestGPTOSS::test_w4_1gpu[True-True-cutlass] SKIP (https://nvbugs/5481080)
-accuracy/test_llm_api_pytorch.py::TestGPTOSS::test_w4_4gpus[tp4-cutlass] SKIP (https://nvbugs/5481080)
-accuracy/test_llm_api_pytorch.py::TestGPTOSS::test_w4_4gpus[ep4-cutlass] SKIP (https://nvbugs/5481080)
-accuracy/test_llm_api_pytorch.py::TestGPTOSS::test_w4_4gpus[dp4-cutlass] SKIP (https://nvbugs/5481080)
 accuracy/test_llm_api_pytorch.py::TestEXAONE4::test_auto_dtype SKIP (https://nvbugs/5481090)
 test_e2e.py::test_ptp_quickstart_advanced_8gpus_chunked_prefill_sq_22k[Llama-4-Maverick-17B-128E-Instruct-FP8-llama4-models/nvidia/Llama-4-Maverick-17B-128E-Instruct-FP8-False] SKIP (https://nvbugs/5481094)
 test_e2e.py::test_ptp_quickstart_advanced_8gpus_chunked_prefill_sq_22k[Llama-4-Maverick-17B-128E-Instruct-FP8-llama4-models/nvidia/Llama-4-Maverick-17B-128E-Instruct-FP8-True] SKIP (https://nvbugs/5481094)
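
Each waive entry is a pytest node id followed by SKIP and a tracking bug URL; the four removed lines un-waive the GPT-OSS CUTLASS variants now that the tests above are fixed. The real waiver tooling lives in the test defs; a minimal illustrative parser for the format shown:

```python
# Illustrative parser for the waive-list format shown above; the actual CI
# tooling in tests/integration may differ.
import re

WAIVE_RE = re.compile(r"^(?P<test>\S+)\s+SKIP\s+\((?P<reason>[^)]+)\)$")

def parse_waives(text):
    """Map each waived test node id to its skip reason (a bug URL here)."""
    waives = {}
    for line in text.splitlines():
        match = WAIVE_RE.match(line.strip())
        if match:
            waives[match.group("test")] = match.group("reason")
    return waives

sample = ("accuracy/test_llm_api.py::TestPhi4MiniInstruct::test_fp8 "
          "SKIP (https://nvbugs/5465143)")
assert parse_waives(sample) == {
    "accuracy/test_llm_api.py::TestPhi4MiniInstruct::test_fp8":
        "https://nvbugs/5465143"
}
```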
