
Conversation

2ez4bz (Collaborator) commented Nov 17, 2025

Summary by CodeRabbit

  • New Features

    • Added public API to register custom model implementations via register_custom_model_cls() method for flexible model extensibility.
    • Introduced support for NemotronH model with Mamba-inspired architecture, including specialized mixer, attention, and normalization components.
  • Tests

    • Added test coverage for custom model registration and building workflows.


Description

Why?

The reference Nemotron H code on HuggingFace is out of date, and therefore buggy, and it has several untested code paths. This makes an already hairy patching system even hairier.

The proposal is to do away with those patches and replace the original implementation with a heavily slimmed-down one.

What?

This PR lays the groundwork for an alternative path via such a slimmed-down implementation, one that:

  • fixes bugs in the current HF implementation
  • adds no new dependencies to TensorRT-LLM
  • does away with unnecessary features for TensorRT-LLM / AutoDeploy:
    • no training-related code (dropout, gradient checkpointing, etc.)
    • no caching business (we want to replace it with our own anyway)
    • no attention masking where possible
  • reuses existing AD custom ops for mamba SSM update / causal conv1d

To make the above usable in the AD apparatus, its LlmArgs and AutoModelForCausalLMFactory are extended, as sketched below.
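
For illustration, registering the new implementation might look like this (a sketch based on the walkthrough further down; the exact signature lives in tensorrt_llm/_torch/auto_deploy/models/hf.py):

    from tensorrt_llm._torch.auto_deploy.models.hf import AutoModelForCausalLMFactory
    from tensorrt_llm._torch.auto_deploy.models.modeling_nemotron_h import NemotronHForCausalLM

    # Route configs whose model_type is "nemotron_h" to the slimmed-down
    # implementation instead of the stock HF automodel path.
    AutoModelForCausalLMFactory.register_custom_model_cls("nemotron_h", NemotronHForCausalLM)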

Test Coverage

PR Checklist

Please review the following before submitting your PR:

  • PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.

  • PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.

  • Test cases are provided for new code paths (see test instructions)

  • Any new dependencies have been scanned for license and vulnerabilities

  • CODEOWNERS updated if ownership changes

  • Documentation updated as needed

  • Update tava architecture diagram if there is a significant design change in PR.

  • The reviewers assigned automatically/manually are appropriate for the PR.

  • Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

/bot [-h] ['run', 'kill', 'skip', 'reuse-pipeline'] ...

Provides a user-friendly way for developers to interact with a Jenkins server.

Run /bot [-h|--help] to print this help message.

See details below for each supported subcommand.

run [--reuse-test (optional)pipeline-id --disable-fail-fast --skip-test --stage-list "A10-PyTorch-1, xxx" --gpu-type "A30, H100_PCIe" --test-backend "pytorch, cpp" --add-multi-gpu-test --only-multi-gpu-test --disable-multi-gpu-test --post-merge --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" --detailed-log --debug(experimental)]

Launch build/test pipelines. All previously running jobs will be killed.

--reuse-test (optional)pipeline-id (OPTIONAL) : Allow the new pipeline to reuse build artifacts and skip successful test stages from a specified pipeline, or from the last pipeline if no pipeline-id is indicated. If the Git commit ID has changed, this option will always be ignored. The DEFAULT behavior of the bot is to reuse build artifacts and successful test results from the last pipeline.

--disable-reuse-test (OPTIONAL) : Explicitly prevent the pipeline from reusing build artifacts and skipping successful test stages from a previous pipeline. Ensure that all builds and tests are run regardless of previous successes.

--disable-fail-fast (OPTIONAL) : Disable fail fast on build/tests/infra failures.

--skip-test (OPTIONAL) : Skip all test stages, but still run build stages, package stages and sanity check stages. Note: Does NOT update GitHub check status.

--stage-list "A10-PyTorch-1, xxx" (OPTIONAL) : Only run the specified test stages. Examples: "A10-PyTorch-1, xxx". Note: Does NOT update GitHub check status.

--gpu-type "A30, H100_PCIe" (OPTIONAL) : Only run the test stages on the specified GPU types. Examples: "A30, H100_PCIe". Note: Does NOT update GitHub check status.

--test-backend "pytorch, cpp" (OPTIONAL) : Skip test stages which don't match the specified backends. Only support [pytorch, cpp, tensorrt, triton]. Examples: "pytorch, cpp" (does not run test stages with tensorrt or triton backend). Note: Does NOT update GitHub pipeline status.

--only-multi-gpu-test (OPTIONAL) : Only run the multi-GPU tests. Note: Does NOT update GitHub check status.

--disable-multi-gpu-test (OPTIONAL) : Disable the multi-GPU tests. Note: Does NOT update GitHub check status.

--add-multi-gpu-test (OPTIONAL) : Force run the multi-GPU tests in addition to running L0 pre-merge pipeline.

--post-merge (OPTIONAL) : Run the L0 post-merge pipeline instead of the ordinary L0 pre-merge pipeline.

--extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" (OPTIONAL) : Run the ordinary L0 pre-merge pipeline and specified test stages. Examples: --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx".

--detailed-log (OPTIONAL) : Enable flushing out all logs to the Jenkins console. This will significantly increase the log volume and may slow down the job.

--debug (OPTIONAL) : Experimental feature. Enable access to the CI container for debugging purpose. Note: Specify exactly one stage in the stage-list parameter to access the appropriate container environment. Note: Does NOT update GitHub check status.

For guidance on mapping tests to stage names, see docs/source/reference/ci-overview.md
and the scripts/test_to_stage_mapping.py helper.

kill

kill

Kill all running builds associated with the pull request.

skip

skip --comment COMMENT

Skip testing for the latest commit on the pull request. --comment "Reason for skipping build/test" is required. IMPORTANT NOTE: This is dangerous, since lack of user care and validation can cause the top of tree to break.

reuse-pipeline

reuse-pipeline

Reuse a previous pipeline to validate the current commit. This action will also kill all currently running builds associated with the pull request. IMPORTANT NOTE: This is dangerous, since lack of user care and validation can cause the top of tree to break.

@2ez4bz 2ez4bz marked this pull request as ready for review November 19, 2025 22:01
@2ez4bz 2ez4bz requested a review from a team as a code owner November 19, 2025 22:01
coderabbitai bot (Contributor) commented Nov 19, 2025

📝 Walkthrough

The changes introduce a pluggable custom model registration system for AutoModelForCausalLMFactory and implement a complete PyTorch NemotronH model with custom Mamba-inspired mixers, attention, and feed-forward components. Tests validate the new registration mechanism, and test utilities are updated for consistent input handling.

Changes

  • Custom Model Registration System (tensorrt_llm/_torch/auto_deploy/models/hf.py):
    Added a _custom_model_mapping dictionary and a register_custom_model_cls() class method to AutoModelForCausalLMFactory and AutoModelForImageTextToTextFactory. Modified _build_model() to check the custom mapping before falling back to the standard automodel_cls.from_config() flow (see the sketch after this list).
  • NemotronH Model Implementation (tensorrt_llm/_torch/auto_deploy/models/modeling_nemotron_h.py):
    New module implementing NemotronHForCausalLM, NemotronHModel, and supporting components, including NemotronHMamba2Mixer (state-space mixer), NemotronHAttention, NemotronHMLP, NemotronHBlock, NemotronHPreTrainedModel, and custom output dataclasses. Includes RMSNorm, weight initialization, and generation support. Registers with the factory as "nemotron_h".
  • Custom Model Registration Tests (tests/unittest/_torch/auto_deploy/unit/singlegpu/models/test_hf.py):
    Added a fixture to restore factory state and new test cases: test_register_custom_model_cls(), test_build_model_uses_custom_model_cls_from_config(), and test_build_model_uses_custom_model_cls_directly(). Added imports for mocking and error validation.
  • Test Input Handling Updates (tests/unittest/_torch/auto_deploy/unit/singlegpu/models/test_hybrid_patches.py):
    Refactored test parameterization from pytest.param to plain tuples and updated input preparation to explicitly extract input_ids before passing to model.generate(). Reformatted function signatures and updated helper comments.
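
A rough sketch of that lookup-then-fallback dispatch, for orientation (a hypothetical shape following the walkthrough; the actual _build_model() may differ in detail):

    # Hypothetical sketch of the dispatch in _build_model().
    model_type = getattr(model_config, "model_type", None)
    custom_cls = cls._custom_model_mapping.get(model_type)
    if custom_cls is not None:
        # Prefer the HF _from_config convention when the class provides it.
        if hasattr(custom_cls, "_from_config"):
            model = custom_cls._from_config(model_config)
        else:
            model = custom_cls(model_config)
    else:
        # No custom model registered: standard automodel flow.
        model = cls.automodel_cls.from_config(model_config)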

Sequence Diagrams

sequenceDiagram
    participant Client
    participant Factory as AutoModelForCausalLMFactory
    participant Registry as _custom_model_mapping
    participant CustomModel
    participant StdAutoModel

    Client->>Factory: register_custom_model_cls("nemotron_h", CustomModel)
    Factory->>Registry: Store mapping
    
    Client->>Factory: build_model(model_config)
    Factory->>Factory: Extract model_type from config
    Factory->>Registry: Lookup model_type
    Registry-->>Factory: CustomModel found
    Factory->>CustomModel: _from_config or __init__
    CustomModel-->>Factory: Instance created
    Factory-->>Client: Return model instance
sequenceDiagram
    participant Client
    participant Factory as AutoModelForCausalLMFactory
    participant Registry as _custom_model_mapping
    participant StdAutoModel
    
    Client->>Factory: build_model(model_config)
    Factory->>Factory: Extract model_type from config
    Factory->>Registry: Lookup model_type
    Registry-->>Factory: No custom model registered
    Factory->>StdAutoModel: from_config(model_config)
    StdAutoModel-->>Factory: Instance created
    Factory-->>Client: Return model instance

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

  • modeling_nemotron_h.py — Largest addition with 10+ classes requiring careful review of mixer logic, attention implementation, weight initialization, and HuggingFace integration patterns
  • hf.py — Straightforward addition of dictionary and two simple class methods; low complexity
  • test_hf.py — New tests follow standard mocking patterns; verify registration and dual code paths (_from_config and direct construction)
  • test_hybrid_patches.py — Mechanical refactoring of test inputs; verify parameter updates don't alter test intent

Areas requiring extra attention:

  • NemotronHMamba2Mixer forward pass and state-space computations; verify A, B, C, D transformations align with design intent
  • Weight initialization in NemotronHPreTrainedModel, particularly mamba-specific A_log and dt handling
  • Custom model class instantiation in _build_model() — ensure both _from_config and direct constructor paths are properly tested
  • Input handling changes in test_hybrid_patches.py — confirm tokenizer output extraction doesn't break existing test assertions
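
For reviewers checking that mixer math, the selective-SSM recurrence these parameters implement is, schematically (a generic single-group sketch of the Mamba-2 style update, not code lifted from the PR):

    import torch

    def ssm_step(h, x_t, A, B_t, C_t, D, dt_t):
        # h: (nheads, head_dim, d_state) running state
        # x_t: (nheads, head_dim) input at step t
        # A, dt_t, D: (nheads,); A is negative, typically -exp(A_log)
        # B_t, C_t: (d_state,) per-step projections (single group assumed)
        dA = torch.exp(dt_t * A)                           # zero-order-hold decay
        dBx = torch.einsum("h,hd,n->hdn", dt_t, x_t, B_t)  # dt-scaled input write
        h = h * dA[:, None, None] + dBx                    # state update
        y = torch.einsum("hdn,n->hd", h, C_t) + D[:, None] * x_t  # readout + skip
        return h, y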

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
  • Docstring Coverage ⚠️ Warning: Docstring coverage is 17.50%, below the required threshold of 80.00%. Resolution: run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
  • Title check ✅ Passed: The title clearly identifies the main change, introducing a slimmed-down implementation of the Nemotron H model, which aligns with the PR's core objective.
  • Description check ✅ Passed: The PR description clearly explains why the change is needed (an outdated HF implementation), what is being changed (a slimmed-down implementation), and the key benefits (bug fixes, no new dependencies, removal of unnecessary features).

coderabbitai bot (Contributor) left a comment:

Actionable comments posted: 3

📜 Review details

Configuration used: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d4abb86 and 2b59298.

📒 Files selected for processing (4)
  • tensorrt_llm/_torch/auto_deploy/models/hf.py (3 hunks)
  • tensorrt_llm/_torch/auto_deploy/models/modeling_nemotron_h.py (1 hunks)
  • tests/unittest/_torch/auto_deploy/unit/singlegpu/models/test_hf.py (3 hunks)
  • tests/unittest/_torch/auto_deploy/unit/singlegpu/models/test_hybrid_patches.py (3 hunks)
🧰 Additional context used
🧠 Learnings (4)
📚 Learning: 2025-10-20T17:07:18.745Z
Learnt from: nvchenghaoz
Repo: NVIDIA/TensorRT-LLM PR: 8469
File: tensorrt_llm/_torch/auto_deploy/models/patches/nemotron_h.py:98-116
Timestamp: 2025-10-20T17:07:18.745Z
Learning: In NemotronH models (tensorrt_llm/_torch/auto_deploy/models/patches/nemotron_h.py), the gate (self.gate) returns topk_indices and topk_weights that are already in the correct shape to be passed directly to torch_ops.auto_deploy.torch_moe without needing to reshape them when hidden_states is flattened.

Applied to files:

  • tensorrt_llm/_torch/auto_deploy/models/modeling_nemotron_h.py
📚 Learning: 2025-10-20T16:54:09.824Z
Learnt from: nvchenghaoz
Repo: NVIDIA/TensorRT-LLM PR: 8469
File: tensorrt_llm/_torch/auto_deploy/custom_ops/rms_norm.py:6-6
Timestamp: 2025-10-20T16:54:09.824Z
Learning: In tensorrt_llm/_torch/auto_deploy/custom_ops/rms_norm.py, the import `from ...modules.mamba.layernorm_gated import _layer_norm_fwd` is correct and should not be changed to modules.fla.layernorm_gated. The _layer_norm_fwd function exists in both modules/mamba/layernorm_gated.py and modules/fla/layernorm_gated.py, but the mamba version is the intended implementation for this use case.

Applied to files:

  • tensorrt_llm/_torch/auto_deploy/models/modeling_nemotron_h.py
📚 Learning: 2025-08-27T16:22:10.695Z
Learnt from: Fridah-nv
Repo: NVIDIA/TensorRT-LLM PR: 7227
File: tensorrt_llm/_torch/auto_deploy/utils/quantization_utils.py:94-100
Timestamp: 2025-08-27T16:22:10.695Z
Learning: When there are inconsistent operator detection methods (like custom_op() vs target_op()), removing one method and standardizing on the other is often cleaner than supporting both methods simultaneously.

Applied to files:

  • tensorrt_llm/_torch/auto_deploy/models/modeling_nemotron_h.py
📚 Learning: 2025-09-03T13:16:38.028Z
Learnt from: nvpohanh
Repo: NVIDIA/TensorRT-LLM PR: 7478
File: tests/unittest/_torch/modeling/test_modeling_llama_min_latency.py:286-308
Timestamp: 2025-09-03T13:16:38.028Z
Learning: In test files, temporary monkey-patches for upstream bugs can be kept simple when they are explicitly intended to be removed soon, rather than investing effort in making them more robust.

Applied to files:

  • tensorrt_llm/_torch/auto_deploy/models/modeling_nemotron_h.py
🧬 Code graph analysis (4)
tensorrt_llm/_torch/auto_deploy/models/hf.py (1)
tensorrt_llm/bench/benchmark/__init__.py (1)
  • model_type (71-72)
tensorrt_llm/_torch/auto_deploy/models/modeling_nemotron_h.py (3)
tensorrt_llm/functional.py (3)
  • sqrt (450-454)
  • conv1d (3548-3585)
  • arange (1498-1569)
tensorrt_llm/_torch/auto_deploy/models/hf.py (1)
  • register_custom_model_cls (494-508)
tensorrt_llm/_torch/auto_deploy/custom_ops/torch_attention.py (1)
  • torch_attention (96-212)
tests/unittest/_torch/auto_deploy/unit/singlegpu/models/test_hf.py (1)
tensorrt_llm/_torch/auto_deploy/models/hf.py (2)
  • AutoModelForCausalLMFactory (100-508)
  • register_custom_model_cls (494-508)
tests/unittest/_torch/auto_deploy/unit/singlegpu/models/test_hybrid_patches.py (2)
tensorrt_llm/_torch/auto_deploy/custom_ops/attention_interface.py (2)
  • to (492-499)
  • device (217-218)
tensorrt_llm/_torch/auto_deploy/shim/interface.py (1)
  • to (42-46)
🪛 Ruff (0.14.5)
tensorrt_llm/_torch/auto_deploy/models/hf.py
  • 115: Mutable class attributes should be annotated with typing.ClassVar (RUF012)

tensorrt_llm/_torch/auto_deploy/models/modeling_nemotron_h.py
  • 262: Avoid specifying long messages outside the exception class (TRY003)
  • 314: Avoid specifying long messages outside the exception class (TRY003)
  • 381: Mutable class attributes should be annotated with typing.ClassVar (RUF012)
  • 470: Unused method argument: prefix (ARG002)
  • 470: Unused method argument: args (ARG002)
  • 486: Unused method argument: kwargs (ARG002)
  • 489: Avoid specifying long messages outside the exception class (TRY003)
  • 505: Mutable class attributes should be annotated with typing.ClassVar (RUF012)
  • 532: Unused method argument: position_ids (ARG002)
  • 533: Unused method argument: kwargs (ARG002)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Pre-commit Check
🔇 Additional comments (16)
tests/unittest/_torch/auto_deploy/unit/singlegpu/models/test_hybrid_patches.py (3)

19-21: LGTM! Enhanced test coverage.

Adding test cases for both verification modes provides better coverage of the generation path.


69-71: LGTM! Clear documentation of constraint.

The updated comment clearly explains the tokenization length requirement and the workaround being used.


175-178: LGTM! More explicit input handling.

The explicit extraction of input_ids before moving to device improves code clarity and consistency with the rest of the file.

tensorrt_llm/_torch/auto_deploy/models/hf.py (2)

212-229: LGTM! Clean custom model routing implementation.

The logic correctly:

  • Checks for a custom model class based on model_type
  • Prefers _from_config (HF convention) when available
  • Falls back to the default behavior when no custom model is registered

493-508: LGTM! Clean registration API.

The registration method provides a simple, well-documented interface for custom model routing.

tests/unittest/_torch/auto_deploy/unit/singlegpu/models/test_hf.py (3)

23-27: LGTM! Proper test isolation.

The fixture correctly preserves and restores the custom model mapping between tests, ensuring test independence.
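
A fixture of roughly this shape achieves that isolation (a sketch; the actual fixture in test_hf.py may differ in detail):

    import pytest
    from tensorrt_llm._torch.auto_deploy.models.hf import AutoModelForCausalLMFactory

    @pytest.fixture(autouse=True)
    def restore_custom_model_mapping():
        # Snapshot the registry so registrations made inside a test
        # cannot leak into later tests.
        saved = dict(AutoModelForCausalLMFactory._custom_model_mapping)
        yield
        AutoModelForCausalLMFactory._custom_model_mapping.clear()
        AutoModelForCausalLMFactory._custom_model_mapping.update(saved)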


137-144: LGTM! Clear test of registration functionality.

The test verifies that register_custom_model_cls correctly stores the custom model class in the mapping.


151-185: LGTM! Comprehensive test coverage of custom model paths.

Both tests effectively verify:

  • Custom model class is invoked based on model_type
  • Both construction paths (_from_config and direct) work correctly
  • Error propagation from custom model classes is preserved
tensorrt_llm/_torch/auto_deploy/models/modeling_nemotron_h.py (8)

38-86: LGTM! Self-contained RMS norm implementation.

The implementation correctly replaces the mamba_ssm dependency with a PyTorch + einops version, handling grouped gated RMS normalization appropriately.
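
In the ungrouped case, gated RMS norm of this kind reduces to something like the following (a generic sketch assuming the mamba_ssm gate-then-normalize convention; the PR's einops-based version additionally handles the grouped case):

    import torch
    import torch.nn.functional as F

    def gated_rms_norm(x, gate, weight, eps=1e-5):
        dtype = x.dtype
        x = x.to(torch.float32)
        if gate is not None:
            # Gate first (SiLU), then normalize over the last dim.
            x = x * F.silu(gate.to(torch.float32))
        var = x.pow(2).mean(-1, keepdim=True)
        return (x * torch.rsqrt(var + eps) * weight).to(dtype)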


158-224: LGTM! Proper custom ops integration.

The Mamba2Mixer correctly integrates torch_causal_conv1d and torch_ssm custom ops, with appropriate dtype handling throughout.


245-272: LGTM! Correct block implementation.

The block correctly switches between mixer types (mamba, attention, mlp) and applies standard pre-norm with residual connections.
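
In pre-norm form, each block's forward then reduces to roughly the following (a sketch; mixer stands in for whichever of the three variants the layer uses):

    def block_forward(self, hidden_states):
        # Standard pre-norm residual: norm -> mixer -> add.
        residual = hidden_states
        hidden_states = self.norm(hidden_states)
        hidden_states = self.mixer(hidden_states)  # Mamba2 mixer, attention, or MLP
        return residual + hidden_states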


306-368: LGTM! Correct attention implementation.

The attention module properly uses the torch_attention custom op with bsnd layout, matching the expected interface.
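
For intuition, a bsnd (batch, seq, num_heads, head_dim) layout matches PyTorch SDPA semantics after a transpose; a rough equivalence sketch (the PR calls the AD custom op, not SDPA directly):

    import torch.nn.functional as F

    def bsnd_attention(q, k, v):  # each: (batch, seq, num_heads, head_dim)
        # SDPA expects (batch, heads, seq, head_dim), hence the transposes.
        out = F.scaled_dot_product_attention(
            q.transpose(1, 2), k.transpose(1, 2), v.transpose(1, 2), is_causal=True
        )
        return out.transpose(1, 2)  # back to bsnd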


385-428: LGTM! Proper weight initialization.

The initialization logic correctly handles Mamba-specific parameters (A_log, D, dt_bias) and follows the GPT-2 rescaling scheme for prenorm residual connections.
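
For reference, the customary Mamba-style initialization of those parameters looks like this (a sketch of the convention only, not copied from the PR; names and ranges are assumptions):

    import math
    import torch

    def init_mamba_params(nheads, dt_min=1e-3, dt_max=1e-1, dt_floor=1e-4):
        # A_log: log of a positive per-head decay; the forward pass uses A = -exp(A_log).
        A_log = torch.log(torch.arange(1, nheads + 1, dtype=torch.float32))
        D = torch.ones(nheads)  # skip-connection scale
        # dt_bias: inverse-softplus of a log-uniform sample in [dt_min, dt_max],
        # so that softplus(dt_bias) lands back in that range.
        dt = torch.exp(
            torch.rand(nheads) * (math.log(dt_max) - math.log(dt_min)) + math.log(dt_min)
        ).clamp(min=dt_floor)
        dt_bias = dt + torch.log(-torch.expm1(-dt))  # inverse softplus
        return A_log, D, dt_bias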


470-501: LGTM! Correct model implementation.

The model properly:

  • Handles legacy checkpoint compatibility via load_hook
  • Validates input arguments (exactly one of input_ids or inputs_embeds)
  • Applies standard transformer flow: embeddings → layers → final norm

Note: The unused prefix and args parameters in load_hook are part of the PyTorch hook signature and cannot be removed.
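
That embeddings → layers → final norm flow, in schematic form (a sketch; attribute names such as embeddings, layers, and norm_f are assumptions):

    def model_forward(self, input_ids=None, inputs_embeds=None):
        # Exactly one of input_ids / inputs_embeds must be provided.
        if (input_ids is None) == (inputs_embeds is None):
            raise ValueError("Specify exactly one of input_ids or inputs_embeds")
        if inputs_embeds is None:
            inputs_embeds = self.embeddings(input_ids)
        hidden_states = inputs_embeds
        for layer in self.layers:
            hidden_states = layer(hidden_states)
        return self.norm_f(hidden_states)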


516-540: LGTM! Correct causal LM implementation.

The implementation properly:

  • Delegates embedding access to the backbone
  • Applies dtype conversion for the LM head
  • Provides the interface needed for GenerationMixin

Note: The unused position_ids and kwargs parameters are required for compatibility with HuggingFace's generation interface and cannot be removed.


543-543: LGTM! Proper model registration.

The registration correctly wires NemotronHForCausalLM into the custom model factory system for the "nemotron_h" model type.

2ez4bz (Collaborator, Author) commented Nov 20, 2025

/bot run

tensorrt-cicd (Collaborator) commented:

PR_Github #25148 [ run ] triggered by Bot. Commit: 2b59298

tensorrt-cicd (Collaborator) commented:

PR_Github #25148 [ run ] completed with state SUCCESS. Commit: 2b59298
/LLM/main/L0_MergeRequest_PR pipeline #19014 completed with status: 'FAILURE'

2ez4bz (Collaborator, Author) commented Nov 21, 2025

/bot run

tensorrt-cicd (Collaborator) commented:

PR_Github #25321 [ run ] triggered by Bot. Commit: 77ccb30

tensorrt-cicd (Collaborator) commented:

PR_Github #25321 [ run ] completed with state SUCCESS. Commit: 77ccb30
/LLM/main/L0_MergeRequest_PR pipeline #19155 completed with status: 'FAILURE'

govind-ramnarayan (Collaborator) left a comment:

LGTM, some small nits.

2ez4bz (Collaborator, Author) commented Nov 22, 2025

/bot run

@2ez4bz 2ez4bz enabled auto-merge (squash) November 22, 2025 05:57
tensorrt-cicd (Collaborator) commented:

PR_Github #25412 [ run ] triggered by Bot. Commit: aff6752

tensorrt-cicd (Collaborator) commented:

PR_Github #25412 [ run ] completed with state SUCCESS. Commit: aff6752
/LLM/main/L0_MergeRequest_PR pipeline #19228 completed with status: 'FAILURE'

2ez4bz (Collaborator, Author) commented Nov 22, 2025

/bot run

tensorrt-cicd (Collaborator) commented:

PR_Github #25427 [ run ] triggered by Bot. Commit: aff6752
