@nvchenghaoz
Collaborator

@nvchenghaoz nvchenghaoz commented Nov 7, 2025

Summary by CodeRabbit

  • Bug Fixes
    • Fixed decoding phase calculations in Mamba model operations for improved correctness during inference.

Signed-off-by: Chenghao Zhang <[email protected]>
@coderabbitai
Contributor

coderabbitai bot commented Nov 7, 2025

📝 Walkthrough

This PR changes the decode path in two Mamba backends. The CUDA backend simplifies the decode index calculation by replacing index-based copying with direct slicing at known offsets. The Triton backend drops the redundant dt_pre computation and instead passes dt_hp directly to selective_state_update with softplus enabled.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Mamba CUDA backend decode optimization: tensorrt_llm/_torch/auto_deploy/custom_ops/mamba/cuda_backend_causal_conv.py | Replaces decoding index calculation and in-place index_copy_ operation with direct sliced copy_. Uses total_prefill_tokens and num_decode offsets for explicit slice bounds, ensuring dtype consistency via to(). |
| Mamba Triton backend dt computation: tensorrt_llm/_torch/auto_deploy/custom_ops/mamba/triton_backend_mamba.py | Removes dt_pre computation (softplus and clipping) in decode path. Passes dt_hp directly to selective_state_update with dt_bias_hp as bias and dt_softplus enabled, replacing previous zero-bias non-softplus path. |
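
The CUDA backend change can be illustrated with a minimal pure-Python sketch. It assumes, based only on the summary above, that decode tokens occupy one contiguous range starting at total_prefill_tokens in the flattened output. Plain lists stand in for tensors, and the helper names write_decode_by_index and write_decode_by_slice are hypothetical, not functions from the PR.

```python
def write_decode_by_index(y_flat, decode_out, total_prefill_tokens):
    """Index-based style: build explicit positions, then scatter element by element."""
    idx = [total_prefill_tokens + i for i in range(len(decode_out))]
    for pos, value in zip(idx, decode_out):
        y_flat[pos] = value
    return y_flat


def write_decode_by_slice(y_flat, decode_out, total_prefill_tokens, num_decode):
    """Slice style: one contiguous copy bounded by the two offsets."""
    start = total_prefill_tokens
    end = start + num_decode
    # Assumption: decode_out holds exactly num_decode entries, so the slice
    # assignment overwrites in place without changing the buffer length.
    y_flat[start:end] = decode_out[:num_decode]
    return y_flat


# Hypothetical batch: 3 prefill tokens followed by 2 decode tokens.
prefill = [0.1, 0.2, 0.3]
decode = [9.0, 8.0]
by_index = write_decode_by_index(prefill + [0.0, 0.0], decode, len(prefill))
by_slice = write_decode_by_slice(prefill + [0.0, 0.0], decode, len(prefill), len(decode))
```

Both helpers fill the same positions in this toy case. The slice form needs no index list, which is presumably the kind of saving the summary has in mind when it replaces index_copy_ with a sliced copy_.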

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20–25 minutes

  • Triton backend changes may require verification that dt_softplus parameter produces numerically equivalent results and doesn't affect model accuracy
  • CUDA backend dtype handling should be verified to ensure the explicit .to(y_flat.dtype) conversion doesn't introduce unexpected precision changes
  • Both files involve low-level GPU kernels where subtle logic changes could have significant performance or numerical implications
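
The numerical-equivalence concern in the first bullet can be checked on scalars. The following is a minimal sketch, not the PR's kernel code: it assumes the removed dt_pre step computed softplus(dt + bias) followed by a clamp, and that the fused path applies only the bias and softplus. The names dt_min and dt_max are hypothetical stand-ins for the clipping bounds.

```python
import math


def softplus(x):
    """Numerically stable softplus: log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dt_precomputed(dt, dt_bias, dt_min, dt_max):
    """Assumed old path: bias, softplus, and clamp applied before the update."""
    return min(max(softplus(dt + dt_bias), dt_min), dt_max)


def dt_fused(dt, dt_bias):
    """Assumed new path: the update applies bias and softplus itself, with no clamp."""
    return softplus(dt + dt_bias)


dt, dt_bias = 0.5, 0.1
# With bounds (0, inf) the clamp is a no-op, because softplus is always positive.
same = dt_precomputed(dt, dt_bias, 0.0, math.inf) == dt_fused(dt, dt_bias)
# With a tight upper bound the two paths diverge.
differs = dt_precomputed(dt, dt_bias, 0.0, 0.5) != dt_fused(dt, dt_bias)
```

Under these assumptions the two paths agree only when the removed clipping never triggers, so a reviewer would want to confirm the bounds in use for the affected models.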

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Description check | ⚠️ Warning | The PR description is incomplete. Only '@coderabbitai summary' was provided without any actual description, test coverage, or checklist completion. | Complete the PR description with sections explaining the issue/solution, test coverage, and completion of the PR checklist as specified in the template. |
✅ Passed checks (2 passed)
| Check name | Status | Explanation |
|---|---|---|
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |
| Title check | ✅ Passed | The title clearly and directly summarizes the main change: a performance improvement for mamba layers in the AutoDeploy module, which matches the file modifications and optimization changes described in the raw summary. |

@nvchenghaoz nvchenghaoz changed the title from "[None][Feat] AutoDeploy: Perf improvement for mamba layers." to "[None][feat] AutoDeploy: Perf improvement for mamba layers." Nov 7, 2025
@nvchenghaoz nvchenghaoz changed the title from "[None][feat] AutoDeploy: Perf improvement for mamba layers." to "[None][feat] AutoDeploy: Perf improvement for mamba layers" Nov 7, 2025
@nvchenghaoz
Collaborator Author

/bot run

@github-project-automation github-project-automation bot moved this from Backlog to In review in AutoDeploy Board Nov 7, 2025
@tensorrt-cicd
Collaborator

PR_Github #23878 [ run ] triggered by Bot. Commit: 45fbb9d

@tensorrt-cicd
Collaborator

PR_Github #23878 [ run ] completed with state SUCCESS. Commit: 45fbb9d
/LLM/main/L0_MergeRequest_PR pipeline #17975 completed with status: 'FAILURE'

Signed-off-by: Chenghao Zhang <[email protected]>
@suyoggupta
Collaborator

/bot run

@tensorrt-cicd
Collaborator

PR_Github #23903 [ run ] triggered by Bot. Commit: 76530a4

@tensorrt-cicd
Collaborator

PR_Github #23903 [ run ] completed with state SUCCESS. Commit: 76530a4
/LLM/main/L0_MergeRequest_PR pipeline #17995 completed with status: 'FAILURE'

@suyoggupta
Collaborator

/bot run

@tensorrt-cicd
Collaborator

PR_Github #23905 [ run ] triggered by Bot. Commit: c63abe0

@tensorrt-cicd
Collaborator

PR_Github #23905 [ run ] completed with state SUCCESS. Commit: c63abe0
/LLM/main/L0_MergeRequest_PR pipeline #17997 completed with status: 'FAILURE'

Signed-off-by: Suyog Gupta <[email protected]>
@suyoggupta
Collaborator

/bot run

@tensorrt-cicd
Collaborator

PR_Github #23907 [ run ] triggered by Bot. Commit: 8eb0c25

@tensorrt-cicd
Collaborator

PR_Github #23907 [ run ] completed with state SUCCESS. Commit: 8eb0c25
/LLM/main/L0_MergeRequest_PR pipeline #17999 completed with status: 'FAILURE'

Signed-off-by: Suyog Gupta <[email protected]>
@suyoggupta
Collaborator

/bot run

@tensorrt-cicd
Collaborator

PR_Github #23908 [ run ] triggered by Bot. Commit: 324181a

@tensorrt-cicd
Collaborator

PR_Github #23908 [ run ] completed with state SUCCESS. Commit: 324181a
/LLM/main/L0_MergeRequest_PR pipeline #18000 completed with status: 'FAILURE'

@tensorrt-cicd
Collaborator

PR_Github #23914 [ run ] triggered by Bot. Commit: 324181a

@tensorrt-cicd
Collaborator

PR_Github #23914 [ run ] completed with state SUCCESS. Commit: 324181a
/LLM/main/L0_MergeRequest_PR pipeline #18004 completed with status: 'FAILURE'

@suyoggupta
Collaborator

/bot run

@tensorrt-cicd
Collaborator

PR_Github #23916 [ run ] triggered by Bot. Commit: 324181a

@tensorrt-cicd
Collaborator

PR_Github #23916 [ run ] completed with state SUCCESS. Commit: 324181a
/LLM/main/L0_MergeRequest_PR pipeline #18006 completed with status: 'FAILURE'

@suyoggupta
Collaborator

/bot run

@tensorrt-cicd
Collaborator

PR_Github #23930 [ run ] triggered by Bot. Commit: 324181a

@tensorrt-cicd
Collaborator

PR_Github #23930 [ run ] completed with state SUCCESS. Commit: 324181a
/LLM/main/L0_MergeRequest_PR pipeline #18019 completed with status: 'FAILURE'

@nvchenghaoz
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #24039 [ run ] triggered by Bot. Commit: 324181a

@tensorrt-cicd
Collaborator

PR_Github #24039 [ run ] completed with state SUCCESS. Commit: 324181a
/LLM/main/L0_MergeRequest_PR pipeline #18113 completed with status: 'FAILURE'

@nvchenghaoz
Collaborator Author

/bot run

@nvchenghaoz nvchenghaoz self-assigned this Nov 10, 2025
@tensorrt-cicd
Collaborator

PR_Github #24044 [ run ] triggered by Bot. Commit: eb7c92b

@tensorrt-cicd
Collaborator

PR_Github #24044 [ run ] completed with state SUCCESS. Commit: eb7c92b
/LLM/main/L0_MergeRequest_PR pipeline #18118 completed with status: 'FAILURE'

@nvchenghaoz
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #24060 [ run ] triggered by Bot. Commit: eb7c92b

@tensorrt-cicd
Collaborator

PR_Github #24060 [ run ] completed with state SUCCESS. Commit: eb7c92b
/LLM/main/L0_MergeRequest_PR pipeline #18132 completed with status: 'FAILURE'

@nvchenghaoz
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #24105 [ run ] triggered by Bot. Commit: eb7c92b

@tensorrt-cicd
Collaborator

PR_Github #24105 [ run ] completed with state SUCCESS. Commit: eb7c92b
/LLM/main/L0_MergeRequest_PR pipeline #18168 completed with status: 'SUCCESS'

@nvchenghaoz nvchenghaoz merged commit ec9cf71 into NVIDIA:main Nov 11, 2025
5 checks passed
@github-project-automation github-project-automation bot moved this from In review to Done in AutoDeploy Board Nov 11, 2025
suyoggupta added a commit to nv-auto-deploy/TensorRT-LLM that referenced this pull request Nov 12, 2025
Signed-off-by: Chenghao Zhang <[email protected]>
Signed-off-by: Suyog Gupta <[email protected]>
Co-authored-by: Suyog Gupta <[email protected]>
4 participants