Nemotron-H mamba2 #355
base: hybrid_dev
Very nice!
I haven't looked at this in detail yet, but running the `tests/models/test_checkpoint` tests is a good way to check both the conversion and the modeling file. To do that, you can add a new model config in `tests/utils/model_configs.py` that uses your new `nm2` block. I can help with this if needed.
fast_llm/layers/ssm/mamba2.py
self._local_xb_size = xb_dim.size

state_size = tensor_space[SSMDimNames.state].size
div(self._local_inner_size, state_size)
Missing `... =`?
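To illustrate the point of this comment: the quoted line computes a quotient and discards it. The sketch below is a minimal, standalone stand-in and not the PR's code. It assumes `div` is an exact-division helper that rejects non-divisible inputs, and `_quotient` is a hypothetical attribute name chosen only for the example, since the comment does not say what the result should be assigned to.

```python
def div(numerator: int, denominator: int) -> int:
    """Exact integer division (assumed behaviour of the helper used in the diff)."""
    if denominator <= 0 or numerator % denominator != 0:
        raise ValueError(f"{numerator} is not divisible by {denominator}")
    return numerator // denominator


class Mamba2Sketch:
    """Hypothetical stand-in for the layer; only the assignment pattern matters."""

    def __init__(self, local_inner_size: int, state_size: int):
        self._local_inner_size = local_inner_size
        # A bare `div(...)` statement would only perform the divisibility check
        # and throw the result away. Binding it keeps the value usable later.
        # `_quotient` is a placeholder name, not taken from the PR.
        self._quotient = div(self._local_inner_size, state_size)


layer = Mamba2Sketch(local_inner_size=4096, state_size=128)
print(layer._quotient)
```

If the bare call was meant purely as a divisibility check, leaving it unassigned is harmless, but then a short comment saying so would avoid the question.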
Addressed. I added a model for the test, and the tests seem to pass:
pytest tests/models/test_checkpoint.py --models hybrid_nm2
....
1 failed, 23 passed, 219 skipped, 34 warnings in 45.57s
I'm not sure whether the failing test (tests/models/test_checkpoint.py::test_save_and_load_in_parallel[hybrid_nm2]@dependency_group_3) is expected to fail; it also fails on the previous Mamba models.
Mamba2 implementation as in Nemotron-H. This also uses the correct Mamba2 kernels.