- 
                Notifications
    
You must be signed in to change notification settings  - Fork 427
 
Generate CDI specification including additional GIDs #630
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
| 
           Note: This updates the default spec version to v0.7.0 since the DRM devices generally require additional GIDs. As such, this should probably be an opt-in (or at least an opt-out) feature.  | 
    
74441f4    to
    285ea44      
    Compare
  
    | }, | ||
| &cli.BoolFlag{ | ||
| Name: "--allow-additional-gids", | ||
| Usage: "Allow the use of the additionalGIDs field for generated CDI specifications. Note this will generate a v0.7.0 CDI specification.", | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Which versions of containerd / cri-o (and other runtimes for that matter) support this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would have to check. Note that this is not enabled by default when generating the CDI spec and only for internal representations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- containerd
- (>= 1.7.16) - containerd/containerd@7a2f49f
 - (>= 1.8.0, >= 2.0.0) - containerd/containerd@1b62224
 
 - docker (>= 26.1.0) - moby/moby@745e235
 - podman (>= 5.1.0) - containers/podman@a40cf31
 - crio (>= 1.30.0) - cri-o/cri-o@fd9aa76
 
        
          
                internal/config/features.go
              
                Outdated
          
        
      | MOFED *feature `toml:"mofed,omitempty"` | ||
| NVSWITCH *feature `toml:"nvswitch,omitempty"` | ||
| GDRCopy *feature `toml:"gdrcopy,omitempty"` | ||
| NoAdditionalGIDs *feature `toml:"no-additional-gids,omitempty"` | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this reversed (i.e. no rather than allow) from the flag passed in?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because this has different default behaviour. For the NVIDIA Contianer Runtime I want to generated an internal representation including the additionalGIDs container edits for device nodes. This allows users to opt out if there is an issue.
I suppose we could make the argument that if we backport this (which we may want to due to the Jetpack feature request) we should keep the existing behaviour and not make these modifications either.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, after having thought this through again, I have disabled this by default and switched the feature flag to --allow-additional-gids.
        
          
                internal/edits/device.go
              
                Outdated
          
        
      | 
               | 
          ||
| var additionalGIDs []uint32 | ||
| if allowAdditionalGIDs { | ||
| if requiredGID, _ := d.getRequiredGID(); requiredGID != 0 { | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we log the error instead of swallowing it ? Or we could just have the function not return an error struct
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated to not return an error.
This change refactors the creation of contianer edits to make provision for injecting options that affect the generated CDI specifications. Signed-off-by: Evan Lezar <[email protected]>
285ea44    to
    141772c      
    Compare
  
    7ecd467    to
    fab0c6d      
    Compare
  
    This change adds support for injecting additional GIDs using the internal CDI representations. (Applicable for Tegra-based systems and /dev/dri devices) This is disabled by default, but can be opted in to by setting the features.allow-additional-gids feature flag. This can also be done by running sudo nvidia-ctk config --in-place --set features.allow-additional-gids Signed-off-by: Evan Lezar <[email protected]>
Signed-off-by: Evan Lezar <[email protected]>
fab0c6d    to
    5d55813      
    Compare
  
    There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR introduces support for automatically including required group IDs (AdditionalGIDs) for device nodes in generated CDI specs when enabled via config or CLI. It updates the internal edits pipeline, CLI, and config to propagate an allowAdditionalGIDs flag, adds deduplication logic for the new field, and refactors resource construction.
- Added 
allowAdditionalGIDsoption and plumbing throughnvcdiliband modifiers - Extended 
device.toEditsto conditionally gather required GIDs and deduplicate them - Updated CLI to expose 
--allow-additional-gidsflag and config handling 
Reviewed Changes
Copilot reviewed 28 out of 28 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description | 
|---|---|
| pkg/nvcdi/transform/merged-device.go | Switched to edits.NewResource for merged device construction | 
| pkg/nvcdi/transform/deduplicate.go | Added deduplicateAdditionalGIDs to clean up GID lists | 
| pkg/nvcdi/options.go | Introduced WithAllowAdditionalGIDs option | 
| pkg/nvcdi/lib.go | Added allowAdditionalGIDs field and editsFromDiscoverer | 
| pkg/nvcdi/lib-wsl.go | Updated WSL library to use new edits API | 
| pkg/nvcdi/lib-nvml.go | Removed unused imports and updated GetCommonEdits | 
| pkg/nvcdi/lib-csv.go | Refactored CSV logic to use new edits API | 
| pkg/nvcdi/gds.go | Refactored GDS logic to use new edits API | 
| pkg/nvcdi/mofed.go | Updated MOFED logic to use editsFromDiscoverer | 
| pkg/nvcdi/mig-device-nvml.go | Refactored MIG device specs to use NewResource | 
| pkg/nvcdi/management.go | Updated management logic to use editsFromDiscoverer | 
| pkg/nvcdi/full-gpu-nvml.go | Refactored full GPU spec creation to use new edits API | 
| internal/edits/resource.go | Added NewResource helper | 
| internal/edits/options.go | Added allowAdditionalGIDs option for edits library | 
| internal/edits/device.go | Extended toEdits to collect AdditionalGIDs | 
| internal/edits/lib.go | Implemented new Interface for edits creation | 
| internal/edits/api.go | Defined edits Interface | 
| internal/edits/edits.go | Removed old global FromDiscoverer and NewSpecEdits | 
| internal/edits/edits_test.go | Updated tests to use new edits API | 
| pkg/nvcdi/transform/deduplicate.go | (duplicate entry) Deduplication logic for AdditionalGIDs | 
| internal/modifier/graphics.go | Passed allowAdditionalGIDs flag into graphics modifier | 
| internal/modifier/gated.go | Added allowAdditionalGIDs argument to gated modifier | 
| internal/modifier/discover.go | Updated NewModifierFromDiscoverer signature | 
| internal/modifier/discover_test.go | Updated tests for new modifier signature | 
| internal/config/features.go | Added FeatureAllowAdditionalGIDs handling | 
| internal/config/features_test.go | Added tests for AllowAdditionalGIDs feature flag | 
| cmd/nvidia-ctk/cdi/generate/generate.go | Introduced CLI flag for allow-additional-gids | 
Comments suppressed due to low confidence (2)
pkg/nvcdi/transform/deduplicate.go:162
- [nitpick] Add unit tests for deduplicateAdditionalGIDs to verify that duplicate GIDs are removed and the slice is sorted correctly.
 
func (d dedupe) deduplicateAdditionalGIDs(entities []uint32) ([]uint32, error) {
cmd/nvidia-ctk/cdi/generate/generate.go:178
- CLI flag Name should not include leading dashes; use "allow-additional-gids" so the flag is recognized properly by the parser.
 
			Name:        "--allow-additional-gids",
| } | ||
| 
               | 
          ||
| // WithAllowAdditionalGIDs specifies whether a generated CDI spec can contain | ||
| // the additionaGIDs field. | 
    
      
    
      Copilot
AI
    
    
    
      Jul 4, 2025 
    
  
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix typo in comment: change 'additionaGIDs' to 'additionalGIDs'.
| // the additionaGIDs field. | |
| // the additionalGIDs field. | 
| } | ||
| 
               | 
          ||
| return NewModifierFromDiscoverer(logger, discover.Merge(discoverers...)) | ||
| allowAdditionalGIDs := !cfg.Features.IsEnabled(config.FeatureAllowAdditionalGIDs, image) | 
    
      
    
      Copilot
AI
    
    
    
      Jul 4, 2025 
    
  
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The negation operator '!' inverts the intended flag; remove '!' so the feature flag passes through correctly.
| allowAdditionalGIDs := !cfg.Features.IsEnabled(config.FeatureAllowAdditionalGIDs, image) | |
| allowAdditionalGIDs := cfg.Features.IsEnabled(config.FeatureAllowAdditionalGIDs, image) | 
This change adds logic to include the
AdditionalGIDsrequired for a device node in a generated CDI specification.When a device node requires specific group membership for access, a non-root user running in a container (e.g. using the
-uDocker command line argument) may not have access to the device.in the v0.7.0 CDI specification, support was added for specifying additional GIDs in the CDI specification and these changes extract the required information from the device nodes on the host if available.
This is disabled by default.
It can be enabled for internal CDI representations such as those used for Tegra-based systems or
/dev/dridevices by setting:in the config.toml file.
This can also be opted in to on a per-container basis by setting
NVIDIA_ALLOW_ADDITIONAL_GIDS=enabled.The
nvidia-ctkcommand can be used to toggle this feature by running:It can be enabled when running
nvidia-ctk cdi generateby specifying the--allow-additional-gidscommand line argument: