Reduce overhead of bindings requiring `cuPythonInit()` #894

mdboom · 2025-08-22T16:36:53Z

Description

This reduces the calling overhead of binding functions that require cuPythonInit or cudaPythonInit to be called.

This reduces the time it takes to call driver.cuDeviceGet(0) (for example) by about 50ns (on my machine):

Base: Mean +- std dev: 149 ns +- 8 ns
This PR: Mean +- std dev: 101 ns +- 2 ns

Note that these times include the work done by the actual underlying cuDeviceGet call, not just the function call overhead.

Why this works

The `cuPythonInit` function is extremely large, so no C compiler is likely to ever inline it. Splitting out a small wrapper function that only checks the init flag, and otherwise delegates to the big function, gives the C compiler something it can inline. This not only removes a C function call from the already-initialized path, but probably also helps the branch predictor when checking the flag.
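For readers who want to see the shape of the pattern, here is a minimal C sketch. It is not the code in this PR (the real change lives in the Cython `.pyx.in` templates and the real init does far more work); every name below (`initialized`, `init_slow`, `ensure_init`, `device_get`) is hypothetical and chosen only for illustration.

```c
#include <stdbool.h>

/* Hypothetical module-level state; stands in for the real init flag. */
static bool initialized = false;
static int slow_init_calls = 0;   /* only here so the effect is observable */

/* Stand-in for the large init function (the real one would load the
 * driver library and resolve many function pointers). It is marked
 * noinline on GCC/Clang to mimic a body too big to be inlined. */
#if defined(__GNUC__)
__attribute__((noinline))
#endif
static int init_slow(void) {
    slow_init_calls++;
    initialized = true;
    return 0;
}

/* Small wrapper: checks the flag and delegates only when needed.
 * Because it is tiny, the compiler can inline it into every caller. */
static inline int ensure_init(void) {
    if (initialized) {
        return 0;            /* fast path: no function call at all */
    }
    return init_slow();      /* slow path: taken once */
}

/* Hypothetical binding function that requires initialization. */
int device_get(int *device, int ordinal) {
    int err = ensure_init();
    if (err != 0) {
        return err;
    }
    *device = ordinal;       /* placeholder for the real driver call */
    return 0;
}
```

After the first call, `device_get` only pays for a flag check before doing its real work; the large function is never entered again.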

closes

Checklist

New or existing tests cover these changes.
The documentation is up to date with these changes.

copy-pr-bot · 2025-08-22T16:36:56Z

Auto-sync is disabled for ready for review pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

Copilot

Pull Request Overview

This PR optimizes the initialization check for CUDA Python bindings by reducing function call overhead. The optimization splits the existing cuPythonInit() and cudaPythonInit() functions into two parts: a small wrapper function that checks if initialization has already occurred, and a larger function that performs the actual initialization work.

- Refactors initialization functions to use a small wrapper pattern for better compiler inlining
- Moves the initialization flag check to a separate small function to enable C compiler optimization
- Applies the same pattern across three binding files for consistency

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| cyruntime.pyx.in | Splits `cudaPythonInit()` into wrapper and implementation functions |
| cynvrtc.pyx.in | Splits `cuPythonInit()` into wrapper and implementation functions |
| cydriver.pyx.in | Splits `cuPythonInit()` into wrapper and implementation functions |

Comments suppressed due to low confidence (1)

mdboom · 2025-08-22T16:58:40Z

/ok to test

github-actions · 2025-08-22T17:12:44Z

Doc Preview CI
🚀 View preview at https://nvidia.github.io/cuda-python/pr-preview/pr-894/
https://nvidia.github.io/cuda-python/pr-preview/pr-894/cuda-core/
https://nvidia.github.io/cuda-python/pr-preview/pr-894/cuda-bindings/
Preview will be ready when the GitHub Pages deployment is complete.

leofang

Let's also add a release note to 13.X.Y. (I want to backport it to 12.9.X, so perhaps also a good idea to touch its release note. Note that we only generate docs on the main branch #809.)

Also we need to backport this to the codegen 🙂

cuda_bindings/cuda/bindings/_bindings/cydriver.pyx.in

mdboom · 2025-08-22T18:00:54Z

> Let's also add a release note to 13.X.Y. (I want to backport it to 12.9.X, so perhaps also a good idea to touch its release note. Note that we only generate docs on the main branch #809.)

👍

> Also we need to backport this to the codegen 🙂

Sure, will do. The upstream of the generator at git@github.com:NVIDIA/cuda-python-private.git doesn't seem to have the free-threading fix yet, so running the generator for this change reverts all of that. Is that just because the fix hasn't been implemented in the generator yet, or am I using the wrong branch or something on my end?

leofang · 2025-08-22T18:08:46Z

I guess you're blocked by me now... let me get to it asap

leofang · 2025-08-22T18:58:28Z

/ok to test 4c6a057

Reduce overhead of bindings requiring cuPythonInit()

a62185f

github-project-automation bot added this to CCCL Aug 22, 2025

github-project-automation bot moved this to Todo in CCCL Aug 22, 2025

mdboom requested review from Copilot and leofang August 22, 2025 16:36

Copilot AI reviewed Aug 22, 2025

leofang assigned mdboom Aug 22, 2025

leofang added labels Aug 22, 2025: enhancement (Any code-related improvements), P1 (Medium priority - Should do), cuda.bindings (Everything related to the cuda.bindings module), to-be-backported (Trigger the bot to raise a backport PR upon merge)

leofang added this to the cuda-python parking lot milestone Aug 22, 2025

leofang reviewed Aug 22, 2025

Add changelog entry

5de749b

Explicitly specific inline

4c6a057

leofang approved these changes Aug 22, 2025

github-project-automation bot moved this from Todo to In Review in CCCL Aug 22, 2025
