Skip to content

Conversation

jopperm
Copy link

@jopperm jopperm commented Feb 20, 2025

Adds three new methods on kernel_bundle to interact with device globals defined in runtime-compiled code:

bool ext_oneapi_has_device_global(const std::string &name);
void *ext_oneapi_get_device_global_address(const std::string &name, const device &dev); // return a device USM pointer suitable for queue::memcpy etc.
size_t ext_oneapi_get_device_global_size(const std::string &name);

`bundle_state::ext_oneapi_source` in the language `source_language::sycl`, and
* it defines a device global whose name is `name` and which was declared without
the `device_image_scope` property, and
* `dev` is contained by the context associated with this bundle.
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is dev a parameter of this function? It doesn't seem to serve any purpose.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I took the signature straight from the presentation you gave a while back 😇 But I agree, checking the device only makes sense if we also test if the device supports device globals. Playing around with the E2E tests for DGs, such a check does not seem to happen for code compiled with the normal pipeline.

I'll drop the parameter.

* `dev` is contained by the context associated with this bundle.

`name` must be a {cpp} identifier that is valid for referencing the device
global at the bottom of the source code.
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this really true? Can you handle cases like:

  • The device global is defined in a namespace, and name includes the namespace qualifier like mynamespace::var?
  • The device global is defined in a namespace, the application defines a namespace alias, and name uses the namespace alias like myalias::var?
  • The device global is defined as a static member of a struct, and name includes the name of the struct like mystruct::var?

I suspect these would be hard to implement for the same reasons we cannot just pass an arbitrary C++ expression for the address of a kernel function to ext_oneapi_get_kernel, It seems like we would want to use registered_kernel_names also for the names of device globals. (Maybe we should rename the property to registered_names instead.) In this case, registered_names would accept either:

  • A C++ expression for a pointer to a kernel function, or
  • A C++ expression for a pointer to a device global variable.

In both cases, the expression is conceptually compiled at the bottom of the source code.

Then, the name parameter to ext_oneapi_has_device_global could be either:

  • The name of the device global if it was declared in the global namespace (i.e. the top-level namespace), or
  • The name of a device global that was registered via registered_names.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there's a reasonable set of (scoped) names that we could mangle on the fly, but I didn't consider aliases, inline namespaces etc.. I agree that to do this right, we should extend registered_kernel_names to also support DG names.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As this would be quite a heavy-weight change, I'm proposing to limit device globals to be declared on global scope only in 83d2898. Happy to revert if you think that's too constrained.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Implementing it in phases is OK, but I think we need to lift the restriction before we consider this extension "finished". Note that NVRTC allows global variables to be defined in namespaces, etc., and their nvrtcAddNameExpression API allows registration of either kernel or global variable names. Therefore allowing device global names in registered_names would be equivalent to what NVRTC has now.

If we plan to rename registered_kernel_names to registered_names, we should do that sooner rather than later. Better to do this before people start using the extension in ernest.

`bundle_state::ext_oneapi_source`.

_Effects:_ If device memory for `name` has not been allocated at the time of
this call, it will be allocated and zero-initialized synchronously.
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like a detail of our implementation. It would also be valid to allocate the device memory earlier (e.g. when the kernel bundle is compiled), or allocate it later (e.g. when the host first reads or writes to the address). Maybe we could instead add a Remarks: section like:

Remarks: The contents of the device global may be read or written from the host by reading from or writing to this address. If the address is read before any kernel writes to the device global, the read operation returns the device global's initial value.

this call, it will be allocated and zero-initialized synchronously.

_Returns:_ Returns a USM pointer to the device global `name`'s storage on device
`dev`.
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should say that this is a "device USM" pointer. That way application's know they cannot directly access it from the host.

The formatting is messed up here because of the "'s". I generally avoid pluralizing identifiers for stylistic reasons. I'd rewrite the sentence like:

Returns: A device USM pointer to the storage for the device global name on device dev.

_Constraints:_ This function is not available when `State` is
`bundle_state::ext_oneapi_source`.

_Returns:_ Returns the size in bytes of device global `name`.
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Kind of a nit, but the size is the size of the device global's storage, which is not the same as sizeof(name). I'd say:

Returns: The size in bytes of the USM storage for device global name.

(BTW, notice that we do not repeat the word "Returns" in the description of the return value.)

[source,c++]
----
size_t ext_oneapi_get_device_global_size(const std::string &name,
const device &dev)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need the dev parameter? Is the size different on different devices?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, see discussion on ext_oneapi_has_device_global. I'll drop the parameter.

gmlueck pushed a commit that referenced this pull request Feb 26, 2025
…6386)

The overloads for single_task and parallel_for in the
sycl_ext_oneapi_kernel_properties extension are being deprecated as
mentioned in intel#14785. So I'm rewriting
tests containg such overloads so that they can still run after the
deprecation.

---------

Signed-off-by: Hu, Peisen <[email protected]>
@jopperm
Copy link
Author

jopperm commented Mar 12, 2025

Superseded by intel#17401.

@jopperm jopperm closed this Mar 12, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants