From 4aa560f8a7d2e5424e195d30bbc2d3dc8d1424df Mon Sep 17 00:00:00 2001 From: Alexey Sachkov Date: Tue, 6 May 2025 12:14:59 -0700 Subject: [PATCH 01/12] [SYCL][Doc] Release notes for Mar'25 release --- sycl/ReleaseNotes.md | 695 ++++++++++++++++++++++++++++++++++++++++++- 1 file changed, 693 insertions(+), 2 deletions(-) diff --git a/sycl/ReleaseNotes.md b/sycl/ReleaseNotes.md index 0b1634d11a614..4b65589605306 100644 --- a/sycl/ReleaseNotes.md +++ b/sycl/ReleaseNotes.md @@ -16,6 +16,699 @@ - Did this and that ... intel/llvm#pr +# Release notes Mar'25 + +Release notes for commit range +[b0212c37b2](https://github.com/intel/llvm/commit/b0212c37b230d9dd3bb129df9f4ecc417b92ad8) +... +[b23d69e2c3](https://github.com/intel/llvm/commit/b23d69e2c3fda1d69351137991897c96bf6a586d) + +## New Features + +### Runtime compilation of SYCL code + +- [`sycl_ext_oneapi_kernel_compiler`](https://github.com/intel/llvm/blob/b23d69e2c3fda1d69351137991897c96bf6a586d/sycl/doc/extensions/experimental/sycl_ext_oneapi_kernel_compiler.asciidoc) + extension specification was updated to accept `sycl` as source language, thus + providing functionality similar to + [NVRTC](https://docs.nvidia.com/cuda/nvrtc/). intel/llvm#11985, + intel/llvm#17446 +- Initial support for this feature was implemented. intel/llvm#16132, + intel/llvm#16222, intel/llvm#16132, intel/llvm#17640, intel/llvm#17356, + intel/llvm#16565, intel/llvm#17383, intel/llvm#17447, intel/llvm#17307, + intel/llvm#17373, intel/llvm#17331, intel/llvm#17329, intel/llvm#17266, + intel/llvm#17032, intel/llvm#16823, intel/llvm#16702, intel/llvm#16638, + intel/llvm#16316, intel/llvm#17359, intel/llvm#16485, intel/llvm#16821 +- Known issues and limitations are documented + [in the extension specification](https://github.com/intel/llvm/blob/b23d69e2c3fda1d69351137991897c96bf6a586d/sycl/doc/extensions/experimental/sycl_ext_oneapi_kernel_compiler.asciidoc#non-normative-implementation-notes-for-dpc). 
intel/llvm#17307, + intel/llvm#17459 + +### SYCL graphs + +- Introduced and implemented + [`sycl_ext_codeplay_enqueue_native_command`](https://github.com/intel/llvm/blob/b23d69e2c3fda1d69351137991897c96bf6a586d/sycl/doc/extensions/experimental/sycl_ext_codeplay_enqueue_native_command.asciidoc) + extension which allows to include custom commands for interoperability with + native runtimes into graphs built using `sycl_ext_oneapi_graph` extension. + intel/llvm#16871 + +### Bindless images + +- Extended the extension + [specification](https://github.com/intel/llvm/blob/b23d69e2c3fda1d69351137991897c96bf6a586d/sycl/doc/extensions/experimental/sycl_ext_oneapi_bindless_images.asciidoc) + to support more kinds of copy operations (`image_mem_handle` to USM and vice + versa, USM to USM, etc.) and implemented them. intel/llvm#16661, + intel/llvm#17507 +- Extended the extension specification and implementation to support + `gather_image` device built-in function. The implementation was only done for + CUDA backend so far. intel/llvm#17322 + +### Native CPU Device + +- Added support for source-based code coverage on Native CPU. intel/llvm#15073 + +### KHR extensions + +Please note that KHR extensions are being specified and released by Khronos +Group. The process of completing and publishing a KHR extension takes a while, +but as implementors we need to prototype them early to help find possible issues +with specifications and ensure that they are implementable. + +During that stage in an extension development its specification is incomplete +and subject to change without any notice. Therefore, we will refer to those +extensions using **prototyped** word. Their implementation is not available by +default and requires `__DPCPP_ENABLE_UNFINISHED_KHR_EXTENSIONS` macro to be set +_before_ including `` header to make them available. 
Considering +that specifications of such extensions are not final and not versioned, their +prototypes may not exactly match the latest publicly available versions of the +corresponding specifications. There is no guarantee of completeness either. +You can find more details on our development processes in +[this document](sycl/doc/developer/KHRExtensions.md). + +The only reason those extensions are mentioned here is to give you a glimpse of +the future about which extensions will be supported in future releases. We do +not recommend using such extensions right now, but advanced users who are +driving those extension specifications forward can do early experiments with +them to provide feedback to the Khronos Group. + +- Implemented + [`sycl_khr_default_context`](https://registry.khronos.org/SYCL/specs/sycl-2020/html/sycl-2020.html#sec:khr-default-context) + extension. intel/llvm#15645 +- **Prototyped** + [`sycl_khr_free_function_commands`](https://github.com/KhronosGroup/SYCL-Docs/pull/644) + extension. intel/llvm#16770, intel/llvm#17222 + +### Other extensions + +- Introduced and implemented + [`sycl_ext_oneapi_device_image_backend_content`](https://github.com/intel/llvm/blob/b23d69e2c3fda1d69351137991897c96bf6a586d/sycl/doc/extensions/experimental/sycl_ext_oneapi_device_image_backend_content.asciidoc) + extension which allows querying the underlying content of a device image for + interoperability with other runtimes (such as OpenCL or Level Zero). + intel/llvm#14811, intel/llvm#16633 +- Introduced and implemented + [`sycl_ext_oneapi_current_device`](https://github.com/intel/llvm/blob/b23d69e2c3fda1d69351137991897c96bf6a586d/sycl/doc/extensions/experimental/sycl_ext_oneapi_current_device.asciidoc) + extension which introduces another state into SYCL holding per-thread + `device`.
intel/llvm#15382, intel/llvm#16970 +- Introduced and implemented + [`sycl_ext_oneapi_work_group_static`](https://github.com/intel/llvm/blob/b23d69e2c3fda1d69351137991897c96bf6a586d/sycl/doc/extensions/experimental/sycl_ext_oneapi_work_group_static.asciidoc) + and + [`sycl_ext_oneapi_work_group_scratch_memory`](https://github.com/intel/llvm/blob/b23d69e2c3fda1d69351137991897c96bf6a586d/sycl/doc/extensions/experimental/sycl_ext_oneapi_work_group_scratch_memory.asciidoc) + extensions that provide different ways of allocating and accessing device + local memory (i.e. shared by all work-items within a work-group). + intel/llvm#15061, intel/llvm#16325 + - The former is only supported on CUDA backend +- Introduced and implemented + [`sycl_ext_intel_kernel_queries`](https://github.com/intel/llvm/blob/b23d69e2c3fda1d69351137991897c96bf6a586d/sycl/doc/extensions/supported/sycl_ext_intel_kernel_queries.asciidoc) + extension. intel/llvm#16834 +- Implemented proposed + [`sycl_ext_intel_event_mode`](https://github.com/intel/llvm/blob/b23d69e2c3fda1d69351137991897c96bf6a586d/sycl/doc/extensions/proposed/sycl_ext_intel_event_mode.asciidoc) + extension. intel/llvm#16108 +- Completed implementation of + [`sycl_ext_oneapi_launch_queries`](https://github.com/intel/llvm/blob/b23d69e2c3fda1d69351137991897c96bf6a586d/sycl/doc/extensions/proposed/sycl_ext_oneapi_launch_queries.asciidoc) + extension. intel/llvm#16709, intel/llvm#16051 +- Completed implementation of the + [`sycl_ext_oneapi_kernel_arg_properties`](https://github.com/intel/llvm/blob/b23d69e2c3fda1d69351137991897c96bf6a586d/sycl/doc/extensions/experimental/sycl_ext_oneapi_kernel_arg_properties.asciidoc) + extension by implementing missing `unaliased` property. intel/llvm#16090 + - It used to be called `restrict` in previous versions of the extension, but + a renaming was done to avoid conflict with C99 `restrict` type qualifier. 
+ intel/llvm#16814 +- Introduced and implemented the + [`sycl_ext_oneapi_num_compute_units`](https://github.com/intel/llvm/blob/b23d69e2c3fda1d69351137991897c96bf6a586d/sycl/doc/extensions/supported/sycl_ext_oneapi_num_compute_units.asciidoc) + extension. intel/llvm#16293, intel/llvm#16538 + +### New compiler options + +- Added support for `-f[no-]offload-fp32-prec-div` and + `-f[no-]offload-fp32-prec-sqrt` compiler flags to control precision of + floating-point division and square root. intel/llvm#15836, intel/llvm#16107, + intel/llvm#16993, intel/llvm#17044, intel/llvm#17033, intel/llvm#16942, + intel/llvm#16714, intel/llvm#17393, intel/llvm#17253 + +### Sanitizers + +#### Memory Sanitizer + +- Introduced memory sanitizer support. intel/llvm#15955, intel/llvm#16427, + intel/llvm#16478, intel/llvm#16935, intel/llvm#16535, intel/llvm#16477, + intel/llvm#16567, intel/llvm#16526, intel/llvm#16678, intel/llvm#16566, + intel/llvm#16619, intel/llvm#16705 + + It features: + - Checking for uses of uninitialized values in private memory. intel/llvm#17309 + - Checking for uses of uninitialized values in local memory, such + as `local_accessor` or `group_local_memory`. intel/llvm#17180, + intel/llvm#17054 + - Sanitizing USM operations like `memset` or `memcpy`. intel/llvm#16511 + +#### Thread Sanitizer + +- Introduced thread sanitizer support for device code. intel/llvm#17345, + intel/llvm#17211, intel/llvm#17155, intel/llvm#17181 + +## Improvements and bugfixes + +### `sycl_ext_oneapi_graph` extension + +- Reimplemented topological sort algorithm used to determine graph nodes + execution order to avoid issues with overflowing stack on huge graphs and + improve performance. intel/llvm#17495 +- Documented kernel binary update feature which allows updating kernel nodes + in graphs. This feature had been implemented earlier already. intel/llvm#14896 +- Introduced ability to update host-task nodes in graphs.
intel/llvm#16853 +- Fixed race condition in `mutable_command_graph` node queries. intel/llvm#17012 +- Fixed the issue with not all graph-related classes fully implementing + common reference semantics. intel/llvm#16788 +- Documented interaction with `sycl_ext_oneapi_local_memory` extension. + intel/llvm#16379 +- Documented interaction with `sycl_ext_oneapi_work_group_memory` extension. + intel/llvm#16229 +- Made `ext_oneapi_weak_object` extension work with graph objects. + intel/llvm#16209 +- Fixed a bug where using `local_accessor` or `work_group_memory` objects as + part of whole graph update would function incorrectly on CUDA & HIP backends. + intel/llvm#16025 + +### SYCLcompat library + +- Introduced new set of group utility functions and classes aimed to reduce the + gap between `syclcompat` and `dpct` namespaces. intel/llvm#17263 +- 73e6b224aacf [SYCLCOMPAT] Forward launch arguments to avoid copies (#16965) + - Definitely user-visible, but I'm not sure how to word that +- Fixed `compare_mask` putting results in the wrong 2-byte segment of 4-byte + output. intel/llvm#16768 +- Optimized implementation of `permute_sub_group_by_xor` for the case when + `logical_sub_group_size == 32`. intel/llvm#16646 +- Added new function `ternary_logic_op` to perform bitwise logical operations + on three input values based on the specified 8-bit truth table. + intel/llvm#16509 +- 6e0d90e73ed1 [SYCLCompat] Fix vectorized_binary impl to make SYCLomatic migrated code run pass (#16553) + - Not sure how to word that +- 16c447998836 [SYCL][COMPAT] Replace T{-1} with static_cast(-1) for mask creation (#16527) + - bugfix? + +### Explicit SIMD extension + +- Extended + [`sycl_ext_intel_esimd`](https://github.com/intel/llvm/blob/b23d69e2c3fda1d69351137991897c96bf6a586d/sycl/doc/extensions/supported/sycl_ext_intel_esimd/sycl_ext_intel_esimd.md) + extension specification and implementation with new queries to check support + for 2d load/store/prefetch operations. 
intel/llvm#15905 +- Fixed miscompilations of ESIMD functions under high optimization levels when + compiler performs aggressive inlining. intel/llvm#16193 + +### Sanitizers + +- ada16682e8c3 [DevASAN] Only report warning if passing host ptr to kernel (#16654) + - seems like a potentially important bugfix +- ce4a320806b2 [DeivceASAN] Make ShadowMemory one instance per type (#16687) + - seems like some kind of bugfix +- ef4d66af3b74 [DeviceSAN] Fix kernel name addressspace (#16425) +- 1fba00d3be7d [DeviceASAN] Fix ASAN with kernel assert (#16256) +- 34aeabab551e [SYCL][DeviceASAN] Fix AcceeChain to a matrix for bfloat16 (#16323) +- 6f3b0e857d15 [DevASAN] Do allocation with USM pool to reduce memory overhead (#16280) +- a8c6e7715be2 [DeviceASAN] Re-use shadow if required size is not larger than last one (#16258) + - As the above, some optimization of memory usage by ASAN? +- 201725664cc5 [DeviceSanitizer] Fix device global type of KernelMetadata (#16357) + +#### Address Sanitizer + +- Fixed ASAN throwing an exception with `UR_RESULT_ERROR_INVALID_ARGUMENT` when + detecting an incorrect memory free operation. intel/llvm#16706 + + + + +- bee8a397ac72 [UR][DeviceASAN] Sync the latest changes in asan_libdevice.hpp (#15911) +- 6347914485a8 [DeviceAsan] bugfixes for UR (#16257) +- e9143ca66108 [DeviceAsan] Report error when using unsupported API (#16281) +- 092cd2dfc034 [UR][DeviceASAN] Bugfix for mmap (#16466) +- 696514238e2e [DeviceASAN] Fix kernel release order (#16688) +- cc6148dfd17c [UR][DeviceASAN] Bugfix for GetDeviceType (#16745) +- 76c665363565 [DeviceSanitizers] Adjust backtrace addresses to call instruction (#17404) +- e2ab2b9ba963 [DevSan][Refactor] Make Options an unified class shared by all sanitizers (#17157) + +### Bindless images + +- Added support for timeline semaphores.
intel/llvm#17395 +- Added support for `ext_oneapi_bindless_sampled_image_fetch_1d`, + `ext_oneapi_bindless_sampled_image_fetch_1d_usm`, + `ext_oneapi_bindless_sampled_image_fetch_2d`, + `ext_oneapi_bindless_sampled_image_fetch_2d_usm` and + `ext_oneapi_bindless_sampled_image_fetch_3d` aspects on Level Zero backend. + intel/llvm#16862 +- Fixed return types of image extent queries to match the specification. + intel/llvm#16829 +- Clarified the types of supported USM memory in the extension specification. + intel/llvm#16622 + +- 3161af314190 [SYCL][Ext][Bindless] Initial implementation of image spirv builtins on HIP (#16439) + - What exactly does it mean for end user? +- b732a3c4c9cb [SYCL][Bindless] Fix incorrect mangling of bindless images builtin functions (#16135) + - What was the user-visible effect of the issue? + +### Native CPU device + +- Improved support for `dynamic_address_cast` on Native CPU device. + intel/llvm#16676 +- Improved performance of Native CPU device: less memory allocations and thread + launches. intel/llvm#17102 +- Fixed a bug where submitting the same kernel multiple times at about the same + time with different argument would lead to incorrect arguments being used. + intel/llvm#16995 +- Fixed compiler crashes when building applications that use atomics. + intel/llvm#16737 +- Fixed segfaults happening in SYCL CTS tests for `async_work_group_copy` + API. intel/llvm#16500 +- Improved support for sub-groups by updating version of OneAPI Construction + Kit. intel/llvm#16785 + +### Matrix + +- Aligned `joint_matrix_apply` implementation with the specification change + (intel/llvm#13153) to be able to modify both matrices. intel/llvm#16155 + +### Documentation + +- Proposed the + [`sycl_ext_oneapi_syclbin`](https://github.com/intel/llvm/blob/b23d69e2c3fda1d69351137991897c96bf6a586d/sycl/doc/extensions/proposed/sycl_ext_oneapi_syclbin.asciidoc) + extension. 
intel/llvm#16784 +- Updated the + [`sycl_ext_intel_device_info`](https://github.com/intel/llvm/blob/b23d69e2c3fda1d69351137991897c96bf6a586d/sycl/doc/extensions/supported/sycl_ext_intel_device_info.md) + extension specification to clarify that no additional environment variables + are required anymore to make the extension functional. intel/llvm#16715 +- Updated the + [`sycl_ext_intel_device_info`](https://github.com/intel/llvm/blob/b23d69e2c3fda1d69351137991897c96bf6a586d/sycl/doc/extensions/supported/sycl_ext_intel_device_info.md) + extension to reflect the current level of support for it on different + backends. intel/llvm#16792 +- Fixed mistakes in APIs naming in the + [`sycl_ext_oneapi_peer_access`](https://github.com/intel/llvm/blob/b23d69e2c3fda1d69351137991897c96bf6a586d/sycl/doc/extensions/supported/sycl_ext_oneapi_peer_access.asciidoc) + extension specification. intel/llvm#17327 +- Fixed example provided in the + [`sycl_ext_oneapi_backend_level_zero`](https://github.com/intel/llvm/blob/b23d69e2c3fda1d69351137991897c96bf6a586d/sycl/doc/extensions/supported/sycl_ext_oneapi_backend_level_zero.md) + extension. intel/llvm#16901 +- Updated wording in the proposed + [`sycl_ext_oneapi_launch_queries`](https://github.com/intel/llvm/blob/b23d69e2c3fda1d69351137991897c96bf6a586d/sycl/doc/extensions/proposed/sycl_ext_oneapi_launch_queries.asciidoc) + extension to better match ISO C++ format and clarify how different overloads + are intended to behave. intel/llvm#16014 + +#### intel/llvm project + +This sub-category does not cover the product (Intel's SYCL implementation), but +it covers how you can engage and interact with the project, i.e. various +development processes. + +- Updated the project's + [security policy](https://github.com/intel/llvm/blob/b23d69e2c3fda1d69351137991897c96bf6a586d/SECURITY.md) + . 
intel/llvm#16559 +- Documented + [process](https://github.com/intel/llvm/blob/b23d69e2c3fda1d69351137991897c96bf6a586d/sycl/doc/developer/KHRExtensions.md) + of prototyping KHR extensions. intel/llvm#16883 +- Documented + [process](https://github.com/intel/llvm/blob/b23d69e2c3fda1d69351137991897c96bf6a586d/sycl/doc/developer/WorkingOnAReleaseBranch.md) + of working on release branches. intel/llvm#17042 +- Refreshed + [documentation](https://github.com/intel/llvm/blob/b23d69e2c3fda1d69351137991897c96bf6a586d/sycl/test-e2e/README.md) + on adding tests to the repository to reflect recent infrastructure + advancements/changes. intel/llvm#16409, intel/llvm#16875, intel/llvm#16967 + +### Support for new hardware + +- Updated + [`sycl_ext_oneapi_device_architecture`](https://github.com/intel/llvm/blob/b23d69e2c3fda1d69351137991897c96bf6a586d/sycl/doc/extensions/experimental/sycl_ext_oneapi_device_architecture.asciidoc) + extension specification and implementation to recognize Intel Panther Lake + H & U GPUs and Intel Xeon processors codenamed Diamond Rapids. + intel/llvm#16294, intel/llvm#16543 +- Taught the compiler about optional features supported by Intel Panther Lake + H & U GPUs (necessary for the correct AOT compilation). intel/llvm#16368 +- Updated + [`sycl_ext_intel_matrix`](https://github.com/intel/llvm/blob/b23d69e2c3fda1d69351137991897c96bf6a586d/sycl/doc/extensions/experimental/sycl_ext_matrix/sycl_ext_intel_matrix.asciidoc) + extension specification and implementation to support Intel Xeon processors + codenamed Diamond Rapids. intel/llvm#16543 + +### Optimizations of SYCL Runtime + +Within this release some work has been done to reduce overheads incurred by +SYCL runtime over low-level runtimes (such as Level Zero or OpenCL): + +- Reduced amount of string copies unnecessarily made by SYCL RT for debug traces + even if debug tracing is disabled. intel/llvm#16596 +- Reduced number of times `shared_ptr`s are copied.
intel/llvm#17396, + intel/llvm#17477, intel/llvm#17473 +- Reduced amount of memory allocations happening by moving away from using + `std::function`. This should also help with reducing compilation time of SYCL + headers. intel/llvm#17202, intel/llvm#16668 +- Reduced amount of memory allocations required for `local_accessor`. + intel/llvm#17147, intel/llvm#17510 +- Reduced amount of memory allocations on "fast" kernel enqueue path and dropped + some unnecessary runtime checks. intel/llvm#17312, intel/llvm#17376 +- Made more queue operations go through "fast" path. intel/llvm#16735 + +### Core SYCL 2020 functionality + +- Aligned `SYCL_LANGUAGE_VERSION` macro definition with the recent SYCL 2020 + spec change (KhronosGroup/SYCL-Docs#704). intel/llvm#15890 +- Implemented `swizzle` method for swizzles. intel/llvm#16353 + +### Other changes in SYCL Compiler + +- Introduced a new optimization to eliminate back-to-back barriers when it's + safe. Such a chain of barriers may occur when multiple group algorithms are + used next to each other. intel/llvm#16750 +- Removed a busy-wait loop from the implementation of + `-fsycl-max-parallel-link-jobs` flag, making it consume fewer resources when + waiting. intel/llvm#17260 +- Made `-O0` the default optimization level when debug info is enabled + through `-g` flag. intel/llvm#16408 +- Uplifted maximum version of SPIR-V that compiler can generate to 1.5. + intel/llvm#16626 +- Made compiler embed device library needed for `bfloat16` support into the + application (if it is used). This change will allow us to reduce the size + of redistributable SYCL RT package by eliminating some files from it. + intel/llvm#16729 +- Added a compiler diagnostic (warning) about undefined `SYCL_EXTERNAL` + functions used in a module to help catch linking errors earlier. + intel/llvm#17346 +- Addressed issue intel/llvm#11531 where the compiler would generate invalid + SPIR-V if kernel used arguments of boolean type.
intel/llvm#17427 +- Switched to use native `bfloat16` implementation for devices that support it + (LNL, PVC), as well as fixed a bug where native implementation won't be used + if multiple AOT targets are specified. intel/llvm#17154, intel/llvm#16240, + intel/llvm#16494 +- Aligned behavior of `-Wimplicit-float-conversion` with the upstream clang for + non-SYCL language modes. intel/llvm#16857 +- Added support for `dynamic_address_cast` on CUDA & HIP backends. + intel/llvm#16604 +- Fixed compilation errors when building applications that use `nearbyint` and + `rint` for HIP targets. intel/llvm#16373 +- Improved check for unsupported data types to actually rely on target + information instead of hardcoded knowledge. For example, this allows 128-bit + integers to be used in device code when targeting CUDA backend. + intel/llvm#17036 +- Fixed hangs on AMD and crashes on NVIDIA when `atomic_ref` is used with + `work_item` memory scope. intel/llvm#16172 +- Fixed `-fcuda-short-ptr` flag causing compilation errors. Its use will still + result in a warning that some implicitly linked object is not compiled with + that flag (namely some of our built-in libraries), but it shouldn't be a + problem because those libraries don't operate on pointers. intel/llvm#15642 +- Fixed intel/llvm#15852 where compilation with `-mlong-double-64` would still + result in error that 128 double is not supported by a target. intel/llvm#16441 +- Fixed a bug that linking static libraries with SYCL code in them using + `-l:libname.a` spelling would ignore device code from those libraries. + intel/llvm#17149 +- Fixed a bug where having a pure virtual function marked as device one would + cause unresolved symbol errors emitted by device compiler on Windows.
+ intel/llvm#16231 +- Fixed a bug where having two kernels (one annotated with + `reqd_work_group_size` attribute/property and another without it) together + with `-fsycl-device-code-split=off` would cause runtime error about + mismatched work-group size. intel/llvm#16236 +- Fixed debug information for kernels that use global offset on HIP & CUDA + backends. intel/llvm#16963 + +### Other changes in SYCL Library + +- Made `group_[load|store]` functions use native built-ins when used with + vectors of 16 `short`s. intel/llvm#16581 +- Extended support for shared libraries to make it work with kernel bundles + as well. intel/llvm#16228 +- In response to intel/llvm#17114 added tracing (through `SYCL_UR_TRACE`) for + `SYCL_DEVICE_ALLOWLIST` decisions for better discoverability of the feature. + intel/llvm#17426 +- Aligned implementation of `info::execution_capability` query with the recent + SYCL 2020 specification change made in KhronosGroup/SYCL-Docs#625. + intel/llvm#16673 +- Fixed compilation issues with group functions like `select_from_group` with + certain data types (pointers, `marray` for example). + intel/llvm#17055 +- Implemented persistent cache eviction. intel/llvm#16289, intel/llvm#16522, + intel/llvm#16454 +- Enforced constraints documented by the + [`sycl_ext_oneapi_reduction_properties`](https://github.com/intel/llvm/blob/b23d69e2c3fda1d69351137991897c96bf6a586d/sycl/doc/extensions/experimental/sycl_ext_oneapi_reduction_properties.asciidoc) + extension. intel/llvm#16238 +- Clarified and enforced properties constraints in the + [`sycl_ext_oneapi_group_load_store`](https://github.com/intel/llvm/blob/b23d69e2c3fda1d69351137991897c96bf6a586d/sycl/doc/extensions/experimental/sycl_ext_oneapi_group_load_store.asciidoc) + extension specification and implementation. intel/llvm#16422 +- Implemented properties validation to kernel bundle and graph APIs.
+ intel/llvm#15647 +- Updated the + [`sycl_ext_oneapi_in_order_queue_events`](https://github.com/intel/llvm/blob/b23d69e2c3fda1d69351137991897c96bf6a586d/sycl/doc/extensions/experimental/sycl_ext_oneapi_in_order_queue_events.asciidoc) + extension specification and implementation to make event returned by + `ext_oneapi_get_last_event` optional for queues where no work had been + submitted. intel/llvm#16645 +- Updated the + [`sycl_ext_oneapi_group_load_store`](https://github.com/intel/llvm/blob/b23d69e2c3fda1d69351137991897c96bf6a586d/sycl/doc/extensions/experimental/sycl_ext_oneapi_group_load_store.asciidoc) + extension specification and implementation to accept the `alignment` property + in group load/store built-in functions to allow for more optimized + implementation. intel/llvm#16882, intel/llvm#16890 +- Lifted restriction that host APIs from `sycl_ext_oneapi_free_function_kernels` + had to be guarded by `#ifndef __SYCL_DEVICE_ONLY__`. intel/llvm#17446 +- Completely disabled legacy images support (from SYCL 1.2.1) on HIP backend. + They were previously available under an environment variable, but the status + was so bad that it makes no sense to keep the support at all. intel/llvm#17296 +- Fixed potential resource leaks in online compiler extension. intel/llvm#16517 +- Fixed an issue where `known_identity` would return incorrect values + with `-ffast-math` flag. intel/llvm#17028 +- Fixed a UB in implementation of `device_global` which sometimes led to + spurious results. intel/llvm#16224 +- Fixed a `static_assert` failure in SYCL headers when an application is + built with `-funsigned-char`. intel/llvm#17133 +- Resolved intel/llvm#15606. The issue caused memory operations enqueued through + `sycl_ext_oneapi_enqueue_functions` extension to break functionality of + `sycl_ext_oneapi_enqueue_barrier` extension.
intel/llvm#16223 +- Fixed a bug where compiling with `-D_FORTIFY_SOURCE=2` would cause errors + from device compilers at JIT stage (or during AOT compilation) about + undefined `__memcpy_chk` symbol. intel/llvm#16501 +- Fixed an incorrect result of `std::exp(std::complex)` in some corner cases. +- Fixed a crash happening when you launch a kernel that is defined in both the + application and a `dlopen`-ed shared library after that library was unloaded + through `dlclose`. intel/llvm#17091 +- Fixed issue intel/llvm#14357 about + `kernel_device_specific::compile_sub_group_size` info query returning + incorrect results for CUDA & HIP backends. intel/llvm#17137 +- Fixed a memory leak happening when a kernel submission failed. + intel/llvm#17125 +- Fixed a bug where using `vec::operator[]` would cause compilation issues on + Windows when an application is built using `clang.exe` and `_DEBUG` macro is + set. intel/llvm#17025, intel/llvm#17261 + intel/llvm#17440 + +#### Issues with 3rd-party host compilers + +- Fixed compilation issue with `get_vec_idx` internal helper with MSVC as + host compiler. intel/llvm#16480 +- Fixed missing `#include` when building with GCC 13 as host compiler. + intel/llvm#16480 +- Fixed compilation issue with joint matrix extension with MSVC from Visual + Studio 2019 as host compiler. intel/llvm#17336 + +### Support for pre-C++11 ABI + +Many SYCL APIs use `std::string` as argument or return type and it is known for +its ABI being broken by `gcc` at some point. There are applications which are +still built using old, pre-C++11 ABI and in order to support them, SYCL RT +should not have `std::string` (and some other classes) used at the ABI boundary. +This effort is largely complete, but some APIs still slip through from time +to time and are being fixed: + +- Added support for `print_graph` API in pre-C++11 ABI mode. intel/llvm#16194, + intel/llvm#16390 +- Added support for `pipe::get_pipe_name` API in pre-C++11 ABI mode.
+ intel/llvm#16178 +- Decided **not** to support `get_backend_info` in pre-C++11 ABI mode (at least + for now) because there are no queries that could be done through it. Calling + it under pre-C++11 ABI mode now causes an error. intel/llvm#16272 + +## Misc + +- Removed testing on FPGA Emulator as a step towards our strategy to drop FPGA + support (see intel/llvm#16929). Starting with this release there is no + guarantee that FPGA-specific features continue to work. intel/llvm#17223 +- Introduced new Unified Runtime adapter for Level Zero called `v2`. It is + expected to be more performant than existing one, but it is still in + development and unused by default. intel/llvm#16656, intel/llvm#17407 +- Docker images containing nightly builds are not provided anymore, but we + still provide Dockerfiles so you can build those images yourself. + intel/llvm#16539 +- Fixed OCL CPU Runtime installation script leaving incorrect permissions on + a system folder. intel/llvm#16719 + +## API/ABI breakages + +### Changes that are effective immediately + +- Removed support for FPGA-related options as part of our strategy to drop FPGA + support (see intel/llvm#16929). Removed options: `-fintelfpga`, + `-fsycl-targets=spir64_fpga[-unknown-unknown]`, `-fsycl-link=early|image`, + `-Xsycl-target-backend=spir64_fpga "opt"`, `-reuse-exe=arg` and + `-fsycl-help=fpga`. intel/llvm#16864 +- Removed experimental `sycl_ext_intel_oneapi_compiler` extension support. Its + APIs have been marked as deprecated for a while and + `sycl_ext_oneapi_kernel_compiler` extension should be used instead. + intel/llvm#16776 +- Restricted accepted spellings for AMD targets in `-fsycl-targets` to + `amdgcn-amd-amdhsa`.
intel/llvm#15990 + +### Deprecations + +Those APIs are still present and tested, but they will be removed in future +releases: + +- Deprecated [`sycl_ext_oneapi_default_context`](https://github.com/intel/llvm/blob/b23d69e2c3fda1d69351137991897c96bf6a586d/sycl/doc/extensions/deprecated/sycl_ext_oneapi_default_context.asciidoc) + extension in favor of + [`sycl_khr_default_context`](https://registry.khronos.org/SYCL/specs/sycl-2020/html/sycl-2020.html#sec:khr-default-context) + extension. intel/llvm#17135 +- Deprecated `-fsycl-fp32-prec-sqrt` compiler flag in favor of + `-foffload-fp32-prec-sqrt` flag. intel/llvm#17257 +- Deprecated overloads of `single_task` and `parallel_for` APIs that accept + properties which used to be a part of `sycl_ext_oneapi_kernel_properties` + extension. `sycl_ext_oneapi_enqueue_functions` extension should be used + instead. intel/llvm#16728 + - Deprecated overloads were completely removed from the extension + specification. intel/llvm#14785 +- Deprecated current implementation of `get_backend_info` API. The SYCL 2020 + specification currently does not document anything that could be queried + through it and therefore existing queries supported through it are deprecated + to avoid possible confusion. intel/llvm#16700 + +### Upcoming API/ABI breakages + +These changes are available for preview under `-fpreview-breaking-changes` flag. +They will be enabled by default (with no option to switch to the old behavior) +in the next ABI-breaking release: + +- Removed implementation of `get_backend_info` APIs, see above in the + Deprecations section. intel/llvm#16700 + +## Known Issues + +- SYCL headers use unreserved identifiers which sometimes cause clashes with + user-provided macro definitions (intel/llvm#3677). Known identifiers include: + - `G`. intel/llvm#11335 + - `VL`. intel/llvm#2981 +- On Windows, the Unified Runtime's Level Zero leak check does not work + correctly with the default contexts.
This is because on Windows + the release of the plugin DLLs races against the release of static global + variables (like the default context). +- Intel Graphics Compiler's Vector Compute backend does not support + O0 code: such code often gets miscompiled, produces wrong answers, + or crashes. This issue directly affects ESIMD code at O0. As a + temporary workaround, we optimize ESIMD code even in O0 mode. + [00749b1e8](https://github.com/intel/llvm/commit/00749b1e8e3085acfdc63108f073a255842533e2) +- When using `sycl_ext_oneapi_matrix` extension it is important for some + devices to use the sm version (Compute Capability) corresponding to the + device that will run the program, i.e. use `-fsycl-targets=nvidia_gpu_sm_xx` + during compilation. This particularly affects matrix operations using + `half` data type. For more information on this issue consult + https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#wmma-restrictions +- C/C++ math built-ins (like `exp` or `tanh`) can return incorrect results + on Windows for some edge-case input. The problems have been fixed in the + SYCL implementation, and the remaining issues are thought to be in MSVC. +- There are known issues and limitations in virtual functions + functionality, such as: + - Optional kernel features handling implementation is not complete yet. + - AOT support is not complete yet. + - A virtual function definition and definitions of all kernels using it + must be in the same translation unit. Please refer to + [`sycl/test-e2e/VirtualFunctions`](https://github.com/intel/llvm/tree/b23d69e2c3fda1d69351137991897c96bf6a586d/sycl/test-e2e/VirtualFunctions) + to see the list of working and non-working examples.
+ + +43ee65117935 [UR] Fix potential deadlock in the WaitEvent path of CmdBuffers (#16697) +40f0a6a0630b [SYCL][Graph] Fix L0 multi-device kernel bundles (#16343) +c5c0be57c660 [UR][CUDA][HIP] Add missing catch for native commands (#17524) +55a098709da2 [UR] Regenerate ur_valddi.cpp (#17487) +9c5762226d4f [UR] Fix cfi initialization (#17472) +a0df523df9c1 [UR] Deprecate UR_DEVICE_INFO_BFLOAT16 (#17053) +7650d831bb0f [UR] Check null pointer before handle in validation layer (#17474) +d6214ad114f8 [SYCL][UR][CUDA] Fix CMake CUPTI config (#17457) +44f120e92c12 [UR] Remove unnecessary unique pointer from cl program helper. (#17101) +19dbfb7605e9 [UR][L0] Create pool descriptors from subdevices... (#17465) +a9cb8f1e8540 [UR] Bump UMF to v0.11.0-dev4 (#17468) +1167ee6b388e [UR][CMake]Set CMAKE_MSVC_RUNTIME_LIBRARY to fix UMF linking issues (#17366) +fb582a748e42 [UR] [V2] Fix synchronization between command_list_manager usages (#17297) +b2db12abbb33 [UR] Fix typo in cfi flags handling. (#17452) +a1065507ce1f [UR] Handle adapters returning no platforms during testing (#17410) +768c5eaf4aca [UR] Generated code hidden by default in PR diffs (#17414) +2975d26c7050 [SYCL][UR][L0 v2] use blocking free when returning memory to the driver (#17375) +c342667c7e4f [UR] Updates to source checks job (#17160) +17df762be147 [SYCL][UR][CUDA] Use FindCUDAToolkit CMake module instead of FindCUDA (#17315) +0e155f0c1ab0 [SYCL][CUDA] Fix cupti library dynamic loading (#17272) +a000e56a9542 [UR][SYCL] Remove UR context atomic queries. (#16160) +38d750678fae [UR][L0] Manage UMF pools through usm::pool_manager (#17065) +c07039e2263d [UR] Add device info query for native assert. (#15929) +f870412188a5 [UR] Add UNSUPPORTED_FEATURE return for urDeviceGetGlobalTimestamps (#17389) +16713eae8c44 [UR] Allow loader to skip adapters based on prefilter device type. 
(#17072) +6ef844897f8a [UR][L0] Fix bfloat16 lookup to check for the extension (#17364) +68cacbfed4f5 [UR][L0] Disable Immediate Command List DG2 Windows (#17334) +607dff4c92a1 [UR] Stop using extension strings to report support for exp features. (#16046) +60ffdc3a97de [UR][L0] fix external semaphore with updated headers and report device info support (#17286) +255760e5af11 [UR] Fix various defects from static analysis (#17299) +fad173cbd14f [UR] Add UR_EXTERNAL_DEPENDENCIES CMake option (#17291) +a7774f2a74c5 [UR] Fix some tests that are broken when run with multiple cuda devices available. (#17216) +f01edd3abbb4 [UR] [L0] Update UR to link the Loader as static (#17104) +f36f787137d1 [UR][L0] Fix assignment of the in order flag for sync immediate list (#17199) +feed8b11e4fd [SYCL][CUDA] Fix adapter cupti linking (#17224) +b183751df103 [AsyncAlloc][UR][Exp] Initial API for async alloc entry points (#17117) +35fba198274c [UR][CUDA] Avoid unnecessary calls to cuFuncSetAttribute (#16928) +8fa2a120729b [UR] Improvements to align CTS and Spec for Program (#17094) +a4f976433067 [UR][CUDA] Change MAX_MEMORY_BANDWIDTH device query to uint64 (#16869) +5c76d4cd47ad [UR] Remove unnecessary and confusing unique_ptr usage (#17144) +1eccddb52284 [UR][L0 v2] check if copy offload is supported before requesting it (#17120) +646a5088f0d4 [UR] Update dependentloadflag for L0 adapters dlls (#17078) +2e288b083bc3 [UR] Improvements to align CTS and Spec for Device (#16746) +1515afacc6ae [UR][L0] Disable command-buffer immediate append path (#17097) +ae09897ff29d [UR] Use relative xpti/xptifw source when available (#17099) +d5bcb59367f7 [UR] Correct copyright string in Windows proxy loader (#17060) +5aa3157aba07 [SYCL][CUDA] Use UMF Proxy pool manager with UMF CUDA memory provider in UR (#17015) +e925b2b9f4c5 [SYCL][CUDA] Update UMF in UR to fix issue in LLVM (#17034) +5e01636b22ae Move Unified Runtime code into intel/llvm +f3d12f0167da Do not fetch cudart from gitlab for UMF (#16941) 
+23b2457304bd Use UMF CUDA provider in Unified Runtime (#16761) +64a095c36113 [SYCL][UR] Improve header copy dependencies (#17093) +928ed3e5a470 [UR][L0] Fix issue with command-buffer local mem update (#17069) +1638be92ec1d [UR] Choose in-tree unified-runtime directory if present (#16833) +113b46788672 [UR][L0]: MAX_COMPUTE_UNITS using ze_eu_count_ext_t (#16818) +d3e825ca3058 [UR][L0]: fix missing destroy of event given enqueue wait out event (#16759) +479da1d68964 [UR] Bump tag to 08d36b76 (#16810) +a6ebaa40ec28 [UR] Move urMemImageGetInfo success test from a switch to individual test (#16655) +a739d3418140 [UR] Make each profiling info variant for urEventGetProfilingInfo optional and improve its conformance test (#17067) +5f7043dc931a [UR] Don't set -pie on shared objects (#16880) +988c4777a709 [UR] In-order path for OpenCL command-buffers (#17056) +69941b863470 [UR] Make command-buffer creation descriptor mandatory (#17058) +0b979bf73689 [UR] Add remaining calls shared with queue in level-zero v2 adapter (#17061) +d142923d2a61 [UR][CL] Fix invalid use of dlopen() (#16736) +cf19f7758c6e [UR] fix parseDisjointPoolConfig and add tests (#16791) +02d2e34c1c83 [UR] Fix kernel arguments being overwritten in the CUDA and HIP adapters (#16733) +73f54e5296c5 [UR] Make adapters check native properties before dereferencing. (#16730) +9c65739ea12b [UR] Bump with DEVICE_INFO_PROGRAM_SET_SPECIALIZATION_CONSTANTS (#16659) +16ca790241fe [UR] Update tag to 8b7a9957 for https://github.com/oneapi-src/unified-runtime/pull/2582 (#16689) +b9a755831a77 [SYCL] Update UR tag for L0 synchronize fix (#16629) +8998b9b54f85 [UR] Unified clang format (#16672) +69cbf2a86d94 [UR] Bump UR version (main) with UMF v0.10.1 release (#16571) +a204f4031b45 [UR] Wrap urEventSetCallback when ran through loader (#16572) +48297dfbd765 [UR] Pull in fix needed for CTS device parameterization. 
(#16519) +c6b1edfb7512 [UR] Use reference counting on factories (#15296) +93de8f1c6127 [UR] Improve Kernel CTS (#16555) +c0e2fd19641e [UR] Bump tag to 3472b5bda (#16531) +8329e7bffca4 Bump for uninitialized-cuda-events-fix (#16469) +dcfdcfa249ab [UR] Update tag to ad288bb (#16512) +788ff7ffc01d [UR] Bump UR with zeCommandListImmediateAppendCommandListsExp usages fixes (#16458) +931a93b3fecf Bump UR and adjust use of urKernelSuggestMaxCooperativeGroupCountExp (#15966) +7a4a978e3483 [UR][L0] Fix Event Memory Leak due to no destroy on delete (#16410) +f1627498fc10 [UR][OpenCL] add a few missing Intel GPU device queries, fix device ID query (#16299) +a37bba8086f5 [UR] Update tag to 6d4eec8c for UR#2272 and UR#2336 (#15965) +fe88f1dc4e61 UR [SYCL][CUDA][HIP] Update images enable variable (#16147) + Has something to do with disabling images by default +5d8a55236008 [UR][L0] Bump main tag to 39df0317 (#16365) +28e84168b5e4 [UR] Update tag to 58e4d76 (#16322) +8106796a4093 [UR] Update tag to 45f3d8 (#16327) +8a41b47be40d [SYCL][UR][L0] Fix issue with event caching causing profiling tag conflicts (#16233) +06e57374b23d [UR] Pull in fixes for issues raised in latest coverity scan. (#16158) +33a0411c0bbf [UR] Interrupt-based event implementation (#16252) +590960a4205f [UR][L0] Add Support for External Semaphores (#16162) +c3eb1603adb2 [UR][L0] Disabling Driver In Order Lists by default (#16263) +6fd5143b5431 [UR] Update UR tag to include the fix to set the execution flag for all kernels (#16241) +b56ffc5f7c98 [UR] Update tag to 3fdf7e3 for UR #2353 and #2413 (#16262) +130a901922bc [UR][L0] fix event caching (#16207) +73b99be383ac [UR] Bump UR tag to eb076da (#16216) + # Release notes Nov'24 Release notes for commit range @@ -107,8 +800,6 @@ Release notes for commit range for SYCL Matrix. intel/llvm#15351 intel/llvm#15932 intel/llvm#15547 - Added support for specialization constants on Native CPU. intel/llvm#14446 - Added support for atomic fence on Native CPU. 
intel/llvm#14619 -- Added a new overload for `joint_matrix_apply` to be able to return result - into a different matrix. intel/llvm#13153 - Added `max_work_group_size`and `max_linear_work_group_size` kernel properties to allow users to specify the maximum work-group size that a kernel will be invoked with. intel/llvm#14518 From 2bb28168823489b8d34b9eba6d7971be569bb5ee Mon Sep 17 00:00:00 2001 From: Dmitry Vodopyanov Date: Mon, 19 May 2025 07:32:43 -0700 Subject: [PATCH 02/12] Remove info about unfinished KHR extensions --- sycl/ReleaseNotes.md | 25 ------------------------- 1 file changed, 25 deletions(-) diff --git a/sycl/ReleaseNotes.md b/sycl/ReleaseNotes.md index 4b65589605306..24120cf606a39 100644 --- a/sycl/ReleaseNotes.md +++ b/sycl/ReleaseNotes.md @@ -67,34 +67,9 @@ Release notes for commit range ### KHR extensions -Please note that KHR extensions are being specified and released by Khronos -Group. The process of completing and publishing a KHR extension takes a while, -but as implementors we need to prototype them early to help find possible issues -with specifications and ensure that they are implementable. - -During that stage in an extension development its specification is incomplete -and subject to change without any notice. Therefore, we will refer to those -extensions using **prototyped** word. Their implementation is not available by -default and requires `__DPCPP_ENABLE_UNFINISHED_KHR_EXTENSIONS` macro to be set -_before_ including `` header to make them available. Considering -that specifications of such extensions are not final and not versioned, their -prototypes may not exactly match the latest publicly available versions of the -corresponding specifications. There is no guarantee of completeness either. -You can find more details on our development processes in -[this document](sycl/doc/developer/KHRExtensions.md). 
- -The only reason those extensions are mentioned here is to give you a glimpse of -the future about which extensions will be supported in future releases. We do -not recommend to use such extensions right know, but advanced users who are -driving those extension specifications forward can do early experiments with -them to provide feedback to the Khronos Group. - - Implemented [`sycl_khr_default_context`](https://registry.khronos.org/SYCL/specs/sycl-2020/html/sycl-2020.html#sec:khr-default-context) extension. intel/llvm#15645 -- **Prototyped** - [`sycl_khr_free_function_commands`](https://github.com/KhronosGroup/SYCL-Docs/pull/644) - extension. intel/llvm#16770, intel/llvm#17222 ### Other extensions From 9fe2cf7bd2369ff8beacd0714750eaa936c8ea5d Mon Sep 17 00:00:00 2001 From: Dmitry Vodopyanov Date: Mon, 19 May 2025 07:35:03 -0700 Subject: [PATCH 03/12] Fix typo --- sycl/ReleaseNotes.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/sycl/ReleaseNotes.md b/sycl/ReleaseNotes.md index 24120cf606a39..27de6848d0b9d 100644 --- a/sycl/ReleaseNotes.md +++ b/sycl/ReleaseNotes.md @@ -374,7 +374,7 @@ SYCL runtime over low-level runtimes (such as Level Zero or OpenCL): `rint` for HIP targets. intel/lllvm#16373 - Improved check for unsupported data types to actually rely on target information instead of hardcoded knowledge. For example, this allows 128-bit - integeres to be used in device code when targeting CUDA backend. + integers to be used in device code when targeting CUDA backend. intel/llvm#17036 - Fixed hangs on AMD and crashes on NVIDA when `atomic_ref` is used with `work_item` memory scope. 
intel/llvm#16172 From 0c04a260b61ae6d45577b822ac2e38c2c90b8e17 Mon Sep 17 00:00:00 2001 From: Dmitry Vodopyanov Date: Mon, 19 May 2025 08:21:43 -0700 Subject: [PATCH 04/12] Remove patch which was introduced and reverted in the same release --- sycl/ReleaseNotes.md | 2 -- 1 file changed, 2 deletions(-) diff --git a/sycl/ReleaseNotes.md b/sycl/ReleaseNotes.md index 27de6848d0b9d..3094aaaef4ffa 100644 --- a/sycl/ReleaseNotes.md +++ b/sycl/ReleaseNotes.md @@ -349,8 +349,6 @@ SYCL runtime over low-level runtimes (such as Level Zero or OpenCL): - Removed a busy-wait loop from the implementation of `-fsycl-max-parallel-link-jobs` flag, making it consume less resources when waiting. intel/llvm#17260 -- Made `-O0` to be the default optimization level when debug info is enabled - through `-g` flag. intel/llvm#16408 - Uplifted maximum version of SPIR-V that compiler can generate to 1.5. intel/llvm#16626 - Made compiler embed device library needed for `bfloat16` support into the From 699e622d70073f32bd12905a618441077a2f4cf0 Mon Sep 17 00:00:00 2001 From: Dmitry Vodopyanov Date: Tue, 20 May 2025 06:07:34 -0700 Subject: [PATCH 05/12] Apply CR changes --- sycl/ReleaseNotes.md | 201 ++++++++++++++----------------------------- 1 file changed, 63 insertions(+), 138 deletions(-) diff --git a/sycl/ReleaseNotes.md b/sycl/ReleaseNotes.md index 3094aaaef4ffa..e742904010b9c 100644 --- a/sycl/ReleaseNotes.md +++ b/sycl/ReleaseNotes.md @@ -2,20 +2,56 @@ ## New Features +### Component A + +- Added support for ... intel/llvm#pr + +### Component B + - Added support for ... intel/llvm#pr -## Improvements + +## Improvements and bugfixes + +### Component A - Improved handling of ... intel/llvm#pr +- Fixed ... intel/llvm#pr -## Bug Fixes +### Component B +- Improved handling of ... intel/llvm#pr - Fixed ... intel/llvm#pr ## Misc - Did this and that ... intel/llvm#pr +## API/ABI breakages + +### Changes that are effective immediately + +- Removed ... 
intel/llvm#pr + +### Deprecations + +Those APIs are still present and tested, but they will be removed in future +releases: + +- Deprecated ... intel/llvm#pr + +### Upcoming API/ABI breakages + +These changes are available for preview under `-fpreview-breaking-changes` flag. +They will be enabled by default (with no option to switch to the old behavior) +in the next ABI-breaking release: + +- Removed ... intel/llvm#pr + +## Known Issues + +- ... + # Release notes Mar'25 Release notes for commit range @@ -44,22 +80,20 @@ Release notes for commit range ### SYCL graphs -- Introduced and implemented +- Implemented [`sycl_ext_codeplay_enqueue_native_command`](https://github.com/intel/llvm/blob/b23d69e2c3fda1d69351137991897c96bf6a586d/sycl/doc/extensions/experimental/sycl_ext_codeplay_enqueue_native_command.asciidoc) - extension which allows to include custom commands for interoperability with - native runtimes into graphs built using `sycl_ext_oneapi_graph` extension. + extension which allows submitting custom commands for interoperability with + native runtimes to graphs built using the `sycl_ext_oneapi_graph` extension. intel/llvm#16871 +- Introduced ability to update host-task nodes in graphs. intel/llvm#16853 ### Bindless images -- Extended the extension - [specification](https://github.com/intel/llvm/blob/b23d69e2c3fda1d69351137991897c96bf6a586d/sycl/doc/extensions/experimental/sycl_ext_oneapi_bindless_images.asciidoc) - to support more kinds of copy operations (`image_mem_handle` to USM and vice - versa, USM to USM, etc.) and implemented them. intel/llvm#16661, - intel/llvm#17507 -- Extended the extension specification and implementation to support - `gather_image` device built-in function. The implementation was only done for - CUDA backend so far. intel/llvm#17322 +- Added support for more kinds of copy operations (`image_mem_handle` to USM and vice + versa, USM to USM, etc.)
intel/llvm#16661, intel/llvm#17507 +- Added support for `gather_image` device built-in function. This feature is currently + only supported on the CUDA backend. intel/llvm#17322 +- Added support for Vulkan timeline semaphores. intel/llvm#17395 ### Native CPU Device @@ -89,7 +123,8 @@ Release notes for commit range extensions that provide different ways of allocating and accessing device local memory (i.e. shared by all work-items within a work-group). intel/llvm#15061, intel/llvm#16325 - - The former is only supported on CUDA backend + - [`sycl_ext_oneapi_work_group_static`](https://github.com/intel/llvm/blob/b23d69e2c3fda1d69351137991897c96bf6a586d/sycl/doc/extensions/experimental/sycl_ext_oneapi_work_group_static.asciidoc) + is currently only supported on CUDA backend - Introduced and implemented [`sycl_ext_intel_kernel_queries`](https://github.com/intel/llvm/blob/b23d69e2c3fda1d69351137991897c96bf6a586d/sycl/doc/extensions/supported/sycl_ext_intel_kernel_queries.asciidoc) extension. intel/llvm#16834 @@ -111,7 +146,7 @@ Release notes for commit range ### New compiler options -- Added support for ``-f[no]-offload-fp32-prec-div` and +- Added support for `-f[no]-offload-fp32-prec-div` and `-f[no-]-offload-fp32-prec-sqrt` compiler flags to control precision of floating-point division and square root. intel/llvm#15836, intel/llvm#16107, intel/llvm#16993, intel/llvm#17044, intel/llvm#17033, intel/llvm#16942, @@ -135,7 +170,8 @@ Release notes for commit range #### Thread Sanitizer -- Introduced thread sanitizer support for device code. intel/llvm#17345, +- Introduced thread sanitizer support for SYCL and OpenMP C/C++ device code. It + features data race detection in USM and device global memory. intel/llvm#17345, intel/llvm#17211, intel/llvm#17155, intel/llvm#17181 ## Improvements and bugfixes @@ -147,8 +183,7 @@ Release notes for commit range improve performance. intel/llvm#17495 - Documented kernel binary update feature which allows to update kernel nodes in graphs. 
This feature had been implemented earlier already. intel/llvm#14896 -- Introduced ability to update host-task nodes in graphs. intel/llvm#16853 -- Fixed race condition in `mutable_command_graph` node queries. intel/llvm#17012 +- Fixed race condition in `command_graph` node queries. intel/llvm#17012 - Fixed the issue with not all graph-related classes fully implementing common reference semantics. intel/llvm#16788 - Documented interaction with `sycl_ext_oneapi_local_memory` extension. @@ -158,15 +193,16 @@ Release notes for commit range - Made `ext_oneapi_weak_object` extension work with graph objects. intel/llvm#16209 - Fixed a bug where using `local_accessor` or `work_group_memory` objects as - part of whole graph update would function incorrectly on CUDA & HIP backends. + part of graph update would function incorrectly on CUDA & HIP backends. intel/llvm#16025 ### SYCLcompat library - Introduced new set of group utility functions and classes aimed to reduce the gap between `syclcompat` and `dpct` namespaces. intel/llvm#17263 -- 73e6b224aacf [SYCLCOMPAT] Forward launch arguments to avoid copies (#16965) - - Definitely user-visible, but I'm not sure how to word that +- Fixed an issue where `CUTensorMap` objects would be unintentionally copied, + causing `CUDA_ERROR_ILLEGAL_ADDRESS` when running on the CUDA backend. + intel/llvm#16965 - Fixed `compare_mask` putting results in the wrong 2-byte segment of 4-byte output. intel/llvm#16768 - Optimized implementation of `permute_sub_group_by_xor` for the case when @@ -207,36 +243,22 @@ Release notes for commit range - Fixed ASAN throwing an exception with `UR_RESULT_ERROR_INVALID_ARGUMENT` when detecting incorect memory free operation. 
intel/llvm#16706 - - - -- bee8a397ac72 [UR][DeviceASAN] Sync the latest changes in asan_libdevice.hpp (#15911) -- 6347914485a8 [DeviceAsan] bugfixes for UR (#16257) -- e9143ca66108 [DeviceAsan] Report error when using unsupported API (#16281) -- 092cd2dfc034 [UR][DeviceASAN] Bugfix for mmap (#16466) -- 696514238e2e [DeviceASAN] Fix kernel release order (#16688) -- cc6148dfd17c [UR][DeviceASAN] Bugfix for GetDeviceType (#16745) -- 76c665363565 [DeviceSanitizers] Adjust backtrace addresses to call instruction (#17404) -- e2ab2b9ba963 [DevSan][Refactor] Make Options an unified class shared by all sanitizers (#17157) - ### Bindless images -- Added support for timeline semaphores. intel/llvm#17395 - Added support for `ext_oneapi_bindless_sampled_image_fetch_1d`, `ext_oneapi_bindless_sampled_image_fetch_1d_usm`, `ext_oneapi_bindless_sampled_image_fetch_2d`, `ext_oneapi_bindless_sampled_image_fetch_2d_usm` and `ext_oneapi_bindless_sampled_image_fetch_3d` aspects on Level Zero backend. intel/llvm#16862 +- Added the initial support for bindless images on AMD GPUs. intel/llvm#16439 - Fixed return types of image extent queries to match the specification. intel/llvm#16829 - Clarified the types of supported USM memory in the extension specification. intel/llvm#16622 - -- 3161af314190 [SYCL][Ext][Bindless] Initial implementation of image spirv builtins on HIP (#16439) - - What exactly does it mean for end user? -- b732a3c4c9cb [SYCL][Bindless] Fix incorrect mangling of bindless images builtin functions (#16135) - - What was the user-visible effect of the issue? +- Fixed compiler crash caused by the use of anisotropic sampling operations on 3D mipmaps, + due to the intrinsic being generated with an incorrect number of LOD gradient parameters. 
+ intel/llvm#16135 ### Native CPU device @@ -343,7 +365,7 @@ SYCL runtime over low-level runtimes (such as Level Zero or OpenCL): ### Other changes in SYCL Compiler -- Introduced a new optimization to eliminate back-to-back barriers when its +- Introduced a new optimization to eliminate back-to-back barriers when it is safe. Such chain of barriers may occur when multiple group algorithms are used next to each other. intel/llvm#16750 - Removed a busy-wait loop from the implementation of @@ -355,7 +377,7 @@ SYCL runtime over low-level runtimes (such as Level Zero or OpenCL): application (if it is used). This change will allow us to reduce the size of redistributable SYCL RT package by eliminating some files from it. intel/llvm#16729 -- Added a compiler diagnostic (warning) about undefined `SYCL_EXTERNAL` +- Added a compiler warning diagnostic about undefined `SYCL_EXTERNAL` functions used in a module to help catch linking errors earlier. intel/llvm#17346 - Addressed issue intel/llvm#11531 where the compiler would generate invalid @@ -583,104 +605,7 @@ in the next ABI-breaking release: to see the list of working and non-working examples. -43ee65117935 [UR] Fix potential deadlock in the WaitEvent path of CmdBuffers (#16697) 40f0a6a0630b [SYCL][Graph] Fix L0 multi-device kernel bundles (#16343) -c5c0be57c660 [UR][CUDA][HIP] Add missing catch for native commands (#17524) -55a098709da2 [UR] Regenerate ur_valddi.cpp (#17487) -9c5762226d4f [UR] Fix cfi initialization (#17472) -a0df523df9c1 [UR] Deprecate UR_DEVICE_INFO_BFLOAT16 (#17053) -7650d831bb0f [UR] Check null pointer before handle in validation layer (#17474) -d6214ad114f8 [SYCL][UR][CUDA] Fix CMake CUPTI config (#17457) -44f120e92c12 [UR] Remove unnecessary unique pointer from cl program helper. (#17101) -19dbfb7605e9 [UR][L0] Create pool descriptors from subdevices... 
(#17465) -a9cb8f1e8540 [UR] Bump UMF to v0.11.0-dev4 (#17468) -1167ee6b388e [UR][CMake]Set CMAKE_MSVC_RUNTIME_LIBRARY to fix UMF linking issues (#17366) -fb582a748e42 [UR] [V2] Fix synchronization between command_list_manager usages (#17297) -b2db12abbb33 [UR] Fix typo in cfi flags handling. (#17452) -a1065507ce1f [UR] Handle adapters returning no platforms during testing (#17410) -768c5eaf4aca [UR] Generated code hidden by default in PR diffs (#17414) -2975d26c7050 [SYCL][UR][L0 v2] use blocking free when returning memory to the driver (#17375) -c342667c7e4f [UR] Updates to source checks job (#17160) -17df762be147 [SYCL][UR][CUDA] Use FindCUDAToolkit CMake module instead of FindCUDA (#17315) -0e155f0c1ab0 [SYCL][CUDA] Fix cupti library dynamic loading (#17272) -a000e56a9542 [UR][SYCL] Remove UR context atomic queries. (#16160) -38d750678fae [UR][L0] Manage UMF pools through usm::pool_manager (#17065) -c07039e2263d [UR] Add device info query for native assert. (#15929) -f870412188a5 [UR] Add UNSUPPORTED_FEATURE return for urDeviceGetGlobalTimestamps (#17389) -16713eae8c44 [UR] Allow loader to skip adapters based on prefilter device type. (#17072) -6ef844897f8a [UR][L0] Fix bfloat16 lookup to check for the extension (#17364) -68cacbfed4f5 [UR][L0] Disable Immediate Command List DG2 Windows (#17334) -607dff4c92a1 [UR] Stop using extension strings to report support for exp features. (#16046) -60ffdc3a97de [UR][L0] fix external semaphore with updated headers and report device info support (#17286) -255760e5af11 [UR] Fix various defects from static analysis (#17299) -fad173cbd14f [UR] Add UR_EXTERNAL_DEPENDENCIES CMake option (#17291) -a7774f2a74c5 [UR] Fix some tests that are broken when run with multiple cuda devices available. 
(#17216) -f01edd3abbb4 [UR] [L0] Update UR to link the Loader as static (#17104) -f36f787137d1 [UR][L0] Fix assignment of the in order flag for sync immediate list (#17199) -feed8b11e4fd [SYCL][CUDA] Fix adapter cupti linking (#17224) -b183751df103 [AsyncAlloc][UR][Exp] Initial API for async alloc entry points (#17117) -35fba198274c [UR][CUDA] Avoid unnecessary calls to cuFuncSetAttribute (#16928) -8fa2a120729b [UR] Improvements to align CTS and Spec for Program (#17094) -a4f976433067 [UR][CUDA] Change MAX_MEMORY_BANDWIDTH device query to uint64 (#16869) -5c76d4cd47ad [UR] Remove unnecessary and confusing unique_ptr usage (#17144) -1eccddb52284 [UR][L0 v2] check if copy offload is supported before requesting it (#17120) -646a5088f0d4 [UR] Update dependentloadflag for L0 adapters dlls (#17078) -2e288b083bc3 [UR] Improvements to align CTS and Spec for Device (#16746) -1515afacc6ae [UR][L0] Disable command-buffer immediate append path (#17097) -ae09897ff29d [UR] Use relative xpti/xptifw source when available (#17099) -d5bcb59367f7 [UR] Correct copyright string in Windows proxy loader (#17060) -5aa3157aba07 [SYCL][CUDA] Use UMF Proxy pool manager with UMF CUDA memory provider in UR (#17015) -e925b2b9f4c5 [SYCL][CUDA] Update UMF in UR to fix issue in LLVM (#17034) -5e01636b22ae Move Unified Runtime code into intel/llvm -f3d12f0167da Do not fetch cudart from gitlab for UMF (#16941) -23b2457304bd Use UMF CUDA provider in Unified Runtime (#16761) -64a095c36113 [SYCL][UR] Improve header copy dependencies (#17093) -928ed3e5a470 [UR][L0] Fix issue with command-buffer local mem update (#17069) -1638be92ec1d [UR] Choose in-tree unified-runtime directory if present (#16833) -113b46788672 [UR][L0]: MAX_COMPUTE_UNITS using ze_eu_count_ext_t (#16818) -d3e825ca3058 [UR][L0]: fix missing destroy of event given enqueue wait out event (#16759) -479da1d68964 [UR] Bump tag to 08d36b76 (#16810) -a6ebaa40ec28 [UR] Move urMemImageGetInfo success test from a switch to individual test 
(#16655) -a739d3418140 [UR] Make each profiling info variant for urEventGetProfilingInfo optional and improve its conformance test (#17067) -5f7043dc931a [UR] Don't set -pie on shared objects (#16880) -988c4777a709 [UR] In-order path for OpenCL command-buffers (#17056) -69941b863470 [UR] Make command-buffer creation descriptor mandatory (#17058) -0b979bf73689 [UR] Add remaining calls shared with queue in level-zero v2 adapter (#17061) -d142923d2a61 [UR][CL] Fix invalid use of dlopen() (#16736) -cf19f7758c6e [UR] fix parseDisjointPoolConfig and add tests (#16791) -02d2e34c1c83 [UR] Fix kernel arguments being overwritten in the CUDA and HIP adapters (#16733) -73f54e5296c5 [UR] Make adapters check native properties before dereferencing. (#16730) -9c65739ea12b [UR] Bump with DEVICE_INFO_PROGRAM_SET_SPECIALIZATION_CONSTANTS (#16659) -16ca790241fe [UR] Update tag to 8b7a9957 for https://github.com/oneapi-src/unified-runtime/pull/2582 (#16689) -b9a755831a77 [SYCL] Update UR tag for L0 synchronize fix (#16629) -8998b9b54f85 [UR] Unified clang format (#16672) -69cbf2a86d94 [UR] Bump UR version (main) with UMF v0.10.1 release (#16571) -a204f4031b45 [UR] Wrap urEventSetCallback when ran through loader (#16572) -48297dfbd765 [UR] Pull in fix needed for CTS device parameterization. 
(#16519)
-c6b1edfb7512 [UR] Use reference counting on factories (#15296)
-93de8f1c6127 [UR] Improve Kernel CTS (#16555)
-c0e2fd19641e [UR] Bump tag to 3472b5bda (#16531)
-8329e7bffca4 Bump for uninitialized-cuda-events-fix (#16469)
-dcfdcfa249ab [UR] Update tag to ad288bb (#16512)
-788ff7ffc01d [UR] Bump UR with zeCommandListImmediateAppendCommandListsExp usages fixes (#16458)
-931a93b3fecf Bump UR and adjust use of urKernelSuggestMaxCooperativeGroupCountExp (#15966)
-7a4a978e3483 [UR][L0] Fix Event Memory Leak due to no destroy on delete (#16410)
-f1627498fc10 [UR][OpenCL] add a few missing Intel GPU device queries, fix device ID query (#16299)
-a37bba8086f5 [UR] Update tag to 6d4eec8c for UR#2272 and UR#2336 (#15965)
-fe88f1dc4e61 UR [SYCL][CUDA][HIP] Update images enable variable (#16147)
- Has something to do with disabling images by default
-5d8a55236008 [UR][L0] Bump main tag to 39df0317 (#16365)
-28e84168b5e4 [UR] Update tag to 58e4d76 (#16322)
-8106796a4093 [UR] Update tag to 45f3d8 (#16327)
-8a41b47be40d [SYCL][UR][L0] Fix issue with event caching causing profiling tag conflicts (#16233)
-06e57374b23d [UR] Pull in fixes for issues raised in latest coverity scan. (#16158)
-33a0411c0bbf [UR] Interrupt-based event implementation (#16252)
-590960a4205f [UR][L0] Add Support for External Semaphores (#16162)
-c3eb1603adb2 [UR][L0] Disabling Driver In Order Lists by default (#16263)
-6fd5143b5431 [UR] Update UR tag to include the fix to set the execution flag for all kernels (#16241)
-b56ffc5f7c98 [UR] Update tag to 3fdf7e3 for UR #2353 and #2413 (#16262)
-130a901922bc [UR][L0] fix event caching (#16207)
-73b99be383ac [UR] Bump UR tag to eb076da (#16216)

 # Release notes Nov'24

From b5bc94e7a9c11f9104cac04a67fcd77423359826 Mon Sep 17 00:00:00 2001
From: Dmitry Vodopyanov
Date: Tue, 20 May 2025 08:22:30 -0700
Subject: [PATCH 06/12] Update info about the kernel compiler

---
 sycl/ReleaseNotes.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/sycl/ReleaseNotes.md b/sycl/ReleaseNotes.md
index e742904010b9c..c01d3f519ddf9 100644
--- a/sycl/ReleaseNotes.md
+++ b/sycl/ReleaseNotes.md
@@ -75,7 +75,7 @@ Release notes for commit range
   intel/llvm#17032, intel/llvm#16823, intel/llvm#16702, intel/llvm#16638,
   intel/llvm#16316, intel/llvm#17359, intel/llvm#16485, intel/llvm#16821
 - Known issues and limitations are documented
-  [in the extension specification](https://github.com/intel/llvm/blob/b23d69e2c3fda1d69351137991897c96bf6a586d/sycl/doc/extensions/experimental/sycl_ext_oneapi_kernel_compiler.asciidoc#non-normative-implementation-notes-for-dpc). intel/llvm#17307,
+  [in the extension specification](https://github.com/intel/llvm/blob/b23d69e2c3fda1d69351137991897c96bf6a586d/sycl/doc/extensions/experimental/sycl_ext_oneapi_kernel_compiler.asciidoc#known-issues-and-limitations-when-the-language-is-sycl). intel/llvm#17307,
   intel/llvm#17459

 ### SYCL graphs
@@ -90,7 +90,7 @@ Release notes for commit range
 ### Bindless images

 - Added support for more kinds of copy operations (`image_mem_handle` to USM and vice
-  versa, USM to USM, etc.) intel/llvm#16661, intel/llvm#17507
+  versa, USM to USM, etc.) intel/llvm#16661, intel/llvm#17507
 - Added support for `gather_image` device built-in function. This feature is
   currently only supported on the CUDA backend. intel/llvm#17322
 - Added support for Vulkan timeline semaphores. intel/llvm#17395

From 4eed4ca2c595df35d0aa15c17cf5679e59a505b2 Mon Sep 17 00:00:00 2001
From: Dmitry Vodopyanov
Date: Tue, 20 May 2025 08:29:50 -0700
Subject: [PATCH 07/12] Clarify one patch

---
 sycl/ReleaseNotes.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/sycl/ReleaseNotes.md b/sycl/ReleaseNotes.md
index c01d3f519ddf9..b384683299a57 100644
--- a/sycl/ReleaseNotes.md
+++ b/sycl/ReleaseNotes.md
@@ -407,8 +407,8 @@ SYCL runtime over low-level runtimes (such as Level Zero or OpenCL):
 - Fixed a bug that linking static libraries with SYCL code in them using
   `-l:libname.a` spelling would ignore device code from those libraries.
   intel/llvm#17149
-- Fixed a bug where having a pure virtual function marked as device one would
-  cause unresolved symbol errors emitted by device compiler on Windows.
+- Fixed a bug where having a pure virtual function during device compilation
+  would cause unresolved symbol errors emitted by device compiler on Windows.
   intel/llvm#16231
 - Fixed a bug where having two kernels (one annotated with
   `reqd_work_group_size` attribute/property and another without it) together

From a439d7eb3e61e9096fccd19cec78c01f1245398d Mon Sep 17 00:00:00 2001
From: Dmitry Vodopyanov
Date: Tue, 20 May 2025 08:34:33 -0700
Subject: [PATCH 08/12] Removed some extra info

---
 sycl/ReleaseNotes.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/sycl/ReleaseNotes.md b/sycl/ReleaseNotes.md
index b384683299a57..cd901df01a6d5 100644
--- a/sycl/ReleaseNotes.md
+++ b/sycl/ReleaseNotes.md
@@ -182,7 +182,7 @@ Release notes for commit range
   execution order to avoid issues with overflowing stack on huge graphs and
   improve performance. intel/llvm#17495
 - Documented kernel binary update feature which allows to update kernel nodes
-  in graphs. This feature had been implemented earlier already. intel/llvm#14896
+  in graphs. intel/llvm#14896
 - Fixed race condition in `command_graph` node queries. intel/llvm#17012
 - Fixed the issue with not all graph-related classes fully implementing common
   reference semantics. intel/llvm#16788

From 964745c7800f57fa32ef767292ecb6f1a33e35db Mon Sep 17 00:00:00 2001
From: Dmitry Vodopyanov
Date: Tue, 20 May 2025 08:44:37 -0700
Subject: [PATCH 09/12] Remove extra line

---
 sycl/ReleaseNotes.md | 1 -
 1 file changed, 1 deletion(-)

diff --git a/sycl/ReleaseNotes.md b/sycl/ReleaseNotes.md
index cd901df01a6d5..425c74e1e0761 100644
--- a/sycl/ReleaseNotes.md
+++ b/sycl/ReleaseNotes.md
@@ -10,7 +10,6 @@

 - Added support for ... intel/llvm#pr

-
 ## Improvements and bugfixes

 ### Component A

From ee6de17d2e84c2deefbe313a3dbaf268fd655efb Mon Sep 17 00:00:00 2001
From: Dmitry Vodopyanov
Date: Wed, 21 May 2025 04:35:55 -0700
Subject: [PATCH 10/12] Apply comments from Graphs, Native CPU and SYCLcompat components

---
 sycl/ReleaseNotes.md | 22 +++++++++++++---------
 1 file changed, 13 insertions(+), 9 deletions(-)

diff --git a/sycl/ReleaseNotes.md b/sycl/ReleaseNotes.md
index 425c74e1e0761..03d72311c845c 100644
--- a/sycl/ReleaseNotes.md
+++ b/sycl/ReleaseNotes.md
@@ -198,7 +198,8 @@ Release notes for commit range
 ### SYCLcompat library

 - Introduced new set of group utility functions and classes aimed to reduce the
-  gap between `syclcompat` and `dpct` namespaces. intel/llvm#17263
+  gap between `syclcompat` and `dpct` namespaces when migrating CUB functions.
+  intel/llvm#17263
 - Fixed an issue where `CUTensorMap` objects would be unintentionally copied,
   causing `CUDA_ERROR_ILLEGAL_ADDRESS` when running on the CUDA backend.
   intel/llvm#16965
@@ -209,10 +210,8 @@ Release notes for commit range
 - Added new function `ternary_logic_op` to perform bitwise logical operations
   on three input values based on the specified 8-bit truth table.
   intel/llvm#16509
-- 6e0d90e73ed1 [SYCLCompat] Fix vectorized_binary impl to make SYCLomatic migrated code run pass (#16553)
-  - Not sure how to word that
-- 16c447998836 [SYCL][COMPAT] Replace T{-1} with static_cast<T>(-1) for mask creation (#16527)
-  - bugfix?
+- Fixed issues with multiple vectorized operations returning wrong results.
+  intel/llvm#16553 intel/llvm#16527

 ### Explicit SIMD extension

@@ -225,6 +224,14 @@ Release notes for commit range

 ### Sanitizers

+- Only report warning if passing host ptr to kernel intel/llvm#16654
+- Make ShadowMemory one instance per type intel/llvm#16687
+- Fixed kernel name addressspace intel/llvm#16425
+- Fix ASAN with kernel assert intel/llvm#16256
+- Fix AcceeChain to a matrix for bfloat16 intel/llvm/16323
+- Reduce the frequency of shadow memory reallocation to reduce memory overhead and improve runtime performance intel/llvm#16280, intel/llvm#16258
+- Fix device global type of KernelMetadata intel/llvm#16357
+
 - ada16682e8c3 [DevASAN] Only report warning if passing host ptr to kernel (#16654)
   - seems like a potentially important bugfix
 - ce4a320806b2 [DeivceASAN] Make ShadowMemory one instance per type (#16687)
@@ -264,7 +271,7 @@ Release notes for commit range
 - Improved support for `dynamic_address_cast` on Native CPU device.
   intel/llvm#16676
 - Improved performance of Native CPU device: less memory allocations and thread
-  launches. intel/llvm#17102
+  launches. intel/llvm#17102, intel/llvm#17215
 - Fixed a bug where submitting the same kernel multiple times at about the same
   time with different argument would lead to incorrect arguments being used.
   intel/llvm#16995
@@ -603,9 +610,6 @@ in the next ABI-breaking release:
   [`sycl/test-e2e/VirtualFunctions`](https://github.com/intel/llvm/tree/b23d69e2c3fda1d69351137991897c96bf6a586d/sycl/test-e2e/VirtualFunctions)
   to see the list of working and non-working examples.

-
-40f0a6a0630b [SYCL][Graph] Fix L0 multi-device kernel bundles (#16343)
-
 # Release notes Nov'24

 Release notes for commit range

From d33e509a640505d9d8626975d5d4107d80b7b414 Mon Sep 17 00:00:00 2001
From: Dmitry Vodopyanov
Date: Wed, 21 May 2025 04:38:21 -0700
Subject: [PATCH 11/12] Apply comment from NativeCPU component

---
 sycl/ReleaseNotes.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/sycl/ReleaseNotes.md b/sycl/ReleaseNotes.md
index 03d72311c845c..31d67bdbfaae4 100644
--- a/sycl/ReleaseNotes.md
+++ b/sycl/ReleaseNotes.md
@@ -279,7 +279,7 @@ Release notes for commit range
   intel/llvm#16737
 - Fixed segfaults happening in SYCL CTS tests for `async_work_group_copy` API.
   intel/llvm#16500
-- Improved support for sub-groups by updating version of OneAPI Construction
+- Improved support for sub-groups by updating version of oneAPI Construction
   Kit. intel/llvm#16785

 ### Matrix

From 1079097f014bb3bec733e37f2fb0801f4260daa9 Mon Sep 17 00:00:00 2001
From: Dmitry Vodopyanov
Date: Wed, 21 May 2025 04:48:35 -0700
Subject: [PATCH 12/12] Apply changes for Sanitizers

---
 sycl/ReleaseNotes.md | 21 ++-------------------
 1 file changed, 2 insertions(+), 19 deletions(-)

diff --git a/sycl/ReleaseNotes.md b/sycl/ReleaseNotes.md
index 31d67bdbfaae4..2480e2daa23a6 100644
--- a/sycl/ReleaseNotes.md
+++ b/sycl/ReleaseNotes.md
@@ -224,25 +224,8 @@ Release notes for commit range

 ### Sanitizers

-- Only report warning if passing host ptr to kernel intel/llvm#16654
-- Make ShadowMemory one instance per type intel/llvm#16687
-- Fixed kernel name addressspace intel/llvm#16425
-- Fix ASAN with kernel assert intel/llvm#16256
-- Fix AcceeChain to a matrix for bfloat16 intel/llvm/16323
-- Reduce the frequency of shadow memory reallocation to reduce memory overhead and improve runtime performance intel/llvm#16280, intel/llvm#16258
-- Fix device global type of KernelMetadata intel/llvm#16357
-
-- ada16682e8c3 [DevASAN] Only report warning if passing host ptr to kernel (#16654)
-  - seems like a potentially important bugfix
-- ce4a320806b2 [DeivceASAN] Make ShadowMemory one instance per type (#16687)
-  - seems like some kind of bugfix
-- ef4d66af3b74 [DeviceSAN] Fix kernel name addressspace (#16425)
-- 1fba00d3be7d [DeviceASAN] Fix ASAN with kernel assert (#16256)
-- 34aeabab551e [SYCL][DeviceASAN] Fix AcceeChain to a matrix for bfloat16 (#16323)
-- 6f3b0e857d15 [DevASAN] Do allocation with USM pool to reduce memory overhead (#16280)
-- a8c6e7715be2 [DeviceASAN] Re-use shadow if required size is not larger than last one (#16258)
-  - As the above, some optimization of memory usage by ASAN?
-- 201725664cc5 [DeviceSanitizer] Fix device global type of KernelMetadata (#16357)
+- Reduce the frequency of shadow memory reallocation to reduce memory overhead
+  and improve runtime performance intel/llvm#16280, intel/llvm#16258

 #### Address Sanitizer