@@ -101,11 +101,13 @@ in the group.
101101 and default constructible.
102102* `Properties` is an instance of `sycl::ext::oneapi::experimental::properties`
103103
104+ _Mandates_: If `Properties` contains the `alignment` property, `InputIteratorT` must be a pointer.
105+
104106_Effects_: Loads single element from `in_iter` to `out` by using the `g` group
105107object to identify memory location as `in_iter` + `g.get_local_linear_id()`.
106108
107- Properties may provide xref:optimization_properties[assertions] which can
108- enable better optimizations.
109+ Properties may provide xref:optimization_properties[assertions] or the `alignment` property
110+ which can enable better optimizations.
109111
110112==== `sycl::vec` Overload
111113
@@ -132,6 +134,8 @@ in the group.
132134 and default constructible.
133135* `Properties` is an instance of `sycl::ext::oneapi::experimental::properties`
134136
137+ _Mandates_: If `Properties` contains the `alignment` property, `InputIteratorT` must be a pointer.
138+
135139_Effects_: Loads `N` elements from `in_iter` to `out`
136140using the `g` group object.
137141Properties may specify xref:data_placement[data placement].
@@ -140,8 +144,9 @@ Default data placement is a blocked one:
140144in striped case:
141145`out[i]` = `in_iter[g.get_local_linear_id() + g.get_local_linear_range() * i];`
142146for `i` between `0` and `N`.
143- Properties may also provide xref:optimization_properties[assertions] which can
144- enable better optimizations.
147+ Properties may also provide xref:optimization_properties[assertions] or the `alignment` property
148+ which can enable better optimizations.
149+
145150
146151==== Fixed-size Array Overload
147152
@@ -169,6 +174,8 @@ work-group or sub-group.
169174 and default constructible.
170175* `Properties` is an instance of `sycl::ext::oneapi::experimental::properties`
171176
177+ _Mandates_: If `Properties` contains the `alignment` property, `InputIteratorT` must be a pointer.
178+
172179_Effects_: Loads `ElementsPerWorkItem` elements from `in_iter` to `out`
173180using the `g` group object.
174181Properties may specify xref:data_placement[data placement].
@@ -177,8 +184,9 @@ Default placement is a blocked one:
177184in striped case:
178185`out[i]` = `in_iter[g.get_local_linear_id() + g.get_local_linear_range() * i];`
179186for `i` between `0` and `ElementsPerWorkItem`.
180- Properties may also provide xref:optimization_properties[assertions] which can
181- enable better optimizations.
187+ Properties may also provide xref:optimization_properties[assertions] or the `alignment` property
188+ which can enable better optimizations.
189+
182190
183191
184192=== Store API
@@ -209,11 +217,13 @@ in the group.
209217 and default constructible.
210218* `Properties` is an instance of `sycl::ext::oneapi::experimental::properties`
211219
220+ _Mandates_: If `Properties` contains the `alignment` property, `OutputIteratorT` must be a pointer.
221+
212222_Effects_: Stores single element `in` to `out_iter` by using the `g` group
213223object to identify memory location as `out_iter` + `g.get_local_linear_id()`
214224
215- Properties may provide xref:optimization_properties[assertions] which can
216- enable better optimizations.
225+ Properties may provide xref:optimization_properties[assertions] or the `alignment` property
226+ which can enable better optimizations.
217227
218228
219229==== `sycl::vec` Overload
@@ -241,6 +251,8 @@ in the group.
241251 and default constructible.
242252* `Properties` is an instance of `sycl::ext::oneapi::experimental::properties`
243253
254+ _Mandates_: If `Properties` contains the `alignment` property, `OutputIteratorT` must be a pointer.
255+
244256_Effects_: Stores `N` elements from `in` vec to `out_iter`
245257using the `g` group object.
246258Properties may specify xref:data_placement[data placement].
@@ -249,8 +261,8 @@ Default placement is a blocked one:
249261in striped case:
250262`out_iter[g.get_local_linear_id() + g.get_local_linear_range() * i]` = `in[i];`
251263for `i` between `0` and `N`.
252- Properties may also provide xref:optimization_properties[assertions] which can
253- enable better optimizations.
264+ Properties may also provide xref:optimization_properties[assertions] or the `alignment` property
265+ which can enable better optimizations.
254266
255267
256268==== Fixed-size Array Overload
@@ -280,6 +292,8 @@ work-group or sub-group.
280292 and default constructible.
281293* `Properties` is an instance of `sycl::ext::oneapi::experimental::properties`
282294
295+ _Mandates_: If `Properties` contains the `alignment` property, `OutputIteratorT` must be a pointer.
296+
283297_Effects_: Stores `ElementsPerWorkItem` elements from `in` span to `out_iter`
284298using the `g` group object.
285299
@@ -289,8 +303,9 @@ Default placement is a blocked one:
289303in striped case:
290304`out_iter[g.get_local_linear_id() + g.get_local_linear_range() * i]` = `in[i];`
291305for `i` between `0` and `ItemsPerWorkItem`.
292- Properties may also provide xref:optimization_properties[assertions] which can
293- enable better optimizations.
306+ Properties may also provide xref:optimization_properties[assertions] or the `alignment` property
307+ which can enable better optimizations.
308+
294309
295310=== Data Placement
296311
@@ -442,6 +457,23 @@ so the implementation can rely on `get_max_local_range()` range size:
442457
443458If partition is uneven the behavior is undefined.
444459
460+ == Alignment
461+
462+ If `InputIteratorT`/`OutputIteratorT` is a pointer then the following property can be used
463+ to provide an alignment of the pointer. It can allow to avoid dynamic alignment check.
464+
465+ ```c++
466+ namespace sycl::ext::oneapi::experimental {
467+ struct alignment_key {
468+ template <int K>
469+ using value_t = property_value<alignment_key, std::integral_constant<int, K>>;
470+ };
471+
472+ template<int K>
473+ inline constexpr alignment_key::value_t<K> alignment;
474+ } // namespace sycl::ext::oneapi::experimental
475+ ```
476+
445477== Usage Example
446478
447479Example shows the simplest case without local memory usage of blocked load
@@ -458,8 +490,8 @@ constexpr std::size_t block_count = 2;
458490constexpr std::size_t size = block_count * block_size * items_per_thread;
459491
460492sycl::queue q;
461- T* input = sycl::malloc_device <T>(size, q);
462- T* output = sycl::malloc_device <T>(size, q);
493+ T* input = sycl::aligned_alloc_device <T>(16, size, q);
494+ T* output = sycl::aligned_alloc_device <T>(16, size, q);
463495
464496q.submit([&](sycl::handler& cgh) {
465497 cgh.parallel_for(
@@ -472,7 +504,7 @@ q.submit([&](sycl::handler& cgh) {
472504 auto offset = g.get_group_id(0) * g.get_local_range(0) *
473505 items_per_thread;
474506
475- auto props = sycl_exp::properties{sycl_exp::contiguous_memory};
507+ auto props = sycl_exp::properties{sycl_exp::contiguous_memory, sycl_exp::alignment<16> };
476508
477509 sycl_exp::group_load(g, input + offset, sycl::span{ data }, props);
478510
0 commit comments