@@ -267,45 +267,38 @@ Examples:
267267 @llvm.dx.handle.fromHeap.tdx.RawBuffer_v4f32_1_0(
268268 i32 2, i1 false)
269269
270- Buffer Loads and Stores
271- -----------------------
272-
273- *relevant types: Buffers*
274-
275- We need to treat buffer loads and stores from "dx.TypedBuffer" and
276- "dx.RawBuffer" separately. For TypedBuffer, we have ``llvm.dx.typedBufferLoad``
277- and ``llvm.dx.typedBufferStore``, which load and store 16-byte "rows" of data
278- via a simple index. For RawBuffer, we have ``llvm.dx.rawBufferPtr``, which
279- returns a pointer that can be indexed, loaded from, and stored to as needed.
280-
281- The typed load and store operations always operate on exactly 16 bytes of data,
282- so there are only a few valid overloads. For types that are 32-bits or smaller,
283- we operate on 4-element vectors, such as ``<4 x i32>``, ``<4 x float>``, or
284- ``<4 x half>``. Note that in 16-bit cases each 16-bit value occupies 32-bits of
285- storage. For 64-bit types we operate on 2-element vectors - ``<2 x double>`` or
286- ``<2 x i64>``. When a type like `Buffer<float>` is used at the HLSL level, it
287- is expected that this will operate on a single float in each 16 byte row - that
288- is, a load would use the ``<4 x float>`` variant and then extract the first
289- element.
290-
291- .. note:: In DXC, trying to operate on a ``Buffer<double4>`` crashes the
292- compiler. We should probably just reject this in the frontend.
293-
294- The TypedBuffer intrinsics are lowered to the `bufferLoad`_ and `bufferStore`_
295- operations, and the operations on the memory accessed by RawBufferPtr are
296- lowered to `rawBufferLoad`_ and `rawBufferStore`_. Note that if we want to
297- support DXIL versions prior to 1.2 we'll need to lower the RawBuffer loads and
298- stores to the non-raw operations as well.
299-
300- .. note:: TODO: We need to account for `CheckAccessFullyMapped`_ here.
301-
302- In DXIL the load operations always return an ``i32`` status value, but this
303- isn't very ergonomic when it isn't used. We can (1) bite the bullet and have
304- the loads return `{%ret_type, %i32}` all the time, (2) create a variant or
305- update the signature iff the status is used, or (3) hide this in a sideband
306- channel somewhere. I'm leaning towards (2), but could probably be convinced
307- that the ugliness of (1) is worth the simplicity.
308-
270+ 16-byte Loads, Samples, and Gathers
271+ -----------------------------------
272+
273+ *relevant types: TypedBuffer, CBuffer, and Textures*
274+
275+ TypedBuffer, CBuffer, and Texture loads, as well as samples and gathers, can
276+ return 1 to 4 elements from the given resource, to a maximum of 16 bytes of
277+ data. DXIL's modeling of this is influenced by DirectX and DXBC's history, and
278+ it generally treats these operations as returning four 32-bit values. For 16-bit
279+ elements the returned values are 16 bits wide, and for 64-bit elements the
280+ operations return four 32-bit integers and emit further code to construct the double.
281+
282+ In DXIL, these operations return `ResRet`_ and `CBufRet`_ values, which are
283+ structs containing 4 elements of the same type and, in the case of `ResRet`, a
284+ 5th element that is used by the `CheckAccessFullyMapped`_ operation.
285+
286+ In LLVM IR the intrinsics will return the contained type of the resource
287+ instead. That is, ``llvm.dx.typedBufferLoad`` from a ``Buffer<float>`` would
288+ return a single float, from ``Buffer<float4>`` a vector of 4 floats, and from
289+ ``Buffer<double2>`` a vector of two doubles, etc. The operations are then
290+ expanded out to match DXIL's format during lowering.
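+
+ As a rough sketch of that expansion, a scalar load from a ``Buffer<float>``
+ might be lowered to a DXIL ``bufferLoad`` followed by an extract of the first
+ element. The ``%handle`` and ``%val`` names, the ``undef`` second coordinate,
+ and the opcode value 68 are illustrative assumptions, not output copied from
+ the compiler.
+
+ .. code-block:: llvm
+
+    ; Before lowering: the intrinsic returns the contained type directly.
+    %val = call float
+        @llvm.dx.typedBufferLoad.f32.tdx.TypedBuffer_f32_0_0_0t(
+            target("dx.TypedBuffer", float, 0, 0, 0) %buffer, i32 %index)
+
+    ; After lowering (assumed shape): DXIL returns a ResRet struct,
+    ; %dx.types.ResRet.f32 = type { float, float, float, float, i32 },
+    ; and only element 0 is used for a single-float buffer.
+    %res = call %dx.types.ResRet.f32 @dx.op.bufferLoad.f32(
+        i32 68, %dx.types.Handle %handle, i32 %index, i32 undef)
+    %val = extractvalue %dx.types.ResRet.f32 %res, 0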
291+
292+ In cases where we need ``CheckAccessFullyMapped``, we have a second intrinsic
293+ that returns an anonymous struct with element-0 being the contained type, and
294+ element-1 being the ``i1`` result of a ``CheckAccessFullyMapped`` call. We
295+ don't have a separate call to ``CheckAccessFullyMapped`` at all, since that's
296+ the only operation that can possibly be done on this value. This may mean we
297+ insert a DXIL operation for the check when this was missing in the HLSL
298+ source, but this matches DXC's behaviour in practice.
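+
+ For illustration, a caller could unpack the result of this second intrinsic
+ with ordinary ``extractvalue`` instructions. This is a minimal sketch: the
+ ``%data`` and ``%mapped`` names are hypothetical, and the intrinsic name
+ follows the ``checkbit`` example given for ``@llvm.dx.typedBufferLoad.checkbit``.
+
+ .. code-block:: llvm
+
+    %ret = call {<4 x float>, i1}
+        @llvm.dx.typedBufferLoad.checkbit.v4f32.tdx.TypedBuffer_v4f32_0_0_0t(
+            target("dx.TypedBuffer", <4 x float>, 0, 0, 0) %buffer, i32 %index)
+    ; Element 0 is the loaded data, element 1 is the check bit.
+    %data = extractvalue {<4 x float>, i1} %ret, 0
+    %mapped = extractvalue {<4 x float>, i1} %ret, 1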
299+
300+ .. _ResRet: https://github.com/microsoft/DirectXShaderCompiler/blob/main/docs/DXIL.rst#resource-operation-return-types
301+ .. _CBufRet: https://github.com/microsoft/DirectXShaderCompiler/blob/main/docs/DXIL.rst#cbufferloadlegacy
309302.. _CheckAccessFullyMapped: https://learn.microsoft.com/en-us/windows/win32/direct3dhlsl/checkaccessfullymapped
310303
311304.. list-table:: ``@llvm.dx.typedBufferLoad``
@@ -317,7 +310,7 @@ stores to the non-raw operations as well.
317310 - Description
318311 * - Return value
319312 -
320- - A 4- or 2-element vector of the type of the buffer
313+ - The contained type of the buffer
321314 - The data loaded from the buffer
322315 * - ``%buffer``
323316 - 0
@@ -332,16 +325,23 @@ Examples:
332325
333326.. code-block:: llvm
334327
335- %ret = call <4 x float> @llvm.dx.typedBufferLoad.tdx.TypedBuffer_f32_0_0t(
336- target("dx.TypedBuffer", f32, 0, 0) %buffer, i32 %index)
337- %ret = call <4 x i32> @llvm.dx.typedBufferLoad.tdx.TypedBuffer_i32_0_0t(
338- target("dx.TypedBuffer", i32, 0, 0) %buffer, i32 %index)
339- %ret = call <4 x half> @llvm.dx.typedBufferLoad.tdx.TypedBuffer_f16_0_0t(
340- target("dx.TypedBuffer", f16, 0, 0) %buffer, i32 %index)
341- %ret = call <2 x double> @llvm.dx.typedBufferLoad.tdx.TypedBuffer_f64_0_0t(
342- target("dx.TypedBuffer", double, 0, 0) %buffer, i32 %index)
343-
344- .. list-table:: ``@llvm.dx.typedBufferStore``
328+ %ret = call <4 x float>
329+ @llvm.dx.typedBufferLoad.v4f32.tdx.TypedBuffer_v4f32_0_0_0t(
330+ target("dx.TypedBuffer", <4 x float>, 0, 0, 0) %buffer, i32 %index)
331+ %ret = call float
332+ @llvm.dx.typedBufferLoad.f32.tdx.TypedBuffer_f32_0_0_0t(
333+ target("dx.TypedBuffer", float, 0, 0, 0) %buffer, i32 %index)
334+ %ret = call <4 x i32>
335+ @llvm.dx.typedBufferLoad.v4i32.tdx.TypedBuffer_v4i32_0_0_0t(
336+ target("dx.TypedBuffer", <4 x i32>, 0, 0, 0) %buffer, i32 %index)
337+ %ret = call <4 x half>
338+ @llvm.dx.typedBufferLoad.v4f16.tdx.TypedBuffer_v4f16_0_0_0t(
339+ target("dx.TypedBuffer", <4 x half>, 0, 0, 0) %buffer, i32 %index)
340+ %ret = call <2 x double>
341+ @llvm.dx.typedBufferLoad.v2f64.tdx.TypedBuffer_v2f64_0_0_0t(
342+ target("dx.TypedBuffer", <2 x double>, 0, 0, 0) %buffer, i32 %index)
343+
344+ .. list-table:: ``@llvm.dx.typedBufferLoad.checkbit``
345345 :header-rows: 1
346346
347347 * - Argument
@@ -350,46 +350,11 @@ Examples:
350350 - Description
351351 * - Return value
352352 -
353- - ``void``
354- -
353+ - A structure of the contained type and the check bit
354+ - The data loaded from the buffer and the check bit
355355 * - ``%buffer``
356356 - 0
357357 - ``target(dx.TypedBuffer, ...)``
358- - The buffer to store into
359- * - ``%index``
360- - 1
361- - ``i32``
362- - Index into the buffer
363- * - ``%data``
364- - 2
365- - A 4- or 2-element vector of the type of the buffer
366- - The data to store
367-
368- Examples:
369-
370- .. code-block:: llvm
371-
372- call void @llvm.dx.bufferStore.tdx.Buffer_f32_1_0t(
373- target("dx.TypedBuffer", f32, 1, 0) %buf, i32 %index, <4 x f32> %data)
374- call void @llvm.dx.bufferStore.tdx.Buffer_f16_1_0t(
375- target("dx.TypedBuffer", f16, 1, 0) %buf, i32 %index, <4 x f16> %data)
376- call void @llvm.dx.bufferStore.tdx.Buffer_f64_1_0t(
377- target("dx.TypedBuffer", f64, 1, 0) %buf, i32 %index, <2 x f64> %data)
378-
379- .. list-table:: ``@llvm.dx.rawBufferPtr``
380- :header-rows: 1
381-
382- * - Argument
383- -
384- - Type
385- - Description
386- * - Return value
387- -
388- - ``ptr``
389- - Pointer to an element of the buffer
390- * - ``%buffer``
391- - 0
392- - ``target(dx.RawBuffer, ...)``
393358 - The buffer to load from
394359 * - ``%index ``
395360 - 1
@@ -400,37 +365,7 @@ Examples:
400365
401366.. code-block:: llvm
402367
403- ; Load a float4 from a buffer
404- %buf = call ptr @llvm.dx.rawBufferPtr.tdx.RawBuffer_v4f32_0_0t(
405- target("dx.RawBuffer", <4 x f32>, 0, 0) %buffer, i32 %index)
406- %val = load <4 x float>, ptr %buf, align 16
407-
408- ; Load the double from a struct containing an int, a float, and a double
409- %buf = call ptr @llvm.dx.rawBufferPtr.tdx.RawBuffer_sl_i32f32f64s_0_0t(
410- target("dx.RawBuffer", {i32, f32, f64}, 0, 0) %buffer, i32 %index)
411- %val = getelementptr inbounds {i32, f32, f64}, ptr %buf, i32 0, i32 2
412- %d = load double, ptr %val, align 8
413-
414- ; Load a float from a byte address buffer
415- %buf = call ptr @llvm.dx.rawBufferPtr.tdx.RawBuffer_i8_0_0t(
416- target("dx.RawBuffer", i8, 0, 0) %buffer, i32 %index)
417- %val = getelementptr inbounds float, ptr %buf, i64 0
418- %f = load float, ptr %val, align 4
419-
420- ; Store to a buffer containing float4
421- %addr = call ptr @llvm.dx.rawBufferPtr.tdx.RawBuffer_v4f32_0_0t(
422- target("dx.RawBuffer", <4 x f32>, 0, 0) %buffer, i32 %index)
423- store <4 x float> %val, ptr %addr
424-
425- ; Store the double in a struct containing an int, a float, and a double
426- %buf = call ptr @llvm.dx.rawBufferPtr.tdx.RawBuffer_sl_i32f32f64s_0_0t(
427- target("dx.RawBuffer", {i32, f32, f64}, 0, 0) %buffer, i32 %index)
428- %addr = getelementptr inbounds {i32, f32, f64}, ptr %buf, i32 0, i32 2
429- store double %d, ptr %addr
430-
431- ; Store a float into a byte address buffer
432- %buf = call ptr @llvm.dx.rawBufferPtr.tdx.RawBuffer_i8_0_0t(
433- target("dx.RawBuffer", i8, 0, 0) %buffer, i32 %index)
434- %addr = getelementptr inbounds float, ptr %buf, i64 0
435- store float %f, ptr %val
368+ %ret = call {<4 x float>, i1}
369+ @llvm.dx.typedBufferLoad.checkbit.v4f32.tdx.TypedBuffer_v4f32_0_0_0t(
370+ target("dx.TypedBuffer", <4 x float>, 0, 0, 0) %buffer, i32 %index)
436371