@@ -1455,7 +1455,6 @@ The AMDGPU backend implements the following LLVM IR intrinsics.
                                                Returns a pair for the swapped registers. The first element of the return corresponds
                                                to the swapped element of the first argument.
 
-
 llvm.amdgcn.permlane32.swap                    Provide direct access to `v_permlane32_swap_b32` instruction on supported targets.
                                                Swaps the values across lanes of first 2 operands. Rows 2 and 3 of the first operand are
                                                swapped with rows 0 and 1 of the second operand (one row is 16 lanes).
@@ -1476,6 +1475,25 @@ The AMDGPU backend implements the following LLVM IR intrinsics.
                                                - `v_mov_b32 <dest> <old>`
                                                - `v_mov_b32 <dest> <src> <dpp_ctrl> <row_mask> <bank_mask> <bound_ctrl>`
 
+:ref:`llvm.prefetch <int_prefetch>`            Implemented on gfx1250, ignored on earlier targets.
+                                               First argument is flat, global, or constant address space pointer.
+                                               Any other address space is not supported.
+                                               On gfx125x generates flat_prefetch_b8 or global_prefetch_b8 and brings data to GL2.
+                                               Second argument is rw and currently ignored. Can be 0 or 1.
+                                               Third argument is locality, 0-3. Translates to memory scope:
+
+                                               * 0 - SCOPE_SYS
+                                               * 1 - SCOPE_DEV
+                                               * 2 - SCOPE_SE
+                                               * 3 - SCOPE_SE
+
+                                               Note that SCOPE_CU is not generated and not safe on an invalid address.
+                                               Fourth argument is cache type:
+
+                                               * 0 - Instruction cache, currently ignored and no code is generated.
+                                               * 1 - Data cache.
+
+                                               Instruction cache prefetches are unsafe on invalid address.
 ============================================== ==========================================================
 
 .. TODO::
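
For readers mapping the new `llvm.prefetch` table entry to source code, here is a minimal C++ sketch. It is not backend code. It assumes that Clang lowers `__builtin_prefetch(addr, rw, locality)` to `llvm.prefetch` with the cache type fixed at 1 (data cache). `Scope`, `scope_for_locality`, and `prefetch_for_read` are hypothetical names that only restate the locality-to-scope mapping documented in the diff.

```cpp
#include <cassert>

// Hypothetical enum naming the memory scopes listed in the documentation.
enum class Scope { Sys, Dev, Se };

// Illustrative model of the documented locality-to-scope table for gfx1250.
// This restates the table only; it is not how the backend is implemented.
inline Scope scope_for_locality(int locality) {
  assert(locality >= 0 && locality <= 3 && "locality must be in 0-3");
  switch (locality) {
  case 0:
    return Scope::Sys; // SCOPE_SYS
  case 1:
    return Scope::Dev; // SCOPE_DEV
  default:
    return Scope::Se;  // localities 2 and 3 both map to SCOPE_SE
  }
}

// Assumption: with Clang this builtin becomes a call to llvm.prefetch with
// rw = 0, locality = 1, cache type = 1 (data). The rw and locality arguments
// must be compile-time constants.
inline void prefetch_for_read(const void *p) {
  __builtin_prefetch(p, /*rw=*/0, /*locality=*/1);
}
```

Per the documented table, `prefetch_for_read` would correspond to a SCOPE_DEV prefetch on gfx1250 and would be ignored on earlier targets. The rw argument is accepted but currently has no effect.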