Closed
@@ -0,0 +1,95 @@
# `sycl_ext_oneapi_matrix` extension constraints specific to the `ext_oneapi_cuda` backend.
:source-highlighter: coderay
:coderay-linenums-mode: table
:dpcpp: pass:[DPC++]

// This section needs to be after the document title.
:doctype: book
:toc2:
:toc: left
:encoding: utf-8
:lang: en

:blank: pass:[ +]

// Set the default source code type in this document to C++,
// for syntax highlighting purposes. This is needed because
// docbook uses c++ and html5 uses cpp.
:language: {basebackend@docbook:c++:cpp}


== Notice

Copyright (c) 2022-2022 Intel Corporation. All rights reserved.

NOTE: Khronos(R) is a registered trademark and SYCL(TM) and SPIR(TM) are
trademarks of The Khronos Group Inc. OpenCL(TM) is a trademark of Apple Inc.
used by permission by Khronos.

This extension is written against the SYCL 2020 revision 6 specification. All
references below to the "core SYCL specification" or to section numbers in the
SYCL specification refer to that revision.


**_NOTE:_** This document describes the current design and API of the features of the matrix
extension to {dpcpp} that are specific to the `ext_oneapi_cuda` backend. This is an initial experimental version intended for trying out functionality
and performance, and **future versions of this API may change in ways that are incompatible with this experimental version**.

## Introduction
The `ext_oneapi_cuda` backend supports `joint_matrix`, `joint_matrix_load`, `joint_matrix_store`, `joint_matrix_mad` and `joint_matrix_fill` as they are defined in the `sycl_ext_oneapi_matrix` extension. The complete set of `joint_matrix` types and shapes that are valid in the `ext_oneapi_cuda` backend is listed in this document.
This extension also presents the constraints that apply specifically when using the `ext_oneapi_cuda` backend and that may not apply to the `sycl_ext_oneapi_matrix` extension in general.

### Valid `joint_matrix` types and shapes

The complete set of matrix data types and shapes supported by the `ext_oneapi_cuda` backend is given in the following table. Tm indicates the matrix element data type held by a "multiplicand" `joint_matrix`, i.e. one requiring `use::a` or `use::b`. Tc indicates the matrix element data type held by an "accumulator" `joint_matrix`, i.e. one requiring `use::accumulator`.
--
[.center]
|======================
|Tm (`use::a` or `use::b`) |Tc (`use::accumulator`) |M |N |K | Minimum Compute Capability
.3+|half .3+|float
|16 |16 |16| sm_70
|8 |32 |16| sm_70
|32 |8 |16| sm_70
.3+|half .3+|half
|16 |16 |16| sm_70
|8 |32 |16| sm_70
|32 |8 |16| sm_70
.3+|int8_t .3+|int32_t
|16 |16 |16| sm_72
|8 |32 |16| sm_72
|32 |8 |16| sm_72
.3+|uint8_t .3+|int32_t
|16 |16 |16| sm_72
|8 |32 |16| sm_72
|32 |8 |16| sm_72
|precision::tf32 |float |16 |16 |8| sm_80
.3+|bfloat16 .3+|float
|16 |16 |16 |sm_80
|8 |32 |16 |sm_80
|32 |8 |16 |sm_80
|double |double |8 |8 |4 |sm_80
|======================
--
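
As an illustration only, the table above can be transcribed into a compile-time lookup. The following is a minimal sketch in plain C++ that does not use any SYCL header. The names `elem`, `combo`, `supported_combos` and `min_compute_capability` are hypothetical helpers introduced here and are not part of the `sycl_ext_oneapi_matrix` API. Compute capabilities are encoded as integers (for example, 70 for sm_70), and a return value of 0 is an assumed convention meaning "not listed in the table".

```cpp
// Hypothetical element-type tags; these are NOT SYCL types.
enum class elem { f16, f32, f64, s8, u8, s32, tf32, bf16 };

// One row of the table: multiplicand type, accumulator type, shape, minimum sm_XX.
struct combo {
  elem tm;
  elem tc;
  int m, n, k;
  int min_sm;
};

// Transcription of the table of valid types and shapes.
constexpr combo supported_combos[] = {
    {elem::f16, elem::f32, 16, 16, 16, 70}, {elem::f16, elem::f32, 8, 32, 16, 70},
    {elem::f16, elem::f32, 32, 8, 16, 70},  {elem::f16, elem::f16, 16, 16, 16, 70},
    {elem::f16, elem::f16, 8, 32, 16, 70},  {elem::f16, elem::f16, 32, 8, 16, 70},
    {elem::s8, elem::s32, 16, 16, 16, 72},  {elem::s8, elem::s32, 8, 32, 16, 72},
    {elem::s8, elem::s32, 32, 8, 16, 72},   {elem::u8, elem::s32, 16, 16, 16, 72},
    {elem::u8, elem::s32, 8, 32, 16, 72},   {elem::u8, elem::s32, 32, 8, 16, 72},
    {elem::tf32, elem::f32, 16, 16, 8, 80}, {elem::bf16, elem::f32, 16, 16, 16, 80},
    {elem::bf16, elem::f32, 8, 32, 16, 80}, {elem::bf16, elem::f32, 32, 8, 16, 80},
    {elem::f64, elem::f64, 8, 8, 4, 80},
};

// Returns the minimum compute capability for a combination,
// or 0 (assumed convention) if the combination is not in the table.
constexpr int min_compute_capability(elem tm, elem tc, int m, int n, int k) {
  for (const combo &c : supported_combos) {
    if (c.tm == tm && c.tc == tc && c.m == m && c.n == n && c.k == k)
      return c.min_sm;
  }
  return 0;
}
```

A caller could, for example, use `static_assert(min_compute_capability(elem::tf32, elem::f32, 16, 16, 8) != 0, "unsupported combination")` to reject an unsupported combination at compile time.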

Each M, N, K triple from the above table determines the shape of the `joint_matrix` that can be constructed for each `use`:
--
[.center]
|======================
|use |NumRows | NumCols
|a |M |K
|b |K |N
|accumulator | M| N
|======================
--
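
The mapping from an M, N, K triple to the dimensions of each matrix can be sketched as a small plain C++ function. The names `matrix_use`, `matrix_shape` and `shape_for` are hypothetical and exist only for this illustration; `matrix_use` merely mirrors the `use` values of the extension and is not the extension's own enumeration.

```cpp
// Hypothetical stand-in for the extension's `use` values.
enum class matrix_use { a, b, accumulator };

struct matrix_shape {
  int rows;
  int cols;
};

// Applies the use -> (NumRows, NumCols) table to a given M, N, K triple.
constexpr matrix_shape shape_for(matrix_use u, int m, int n, int k) {
  switch (u) {
  case matrix_use::a:
    return {m, k};
  case matrix_use::b:
    return {k, n};
  case matrix_use::accumulator:
    return {m, n};
  }
  return {0, 0}; // unreachable for valid enumerators
}
```

For the `precision::tf32` row (M = 16, N = 16, K = 8), this gives a 16 x 8 `a` matrix, an 8 x 16 `b` matrix and a 16 x 16 accumulator.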

### Additional constraints in the `ext_oneapi_cuda` backend

IMPORTANT: The `stride` argument to `joint_matrix_load` and `joint_matrix_store` must be a multiple of 8 when `T` is `half`, and a multiple of 4 when `T` is `float`; where `T` is the type of the `joint_matrix` elements.
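
A stride can be validated on the host before a kernel is launched. The following plain C++ sketch covers only the two element types named in the constraint above (`half` and `float`); the names `stride_elem`, `required_stride_multiple` and `stride_is_valid` are hypothetical and not part of the extension.

```cpp
#include <cstddef>

// Hypothetical tags for the two element types named in the constraint.
enum class stride_elem { half_type, float_type };

// Multiple that the stride must satisfy: 8 for half, 4 for float.
constexpr std::size_t required_stride_multiple(stride_elem t) {
  return t == stride_elem::half_type ? 8 : 4;
}

// True if `stride` satisfies the constraint for element type `t`.
constexpr bool stride_is_valid(stride_elem t, std::size_t stride) {
  return stride % required_stride_multiple(t) == 0;
}
```

For example, a stride of 20 passes the check for `float` but fails it for `half`.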

Contributor

Is this a functional or performance requirement?
If functional, can there be a workaround to support other strides (like some sort of padding at the load level)?

Contributor Author

@JackAKirk JackAKirk Dec 7, 2022

This is functional. The PTX builtin requires this constraint. A workaround isn't possible.

## Revision History

[frame="none",options="header"]
|======================
|Rev |Date |Author |Changes
|1 |2022-10-05 |Jack Kirk |Initial public working draft.
|======================