Description
Describe the bug
When a queue is in recording mode, queue::memset is accepted without error, but the operation is executed immediately during recording instead of being added to the graph.
This differs from handler::memset, which is known to be unsupported at present and correctly raises an error when used.
To Reproduce
Here, we perform the same sequence of operations with queue::fill and queue::memset, both operating on USM device memory.
The code below initializes two values to a pattern of ones before recording the graph. The graph sets both values to a pattern of twos, using queue::memset for the first and queue::fill for the second (at this point the graph is only recorded, not executed). The values are then reset to a pattern of threes, and finally the graph is executed.
$ clang++ -fsycl -g test_graph_memset.cpp -o test_graph_memset && ONEAPI_DEVICE_SELECTOR=level_zero:0 ./test_graph_memset
Intel(R) Arc(TM) A770 Graphics : native
Setting to 0x01010101 outside the graph: memset=0x1010101; fill=0x1010101
After recording the graph, should be still 0x1010101: memset=0x2020202; fill=0x1010101
Resetting to 0x03030303 outside the graph: memset=0x3030303; fill=0x3030303
After replaying the graph, should be 0x2020202: memset=0x3030303; fill=0x2020202
#include <array>
#include <iostream>
#include <vector>
#include <sycl/sycl.hpp>

namespace syclex = sycl::ext::oneapi::experimental;

int main() {
  for (const auto &dev : sycl::device::get_devices()) {
    using graph_support = syclex::info::device::graph_support;
    using gsl = syclex::graph_support_level;
    const auto gs = dev.get_info<graph_support>();
    std::cout << dev.get_info<sycl::info::device::name>() << " : "
              << (gs == gsl::unsupported
                      ? "unsupported"
                      : (gs == gsl::emulated ? "emulated" : "native"))
              << std::endl;
    if (gs != gsl::unsupported) {
      sycl::context ctx{dev};
      sycl::queue q1{ctx, dev, {sycl::property::queue::in_order()}};
      std::vector<sycl::queue> queuesToRecord{q1};
      const sycl::property_list propList{
          syclex::property::graph::no_cycle_check()};
      syclex::command_graph<syclex::graph_state::modifiable> graph(ctx, dev,
                                                                   propList);
      constexpr int N = 2;
      constexpr int S = sizeof(int);
      // v_dev[0] is written with memset, v_dev[1] with fill.
      int *v_dev = sycl::malloc_device<int>(N, dev, ctx);
      std::array<int, N> v_host;

      // Step 1: set both values to a pattern of ones, outside the graph.
      q1.memset(v_dev + 0, 0x01, S);
      q1.fill(v_dev + 1, 0x01010101, 1);
      q1.memcpy(v_host.data(), v_dev, N * S).wait();
      std::cout << "Setting to 0x01010101 outside the graph: memset=0x"
                << std::hex << int{v_host[0]} << "; fill=0x"
                << int{v_host[1]} << std::endl;

      // Step 2: record a graph that sets both values to a pattern of twos.
      bool result = graph.begin_recording(queuesToRecord);
      if (!result) {
        std::cout << " Could not start the recording" << std::endl;
      }
      q1.memset(v_dev + 0, 0x02, S);
      q1.fill(v_dev + 1, 0x02020202, 1);
      graph.end_recording();

      // Nothing recorded should have executed yet.
      q1.memcpy(v_host.data(), v_dev, N * S).wait();
      std::cout
          << "After recording the graph, should be still 0x1010101: memset=0x"
          << std::hex << int{v_host[0]} << "; fill=0x" << int{v_host[1]}
          << std::endl;

      // Step 3: reset both values to a pattern of threes, outside the graph.
      q1.memset(v_dev + 0, 0x03, S);
      q1.fill(v_dev + 1, 0x03030303, 1);
      q1.memcpy(v_host.data(), v_dev, N * S).wait();
      std::cout << "Resetting to 0x03030303 outside the graph: memset=0x"
                << std::hex << int{v_host[0]} << "; fill=0x"
                << int{v_host[1]} << std::endl;

      // Step 4: finalize and replay the graph; both values should become twos.
      graph.print_graph("out.dot");
      auto instance = graph.finalize();
      q1.ext_oneapi_graph(instance).wait_and_throw();
      q1.memcpy(v_host.data(), v_dev, N * S).wait();
      std::cout << "After replaying the graph, should be 0x2020202: memset=0x"
                << std::hex << int{v_host[0]} << "; fill=0x"
                << int{v_host[1]} << std::endl;

      sycl::free(v_dev, ctx);
    }
  }
}
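For reference, the reproducer compares memset with a single byte value against fill with a full int pattern, so the two printed values are only expected to match because a byte-wise memset over a 4-byte int replicates that byte four times. A minimal host-only sketch of that equivalence follows; it uses plain C++ rather than SYCL, and the helper names repeat_byte and memset_word are hypothetical, introduced only for illustration:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper (not part of SYCL): expand one byte into the 32-bit
// pattern that a byte-wise memset over a 4-byte integer would produce,
// e.g. 0x02 -> 0x02020202.
constexpr std::uint32_t repeat_byte(std::uint8_t b) {
  return std::uint32_t{b} * 0x01010101u;
}

// Hypothetical host-side stand-in for the memset call in the reproducer:
// set every byte of a 32-bit word to `b` and return the resulting word.
inline std::uint32_t memset_word(std::uint8_t b) {
  std::uint32_t word = 0;
  std::memset(&word, b, sizeof(word));
  return word;
}
```

Under this assumption, memset_word(0x02) and repeat_byte(0x02) both give 0x02020202, which is the value the fill call in the reproducer passes explicitly. Any difference between the memset and fill columns in the output above therefore points at when the operation ran, not at the pattern itself.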
Environment:
- OS: Ubuntu 22.04
- Target device and vendor: Intel Arc A770, NVIDIA RTX3060
- DPC++ version: 9322d14
- Dependencies version: Intel Compute Runtime 23.30.26918.9, CUDA 12.3
With CUDA, the behavior is identical to Level Zero.