
[Graph] queue::memset is executed immediately instead of being recorded #11940

@al42and

Description


Describe the bug

When a queue is in recording mode, queue::memset is accepted without error. However, instead of being added to the graph, it is executed immediately during recording.

This differs from handler::memset, which is known to be unsupported at the moment and correctly raises an error when used.

To Reproduce

Here, we perform the same sequence of operations with queue::fill and with queue::memset, operating on USM device memory.

The code below initializes two values to a pattern of ones before recording the graph. The graph sets both values to a pattern of twos, one using queue::memset and the other using queue::fill (at this point the graph is only recorded, not executed). The values are then reset to a pattern of threes, and the graph is executed.
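
To make the expected semantics explicit, here is a toy, host-only model of a queue in recording mode. RecordingQueue and run_issue_scenario are hypothetical names and not part of the SYCL graph API; the sketch only assumes what the expected output below states, namely that commands submitted while recording are deferred until the graph is replayed.

```cpp
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical, host-only model of record/replay semantics.
// RecordingQueue is NOT part of the SYCL API.
class RecordingQueue {
 public:
  void begin_recording() { recording_ = true; }
  void end_recording() { recording_ = false; }

  // While recording, a command is only stored; otherwise it runs at once.
  void submit(std::function<void()> cmd) {
    if (recording_) {
      recorded_.push_back(std::move(cmd));
    } else {
      cmd();
    }
  }

  // Replaying runs every recorded command, in submission order.
  void replay() const {
    for (const auto &cmd : recorded_) cmd();
  }

 private:
  bool recording_ = false;
  std::vector<std::function<void()>> recorded_;
};

// Mirrors the reproducer's sequence for a single value and returns
// {value right after recording, value after replay}.
std::pair<unsigned, unsigned> run_issue_scenario() {
  unsigned value = 0;
  RecordingQueue q;

  q.submit([&] { value = 0x01010101u; });  // executes immediately
  q.begin_recording();
  q.submit([&] { value = 0x02020202u; });  // recorded, not executed
  q.end_recording();
  const unsigned after_recording = value;
  assert(after_recording == 0x01010101u);  // recording must not touch memory

  q.submit([&] { value = 0x03030303u; });  // executes immediately
  q.replay();                              // applies the recorded write
  return {after_recording, value};
}
```

In this model the memset would behave like the fill in the output below: unchanged after recording, 0x2020202 after replay.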

```console
$ clang++ -fsycl -g test_graph_memset.cpp -o test_graph_memset && ONEAPI_DEVICE_SELECTOR=level_zero:0 ./test_graph_memset
Intel(R) Arc(TM) A770 Graphics : native
Setting to 0x01010101 outside the graph: memset=0x1010101; fill=0x1010101
After recording the graph, should be still 0x1010101: memset=0x2020202; fill=0x1010101
Resetting to 0x03030303 outside the graph: memset=0x3030303; fill=0x3030303
After replaying the graph, should be 0x2020202: memset=0x3030303; fill=0x2020202
```
```cpp
#include <sycl/sycl.hpp>

#include <array>
#include <iostream>
#include <vector>

namespace syclex = sycl::ext::oneapi::experimental;

int main() {
  for (const auto &dev : sycl::device::get_devices()) {
    // Print the graph support level of each device; skip unsupported ones.
    using graph_support = syclex::info::device::graph_support;
    using gsl = syclex::graph_support_level;
    const auto gs = dev.get_info<graph_support>();
    std::cout << dev.get_info<sycl::info::device::name>() << " : "
              << (gs == gsl::unsupported
                      ? "unsupported"
                      : (gs == gsl::emulated ? "emulated" : "native"))
              << std::endl;
    if (gs != gsl::unsupported) {
      sycl::context ctx{dev};
      sycl::queue q1{ctx, dev, {sycl::property::queue::in_order()}};
      std::vector<sycl::queue> queuesToRecord{q1};

      const sycl::property_list propList{
          syclex::property::graph::no_cycle_check()};
      syclex::command_graph<syclex::graph_state::modifiable> graph(ctx, dev,
                                                                   propList);

      // v_dev[0] is written with queue::memset, v_dev[1] with queue::fill.
      constexpr int N = 2;
      constexpr int S = sizeof(int);
      int *v_dev = sycl::malloc_device<int>(N, dev, ctx);
      std::array<int, N> v_host;

      // Outside the graph: set both values to 0x01010101.
      q1.memset(v_dev + 0, 0x01, S);
      q1.fill(v_dev + 1, 0x01010101, 1);
      q1.memcpy(v_host.data(), v_dev, N*S).wait();
      std::cout << "Setting to 0x01010101 outside the graph: memset=0x" << std::hex << int{v_host[0]} << "; fill=0x" << int{v_host[1]} << std::endl;

      // Record a graph that sets both values to 0x02020202.
      bool result = graph.begin_recording(queuesToRecord);
      if (!result) {
        std::cout << "  Could not start the recording" << std::endl;
      }
      q1.memset(v_dev + 0, 0x02, S);
      q1.fill(v_dev + 1, 0x02020202, 1);
      graph.end_recording();

      // Recording alone should not have modified device memory.
      q1.memcpy(v_host.data(), v_dev, N*S).wait();
      std::cout << "After recording the graph, should be still 0x1010101: memset=0x" << std::hex << int{v_host[0]} << "; fill=0x" << int{v_host[1]} << std::endl;

      // Outside the graph: reset both values to 0x03030303.
      q1.memset(v_dev + 0, 0x03, S);
      q1.fill(v_dev + 1, 0x03030303, 1);
      q1.memcpy(v_host.data(), v_dev, N*S).wait();
      std::cout << "Resetting to 0x03030303 outside the graph: memset=0x" << std::hex << int{v_host[0]} << "; fill=0x" << int{v_host[1]} << std::endl;

      // Finalize and replay the graph; both values should become 0x02020202.
      graph.print_graph("out.dot");
      auto instance = graph.finalize();

      q1.ext_oneapi_graph(instance).wait_and_throw();

      q1.memcpy(v_host.data(), v_dev, N*S).wait();
      std::cout << "After replaying the graph, should be 0x2020202: memset=0x" << std::hex << int{v_host[0]} << "; fill=0x" << int{v_host[1]} << std::endl;

      sycl::free(v_dev, ctx);
    }
  }
}
```
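
In the output above, the queue::fill path is recorded correctly, so one possible workaround (an untested assumption, not a confirmed fix) is to express an element-sized memset as a queue::fill with a repeated-byte pattern. The helpers repeat_byte and elements_for_bytes below are hypothetical names; they compute, in standard C++, the element value a memset would leave behind and the matching element count.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>

// Hypothetical helper (not a SYCL API): the value that a memset of `byte`
// leaves in every T-sized element, i.e. the pattern to pass to fill.
template <typename T>
T repeat_byte(unsigned char byte) {
  static_assert(std::is_trivially_copyable<T>::value,
                "repeat_byte needs a trivially copyable type");
  T value;
  std::memset(&value, byte, sizeof(T));
  return value;
}

// Hypothetical helper: number of T elements covered by a memset of
// `num_bytes` bytes. The fill-based rewrite is only equivalent when the
// byte count is a whole number of elements.
template <typename T>
std::size_t elements_for_bytes(std::size_t num_bytes) {
  assert(num_bytes % sizeof(T) == 0 &&
         "memset size is not a whole number of elements");
  return num_bytes / sizeof(T);
}
```

For example, inside the recording, `q1.fill(v_dev + 0, repeat_byte<int>(0x02), elements_for_bytes<int>(S));` would stand in for `q1.memset(v_dev + 0, 0x02, S);`, producing the same 0x02020202 pattern.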

Environment

  • OS: Ubuntu 22.04
  • Target device and vendor: Intel Arc A770, NVIDIA RTX 3060
  • DPC++ version: 9322d14
  • Dependencies version: Intel Compute Runtime 23.30.26918.9, CUDA 12.3

With CUDA, the behavior is identical to that with Level Zero.
