Graph update can't handle duplicate disconnected nodes #19450

@Pennycook

Describe the bug

Updating a graph that contains multiple instances of the same node, with no dependencies between them, produces incorrect results.

I might be wrong about the root cause here, but this is the best explanation I can come up with.

To reproduce

#include <algorithm>
#include <cstdint>
#include <cstdio>
#include <iostream>
#include <memory>
#include <sycl/sycl.hpp>

int main()
{
  sycl::queue q;

  static constexpr size_t R = 10;
  static constexpr size_t I = 5;
  int* output = sycl::malloc_shared<int>(I, q);
  std::fill(output, output + I, 0);

  std::unique_ptr<sycl::ext::oneapi::experimental::command_graph<sycl::ext::oneapi::experimental::graph_state::executable>> graph;
  for (size_t r = 0; r < R; ++r) {

    sycl::ext::oneapi::experimental::command_graph<sycl::ext::oneapi::experimental::graph_state::modifiable> modifiable_graph(q.get_context(), q.get_device());
    for (size_t i = 1; i < I; ++i) {
      sycl::range global = {i, i, i};
      sycl::range local = {i, i, i};
      modifiable_graph.add([=](sycl::handler& h) {
      h.parallel_for<class test>(sycl::nd_range{global, local}, [=](sycl::nd_item<3> it) noexcept {
        if (it.get_group().leader()) {
          output[i]++;
        }
      });
      });
    }

    if (r == 0) {
      printf("Building graph\n");
      // Non-const so that the std::move below actually moves instead of copying.
      auto instance = modifiable_graph.finalize(sycl::ext::oneapi::experimental::property::graph::updatable{});
      graph = std::make_unique<sycl::ext::oneapi::experimental::command_graph<sycl::ext::oneapi::experimental::graph_state::executable>>(std::move(instance));
    }
    else {
      printf("Updating graph\n");
      graph->update(modifiable_graph);
    }
    printf("Launching graph\n");
    q.ext_oneapi_graph(*graph).wait();
  }

  q.wait();

  for (size_t i = 0; i < I; ++i) {
    std::cout << i << ": " << output[i] << std::endl;
  }

  sycl::free(output, q);
}

Compile with:

clang++ -fsycl -fsycl-targets=nvptx64-nvidia-cuda update.cpp

Run with:

./a.out

Expected Output

Each kernel should execute 10 times, updating location i each time, so the output should be:

0: 0
1: 10
2: 10
3: 10
4: 10

This is the output that I get if I don't use graphs, or if I replace r == 0 with true (to force re-building the graph every time).

Observed Output

0: 0
1: 1
2: 1
3: 1
4: 37

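As far as I can tell, the first graph launch correctly executes one instance of each kernel, but every updated graph launch just executes multiple instances of the kernel with i set to 4. The counts are consistent with that: 1 from the first launch plus 4 kernels for each of the 9 updated launches gives 37.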

Environment

  • OS: Linux
  • GPU: NVIDIA A100
  • DPC++ version: ede5e44
  • Driver version: 570.133.20, CUDA version: 12.8

Additional context

No response


Labels: bug, cuda, sycl-graph
