Graph update can't handle duplicate disconnected nodes

### Describe the bug

Updating a graph that contains multiple instances of the same node, with no dependencies between them, produces incorrect results.

I might be wrong about the root cause here, but it's the best explanation I can come up with.

### To reproduce

```c++
#include <algorithm>
#include <cstddef>
#include <cstdio>
#include <iostream>
#include <memory>
#include <sycl/sycl.hpp>

int main()
{
  sycl::queue q;

  static constexpr size_t R = 10;
  static constexpr size_t I = 5;
  int* output = sycl::malloc_shared<int>(I, q);
  std::fill(output, output + I, 0);

  std::unique_ptr<sycl::ext::oneapi::experimental::command_graph<sycl::ext::oneapi::experimental::graph_state::executable>> graph;
  for (int r = 0; r < R; ++r) {

    sycl::ext::oneapi::experimental::command_graph<sycl::ext::oneapi::experimental::graph_state::modifiable> modifiable_graph(q.get_context(), q.get_device());
    for (size_t i = 1; i < I; ++i) {
      sycl::range global = {i, i, i};
      sycl::range local = {i, i, i};
      modifiable_graph.add([=](sycl::handler& h) {
        h.parallel_for<class test>(sycl::nd_range{global, local}, [=](sycl::nd_item<3> it) noexcept {
          if (it.get_group().leader()) {
            output[i]++;
          }
        });
      });
    }

    if (r == 0) {
      printf("Building graph\n");
      const auto instance = modifiable_graph.finalize(sycl::ext::oneapi::experimental::property::graph::updatable{});
      graph = std::make_unique<sycl::ext::oneapi::experimental::command_graph<sycl::ext::oneapi::experimental::graph_state::executable>>(std::move(instance));
    }
    else {
      printf("Updating graph\n");
      graph->update(modifiable_graph);
    }
    printf("Launching graph\n");
    q.ext_oneapi_graph(*graph).wait();
  }

  q.wait();

  for (size_t i = 0; i < I; ++i) {
    std::cout << i << ": " << output[i] << std::endl;
  }

  sycl::free(output, q);
}
```

Compile with:
```
clang++ -fsycl -fsycl-targets=nvptx64-nvidia-cuda update.cpp
```

Run with:
```
./a.out
```

### Expected Output

Each kernel should execute 10 times, updating location _i_ each time, so the output should be:

```
0: 0
1: 10
2: 10
3: 10
4: 10
```

This is the output that I get if I don't use graphs, or if I replace `r == 0` with `true` (to force re-building the graph every time).

### Observed Output

```
0: 0
1: 1
2: 1
3: 1
4: 37
```

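As far as I can tell, the first graph launch correctly executes one instance of each kernel, but every launch after an update executes four instances of the kernel with _i_ set to 4. That would account for the numbers: locations 1 to 3 are incremented only once, while location 4 is incremented once by the first launch and four times by each of the nine updated launches (1 + 9 × 4 = 37).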
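
That reading of the output can be checked without a GPU. The following is a minimal host-only sketch, not SYCL code: `simulate_counts` and its parameters are hypothetical names introduced here, and the "collapse" behavior is only a model of the suspected bug, not a confirmed description of what the runtime does.

```c++
#include <cstddef>
#include <vector>

// Host-only model of the reproducer; no SYCL involved.
// rounds:           number of graph launches (R in the reproducer).
// slots:            size of the output array (I in the reproducer).
// updates_collapse: if true, model the hypothesis that every launch after an
//                   update runs all nodes with i equal to the last node's
//                   value (slots - 1). This is an assumption, not a known fact.
std::vector<int> simulate_counts(std::size_t rounds, std::size_t slots, bool updates_collapse)
{
  std::vector<int> output(slots, 0);
  for (std::size_t r = 0; r < rounds; ++r) {
    // One node per i in [1, slots), as in the reproducer's inner loop.
    for (std::size_t i = 1; i < slots; ++i) {
      // The first launch uses the freshly finalized graph and is assumed correct.
      const std::size_t target = (updates_collapse && r > 0) ? slots - 1 : i;
      ++output[target];
    }
  }
  return output;
}
```

Calling `simulate_counts(10, 5, true)` yields the same counts as the observed output, and `simulate_counts(10, 5, false)` yields the expected output, so the numbers are at least consistent with the hypothesis.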

### Environment

- OS: Linux
- GPU: NVIDIA A100
- DPC++ version: ede5e444e68b6ce9be5e4fa981fc7798c8da20ec
- Driver version: 570.133.20, CUDA version: 12.8


### Additional context

_No response_
