Closed
Hello. I have built clang with the CUDA backend; however, a blur kernel (which I've reduced to a simple data copy) fails at run time with an illegal memory access. The same code works fine with the Windows version of oneAPI on a CPU device, and with a custom-built version of AdaptiveCpp on a CUDA device.
The output of the program with all error messages:
Running on device: NVIDIA GeForce RTX 3080 Laptop GPU
UR CUDA ERROR:
Value: 700
Name: CUDA_ERROR_ILLEGAL_ADDRESS
Description: an illegal memory access was encountered
Function: operator ()
Source Location: D:\source\dpcpp\llvm\build\_deps\unified-runtime-src\source\adapters\cuda\queue.cpp:218
UR CUDA ERROR:
Value: 700
Name: CUDA_ERROR_ILLEGAL_ADDRESS
Description: an illegal memory access was encountered
Function: getNextTransferStream
Source Location: D:\source\dpcpp\llvm\build\_deps\unified-runtime-src\source\adapters\cuda\queue.cpp:107
UR CUDA ERROR:
Value: 700
Name: CUDA_ERROR_ILLEGAL_ADDRESS
Description: an illegal memory access was encountered
Function: wait
Source Location: D:\source\dpcpp\llvm\build\_deps\unified-runtime-src\source\adapters\cuda\event.cpp:142
An exception is caught for blur:
Native API failed. Native API returns: -999 (Unknown PI error) -999 (Unknown PI error)
Commands I used to compile and run the program:
set PATH=D:\source\dpcpp\llvm\build\bin;%PATH%
set LIB=D:\source\dpcpp\llvm\build\lib;%LIB%
clang++ -fsycl -fsycl-targets=nvptx64-nvidia-cuda simple_blur1.cpp -o simple_blur1.exe
set ONEAPI_DEVICE_SELECTOR=cuda:*
simple_blur1.exe
OS: Windows 11 Home version : 22H2
CPU: 11th Gen Intel(R) Core(TM) i7-11800H @ 2.30GHz 2.30 GHz
Target device and vendor: NVIDIA GeForce RTX 3080 Laptop GPU Driver Version: 31.0.15.3713
CUDA version: 12.2
clang version 18.0.0 (https://github.com/intel/llvm 8fe166f)
Code which doesn't work:
#include <sycl/sycl.hpp>
#include <cstdlib>    // system
#include <exception>  // std::terminate
#include <iostream>
#include <vector>

using namespace sycl;

const size_t resolution = 2048;
size_t vector_size = resolution * resolution;

// Reduced "blur": copies a_data into b_data element by element on the device.
void Blur(queue& q, std::vector<float>& a_data, std::vector<float>& b_data)
{
    range<2> num_items{ resolution, resolution };

    // 1D buffers wrapping the host vectors.
    buffer a_buf(a_data);
    buffer b_buf(b_data);

    q.submit([&](handler& h) {
        auto a = a_buf.get_access<access::mode::read>(h);
        auto b = b_buf.get_access<access::mode::write>(h);

        // 2D launch; each work-item copies one element using a row-major index.
        h.parallel_for(num_items, [=](auto it)
        {
            int x = it[1];
            int y = it[0];
            float value = a[y * resolution + x];
            b[y * resolution + x] = value;
        });
    });
    q.wait();
}

int main(int argc, char* argv[]) {
    auto d_selector{ sycl::gpu_selector_v };

    std::vector<float> a, b;
    a.resize(vector_size);
    b.resize(vector_size);
    a[0] = 1.0f;

    try
    {
        queue q(d_selector);
        std::cout << "Running on device: "
                  << q.get_device().get_info<info::device::name>() << "\n";
        Blur(q, a, b);
    }
    catch (exception const& e)
    {
        std::cout << "An exception is caught for blur:\n";
        std::cout << e.what();
        std::terminate();
    }

    std::cout << "Blur successfully completed on device.\n";
    system("pause");
    return 0;
}
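For comparison, here is a minimal host-only sketch of the same copy, written in plain C++ without SYCL. It does not explain the CUDA error; it only illustrates that the row-major index used in the kernel stays within the 2048 * 2048 element vectors, and it gives a reference result to compare device output against. The helper names `flat_index`, `copy_reference`, and `max_index_fits_in_int` are hypothetical and do not appear in the original program.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical helper: row-major offset of element (x, y) in a square image
// of the given width, matching the kernel's y * resolution + x.
constexpr std::size_t flat_index(std::size_t x, std::size_t y, std::size_t width) {
    return y * width + x;
}

// Hypothetical check: true if the largest index produced for this resolution
// is representable as an int (the kernel stores x and y in int variables).
constexpr bool max_index_fits_in_int(std::size_t resolution) {
    return resolution > 0 &&
           flat_index(resolution - 1, resolution - 1, resolution) <=
               static_cast<std::size_t>(std::numeric_limits<int>::max());
}

// Hypothetical host reference: copies src into dst element by element,
// visiting elements in the same (y, x) order as the 2D kernel range.
inline void copy_reference(const std::vector<float>& src,
                           std::vector<float>& dst,
                           std::size_t resolution) {
    assert(src.size() == resolution * resolution);
    dst.resize(src.size());
    for (std::size_t y = 0; y < resolution; ++y) {
        for (std::size_t x = 0; x < resolution; ++x) {
            const std::size_t i = flat_index(x, y, resolution);
            dst[i] = src[i];
        }
    }
}
```

With `resolution = 2048`, the largest index is `flat_index(2047, 2047, 2048)`, which is 4194303, one less than `vector_size`. After a successful device run, `b` could be compared against the output of `copy_reference(a, expected, resolution)`.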