Description
Describe the bug
The spec is somewhat vague about the behavior of is_compatible:
A kernel that is defined in the application is compatible with a device unless:
• It uses optional features which are not supported on the device, as described in Section 5.7; or
• It is decorated with a [[sycl::device_has()]] C++ attribute that lists an aspect that is not supported by
the device, as described in Section 5.8.1.
The current implementation seems to fulfill these requirements, so it is not technically broken.
However, it does not account for whether the device was actually targeted during compilation. One would expect that is_compatible returning true would mean that the kernel can be run on the device. This is not the case.
E.g., when targeting SPIR-V, the function falsely reports an OpenCL AMD device as compatible (an exception is thrown when trying to launch the kernel), and it throws an exception when called for HIP and CUDA devices (one would expect it to return false):
$ clang++ -fsycl -fsycl-targets=spir64 test.cpp -o test
$ ONEAPI_DEVICE_SELECTOR=level_zero:gpu ./test # Works OK!
Checking Intel(R) UHD Graphics 770 [0x4680]
Device is compatible
$ ONEAPI_DEVICE_SELECTOR=opencl:gpu ./test # AMD device reported as compatible, while it is not
Checking Intel(R) UHD Graphics 770 [0x4680]
Device is compatible
Checking gfx1034
Device is compatible
terminate called after throwing an instance of 'sycl::_V1::runtime_error'
what(): Native API failed. Native API returns: -59 (PI_ERROR_INVALID_OPERATION) -59 (PI_ERROR_INVALID_OPERATION)
Aborted (core dumped)
$ ONEAPI_DEVICE_SELECTOR=hip:gpu ./test # Now is_compatible itself throws instead of returning false
Checking AMD Radeon RX 6400
terminate called after throwing an instance of 'sycl::_V1::runtime_error'
what(): Native API failed. Native API returns: -42 (PI_ERROR_INVALID_BINARY) -42 (PI_ERROR_INVALID_BINARY)
Aborted (core dumped)
If we target the NVPTX backend with an unsupported --offload-arch, the kernel is still reported as compatible even though it cannot be launched:
$ clang++ -fsycl -fsycl-targets=nvptx64-nvidia-cuda -Xsycl-target-backend --offload-arch=sm_90 test.cpp -o test
$ ONEAPI_DEVICE_SELECTOR=level_zero:gpu ./test # Throws again
Checking Intel(R) UHD Graphics 770 [0x4680]
terminate called after throwing an instance of 'sycl::_V1::runtime_error'
what(): Native API failed. Native API returns: -42 (PI_ERROR_INVALID_BINARY) -42 (PI_ERROR_INVALID_BINARY)
Aborted (core dumped)
$ ONEAPI_DEVICE_SELECTOR=cuda:gpu ./test # I have an sm_86 device, which is not compatible, yet the function reports that it is
Checking NVIDIA GeForce RTX 3060
Device is compatible
PI CUDA ERROR:
Value: 209
Name: CUDA_ERROR_NO_BINARY_FOR_GPU
Description: no kernel image is available for execution on the device
Function: build_program
Source Location: /home/aland/intel-sycl/llvm/sycl/plugins/cuda/pi_cuda.cpp:733
PI CUDA ERROR:
Value: 400
Name: CUDA_ERROR_INVALID_HANDLE
Description: invalid resource handle
Function: cuda_piProgramRelease
Source Location: /home/aland/intel-sycl/llvm/sycl/plugins/cuda/pi_cuda.cpp:3609
terminate called after throwing an instance of 'sycl::_V1::compile_program_error'
what(): The program was built for 1 devices
Build program log for 'NVIDIA GeForce RTX 3060':
-999 (Unknown PI error)
Aborted (core dumped)
To Reproduce
#include <iostream>
#include <CL/sycl.hpp>

class Kernel;

int main()
{
    for (const auto& dev : sycl::device::get_devices())
    {
        std::cout << "Checking " << dev.get_info<sycl::info::device::name>() << std::endl;
        bool deviceOk = sycl::is_compatible<Kernel>(dev);
        std::cout << "Device is " << (deviceOk ? "compatible" : "incompatible") << std::endl;
        if (deviceOk)
        {
            sycl::queue q{ dev };
            q.submit([&](sycl::handler& cgh) {
                cgh.parallel_for<Kernel>(sycl::range<1>{ 1 },
                                         [=](sycl::id<1> threadId) { int x = threadId[0]; });
            }).wait_and_throw();
        }
    }
    return 0;
}
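As a possible stopgap for the cases above where is_compatible itself throws, the call can be wrapped so that an exception is treated as "incompatible". This is only a sketch: checkOrFalse is a hypothetical helper, not part of SYCL or DPC++, and it assumes the exceptions thrown by the runtime derive from std::exception. It does not help with the false positives (OpenCL AMD device, sm_86 with sm_90 code), where the kernel launch would need its own try/catch.

```cpp
#include <exception>
#include <iostream>
#include <utility>

// Hypothetical helper (not part of SYCL or DPC++): runs a boolean check and
// treats any exception thrown by it as "not compatible".
template <typename Check>
bool checkOrFalse(Check&& check)
{
    try
    {
        return static_cast<bool>(std::forward<Check>(check)());
    }
    catch (const std::exception& e)
    {
        // Assumption: runtime exceptions derive from std::exception.
        std::cerr << "Compatibility check threw: " << e.what() << std::endl;
        return false;
    }
    catch (...)
    {
        // Anything else is also treated as "not compatible".
        return false;
    }
}

// Intended use in the reproducer (requires a SYCL compiler, so it is only
// shown as a comment here):
//   bool deviceOk = checkOrFalse([&] { return sycl::is_compatible<Kernel>(dev); });
```

With this wrapper the HIP run and the Level Zero run with an NVPTX binary would be expected to print "incompatible" instead of aborting, assuming the exception propagates out of is_compatible as the logs suggest.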
Environment (please complete the following information):
- OS: Ubuntu 20.04
- Target device and vendor: Intel, NVIDIA, AMD GPU
- DPC++ version: clang version 16.0.0 (https://github.com/intel/llvm 67f6bba)
- Dependencies version: Intel compute runtime 22.39.24347, CUDA 11.8, ROCm 5.3.3