
[SYCL] Support non-x86 platforms #2333


Merged on Sep 3, 2020 (5 commits)
Changes from 3 commits
sycl/source/detail/platform_util.cpp: 53 changes (49 additions, 4 deletions)
@@ -10,16 +10,24 @@
#include <CL/sycl/exception.hpp>
#include <detail/platform_util.hpp>

#if defined(__x86_64__) || defined(__i386__)
#if defined(SYCL_RT_OS_LINUX)
#include <cpuid.h>
#elif defined(SYCL_RT_OS_WINDOWS)
#include <intrin.h>
#endif
#endif

#if defined(SYCL_RT_OS_LINUX)
#include <errno.h>
#include <unistd.h>
#endif

__SYCL_INLINE_NAMESPACE(cl) {
namespace sycl {
namespace detail {

#if defined(__x86_64__) || defined(__i386__)
// Used by methods that duplicate OpenCL behaviour in order to get CPU info
static void cpuid(uint32_t *CPUInfo, uint32_t Type, uint32_t SubType = 0) {
#if defined(SYCL_RT_OS_LINUX)
@@ -28,11 +36,13 @@ static void cpuid(uint32_t *CPUInfo, uint32_t Type, uint32_t SubType = 0) {
  __cpuidex(reinterpret_cast<int *>(CPUInfo), Type, SubType);
#endif
}
#endif
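The guard pattern in this hunk (x86-only `cpuid` code behind an architecture check, with per-OS intrinsics inside) can be shown in isolation. This is a minimal standalone sketch, not code from the patch: the function name `cpuVendor` and the `SKETCH_HAS_CPUID` macro are hypothetical, the `_M_X64`/`_M_IX86` checks are an addition so plain MSVC also takes the x86 path, and it uses `__get_cpuid` from GCC/Clang's `<cpuid.h>` or `__cpuid` from MSVC's `<intrin.h>` to read the vendor string from leaf 0, returning `"unknown"` everywhere else.

```cpp
#include <cstring>
#include <string>

// Hypothetical guard: the patch keys on __x86_64__/__i386__; the _M_* macros
// are added here so that MSVC proper also takes the x86 path.
#if defined(__x86_64__) || defined(__i386__) || defined(_M_X64) ||            \
    defined(_M_IX86)
#define SKETCH_HAS_CPUID 1
#if defined(_MSC_VER)
#include <intrin.h>
#else
#include <cpuid.h>
#endif
#endif

// Returns the 12-character CPUID vendor string (e.g. "GenuineIntel") on x86,
// or "unknown" on other architectures or if CPUID leaf 0 is unavailable.
std::string cpuVendor() {
#if defined(SKETCH_HAS_CPUID)
  unsigned int Regs[4] = {0, 0, 0, 0}; // EAX, EBX, ECX, EDX
#if defined(_MSC_VER)
  __cpuid(reinterpret_cast<int *>(Regs), 0);
#else
  if (!__get_cpuid(0, &Regs[0], &Regs[1], &Regs[2], &Regs[3]))
    return "unknown";
#endif
  // Leaf 0 returns the vendor string in EBX, EDX, ECX order.
  char Vendor[13] = {};
  std::memcpy(Vendor, &Regs[1], 4);
  std::memcpy(Vendor + 4, &Regs[3], 4);
  std::memcpy(Vendor + 8, &Regs[2], 4);
  return std::string(Vendor);
#else
  return "unknown";
#endif
}
```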

uint32_t PlatformUtil::getMaxClockFrequency() {
  throw runtime_error(
      "max_clock_frequency parameter is not supported for host device",
      PI_INVALID_DEVICE);
#if defined(__x86_64__) || defined(__i386__)
  uint32_t CPUInfo[4];
  string_class Buff(sizeof(CPUInfo) * 3 + 1, 0);
  size_t Offset = 0;
@@ -62,21 +72,49 @@ uint32_t PlatformUtil::getMaxClockFrequency() {
  Buff = Buff.substr(Buff.rfind(' '), Buff.length());
  Freq *= std::stod(Buff);
  return Freq;
#else
#warning Your platform is not supported!
Contributor

This will produce a warning whenever the code is compiled for a non-x86 platform, right? But the patch enables support for non-x86 platforms, so this is a bit confusing.

Contributor Author

The warning is meant to indicate that this specific function is not supported. Should I change the warning? It is very hard to determine the CPU frequency on an ARM platform, and the only approach I found requires a Linux 5.x kernel and read access to /sys...
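For reference, one interface Linux commonly exposes for this is the cpufreq sysfs directory, where `cpuinfo_max_freq` reports the maximum frequency in kHz when a cpufreq driver is loaded. It is not necessarily what the comment above refers to. A hedged sketch, with a hypothetical helper name and a path parameter so it can be pointed at any file:

```cpp
#include <cstdint>
#include <fstream>
#include <string>

// Hypothetical helper. Linux cpufreq files such as
//   /sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_max_freq
// hold a single integer in kHz; this converts it to MHz.
// Returns 0 if the file is missing, unreadable, or does not start with a number.
uint32_t readMaxFreqMHz(
    const std::string &Path =
        "/sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_max_freq") {
  std::ifstream In(Path);
  unsigned long long KHz = 0;
  if (!(In >> KHz))
    return 0;
  return static_cast<uint32_t>(KHz / 1000);
}
```

On machines without a cpufreq driver (some VMs, for example) the default path is absent and the helper returns 0, so a caller would still need the fallback discussed in this thread.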

Contributor

I see; I'm OK with not supporting this function on ARM.
If someone compiles this for ARM and calls this function, 0 will be returned. Should we also throw a runtime error, or is the compile-time warning enough?
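The runtime alternative raised in this question can be sketched as a thin wrapper that treats 0 as "unsupported" and throws instead of returning it silently. This is an illustration only: `maxClockFrequencyOrThrow` is hypothetical, and `std::runtime_error` stands in for SYCL's own `runtime_error` with `PI_INVALID_DEVICE`.

```cpp
#include <cstdint>
#include <functional>
#include <stdexcept>

// Hypothetical wrapper: a platform query that returns 0 is treated as
// "not supported" and reported at run time rather than passed through.
uint32_t maxClockFrequencyOrThrow(const std::function<uint32_t()> &Query) {
  uint32_t Freq = Query();
  if (Freq == 0)
    throw std::runtime_error(
        "max_clock_frequency is not supported on this platform");
  return Freq;
}
```

The trade-off is that the `#warning` fires once at build time for whoever compiles the runtime, while a throw reaches every application that actually asks for the value.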

Contributor Author

Currently, this function throws the following error unconditionally. I expect that will be removed at some point, so I added a placeholder for a non-x86 implementation, to be filled in whenever somebody figures out how to get the frequency of an ARM CPU:

throw runtime_error(
      "max_clock_frequency parameter is not supported for host device",
      PI_INVALID_DEVICE);

#endif
  return 0;
}
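The x86 path of `getMaxClockFrequency()` is only partly visible in this diff: it reads the CPUID brand string and parses the trailing number with `rfind(' ')` and `std::stod`. The following is a standalone sketch of that kind of parsing, assuming brand strings end in a value such as `@ 3.20GHz`; the unit handling and the name `parseBrandFrequencyMHz` are assumptions, not the elided code.

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>

// Parses the trailing frequency from a CPU brand string such as
// "Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz" and returns it in MHz.
// Returns 0 if no THz/GHz/MHz suffix is found or the number does not parse.
uint32_t parseBrandFrequencyMHz(const std::string &Brand) {
  double Multiplier = 0;
  std::string::size_type UnitPos;
  if ((UnitPos = Brand.rfind("THz")) != std::string::npos)
    Multiplier = 1e6;
  else if ((UnitPos = Brand.rfind("GHz")) != std::string::npos)
    Multiplier = 1e3;
  else if ((UnitPos = Brand.rfind("MHz")) != std::string::npos)
    Multiplier = 1;
  else
    return 0;
  // The number sits between the last space before the unit and the unit.
  std::string::size_type Start = Brand.rfind(' ', UnitPos);
  Start = (Start == std::string::npos) ? 0 : Start + 1;
  try {
    double Value = std::stod(Brand.substr(Start, UnitPos - Start));
    return static_cast<uint32_t>(Value * Multiplier + 0.5); // round to MHz
  } catch (const std::exception &) {
    return 0; // std::stod throws on malformed input
  }
}
```

Brand strings without a frequency suffix (many AMD parts, for example) fall through to 0, which is one reason a fallback value is still needed on x86.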

uint32_t PlatformUtil::getMemCacheLineSize() {
#if defined(__x86_64__) || defined(__i386__)
  uint32_t CPUInfo[4];
  cpuid(CPUInfo, 0x80000006);
  return CPUInfo[2] & 0xff;
#elif defined(SYCL_RT_OS_LINUX) && defined(_SC_LEVEL2_DCACHE_LINESIZE)
  long lineSize = sysconf(_SC_LEVEL2_DCACHE_LINESIZE);
  if (lineSize > 0) {
    return lineSize;
  }
#else
#warning Your platform is not supported.
#endif
  return 8;
}
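Both x86 branches here read ECX from CPUID leaf `0x80000006`: bits 7:0 hold the L2 line size in bytes and bits 31:16 the L2 size in KiB, which is what the `& 0xff` and `>> 16` expressions extract. The same masks, pulled out into a pure function so they can be checked without hardware; the struct and function names are hypothetical. For example, a made-up ECX value of `0x01006040` decodes to 64-byte lines, associativity code 6, and 256 KiB.

```cpp
#include <cstdint>

// Hypothetical decoded view of ECX from CPUID leaf 0x80000006:
//   bits  7:0  L2 line size in bytes
//   bits 15:12 L2 associativity (an encoded value, not a raw way count)
//   bits 31:16 L2 size in KiB
struct L2CacheInfo {
  uint32_t LineSizeBytes;
  uint32_t AssocCode;
  uint64_t SizeBytes;
};

constexpr L2CacheInfo decodeL2Ecx(uint32_t Ecx) {
  return L2CacheInfo{Ecx & 0xffu, (Ecx >> 12) & 0xfu,
                     static_cast<uint64_t>(Ecx >> 16) * 1024};
}
```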

uint64_t PlatformUtil::getMemCacheSize() {
#if defined(__x86_64__) || defined(__i386__)
  uint32_t CPUInfo[4];
  cpuid(CPUInfo, 0x80000006);
  return static_cast<uint64_t>(CPUInfo[2] >> 16) * 1024;
#elif defined(SYCL_RT_OS_LINUX) && defined(_SC_LEVEL2_DCACHE_SIZE)
  long cacheSize = sysconf(_SC_LEVEL2_DCACHE_SIZE);
  if (cacheSize > 0) {
    return cacheSize;
  }
#else
#warning Your platform is not supported.
#endif
  return static_cast<uint64_t>(16 * 1024);
}
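One caveat on the `sysconf` branches: as far as we can tell, glibc names its level-2 constants `_SC_LEVEL2_CACHE_SIZE` and `_SC_LEVEL2_CACHE_LINESIZE` (L2 is described as a unified cache, so only level 1 has `DCACHE` variants). If that holds for the target C library, the `defined(_SC_LEVEL2_DCACHE_*)` checks are false and the build takes the `#else` warning branch instead. A sketch using the glibc names, with hypothetical function names and the same "fall back when sysconf reports 0 or -1" behaviour:

```cpp
#include <cstdint>
#if defined(__linux__)
#include <unistd.h>
#endif

// Hypothetical helpers: query L2 geometry through sysconf() where the C library
// defines the constants, else (or when the value is 0 / -1) use the fallback.
uint32_t l2LineSizeOr(uint32_t Fallback) {
#if defined(__linux__) && defined(_SC_LEVEL2_CACHE_LINESIZE)
  long LineSize = sysconf(_SC_LEVEL2_CACHE_LINESIZE);
  if (LineSize > 0)
    return static_cast<uint32_t>(LineSize);
#endif
  return Fallback;
}

uint64_t l2SizeOr(uint64_t Fallback) {
#if defined(__linux__) && defined(_SC_LEVEL2_CACHE_SIZE)
  long Size = sysconf(_SC_LEVEL2_CACHE_SIZE);
  if (Size > 0)
    return static_cast<uint64_t>(Size);
#endif
  return Fallback;
}
```

Keeping the `> 0` check matters because these queries may report 0 when the information is unavailable on a given system.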

uint32_t PlatformUtil::getNativeVectorWidth(PlatformUtil::TypeIndex TIndex) {

  uint32_t Index = static_cast<uint32_t>(TIndex);

#if defined(__x86_64__) || defined(__i386__)
  // SSE4.2 has 16 byte (XMM) registers
  static constexpr uint32_t VECTOR_WIDTH_SSE42[] = {16, 8, 4, 2, 4, 2, 0};
  // AVX supports 32 byte (YMM) registers only for floats and doubles
@@ -86,8 +124,6 @@ uint32_t PlatformUtil::getNativeVectorWidth(PlatformUtil::TypeIndex TIndex) {
  // AVX512 has 64 byte (ZMM) registers
  static constexpr uint32_t VECTOR_WIDTH_AVX512[] = {64, 32, 16, 8, 16, 8, 0};

-  uint32_t Index = static_cast<uint32_t>(TIndex);

#if defined(SYCL_RT_OS_LINUX)
  if (__builtin_cpu_supports("avx512f"))
    return VECTOR_WIDTH_AVX512[Index];
@@ -119,14 +155,23 @@ uint32_t PlatformUtil::getNativeVectorWidth(PlatformUtil::TypeIndex TIndex) {
#endif

  return VECTOR_WIDTH_SSE42[Index];

#elif defined(__ARM_NEON)
  // NEON has 16 byte registers
  static constexpr uint32_t VECTOR_WIDTH_NEON[] = {16, 8, 4, 2, 4, 2, 0};
  return VECTOR_WIDTH_NEON[Index];

#else
#warning Your platform is not supported!
#endif
  return 0;
}
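The width tables follow a simple rule: lanes = register bytes / element size, with `half` reported as 0. A sketch of that rule plus a simplified runtime dispatch; the element order (char, short, int, long, float, double, half) is inferred from the tables, the names are hypothetical, and the dispatch ignores the AVX-without-AVX2 case where, per the comment above, only floats and doubles get 32-byte registers.

```cpp
#include <array>
#include <cstdint>

// Assumed element order: char, short, int, long, float, double, half.
// Lanes are register bytes divided by element size; half is reported as 0.
constexpr std::array<uint32_t, 7> lanesFor(uint32_t RegisterBytes) {
  return {{RegisterBytes / 1, RegisterBytes / 2, RegisterBytes / 4,
           RegisterBytes / 8, RegisterBytes / 4, RegisterBytes / 8, 0}};
}

// Simplified runtime choice of the widest SIMD register, in bytes (0 = unknown).
uint32_t nativeRegisterBytes() {
#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
  if (__builtin_cpu_supports("avx512f"))
    return 64;
  if (__builtin_cpu_supports("avx2"))
    return 32;
  return 16; // assume SSE4.2-class hardware, like the patch's fallback
#elif defined(__ARM_NEON)
  return 16; // NEON Q registers are 128 bits
#else
  return 0;
#endif
}
```

`lanesFor(16)` reproduces the SSE4.2 and NEON tables and `lanesFor(64)` the AVX-512 table, which is why NEON can reuse the SSE4.2 values unchanged.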

void PlatformUtil::prefetch(const char *Ptr, size_t NumBytes) {
  if (!Ptr)
    return;

-  // The current implementation assumes 64-byte x86 cache lines.
-  const size_t CacheLineSize = 64;
  const size_t CacheLineSize = PlatformUtil::getMemCacheLineSize();
  const size_t CacheLineMask = ~(CacheLineSize - 1);
  const char *PtrEnd = Ptr + NumBytes;

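With `CacheLineSize` now coming from `getMemCacheLineSize()` rather than a constant, the mask `~(CacheLineSize - 1)` is only meaningful when that value is a power of two. The rest of `prefetch` is cut off in this view, so the following is a standalone sketch of line-aligned prefetching, not the patch's loop; the function names are hypothetical and `__builtin_prefetch` is the GCC/Clang builtin.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Start address of every cache line overlapping [Ptr, Ptr + NumBytes).
// Returns nothing if the range is empty or CacheLineSize is not a power of two.
std::vector<uintptr_t> cacheLinesToPrefetch(uintptr_t Ptr, size_t NumBytes,
                                            size_t CacheLineSize) {
  std::vector<uintptr_t> Lines;
  if (NumBytes == 0 || CacheLineSize == 0 ||
      (CacheLineSize & (CacheLineSize - 1)) != 0)
    return Lines;
  const uintptr_t Mask = ~static_cast<uintptr_t>(CacheLineSize - 1);
  const uintptr_t End = Ptr + NumBytes;
  // Align the first address down to a line boundary, then step line by line.
  for (uintptr_t Line = Ptr & Mask; Line < End; Line += CacheLineSize)
    Lines.push_back(Line);
  return Lines;
}

// Issues one prefetch hint per cache line; a no-op on other compilers.
void prefetchRange(const char *Ptr, size_t NumBytes, size_t CacheLineSize) {
  if (!Ptr)
    return;
  for (uintptr_t Line : cacheLinesToPrefetch(reinterpret_cast<uintptr_t>(Ptr),
                                             NumBytes, CacheLineSize)) {
#if defined(__GNUC__) || defined(__clang__)
    __builtin_prefetch(reinterpret_cast<const char *>(Line));
#else
    (void)Line;
#endif
  }
}
```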