Skip to content

[SYCL] Hide SYCL service kernels #4519

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 10 commits into from
Sep 15, 2021
21 changes: 21 additions & 0 deletions sycl/include/CL/sycl/detail/service_kernel_names.hpp
Original file line number Diff line number Diff line change
@@ -0,0 +1,21 @@
//==-------- service_kernels.hpp - SYCL service kernel name types ----------==//
//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===----------------------------------------------------------------------===//

#pragma once

__SYCL_INLINE_NAMESPACE(cl) {
namespace sycl {
namespace detail {
namespace __sycl_service_kernel__ {

class AssertInfoCopier;

} // namespace __sycl_service_kernel__
} // namespace detail
} // namespace sycl
} // __SYCL_INLINE_NAMESPACE(cl)
6 changes: 3 additions & 3 deletions sycl/include/CL/sycl/queue.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -12,6 +12,7 @@
#include <CL/sycl/detail/assert_happened.hpp>
#include <CL/sycl/detail/common.hpp>
#include <CL/sycl/detail/export.hpp>
#include <CL/sycl/detail/service_kernel_names.hpp>
#include <CL/sycl/device.hpp>
#include <CL/sycl/device_selector.hpp>
#include <CL/sycl/event.hpp>
Expand Down Expand Up @@ -79,11 +80,10 @@ class queue;
namespace detail {
class queue_impl;
#if __SYCL_USE_FALLBACK_ASSERT
class AssertInfoCopier;
static event submitAssertCapture(queue &, event &, queue *,
const detail::code_location &);
#endif
}
} // namespace detail

/// Encapsulates a single SYCL queue which schedules kernels on a SYCL device.
///
Expand Down Expand Up @@ -1167,7 +1167,7 @@ event submitAssertCapture(queue &Self, event &Event, queue *SecondaryQueue,

auto Acc = Buffer.get_access<access::mode::write>(CGH);

CGH.single_task<AssertInfoCopier>([Acc] {
CGH.single_task<__sycl_service_kernel__::AssertInfoCopier>([Acc] {
#ifdef __SYCL_DEVICE_ONLY__
__devicelib_assert_read(&Acc[0]);
#else
Expand Down
20 changes: 20 additions & 0 deletions sycl/source/detail/program_manager/program_manager.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -64,6 +64,16 @@ ProgramManager &ProgramManager::getInstance() {
return GlobalHandler::instance().getProgramManager();
}

// This function allows for identifying "service" kernels. A SYCL service kernel
// is a kernel that has not been defined by the user but is instead generated by
// the SYCL runtime. Service kernel name types must be declared in the
// sycl::detail::__sycl_service_kernel__ namespace which is exclusively used for
// this purpose. As such service kernels can be identified by
// __sycl_service_kernel__ appearing in the mangled kernel name.
static bool isServiceKernel(const std::string &KernelName) {
return KernelName.find("__sycl_service_kernel__") != std::string::npos;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is quite straightforward approach and will work.
Though, I'm considered on its performance.
Is possible to get notion of mangling and check just the beginning of the string for Linux and ending for Windows?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Checking the start of the string was the original approach, but it did not consider Windows mangling. I am not very familiar with the Windows mangling scheme, but I worry that even if we could determine which mangling scheme is used, having templated classes would break the assumption about the end of the mangled name.

Copy link
Contributor

@romanovvlad romanovvlad Sep 9, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought we have the same(Itanium) mangling for kernels on both windows and linux.
In any case, if my another suggest is applied, the only place where we check if a kernel is a service one would be addImages function which should be called once per process for a given image. Having said that I think this is not a performance critical peace of code.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems, like my yesterday's comment didn't get to post here:
Is it possible to cache this flag?
I believe, using unordered_map will allow for quick access. Also, we can eliminate use of mutexes as the initialization is only needed at boot-strap and there are only read requests during run-time.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe, using unordered_map will allow for quick access.

You're thinking adding an unordered_set (I think set would do the trick) where AddImages caches all found service kernel names. Then ProgramManager::getSYCLDeviceImagesWithCompatibleState would check against that rather than calling isServiceKernel?

Also, we can eliminate use of mutexes as the initialization is only needed at boot-strap and there are only read requests during run-time.

Are you referring to m_KernelIDsMutex?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're thinking adding an unordered_set (I think set would do the trick) where AddImages caches all found service kernel names. Then ProgramManager::getSYCLDeviceImagesWithCompatibleState would check against that rather than calling isServiceKernel?

Right. This will do the thing.

Are you referring to m_KernelIDsMutex?

Never, never, never. I meant any mutex required for lazy initialization of cache. With the cache initialized at boot-strap there is no place for lazy initialization and, thus, no need for thread-safety mechanism except for ensuring that only read access takes place during run-time.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right. This will do the thing.

I have implemented caching of the service kernels found in ProgramManager::AddImages. This allows for faster lookup later. However, one concern I have is that the only time later that service kernels are looked up is in an assert, so in release builds we will never look up anything in the new cache.

Never, never, never. I meant any mutex required for lazy initialization of cache. With the cache initialized at boot-strap there is no place for lazy initialization and, thus, no need for thread-safety mechanism except for ensuring that only read access takes place during run-time.

I think this may apply to m_KernelIDsMutex as well. That is outside the scope of this PR, so I'll make another PR in which we can discuss further.

}

static RT::PiProgram
createBinaryProgram(const ContextImplPtr Context, const device &Device,
const unsigned char *Data, size_t DataLen,
Expand Down Expand Up @@ -1044,6 +1054,11 @@ void ProgramManager::addImages(pi_device_binaries DeviceBinary) {
auto Result = KSIdMap.insert(std::make_pair(EntriesIt->name, KSId));
(void)Result;
assert(Result.second && "Kernel sets are not disjoint");

// Skip creating unique kernel ID if it is a service kernel
if (isServiceKernel(EntriesIt->name))
continue;

// ... and create a unique kernel ID for the entry
std::shared_ptr<detail::kernel_id_impl> KernelIDImpl =
std::make_shared<detail::kernel_id_impl>(EntriesIt->name);
Expand Down Expand Up @@ -1333,6 +1348,11 @@ ProgramManager::getSYCLDeviceImagesWithCompatibleState(
for (_pi_offload_entry EntriesIt = DevBin->EntriesBegin;
EntriesIt != DevBin->EntriesEnd; ++EntriesIt) {
auto KernelID = m_KernelIDs.find(EntriesIt->name);

// Service kernels do not have kernel IDs
if (KernelID == m_KernelIDs.end() && isServiceKernel(EntriesIt->name))
continue;

assert(KernelID != m_KernelIDs.end() &&
"Kernel ID in device binary missing from cache");
KernelIDs.push_back(KernelID->second);
Expand Down
9 changes: 8 additions & 1 deletion sycl/source/kernel_bundle.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -167,7 +167,14 @@ bool has_kernel_bundle_impl(const context &Ctx, const std::vector<device> &Devs,
// TODO: Add a check that all kernel ids are compatible with at least one
// device in Devs

return (bool)DeviceImages.size();
// Some device images may have service kernels that do not have kernel IDs.
// A kernel bundle must have at least one kernel, excluding service kernels.
return std::any_of(
DeviceImages.begin(), DeviceImages.end(),
[](device_image_plain DeviceImage) {
return detail::getSyclObjImpl(DeviceImage)->get_kernel_ids().size() !=
0;
});
}

bool has_kernel_bundle_impl(const context &Ctx, const std::vector<device> &Devs,
Expand Down
32 changes: 31 additions & 1 deletion sycl/unittests/SYCL2020/KernelID.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -17,6 +17,7 @@
class TestKernel1;
class TestKernel2;
class TestKernel3;
class ServiceKernel1;

__SYCL_INLINE_NAMESPACE(cl) {
namespace sycl {
Expand Down Expand Up @@ -57,6 +58,19 @@ template <> struct KernelInfo<TestKernel3> {
static constexpr bool callsAnyThisFreeFunction() { return false; }
};

template <> struct KernelInfo<ServiceKernel1> {
static constexpr unsigned getNumParams() { return 0; }
static const kernel_param_desc_t &getParamDesc(int) {
static kernel_param_desc_t Dummy;
return Dummy;
}
static constexpr const char *getName() {
return "_ZTSN2cl4sycl6detail23__sycl_service_kernel__14ServiceKernel1";
}
static constexpr bool isESIMD() { return false; }
static constexpr bool callsThisItem() { return false; }
static constexpr bool callsAnyThisFreeFunction() { return false; }
};
} // namespace detail
} // namespace sycl
} // __SYCL_INLINE_NAMESPACE(cl)
Expand Down Expand Up @@ -84,7 +98,9 @@ generateDefaultImage(std::initializer_list<std::string> Kernels) {

static sycl::unittest::PiImage Imgs[2] = {
generateDefaultImage({"KernelID_TestKernel1", "KernelID_TestKernel3"}),
generateDefaultImage({"KernelID_TestKernel2"})};
generateDefaultImage(
{"KernelID_TestKernel2",
"_ZTSN2cl4sycl6detail23__sycl_service_kernel__14ServiceKernel1"})};
static sycl::unittest::PiImageArray<2> ImgArray{Imgs};

TEST(KernelID, AllProgramKernelIds) {
Expand All @@ -106,6 +122,20 @@ TEST(KernelID, AllProgramKernelIds) {
}
}

TEST(KernelID, NoServiceKernelIds) {
const char *ServiceKernel1Name =
sycl::detail::KernelInfo<ServiceKernel1>::getName();

std::vector<sycl::kernel_id> AllKernelIDs = sycl::get_kernel_ids();

auto NoFoundServiceKernelID = std::none_of(
AllKernelIDs.begin(), AllKernelIDs.end(), [=](sycl::kernel_id KernelID) {
return strcmp(KernelID.get_name(), ServiceKernel1Name) == 0;
});

EXPECT_TRUE(NoFoundServiceKernelID);
}

TEST(KernelID, FreeKernelIDEqualsKernelBundleId) {
sycl::platform Plt{sycl::default_selector()};
if (Plt.is_host()) {
Expand Down
23 changes: 19 additions & 4 deletions sycl/unittests/assert/assert.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -51,9 +51,10 @@ template <> struct KernelInfo<TestKernel> {
static constexpr const kernel_param_desc_t Signatures[] = {
{kernel_param_kind_t::kind_accessor, 4062, 0}};

template <> struct KernelInfo<::sycl::detail::AssertInfoCopier> {
template <>
struct KernelInfo<::sycl::detail::__sycl_service_kernel__::AssertInfoCopier> {
static constexpr const char *getName() {
return "_ZTSN2cl4sycl6detail16AssertInfoCopierE";
return "_ZTSN2cl4sycl6detail23__sycl_service_kernel__16AssertInfoCopierE";
}
static constexpr unsigned getNumParams() { return 1; }
static constexpr const kernel_param_desc_t &getParamDesc(unsigned Idx) {
Expand All @@ -73,7 +74,7 @@ static sycl::unittest::PiImage generateDefaultImage() {

static const std::string KernelName = "TestKernel";
static const std::string CopierKernelName =
"_ZTSN2cl4sycl6detail16AssertInfoCopierE";
"_ZTSN2cl4sycl6detail23__sycl_service_kernel__16AssertInfoCopierE";

PiPropertySet PropSet;

Expand All @@ -98,7 +99,7 @@ static sycl::unittest::PiImage generateCopierKernelImage() {
using namespace sycl::unittest;

static const std::string CopierKernelName =
"_ZTSN2cl4sycl6detail16AssertInfoCopierE";
"_ZTSN2cl4sycl6detail23__sycl_service_kernel__16AssertInfoCopierE";

PiPropertySet PropSet;

Expand Down Expand Up @@ -391,3 +392,17 @@ TEST(Assert, TestPositive) {
}
#endif // _WIN32
}

TEST(Assert, TestAssertServiceKernelHidden) {
const char *AssertServiceKernelName = sycl::detail::KernelInfo<
sycl::detail::__sycl_service_kernel__::AssertInfoCopier>::getName();

std::vector<sycl::kernel_id> AllKernelIDs = sycl::get_kernel_ids();

auto NoFoundServiceKernelID = std::none_of(
AllKernelIDs.begin(), AllKernelIDs.end(), [=](sycl::kernel_id KernelID) {
return strcmp(KernelID.get_name(), AssertServiceKernelName) == 0;
});

EXPECT_TRUE(NoFoundServiceKernelID);
}