Skip to content

[SYCL] Indicate lack of direct OpenCL interop when not using OpenCL BE #1159

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 916 commits into from

Conversation

rbegam
Copy link
Contributor

@rbegam rbegam commented Feb 21, 2020

Signed-off-by: Sergey V Maslov sergey.v.maslov@intel.com

@bader bader changed the title [SYCL] indicate lack of direct OpenCL interop when not using OpenCL BE [SYCL] Indicate lack of direct OpenCL interop when not using OpenCL BE Feb 21, 2020
bader
bader previously approved these changes Feb 21, 2020
topperc and others added 22 commits February 23, 2020 15:13
…mIntrinsicSDNode as we usually do.

Leave the gather/scatter subclasses, but make them inherit from
MemIntrinsicSDNode and delete their constructor and destructor.
This way we can still have the getIndex, getMask, etc. convenience
functions.
…ot used.

Targets are expected to use getMemIntrinsicNode and not provide
their own subclasses. X86 was previously the only user.
Summary:
The IR printing always prints out all functions in a module with the new pass manager, even with -filter-print-funcs specified. This is being fixed in this change. However, there are two exceptions, i.e, with user-specified wildcast switch -filter-print-funcs=* or -print-module-scope, under which IR of all functions should be printed.

Test Plan:
make check-clang
make check-llvm

Reviewers: wenlei

Reviewed By: wenlei

Subscribers: wenlei, hiraditya, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D74814
Split the tryAndWithMask into several small calls.

Differential Revision: https://reviews.llvm.org/D72250
Also, the source layout document has been updated to reflect the current
layout of the `utils` directory.

Reviewers: PaulkaToast

Differential Revision: https://reviews.llvm.org/D74502
… getMemIntrinsicNode.

These appear to have their own SDNode type and shouldn't use
MemIntrinsicSDNode.
The type profile we use for the isel patterns lied about how
many operands the gather/scatter node has to skip the index
and scale operands. This allowed us to expand the baseptr
operand into base, displacement, and segment and then merge
the index and scale with them in the final instruction during
isel. This is kind of a hack that relies on isel not checking the
number of operands at all.

This commit switches to custom isel where we can manage this
directly without relying on holes in the isel checking.
…, cast the mask to integer type.

The gather intrinsics use a floating point mask when the result
type is FP. But we call DemandedBits on the mask assuming its an
integer type. We also use integer types when we create it from
generic IR. So add a bitcast to the intrinsic path to guarantee
the integer type.
This optimization bypasses GOT loads and calls/branches through stubs when the
ultimate target of the access/branch is found to be within range of the
reference.

Extra debugging output is also added to the generic JITLink algorithm and
basic GOT and Stubs builder utility to aid debugging.
…ode.

Summary:
We have a lot of code in our lookup code to pass around `current_id` counters which end up in our logs like this:
```
AOCTV::FT [234] Found XYZ
```

This patch removes all of this code because:
* I'm splitting up all humongous functions, so I need to write more and more boilerplate to pass around these ids.
* I never saw any similar counters in the LLDB/LLVM code base.
* They're essentially globals and the last thing we need in LLDB is even more global state.
* They're not really useful when readings logs. It doesn't help that there isn't just 1 or 2 counters, but 12 (!) unique counters. I always thought that if I see two identical counter values in those brackets it's the same lookup request, but it seems that's only true by accident (and you can't know which of the 12 counters is actually printed without reading the code). The only time I know I can trust the counters is when it's obvious from the log that it's the same counter like in the log below, but then why have the counters in the first place?

```
 LayoutRecordType[28] on (ASTContext*)0x00007FFA1C840200 'scratch ASTContext' for (RecordDecl*)0x00007FFA0AAE8CF0 [name = '__tree']
 LRT[28] returned:
 LRT[28]   Original = (RecordDecl*)%p
 LRT[28]   Size = %lld
 LRT[28]   Alignment = %lld
 LRT[28]   Fields:
 LRT[28]     (FieldDecl*)0x00007FFA1A13B1D0, Name = '__begin_node_', Offset = 0 bits
 LRT[28]     (FieldDecl*)0x00007FFA1C08FD30, Name = '__pair1_', Offset = 64 bits
 LRT[28]     (FieldDecl*)0x00007FFA1C061210, Name = '__pair3_', Offset = 128 bits
 LRT[28]   Bases:
```

Reviewers: labath, shafik, JDevlieghere

Reviewed By: labath, shafik, JDevlieghere

Subscribers: abidh, lldb-commits

Tags: #lldb

Differential Revision: https://reviews.llvm.org/D74951
For MVE, don't look at the users of the extending loads so that more
as desirable for folding.

Differential Revision: https://reviews.llvm.org/D74958
Summary:
When we added support for type units in dwo files, we changed the
"manual" dwarf index to index _all_ dwarf units in the dwo file instead
of just the split unit belonging to our skeleton unit. This was fine for
dwo files, as they contain only a single compile units and type units do
not have a split type unit which would point to them.

However, this does not work for dwp files because, these files do
contain multiple split compile units, and the current approach means
that each unit gets indexed multiple times (once for each split unit =>
n^2 complexity).

This patch teaches the manual dwarf index to treat dwp files specially.
Any type units in the dwp file added to the main list of compile units
and indexed with them in a single batch. Split compile units in dwp
files are still indexed as a part of their skeleton unit -- this is done
because we need the DW_AT_language attribute from the skeleton unit to
index them properly.

Handling of dwo files remains unchanged -- all units (type and skeleton)
are indexed when we reach the dwo file through the split unit.

Reviewers: clayborg, JDevlieghere, aprantl

Subscribers: arphaman, lldb-commits

Tags: #lldb

Differential Revision: https://reviews.llvm.org/D74964
Explicit dynsym/dynstr sections were added in a6370d5 to compensate for
a yaml2obj change D74764. This test doesn't need those sections, so
instead I just delete the explicit section blocks, and also the
"DynamicSymbols" block, which triggers their implicit generation.
…uite

Summary:
Currently the test suite runs with enabled automatically applied Clang fix-its for expressions.
This is causing that sometimes incorrect expressions in tests are still evaluated even though they
are actually incorrect. Let's disable this feature in the test suite so that we know when expressions
are wrong and leave the fix-it testing to the dedicated tests for that feature.

Also updates the `lang/cpp/operators/` test as it seems Clang needs the `struct` keywords
before C and would otherwise fail without fixits.

Reviewers: jingham, JDevlieghere, shafik

Reviewed By: JDevlieghere, shafik

Subscribers: shafik, lldb-commits

Tags: #lldb

Differential Revision: https://reviews.llvm.org/D74957
Summary:
The type used to represent functional units in MC is
'unsigned', which is 32 bits wide. This is currently
not a problem in any upstream target as no one seems
to have hit the limit on this yet, but in our
downstream one, we need to define more than 32
functional units.

Increasing the size does not seem to cause a huge
size increase in the binary (an llc debug build went
from 1366497672 to 1366523984, a difference of 26k),
so perhaps it would be acceptable to have this patch
applied upstream as well.

Subscribers: hiraditya, jsji, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71210
Summary:
This should produce slightly better error messages in case of failures.
Only slightly, because this code was pretty careful about that to begin
with -- I've seen code which does much worse.

Reviewers: jhenderson, dblaikie

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D74899
…ng exec** functions

Summary:
There is no need to write out gcdas when forking because we can just reset the counters in the parent process.
Let say a counter is N before the fork, then fork and this counter is set to 0 in the child process.
In the parent process, the counter is incremented by P and in the child process it's incremented by C.
When dump is ran at exit, parent process will dump N+P for the given counter and the child process will dump 0+C, so when the gcdas are merged the resulting counter will be N+P+C.
About exec** functions, since the current process is replaced by an another one there is no need to reset the counters but just write out the gcdas since the counters are definitely lost.
To avoid to have lists in a bad state, we just lock them during the fork and the flush (if called explicitely) and lock them when an element is added.

Reviewers: marco-c

Reviewed By: marco-c

Subscribers: hiraditya, cfe-commits, #sanitizers, llvm-commits, sylvestre.ledru

Tags: #clang, #sanitizers, #llvm

Differential Revision: https://reviews.llvm.org/D74953
Summary:
This patch adds intrinsics and ISelDAG nodes for signed
and unsigned fixed-point division:

```
llvm.sdiv.fix.sat.*
llvm.udiv.fix.sat.*
```

These intrinsics perform scaled, saturating division
on two integers or vectors of integers. They are
required for the implementation of the Embedded-C
fixed-point arithmetic in Clang.

Reviewers: bjope, leonardchan, craig.topper

Subscribers: hiraditya, jdoerfert, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71550
This exploits the fact that the iterations of parallel loops are
independent so tiling becomes just an index transformation. This pass
only tiles the innermost loop of a loop nest.

The ultimate goal is to allow vectorization of the tiled loops, but I
don't think we're there yet with the current rewriting, as the tiled
loops don't have a constant trip count.

Differential Revision: https://reviews.llvm.org/D74954
mlir/lib/Parser/Parser.cpp:4484:15: warning: 'parseAssignmentList' overrides a member function but is not marked 'override' [-Winconsistent-missing-override]
  ParseResult parseAssignmentList(SmallVectorImpl<OperandType> &lhs,
              ^
mlir/include/mlir/IR/OpImplementation.h:662:3: note: overridden virtual function is here
  parseAssignmentList(SmallVectorImpl<OperandType> &lhs,
  ^
mlir/lib/Parser/Parser.cpp:4488:12: warning: unused variable 'type' [-Wunused-variable]
      Type type;
           ^
Summary:
Implements the following SVE2 intrinsics:
 - @llvm.aarch64.sve.aesd
 - @llvm.aarch64.sve.aesimc
 - @llvm.aarch64.sve.aese
 - @llvm.aarch64.sve.aesmc
 - @llvm.aarch64.sve.rax1
 - @llvm.aarch64.sve.sm4e
 - @llvm.aarch64.sve.sm4ekey

Reviewers: sdesmalen, c-rhodes, dancgr, cameron.mcinally, efriedma, rengolin

Reviewed By: sdesmalen

Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, cfe-commits, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D74833
bader and others added 18 commits February 26, 2020 15:19
Signed-off-by: Alexey Bader <alexey.bader@intel.com>
New commands will fetch 3 commits: merge commit and two parent commits
with the state of the target branch and PR branch.

Signed-off-by: Alexey Bader <alexey.bader@intel.com>
LLVM: 6201f66
LLVM-SPIRV-Translator: 39edc1dc
Signed-off-by: Alexander Batashev <alexander.batashev@intel.com>
This attirubte is applicabale to pipe storage decl.
It's expected to be used only in SYCL headers to distinguish
I/O pipes from ordinar pipes and from each other.

Signed-off-by: Dmitry Sidorov <dmitry.sidorov@intel.com>
kernel_readable_io_pipe and kernel_writeable_io_pipe classes
were added.

Signed-off-by: Dmitry Sidorov <dmitry.sidorov@intel.com>
Kernel launching should fail with PI_INVALID_WORK_GROUP_SIZE when the specified work-item sizes are out of bounds.

Signed-off-by: Steffen Larsen <steffen.larsen@codeplay.com>
configure.py requires Ninja to be installed on build machine, which is
missing from default GitHub Actions Linux environments. Install
ninja-build package before proceeding to CMake configuration.

Signed-off-by: Alexander Batashev <alexander.batashev@intel.com>
intel#1196)

It's not correct to remove commands from leaves that we are not
depending on.

Signed-off-by: Vlad Romanov <vlad.romanov@intel.com>
Signed-off-by: James Brodman <james.brodman@intel.com>
- Fix classes missing due to inline namespace macro.
- Enable first sentence as brief.
- Make diagrams clickable.

Signed-off-by: Alexander Batashev <alexander.batashev@intel.com>
Signed-off-by: Igor Dubinov <igor.dubinov@intel.com>
This also makes `cuda_piDevicesGet` return all available CUDA devices.

Signed-off-by: Steffen Larsen <steffen.larsen@codeplay.com>
The problem is that gcc < 7.2 emit the following warning:
```
warning: dereferencing type-punned pointer will break stric t-aliasing rules [-Wstrict-aliasing]
(*(T *)m_Mem).~T();
       ^
```

Interesting, that this only happens with `-O2` optimization level

Replaced C-style casts with C++ `reinterpret_cast` and this is actually
quite a hack, because according to the docs [1]:

> the resulting pointer may only be dereferenced safely if allowed by
> the type aliasing rules

Type aliasing rules allow to represent any object as `char *`, but not
the way around, i.e. array of characters cannot be reinterpreted as an
object.

`std::memcpy` also doesn't work here, because there is no requirement
that `T` is trivially copyable.

Another way to trick a compiler is to save pointer returned from
placement new: it already has type `T *`, so, we can store it within the
class and avoid casts. Hovewer, this is also a tricky thing, because since
`m_Mem` and this new pointer point to different types, compiler could
assume that they don't alias (when they actually point to the same memory
location) and perform some undesired transformations.

[1]: https://en.cppreference.com/w/cpp/language/reinterpret_cast

Signed-off-by: Alexey Sachkov <alexey.sachkov@intel.com>
The problem is that gcc < 7.2 emit the following warning:
```
warning: dereferencing type-punned pointer will break stric t-aliasing rules [-Wstrict-aliasing]
(*(T *)m_Mem).~T();
       ^
```

Interesting, that this only happens with `-O2` optimization level

Replaced C-style casts with C++ `reinterpret_cast` and this is actually
quite a hack, because according to the docs [1]:

> the resulting pointer may only be dereferenced safely if allowed by
> the type aliasing rules

Type aliasing rules allow to represent any object as `char *`, but not
the way around, i.e. array of characters cannot be reinterpreted as an
object.

`std::memcpy` also doesn't work here, because there is no requirement
that `T` is trivially copyable.

Another way to trick a compiler is to save pointer returned from
placement new: it already has type `T *`, so, we can store it within the
class and avoid casts. Hovewer, this is also a tricky thing, because since
`m_Mem` and this new pointer point to different types, compiler could
assume that they don't alias (when they actually point to the same memory
location) and perform some undesired transformations.

[1]: https://en.cppreference.com/w/cpp/language/reinterpret_cast

Signed-off-by: Alexey Sachkov <alexey.sachkov@intel.com>
get_build_options should return compile options for a program in a
compiled state and not the other way around.

Signed-off-by: Sergey Semenov <sergey.semenov@intel.com>
- Builtins for atomic load and store (`int`, `unsigned int`, `long`, `unsigned long`, `long long`, and `unsigned long long`)
- Added `long`, `unsigned long`, `long long`, and `unsigned long long` variants of the following existing builtins:
  - Atomic add
  - Atomic sub
  - Atomic and
  - Atomic or
  - Atomic xor
  - Atomic exhange
  - Atomic compare-and-exchange
  - Atomic max
  - Atomic min

Notice: `long` is currently used in place of `long long` due to a problem with the width of `long long` being defined as 128-bit.

Signed-off-by: Steffen Larsen <steffen.larsen@codeplay.com>
…intel#1164)

Signed-off-by: Premanand M Rao <premanand.m.rao@intel.com>
@bjoernknafla
Copy link
Contributor

Great work to get LIT testing for CUDA pass!

@bjoernknafla
Copy link
Contributor

I have more related fixes in the following - not yet quite ready - PR - any feedback would be very appreciated: codeplaysoftware/sycl-for-cuda#17

@@ -1,3 +1,5 @@
// REQUIRES: opencl-interop

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I noted (and created) this weirdness, too. So far my guideline was:

  • XFAIL either marks a test that should fail, or
  • XFAIL marks a test that should be fixed and I want to see the XPASS message to not forget to update it once it is fixed (UNSUPPORTED would just be invisible). Such a test needs a comment below/above XFAIL.
  • UNSUPPORTED is for tests that cannot be supported with a specific feature, e.g., image tests for CUDA. A comment is required.
  • REQUIRES is for tests that require a specific feature, e.g., opencl for OpenCL interop.

In a sense REQUIRES and UNSUPPORTED are expressing the same thing in opposite ways, e.g.,
REQUIRES: opencl
should not need an extra
UNSUPPORTED: cuda
if we always differentiate between test runs with OpenCL backends vs the CUDA backend.

However, at one point we might want to have tests that use multiple backends and then we might want the explicit REQUIRES and UNSUPPORTED to filter out tests that only work for one backend at a time (though I can think of other ways to work around this)?

It might be a good idea (for a different PR) to only have REQUIRES xor UNSUPPORTED per LIT tests (as long as we do not try to filter out for something very specific).

config.environment['SYCL_BE'] = os.environ['SYCL_BE']
# There is no direct OpenCL interop when not running OpenCL BE.
if config.environment['SYCL_BE'] != "PI_OPENCL":
config.available_features.remove('opencl-interop')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In line 185 below we are setting opencl to express availability of OpenCL to do the same thing.

The following not-yet-opened PR also adds LIT markup to suppress tests requiring OpenCL interop when running for CUDA (with more fixes on top): codeplaysoftware/sycl-for-cuda#17

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems like this whole code section became redundant. I am planning to remove this and change all the REQUIRES: opencl-interop -> REQUIRES: opencl. Let me know if that sounds good.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would be great! Thank you.

@@ -67,6 +67,16 @@
if 'OCL_ICD_FILENAMES' in os.environ:
config.environment['OCL_ICD_FILENAMES'] = os.environ['OCL_ICD_FILENAMES']

config.available_features.add('opencl-interop')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We already set a feature opencl for the same reason below.

@@ -67,6 +67,16 @@
if 'OCL_ICD_FILENAMES' in os.environ:
config.environment['OCL_ICD_FILENAMES'] = os.environ['OCL_ICD_FILENAMES']

config.available_features.add('opencl-interop')
if 'SYCL_BE' in os.environ:
config.environment['SYCL_BE'] = os.environ['SYCL_BE']
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The logic of this config file needs some though and cleanup (we created a tiny monster) - but not for this PR...

We might want both - a way to explicitly instruct LIT what backend to test and a way to pick up an environment variable setting.

The explicit LIT parameter should take precedence for the CMake targets check-sycl to use OpenCL regardless of environment variables and check-sycl-cuda to use CUDA.

Though when we run llvm-lit directly setting an environment is nicer than to remember using the --param SYCL_BE=PI_CUDA flag.

rbegam and others added 3 commits February 28, 2020 09:59
Rehana Begam <rehana.begam@intel.com>

Co-Authored-By: Bjoern Knafla <bjoern@codeplay.com>
Signed-off-by: Aleksander Fadeev <aleksander.fadeev@intel.com>
Change-Id: I10e6510ff19c4e0ef100f2acb36ac2e314850a8a
Signed-off-by: Sergey V Maslov <sergey.v.maslov@intel.com>
rbegam added 2 commits March 3, 2020 15:39
Signed-off-by: rbegam <rehana.begam@intel.com>
Signed-off-by: rbegam <rehana.begam@intel.com>
@rbegam
Copy link
Contributor Author

rbegam commented Mar 4, 2020

@bader @romanovvlad @garimagu @bjoernknafla @smaslov-intel
I've submitted a new PR for this change here: #1242
It has all the corrections requested so far. Please review that one instead.

I'll be closing this.

@rbegam rbegam closed this Mar 4, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.