Skip to content

[WIP][SYCL][CUDA] Fix LIT testing with CUDA devices after open sourcing #17

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 12 commits into from

Conversation

bjoernknafla
Copy link
Contributor

Make LIT tests less brittle when testing CUDA.

  • Clean up get_device_count_by_type tool.
  • Re-enable CUDA testing by (re-)introducing the LIT substitution gpu_run_substitute for CUDA.
  • Mark LIT tests using OpenCL interop as not supported by CUDA.
  • Introduce a CHECK helper macro to use instead of assert to have checking even if assertions are not turned on.
  • Replace assert in LIT tests that failed in CUDA with the newly introduced CHECK macro.
  • Clean up tests that failed in CUDA but did not report errors, either by returning 1 or throwing an exception.
  • Clean up asynchronous exception handlers to re-throw the exceptions they catch.
  • Clean up tests to fail when an expected exception is not caught.

bjoernknafla and others added 12 commits February 27, 2020 14:45
We did not build the `get_device_count_by_type` tool with the required
preprocessor definitions to enable CUDA, nor did we link with CUDA.

We passed the wrong backend name to the `get_device_count_by_type` tool
which let to running tests with PI OpenCL instead of CUDA.

Clean up the `get_device_count_by_type` tool code to make it harder to
call it with the wrong backend and get unexpected query results back.

Signed-off-by: Bjoern Knafla <bjoern@codeplay.com>
Fix failing LIT test for SYCL images by marking it unsupported by CUDA.
CUDA images are incompatible with the OpenCL image spec.

Signed-off-by: Bjoern Knafla <bjoern@codeplay.com>
Re-throw unexpected exceptions instead of only printing an error message
and running the following parts of the test.

Introduce a `CHECK` macro to replace the use of `assert` in LIT tests
that are not compiled in debug mode or with parameters explicitly keep
assertions on.

Signed-off-by: Bjoern Knafla <bjoern@codeplay.com>
Signed-off-by: Bjoern Knafla <bjoern@codeplay.com>
The LIT image api test:
* requires OpenCL interop which CUDA does not support, and
* SYCL images are not compatible with CUDA.

Signed-off-by: Bjoern Knafla <bjoern@codeplay.com>
Mark tests that fail with CUDA as XFAIL.
Tests that pass due to detection that the sub-group extension is not
supported are run normally.

The above should lead to passing test or unexpected failures the moment
work on sub-groups for CUDA would start, and therefore reminds us
automatically to revisit the LIT tests.

Signed-off-by: Bjoern Knafla <bjoern@codeplay.com>
Marks two tests as UNSUPPORTED on CUDA as the tested funcitonality
has not been fully implemented yet.

Signed-off-by: Alexander Johnston <alexander@codeplay.com>
Signed-off-by: Bjoern Knafla <bjoern@codeplay.com>
As the PI plugin function table can grow without the CUDA plugin being
adapted, set the whole function pointer table to 0 to trigger crashes
on calling functions that are not set.

Signed-off-by: Bjoern Knafla <bjoern@codeplay.com>
Fix LIT test using `-Werror` to fail when the compiler driver warns
about an unknown CUDA version as CUDA has otherwise nothing to do with
the test.

Signed-off-by: Bjoern Knafla <bjoern@codeplay.com>
Marks LIT tests as UNSUPPORTED that require OpenCL or behavior not
supported by PI CUDA, e.g., online compilation/linking.

Marks tests as XFAIL that could be supported by CUDA but aren't yet.

Signed-off-by: Bjoern Knafla <bjoern@codeplay.com>
Change tests that should pass but fail as XFAIL.
Mark tests that are not supported by CUDA as UNSUPPORTED.

Signed-off-by: Bjoern Knafla <bjoern@codeplay.com>
codeplay-sycl pushed a commit that referenced this pull request Apr 16, 2020
Bitcode file alignment is only 32-bit so 64-bit offsets need
special handling.
/b/sanitizer-x86_64-linux-fast/build/llvm-project/clang/lib/Serialization/ASTReader.cpp:6327:28: runtime error: load of misaligned address 0x7fca2bcfe54c for type 'const uint64_t' (aka 'const unsigned long'), which requires 8 byte alignment
0x7fca2bcfe54c: note: pointer points here
  00 00 00 00 5a a6 01 00  00 00 00 00 19 a7 01 00  00 00 00 00 48 a7 01 00  00 00 00 00 7d a7 01 00
              ^
    #0 0x3be2fe4 in clang::ASTReader::TypeCursorForIndex(unsigned int) /b/sanitizer-x86_64-linux-fast/build/llvm-project/clang/lib/Serialization/ASTReader.cpp:6327:28
    #1 0x3be30a0 in clang::ASTReader::readTypeRecord(unsigned int) /b/sanitizer-x86_64-linux-fast/build/llvm-project/clang/lib/Serialization/ASTReader.cpp:6348:24
    #2 0x3bd3d4a in clang::ASTReader::GetType(unsigned int) /b/sanitizer-x86_64-linux-fast/build/llvm-project/clang/lib/Serialization/ASTReader.cpp:6985:26
    #3 0x3c5d9ae in clang::ASTDeclReader::Visit(clang::Decl*) /b/sanitizer-x86_64-linux-fast/build/llvm-project/clang/lib/Serialization/ASTReaderDecl.cpp:533:31
    #4 0x3c91cac in clang::ASTReader::ReadDeclRecord(unsigned int) /b/sanitizer-x86_64-linux-fast/build/llvm-project/clang/lib/Serialization/ASTReaderDecl.cpp:4045:10
    #5 0x3bd4fb1 in clang::ASTReader::GetDecl(unsigned int) /b/sanitizer-x86_64-linux-fast/build/llvm-project/clang/lib/Serialization/ASTReader.cpp:7352:5
    #6 0x3bce2f9 in clang::ASTReader::ReadASTBlock(clang::serialization::ModuleFile&, unsigned int) /b/sanitizer-x86_64-linux-fast/build/llvm-project/clang/lib/Serialization/ASTReader.cpp:3625:22
    #7 0x3bd6d75 in clang::ASTReader::ReadAST(llvm::StringRef, clang::serialization::ModuleKind, clang::SourceLocation, unsigned int, llvm::SmallVectorImpl<clang::ASTReader::ImportedSubmodule>*) /b/sanitizer-x86_64-linux-fast/build/llvm-project/clang/lib/Serialization/ASTReader.cpp:4230:32
    #8 0x3a6b415 in clang::CompilerInstance::createPCHExternalASTSource(llvm::StringRef, llvm::StringRef, bool, bool, clang::Preprocessor&, clang::InMemoryModuleCache&, clang::ASTContext&, clang::PCHContainerReader const&, llvm::ArrayRef<std::shared_ptr<clang::ModuleFileExtension> >, llvm::ArrayRef<std::shared_ptr<clang::DependencyCollector> >, void*, bool, bool, bool) /b/sanitizer-x86_64-linux-fast/build/llvm-project/clang/lib/Frontend/CompilerInstance.cpp:539:19
    #9 0x3a6b00e in clang::CompilerInstance::createPCHExternalASTSource(llvm::StringRef, bool, bool, void*, bool) /b/sanitizer-x86_64-linux-fast/build/llvm-project/clang/lib/Frontend/CompilerInstance.cpp:501:18
    #10 0x3abac80 in clang::FrontendAction::BeginSourceFile(clang::CompilerInstance&, clang::FrontendInputFile const&) /b/sanitizer-x86_64-linux-fast/build/llvm-project/clang/lib/Frontend/FrontendAction.cpp:865:12
    #11 0x3a6e61c in clang::CompilerInstance::ExecuteAction(clang::FrontendAction&) /b/sanitizer-x86_64-linux-fast/build/llvm-project/clang/lib/Frontend/CompilerInstance.cpp:972:13
    #12 0x3ba74bf in clang::ExecuteCompilerInvocation(clang::CompilerInstance*) /b/sanitizer-x86_64-linux-fast/build/llvm-project/clang/lib/FrontendTool/ExecuteCompilerInvocation.cpp:282:25
    #13 0xa3f753 in cc1_main(llvm::ArrayRef<char const*>, char const*, void*) /b/sanitizer-x86_64-linux-fast/build/llvm-project/clang/tools/driver/cc1_main.cpp:240:15
    #14 0xa3a68a in ExecuteCC1Tool(llvm::SmallVectorImpl<char const*>&) /b/sanitizer-x86_64-linux-fast/build/llvm-project/clang/tools/driver/driver.cpp:330:12
    #15 0xa37f31 in main /b/sanitizer-x86_64-linux-fast/build/llvm-project/clang/tools/driver/driver.cpp:407:12
    #16 0x7fca2a7032e0 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x202e0)
    #17 0xa21029 in _start (/b/sanitizer-x86_64-linux-fast/build/llvm_build_ubsan/bin/clang-11+0xa21029)

This reverts commit 30d5946.
codeplay-sycl pushed a commit that referenced this pull request May 8, 2020
  CONFLICT (content): Merge conflict in clang/include/clang/Basic/LangOptions.h
codeplay-sycl pushed a commit that referenced this pull request Aug 4, 2020
Fixed build fail after b0eb40c [NFC] Remove unused GetUnderlyingObject paramenter
codeplay-sycl pushed a commit that referenced this pull request Aug 22, 2020
When `Target::GetEntryPointAddress()` calls `exe_module->GetObjectFile()->GetEntryPointAddress()`, and the returned
`entry_addr` is valid, it can immediately be returned.

However, just before that, an `llvm::Error` value has been setup, but in this case it is not consumed before returning, like is done further below in the function.

In https://bugs.freebsd.org/248745 we got a bug report for this, where a very simple test case aborts and dumps core:

```
* thread #1, name = 'testcase', stop reason = breakpoint 1.1
    frame #0: 0x00000000002018d4 testcase`main(argc=1, argv=0x00007fffffffea18) at testcase.c:3:5
   1	int main(int argc, char *argv[])
   2	{
-> 3	    return 0;
   4	}
(lldb) p argc
Program aborted due to an unhandled Error:
Error value was Success. (Note: Success values must still be checked prior to being destroyed).

Thread 1 received signal SIGABRT, Aborted.
thr_kill () at thr_kill.S:3
3	thr_kill.S: No such file or directory.
(gdb) bt
#0  thr_kill () at thr_kill.S:3
#1  0x00000008049a0004 in __raise (s=6) at /usr/src/lib/libc/gen/raise.c:52
#2  0x0000000804916229 in abort () at /usr/src/lib/libc/stdlib/abort.c:67
#3  0x000000000451b5f5 in fatalUncheckedError () at /usr/src/contrib/llvm-project/llvm/lib/Support/Error.cpp:112
#4  0x00000000019cf008 in GetEntryPointAddress () at /usr/src/contrib/llvm-project/llvm/include/llvm/Support/Error.h:267
#5  0x0000000001bccbd8 in ConstructorSetup () at /usr/src/contrib/llvm-project/lldb/source/Target/ThreadPlanCallFunction.cpp:67
#6  0x0000000001bcd2c0 in ThreadPlanCallFunction () at /usr/src/contrib/llvm-project/lldb/source/Target/ThreadPlanCallFunction.cpp:114
#7  0x00000000020076d4 in InferiorCallMmap () at /usr/src/contrib/llvm-project/lldb/source/Plugins/Process/Utility/InferiorCallPOSIX.cpp:97
#8  0x0000000001f4be33 in DoAllocateMemory () at /usr/src/contrib/llvm-project/lldb/source/Plugins/Process/FreeBSD/ProcessFreeBSD.cpp:604
#9  0x0000000001fe51b9 in AllocatePage () at /usr/src/contrib/llvm-project/lldb/source/Target/Memory.cpp:347
#10 0x0000000001fe5385 in AllocateMemory () at /usr/src/contrib/llvm-project/lldb/source/Target/Memory.cpp:383
#11 0x0000000001974da2 in AllocateMemory () at /usr/src/contrib/llvm-project/lldb/source/Target/Process.cpp:2301
#12 CanJIT () at /usr/src/contrib/llvm-project/lldb/source/Target/Process.cpp:2331
#13 0x0000000001a1bf3d in Evaluate () at /usr/src/contrib/llvm-project/lldb/source/Expression/UserExpression.cpp:190
#14 0x00000000019ce7a2 in EvaluateExpression () at /usr/src/contrib/llvm-project/lldb/source/Target/Target.cpp:2372
#15 0x0000000001ad784c in EvaluateExpression () at /usr/src/contrib/llvm-project/lldb/source/Commands/CommandObjectExpression.cpp:414
#16 0x0000000001ad86ae in DoExecute () at /usr/src/contrib/llvm-project/lldb/source/Commands/CommandObjectExpression.cpp:646
#17 0x0000000001a5e3ed in Execute () at /usr/src/contrib/llvm-project/lldb/source/Interpreter/CommandObject.cpp:1003
#18 0x0000000001a6c4a3 in HandleCommand () at /usr/src/contrib/llvm-project/lldb/source/Interpreter/CommandInterpreter.cpp:1762
#19 0x0000000001a6f98c in IOHandlerInputComplete () at /usr/src/contrib/llvm-project/lldb/source/Interpreter/CommandInterpreter.cpp:2760
#20 0x0000000001a90b08 in Run () at /usr/src/contrib/llvm-project/lldb/source/Core/IOHandler.cpp:548
#21 0x00000000019a6c6a in ExecuteIOHandlers () at /usr/src/contrib/llvm-project/lldb/source/Core/Debugger.cpp:903
#22 0x0000000001a70337 in RunCommandInterpreter () at /usr/src/contrib/llvm-project/lldb/source/Interpreter/CommandInterpreter.cpp:2946
#23 0x0000000001d9d812 in RunCommandInterpreter () at /usr/src/contrib/llvm-project/lldb/source/API/SBDebugger.cpp:1169
#24 0x0000000001918be8 in MainLoop () at /usr/src/contrib/llvm-project/lldb/tools/driver/Driver.cpp:675
#25 0x000000000191a114 in main () at /usr/src/contrib/llvm-project/lldb/tools/driver/Driver.cpp:890```

Fix the incorrect error catch by only instantiating an `Error` object if it is necessary.

Reviewed By: JDevlieghere

Differential Revision: https://reviews.llvm.org/D86355
codeplay-sycl pushed a commit that referenced this pull request Dec 9, 2020
CXXDeductionGuideDecl with a local typedef has its own copy of the
TypedefDecl with the CXXDeductionGuideDecl as the DeclContext of that
TypedefDecl.
```
      template <typename T> struct A {
        typedef T U;
        A(U, T);
      };
      A a{(int)0, (int)0};
```
Related discussion on cfe-dev:
http://lists.llvm.org/pipermail/cfe-dev/2020-November/067252.html

Without this fix, when we import the CXXDeductionGuideDecl (via
VisitFunctionDecl) then before creating the Decl we must import the
FunctionType. However, the first parameter's type is the afore mentioned
local typedef. So, we then start importing the TypedefDecl whose
DeclContext is the CXXDeductionGuideDecl itself. The infinite loop is
formed.
```
 #0 clang::ASTNodeImporter::VisitCXXDeductionGuideDecl(clang::CXXDeductionGuideDecl*) clang/lib/AST/ASTImporter.cpp:3543:0
 #1 clang::declvisitor::Base<std::add_pointer, clang::ASTNodeImporter, llvm::Expected<clang::Decl*> >::Visit(clang::Decl*) /home/egbomrt/WORK/llvm5/build/debug/tools/clang/include/clang/AST/DeclNodes.inc:405:0
 #2 clang::ASTImporter::ImportImpl(clang::Decl*) clang/lib/AST/ASTImporter.cpp:8038:0
 #3 clang::ASTImporter::Import(clang::Decl*) clang/lib/AST/ASTImporter.cpp:8200:0
 #4 clang::ASTImporter::ImportContext(clang::DeclContext*) clang/lib/AST/ASTImporter.cpp:8297:0
 #5 clang::ASTNodeImporter::ImportDeclContext(clang::Decl*, clang::DeclContext*&, clang::DeclContext*&) clang/lib/AST/ASTImporter.cpp:1852:0
 #6 clang::ASTNodeImporter::ImportDeclParts(clang::NamedDecl*, clang::DeclContext*&, clang::DeclContext*&, clang::DeclarationName&, clang::NamedDecl*&, clang::SourceLocation&) clang/lib/AST/ASTImporter.cpp:1628:0
 #7 clang::ASTNodeImporter::VisitTypedefNameDecl(clang::TypedefNameDecl*, bool) clang/lib/AST/ASTImporter.cpp:2419:0
 #8 clang::ASTNodeImporter::VisitTypedefDecl(clang::TypedefDecl*) clang/lib/AST/ASTImporter.cpp:2500:0
 #9 clang::declvisitor::Base<std::add_pointer, clang::ASTNodeImporter, llvm::Expected<clang::Decl*> >::Visit(clang::Decl*) /home/egbomrt/WORK/llvm5/build/debug/tools/clang/include/clang/AST/DeclNodes.inc:315:0
 #10 clang::ASTImporter::ImportImpl(clang::Decl*) clang/lib/AST/ASTImporter.cpp:8038:0
 #11 clang::ASTImporter::Import(clang::Decl*) clang/lib/AST/ASTImporter.cpp:8200:0
 #12 llvm::Expected<clang::TypedefNameDecl*> clang::ASTNodeImporter::import<clang::TypedefNameDecl>(clang::TypedefNameDecl*) clang/lib/AST/ASTImporter.cpp:165:0
 #13 clang::ASTNodeImporter::VisitTypedefType(clang::TypedefType const*) clang/lib/AST/ASTImporter.cpp:1304:0
 #14 clang::TypeVisitor<clang::ASTNodeImporter, llvm::Expected<clang::QualType> >::Visit(clang::Type const*) /home/egbomrt/WORK/llvm5/build/debug/tools/clang/include/clang/AST/TypeNodes.inc:74:0
 #15 clang::ASTImporter::Import(clang::QualType) clang/lib/AST/ASTImporter.cpp:8071:0
 #16 llvm::Expected<clang::QualType> clang::ASTNodeImporter::import<clang::QualType>(clang::QualType const&) clang/lib/AST/ASTImporter.cpp:179:0
 #17 clang::ASTNodeImporter::VisitFunctionProtoType(clang::FunctionProtoType const*) clang/lib/AST/ASTImporter.cpp:1244:0
 #18 clang::TypeVisitor<clang::ASTNodeImporter, llvm::Expected<clang::QualType> >::Visit(clang::Type const*) /home/egbomrt/WORK/llvm5/build/debug/tools/clang/include/clang/AST/TypeNodes.inc:47:0
 #19 clang::ASTImporter::Import(clang::QualType) clang/lib/AST/ASTImporter.cpp:8071:0
 #20 llvm::Expected<clang::QualType> clang::ASTNodeImporter::import<clang::QualType>(clang::QualType const&) clang/lib/AST/ASTImporter.cpp:179:0
 #21 clang::QualType clang::ASTNodeImporter::importChecked<clang::QualType>(llvm::Error&, clang::QualType const&) clang/lib/AST/ASTImporter.cpp:198:0
 #22 clang::ASTNodeImporter::VisitFunctionDecl(clang::FunctionDecl*) clang/lib/AST/ASTImporter.cpp:3313:0
 #23 clang::ASTNodeImporter::VisitCXXDeductionGuideDecl(clang::CXXDeductionGuideDecl*) clang/lib/AST/ASTImporter.cpp:3543:0
```

The fix is to first create the TypedefDecl and only then start to import
the DeclContext.
Basically, we could do this during the import of all other Decls (not
just for typedefs). But it seems, there is only one another AST
construct that has a similar cycle: a struct defined as a function
parameter:
```
int struct_in_proto(struct data_t{int a;int b;} *d);

```
In that case, however, we had decided to return simply with an error
back then because that seemed to be a very rare construct.

Differential Revision: https://reviews.llvm.org/D92209
@bjoernknafla
Copy link
Contributor Author

This PR got broken into smaller GitHub PRs after our work become available as open source.

A part of this PR that wasn't broken out into new PRs was to review existing LIT tests and to identify cases where:

  • run-time failures were caught but then ignored, so that failing tests seemed to pass,
  • tests were checked with assert instead of the existing CHECK macros.

It might make sense in the future to review the existing intel/llvm upsteam tests for these problems but I doubt that the changes of this PR would apply cleanly are even that they are more than noise and not worth keeping around. Therefore closing the PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant