-
Notifications
You must be signed in to change notification settings - Fork 14.3k
[Clang][OpenMP] Capture mapped pointers on target
by reference.
#145454
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
For the following: ```c int *p; \#pragma omp target map(p[0]) // (A) (void)p; \#pragma omp target map(p) // (B) (void)p; \#pragma omp target map(p, p[0]) // (C) (void)p; \#pragma omp target map(p[0], p) // (D) (void)p; ``` For (A), the pointer `p` is predetermined `firstprivate`, so it should be (and is) captured by-copy. However, for (B), (C), and (D), since `p` is already listed in a `map` clause, it's not predetermined `firstprivate`, and hence, should be captured by-reference, like any other mapped variable. To ensure the correct handling of (C) and (D), the following changes were made: 1. In SemaOpenMP, we now ensure that `p` is marked to be captured by-reference in these cases. 2. We no longer ignore `map(p)` during codegen of `target` constructs, even if there's another map like `map(p[0])` that would have been mapped using a PTR_AND_OBJ map. 3. For cases like (D), we now handle `map(p)` before `map(p[0])`, so the former gets the TARGET_PARAM flag and sets the kernel argument.
✅ With the latest revision this PR passed the C/C++ code formatter. |
OMPClauseMappableExprCommon::MappableExprComponentListRef Components = | ||
std::get<0>(LHS); | ||
OMPClauseMappableExprCommon::MappableExprComponentListRef ComponentsR = | ||
std::get<0>(RHS); | ||
if (VD && VD->getType()->isAnyPointerType() && Components.size() == 1 && | ||
ComponentsR.size() > 1) | ||
return true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the following preferable?
const Expr *E = std::get<5>(LHS);
const Expr *ER = std::get<5>(RHS);
if (VD && VD->getType()->isAnyPointerType() &&
isa_and_present<DeclRefExpr>(E) && !isa_and_present<DeclRefExpr>(ER))
// &ptr, &ptr, sizeof(ptr), TO | FROM | PARAM | ||
// &ptr, &ptr[0], 2 * sizeof(ptr[0]), TO | FROM | PTR_AND_OBJ | ||
// | ||
// CHECK-DAG: call i32 @__tgt_target_kernel(ptr @{{.+}}, i64 -1, i32 -1, i32 0, ptr @.[[KERNEL00]].region_id, ptr [[ARGS:%.+]]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The test now follows the CHECKs in https://github.com/llvm/llvm-project/blob/0e461d1781f1c67cc2724150c6da848f6ea75f3a/clang/test/OpenMP/target_map_codegen_26.cpp etc.
I think these are more readable since this form only has relevant CHECKs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Better to generate the checks using utils/update_cc_test_checks.py
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That reduces our coverage of i386 RUNs (line 4, 5), because the pointer/int size (in GEPs) are different, so the update script gives up and just emits:
//// NOTE: These prefixes are unused and the list is autogenerated. Do not add tests below this line:
// CHECK: {{.*}}
I can use the script if you think it's still preferable and we don't want to test for i386.
// | ||
// CHECK-DAG: [[RVAR02]] = getelementptr inbounds {{.*}}[[RVAR0:%.+]], i{{.+}} 2 | ||
// CHECK-DAG: [[RVAR0]] = load ptr, ptr [[VAR0]] | ||
#pragma omp target data map(ptr, ptr[2]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No change to the maps for target data in this PR.
// CHECK-NEXT: [[PTR_ADDR:%.*]] = alloca ptr | ||
// CHECK-NEXT: store ptr [[PTR]], ptr [[PTR_ADDR]] | ||
// CHECK-NEXT: [[TMP0:%.*]] = load ptr, ptr [[PTR_ADDR]] | ||
// CHECK-NEXT: [[TMP1:%.*]] = load ptr, ptr [[TMP0]] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have an extra load in the kernels because PTR is now passed by reference.
// CHECK-DAG: [[RVAR00]] = getelementptr inbounds {{.*}}[[RVAR0:%.+]], i{{.+}} 0 | ||
// CHECK-DAG: [[RVAR0]] = load ptr, ptr [[VAR0]] | ||
// | ||
// CHECK-DAG: call void @[[KERNEL00]](ptr [[VAR0]]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
%ptr
itself is passed into the kernel, since we now capture it by reference.
// CHECK-DAG: store ptr [[VAR0:%ptr]], ptr [[BP1]] | ||
// CHECK-DAG: store ptr [[RVAR02:%.+]], ptr [[P1]] | ||
// | ||
// CHECK-DAG: [[RVAR02]] = getelementptr inbounds {{.*}}[[RVAR0:%.+]], i{{.+}} 2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The maps for the three target
constructs are identical, except for the offset of this GEP for &p[0/2]
.
#pragma omp target data map(ptr, ptr[2]) | ||
|
||
// Region 02 | ||
// &ptr, &ptr, sizeof(ptr), TO | FROM | PARAM |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This exercises the sorting code to handle map(ptr)
before map(ptr[2])
@alexey-bataev, please review. |
@llvm/pr-subscribers-offload @llvm/pr-subscribers-clang Author: Abhinav Gaba (abhinavgaba) ChangesFor the following: int *p;
#pragma omp target map(p[0]) // (A)
(void)p;
#pragma omp target map(p) // (B)
(void)p;
#pragma omp target map(p, p[0]) // (C)
(void)p;
#pragma omp target map(p[0], p) // (D)
(void)p; For (A), the pointer To ensure the correct handling of (C) and (D), the following changes were made:
Patch is 34.47 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/145454.diff 5 Files Affected:
diff --git a/clang/lib/CodeGen/CGOpenMPRuntime.cpp b/clang/lib/CodeGen/CGOpenMPRuntime.cpp
index 8ccc37ef98a74..39571105e26b2 100644
--- a/clang/lib/CodeGen/CGOpenMPRuntime.cpp
+++ b/clang/lib/CodeGen/CGOpenMPRuntime.cpp
@@ -7270,8 +7270,14 @@ class MappableExprsHandler {
// of arguments, hence MEMBER_OF(4)
//
// map(p, p[:100])
+ // For "pragma omp target":
+ // &p, &p, sizeof(p), TARGET_PARAM | TO | FROM
+ // &p, &p[0], 100*sizeof(float), PTR_AND_OBJ | TO | FROM (*)
+ // Otherwise:
// ===> map(p[:100])
// &p, &p[0], 100*sizeof(float), TARGET_PARAM | PTR_AND_OBJ | TO | FROM
+ // (*) We need to use PTR_AND_OBJ here to ensure that the mapped copies of
+ // p and p[0] get attached.
// Track if the map information being generated is the first for a capture.
bool IsCaptureFirstInfo = IsFirstComponentList;
@@ -7289,14 +7295,26 @@ class MappableExprsHandler {
// components.
bool IsExpressionFirstInfo = true;
bool FirstPointerInComplexData = false;
+ bool SkipStandalonePtrMapping = false;
Address BP = Address::invalid();
const Expr *AssocExpr = I->getAssociatedExpression();
const auto *AE = dyn_cast<ArraySubscriptExpr>(AssocExpr);
const auto *OASE = dyn_cast<ArraySectionExpr>(AssocExpr);
const auto *OAShE = dyn_cast<OMPArrayShapingExpr>(AssocExpr);
- if (AreBothBasePtrAndPteeMapped && std::next(I) == CE)
+ // For map(p, p[0]) on a "target" construct, we need to map "p" by itself
+ // as it has to be passed by-reference as the kernel argument.
+ // For other constructs, we can skip mapping "p" because the PTR_AND_OBJ
+ // mapping for map(p[0]) will take care of mapping p as well.
+ SkipStandalonePtrMapping =
+ AreBothBasePtrAndPteeMapped &&
+ (!isa<const OMPExecutableDirective *>(CurDir) ||
+ !isOpenMPTargetExecutionDirective(
+ cast<const OMPExecutableDirective *>(CurDir)->getDirectiveKind()));
+
+ if (SkipStandalonePtrMapping && std::next(I) == CE)
return;
+
if (isa<MemberExpr>(AssocExpr)) {
// The base is the 'this' pointer. The content of the pointer is going
// to be the base of the field being mapped.
@@ -7672,7 +7690,7 @@ class MappableExprsHandler {
getMapTypeBits(MapType, MapModifiers, MotionModifiers, IsImplicit,
!IsExpressionFirstInfo || RequiresReference ||
FirstPointerInComplexData || IsMemberReference,
- AreBothBasePtrAndPteeMapped ||
+ SkipStandalonePtrMapping ||
(IsCaptureFirstInfo && !RequiresReference),
IsNonContiguous);
@@ -8811,8 +8829,19 @@ class MappableExprsHandler {
++EI;
}
}
- llvm::stable_sort(DeclComponentLists, [](const MapData &LHS,
- const MapData &RHS) {
+ llvm::stable_sort(DeclComponentLists, [VD](const MapData &LHS,
+ const MapData &RHS) {
+ // For cases like map(p, p[0], p[0][0]), the shortest map, like map(p)
+ // in this case, should be handled first, to ensure that it gets the
+ // TARGET_PARAM flag.
+ OMPClauseMappableExprCommon::MappableExprComponentListRef Components =
+ std::get<0>(LHS);
+ OMPClauseMappableExprCommon::MappableExprComponentListRef ComponentsR =
+ std::get<0>(RHS);
+ if (VD && VD->getType()->isAnyPointerType() && Components.size() == 1 &&
+ ComponentsR.size() > 1)
+ return true;
+
ArrayRef<OpenMPMapModifierKind> MapModifiers = std::get<2>(LHS);
OpenMPMapClauseKind MapType = std::get<1>(RHS);
bool HasPresent =
diff --git a/clang/lib/Sema/SemaOpenMP.cpp b/clang/lib/Sema/SemaOpenMP.cpp
index 00f4658180807..02e4e7b910d2e 100644
--- a/clang/lib/Sema/SemaOpenMP.cpp
+++ b/clang/lib/Sema/SemaOpenMP.cpp
@@ -2146,6 +2146,7 @@ bool SemaOpenMP::isOpenMPCapturedByRef(const ValueDecl *D, unsigned Level,
// | ptr | n.a. | - | x | - | - | bycopy|
// | ptr | n.a. | x | - | - | - | null |
// | ptr | n.a. | - | - | - | x | byref |
+ // | ptr | n.a. | - | - | - | x, x[] | bycopy|
// | ptr | n.a. | - | - | - | x[] | bycopy|
// | ptr | n.a. | - | - | x | | bycopy|
// | ptr | n.a. | - | - | x | x | bycopy|
@@ -2171,18 +2172,22 @@ bool SemaOpenMP::isOpenMPCapturedByRef(const ValueDecl *D, unsigned Level,
// - For pointers mapped by value that have either an implicit map or an
// array section, the runtime library may pass the NULL value to the
// device instead of the value passed to it by the compiler.
+ // - If both a pointer an a dereference of it are mapped, then the pointer
+ // should be passed by reference.
if (Ty->isReferenceType())
Ty = Ty->castAs<ReferenceType>()->getPointeeType();
- // Locate map clauses and see if the variable being captured is referred to
- // in any of those clauses. Here we only care about variables, not fields,
- // because fields are part of aggregates.
+ // Locate map clauses and see if the variable being captured is mapped by
+ // itself, or referred to, in any of those clauses. Here we only care about
+ // variables, not fields, because fields are part of aggregates.
bool IsVariableAssociatedWithSection = false;
+ bool IsVariableItselfMapped = false;
DSAStack->checkMappableExprComponentListsForDeclAtLevel(
D, Level,
[&IsVariableUsedInMapClause, &IsVariableAssociatedWithSection,
+ &IsVariableItselfMapped,
D](OMPClauseMappableExprCommon::MappableExprComponentListRef
MapExprComponents,
OpenMPClauseKind WhereFoundClauseKind) {
@@ -2198,8 +2203,19 @@ bool SemaOpenMP::isOpenMPCapturedByRef(const ValueDecl *D, unsigned Level,
assert(EI != EE && "Invalid map expression!");
- if (isa<DeclRefExpr>(EI->getAssociatedExpression()))
- IsVariableUsedInMapClause |= EI->getAssociatedDeclaration() == D;
+ if (isa<DeclRefExpr>(EI->getAssociatedExpression()) &&
+ EI->getAssociatedDeclaration() == D) {
+ IsVariableUsedInMapClause = true;
+
+ // If the component list has only one element, it's for mapping the
+ // variable itself, like map(p). This takes precedence in
+ // determining how it's captured, so we don't need to look further
+ // for any other maps that use the variable (like map(p[0]) etc.)
+ if (MapExprComponents.size() == 1) {
+ IsVariableItselfMapped = true;
+ return true;
+ }
+ }
++EI;
if (EI == EE)
@@ -2213,8 +2229,10 @@ bool SemaOpenMP::isOpenMPCapturedByRef(const ValueDecl *D, unsigned Level,
isa<MemberExpr>(EI->getAssociatedExpression()) ||
isa<OMPArrayShapingExpr>(Last->getAssociatedExpression())) {
IsVariableAssociatedWithSection = true;
- // There is nothing more we need to know about this variable.
- return true;
+ // We've found a case like map(p[0]) or map(p->a) or map(*p),
+ // so we are done with this particular map, but we need to keep
+ // looking in case we find a map(p).
+ return false;
}
// Keep looking for more map info.
@@ -2223,8 +2241,23 @@ bool SemaOpenMP::isOpenMPCapturedByRef(const ValueDecl *D, unsigned Level,
if (IsVariableUsedInMapClause) {
// If variable is identified in a map clause it is always captured by
- // reference except if it is a pointer that is dereferenced somehow.
- IsByRef = !(Ty->isPointerType() && IsVariableAssociatedWithSection);
+ // reference except if it is a pointer that is dereferenced somehow, but
+ // not itself mapped.
+ //
+ // OpenMP 6.0, 7.1.1: Data sharing attribute rules, variables referenced
+ // in a construct::
+ // If a list item in a has_device_addr clause or in a map clause on the
+ // target construct has a base pointer, and the base pointer is a scalar
+ // variable *that is not a list item in a map clause on the construct*,
+ // the base pointer is firstprivate.
+ //
+ // OpenMP 4.5, 2.15.1.1: Data-sharing Attribute Rules for Variables
+ // Referenced in a Construct:
+ // If an array section is a list item in a map clause on the target
+ // construct and the array section is derived from a variable for which
+ // the type is pointer then that variable is firstprivate.
+ IsByRef = IsVariableItselfMapped ||
+ !(Ty->isPointerType() && IsVariableAssociatedWithSection);
} else {
// By default, all the data that has a scalar type is mapped by copy
// (except for reduction variables).
diff --git a/clang/test/OpenMP/target_map_both_pointer_pointee_codegen.cpp b/clang/test/OpenMP/target_map_both_pointer_pointee_codegen.cpp
index 87fa7fe462daa..9a8f234da718c 100644
--- a/clang/test/OpenMP/target_map_both_pointer_pointee_codegen.cpp
+++ b/clang/test/OpenMP/target_map_both_pointer_pointee_codegen.cpp
@@ -1,174 +1,178 @@
-// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py UTC_ARGS: --function-signature --include-generated-funcs --replace-value-regex "__omp_offloading_[0-9a-z]+_[0-9a-z]+" "reduction_size[.].+[.]" "pl_cond[.].+[.|,]" --prefix-filecheck-ir-name _
// RUN: %clang_cc1 -verify -fopenmp -fopenmp-targets=powerpc64le-ibm-linux-gnu -x c++ -triple powerpc64le-unknown-unknown -emit-llvm %s -o - | FileCheck %s
// RUN: %clang_cc1 -fopenmp -fopenmp-targets=powerpc64le-ibm-linux-gnu -x c++ -std=c++11 -triple powerpc64le-unknown-unknown -emit-pch -o %t %s
// RUN: %clang_cc1 -fopenmp -fopenmp-targets=powerpc64le-ibm-linux-gnu -x c++ -triple powerpc64le-unknown-unknown -std=c++11 -include-pch %t -verify %s -emit-llvm -o - | FileCheck %s
+// RUN: %clang_cc1 -fopenmp -fopenmp-targets=i386-pc-linux-gnu -x c++ -std=c++11 -triple i386-unknown-unknown -emit-pch -o %t %s
+// RUN: %clang_cc1 -fopenmp -fopenmp-targets=i386-pc-linux-gnu -x c++ -triple i386-unknown-unknown -std=c++11 -include-pch %t -verify %s -emit-llvm -o - | FileCheck %s
// expected-no-diagnostics
#ifndef HEADER
#define HEADER
+// CHECK: @.[[KERNEL00:__omp_offloading_.*foov_l[0-9]+]].region_id = weak constant i8 0
+// CHECK: [[SIZE00:@.+]] = private unnamed_addr constant [2 x i64] [i64 {{8|4}}, i64 8]
+// CHECK: [[MYTYPE00:@.+]] = private unnamed_addr constant [2 x i64] [i64 35, i64 19]
+
+// CHECK: @.[[KERNEL01:__omp_offloading_.*foov_l[0-9]+]].region_id = weak constant i8 0
+// CHECK: [[SIZE01:@.+]] = private unnamed_addr constant [2 x i64] [i64 {{8|4}}, i64 4]
+// CHECK: [[MYTYPE01:@.+]] = private unnamed_addr constant [2 x i64] [i64 35, i64 19]
+
+// CHECK: @.[[KERNEL02:__omp_offloading_.*foov_l[0-9]+]].region_id = weak constant i8 0
+// CHECK: [[SIZE02:@.+]] = private unnamed_addr constant [2 x i64] [i64 {{8|4}}, i64 4]
+// CHECK: [[MYTYPE02:@.+]] = private unnamed_addr constant [2 x i64] [i64 35, i64 19]
+
+// CHECK: [[SIZE03:@.+]] = private unnamed_addr constant [1 x i64] [i64 4]
+// CHECK: [[MYTYPE03:@.+]] = private unnamed_addr constant [1 x i64] [i64 51]
+
extern void *malloc (int __size) throw () __attribute__ ((__malloc__));
+// CHECK-LABEL: define{{.*}}@_Z3foov{{.*}}(
void foo() {
int *ptr = (int *) malloc(3 * sizeof(int));
+// Region 00
+// &ptr, &ptr, sizeof(ptr), TO | FROM | PARAM
+// &ptr, &ptr[0], 2 * sizeof(ptr[0]), TO | FROM | PTR_AND_OBJ
+//
+// CHECK-DAG: call i32 @__tgt_target_kernel(ptr @{{.+}}, i64 -1, i32 -1, i32 0, ptr @.[[KERNEL00]].region_id, ptr [[ARGS:%.+]])
+// CHECK-DAG: [[BPARG:%.+]] = getelementptr inbounds {{.+}}[[ARGS]], i32 0, i32 2
+// CHECK-DAG: store ptr [[BPGEP:%.+]], ptr [[BPARG]]
+// CHECK-DAG: [[PARG:%.+]] = getelementptr inbounds {{.+}}[[ARGS]], i32 0, i32 3
+// CHECK-DAG: store ptr [[PGEP:%.+]], ptr [[PARG]]
+// CHECK-DAG: [[BPGEP]] = getelementptr inbounds {{.+}}[[BP:%[^,]+]]
+// CHECK-DAG: [[PGEP]] = getelementptr inbounds {{.+}}[[P:%[^,]+]]
+//
+// CHECK-DAG: [[BP0:%.+]] = getelementptr inbounds {{.+}}[[BP]], i{{.+}} 0, i{{.+}} 0
+// CHECK-DAG: [[P0:%.+]] = getelementptr inbounds {{.+}}[[P]], i{{.+}} 0, i{{.+}} 0
+// CHECK-DAG: store ptr [[VAR0:%ptr]], ptr [[BP0]]
+// CHECK-DAG: store ptr [[VAR0]], ptr [[P0]]
+//
+// CHECK-DAG: [[BP1:%.+]] = getelementptr inbounds {{.+}}[[BP]], i{{.+}} 0, i{{.+}} 1
+// CHECK-DAG: [[P1:%.+]] = getelementptr inbounds {{.+}}[[P]], i{{.+}} 0, i{{.+}} 1
+// CHECK-DAG: store ptr [[VAR0:%ptr]], ptr [[BP1]]
+// CHECK-DAG: store ptr [[RVAR00:%.+]], ptr [[P1]]
+//
+// CHECK-DAG: [[RVAR00]] = getelementptr inbounds {{.*}}[[RVAR0:%.+]], i{{.+}} 0
+// CHECK-DAG: [[RVAR0]] = load ptr, ptr [[VAR0]]
+//
+// CHECK-DAG: call void @[[KERNEL00]](ptr [[VAR0]])
#pragma omp target map(ptr, ptr[0:2])
{
ptr[1] = 6;
}
+
+// Region 01
+// &ptr, &ptr, sizeof(ptr), TO | FROM | PARAM
+// &ptr, &ptr[2], sizeof(ptr[2]), TO | FROM | PTR_AND_OBJ
+//
+// CHECK-DAG: call i32 @__tgt_target_kernel(ptr @{{.+}}, i64 -1, i32 -1, i32 0, ptr @.[[KERNEL01]].region_id, ptr [[ARGS:%.+]])
+// CHECK-DAG: [[BPARG:%.+]] = getelementptr inbounds {{.+}}[[ARGS]], i32 0, i32 2
+// CHECK-DAG: store ptr [[BPGEP:%.+]], ptr [[BPARG]]
+// CHECK-DAG: [[PARG:%.+]] = getelementptr inbounds {{.+}}[[ARGS]], i32 0, i32 3
+// CHECK-DAG: store ptr [[PGEP:%.+]], ptr [[PARG]]
+// CHECK-DAG: [[BPGEP]] = getelementptr inbounds {{.+}}[[BP:%[^,]+]]
+// CHECK-DAG: [[PGEP]] = getelementptr inbounds {{.+}}[[P:%[^,]+]]
+//
+// CHECK-DAG: [[BP0:%.+]] = getelementptr inbounds {{.+}}[[BP]], i{{.+}} 0, i{{.+}} 0
+// CHECK-DAG: [[P0:%.+]] = getelementptr inbounds {{.+}}[[P]], i{{.+}} 0, i{{.+}} 0
+// CHECK-DAG: store ptr [[VAR0:%ptr]], ptr [[BP0]]
+// CHECK-DAG: store ptr [[VAR0]], ptr [[P0]]
+//
+// CHECK-DAG: [[BP1:%.+]] = getelementptr inbounds {{.+}}[[BP]], i{{.+}} 0, i{{.+}} 1
+// CHECK-DAG: [[P1:%.+]] = getelementptr inbounds {{.+}}[[P]], i{{.+}} 0, i{{.+}} 1
+// CHECK-DAG: store ptr [[VAR0:%ptr]], ptr [[BP1]]
+// CHECK-DAG: store ptr [[RVAR02:%.+]], ptr [[P1]]
+//
+// CHECK-DAG: [[RVAR02]] = getelementptr inbounds {{.*}}[[RVAR0:%.+]], i{{.+}} 2
+// CHECK-DAG: [[RVAR0]] = load ptr, ptr [[VAR0]]
+//
+// CHECK-DAG: call void @[[KERNEL01]](ptr [[VAR0]])
#pragma omp target map(ptr, ptr[2])
{
ptr[2] = 8;
}
- #pragma omp target data map(ptr, ptr[2])
+
+// Region 02
+// &ptr, &ptr, sizeof(ptr), TO | FROM | PARAM
+// &ptr, &ptr[2], sizeof(ptr[2]), TO | FROM | PTR_AND_OBJ
+//
+// CHECK-DAG: call i32 @__tgt_target_kernel(ptr @{{.+}}, i64 -1, i32 -1, i32 0, ptr @.[[KERNEL02]].region_id, ptr [[ARGS:%.+]])
+// CHECK-DAG: [[BPARG:%.+]] = getelementptr inbounds {{.+}}[[ARGS]], i32 0, i32 2
+// CHECK-DAG: store ptr [[BPGEP:%.+]], ptr [[BPARG]]
+// CHECK-DAG: [[PARG:%.+]] = getelementptr inbounds {{.+}}[[ARGS]], i32 0, i32 3
+// CHECK-DAG: store ptr [[PGEP:%.+]], ptr [[PARG]]
+// CHECK-DAG: [[BPGEP]] = getelementptr inbounds {{.+}}[[BP:%[^,]+]]
+// CHECK-DAG: [[PGEP]] = getelementptr inbounds {{.+}}[[P:%[^,]+]]
+//
+// CHECK-DAG: [[BP0:%.+]] = getelementptr inbounds {{.+}}[[BP]], i{{.+}} 0, i{{.+}} 0
+// CHECK-DAG: [[P0:%.+]] = getelementptr inbounds {{.+}}[[P]], i{{.+}} 0, i{{.+}} 0
+// CHECK-DAG: store ptr [[VAR0:%ptr]], ptr [[BP0]]
+// CHECK-DAG: store ptr [[VAR0]], ptr [[P0]]
+//
+// CHECK-DAG: [[BP1:%.+]] = getelementptr inbounds {{.+}}[[BP]], i{{.+}} 0, i{{.+}} 1
+// CHECK-DAG: [[P1:%.+]] = getelementptr inbounds {{.+}}[[P]], i{{.+}} 0, i{{.+}} 1
+// CHECK-DAG: store ptr [[VAR0:%ptr]], ptr [[BP1]]
+// CHECK-DAG: store ptr [[RVAR02:%.+]], ptr [[P1]]
+//
+// CHECK-DAG: [[RVAR02]] = getelementptr inbounds {{.*}}[[RVAR0:%.+]], i{{.+}} 2
+// CHECK-DAG: [[RVAR0]] = load ptr, ptr [[VAR0]]
+//
+// CHECK-DAG: call void @[[KERNEL02]](ptr [[VAR0]])
+ #pragma omp target map(ptr[2], ptr)
{
ptr[2] = 9;
}
+
+// Region 03
+// &ptr, &ptr[2], sizeof(ptr[2]), TO | FROM | PARAM | PTR_AND_OBJ
+// FIXME: PARAM seems to be redundant here.
+//
+// CHECK-DAG: call void @__tgt_target_data_begin_mapper(ptr @{{.+}}, i64 -1, i32 1, ptr [[BPGEP:.+]], ptr [[PGEP:.+]], ptr [[SIZE03]], ptr [[MYTYPE03]], ptr null, ptr null)
+// CHECK-DAG: [[BPGEP]] = getelementptr inbounds {{.+}}[[BP:%[^,]+]]
+// CHECK-DAG: [[PGEP]] = getelementptr inbounds {{.+}}[[P:%[^,]+]]
+//
+// CHECK-DAG: [[BP0:%.+]] = getelementptr inbounds {{.+}}[[BP]], i{{.+}} 0, i{{.+}} 0
+// CHECK-DAG: [[P0:%.+]] = getelementptr inbounds {{.+}}[[P]], i{{.+}} 0, i{{.+}} 0
+// CHECK-DAG: store ptr [[VAR0:%ptr]], ptr [[BP0]]
+// CHECK-DAG: store ptr [[RVAR02:%.+]], ptr [[P0]]
+//
+// CHECK-DAG: [[RVAR02]] = getelementptr inbounds {{.*}}[[RVAR0:%.+]], i{{.+}} 2
+// CHECK-DAG: [[RVAR0]] = load ptr, ptr [[VAR0]]
+ #pragma omp target data map(ptr, ptr[2])
+ {
+ ptr[2] = 10;
+ }
}
-#endif
-// CHECK-LABEL: define {{[^@]+}}@_Z3foov
-// CHECK-SAME: () #[[ATTR0:[0-9]+]] {
+
+// CHECK-LABEL: define internal void
+// CHECK-SAME: @[[KERNEL00]](ptr {{[^,]*}}[[PTR:%[^,]+]])
// CHECK-NEXT: entry:
-// CHECK-NEXT: [[PTR:%.*]] = alloca ptr, align 8
-// CHECK-NEXT: [[DOTOFFLOAD_BASEPTRS:%.*]] = alloca [1 x ptr], align 8
-// CHECK-NEXT: [[DOTOFFLOAD_PTRS:%.*]] = alloca [1 x ptr], align 8
-// CHECK-NEXT: [[DOTOFFLOAD_MAPPERS:%.*]] = alloca [1 x ptr], align 8
-// CHECK-NEXT: [[KERNEL_ARGS:%.*]] = alloca [[STRUCT___TGT_KERNEL_ARGUMENTS:%.*]], align 8
-// CHECK-NEXT: [[DOTOFFLOAD_BASEPTRS2:%.*]] = alloca [1 x ptr], align 8
-// CHECK-NEXT: [[DOTOFFLOAD_PTRS3:%.*]] = alloca [1 x ptr], align 8
-// CHECK-NEXT: [[DOTOFFLOAD_MAPPERS4:%.*]] = alloca [1 x ptr], align 8
-// CHECK-NEXT: [[KERNEL_ARGS5:%.*]] = alloca [[STRUCT___TGT_KERNEL_ARGUMENTS]], align 8
-// CHECK-NEXT: [[DOTOFFLOAD_BASEPTRS9:%.*]] = alloca [1 x ptr], align 8
-// CHECK-NEXT: [[DOTOFFLOAD_PTRS10:%.*]] = alloca [1 x ptr], align 8
-// CHECK-NEXT: [[DOTOFFLOAD_MAPPERS11:%.*]] = alloca [1 x ptr], align 8
-// CHECK-NEXT: [[CALL:%.*]] = call noalias noundef ptr @_Z6malloci(i32 noundef signext 12) #[[ATTR3:[0-9]+]]
-// CHECK-NEXT: store ptr [[CALL]], ptr [[PTR]], align 8
-// CHECK-NEXT: [[TMP0:%.*]] = load ptr, ptr [[PTR]], align 8
-// CHECK-NEXT: [[TMP1:%.*]] = load ptr, ptr [[PTR]], align 8
-// CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds nuw i32, ptr [[TMP1]], i64 0
-// CHECK-NEXT: [[TMP2:%.*]] = getelementptr inbounds [1 x ptr], ptr [[DOTOFFLOAD_BASEPTRS]], i32 0, i32 0
-// CHECK-NEXT: store ptr [[PTR]], ptr [[TMP2]], align 8
-// CHECK-NEXT: [[TMP3:%.*]] = getelementptr inbounds [1 x ptr], ptr [[DOTOFFLOAD_PTRS]], i32 0, i32 0
-// CHECK-NEXT: store ptr [[ARRAYIDX]], ptr [[TMP3]], align 8
-// CHECK-NEXT: [[TMP4:%.*]] = getelementptr inbounds [1 x ptr], ptr [[DOTOFFLOAD_MAPPERS]], i64 0, i64 0
-// CHECK-NEXT: store ptr null, ptr [[TMP4]], align 8
-// CHECK-NEXT: [[TMP5:%.*]] = getelementptr inbounds [1 x ptr], ptr [[DOTOFFLOAD_BASEPTRS]], i32 0, i32 0
-// CHECK-NEXT: [[TMP6:%.*]] = getelementptr inbounds [1 x ptr], ptr [[DOTOFFLOAD_PTRS]], i32 0, i32 0
-// CHECK-NEXT: [[TMP7:%.*]] = getelementptr inbounds nuw [[STRUCT___TGT_KERNEL_ARGUMENTS]], ptr [[KERNEL_ARGS]], i32 0, i32 0
-// CHECK-NEXT: store i32 3, ptr [[TMP7]], align 4
-// CHECK-NEXT: [[TMP8:%.*]] = getelementptr inbounds nuw [[STRUCT___TGT_KERNEL_ARGUMENTS]], ptr [[KERNEL_ARGS]], i32 0, i32 1
-// CHECK-NEXT: store i32 1, ptr [[TMP8]], align 4
-// CHECK-NEXT: [[TMP9:%.*]] = getelementptr inbounds nuw [[STRUCT___TGT_KERNEL_ARGUMENTS]], ptr [[KERNEL_ARGS]], i32 0, i32 2
-// CHECK-NEXT: store ptr [[TMP5]], ptr [[TMP9]], align 8
-// CHECK-NEXT: [[TMP10:%.*]] = getelementptr inbounds nuw [[STRUCT___TGT_KERNEL_ARGUMENTS]], ptr [[KERNEL_ARGS]], i32 0, i32 3
-// CHECK-NEXT: store ptr [[TMP6]], ptr [[TMP10]], align 8
-// CHECK-NEXT: [[TMP11:%.*]] = getelementptr inbounds nuw [[STRUCT___TGT_KERNEL_ARGUMENTS]], ptr [[KERNEL_ARGS]], i32 0, i32 4
-// CHECK-NEXT: store ptr @.offload_sizes, ptr [[TMP11]], align 8
-// CHECK-NEXT: [[TMP12:%.*]] = getelement...
[truncated]
|
const MapData &RHS) { | ||
llvm::stable_sort(DeclComponentLists, [VD](const MapData &LHS, | ||
const MapData &RHS) { | ||
// For cases like map(p, p[0], p[0][0]), the shortest map, like map(p) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like this is not sufficient for cases like map(sp, sp->x)
. In that case, the problem is that sp->x populates PartialStruct in
llvm-project/clang/lib/CodeGen/CGOpenMPRuntime.cpp
Lines 8985 to 8990 in 90f3147
generateInfoForComponentList( | |
MapType, MapModifiers, {}, Components, CombinedInfo, | |
StructBaseCombinedInfo, PartialStruct, IsFirstComponentList, | |
IsImplicit, /*GenerateAllInfoForClauses*/ false, Mapper, | |
/*ForDeviceAddr=*/false, VD, VarRef, | |
/*OverlappedElements*/ {}, HasMapBasePtr && HasMapArraySec); |
And even though initially after going through generateInfoFroCapture, map(sp)
occupies the top of the map-chain in CurInfo:
&sp, &sp, sizeof(sp), TO | PARAM
&sp[0], &sp->x, sizeof(sp->x), TO
After, emitCombinedEntry, a new entry gets added for:
&sp[0], &sp[0], sizeof(sp[0]), ALLOC | PARAM
and all existing entries in CurInfo get the MEMBER_OF(1) bit applied to them.
CombinedInfo.append(PartialStruct.PreliminaryMapData); |
So the final CombinedInfo after emitCombinedEntry and then appending CurInfo to it becomes:
&sp[0], &sp[0], sizeof(sp[0]), ALLOC | PARAM
&sp, &sp, sizeof(sp), TO | MEMBER_OF(1)
&sp[0], &sp->x, sizeof(sp->x), TO | MEMBER_OF(1)
instead of the desired:
&sp, &sp, sizeof(sp), TO | PARAM
&sp[0], &sp[0], sizeof(sp[0]), ALLOC
&sp[0], &sp->x, sizeof(sp->x), TO | MEMBER_OF(2)
(Note that this &sp getting MEMBER_OF(1) issue currently exists on target enter data as well.)
Maybe we can split up
llvm-project/clang/lib/CodeGen/CGOpenMPRuntime.cpp
Lines 9489 to 9491 in 90f3147
// If we have any information in the map clause, we use it, otherwise we | |
// just do a default mapping. | |
MEHandler.generateInfoForCapture(CI, *CV, CurInfo, PartialStruct); |
// Pseudo code-flow
// Directly pass the original CombinedInfo to the first two
generateInfoForCaptureForIDP/HDA(...&CombinedInfo...)
generateInfoForCaptureForMaps(&CombinedInfo..., /*filter for only component-lists that are single-length and for pointer base decls*/)
// Pass CurInfo to the next two
generateInfoForCaptureForMaps(&CurInfo, &PartialStruct, /*Filter for remining component-lists*/)
emitCombinedEntry(...&CurInfo, &PartialStruct...)
CombinedInfo.append(CurInfo)
@alexey-bataev, do you have any suggestions/feedback?
Future Issue:
The problem long-term is that we currently assume that all component-lists for the same VAR should contribute towards a single combined PartialStruct, if any. That should not be the case for cases like:
map(to: sp->x, sp->y) map(to: sp->sq->a, sp->sq->b) map(sp)
In the above case, we should get two independent "containing-structs" mapped by themselves:
&sp, &sp, sizeof(sp), TO | PARAM
// map-chain for the containing-struct sp[0] with base-pointer sp
&sp[0], &sp[0], sizeof(sp[0]), ALLOC /* Can be optimized to not alloc the full struct */
&sp[0], &sp->x, sizeof(sp->x), TO | MEMBER_OF(2)
&sp[0], &sp->y, sizeof(sp->y), TO | MEMBER_OF(2)
&sp, &sp[0], ATTACH
// map-chain for the containing-struct sp->sq[0] with base-pointer sp->sq
&(sp->sq[0]), &(sp->sq[0]), sizeof(sp->sq[0]), ALLOC
&sp->sq[0], &sp->sq->a, sizeof(sp->sq->a), TO | MEMBER_OF(6)
&sp->sq[0], &sp->sq->a, sizeof(sp->sq->a), TO | MEMBER_OF(6)
&sp->sq, &sp->sq[0], ATTACH
So we need to loop-over all component-lists, and only process those together that share the same attachable-base-pointer (sp or sp->sq in this case).
So, our second invocation for the "remaining component-lists" would become something like:
for (Expr* AttachBasePtr: AttachBasePtrs) {
MapCombinedInfoTy PerAttachBaseCurInfo;
StructRangeInfoTy PerAttachBasePartialStruct;
generateInfoForCaptureForMaps(&PerAttachBaseCurInfo, &PerAttachBasePartialStruct, /*Filter for every component-list that has AttachBaseptr as its attachable base-pointer*/)
emitCombinedEntry(...&PerAttachBaseCurInfo, &PerAttachBasePartialStruct...)
CombinedInfo.append(PerAttachBaseCurInfo)
}
Alexey, do you foresee any issues with this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Not sure splitting is a good idea, we may lose some dependency info potentially, but not sure.
- What if we have 3-level dependency, will we need the third loop in this case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And I'm also not sure yet how the lambda captures function would interact with this if we split g`enerateInfoForCapture.
- What if we have 3-level dependency, will we need the third loop in this case?
For a three-level pointer case like this:
map(to: sp->x, sp->y) map(to: sp->sq->a, sp->sq->b) map(to: sp->sq->sr->c, sp->sq->sr->d) map(sp)
We should eventually have something like:
&sp, &sp, sizeof(sp), TO | PARAM
// map-chain for the containing-struct sp[0] with base-pointer sp
&sp[0], &sp[0], sizeof(sp[0]), ALLOC
&sp[0], &sp->x, sizeof(sp->x), TO | MEMBER_OF(2)
&sp[0], &sp->y, sizeof(sp->y), TO | MEMBER_OF(2)
&sp, &sp[0], ATTACH
// map-chain for the containing-struct sp->sq[0] with base-pointer sp->sq
&(sp->sq[0]), &(sp->sq[0]), sizeof(sp->sq[0]), ALLOC
&sp->sq[0], &sp->sq->a, sizeof(sp->sq->a), TO | MEMBER_OF(6)
&sp->sq[0], &sp->sq->a, sizeof(sp->sq->a), TO | MEMBER_OF(6)
&sp->sq, &sp->sq[0], ATTACH
// map-chain for the containing-struct sp->sq->sr[0] with base-pointer sp->sq->sr
&(sp->sq->sr[0]), &(sp->sq->sr[0]), sizeof(sp->sq->sr[0]), ALLOC
&sp->sq->sr[0], &sp->sq->sr->c, sizeof(sp->sq->sr->c), TO | MEMBER_OF(10)
&sp->sq->sr[0], &sp->sq->sr->d, sizeof(sp->sq->sr->d), TO | MEMBER_OF(10)
&sp->sq->sr, &sp->sq->sr[0], ATTACH
The for (Expr* AttachBasePtr: AttachBasePtrs)
loop would need to run three times, once each for the for the attachable-base-pointer sp
, sp->sq
, and sp->sq->sr
.
And we can think of the first invocation to be for the component-lists that have no base-pointer, and pull that into the loop as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Try to fix in a single loop at first. If it does not work, try splitting
For the following:
For (A), the pointer
p
is predeterminedfirstprivate
, so it should be (and is) captured by-copy. However, for (B), (C), and (D), sincep
is already listed in amap
clause, it's not predeterminedfirstprivate
, and hence, should be captured by-reference, like any other mapped variable.To ensure the correct handling of (C) and (D), the following changes were made:
In SemaOpenMP, we now ensure that
p
is marked to be captured by-reference in these cases.We no longer ignore
map(p)
during codegen oftarget
constructs, even if there's another map likemap(p[0])
that would have been mapped using a PTR_AND_OBJ map.For cases like (D), we now handle
map(p)
beforemap(p[0])
, so the former gets the TARGET_PARAM flag and sets the kernel argument.