Skip to content

[Clang][OpenMP] Capture mapped pointers on target by reference. #145454

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 3 commits into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
37 changes: 33 additions & 4 deletions clang/lib/CodeGen/CGOpenMPRuntime.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -7270,8 +7270,14 @@ class MappableExprsHandler {
// of arguments, hence MEMBER_OF(4)
//
// map(p, p[:100])
// For "pragma omp target":
// &p, &p, sizeof(p), TARGET_PARAM | TO | FROM
// &p, &p[0], 100*sizeof(float), PTR_AND_OBJ | TO | FROM (*)
// Otherwise:
// ===> map(p[:100])
// &p, &p[0], 100*sizeof(float), TARGET_PARAM | PTR_AND_OBJ | TO | FROM
// (*) We need to use PTR_AND_OBJ here to ensure that the mapped copies of
// p and p[0] get attached.

// Track if the map information being generated is the first for a capture.
bool IsCaptureFirstInfo = IsFirstComponentList;
Expand All @@ -7289,14 +7295,26 @@ class MappableExprsHandler {
// components.
bool IsExpressionFirstInfo = true;
bool FirstPointerInComplexData = false;
bool SkipStandalonePtrMapping = false;
Address BP = Address::invalid();
const Expr *AssocExpr = I->getAssociatedExpression();
const auto *AE = dyn_cast<ArraySubscriptExpr>(AssocExpr);
const auto *OASE = dyn_cast<ArraySectionExpr>(AssocExpr);
const auto *OAShE = dyn_cast<OMPArrayShapingExpr>(AssocExpr);

if (AreBothBasePtrAndPteeMapped && std::next(I) == CE)
// For map(p, p[0]) on a "target" construct, we need to map "p" by itself
// as it has to be passed by-reference as the kernel argument.
// For other constructs, we can skip mapping "p" because the PTR_AND_OBJ
// mapping for map(p[0]) will take care of mapping p as well.
SkipStandalonePtrMapping =
AreBothBasePtrAndPteeMapped &&
(!isa<const OMPExecutableDirective *>(CurDir) ||
!isOpenMPTargetExecutionDirective(
cast<const OMPExecutableDirective *>(CurDir)->getDirectiveKind()));

if (SkipStandalonePtrMapping && std::next(I) == CE)
return;

if (isa<MemberExpr>(AssocExpr)) {
// The base is the 'this' pointer. The content of the pointer is going
// to be the base of the field being mapped.
Expand Down Expand Up @@ -7672,7 +7690,7 @@ class MappableExprsHandler {
getMapTypeBits(MapType, MapModifiers, MotionModifiers, IsImplicit,
!IsExpressionFirstInfo || RequiresReference ||
FirstPointerInComplexData || IsMemberReference,
AreBothBasePtrAndPteeMapped ||
SkipStandalonePtrMapping ||
(IsCaptureFirstInfo && !RequiresReference),
IsNonContiguous);

Expand Down Expand Up @@ -8811,8 +8829,19 @@ class MappableExprsHandler {
++EI;
}
}
llvm::stable_sort(DeclComponentLists, [](const MapData &LHS,
const MapData &RHS) {
llvm::stable_sort(DeclComponentLists, [VD](const MapData &LHS,
const MapData &RHS) {
// For cases like map(p, p[0], p[0][0]), the shortest map, like map(p)
Copy link
Contributor Author

@abhinavgaba abhinavgaba Jun 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like this is not sufficient for cases like map(sp, sp->x). In that case, the problem is that sp->x populates PartialStruct in

generateInfoForComponentList(
MapType, MapModifiers, {}, Components, CombinedInfo,
StructBaseCombinedInfo, PartialStruct, IsFirstComponentList,
IsImplicit, /*GenerateAllInfoForClauses*/ false, Mapper,
/*ForDeviceAddr=*/false, VD, VarRef,
/*OverlappedElements*/ {}, HasMapBasePtr && HasMapArraySec);
,

And even though initially after going through generateInfoFroCapture, map(sp) occupies the top of the map-chain in CurInfo:

&sp, &sp, sizeof(sp), TO | PARAM
&sp[0], &sp->x, sizeof(sp->x), TO

After, emitCombinedEntry, a new entry gets added for:

&sp[0], &sp[0], sizeof(sp[0]), ALLOC | PARAM

and all existing entries in CurInfo get the MEMBER_OF(1) bit applied to them.

CombinedInfo.append(PartialStruct.PreliminaryMapData);

So the final CombinedInfo after emitCombinedEntry and then appending CurInfo to it becomes:

&sp[0], &sp[0], sizeof(sp[0]), ALLOC | PARAM
&sp, &sp, sizeof(sp), TO | MEMBER_OF(1)
&sp[0], &sp->x, sizeof(sp->x), TO | MEMBER_OF(1)

instead of the desired:

&sp, &sp, sizeof(sp), TO | PARAM
&sp[0], &sp[0], sizeof(sp[0]), ALLOC
&sp[0], &sp->x, sizeof(sp->x), TO | MEMBER_OF(2)

(Note that this &sp getting MEMBER_OF(1) issue currently exists on target enter data as well.)

Maybe we can split up

// If we have any information in the map clause, we use it, otherwise we
// just do a default mapping.
MEHandler.generateInfoForCapture(CI, *CV, CurInfo, PartialStruct);
to first work on is_device_ptr, has_device_addr, and maps on pointers with component-lists of size one, and then subsequently on the remaining component-lists, but I'm not sure if I'm thinking of all the cases there, and if that'll cause issues.

// Pseudo code-flow
// Directly pass the original CombinedInfo to the first two
generateInfoForCaptureForIDP/HDA(...&CombinedInfo...)
generateInfoForCaptureForMaps(&CombinedInfo..., /*filter for only component-lists that are single-length and for pointer base decls*/)

// Pass CurInfo to the next two
generateInfoForCaptureForMaps(&CurInfo, &PartialStruct, /*Filter for remining component-lists*/)
emitCombinedEntry(...&CurInfo, &PartialStruct...)
CombinedInfo.append(CurInfo)

@alexey-bataev, do you have any suggestions/feedback?


Future Issue:

The problem long-term is that we currently assume that all component-lists for the same VAR should contribute towards a single combined PartialStruct, if any. That should not be the case for cases like:

map(to: sp->x, sp->y) map(to: sp->sq->a, sp->sq->b) map(sp)

In the above case, we should get two independent "containing-structs" mapped by themselves:

&sp, &sp, sizeof(sp), TO | PARAM

// map-chain for the containing-struct sp[0] with base-pointer sp
&sp[0], &sp[0], sizeof(sp[0]), ALLOC /* Can be optimized to not alloc the full struct */
&sp[0], &sp->x, sizeof(sp->x), TO | MEMBER_OF(2)
&sp[0], &sp->y, sizeof(sp->y), TO | MEMBER_OF(2)
&sp, &sp[0], ATTACH

// map-chain for the containing-struct sp->sq[0] with base-pointer sp->sq
&(sp->sq[0]), &(sp->sq[0]), sizeof(sp->sq[0]), ALLOC
&sp->sq[0], &sp->sq->a, sizeof(sp->sq->a), TO | MEMBER_OF(6)
&sp->sq[0], &sp->sq->a, sizeof(sp->sq->a), TO | MEMBER_OF(6)
&sp->sq, &sp->sq[0], ATTACH

So we need to loop-over all component-lists, and only process those together that share the same attachable-base-pointer (sp or sp->sq in this case).

So, our second invocation for the "remaining component-lists" would become something like:

for (Expr* AttachBasePtr: AttachBasePtrs) {
  MapCombinedInfoTy PerAttachBaseCurInfo;
  StructRangeInfoTy PerAttachBasePartialStruct;

  generateInfoForCaptureForMaps(&PerAttachBaseCurInfo, &PerAttachBasePartialStruct, /*Filter for every component-list that has AttachBaseptr as its attachable base-pointer*/)
  emitCombinedEntry(...&PerAttachBaseCurInfo, &PerAttachBasePartialStruct...)
  CombinedInfo.append(PerAttachBaseCurInfo)
}

Alexey, do you foresee any issues with this?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Not sure splitting is a good idea, we may lose some dependency info potentially, but not sure.
  2. What if we have 3-level dependency, will we need the third loop in this case?

Copy link
Contributor Author

@abhinavgaba abhinavgaba Jun 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And I'm also not sure yet how the lambda captures function would interact with this if we split g`enerateInfoForCapture.

  1. What if we have 3-level dependency, will we need the third loop in this case?

For a three-level pointer case like this:

map(to: sp->x, sp->y) map(to: sp->sq->a, sp->sq->b) map(to: sp->sq->sr->c, sp->sq->sr->d) map(sp)

We should eventually have something like:

&sp, &sp, sizeof(sp), TO | PARAM

// map-chain for the containing-struct sp[0] with base-pointer sp
&sp[0], &sp[0], sizeof(sp[0]), ALLOC
&sp[0], &sp->x, sizeof(sp->x), TO | MEMBER_OF(2)
&sp[0], &sp->y, sizeof(sp->y), TO | MEMBER_OF(2)
&sp, &sp[0], ATTACH

// map-chain for the containing-struct sp->sq[0] with base-pointer sp->sq
&(sp->sq[0]), &(sp->sq[0]), sizeof(sp->sq[0]), ALLOC
&sp->sq[0], &sp->sq->a, sizeof(sp->sq->a), TO | MEMBER_OF(6)
&sp->sq[0], &sp->sq->a, sizeof(sp->sq->a), TO | MEMBER_OF(6)
&sp->sq, &sp->sq[0], ATTACH

// map-chain for the containing-struct sp->sq->sr[0] with base-pointer sp->sq->sr
&(sp->sq->sr[0]), &(sp->sq->sr[0]), sizeof(sp->sq->sr[0]), ALLOC
&sp->sq->sr[0], &sp->sq->sr->c, sizeof(sp->sq->sr->c), TO | MEMBER_OF(10)
&sp->sq->sr[0], &sp->sq->sr->d, sizeof(sp->sq->sr->d), TO | MEMBER_OF(10)
&sp->sq->sr, &sp->sq->sr[0], ATTACH

The for (Expr* AttachBasePtr: AttachBasePtrs) loop would need to run three times, once each for the for the attachable-base-pointer sp, sp->sq, and sp->sq->sr.

And we can think of the first invocation to be for the component-lists that have no base-pointer, and pull that into the loop as well.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Try to fix in a single loop at first. If it does not work, try splitting

// in this case, should be handled first, to ensure that it gets the
// TARGET_PARAM flag.
OMPClauseMappableExprCommon::MappableExprComponentListRef Components =
std::get<0>(LHS);
OMPClauseMappableExprCommon::MappableExprComponentListRef ComponentsR =
std::get<0>(RHS);
if (VD && VD->getType()->isAnyPointerType() &&
Components.size() != ComponentsR.size())
return Components.size() < ComponentsR.size();

ArrayRef<OpenMPMapModifierKind> MapModifiers = std::get<2>(LHS);
OpenMPMapClauseKind MapType = std::get<1>(RHS);
bool HasPresent =
Expand Down
51 changes: 42 additions & 9 deletions clang/lib/Sema/SemaOpenMP.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -2146,6 +2146,7 @@ bool SemaOpenMP::isOpenMPCapturedByRef(const ValueDecl *D, unsigned Level,
// | ptr | n.a. | - | x | - | - | bycopy|
// | ptr | n.a. | x | - | - | - | null |
// | ptr | n.a. | - | - | - | x | byref |
// | ptr | n.a. | - | - | - | x, x[] | bycopy|
// | ptr | n.a. | - | - | - | x[] | bycopy|
// | ptr | n.a. | - | - | x | | bycopy|
// | ptr | n.a. | - | - | x | x | bycopy|
Expand All @@ -2171,18 +2172,22 @@ bool SemaOpenMP::isOpenMPCapturedByRef(const ValueDecl *D, unsigned Level,
// - For pointers mapped by value that have either an implicit map or an
// array section, the runtime library may pass the NULL value to the
// device instead of the value passed to it by the compiler.
// - If both a pointer an a dereference of it are mapped, then the pointer
// should be passed by reference.

if (Ty->isReferenceType())
Ty = Ty->castAs<ReferenceType>()->getPointeeType();

// Locate map clauses and see if the variable being captured is referred to
// in any of those clauses. Here we only care about variables, not fields,
// because fields are part of aggregates.
// Locate map clauses and see if the variable being captured is mapped by
// itself, or referred to, in any of those clauses. Here we only care about
// variables, not fields, because fields are part of aggregates.
bool IsVariableAssociatedWithSection = false;
bool IsVariableItselfMapped = false;

DSAStack->checkMappableExprComponentListsForDeclAtLevel(
D, Level,
[&IsVariableUsedInMapClause, &IsVariableAssociatedWithSection,
&IsVariableItselfMapped,
D](OMPClauseMappableExprCommon::MappableExprComponentListRef
MapExprComponents,
OpenMPClauseKind WhereFoundClauseKind) {
Expand All @@ -2198,8 +2203,19 @@ bool SemaOpenMP::isOpenMPCapturedByRef(const ValueDecl *D, unsigned Level,

assert(EI != EE && "Invalid map expression!");

if (isa<DeclRefExpr>(EI->getAssociatedExpression()))
IsVariableUsedInMapClause |= EI->getAssociatedDeclaration() == D;
if (isa<DeclRefExpr>(EI->getAssociatedExpression()) &&
EI->getAssociatedDeclaration() == D) {
IsVariableUsedInMapClause = true;

// If the component list has only one element, it's for mapping the
// variable itself, like map(p). This takes precedence in
// determining how it's captured, so we don't need to look further
// for any other maps that use the variable (like map(p[0]) etc.)
if (MapExprComponents.size() == 1) {
IsVariableItselfMapped = true;
return true;
}
}

++EI;
if (EI == EE)
Expand All @@ -2213,8 +2229,10 @@ bool SemaOpenMP::isOpenMPCapturedByRef(const ValueDecl *D, unsigned Level,
isa<MemberExpr>(EI->getAssociatedExpression()) ||
isa<OMPArrayShapingExpr>(Last->getAssociatedExpression())) {
IsVariableAssociatedWithSection = true;
// There is nothing more we need to know about this variable.
return true;
// We've found a case like map(p[0]) or map(p->a) or map(*p),
// so we are done with this particular map, but we need to keep
// looking in case we find a map(p).
return false;
}

// Keep looking for more map info.
Expand All @@ -2223,8 +2241,23 @@ bool SemaOpenMP::isOpenMPCapturedByRef(const ValueDecl *D, unsigned Level,

if (IsVariableUsedInMapClause) {
// If variable is identified in a map clause it is always captured by
// reference except if it is a pointer that is dereferenced somehow.
IsByRef = !(Ty->isPointerType() && IsVariableAssociatedWithSection);
// reference except if it is a pointer that is dereferenced somehow, but
// not itself mapped.
//
// OpenMP 6.0, 7.1.1: Data sharing attribute rules, variables referenced
// in a construct::
// If a list item in a has_device_addr clause or in a map clause on the
// target construct has a base pointer, and the base pointer is a scalar
// variable *that is not a list item in a map clause on the construct*,
// the base pointer is firstprivate.
//
// OpenMP 4.5, 2.15.1.1: Data-sharing Attribute Rules for Variables
// Referenced in a Construct:
// If an array section is a list item in a map clause on the target
// construct and the array section is derived from a variable for which
// the type is pointer then that variable is firstprivate.
IsByRef = IsVariableItselfMapped ||
!(Ty->isPointerType() && IsVariableAssociatedWithSection);
} else {
// By default, all the data that has a scalar type is mapped by copy
// (except for reduction variables).
Expand Down
Loading