Add clang atomic control options and attribute #114841

Merged
merged 1 commit into from
Feb 27, 2025
160 changes: 160 additions & 0 deletions clang/docs/LanguageExtensions.rst
Expand Up @@ -5442,6 +5442,166 @@ third argument, can only occur at file scope.
a = b[i] * c[i] + e;
}

Extensions for controlling atomic code generation
=================================================

The ``[[clang::atomic]]`` statement attribute enables users to control how
atomic operations are lowered in LLVM IR by conveying additional metadata to
the backend. The primary goal is to allow users to specify certain options,
like whether the affected atomic operations might be used with specific types of memory or
whether to ignore denormal mode correctness in floating-point operations,
without affecting the correctness of code that does not rely on these properties.

In LLVM, lowering of atomic operations (e.g., ``atomicrmw``) can differ based
on the target's capabilities. Some backends support native atomic instructions
only for certain operation types or alignments, or only in specific memory
regions. Likewise, floating-point atomic instructions may or may not respect
IEEE denormal requirements. When the user is unconcerned about denormal-mode
compliance (for performance reasons) or knows that certain atomic operations
will not be performed on a particular type of memory, extra hints are needed to
tell the backend how to proceed.

A classic example is an architecture where floating-point atomic add does not
fully conform to IEEE denormal-mode handling. If the user does not mind ignoring
that aspect, they would prefer to emit a faster hardware atomic instruction,
rather than a fallback or CAS loop. Conversely, on certain GPUs (e.g., AMDGPU),
memory accessed via PCIe may only support a subset of atomic operations. To ensure
correct and efficient lowering, the compiler must know whether the user needs
the atomic operations to work with that type of memory.

The allowed atomic attribute values are ``remote_memory``, ``fine_grained_memory``,
and ``ignore_denormal_mode``, each optionally prefixed with ``no_``. The meanings
are as follows:

- ``remote_memory`` means atomic operations may be performed on remote
memory, i.e. memory accessed through off-chip interconnects (e.g., PCIe).
On ROCm platforms using HIP, remote memory refers to memory accessed via
PCIe and is subject to specific atomic operation support. See
`ROCm PCIe Atomics <https://rocm.docs.amd.com/en/latest/conceptual/pcie-atomics.html>`_
for further details. The negated form, ``no_remote_memory``, indicates that
atomic operations are not performed on remote memory.
- ``fine_grained_memory`` means atomic operations may be performed on fine-grained
memory, i.e. memory regions that support fine-grained coherence, where updates to
memory are visible to other parts of the system even while modifications are ongoing.
For example, in HIP, fine-grained coherence ensures that host and device share
up-to-date data without explicit synchronization (see
`HIP Definition <https://rocm.docs.amd.com/projects/HIP/en/docs-6.3.3/how-to/hip_runtime_api/memory_management/coherence_control.html#coherence-control>`_).
Similarly, OpenCL 2.0 provides fine-grained synchronization in shared virtual memory
allocations, allowing concurrent modifications by host and device (see
`OpenCL 2.0 Overview <https://www.intel.com/content/www/us/en/developer/articles/technical/opencl-20-shared-virtual-memory-overview.html>`_).
The negated form, ``no_fine_grained_memory``, indicates that atomic operations are
not performed on fine-grained memory.
- ``ignore_denormal_mode`` means that atomic operations are allowed to ignore
correctness for denormal mode in floating-point operations, potentially improving
performance on architectures that handle denormals inefficiently. The negated form,
``no_ignore_denormal_mode``, enforces strict denormal mode correctness.

Any unspecified option is inherited from the global defaults, which can be set
by a compiler flag or the target's built-in defaults.

Within the same atomic attribute, duplicate and conflicting values are accepted,
and the last of any conflicting values wins. Multiple atomic attributes are
allowed for the same compound statement, and the last atomic attribute wins.
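
The resolution rules above can be modeled in a few lines of C++. This is a sketch of the documented behavior, not Clang's implementation: ``EffectiveAtomicOptions``, ``applyValue``, and ``resolve`` are hypothetical names, and the sketch reads "the last atomic attribute wins" as meaning that only the last attribute on a statement is consulted, with anything it leaves unspecified inherited from the global defaults.

```cpp
#include <string>
#include <vector>

// Hypothetical model of the three options; not Clang's own data structure.
struct EffectiveAtomicOptions {
  bool remote_memory;
  bool fine_grained_memory;
  bool ignore_denormal_mode;
};

// Apply one attribute value such as "no_remote_memory".
// Returns false if the value is not one of the documented names.
bool applyValue(EffectiveAtomicOptions &opts, std::string value) {
  bool enable = true;
  if (value.compare(0, 3, "no_") == 0) {
    enable = false;
    value.erase(0, 3); // strip the "no_" prefix
  }
  if (value == "remote_memory")
    opts.remote_memory = enable;
  else if (value == "fine_grained_memory")
    opts.fine_grained_memory = enable;
  else if (value == "ignore_denormal_mode")
    opts.ignore_denormal_mode = enable;
  else
    return false;
  return true;
}

// Resolve the options for one compound statement. `attributes` holds the
// value lists of each atomic attribute on the statement, in source order.
// Assumption: only the last attribute is used; within it, values are applied
// left to right so the last of any conflicting values wins.
EffectiveAtomicOptions
resolve(EffectiveAtomicOptions defaults,
        const std::vector<std::vector<std::string>> &attributes) {
  if (attributes.empty())
    return defaults;
  for (const std::string &value : attributes.back())
    applyValue(defaults, value);
  return defaults;
}
```

Under this reading, a statement carrying ``[[clang::atomic(no_remote_memory)]] [[clang::atomic(ignore_denormal_mode)]]`` would take ``ignore_denormal_mode`` from the second attribute and its remote-memory setting from the global defaults.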

Without any atomic metadata, LLVM IR defaults to conservative settings for
correctness: atomic operations enforce denormal mode correctness and are assumed
to potentially use remote and fine-grained memory (i.e., the equivalent of
``remote_memory``, ``fine_grained_memory``, and ``no_ignore_denormal_mode``).

The attribute may be applied only to a compound statement and looks like:

.. code-block:: c++

[[clang::atomic(remote_memory, fine_grained_memory, ignore_denormal_mode)]]
{
// Atomic instructions in this block carry extra metadata reflecting
// these user-specified options.
}

Compiler options set the global defaults for these atomic-lowering options.
For example:

.. code-block:: console

$ clang -fatomic-remote-memory -fno-atomic-fine-grained-memory -fatomic-ignore-denormal-mode file.cpp

Each option has a corresponding flag:
``-fatomic-remote-memory`` / ``-fno-atomic-remote-memory``,
``-fatomic-fine-grained-memory`` / ``-fno-atomic-fine-grained-memory``,
and ``-fatomic-ignore-denormal-mode`` / ``-fno-atomic-ignore-denormal-mode``.

Code using the ``[[clang::atomic]]`` attribute can then selectively override
the command-line defaults on a per-block basis. For instance:

.. code-block:: c++

// Suppose the global defaults assume:
// remote_memory, fine_grained_memory, and no_ignore_denormal_mode
// (for conservative correctness)

void example() {
// Locally override the settings: disable remote_memory and enable
// fine_grained_memory.
[[clang::atomic(no_remote_memory, fine_grained_memory)]]
{
// In this block:
// - Atomic operations are not performed on remote memory.
// - Atomic operations are performed on fine-grained memory.
// - The setting for denormal mode remains as the global default
// (typically no_ignore_denormal_mode, enforcing strict denormal mode correctness).
// ...
}
}

Function bodies do not accept statement attributes, so this will not work:

.. code-block:: c++

void func() [[clang::atomic(remote_memory)]] { // Wrong: applies to function type
}

Use the attribute on a compound statement within the function:

.. code-block:: c++

void func() {
[[clang::atomic(remote_memory)]]
{
// Atomic operations in this block carry the specified metadata.
}
}

The ``[[clang::atomic]]`` attribute affects only the code generation of atomic
instructions within the annotated compound statement. Clang attaches target-specific
metadata to those atomic instructions in the emitted LLVM IR to guide backend lowering.
This metadata is fixed at the Clang code generation phase and is not modified by later
LLVM passes (such as function inlining).

For example, consider:

.. code-block:: c++

inline void func() {
[[clang::atomic(remote_memory)]]
{
// Atomic instructions lowered with metadata.
}
}

void foo() {
[[clang::atomic(no_remote_memory)]]
{
func(); // Inlined by LLVM, but the metadata from 'func()' remains unchanged.
}
}
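
As a more concrete variant of the examples above, the sketch below places a real atomic operation inside an annotated block. It assumes that the ``clang_atomic_attributes`` feature test added to ``Features.def`` by this change is a suitable way to detect support. ``CLANG_ATOMIC_HINT`` and ``bump`` are hypothetical names. The ``__atomic_fetch_add`` builtin (available in both Clang and GCC) is called directly so that the atomic operation is emitted lexically inside the annotated block.

```cpp
// __has_feature is a Clang extension; provide a fallback for compilers
// that do not define it.
#ifndef __has_feature
#define __has_feature(x) 0
#endif

// CLANG_ATOMIC_HINT is a hypothetical helper macro, not part of Clang.
// It expands to the attribute only when the compiler advertises support.
#if __has_feature(clang_atomic_attributes)
#define CLANG_ATOMIC_HINT(...) [[clang::atomic(__VA_ARGS__)]]
#else
#define CLANG_ATOMIC_HINT(...)
#endif

// Atomically add `delta` to `*counter` and return the new value.
int bump(int *counter, int delta) {
  int previous = 0;
  // Hint that this operation touches neither remote nor fine-grained memory.
  CLANG_ATOMIC_HINT(no_remote_memory, no_fine_grained_memory)
  {
    previous = __atomic_fetch_add(counter, delta, __ATOMIC_RELAXED);
  }
  return previous + delta;
}
```

On compilers without the attribute the macro expands to nothing and the block is an ordinary compound statement, so the program behaves the same either way. Routing the operation through a wrapper function would likely leave it outside the attribute's reach, because the metadata is fixed where the atomic instruction is generated.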

Although current usage focuses on AMDGPU, the mechanism is general. Other
backends can ignore or implement their own responses to these flags if desired.
If a target does not understand or enforce these hints, the IR remains valid,
and the resulting program is still correct (although potentially less optimized
for that user's needs).

Specifying an attribute for multiple declarations (#pragma clang attribute)
===========================================================================

7 changes: 7 additions & 0 deletions clang/docs/ReleaseNotes.rst
Expand Up @@ -181,6 +181,13 @@ related warnings within the method body.
``format_matches`` accepts an example valid format string as its third
argument. For more information, see the Clang attributes documentation.

- Introduced a new statement attribute ``[[clang::atomic]]`` that enables
fine-grained control over atomic code generation on a per-statement basis.
Supported options include ``[no_]remote_memory``,
``[no_]fine_grained_memory``, and ``[no_]ignore_denormal_mode``. These are
particularly relevant for AMDGPU targets, where they map to corresponding IR
metadata.

Improvements to Clang's diagnostics
-----------------------------------

15 changes: 15 additions & 0 deletions clang/include/clang/Basic/Attr.td
Expand Up @@ -5001,3 +5001,18 @@ def NoTrivialAutoVarInit: InheritableAttr {
let Documentation = [NoTrivialAutoVarInitDocs];
let SimpleHandler = 1;
}

def Atomic : StmtAttr {
let Spellings = [Clang<"atomic">];
let Args = [VariadicEnumArgument<"AtomicOptions", "ConsumedOption",
/*is_string=*/false,
["remote_memory", "no_remote_memory",
"fine_grained_memory", "no_fine_grained_memory",
"ignore_denormal_mode", "no_ignore_denormal_mode"],
["remote_memory", "no_remote_memory",
"fine_grained_memory", "no_fine_grained_memory",
"ignore_denormal_mode", "no_ignore_denormal_mode"]>];
let Subjects = SubjectList<[CompoundStmt], ErrorDiag, "compound statements">;
let Documentation = [AtomicDocs];
let StrictEnumParameters = 1;
}
15 changes: 15 additions & 0 deletions clang/include/clang/Basic/AttrDocs.td
Expand Up @@ -8205,6 +8205,21 @@ for details.
}];
}

def AtomicDocs : Documentation {
let Category = DocCatStmt;
let Content = [{
The ``atomic`` attribute can be applied to *compound statements* to override or
further specify the default atomic code-generation behavior, especially on
targets such as AMDGPU. You can annotate compound statements with options
to modify how atomic instructions inside that statement are emitted at the IR
level.

For details, see the documentation for `@atomic
<http://clang.llvm.org/docs/LanguageExtensions.html#extensions-for-controlling-atomic-code-generation>`_

}];
}

def ClangRandomizeLayoutDocs : Documentation {
let Category = DocCatDecl;
let Heading = "randomize_layout, no_randomize_layout";
4 changes: 4 additions & 0 deletions clang/include/clang/Basic/DiagnosticSemaKinds.td
Expand Up @@ -3286,6 +3286,10 @@ def err_invalid_branch_protection_spec : Error<
"invalid or misplaced branch protection specification '%0'">;
def warn_unsupported_branch_protection_spec : Warning<
"unsupported branch protection specification '%0'">, InGroup<BranchProtection>;
def err_attribute_invalid_atomic_argument : Error<
"invalid argument '%0' to atomic attribute; valid options are: "
"'remote_memory', 'fine_grained_memory', 'ignore_denormal_mode' (optionally "
"prefixed with 'no_')">;

def warn_unsupported_target_attribute
: Warning<"%select{unsupported|duplicate|unknown}0%select{| CPU|"
2 changes: 2 additions & 0 deletions clang/include/clang/Basic/Features.def
Expand Up @@ -313,6 +313,8 @@ EXTENSION(datasizeof, LangOpts.CPlusPlus)

FEATURE(cxx_abi_relative_vtable, LangOpts.CPlusPlus && LangOpts.RelativeCXXABIVTables)

FEATURE(clang_atomic_attributes, true)

// CUDA/HIP Features
FEATURE(cuda_noinline_keyword, LangOpts.CUDA)
EXTENSION(cuda_implicit_host_device_templates, LangOpts.CUDA && LangOpts.OffloadImplicitHostDeviceTemplates)
66 changes: 66 additions & 0 deletions clang/include/clang/Basic/LangOptions.h
Expand Up @@ -630,6 +630,12 @@ class LangOptions : public LangOptionsBase {
// WebAssembly target.
bool NoWasmOpt = false;

/// Atomic code-generation options.
/// These flags are set directly from the command-line options.
bool AtomicRemoteMemory = false;
bool AtomicFineGrainedMemory = false;
bool AtomicIgnoreDenormalMode = false;

LangOptions();

/// Set language defaults for the given input language and
Expand Down Expand Up @@ -1109,6 +1115,66 @@ inline void FPOptions::applyChanges(FPOptionsOverride FPO) {
*this = FPO.applyOverrides(*this);
}

// The three atomic code-generation options.
// The canonical (positive) names are:
// "remote_memory", "fine_grained_memory", and "ignore_denormal_mode".
// In attribute or command-line parsing, a token prefixed with "no_" inverts its
// value.
enum class AtomicOptionKind {
RemoteMemory, // enable remote memory.
FineGrainedMemory, // enable fine-grained memory.
IgnoreDenormalMode, // ignore floating-point denormals.
LANGOPT_ATOMIC_OPTION_LAST = IgnoreDenormalMode,
};

struct AtomicOptions {
// Bitfields for each option.
unsigned remote_memory : 1;
unsigned fine_grained_memory : 1;
unsigned ignore_denormal_mode : 1;

AtomicOptions()
: remote_memory(0), fine_grained_memory(0), ignore_denormal_mode(0) {}

AtomicOptions(const LangOptions &LO)
: remote_memory(LO.AtomicRemoteMemory),
fine_grained_memory(LO.AtomicFineGrainedMemory),
ignore_denormal_mode(LO.AtomicIgnoreDenormalMode) {}

bool getOption(AtomicOptionKind Kind) const {
switch (Kind) {
case AtomicOptionKind::RemoteMemory:
return remote_memory;
case AtomicOptionKind::FineGrainedMemory:
return fine_grained_memory;
case AtomicOptionKind::IgnoreDenormalMode:
return ignore_denormal_mode;
}
llvm_unreachable("Invalid AtomicOptionKind");
}

void setOption(AtomicOptionKind Kind, bool Value) {
switch (Kind) {
case AtomicOptionKind::RemoteMemory:
remote_memory = Value;
return;
case AtomicOptionKind::FineGrainedMemory:
fine_grained_memory = Value;
return;
case AtomicOptionKind::IgnoreDenormalMode:
ignore_denormal_mode = Value;
return;
}
llvm_unreachable("Invalid AtomicOptionKind");
}

LLVM_DUMP_METHOD void dump() const {
llvm::errs() << "\n remote_memory: " << remote_memory
<< "\n fine_grained_memory: " << fine_grained_memory
<< "\n ignore_denormal_mode: " << ignore_denormal_mode << "\n";
}
};

/// Describes the kind of translation unit being processed.
enum TranslationUnitKind {
/// The translation unit is a complete translation unit.
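
To illustrate how the `getOption`/`setOption` interface of `AtomicOptions` lends itself to per-statement overrides, here is a stand-alone sketch of a scope guard that applies overrides on entry and restores the previous options on exit. It is an assumption about one possible use, not code from this change: `Options`, `OptionKind`, and `ScopedOptionOverride` are hypothetical mirrors that avoid depending on Clang's headers.

```cpp
#include <initializer_list>
#include <utility>

// Stand-alone mirror of the option kinds and bit-field struct shown above.
enum class OptionKind { RemoteMemory, FineGrainedMemory, IgnoreDenormalMode };

struct Options {
  unsigned remote_memory : 1;
  unsigned fine_grained_memory : 1;
  unsigned ignore_denormal_mode : 1;

  Options()
      : remote_memory(0), fine_grained_memory(0), ignore_denormal_mode(0) {}

  bool get(OptionKind kind) const {
    switch (kind) {
    case OptionKind::RemoteMemory:
      return remote_memory;
    case OptionKind::FineGrainedMemory:
      return fine_grained_memory;
    case OptionKind::IgnoreDenormalMode:
      return ignore_denormal_mode;
    }
    return false; // not reached for valid enumerators
  }

  void set(OptionKind kind, bool value) {
    switch (kind) {
    case OptionKind::RemoteMemory:
      remote_memory = value;
      return;
    case OptionKind::FineGrainedMemory:
      fine_grained_memory = value;
      return;
    case OptionKind::IgnoreDenormalMode:
      ignore_denormal_mode = value;
      return;
    }
  }
};

// Hypothetical RAII helper: applies the overrides in order (so a later entry
// for the same kind wins) and restores the saved options when it goes out of
// scope, e.g. at the end of an annotated compound statement.
class ScopedOptionOverride {
public:
  using Override = std::pair<OptionKind, bool>;

  ScopedOptionOverride(Options &current,
                       std::initializer_list<Override> overrides)
      : current_(current), saved_(current) {
    for (const Override &o : overrides)
      current_.set(o.first, o.second);
  }
  ~ScopedOptionOverride() { current_ = saved_; }

  ScopedOptionOverride(const ScopedOptionOverride &) = delete;
  ScopedOptionOverride &operator=(const ScopedOptionOverride &) = delete;

private:
  Options &current_;
  Options saved_;
};
```

A guard of this kind keeps nested scopes simple: each block only records what it changes, and leaving the block restores whatever the enclosing scope had in effect.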
10 changes: 6 additions & 4 deletions clang/include/clang/Basic/TargetInfo.h
Expand Up @@ -301,6 +301,9 @@ class TargetInfo : public TransferrableTargetInfo,
// in function attributes in IR.
llvm::StringSet<> ReadOnlyFeatures;

// Default atomic options
AtomicOptions AtomicOpts;

public:
/// Construct a target for the given options.
///
Expand Down Expand Up @@ -1060,10 +1063,6 @@ class TargetInfo : public TransferrableTargetInfo,
/// available on this target.
bool hasRISCVVTypes() const { return HasRISCVVTypes; }

/// Returns whether or not the AMDGPU unsafe floating point atomics are
/// allowed.
bool allowAMDGPUUnsafeFPAtomics() const { return AllowAMDGPUUnsafeFPAtomics; }

/// For ARM targets returns a mask defining which coprocessors are configured
/// as Custom Datapath.
uint32_t getARMCDECoprocMask() const { return ARMCDECoprocMask; }
Expand Down Expand Up @@ -1699,6 +1698,9 @@ class TargetInfo : public TransferrableTargetInfo,
return CC_C;
}

/// Get the default atomic options.
AtomicOptions getAtomicOpts() const { return AtomicOpts; }

enum CallingConvCheckResult {
CCCR_OK,
CCCR_Warning,
3 changes: 0 additions & 3 deletions clang/include/clang/Basic/TargetOptions.h
Expand Up @@ -75,9 +75,6 @@ class TargetOptions {
/// address space.
bool NVPTXUseShortPointers = false;

/// \brief If enabled, allow AMDGPU unsafe floating point atomics.
bool AllowAMDGPUUnsafeFPAtomics = false;

/// \brief Code object version for AMDGPU.
llvm::CodeObjectVersionKind CodeObjectVersion =
llvm::CodeObjectVersionKind::COV_None;