llvm · yxsamliu · Feb 27, 2025 · Jul 17, 2024
diff --git a/clang/docs/LanguageExtensions.rst b/clang/docs/LanguageExtensions.rst
@@ -5442,6 +5442,166 @@ third argument, can only occur at file scope.
     a = b[i] * c[i] + e;
   }
 
+Extensions for controlling atomic code generation
+=================================================
+
+The ``[[clang::atomic]]`` statement attribute enables users to control how
+atomic operations are lowered in LLVM IR by conveying additional metadata to
+the backend. The primary goal is to allow users to specify certain options,
+like whether the affected atomic operations might be used with specific types of memory or
+whether to ignore denormal mode correctness in floating-point operations,
+without affecting the correctness of code that does not rely on these properties.
+
+In LLVM, lowering of atomic operations (e.g., ``atomicrmw``) can differ based
+on the target's capabilities. Some backends support native atomic instructions
+only for certain operation types or alignments, or only in specific memory
+regions. Likewise, floating-point atomic instructions may or may not respect
+IEEE denormal requirements. When the user is unconcerned about denormal-mode
+compliance (for performance reasons) or knows that certain atomic operations
+will not be performed on a particular type of memory, extra hints are needed to
+tell the backend how to proceed.
+
+A classic example is an architecture where floating-point atomic add does not
+fully conform to IEEE denormal-mode handling. If the user does not mind ignoring
+that aspect, they would prefer to emit a faster hardware atomic instruction,
+rather than a fallback or CAS loop. Conversely, on certain GPUs (e.g., AMDGPU),
+memory accessed via PCIe may only support a subset of atomic operations. To ensure
+correct and efficient lowering, the compiler must know whether the user needs
+the atomic operations to work with that type of memory.
+
+The allowed atomic attribute values are ``remote_memory``, ``fine_grained_memory``,
+and ``ignore_denormal_mode``, each optionally prefixed with ``no_``. The meanings
+are as follows:
+
+- ``remote_memory`` means atomic operations may be performed on remote
+  memory, i.e. memory accessed through off-chip interconnects (e.g., PCIe).
+  On ROCm platforms using HIP, remote memory refers to memory accessed via
+  PCIe and is subject to specific atomic operation support. See
+  `ROCm PCIe Atomics <https://rocm.docs.amd.com/en/latest/conceptual/
+  pcie-atomics.html>`_ for further details. The negated form, ``no_remote_memory``, indicates that
+  atomic operations are not performed on remote memory.
+- ``fine_grained_memory`` means atomic operations may be performed on fine-grained
+  memory, i.e. memory regions that support fine-grained coherence, where updates to
+  memory are visible to other parts of the system even while modifications are ongoing.
+  For example, in HIP, fine-grained coherence ensures that host and device share
+  up-to-date data without explicit synchronization (see
+  `HIP Definition <https://rocm.docs.amd.com/projects/HIP/en/docs-6.3.3/how-to/hip_runtime_api/memory_management/coherence_control.html#coherence-control>`_).
+  Similarly, OpenCL 2.0 provides fine-grained synchronization in shared virtual memory
+  allocations, allowing concurrent modifications by host and device (see
+  `OpenCL 2.0 Overview <https://www.intel.com/content/www/us/en/developer/articles/technical/opencl-20-shared-virtual-memory-overview.html>`_).
+  The negated form, ``no_fine_grained_memory``, indicates that atomic operations are not
+  performed on fine-grained memory.
+- ``ignore_denormal_mode`` means that atomic operations are allowed to ignore
+  correctness for denormal mode in floating-point operations, potentially improving
+  performance on architectures that handle denormals inefficiently. The negated form,
+  ``no_ignore_denormal_mode``, enforces strict denormal mode
+  correctness.
+
+Any unspecified option is inherited from the global defaults, which can be set
+by a compiler flag or the target's built-in defaults.
+
+Within the same atomic attribute, duplicate and conflicting values are accepted,
+and the last of any conflicting values wins. Multiple atomic attributes are
+allowed for the same compound statement, and the last atomic attribute wins.
+
+Without any atomic metadata, LLVM IR defaults to conservative settings for
+correctness: atomic operations enforce denormal mode correctness and are assumed
+to potentially use remote and fine-grained memory (i.e., the equivalent of
+``remote_memory``, ``fine_grained_memory``, and ``no_ignore_denormal_mode``).
+
+The attribute may be applied only to a compound statement and looks like:
+
+.. code-block:: c++
+
+   [[clang::atomic(remote_memory, fine_grained_memory, ignore_denormal_mode)]]
+   {
+       // Atomic instructions in this block carry extra metadata reflecting
+       // these user-specified options.
+   }
+
+New compiler options set the global defaults for these atomic-lowering
+options. For example:
+
+.. code-block:: console
+
+   $ clang -fatomic-remote-memory -fno-atomic-fine-grained-memory -fatomic-ignore-denormal-mode file.cpp
+
+Each option has a corresponding flag:
+``-fatomic-remote-memory`` / ``-fno-atomic-remote-memory``,
+``-fatomic-fine-grained-memory`` / ``-fno-atomic-fine-grained-memory``,
+and ``-fatomic-ignore-denormal-mode`` / ``-fno-atomic-ignore-denormal-mode``.
+
+Code using the ``[[clang::atomic]]`` attribute can then selectively override
+the command-line defaults on a per-block basis. For instance:
+
+.. code-block:: c++
+
+   // Suppose the global defaults assume:
+   //   remote_memory, fine_grained_memory, and no_ignore_denormal_mode
+   // (for conservative correctness)
+
+   void example() {
+       // Locally override the settings: disable remote_memory and enable
+       // fine_grained_memory.
+       [[clang::atomic(no_remote_memory, fine_grained_memory)]]
+       {
+           // In this block:
+           //   - Atomic operations are not performed on remote memory.
+           //   - Atomic operations may be performed on fine-grained memory.
+           //   - The setting for denormal mode remains as the global default
+           //     (typically no_ignore_denormal_mode, enforcing strict denormal mode correctness).
+           // ...
+       }
+   }
+
+Function bodies do not accept statement attributes, so this will not work:
+
+.. code-block:: c++
+
+   void func() [[clang::atomic(remote_memory)]] {  // Wrong: applies to function type
+   }
+
+Use the attribute on a compound statement within the function:
+
+.. code-block:: c++
+
+   void func() {
+       [[clang::atomic(remote_memory)]]
+       {
+           // Atomic operations in this block carry the specified metadata.
+       }
+   }
+
+The ``[[clang::atomic]]`` attribute affects only the code generation of atomic
+instructions within the annotated compound statement. Clang attaches target-specific
+metadata to those atomic instructions in the emitted LLVM IR to guide backend lowering.
+This metadata is fixed at the Clang code generation phase and is not modified by later
+LLVM passes (such as function inlining).
+
+For example, consider:
+
+.. code-block:: c++
+
+  inline void func() {
+    [[clang::atomic(remote_memory)]]
+    {
+      // Atomic instructions lowered with metadata.
+    }
+  }
+
+  void foo() {
+    [[clang::atomic(no_remote_memory)]]
+    {
+      func(); // Inlined by LLVM, but the metadata from 'func()' remains unchanged.
+    }
+  }
+
+Although current usage focuses on AMDGPU, the mechanism is general. Other
+backends can ignore or implement their own responses to these flags if desired.
+If a target does not understand or enforce these hints, the IR remains valid,
+and the resulting program is still correct (although potentially less optimized
+for that user's needs).
+
 Specifying an attribute for multiple declarations (#pragma clang attribute)
 ===========================================================================
 

diff --git a/clang/docs/ReleaseNotes.rst b/clang/docs/ReleaseNotes.rst
@@ -181,6 +181,13 @@ related warnings within the method body.
   ``format_matches`` accepts an example valid format string as its third
   argument. For more information, see the Clang attributes documentation.
 
+- Introduced a new statement attribute ``[[clang::atomic]]`` that enables
+  fine-grained control over atomic code generation on a per-statement basis.
+  Supported options include ``[no_]remote_memory``,
+  ``[no_]fine_grained_memory``, and ``[no_]ignore_denormal_mode``. These are
+  particularly relevant for AMDGPU targets, where they map to corresponding IR
+  metadata.
+
 Improvements to Clang's diagnostics
 -----------------------------------
 

diff --git a/clang/include/clang/Basic/Attr.td b/clang/include/clang/Basic/Attr.td
@@ -5001,3 +5001,18 @@ def NoTrivialAutoVarInit: InheritableAttr {
   let Documentation = [NoTrivialAutoVarInitDocs];
   let SimpleHandler = 1;
 }
+
+def Atomic : StmtAttr {
+  let Spellings = [Clang<"atomic">];
+  let Args = [VariadicEnumArgument<"AtomicOptions", "ConsumedOption",
+                                   /*is_string=*/false,
+                                   ["remote_memory", "no_remote_memory",
+                                    "fine_grained_memory", "no_fine_grained_memory",
+                                    "ignore_denormal_mode", "no_ignore_denormal_mode"],
+                                   ["remote_memory", "no_remote_memory",
+                                    "fine_grained_memory", "no_fine_grained_memory",
+                                    "ignore_denormal_mode", "no_ignore_denormal_mode"]>];
+  let Subjects = SubjectList<[CompoundStmt], ErrorDiag, "compound statements">;
+  let Documentation = [AtomicDocs];
+  let StrictEnumParameters = 1;
+}
diff --git a/clang/include/clang/Basic/AttrDocs.td b/clang/include/clang/Basic/AttrDocs.td
@@ -8205,6 +8205,21 @@ for details.
   }];
 }
 
+def AtomicDocs : Documentation {
+  let Category = DocCatStmt;
+  let Content = [{
+The ``atomic`` attribute can be applied to *compound statements* to override or
+further specify the default atomic code-generation behavior, especially on
+targets such as AMDGPU. You can annotate compound statements with options
+to modify how atomic instructions inside that statement are emitted at the IR
+level.
+
+For details, see the documentation for `@atomic
+<http://clang.llvm.org/docs/LanguageExtensions.html#extensions-for-controlling-atomic-code-generation>`_.
+
+  }];
+}
+
 def ClangRandomizeLayoutDocs : Documentation {
   let Category = DocCatDecl;
   let Heading = "randomize_layout, no_randomize_layout";

diff --git a/clang/include/clang/Basic/DiagnosticSemaKinds.td b/clang/include/clang/Basic/DiagnosticSemaKinds.td
@@ -3286,6 +3286,10 @@ def err_invalid_branch_protection_spec : Error<
   "invalid or misplaced branch protection specification '%0'">;
 def warn_unsupported_branch_protection_spec : Warning<
   "unsupported branch protection specification '%0'">, InGroup<BranchProtection>;
+def err_attribute_invalid_atomic_argument : Error<
+  "invalid argument '%0' to atomic attribute; valid options are: "
+  "'remote_memory', 'fine_grained_memory', 'ignore_denormal_mode' (optionally "
+  "prefixed with 'no_')">;
 
 def warn_unsupported_target_attribute
     : Warning<"%select{unsupported|duplicate|unknown}0%select{| CPU|"

diff --git a/clang/include/clang/Basic/Features.def b/clang/include/clang/Basic/Features.def
@@ -313,6 +313,8 @@ EXTENSION(datasizeof, LangOpts.CPlusPlus)
 
 FEATURE(cxx_abi_relative_vtable, LangOpts.CPlusPlus && LangOpts.RelativeCXXABIVTables)
 
+FEATURE(clang_atomic_attributes, true)
+
 // CUDA/HIP Features
 FEATURE(cuda_noinline_keyword, LangOpts.CUDA)
 EXTENSION(cuda_implicit_host_device_templates, LangOpts.CUDA && LangOpts.OffloadImplicitHostDeviceTemplates)
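
The `clang_atomic_attributes` feature added above can be tested with `__has_feature`, so code that must also build with compilers lacking the attribute can guard it. The following is a minimal sketch under stated assumptions: the helper names `add_local` and `add_local_demo` and the chosen option set are illustrative and not part of the patch. The `__atomic_fetch_add` builtin is used directly so that the atomic operation sits lexically inside the annotated compound statement.

```cpp
#include <cassert>

// Treat the feature as absent on compilers without __has_feature.
#ifndef __has_feature
#define __has_feature(x) 0
#endif

// Hypothetical helper: atomically adds `v` to `*p` and returns the old value.
int add_local(int *p, int v) {
#if __has_feature(clang_atomic_attributes)
  // Hint only: the caller asserts that *p is neither remote nor
  // fine-grained memory. The denormal-mode option keeps its default.
  [[clang::atomic(no_remote_memory, no_fine_grained_memory)]]
#endif
  {
    return __atomic_fetch_add(p, v, __ATOMIC_RELAXED);
  }
}

// Small self-check of the helper's observable behavior.
void add_local_demo() {
  int counter = 5;
  assert(add_local(&counter, 3) == 5); // returns the previous value
  assert(counter == 8);
}
```

The attribute does not change the program's result here. Per the documentation in this patch, it only attaches hints that a backend may use or ignore when lowering the atomic operation.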

diff --git a/clang/include/clang/Basic/LangOptions.h b/clang/include/clang/Basic/LangOptions.h
@@ -630,6 +630,12 @@ class LangOptions : public LangOptionsBase {
   // WebAssembly target.
   bool NoWasmOpt = false;
 
+  /// Atomic code-generation options.
+  /// These flags are set directly from the command-line options.
+  bool AtomicRemoteMemory = false;
+  bool AtomicFineGrainedMemory = false;
+  bool AtomicIgnoreDenormalMode = false;
+
   LangOptions();
 
   /// Set language defaults for the given input language and
@@ -1109,6 +1115,66 @@ inline void FPOptions::applyChanges(FPOptionsOverride FPO) {
   *this = FPO.applyOverrides(*this);
 }
 
+// The three atomic code-generation options.
+// The canonical (positive) names are:
+//   "remote_memory", "fine_grained_memory", and "ignore_denormal_mode".
+// In attribute parsing, a token prefixed with "no_" inverts its value; the
+// command-line equivalents are the -fno-atomic-* flag forms.
+enum class AtomicOptionKind {
+  RemoteMemory,       // enable remote memory.
+  FineGrainedMemory,  // enable fine-grained memory.
+  IgnoreDenormalMode, // ignore floating-point denormals.
+  LANGOPT_ATOMIC_OPTION_LAST = IgnoreDenormalMode,
+};
+
+struct AtomicOptions {
+  // Bitfields for each option.
+  unsigned remote_memory : 1;
+  unsigned fine_grained_memory : 1;
+  unsigned ignore_denormal_mode : 1;
+
+  AtomicOptions()
+      : remote_memory(0), fine_grained_memory(0), ignore_denormal_mode(0) {}
+
+  AtomicOptions(const LangOptions &LO)
+      : remote_memory(LO.AtomicRemoteMemory),
+        fine_grained_memory(LO.AtomicFineGrainedMemory),
+        ignore_denormal_mode(LO.AtomicIgnoreDenormalMode) {}
+
+  bool getOption(AtomicOptionKind Kind) const {
+    switch (Kind) {
+    case AtomicOptionKind::RemoteMemory:
+      return remote_memory;
+    case AtomicOptionKind::FineGrainedMemory:
+      return fine_grained_memory;
+    case AtomicOptionKind::IgnoreDenormalMode:
+      return ignore_denormal_mode;
+    }
+    llvm_unreachable("Invalid AtomicOptionKind");
+  }
+
+  void setOption(AtomicOptionKind Kind, bool Value) {
+    switch (Kind) {
+    case AtomicOptionKind::RemoteMemory:
+      remote_memory = Value;
+      return;
+    case AtomicOptionKind::FineGrainedMemory:
+      fine_grained_memory = Value;
+      return;
+    case AtomicOptionKind::IgnoreDenormalMode:
+      ignore_denormal_mode = Value;
+      return;
+    }
+    llvm_unreachable("Invalid AtomicOptionKind");
+  }
+
+  LLVM_DUMP_METHOD void dump() const {
+    llvm::errs() << "\n remote_memory: " << remote_memory
+                 << "\n fine_grained_memory: " << fine_grained_memory
+                 << "\n ignore_denormal_mode: " << ignore_denormal_mode << "\n";
+  }
+};
+
 /// Describes the kind of translation unit being processed.
 enum TranslationUnitKind {
   /// The translation unit is a complete translation unit.
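
The documentation in this patch states that unspecified options inherit the global defaults and that the last of any conflicting attribute values wins. The sketch below illustrates that resolution rule only. `Opts`, `applyToken`, and `resolve` are hypothetical names invented for this example, the string-based parsing is an assumption, and none of it is Clang's actual implementation.

```cpp
#include <cassert>
#include <initializer_list>
#include <string>

// Simplified stand-in for the AtomicOptions struct in the patch.
struct Opts {
  bool remote_memory = false;
  bool fine_grained_memory = false;
  bool ignore_denormal_mode = false;
};

// Applies one attribute token; a single "no_" prefix inverts the value.
// Returns false for an unrecognized token (hypothetical error handling).
bool applyToken(Opts &O, std::string Tok) {
  bool Value = true;
  if (Tok.rfind("no_", 0) == 0) { // token starts with "no_"
    Value = false;
    Tok.erase(0, 3);
  }
  if (Tok == "remote_memory")
    O.remote_memory = Value;
  else if (Tok == "fine_grained_memory")
    O.fine_grained_memory = Value;
  else if (Tok == "ignore_denormal_mode")
    O.ignore_denormal_mode = Value;
  else
    return false;
  return true;
}

// Starts from the defaults and applies tokens in order, so the last of
// any conflicting tokens wins and untouched options keep their default.
Opts resolve(Opts Defaults, std::initializer_list<const char *> Tokens) {
  for (const char *T : Tokens)
    applyToken(Defaults, T);
  return Defaults;
}
```

For instance, starting from defaults with `remote_memory` and `fine_grained_memory` enabled, resolving the tokens `no_remote_memory, remote_memory, no_remote_memory` leaves `remote_memory` disabled and `fine_grained_memory` unchanged.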

diff --git a/clang/include/clang/Basic/TargetInfo.h b/clang/include/clang/Basic/TargetInfo.h
@@ -301,6 +301,9 @@ class TargetInfo : public TransferrableTargetInfo,
   // in function attributes in IR.
   llvm::StringSet<> ReadOnlyFeatures;
 
+  // Default atomic options
+  AtomicOptions AtomicOpts;
+
 public:
   /// Construct a target for the given options.
   ///
@@ -1060,10 +1063,6 @@ class TargetInfo : public TransferrableTargetInfo,
   /// available on this target.
   bool hasRISCVVTypes() const { return HasRISCVVTypes; }
 
-  /// Returns whether or not the AMDGPU unsafe floating point atomics are
-  /// allowed.
-  bool allowAMDGPUUnsafeFPAtomics() const { return AllowAMDGPUUnsafeFPAtomics; }
-
   /// For ARM targets returns a mask defining which coprocessors are configured
   /// as Custom Datapath.
   uint32_t getARMCDECoprocMask() const { return ARMCDECoprocMask; }
@@ -1699,6 +1698,9 @@ class TargetInfo : public TransferrableTargetInfo,
     return CC_C;
   }
 
+  /// Get the default atomic options.
+  AtomicOptions getAtomicOpts() const { return AtomicOpts; }
+
   enum CallingConvCheckResult {
     CCCR_OK,
     CCCR_Warning,

diff --git a/clang/include/clang/Basic/TargetOptions.h b/clang/include/clang/Basic/TargetOptions.h
@@ -75,9 +75,6 @@ class TargetOptions {
   /// address space.
   bool NVPTXUseShortPointers = false;
 
-  /// \brief If enabled, allow AMDGPU unsafe floating point atomics.
-  bool AllowAMDGPUUnsafeFPAtomics = false;
-
   /// \brief Code object version for AMDGPU.
   llvm::CodeObjectVersionKind CodeObjectVersion =
       llvm::CodeObjectVersionKind::COV_None;