[Clang] Make the SizeType, SignedSizeType and PtrdiffType be named sugar types instead of built-in types #143653

Open: wants to merge 30 commits into base: main

YexuanXiao commented Jun 11, 2025

This covers the results of sizeof, sizeof..., __datasizeof, __alignof, _Alignof, alignof, _Countof, size_t literals, and signed size_t literals, as well as the results of pointer-pointer subtraction and the checks for standard library functions (and their calls).
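For reference, in standard C++ the expressions listed above already have these types canonically; the patch only changes the sugar Clang attaches to them for display. A minimal sketch of that baseline, assuming C++11 or later. The `uz`/`z` literal checks are guarded by the C++23 feature-test macro `__cpp_size_t_suffix`, and `pack_size_is_size_t` and `sample_array` are hypothetical names used only here:

```cpp
#include <cstddef>
#include <type_traits>

// Canonical result types of the expressions named above. With the patch,
// Clang would display these with size_t / ptrdiff_t sugar, but the
// canonical types checked here are the same either way.
static_assert(std::is_same<decltype(sizeof(int)), std::size_t>::value,
              "sizeof yields size_t");
static_assert(std::is_same<decltype(alignof(double)), std::size_t>::value,
              "alignof yields size_t");

// sizeof... needs a parameter pack, so check it inside a helper template.
template <typename... Ts>
constexpr bool pack_size_is_size_t() {
  return std::is_same<decltype(sizeof...(Ts)), std::size_t>::value;
}
static_assert(pack_size_is_size_t<int, char>(), "sizeof... yields size_t");

// Pointer - pointer yields ptrdiff_t.
int sample_array[4];
static_assert(std::is_same<decltype(sample_array + 3 - sample_array),
                           std::ptrdiff_t>::value,
              "pointer subtraction yields ptrdiff_t");

#if defined(__cpp_size_t_suffix)
// C++23 size_t literals: 'uz' is size_t, 'z' is its signed counterpart.
static_assert(std::is_same<decltype(1uz), std::size_t>::value,
              "uz literal yields size_t");
static_assert(
    std::is_same<decltype(1z), std::make_signed<std::size_t>::type>::value,
    "z literal yields signed size_t");
#endif
```

All checks are compile-time, so the snippet either compiles or fails with the quoted message.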

The goal is to enable clang and downstream tools such as clangd and clang-tidy to provide more portable hints and diagnostics.

The previous discussion can be found at #136542.

The current HEAD commit implements this feature by introducing a new subclass of Type called PredefinedSugarType, an approach that was considered appropriate in earlier discussions. I tried to keep PredefinedSugarType simple, yet general enough that it is not limited to size_t and ptrdiff_t and can be used for other purposes. PredefinedSugarType wraps a canonical Type and provides a name; conceptually it is similar to a compiler-internal TypedefType, but it does not depend on a TypedefDecl or a source file.
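To make the sugar/canonical split concrete, here is a toy model. Everything in it (`CanonicalType`, `PredefinedSugar`, `displayName`, `canonicalOf`, `sameType`) is hypothetical and far simpler than Clang's real Type hierarchy, which the truncated patch does not show. The sugar node carries only a name and a pointer to the canonical type it wraps; diagnostics read the name, while type identity reads the canonical type. An LP64 target, where size_t is `unsigned long`, is assumed for the example constants:

```cpp
#include <cassert>
#include <string>

// Hypothetical, heavily simplified stand-ins; not Clang's classes.
enum class BuiltinKind { UnsignedLong, Long };

struct CanonicalType {
  BuiltinKind kind;
  std::string spelling;  // e.g. "unsigned long"
};

// A predefined sugar node: a name plus the canonical type it wraps,
// with no declaration or source file behind it.
struct PredefinedSugar {
  std::string name;                // e.g. "__size_t"
  const CanonicalType *canonical;  // never null in a well-formed model
};

// Hints and diagnostics print the sugared name...
std::string displayName(const PredefinedSugar &t) { return t.name; }

// ...while anything that needs the "real" type strips the sugar first.
const CanonicalType &canonicalOf(const PredefinedSugar &t) {
  assert(t.canonical != nullptr && "sugar must wrap a canonical type");
  return *t.canonical;
}

// Type identity ignores sugar and compares canonical types only.
bool sameType(const PredefinedSugar &a, const CanonicalType &b) {
  return canonicalOf(a).kind == b.kind;
}

// Example constants for an assumed LP64 target.
const CanonicalType kUnsignedLong{BuiltinKind::UnsignedLong, "unsigned long"};
const PredefinedSugar kSizeT{"__size_t", &kUnsignedLong};
```

The same split explains why code paths that require a canonical type, such as the CodeGen call sites in the diff, call `getCanonicalTypeUnqualified()` on the result of `getSizeType()` instead of using it directly.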

Additionally, checks for the z and t length modifiers in scanf and printf format strings were added. These checks precisely match arguments whose types are spelled with the corresponding typedefs or produced by the built-in expressions listed above.
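As a baseline for these checks, the sketch below pairs each length modifier with the type it expects: `%zu` with a `size_t` argument and `%td` with a `ptrdiff_t` argument. `formatSizeAndDiff` is a hypothetical helper name; the calls are standard C99/C++11 and should be accepted by `-Wformat` with or without the patch:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// Formats a size_t with %zu and a ptrdiff_t with %td, the 'z' and 't'
// length modifiers covered by the new format-string checks.
std::string formatSizeAndDiff() {
  char storage[16];
  const char *first = storage;
  const char *last = storage + 5;

  std::size_t size = sizeof storage;   // sizeof yields size_t (16 here)
  std::ptrdiff_t diff = last - first;  // pointer - pointer yields ptrdiff_t

  char buf[64];
  int written = std::snprintf(buf, sizeof buf, "%zu %td", size, diff);
  // snprintf returns the number of characters it tried to write.
  assert(written > 0 && static_cast<std::size_t>(written) < sizeof buf);
  return std::string(buf);
}
```

Calling `formatSizeAndDiff()` returns `"16 5"`.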

The affected tests indicate that it works very well.

Several code paths assume that SizeType is canonical and must remain canonical, so I converted SizeType to its canonical form at those sites.

YexuanXiao requested a review from Endilll as a code owner June 11, 2025 04:19

Thank you for submitting a Pull Request (PR) to the LLVM Project!

This PR will be automatically labeled and the relevant teams will be notified.

If you wish to, you can add reviewers by using the "Reviewers" section on this page.

If this is not working for you, it is probably because you do not have write permissions for the repository. In which case you can instead tag reviewers by name in a comment by using @ followed by their GitHub username.

If you have received no comments on your PR for a week, you can request a review by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate is once a week. Please remember that you are asking for valuable time from other developers.

If you have further questions, they may be answered by the LLVM GitHub User Guide.

You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums.

llvmbot added the clang, clang:frontend, clang:codegen, clang:static analyzer, coroutines, and clang:openmp labels Jun 11, 2025
YexuanXiao (Author) commented:

CC @AaronBallman

llvmbot (Member) commented Jun 11, 2025

@llvm/pr-subscribers-clang-static-analyzer-1
@llvm/pr-subscribers-clang

@llvm/pr-subscribers-clang-codegen

Author: YexuanXiao (YexuanXiao)

Changes

This includes the results of sizeof, sizeof..., __datasizeof, __alignof, _Alignof, alignof, _Countof, size_t literals, and signed size_t literals, as well as the results of pointer-pointer subtraction. The goal is to enable clang and downstream tools such as clangd and clang-tidy to provide more portable hints and diagnostics.

The previous discussion can be found at #136542.

It was implemented by injecting __size_t, __signed_size_t, and __ptrdiff_t into the AST as implicit typedefs. Additionally, checks for the z and t length modifiers in scanf and printf format strings were added.

Several code paths assume that SizeType is canonical and must remain canonical, so I converted SizeType to its canonical form at those sites. Extensive testing of the modifications indicates that it works very well (aside from the unsightly double underscores).

I could not fix the test CodeGen/cfi-unrelated-cast.cpp because I am unfamiliar with LLVM IR. The tests Modules/new-delete.cpp, PCH/cxx-exprs.cpp, PCH/cxx1z-aligned-alloc.cpp, SemaCXX/delete.cpp, and OpenMP/declare_target_codegen.cpp report ambiguity issues with new and delete expressions. Since I do not know how to resolve them, I was unable to fix these tests. I would be very grateful if someone could fix them.


Patch is 325.96 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/143653.diff

56 Files Affected:

  • (modified) clang/include/clang/AST/ASTContext.h (+19-11)
  • (modified) clang/lib/AST/ASTContext.cpp (+50-17)
  • (modified) clang/lib/AST/FormatString.cpp (+87-21)
  • (modified) clang/lib/AST/PrintfFormatString.cpp (+6-3)
  • (modified) clang/lib/AST/ScanfFormatString.cpp (+12-7)
  • (modified) clang/lib/CodeGen/CGCall.cpp (+2-1)
  • (modified) clang/lib/CodeGen/CGCoroutine.cpp (+2-2)
  • (modified) clang/lib/CodeGen/CGObjCMac.cpp (+1-1)
  • (modified) clang/lib/Sema/SemaChecking.cpp (+1-1)
  • (modified) clang/lib/StaticAnalyzer/Checkers/StdLibraryFunctionsChecker.cpp (+44-36)
  • (modified) clang/lib/StaticAnalyzer/Checkers/VLASizeChecker.cpp (+1-1)
  • (modified) clang/test/AST/ast-dump-array.cpp (+1-1)
  • (modified) clang/test/AST/ast-dump-expr-json.c (+9-3)
  • (modified) clang/test/AST/ast-dump-expr-json.cpp (+18-10)
  • (modified) clang/test/AST/ast-dump-expr.c (+3-3)
  • (modified) clang/test/AST/ast-dump-expr.cpp (+8-8)
  • (modified) clang/test/AST/ast-dump-openmp-distribute-parallel-for-simd.c (+10-10)
  • (modified) clang/test/AST/ast-dump-openmp-distribute-parallel-for.c (+10-10)
  • (modified) clang/test/AST/ast-dump-openmp-target-teams-distribute-parallel-for-simd.c (+80-80)
  • (modified) clang/test/AST/ast-dump-openmp-target-teams-distribute-parallel-for.c (+80-80)
  • (modified) clang/test/AST/ast-dump-openmp-teams-distribute-parallel-for-simd.c (+80-80)
  • (modified) clang/test/AST/ast-dump-openmp-teams-distribute-parallel-for.c (+80-80)
  • (modified) clang/test/AST/ast-dump-recovery.c (+1-1)
  • (modified) clang/test/AST/ast-dump-stmt-json.cpp (+58-28)
  • (modified) clang/test/AST/ast-dump-stmt.cpp (+2-2)
  • (modified) clang/test/AST/ast-dump-traits.cpp (+4-4)
  • (modified) clang/test/AST/ast-dump-types-errors-json.cpp (+3-1)
  • (modified) clang/test/Analysis/cfg.cpp (+1-1)
  • (modified) clang/test/Analysis/explain-svals.cpp (+1-1)
  • (modified) clang/test/Analysis/std-c-library-functions-arg-weakdeps.c (+1-1)
  • (modified) clang/test/Analysis/std-c-library-functions-lookup.c (+1-1)
  • (modified) clang/test/Analysis/std-c-library-functions-vs-stream-checker.c (+2-2)
  • (modified) clang/test/Analysis/std-c-library-functions.c (+2-2)
  • (modified) clang/test/CXX/drs/cwg2xx.cpp (+1-1)
  • (modified) clang/test/CXX/lex/lex.literal/lex.ext/p2.cpp (+5-5)
  • (modified) clang/test/CXX/lex/lex.literal/lex.ext/p5.cpp (+3-3)
  • (modified) clang/test/CXX/lex/lex.literal/lex.ext/p7.cpp (+1-1)
  • (modified) clang/test/FixIt/fixit-format-ios-nopedantic.m (+1-1)
  • (modified) clang/test/FixIt/format.m (+3-3)
  • (modified) clang/test/Sema/format-strings-fixit-ssize_t.c (+1-1)
  • (modified) clang/test/Sema/format-strings-int-typedefs.c (+6-6)
  • (modified) clang/test/Sema/format-strings-scanf.c (+4-4)
  • (modified) clang/test/Sema/format-strings-size_t.c (+6-7)
  • (modified) clang/test/Sema/matrix-type-builtins.c (+4-4)
  • (modified) clang/test/Sema/ptrauth-atomic-ops.c (+1-1)
  • (modified) clang/test/Sema/ptrauth.c (+1-1)
  • (modified) clang/test/SemaCXX/cxx2c-trivially-relocatable.cpp (+1-1)
  • (modified) clang/test/SemaCXX/enum-scoped.cpp (+2-2)
  • (modified) clang/test/SemaCXX/new-delete.cpp (+1-1)
  • (modified) clang/test/SemaCXX/static-assert-cxx26.cpp (+7-7)
  • (modified) clang/test/SemaCXX/type-aware-new-delete-basic-free-declarations.cpp (+1-1)
  • (modified) clang/test/SemaCXX/unavailable_aligned_allocation.cpp (+12-12)
  • (modified) clang/test/SemaObjC/format-size-spec-nsinteger.m (+5-12)
  • (modified) clang/test/SemaObjC/matrix-type-builtins.m (+1-1)
  • (modified) clang/test/SemaOpenCL/cl20-device-side-enqueue.cl (+3-3)
  • (modified) clang/test/SemaTemplate/type_pack_element.cpp (+6-6)
diff --git a/clang/include/clang/AST/ASTContext.h b/clang/include/clang/AST/ASTContext.h
index 8d24d393eab09..bd4600e479b1b 100644
--- a/clang/include/clang/AST/ASTContext.h
+++ b/clang/include/clang/AST/ASTContext.h
@@ -25,6 +25,7 @@
 #include "clang/AST/RawCommentList.h"
 #include "clang/AST/SYCLKernelInfo.h"
 #include "clang/AST/TemplateName.h"
+#include "clang/AST/Type.h"
 #include "clang/Basic/LLVM.h"
 #include "clang/Basic/PartialDiagnostic.h"
 #include "clang/Basic/SourceLocation.h"
@@ -1952,6 +1953,13 @@ class ASTContext : public RefCountedBase<ASTContext> {
                                                         bool IsDependent,
                                                         QualType Canon) const;
 
+  // The core language uses these types as the result types of some expressions,
+  // which are typically standard integer types and consistent with its
+  // typedefs (if any). These variables store the typedefs generated in the AST,
+  // not the typedefs provided in the header files.
+  mutable QualType SizeType;       // __size_t
+  mutable QualType SignedSizeType; // __signed_size_t
+  mutable QualType PtrdiffType;    // __ptrdiff_t
 public:
   /// Return the unique reference to the type for the specified TagDecl
   /// (struct/union/class/enum) decl.
@@ -1961,11 +1969,20 @@ class ASTContext : public RefCountedBase<ASTContext> {
   /// <stddef.h>.
   ///
   /// The sizeof operator requires this (C99 6.5.3.4p4).
-  CanQualType getSizeType() const;
+  QualType getSizeType() const;
 
   /// Return the unique signed counterpart of
   /// the integer type corresponding to size_t.
-  CanQualType getSignedSizeType() const;
+  QualType getSignedSizeType() const;
+
+  /// Return the unique type for "ptrdiff_t" (C99 7.17) defined in
+  /// <stddef.h>. Pointer - pointer requires this (C99 6.5.6p9).
+  QualType getPointerDiffType() const;
+
+  /// Return the unique unsigned counterpart of "ptrdiff_t"
+  /// integer type. The standard (C11 7.21.6.1p7) refers to this type
+  /// in the definition of %tu format specifier.
+  QualType getUnsignedPointerDiffType() const;
 
   /// Return the unique type for "intmax_t" (C99 7.18.1.5), defined in
   /// <stdint.h>.
@@ -2006,15 +2023,6 @@ class ASTContext : public RefCountedBase<ASTContext> {
   /// as defined by the target.
   QualType getUIntPtrType() const;
 
-  /// Return the unique type for "ptrdiff_t" (C99 7.17) defined in
-  /// <stddef.h>. Pointer - pointer requires this (C99 6.5.6p9).
-  QualType getPointerDiffType() const;
-
-  /// Return the unique unsigned counterpart of "ptrdiff_t"
-  /// integer type. The standard (C11 7.21.6.1p7) refers to this type
-  /// in the definition of %tu format specifier.
-  QualType getUnsignedPointerDiffType() const;
-
   /// Return the unique type for "pid_t" defined in
   /// <sys/types.h>. We need this to compute the correct type for vfork().
   QualType getProcessIDType() const;
diff --git a/clang/lib/AST/ASTContext.cpp b/clang/lib/AST/ASTContext.cpp
index 45f9602856840..00f8f87466273 100644
--- a/clang/lib/AST/ASTContext.cpp
+++ b/clang/lib/AST/ASTContext.cpp
@@ -6726,17 +6726,63 @@ QualType ASTContext::getTagDeclType(const TagDecl *Decl) const {
   return getTypeDeclType(const_cast<TagDecl*>(Decl));
 }
 
+// Inject __size_t, __signed_size_t, and __ptrdiff_t to provide portable hints
+// and diagnostics. In C and C++, expressions of type size_t can be obtained via
+// the sizeof operator, expressions of type ptrdiff_t via pointer subtraction,
+// and expressions of type signed size_t via the z literal suffix (since C++23).
+// However, no core language mechanism directly produces an expression of type
+// unsigned ptrdiff_t. The unsigned ptrdiff_t type is solely required by format
+// specifiers for printf and scanf. Consequently, no expression's type needs to
+// be displayed as unsigned ptrdiff_t. Verification of whether a type is
+// unsigned ptrdiff_t is also unnecessary, as no corresponding typedefs exist.
+// Therefore, injecting a typedef for unsigned ptrdiff_t is not required.
+
 /// getSizeType - Return the unique type for "size_t" (C99 7.17), the result
 /// of the sizeof operator (C99 6.5.3.4p4). The value is target dependent and
 /// needs to agree with the definition in <stddef.h>.
-CanQualType ASTContext::getSizeType() const {
-  return getFromTargetType(Target->getSizeType());
+QualType ASTContext::getSizeType() const {
+  if (SizeType.isNull()) {
+    if (auto const &LO = getLangOpts(); !LO.HLSL && (LO.C99 || LO.CPlusPlus))
+      SizeType = getTypedefType(buildImplicitTypedef(
+          getFromTargetType(Target->getSizeType()), "__size_t"));
+    else
+      SizeType = getFromTargetType(Target->getSizeType());
+  }
+  return SizeType;
 }
 
 /// Return the unique signed counterpart of the integer type
 /// corresponding to size_t.
-CanQualType ASTContext::getSignedSizeType() const {
-  return getFromTargetType(Target->getSignedSizeType());
+QualType ASTContext::getSignedSizeType() const {
+  if (SignedSizeType.isNull()) {
+    if (auto const &LO = getLangOpts(); !LO.HLSL && (LO.C99 || LO.CPlusPlus))
+      SignedSizeType = getTypedefType(buildImplicitTypedef(
+          getFromTargetType(Target->getSignedSizeType()), "__signed_size_t"));
+    else
+      SignedSizeType = getFromTargetType(Target->getSignedSizeType());
+  }
+  return SignedSizeType;
+}
+
+/// getPointerDiffType - Return the unique type for "ptrdiff_t" (C99 7.17)
+/// defined in <stddef.h>. Pointer - pointer requires this (C99 6.5.6p9).
+QualType ASTContext::getPointerDiffType() const {
+  if (PtrdiffType.isNull()) {
+    if (auto const &LO = getLangOpts(); !LO.HLSL && (LO.C99 || LO.CPlusPlus))
+      PtrdiffType = getTypedefType(buildImplicitTypedef(
+          getFromTargetType(Target->getPtrDiffType(LangAS::Default)),
+          "__ptrdiff_t"));
+    else
+      PtrdiffType = getFromTargetType(Target->getPtrDiffType(LangAS::Default));
+  }
+  return PtrdiffType;
+}
+
+/// Return the unique unsigned counterpart of "ptrdiff_t"
+/// integer type. The standard (C11 7.21.6.1p7) refers to this type
+/// in the definition of %tu format specifier.
+QualType ASTContext::getUnsignedPointerDiffType() const {
+  return getFromTargetType(Target->getUnsignedPtrDiffType(LangAS::Default));
 }
 
 /// getIntMaxType - Return the unique type for "intmax_t" (C99 7.18.1.5).
@@ -6771,19 +6817,6 @@ QualType ASTContext::getUIntPtrType() const {
   return getCorrespondingUnsignedType(getIntPtrType());
 }
 
-/// getPointerDiffType - Return the unique type for "ptrdiff_t" (C99 7.17)
-/// defined in <stddef.h>. Pointer - pointer requires this (C99 6.5.6p9).
-QualType ASTContext::getPointerDiffType() const {
-  return getFromTargetType(Target->getPtrDiffType(LangAS::Default));
-}
-
-/// Return the unique unsigned counterpart of "ptrdiff_t"
-/// integer type. The standard (C11 7.21.6.1p7) refers to this type
-/// in the definition of %tu format specifier.
-QualType ASTContext::getUnsignedPointerDiffType() const {
-  return getFromTargetType(Target->getUnsignedPtrDiffType(LangAS::Default));
-}
-
 /// Return the unique type for "pid_t" defined in
 /// <sys/types.h>. We need this to compute the correct type for vfork().
 QualType ASTContext::getProcessIDType() const {
diff --git a/clang/lib/AST/FormatString.cpp b/clang/lib/AST/FormatString.cpp
index 5d3b56fc4e713..0c1fd33b56f25 100644
--- a/clang/lib/AST/FormatString.cpp
+++ b/clang/lib/AST/FormatString.cpp
@@ -11,6 +11,7 @@
 //
 //===----------------------------------------------------------------------===//
 
+#include "clang/AST/FormatString.h"
 #include "FormatStringParsing.h"
 #include "clang/Basic/LangOptions.h"
 #include "clang/Basic/TargetInfo.h"
@@ -320,6 +321,69 @@ bool clang::analyze_format_string::ParseUTF8InvalidSpecifier(
 // Methods on ArgType.
 //===----------------------------------------------------------------------===//
 
+static bool namedTypeToLengthModifierKind(QualType QT,
+                                          LengthModifier::Kind &K) {
+  for (/**/; const auto *TT = QT->getAs<TypedefType>();
+       QT = TT->getDecl()->getUnderlyingType()) {
+    StringRef Name = TT->getDecl()->getIdentifier()->getName();
+    if (Name == "size_t" || Name == "__size_t") {
+      K = LengthModifier::AsSizeT;
+      return true;
+    } else if (Name == "__signed_size_t" ||
+               Name == "ssize_t" /*Not C99, but common in Unix.*/) {
+      K = LengthModifier::AsSizeT;
+      return true;
+    } else if (Name == "ptrdiff_t" || Name == "__ptrdiff_t") {
+      K = LengthModifier::AsPtrDiff;
+      return true;
+    } else if (Name == "intmax_t") {
+      K = LengthModifier::AsIntMax;
+      return true;
+    } else if (Name == "uintmax_t") {
+      K = LengthModifier::AsIntMax;
+      return true;
+    }
+  }
+  return false;
+}
+
+// Check whether T and E are compatible size_t/ptrdiff_t typedefs. E must be
+// consistent with LE.
+// T is the type of the actual expression in the code to be checked, and E is
+// the expected type parsed from the format string.
+static clang::analyze_format_string::ArgType::MatchKind
+matchesSizeTPtrdiffT(ASTContext &C, QualType T, QualType E,
+                     LengthModifier::Kind LE) {
+  using Kind = LengthModifier::Kind;
+  using MatchKind = clang::analyze_format_string::ArgType::MatchKind;
+  assert(LE == Kind::AsPtrDiff || LE == Kind::AsSizeT);
+
+  if (!T->isIntegerType())
+    return MatchKind::NoMatch;
+
+  if (C.getCorrespondingSignedType(T.getCanonicalType()) !=
+      C.getCorrespondingSignedType(E.getCanonicalType()))
+    return MatchKind::NoMatch;
+
+  // signed size_t and unsigned ptrdiff_t do not have typedefs in C and C++.
+  if (LE == Kind::AsSizeT && E->isSignedIntegerType())
+    return T->isSignedIntegerType() ? MatchKind::Match
+                                    : MatchKind::NoMatchSignedness;
+
+  if (LE == LengthModifier::Kind::AsPtrDiff && E->isUnsignedIntegerType())
+    return T->isUnsignedIntegerType() ? MatchKind::Match
+                                      : MatchKind::NoMatchSignedness;
+
+  if (Kind Actual = Kind::None; namedTypeToLengthModifierKind(T, Actual)) {
+    if (Actual == LE)
+      return MatchKind::Match;
+    else if (Actual == Kind::AsPtrDiff || Actual == Kind::AsSizeT)
+      return MatchKind::NoMatchSignedness;
+  }
+
+  return MatchKind::NoMatch;
+}
+
 clang::analyze_format_string::ArgType::MatchKind
 ArgType::matchesType(ASTContext &C, QualType argTy) const {
   // When using the format attribute in C++, you can receive a function or an
@@ -394,6 +458,13 @@ ArgType::matchesType(ASTContext &C, QualType argTy) const {
     }
 
     case SpecificTy: {
+      if (TK != TypeKind::DontCare) {
+        return matchesSizeTPtrdiffT(C, argTy, T,
+                                    TK == TypeKind::SizeT
+                                        ? LengthModifier::Kind::AsSizeT
+                                        : LengthModifier::AsPtrDiff);
+      }
+
       if (const EnumType *ETy = argTy->getAs<EnumType>()) {
         // If the enum is incomplete we know nothing about the underlying type.
         // Assume that it's 'int'. Do not use the underlying type for a scoped
@@ -653,6 +724,18 @@ ArgType::matchesArgType(ASTContext &C, const ArgType &Other) const {
 
   if (Left.K == AK::SpecificTy) {
     if (Right.K == AK::SpecificTy) {
+      if (Left.TK != TypeKind::DontCare) {
+        return matchesSizeTPtrdiffT(C, Right.T, Left.T,
+                                    Left.TK == TypeKind::SizeT
+                                        ? LengthModifier::Kind::AsSizeT
+                                        : LengthModifier::AsPtrDiff);
+      } else if (Right.TK != TypeKind::DontCare) {
+        return matchesSizeTPtrdiffT(C, Left.T, Right.T,
+                                    Right.TK == TypeKind::SizeT
+                                        ? LengthModifier::Kind::AsSizeT
+                                        : LengthModifier::AsPtrDiff);
+      }
+
       auto Canon1 = C.getCanonicalType(Left.T);
       auto Canon2 = C.getCanonicalType(Right.T);
       if (Canon1 == Canon2)
@@ -1200,27 +1283,10 @@ FormatSpecifier::getCorrectedLengthModifier() const {
 
 bool FormatSpecifier::namedTypeToLengthModifier(QualType QT,
                                                 LengthModifier &LM) {
-  for (/**/; const auto *TT = QT->getAs<TypedefType>();
-       QT = TT->getDecl()->getUnderlyingType()) {
-    const TypedefNameDecl *Typedef = TT->getDecl();
-    const IdentifierInfo *Identifier = Typedef->getIdentifier();
-    if (Identifier->getName() == "size_t") {
-      LM.setKind(LengthModifier::AsSizeT);
-      return true;
-    } else if (Identifier->getName() == "ssize_t") {
-      // Not C99, but common in Unix.
-      LM.setKind(LengthModifier::AsSizeT);
-      return true;
-    } else if (Identifier->getName() == "intmax_t") {
-      LM.setKind(LengthModifier::AsIntMax);
-      return true;
-    } else if (Identifier->getName() == "uintmax_t") {
-      LM.setKind(LengthModifier::AsIntMax);
-      return true;
-    } else if (Identifier->getName() == "ptrdiff_t") {
-      LM.setKind(LengthModifier::AsPtrDiff);
-      return true;
-    }
+  if (LengthModifier::Kind Out = LengthModifier::Kind::None;
+      namedTypeToLengthModifierKind(QT, Out)) {
+    LM.setKind(Out);
+    return true;
   }
   return false;
 }
diff --git a/clang/lib/AST/PrintfFormatString.cpp b/clang/lib/AST/PrintfFormatString.cpp
index 293164ddac8f8..397a1d4c1172f 100644
--- a/clang/lib/AST/PrintfFormatString.cpp
+++ b/clang/lib/AST/PrintfFormatString.cpp
@@ -543,7 +543,8 @@ ArgType PrintfSpecifier::getScalarArgType(ASTContext &Ctx,
       case LengthModifier::AsIntMax:
         return ArgType(Ctx.getIntMaxType(), "intmax_t");
       case LengthModifier::AsSizeT:
-        return ArgType::makeSizeT(ArgType(Ctx.getSignedSizeType(), "ssize_t"));
+        return ArgType::makeSizeT(
+            ArgType(Ctx.getSignedSizeType(), "signed size_t"));
       case LengthModifier::AsInt3264:
         return Ctx.getTargetInfo().getTriple().isArch64Bit()
                    ? ArgType(Ctx.LongLongTy, "__int64")
@@ -626,9 +627,11 @@ ArgType PrintfSpecifier::getScalarArgType(ASTContext &Ctx,
       case LengthModifier::AsIntMax:
         return ArgType::PtrTo(ArgType(Ctx.getIntMaxType(), "intmax_t"));
       case LengthModifier::AsSizeT:
-        return ArgType::PtrTo(ArgType(Ctx.getSignedSizeType(), "ssize_t"));
+        return ArgType::PtrTo(ArgType::makeSizeT(
+            ArgType(Ctx.getSignedSizeType(), "signed size_t")));
       case LengthModifier::AsPtrDiff:
-        return ArgType::PtrTo(ArgType(Ctx.getPointerDiffType(), "ptrdiff_t"));
+        return ArgType::PtrTo(ArgType::makePtrdiffT(
+            ArgType(Ctx.getPointerDiffType(), "ptrdiff_t")));
       case LengthModifier::AsLongDouble:
         return ArgType(); // FIXME: Is this a known extension?
       case LengthModifier::AsAllocate:
diff --git a/clang/lib/AST/ScanfFormatString.cpp b/clang/lib/AST/ScanfFormatString.cpp
index 7ee21c8c61954..e3926185860db 100644
--- a/clang/lib/AST/ScanfFormatString.cpp
+++ b/clang/lib/AST/ScanfFormatString.cpp
@@ -251,9 +251,11 @@ ArgType ScanfSpecifier::getArgType(ASTContext &Ctx) const {
         case LengthModifier::AsIntMax:
           return ArgType::PtrTo(ArgType(Ctx.getIntMaxType(), "intmax_t"));
         case LengthModifier::AsSizeT:
-          return ArgType::PtrTo(ArgType(Ctx.getSignedSizeType(), "ssize_t"));
+          return ArgType::PtrTo(ArgType::makeSizeT(
+              ArgType(Ctx.getSignedSizeType(), "signed size_t")));
         case LengthModifier::AsPtrDiff:
-          return ArgType::PtrTo(ArgType(Ctx.getPointerDiffType(), "ptrdiff_t"));
+          return ArgType::PtrTo(ArgType::makePtrdiffT(
+              ArgType(Ctx.getPointerDiffType(), "ptrdiff_t")));
         case LengthModifier::AsLongDouble:
           // GNU extension.
           return ArgType::PtrTo(Ctx.LongLongTy);
@@ -292,10 +294,11 @@ ArgType ScanfSpecifier::getArgType(ASTContext &Ctx) const {
         case LengthModifier::AsIntMax:
           return ArgType::PtrTo(ArgType(Ctx.getUIntMaxType(), "uintmax_t"));
         case LengthModifier::AsSizeT:
-          return ArgType::PtrTo(ArgType(Ctx.getSizeType(), "size_t"));
-        case LengthModifier::AsPtrDiff:
           return ArgType::PtrTo(
-              ArgType(Ctx.getUnsignedPointerDiffType(), "unsigned ptrdiff_t"));
+              ArgType::makeSizeT(ArgType(Ctx.getSizeType(), "size_t")));
+        case LengthModifier::AsPtrDiff:
+          return ArgType::PtrTo(ArgType::makePtrdiffT(
+              ArgType(Ctx.getUnsignedPointerDiffType(), "unsigned ptrdiff_t")));
         case LengthModifier::AsLongDouble:
           // GNU extension.
           return ArgType::PtrTo(Ctx.UnsignedLongLongTy);
@@ -390,9 +393,11 @@ ArgType ScanfSpecifier::getArgType(ASTContext &Ctx) const {
         case LengthModifier::AsIntMax:
           return ArgType::PtrTo(ArgType(Ctx.getIntMaxType(), "intmax_t"));
         case LengthModifier::AsSizeT:
-          return ArgType::PtrTo(ArgType(Ctx.getSignedSizeType(), "ssize_t"));
+          return ArgType::PtrTo(ArgType::makeSizeT(
+              ArgType(Ctx.getSignedSizeType(), "signed size_t")));
         case LengthModifier::AsPtrDiff:
-          return ArgType::PtrTo(ArgType(Ctx.getPointerDiffType(), "ptrdiff_t"));
+          return ArgType::PtrTo(ArgType::makePtrdiffT(
+              ArgType(Ctx.getPointerDiffType(), "ptrdiff_t")));
         case LengthModifier::AsLongDouble:
           return ArgType(); // FIXME: Is this a known extension?
         case LengthModifier::AsAllocate:
diff --git a/clang/lib/CodeGen/CGCall.cpp b/clang/lib/CodeGen/CGCall.cpp
index 46a5d64412275..3ff2597d65e54 100644
--- a/clang/lib/CodeGen/CGCall.cpp
+++ b/clang/lib/CodeGen/CGCall.cpp
@@ -223,7 +223,8 @@ static void appendParameterTypes(
   for (unsigned I = 0, E = FPT->getNumParams(); I != E; ++I) {
     prefix.push_back(FPT->getParamType(I));
     if (ExtInfos[I].hasPassObjectSize())
-      prefix.push_back(CGT.getContext().getSizeType());
+      prefix.push_back(
+          CGT.getContext().getSizeType()->getCanonicalTypeUnqualified());
   }
 
   addExtParameterInfosForCall(paramInfos, FPT.getTypePtr(), PrefixSize,
diff --git a/clang/lib/CodeGen/CGCoroutine.cpp b/clang/lib/CodeGen/CGCoroutine.cpp
index 0fc488e98aaf0..265dedf228e69 100644
--- a/clang/lib/CodeGen/CGCoroutine.cpp
+++ b/clang/lib/CodeGen/CGCoroutine.cpp
@@ -1002,14 +1002,14 @@ RValue CodeGenFunction::EmitCoroutineIntrinsic(const CallExpr *E,
   }
   case llvm::Intrinsic::coro_size: {
     auto &Context = getContext();
-    CanQualType SizeTy = Context.getSizeType();
+    CanQualType SizeTy = Context.getSizeType()->getCanonicalTypeUnqualified();
     llvm::IntegerType *T = Builder.getIntNTy(Context.getTypeSize(SizeTy));
     llvm::Function *F = CGM.getIntrinsic(llvm::Intrinsic::coro_size, T);
     return RValue::get(Builder.CreateCall(F));
   }
   case llvm::Intrinsic::coro_align: {
     auto &Context = getContext();
-    CanQualType SizeTy = Context.getSizeType();
+    CanQualType SizeTy = Context.getSizeType()->getCanonicalTypeUnqualified();
     llvm::IntegerType *T = Builder.getIntNTy(Context.getTypeSize(SizeTy));
     llvm::Function *F = CGM.getIntrinsic(llvm::Intrinsic::coro_align, T);
     return RValue::get(Builder.CreateCall(F));
diff --git a/clang/lib/CodeGen/CGObjCMac.cpp b/clang/lib/CodeGen/CGObjCMac.cpp
index 1c23a8b4db918..5a0d2a2286bac 100644
--- a/clang/lib/CodeGen/CGObjCMac.cpp
+++ b/clang/lib/CodeGen/CGObjCMac.cpp
@@ -285,7 +285,7 @@ class ObjCCommonTypesHelper {
     SmallVector<CanQualType, 5> Params;
     Params.push_back(Ctx.VoidPtrTy);
     Params.push_back(Ctx.VoidPtrTy);
-    Params.push_back(Ctx.getSizeType());
+    Params.push_back(Ctx.getSizeType()->getCanonicalTypeUnqualified());
     Params.push_back(Ctx.BoolTy);
     Params.push_back(Ctx.BoolTy);
     llvm::FunctionType *FTy = Types.GetFunctionType(
diff --git a/clang/lib/Sema/SemaChecking.cpp b/clang/lib/Sema/SemaChecking.cpp
index 8f8e1ceb7197e..9a0d824a26ae6 100644
--- a/clang/lib/Sema/SemaChecking.cpp
+++ b/clang/lib/Sema/SemaChecking.cpp
@@ -5131,7 +5131,7 @@ bool Sema::BuiltinVAStartARMMicrosoft(CallExpr *Call) {
         << 3                                      /* parameter mismatch */
         << 2 << Arg1->getType() << ConstCharPtrTy;
 
-  const QualType SizeTy = Context.getSizeType();
+  const QualType SizeTy = Context.getSizeTyp...
[truncated]
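The decision order in `matchesSizeTPtrdiffT` from the FormatString.cpp hunk can be modeled in a few lines. This is a simplified, hypothetical sketch, not Clang code: a type is reduced to a canonical base spelling with signedness stripped, a signedness flag, and at most one typedef name (the real helper walks the whole typedef chain). All names here are illustrative, and an LP64 target is assumed for the example constants:

```cpp
#include <cassert>
#include <string>

enum class MatchKind { Match, NoMatch, NoMatchSignedness };
enum class Modifier { None, SizeT, PtrDiff, IntMax };

struct ModelType {
  bool isInteger;
  std::string canonicalBase;  // canonical type with signedness removed
  bool isSigned;
  std::string typedefName;    // "" if the type has no typedef sugar
};

// Mirrors namedTypeToLengthModifierKind() for a single typedef level.
Modifier modifierForName(const std::string &name) {
  if (name == "size_t" || name == "__size_t" || name == "ssize_t" ||
      name == "__signed_size_t")
    return Modifier::SizeT;
  if (name == "ptrdiff_t" || name == "__ptrdiff_t")
    return Modifier::PtrDiff;
  if (name == "intmax_t" || name == "uintmax_t")
    return Modifier::IntMax;
  return Modifier::None;
}

// actual: the argument's type; expected: the type implied by the specifier;
// mod: the length modifier ('z' or 't') that produced the expected type.
MatchKind matchSizeOrPtrdiff(const ModelType &actual,
                             const ModelType &expected, Modifier mod) {
  assert(mod == Modifier::SizeT || mod == Modifier::PtrDiff);
  if (!actual.isInteger)
    return MatchKind::NoMatch;
  // The patch compares getCorrespondingSignedType() of both canonical types.
  if (actual.canonicalBase != expected.canonicalBase)
    return MatchKind::NoMatch;
  // Signed size_t (%zd) and unsigned ptrdiff_t (%tu) have no standard
  // typedef, so only signedness is checked for them.
  if (mod == Modifier::SizeT && expected.isSigned)
    return actual.isSigned ? MatchKind::Match : MatchKind::NoMatchSignedness;
  if (mod == Modifier::PtrDiff && !expected.isSigned)
    return !actual.isSigned ? MatchKind::Match : MatchKind::NoMatchSignedness;
  // Otherwise the argument's typedef name decides; in this revision of the
  // diff, an argument without a recognized typedef falls through to NoMatch.
  Modifier named = modifierForName(actual.typedefName);
  if (named == mod)
    return MatchKind::Match;
  if (named == Modifier::SizeT || named == Modifier::PtrDiff)
    return MatchKind::NoMatchSignedness;
  return MatchKind::NoMatch;
}

// Example types for an assumed LP64 target (size_t is unsigned long).
const ModelType kSizeT{true, "long", false, "size_t"};
const ModelType kPtrdiffT{true, "long", true, "ptrdiff_t"};
const ModelType kExpectedZu{true, "long", false, "__size_t"};
const ModelType kExpectedZd{true, "long", true, "__signed_size_t"};
```

Under these assumptions, passing a `ptrdiff_t` to `%zu` is classified as a signedness mismatch rather than a plain mismatch: both canonical types share a base, and the argument's typedef names the other length modifier.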

llvmbot (Member) commented Jun 11, 2025

@llvm/pr-subscribers-coroutines

  • (modified) clang/test/SemaObjC/matrix-type-builtins.m (+1-1)
  • (modified) clang/test/SemaOpenCL/cl20-device-side-enqueue.cl (+3-3)
  • (modified) clang/test/SemaTemplate/type_pack_element.cpp (+6-6)
diff --git a/clang/include/clang/AST/ASTContext.h b/clang/include/clang/AST/ASTContext.h
index 8d24d393eab09..bd4600e479b1b 100644
--- a/clang/include/clang/AST/ASTContext.h
+++ b/clang/include/clang/AST/ASTContext.h
@@ -25,6 +25,7 @@
 #include "clang/AST/RawCommentList.h"
 #include "clang/AST/SYCLKernelInfo.h"
 #include "clang/AST/TemplateName.h"
+#include "clang/AST/Type.h"
 #include "clang/Basic/LLVM.h"
 #include "clang/Basic/PartialDiagnostic.h"
 #include "clang/Basic/SourceLocation.h"
@@ -1952,6 +1953,13 @@ class ASTContext : public RefCountedBase<ASTContext> {
                                                         bool IsDependent,
                                                         QualType Canon) const;
 
+  // The core language uses these types as the result types of some expressions,
+  // which are typically standard integer types and consistent with their
+  // typedefs (if any). These variables store the typedefs generated in the AST,
+  // not the typedefs provided in the header files.
+  mutable QualType SizeType;       // __size_t
+  mutable QualType SignedSizeType; // __signed_size_t
+  mutable QualType PtrdiffType;    // __ptrdiff_t
 public:
   /// Return the unique reference to the type for the specified TagDecl
   /// (struct/union/class/enum) decl.
@@ -1961,11 +1969,20 @@ class ASTContext : public RefCountedBase<ASTContext> {
   /// <stddef.h>.
   ///
   /// The sizeof operator requires this (C99 6.5.3.4p4).
-  CanQualType getSizeType() const;
+  QualType getSizeType() const;
 
   /// Return the unique signed counterpart of
   /// the integer type corresponding to size_t.
-  CanQualType getSignedSizeType() const;
+  QualType getSignedSizeType() const;
+
+  /// Return the unique type for "ptrdiff_t" (C99 7.17) defined in
+  /// <stddef.h>. Pointer - pointer requires this (C99 6.5.6p9).
+  QualType getPointerDiffType() const;
+
+  /// Return the unique unsigned counterpart of "ptrdiff_t"
+  /// integer type. The standard (C11 7.21.6.1p7) refers to this type
+  /// in the definition of %tu format specifier.
+  QualType getUnsignedPointerDiffType() const;
 
   /// Return the unique type for "intmax_t" (C99 7.18.1.5), defined in
   /// <stdint.h>.
@@ -2006,15 +2023,6 @@ class ASTContext : public RefCountedBase<ASTContext> {
   /// as defined by the target.
   QualType getUIntPtrType() const;
 
-  /// Return the unique type for "ptrdiff_t" (C99 7.17) defined in
-  /// <stddef.h>. Pointer - pointer requires this (C99 6.5.6p9).
-  QualType getPointerDiffType() const;
-
-  /// Return the unique unsigned counterpart of "ptrdiff_t"
-  /// integer type. The standard (C11 7.21.6.1p7) refers to this type
-  /// in the definition of %tu format specifier.
-  QualType getUnsignedPointerDiffType() const;
-
   /// Return the unique type for "pid_t" defined in
   /// <sys/types.h>. We need this to compute the correct type for vfork().
   QualType getProcessIDType() const;
diff --git a/clang/lib/AST/ASTContext.cpp b/clang/lib/AST/ASTContext.cpp
index 45f9602856840..00f8f87466273 100644
--- a/clang/lib/AST/ASTContext.cpp
+++ b/clang/lib/AST/ASTContext.cpp
@@ -6726,17 +6726,63 @@ QualType ASTContext::getTagDeclType(const TagDecl *Decl) const {
   return getTypeDeclType(const_cast<TagDecl*>(Decl));
 }
 
+// Inject __size_t, __signed_size_t, and __ptrdiff_t to provide portable hints
+// and diagnostics. In C and C++, expressions of type size_t can be obtained via
+// the sizeof operator, expressions of type ptrdiff_t via pointer subtraction,
+// and expressions of type signed size_t via the z literal suffix (since C++23).
+// However, no core language mechanism directly produces an expression of type
+// unsigned ptrdiff_t. The unsigned ptrdiff_t type is solely required by format
+// specifiers for printf and scanf. Consequently, no expression's type needs to
+// be displayed as unsigned ptrdiff_t. Verification of whether a type is
+// unsigned ptrdiff_t is also unnecessary, as no corresponding typedefs exist.
+// Therefore, injecting a typedef for unsigned ptrdiff_t is not required.
+
 /// getSizeType - Return the unique type for "size_t" (C99 7.17), the result
 /// of the sizeof operator (C99 6.5.3.4p4). The value is target dependent and
 /// needs to agree with the definition in <stddef.h>.
-CanQualType ASTContext::getSizeType() const {
-  return getFromTargetType(Target->getSizeType());
+QualType ASTContext::getSizeType() const {
+  if (SizeType.isNull()) {
+    if (auto const &LO = getLangOpts(); !LO.HLSL && (LO.C99 || LO.CPlusPlus))
+      SizeType = getTypedefType(buildImplicitTypedef(
+          getFromTargetType(Target->getSizeType()), "__size_t"));
+    else
+      SizeType = getFromTargetType(Target->getSizeType());
+  }
+  return SizeType;
 }
 
 /// Return the unique signed counterpart of the integer type
 /// corresponding to size_t.
-CanQualType ASTContext::getSignedSizeType() const {
-  return getFromTargetType(Target->getSignedSizeType());
+QualType ASTContext::getSignedSizeType() const {
+  if (SignedSizeType.isNull()) {
+    if (auto const &LO = getLangOpts(); !LO.HLSL && (LO.C99 || LO.CPlusPlus))
+      SignedSizeType = getTypedefType(buildImplicitTypedef(
+          getFromTargetType(Target->getSignedSizeType()), "__signed_size_t"));
+    else
+      SignedSizeType = getFromTargetType(Target->getSignedSizeType());
+  }
+  return SignedSizeType;
+}
+
+/// getPointerDiffType - Return the unique type for "ptrdiff_t" (C99 7.17)
+/// defined in <stddef.h>. Pointer - pointer requires this (C99 6.5.6p9).
+QualType ASTContext::getPointerDiffType() const {
+  if (PtrdiffType.isNull()) {
+    if (auto const &LO = getLangOpts(); !LO.HLSL && (LO.C99 || LO.CPlusPlus))
+      PtrdiffType = getTypedefType(buildImplicitTypedef(
+          getFromTargetType(Target->getPtrDiffType(LangAS::Default)),
+          "__ptrdiff_t"));
+    else
+      PtrdiffType = getFromTargetType(Target->getPtrDiffType(LangAS::Default));
+  }
+  return PtrdiffType;
+}
+
+/// Return the unique unsigned counterpart of "ptrdiff_t"
+/// integer type. The standard (C11 7.21.6.1p7) refers to this type
+/// in the definition of %tu format specifier.
+QualType ASTContext::getUnsignedPointerDiffType() const {
+  return getFromTargetType(Target->getUnsignedPtrDiffType(LangAS::Default));
 }
 
 /// getIntMaxType - Return the unique type for "intmax_t" (C99 7.18.1.5).
@@ -6771,19 +6817,6 @@ QualType ASTContext::getUIntPtrType() const {
   return getCorrespondingUnsignedType(getIntPtrType());
 }
 
-/// getPointerDiffType - Return the unique type for "ptrdiff_t" (C99 7.17)
-/// defined in <stddef.h>. Pointer - pointer requires this (C99 6.5.6p9).
-QualType ASTContext::getPointerDiffType() const {
-  return getFromTargetType(Target->getPtrDiffType(LangAS::Default));
-}
-
-/// Return the unique unsigned counterpart of "ptrdiff_t"
-/// integer type. The standard (C11 7.21.6.1p7) refers to this type
-/// in the definition of %tu format specifier.
-QualType ASTContext::getUnsignedPointerDiffType() const {
-  return getFromTargetType(Target->getUnsignedPtrDiffType(LangAS::Default));
-}
-
 /// Return the unique type for "pid_t" defined in
 /// <sys/types.h>. We need this to compute the correct type for vfork().
 QualType ASTContext::getProcessIDType() const {
diff --git a/clang/lib/AST/FormatString.cpp b/clang/lib/AST/FormatString.cpp
index 5d3b56fc4e713..0c1fd33b56f25 100644
--- a/clang/lib/AST/FormatString.cpp
+++ b/clang/lib/AST/FormatString.cpp
@@ -11,6 +11,7 @@
 //
 //===----------------------------------------------------------------------===//
 
+#include "clang/AST/FormatString.h"
 #include "FormatStringParsing.h"
 #include "clang/Basic/LangOptions.h"
 #include "clang/Basic/TargetInfo.h"
@@ -320,6 +321,69 @@ bool clang::analyze_format_string::ParseUTF8InvalidSpecifier(
 // Methods on ArgType.
 //===----------------------------------------------------------------------===//
 
+static bool namedTypeToLengthModifierKind(QualType QT,
+                                          LengthModifier::Kind &K) {
+  for (/**/; const auto *TT = QT->getAs<TypedefType>();
+       QT = TT->getDecl()->getUnderlyingType()) {
+    StringRef Name = TT->getDecl()->getIdentifier()->getName();
+    if (Name == "size_t" || Name == "__size_t") {
+      K = LengthModifier::AsSizeT;
+      return true;
+    } else if (Name == "__signed_size_t" ||
+               Name == "ssize_t" /*Not C99, but common in Unix.*/) {
+      K = LengthModifier::AsSizeT;
+      return true;
+    } else if (Name == "ptrdiff_t" || Name == "__ptrdiff_t") {
+      K = LengthModifier::AsPtrDiff;
+      return true;
+    } else if (Name == "intmax_t") {
+      K = LengthModifier::AsIntMax;
+      return true;
+    } else if (Name == "uintmax_t") {
+      K = LengthModifier::AsIntMax;
+      return true;
+    }
+  }
+  return false;
+}
+
+// Check whether T and E are compatible size_t/ptrdiff_t typedefs. E must be
+// consistent with LE.
+// T is the type of the actual expression in the code to be checked, and E is
+// the expected type parsed from the format string.
+static clang::analyze_format_string::ArgType::MatchKind
+matchesSizeTPtrdiffT(ASTContext &C, QualType T, QualType E,
+                     LengthModifier::Kind LE) {
+  using Kind = LengthModifier::Kind;
+  using MatchKind = clang::analyze_format_string::ArgType::MatchKind;
+  assert(LE == Kind::AsPtrDiff || LE == Kind::AsSizeT);
+
+  if (!T->isIntegerType())
+    return MatchKind::NoMatch;
+
+  if (C.getCorrespondingSignedType(T.getCanonicalType()) !=
+      C.getCorrespondingSignedType(E.getCanonicalType()))
+    return MatchKind::NoMatch;
+
+  // signed size_t and unsigned ptrdiff_t do not have typedefs in C and C++.
+  if (LE == Kind::AsSizeT && E->isSignedIntegerType())
+    return T->isSignedIntegerType() ? MatchKind::Match
+                                    : MatchKind::NoMatchSignedness;
+
+  if (LE == LengthModifier::Kind::AsPtrDiff && E->isUnsignedIntegerType())
+    return T->isUnsignedIntegerType() ? MatchKind::Match
+                                      : MatchKind::NoMatchSignedness;
+
+  if (Kind Actual = Kind::None; namedTypeToLengthModifierKind(T, Actual)) {
+    if (Actual == LE)
+      return MatchKind::Match;
+    else if (Actual == Kind::AsPtrDiff || Actual == Kind::AsSizeT)
+      return MatchKind::NoMatchSignedness;
+  }
+
+  return MatchKind::NoMatch;
+}
+
 clang::analyze_format_string::ArgType::MatchKind
 ArgType::matchesType(ASTContext &C, QualType argTy) const {
   // When using the format attribute in C++, you can receive a function or an
@@ -394,6 +458,13 @@ ArgType::matchesType(ASTContext &C, QualType argTy) const {
     }
 
     case SpecificTy: {
+      if (TK != TypeKind::DontCare) {
+        return matchesSizeTPtrdiffT(C, argTy, T,
+                                    TK == TypeKind::SizeT
+                                        ? LengthModifier::Kind::AsSizeT
+                                        : LengthModifier::AsPtrDiff);
+      }
+
       if (const EnumType *ETy = argTy->getAs<EnumType>()) {
         // If the enum is incomplete we know nothing about the underlying type.
         // Assume that it's 'int'. Do not use the underlying type for a scoped
@@ -653,6 +724,18 @@ ArgType::matchesArgType(ASTContext &C, const ArgType &Other) const {
 
   if (Left.K == AK::SpecificTy) {
     if (Right.K == AK::SpecificTy) {
+      if (Left.TK != TypeKind::DontCare) {
+        return matchesSizeTPtrdiffT(C, Right.T, Left.T,
+                                    Left.TK == TypeKind::SizeT
+                                        ? LengthModifier::Kind::AsSizeT
+                                        : LengthModifier::AsPtrDiff);
+      } else if (Right.TK != TypeKind::DontCare) {
+        return matchesSizeTPtrdiffT(C, Left.T, Right.T,
+                                    Right.TK == TypeKind::SizeT
+                                        ? LengthModifier::Kind::AsSizeT
+                                        : LengthModifier::AsPtrDiff);
+      }
+
       auto Canon1 = C.getCanonicalType(Left.T);
       auto Canon2 = C.getCanonicalType(Right.T);
       if (Canon1 == Canon2)
@@ -1200,27 +1283,10 @@ FormatSpecifier::getCorrectedLengthModifier() const {
 
 bool FormatSpecifier::namedTypeToLengthModifier(QualType QT,
                                                 LengthModifier &LM) {
-  for (/**/; const auto *TT = QT->getAs<TypedefType>();
-       QT = TT->getDecl()->getUnderlyingType()) {
-    const TypedefNameDecl *Typedef = TT->getDecl();
-    const IdentifierInfo *Identifier = Typedef->getIdentifier();
-    if (Identifier->getName() == "size_t") {
-      LM.setKind(LengthModifier::AsSizeT);
-      return true;
-    } else if (Identifier->getName() == "ssize_t") {
-      // Not C99, but common in Unix.
-      LM.setKind(LengthModifier::AsSizeT);
-      return true;
-    } else if (Identifier->getName() == "intmax_t") {
-      LM.setKind(LengthModifier::AsIntMax);
-      return true;
-    } else if (Identifier->getName() == "uintmax_t") {
-      LM.setKind(LengthModifier::AsIntMax);
-      return true;
-    } else if (Identifier->getName() == "ptrdiff_t") {
-      LM.setKind(LengthModifier::AsPtrDiff);
-      return true;
-    }
+  if (LengthModifier::Kind Out = LengthModifier::Kind::None;
+      namedTypeToLengthModifierKind(QT, Out)) {
+    LM.setKind(Out);
+    return true;
   }
   return false;
 }
diff --git a/clang/lib/AST/PrintfFormatString.cpp b/clang/lib/AST/PrintfFormatString.cpp
index 293164ddac8f8..397a1d4c1172f 100644
--- a/clang/lib/AST/PrintfFormatString.cpp
+++ b/clang/lib/AST/PrintfFormatString.cpp
@@ -543,7 +543,8 @@ ArgType PrintfSpecifier::getScalarArgType(ASTContext &Ctx,
       case LengthModifier::AsIntMax:
         return ArgType(Ctx.getIntMaxType(), "intmax_t");
       case LengthModifier::AsSizeT:
-        return ArgType::makeSizeT(ArgType(Ctx.getSignedSizeType(), "ssize_t"));
+        return ArgType::makeSizeT(
+            ArgType(Ctx.getSignedSizeType(), "signed size_t"));
       case LengthModifier::AsInt3264:
         return Ctx.getTargetInfo().getTriple().isArch64Bit()
                    ? ArgType(Ctx.LongLongTy, "__int64")
@@ -626,9 +627,11 @@ ArgType PrintfSpecifier::getScalarArgType(ASTContext &Ctx,
       case LengthModifier::AsIntMax:
         return ArgType::PtrTo(ArgType(Ctx.getIntMaxType(), "intmax_t"));
       case LengthModifier::AsSizeT:
-        return ArgType::PtrTo(ArgType(Ctx.getSignedSizeType(), "ssize_t"));
+        return ArgType::PtrTo(ArgType::makeSizeT(
+            ArgType(Ctx.getSignedSizeType(), "signed size_t")));
       case LengthModifier::AsPtrDiff:
-        return ArgType::PtrTo(ArgType(Ctx.getPointerDiffType(), "ptrdiff_t"));
+        return ArgType::PtrTo(ArgType::makePtrdiffT(
+            ArgType(Ctx.getPointerDiffType(), "ptrdiff_t")));
       case LengthModifier::AsLongDouble:
         return ArgType(); // FIXME: Is this a known extension?
       case LengthModifier::AsAllocate:
diff --git a/clang/lib/AST/ScanfFormatString.cpp b/clang/lib/AST/ScanfFormatString.cpp
index 7ee21c8c61954..e3926185860db 100644
--- a/clang/lib/AST/ScanfFormatString.cpp
+++ b/clang/lib/AST/ScanfFormatString.cpp
@@ -251,9 +251,11 @@ ArgType ScanfSpecifier::getArgType(ASTContext &Ctx) const {
         case LengthModifier::AsIntMax:
           return ArgType::PtrTo(ArgType(Ctx.getIntMaxType(), "intmax_t"));
         case LengthModifier::AsSizeT:
-          return ArgType::PtrTo(ArgType(Ctx.getSignedSizeType(), "ssize_t"));
+          return ArgType::PtrTo(ArgType::makeSizeT(
+              ArgType(Ctx.getSignedSizeType(), "signed size_t")));
         case LengthModifier::AsPtrDiff:
-          return ArgType::PtrTo(ArgType(Ctx.getPointerDiffType(), "ptrdiff_t"));
+          return ArgType::PtrTo(ArgType::makePtrdiffT(
+              ArgType(Ctx.getPointerDiffType(), "ptrdiff_t")));
         case LengthModifier::AsLongDouble:
           // GNU extension.
           return ArgType::PtrTo(Ctx.LongLongTy);
@@ -292,10 +294,11 @@ ArgType ScanfSpecifier::getArgType(ASTContext &Ctx) const {
         case LengthModifier::AsIntMax:
           return ArgType::PtrTo(ArgType(Ctx.getUIntMaxType(), "uintmax_t"));
         case LengthModifier::AsSizeT:
-          return ArgType::PtrTo(ArgType(Ctx.getSizeType(), "size_t"));
-        case LengthModifier::AsPtrDiff:
           return ArgType::PtrTo(
-              ArgType(Ctx.getUnsignedPointerDiffType(), "unsigned ptrdiff_t"));
+              ArgType::makeSizeT(ArgType(Ctx.getSizeType(), "size_t")));
+        case LengthModifier::AsPtrDiff:
+          return ArgType::PtrTo(ArgType::makePtrdiffT(
+              ArgType(Ctx.getUnsignedPointerDiffType(), "unsigned ptrdiff_t")));
         case LengthModifier::AsLongDouble:
           // GNU extension.
           return ArgType::PtrTo(Ctx.UnsignedLongLongTy);
@@ -390,9 +393,11 @@ ArgType ScanfSpecifier::getArgType(ASTContext &Ctx) const {
         case LengthModifier::AsIntMax:
           return ArgType::PtrTo(ArgType(Ctx.getIntMaxType(), "intmax_t"));
         case LengthModifier::AsSizeT:
-          return ArgType::PtrTo(ArgType(Ctx.getSignedSizeType(), "ssize_t"));
+          return ArgType::PtrTo(ArgType::makeSizeT(
+              ArgType(Ctx.getSignedSizeType(), "signed size_t")));
         case LengthModifier::AsPtrDiff:
-          return ArgType::PtrTo(ArgType(Ctx.getPointerDiffType(), "ptrdiff_t"));
+          return ArgType::PtrTo(ArgType::makePtrdiffT(
+              ArgType(Ctx.getPointerDiffType(), "ptrdiff_t")));
         case LengthModifier::AsLongDouble:
           return ArgType(); // FIXME: Is this a known extension?
         case LengthModifier::AsAllocate:
diff --git a/clang/lib/CodeGen/CGCall.cpp b/clang/lib/CodeGen/CGCall.cpp
index 46a5d64412275..3ff2597d65e54 100644
--- a/clang/lib/CodeGen/CGCall.cpp
+++ b/clang/lib/CodeGen/CGCall.cpp
@@ -223,7 +223,8 @@ static void appendParameterTypes(
   for (unsigned I = 0, E = FPT->getNumParams(); I != E; ++I) {
     prefix.push_back(FPT->getParamType(I));
     if (ExtInfos[I].hasPassObjectSize())
-      prefix.push_back(CGT.getContext().getSizeType());
+      prefix.push_back(
+          CGT.getContext().getSizeType()->getCanonicalTypeUnqualified());
   }
 
   addExtParameterInfosForCall(paramInfos, FPT.getTypePtr(), PrefixSize,
diff --git a/clang/lib/CodeGen/CGCoroutine.cpp b/clang/lib/CodeGen/CGCoroutine.cpp
index 0fc488e98aaf0..265dedf228e69 100644
--- a/clang/lib/CodeGen/CGCoroutine.cpp
+++ b/clang/lib/CodeGen/CGCoroutine.cpp
@@ -1002,14 +1002,14 @@ RValue CodeGenFunction::EmitCoroutineIntrinsic(const CallExpr *E,
   }
   case llvm::Intrinsic::coro_size: {
     auto &Context = getContext();
-    CanQualType SizeTy = Context.getSizeType();
+    CanQualType SizeTy = Context.getSizeType()->getCanonicalTypeUnqualified();
     llvm::IntegerType *T = Builder.getIntNTy(Context.getTypeSize(SizeTy));
     llvm::Function *F = CGM.getIntrinsic(llvm::Intrinsic::coro_size, T);
     return RValue::get(Builder.CreateCall(F));
   }
   case llvm::Intrinsic::coro_align: {
     auto &Context = getContext();
-    CanQualType SizeTy = Context.getSizeType();
+    CanQualType SizeTy = Context.getSizeType()->getCanonicalTypeUnqualified();
     llvm::IntegerType *T = Builder.getIntNTy(Context.getTypeSize(SizeTy));
     llvm::Function *F = CGM.getIntrinsic(llvm::Intrinsic::coro_align, T);
     return RValue::get(Builder.CreateCall(F));
diff --git a/clang/lib/CodeGen/CGObjCMac.cpp b/clang/lib/CodeGen/CGObjCMac.cpp
index 1c23a8b4db918..5a0d2a2286bac 100644
--- a/clang/lib/CodeGen/CGObjCMac.cpp
+++ b/clang/lib/CodeGen/CGObjCMac.cpp
@@ -285,7 +285,7 @@ class ObjCCommonTypesHelper {
     SmallVector<CanQualType, 5> Params;
     Params.push_back(Ctx.VoidPtrTy);
     Params.push_back(Ctx.VoidPtrTy);
-    Params.push_back(Ctx.getSizeType());
+    Params.push_back(Ctx.getSizeType()->getCanonicalTypeUnqualified());
     Params.push_back(Ctx.BoolTy);
     Params.push_back(Ctx.BoolTy);
     llvm::FunctionType *FTy = Types.GetFunctionType(
diff --git a/clang/lib/Sema/SemaChecking.cpp b/clang/lib/Sema/SemaChecking.cpp
index 8f8e1ceb7197e..9a0d824a26ae6 100644
--- a/clang/lib/Sema/SemaChecking.cpp
+++ b/clang/lib/Sema/SemaChecking.cpp
@@ -5131,7 +5131,7 @@ bool Sema::BuiltinVAStartARMMicrosoft(CallExpr *Call) {
         << 3                                      /* parameter mismatch */
         << 2 << Arg1->getType() << ConstCharPtrTy;
 
-  const QualType SizeTy = Context.getSizeType();
+  const QualType SizeTy = Context.getSizeTyp...
[truncated]
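The size_t/ptrdiff_t matching rules added to FormatString.cpp in the diff above can be summarized with a small, self-contained model. This is a hypothetical sketch, not clang code: `ModelType`, `LengthKind`, `namedTypeToLengthKind` and `matchSizeOrPtrdiff` are invented names, a type is reduced to a chain of typedef names plus a canonical integer (identified by a rank with signedness stripped), and non-integer arguments are not modeled. It follows the decision order of `matchesSizeTPtrdiffT` as shown in the (truncated) patch: compare the canonical types with signedness stripped, accept any argument of the right signedness for the typedef-less cases (%zd, %tu), and otherwise require the argument to be spelled through a recognized typedef.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical model types; none of these names exist in clang.
enum class LengthKind { None, AsSizeT, AsPtrDiff, AsIntMax };
enum class MatchKind { NoMatch, NoMatchSignedness, Match };

struct IntType {
  int Rank;      // identifies the canonical integer type, signedness stripped
  bool IsSigned; // signedness of the canonical type
};

struct ModelType {
  std::vector<std::string> TypedefChain; // outermost typedef name first
  IntType Canonical;                     // what the chain ultimately names
};

// Walk the typedef chain from the outside in and stop at the first
// recognized name, like namedTypeToLengthModifierKind in the diff.
LengthKind namedTypeToLengthKind(const ModelType &T) {
  for (const std::string &Name : T.TypedefChain) {
    if (Name == "size_t" || Name == "__size_t" || Name == "__signed_size_t" ||
        Name == "ssize_t")
      return LengthKind::AsSizeT;
    if (Name == "ptrdiff_t" || Name == "__ptrdiff_t")
      return LengthKind::AsPtrDiff;
    if (Name == "intmax_t" || Name == "uintmax_t")
      return LengthKind::AsIntMax;
  }
  return LengthKind::None;
}

// Actual is the argument type; Expected comes from the format specifier.
MatchKind matchSizeOrPtrdiff(const ModelType &Actual, const ModelType &Expected,
                             LengthKind LE) {
  assert(LE == LengthKind::AsSizeT || LE == LengthKind::AsPtrDiff);
  // Different canonical types (ignoring signedness) never match.
  if (Actual.Canonical.Rank != Expected.Canonical.Rank)
    return MatchKind::NoMatch;
  // %zd: "signed size_t" has no standard typedef, so only check signedness.
  if (LE == LengthKind::AsSizeT && Expected.Canonical.IsSigned)
    return Actual.Canonical.IsSigned ? MatchKind::Match
                                     : MatchKind::NoMatchSignedness;
  // %tu: "unsigned ptrdiff_t" has no standard typedef either.
  if (LE == LengthKind::AsPtrDiff && !Expected.Canonical.IsSigned)
    return !Actual.Canonical.IsSigned ? MatchKind::Match
                                      : MatchKind::NoMatchSignedness;
  // Otherwise the argument must be spelled through a matching typedef.
  LengthKind ActualKind = namedTypeToLengthKind(Actual);
  if (ActualKind == LE)
    return MatchKind::Match;
  if (ActualKind == LengthKind::AsSizeT || ActualKind == LengthKind::AsPtrDiff)
    return MatchKind::NoMatchSignedness;
  return MatchKind::NoMatch;
}
```

In this model, a `sizeof` result (sugared as `__size_t`) and a user's `size_t` both match `%zu`. A bare `unsigned long` of the same rank yields `NoMatch`, and a `__ptrdiff_t` passed to `%zu` yields `NoMatchSignedness`.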

github-actions bot commented Jun 12, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

@YexuanXiao YexuanXiao changed the title [Clang] Make the result type of sizeof/pointer subtraction/size_t literals be typedefs instead of built-in types [Clang] Make the result type of sizeof/pointer subtraction/size_t literals be a sugar types instead of built-in types Jun 14, 2025
@YexuanXiao YexuanXiao changed the title [Clang] Make the result type of sizeof/pointer subtraction/size_t literals be a sugar types instead of built-in types [Clang] Make the result type of sizeof/pointer subtraction/size_t literals be sugar types instead of built-in types Jun 14, 2025
@YexuanXiao YexuanXiao changed the title [Clang] Make the result type of sizeof/pointer subtraction/size_t literals be sugar types instead of built-in types [Clang] Make the SizeType, SignedSizeType and PtrdiffType be sugar types instead of built-in types Jun 14, 2025
@YexuanXiao YexuanXiao changed the title [Clang] Make the SizeType, SignedSizeType and PtrdiffType be sugar types instead of built-in types [Clang] Make the SizeType, SignedSizeType and PtrdiffType be named sugar types instead of built-in types Jun 14, 2025
Contributor

@mizvekov mizvekov left a comment

Thanks for this!

I have left a small review, but since I am traveling to the WG21 meeting, I can't look much into it for the next couple of weeks.

Also, please try this on the llvm compile time tracker, and take a look at any changes to the number of AST nodes created when compiling some complex test case (there is a compiler option to print this, but I can't remember the spelling right now).

We want to make sure this doesn't regress performance.

@YexuanXiao YexuanXiao requested review from mizvekov and Endilll June 14, 2025 10:49
@YexuanXiao
Author

YexuanXiao commented Jun 18, 2025

CI shows that it passed all tests on Linux, but 4 tests failed on Windows; these failures seem unrelated to this PR.

Contributor

@mizvekov mizvekov left a comment

Thanks, this looks good!

It would be awesome to use these in existing cases which we handle with a builtin typedef, as this would be helpful to validate the design, but this is not required and doesn't need to be part of this PR.

/// The "size_t" type.
SizeT,

/// The "signed size_t" type.
Contributor

Suggested change
-/// The "signed size_t" type.
+/// The "ssize_t" type.

Author

@frederick-vs-ja mentioned in the comment above that POSIX does not require the typedef ssize_t to be the signed version of size_t. Likewise, the C and C++ standards do not require the typedef ptrdiff_t to be the signed version of size_t. I am not sure whether any platform actually has such an inconsistency, but perhaps the two should not be conflated?

Contributor

Well that's another good reason, but I was more worried about the inconsistency, and the fact that this is not even a valid type spelling: 'signed' is a specifier, not a qualifier, and it can't be attached to a typedef name.
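To make the reviewer's point concrete: `signed size_t` is not a valid type, because `signed` is a type specifier that only combines with the built-in integer keywords (`char`, `short`, `int`, `long`) and cannot be applied to a typedef name. A minimal C++17 sketch of a valid way to name the signed counterpart of `size_t`; the alias name `SignedSizeT` is illustrative only:

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// `signed size_t` is ill-formed: `signed` cannot modify a typedef name.
// The standard library spelling of the signed counterpart is make_signed_t.
using SignedSizeT = std::make_signed_t<std::size_t>;

static_assert(std::is_signed_v<SignedSizeT>, "signed counterpart is signed");
static_assert(sizeof(SignedSizeT) == sizeof(std::size_t),
              "same width as size_t");

// Whether ptrdiff_t is this same type is target-dependent (the standards do
// not require it), so it is queried here rather than asserted.
constexpr bool PtrdiffIsSignedSizeT =
    std::is_same_v<std::ptrdiff_t, SignedSizeT>;
```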


Kind getKind() const { return Kind(PredefinedSugarTypeBits.Kind); }

StringRef getName() const;
Contributor

It might be worthwhile to consider returning an IdentifierInfo. These are uniqued, which makes it cheaper to compare them, and it increases commonality with other code that deals with these identifiers.
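The benefit of returning a uniqued identifier can be sketched generically: if every spelling maps to exactly one object, equality becomes a single pointer comparison instead of a character-by-character string comparison. `Ident` and `IdentTable` below are hypothetical stand-ins for illustration, not clang's `IdentifierInfo`/`IdentifierTable` API:

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <string_view>
#include <unordered_map>

// Hypothetical uniqued identifier; NOT clang's IdentifierInfo.
struct Ident {
  std::string Name;
};

class IdentTable {
  // Each spelling owns exactly one heap-allocated Ident, so addresses are
  // stable even when the map rehashes.
  std::unordered_map<std::string, std::unique_ptr<Ident>> Table;

public:
  // Returns the unique Ident for a spelling, creating it on first use.
  const Ident *get(std::string_view Name) {
    std::unique_ptr<Ident> &Slot = Table[std::string(Name)];
    if (!Slot)
      Slot = std::make_unique<Ident>(Ident{std::string(Name)});
    return Slot.get();
  }
};
```

Two lookups of `"__size_t"` return the same pointer, so callers can compare `const Ident *` values directly.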

Comment on lines +8069 to +8073
PredefinedSugarType(Kind KD, QualType UnderlyingType)
    : Type(PredefinedSugar, UnderlyingType->getCanonicalTypeInternal(),
           TypeDependence::None) {
  PredefinedSugarTypeBits.Kind = llvm::to_underlying(KD);
}
Contributor

Suggested change
-PredefinedSugarType(Kind KD, QualType UnderlyingType)
-    : Type(PredefinedSugar, UnderlyingType->getCanonicalTypeInternal(),
-           TypeDependence::None) {
-  PredefinedSugarTypeBits.Kind = llvm::to_underlying(KD);
-}
+PredefinedSugarType(Kind KD, QualType CanonicalType)
+    : Type(PredefinedSugar, CanonicalType,
+           TypeDependence::None) {
+  PredefinedSugarTypeBits.Kind = llvm::to_underlying(KD);
+}

Comment on lines +5242 to +5254
auto getUnderlyingType = [](const ASTContext &Ctx, Kind KDI) -> QualType {
  switch (KDI) {
  case Kind::SizeT:
    return Ctx.getFromTargetType(Ctx.Target->getSizeType());
  case Kind::SignedSizeT:
    return Ctx.getFromTargetType(Ctx.Target->getSignedSizeType());
  case Kind::PtrdiffT:
    return Ctx.getFromTargetType(Ctx.Target->getPtrDiffType(LangAS::Default));
  }
  llvm_unreachable("unexpected kind");
};
auto *New = new (*this, alignof(PredefinedSugarType)) PredefinedSugarType(
    static_cast<Kind>(KD), getUnderlyingType(*this, static_cast<Kind>(KD)));
Contributor

These should already be canonical types, right?

Suggested change
-auto getUnderlyingType = [](const ASTContext &Ctx, Kind KDI) -> QualType {
-  switch (KDI) {
-  case Kind::SizeT:
-    return Ctx.getFromTargetType(Ctx.Target->getSizeType());
-  case Kind::SignedSizeT:
-    return Ctx.getFromTargetType(Ctx.Target->getSignedSizeType());
-  case Kind::PtrdiffT:
-    return Ctx.getFromTargetType(Ctx.Target->getPtrDiffType(LangAS::Default));
-  }
-  llvm_unreachable("unexpected kind");
-};
-auto *New = new (*this, alignof(PredefinedSugarType)) PredefinedSugarType(
-    static_cast<Kind>(KD), getUnderlyingType(*this, static_cast<Kind>(KD)));
+auto getCanonicalType = [](const ASTContext &Ctx, Kind KDI) -> QualType {
+  switch (KDI) {
+  case Kind::SizeT:
+    return Ctx.getFromTargetType(Ctx.Target->getSizeType());
+  case Kind::SignedSizeT:
+    return Ctx.getFromTargetType(Ctx.Target->getSignedSizeType());
+  case Kind::PtrdiffT:
+    return Ctx.getFromTargetType(Ctx.Target->getPtrDiffType(LangAS::Default));
+  }
+  llvm_unreachable("unexpected kind");
+};
+auto *New = new (*this, alignof(PredefinedSugarType)) PredefinedSugarType(
+    static_cast<Kind>(KD), getCanonicalType(*this, static_cast<Kind>(KD)));

@@ -1567,6 +1567,8 @@ class ASTContext : public RefCountedBase<ASTContext> {
/// and bit count.
QualType getDependentBitIntType(bool Unsigned, Expr *BitsExpr) const;

QualType getPredefinedSugarType(uint32_t KD) const;
Contributor

It would be more helpful and less error prone to use the enum type here.

Author

This function is used by TypeProperties.td, which may not support enums, as with the other definitions in the same file. I recall trying to use an enum here before, without success.

Contributor

You can use enums in TypeProperties, if you simply convert between it and its underlying type when serializing.
It's also possible to add this enum type to tablegen, but this seems unnecessary, since there is a single use of it.
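A minimal sketch of the suggested approach, assuming a hypothetical `PredefinedSugarKind` enum with a `std::uint32_t` underlying type: the public API takes the enum, so callers cannot pass an arbitrary integer, and only the serialization boundary converts to and from the raw value (validating it on the way back in). The function names here are illustrative and are not the TypeProperties.td interface:

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>
#include <type_traits>

// Hypothetical enum mirroring the kinds discussed in this review.
enum class PredefinedSugarKind : std::uint32_t { SizeT, SignedSizeT, PtrdiffT };
constexpr PredefinedSugarKind MaxKind = PredefinedSugarKind::PtrdiffT;

// The API takes the enum type, not a raw integer.
const char *getPredefinedSugarName(PredefinedSugarKind K) {
  switch (K) {
  case PredefinedSugarKind::SizeT:
    return "__size_t";
  case PredefinedSugarKind::SignedSizeT:
    return "__signed_size_t";
  case PredefinedSugarKind::PtrdiffT:
    return "__ptrdiff_t";
  }
  return nullptr; // not reached for valid enumerators
}

// Serialization boundary: enum -> underlying integer.
std::uint32_t serializeKind(PredefinedSugarKind K) {
  return static_cast<std::underlying_type_t<PredefinedSugarKind>>(K);
}

// Deserialization boundary: integer -> enum, rejecting out-of-range values.
std::optional<PredefinedSugarKind> deserializeKind(std::uint32_t Raw) {
  if (Raw > serializeKind(MaxKind))
    return std::nullopt;
  return static_cast<PredefinedSugarKind>(Raw);
}
```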

Comment on lines +1488 to +1490
else
  assert(TP1->getCanonicalTypeInternal() ==
         TP2->getCanonicalTypeInternal());
Contributor

Suggested change
-else
-  assert(TP1->getCanonicalTypeInternal() ==
-         TP2->getCanonicalTypeInternal());

We can't directly compare pointers like that here, as this structural equivalence library is used to compare nodes from different ASTContexts, and the pointers being different doesn't mean anything in that case.
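Why pointer identity is only meaningful within a single context can be shown with a toy model. Each `MiniContext` (a hypothetical stand-in for an `ASTContext`, not clang code) uniques its own builtin types, so the same type has one address per context. A comparison across contexts, which is the situation the structural equivalence library is in, therefore has to compare structure, here just the kind:

```cpp
#include <cassert>
#include <deque>

// Hypothetical miniature type system; not the clang API.
enum class BuiltinKind { UnsignedLong, Long };

struct BuiltinType {
  BuiltinKind Kind;
};

class MiniContext {
  // std::deque keeps element addresses stable across push_back.
  std::deque<BuiltinType> Storage;

public:
  // Uniques types per context: same kind, same pointer (in this context).
  const BuiltinType *get(BuiltinKind K) {
    for (const BuiltinType &T : Storage)
      if (T.Kind == K)
        return &T;
    Storage.push_back(BuiltinType{K});
    return &Storage.back();
  }
};

// Structural comparison: valid even when A and B come from different contexts.
bool structurallyEquivalent(const BuiltinType *A, const BuiltinType *B) {
  return A->Kind == B->Kind;
}
```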

return "__ptrdiff_t";
}
llvm_unreachable("unexpected kind");
}
Contributor

Missing newline at EOF.

Comment on lines +7253 to +7267
template <typename Derived>
QualType TreeTransform<Derived>::TransformPredefinedSugarType(
    TypeLocBuilder &TLB, PredefinedSugarTypeLoc TL) {
  const PredefinedSugarType *EIT = TL.getTypePtr();
  QualType Result = TL.getType();

  if (getDerived().AlwaysRebuild()) {
    Result = getDerived().RebuildPredefinedSugarType(
        llvm::to_underlying(EIT->getKind()));
  }

  PredefinedSugarTypeLoc NewTL = TLB.push<PredefinedSugarTypeLoc>(Result);
  NewTL.setNameLoc(TL.getNameLoc());
  return Result;
}
Contributor

I don't think anyone would object if this were simply unreachable; I can't think of a use case for transforming these.
This seems untested either way.

Author

I don't understand its purpose, but some code using TypeNodes.td requires it. The current logic is modeled after TypedefType.

Comment on lines +2770 to +2772
class PredefinedSugarTypeLoc final
    : public InheritingConcreteTypeLoc<TypeSpecTypeLoc, PredefinedSugarTypeLoc,
                                       PredefinedSugarType> {};
Contributor

@mizvekov mizvekov Jun 27, 2025

These should ideally never be used in practice, as a PredefinedSugarType is never written in source code.
I wonder how hard it is to avoid it. Where is this coming up?

Author

Labels
clang:codegen IR generation bugs: mangling, exceptions, etc. clang:frontend Language frontend issues, e.g. anything involving "Sema" clang:openmp OpenMP related changes to Clang clang:static analyzer clang Clang issues not falling into any other category coroutines C++20 coroutines
5 participants