-
Notifications
You must be signed in to change notification settings - Fork 14.3k
[Clang][PowerPC] Add __dmr1024 type and DMF integer calculation builtins #142480
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Define the __dmr type used to manipulate the new DMR registers introduced by the Dense Math Facility (DMF) on PowerPC, and add six Clang builtins that correspond to the integer outer-product accumulate to ACC instructions: __builtin_mma_dmxvi8gerx4, __builtin_mma_pmdmxvi8gerx4, __builtin_mma_dmxvi8gerx4pp, __builtin_mma_pmdmxvi8gerx4pp, __builtin_mma_dmxvi8gerx4spp, and __builtin_mma_pmdmxvi8gerx4spp.
@llvm/pr-subscribers-lldb @llvm/pr-subscribers-clang Author: Maryam Moghadas (maryammo) ChangesDefine the __dmr type used to manipulate the new DMR registers introduced by the Dense Math Facility (DMF) on PowerPC, and add six Clang builtins that correspond to the integer outer-product accumulate to ACC instructions: __builtin_mma_dmxvi8gerx4, __builtin_mma_pmdmxvi8gerx4, __builtin_mma_dmxvi8gerx4pp, __builtin_mma_pmdmxvi8gerx4pp, __builtin_mma_dmxvi8gerx4spp, and __builtin_mma_pmdmxvi8gerx4spp. Patch is 26.81 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/142480.diff 11 Files Affected:
diff --git a/clang/include/clang/Basic/BuiltinsPPC.def b/clang/include/clang/Basic/BuiltinsPPC.def
index bb7d54bbb793e..099500754a0e0 100644
--- a/clang/include/clang/Basic/BuiltinsPPC.def
+++ b/clang/include/clang/Basic/BuiltinsPPC.def
@@ -1134,6 +1134,18 @@ UNALIASED_CUSTOM_BUILTIN(mma_pmxvbf16ger2np, "vW512*VVi15i15i3", true,
"mma,paired-vector-memops")
UNALIASED_CUSTOM_BUILTIN(mma_pmxvbf16ger2nn, "vW512*VVi15i15i3", true,
"mma,paired-vector-memops")
+UNALIASED_CUSTOM_BUILTIN(mma_dmxvi8gerx4, "vW1024*W256V", false,
+ "mma,paired-vector-memops")
+UNALIASED_CUSTOM_BUILTIN(mma_pmdmxvi8gerx4, "vW1024*W256Vi255i15i15", false,
+ "mma,paired-vector-memops")
+UNALIASED_CUSTOM_BUILTIN(mma_dmxvi8gerx4pp, "vW1024*W256V", true,
+ "mma,paired-vector-memops")
+UNALIASED_CUSTOM_BUILTIN(mma_pmdmxvi8gerx4pp, "vW1024*W256Vi255i15i15", true,
+ "mma,paired-vector-memops")
+UNALIASED_CUSTOM_BUILTIN(mma_dmxvi8gerx4spp, "vW1024*W256V", true,
+ "mma,paired-vector-memops")
+UNALIASED_CUSTOM_BUILTIN(mma_pmdmxvi8gerx4spp, "vW1024*W256Vi255i15i15", true,
+ "mma,paired-vector-memops")
// FIXME: Obviously incomplete.
diff --git a/clang/include/clang/Basic/PPCTypes.def b/clang/include/clang/Basic/PPCTypes.def
index 9e2cb2aedc9fc..cfc9de3a473d4 100644
--- a/clang/include/clang/Basic/PPCTypes.def
+++ b/clang/include/clang/Basic/PPCTypes.def
@@ -30,6 +30,7 @@
#endif
+PPC_VECTOR_MMA_TYPE(__dmr, VectorDmr, 1024)
PPC_VECTOR_MMA_TYPE(__vector_quad, VectorQuad, 512)
PPC_VECTOR_VSX_TYPE(__vector_pair, VectorPair, 256)
diff --git a/clang/lib/AST/ASTContext.cpp b/clang/lib/AST/ASTContext.cpp
index 45f9602856840..ffb4ca61b00c4 100644
--- a/clang/lib/AST/ASTContext.cpp
+++ b/clang/lib/AST/ASTContext.cpp
@@ -3455,6 +3455,7 @@ static void encodeTypeForFunctionPointerAuth(const ASTContext &Ctx,
case BuiltinType::BFloat16:
case BuiltinType::VectorQuad:
case BuiltinType::VectorPair:
+ case BuiltinType::VectorDmr:
OS << "?";
return;
diff --git a/clang/test/AST/ast-dump-ppc-types.c b/clang/test/AST/ast-dump-ppc-types.c
index 26ae5441f20d7..a430268284413 100644
--- a/clang/test/AST/ast-dump-ppc-types.c
+++ b/clang/test/AST/ast-dump-ppc-types.c
@@ -1,9 +1,11 @@
+// RUN: %clang_cc1 -triple powerpc64le-unknown-unknown -target-cpu future \
+// RUN: -ast-dump %s | FileCheck %s
// RUN: %clang_cc1 -triple powerpc64le-unknown-unknown -target-cpu pwr10 \
-// RUN: -ast-dump -ast-dump-filter __vector %s | FileCheck %s
+// RUN: -ast-dump %s | FileCheck %s
// RUN: %clang_cc1 -triple powerpc64le-unknown-unknown -target-cpu pwr9 \
-// RUN: -ast-dump -ast-dump-filter __vector %s | FileCheck %s
+// RUN: -ast-dump %s | FileCheck %s
// RUN: %clang_cc1 -triple powerpc64le-unknown-unknown -target-cpu pwr8 \
-// RUN: -ast-dump -ast-dump-filter __vector %s | FileCheck %s
+// RUN: -ast-dump %s | FileCheck %s
// RUN: %clang_cc1 -triple x86_64-unknown-unknown -ast-dump %s | FileCheck %s \
// RUN: --check-prefix=CHECK-X86_64
// RUN: %clang_cc1 -triple arm-unknown-unknown -ast-dump %s | FileCheck %s \
@@ -15,16 +17,21 @@
// are correctly defined. We also added checks on a couple of other targets to
// ensure the types are target-dependent.
+// CHECK: TypedefDecl {{.*}} implicit __dmr '__dmr'
+// CHECK: `-BuiltinType {{.*}} '__dmr'
// CHECK: TypedefDecl {{.*}} implicit __vector_quad '__vector_quad'
// CHECK-NEXT: -BuiltinType {{.*}} '__vector_quad'
// CHECK: TypedefDecl {{.*}} implicit __vector_pair '__vector_pair'
// CHECK-NEXT: -BuiltinType {{.*}} '__vector_pair'
+// CHECK-X86_64-NOT: __dmr
// CHECK-X86_64-NOT: __vector_quad
// CHECK-X86_64-NOT: __vector_pair
+// CHECK-ARM-NOT: __dmr
// CHECK-ARM-NOT: __vector_quad
// CHECK-ARM-NOT: __vector_pair
+// CHECK-RISCV64-NOT: __dmr
// CHECK-RISCV64-NOT: __vector_quad
// CHECK-RISCV64-NOT: __vector_pair
diff --git a/clang/test/CodeGen/PowerPC/builtins-ppc-mmaplus.c b/clang/test/CodeGen/PowerPC/builtins-ppc-mmaplus.c
new file mode 100644
index 0000000000000..2c335218ed32a
--- /dev/null
+++ b/clang/test/CodeGen/PowerPC/builtins-ppc-mmaplus.c
@@ -0,0 +1,94 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py
+// RUN: %clang_cc1 -O3 -triple powerpc64le-unknown-unknown -target-cpu future \
+// RUN: -emit-llvm %s -o - | FileCheck %s
+// RUN: %clang_cc1 -O3 -triple powerpc64-ibm-aix -target-cpu future \
+// RUN: -emit-llvm %s -o - | FileCheck %s
+
+
+// CHECK-LABEL: @test_dmxvi8gerx4(
+// CHECK-NEXT: entry:
+// CHECK-NEXT: [[TMP0:%.*]] = load <256 x i1>, ptr [[VPP:%.*]], align 32, !tbaa [[TBAA2:![0-9]+]]
+// CHECK-NEXT: [[TMP1:%.*]] = tail call <1024 x i1> @llvm.ppc.mma.dmxvi8gerx4(<256 x i1> [[TMP0]], <16 x i8> [[VC:%.*]])
+// CHECK-NEXT: store <1024 x i1> [[TMP1]], ptr [[RESP:%.*]], align 128, !tbaa [[TBAA6:![0-9]+]]
+// CHECK-NEXT: ret void
+//
+void test_dmxvi8gerx4(unsigned char *vdmrp, unsigned char *vpp, vector unsigned char vc, unsigned char *resp) {
+ __dmr vdmr = *((__dmr *)vdmrp);
+ __vector_pair vp = *((__vector_pair *)vpp);
+ __builtin_mma_dmxvi8gerx4(&vdmr, vp, vc);
+ *((__dmr *)resp) = vdmr;
+}
+
+// CHECK-LABEL: @test_pmdmxvi8gerx4(
+// CHECK-NEXT: entry:
+// CHECK-NEXT: [[TMP0:%.*]] = load <256 x i1>, ptr [[VPP:%.*]], align 32, !tbaa [[TBAA2]]
+// CHECK-NEXT: [[TMP1:%.*]] = tail call <1024 x i1> @llvm.ppc.mma.pmdmxvi8gerx4(<256 x i1> [[TMP0]], <16 x i8> [[VC:%.*]], i32 0, i32 0, i32 0)
+// CHECK-NEXT: store <1024 x i1> [[TMP1]], ptr [[RESP:%.*]], align 128, !tbaa [[TBAA6]]
+// CHECK-NEXT: ret void
+//
+void test_pmdmxvi8gerx4(unsigned char *vdmrp, unsigned char *vpp, vector unsigned char vc, unsigned char *resp) {
+ __dmr vdmr = *((__dmr *)vdmrp);
+ __vector_pair vp = *((__vector_pair *)vpp);
+ __builtin_mma_pmdmxvi8gerx4(&vdmr, vp, vc, 0, 0, 0);
+ *((__dmr *)resp) = vdmr;
+}
+
+// CHECK-LABEL: @test_dmxvi8gerx4pp(
+// CHECK-NEXT: entry:
+// CHECK-NEXT: [[TMP0:%.*]] = load <1024 x i1>, ptr [[VDMRP:%.*]], align 128, !tbaa [[TBAA6]]
+// CHECK-NEXT: [[TMP1:%.*]] = load <256 x i1>, ptr [[VPP:%.*]], align 32, !tbaa [[TBAA2]]
+// CHECK-NEXT: [[TMP2:%.*]] = tail call <1024 x i1> @llvm.ppc.mma.dmxvi8gerx4pp(<1024 x i1> [[TMP0]], <256 x i1> [[TMP1]], <16 x i8> [[VC:%.*]])
+// CHECK-NEXT: store <1024 x i1> [[TMP2]], ptr [[RESP:%.*]], align 128, !tbaa [[TBAA6]]
+// CHECK-NEXT: ret void
+//
+void test_dmxvi8gerx4pp(unsigned char *vdmrp, unsigned char *vpp, vector unsigned char vc, unsigned char *resp) {
+ __dmr vdmr = *((__dmr *)vdmrp);
+ __vector_pair vp = *((__vector_pair *)vpp);
+ __builtin_mma_dmxvi8gerx4pp(&vdmr, vp, vc);
+ *((__dmr *)resp) = vdmr;
+}
+
+// CHECK-LABEL: @test_pmdmxvi8gerx4pp(
+// CHECK-NEXT: entry:
+// CHECK-NEXT: [[TMP0:%.*]] = load <1024 x i1>, ptr [[VDMRP:%.*]], align 128, !tbaa [[TBAA6]]
+// CHECK-NEXT: [[TMP1:%.*]] = load <256 x i1>, ptr [[VPP:%.*]], align 32, !tbaa [[TBAA2]]
+// CHECK-NEXT: [[TMP2:%.*]] = tail call <1024 x i1> @llvm.ppc.mma.pmdmxvi8gerx4pp(<1024 x i1> [[TMP0]], <256 x i1> [[TMP1]], <16 x i8> [[VC:%.*]], i32 0, i32 0, i32 0)
+// CHECK-NEXT: store <1024 x i1> [[TMP2]], ptr [[RESP:%.*]], align 128, !tbaa [[TBAA6]]
+// CHECK-NEXT: ret void
+//
+void test_pmdmxvi8gerx4pp(unsigned char *vdmrp, unsigned char *vpp, vector unsigned char vc, unsigned char *resp) {
+ __dmr vdmr = *((__dmr *)vdmrp);
+ __vector_pair vp = *((__vector_pair *)vpp);
+ __builtin_mma_pmdmxvi8gerx4pp(&vdmr, vp, vc, 0, 0, 0);
+ *((__dmr *)resp) = vdmr;
+}
+
+// CHECK-LABEL: @test_dmxvi8gerx4spp(
+// CHECK-NEXT: entry:
+// CHECK-NEXT: [[TMP0:%.*]] = load <1024 x i1>, ptr [[VDMRP:%.*]], align 128, !tbaa [[TBAA6]]
+// CHECK-NEXT: [[TMP1:%.*]] = load <256 x i1>, ptr [[VPP:%.*]], align 32, !tbaa [[TBAA2]]
+// CHECK-NEXT: [[TMP2:%.*]] = tail call <1024 x i1> @llvm.ppc.mma.dmxvi8gerx4spp(<1024 x i1> [[TMP0]], <256 x i1> [[TMP1]], <16 x i8> [[VC:%.*]])
+// CHECK-NEXT: store <1024 x i1> [[TMP2]], ptr [[RESP:%.*]], align 128, !tbaa [[TBAA6]]
+// CHECK-NEXT: ret void
+//
+void test_dmxvi8gerx4spp(unsigned char *vdmrp, unsigned char *vpp, vector unsigned char vc, unsigned char *resp) {
+ __dmr vdmr = *((__dmr *)vdmrp);
+ __vector_pair vp = *((__vector_pair *)vpp);
+ __builtin_mma_dmxvi8gerx4spp(&vdmr, vp, vc);
+ *((__dmr *)resp) = vdmr;
+}
+
+// CHECK-LABEL: @test_pmdmxvi8gerx4spp(
+// CHECK-NEXT: entry:
+// CHECK-NEXT: [[TMP0:%.*]] = load <1024 x i1>, ptr [[VDMRP:%.*]], align 128, !tbaa [[TBAA6]]
+// CHECK-NEXT: [[TMP1:%.*]] = load <256 x i1>, ptr [[VPP:%.*]], align 32, !tbaa [[TBAA2]]
+// CHECK-NEXT: [[TMP2:%.*]] = tail call <1024 x i1> @llvm.ppc.mma.pmdmxvi8gerx4spp(<1024 x i1> [[TMP0]], <256 x i1> [[TMP1]], <16 x i8> [[VC:%.*]], i32 0, i32 0, i32 0)
+// CHECK-NEXT: store <1024 x i1> [[TMP2]], ptr [[RESP:%.*]], align 128, !tbaa [[TBAA6]]
+// CHECK-NEXT: ret void
+//
+void test_pmdmxvi8gerx4spp(unsigned char *vdmrp, unsigned char *vpp, vector unsigned char vc, unsigned char *resp) {
+ __dmr vdmr = *((__dmr *)vdmrp);
+ __vector_pair vp = *((__vector_pair *)vpp);
+ __builtin_mma_pmdmxvi8gerx4spp(&vdmr, vp, vc, 0, 0, 0);
+ *((__dmr *)resp) = vdmr;
+}
diff --git a/clang/test/CodeGen/PowerPC/ppc-future-mma-builtin-err.c b/clang/test/CodeGen/PowerPC/ppc-future-mma-builtin-err.c
new file mode 100644
index 0000000000000..c6029f9eb8352
--- /dev/null
+++ b/clang/test/CodeGen/PowerPC/ppc-future-mma-builtin-err.c
@@ -0,0 +1,21 @@
+// RUN: not %clang_cc1 -triple powerpc64le-unknown-linux-gnu -target-cpu future \
+// RUN: %s -emit-llvm-only 2>&1 | FileCheck %s
+
+__attribute__((target("no-mma")))
+void test_mma(unsigned char *vdmrp, unsigned char *vpp, vector unsigned char vc) {
+ __dmr vdmr = *((__dmr *)vdmrp);
+ __vector_pair vp = *((__vector_pair *)vpp);
+ __builtin_mma_dmxvi8gerx4(&vdmr, vp, vc);
+ __builtin_mma_pmdmxvi8gerx4(&vdmr, vp, vc, 0, 0, 0);
+ __builtin_mma_dmxvi8gerx4pp(&vdmr, vp, vc);
+ __builtin_mma_pmdmxvi8gerx4pp(&vdmr, vp, vc, 0, 0, 0);
+ __builtin_mma_dmxvi8gerx4spp(&vdmr, vp, vc);
+ __builtin_mma_pmdmxvi8gerx4spp(&vdmr, vp, vc, 0, 0, 0);
+
+// CHECK: error: '__builtin_mma_dmxvi8gerx4' needs target feature mma,paired-vector-memops
+// CHECK: error: '__builtin_mma_pmdmxvi8gerx4' needs target feature mma,paired-vector-memops
+// CHECK: error: '__builtin_mma_dmxvi8gerx4pp' needs target feature mma,paired-vector-memops
+// CHECK: error: '__builtin_mma_pmdmxvi8gerx4pp' needs target feature mma,paired-vector-memops
+// CHECK: error: '__builtin_mma_dmxvi8gerx4spp' needs target feature mma,paired-vector-memops
+// CHECK: error: '__builtin_mma_pmdmxvi8gerx4spp' needs target feature mma,paired-vector-memops
+}
diff --git a/clang/test/CodeGen/PowerPC/ppc-future-paired-vec-memops-builtin-err.c b/clang/test/CodeGen/PowerPC/ppc-future-paired-vec-memops-builtin-err.c
new file mode 100644
index 0000000000000..c31847e3ca4c4
--- /dev/null
+++ b/clang/test/CodeGen/PowerPC/ppc-future-paired-vec-memops-builtin-err.c
@@ -0,0 +1,20 @@
+// RUN: not %clang_cc1 -triple powerpc64le-unknown-linux-gnu -target-cpu future \
+// RUN: %s -emit-llvm-only 2>&1 | FileCheck %s
+
+__attribute__((target("no-paired-vector-memops")))
+void test_pair(unsigned char *vdmr, unsigned char *vpp, vector unsigned char vc) {
+ __vector_pair vp = *((__vector_pair *)vpp);
+ __builtin_mma_dmxvi8gerx4((__dmr *)vdmr, vp, vc);
+ __builtin_mma_pmdmxvi8gerx4((__dmr *)vdmr, vp, vc, 0, 0, 0);
+ __builtin_mma_dmxvi8gerx4pp((__dmr *)vdmr, vp, vc);
+ __builtin_mma_pmdmxvi8gerx4pp((__dmr *)vdmr, vp, vc, 0, 0, 0);
+ __builtin_mma_dmxvi8gerx4spp((__dmr *)vdmr, vp, vc);
+ __builtin_mma_pmdmxvi8gerx4spp((__dmr *)vdmr, vp, vc, 0, 0, 0);
+
+// CHECK: error: '__builtin_mma_dmxvi8gerx4' needs target feature mma,paired-vector-memops
+// CHECK: error: '__builtin_mma_pmdmxvi8gerx4' needs target feature mma,paired-vector-memops
+// CHECK: error: '__builtin_mma_dmxvi8gerx4pp' needs target feature mma,paired-vector-memops
+// CHECK: error: '__builtin_mma_pmdmxvi8gerx4pp' needs target feature mma,paired-vector-memops
+// CHECK: error: '__builtin_mma_dmxvi8gerx4spp' needs target feature mma,paired-vector-memops
+// CHECK: error: '__builtin_mma_pmdmxvi8gerx4spp' needs target feature mma,paired-vector-memops
+}
diff --git a/clang/test/CodeGen/PowerPC/ppc-mmaplus-types.c b/clang/test/CodeGen/PowerPC/ppc-mmaplus-types.c
new file mode 100644
index 0000000000000..dbae2d0c0829a
--- /dev/null
+++ b/clang/test/CodeGen/PowerPC/ppc-mmaplus-types.c
@@ -0,0 +1,184 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py
+// RUN: %clang_cc1 -triple powerpc64le-linux-unknown -target-cpu future \
+// RUN: -emit-llvm -o - %s | FileCheck %s
+// RUN: %clang_cc1 -triple powerpc64le-linux-unknown -target-cpu pwr10 \
+// RUN: -emit-llvm -o - %s | FileCheck %s
+// RUN: %clang_cc1 -triple powerpc64le-linux-unknown -target-cpu pwr9 \
+// RUN: -emit-llvm -o - %s | FileCheck %s
+// RUN: %clang_cc1 -triple powerpc64le-linux-unknown -target-cpu pwr8 \
+// RUN: -emit-llvm -o - %s | FileCheck %s
+
+typedef __vector_quad vq_t;
+
+// CHECK-LABEL: @test_dmr_copy(
+// CHECK-NEXT: entry:
+// CHECK-NEXT: [[PTR1_ADDR:%.*]] = alloca ptr, align 8
+// CHECK-NEXT: [[PTR2_ADDR:%.*]] = alloca ptr, align 8
+// CHECK-NEXT: store ptr [[PTR1:%.*]], ptr [[PTR1_ADDR]], align 8
+// CHECK-NEXT: store ptr [[PTR2:%.*]], ptr [[PTR2_ADDR]], align 8
+// CHECK-NEXT: [[TMP0:%.*]] = load ptr, ptr [[PTR1_ADDR]], align 8
+// CHECK-NEXT: [[ADD_PTR:%.*]] = getelementptr inbounds <1024 x i1>, ptr [[TMP0]], i64 2
+// CHECK-NEXT: [[TMP1:%.*]] = load <1024 x i1>, ptr [[ADD_PTR]], align 128
+// CHECK-NEXT: [[TMP2:%.*]] = load ptr, ptr [[PTR2_ADDR]], align 8
+// CHECK-NEXT: [[ADD_PTR1:%.*]] = getelementptr inbounds <1024 x i1>, ptr [[TMP2]], i64 1
+// CHECK-NEXT: store <1024 x i1> [[TMP1]], ptr [[ADD_PTR1]], align 128
+// CHECK-NEXT: ret void
+//
+void test_dmr_copy(__dmr *ptr1, __dmr *ptr2) {
+ *(ptr2 + 1) = *(ptr1 + 2);
+}
+
+// CHECK-LABEL: @test_dmr_typedef(
+// CHECK-NEXT: entry:
+// CHECK-NEXT: [[INP_ADDR:%.*]] = alloca ptr, align 8
+// CHECK-NEXT: [[OUTP_ADDR:%.*]] = alloca ptr, align 8
+// CHECK-NEXT: [[VDMRIN:%.*]] = alloca ptr, align 8
+// CHECK-NEXT: [[VDMROUT:%.*]] = alloca ptr, align 8
+// CHECK-NEXT: store ptr [[INP:%.*]], ptr [[INP_ADDR]], align 8
+// CHECK-NEXT: store ptr [[OUTP:%.*]], ptr [[OUTP_ADDR]], align 8
+// CHECK-NEXT: [[TMP0:%.*]] = load ptr, ptr [[INP_ADDR]], align 8
+// CHECK-NEXT: store ptr [[TMP0]], ptr [[VDMRIN]], align 8
+// CHECK-NEXT: [[TMP1:%.*]] = load ptr, ptr [[OUTP_ADDR]], align 8
+// CHECK-NEXT: store ptr [[TMP1]], ptr [[VDMROUT]], align 8
+// CHECK-NEXT: [[TMP2:%.*]] = load ptr, ptr [[VDMRIN]], align 8
+// CHECK-NEXT: [[TMP3:%.*]] = load <1024 x i1>, ptr [[TMP2]], align 128
+// CHECK-NEXT: [[TMP4:%.*]] = load ptr, ptr [[VDMROUT]], align 8
+// CHECK-NEXT: store <1024 x i1> [[TMP3]], ptr [[TMP4]], align 128
+// CHECK-NEXT: ret void
+//
+void test_dmr_typedef(int *inp, int *outp) {
+ __dmr *vdmrin = (__dmr *)inp;
+ __dmr *vdmrout = (__dmr *)outp;
+ *vdmrout = *vdmrin;
+}
+
+// CHECK-LABEL: @test_dmr_arg(
+// CHECK-NEXT: entry:
+// CHECK-NEXT: [[VDMR_ADDR:%.*]] = alloca ptr, align 8
+// CHECK-NEXT: [[PTR_ADDR:%.*]] = alloca ptr, align 8
+// CHECK-NEXT: [[VDMRP:%.*]] = alloca ptr, align 8
+// CHECK-NEXT: store ptr [[VDMR:%.*]], ptr [[VDMR_ADDR]], align 8
+// CHECK-NEXT: store ptr [[PTR:%.*]], ptr [[PTR_ADDR]], align 8
+// CHECK-NEXT: [[TMP0:%.*]] = load ptr, ptr [[PTR_ADDR]], align 8
+// CHECK-NEXT: store ptr [[TMP0]], ptr [[VDMRP]], align 8
+// CHECK-NEXT: [[TMP1:%.*]] = load ptr, ptr [[VDMR_ADDR]], align 8
+// CHECK-NEXT: [[TMP2:%.*]] = load <1024 x i1>, ptr [[TMP1]], align 128
+// CHECK-NEXT: [[TMP3:%.*]] = load ptr, ptr [[VDMRP]], align 8
+// CHECK-NEXT: store <1024 x i1> [[TMP2]], ptr [[TMP3]], align 128
+// CHECK-NEXT: ret void
+//
+void test_dmr_arg(__dmr *vdmr, int *ptr) {
+ __dmr *vdmrp = (__dmr *)ptr;
+ *vdmrp = *vdmr;
+}
+
+// CHECK-LABEL: @test_dmr_const_arg(
+// CHECK-NEXT: entry:
+// CHECK-NEXT: [[VDMR_ADDR:%.*]] = alloca ptr, align 8
+// CHECK-NEXT: [[PTR_ADDR:%.*]] = alloca ptr, align 8
+// CHECK-NEXT: [[VDMRP:%.*]] = alloca ptr, align 8
+// CHECK-NEXT: store ptr [[VDMR:%.*]], ptr [[VDMR_ADDR]], align 8
+// CHECK-NEXT: store ptr [[PTR:%.*]], ptr [[PTR_ADDR]], align 8
+// CHECK-NEXT: [[TMP0:%.*]] = load ptr, ptr [[PTR_ADDR]], align 8
+// CHECK-NEXT: store ptr [[TMP0]], ptr [[VDMRP]], align 8
+// CHECK-NEXT: [[TMP1:%.*]] = load ptr, ptr [[VDMR_ADDR]], align 8
+// CHECK-NEXT: [[TMP2:%.*]] = load <1024 x i1>, ptr [[TMP1]], align 128
+// CHECK-NEXT: [[TMP3:%.*]] = load ptr, ptr [[VDMRP]], align 8
+// CHECK-NEXT: store <1024 x i1> [[TMP2]], ptr [[TMP3]], align 128
+// CHECK-NEXT: ret void
+//
+void test_dmr_const_arg(const __dmr *const vdmr, int *ptr) {
+ __dmr *vdmrp = (__dmr *)ptr;
+ *vdmrp = *vdmr;
+}
+
+// CHECK-LABEL: @test_dmr_array_arg(
+// CHECK-NEXT: entry:
+// CHECK-NEXT: [[VDMRA_ADDR:%.*]] = alloca ptr, align 8
+// CHECK-NEXT: [[PTR_ADDR:%.*]] = alloca ptr, align 8
+// CHECK-NEXT: [[VDMRP:%.*]] = alloca ptr, align 8
+// CHECK-NEXT: store ptr [[VDMRA:%.*]], ptr [[VDMRA_ADDR]], align 8
+// CHECK-NEXT: store ptr [[PTR:%.*]], ptr [[PTR_ADDR]], align 8
+// CHECK-NEXT: [[TMP0:%.*]] = load ptr, ptr [[PTR_ADDR]], align 8
+// CHECK-NEXT: store ptr [[TMP0]], ptr [[VDMRP]], align 8
+// CHECK-NEXT: [[TMP1:%.*]] = load ptr, ptr [[VDMRA_ADDR]], align 8
+// CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds <1024 x i1>, ptr [[TMP1]], i64 0
+// CHECK-NEXT: [[TMP2:%.*]] = load <1024 x i1>, ptr [[ARRAYIDX]], align 128
+// CHECK-NEXT: [[TMP3:%.*]] = load ptr, ptr [[VDMRP]], align 8
+// CHECK-NEXT: store <1024 x i1> [[TMP2]], ptr [[TMP3]], align 128
+// CHECK-NEXT: ret void
+//
+void test_dmr_array_arg(__dmr vdmra[], int *ptr) {
+ __dmr *vdmrp = (__dmr *)ptr;
+ *vdmrp = vdmra[0];
+}
+
+// CHECK-LABEL: @test_dmr_ret(
+// CHECK-NEXT: entry:
+// CHECK-NEXT: [[PTR_ADDR:%.*]] = alloca ptr, align 8
+// CHECK-NEXT: [[VDMRP:%.*]] = alloca ptr, align 8
+// CHECK-NEXT: store ptr [[PTR:%.*]], ptr [[PTR_ADDR]], align 8
+// CHECK-NEXT: [[TMP0:%.*]] = load ptr, ptr [[PTR_ADDR]], align 8
+// CHECK-NEXT: store ptr [[TMP0]], ptr [[VDMRP]], align 8
+// CHECK-NEXT: [[TMP1:%.*]] = load ptr, ptr [[VDMRP]], align 8
+// CHECK-NEXT: [[ADD_PTR:%.*]] = getelementptr inbounds <1024 x i1>, ptr [[TMP1]], i64 2
+// CHECK-NEXT: ret ptr [[ADD_PTR]]
+//
+__dmr *test_dmr_ret(int *ptr) {
+ __dmr *vdmrp = (__dmr *)ptr;
+ return vdmrp + 2;
+}
+
+// CHECK-LABEL: @test_dmr_ret_const(
+// CHECK-NEXT: entry:
+// CHECK-NEXT: [[PTR_ADDR:%.*]] = alloca ptr, align 8
+// CHECK-NEXT: [[VDMRP:%.*]] = alloca ptr, align 8
+// CHECK-NEXT: store ptr [[PTR:%.*]], ptr [[PTR_ADDR]], align 8
+// CHECK-NEXT: [[TMP0:%.*]] = load ptr, ptr [[PTR_ADDR]], align 8
+// CHECK-NEXT: store ptr [[TMP0]], ptr [[VDMRP]], align 8
+// CHECK-NEXT: [[TMP1:%.*]] = load ptr, ptr [[VDMRP]], align 8
+// CHECK-NEXT: [[ADD_PTR:%.*]] = getelementptr inbounds <1024 x i1>, ptr [[TMP1]], i64 2
+// CHECK-NEXT: ret ptr [[ADD_PTR]]
+//
+const __dmr *test_dmr_ret_const(int *ptr) {
+ __dmr *vdmrp = (__dmr *)ptr;
+ return vdmrp + 2;
+}
+
+// CHECK-LABEL: @test_dmr_sizeof_alignof(
+// CHECK-NEXT: entry:
+// CHECK-NEXT: [[PTR_ADDR:%.*]] = alloca ptr, align 8
+// CHECK-NEXT: [[VDMRP:%.*]] = alloca ptr, align 8
+// CHECK-NEXT: [[VDMR:%.*]] = alloca <1024 x i1>, align 128
+// CHECK-NEXT: [[SIZET:%.*]] = alloca i32, align 4
+// CHECK-NEXT: [[ALIGNT:%.*]] = alloca i32, align 4
+// CHECK-NEXT: [[SIZEV:%.*]] = alloca i32, align 4
+// CHECK-NEXT: [[ALIGNV:%.*]] = alloca i32, align 4
+// CHECK-NEXT: store ptr [[PTR:%.*]], ptr [[PTR_ADDR]], align 8
+// CHECK-NEXT: [[TMP0:%.*]] = load ptr, ptr [[PTR_ADDR]], align 8
+// CHECK-NEXT: store ptr [[TMP0]], ptr [[VDMRP]], align 8
+// CHECK-NEXT: [[TMP1:%.*]] = load ptr, ptr [[VDMRP]], align 8
+// CHECK-NEXT: [[TMP2:%.*]] = load <1024 x i1>, ptr [[TMP1]], align 128
+// CHECK-NEXT: store <1024 x i1> [[TMP2]], ptr [[VDMR]], align 128
+// CHECK-NEXT: store i32 128, ptr [[SIZET]], align 4
+// CHECK-NEXT: sto...
[truncated]
|
@@ -30,6 +30,7 @@ | |||
#endif | |||
|
|||
|
|||
PPC_VECTOR_MMA_TYPE(__dmr1024, VectorDmr1024, 1024) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this is a vector type, maybe we can just keep the same naming convention?
PPC_VECTOR_MMA_TYPE(__dmr1024, VectorDmr1024, 1024) | |
PPC_VECTOR_MMA_TYPE(__vector_dmr, VectorDmr, 1024) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The type names have to match GCC and are negotiated - __dmr1024 is the current choice.
@@ -0,0 +1,184 @@ | |||
// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: maybe this tests can be renamed to dmf-types.c
?
@@ -0,0 +1,20 @@ | |||
// RUN: not %clang_cc1 -triple powerpc64le-unknown-linux-gnu -target-cpu future \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: better not to have "future" in the test case names as "future" is a sliding thing?
@@ -0,0 +1,94 @@ | |||
// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I might be missing something, what does mmaplus
imply here? Since this tests mma builtins that uses the dmr registers, maybe builtins-ppc-mma-dmr.c
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general this LGTM.
Just a few nits. Please also update your descripton and PR title as it says you are dding __dmr
type, but you are actually adding type __dmr1024
.
Thx!
clang/lib/AST/ASTContext.cpp
Outdated
@@ -3455,6 +3455,7 @@ static void encodeTypeForFunctionPointerAuth(const ASTContext &Ctx, | |||
case BuiltinType::BFloat16: | |||
case BuiltinType::VectorQuad: | |||
case BuiltinType::VectorPair: | |||
case BuiltinType::VectorDmr1024: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Maybe we can do this since DMR is an acronym and to match the actual new type name defined?
case BuiltinType::VectorDmr1024: | |
case BuiltinType::DMR1024: |
// RUN: %clang_cc1 -O3 -triple powerpc64le-unknown-unknown -target-cpu future \ | ||
// RUN: -emit-llvm %s -o - | FileCheck %s | ||
// RUN: %clang_cc1 -O3 -triple powerpc64-ibm-aix -target-cpu future \ | ||
// RUN: -emit-llvm %s -o - | FileCheck %s |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we need testing for aix 32bit?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PC_VECTOR_MMA_TYPE is only defined for PPC64
// RUN: %clang_cc1 -triple powerpc64le-linux-unknown -target-cpu pwr9 \ | ||
// RUN: -emit-llvm -o - %s | FileCheck %s | ||
// RUN: %clang_cc1 -triple powerpc64le-linux-unknown -target-cpu pwr8 \ | ||
// RUN: -emit-llvm -o - %s | FileCheck %s |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since DMR is future specific, do we need run lines on cpu targets that it won't be valid for?
clang/test/Sema/ppc-pair-mma-types.c
Outdated
@@ -2,12 +2,110 @@ | |||
// RUN: -target-cpu pwr10 %s -verify |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe we should move these to a different test file -> ppc-dmr-types.c
Define the __dmr1024 type used to manipulate the new DMR registers introduced by the Dense Math Facility (DMF) on PowerPC, and add six Clang builtins that correspond to the integer outer-product accumulate to ACC PowerPC instructions: