Commit 74cdb8e

[llvm][ARM] Emit MVE .arch_extension after .fpu directive if it does not include MVE features (#71545)
The floating-point and MVE features together specify the MVE functionality that is supported on the Cortex-M85 processor. However, the FPU extension for the underlying architecture (armv8.1-m.main) is FPv5, which does not include MVE-F. As a result, the compiler's -S output and `-save-temps=obj` lose the MVE feature, which leads to an assembler error.

What happens here is that the .fpu directive overrides any features previously set by the .cpu directive. Since the generated .fpu directive (.fpu fpv5-d16) does not include MVE-F, it disables those features even though they are supported and were set by the .cpu directive. This appears to be the intended behavior of .fpu. To keep these extensions enabled, an .arch_extension directive has to re-enable them after .fpu. GCC does the same.

This patch therefore enables the MVE features by emitting the following arch extension after the .fpu directive:

.fpu fpv5-d16
.arch_extension mve.fp

---------

Co-authored-by: Simi Pallipurath <simi.pallipurath.com>
1 parent 2164678 commit 74cdb8e
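
The directive ordering problem described in the commit message can be illustrated with a small toy model. This is only a sketch, not LLVM code: `ToyAssemblerState`, the `Feature` bits, and `kFpuControlled` are hypothetical names, and the model simply assumes that MVE-F is among the bits that `.fpu` resets, as the message states.

```cpp
#include <cassert>
#include <cstdint>

// Toy feature bits. These names are illustrative only, not LLVM's.
enum Feature : std::uint32_t {
  kFPv5 = 1u << 0,     // scalar floating point (as in fpv5-d16)
  kMVEInt = 1u << 1,   // MVE integer
  kMVEFloat = 1u << 2, // MVE floating point (MVE-F)
};

// Assumption: every bit that an .fpu directive owns and therefore resets.
constexpr std::uint32_t kFpuControlled = kFPv5 | kMVEFloat;

struct ToyAssemblerState {
  std::uint32_t features = 0;

  // .cpu: enable everything the core supports.
  void cpu(std::uint32_t supported) { features = supported; }

  // .fpu: clear all FPU-controlled bits, then set only the named FPU's bits.
  void fpu(std::uint32_t fpuBits) {
    assert((fpuBits & ~kFpuControlled) == 0 && "not an FPU-controlled bit");
    features = (features & ~kFpuControlled) | fpuBits;
  }

  // .arch_extension: add bits on top of whatever is currently enabled.
  void archExtension(std::uint32_t extBits) { features |= extBits; }

  // True only if every requested bit is currently enabled.
  bool has(std::uint32_t bits) const { return (features & bits) == bits; }
};
```

In this model, calling `cpu(kFPv5 | kMVEInt | kMVEFloat)` followed by `fpu(kFPv5)` leaves `has(kMVEFloat)` false, and a subsequent `archExtension(kMVEInt | kMVEFloat)` makes it true again, which mirrors the `.fpu` then `.arch_extension` ordering the patch emits.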

4 files changed: +33 -6 lines changed

llvm/lib/Target/ARM/AsmParser/ARMAsmParser.cpp

Lines changed: 3 additions & 0 deletions

@@ -12648,6 +12648,9 @@ bool ARMAsmParser::enableArchExtFeature(StringRef Name, SMLoc &ExtLoc) {
       {ARM::AEK_CRYPTO,
        {Feature_HasV8Bit},
        {ARM::FeatureCrypto, ARM::FeatureNEON, ARM::FeatureFPARMv8}},
+      {(ARM::AEK_DSP | ARM::AEK_SIMD | ARM::AEK_FP),
+       {Feature_HasV8_1MMainlineBit},
+       {ARM::HasMVEFloatOps}},
       {ARM::AEK_FP,
        {Feature_HasV8Bit},
        {ARM::FeatureVFP2_SP, ARM::FeatureFPARMv8}},
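
The hunk above adds one row to a table that pairs an extension kind with an architecture requirement and the features it enables. A minimal sketch of that kind of lookup follows. Everything in it is a simplified stand-in: `ExtKind`, `Arch`, `ExtEntry`, `kTable`, and `lookupExtension` are hypothetical, the feature labels are plain strings, and the exact-match and architecture-check behavior is an assumption rather than a copy of `enableArchExtFeature`.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <optional>
#include <string_view>

// Hypothetical extension-kind and architecture bits (not LLVM's values).
enum ExtKind : std::uint32_t {
  kDSP = 1u << 0,
  kSIMD = 1u << 1,
  kFP = 1u << 2,
  kCrypto = 1u << 3,
};
enum Arch : std::uint32_t { kV8 = 1u << 0, kV8_1MMain = 1u << 1 };

struct ExtEntry {
  std::uint32_t kind;         // combined kind mask identifying the extension
  std::uint32_t requiredArch; // architecture bits that must be available
  std::string_view enables;   // illustrative label for the enabled features
};

// Toy table shaped like the three rows visible in the hunk.
constexpr std::array<ExtEntry, 3> kTable{{
    {kCrypto, kV8, "crypto"},
    {kDSP | kSIMD | kFP, kV8_1MMain, "mve.fp"},
    {kFP, kV8, "fp"},
}};

// Returns the label of the matching row, or nullopt when the kind is unknown
// or the current architecture does not satisfy the row's requirement.
std::optional<std::string_view> lookupExtension(std::uint32_t kind,
                                                std::uint32_t arch) {
  assert(kind != 0 && "an extension must name at least one kind bit");
  for (const ExtEntry &e : kTable) {
    if (e.kind != kind)
      continue; // assumption: rows are matched on the exact kind mask
    if ((arch & e.requiredArch) != e.requiredArch)
      return std::nullopt; // extension exists but is not valid for this arch
    return e.enables;
  }
  return std::nullopt;
}
```

Under these assumptions, the combined `kDSP | kSIMD | kFP` mask resolves to the MVE floating-point row only when the v8.1-M Mainline bit is available, while a bare `kFP` still resolves to the separate floating-point row.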

llvm/lib/Target/ARM/MCTargetDesc/ARMTargetStreamer.cpp

Lines changed: 10 additions & 6 deletions

@@ -238,14 +238,18 @@ void ARMTargetStreamer::emitTargetAttributes(const MCSubtargetInfo &STI) {
                     ? ARMBuildAttrs::AllowNeonARMv8_1a
                     : ARMBuildAttrs::AllowNeonARMv8);
   } else {
-    if (STI.hasFeature(ARM::FeatureFPARMv8_D16_SP))
+    if (STI.hasFeature(ARM::FeatureFPARMv8_D16_SP)) {
       // FPv5 and FP-ARMv8 have the same instructions, so are modeled as one
       // FPU, but there are two different names for it depending on the CPU.
-      emitFPU(STI.hasFeature(ARM::FeatureD32)
-                  ? ARM::FK_FP_ARMV8
-                  : (STI.hasFeature(ARM::FeatureFP64) ? ARM::FK_FPV5_D16
-                                                      : ARM::FK_FPV5_SP_D16));
-    else if (STI.hasFeature(ARM::FeatureVFP4_D16_SP))
+      if (STI.hasFeature(ARM::FeatureD32))
+        emitFPU(ARM::FK_FP_ARMV8);
+      else {
+        emitFPU(STI.hasFeature(ARM::FeatureFP64) ? ARM::FK_FPV5_D16
+                                                 : ARM::FK_FPV5_SP_D16);
+        if (STI.hasFeature(ARM::HasMVEFloatOps))
+          emitArchExtension(ARM::AEK_SIMD | ARM::AEK_DSP | ARM::AEK_FP);
+      }
+    } else if (STI.hasFeature(ARM::FeatureVFP4_D16_SP))
       emitFPU(STI.hasFeature(ARM::FeatureD32)
                   ? ARM::FK_VFPV4
                   : (STI.hasFeature(ARM::FeatureFP64) ? ARM::FK_VFPV4_D16
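
The restructured branch above can be read as a small decision procedure. The sketch below restates it over plain booleans so the emitted directives are easy to see. `ToySubtarget` and `fpuDirectives` are hypothetical names introduced for illustration, only the FPv5/FP-ARMv8 branch is modeled, and the directive strings are written out by hand rather than produced by LLVM's streamer.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for the subtarget feature queries used in the hunk.
struct ToySubtarget {
  bool fpArmV8D16SP; // models FeatureFPARMv8_D16_SP
  bool d32;          // models FeatureD32
  bool fp64;         // models FeatureFP64
  bool mveFloat;     // models HasMVEFloatOps
};

// Returns the directives this branch would emit, in order.
std::vector<std::string> fpuDirectives(const ToySubtarget &sti) {
  std::vector<std::string> out;
  if (!sti.fpArmV8D16SP)
    return out; // other FPU families (VFPv4 and so on) are not modeled here

  if (sti.d32) {
    out.push_back(".fpu fp-armv8");
  } else {
    out.push_back(sti.fp64 ? ".fpu fpv5-d16" : ".fpu fpv5-sp-d16");
    // Re-enable MVE-F after .fpu, since the FPv5 names do not include it.
    if (sti.mveFloat)
      out.push_back(".arch_extension mve.fp");
  }
  assert(out.size() <= 2 && "at most one .fpu and one .arch_extension");
  return out;
}
```

For a subtarget with double-precision FPv5 and MVE-F but without D32, this sketch yields `.fpu fpv5-d16` followed by `.arch_extension mve.fp`, the same pair of lines the new llc test checks for.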
Lines changed: 14 additions & 0 deletions

@@ -0,0 +1,14 @@
+; RUN: llc -mtriple=arm-none-eabi -mcpu=cortex-m85 --float-abi=hard %s -o - | FileCheck %s
+; RUN: llc -mtriple=arm-none-eabi -mcpu=cortex-m55 --float-abi=hard %s -o - | FileCheck %s
+
+; CHECK: .fpu fpv5-d16
+; CHECK-NEXT: .arch_extension mve.fp
+
+define <4 x float> @vsubf32(<4 x float> %A, <4 x float> %B) {
+; CHECK-LABEL: vsubf32:
+; CHECK: @ %bb.0:
+; CHECK-NEXT: vsub.f32 q0, q0, q1
+; CHECK-NEXT: bx lr
+  %tmp3 = fsub <4 x float> %A, %B
+  ret <4 x float> %tmp3
+}
Lines changed: 6 additions & 0 deletions

@@ -0,0 +1,6 @@
+// RUN: llvm-mc -triple thumbv8.1m.main-none-eabi -filetype asm -o - %s 2>&1 | FileCheck %s
+
+.arch_extension mve.fp
+vsub.f32 q0, q0, q1
+// CHECK: vsub.f32 q0, q0, q1
+
