-
Notifications
You must be signed in to change notification settings - Fork 14.4k
[RISCV] Align MCOperandPredicates with AsmParser #146184
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
@llvm/pr-subscribers-backend-risc-v @llvm/pr-subscribers-mc Author: Sam Elliott (lenary) ChangesThis adds more tests for emitting Xqci instructions with symbols, to check the case where the instruction has a symbol but may get compressed. Some of these tests were causing crashes before. This change then fixes the crashes, by changing the MCOperandPredicates, which are used for compression. This prevents compressing instructions with symbols when we have no way of applying the fixup to the compressed instruction, as is the case for CI-type instructions, but also for Full diff: https://github.com/llvm/llvm-project/pull/146184.diff 6 Files Affected:
diff --git a/llvm/lib/Target/RISCV/MCTargetDesc/RISCVBaseInfo.cpp b/llvm/lib/Target/RISCV/MCTargetDesc/RISCVBaseInfo.cpp
index d407beffcd7d1..835d38382acd5 100644
--- a/llvm/lib/Target/RISCV/MCTargetDesc/RISCVBaseInfo.cpp
+++ b/llvm/lib/Target/RISCV/MCTargetDesc/RISCVBaseInfo.cpp
@@ -12,9 +12,12 @@
//===----------------------------------------------------------------------===//
#include "RISCVBaseInfo.h"
+#include "MCTargetDesc/RISCVMCAsmInfo.h"
+#include "llvm/MC/MCExpr.h"
#include "llvm/MC/MCInst.h"
#include "llvm/MC/MCRegisterInfo.h"
#include "llvm/MC/MCSubtargetInfo.h"
+#include "llvm/Support/Casting.h"
#include "llvm/Support/raw_ostream.h"
#include "llvm/TargetParser/TargetParser.h"
#include "llvm/TargetParser/Triple.h"
diff --git a/llvm/lib/Target/RISCV/RISCVInstrInfoC.td b/llvm/lib/Target/RISCV/RISCVInstrInfoC.td
index 8252a9b170eb3..de235b3d8dde8 100644
--- a/llvm/lib/Target/RISCV/RISCVInstrInfoC.td
+++ b/llvm/lib/Target/RISCV/RISCVInstrInfoC.td
@@ -37,9 +37,9 @@ def uimmlog2xlennonzero : RISCVOp, ImmLeaf<XLenVT, [{
def simm6 : RISCVSImmLeafOp<6> {
let MCOperandPredicate = [{
int64_t Imm;
- if (MCOp.evaluateAsConstantImm(Imm))
- return isInt<6>(Imm);
- return MCOp.isBareSymbolRef();
+ if (!MCOp.evaluateAsConstantImm(Imm))
+ return false;
+ return isInt<6>(Imm);
}];
}
@@ -51,9 +51,9 @@ def simm6nonzero : RISCVOp,
let OperandType = "OPERAND_SIMM6_NONZERO";
let MCOperandPredicate = [{
int64_t Imm;
- if (MCOp.evaluateAsConstantImm(Imm))
- return (Imm != 0) && isInt<6>(Imm);
- return MCOp.isBareSymbolRef();
+ if (!MCOp.evaluateAsConstantImm(Imm))
+ return false;
+ return (Imm != 0) && isInt<6>(Imm);
}];
}
diff --git a/llvm/lib/Target/RISCV/RISCVInstrInfoXqci.td b/llvm/lib/Target/RISCV/RISCVInstrInfoXqci.td
index 64c5910293b55..240d91c03488b 100644
--- a/llvm/lib/Target/RISCV/RISCVInstrInfoXqci.td
+++ b/llvm/lib/Target/RISCV/RISCVInstrInfoXqci.td
@@ -146,7 +146,12 @@ def simm20_li : RISCVOp<XLenVT>,
int64_t Imm;
if (MCOp.evaluateAsConstantImm(Imm))
return isInt<20>(Imm);
- return MCOp.isBareSymbolRef();
+
+ if (!MCOp.isExpr())
+ return false;
+
+ return MCOp.getExpr()->getKind() == MCExpr::Specifier &&
+ cast<MCSpecifierExpr>(MCOp.getExpr())->getSpecifier() == RISCV::S_QC_ABS20;
}];
}
diff --git a/llvm/test/MC/RISCV/xqcibi-relocations.s b/llvm/test/MC/RISCV/xqcibi-relocations.s
index 0f7cc8c5787a1..6af496f353ee3 100644
--- a/llvm/test/MC/RISCV/xqcibi-relocations.s
+++ b/llvm/test/MC/RISCV/xqcibi-relocations.s
@@ -86,6 +86,22 @@ qc.e.bgeui s2, 24, same_section
.option norelax
+## Enable compression/relaxation to check how symbols are handled.
+.option noexact
+.option relax
+
+# ASM: qc.bnei t1, 10, undef
+# OBJ: qc.beqi t1, 0xa, 0x42 <same_section_extern+0x16>
+# OBJ-NEXT: j 0x3e <same_section_extern+0x12>
+# OBJ-NEXT: R_RISCV_JAL undef{{$}}
+qc.bnei t1, 10, undef
+
+# ASM: qc.e.bgeui s0, 40, undef
+# OBJ-NEXT: qc.e.bltui s0, 0x28, 0x4c <same_section_extern+0x20>
+# OBJ-NEXT: j 0x48 <same_section_extern+0x1c>
+# OBJ-NEXT: R_RISCV_JAL undef{{$}}
+qc.e.bgeui s0, 40, undef
+
.section .text.second, "ax", @progbits
# ASM-LABEL: other_section:
diff --git a/llvm/test/MC/RISCV/xqcilb-relocations.s b/llvm/test/MC/RISCV/xqcilb-relocations.s
index ab08beed9be94..daf84a6b200c5 100644
--- a/llvm/test/MC/RISCV/xqcilb-relocations.s
+++ b/llvm/test/MC/RISCV/xqcilb-relocations.s
@@ -94,6 +94,24 @@ qc.e.jal same_section
.option norelax
+## Enable compression/relaxation to check how symbols are handled.
+.option noexact
+.option relax
+
+qc.e.j undef
+# ASM: j undef
+# OBJ: qc.e.j 0x44 <same_section_extern+0x10>
+# OBJ-NEXT: R_RISCV_VENDOR QUALCOMM{{$}}
+# OBJ-NEXT: R_RISCV_CUSTOM195 undef{{$}}
+# OBJ-NEXT: R_RISCV_RELAX
+
+qc.e.jal undef
+# ASM: jal undef
+# OBJ: qc.e.jal 0x4a <same_section_extern+0x16>
+# OBJ-NEXT: R_RISCV_VENDOR QUALCOMM{{$}}
+# OBJ-NEXT: R_RISCV_CUSTOM195 undef{{$}}
+# OBJ-NEXT: R_RISCV_RELAX
+
.section .text.other, "ax", @progbits
# ASM-LABEL: other_section:
diff --git a/llvm/test/MC/RISCV/xqcili-relocations.s b/llvm/test/MC/RISCV/xqcili-relocations.s
index 866586236fa46..356c1c89a516d 100644
--- a/llvm/test/MC/RISCV/xqcili-relocations.s
+++ b/llvm/test/MC/RISCV/xqcili-relocations.s
@@ -99,6 +99,24 @@ qc.e.li s1, undef
.option norelax
+## Enable compression/relaxation to check how symbols are handled.
+.option noexact
+.option relax
+
+# ASM: qc.li a1, %qc.abs20(undef)
+# OBJ-NEXT: qc.li a1, 0x0
+# OBJ-NEXT: R_RISCV_VENDOR QUALCOMM{{$}}
+# OBJ-NEXT: R_RISCV_CUSTOM192 undef{{$}}
+# OBJ-NEXT: R_RISCV_RELAX
+qc.li a1, %qc.abs20(undef)
+
+# ASM: qc.e.li a1, undef
+# OBJ-NEXT: qc.e.li a1, 0x0
+# OBJ-NEXT: R_RISCV_VENDOR QUALCOMM{{$}}
+# OBJ-NEXT: R_RISCV_CUSTOM194 undef{{$}}
+# OBJ-NEXT: R_RISCV_RELAX
+qc.e.li a1, undef
+
.section .text.other, "ax", @progbits
# ASM-LABEL: other_section:
|
This comment was marked as outdated.
This comment was marked as outdated.
I've become more convinced that this change is "correct" by examining the behaviour in the assembler. Broadly, it's worth noting that I will revisit the structure of this PR:
The second commit, if i get this right, should only change the behaviour for the |
This adds assembly test coverage for the following operand types: - simm12 - uimm20_lui - uimm20_auipc - simm6 - simm6nonzero - simm5 - simm5_plus1 - simm20_li
Before this change, many MCOperandPredicates accepted bare symbols but there was no cohesive logic as to which did and which didn't, and it did not correspond to where the parser did and did not accept bare symbols. MCOperandPredicates are used by the assembly printer when choosing to whether to use an alias, and by the assembler compression and decompression mechanism. RISC-V uses both of these extensively. Xqci had a bug with a compress pattern for `qc.e.li` (to `c.li`) which was causing crashes because `RISCVMCCodeEmitter::getImmOpValue` could not handle a bare symbol ref when encoding the `c.li` instruction. This patch aligns the MCOperandPredicates with the predicates in the RISC-V assembly parser, denying bare symbol references in places they were accepted for aliases/compression in the past. This is not NFC, as some assembly printing has changed with the `li` alias for `addi`, but this seems very minor. The crash when assembling `qc.e.li a0, undef` is now fixed.
9f16659
to
002af0c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The Xqci specific changes look good to me. Please wait for someone else to review the rest.
llvm/lib/MC/MCInst.cpp
Outdated
} | ||
|
||
bool MCOperand::isSimpleSymbolRef(MCExpr::Spec &Specifier) const { | ||
assert(isExpr() && "isBareSymbolRef expects only expressions"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you maybe want to change the assert to say isSimpleSymbolRef instead of isBareSymbolRef?
✅ With the latest revision this PR passed the C/C++ code formatter. |
llvm/lib/MC/MCInst.cpp
Outdated
bool MCOperand::isSimpleSymbolRef(MCExpr::Spec &Specifier) const { | ||
assert(isExpr() && "isSimpleSymbolRef expects only expressions"); | ||
const MCExpr *Expr = getExpr(); | ||
MCExpr::ExprKind Kind = getExpr()->getKind(); | ||
return Kind == MCExpr::SymbolRef && | ||
cast<MCSymbolRefExpr>(Expr)->getSpecifier() == 0; | ||
|
||
switch (Kind) { | ||
case MCExpr::Binary: | ||
case MCExpr::Unary: | ||
case MCExpr::Constant: | ||
break; | ||
case MCExpr::Target: | ||
// It's not clear this is right, does MCTargetExpr need another virtual method? | ||
break; | ||
case MCExpr::SymbolRef: | ||
Specifier = cast<MCSymbolRefExpr>(Expr)->getSpecifier(); | ||
return true; | ||
case MCExpr::Specifier: | ||
Specifier = cast<MCSpecifierExpr>(Expr)->getSpecifier(); | ||
return true; | ||
} | ||
return false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@MaskRay it would be good to understand your view on this method - the aim is to effectively capture everywhere which is just a specifier and one symbol, extracting the specifier - like what RISCVAsmParser::classifySymbolRef
does.
I haven't had to cover MCTargetExpr because only RISC-V uses this method right now, and that backend now uses MCSpecifierExpr. I guess maybe I should be using Expr->evaluateAsRelocatable(...))
and looking at the resulting MCValue
, rather than adding a new virtual method?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking. I have migrated most targets to use MCSpecifierExpr.
@@ -327,12 +327,20 @@ def uimm16 : RISCVUImmOp<16>; | |||
def uimm32 : RISCVUImmOp<32>; | |||
def uimm48 : RISCVUImmOp<48>; | |||
def uimm64 : RISCVUImmOp<64>; | |||
|
|||
def simm12 : RISCVSImmLeafOp<12> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have the urge to call this something like simm12_lo
to make it clearer that this accepts symbols with lo
specifiers, so that it is clearer it is different from the rest of simm<N>
. If I do, it will be in a NFC follow-up.
The introduction of The use of Additionally, I've noticed that RISCVAsmParser.cpp tends to report errors during parsing more aggressively than other target parsers. In some cases, this seems unnecessary, and deferring certain errors to the relocation generation phase could be cleaner (e.g. |
@@ -179,6 +179,7 @@ class MCOperand { | |||
LLVM_ABI void print(raw_ostream &OS, const MCContext *Ctx = nullptr) const; | |||
LLVM_ABI void dump() const; | |||
LLVM_ABI bool isBareSymbolRef() const; | |||
LLVM_ABI bool isSimpleSymbolRef(MCExpr::Spec &Specifier) const; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should try eliminating isBareSymbolRef instead of adding more variants...
I do see that only RISC-V and CSKY use I understand that you don't like how the asm parser reports errors right now, and that it could be deferred (like gnu as does), but this might give up some information in the messages. We also have to defer errors when we move to treating branch immediates as absolute addresses (like gnu as does), which I have the start of a prototype for in #133555 (which I think you might be recalling?). I do want to push on this work in the future. I think some people like the extra information, and I do think it's a drawback that the error messages get worse when deferred. However, I am unable to start this major refactor now, especially with the release branch happening in about a week or so, and this aims to solve a crash. I think we still have a bit of a problem for compression here (maybe not for aliases?), but I also know that only two of the compression patterns are the problematic ones - unfortunately we think they are pretty important when someone writes a As the code flow currently works, the following happens:
It sounds like a better short-term solution from your PoV is:
This approximately echoes how we solve this for almost everywhere else, except that everywhere else we already have a fixup and that fixup has a corresponding relocation. I'm not sure we really want a fixup which doesn't have a corresponding relocation, but at least that direction is less intrusive on target-independent code. I'll upload that as a separate patch, and we can come back to the right way to sort out this issue. |
There's a long comment explaining this approach in RISCVInstrInfoXqci.td This is presented as an alternative to llvm#146184.
Before this change, many MCOperandPredicates accepted bare symbols but there was no cohesive logic as to which did and which didn't, and it did not correspond to where the parser did and did not accept bare symbols.
MCOperandPredicates are used by the assembly printer when choosing to whether to use an alias, and by the assembler compression and decompression mechanism. RISC-V uses both of these extensively.
Xqci had a bug with a compress pattern for
qc.e.li
(toc.li
) which was causing crashes becauseRISCVMCCodeEmitter::getImmOpValue
could not handle a bare symbol ref when encoding thec.li
instruction.This patch aligns the MCOperandPredicates with the predicates in the RISC-V assembly parser, denying bare symbol references in places they were accepted for aliases/compression in the past.
This is not NFC, as some assembly printing has changed with the
li
alias foraddi
, but this seems very minor. The crash when assemblingqc.e.li a0, undef
is now fixed.