Skip to content

[RISCV] Align MCOperandPredicates with AsmParser #146184

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

lenary
Copy link
Member

@lenary lenary commented Jun 28, 2025

Before this change, many MCOperandPredicates accepted bare symbols but there was no cohesive logic as to which did and which didn't, and it did not correspond to where the parser did and did not accept bare symbols.

MCOperandPredicates are used by the assembly printer when choosing to whether to use an alias, and by the assembler compression and decompression mechanism. RISC-V uses both of these extensively.

Xqci had a bug with a compress pattern for qc.e.li (to c.li) which was causing crashes because RISCVMCCodeEmitter::getImmOpValue could not handle a bare symbol ref when encoding the c.li instruction.

This patch aligns the MCOperandPredicates with the predicates in the RISC-V assembly parser, denying bare symbol references in places they were accepted for aliases/compression in the past.

This is not NFC, as some assembly printing has changed with the li alias for addi, but this seems very minor. The crash when assembling qc.e.li a0, undef is now fixed.

@lenary lenary requested review from topperc and svs-quic June 28, 2025 01:28
@llvmbot llvmbot added backend:RISC-V mc Machine (object) code labels Jun 28, 2025
@llvmbot
Copy link
Member

llvmbot commented Jun 28, 2025

@llvm/pr-subscribers-backend-risc-v

@llvm/pr-subscribers-mc

Author: Sam Elliott (lenary)

Changes

This adds more tests for emitting Xqci instructions with symbols, to check the case where the instruction has a symbol but may get compressed. Some of these tests were causing crashes before.

This change then fixes the crashes, by changing the MCOperandPredicates, which are used for compression. This prevents compressing instructions with symbols when we have no way of applying the fixup to the compressed instruction, as is the case for CI-type instructions, but also for qc.li which takes a symbol with a specifier.


Full diff: https://github.com/llvm/llvm-project/pull/146184.diff

6 Files Affected:

  • (modified) llvm/lib/Target/RISCV/MCTargetDesc/RISCVBaseInfo.cpp (+3)
  • (modified) llvm/lib/Target/RISCV/RISCVInstrInfoC.td (+6-6)
  • (modified) llvm/lib/Target/RISCV/RISCVInstrInfoXqci.td (+6-1)
  • (modified) llvm/test/MC/RISCV/xqcibi-relocations.s (+16)
  • (modified) llvm/test/MC/RISCV/xqcilb-relocations.s (+18)
  • (modified) llvm/test/MC/RISCV/xqcili-relocations.s (+18)
diff --git a/llvm/lib/Target/RISCV/MCTargetDesc/RISCVBaseInfo.cpp b/llvm/lib/Target/RISCV/MCTargetDesc/RISCVBaseInfo.cpp
index d407beffcd7d1..835d38382acd5 100644
--- a/llvm/lib/Target/RISCV/MCTargetDesc/RISCVBaseInfo.cpp
+++ b/llvm/lib/Target/RISCV/MCTargetDesc/RISCVBaseInfo.cpp
@@ -12,9 +12,12 @@
 //===----------------------------------------------------------------------===//
 
 #include "RISCVBaseInfo.h"
+#include "MCTargetDesc/RISCVMCAsmInfo.h"
+#include "llvm/MC/MCExpr.h"
 #include "llvm/MC/MCInst.h"
 #include "llvm/MC/MCRegisterInfo.h"
 #include "llvm/MC/MCSubtargetInfo.h"
+#include "llvm/Support/Casting.h"
 #include "llvm/Support/raw_ostream.h"
 #include "llvm/TargetParser/TargetParser.h"
 #include "llvm/TargetParser/Triple.h"
diff --git a/llvm/lib/Target/RISCV/RISCVInstrInfoC.td b/llvm/lib/Target/RISCV/RISCVInstrInfoC.td
index 8252a9b170eb3..de235b3d8dde8 100644
--- a/llvm/lib/Target/RISCV/RISCVInstrInfoC.td
+++ b/llvm/lib/Target/RISCV/RISCVInstrInfoC.td
@@ -37,9 +37,9 @@ def uimmlog2xlennonzero : RISCVOp, ImmLeaf<XLenVT, [{
 def simm6 : RISCVSImmLeafOp<6> {
   let MCOperandPredicate = [{
     int64_t Imm;
-    if (MCOp.evaluateAsConstantImm(Imm))
-      return isInt<6>(Imm);
-    return MCOp.isBareSymbolRef();
+    if (!MCOp.evaluateAsConstantImm(Imm))
+      return false;
+    return isInt<6>(Imm);
   }];
 }
 
@@ -51,9 +51,9 @@ def simm6nonzero : RISCVOp,
   let OperandType = "OPERAND_SIMM6_NONZERO";
   let MCOperandPredicate = [{
     int64_t Imm;
-    if (MCOp.evaluateAsConstantImm(Imm))
-      return (Imm != 0) && isInt<6>(Imm);
-    return MCOp.isBareSymbolRef();
+    if (!MCOp.evaluateAsConstantImm(Imm))
+      return false;
+    return (Imm != 0) && isInt<6>(Imm);
   }];
 }
 
diff --git a/llvm/lib/Target/RISCV/RISCVInstrInfoXqci.td b/llvm/lib/Target/RISCV/RISCVInstrInfoXqci.td
index 64c5910293b55..240d91c03488b 100644
--- a/llvm/lib/Target/RISCV/RISCVInstrInfoXqci.td
+++ b/llvm/lib/Target/RISCV/RISCVInstrInfoXqci.td
@@ -146,7 +146,12 @@ def simm20_li : RISCVOp<XLenVT>,
     int64_t Imm;
     if (MCOp.evaluateAsConstantImm(Imm))
       return isInt<20>(Imm);
-    return MCOp.isBareSymbolRef();
+
+    if (!MCOp.isExpr())
+      return false;
+
+    return MCOp.getExpr()->getKind() == MCExpr::Specifier &&
+      cast<MCSpecifierExpr>(MCOp.getExpr())->getSpecifier() == RISCV::S_QC_ABS20;
   }];
 }
 
diff --git a/llvm/test/MC/RISCV/xqcibi-relocations.s b/llvm/test/MC/RISCV/xqcibi-relocations.s
index 0f7cc8c5787a1..6af496f353ee3 100644
--- a/llvm/test/MC/RISCV/xqcibi-relocations.s
+++ b/llvm/test/MC/RISCV/xqcibi-relocations.s
@@ -86,6 +86,22 @@ qc.e.bgeui s2, 24, same_section
 
 .option norelax
 
+## Enable compression/relaxation to check how symbols are handled.
+.option noexact
+.option relax
+
+# ASM: qc.bnei t1, 10, undef
+# OBJ: qc.beqi t1, 0xa, 0x42 <same_section_extern+0x16>
+# OBJ-NEXT: j 0x3e <same_section_extern+0x12>
+# OBJ-NEXT: R_RISCV_JAL undef{{$}}
+qc.bnei t1, 10, undef
+
+# ASM: qc.e.bgeui s0, 40, undef
+# OBJ-NEXT: qc.e.bltui s0, 0x28, 0x4c <same_section_extern+0x20>
+# OBJ-NEXT: j 0x48 <same_section_extern+0x1c>
+# OBJ-NEXT: R_RISCV_JAL undef{{$}}
+qc.e.bgeui s0, 40, undef
+
 .section .text.second, "ax", @progbits
 
 # ASM-LABEL: other_section:
diff --git a/llvm/test/MC/RISCV/xqcilb-relocations.s b/llvm/test/MC/RISCV/xqcilb-relocations.s
index ab08beed9be94..daf84a6b200c5 100644
--- a/llvm/test/MC/RISCV/xqcilb-relocations.s
+++ b/llvm/test/MC/RISCV/xqcilb-relocations.s
@@ -94,6 +94,24 @@ qc.e.jal same_section
 
 .option norelax
 
+## Enable compression/relaxation to check how symbols are handled.
+.option noexact
+.option relax
+
+qc.e.j undef
+# ASM: j undef
+# OBJ: qc.e.j 0x44 <same_section_extern+0x10>
+# OBJ-NEXT: R_RISCV_VENDOR QUALCOMM{{$}}
+# OBJ-NEXT: R_RISCV_CUSTOM195 undef{{$}}
+# OBJ-NEXT: R_RISCV_RELAX
+
+qc.e.jal undef
+# ASM: jal undef
+# OBJ: qc.e.jal 0x4a <same_section_extern+0x16>
+# OBJ-NEXT: R_RISCV_VENDOR QUALCOMM{{$}}
+# OBJ-NEXT: R_RISCV_CUSTOM195 undef{{$}}
+# OBJ-NEXT: R_RISCV_RELAX
+
 .section .text.other, "ax", @progbits
 
 # ASM-LABEL: other_section:
diff --git a/llvm/test/MC/RISCV/xqcili-relocations.s b/llvm/test/MC/RISCV/xqcili-relocations.s
index 866586236fa46..356c1c89a516d 100644
--- a/llvm/test/MC/RISCV/xqcili-relocations.s
+++ b/llvm/test/MC/RISCV/xqcili-relocations.s
@@ -99,6 +99,24 @@ qc.e.li s1, undef
 
 .option norelax
 
+## Enable compression/relaxation to check how symbols are handled.
+.option noexact
+.option relax
+
+# ASM: qc.li a1, %qc.abs20(undef)
+# OBJ-NEXT: qc.li a1, 0x0
+# OBJ-NEXT: R_RISCV_VENDOR QUALCOMM{{$}}
+# OBJ-NEXT: R_RISCV_CUSTOM192 undef{{$}}
+# OBJ-NEXT: R_RISCV_RELAX
+qc.li a1, %qc.abs20(undef)
+
+# ASM: qc.e.li a1, undef
+# OBJ-NEXT: qc.e.li a1, 0x0
+# OBJ-NEXT: R_RISCV_VENDOR QUALCOMM{{$}}
+# OBJ-NEXT: R_RISCV_CUSTOM194 undef{{$}}
+# OBJ-NEXT: R_RISCV_RELAX
+qc.e.li a1, undef
+
 .section .text.other, "ax", @progbits
 
 # ASM-LABEL: other_section:

@lenary

This comment was marked as outdated.

@lenary
Copy link
Member Author

lenary commented Jul 1, 2025

I've become more convinced that this change is "correct" by examining the behaviour in the assembler. Broadly, it's worth noting that isSImm6 in the assembler does not accept any symbols, so this aligns the MCOperandPredicate with that. This does not affect absolute symbol handling, because the parser turns absolute symbols into immediates as soon as it can, from what i can see with -show-inst (this doesn't apply to absolute symbols defined after the instruction is parsed, but these error at parse-time like an undefined symbol anyway)

I will revisit the structure of this PR:

  • 1 NFC (pre-)commit adding tests for instructions with absolute, relative, undef, and latedef (an absolute symbol def made after the symbol is used) symbols.
  • 1 commit to align the MCOperandPredicates with the parser predicates.

The second commit, if i get this right, should only change the behaviour for the qc.e.li a0, undef case which is currently crashing.

lenary added 2 commits June 30, 2025 21:34
This adds assembly test coverage for the following operand types:
- simm12
- uimm20_lui
- uimm20_auipc
- simm6
- simm6nonzero
- simm5
- simm5_plus1
- simm20_li
Before this change, many MCOperandPredicates accepted bare symbols but
there was no cohesive logic as to which did and which didn't, and it did
not correspond to where the parser did and did not accept bare symbols.

MCOperandPredicates are used by the assembly printer when choosing to
whether to use an alias, and by the assembler compression and
decompression mechanism. RISC-V uses both of these extensively.

Xqci had a bug with a compress pattern for `qc.e.li` (to `c.li`) which
was causing crashes because `RISCVMCCodeEmitter::getImmOpValue` could
not handle a bare symbol ref when encoding the `c.li` instruction.

This patch aligns the MCOperandPredicates with the predicates in the
RISC-V assembly parser, denying bare symbol references in places they
were accepted for aliases/compression in the past.

This is not NFC, as some assembly printing has changed with the `li`
alias for `addi`, but this seems very minor. The crash when assembling
`qc.e.li a0, undef` is now fixed.
@lenary lenary force-pushed the pr/riscv-xqci-compress-preds branch from 9f16659 to 002af0c Compare July 1, 2025 04:35
@lenary lenary changed the title [RISCV] Fix Compression with Symbols for Xqci [RISCV] Align MCOperandPredicates with AsmParser Jul 1, 2025
Copy link
Contributor

@svs-quic svs-quic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Xqci specific changes look good to me. Please wait for someone else to review the rest.

}

bool MCOperand::isSimpleSymbolRef(MCExpr::Spec &Specifier) const {
assert(isExpr() && "isBareSymbolRef expects only expressions");
Copy link
Contributor

@svs-quic svs-quic Jul 1, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you maybe want to change the assert to say isSimpleSymbolRef instead of isBareSymbolRef?

Copy link

github-actions bot commented Jul 1, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

Comment on lines 70 to 90
bool MCOperand::isSimpleSymbolRef(MCExpr::Spec &Specifier) const {
assert(isExpr() && "isSimpleSymbolRef expects only expressions");
const MCExpr *Expr = getExpr();
MCExpr::ExprKind Kind = getExpr()->getKind();
return Kind == MCExpr::SymbolRef &&
cast<MCSymbolRefExpr>(Expr)->getSpecifier() == 0;

switch (Kind) {
case MCExpr::Binary:
case MCExpr::Unary:
case MCExpr::Constant:
break;
case MCExpr::Target:
// It's not clear this is right, does MCTargetExpr need another virtual method?
break;
case MCExpr::SymbolRef:
Specifier = cast<MCSymbolRefExpr>(Expr)->getSpecifier();
return true;
case MCExpr::Specifier:
Specifier = cast<MCSpecifierExpr>(Expr)->getSpecifier();
return true;
}
return false;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@MaskRay it would be good to understand your view on this method - the aim is to effectively capture everywhere which is just a specifier and one symbol, extracting the specifier - like what RISCVAsmParser::classifySymbolRef does.

I haven't had to cover MCTargetExpr because only RISC-V uses this method right now, and that backend now uses MCSpecifierExpr. I guess maybe I should be using Expr->evaluateAsRelocatable(...)) and looking at the resulting MCValue, rather than adding a new virtual method?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking. I have migrated most targets to use MCSpecifierExpr.

@@ -327,12 +327,20 @@ def uimm16 : RISCVUImmOp<16>;
def uimm32 : RISCVUImmOp<32>;
def uimm48 : RISCVUImmOp<48>;
def uimm64 : RISCVUImmOp<64>;

def simm12 : RISCVSImmLeafOp<12> {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have the urge to call this something like simm12_lo to make it clearer that this accepts symbols with lo specifiers, so that it is clearer it is different from the rest of simm<N>. If I do, it will be in a NFC follow-up.

@MaskRay
Copy link
Member

MaskRay commented Jul 2, 2025

The introduction of MCOperand::isBareSymbolRef in https://reviews.llvm.org/D45385 (2018) seems problematic.
Generally, the assembly parser should treat expressions like sym, sym + 0, or (sym + 3) - 3 equivalently. Most targets don't have the "bare symbol" notation.

The use of return MCOp.isBareSymbolRef(); in RISC-V TableGen files appears to be an inappropriate overuse and should be eliminated.
Unless needed to resolve parser ambiguity (as noted in a patch you co-authored, though I can't recall), MCOperand::isBareSymbolRef should be avoided.
I recall that the parser ambiguity is due to an LLVM AsmMatcher issue. GNU Assembler's opcodes/riscv-opc.c driven parser doesn't have the issue.

Additionally, I've noticed that RISCVAsmParser.cpp tends to report errors during parsing more aggressively than other target parsers. In some cases, this seems unnecessary, and deferring certain errors to the relocation generation phase could be cleaner (e.g. error: %pltpcrel can only be used in a .word directive). While diagnostic quality might be affected, we can improve RISCVELFObjectWriter.cpp.

@@ -179,6 +179,7 @@ class MCOperand {
LLVM_ABI void print(raw_ostream &OS, const MCContext *Ctx = nullptr) const;
LLVM_ABI void dump() const;
LLVM_ABI bool isBareSymbolRef() const;
LLVM_ABI bool isSimpleSymbolRef(MCExpr::Spec &Specifier) const;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should try eliminating isBareSymbolRef instead of adding more variants...

@lenary
Copy link
Member Author

lenary commented Jul 2, 2025

I do see that only RISC-V and CSKY use MCOp.isBareSymbolRef(), and can see that we want to be better about symbol expressions.

I understand that you don't like how the asm parser reports errors right now, and that it could be deferred (like gnu as does), but this might give up some information in the messages. We also have to defer errors when we move to treating branch immediates as absolute addresses (like gnu as does), which I have the start of a prototype for in #133555 (which I think you might be recalling?). I do want to push on this work in the future. I think some people like the extra information, and I do think it's a drawback that the error messages get worse when deferred.

However, I am unable to start this major refactor now, especially with the release branch happening in about a week or so, and this aims to solve a crash.


I think we still have a bit of a problem for compression here (maybe not for aliases?), but I also know that only two of the compression patterns are the problematic ones - unfortunately we think they are pretty important when someone writes a qc.e.li with a small constant immediate.

As the code flow currently works, the following happens:

  1. Either RISCVAsmPrinter::EmitToStreamer or RISCVAsmParser::emitToStreamer are called, depending on whether the instruction came from codegen or from the assembler.
  2. Both of these attempt to compress the instruction using RISCVRVC::compress before calling emitInstruction. RISCVRVC::compress will use the MCOperandPredicate to check whether an operand is compressible, and if so, just compress it. Right now, these contain a check that says "if there's a bare symbol, it's ok to compress".
  3. emitInstruction (relaxall=false) eventually tries to emit the instruction as data, either to the existing fragment or to a new, relaxable, one (the latter if the instruction may need relaxation)
  4. RISCVMCCodeEmitter::encodeInstruction is called, which calls getBinaryCodeForInstr, which eventually calls RISCVMCCodeEmitter::getImmOpValue on the immediates.
  5. This fails the assert at the end, because we don't know the right fixup for a CI-type instruction.

It sounds like a better short-term solution from your PoV is:

  • Abandon this change.
  • Add a CI-format fixup, which we can relax, but we cannot emit a relocation for.
  • Add the right logic to mayNeedRelaxation and relaxInstruction so that c.li a0, symbol is turned back into qc.e.li a0, symbol.

This approximately echoes how we solve this for almost everywhere else, except that everywhere else we already have a fixup and that fixup has a corresponding relocation. I'm not sure we really want a fixup which doesn't have a corresponding relocation, but at least that direction is less intrusive on target-independent code.


I'll upload that as a separate patch, and we can come back to the right way to sort out this issue.

lenary added a commit to lenary/llvm-project that referenced this pull request Jul 2, 2025
There's a long comment explaining this approach in RISCVInstrInfoXqci.td

This is presented as an alternative to llvm#146184.
@lenary
Copy link
Member Author

lenary commented Jul 2, 2025

@MaskRay #146763 is the alternative implementation, as described. Turns out, the MCOperandPredicates are still a problem, but in a totally different way. There's a long comment in RISCVInstrInfoXqci.td on that PR explaining how the PR works.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backend:RISC-V mc Machine (object) code
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants