[RISCV] Handle scalable ops with < EEW / 2 narrow types in combineBinOp_VLToVWBinOp_VL #84158

Merged
14 changes: 3 additions & 11 deletions llvm/lib/Target/RISCV/RISCVISelLowering.cpp
@@ -13652,16 +13652,6 @@ struct NodeExtensionHelper {
   if (!VT.isVector())
     break;

-  SDValue NarrowElt = OrigOperand.getOperand(0);
-  MVT NarrowVT = NarrowElt.getSimpleValueType();
-
-  unsigned ScalarBits = VT.getScalarSizeInBits();
-  unsigned NarrowScalarBits = NarrowVT.getScalarSizeInBits();
-
-  // Ensure the extension's semantic is equivalent to rvv vzext or vsext.
-  if (ScalarBits != NarrowScalarBits * 2)
Collaborator

If, after the prior change that moved this transform to after legalize types, the i1 vector case is the only one that still needs this restriction to keep the transform between legalize types and legalize ops, why not simply check whether the narrow VT is an i1 vector here? Wouldn't that be less disruptive than moving the combine to after legalize ops?

Note that you should also be asserting that both narrow and wide are legal types.

Contributor Author

> If, after the prior change that moved this transform to after legalize types, the i1 vector case is the only one that still needs this restriction to keep the transform between legalize types and legalize ops, why not simply check whether the narrow VT is an i1 vector here?

I moved it to after the legalize vector ops phase since we weren't checking for i1 vectors in any of the other _VL nodes. So I think there was already an implicit invariant here that the combine would only run after legalize ops, and it seemed safer to just be explicit about it.

> Wouldn't that be less disruptive than moving the combine after legalize ops?

Since the combine was already happening after legalize ops for the _VL nodes, this should only affect the ISD::ADD/SUB/MUL nodes that were added in #76785.
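
To make the difference between the two guards in this thread concrete, here is a minimal standalone sketch. It assumes the combiner's levels are ordered BeforeLegalizeTypes, AfterLegalizeTypes, AfterLegalizeVectorOps, AfterLegalizeDAG, and that isBeforeLegalize() and isBeforeLegalizeOps() compare against that ordering as modelled below. The enum and free functions are re-declared locally for illustration, runsWithOldGuard and runsWithNewGuard are hypothetical names, and nothing here links against LLVM.

```cpp
#include <cassert>

// Local model of the DAG combiner's phases. The names mirror LLVM's
// CombineLevel, but this is an illustrative copy, not the real header.
enum CombineLevel {
  BeforeLegalizeTypes,
  AfterLegalizeTypes,
  AfterLegalizeVectorOps,
  AfterLegalizeDAG
};

// Assumed behaviour: true only for the first combine run.
bool isBeforeLegalize(CombineLevel L) { return L == BeforeLegalizeTypes; }

// Assumed behaviour: true until vector ops have been legalized.
bool isBeforeLegalizeOps(CombineLevel L) { return L < AfterLegalizeVectorOps; }

// Hypothetical helpers: would the combine be attempted at level L?
// Old guard: `if (DCI.isBeforeLegalize()) return SDValue();`
bool runsWithOldGuard(CombineLevel L) { return !isBeforeLegalize(L); }
// New guard: `if (DCI.isBeforeLegalizeOps()) return SDValue();`
bool runsWithNewGuard(CombineLevel L) { return !isBeforeLegalizeOps(L); }
```

Under these assumptions the only level where the two guards disagree is AfterLegalizeTypes: the old guard lets the combine run there, while the new guard defers it until vector ops have been legalized.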

Contributor Author

> Note that you should also be asserting that both narrow and wide are legal types.

I've moved the narrow type assert in 0ef61ed so that we now check the narrow type for all extend node types, and we have an assert that the wide type is legal here:

assert(DAG.getTargetLoweringInfo().isTypeLegal(Root->getValueType(0)));

-    break;
-
   SupportsZExt = Opc == ISD::ZERO_EXTEND;
   SupportsSExt = Opc == ISD::SIGN_EXTEND;

@@ -14112,7 +14102,9 @@ static SDValue combineBinOp_VLToVWBinOp_VL(SDNode *N,
     TargetLowering::DAGCombinerInfo &DCI,
     const RISCVSubtarget &Subtarget) {
   SelectionDAG &DAG = DCI.DAG;
-  if (DCI.isBeforeLegalize())
+  // Don't perform this until types are legalized and any legal i1 types are
+  // custom lowered to avoid introducing unselectable V{S,Z}EXT_VLs.
+  if (DCI.isBeforeLegalizeOps())
Collaborator

I'm not sure this is 100% reliable. It's theoretically possible for an i1 vector to be created by the DAG combiner after legalize ops. The last DAG combine stage also runs the legalizer on every node as part of its worklist, so it's not illegal for an i1 zext to be created, as it would get legalized before isel.

Member

@sun-jacobi Mar 16, 2024

Maybe it is better for us to still check whether the op is legal.
But as @lukel97 said, the original EEW / 2 check, which the VP intrinsics already do, could be removed, AFAIU.

Contributor Author

I've updated the PR to instead check that the narrow element type isn't i1 across the different possible extend ops.

     return SDValue();

   if (!NodeExtensionHelper::isSupportedRoot(N))
38 changes: 20 additions & 18 deletions llvm/test/CodeGen/RISCV/rvv/vscale-vw-web-simplification.ll
@@ -283,18 +283,19 @@ define <vscale x 2 x i32> @vwop_vscale_sext_i8i32_multiple_users(ptr %x, ptr %y,
 ;
 ; FOLDING-LABEL: vwop_vscale_sext_i8i32_multiple_users:
 ; FOLDING: # %bb.0:
-; FOLDING-NEXT: vsetvli a3, zero, e32, m1, ta, ma
+; FOLDING-NEXT: vsetvli a3, zero, e16, mf2, ta, ma
 ; FOLDING-NEXT: vle8.v v8, (a0)
 ; FOLDING-NEXT: vle8.v v9, (a1)
 ; FOLDING-NEXT: vle8.v v10, (a2)
-; FOLDING-NEXT: vsext.vf4 v11, v8
-; FOLDING-NEXT: vsext.vf4 v8, v9
-; FOLDING-NEXT: vsext.vf4 v9, v10
-; FOLDING-NEXT: vmul.vv v8, v11, v8
-; FOLDING-NEXT: vadd.vv v10, v11, v9
-; FOLDING-NEXT: vsub.vv v9, v11, v9
-; FOLDING-NEXT: vor.vv v8, v8, v10
-; FOLDING-NEXT: vor.vv v8, v8, v9
+; FOLDING-NEXT: vsext.vf2 v11, v8
+; FOLDING-NEXT: vsext.vf2 v8, v9
+; FOLDING-NEXT: vsext.vf2 v9, v10
+; FOLDING-NEXT: vwmul.vv v10, v11, v8
+; FOLDING-NEXT: vwadd.vv v8, v11, v9
+; FOLDING-NEXT: vwsub.vv v12, v11, v9
+; FOLDING-NEXT: vsetvli zero, zero, e32, m1, ta, ma
+; FOLDING-NEXT: vor.vv v8, v10, v8
+; FOLDING-NEXT: vor.vv v8, v8, v12
 ; FOLDING-NEXT: ret
   %a = load <vscale x 2 x i8>, ptr %x
   %b = load <vscale x 2 x i8>, ptr %y
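
The updated expectations above suggest how an extend narrower than EEW / 2 is handled: the i8 operands are first sign-extended to i16 with vsext.vf2, and the widening op then produces the i32 result. The sketch below is a hypothetical planner for that split, assuming power-of-two element widths. WideningPlan and planWidening are illustrative names, not LLVM APIs, and the i1 rejection follows the review discussion rather than the exact code that landed.

```cpp
#include <cassert>
#include <optional>

// Hypothetical description of how an extend from NarrowBits to WideBits is
// split around a widening op whose result elements have WideBits.
struct WideningPlan {
  unsigned PreExtendFactor; // 1 = no vsext/vzext needed, 2 = .vf2, 4 = .vf4
  unsigned OperandBits;     // element width the widening op reads (WideBits / 2)
};

// Assumes element widths are powers of two, as RVV SEW values are.
std::optional<WideningPlan> planWidening(unsigned NarrowBits,
                                         unsigned WideBits) {
  // i1 vectors are mask types; the thread above rejects them because an
  // extend from i1 cannot be selected as vsext/vzext.
  if (NarrowBits <= 1)
    return std::nullopt;
  // The narrow type must fit in the widening op's operand width.
  unsigned OperandBits = WideBits / 2;
  if (NarrowBits > OperandBits || OperandBits % NarrowBits != 0)
    return std::nullopt;
  return WideningPlan{OperandBits / NarrowBits, OperandBits};
}
```

For the i8 to i32 case in this test, planWidening(8, 32) yields a pre-extend factor of 2 at an operand width of 16, which lines up with the vsext.vf2 followed by vwmul.vv at e16 in the new output. The previously supported case, planWidening(16, 32), needs no pre-extend at all.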
@@ -563,18 +564,19 @@ define <vscale x 2 x i32> @vwop_vscale_zext_i8i32_multiple_users(ptr %x, ptr %y,
 ;
 ; FOLDING-LABEL: vwop_vscale_zext_i8i32_multiple_users:
 ; FOLDING: # %bb.0:
-; FOLDING-NEXT: vsetvli a3, zero, e32, m1, ta, ma
+; FOLDING-NEXT: vsetvli a3, zero, e16, mf2, ta, ma
 ; FOLDING-NEXT: vle8.v v8, (a0)
 ; FOLDING-NEXT: vle8.v v9, (a1)
 ; FOLDING-NEXT: vle8.v v10, (a2)
-; FOLDING-NEXT: vzext.vf4 v11, v8
-; FOLDING-NEXT: vzext.vf4 v8, v9
-; FOLDING-NEXT: vzext.vf4 v9, v10
-; FOLDING-NEXT: vmul.vv v8, v11, v8
-; FOLDING-NEXT: vadd.vv v10, v11, v9
-; FOLDING-NEXT: vsub.vv v9, v11, v9
-; FOLDING-NEXT: vor.vv v8, v8, v10
-; FOLDING-NEXT: vor.vv v8, v8, v9
+; FOLDING-NEXT: vzext.vf2 v11, v8
+; FOLDING-NEXT: vzext.vf2 v8, v9
+; FOLDING-NEXT: vzext.vf2 v9, v10
+; FOLDING-NEXT: vwmulu.vv v10, v11, v8
+; FOLDING-NEXT: vwaddu.vv v8, v11, v9
+; FOLDING-NEXT: vwsubu.vv v12, v11, v9
+; FOLDING-NEXT: vsetvli zero, zero, e32, m1, ta, ma
+; FOLDING-NEXT: vor.vv v8, v10, v8
+; FOLDING-NEXT: vor.vv v8, v8, v12
 ; FOLDING-NEXT: ret
   %a = load <vscale x 2 x i8>, ptr %x
   %b = load <vscale x 2 x i8>, ptr %y
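
As a sanity check on why the new sequences in these two tests should compute the same values as the old ones, the following scalar sketch models a single lane. It assumes the widening ops read 16-bit operands and produce a 32-bit result that wraps modulo 2^32 in the unsigned case. The helper functions are named after the instructions for readability only; they are illustrative models, not intrinsics or LLVM APIs.

```cpp
#include <cassert>
#include <cstdint>

// One-lane models of the widening ops: SEW=16 operands, 2*SEW=32 result.
int32_t vwmul(int16_t A, int16_t B) { return int32_t(A) * int32_t(B); }
int32_t vwadd(int16_t A, int16_t B) { return int32_t(A) + int32_t(B); }
int32_t vwsub(int16_t A, int16_t B) { return int32_t(A) - int32_t(B); }
uint32_t vwmulu(uint16_t A, uint16_t B) { return uint32_t(A) * uint32_t(B); }
uint32_t vwaddu(uint16_t A, uint16_t B) { return uint32_t(A) + uint32_t(B); }
uint32_t vwsubu(uint16_t A, uint16_t B) { return uint32_t(A) - uint32_t(B); }

// Exhaustively compares, for every pair of i8 inputs, the old sequence
// (extend to 32 bits, then a plain 32-bit op) with the new one
// (extend to 16 bits, then a widening op).
bool newSequenceMatchesOld() {
  for (int X = -128; X <= 127; ++X) {
    for (int Y = -128; Y <= 127; ++Y) {
      int8_t A = int8_t(X), B = int8_t(Y);
      int32_t A32 = A, B32 = B; // old: vsext.vf4
      int16_t A16 = A, B16 = B; // new: vsext.vf2
      if (vwmul(A16, B16) != A32 * B32 || vwadd(A16, B16) != A32 + B32 ||
          vwsub(A16, B16) != A32 - B32)
        return false;

      uint8_t UA = uint8_t(A), UB = uint8_t(B);
      uint32_t UA32 = UA, UB32 = UB; // old: vzext.vf4
      uint16_t UA16 = UA, UB16 = UB; // new: vzext.vf2
      if (vwmulu(UA16, UB16) != UA32 * UB32 ||
          vwaddu(UA16, UB16) != UA32 + UB32 ||
          vwsubu(UA16, UB16) != UA32 - UB32)
        return false;
    }
  }
  return true;
}
```

The unsigned subtraction is the case worth a second look: both the old 32-bit vsub.vv on zero-extended inputs and the modelled vwsubu wrap to the same 32-bit value when the second operand is larger.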