Skip to content

Integer range checks should lift outside of lossless truncations #60683

Open
@scottmcm

Description

@scottmcm

I was experimenting in Rust with implementing the ASCII versions of certain unicode predicates in terms of an implementation on ASCII bytes, something like this https://rust.godbolt.org/z/c495YKTsM

pub fn demo_is_ascii_digit(c: char) -> bool {
    if let Some(c) = AsciiChar::new(c) {
        c.is_digit()
    } else {
        false
    }
}

Unfortunately, that optimizes much worse than checking against the 32-bit type directly:

define noundef zeroext i1 @_ZN10playground19demo_is_ascii_digit17h37e7c57418c2459fE(i32 noundef %c) unnamed_addr #0 {
start:
  %_4.i = icmp ult i32 %c, 128
  %_7.i = trunc i32 %c to i8
  %0 = add i8 %_7.i, -48
  %1 = icmp ult i8 %0, 10
  %.0 = and i1 %_4.i, %1
  ret i1 %.0
}

define noundef zeroext i1 @_ZN10playground18std_is_ascii_digit17he953dd4cafd53a1eE(i32 noundef %0) unnamed_addr #0 {
start:
  %1 = add i32 %0, -48
  %2 = icmp ult i32 %1, 10
  ret i1 %2
}

Alive2 confirms that converting the former there into the latter is legal: https://alive2.llvm.org/ce/z/_vhQJE

But opt seemingly can't do it today: https://llvm.godbolt.org/z/zdsEsvrsP

Perhaps the way to get there would be to do something like this:

 start:
   %_4.i = icmp ult i32 %c, 128
-  %_7.i = trunc i32 %c to i8
-  %0 = add i8 %_7.i, -48
+  %0 = add i32 %c, -48
-  %1 = icmp ult i8 %0, 10
+  %1 = icmp ult i32 %0, 10
   %.0 = and i1 %_4.i, %1
   ret i1 %.0

Since alive2 says that's legal https://alive2.llvm.org/ce/z/bETvXP and after that other existing optimizations can get the rest of the way: https://llvm.godbolt.org/z/aEhThW96M

I tried to make a C++ repro as well, which came out with a slightly different, but still suboptimal, output: https://cpp.godbolt.org/z/r5xGxa8bc

bool is_ascii_char(uint32_t c) {
    if (c < 128) {
        auto b = (uint8_t)c;
        return (b >= (uint8_t)'0') && (b <= (uint8_t)'9');
    } else {
        return false;
    }
}
define dso_local noundef zeroext i1 @_Z13is_ascii_charj(i32 noundef %0) local_unnamed_addr #0 {
  %2 = icmp ult i32 %0, 128
  %3 = and i32 %0, 254    ; <-- the comparisons got widened, but the `and` didn't disappear
  %4 = add nsw i32 %3, -48
  %5 = icmp ult i32 %4, 10
  %6 = select i1 %2, i1 %5, i1 false
  ret i1 %6
}

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions