support empty alternations #524

Closed
@BurntSushi

Today, if you try to compile a regex with an empty alternation, e.g., a||b, then you'll get this error message:

alternations cannot currently contain empty sub-expressions

When I initially built the regex crate, I don't think I was clear on what an empty alternation meant, so I simply made them illegal. However, an empty alternation should have the same match semantics as an empty regex. That is, a||b should match a, b or the empty string.
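
To make the intended semantics concrete, here is a minimal sketch that treats an alternation as a set of literal branches and asks whether an input matches the whole pattern. The helper `alternation_matches` is hypothetical and is not part of the regex crate; it assumes the pattern contains only literal characters and `|`, with no escapes, groups, or repetition operators.

```rust
/// Hypothetical helper (not from the regex crate): returns true if `input`
/// matches the whole alternation `pattern`.
/// Assumption: branches are plain literals separated by `|`.
fn alternation_matches(pattern: &str, input: &str) -> bool {
    // `split('|')` yields "" for an empty branch, e.g. "a||b" -> ["a", "", "b"],
    // so an empty branch matches exactly the empty string.
    pattern.split('|').any(|branch| branch == input)
}
```

Under these assumptions, `alternation_matches("a||b", "")` is true, as are the calls with `"a"` and `"b"`, while `"c"` is rejected. A pattern with no empty branch, such as `a|b`, does not match the empty string.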

When I rewrote the regex-syntax crate, I specifically made sure to support empty alternations, which I believe were forbidden in the previous version of regex-syntax. The intent was to propagate that through the regex compiler. However, when I did that, I discovered that it did not implement the correct match semantics. Fixing it did not seem easy, so I simply made the compiler return an error if it found an empty alternate:

regex/src/compile.rs

Lines 491 to 500 in 488fe56

if prev_entry == self.insts.len() {
    // TODO(burntsushi): It is kind of silly that we don't support
    // empty-subexpressions in alternates, but it is supremely
    // awkward to support them in the existing compiler
    // infrastructure. This entire compiler needs to be thrown out
    // anyway, so don't feel too bad.
    return Err(Error::Syntax(
        "alternations cannot currently contain \
         empty sub-expressions".to_string()));
}
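
The check in that snippet works by comparing the instruction count before and after compiling a branch: if nothing was emitted, the branch was empty. The toy below sketches that idea in isolation. `Inst` and `compile_branches` are hypothetical names invented for illustration, and the instruction set is far simpler than the real compiler's (no split or jump instructions, literal branches only).

```rust
/// Toy instruction set; an assumption for illustration only.
#[derive(Debug, PartialEq)]
enum Inst {
    Char(char),
    Match,
}

/// Compiles each literal branch in turn and rejects empty ones,
/// mirroring the `prev_entry == self.insts.len()` check.
fn compile_branches(branches: &[&str]) -> Result<Vec<Inst>, String> {
    let mut insts = Vec::new();
    for branch in branches {
        // Remember where this branch's instructions start.
        let prev_entry = insts.len();
        for c in branch.chars() {
            insts.push(Inst::Char(c));
        }
        // No new instructions means the branch was empty.
        if prev_entry == insts.len() {
            return Err("alternations cannot currently contain \
                        empty sub-expressions".to_string());
        }
        insts.push(Inst::Match);
    }
    Ok(insts)
}
```

With this sketch, `compile_branches(&["a", "b"])` succeeds, while `compile_branches(&["a", "", "b"])` returns the error, since the second branch emits no instructions.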

Part of my plans for the future are to rethink a lot of the regex internals, and the compiler itself is at the top of that list. So I plan to tackle this problem when I rework the compiler.
