Report partial matches

Hello,

First, this issue's description may overlap a bit with https://github.com/rust-lang/regex/issues/425, by attempting to do what https://github.com/rust-lang/regex/issues/25 wanted.
However, I believe the solution described below to handle it would probably be much easier, and I think it would be a good thing to handle my use case even if #425 were implemented (even though it could be worked around by using `longest_potential_match_len` in such a world).
So, I thought it was worth it suggesting this intermediate solution, that would hopefully be reasonable to implement even though #425 appears to not be as of today, and solve at least some issues :)

#### Problem description

I'm trying to use regexes from streaming `nom` parsers, for parsing SMTP information.

In order to do so, the nom way of doing things is basically to, when receiving a packet, optimistically parse it, and if the parser reports that it needs additional data to proceed, then wait for enough data to come in and parse again -- this way most matches are done in one pass, even though it means one additional case for the rare case where a message was split between two packets.

However, in order to use regexes from this, there has to be a way to be able to know whether the regex didn't match because it couldn't match, or if it didn't match because it didn't have a complete match yet.

#### Proposed solution

The solution I can think of is simple with only basic theory of automata -- not having any idea how `regex`'s engine works in practice, I cannot say whether it would be simple in practice, though.

The idea is the following: if the automaton has reached a failing state before the end of the input, return that there can be no match. If it has not, and reaches the end of the input in a start or intermediate state, return that there may be a match that would come up if there were more input.

Note that this scheme works well only for regexes that start with `^` or similar -- but any regex that doesn't would potentially match later on, which means that this would be working as intended.

As a further evolution that might be enough to handle part of #425's problem too, it might make sense to also report when the regex was last in its start state, so that the caller could just buffer starting from the last time the regex was in its start state and re-run from that last position the next time. It's less optimized than doing it all in one pass, but hopefully would be fast enough (by making the user rescan only the parts of the buffer that could contain a match), while unlocking additional use cases.

What do you think about this idea? Would it be simple enough to deserve implementing, until #425 would be solved, or would I be better off implementing this myself based on the `regex_automata` crate?

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Report partial matches #678

Problem description

Proposed solution

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Report partial matches #678

Description

Problem description

Proposed solution

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions