Skip to content

[clang][Preprocessor] Add peekNextPPToken, makes look ahead next token without side-effects #143898

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Jun 24, 2025
Merged
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
10 changes: 5 additions & 5 deletions clang/include/clang/Lex/Lexer.h
Original file line number Diff line number Diff line change
Expand Up @@ -124,7 +124,7 @@ class Lexer : public PreprocessorLexer {
//===--------------------------------------------------------------------===//
// Context that changes as the file is lexed.
// NOTE: any state that mutates when in raw mode must have save/restore code
// in Lexer::isNextPPTokenLParen.
// in Lexer::peekNextPPToken.

// BufferPtr - Current pointer into the buffer. This is the next character
// to be lexed.
Expand Down Expand Up @@ -642,10 +642,10 @@ class Lexer : public PreprocessorLexer {
BufferPtr = TokEnd;
}

/// isNextPPTokenLParen - Return 1 if the next unexpanded token will return a
/// tok::l_paren token, 0 if it is something else and 2 if there are no more
/// tokens in the buffer controlled by this lexer.
unsigned isNextPPTokenLParen();
/// peekNextPPToken - Return std::nullopt if there are no more tokens in the
/// buffer controlled by this lexer, otherwise return the next unexpanded
/// token.
std::optional<Token> peekNextPPToken();

//===--------------------------------------------------------------------===//
// Lexer character reading interfaces.
Expand Down
8 changes: 7 additions & 1 deletion clang/include/clang/Lex/Preprocessor.h
Original file line number Diff line number Diff line change
Expand Up @@ -2288,7 +2288,9 @@ class Preprocessor {
/// Determine whether the next preprocessor token to be
/// lexed is a '('. If so, consume the token and return true, if not, this
/// method should have no observable side-effect on the lexed tokens.
bool isNextPPTokenLParen();
bool isNextPPTokenLParen() {
return peekNextPPToken().value_or(Token{}).is(tok::l_paren);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like that we are creating a new token everytime - I'm not sure that it matters though

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The lexer is almost always on the hot path but Tokens are small... I think it may be worth verifying that this has no performance difference from an approach without value_or.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use an if-stmt to avoid create a dummy Token{}. WDYT? (I'm interested in this performance overhead, give me some time to test it).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's what I was thinking of as an alternative, but Erich's suggestion also seems like a reasonable idea worth exploring. It's a bummer to go back to foo == tok::blah || foo == tok::yada instead of using IsOneOf(), but if that gets too frustrating, we could add a helper function like isTokenKindOneOf() that takes a pack of token kinds.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An additional alternative, is to just have

template<TokenKind TK1, TokenKind...TKs>
isNextPPTokenOneOf();

Then just ask the question we REALLY care about.

So uses of THIS function become:
isNextPPTokenOneOf<tok::l_paren>()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool! This looks good to me, let me give a try.

}

private:
/// Identifiers used for SEH handling in Borland. These are only
Expand Down Expand Up @@ -2667,6 +2669,10 @@ class Preprocessor {

void removeCachedMacroExpandedTokensOfLastLexer();

/// Peek the next token. If so, return the token, if not, this
/// method should have no observable side-effect on the lexed tokens.
std::optional<Token> peekNextPPToken();

/// After reading "MACRO(", this method is invoked to read all of the formal
/// arguments specified for the macro invocation. Returns null on error.
MacroArgs *ReadMacroCallArgumentList(Token &MacroName, MacroInfo *MI,
Expand Down
7 changes: 3 additions & 4 deletions clang/include/clang/Lex/TokenLexer.h
Original file line number Diff line number Diff line change
Expand Up @@ -139,10 +139,9 @@ class TokenLexer {
void Init(const Token *TokArray, unsigned NumToks, bool DisableMacroExpansion,
bool OwnsTokens, bool IsReinject);

/// If the next token lexed will pop this macro off the
/// expansion stack, return 2. If the next unexpanded token is a '(', return
/// 1, otherwise return 0.
unsigned isNextTokenLParen() const;
/// If the next token lexed will pop this macro off the expansion stack,
/// return std::nullopt, otherwise return the next unexpanded token.
std::optional<Token> peekNextPPToken() const;

/// Lex and return a token from this macro stream.
bool Lex(Token &Tok);
Expand Down
21 changes: 11 additions & 10 deletions clang/lib/Lex/Lexer.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -3200,18 +3200,19 @@ bool Lexer::LexEndOfFile(Token &Result, const char *CurPtr) {
return PP->HandleEndOfFile(Result, isPragmaLexer());
}

/// isNextPPTokenLParen - Return 1 if the next unexpanded token lexed from
/// the specified lexer will return a tok::l_paren token, 0 if it is something
/// else and 2 if there are no more tokens in the buffer controlled by the
/// lexer.
unsigned Lexer::isNextPPTokenLParen() {
/// peekNextPPToken - Return std::nullopt if there are no more tokens in the
/// buffer controlled by this lexer, otherwise return the next unexpanded
/// token.
std::optional<Token> Lexer::peekNextPPToken() {
assert(!LexingRawMode && "How can we expand a macro from a skipping buffer?");

if (isDependencyDirectivesLexer()) {
if (NextDepDirectiveTokenIndex == DepDirectives.front().Tokens.size())
return 2;
return DepDirectives.front().Tokens[NextDepDirectiveTokenIndex].is(
tok::l_paren);
return std::nullopt;
Token Result;
(void)convertDependencyDirectiveToken(
DepDirectives.front().Tokens[NextDepDirectiveTokenIndex], Result);
return Result;
}

// Switch to 'skipping' mode. This will ensure that we can lex a token
Expand Down Expand Up @@ -3240,8 +3241,8 @@ unsigned Lexer::isNextPPTokenLParen() {
LexingRawMode = false;

if (Tok.is(tok::eof))
return 2;
return Tok.is(tok::l_paren);
return std::nullopt;
return Tok;
}

/// Find the end of a version control conflict marker.
Expand Down
32 changes: 15 additions & 17 deletions clang/lib/Lex/PPMacroExpansion.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -418,42 +418,40 @@ static bool isTrivialSingleTokenExpansion(const MacroInfo *MI,
return !llvm::is_contained(MI->params(), II);
}

/// isNextPPTokenLParen - Determine whether the next preprocessor token to be
/// lexed is a '('. If so, consume the token and return true, if not, this
/// method should have no observable side-effect on the lexed tokens.
bool Preprocessor::isNextPPTokenLParen() {
/// isNextPPTokenLParen - Peek the next token. If so, return the token, if not,
/// this method should have no observable side-effect on the lexed tokens.
std::optional<Token> Preprocessor::peekNextPPToken() {
// Do some quick tests for rejection cases.
unsigned Val;
std::optional<Token> Val;
if (CurLexer)
Val = CurLexer->isNextPPTokenLParen();
Val = CurLexer->peekNextPPToken();
else
Val = CurTokenLexer->isNextTokenLParen();
Val = CurTokenLexer->peekNextPPToken();

if (Val == 2) {
if (!Val) {
// We have run off the end. If it's a source file we don't
// examine enclosing ones (C99 5.1.1.2p4). Otherwise walk up the
// macro stack.
if (CurPPLexer)
return false;
return std::nullopt;
for (const IncludeStackInfo &Entry : llvm::reverse(IncludeMacroStack)) {
if (Entry.TheLexer)
Val = Entry.TheLexer->isNextPPTokenLParen();
Val = Entry.TheLexer->peekNextPPToken();
else
Val = Entry.TheTokenLexer->isNextTokenLParen();
Val = Entry.TheTokenLexer->peekNextPPToken();

if (Val != 2)
if (Val)
break;

// Ran off the end of a source file?
if (Entry.ThePPLexer)
return false;
return std::nullopt;
}
}

// Okay, if we know that the token is a '(', lex it and return. Otherwise we
// have found something that isn't a '(' or we found the end of the
// translation unit. In either case, return false.
return Val == 1;
// Okay, we found the token and return. Otherwise we found the end of the
// translation unit.
return Val;
}

/// HandleMacroExpandedIdentifier - If an identifier token is read that is to be
Expand Down
10 changes: 5 additions & 5 deletions clang/lib/Lex/TokenLexer.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -921,13 +921,13 @@ bool TokenLexer::pasteTokens(Token &LHSTok, ArrayRef<Token> TokenStream,
}

/// isNextTokenLParen - If the next token lexed will pop this macro off the
/// expansion stack, return 2. If the next unexpanded token is a '(', return
/// 1, otherwise return 0.
unsigned TokenLexer::isNextTokenLParen() const {
/// expansion stack, return std::nullopt, otherwise return the next unexpanded
/// token.
std::optional<Token> TokenLexer::peekNextPPToken() const {
// Out of tokens?
if (isAtEnd())
return 2;
return Tokens[CurTokenIdx].is(tok::l_paren);
return std::nullopt;
return Tokens[CurTokenIdx];
}

/// isParsingPreprocessorDirective - Return true if we are in the middle of a
Expand Down
Loading