From 34e7e5fc28ff095241577bb15f77a850b9a90152 Mon Sep 17 00:00:00 2001 From: Matthew Woodcraft Date: Sat, 5 Mar 2022 22:29:25 +0000 Subject: [PATCH 01/24] tokens.md: move the link reference definitions to the end of the file --- src/tokens.md | 77 +++++++++++++++++++++++++-------------------------- 1 file changed, 37 insertions(+), 40 deletions(-) diff --git a/src/tokens.md b/src/tokens.md index 5516fb7b3..9a3484400 100644 --- a/src/tokens.md +++ b/src/tokens.md @@ -430,8 +430,6 @@ Note that the Rust syntax considers `-1i8` as an application of the [unary minus operator] to an integer literal `1i8`, rather than a single integer literal. -[unary minus operator]: expressions/operator-expr.md#negation-operators - #### Tuple index > **Lexer**\ @@ -542,8 +540,6 @@ Lifetime parameters and [loop labels] use LIFETIME_OR_LABEL tokens. Any LIFETIME_TOKEN will be accepted by the lexer, and for example, can be used in macros. -[loop labels]: expressions/loop-expr.md - ## Punctuation Punctuation symbol tokens are listed here for completeness. Their individual @@ -609,6 +605,41 @@ them are referred to as "token trees" in [macros]. The three types of brackets | `[` `]` | Square brackets | | `(` `)` | Parentheses | +## Reserved prefixes + +> **Lexer 2021+**\ +> RESERVED_TOKEN_DOUBLE_QUOTE : ( IDENTIFIER_OR_KEYWORD _Except `b` or `r` or `br`_ | `_` ) `"`\ +> RESERVED_TOKEN_SINGLE_QUOTE : ( IDENTIFIER_OR_KEYWORD _Except `b`_ | `_` ) `'`\ +> RESERVED_TOKEN_POUND : ( IDENTIFIER_OR_KEYWORD _Except `r` or `br`_ | `_` ) `#` + +Some lexical forms known as _reserved prefixes_ are reserved for future use. + +Source input which would otherwise be lexically interpreted as a non-raw identifier (or a keyword or `_`) which is immediately followed by a `#`, `'`, or `"` character (without intervening whitespace) is identified as a reserved prefix. 
+ +Note that raw identifiers, raw string literals, and raw byte string literals may contain a `#` character but are not interpreted as containing a reserved prefix. + +Similarly the `r`, `b`, and `br` prefixes used in raw string literals, byte literals, byte string literals, and raw byte string literals are not interpreted as reserved prefixes. + +> **Edition Differences**: Starting with the 2021 edition, reserved prefixes are reported as an error by the lexer (in particular, they cannot be passed to macros). +> +> Before the 2021 edition, reserved prefixes are accepted by the lexer and interpreted as multiple tokens (for example, one token for the identifier or keyword, followed by a `#` token). +> +> Examples accepted in all editions: +> ```rust +> macro_rules! lexes {($($_:tt)*) => {}} +> lexes!{a #foo} +> lexes!{continue 'foo} +> lexes!{match "..." {}} +> lexes!{r#let#foo} // three tokens: r#let # foo +> ``` +> +> Examples accepted before the 2021 edition but rejected later: +> ```rust,edition2018 +> macro_rules! lexes {($($_:tt)*) => {}} +> lexes!{a#foo} +> lexes!{continue'foo} +> lexes!{match"..." {}} +> ``` [Inferred types]: types/inferred.md [Range patterns]: patterns.md#range-patterns @@ -636,6 +667,7 @@ them are referred to as "token trees" in [macros]. The three types of brackets [if let]: expressions/if-expr.md#if-let-expressions [keywords]: keywords.md [lazy-bool]: expressions/operator-expr.md#lazy-boolean-operators +[loop labels]: expressions/loop-expr.md [machine types]: types/numeric.md [macros]: macros-by-example.md [match]: expressions/match-expr.md @@ -656,42 +688,7 @@ them are referred to as "token trees" in [macros]. 
The three types of brackets [tuple structs]: items/structs.md [tuple variants]: items/enumerations.md [tuples]: types/tuple.md +[unary minus operator]: expressions/operator-expr.md#negation-operators [use declarations]: items/use-declarations.md [use wildcards]: items/use-declarations.md [while let]: expressions/loop-expr.md#predicate-pattern-loops - -## Reserved prefixes - -> **Lexer 2021+**\ -> RESERVED_TOKEN_DOUBLE_QUOTE : ( IDENTIFIER_OR_KEYWORD _Except `b` or `r` or `br`_ | `_` ) `"`\ -> RESERVED_TOKEN_SINGLE_QUOTE : ( IDENTIFIER_OR_KEYWORD _Except `b`_ | `_` ) `'`\ -> RESERVED_TOKEN_POUND : ( IDENTIFIER_OR_KEYWORD _Except `r` or `br`_ | `_` ) `#` - -Some lexical forms known as _reserved prefixes_ are reserved for future use. - -Source input which would otherwise be lexically interpreted as a non-raw identifier (or a keyword or `_`) which is immediately followed by a `#`, `'`, or `"` character (without intervening whitespace) is identified as a reserved prefix. - -Note that raw identifiers, raw string literals, and raw byte string literals may contain a `#` character but are not interpreted as containing a reserved prefix. - -Similarly the `r`, `b`, and `br` prefixes used in raw string literals, byte literals, byte string literals, and raw byte string literals are not interpreted as reserved prefixes. - -> **Edition Differences**: Starting with the 2021 edition, reserved prefixes are reported as an error by the lexer (in particular, they cannot be passed to macros). -> -> Before the 2021 edition, reserved prefixes are accepted by the lexer and interpreted as multiple tokens (for example, one token for the identifier or keyword, followed by a `#` token). -> -> Examples accepted in all editions: -> ```rust -> macro_rules! lexes {($($_:tt)*) => {}} -> lexes!{a #foo} -> lexes!{continue 'foo} -> lexes!{match "..." 
{}} -> lexes!{r#let#foo} // three tokens: r#let # foo -> ``` -> -> Examples accepted before the 2021 edition but rejected later: -> ```rust,edition2018 -> macro_rules! lexes {($($_:tt)*) => {}} -> lexes!{a#foo} -> lexes!{continue'foo} -> lexes!{match"..." {}} -> ``` From e06fea070ab22f49ea689be77693b01dd44e14ce Mon Sep 17 00:00:00 2001 From: Matthew Woodcraft Date: Sun, 6 Mar 2022 16:46:44 +0000 Subject: [PATCH 02/24] Move the general description of literal expressions from tokens.md to literal-expr.md --- src/expressions/literal-expr.md | 9 +++++++-- src/tokens.md | 7 ++----- 2 files changed, 9 insertions(+), 7 deletions(-) diff --git a/src/expressions/literal-expr.md b/src/expressions/literal-expr.md index 0e2d7c0a4..ec439f658 100644 --- a/src/expressions/literal-expr.md +++ b/src/expressions/literal-expr.md @@ -12,8 +12,11 @@ >    | [FLOAT_LITERAL]\ >    | [BOOLEAN_LITERAL] -A _literal expression_ consists of one of the [literal](../tokens.md#literals) forms described earlier. -It directly describes a number, character, string, or boolean value. +A _literal expression_ is an expression consisting of a single token, rather than a sequence of tokens, that immediately and directly denotes the value it evaluates to, rather than referring to it by name or some other evaluation rule. + +A literal is a form of [constant expression], so is evaluated (primarily) at compile time. + +Each of the lexical [literal][literal tokens] forms described earlier can make up a literal expression. ```rust "hello"; // string type @@ -21,6 +24,8 @@ It directly describes a number, character, string, or boolean value. 
5; // integer type ``` +[constant expression]: ../const_eval.md#constant-expressions +[literal tokens]: ../tokens.md#literals [CHAR_LITERAL]: ../tokens.md#character-literals [STRING_LITERAL]: ../tokens.md#string-literals [RAW_STRING_LITERAL]: ../tokens.md#raw-string-literals diff --git a/src/tokens.md b/src/tokens.md index 9a3484400..ed8ae1f99 100644 --- a/src/tokens.md +++ b/src/tokens.md @@ -18,11 +18,7 @@ table production] form, and appear in `monospace` font. ## Literals -A literal is an expression consisting of a single token, rather than a sequence -of tokens, that immediately and directly denotes the value it evaluates to, -rather than referring to it by name or some other evaluation rule. A literal is -a form of [constant expression](const_eval.md#constant-expressions), so is -evaluated (primarily) at compile time. +Literals are tokens used in [literal expressions]. ### Examples @@ -667,6 +663,7 @@ Similarly the `r`, `b`, and `br` prefixes used in raw string literals, byte lite [if let]: expressions/if-expr.md#if-let-expressions [keywords]: keywords.md [lazy-bool]: expressions/operator-expr.md#lazy-boolean-operators +[literal expressions]: expressions/literal-expr.md [loop labels]: expressions/loop-expr.md [machine types]: types/numeric.md [macros]: macros-by-example.md From 9b32e40eb7245ed9b7244af043fcfb003c16000c Mon Sep 17 00:00:00 2001 From: Matthew Woodcraft Date: Sun, 6 Mar 2022 16:46:44 +0000 Subject: [PATCH 03/24] Describe the effect of integer literal suffixes in literal-expr.md rather than tokens.md --- src/expressions/literal-expr.md | 37 +++++++++++++++++++++++++++ src/tokens.md | 45 ++++++++++++--------------------- 2 files changed, 53 insertions(+), 29 deletions(-) diff --git a/src/expressions/literal-expr.md b/src/expressions/literal-expr.md index ec439f658..5494d2259 100644 --- a/src/expressions/literal-expr.md +++ b/src/expressions/literal-expr.md @@ -24,8 +24,45 @@ Each of the lexical [literal][literal tokens] forms described earlier can 
make u 5; // integer type ``` +## Integer literal expressions + +An integer literal expression consists of a single [INTEGER_LITERAL] token. + +If the token has a [suffix], the suffix will be the name of one of the [primitive integer types][numeric types]: `u8`, `i8`, `u16`, `i16`, `u32`, `i32`, `u64`, `i64`, `u128`, `i128`, `usize`, or `isize`, and the expression has that type. + +If the token has no suffix, the expression's type is determined by type inference: + +* If an integer type can be _uniquely_ determined from the surrounding program context, the expression has that type. + +* If the program context under-constrains the type, it defaults to the signed 32-bit integer `i32`. + +* If the program context over-constrains the type, it is considered a static type error. + +Examples of integer literal expressions: + +```rust +123; // type i32 +123i32; // type i32 +123u32; // type u32 +123_u32; // type u32 +let a: u64 = 123; // type u64 + +0xff; // type i32 +0xff_u8; // type u8 + +0o70; // type i32 +0o70_i16; // type i16 + +0b1111_1111_1001_0000; // type i32 +0b1111_1111_1001_0000i64; // type i64 + +0usize; // type usize +``` + [constant expression]: ../const_eval.md#constant-expressions [literal tokens]: ../tokens.md#literals +[numeric types]: ../types/numeric.md +[suffix]: ../tokens.md#suffixes [CHAR_LITERAL]: ../tokens.md#character-literals [STRING_LITERAL]: ../tokens.md#string-literals [RAW_STRING_LITERAL]: ../tokens.md#raw-string-literals diff --git a/src/tokens.md b/src/tokens.md index ed8ae1f99..bbc10378f 100644 --- a/src/tokens.md +++ b/src/tokens.md @@ -359,43 +359,29 @@ An _integer literal_ has one of four forms: (`0b`) and continues as any mixture (with at least one digit) of binary digits and underscores. -Like any literal, an integer literal may be followed (immediately, -without any spaces) by an _integer suffix_, which forcibly sets the -type of the literal. 
The integer suffix must be the name of one of the -integral types: `u8`, `i8`, `u16`, `i16`, `u32`, `i32`, `u64`, `i64`, -`u128`, `i128`, `usize`, or `isize`. - -The type of an _unsuffixed_ integer literal is determined by type inference: - -* If an integer type can be _uniquely_ determined from the surrounding - program context, the unsuffixed integer literal has that type. - -* If the program context under-constrains the type, it defaults to the - signed 32-bit integer `i32`. - -* If the program context over-constrains the type, it is considered a - static type error. +Like any literal, an integer literal may be followed (immediately, without any spaces) by an _integer suffix_, which must be the name of one of the [primitive integer types][numeric types]: +`u8`, `i8`, `u16`, `i16`, `u32`, `i32`, `u64`, `i64`, `u128`, `i128`, `usize`, or `isize`. +See [literal expressions] for the effect of these suffixes. Examples of integer literals of various forms: ```rust -123; // type i32 -123i32; // type i32 -123u32; // type u32 -123_u32; // type u32 -let a: u64 = 123; // type u64 +123; +123i32; +123u32; +123_u32; -0xff; // type i32 -0xff_u8; // type u8 +0xff; +0xff_u8; -0o70; // type i32 -0o70_i16; // type i16 +0o70; +0o70_i16; -0b1111_1111_1001_0000; // type i32 -0b1111_1111_1001_0000i64; // type i64 -0b________1; // type i32 +0b1111_1111_1001_0000; +0b1111_1111_1001_0000i64; +0b________1; -0usize; // type usize +0usize; ``` Examples of invalid integer literals: @@ -671,6 +657,7 @@ Similarly the `r`, `b`, and `br` prefixes used in raw string literals, byte lite [negation]: expressions/operator-expr.md#negation-operators [negative impls]: items/implementations.md [never type]: types/never.md +[numeric types]: types/numeric.md [paths]: paths.md [patterns]: patterns.md [question]: expressions/operator-expr.md#the-question-mark-operator From 8fbbbda6abb680950b417c3635a632b0f014a1a8 Mon Sep 17 00:00:00 2001 From: Matthew Woodcraft Date: Sun, 6 Mar 2022 16:46:44 +0000 Subject: 
[PATCH 04/24] Describe the effect of floating-point literal suffixes in literal-expr.md rather than tokens.md --- src/expressions/literal-expr.md | 26 +++++++++++++++++++++++++ src/tokens.md | 34 +++++++++------------------------ 2 files changed, 35 insertions(+), 25 deletions(-) diff --git a/src/expressions/literal-expr.md b/src/expressions/literal-expr.md index 5494d2259..c2d035b79 100644 --- a/src/expressions/literal-expr.md +++ b/src/expressions/literal-expr.md @@ -59,7 +59,33 @@ let a: u64 = 123; // type u64 0usize; // type usize ``` +## Floating-point literal expressions + +A floating-point literal expression consists of a single [FLOAT_LITERAL] token. + +If the token has a [suffix], the suffix will be the name of one of the [primitive floating-point types][floating-point types]: `f32` or `f64`, and the expression has that type. + +If the token has no suffix, the expression's type is determined by type inference: + +* If a floating-point type can be _uniquely_ determined from the surrounding program context, the expression has that type. + +* If the program context under-constrains the type, it defaults to `f64`. + +* If the program context over-constrains the type, it is considered a static type error. 
+ +Examples of floating-point literal expressions: + +```rust +123.0f64; // type f64 +0.1f64; // type f64 +0.1f32; // type f32 +12E+99_f64; // type f64 +5f32; // type f32 +let x: f64 = 2.; // type f64 +``` + [constant expression]: ../const_eval.md#constant-expressions +[floating-point types]: ../types/numeric.md#floating-point-types [literal tokens]: ../tokens.md#literals [numeric types]: ../types/numeric.md [suffix]: ../tokens.md#suffixes diff --git a/src/tokens.md b/src/tokens.md index bbc10378f..d68b080f8 100644 --- a/src/tokens.md +++ b/src/tokens.md @@ -465,40 +465,24 @@ A _floating-point literal_ has one of two forms: Like integer literals, a floating-point literal may be followed by a suffix, so long as the pre-suffix part does not end with `U+002E` (`.`). -The suffix forcibly sets the type of the literal. There are two valid -_floating-point suffixes_, `f32` and `f64` (the 32-bit and 64-bit floating point -types), which explicitly determine the type of the literal. - -The type of an _unsuffixed_ floating-point literal is determined by -type inference: - -* If a floating-point type can be _uniquely_ determined from the - surrounding program context, the unsuffixed floating-point literal - has that type. - -* If the program context under-constrains the type, it defaults to `f64`. - -* If the program context over-constrains the type, it is considered a - static type error. +There are two valid _floating-point suffixes_: `f32` and `f64` (the names of the 32-bit and 64-bit [primitive floating-point types][floating-point types]). +See [literal expressions] for the effect of these suffixes. 
Examples of floating-point literals of various forms: ```rust -123.0f64; // type f64 -0.1f64; // type f64 -0.1f32; // type f32 -12E+99_f64; // type f64 -5f32; // type f32 -let x: f64 = 2.; // type f64 +123.0f64; +0.1f64; +0.1f32; +12E+99_f64; +5f32; +let x: f64 = 2.; ``` This last example is different because it is not possible to use the suffix syntax with a floating point literal ending in a period. `2.f64` would attempt to call a method named `f64` on `2`. -The representation semantics of floating-point numbers are described in -["Machine Types"][machine types]. - ### Boolean literals > **Lexer**\ @@ -642,6 +626,7 @@ Similarly the `r`, `b`, and `br` prefixes used in raw string literals, byte lite [extern crates]: items/extern-crates.md [extern]: items/external-blocks.md [field]: expressions/field-expr.md +[floating-point types]: types/numeric.md#floating-point-types [function pointer type]: types/function-pointer.md [functions]: items/functions.md [generics]: items/generics.md @@ -651,7 +636,6 @@ Similarly the `r`, `b`, and `br` prefixes used in raw string literals, byte lite [lazy-bool]: expressions/operator-expr.md#lazy-boolean-operators [literal expressions]: expressions/literal-expr.md [loop labels]: expressions/loop-expr.md -[machine types]: types/numeric.md [macros]: macros-by-example.md [match]: expressions/match-expr.md [negation]: expressions/operator-expr.md#negation-operators From d478e54526342eb227f3c30b98788ca4043557e6 Mon Sep 17 00:00:00 2001 From: Matthew Woodcraft Date: Sun, 6 Mar 2022 16:46:44 +0000 Subject: [PATCH 05/24] Document how the value of an integer literal expression is determined --- src/expressions/literal-expr.md | 14 ++++++++++++++ 1 file changed, 14 insertions(+) diff --git a/src/expressions/literal-expr.md b/src/expressions/literal-expr.md index c2d035b79..11e280b60 100644 --- a/src/expressions/literal-expr.md +++ b/src/expressions/literal-expr.md @@ -59,6 +59,18 @@ let a: u64 = 123; // type u64 0usize; // type usize ``` +The 
value of the expression is determined from the string representation of the token as follows: + +* An integer radix is chosen by inspecting the first two characters of the string: `0b` indicates radix 2, `0o` indicates radix 8, `0x` indicates radix 16; otherwise the radix is 10. + +* If the radix is not 10, the first two characters are removed from the string. + +* Any underscores are removed from the string. + +* The string is converted to a `u128` value as if by [`u128::from_str_radix`] with the chosen radix. + +* The `u128` value is converted to the expression's type via a [numeric cast]. + ## Floating-point literal expressions A floating-point literal expression consists of a single [FLOAT_LITERAL] token. @@ -87,8 +99,10 @@ let x: f64 = 2.; // type f64 [constant expression]: ../const_eval.md#constant-expressions [floating-point types]: ../types/numeric.md#floating-point-types [literal tokens]: ../tokens.md#literals +[numeric cast]: operator-expr.md#numeric-cast [numeric types]: ../types/numeric.md [suffix]: ../tokens.md#suffixes +[`u128::from_str_radix`]: ../../core/primitive.u128.md#method.from_str_radix [CHAR_LITERAL]: ../tokens.md#character-literals [STRING_LITERAL]: ../tokens.md#string-literals [RAW_STRING_LITERAL]: ../tokens.md#raw-string-literals From 938011b4f8d912ceab43bd59e5e258cf0f821582 Mon Sep 17 00:00:00 2001 From: Matthew Woodcraft Date: Sun, 6 Mar 2022 16:46:44 +0000 Subject: [PATCH 06/24] Say that integer literals out of the u128 range are a parse-time error --- src/expressions/literal-expr.md | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/src/expressions/literal-expr.md b/src/expressions/literal-expr.md index 11e280b60..08511770c 100644 --- a/src/expressions/literal-expr.md +++ b/src/expressions/literal-expr.md @@ -8,9 +8,11 @@ >    | [BYTE_LITERAL]\ >    | [BYTE_STRING_LITERAL]\ >    | [RAW_BYTE_STRING_LITERAL]\ ->    | [INTEGER_LITERAL]\ +>    | [INTEGER_LITERAL][^out-of-range]\ >    | [FLOAT_LITERAL]\ >    | 
[BOOLEAN_LITERAL] +> +> [^out-of-range]: A value ≥ 2<sup>128</sup> is not allowed. A _literal expression_ is an expression consisting of a single token, rather than a sequence of tokens, that immediately and directly denotes the value it evaluates to, rather than referring to it by name or some other evaluation rule. @@ -68,6 +70,7 @@ The value of the expression is determined from the string representation of the * Any underscores are removed from the string. * The string is converted to a `u128` value as if by [`u128::from_str_radix`] with the chosen radix. +If the value does not fit in `u128`, the expression is rejected by the parser. * The `u128` value is converted to the expression's type via a [numeric cast]. From 94957445e2d6b2e38c97c7645728dc3eae8335c9 Mon Sep 17 00:00:00 2001 From: Matthew Woodcraft Date: Sun, 6 Mar 2022 16:46:44 +0000 Subject: [PATCH 07/24] Literal expressions: add a Note describing the overflowing_literals lint --- src/expressions/literal-expr.md | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/src/expressions/literal-expr.md b/src/expressions/literal-expr.md index 08511770c..b5b591cea 100644 --- a/src/expressions/literal-expr.md +++ b/src/expressions/literal-expr.md @@ -74,6 +74,9 @@ If the value does not fit in `u128`, the expression is rejected by the parser. * The `u128` value is converted to the expression's type via a [numeric cast]. +> **Note**: The final cast will truncate the value of the literal if it does not fit in the expression's type. +> There is a [lint check] named `overflowing_literals`, defaulting to `deny`, which rejects expressions where this occurs. + ## Floating-point literal expressions A floating-point literal expression consists of a single [FLOAT_LITERAL] token. 
@@ -101,6 +104,7 @@ let x: f64 = 2.; // type f64 [constant expression]: ../const_eval.md#constant-expressions [floating-point types]: ../types/numeric.md#floating-point-types +[lint check]: ../attributes/diagnostics.md#lint-check-attributes [literal tokens]: ../tokens.md#literals [numeric cast]: operator-expr.md#numeric-cast [numeric types]: ../types/numeric.md From 72a554a93e9ae8e06c4dc110be5501fdb03895a6 Mon Sep 17 00:00:00 2001 From: Matthew Woodcraft Date: Sun, 6 Mar 2022 16:46:44 +0000 Subject: [PATCH 08/24] Document how the value of a floating-point literal expression is determined For now, refer to the stdlib docs for the actual interpretation. --- src/expressions/literal-expr.md | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/src/expressions/literal-expr.md b/src/expressions/literal-expr.md index b5b591cea..37930db54 100644 --- a/src/expressions/literal-expr.md +++ b/src/expressions/literal-expr.md @@ -102,6 +102,12 @@ Examples of floating-point literal expressions: let x: f64 = 2.; // type f64 ``` +The value of the expression is determined from the string representation of the token as follows: + +* Any underscores are removed from the string. + +* The string is converted to the expression's type as if by [`f32::from_str`] or [`f64::from_str`]. 
+ [constant expression]: ../const_eval.md#constant-expressions [floating-point types]: ../types/numeric.md#floating-point-types [lint check]: ../attributes/diagnostics.md#lint-check-attributes @@ -109,6 +115,8 @@ let x: f64 = 2.; // type f64 [numeric cast]: operator-expr.md#numeric-cast [numeric types]: ../types/numeric.md [suffix]: ../tokens.md#suffixes +[`f32::from_str`]: ../../core/primitive.f32.md#method.from_str +[`f64::from_str`]: ../../core/primitive.f64.md#method.from_str [`u128::from_str_radix`]: ../../core/primitive.u128.md#method.from_str_radix [CHAR_LITERAL]: ../tokens.md#character-literals [STRING_LITERAL]: ../tokens.md#string-literals From c8b9a2053881d735466aa8f3e0c86c1a59f6cc88 Mon Sep 17 00:00:00 2001 From: Matthew Woodcraft Date: Sun, 6 Mar 2022 16:46:44 +0000 Subject: [PATCH 09/24] Literal expressions: add a Note on infinite and NaN floating-point literals --- src/expressions/literal-expr.md | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/src/expressions/literal-expr.md b/src/expressions/literal-expr.md index 37930db54..b4b7ddfaf 100644 --- a/src/expressions/literal-expr.md +++ b/src/expressions/literal-expr.md @@ -108,6 +108,10 @@ The value of the expression is determined from the string representation of the * The string is converted to the expression's type as if by [`f32::from_str`] or [`f64::from_str`]. +> **Note**: `inf` and `NaN` are not literal tokens. +> The [`f32::INFINITY`], [`f64::INFINITY`], [`f32::NAN`], and [`f64::NAN`] constants can be used instead of literal expressions. +> A literal large enough to be evaluated as infinite will trigger the `overflowing_literals` lint check. 
+ [constant expression]: ../const_eval.md#constant-expressions [floating-point types]: ../types/numeric.md#floating-point-types [lint check]: ../attributes/diagnostics.md#lint-check-attributes @@ -116,7 +120,11 @@ The value of the expression is determined from the string representation of the [numeric types]: ../types/numeric.md [suffix]: ../tokens.md#suffixes [`f32::from_str`]: ../../core/primitive.f32.md#method.from_str +[`f32::INFINITY`]: ../../core/primitive.f32.md#associatedconstant.INFINITY +[`f32::NAN`]: ../../core/primitive.f32.md#associatedconstant.NAN [`f64::from_str`]: ../../core/primitive.f64.md#method.from_str +[`f64::INFINITY`]: ../../core/primitive.f64.md#associatedconstant.INFINITY +[`f64::NAN`]: ../../core/primitive.f64.md#associatedconstant.NAN [`u128::from_str_radix`]: ../../core/primitive.u128.md#method.from_str_radix [CHAR_LITERAL]: ../tokens.md#character-literals [STRING_LITERAL]: ../tokens.md#string-literals From 1913a4f12aa75355c8fc589a972922d41f8f3b8c Mon Sep 17 00:00:00 2001 From: Matthew Woodcraft Date: Sun, 6 Mar 2022 16:46:44 +0000 Subject: [PATCH 10/24] Notes about negated literals --- src/expressions/literal-expr.md | 5 +++++ src/tokens.md | 8 ++++---- 2 files changed, 9 insertions(+), 4 deletions(-) diff --git a/src/expressions/literal-expr.md b/src/expressions/literal-expr.md index b4b7ddfaf..e01ac1786 100644 --- a/src/expressions/literal-expr.md +++ b/src/expressions/literal-expr.md @@ -77,6 +77,8 @@ If the value does not fit in `u128`, the expression is rejected by the parser. > **Note**: The final cast will truncate the value of the literal if it does not fit in the expression's type. > There is a [lint check] named `overflowing_literals`, defaulting to `deny`, which rejects expressions where this occurs. +> **Note**: `-1i8`, for example, is an application of the [negation operator] to the literal expression `1i8`, not a single integer literal expression. 
+ ## Floating-point literal expressions A floating-point literal expression consists of a single [FLOAT_LITERAL] token. @@ -108,6 +110,8 @@ The value of the expression is determined from the string representation of the * The string is converted to the expression's type as if by [`f32::from_str`] or [`f64::from_str`]. +> **Note**: `-1.0`, for example, is an application of the [negation operator] to the literal expression `1.0`, not a single floating-point literal expression. + > **Note**: `inf` and `NaN` are not literal tokens. > The [`f32::INFINITY`], [`f64::INFINITY`], [`f32::NAN`], and [`f64::NAN`] constants can be used instead of literal expressions. > A literal large enough to be evaluated as infinite will trigger the `overflowing_literals` lint check. @@ -119,6 +123,7 @@ The value of the expression is determined from the string representation of the [numeric cast]: operator-expr.md#numeric-cast [numeric types]: ../types/numeric.md [suffix]: ../tokens.md#suffixes +[negation operator]: operator-expr.md#negation-operators [`f32::from_str`]: ../../core/primitive.f32.md#method.from_str [`f32::INFINITY`]: ../../core/primitive.f32.md#associatedconstant.INFINITY [`f32::NAN`]: ../../core/primitive.f32.md#associatedconstant.NAN diff --git a/src/tokens.md b/src/tokens.md index d68b080f8..8f4e5ccda 100644 --- a/src/tokens.md +++ b/src/tokens.md @@ -384,6 +384,8 @@ Examples of integer literals of various forms: 0usize; ``` +Note that `-1i8`, for example, is analyzed as two tokens: `-` followed by `1i8`. + Examples of invalid integer literals: ```rust,compile_fail @@ -408,10 +410,6 @@ Examples of invalid integer literals: 0b____; ``` -Note that the Rust syntax considers `-1i8` as an application of the [unary minus -operator] to an integer literal `1i8`, rather than -a single integer literal. - #### Tuple index > **Lexer**\ @@ -483,6 +481,8 @@ This last example is different because it is not possible to use the suffix syntax with a floating point literal ending in a period. 
`2.f64` would attempt to call a method named `f64` on `2`. +Note that `-1.0`, for example, is analyzed as two tokens: `-` followed by `1.0`. + ### Boolean literals > **Lexer**\ From 46d4a279f9945e29a19f96a89df525cd76f69781 Mon Sep 17 00:00:00 2001 From: Matthew Woodcraft Date: Sun, 6 Mar 2022 16:46:44 +0000 Subject: [PATCH 11/24] Say that out-of-range suffixed integer literals are valid lexer tokens --- src/tokens.md | 12 +++++++----- 1 file changed, 7 insertions(+), 5 deletions(-) diff --git a/src/tokens.md b/src/tokens.md index 8f4e5ccda..cc5b8bf4c 100644 --- a/src/tokens.md +++ b/src/tokens.md @@ -366,6 +366,7 @@ See [literal expressions] for the effect of these suffixes. Examples of integer literals of various forms: ```rust +# #![allow(overflowing_literals)] 123; 123i32; 123u32; @@ -382,6 +383,12 @@ Examples of integer literals of various forms: 0b________1; 0usize; + +// These are too big for their type, but are still valid tokens + +128_i8; +256_u8; + ``` Note that `-1i8`, for example, is analyzed as two tokens: `-` followed by `1i8`. 
@@ -399,11 +406,6 @@ Examples of invalid integer literals: 0b0102; 0o0581; -// integers too big for their type (they overflow) - -128_i8; -256_u8; - // bin, hex, and octal literals must have at least one digit 0b_; From 5f81f6a423073f4d15a89ceb97684bd927d80bac Mon Sep 17 00:00:00 2001 From: Matthew Woodcraft Date: Sun, 6 Mar 2022 16:46:44 +0000 Subject: [PATCH 12/24] Make the FLOAT_LITERAL rule mention keywords as well as identifiers --- src/tokens.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/tokens.md b/src/tokens.md index cc5b8bf4c..8058ddcef 100644 --- a/src/tokens.md +++ b/src/tokens.md @@ -444,7 +444,7 @@ let horse = example.0b10; // ERROR no field named `0b10` > **Lexer**\ > FLOAT_LITERAL :\ >       DEC_LITERAL `.` -> _(not immediately followed by `.`, `_` or an [identifier]_)\ +> _(not immediately followed by `.`, `_` or an [identifier] or [keyword][keywords]_)\ >    | DEC_LITERAL FLOAT_EXPONENT\ >    | DEC_LITERAL `.` DEC_LITERAL FLOAT_EXPONENT?\ >    | DEC_LITERAL (`.` DEC_LITERAL)? From 71fd6e398539a33912c2b56ab2ea8c186be8f843 Mon Sep 17 00:00:00 2001 From: Matthew Woodcraft Date: Sun, 6 Mar 2022 16:46:44 +0000 Subject: [PATCH 13/24] Add the 5f32 case to the text description of floating-point literals --- src/tokens.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/src/tokens.md b/src/tokens.md index 8058ddcef..234864c7f 100644 --- a/src/tokens.md +++ b/src/tokens.md @@ -457,11 +457,12 @@ let horse = example.0b10; // ERROR no field named `0b10` > FLOAT_SUFFIX :\ >    `f32` | `f64` -A _floating-point literal_ has one of two forms: +A _floating-point literal_ has one of three forms: * A _decimal literal_ followed by a period character `U+002E` (`.`). This is optionally followed by another decimal literal, with an optional _exponent_. * A single _decimal literal_ followed by an _exponent_. +* A single _decimal literal_ (in which case a suffix is required). 
Like integer literals, a floating-point literal may be followed by a suffix, so long as the pre-suffix part does not end with `U+002E` (`.`). From e2015cc3a8c8863f27c00b71fbbe69408e4a1c8a Mon Sep 17 00:00:00 2001 From: Matthew Woodcraft Date: Sun, 6 Mar 2022 16:46:44 +0000 Subject: [PATCH 14/24] Add some examples of possibly confusing hexadecimal literals --- src/tokens.md | 2 ++ 1 file changed, 2 insertions(+) diff --git a/src/tokens.md b/src/tokens.md index 234864c7f..4cb36654b 100644 --- a/src/tokens.md +++ b/src/tokens.md @@ -374,6 +374,8 @@ Examples of integer literals of various forms: 0xff; 0xff_u8; +0x01_f32; // integer 7986, not floating-point 1.0 +0x01_e3; // integer 483, not floating-point 1000.0 0o70; 0o70_i16; From 6a379ac850d3656014c051eefc99519a0ef2431b Mon Sep 17 00:00:00 2001 From: Matthew Woodcraft Date: Sun, 6 Mar 2022 16:46:44 +0000 Subject: [PATCH 15/24] Add a Lexer rules block for number literals with arbitrary suffixes --- src/tokens.md | 41 ++++++++++++++++++++++++++++++++++++----- 1 file changed, 36 insertions(+), 5 deletions(-) diff --git a/src/tokens.md b/src/tokens.md index 4cb36654b..8ad43fdbf 100644 --- a/src/tokens.md +++ b/src/tokens.md @@ -398,13 +398,8 @@ Note that `-1i8`, for example, is analyzed as two tokens: `-` followed by `1i8`. Examples of invalid integer literals: ```rust,compile_fail -// invalid suffixes - -0invalidSuffix; - // uses numbers of the wrong base -123AFB43; 0b0102; 0o0581; @@ -488,6 +483,42 @@ to call a method named `f64` on `2`. Note that `-1.0`, for example, is analyzed as two tokens: `-` followed by `1.0`. +#### Number pseudoliterals + +> **Lexer**\ +> NUMBER_PSEUDOLITERAL :\ +>       DEC_LITERAL ( . DEC_LITERAL)? FLOAT_EXPONENT NUMBER_PSEUDOLITERAL_SUFFIX\ +>    | DEC_LITERAL ( . DEC_LITERAL)? 
NUMBER_PSEUDOLITERAL_SUFFIX_NO_E\ +>    | ( BIN_LITERAL | OCT_LITERAL | HEX_LITERAL ) NUMBER_PSEUDOLITERAL_SUFFIX_NO_E +> +> NUMBER_PSEUDOLITERAL_SUFFIX :\ +>    IDENTIFIER_OR_KEYWORD _not matching INTEGER_SUFFIX or FLOAT_SUFFIX_ +> +> NUMBER_PSEUDOLITERAL_SUFFIX_NO_E :\ +>    NUMBER_PSEUDOLITERAL_SUFFIX _not beginning with `e` or `E`_ + +As described [above](#suffixes), tokens with the same form as numeric literals other than in the content of their suffix are accepted by the lexer, with the exception of some cases in which the suffix begins with `e` or `E`. + +Examples of such tokens: +```rust,compile_fail +0invalidSuffix; +123AFB43; +0b010a; +0xAB_CD_EF_GH; +2.0f80; +2e5f80; +2e5e6; +2.0e5e6; + +// Lexer errors: +2e; +2.0e; +0b101e; +2em; +2.0em; +0b101em; +``` + ### Boolean literals > **Lexer**\ From 6e1979203b1dd9385c195e43b7c2b8f22e9ecc99 Mon Sep 17 00:00:00 2001 From: Matthew Woodcraft Date: Sun, 6 Mar 2022 16:46:44 +0000 Subject: [PATCH 16/24] Document reserved forms similar to number literals --- src/tokens.md | 51 +++++++++++++++++++++++++++++++++++++++++++-------- 1 file changed, 43 insertions(+), 8 deletions(-) diff --git a/src/tokens.md b/src/tokens.md index 8ad43fdbf..7b3d7994f 100644 --- a/src/tokens.md +++ b/src/tokens.md @@ -497,7 +497,7 @@ Note that `-1.0`, for example, is analyzed as two tokens: `-` followed by `1.0`. > NUMBER_PSEUDOLITERAL_SUFFIX_NO_E :\ >    NUMBER_PSEUDOLITERAL_SUFFIX _not beginning with `e` or `E`_ -As described [above](#suffixes), tokens with the same form as numeric literals other than in the content of their suffix are accepted by the lexer, with the exception of some cases in which the suffix begins with `e` or `E`. +As described [above](#suffixes), tokens with the same form as numeric literals other than in the content of their suffix are accepted by the lexer (with the exception of some reserved forms described below). 
Examples of such tokens: ```rust,compile_fail @@ -509,14 +509,49 @@ Examples of such tokens: 2e5f80; 2e5e6; 2.0e5e6; +``` + +#### Reserved forms similar to number literals + +> **Lexer**\ +> RESERVED_NUMBER :\ +>       BIN_LITERAL \[`2`-`9`]\ +>    | OCT_LITERAL \[`8`-`9`]\ +>    | ( BIN_LITERAL | OCT_LITERAL | HEX_LITERAL ) `.` \ +>          _(not immediately followed by `.`, `_` or an [identifier] or [keyword][keywords]_)\ +>    | ( BIN_LITERAL | OCT_LITERAL ) `e`\ +>    | `0b` `_`\* _end of input or not BIN_DIGIT_\ +>    | `0o` `_`\* _end of input or not OCT_DIGIT_\ +>    | `0x` `_`\* _end of input or not HEX_DIGIT_\ +>    | DEC_LITERAL ( . DEC_LITERAL)? (`e`|`E`) (`+`|`-`)? _end of input or not DEC_DIGIT_ + +The following lexical forms similar to number literals are _reserved forms_: + +* An unsuffixed binary or octal literal followed, without intervening whitespace, by a decimal digit out of the range for its radix. + +* An unsuffixed binary, octal, or hexadecimal literal followed, without intervening whitespace, by a period character (with the same restrictions on what follows the period as for floating-point literals). -// Lexer errors: -2e; -2.0e; -0b101e; -2em; -2.0em; -0b101em; +* An unsuffixed binary or octal literal followed, without intervening whitespace, by the character `e`. + +* Input which begins with one of the radix prefixes but is not a valid binary, octal, or hexadecimal literal (because it contains no digits). + +* Input which has the form of a floating-point literal with no digits in the exponent. + +Any input containing one of these reserved forms is reported as an error by the lexer. 
+ +Examples of reserved forms: + +```rust,compile_fail +0b0102; // this is not `0b010` followed by `2` +0o1279; // this is not `0o127` followed by `9` +0x80.0; // this is not `0x80` followed by `.` and `0` +0b101e; // this is not a pseudoliteral, or `0b101` followed by `e` +0b; // this is not a pseudoliteral, or `0` followed by `b` +0b_; // this is not a pseudoliteral, or `0` followed by `b_` +2e; // this is not a pseudoliteral, or `2` followed by `e` +2.0e; // this is not a pseudoliteral, or `2.0` followed by `e` +2em; // this is not a pseudoliteral, or `2` followed by `em` +2.0em; // this is not a pseudoliteral, or `2.0` followed by `em` ``` ### Boolean literals From e5ef69aecbac4262c130c2073a66126d0051a334 Mon Sep 17 00:00:00 2001 From: Matthew Woodcraft Date: Sun, 6 Mar 2022 16:46:44 +0000 Subject: [PATCH 17/24] tokens.md: add two zero-width spaces to placate linkchecker A LINKCHECK_EXCEPTIONS entry in linkchecker/main.rs seems to be the right way to do this, but that's not in this repository. 
--- src/tokens.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/src/tokens.md b/src/tokens.md index 7b3d7994f..365c4e8ca 100644 --- a/src/tokens.md +++ b/src/tokens.md @@ -515,8 +515,8 @@ Examples of such tokens: > **Lexer**\ > RESERVED_NUMBER :\ ->       BIN_LITERAL \[`2`-`9`]\ ->    | OCT_LITERAL \[`8`-`9`]\ +>       BIN_LITERAL \[`2`-`9`​]\ +>    | OCT_LITERAL \[`8`-`9`​]\ >    | ( BIN_LITERAL | OCT_LITERAL | HEX_LITERAL ) `.` \ >          _(not immediately followed by `.`, `_` or an [identifier] or [keyword][keywords]_)\ >    | ( BIN_LITERAL | OCT_LITERAL ) `e`\ From 8aa8b9af5bb51920243fa1fabf85b78fde429d35 Mon Sep 17 00:00:00 2001 From: Matthew Woodcraft Date: Mon, 21 Mar 2022 21:09:56 +0000 Subject: [PATCH 18/24] Cover two missing cases of number pseudoliterals --- src/tokens.md | 12 +++++++++--- 1 file changed, 9 insertions(+), 3 deletions(-) diff --git a/src/tokens.md b/src/tokens.md index 365c4e8ca..7a4dfce86 100644 --- a/src/tokens.md +++ b/src/tokens.md @@ -487,9 +487,13 @@ Note that `-1.0`, for example, is analyzed as two tokens: `-` followed by `1.0`. > **Lexer**\ > NUMBER_PSEUDOLITERAL :\ ->       DEC_LITERAL ( . DEC_LITERAL)? FLOAT_EXPONENT NUMBER_PSEUDOLITERAL_SUFFIX\ ->    | DEC_LITERAL ( . DEC_LITERAL)? NUMBER_PSEUDOLITERAL_SUFFIX_NO_E\ ->    | ( BIN_LITERAL | OCT_LITERAL | HEX_LITERAL ) NUMBER_PSEUDOLITERAL_SUFFIX_NO_E +>       DEC_LITERAL ( . DEC_LITERAL )? FLOAT_EXPONENT\ +>          ( NUMBER_PSEUDOLITERAL_SUFFIX | INTEGER_SUFFIX )\ +>    | DEC_LITERAL . 
DEC_LITERAL\ +>          ( NUMBER_PSEUDOLITERAL_SUFFIX_NO_E | INTEGER_SUFFIX )\ +>    | DEC_LITERAL NUMBER_PSEUDOLITERAL_SUFFIX_NO_E\ +>    | ( BIN_LITERAL | OCT_LITERAL | HEX_LITERAL )\ +>          ( NUMBER_PSEUDOLITERAL_SUFFIX_NO_E | FLOAT_SUFFIX ) > > NUMBER_PSEUDOLITERAL_SUFFIX :\ >    IDENTIFIER_OR_KEYWORD _not matching INTEGER_SUFFIX or FLOAT_SUFFIX_ @@ -509,6 +513,8 @@ Examples of such tokens: 2e5f80; 2e5e6; 2.0e5e6; +1.3e10u64; +0b1111_f32; ``` #### Reserved forms similar to number literals From 7baad0af5b8ee9f57ee7aedd80cc203593045c37 Mon Sep 17 00:00:00 2001 From: Matthew Woodcraft Date: Tue, 22 Mar 2022 22:17:50 +0000 Subject: [PATCH 19/24] Make the FLOAT_LITERAL rule about final `.` more accurate The previous phrasing missed raw identifiers, raw string literals, and byte literals. Writing in terms of characters rather than tokens matches the implementation more closely. --- src/tokens.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/src/tokens.md b/src/tokens.md index 7a4dfce86..10bef7e51 100644 --- a/src/tokens.md +++ b/src/tokens.md @@ -441,7 +441,7 @@ let horse = example.0b10; // ERROR no field named `0b10` > **Lexer**\ > FLOAT_LITERAL :\ >       DEC_LITERAL `.` -> _(not immediately followed by `.`, `_` or an [identifier] or [keyword][keywords]_)\ +> _(not immediately followed by `.`, `_` or an XID_Start character)_\ >    | DEC_LITERAL FLOAT_EXPONENT\ >    | DEC_LITERAL `.` DEC_LITERAL FLOAT_EXPONENT?\ >    | DEC_LITERAL (`.` DEC_LITERAL)?
@@ -524,7 +524,7 @@ Examples of such tokens: >       BIN_LITERAL \[`2`-`9`​]\ >    | OCT_LITERAL \[`8`-`9`​]\ >    | ( BIN_LITERAL | OCT_LITERAL | HEX_LITERAL ) `.` \ ->          _(not immediately followed by `.`, `_` or an [identifier] or [keyword][keywords]_)\ +>          _(not immediately followed by `.`, `_` or an XID_Start character)_\ >    | ( BIN_LITERAL | OCT_LITERAL ) `e`\ >    | `0b` `_`\* _end of input or not BIN_DIGIT_\ >    | `0o` `_`\* _end of input or not OCT_DIGIT_\ From 9704aad87c01121e38f60e25506e3d07c12f2395 Mon Sep 17 00:00:00 2001 From: Matthew Woodcraft Date: Tue, 22 Mar 2022 22:29:33 +0000 Subject: [PATCH 20/24] Literal expressions: text improvements from ehuss --- src/expressions/literal-expr.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/src/expressions/literal-expr.md b/src/expressions/literal-expr.md index e01ac1786..9a611409e 100644 --- a/src/expressions/literal-expr.md +++ b/src/expressions/literal-expr.md @@ -75,7 +75,7 @@ If the value does not fit in `u128`, the expression is rejected by the parser. * The `u128` value is converted to the expression's type via a [numeric cast]. > **Note**: The final cast will truncate the value of the literal if it does not fit in the expression's type. -> There is a [lint check] named `overflowing_literals`, defaulting to `deny`, which rejects expressions where this occurs. +> `rustc` includes a [lint check] named `overflowing_literals`, defaulting to `deny`, which rejects expressions where this occurs. > **Note**: `-1i8`, for example, is an application of the [negation operator] to the literal expression `1i8`, not a single integer literal expression. @@ -114,7 +114,7 @@ The value of the expression is determined from the string representation of the > **Note**: `inf` and `NaN` are not literal tokens. > The [`f32::INFINITY`], [`f64::INFINITY`], [`f32::NAN`], and [`f64::NAN`] constants can be used instead of literal expressions. 
-> A literal large enough to be evaluated as infinite will trigger the `overflowing_literals` lint check. +> In `rustc`, a literal large enough to be evaluated as infinite will trigger the `overflowing_literals` lint check. [constant expression]: ../const_eval.md#constant-expressions [floating-point types]: ../types/numeric.md#floating-point-types From 56105c2482edf34477587002d549b9f6eb5c13a8 Mon Sep 17 00:00:00 2001 From: Matthew Woodcraft Date: Tue, 22 Mar 2022 22:31:33 +0000 Subject: [PATCH 21/24] tokens.md: add missing superscript markup --- src/tokens.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/src/tokens.md b/src/tokens.md index 10bef7e51..fba0da013 100644 --- a/src/tokens.md +++ b/src/tokens.md @@ -448,7 +448,7 @@ let horse = example.0b10; // ERROR no field named `0b10` > FLOAT_EXPONENT? FLOAT_SUFFIX > > FLOAT_EXPONENT :\ ->    (`e`|`E`) (`+`|`-`)? +>    (`e`|`E`) (`+`|`-`)? > (DEC_DIGIT|`_`)\* DEC_DIGIT (DEC_DIGIT|`_`)\* > > FLOAT_SUFFIX :\ @@ -529,7 +529,7 @@ Examples of such tokens: >    | `0b` `_`\* _end of input or not BIN_DIGIT_\ >    | `0o` `_`\* _end of input or not OCT_DIGIT_\ >    | `0x` `_`\* _end of input or not HEX_DIGIT_\ ->    | DEC_LITERAL ( . DEC_LITERAL)? (`e`|`E`) (`+`|`-`)? _end of input or not DEC_DIGIT_ +>    | DEC_LITERAL ( . DEC_LITERAL)? (`e`|`E`) (`+`|`-`)? _end of input or not DEC_DIGIT_ The following lexical forms similar to number literals are _reserved forms_: From d1f3e7fb82fe8e07d5dc920eebdcdd73866d674a Mon Sep 17 00:00:00 2001 From: Matthew Woodcraft Date: Tue, 22 Mar 2022 22:40:33 +0000 Subject: [PATCH 22/24] Number pseudoliterals and reserved forms: text improvements from ehuss --- src/tokens.md | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/src/tokens.md b/src/tokens.md index fba0da013..0db558330 100644 --- a/src/tokens.md +++ b/src/tokens.md @@ -501,7 +501,8 @@ Note that `-1.0`, for example, is analyzed as two tokens: `-` followed by `1.0`. 
> NUMBER_PSEUDOLITERAL_SUFFIX_NO_E :\ >    NUMBER_PSEUDOLITERAL_SUFFIX _not beginning with `e` or `E`_ -As described [above](#suffixes), tokens with the same form as numeric literals other than in the content of their suffix are accepted by the lexer (with the exception of some reserved forms described below). +Tokenization of numeric literals allows arbitrary suffixes as described in the grammar above. +These values generate valid tokens, but are not valid [literal expressions], so are usually an error except as macro arguments. Examples of such tokens: ```rust,compile_fail @@ -531,7 +532,8 @@ Examples of such tokens: >    | `0x` `_`\* _end of input or not HEX_DIGIT_\ >    | DEC_LITERAL ( . DEC_LITERAL)? (`e`|`E`) (`+`|`-`)? _end of input or not DEC_DIGIT_ -The following lexical forms similar to number literals are _reserved forms_: +The following lexical forms similar to number literals are _reserved forms_. +Due to the possible ambiguity these raise, they are rejected by the tokenizer instead of being interpreted as separate tokens. * An unsuffixed binary or octal literal followed, without intervening whitespace, by a decimal digit out of the range for its radix. @@ -543,8 +545,6 @@ The following lexical forms similar to number literals are _reserved forms_: * Input which has the form of a floating-point literal with no digits in the exponent. -Any input containing one of these reserved forms is reported as an error by the lexer. 
- Examples of reserved forms: ```rust,compile_fail From 0c4554f83f0073bc3a395f19a6869ce9259c82ce Mon Sep 17 00:00:00 2001 From: Matthew Woodcraft Date: Wed, 23 Mar 2022 19:55:14 +0000 Subject: [PATCH 23/24] Literal expressions: use a sublist when describing choice of radix --- src/expressions/literal-expr.md | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/src/expressions/literal-expr.md b/src/expressions/literal-expr.md index 9a611409e..f6687eb11 100644 --- a/src/expressions/literal-expr.md +++ b/src/expressions/literal-expr.md @@ -63,7 +63,12 @@ let a: u64 = 123; // type u64 The value of the expression is determined from the string representation of the token as follows: -* An integer radix is chosen by inspecting the first two characters of the string: `0b` indicates radix 2, `0o` indicates radix 8, `0x` indicates radix 16; otherwise the radix is 10. +* An integer radix is chosen by inspecting the first two characters of the string, as follows: + + * `0b` indicates radix 2 + * `0o` indicates radix 8 + * `0x` indicates radix 16 + * otherwise the radix is 10. * If the radix is not 10, the first two characters are removed from the string. From 2c783999081e8d76b8144cf4eb26d5a380e26d7f Mon Sep 17 00:00:00 2001 From: Matthew Woodcraft Date: Wed, 23 Mar 2022 20:07:52 +0000 Subject: [PATCH 24/24] Literal expressions: add placeholder sections for types not yet documented --- src/expressions/literal-expr.md | 30 ++++++++++++++++++++++++++++++ 1 file changed, 30 insertions(+) diff --git a/src/expressions/literal-expr.md b/src/expressions/literal-expr.md index f6687eb11..24552a915 100644 --- a/src/expressions/literal-expr.md +++ b/src/expressions/literal-expr.md @@ -26,6 +26,30 @@ Each of the lexical [literal][literal tokens] forms described earlier can make u 5; // integer type ``` +## Character literal expressions + +A character literal expression consists of a single [CHAR_LITERAL] token. + +> **Note**: This section is incomplete. 
+ +## String literal expressions + +A string literal expression consists of a single [STRING_LITERAL] or [RAW_STRING_LITERAL] token. + +> **Note**: This section is incomplete. + +## Byte literal expressions + +A byte literal expression consists of a single [BYTE_LITERAL] token. + +> **Note**: This section is incomplete. + +## Byte string literal expressions + +A byte string literal expression consists of a single [BYTE_STRING_LITERAL] or [RAW_BYTE_STRING_LITERAL] token. + +> **Note**: This section is incomplete. + ## Integer literal expressions An integer literal expression consists of a single [INTEGER_LITERAL] token. @@ -121,6 +145,12 @@ The value of the expression is determined from the string representation of the > The [`f32::INFINITY`], [`f64::INFINITY`], [`f32::NAN`], and [`f64::NAN`] constants can be used instead of literal expressions. > In `rustc`, a literal large enough to be evaluated as infinite will trigger the `overflowing_literals` lint check. +## Boolean literal expressions + +A boolean literal expression consists of a single [BOOLEAN_LITERAL] token. + +> **Note**: This section is incomplete. + [constant expression]: ../const_eval.md#constant-expressions [floating-point types]: ../types/numeric.md#floating-point-types [lint check]: ../attributes/diagnostics.md#lint-check-attributes