Commit f07e1f9

RalfJung authored and mark-i-m committed
apply linebreaks
1 parent b69fcaf commit f07e1f9

1 file changed

+96
-63
lines changed

src/miri.md

Lines changed: 96 additions & 63 deletions
@@ -55,33 +55,40 @@ Before the evaluation, a virtual memory location (in this case essentially a
5555
`vec![u8; 4]` or `vec![u8; 8]`) is created for storing the evaluation result.
5656

5757
At the start of the evaluation, `_0` and `_1` are
58-
`Operand::Immediate(Immediate::Scalar(ScalarMaybeUndef::Undef))`.
59-
This is quite a mouthful: [`Operand`] can represent either data stored somewhere in the [interpreter memory](#memory) (`Operand::Indirect`), or (as an optimization) immediate data stored in-line.
60-
And [`Immediate`] can either be a single (potentially uninitialized) [scalar value][`Scalar`] (integer or thin pointer), or a pair of two of them.
61-
In our case, the single scalar value is *not* (yet) initialized.
62-
63-
When the initialization of `_1` is invoked, the
64-
value of the `FOO` constant is required, and triggers another call to
65-
`tcx.const_eval`, which will not be shown here. If the evaluation of `FOO` is
66-
successful, `42` will be subtracted from its value `4096` and the result stored in
67-
`_1` as `Operand::Immediate(Immediate::ScalarPair(Scalar::Raw { data: 4054, .. }, Scalar::Raw { data: 0, .. }))`. The first
68-
part of the pair is the computed value, the second part is a bool that's true if
69-
an overflow happened. A `Scalar::Raw` also stores the size (in bytes) of this scalar value; we are eliding that here.
58+
`Operand::Immediate(Immediate::Scalar(ScalarMaybeUndef::Undef))`. This is quite
59+
a mouthful: [`Operand`] can represent either data stored somewhere in the
60+
[interpreter memory](#memory) (`Operand::Indirect`), or (as an optimization)
61+
immediate data stored in-line. And [`Immediate`] can either be a single
62+
(potentially uninitialized) [scalar value][`Scalar`] (integer or thin pointer),
63+
or a pair of two of them. In our case, the single scalar value is *not* (yet)
64+
initialized.
65+
66+
When the initialization of `_1` is invoked, the value of the `FOO` constant is
67+
required, and triggers another call to `tcx.const_eval`, which will not be shown
68+
here. If the evaluation of `FOO` is successful, `42` will be subtracted from its
69+
value `4096` and the result stored in `_1` as
70+
`Operand::Immediate(Immediate::ScalarPair(Scalar::Raw { data: 4054, .. },
71+
Scalar::Raw { data: 0, .. }))`. The first part of the pair is the computed value,
72+
the second part is a bool that's true if an overflow happened. A `Scalar::Raw`
73+
also stores the size (in bytes) of this scalar value; we are eliding that here.
7074

7175
The next statement asserts that said boolean is `0`. In case the assertion
7276
fails, its error message is used for reporting a compile-time error.
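
The checked subtraction and the assert that follows it can be modelled in a few lines of plain Rust. This is only an illustrative sketch: `CheckedResult`, `checked_sub_pair` and `assert_no_overflow` are hypothetical names, not Miri APIs, and the error string is made up. The pair mirrors the `(value, overflowed)` shape described above.

```rust
/// Hypothetical stand-in for the `(value, overflowed)` pair that the text
/// describes as an `Immediate::ScalarPair`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct CheckedResult {
    value: u64,
    overflowed: bool,
}

/// Models the checked subtraction: compute the (wrapped) value together with
/// a flag that says whether an overflow happened.
fn checked_sub_pair(lhs: u64, rhs: u64) -> CheckedResult {
    let (value, overflowed) = lhs.overflowing_sub(rhs);
    CheckedResult { value, overflowed }
}

/// Models the assert statement: the value is only handed on if the overflow
/// flag is `false`; otherwise an error message is produced.
fn assert_no_overflow(pair: CheckedResult) -> Result<u64, String> {
    if pair.overflowed {
        // Assumed message text, for illustration only.
        Err(String::from("attempt to subtract with overflow"))
    } else {
        Ok(pair.value)
    }
}
```

With these definitions, `assert_no_overflow(checked_sub_pair(4096, 42))` yields `Ok(4054)`, while `checked_sub_pair(0, 1)` sets the overflow flag and so produces an error instead of a value.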
7377

74-
Since it does not fail, `Operand::Immediate(Immediate::Scalar(Scalar::Raw { data: 4054, .. }))` is stored in the
75-
virtual memory that was allocated before the evaluation. `_0` always refers to that
76-
location directly.
77-
78-
After the evaluation is done, the return value is converted from [`Operand`] to [`ConstValue`] by [`op_to_const`]:
79-
the former representation is geared towards what is needed *during* const evaluation, while [`ConstValue`]
80-
is shaped by the needs of the remaining parts of the compiler that consume the results of const evaluation.
81-
As part of this conversion, for types with scalar values, even if
82-
the resulting [`Operand`] is `Indirect`, it will return an immediate `ConstValue::Scalar(computed_value)` (instead of the usual `ConstValue::ByRef`).
83-
This makes using the result much more efficient and also more convenient, as no further queries need to be
84-
executed in order to get at something as simple as a `usize`.
78+
Since it does not fail, `Operand::Immediate(Immediate::Scalar(Scalar::Raw {
79+
data: 4054, .. }))` is stored in the virtual memory that was allocated before the
80+
evaluation. `_0` always refers to that location directly.
81+
82+
After the evaluation is done, the return value is converted from [`Operand`] to
83+
[`ConstValue`] by [`op_to_const`]: the former representation is geared towards
84+
what is needed *during* const evaluation, while [`ConstValue`] is shaped by the
85+
needs of the remaining parts of the compiler that consume the results of const
86+
evaluation. As part of this conversion, for types with scalar values, even if
87+
the resulting [`Operand`] is `Indirect`, it will return an immediate
88+
`ConstValue::Scalar(computed_value)` (instead of the usual `ConstValue::ByRef`).
89+
This makes using the result much more efficient and also more convenient, as no
90+
further queries need to be executed in order to get at something as simple as a
91+
`usize`.
8592

8693
Future evaluations of the same constants will not actually invoke
8794
Miri, but just use the cached result.
@@ -96,31 +103,39 @@ Miri, but just use the cached result.
96103

97104
Miri's outside-facing data structures can be found in
98105
[librustc/mir/interpret](https://github.com/rust-lang/rust/blob/master/src/librustc/mir/interpret).
99-
This is mainly the error enum and the [`ConstValue`] and [`Scalar`] types. A `ConstValue` can
100-
be either `Scalar` (a single `Scalar`, i.e., integer or thin pointer),
101-
`Slice` (to represent byte slices and strings, as needed for pattern matching) or `ByRef`, which is used for anything else and
102-
refers to a virtual allocation. These allocations can be accessed via the
103-
methods on `tcx.interpret_interner`.
104-
A `Scalar` is either some `Raw` integer or a pointer; see [the next section](#memory) for more on that.
106+
This is mainly the error enum and the [`ConstValue`] and [`Scalar`] types. A
107+
`ConstValue` can be either `Scalar` (a single `Scalar`, i.e., integer or thin
108+
pointer), `Slice` (to represent byte slices and strings, as needed for pattern
109+
matching) or `ByRef`, which is used for anything else and refers to a virtual
110+
allocation. These allocations can be accessed via the methods on
111+
`tcx.interpret_interner`. A `Scalar` is either some `Raw` integer or a pointer;
112+
see [the next section](#memory) for more on that.
105113

106114
If you are expecting a numeric result, you can use `eval_usize` (panics on
107115
anything that can't be represented as a `u64`) or `try_eval_usize` which results
108116
in an `Option<u64>` yielding the `Scalar` if possible.
109117

110118
## Memory
111119

112-
To support any kind of pointers, Miri needs to have a "virtual memory" that the pointers can point to.
113-
This is implemented in the [`Memory`] type.
114-
In the simplest model, every global variable, stack variable and every dynamic allocation corresponds to an [`Allocation`] in that memory.
115-
(Actually using an allocation for every MIR stack variable would be very inefficient; that's why we have `Operand::Immediate` for stack variables that are both small and never have their address taken.
116-
But that is purely an optimization.)
117-
118-
Such an `Allocation` is basically just a sequence of `u8` storing the value of each byte in this allocation.
119-
(Plus some extra data, see below.)
120-
Every `Allocation` has a globally unique `AllocId` assigned in `Memory`.
121-
With that, a [`Pointer`] consists of a pair of an `AllocId` (indicating the allocation) and an offset into the allocation (indicating which byte of the allocation the pointer points to).
122-
It may seem odd that a `Pointer` is not just an integer address, but remember that during const evaluation, we cannot know at which actual integer address the allocation will end up -- so we use `AllocId` as symbolic base addresses, which means we need a separate offset.
123-
(As an aside, it turns out that pointers at run-time are [more than just integers, too](https://rust-lang.github.io/unsafe-code-guidelines/glossary.html#pointer-provenance).)
120+
To support any kind of pointers, Miri needs to have a "virtual memory" that the
121+
pointers can point to. This is implemented in the [`Memory`] type. In the
122+
simplest model, every global variable, stack variable and every dynamic
123+
allocation corresponds to an [`Allocation`] in that memory. (Actually using an
124+
allocation for every MIR stack variable would be very inefficient; that's why we
125+
have `Operand::Immediate` for stack variables that are both small and never have
126+
their address taken. But that is purely an optimization.)
127+
128+
Such an `Allocation` is basically just a sequence of `u8` storing the value of
129+
each byte in this allocation. (Plus some extra data, see below.) Every
130+
`Allocation` has a globally unique `AllocId` assigned in `Memory`. With that, a
131+
[`Pointer`] consists of a pair of an `AllocId` (indicating the allocation) and
132+
an offset into the allocation (indicating which byte of the allocation the
133+
pointer points to). It may seem odd that a `Pointer` is not just an integer
134+
address, but remember that during const evaluation, we cannot know at which
135+
actual integer address the allocation will end up -- so we use `AllocId` as
136+
symbolic base addresses, which means we need a separate offset. (As an aside,
137+
it turns out that pointers at run-time are
138+
[more than just integers, too](https://rust-lang.github.io/unsafe-code-guidelines/glossary.html#pointer-provenance).)
124139

125140
These allocations exist so that references and raw pointers have something to
126141
point to. There is no global linear heap in which things are allocated, but each
@@ -131,23 +146,35 @@ matter how unsafe) operation that you can do that would ever change said pointer
131146
to a pointer to a different local variable `b`.
132147
Pointer arithmetic on `a` will only ever change its offset; the `AllocId` stays the same.
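
A minimal model of this idea, assuming nothing beyond what the text says (a pointer is an allocation identifier plus an offset), might look like the following. The field names and the `offset_by` helper are hypothetical and chosen for illustration; the real `Pointer` type has more to it.

```rust
/// Symbolic base address: identifies an allocation, not a machine address.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct AllocId(u64);

/// Simplified pointer: which allocation, and which byte inside it.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Pointer {
    alloc_id: AllocId,
    offset: u64,
}

impl Pointer {
    fn new(alloc_id: AllocId, offset: u64) -> Self {
        Pointer { alloc_id, offset }
    }

    /// Pointer arithmetic only ever touches the offset; the `AllocId` is
    /// copied unchanged. Returns `None` if the offset would overflow.
    fn offset_by(self, bytes: u64) -> Option<Pointer> {
        let offset = self.offset.checked_add(bytes)?;
        Some(Pointer { alloc_id: self.alloc_id, offset })
    }
}
```

Because `offset_by` has no way to produce a different `AllocId`, a pointer derived from local `a` can never become a pointer into local `b` in this model.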
133148

134-
This, however, causes a problem when we want to store a `Pointer` into an `Allocation`: we cannot turn it into a sequence of `u8` of the right length!
135-
`AllocId` and offset together are twice as big as a pointer "seems" to be.
136-
This is what the `relocation` field of `Allocation` is for: the byte offset of the `Pointer` gets stored as a bunch of `u8`, while its `AllocId` gets stored out-of-band.
137-
The two are reassembled when the `Pointer` is read from memory.
138-
The other bit of extra data an `Allocation` needs is `undef_mask` for keeping track of which of its bytes are initialized.
149+
This, however, causes a problem when we want to store a `Pointer` into an
150+
`Allocation`: we cannot turn it into a sequence of `u8` of the right length!
151+
`AllocId` and offset together are twice as big as a pointer "seems" to be. This
152+
is what the `relocation` field of `Allocation` is for: the byte offset of the
153+
`Pointer` gets stored as a bunch of `u8`, while its `AllocId` gets stored
154+
out-of-band. The two are reassembled when the `Pointer` is read from memory.
155+
The other bit of extra data an `Allocation` needs is `undef_mask` for keeping
156+
track of which of its bytes are initialized.
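
The out-of-band storage can be sketched as follows. This is a simplified model under stated assumptions: pointers are 8 bytes wide, offsets are stored little-endian, the side table is a `BTreeMap` called `relocations`, and initialization is tracked with one `bool` per byte in a field called `init`. None of these names or layout choices are taken from the actual implementation.

```rust
use std::collections::BTreeMap;

/// Assumption for this sketch: a 64-bit target, so pointers occupy 8 bytes.
const PTR_SIZE: usize = 8;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct AllocId(u64);

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Pointer {
    alloc_id: AllocId,
    offset: u64,
}

struct Allocation {
    /// The raw bytes of the allocation.
    bytes: Vec<u8>,
    /// Out-of-band data: byte position of a stored pointer -> its `AllocId`.
    relocations: BTreeMap<usize, AllocId>,
    /// One flag per byte: has this byte been initialized?
    init: Vec<bool>,
}

impl Allocation {
    fn new(size: usize) -> Self {
        Allocation {
            bytes: vec![0; size],
            relocations: BTreeMap::new(),
            init: vec![false; size],
        }
    }

    /// Store a pointer: the offset goes into the bytes, the `AllocId` goes
    /// into the side table. Returns `None` if the write is out of bounds.
    fn write_ptr(&mut self, at: usize, ptr: Pointer) -> Option<()> {
        let end = at.checked_add(PTR_SIZE)?;
        let dest = self.bytes.get_mut(at..end)?;
        dest.copy_from_slice(&ptr.offset.to_le_bytes());
        for flag in &mut self.init[at..end] {
            *flag = true;
        }
        self.relocations.insert(at, ptr.alloc_id);
        Some(())
    }

    /// Load a pointer: reassemble the offset bytes and the side-table entry.
    /// Returns `None` for out-of-bounds, uninitialized, or non-pointer bytes.
    fn read_ptr(&self, at: usize) -> Option<Pointer> {
        let end = at.checked_add(PTR_SIZE)?;
        let src = self.bytes.get(at..end)?;
        if !self.init[at..end].iter().all(|&b| b) {
            return None;
        }
        let alloc_id = *self.relocations.get(&at)?;
        let mut raw = [0u8; PTR_SIZE];
        raw.copy_from_slice(src);
        Some(Pointer { alloc_id, offset: u64::from_le_bytes(raw) })
    }
}
```

Writing a `Pointer` at some position and reading it back from the same position returns the original value, while reading from bytes that were never written returns `None` because the initialization flags are still `false`.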
139157

140158
### Global memory and exotic allocations
141159

142-
`Memory` exists only during the Miri evaluation; it gets destroyed when the final value of the constant is computed.
143-
In case that constant contains any pointers, those get "interned" and moved to a global "const eval memory" that is part of `TyCtxt`.
144-
These allocations stay around for the remaining computation and get serialized into the final output (so that dependent crates can use them).
145-
146-
Moreover, to also support function pointers, the global memory in `TyCtxt` can also contain "virtual allocations": instead of an `Allocation`, these contain an `Instance`.
147-
That allows a `Pointer` to point to either normal data or a function, which is needed to be able to evaluate casts from function pointers to raw pointers.
148-
149-
Finally, the [`GlobalAlloc`] type used in the global memory also contains a variant `Static` that points to a particular `const` or `static` item.
150-
This is needed to support circular statics, where we need to have a `Pointer` to a `static` for which we cannot yet have an `Allocation` as we do not know the bytes of its value.
160+
`Memory` exists only during the Miri evaluation; it gets destroyed when the
161+
final value of the constant is computed. In case that constant contains any
162+
pointers, those get "interned" and moved to a global "const eval memory" that is
163+
part of `TyCtxt`. These allocations stay around for the remaining computation
164+
and get serialized into the final output (so that dependent crates can use
165+
them).
166+
167+
Moreover, to also support function pointers, the global memory in `TyCtxt` can
168+
also contain "virtual allocations": instead of an `Allocation`, these contain an
169+
`Instance`. That allows a `Pointer` to point to either normal data or a
170+
function, which is needed to be able to evaluate casts from function pointers to
171+
raw pointers.
172+
173+
Finally, the [`GlobalAlloc`] type used in the global memory also contains a
174+
variant `Static` that points to a particular `const` or `static` item. This is
175+
needed to support circular statics, where we need to have a `Pointer` to a
176+
`static` for which we cannot yet have an `Allocation` as we do not know the
177+
bytes of its value.
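
The three kinds of global allocation described above can be pictured as an enum behind a table keyed by `AllocId`. The sketch below is a loose model, not the compiler's definition: the `Memory` and `Function` variant names, the `String` payloads standing in for `Instance` and for the item's identity, and the `GlobalMemory` table with its `intern` and `get` methods are all assumptions made for illustration.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
struct AllocId(u64);

/// Simplified model of what an `AllocId` can refer to in global memory.
#[derive(Debug, Clone, PartialEq, Eq)]
enum GlobalAlloc {
    /// Ordinary interned data (stand-in for an `Allocation`).
    Memory(Vec<u8>),
    /// A "virtual allocation" for a function (stand-in for an `Instance`).
    Function(String),
    /// A `const` or `static` item whose bytes may not be known yet.
    Static(String),
}

/// Hypothetical global table mapping ids to their contents.
#[derive(Default)]
struct GlobalMemory {
    next_id: u64,
    allocs: HashMap<AllocId, GlobalAlloc>,
}

impl GlobalMemory {
    /// Register an allocation and hand back a fresh, unique id.
    fn intern(&mut self, alloc: GlobalAlloc) -> AllocId {
        let id = AllocId(self.next_id);
        self.next_id += 1;
        self.allocs.insert(id, alloc);
        id
    }

    /// Look up what an id refers to.
    fn get(&self, id: AllocId) -> Option<&GlobalAlloc> {
        self.allocs.get(&id)
    }
}
```

In this model a pointer to a circular `static` can be created by interning a `GlobalAlloc::Static` entry first; the id exists even though no bytes for the item's value are available yet.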
151178

152179
[`Memory`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/interpret/struct.Memory.html
153180
[`Allocation`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc/mir/interpret/struct.Allocation.html
@@ -156,14 +183,20 @@ This is needed to support circular statics, where we need to have a `Pointer` to
156183

157184
### Pointer values vs Pointer types
158185

159-
One common cause of confusion in Miri is that being a pointer *value* and having a pointer *type* are entirely independent properties.
160-
By "pointer value", we refer to a `Scalar::Ptr` containing a `Pointer` and thus pointing somewhere into Miri's virtual memory.
161-
This is in contrast to `Scalar::Raw`, which is just some concrete integer.
162-
163-
However, a variable of pointer or reference *type*, such as `*const T` or `&T`, does not have to have a pointer *value*:
164-
it could be obtained by casting or transmuting an integer to a pointer (currently that is hard to do in const eval, but eventually `transmute` will be stable as a `const fn`).
165-
And similarly, when casting or transmuting a reference to some actual allocation to an integer, we end up with a pointer *value* (`Scalar::Ptr`) at integer *type* (`usize`).
166-
This is a problem because we cannot meaningfully perform integer operations such as division on pointer values.
186+
One common cause of confusion in Miri is that being a pointer *value* and having
187+
a pointer *type* are entirely independent properties. By "pointer value", we
188+
refer to a `Scalar::Ptr` containing a `Pointer` and thus pointing somewhere into
189+
Miri's virtual memory. This is in contrast to `Scalar::Raw`, which is just some
190+
concrete integer.
191+
192+
However, a variable of pointer or reference *type*, such as `*const T` or `&T`,
193+
does not have to have a pointer *value*: it could be obtained by casting or
194+
transmuting an integer to a pointer (currently that is hard to do in const eval,
195+
but eventually `transmute` will be stable as a `const fn`). And similarly, when
196+
casting or transmuting a reference to some actual allocation to an integer, we
197+
end up with a pointer *value* (`Scalar::Ptr`) at integer *type* (`usize`). This
198+
is a problem because we cannot meaningfully perform integer operations such as
199+
division on pointer values.
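
The distinction can be made concrete with a two-variant enum. This is a simplified sketch: here `Raw` carries only a `u64` (the text notes that the real `Scalar::Raw` also records a size), and `divide` is a hypothetical helper with made-up error strings, not an interpreter function.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct AllocId(u64);

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Pointer {
    alloc_id: AllocId,
    offset: u64,
}

/// Simplified scalar: either a concrete integer or a symbolic pointer.
/// Which variant a value has is independent of its static type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Scalar {
    Raw(u64),
    Ptr(Pointer),
}

/// Integer division is only meaningful when both operands are concrete
/// integers; a pointer value has no known integer address to divide.
fn divide(lhs: Scalar, rhs: Scalar) -> Result<Scalar, &'static str> {
    match (lhs, rhs) {
        (Scalar::Raw(_), Scalar::Raw(0)) => Err("division by zero"),
        (Scalar::Raw(a), Scalar::Raw(b)) => Ok(Scalar::Raw(a / b)),
        _ => Err("cannot divide a pointer value"),
    }
}
```

A value of type `usize` that was produced by casting a reference would be a `Scalar::Ptr` in this model, so passing it to `divide` returns an error even though division is a perfectly valid operation on the type `usize`.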
167200

168201
## Interpretation
169202
