Introduce ByteSymbol
#141875
Conversation
@bors try @rust-timer queue
Introduce `ByteSymbol` r? `@ghost`
☀️ Try build successful - checks-actions
Finished benchmarking commit (f661a16): comparison URL.

Overall result: ❌✅ regressions and improvements - please read the text below

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @bors rollup=never

Instruction count

This is the most reliable metric that we have; it was used to determine the overall result at the top of this comment. However, even this metric can sometimes exhibit noise.
Max RSS (memory usage)

Results (primary -1.9%, secondary -1.7%)
This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

Cycles

Results (primary -2.4%, secondary 0.3%)
This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

Binary size

Results (primary -1.1%)
This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

Bootstrap: 775.84s -> 774.376s (-0.19%)
The motivation here: introduce `ByteSymbol` so that `ast::LitKind` no longer needs `Arc`s and can be made `Copy`. It works, and the max-rss results (and page faults) are down for the literal-heavy benchmarks:
Update: but maybe most of the (non-backend) improvements were already done in #103812.
This is a real issue because people want to include relatively large binary blobs in their programs, but I'm not sure we have a relevant benchmark in the perf suite. It makes sense to benchmark this change on something specifically binary-blob-oriented.
I actually wanted to share the interner between string symbols and byte symbols; #65818 (comment) suggested making […]. Not sure if the same sharing can be done for the metadata encoding/decoding tables.
I think this generally makes sense; the byte array interner is not large and is mostly a copy-paste of the string interner, so there's not much new complexity.
Reminder: once the PR becomes ready for a review, use `@rustbot ready`.
I benchmarked this program:

```rust
pub const BYTES: &[u8] = include_bytes!("/bin/containerd");

fn main() {
    println!("len = {}", BYTES.len());
}
```

On my Linux box, hyperfine results before and after:
I.e. this PR makes things a little slower, probably because of the cost of hashing the binary blob to convert it into a `ByteSymbol`.
Seems reasonable; big binary blobs are unlikely to be identical, so there's not much sense in trying to deduplicate them.
Surprising that it's only a little slower: Cachegrind says my example program takes 39 million instructions to compile normally and 67 million instructions with this PR's changes, with the difference being all in hashing and some […].

Edit: if I put the benchmark into rustc-perf and measure it, for a […]
Some changes occurred in compiler/rustc_codegen_ssa
These commits modify compiler targets.
Some changes occurred to the CTFE / Miri interpreter cc @rust-lang/miri, @RalfJung, @oli-obk, @lcnr
Some changes occurred in match checking cc @Nadrieril
Some changes occurred in src/tools/clippy cc @rust-lang/clippy
Some changes occurred in […] cc @BoxyUwU
HIR ty lowering was modified cc @fmease
I have done this.
I've also updated the encoding/decoding so that strings and byte strings are mostly handled the same way, so there is much less code duplication than in the earlier version.
@bors try @rust-timer queue
Introduce `ByteSymbol`

It's like `Symbol` but for byte strings. The interner is now used for both `Symbol` and `ByteSymbol`. E.g. if you intern `"dog"` and `b"dog"` you'll get a `Symbol` and a `ByteSymbol` with the same index, and the characters will only be stored once.

The motivation for this is to eliminate the `Arc`s in `ast::LitKind`, to make `ast::LitKind` impl `Copy`, and to avoid the need to arena-allocate `ast::LitKind` in HIR. The latter change reduces peak memory by a non-trivial amount on literal-heavy benchmarks such as `deep-vector` and `tuple-stress`.

`Encoder`, `Decoder`, `SpanEncoder`, and `SpanDecoder` all get some changes so that they can handle normal strings and byte strings.
☀️ Try build successful - checks-actions
Finished benchmarking commit (4314adc): comparison URL.

Overall result: ❌✅ regressions and improvements - please read the text below

Benchmarking this pull request means it may be perf-sensitive – we'll automatically label it not fit for rolling up. You can override this, but we strongly advise not to, due to possible changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please do so in sufficient writing along with @bors rollup=never

Instruction count

Our most reliable metric. Used to determine the overall result above. However, even this metric can be noisy.
Max RSS (memory usage)

Results (primary 0.2%, secondary -1.0%)
A less reliable metric. May be of interest, but not used to determine the overall result above.

Cycles

Results (secondary -10.2%)
A less reliable metric. May be of interest, but not used to determine the overall result above.

Binary size

This benchmark run did not return any relevant results for this metric.

Bootstrap: 691.142s -> 690.132s (-0.15%)
It's like `Symbol` but for byte strings. The interner is now used for both `Symbol` and `ByteSymbol`. E.g. if you intern `"dog"` and `b"dog"` you'll get a `Symbol` and a `ByteSymbol` with the same index, and the characters will only be stored once.

The motivation for this is to eliminate the `Arc`s in `ast::LitKind`, to make `ast::LitKind` impl `Copy`, and to avoid the need to arena-allocate `ast::LitKind` in HIR. The latter change reduces peak memory by a non-trivial amount on literal-heavy benchmarks such as `deep-vector` and `tuple-stress`.

`Encoder`, `Decoder`, `SpanEncoder`, and `SpanDecoder` all get some changes so that they can handle normal strings and byte strings.

This change does slow down compilation of programs that use `include_bytes!` on large files, because the contents of those files are now interned (hashed). This makes `include_bytes!` more similar to `include_str!`, though `include_bytes!` contents still aren't escaped, and hashing is still much cheaper than escaping.
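A minimal sketch of the shared-interner idea described above. The type and method names here are illustrative, and a plain `HashMap` plus owned `Vec<u8>` entries stand in for rustc's arena-backed storage; it only shows how one table can back both `Symbol` and `ByteSymbol`.

```rust
use std::collections::HashMap;

// Both symbol kinds are just indices into one table, so interning "dog" and
// b"dog" yields the same index and the bytes are stored only once.
#[derive(Copy, Clone, PartialEq, Eq, Hash, Debug)]
struct Symbol(u32); // index always refers to valid UTF-8

#[derive(Copy, Clone, PartialEq, Eq, Hash, Debug)]
struct ByteSymbol(u32); // index may refer to arbitrary bytes

#[derive(Default)]
struct Interner {
    map: HashMap<Vec<u8>, u32>,
    entries: Vec<Vec<u8>>,
}

impl Interner {
    fn intern_bytes(&mut self, bytes: &[u8]) -> u32 {
        if let Some(&idx) = self.map.get(bytes) {
            return idx;
        }
        let idx = self.entries.len() as u32;
        self.entries.push(bytes.to_vec());
        self.map.insert(bytes.to_vec(), idx);
        idx
    }

    fn intern_str(&mut self, s: &str) -> Symbol {
        Symbol(self.intern_bytes(s.as_bytes()))
    }

    fn intern_byte_str(&mut self, bytes: &[u8]) -> ByteSymbol {
        ByteSymbol(self.intern_bytes(bytes))
    }
}

fn main() {
    let mut interner = Interner::default();
    let s = interner.intern_str("dog");
    let b = interner.intern_byte_str(b"dog");
    assert_eq!(s.0, b.0); // same index, characters stored once
}
```

In rustc itself the bytes live in an arena and the map holds borrowed slices of it (see the `SAFETY` excerpt further down), but the index sharing works the same way.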
```rust
rustc_index::newtype_index! {
    #[orderable]
    struct ByteSymbolIndex {}
```
It looks like `SymbolIndex` and `ByteSymbolIndex` don't need to be separate types.
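A quick sketch of what this suggestion could look like, using hypothetical plain structs instead of `rustc_index::newtype_index!`: both wrappers reuse a single index newtype rather than defining `SymbolIndex` and `ByteSymbolIndex` separately.

```rust
// Hypothetical shape: one index type shared by both wrappers.
#[derive(Copy, Clone, PartialEq, Eq, Hash, PartialOrd, Ord)]
struct SymbolIndex(u32);

#[derive(Copy, Clone, PartialEq, Eq, Hash)]
struct Symbol(SymbolIndex); // interpreted as UTF-8

#[derive(Copy, Clone, PartialEq, Eq, Hash)]
struct ByteSymbol(SymbolIndex); // interpreted as raw bytes
```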
```rust
// SAFETY: we can extend the arena allocation to `'static` because we
// only access these while the arena is still alive.
let string: &'static str = unsafe { &*(string as *const str) };
let byte_str: &'static [u8] = unsafe { &*(byte_str as *const [u8]) };
```
```rust
// This second hash table lookup can be avoided by using `RawEntryMut`,
// but this code path isn't hot enough for it to be worth it. See
// #91445 for details.
```
Huge blobs from `include_bytes!` will now go through this logic, so it makes sense to try this optimization again. Actually, it would be good to have a benchmark for including a huge binary blob on perf-rlo; it's a special case, but an important one. I've made an issue for that: rust-lang/rustc-perf#2171.
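To make the double-lookup remark concrete, here is a simplified sketch of the pattern being discussed; a plain `HashMap` and leaked allocations stand in for rustc's `FxHashMap` and arena, and the names are illustrative. A raw-entry-style API would let the miss path reuse the hash already computed for the first lookup instead of hashing the key again on insert.

```rust
use std::collections::HashMap;

#[derive(Default)]
struct Interner {
    map: HashMap<&'static [u8], u32>,
    entries: Vec<&'static [u8]>,
}

impl Interner {
    fn intern(&mut self, bytes: &[u8]) -> u32 {
        // First lookup: check whether the bytes are already interned.
        if let Some(&idx) = self.map.get(bytes) {
            return idx;
        }
        // Copy the bytes so the map can hold a long-lived reference
        // (rustc copies into an arena here; leaking stands in for that).
        let stored: &'static [u8] = Box::leak(bytes.to_vec().into_boxed_slice());
        let idx = self.entries.len() as u32;
        self.entries.push(stored);
        // Second lookup: `insert` hashes and probes the table again, which is
        // what a raw-entry API would avoid. For a huge `include_bytes!` blob,
        // that second hash of the whole blob is the part worth saving.
        self.map.insert(stored, idx);
        idx
    }
}
```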
```rust
fn encode_symbol(&mut self, symbol: Symbol) {
    // if symbol predefined, emit tag and symbol index
    if symbol.is_predefined() {

fn encode_symbol_index_or_byte_str(&mut self, index: u32, byte_str: &[u8]) {
```
Suggested change:

```diff
-fn encode_symbol_index_or_byte_str(&mut self, index: u32, byte_str: &[u8]) {
+fn encode_symbol(&mut self, index: u32) {
```

`byte_str` can be obtained here with `ByteSymbol::new(index).as_byte_str()`.

Can also avoid touching the interner in `SYMBOL_PREDEFINED` cases if doing it here.
```diff
@@ -1333,7 +1352,15 @@ impl<D: SpanDecoder> Decodable<D> for Span {

 impl<D: SpanDecoder> Decodable<D> for Symbol {
     fn decode(s: &mut D) -> Symbol {
-        s.decode_symbol()
+        s.decode_symbol_index_or_byte_str(Symbol::new, |b| {
+            Symbol::intern(unsafe { str::from_utf8_unchecked(b) })
```
Hmm, is this `unsafe` justified? We are reading data from some file (?) and just assuming that it's UTF-8. Seems pre-existing, because `Decoder::read_str` does that as well, but it at least does a `STR_SENTINEL` check.
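For illustration, a checked variant of the conversion being questioned; this is a hedged sketch of the alternative, not what the PR does, and it trades a UTF-8 validation pass over the decoded bytes for a loud failure on corrupt metadata instead of an invalid `&str`.

```rust
fn symbol_str_from_bytes(b: &[u8]) -> &str {
    // Safe alternative to `str::from_utf8_unchecked`: validates the bytes and
    // panics if the decoded metadata is not actually UTF-8.
    std::str::from_utf8(b).expect("decoded symbol bytes were not valid UTF-8")
}
```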
```diff
 self.emit_usize(v.len());
-self.emit_raw_bytes(v.as_bytes());
+self.emit_raw_bytes(v);
 self.emit_u8(STR_SENTINEL);
```
`STR_SENTINEL` is now emitted for non-UTF-8 strings, but `read_str` relies on it being an indication of UTF-8-ness to be somewhat safe.
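A simplified sketch of why the sentinel is only a weak safety net; the constant's value and the function shape below are assumptions for illustration, not the actual rustc decoder. A byte that can never occur in well-formed UTF-8 is written after the string, so a read position that has drifted is usually caught, but the check does not prove the preceding bytes really are UTF-8.

```rust
// Assumed value: 0xC1 can never appear in well-formed UTF-8.
const STR_SENTINEL: u8 = 0xC1;

fn read_str(buf: &[u8], len: usize) -> &str {
    // Catches a decoder that has drifted out of sync with the encoder...
    assert_eq!(buf[len], STR_SENTINEL, "not a string payload");
    // ...but the unchecked conversion still trusts that whoever emitted this
    // payload wrote UTF-8, which is the concern raised above.
    unsafe { std::str::from_utf8_unchecked(&buf[..len]) }
}
```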