Cache local DefId-keyed queries without hashing #119977
Conversation
@bors try @rust-timer queue
Cache DefId-keyed queries without hashing

Not yet ready for review:

* My guess is that this will be a significant memory footprint hit for sparser queries and require some more logic.
* Likely merits some further consideration for parallel rustc, though as noted in a separate comment the existing IndexVec sharding looks useless to me (likely always selecting the same shard today in 99% of cases).

cc rust-lang#45275

r? `@ghost`
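For illustration, here is a minimal standalone sketch of the idea being evaluated, using simplified stand-in types (`DefIndex` and `DepNodeIndex` as plain `u32`s, not rustc's real newtypes, and none of the query machinery): the dense per-crate index is used directly as a slot in a `Vec`, so a cache hit skips hashing entirely.

```rust
// Minimal sketch, not rustc's actual query-cache code: the dense DefIndex is
// used directly as a Vec slot, so a cache hit is a bounds check plus a load
// instead of a hash lookup.
type DefIndex = u32;
type DepNodeIndex = u32;

struct VecCache<V> {
    entries: Vec<Option<(V, DepNodeIndex)>>,
}

impl<V: Copy> VecCache<V> {
    fn new() -> Self {
        VecCache { entries: Vec::new() }
    }

    // Record a finished query result for `idx`, growing the storage as needed.
    fn complete(&mut self, idx: DefIndex, value: V, dep_node: DepNodeIndex) {
        let i = idx as usize;
        if self.entries.len() <= i {
            self.entries.resize(i + 1, None);
        }
        self.entries[i] = Some((value, dep_node));
    }

    // A hit is just an indexed load; no hashing of the key is involved.
    fn lookup(&self, idx: DefIndex) -> Option<(V, DepNodeIndex)> {
        self.entries.get(idx as usize).copied().flatten()
    }
}

fn main() {
    let mut cache = VecCache::new();
    cache.complete(3, "some query result", 17);
    assert_eq!(cache.lookup(3), Some(("some query result", 17)));
    assert_eq!(cache.lookup(4), None);
}
```

The trade-off named in the comment above follows directly from this shape: the storage is dense, so sparse queries pay for a lot of `None` slots.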
☀️ Try build successful - checks-actions
@bors try @rust-timer queue
Cache DefId-keyed queries without hashing

Not yet ready for review:

* My guess is that this will be a significant memory footprint hit for sparser queries and require some more logic.
* Likely merits some further consideration for parallel rustc, though as noted in a separate comment the existing IndexVec sharding looks useless to me (likely always selecting the same shard today in 99% of cases).

Perf notes:

* rust-lang#119977 (comment) evaluated an `IndexVec<CrateNum, IndexVec<DefIndex, Option<(V, DepNodeIndex)>>>` scheme. This showed poor performance in incremental scenarios because the `iter()` callbacks are slower when walking the sparse vecs. In `full` scenarios this was a win for many primary benchmarks (~1-6% instructions, ~1-10% cycles), but did show significant memory overhead (+50% on many benchmarks). The next attempt will (a) skip hashing for local storage (expected to be denser) while retaining the hashing for foreign storage (expected to be sparse), and (b) keep a present Vec to speed up `iter()` callbacks.

cc rust-lang#45275

r? `@ghost`
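As a rough sketch of idea (b) above, again with simplified stand-in types rather than the PR's actual code: a separate list of occupied indices is kept next to the dense storage, so `iter()` only visits populated slots even when the vec is sparse.

```rust
// Rough sketch of the "present vec" idea with simplified stand-in types:
// alongside the dense Option slots, keep the list of indices that were
// actually filled in, so iterating the cache only walks occupied entries
// instead of scanning every (mostly None) slot.
type DefIndex = u32;
type DepNodeIndex = u32;

#[derive(Default)]
struct VecCacheWithPresent<V> {
    entries: Vec<Option<(V, DepNodeIndex)>>,
    // Indices that have been completed; lets `iter` skip the None slots.
    present: Vec<DefIndex>,
}

impl<V: Copy> VecCacheWithPresent<V> {
    fn complete(&mut self, idx: DefIndex, value: V, dep_node: DepNodeIndex) {
        let i = idx as usize;
        if self.entries.len() <= i {
            self.entries.resize(i + 1, None);
        }
        if self.entries[i].is_none() {
            self.present.push(idx);
        }
        self.entries[i] = Some((value, dep_node));
    }

    // Visit only the slots recorded in `present`, not the full dense vec.
    fn iter(&self, f: &mut dyn FnMut(DefIndex, &V, DepNodeIndex)) {
        for &idx in &self.present {
            if let Some((value, dep_node)) = &self.entries[idx as usize] {
                f(idx, value, *dep_node);
            }
        }
    }
}

fn main() {
    let mut cache = VecCacheWithPresent::<u64>::default();
    cache.complete(100, 42, 7);
    cache.iter(&mut |idx, value, dep| println!("{idx} => {value} (dep node {dep})"));
}
```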
☀️ Try build successful - checks-actions
Finished benchmarking commit (e428ff4): comparison URL.

Overall result: ✅ improvements - no action needed

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

@bors rollup=never

Instruction count: this is a highly reliable metric that was used to determine the overall result at the top of this comment.

Max RSS (memory usage): this is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

Cycles: this is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

Binary size: this benchmark run did not return any relevant results for this metric.

Bootstrap: 668.966s -> 669.246s (0.04%)
Foreign maps are used to cache external DefIds, typically backed by metadata decoding. In the future we might skip caching `V` there (since loading from metadata usually is already cheap enough), but for now this cuts down on the impact to memory usage and time to None-init a bunch of memory. Foreign data is usually much sparser, since we're not usually loading *all* entries from the foreign crate(s).
force-pushed from 4d55b76 to 3784964 (compare)
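A hedged sketch of how the local/foreign split described in the commit message above might look, with simplified stand-ins (plain `u32` indices, a std `HashMap`, no locking) rather than the PR's actual types: local DefIds go through the dense, hash-free vec, while foreign DefIds stay in a hash map.

```rust
// Hedged sketch of the local/foreign split with simplified stand-in types
// (plain u32 indices, std HashMap): local DefIds use the dense, hash-free
// vec; foreign DefIds, which are much sparser, stay in a hash map.
use std::collections::HashMap;

type CrateNum = u32;
type DefIndex = u32;
type DepNodeIndex = u32;
const LOCAL_CRATE: CrateNum = 0;

#[derive(Copy, Clone, PartialEq, Eq, Hash, Debug)]
struct DefId {
    krate: CrateNum,
    index: DefIndex,
}

#[derive(Default)]
struct DefIdCache<V> {
    local: Vec<Option<(V, DepNodeIndex)>>,
    foreign: HashMap<DefId, (V, DepNodeIndex)>,
}

impl<V: Copy> DefIdCache<V> {
    fn complete(&mut self, key: DefId, value: V, dep_node: DepNodeIndex) {
        if key.krate == LOCAL_CRATE {
            let i = key.index as usize;
            if self.local.len() <= i {
                self.local.resize(i + 1, None);
            }
            self.local[i] = Some((value, dep_node));
        } else {
            self.foreign.insert(key, (value, dep_node));
        }
    }

    fn lookup(&self, key: DefId) -> Option<(V, DepNodeIndex)> {
        if key.krate == LOCAL_CRATE {
            // Hash-free path for the common, dense local case.
            self.local.get(key.index as usize).copied().flatten()
        } else {
            // Sparse foreign entries keep paying for hashing, which is fine
            // since metadata decoding dominates there anyway.
            self.foreign.get(&key).copied()
        }
    }
}

fn main() {
    let mut cache = DefIdCache::<u32>::default();
    cache.complete(DefId { krate: LOCAL_CRATE, index: 5 }, 99, 1);
    cache.complete(DefId { krate: 2, index: 5 }, 7, 2);
    assert_eq!(cache.lookup(DefId { krate: LOCAL_CRATE, index: 5 }), Some((99, 1)));
    assert_eq!(cache.lookup(DefId { krate: 2, index: 5 }), Some((7, 2)));
}
```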
r? compiler

This is ready for review with some pretty good wins; updated the PR description with a summary and force-pushed a squash of the commits and comment updates.
The sharding can definitely be left to another PR.
Perf is excellent.
@@ -152,7 +153,7 @@ impl Key for LocalDefId {
 }

 impl Key for DefId {
-    type CacheSelector = DefaultCacheSelector<Self>;
+    type CacheSelector = DefIdCacheSelector;
`ModDefId` below could use `DefIdCacheSelector` too.
I'd prefer to leave that to a follow-up (I want a separate perf run on it; ModDefId sounds like it will commonly be much sparser as a percentage of indexes, so it may not be a win to move it like this).
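For readers unfamiliar with the selector indirection in the diff above, here is a simplified model of the idea (not rustc's exact trait definitions): each query key type names a `CacheSelector`, and that selector decides at compile time which cache implementation backs queries keyed by that type.

```rust
// Simplified model of the selector indirection in the diff above -- not
// rustc's exact trait definitions. Each key type names a CacheSelector, and
// that selector decides at compile time which cache backs queries keyed by
// that type: hash-based by default, vec-based for dense DefId-style keys.
use std::collections::HashMap;
use std::marker::PhantomData;

trait CacheSelector<V> {
    type Cache;
}

// Default: an ordinary hash-based cache for arbitrary keys.
struct DefaultCacheSelector<K>(PhantomData<K>);

impl<K, V> CacheSelector<V> for DefaultCacheSelector<K> {
    type Cache = HashMap<K, V>;
}

// The new selector: a dense, hash-free cache (stand-in for the PR's storage).
struct DefIdCacheSelector;

impl<V> CacheSelector<V> for DefIdCacheSelector {
    type Cache = Vec<Option<V>>;
}

// Each query key type picks its cache via an associated type.
trait Key {
    type CacheSelector;
}

struct DefId {
    krate: u32,
    index: u32,
}

impl Key for DefId {
    // Before this PR: `type CacheSelector = DefaultCacheSelector<Self>;`
    type CacheSelector = DefIdCacheSelector;
}

fn main() {}
```

Routing the choice through an associated type means the cache is selected statically per key type, with no runtime dispatch.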
    let value = cache[idx].unwrap();
    f(&DefId { krate: LOCAL_CRATE, index: idx }, &value.0, value.1);
}
self.foreign.iter(f);
Should `guard` be dropped before iterating on `foreign`?
Shouldn't matter -- the callbacks are usually either math or serializing stuff to disk. We previously held the lock for the whole map (modulo sharding) while calling the iteration function, so I don't think it can really matter whether we hold one lock or two.
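To make the locking question concrete, here is a rough sketch of the iteration pattern under discussion, with simplified stand-in types and a plain `Mutex` in place of the compiler's sharded locks: the guard over local storage is still alive when `foreign.iter` runs.

```rust
// Rough sketch of the iteration pattern under discussion, with simplified
// stand-in types and a plain Mutex instead of the compiler's sharded locks.
// The point of interest: the guard over local storage is still alive while
// `foreign.iter` runs, so both locks are held for the foreign part.
use std::collections::HashMap;
use std::sync::Mutex;

type DefIndex = u32;
type DepNodeIndex = u32;
const LOCAL_CRATE: u32 = 0;

#[derive(Copy, Clone, PartialEq, Eq, Hash, Debug)]
struct DefId {
    krate: u32,
    index: DefIndex,
}

struct ForeignCache<V>(Mutex<HashMap<DefId, (V, DepNodeIndex)>>);

impl<V> ForeignCache<V> {
    fn iter(&self, f: &mut dyn FnMut(&DefId, &V, DepNodeIndex)) {
        for (key, (value, dep)) in self.0.lock().unwrap().iter() {
            f(key, value, *dep);
        }
    }
}

struct DefIdCache<V> {
    local: Mutex<Vec<Option<(V, DepNodeIndex)>>>,
    foreign: ForeignCache<V>,
}

impl<V> DefIdCache<V> {
    fn iter(&self, f: &mut dyn FnMut(&DefId, &V, DepNodeIndex)) {
        let guard = self.local.lock().unwrap();
        for (idx, entry) in guard.iter().enumerate() {
            if let Some((value, dep)) = entry {
                f(&DefId { krate: LOCAL_CRATE, index: idx as DefIndex }, value, *dep);
            }
        }
        // `guard` is still held here: the foreign lock is taken while the
        // local lock is live, which the reply above argues is harmless since
        // the callbacks don't re-enter the cache.
        self.foreign.iter(f);
    }
}

fn main() {
    let cache = DefIdCache {
        local: Mutex::new(vec![Some((1u32, 10))]),
        foreign: ForeignCache(Mutex::new(HashMap::new())),
    };
    cache.iter(&mut |def_id, value, dep| println!("{def_id:?} => {value} (dep {dep})"));
}
```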
@bors r+
☀️ Test successful - checks-actions
Finished benchmarking commit (098d4fd): comparison URL.

Overall result: ✅ improvements - no action needed

@rustbot label: -perf-regression

Instruction count: this is a highly reliable metric that was used to determine the overall result at the top of this comment.

Max RSS (memory usage): this is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

Cycles: this is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

Binary size: this benchmark run did not return any relevant results for this metric.

Bootstrap: 663.149s -> 664.632s (0.22%)
This caches local DefId-keyed queries using just an IndexVec. This costs at most ~5% extra max-rss but brings significant runtime improvement, up to 13% in cycle counts (mean: 4%) on primary benchmarks. Further tweaks, particularly eliminating the present set in non-incremental builds or storing it inline (skip list?) with the main data, could reduce the memory overhead, but this win seems worth landing despite the increased memory.
We tried applying this scheme to all keys in the first perf run but found that it carried a significant memory hit (50%). Instruction and cycle counts were also much more mixed, though that may have been due to the lack of the present-set optimization (needed for fast iter() calls in incremental scenarios).
Closes #45275