Cache local DefId-keyed queries without hashing #119977
Conversation
@bors try @rust-timer queue
Cache DefId-keyed queries without hashing

Not yet ready for review:

* My guess is that this will be a significant memory footprint hit for sparser queries and require some more logic.
* Likely merits some further consideration for parallel rustc, though as noted in a separate comment the existing IndexVec sharding looks useless to me (likely always selecting the same shard today in 99% of cases).

cc rust-lang#45275

r? `@ghost`
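For illustration, here is a minimal standalone sketch of the idea being evaluated, using simplified stand-in types (`DefIndex` and `DepNodeIndex` as plain `u32`s, not rustc's real newtypes, and none of the query machinery): the dense per-crate index is used directly as a slot in a `Vec`, so a cache hit skips hashing entirely.

```rust
// Minimal sketch, not rustc's actual query-cache code: the dense DefIndex is
// used directly as a Vec slot, so a cache hit is a bounds check plus a load
// instead of a hash lookup.
type DefIndex = u32;
type DepNodeIndex = u32;

struct VecCache<V> {
    entries: Vec<Option<(V, DepNodeIndex)>>,
}

impl<V: Copy> VecCache<V> {
    fn new() -> Self {
        VecCache { entries: Vec::new() }
    }

    // Record a finished query result for `idx`, growing the storage as needed.
    fn complete(&mut self, idx: DefIndex, value: V, dep_node: DepNodeIndex) {
        let i = idx as usize;
        if self.entries.len() <= i {
            self.entries.resize(i + 1, None);
        }
        self.entries[i] = Some((value, dep_node));
    }

    // A hit is just an indexed load; no hashing of the key is involved.
    fn lookup(&self, idx: DefIndex) -> Option<(V, DepNodeIndex)> {
        self.entries.get(idx as usize).copied().flatten()
    }
}

fn main() {
    let mut cache = VecCache::new();
    cache.complete(3, "some query result", 17);
    assert_eq!(cache.lookup(3), Some(("some query result", 17)));
    assert_eq!(cache.lookup(4), None);
}
```

The trade-off named in the comment above follows directly from this shape: the storage is dense, so sparse queries pay for a lot of `None` slots.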
☀️ Try build successful - checks-actions
@bors try @rust-timer queue
Cache DefId-keyed queries without hashing

Not yet ready for review:

* My guess is that this will be a significant memory footprint hit for sparser queries and require some more logic.
* Likely merits some further consideration for parallel rustc, though as noted in a separate comment the existing IndexVec sharding looks useless to me (likely always selecting the same shard today in 99% of cases).

Perf notes:

* rust-lang#119977 (comment) evaluated an `IndexVec<CrateNum, IndexVec<DefIndex, Option<(V, DepNodeIndex)>>>` scheme. This showed poor performance in incremental scenarios because the `iter()` callbacks are slower when walking the sparse vecs. In `full` scenarios this was a win for many primary benchmarks (~1-6% instructions, ~1-10% cycles), but did show significant memory overhead (+50% on many benchmarks). The next attempt will (a) skip hashing for local storage (expected to be denser) while retaining the hashing for foreign storage (expected to be sparse), and (b) keep a present Vec to speed up `iter()` callbacks.

cc rust-lang#45275

r? `@ghost`
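As a rough sketch of idea (b) above, again with simplified stand-in types rather than the PR's actual code: a separate list of occupied indices is kept next to the dense storage, so `iter()` only visits populated slots even when the vec is sparse.

```rust
// Rough sketch of the "present vec" idea with simplified stand-in types:
// alongside the dense Option slots, keep the list of indices that were
// actually filled in, so iterating the cache only walks occupied entries
// instead of scanning every (mostly None) slot.
type DefIndex = u32;
type DepNodeIndex = u32;

#[derive(Default)]
struct VecCacheWithPresent<V> {
    entries: Vec<Option<(V, DepNodeIndex)>>,
    // Indices that have been completed; lets `iter` skip the None slots.
    present: Vec<DefIndex>,
}

impl<V: Copy> VecCacheWithPresent<V> {
    fn complete(&mut self, idx: DefIndex, value: V, dep_node: DepNodeIndex) {
        let i = idx as usize;
        if self.entries.len() <= i {
            self.entries.resize(i + 1, None);
        }
        if self.entries[i].is_none() {
            self.present.push(idx);
        }
        self.entries[i] = Some((value, dep_node));
    }

    // Visit only the slots recorded in `present`, not the full dense vec.
    fn iter(&self, f: &mut dyn FnMut(DefIndex, &V, DepNodeIndex)) {
        for &idx in &self.present {
            if let Some((value, dep_node)) = &self.entries[idx as usize] {
                f(idx, value, *dep_node);
            }
        }
    }
}

fn main() {
    let mut cache = VecCacheWithPresent::<u64>::default();
    cache.complete(100, 42, 7);
    cache.iter(&mut |idx, value, dep| println!("{idx} => {value} (dep node {dep})"));
}
```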
☀️ Try build successful - checks-actions
Finished benchmarking commit (e428ff4): comparison URL.

Overall result: ✅ improvements - no action needed

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

@bors rollup=never

Instruction count: this is a highly reliable metric that was used to determine the overall result at the top of this comment.

Max RSS (memory usage): this is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

Cycles: this is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

Binary size: this benchmark run did not return any relevant results for this metric.

Bootstrap: 668.966s -> 669.246s (0.04%)
Foreign maps are used to cache external DefIds, typically backed by metadata decoding. In the future we might skip caching `V` there (since loading from metadata usually is already cheap enough), but for now this cuts down on the impact to memory usage and time to None-init a bunch of memory. Foreign data is usually much sparser, since we're not usually loading *all* entries from the foreign crate(s).
force-pushed from 4d55b76 to 3784964 (compare)
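A hedged sketch of how the local/foreign split described in the commit message above might look, with simplified stand-ins (plain `u32` indices, a std `HashMap`, no locking) rather than the PR's actual types: local DefIds go through the dense, hash-free vec, while foreign DefIds stay in a hash map.

```rust
// Hedged sketch of the local/foreign split with simplified stand-in types
// (plain u32 indices, std HashMap): local DefIds use the dense, hash-free
// vec; foreign DefIds, which are much sparser, stay in a hash map.
use std::collections::HashMap;

type CrateNum = u32;
type DefIndex = u32;
type DepNodeIndex = u32;
const LOCAL_CRATE: CrateNum = 0;

#[derive(Copy, Clone, PartialEq, Eq, Hash, Debug)]
struct DefId {
    krate: CrateNum,
    index: DefIndex,
}

#[derive(Default)]
struct DefIdCache<V> {
    local: Vec<Option<(V, DepNodeIndex)>>,
    foreign: HashMap<DefId, (V, DepNodeIndex)>,
}

impl<V: Copy> DefIdCache<V> {
    fn complete(&mut self, key: DefId, value: V, dep_node: DepNodeIndex) {
        if key.krate == LOCAL_CRATE {
            let i = key.index as usize;
            if self.local.len() <= i {
                self.local.resize(i + 1, None);
            }
            self.local[i] = Some((value, dep_node));
        } else {
            self.foreign.insert(key, (value, dep_node));
        }
    }

    fn lookup(&self, key: DefId) -> Option<(V, DepNodeIndex)> {
        if key.krate == LOCAL_CRATE {
            // Hash-free path for the common, dense local case.
            self.local.get(key.index as usize).copied().flatten()
        } else {
            // Sparse foreign entries keep paying for hashing, which is fine
            // since metadata decoding dominates there anyway.
            self.foreign.get(&key).copied()
        }
    }
}

fn main() {
    let mut cache = DefIdCache::<u32>::default();
    cache.complete(DefId { krate: LOCAL_CRATE, index: 5 }, 99, 1);
    cache.complete(DefId { krate: 2, index: 5 }, 7, 2);
    assert_eq!(cache.lookup(DefId { krate: LOCAL_CRATE, index: 5 }), Some((99, 1)));
    assert_eq!(cache.lookup(DefId { krate: 2, index: 5 }), Some((7, 2)));
}
```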
r? compiler

This is ready for review with some pretty good wins; updated the PR description with a summary and force-pushed a squash of the commits and comment updates.
The sharding can definitely be left to another PR.
Perf is excellent.
@@ -152,7 +153,7 @@ impl Key for LocalDefId {
 }

 impl Key for DefId {
-    type CacheSelector = DefaultCacheSelector<Self>;
+    type CacheSelector = DefIdCacheSelector;
`ModDefId` below could use `DefIdCacheSelector` too.
I'd prefer to leave that to a follow-up (I want a separate perf run on it; ModDefId sounds like it will commonly be much sparser as a percentage of indexes, so it may not be a win to move it like this).
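For readers unfamiliar with the selector indirection in the diff above, here is a simplified model of the idea (not rustc's exact trait definitions): each query key type names a `CacheSelector`, and that selector decides at compile time which cache implementation backs queries keyed by that type.

```rust
// Simplified model of the selector indirection in the diff above -- not
// rustc's exact trait definitions. Each key type names a CacheSelector, and
// that selector decides at compile time which cache backs queries keyed by
// that type: hash-based by default, vec-based for dense DefId-style keys.
use std::collections::HashMap;
use std::marker::PhantomData;

trait CacheSelector<V> {
    type Cache;
}

// Default: an ordinary hash-based cache for arbitrary keys.
struct DefaultCacheSelector<K>(PhantomData<K>);

impl<K, V> CacheSelector<V> for DefaultCacheSelector<K> {
    type Cache = HashMap<K, V>;
}

// The new selector: a dense, hash-free cache (stand-in for the PR's storage).
struct DefIdCacheSelector;

impl<V> CacheSelector<V> for DefIdCacheSelector {
    type Cache = Vec<Option<V>>;
}

// Each query key type picks its cache via an associated type.
trait Key {
    type CacheSelector;
}

struct DefId {
    krate: u32,
    index: u32,
}

impl Key for DefId {
    // Before this PR: `type CacheSelector = DefaultCacheSelector<Self>;`
    type CacheSelector = DefIdCacheSelector;
}

fn main() {}
```

Routing the choice through an associated type means the cache is selected statically per key type, with no runtime dispatch.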
    let value = cache[idx].unwrap();
    f(&DefId { krate: LOCAL_CRATE, index: idx }, &value.0, value.1);
}
self.foreign.iter(f);
Should `guard` be dropped before iterating on `foreign`?
Shouldn't matter -- the callbacks are usually either math or serializing stuff to disk. We previously held the lock for the whole map (modulo sharding) while calling the iteration function, so I don't think it can really matter whether we hold one lock or two.
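To make the locking question concrete, here is a rough sketch of the iteration pattern under discussion, with simplified stand-in types and a plain `Mutex` in place of the compiler's sharded locks: the guard over local storage is still alive when `foreign.iter` runs.

```rust
// Rough sketch of the iteration pattern under discussion, with simplified
// stand-in types and a plain Mutex instead of the compiler's sharded locks.
// The point of interest: the guard over local storage is still alive while
// `foreign.iter` runs, so both locks are held for the foreign part.
use std::collections::HashMap;
use std::sync::Mutex;

type DefIndex = u32;
type DepNodeIndex = u32;
const LOCAL_CRATE: u32 = 0;

#[derive(Copy, Clone, PartialEq, Eq, Hash, Debug)]
struct DefId {
    krate: u32,
    index: DefIndex,
}

struct ForeignCache<V>(Mutex<HashMap<DefId, (V, DepNodeIndex)>>);

impl<V> ForeignCache<V> {
    fn iter(&self, f: &mut dyn FnMut(&DefId, &V, DepNodeIndex)) {
        for (key, (value, dep)) in self.0.lock().unwrap().iter() {
            f(key, value, *dep);
        }
    }
}

struct DefIdCache<V> {
    local: Mutex<Vec<Option<(V, DepNodeIndex)>>>,
    foreign: ForeignCache<V>,
}

impl<V> DefIdCache<V> {
    fn iter(&self, f: &mut dyn FnMut(&DefId, &V, DepNodeIndex)) {
        let guard = self.local.lock().unwrap();
        for (idx, entry) in guard.iter().enumerate() {
            if let Some((value, dep)) = entry {
                f(&DefId { krate: LOCAL_CRATE, index: idx as DefIndex }, value, *dep);
            }
        }
        // `guard` is still held here: the foreign lock is taken while the
        // local lock is live, which the reply above argues is harmless since
        // the callbacks don't re-enter the cache.
        self.foreign.iter(f);
    }
}

fn main() {
    let cache = DefIdCache {
        local: Mutex::new(vec![Some((1u32, 10))]),
        foreign: ForeignCache(Mutex::new(HashMap::new())),
    };
    cache.iter(&mut |def_id, value, dep| println!("{def_id:?} => {value} (dep {dep})"));
}
```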
@bors r+
☀️ Test successful - checks-actions
Finished benchmarking commit (098d4fd): comparison URL.

Overall result: ✅ improvements - no action needed

@rustbot label: -perf-regression

Instruction count: this is a highly reliable metric that was used to determine the overall result at the top of this comment.

Max RSS (memory usage): this is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

Cycles: this is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

Binary size: this benchmark run did not return any relevant results for this metric.

Bootstrap: 663.149s -> 664.632s (0.22%)
This caches local DefId-keyed queries using just an IndexVec. This costs at most ~5% extra max-rss but brings significant runtime improvement, up to 13% in cycle counts (mean: 4%) on primary benchmarks. Further tweaks, particularly eliminating the present set in non-incremental builds or storing it inline (skip list?) with the main data, could reduce the memory overhead, but this win seems worth landing despite the increased memory.
We tried applying this scheme to all keys in the first perf run but found that it carried a significant memory hit (50%). Instruction and cycle counts were also much more mixed, though that may have been due to the lack of the present-set optimization (needed for fast iter() calls in incremental scenarios).
Closes #45275