stop specializing on Copy #135634


joboet
Member

@joboet joboet commented Jan 17, 2025

fixes #132442

std specializes on Copy to optimize certain library functions such as clone_from_slice. This is unsound, however, as the Copy implementation may not be always applicable because of lifetime bounds, which specialization does not take into account; the result being that values are copied even though they are not Copy. For instance, this code:

use std::cell::Cell;

struct SometimesCopy<'a>(&'a Cell<bool>);

impl<'a> Clone for SometimesCopy<'a> {
    fn clone(&self) -> Self {
        self.0.set(true);
        Self(self.0)
    }
}

impl Copy for SometimesCopy<'static> {}

fn main() {
    let clone_called = Cell::new(false);
    // As SometimesCopy<'clone_called> is not 'static, this must run `clone`,
    // setting the value to `true`.
    let _ = [SometimesCopy(&clone_called)].clone();
    assert!(clone_called.get());
}

should not panic, but does (playground).

To solve this, this PR introduces a new unsafe trait: TrivialClone. This trait may be implemented whenever the Clone implementation is equivalent to copying the value (so e.g. fn clone(&self) -> Self { *self }). Because of lifetime erasure, there is no way for the Clone implementation to observe lifetime bounds, meaning that even if the TrivialClone has stricter bounds than the Clone implementation, its invariant still holds. Therefore, it is sound to specialize on TrivialClone.
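As a rough illustration of the idea (not the PR's actual code), the sketch below defines a stand-in `TrivialClone` marker trait and a helper that copies a slice with a single memcpy. The trait definition, the `u32` impl and the `clone_to_vec` helper are all hypothetical names chosen for this example; the real trait lives inside the standard library and is used through specialization rather than an explicit bound.

```rust
use std::ptr;

/// Hypothetical stand-in for the marker trait described above.
///
/// # Safety
/// Implementors promise that `clone` is equivalent to a bitwise copy.
unsafe trait TrivialClone: Clone {}

// SAFETY: cloning a `u32` is a plain bitwise copy.
unsafe impl TrivialClone for u32 {}

/// Clones a slice into a new `Vec` with one memcpy instead of
/// calling `clone` on every element.
fn clone_to_vec<T: TrivialClone>(src: &[T]) -> Vec<T> {
    let mut out: Vec<T> = Vec::with_capacity(src.len());
    // SAFETY: `out` has room for `src.len()` elements, the fresh
    // allocation cannot overlap `src`, and `T: TrivialClone` promises
    // that a bitwise copy behaves like `clone`.
    unsafe {
        ptr::copy_nonoverlapping(src.as_ptr(), out.as_mut_ptr(), src.len());
        out.set_len(src.len());
    }
    out
}
```

Because the promise is attached to the `Clone` implementation itself rather than to `Copy`, a type whose `clone` has side effects simply never implements the marker, whatever its lifetime parameters are.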

I've changed all Copy specializations in the standard library to specialize on TrivialClone instead. Unfortunately, the unsound #[rustc_unsafe_specialization_marker] attribute on Copy cannot be removed in this PR as hashbrown still depends on it. I'll make a PR updating hashbrown once this lands.

With Copy no longer being considered for specialization, this change alone would result in the standard library optimizations not being applied for user types unaware of TrivialClone. To avoid this and restore the optimizations in most cases, I have changed the expansion of #[derive(Clone)]: Currently, whenever both Clone and Copy are derived, the clone method performs a copy of the value. With this PR, the derive macro also adds a TrivialClone implementation to make this case observable using specialization. I anticipate that most users will use #[derive(Clone, Copy)] whenever both are applicable, so most users will still profit from the library optimizations.
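A simplified sketch of what this could look like for a user type follows. The `Point` struct, the local `TrivialClone` trait and the `is_trivially_clonable` helper are assumptions made for illustration; the impls are written by hand here to mimic the expansion described above, and the real macro output may differ in detail.

```rust
/// Hypothetical stand-in for the marker trait introduced by the PR.
///
/// # Safety
/// Implementors promise that `clone` is equivalent to a bitwise copy.
unsafe trait TrivialClone: Clone {}

struct Point {
    x: i32,
    y: i32,
}

// Roughly what `#[derive(Clone, Copy)]` already generates:
// `clone` just copies the value.
impl Copy for Point {}

impl Clone for Point {
    fn clone(&self) -> Self {
        *self
    }
}

// The additional impl the derive would emit under this PR.
// SAFETY: `clone` above is exactly `*self`.
unsafe impl TrivialClone for Point {}

/// Only compiles for types that carry the marker.
fn is_trivially_clonable<T: TrivialClone>(_value: &T) -> bool {
    true
}
```

A type with a hand-written `Clone` and a separate `impl Copy` would not get the marker automatically, which is the case the performance discussion below is concerned with.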

Unfortunately, Hyrum's law applies to this PR: there are some popular crates which rely on the precise specialization behaviour of core to implement "specialization at home", e.g. libAFL. I have no remorse for breaking such horrible code, but perhaps we should open other, better ways to satisfy their needs – for example by dropping the 'static bound on TypeId::of...
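For context, this is the kind of type comparison that stable Rust supports today. `TypeId::of` requires `T: 'static`, so the hypothetical `same_type` helper below cannot be called with borrowed, non-`'static` types; presumably that restriction is what pushes some crates towards relying on `core`'s specialization behaviour instead.

```rust
use std::any::TypeId;

/// Returns `true` when `T` and `U` are the same type.
/// Both parameters must be `'static` because `TypeId::of` demands it.
fn same_type<T: 'static, U: 'static>() -> bool {
    TypeId::of::<T>() == TypeId::of::<U>()
}
```

Calling `same_type::<u32, u32>()` yields `true`, while `same_type::<u32, i32>()` yields `false`.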

@rustbot
Collaborator

rustbot commented Jan 17, 2025

r? @Mark-Simulacrum

rustbot has assigned @Mark-Simulacrum.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. T-libs Relevant to the library team, which will review and decide on the PR/issue. labels Jan 17, 2025
@rustbot
Collaborator

rustbot commented Jan 17, 2025

Changes to the code generated for builtin derived traits.

cc @nnethercote

@Mark-Simulacrum Mark-Simulacrum added T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. I-libs-api-nominated Nominated for discussion during a libs-api team meeting. I-libs-nominated Nominated for discussion during a libs team meeting. labels Jan 17, 2025
@Mark-Simulacrum
Member

Going to nominate for libs-api (and libs), since this is a breaking change (allowed, since it fixes a soundness issue). I feel like I recall an RFC or some other discussion where we explicitly said libraries shouldn't do the unsound thing here, but I don't know what that was. https://rust-lang.github.io/rfcs/1521-copy-clone-semantics.html is a bit related but not directly :)

@the8472
Member

the8472 commented Jan 17, 2025

RFC 1521 could be interpreted that way, since it requires that Clone is equivalent to Copy when both are implemented.

Since SometimesCopy implements both (at least sometimes), they must be equivalent. And since the Clone implementation cannot tell 'static and non-'static apart, they must always be equivalent. Therefore the Clone implementation is wrong.


> This is unsound, however, as the Copy implementation may not be always applicable because of lifetime bounds, which specialization does not take into account; the result being that values are copied even though they are not Copy.

I still don't think this is unsound in itself. So far all demonstrations of unsoundness required some other unsafe code to turn this into a miscompilation. E.g. the WeirdCow in #132442 or the TrustedLen impl in #89948 both require unsafe to exploit this.

Noratrieb also argues that lifetime-conditional Copy currently is unsupported in MIR.

So ISTM that this could be a documentation shortcoming and a compiler/lang issue that such implementations should be prevented but aren't.

That said, I agree that the current situation is brittle.

@scottmcm
Member

Without saying anything about specialization on Copy, there's definitely been past lang discussion of splitting the "memcpyable" part of Clone from the "don't need to write .clone()" part. Something like TrivialClone would probably be what that would need as well, and would (as you mention in the docs in the PR) be nice for allowing memcpying of non-Copy types like legacy::Range.

But that gets back to needing, as the8472 said, a way to actually block lifetime-bad implementations before it could be stable.

@Amanieu Amanieu removed I-libs-api-nominated Nominated for discussion during a libs-api team meeting. I-libs-nominated Nominated for discussion during a libs team meeting. labels Jan 21, 2025
@the8472
Member

the8472 commented Jan 21, 2025

We discussed this during today's libs-API meeting. We currently are not aware of any safe code that is unsound due to these specializations and there were concerns about performance regressions for user types that manually implement Copy.

So we're leaning towards keeping the implementations as they are and instead improving things in other ways such as adding compiler warnings or improving the Copy documentation or unsafe-code-guidelines.


We'd like input from T-types whether they agree with this assessment and if something should be changed on the language side, e.g. by forbidding or at least warning on lifetime-conditional implementations, similar to how Drop impls must have the same bounds as the type it's implemented on.

A compiler-team member has indicated that lifetime-dependent Copy impls are de-facto unsupported.

@the8472 the8472 added the I-types-nominated Nominated for discussion during a types team meeting. label Jan 21, 2025
@BoxyUwU
Member

BoxyUwU commented Jan 21, 2025

Forbidding lifetime dependent copy impls seems like it would be rather breaking (but that's pure speculation, we ought to do a crater run to check if anyone feels strongly we should forbid such impls), though generally I don't feel great about forbidding lifetime dependent copy impls. I also don't think a warning on lifetime dependent copy impls really helps anything for std as warnings cannot be relied upon for soundness and so std's usage of specialization would still be wrong.

In general I would prefer std to not be using specialization in any ways that affect behaviour in any way, it's stably exposing unstable broken parts of the type system in ways that are arguably unsound (allows you to prove trait bounds hold when they do not).

imo what should have happened is that years ago when specialization was found to be unsound all these specializations should have been ripped out regardless of the performance cost and re-added with a PR like this that respects lifetime constraints and treats the unsafe specialization marker attr as something unsafe with invariants to be upheld.

I can't speak for the whole types team but that's at least my opinion as a types member 🤷‍♀️


On a semi-related note, does std still specialize fused iterator stuff in ways that exposes specialization to stable too? I remember that being a thing some years ago but haven't kept up to date with how std is using specialization

@the8472
Member

the8472 commented Jan 21, 2025

> On a semi-related note, does std still specialize fused iterator stuff in ways that exposes specialization to stable too?

Yes, but #86765 changed the specialization so that incorrect specializations only result in correctness issues and not soundness ones.

And we have TrustedFused now for cases where it's relevant to soundness.

@lcnr
Contributor

lcnr commented Jan 28, 2025

The types team discussed this on zulip: https://rust-lang.zulipchat.com/#narrow/channel/326866-t-types.2Fnominated/topic/.23135634.3A.20stop.20specializing.20on.20.60Copy.60

My opinion/summary from there:

  • rn specializing on Copy is unsound from a type system pov, even though I don't know of, and can't think of, actual cases where this results in broken invariants/ub
    • fixing specialization with lifetime dependent impls to be sound won't happen in the near future
    • forbidding lifetime dependent Copy impls is not possible/too much of a breaking change, as they are currently allowed with arbitrary where-bounds

I would like to avoid specializing on Copy. I believe we should land this PR if the approach of having a new trait implemented on derive(Copy) is good enough perf wise (whatever that means)

@lcnr lcnr removed the I-types-nominated Nominated for discussion during a types team meeting. label Jan 28, 2025
@bors
Contributor

bors commented Feb 2, 2025

☔ The latest upstream changes (presumably #136448) made this pull request unmergeable. Please resolve the merge conflicts.

@joboet joboet added the I-libs-nominated Nominated for discussion during a libs team meeting. label Feb 3, 2025
@cuviper
Member

cuviper commented Feb 7, 2025

Should we add manual conditional impls for types like Option<T> and [T; N]?
And how about compiler-implemented types like closures and tuples?

I know we're not going to perfectly recover everything that Copy specialization did right, but I think these will be impactful. It's also great that we could go further, like conditional Range<T> and unconditional slice::Iter<'_, T>.
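To make the suggestion concrete, here is a sketch of what such conditional impls might look like. It uses a local, hypothetical `TrivialClone` trait so that it compiles on stable Rust; the `u8` impl and the `requires_trivial_clone` helper exist only for the demonstration and are not taken from the PR.

```rust
/// Hypothetical stand-in for the PR's marker trait.
///
/// # Safety
/// Implementors promise that `clone` is equivalent to a bitwise copy.
unsafe trait TrivialClone: Clone {}

// SAFETY: cloning a `u8` is a bitwise copy.
unsafe impl TrivialClone for u8 {}

// SAFETY: `Option<T>::clone` clones the payload if present; when that
// is a bitwise copy, cloning the whole `Option` is one as well.
unsafe impl<T: TrivialClone> TrivialClone for Option<T> {}

// SAFETY: cloning an array clones each element in order, so the same
// reasoning applies element by element.
unsafe impl<T: TrivialClone, const N: usize> TrivialClone for [T; N] {}

/// Compiles only when `T` carries the marker.
fn requires_trivial_clone<T: TrivialClone>(value: T) -> T {
    value.clone()
}
```

With these impls, nested types such as `[Option<u8>; 4]` pick up the marker without any further code.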

@joboet
Member Author

joboet commented Feb 11, 2025

> Should we add manual conditional impls for types like Option<T> and [T; N]? And how about compiler-implemented types like closures and tuples?
>
> I know we're not going to perfectly recover everything that Copy specialization did right, but I think these will be impactful. It's also great that we could go further, like conditional Range<T> and unconditional slice::Iter<'_, T>.

Maybe, but let's just try the performance of this first:
@bors try @rust-timer queue

@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Feb 11, 2025
bors added a commit to rust-lang-ci/rust that referenced this pull request Feb 11, 2025
stop specializing on `Copy`

fixes rust-lang#132442

`std` specializes on `Copy` to optimize certain library functions such as `clone_from_slice`. This is unsound, however, as the `Copy` implementation may not be always applicable because of lifetime bounds, which specialization does not take into account; the result being that values are copied even though they are not `Copy`. For instance, this code:
```rust
struct SometimesCopy<'a>(&'a Cell<bool>);

impl<'a> Clone for SometimesCopy<'a> {
    fn clone(&self) -> Self {
        self.0.set(true);
        Self(self.0)
    }
}

impl Copy for SometimesCopy<'static> {}

let clone_called = Cell::new(false);
// As SometimesCopy<'clone_called> is not 'static, this must run `clone`,
// setting the value to `true`.
let _ = [SometimesCopy(&clone_called)].clone();
assert!(clone_called.get());
```
should not panic, but does ([playground](https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=6be7a48cad849d8bd064491616fdb43c)).

To solve this, this PR introduces a new `unsafe` trait: `TrivialClone`. This trait may be implemented whenever the `Clone` implementation is equivalent to copying the value (so e.g. `fn clone(&self) -> Self { *self }`). Because of lifetime erasure, there is no way for the `Clone` implementation to observe lifetime bounds, meaning that even if the `TrivialClone` has stricter bounds than the `Clone` implementation, its invariant still holds. Therefore, it is sound to specialize on `TrivialClone`.

I've changed all `Copy` specializations in the standard library to specialize on `TrivialClone` instead. Unfortunately, the unsound `#[rustc_unsafe_specialization_marker]` attribute on `Copy` cannot be removed in this PR as `hashbrown` still depends on it. I'll make a PR updating `hashbrown` once this lands.

With `Copy` no longer being considered for specialization, this change alone would result in the standard library optimizations not being applied for user types unaware of `TrivialClone`. To avoid this and restore the optimizations in most cases, I have changed the expansion of `#[derive(Clone)]`: Currently, whenever both `Clone` and `Copy` are derived, the `clone` method performs a copy of the value. With this PR, the derive macro also adds a `TrivialClone` implementation to make this case observable using specialization. I anticipate that most users will use `#[derive(Clone, Copy)]` whenever both are applicable, so most users will still profit from the library optimizations.

Unfortunately, Hyrum's law applies to this PR: there are some popular crates which rely on the precise specialization behaviour of `core` to implement "specialization at home", e.g. [`libAFL`](https://github.com/AFLplusplus/LibAFL/blob/89cff637025c1652c24e8d97a30a2e3d01f187a4/libafl_bolts/src/tuples.rs#L27-L49). I have no remorse for breaking such horrible code, but perhaps we should open other, better ways to satisfy their needs – for example by dropping the `'static` bound on `TypeId::of`...
@bors
Contributor

bors commented Feb 11, 2025

⌛ Trying commit 80e881f with merge b50f56e...

@bors
Contributor

bors commented Feb 11, 2025

☀️ Try build successful - checks-actions
Build commit: b50f56e (b50f56e97ab409c2609ac4794f3355435af2251a)

@rust-timer
Collaborator

Finished benchmarking commit (b50f56e): comparison URL.

Overall result: ❌ regressions - please read the text below

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please fix the regressions and do another perf run. If the next run shows neutral or positive results, the label will be automatically removed.

@bors rollup=never
@rustbot label: -S-waiting-on-perf +perf-regression

Instruction count

This is the most reliable metric that we have; it was used to determine the overall result at the top of this comment. However, even this metric can sometimes exhibit noise.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 0.5% | [0.1%, 1.6%] | 131 |
| Regressions ❌ (secondary) | 0.8% | [0.2%, 1.7%] | 37 |
| Improvements ✅ (primary) | -0.4% | [-0.5%, -0.4%] | 2 |
| Improvements ✅ (secondary) | -0.4% | [-0.5%, -0.2%] | 5 |
| All ❌✅ (primary) | 0.5% | [-0.5%, 1.6%] | 133 |

Max RSS (memory usage)

Results (primary 1.2%, secondary 2.5%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 2.7% | [0.5%, 9.0%] | 11 |
| Regressions ❌ (secondary) | 2.9% | [1.9%, 4.0%] | 13 |
| Improvements ✅ (primary) | -4.2% | [-7.3%, -2.5%] | 3 |
| Improvements ✅ (secondary) | -2.9% | [-2.9%, -2.9%] | 1 |
| All ❌✅ (primary) | 1.2% | [-7.3%, 9.0%] | 14 |

Cycles

Results (primary -11.3%, secondary 1.6%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 1.0% | [1.0%, 1.0%] | 3 |
| Regressions ❌ (secondary) | 1.6% | [1.5%, 1.7%] | 2 |
| Improvements ✅ (primary) | -14.2% | [-18.9%, -10.0%] | 13 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | -11.3% | [-18.9%, 1.0%] | 16 |

Binary size

Results (primary 0.3%, secondary 0.3%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 0.3% | [0.0%, 1.9%] | 120 |
| Regressions ❌ (secondary) | 0.3% | [0.1%, 1.2%] | 56 |
| Improvements ✅ (primary) | -0.2% | [-0.2%, -0.1%] | 2 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | 0.3% | [-0.2%, 1.9%] | 122 |

Bootstrap: 787s -> 787.644s (0.08%)
Artifact size: 348.27 MiB -> 348.04 MiB (-0.07%)

@rustbot rustbot added perf-regression Performance regression. and removed S-waiting-on-perf Status: Waiting on a perf run to be completed. labels Feb 11, 2025
@BoxyUwU
Member

BoxyUwU commented Feb 11, 2025

Wow, that's a lot better than I was expecting. Great news

@joboet
Member Author

joboet commented Feb 12, 2025

> Should we add manual conditional impls for types like Option<T> and [T; N]?

Let's see whether those two make up for the loss in performance.
@bors try @rust-timer queue

@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Feb 12, 2025
bors added a commit to rust-lang-ci/rust that referenced this pull request Feb 12, 2025
@bors
Contributor

bors commented Feb 12, 2025

⌛ Trying commit d31fbe3 with merge 3e31775...

@bors
Contributor

bors commented Feb 12, 2025

☀️ Try build successful - checks-actions
Build commit: 3e31775 (3e3177541887d784070e5ab17d5ce89fcef9b9cf)

@rust-timer
Collaborator

Finished benchmarking commit (3e31775): comparison URL.

Overall result: ❌ regressions - please read the text below

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please fix the regressions and do another perf run. If the next run shows neutral or positive results, the label will be automatically removed.

@bors rollup=never
@rustbot label: -S-waiting-on-perf +perf-regression

Instruction count

This is the most reliable metric that we have; it was used to determine the overall result at the top of this comment. However, even this metric can sometimes exhibit noise.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 0.5% | [0.1%, 1.6%] | 130 |
| Regressions ❌ (secondary) | 0.8% | [0.2%, 1.6%] | 36 |
| Improvements ✅ (primary) | -0.4% | [-0.4%, -0.4%] | 2 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | 0.5% | [-0.4%, 1.6%] | 132 |

Max RSS (memory usage)

Results (primary 1.1%, secondary 3.4%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 2.2% | [0.5%, 7.6%] | 15 |
| Regressions ❌ (secondary) | 3.4% | [2.2%, 4.5%] | 2 |
| Improvements ✅ (primary) | -4.5% | [-7.9%, -2.1%] | 3 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | 1.1% | [-7.9%, 7.6%] | 18 |

Cycles

Results (primary 1.3%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 1.3% | [1.1%, 1.6%] | 4 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | - | - | 0 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | 1.3% | [1.1%, 1.6%] | 4 |

Binary size

Results (primary 0.3%, secondary 0.3%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 0.3% | [0.0%, 1.9%] | 122 |
| Regressions ❌ (secondary) | 0.3% | [0.1%, 1.2%] | 90 |
| Improvements ✅ (primary) | -0.1% | [-0.2%, -0.1%] | 3 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | 0.3% | [-0.2%, 1.9%] | 125 |

Bootstrap: 789.14s -> 789.465s (0.04%)
Artifact size: 347.71 MiB -> 347.57 MiB (-0.04%)

@rustbot rustbot removed the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Feb 12, 2025
@Amanieu
Member

Amanieu commented Feb 26, 2025

@joboet The regressions are still somewhat significant, are you expecting to be able to claw back more performance with more manual implementations of TrivialClone?

This was discussed in the libs team meeting and we understand that this is obviously the right thing to do for correctness, but we're still concerned about the performance impact.

@Amanieu Amanieu removed the I-libs-nominated Nominated for discussion during a libs team meeting. label Mar 5, 2025
@joboet
Member Author

joboet commented Mar 6, 2025

> @joboet The regressions are still somewhat significant, are you expecting to be able to claw back more performance with more manual implementations of TrivialClone?

I hope so. My gut feeling tells me that these regressions are all caused by just one or two cases where I need to re-add the TrivialClone implementation. I'll need to experiment a bit to determine where they are.

@rustbot author

@rustbot rustbot added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Mar 6, 2025
Labels
perf-regression Performance regression. S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. T-libs Relevant to the library team, which will review and decide on the PR/issue. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.
Development

Successfully merging this pull request may close these issues.

Array and Vec's Clone specialization is maybe unsound with conditionally Copy types.