-
Notifications
You must be signed in to change notification settings - Fork 59
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Union validity #368
Union validity #368
Conversation
The TBD aspects of this seem to leave enough room for further refinement. |
81a8a87
to
4ce6ade
Compare
Thanks for pointing out the rationale that unions are sometimes supposed to be treated as non-exhuastive so that they can preserve unknown variants. This is a strong rationale for removing tail padding from Raw-repr unions, since it could similarly break forwards compatibility, so I have updated the doc to account for that (except when the union is deliberately over-aligned). |
The [representation relation] is trivial in both directions, except for [padding | ||
bytes] which are uninitialized in all values. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The [representation relation] is trivial in both directions, except for [padding | |
bytes] which are uninitialized in all values. | |
The [representation relation] is trivial in both directions, except for [padding | |
bytes] which are uninitialized in all values. |
Usually padding just isn't part of the value, and hence becomes uninitialized in encode
(representing a value as bytes). Why do it differently here?
|
||
## Value space for #[repr(C)] and Raw-repr unions | ||
|
||
`#[repr(C)]` and [Raw-repr][raw repr] unions can store any byte value, except for [padding bytes] which are always uninitialized. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A padding byte is not a value so I don't understand what this means.
(reads a 3rd time)
Oh... I parsed ("value, except for padding bytes which are...") as the thing that can be stored, but that's not what you mean. Please reformulate.
Actually, I am not even sure what you mean by "value space". You defined the "value model", as in the set of Value
s (in MiniRust terms) that make up unions, above. I can only guess that this here is describing the set of List<AbstractByte>
that can encode a union, which is technically redundant with the previous section but seems worth keeping. However, padding bytes can be represented by any AbstractByte, so in that case the sentence should say that any sequence of bytes is a valid union encoding.
|
||
## Possible niche values | ||
|
||
The presence of [padding bytes], and writes to individual fields in general, makes niches hard to come by in unions. A niche representation of a union would have to not only be invalid for every single one of its fields, but also impossible to construct in Safe Rust with any combination of writes to any of its fields. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The two sections above already imply that there can be no niche. I am confused by this third discussion of the same decision.
assert_eq!(size_of::<Option<U>>(), 1); // Requires a niche, which in turn requires that U must be initialized to be valid. | ||
``` | ||
|
||
We are **not** describing this case as unspecified, but instead as TBD."Constructible with Safe Rust" is a poorly-defined and very complex invariant, which falls short of the UCG's goals of easily checked, easy to understand (such as it were) semantics, and therefore we are not comfortable leaving the language in this state on an indefinite basis. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are **not** describing this case as unspecified, but instead as TBD."Constructible with Safe Rust" is a poorly-defined and very complex invariant, which falls short of the UCG's goals of easily checked, easy to understand (such as it were) semantics, and therefore we are not comfortable leaving the language in this state on an indefinite basis. | |
We are **not** describing this case as unspecified, but instead as TBD. "Constructible with Safe Rust" is a poorly-defined and very complex invariant, which falls short of the UCG's goals of easily checked, easy to understand (such as it were) semantics, and therefore we are not comfortable leaving the language in this state on an indefinite basis. |
|
||
<details><summary><b>Rationale</b></summary> | ||
|
||
We have not yet reached consensus on whether or not we wish to leave the door open for the possibility that unions with safe field access, or `#[repr(transparent)]` unions with no ZSTs, contain niches: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If they do, however, we need to adjust the value representation / the set of valid representations of a union value. So if we want to keep the door open as you do here, that needs to already be said earlier when the value domain / representation is defined.
|
||
We can say that a byte is *sometimes padding* for a union `U` if there is *some* inhabited field `f` such that the byte is either padding for `f` or not a part of `f`. | ||
|
||
In that case, the byte will be uninitialized in the value `U{f: /* some value */ }`. By the [monotonicity property], therefore, all sometimes-padding bytes can contain any byte value, be it undefined or any bit pattern with any provenance. Likewise, if multiple bytes are padding for the same field, then they can take on any possible combination of byte values between them. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In that case, the byte will be uninitialized in the value `U{f: /* some value */ }`. By the [monotonicity property], therefore, all sometimes-padding bytes can contain any byte value, be it undefined or any bit pattern with any provenance. Likewise, if multiple bytes are padding for the same field, then they can take on any possible combination of byte values between them. | |
In that case, the byte will be uninitialized in the value `U{f: /* some value */ }`. By the [monotonicity property], therefore, all sometimes-padding bytes can contain any byte value, be it undefined or any bit pattern with any provenance. |
That last sentence is confusing and I don't know what it is supposed to achieve.
|
||
As per the previous section, however, just because a byte is a sometimes-padding byte does not mean it can always safely be set to uninitialized (or any other value), if this can produce a value not reachable from Safe Rust. | ||
|
||
For instance, the following is presently unsound (assuming that `#[repr(Rust)]` is not the [Raw-repr]), even assuming that all fields are placed at offset 0: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"unsound" is a property of safe APIs so this sentence is a category error.
I'd just say that this creates a union value that could not be created in safe Rust and so passing it to unsuspecting code might cause UB -- for example, someone might rely on the fact that "if the first byte is at least 2 then the 2nd byte is initialized". I don't think it is clear whether the crate making the assumption or the crate producing the strange value is wrong -- ultimately only the crate defining u
can make this call. (Maybe add forward reference to next section, or reorder the sections.)
|
||
} | ||
#[repr(Safe)] | ||
Union u { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Union u { | |
Union U { |
unsafe { (&mut u.b as *mut u8 as *mut MaybeUninit<u8>).offset(1).write(MaybeUninit::uninit()) }; // resulting bytes: [0xff, uninit] | ||
``` | ||
|
||
This value is impossible to reach in Safe Rust: the only way to write uninit to the padding is to write to the boolean field. Writing to the integer field must initialize |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In C, as far as I know, writing to a field can clobber neighboring padding bytes. So actually, maybe there is a way to construct the bad value in safe code?
} | ||
``` | ||
|
||
Making this field access safe would require additional an safety invariant that can be understood by the compiler. The UCG WG does not oppose such a safety invariant, but believes it should be opt-in, and an RFC for such a feature is beyond our remit. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Making this field access safe would require additional an safety invariant that can be understood by the compiler. The UCG WG does not oppose such a safety invariant, but believes it should be opt-in, and an RFC for such a feature is beyond our remit. | |
This code could be made sound by convincing the author of crate a to add explicit documentation stating that `i` is always initialized. | |
Making this field access *safe* would require additional an safety invariant that can be understood by the compiler. The UCG WG does not oppose such a safety invariant, but believes it should be opt-in, and an RFC for such a feature is beyond our remit. |
|
||
This struct code, however, is absolutely sound, even in the absence of a safety invariant documented by `S`, because of `S`'s validity invariant: for `S` to be valid, all its fields must be valid, and therefore `i` must be initialized. If it weren't, the definition of `get_i` wouldn't be the problem: the caller would be committing UB by passing an uninitialized `S`. Consequently, the `unsafe` block is redundant. | ||
|
||
But for the union type `U`, its validity invariant is not transitive to its fields. `u.i` has no guarantee of validity for `U` to be valid. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But for the union type `U`, its validity invariant is not transitive to its fields. `u.i` has no guarantee of validity for `U` to be valid. | |
But for the union type `U`, its validity invariant does not imply the one for its fields. `u.i` has no guarantee of validity for `U` to be valid. |
This is not really a case of "transitivity" IMO.
Closing due to inactivity. Note that the UCG book isn't really maintained these days -- probably the proper thing would be to write up a markdown document, schedule a t-opsem meeting, use that to update the document, and then FCP the result. |
Closes #352.