Commit 5ddc70b

Document current state of the union: I. Layout.
This isn't intended to create controversy, but document where discussion has settled. Please feel free to open more PRs to clear up additional items. Closes #156.
1 parent 36661ec commit 5ddc70b

2 files changed

+163
-25
lines changed

reference/src/glossary.md

+31-7
@@ -190,8 +190,12 @@ guarantee that `Option<&mut T>` has the same size as `&mut T`.
190190

191191
While all niches are invalid bit-patterns, not all invalid bit-patterns are
192192
niches. For example, "all bits uninitialized" is an invalid bit-pattern for
193-
`&mut T`, but this bit-pattern cannot be used by layout optimizations, and is not a
194-
niche.
193+
`&mut T`, but this bit-pattern cannot be used by layout optimizations, and is not a niche.
194+
195+
It is a surprisingly common misconception that niches can occur in [padding] bytes.
196+
They cannot: A niche representation must be invalid for `T`.
197+
But a padding byte must be irrelevant to the value of `T`.
198+
A byte that participates in deciding whether or not the representation is valid cannot, therefore, be a padding byte.
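
As a small illustration of the niche guarantee mentioned at the start of this entry, the following sketch checks that `Option<&mut T>` takes no more space than `&mut T`. The helper name `option_is_free` is made up for this example and is not part of any library.

```rust
use std::mem::size_of;

/// Returns true if wrapping `&mut T` in `Option` costs no extra space,
/// i.e. `None` is stored in the niche of the reference.
/// (Hypothetical helper, introduced only for this illustration.)
fn option_is_free<T>() -> bool {
    size_of::<Option<&mut T>>() == size_of::<&mut T>()
}

fn main() {
    assert!(option_is_free::<u8>());
    assert!(option_is_free::<[u64; 4]>());
}
```

The check only looks at sizes; it says nothing about padding, which is consistent with the point above that niches and padding bytes never coincide.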
195199

196200
#### Zero-sized type / ZST
197201

@@ -207,6 +211,8 @@ requirement of 2.
207211

208212
*Padding* (of a type `T`) refers to the space that the compiler leaves between fields of a struct or enum variant to satisfy alignment requirements, and before/after variants of a union or enum to make all variants equally sized.
209213

214+
Padding for a type is either [interior padding], which is part of one or more fields, or [exterior padding], which is before, between, or after the fields.
215+
210216
Padding can be thought of as `[Pad; N]` for some hypothetical type `Pad` (of size 1) with the following properties:
211217
* `Pad` is valid for any byte, i.e., it has the same validity invariant as `MaybeUninit<u8>`.
212218
* Copying `Pad` ignores the source byte, and writes *any* value to the target byte. Or, equivalently (in terms of Abstract Machine behavior), copying `Pad` marks the target byte as uninitialized.
@@ -217,8 +223,26 @@ for all values `v` and lists of bytes `b` such that `v` and `b` are related at `
217223
changing `b` at index `i` to any other byte yields a `b'` such that `v` and `b'` are related (`Vrel_T(v, b')`).
218224
In other words, the byte at index `i` is entirely ignored by `Vrel_T` (the value relation for `T`), and two lists of bytes that only differ in padding bytes relate to the same value(s), if any.
219225

220-
This definition works fine for product types (structs, tuples, arrays, ...).
221-
The desired notion of "padding byte" for enums and unions is still unclear.
226+
This definition works fine for product types (structs, tuples, arrays, ...) and for unions. The desired notion of "padding byte" for enums is still unclear.
227+
228+
#### Padding (exterior)
229+
[exterior padding]: #exterior-padding
230+
231+
Exterior padding bytes are [padding] bytes that are not part of any field. They are exactly the padding bytes that are not [interior padding], and therefore must be before, between, or after the fields of the type. Padding that comes after all fields is called [tail padding].
232+
233+
#### Padding (interior)
234+
[interior padding]: #interior-padding
235+
236+
Interior padding bytes are [padding] bytes that are part of one or more fields of a type.
237+
238+
We can say that a field `f: F` *contains* the byte at index `i` in the type `T` if the layout of `T` places `f` at offset `j` and we have `j <= i < j + size_of::<F>()`. Then a padding byte is interior padding if and only if there exists a field `f` that contains it.
239+
240+
It follows that, provided `T` is not an enum, for any such `f`, the byte at index `i - j` in `F` is a padding byte of `F`. This is because all values of `f` give rise to distinct values of `T`.
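
The *contains* relation can be sketched in code. The example below is only an illustration: the types `Inner` and `Outer` and the helpers `contains` and `in_some_field` are invented here, and the byte layouts in the comments assume a target where `u16` has alignment 2.

```rust
use std::mem::{offset_of, size_of};

#[repr(C)]
struct Inner {
    a: u8,
    b: u16,
} // assumed bytes: [a, pad, b, b]

#[repr(C)]
struct Outer {
    x: u8,
    inner: Inner,
} // assumed bytes: [x, pad, a, pad, b, b]

/// A field at offset `j` with size `size` contains byte `i` iff `j <= i < j + size`.
fn contains(j: usize, size: usize, i: usize) -> bool {
    j <= i && i < j + size
}

/// True if byte `i` of `Outer` lies inside one of its fields.
fn in_some_field(i: usize) -> bool {
    contains(offset_of!(Outer, x), size_of::<u8>(), i)
        || contains(offset_of!(Outer, inner), size_of::<Inner>(), i)
}

fn main() {
    // Byte 1 sits between `x` and `inner`: no field contains it (exterior padding).
    assert!(!in_some_field(1));
    // Byte 3 is byte 1 of `inner`, which is padding of `Inner` (interior padding).
    assert!(in_some_field(3));
}
```

Under these assumptions, `Outer` has one exterior padding byte (index 1) and one interior padding byte (index 3), and the latter is a padding byte of `Inner` at index `3 - 2 = 1`, matching the statement above.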
241+
242+
#### Padding (tail)
243+
[tail padding]: #tail-padding
244+
245+
Tail padding is [exterior padding] that comes after all fields of a type.
222246

223247
#### Place
224248

@@ -254,8 +278,8 @@ The relation should be functional for a fixed list of bytes (i.e., every list of
254278
It is partial in both directions: not all values have a representation (e.g. the mathematical integer `300` has no representation at type `u8`), and not all lists of bytes correspond to a value of a specific type (e.g. lists of the wrong size correspond to no value, and the list consisting of the single byte `0x10` corresponds to no value of type `bool`).
255279
For a fixed value, there can be many representations (e.g., when considering type `#[repr(C)] Pair(u8, u16)`, the second byte is a [padding byte][padding] so changing it does not affect the value represented by a list of bytes).
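
The `Pair` example can be checked with a short sketch. It assumes a target where `u16` has alignment 2 (true on most, but not all, platforms); the helper `second_field_offset` is a name made up for this illustration.

```rust
use std::mem::size_of;
use std::ptr::addr_of;

#[repr(C)]
struct Pair(u8, u16);

/// Byte offset of the `u16` field inside `Pair`.
/// (Hypothetical helper, introduced only for this illustration.)
fn second_field_offset() -> usize {
    let p = Pair(1, 2);
    let base = addr_of!(p) as usize;
    addr_of!(p.1) as usize - base
}

fn main() {
    // One byte of `u8`, one padding byte, then two bytes of `u16`.
    assert_eq!(size_of::<Pair>(), 4);
    assert_eq!(second_field_offset(), 2);
}
```

The byte at index 1 is not covered by either field, which is why changing it does not change the value that the list of bytes represents.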
256280

257-
See the [value domain][value-domain] for an example how values and representation relations can be made more precise.
281+
See the [MiniRust page on values][minirust-values] for an example of how values and representation relations can be made more precise.
258282

259283
[stacked-borrows]: https://github.com/rust-lang/unsafe-code-guidelines/blob/master/wip/stacked-borrows.md
260-
[value-domain]: https://github.com/rust-lang/unsafe-code-guidelines/tree/master/wip/value-domain.md
261-
[place-value-expr]: https://doc.rust-lang.org/reference/expressions.html#place-expressions-and-value-expressions
284+
[minirust-values]: https://github.com/RalfJung/minirust/blob/master/lang/values.md
285+
[place-value-expr]: https://doc.rust-lang.org/reference/expressions.html#place-expressions-and-value-expressions

reference/src/layout/unions.md

+132-18
@@ -1,10 +1,12 @@
11
# Layout of unions
22

3-
**Disclaimer:** This chapter represents the consensus from issue
4-
[#13]. The statements in here are not (yet) "guaranteed"
5-
not to change until an RFC ratifies them.
3+
**Disclaimer**: This chapter is a work-in-progress.
4+
What's contained here represents the consensus from [various issues][union
5+
discussion].
6+
The statements in here are not (yet) "guaranteed" not to change until an RFC
7+
ratifies them.
68

7-
[#13]: https://github.com/rust-rfcs/unsafe-code-guidelines/issues/13
9+
[union discussion]: https://github.com/rust-lang/unsafe-code-guidelines/blob/master/active_discussion/unions.md
810

911
### Layout of individual union fields
1012

@@ -29,8 +31,23 @@ largest field, and the offset of each union field within its variant. How these
2931
are picked depends on certain constraints like, for example, the alignment
3032
requirements of the fields, the `#[repr]` attribute of the `union`, etc.
3133

32-
[padding]: ../glossary.md#padding
33-
[layout]: ../glossary.md#layout
34+
Unions may contain both [exterior][exterior padding] and [interior padding].
35+
In the below diagram, exterior padding is marked by `EXT`, interior padding by
36+
`INT`, and bytes that are padding bytes for a particular field but not padding
37+
for the union as a whole are marked `NON`:
38+
39+
```text
40+
[ EXT [ field0_0_ty | INT | field0_1_ty | INT ] EXT ]
41+
[ EXT [ field1_0_ty | INT | NON NON NON | INT ] EXT ]
42+
[ EXT | NON NON NON | INT [ field2_0_ty ] INT | EXT ]
43+
```
44+
45+
It is necessarily the case that any byte that is a non-padding byte for any
46+
field is also a non-padding byte for the union.
47+
It is, in general, **unspecified** whether the converse is true.
48+
Specific reprs may specify whether or not bytes are padding bytes.
49+
50+
Padding bytes in unions have subtle implications; see the union [value model].
3451

3552
### Unions with default layout ("`repr(Rust)`")
3653

@@ -40,6 +57,13 @@ layout of Rust unions is, _in general_, **unspecified**.
4057
That is, there are no _general_ guarantees about the offset of the fields,
4158
whether all fields have the same offset, what the call ABI of the union is, etc.
4259

60+
**Major footgun:** The layout of `#[repr(Rust)]` unions allows for the [interior
61+
padding footgun] to also exist with `#[repr(Rust)]`, and this behaviour *is*
62+
extant in Rustc as of this writing. It is [**TBD**][#354] whether it will be
63+
removed.
64+
65+
[interior padding footgun]: #interior-padding-footgun
66+
4367
<details><summary><b>Rationale</b></summary>
4468

4569
As of this writing, we want to keep the option of using non-zero offsets open
@@ -107,23 +131,24 @@ the layout of `U1` is **unspecified** because:
107131
* `Zst2` is not a [1-ZST], and
108132
* `SomeOtherStruct` has an unspecified layout and could contain padding bytes.
109133

110-
### C-compatible layout ("repr C")
134+
### C-compatible layout (`#[repr(C)]`)
111135

112-
The layout of `repr(C)` unions follows the C layout scheme. Per sections
113-
[6.5.8.5] and [6.7.2.1.16] of the C11 specification, this means that the offset
114-
of every field is 0. Unsafe code can cast a pointer to the union to a field type
115-
to obtain a pointer to any field, and vice versa.
136+
The layout of `repr(C)` unions follows the C layout scheme.
137+
Per sections [6.5.8.5] and [6.7.2.1.16] of the C11 specification, this means that the offset
138+
of every field is 0, and the alignment of the union is the largest alignment of its fields.
139+
Unsafe code can cast a pointer to the union to a field type to obtain a pointer to any field, and vice versa.
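
A minimal sketch of these guarantees follows. The union `IntOrByte` and the helper `first_byte_of` are hypothetical names chosen for this example; the code only relies on the offset-0 and alignment properties stated above.

```rust
use std::mem::{align_of, size_of};

#[repr(C)]
union IntOrByte {
    byte: u8,
    int: u32,
}

/// Initializes the union through `int` and reads its first byte through a
/// pointer to the union that has been cast to the type of the `byte` field.
fn first_byte_of(int: u32) -> u8 {
    let u = IntOrByte { int };
    let p = &u as *const IntOrByte as *const u8;
    // SAFETY: every field of a `repr(C)` union is at offset 0, and all four
    // bytes were initialized by writing the `int` field.
    unsafe { *p }
}

fn main() {
    // The union is as aligned as its most-aligned field.
    assert_eq!(align_of::<IntOrByte>(), align_of::<u32>());
    assert_eq!(size_of::<IntOrByte>(), size_of::<u32>());

    let x = 0x0102_0304_u32;
    // The first byte in memory depends on endianness, so compare with
    // the native-endian byte representation of `x`.
    assert_eq!(first_byte_of(x), x.to_ne_bytes()[0]);
}
```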
116140

117141
[6.5.8.5]: http://port70.net/~nsz/c/c11/n1570.html#6.5.8p5
118142
[6.7.2.1.16]: http://port70.net/~nsz/c/c11/n1570.html#6.7.2.1p16
119143

120144
#### Padding
121145

122-
Since all fields are at offset 0, `repr(C)` unions do not have padding before
123-
their fields. They can, however, have padding in each union variant *after* the
124-
field, to make all variants have the same size.
146+
Since all fields are at offset 0, `repr(C)` unions do not have [padding] before
147+
their fields.
148+
They can, however, have padding in each union variant *after* the field, to make
149+
all variants have the same size.
125150

126-
Moreover, the entire union can have trailing padding, to make sure the size is a
151+
Moreover, the entire union can have tail padding, to make sure the size is a
127152
multiple of the alignment:
128153

129154
```rust
@@ -138,9 +163,47 @@ assert_eq!(size_of::<U>(), 2);
138163
# }
139164
```
140165

141-
> **Note**: Fields are overlapped instead of laid out sequentially, so
142-
> unlike structs there is no "between the fields" that could be filled
143-
> with padding.
166+
#### Interior Padding Footgun
167+
168+
**Major footgun:** On some platform ABIs, such as the popular arm64, C unions
169+
may also have [interior padding] *within* fields, where a byte is padding in
170+
every variant:
171+
172+
```rust
173+
#[repr(C)]
174+
union U {
175+
x: (u8, u16), // [u8, 1*pad, u16]
176+
y: (u8, u8), // [u8, 1*pad, u8, 1*pad]
177+
}
178+
let u = unsafe { mem::zeroed::<U>() }; // resulting bytes: [0, uninit (!!), 0, 0]
179+
let buf: &[u8] = unsafe { slice::from_raw_parts(transmute(&u), 4) }; // UB!
180+
```
181+
182+
This is undefined behaviour, which is surprising because it appears that the union is
183+
fully initialized and therefore ought to be castable to a slice.
184+
However, because byte 1 is a padding byte in both variants, it can be a padding
185+
byte in the union type as well.
186+
Therefore, when the result of `mem::zeroed` is copied onto the stack, the
187+
padding byte is uninitialized, not 0.
188+
189+
This behaviour is platform-specific; on some platforms, this example may be
190+
well-defined.
191+
192+
**C/C++ compatibility hazard:** This footgun exists for compatibility with the
193+
C/C++ platform ABI, but it is not well-known in C/C++ communities.
194+
So whenever dealing with a union that might have interior padding across FFI
195+
boundaries, you should be particularly careful not to assume that all bytes are
196+
initialized.
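
One defensive pattern, sketched below, is to view the bytes of such a value as `MaybeUninit<u8>` rather than `u8`, so that no byte is assumed to be initialized. This is an illustration, not an established API: `as_maybe_uninit_bytes` and `Padded` are names invented here, and the sketch assumes `T` contains no interior mutability.

```rust
use std::mem::{size_of, MaybeUninit};
use std::slice;

/// Views the bytes of `value` without claiming that any of them is initialized.
/// (Hypothetical helper; assumes `T` has no interior mutability.)
fn as_maybe_uninit_bytes<T>(value: &T) -> &[MaybeUninit<u8>] {
    // SAFETY: `MaybeUninit<u8>` is valid for every byte, including
    // uninitialized ones, and the slice covers exactly the bytes of `value`.
    unsafe {
        slice::from_raw_parts(
            value as *const T as *const MaybeUninit<u8>,
            size_of::<T>(),
        )
    }
}

#[repr(C)]
struct Padded {
    a: u8,
    b: u16,
}

fn main() {
    let p = Padded { a: 1, b: 2 };
    let bytes = as_maybe_uninit_bytes(&p);
    assert_eq!(bytes.len(), size_of::<Padded>());

    // Byte 0 belongs to the field `a`, so it is known to be initialized.
    // SAFETY: `a` was written when `p` was constructed.
    let first = unsafe { bytes[0].assume_init() };
    assert_eq!(first, 1);
}
```

Only bytes that are known to belong to an initialized field are read with `assume_init`; bytes that may be padding are left untouched.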
197+
198+
<details><summary><b>Rationale</b></summary>
199+
200+
Look. It wasn't our idea.
201+
202+
We could try to limit the blast radius to `extern "C"` functions, but really,
203+
that's just sawing off the end of the footgun.
204+
205+
</details>
206+
144207

145208
#### Zero-sized fields
146209

@@ -172,4 +235,55 @@ translation of that code into Rust will not produce a compatible result. Refer
172235
to the [struct chapter](structs-and-tuples.md#c-compatible-layout-repr-c) for
173236
further details.
174237

238+
### Transparent layout (`#[repr(transparent)]`)
239+
240+
`#[repr(transparent)]` is currently unstable for unions, but [RFC 2645]
241+
documents most of its semantics.
242+
Notably, it causes unions to be passed using the same ABI as the non-1-ZST
243+
field.
244+
245+
**Major footgun:** Matching the interior ABI means that all padding bytes of the
246+
non-1-ZST field will also be padding bytes of the union, so the [interior
247+
padding footgun] exists with `#[repr(transparent)]` unions.
248+
249+
**Note:** If `U` is a transparent union wrapping a `T`, `U` may not inherit
250+
`T`'s niches, and therefore `Option<U>` and `Option<T>`, for instance, will not
251+
necessarily have the same layout or even the same size.
252+
253+
This is because, if `U` contains any zero-sized fields in addition to the `T`
254+
field, the [value model] forces `U` to support uninitialized bytes, and that in
255+
turn prevents `T`'s niches from being present in `U`.
256+
Currently, `U` also supports uninitialized bytes if it does not contain any
257+
additional fields, but it is [**TBD**][#364] if single-field transparent unions
258+
might support niches.
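
Since `#[repr(transparent)]` on user-defined unions is unstable, the effect can be observed on stable Rust with `std::mem::MaybeUninit<T>`, which the standard library declares as a transparent union with a `()` field next to the value. The helper `option_overhead` below is a made-up name for this sketch.

```rust
use std::mem::{size_of, MaybeUninit};

/// Number of extra bytes `Option<T>` needs compared to `T`.
/// (Hypothetical helper, introduced only for this illustration.)
fn option_overhead<T>() -> usize {
    size_of::<Option<T>>() - size_of::<T>()
}

fn main() {
    // The transparent union has the same size as the wrapped type...
    assert_eq!(size_of::<MaybeUninit<bool>>(), size_of::<bool>());
    // ...but `bool` has niches that `Option` can use, while the union does not.
    assert_eq!(option_overhead::<bool>(), 0);
    assert_eq!(option_overhead::<MaybeUninit<bool>>(), 1);
}
```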
259+
260+
[RFC 2645]: https://github.com/rust-lang/rfcs/blob/master/text/2645-transparent-unions.md
261+
262+
### Bag-o-bytes layout (Repr-raw)
263+
264+
There are applications where it is desirable that unions behave simply as a
265+
buffer of abstract bytes, with no constraints on validity and no interior
266+
padding bytes that can [get surprisingly reset to uninit][interior padding
267+
footgun].
268+
269+
Thus, we propose that Rust support a repr, tentatively called the Raw-repr, that gives these semantics to unions. The Raw-repr may be `#[repr(Rust)]` or it may be a new repr, say `#[repr(Raw)]`. The Raw-repr will have the following properties:
270+
271+
* All fields are laid out at offset 0.
272+
* The alignment of the union is the greatest alignment among fields.
273+
* The only padding bytes are tail padding bytes, if any.
274+
275+
<details><summary><b>Rationale</b></summary>
276+
277+
We need at least one repr without the [interior padding footgun]. This layout is extremely constrained, so it would generally be against the philosophy of `#[repr(Rust)]` to impose these constraints on the default layout instead of introducing a new one. However, without such constraints, `#[repr(Rust)]` is just a giant, largely useless footgun, which is a rationale to simply constrain it and leave any potential relaxations, e.g. for safe transmutes and niches, to other reprs.
278+
279+
</details>
280+
281+
[#354]: https://github.com/rust-lang/unsafe-code-guidelines/issues/354
282+
[#364]: https://github.com/rust-lang/unsafe-code-guidelines/issues/364
175283
[1-ZST]: ../glossary.md#zero-sized-type--zst
284+
[exterior padding]: ../glossary.md#exterior-padding
285+
[interior padding]: ../glossary.md#interior-padding
286+
[layout]: ../glossary.md#layout
287+
[padding]: ../glossary.md#padding
288+
[union values]: ../validity/unions.md#values
289+
[value model]: ../glossary.md#value-model
