This isn't intended to create controversy, but document where discussion
has settled. Please feel free to open more PRs to clear up additional
items.
Closes #156.
reference/src/glossary.md
+31 -7
@@ -190,8 +190,12 @@ guarantee that `Option<&mut T>` has the same size as `&mut T`.
190
190
191
191
While all niches are invalid bit-patterns, not all invalid bit-patterns are
192
192
niches. For example, the "all bits uninitialized" pattern is an invalid bit-pattern for
193
-
`&mut T`, but this bit-pattern cannot be used by layout optimizations, and is not a
194
-
niche.
193
+
`&mut T`, but this bit-pattern cannot be used by layout optimizations, and is not a niche.
194
+
195
+
It is a surprisingly common misconception that niches can occur in [padding] bytes.
196
+
They cannot: A niche representation must be invalid for `T`.
197
+
But a padding byte must be irrelevant to the value of `T`.
198
+
A byte that participates in deciding whether or not the representation is valid cannot, therefore, be a padding byte.
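As a minimal sketch of the distinction, the following compares the size of `Option<T>` with the size of `T`. The helper `option_is_free` and the type `Pair` are hypothetical names introduced for illustration. The results for `&mut u32` and `NonZeroU8` are documented guarantees; the result for `Pair` is what the argument above predicts and what current rustc does, not a separate guarantee.

```rust
use std::mem::size_of;
use std::num::NonZeroU8;

// Hypothetical example type: with `#[repr(C)]`, byte 1 is a padding byte
// on targets where `u16` has alignment 2.
#[repr(C)]
#[allow(dead_code)]
struct Pair(u8, u16);

/// Returns `true` if `Option<T>` fits in the same space as `T`,
/// i.e. if the `None` case was stored in a niche of `T`.
fn option_is_free<T>() -> bool {
    size_of::<Option<T>>() == size_of::<T>()
}

fn main() {
    // References and `NonZeroU8` have a niche (null / zero).
    assert!(option_is_free::<&mut u32>());
    assert!(option_is_free::<NonZeroU8>());
    // `u8` has no invalid bit-pattern, hence no niche.
    assert!(!option_is_free::<u8>());
    // `Pair` has a padding byte, but padding is not a niche.
    assert!(!option_is_free::<Pair>());
    println!(
        "Pair: {} bytes, Option<Pair>: {} bytes",
        size_of::<Pair>(),
        size_of::<Option<Pair>>()
    );
}
```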
195
199
196
200
#### Zero-sized type / ZST
197
201
@@ -207,6 +211,8 @@ requirement of 2.
207
211
208
212
*Padding* (of a type `T`) refers to the space that the compiler leaves between fields of a struct or enum variant to satisfy alignment requirements, and before/after variants of a union or enum to make all variants equally sized.
209
213
214
+
Padding for a type is either [interior padding], which is part of one or more fields, or [exterior padding], which is before, between, or after the fields.
215
+
210
216
Padding can be thought of as `[Pad; N]` for some hypothetical type `Pad` (of size 1) with the following properties:
211
217
* `Pad` is valid for any byte, i.e., it has the same validity invariant as `MaybeUninit<u8>`.
212
218
* Copying `Pad` ignores the source byte, and writes *any* value to the target byte. Or, equivalently (in terms of Abstract Machine behavior), copying `Pad` marks the target byte as uninitialized.
@@ -217,8 +223,26 @@ for all values `v` and lists of bytes `b` such that `v` and `b` are related at `
217
223
changing `b` at index `i` to any other byte yields a `b'` such that `v` and `b'` are related (`Vrel_T(v, b')`).
218
224
In other words, the byte at index `i` is entirely ignored by `Vrel_T` (the value relation for `T`), and two lists of bytes that only differ in padding bytes relate to the same value(s), if any.
219
225
220
-
This definition works fine for product types (structs, tuples, arrays, ...).
221
-
The desired notion of "padding byte" for enums and unions is still unclear.
226
+
This definition works fine for product types (structs, tuples, arrays, ...) and for unions. The desired notion of "padding byte" for enums is still unclear.
227
+
228
+
#### Padding (exterior)
229
+
[exterior padding]: #exterior-padding
230
+
231
+
Exterior padding bytes are [padding] bytes that are not part of one or more fields. They are exactly the padding bytes that are not [interior padding], and therefore must be before, between, or after the fields of the type. Padding that comes after all fields is called [tail padding].
232
+
233
+
#### Padding (interior)
234
+
[interior padding]: #interior-padding
235
+
236
+
Interior padding bytes are [padding] bytes that are part of one or more fields of a type.
237
+
238
+
We can say that a field `f: F` *contains* the byte at index `i` in the type `T` if the layout of `T` places `f` at offset `j` and we have `j <= i < j + size_of::<F>()`. Then a padding byte is interior padding if and only if there exists a field `f` that contains it.
239
+
240
+
It follows that, provided `T` is not an enum, for any such `f`, the byte at index `i - j` in `F` is a padding byte of `F`. This is because all values of `f` give rise to distinct values of `T`.
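The containment rule can be sketched in code. The types `Inner` and `Outer` and the helpers `contains`, `outer_fields`, and `in_some_field` are hypothetical names introduced for illustration, and the offsets assume a target where `u16` has alignment 2. Under that assumption, byte 1 of `Outer` is a padding byte contained in field `a` (interior padding, and byte `1 - 0 = 1` of `Inner` is a padding byte of `Inner`), while byte 5 is a padding byte contained in no field (exterior padding, here tail padding).

```rust
use std::mem::size_of;
use std::ptr::addr_of;

#[repr(C)]
#[allow(dead_code)]
struct Inner(u8, u16); // bytes: [u8][pad][u16 u16]

#[repr(C)]
#[allow(dead_code)]
struct Outer {
    a: Inner, // bytes 0..4
    b: u8,    // byte 4; byte 5 rounds the size up to the alignment
}

/// The rule from the text: a field at offset `j` with size `field_size`
/// contains byte `i` iff `j <= i < j + field_size`.
fn contains(j: usize, field_size: usize, i: usize) -> bool {
    j <= i && i < j + field_size
}

/// `(offset, size)` of each field of `Outer`, measured on a live value.
fn outer_fields() -> [(usize, usize); 2] {
    let o = Outer { a: Inner(0, 0), b: 0 };
    let base = addr_of!(o) as usize;
    [
        (addr_of!(o.a) as usize - base, size_of::<Inner>()),
        (addr_of!(o.b) as usize - base, size_of::<u8>()),
    ]
}

/// Whether some field of `Outer` contains byte `i`.
fn in_some_field(i: usize) -> bool {
    outer_fields().iter().any(|&(j, len)| contains(j, len, i))
}

fn main() {
    assert_eq!(size_of::<Outer>(), 6);
    assert!(in_some_field(1)); // padding inside `a`: interior padding
    assert!(!in_some_field(5)); // padding after all fields: tail padding
    println!("fields of Outer (offset, size): {:?}", outer_fields());
}
```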
241
+
242
+
#### Padding (tail)
243
+
[tail padding]: #tail-padding
244
+
245
+
Tail padding is [exterior padding] that comes after all fields of a type.
222
246
223
247
#### Place
224
248
@@ -254,8 +278,8 @@ The relation should be functional for a fixed list of bytes (i.e., every list of
254
278
It is partial in both directions: not all values have a representation (e.g. the mathematical integer `300` has no representation at type `u8`), and not all lists of bytes correspond to a value of a specific type (e.g. lists of the wrong size correspond to no value, and the list consisting of the single byte `0x10` corresponds to no value of type `bool`).
255
279
For a fixed value, there can be many representations (e.g., when considering type `#[repr(C)] Pair(u8, u16)`, the second byte is a [padding byte][padding] so changing it does not affect the value represented by a list of bytes).
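A small sketch of the `Pair` example, assuming a target where `u16` has alignment 2. The helpers `field_ranges` and `padding_indices` are hypothetical names introduced here; they only measure which bytes no field occupies and do not read the padding byte itself.

```rust
use std::mem::size_of;
use std::ptr::addr_of;

#[repr(C)]
#[allow(dead_code)]
struct Pair(u8, u16);

/// `(offset, size)` of the two fields of `Pair`, measured on a live value.
fn field_ranges() -> [(usize, usize); 2] {
    let p = Pair(0, 0);
    let base = addr_of!(p) as usize;
    [
        (addr_of!(p.0) as usize - base, size_of::<u8>()),
        (addr_of!(p.1) as usize - base, size_of::<u16>()),
    ]
}

/// Indices of the bytes of `Pair` that are not covered by any field.
fn padding_indices() -> Vec<usize> {
    let fields = field_ranges();
    (0..size_of::<Pair>())
        .filter(|&i| !fields.iter().any(|&(off, len)| off <= i && i < off + len))
        .collect()
}

fn main() {
    assert_eq!(size_of::<Pair>(), 4);
    assert_eq!(field_ranges(), [(0, 1), (2, 2)]);
    // Only the second byte (index 1) is padding.
    assert_eq!(padding_indices(), vec![1]);
    println!("padding bytes of Pair: {:?}", padding_indices());
}
```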
256
280
257
-
See the [value domain][value-domain] for an example how values and representation relations can be made more precise.
281
+
See the [MiniRust page on values][minirust-values] for an example of how values and representation relations can be made more precise.
This is, surprisingly, undefined behaviour, because it appears that the union is
181
+
fully initialized and therefore ought to be castable to a slice.
182
+
However, because byte 1 is a padding byte in both variants, it can be a padding
183
+
byte in the union type as well.
184
+
Therefore, when the result of `mem::zeroed` is copied onto the stack, the
185
+
padding byte is uninitialized, not 0.
186
+
187
+
This behaviour is platform-specific; on some platforms, this example may be
188
+
well-defined.
189
+
190
+
**C/C++ compatibility hazard:** This footgun exists for compatibility with the
191
+
C/C++ platform ABI, but it is not well-known in C/C++ communities.
192
+
In particular, unions are sometimes treated as non-exhaustive, with an expectation that they will be ABI-compatible with future versions of the same code that have additional variants for the union.
193
+
Padding, however, can cause unions not to actually be ABI-compatible with future versions of the same type.
194
+
(Note that it's also possible that adding a new variant might change the parameter-passing conventions, however, even in the absence of padding!)
195
+
So whenever dealing with a union that might have padding across FFI boundaries, you should be particularly careful not to assume that all bytes are initialized.
196
+
197
+
<details><summary><b>Rationale</b></summary>
198
+
199
+
Look. It wasn't our idea.
200
+
201
+
We could try to limit the blast radius to `extern "C"` functions, but really,
202
+
that's just sawing off the end of the footgun.
203
+
204
+
</details>
205
+
144
206
145
207
#### Zero-sized fields
146
208
@@ -172,4 +234,63 @@ translation of that code into Rust will not produce a compatible result. Refer
172
234
to the [struct chapter](structs-and-tuples.md#c-compatible-layout-repr-c) for
173
235
further details.
174
236
237
+
### Transparent layout (`#[repr(transparent)]`)
238
+
239
+
`#[repr(transparent)]` is currently unstable for unions, but [RFC 2645]
240
+
documents most of its semantics.
241
+
Notably, it causes unions to be passed using the same ABI as the non-1-ZST
242
+
field.
243
+
244
+
**Major footgun:** Matching the interior ABI means that all padding bytes of the
245
+
non-1-ZST field will also be padding bytes of the union, so the [interior
246
+
padding footgun] exists with `#[repr(transparent)]` unions.
247
+
248
+
**Note:** If `U` is a transparent union wrapping a `T`, `U` may not inherit
249
+
`T`'s niches, and therefore `Option<U>` and `Option<T>`, for instance, will not
250
+
necessarily have the same layout or even the same size.
251
+
252
+
This is because, if `U` contains any zero-sized fields in addition to the `T`
253
+
field, the [value model] forces `U` to support uninitialized bytes, and that in
254
+
turn prevents `T`'s niches from being present in `U`.
255
+
Currently, `U` also supports uninitialized bytes if it does not contain any
256
+
additional fields, but it is [**TBD**][#364] if single-field transparent unions
There are applications where it is desirable that unions behave simply as a
264
+
buffer of abstract bytes, with no constraints on validity and no interior
265
+
padding bytes that can [get surprisingly reset to uninit][interior padding
266
+
footgun].
267
+
268
+
Thus, we propose that Rust support a repr, tentatively called the Raw-repr, that gives these semantics to unions. The Raw-repr may be `#[repr(Rust)]` or a new repr, say `#[repr(Raw)]`; which one is TBD. The Raw-repr will have the following properties:
269
+
270
+
* All fields are laid out at offset 0.
271
+
* The alignment of the union is the greatest alignment among fields (or 1, in the case of an empty union).
272
+
* There are no padding bytes---even the bytes that aren't part of any variant, that would otherwise be tail padding, are not padding.
273
+
* If the union is over-aligned with a `#[repr(align(n))]` attribute, then any bytes beyond the "natural" alignment are tail padding.
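The size and alignment rules above can be sketched as a small calculation. The function `raw_union_layout` and the union `Bytes3OrU16` are hypothetical names introduced for illustration. Since the Raw-repr does not exist yet, the sketch compares against a `#[repr(C)]` union as a stand-in; it also assumes that the size is the largest field size rounded up to the alignment, which the list above does not state but which follows from Rust's general rule that a type's size is a multiple of its alignment. It does not model `#[repr(align(n))]` and says nothing about which bytes count as padding.

```rust
use std::mem::{align_of, size_of};

// Stand-in for a Raw-repr union; `#[repr(C)]` is used only because it
// exists today and also places every field at offset 0.
#[repr(C)]
#[allow(dead_code)]
union Bytes3OrU16 {
    bytes: [u8; 3],
    half: u16,
}

/// Computes `(size, align)` from a list of `(size, align)` pairs, one per field.
/// Alignment is the greatest field alignment, or 1 for an empty union.
/// Size is the largest field size rounded up to that alignment (assumption).
fn raw_union_layout(fields: &[(usize, usize)]) -> (usize, usize) {
    let align = fields.iter().map(|&(_, a)| a).max().unwrap_or(1);
    let max_size = fields.iter().map(|&(s, _)| s).max().unwrap_or(0);
    let size = (max_size + align - 1) / align * align;
    (size, align)
}

fn main() {
    let fields = [
        (size_of::<[u8; 3]>(), align_of::<[u8; 3]>()),
        (size_of::<u16>(), align_of::<u16>()),
    ];
    let computed = raw_union_layout(&fields);
    let actual = (size_of::<Bytes3OrU16>(), align_of::<Bytes3OrU16>());
    assert_eq!(computed, actual);
    println!("(size, align) = {:?}", computed);
}
```

Under the Raw-repr as proposed, the fourth byte of such a union would not be padding, whereas in the `#[repr(C)]` stand-in it may be.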
274
+
275
+
Note that Raw-repr unions are *not* a substitute for `#[repr(C)]` unions, although it would be nice if we could avoid the [padding footgun] that way.
276
+
277
+
<details><summary><b>Rationale</b></summary>
278
+
279
+
We need at least one repr without the [padding footgun], because interior padding in particular is surprising.
280
+
In particular, if users want to treat unions as non-exhaustive in a way that is ABI compatible with future versions with more fields, then such unions must not contain any padding.
281
+
The presence of tail padding---such as with `union([u8; 3], u16)`, which could have a single byte of tail padding---is less surprising.
282
+
But it would still prevent ABI forwards-compatibility if a `u32` field were added later.
283
+
284
+
This layout is extremely constrained, so it would generally be against the philosophy of `#[repr(Rust)]` to impose these constraints on the default layout instead of introducing a new one. However, without such constraints, `#[repr(Rust)]` is just a giant, largely useless footgun, which is a rationale to simply constrain it and leave any potential relaxations, e.g. for safe transmutes and niches, to other reprs. Thus, whether it becomes a new repr or not is still TBD.