Feature: unchecked access to enum interior #1863

Open
Diggsey opened this issue Jan 22, 2017 · 14 comments
Labels
T-libs-api Relevant to the library API team, which will review and decide on the RFC.

Comments

@Diggsey
Contributor

Diggsey commented Jan 22, 2017

I'll use Option as an example, but this applies to any enum:

Sometimes you have a *mut Option<T> and want to get a *mut T from it, without dereferencing the pointer and without checking whether it is actually a Some(T). AFAICT, this is currently impossible in Rust unless you have some way to construct a valid T (which would let you calculate the offset beforehand and then use pointer arithmetic).

Using mem::uninitialized() or mem::zeroed() to construct a T beforehand to calculate the pointer offset doesn't work: because of enum layout optimization, the resulting value is not guaranteed to be a valid Some(T), so it could be UB.

This came up while writing some concurrency code: there, dereferencing the pointer would be invalid because the memory may be concurrently written to. The unsafe code can still be correct because it checks at run time that it has exclusive access, and that the value was indeed a Some(T), before it ever dereferences the pointer.
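To make the layout problem concrete, here is a minimal sketch (the helper names `uses_niche` and `zeroed_option` are made up for illustration). It shows that for a type with a spare bit pattern, `Option<T>` has no separate tag, and that an all-zero payload is read back as `None`, so a zeroed "Some" cannot be used to measure the payload offset.

```rust
use std::mem::{size_of, transmute};
use std::num::NonZeroU32;

/// True when `Option<T>` is the same size as `T`, i.e. the discriminant is
/// stored in one of `T`'s invalid bit patterns rather than in a separate tag.
fn uses_niche<T>() -> bool {
    size_of::<Option<T>>() == size_of::<T>()
}

/// Builds an `Option<NonZeroU32>` from all-zero bits.
fn zeroed_option() -> Option<NonZeroU32> {
    // SAFETY: the standard library documents that `Option<NonZeroU32>` has the
    // same layout as `u32`, with the zero bit pattern representing `None`.
    unsafe { transmute::<u32, Option<NonZeroU32>>(0) }
}
```

With these, `uses_niche::<NonZeroU32>()` is true, `uses_niche::<u32>()` is false, and `zeroed_option()` is `None` rather than a `Some` holding zero.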

@SimonSapin
Contributor

Does this work for you? https://docs.rs/unreachable/0.1.1/unreachable/trait.UncheckedOptionExt.html

Under the hood this uses a match expression which in the None case unsafely creates a &Void (with enum Void {}) and matches on that. The optimizer assumes that this code is "impossible" (unreachable), and should then eliminate the branch in the original match expression.

If you want to make this slightly less dangerous, https://crates.io/crates/debug_unreachable adds a check in debug mode only.
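For readers unfamiliar with the pattern, a rough sketch of an unchecked unwrap follows. It is not the linked crate's code: instead of matching on a `&Void`, it swaps in `std::hint::unreachable_unchecked`, which the standard library gained after this comment was written. The function name `unchecked_unwrap` is illustrative.

```rust
use std::hint::unreachable_unchecked;

/// Unwraps `opt` without emitting a check for `None`.
///
/// # Safety
/// The caller must guarantee that `opt` is `Some`; calling this with `None`
/// is undefined behavior.
unsafe fn unchecked_unwrap<T>(opt: Option<T>) -> T {
    match opt {
        Some(value) => value,
        // Tells the optimizer this arm can never be taken.
        None => unsafe { unreachable_unchecked() },
    }
}
```

Note that this still takes the `Option<T>` by value, so it reads the memory; it does not produce an interior pointer from a raw pointer.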

@Diggsey
Contributor Author

Diggsey commented Jan 22, 2017

@SimonSapin I can't use that directly, since I need a pointer to the interior and dereferencing the initial pointer at all is UB. A similar approach might work in release mode as a way to calculate the pointer offset beforehand, but I'd need it to work in debug mode too (also, relying on compiler optimizations for correctness is not ideal!).

@SimonSapin
Contributor

Sorry, I don’t understand: why is it UB to dereference that *mut Option<T> pointer if it’s not UB to do anything at all with it?

@Diggsey
Contributor Author

Diggsey commented Jan 22, 2017

The timeline looks approximately like this:

  • Read *mut Option<T> from atomic variable
  • Take *mut Option<T> and convert it to a *mut T
  • Try obtaining exclusive access to the data, via an atomic compare-exchange
  • If successful, dereference the *mut T
  • If failed, throw away the *mut T and try again

Until I know I have exclusive access to the pointer, it would be UB to try dereferencing it (a data race), but I can't wait until I've got exclusive access to convert the pointer.
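The timeline can be sketched roughly as below. This is only an illustration under loud assumptions, not the actual code from the project in question: exclusive access is modelled by swapping the pointer to null, ABA problems are ignored, and `project` is a hypothetical stand-in for the requested feature (a function that maps the enum pointer to a payload pointer without reading memory). The name `claim_and_use` is also made up.

```rust
use std::ptr;
use std::sync::atomic::{AtomicPtr, Ordering};

/// # Safety (assumed contract)
/// Every non-null pointer stored in `slot` points to a live `Some(T)`, and
/// all threads touching `slot` follow this same protocol.
unsafe fn claim_and_use<T, R>(
    slot: &AtomicPtr<Option<T>>,
    project: impl Fn(*mut Option<T>) -> *mut T,
    f: impl FnOnce(&mut T) -> R,
) -> Option<R> {
    loop {
        // 1. Read the pointer from the atomic variable.
        let outer = slot.load(Ordering::Acquire);
        if outer.is_null() {
            return None; // another thread already claimed the value
        }
        // 2. Convert to a payload pointer; nothing is dereferenced yet.
        let inner = project(outer);
        // 3. Try to obtain exclusive access via compare-exchange.
        match slot.compare_exchange(outer, ptr::null_mut(), Ordering::AcqRel, Ordering::Acquire) {
            // 4. Success: we are the only owner, so dereferencing is allowed.
            Ok(_) => return Some(f(unsafe { &mut *inner })),
            // 5. Failure: throw away `inner` and try again.
            Err(_) => continue,
        }
    }
}
```

For `Option<NonZeroU32>` a plain pointer cast is a valid `project`, because that layout is documented; for a general `T` no such function can currently be written, which is the point of this issue.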

@burdges

burdges commented Jan 22, 2017

AFAIK, there is not necessarily any valid pointer to the interior of an Option<NonZero<_>>, which includes pointer types like Option<Unique<_>> and Option<Shared<_>> and likely anything built using them.

Can you build up a custom trait that provides this pointer when it exists? Roughly:

unsafe trait InteriorRef {
    type Interior;
    unsafe fn interior_mut(this: *mut Self) -> *mut Self::Interior;
    unsafe fn interior(this: *const Self) -> *const Self::Interior {
        Self::interior_mut(this as *mut Self) as *const Self::Interior
    }
}

unsafe impl<T> InteriorRef for Option<T> where T: Copy + Default {
    type Interior = T;
    unsafe fn interior_mut(this: *mut Self) -> *mut T {
        // Build a known-valid Some to measure where the payload sits.
        let mut x = Some(T::default());
        let base = &mut x as *mut Self as usize;
        let payload = x.as_mut().unwrap() as *mut T as usize;
        // Apply the same offset to `this` without dereferencing it.
        (this as usize + (payload - base)) as *mut T
    }
}

@Diggsey
Contributor Author

Diggsey commented Jan 22, 2017

@burdges There is always a valid pointer to the interior of an Option<T> - even with layout optimization, the Some(T) case is guaranteed to have a T as part of its layout. This is how Option::as_ref() works.
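A small sketch of this claim, assuming a live and valid `Option` is available (which is exactly what the original use case lacks). The helper names `payload_offset` and `payload_in_bounds` are made up for illustration.

```rust
use std::mem::size_of;

/// Byte offset of the payload inside `opt`, or `None` if `opt` is `None`.
/// Uses `Option::as_ref`, so it needs a valid reference to the `Option`.
fn payload_offset<T>(opt: &Option<T>) -> Option<usize> {
    let base = opt as *const Option<T> as usize;
    let payload = opt.as_ref()? as *const T as usize;
    Some(payload - base)
}

/// True if the payload of `opt` lies entirely inside the `Option`'s own bytes.
fn payload_in_bounds<T>(opt: &Option<T>) -> bool {
    match payload_offset(opt) {
        Some(offset) => offset + size_of::<T>() <= size_of::<Option<T>>(),
        None => true,
    }
}
```

For a layout-optimized type such as `Option<&u8>`, the payload occupies the whole `Option`, so the offset is 0; the pointer to the interior exists even though there is no separate tag.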

@ticki
Contributor

ticki commented Jan 22, 2017

https://github.com/search?l=rust&q=%22unchecked_unwrap%22+language%3ARust&type=Code&utf8=%E2%9C%93

We need this in libstd badly.

@Diggsey
Contributor Author

Diggsey commented Jan 22, 2017

I found a (horrible) workaround for my specific Option case:

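use std::mem;

// Assumes the payload is stored at the very end of Option<T>;
// the language does not guarantee this layout.
fn unwrap_unchecked<T>(x: *mut Option<T>) -> *mut T {
    let offset = mem::size_of::<Option<T>>() - mem::size_of::<T>();
    (x as usize + offset) as *mut T
}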

Obviously this will fail if Rust ever adds more advanced layout manipulations, but hopefully this issue is resolved properly before then...
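Since the workaround depends on an unguaranteed layout, one hedge is to compare the guess with a measured offset whenever a valid sample value happens to be available, for example in a test or a debug assertion. This is a sketch only; `guessed_offset` and `guess_holds_for` are hypothetical names.

```rust
use std::mem::size_of;

/// The guess used by the workaround: the payload sits at the very end.
fn guessed_offset<T>() -> usize {
    size_of::<Option<T>>() - size_of::<T>()
}

/// Measures the real payload offset from a valid `Some(sample)` and reports
/// whether it matches the guess on the current compiler and target.
fn guess_holds_for<T>(sample: T) -> bool {
    let opt = Some(sample);
    let base = &opt as *const Option<T> as usize;
    let payload = opt.as_ref().unwrap() as *const T as usize;
    payload - base == guessed_offset::<T>()
}
```

A check like `debug_assert!(guess_holds_for(sample))` would only detect a mismatch for the types and compiler it runs against; it does not make the workaround sound in general.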

@ahicks92

@Diggsey
Hi, sorry, we already have those more advanced layout optimizations. They're just off by default at the moment, but won't be for much longer.

They'd be on, but a ton of personal things came up, so the last pull request that actually enables them isn't in yet.

@ahicks92

@Diggsey
Contributor Author

Diggsey commented Jan 23, 2017

@camlorn In that case, the only option left would seem to be to create a bit pattern that is likely to be within the domain of T (e.g. 0x80808080...), construct a Some::<T> from it, and use that to determine the enum layout beforehand via Option::as_ref().

@comex

comex commented Jan 23, 2017

That sounds like a pretty horrific hack :)

Have you considered replacing your use of Option with a union?

edit: This would be a nice feature to have though.
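One way to read the union suggestion is a hand-rolled option with an explicit tag and a `MaybeUninit<T>` payload (`MaybeUninit` is itself a union in the standard library). The sketch below is an assumption about what such a replacement could look like, not code from the thread; `ManualOption` and `payload_ptr` are made-up names. Because the payload is an ordinary struct field, its address can be computed from a raw pointer by field projection, without reading the memory.

```rust
use std::mem::MaybeUninit;
use std::ptr::addr_of_mut;

/// Hand-rolled replacement for `Option<T>` with a fixed, known layout.
#[repr(C)]
struct ManualOption<T> {
    is_some: bool,
    value: MaybeUninit<T>,
}

/// Computes a pointer to the payload without reading through `this`.
///
/// # Safety
/// `this` must point into a live allocation holding a `ManualOption<T>`.
/// The result may only be dereferenced once the caller knows `is_some` is
/// set and that it has exclusive access.
unsafe fn payload_ptr<T>(this: *mut ManualOption<T>) -> *mut T {
    // `addr_of_mut!` projects to the field without creating a reference.
    unsafe { addr_of_mut!((*this).value) as *mut T }
}
```

The trade-off is losing the niche optimization and the `Option` API in exchange for a layout the code controls.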

@eddyb
Member

eddyb commented Jan 23, 2017

My take on this is that even a really cut down version of #1450 would be enough here, and that's more likely to get in (the implementation is already there in part, as MIR requires downcasts to variants) than any hack.

@oconnor663

oconnor663 commented Oct 18, 2017

"Using mem::uninitialized(), or mem::zeroed() to construct a T before-hand to calculate the pointer offset doesn't work, because of enum layout optimization, which means it could be UB."

Could someone explain that part in more detail? Is it UB to construct such a pointer even if it's never dereferenced? Or would it have to get dereferenced in some way?

Edit: Hmm, I think I understand part of the problem. Could something like this be correct?

use std::mem;

fn option_offset<T>() -> usize {
    // Avoid creating an uninitialized Some if the null pointer optimization is
    // in effect, because it's not actually guaranteed to be Some.
    if mem::size_of::<T>() == mem::size_of::<Option<T>>() {
        return 0;
    }
    let dummy: Option<T> = unsafe { Some(mem::uninitialized()) };
    let dummy_ptr = &dummy as *const Option<T> as usize;
    let interior_ptr = dummy.as_ref().unwrap() as *const T as usize;
    mem::forget(dummy);
    interior_ptr - dummy_ptr
}

Centril added the T-libs-api label on Feb 23, 2018