Remove the 'static requirement from the BinRead Vec implementation by delegating to a new trait method. #317

Open · wants to merge 8 commits into master
91 changes: 86 additions & 5 deletions binrw/src/binread/impls.rs
@@ -10,7 +10,9 @@ use core::num::{
};

macro_rules! binread_impl {
($($type_name:ty),*$(,)?) => {
// `$(some_lit $(__unused $is_lit_present:tt))?` allows us to match on the presence of a literal
// using `$($($is_lit_present)? fn do_whatever() {})?`

Collaborator: I’m not sure the benefits outweigh the costs of using a weirdo macro pattern like this instead of just using a second macro and sending the extra function in a tt (binread_impl + binread_impl_int)?
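The alternative the reviewer suggests aside, the presence-of-a-literal trick itself can be reproduced in miniature. This is my own minimal sketch (hypothetical `maybe_double` macro, not code from the PR): the inner `$(__unused $present:tt)?` group never matches anything, so `$($present)?` expands to nothing, but its nesting depth lets the outer transcriber repetition track whether the literal was present.

```rust
macro_rules! maybe_double {
    // `as doubled` is an optional literal marker; `$present` never binds a
    // token, it only ties the transcriber repetition to the outer group.
    ($name:ident $(as doubled $(__unused $present:tt)?)?) => {
        fn $name(x: u32) -> u32 {
            #[allow(unused_mut)]
            let mut r = x;
            // Emitted once iff `as doubled` was present in the invocation.
            $( $($present)? r *= 2; )?
            r
        }
    };
}

maybe_double!(plain);
maybe_double!(doubled as doubled);

fn main() {
    assert_eq!(plain(3), 3);
    assert_eq!(doubled(3), 6);
    println!("ok");
}
```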

($($type_name:ty $(as int $(__unused $is_int:tt)?)?),*$(,)?) => {
$(
impl BinRead for $type_name {
type Args<'a> = ();
@@ -29,12 +31,91 @@ macro_rules! binread_impl {
}
})
}

$(
$($is_int)?
fn read_options_count<'a, R, C>(
reader: &mut R,
endian: Endian,
_args: Self::Args<'a>,
count: C::Count,
) -> BinResult<C>
where
R: Read + Seek,
Self::Args<'a>: Clone,
C: crate::container::Container<Item = Self>,
{
C::new_smart(count, |mut buf| {
reader.read_exact(&mut bytemuck::cast_slice_mut::<_, u8>(
&mut buf,
))?;

if core::mem::size_of::<Self>() != 1
&& ((cfg!(target_endian = "big") && endian == crate::Endian::Little)
|| (cfg!(target_endian = "little") && endian == crate::Endian::Big))
{
for value in buf.iter_mut() {
*value = value.swap_bytes();
}
}
Ok(())
})
Comment on lines +48 to +62

Collaborator Author: This was previously done in two passes: incrementally read the whole buffer, then byteswap the whole buffer.

The current construction reads each portion of the buffer, swaps the bytes of that portion, then reads the next portion, etc. Does this make a meaningful performance difference, and should we consider splitting into a "read phase" and a "finalize" phase?

A finalize phase seems ripe for misuse, though. Would it be possible to prevent someone from ignoring the read phase and instead doing the read during the finalization phase? I suppose a by-value map would do it.

C::new_smart(
    count,
    |mut buf| {
        reader.read_exact(&mut bytemuck::cast_slice_mut::<_, u8>(&mut buf))?;
        Ok(())
    },
    |value| {
        if core::mem::size_of::<Self>() != 1
            && ((cfg!(target_endian = "big") && endian == crate::Endian::Little)
                || (cfg!(target_endian = "little") && endian == crate::Endian::Big))
        {
            value.swap_bytes()
        } else {
            value
        }
    },
)

I feel like this is haring off further in the direction of over-complication.

}
)?
}
)*
}
}
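The endianness condition inside the macro can be exercised standalone. This is my own sketch of the same check (hypothetical `swap_if_foreign` helper, not binrw code): values are swapped only when the requested endianness differs from the host's.

```rust
// Swap the bytes of each element only when the requested endianness
// (little = true/false) differs from the compile-time host endianness.
fn swap_if_foreign(buf: &mut [u32], little: bool) {
    if (cfg!(target_endian = "big") && little)
        || (cfg!(target_endian = "little") && !little)
    {
        for value in buf.iter_mut() {
            *value = value.swap_bytes();
        }
    }
}

fn main() {
    // Requesting the host's own endianness leaves values intact.
    let mut native = [0x1122_3344u32];
    swap_if_foreign(&mut native, cfg!(target_endian = "little"));
    assert_eq!(native, [0x1122_3344]);

    // Requesting the opposite endianness byteswaps each element.
    let mut foreign = [0x1122_3344u32];
    swap_if_foreign(&mut foreign, cfg!(target_endian = "big"));
    assert_eq!(foreign, [0x4433_2211]);
    println!("ok");
}
```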

binread_impl!(u8, u16, u32, u64, u128, i8, i16, i32, i64, i128, f32, f64);
binread_impl!(
u16 as int,
u32 as int,
u64 as int,
u128 as int,
i8 as int,
i16 as int,
i32 as int,
i64 as int,
i128 as int,
f32,
f64
);

impl BinRead for u8 {
type Args<'a> = ();
fn read_options<R: Read + Seek>(
reader: &mut R,
_endian: Endian,
_args: Self::Args<'_>,
) -> BinResult<Self> {
let mut val = 0u8;
let pos = reader.stream_position()?;
reader
.read_exact(core::slice::from_mut(&mut val))
.or_else(crate::__private::restore_position(reader, pos))?;

Ok(val)
}

// This extra impl for `u8` makes it faster than
// `binread_impl`, but *only* because `binread_impl` is not allowed
// to use unsafe code to eliminate the unnecessary zero-fill.
// Otherwise, performance would be identical and it could be
// deleted.

Collaborator: I’m not sure this comment makes sense any more, since it had been contextualised by surrounding code which no longer exists?

fn read_options_count<'a, R, C>(
reader: &mut R,
_endian: Endian,
_args: Self::Args<'a>,
count: C::Count,
) -> BinResult<C>
where
R: Read + Seek,
Self::Args<'a>: Clone,
C: crate::container::Container<Item = Self>,
{
C::new_smart(count, |buf| reader.read_exact(buf).map_err(Into::into))
}
Comment on lines +100 to +117

Collaborator Author: Oops. This comment was copied over and is inaccurate now because I wasn't paying close enough attention to how I was implementing this version of the specialization. (Notably, we can only get this version of the specialization when using Vec, and I've completely abstracted access to that Vec away here. This is a performance regression.)

Ideally we'd be using the current implementation for arrays, and read_to_end for Vec. Funnily enough, we might be able to use fake specialization that requires 'static here to switch between the two implementations, since we only need to care about u8 (and anything that might implement Container... should we be sealing Container?) instead of .


Collaborator Author: Actually, no, that doesn't work. That would require us to bound C: Container<Item=Self> + 'static, and if user-generated items are not 'static, the Container wrapping those items probably won't be 'static either.
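For reference, the "fake specialization" pattern being discussed looks roughly like this (my own sketch, not code from the PR): dispatch on `TypeId`, which requires a `'static` bound, which is exactly what rules it out for containers of non-`'static` items.

```rust
use std::any::TypeId;

// TypeId::of::<T>() demands T: 'static, so this style of runtime
// "specialization" cannot be used once the 'static bound is removed.
fn describe<T: 'static>() -> &'static str {
    if TypeId::of::<T>() == TypeId::of::<u8>() {
        "u8 fast path"
    } else {
        "generic path"
    }
}

fn main() {
    assert_eq!(describe::<u8>(), "u8 fast path");
    assert_eq!(describe::<u32>(), "generic path");
    println!("ok");
}
```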

}

fn unexpected_zero_num() -> Error {
Error::Io(io::Error::new(
@@ -133,7 +214,7 @@ pub struct VecArgs<Inner: Clone> {

impl<B> BinRead for Vec<B>
where
B: BinRead + 'static,
B: BinRead,
for<'a> B::Args<'a>: Clone,
{
type Args<'a> = VecArgs<B::Args<'a>>;
@@ -143,7 +224,7 @@ where
endian: Endian,
args: Self::Args<'_>,
) -> BinResult<Self> {
crate::helpers::count_with(args.count, B::read_options)(reader, endian, args.inner)
B::read_options_count(reader, endian, args.inner, args.count)
}
}

@@ -159,7 +240,7 @@ where
endian: Endian,
args: Self::Args<'_>,
) -> BinResult<Self> {
array_init::try_array_init(|_| BinRead::read_options(reader, endian, args.clone()))
BinRead::read_options_count(reader, endian, args, ())
}
}

56 changes: 56 additions & 0 deletions binrw/src/binread/mod.rs

@@ -192,6 +192,62 @@ pub trait BinRead: Sized {
endian: Endian,
args: Self::Args<'_>,
) -> BinResult<Self>;

/// Read `count` items of `Self` from the reader using the given [`Endian`] and arguments
///
/// A vehicle for optimizations of types that can easily be read many-at-a-time.
/// For example, the integral types {i,u}{8,16,32,64,128}.
///
/// # Errors
///
/// If reading fails, an [`Error`](crate::Error) variant will be returned.
///
/// # Examples
///
/// ```
/// # use binrw::{io::{Read, Seek}, BinRead, BinResult, Endian, container::Container};
/// struct CustomU8(u8);
///
/// impl BinRead for CustomU8 {
/// type Args<'a> = <u8 as BinRead>::Args<'a>;
///
/// fn read_options<R: Read + Seek>(
/// reader: &mut R,
/// endian: binrw::Endian,
/// args: Self::Args<'_>,
/// ) -> BinResult<Self> {
/// u8::read_options(reader, endian, args).map(CustomU8)
/// }
///
/// fn read_options_count<'a, R, C>(
/// reader: &mut R,
/// endian: Endian,
/// args: Self::Args<'a>,
/// count: C::Count,
/// ) -> BinResult<C>
/// where
/// R: Read + Seek,
/// Self::Args<'a>: Clone,
/// C: Container<Item=Self>
/// {
/// let c: C::HigherSelf<u8> = u8::read_options_count(reader, endian, args, count)?;
/// Ok(c.map(CustomU8))
/// }
/// }
/// ```
fn read_options_count<'a, R, C>(
reader: &mut R,
endian: Endian,
args: Self::Args<'a>,
count: C::Count,
) -> BinResult<C>
where
R: Read + Seek,
Self::Args<'a>: Clone,
C: crate::container::Container<Item = Self>,
{
C::new_naive(count, || Self::read_options(reader, endian, args.clone()))
}
}

Collaborator: Name: read_collection?

I’m not feeling confident that this should be part of the public API if this is really all just to find a better way to handle specialisation of contiguous primitive types. There are already two ways to read into collections (newtypes and helper functions), this adds a third one, and if the with directive gets implemented then there will be four. What does the decision tree look like about when to use each of these different ways? What about the write-API?

Something seems weird to me that this change means read_options becomes effectively an optimisation of read_options_count(r, e, a, 1).

Collaborator Author: Re: being a public API: This is also a way for people who have a contiguous slice of bytemuck/zerocopy types to add the optimization themselves (unless we want to add that to binrw under feature flags), and that would be a less contrived example of the feature.

Overall, I chose the approach that I did due to a combination of making the main library less special (not the only one able to define specializations like this), not requiring the use of unsafe, and (if I've pulled things off right) being semver compatible.

That said, this design isn't perfect. I don't like that I'm exposing Container as a public API. That means that users may expect more than bare functionality out of it. (If they care at all. If they don't, then exposing this as public did nothing.)

Collaborator Author: > Something seems weird to me that this change means read_options becomes effectively an optimisation of read_options_count(r, e, a, 1).

The standard library is in a similar bind with io::Read::read_buf and io::Read::read. They can be implemented in terms of each other, io::Read::read existed first, and it is more optimal to implement io::Read::read_buf.

They've decided as a prerequisite that they'll need a way to tell the compiler that you only need to implement one or the other. If that ever stabilizes, we can do the same.
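The read/read_buf analogy from the thread can be shown in miniature. This is my own sketch (hypothetical `ReadOne` trait, not binrw or std code): each method has a default in terms of the other, so implementing either one suffices, and implementing neither would recurse forever, which is exactly why std wants a way to require at least one override.

```rust
// Two mutually-defaulted methods: a one-item read is the counted read
// with count 1, and the counted read loops the one-item read.
// Implementors must override at least one or both defaults recurse.
trait ReadOne {
    fn read_one(&mut self) -> u8 {
        self.read_many(1)[0]
    }
    fn read_many(&mut self, n: usize) -> Vec<u8> {
        (0..n).map(|_| self.read_one()).collect()
    }
}

// Overrides only the bulk method; the single-item default delegates to it.
struct Counter(u8);
impl ReadOne for Counter {
    fn read_many(&mut self, n: usize) -> Vec<u8> {
        (0..n)
            .map(|_| {
                self.0 += 1;
                self.0
            })
            .collect()
    }
}

fn main() {
    let mut c = Counter(0);
    assert_eq!(c.read_one(), 1);
    assert_eq!(c.read_many(3), vec![2, 3, 4]);
    println!("ok");
}
```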

/// Extension methods for reading [`BinRead`] objects directly from a reader.
121 changes: 121 additions & 0 deletions binrw/src/container.rs
@@ -0,0 +1,121 @@
//! container module

Collaborator Author: TODO: everything in here needs proper documentation.

#[cfg(not(feature = "std"))]
use alloc::vec::Vec;

/// Container
pub trait Container: Sized + IntoIterator {

Collaborator: To make sure I understand, this trait is just to be able to support both Vec<T> and [T; N]?

Is there some other collection type that would be supportable in the future?

Should this be sealed?

Should this be hidden?

Collaborator Author: This IntoIterator bound isn't technically necessary. There was an earlier iteration where I was playing around with IntoIterator and FromIterator until I realized that FromIterator doesn't play nicely with arrays, and it was nice to reuse the Item associated type.
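The FromIterator friction with arrays mentioned here can be demonstrated concretely (my own sketch, not PR code): `[T; N]` does not implement `FromIterator`, because the iterator's length is only known at runtime, so collecting has to go through a `Vec` and a fallible conversion.

```rust
use std::convert::TryInto;

fn main() {
    // let arr: [i32; 3] = (0..3).collect(); // does not compile:
    // [i32; 3] does not implement FromIterator<i32>.

    // The fallible route: collect into a Vec, then TryInto an array,
    // which fails at runtime if the length does not match.
    let v: Vec<i32> = (0..3).collect();
    let arr: [i32; 3] = v.try_into().expect("length mismatch");
    assert_eq!(arr, [0, 1, 2]);
    println!("ok");
}
```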

/// Count
type Count;

/// naive
///
/// # Errors
///
/// If `f` returns an error, the error will be returned
fn new_naive<Fun, Error>(count: Self::Count, f: Fun) -> Result<Self, Error>
where
Fun: FnMut() -> Result<Self::Item, Error>;

Collaborator: Name: new? new_slow?

If the ‘smart’ constructor was filling a MaybeUninit so that the Default constraint was not required, would it still be necessary for this to exist at all?

Collaborator Author: Hmmmm... I suppose, technically, it wouldn't, because we could just iterate over the slice provided by read_smart and perform reads one by one, and that would be morally equivalent, even if the underlying strategies are different.

/// smart
///
/// # Errors
///
/// If `f` returns an error, the error will be returned
fn new_smart<Fun, Error>(count: Self::Count, f: Fun) -> Result<Self, Error>
where
Fun: FnMut(&mut [Self::Item]) -> Result<(), Error>,
Self::Item: Default + Clone;

Collaborator: Name: new_contiguous? new_fast?
Comment on lines +19 to +27

Collaborator Author: If the features read_buf and core_io_borrowed_buf were stable, passing a BorrowedCursor to the closure instead of &mut [Self::Item], and using Read::read_buf_exact, would almost get us to not needing to initialize our buffers. But, with the current state of that API, we'd have no way to own the filled values (without unsafe) -- we would only get a reference to a slice of the filled values. (This doesn't mean that such an API won't eventually be added, only that it doesn't currently exist.)


// whee it's a Functor
/// Type Constructor
type HigherSelf<T>: Container<Count = Self::Count, HigherSelf<Self::Item> = Self>
+ IntoIterator<Item = T>;

/// map
fn map<Fun, T>(self, f: Fun) -> Self::HigherSelf<T>
where
Fun: FnMut(Self::Item) -> T;

Collaborator: Why is this part of this API (beyond the example in the read_options_count docs, which seems pretty contrived to me)? Is this also the only reason why this trait requires IntoIterator?

Collaborator Author: This part of the API very much directly exists because of that example: when I went to write it, I reached for this pattern and realized that if I wanted to use that pattern I would need to add support for it.

Collaborator Author: A less contrived form of the example would be someone writing binrw impls for fixed-point numbers. For example, a type with 2 integral bits and 30 fractional bits that fits in a u32 and is stored as such, then reading a dataset that contains large arrays of these integers.

It would be nice if the user could reuse "reading a large amount of u32s" instead of writing it themselves. Whether that's preferred over making the user do it themselves is arguable, though. If we make the user do it themselves, they can use bytemuck or zerocopy to ensure there are no extra copies. This implementation has no such guarantees.
Comment on lines +29 to +37

Collaborator Author: Hello, having to invent an ecosystem around a new trait so that people can do things that are maybe useful.

This uses the Functor pattern from https://varkor.github.io/blog/2019/03/28/idiomatic-monads-in-rust.html (specifically, "attempt 2", before they start inventing (generic) associated traits to try to express further patterns).

The goal here was to allow users to delegate to other implementations, then map the values into another container "of the same form." I'm effectively keeping an eye out for whether we ever support not having alloc around: parsing arrays will only ever use arrays, and not alloc::vec::Vec.
}
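The HigherSelf "functor" shape can be reproduced in a self-contained miniature (my own sketch, simplified from the PR's trait, with a hypothetical `Fix2_30` fixed-point wrapper standing in for the comment's example): delegate the read to the underlying type's container, then map into a container of the same form.

```rust
// Minimal functor-style container: HigherSelf<T> is "the same container
// shape holding T", and map rebuilds it element by element.
trait Container: Sized + IntoIterator {
    type HigherSelf<T>: Container + IntoIterator<Item = T>;
    fn map<Fun, T>(self, f: Fun) -> Self::HigherSelf<T>
    where
        Fun: FnMut(Self::Item) -> T;
}

impl<T> Container for Vec<T> {
    type HigherSelf<X> = Vec<X>;
    fn map<Fun, X>(self, f: Fun) -> Vec<X>
    where
        Fun: FnMut(T) -> X,
    {
        self.into_iter().map(f).collect()
    }
}

// Hypothetical 2.30 fixed-point wrapper stored as a raw u32.
#[derive(Debug, PartialEq)]
struct Fix2_30(u32);

fn main() {
    // Pretend `raw` came from a bulk u32 read; wrap each element while
    // keeping the container shape (Vec in, Vec out).
    let raw: Vec<u32> = vec![1, 2, 3];
    let fixed: Vec<Fix2_30> = raw.map(Fix2_30);
    assert_eq!(fixed, vec![Fix2_30(1), Fix2_30(2), Fix2_30(3)]);
    println!("ok");
}
```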

impl<T, const N: usize> Container for [T; N] {
type Count = ();
type HigherSelf<X> = [X; N];

fn new_naive<Fun, Error>(_count: (), mut f: Fun) -> Result<Self, Error>
where
Fun: FnMut() -> Result<Self::Item, Error>,
{
array_init::try_array_init(|_| f())
}

Collaborator: Use Self::Count instead of the literal type (throughout)?

fn new_smart<Fun, Error>(_count: (), mut f: Fun) -> Result<Self, Error>
where
Fun: FnMut(&mut [Self::Item]) -> Result<(), Error>,
Self::Item: Default + Clone,
{
let mut res = array_init::array_init(|_| Self::Item::default());
f(&mut res)?;
Ok(res)
}

fn map<Fun, X>(self, f: Fun) -> Self::HigherSelf<X>
where
Fun: FnMut(Self::Item) -> X,
{
self.map(f)
}
}

impl<T> Container for Vec<T> {
type Count = usize;
type HigherSelf<X> = Vec<X>;

fn new_naive<Fun, Error>(count: usize, f: Fun) -> Result<Self, Error>
where
Fun: FnMut() -> Result<Self::Item, Error>,
{
core::iter::repeat_with(f).take(count).collect()
}

fn new_smart<Fun, Error>(count: usize, mut f: Fun) -> Result<Self, Error>
where
Fun: FnMut(&mut [Self::Item]) -> Result<(), Error>,
Self::Item: Default + Clone,
{
let mut list = Self::default();
let mut start = 0;
let mut remaining = count;
// Allocating and reading from the source in chunks is done to keep
// a bad `count` from causing huge memory allocations that are
// doomed to fail
while remaining != 0 {
// Using a similar strategy as std `default_read_to_end` to
// leverage the memory growth strategy of the underlying Vec
// implementation (in std this will be exponential) using a
// minimum byte allocation
let growth: usize = 32 / core::mem::size_of::<Self::Item>();
list.reserve(remaining.min(growth.max(1)));

let items_to_read = remaining.min(list.capacity() - start);
let end = start + items_to_read;

// In benchmarks, this resize decreases performance by 27–40%
// relative to using `unsafe` to write directly to uninitialised
// memory, but nobody ever got fired for buying IBM
list.resize(end, Self::Item::default());
f(&mut list[start..end])?;

remaining -= items_to_read;
start += items_to_read;
}

Ok(list)
}

fn map<Fun, X>(self, f: Fun) -> Self::HigherSelf<X>
where
Fun: FnMut(Self::Item) -> X,
{
self.into_iter().map(f).collect()
}
}
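The chunked-growth defence in new_smart can be demonstrated standalone. This is my own simplified sketch (hypothetical `read_chunked` over `u8`, fixed 32-item minimum growth, not the PR's generic code): a malicious count fails at the first short read instead of attempting one enormous up-front allocation.

```rust
// Grow the Vec in bounded chunks and let the fill callback reject the
// read long before `count` items are ever allocated.
fn read_chunked<E>(
    count: usize,
    mut fill: impl FnMut(&mut [u8]) -> Result<(), E>,
) -> Result<Vec<u8>, E> {
    let mut list = Vec::new();
    let mut start = 0;
    let mut remaining = count;
    while remaining != 0 {
        // Reserve at most a small chunk; the Vec's own growth strategy
        // amortises repeated reservations.
        list.reserve(remaining.min(32));
        let items = remaining.min(list.capacity() - start);
        let end = start + items;
        list.resize(end, 0);
        fill(&mut list[start..end])?;
        remaining -= items;
        start += items;
    }
    Ok(list)
}

fn main() {
    // A source that can supply only 100 bytes before "EOF": the bogus
    // count fails after a few small chunks, with no huge allocation.
    let mut available = 100usize;
    let res = read_chunked(usize::MAX, |buf| {
        if buf.len() > available {
            return Err("unexpected eof");
        }
        available -= buf.len();
        buf.fill(0xAB);
        Ok(())
    });
    assert_eq!(res, Err("unexpected eof"));

    // A sane count succeeds.
    let ok: Result<Vec<u8>, &str> = read_chunked(4, |buf| {
        buf.fill(7);
        Ok(())
    });
    assert_eq!(ok, Ok(vec![7, 7, 7, 7]));
    println!("ok");
}
```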