Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor allocator strategy #421

Closed
wants to merge 6 commits into from
Closed

Refactor allocator strategy #421

wants to merge 6 commits into from

Conversation

zzau13
Copy link
Contributor

@zzau13 zzau13 commented Jul 20, 2020

Closes #419
Closes #401
It is just the main idea. But before continuing I would like some comment.

Hello World benchmark

Old

Buffer Bytes            time:   [30.068 ns 30.155 ns 30.273 ns] 

New

Buffer Bytes            time:   [14.207 ns 14.366 ns 14.573 ns]                          
                        change: [-51.621% -50.128% -48.373%] (p = 0.00 < 0.05)
                        Performance has improved.

@seanmonstar
Copy link
Member

What's the motivation for handling allocation ourselves instead of reusing what Vec does?

@zzau13 zzau13 marked this pull request as draft July 20, 2020 16:55
@zzau13 zzau13 marked this pull request as ready for review July 20, 2020 22:58
@zzau13 zzau13 changed the title [WIP] Refactor allocator strategy Refactor allocator strategy Jul 20, 2020
@zzau13 zzau13 marked this pull request as draft July 20, 2020 23:15
@zzau13 zzau13 marked this pull request as ready for review July 21, 2020 09:49
@zzau13
Copy link
Contributor Author

zzau13 commented Jul 21, 2020

What's the motivation for handling allocation ourselves instead of reusing what Vec does?

With these changes it has been achieved:

  • Avoid double checks when use BytesMut.reserve(usize)
  • Improve rustc inline optimizations
  • "Real" zero cost BytesMut.freeze()

@zzau13
Copy link
Contributor Author

zzau13 commented Jul 21, 2020

@seanmonstar In the benchmarks, https://github.com/botika/buf-min/tree/e62a7975ed7ea00c473902ae774af537c0374949/benches, you can see the difference. Just change the byte version in Cargo.toml. In conclusion, for a "Hello World!", it takes half the time.
I can add all the coverage that you think is convenient to have more soundness.

@seanmonstar
Copy link
Member

The difference is very cool! Would you be able to explain the differences proposed here, vs what Vec is doing? Usually, just relying on std is a good idea, since they handle safety and speed. But if we are going to differ, we should be able to explain exactly why.

@zzau13
Copy link
Contributor Author

zzau13 commented Jul 21, 2020

The difference is very cool! Would you be able to explain the differences proposed here, vs what Vec is doing? Usually, just relying on std is a good idea, since they handle safety and speed. But if we are going to differ, we should be able to explain exactly why.

@fanatid
Copy link
Contributor

fanatid commented Jul 21, 2020

@botika can you comment this? Should be vector be shrinked?
#418 (comment)

Copy link
Member

@hawkw hawkw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I gave this a quick skim and commented on a few minor style nits. I'd like to think through the details of the change a little more, as well.

NonNull::dangling()
} else {
check_capacity_overflow(capacity);
unsafe {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this function (alloc_buf) is going to be safe to call, it might be worth having a comment here explaining why this is always safe with any value of capacity.

@@ -1476,6 +1530,7 @@ impl PartialEq<Bytes> for BytesMut {
}
}

#[inline]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm curious to know whether or not adding inline hints to these internal functions has a noticeable result on performance — in general, I expect #[inline] (not #[inline(always)]) to only be meaningful for pub functions, since it enables cross-crate inlining, and since it's a hint, rustc is pretty free to ignore it...

}

#[inline]
fn non_null(ptr: *mut u8, layout: Layout) -> NonNull<u8> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be worth having a comment here (and on the already-present vptr function?) explaining when they're supposed to be used. It looks like vptr is called to construct a NonNull from a raw ptr which is assumed to be an already valid pointer to a Vec, while this function is called to construct a NonNull from a raw ptr which was returned by the allocator? But, I am not sure if this distinction is clear on the first read-through.

Copy link
Contributor Author

@zzau13 zzau13 Jul 21, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this result, https://github.com/rust-lang/rust/blob/8ad7bc3f428300aee6764f6e23527e19eb235e81/src/liballoc/alloc.rs#L177.
In vptr, this result has already been checked since it comes from a Vec.

/// No cost BytesMut to Bytes converter
///
/// # Safety
/// Make sure `bytes.kind()` is `KIND_VEC`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we add an assertion (or at least a debug assertion?) in this function before performing the conversion?

Copy link
Contributor Author

@zzau13 zzau13 Jul 21, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, but .kind() is private, make it pub(crate)?
https://github.com/tokio-rs/bytes/blob/master/src/bytes_mut.rs#L841-L844

src/bytes.rs Outdated
Comment on lines 163 to 178
if ptr as usize & KIND_MASK == 0 {
let data = ptr as usize | KIND_VEC;
Bytes {
ptr: off_ptr,
len,
data: AtomicPtr::new(data as *mut _),
vtable: &PROMOTABLE_EVEN_VTABLE,
}
} else {
Bytes {
ptr: off_ptr,
len,
data: AtomicPtr::new(ptr as *mut _),
vtable: &PROMOTABLE_ODD_VTABLE,
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

style nit, take it or leave it: this might be a little more readable as

Suggested change
if ptr as usize & KIND_MASK == 0 {
let data = ptr as usize | KIND_VEC;
Bytes {
ptr: off_ptr,
len,
data: AtomicPtr::new(data as *mut _),
vtable: &PROMOTABLE_EVEN_VTABLE,
}
} else {
Bytes {
ptr: off_ptr,
len,
data: AtomicPtr::new(ptr as *mut _),
vtable: &PROMOTABLE_ODD_VTABLE,
}
}
let (data, vtable) = if ptr as usize & KIND_MASK == 0 {
(ptr as usize | KIND_VEC as *mut _, &PROMOTABLE_EVEN_VTABLE)
} else {
(ptr, &PROMOTABLE_ODD_VTABLE)
};
Bytes {
ptr: off_ptr,
len,
data: AtomicPtr::new(data),
vtable,
}

@zzau13
Copy link
Contributor Author

zzau13 commented Jul 21, 2020

@fanatid I don't see any capacity in the free call, I do not know what he's talking about.
https://www.tutorialspoint.com/c_standard_library/c_function_free.htm

@Darksonn
Copy link
Contributor

When running the following program with miri, it fails due to the memory being deallocated with the wrong capacity.

[package]
name = "dealloc-test"
version = "0.1.0"
authors = ["Alice Ryhl <[email protected]>"]
edition = "2018"

[dependencies]
bytes = { git = "https://github.com/botika/bytes", rev = "alloc" }
use bytes::BytesMut;

fn main() {
    let mut b = BytesMut::with_capacity(16);
    b.extend_from_slice(&[0; 8]);
    let frozen = b.freeze();
    drop(frozen);
}
error: Undefined Behavior: incorrect layout on deallocation: allocation has size 16 and alignment 1, but gave size 8 and alignment 1
   --> /home/alice/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/liballoc/alloc.rs:102:14
    |
102 |     unsafe { __rust_dealloc(ptr, layout.size(), layout.align()) }
    |              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ incorrect layout on deallocation: allocation has size 16 and alignment 1, but gave size 8 and alignment 1
    |
    = help: this indicates a bug in the program: it performed an invalid operation, and caused Undefined Behavior
    = help: see https://doc.rust-lang.org/nightly/reference/behavior-considered-undefined.html for further information
            
    = note: inside `std::alloc::dealloc` at /home/alice/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/liballoc/alloc.rs:102:14
    = note: inside `<std::alloc::Global as std::alloc::AllocRef>::dealloc` at /home/alice/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/liballoc/alloc.rs:186:22
    = note: inside `alloc::alloc::box_free::<[u8]>` at /home/alice/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/liballoc/alloc.rs:295:9
    = note: inside `std::intrinsics::drop_in_place::<std::boxed::Box<[u8]>> - shim(Some(std::boxed::Box<[u8]>))` at /home/alice/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libcore/ptr/mod.rs:184:1
    = note: inside `std::mem::drop::<std::boxed::Box<[u8]>>` at /home/alice/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libcore/mem/mod.rs:883:24
    = note: inside closure at /home/alice/.cargo/git/checkouts/bytes-31cde87f5a385f88/03f963b/src/bytes.rs:903:13
    = note: inside `<std::sync::atomic::AtomicPtr<()> as bytes::loom::sync::atomic::AtomicMut<()>>::with_mut::<[closure@bytes::bytes::promotable_even_drop::{{closure}}#0 0:&*const u8, 1:&usize], ()>` at /home/alice/.cargo/git/checkouts/bytes-31cde87f5a385f88/03f963b/src/loom.rs:17:17
    = note: inside `bytes::bytes::promotable_even_drop` at /home/alice/.cargo/git/checkouts/bytes-31cde87f5a385f88/03f963b/src/bytes.rs:894:5
    = note: inside `<bytes::Bytes as std::ops::Drop>::drop` at /home/alice/.cargo/git/checkouts/bytes-31cde87f5a385f88/03f963b/src/bytes.rs:525:18
    = note: inside `std::intrinsics::drop_in_place::<bytes::Bytes> - shim(Some(bytes::Bytes))` at /home/alice/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libcore/ptr/mod.rs:184:1
    = note: inside `std::mem::drop::<bytes::Bytes>` at /home/alice/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libcore/mem/mod.rs:883:24
note: inside `main` at src/main.rs:7:5
   --> src/main.rs:7:5
    |
7   |     drop(frozen);
    |     ^^^^^^^^^^^^
    = note: inside closure at /home/alice/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libstd/rt.rs:67:34
    = note: inside closure at /home/alice/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libstd/rt.rs:52:73
    = note: inside `std::sys_common::backtrace::__rust_begin_short_backtrace::<[closure@std::rt::lang_start_internal::{{closure}}#0::{{closure}}#0 0:&dyn std::ops::Fn() -> i32 + std::marker::Sync + std::panic::RefUnwindSafe], i32>` at /home/alice/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libstd/sys_common/backtrace.rs:130:5
    = note: inside closure at /home/alice/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libstd/rt.rs:52:13
    = note: inside `std::panicking::r#try::do_call::<[closure@std::rt::lang_start_internal::{{closure}}#0 0:&&dyn std::ops::Fn() -> i32 + std::marker::Sync + std::panic::RefUnwindSafe], i32>` at /home/alice/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libstd/panicking.rs:348:40
    = note: inside `std::panicking::r#try::<i32, [closure@std::rt::lang_start_internal::{{closure}}#0 0:&&dyn std::ops::Fn() -> i32 + std::marker::Sync + std::panic::RefUnwindSafe]>` at /home/alice/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libstd/panicking.rs:325:15
    = note: inside `std::panic::catch_unwind::<[closure@std::rt::lang_start_internal::{{closure}}#0 0:&&dyn std::ops::Fn() -> i32 + std::marker::Sync + std::panic::RefUnwindSafe], i32>` at /home/alice/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libstd/panic.rs:394:14
    = note: inside `std::rt::lang_start_internal` at /home/alice/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libstd/rt.rs:51:25
    = note: inside `std::rt::lang_start::<()>` at /home/alice/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libstd/rt.rs:67:5

error: aborting due to previous error

@fanatid
Copy link
Contributor

fanatid commented Jul 21, 2020

@Darksonn thank you. I created small example on playground with Vec which illustrate this problem https://play.rust-lang.org/?version=nightly&mode=debug&edition=2018&gist=19802f08ea7fdc58b341dae3aa5efdfd
Looks like this happened because Layout can be invalid and layout required on dealloc as second arg.. https://doc.rust-lang.org/alloc/alloc/trait.AllocRef.html#tymethod.dealloc

@zzau13
Copy link
Contributor Author

zzau13 commented Jul 21, 2020

@Darksonn That is, we should pass the capacity to Bytes to later dealloc or dealloc when freezing to Bytes or what else?

@Darksonn
Copy link
Contributor

I am not very familiar with the internals of bytes. I imagine you may be able to solve it by doing the equivalent of first creating a Bytes of the full capacity and then calling split to reduce the range of the Bytes to the actual length? There may also be other better solutions — I don't know.

@zzau13
Copy link
Contributor Author

zzau13 commented Jul 21, 2020

@Darksonn 9cfdbe6 What do you think? It is more optimal than dealloc when freezing

But with Bytes.advance() in version "0.5.6" the same thing happens

@zzau13 zzau13 requested a review from hawkw July 21, 2020 19:58
@zzau13
Copy link
Contributor Author

zzau13 commented Jul 21, 2020

Okay, everything should be solved. Comments?

@zzau13 zzau13 closed this Jul 22, 2020
@hawkw
Copy link
Member

hawkw commented Jul 23, 2020

I think we might still want to make the changes to improve reserve and BytesMut::with_capacity, even if we do want to keep deallocation in freeze.

Another potential solution we could consider is implementing some kind of heuristic for whether or not to deallocate in freeze based on whether or not the buffer after deallocating could be vectorized? This could get us the best of both worlds. But, it may not be worth introducing more variable behavior, if it makes Bytes harder to reason about for users...

@zzau13
Copy link
Contributor Author

zzau13 commented Jul 24, 2020

I closed it because realloc is not behaving as expected, I have to investigate further on the subject and I will come back to do another PR. Thanks !!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BytesMut::freeze is not zero-cost New allocator strategy
5 participants