-
Notifications
You must be signed in to change notification settings - Fork 89
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
argon2: add parallelism #547
base: master
Are you sure you want to change the base?
argon2: add parallelism #547
Conversation
Could you benchmark the parallel implementation and compare it against the single threaded one? |
This comment was marked as outdated.
This comment was marked as outdated.
memory_blocks | ||
.segment_views(slice, lanes) | ||
.for_each(|mut memory_view| { | ||
let lane = memory_view.lane(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please note that this fill_blocks
diff is very noisy, due to a necessary indentation change + rustfmt + diff getting somewhat lost.
The only changes to this function are the use of the segment view iterator (here), the accessing of memory through the segment view API instead of through indexing of the memory_blocks
slice (bellow), and changing memory_blocks
to be mutable (above).
Some benchmarks: Note: these are outdated since the removal of 018c3e9 due to #568 (comment). Benchmarking master...HEAD with parallel feature
Note: 6-core CPU with SMT. Also: Benchmarking master...HEAD without parallel feature, default param tests only
|
@jonasmalacofilho if you can rebase I added |
264821d
to
018c3e9
Compare
@tarcieri oh, i forgot about that one. Rebased, and thanks for pointing it out! That said, we should probably try also to add the very cheapest of tests and have it run in Miri in CI:
|
By the way, I think there are some things I can improve in the code, but I would really appreciate a review first. And so I've kept edits to a minimum for now, so that you can actually review it. |
Yeah, Miri is tricky specifically because you can't do anything computationally expensive under it. I think we could potentially gate expensive tests under |
I think the 2_8_2 (t=2,m=2,p=2) tests are the cheapest in the crate, and still quite expensive... I could try adding t=1,m=8,p=2 tests and see if they execute in acceptable time in CI. Additionally, maybe a few unit tests ensuring that allowed borrows pass in Miri, and that some known invalid borrow patterns are either impossible at compile time or caught at runtime. |
Quick update: I ended up getting stuck trying to remove the (apparently) unrelated warnings from Miri (a warning in crossbeam and a leak due to rayon), and then I couldn't get to this PR for a few weeks. EDIT: (easily) running the tests in Miri is not currently possible due to crossbeam-rs/crossbeam#1181. Once that fix is released, it's possible that only Tree Borrows may work due to crossbeam-rs/crossbeam#545. |
018c3e9
to
31cecde
Compare
Coordinated shared access in the memory blocks is implemented with `SegmentViewIter` and associated types, which provide views into Argon2 memory that can be processed in parallel. These views alias in the regions that are read-only, but are disjoint in the regions where mutation happens. Effectively, they implement, with a combination of mutable borrowing and runtime checking, the cooperative contract outlined in RFC 9106. To avoid aliasing mutable references into the entire buffer of blocks (which would be UB), pointers are used up to the moment where a reference (shared or mutable) into a specific block is returned. At that point, aliasing is no longer possible, as argued in SAFETY comments and/or checked at runtime. Finally, add a `parallel` feature and parallelize filling the blocks using the memory views mentioned above and rayon.
This was cause by having multiple different versions of criterion, and therefore the train, in use: we specified ^0.4, but pprof 0.14.0 already required ^0.5.
Additionally, use a set instead of trying to avoid repeating a particular set of params by hand.
31cecde
to
0f5355a
Compare
I removed the conflict, rebased the PR, fixed/updated the benchmarks and did some other minor cleanup. Crossbeam-epoch doesn't currently work in Miri (see my edited comment above). Between that and the fact that even the most minimal Miri test would be super slow on GitHub free runners, I just don't think they are worth it for now. (It should be still possible to get an older toolchain and Miri and run some specific tests locally). Is there's something else you would like me to add here? |
@jonasmalacofilho still need to go through it in detail, but there's nothing I see that's an immediate blocker |
Adds a
default-enabledparallel
feature, with anotherwiseoptional dependency onrayon
, and parallelize the filling of blocks using the memory views mentioned above.Coordinated shared access in the memory blocks is implemented with a
SegmentViewIter
iterator, which implements eitherrayon::iter::ParallelIterator
orcore::iter::Iterator
and returnsSegmentView
views into the Argon2 blocks memory that are safe to be used in parallel.The views alias in the regions that are read-only, but are disjoint in the regions where mutation happens. Effectively, they implement, with a combination of mutable borrowing and runtime checking, the cooperative contract outlined in RFC 9106. This is similar to what was suggested in #380.
To avoid aliasing mutable references into the entire buffer of blocks (which would be UB), pointers are used up to the moment where a reference (shared or mutable) into a specific block is returned. At that point, aliasing is no longer possible.
The following tests have been tried in and pass Miri (modulo unrelated warnings):
(Running these in Miri is quite slow, taking ~5 minutes each, so I only ran the most obviously relevant tests for now).
Finally, the alignment ofBlock
s increases to 128 bytes for better prevention of false sharing on modern platforms. The new value is based on notes oncrossbeam-utils::CachePadded
.I also took some inspiration from an intermediate snapshot of #247, before the parallel implementation was removed, as well as from an implementation without any safe abstractions I just worked on for the rust-argon2 crate (sru-systems/rust-argon2#56).