Add simple data-race detector #1617
Conversation
passes all current tests but additional tests are required
837715a to cae4302 (compare)
src/data_race.rs (outdated)
use smallvec::SmallVec;

use crate::*;
Suggested change:
- use crate::*;
+ use crate::MiriEvalContext;
Since there are only a couple of items from `crate` being used below?
src/data_race.rs (outdated)
impl<'mir, 'tcx: 'mir> EvalContextExt<'mir, 'tcx> for crate::MiriEvalContext<'mir, 'tcx> {}
pub trait EvalContextExt<'mir, 'tcx: 'mir>: crate::MiriEvalContextExt<'mir, 'tcx> {
Suggested change:
- impl<'mir, 'tcx: 'mir> EvalContextExt<'mir, 'tcx> for crate::MiriEvalContext<'mir, 'tcx> {}
- pub trait EvalContextExt<'mir, 'tcx: 'mir>: crate::MiriEvalContextExt<'mir, 'tcx> {
+ impl<'mir, 'tcx: 'mir> EvalContextExt<'mir, 'tcx> for MiriEvalContext<'mir, 'tcx> {}
+ pub trait EvalContextExt<'mir, 'tcx: 'mir>: MiriEvalContextExt<'mir, 'tcx> {
src/data_race.rs (outdated)
let data_race = &*this.memory.extra.data_race;
let old = data_race.multi_threaded.get();

data_race.multi_threaded.set(false);
I believe you can use `let old = data_race.multi_threaded.replace(false);`.
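A minimal sketch of that suggestion, assuming `multi_threaded` is a `Cell<bool>` (which the `get`/`set` calls above suggest):

```rust
use std::cell::Cell;

fn main() {
    let multi_threaded = Cell::new(true);

    // `Cell::replace` stores the new value and returns the old one,
    // collapsing the separate `get` and `set` calls into a single operation.
    let old = multi_threaded.replace(false);

    assert!(old);
    assert!(!multi_threaded.get());
}
```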
src/data_race.rs (outdated)
"Invalid alt (>=):\n l: {:?}\n r: {:?}",l,r | ||
); | ||
} | ||
} |
nit: missing trailing newline here and in other files.
@JCTyblaidd thanks a lot! This is coming quite out of the blue, but a welcome surprise. :) Unfortunately I will probably not have the time during the week to review such a massive PR. I hope next week-end will work, but it might also be 2 weeks until I have a few free hours to do the initial review.
Conceivably a program could briefly create a thread for a one-off task and then continue running mostly sequentially afterwards. So I think it would still be good to do some benchmarking to measure the impact of this. Unfortunately Miri does not have an established benchmarking infrastructure; all we have are two little crates that I used for some ad-hoc benchmarking in the past. Could you measure the impact of your PR on those cases (after adding a dummy thread created at the beginning)?
Maybe it could be disabled again once all threads except the main thread have been joined.
Well, then there could be a background thread just sitting there waiting for some cleanup to become necessary, but not actually doing anything. (Crossbeam has such a thread, AFAIK.) Basically, I am worried about "there's a thread somewhere doing nothing" becoming a huge performance cliff.
add dangling thread variant of one of the benchmarks
Only did the tests for one of the benchmarks; this only adds one dangling thread, so the overhead might be reduced by staying inside the small-vector size limit. Will try to get more numbers later.

Single-threaded: `time ./miri run ./bench-cargo-miri/mse/src/main.rs -Zmiri-disable-stacked-borrows`

Single-threaded with overhead: `time ./miri run ./bench-cargo-miri/mse_and_dangling_thread/src/main.rs -Zmiri-disable-stacked-borrows` (real 0m2.184s)

Changed benchmark with variant:
Thanks for measuring this! Is the "716.6 ms ± 6.6 ms" using your branch, or master? I am surprised that the difference is so small (doesn't this affect every single memory access?), also given that
"716.6 ms ± 6.6 ms" is with my branch, which should is similar to current upstream master with the only overhead of an easily speculated branch that is always false to skip data-race checking, seems to be within the margin of error. This could be the benchmark being used though, if the values do not escape into memory and are kept inline? More benchmarks would probably be better but I can't get the other one to compile, it complains about importing serde_json. Upstream master
The overhead might be minimal in the above "744.0 ms ± 7.1 ms" case because(maybe wrong):
Probably need more benchmarks to get a clearer picture. |
previously only data races between two non-atomic accesses were detected.
code & add support for vector index re-use for multiple threads after termination.
convert validate atomic op to use shared reference and get_raw instead of get_raw_mut so it can be used for validate_atomic_load as well
This looks really great overall, thanks a lot for this impressive piece of work! My comments are mostly formatting and grammar nits, and some naming suggestions.
I did not fully understand every part of the algorithm, but I do not necessarily have to. What I saw mostly makes intuitive sense, except for release sequences which I need to look into further, and some parts of the handling of non-atomic accesses.
What I missed is anything specific to `SeqCst`. `SeqCst` induces a global ordering on all sequentially consistent operations; shouldn't that affect the data race detector? Currently, if I saw correctly, `SeqCst` is fully equivalent to always using the strongest available release/acquire ordering.
authors = ["Ralf Jung <[email protected]>"]
edition = "2018"

[dependencies]
I don't think it is a good idea to duplicate this example code... isn't there a better way, such as adding a `cfg` flag or so?
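A rough sketch of that `cfg` idea, with a hypothetical `dangling_thread` Cargo feature gating the extra thread so the benchmark source is not duplicated:

```rust
fn main() {
    // Only the feature-gated variant spawns the do-nothing background thread
    // that forces the data-race detector into multi-threaded mode.
    #[cfg(feature = "dangling_thread")]
    let _dangling = std::thread::spawn(|| {
        std::thread::park(); // sit idle for the whole run
    });

    // ... the original single-threaded benchmark body would go here ...
}
```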
src/machine.rs (outdated)
@@ -109,12 +109,15 @@ impl fmt::Display for MiriMemoryKind {
pub struct AllocExtra {
    /// Stacked Borrows state is only added if it is enabled.
    pub stacked_borrows: Option<stacked_borrows::AllocExtra>,
    /// Data race detection via the use of a vector-clock.
    pub data_race: data_race::AllocExtra,
Similar to `-Zmiri-disable-stacked-borrows`, I think it would make sense to have `-Zmiri-disable-data-race-detector` (or so) which turns this field into `None` to ensure there is no overhead.
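A minimal sketch of what that could look like; the placeholder modules stand in for the real per-allocation state, and the command-line flag plumbing is omitted:

```rust
// Placeholder types standing in for the real per-allocation state.
mod stacked_borrows { pub struct AllocExtra; }
mod data_race { pub struct AllocExtra; }

pub struct AllocExtra {
    /// Stacked Borrows state is only added if it is enabled.
    pub stacked_borrows: Option<stacked_borrows::AllocExtra>,
    /// Vector-clock state for data-race detection; `None` when the proposed
    /// `-Zmiri-disable-data-race-detector` flag is passed, so disabling the
    /// detector adds no per-allocation overhead.
    pub data_race: Option<data_race::AllocExtra>,
}

fn main() {
    // With the detector disabled, allocations carry no vector-clock state at all.
    let extra = AllocExtra { stacked_borrows: None, data_race: None };
    assert!(extra.data_race.is_none());
}
```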
src/machine.rs (outdated)
}

/// Extra global memory data
#[derive(Clone, Debug)]
pub struct MemoryExtra {
    pub stacked_borrows: Option<stacked_borrows::MemoryExtra>,
    pub data_race: data_race::MemoryExtra,
This should then likely also be an `Option` field if we have `-Zmiri-disable-data-race-detector`.
src/shims/posix/linux/sync.rs (outdated)
@@ -78,7 +78,10 @@ pub fn futex<'tcx>(
// Read an `i32` through the pointer, regardless of any wrapper types.
// It's not uncommon for `addr` to be passed as another type than `*mut i32`, such as `*const AtomicI32`.
// FIXME: this fails if `addr` is not a pointer type.
let futex_val = this.read_scalar_at_offset(addr.into(), 0, this.machine.layouts.i32)?.to_i32()?;
// FIXME: what form of atomic operation should the `futex` use to load the value?
That's a good question... aren't there any docs or so defining this?
src/shims/posix/sync.rs (outdated)
ecx.read_scalar_at_offset(mutex_op, offset, ecx.machine.layouts.i32)
ecx.read_scalar_at_offset_atomic(
    mutex_op, offset, ecx.machine.layouts.i32,
    AtomicReadOp::SeqCst
Should this really be `SeqCst`? I would have expected `Acquire`. (Same for the other reads in this file I think.)
I've pushed a few commits that should fix the requests. As I added in a comment inside the data-race detector code, per the C++ specification on sequentially consistent ordering:
This means, since we currently do not model weak memory effects and the scheduling algorithm already provides a global total order of all memory operations, that seq-cst loads/stores are equivalent to acquire loads/release stores and a seq-cst RMW is equivalent to an acquire-release RMW.
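A tiny sketch of the equivalence being described, with illustrative enum and function names (not the PR's actual ones): without weak memory effects and with a global total order from the scheduler, `SeqCst` collapses onto the strongest release/acquire behaviour.

```rust
// Illustrative orderings; the real detector has separate read/write/rmw enums.
#[derive(Clone, Copy)]
enum AtomicOrd { Relaxed, Acquire, Release, AcqRel, SeqCst }

/// Returns (acts_as_acquire, acts_as_release) under the simplification above.
fn synchronization_effect(ord: AtomicOrd) -> (bool, bool) {
    match ord {
        AtomicOrd::Relaxed => (false, false),
        AtomicOrd::Acquire => (true, false),
        AtomicOrd::Release => (false, true),
        // SeqCst adds nothing beyond AcqRel once a global total order is
        // already provided by the scheduler and weak memory is not modelled.
        AtomicOrd::AcqRel | AtomicOrd::SeqCst => (true, true),
    }
}

fn main() {
    assert_eq!(
        synchronization_effect(AtomicOrd::SeqCst),
        synchronization_effect(AtomicOrd::AcqRel)
    );
}
```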
And regarding vector index re-use, I tidied up the explanation in the comments. But the high-level view is that a vector index can only be re-used once the thread has terminated, been joined, and the effects of the join happen-before the current vector clock of all currently active threads. Therefore no data races can possibly be reported involving the old thread: since the thread-local happens-before clocks do not decrease, any stored timestamps from that index must be less than all active vector clocks, which means that no data races can ever be reported at that index, so the vector index stops providing any value. The program will then advance the vector index by 1 and then allow the vector index to be re-used for a newly created thread.
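A self-contained sketch of that re-use rule, with illustrative names and a plain `Vec<u64>` standing in for the PR's vector-clock type:

```rust
// One timestamp per vector index; all clocks are the same length here.
type VClock = Vec<u64>;

/// Component-wise `a <= b`, i.e. `a` happens-before (or equals) `b`.
fn clock_le(a: &VClock, b: &VClock) -> bool {
    a.iter().zip(b).all(|(x, y)| x <= y)
}

/// A terminated and joined thread's index can be handed to a new thread once
/// the clock observed at its join is dominated by every active thread's clock:
/// any timestamp stored under that index is then less than all active clocks,
/// so it can never be reported as a race again.
fn index_reusable(join_clock: &VClock, active_clocks: &[VClock]) -> bool {
    active_clocks.iter().all(|c| clock_le(join_clock, c))
}

fn main() {
    let join_clock = vec![3, 1, 2];
    let active = vec![vec![4, 1, 2], vec![3, 2, 5]];
    assert!(index_reusable(&join_clock, &active));
}
```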
(I only had time to go over the first of your new commits today, will continue tomorrow. Thanks for your patience!)
Ah, with this extra last condition about all threads, that makes sense -- thanks!
Thanks a lot! This is my final round of comments/questions.
Sadly, I do not have the time to actually understand the underlying data race detection algorithm. But thankfully you based this on a POPL paper, so I am confident that if necessary, I can dig into that paper to understand the grand scheme of things. If there are ways in which you diverged from the paper, or particular implementation choices you made (beyond what you already explained in comments), I'd much appreciate it if you could add those to the comment at the top of the data race detector, so that whoever tries to match up paper and implementation in the future will have it as easy as you can make it.
I also like your compile-fail tests. Are there others you could imagine having that you didn't have the time to write? If so, please open an issue, so maybe someone can pick this up in the future. One thing I wondered about is a test related to release sequences -- something that is UB because the release sequence got broken.
… hold-over from earlier versions so fixed.
Fixed review changes, I added some comments to
I am not surprised to see atomic accesses in the mutex implementation; this makes sense -- they just should not induce any synchronization themselves. So
src/data_race.rs (outdated)
@@ -37,6 +37,24 @@
//! to a acquire load and a release store given the global sequentially consistent order
//! of the schedule.
//!
//! The timestamps used in the data-race detector assign each sequence of non-atomic operations
//! followed by a single atomic or concurrent operation a single timestamp.
//! Write, Read, Write, ThreadJoin will be represented by a single timestamp value on a thread
Is there a full stop missing at the end here?
Also, does this match what the paper does? If the paper increments before and after, then adding a comment that this is somewhat strange but see the paper for details seems better than trying to adjust the algorithm the paper authors came up with.
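For reference, a heavily simplified sketch of the "increment the thread's own clock entry after a release operation" variant that the paper apparently uses (per the later commit); names and structure are illustrative only, not the PR's actual implementation:

```rust
struct ThreadClocks {
    index: usize,    // this thread's slot in every vector clock
    clock: Vec<u64>, // the thread's current vector clock
}

impl ThreadClocks {
    /// A release operation publishes the thread's current clock into the
    /// synchronization clock of the released object, then bumps the thread's
    /// own component so subsequent operations get a fresh timestamp.
    fn release(&mut self, sync_clock: &mut Vec<u64>) {
        for (s, &c) in sync_clock.iter_mut().zip(&self.clock) {
            *s = (*s).max(c);
        }
        self.clock[self.index] += 1;
    }
}

fn main() {
    let mut t = ThreadClocks { index: 0, clock: vec![1, 0] };
    let mut sync = vec![0, 0];
    t.release(&mut sync);
    assert_eq!(sync, vec![1, 0]);
    assert_eq!(t.clock, vec![2, 0]);
}
```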
src/data_race.rs (outdated)
@@ -499,7 +517,7 @@ pub trait EvalContextExt<'mir, 'tcx: 'mir>: MiriEvalContextExt<'mir, 'tcx> {
}

/// Perform an atomic compare and exchange at a given memory location
Suggested change:
- /// Perform an atomic compare and exchange at a given memory location
+ /// Perform an atomic compare and exchange at a given memory location.
src/data_race.rs (outdated)
/// For normal locks this should be equivalent to `validate_lock_release`
/// this function only exists for joining over the set of concurrent readers
Suggested change:
- /// For normal locks this should be equivalent to `validate_lock_release`
- /// this function only exists for joining over the set of concurrent readers
+ /// For normal locks this should be equivalent to `validate_lock_release`.
+ /// This function only exists for joining over the set of concurrent readers
src/shims/posix/sync.rs (outdated)
@@ -62,9 +62,11 @@ fn mutex_get_kind<'mir, 'tcx: 'mir>(
mutex_op: OpTy<'tcx, Tag>,
) -> InterpResult<'tcx, ScalarMaybeUninit<Tag>> {
let offset = if ecx.pointer_size().bytes() == 8 { 16 } else { 12 };
//FIXME: this has been made atomic to fix data-race reporting inside the internal
// mutex implementation, it may not need to be atomic.
No need to leave a FIXME here I think (same in the rest of this file).
src/data_race.rs (outdated)
/// Range of Vector clocks, this gives each byte a potentially
/// unqiue set of vector clocks, but merges identical information
/// together for improved efficiency.
Suggested change:
- /// Range of Vector clocks, this gives each byte a potentially
- /// unqiue set of vector clocks, but merges identical information
- /// together for improved efficiency.
+ /// Assigning each byte a MemoryCellClocks.
"Set" is a bit confusing here since it sounds like it might be a HashSet
or so. Also, the fact that RangeMap
is internally more efficient should not matter at this level.
…ing into it again, it seems the paper only increments the timestamp after release operations, so changed to approximation of that implementation.
Thanks again @JCTyblaidd for the PR and for working with me through the review. This is an amazing addition to Miri. :-) @bors r+
📌 Commit cbb695f has been approved by
☀️ Test successful - checks-actions
update Miri This update includes rust-lang/miri#1617, the data race detector by `@JCTyblaidd`. :) Cc `@rust-lang/miri` r? `@ghost`
Partially fixes data-race detection, see #1372; based on *Dynamic Race Detection for C++11*.
Does not attempt to re-use thread ids, so creating and joining threads many times in an execution will result in the vector clocks growing in size and slowing down program execution. (It does now.)