WaitForSingleObjectEx returns WAIT_ABANDONED when collecting backtraces #399
EDIT: Hold up, that observation is wrong. I cannot find the rules surrounding releasing/closing handles in shared namespaces. I would assume […]
I think the call to `CreateMutexA` […]. From https://docs.microsoft.com/en-us/windows/win32/api/synchapi/nf-synchapi-createmutexa: […]

Edit: Actually, it looks like that behavior would return WAIT_FAILED instead.
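For reference, the documented behavior is easy to observe directly: a second `CreateMutexA` call with the same name returns a *new* handle value referring to the *same* underlying mutex object, and sets the last error to `ERROR_ALREADY_EXISTS`. A minimal sketch, assuming the winapi crate; the mutex name here is made up:

```rust
use std::ptr;

use winapi::shared::minwindef::FALSE;
use winapi::shared::winerror::ERROR_ALREADY_EXISTS;
use winapi::um::errhandlingapi::GetLastError;
use winapi::um::handleapi::CloseHandle;
use winapi::um::synchapi::CreateMutexA;

fn main() {
    unsafe {
        let name = "Local\\ExampleMutex\0".as_ptr() as _;
        let first = CreateMutexA(ptr::null_mut(), FALSE, name);
        let second = CreateMutexA(ptr::null_mut(), FALSE, name);
        // The second call signals that the named object already existed.
        assert_eq!(GetLastError(), ERROR_ALREADY_EXISTS);
        // Distinct handle values, one underlying kernel object.
        assert_ne!(first, second);
        CloseHandle(second);
        CloseHandle(first);
    }
}
```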
I get a unique handle value for every call to [`CreateMutexA`]. The only way to repro [`WAIT_ABANDONED`] is to let a thread exit while it still owns the mutex:

```rust
use std::ptr;

// Imports added for completeness (not in the original snippet); this
// assumes the winapi crate.
use winapi::shared::minwindef::FALSE;
use winapi::um::handleapi::CloseHandle;
use winapi::um::synchapi::{CreateMutexA, ReleaseMutex, WaitForSingleObjectEx};
use winapi::um::winbase::INFINITE;

#[test]
fn wait_abandoned() {
    unsafe {
        let l1 = CreateMutexA(
            ptr::null_mut(),
            FALSE,
            "Local\\RustBacktraceMutex\0".as_ptr() as _,
        );
        dbg!(l1);
        let t = std::thread::spawn(|| {
            // Open a second handle to the same named mutex and acquire it...
            let l2 = CreateMutexA(
                ptr::null_mut(),
                FALSE,
                "Local\\RustBacktraceMutex\0".as_ptr() as _,
            );
            dbg!(l2);
            dbg!(WaitForSingleObjectEx(l2, INFINITE, FALSE));
            // ...then exit the thread without releasing it, abandoning the
            // mutex. CloseHandle does not unlock the mutex:
            // dbg!(CloseHandle(l2));
        });
        std::thread::sleep_ms(200);
        // This wait returns WAIT_ABANDONED because the owning thread died.
        dbg!(WaitForSingleObjectEx(l1, INFINITE, FALSE));
        dbg!(ReleaseMutex(l1));
        dbg!(CloseHandle(l1));
        t.join().unwrap();
    }
}
```

That'd mean the [thread that locked the mutex was terminated without ever releasing it].

https://devblogs.microsoft.com/oldnewthing/20050912-14/?p=34253

According to common sense and this article, treating [`WAIT_ABANDONED` as a successful lock acquisition should be fine].
Thanks for the report! It looks like it should be fine to handle [`WAIT_ABANDONED` as a successful lock acquisition here].
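A minimal sketch of what that handling might look like (not the actual backtrace-rs patch; assumes the winapi crate):

```rust
use winapi::shared::minwindef::FALSE;
use winapi::um::synchapi::WaitForSingleObjectEx;
use winapi::um::winbase::{INFINITE, WAIT_ABANDONED, WAIT_OBJECT_0};
use winapi::um::winnt::HANDLE;

unsafe fn lock_dbghelp(mutex: HANDLE) {
    match WaitForSingleObjectEx(mutex, INFINITE, FALSE) {
        // Normal acquisition.
        WAIT_OBJECT_0 => {}
        // A previous owner exited without releasing the mutex. We still
        // own it now, so callers can proceed.
        WAIT_ABANDONED => {}
        other => panic!("failed to lock dbghelp mutex: {other}"),
    }
}
```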
@alexcrichton I'm hesitant to accept […]. We definitely have to perform some tracing, as the assertion above comes from a codepath that's not called when our application is in total despair: it's rather unlikely for threads to disappear there. It is however possible - quite likely, even - that the application panicked on another thread, but shouldn't that exit gracefully by dropping [the lock guard]?
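For context on the "dropping" point, consider a guard along these lines (a hypothetical type for illustration, assuming the winapi crate): a panic that *unwinds* still runs `Drop` and releases the mutex, so only a thread terminated without unwinding, or a `panic = "abort"` build, would abandon it.

```rust
use winapi::um::synchapi::ReleaseMutex;
use winapi::um::winnt::HANDLE;

// Hypothetical RAII guard around the dbghelp mutex.
struct DbghelpGuard(HANDLE);

impl Drop for DbghelpGuard {
    fn drop(&mut self) {
        // Runs during normal scope exit *and* during panic unwinding.
        unsafe { ReleaseMutex(self.0) };
    }
}
```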
It's true, yeah, that […]. I think it might be good to try to minimize this to figure out what's causing it. There may be interactions with other languages or something like that, or maybe some system library is zapping a thread? Hard to say unfortunately :(
Ran into exactly this as well today in our app; it happened on our main thread, and I don't think any other threads were being destroyed. @MarijnS95 did y'all track this any further?
@repi Nope, we just started doing much less frequent callstack tracing instead 🙈 |
@repi Unfortunately we have not been able to repro this anymore, bar the threading example above. Maybe there are short-lived and panicking threads spawned after all? That seems to be the only way to run into [`WAIT_ABANDONED`]. I'd suggest adding logging/tracing code around here to keep track of threads locking and (potentially forgetting to) unlock(ing) this mutex, in hopes of finding the cause.
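For anyone attempting that, the kind of tracing meant here might look roughly like this (function names hypothetical, assuming the winapi crate):

```rust
use winapi::shared::minwindef::FALSE;
use winapi::um::processthreadsapi::GetCurrentThreadId;
use winapi::um::synchapi::{ReleaseMutex, WaitForSingleObjectEx};
use winapi::um::winbase::{INFINITE, WAIT_ABANDONED};
use winapi::um::winnt::HANDLE;

unsafe fn traced_lock(mutex: HANDLE) {
    let tid = GetCurrentThreadId();
    eprintln!("[dbghelp] thread {tid} waiting for mutex");
    let r = WaitForSingleObjectEx(mutex, INFINITE, FALSE);
    if r == WAIT_ABANDONED {
        // A previous owner never released the mutex; this is the smoking gun.
        eprintln!("[dbghelp] thread {tid} acquired an ABANDONED mutex");
    } else {
        eprintln!("[dbghelp] thread {tid} locked mutex (result {r})");
    }
}

unsafe fn traced_unlock(mutex: HANDLE) {
    eprintln!("[dbghelp] thread {} releasing mutex", GetCurrentThreadId());
    ReleaseMutex(mutex);
}
```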
@MarijnS95 I had just disabled our intensive callstack recording to work around it since it was mostly a debug-only feature anyway. It's still there. |
I count that as not repro'ing it anymore! For us it was probably recording a callstack right as a panic was being handled, which did not end up releasing the mutex before closing it (and/or terminating the entire thread).
We think we've just encountered this too. I'm curious about the use of a Session-wide mutex with a fixed name - the aim described in the comment is to synchronise all accesses to [`dbghelp`, which are per-process, yet a fixed name in the `Local\` namespace shares the mutex between every process in the session]. I think one way around this would be to include the PID in the name of the created mutex, as the PID should be stable across the entire execution of the process and would disambiguate the created mutex between different processes. I was going to try this to test my theory, but I believe testing it would require rebuilding the standard library...
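A sketch of that suggestion, assuming the winapi crate; the name format and function name are illustrative only:

```rust
use std::ptr;

use winapi::shared::minwindef::FALSE;
use winapi::um::processthreadsapi::GetCurrentProcessId;
use winapi::um::synchapi::CreateMutexA;
use winapi::um::winnt::HANDLE;

// Embed the current PID in the mutex name so each process gets its own
// mutex instead of contending on one session-wide object.
unsafe fn create_per_process_mutex() -> HANDLE {
    // NUL-terminate manually since CreateMutexA expects a C string.
    let name = format!("Local\\RustBacktraceMutex{:08X}\0", GetCurrentProcessId());
    CreateMutexA(ptr::null_mut(), FALSE, name.as_ptr() as _)
}
```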
Huh, that is odd. There's indeed no reason I can see to synchronize between processes. The docs only say that threads need to be synchronized. I think it would be more appropriate to use a critical section rather than mess about in the session namespace.

EDIT: Oh, I guess the problem is that if the standard library's backtrace and another instance of backtrace are used in different DLLs but loaded within the same process, then it becomes kind of awkward to share a global symbol without some way to coordinate.
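For comparison, the critical-section route could look roughly like this (a sketch only, assuming the winapi crate and that a single copy of the code owns the static, which is exactly the awkward part in the multi-DLL case; `with_dbghelp_lock` is a hypothetical name):

```rust
use std::mem::MaybeUninit;
use std::ptr::addr_of_mut;
use std::sync::Once;

use winapi::um::minwinbase::CRITICAL_SECTION;
use winapi::um::synchapi::{
    EnterCriticalSection, InitializeCriticalSection, LeaveCriticalSection,
};

static INIT: Once = Once::new();
static mut LOCK: MaybeUninit<CRITICAL_SECTION> = MaybeUninit::uninit();

unsafe fn with_dbghelp_lock<R>(f: impl FnOnce() -> R) -> R {
    // MaybeUninit<T> is repr(transparent), so this cast is fine once the
    // critical section has been initialized by the Once below.
    let lock = addr_of_mut!(LOCK).cast::<CRITICAL_SECTION>();
    INIT.call_once(|| InitializeCriticalSection(lock));
    EnterCriticalSection(lock);
    let result = f();
    // (A real version would also release on unwind, e.g. via a Drop guard.)
    LeaveCriticalSection(lock);
    result
}
```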
backtrace-rs/src/dbghelp.rs, line 315 (at commit 47069af)
First seen here: Traverse-Research/gpu-allocator#13
We've encountered it twice but haven't been able to trap it in the debugger; I'll update this issue if we have more information around it.