Lock the atexit_hooks during execution of the hooks on shutdown. #49868
Conversation
EDIT: these are resolved
Fix the lock to be more precise, so that we are allowed to register more atexit hooks from inside an atexit hook. Add tests that cover all the cases:
1. registering a hook from inside a hook
2. registering a hook from another thread while hooks are running
3. attempting to register a hook after all hooks have finished (disallowed)
Co-authored-by: Jameson Nash <[email protected]>
Ready for a final review.
https://buildkite.com/julialang/julia-master/builds/24379#01886d90-17c9-48d9-9710-b2b18d515682
@NHDaly this bleeds output into the test logs and adds noise. Would it be possible to avoid that?
Oh, woops! I think those logs were just there for debugging. Thanks! I'll clean that up right now.
@@ -354,6 +354,7 @@ const atexit_hooks = Callable[
     () -> Filesystem.temp_cleanup_purge(force=true)
 ]
 const _atexit_hooks_lock = ReentrantLock()
+global _atexit_hooks_finished::Bool = false
@vtjnash: Oh, wait, this needs to be `volatile`, right? Since it's used inside a while loop across multiple threads?
So this should be an atomic, since we don't have the ability to do volatile without atomic, right?
I'm thinking this might explain the failure you linked:
https://buildkite.com/julialang/julia-master/builds/24379#01886d90-17c9-48d9-9710-b2b18d515682
`volatile` is only meaningful for special memory-mapped hardware registers, and x86 doesn't have any of those.
Is that true? Isn't it also needed to prevent global variables from getting pulled into either a register or a stack-local variable and thus breaking coordination across threads?
I think we've seen this in julia code before: unless you mark the variable as an `Atomic`, it ends up inlined as a local into the function body.
Here's an example:
julia> mutable struct Foo
           x::Int
       end

julia> function f(z::Foo)
           while z.x > 0
           end
           return z.x
       end
f (generic function with 1 method)
julia> @code_native f(Foo(0))
.section __TEXT,__text,regular,pure_instructions
.build_version macos, 13, 0
.globl _julia_f_331 ; -- Begin function julia_f_331
.p2align 2
_julia_f_331: ; @julia_f_331
; ┌ @ REPL[11]:1 within `f`
.cfi_startproc
; %bb.0: ; %top
; │ @ REPL[11] within `f`
;DEBUG_VALUE: f:z <- [DW_OP_deref] $x0
;DEBUG_VALUE: f:z <- [DW_OP_deref] 0
ldr x0, [x0]
; │ @ REPL[11]:2 within `f`
cmp x0, #1 ; =1
b.lt LBB0_2
LBB0_1: ; %L1
; =>This Inner Loop Header: Depth=1
b LBB0_1
LBB0_2: ; %L5
; │ @ REPL[11]:4 within `f`
ret
.cfi_endproc
; └
; -- End function
.subsections_via_symbols
And you can see that this breaks coordination across tasks:
julia> Threads.nthreads()
4
julia> global t::Foo = Foo(1)
Foo(1)
julia> Threads.@spawn @info f(t)
Task (runnable) @0x000000015b4bd5a0
julia> t.x = -1
-1
julia>
julia> t.x
-1
Whereas marking this variable as `@atomic` prevents the caching, and its value is fetched on every iteration of the loop:
julia> mutable struct Foo2
           @atomic x::Int
       end

julia> function f(z::Foo2)
           while @atomic(z.x) > 0
           end
           return @atomic z.x
       end
f (generic function with 2 methods)
julia> @code_native f(Foo2(0))
.section __TEXT,__text,regular,pure_instructions
.build_version macos, 13, 0
.globl _julia_f_350 ; -- Begin function julia_f_350
.p2align 2
_julia_f_350: ; @julia_f_350
; ┌ @ REPL[30]:1 within `f`
.cfi_startproc
; %bb.0: ; %top
; │ @ REPL[30] within `f`
;DEBUG_VALUE: f:z <- [DW_OP_deref] $x0
;DEBUG_VALUE: f:z <- [DW_OP_deref] 0
LBB0_1: ; %L1
; =>This Inner Loop Header: Depth=1
; │ @ REPL[30]:2 within `f`
; │┌ @ Base.jl:50 within `getproperty`
ldar x8, [x0]
; │└
cmp x8, #0 ; =0
b.gt LBB0_1
; %bb.2: ; %L5
; │ @ REPL[30]:4 within `f`
; │┌ @ Base.jl:50 within `getproperty`
ldar x0, [x0]
; │└
ret
.cfi_endproc
; └
; -- End function
.subsections_via_symbols
julia>
And now this issue is fixed as well:
julia> Threads.nthreads()
4
julia> global t2::Foo2 = Foo2(1)
Foo2(1)
julia> Threads.@spawn @info f(t2)
Task (runnable) @0x000000017234a290
julia> @atomic t2.x = -1
[ Info: -1
-1
^^ So I do think there is an issue in the code as written. I am pretty sure that this issue is part of what is encompassed by the term "volatility", but using whatever name we want to use for it, it's an issue, yeah?
Although in this situation, it's also fixed by using a lock, which I guess introduces the needed barriers... 🤔
julia> mutable struct Foo
           x::Int
           lock::ReentrantLock
       end

julia> function f(z::Foo)
           while @lock z.lock (z.x > 0)
           end
           return @lock z.lock z.x
       end
f (generic function with 1 method)
julia> global t::Foo = Foo(1, ReentrantLock())
Foo(1, ReentrantLock(nothing, 0x00000000, 0x00, Base.GenericCondition{Base.Threads.SpinLock}(Base.InvasiveLinkedList{Task}(nothing, nothing), Base.Threads.SpinLock(0
)), (2, 4333405504, 4499986432)))
julia> Threads.@spawn @info f(t)
Task (runnable) @0x0000000281dbf0f0
julia> @lock t.lock t.x = -1
[ Info: -1
-1
So are you saying that a lock like that should be sufficient in all cases, and we don't also need to mark the variable atomic to get the volatile-like behavior?
I think what Jameson meant is that the notion of "volatile" is irrelevant here. The notion that is of import is "atomicity" (https://en.cppreference.com/w/c/language/atomic).
Either by marking the Bool `Atomic` or by protecting it with a lock. Before C11, volatile and atomic were often conflated.
From https://en.cppreference.com/w/c/language/volatile
Note that volatile variables are not suitable for communication between threads; they do not offer atomicity, synchronization, or memory ordering. A read from a volatile variable that is modified by another thread without synchronization or concurrent modification from two unsynchronized threads is undefined behavior due to a data race.
aha. Thank you both! That's clear now, then. 👍 Thanks for educating me!
So, a Julia `ReentrantLock` is enough to also provide the LLVM / C concept of atomicity? We can guarantee that updates to the value will be reflected across threads?
Now that we're discussing it, the answer seems to obviously be yes, otherwise how would anything work... OKAY thanks this is helpful.
Ensure the lock is precise, so that we are allowed to register new atexit hooks from inside an atexit hook. But then disable `atexit()` when shutting down, after it finishes running. Add tests that cover all the cases:
1. registering a hook from inside a hook
2. registering a hook from another thread while hooks are running
3. attempting to register a hook after all hooks have finished (disallowed)

Fixes #49841

Co-authored-by: Jameson Nash <[email protected]>
(cherry picked from commit 20752db)
Fixes #49841.
Follow-up to #49774.
This PR makes two changes:
1. Lock `atexit_hooks` while iterating the hooks during execution of `_atexit()` at shutdown.
2. Disallow registering new atexit hooks once all hooks have finished running.