shutdown event handler not killed on timeout #6615
Comments
Are you talking about the process pool timeout? Is the issue that the |
Presumably, in the above log snippet, the first event handler was killed after 10 minutes, the second event handler was not killed after 10 minutes. Whatever timeout applied to the first event handler should also have applied to the second. |
Reproducible example

    #!/usr/bin/env bash
    # bin/handler
    sleep 600

    # flow.cylc
    [scheduler]
        [[events]]
            stall handlers = handler
            shutdown handlers = handler
    [scheduling]
        [[graph]]
            R1 = foo => bar
    [runtime]
        [[foo]]
            script = false
        [[bar]]

    # global.cylc
    [scheduler]
        process pool timeout = PT10S
        [[events]]
            stall timeout = PT20S
            abort on stall timeout = True

    $ cylc vip ... --no-detach
|
(Note for testing you can inline

It seems to be only shutdown handlers that aren't subject to the process pool timeout. With the same global config, two non-shutdown handlers get killed here:

    [scheduler]
        [[events]]
            stall handlers = "sleep 60; echo"
            startup handlers = "sleep 60; echo"

but even a lone shutdown handler does not get killed:

    [scheduler]
        [[events]]
            shutdown handlers = "sleep 60; echo"
|
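For readers unfamiliar with what a pool-level timeout does, here is a minimal sketch of the idea: commands are launched without blocking, each gets a deadline, and a periodic poll kills anything overdue. This is an illustration only, not Cylc's implementation; `TinyPool` and its method names are hypothetical.

```python
import subprocess
import sys
import time


class TinyPool:
    """Toy pool with a per-command timeout (hypothetical, not Cylc code)."""

    def __init__(self, timeout):
        self.timeout = timeout  # seconds allowed per command
        self.running = []  # list of (Popen, deadline) pairs

    def put_command(self, cmd):
        """Start cmd without blocking and record its deadline."""
        proc = subprocess.Popen(cmd)
        self.running.append((proc, time.monotonic() + self.timeout))

    def process(self):
        """Poll once; return [(returncode, timed_out)] for finished commands."""
        finished, still_running = [], []
        for proc, deadline in self.running:
            if proc.poll() is not None:
                # Command exited on its own.
                finished.append((proc.returncode, False))
            elif time.monotonic() > deadline:
                # Command overran its deadline: kill it and reap it.
                proc.kill()
                proc.wait()
                finished.append((proc.returncode, True))
            else:
                still_running.append((proc, deadline))
        self.running = still_running
        return finished


if __name__ == "__main__":
    pool = TinyPool(timeout=1)
    pool.put_command([sys.executable, "-c", "import time; time.sleep(60)"])
    results = []
    while not results:
        results = pool.process()
        time.sleep(0.1)
    print(results)
```

A command that never enters such a pool is never polled against a deadline, which would be consistent with the behaviour reported above for shutdown handlers.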
Shutdown handlers aren't run in the process pool because it's "closed" by then.

    # cylc/flow/workflow_events.py
    if self.proc_pool.closed:
        # Run command in foreground if abort on failure is set or if
        # process pool is closed
        self.proc_pool.run_command(proc_ctx)
        self._run_event_handlers_callback(proc_ctx)
    else:
        # Run command using process pool otherwise
        self.proc_pool.put_command(
            proc_ctx, callback=self._run_event_handlers_callback)
|
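As a rough illustration of how a foreground (blocking) command could still honour a timeout, the sketch below uses the standard library's `subprocess.run(..., timeout=...)`. It is not the fix that was posted for this issue; `run_foreground_with_timeout` is a hypothetical helper, and the real `run_command` in Cylc may be structured quite differently.

```python
import subprocess
import sys


def run_foreground_with_timeout(cmd, timeout):
    """Run cmd synchronously, killing it if it exceeds timeout seconds.

    Hypothetical helper for illustration; not part of Cylc.
    Returns (returncode, timed_out); returncode is None on timeout.
    """
    try:
        proc = subprocess.run(cmd, timeout=timeout)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child and waits for it before re-raising.
        return None, True
    return proc.returncode, False


if __name__ == "__main__":
    slow = [sys.executable, "-c", "import time; time.sleep(60)"]
    print(run_foreground_with_timeout(slow, timeout=1))
```

One caveat with this approach: if the command is run through a shell, only the shell itself is killed on timeout, so a handler such as `sleep 60; echo` may need process-group handling to be cleaned up fully.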
I'll post a fix... |
Sneaky. I really don't like the auto-provision of args here. |
Dammit, I thought it might be something like that. FYI: There is another related issue that it might be possible to fix at the same time. #5997 |
Seen in the wild, event handler not respecting its timeout?
Why did the "workflow stalled" event handler abide by the timeout and the "abort on stall timeout" event handler ignore it?
This causes issues with auto-restart functionality because if an event handler ends up hanging for whatever reason, there is no timeout to stop it and the workflow will remain in the shutting down state indefinitely.