Force test failure on async task panic (for now) #2479
Comments
@spacekookie I think you're more informed than I am here, and probably have some opinions or plans around how you imagine this to be handled in the node runtime, so please feel free to leave any thoughts (even/especially if you disagree with what I've written!)
This is a tricky problem. My experience with test-wrappers that handle panics in a special way has so far mostly been negative, as it adds a load on maintenance burden to the project and corner-cases that may only exist in CI but not in production settings. Going down this line of thinking though I'm wondering if we could make use of Anyway, my general approach to panics is that we shouldn't ever have one. So making As for the information we'll be losing: I think we have to face the logging and diagnostics problem already because on |
Yes, we're in agreement here. The reason for doing a special thing in CI is because |
Currently our tests occasionally hang in CI. I think I've seen it happen at least 3-4 times in the past week, so it's not exactly rare.
I believe the main cause is a task (or worker) panicking, with the panic currently being swallowed by tokio (tokio-rs/tokio#2002). This is the kind of issue we should work out.
There are a few options:
1. Restructure code so that we handle task panics in a more principled way. Ultimately, we should do this, but I think it's not critical in the short term.
2. When running tests, wrap `spawn(...)` in a wrapper that forces test failure if the task panics, using `catch_unwind` and sending a message on a channel in the case of panics. Who listens to this channel would have to be worked out, but an easy answer is that we make it happen in the test wrapper.
3. When running tests under nightly, use `RUSTFLAGS="-Cpanic=abort -Zpanic-abort-tests"` (see "panic=abort testing / subprocess testing", rust-lang/rust#67650). This is unstable, but will cause panics emitted anywhere during tests to immediately abort the process. `-Zpanic-abort-tests` is an unstable feature required for the test runner to handle this abort gracefully -- otherwise, we might not even notice!
4. Something else? @spacekookie may have some thoughts on other options.
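A minimal sketch of what option 2 could look like. To stay self-contained it uses `std::thread::spawn` in place of the async runtime's `spawn(...)`; for futures, something like `futures::FutureExt::catch_unwind` would play the same role. The names `spawn_reporting` and `collect_panics` are hypothetical and not part of the project.

```rust
use std::panic::{self, AssertUnwindSafe};
use std::sync::mpsc::{self, Receiver, Sender};
use std::thread::{self, JoinHandle};

/// Hypothetical test-only wrapper: runs `task` on a new thread and reports a
/// panic on `panic_tx` instead of letting it go unnoticed.
fn spawn_reporting<F>(name: &'static str, panic_tx: Sender<String>, task: F) -> JoinHandle<()>
where
    F: FnOnce() + Send + 'static,
{
    thread::spawn(move || {
        // AssertUnwindSafe: the task's state is not reused after a panic.
        if let Err(payload) = panic::catch_unwind(AssertUnwindSafe(task)) {
            // Panic payloads are usually `&str` or `String`.
            let msg = payload
                .downcast_ref::<&str>()
                .map(|s| s.to_string())
                .or_else(|| payload.downcast_ref::<String>().cloned())
                .unwrap_or_else(|| "non-string panic payload".to_string());
            // Ignore send errors: the listener may already be gone.
            let _ = panic_tx.send(format!("task `{}` panicked: {}", name, msg));
        }
    })
}

/// Listener side (e.g. the test wrapper): blocks until every sender has been
/// dropped, then returns all reported panics.
fn collect_panics(panic_rx: Receiver<String>) -> Vec<String> {
    panic_rx.iter().collect()
}

fn main() {
    let (tx, rx) = mpsc::channel();
    let ok = spawn_reporting("ok", tx.clone(), || {});
    let bad = spawn_reporting("bad", tx.clone(), || panic!("boom"));
    drop(tx); // only the tasks hold senders now

    ok.join().unwrap();
    bad.join().unwrap();

    let panics = collect_panics(rx);
    // A test wrapper would fail the test here instead of hanging.
    assert!(!panics.is_empty(), "expected the panic to be reported");
}
```

Note that this only works when panics unwind; under `-Cpanic=abort` the process aborts before `catch_unwind` can observe anything.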
As it stands, given that we have to support no_std and probably other environments where we build with `-Cpanic=abort` (which isn't that rarely enabled of a flag), we should probably assume any panic is a fatal bug in our code, at least for the time being.

Anyway, I'm going to poke at 3, since it seems much less work than 2, and would (if nothing else) upgrade some CI hangs to actual test failures.
Long term thoughts?
However, in the long term, I think if/when `-Cpanic=abort` (or similar) is not enabled, we'd probably like users to be able to build workers which are robust to panics, since this kind of isolation is (sometimes) considered a hallmark of actor models. This seems like something we have some time to punt on though, as it is mostly desirable for panics caused by user code (rather than issues in our code -- ideally we should return `Err` rather than panic).
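As a rough illustration of the isolation described here, the sketch below runs a user-supplied handler once per message and keeps the worker alive if the handler panics, while ordinary failures come back as `Err`. `Outcome` and `run_worker` are hypothetical names, the `u32` message type is a stand-in, and the whole approach assumes panics unwind (it cannot help under `-Cpanic=abort`).

```rust
use std::panic::{self, AssertUnwindSafe};

/// Hypothetical result of delivering one message to a worker.
#[derive(Debug, PartialEq)]
enum Outcome {
    Handled,
    /// The handler returned `Err`: the preferred way to report failure.
    Failed(String),
    /// The handler panicked; the panic was contained and the worker continues.
    Panicked,
}

/// Feeds each message to `handler`, isolating panics per message.
fn run_worker<H>(mut handler: H, inbox: Vec<u32>) -> Vec<Outcome>
where
    H: FnMut(u32) -> Result<(), String>,
{
    inbox
        .into_iter()
        .map(|msg| {
            // AssertUnwindSafe: we accept that handler state may be
            // inconsistent after a panic; a real design would need to decide
            // whether to reset or restart the worker instead.
            match panic::catch_unwind(AssertUnwindSafe(|| handler(msg))) {
                Ok(Ok(())) => Outcome::Handled,
                Ok(Err(e)) => Outcome::Failed(e),
                Err(_) => Outcome::Panicked,
            }
        })
        .collect()
}

fn main() {
    let outcomes = run_worker(
        |m| match m {
            0 => Err("zero".to_string()),
            13 => panic!("unlucky"),
            _ => Ok(()),
        },
        vec![1, 0, 13, 2],
    );
    println!("{:?}", outcomes);
}
```

The message after the panicking one is still processed, which is the property that matters for user-code panics; whether a worker should instead be restarted with fresh state is a design question this sketch leaves open.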