assert failed: cx_core.is_none() #5239

Closed
koivunej opened this issue Nov 29, 2022 · 15 comments · Fixed by #5837
Labels
A-tokio (Area: The main tokio crate) · C-bug (Category: This is a bug.)

Comments

@koivunej
Contributor

koivunej commented Nov 29, 2022

Version

tokio 1.21.1 (grep results prettified).

We use Rust 1.62.1.

Platform

Initially detected from a CI run on:

  • amazon ec2
  • Linux hostname 5.10.144-127.601.amzn2.x86_64 #1 SMP Thu Sep 29 01:11:59 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

Reproduced on:

  • ubuntu 22.04
  • Linux hostname 5.15.0-52-generic #58-Ubuntu SMP Thu Oct 13 08:03:55 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
  • AMD Ryzen 9 3900XT 12-Core Processor

Description

Assertion failure message with release build:

thread 'mgmt request worker' panicked at 'assertion failed: cx_core.is_none()', /home/joonas/.cargo/registry/src/github.ghproxy.top-1ecc6299db9ec823/tokio-1.21.1/src/runtime/scheduler/multi_thread/worker.rs:263:21

In our codebase this is reproducible locally for me after a few tries under load. Load as in, for example, while true; do cargo clean && cargo build; done running in the background. I can try out some patches if needed.

I haven't been able to find an MVCE.

Full steps to reproduce in our codebase
# install all dependencies from repository README.md:
# https://github.com/koivunej/neon/tree/tokio_assertion_failure#running-local-installation

git clone --recursive --branch tokio_assertion_failure https://github.com/koivunej/neon.git

# release build is needed to reproduce
BUILD_TYPE=release CARGO_BUILD_FLAGS="--features=testing,profiling" make -s -j4

# install more dependencies
PYTHON_KEYRING_BACKEND=keyring.backends.null.Keyring ./scripts/pysync

# add external load in another terminal to this
while true; do NEON_BIN=target/release ./scripts/pytest test_runner/regress/test_gc_aggressive.py::test_gc_aggressive; done

Expect to see:

FAILED test_runner/regress/test_gc_aggressive.py::test_gc_aggressive - requests.exceptions.ConnectionError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))

Then you will find the assertion failure in test_output/test_gc_aggressive/repo/pageserver.log. I have also copied the full stacktrace to the next <details>.

RUST_BACKTRACE=full of the assertion failure
thread 'mgmt request worker' panicked at 'assertion failed: cx_core.is_none()', /home/joonas/.cargo/registry/src/github.ghproxy.top-1ecc6299db9ec823/tokio-1.21.1/src/runtime/scheduler/multi_thread/worker.rs:263:21
stack backtrace:
   0:     0x56374720a37d - std::backtrace_rs::backtrace::libunwind::trace::h8e036432725b1c57
                               at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/std/src/../../backtrace/src/backtrace/libunwind.rs:93:5
   1:     0x56374720a37d - std::backtrace_rs::backtrace::trace_unsynchronized::h4f83092254c85869
                               at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/std/src/../../backtrace/src/backtrace/mod.rs:66:5
   2:     0x56374720a37d - std::sys_common::backtrace::_print_fmt::h9728b5e056a3ece3
                               at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/std/src/sys_common/backtrace.rs:66:5
   3:     0x56374720a37d - <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt::h48bb4bd2928827d2
                               at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/std/src/sys_common/backtrace.rs:45:22
   4:     0x563747232e9c - core::fmt::write::h909e69a2c24f44cc
                               at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/core/src/fmt/mod.rs:1196:17
   5:     0x563747202061 - std::io::Write::write_fmt::h7f4b8ab8af89e9ef
                               at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/std/src/io/mod.rs:1654:15
   6:     0x56374720bcf5 - std::sys_common::backtrace::_print::hff4838ebf14a2171
                               at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/std/src/sys_common/backtrace.rs:48:5
   7:     0x56374720bcf5 - std::sys_common::backtrace::print::h2499280374189ad9
                               at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/std/src/sys_common/backtrace.rs:35:9
   8:     0x56374720bcf5 - std::panicking::default_hook::{{closure}}::h8b270fc55eeb284e
                               at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/std/src/panicking.rs:295:22
   9:     0x56374720b969 - std::panicking::default_hook::h3217e229d6e9d13c
                               at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/std/src/panicking.rs:314:9
  10:     0x56374720c3d8 - std::panicking::rust_panic_with_hook::h9acb8048b738d2e0
                               at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/std/src/panicking.rs:698:17
  11:     0x56374720c249 - std::panicking::begin_panic_handler::{{closure}}::h70f3b839526af6dc
                               at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/std/src/panicking.rs:586:13
  12:     0x56374720a834 - std::sys_common::backtrace::__rust_end_short_backtrace::h1ecf2cee857fbe0a
                               at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/std/src/sys_common/backtrace.rs:138:18
  13:     0x56374720bfb9 - rust_begin_unwind
                               at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/std/src/panicking.rs:584:5
  14:     0x563747230b63 - core::panicking::panic_fmt::h9f8393e7fd56d655
                               at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/core/src/panicking.rs:142:14
  15:     0x5637472309ad - core::panicking::panic::h021666fc6a0f7b6b
                               at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/core/src/panicking.rs:48:5
  16:     0x56374710c22b - <tokio::runtime::scheduler::multi_thread::worker::block_in_place::Reset as core::ops::drop::Drop>::drop::{{closure}}::hd65847a1090ca025
  17:     0x5637471062c5 - <tokio::runtime::scheduler::multi_thread::worker::block_in_place::Reset as core::ops::drop::Drop>::drop::h42ae149038909fb7
  18:     0x56374697512e - core::ptr::drop_in_place<tokio::runtime::scheduler::multi_thread::worker::block_in_place::Reset>::h1e6f731fa79d34ba
                               at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/core/src/ptr/mod.rs:486:1
  19:     0x56374697512e - tokio::runtime::scheduler::multi_thread::worker::block_in_place::hda495eb5ef5a1acd
                               at /home/joonas/.cargo/registry/src/github.ghproxy.top-1ecc6299db9ec823/tokio-1.21.1/src/runtime/scheduler/multi_thread/worker.rs:340:5
  20:     0x563746b08340 - tokio::task::blocking::block_in_place::ha97b73b75ce70862
                               at /home/joonas/.cargo/registry/src/github.ghproxy.top-1ecc6299db9ec823/tokio-1.21.1/src/task/blocking.rs:77:9
  21:     0x563746b08340 - pageserver::tenant::Tenant::gc_iteration::{{closure}}::hcc45b24d96148799
                               at /home/joonas/src/neon/neon/pageserver/src/tenant.rs:530:9
  22:     0x563746b08340 - <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll::h308288025478c0c0
                               at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/core/src/future/mod.rs:91:19
  23:     0x56374687c68c - <tracing::instrument::Instrumented<T> as core::future::future::Future>::poll::hda287b8f128780d0
                               at /home/joonas/.cargo/registry/src/github.ghproxy.top-1ecc6299db9ec823/tracing-0.1.37/src/instrument.rs:272:9
  24:     0x563746b0fcfd - pageserver::http::routes::timeline_gc_handler::{{closure}}::h0e56b6cccdfe75f6
                               at /home/joonas/src/neon/neon/pageserver/src/http/routes.rs:849:91
  25:     0x563746b0fcfd - <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll::h4dee783785ea8184
                               at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/core/src/future/mod.rs:91:19
  26:     0x56374678f3b7 - <core::pin::Pin<P> as core::future::future::Future>::poll::h5dbc8583f5dbf765
                               at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/core/src/future/future.rs:124:9
  27:     0x56374678f3b7 - routerify::route::Route<B,E>::process::{{closure}}::h7fffd52673600116
                               at /home/joonas/.cargo/registry/src/github.ghproxy.top-1ecc6299db9ec823/routerify-3.0.0/src/route/mod.rs:105:32
  28:     0x56374678f3b7 - <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll::ha39752ecfad407be
                               at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/core/src/future/mod.rs:91:19
  29:     0x56374678f3b7 - routerify::router::Router<B,E>::process::{{closure}}::hc3d490240cd467ff
                               at /home/joonas/.cargo/registry/src/github.ghproxy.top-1ecc6299db9ec823/routerify-3.0.0/src/router/mod.rs:308:89
  30:     0x56374678f3b7 - <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll::h88afc17f6a7162c2
                               at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/core/src/future/mod.rs:91:19
  31:     0x56374678f3b7 - <routerify::service::request_service::RequestService<B,E> as tower_service::Service<http::request::Request<hyper::body::body::Body>>>::call::{{closure}}::hf419aede28588ee7
                               at /home/joonas/.cargo/registry/src/github.ghproxy.top-1ecc6299db9ec823/routerify-3.0.0/src/service/request_service.rs:56:72
  32:     0x5637467b93a5 - <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll::h2de32919bd847725
                               at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/core/src/future/mod.rs:91:19
  33:     0x5637467d596e - <core::pin::Pin<P> as core::future::future::Future>::poll::h3faa950168332df5
                               at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/core/src/future/future.rs:124:9
  34:     0x5637467d596e - <hyper::proto::h1::dispatch::Server<S,hyper::body::body::Body> as hyper::proto::h1::dispatch::Dispatch>::poll_msg::hd5117f65306c4294
                               at /home/joonas/.cargo/registry/src/github.ghproxy.top-1ecc6299db9ec823/hyper-0.14.20/src/proto/h1/dispatch.rs:491:35
  35:     0x5637467d596e - hyper::proto::h1::dispatch::Dispatcher<D,Bs,I,T>::poll_write::hc55c2ea65eaff573
                               at /home/joonas/.cargo/registry/src/github.ghproxy.top-1ecc6299db9ec823/hyper-0.14.20/src/proto/h1/dispatch.rs:297:43
  36:     0x5637467d596e - hyper::proto::h1::dispatch::Dispatcher<D,Bs,I,T>::poll_loop::h214e07f7181a2707
                               at /home/joonas/.cargo/registry/src/github.ghproxy.top-1ecc6299db9ec823/hyper-0.14.20/src/proto/h1/dispatch.rs:161:21
  37:     0x5637467d30fd - hyper::proto::h1::dispatch::Dispatcher<D,Bs,I,T>::poll_inner::h2b3d24b8f8211935
                               at /home/joonas/.cargo/registry/src/github.ghproxy.top-1ecc6299db9ec823/hyper-0.14.20/src/proto/h1/dispatch.rs:137:16
  38:     0x5637467d30fd - hyper::proto::h1::dispatch::Dispatcher<D,Bs,I,T>::poll_catch::hfead020b3bd85cd6
                               at /home/joonas/.cargo/registry/src/github.ghproxy.top-1ecc6299db9ec823/hyper-0.14.20/src/proto/h1/dispatch.rs:120:28
  39:     0x5637466f9f52 - <hyper::proto::h1::dispatch::Dispatcher<D,Bs,I,T> as core::future::future::Future>::poll::hb9d39bd98e716b09
                               at /home/joonas/.cargo/registry/src/github.ghproxy.top-1ecc6299db9ec823/hyper-0.14.20/src/proto/h1/dispatch.rs:424:9
  40:     0x5637466f9f52 - <hyper::server::conn::ProtoServer<T,B,S,E> as core::future::future::Future>::poll::h7665d21f4b883402
                               at /home/joonas/.cargo/registry/src/github.ghproxy.top-1ecc6299db9ec823/hyper-0.14.20/src/server/conn.rs:952:47
  41:     0x5637466f9f52 - <hyper::server::conn::upgrades::UpgradeableConnection<I,S,E> as core::future::future::Future>::poll::hb96f5473d0574cb8
                               at /home/joonas/.cargo/registry/src/github.ghproxy.top-1ecc6299db9ec823/hyper-0.14.20/src/server/conn.rs:1012:30
  42:     0x56374671e6bf - <hyper::common::drain::Watching<F,FN> as core::future::future::Future>::poll::hf0c8ec2a7a8ed8b0
  43:     0x56374671e6bf - <hyper::server::server::new_svc::NewSvcTask<I,N,S,E,W> as core::future::future::Future>::poll::h846866b9a0929fda
                               at /home/joonas/.cargo/registry/src/github.ghproxy.top-1ecc6299db9ec823/hyper-0.14.20/src/server/server.rs:728:36
  44:     0x5637467f65e7 - tokio::runtime::task::core::CoreStage<T>::poll::{{closure}}::h9a58eefb1d854ebe
                               at /home/joonas/.cargo/registry/src/github.ghproxy.top-1ecc6299db9ec823/tokio-1.21.1/src/runtime/task/core.rs:184:17
  45:     0x5637467f65e7 - tokio::loom::std::unsafe_cell::UnsafeCell<T>::with_mut::hbd0e5f206f1f3f6f
                               at /home/joonas/.cargo/registry/src/github.ghproxy.top-1ecc6299db9ec823/tokio-1.21.1/src/loom/std/unsafe_cell.rs:14:9
  46:     0x5637467f65e7 - tokio::runtime::task::core::CoreStage<T>::poll::hbee48de80c4fcccd
                               at /home/joonas/.cargo/registry/src/github.ghproxy.top-1ecc6299db9ec823/tokio-1.21.1/src/runtime/task/core.rs:174:13
  47:     0x56374685c61a - tokio::runtime::task::harness::poll_future::{{closure}}::h7ca64421cdeddcb2
                               at /home/joonas/.cargo/registry/src/github.ghproxy.top-1ecc6299db9ec823/tokio-1.21.1/src/runtime/task/harness.rs:480:19
  48:     0x56374685c61a - <core::panic::unwind_safe::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once::h2de3a15ff26ba160
                               at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/core/src/panic/unwind_safe.rs:271:9
  49:     0x56374685c61a - std::panicking::try::do_call::hf6d7a880e62abda6
                               at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/std/src/panicking.rs:492:40
  50:     0x56374685c61a - std::panicking::try::h531c1d3ec5cbe2b2
                               at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/std/src/panicking.rs:456:19
  51:     0x56374685c61a - std::panic::catch_unwind::h4f0af80b22a9de64
                               at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/std/src/panic.rs:137:14
  52:     0x56374685c61a - tokio::runtime::task::harness::poll_future::h57ec7dda84531f03
                               at /home/joonas/.cargo/registry/src/github.ghproxy.top-1ecc6299db9ec823/tokio-1.21.1/src/runtime/task/harness.rs:468:18
  53:     0x56374685c61a - tokio::runtime::task::harness::Harness<T,S>::poll_inner::heca3dd74238bdd7e
                               at /home/joonas/.cargo/registry/src/github.ghproxy.top-1ecc6299db9ec823/tokio-1.21.1/src/runtime/task/harness.rs:104:27
  54:     0x56374685c61a - tokio::runtime::task::harness::Harness<T,S>::poll::he0f319957dba656d
                               at /home/joonas/.cargo/registry/src/github.ghproxy.top-1ecc6299db9ec823/tokio-1.21.1/src/runtime/task/harness.rs:57:15
  55:     0x5637470e35c5 - std::thread::local::LocalKey<T>::with::h38aaa913b8a48d65
  56:     0x563747107563 - tokio::runtime::scheduler::multi_thread::worker::Context::run_task::hf28064e32e379826
  57:     0x563747106a80 - tokio::runtime::scheduler::multi_thread::worker::Context::run::hec211607b213b37b
  58:     0x56374710a7b7 - tokio::macros::scoped_tls::ScopedKey<T>::set::hd7166d6799738ff0
  59:     0x5637471064a9 - tokio::runtime::scheduler::multi_thread::worker::run::h958f4678849dd1fe
  60:     0x5637470f575c - <tokio::runtime::blocking::task::BlockingTask<T> as core::future::future::Future>::poll::h0ab71826e7387519
  61:     0x5637470da7e9 - tokio::runtime::task::harness::Harness<T,S>::poll::h091e55b483c30575
  62:     0x5637470f4f5a - tokio::runtime::blocking::pool::Inner::run::h3a91a3d2536a1c92
  63:     0x5637470e6df2 - std::sys_common::backtrace::__rust_begin_short_backtrace::h6a13e50bb80c5a9b
  64:     0x5637470e751f - core::ops::function::FnOnce::call_once{{vtable.shim}}::h81568063c1016e71
  65:     0x563747212053 - <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once::h191d5c5ea3edb31d
                               at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/alloc/src/boxed.rs:1872:9
  66:     0x563747212053 - <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once::h42ef7cb2ae640a31
                               at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/alloc/src/boxed.rs:1872:9
  67:     0x563747212053 - std::sys::unix::thread::Thread::new::thread_start::he47f7169665dab60
                               at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/std/src/sys/unix/thread.rs:108:17
  68:     0x7f3b56aacb43 - start_thread
                               at ./nptl/./nptl/pthread_create.c:442:8
  69:     0x7f3b56b3ea00 - clone3
                               at ./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
  70:                0x0 - <unknown>

In this branch, I've sprinkled a lot of block_in_place around the blocking parts after running into a deadlock caused by the unstealable LIFO slot when there was blocking inside the runtime threads. It is unlikely that I've caught every place that blocks within an async context.

If I ended up misusing block_in_place and block_on, I wish the assertion had a clear message about the misuse. However, since it only triggers under external load (and while being niced to -20), I suspect it is a real tokio issue.
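For context, the pattern in question looks roughly like this — a minimal sketch with illustrative names only (slow_sync_work and gc_iteration_like are placeholders, not the actual pageserver code), assuming the multi-threaded runtime:

use std::time::Duration;

// Hypothetical stand-in for the synchronous, potentially slow work that got
// wrapped; the real code does filesystem and GC work instead.
fn slow_sync_work() {
    std::thread::sleep(Duration::from_millis(10));
}

async fn gc_iteration_like() {
    // ... async work ...
    tokio::task::block_in_place(|| {
        // Synchronous section that would otherwise occupy the worker thread
        // and leave its LIFO slot unstealable.
        slow_sync_work();
    });
    // ... more async work ...
}

#[tokio::main(flavor = "multi_thread")]
async fn main() {
    tokio::spawn(gc_iteration_like()).await.unwrap();
}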

koivunej added the A-tokio (Area: The main tokio crate) and C-bug (Category: This is a bug.) labels on Nov 29, 2022
@koivunej
Contributor Author

RUST_BACKTRACE=full with tokio 1.22.0
thread 'mgmt request worker' panicked at 'assertion failed: cx_core.is_none()', /home/joonas/.cargo/registry/src/github.ghproxy.top-1ecc6299db9ec823/tokio-1.22.0/src/runtime/scheduler/multi_thread/worker.rs:265:21
stack backtrace:
   0:     0x55566c170cfd - std::backtrace_rs::backtrace::libunwind::trace::h8e036432725b1c57
                               at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/std/src/../../backtrace/src/backtrace/libunwind.rs:93:5
   1:     0x55566c170cfd - std::backtrace_rs::backtrace::trace_unsynchronized::h4f83092254c85869
                               at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/std/src/../../backtrace/src/backtrace/mod.rs:66:5
   2:     0x55566c170cfd - std::sys_common::backtrace::_print_fmt::h9728b5e056a3ece3
                               at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/std/src/sys_common/backtrace.rs:66:5
   3:     0x55566c170cfd - <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt::h48bb4bd2928827d2
                               at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/std/src/sys_common/backtrace.rs:45:22
   4:     0x55566c19987c - core::fmt::write::h909e69a2c24f44cc
                               at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/core/src/fmt/mod.rs:1196:17
   5:     0x55566c1689d1 - std::io::Write::write_fmt::h7f4b8ab8af89e9ef
                               at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/std/src/io/mod.rs:1654:15
   6:     0x55566c172675 - std::sys_common::backtrace::_print::hff4838ebf14a2171
                               at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/std/src/sys_common/backtrace.rs:48:5
   7:     0x55566c172675 - std::sys_common::backtrace::print::h2499280374189ad9
                               at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/std/src/sys_common/backtrace.rs:35:9
   8:     0x55566c172675 - std::panicking::default_hook::{{closure}}::h8b270fc55eeb284e
                               at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/std/src/panicking.rs:295:22
   9:     0x55566c1722e9 - std::panicking::default_hook::h3217e229d6e9d13c
                               at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/std/src/panicking.rs:314:9
  10:     0x55566c172d58 - std::panicking::rust_panic_with_hook::h9acb8048b738d2e0
                               at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/std/src/panicking.rs:698:17
  11:     0x55566c172bc9 - std::panicking::begin_panic_handler::{{closure}}::h70f3b839526af6dc
                               at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/std/src/panicking.rs:586:13
  12:     0x55566c1711b4 - std::sys_common::backtrace::__rust_end_short_backtrace::h1ecf2cee857fbe0a
                               at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/std/src/sys_common/backtrace.rs:138:18
  13:     0x55566c172939 - rust_begin_unwind
                               at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/std/src/panicking.rs:584:5
  14:     0x55566c197543 - core::panicking::panic_fmt::h9f8393e7fd56d655
                               at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/core/src/panicking.rs:142:14
  15:     0x55566c19738d - core::panicking::panic::h021666fc6a0f7b6b
                               at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/core/src/panicking.rs:48:5
  16:     0x55566c070c2f - <tokio::runtime::scheduler::multi_thread::worker::block_in_place::Reset as core::ops::drop::Drop>::drop::{{closure}}::h780bb364a103247b
  17:     0x55566c04b9d5 - <tokio::runtime::scheduler::multi_thread::worker::block_in_place::Reset as core::ops::drop::Drop>::drop::hb1e875619dadefa6
  18:     0x55566b7e7681 - core::ptr::drop_in_place<tokio::runtime::scheduler::multi_thread::worker::block_in_place::Reset>::hacd502eeb7e9dd39
                               at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/core/src/ptr/mod.rs:486:1
  19:     0x55566b7e7681 - tokio::runtime::scheduler::multi_thread::worker::block_in_place::h99940af26e422d93
                               at /home/joonas/.cargo/registry/src/github.ghproxy.top-1ecc6299db9ec823/tokio-1.22.0/src/runtime/scheduler/multi_thread/worker.rs:357:5
  20:     0x55566b9f4e8d - tokio::task::blocking::block_in_place::h568815439690e29e
                               at /home/joonas/.cargo/registry/src/github.ghproxy.top-1ecc6299db9ec823/tokio-1.22.0/src/task/blocking.rs:78:9
  21:     0x55566b9f4e8d - pageserver::tenant::Tenant::gc_iteration::{{closure}}::hb2eb232bd6a6b106
                               at /home/joonas/src/neon/neon/pageserver/src/tenant.rs:530:9
  22:     0x55566b9f4e8d - <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll::h585b98797c18043e
                               at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/core/src/future/mod.rs:91:19
  23:     0x55566b7c2c9c - <tracing::instrument::Instrumented<T> as core::future::future::Future>::poll::h21edc08432595c21
                               at /home/joonas/.cargo/registry/src/github.ghproxy.top-1ecc6299db9ec823/tracing-0.1.37/src/instrument.rs:272:9
  24:     0x55566b9e54e8 - pageserver::http::routes::timeline_gc_handler::{{closure}}::h93932cd55bf6b3f2
                               at /home/joonas/src/neon/neon/pageserver/src/http/routes.rs:849:91
  25:     0x55566b9e54e8 - <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll::h26b7bf5b7ee4e2f8
                               at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/core/src/future/mod.rs:91:19
  26:     0x55566b6939a7 - <core::pin::Pin<P> as core::future::future::Future>::poll::h37e3cb1d47d83990
                               at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/core/src/future/future.rs:124:9
  27:     0x55566b6939a7 - routerify::route::Route<B,E>::process::{{closure}}::hf8cae15a3588e35e
                               at /home/joonas/.cargo/registry/src/github.ghproxy.top-1ecc6299db9ec823/routerify-3.0.0/src/route/mod.rs:105:32
  28:     0x55566b6939a7 - <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll::h777f77ec3363521b
                               at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/core/src/future/mod.rs:91:19
  29:     0x55566b6939a7 - routerify::router::Router<B,E>::process::{{closure}}::hf31eb70c89e77947
                               at /home/joonas/.cargo/registry/src/github.ghproxy.top-1ecc6299db9ec823/routerify-3.0.0/src/router/mod.rs:308:89
  30:     0x55566b6939a7 - <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll::he859109219d1c6e9
                               at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/core/src/future/mod.rs:91:19
  31:     0x55566b6939a7 - <routerify::service::request_service::RequestService<B,E> as tower_service::Service<http::request::Request<hyper::body::body::Body>>>::call::{{closure}}::h30cef40d969885d8
                               at /home/joonas/.cargo/registry/src/github.ghproxy.top-1ecc6299db9ec823/routerify-3.0.0/src/service/request_service.rs:56:72
  32:     0x55566b6d0625 - <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll::hf7e345e7160044bb
                               at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/core/src/future/mod.rs:91:19
  33:     0x55566b6e447e - <core::pin::Pin<P> as core::future::future::Future>::poll::hc75ef90ec2156114
                               at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/core/src/future/future.rs:124:9
  34:     0x55566b6e447e - <hyper::proto::h1::dispatch::Server<S,hyper::body::body::Body> as hyper::proto::h1::dispatch::Dispatch>::poll_msg::hcbb33d72b117a0b2
                               at /home/joonas/.cargo/registry/src/github.ghproxy.top-1ecc6299db9ec823/hyper-0.14.20/src/proto/h1/dispatch.rs:491:35
  35:     0x55566b6e447e - hyper::proto::h1::dispatch::Dispatcher<D,Bs,I,T>::poll_write::h03958203cc8a40a5
                               at /home/joonas/.cargo/registry/src/github.ghproxy.top-1ecc6299db9ec823/hyper-0.14.20/src/proto/h1/dispatch.rs:297:43
  36:     0x55566b6e447e - hyper::proto::h1::dispatch::Dispatcher<D,Bs,I,T>::poll_loop::h0833e4c3bb93bff9
                               at /home/joonas/.cargo/registry/src/github.ghproxy.top-1ecc6299db9ec823/hyper-0.14.20/src/proto/h1/dispatch.rs:161:21
  37:     0x55566b6e1c0d - hyper::proto::h1::dispatch::Dispatcher<D,Bs,I,T>::poll_inner::he5ae63ae6fbdbe2a
                               at /home/joonas/.cargo/registry/src/github.ghproxy.top-1ecc6299db9ec823/hyper-0.14.20/src/proto/h1/dispatch.rs:137:16
  38:     0x55566b6e1c0d - hyper::proto::h1::dispatch::Dispatcher<D,Bs,I,T>::poll_catch::hccdda79bb029e058
                               at /home/joonas/.cargo/registry/src/github.ghproxy.top-1ecc6299db9ec823/hyper-0.14.20/src/proto/h1/dispatch.rs:120:28
  39:     0x55566b5fdc12 - <hyper::proto::h1::dispatch::Dispatcher<D,Bs,I,T> as core::future::future::Future>::poll::h932231f5365edb01
                               at /home/joonas/.cargo/registry/src/github.ghproxy.top-1ecc6299db9ec823/hyper-0.14.20/src/proto/h1/dispatch.rs:424:9
  40:     0x55566b5fdc12 - <hyper::server::conn::ProtoServer<T,B,S,E> as core::future::future::Future>::poll::h6f320f37c2d19f52
                               at /home/joonas/.cargo/registry/src/github.ghproxy.top-1ecc6299db9ec823/hyper-0.14.20/src/server/conn.rs:952:47
  41:     0x55566b5fdc12 - <hyper::server::conn::upgrades::UpgradeableConnection<I,S,E> as core::future::future::Future>::poll::h76a89e0c6fcef9a6
                               at /home/joonas/.cargo/registry/src/github.ghproxy.top-1ecc6299db9ec823/hyper-0.14.20/src/server/conn.rs:1012:30
  42:     0x55566b61e54f - <hyper::common::drain::Watching<F,FN> as core::future::future::Future>::poll::hb3549caad34c092f
  43:     0x55566b61e54f - <hyper::server::server::new_svc::NewSvcTask<I,N,S,E,W> as core::future::future::Future>::poll::h71bbc06b1d3c6dcb
                               at /home/joonas/.cargo/registry/src/github.ghproxy.top-1ecc6299db9ec823/hyper-0.14.20/src/server/server.rs:728:36
  44:     0x55566b6867f6 - tokio::runtime::task::core::Core<T,S>::poll::{{closure}}::had33aadfb8a7be1b
                               at /home/joonas/.cargo/registry/src/github.ghproxy.top-1ecc6299db9ec823/tokio-1.22.0/src/runtime/task/core.rs:208:17
  45:     0x55566b6867f6 - tokio::loom::std::unsafe_cell::UnsafeCell<T>::with_mut::h68313c3a57d767ff
                               at /home/joonas/.cargo/registry/src/github.ghproxy.top-1ecc6299db9ec823/tokio-1.22.0/src/loom/std/unsafe_cell.rs:14:9
  46:     0x55566b6867f6 - tokio::runtime::task::core::Core<T,S>::poll::h585708207cd04ce0
                               at /home/joonas/.cargo/registry/src/github.ghproxy.top-1ecc6299db9ec823/tokio-1.22.0/src/runtime/task/core.rs:197:13
  47:     0x55566b75e3bc - tokio::runtime::task::harness::poll_future::{{closure}}::h0f86aebe72adbf02
                               at /home/joonas/.cargo/registry/src/github.ghproxy.top-1ecc6299db9ec823/tokio-1.22.0/src/runtime/task/harness.rs:483:19
  48:     0x55566b75e3bc - <core::panic::unwind_safe::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once::h57362d0ed921d31a
                               at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/core/src/panic/unwind_safe.rs:271:9
  49:     0x55566b75e3bc - std::panicking::try::do_call::h59363edc7a6fa72f
                               at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/std/src/panicking.rs:492:40
  50:     0x55566b75e3bc - std::panicking::try::h0aa7d8957e0246ab
                               at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/std/src/panicking.rs:456:19
  51:     0x55566b75e3bc - std::panic::catch_unwind::h69488bb3e9c40098
                               at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/std/src/panic.rs:137:14
  52:     0x55566b75e3bc - tokio::runtime::task::harness::poll_future::hfc1f7155fdd8e61f
                               at /home/joonas/.cargo/registry/src/github.ghproxy.top-1ecc6299db9ec823/tokio-1.22.0/src/runtime/task/harness.rs:471:18
  53:     0x55566b75e3bc - tokio::runtime::task::harness::Harness<T,S>::poll_inner::h7ad5020e41d84ed0
                               at /home/joonas/.cargo/registry/src/github.ghproxy.top-1ecc6299db9ec823/tokio-1.22.0/src/runtime/task/harness.rs:107:27
  54:     0x55566b75e3bc - tokio::runtime::task::harness::Harness<T,S>::poll::h18715220ee6a3a95
                               at /home/joonas/.cargo/registry/src/github.ghproxy.top-1ecc6299db9ec823/tokio-1.22.0/src/runtime/task/harness.rs:61:15
  55:     0x55566c04c7d3 - tokio::runtime::scheduler::multi_thread::worker::Context::run_task::h6b54d484625cb2c7
  56:     0x55566c04c16f - tokio::runtime::scheduler::multi_thread::worker::Context::run::hbe53e8933f20a75e
  57:     0x55566c070587 - tokio::macros::scoped_tls::ScopedKey<T>::set::h987ed7e93ee03797
  58:     0x55566c04bbe1 - tokio::runtime::scheduler::multi_thread::worker::run::h16d9391f353449a4
  59:     0x55566b9791db - tokio::runtime::scheduler::multi_thread::worker::block_in_place::{{closure}}::{{closure}}::h595166236bea784b
                               at /home/joonas/.cargo/registry/src/github.ghproxy.top-1ecc6299db9ec823/tokio-1.22.0/src/runtime/scheduler/multi_thread/worker.rs:343:41
  60:     0x55566b9791db - <tokio::runtime::blocking::task::BlockingTask<T> as core::future::future::Future>::poll::hd703d0f9b339ae0f
                               at /home/joonas/.cargo/registry/src/github.ghproxy.top-1ecc6299db9ec823/tokio-1.22.0/src/runtime/blocking/task.rs:42:21
  61:     0x55566b9791db - tokio::runtime::task::core::Core<T,S>::poll::{{closure}}::h07f1e5d332b8ee48
                               at /home/joonas/.cargo/registry/src/github.ghproxy.top-1ecc6299db9ec823/tokio-1.22.0/src/runtime/task/core.rs:208:17
  62:     0x55566b9791db - tokio::loom::std::unsafe_cell::UnsafeCell<T>::with_mut::h3462463ef6d987c2
                               at /home/joonas/.cargo/registry/src/github.ghproxy.top-1ecc6299db9ec823/tokio-1.22.0/src/loom/std/unsafe_cell.rs:14:9
  63:     0x55566ba9659f - tokio::runtime::task::core::Core<T,S>::poll::h4e3f09706b7975eb
                               at /home/joonas/.cargo/registry/src/github.ghproxy.top-1ecc6299db9ec823/tokio-1.22.0/src/runtime/task/core.rs:197:13
  64:     0x55566b91f2e4 - tokio::runtime::task::harness::poll_future::{{closure}}::h35c612330e09f6da
                               at /home/joonas/.cargo/registry/src/github.ghproxy.top-1ecc6299db9ec823/tokio-1.22.0/src/runtime/task/harness.rs:483:19
  65:     0x55566b91f2e4 - <core::panic::unwind_safe::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once::h998d7023d74a0259
                               at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/core/src/panic/unwind_safe.rs:271:9
  66:     0x55566b91f2e4 - std::panicking::try::do_call::hc24d9acc2e2059db
                               at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/std/src/panicking.rs:492:40
  67:     0x55566b91f2e4 - std::panicking::try::h2eb20d9b5940731c
                               at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/std/src/panicking.rs:456:19
  68:     0x55566b91f2e4 - std::panic::catch_unwind::hc564cc3996afde48
                               at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/std/src/panic.rs:137:14
  69:     0x55566b91f2e4 - tokio::runtime::task::harness::poll_future::hf98c61c1fda4e0a6
                               at /home/joonas/.cargo/registry/src/github.ghproxy.top-1ecc6299db9ec823/tokio-1.22.0/src/runtime/task/harness.rs:471:18
  70:     0x55566b91f2e4 - tokio::runtime::task::harness::Harness<T,S>::poll_inner::h060668b564c3a248
                               at /home/joonas/.cargo/registry/src/github.ghproxy.top-1ecc6299db9ec823/tokio-1.22.0/src/runtime/task/harness.rs:107:27
  71:     0x55566b91f2e4 - tokio::runtime::task::harness::Harness<T,S>::poll::hee4e8b8ee5bd2a6a
                               at /home/joonas/.cargo/registry/src/github.ghproxy.top-1ecc6299db9ec823/tokio-1.22.0/src/runtime/task/harness.rs:61:15
  72:     0x55566c050893 - std::sys_common::backtrace::__rust_begin_short_backtrace::hf85e7fce2451be54
  73:     0x55566c05632f - core::ops::function::FnOnce::call_once{{vtable.shim}}::hd229d8dfd0a88555
  74:     0x55566c178a33 - <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once::h191d5c5ea3edb31d
                               at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/alloc/src/boxed.rs:1872:9
  75:     0x55566c178a33 - <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once::h42ef7cb2ae640a31
                               at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/alloc/src/boxed.rs:1872:9
  76:     0x55566c178a33 - std::sys::unix::thread::Thread::new::thread_start::he47f7169665dab60
                               at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/std/src/sys/unix/thread.rs:108:17
  77:     0x7fb551b19b43 - start_thread
                               at ./nptl/./nptl/pthread_create.c:442:8
  78:     0x7fb551baba00 - clone3
                               at ./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
  79:                0x0 - <unknown>

@Veetaha
Contributor

Veetaha commented Dec 7, 2022

We have the same problem in our private repository. It is very hard to debug because it is nested deep inside really heavyweight logic (tens of minutes to re-run). The main problem is that this panic is highly flaky and hard to reproduce.

@koivunej
Contributor Author

koivunej commented Dec 7, 2022

I have not tried it yet but am hoping to this week: catch the bug happening in an rr recording, then work back to a reproducer.

I neglected to mention in the issue that we have multiple runtimes in our codebase as well (because of the blocking in async contexts).

@koivunej
Contributor Author

koivunej commented Dec 8, 2022

With rr record --chaos --num-cores=2 this reproduces instantly. Having a recording doesn't make it easy to track down and understand, though. I will update this with any insights I gain.

@Darksonn
Contributor

Darksonn commented Dec 8, 2022

If I ended up misusing the block_in_place and block_on then I wish the assertion would have a clear message about the misuse. However since it only triggers under external load (and while being nice -20), I suspect it is a real tokio issue.

The assertion you triggered certainly shouldn't happen under normal circumstances.

@koivunej
Contributor Author

koivunej commented Dec 8, 2022

Found the single-file reproducer/MVCE. When run with cargo run it hits the assertion error on the first attempt; with incremental compilations added, it seems to become more difficult.

Please find it at https://github.com/koivunej/tokio-nested-block_in_place.

Expected output:

thread 'tokio-runtime-worker' panicked at 'assertion failed: cx_core.is_none()', /home/joonas/.cargo/registry/src/github.ghproxy.top-1ecc6299db9ec823/tokio-1.23.0/src/runtime/scheduler/multi_thread/worker.rs:265:21
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

hit assertion as expected: "assertion failed: cx_core.is_none()" in 1th attempt

Checked tokio versions:

  • 1.21.1
  • 1.22.0 (in Cargo.toml)
  • 1.23.0

With an AMD Ryzen 9 3900XT it reproduces regardless of rr's zen_workaround. It would be interesting to hear whether it happens on other CPUs.

rr record does not seem to catch it but rr record --chaos and rr record --num-cores=2 do.

Confirmed on:

  • apple m2
  • Intel(R) Core(TM) i7-8650U
  • zen2 and zen3 ryzens

@koivunej
Contributor Author

koivunej commented Dec 8, 2022

Maybe it can still be slimmed down, but given how difficult this was to catch originally, I don't understand why such a small program is enough:

#[tokio::main]
async fn main() {
    use tokio::{
        runtime::Handle,
        task::{block_in_place, spawn},
    };

    let res = spawn(async move {
        block_in_place(|| {
            Handle::current().block_on(async {
                block_in_place(|| {});
            });
        });
    })
    .await;

    let e = match res {
        Ok(()) => unreachable!("no panic?"),
        Err(e) => e,
    };

    let panic = e.into_panic();

    let s = *panic.downcast_ref::<&'static str>().unwrap();
    assert_eq!("assertion failed: cx_core.is_none()", s);
}

@koivunej
Contributor Author

koivunej commented Dec 8, 2022

OK, it seems this is simply beyond what was fixed in #2645. I had assumed that nested block_in_place + block_on was supported.

@Veetaha
Contributor

Veetaha commented Dec 8, 2022

I'd also expect block_in_place(|| block_on(async { block_in_place(|| { ... }) })) to be supported at any level of nesting. We rely on that heavily to run async code in regular Drop implementations.
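A minimal sketch of that Drop pattern (illustrative only, not the actual code in question; it assumes the value is dropped on a worker thread of a multi-threaded runtime):

use tokio::runtime::Handle;

struct Resource;

impl Drop for Resource {
    fn drop(&mut self) {
        // block_in_place lets us call the blocking Handle::block_on from a
        // worker thread without stalling the rest of the runtime.
        tokio::task::block_in_place(|| {
            Handle::current().block_on(async {
                // hypothetical async cleanup, e.g. flushing state somewhere
                tokio::time::sleep(std::time::Duration::from_millis(1)).await;
            });
        });
    }
}

#[tokio::main(flavor = "multi_thread")]
async fn main() {
    tokio::spawn(async {
        let _r = Resource; // dropped at the end of the task, running the cleanup
    })
    .await
    .unwrap();
}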

@Darksonn
Contributor

Darksonn commented Dec 8, 2022

Yes, it should work at any level of nesting as long as the block_on is the one on Handle.

@koivunej
Contributor Author

koivunej commented Dec 9, 2022

Yes, it should work at any level of nesting as long as the block_on is the one on Handle.

Looking at the existing tests, I was thinking maybe the expectation is to use a separate or new current-thread runtime or a futures executor to drive the more deeply nested block_on:

futures::executor::block_on(async move {
    use tokio_stream::StreamExt;
    assert_eq!(rx.fold(0, |n, _| n + 1).await, 200);
})
but that would not work for surprising code paths which end up doing tokio::task::block_in_place.

So I don't really understand the response here.

Still, I'm no longer pursuing this approach; I've always thought block_in_place is a feature of "last resort".

@Darksonn
Contributor

Darksonn commented Dec 9, 2022

Using block_in_place doesn't work inside the current-thread runtime, which is why I mentioned Handle.
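Spelled out as a runnable sketch (an illustration of the constraint, not code from this thread): block_in_place is only allowed on the multi-threaded runtime, and the nested block_on should be Handle::block_on so it re-enters the same runtime rather than a fresh one; on a current_thread runtime the same block_in_place call panics instead.

use tokio::runtime::{Builder, Handle};

fn main() {
    // Multi-threaded runtime: the supported combination.
    let rt = Builder::new_multi_thread()
        .worker_threads(2)
        .enable_all()
        .build()
        .unwrap();

    rt.block_on(async {
        tokio::spawn(async {
            tokio::task::block_in_place(|| {
                // Re-enter the surrounding runtime through its Handle.
                Handle::current().block_on(async {
                    // async work driven from inside the blocking section
                });
            });
        })
        .await
        .unwrap();
    });

    // A current_thread runtime cannot give up its only worker, so
    // block_in_place would panic there instead of blocking in place.
}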

@jeromegn

jeromegn commented Jun 29, 2023

I'm facing a similar issue, but I can't quite figure out what's triggering it. I thought it was a combination of spawn_blocking and block_in_place within a parent block_in_place, but I can't reproduce it in any test, as hard as I try.

Looking at the backtrace, it seems this happens in the Drop impl of the Reset struct. Following the logic, there's one place where cx.core() would be None and tokio still builds that Reset struct, expecting it to Drop and get a Some(core) from cx.core.borrow_mut().

I guess this is evident, but I figured I'd write it out: it seems like some logic is putting the Core back on the Context when it shouldn't be there.

Looking into the code involved, there's at least this place where had_entered == true, but tokio couldn't take the Core:

// Get the worker core. If none is set, then blocking is fine!
let core = match cx.core.borrow_mut().take() {
    Some(core) => core,
    None => return Ok(()),
};

Since had_entered == true, the logic after the setup will create a Reset instance, and if the Core was not put back before it drops, it will trigger this assertion failure.

Maybe that can't really happen, I'm not sure.

Edit: I might've gotten confused. It is expected that cx.core is None; the issue is when there's a Core there already.
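For readers following along, the counterpart to the snippet above is Reset's Drop impl, roughly the following (a paraphrase of tokio's worker.rs around the assert, not an exact quote; the field names are approximate, so check the linked source for the real code):

impl Drop for Reset {
    fn drop(&mut self) {
        with_current(|maybe_cx| {
            if let Some(cx) = maybe_cx {
                // Take the core back from the worker...
                let core = cx.worker.core.take();
                // ...and hand it to the context, which is expected to be empty.
                let mut cx_core = cx.core.borrow_mut();
                assert!(cx_core.is_none());
                *cx_core = core;
                // (task budget reset omitted)
            }
        });
    }
}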

@jeromegn

jeromegn commented Jun 29, 2023

I added this log line before the assert:

println!(
    "Reset dropped worker core is_some? {}, context core is_some? {}",
    core.is_some(),
    cx_core.is_some()
);

When the assert failed, I got:

Reset dropped worker core is_some? false, context core is_some? false
Reset dropped worker core is_some? false, context core is_some? false

<-- here I'm calling code that might end calling some nested block_on + spawn_blocking + block_in_place -->

Reset dropped worker core is_some? true, context core is_some? false
Reset dropped worker core is_some? true, context core is_some? false
Reset dropped worker core is_some? true, context core is_some? false
Reset dropped worker core is_some? true, context core is_some? false
Reset dropped worker core is_some? true, context core is_some? false
Reset dropped worker core is_some? true, context core is_some? false

<-- the aforementioned code returned a value -->

Reset dropped worker core is_some? false, context core is_some? true
thread 'tokio-runtime-worker' panicked at 'assertion failed: cx_core.is_none()', /usr/local/cargo/git/checkouts/tokio-4a5e90f3eb383b08/05746ef/tokio/src/runtime/scheduler/multi_thread/worker.rs:339:21

I'm assuming the problem is that the core is put back when it's not expected to be. It sounds like some blocking task might've run on a thread that was no longer marked as blocking.

Edit: I confirmed that the Reset dropped ... log lines after my logic calling the oddly nested block_on + spawn_blocking + block_in_place all happen on the same thread, including the final log before the panic.

@jeromegn

jeromegn commented Jun 29, 2023

Some more debugging: I added the thread ID and name, plus a thread-local AtomicU16 acting as a kind of ID for the Reset struct.

(ThreadId(14) Some("tokio-runtime-worker"), ResetId(22)) block_in_place Reset has been instantiated
<-- problematic function call start -->
(ThreadId(14) Some("tokio-runtime-worker")) id_valid called
(ThreadId(14) Some("tokio-runtime-worker")) block_in_place is on thread pool
(ThreadId(14) Some("tokio-runtime-worker")) block_in_place no context core
(ThreadId(14) Some("tokio-runtime-worker"), ResetId(23)) block_in_place Reset has been instantiated
(ThreadId(14) Some("tokio-runtime-worker")) id_valid inside block_in_place
(ThreadId(14) Some("tokio-runtime-worker"), ResetId(23)) Reset dropped worker core is_some? true, context core is_some? false
(ThreadId(14) Some("tokio-runtime-worker")) id_valid called
(ThreadId(14) Some("tokio-runtime-worker")) block_in_place is on thread pool
(ThreadId(14) Some("tokio-runtime-worker"), ResetId(24)) block_in_place Reset has been instantiated
(ThreadId(14) Some("tokio-runtime-worker")) id_valid inside block_in_place
(ThreadId(14) Some("tokio-runtime-worker"), ResetId(24)) Reset dropped worker core is_some? true, context core is_some? false
(ThreadId(14) Some("tokio-runtime-worker")) id_valid called
(ThreadId(14) Some("tokio-runtime-worker")) block_in_place is on thread pool
(ThreadId(14) Some("tokio-runtime-worker"), ResetId(25)) block_in_place Reset has been instantiated
(ThreadId(14) Some("tokio-runtime-worker")) id_valid inside block_in_place
(ThreadId(14) Some("tokio-runtime-worker"), ResetId(25)) Reset dropped worker core is_some? true, context core is_some? false
(ThreadId(14) Some("tokio-runtime-worker")) id_valid called
(ThreadId(14) Some("tokio-runtime-worker")) block_in_place is on thread pool
(ThreadId(14) Some("tokio-runtime-worker"), ResetId(26)) block_in_place Reset has been instantiated
(ThreadId(14) Some("tokio-runtime-worker")) id_valid inside block_in_place
(ThreadId(14) Some("tokio-runtime-worker"), ResetId(26)) Reset dropped worker core is_some? true, context core is_some? false
(ThreadId(14) Some("tokio-runtime-worker")) id_valid called
(ThreadId(14) Some("tokio-runtime-worker")) block_in_place is on thread pool
(ThreadId(14) Some("tokio-runtime-worker"), ResetId(27)) block_in_place Reset has been instantiated
(ThreadId(14) Some("tokio-runtime-worker")) id_valid inside block_in_place
(ThreadId(14) Some("tokio-runtime-worker"), ResetId(27)) Reset dropped worker core is_some? true, context core is_some? false
(ThreadId(14) Some("tokio-runtime-worker")) id_valid called
(ThreadId(14) Some("tokio-runtime-worker")) block_in_place is on thread pool
(ThreadId(14) Some("tokio-runtime-worker"), ResetId(28)) block_in_place Reset has been instantiated
(ThreadId(14) Some("tokio-runtime-worker")) id_valid inside block_in_place
(ThreadId(14) Some("tokio-runtime-worker"), ResetId(28)) Reset dropped worker core is_some? true, context core is_some? false
<-- problematic function call end -->
<-- a few seconds later -->
(ThreadId(14) Some("tokio-runtime-worker"), ResetId(22)) Reset dropped worker core is_some? false, context core is_some? true
thread 'tokio-runtime-worker' panicked at 'assertion failed: cx_core.is_none()', /usr/local/cargo/git/checkouts/tokio-4a5e90f3eb383b08/e608291/tokio/src/runtime/scheduler/multi_thread/worker.rs:347:21

Compared to a run where no panic occurs:

(ThreadId(11) Some("tokio-runtime-worker")) block_in_place is on thread pool
(ThreadId(11) Some("tokio-runtime-worker"), ResetId(0)) block_in_place Reset has been instantiated
<-- problematic function call start -->
(ThreadId(11) Some("tokio-runtime-worker")) id_valid called
(ThreadId(11) Some("tokio-runtime-worker")) block_in_place is on thread pool
(ThreadId(11) Some("tokio-runtime-worker")) block_in_place no context core
(ThreadId(11) Some("tokio-runtime-worker"), ResetId(1)) block_in_place Reset has been instantiated
(ThreadId(11) Some("tokio-runtime-worker")) id_valid inside block_in_place
(ThreadId(11) Some("tokio-runtime-worker"), ResetId(1)) Reset dropped worker core is_some? true, context core is_some? false
(ThreadId(11) Some("tokio-runtime-worker")) id_valid called
(ThreadId(11) Some("tokio-runtime-worker")) block_in_place is on thread pool
(ThreadId(11) Some("tokio-runtime-worker"), ResetId(2)) block_in_place Reset has been instantiated
(ThreadId(11) Some("tokio-runtime-worker")) id_valid inside block_in_place
(ThreadId(11) Some("tokio-runtime-worker"), ResetId(2)) Reset dropped worker core is_some? true, context core is_some? false
(ThreadId(11) Some("tokio-runtime-worker")) id_valid called
(ThreadId(11) Some("tokio-runtime-worker")) block_in_place is on thread pool
(ThreadId(11) Some("tokio-runtime-worker"), ResetId(3)) block_in_place Reset has been instantiated
(ThreadId(11) Some("tokio-runtime-worker")) id_valid inside block_in_place
(ThreadId(11) Some("tokio-runtime-worker"), ResetId(3)) Reset dropped worker core is_some? false, context core is_some? false
(ThreadId(11) Some("tokio-runtime-worker")) id_valid called
(ThreadId(11) Some("tokio-runtime-worker")) block_in_place is on thread pool
(ThreadId(11) Some("tokio-runtime-worker")) block_in_place no context core
(ThreadId(11) Some("tokio-runtime-worker"), ResetId(4)) block_in_place Reset has been instantiated
(ThreadId(11) Some("tokio-runtime-worker")) id_valid inside block_in_place
(ThreadId(11) Some("tokio-runtime-worker"), ResetId(4)) Reset dropped worker core is_some? false, context core is_some? false
(ThreadId(11) Some("tokio-runtime-worker")) id_valid called
(ThreadId(11) Some("tokio-runtime-worker")) block_in_place is on thread pool
(ThreadId(11) Some("tokio-runtime-worker")) block_in_place no context core
(ThreadId(11) Some("tokio-runtime-worker"), ResetId(5)) block_in_place Reset has been instantiated
(ThreadId(11) Some("tokio-runtime-worker")) id_valid inside block_in_place
(ThreadId(11) Some("tokio-runtime-worker"), ResetId(5)) Reset dropped worker core is_some? false, context core is_some? false
(ThreadId(11) Some("tokio-runtime-worker")) id_valid called
(ThreadId(11) Some("tokio-runtime-worker")) block_in_place is on thread pool
(ThreadId(11) Some("tokio-runtime-worker")) block_in_place no context core
(ThreadId(11) Some("tokio-runtime-worker"), ResetId(6)) block_in_place Reset has been instantiated
(ThreadId(11) Some("tokio-runtime-worker")) id_valid inside block_in_place
(ThreadId(11) Some("tokio-runtime-worker"), ResetId(6)) Reset dropped worker core is_some? false, context core is_some? false
(ThreadId(11) Some("tokio-runtime-worker")) id_valid called
(ThreadId(11) Some("tokio-runtime-worker")) block_in_place is on thread pool
(ThreadId(11) Some("tokio-runtime-worker")) block_in_place no context core
(ThreadId(11) Some("tokio-runtime-worker"), ResetId(7)) block_in_place Reset has been instantiated
(ThreadId(11) Some("tokio-runtime-worker")) id_valid inside block_in_place
(ThreadId(11) Some("tokio-runtime-worker"), ResetId(7)) Reset dropped worker core is_some? false, context core is_some? false
<-- problematic function call end -->
<-- a few seconds later -->
(ThreadId(11) Some("tokio-runtime-worker"), ResetId(0)) Reset dropped worker core is_some? false, context core is_some? false
<-- no panic -->

Everything is the same up until line 19 of each log output where, in the non-panicking output, there's no worker core (and still no context core).

Somehow that prevents it from panicking. It seems odd to me that the worker's core would ever get put back, outside of Reset::drop.

My branch with the println!-style debugging is here: master...jeromegn:block-cx-core-none

carllerche added 3 commits that referenced this issue Jun 29, 2023
This patch fixes a bug where nesting `block_in_place` with a `block_on`
between could lead to a panic. This happened because the nested
`block_in_place` would try to acquire a core on return when it should
not attempt to do so. The `block_on` between the two nested
`block_in_place` altered the thread-local state to lead to the incorrect
behavior.

The fix is for each call to `block_in_place` to track if it needs to try
to steal a core back.

Fixes #5239