assert failed: cx_core.is_none() #5239
Comments
`RUST_BACKTRACE=full` with tokio 1.22.0: …

---
We have the same problem in our private repository, and it is very hard to debug because it is nested deep inside some really heavyweight logic (tens of minutes to re-run). The main problem is that this panic is highly flaky and hard to reproduce.

---
I did not try it yet but am hoping to try this week; catch the bug happening in an …

---

Neglected to mention in the issue: we have multiple runtimes in our codebase as well (because of the blocking in async contexts).

---
With …

---
The assertion you triggered certainly shouldn't happen under normal circumstances.

---
Found the single-file reproducer/MVCE. When run with … Please find it at https://github.com/koivunej/tokio-nested-block_in_place. Expected output:
Checked tokio versions:
With an AMD Ryzen 9 3900XT it reproduces regardless of …
Confirmed on:
---
Maybe it can still be slimmed down, but I don't understand how it can be this small given how difficult it was to catch originally:

```rust
#[tokio::main]
async fn main() {
    use tokio::{
        runtime::Handle,
        task::{block_in_place, spawn},
    };

    let res = spawn(async move {
        block_in_place(|| {
            Handle::current().block_on(async {
                block_in_place(|| {});
            });
        });
    })
    .await;

    let e = match res {
        Ok(()) => unreachable!("no panic?"),
        Err(e) => e,
    };

    let panic = e.into_panic();
    let s = *panic.downcast_ref::<&'static str>().unwrap();
    assert_eq!("assertion failed: cx_core.is_none()", s);
}
```

---
Okay, it seems this is simply beyond what was fixed in #2645. I had assumed that nested …

---
I'd also expect …

---
Yes, it should work at any level of nesting as long as the …

---
Looking at the existing tests, I was thinking maybe the expectation is to use a different or new current-thread runtime or futures executor to drive the more nested `tokio::task::block_in_place` (tokio/tokio/tests/task_blocking.rs, lines 212 to 215 in c693ccd).
So I don't really understand the response here. Still, I'm no longer looking at this approach; I've always thought the …

---
Using …

---
I'm facing a similar issue, but I can't quite figure out what's triggering it. I thought it was a combination of …

Looking at the backtrace, it seems this happens in the …

I guess this is evident, but I figured I'd write it out: it seems like some logic is putting back the … (tokio/tokio/src/runtime/scheduler/multi_thread/worker.rs, lines 391 to 395 in ec1f52e)
Edit: I might've gotten confused. It is expected that …

---
I added this log line before the assert:

```rust
println!(
    "Reset dropped worker core is_some? {}, context core is_some? {}",
    core.is_some(),
    cx_core.is_some()
);
```

When the assert failed, I got:
I'm assuming it's a problem that the core is put back when it's not expected to be put back. It sounds like some blocking task might've run on a thread that wasn't marked as blocking anymore.

Edit: I confirmed that the …

---
Some more debugging: added thread ID and name, plus a thread-local AtomicU16 acting as some kind of ID for the …
Compared to one where no panic occurs:
Everything is the same up until line 19 of each log output where, in the non-panicking output, there's no worker core (and still no context core). Somehow that prevents it from panicking. It seems odd to me that the worker's core would ever get put back, outside of …

My branch with the …

---
This patch fixes a bug where nesting `block_in_place` with a `block_on` between could lead to a panic. This happened because the nested `block_in_place` would try to acquire a core on return when it should not attempt to do so. The `block_on` between the two nested `block_in_place` altered the thread-local state to lead to the incorrect behavior. The fix is for each call to `block_in_place` to track if it needs to try to steal a core back. Fixes #5239
Version
tokio 1.21.1, with grep results prettified:
We use Rust 1.62.1.

Platform
Initially detected from a CI run on:
Linux hostname 5.10.144-127.601.amzn2.x86_64 #1 SMP Thu Sep 29 01:11:59 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
Reproduced on:
Linux hostname 5.15.0-52-generic #58-Ubuntu SMP Thu Oct 13 08:03:55 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
Description
Assertion failure message with release build:
In our codebase this is, after a few tries, reproducible under load locally for me. Load as in, for example:

```shell
while true; do cargo clean && cargo build; done
```

running in the background. I can try out some patches if needed. I haven't been able to find an MVCE.
Full steps to reproduce in our codebase
Expect to see:
Then you will find the assertion failure in `test_output/test_gc_aggressive/repo/pageserver.log`. I have also copied the full stacktrace to the next `<details>`.

RUST_BACKTRACE=full of the assertion failure
In this branch, I've sprinkled a lot of `block_in_place` around the blocking parts after I ran into a deadlock caused by the unstealable LIFO slot, because there was blocking within the runtime threads. It is unlikely that I've caught all places of blocking within async context. If I ended up misusing `block_in_place` and `block_on`, then I wish the assertion would have a clear message about the misuse. However, since it only triggers under external load (and while being `nice -20`), I suspect it is a real tokio issue.