upgrade to Tokio v1.x ecosystem #11475

tdyas · 2021-01-18T02:04:04Z

Problem

Stay current on Tokio / Tonic / Prost / Hyper ecosystem.

Solution

Upgrade to the Tokio v1.x ecosystem. A minimum of Tokio v1.4 is required in order to have Handle::block_on available for use by the task_executor crate.

Result

Existing tests pass.

tdyas · 2021-01-18T02:05:10Z

This compiles but has some tests failing in process_execution. There are some questions that need answering before proceeding to finalize this. Will tag appropriate people.

tdyas · 2021-01-18T02:06:08Z

Prerequisite is a release of the nails crate with Tokio 1.0 ecosystem support from stuhood/nails#2.

tdyas · 2021-01-18T02:17:27Z

src/rust/engine/process_execution/src/local.rs

+use ouroboros::self_referencing;
+
+#[self_referencing]
+struct OwnedChild {
+  child: Box<Child>,
+  #[borrows(mut child)]
+  #[not_covariant]
+  exit_stream: BoxStream<'this, Result<ChildOutput, std::io::Error>>,
+}
+
+impl Stream for OwnedChild {
+  type Item = Result<ChildOutput, std::io::Error>;
+
+  fn poll_next(
+    mut self: Pin<&mut Self>,
+    cx: &mut std::task::Context<'_>,
+  ) -> Poll<Option<Self::Item>> {
+    self.with_exit_stream_mut(|es| Pin::new(es).poll_next(cx))
+  }
+}
+


tokio::process::Child no longer implements Future in Tokio 1.0. Instead, Child exposes a wait instance method. The problem is that wait takes &mut self, so the original version of exit_stream mutably borrows the Child. That stream cannot be returned from run_in_workdir: the borrow checker rejects the code because the Child does not survive the call to run_in_workdir.

The solution is to have Child survive that call. This code uses the ouroboros crate, whose proc macro allows a limited form of self-referential struct. OwnedChild owns the Child, and OwnedChild is what forms part of the stream returned from run_in_workdir. The exit_stream part of that return value is also embedded in OwnedChild via ouroboros.

Finally, the solution implements Stream on OwnedChild so that it exposes the same stream as exit_stream but as a first-class owner of the embedded Child.

cc @stuhood

stuhood · 2021-01-20T19:39:30Z

So, afaik, the suggested way to do this is to call child.stdout.take() (which is identical to how you do this with the stdlib Child type). See the stdout/stderr fields in the stdlib docs https://doc.rust-lang.org/std/process/struct.Child.html for more info. You can then call wait safely.

tdyas · 2021-01-21

The issue is not with child.stdout nor with child.stderr. The stdout_stream and stderr_stream streams take ownership of those respective values and are not triggering the borrow checker. See the construction of stdout_stream and stderr_stream where the code already calls child.stdout.take().unwrap() and child.stderr.take().unwrap() respectively.

The issue is with exit_stream. In tokio v0.2.x, Child implements Future and can be used directly to await the exit status of the child process. The borrow checker was not triggered in run_in_workdir previously because exit_stream took ownership of the underlying Future a/k/a the Child struct. Thus returning exit_stream from run_in_workdir implicitly passed ownership of Child out of run_in_workdir because exit_stream owned it.

In tokio v1.x, that is no longer the case: Child does not implement Future, and one must call .wait to obtain a Future that can be used to await the child's exit status. .wait takes &mut self and thus borrows Child mutably within the call to run_in_workdir. Unlike in tokio v0.2.x, exit_stream now holds only the &mut self reference to Child (via .wait) rather than ownership of Child, so the borrow checker rejects the code: the Child is dropped when run_in_workdir returns while exit_stream still borrows it.

The solution is to let Child escape run_in_workdir. However, the code should also keep the Future returned from .wait alive (and its &mut self reference) so that we only construct exit_stream once. This is accomplished by using the ouroboros crate to safely maintain the self-reference between Child and exit_stream.

And to make things easier, the OwnedChild struct implements Stream, so the details of how exit_stream and the Child struct are stored stay hidden.
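For illustration, here is a minimal std-only sketch of the ownership problem described above, in synchronous form. It is not the code from this PR. FakeChild, its exit_code field, and run_in_workdir_sketch are hypothetical stand-ins, and the returned closure plays the role that the exit future plays in the real code. The point is only that a value built from a `&mut self` method can leave the function if the thing it borrows from is moved into it.

```rust
/// Hypothetical stand-in for a child-process handle whose `wait`
/// needs `&mut self`, like `wait` on the real Child type.
struct FakeChild {
    exit_code: i32,
}

impl FakeChild {
    fn wait(&mut self) -> i32 {
        self.exit_code
    }
}

// This variant would be rejected by the borrow checker: the returned
// closure would borrow `child`, a local that is dropped on return.
//
// fn broken(exit_code: i32) -> impl FnMut() -> i32 {
//     let mut child = FakeChild { exit_code };
//     || child.wait()
// }

/// Hypothetical sketch: `move` transfers ownership of `child` into the
/// closure, so the returned value owns everything it needs.
fn run_in_workdir_sketch(exit_code: i32) -> impl FnMut() -> i32 {
    let mut child = FakeChild { exit_code };
    move || child.wait()
}
```

Calling `run_in_workdir_sketch(7)` and then invoking the returned closure yields the stored exit code. In async code the analogous shape is presumably an `async move` block that owns the child and awaits `wait` inside it, which avoids a self-referential struct.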

stuhood · 2021-01-21

Hm. Ok. I expect that moving forward we should move away from the "unified stream of stdout and exit code" interface here, as TODO'd in: https://github.com/pantsbuild/pants/pull/11370/files#r547623793. nails did so to ease lifetime issues, and it seemed to work fine. Happy to land this as is if we preserve the TODO though.

tdyas · 2021-01-21

Fine by me. I'll preserve the TODO.

tdyas · 2021-01-18T02:21:29Z

src/rust/engine/task_executor/src/lib.rs

+  // TODO(tokio-1.0): Figure out thread count configuration in Pants.
+  pub fn new_owned(_core_threads: usize, num_threads: usize) -> Result<Executor, String> {
+    let runtime = Builder::new_multi_thread()
+      .worker_threads(num_threads)
+      .max_blocking_threads(num_threads)


The thread configuration for the multi-thread runtime changed in v1.0. Tokio no longer dynamically scales the number of threads as it does in prior versions. Instead, it will just spin up worker_threads threads and ensure that only max_blocking_threads are used for blocking work. (At least that is my naive take on the matter.)

The questions are:

What API should we be exposing for PyExecutor and related types?

The default number of threads now is the number of available cores. Should we just trust the Tokio defaults now and not perform manual configuration in the general case?

stuhood · 2021-01-20T19:47:15Z

According to the discussion on tokio-rs/tokio#2802 this is now worker_threads + max_blocking_threads? Commented to confirm though: tokio-rs/tokio#2802 (comment)

tdyas · 2021-01-21

Your understanding was confirmed: tokio-rs/tokio#2802 (comment)

We should probably change the API here to just pass-through the underlying concepts. Although we might want to make setting the number of threads optional so that we can allow Tokio to just use the number of cores.

Thoughts?
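One way the "optional thread count" idea above could look, as a rough sketch only. ThreadConfig and its methods are hypothetical names, not an existing Pants or Tokio API. std::thread::available_parallelism is used here merely as a stand-in for "let Tokio pick the number of cores"; the real default is whatever Tokio itself computes.

```rust
use std::thread;

/// Hypothetical configuration type. `None` for `worker_threads` means
/// "defer to the runtime's default".
#[derive(Debug, Clone, Copy, PartialEq)]
struct ThreadConfig {
    worker_threads: Option<usize>,
    max_blocking_threads: usize,
}

impl ThreadConfig {
    /// Resolve the worker count, falling back to the number of available
    /// cores (an assumption standing in for Tokio's own default).
    fn resolved_worker_threads(&self) -> usize {
        self.worker_threads.unwrap_or_else(|| {
            thread::available_parallelism()
                .map(|n| n.get())
                .unwrap_or(1)
        })
    }

    /// Upper bound on runtime threads: workers plus blocking threads,
    /// per the confirmation on tokio-rs/tokio#2802.
    fn max_total_threads(&self) -> usize {
        self.resolved_worker_threads() + self.max_blocking_threads
    }
}
```

With `worker_threads: Some(4)` and `max_blocking_threads: 8`, the upper bound is 12 threads. With `None`, the worker count depends on the machine.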

stuhood · 2021-01-21

Possibly. I think that one issue with passing these concepts through directly is that we may shift "what" is blocking vs async over time, and so I don't know how we would explain what needs blocking threads.

A punt for now might be to set worker_threads to our existing "core threads", and compute max_blocking_threads from the difference between our core and max, with a minimum value of 1 blocking thread. Not ideal, but.
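The punt described above is plain arithmetic, so a small sketch may help. The function name tokio_thread_counts is hypothetical, and the saturating subtraction is an added assumption to cover the case where the configured max is below the core count.

```rust
/// Hypothetical helper: map the existing "core" and "max" thread settings
/// to a (worker_threads, max_blocking_threads) pair.
fn tokio_thread_counts(core_threads: usize, max_threads: usize) -> (usize, usize) {
    // Worker threads reuse the existing "core threads" value unchanged.
    let worker_threads = core_threads;
    // Blocking threads get the difference between max and core.
    // `saturating_sub` avoids underflow when max < core, and `.max(1)`
    // enforces the minimum of one blocking thread.
    let max_blocking_threads = max_threads.saturating_sub(core_threads).max(1);
    (worker_threads, max_blocking_threads)
}
```

For example, core = 4 and max = 16 gives 4 worker threads and 12 blocking threads, so the total stays at the old max of 16. When core equals max, the minimum applies and the total exceeds the old max by one.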

tdyas · 2021-01-22

How should we express the concept of letting Tokio choose the worker_threads value while still retaining the ability to configure the number of blocking threads? Just do max-core and only use that value?

Darksonn · 2021-03-17

> Tokio no longer dynamically scales the number of threads as it does in prior versions.

To be clear, the only thing that has changed here is how you set the upper limit on blocking threads. Just like in the previous versions, the number of core threads is fixed, and the number of blocking threads is dynamically scaled until it reaches the limit.

tdyas · 2021-01-18T02:23:56Z

cc @stuhood @Eric-Arellano @illicitonion

Low priority. The PR has some questions to be answered in the coming week, which will require some thought on our part, so not marking for review yet.

stuhood · 2021-01-20T19:23:07Z

nails 0.12.0 is released, using tokio 1.x: https://crates.io/crates/nails/0.12.0 -- thanks for prepping that!

stuhood

Thanks a lot!

tdyas · 2021-01-22T22:06:37Z

@stuhood: Regarding the use of block_on, do we have a definitive list of the contexts where Pants uses block_on? And is it the case that those call sites don't have access to a Runtime or even a Handle to use spawn_blocking or tokio::task::block_in_place (assuming it is relevant)?

I'm trying to see if there is a way to work around the lack of block_on on Handle.

tdyas · 2021-03-17T07:50:20Z

Rebased and updated the PR for the latest master. Apparently tokio-rs/tokio#3569 will add block_on back to Handle, but it has not landed in Tokio yet.

tdyas · 2021-03-20T20:37:03Z

Rebased again and updated Tokio to be a minimum of v1.4 which includes the change to add block_on back to Handle. (This is also all squashed down to a single commit. I recommend reviewing anew.)

stuhood

Thanks! One block_on to restore, but pre-shipping for once that is fixed.

stuhood

Thanks again!

tdyas commented Jan 18, 2021

tdyas commented Jan 19, 2021

stuhood mentioned this pull request Jan 20, 2021

Replacement for Handle::current tokio-rs/tokio#2965

Closed

stuhood reviewed Jan 20, 2021

tdyas force-pushed the tokio_v1.0_upgrade branch from 1be7ed1 to 2090210 Compare March 17, 2021 07:49

Base automatically changed from master to main March 19, 2021 19:20

tdyas force-pushed the tokio_v1.0_upgrade branch from 2090210 to 0a4b387 Compare March 20, 2021 20:32

tokio 1.x upgrade

af8b85b

tdyas force-pushed the tokio_v1.0_upgrade branch from 0a4b387 to af8b85b Compare March 20, 2021 20:35

tdyas marked this pull request as ready for review March 20, 2021 20:35

tdyas requested review from stuhood, Eric-Arellano and illicitonion March 20, 2021 20:44

tdyas changed the title ~~Tokio v1.0 upgrade~~ upgrade to Tokio v1.x ecosystem Mar 20, 2021

stuhood approved these changes Mar 20, 2021

Tom Dyas added 4 commits March 20, 2021 14:08

fix fs/brfs crate

c3c869f

caret specifier is implicit

ae339a5

use block_on again in nailgun tests

adb6b2b

[ci skip-build-wheels]

don't use ouroboros crate -- closure fun

d2030eb

[ci skip-build-wheels]

stuhood approved these changes Mar 20, 2021

add missing dep

ca88095

[ci skip-build-wheels]

tdyas merged commit 0f266e7 into pantsbuild:main Mar 20, 2021

tdyas deleted the tokio_v1.0_upgrade branch March 20, 2021 23:49

stuhood mentioned this pull request Mar 25, 2021

Prepare 2.4.0rc0. #11800

Merged

Eric-Arellano mentioned this pull request Apr 20, 2021

CI sometimes hangs when writing to the remote cache #11908

Closed
