per-thread executors #87
Hey, @actix lead here. The runtime strategy used in Actix Web fits this description. From the actix-rt docs:

> In short, we spin up worker threads and start a single threaded Tokio runtime for each. Then spawn the server future (and therefore most user code) onto a […]

This works well as our strategy (and worked especially well in the early days when Tokio's multi-threaded story was not as mature as it is now).
Since work is not stolen, it is even more important not to block the thread inside request handlers than it is in hyper or warp. Thankfully, the solution is a blocking-task threadpool, which is provided by Tokio, too. It could be argued that increasing pressure to use it for expensive, blocking work actually produces more carefully considered, explicitly delimited code rather than inadvertently using work stealing as a crutch.

Anecdotally, the benefits of supporting `!Send` futures go beyond the theoretical performance increase from reducing atomic synchronisation and into developer experience, too, since no concerns about handler `Send`-ness arise. In general, atomics and synchronisation are opt-in, and we believe it is a good approach. A long-term goal of the project may be to support a strictly single-threaded mode, in which case `Send` futures are useless.

It should be noted that this strategy extends beyond web. The original Actix actor framework is still maintained, uses some of the same runtime primitives, and is a general solution to concurrent task problems.
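A minimal sketch of this architecture, assuming Tokio's current-thread runtime and `LocalSet` (this is the general shape being described, not actix-rt's actual code):

```rust
use std::rc::Rc;
use tokio::{runtime, task};

fn main() {
    // One single-threaded runtime per worker thread; no work stealing.
    let handles: Vec<_> = (0..4)
        .map(|i| {
            std::thread::spawn(move || {
                let rt = runtime::Builder::new_current_thread()
                    .enable_all()
                    .build()
                    .unwrap();
                let local = task::LocalSet::new();
                local.block_on(&rt, async move {
                    // Rc makes this future !Send: fine, since it never
                    // leaves this thread.
                    let state = Rc::new(i);
                    task::spawn_local(async move {
                        println!("worker {} handling requests", state);
                    })
                    .await
                    .unwrap();

                    // Expensive, blocking work is explicitly shipped off to
                    // Tokio's blocking threadpool instead of stalling the loop.
                    task::spawn_blocking(|| { /* e.g. hash a password */ })
                        .await
                        .unwrap();
                });
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
}
```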
Thanks!
So far, I have mostly been using a single single-threaded executor/runtime per core, for example in a traffic simulation that was simulating a large number of individual cars (modelled as async processes), or in a distributed task system that was handling scheduling of tasks, data movement, worker registration, etc. all on a single thread. For me, async is a way to elegantly cooperate between multiple independent "sub-processes" (let's say actors) inside a single process first, and a way to get potentially more performance second.

That being said, I feel that having multiple single-threaded executors can sometimes even improve performance, but it's not the main reason why I think that this use case should be supported on a first-class basis. If I don't need multi-threaded work-stealing (and so far I have almost never needed it), having a multi-threaded runtime is just a nuisance because of the `Send` bounds it imposes.

Thanks for the async design work btw! It's great to open up a dialogue about these things.
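For illustration, a tiny version of this pattern using `futures::executor::LocalPool`: several independent "actors" cooperating over channels on one thread, none of which need to be `Send` (a sketch, not the simulation's actual code):

```rust
use futures::channel::mpsc;
use futures::executor::LocalPool;
use futures::task::LocalSpawnExt;
use futures::StreamExt;

fn main() {
    let mut pool = LocalPool::new();
    let spawner = pool.spawner();
    let (tx, mut rx) = mpsc::unbounded::<u32>();

    // Each "car" is an async process; all of them share one thread.
    for car in 0..3 {
        let tx = tx.clone();
        spawner
            .spawn_local(async move {
                tx.unbounded_send(car).unwrap();
            })
            .unwrap();
    }
    drop(tx); // close the channel so the collector below terminates

    // A collector actor, also on the same thread.
    spawner
        .spawn_local(async move {
            while let Some(car) = rx.next().await {
                println!("car {} moved", car);
            }
        })
        .unwrap();

    pool.run(); // drive everything to completion on this thread
}
```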
Kind of every single successful async framework since the inception of event loops :-) libevent, libev, libuv, Boost.Asio, GTK, Qt, seastar, nginx, JavaScript + Node.js, DPDK, Netty, Grizzly, Dart, etc. Besides Rust, the main frameworks which tried to move tasks between executors are Go and C#'s threadpool executor (although I think the ASP.NET default executor might have fixed threads). Therefore the state of the world is actually more that Rust would need to prove that its approach of defaulting to thread-safe is a viable alternative than questioning the effectiveness of single-threaded event loops. Their effectiveness in terms of reducing context switches and guaranteeing good cache hit rates was more or less what triggered people to move towards the model, despite the ergonomic challenges of using callbacks.
Hi, I am the author of glommio, a thread-per-core runtime for Rust that recently got some attention. Prior to Rust I was involved in the Scylla C++ database, which also uses a thread-per-core model. Glommio is now being used at my employer to hopefully move some older systems from languages like Go to Rust, and it's been working quite well. My main complaint at the moment is that even some crates like hyper that technically allow you to replace your executor will at some point assume that the executor is `Send`.

The thinking behind a thread-per-core executor is that atomic operations are expensive; if you can get rid of them altogether, you win when running on large instances where contention can easily become an issue. See for instance the following third-party comparison: https://github.com/fujita/greeter
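To make the atomics point concrete (an illustration, not glommio's API): within a shared-nothing core, `Rc<RefCell<_>>` can replace `Arc<Mutex<_>>`, trading atomic reference counts and lock acquisition for plain integer operations and borrow checks:

```rust
use std::cell::RefCell;
use std::rc::Rc;

#[derive(Default)]
struct PerCoreStats {
    requests: u64,
}

// No atomic RMW, no lock: only this core's thread ever touches `stats`,
// and the future is !Send, so the compiler enforces that.
async fn handle_request(stats: Rc<RefCell<PerCoreStats>>) {
    stats.borrow_mut().requests += 1;
}

fn main() {
    let stats = Rc::new(RefCell::new(PerCoreStats::default()));
    futures::executor::block_on(handle_request(stats.clone()));
    assert_eq!(stats.borrow().requests, 1);
}
```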
https://github.com/uazu/stakker @uazu Stakker is also per-thread, which is similar to actix.
Yes, Stakker is a runtime per thread. I found that I had to implement my own wake-up mechanism instead of using Waker through Context, to avoid synchronization costs for same-thread wakeups. I can see the point of having access to a cross-thread Waker, though, so that futures that need to offload work can start threads or send inter-thread messages and then pass data back -- it's just that that wouldn't be the most common case. I'd expect most wakeups to be local, as async/await code interacts with actor code. I didn't consider that Context is Send+Sync. Does it really need to be? That could be an obstacle to adding things later on.
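A minimal sketch of the kind of same-thread wake-up mechanism being described (names invented here, and assuming every wakeup happens on the executor's own thread):

```rust
use std::cell::Cell;
use std::rc::Rc;

// No atomics, no std::task::Waker: a plain flag the executor checks
// between polls. Holding an Rc makes this type !Send, so any attempt to
// wake from another thread is a compile error, not a runtime cost.
#[derive(Clone, Default)]
struct LocalWakeFlag(Rc<Cell<bool>>);

impl LocalWakeFlag {
    fn wake(&self) {
        self.0.set(true); // plain store, no synchronization
    }
    fn take_woken(&self) -> bool {
        self.0.replace(false) // executor consumes the flag each turn
    }
}
```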
I'm not sure whether this is the right place for this comment, but another use I see (and have) for per-thread executors is in games. For example, one would like to be able to define tutorials and maybe even unit behaviours (e.g. in an RTS) using async syntax, but one might not want multi-threading there (for the sake of predictable behaviour). Also, if the game targets WASM, one might not be able to use multi-threading without significant hacks (at least for now). If the game code has to run on servers (for network games), one might want to have multiple instances of async executors running independently, with no global variables associated with these executors. I do not know how much of a performance gain such use cases would see if they could be written with explicit single-threaded constraints, but it would certainly make them easier to write and possibly reduce the amount of user-written unsafe needed.
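One way this can look in practice (a sketch; `tutorial` is a hypothetical script name): drive the scripts from the game loop one non-blocking "tick" at a time, entirely on one thread, which also keeps it WASM-friendly:

```rust
use futures::executor::LocalPool;
use futures::task::LocalSpawnExt;

// A hypothetical tutorial script: in a real game it would await player
// input, timers, dialogue, etc.
async fn tutorial() {
    println!("tutorial step complete");
}

fn main() {
    let mut pool = LocalPool::new();
    pool.spawner().spawn_local(tutorial()).unwrap();

    // Game loop: `run_until_stalled` polls whatever scripts are ready and
    // returns without blocking, so it is safe to call once per frame.
    for _frame in 0..3 {
        pool.run_until_stalled();
        // ... update game state, render ...
    }
}
```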
Removing Send+Sync would considerably improve the pollster implementation. Currently it needs to allocate an Arc and use thread-safe operations to update its readiness state (maybe it doesn't need the full Mutex + Condvar it has now, but certainly something more expensive than the Cell it should need), and if it were used improperly from other threads it would actually be an error. I believe this would be completely fixed by removing the Send+Sync requirement--no Arc, and it could use Cell for its interior mutable state.

I understand that removing Send is more controversial than removing Sync; however, I'm pretty sure all the arguments for having Sync but not Send don't work currently, due to Waker implementing Clone and requiring in its contract that wakeup wake the original task. (BTW, can someone explain to me why the lack of a lifetime on Waker combined with the existence of Clone doesn't make any lifetime-bounded Waker data invalid as well and implicitly add a `'static` bound?)

Without the Clone instance, or with a looser requirement on wakeup (e.g. it might fail to wake up any task, but if it does wake up something it must be the original task), I think none of this would be true. Does anyone actually rely on all of the "must always wake up the original task" behavior, the Clone instance, and expect it to work for a generic Waker, and do all of this for safety, rather than liveness? It seems pretty infeasible to me that this would be the case, since what waking up a task does is mostly unspecified for an arbitrary Waker.

Also, is the description of wake combined with Clone already so strong that most Waker implementations don't actually satisfy it? It's certainly not true that if you clone a waker and call wake() on it, the original task's resources necessarily get dropped, but that's what seems to be implied by these two lines. For clone:

> The implementation of this function must retain all resources that are required for this additional instance of a RawWaker and associated task.

For wake:

> The implementation of this function must make sure to release any resources that are associated with this instance of a RawWaker and associated task.
A similar issue affects other parts of the Waker contract. My basic thoughts here are: while changing some of this language is probably a good idea (safety requirements should be as conservative as possible until proven to actually provide optimizations) and would technically "fix" the problem in some cases, especially in combination with !Sync, it would still not really solve the spirit of the problem. Arc could become Box<Cell<_>>, but (1) we still have an extra allocation, and (2) it's there to make sure Clone stays safe and the thing is still sendable; yet this not only "solves" a case we don't want to happen (notifying from another thread for a single-threaded executor), it also makes cloning the waker not even work properly on the original thread! This hardly seems like the desired outcome to me: what we want is something like Rc, where cloning stays cheap and correct on the original thread.

(As an aside, could adding a second lifetime to the Context object--the lifetime of the Waker itself--together with a lifetime to Waker, and defaulting both to `'static`, help here?)

(Removing Clone would be marginally less invasive in terms of types, but of course many places actually call clone directly on a Waker--presumably most/all of them are built together with an executor and know the task was spawned by the right one, but I don't know for sure.)

Either way, IMO, the current implementation is just so specifically written to assume Arc that it is virtually impossible to satisfy if implemented in any other way, short of using some sort of sentinel value and doing the actual wakeup work in a thread local or static (which can be hard to do safely and efficiently, and introduces runtime panics for no real reason). I definitely understand why a Send Context can be desirable in some cases, and even more why Clone / the 'static lifetime are, but I think in practice direct calls to poll with executors complex enough to move tasks around, clone wakers, etc., are closely tied to the particular executor, which should be able to wrap the Context in whatever thread-safe machinery it needs.
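For concreteness, here is a `block_on` in the shape pollster takes today (a sketch, not its actual code): the `Arc`, `Mutex`, and `Condvar` exist only because `Waker` must be `Send + Sync`, even though polling never leaves this thread.

```rust
use std::future::Future;
use std::pin::pin;
use std::sync::{Arc, Condvar, Mutex};
use std::task::{Context, Poll, Wake, Waker};

// Readiness signal: thread-safe only because Waker demands it.
struct Signal {
    ready: Mutex<bool>,
    cond: Condvar,
}

impl Wake for Signal {
    fn wake(self: Arc<Self>) {
        *self.ready.lock().unwrap() = true;
        self.cond.notify_one();
    }
}

fn block_on<F: Future>(fut: F) -> F::Output {
    let signal = Arc::new(Signal { ready: Mutex::new(false), cond: Condvar::new() });
    let waker = Waker::from(signal.clone());
    let mut cx = Context::from_waker(&waker);
    let mut fut = pin!(fut);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
        // Park until woken. With a !Send Context, this whole dance could be
        // a plain Cell<bool> check instead of a Mutex + Condvar wait.
        let mut ready = signal.ready.lock().unwrap();
        while !*ready {
            ready = signal.cond.wait(ready).unwrap();
        }
        *ready = false;
    }
}

fn main() {
    assert_eq!(block_on(async { 21 * 2 }), 42);
}
```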
I'd add one more thing--I don't fully understand how the original LocalWaker proposal was expected to work, but would it have been necessary to basically implement a separate Future type for Send and !Send versions of the same async executor? That seems suboptimal...

It seems to me that the main problem with removing Send is that in essentially all cases, when you store a cloned Waker (regardless of whether you're familiar with the executor or not), you want your (necessarily hand-rolled) Future to be polymorphic in the Send-ness of the Waker you're called with. Unfortunately you have absolutely no idea about that, because this information isn't carried by anything--Waker isn't polymorphic in an executor type (which is in a deep sense the root of most problems with the Waker design), so there's nothing to carry that information. Moreover, there's no legal way to store something like a reference to a Waker in your type, not even with the helpers std provides.

I'm sure this is covered in the original RFC, but why not parameterize Futures by their Waker (and/or some phantom type) rather than do everything via virtual dispatch? Not for performance reasons (Waker does virtual dispatch itself), but for type-carrying reasons. This wouldn't hurt the composition of Future at all in practice, because literally everyone who could be would be totally parametric over the Waker type... it would just allow futures that store wakers to be parametric in things like Send, Sync, Clone, and lifetime, rather than the current case where we're restricted to the union of what all executor implementations need. And it feels like it would be something that could be done in a Rust Edition by moving all uses of the concrete Waker type to a generic parameter.

I guess the reason the above approach was not taken is the difficulty of inferring the Waker generic parameter for async fns and async blocks. In any case, I think giving an error in the situation where you use two different Futures in the same async/await block that for some reason require totally different, explicit Waker types would be acceptable.
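Purely to illustrate the parameterization idea (names like `WakerLike` and `PollWith` are invented here; none of this is std or proposed API): combinators stay fully generic over the waker type, so properties like `Send` flow through instead of being fixed by the trait.

```rust
use std::pin::Pin;
use std::task::Poll;

// A minimal stand-in for "whatever the executor uses to wake tasks".
trait WakerLike: Clone {
    fn wake(&self);
}

// A Future-like trait parameterized by its waker type.
trait PollWith<W: WakerLike> {
    type Output;
    fn poll_with(self: Pin<&mut Self>, waker: &W) -> Poll<Self::Output>;
}

// A map combinator: fully parametric over W, so it is Send exactly when
// the inner future, the closure, and the waker type allow it.
struct Map<F, G>(F, G);

impl<W, F, G, T> PollWith<W> for Map<F, G>
where
    W: WakerLike,
    F: PollWith<W> + Unpin,
    G: FnMut(F::Output) -> T + Unpin,
{
    type Output = T;
    fn poll_with(self: Pin<&mut Self>, waker: &W) -> Poll<T> {
        let this = self.get_mut();
        match Pin::new(&mut this.0).poll_with(waker) {
            Poll::Ready(v) => Poll::Ready((this.1)(v)),
            Poll::Pending => Poll::Pending,
        }
    }
}
```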
(NOT A CONTRIBUTION)

Contrary to my opinions in 2018, nowadays I do think it would be preferable to support single-threaded wakers if possible, for users who have strong certainty that their workloads benefit from a shared-nothing thread-per-executor architecture. The fact that Context implements Send and Sync, making this impossible, is a glaring mistake that we did not think hard enough about. I am in favor of at least testing the waters on a breaking change to fix this (i.e. cratering and issuing a lint now to determine who would be broken). The fundamental idea as I understand it is that Context would gain access to a local, !Send waker alongside the existing one.

I think as time goes on, it becomes more and more likely that this breakage will be too much to stomach. I think the async team should take this seriously and move quickly to assess whether a breakage is possible. I expressed this opinion privately to a few people more than a year ago. Since this has such a strong time component, I'm disappointed there hasn't been faster movement on it.
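As I read it, the rough shape of that idea would be something like the following (a hypothetical sketch, not current std or any accepted design):

```rust
use std::marker::PhantomData;
use std::task::Waker;

// Hypothetical: like Waker, but with non-atomic refcounts and no
// Send/Sync bounds, usable only on the task's own thread.
pub struct LocalWaker { /* ... */ }

pub struct Context<'a> {
    waker: &'a Waker,            // still available for cross-thread wakeups
    local_waker: &'a LocalWaker, // cheap, same-thread wakeups
    // Carrying a !Send field opts Context itself out of Send/Sync,
    // which is exactly the breaking change being discussed.
    _not_send: PhantomData<*const ()>,
}
```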
(NOT A CONTRIBUTION)

This was also a fair question. To be honest, this choice was inherited from the pre-async/await version of Future. Back then, I think the main benefit was that it was anticipated there would be a lot of people writing futures by hand that needed to store the waker. Nowadays, most users never touch Waker directly at all, so the calculus is different.
Brief summary

I've heard a number of people cite per-thread executors as a good model for high efficiency. There are issues like `Context` implementing `Send`+`Sync`. Are people doing this? Does it work? :)