pageserver: spawning walredo process is slow #6565
Labels
c/storage/pageserver
Component: storage: pageserver
t/bug
Issue Type: Bug
triaged
bugs that were already triaged
Comments
problame
added a commit
that referenced
this issue
Feb 1, 2024
… code The rust stdlib uses the efficient `posix_spawn` by default. However, before this PR, pageserver used `pre_exec()` in our `close_fds()` ext trait. This PR moves the work that `close_fds()` did to the walredo C code. I verified manually that we're now forking out the walredo process using `posix_spawn`. refs #6565
problame
added a commit
that referenced
this issue
Feb 1, 2024
…C code (#6574) The rust stdlib uses the efficient `posix_spawn` by default. However, before this PR, pageserver used `pre_exec()` in our `close_fds()` ext trait. This PR moves the work that `close_fds()` did to the walredo C code. I verified manually using `gdb` that we're now forking out the walredo process using `posix_spawn`. refs #6565
Problem
On some pageservers we see >1s times to spawn the process.
Investigation Results
Customer investigation https://neondb.slack.com/archives/C033RQ5SPDH/p1706787518630459?thread_ts=1706774416.482029&cid=C033RQ5SPDH
walredo spawn latency is bimodal on most pageservers: some spawns are fast, taking tens of milliseconds; others are slow, taking multiple seconds
even though the Rust stdlib uses the efficient `posix_spawn` by default, we don't use it on pageservers because we use `pre_exec()` in `close_fds()`
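To illustrate the last point, here is a minimal sketch (not pageserver code) that times `Command::spawn` with and without a `pre_exec()` hook. The helper name `time_spawn` and the use of the `true` binary as a stand-in child are assumptions made for this example. On Linux, registering any `pre_exec` closure is expected to make the stdlib fall back from `posix_spawn` to `fork` + `exec`; the cost of that path tends to grow with the parent's memory footprint, so a tiny test program like this one may show little difference.

```rust
use std::io;
use std::os::unix::process::CommandExt;
use std::process::{Command, Stdio};
use std::time::{Duration, Instant};

/// Spawns a trivial child and returns how long `spawn()` itself took.
/// `true` is only a stand-in for the walredo binary (assumption).
fn time_spawn(use_pre_exec: bool) -> io::Result<Duration> {
    let mut cmd = Command::new("true");
    cmd.stdin(Stdio::null())
        .stdout(Stdio::null())
        .stderr(Stdio::null());

    if use_pre_exec {
        // A no-op hook is enough: the mere presence of a pre_exec closure
        // prevents the stdlib from using its posix_spawn fast path.
        unsafe {
            cmd.pre_exec(|| Ok(()));
        }
    }

    let start = Instant::now();
    let mut child = cmd.spawn()?;
    let elapsed = start.elapsed();

    // Reap the child so it does not linger as a zombie.
    child.wait()?;
    Ok(elapsed)
}

fn main() -> io::Result<()> {
    for &use_pre_exec in &[false, true] {
        let elapsed = time_spawn(use_pre_exec)?;
        println!("pre_exec={use_pre_exec}: spawn took {elapsed:?}");
    }
    Ok(())
}
```

To confirm which path is actually taken, tracing the parent's system calls (or breaking on `posix_spawn` in a debugger, as the referenced commit describes) is more reliable than timing alone.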
DoD
Plan
Explore whether we can use `posix_spawn`; if so, ship to staging and observe whether it is a sufficient improvement. We can move the `close_fds` work into walredo startup, where we still trust the process.
If `posix_spawn` can't be used, implement a sidecar "spawner" process that pageserver asks to spawn walredo processes.
NB: we decided against a pool of pre-spawned walredo processes, as the amount of CPU wasted on the inefficient `fork()` call is significant.
Background Reading
Work
Solve The Issue
Follow-Ups
Spin-Offs (no need to complete before closing)
`close_range` #6646