make /server/status/ healthcheck endpoints non-blocking with respect to the compiler pool #8355

Open
zackelan opened this issue Feb 20, 2025 · 0 comments

current behavior:

calling the /server/status/alive and/or /server/status/ready healthcheck endpoints makes an RPC to the compiler pool. if the system is overly busy, that RPC may wait in a queue before it's serviced, which can cause the healthcheck to time out

desired behavior:

make the check non-blocking so that it won't time out when the system is under heavy load

sketch of a design:

  • keep a timestamp of the most recent successful RPC call made to the compiler (time.perf_counter_ns is fine for this, because we don't care about calendar time, just elapsed time)

  • in the healthcheck handler, if that timestamp is less than N seconds old (1 <= N <= 60, I'd suggest 10), then treat that as "compiler pool is reachable". otherwise (the timestamp is N or more seconds old, or no successful call has been recorded yet), follow the current logic of making an RPC to check.
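
a minimal sketch of the two bullets above, not the actual server code. `CompilerPoolHealth`, `record_success`, `is_reachable`, and the `probe` callable are hypothetical names made up for illustration; `probe` stands in for the existing RPC check. the sketch is synchronous to keep it short, whereas the real handler is presumably async.

```python
import time
from typing import Callable, Optional

NS_PER_SECOND = 1_000_000_000


class CompilerPoolHealth:
    """Hypothetical helper: remembers when the compiler pool last answered."""

    def __init__(
        self,
        probe: Callable[[], bool],
        max_age_seconds: float = 10.0,
        clock: Callable[[], int] = time.perf_counter_ns,
    ) -> None:
        # The issue suggests 1 <= N <= 60, with 10 as the default.
        if not 1 <= max_age_seconds <= 60:
            raise ValueError("max_age_seconds must be between 1 and 60")
        self._probe = probe  # stand-in for the existing blocking RPC check
        self._max_age_ns = int(max_age_seconds * NS_PER_SECOND)
        self._clock = clock  # monotonic; only elapsed time matters
        self._last_ok_ns: Optional[int] = None

    def record_success(self) -> None:
        # Call this after every successful RPC to the compiler pool.
        self._last_ok_ns = self._clock()

    def is_reachable(self) -> bool:
        last = self._last_ok_ns
        if last is not None and self._clock() - last < self._max_age_ns:
            # Recent success: answer without touching the compiler pool.
            return True
        # Stale or never recorded: fall back to the current RPC-based check.
        ok = self._probe()
        if ok:
            self.record_success()
        return ok
```

with this shape, regular compile traffic keeps the timestamp fresh, so under heavy load the healthcheck never has to queue behind other RPCs. the fallback probe only runs when the pool has been idle (or failing) for N seconds.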

very-nice-to-have:

backportable to 5.x to allow patching existing Cloud instances without requiring an upgrade to 6.x
