current behavior:
calling the /server/status/alive and/or /server/status/ready healthcheck endpoints makes an RPC to the compiler pool. if the system is overly busy, that RPC may wait in a queue before it's serviced, potentially causing the healthcheck to time out
desired behavior:
make the check non-blocking so that it won't time out when the system is under heavy load
sketch of a design:
keep a timestamp of the most recent successful RPC call made to the compiler (time.perf_counter_ns is fine for this, because we don't care about calendar time, just elapsed time)
in the healthcheck handler, if that timestamp is less than N seconds old (1 <= N <= 60, I'd suggest 10), then treat that as "compiler pool is reachable". if it's more than N seconds old, then follow the current logic of making an RPC to check.
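a minimal sketch of the two steps above, to make the idea concrete. this is not the server's actual code: the class name `CompilerPoolHealth`, the `probe` callable (standing in for the existing RPC-based check), and the injectable `clock` parameter are all hypothetical, and the sketch is synchronous for brevity, whereas the real handler would presumably await the RPC.

```python
import time

# N = 10 seconds, expressed in nanoseconds to match perf_counter_ns.
FRESHNESS_WINDOW_NS = 10 * 1_000_000_000


class CompilerPoolHealth:
    """Hypothetical helper: skips the RPC probe if a compiler RPC succeeded recently."""

    def __init__(self, probe, window_ns=FRESHNESS_WINDOW_NS,
                 clock=time.perf_counter_ns):
        self._probe = probe              # stand-in for the existing RPC check
        self._window_ns = window_ns
        self._clock = clock              # injectable so it can be faked in tests
        self._last_success_ns = None     # no successful RPC seen yet

    def record_success(self):
        # Call this after every successful RPC to the compiler pool.
        self._last_success_ns = self._clock()

    def is_reachable(self):
        last = self._last_success_ns
        if last is not None and self._clock() - last < self._window_ns:
            # Fresh enough: treat the pool as reachable without an RPC.
            return True
        # Stale or never set: fall back to the current RPC-based check.
        ok = self._probe()
        if ok:
            self.record_success()
        return ok
```

under load, regular compile traffic keeps the timestamp fresh, so the healthcheck returns immediately and never waits in the queue. the RPC fallback only runs when the pool has been idle (or failing) for longer than N seconds.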
very-nice-to-have:
backportable to 5.x to allow patching existing Cloud instances without requiring an upgrade to 6.x