[RayService] e2e test to check the readiness of head Pods for both pending / active clusters #2806
Conversation
Force-pushed from 2f29285 to c314ca1
g.Eventually(func(g Gomega) int {
	heads, err := test.Client().Core().CoreV1().Pods(namespace.Name).List(test.Ctx(), metav1.ListOptions{
		LabelSelector: "ray.io/serve=true",
	})
	g.Expect(err).NotTo(HaveOccurred())
	return len(heads.Items)
}, TestTimeoutShort).Should(Equal(2))
Since we manually delete the label from the active cluster before the upgrade, this condition ensures we check the readiness for both active and pending clusters.
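A minimal sketch of that manual label removal, assuming a hypothetical headPodName variable for the active cluster's head Pod, the same test helpers used in the snippet above, and the types package from k8s.io/apimachinery (the exact mechanism in the PR may differ):

// Hypothetical sketch: strip the ray.io/serve label from the active head Pod
// so the Eventually block above must wait for the controller to restore it on
// both the active and pending heads. The "/" in the label key is escaped as
// "~1" per JSON Pointer rules.
patch := []byte(`[{"op": "remove", "path": "/metadata/labels/ray.io~1serve"}]`)
_, err := test.Client().Core().CoreV1().Pods(namespace.Name).Patch(
	test.Ctx(), headPodName, types.JSONPatchType, patch, metav1.PatchOptions{})
g.Expect(err).NotTo(HaveOccurred())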
Discussed offline:
- Create a RayService.
- Wait until the active cluster is running and has X endpoints.
- Trigger a zero-downtime upgrade. The pending cluster should take a long time to become ready (e.g., add an init container that sleeps).
- Make the proxy actor on the active cluster's head Pod fail so it can't receive requests.
- Check that the number of endpoints becomes X-1 (see the sketch after this list).
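A rough sketch of the endpoint-count assertion from the last step, reusing the test helpers and TestTimeoutShort from the snippet above and assuming the Serve service is backed by an Endpoints object; serveServiceName and expectedEndpoints are placeholders, not names from this PR:

g.Eventually(func(g Gomega) int {
	// Count the ready addresses behind the Serve service after the proxy
	// actor on the active head fails.
	endpoints, err := test.Client().Core().CoreV1().Endpoints(namespace.Name).Get(
		test.Ctx(), serveServiceName, metav1.GetOptions{})
	g.Expect(err).NotTo(HaveOccurred())
	total := 0
	for _, subset := range endpoints.Subsets {
		total += len(subset.Addresses) // Addresses holds only ready endpoints
	}
	return total
}, TestTimeoutShort).Should(Equal(expectedEndpoints - 1))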
serveConfig = strings.Replace(serveConfig, "factor: 5", "factor: 3", -1)

// modify EnableInTreeAutoscaling to trigger a zero downtime upgrade.
rayService.Spec.RayClusterSpec.EnableInTreeAutoscaling = ptr.To[bool](true)
RayVersion seems to be a safer option.
The newly added init container can already trigger the zero downtime upgrade.
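For reference, a delaying init container could look roughly like the following, assuming the head group's Pod template is reachable via rayService.Spec.RayClusterSpec.HeadGroupSpec.Template; the image and sleep duration are arbitrary and the PR's actual init container may differ:

// Hypothetical sketch: an init container that sleeps so the pending cluster's
// head Pod stays not-ready for a while after the zero-downtime upgrade starts.
rayService.Spec.RayClusterSpec.HeadGroupSpec.Template.Spec.InitContainers = append(
	rayService.Spec.RayClusterSpec.HeadGroupSpec.Template.Spec.InitContainers,
	corev1.Container{
		Name:    "sleep",
		Image:   "busybox:1.36",
		Command: []string{"sh", "-c", "sleep 60"},
	},
)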
Force-pushed from 4bd018c to ba5bea2 (commit "…g / active clusters", Signed-off-by: Rueian <[email protected]>)
@@ -24,10 +24,6 @@ func TestRayServiceInPlaceUpdate(t *testing.T) {

	rayServiceAC := rayv1ac.RayService(rayServiceName, namespace.Name).WithSpec(rayServiceSampleYamlApplyConfiguration())

	// TODO: This test will fail on Ray 2.40.0. Pin the Ray version to 2.9.0 as a workaround. Need to remove this after the issue is fixed.
	rayServiceAC.Spec.RayClusterSpec.WithRayVersion("2.9.0")
2.41.0 works now.
Force-pushed from ba5bea2 to 34e10d4 (commit "…g / active clusters", Signed-off-by: Rueian <[email protected]>)
LGTM
I will open a follow-up PR if needed.
Why are these changes needed?
Resolves #2787
Related issue number
Checks