Stricter notion of esReacheable: require health response #5796

pebrc · 2022-06-17T16:50:16Z

Simplest (?) possible fix that does not require extra requests. The goal is to avoid confusing error messages like "could not verify license" and instead report a meaningful status like "Elasticsearch is not reachable". The implication is that esReachable is now tied to the first successful observation of cluster health. These observations happen asynchronously in the observer mechanism, not inside the reconciliation loop. I think that is OK because the cluster is never available immediately anyway: there is always some delay, depending on hardware and on the time it takes for the ES container in the Pods to become ready. If the timing is unfortunate we might lose an additional 10 seconds (the current observation interval), which seems acceptable to me.

pebrc · 2022-06-20T12:08:18Z

run/e2e-tests tags=es

barkbay

Thanks for raising this PR, I agree that the current status may be confusing.
~~I must confess, however, that I'm a bit hesitant about how to improve things, as there can be two reasons for having an "unknown" state:~~

~~The operator has not attempted to reach the cluster yet, in which case I would expect the condition to be corev1.ConditionUnknown. We don't really know whether Elasticsearch is responding to requests.~~
The operator attempted to get the cluster state, but an error occurred (timeout, credentials issue...). Health is still unknown, but we now have an error to explain why. The condition should then be corev1.ConditionFalse, perhaps with the error as the condition message.

Edit: Sorry I was focusing on the code while working on the PR and forgot your comment 🤦

~~That being said, you could argue that the observer should attempt a first connection pretty quickly once the operator is started or when a cluster is created, so maybe I'm overthinking...~~

pkg/controller/elasticsearch/driver/driver.go

barkbay

> If the timing is unfortunate we might lose an additional 10 seconds (which is the current observation interval) which seems acceptable to me.

Same. We could still consider it as a first step and improve it later if needed.

Co-authored-by: Michael Morello <[email protected]>

pebrc · 2022-06-24T14:13:06Z

cla/check


stricter notion of esReacheable: require health response

7b893ce

botelastic bot added the triage label Jun 17, 2022

pebrc added the >enhancement Enhancement of existing functionality label Jun 17, 2022

botelastic bot removed the triage label Jun 17, 2022

pebrc added triage v2.4.0 labels Jun 17, 2022

botelastic bot removed the triage label Jun 17, 2022

pebrc marked this pull request as ready for review June 22, 2022 06:41

pebrc marked this pull request as draft June 22, 2022 06:50

adjust status message depending on reachability

ddf0a00

pebrc marked this pull request as ready for review June 22, 2022 08:52

barkbay reviewed Jun 23, 2022


pkg/controller/elasticsearch/driver/driver.go (outdated, resolved)

barkbay approved these changes Jun 23, 2022


pebrc and others added 2 commits June 24, 2022 09:12

Update pkg/controller/elasticsearch/driver/driver.go

e6e0e91

Co-authored-by: Michael Morello <[email protected]>

add unit test

d393c24

pebrc merged commit 444de57 into elastic:main Jun 24, 2022

This was referenced Jan 4, 2023

Cluster license cannot be updated by ECK if it has already expired #6274

Closed

Try to reconcile license even in absence of known health status #6278

Merged
