[Bug] Upgrading from ECK 2.1.0 to 2.2.0 Causes issues with Kibana and Fleet during Rolling Restart #5684
Comments
This issue also impacts the connection between Elasticsearch and Kibana.
Same thing on EKS :( with chart version 2.2.0. Kibana says:
While we are looking into a fix for the next release, there are currently two things you can do if you are affected:
For the second approach, for Kibana:
NAME=$WHATEVER_NAME_YOU_USED_FOR_THE_ELASTICSEARCH_CLUSTER
PW=$(kubectl get secret "$NAME-es-elastic-user" -o go-template='{{.data.elastic | base64decode }}')
# assuming you run this from outside the k8s cluster; keep the port-forward running in a separate shell
kubectl port-forward service/$NAME-es-http 9200
curl -k -u "elastic:$PW" -X POST "https://localhost:9200/_security/service/elastic/kibana/credential/token/issue-5684"
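The response contains the token value (shape: {"created": true, "token": {"name": "issue-5684", "value": "..."}}). Below is a minimal, hedged sketch of how that value could then be handed to Kibana; it is an illustration and an assumption, not necessarily the exact step the original comment described. It reuses the port-forward, $NAME and $PW from above, assumes jq is installed, assumes the Kibana resource is also named $NAME, and assumes the operator accepts elasticsearch.serviceAccountToken in the Kibana config.

```sh
# Issue a fresh token (the credential API rejects duplicate token names, so a new
# placeholder name is used here) and capture its value; jq is assumed to be installed.
TOKEN=$(curl -sk -u "elastic:$PW" -X POST \
  "https://localhost:9200/_security/service/elastic/kibana/credential/token/issue-5684-kibana" \
  | jq -r '.token.value')

# Hypothetical: hand the token to Kibana through Kibana's elasticsearch.serviceAccountToken
# setting on the Kibana resource (resource name assumed to be "$NAME"); the config change
# makes the operator roll out new Kibana pods.
kubectl patch kibana "$NAME" --type merge \
  -p "{\"spec\":{\"config\":{\"elasticsearch.serviceAccountToken\":\"$TOKEN\"}}}"
```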
This will quickly roll out a new Kibana replica set with a working configuration.
For Fleet Server: similar steps as above, but with a different service account token. The Fleet Server configuration also needs to pick the token up through an environment variable, as sketched below:
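The following is a hedged sketch of what those steps might look like, not the original commenter's exact commands: issue an API-based token for the elastic/fleet-server service account, store it in a secret, and expose it to the Fleet Server Agent through the FLEET_SERVER_SERVICE_TOKEN environment variable on the Agent's pod template. The Agent resource name agent-fleet-server, the secret name, and the token name are placeholders, and jq is assumed to be installed.

```sh
# Hypothetical sketch: issue an API-based token for the elastic/fleet-server service
# account (same port-forward, $NAME and $PW as above) and capture its value.
FLEET_TOKEN=$(curl -sk -u "elastic:$PW" -X POST \
  "https://localhost:9200/_security/service/elastic/fleet-server/credential/token/issue-5684-fleet" \
  | jq -r '.token.value')

# Store the token in a secret and reference it from the Agent pod template as
# FLEET_SERVER_SERVICE_TOKEN. "agent-fleet-server" is a placeholder Agent name, and a
# JSON merge patch replaces the whole containers list, so fold this into your manifest
# instead if you already customize the pod template.
kubectl create secret generic fleet-server-sa-token --from-literal=token="$FLEET_TOKEN"
kubectl patch agent agent-fleet-server --type merge -p '
spec:
  deployment:
    podTemplate:
      spec:
        containers:
        - name: agent
          env:
          - name: FLEET_SERVER_SERVICE_TOKEN
            valueFrom:
              secretKeyRef:
                name: fleet-server-sa-token
                key: token
'
```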
The reason this restores availability quickly is that these API-based tokens are immediately available throughout the Elasticsearch cluster, and rolling out a new Kibana or Fleet Server instance is usually quick unless you have many Kibana instances running.
We have a fix in #5830, which shipped with ECK 2.3, so I am closing this issue for now and will also update the known issue in the documentation for ECK 2.2.
Bug Report
What did you do?
I upgraded ECK from 2.1.0 to 2.2.0.
What did you expect to see?
I expected a rolling upgrade to happen (related #5648).
What did you see instead? Under which circumstances?
The rolling upgrade happened, but Kibana and Fleet were stuck in crash loops for much of the rolling restart.
Environment
ECK version: 2.1.0 -> 2.2.0
Kubernetes information:
v1.22.8+rke2r1
Resource definition:
Set up a "large" cluster across multiple availability zones (AZs), with the ClusterIP service pointing to a subset of nodes across the different AZs.
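The actual manifest is not reproduced here; the following is a hypothetical minimal sketch of the topology described above, not the reporter's real resource definition. The cluster name quickstart, the Elasticsearch version, the zone values, and the label selector on the custom Service (which relies on the elasticsearch.k8s.elastic.co/statefulset-name label that ECK sets on Elasticsearch pods) are all assumptions.

```sh
# Hypothetical sketch only: a multi-AZ Elasticsearch cluster plus a ClusterIP Service
# that targets just one nodeSet as the "ingress" nodes. Names, version, and zones are placeholders.
kubectl apply -f - <<'EOF'
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: quickstart
spec:
  version: 8.2.0
  nodeSets:
  - name: zone-a
    count: 3
    podTemplate:
      spec:
        nodeSelector:
          topology.kubernetes.io/zone: zone-a
  - name: zone-b
    count: 3
    podTemplate:
      spec:
        nodeSelector:
          topology.kubernetes.io/zone: zone-b
---
apiVersion: v1
kind: Service
metadata:
  name: quickstart-ingress-nodes
spec:
  type: ClusterIP
  selector:
    elasticsearch.k8s.elastic.co/cluster-name: quickstart
    elasticsearch.k8s.elastic.co/statefulset-name: quickstart-es-zone-a
  ports:
  - name: https
    port: 9200
    targetPort: 9200
EOF
```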
Logs:
Relevant Kibana Log
Issue/Bug:
It appears that with the switch to service accounts for Kibana and Fleet Server (#5468), it can take a significant amount of time in large clusters before all of the ingress nodes have the new service account tokens available and can successfully authenticate Kibana and Fleet.
In large clusters where a rolling restart can take hours, this can leave Kibana and Fleet unusable for some time.
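As an illustration (not part of the original report), token propagation can be checked with the Elasticsearch get service account credentials API, assuming a version recent enough to report nodes_credentials; nodes that do not yet have the file-backed token will reject Kibana's authentication until they have been restarted with the new token file. The connection details reuse the port-forward and $PW from the workaround above.

```sh
# List credentials for the elastic/kibana service account; the
# "nodes_credentials.file_tokens.<name>.nodes" field in the response shows which
# Elasticsearch nodes currently know about each file-backed token.
curl -sk -u "elastic:$PW" \
  "https://localhost:9200/_security/service/elastic/kibana/credential" | jq .
```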
Steps to Reproduce:
(Note: I'm providing a setup similar to mine as I know it's reproducible with it, but it might also be reproducible with a smaller setup.)
1. Deploy the setup described above with ECK 2.1.0.
2. Upgrade ECK to 2.2.0 and observe the rolling restart.
I would expect ECK either not to start a rolling restart of Kibana and Fleet Server until after the rolling restart of the Elasticsearch cluster has completed, or not to switch over to the new service account authentication method until the rolling restart has completed.