timeouts preventing installer from completing properly #843

Closed
kikisdeliveryservice opened this issue Dec 8, 2018 · 10 comments

@kikisdeliveryservice
Contributor

Version

$ bin/openshift-install version
bin/openshift-install v0.5.0-master-47-gf7d6d2923a7979344fcac33293084051cecc8aab

Platform (aws|libvirt|openstack):

libvirt

What happened?

Installer did not complete, hit timeouts:
ERROR: logging before flag.Parse: E1207 18:24:44.544541 16824 streamwatcher.go:109] Unable to decode an event from the watch stream: http2: server sent GOAWAY and closed the connection; LastStreamID=3, ErrCode=NO_ERROR, debug=""
WARNING RetryWatcher - getting event failed! Re-creating the watcher. Last RV: 210
WARNING Failed to connect events watcher: Get https://test1-api.tt.testing:6443/api/v1/namespaces/kube-system/events?resourceVersion=210&watch=true: dial tcp 192.168.126.10:6443: connect: connection refused
WARNING Failed to connect events watcher: Get https://test1-api.tt.testing:6443/api/v1/namespaces/kube-system/events?resourceVersion=210&watch=true: dial tcp 192.168.126.11:6443: connect: connection refused
WARNING Failed to connect events watcher: Get https://test1-api.tt.testing:6443/api/v1/namespaces/kube-system/events?resourceVersion=210&watch=true: dial tcp 192.168.126.11:6443: connect: connection refused
WARNING Failed to connect events watcher: Get https://test1-api.tt.testing:6443/api/v1/namespaces/kube-system/events?resourceVersion=210&watch=true: dial tcp 192.168.126.11:6443: connect: connection refused
WARNING Failed to connect events watcher: Get https://test1-api.tt.testing:6443/api/v1/namespaces/kube-system/events?resourceVersion=210&watch=true: dial tcp 192.168.126.11:6443: connect: connection refused
WARNING Failed to connect events watcher: Get https://test1-api.tt.testing:6443/api/v1/namespaces/kube-system/events?resourceVersion=210&watch=true: dial tcp 192.168.126.10:6443: connect: connection refused
WARNING Failed to connect events watcher: Get https://test1-api.tt.testing:6443/api/v1/namespaces/kube-system/events?resourceVersion=210&watch=true: dial tcp 192.168.126.10:6443: connect: connection refused
WARNING Failed to connect events watcher: Get https://test1-api.tt.testing:6443/api/v1/namespaces/kube-system/events?resourceVersion=210&watch=true: dial tcp 192.168.126.10:6443: connect: connection refused
WARNING Failed to connect events watcher: Get https://test1-api.tt.testing:6443/api/v1/namespaces/kube-system/events?resourceVersion=210&watch=true: dial tcp 192.168.126.11:6443: connect: connection refused
FATAL Error executing openshift-install: waiting for bootstrap-complete: watch closed before UntilWithoutRetry timeout

I also ran oc get pods --all-namespaces and got:
$ oc get pods --all-namespaces
NAMESPACE                   NAME                                        READY     STATUS    RESTARTS   AGE
openshift-cluster-version   cluster-version-operator-5bd8d79d6c-nnj92   0/1       Pending   0          25m

What you expected to happen?

I expected the installer to complete successfully and not time out.

How to reproduce it (as minimally and precisely as possible)?

I just ran the installer.

References

I have been experiencing sporadic timeouts (though not during install) as described here: openshift/origin#21612

@kikisdeliveryservice changed the title from "timeouts preventing installed from completing." to "timeouts preventing installer from completing properly" on Dec 8, 2018
@rwsu
Contributor

rwsu commented Dec 8, 2018

I'm seeing a similar issue during install. Also using libvirt.

INFO Waiting 30m0s for the Kubernetes API...      
INFO Waiting 30m0s for the bootstrap-complete event... 
WARNING Failed to connect events watcher: Get https://demo-api.localdomain:6443/api/v1/namespaces/kube-system/events?watch=true: dial tcp: lookup demo-api.localdomain on 127.0.0.1:53: no such host 
WARNING Failed to connect events watcher: Get https://demo-api.localdomain:6443/api/v1/namespaces/kube-system/events?watch=true: dial tcp: lookup demo-api.localdomain on 127.0.0.1:53: no such host 
WARNING Failed to connect events watcher: Get https://demo-api.localdomain:6443/api/v1/namespaces/kube-system/events?watch=true: dial tcp: lookup demo-api.localdomain on 127.0.0.1:53: no such host 
WARNING Failed to connect events watcher: Get https://demo-api.localdomain:6443/api/v1/namespaces/kube-system/events?watch=true: dial tcp: lookup demo-api.localdomain on 127.0.0.1:53: no such host 
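
The warnings quoted so far in this thread appear to fall into two groups: "connection refused" on an address that did resolve, and "no such host" where the API name did not resolve at all. As a rough aid for telling them apart in a saved installer log, here is a minimal sketch. It assumes the message formats shown in the logs above; `classify_warnings` is a hypothetical helper, not part of openshift-install.

```python
import re
from collections import Counter

# Patterns based on the WARNING lines quoted in this thread (an assumption:
# other installer versions may word these messages differently).
REFUSED = re.compile(r"dial tcp (\d+\.\d+\.\d+\.\d+):(\d+): connect: connection refused")
NO_HOST = re.compile(r"lookup (\S+) on \S+: no such host")


def classify_warnings(lines):
    """Count events-watcher failures by cause and target.

    Returns a Counter keyed by (cause, target), where target is the IP
    address for refused connections or the hostname for failed lookups.
    """
    counts = Counter()
    for line in lines:
        if "Failed to connect events watcher" not in line:
            continue  # ignore INFO/DEBUG/FATAL and unrelated warnings
        refused = REFUSED.search(line)
        no_host = NO_HOST.search(line)
        if refused:
            counts[("connection refused", refused.group(1))] += 1
        elif no_host:
            counts[("no such host", no_host.group(1))] += 1
        else:
            counts[("other", "")] += 1
    return counts
```

Feeding it the lines of a log file (for example `classify_warnings(open(path))`, with `path` being wherever the output was saved) gives a per-target tally, which may help separate a DNS problem on the host from an API server that is simply not up yet.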

@jpeeler

jpeeler commented Dec 9, 2018

I was getting the exact same error messages as the original reported issue #843 (comment), consistently. What I did that ended up working was:
got new pull secret
removed ~/.cache/openshift-install/libvirt
git pulled to f7d6d29 and rebuilt installer
installed via bin/openshift-install --dir jefftmp/ create cluster (based off #836 (comment))

I'm not really sure which steps were required, but I suspect previous retry attempts got something in a weird state and --dir was the trick that enabled the freshest bits to work properly.

@jiajliu
Contributor

jiajliu commented Dec 10, 2018

Also hit it when creating a cluster on AWS.

# ./openshift-install create cluster --dir demo
INFO Using Terraform to create cluster...         
INFO Waiting for bootstrap completion...          
INFO API v1.11.0+9fc67b7 up                       
WARNING RetryWatcher - getting event failed! Re-creating the watcher. Last RV: 57 
WARNING Failed to connect events watcher: Get https://jliu-test-api.devcluster.openshift.com:6443/api/v1/namespaces/kube-system/events?resourceVersion=57&watch=true: dial tcp 18.215.199.119:6443: connect: connection refused 
WARNING Failed to connect events watcher: Get https://jliu-test-api.devcluster.openshift.com:6443/api/v1/namespaces/kube-system/events?resourceVersion=57&watch=true: dial tcp 54.204.150.177:6443: connect: connection refused 
WARNING Failed to connect events watcher: Get https://jliu-test-api.devcluster.openshift.com:6443/api/v1/namespaces/kube-system/events?resourceVersion=57&watch=true: dial tcp 18.215.199.119:6443: connect: connection refused 
WARNING Failed to connect events watcher: Get https://jliu-test-api.devcluster.openshift.com:6443/api/v1/namespaces/kube-system/events?resourceVersion=57&watch=true: dial tcp 50.16.136.243:6443: connect: connection refused 
WARNING Failed to connect events watcher: Get https://jliu-test-api.devcluster.openshift.com:6443/api/v1/namespaces/kube-system/events?resourceVersion=57&watch=true: dial tcp 18.233.42.68:6443: connect: connection refused 
WARNING Failed to connect events watcher: Get https://jliu-test-api.devcluster.openshift.com:6443/api/v1/namespaces/kube-system/events?resourceVersion=57&watch=true: dial tcp 107.20.62.160:6443: connect: connection refused 
WARNING Failed to connect events watcher: Get https://jliu-test-api.devcluster.openshift.com:6443/api/v1/namespaces/kube-system/events?resourceVersion=57&watch=true: dial tcp 18.204.97.206:6443: connect: connection refused 
WARNING Failed to connect events watcher: Get https://jliu-test-api.devcluster.openshift.com:6443/api/v1/namespaces/kube-system/events?resourceVersion=57&watch=true: dial tcp 54.204.150.177:6443: connect: connection refused 
FATAL Error executing openshift-install: waiting for bootstrap-complete: watch closed before UntilWithoutRetry timeout 
# ./openshift-install version
./openshift-install v0.5.0
Terraform v0.11.8

Your version of Terraform is out of date! The latest version
is 0.11.10. You can update by downloading from www.terraform.io/downloads.html

After destroying the cluster and removing the demo directory, re-running "./openshift-install create cluster --dir demo" works well.

@cgwalters
Member

A lot of duplicate discussion of this one, e.g. #857

The install I just did also shows the "WARNING RetryWatcher - getting event failed! Re-creating the watcher. Last RV: 57" message.

However:

$ oc get events -n kube-system bootstrap-complete
LAST SEEN   FIRST SEEN   COUNT     NAME                 KIND      SUBOBJECT   TYPE      REASON    SOURCE                      MESSAGE
4m          4m           1         bootstrap-complete                                             cluster, osiris-bootstrap   cluster bootstrapping has completed

So I am suspecting the problem is with the logic for how we retry in the installer waiting for the event.
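
To illustrate the kind of retry logic being suspected here, the following is a minimal sketch of a "list first, then watch, and re-list after every reconnect" loop. It is not the installer's actual code (which is Go), and `list_events` and `watch_events` are hypothetical stand-ins for the real API calls. The point is only that an event which fired while the watch was down is still found on the next attempt.

```python
import time


def wait_for_event(list_events, watch_events, name, timeout, sleep=time.sleep):
    """Return True once an event called `name` is observed, False on timeout.

    list_events()  -> iterable of names of events that already exist
    watch_events() -> iterator of newly created event names; may raise
                      ConnectionError or simply end when the watch closes
    Both callables are hypothetical placeholders for real API requests.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # Re-list on every attempt, so an event created while the
            # watch was disconnected is not missed.
            if name in list_events():
                return True
            for event in watch_events():
                if event == name:
                    return True
        except ConnectionError:
            pass  # API unreachable right now; retry below
        sleep(1)  # brief pause before re-listing and re-watching
    return False
```

In this sketch a closed watch or a refused connection only triggers another attempt, never a fatal error; the overall deadline alone decides when to give up. A real implementation would also need the watch itself to respect the deadline, which is left out here.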

@kikisdeliveryservice
Contributor Author

I also saw this today, like @cgwalters, and was unable to successfully bring a cluster up:

WARNING RetryWatcher - getting event failed! Re-creating the watcher. Last RV: 2074 
DEBUG added kube-scheduler.156fa66448e10b57: test1-master-0_3d908c00-fe34-11e8-88e4-a221fd594322 became leader 
DEBUG added kube-controller-manager.156fa66cd218aa95: test1-master-0_67fc4244-fe34-11e8-b0bd-a221fd594322 became leader 
FATAL Error executing openshift-install: waiting for bootstrap-complete: timed out waiting for the condition 
DEBUG Stopping RetryWatcher. 

@jianlinliu
Contributor

I think this is because your pull secret is out of date; update it from try.openshift.com.

@wzheng1

wzheng1 commented Dec 17, 2018

I changed the cluster name and --dir to completely new ones, and it succeeded without this error with the version below:
bin/openshift-install v0.7.0-master-6-g8f02020b59147c933a08c5e248a8e2c69dad24ae

@thomasmckay

Getting new auth from try.openshift.com fixed this for me as well. Having feedback that this is the cause would be very welcome to this new user.

It is also unclear how to re-enter the new auth. Is there a way to re-prompt during openshift-install create cluster? The --help doesn't indicate anything.

@wking
Member

wking commented Jan 2, 2019

> It is also unclear how to re-enter the new auth. Is there a way to re-prompt during openshift-install create cluster?

More on this here.

@eparis
Member

eparis commented Feb 19, 2019

Hopefully at this point most or all of these have been resolved. If there are further bugs, please open a bugzilla.

@eparis eparis closed this as completed Feb 19, 2019