Stuck after reboot #1589
Comments
Hello @jknipper, thanks for raising this. Can you tell us since which release you have noticed this issue? It seems to be an issue with the disk; do you have a specific configuration for your instances?
From what I can see in our alerts and logs, it started occurring regularly in July or August. We run the stable release and update all instances shortly after a new release is published. It's hard to tell whether this is on the hypervisor side or an issue with Flatcar. There was no real change in how the instances are provisioned; maybe something changed on the VMware side. I'll try to find out.
Do all the instances on the hypervisor reboot at the same time?
No, reboots are managed by an operator in Kubernetes and are pretty much random. What is interesting in that screenshot is that the filesystem check for that same device seems to succeed, but the mount afterwards fails. Looking at the Flatcar release history, our observations seem to line up with the switch from kernel version 6.1.x to 6.6.x, but that could also be a coincidence.
systemd-fsck runs fsck as a child process, and fsck may exit with a non-zero return code. How can I see the log message from https://github.com/systemd/systemd/blob/4d65c9f70c866e678bd1c408a0a9ebf9046c5db0/src/fsck/fsck.c#L399 in the console output during boot? It seems that boot logs are not saved to the root partition, so they are lost after the next reboot. Would it make sense to add this unit add-on by default?
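For anyone who does manage to capture the fsck return code from the console: per the fsck(8) manual page it is a bitwise OR of condition flags, not a plain error number. Below is a minimal Python sketch for decoding it. The function name `decode_fsck_status` is made up for illustration and is not part of systemd or Flatcar.

```python
# Bit flags as documented in the fsck(8) manual page.
# The exit status of fsck is the bitwise OR of the conditions that occurred.
FSCK_EXIT_FLAGS = {
    1: "filesystem errors corrected",
    2: "system should be rebooted",
    4: "filesystem errors left uncorrected",
    8: "operational error",
    16: "usage or syntax error",
    32: "checking canceled by user request",
    128: "shared-library error",
}


def decode_fsck_status(status: int) -> list[str]:
    """Return the human-readable conditions encoded in an fsck exit status.

    Hypothetical helper for illustration only.
    """
    if status == 0:
        return ["no errors"]
    # Collect every documented flag whose bit is set in the status.
    return [meaning for bit, meaning in FSCK_EXIT_FLAGS.items() if status & bit]


if __name__ == "__main__":
    for rc in (0, 1, 4, 12):
        print(rc, decode_fsck_status(rc))
```

A status of 1 means fsck repaired something and the boot can usually continue, while a status with bit 4 set means errors were left uncorrected.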
We discovered that we're using the default OpenStack Flatcar image, while our OpenStack hypervisor is based on VMware, and there is no open-vm-tools in the OpenStack image. Therefore a regular VM reboot acts as a hard reboot. I tried to reboot a Flatcar VM based on the VMware image with a tiny adjustment on the OEM filesystem: qemu image - https://s3.gifyu.com/images/bbEbQ.gif
Now I have a question: how can I adjust boot parameters using Ignition without the
P.S. Since when does the basic Flatcar image not contain open-vm-tools? Previously we didn't have such reboot issues. We also noticed that systemd was updated from 252 to 255 on Aug 7, 2024. Could this be another reason for our issue?
Thanks for getting back on this and for your investigation.
I am confused: if you run OpenStack on top of VMware, you should just run OpenStack images, or did I miss something?
Right, since the very beginning (CoreOS) we have been using OpenStack images, and unfortunately they don't contain open-vm-tools. My finding is that the lack of open-vm-tools causes an incorrect VM reboot: the VM doesn't try to reach the
UPD: I'm trying to find a way to tell the Flatcar VMware image that it is running in an OpenStack environment. Unfortunately, modifying the EFI partition's
doesn't help. The only way I found is to modify the Btrfs OEM partition's
Are there any other methods to force the VMware image to think that it is OpenStack without modifying the image? What if open-vm-tools works around some Flatcar bug that does not allow it to reach
In the first case I got a broadcast message in SSH and my SSH connection was closed by the remote server:
With the stopped
I also tested Flatcar OpenStack releases down to 3602.2.3; they all behave the same way: the SSH connection gets stuck and there is no broadcast reboot message. In addition, I enabled the acpid service with logging, but there were no logs at all during the reboot. It looks like we were lucky not to have faced the broken partition issue earlier.
TL;DR: we're using VMware hypervisors for our OpenStack environment. When we use
Solution found - modify
@kayrus using
Another thing you could try is to use the
I don't think this will help. VMware hypervisors don't send ACPI events on reboot; they rely on open-vm-tools, and when the hypervisor doesn't see this tool running, it forces the VM reboot.
This might be an option. I'll try this and let you know if it helps.
Using the Ignition config below did the trick, thank you!
@kayrus good that it works for you - but I'll investigate deeper. It's not supposed to work this way haha.
Description
From time to time we see servers stuck in the boot process after a reboot. This only happens in rare cases, after an OS update has been applied.
Impact
The server is stuck after reboot and needs to be rebooted manually a second time to bring it up.
Environment and steps to reproduce
Expected behavior
The machine boots without interruption.
Additional information
From the attached screenshot of the server console it seems that the boot process got stuck while trying to mount sysroot.
This issue started some time ago and is hard for us to debug. Any suggestions on how we could investigate this further are greatly appreciated!