no longer able to update Silverblue due to grub2-mkconfig failing #3715

Closed
gdesmott opened this issue Jun 1, 2022 · 12 comments

Comments

@gdesmott

gdesmott commented Jun 1, 2022

Host system details

State: idle
Warning: failed to finalize previous deployment
         error: Bootloader write config: grub2-mkconfig: Child process exited with code 1
         check `journalctl -b -1 -u ostree-finalize-staged.service`
Deployments:
  fedora:fedora/35/x86_64/silverblue
                   Version: 35.20220509.0 (2022-05-09T00:45:48Z)
                BaseCommit: ab8b2fc29f7d091a774fea9e806ee5c08013dea67816579b00f176b4016b381d
              GPGSignature: Valid signature by 787EA6AE1147EEE56C40B30CDB4639719867C58F
                      Diff: 45 removed
       RemovedBasePackages: thermald 2.4.8-3.fc35
           LayeredPackages: chromium gnome-tweaks google-chrome-stable kernel-tools libva-utils NetworkManager-l2tp-gnome nordvpn podman-compose qemu-user qemu-user-binfmt simple-scan throttled vim-X11
                            wireguard-tools
             LocalPackages: nordvpn-release-1.0.0-1.noarch

  fedora:fedora/35/x86_64/silverblue
                   Version: 35.20220509.0 (2022-05-09T00:45:48Z)
                BaseCommit: ab8b2fc29f7d091a774fea9e806ee5c08013dea67816579b00f176b4016b381d
              GPGSignature: Valid signature by 787EA6AE1147EEE56C40B30CDB4639719867C58F
       RemovedBasePackages: thermald 2.4.8-3.fc35
           LayeredPackages: chromium ffmpeg gnome-tweaks google-chrome-stable intel-media-driver kernel-tools libva-utils NetworkManager-l2tp-gnome nordvpn podman-compose qemu-user qemu-user-binfmt simple-scan
                            throttled vim-X11 wireguard-tools
             LocalPackages: nordvpn-release-1.0.0-1.noarch

● fedora:fedora/35/x86_64/silverblue
                   Version: 35.20220509.0 (2022-05-09T00:45:48Z)
                BaseCommit: ab8b2fc29f7d091a774fea9e806ee5c08013dea67816579b00f176b4016b381d
              GPGSignature: Valid signature by 787EA6AE1147EEE56C40B30CDB4639719867C58F
       RemovedBasePackages: thermald 2.4.8-3.fc35
           LayeredPackages: chromium ffmpeg gnome-tweaks google-chrome-stable intel-media-driver kernel-tools libva-utils NetworkManager-l2tp-gnome nordvpn podman-compose qemu-user qemu-user-binfmt simple-scan
                            throttled vim-X11 wireguard-tools
             LocalPackages: nordvpn-release-1.0.0-1.noarch rpmfusion-free-release-35-1.noarch rpmfusion-nonfree-release-35-1.noarch

Expected vs actual behavior

I'm no longer able to update my Silverblue system. rpm-ostree update appears to succeed, but the staged changes are discarded on reboot.

rpm-ostree status does say something went wrong:

$ rpm-ostree status
State: idle
Warning: failed to finalize previous deployment
         error: Bootloader write config: grub2-mkconfig: Child process exited with code 1
         check `journalctl -b -1 -u ostree-finalize-staged.service`

$ journalctl -b -1 -u ostree-finalize-staged.service
jun 01 12:27:31 cass-t14 systemd[1]: Finished OSTree Finalize Staged Deployment.
jun 01 13:20:42 cass-t14 systemd[1]: Stopping OSTree Finalize Staged Deployment...
jun 01 13:20:42 cass-t14 ostree[15261]: Finalizing staged deployment
jun 01 13:20:43 cass-t14 ostree[15261]: Copying /etc changes: 18 modified, 0 removed, 89 added
jun 01 13:20:43 cass-t14 ostree[15261]: Copying /etc changes: 18 modified, 0 removed, 89 added
jun 01 13:20:43 cass-t14 ostree[15261]: Regex version mismatch, expected: 10.39 2021-10-29 actual: 10.40 2022-04-14
jun 01 13:20:43 cass-t14 ostree[15261]: Regex version mismatch, expected: 10.39 2021-10-29 actual: 10.40 2022-04-14
jun 01 13:20:43 cass-t14 ostree[15261]: Regex version mismatch, expected: 10.39 2021-10-29 actual: 10.40 2022-04-14
jun 01 13:20:43 cass-t14 ostree[15261]: Regex version mismatch, expected: 10.39 2021-10-29 actual: 10.40 2022-04-14
jun 01 13:20:44 cass-t14 ostree[15261]: error: Bootloader write config: grub2-mkconfig: Child process exited with code 1
jun 01 13:20:44 cass-t14 systemd[1]: ostree-finalize-staged.service: Control process exited, code=exited, status=1/FAILURE
jun 01 13:20:44 cass-t14 systemd[1]: ostree-finalize-staged.service: Failed with result 'exit-code'.
jun 01 13:20:44 cass-t14 systemd[1]: Stopped OSTree Finalize Staged Deployment.
jun 01 13:20:44 cass-t14 systemd[1]: ostree-finalize-staged.service: Consumed 1.822s CPU time.

I tried running the finalize-staged step manually:

$ rpm-ostree update

(...)

$ sudo ostree admin finalize-staged
Copying /etc changes: 18 modified, 0 removed, 89 added
Regex version mismatch, expected: 10.39 2021-10-29 actual: 10.40 2022-04-14
Regex version mismatch, expected: 10.39 2021-10-29 actual: 10.40 2022-04-14
Regex version mismatch, expected: 10.39 2021-10-29 actual: 10.40 2022-04-14
Regex version mismatch, expected: 10.39 2021-10-29 actual: 10.40 2022-04-14
error: Bootloader write config: grub2-mkconfig: Child process exited with code 1

And indeed grub2 is failing:

# grub2-mkconfig -o /boot/grub2/grub.cfg 
Generating grub configuration file ...
/usr/sbin/grub2-probe: error: ../grub-core/kern/fs.c:120:unknown filesystem.

# grub2-mkconfig -V
grub2-mkconfig (GRUB) 2.06

This guy suggested that it's a GRUB 2.06 regression, but I'm not sure how to downgrade it properly, as I'm no longer able to add a new deployment to GRUB.

@jlebon
Member

jlebon commented Jun 3, 2022

Doing grub2-mkconfig manually is not exactly the same. To be sure that's the error, try redoing the ostree admin finalize-staged test with OSTREE_DEBUG_GRUB2=1. Try also running it with GRUB_DISABLE_OS_PROBER=true. If that works, you can add that to /etc/default/grub.

Only do this if you're not dual booting two different operating systems though (or you are, but the Silverblue bootloader is the one being chainloaded). A more nuclear option in that case would be to switch to using bootloader=none and blscfg, but that requires verifying a few things first to make sure you don't brick your machine.
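
For anyone following along, the /etc/default/grub change suggested above could be made with something like the sketch below. The helper name set_default is hypothetical (it is not an ostree or GRUB tool), and the demo runs against a scratch file rather than the real config, so treat it as an illustration under those assumptions.

```shell
# Set KEY=VALUE in a shell-style defaults file: replace an existing KEY= line,
# or append a new one. "set_default" is a hypothetical helper name.
set_default() {
    file=$1 key=$2 value=$3
    if grep -q "^${key}=" "$file" 2>/dev/null; then
        tmp=$(mktemp) || return 1
        # Rewrite through a temp file so the original inode and permissions are kept.
        sed "s|^${key}=.*|${key}=${value}|" "$file" > "$tmp" && cat "$tmp" > "$file"
        rm -f "$tmp"
    else
        printf '%s=%s\n' "$key" "$value" >> "$file"
    fi
}

# Demo on a scratch file; the real target would be /etc/default/grub (as root).
scratch=$(mktemp)
printf 'GRUB_TIMEOUT=5\n' > "$scratch"
set_default "$scratch" GRUB_DISABLE_OS_PROBER true
set_default "$scratch" GRUB_DISABLE_OS_PROBER true   # second call must not add a duplicate line
cat "$scratch"
```

The dual-boot caveat above still applies before pointing this at the real file.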

@gdesmott
Author

gdesmott commented Jun 3, 2022

Thanks for your reply @jlebon

> To be sure that's the error, try redoing the ostree admin finalize-staged test with OSTREE_DEBUG_GRUB2=1

Doesn't say much:

$ OSTREE_DEBUG_GRUB2=1 sudo ostree admin finalize-staged
Copying /etc changes: 18 modified, 0 removed, 89 added
Regex version mismatch, expected: 10.39 2021-10-29 actual: 10.40 2022-04-14
Regex version mismatch, expected: 10.39 2021-10-29 actual: 10.40 2022-04-14
Regex version mismatch, expected: 10.39 2021-10-29 actual: 10.40 2022-04-14
Regex version mismatch, expected: 10.39 2021-10-29 actual: 10.40 2022-04-14
error: Bootloader write config: grub2-mkconfig: Child process exited with code 1

> Try also running it with GRUB_DISABLE_OS_PROBER=true

Does not help either:

$ GRUB_DISABLE_OS_PROBER=true OSTREE_DEBUG_GRUB2=1 sudo ostree admin finalize-staged
Copying /etc changes: 18 modified, 0 removed, 89 added
Regex version mismatch, expected: 10.39 2021-10-29 actual: 10.40 2022-04-14
Regex version mismatch, expected: 10.39 2021-10-29 actual: 10.40 2022-04-14
Regex version mismatch, expected: 10.39 2021-10-29 actual: 10.40 2022-04-14
Regex version mismatch, expected: 10.39 2021-10-29 actual: 10.40 2022-04-14
error: Bootloader write config: grub2-mkconfig: Child process exited with code 1

@jlebon
Member

jlebon commented Jun 3, 2022

sudo drops the env vars. You need to either use sudo -E or sudo env FOO=BAR ... CMD ARGS....
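
A small sketch of why the placement of the variable matters. Here env -i, which starts the child with an empty environment, stands in for sudo's environment reset so the demo runs without root; whether real sudo drops a given variable depends on the local sudoers policy, so this is an analogy rather than a reproduction.

```shell
# Variable set in front of the resetting command: the child never sees it.
dropped=$(OSTREE_DEBUG_GRUB2=1 env -i /bin/sh -c 'echo "${OSTREE_DEBUG_GRUB2:-unset}"')

# Variable set on the far side of the reset, as in "sudo env VAR=... CMD".
kept=$(env -i OSTREE_DEBUG_GRUB2=1 /bin/sh -c 'echo "${OSTREE_DEBUG_GRUB2:-unset}"')

echo "set before the reset: $dropped"
echo "set after the reset:  $kept"

# The corresponding real invocations for this thread (need root, not run here):
#   sudo env OSTREE_DEBUG_GRUB2=1 ostree admin finalize-staged
#   sudo -E ostree admin finalize-staged   # with the variable exported beforehand
```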

@gdesmott
Author

gdesmott commented Jun 3, 2022

From a sudo -s shell:

# OSTREE_DEBUG_GRUB2=1 ostree admin finalize-staged
Copying /etc changes: 18 modified, 0 removed, 89 added
Regex version mismatch, expected: 10.39 2021-10-29 actual: 10.40 2022-04-14
Regex version mismatch, expected: 10.39 2021-10-29 actual: 10.40 2022-04-14
Regex version mismatch, expected: 10.39 2021-10-29 actual: 10.40 2022-04-14
Regex version mismatch, expected: 10.39 2021-10-29 actual: 10.40 2022-04-14
Generating grub configuration file ...
/usr/sbin/grub2-probe: error: ../grub-core/kern/fs.c:120:unknown filesystem.
error: Bootloader write config: grub2-mkconfig: Child process exited with code 1
# GRUB_DISABLE_OS_PROBER=true OSTREE_DEBUG_GRUB2=1 ostree admin finalize-staged
Copying /etc changes: 18 modified, 0 removed, 89 added
Regex version mismatch, expected: 10.39 2021-10-29 actual: 10.40 2022-04-14
Regex version mismatch, expected: 10.39 2021-10-29 actual: 10.40 2022-04-14
Regex version mismatch, expected: 10.39 2021-10-29 actual: 10.40 2022-04-14
Regex version mismatch, expected: 10.39 2021-10-29 actual: 10.40 2022-04-14
Generating grub configuration file ...
/usr/sbin/grub2-probe: error: ../grub-core/kern/fs.c:120:unknown filesystem.
error: Bootloader write config: grub2-mkconfig: Child process exited with code 1

@jlebon
Member

jlebon commented Jun 3, 2022

Thanks. I thought that grub2-probe error came from 30_os-prober, but it seems there are really a bunch of grub.d/ drop-ins that use it. @martinezjavier any ideas here? Are you aware of any regression that could cause this?

It seems like there might be some attached disk that GRUB no longer recognizes. Might be worth trying to unplug any non-Silverblue disks and try again just to get out of this?

@gdesmott
Author

gdesmott commented Jun 3, 2022

> Might be worth trying to unplug any non-Silverblue disks and try again just to get out of this?

I don't have any. This is a laptop with a single SSD and no USB disk plugged in.

Screenshot from 2022-06-03 17-35-47

@martinezjavier

> @martinezjavier any ideas here? Are you aware of any regression that could cause this?

No idea, sorry, since I can't follow all the grub2 changes in detail anymore. @frozencemetery may be able to help.

@frozencemetery

Suggest filing a bugzilla. First things we'd ask are to retry with the latest grub2, then adding verbosity flags, and if that doesn't work, it's gdb time.

@gdesmott
Author

> retry with the latest grub2

How can I update grub2 as all updates are broken because of this?

@gdesmott
Author

> Suggest filing a bugzilla.

https://bugzilla.redhat.com/show_bug.cgi?id=2096192

@frozencemetery

Just to close the loop on this: the issue turned out to be that several mtimes of files on the ESP were set to the epoch, and grub2 didn't handle that properly. I've now patched grub2, and find /boot/efi -exec touch '{}' ';' also functioned as a workaround.

I think this bug can now be closed.
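
To check whether a system is hit by the epoch-mtime condition described above, something like the following sketch might help. The function name list_epoch_mtime_files is hypothetical, and the one-day cutoff after the epoch is an assumption chosen to sidestep timezone edge cases; the demo uses a scratch directory instead of the real ESP.

```shell
# List regular files under a directory whose mtime is at (or within a day of)
# the Unix epoch. "list_epoch_mtime_files" is a hypothetical helper name.
list_epoch_mtime_files() {
    dir=$1
    ref=$(mktemp) || return 1
    # Reference timestamp: 1970-01-02 00:00 UTC, one day after the epoch.
    TZ=UTC touch -t 197001020000 "$ref"
    # "! -newer" matches files whose mtime is not later than the reference.
    find "$dir" -type f ! -newer "$ref"
    rm -f "$ref"
}

# Demo on a scratch directory standing in for /boot/efi.
scratch_dir=$(mktemp -d)
TZ=UTC touch -t 197001010000 "$scratch_dir/stale.efi"   # mtime set to the epoch
touch "$scratch_dir/fresh.efi"                           # mtime set to now

before=$(list_epoch_mtime_files "$scratch_dir")
find "$scratch_dir" -exec touch '{}' ';'                 # the workaround from this thread
after=$(list_epoch_mtime_files "$scratch_dir")

echo "before: $before"
echo "after:  ${after:-none}"
```

On an affected machine the call would be list_epoch_mtime_files /boot/efi as root, run before and after the touch workaround.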

@cgwalters
Member

Thanks for investigating this!

We should probably also try to add a systemd unit that force-disables the grub2 backend for ostree and hence relies solely on BLS.

pierrepinon pushed a commit to pierrepinon/workstation-ostree-config that referenced this issue Feb 14, 2025
We need to make it easier to update the bootloader on these variants
because unlike on traditional systems, it's not updated automatically
with the rest of the system. Add bootupd for that.

This would allow fixing issues like:
- coreos/rpm-ostree#3715
- fedora-silverblue/issue-tracker#120 (comment)

It won't be enabled by default and as mentioned in that comment requires
work in Anaconda to be seamless. But at least with this users should be
able to adopt and update:

https://github.com/coreos/bootupd/blob/main/README-design.md

See also the tracker issue where we did this for Fedora CoreOS:

coreos/fedora-coreos-tracker#510