rawhide: kola: 20220205: coreos.boot-mirror.luks
test enters emergency.target
#1092
Comments
The ignition-ostree-transposefs-restore.service is failing on boot and we're dropping into the emergency shell. Let's snooze the test for now on rawhide while we investigate. See coreos/fedora-coreos-tracker#1092
I did a local build with the rawhide branch of the fcos configs repo and ran the test. There are some differences between the logs; maybe the reason for the failure is:
Failed logs:
Successful logs:
If you run it ten times, does it pass every time?
Yes, it passed on all 20 runs.
Hmm. Maybe the issue only happens in our CI environment. Regardless, we can remove the snooze and see if it happens again.
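When chasing a flaky failure like this one, it can help to count failures over a fixed number of runs rather than eyeballing individual results. The helper below is a minimal sketch; `count_failures` is a hypothetical name, and `true`/`false` stand in for whatever command actually runs the test.

```shell
#!/bin/bash

# Run a command a given number of times and print how many runs failed.
# Usage: count_failures <runs> <command> [args...]
count_failures() {
    local runs=$1; shift
    local failures=0
    local i
    for ((i = 1; i <= runs; i++)); do
        # Discard the command's output; only its exit status matters here.
        if ! "$@" >/dev/null 2>&1; then
            failures=$((failures + 1))
        fi
    done
    echo "$failures"
}

# Stand-in commands for illustration only.
echo "failures with 'true':  $(count_failures 20 true)"
echo "failures with 'false': $(count_failures 5 false)"
```

A count of zero over 20 runs on one machine does not rule out a race; it only shows that the timing on that machine does not trigger it.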
Can not reproduce coreos.boot-mirror.luks failed issue locally after running 20 times, remove from denylist to see if it happens. See coreos/fedora-coreos-tracker#1092
Can not reproduce `coreos.boot-mirror.luks` failed issue locally with rawhide after running 20 times, remove from denylist to see if it happens. See coreos/fedora-coreos-tracker#1092
This issue is back again. See coreos/fedora-coreos-tracker#1092 (comment)
Looking at the error, this patch may help:

diff --git a/overlay.d/05core/usr/lib/dracut/modules.d/40ignition-ostree/ignition-ostree-transposefs.sh b/overlay.d/05core/usr/lib/dracut/modules.d/40ignition-ostree/ignition-ostree-transposefs.sh
index 18224c36..4f72aacb 100755
--- a/overlay.d/05core/usr/lib/dracut/modules.d/40ignition-ostree/ignition-ostree-transposefs.sh
+++ b/overlay.d/05core/usr/lib/dracut/modules.d/40ignition-ostree/ignition-ostree-transposefs.sh
@@ -56,7 +56,7 @@ udev_trigger_on_label_mismatch() {
     local expected_dev=$1; shift
     local actual_dev
     expected_dev=$(realpath "${expected_dev}")
-    actual_dev=$(realpath "/dev/disk/by-label/$label")
+    actual_dev=$(realpath "/dev/disk/by-label/$label" || :)
     if [ "$actual_dev" != "$expected_dev" ]; then
         echo "Expected /dev/disk/by-label/$label to point to $expected_dev, but points to $actual_dev; triggering udev"
         udevadm trigger --settle "$expected_dev"

Though it would be good to narrow down what's happening there at the udev level and possibly file another RHBZ before working around it. The |
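To illustrate why the `|| :` matters, here is a small sketch. It assumes the script runs with `set -e` (errexit), which the fatal service failure suggests but which is not shown in the diff context. The path `/nonexistent-demo-dir/by-label/boot` is a made-up stand-in for a label symlink that udev has not created.

```shell
#!/bin/bash

# Made-up path whose parent directory does not exist, so realpath fails.
missing="/nonexistent-demo-dir/by-label/boot"

# Without the guard: under errexit the failed assignment aborts the inner
# shell before "reached" is printed. The outer "|| :" only keeps this demo
# script itself going.
without_guard=$(bash -c 'set -e; dev=$(realpath "$1" 2>/dev/null); echo "reached"' _ "$missing" || :)

# With the guard: ":" always succeeds, so the command substitution returns 0,
# dev is left empty, and execution continues.
with_guard=$(bash -c 'set -e; dev=$(realpath "$1" 2>/dev/null || :); echo "reached:[$dev]"' _ "$missing")

echo "without guard: '${without_guard}'"
echo "with guard:    '${with_guard}'"
```

With the guard, the variable ends up empty instead of the script aborting, so the later comparison against the expected device can still run.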
Test with fedora-coreos-37.20220217.dev.0-qemu.x86_64.qcow2, failed to Details
@dustymabe is there any way I can find console logs about |
It looks like maybe the logs got cleaned up from that run already :( |
I just ran this locally against the ISO from #1092 (comment) and it failed:
So.. indeed looks like your failure is different because mine found those devices:
I can reproduce with |
The snooze for |
@HuijingHei One thing I find helpful when debugging udev-related things is to boot with |
Thanks @jlebon for your suggestion. |
@HuijingHei The debug logs are in the journal. If you want it on the console, you can add Hmm but actually, you're triggering this by running the test and not manually doing |
Skimmed that diff, but nothing jumped out related to storage devices, mounts, udev, etc. I think we'd need to bisect.
I don't know why I can't reproduce this locally; I'm not sure my steps are correct.
Most likely it's a race condition and my system is either faster or slower than yours :) |
These tests are still failing and are still being investigated. See:
- coreos/fedora-coreos-tracker#1059
- coreos/fedora-coreos-tracker#1092
- coreos/fedora-coreos-tracker#1105
I'm no longer seeing this issue with latest kernels in
We're no longer seeing a test failure there. It appears the following kernel transition fixed the issue:

```
kernel 5.17.0-0.rc5.20220225git53ab78cd6d5a.106.fc37.x86_64 → 5.17.0-0.rc6.109.fc37.x86_64
```

Closes coreos/fedora-coreos-tracker#1092
OK, more information here. I was misled in my previous understanding that the issue was fixed. It turns out that kernel NVR versions that include
have debug turned ON, where:
do NOT. This problem actually only shows itself when debug is turned on. I was able to reproduce this issue on
NOTE: For now I needed this patch to The problem actually goes back much farther. I chased it for a while but then gave up. Here are the debug kernels I tried and confirmed the issue occurs on (all built on top of
So this problem has been around over a year in the kernel. I imagine we only started hitting it this year because some of our code introduced some path which made the race easier to hit. Rather than waste much more time on this issue (since it only happens with debug kernels) we should consider ways to make the test artificially not fail if we're on a debug kernel. |
… to debug kernels It appears that on debug kernels the "/dev/disk/by-label/$label" device *can* not appear at all (some sort of race condition) and thus calling `realpath` on it will fail. Let's just make the call to `realpath` not be fatal so we can workaround this issue that has been around in the kernel for some time. Closes coreos/fedora-coreos-tracker#1092
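The effect of the non-fatal `realpath` on the mismatch check can be sketched as follows. This is an illustration only, not the actual script: `label_mismatch` is a hypothetical helper, and a scratch directory with a regular file and a symlink stands in for the real device and its `/dev/disk/by-label` link.

```shell
#!/bin/bash
set -euo pipefail

# Succeeds (returns 0) when the label link does NOT resolve to the expected device.
label_mismatch() {
    local link=$1 expected=$2 actual
    expected=$(realpath "$expected")
    # Non-fatal lookup: if realpath fails, "actual" is simply left empty.
    actual=$(realpath "$link" 2>/dev/null || :)
    [ "$actual" != "$expected" ]
}

# Simulated device and label link in a scratch directory (hypothetical layout).
scratch=$(mktemp -d)
touch "$scratch/disk-a"
ln -s "$scratch/disk-a" "$scratch/good-label"

if label_mismatch "$scratch/good-label" "$scratch/disk-a"; then
    good_result="mismatch"
else
    good_result="match"
fi

# The parent directory of this link does not exist, so realpath fails.
if label_mismatch "$scratch/no-such-dir/label" "$scratch/disk-a"; then
    missing_result="mismatch"
else
    missing_result="match"
fi

echo "existing link: $good_result"
echo "missing link:  $missing_result"
rm -rf "$scratch"
```

Because an unresolved link compares as a mismatch, a missing label falls into the same branch that re-triggers udev instead of killing the service.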
I decided for now to punt on this issue with a workaround since I've spent way too much time on it and the fact that it only presents itself with a debug kernel, which we'll never ship to our prod streams. Someone else is welcome to pick up and investigate further and open bugs if they find enough information to open a BZ, but I need to move on. |
The `coreos.boot-mirror.luks` test just started failing semi-consistently in the `rawhide` stream. We first saw it in CI and now it's happening in the pipeline fedora-coreos-pipeline#19. Cracking open the `console.txt` for the test shows us:
Full console.txt file: coreos-boot-mirror-luks-console.txt