You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
The issue is caused due to timings of NVME disk subsystem udev rule processing and systemd configuration update:
nazariig@sonic-build-server: log$ zcat syslog.1.gz
Feb 27 19:21:14.663027 r-alligator-04 INFO rc.local[358]: + logger SONiC version 202205_2_rc.5-f1dfee1fe_Internal starting up...
Feb 27 19:21:14.663036 r-alligator-04 INFO systemd[1]: Finished Resets System Activity Logs.
Feb 27 19:21:14.663050 r-alligator-04 INFO sysfsutils[368]: Setting sysfs variables......done.
Feb 27 19:21:14.663059 r-alligator-04 INFO rc.local[358]: + grub_installation_needed=
Feb 27 19:21:14.663068 r-alligator-04 INFO rc.local[358]: + [ ! -e /host/machine.conf ]
Feb 27 19:21:14.663075 r-alligator-04 INFO rc.local[358]: + . /host/machine.conf
Feb 27 19:21:14.663087 r-alligator-04 INFO rc.local[358]: + onie_arch=x86_64
Feb 27 19:21:14.663096 r-alligator-04 INFO rc.local[358]: + onie_base_mac=90:0A:84:1E:90:80
Feb 27 19:21:14.663103 r-alligator-04 INFO rc.local[358]: + onie_bin=
Feb 27 19:21:14.663119 r-alligator-04 INFO rc.local[358]: + onie_boot_reason=install
Feb 27 19:21:14.663127 r-alligator-04 INFO rc.local[358]: + onie_build_date=2021-10-20T07:26+00:00
Feb 27 19:21:14.663136 r-alligator-04 INFO rc.local[358]: + onie_build_machine=nvidia_sn2201
Feb 27 19:21:14.663143 r-alligator-04 INFO rc.local[358]: + onie_build_platform=x86_64-nvidia_sn2201-r0
Feb 27 19:21:14.663151 r-alligator-04 INFO rc.local[358]: + onie_cli_static_parms=
Feb 27 19:21:14.663160 r-alligator-04 INFO rc.local[358]: + onie_cli_static_url=/tmp/sonic-mellanox.bin
Feb 27 19:21:14.663174 r-alligator-04 INFO rc.local[358]: + onie_config_version=1
Feb 27 19:21:14.663183 r-alligator-04 INFO rc.local[358]: + onie_dev=/dev/nvme0n1p2
Feb 27 19:21:14.663191 r-alligator-04 INFO rc.local[358]: + onie_exec_url=/tmp/sonic-mellanox.bin
Feb 27 19:21:14.663199 r-alligator-04 INFO rc.local[358]: + onie_firmware=auto
Feb 27 19:21:14.703369 r-alligator-04 INFO rc.local[358]: + onie_grub_image_name=grubx64.efi
Feb 27 19:21:14.703386 r-alligator-04 INFO rc.local[358]: + onie_initrd_tmp=/
Feb 27 19:21:14.703418 r-alligator-04 INFO rc.local[358]: + onie_installer=/var/tmp/installer
Feb 27 19:21:14.703428 r-alligator-04 INFO rc.local[358]: + onie_kernel_version=4.9.95
Feb 27 19:21:14.645551 r-alligator-04 INFO rc.local[358]: + onie_machine=nvidia_sn2201
Feb 27 19:21:14.656243 r-alligator-04 INFO rc.local[358]: + onie_machine_rev=0
Feb 27 19:21:14.688628 r-alligator-04 INFO rc.local[358]: + onie_partition_type=gpt
Feb 27 19:21:14.695383 r-alligator-04 INFO rc.local[358]: + onie_platform=x86_64-nvidia_sn2201-r0
Feb 27 19:21:14.702302 r-alligator-04 INFO rc.local[358]: + onie_root_dir=/mnt/onie-boot/onie
Feb 27 19:21:14.719454 r-alligator-04 INFO rc.local[358]: + onie_skip_ethmgmt_macs=yes
Feb 27 19:21:14.735404 r-alligator-04 INFO rc.local[358]: + onie_switch_asic=mlnx
Feb 27 19:21:14.755370 r-alligator-04 INFO rc.local[358]: + onie_uefi_arch=x64
Feb 27 19:21:14.775487 r-alligator-04 INFO rc.local[358]: + onie_uefi_boot_loader=grubx64.efi
Feb 27 19:21:14.791439 r-alligator-04 INFO rc.local[358]: + onie_vendor_id=5703
Feb 27 19:21:14.811461 r-alligator-04 INFO rc.local[358]: + onie_version=2021.08-nvidia_sn2201_v1.1-115200
Feb 27 19:21:14.827435 r-alligator-04 INFO rc.local[358]: + program_console_speed
Feb 27 19:21:14.844015 r-alligator-04 NOTICE root: SONiC version 202205_2_rc.5-f1dfee1fe_Internal starting up...
Feb 27 19:21:14.844423 r-alligator-04 INFO systemd[1]: Started LSB: Set sysfs variables from /etc/sysfs.conf.
Feb 27 19:21:14.844942 r-alligator-04 INFO kdump-tools[348]: Starting kdump-tools:
Feb 27 19:21:14.863908 r-alligator-04 INFO rasdaemon[353]: rasdaemon: Can't open trace_clock
Feb 27 19:21:14.863967 r-alligator-04 INFO rasdaemon[353]: rasdaemon: Can't select a timestamp for tracing
Feb 27 19:21:14.864183 r-alligator-04 INFO rc.local[405]: + cat /proc/cmdline
Feb 27 19:21:14.883864 r-alligator-04 INFO monit[350]: Starting daemon monitor: monit.
Feb 27 19:21:14.884047 r-alligator-04 CRIT smartd[363]: Warning via /usr/share/smartmontools/smartd-runner to root produced unexpected output (183 bytes) to STDOUT/STDERR:
Feb 27 19:21:14.884135 r-alligator-04 INFO systemd[1]: e2scrub_reap.service: Succeeded.
Feb 27 19:21:14.884337 r-alligator-04 INFO kdump-tools[388]: no crashkernel= parameter in the kernel cmdline ...
Feb 27 19:21:14.907576 r-alligator-04 INFO kdump-tools[436]: failed!
Feb 27 19:21:14.919840 r-alligator-04 INFO rc.local[406]: + grep -Eo console=ttyS[0-9]+,[0-9]+
Feb 27 19:21:14.936008 r-alligator-04 CRIT smartd[363]: /etc/smartmontools/run.d/10mail:
Feb 27 19:21:14.936114 r-alligator-04 INFO systemd[1]: Finished Remove Stale Online ext4 Metadata Check Snapshots.
Feb 27 19:21:14.936490 r-alligator-04 INFO rc.local[408]: + cut -d , -f2
Feb 27 19:21:14.955488 r-alligator-04 CRIT smartd[363]: Your system does not have /usr/bin/mail. Install the mailx or mailutils package
Feb 27 19:21:14.955593 r-alligator-04 INFO systemd[1]: Started Self Monitoring and Reporting Technology (SMART) Daemon.
Feb 27 19:21:14.955711 r-alligator-04 INFO rc.local[358]: + speed=115200
Feb 27 19:21:14.971385 r-alligator-04 INFO rc.local[358]: + [ -z 115200 ]
Feb 27 19:21:14.983545 r-alligator-04 INFO rc.local[358]: + CONSOLE_SPEED=115200
Feb 27 19:21:14.999556 r-alligator-04 CRIT smartd[363]: run-parts: /etc/smartmontools/run.d/10mail exited with return code 1
Feb 27 19:21:14.999654 r-alligator-04 INFO systemd[1]: Reloading.
Feb 27 19:21:15.000212 r-alligator-04 INFO haveged[314]: haveged: ver: 1.9.14; arch: x86; vend: GenuineIntel; build: (gcc 10.2.1 ITV); collect: 128K
Feb 27 19:21:15.000279 r-alligator-04 INFO haveged[314]: haveged: cpu: (L4 VC); data: 24K (L4 V); inst: 32K (L4 V); idx: 24/40; sz: 32154/54019
Feb 27 19:21:15.000325 r-alligator-04 INFO haveged[314]: haveged: tot tests(BA8): A:1/1 B:1/1 continuous tests(B): last entropy estimate 8.00361
Feb 27 19:21:15.000370 r-alligator-04 INFO haveged[314]: haveged: fills: 0, generated: 0
Feb 27 19:21:15.000560 r-alligator-04 INFO rc.local[420]: + grep agetty /lib/systemd/system/[email protected]
Feb 27 19:21:15.023522 r-alligator-04 CRIT smartd[363]: Warning via /usr/share/smartmontools/smartd-runner to root: failed (32-bit/8-bit exit status: 256/1)
Feb 27 19:21:15.024318 r-alligator-04 INFO rc.local[421]: + grep keep-baud
Feb 27 19:21:15.042904 r-alligator-04 INFO smartd[363]: Device: /dev/nvme0, state written to /var/lib/smartmontools/smartd.SFPC020GM1EC2TO_I_5E_51P_STD-000060202504A4000030.nvme.state
Feb 27 19:21:15.043010 r-alligator-04 NOTICE systemd[1]: /lib/systemd/system/ztp.service:9: Standard output type syslog is obsolete, automatically updating to journal. Please update your unit file, and consider removing the setting altogether.
Feb 27 19:21:15.043141 r-alligator-04 INFO rc.local[358]: + [ 1 = 0 ]
Feb 27 19:21:15.050313 r-alligator-04 INFO rc.local[358]: + sed -i s|u' .* %I|u' 115200 %I|g /lib/systemd/system/[email protected]
Feb 27 19:21:15.067443 r-alligator-04 INFO rc.local[358]: + systemctl daemon-reload
Feb 27 19:21:15.084074 r-alligator-04 INFO rc.local[358]: + [ -f /host/image-202205_2_rc.5-f1dfee1fe_Internal/platform/firsttime ]
Feb 27 19:21:15.103425 r-alligator-04 INFO rc.local[358]: + [ -f /var/log/fsck.log.gz ]
Feb 27 19:21:15.119941 r-alligator-04 INFO rsyslogd: imuxsock: Acquired UNIX socket '/run/systemd/journal/syslog' (fd 3) from systemd. [v8.2102.0]
Feb 27 19:21:15.120063 r-alligator-04 NOTICE systemd[1]: /lib/systemd/system/ztp.service:10: Standard output type syslog+console is obsolete, automatically updating to journal+console. Please update your unit file, and consider removing the setting altogether.
Feb 27 19:21:15.120312 r-alligator-04 INFO rc.local[464]: + gunzip -d -c /var/log/fsck.log.gz
Feb 27 19:21:15.136747 r-alligator-04 INFO rsyslogd: [origin software="rsyslogd" swVersion="8.2102.0" x-pid="360" x-info="https://www.rsyslog.com"] start
Feb 27 19:21:15.137264 r-alligator-04 INFO systemd-udevd[293]: Using default interface naming scheme 'v247'.
Feb 27 19:21:15.137906 r-alligator-04 INFO rc.local[465]: + logger -t FSCK
Feb 27 19:21:15.151717 r-alligator-04 INFO monit[398]: Starting Monit 5.20.0 daemon with http interface at /var/run/monit.sock
Feb 27 19:21:15.152218 r-alligator-04 ERR systemd-udevd[307]: nvme0n1p1: Failed to add device '/dev/nvme0n1p1' to watch: Operation not permitted
Feb 27 19:21:15.152328 r-alligator-04 INFO rc.local[358]: + rm -f /var/log/fsck.log.gz
Feb 27 19:21:15.167526 r-alligator-04 INFO rc.local[358]: + exit 0
Feb 27 19:21:15.183416 r-alligator-04 INFO monit[398]: Monit start delay set to 300s
Feb 27 19:21:15.183507 r-alligator-04 INFO systemd[1]: Started System Logging Service.
Feb 27 19:21:15.184136 r-alligator-04 INFO systemd[1]: Started OpenBSD Secure Shell server.
Feb 27 19:21:15.184283 r-alligator-04 INFO systemd[1]: Finished Kernel crash dump capture service.
Feb 27 19:21:15.184355 r-alligator-04 NOTICE FSCK: /root/host/disk-img/var-log.ext4: recovering journal
Feb 27 19:21:15.184427 r-alligator-04 INFO systemd[1]: Started LSB: service and resource monitoring daemon.
Feb 27 19:21:15.185956 r-alligator-04 NOTICE FSCK: /root/host/disk-img/var-log.ext4: clean, 43/262144 files, 37379/1048576 blocks
Feb 27 19:21:15.186038 r-alligator-04 INFO systemd[1]: Started /etc/rc.local Compatibility.
Feb 27 19:21:15.186123 r-alligator-04 INFO systemd[1]: Started User Login Management.
Feb 27 19:21:15.186200 r-alligator-04 INFO systemd-udevd[293]: ethtool: autonegotiation is unset or enabled, the speed and duplex are not writable.
Feb 27 19:21:15.186279 r-alligator-04 INFO systemd[1]: Starting Config chassis_db...
Feb 27 19:21:15.186361 r-alligator-04 INFO systemd[1]: Started Getty on tty1.
Feb 27 19:21:15.186438 r-alligator-04 INFO systemd[1]: Found device /dev/ttyS0.
Feb 27 19:21:15.186513 r-alligator-04 INFO systemd[1]: Started Serial Getty on ttyS0.
Feb 27 19:21:15.186590 r-alligator-04 INFO systemd[1]: Reached target Login Prompts.
Feb 27 19:21:15.527160 r-alligator-04 INFO systemd-udevd[296]: ethtool: autonegotiation is unset or enabled, the speed and duplex are not writable.
Feb 27 19:21:15.824473 r-alligator-04 INFO ras-mc-ctl[351]: ras-mc-ctl: Error: No dimm labels for NVIDIA model VMOD0014
When SONiC starts after image deploy, the udev rule is being triggered in parallel to systemd configuration reload:
https://github.com/sonic-net/sonic-buildimage/blob/master/files/image_config/platform/rc.local#L183
program_console_speed()
{
speed=$(cat /proc/cmdline | grep -Eo 'console=tty(S|AMA)[0-9]+,[0-9]+' | cut -d "," -f2)
if [ -z "$speed" ]; then
CONSOLE_SPEED=9600
else
CONSOLE_SPEED=$speed
fi
# Set correct baud rate in getty service
# First boot will find keep-baud argument with multiple baud rates as per Debian convention
# Subsequent boots may have new baud rate without finding keep-baud argument
grep agetty /lib/systemd/system/[email protected] |grep keep-baud
if [ $? = 0 ]; then
sed -i "s|\-\-keep\-baud .* %I| $CONSOLE_SPEED %I|g" /lib/systemd/system/[email protected]
else
sed -i "s|u' .* %I|u' $CONSOLE_SPEED %I|g" /lib/systemd/system/[email protected]
fi
# Reload systemd config
systemctl daemon-reload
}
Steps to reproduce the issue:
Run in parallel
root@r-alligator-04:/home/admin# cat ./test.sh
#!/bin/bash
while true; do
systemctl daemon-reload
done
root@r-alligator-04:/home/admin# cat udev.sh
#!/bin/bash
while true; do
udevadm trigger /dev/nvme0n1p1
done
Check syslog + udev
root@r-alligator-04:/home/admin# udevadm monitor
monitor will print the received events for:
UDEV - the event which udev sends out after rule processing
KERNEL - the kernel uevent
KERNEL[7102.165395] change /devices/pci0000:00/0000:00:09.0/0000:02:00.0/nvme/nvme0/nvme0n1/nvme0n1p1 (block)
UDEV [7102.169121] change /devices/pci0000:00/0000:00:09.0/0000:02:00.0/nvme/nvme0/nvme0n1/nvme0n1p1 (block)
root@r-alligator-04:/home/admin# tail -F /var/log/syslog | grep nvme
Mar 3 17:03:23.619034 r-alligator-04 ERR systemd-udevd[117268]: nvme0n1p1: Failed to add device '/dev/nvme0n1p1' to watch: Operation not permitted
Describe the results you received:
root@r-alligator-04:/home/admin# tail -F /var/log/syslog | grep nvme
Mar 3 17:03:23.619034 r-alligator-04 ERR systemd-udevd[117268]: nvme0n1p1: Failed to add device '/dev/nvme0n1p1' to watch: Operation not permitted
nazariig
changed the title
[202205] [NVME] | ERR systemd-udevd[307]: nvme0n1p1: Failed to add device '/dev/nvme0n1p1' to watch: Operation not permitted
[NVME] | ERR systemd-udevd[307]: nvme0n1p1: Failed to add device '/dev/nvme0n1p1' to watch: Operation not permitted
Apr 12, 2023
Description
The issue is caused due to timings of NVME disk subsystem
udev
rule processing andsystemd
configuration update:When SONiC starts after image deploy, the
udev
rule is being triggered in parallel tosystemd
configuration reload:Steps to reproduce the issue:
syslog
+udev
Describe the results you received:
Describe the results you expected:
No errors are expected
Output of
show version
:Output of
show techsupport
:Additional information you deem important (e.g. issue happens only occasionally):
NVME
udev
misc:The text was updated successfully, but these errors were encountered: