Description
During switch bootup, supervisorctl is called many times from different SONiC containers to start daemons. It consumes a lot of CPU, which makes containers start more slowly and lengthens cold/fast/warm boot times.
Steps to reproduce the issue:
No specific steps: perform any kind of reload/reboot and run a profiling tool (bootchart, or perf with flame graphs).
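For a quick estimate without a full profiler, the CPU cost of a single invocation can be measured directly. The following is a minimal sketch, not part of the original report: the helper name `child_cpu_seconds` is hypothetical, and it simply averages the user plus system CPU time that the operating system charges to child processes.

```python
import resource
import subprocess


def child_cpu_seconds(argv, runs=10):
    """Return the average user+system CPU seconds consumed per invocation of argv.

    Hypothetical helper for illustration; it relies on RUSAGE_CHILDREN, which
    only accounts for children that have been waited for (subprocess.run waits).
    """
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    for _ in range(runs):
        # Output is discarded; only the CPU cost of the invocation matters here.
        subprocess.run(
            argv,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            check=False,
        )
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    total = (after.ru_utime - before.ru_utime) + (after.ru_stime - before.ru_stime)
    return total / runs
```

Inside a container, something like `child_cpu_seconds(["supervisorctl", "status"])` would give a rough per-call figure to compare across platforms; the exact subcommand is an assumption chosen for illustration.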
Describe the results you received:
supervisorctl is a very CPU-intensive utility, yet it is used everywhere, which slows startup.
Fast boot suffers because the platform SDK may not be able to perform switch init and reconfiguration fast enough if other CPU-intensive tasks are running in parallel.
Fast/warm boot suffers because switch control plane downtime is increased.
Describe the results you expected:
More new containers will use supervisorctl to start daemons. To my understanding, the job of supervisor is to fork the daemon and wait, which is a blocking operation, so no intensive work should be performed. However, supervisorctl (mainly) and supervisord take a lot of CPU during startup. Ideally, these processes should not show high CPU usage.
Additional information you deem important (e.g. issue happens only occasionally):
This is very platform-specific; results will differ depending on the platform CPU.
Output of show version:
The version is a debug build compiled with SONIC_PROFILING_ON=y and '-fno-omit-frame-pointer'.
Attached are a system-wide perf recording taken during bootup and the flame graph generated from it. Perf was started in the /etc/rc.local phase with the command:
perf_4.9 record -F 99 -a -g -o /home/admin/perf -- sleep 100 &
system-perf.svg.gz
We can see a lot of supervisorctl samples collected, more than for any critical SONiC component such as the SDK, syncd, orchagent, or redis-server.
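The same comparison can be made numerically instead of by reading the flame graph. The sketch below is an illustration only and assumes the recording has been dumped to text with one command name per sample (for example with `perf script -i /home/admin/perf -F comm`, if the installed perf supports that field selection); the function name `count_samples_by_comm` is hypothetical.

```python
from collections import Counter


def count_samples_by_comm(lines):
    """Count perf samples per command name.

    Assumes each non-empty line holds exactly one command name, as produced
    by a per-sample dump restricted to the comm field (an assumption here).
    """
    counts = Counter()
    for line in lines:
        comm = line.strip()
        if comm:  # skip blank separator lines
            counts[comm] += 1
    return counts
```

Calling `count_samples_by_comm(open("comm.txt")).most_common(10)` on such a dump (the file name is a placeholder) would list the ten most-sampled commands during bootup.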
Bootchart plot (https://elinux.org/Bootchart)

We can see a lot of supervisor invocations during SDK start and configuration.
Attach debug file (sudo generate_dump):
sonic_dump.tar.gz