Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

supervisor consumes lot of CPU during switch startup #4554

Closed
stepanblyschak opened this issue May 7, 2020 · 1 comment · Fixed by #4599
Closed

supervisor consumes lot of CPU during switch startup #4554

stepanblyschak opened this issue May 7, 2020 · 1 comment · Fixed by #4599

Comments

@stepanblyschak
Copy link
Collaborator

Description

During switch bootup supervisorctl is called lot of times from different SONiC containers to start daemons. It consumes a lot of CPU. It makes containers to start slower and has impact on cold/fast/warm boot timings.

Steps to reproduce the issue:
No specific steps, just perform any kind of reload/reboot and start some profiling tool (bootchart, perf&flamegraphs)

Describe the results you received:
supervisorctl is a very CPU intensive utility, however it is used everywhere, causing slow start.

Fast boot suffers because platform SDK may not be able to perfrom switch init and reconfiguration fast enough if other CPU intensive tasks are running in parallel.
Fast/Warm boot suffers because switch control plane downtime is increased.

Describe the results you expected:
More new containers will use supervisorctl to start daemons. To my understanding the job of supervisor is to fork the daemon and wait - which is blocking operation, so no intensive tasks are performed. However, supervisorctl (mainly) and supervisord take a lot of CPU during startup. Ideally there should not be high CPU usage for those processes.

Additional information you deem important (e.g. issue happens only occasionally):

This is very platform specific, depending on platform CPU you may have different results.

Output of show version:

The version is debug version compiled with SONIC_PROFILING_ON=y and '-fno-omit-frame-pointer':
Attached is system perf recording and generated flamegraph during bootup. Perf was started at /etc/rc.local phase with command:
perf_4.9 record -F 99 -a -g -o /home/admin/perf -- sleep 100 &
system-perf.svg.gz
We can see a lot of supervisorctl samples collected, more than any critical SONiC component, like SDK, syncd, orchagent or redis-server.

Bootchart plot (https://elinux.org/Bootchart)
bootchart
We can see lot of supervisor invocations during SDK start and configuration.

SONiC Software Version: SONiC.201911.0-8367dfeb
Distribution: Debian 9.12
Kernel: 4.9.0-11-2-amd64
Build commit: 8367dfeb
Build date: Wed May  6 16:14:29 UTC 2020
Built by: stepanb@r-build-sonic02

Platform: x86_64-mlnx_msn2700-r0
HwSKU: Mellanox-SN2700-D48C8
ASIC: mellanox
Serial Number: MT1822K07823
Uptime: 10:36:48 up 8 min,  1 user,  load average: 1.61, 1.69, 0.99

Docker images:
REPOSITORY                        TAG                 IMAGE ID            SIZE
docker-platform-monitor-dbg       201911.0-8367dfeb   af5e7e595e0c        641MB
docker-platform-monitor-dbg       latest              af5e7e595e0c        641MB
docker-platform-monitor           201911.0-8367dfeb   af5e7e595e0c        641MB
docker-platform-monitor           latest              af5e7e595e0c        641MB
docker-sflow-dbg                  201911.0-8367dfeb   7a794dfac076        332MB
docker-sflow-dbg                  latest              7a794dfac076        332MB
docker-sflow                      201911.0-8367dfeb   7a794dfac076        332MB
docker-sflow                      latest              7a794dfac076        332MB
docker-fpm-frr-dbg                201911.0-8367dfeb   231d4d8e8fbd        385MB
docker-fpm-frr-dbg                latest              231d4d8e8fbd        385MB
docker-fpm-frr                    201911.0-8367dfeb   231d4d8e8fbd        385MB
docker-fpm-frr                    latest              231d4d8e8fbd        385MB
docker-lldp-sv2-dbg               201911.0-8367dfeb   63600bb51fad        326MB
docker-lldp-sv2-dbg               latest              63600bb51fad        326MB
docker-lldp-sv2                   201911.0-8367dfeb   63600bb51fad        326MB
docker-lldp-sv2                   latest              63600bb51fad        326MB
docker-orchagent-dbg              201911.0-8367dfeb   1f2d3cbf41c4        376MB
docker-orchagent-dbg              latest              1f2d3cbf41c4        376MB
docker-orchagent                  201911.0-8367dfeb   1f2d3cbf41c4        376MB
docker-orchagent                  latest              1f2d3cbf41c4        376MB
docker-snmp-sv2-dbg               201911.0-8367dfeb   36b4cb692870        357MB
docker-snmp-sv2-dbg               latest              36b4cb692870        357MB
docker-snmp-sv2                   201911.0-8367dfeb   36b4cb692870        357MB
docker-snmp-sv2                   latest              36b4cb692870        357MB
docker-nat-dbg                    201911.0-8367dfeb   8b9e98228c50        362MB
docker-nat-dbg                    latest              8b9e98228c50        362MB
docker-nat                        201911.0-8367dfeb   8b9e98228c50        362MB
docker-nat                        latest              8b9e98228c50        362MB
docker-sonic-mgmt-framework-dbg   201911.0-8367dfeb   f6f222fbb114        443MB
docker-sonic-mgmt-framework-dbg   latest              f6f222fbb114        443MB
docker-sonic-mgmt-framework       201911.0-8367dfeb   f6f222fbb114        443MB
docker-sonic-mgmt-framework       latest              f6f222fbb114        443MB
docker-teamd-dbg                  201911.0-8367dfeb   aea02a411903        360MB
docker-teamd-dbg                  latest              aea02a411903        360MB
docker-teamd                      201911.0-8367dfeb   aea02a411903        360MB
docker-teamd                      latest              aea02a411903        360MB
docker-syncd-mlnx-dbg             201911.0-8367dfeb   a6067b546dec        416MB
docker-syncd-mlnx-dbg             latest              a6067b546dec        416MB
docker-syncd-mlnx                 201911.0-8367dfeb   a6067b546dec        416MB
docker-syncd-mlnx                 latest              a6067b546dec        416MB
docker-sonic-telemetry-dbg        201911.0-8367dfeb   2653e5a6978d        369MB
docker-sonic-telemetry-dbg        latest              2653e5a6978d        369MB
docker-sonic-telemetry            201911.0-8367dfeb   2653e5a6978d        369MB
docker-sonic-telemetry            latest              2653e5a6978d        369MB
docker-database-dbg               201911.0-8367dfeb   e224f6c86f35        303MB
docker-database-dbg               latest              e224f6c86f35        303MB
docker-database                   201911.0-8367dfeb   e224f6c86f35        303MB
docker-database                   latest              e224f6c86f35        303MB
docker-router-advertiser-dbg      201911.0-8367dfeb   eddcecf2bd70        306MB
docker-router-advertiser-dbg      latest              eddcecf2bd70        306MB
docker-router-advertiser          201911.0-8367dfeb   eddcecf2bd70        306MB
docker-router-advertiser          latest              eddcecf2bd70        306MB
docker-dhcp-relay-dbg             201911.0-8367dfeb   e1cc48d4ce38        312MB
docker-dhcp-relay-dbg             latest              e1cc48d4ce38        312MB
docker-dhcp-relay                 201911.0-8367dfeb   e1cc48d4ce38        312MB
docker-dhcp-relay                 latest              e1cc48d4ce38        312MB

Attach debug file sudo generate_dump:
sonic_dump.tar.gz

@lguohan
Copy link
Collaborator

lguohan commented May 13, 2020

image

i suspect we do not really need supervisorctl to start the process and just need supervisord to start those process, change the process to auto start.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants