[swss] Orchagent terminated by SIGHUP because logrotate sent SIGHUP on boot after 202405->202411 warm upgrade #21962

Open
volodymyrsamotiy opened this issue Mar 7, 2025 · 4 comments

Comments

@volodymyrsamotiy
Collaborator

Description
After upgrading from the 202405 image to the 202411 image, orchagent was terminated by SIGHUP while booting into the new image.
We are not sure whether this is related to the warm-reboot or upgrade flow. It is probably a generic issue, but we reproduced it during a warm upgrade.
Please note that this has happened only once so far.

The syslog shows a shell error in the logrotate script:

2025 Mar  2 03:40:02.291520 sonic INFO logrotate[8512]: logrotate_script: 10: [: -gt: unexpected operator

After that, logrotate sent SIGHUP to orchagent:

2025 Mar  2 03:40:02.294484 sonic INFO logrotate: Sending SIGHUP to OA log_file_name: /var/log/swss/swss.rec

As a result, orchagent exited:

2025 Mar  2 03:40:02.393109 sonic INFO swss#supervisord 2025-03-02 03:40:02,381 WARN exited: orchagent (terminated by SIGHUP; not expected)

It looks like the error happened in the postrotate script defined in the /etc/logrotate.d/rsyslog configuration file:
https://github.com/sonic-net/sonic-buildimage/blob/master/files/image_config/logrotate/rsyslog.j2#L118

    postrotate
        if [ $(echo $1 | grep -c "/var/log/swss/") -gt 0 ]; then
            # for multi asic platforms, there are multiple orchagents
            # send the SIGHUP only to the orchagent the which needs log file rotation
            PLATFORM=`sonic-cfggen -H -v DEVICE_METADATA.localhost.platform`
            ASIC_CONF=/usr/share/sonic/device/$PLATFORM/asic.conf
            if [ -f "$ASIC_CONF" ]; then
                . $ASIC_CONF
            fi
            if [ $NUM_ASIC -gt 1 ]; then
                log_file=$1
                log_file_name=${log_file#/var/log/swss/}
                logger -p syslog.info -t "logrotate" "Sending SIGHUP to OA log_file_name: $log_file_name"
                pgrep -xa orchagent | grep $log_file_name | awk '{ print $1; }' | xargs /bin/kill -HUP 2>/dev/null || true
            else
                logger -p syslog.info -t "logrotate" "Sending SIGHUP to OA log_file_name: $1"
                pgrep -x orchagent | xargs /bin/kill -HUP 2>/dev/null || true
            fi
        else
            if [ -f /var/run/rsyslogd.pid ]; then
                /bin/kill -HUP $(cat /var/run/rsyslogd.pid)
            fi
        fi
    endscript
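
The "[: -gt: unexpected operator" message is consistent with `[ $NUM_ASIC -gt 1 ]` being evaluated while NUM_ASIC is empty, for example if asic.conf was not sourced. That is an assumption, not something the logs confirm. A minimal sketch of a guard for that case follows; `asic_count` is a hypothetical helper, not part of rsyslog.j2:

```shell
#!/bin/sh
# Hypothetical helper (not in rsyslog.j2): print the ASIC count,
# falling back to 1 when the value is empty or not a plain number.
asic_count() {
    case "${1:-}" in
        ''|*[!0-9]*) echo 1 ;;
        *) echo "$1" ;;
    esac
}

# Simulate the suspected failure: asic.conf was not sourced,
# so NUM_ASIC is empty (assumption for illustration).
NUM_ASIC=""

# Unquoted, `[ $NUM_ASIC -gt 1 ]` would collapse to `[ -gt 1 ]`
# and fail. Quoting plus a default keeps the test well-formed.
if [ "$(asic_count "$NUM_ASIC")" -gt 1 ]; then
    branch="multi-asic"
else
    branch="single-asic"
fi
echo "$branch"
```

Note that both branches of the original script send SIGHUP to orchagent, so a guard like this would only silence the shell error. It would not by itself prevent the signal from being sent.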

Steps to reproduce the issue:
There are no specific steps to reproduce; the issue has happened only once so far.
It looks like a generic, intermittent issue related to logrotate,
but we reproduced it during a warm upgrade from 202405 to 202411.

Describe the results you received:
Orchagent was terminated by SIGHUP because logrotate sent SIGHUP:

2025 Mar  2 03:40:02.291520 sonic INFO logrotate[8512]: logrotate_script: 10: [: -gt: unexpected operator
2025 Mar  2 03:40:02.294484 sonic INFO logrotate: Sending SIGHUP to OA log_file_name: /var/log/swss/swss.rec
2025 Mar  2 03:40:02.389481 sonic INFO systemd[1]: logrotate.service: Deactivated successfully.
2025 Mar  2 03:40:02.389594 sonic INFO systemd[1]: Finished logrotate.service - Rotate log files.
2025 Mar  2 03:40:02.393109 sonic INFO swss#supervisord 2025-03-02 03:40:02,381 WARN exited: orchagent (terminated by SIGHUP; not expected)
2025 Mar  2 03:40:02.398651 sonic INFO swss#supervisor-proc-exit-listener: Process 'orchagent' exited unexpectedly. Terminating supervisor 'swss'

Describe the results you expected:
Logrotate should not send SIGHUP to orchagent

@dgsudharsan
Collaborator

@vaibhavhd For visibility

@saiarcot895
Contributor

saiarcot895 commented Mar 10, 2025

Logrotate should send SIGHUP to orchagent; the question here is why orchagent exited when it got the SIGHUP.

The expectation here is that it should reopen the log files (sairedis.rec and swss.rec) when it gets the SIGHUP.

See also https://github.com/sonic-net/sonic-swss/blob/202411/orchagent/main.cpp#L334-L338
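
For context, "terminated by SIGHUP" is what a supervisor reports when the signal arrives while the default disposition is still in effect. One possible explanation (an assumption, not confirmed in this thread) is that the signal reached orchagent before its handler was registered during boot. The sketch below only illustrates the difference between the two dispositions in plain sh; it does not use orchagent's code:

```shell
#!/bin/sh
# Case 1: no handler installed. The default action for SIGHUP
# terminates the process, and wait reports 128 + signal number.
sleep 30 &
pid=$!
kill -HUP "$pid"
wait "$pid"
no_handler_status=$?

# Case 2: a handler is installed before the signal arrives.
# The process runs the handler (a stand-in for reopening log
# files) and keeps going.
handler_output=$(sh -c 'trap "echo reopen-logs" HUP; kill -HUP $$; echo still-running')

echo "no handler: exit status $no_handler_status"
echo "$handler_output"
```

If the race explanation holds, the window between process start and handler registration would be the place to look.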

@vaibhavhd
Contributor

@prsunny, can you help take a look?

@arlakshm
Contributor

@theasianpianist and @prabhataravind, can you please help triage?
