Device Hangs When Running config reload with an Errored config_db.json File #21918

Open
silango-ebay opened this issue Mar 4, 2025 · 2 comments

Comments

silango-ebay commented Mar 4, 2025

Description:
When running config reload with an errored config_db.json file, the device hangs, all SONiC containers crash, and SONiC commands become unresponsive. The only way to recover is a full Linux reboot. Instead, the system should detect the invalid configuration, throw an error, and allow a successful reload after correcting the file.

Steps to Reproduce the Issue:
1. Modify the config_db.json file and introduce an error (e.g., incorrect syntax or missing required fields).
2. Run the following command:
sudo config reload -y

Describe the Results You Received:

admin@sonic:~$ sudo config reload -y
Disabling container monitoring ...
Stopping SONiC target ...
Running command: /usr/local/bin/sonic-cfggen  -j /etc/sonic/init_cfg.json  -j /etc/sonic/config_db.json  --write-to-db
Traceback (most recent call last):
  File "/usr/local/bin/sonic-cfggen", line 452, in <module>
    main()
  File "/usr/local/bin/sonic-cfggen", line 322, in main
    _process_json(args, data)
  File "/usr/local/bin/sonic-cfggen", line 236, in _process_json
    deep_update(data, FormatConverter.to_deserialized(json.load(stream)))
  File "/usr/lib/python3.9/json/__init__.py", line 293, in load
    return loads(fp.read(),
  File "/usr/lib/python3.9/json/__init__.py", line 346, in loads
    return _default_decoder.decode(s)
  File "/usr/lib/python3.9/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib/python3.9/json/decoder.py", line 353, in raw_decode
    obj, end = self.scan_once(s, idx)
json.decoder.JSONDecodeError: Invalid control character at: line 844 column 28 (char 26989)
admin@sonic:~$
admin@sonic:~$ docker ps -a
CONTAINER ID   IMAGE                                COMMAND                  CREATED         STATUS                        PORTS     NAMES
34fc31eaa986   avizdock/ones-agent:v2.0.0           "/usr/bin/start.sh"      13 months ago   Up 52 minutes                           ones-agent
e0599c031ccc   docker-sonic-telemetry:latest        "/usr/local/bin/supe…"   21 months ago   Exited (0) 29 seconds ago               telemetry
173302cb94a8   docker-sonic-mgmt-framework:latest   "/usr/local/bin/supe…"   21 months ago   Exited (0) 28 seconds ago               mgmt-framework
e11c339ac6eb   docker-snmp:latest                   "/usr/local/bin/supe…"   21 months ago   Exited (0) 22 seconds ago               snmp
c17421141cd1   docker-lldp:latest                   "/usr/bin/docker-lld…"   21 months ago   Exited (137) 19 seconds ago             lldp
4a7708f03872   docker-router-advertiser:latest      "/usr/bin/docker-ini…"   21 months ago   Exited (0) 28 seconds ago               radv
63249652393b   docker-platform-monitor:latest       "/usr/bin/docker_ini…"   21 months ago   Exited (0) 20 seconds ago               pmon
8a005334da25   docker-syncd-mlnx:latest             "/usr/local/bin/supe…"   21 months ago   Exited (0) 12 seconds ago               syncd
e9622cf10055   docker-fpm-frr:latest                "/usr/bin/docker_ini…"   21 months ago   Exited (137) 20 seconds ago             bgp
8505f14bd6cc   docker-teamd:latest                  "/usr/local/bin/supe…"   21 months ago   Exited (0) 24 seconds ago               teamd
d092429170b9   docker-orchagent:latest              "/usr/bin/docker-ini…"   21 months ago   Exited (137) 14 seconds ago             swss
c7d0c5e97daf   docker-eventd:latest                 "/usr/local/bin/supe…"   21 months ago   Exited (0) 28 seconds ago               eventd
16a6ba2b7cb4   docker-database:latest               "/usr/local/bin/dock…"   21 months ago   Up 55 minutes                           database
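
For reference, the `JSONDecodeError` in the traceback above is what Python's `json` module raises in its default strict mode when a string value contains a raw control character, such as a literal newline or tab. Below is a minimal sketch that reproduces the same class of error outside SONiC. The sample document and its `VLAN` entry are made up for illustration and are not taken from the affected config_db.json.

```python
import json

# Hypothetical fragment: a raw newline inside a string value, e.g. left
# behind by an editor. This is NOT the actual content of config_db.json.
broken = '{"VLAN": {"Vlan100": {"description": "uplink\nvlan"}}}'


def try_parse(text):
    """Return None if text is valid JSON, otherwise the decoder's exception."""
    try:
        json.loads(text)
        return None
    except json.JSONDecodeError as err:
        return err


err = try_parse(broken)
print(err)  # the message starts with "Invalid control character"

# The same text parses once the raw newline is replaced by an escape sequence.
fixed = broken.replace("\n", "\\n")
assert try_parse(fixed) is None
```

The line and column in the error message point at the offending character, which is usually enough to locate it in the file.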

After correcting config_db.json, the next reload hangs:

admin@sonic:~$ sudo vi /etc/sonic/config_db.json
admin@sonic:~$ sudo config reload -y

^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C

Running any SONiC command results in a system hang.
The only way to recover the device is by performing a reboot.

Describe the Results You Expected:
The device should detect the erroneous config_db.json file and throw an appropriate error message.
The system should not hang or crash the containers.
After correcting the config_db.json file and running config reload again, the device should successfully reload the configuration without requiring a full reboot.
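
As an illustration of the kind of error message described above, here is a minimal sketch of a check that turns the raw traceback into a single line naming the file, line, and column. The helper name `describe_json_error` is hypothetical and is not part of sonic-utilities or sonic-cfggen; it uses only Python's standard `json` module.

```python
import json


def describe_json_error(path):
    """Return None if the file parses as JSON, otherwise a readable message.

    Hypothetical helper for illustration; not an existing SONiC function.
    """
    try:
        with open(path, encoding="utf-8") as stream:
            text = stream.read()
    except (OSError, UnicodeDecodeError) as err:
        return f"{path}: cannot read file: {err}"

    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        # err.lineno is 1-based and counts "\n" only, so split the same way.
        bad_line = text.split("\n")[err.lineno - 1].strip()
        return (f"{path}: invalid JSON at line {err.lineno}, "
                f"column {err.colno}: {err.msg}\n    {bad_line}")
    return None


if __name__ == "__main__":
    message = describe_json_error("/etc/sonic/config_db.json")
    print(message or "config_db.json parses as valid JSON")
```

This only catches syntax errors. A file that is valid JSON but is missing required fields would still pass, so that case would need a schema-level check on top.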

Output of show version:
admin@sonic:~$ show version

SONiC Software Version: SONiC.202211.285097-d93970bc2
SONiC OS Version: 11
Distribution: Debian 11.6
Kernel: 5.10.0-18-2-amd64
Build commit: d93970b
Build date: Thu Jun 1 13:16:12 UTC 2023
Built by: AzDevOps@vmss-soni0017FB

Platform: x86_64-mlnx_msn4700-r0
HwSKU: ACS-MSN4700
ASIC: mellanox
ASIC Count: 1
Serial Number: MT2127X00668
Model Number: MSN4700-WS2FO
Hardware Revision: A1
Uptime: 11:21:16 up 44 min, 1 user, load average: 0.56, 0.53, 0.51
Date: Tue 04 Mar 2025 11:21:16

Docker images:
REPOSITORY                   TAG                      IMAGE ID      SIZE
avizdock/ones-agent          v2.0.0                   65db34b4a08d  106MB
docker-syncd-mlnx            202211.285097-d93970bc2  9c1731d74a08  799MB
docker-syncd-mlnx            latest                   9c1731d74a08  799MB
docker-orchagent             202211.285097-d93970bc2  00caa3bf9212  398MB
docker-orchagent             latest                   00caa3bf9212  398MB
docker-fpm-frr               202211.285097-d93970bc2  f578990a6fbf  408MB
docker-fpm-frr               latest                   f578990a6fbf  408MB
docker-teamd                 202211.285097-d93970bc2  f2fdc5c8435d  379MB
docker-teamd                 latest                   f2fdc5c8435d  379MB
docker-macsec                latest                   605c41a3a906  381MB
docker-platform-monitor      202211.285097-d93970bc2  e5f902a41aee  799MB
docker-platform-monitor      latest                   e5f902a41aee  799MB
docker-snmp                  202211.285097-d93970bc2  c7c421be6aa8  407MB
docker-snmp                  latest                   c7c421be6aa8  407MB
docker-dhcp-relay            latest                   8ed4b74f3ed2  372MB
docker-eventd                202211.285097-d93970bc2  2bd9f79e438f  362MB
docker-eventd                latest                   2bd9f79e438f  362MB
docker-sonic-telemetry       202211.285097-d93970bc2  aac602648956  661MB
docker-sonic-telemetry       latest                   aac602648956  661MB
docker-sonic-p4rt            202211.285097-d93970bc2  f53ef73d13c3  444MB
docker-sonic-p4rt            latest                   f53ef73d13c3  444MB
docker-lldp                  202211.285097-d93970bc2  d60896006825  404MB
docker-lldp                  latest                   d60896006825  404MB
docker-mux                   202211.285097-d93970bc2  8b1a74f806bd  411MB
docker-mux                   latest                   8b1a74f806bd  411MB
docker-database              202211.285097-d93970bc2  f77ca7a88334  362MB
docker-database              latest                   f77ca7a88334  362MB
docker-router-advertiser     202211.285097-d93970bc2  f90bbaad223c  362MB
docker-router-advertiser     latest                   f90bbaad223c  362MB
docker-nat                   202211.285097-d93970bc2  ed3d70fdac5d  348MB
docker-nat                   latest                   ed3d70fdac5d  348MB
docker-sflow                 202211.285097-d93970bc2  b7b95468a8ea  346MB
docker-sflow                 latest                   b7b95468a8ea  346MB
docker-sonic-mgmt-framework  202211.285097-d93970bc2  b34aa1e3f3c8  475MB
docker-sonic-mgmt-framework  latest                   b34aa1e3f3c8  475MB
avizdock/ones-agent          latest                   b65c0ba99e7c  93.2MB

@arlakshm
Contributor

Is this a valid use case, or is this a negative test?

@arlakshm
Contributor

Can you enhance the config reload CLI command so that it checks config_db.json and aborts if the contents are corrupted, before sonic.target is stopped?
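
A rough sketch of that ordering, assuming a plain JSON syntax check is an acceptable first gate: parse every input file first, and only touch the running services if all of them load. The names `validate_json_files` and `guarded_reload`, and the three callables passed in, are hypothetical placeholders and do not reflect the actual `config reload` implementation.

```python
import json


def validate_json_files(paths):
    """Parse every file; return a list of (path, reason) for those that fail."""
    failures = []
    for path in paths:
        try:
            with open(path, encoding="utf-8") as stream:
                json.load(stream)
        except (OSError, ValueError) as err:
            # json.JSONDecodeError and UnicodeDecodeError are ValueError subclasses.
            failures.append((path, str(err)))
    return failures


def guarded_reload(paths, stop_services, load_config, start_services):
    """Run the reload steps only if every input file parses.

    The three callables stand in for whatever the real command does, for
    example stopping sonic.target and running sonic-cfggen. They are
    assumptions made for this sketch.
    """
    failures = validate_json_files(paths)
    if failures:
        for path, reason in failures:
            print(f"Aborting reload, bad input file {path}: {reason}")
        return False  # nothing was stopped, so the device keeps running

    stop_services()
    load_config()
    start_services()
    return True
```

With the two files from the traceback in this issue (`/etc/sonic/init_cfg.json` and `/etc/sonic/config_db.json`) passed as `paths`, a malformed config_db.json would be rejected while the containers are still up, and the reload could simply be retried after fixing the file.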
