
Commit 17a3c29

Merge pull request sonic-net#9 from rajendra-dendukuri/coredump_ocyang_updates
Add openconfig yang model for managing core files
2 parents 9a232e3 + 28d238a commit 17a3c29

1 file changed, +226 -33 lines changed

platforms/coremgr/core_file_manager.md

@@ -2,7 +2,7 @@
22
# Core file manager
33

44
## High Level Design Document
5-
**Rev 2.0**
5+
**Rev 2.1**
66

77
## Table of Contents
88

@@ -20,7 +20,12 @@
2020
* [Design](#design)
2121
* [ Core-dump generation service](#core-dump-generation-service)
2222
* [ Tech-support export service](#tech-support-export-service)
23-
* [CLI commands](#cli-commands)
23+
* [User Interface](#user-interface)
24+
* [Data Models](#data-models)
25+
* [Show Commands](#show-commands)
26+
* [Configuration Commands](#configuration-commands)
27+
* [REST API Support](#rest-api-support)
28+
* [Techsupport Export Services CLI commands](#techsupport-export-services-cli-commands)
2429
* [Serviceability and DEBUG](#serviceability-and-debug)
2530
* [Warm Boot Support](#warm-boot-support)
2631
* [Unit Test](#unit-test)
@@ -34,7 +39,8 @@
3439
Rev | Date | Author | Change Description
3540
:---: | :-----: | :------: | :---------
3641
1.0 | 05/07/19 | Kalimuthu | Initial version
37-
2.0 | 03/08/19 | Rajendra | Review Comments
42+
2.0 | 08/03/19 | Rajendra | Review Comments
43+
2.1 | 05/17/20 | Rajendra | Defined an openconfig yang model for systemd-coredump. Added KLISH CLI commands for coredump configuration
3844

3945

4046
# About this Manual
@@ -88,14 +94,14 @@ This document describes new mechanisms to manage the core files that are generat
8894

8995
The following config and show commands shall be supported to configure core dump generation and the export of tech-support data to an external server, and to view core file details. Note that the tech-support data always includes the core dumps generated on the system.
9096

91-
### Config commands
97+
### Config commands requirements
9298

9399
>1. Config command to enable/disable the coredump generation of processes.
94100
>2. Config command to store the details of exporting tech-support data to an external server which includes remote server name, path, transfer protocol type and the user credentials.
95101
>3. Config command to enable/disable the tech-support export.
96102
>4. Config command to configure the tech-support export periodic interval.
97103
98-
### Show commands
104+
### Show commands requirements
99105
> 1. Show commands to display the core file information
100106
> 2. Show commands to display the tech-support export information.
101107
@@ -111,15 +117,15 @@ There should be a limit on the size of the core file generated and the space occ
111117

112118
The corefile management functionality is divided into two main services.
113119

114-
1. Core-dump generation service.
115-
2. Tech-support data export service.
120+
1. Core-Dump Generation Service.
121+
2. Tech-support data export service.
116122

117123

118-
## Core-dump generation service
124+
## Core-Dump Generation Service
119125

120-
1. Core files are usually generated when process terminates unexpectedly. Typical conditions are access violations, termination signals (except SIGKILL), etc.,
121-
2. ulimit configuration might prevent generation of core due to size configurations. We need to ensure this is not the case.
122-
3. Service restart functions - will not generate the core dump as it handle the graceful stop and start. This includes docker service restart as well.
126+
1. Core files are usually generated when a process terminates unexpectedly. Typical conditions are access violations, termination signals (except SIGKILL), etc.
127+
2. ulimit configuration might prevent generation of core due to size configurations. We need to ensure this is not the case.
128+
3. Service restart functions will not generate a core dump, as they handle a graceful stop and start. This includes docker service restart as well.
123129

124130
## systemd-coredump
125131

@@ -154,25 +160,7 @@ Current SONiC code has some basic support for generation and compression of core
154160
>- Setting of “kernel/core_pattern” in “build_debian.sh” is removed as systemd-coredump sets this parameter.
155161
>- A symlink /var/core is created to point to the systemd-coredump standard core file destination “/var/lib/systemd/coredump”
156162
>- “show techsupport” command is modified to capture the core files from the symlink “/var/core”. It is also modified to consider that core files are lz4 compressed instead of gz files.
157-
158-
## Configuration commands:
159-
160-
For SONiC switches following CLI commands will be provided to manage core files
161-
162-
#### show core [ config | info | list ]
163-
164-
>###### **\<config>** Show coredump configuration
165-
>###### **\<info>** Show information about one or more coredumps
166-
>###### **\<list>** List available coredumps
167-
168-
Display list of current core files available and their information. This is a wrapper command for the coredumpctl utility provided by systemd-coredump package.
169163
170-
#### config core <enable|disable>
171-
172-
Enable or disable coredump functionality. This configuration entry will be part of Config DB and thus can be stored as part of startup-configuration.
173-
174-
When disabled, this command will set ProcessSizeMax=0 in the /etc/systemd/coredump.conf file. The configuration variable ProcessSizeMax specifies maximum size in bytes of a core which will be processed. By setting it to 0 core dump generation can be disabled. When enabled this command will set ProcessSizeMax to be the same value as ExternalSizeMax. The configuration variable ExternalSizeMax indicates the maximum (uncompressed) size in bytes of a core to be saved.
175-
176164
## Core Dump Event Logging
177165

178166
Report of available core files can be obtained using the coredumpctl utility.
@@ -224,8 +212,15 @@ When core file is generated for the same process multiple times, the framework s
224212

225213
The archived core file is generated in a pre-defined format by the systemd-coredump tool.
226214

227-
Format: core.<program-name>.<uid>.<boot-id>.<PID>.<timestamp>
228-
Examples:
215+
**Format:**
216+
217+
```
218+
core.<program-name>.<uid>.<boot-id>.<PID>.<timestamp>
219+
```
220+
221+
**Examples:**
222+
223+
```
229224
core.orchagent.0.8bc64adf67544e9e8b897cc5c1c9fde7.10618.1479890855000000000000.lz4
230225
core.orchagent.0.8bc64adf67544e9e8b897cc5c1c9fde7.11686.1479886973000000000000.lz4
231226
core.orchagent.0.8bc64adf67544e9e8b897cc5c1c9fde7.1748.1479887528000000000000.lz4
@@ -235,6 +230,7 @@ Examples:
235230
core.orchagent.0.8bc64adf67544e9e8b897cc5c1c9fde7.26069.1479889746000000000000.lz4
236231
core.orchagent.0.8bc64adf67544e9e8b897cc5c1c9fde7.31104.1479891410000000000000.lz4
237232
core.orchagent.0.8bc64adf67544e9e8b897cc5c1c9fde7.5952.1479889193000000000000.lz4
233+
```
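As a minimal sketch of how the documented name format could be split back into its fields, the Python helper below parses names like the examples above. The function and type names are hypothetical and not part of SONiC or systemd. It assumes that a non-numeric final component (such as `lz4`) is a compression suffix, and it keeps the timestamp as a string rather than interpreting its unit.

```python
from typing import NamedTuple


class CoreFileName(NamedTuple):
    program: str
    uid: int
    boot_id: str
    pid: int
    timestamp: str    # kept as text; the unit is not interpreted here
    compression: str  # e.g. "lz4", or "" when no suffix is present


def parse_core_file_name(name: str) -> CoreFileName:
    """Split core.<program-name>.<uid>.<boot-id>.<PID>.<timestamp>[.<suffix>]."""
    if not name.startswith("core."):
        raise ValueError(f"not a core file name: {name!r}")
    rest = name[len("core."):]

    # Assumption: a trailing non-numeric component is a compression suffix.
    compression = ""
    head, _, last = rest.rpartition(".")
    if head and not last.isdigit():
        rest, compression = head, last

    # Split from the right so a program name containing dots stays intact.
    parts = rest.rsplit(".", 4)
    if len(parts) != 5:
        raise ValueError(f"unexpected core file name layout: {name!r}")
    program, uid, boot_id, pid, timestamp = parts
    return CoreFileName(program, int(uid), boot_id, int(pid), timestamp, compression)
```

Splitting from the right is a deliberate choice: the last four fields have a fixed shape, while the program name is the only field that might itself contain a dot.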
238234

239235
# Tech-support export service
240236

@@ -251,9 +247,23 @@ The export service is configured to monitors the coredump path for any new core
251247

252248
### Config DB Schema
253249

250+
#### Coredump Configuration
251+
252+
The coredump administrative mode can be stored in the Config DB as defined below. By default, coredump is enabled and if the COREDUMP Config DB table entry is missing, coredump is
253+
assumed to be administratively enabled.
254+
255+
```
256+
"COREDUMP": {
257+
"config": {
258+
"enabled": "true"
259+
}
260+
}
261+
```
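The default described above (a missing COREDUMP entry means coredump is administratively enabled) could be read as in the following Python sketch. It works on a plain dictionary that mirrors the table layout shown; the function name is hypothetical, and no particular Config DB client API is assumed.

```python
def coredump_admin_enabled(config_db: dict) -> bool:
    """Return the coredump administrative mode from a Config DB snapshot.

    A missing COREDUMP table, "config" key, or "enabled" field is treated
    as enabled, matching the default described in the text.
    """
    entry = config_db.get("COREDUMP", {}).get("config", {})
    # Values are stored as strings ("true"/"false") in the example schema.
    return str(entry.get("enabled", "true")).strip().lower() == "true"
```

For example, `coredump_admin_enabled({})` yields `True`, while a snapshot with `"enabled": "false"` yields `False`.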
262+
263+
#### Tech Support Services
254264
In order to export the tech-support data, remote server details have to be configured on the device. Through the CLI, an external storage server can be configured, including the server IP, path, and access information such as user credentials and transport protocol. This information is stored in the Config DB.
255265
,
256-
>>
266+
```
257267
"EXPORT": {
258268
"export": {
259269
"config": "<enable/disable>",
@@ -266,10 +276,193 @@ In order to export the tech support data, remote server details have to be confi
266276
267277
}
268278
},
279+
```
269280

270281
When the export service is configured, the remote server password is encrypted with the device's universally unique identifier (UUID) and stored in the Config DB, so that the password can be decrypted only on the device. The protocol field specifies the file transfer protocol, either SCP or SFTP. The interval field specifies how often the tech-support data is captured and exported.
271282

272-
## CLI commands
283+
## User Interface
284+
### Data Models
285+
286+
Coredump configuration and status parameters are defined in the openconfig-systemd-coredump yang model. The openconfig-systemd-coredump yang model is included as an extension to the openconfig-system yang model.
287+
288+
```
289+
+--rw oc-sys-ext:systemd-coredump
290+
+--rw oc-sys-ext:config
291+
| +--rw oc-sys-ext:enable? boolean
292+
+--ro oc-sys-ext:state
293+
| +--ro oc-sys-ext:enable? boolean
294+
+--ro oc-sys-ext:core-file-records
295+
+--ro oc-sys-ext:core-file-record* [timestamp]
296+
+--ro oc-sys-ext:timestamp -> ../state/timestamp
297+
+--ro oc-sys-ext:state
298+
+--ro oc-sys-ext:timestamp? oc-types:timeticks64
299+
+--ro oc-sys-ext:executable? string
300+
+--ro oc-sys-ext:core-file? string
301+
+--ro oc-sys-ext:pid? uint64
302+
+--ro oc-sys-ext:uid? uint32
303+
+--ro oc-sys-ext:gid? uint32
304+
+--ro oc-sys-ext:signal? uint32
305+
+--ro oc-sys-ext:command-line? string
306+
+--ro oc-sys-ext:boot-identifier? string
307+
+--ro oc-sys-ext:machine-identifier? string
308+
+--ro oc-sys-ext:crash-message? string
309+
+--ro oc-sys-ext:core-file-present? boolean
310+
```
311+
312+
313+
### Show Commands
314+
315+
The following CLI commands provide the ability to view the core files generated on the SONiC switch.
316+
317+
#### show core config
318+
**Description**
319+
320+
Display the coredump configuration. Use this command to display if the coredump feature
321+
is administratively enabled or disabled.
322+
323+
**Usage**
324+
325+
```
326+
show core config
327+
```
328+
329+
**Example**
330+
331+
```
332+
sonic# show core config
333+
Coredump : Enabled
334+
```
335+
336+
#### show core list
337+
**Description**
338+
339+
Use this command to list a summary of the core files generated by the kernel. The following information
340+
about each core file is also displayed.
341+
- TIME: The time of the crash, as reported by the kernel in UTC
342+
- PID: The identifier of the process that crashed
343+
- SIG: The signal that caused the process to crash, when applicable
344+
- COREFILE: Indicates whether the captured core file exists on local disk or has been removed
345+
- EXE: The application executable that has crashed
346+
347+
**Usage**
348+
349+
```
350+
show core list
351+
```
352+
353+
**Example**
354+
355+
```
356+
sonic# show core list
357+
TIME PID SIG COREFILE EXE
358+
2020-05-16 11:54:33 26480 11 present clish
359+
2020-05-15 01:25:16 6195 11 present crashme
360+
2020-05-15 00:45:28 13604 11 present crashme
361+
2020-05-14 02:11:11 3197 11 present crashme
362+
2020-05-13 01:10:56 17844 11 missing crashme
363+
2020-05-13 01:10:55 17728 11 present crashme
364+
```
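For scripts that consume this listing, the table can be turned into structured records. The Python sketch below is one possible approach, assuming the column layout shown in the example (TIME as a date and a time token, followed by PID, SIG, COREFILE and EXE); the function name and dictionary keys are hypothetical.

```python
from datetime import datetime


def parse_core_list(output: str) -> list:
    """Parse the TIME/PID/SIG/COREFILE/EXE table into a list of dicts."""
    records = []
    for line in output.splitlines():
        # TIME spans two whitespace-separated tokens (date and time).
        fields = line.split(None, 5)
        if len(fields) != 6:
            continue  # header, prompt, or blank line
        date, clock, pid, sig, corefile, exe = fields
        try:
            record = {
                "time": datetime.strptime(f"{date} {clock}", "%Y-%m-%d %H:%M:%S"),
                "pid": int(pid),
                "sig": int(sig),
                "core_file_present": corefile == "present",
                "exe": exe,
            }
        except ValueError:
            continue  # not a data row
        records.append(record)
    return records
```

Rows that do not match the expected shape are skipped rather than raising, so the prompt and header lines can be passed in unchanged.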
365+
366+
#### show core info
367+
**Description**
368+
369+
Use this command to display detailed information about a crash that has occurred in the system. This command
370+
takes a process ID or executable name as input to search and display the corresponding crash information. If multiple
371+
core files are found which satisfy the match condition, information of all core files is displayed.
372+
373+
The following information about matching core files is displayed:
374+
- Time: The time of the crash, as reported by the kernel in UTC
375+
- Executable: The full path to the application executable that has crashed
376+
- Core File: The file name of the application core dump of the executable that has crashed
377+
- PID: The identifier of the process that crashed
378+
- User ID: The user identifier of the process that crashed
379+
- Group ID: The group identifier of the process that crashed
380+
- Signal: The signal that caused the process to crash, when applicable
381+
- Command Line: The command line arguments of the process that crashed
382+
- Boot ID: The unique identifier of the local system that is generated and set on each system boot up event
383+
- Machine ID: The unique machine identifier of the local system that is set during installation
384+
- Core File Found: Indicates whether the captured core file exists on local disk or has been removed
385+
- Crash Message: A copy of the application stack trace information of the process that crashed
386+
387+
388+
**Usage**
389+
390+
```
391+
show core info <pid | exe>
392+
```
393+
394+
**Example**
395+
396+
```
397+
sonic# show core info clish
398+
Time : 2020-05-16 11:54:33
399+
Executable : /usr/sbin/cli/clish
400+
Core File : /var/lib/systemd/coredump/core.clish.1000.8f1cad11c59840318a6df3aa6ed3633e.26480.1589630073000000000000.lz4
401+
PID : 26480
402+
User ID : 1000
403+
Group ID : 1000
404+
Signal : 11
405+
Command Line : /usr/sbin/cli/clish
406+
Boot ID : 8f1cad11c59840318a6df3aa6ed3633e
407+
Machine ID : fc0a437952314ee5a585a94ceaa480af
408+
Core File Found : present
409+
Crash Message :
410+
Process 26480 (clish) of user 1000 dumped core.
411+
Stack trace of thread 152:
412+
#0 0x00007f30f516357a PyEval_EvalFrameEx (libpython2.7.so.1.0)
413+
#1 0x00007f30f52cc29c PyEval_EvalCodeEx (libpython2.7.so.1.0)
414+
#2 0x00007f30f5220670 n/a (libpython2.7.so.1.0)
415+
#3 0x00007f30f51b85c3 PyObject_Call (libpython2.7.so.1.0)
416+
#4 0x00007f30f52cb6c7 PyEval_CallObjectWithKeywords (libpython2.7.so.1.0)
417+
#5 0x00007f30ebb2f43a n/a (/usr/sbin/cli/.libs/clish_plugin_clish.so)
418+
```
419+
420+
### Configuration Commands
421+
422+
The following CLI commands provide the ability to configure the systemd-coredump feature.
423+
424+
#### config core <enable|disable>
425+
**Description**
426+
427+
This command can be used to enable or disable the capability to generate a core file when an application crash is detected by the kernel.
428+
429+
When disabled, this command will set ProcessSizeMax=0 in the /etc/systemd/coredump.conf file. The configuration variable ProcessSizeMax
430+
specifies maximum size in bytes of a core which will be processed. By setting it to 0 core dump generation can be disabled. When enabled
431+
this command will set ProcessSizeMax to be the same value as ExternalSizeMax.
432+
The configuration variable ExternalSizeMax indicates the maximum (uncompressed) size in bytes of a core to be saved.
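The ProcessSizeMax handling described above can be sketched as a pure text transformation of the coredump.conf contents. This is an illustrative Python sketch, not the actual SONiC implementation: the function name is hypothetical, and `default_external_max` is a placeholder used only when the file does not set ExternalSizeMax explicitly.

```python
import re


def set_coredump_admin_mode(conf_text: str, enable: bool,
                            default_external_max: str = "2G") -> str:
    """Return coredump.conf text with ProcessSizeMax set for the admin mode."""
    # When enabling, mirror the active (uncommented) ExternalSizeMax value.
    match = re.search(r"^\s*ExternalSizeMax\s*=\s*(\S+)", conf_text, re.MULTILINE)
    external_max = match.group(1) if match else default_external_max
    new_line = f"ProcessSizeMax={external_max if enable else '0'}"

    out, replaced = [], False
    for line in conf_text.splitlines():
        # Replace the first ProcessSizeMax line (active or commented out)
        # and drop any further duplicates.
        if re.match(r"^\s*#?\s*ProcessSizeMax\s*=", line):
            if not replaced:
                out.append(new_line)
                replaced = True
            continue
        out.append(line)

    if not replaced:
        # No existing setting: add it under [Coredump], creating the
        # section when it is missing.
        index = next((i for i, l in enumerate(out) if l.strip() == "[Coredump]"), None)
        if index is None:
            out.extend(["[Coredump]", new_line])
        else:
            out.insert(index + 1, new_line)
    return "\n".join(out) + "\n"
```

Keeping the function free of file I/O makes it easy to check before it is wired to /etc/systemd/coredump.conf; the caller would read the file, apply the function, and write the result back.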
433+
434+
435+
**Usage**
436+
437+
```
438+
[no] core enable
439+
```
440+
441+
**Example**
442+
```
443+
sonic# configure terminal
444+
sonic(config)# no core
445+
446+
sonic(config)# core enable
447+
```
448+
449+
### REST API Support
450+
451+
The following REST URIs are supported to configure coredump and view the core file information.
452+
453+
```
454+
PATCH "<REST-SERVER:PORT>/restconf/data/openconfig-system:system/openconfig-system-ext:systemd-coredump/config" -d "{ \"openconfig-system-ext:config\": {\"enable\": true}}"
455+
456+
PATCH "<REST-SERVER:PORT>/restconf/data/openconfig-system:system/openconfig-system-ext:systemd-coredump/config" -d "{ \"openconfig-system-ext:config\": {\"enable\": false}}"
457+
458+
GET "<REST-SERVER:PORT>/restconf/data/openconfig-system:system/openconfig-system-ext:systemd-coredump/core-file-records"
459+
460+
GET "<REST-SERVER:PORT>/restconf/data/openconfig-system:system/openconfig-system-ext:systemd-coredump/config"
461+
462+
GET "<REST-SERVER:PORT>/restconf/data/openconfig-system:system/openconfig-system-ext:systemd-coredump/state"
463+
```
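As a hedged illustration of the PATCH requests listed above, the Python sketch below builds (but does not send) such a request with the standard library. The function name is hypothetical. The `https` scheme and the `application/yang-data+json` content type are assumptions based on common RESTCONF practice; authentication and certificate handling are omitted.

```python
import json
import urllib.request

# Path taken from the URIs listed in this document.
COREDUMP_CONFIG_PATH = (
    "/restconf/data/openconfig-system:system/"
    "openconfig-system-ext:systemd-coredump/config"
)


def build_coredump_patch(server: str, enable: bool) -> urllib.request.Request:
    """Build a PATCH request that enables or disables coredump generation.

    `server` is the REST server as "host:port". The request is only
    constructed here; the caller decides how and when to send it.
    """
    payload = {"openconfig-system-ext:config": {"enable": enable}}
    return urllib.request.Request(
        url=f"https://{server}{COREDUMP_CONFIG_PATH}",  # scheme is an assumption
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/yang-data+json"},  # assumption
        method="PATCH",
    )
```

A caller could pass the returned object to `urllib.request.urlopen` once the TLS context and credentials appropriate for the deployment are in place.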
464+
465+
## Techsupport Export Services CLI commands
273466

274467
To enable the export feature:
275468
