Running container killed with failure to write to cgroup.procs #1326

Open
sboschman opened this issue Feb 14, 2017 · 6 comments


@sboschman

On our Jenkins CI infrastructure we run Maven builds inside a Docker container. Unfortunately, once in a while the build container crashes during the Maven build with a failure to write a PID to the cgroup.procs file.

```
Feb 13 23:56:41 myhost dockerd-current[5659]: time="2017-02-13T23:56:41.695619134+01:00" level=error msg="Error running exec in container: rpc error: code = 13 desc = invalid header field value "oci runtime error: exec failed: container_linux.go:247: starting container process caused \"process_linux.go:87: adding pid 21890 to cgroups caused \\\"failed to write 21890 to cgroup.procs: write /sys/fs/cgroup/cpu,cpuacct/system.slice/docker-f2ea7bd5f37f4d5719fec4a05fdb58401207c98b6abbfa02e497af4bc167ec08.scope/cgroup.procs: invalid argument\\\"\"\n\

Feb 14 04:34:52 myhost dockerd-current[5659]: time="2017-02-14T04:34:52.084545467+01:00" level=error msg="Error running exec in container: rpc error: code = 13 desc = invalid header field value "oci runtime error: exec failed: container_linux.go:247: starting container process caused \"process_linux.go:87: adding pid 36656 to cgroups caused \\\"failed to write 36656 to cgroup.procs: write /sys/fs/cgroup/cpu,cpuacct/system.slice/docker-2b412ea4871f6b1ed33547224bac344ea677f1814604a501a44e42ce84b64854.scope/cgroup.procs: invalid argument\\\"\"\n""

Feb 14 06:20:22 myhost dockerd-current[5659]: time="2017-02-14T06:20:22.751841415+01:00" level=error msg="Error running exec in container: rpc error: code = 13 desc = invalid header field value "oci runtime error: exec failed: container_linux.go:247: starting container process caused \"process_linux.go:87: adding pid 30239 to cgroups caused \\\"failed to write 30239 to cgroup.procs: write /sys/fs/cgroup/cpu,cpuacct/system.slice/docker-1418520af414c01054dce8ca3777b616289ee41a0de7ad135af8e2e740472a49.scope/cgroup.procs: invalid argument\\\"\"\n""
```

I assume the error is returned from https://github.com/opencontainers/runc/blob/v1.0.0-rc2/libcontainer/cgroups/utils.go#L422, which boils down to https://github.com/golang/go/blob/master/src/io/ioutil/ioutil.go#L76 and https://github.com/golang/go/blob/master/src/os/file.go#L139. If so, the "invalid argument" (EINVAL) is returned by the kernel for the write(2) to cgroup.procs, not generated by runc itself.

Output of docker info:

```
 Running: 3
 Paused: 0
 Stopped: 7
Images: 54
Server Version: 1.12.6
Storage Driver: devicemapper
 Pool Name: docker-thinpool
 Pool Blocksize: 524.3 kB
 Base Device Size: 10.74 GB
 Backing Filesystem: xfs
 Data file:
 Metadata file:
 Data Space Used: 10.18 GB
 Data Space Total: 45.1 GB
 Data Space Available: 34.92 GB
 Metadata Space Used: 4.526 MB
 Metadata Space Total: 4.295 GB
 Metadata Space Available: 4.29 GB
 Thin Pool Minimum Free Space: 4.509 GB
 Udev Sync Supported: true
 Deferred Removal Enabled: true
 Deferred Deletion Enabled: true
 Deferred Deleted Device Count: 0
 Library Version: 1.02.136 (2016-11-05)
Logging Driver: journald
Cgroup Driver: systemd
Plugins:
 Volume: local
 Network: overlay host null bridge
Swarm: inactive
Runtimes: oci runc
Default Runtime: oci
Security Options: seccomp selinux
Kernel Version: 4.9.7-201.fc25.x86_64
Operating System: Fedora 25 (Atomic Host)
OSType: linux
Architecture: x86_64
Number of Docker Hooks: 2
CPUs: 56
Total Memory: 125.8 GiB
Name: myhost
ID: XDZT:BINX:3JZJ:BABH:6WSS:T2D5:Z5XJ:FM3Y:HOG7:XB33:T22Z:F2IS
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Insecure Registries:
 127.0.0.0/8
Registries: docker.io (secure)
```
@hqhq
Contributor

hqhq commented Aug 9, 2017

Given that all errors happened when you were using docker exec and trying to join the cpu,cpuacct cgroup, the only possibility I can think of is that the process somehow got PF_NO_SETAFFINITY set (usually not possible from userspace), or that the process was made a real-time (RT) process without any rt_runtime allocated in the cgroup.

danail-branekov added a commit to masters-of-cats/runc that referenced this issue Aug 30, 2018
opencontainers#1326 suggests that an invalid argument err while adding the process to a cgroup might be caused by process affinity flags. Let's log that and see.
@chinglinwen

Does this relate to #1884?

I get the same error text:

"note": "Liveness probe failed: OCI runtime exec failed: exec failed: container_linux.go:348: starting container process caused \"process_linux.go:90: adding pid 27257 to cgroups caused \\\"failed to write 27257 to cgroup.procs: write /sys/fs/cgroup/cpu,cpuacct/kubepods/burstable/poda16dca42-8cfc-11e9-8753-767ef6f517db/443e19668182ba1351c93af648fad2f8b839990567d5fd4c612c152800888301/cgroup.procs: invalid argument\\\"\": unknown\r\n",
 "type": "Warning",

Readiness check:

        readinessProbe:
          exec:
            command:
            - /bin/sh
            - -c
            - 'redis-cli -h "${POD_IP}" -p 19000 ping'
          failureThreshold: 3
          initialDelaySeconds: 5
          periodSeconds: 5
          successThreshold: 1
          timeoutSeconds: 1

kubernetes: v1.14.1
os: CentOS Linux release 7.4.1708 (Core)
kernel: 4.14.15-1.el7.elrepo.x86_64
docker: 18.06.2-ce (API version: 1.38 (minimum version 1.12))

@zeusro

zeusro commented Jul 19, 2019

@chinglinwen I'm seeing a similar situation:

Readiness probe failed: rpc error: code = 2 desc = oci runtime error: exec failed: container_linux.go:262: starting container process caused "process_linux.go:86: adding pid 16166 to cgroups caused "failed to write 16166 to cgroup.procs: write /sys/fs/cgroup/cpu,cpuacct/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod6e7e876e_9957_11e9_a845_00163e08cd06.slice/docker-941ddc07fc84ba668df4821403a6b051c85aad4cf6c64153aae0e9a0977d943d.scope/cgroup.procs: invalid argument\

 Kernel Version:             3.10.0-693.2.2.el7.x86_64
 OS Image:                   CentOS Linux 7 (Core)
 Operating System:           linux
 Architecture:               amd64
 Container Runtime Version:  docker://17.6.2
 Kubelet Version:            v1.12.6-aliyun.1
 Kube-Proxy Version:         v1.12.6-aliyun.1

@nnvema

nnvema commented Oct 9, 2019

I'm facing the same problem on kops 1.11.7:

Liveness probe failed: rpc error: code = 2 desc = oci runtime error: exec failed: container_linux.go:247: starting container process caused "process_linux.go:87: adding pid 27268 to cgroups caused "failed to write 27268 to cgroup.procs: write /sys/fs/cgroup/cpu,cpuacct/kubepods/burstable/pod612392f6-a439-11e9-9830-0e60496b67de/958ab969a162d91c9e58cb9e84db295083dfb3e4aa833e7575d3d042bffce720/cgroup.procs: invalid argument""

  Normal   Killing    20m (x9 over 83d)   kubelet,   Killing container with id docker://kd-inventory:Container failed liveness probe.. Container will be killed and recreated.

@ilyesAj

ilyesAj commented Mar 9, 2021

Any updates on this issue?

@kolyshkin
Contributor

@ilyesAj do you see this, too? If so, can you check the kernel logs (dmesg) for anything from the OOM killer? I suspect this is a race between runc trying to start the exec and the kernel killing the exec'ed process.


7 participants