Multiple containers in the same cgroup #716

Open
kolyshkin opened this issue Aug 10, 2021 · 3 comments

@kolyshkin
Collaborator

kolyshkin commented Aug 10, 2021

It seems that crun has the same set of issues as opencontainers/runc#3132.

Using a config.json with cgroupsPath set:

# crun run -d s3
# crun run -d s4
# crun list
NAME PID       STATUS   BUNDLE PATH                            
s3   245092    running  /home/kir/git/runc/tst                 
s4   245107    running  /home/kir/git/runc/tst                 
# diff -u /proc/{245092,245107}/cgroup 
(same cgroup)
# crun pause s3
# crun list
NAME PID       STATUS   BUNDLE PATH                            
s3   245092    paused   /home/kir/git/runc/tst                 
s4   245107    paused   /home/kir/git/runc/tst                 
(both paused)
root@ubu2004:/home/kir/git/runc/tst# crun run -d s5
(hung)
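
A minimal config.json fragment reproducing this setup: both bundles set the same linux.cgroupsPath (the value /mycgroup here is hypothetical):

    "linux": {
        "cgroupsPath": "/mycgroup"
    }

Since both containers ask for the same path, their processes land in one cgroup, which is what the diff above confirms.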

Can we discuss this in opencontainers/runc#3132, @giuseppe?

@kolyshkin
Collaborator Author

So, for runc I am implementing the following measures (see the last commits in opencontainers/runc#3131):

  1. runc run/create: refuse non-empty cgroup
  2. runc run/create: refuse cgroup if frozen
  3. runc exec: refuse paused container

Item 3 is fixed in crun by #727.
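
For reference, on cgroup v2 the two remaining checks boil down to reading two interface files before the init process is added. A sketch with a hypothetical cgroup path (not crun's actual code):

# cat /sys/fs/cgroup/mycgroup/cgroup.procs
(empty output: no processes, so check 1 passes)
# grep frozen /sys/fs/cgroup/mycgroup/cgroup.events
frozen 0
(frozen 0: not frozen, so check 2 passes)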

Items 1 and 2 are somewhat harder to fix in crun when the systemd cgroup manager is used, because in that case the cgroup path is only known after we have put a process into the cgroup (see systemd_finalize), meaning it is not possible to do any cgroup checks before the init PID has already been added to it.

I think it's possible to change that (i.e. figure out the path beforehand, rather than getting it from /proc/PID/cgroup afterwards), but I'm not sure such a change would be welcomed.
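
For context, this is the readback in question: once the init process has been placed into its systemd-managed cgroup, the path becomes visible via /proc/PID/cgroup. A sketch (the PID and scope name are hypothetical):

# cat /proc/245092/cgroup
0::/machine.slice/crun-s3.scope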

kolyshkin added a commit to kolyshkin/runtime-spec that referenced this issue Sep 27, 2021
It makes sense for the runtime to reject a cgroup which is frozen
(for both new and existing containers), otherwise the runtime
command (create/run/exec) may end up stuck.

It makes sense for the runtime to make sure the cgroup for a new
container is empty (i.e. there are no processes in it), and to reject
it otherwise. The scenario in which a non-empty cgroup is used for a
new container has multiple problems, for example:

* If two or more containers share the same cgroup, and each container
  has its own limits configured, the order of container starts
  ultimately determines whose limits will effectively be applied.

* If two or more containers share the same cgroup, and one of the
  containers is paused/unpaused, all the others are paused/unpaused,
  too.

* If cgroup.kill is used to forcefully kill the container, it will
  also kill other processes that are not part of this container but
  merely belong to the same cgroup.

* When the systemd cgroup manager is used, this becomes even worse:
  a stop (or even a failed start) of any container results in a
  stopTransientUnit command being sent to systemd, and so (depending
  on unit properties) other containers can receive SIGTERM, be killed
  after a timeout, etc.

* Many other bad scenarios are possible, as the implicit assumption
  of a 1:1 container:cgroup mapping is broken.
opencontainers/runc#3132
containers/crun#716

Signed-off-by: Kir Kolyshkin <[email protected]>
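
The cgroup.kill bullet is easy to demonstrate: on cgroup v2 (Linux 5.14 and later), writing 1 to the cgroup.kill file SIGKILLs every process in the cgroup, with no way to distinguish which container a process belongs to. A sketch with a hypothetical path, using the s3/s4 setup from above:

# echo 1 > /sys/fs/cgroup/mycgroup/cgroup.kill
(all processes of both s3 and s4 are killed)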
kolyshkin added further commits to kolyshkin/runtime-spec that referenced this issue Sep 28 – Sep 30, 2021

@cdoern
Contributor

cdoern commented Jul 1, 2022

@giuseppe I am looking to get into some crun code. Do you think this would be a good issue to pick up, or is it resolved?

@giuseppe
Member

giuseppe commented Jul 3, 2022

honestly, I am not sure about addressing it. There could be valid use cases for running different containers in the same cgroup, even if it is a weird configuration.
