Multiple containers in the same cgroup #716

Open
kolyshkin opened this issue Aug 10, 2021 · 3 comments

@kolyshkin
Collaborator

kolyshkin commented Aug 10, 2021

It seems that crun has the same set of issues as opencontainers/runc#3132.

Using a config.json with cgroupsPath set:

# crun run -d s3
# crun run -d s4
# crun list
NAME PID       STATUS   BUNDLE PATH                            
s3   245092    running  /home/kir/git/runc/tst                 
s4   245107    running  /home/kir/git/runc/tst                 
# diff -u /proc/{245092,245107}/cgroup 
(same cgroup)
# crun pause s3
# crun list
NAME PID       STATUS   BUNDLE PATH                            
s3   245092    paused   /home/kir/git/runc/tst                 
s4   245107    paused   /home/kir/git/runc/tst                 
(both paused)
root@ubu2004:/home/kir/git/runc/tst# crun run -d s5
(hung)
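
A minimal config.json fragment reproducing this setup: both bundles set the same linux.cgroupsPath (the value /mycgroup here is hypothetical):

    "linux": {
        "cgroupsPath": "/mycgroup"
    }

Since both containers ask for the same path, their processes land in one cgroup, which is what the diff above confirms.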

Can we discuss this in opencontainers/runc#3132, @giuseppe?

@kolyshkin
Collaborator Author

So, for runc I am implementing the following measures (see the last commits in opencontainers/runc#3131):

  1. runc run/create: refuse non-empty cgroup
  2. runc run/create: refuse cgroup if frozen
  3. runc exec: refuse paused container

Item 3 is fixed in crun by #727.
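
For reference, on cgroup v2 the two remaining checks boil down to reading two interface files before the init process is added. A sketch with a hypothetical cgroup path (not crun's actual code):

# cat /sys/fs/cgroup/mycgroup/cgroup.procs
(empty output: no processes, so check 1 passes)
# grep frozen /sys/fs/cgroup/mycgroup/cgroup.events
frozen 0
(frozen 0: not frozen, so check 2 passes)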

Items 1 and 2 are somewhat harder to fix in crun when the systemd cgroup manager is used, because in that case the cgroup path is only known after we have put a process into the cgroup (see systemd_finalize), meaning it is not possible to do any cgroup checks before the init PID has already been added to it.

I think it's possible to change that (i.e. figure out the path beforehand, rather than getting it from /proc/PID/cgroup afterwards), but I'm not sure such a change would be welcomed.
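
For context, this is the readback in question: once the init process has been placed into its systemd-managed cgroup, the path becomes visible via /proc/PID/cgroup. A sketch (the PID and scope name are hypothetical):

# cat /proc/245092/cgroup
0::/machine.slice/crun-s3.scope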

kolyshkin added a commit to kolyshkin/runtime-spec that referenced this issue Sep 27, 2021
It makes sense for the runtime to reject a cgroup which is frozen
(for both new and existing containers), otherwise the runtime
command (create/run/exec) may end up stuck.

It makes sense for the runtime to make sure the cgroup for a new
container is empty (i.e. there are no processes in it), and to reject
it otherwise. The scenario in which a non-empty cgroup is used for a
new container has multiple problems, for example:

* If two or more containers share the same cgroup, and each container
  has its own limits configured, the order of container starts
  ultimately determines whose limits will effectively be applied.

* If two or more containers share the same cgroup, and one of the
  containers is paused/unpaused, all the others are paused/unpaused,
  too.

* If cgroup.kill is used to forcefully kill the container, it will
  also kill other processes that are not part of this container but
  merely belong to the same cgroup.

* When the systemd cgroup manager is used, this becomes even worse:
  a stop (or even a failed start) of any container results in a
  stopTransientUnit command being sent to systemd, and so (depending
  on unit properties) other containers can receive SIGTERM, be killed
  after a timeout, etc.

* Many other bad scenarios are possible, as the implicit assumption
  of a 1:1 container:cgroup mapping is broken.
opencontainers/runc#3132
containers/crun#716

Signed-off-by: Kir Kolyshkin <[email protected]>
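
The cgroup.kill bullet is easy to demonstrate: on cgroup v2 (Linux 5.14 and later), writing 1 to the cgroup.kill file SIGKILLs every process in the cgroup, with no way to distinguish which container a process belongs to. A sketch with a hypothetical path, using the s3/s4 setup from above:

# echo 1 > /sys/fs/cgroup/mycgroup/cgroup.kill
(all processes of both s3 and s4 are killed)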
kolyshkin added further commits to kolyshkin/runtime-spec that referenced this issue Sep 28 – Sep 30, 2021

@cdoern
Contributor

cdoern commented Jul 1, 2022

@giuseppe I am looking to get into some crun code. Do you think this would be a good issue to pick up, or is it resolved?

@giuseppe
Member

giuseppe commented Jul 3, 2022

honestly, I am not sure about addressing it. There could be valid use cases for running different containers in the same cgroup, even if it is a weird configuration.
