[v4.9-rhel] Ensure that containers do not get stuck in stopping #23088
The scenario for inducing this is as follows:
1. Start a container with a long stop timeout and a PID1 that ignores SIGTERM
2. Use `podman stop` to stop that container
3. Simultaneously, in another terminal, kill -9 `pidof podman` (the container is now in ContainerStateStopping)
4. Now kill that container's Conmon with SIGKILL.
5. No commands are able to move the container from Stopping to Stopped now.

The cause is a logic bug in our exit-file handling logic. Conmon being dead without an exit file causes no change to the state. Add handling for this case that tries to clean up, including stopping the container if it still seems to be running.

Fixes containers#19629
Addresses: https://issues.redhat.com/browse/ACCELFIX-250

Signed-off-by: Matt Heon <[email protected]>
Signed-off-by: tomsweeneyredhat <[email protected]>
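For readers who want the state handling spelled out, here is a minimal sketch of the decision described in the commit message. It is not Podman's actual code (Podman is written in Go), and every name in it (`State`, `resolve_state`, `kill_container`) is hypothetical. It only illustrates the rule that a dead Conmon with no exit file should still lead to cleanup and a Stopped state.

```python
from enum import Enum, auto


class State(Enum):
    """Hypothetical subset of container states, for illustration only."""
    RUNNING = auto()
    STOPPING = auto()
    STOPPED = auto()


def resolve_state(state, conmon_alive, exit_file_present,
                  container_alive, kill_container):
    """Return the state a container should move to after a status check.

    kill_container is a caller-supplied callable that force-stops the
    container process; it is invoked only in the "Conmon dead, no exit
    file, container still running" case.
    """
    # Only active containers need to be re-evaluated.
    if state not in (State.RUNNING, State.STOPPING):
        return state

    # Normal path: the exit file was written, so the container has exited.
    if exit_file_present:
        return State.STOPPED

    # Conmon is still alive and may write the exit file later: no change.
    if conmon_alive:
        return state

    # Conmon is dead and left no exit file. Previously this caused no
    # state change; the fix is to clean up instead.
    if container_alive:
        kill_container()
    return State.STOPPED
```

With step 5 of the scenario as input (state Stopping, Conmon dead, no exit file, PID1 still alive), the sketch calls `kill_container` once and returns `State.STOPPED` instead of leaving the container stuck.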
Added the hold to make sure we get a Jira Card assigned and attached to this.
@edsantiago and/or @cevich I keep running into the below error on the "Validate rawhide Build" test. Other than pressing "try again", is there anything else to do?
Seems to be this: podman/contrib/cirrus/setup_environment.sh, lines 337 to 338 in 1b04994
Can you try removing that dnf line? And maybe removing it from the other two places that also dnf it? At least that might get you past
Another suggestion: Remove the rawhide validation (or possibly ALL rawhide everything) from CI. I don't think old-rawhide is useful on a RHEL release branch.
The Rawhide tests were failing all over the place in the v4.9-rhel branch. They are only necessary for main, remove them.

Signed-off-by: tomsweeneyredhat <[email protected]>
Turning off Rawhide seems to have done the trick. Ready for review and happy green tests buttons. Ditto #23089
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Luap99, TomSweeneyRedHat
Merged commit b699052 into containers:v4.9-rhel
Addresses: https://issues.redhat.com/browse/RHEL-45531