[4.3] Add a "reboot request" annotation flow, and request-reboot command
#926

Conversation
Knowing the constraints behind all this (mainly time?), I believe this is a good start. The thing I'm still grasping is why we're using the journal to request an external reboot and not something kube-driven like a CRD, but that can surely come later.
My thought is that anyone who needs to reboot a node has privileged access to that node anyway; e.g. in the tuned case, it is a privileged daemonset. The journal gives us auditing and is already per-node. The other reason is to "bind" the reboot request to a boot: we only query the journal for reboot requests logged after this boot. If it was an annotation, we'd have to do something like require the annotation to include the boot ID, so we know when to delete it.
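To make the "bind to a boot" point concrete, here is a minimal Go sketch of how a daemon could check for reboot requests scoped to the current boot. The `MESSAGE_ID` constant is made up for illustration; the actual identifiers and logic in this PR may differ.

```go
package main

import (
	"fmt"
	"os"
	"strings"

	"github.com/coreos/go-systemd/v22/sdjournal"
)

// Hypothetical 128-bit journal message ID tagging reboot requests;
// the real implementation defines its own identifier.
const rebootRequestMessageID = "8e9d5e3e2c2c4b1c9a0d6f5a4b3c2d1e"

// pendingRebootRequest reports whether a reboot request was logged
// during the current boot. Requests from previous boots are ignored
// simply because their _BOOT_ID no longer matches.
func pendingRebootRequest() (bool, error) {
	// The journal's _BOOT_ID field is the boot UUID without dashes.
	raw, err := os.ReadFile("/proc/sys/kernel/random/boot_id")
	if err != nil {
		return false, err
	}
	bootID := strings.ReplaceAll(strings.TrimSpace(string(raw)), "-", "")

	j, err := sdjournal.NewJournal()
	if err != nil {
		return false, err
	}
	defer j.Close()

	if err := j.AddMatch("MESSAGE_ID=" + rebootRequestMessageID); err != nil {
		return false, err
	}
	if err := j.AddMatch("_BOOT_ID=" + bootID); err != nil {
		return false, err
	}

	// Next returns the number of entries advanced; 0 means no match.
	n, err := j.Next()
	if err != nil {
		return false, err
	}
	return n > 0, nil
}

func main() {
	pending, err := pendingRebootRequest()
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("reboot requested this boot:", pending)
}
```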
But just to think about this a bit more, what would a CRD approach to this look like? Something like a `RebootRequest` object? And then each MCD would watch for all instances of `RebootRequest`? Or would we have the MCC roll it out via annotations as this PR is doing? (A hypothetical sketch follows below.)
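For illustration only, a `RebootRequest` CRD might look something like the following Go API types. These names and fields are made up for this sketch; note how `bootID` has to be carried explicitly, whereas the journal scopes requests to a boot for free.

```go
package v1

import metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

// RebootRequest is a hypothetical object asking for a managed reboot
// of one node. Each MCD would watch these (or the MCC would translate
// them into the annotation flow this PR uses).
type RebootRequest struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty"`

	Spec   RebootRequestSpec   `json:"spec"`
	Status RebootRequestStatus `json:"status,omitempty"`
}

type RebootRequestSpec struct {
	// NodeName scopes the request to a single node.
	NodeName string `json:"nodeName"`
	// Reason is a human-readable audit trail, like the journal message.
	Reason string `json:"reason"`
	// BootID binds the request to one boot so stale requests can be
	// garbage-collected; the journal gives us this for free.
	BootID string `json:"bootID"`
}

type RebootRequestStatus struct {
	// Approved is set by the controller once the pool can afford the
	// disruption (maxUnavailable permitting).
	Approved bool `json:"approved"`
}
```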
Worth noting here: the journal is conceptually just an implementation detail; the "API" is `machine-config-daemon request-reboot`. That doesn't preclude that command creating a CRD in the future, or something else.
Slowly making progress with this... what I'm hitting now is that for some reason the persistent journal isn't behaving as expected.
The journalctl issue is still a problem; I mostly wrestled with getting the unit tests to pass today. There's a new "ping pong" involved in the reboot request/approval flags between the MCD and MCC, and to avoid having to entirely rewrite the unit tests I just changed them to set the flags in advance.
Hooray, this passed CI!
@LorbusChris at a very high level, this is already very similar to the zincati+airlock approach (local agent for reboots, external DB for lock handling). At lower levels, I don't think there is much that can be aligned/reused at this point.
I think for some use cases (particularly e.g. privileged daemonset pods), the "exec command on host to request reboot" is a natural-enough API. However, there are other reasons to reboot. One case is that over time, kernel memory can get fragmented, and rebooting is a way to work around it. To deal with this you'd need a controller/script/job which detects the condition and requests a reboot; a sketch of that follows below. Speaking of that, https://github.com/kubevirt/node-maintenance-operator is closely related here - I feel like that functionality might just be better as core to the MCO? And we'd have a similar maintenance object there, or so?
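As a purely illustrative sketch of such a job: the threshold, the page order, the `/proc/buddyinfo` parsing, and the reason string below are all assumptions; only the `request-reboot` entrypoint comes from this PR.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"strconv"
	"strings"
)

// highOrderFreePages sums the free-block counts in /proc/buddyinfo for
// page orders >= minOrder, across all nodes and zones.
func highOrderFreePages(minOrder int) (int, error) {
	data, err := os.ReadFile("/proc/buddyinfo")
	if err != nil {
		return 0, err
	}
	total := 0
	for _, line := range strings.Split(strings.TrimSpace(string(data)), "\n") {
		fields := strings.Fields(line)
		// Each line looks like: "Node 0, zone Normal <count per order 0..10>".
		if len(fields) < 5 {
			continue
		}
		for order, c := range fields[4:] {
			if order < minOrder {
				continue
			}
			n, err := strconv.Atoi(c)
			if err != nil {
				return 0, err
			}
			total += n
		}
	}
	return total, nil
}

func main() {
	// Order 4 (64KiB contiguous blocks) is an illustrative threshold,
	// not a tuned value.
	free, err := highOrderFreePages(4)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	if free == 0 {
		// Fragmented: ask the MCD for a managed drain+reboot rather
		// than rebooting the node directly.
		cmd := exec.Command("/usr/libexec/machine-config-daemon",
			"request-reboot", "kernel memory fragmentation")
		cmd.Stdout, cmd.Stderr = os.Stdout, os.Stderr
		if err := cmd.Run(); err != nil {
			fmt.Fprintln(os.Stderr, err)
			os.Exit(1)
		}
	}
}
```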
Currently, the MCO only supports rebooting for config changes; the MCC sets the `desiredConfig` annotation, and the MCD implicitly takes `currentConfig != desiredConfig` as permission to reboot. However, we have cases where we want to support drain+reboot for non-config changes.

The first major motivating reason for this is to handle kernel arguments injected via initial MachineConfig objects. Until we do more work, because `pivot` as shipped with OpenShift 4.1.0 doesn't know about `MachineConfig` (and we have no mechanism right now to update the bootimages: openshift/os#381), what will happen is the MCD will land on the nodes and then go degraded, because the kernel arguments don't match what is expected.

Now, we really want to handle kernel arguments at the first boot, and that will be done eventually. But this gives us a mechanism to reconcile rather than go degraded.

Previously, if we'd had the MCD start a reboot on its own, we could exceed the `maxUnavailable` defined by the pool, break master etcd consistency, etc. In other words, reboots need to be managed via the MCC too.

For the MCD, the `reboot-requested` source of truth is the systemd journal, as it gives us an easy way to "scope" a reboot request to a given boot.

The flow here is:

- MCC sets `desiredConfig`
- MCD notices `desiredConfig`, logs request in journal
- MCD gets its own journal message, adds `reboot-requested` annotation
- MCC approves reboot via `reboot-approved` annotation
- MCD reboots
- MCD notices it has the reboot annotation, checks to see if it's in the journal; it's not, so the `reboot-requested` flag is removed
- MCC notices the node has `reboot-approved` without `reboot-requested`, removes `reboot-approved`

This ensures that config changes are always testing the "reboot request" API. For now, we expose that API as `/usr/libexec/machine-config-daemon request-reboot <reason>`. For example, one could use `oc debug node` in a loop and run that on each node to perform a rolling reboot of the hosts.
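A minimal sketch of the controller-side half of that handshake, assuming made-up annotation keys (the PR's real key names and pool accounting will differ):

```go
package sketch

// Assumed annotation keys; illustrative only.
const (
	rebootRequestedAnnotation = "machineconfiguration.openshift.io/reboot-requested"
	rebootApprovedAnnotation  = "machineconfiguration.openshift.io/reboot-approved"
)

// reconcileRebootAnnotations is the MCC side of the ping-pong: approve a
// pending request only while the pool stays within maxUnavailable, and
// clear the approval once the MCD has dropped its request post-reboot.
func reconcileRebootAnnotations(ann map[string]string, unavailable, maxUnavailable int) {
	_, requested := ann[rebootRequestedAnnotation]
	_, approved := ann[rebootApprovedAnnotation]
	switch {
	case requested && !approved && unavailable < maxUnavailable:
		ann[rebootApprovedAnnotation] = "true"
	case !requested && approved:
		delete(ann, rebootApprovedAnnotation)
	}
}
```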
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: cgwalters
@cgwalters: PR needs rebase.
I believe the functionality should be part of an SLO (the MCO is probably the best candidate), rather than an optional OLM-managed component. It seems wrong for SLOs to rely on OLM-managed components.
Will this PR get merged in 4.3? Or even 4.4?
Also related: kubernetes/enhancements#1411
In general this needs to be something more like a spec field where, if someone bumps a generation, then all nodes are updated (the generation is part of the generated config). We use that on deployments, statefulsets, and daemonsets today (e.g. via `oc rollout restart`). We are discussing standardizing this across all "rebootable objects" (including machines), such that a single rollout-style command would roll out all of those entities. EDIT: These names are made up; the sketch below uses equally made-up names.
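To illustrate the shape being described, with hypothetical names throughout:

```go
package sketch

// PoolSpec and PoolStatus are hypothetical stand-ins for whatever
// object would grow this field.
type PoolSpec struct {
	// RebootGeneration: bumping this requests a rolling drain+reboot of
	// every node in the pool, reusing the same approval flow as config
	// changes.
	RebootGeneration int64 `json:"rebootGeneration,omitempty"`
}

type PoolStatus struct {
	// ObservedRebootGeneration is the last generation for which every
	// node has completed its reboot.
	ObservedRebootGeneration int64 `json:"observedRebootGeneration,omitempty"`
}

// rebootNeeded is the controller's trigger condition.
func rebootNeeded(spec PoolSpec, status PoolStatus) bool {
	return status.ObservedRebootGeneration < spec.RebootGeneration
}
```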
Random aside, considering this much later: one thing we can always do is take the hit of a "double reboot". In current scenarios where e.g. the bootimage is too old to do something we want, we can write a systemd unit that does the pivot, reboots, then uses new code (whether that's ostree or podman or whatever) to perform more things like kernel argument changes, and then finally reboots again.
@cgwalters: The following tests failed, say /retest to rerun all failed tests.
Issues go stale after 90d of inactivity. Mark the issue as fresh by commenting /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close. If this issue is safe to close now please do so with /close. /lifecycle stale
Stale issues rot after 30d of inactivity. Mark the issue as fresh by commenting /remove-lifecycle rotten. Rotten issues close after an additional 30d of inactivity. If this issue is safe to close now please do so with /close. /lifecycle rotten
Rotten issues close after 30d of inactivity. Reopen the issue by commenting /reopen. Mark the issue as fresh by commenting /remove-lifecycle rotten. /close
@openshift-bot: Closed this PR.