
[4.3] Add a "reboot request" annotation flow, and request-reboot command #926

Closed

Conversation

@cgwalters (Member) commented Jul 3, 2019

Currently, the MCO only supports rebooting for config changes;
the MCC sets the `desiredConfig` annotation, and the MCD implicitly
takes `currentConfig != desiredConfig` as permission to reboot.

However, we have cases where we want to support drain+reboot
for non-config changes.

The first major motivating reason for this is to handle kernel
arguments injected via initial MachineConfig objects. Until
we do more work, the MCD will land on the nodes and then go
degraded, because `pivot` as shipped with OpenShift 4.1.0
doesn't know about `MachineConfig` (and we have no mechanism
right now to update the bootimages: openshift/os#381), so the
kernel arguments won't match what is expected.

Now, we really want to handle kernel arguments at the first boot,
and that will be done eventually.

But this gives us a mechanism to reconcile rather than go degraded.
If we had the MCD start a reboot on its own, we could exceed the
`maxUnavailable` defined by the pool, break master etcd consistency, etc.

In other words, reboots need to be managed via the MCC too.

For the MCD, the `reboot-requested` source of truth is the systemd
journal, as it gives us an easy way to "scope" a reboot request
to a given boot.

The flow here is:

  • MCC sets `desiredConfig`
  • MCD notices `desiredConfig`, logs the reboot request in the journal
  • MCD sees its own journal message, adds the `reboot-requested` annotation
  • MCC approves the reboot via the `reboot-approved` annotation
  • MCD reboots
  • After the reboot, the MCD notices it still has the reboot annotation, checks whether the request appears in the journal for the current boot; it doesn't, so the `reboot-requested` annotation is removed
  • MCC notices the node has `reboot-approved` without `reboot-requested`, and removes `reboot-approved`

This ensures that config changes are always testing the "reboot request"
API.

For now, we expose that API as `/usr/libexec/machine-config-daemon request-reboot <reason>`.
For example, one could use `oc debug node` in a loop and run that command on each node
to perform a rolling reboot of the hosts.
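
As a rough illustration of that loop (a sketch only, not code from this PR; it assumes `oc` is logged in with cluster-admin, that the host binary is reached via `chroot /host`, and a made-up "rolling-reboot" reason string):

    // rollingreboot.go: drive the request-reboot API across all nodes, one at a
    // time, via `oc debug`. In practice you would wait for each node to reboot
    // and report Ready again before moving on, to respect maxUnavailable.
    package main

    import (
        "fmt"
        "os"
        "os/exec"
        "strings"
    )

    func main() {
        // List nodes in "node/<name>" form so the names can be passed to `oc debug`.
        out, err := exec.Command("oc", "get", "nodes", "-o", "name").Output()
        if err != nil {
            fmt.Fprintf(os.Stderr, "listing nodes: %v\n", err)
            os.Exit(1)
        }
        for _, node := range strings.Fields(string(out)) {
            fmt.Printf("requesting reboot of %s\n", node)
            cmd := exec.Command("oc", "debug", node, "--",
                "chroot", "/host", "/usr/libexec/machine-config-daemon",
                "request-reboot", "rolling-reboot")
            cmd.Stdout = os.Stdout
            cmd.Stderr = os.Stderr
            if err := cmd.Run(); err != nil {
                fmt.Fprintf(os.Stderr, "%s: %v\n", node, err)
                os.Exit(1)
            }
        }
    }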

@openshift-ci-robot openshift-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Jul 3, 2019
@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 3, 2019
@runcom (Member) commented Jul 4, 2019

Knowing the constraints behind all this (time, mainly?), I believe this is a good start. The thing I'm still grasping is why we're using the journal to request an external reboot rather than something Kube-driven like a CRD, but that can surely come later.

@cgwalters (Member Author)

> thing I'm still grasping is why we're using the journal to request an external reboot and not something kube driven like a crd or something, but that can surely come later

My thought is that anyone who needs to reboot a node has privileged access to that node anyway - e.g. in the tuned case, it's a privileged daemonset. The journal gives us auditing and is already per-node.

The other reason is to "bind" the reboot request to a boot. We only query the journal for reboot requests logged during the current boot; if it were an annotation, we'd have to do something like require the annotation to include the boot ID, so we'd know when to delete it.
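
To make that boot-scoping concrete, here's a minimal sketch (assumptions: it shells out to `journalctl` rather than using the actual MCD code, and the `machine-config-daemon-reboot` syslog identifier is made up):

    // journalcheck.go: sketch of scoping a "reboot requested" flag to the current
    // boot. `journalctl -b` only returns entries from the running boot, so a
    // request logged before a reboot naturally disappears afterwards.
    package main

    import (
        "bytes"
        "fmt"
        "os/exec"
    )

    // rebootRequestedThisBoot reports whether a request was logged since this boot
    // started. The syslog identifier is hypothetical, not the tag used by this PR.
    func rebootRequestedThisBoot() (bool, error) {
        out, err := exec.Command("journalctl", "-q", "-b", "-o", "json",
            "-t", "machine-config-daemon-reboot").Output()
        if err != nil {
            return false, err
        }
        for _, line := range bytes.Split(out, []byte("\n")) {
            if bytes.HasPrefix(bytes.TrimSpace(line), []byte("{")) {
                return true, nil
            }
        }
        return false, nil
    }

    func main() {
        requested, err := rebootRequestedThisBoot()
        if err != nil {
            fmt.Println("error reading journal:", err)
            return
        }
        fmt.Println("reboot requested this boot:", requested)
    }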

@cgwalters (Member Author)

But just to think about this a bit more, what would a CRD approach to this look like? Something like a `rebootrequest.machineconfiguration.openshift.io` with:

spec:
    nodeRef:
      kind: Node
      name: node-41234
      uid: ab3ff6bc-a186-11e9-93cc-5254005254d4
    bootID: 7cacc01d-13b0-4d3c-abd2-8a8461cccd1c

And then each MCD would watch for all instances of `RebootRequest`? Or would we have the MCC roll out via annotations, as this PR is doing?
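
For comparison, that sketch would roughly map to Go API types like the following (purely illustrative; the group, kind, and the status fields here are assumptions, not something defined by this PR):

    // Hypothetical API types mirroring the rebootrequest YAML sketch above.
    package v1

    import (
        corev1 "k8s.io/api/core/v1"
        metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    )

    // RebootRequest asks that one specific node be drained and rebooted.
    type RebootRequest struct {
        metav1.TypeMeta   `json:",inline"`
        metav1.ObjectMeta `json:"metadata,omitempty"`

        Spec   RebootRequestSpec   `json:"spec"`
        Status RebootRequestStatus `json:"status,omitempty"`
    }

    // RebootRequestSpec carries the target node plus the boot ID the request was
    // made against, so a stale request can be garbage-collected after the reboot.
    type RebootRequestSpec struct {
        NodeRef corev1.ObjectReference `json:"nodeRef"`
        BootID  string                 `json:"bootID"`
    }

    // RebootRequestStatus is one way the MCC could record approval and completion.
    type RebootRequestStatus struct {
        Approved  bool         `json:"approved,omitempty"`
        Completed *metav1.Time `json:"completed,omitempty"`
    }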

@cgwalters (Member Author)

Worth noting here that the journal is conceptually just an implementation detail; the "API" is `execve(/usr/libexec/machine-config-daemon request-reboot <reason>)`.

That doesn't preclude that command creating a CRD in the future or something else.

@openshift-ci-robot openshift-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jul 9, 2019
@cgwalters cgwalters force-pushed the reboot-coordination branch from 7374f47 to 8752108 Compare July 9, 2019 20:23
@openshift-ci-robot openshift-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Jul 9, 2019
@cgwalters cgwalters force-pushed the reboot-coordination branch from 8752108 to 638784a Compare July 16, 2019 18:18
@openshift-ci-robot openshift-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jul 16, 2019
@cgwalters cgwalters force-pushed the reboot-coordination branch 4 times, most recently from d480b0c to d93b81b Compare July 22, 2019 17:38
@cgwalters (Member Author)

Slowly making progress with this... what I'm hitting now is that, for some reason, the persistent `journalctl -f` process isn't seeing updates. One thing I'm not sure is the cause, but may be related: libsystemd seems to be falling back to polling instead of using inotify.

@cgwalters cgwalters force-pushed the reboot-coordination branch from d93b81b to f9e14de Compare July 23, 2019 21:15
@openshift-ci-robot openshift-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jul 23, 2019
@cgwalters (Member Author)

The journalctl issue is still a problem; today I mostly wrestled with getting the unit tests to pass. There's a new "ping pong" in the reboot request/approval flags between the MCD and MCC, and to avoid having to entirely rewrite the unit tests, I just changed them to set the flags in advance.

@cgwalters cgwalters force-pushed the reboot-coordination branch from f9e14de to bd604b0 Compare July 23, 2019 22:23
@cgwalters (Member Author)

Hooray, this passed CI!

@cgwalters cgwalters force-pushed the reboot-coordination branch from bd604b0 to 3f8d302 Compare July 24, 2019 14:11
@lucab commented Jul 26, 2019

@LorbusChris at a very high-level, this is already very similar to the zincati+airlock way (local agent for reboots, external DB for lock handling). At lower levels, I don't think there is much that can be aligned/reused at this point.

@cgwalters cgwalters changed the title Reboot coordination Add a "reboot request" annotation flow, and request-reboot command Jul 26, 2019
@cgwalters (Member Author)

I think for some use cases (particularly e.g. privileged daemonset pods), the "exec command on host to request reboot" is a natural-enough API.

However, there are other reasons to reboot. One case is that, over time, kernel memory can get fragmented, etc., and rebooting is a way to work around it. To deal with this you'd need a controller/script/job which did e.g. `oc debug node/$x -- /usr/libexec/machine-config-daemon request-reboot "fragmentation"` or so, which isn't very kube-like. If one were writing a "fragmentation avoidance" controller/operator, it'd be nicer to e.g. watch Prometheus metrics and just set a node annotation (sketched below), or probably even something more CRD-like.
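
A minimal sketch of that annotation-setting shape (the annotation key, node name, and reason string here are made up for illustration; a real controller would derive the decision from Prometheus fragmentation metrics):

    // Sketch of a controller setting a "please reboot me" annotation on a node.
    package main

    import (
        "context"
        "fmt"

        metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
        "k8s.io/apimachinery/pkg/types"
        "k8s.io/client-go/kubernetes"
        "k8s.io/client-go/rest"
    )

    // requestReboot merge-patches only the annotation instead of updating the
    // whole Node object.
    func requestReboot(ctx context.Context, client kubernetes.Interface, node, reason string) error {
        patch := []byte(fmt.Sprintf(
            `{"metadata":{"annotations":{"machineconfiguration.openshift.io/reboot-requested":%q}}}`,
            reason))
        _, err := client.CoreV1().Nodes().Patch(ctx, node, types.MergePatchType, patch, metav1.PatchOptions{})
        return err
    }

    func main() {
        cfg, err := rest.InClusterConfig() // assumes this runs as an in-cluster controller
        if err != nil {
            panic(err)
        }
        client := kubernetes.NewForConfigOrDie(cfg)
        // The real decision would come from watching fragmentation metrics; it is
        // hard-coded here to keep the sketch short.
        if err := requestReboot(context.TODO(), client, "node-41234", "memory-fragmentation"); err != nil {
            panic(err)
        }
    }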

Speaking of that, https://github.com/kubevirt/node-maintenance-operator is closely related here - I feel like that functionality might just be better as core to the MCO? And we'd have a

NodeMaintenance
  type: reboot

or so?
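
If that went the MCO-core route, the shape might be as small as this (a hypothetical type mirroring the fragment above, not an existing API):

    // Hypothetical NodeMaintenance API if that functionality lived in the MCO.
    package v1

    import metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

    type NodeMaintenance struct {
        metav1.TypeMeta   `json:",inline"`
        metav1.ObjectMeta `json:"metadata,omitempty"`

        Spec NodeMaintenanceSpec `json:"spec"`
    }

    type NodeMaintenanceSpec struct {
        // NodeName is the node to drain and maintain.
        NodeName string `json:"nodeName"`
        // Type of maintenance being requested, e.g. "reboot".
        Type string `json:"type"`
    }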

@cgwalters cgwalters force-pushed the reboot-coordination branch from bf63d4b to ea46adb Compare July 26, 2019 21:27
@openshift-ci-robot (Contributor)

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: cgwalters

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@cgwalters cgwalters changed the title Add a "reboot request" annotation flow, and request-reboot command [4.3] Add a "reboot request" annotation flow, and request-reboot command Aug 5, 2019
@cgwalters cgwalters added the 4.3 label Aug 5, 2019
@openshift-ci-robot (Contributor)

@cgwalters: PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci-robot openshift-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Aug 9, 2019
@jmencak (Contributor) commented Nov 28, 2019

> Speaking of that, https://github.com/kubevirt/node-maintenance-operator is closely related here - I feel like that functionality might just be better as core to the MCO?

I believe the functionality should be part of an SLO (MCO is probably the best candidate), rather than an optional OLM-managed component. It seems wrong for SLOs to rely on OLM-managed components.

@pliurh (Contributor) commented Dec 3, 2019

Will this PR get merged in 4.3? Or even 4.4?

@cgwalters (Member Author)

Also related: kubernetes/enhancements#1411

@smarterclayton (Contributor) commented Feb 14, 2020

In general this needs to be something more like

spec:
  generation: 2

where if someone bumps generation then all nodes are updated (generation is part of the generated config).

We use that on deployments, statefulsets, and daemonsets today by doing:

spec:
  template:
    metadata:
      annotations:
        generation: "2"

We are discussing standardizing this across all "reboot-able objects" (including machines), such that a command like

oc rollout deploy/foo mcp/bar sts/baz

would thus roll out all of those entities.

EDIT: These names are made up; it might be configGeneration or stateGeneration or some other word that implies the "rebooting / redeploying-ness" of it.
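
To illustrate the existing mechanism being referenced, here's roughly what bumping a pod-template annotation on a Deployment looks like with client-go (a sketch; the "generation" key, namespace, and deployment name are the made-up values from the comment above):

    // Sketch: changing anything under spec.template makes the Deployment
    // controller roll out new pods, which is the behavior being generalized here.
    package main

    import (
        "context"
        "fmt"

        metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
        "k8s.io/apimachinery/pkg/types"
        "k8s.io/client-go/kubernetes"
        "k8s.io/client-go/tools/clientcmd"
    )

    func main() {
        // Assumes a local kubeconfig; an operator would use in-cluster config instead.
        cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
        if err != nil {
            panic(err)
        }
        client := kubernetes.NewForConfigOrDie(cfg)

        patch := []byte(`{"spec":{"template":{"metadata":{"annotations":{"generation":"2"}}}}}`)
        if _, err := client.AppsV1().Deployments("default").Patch(
            context.TODO(), "foo", types.StrategicMergePatchType, patch, metav1.PatchOptions{}); err != nil {
            panic(err)
        }
        fmt.Println("rollout triggered for deployment/foo")
    }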

@cgwalters (Member Author)

Random aside, considering this much later: one thing we can always do is take the hit of a "double reboot". In current scenarios where e.g. the bootimage is too old to do something we want, we can write a systemd unit that does the pivot, reboots, then uses new code (whether that's ostree or podman or whatever) to perform further changes like kernel arguments, and finally reboots again and runs kubelet.service.

@openshift-ci-robot (Contributor)

@cgwalters: The following tests failed, say /retest to rerun all failed tests:

Test name Commit Rerun command
ci/prow/e2e-aws-op ea46adb /test e2e-aws-op
ci/prow/e2e-aws ea46adb /test e2e-aws
ci/prow/e2e-aws-disruptive ea46adb /test e2e-aws-disruptive
ci/prow/e2e-gcp-op ea46adb /test e2e-gcp-op
ci/prow/e2e-aws-upgrade ea46adb /test e2e-aws-upgrade
ci/prow/e2e-gcp-upgrade ea46adb /test e2e-gcp-upgrade
ci/prow/e2e-vsphere ea46adb /test e2e-vsphere
ci/prow/build-rpms-from-tar ea46adb /test build-rpms-from-tar
ci/prow/e2e-ovn-step-registry ea46adb /test e2e-ovn-step-registry
ci/prow/e2e-aws-proxy ea46adb /test e2e-aws-proxy
ci/prow/e2e-upgrade ea46adb /test e2e-upgrade

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@openshift-merge-robot (Contributor)

@cgwalters: The following tests failed, say /retest to rerun all failed tests:

Test name Commit Rerun command
ci/prow/e2e-agnostic-upgrade ea46adb /test e2e-agnostic-upgrade
ci/prow/e2e-aws-serial ea46adb /test e2e-aws-serial

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@openshift-bot (Contributor)

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci-robot openshift-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Feb 16, 2021
@openshift-bot (Contributor)

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

@openshift-ci-robot openshift-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Mar 19, 2021
@openshift-bot (Contributor)

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.

/close

@openshift-ci-robot (Contributor)

@openshift-bot: Closed this PR.

In response to this:

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. team-mco