Rework build process to generate rhel-coreos-base distinct from ocp-rhel-coreos #799

Closed
cgwalters opened this issue May 11, 2022 · 18 comments
Assignees
Labels
jira lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness.

Comments

@cgwalters
Member

cgwalters commented May 11, 2022

Reworking RHEL CoreOS to be more like OKD and towards quay.io/openshift/node-base:rhel10

This pre-enhancement originated in this github issue.

A foundational decision early on in OpenShift 4 was to create RHEL CoreOS. Key
aspects of this were:

  • kubelet would not be containerized (negative experience with "system containers")
  • More crucially, we wanted to ship a tested combination of operating system and cluster
  • Also, the operating system updates should come in a container image

We're several years in now, and have learned a lot. This proposal calls for
reworking how we build things, but will avoid changing these key aspects.

Rework RHCOS disk images to not have OCP content

When we speak of RHEL CoreOS, there are two independent things at play:

  • disk images (AMI, qcow2, ISO, etc.)
  • OS update container

In this base proposal, the disk images shift to only RHEL content.

  • kubelet will not be in the AMI.
  • The version will change to something of the form $rhel.$datestamp, e.g. 9.2.20220510.1
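As an illustration of the proposed $rhel.$datestamp version form, here is a minimal Python sketch that splits such a string into its parts. The names `BaseImageVersion` and `parse_base_version` are hypothetical, and reading the trailing field as a build revision is an assumption; the proposal only gives the example 9.2.20220510.1.

```python
import re
from dataclasses import dataclass
from datetime import date, datetime


@dataclass(frozen=True)
class BaseImageVersion:
    """Parsed form of a version such as 9.2.20220510.1 (hypothetical type)."""
    rhel_major: int
    rhel_minor: int
    datestamp: date
    revision: int  # assumption: the trailing field counts builds on that date


# Four dot-separated numeric fields; the third is an 8-digit YYYYMMDD datestamp.
_VERSION_RE = re.compile(r"^(\d+)\.(\d+)\.(\d{8})\.(\d+)$")


def parse_base_version(text: str) -> BaseImageVersion:
    """Parse '<major>.<minor>.<YYYYMMDD>.<n>' or raise ValueError."""
    match = _VERSION_RE.match(text)
    if match is None:
        raise ValueError(f"not a $rhel.$datestamp version: {text!r}")
    major, minor, stamp, revision = match.groups()
    return BaseImageVersion(
        rhel_major=int(major),
        rhel_minor=int(minor),
        datestamp=datetime.strptime(stamp, "%Y%m%d").date(),
        revision=int(revision),
    )


if __name__ == "__main__":
    v = parse_base_version("9.2.20220510.1")
    print(v.rhel_major, v.rhel_minor, v.datestamp.isoformat(), v.revision)
```

A version in this form carries no OCP release number, which is what would allow the same disk image to be referenced from several OCP releases.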

Additionally, there will be a new container image called rhel-coreos-base that
will exactly match this.

These disk images will generally only be updated at the GA release of each RHEL, and will not contain security updates.

In phase 0, openshift-installer will continue to have rhcos.json. Disk images will continue to be provided at e.g. mirror.openshift.com.

However, the disk images will be much more likely to be shared across OCP releases in a bit for bit fashion.

machine-os-content/rhel-coreos-9

The key change here is that OCP content, including kubelet, moves into a container
image that derives from this base image. One can imagine it as the following Containerfile:

FROM rhel-coreos-base
RUN rpm-ostree install openshift-hyperkube

This is in fact currently done for OKD.

flowchart TD
    rpms[RHEL rpms] --> base[quay.io/openshift/rhel-coreos-base:9]-- Add kubelet, crio, openvswitch --> ocpnode[quay.io/openshift/rhel-coreos:9]

In phase 0, this new image will likely be built by the current CoreOS pipeline.

installer changes to always rebase/pivot from the disk image

Because OCP has not usually respun disk images for releases, at a technical level nodes always do an in-place OS update before kubelet starts.

In this new model, this is now also the time when kubelet gets installed.

The only exception to this today for OCP is the bootstrap node. The bootstrap node would switch to also doing an in-place update to the desired node image. This is how OKD works today.
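The "always rebase from the disk image" behaviour can be sketched roughly as below. This is only an illustration under stated assumptions: `first_boot_plan` and the `rhel-coreos-base` identifier for the booted image are hypothetical, and the issue does not say which commands the installer or MCO would actually run. The sketch uses `rpm-ostree rebase` with the `ostree-unverified-registry:` transport purely for brevity.

```python
from typing import List


def first_boot_plan(booted_image: str, desired_image: str) -> List[List[str]]:
    """Return the commands (as argv lists) a node would run before kubelet starts.

    Hypothetical helper: if the node already runs the desired node image
    there is nothing to do; otherwise it rebases in place and reboots.
    """
    if booted_image == desired_image:
        return []
    return [
        # Assumption: an unverified registry pull, chosen only to keep the sketch short.
        ["rpm-ostree", "rebase", f"ostree-unverified-registry:{desired_image}"],
        ["systemctl", "reboot"],
    ]


if __name__ == "__main__":
    # A freshly booted base disk image pivots to the node image.
    for argv in first_boot_plan("rhel-coreos-base", "quay.io/openshift/node:rhel10"):
        print(" ".join(argv))
```

In this model the bootstrap node and regular nodes would follow the same plan, since neither disk image contains kubelet.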

flowchart LR
    installer[openshift-install] -->boot[RHEL base CoreOS disk image]-- pull quay.io/openshift/node:rhel10+reboot -->node[OCP node]

Phase 1 followups

Consider the above as a "phase 0" - a minimum set of changes to achieve a significant improvement without breaking things.

Create https://gitlab.com/redhat/coreos/base.git

A while ago, we created github.com/openshift/os to be the source of truth for RHCOS. But after phase 0 is done, conceptually there's nothing OCP specific about this. In order to align with RHEL, we could move into the https://gitlab.com/redhat project.

Images built with (or just mirroring) C9S composes

We can start producing images that exactly match a C9S compose, including mirroring version numbers.

github.com/openshift/node

It would make a huge amount of sense to also move the base systemd unit file into what is currently called rhel-coreos. The systemd unit currently lives in the MCO.

If we do the above gitlab/coreos/base.git change first, then this git repository could instead change to become openshift/node, and the systemd unit would perhaps live here (but maybe it should really be part of the RPM?)

Then, a next major step is to have this node image be built the same way as any other OCP platform image, via Prow for CI and OSBS for production builds. This would significantly simplify the current RHCOS pipeline, and make it much clearer that it should align with RHEL lifecycles and technologies.

This may be a significant enough change on its own to call for renaming the OS image in the payload (yes, again) to just node, de-emphasizing "coreos".

@openshift-bot

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci openshift-ci bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Aug 10, 2022
@travier
Member

travier commented Aug 11, 2022

/remove-lifecycle stale

@openshift-ci openshift-ci bot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Aug 11, 2022
@openshift-ci openshift-ci bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Nov 10, 2022
@sdodson
Member

sdodson commented Nov 11, 2022

/lifecycle frozen

@openshift-ci openshift-ci bot added lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Nov 11, 2022
@LorbusChris
Member

We're looking at this from the OKD/SCOS side too.

The RPMs that I'd at a minimum like to split out from the base OS into a layer that is versioned together with the rest of the OpenShift codebase are the following:

  • conmon-rs
  • cri-o
  • cri-tools
  • openshift-clients
  • openshift-hyperkube
  • openvswitch (and NetworkManager-ovs)
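To make the split concrete, here is a small Python sketch that renders a Containerfile in the same shape as the one in the issue description (`FROM rhel-coreos-base` plus `RUN rpm-ostree install ...`) for the packages listed above. `render_containerfile` and `LAYERED_PACKAGES` are hypothetical names, and the package names are copied from the list as written; the real RPM names in a given repository may differ.

```python
from typing import Iterable

# Package names taken verbatim from the list above (assumption: they match
# the RPM names available to the build).
LAYERED_PACKAGES = [
    "conmon-rs",
    "cri-o",
    "cri-tools",
    "openshift-clients",
    "openshift-hyperkube",
    "openvswitch",
    "NetworkManager-ovs",
]


def render_containerfile(base_image: str, packages: Iterable[str]) -> str:
    """Render a two-line Containerfile that layers packages on a base image."""
    pkgs = list(packages)
    if not pkgs:
        raise ValueError("at least one package is required")
    return f"FROM {base_image}\nRUN rpm-ostree install {' '.join(pkgs)}\n"


if __name__ == "__main__":
    print(render_containerfile("rhel-coreos-base", LAYERED_PACKAGES), end="")
```

Keeping the package list as data would let the layer be versioned alongside the rest of the OpenShift codebase while the base image stays unchanged.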

@cgwalters
Member Author

I love the idea of trying this out first in OKD. We'd need to bikeshed implementation strategy...there's a whole lot involved in any path: using coreos-assembler, ignoring coreos-assembler and going for full native container builds via Dockerfile, or a middle ground of trying to implement rpm-ostree compose image --from.

@LorbusChris
Member

To add to the bikeshedding, my first thought was that rpm-ostree install --from-manifest would be handy for this. It would be run during the container build and consume a manifest from os.

I think we'll also need to update the builds metadata with the layered container image artifact build, very similar to what cosa build-extensions does today, something like a cosa build-derive as a container build wrapper.

@cgwalters
Member Author

One giant benefit of this is that it immediately becomes much easier for OpenShift to inject other code, written in a compiled language, into the host system. For example the code to manage the primary NIC via OVS is crying out to be...not bash.

The MCD today has a hack to copy itself to the host, which only dubiously works with skew between host and container userspace.

Basically the status quo makes no sense at all, where we embed a kubelet binary but inject all this other shell script and other logic. With this split, all that stuff would be consistently in a separate container image layer.

@cgwalters
Member Author

OK, I've updated the initial description in this issue with a bit more fleshed out description. Feedback appreciated!

@cgwalters
Member Author

One interesting example here is the SSH password bug.

If we'd already had this split, I think the change there would have landed in github.com/openshift/node - not in gitlab.com/redhat/coreos. We suddenly have a way to clearly distinguish the "stuff done for openshift nodes" versus "bootable rhel".

@LorbusChris
Member

This is strongly related to okd-project/okd-coreos-pipeline#46, which will split SCOS into a base and an OKD layer.

@jlebon
Member

jlebon commented Feb 16, 2024

/assign jlebon
/label jira

Contributor

openshift-ci bot commented Feb 16, 2024

@jlebon: The label(s) /label jira cannot be applied. These labels are supported: acknowledge-critical-fixes-only, platform/aws, platform/azure, platform/baremetal, platform/google, platform/libvirt, platform/openstack, ga, tide/merge-method-merge, tide/merge-method-rebase, tide/merge-method-squash, px-approved, docs-approved, qe-approved, no-qe, downstream-change-needed, rebase/manual, cluster-config-api-changed, approved, backport-risk-assessed, bugzilla/valid-bug, cherry-pick-approved, jira/valid-bug, staff-eng-approved. Is this label configured under labels -> additional_labels or labels -> restricted_labels in plugin.yaml?

In response to this:

/assign jlebon
/label jira

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@jlebon jlebon added the jira label Feb 16, 2024
jlebon added a commit to jlebon/coreos-ci that referenced this issue Feb 20, 2024
The Prow CI we have in those repos are extremely slow and annoying to
maintain. We're still going to need it for now to at least build RHCOS
with actual RHEL RPMs, but at least for CentOS Stream we should be able
to build that fine in CoreOS CI. (We don't have access to the OCP RPMs,
but with openshift/os#799, we'll move those
out of the base compose anyway.)
jlebon added a commit to coreos/coreos-ci that referenced this issue Feb 20, 2024
jlebon added a commit to jlebon/os that referenced this issue Feb 21, 2024
As prep for openshift#799, let's better split the postprocessing steps that are
related to OCP from those that have tighter binding to RHEL proper.

This should have almost no functional effect. One visible difference is
in the `/etc/motd` we write which before hardcoded e.g. RHCOS and CentOS
Stream in the prose text, but is now a little more generic.
ravanelli added a commit to ravanelli/fedora-coreos-pipeline that referenced this issue Mar 6, 2025
 - As part of openshift/os#799, we'll want to
build the "OCP node" image as a layered image on top of the RHCOS base
image.
 - Create a separate job where we build it and push to the registry

Signed-off-by: Renata Ravanelli <[email protected]>

6 participants