Move and rename repos, upgrade to Catalyst 4, support SDK on arm64 #2093
Conversation
Sorry for doing all this in one giant commit, but it was hard to separate it out.

We had no arm64 SDK, so some cross-compiling or emulation was most likely going to be needed to produce one. Catalyst 4 adds support for building with QEMU, so I looked into upgrading. This turned out to be much slower than emulating the amd64 SDK on arm64, where an arm64 build could then be mostly run without emulation. We can't stay on Catalyst 3 forever though, so I continued with the upgrade. Despite it being slow, I have kept support for building with QEMU through Catalyst, since it requires little code and may be useful to somebody.

Catalyst 4 has totally changed the way repositories are handled. It only works when the name of the directory containing the repository matches the configured name of that repository. This was not the case for us, with the coreos repository residing in the coreos-overlay directory. We wanted to move and rename our repositories anyway, so they are now known as gentoo-subset and flatcar-overlay, and they live under scripts/repos. Using the same name as upstream Gentoo would have been problematic, and just "flatcar" would have looked awkward in documentation.

Catalyst 4 also ingests the main repository snapshot as a squashfs rather than a tarball. It features a utility to generate such a snapshot, but it doesn't fit Flatcar well, particularly because it expects each ebuild repository to reside at the top level of its own git repository. It was very easy to call tar2sqfs manually though.

There were several places where we assumed that amd64 was native and arm64 required emulation via QEMU. The scripts are now more architecture-agnostic, paving the way for riscv support later. We no longer set QEMU_LD_PREFIX because it prevents the SDK itself from being emulated. It also assumes there is only one non-native target, which won't be the case soon. bubblewrap does a better job of running binaries under QEMU.

Signed-off-by: James Le Cuirot <[email protected]>
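For illustration, a minimal sketch of what generating such a snapshot manually could look like. The repository path, snapshot location, and file name here are assumptions for the example, not necessarily what the PR's scripts do:
# Pack the ebuild repository into a tar stream and convert it to squashfs;
# tar2sqfs (from squashfs-tools-ng) reads the tar archive from stdin.
tar -cf - -C repos/gentoo-subset . | tar2sqfs /var/tmp/catalyst/snapshots/gentoo-subset.sqfs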
This hasn't been needed for a while, and it now breaks util-linux, installing modules under /usr/lib64 when they should be under /usr/lib. Signed-off-by: James Le Cuirot <[email protected]>
This is what upstream Gentoo does. They would previously update the entire seed, but this took a long time. Our seeds are much bigger, so we kept repo snapshots to build stage1 against instead. The new method of only rebuilding packages with changed sub-slots is a good compromise and removes the need to write stage1 hooks that selectively catch the repository up.

This also avoids some conflicts by adding the `--ignore-world` option. Gentoo seeds have nothing in @world. We have much more, but none of that is needed for stage1.

This continues to exclude cross-*-cros-linux-gnu/* as that is not needed for stage1. It now also excludes dev-lang/rust, because it is never a DEPEND, so it would not break other packages in this way. It may fail to run due to a sub-slot change in one of its own dependencies, but it is also unlikely to be needed in stage1 and it is not configured to use the system LLVM. If need be, we could improve the behaviour of Portage's @changed-subslot to respect `--with-bdeps`.

In my testing, it was unable to handle an SDK from 17 months ago, but one from 7 months ago did work. In practice, we will always use a much more recent one, which is far more likely to work.

Signed-off-by: James Le Cuirot <[email protected]>
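Roughly, the seed update described above amounts to an emerge invocation along these lines. This is only a sketch: the exact flag spellings and the exclude list live in the Catalyst/stage1 configuration and may differ.
# Rebuild only packages whose sub-slot changed in the seed, ignoring @world
# and skipping the packages excluded above.
emerge --verbose --update --deep --ignore-world=y \
    --exclude 'cross-*-cros-linux-gnu/* dev-lang/rust' @changed-subslot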
From https://wiki.gentoo.org/wiki/Catalyst/Stage_Creation#Build_Stage3:

> It is not necessary to build stage2 in order to build stage3. Gentoo release engineering does not build stage2, and you should not need to unless you're intentionally building a stage2 as your goal.

Signed-off-by: James Le Cuirot <[email protected]>
We stopped using profiles with a lib->lib64 symlink a while ago, so there is no point in checking for this any more. We weren't checking against the target SDK architecture anyway. Signed-off-by: James Le Cuirot <[email protected]>
We currently put an os-release symlink in lib64, but we shouldn't assume that the architecture will even have a lib64 directory. I doubt this compatibility symlink was needed anyway. Gentoo doesn't have one, and applications are supposed to check /etc/os-release. I can find almost no reference to /usr/lib64/os-release anywhere, let alone in Flatcar. Signed-off-by: James Le Cuirot <[email protected]>
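As a reminder of the interface applications are expected to use instead, here is a generic shell example (not something taken from this repo):
# Read the os-release fields from the canonical location.
. /etc/os-release
echo "${NAME} ${VERSION_ID}"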
Hello @chewi, great work on the move and the ARM64 SDK. I have tried to reproduce the build, but got some issues:
When running catalyst emerge:
What seed are you using, maybe my recent AMD64 SDK flatcar-sdk-all-4019.0.0-nightly-20240702-2100_os-main-4019.0.0-nightly-20240702-2100 is too new?
Build action triggered: https://github.com/flatcar/scripts/actions/runs/9942160252
Why not? This is going to be a deal breaker - there must be a clean transition path from one SDK version to the next. This PR is a lot of changes squeezed together - my initial thought is that we need to split this into stages:
What did you use as seed for "from scratch"? How about finding a way to do it with https://alpha.release.flatcar-linux.net/arm64-usr/current/flatcar_developer_container.bin.bz2 which is based on flatcar production images but has a full toolchain + emerge.
I'm unable to comment on the gigantic commit: the qemu-user emulation is used when cross-compiling (bootengine and I'm sure some other packages run compiled helpers), so QEMU_LD_PREFIX and binfmt still need to stay. amd64->arm64 cross-compilation is still going to have to remain supported.
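For context, this is roughly how that cross-compile path works: binfmt_misc hands the foreign binary to qemu-user, and QEMU_LD_PREFIX tells QEMU where to find the target's dynamic linker and libraries. The board path and helper name below are made up for illustration:
# Point qemu-user at the arm64 board sysroot so emulated helpers can resolve
# their ELF interpreter and shared libraries (hypothetical helper path).
export QEMU_LD_PREFIX=/build/arm64-usr
qemu-aarch64 /build/arm64-usr/usr/bin/example-compiled-helper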
Reading this, it is not obvious why this is no longer needed and how it breaks something. Which "modules" are installed in /usr/lib64? Why hasn't this been needed for a while? Error messages?
I do want the cross-compilation to stay, but I wasn't aware that it relied on QEMU like this. We'd have to find a different way of doing it. I've been brewing up an eclass to help with cases like this, so that's one option.
Regarding the Python modules: The hack is not needed now because the way Gentoo handles Python modules has totally changed since this was written. This was largely driven by PEP 517, which made it much easier for Gentoo to cross-compile Python modules. The util-linux package installs the libmount module. It's only now needed because of Catalyst 4. We previously disabled the python USE flag.
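For reference, a sketch of the kind of entry that previously kept the module out; where exactly this lived in the Flatcar profiles is not shown here, so the snippet is illustrative only:
# Illustrative: disabling util-linux's python USE flag avoids building the
# libmount Python module at all.
sys-apps/util-linux -python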
https://github.com/flatcar/flatcar-dev-util/blob/flatcar-master/emerge-gitclone relies on the repo paths and names - this is used by the devcontainer. In phase 1 we don't necessarily need to move or rename the repos, right? We just need to align metadata with the existing directory name.
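As an illustration of "align metadata with the existing directory name", one possible way (a sketch, not necessarily what will actually be done) is to keep the coreos-overlay directory but make the repository identify itself under that name:
# Rename the repo in its own metadata so it matches the directory name.
echo coreos-overlay > coreos-overlay/profiles/repo_name
# and in coreos-overlay/metadata/layout.conf:
#   repo-name = coreos-overlay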
From my experience, when we were doing these kinds of heavy PRs, we would usually create multiple PRs that each did only one thing, without any of those PRs being functional per se, and then have one PR (a meta-PR) that contained all of them so the full functionality could be tested. Once the independent PRs were reviewed, commented on, and agreed upon in isolation and the meta-PR was tested, we merged the independent PRs at the same time, in the correct order, and closed the meta-PR. Thus, the comments and reviews/updates were trackable in isolation and we had the full view in the meta-PR.
@ader1990 I just tried my steps in 4020.0.0+nightly-20240703-2100 and it worked fine, but I was previously using a much older version. I'll give you a call later to see what's up. |
I used my cross-boss project to cross-compile. With the other changes I've made, using the dev container might just work now, but I haven't looked at it yet. I suspect it has the same "cros" toolchain?
Hello, I could reproduce a successful AMD64 SDK build and a Flatcar image creation with that SDK.
# First step, clone this branch and enter the latest SDK container
# Step done on the AMD64 host
git clone https://github.com/flatcar/scripts -b chewi/repo-mv-catalyst4-arm64-sdk
cd scripts
./run_sdk_container -t
# Second step, make sure that the gentoo portage profiles are properly set
# Step done on the SDK container
sudo ln -snf /mnt/host/source/src/scripts/repos/flatcar-overlay/profiles/coreos/amd64/sdk /etc/portage/make.profile
sudo tee /etc/portage/repos.conf/coreos.conf <<EOF
[DEFAULT]
main-repo = gentoo-subset
[flatcar-overlay]
location = /mnt/host/source/src/scripts/repos/flatcar-overlay
[gentoo-subset]
location = /mnt/host/source/src/scripts/repos/gentoo-subset
EOF
# Third step, emerge catalyst and bootstrap the SDK
# Step done on the SDK container
sudo emerge -av catalyst
sudo ./bootstrap_sdk
# Once the bootstrap is complete, a tar.bz2 artifact should be present here
# /mnt/host/source/src/build/catalyst/builds/flatcar-sdk/flatcar-sdk-amd64-*tar.bz2
# 4th step, copy the artifact in the scripts folder (cwd) and exit the initial SDK container
sudo cp /mnt/host/source/src/build/catalyst/builds/flatcar-sdk/flatcar-sdk-amd64-*tar.bz2 .
exit
# 5th step, build the Docker image of that SDK
# Step done on the AMD64 host
./build_sdk_container_image flatcar-sdk-amd64-*.tar.bz2
# After the build completes, you should see 3 new docker images, with suffix amd64/arm64 and all
# docker image ls
# REPOSITORY TAG IMAGE ID CREATED SIZE
# ghcr.io/flatcar/flatcar-sdk-amd64 4020.0.0-nightly-20240703-2100-6-ga344b7edca ada50a52899c 12 hours ago 6.05GB
# ...
# 6th step, enter the new SDK container
# Step done on the AMD64 host
./run_sdk_container -t -C ghcr.io/flatcar/flatcar-sdk-amd64:4020.0.0-nightly-20240703-2100-6-ga344b7edca -n test-amd64-new-sdk -a amd64
# 7th step, build_packages, build_image, image_to_vm, boot the image on qemu to make sure it works
# Step done on the new SDK container
./build_packages
./build_image --image_compression_formats none
./image_to_vm.sh --from=../build/images/amd64-usr/developer-4020.0.0+nightly-20240703-2100-6-ga344b7edca-a1 --board=amd64-usr --image_compression_formats none
cd ../build/images/amd64-usr/developer-4020.0.0+nightly-20240703-2100-6-ga344b7edca-a1/
sudo bash ./flatcar_production_qemu_uefi_secure.sh -nographic
# Make sure that the QEMU VM automatically logs in with user core and `systemctl status --failed` returns an empty response.
I've prepared a branch with just the Catalyst 4 upgrade and am testing it out. It still requires two manual adjustments though. One can be avoided by also taking the change to not use snapshots in stage1. This isn't a very large change. The other is the |
What's the issue with updating |
If you mean scripting that up for a clean transition, then yeah, we could do that, but I like to avoid short-term fix-ups and this isn't a bad thing to add to Portage anyway.
As far as I can tell, this file is generated while running Catalyst and is (or can be) regenerated when entering the SDK. What am I missing?
The SDK's repo config needs to be correct in order to update Catalyst, although that in itself is currently a manual step. That aside, if you don't also apply the two lib64 symlink commits, then bootstrap_sdk dies almost immediately. Even after that, you still get a couple of warnings about it, although they're possibly benign.
How to reproduce the ARM64 build on an ARM64 host. Pre-requisites:
# load the ARM64 SDK docker image
# Step run on the Flatcar ARM64 host
docker load < magic-image.tar.gz
docker image ls
# Step run on the Flatcar ARM64 host
# ghcr.io/flatcar/flatcar-sdk-all 4006.0.0-nightly-20240619-2100-28-g922276c37f 857f97cb44b6 4 days ago 7.5GB
# enter the SDK container
# Step run on the Flatcar ARM64 host
./run_sdk_container -n arm64sdkv1 -t -C ghcr.io/flatcar/flatcar-sdk-all:4006.0.0-nightly-20240619-2100-28-g922276c37f
# build packages / image / vm image and run vm
# Step run on the SDK ARM64 container
./build_packages --board arm64-usr
./build_image --board arm64-usr
./image_to_vm.sh --from <image dir>
./flatcar_production_qemu_uefi.sh -nographic
Ahem, you just pinged somebody else there. 😅
copy paste from notepad did not work so well, sorry. |
My Portage fix has now been merged. I'm one of the maintainers, so I can cut a release. There's just one other change we'd like to get in first. If I then create a new PR here to take that release, it will presumably then make it into nightly SDK builds. Is that good enough to ensure a clean transition, or would it also need to hit the release channels first? It would be helpful to understand this process when making other fixes in future.
The next release is built with the previous release's SDK as a seed, so that's the scenario that needs to be tested in the CI. The nightly SDK build follows that process but getting something into the nightly SDK only lets you depend on it for package builds. We have maintainer docs here: https://github.com/flatcar/flatcar-maintainer-private/blob/main/documentation/maintenance/release.md. As for the "clean transition" - it depends. I'm not sure which changes you want to split out - it'll be better to discuss this once you open the PR. With my rough understanding of what your intention is: if you enable updating the seed in catalyst and change the (generated) repos.conf definition to one that the seed's portage doesn't understand, then I expect that to fail. That's why I suggested not going down that road and updating the section in the config to match the folder name instead.
We will want the steps that produce the initial arm64 seed to be merged into this repo, and to set up a Jenkins job for it.
Okay, but which SDK version actually kicks off the build? Is it the same as the version used for the seed?
At least for the Jenkins sdk pipeline, it seems to use the same version as the seed. In any case, I've decided not to wait for the Portage fix to land. I've now created #2115, which automatically fixes up the repo name when the SDK starts and upgrades Catalyst when the SDK build starts. |
Move and rename repos, upgrade to Catalyst 4, support SDK on arm64
Sorry for doing all this in one giant commit, but it was hard to separate it out. In fact, it was so big that it made the GitHub UI unresponsive, so I had to create this PR using the CLI tool!
We had no arm64 SDK, so some cross-compiling or emulation was most likely going to be needed to produce one. Catalyst 4 adds support for building with QEMU, so I looked into upgrading. This turned out to be very much slower than emulating the amd64 SDK on arm64, where an arm64 build could then be mostly run without emulation. We can't stay on Catalyst 3 forever though, so I continued with the upgrade.
Catalyst 4 has totally changed the way repositories are handled. It only works when the name of the directory containing the repository matches the configured name of that repository. This was not the case for us, with the coreos repository residing in the coreos-overlay directory. We wanted to move and rename our repositories anyway, so they are now known as gentoo-subset and flatcar-overlay, and they live under scripts/repos. Using the same name as upstream Gentoo would have been problematic, and just "flatcar" would have looked awkward in documentation.
Please see the commit messages for more detail. We will need some coordination to get new SDKs published once this is merged because the usual process won't work.
How to use
As you might expect, this is a breaking change for building the SDK, but building a new amd64 SDK with an existing amd64 SDK doesn't require much effort.
Obviously, we haven't published an arm64 SDK yet, but I can provide one if you want to test that.
Testing done
I've built an arm64 SDK from scratch, including the step to turn it into a Docker image, and built another arm64 SDK from that. I've also built a new amd64 SDK, using an existing 7 month old amd64 SDK as a seed.
- Changelog entries added in the respective changelog/ directory (user-facing change, bug fix, security fix, update)
- Inspected CI output for image differences: /boot and /usr size, packages, list files for any missing binaries, kernel modules, config files, etc.