Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Carry #1111: specs-go/config: add Landlock LSM support #1241

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

Zheaoli
Copy link

@Zheaoli Zheaoli commented Jan 2, 2024

Carry #1111

@Zheaoli
Copy link
Author

Zheaoli commented Jan 2, 2024

cc @l0kod @AkihiroSuda @vbatts @gnoack @cyphar

@AkihiroSuda AkihiroSuda added this to the v1.2.0 (tentative) milestone Jan 2, 2024
@AkihiroSuda
Copy link
Member

cc @kailun-qin (author of #1111) PTAL

@AkihiroSuda
Copy link
Member

Before merging this PR, can we have a POC implementation of runc (or crun, youki) to confirm the design?

@Zheaoli
Copy link
Author

Zheaoli commented Jan 9, 2024

opencontainers/runc#3194 POC is here. Still need polish some code and add tests for it.

I will carry it

@@ -340,6 +340,26 @@ For Linux-based systems, the `process` object supports the following process-spe

* **`class`** (string, REQUIRED) specifies the I/O scheduling class. Possible values are `IOPRIO_CLASS_RT`, `IOPRIO_CLASS_BE`, and `IOPRIO_CLASS_IDLE`.
* **`priority`** (int, REQUIRED) specifies the priority level within the class. The value should be an integer ranging from 0 (highest) to 7 (lowest).
* **`landlock`** (object, OPTIONAL) specifies the Landlock unprivileged access control settings for the container process.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this PR should be carefully updated following the latest advances of landlock (rather than simply carried) as some of the spec proposals might be stale now. Pls see https://docs.kernel.org/userspace-api/landlock.html and https://github.com/landlock-lsm/go-landlock for details.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you got time to continue this PR or I can continue carry(

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pls go ahead updating this PR. I'll need some time to follow up and dive into the updates of landlock (even for review).

@l0kod
Copy link

l0kod commented Jan 10, 2024

Thanks for working on this!

You might want to take a look at this talk about backward and forward compatibility challenges: https://archive.fosdem.org/2023/schedule/event/rust_backward_and_forward_compatibility_for_security_features/

FYI, I plan to review these changes next week.

@gnoack
Copy link

gnoack commented Jan 10, 2024

Thanks for working on this!

You might want to take a look at this talk about backward and forward compatibility challenges: https://archive.fosdem.org/2023/schedule/event/rust_backward_and_forward_compatibility_for_security_features/

FYI, I plan to review these changes next week.

FWIW the most common implementation problem so far has been the use of the "refer" right, whose logic unfortunately works slightly differently than for other access rights. I tried to explain this in https://blog.gnoack.org/post/landlock-best-effort/ last year.

With Linux 6.7, Landlock also has gained support for restricting the use of bind(2) and connect(2) for TCP. (Landlock ABI version 4)

@Zheaoli
Copy link
Author

Zheaoli commented Jan 10, 2024

FYI, I plan to review these changes next week.

Thanks for the important information, I will update this spec in this week ASAP I can

@utam0k
Copy link
Member

utam0k commented Jan 11, 2024

I wonder what the behavior should be when a new process(e.g. docker exec) jumps into the existing container.

@Zheaoli Zheaoli marked this pull request as draft February 27, 2024 16:33
@Zheaoli Zheaoli force-pushed the manjusaka/carry-1111 branch from 56a5a1d to 6730eb2 Compare March 1, 2024 03:18
@Zheaoli Zheaoli marked this pull request as ready for review March 1, 2024 03:23
@Zheaoli
Copy link
Author

Zheaoli commented Mar 1, 2024

I have update the PR, PTAL @kailun-qin @gnoack @l0kod

@Zheaoli Zheaoli requested a review from kailun-qin March 1, 2024 03:24
// LandlockFSAction used to specify the FS actions that are handled by a ruleset or allowed by a rule.
type LandlockFSAction string

// Define actions on files and directories that Landlock can restrict a sandboxed process to.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Documentation nit: The word "define" is usually unnecessary in docstrings (all of the exported symbols are defining something after all)

Copy link
Member

@rata rata left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Zheaoli thanks for the PR! Added some comments.

I think the config.json interface needs more work, but I'd love for this to be ready.

I read @gnoack comments, that weren't addressed, and I fully agree with them. Please take a look their review comments.

config.md Outdated
3. ABI version >= 3:
1. truncate
* **`paths`** (array of strings, OPTIONAL) is an array of files or parent directories of the file hierarchies to restrict.
* **`portBeneath`** (array of objects, OPTIONAL) is an array of the network-hierarchy typed rules.
Copy link
Member

@rata rata Jun 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't make sense out of that.

If I want to have a rule for port 21, 22, 53 and 80, it seems complicated to have a portBeneath rule. For filesystems that makes a lot of sense, because is all beneath this (i.e. subdirs and everything below that), this is applied. It's in the nature of fs.

But what does it mean for ports? It seems like a complicated way to express it. Am I missing something?

Oh, it seems @gnoack already commented on this. I fully agree: https://github.com/opencontainers/runtime-spec/pull/1241/files#r1508607132

@AkihiroSuda
Copy link
Member

Needs rebase

@AkihiroSuda
Copy link
Member

Needs another rebase

@AkihiroSuda
Copy link
Member

@Zheaoli Could you rebase this? 🙏
Let's merge this soon

@Zheaoli
Copy link
Author

Zheaoli commented Dec 16, 2024

Could you rebase this? 🙏
Let's merge this soon

Sure, I will carry this today and fix the review idea during this week.

(BTW I'm watching new job opportunities)(Free man today

@Zheaoli Zheaoli force-pushed the manjusaka/carry-1111 branch 2 times, most recently from 7cea8e2 to af20627 Compare December 20, 2024 09:54
kailun-qin and others added 2 commits December 20, 2024 17:57
Linux kernel 5.13 adds support for Landlock Linux Security Module (LSM).
This allows unprivileged processes to create safe security sandboxes
that can securely restrict the ambient rights (e.g. global filesystem
access) for themselves.

opencontainers#1110

Co-authored-by: Zheao Li <[email protected]>
Signed-off-by: Kailun Qin <[email protected]>
Signed-off-by: Manjusaka <[email protected]>
Signed-off-by: Manjusaka <[email protected]>
@Zheaoli Zheaoli force-pushed the manjusaka/carry-1111 branch from af20627 to 77ec826 Compare December 20, 2024 09:57
@Zheaoli Zheaoli requested review from gnoack and rata December 20, 2024 09:58
@Zheaoli
Copy link
Author

Zheaoli commented Dec 20, 2024

cc @AkihiroSuda @gnoack @utam0k @rata I made a huge update based on the reivew idea. PTAL

@utam0k
Copy link
Member

utam0k commented Dec 22, 2024

I'd like to know the use cases with LandLock.
I'm worried that it will become a feature that no one uses, and it will only increase the burden of maintenance. I'm not familiar with this, so please let me know 🙏

@AkihiroSuda
Copy link
Member

Can we get the POC opencontainers/runc#3194 synced with this PR, so as to verify the feasibility and usability?

@Zheaoli
Copy link
Author

Zheaoli commented Dec 23, 2024

Can we get the POC opencontainers/runc#3194 synced with this PR, so as to verify the feasibility and usability?

SGTM, Let me carry the runc#3194

@rata
Copy link
Member

rata commented Jan 21, 2025

@Zheaoli thanks! It's very hard to review if you just "do huge updates" but don't really answer the comments. It's not easy to know what you tackled, what you didn't and why, and what you might just forgotten. If you can take the time to answer each comment, it would be greatly appreciated.

I had a quick look, it does look better from a 10k feet point of view at least, but the part of exposing this in features is completely missed. I fear without it it might not be possible to use it in some scenarios, as old runtimes without this will just ignore this. This doesn't play well with the disable_best_effort thing. It seems to do one thing (enforce it), but it doesn't (the old runtime will just ignore things it doesn't know about).

@Zheaoli Can you have a look at that?

Copy link
Member

@rata rata left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a great improvement on the previous iterations, but I don't think we should merge this, I think we still need changes :(

Besides some errors on how landlock behaves (and some typos that I removed the comments to make the review shorter, as there are other more important blockers IMHO), I don't like the idea of having the possible words in the spec as the only way to configure landlock (e.g. read_file and all allowed_access, the same for the network section, etc.).

In a lot of kernel releases (I think most) there were landlock changes, I mentioned the example of ioctls as the last recent example (this PR was updated in Dec and is already old). I think we need a better way to have something that is more "future proof", because the linux kernel is released every 6-8 weeks, we can't update the spec and runtimes so often for any new thing that can be restricted in a new kernel.

I suggested using the plain numbers, as defined in the landlock.h uapi header. This way, we can expose nice names in the go bindings, but users like containerd and so can still use new features if they just use the numbers (they can define a friendly constant for the number, etc.). This removes the runtime-spec and runtimes to create new releases to support every new "string" supported. We will need a new release for things like "now we can restrict a whole new category", like the network stuff that was added at some point in the past, but that isn't as often and seems like a reasonable trade-off for me. But I'm open to other ideas, of course :-)

* **`handledAccess`** (object, OPTIONAL) specifies the access rights that will be restricted by the ruleset.
The `handledAccess` currently contains the following types:
* **`handledAccessFS`** (array of strings, OPTIONAL) is an array of FS typed actions that are handled by a ruleset.
If no rule explicitly allow them, they should then be forbidden.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this what landlock does? I thought it will allow everything that is not handled here. Otherwise, backwards compatibility (when you add a new access type here, like kernel X added access type Y) will be impossible, right?

Doing a local experiment, this is not indeed what landlock does, so we can't promise this. It will be the case if we don't mention it in path_beneath. But if we mention it here, this is not true. Maybe it is just c&p that went unnoticed?

The `handledAccess` currently contains the following types:
* **`handledAccessFS`** (array of strings, OPTIONAL) is an array of FS typed actions that are handled by a ruleset.
If no rule explicitly allow them, they should then be forbidden.
* **`handledAccessNetwork`** (array of strings, OPTIONAL) is an array of NETWORK typed actions that are handled by a ruleset. (The NETWORK typed actions are available when the ABI version >= 4. The behavior when the ABI version is less than 4 will depend on the **`enableBestEffort`**)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd remove the last parenthesis. That is the case with all the things, like some handledAccessFs types might not be supported, etc.

The `rules` currently contains the following types:
* **`pathBeneath`** (array of objects, OPTIONAL) is an array of the file-hierarchy typed rules.
Entries in the array contain the following properties:
* **`allowedAccess`** (array of strings, OPTIONAL) is an array of FS typed actions that are allowed by a rule. The actions are grouped by the ABI version in the following description:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want strings or ints here? If we use ints, we can handle new values without requiring to add them to the spec. Of course, we will need to provide a friendly name for known things in the go bindings, but until a new release is done, people can use a number and it will just work.

Furthermore, what is here is already out of date. ABI 5 now is supported by new kernels (to restrict ioctls). So I think we need a more "future proof" spec :-)

What do others think?

* **`paths`** (array of strings, OPTIONAL) is an array of files or parent directories of the file hierarchies to restrict.
* **`networkPort`** (array of objects, OPTIONAL) is an array of the network socket rules.
Entries in the array contain the following properties:
* **`allowedAccess`** (array of strings, OPTIONAL) is an array of NETWORK typed actions that are allowed by a rule. The actions are grouped by the ABI version in the following description:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Idem here, we might get away with ints?

1. bind
2. connect
* **`ports`** (array of strings, OPTIONAL) is an array of network ports to restrict.
* **`enableBestEffort`** (bool, OPTIONAL) the `enableBestEffort` field disables the best-effort security approach for Landlock access rights.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I mentioned before, without exposing landlock enabled as a bool in the features subcommand, a high level container runtime (like containerd or crio) can't know if the low-level container runtime implementing this (like crun or runc) supports landlock at all. And in that case, the runtimes that don't support landlock will just ignore this whole section and the container will be created just fine, no landlock enforced.

Therefore, I think we really need to expose this in the features subcommand too.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, we need to be careful to not give users a false sense of security.

* **`enableBestEffort`** (bool, OPTIONAL) the `enableBestEffort` field disables the best-effort security approach for Landlock access rights.
This is for conditions when the Landlock access rights explicitly configured by the container are not supported or available in the running kernel.
If the best-effort security approach is enabled (`false`), the runtime SHOULD enforce the strongest rules configured up to the current kernel support, and only be [logged as a warning](runtime.md#warnings) for those not supported.
If disabled (`true`), the runtime MUST [generate an error](runtime.md#errors) if one or more rules specified by the container is not supported.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Per the other comment, this only work for runtimes that use this spec version. For runc 1.1 or 1.2 and probably 1.3, it's mandated by the spec to ignore unrecognized fields. Therefore, this is not safe when you take into account that we need to support older runtimes from a high-level container like containerd or crio


* **`handledAccess`** (object, OPTIONAL) specifies the access rights that will be restricted by the ruleset.
The `handledAccess` currently contains the following types:
* **`handledAccessFS`** (array of strings, OPTIONAL) is an array of FS typed actions that are handled by a ruleset.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The int ideas can apply here too.

The `handledAccess` currently contains the following types:
* **`handledAccessFS`** (array of strings, OPTIONAL) is an array of FS typed actions that are handled by a ruleset.
If no rule explicitly allow them, they should then be forbidden.
* **`handledAccessNetwork`** (array of strings, OPTIONAL) is an array of NETWORK typed actions that are handled by a ruleset. (The NETWORK typed actions are available when the ABI version >= 4. The behavior when the ABI version is less than 4 will depend on the **`enableBestEffort`**)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

idem, what about using ints for this too?

1. ABI version >= 4:
1. bind
2. connect
* **`ports`** (array of strings, OPTIONAL) is an array of network ports to restrict.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess this is a typo, we want int for ports, right?

@l0kod
Copy link

l0kod commented Feb 6, 2025

Besides some errors on how landlock behaves (and some typos that I removed the comments to make the review shorter, as there are other more important blockers IMHO), I don't like the idea of having the possible words in the spec as the only way to configure landlock (e.g. read_file and all allowed_access, the same for the network section, etc.).

In a lot of kernel releases (I think most) there were landlock changes, I mentioned the example of ioctls as the last recent example (this PR was updated in Dec and is already old). I think we need a better way to have something that is more "future proof", because the linux kernel is released every 6-8 weeks, we can't update the spec and runtimes so often for any new thing that can be restricted in a new kernel.

I suggested using the plain numbers, as defined in the landlock.h uapi header. This way, we can expose nice names in the go bindings, but users like containerd and so can still use new features if they just use the numbers (they can define a friendly constant for the number, etc.). This removes the runtime-spec and runtimes to create new releases to support every new "string" supported. We will need a new release for things like "now we can restrict a whole new category", like the network stuff that was added at some point in the past, but that isn't as often and seems like a reasonable trade-off for me. But I'm open to other ideas, of course :-)

Using keywords with a well-defined semantic should be favored, but I agree that giving the possibility for users to use currently-unknown-to-the-runtime access rights would enable them to benefit from more security features. I'd like to be able to specify both keywords and numbers though. Current Landlock users are looking for groups of access rights (e.g. read-only vs. read-write), and specific keywords could also be used for that (taking into account compatibility).

I started working on a standalone tool and library that reads a JSON configuration and creates a Landlock sandbox. The overall approach looks pretty similar to this specification and I'd like to combine our effort to get a common configuration format for Landlock. This would also be easier to synchronize and maintain wrt. to the kernel changes (that I maintain). The configuration should be a bit more flexible, closer to the kernel's UAPI, handle file descriptors as well, and ease handling of backward and forward compatibility. I plan to release an initial experimental version next week and I'll add a comment to this PR to point to the repository.

@rata
Copy link
Member

rata commented Feb 6, 2025

@l0kod +1 on not creating two different json specs to describe a landlock policy. I'll monitor this for when you post the link to the experimental release :)

@l0kod
Copy link

l0kod commented Feb 16, 2025

Here is an initial experimental version: https://github.com/landlock-lsm/landlockconfig
I'm working on improving the compatibility configuration to make it simpler to tweak (if required), see landlock-lsm/landlockconfig#3.

@Zheaoli
Copy link
Author

Zheaoli commented Feb 16, 2025

Here is an initial experimental version: https://github.com/landlock-lsm/landlockconfig
I'm working on improving the compatibility configuration to make it simpler to tweak (if required), see landlock-lsm/landlockconfig#3.

Thanks for the work. I think it would be great if we have a common landlock spec

@@ -352,6 +352,52 @@ For Linux-based systems, the `process` object supports the following process-spe
for `initial`. If omitted or empty, runtime SHOULD NOT change process'
CPU affinity after the process is moved to container's cgroup, and the
final affinity is determined by the Linux kernel.
* **`landlock`** (object, OPTIONAL) specifies the Landlock unprivileged access control settings for the container process.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this doesn't belong to this file. This should be in config-linux. The same with the secion of the struct this is, this should be inside the Linux section, as seccomp is. Not as a general feature, as this is a linux-only feature.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ouch, there is a section for Linux here too. Sorry!

@@ -96,8 +96,86 @@ type Process struct {
IOPriority *LinuxIOPriority `json:"ioPriority,omitempty" platform:"linux"`
// ExecCPUAffinity specifies CPU affinity for exec processes.
ExecCPUAffinity *CPUAffinity `json:"execCPUAffinity,omitempty" platform:"linux"`
// Landlock specifies the Landlock unprivileged access control settings for the container process.
// `noNewPrivileges` must be enabled to use Landlock.
Landlock *Landlock `json:"landlock,omitempty" platform:"linux"`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Idem, this is the wrong section. This should be inside the linux-only part, not the generic part. This only applies to linux

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ouch, there is a section for Linux here too. Sorry!

@rata
Copy link
Member

rata commented Feb 19, 2025

Added some other comments so the next iteration/PR doesn't do those mistakes too.

@rata
Copy link
Member

rata commented Feb 21, 2025

@l0kod I'm no maintainer here, but having a quick look the json schema seems mostly okay to me. Some comments:

  1. IIUC it is not handling numbers for the constants in landlock.h from uapi. I guess you want to change that?
  2. The additionalproperties: false is not something we can do in runtime-spec, sadly. I think the lib will need to have a way to accept additional properties to be able to use it in OCI runtimes like runc or crun. Maaaybe there is some edge case where we can reject, I haven't look in detail. Let's look at that later, it seems something easy to change later too.
  3. What would be the difference in practice between "soft_requirement" and "best_effort"? Maybe an example that behaves differently by changing this option?

Thanks!

@l0kod
Copy link

l0kod commented Mar 3, 2025

@l0kod I'm no maintainer here, but having a quick look the json schema seems mostly okay to me. Some comments:

  1. IIUC it is not handling numbers for the constants in landlock.h from uapi. I guess you want to change that?

Correct, I reverted this part for now because it is not implemented yet, but I will.

  1. The additionalproperties: false is not something we can do in runtime-spec, sadly. I think the lib will need to have a way to accept additional properties to be able to use it in OCI runtimes like runc or crun. Maaaybe there is some edge case where we can reject, I haven't look in detail. Let's look at that later, it seems something easy to change later too.

OK, I'd like to make it optional in the library then, it would be a shame to start a new schema with a legacy issue. 😉
It should be easy to only remove additionalproperties: false for the OCI runtime spec and still keep it in sync. Anyway, we'll need to integrate them somehow, so this could be part of the merge process.

  1. What would be the difference in practice between "soft_requirement" and "best_effort"? Maybe an example that behaves differently by changing this option?

soft_requirement is to completely disable the sandbox, whereas best_effort is to enforce as much restrictions as possible. However, I plan to replace that with options that make more sense, see landlock-lsm/landlockconfig#3 (comment)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants