-
Notifications
You must be signed in to change notification settings - Fork 561
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Carry #1111: specs-go/config: add Landlock LSM support #1241
base: main
Are you sure you want to change the base?
Conversation
9962bf2
to
c94edbc
Compare
cc @kailun-qin (author of #1111) PTAL |
Before merging this PR, can we have a POC implementation of runc (or crun, youki) to confirm the design? |
opencontainers/runc#3194 POC is here. Still need polish some code and add tests for it. I will carry it |
@@ -340,6 +340,26 @@ For Linux-based systems, the `process` object supports the following process-spe | |||
|
|||
* **`class`** (string, REQUIRED) specifies the I/O scheduling class. Possible values are `IOPRIO_CLASS_RT`, `IOPRIO_CLASS_BE`, and `IOPRIO_CLASS_IDLE`. | |||
* **`priority`** (int, REQUIRED) specifies the priority level within the class. The value should be an integer ranging from 0 (highest) to 7 (lowest). | |||
* **`landlock`** (object, OPTIONAL) specifies the Landlock unprivileged access control settings for the container process. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this PR should be carefully updated following the latest advances of landlock (rather than simply carried) as some of the spec proposals might be stale now. Pls see https://docs.kernel.org/userspace-api/landlock.html and https://github.com/landlock-lsm/go-landlock for details.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you got time to continue this PR or I can continue carry(
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pls go ahead updating this PR. I'll need some time to follow up and dive into the updates of landlock (even for review).
Thanks for working on this! You might want to take a look at this talk about backward and forward compatibility challenges: https://archive.fosdem.org/2023/schedule/event/rust_backward_and_forward_compatibility_for_security_features/ FYI, I plan to review these changes next week. |
FWIW the most common implementation problem so far has been the use of the "refer" right, whose logic unfortunately works slightly differently than for other access rights. I tried to explain this in https://blog.gnoack.org/post/landlock-best-effort/ last year. With Linux 6.7, Landlock also has gained support for restricting the use of bind(2) and connect(2) for TCP. (Landlock ABI version 4) |
Thanks for the important information, I will update this spec in this week ASAP I can |
I wonder what the behavior should be when a new process(e.g. docker exec) jumps into the existing container. |
56a5a1d
to
6730eb2
Compare
I have update the PR, PTAL @kailun-qin @gnoack @l0kod |
specs-go/config.go
Outdated
// LandlockFSAction used to specify the FS actions that are handled by a ruleset or allowed by a rule. | ||
type LandlockFSAction string | ||
|
||
// Define actions on files and directories that Landlock can restrict a sandboxed process to. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Documentation nit: The word "define" is usually unnecessary in docstrings (all of the exported symbols are defining something after all)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
config.md
Outdated
3. ABI version >= 3: | ||
1. truncate | ||
* **`paths`** (array of strings, OPTIONAL) is an array of files or parent directories of the file hierarchies to restrict. | ||
* **`portBeneath`** (array of objects, OPTIONAL) is an array of the network-hierarchy typed rules. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't make sense out of that.
If I want to have a rule for port 21, 22, 53 and 80, it seems complicated to have a portBeneath rule. For filesystems that makes a lot of sense, because is all beneath this (i.e. subdirs and everything below that), this is applied. It's in the nature of fs.
But what does it mean for ports? It seems like a complicated way to express it. Am I missing something?
Oh, it seems @gnoack already commented on this. I fully agree: https://github.com/opencontainers/runtime-spec/pull/1241/files#r1508607132
Needs rebase |
6730eb2
to
9016907
Compare
Needs another rebase |
@Zheaoli Could you rebase this? 🙏 |
Sure, I will carry this today and fix the review idea during this week. (BTW I'm watching new job opportunities)(Free man today |
7cea8e2
to
af20627
Compare
Linux kernel 5.13 adds support for Landlock Linux Security Module (LSM). This allows unprivileged processes to create safe security sandboxes that can securely restrict the ambient rights (e.g. global filesystem access) for themselves. opencontainers#1110 Co-authored-by: Zheao Li <[email protected]> Signed-off-by: Kailun Qin <[email protected]> Signed-off-by: Manjusaka <[email protected]>
Signed-off-by: Manjusaka <[email protected]>
af20627
to
77ec826
Compare
cc @AkihiroSuda @gnoack @utam0k @rata I made a huge update based on the reivew idea. PTAL |
I'd like to know the use cases with LandLock. |
Can we get the POC opencontainers/runc#3194 synced with this PR, so as to verify the feasibility and usability? |
SGTM, Let me carry the runc#3194 |
@Zheaoli thanks! It's very hard to review if you just "do huge updates" but don't really answer the comments. It's not easy to know what you tackled, what you didn't and why, and what you might just forgotten. If you can take the time to answer each comment, it would be greatly appreciated. I had a quick look, it does look better from a 10k feet point of view at least, but the part of exposing this in features is completely missed. I fear without it it might not be possible to use it in some scenarios, as old runtimes without this will just ignore this. This doesn't play well with the @Zheaoli Can you have a look at that? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a great improvement on the previous iterations, but I don't think we should merge this, I think we still need changes :(
Besides some errors on how landlock behaves (and some typos that I removed the comments to make the review shorter, as there are other more important blockers IMHO), I don't like the idea of having the possible words in the spec as the only way to configure landlock (e.g. read_file and all allowed_access, the same for the network section, etc.).
In a lot of kernel releases (I think most) there were landlock changes, I mentioned the example of ioctls as the last recent example (this PR was updated in Dec and is already old). I think we need a better way to have something that is more "future proof", because the linux kernel is released every 6-8 weeks, we can't update the spec and runtimes so often for any new thing that can be restricted in a new kernel.
I suggested using the plain numbers, as defined in the landlock.h uapi header. This way, we can expose nice names in the go bindings, but users like containerd and so can still use new features if they just use the numbers (they can define a friendly constant for the number, etc.). This removes the runtime-spec and runtimes to create new releases to support every new "string" supported. We will need a new release for things like "now we can restrict a whole new category", like the network stuff that was added at some point in the past, but that isn't as often and seems like a reasonable trade-off for me. But I'm open to other ideas, of course :-)
* **`handledAccess`** (object, OPTIONAL) specifies the access rights that will be restricted by the ruleset. | ||
The `handledAccess` currently contains the following types: | ||
* **`handledAccessFS`** (array of strings, OPTIONAL) is an array of FS typed actions that are handled by a ruleset. | ||
If no rule explicitly allow them, they should then be forbidden. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this what landlock does? I thought it will allow everything that is not handled here. Otherwise, backwards compatibility (when you add a new access type here, like kernel X added access type Y) will be impossible, right?
Doing a local experiment, this is not indeed what landlock does, so we can't promise this. It will be the case if we don't mention it in path_beneath. But if we mention it here, this is not true. Maybe it is just c&p that went unnoticed?
The `handledAccess` currently contains the following types: | ||
* **`handledAccessFS`** (array of strings, OPTIONAL) is an array of FS typed actions that are handled by a ruleset. | ||
If no rule explicitly allow them, they should then be forbidden. | ||
* **`handledAccessNetwork`** (array of strings, OPTIONAL) is an array of NETWORK typed actions that are handled by a ruleset. (The NETWORK typed actions are available when the ABI version >= 4. The behavior when the ABI version is less than 4 will depend on the **`enableBestEffort`**) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd remove the last parenthesis. That is the case with all the things, like some handledAccessFs types might not be supported, etc.
The `rules` currently contains the following types: | ||
* **`pathBeneath`** (array of objects, OPTIONAL) is an array of the file-hierarchy typed rules. | ||
Entries in the array contain the following properties: | ||
* **`allowedAccess`** (array of strings, OPTIONAL) is an array of FS typed actions that are allowed by a rule. The actions are grouped by the ABI version in the following description: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want strings or ints here? If we use ints, we can handle new values without requiring to add them to the spec. Of course, we will need to provide a friendly name for known things in the go bindings, but until a new release is done, people can use a number and it will just work.
Furthermore, what is here is already out of date. ABI 5 now is supported by new kernels (to restrict ioctls). So I think we need a more "future proof" spec :-)
What do others think?
* **`paths`** (array of strings, OPTIONAL) is an array of files or parent directories of the file hierarchies to restrict. | ||
* **`networkPort`** (array of objects, OPTIONAL) is an array of the network socket rules. | ||
Entries in the array contain the following properties: | ||
* **`allowedAccess`** (array of strings, OPTIONAL) is an array of NETWORK typed actions that are allowed by a rule. The actions are grouped by the ABI version in the following description: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Idem here, we might get away with ints?
1. bind | ||
2. connect | ||
* **`ports`** (array of strings, OPTIONAL) is an array of network ports to restrict. | ||
* **`enableBestEffort`** (bool, OPTIONAL) the `enableBestEffort` field disables the best-effort security approach for Landlock access rights. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I mentioned before, without exposing landlock enabled as a bool in the features subcommand, a high level container runtime (like containerd or crio) can't know if the low-level container runtime implementing this (like crun or runc) supports landlock at all. And in that case, the runtimes that don't support landlock will just ignore this whole section and the container will be created just fine, no landlock enforced.
Therefore, I think we really need to expose this in the features subcommand too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, we need to be careful to not give users a false sense of security.
* **`enableBestEffort`** (bool, OPTIONAL) the `enableBestEffort` field disables the best-effort security approach for Landlock access rights. | ||
This is for conditions when the Landlock access rights explicitly configured by the container are not supported or available in the running kernel. | ||
If the best-effort security approach is enabled (`false`), the runtime SHOULD enforce the strongest rules configured up to the current kernel support, and only be [logged as a warning](runtime.md#warnings) for those not supported. | ||
If disabled (`true`), the runtime MUST [generate an error](runtime.md#errors) if one or more rules specified by the container is not supported. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Per the other comment, this only work for runtimes that use this spec version. For runc 1.1 or 1.2 and probably 1.3, it's mandated by the spec to ignore unrecognized fields. Therefore, this is not safe when you take into account that we need to support older runtimes from a high-level container like containerd or crio
|
||
* **`handledAccess`** (object, OPTIONAL) specifies the access rights that will be restricted by the ruleset. | ||
The `handledAccess` currently contains the following types: | ||
* **`handledAccessFS`** (array of strings, OPTIONAL) is an array of FS typed actions that are handled by a ruleset. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The int ideas can apply here too.
The `handledAccess` currently contains the following types: | ||
* **`handledAccessFS`** (array of strings, OPTIONAL) is an array of FS typed actions that are handled by a ruleset. | ||
If no rule explicitly allow them, they should then be forbidden. | ||
* **`handledAccessNetwork`** (array of strings, OPTIONAL) is an array of NETWORK typed actions that are handled by a ruleset. (The NETWORK typed actions are available when the ABI version >= 4. The behavior when the ABI version is less than 4 will depend on the **`enableBestEffort`**) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
idem, what about using ints for this too?
1. ABI version >= 4: | ||
1. bind | ||
2. connect | ||
* **`ports`** (array of strings, OPTIONAL) is an array of network ports to restrict. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess this is a typo, we want int for ports, right?
Using keywords with a well-defined semantic should be favored, but I agree that giving the possibility for users to use currently-unknown-to-the-runtime access rights would enable them to benefit from more security features. I'd like to be able to specify both keywords and numbers though. Current Landlock users are looking for groups of access rights (e.g. read-only vs. read-write), and specific keywords could also be used for that (taking into account compatibility). I started working on a standalone tool and library that reads a JSON configuration and creates a Landlock sandbox. The overall approach looks pretty similar to this specification and I'd like to combine our effort to get a common configuration format for Landlock. This would also be easier to synchronize and maintain wrt. to the kernel changes (that I maintain). The configuration should be a bit more flexible, closer to the kernel's UAPI, handle file descriptors as well, and ease handling of backward and forward compatibility. I plan to release an initial experimental version next week and I'll add a comment to this PR to point to the repository. |
@l0kod +1 on not creating two different json specs to describe a landlock policy. I'll monitor this for when you post the link to the experimental release :) |
Here is an initial experimental version: https://github.com/landlock-lsm/landlockconfig |
Thanks for the work. I think it would be great if we have a common landlock spec |
@@ -352,6 +352,52 @@ For Linux-based systems, the `process` object supports the following process-spe | |||
for `initial`. If omitted or empty, runtime SHOULD NOT change process' | |||
CPU affinity after the process is moved to container's cgroup, and the | |||
final affinity is determined by the Linux kernel. | |||
* **`landlock`** (object, OPTIONAL) specifies the Landlock unprivileged access control settings for the container process. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this doesn't belong to this file. This should be in config-linux. The same with the secion of the struct this is, this should be inside the Linux section, as seccomp is. Not as a general feature, as this is a linux-only feature.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ouch, there is a section for Linux here too. Sorry!
@@ -96,8 +96,86 @@ type Process struct { | |||
IOPriority *LinuxIOPriority `json:"ioPriority,omitempty" platform:"linux"` | |||
// ExecCPUAffinity specifies CPU affinity for exec processes. | |||
ExecCPUAffinity *CPUAffinity `json:"execCPUAffinity,omitempty" platform:"linux"` | |||
// Landlock specifies the Landlock unprivileged access control settings for the container process. | |||
// `noNewPrivileges` must be enabled to use Landlock. | |||
Landlock *Landlock `json:"landlock,omitempty" platform:"linux"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Idem, this is the wrong section. This should be inside the linux-only part, not the generic part. This only applies to linux
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ouch, there is a section for Linux here too. Sorry!
Added some other comments so the next iteration/PR doesn't do those mistakes too. |
@l0kod I'm no maintainer here, but having a quick look the json schema seems mostly okay to me. Some comments:
Thanks! |
Correct, I reverted this part for now because it is not implemented yet, but I will.
OK, I'd like to make it optional in the library then, it would be a shame to start a new schema with a legacy issue. 😉
|
Carry #1111