Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

libcontainer: intelrdt: add support for Intel RDT/MBA in runc #1632

Merged
merged 6 commits into from
Oct 16, 2018
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
38 changes: 35 additions & 3 deletions events.go
Original file line number Diff line number Diff line change
Expand Up @@ -104,6 +104,13 @@ type l3CacheInfo struct {
NumClosids uint64 `json:"num_closids,omitempty"`
}

type memBwInfo struct {
BandwidthGran uint64 `json:"bandwidth_gran,omitempty"`
DelayLinear uint64 `json:"delay_linear,omitempty"`
MinBandwidth uint64 `json:"min_bandwidth,omitempty"`
NumClosids uint64 `json:"num_closids,omitempty"`
}

type intelRdt struct {
// The read-only L3 cache information
L3CacheInfo *l3CacheInfo `json:"l3_cache_info,omitempty"`
Expand All @@ -113,6 +120,15 @@ type intelRdt struct {

// The L3 cache schema in 'container_id' group
L3CacheSchema string `json:"l3_cache_schema,omitempty"`

// The read-only memory bandwidth information
MemBwInfo *memBwInfo `json:"mem_bw_info,omitempty"`

// The read-only memory bandwidth schema in root
MemBwSchemaRoot string `json:"mem_bw_schema_root,omitempty"`

// The memory bandwidth schema in 'container_id' group
MemBwSchema string `json:"mem_bw_schema,omitempty"`
}

var eventsCommand = cli.Command{
Expand Down Expand Up @@ -248,9 +264,16 @@ func convertLibcontainerStats(ls *libcontainer.Stats) *stats {
}

if is := ls.IntelRdtStats; is != nil {
s.IntelRdt.L3CacheInfo = convertL3CacheInfo(is.L3CacheInfo)
s.IntelRdt.L3CacheSchemaRoot = is.L3CacheSchemaRoot
s.IntelRdt.L3CacheSchema = is.L3CacheSchema
if intelrdt.IsCatEnabled() {
s.IntelRdt.L3CacheInfo = convertL3CacheInfo(is.L3CacheInfo)
s.IntelRdt.L3CacheSchemaRoot = is.L3CacheSchemaRoot
s.IntelRdt.L3CacheSchema = is.L3CacheSchema
}
if intelrdt.IsMbaEnabled() {
s.IntelRdt.MemBwInfo = convertMemBwInfo(is.MemBwInfo)
s.IntelRdt.MemBwSchemaRoot = is.MemBwSchemaRoot
s.IntelRdt.MemBwSchema = is.MemBwSchema
}
}

return &s
Expand Down Expand Up @@ -293,3 +316,12 @@ func convertL3CacheInfo(i *intelrdt.L3CacheInfo) *l3CacheInfo {
NumClosids: i.NumClosids,
}
}

func convertMemBwInfo(i *intelrdt.MemBwInfo) *memBwInfo {
return &memBwInfo{
BandwidthGran: i.BandwidthGran,
DelayLinear: i.DelayLinear,
MinBandwidth: i.MinBandwidth,
NumClosids: i.NumClosids,
}
}
105 changes: 69 additions & 36 deletions libcontainer/SPEC.md
Original file line number Diff line number Diff line change
Expand Up @@ -156,17 +156,21 @@ init process will block waiting for the parent to finish setup.

### IntelRdt

Intel platforms with new Xeon CPU support Intel Resource Director Technology
(RDT). Cache Allocation Technology (CAT) is a sub-feature of RDT, which
currently supports L3 cache resource allocation.
Intel platforms with new Xeon CPU support Resource Director Technology (RDT).
Cache Allocation Technology (CAT) and Memory Bandwidth Allocation (MBA) are
two sub-features of RDT.

This feature provides a way for the software to restrict cache allocation to a
defined 'subset' of L3 cache which may be overlapping with other 'subsets'.
The different subsets are identified by class of service (CLOS) and each CLOS
has a capacity bitmask (CBM).
Cache Allocation Technology (CAT) provides a way for the software to restrict
cache allocation to a defined 'subset' of L3 cache which may be overlapping
with other 'subsets'. The different subsets are identified by class of
service (CLOS) and each CLOS has a capacity bitmask (CBM).

It can be used to handle L3 cache resource allocation for containers if
hardware and kernel support Intel RDT/CAT.
Memory Bandwidth Allocation (MBA) provides indirect and approximate throttle
over memory bandwidth for the software. A user controls the resource by
indicating the percentage of maximum memory bandwidth.

It can be used to handle L3 cache and memory bandwidth resources allocation
for containers if hardware and kernel support Intel RDT CAT and MBA features.

In Linux 4.10 kernel or newer, the interface is defined and exposed via
"resource control" filesystem, which is a "cgroup-like" interface.
Expand All @@ -175,66 +179,95 @@ Comparing with cgroups, it has similar process management lifecycle and
interfaces in a container. But unlike cgroups' hierarchy, it has single level
filesystem layout.

CAT and MBA features are introduced in Linux 4.10 and 4.12 kernel via
"resource control" filesystem.

Intel RDT "resource control" filesystem hierarchy:
```
mount -t resctrl resctrl /sys/fs/resctrl
tree /sys/fs/resctrl
/sys/fs/resctrl/
|-- info
| |-- L3
| |-- cbm_mask
| |-- min_cbm_bits
| | |-- cbm_mask
| | |-- min_cbm_bits
| | |-- num_closids
| |-- MB
| |-- bandwidth_gran
| |-- delay_linear
| |-- min_bandwidth
| |-- num_closids
|-- cpus
|-- ...
|-- schemata
|-- tasks
|-- <container_id>
|-- cpus
|-- ...
|-- schemata
|-- tasks

```

For runc, we can make use of `tasks` and `schemata` configuration for L3 cache
resource constraints.
For runc, we can make use of `tasks` and `schemata` configuration for L3
cache and memory bandwidth resources constraints.

The file `tasks` has a list of tasks that belongs to this group (e.g.,
<container_id>" group). Tasks can be added to a group by writing the task ID
to the "tasks" file (which will automatically remove them from the previous
to the "tasks" file (which will automatically remove them from the previous
group to which they belonged). New tasks created by fork(2) and clone(2) are
added to the same group as their parent. If a pid is not in any sub group, it
is in root group.
added to the same group as their parent.

The file `schemata` has allocation masks/values for L3 cache on each socket,
which contains L3 cache id and capacity bitmask (CBM).
The file `schemata` has a list of all the resources available to this group.
Each resource (L3 cache, memory bandwidth) has its own line and format.

L3 cache schema:
It has allocation bitmasks/values for L3 cache on each socket, which
contains L3 cache id and capacity bitmask (CBM).
```
Format: "L3:<cache_id0>=<cbm0>;<cache_id1>=<cbm1>;..."
```
For example, on a two-socket machine, L3's schema line could be `L3:0=ff;1=c0`
Which means L3 cache id 0's CBM is 0xff, and L3 cache id 1's CBM is 0xc0.
For example, on a two-socket machine, the schema line could be "L3:0=ff;1=c0"
which means L3 cache id 0's CBM is 0xff, and L3 cache id 1's CBM is 0xc0.

The valid L3 cache CBM is a *contiguous bits set* and number of bits that can
be set is less than the max bit. The max bits in the CBM is varied among
supported Intel Xeon platforms. In Intel RDT "resource control" filesystem
layout, the CBM in a group should be a subset of the CBM in root. Kernel will
check if it is valid when writing. e.g., 0xfffff in root indicates the max bits
of CBM is 20 bits, which mapping to entire L3 cache capacity. Some valid CBM
values to set in a group: 0xf, 0xf0, 0x3ff, 0x1f00 and etc.
supported Intel CPU models. Kernel will check if it is valid when writing.
e.g., default value 0xfffff in root indicates the max bits of CBM is 20
bits, which mapping to entire L3 cache capacity. Some valid CBM values to
set in a group: 0xf, 0xf0, 0x3ff, 0x1f00 and etc.

Memory bandwidth schema:
It has allocation values for memory bandwidth on each socket, which contains
L3 cache id and memory bandwidth percentage.
```
Format: "MB:<cache_id0>=bandwidth0;<cache_id1>=bandwidth1;..."
```
For example, on a two-socket machine, the schema line could be "MB:0=20;1=70"

The minimum bandwidth percentage value for each CPU model is predefined and
can be looked up through "info/MB/min_bandwidth". The bandwidth granularity
that is allocated is also dependent on the CPU model and can be looked up at
"info/MB/bandwidth_gran". The available bandwidth control steps are:
min_bw + N * bw_gran. Intermediate values are rounded to the next control
step available on the hardware.

For more information about Intel RDT/CAT kernel interface:
For more information about Intel RDT kernel interface:
https://www.kernel.org/doc/Documentation/x86/intel_rdt_ui.txt

An example for runc:
```
An example for runc:
Consider a two-socket machine with two L3 caches where the default CBM is
0xfffff and the max CBM length is 20 bits. With this configuration, tasks
inside the container only have access to the "upper" 80% of L3 cache id 0 and
the "lower" 50% L3 cache id 1:
0x7ff and the max CBM length is 11 bits, and minimum memory bandwidth of 10%
with a memory bandwidth granularity of 10%.

Tasks inside the container only have access to the "upper" 7/11 of L3 cache
on socket 0 and the "lower" 5/11 L3 cache on socket 1, and may use a
maximum memory bandwidth of 20% on socket 0 and 70% on socket 1.

"linux": {
"intelRdt": {
"l3CacheSchema": "L3:0=ffff0;1=3ff"
}
"intelRdt": {
"closID": "guaranteed_group",
"l3CacheSchema": "L3:0=7f0;1=1f",
"memBwSchema": "MB:0=20;1=70"
}
}
```

Expand Down
4 changes: 2 additions & 2 deletions libcontainer/configs/config.go
Original file line number Diff line number Diff line change
Expand Up @@ -186,8 +186,8 @@ type Config struct {
// callers keyring in this case.
NoNewKeyring bool `json:"no_new_keyring"`

// IntelRdt specifies settings for Intel RDT/CAT group that the container is placed into
// to limit the resources (e.g., L3 cache) the container has available
// IntelRdt specifies settings for Intel RDT group that the container is placed into
// to limit the resources (e.g., L3 cache, memory bandwidth) the container has available
IntelRdt *IntelRdt `json:"intel_rdt,omitempty"`

// RootlessEUID is set when the runc was launched with non-zero EUID.
Expand Down
4 changes: 4 additions & 0 deletions libcontainer/configs/intelrdt.go
Original file line number Diff line number Diff line change
Expand Up @@ -4,4 +4,8 @@ type IntelRdt struct {
// The schema for L3 cache id and capacity bitmask (CBM)
// Format: "L3:<cache_id0>=<cbm0>;<cache_id1>=<cbm1>;..."
L3CacheSchema string `json:"l3_cache_schema,omitempty"`

// The schema of memory bandwidth percentage per L3 cache id
// Format: "MB:<cache_id0>=bandwidth0;<cache_id1>=bandwidth1;..."
MemBwSchema string `json:"memBwSchema,omitempty"`
}
19 changes: 15 additions & 4 deletions libcontainer/configs/validate/validator.go
Original file line number Diff line number Diff line change
Expand Up @@ -169,11 +169,22 @@ func (v *ConfigValidator) sysctl(config *configs.Config) error {

func (v *ConfigValidator) intelrdt(config *configs.Config) error {
if config.IntelRdt != nil {
if !intelrdt.IsEnabled() {
return fmt.Errorf("intelRdt is specified in config, but Intel RDT feature is not supported or enabled")
if !intelrdt.IsCatEnabled() && !intelrdt.IsMbaEnabled() {
return fmt.Errorf("intelRdt is specified in config, but Intel RDT is not supported or enabled")
}
if config.IntelRdt.L3CacheSchema == "" {
return fmt.Errorf("intelRdt is specified in config, but intelRdt.l3CacheSchema is empty")

if !intelrdt.IsCatEnabled() && config.IntelRdt.L3CacheSchema != "" {
return fmt.Errorf("intelRdt.l3CacheSchema is specified in config, but Intel RDT/CAT is not enabled")
}
if !intelrdt.IsMbaEnabled() && config.IntelRdt.MemBwSchema != "" {
return fmt.Errorf("intelRdt.memBwSchema is specified in config, but Intel RDT/MBA is not enabled")
}

if intelrdt.IsCatEnabled() && config.IntelRdt.L3CacheSchema == "" {
return fmt.Errorf("Intel RDT/CAT is enabled and intelRdt is specified in config, but intelRdt.l3CacheSchema is empty")
}
if intelrdt.IsMbaEnabled() && config.IntelRdt.MemBwSchema == "" {
return fmt.Errorf("Intel RDT/MBA is enabled and intelRdt is specified in config, but intelRdt.memBwSchema is empty")
}
}

Expand Down
14 changes: 12 additions & 2 deletions libcontainer/container_linux_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -148,6 +148,7 @@ func TestGetContainerStats(t *testing.T) {
intelRdtManager: &mockIntelRdtManager{
stats: &intelrdt.Stats{
L3CacheSchema: "L3:0=f;1=f0",
MemBwSchema: "MB:0=20;1=70",
},
},
}
Expand All @@ -161,14 +162,22 @@ func TestGetContainerStats(t *testing.T) {
if stats.CgroupStats.MemoryStats.Usage.Usage != 1024 {
t.Fatalf("expected memory usage 1024 but received %d", stats.CgroupStats.MemoryStats.Usage.Usage)
}
if intelrdt.IsEnabled() {
if intelrdt.IsCatEnabled() {
if stats.IntelRdtStats == nil {
t.Fatal("intel rdt stats are nil")
}
if stats.IntelRdtStats.L3CacheSchema != "L3:0=f;1=f0" {
t.Fatalf("expected L3CacheSchema L3:0=f;1=f0 but recevied %s", stats.IntelRdtStats.L3CacheSchema)
}
}
if intelrdt.IsMbaEnabled() {
if stats.IntelRdtStats == nil {
t.Fatal("intel rdt stats are nil")
}
if stats.IntelRdtStats.MemBwSchema != "MB:0=20;1=70" {
t.Fatalf("expected MemBwSchema MB:0=20;1=70 but recevied %s", stats.IntelRdtStats.MemBwSchema)
}
}
}

func TestGetContainerState(t *testing.T) {
Expand Down Expand Up @@ -210,6 +219,7 @@ func TestGetContainerState(t *testing.T) {
intelRdtManager: &mockIntelRdtManager{
stats: &intelrdt.Stats{
L3CacheSchema: "L3:0=f0;1=f",
MemBwSchema: "MB:0=70;1=20",
},
path: expectedIntelRdtPath,
},
Expand All @@ -232,7 +242,7 @@ func TestGetContainerState(t *testing.T) {
if memPath := paths["memory"]; memPath != expectedMemoryPath {
t.Fatalf("expected memory path %q but received %q", expectedMemoryPath, memPath)
}
if intelrdt.IsEnabled() {
if intelrdt.IsCatEnabled() || intelrdt.IsMbaEnabled() {
intelRdtPath := state.IntelRdtPath
if intelRdtPath == "" {
t.Fatal("intel rdt path should not be empty")
Expand Down
6 changes: 3 additions & 3 deletions libcontainer/factory_linux.go
Original file line number Diff line number Diff line change
Expand Up @@ -92,7 +92,7 @@ func RootlessCgroupfs(l *LinuxFactory) error {

// IntelRdtfs is an options func to configure a LinuxFactory to return
// containers that use the Intel RDT "resource control" filesystem to
// create and manage Intel Xeon platform shared resources (e.g., L3 cache).
// create and manage Intel RDT resources (e.g., L3 cache, memory bandwidth).
func IntelRdtFs(l *LinuxFactory) error {
l.NewIntelRdtManager = func(config *configs.Config, id string, path string) intelrdt.Manager {
return &intelrdt.IntelRdtManager{
Expand Down Expand Up @@ -222,7 +222,7 @@ func (l *LinuxFactory) Create(id string, config *configs.Config) (Container, err
newgidmapPath: l.NewgidmapPath,
cgroupManager: l.NewCgroupsManager(config.Cgroups, nil),
}
if intelrdt.IsEnabled() {
if intelrdt.IsCatEnabled() || intelrdt.IsMbaEnabled() {
c.intelRdtManager = l.NewIntelRdtManager(config, id, "")
}
c.state = &stoppedState{c: c}
Expand Down Expand Up @@ -268,7 +268,7 @@ func (l *LinuxFactory) Load(id string) (Container, error) {
if err := c.refreshState(); err != nil {
return nil, err
}
if intelrdt.IsEnabled() {
if intelrdt.IsCatEnabled() || intelrdt.IsMbaEnabled() {
c.intelRdtManager = l.NewIntelRdtManager(&state.Config, id, state.IntelRdtPath)
}
return c, nil
Expand Down
Loading