
Using and Analyzing Devices in Docker and Kubernetes


langwu 吴英文

Last modified: 2019-09-16 14:14:02


This article is part of the column: Kubernetes

Using devices in Docker

By default, a Docker container cannot access devices on the host, such as /dev/mem.

Docker offers two ways to give a container access to devices: run it in privileged mode, or name the specific devices it may access with --device.

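Without privileged mode, root inside the container runs with a restricted set of Linux capabilities and sees only a small set of device nodes, so in practice it has far less power than root on the host. In privileged mode, root inside the container gets all capabilities and can access the host's devices, including /dev/mem, GPUs and so on.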

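Privileged mode also grants many permissions the container does not need, which is a significant security risk. For devices, the usual practice is therefore to use --device to list exactly the devices the container may use.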

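Note that even when a device is mapped in with --device, processes in the container often still lack the permission to operate on it. The missing capability has to be granted with --cap-add. For /dev/mem this is CAP_SYS_RAWIO, passed as --cap-add=SYS_RAWIO.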

Using devices in Kubernetes

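Support for --device in Kubernetes was discussed in the community for a long time; see #5607 if you are interested. The current solution is the device plugin mechanism, which registers the devices to be accessed. A typical example is the NVIDIA GPU plugin (https://github.com/NVIDIA/k8s-device-plugin). Likewise, for a pod to use /dev/mem, a device plugin must register /dev/mem with Kubernetes. Once registration succeeds, the device shows up as a resource on the corresponding node, and pods can then request it.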

The design of Kubernetes device plugins is described at https://github.com/kubernetes/community/blob/master/contributors/design-proposals/resource-management/device-plugin.md

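Since /dev contains many devices, writing a separate device plugin for each one is tedious. Fortunately, someone has open-sourced the k8s-hostdev-plugin project (https://github.com/honkiko/k8s-hostdev-plugin), which can expose devices under /dev to pods. The project currently has a limitation, which is discussed in the source analysis below.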

Download k8s-hostdev-plugin and edit containers.*.args in hostdev-plugin-ds.yaml to list the devices to expose (here, /dev/mem).

Run kubectl create -f hostdev-plugin-ds.yaml to create the DaemonSet.

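Once the DaemonSet's pods are running, run kubectl describe node to check whether /dev/mem has been registered with Kubernetes. If hostdev.k8s.io/dev_mem appears under both Capacity and Allocatable for the node, registration succeeded.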
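The resource name hostdev.k8s.io/dev_mem is derived from the device path. The sketch below reproduces one plausible derivation. The helper normalizeDevName is a hypothetical stand-in for the project's NomalizeDevName, and the prefix constant is inferred from the resource name shown in kubectl describe node. The socket directory is pluginapi.DevicePluginPath (/var/lib/kubelet/device-plugins/).

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

const (
	// Assumed prefix, inferred from the resource name hostdev.k8s.io/dev_mem.
	resourceNamePrefix = "hostdev.k8s.io/"
	// Directory where kubelet looks for device plugin sockets.
	devicePluginPath = "/var/lib/kubelet/device-plugins/"
)

// normalizeDevName is a hypothetical stand-in for the project's NomalizeDevName.
// It turns "/dev/mem" into "dev_mem" so the result is usable in a resource name.
func normalizeDevName(devName string) (string, error) {
	if !strings.HasPrefix(devName, "/dev/") {
		return "", errors.New("device must live under /dev/: " + devName)
	}
	return strings.ReplaceAll(strings.TrimPrefix(devName, "/"), "/", "_"), nil
}

func main() {
	name, err := normalizeDevName("/dev/mem")
	if err != nil {
		panic(err)
	}
	fmt.Println(resourceNamePrefix + name) // hostdev.k8s.io/dev_mem
	fmt.Println(devicePluginPath + name)   // /var/lib/kubelet/device-plugins/dev_mem
}
```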

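A workload pod requests /dev/mem through its resources, the same way it requests CPU. Note that extended resources only support integer quantities, and the limit and request declared in the container spec must be equal.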
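The two rules for extended resources can be expressed as a small check. This sketch only accepts plain decimal integers; real Kubernetes quantities also allow suffixed forms such as 1k. The function name validateExtendedResource is hypothetical. An empty request is treated as defaulting to the limit, which is how Kubernetes fills in a missing request when a limit is set.

```go
package main

import (
	"fmt"
	"strconv"
)

// validateExtendedResource checks a container's request and limit for an
// extended resource such as hostdev.k8s.io/dev_mem. Only plain decimal
// integers are accepted; an empty request is treated as defaulting to the limit.
func validateExtendedResource(request, limit string) error {
	l, err := strconv.ParseInt(limit, 10, 64)
	if err != nil || l < 0 {
		return fmt.Errorf("limit %q is not a non-negative integer", limit)
	}
	if request == "" {
		return nil
	}
	r, err := strconv.ParseInt(request, 10, 64)
	if err != nil {
		return fmt.Errorf("request %q is not an integer", request)
	}
	if r != l {
		return fmt.Errorf("request %d must equal limit %d", r, l)
	}
	return nil
}

func main() {
	cases := [][2]string{{"1", "1"}, {"", "2"}, {"1", "2"}, {"500m", "1"}}
	for _, c := range cases {
		fmt.Printf("request=%q limit=%q -> %v\n", c[0], c[1], validateExtendedResource(c[0], c[1]))
	}
}
```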

Analysis of the k8s-hostdev-plugin implementation

How does k8s-hostdev-plugin get /dev/mem into the container? First, inspect the pod's container with docker inspect CONTAINERID.

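Its device configuration is the same as that of a container started directly with docker run --device. So k8s-hostdev-plugin ultimately relies on Docker's --device mechanism to specify which devices the container may access.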
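To check this without reading the whole docker inspect dump, the device entries can be decoded from HostConfig.Devices. Each entry carries PathOnHost, PathInContainer and CgroupPermissions. In this sketch the JSON is a hand-written, trimmed sample, not output captured from the article's environment. In practice you would feed real docker inspect output to devicesFromInspect, which is a hypothetical helper.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// deviceMapping holds the fields docker inspect prints for each entry of
// HostConfig.Devices.
type deviceMapping struct {
	PathOnHost        string
	PathInContainer   string
	CgroupPermissions string
}

// inspectEntry keeps only the part of one docker inspect element we need.
type inspectEntry struct {
	HostConfig struct {
		Devices []deviceMapping
	}
}

// devicesFromInspect decodes docker inspect output (a JSON array) and returns
// the device mappings of the first container in it.
func devicesFromInspect(data []byte) ([]deviceMapping, error) {
	var entries []inspectEntry
	if err := json.Unmarshal(data, &entries); err != nil {
		return nil, err
	}
	if len(entries) == 0 {
		return nil, fmt.Errorf("no containers in inspect output")
	}
	return entries[0].HostConfig.Devices, nil
}

func main() {
	// Hand-written sample trimmed to the relevant fields; not captured output.
	sample := []byte(`[{"HostConfig":{"Devices":[{"PathOnHost":"/dev/mem","PathInContainer":"/dev/mem","CgroupPermissions":"rw"}]}}]`)
	devs, err := devicesFromInspect(sample)
	if err != nil {
		panic(err)
	}
	for _, d := range devs {
		fmt.Printf("%s -> %s (%s)\n", d.PathOnHost, d.PathInContainer, d.CgroupPermissions)
	}
}
```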

The Kubernetes device plugin API offers the following ways to configure a container, all carried in ContainerAllocateResponse:

type ContainerAllocateResponse struct {
	// List of environment variables to be set in the container to access one or more devices.
	Envs map[string]string `protobuf:"bytes,1,rep,name=envs" json:"envs,omitempty" protobuf_key:"bytes,1,opt,name=key,proto3" protobuf_val:"bytes,2,opt,name=value,proto3"`
	// Mounts for the container.
	Mounts []*Mount `protobuf:"bytes,2,rep,name=mounts" json:"mounts,omitempty"`
	// Devices for the container.
	Devices []*DeviceSpec `protobuf:"bytes,3,rep,name=devices" json:"devices,omitempty"`
	// Container annotations to pass to the container runtime
	Annotations map[string]string `protobuf:"bytes,4,rep,name=annotations" json:"annotations,omitempty" protobuf_key:"bytes,1,opt,name=key,proto3" protobuf_val:"bytes,2,opt,name=value,proto3"`
}

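Envs sets environment variables in the container. The NVIDIA GPU device plugin, for example, uses it (via NVIDIA_VISIBLE_DEVICES) to tell the container which GPUs it may use. Mounts adds mounts to the container, and Annotations are passed through to the container runtime. Devices corresponds to the container's --device option, and this is the field k8s-hostdev-plugin uses to specify the devices the container may use.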
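Here is a sketch that renders a DeviceSpec as the docker run flag it is equivalent to, which makes the correspondence between Devices and --device concrete. deviceSpec is a local copy of the three fields of pluginapi.DeviceSpec. The permission check (non-empty, only r/w/m, no repeats) is this sketch's own rule, not validation performed by kubelet.

```go
package main

import (
	"fmt"
	"strings"
)

// deviceSpec is a local copy of the three fields of pluginapi.DeviceSpec so
// that the sketch compiles without the Kubernetes modules.
type deviceSpec struct {
	ContainerPath string
	HostPath      string
	Permissions   string // cgroup permissions: any of r (read), w (write), m (mknod)
}

// validPermissions reports whether p is a non-empty combination of r, w and m,
// each used at most once.
func validPermissions(p string) bool {
	if p == "" {
		return false
	}
	for i, c := range p {
		if !strings.ContainsRune("rwm", c) || strings.ContainsRune(p[:i], c) {
			return false
		}
	}
	return true
}

// toDockerDeviceFlag renders a spec as the equivalent docker run flag,
// --device=HOST:CONTAINER:PERMISSIONS.
func toDockerDeviceFlag(s deviceSpec) (string, error) {
	if !validPermissions(s.Permissions) {
		return "", fmt.Errorf("invalid permissions %q", s.Permissions)
	}
	return fmt.Sprintf("--device=%s:%s:%s", s.HostPath, s.ContainerPath, s.Permissions), nil
}

func main() {
	flag, err := toDockerDeviceFlag(deviceSpec{HostPath: "/dev/mem", ContainerPath: "/dev/mem", Permissions: "rw"})
	if err != nil {
		panic(err)
	}
	fmt.Println(flag) // --device=/dev/mem:/dev/mem:rw
}
```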

Now let's look at the k8s-hostdev-plugin code.

// NewHostDevicePlugin returns an initialized HostDevicePlugin
func NewHostDevicePlugin(devCfg *DevConfig) (*HostDevicePlugin, error) {
	normalizedName, err := NomalizeDevName(devCfg.DevName)
	if err != nil {
		return nil, err
	}

	// Devices to register with Kubernetes: a single device whose ID is the device path
	devs := []*pluginapi.Device{
		{ID: devCfg.DevName, Health: pluginapi.Healthy},
	}

	return &HostDevicePlugin{
		DevName:        devCfg.DevName,
		Permissions:    devCfg.Permissions,
		NormalizedName: normalizedName,
		ResourceName:   ResourceNamePrefix + normalizedName,
		UnixSockPath:   pluginapi.DevicePluginPath + normalizedName,
		Dev:            devs,
		StopChan:       make(chan interface{}),
		IsRigistered:   false,
	}, nil
}

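The pluginapi.Device above represents one device and has two fields: the device ID and its health status. Extended resources only support integer quantities, and only one Device is created here. As a result, at most one pod (more precisely, one container) on each node can be allocated this resource. To run several pods that use the resource, create several pluginapi.Device entries, making sure each has a distinct ID. The project does not support this yet, so users have to modify it themselves.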
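A sketch of the suggested extension follows: it advertises the same host device several times under distinct IDs. The device type mirrors pluginapi.Device, and "Healthy" is the value of pluginapi.Healthy. The ID scheme <path>-<n> and the function newSharedDevices are hypothetical.

```go
package main

import "fmt"

// device mirrors pluginapi.Device (ID and Health) so the sketch is self-contained.
type device struct {
	ID     string
	Health string
}

const healthy = "Healthy" // value of pluginapi.Healthy

// newSharedDevices advertises the same host device `replicas` times under
// distinct IDs ("/dev/mem-0", "/dev/mem-1", ...), so up to `replicas`
// containers on the node can each be allocated one unit of the resource.
func newSharedDevices(devName string, replicas int) []*device {
	devs := make([]*device, 0, replicas)
	for i := 0; i < replicas; i++ {
		devs = append(devs, &device{ID: fmt.Sprintf("%s-%d", devName, i), Health: healthy})
	}
	return devs
}

func main() {
	for _, d := range newSharedDevices("/dev/mem", 3) {
		fmt.Println(d.ID, d.Health)
	}
}
```

With three IDs advertised, the node's Allocatable for the resource would show 3 instead of 1.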

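After k8s-hostdev-plugin registers its device resource with kubelet, kubelet calls ListAndWatch() to obtain the list of devices. ListAndWatch() is a streaming call: after sending the initial list to kubelet, it keeps the stream open and periodically re-reports device status. The implementation is as follows: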

// ListAndWatch sends the device list to kubelet, then re-sends it every 10 seconds
// until the plugin is stopped or the stream breaks.
func (plugin *HostDevicePlugin) ListAndWatch(e *pluginapi.Empty, s pluginapi.DevicePlugin_ListAndWatchServer) error {
	if err := s.Send(&pluginapi.ListAndWatchResponse{Devices: plugin.Dev}); err != nil {
		return err
	}

	ticker := time.NewTicker(10 * time.Second)
	defer ticker.Stop()

	for {
		select {
		case <-plugin.StopChan:
			return nil
		case <-ticker.C:
			// plugin.Dev is re-sent as-is; this excerpt does not re-check health.
			if err := s.Send(&pluginapi.ListAndWatchResponse{Devices: plugin.Dev}); err != nil {
				return err
			}
		}
	}
}

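When a pod lists this resource in resources.limits, kubelet calls Allocate() to request it. Allocate() returns the corresponding configuration for the requested device IDs. The goal here is to expose a device under /dev to the container, so the plugin fills in ContainerAllocateResponse.Devices. A pluginapi.DeviceSpec specifies the device path on the host, the path inside the container, and the cgroup permissions (any of r, w, m). The implementation is as follows: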

// Allocate returns, for each container in the request, the device to expose.
func (plugin *HostDevicePlugin) Allocate(ctx context.Context, r *pluginapi.AllocateRequest) (*pluginapi.AllocateResponse, error) {
	spew.Printf("AllocateRequest: %#v\n", *r)

	// Host path, container path and cgroup permissions of the device
	devSpec := pluginapi.DeviceSpec{
		HostPath:      plugin.DevName,
		ContainerPath: plugin.DevName,
		Permissions:   plugin.Permissions,
	}

	// Build one ContainerAllocateResponse per ContainerAllocateRequest for kubelet
	response := pluginapi.AllocateResponse{}
	for range r.ContainerRequests {
		response.ContainerResponses = append(response.ContainerResponses, &pluginapi.ContainerAllocateResponse{
			Envs:        make(map[string]string),
			Annotations: make(map[string]string),
			Devices:     []*pluginapi.DeviceSpec{&devSpec},
		})
	}

	spew.Printf("AllocateResponse: %#v\n", response.ContainerResponses)

	return &response, nil
}
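Suppose the plugin were extended to advertise several IDs for one host device, as suggested in the discussion of pluginapi.Device above. Allocate would then have to map each requested ID back to the single host path. This sketch assumes the hypothetical <path>-<n> ID scheme and keeps only the Devices field of the response. allocate, deviceSpec and containerResponse are illustrative names, not part of the device plugin API.

```go
package main

import (
	"fmt"
	"strings"
)

// deviceSpec mirrors pluginapi.DeviceSpec.
type deviceSpec struct{ ContainerPath, HostPath, Permissions string }

// containerResponse keeps only the Devices field of
// pluginapi.ContainerAllocateResponse.
type containerResponse struct{ Devices []deviceSpec }

// allocate returns one response per container request. Every requested ID must
// follow the hypothetical "<devName>-<n>" scheme; all of them map back to the
// same host device, which is exposed once per container.
func allocate(devName, perms string, containerRequests [][]string) ([]containerResponse, error) {
	resps := make([]containerResponse, 0, len(containerRequests))
	for _, ids := range containerRequests {
		for _, id := range ids {
			if !strings.HasPrefix(id, devName+"-") {
				return nil, fmt.Errorf("unknown device ID %q", id)
			}
		}
		spec := deviceSpec{ContainerPath: devName, HostPath: devName, Permissions: perms}
		resps = append(resps, containerResponse{Devices: []deviceSpec{spec}})
	}
	return resps, nil
}

func main() {
	resps, err := allocate("/dev/mem", "rw", [][]string{{"/dev/mem-0"}, {"/dev/mem-1"}})
	if err != nil {
		panic(err)
	}
	for i, r := range resps {
		fmt.Println(i, r.Devices[0].HostPath, "->", r.Devices[0].ContainerPath)
	}
}
```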