A vDPA device is a device whose data path follows the virtio specification but whose control path is vendor-specific.
vDPA devices can be physically located on hardware or emulated by software.
Only a small vDPA parent driver is needed in the host kernel to handle the control path. The main advantage is having a unified software stack for all vDPA devices.
Over the years, many blog posts and talks have been published that can help you better understand vDPA and its use cases; we have collected some of them on vdpa-dev.gitlab.io, and I recommend exploring them.
Most of the work in vDPA has been driven by network devices, but in recent years we have also developed support for block devices.
The main use case is certainly leveraging hardware to directly emulate the virtio-blk device while supporting different network backends such as Ceph RBD or iSCSI. This is the goal of some SmartNICs or DPUs, which are of course able to emulate virtio-net devices, but can also emulate virtio-blk devices for network storage.
The abstraction provided by vDPA also makes software accelerators possible, similar to the existing vhost or vhost-user devices. We discussed this at KVM Forum 2021.
In that talk we covered the fast path and the slow path. The slow path is used when QEMU needs to handle requests, for example to support live migration or to perform I/O throttling. During the slow path, the device exposed to the guest is emulated in QEMU, which intercepts the requests and forwards them to the vDPA device using the driver implemented in libblkio. The fast path, on the other hand, comes into play when QEMU does not need to intervene; in that case the vDPA device can be exposed directly to the guest, bypassing QEMU's emulation.
libblkio exposes a common API for accessing block devices in user space and supports several drivers. We will focus on virtio-blk-vhost-vdpa, the driver used by the virtio-blk-vhost-vdpa block device in QEMU. For now it only supports the slow path, but in the future it should be able to switch to the fast path automatically. Since QEMU 7.2, QEMU supports libblkio drivers, so you can use the following options to attach a vDPA block device to a virtual machine:
-blockdev node-name=drive_src1,driver=virtio-blk-vhost-vdpa,path=/dev/vhost-vdpa-0,cache.direct=on \
-device virtio-blk-pci,id=src1,bootindex=2,drive=drive_src1 \
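The same virtio-blk-vhost-vdpa driver can also be used directly from a C program through libblkio. The following is only a rough sketch, written from memory of the libblkio documentation (blkio_create(), the "path" property, blkio_get_uint64(), etc. are assumptions to be checked against the installed <blkio.h>); it connects to the device and prints its capacity:

```c
/* Hedged sketch: open a vhost-vdpa block device through libblkio and print
 * its capacity. Function and property names are written from memory of the
 * libblkio docs; verify against the installed <blkio.h>.
 * Build guess: gcc probe.c $(pkg-config --cflags --libs blkio)            */
#include <blkio.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    struct blkio *b;
    uint64_t capacity;

    if (blkio_create("virtio-blk-vhost-vdpa", &b) < 0) {
        fprintf(stderr, "blkio_create: %s\n", blkio_get_error_msg());
        return 1;
    }

    /* same character device passed to QEMU with path=/dev/vhost-vdpa-0 */
    blkio_set_str(b, "path", "/dev/vhost-vdpa-0");

    if (blkio_connect(b) < 0 || blkio_start(b) < 0) {
        fprintf(stderr, "connect/start: %s\n", blkio_get_error_msg());
        blkio_destroy(&b);
        return 1;
    }

    if (blkio_get_uint64(b, "capacity", &capacity) == 0)
        printf("capacity: %" PRIu64 " bytes\n", capacity);

    blkio_destroy(&b);
    return 0;
}
```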
In any case, to take full advantage of the performance of a vDPA hardware device, we can always use the generic vhost-vdpa-device-pci device offered by QEMU. It supports any vDPA device and exposes it directly to the guest. Of course, QEMU cannot intercept requests in this scenario, so some of the features offered by its block layer (e.g. live migration, disk format support, etc.) are not available. Since QEMU 8.0, you can attach a generic vDPA device to a VM with the following option:
-device vhost-vdpa-device-pci,vhostdev=/dev/vhost-vdpa-0
At KVM Forum 2022, Alberto Faria and Stefan Hajnoczi presented libblkio, while Kevin Wolf and I discussed its usage in the QEMU Storage Daemon (QSD).
One of the great advantages of vDPA is its strong abstraction, which supports implementing virtio devices in hardware and software, whether in the kernel or in user space. This unification under a single framework, where the devices look identical to QEMU, facilitates the seamless integration of hardware and software components.
Regarding in-kernel devices, since Linux v5.13 there has been a simple simulator intended for development and debugging. It is available through the vdpa-sim-blk kernel module, which emulates a 128 MB ramdisk. As highlighted in the KVM Forum 2021 talk, a future in-kernel device (similar to the repeatedly proposed but never merged vhost-blk) could provide excellent performance. Such a device could be used as an alternative when hardware is not available, for example to facilitate live migration on any system, regardless of whether the destination has a SmartNIC/DPU.
Regarding user space, instead, we can use VDUSE. The QSD supports it, allowing us to export any disk image supported by QEMU this way, for example as the vDPA device vduse0:
qemu-storage-daemon \
--blockdev file,filename=/path/to/disk.qcow2,node-name=file \
--blockdev qcow2,file=file,node-name=qcow2 \
--export type=vduse-blk,id=vduse0,name=vduse0,node-name=qcow2,writable=on
As mentioned in the introduction, vDPA supports different buses, such as vhost-vdpa and virtio-vdpa. This flexibility allows vDPA devices to be used with virtual machines or with user-space drivers (e.g. libblkio) through the vhost-vdpa bus driver, and with applications running directly on the host or inside containers through the virtio-vdpa bus driver.
The vdpa tool included in iproute2 makes it easy to manage vDPA devices via netlink, for example to allocate and release them.
Since Linux 5.17, vDPA drivers support driver_override. This enhancement allows dynamic reconfiguration at runtime, making it possible to migrate a device from one bus to another in the following way:
# load vdpa buses
$ modprobe -a virtio-vdpa vhost-vdpa
# load vdpa-blk in-kernel simulator
$ modprobe vdpa-sim-blk
# instantiate a new vdpasim_blk device called `vdpa0`
$ vdpa dev add mgmtdev vdpasim_blk name vdpa0
# `vdpa0` is attached to the first vDPA bus driver loaded
$ driverctl -b vdpa list-devices
vdpa0 virtio_vdpa
# change the `vdpa0` bus to `vhost-vdpa`
$ driverctl -b vdpa set-override vdpa0 vhost_vdpa
# `vdpa0` is now attached to the `vhost-vdpa` bus
$ driverctl -b vdpa list-devices
vdpa0 vhost_vdpa [*]
# Note: driverctl(8) integrates with udev so the binding is preserved.
Below are a few examples of how to use VDUSE and the QEMU Storage Daemon with a virtual machine (QEMU) or a container (podman). These steps can easily be adapted to any hardware that supports virtio-blk devices via vDPA.
# load vdpa buses
$ modprobe -a virtio-vdpa vhost-vdpa
# create an empty qcow2 image
$ qemu-img create -f qcow2 test.qcow2 10G
# load vduse kernel module
$ modprobe vduse
# launch QSD exposing the `test.qcow2` image as `vduse0` vDPA device
$ qemu-storage-daemon --blockdev file,filename=test.qcow2,node-name=file \
--blockdev qcow2,file=file,node-name=qcow2 \
--export vduse-blk,id=vduse0,name=vduse0,num-queues=1,node-name=qcow2,writable=on &
# instantiate the `vduse0` device (same name used in QSD)
$ vdpa dev add name vduse0 mgmtdev vduse
# be sure to attach it to the `virtio-vdpa` device to use with host applications
$ driverctl -b vdpa set-override vduse0 virtio_vdpa
# device exposed as a virtio device, but attached to the host kernel
$ lsblk -pv
NAME TYPE TRAN SIZE RQ-SIZE MQ
/dev/vda disk virtio 10G 256 1
# start a container with `/dev/vda` attached
$ podman run -it --rm --device /dev/vda --group-add keep-groups fedora:39 bash
# download Fedora cloud image (or use any other bootable image you want)
$ wget https://download.fedoraproject.org/pub/fedora/linux/releases/39/Cloud/x86_64/images/Fedora-Cloud-Base-39-1.5.x86_64.qcow2
# launch QSD exposing the VM image as `vduse1` vDPA device
$ qemu-storage-daemon \
--blockdev file,filename=Fedora-Cloud-Base-39-1.5.x86_64.qcow2,node-name=file \
--blockdev qcow2,file=file,node-name=qcow2 \
--export vduse-blk,id=vduse1,name=vduse1,num-queues=1,node-name=qcow2,writable=on &
# instantiate the `vduse1` device (same name used in QSD)
$ vdpa dev add name vduse1 mgmtdev vduse
# initially it's attached to the host (`/dev/vdb`), because `virtio-vdpa`
# is the first kernel module we loaded
$ lsblk -pv
NAME TYPE TRAN SIZE RQ-SIZE MQ
/dev/vda disk virtio 10G 256 1
/dev/vdb disk virtio 5G 256 1
$ lsblk /dev/vdb
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINTS
vdb 251:16 0 5G 0 disk
├─vdb1 251:17 0 1M 0 part
├─vdb2 251:18 0 1000M 0 part
├─vdb3 251:19 0 100M 0 part
├─vdb4 251:20 0 4M 0 part
└─vdb5 251:21 0 3.9G 0 part
# and it is identified as `virtio1` in the host
$ ls /sys/bus/vdpa/devices/vduse1/
driver driver_override power subsystem uevent virtio1
# attach it to the `vhost-vdpa` device to use the device with VMs
$ driverctl -b vdpa set-override vduse1 vhost_vdpa
# `/dev/vdb` is not available anymore
$ lsblk -pv
NAME TYPE TRAN SIZE RQ-SIZE MQ
/dev/vda disk virtio 10G 256 1
# the device is identified as `vhost-vdpa-1` in the host
$ ls /sys/bus/vdpa/devices/vduse1/
driver driver_override power subsystem uevent vhost-vdpa-1
$ ls -l /dev/vhost-vdpa-1
crw-------. 1 root root 511, 0 Feb 12 17:58 /dev/vhost-vdpa-1
# launch QEMU using `/dev/vhost-vdpa-1` device with the
# `virtio-blk-vhost-vdpa` libblkio driver
$ qemu-system-x86_64 -m 512M -smp 2 -M q35,accel=kvm,memory-backend=mem \
-object memory-backend-memfd,share=on,id=mem,size="512M" \
-blockdev node-name=drive0,driver=virtio-blk-vhost-vdpa,path=/dev/vhost-vdpa-1,cache.direct=on \
-device virtio-blk-pci,drive=drive0
# `virtio-blk-vhost-vdpa` blockdev can be used with any QEMU block layer
# features (e.g. live migration, I/O throttling).
# In this example we are using I/O throttling:
$ qemu-system-x86_64 -m 512M -smp 2 -M q35,accel=kvm,memory-backend=mem \
-object memory-backend-memfd,share=on,id=mem,size="512M" \
-blockdev node-name=drive0,driver=virtio-blk-vhost-vdpa,path=/dev/vhost-vdpa-1,cache.direct=on \
-blockdev node-name=throttle0,driver=throttle,file=drive0,throttle-group=limits0 \
-object throttle-group,id=limits0,x-iops-total=2000 \
-device virtio-blk-pci,drive=throttle0
# Alternatively, we can use the generic `vhost-vdpa-device-pci` to take
# advantage of all the performance, but without having any QEMU block layer
# features available
$ qemu-system-x86_64 -m 512M -smp 2 -M q35,accel=kvm,memory-backend=mem \
-object memory-backend-memfd,share=on,id=mem,size="512M" \
-device vhost-vdpa-device-pci,vhostdev=/dev/vhost-vdpa-0
| QEMU device type | Fast path | QEMU intercepts requests |
|---|---|---|
| vhost-vdpa-device-pci (generic) | Supported | No |
| virtio-blk-vhost-vdpa (via libblkio) | Not yet supported | Yes (live migration, disk formats, I/O throttling, etc.) |
Source walkthrough of the in-kernel simulator, drivers/vdpa/vdpa_sim/vdpa_sim_blk.c, loaded with `modprobe vdpa-sim-blk` (or `insmod vdpa_sim_blk.ko`):
module_init(vdpasim_blk_init) -> sim
module_param(shared_backend, bool, 0444)
device_register(&vdpasim_blk_mgmtdev)
vdpa_mgmtdev_register(&mgmt_dev)
if (shared_backend)
shared_buffer = kvzalloc(VDPASIM_BLK_CAPACITY << SECTOR_SHIFT, GFP_KERNEL)
static struct vdpa_mgmt_dev mgmt_dev = {
.device = &vdpasim_blk_mgmtdev,
.id_table = id_table,
.ops = &vdpasim_blk_mgmtdev_ops,
};
static const struct vdpa_mgmtdev_ops vdpasim_blk_mgmtdev_ops = {
.dev_add = vdpasim_blk_dev_add,
.dev_del = vdpasim_blk_dev_del
};
# Add `vdpa-blk1` device through `vdpasim_blk` management device
$ vdpa dev add name vdpa-blk1 mgmtdev vdpasim_blk
static int vdpasim_blk_dev_add(struct vdpa_mgmt_dev *mdev, const char *name, const struct vdpa_dev_set_config *config)
dev_attr.id = VIRTIO_ID_BLOCK
dev_attr.supported_features = VDPASIM_BLK_FEATURES
dev_attr.nvqs = VDPASIM_BLK_VQ_NUM -> 1
dev_attr.get_config = vdpasim_blk_get_config
blk_config->capacity = cpu_to_vdpasim64(vdpasim, VDPASIM_BLK_CAPACITY) -> 0x40000 = 262144 sectors * 512 bytes = 134217728 bytes = 128 MB (lsblk shows vda 252:0 0 128M 0 disk, TRAN virtio, backed by vdpa)
blk_config->size_max = cpu_to_vdpasim32(vdpasim, VDPASIM_BLK_SIZE_MAX) -> 4096
blk_config->seg_max = cpu_to_vdpasim32(vdpasim, VDPASIM_BLK_SEG_MAX) -> 32
blk_config->blk_size = cpu_to_vdpasim32(vdpasim, SECTOR_SIZE) -> 1<<9 = 512
...
dev_attr.work_fn = vdpasim_blk_work -> I/O handling function, run from the simulator's kthread worker
for (i = 0; i < VDPASIM_BLK_VQ_NUM; i++)
while (vdpasim_blk_handle_req(vdpasim, vq))
vdpasim_blk_handle_req
struct virtio_blk_outhdr hdr
vringh_getdesc_iotlb(&vq->vring, &vq->out_iov, &vq->in_iov, &vq->head, GFP_ATOMIC) -> when riov and wiov are no longer needed, clean them up with vringh_kiov_cleanup() to free the memory
__vringh_iov(vrh, *head, riov, wiov, no_range_check, NULL,gfp, copydesc_iotlb)
to_push = vringh_kiov_length(&vq->in_iov) - 1
to_pull = vringh_kiov_length(&vq->out_iov)
bytes = vringh_iov_pull_iotlb(&vq->vring, &vq->out_iov, &hdr, sizeof(hdr)) -> copy bytes from the vring iov into hdr (fetch the request header from the guest)
vringh_iov_xfer(vrh, riov, dst, len, xfer_from_iotlb)
err = xfer(vrh, iov->iov[iov->i].iov_base, ptr, partlen) -> xfer_from_iotlb
copy_from_iotlb(vrh, dst, src, len)
type = vdpasim32_to_cpu(vdpasim, hdr.type);
sector = vdpasim64_to_cpu(vdpasim, hdr.sector);
offset = sector << SECTOR_SHIFT;
status = VIRTIO_BLK_S_OK;
switch (type)
case VIRTIO_BLK_T_IN -> GUEST READ
vringh_iov_push_iotlb(&vq->vring, &vq->in_iov, blk->buffer + offset, to_push) -> copy bytes into vring_iov -> vringh_iov_xfer(vrh, wiov, (void *)src, len, xfer_to_iotlb) -> read from blk->buffer + offset
err = xfer(vrh, iov->iov[iov->i].iov_base, ptr, partlen)
copy_to_iotlb(vrh, dst, src, len)
ret = iotlb_translate(vrh, (u64)(uintptr_t)dst
copy_to_iter
case VIRTIO_BLK_T_OUT: -> GUEST WRITE
bytes = vringh_iov_pull_iotlb(&vq->vring, &vq->out_iov, blk->buffer + offset, to_pull) -> copy bytes from the vring iov into the backing buffer (write the I/O data into blk->buffer at the given offset)
case VIRTIO_BLK_T_GET_ID
bytes = vringh_iov_push_iotlb(&vq->vring, &vq->in_iov, vdpasim_blk_id, VIRTIO_BLK_ID_BYTES)
case VIRTIO_BLK_T_FLUSH:
break
case VIRTIO_BLK_T_DISCARD:
case VIRTIO_BLK_T_WRITE_ZEROES:
struct virtio_blk_discard_write_zeroes range
bytes = vringh_iov_pull_iotlb(&vq->vring, &vq->out_iov, &range, to_pull) -> vring -> range
vdpasim_blk_check_range(vdpasim, sector, num_sectors, VDPASIM_BLK_DWZ_MAX_SECTORS)
memset(blk->buffer + offset, 0, num_sectors << SECTOR_SHIFT) -> zero the requested range in blk->buffer
smp_wmb()
local_bh_disable
if (vringh_need_notify_iotlb(&vq->vring) > 0)
vringh_notify(&vq->vring)
vrh->notify(vrh)
local_bh_enable()
if (reschedule)
vdpasim_schedule_work
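To tie the trace together, the block below re-expresses the request handling of vdpasim_blk_handle_req as plain user-space C: memcpy over a flat buffer stands in for the vringh_iov_pull_iotlb()/vringh_iov_push_iotlb() IOTLB transfers, and the endianness helpers (vdpasim32_to_cpu/vdpasim64_to_cpu) and the discard/write-zeroes range parsing are omitted. It is a sketch of the logic, not the kernel code:

```c
/* Condensed sketch of the vdpasim_blk_handle_req() logic traced above.
 * A flat `ramdisk` array plays the role of blk->buffer (0x40000 sectors
 * of 512 bytes = 128 MB); memcpy replaces the vringh IOTLB transfers. */
#include <linux/virtio_blk.h>   /* virtio_blk_outhdr, VIRTIO_BLK_T_*, VIRTIO_BLK_S_* */
#include <stdint.h>
#include <string.h>

#define BLK_CAPACITY_SECTORS 0x40000ULL   /* VDPASIM_BLK_CAPACITY */
#define SECTOR_SHIFT 9

static uint8_t ramdisk[BLK_CAPACITY_SECTORS << SECTOR_SHIFT];

/* `hdr` is what the simulator pulls from the out iov, `data`/`len` the data
 * payload; the return value is the status byte pushed back to the in iov. */
static uint8_t handle_req(const struct virtio_blk_outhdr *hdr,
                          uint8_t *data, size_t len)
{
    uint64_t offset = (uint64_t)hdr->sector << SECTOR_SHIFT;

    if (offset + len > sizeof(ramdisk))
        return VIRTIO_BLK_S_IOERR;     /* vdpasim_blk_check_range() in the kernel */

    switch (hdr->type) {               /* kernel converts with vdpasim32_to_cpu() */
    case VIRTIO_BLK_T_IN:              /* guest read: push from the backing buffer */
        memcpy(data, ramdisk + offset, len);
        return VIRTIO_BLK_S_OK;
    case VIRTIO_BLK_T_OUT:             /* guest write: pull into the backing buffer */
        memcpy(ramdisk + offset, data, len);
        return VIRTIO_BLK_S_OK;
    case VIRTIO_BLK_T_GET_ID:          /* push the device ID string */
        strncpy((char *)data, "vdpa_blk_sim", len);
        return VIRTIO_BLK_S_OK;
    case VIRTIO_BLK_T_FLUSH:           /* ramdisk: nothing to flush */
        return VIRTIO_BLK_S_OK;
    default:                           /* discard/write-zeroes omitted here */
        return VIRTIO_BLK_S_UNSUPP;
    }
}

int main(void)
{
    struct virtio_blk_outhdr hdr = { .type = VIRTIO_BLK_T_OUT, .sector = 1 };
    uint8_t sector[512] = { 0xab };

    handle_req(&hdr, sector, sizeof(sector));   /* write sector 1 */
    hdr.type = VIRTIO_BLK_T_IN;                 /* read it back */
    return handle_req(&hdr, sector, sizeof(sector)) == VIRTIO_BLK_S_OK ? 0 : 1;
}
```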
simdev = vdpasim_create(&dev_attr, config)
ops = &vdpasim_batch_config_ops
or ops = &vdpasim_config_ops
vdpa = __vdpa_alloc_device(NULL, ops, dev_attr->ngroups, dev_attr->nas, dev_attr->alloc_size, dev_attr->name, use_va) -> allocate and initialize a vDPA device
vdev->dev.bus = &vdpa_bus
vdev->config = config -> ops
device_initialize(&vdev->dev)
kthread_init_work(&vdpasim->work, vdpasim_work_fn)
vdpasim->worker = kthread_create_worker(0, "vDPA sim worker: %s", dev_attr->name)
if (dma_set_mask_and_coherent(dev, DMA_BIT_MASK(64)))
vdpasim->vqs = kcalloc(dev_attr->nvqs
vdpasim->iommu = kmalloc_array(vdpasim->dev_attr.nas, sizeof(*vdpasim->iommu), GFP_KERNEL)
vdpasim->iommu_pt = kmalloc_array(vdpasim->dev_attr.nas, sizeof(*vdpasim->iommu_pt), GFP_KERNEL)
for (i = 0; i < vdpasim->dev_attr.nas; i++)
vhost_iotlb_init(&vdpasim->iommu[i], max_iotlb_entries, 0) -> 2048 -> initialize a vhost IOTLB
vhost_iotlb_add_range vdpasim->iommu_pt[i] = true
for (i = 0; i < dev_attr->nvqs; i++)
vringh_set_iotlb(&vdpasim->vqs[i].vring, &vdpasim->iommu[0], &vdpasim->iommu_lock) -> initialize a vringh for a ring with IOTLB, associated vring and iotlb
blk = sim_to_blk(simdev)
blk->shared_backend = shared_backend
blk->buffer = kvzalloc
_vdpa_register_device(&simdev->vdpa, VDPASIM_BLK_VQ_NUM)
static const struct vdpa_config_ops vdpasim_config_ops = {
.set_vq_address = vdpasim_set_vq_address,
vq->desc_addr = desc_area;
vq->driver_addr = driver_area
vq->device_addr = device_area
.set_vq_num = vdpasim_set_vq_num, -> vq->num = num
.kick_vq = vdpasim_kick_vq, -> vdpasim_schedule_work(vdpasim) -> vdpasim_work_fn -> key path: when the guest driver has prepared I/O buffers and kicks the VQ, this schedules the I/O handling work
vdpasim->dev_attr.work_fn(vdpasim) -> vdpasim_blk_work -> handle IO
.set_vq_cb = vdpasim_set_vq_cb,
vq->cb = cb->callback
vq->private = cb->private
.set_vq_ready = vdpasim_set_vq_ready,
vdpasim_queue_ready(vdpasim, idx)
vq->vring.last_avail_idx = last_avail_idx;
vq->vring.last_used_idx = last_avail_idx
vq->vring.notify = vdpasim_vq_notify
vq->cb(vq->private)
.get_vq_ready = vdpasim_get_vq_ready,
.set_vq_state = vdpasim_set_vq_state,
vrh->last_avail_idx = state->split.avail_index
.get_vendor_vq_stats = vdpasim_get_vq_stats,
.get_vq_state = vdpasim_get_vq_state,
.get_vq_align = vdpasim_get_vq_align, -> #define VDPASIM_QUEUE_ALIGN PAGE_SIZE -> 4K
.get_vq_group = vdpasim_get_vq_group,
.get_device_features = vdpasim_get_device_features,
vdpasim->dev_attr.supported_features
.get_backend_features = vdpasim_get_backend_features,
BIT_ULL(VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK)
.set_driver_features = vdpasim_set_driver_features,
.get_driver_features = vdpasim_get_driver_features,
.set_config_cb = vdpasim_set_config_cb,
.get_vq_num_max = vdpasim_get_vq_num_max,
.get_device_id = vdpasim_get_device_id,
.get_vendor_id = vdpasim_get_vendor_id,
.get_status = vdpasim_get_status,
.set_status = vdpasim_set_status,
vdpasim->status = status
.reset= vdpasim_reset,
.compat_reset= vdpasim_compat_reset,
vdpasim_do_reset(vdpasim, flags)
for (i = 0; i < vdpasim->dev_attr.nvqs; i++)
vdpasim_vq_reset(vdpasim, &vdpasim->vqs[i])
vq->ready = false;
vq->desc_addr = 0;
vq->driver_addr = 0;
vq->device_addr = 0;
vq->cb = NULL;
vq->private = NULL;
vringh_init_iotlb(&vq->vring, vdpasim->dev_attr.supported_features, VDPASIM_QUEUE_MAX, false, NULL, NULL, NULL);
vq->vring.notify = NULL;
vringh_set_iotlb(&vdpasim->vqs[i].vring, &vdpasim->iommu[0], &vdpasim->iommu_lock)
for (i = 0; i < vdpasim->dev_attr.nas; i++)
vhost_iotlb_reset(&vdpasim->iommu[i])
vhost_iotlb_add_range(&vdpasim->iommu[i], 0, ULONG_MAX, 0, VHOST_MAP_RW)
.suspend= vdpasim_suspend,
vdpasim->running = false
.resume= vdpasim_resume,
if (vdpasim->pending_kick)
for (i = 0; i < vdpasim->dev_attr.nvqs; ++i)
vdpasim_kick_vq(vdpa, i)
.get_config_size = vdpasim_get_config_size,
.get_config = vdpasim_get_config,
.set_config = vdpasim_set_config,
vdpasim->dev_attr.set_config(vdpasim, vdpasim->config) -> forwarded to the device-specific callback
.get_generation = vdpasim_get_generation,
.get_iova_range = vdpasim_get_iova_range,
.set_group_asid = vdpasim_set_group_asid,
.dma_map = vdpasim_dma_map,
vhost_iotlb_add_range_ctx(&vdpasim->iommu[asid], iova, iova + size - 1, pa, perm, opaque)
.dma_unmap = vdpasim_dma_unmap,
vhost_iotlb_reset(&vdpasim->iommu[asid])
vhost_iotlb_del_range(&vdpasim->iommu[asid], iova, iova + size - 1)
.reset_map = vdpasim_reset_map,
.bind_mm= vdpasim_bind_mm,
mm_work.mm_to_bind = mm
vdpasim_worker_change_mm_sync(vdpasim, &mm_work)
vdpasim_mm_work_fn
vdpasim->mm_bound = mm_work->mm_to_bind
.unbind_mm= vdpasim_unbind_mm,
.free = vdpasim_free,
};
vdpa, linux commit: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=4c8cf31885f69e86be0b5b9e6677a26797365e1d
modprobe vdpa
core_initcall(vdpa_init) -> vDPA: introduce vDPA bus. A vDPA device is a device that uses a data path compliant with the virtio specification together with a vendor-specific control path. vDPA devices can be physically located on hardware or emulated by software. vDPA hardware devices are usually implemented through PCIe with the following types:
- PF (Physical Function): a single physical function.
- VF (Virtual Function): a device supporting single-root I/O virtualization (SR-IOV); its virtual functions (VFs) represent virtualized instances of the device that can be assigned to different partitions.
- ADI (Assignable Device Interface) and its equivalents: with technologies such as Intel Scalable IOV, the host OS composes a virtual device (VDEV) from one or more ADIs, or from equivalents such as Mellanox SFs (Sub-Functions).

From the driver's perspective, depending on how and where DMA translation is done, vDPA devices come in two types:
- Platform-specific DMA translation: the device can be used on a platform where device access to data in memory is limited and/or translated. An example is a PCIe vDPA device whose DMA requests are tagged in a bus-specific (e.g. PCIe) way; DMA translation and protection are done at the PCIe bus IOMMU level.
- Device-specific DMA translation: the device implements DMA isolation and protection through its own logic. An example is a vDPA device that uses an on-chip IOMMU.

To hide the differences and complexity of these device/IOMMU options and to present a generic virtio device to the upper layers, a device-agnostic framework is required. This patch introduces a software vDPA bus that abstracts the common attributes of vDPA devices and vDPA bus drivers, as well as the communication method (vdpa_config_ops) between the vDPA device abstraction and the vDPA bus drivers. This allows multiple types of drivers, such as virtio_vdpa and vhost_vdpa, to operate on the bus, so that a vDPA device can be used either by the kernel virtio driver or by user-space vhost drivers, as shown below.
-> commit: https://github.com/ssbandjl/linux/commit/961e9c84077f6c8579d7a628cbe94a675cb67ae4
Thanks to the abstraction of the vDPA bus and the vDPA bus operations, the differences and complexity of the underlying hardware are hidden from the upper layers; the vDPA bus drivers on top can use a unified vdpa_config_ops to control different kinds of vDPA devices.
 virtio drivers      vhost drivers
       |                  |
  [virtio bus]       [vhost uAPI]
       |                  |
  virtio device      vhost device
 virtio_vdpa drv    vhost_vdpa drv
           \            /
            [vDPA bus]
                 |
            vDPA device
            hardware drv
                 |
           [hardware bus]
                 |
            vDPA hardware
root@host101:/dev# tree -L 100 /sys/bus/vdpa/
/sys/bus/vdpa/
├── devices
├── drivers
├── drivers_autoprobe
├── drivers_probe
└── uevent
modprobe -a virtio-vdpa vhost-vdpa
root@host101:/sys/bus/vdpa# tree -L 10
.
├── devices
├── drivers
│ ├── vhost_vdpa
│ │ ├── bind
│ │ ├── module -> ../../../../module/vhost_vdpa
│ │ ├── uevent
│ │ └── unbind
│ └── virtio_vdpa
│ ├── bind
│ ├── module -> ../../../../module/virtio_vdpa
│ ├── uevent
│ └── unbind
├── drivers_autoprobe
├── drivers_probe
└── uevent
root@host101:/sys/bus/vdpa# vdpa dev add mgmtdev vdpasim_blk name vdpa0
root@host101:/sys/bus/vdpa# tree -L 10
.
├── devices
│ └── vdpa0 -> ../../../devices/vdpa0
├── drivers
│ ├── vhost_vdpa
│ │ ├── bind
│ │ ├── module -> ../../../../module/vhost_vdpa
│ │ ├── uevent
│ │ └── unbind
│ └── virtio_vdpa
│ ├── bind
│ ├── module -> ../../../../module/virtio_vdpa
│ ├── uevent
│ ├── unbind
│ └── vdpa0 -> ../../../../devices/vdpa0
├── drivers_autoprobe
├── drivers_probe
└── uevent
https://stefano-garzarella.github.io/posts/2024-02-12-vdpa-blk/
kernel: drivers/vdpa/vdpa_sim/vdpa_sim_blk.c