NIC passthrough with VFIO

2023-05-16


VFIO is a new method of doing PCI device assignment ("PCI passthrough"
aka "<hostdev>") available in newish kernels (3.6?; it's in Fedora 18 at
any rate) and via the "vfio-pci" device in qemu-1.4+. In contrast to the
traditional KVM PCI device assignment (available via the "pci-assign"
device in qemu), VFIO works properly on systems using UEFI "Secure
Boot"; it also offers other advantages, such as grouping of related
devices that must all be assigned to the same guest (or not at all).
Here's some useful reading on the subject.


  http://lwn.net/Articles/474088/
  http://lwn.net/Articles/509153/


Short description (from Alex Williamson's KVM Forum Presentation)


1) Assume this is the device you want to assign:
01:10.0 Ethernet controller: Intel Corporation 82576
Virtual Function (rev 01)


2) Find the vfio group of this device:
# readlink /sys/bus/pci/devices/0000:01:10.0/iommu_group
../../../../kernel/iommu_groups/15


=> IOMMU Group = 15


3) Check the devices in the group:
# ls /sys/bus/pci/devices/0000:01:10.0/iommu_group/devices/
0000:01:10.0


(so this group has only 1 device)


4) Unbind from device driver
# echo 0000:01:10.0 >/sys/bus/pci/devices/0000:01:10.0/driver/unbind


5) Find vendor & device ID
$ lspci -n -s 01:10.0
01:10.0 0200: 8086:10ca (rev 01)


6) Bind to vfio-pci
# echo 8086 10ca > /sys/bus/pci/drivers/vfio-pci/new_id


(this will result in a new device node "/dev/vfio/15", which is what qemu will use to set up the device for passthrough)


7) chown the device node so it is accessible by the qemu user:
# chown qemu /dev/vfio/15; chgrp qemu /dev/vfio/15


(note that /dev/vfio/vfio, which is installed as 0600 root:root, must also be made mode 0666, still owned by root - this is supposedly not dangerous)


I'll look into this; the intention has always been that /dev/vfio/vfio
is a safe interface that's only empowered when connected to
a /dev/vfio/$GROUP, which implies some privileges.
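
Concretely, following the note above, that amounts to:

   # chmod 0666 /dev/vfio/vfio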


8) Set the limit for locked memory equal to all of guest memory size + [some amount large enough to encompass all of IO space]
# ulimit -l 2621440   # ((2048 + 512) * 1024, in KiB)
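
To make the limit persistent for the qemu user rather than setting it
per-shell, pam_limits syntax could be used (an assumption about the
deployment; the file name is hypothetical and values are in KiB):

   # /etc/security/limits.d/90-qemu.conf  (hypothetical file name)
   qemu    soft    memlock    2621440
   qemu    hard    memlock    2621440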


9) pass to qemu using -device vfio-pci:


 sudo qemu-system-x86_64 -m 2048 -hda rhel6vm \
              -vga std -vnc :0 -net none \
              -enable-kvm \
              -device vfio-pci,host=01:10.0,id=net0


(qemu will then use something like step (2) to figure out which device node it needs to use)
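
For repeated use, steps 2-7 can be collected into a small shell
script. This is a minimal sketch under the assumptions above (the
device address is hard-coded to the 01:10.0 example, and the vfio-pci
driver is assumed to be loaded already):

   #!/bin/sh
   # Sketch: prepare one PCI device for vfio-pci passthrough.
   # DEV is an assumption -- substitute your own device address.
   DEV=0000:01:10.0

   # step 2: find the IOMMU group number
   GROUP=$(basename $(readlink /sys/bus/pci/devices/$DEV/iommu_group))

   # step 3: list every device in the group (all of them must be
   # given to vfio, or at least unbound from their host drivers)
   ls /sys/bus/pci/devices/$DEV/iommu_group/devices/

   # step 4: unbind from the current host driver, if there is one
   if [ -e /sys/bus/pci/devices/$DEV/driver ]; then
       echo $DEV > /sys/bus/pci/devices/$DEV/driver/unbind
   fi

   # step 5: read the vendor and device IDs from sysfs
   VENDOR=$(cat /sys/bus/pci/devices/$DEV/vendor)   # e.g. 0x8086
   DEVICE=$(cat /sys/bus/pci/devices/$DEV/device)   # e.g. 0x10ca

   # step 6: bind to vfio-pci (new_id takes the IDs without "0x")
   echo ${VENDOR#0x} ${DEVICE#0x} > /sys/bus/pci/drivers/vfio-pci/new_id

   # step 7: make the group's device node accessible to the qemu user
   chown qemu:qemu /dev/vfio/$GROUP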


Why the "ulimit -l"?
--------------------


Any qemu guest that is using the old pci-assign must have *all* guest
memory and IO space locked in memory. Normally the maximum amount of
locked memory allowed for a process is controlled by "ulimit -l", but
in the case of pci-assign, the kvm kernel module has always just
ignored the -l limit and locked it all anyway.


With vfio-pci, all guest memory and IO space must still be locked in
memory, but the vfio module *doesn't* ignore the process limits, so
libvirt will need to set ulimit -l for any guest that wants to do
vfio-based pci passthrough. Since (due to the possibility of hotplug)
we don't know at the time the qemu process is started whether or not
it might need to do a pci passthrough, we will need to use prlimit(2)
to modify the limit of the already-running qemu.
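
For example, with the prlimit(1) utility from util-linux, which wraps
the same syscall (a sketch; the qemu binary name and the 2.5GiB figure
from step 8 above are assumptions):

   # raise RLIMIT_MEMLOCK of a running qemu; prlimit takes bytes
   prlimit --pid $(pidof qemu-system-x86_64) --memlock=2684354560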




Proposed XML Changes
--------------------


To support vfio pci device assignment in libvirt, I'm thinking something
like this (note that the <driver> subelement is already used for
<interface> and <disk> to choose which backend to use for a particular
device):


   <hostdev managed='yes'>
     <driver name='vfio'/>
     ...
   </hostdev>
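
Filled in for the 01:10.0 device from the steps above, that could look
like this (a sketch assuming libvirt's usual PCI <source> address
syntax):

   <hostdev mode='subsystem' type='pci' managed='yes'>
     <driver name='vfio'/>
     <source>
       <address domain='0x0000' bus='0x01' slot='0x10' function='0x0'/>
     </source>
   </hostdev>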


   <interface type='hostdev' managed='yes'>
     <driver name='vfio'/>


vfio is the overall userspace driver framework while vfio-pci is the
specific qemu driver we're using here.  Does it make more sense to call
this 'vfio-pci'?  It's possible that we could later have a device tree
qemu driver which would need to be involved with -device vfio-dt (or
something) and have different options.


     ...
   </interface>


(this new use of <driver> inside <interface> wouldn't conflict with
the existing <driver name='qemu|vhost'>, since neither of those could
ever possibly be a valid choice for <interface type='hostdev'>. The
one possible problem would be if someone had an <interface
type='network'> which might possibly point to a hostdev or standard
bridged network, and wanted to make sure that in the case of a
bridged network, <driver name='qemu'/> was used. I suppose in this
case, the driver name in the network definition would override any
driver name in the interface?)
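
For clarity, a complete <interface type='hostdev'> definition might
look like this (a sketch; the <source> address reuses the 01:10.0
device from above and assumes libvirt's PCI address syntax):

   <interface type='hostdev' managed='yes'>
     <driver name='vfio'/>
     <source>
       <address type='pci' domain='0x0000' bus='0x01' slot='0x10' function='0x0'/>
     </source>
   </interface>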


Speaking of <network>, here's how vfio would be specified in a hostdev <network> definition:


   <network>
     <name>vfio-net</name>
     <forward mode='hostdev' managed='yes'>
       <driver name='vfio'/>
       <pf dev='eth3'/>  <!-- or a list of VFs -->
     </forward>
     ...
   </network>


Another possibility for the <network> XML would be to add a
"driver='vfio'" attribute to each individual <interface> line, in
case someone wanted some devices in a pool to be assigned using vfio
and some using the old style, but that seems highly unlikely (and
could create problems in the future if we ever needed to add a 2nd
attribute to the <driver> element).


Actually, at one point I considered that vfio should be turned on
globally in libvirtd.conf (or qemu.conf), but that would make
switchover a tedious process, as all existing guests using PCI
passthrough would need to be shut down prior to the change. As long
as there are no technical problems with allowing both types on the
same host, it's more flexible to choose on a device-by-device basis.

Now some questions:


1) Is there any reason that we shouldn't/can't allow both pci-assign
and vfio-pci at the same time on the same host (and even guest)?


vfio-pci and pci-assign can be mixed, but don't intermix devices within
a group.  Sometimes this will work (if the grouping is for isolation
reasons), but sometimes it won't (when the grouping is for visibility).
Best to just avoid that scenario.


2) Does it make any sense to support a "managed='no'" mode for vfio,
which skips steps 2-6 above? (This would be parallel to the existing
pci-assign managed='no', where no unbinding/binding of the device to
the host's pci-stub driver is done, and the device name is simply
passed to qemu on the assumption that all that work was already
done.) Or should <driver name='vfio'/> automatically mean that all
unbinding/binding is done for each device?


I don't think it hurts to have it, but I can't think of a use case.
Even with pci-assign, I can only think of cases where customers have
used it to try to work around things they shouldn't be doing with it.


3) Is it at all bothersome that qemu must be the one opening the
device node, and that there is apparently no way to have libvirt open
it and send the fd to qemu?


I have the same question.  The architecture of vfio is that the user
will open /dev/vfio/vfio (vfiofd) and add a group to it (groupfd).
Multiple groupfds can be added to a single vfiofd, allowing groups to
share IOMMU domains.  However, it's not guaranteed that the IOMMU driver
will allow this (the domains may be incompatible).  Qemu will therefore
attempt to add any new group to an existing vfiofd before re-opening a
new one.  There's also the problem that a group has multiple devices, so
if device A from group X gets added with vfiofd and groupXfd and libvirt
then passes a new vfiofd' and groupXfd' for attaching device B, also
from group X... what's qemu to do?


So in order to pass file descriptors libvirt has to either know exactly
how things are working or just always pass a vfiofd and groupfd, which
qemu will discard if it doesn't need.  The latter implies that fds could
live on and be required past the point where the device that added them
has been removed (in the example above, add A and qemu uses vfiofd and
groupXfd, hot add B and qemu discards vfiofd' and groupXfd', remove A
and qemu continues to use vfiofd and groupXfd for B). 
  


  

*********************************************************************  

-device pci-assign is no longer usable; it fails with an "invalid argument" error.


  

Recent kernels deprecate the KVM_ASSIGN mechanism in favor of vfio. To keep using the old KVM assign path, you have to manually set CONFIG_KVM_DEVICE_ASSIGNMENT=y in .config; note that the option no longer appears in make menuconfig, so it must be edited by hand (e.g. with vim).


Looking at the code, assigned-dev.c is the implementation of KVM assign, and it is only compiled when CONFIG_KVM_DEVICE_ASSIGNMENT is selected:

arch/x86/kvm/Makefile:

kvm-$(CONFIG_KVM_DEVICE_ASSIGNMENT) += assigned-dev.o iommu.o
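
To check whether a given kernel was built with this option (a hedged
example; the config file path varies by distro):

   $ grep CONFIG_KVM_DEVICE_ASSIGNMENT /boot/config-$(uname -r)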


Here is an article on how to use the kvm pci-assign mechanism:

http://www.linux-kvm.org/page/How_to_assign_devices_with_VT-d_in_KVM


References:

http://www.spinics.net/lists/kvm/msg120779.html

http://nanxiao.me/en/why-does-qemu-complain-no-iommu-found/


Likewise, using kvm-assign with the latest QEMU still has problems:

"qemu-system-x86_64: pci_get_msi_message: unknown interrupt type"

This, too, is a VFIO-related issue.

If you want to use kvm pci-assign, use a QEMU release older than 2.6.0.

Reference:

http://qemu.11.n7.nabble.com/PATCH-v9-00-25-IOMMU-Enable-interrupt-remapping-for-Intel-IOMMU-td412217.html



In addition, many useful KVM scripts can be downloaded here:

https://github.com/smilejay/kvm-book.git

