R3live论文翻译

2023-05-16

摘要

在本文中，我们提出了一个新颖的激光惯性视觉传感器融合框架，也就是R3live；它利用了激光雷达、惯性和视觉传感器的测量值，可以得到鲁棒和高精度的状态估计。R3live包含了两个子系统，激光惯性里程计系统（LIO）和视觉惯性里程计系统（VLO）。LIO子系统（FAST-LIO）利用激光雷达和惯性传感器的测量数据构建全局地图的几何特征。VIO子系统则利用视觉和惯性传感器的测量数据给地图添加纹理（也就是给3D点赋予颜色）。更具体的是，VIO子系统通过最小化frame-to-map的光度误差的方式直接有效的融合视觉数据。R3live是在R2live的基础上进一步开发的。实验结果表明，我们所提出的系统在状态估计方面比同行鲁棒性更强和精度更高（可以查看我们的视频）。

R3live是一个多功能且精心设计的系统，适用于各种可能的应用，不仅可以作为实时机器人上应用的SLAM系统，还可以为测量和测绘等应用重建密集、精确、RGB颜色的3D地图。此外，为了使R3live更具可扩展性，我们开发了一系列用于重建和纹理网格的离线实用程序，这就进一步缩小了R3live与各种3D应用程序（如模拟器，视频游戏等）之间的差距（请参阅我们演示视频video2）。为了分享我们的发现和回馈社区，我们在github上开源了R3live，包括代码，软件应用和设备的结构设计。

1 介绍

前面省略了价值不大的部分。

在本文中，我们基于激光雷达、惯性和视觉传感器的测量数据进行紧耦合，解决了实时同步定位、3D建图和地图渲染的问题。我们的贡献如下：

我们提出了一个实时建图定位着色的框架。提出的框架了包含了一个用于构建全局地图几何特征的激光惯性里程计（LIO），和一个用于纹理渲染的视觉惯性里程计部分。整个系统能够实时构建一个密集的，3D的，RGB颜色的点云地图。
我们提出了一个基于RGB颜色点云地图的新型VIO系统。VIO通过最小化观测到全局地图中的点RGB颜色与当前图像的颜色之间的光度误差估计当前的状态。由于这样处理不需要环境中的特征点，因此节约了相应的处理时间（特征检测和特征提取），同时也令我们的系统在纹理缺乏的环境中更加的鲁棒。
我们在R3live系统中实现了我们所提出的方法。R3live可以实时的low-drift的为实际环境构建一个稠密的、准确的、GRB颜色的点云地图。整个系统在不同的室内和室外环境中进行了验证。结果显示我们的系统在行进1.5公里后在平移上漂移0.16米，在旋转上漂移3.9度。
我们在Github上开源了我们的系统。另外，我们也为三维重建和texturing meshes from colorized point cloud 开发了几个离线工具。这些软件和装置的结构设计也开源了。

2 系统概述

我们系统的整体流程图如图2所示。我们系统框架包含了两个子系统：LIO子系统（上面部分）和VIO子系统（下面部分）。LIO子系统负责构建全局地图的几何特征，它对齐输入的LiDAR scan，并且通过最小化point-to-plane残差以估计系统的状态。VIO子系统构建地图的纹理，它使用图像渲染每个点的RGB颜色并通过最小化frame-to-frame PnP重投影误差和frame-to-map光度误差更新系统的状态。

3 符号

在整篇论文中，我们使用下面TABLE I中的符号；它们在先前的工作R2live【12】中已经介绍过了。

A 状态向量

在我们的工作中，我们定义the full state vector $x\epsilon R^{29}$ 如下:

其中 $_{}^{G}\textrm{g}\epsilon R^{3}$ 是全局坐标系下（第一帧激光雷达坐标系）的重力向量。 $_{}^{I}\textrm{T}_{C}$ 是相机和雷达之间的时间偏差，雷达则认为已经同IMU进行了时间同步。 $\phi =[f_{x}, f_{y}, c_{x}, c_{y} ]$ 是相机的内参，其中 $f_{x}, f_{y}$ 是焦距， $c_{x}, c_{y}$ 是相机光心与图像平面左上角的距离。

B 地图表示

我们的地图是由体素和三维点组成的，其中三维点包含在voxels中并且voxels是地图的最小组成元素。

1）Voxels：为了快速发现地图中的点进行渲染和跟踪（参考V-C和V-D小节），我们设计了一个固定大小（0.1m x 0.1m x 0.1m）的容易称之为voxel。如果voxel刚刚添加过点（最近的1秒内），我们就把这个voxel标记为activated。否则，这个voxel就被标记为deactivated。

2）Points：在我们的工作中，点P是一个6维的向量：

其中第一个3 x 1的子向量 $_{}^{G}\textrm{P}=[p_{x}, p_{y}, p_{z} ]^{T}$ 代表了3D点在全局坐标系中的位置。第二个3 x 1的子向量 $c=[c_{r}, c_{g}, c_{b} ]^{T}$ 代表了3D点的颜色。除此之外，我们也记录该点的其他信息，比如3 x 3的协方差矩阵 $\epsilon _{c},\epsilon _{p}$ ，表示了 $_{}^{G}\textrm{P}$ 和c的相应的估计协方差，and the timestamps when this point have been created and rendered。

4 激光里程计子系统

如图2所示，R3live的子系统LIO构造了全局地图的几何结构。For an incoming 激光雷达帧，由于连续运动导致的畸变首先通过IMU反向传播进行补偿，如【6】所示。然后，基于ESIKF最小化point-to-plane残差估计系统的状态。最后当状态收敛时，the points of this scan添加到全局地图中，并标记相应的voxel为激活或者未激活。全局地图中累积的3D点组成了几何结构，这些3D点也被用来为VIO子系统提供相应的深度。对于R3live中的LIO子系统的更多细节，我们推荐读者去阅读我们之前的相关工作【12，22】。

5 视觉里程计子系统

我们VIO子系统通过最小化光度误差估计系统状态，进而渲染全局地图的纹理。更详细的是，我们首先把一定数量的点（跟踪点）从全局地图中投影到当前图像中，然后在ESIKF框架下通过最小化这些点的光度误差迭代估计系统状态。出于效率的目的，跟踪的地图点需要比较稀疏，which usually requires to build a pyramid4 of the input image。The pyramid is however not invariant to either translation or rotation that needs also to be estimated. 在我们提出的框架中，我们利用单个地图点的颜色去计算光度误差。在VIO中同步渲染的颜色，是地图3D点的固有属性，并且具有旋转和平移不变性。为了保证鲁棒和快速收敛，我们设计了一个如图2所示的两部pipline。首先利用frame-to-frame 光流法跟踪地图点，并通过最小化PnP跟踪地图点的重投影误差优化系统的状态（V-A小节）。接下来，我们通过最小化跟踪点的frame-to-map光度误差进一步refine系统状态（V-B小节）。当系统状态收敛后，基于原始图像，我们进行纹理渲染，更新全局地图3D点的颜色（V-C小节）。

A frame-to-frame 里程计

假设我们在上一帧 $I_{k-1}$ 跟踪了m个地图点 $P={p_{1}, p_{2}, ... ,p_{m}}$ ，它们在 $I_{k-1}$ 中的投影是 $p_{1_{k-1}}, p_{2_{k-1}}, ... ,p_{m_{k-1}}$ ，然后我们使用LK光流跟踪找到它们在第 $I_{k}$ 帧中的位置以 $p_{1_{k}}, p_{2_{k}}, ... ,p_{m_{k}}$ 表示。然后我们通过一个ESIKF计算和优化它们的重投影误差以估计系统状态。

1）PnP重投影误差

如图3所示，我们以第s个点 $p_{s}=[_{}^{G}\textrm{p}_{s}^{T}, \textrm{c}_{s}^{T}]^{T}\epsilon P$ 作为例子进行说明重投影误差的计算 $r(\tilde{x_{k}}, \rho _{s}^{k}, _{}^{G}\textrm{}p_{s})$ ：

其中， $\tilde{x}$ 是ESIKF每次迭代的状态， $\pi (_{}^{C}\textrm{P}_{s},\check{x_{k}})$ 计算公式如下：

其中， $\Delta t_{k-1,k}$ 是上一帧 $I_{k-1}$ 和当前帧 $I_{k}$ 的时间间隔。需要注意的是公式（5），第一项是小孔投影公式，第二项是online-temporal correction factor【23】。

残差公式(4)中的测量噪声包含了两部分：一个是 $\rho_{s_{k}}$ 中的像素跟踪误差，另外一个是 $_{}^{G}\textrm{P}_{s}$ 中地图点的位置误差。

其中， $_{}^{G}\textrm{P}_{s}^{gt}$ 和 $_{}^{G}\textrm{}\rho _{s}^{gt}$ 是对应的 $_{}^{G}\textrm{P}_{s}$ 和 $\rho_{s_{k}}$ 真值。然后我们对残差 $r(\tilde{x_{k}}, _{}^{G}\textrm{}\rho _{s}^{gt}, _{}^{G}\textrm{P}_{s}^{gt})$ 进行一阶泰勒展开：

其中 $\alpha _{s}~N(0, \epsilon_{\alpha _{s}})$ 并且有：

2）Frame-to-frame VIO ESIKF update

Equation (8) constitutes an observation distribution for $x_{k}$ (or equivalently $\delta \check{x_{k}}=x_{k}\ominus \check{x_{k}}$ ), which can be combined with 先验分布 from the IMU propagation to obtain the 最大后验估计(MAP) estimation of $\delta \check{x_{k}}$ 。

其中 $||x||_{\Sigma }^{2}=x^{t}\Sigma^{-1} x$ 是马氏距离的平方， $\Sigma$ 是协方差。 $\check{x}_{k}$ 是IMU前向传播之后的状态， $\Sigma_{\delta \check{x_{k}}}$ 是IMU前向传播状态的协方差。H是将状态从 $\hat{x_{k}}$ 的切空间投影到 $\check{x_{k}}$ 的切空间的雅可比矩阵。公式（12）第一部分的具体细节可以参考R2live【12】的E小节。

符号表示：

Following [6]，卡尔曼增益计算公式如下：

然后我们可以更新系统状态如下：

以上过程迭代到收敛(小节V-A1到小节V-A2)(更新小于阈值)。需要注意的是这样的迭代卡尔马你滤波等同于高斯牛顿优化方法【24】。

B frame-to-map 里程计

1） Frame-to-map光度更新

经过frame-to-frame VIO更新，我们得到了一个较好的状态估计 $\check{x_{k}}$ ，然后我们通过最小化跟踪点的光度误差进行frame-to-map VIO update 以降低漂移。如图4所示：以第s个跟踪点 $p_{s}\epsilon P$ 为例进行说明，它的光度误差计算公式如下 $o(\check{x_{k}}, _{}^{G}\textrm{P}_{s}, c_{s})$ ：

其中 $c_{s}$ 是保存在全局地图中当前点的颜色， $\gamma _{s}$ 是在当前帧图像 $I_{K}$ 中观测到的颜色。为了获得观测到的颜色 $\gamma _{s}$ 及其协方差，我们对点落在当前帧中的位置 $\tilde{P_{s_{k}}}=\pi (_{}^{C}\textrm{p}_{s}, \check{x_{k}})$ 进行预测，然后使用其邻阈附近像素的RGB颜色进行线性插值。

当前我们也考虑到了 $\gamma _{s}$ 和 $c_{s}$ 的观测噪声。

其中 $r_{s}^{gt}$ 是 $\gamma _{s}$ 的真值， $c_{s}^{gt}$ 是 $c_{s}$ 的真值。 $\Delta _{t_{cs}}$ 是上一次渲染 $p_{s}$ 的时间与当前帧的间隔。需要注意的是，在方程（21）中， $c_{s}$ 的测量误差包含了两部分：上一次渲染时的估计误差 $n_{cs}$ （小节V-C）和随机游走噪声 $\eta _{cs}$ ，which accounts for the change of environment illusion。

结合方程（19），（20）和（21），我们对the true zero 残差 $o(x_{k}, _{}^{G}\textrm{P}_{s}^{gt}, c_{s}^{gt})$ 进行一阶泰勒展开：

其中 $\beta _{s}\epsilon ~N(0, \Sigma _{\beta _{s}})$ 并且有：

2）Frame-to-map VIO ESIFK updata

公式（22）constitutes another observation distribution for $\delta \check{x_{k}}$ , which is combined with the prior distribution from the IMU propagation to obtain the maximum a posteriori (MAP) estimation of $\delta \check{x_{k}}$ :

H，R， $\check{z_{k}}$ 和P符号表示与（13~16）相似：

然后与（17）和（18）类似，我们接下来进行状态更新。Frame-to-map VIO ESIKF update(小节V-B1和V-B2)一直迭代直至收敛。状态收敛后然后其被用做：（1）渲染全局地图的纹理（小节V-C）；（2）更新当前跟踪点集P以下一帧使用（小节V-D）；（3）在下一帧LIO或者VIO更新前，作为IMU前向积分的起始点。

C 渲染全局地图的纹理

经过frame-to-map VIO更新后，我们获取到当前图像的准确位姿，然后接下来我们可以进行纹理渲染以更新全局地图点的颜色。

首先，我们检索在activated voxels中（参考IV小节）的所有点。假设一共有n个点并且点集 $\zeta =P_{1} ,P_{2},...P_{n}$ 。我们以第s个点 $p_{s}=[_{}^{G}\textrm{p}_{s}^{T}, \textrm{c}_{s}^{T}]^{T}\epsilon P$ 的颜色更新过程为例进行说明。如果 $p_{s}$ 落入当前帧 $I_{K}$ 中，我们通过对其在 $I_{K}$ 邻阈像素的RGB值进行线性插值获得观测颜色 $\gamma _{s}$ 和颜色协方差 $\Sigma _{\gamma _{s}}$ 。然后通过Bayesian update融合新观测到点的颜色与地图中记录的点的颜色 $c_{s}$ （如图5所示），融合后的结果及其协方差如下所示：

D 更新VIO子系统跟踪的特征点

经过纹理渲染后，我们接下来需要更新一下跟踪点集P。首先，我们把公式（4）中重投影误差和公式（19）中光度误差比较大的点从P中移除，并且移除没有落到当前帧中 $I_{k}$ 的点。然后，我们把 $\zeta$ 中的每个点投影到当前帧 $I_{k}$ 中，如果它附近没有其他跟踪点（50个像素为半径）就把它们添加到跟踪点集P中。

6 实验和结果

7 R3live应用

参考文献

[1] J. Levinson, J. Askeland, J. Becker, J. Dolson, D. Held, S. Kammel, J. Z.Kolter, D. Langer, O. Pink, V. Pratt, et al., “Towards fully autonomous driving: Systems and algorithms,” in 2011 IEEE Intelligent Vehicles Symposium (IV). IEEE, 2011, pp. 163–168.
[2] A. Bry, A. Bachrach, and N. Roy, “State estimation for aggressive flight in gps-denied environments using onboard sensing,” in 2012 IEEE International Conference on Robotics and Automation. IEEE, 2012, pp. 1–8.
[3] F. Gao, W. Wu, W. Gao, and S. Shen, “Flying on point clouds: Online trajectory generation and autonomous navigation for quadrotors in cluttered environments,” Journal of Field Robotics, vol. 36, no. 4, pp. 710–733, 2019.
[4] F. Kong, W. Xu, Y. Cai, and F. Zhang, “Avoiding dynamic small obstacles with onboard sensing and computation on aerial robots,” IEEE Robotics and Automation Letters, vol. 6, no. 4, pp. 7869–7876, 2021.
[5] Z. Liu, F. Zhang, and X. Hong, “Low-cost retina-like robotic lidars based on incommensurable scanning,” IEEE Transactions on Mechatronics, 2021, in press.
[6] W. Xu and F. Zhang, “Fast-lio: A fast, robust lidar-inertial odometry package by tightly-coupled iterated kalman filter,” IEEE Robotics and Automation Letters, 2021, in press.
[7] J. Lin, X. Liu, and F. Zhang, “A decentralized framework for simultaneous calibration, localization and mapping with multiple lidars,” in 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2020, pp. 4870–4877.
[8] Z. Liu and F. Zhang, “Balm: Bundle adjustment for lidar mapping,” IEEE Robotics and Automation Letters, 2021, in press.

[9] X. Liu and F. Zhang, “Extrinsic calibration of multiple lidars of small fov in targetless environments,” IEEE Robotics and Automation Letters, 2021, in press.
[10] C. Yuan, X. Liu, X. Hong, and F. Zhang, “Pixel-level extrinsic self calibration of high resolution lidar and camera in targetless environments,” arXiv preprint arXiv:2103.01627, 2021.
[11] J. Lin and F. Zhang, “Loam livox: A fast, robust, high-precision lidar odometry and mapping package for lidars of small fov,” in 2020 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2020, pp. 3126–3131.
[12] J. Lin, C. Zheng, W. Xu, and F. Zhang, “R2live: A robust, real-time, lidar-inertial-visual tightly-coupled state estimator and mapping,” IEEE Robotics and Automation Letters, vol. 6, no. 4, pp. 7469–7476, 2021.
[13] T. Shan, B. Englot, C. Ratti, and D. Rus, “Lvi-sam: Tightly-coupled lidar-visual-inertial odometry via smoothing and mapping,” arXiv preprint arXiv:2104.10831, 2021.
[14] X. Zuo, P. Geneva, W. Lee, Y. Liu, and G. Huang, “Lic-fusion: Lidarinertial-camera odometry,” in 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2019, pp. 5848–5854.
[15] X. Zuo, Y. Yang, J. Lv, Y. Liu, G. Huang, and M. Pollefeys, “Lic-fusion 2.0: Lidar-inertial-camera odometry with sliding-window plane-feature tracking,” in IROS 2020, 2020.
[16] W. Zhen and S. Scherer, “Estimating the localizability in tunnel-like environments using lidar and uwb,” in 2019 International Conference on Robotics and Automation (ICRA). IEEE, 2019, pp. 4903–4908.
[17] H. Zhou, Z. Yao, and M. Lu, “Uwb/lidar coordinate matching method with anti-degeneration capability,” IEEE Sensors Journal, vol. 21, no. 3, pp. 3344–3352, 2020.
[18] C. Debeunne and D. Vivet, “A review of visual-lidar fusion based simultaneous localization and mapping,” Sensors, vol. 20, no. 7, p. 2068, 2020.
[19] J. Zhang and S. Singh, “Laser–visual–inertial odometry and mapping with high robustness and low drift,” Journal of Field Robotics, vol. 35, no. 8, pp. 1242–1264, 2018.
[20] W. Shao, S. Vijayarangan, C. Li, and G. Kantor, “Stereo visual inertial lidar simultaneous localization and mapping,” arXiv preprint arXiv:1902.10741, 2019.
[21] W. Wang, J. Liu, C. Wang, B. Luo, and C. Zhang, “Dv-loam: Direct visual lidar odometry and mapping,” Remote Sensing, vol. 13, no. 16, p. 3340, 2021.
[22] W. Xu, Y. Cai, D. He, J. Lin, and F. Zhang, “Fast-lio2: Fast direct lidar-inertial odometry,” arXiv preprint arXiv:2107.06829, 2021.
[23] T. Qin and S. Shen, “Online temporal calibration for monocular visualinertial systems,” in 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2018, pp. 3662–3669.
[24] B. M. Bell and F. W. Cathey, “The iterated kalman filter update as a gauss-newton method,” IEEE Transactions on Automatic Control, vol. 38, no. 2, pp. 294–297, 1993.
[25] S. Garrido-Jurado, R. Munoz-Salinas, F. J. Madrid-Cuevas, and M. J. ˜ Mar´ın-Jimenez, “Automatic generation and detection of highly reliable ´ fiducial markers under occlusion,” Pattern Recognition, vol. 47, no. 6, pp. 2280–2292, 2014.
[26] T. Qin, P. Li, and S. Shen, “Vins-mono: A robust and versatile monocular visual-inertial state estimator,” IEEE Transactions on Robotics, vol. 34, no. 4, pp. 1004–1020, 2018.
[27] Z. Zhang and D. Scaramuzza, “A tutorial on quantitative trajectory evaluation for visual (-inertial) odometry,” in 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2018, pp. 7244–7251.
[28] P. Labatut, J.-P. Pons, and R. Keriven, “Efficient multi-view reconstruction of large-scale scenes using interest points, delaunay triangulation and graph cuts,” in 2007 IEEE 11th international conference on computer vision. IEEE, 2007, pp. 1–8.
[29] A. Fabri and S. Pion, “Cgal: The computational geometry algorithms library,” in Proceedings of the 17th ACM SIGSPATIAL international conference on advances in geographic information systems, 2009, pp. 538–539.
[30] D. Girardeau-Montaut, “Cloudcompare,” France: EDF R&D Telecom ParisTech, 2016.
[31] P. Cignoni, G. Ranzuglia, M. Callieri, M. Corsini, F. Ganovelli, N. Pietroni, and M. Tarini, “Meshlab,” 2011.
[32] S. Shah, D. Dey, C. Lovett, and A. Kapoor, “Airsim: High-fidelity visual and physical simulation for autonomous vehicles,” in Field and service robotics. Springer, 2018, pp. 621hehe

本文内容由网友自发贡献，版权归原作者所有，本站不承担相应法律责任。如您发现有涉嫌抄袭侵权的内容，请联系:hwhale#tublm.com(使用前将#替换为@)

R3live

论文翻译