Monocular visual information and IMU (Inertial Measurement Unit) information fused scale estimation system and method

Document No.: 1657433    Publication date: 2019-12-27

Reading note: This technology, "Monocular visual information and IMU information fused scale estimation system and method", was designed and created by 邹旭东, 肖麟慧, 李志天, 杨伍昊, 熊兴崟 and 刘云飞 on 2019-09-26. Its main content includes: The present disclosure provides a scale estimation system and method fusing monocular visual information and IMU information. The scale estimation system includes: a monocular vision SLAM module, which comprises a monocular camera and is used for acquiring visual image data; an IMU module, which is used for obtaining IMU pose data of the IMU module; and an offline scale estimation module, which fuses the visual image data and the IMU pose data to realize scale estimation and positioning mapping. The scale estimation method includes: step S1: acquiring visual image data through a monocular camera and obtaining IMU pose data through an IMU module; step S2: performing time-space alignment on the IMU pose data and the monocular camera pose data based on the angular velocity; and step S3: carrying out scale estimation in the frequency domain based on the acceleration, to complete the scale estimation based on the fusion of monocular visual information and IMU information.

1. A system for fusion of monocular visual information and IMU information for scale estimation, comprising:

the monocular vision SLAM module comprises a monocular camera and is used for acquiring visual image data;

the IMU module is used for obtaining IMU pose data of the IMU module; and

the scale estimation module is used for fusing the visual image data and the IMU pose data to realize scale estimation and positioning mapping.

2. The monocular visual information and IMU information fused scale estimation system of claim 1, wherein visual pose data is obtained by processing the visual image data, which carries no absolute scale information, collected by the monocular camera.

3. The monocular visual information and IMU information fused scale estimation system of claim 2, the visual pose data comprising: monocular camera position, monocular camera rotation matrix, monocular camera rotation quaternion, and monocular camera rotation angular velocity.

4. The monocular visual information and IMU information fused scale estimation system of claim 1, the IMU module comprising a gyroscope, an accelerometer; the IMU pose data comprises: IMU acceleration, IMU angular velocity.

5. A method for estimating a scale by the monocular visual information and IMU information fused scale estimation system according to any one of claims 1 to 4, the scale estimation method comprising:

step S1: acquiring visual image data through a monocular camera; obtaining IMU pose data through an IMU module;

step S2: performing time-space alignment on the IMU pose data and the monocular camera pose data based on the angular velocity; and

step S3: carrying out scale estimation in the frequency domain based on the acceleration, to complete the scale estimation based on the fusion of monocular visual information and IMU information.

6. The method of claim 5, wherein in step S1, the monocular vision SLAM module observes landmark points through the monocular camera and projects the three-dimensional landmark points onto the imaging plane of the camera to obtain pixel coordinates; a pinhole imaging model is adopted for imaging; a single landmark point observed by the monocular vision SLAM module satisfies the following equation:

$Z P_{uv} = K (R P_W + t) = K T P_W$

wherein P_uv is the pixel coordinate on the image, P_W = (X Y Z)^T is the three-dimensional coordinate of the landmark point in the world coordinate system, the matrix K is the camera intrinsic parameter matrix, T is the transformation matrix from the world coordinate system to the camera coordinate system, R is the rotation matrix of the camera, t is the translation vector of the camera from the world coordinate system to the camera coordinate system, and Z is the Z-axis coordinate of the landmark point in the camera coordinate system, namely the depth value of the landmark point in the camera coordinate system; the observation result is then subjected to minimum reprojection error, where π(·) is the projection function, so that the monocular camera pose from the visual observation, namely the rotation matrix R of the camera and the position p of the camera, can be obtained:

$\{R,\ t\} = \arg\min_{R,\ t} \sum_{i \in \chi} \rho\left( \left\| u_i - \pi\left( R X_i + t \right) \right\|^2 \right)$

wherein X_i is the position of the i-th landmark point in the world coordinate system, u_i is the observed projection coordinate of the i-th landmark point in the pixel plane, ρ(·) is a robust kernel function, χ is the set of all feature points (i.e., landmark points) extracted from the image captured by the monocular camera, Σ is the summation symbol, argmin denotes the values of the variables R and t that minimize the expression, and p is the position data of the camera, namely the translation vector t of the camera.

7. The method for scale estimation based on the fusion of monocular visual information and IMU information as claimed in claim 5, wherein in step S2, it is first assumed that the visual angular velocity and the IMU angular velocity are aligned in time, and the spatial rotation and the gyroscope bias can then be obtained by minimizing the following cost function without time offset using the least squares method:

$\{R_S,\ b_\omega^C\} = \arg\min_{R_S,\ b_\omega^C} \sum_{i=1}^{n} \left\| \omega_{S}^{I}(t_i) - R_S\left( \omega_{C}^{V}(t_i) + b_\omega^C \right) \right\|^2$

wherein R_S represents the rotation matrix from the camera coordinate system to the IMU body coordinate system, t_d represents the time offset of the IMU relative to the camera, b_ω^C represents the bias of the gyroscope, ω_C^V is the angular velocity of the monocular camera, and ω_S^I is the IMU angular velocity.

8. The method for scale estimation based on the fusion of monocular visual information and IMU information as claimed in claim 7, wherein, because the visual angular velocity and the IMU angular velocity satisfy the spatial rotation and translation relationship of two sets of vectors, the following closed-form algorithm is used to estimate an initial value of the spatial alignment, with the centroids of the two sets of angular velocities defined first:

$\bar{\omega}^V = \frac{1}{n}\sum_{i=1}^{n} \omega_{C}^{V}(t_i), \qquad \bar{\omega}^I = \frac{1}{n}\sum_{i=1}^{n} \omega_{S}^{I}(t_i)$

wherein $\bar{\omega}^V$ and $\bar{\omega}^I$ are the visual angular velocity centroid and the IMU angular velocity centroid respectively, and n is the number of data samples participating in the calculation, so that the two centroid-removed sets of angular velocities can be obtained:

$P^V = \left\{ \omega_{C}^{V}(t_i) - \bar{\omega}^V \right\}_{i=1}^{n}, \qquad Q^I = \left\{ \omega_{S}^{I}(t_i) - \bar{\omega}^I \right\}_{i=1}^{n}$

wherein P^V and Q^I are the centroid-removed visual angular velocity set and the centroid-removed IMU angular velocity set respectively; the angular velocity accumulation matrix W is then defined:

$W = \sum_{i=1}^{n} p_i\, q_i^T, \qquad p_i \in P^V,\ q_i \in Q^I$

performing singular value decomposition on W:

$W = U \Sigma V^T$

where $\Sigma = \mathrm{diag}(\sigma_1, \sigma_2, \sigma_3)$ is the diagonal matrix of singular values, with the diagonal elements arranged in descending order, and U and V are the orthogonal matrices obtained from the decomposition; thus the spatially aligned rotation matrix R_S is:

$R_S = V\, C\, U^T$

when W is a full-rank matrix, C = diag(1, 1, 1); when W is rank-deficient, C = diag(1, 1, -1); after the rotation matrix is solved, the angular velocity bias of the gyroscope in the IMU module is obtained from the following equation:

$b_\omega^C = R_S^T\, \bar{\omega}^I - \bar{\omega}^V$

9. The method of claim 8, wherein an iterative algorithm of time bias and angular velocity bias based on golden section search is used: the maximum time bias variation ±(t_d)_max in the positive and negative directions is first determined; golden section search is then used for the time bias, and R_S and b_ω^C are solved by the closed-form spatial-alignment initial-value method; the time offset and the spatial rotation are then solved in an iterative manner; specifically, the golden section ratio is first set as:

$\rho_g = \frac{1 + \sqrt{5}}{2} \approx 1.618$

the initial value of the time offset is:

$a = -(t_d)_{max}, \qquad b = (t_d)_{max}$

the golden section test points of the two time offsets are respectively:

$\alpha = b - (b - a)/\rho_g, \qquad \beta = a + (b - a)/\rho_g$

when the error of the two probe points is less than the threshold τ_tolerance, their mean value is taken as the final time offset value:

$t_d = (\alpha + \beta)/2$;

after the optimal solution of R_S is obtained, the time offset term t_d is included in the objective function and the optimization result of the time offset is solved; the spatial alignment and the time alignment are then optimized alternately, with the objective function as follows:

$\{R_S,\ b_\omega^C,\ t_d\} = \arg\min_{R_S,\ b_\omega^C,\ t_d} \sum_{i=1}^{n} \left\| \omega_{S}^{I}(t_i) - R_S\left( \omega_{C}^{V}(t_i + t_d) + b_\omega^C \right) \right\|^2$

10. The method of claim 5, wherein after the visual acceleration in the camera coordinate system and the filtered IMU acceleration are obtained, least squares optimization is performed, and the objective function in the time domain with the gravity term is as follows:

$\{s,\ g_W,\ b_a^C\} = \arg\min_{s,\ g_W,\ b_a^C} \sum_{t} \left\| s\, a_C^V(t) - \left( a_C^I(t) + R_{CW}(t)\, g_W - b_a^C \right) \right\|^2, \qquad \text{s.t.}\ \|g_W\|_2 = 9.8$

wherein a_C^V is the expression of the visual acceleration in the camera coordinate system, a_C^I is the IMU acceleration in the camera coordinate system, g_W is the gravitational acceleration, R_CW(t) is the rotation matrix from the world coordinate system to the camera coordinate system at each time instant, b_a^C is the expression of the IMU acceleration bias in the camera coordinate system, and s is the scale factor; the visual acceleration and the IMU acceleration in the time domain are then transformed into the frequency domain; with the operator $\mathcal{F}\{\cdot\}$ denoting the Fourier transform, the visual and inertial accelerations in the frequency domain are calculated as follows:

$A^V(f) = \mathcal{F}\left\{ a_C^V(t) \right\}, \qquad A^I(f) = \mathcal{F}\left\{ a_C^I(t) \right\}$

wherein A^V(f) and A^I(f) are the Fourier transforms of the visual acceleration and the inertial acceleration respectively, the Fourier transforms of the three axes are calculated separately, and f denotes frequency; the final scale factor s is solved by minimizing the difference between the amplitude spectra |A^V(f)| and |A^I(f)|, with its cost function defined as follows:

$\|g_W\|_2 = 9.8$;

wherein b_a is the bias of the IMU accelerometer, and f_max denotes the upper limit of the frequency range.

Technical Field

The present disclosure relates to the technical field of simultaneous localization and mapping (SLAM), and in particular to a scale estimation system and method fusing monocular visual information and IMU information.

Background

Simultaneous Localization and Mapping (SLAM) refers to the problem in which a mobile robot, starting in an unknown environment at an unknown position, incrementally constructs a globally consistent map of the environment from the input of the specific sensors it carries while simultaneously localizing itself within that map. Over the past few decades, SLAM has become a crucial research topic in fields such as robotics, positioning and navigation, and computer vision, and it has begun to form a basic practical capability in industry.

Visual SLAM is a SLAM system built with visual sensors such as a monocular camera, a binocular (stereo) camera, or an RGB-D camera. Like the human eye, visual SLAM perceives the surrounding environment by actively or passively receiving ambient light, and reconstructs the feature points in the image in three-dimensional space based on the multi-view geometry principle, so as to construct a three-dimensional map consistent with the real environment. SLAM systems based on binocular and RGB-D cameras can recover the scale of the real environment because these sensors acquire depth information, but RGB-D cameras are expensive to use, and both RGB-D and binocular cameras are bulky and inconvenient to carry, which limits the further application of such SLAM systems on low-cost and portable mobile devices. In contrast, a SLAM system based on a monocular camera has the advantages of low cost, small size, high real-time performance, and abundant, reliable available information, and is more advantageous in practical use. There are currently many monocular-vision-based SLAM solutions, such as the open-source methods VisualSFM, OpenMVG, OpenSfM, ORB-SLAM2, and so on. However, because the monocular camera projects from three-dimensional space onto a two-dimensional pixel imaging plane according to multi-view geometric projection theory, one dimension of information is lost, so the depth information of 3D points is lost and the finally constructed map carries no scale information. Therefore, a purely monocular visual SLAM system cannot obtain the absolute scale of the constructed map. This defect limits the application of monocular SLAM in real-world robotic scenarios as well as in positioning and mapping.

The IMU (Inertial Measurement Unit) also has the advantages of low cost and small size, and the scale can be estimated from its inertial measurements. A visual-inertial system (VINS) is a system that combines vision with an inertial measurement unit (IMU) to perform positioning, mapping, and navigation. Assisting a monocular vision system with a low-cost inertial measurement unit has become a growing trend in SLAM, because such a system can use IMU information to compensate for the monocular visual SLAM system's lack of scale information.

Although a monocular VINS system can in principle combine positioning, mapping, and navigation well, practical combination algorithms still have significant shortcomings. One of the difficulties is that the requirements on hardware, such as synchronization circuits and hardware alignment design, are very high. Currently, many methods (such as the OKVIS framework of Leutenegger et al., 2014), including the dedicated hardware of commercial visual-inertial odometers such as the Google Tango tablet (which uses a fisheye camera and a high-precision IMU), achieve accurate scale estimation by using specially customized Camera-IMU hardware. There are also methods for motion tracking and reconstruction that use standard smartphone sensors, but their practical results have so far been less than ideal.

Besides imposing specific requirements on hardware, a monocular VINS system must solve the two algorithmic problems of visual odometry and inertial navigation, such as strict initialization, and it is difficult to directly fuse the monocular visual structure with the inertial measurements due to the lack of direct distance measurements. In addition, the VINS system places high demands on hardware, for example the timestamps of each video frame and of the IMU sensor must be synchronized fairly accurately, and how to achieve accurate scale estimation and positioning on low-precision mobile devices remains a major problem. The VINS-mono framework proposed by the Hong Kong University of Science and Technology in 2017 currently achieves the best results among VINS systems, and fig. 7 is a framework diagram of that system. In this framework, the vision and the IMU implement the initial scale estimation process in an offline and loosely coupled manner. However, in this framework, minimizing the position error between the pose and positioning results output by the offline visual SfM (Structure from Motion) end and the pre-integrated IMU data introduces a large error, so that the error in the subsequent positioning and estimation accuracy caused by using this erroneous scale information finally becomes even more significant.

BRIEF SUMMARY OF THE PRESENT DISCLOSURE

Technical problem to be solved

In view of the above problems, the present disclosure provides a scale estimation system and method fusing monocular visual information and IMU information, so as to alleviate the following technical problems in the prior art: the images acquired by a monocular camera carry no absolute scale information, and the scale estimation accuracy of position-based visual-inertial systems is low; the hardware timestamp synchronization between the image sequence and the IMU sensor must be very strict; and existing VINS systems require strict initialization.

(II) technical scheme

In one aspect of the present disclosure, a scale estimation system for fusing monocular visual information and IMU information is provided, including: the monocular vision SLAM module comprises a monocular camera and is used for acquiring visual image data; the IMU module is used for obtaining IMU pose data of the IMU module; and the scale estimation module is used for fusing the visual image data and the IMU pose data to realize scale estimation and positioning mapping.

In the embodiment of the disclosure, the visual pose data is obtained by processing the visual image data, which carries no absolute scale information, acquired by the monocular camera.

In an embodiment of the present disclosure, the visual pose data includes: monocular camera position, monocular camera rotation matrix, monocular camera rotation quaternion, and monocular camera rotation angular velocity.

In an embodiment of the present disclosure, the IMU module includes a gyroscope, an accelerometer; the IMU pose data comprises: IMU acceleration, IMU angular velocity.

In another aspect of the present disclosure, a method for estimating a scale by fusing monocular visual information and IMU information is provided, which performs scale estimation using the scale estimation system fusing monocular visual information and IMU information according to any one of the above; the method comprises: step S1: acquiring visual image data through a monocular camera and obtaining IMU pose data through an IMU module; step S2: performing time-space alignment on the IMU pose data and the monocular camera pose data based on the angular velocity; and step S3: carrying out scale estimation in the frequency domain based on the acceleration, to complete the scale estimation based on the fusion of monocular visual information and IMU information.

In the embodiment of the disclosure, the monocular vision SLAM module observes the landmark points through the monocular camera, and projects the three-dimensional landmark points to an imaging plane of the camera to obtain pixel coordinates; the model during imaging adopts a pinhole imaging model; the single landmark point of the monocular vision SLAM module is observed according to the following equation:

$Z P_{uv} = K (R P_W + t) = K T P_W$

wherein P_uv is the pixel coordinate on the image, P_W = (X Y Z)^T is the three-dimensional coordinate of the landmark point in the world coordinate system, the matrix K is the camera intrinsic parameter matrix, T is the transformation matrix from the world coordinate system to the camera coordinate system, R is the rotation matrix of the camera, t is the translation vector of the camera from the world coordinate system to the camera coordinate system, and Z is the Z-axis coordinate of the landmark point in the camera coordinate system, namely the depth value of the landmark point in the camera coordinate system; the observation result is then subjected to minimum reprojection error, where π(·) is the projection function, so that the monocular camera pose from the visual observation, namely the rotation matrix R of the camera and the position p of the camera, can be obtained:

$\{R,\ t\} = \arg\min_{R,\ t} \sum_{i \in \chi} \rho\left( \left\| u_i - \pi\left( R X_i + t \right) \right\|^2 \right)$

wherein X_i is the position of the i-th landmark point in the world coordinate system, u_i is the observed projection coordinate of the i-th landmark point in the pixel plane, ρ(·) is a robust kernel function, χ is the set of all feature points (i.e., landmark points) extracted from the image captured by the monocular camera, Σ is the summation symbol, argmin denotes the values of the variables R and t that minimize the expression, and p is the position data of the camera, namely the translation vector t of the camera.

In the disclosed embodiment, it is first assumed that the visual angular velocity and the IMU angular velocity are aligned in time, and the spatial rotation and the gyroscope bias can then be obtained by minimizing the following cost function without time offset using the least squares method:

$\{R_S,\ b_\omega^C\} = \arg\min_{R_S,\ b_\omega^C} \sum_{i=1}^{n} \left\| \omega_{S}^{I}(t_i) - R_S\left( \omega_{C}^{V}(t_i) + b_\omega^C \right) \right\|^2$

wherein R_S represents the rotation matrix from the camera coordinate system to the IMU body coordinate system, t_d represents the time offset of the IMU relative to the camera, b_ω^C represents the bias of the gyroscope, ω_C^V is the angular velocity of the monocular camera, and ω_S^I is the IMU angular velocity.

In the embodiment of the present disclosure, because the visual angular velocity and the IMU angular velocity satisfy the spatial rotation and translation relationship of two sets of vectors, a closed-form algorithm is used to estimate an initial value of the spatial alignment, with the centroids of the two sets of angular velocities defined first:

$\bar{\omega}^V = \frac{1}{n}\sum_{i=1}^{n} \omega_{C}^{V}(t_i), \qquad \bar{\omega}^I = \frac{1}{n}\sum_{i=1}^{n} \omega_{S}^{I}(t_i)$

wherein $\bar{\omega}^V$ and $\bar{\omega}^I$ are the visual angular velocity centroid and the IMU angular velocity centroid respectively, and n is the number of data samples participating in the calculation, so that the two centroid-removed sets of angular velocities can be obtained:

$P^V = \left\{ \omega_{C}^{V}(t_i) - \bar{\omega}^V \right\}_{i=1}^{n}, \qquad Q^I = \left\{ \omega_{S}^{I}(t_i) - \bar{\omega}^I \right\}_{i=1}^{n}$

wherein P^V and Q^I are the centroid-removed visual angular velocity set and the centroid-removed IMU angular velocity set respectively; the angular velocity accumulation matrix W is then defined:

$W = \sum_{i=1}^{n} p_i\, q_i^T, \qquad p_i \in P^V,\ q_i \in Q^I$

performing singular value decomposition on W:

$W = U \Sigma V^T$

where $\Sigma = \mathrm{diag}(\sigma_1, \sigma_2, \sigma_3)$ is the diagonal matrix of singular values, with the diagonal elements arranged in descending order, and U and V are the orthogonal matrices obtained from the decomposition; thus the spatially aligned rotation matrix R_S is:

$R_S = V\, C\, U^T$

when W is a full-rank matrix, C = diag(1, 1, 1); when W is rank-deficient, C = diag(1, 1, -1); after the rotation matrix is solved, the angular velocity bias of the gyroscope in the IMU module is obtained from the following equation:

$b_\omega^C = R_S^T\, \bar{\omega}^I - \bar{\omega}^V$

in the disclosed embodiment, an iterative algorithm of time bias and angular velocity bias based on the golden section search is utilized,firstly determining the maximum time bias change +/-t (t) in positive and negative directionsd)maxThen, using golden section search for time bias, and solving R by space alignment initial value estimation closed solution methodSAndthen, the solution of time offset and space rotation is carried out in an iterative mode, specifically, the golden section ratio is firstly set as follows:

the initial value of the time offset is:

$a = -(t_d)_{max}, \qquad b = (t_d)_{max}$

the golden section test points of the two time offsets are respectively:

$\alpha = b - (b - a)/\rho_g, \qquad \beta = a + (b - a)/\rho_g$

when the error of the two probe points is less than the threshold τ_tolerance, their mean value is taken as the final time offset value:

$t_d = (\alpha + \beta)/2$;

after the optimal solution of R_S is obtained, the time offset term t_d is included in the objective function and the optimization result of the time offset is solved; the spatial alignment and the time alignment are then optimized alternately, with the objective function as follows:

$\{R_S,\ b_\omega^C,\ t_d\} = \arg\min_{R_S,\ b_\omega^C,\ t_d} \sum_{i=1}^{n} \left\| \omega_{S}^{I}(t_i) - R_S\left( \omega_{C}^{V}(t_i + t_d) + b_\omega^C \right) \right\|^2$

in the embodiment of the present disclosure, after obtaining the visual acceleration in the camera coordinate system and the filtered IMU acceleration, a least squares optimization is performed, and an objective function in the time domain with the gravity term is as follows:

wherein a_C^V is the expression of the visual acceleration in the camera coordinate system, a_C^I is the IMU acceleration in the camera coordinate system, g_W is the gravitational acceleration, R_CW(t) is the rotation matrix from the world coordinate system to the camera coordinate system at each time instant, b_a^C is the expression of the IMU acceleration bias in the camera coordinate system, and s is the scale factor; the visual acceleration and the IMU acceleration in the time domain are then transformed into the frequency domain; with the operator $\mathcal{F}\{\cdot\}$ denoting the Fourier transform, the visual and inertial accelerations in the frequency domain are calculated as follows:

$A^V(f) = \mathcal{F}\left\{ a_C^V(t) \right\}, \qquad A^I(f) = \mathcal{F}\left\{ a_C^I(t) \right\}$

wherein A^V(f) and A^I(f) are the Fourier transforms of the visual acceleration and the inertial acceleration respectively, and the Fourier transforms of the three axes are calculated separately; the final scale factor s is solved by minimizing the difference between the amplitude spectra |A^V(f)| and |A^I(f)|, with its cost function defined as follows:

wherein b_a is the bias of the IMU accelerometer, and f_max denotes the upper limit of the frequency range.

(III) advantageous effects

According to the technical scheme, the monocular visual information and IMU information fused scale estimation system and method disclosed by the invention have at least one or part of the following beneficial effects:

(1) Time offset alignment and spatial alignment between the vision and the IMU can be better realized without strict hardware requirements, and the hardware timestamps of the video sequence and the IMU sensor do not need to be strictly synchronized; the scale estimation module can be used for the initialization and back-end processing parts of a visual inertial odometer (VIO) and a visual inertial navigation system (VINS);

(2) the method and the device realize the fusion of the visual pose data and the IMU inertial data, solve the problem that a monocular camera has no scale information, solve the problem that the scale estimation precision of the current position-based visual inertial system is low, and realize high-precision scale estimation and positioning on low-cost mobile equipment.

Drawings

Fig. 1 is a schematic diagram illustrating a scale estimation system in which monocular visual information and IMU information are fused according to an embodiment of the present disclosure.

Fig. 2 is a detailed framework structure diagram of a scale estimation system in which monocular visual information and IMU information are fused according to an embodiment of the present disclosure.

Fig. 3 is a schematic flow chart of a scale estimation method in which monocular visual information and IMU information are fused according to an embodiment of the present disclosure.

Fig. 4 is a schematic flow chart of a time-space alignment algorithm based on angular velocity in the scale estimation method in which monocular visual information and IMU information are fused according to the embodiment of the present disclosure.

Fig. 5 is a schematic flow chart of an iterative solution algorithm of time bias and space bias based on golden section search in the scale estimation method of fusing monocular visual information and IMU information according to the embodiment of the present disclosure.

Fig. 6 is a schematic flow chart of an acceleration-based scale estimation algorithm in a frequency domain in the scale estimation method in which monocular visual information and IMU information are fused according to the embodiment of the present disclosure.

Fig. 7 is a block diagram of the VINS-mono system proposed by the Hong Kong University of Science and Technology in 2017.

Detailed Description

Scale estimation refers to scaling the result of visual SLAM (Simultaneous Localization and Mapping) to the real scale by using IMU (Inertial Measurement Unit) information. The IMU can compensate for the deficiency of monocular SLAM and can be used to estimate the scale factor of visual positioning and mapping. Aiming at the problem that a monocular camera cannot recover scale information, the present disclosure develops a loosely coupled scale estimation algorithm by combining a low-cost, consumer-grade IMU; the algorithm achieves accurate scale recovery with an offline visual SLAM module and relatively low-precision IMU data.

For the purpose of promoting a better understanding of the objects, aspects and advantages of the present disclosure, reference is made to the following detailed description taken in conjunction with the accompanying drawings.

In an embodiment of the present disclosure, a monocular visual information and IMU information fused scale estimation system is provided, which is shown in fig. 1 to 2, and includes:

the monocular vision SLAM module comprises a monocular camera and is used for acquiring visual image data;

the IMU module is used for obtaining IMU pose data of the IMU module; and

and the scale estimation module is used for fusing the visual image data and the IMU pose data to realize scale estimation and positioning mapping.

The visual image data, which is acquired by the monocular camera and carries no absolute scale information, is processed to obtain the pose data of the monocular camera, namely the visual pose data;

the visual pose data includes position data and pose data: the position data refers to the monocular camera position, typically represented by translation vector t; pose data refers to the pose of the monocular camera and is typically represented by a monocular camera (vision) rotation matrix R or a monocular camera (vision) rotation quaternion q. The pose data may also be represented by a transformation matrix T that includes both position information and pose information. Other pose information, such as monocular camera rotation angular velocity (visual angular velocity), can be derived from the visual pose data according to prior knowledge;

the IMU module comprises a gyroscope and an accelerometer;

the IMU pose data comprises: IMU acceleration, IMU angular velocity;

IMU acceleration may also be referred to as inertial acceleration, IMU angular velocity may also be referred to as inertial angular velocity;

the scale estimation module is an offline scale estimation module or an online scale estimation module, and in the embodiment of the present disclosure, the offline scale estimation module is described as a specific embodiment.

The visual image data contains no scale information. First, the monocular visual SLAM module performs visual positioning, carries out local bundle adjustment of the camera poses and 3D points through a nonlinear optimization method, and at the same time adopts a loop-closure detection method to achieve robust global positioning, so as to obtain a robust positioning map. After the positioning map without scale information is obtained, the offline scale estimation module achieves robust scale estimation on the IMU pose data by upsampling and by minimizing an acceleration cost function in the frequency domain, so that a good scale estimation result can still be achieved even when the data accuracy is low; with the scale effectively estimated, robust visual-inertial positioning and mapping can be further realized.

After the information from the monocular vision SLAM module is obtained, the subsequent offline scale estimation module can realize a SLAM front-end initialization and back-end scale estimation method that recovers the scale given the inertial measurements and the camera poses. The offline scale estimation module works in a decoupled, offline manner, fusing the positioning result of the monocular visual SLAM with the pose data of the IMU.

In the embodiment of the disclosure, a scale estimation method fusing monocular visual information and IMU information is further provided. The scale estimation method fuses the visual image data and the IMU pose data: first, the monocular visual information (visual image data) and the IMU information (IMU pose data) are aligned in time and space; meanwhile, a Rauch-Tung-Striebel (RTS) smoother is used to process the noisy camera positions, and an EKF (extended Kalman filter) is used to process the noisy IMU angular velocity and IMU acceleration information; finally, the final scale estimation is implemented by minimizing an acceleration measurement cost in the frequency domain, so that low-cost, high-precision scale estimation and positioning mapping can be realized with a monocular camera and a low-cost, consumer-grade IMU. With reference to fig. 2 to 6, the method for estimating the scale by fusing monocular visual information and IMU information includes:

step S1: acquiring visual image data through a monocular camera; obtaining IMU pose data through an IMU module;

the monocular vision SLAM module observes the landmark points through a monocular camera and projects the three-dimensional landmark points to an imaging plane of the camera to obtain pixel coordinates; the model during imaging adopts a pinhole imaging model; the single landmark point of the monocular vision SLAM module is observed according to the following equation:

$Z P_{uv} = K (R P_W + t) = K T P_W$

wherein P_uv is the pixel coordinate on the image, P_W = (X Y Z)^T is the three-dimensional coordinate of the landmark point in the world coordinate system, the matrix K is the camera intrinsic parameter matrix, T is the transformation matrix from the world coordinate system to the camera coordinate system, R is the rotation matrix of the camera, t is the translation vector of the camera from the world coordinate system to the camera coordinate system, and Z is the Z-axis coordinate of the landmark point in the camera coordinate system, namely the depth value of the landmark point in the camera coordinate system.
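As an illustration of the projection equation above, the following is a minimal sketch that projects a world-frame landmark into pixel coordinates with the pinhole model; the intrinsic parameters and the pose are placeholder values, not values from the disclosure.

```python
import numpy as np

def project_point(P_W, K, R, t):
    """Project a 3-D landmark P_W (world frame) to pixel coordinates using
    the pinhole model Z * P_uv = K (R * P_W + t). Returns (u, v) and depth Z."""
    P_C = R @ P_W + t          # landmark expressed in the camera frame
    Z = P_C[2]                 # depth along the optical axis
    uv_h = K @ P_C             # homogeneous pixel coordinates scaled by Z
    return uv_h[:2] / Z, Z

# Illustrative intrinsics and pose (placeholder values).
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0,   0.0,   1.0]])
R, t = np.eye(3), np.zeros(3)
uv, depth = project_point(np.array([0.5, -0.2, 4.0]), K, R, t)
```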

Then, the observation result is subjected to minimum reprojection error, where π(·) is the projection function, so that the monocular camera pose result of the visual observation, namely the rotation matrix R of the camera and the position p of the camera, can be obtained:

$\{R,\ t\} = \arg\min_{R,\ t} \sum_{i \in \chi} \rho\left( \left\| u_i - \pi\left( R X_i + t \right) \right\|^2 \right)$

wherein X_i is the position of the i-th landmark point in the world coordinate system, u_i is the observed projection coordinate of the i-th landmark point in the pixel plane, ρ(·) is a robust kernel function, χ is the set of all feature points (i.e., landmark points) extracted from the image captured by the monocular camera, Σ is the summation symbol, argmin denotes the values of the variables R and t that minimize the expression, and p is the position data of the camera, namely the translation vector t of the camera.
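A minimal sketch of this pose estimation step is given below, minimizing the reprojection error with a robust Huber loss standing in for the kernel ρ(·); the use of SciPy's least_squares solver and a rotation-vector parameterization are implementation assumptions, not choices prescribed by the disclosure.

```python
import numpy as np
from scipy.optimize import least_squares
from scipy.spatial.transform import Rotation

def estimate_pose(X, u_obs, K):
    """Estimate the camera rotation R and translation t by minimizing the
    reprojection error over landmarks X (N x 3) and observed pixels u_obs (N x 2).
    A Huber loss plays the role of the robust kernel rho(.)."""
    def residuals(params):
        R = Rotation.from_rotvec(params[:3]).as_matrix()
        t = params[3:]
        P_C = (R @ X.T).T + t                   # landmarks in the camera frame
        uv = (K @ P_C.T).T
        uv = uv[:, :2] / uv[:, 2:3]             # pi(.): perspective division
        return (uv - u_obs).ravel()

    x0 = np.zeros(6)                            # identity rotation, zero translation
    sol = least_squares(residuals, x0, loss="huber", f_scale=1.0)
    R_hat = Rotation.from_rotvec(sol.x[:3]).as_matrix()
    t_hat = sol.x[3:]
    return R_hat, t_hat                          # t_hat corresponds to the position p
```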

Step S2: performing time-space alignment on the IMU pose data and the monocular camera pose data based on the angular velocity;

time alignment refers to a method of recovering timestamps of visual camera frames and IMU pose data, which are not strictly recorded according to a clock, to actual conditions through an algorithm; the spatial alignment is realized by an algorithm which does not strictly refer to the position relation T of the camera and the IMU coordinate system on the hardware designBCA technique for matrix recovery to reality.

As shown in fig. 4, the gyroscope readings of the IMU module are first compared with the monocular camera angular velocity (from the visual image data), where the monocular camera angular velocity is calculated from the camera poses. The subsequent alignment optimization step iteratively performs least-squares optimization of the monocular camera angular velocity and the IMU angular velocity using the Gauss-Newton method, taking the resulting bias value as the optimal estimate, thereby performing time alignment and spatial alignment simultaneously. The specific technical method comprises the following steps:

rotation matrix R from camera coordinate system to IMU body coordinate system due to quantity to be solved in optimization equationS(i.e., spatial alignment) and IMU time offset t from the camerad(i.e., time alignment), and gyroscope biasDevice for placingOptimal results for spatial alignment cannot be obtained without knowledge of the time offset. To find the angular velocity of the monocular cameraAnd IMU angular velocityThe optimal spatial transformation relationship between the two is firstly assumed that the visual angular velocity and the inertial angular velocity are aligned in time, and can be obtained by minimizing a cost function without time offset by adopting a least square method:

wherein ω_C^V is the expression of the rotation angular velocity of the monocular camera in the camera coordinate system, ω_S^I is the expression of the angular velocity of the IMU in the IMU coordinate system, the superscript V denotes a visual quantity, the superscript I denotes an IMU quantity, the subscript C denotes the camera coordinate system, and the subscript S denotes the IMU coordinate system; b_ω^C is the expression of the angular velocity bias of the gyroscope in the camera coordinate system; R_S represents the rotation matrix from the camera coordinate system to the IMU body coordinate system, ‖C_ω‖² denotes the error value, Σ is the summation symbol, and argmin is the value of the subscript variables at which the expression reaches its minimum. Because the visual angular velocity and the IMU angular velocity in the above minimized error cost function satisfy the spatial rotation and translation relationship of two sets of vectors, the present disclosure performs a closed-form solution using the idea of ICP between 3D points; the following is the spatial alignment initial estimation algorithm proposed in this disclosure, which first defines the centroids of the two sets of angular velocities:

$\bar{\omega}^V = \frac{1}{n}\sum_{i=1}^{n} \omega_{C}^{V}(t_i), \qquad \bar{\omega}^I = \frac{1}{n}\sum_{i=1}^{n} \omega_{S}^{I}(t_i)$

wherein $\bar{\omega}^V$ and $\bar{\omega}^I$ are the visual angular velocity centroid and the IMU angular velocity centroid respectively, and n is the number of data samples participating in the calculation. The two centroid-removed sets of angular velocities can thus be obtained:

$P^V = \left\{ \omega_{C}^{V}(t_i) - \bar{\omega}^V \right\}_{i=1}^{n}, \qquad Q^I = \left\{ \omega_{S}^{I}(t_i) - \bar{\omega}^I \right\}_{i=1}^{n}$

wherein P^V and Q^I are the centroid-removed visual angular velocity set and the centroid-removed IMU angular velocity set respectively. The angular velocity accumulation matrix W is then defined:

$W = \sum_{i=1}^{n} p_i\, q_i^T, \qquad p_i \in P^V,\ q_i \in Q^I$

performing Singular Value Decomposition (SVD) on W,

$W = U \Sigma V^T$

where $\Sigma = \mathrm{diag}(\sigma_1, \sigma_2, \sigma_3)$ is the diagonal matrix of singular values, with the diagonal elements arranged in descending order, U and V are the orthogonal matrices obtained from the decomposition, and the superscript T denotes the transpose. Thus the spatially aligned rotation matrix R_S is:

$R_S = V\, C\, U^T$

where C = diag(1, 1, 1) if W is a full-rank matrix, and C = diag(1, 1, -1) if W is a rank-deficient matrix. After the rotation matrix is solved, the angular velocity bias of the IMU gyroscope can be obtained from the following equation:

$b_\omega^C = R_S^T\, \bar{\omega}^I - \bar{\omega}^V$
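The centroid-removal and SVD steps above can be summarized in a short sketch. The rank-based choice of C follows the rule stated in the text; the bias-recovery expression at the end matches the reconstructed formula above and is therefore an assumption, since the original equation did not survive extraction.

```python
import numpy as np

def align_angular_velocities(omega_v, omega_i):
    """Closed-form initial estimate of the rotation R_S from the camera frame to the
    IMU body frame, given time-aligned visual angular velocities omega_v (n x 3)
    and IMU angular velocities omega_i (n x 3), via centroid removal and SVD."""
    mu_v = omega_v.mean(axis=0)                # visual angular-velocity centroid
    mu_i = omega_i.mean(axis=0)                # IMU angular-velocity centroid
    P = omega_v - mu_v                         # centroid-removed visual set P^V
    Q = omega_i - mu_i                         # centroid-removed IMU set Q^I

    W = P.T @ Q                                # accumulation matrix: sum_i p_i q_i^T
    U, S, Vt = np.linalg.svd(W)
    # C follows the rule in the text: identity for full rank, diag(1, 1, -1) otherwise.
    full_rank = np.linalg.matrix_rank(W) == 3
    C = np.diag([1.0, 1.0, 1.0 if full_rank else -1.0])
    R_S = Vt.T @ C @ U.T                       # R_S = V C U^T (camera frame -> IMU frame)

    # Gyroscope bias expressed in the camera frame (assumed recovery formula).
    b_omega = R_S.T @ mu_i - mu_v
    return R_S, b_omega
```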

in the optimization calculation aiming at the time bias, the invention provides an iterative algorithm of the time bias and the angular velocity bias based on golden section search. The flow of the iterative solution algorithm for specific time offsets and space offsets is shown in fig. 5.

In this iterative bias algorithm, the maximum time bias variation ±(t_d)_max in the positive and negative directions is first determined; golden section search is then used for the time bias, and R_S and b_ω^C are solved in closed form; the time offset and the spatial rotation are then solved in an iterative manner. Specifically, the golden section ratio ρ_g is first set as:

$\rho_g = \frac{1 + \sqrt{5}}{2} \approx 1.618$

the initial values of the time offset a, b are:

$a = -(t_d)_{max}, \qquad b = (t_d)_{max}$

the golden section heuristic points α, β for the two time offsets are:

$\alpha = b - (b - a)/\rho_g, \qquad \beta = a + (b - a)/\rho_g$

when the error of two probe points is less than the threshold tautoleranceThen the mean value is the final time offset value td

$t_d = (\alpha + \beta)/2$;

After the optimal solution of R_S is obtained, the time offset term t_d is included in the objective function and the optimization result of the time offset is solved; the spatial alignment and time alignment optimizations are then carried out alternately.
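A compact sketch of the golden-section search over the time offset is given below. The alignment-error function reuses the align_angular_velocities helper from the earlier sketch; the linear interpolation of the shifted camera angular velocities and the interval-width stopping criterion (in place of the probe-point threshold τ_tolerance) are simplifying assumptions.

```python
import numpy as np

def alignment_error(t_d, t_cam, omega_v, t_imu, omega_i):
    """Alignment error for a candidate time offset t_d: shift the camera samples by
    t_d, interpolate them onto the IMU timestamps, solve the closed-form rotation,
    and return the mean residual norm."""
    omega_v_shifted = np.column_stack([
        np.interp(t_imu, t_cam + t_d, omega_v[:, k]) for k in range(3)])
    R_S, b = align_angular_velocities(omega_v_shifted, omega_i)
    return np.mean(np.linalg.norm(omega_i - (omega_v_shifted + b) @ R_S.T, axis=1))

def golden_section_time_offset(err, td_max, tol=1e-4):
    """Golden-section search for the time offset on [-td_max, td_max];
    err(t_d) is the alignment error for a candidate offset."""
    rho = (1.0 + np.sqrt(5.0)) / 2.0                  # golden ratio
    a, b = -td_max, td_max
    alpha = b - (b - a) / rho
    beta = a + (b - a) / rho
    while abs(b - a) > tol:
        if err(alpha) < err(beta):
            b, beta = beta, alpha                     # keep the lower-error half interval
            alpha = b - (b - a) / rho
        else:
            a, alpha = alpha, beta
            beta = a + (b - a) / rho
    return (alpha + beta) / 2.0                       # final time offset t_d

# Example binding (hypothetical data arrays):
# t_d = golden_section_time_offset(
#     lambda td: alignment_error(td, t_cam, omega_v, t_imu, omega_i), td_max=0.1)
```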

After the temporal and spatial alignment is completed, the estimation results of R_S and t_d are used to align the inertial acceleration with the camera acceleration in time and space, and scale estimation is then performed.

Step S3: and carrying out scale estimation in a frequency domain based on the acceleration to finish the scale estimation based on the fusion of monocular visual information and IMU information.

The purpose of the scale estimation is to find a scale factor s to determine the scale of reconstruction and localization. While the measurement of the accelerometer in a typical IMU includes earth gravity, the measurement is noisy and biased. Gravity and accelerometer bias are also estimated when using acceleration measurement data for scale estimation.

Because the offset between the IMU and camera timestamps is not a constant quantity, and in order to solve both the problem that the IMU and camera timestamps cannot be perfectly aligned in the time domain and the problem that position-based scale estimation suffers from drift error, the present disclosure proposes an acceleration-based scale estimation algorithm in the frequency domain. The specific algorithm flow is shown in fig. 6.

After the visual acceleration in the camera coordinate system and the filtered IMU acceleration are obtained, only a least squares optimization needs to be performed, and the objective function in the time domain with the gravity term can be written as follows:

$\{s,\ g_W,\ b_a^C\} = \arg\min_{s,\ g_W,\ b_a^C} \sum_{t} \left\| s\, a_C^V(t) - \left( a_C^I(t) + R_{CW}(t)\, g_W - b_a^C \right) \right\|^2, \qquad \text{s.t.}\ \|g_W\|_2 = 9.8$

in the above formula, the first and second carbon atoms are,for the expression of the visual acceleration in the camera coordinate system,for IMU acceleration in the camera coordinate system, gWIn order to be the acceleration of the gravity,for different momentsA rotation matrix from the world coordinate system to the camera coordinate system,for the expression of IMU acceleration bias in the camera coordinate system, s is the scale factor to be found. Then, the frequency domain transformation is carried out on the visual acceleration and IMU acceleration under the time domain, and an operator is setThe fourier transform is represented so that the visual and inertial accelerations in the frequency domain are calculated as follows:

wherein A^V(f) and A^I(f) are the Fourier transforms of the visual acceleration and the inertial acceleration respectively, the Fourier transforms of the three axes are calculated separately, and f denotes frequency. The final scale factor s is solved by minimizing the difference between the amplitude spectra |A^V(f)| and |A^I(f)|; its cost function is defined as follows:

in the above formula, Σ is a summation symbol, and argmin is a value of a subscript variable when the following formula reaches the minimum value. B is baIs the bias of the IMU accelerometer, with an upper maximum frequency limit of fmaxAnd (4) showing.

So far, the embodiments of the present disclosure have been described in detail with reference to the accompanying drawings. It is to be noted that, in the attached drawings or in the description, the implementation modes not shown or described are all the modes known by the ordinary skilled person in the field of technology, and are not described in detail. Further, the above definitions of the various elements and methods are not limited to the various specific structures, shapes or arrangements of parts mentioned in the examples, which may be easily modified or substituted by those of ordinary skill in the art.

From the above description, those skilled in the art should have a clear understanding of the disclosed scale estimation system and method fusing monocular visual information and IMU information.

In summary, the scale estimation system and method for fusing monocular visual information and IMU information disclosed by the present disclosure can achieve better time offset alignment and space alignment between the visual and IMU without strict hardware requirements, do not require that the hardware timestamps of the video sequence and IMU sensor must be strictly synchronized, and solve the problem that the existing VINS system needs to be strictly initialized. The method for recovering the scale information from the measurement data of the monocular camera and the IMU realizes the fusion of the visual pose data and the IMU inertial data, solves the problem that the monocular camera does not have scale information, solves the problem that the scale estimation precision of the current position-based visual inertial system is low, and realizes high-precision scale estimation and positioning on low-cost mobile equipment.

It should also be noted that directional terms, such as "upper", "lower", "front", "rear", "left", "right", and the like, used in the embodiments are only directions referring to the drawings, and are not intended to limit the scope of the present disclosure. Throughout the drawings, like elements are represented by like or similar reference numerals. Conventional structures or constructions will be omitted when they may obscure the understanding of the present disclosure.

And the shapes and sizes of the respective components in the drawings do not reflect actual sizes and proportions, but merely illustrate the contents of the embodiments of the present disclosure. Furthermore, in the claims, any reference signs placed between parentheses shall not be construed as limiting the claim.

Unless otherwise indicated, the numerical parameters set forth in the specification and attached claims are approximations that can vary depending upon the desired properties sought to be obtained by the present disclosure. In particular, all numbers expressing quantities of ingredients, reaction conditions, and so forth used in the specification and claims are to be understood as being modified in all instances by the term "about". Generally, the expression is meant to encompass variations of ± 10% in some embodiments, 5% in some embodiments, 1% in some embodiments, 0.5% in some embodiments by the specified amount.

Furthermore, the word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements.

The use of ordinal numbers such as "first," "second," "third," etc., in the specification and claims to modify a corresponding element does not by itself connote any ordinal number of the element or any ordering of one element from another or the order of manufacture, and the use of the ordinal numbers is only used to distinguish one element having a certain name from another element having a same name.

In addition, unless steps are specifically described or must occur in sequence, the order of the steps is not limited to that listed above and may be changed or rearranged as desired by the desired design. The embodiments described above may be mixed and matched with each other or with other embodiments based on design and reliability considerations, i.e., technical features in different embodiments may be freely combined to form further embodiments.

Those skilled in the art will appreciate that the modules in the device in an embodiment may be adaptively changed and disposed in one or more devices different from the embodiment. The modules or units or components of the embodiments may be combined into one module or unit or component, and furthermore they may be divided into a plurality of sub-modules or sub-units or sub-components. All of the features disclosed in this specification (including any accompanying claims, abstract and drawings), and all of the processes or elements of any method or apparatus so disclosed, may be combined in any combination, except combinations where at least some of such features and/or processes or elements are mutually exclusive. Each feature disclosed in this specification (including any accompanying claims, abstract and drawings) may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise. Also in the unit claims enumerating several means, several of these means may be embodied by one and the same item of hardware.

Similarly, it should be appreciated that in the foregoing description of exemplary embodiments of the disclosure, various features of the disclosure are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more of the various disclosed aspects. However, the disclosed method should not be interpreted as reflecting an intention that: that is, the claimed disclosure requires more features than are expressly recited in each claim. Rather, as the following claims reflect, disclosed aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of this disclosure.

The above-mentioned embodiments are intended to illustrate the objects, aspects and advantages of the present disclosure in further detail, and it should be understood that the above-mentioned embodiments are only illustrative of the present disclosure and are not intended to limit the present disclosure, and any modifications, equivalents, improvements and the like made within the spirit and principle of the present disclosure should be included in the scope of the present disclosure.
