Monocular visual-inertial SLAM method and device based on point-line feature fusion


Note: this technology, "Monocular visual-inertial SLAM method and device based on point-line feature fusion" (基于点线特征融合的单目视觉贯导SLAM方法及装置), was created by 樊渊, 潘高峰, 宋程, 潘天红, 李腾, 董翔, 程松松 and 方笑晗 on 2021-07-30. Its main content is as follows: the invention discloses a monocular visual-inertial SLAM method and device based on point-line feature fusion, wherein the method comprises the following steps: taking the images acquired by a monocular camera and the environmental information acquired by an IMU as input information; obtaining many image frames while the camera moves, and taking the frames whose point-line features are successfully matched as key frames; removing, from all key frames, those whose number of extracted feature points is below a first preset value, the camera motion trajectories formed by the remaining key frames constituting a plurality of local maps; removing the local maps whose number of point-line features is below a second preset value, and performing BA optimization on the remaining local maps; retracing the camera trajectory to perform loop detection and loop correction to obtain a new local map; performing global BA optimization on the global map, then updating the global map through loop detection and loop correction, and constructing a complete map of the environment. The advantage of the invention is higher accuracy and smaller trajectory error.

1. A monocular visual-inertial SLAM method based on point-line feature fusion, characterized by comprising the following steps:

S1: taking the image acquired by a monocular camera and the environmental information acquired by an IMU as input information;

S2: performing point-line feature extraction, point-line feature weighting and point-line feature matching on the input information; performing triangulation while extracting the point-line features to acquire the camera pose in real time; an IMU (inertial measurement unit) installed at the center of gravity of the monocular camera also acquires the camera pose in real time; fusing the camera poses obtained in these two ways to obtain the camera movement process; obtaining many image frames during the camera movement, and taking the frames whose point-line features are successfully matched as key frames;

S3: removing, from all key frames, those whose number of extracted feature points is below a first preset value; the camera motion trajectories formed by the remaining key frames constitute a plurality of local maps; removing the local maps whose number of point-line features is below a second preset value, and performing BA optimization on the remaining local maps;

S4: initializing the system and retracing the camera trajectory to perform loop detection and loop correction to obtain a new local map; after the loop detection and loop correction are finished, averaging the results of the multiple detections and corrections of the same local map to obtain a final local map;

S5: forming a global map from all the final local maps, performing global BA optimization on the global map, and updating the global map through loop detection and loop correction to construct a complete map of the environment.

2. The monocular visual-inertial SLAM method based on point-line feature fusion as claimed in claim 1, characterized in that step S1 is preceded by IMU initialization, wherein the IMU initialization comprises:

putting the parameters collected by the IMU together to form a state vector $y_k = \{s, R_{wg}, b, v_{0:k}\}$, where $s$ denotes the scale factor; $R_{wg}$ denotes the gravity direction, the gravity vector in the world coordinate system being $g = R_{wg}\, g_I$ with $g_I = (0, 0, G)^T$ and $G$ the magnitude of gravity; $b = (b^a, b^g)$ denotes the biases of the accelerometer and gyroscope, which are constant during initialization; and $v_{0:k}$ denotes the unscaled body velocities from the first frame image to the last frame image;

the posterior distribution maximized in IMU initialization is $p(y_k \mid I_{0:k}) \propto p(I_{0:k} \mid y_k)\, p(y_k)$, where $I_{0:k}$ denotes the measurement information from the first frame image to the last frame image, $p(y_k \mid I_{0:k})$ denotes the posterior distribution, $\propto$ denotes proportionality, $p(I_{0:k} \mid y_k)$ denotes the likelihood, and $p(y_k)$ denotes the prior distribution;

the maximum a posteriori estimate of the IMU is expressed as

$$y_k^* = \arg\max_{y_k}\, p(y_k \mid I_{0:k}) = \arg\max_{y_k}\, p(y_k) \prod_{i=1}^{k} p(I_{i-1,i} \mid s, g_{dir}, b, v_{i-1}, v_i)$$

where $I_{i-1,i}$ denotes the measurement information between frame $i-1$ and frame $i$, $g_{dir}$ denotes the gravity direction, $v_{i-1}$ denotes the unscaled body velocity of the $(i-1)$-th frame image, and $v_i$ denotes the unscaled body velocity of the $i$-th frame image;

taking the negative logarithm of the maximum a posteriori estimate of the IMU and assuming that the IMU pre-integration terms and the prior distribution obey Gaussian distributions, the final maximum a posteriori estimate of the IMU is expressed as

$$y_k^* = \arg\min_{y_k} \left( \lVert r_p \rVert_{\Sigma_p}^2 + \sum_{i=1}^{k} \lVert r_{I_{i-1,i}} \rVert_{\Sigma_{I_{i-1,i}}}^2 \right)$$

where $r_p$ denotes the prior residual term and $r_{I_{i-1,i}}$ denotes the pre-integration residual of two adjacent frames.

3. The monocular visual-inertial SLAM method based on point-line feature fusion as claimed in claim 1, characterized in that step S2 comprises: extracting point features through the FAST corner extraction algorithm, extracting line features through the line segment detector (LSD) algorithm, and weighting the point-line features in a low-texture scene, wherein the weight of the line features is higher than that of the point features.

4. The monocular visual-inertial SLAM method based on point-line feature fusion as claimed in claim 1, characterized in that the process of point-line feature matching in step S2 is as follows:

the point features in adjacent frames of the images obtained by the monocular camera are denoted $K_i$ and $Q_j$ respectively, $i, j = 1, 2, 3, \ldots$; $K_1$ is matched with the closest point feature among the $Q_j$, then $K_2$ is matched with the closest remaining point feature among the $Q_j$, and point feature matching is completed by analogy;

the line features in adjacent frames of the images acquired by the monocular camera are denoted $l_c$ and $l_c'$ respectively; the re-projection error is acquired by the formula

$$e_l = [d_s, d_e]^T, \qquad d = \frac{l_1 x + l_2 y + l_3}{\sqrt{l_1^2 + l_2^2}}$$

evaluated at the two endpoints of $l_c'$, and the line features are successfully matched when the re-projection error between the current line feature of the current frame and a line feature in an adjacent frame is minimal, wherein $l_1$ and $l_2$ denote the x-direction and y-direction coefficients of the line $l_c$ respectively, and $d_s$ and $d_e$ denote the distances from the two endpoints of $l_c'$ to $l_c$.

5. The monocular visual-inertial SLAM method based on point-line feature fusion as claimed in claim 1, characterized in that fusing the camera poses obtained in the two ways in step S2 to obtain the camera movement process comprises: obtaining a first movement trajectory of the camera from the triangulation measurements, and a second movement trajectory of the camera from the camera poses acquired by the IMU; after aligning the first movement trajectory and the second movement trajectory on the time axis, removing the poses whose deviation at the same instant exceeds a third preset value, and then averaging the first movement trajectory and the second movement trajectory to obtain the camera movement process.

6. The monocular visual-inertial SLAM method based on point-line feature fusion as claimed in claim 1, characterized in that retracing the camera trajectory for loop detection and loop correction in step S4 comprises: while the camera trajectory is retraced, loop detection mainly consists of searching the key frame database for a key frame similar to the currently detected key frame; if one is found, it is used as a closed-loop candidate frame, and the closed-loop candidate frame is used to re-localize the camera trajectory to obtain a new local map; loop correction averages, during each loop detection, the results of the multiple detections and corrections of the same local map to obtain the final local map.

7. A monocular visual-inertial SLAM device based on point-line feature fusion, characterized in that the device comprises:

the information acquisition module is used for taking the image acquired by the monocular camera and the environmental information acquired by the IMU as input information;

the key frame acquisition module is used for performing point-line feature extraction, point-line feature weighting and point-line feature matching on the input information; triangulation is performed while the point-line features are extracted to acquire the camera pose in real time; an IMU (inertial measurement unit) installed at the center of gravity of the monocular camera also acquires the camera pose in real time; the camera poses obtained in these two ways are fused to obtain the camera movement process; many image frames are obtained during the camera movement, and the frames whose point-line features are successfully matched are taken as key frames;

the BA optimization module is used for removing, from all key frames, those whose number of feature points is below a first preset value; the camera motion trajectories formed by the remaining key frames constitute a plurality of local maps; local maps whose number of point-line features is below a second preset value are removed, and BA optimization is performed on the remaining local maps;

the detection and correction module is used for initializing the system and retracing the camera trajectory to perform loop detection and loop correction to obtain a new local map; after the loop detection and loop correction are finished, the results of the multiple detections and corrections of the same local map are averaged to obtain a final local map;

and the global optimization module is used for forming a global map from all the final local maps, performing global BA optimization on the global map, and updating the global map through loop detection and loop correction to construct a complete map of the environment.

8. The monocular visual-inertial SLAM device based on point-line feature fusion as claimed in claim 7, characterized in that the information acquisition module is preceded by IMU initialization, wherein the IMU initialization process is:

putting the parameters collected by the IMU together to form a state vector $y_k = \{s, R_{wg}, b, v_{0:k}\}$, where $s$ denotes the scale factor; $R_{wg}$ denotes the gravity direction, the gravity vector in the world coordinate system being $g = R_{wg}\, g_I$ with $g_I = (0, 0, G)^T$ and $G$ the magnitude of gravity; $b = (b^a, b^g)$ denotes the biases of the accelerometer and gyroscope, which are constant during initialization; and $v_{0:k}$ denotes the unscaled body velocities from the first frame image to the last frame image;

the posterior distribution maximized in IMU initialization is $p(y_k \mid I_{0:k}) \propto p(I_{0:k} \mid y_k)\, p(y_k)$, where $I_{0:k}$ denotes the measurement information from the first frame image to the last frame image, $p(y_k \mid I_{0:k})$ denotes the posterior distribution, $\propto$ denotes proportionality, $p(I_{0:k} \mid y_k)$ denotes the likelihood, and $p(y_k)$ denotes the prior distribution;

the maximum a posteriori estimate of the IMU is expressed as

$$y_k^* = \arg\max_{y_k}\, p(y_k \mid I_{0:k}) = \arg\max_{y_k}\, p(y_k) \prod_{i=1}^{k} p(I_{i-1,i} \mid s, g_{dir}, b, v_{i-1}, v_i)$$

where $I_{i-1,i}$ denotes the measurement information between frame $i-1$ and frame $i$, $g_{dir}$ denotes the gravity direction, $v_{i-1}$ denotes the unscaled body velocity of the $(i-1)$-th frame image, and $v_i$ denotes the unscaled body velocity of the $i$-th frame image;

taking the negative logarithm of the maximum a posteriori estimate of the IMU and assuming that the IMU pre-integration terms and the prior distribution obey Gaussian distributions, the final maximum a posteriori estimate of the IMU is expressed as

$$y_k^* = \arg\min_{y_k} \left( \lVert r_p \rVert_{\Sigma_p}^2 + \sum_{i=1}^{k} \lVert r_{I_{i-1,i}} \rVert_{\Sigma_{I_{i-1,i}}}^2 \right)$$

where $r_p$ denotes the prior residual term and $r_{I_{i-1,i}}$ denotes the pre-integration residual of two adjacent frames.

9. The monocular visual-inertial SLAM device based on point-line feature fusion as claimed in claim 7, characterized in that the key frame acquisition module is further configured to: extract point features through the FAST corner extraction algorithm, extract line features through the line segment detector (LSD) algorithm, and weight the point-line features in a low-texture scene, wherein the weight of the line features is higher than that of the point features.

10. The monocular visual-inertial SLAM device based on point-line feature fusion as claimed in claim 7, characterized in that the process of point-line feature matching in the key frame acquisition module is:

the point features in adjacent frames of the images obtained by the monocular camera are denoted $K_i$ and $Q_j$ respectively, $i, j = 1, 2, 3, \ldots$; $K_1$ is matched with the closest point feature among the $Q_j$, then $K_2$ is matched with the closest remaining point feature among the $Q_j$, and point feature matching is completed by analogy;

the line features in adjacent frames of the images acquired by the monocular camera are denoted $l_c$ and $l_c'$ respectively; the re-projection error is acquired by the formula

$$e_l = [d_s, d_e]^T, \qquad d = \frac{l_1 x + l_2 y + l_3}{\sqrt{l_1^2 + l_2^2}}$$

evaluated at the two endpoints of $l_c'$, and the line features are successfully matched when the re-projection error between the current line feature of the current frame and a line feature in an adjacent frame is minimal, wherein $l_1$ and $l_2$ denote the x-direction and y-direction coefficients of the line $l_c$ respectively, and $d_s$ and $d_e$ denote the distances from the two endpoints of $l_c'$ to $l_c$.

Technical Field

The invention relates to the field of map construction, in particular to a monocular visual-inertial SLAM method and device based on point-line feature fusion.

Background

With the gradual development of SLAM (simultaneous localization and mapping) technology, visual SLAM algorithms have become a research hot spot. The monocular visual SLAM method based on point features uses a monocular camera as the sensor to detect and extract corner points in the surrounding environment. Such algorithms estimate the camera pose and a 3D map from the extracted feature points, achieving good performance with high computation speed. Mature existing algorithms include PTAM, ORB-SLAM2, and the like. However, the point-feature-based monocular visual SLAM method depends on the texture richness of the surrounding environment and does not perform well in low-texture scenes. The visual SLAM algorithm based on line features was proposed to address the insufficient or failed extraction of point features in low-texture scenes. It completes robot localization and environment mapping through line feature extraction and matching. Line features have the advantages of describing the edge information of the environment, enriching its geometric information, and describing low-texture environments well. Such algorithms do not depend on the point features of the environment but rely entirely on line feature extraction and matching; existing mature techniques include LSD, LCD, and the like. However, because the line-feature-based visual SLAM algorithm uses only line features, it places great demands on computing resources, so its computation speed is slow and the advantages in textured environments are lost. The dynamic visual SLAM algorithm based on deep learning identifies dynamic objects with a neural network, removes them, and finally matches and maps the environment excluding the dynamic objects. Its main technique is to use a neural network (such as a CNN) to identify dynamic objects among the extracted feature points, exclude them, and compute with the remaining static objects. Because these methods are either unsuitable for low-texture scenes or computationally complex, visual-inertial SLAM methods with point-line feature fusion have emerged in the prior art; by fusing points and lines, they improve localization accuracy while remaining suitable for low-texture scenes.

Chinese patent application publication No. CN109579840A discloses a tightly-coupled binocular visual-inertial SLAM method with point-line feature fusion, comprising the following steps: determining the transformation relation between the camera coordinate system and the inertial sensor coordinate system; establishing a point-line feature and IMU tracking thread, solving the three-dimensional coordinates of the initial stereo point-lines, predicting the positions of the feature point-lines with the IMU after IMU initialization, establishing correct feature data association, and solving the pose transformation of consecutive frames by combining the IMU and point-line feature re-projection error terms; establishing a point-line feature and IMU local bundle adjustment thread, and optimizing the point-line three-dimensional coordinates, key frame poses and IMU state quantities within a local key frame window; and establishing a point-line feature loop detection thread, computing bag-of-words model scores with point-line feature weighting to detect loops, and optimizing the global state quantities. That patent application can guarantee stability and high accuracy when feature points are few and the camera moves fast, and is applicable to low-texture scenes; however, it weights the point-lines in the bag-of-words model, so errors caused by insufficient point-line extraction in the early stage accumulate and cause large trajectory errors in the later stage.

Disclosure of Invention

The invention aims to solve the technical problem that, in the prior art, visual-inertial SLAM methods based on point-line fusion have large trajectory errors, so that the accuracy of the finally generated map is not high.

The invention solves the technical problem through the following technical means: a monocular visual-inertial SLAM method based on point-line feature fusion, comprising the following steps:

S1: taking the image acquired by a monocular camera and the environmental information acquired by an IMU as input information;

S2: performing point-line feature extraction, point-line feature weighting and point-line feature matching on the input information; performing triangulation while extracting the point-line features to acquire the camera pose in real time; an IMU (inertial measurement unit) installed at the center of gravity of the monocular camera also acquires the camera pose in real time; fusing the camera poses obtained in these two ways to obtain the camera movement process; obtaining many image frames during the camera movement, and taking the frames whose point-line features are successfully matched as key frames;

S3: removing, from all key frames, those whose number of extracted feature points is below a first preset value; the camera motion trajectories formed by the remaining key frames constitute a plurality of local maps; removing the local maps whose number of point-line features is below a second preset value, and performing BA optimization on the remaining local maps;

S4: initializing the system and retracing the camera trajectory to perform loop detection and loop correction to obtain a new local map; after the loop detection and loop correction are finished, averaging the results of the multiple detections and corrections of the same local map to obtain a final local map;

S5: forming a global map from all the final local maps, performing global BA optimization on the global map, and updating the global map through loop detection and loop correction to construct a complete map of the environment.

According to the method, the prior information input by the monocular camera and the IMU is used to determine the camera movement process; BA optimization is performed after data with large errors are removed; the camera trajectory is then retraced, and the final trajectory, i.e. the final local map, is obtained by trajectory averaging after loop detection and loop correction. This process has higher accuracy and smaller trajectory errors.

Further, before step S1, IMU initialization is also performed; the IMU initialization process is as follows:

putting the parameters collected by the IMU together to form a state vector $y_k = \{s, R_{wg}, b, v_{0:k}\}$, where $s$ denotes the scale factor; $R_{wg}$ denotes the gravity direction, the gravity vector in the world coordinate system being $g = R_{wg}\, g_I$ with $g_I = (0, 0, G)^T$ and $G$ the magnitude of gravity; $b = (b^a, b^g)$ denotes the biases of the accelerometer and gyroscope, which are constant during initialization; and $v_{0:k}$ denotes the unscaled body velocities from the first frame image to the last frame image;

the posterior distribution maximized in IMU initialization is $p(y_k \mid I_{0:k}) \propto p(I_{0:k} \mid y_k)\, p(y_k)$, where $I_{0:k}$ denotes the measurement information from the first frame image to the last frame image, $p(y_k \mid I_{0:k})$ denotes the posterior distribution, $\propto$ denotes proportionality, $p(I_{0:k} \mid y_k)$ denotes the likelihood, and $p(y_k)$ denotes the prior distribution;

the maximum a posteriori estimate of the IMU is expressed as

$$y_k^* = \arg\max_{y_k}\, p(y_k \mid I_{0:k}) = \arg\max_{y_k}\, p(y_k) \prod_{i=1}^{k} p(I_{i-1,i} \mid s, g_{dir}, b, v_{i-1}, v_i)$$

where $I_{i-1,i}$ denotes the measurement information between frame $i-1$ and frame $i$, $g_{dir}$ denotes the gravity direction, $v_{i-1}$ denotes the unscaled body velocity of the $(i-1)$-th frame image, and $v_i$ denotes the unscaled body velocity of the $i$-th frame image;

taking the negative logarithm of the maximum a posteriori estimate of the IMU and assuming that the IMU pre-integration terms and the prior distribution obey Gaussian distributions, the final maximum a posteriori estimate of the IMU is expressed as

$$y_k^* = \arg\min_{y_k} \left( \lVert r_p \rVert_{\Sigma_p}^2 + \sum_{i=1}^{k} \lVert r_{I_{i-1,i}} \rVert_{\Sigma_{I_{i-1,i}}}^2 \right)$$

where $r_p$ denotes the prior residual term and $r_{I_{i-1,i}}$ denotes the pre-integration residual of two adjacent frames.

Further, step S2 comprises: extracting point features through the FAST corner extraction algorithm, extracting line features through the line segment detector (LSD) algorithm, and weighting the point-line features in a low-texture scene, wherein the weight of the line features is higher than that of the point features.

Further, the process of point-line feature matching in step S2 is:

the point features in adjacent frames of the images obtained by the monocular camera are denoted $K_i$ and $Q_j$ respectively, $i, j = 1, 2, 3, \ldots$; $K_1$ is matched with the closest point feature among the $Q_j$, then $K_2$ is matched with the closest remaining point feature among the $Q_j$, and point feature matching is completed by analogy;

the line features in adjacent frames of the images acquired by the monocular camera are denoted $l_c$ and $l_c'$ respectively; the re-projection error is acquired by the formula

$$e_l = [d_s, d_e]^T, \qquad d = \frac{l_1 x + l_2 y + l_3}{\sqrt{l_1^2 + l_2^2}}$$

evaluated at the two endpoints of $l_c'$, and the line features are successfully matched when the re-projection error between the current line feature of the current frame and a line feature in an adjacent frame is minimal, wherein $l_1$ and $l_2$ denote the x-direction and y-direction coefficients of the line $l_c$ respectively, and $d_s$ and $d_e$ denote the distances from the two endpoints of $l_c'$ to $l_c$.

Further, fusing the camera poses obtained in the two ways in step S2 comprises: obtaining a first movement trajectory of the camera from the triangulation measurements, and a second movement trajectory of the camera from the camera poses acquired by the IMU; after aligning the first movement trajectory and the second movement trajectory on the time axis, removing the poses whose deviation at the same instant exceeds a third preset value, and then averaging the first movement trajectory and the second movement trajectory to obtain the camera movement process.

Further, retracing the camera trajectory for loop detection and loop correction in step S4 comprises: while the camera trajectory is retraced, loop detection mainly consists of searching the key frame database for a key frame similar to the currently detected key frame; if one is found, it is used as a closed-loop candidate frame, and the closed-loop candidate frame is used to re-localize the camera trajectory to obtain a new local map; loop correction averages, during each loop detection, the results of the multiple detections and corrections of the same local map to obtain the final local map.

The invention also provides a monocular visual-inertial SLAM device based on point-line feature fusion, the device comprising:

the information acquisition module is used for taking the image acquired by the monocular camera and the environmental information acquired by the IMU as input information;

the key frame acquisition module is used for performing point-line feature extraction, point-line feature weighting and point-line feature matching on the input information; triangulation is performed while the point-line features are extracted to acquire the camera pose in real time; an IMU (inertial measurement unit) installed at the center of gravity of the monocular camera also acquires the camera pose in real time; the camera poses obtained in these two ways are fused to obtain the camera movement process; many image frames are obtained during the camera movement, and the frames whose point-line features are successfully matched are taken as key frames;

the BA optimization module is used for removing, from all key frames, those whose number of feature points is below a first preset value; the camera motion trajectories formed by the remaining key frames constitute a plurality of local maps; local maps whose number of point-line features is below a second preset value are removed, and BA optimization is performed on the remaining local maps;

the detection and correction module is used for initializing the system and retracing the camera trajectory to perform loop detection and loop correction to obtain a new local map; after the loop detection and loop correction are finished, the results of the multiple detections and corrections of the same local map are averaged to obtain a final local map;

and the global optimization module is used for forming a global map from all the final local maps, performing global BA optimization on the global map, and updating the global map through loop detection and loop correction to construct a complete map of the environment.

Further, IMU initialization is performed before the information acquisition module operates; the IMU initialization process is as follows:

putting the parameters collected by the IMU together to form a state vector $y_k = \{s, R_{wg}, b, v_{0:k}\}$, where $s$ denotes the scale factor; $R_{wg}$ denotes the gravity direction, the gravity vector in the world coordinate system being $g = R_{wg}\, g_I$ with $g_I = (0, 0, G)^T$ and $G$ the magnitude of gravity; $b = (b^a, b^g)$ denotes the biases of the accelerometer and gyroscope, which are constant during initialization; and $v_{0:k}$ denotes the unscaled body velocities from the first frame image to the last frame image;

the posterior distribution maximized in IMU initialization is $p(y_k \mid I_{0:k}) \propto p(I_{0:k} \mid y_k)\, p(y_k)$, where $I_{0:k}$ denotes the measurement information from the first frame image to the last frame image, $p(y_k \mid I_{0:k})$ denotes the posterior distribution, $\propto$ denotes proportionality, $p(I_{0:k} \mid y_k)$ denotes the likelihood, and $p(y_k)$ denotes the prior distribution;

the maximum a posteriori estimate of the IMU is expressed as

$$y_k^* = \arg\max_{y_k}\, p(y_k \mid I_{0:k}) = \arg\max_{y_k}\, p(y_k) \prod_{i=1}^{k} p(I_{i-1,i} \mid s, g_{dir}, b, v_{i-1}, v_i)$$

where $I_{i-1,i}$ denotes the measurement information between frame $i-1$ and frame $i$, $g_{dir}$ denotes the gravity direction, $v_{i-1}$ denotes the unscaled body velocity of the $(i-1)$-th frame image, and $v_i$ denotes the unscaled body velocity of the $i$-th frame image;

taking the negative logarithm of the maximum a posteriori estimate of the IMU and assuming that the IMU pre-integration terms and the prior distribution obey Gaussian distributions, the final maximum a posteriori estimate of the IMU is expressed as

$$y_k^* = \arg\min_{y_k} \left( \lVert r_p \rVert_{\Sigma_p}^2 + \sum_{i=1}^{k} \lVert r_{I_{i-1,i}} \rVert_{\Sigma_{I_{i-1,i}}}^2 \right)$$

where $r_p$ denotes the prior residual term and $r_{I_{i-1,i}}$ denotes the pre-integration residual of two adjacent frames.

Further, the key frame acquisition module is further configured to: extract point features through the FAST corner extraction algorithm, extract line features through the line segment detector (LSD) algorithm, and weight the point-line features in a low-texture scene, wherein the weight of the line features is higher than that of the point features.

Further, the process of the point-line feature matching in the key frame acquisition module is as follows:

the point features in adjacent frames of the images obtained by the monocular camera are denoted $K_i$ and $Q_j$ respectively, $i, j = 1, 2, 3, \ldots$; $K_1$ is matched with the closest point feature among the $Q_j$, then $K_2$ is matched with the closest remaining point feature among the $Q_j$, and point feature matching is completed by analogy;

the line features in adjacent frames of the images acquired by the monocular camera are denoted $l_c$ and $l_c'$ respectively; the re-projection error is acquired by the formula

$$e_l = [d_s, d_e]^T, \qquad d = \frac{l_1 x + l_2 y + l_3}{\sqrt{l_1^2 + l_2^2}}$$

evaluated at the two endpoints of $l_c'$, and the line features are successfully matched when the re-projection error between the current line feature of the current frame and a line feature in an adjacent frame is minimal, wherein $l_1$ and $l_2$ denote the x-direction and y-direction coefficients of the line $l_c$ respectively, and $d_s$ and $d_e$ denote the distances from the two endpoints of $l_c'$ to $l_c$.

Further, the process of fusing the camera poses obtained in the two ways in the key frame acquisition module to obtain the camera movement comprises: obtaining a first movement trajectory of the camera from the triangulation measurements, and a second movement trajectory of the camera from the camera poses acquired by the IMU; after aligning the first movement trajectory and the second movement trajectory on the time axis, removing the poses whose deviation at the same instant exceeds a third preset value, and then averaging the first movement trajectory and the second movement trajectory to obtain the camera movement process.

Further, retracing the camera trajectory for loop detection and loop correction in the detection and correction module comprises: while the camera trajectory is retraced, loop detection mainly consists of searching the key frame database for a key frame similar to the currently detected key frame; if one is found, it is used as a closed-loop candidate frame, and the closed-loop candidate frame is used to re-localize the camera trajectory to obtain a new local map; loop correction averages, during each loop detection, the results of the multiple detections and corrections of the same local map to obtain the final local map.

The invention has the advantages that: according to the method, the monocular camera and the IMU are respectively input with prior information to determine the camera moving process, BA optimization is carried out after data with large errors are removed, then the camera track is repeated, and the final track, namely the final local map, is obtained through cyclic detection and cyclic correction and track averaging, and the process has higher precision and smaller track errors.

Drawings

FIG. 1 is a flowchart of the monocular visual-inertial SLAM method based on point-line feature fusion disclosed in an embodiment of the present invention;

FIG. 2 is a schematic diagram of the relationship between a projection line and a matching line segment in the monocular visual-inertial SLAM method based on point-line feature fusion disclosed in an embodiment of the present invention;

FIG. 3 is a schematic diagram of the distances from the endpoints of a matching line segment to the projection line in the monocular visual-inertial SLAM method based on point-line feature fusion disclosed in an embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the embodiments of the present invention, and it is obvious that the described embodiments are some embodiments of the present invention, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

Example 1

As shown in FIG. 1, the monocular visual-inertial SLAM method based on point-line feature fusion comprises the following steps:

S1: taking the image acquired by a monocular camera and the environmental information acquired by an IMU as input information. The camera and the IMU run in parallel, extracting image information and environment measurement information respectively; the environment information of the camera and the IMU is then initialized in a tightly-coupled manner. Tight coupling means that the information collected by the two sensors is integrated together as the input information, rather than each being used as input information separately. It should be noted that, before step S1, IMU initialization is also performed; the IMU initialization process is:

putting the parameters collected by the IMU together to form a state vector $y_k = \{s, R_{wg}, b, v_{0:k}\}$, where $s$ denotes the scale factor; $R_{wg}$ denotes the gravity direction, the gravity vector in the world coordinate system being $g = R_{wg}\, g_I$ with $g_I = (0, 0, G)^T$ and $G$ the magnitude of gravity; $b = (b^a, b^g)$ denotes the biases of the accelerometer and gyroscope, which are constant during initialization; and $v_{0:k}$ denotes the unscaled body velocities from the first frame image to the last frame image;

the posterior distribution maximized in IMU initialization is $p(y_k \mid I_{0:k}) \propto p(I_{0:k} \mid y_k)\, p(y_k)$, where $I_{0:k}$ denotes the measurement information from the first frame image to the last frame image, $p(y_k \mid I_{0:k})$ denotes the posterior distribution, $\propto$ denotes proportionality, $p(I_{0:k} \mid y_k)$ denotes the likelihood, and $p(y_k)$ denotes the prior distribution;

the maximum a posteriori estimate of the IMU is expressed as

$$y_k^* = \arg\max_{y_k}\, p(y_k \mid I_{0:k}) = \arg\max_{y_k}\, p(y_k) \prod_{i=1}^{k} p(I_{i-1,i} \mid s, g_{dir}, b, v_{i-1}, v_i)$$

where $I_{i-1,i}$ denotes the measurement information between frame $i-1$ and frame $i$, $g_{dir}$ denotes the gravity direction, $v_{i-1}$ denotes the unscaled body velocity of the $(i-1)$-th frame image, and $v_i$ denotes the unscaled body velocity of the $i$-th frame image.

Taking the negative logarithm of the maximum a posteriori estimate of the IMU and assuming that the IMU pre-integration terms and the prior distribution obey Gaussian distributions, the final maximum a posteriori estimate of the IMU is expressed as

$$y_k^* = \arg\min_{y_k} \left( \lVert r_p \rVert_{\Sigma_p}^2 + \sum_{i=1}^{k} \lVert r_{I_{i-1,i}} \rVert_{\Sigma_{I_{i-1,i}}}^2 \right)$$

where $r_p$ denotes the prior residual term and $r_{I_{i-1,i}}$ denotes the pre-integration residual of two adjacent frames.
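To make the Gaussian MAP objective concrete, the following minimal sketch evaluates it given precomputed residuals and covariances; the function and argument names are illustrative assumptions, and the construction of the pre-integration residuals themselves is not shown here:

```python
import numpy as np

def imu_init_objective(r_prior, Sigma_prior, residuals, Sigmas):
    """Negative-log MAP objective for IMU initialization (Gaussian case).

    Evaluates ||r_p||^2_{Sigma_p} + sum_i ||r_{I_{i-1,i}}||^2_{Sigma_I},
    where ||r||^2_S = r^T S^{-1} r is the squared Mahalanobis norm.
    """
    def mahalanobis_sq(r, S):
        return float(r @ np.linalg.solve(S, r))

    cost = mahalanobis_sq(r_prior, Sigma_prior)   # prior residual term
    for r, S in zip(residuals, Sigmas):           # one term per frame pair
        cost += mahalanobis_sq(r, S)
    return cost
```

In practice this objective would be minimized over $y_k$ with a nonlinear least squares solver rather than merely evaluated.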

S2: performing point-line feature extraction, point-line feature weighting and point-line feature matching based on input information, performing triangulation measurement while extracting the point-line features to acquire the pose of the camera in real time, installing an IMU (inertial measurement unit) at the gravity center of a monocular camera, acquiring the pose of the camera in real time by the IMU, fusing the poses of the cameras acquired in the two modes to acquire a camera moving process, acquiring a plurality of frame images in the camera moving process, and taking an image frame of which the point-line features are successfully matched in all the frame images as a key frame;

in this embodiment, point feature extraction is performed by a FAST corner extraction algorithm, line feature extraction is performed by a line segment detector LSD algorithm, and in a low-texture scene, the point-line feature is weighted and the weight of the line feature is higher than that of the point feature. The FAST corner extraction algorithm belongs to the prior art, and the general process is as follows:

(1) Select a pixel $p$ in the image and let its intensity be $I_p$.

(2) Set a threshold $T$ (e.g., 10% of $I_p$).

(3) With pixel $p$ as the center, select the 16 pixels on a circle of radius 3.

(4) If the brightness of $N$ consecutive points on the selected circle is greater than $I_p + T$ or less than $I_p - T$, pixel $p$ can be considered a point feature.

(5) Loop over the above four steps, performing the same operation for every pixel.
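The extraction step can be sketched with off-the-shelf detectors; the snippet below uses OpenCV's FAST and LSD implementations as stand-ins (the threshold value is illustrative, and cv2.createLineSegmentDetector is unavailable in some OpenCV builds for licensing reasons):

```python
import cv2

def extract_point_line_features(gray):
    """Detect FAST corners (point features) and LSD segments (line features).

    A minimal sketch; threshold=20 is illustrative, not from the patent.
    """
    fast = cv2.FastFeatureDetector_create(threshold=20, nonmaxSuppression=True)
    keypoints = fast.detect(gray, None)

    lsd = cv2.createLineSegmentDetector()
    lines, _, _, _ = lsd.detect(gray)    # (N, 1, 4) array of x1, y1, x2, y2

    return keypoints, lines
```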

When extracting point and line features under low texture, point features are not easy to extract, so the point and line features are weighted. Since the invention targets low-texture scenes, more line features than point features are extracted; therefore, in the weighting process line features are primary and point features auxiliary. The weighting is applied to the valid point-line features extracted from the environment, so that when line features are extracted but no point features are found over consecutive frames, the map can still be initialized from the line features alone without disturbing the orderly operation of the system. The invention therefore divides the proportion of point and line features according to the number of line features, avoiding complex weight-distribution computation and ensuring the real-time performance of the system. The threshold for the number of line features is set to $N$.

The point-line feature weights $w_p$ and $w_l$ are distributed as follows:

$N_l > N$: $w_p = 0.2$, $w_l = 0.8$

$N_l < N$:

where $N_l$ denotes the number of successful line matches.
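A weight assignment along these lines can be sketched as follows; note that the $N_l < N$ branch is an assumed proportional fallback, since that formula is not reproduced in the text above:

```python
def point_line_weights(n_lines, n_points, N):
    """Assign point/line weights (w_p, w_l) from the line-match count N_l.

    The N_l > N branch uses the fixed weights stated above; the N_l < N
    branch is an assumption of this sketch (proportional split).
    """
    if n_lines > N:
        return 0.2, 0.8                  # w_p, w_l: lines dominate
    total = n_lines + n_points
    if total == 0:
        return 0.5, 0.5                  # assumed neutral default
    w_l = n_lines / total                # assumption: proportional split
    return 1.0 - w_l, w_l
```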

The process of point-line feature matching is as follows:

the point features in adjacent frames of the images obtained by the monocular camera are denoted $K_i$ and $Q_j$ respectively, $i, j = 1, 2, 3, \ldots$; $K_1$ is matched with the closest point feature among the $Q_j$, then $K_2$ is matched with the closest remaining point feature among the $Q_j$, and point feature matching is completed by analogy.
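This greedy nearest-neighbour pairing can be sketched as follows; descriptor distance is an assumed stand-in for the unspecified distance metric:

```python
import numpy as np

def match_points(desc_k, desc_q):
    """Greedy nearest-neighbour matching of point features K_i to Q_j.

    K_1 pairs with its closest Q_j, K_2 with the closest remaining Q_j,
    and so on, per the scheme described above.
    """
    unused = list(range(len(desc_q)))
    matches = []
    for i, d in enumerate(desc_k):
        if not unused:
            break
        dists = [np.linalg.norm(d - desc_q[j]) for j in unused]
        best = unused[int(np.argmin(dists))]
        matches.append((i, best))
        unused.remove(best)
    return matches
```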

Before line feature matching, the feature lines in space should be represented mathematically for computer processing. Let the homogeneous coordinates of the two endpoints of a space line $L$ be $P_1 = [x_1, y_1, z_1, w_1]^T$ and $P_2 = [x_2, y_2, z_2, w_2]^T$; the Plücker matrix is defined as $T = P_1 P_2^T - P_2 P_1^T$. Their inhomogeneous coordinates are $p_1 = [x_1, y_1, z_1]^T / w_1$ and $p_2 = [x_2, y_2, z_2]^T / w_2$. The Plücker coordinates of the line are:

$$L = \begin{bmatrix} n \\ v \end{bmatrix} = \begin{bmatrix} p_1 \times p_2 \\ p_2 - p_1 \end{bmatrix}$$

where $L$ is a $6 \times 1$ vector containing $n$ and $v$; $n$ is the moment vector of the line, $v$ is the direction vector of the line, and $n$ is perpendicular to the plane determined by the origin and the line. $w_1$ and $w_2$ are the homogeneous factors.

The transformation relationship between the Plücker matrix $T$ and the coordinates $L$ is:

$$T = \begin{bmatrix} [n]_\times & v \\ -v^T & 0 \end{bmatrix}$$

where $[n]_\times$ denotes the antisymmetric (skew-symmetric) matrix of the moment vector $n$ of the line.

A line transformation matrix $H_{cw}$ is defined, composed of the rotation matrix $R_{cw}$ and the translation vector $t_{cw}$. $L_c$ denotes the coordinates of the space line $L_w$ transformed from the world coordinate system to the camera coordinate system. In Plücker coordinates, $L_c$ is expressed as:

$$L_c = \begin{bmatrix} n_c \\ v_c \end{bmatrix} = H_{cw} L_w = \begin{bmatrix} R_{cw} & [t_{cw}]_\times R_{cw} \\ 0 & R_{cw} \end{bmatrix} L_w$$

where $[t_{cw}]_\times$ denotes the antisymmetric matrix of the translation vector.

Let the projection of the line $L_c$ onto the image plane be $l_c$; in Plücker coordinates the projection is expressed as:

$$l_c = K_l n_c = \begin{bmatrix} f_y & 0 & 0 \\ 0 & f_x & 0 \\ -f_y c_x & -f_x c_y & f_x f_y \end{bmatrix} n_c$$

where $K_l$ denotes the line projection matrix; $f_x$, $f_y$, $c_x$, $c_y$ denote the focal lengths and image center from the camera intrinsic parameters (each camera has a fixed focal length and image center); $n_c$ denotes the moment vector in the Plücker expression; and $l_1$, $l_2$, $l_3$ denote the coefficients of the line $l_c$, i.e. $l_c : l_1 x + l_2 y + l_3 z = 0$.

As shown in FIG. 2 and FIG. 3, the line features in adjacent frames of the images obtained by the monocular camera are denoted $l_c$ and $l_c'$ respectively; the re-projection error is acquired by the formula

$$e_l = [d_s, d_e]^T, \qquad d = \frac{l_1 x + l_2 y + l_3}{\sqrt{l_1^2 + l_2^2}}$$

evaluated at the two endpoints of $l_c'$, and the line features are successfully matched when the re-projection error between the current line feature of the current frame and a line feature in an adjacent frame is minimal, wherein $l_1$ and $l_2$ denote the x-direction and y-direction coefficients of the line $l_c$ respectively, and $d_s$ and $d_e$ denote the distances from the two endpoints of $l_c'$ to $l_c$.
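Putting the Plücker construction, the projection matrix $K_l$, and the endpoint-distance error together, a minimal numeric sketch (function names are illustrative; the world-to-camera line transform is omitted) might look like:

```python
import numpy as np

def plucker_from_endpoints(p1, p2):
    """Plücker coordinates L = (n, v) of the line through 3D points p1, p2."""
    n = np.cross(p1, p2)          # moment vector, normal to plane (origin, line)
    v = p2 - p1                   # direction vector
    return n, v

def project_line(n_c, fx, fy, cx, cy):
    """Project the camera-frame moment vector n_c to the image line (l1, l2, l3)."""
    K_l = np.array([[fy, 0.0, 0.0],
                    [0.0, fx, 0.0],
                    [-fy * cx, -fx * cy, fx * fy]])
    return K_l @ n_c

def line_reprojection_error(l, s, e):
    """Signed distances from the matched segment's endpoints s, e to line l."""
    norm = np.hypot(l[0], l[1])   # sqrt(l1^2 + l2^2)
    dist = lambda p: (l[0] * p[0] + l[1] * p[1] + l[2]) / norm
    return np.array([dist(s), dist(e)])
```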

The process of fusing the camera poses obtained in the two ways in step S2 to obtain the camera movement comprises: obtaining a first movement trajectory of the camera from the triangulation measurements, and a second movement trajectory of the camera from the camera poses acquired by the IMU; after aligning the first movement trajectory and the second movement trajectory on the time axis, removing the poses whose deviation at the same instant exceeds a third preset value, and then averaging the first movement trajectory and the second movement trajectory to obtain the camera movement process.
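The reject-then-average fusion can be sketched as follows, under the simplifying assumption that both trajectories are position arrays sampled at the same timestamps:

```python
import numpy as np

def fuse_trajectories(traj_cam, traj_imu, threshold):
    """Fuse time-aligned camera positions from triangulation and the IMU.

    Poses whose deviation at the same instant exceeds the third preset
    value `threshold` are removed; the remaining pairs are averaged.
    """
    traj_cam = np.asarray(traj_cam, dtype=float)   # (N, 3) positions
    traj_imu = np.asarray(traj_imu, dtype=float)   # (N, 3) positions
    deviation = np.linalg.norm(traj_cam - traj_imu, axis=1)
    keep = deviation <= threshold
    return (traj_cam[keep] + traj_imu[keep]) / 2.0
```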

S3: removing, from all key frames, those whose number of extracted feature points is below a first preset value; the camera motion trajectories formed by the remaining key frames constitute a plurality of local maps; local maps whose number of point-line features is below a second preset value are removed, and BA optimization is performed on the remaining local maps. The BA optimization process is prior art; a document, "Fourteen Lectures on Visual SLAM: Back End, Part 1", uploaded to a CSDN blog by Jinterest on January 14, 2020, discloses the specific process of BA optimization as follows:

the world coordinates are converted to camera coordinates using the rotation matrix R and translation matrix t of the camera.

P′=Rp+t=[X′,Y′,Z′]

Projecting P' to a normalization plane to obtain a normalization coordinate:

the distortion condition of the normalized coordinates is considered to obtain the original pixel coordinates before distortion removal

According to the internal reference model, pixel coordinates are calculated:
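The whole projection chain can be sketched in a few lines; the radial coefficients $k_1$, $k_2$ follow the usual radial distortion model, an assumption of this sketch:

```python
import numpy as np

def project_point(R, t, p, fx, fy, cx, cy, k1=0.0, k2=0.0):
    """Projection chain h(T, p): world point -> pixel coordinates.

    Rigid transform, normalization, radial distortion, intrinsic model.
    """
    X, Y, Z = R @ p + t                     # camera coordinates P'
    uc, vc = X / Z, Y / Z                   # normalized plane
    r2 = uc * uc + vc * vc
    d = 1.0 + k1 * r2 + k2 * r2 * r2        # radial distortion factor
    return np.array([fx * uc * d + cx,      # pixel coordinates u_s, v_s
                     fy * vc * d + cy])
```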

the observation data of the camera during the movement is recorded as:

z=h(x,y)

x denotes the pose of the camera, that is, it corresponds to the transformation matrix T, y denotes the three-dimensional point p, and then the observation error is:

e=z-h(T,p)

taking into account the observations at other times. Let zijIn-position posture TiWhere observation road sign pjThe resulting data, then the overall cost function is:

the least square solution is equivalent to adjusting the pose and the road sign at the same time, namely BA.

S4: initializing the system and retracing the camera trajectory to perform loop detection and loop correction to obtain a new local map; after the loop detection and loop correction are finished, averaging the results of the multiple detections and corrections of the same local map to obtain a final local map. Retracing the camera trajectory for loop detection and loop correction comprises: while the camera trajectory is retraced, loop detection mainly consists of searching the key frame database for a key frame similar to the currently detected key frame; if one is found, it is used as a closed-loop candidate frame, and the closed-loop candidate frame is used to re-localize the camera trajectory to obtain a new local map; loop correction averages, during each loop detection, the results of the multiple detections and corrections of the same local map to obtain the final local map.
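The candidate search at the heart of loop detection can be sketched as follows; `similarity` is assumed to be a keyframe similarity score (e.g. a bag-of-words score), and the names are illustrative:

```python
def detect_loop(current_kf, keyframe_db, similarity, threshold):
    """Search the keyframe database for a frame similar to the current one.

    The best frame scoring above `threshold` is returned as the
    closed-loop candidate used to re-localize the camera trajectory.
    """
    candidates = [kf for kf in keyframe_db
                  if similarity(current_kf, kf) > threshold]
    if not candidates:
        return None
    return max(candidates, key=lambda kf: similarity(current_kf, kf))
```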

S5: forming a global map from all the final local maps, performing global BA optimization on the global map, and updating the global map through loop detection and loop correction to construct a complete map of the environment.

Through the above technical scheme, the prior information input by the monocular camera and the IMU is used to determine the camera movement process; BA optimization is performed after data with large errors are removed; the camera trajectory is then retraced, and the final trajectory, i.e. the final local map, is obtained by trajectory averaging after loop detection and loop correction. This process has higher accuracy and smaller trajectory errors.

Example 2

Based on embodiment 1 of the present invention, embodiment 2 of the present invention further provides a monocular visual-inertial SLAM device based on point-line feature fusion, the device comprising:

the information acquisition module is used for taking the image acquired by the monocular camera and the environmental information acquired by the IMU as input information;

the key frame acquisition module is used for performing point-line feature extraction, point-line feature weighting and point-line feature matching on the input information; triangulation is performed while the point-line features are extracted to acquire the camera pose in real time; an IMU (inertial measurement unit) installed at the center of gravity of the monocular camera also acquires the camera pose in real time; the camera poses obtained in these two ways are fused to obtain the camera movement process; many image frames are obtained during the camera movement, and the frames whose point-line features are successfully matched are taken as key frames;

the BA optimization module is used for removing, from all key frames, those whose number of feature points is below a first preset value; the camera motion trajectories formed by the remaining key frames constitute a plurality of local maps; local maps whose number of point-line features is below a second preset value are removed, and BA optimization is performed on the remaining local maps;

the detection and correction module is used for initializing the system and retracing the camera trajectory to perform loop detection and loop correction to obtain a new local map; after the loop detection and loop correction are finished, the results of the multiple detections and corrections of the same local map are averaged to obtain a final local map;

and the global optimization module is used for forming a global map from all the final local maps, performing global BA optimization on the global map, and updating the global map through loop detection and loop correction to construct a complete map of the environment.

Specifically, IMU initialization is performed before the information acquisition module operates; the IMU initialization process is as follows:

putting the parameters collected by the IMU together to form a state vector $y_k = \{s, R_{wg}, b, v_{0:k}\}$, where $s$ denotes the scale factor; $R_{wg}$ denotes the gravity direction, the gravity vector in the world coordinate system being $g = R_{wg}\, g_I$ with $g_I = (0, 0, G)^T$ and $G$ the magnitude of gravity; $b = (b^a, b^g)$ denotes the biases of the accelerometer and gyroscope, which are constant during initialization; and $v_{0:k}$ denotes the unscaled body velocities from the first frame image to the last frame image;

the posterior distribution maximized in IMU initialization is $p(y_k \mid I_{0:k}) \propto p(I_{0:k} \mid y_k)\, p(y_k)$, where $I_{0:k}$ denotes the measurement information from the first frame image to the last frame image, $p(y_k \mid I_{0:k})$ denotes the posterior distribution, $\propto$ denotes proportionality, $p(I_{0:k} \mid y_k)$ denotes the likelihood, and $p(y_k)$ denotes the prior distribution;

the maximum a posteriori estimate of the IMU is expressed as

$$y_k^* = \arg\max_{y_k}\, p(y_k \mid I_{0:k}) = \arg\max_{y_k}\, p(y_k) \prod_{i=1}^{k} p(I_{i-1,i} \mid s, g_{dir}, b, v_{i-1}, v_i)$$

where $I_{i-1,i}$ denotes the measurement information between frame $i-1$ and frame $i$, $g_{dir}$ denotes the gravity direction, $v_{i-1}$ denotes the unscaled body velocity of the $(i-1)$-th frame image, and $v_i$ denotes the unscaled body velocity of the $i$-th frame image;

taking the negative logarithm of the maximum a posteriori estimate of the IMU and assuming that the IMU pre-integration terms and the prior distribution obey Gaussian distributions, the final maximum a posteriori estimate of the IMU is expressed as

$$y_k^* = \arg\min_{y_k} \left( \lVert r_p \rVert_{\Sigma_p}^2 + \sum_{i=1}^{k} \lVert r_{I_{i-1,i}} \rVert_{\Sigma_{I_{i-1,i}}}^2 \right)$$

where $r_p$ denotes the prior residual term and $r_{I_{i-1,i}}$ denotes the pre-integration residual of two adjacent frames.

Specifically, the key frame acquiring module is further configured to: and (3) extracting point features through a FAST corner extraction algorithm, extracting line features through a line segment detector LSD algorithm, and weighting the point line features in a low-texture scene, wherein the weight of the line features is higher than that of the point features.

Specifically, the process of point-line feature matching in the key frame acquisition module is as follows:

the point features in adjacent frames of the images obtained by the monocular camera are denoted $K_i$ and $Q_j$ respectively, $i, j = 1, 2, 3, \ldots$; $K_1$ is matched with the closest point feature among the $Q_j$, then $K_2$ is matched with the closest remaining point feature among the $Q_j$, and point feature matching is completed by analogy;

the line features in adjacent frames of the images acquired by the monocular camera are denoted $l_c$ and $l_c'$ respectively; the re-projection error is acquired by the formula

$$e_l = [d_s, d_e]^T, \qquad d = \frac{l_1 x + l_2 y + l_3}{\sqrt{l_1^2 + l_2^2}}$$

evaluated at the two endpoints of $l_c'$, and the line features are successfully matched when the re-projection error between the current line feature of the current frame and a line feature in an adjacent frame is minimal, wherein $l_1$ and $l_2$ denote the x-direction and y-direction coefficients of the line $l_c$ respectively, and $d_s$ and $d_e$ denote the distances from the two endpoints of $l_c'$ to $l_c$.

Specifically, the process of fusing the camera poses obtained in the two ways in the key frame acquisition module to obtain the camera movement comprises: obtaining a first movement trajectory of the camera from the triangulation measurements, and a second movement trajectory of the camera from the camera poses acquired by the IMU; after aligning the first movement trajectory and the second movement trajectory on the time axis, removing the poses whose deviation at the same instant exceeds a third preset value, and then averaging the first movement trajectory and the second movement trajectory to obtain the camera movement process.

Specifically, retracing the camera trajectory for loop detection and loop correction in the detection and correction module comprises: while the camera trajectory is retraced, loop detection mainly consists of searching the key frame database for a key frame similar to the currently detected key frame; if one is found, it is used as a closed-loop candidate frame, and the closed-loop candidate frame is used to re-localize the camera trajectory to obtain a new local map; loop correction averages, during each loop detection, the results of the multiple detections and corrections of the same local map to obtain the final local map.

The above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.
