Unsupervised absolute scale calculation method and system

Document No.: 1578319    Publication date: 2020-01-31

Note: This technology, an unsupervised absolute scale calculation method and system, was designed and created by Cai Hang, Wang Luyao, Li Chengyuan and Li Hong on 2019-10-12. Abstract: The invention discloses an unsupervised absolute scale calculation method and system, which use a GAN (Generative Adversarial Network) to discriminate between a reference absolute-scale depth map and a predicted depth map, so that the depth map acquires absolute scale; meanwhile, due to the constraint of the reprojection error, the predicted depth map and the pose are at the same scale, so the pose also acquires absolute scale.

1. An unsupervised absolute scale calculation method, characterized by comprising a pose depth network model T, a depth network model G1, a depth network model G2, a discrimination model D1, a discrimination model D2 and an adversarial loss function, and comprising the following steps:

S1, preparing a monocular video data set and a reference depth map data set with absolute scale, wherein the data distribution of the monocular video data set and the data distribution of the reference depth map data set with absolute scale are not related;

S2, extracting at least 2 images from the monocular video data set in step S1, wherein the images comprise a source image and a target image with an overlapping area between them; the source image and the target image are propagated forward through the model T, and the relative pose between the source image and the target image is calculated; the target image is propagated forward, the depth value of each image pixel is calculated through the model G1, and a predicted depth map is obtained; the reference depth map from the data set in step S1 is propagated forward, and the color image is reconstructed by the model G2, so as to obtain a fake RGB image with absolute scale;

S3, obtaining a re-projected source image through view reconstruction using the relative pose and the predicted depth map from step S2; the predicted depth map from step S2 is propagated forward and the color image is reconstructed by the model G2, giving a reconstructed target image; the fake RGB image from step S2 is propagated forward and the depth value of each image pixel is calculated through the model G1, giving a reconstructed reference depth; the discrimination model D1 outputs the authenticity probability of the predicted depth map from step S2, using the reference depth map from step S1 as reference; the fake RGB image and the target image from step S2 pass through the model D2, and the authenticity probability of the fake RGB image is output using the target image from step S2 as reference; the adversarial error between the models G1 and D1 and the adversarial error between the models G2 and D2 are calculated using the adversarial loss function;

S4, calculating a reprojection error between the source image and the re-projected source image from step S3, calculating a reconstruction error between the target image and the reconstructed target image from step S3, and calculating a reconstruction error between the reference depth map and the reconstructed reference depth from step S3;

S5, obtaining a loss function as the sum of the adversarial errors, the reprojection error and the reconstruction errors, carrying out back propagation, and iteratively updating until the loss function converges;

and S6, inputting pairs of source and target images from a test data set, carrying out forward propagation through the model T and the model G1 respectively, and calculating the camera relative pose with absolute scale and the predicted depth map of the target image.

2. The unsupervised absolute scale calculation method of claim 1, wherein the adversarial loss function between G1 and D1 in step S3 is:

(Formula given as image FDA0002230897620000022 in the original document; not reproduced here.)

where xrgb is the input RGB image and xref is the reference depth map.

3. The unsupervised absolute scale calculation method of claim 1, wherein the adversarial loss function between G2 and D2 in step S3 is:

(Formula given as image FDA0002230897620000023 in the original document; not reproduced here.)

where xrgb is the input RGB image and xref is the reference depth map.

4. The unsupervised absolute scale calculation method of claim 1, wherein the reconstruction error in step S4 is calculated by:

(Formula given as image FDA0002230897620000024 in the original document; not reproduced here.)

where xrgb is the input RGB image and xref is the reference depth map.

5. The unsupervised absolute scale calculation method of claim 1, wherein the loss function in step S5 is:

(Formula given as image FDA0002230897620000021 in the original document; not reproduced here.)

where Lsmooth is the smoothing loss function of the depth map, Lreprojection is the reprojection error in S4, Lcycle is the sum of the adversarial errors and the reconstruction error, and α and β are weighting coefficients.

6. The unsupervised absolute scale calculation method of claim 5, wherein the Lcycle in step S5 is:

Lcycle=γ*Lrec+Ladv1+Ladv2

where Lrec is the reconstruction error in S4, Ladv1 is the adversarial error between G1 and D1 in S3, Ladv2 is the adversarial error between G2 and D2 in S3, and γ is the weighting coefficient.

7. The unsupervised absolute scale calculation method of claim 1, wherein the loss function in step S5 is optimized using the Adam optimization method.

8. A system for unsupervised absolute scale computation, comprising a pose estimation depth network module T for extracting relative poses, a depth network module G1 for computing the depth value of each pixel of an image, a depth network module G2 for reconstructing color images, discrimination modules D1 and D2 for outputting authenticity probabilities, and a loss function module, wherein the modules G1 and D1 are constrained by the loss function module, and the modules G2 and D2 are constrained by the loss function module.

Technical Field

The invention belongs to the field of visual odometry and depth estimation within computer vision, and particularly relates to an unsupervised absolute scale calculation method and system.

Background

In recent years, monocular dense depth estimation based on deep learning and visual odometry (VO) algorithms have developed rapidly; they are also key modules of SfM and SLAM systems. Studies have shown that VO and depth estimation based on supervised deep learning achieve good performance in many challenging environments and mitigate problems such as scale drift. However, in practical applications it is difficult and expensive to obtain enough ground-truth-labeled data to train these supervised models. In contrast, the unsupervised approach has the great advantage that only unlabeled video sequences are required.

Unsupervised deep models for depth and pose estimation typically employ two modules: one predicts the depth map, and the other estimates the relative camera pose. Using the estimated depth map and pose, the source image is warped into the target view, and the models are trained end-to-end with the photometric error as the optimization target.
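
As an illustration of this standard pipeline (not the claimed method itself), the sketch below shows how one frame can be synthesized from another given a predicted depth map, a relative pose and the camera intrinsics. PyTorch, a pinhole camera model, the function name inverse_warp and the tensor shapes are assumptions made only for this example.

```python
import torch
import torch.nn.functional as F

def inverse_warp(src_img, tgt_depth, T_tgt_to_src, K):
    """Synthesize the target view by sampling source pixels (pinhole model).

    src_img:      (B, 3, H, W) source RGB frame
    tgt_depth:    (B, 1, H, W) predicted depth of the target frame
    T_tgt_to_src: (B, 4, 4)    relative pose from target to source camera
    K:            (B, 3, 3)    camera intrinsics
    """
    B, _, H, W = src_img.shape
    dev = src_img.device

    # Homogeneous pixel grid of the target frame.
    ys, xs = torch.meshgrid(torch.arange(H, device=dev),
                            torch.arange(W, device=dev), indexing="ij")
    pix = torch.stack([xs, ys, torch.ones_like(xs)], dim=0).float().view(1, 3, -1).expand(B, -1, -1)

    # Back-project to 3D with the predicted depth, then move to the source camera frame.
    cam = torch.inverse(K) @ pix * tgt_depth.view(B, 1, -1)
    cam = torch.cat([cam, torch.ones(B, 1, H * W, device=dev)], dim=1)
    src_cam = (T_tgt_to_src @ cam)[:, :3, :]

    # Project into the source image plane and normalise to [-1, 1] for grid_sample.
    proj = K @ src_cam
    uv = proj[:, :2, :] / (proj[:, 2:3, :] + 1e-7)
    u = 2.0 * uv[:, 0] / (W - 1) - 1.0
    v = 2.0 * uv[:, 1] / (H - 1) - 1.0
    grid = torch.stack([u, v], dim=-1).view(B, H, W, 2)

    # Bilinear sampling yields the synthesized view.
    return F.grid_sample(src_img, grid, padding_mode="border", align_corners=True)
```

The photometric (reprojection) error is then, for example, the mean absolute difference between the synthesized view and the corresponding real frame.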

A classical problem of monocular VO is that, owing to the nature of a monocular camera, motion estimation and depth maps can only be recovered up to an unknown scale. If there is no absolute scale serving as an anchor point, the scales of the pose and the depth map are prone to drift throughout the training process.

The problem of scale recovery: since monocular VO and depth estimation carry no absolute scale information, the estimated pose and depth can neither be used directly nor be evaluated against ground truth, so scale recovery is required. Existing monocular unsupervised deep-learning frameworks compute the scale by comparison with the ground truth. For the depth map, a single scale factor is computed per image using the median of the whole predicted depth map (the formula is given only as an image in the source).

For the pose, the scale is computed every 5 frames against the ground truth (again the formula is given only as an image in the source).
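
For concreteness, the conventional alignment can be sketched as follows. NumPy, the median-ratio form of the depth scale and the least-squares form of the per-snippet pose scale are assumptions reflecting common practice, not necessarily the exact formulas omitted above.

```python
import numpy as np

def depth_scale(pred_depth, gt_depth):
    """Single per-image scale factor from the medians of ground-truth and predicted depth."""
    return np.median(gt_depth) / np.median(pred_depth)

def pose_scale(pred_xyz, gt_xyz):
    """Scale aligning a predicted 5-frame trajectory snippet (N x 3 translations)
    to the ground-truth snippet in the least-squares sense."""
    return np.sum(gt_xyz * pred_xyz) / np.sum(pred_xyz * pred_xyz)
```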

Such a scale recovery method is difficult to apply in practice, because there is no way to obtain the ground-truth value of every image frame in a real scene.

Disclosure of Invention

The working principle of the method is that a GAN (Generative Adversarial Network) is utilized to discriminate between a reference absolute-scale depth map and a predicted depth map, so that the depth map has absolute scale; meanwhile, due to the constraint of the reprojection error, the predicted depth map and the pose are at the same scale, so the pose also has absolute scale.

In order to solve the above problems, the present invention provides an unsupervised absolute scale calculation method and system.

The technical scheme adopted by the invention is as follows:

An unsupervised absolute scale calculation method comprises a pose depth network model T, a depth network model G1, a depth network model G2, a discrimination model D1, a discrimination model D2 and an adversarial loss function, and comprises the following steps:

S1, preparing a monocular video data set and a reference depth map data set with absolute scale, wherein the data distribution of the monocular video data set and the data distribution of the reference depth map data set with absolute scale are not related;

S2, extracting at least 2 images from the monocular video data set in step S1, wherein the images comprise a source image and a target image with an overlapping area between them; the source image and the target image are propagated forward through the model T, and the relative pose between the source image and the target image is calculated; the target image is propagated forward, the depth value of each image pixel is calculated through the model G1, and a predicted depth map is obtained; the reference depth map from the data set in step S1 is propagated forward, and the color image is reconstructed by the model G2, so as to obtain a fake RGB image with absolute scale (steps S2 to S5 are illustrated by the code sketch given after step S6);

S3, obtaining a re-projected source image through view reconstruction using the relative pose and the predicted depth map from step S2; the predicted depth map from step S2 is propagated forward and the color image is reconstructed by the model G2, giving a reconstructed target image; the fake RGB image from step S2 is propagated forward and the depth value of each image pixel is calculated through the model G1, giving a reconstructed reference depth; the discrimination model D1 outputs the authenticity probability of the predicted depth map from step S2, using the reference depth map from step S1 as reference; the fake RGB image and the target image from step S2 pass through the model D2, and the authenticity probability of the fake RGB image is output using the target image from step S2 as reference; the adversarial error between the models G1 and D1 and the adversarial error between the models G2 and D2 are calculated using the adversarial loss function;

S4, calculating a reprojection error between the source image and the re-projected source image from step S3, calculating a reconstruction error between the target image and the reconstructed target image from step S3, and calculating a reconstruction error between the reference depth map and the reconstructed reference depth from step S3;

S5, obtaining a loss function as the sum of the adversarial errors, the reprojection error and the reconstruction errors, carrying out back propagation, and iteratively updating until the loss function converges;

and S6, inputting pairs of source and target images from a test data set, carrying out forward propagation through the model T and the model G1 respectively, and calculating the camera relative pose with absolute scale and the predicted depth map of the target image.
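
For illustration only, the following sketch walks through steps S2 to S5 with minimal PyTorch stand-ins for the models T, G1, G2, D1 and D2. The network stubs, tensor shapes, loss weights (including which of α and β multiplies which term), the plain-L1 error terms and the non-saturating GAN losses are all assumptions made for this sketch, not the patent's actual implementation; a real implementation would also update generator and discriminator parameters alternately, as is usual in GAN training.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Shape-compatible stand-ins for the five networks (the real models are full CNNs, see FIGs. 7-9).
depth_net  = nn.Sequential(nn.Conv2d(3, 1, 3, padding=1), nn.Softplus())        # G1: RGB -> depth
rgb_gen    = nn.Sequential(nn.Conv2d(1, 3, 3, padding=1), nn.Sigmoid())         # G2: depth -> RGB
pose_net   = nn.Sequential(nn.Conv2d(6, 16, 7, stride=2, padding=3), nn.ReLU(),
                           nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, 6))  # T: image pair -> 6-DoF pose
disc_depth = nn.Sequential(nn.Conv2d(1, 8, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
                           nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 1), nn.Sigmoid())  # D1
disc_rgb   = nn.Sequential(nn.Conv2d(3, 8, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
                           nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 1), nn.Sigmoid())  # D2

# S1: one sample from each data set (random tensors stand in for real data).
src_img   = torch.rand(1, 3, 128, 416)   # source frame of the monocular video
tgt_img   = torch.rand(1, 3, 128, 416)   # target frame with an overlapping view
ref_depth = torch.rand(1, 1, 128, 416)   # reference depth map with absolute scale (unrelated data set)

# S2: forward passes.
rel_pose   = pose_net(torch.cat([src_img, tgt_img], dim=1))  # relative pose between source and target
pred_depth = depth_net(tgt_img)                              # predicted depth map of the target image
fake_rgb   = rgb_gen(ref_depth)                              # fake RGB image generated from the reference depth

# S3: view reconstruction, cycle reconstructions and real/fake probabilities.
# In a real implementation, reproj_src is obtained by warping the source frame with rel_pose and
# pred_depth (see the inverse_warp sketch in the Background section); a copy of the source frame
# stands in here so this sketch runs on its own.
reproj_src = src_img.clone()
recon_tgt  = rgb_gen(pred_depth)         # G2(G1(target)) -> reconstructed target image
recon_ref  = depth_net(fake_rgb)         # G1(G2(reference depth)) -> reconstructed reference depth
p_real_d1, p_fake_d1 = disc_depth(ref_depth), disc_depth(pred_depth)   # D1 judges depth maps
p_real_d2, p_fake_d2 = disc_rgb(tgt_img), disc_rgb(fake_rgb)           # D2 judges RGB images
bce = F.binary_cross_entropy
l_adv1 = bce(p_real_d1, torch.ones_like(p_real_d1)) + bce(p_fake_d1, torch.zeros_like(p_fake_d1))
l_adv2 = bce(p_real_d2, torch.ones_like(p_real_d2)) + bce(p_fake_d2, torch.zeros_like(p_fake_d2))

# S4: reprojection and reconstruction errors (plain L1 is assumed here).
l_reproj = F.l1_loss(reproj_src, src_img)
l_rec    = F.l1_loss(recon_tgt, tgt_img) + F.l1_loss(recon_ref, ref_depth)

# S5: total loss; back-propagating it updates all modules iteratively until convergence.
alpha, beta, gamma = 0.1, 1.0, 10.0                     # illustrative weighting coefficients only
l_smooth = pred_depth.diff(dim=-1).abs().mean() + pred_depth.diff(dim=-2).abs().mean()
l_cycle  = gamma * l_rec + l_adv1 + l_adv2              # as defined further below
loss     = l_reproj + alpha * l_smooth + beta * l_cycle  # back-propagated as in the Adam sketch below
```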

The GAN is adopted to fuse absolute scale information: a reference absolute-scale depth map and a predicted depth map are discriminated, so the depth map acquires absolute scale; meanwhile, due to the constraint of the reprojection error, the predicted depth map and the pose are at the same scale, so the pose also acquires absolute scale. The model is a novel unsupervised framework for monocular visual odometry and depth estimation; the depth and pose estimated by the framework have absolute scale, so the model can be applied directly to real scenes.

Further, the adversarial loss function between G1 and D1 in step S3 is:

(Formula given as image BDA0002230897630000031 in the original document; not reproduced here.)

where xrgb is the input RGB image and xref is the reference depth map. Under the constraint of the adversarial loss, the model parameters of G1 and D1 are iteratively optimized; the depth values and the absolute scale of the predicted depth map generated by G1 become increasingly accurate until D1 can no longer make a clear real/fake decision, at which point the optimization process can be considered converged.
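
The formula itself is available only as an image in the source. Assuming the standard GAN objective with G1 as the generator from RGB to depth and D1 as the depth discriminator, a plausible form is:

```latex
L_{adv1}(G_1, D_1) = \mathbb{E}_{x_{ref}}\!\left[\log D_1(x_{ref})\right]
                   + \mathbb{E}_{x_{rgb}}\!\left[\log\!\left(1 - D_1\!\left(G_1(x_{rgb})\right)\right)\right]
```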

Further, the adversarial loss function between G2 and D2 in step S3 is:

(Formula given as image BDA0002230897630000032 in the original document; not reproduced here.)

where xrgb is the input RGB image and xref is the reference depth map.
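
Again the formula is available only as an image. Assuming the symmetric GAN objective with G2 generating RGB from the reference depth and D2 discriminating RGB images, a plausible form is:

```latex
L_{adv2}(G_2, D_2) = \mathbb{E}_{x_{rgb}}\!\left[\log D_2(x_{rgb})\right]
                   + \mathbb{E}_{x_{ref}}\!\left[\log\!\left(1 - D_2\!\left(G_2(x_{ref})\right)\right)\right]
```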

Further, the reconstruction error in step S4 is calculated by:

(Formula given as image BDA0002230897630000041 in the original document; not reproduced here.)

where xrgb is the input RGB image and xref is the reference depth map.
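
The source gives this formula only as an image. Assuming a Cycle-GAN style L1 cycle-consistency term over both mapping directions (consistent with the cycle constraint module mentioned in the advantages below), a plausible form is:

```latex
L_{rec} = \mathbb{E}_{x_{rgb}}\!\left[\bigl\lVert G_2\bigl(G_1(x_{rgb})\bigr) - x_{rgb} \bigr\rVert_1\right]
        + \mathbb{E}_{x_{ref}}\!\left[\bigl\lVert G_1\bigl(G_2(x_{ref})\bigr) - x_{ref} \bigr\rVert_1\right]
```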

Further, the loss function in step S5 is:

(Formula given as image BDA0002230897630000042 in the original document; not reproduced here.)

where Lsmooth is the smoothing loss function of the depth map, Lreprojection is the reprojection error in S4, Lcycle is the sum of the adversarial errors and the reconstruction error, and α and β are weighting coefficients.
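
The total loss is also given only as an image. From the terms and weighting coefficients just listed, a plausible reconstruction is the following (the assignment of α and β to particular terms is an assumption):

```latex
L = L_{reprojection} + \alpha\, L_{smooth} + \beta\, L_{cycle}
```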

Further, the Lcycle in step S5 is:

Lcycle=γ*Lrec+Ladv1+Ladv2

where Lrec is the reconstruction error in S4, Ladv1 is the adversarial error between G1 and D1 in S3, Ladv2 is the adversarial error between G2 and D2 in S3, and γ is the weighting coefficient.

Further, the loss function in step S5 is optimized using the Adam optimization method.
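
A minimal sketch of this choice, assuming PyTorch's Adam optimizer over the modules and the loss of the step S2-S5 sketch above (the learning rate and betas are illustrative; in practice the generator and discriminator parameters are usually placed in separate optimizers and updated alternately):

```python
import itertools
import torch

# Assumes pose_net, depth_net, rgb_gen, disc_depth, disc_rgb and loss from the sketch above.
params = itertools.chain(pose_net.parameters(), depth_net.parameters(), rgb_gen.parameters(),
                         disc_depth.parameters(), disc_rgb.parameters())
optimizer = torch.optim.Adam(params, lr=1e-4, betas=(0.9, 0.999))

# One training iteration: back-propagate the assembled loss and update the parameters;
# this is repeated over the training set until the loss converges.
optimizer.zero_grad()
loss.backward()
optimizer.step()
```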

A system for unsupervised absolute scale computation comprises a pose estimation depth network module T for extracting relative poses, a depth network module G1 for computing the depth value of each pixel of an image, a depth network module G2 for reconstructing color images, discrimination modules D1 and D2 for outputting authenticity probabilities, and a loss function module; the modules G1 and D1 are constrained by the loss function module, and the modules G2 and D2 are constrained by the loss function module.

Compared with the prior art, the invention has the following advantages and effects:

1. A novel unsupervised framework for monocular visual odometry and depth estimation is provided. The framework adopts a GAN to fuse absolute scale information and discriminates between a reference absolute-scale depth map and a predicted depth map, so that the depth map has absolute scale; meanwhile, due to the constraint of the reprojection error, the predicted depth map and the pose share the same scale, so the pose also has absolute scale.

2. A cycle constraint module (Cycle-GAN) is introduced to ensure consistency between the reference RGB image and the predicted depth map.

Drawings

The accompanying drawings, which form a part hereof, are included to provide a further understanding of the invention; they are incorporated in and constitute a part of this specification, illustrate embodiments of the invention, and together with the description serve to explain the invention without limiting it.

FIG. 1 is a general flow diagram of the present invention;

FIG. 2 is a basic framework diagram of the metric learning of the present invention;

FIG. 3 is a graph comparing depth map results with other algorithms in accordance with the present invention;

FIG. 4 is a trajectory comparison of pose results of the present invention with other algorithms;

FIG. 5 is a graph comparing the depth estimation results of the algorithm of the present invention with other algorithms;

FIG. 6 is a comparison of the pose estimation results of the algorithm of the present invention with other algorithms;

FIG. 7 shows the decoder structure and parameters of the models G1 and G2 according to the present invention;

FIG. 8 illustrates the decoder structure and parameters of the model T according to the present invention;

FIG. 9 shows the structure and parameters of the discrimination models D1 and D2 according to the present invention.

Detailed Description

For purposes of making the objects, aspects and advantages of the present invention more apparent, the present invention will be described in detail below with reference to the accompanying drawings and examples.
