A binocular stereo matching method based on convolutional neural networks

Document No.: 1773186  Publication date: 2019-12-03

Reading note: this technique, "A binocular stereo matching method based on convolutional neural networks" (一种基于卷积神经网络的双目立体匹配方法), was designed by Wang Liang (王亮) and Zhao Changshuang (赵长双) on 2019-09-09. Main content: The invention discloses a binocular stereo matching method based on convolutional neural networks. For matching cost computation, a dense block integrates context information on top of the initial features. For matching cost aggregation, a small encoder-decoder structure is proposed to regularize the cost volume. For disparity computation, a differentiable soft argmin operation is performed along the disparity dimension of the cost volume to obtain the initial disparity. For disparity refinement, the initial disparity is refined mainly by residual blocks, guided by a similarity measure. The invention strictly follows the 4 stages of a stereo matching algorithm and integrates the 4 steps into a single network, so the network can be trained end to end. Integrating context information during feature extraction effectively alleviates the mismatching of pixels in ill-posed regions, and the small encoder-decoder structure used for regularization significantly reduces memory usage and running time during network training and inference while improving disparity prediction accuracy.

1. A binocular stereo matching method based on convolutional neural networks, characterized by comprising the following steps:

Step 1: Construct a data set and preprocess it. The data set comprises reference images and corresponding target images; a reference image and its target image form one stereo image pair. All stereo image pairs are rectified, i.e., they are offset only in the horizontal direction and not in the vertical direction;

Step 2: Build a stereo matching network comprising an initial feature extraction module, a correlation layer module, a context information module, a cost volume module, a regularization module, a disparity computation module and a disparity refinement module;

The initial feature extraction module is a weight-sharing Siamese network that extracts features from the input stereo image pair. Its input is the stereo image pair to be matched and its output is two unary features. The Siamese network first downsamples the input stereo image pair with one convolutional layer and then processes it further with 2 residual layers, where the first residual layer contains 3 residual blocks and the second contains 4. Each residual block has the structure BN-conv-BN-ReLU-conv-BN, where BN, conv and ReLU denote batch normalization, a convolutional layer and a rectified linear unit, respectively. After these convolutions, the Siamese network outputs two unary features of size H/4 × W/4 × F, where H and W denote the height and width of the original input image and F denotes the feature dimension;

The correlation layer module comprises two operations. The first performs a patch dot-product operation between the stereo feature pair output by the first residual layer of the Siamese network, to obtain the similarity of the stereo feature pair, i.e., the correlation layer M_f. The second performs the patch dot-product operation between the input stereo image pair, to obtain the similarity of the input stereo image pair, i.e., the correlation layer M_c;

The context information module consists of a dense block and one convolutional layer and adds context information to the two unary features extracted by the initial feature extraction module. Its input is those two unary features and its output is two feature maps containing context information, each of dimension H/4 × W/4 × F, where H and W denote the height and width of the original input image and F denotes the feature dimension;

The cost volume module computes the matching cost of the two feature maps. Its input is the two feature maps containing context information and its output is a cost volume. Specifically, the reference feature map containing context information is concatenated with the corresponding target feature map containing context information at each possible disparity, and the results are packed into a 4D cost volume. The cost volume output by the module has dimension H/4 × W/4 × (D+1)/4 × F, where H and W denote the height and width of the original input image, D denotes the maximum possible disparity value and F denotes the feature dimension;

The regularization module is a small encoder-decoder structure that learns a regularization function on the cost volume to perform cost aggregation. Its input is the cost volume and its output is a regularized feature map. The encoder-decoder comprises an encoding stage and a decoding stage. The encoding stage contains 6 3D convolutional layers organized in three encoding levels; each level uses two convolutional layers, and only the first of the two is followed by BN and ReLU. The decoding stage uses only two 3D deconvolution layers for upsampling, and before each upsampling the feature map of matching dimension from the encoding stage is added, so that both coarse high-level information and detailed low-level information are retained. Finally, two 3D convolutional layers further reduce the feature dimension to produce the regularized feature map, whose dimension is H/4 × W/4 × (D+1)/4 × 1, where H and W denote the height and width of the original input image and D denotes the maximum possible disparity value;

The disparity computation module applies a differentiable soft argmin operation along the disparity dimension of the regularized feature map to regress a smooth, continuous initial disparity map. Its input is the regularized feature map and its output is an initial disparity map of dimension H/4 × W/4 × 1, where H and W denote the height and width of the original input image;

The disparity refinement module further refines the disparity estimate. Its input is the initial disparity map and its output is the final disparity map.

Step 3: Model training. First, the preprocessed stereo image pairs of the training data set are fed into the stereo matching network for forward propagation to obtain the final disparity map. Then, the output final disparity map and the ground-truth disparity map are fed into the loss function, and backpropagation is performed using batch gradient descent. Finally, the learnable parameters of the model, which include weights and biases, are updated iteratively according to the gradients to obtain the optimal stereo matching network model;

Step 4: Perform binocular stereo matching using the trained stereo matching network model.

2. The binocular stereo matching method according to claim 1, characterized in that the preprocessing in step 1 consists of randomly cropping each input stereo image pair in the data set and then normalizing it.

3. The binocular stereo matching method according to claim 1, characterized in that the Siamese network downsamples the input stereo image pair once with a convolutional layer having a 5 × 5 kernel and a stride of 2; the two residual layers of the Siamese network use 3 × 3 kernels and a feature dimension of 32, and all strides are 1 except for the first residual block of the second residual layer, whose stride is 2.

4. The binocular stereo matching method according to claim 1, characterized in that the patch dot-product operation in the correlation layer module is defined as follows:

c(x₁, x₂)=∑_{o∈[-k,k]×[-k,k]}<f₁(x₁+o),f₂(x₂+o)> (1)

where f₁ and f₂ denote the two input single-channel unary features, x₁ and x₂ denote the centers of the patches in f₁ and f₂, respectively, and k sets the patch size, each patch spanning (2k+1) × (2k+1) positions. The correlation layer compares each patch in f₁ with each patch in f₂. Given a maximum displacement d, for each patch center x₁ in f₁ the correlation c(x₁, x₂) is computed only for patch centers x₂ within a neighborhood of size 2d+1 in f₂, rather than over the whole of f₂.

5. The binocular stereo matching method according to claim 1, characterized in that the dense block in the context information module contains 6 convolutional layers that are densely connected, with a growth rate of 16 for each concatenation; the convolutional layers have dilation rates of 1, 2, 4, 8, 16 and 1, respectively; finally, a 1x1 convolutional layer reduces the dimension of the feature map to facilitate building the cost volume.

6. The binocular stereo matching method according to claim 1, characterized in that the encoding stage in the regularization module applies 6 convolutional layers with 3x3x3 kernels, where the third and fifth convolutional layers have a stride of 2 and the rest have a stride of 1; the decoding stage uses 2 deconvolution layers with 3x3x3 kernels and a stride of 2.

7. The binocular stereo matching method according to claim 1, characterized in that the differentiable soft argmin operation in the disparity computation module is defined as follows:

d′ = ∑_{d=0}^{D_max} d × σ(−c_d)

where d′ denotes the initial disparity map, c_d the regularized feature map, d a possible disparity value, D_max the maximum disparity value, and σ(·) the softmax function.

8. The binocular stereo matching method according to claim 1, characterized in that the disparity refinement module operates as follows: first, the initial disparity map is upsampled by bilinear interpolation to the same size as M_f and merged with M_f; the result is passed through a convolutional layer with a 3x3 kernel and 32 channels, and then through 6 residual blocks with dilation rates of 1, 2, 4, 8, 1 and 1; next, the output of the residual blocks is fed into a convolutional layer with an output dimension of 1 and a 3x3 kernel, without BN or ReLU, and its output is added to the previous disparity map; finally, a ReLU ensures that the predicted disparity values are non-negative; the same procedure is then repeated once with M_f replaced by M_c, and its output is the final disparity map; the final disparity map output by the disparity refinement module has dimension H × W × 1, where H and W denote the height and width of the original input image.

9. The binocular stereo matching method based on convolutional neural networks according to claim 1, characterized in that the loss function in step 3 is as follows:

L(d, d′) = (1/N) ∑_{i=1}^{N} smooth_{L₁}(d_i − d′_i)

smooth_{L₁}(x) = 0.5x² if |x| < 1, and |x| − 0.5 otherwise

where N is the number of pixels in the ground-truth disparity map, d is the ground-truth disparity map, d′ is the predicted disparity map, d_i is a pixel of the ground-truth disparity map, d′_i is the corresponding pixel of the predicted disparity map, and x is d_i − d′_i.

Technical field

The invention relates to fields of computer vision such as robot navigation and three-dimensional reconstruction, and in particular to a binocular stereo matching method based on convolutional neural networks.

Background

Estimating depth from a stereo image pair is a key problem in many stereo vision tasks and has applications in many fields, such as 3D reconstruction, autonomous driving, object detection, robot navigation, virtual reality and augmented reality. The goal of stereo matching is to estimate the correspondence of all pixels between two rectified images. Given a pair of rectified stereo images, the goal of disparity estimation is to compute the disparity d of each pixel in the reference image. Disparity is the horizontal displacement between a pair of corresponding points in the reference image and the target image. For a pixel at (x, y) in the reference image, if its corresponding pixel is found at (x-d, y) in the target image, the depth of that point can be computed as fb/d, where f is the focal length of the camera and b is the distance between the two cameras.
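The depth relation fb/d can be illustrated with a minimal sketch. The function name and the camera values below are hypothetical and chosen only for the example; the focal length is assumed to be expressed in pixels so that the result carries the unit of the baseline.

```python
def disparity_to_depth(disparity, focal_length_px, baseline_m):
    """Depth of a point from its disparity: depth = f * b / d."""
    if disparity <= 0:
        raise ValueError("disparity must be positive to yield a finite depth")
    return focal_length_px * baseline_m / disparity


# Hypothetical camera: f = 720 px, b = 0.5 m; a pixel with disparity d = 36 px.
depth_m = disparity_to_depth(36.0, focal_length_px=720.0, baseline_m=0.5)
print(depth_m)  # 10.0
```

Because depth is inversely proportional to disparity, a fixed disparity error costs more depth accuracy for distant points than for near ones.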

A typical stereo matching algorithm comprises 4 steps: matching cost computation, matching cost aggregation, disparity computation and disparity refinement. Each step plays a vital role in the overall performance of stereo matching. Because deep convolutional neural networks have shown powerful feature representation ability in a variety of vision tasks, they have been applied to stereo matching to improve disparity estimation accuracy and have significantly outperformed traditional methods. Zbontar and LeCun were the first to use a convolutional neural network to compute the pixel similarity between two input images (J. Zbontar and Y. LeCun. Stereo matching by training a convolutional neural network to compare image patches. Journal of Machine Learning Research, 17(1-32):2, 2016). They argued that matching costs based only on pixel intensity differences or hand-crafted image features are unreliable, whereas a convolutional neural network can learn more robust and discriminative features from images to improve the stereo matching cost. Following this idea, several methods were proposed to improve computational efficiency or matching accuracy. However, these methods still have limitations. First, the network models often cannot accurately find corresponding points for pixels in ill-posed regions such as occluded areas, repetitive textures and reflective surfaces. Second, existing networks consume a large amount of memory and require powerful computing capability. Third, the networks require several post-processing steps.

Summary of the invention

The invention mainly uses deep learning to process the input stereo image pair and obtain a continuous, accurate disparity map. First, a Siamese network is built from residual blocks and a dense block to extract features from the input stereo image pair, and a cost volume is then constructed to complete the matching cost computation. Next, a small encoder-decoder structure performs cost aggregation on the cost volume to alleviate mismatches in it, and an initial disparity map is regressed with a soft argmin function. Finally, correlation layers provide similarity measures of the feature maps, which guide the refinement of the initial disparity map to obtain an accurate disparity estimate.

To achieve the above object, the invention provides the following scheme:

A binocular stereo matching method based on convolutional neural networks, the method comprising:

Step 1: data processing;

Step 2: build the stereo matching network;

Step 3: train the network model;

Step 4: perform binocular stereo matching using the trained stereo matching network model.

The data processing specifically comprises the following steps:

Step 1: Data set. Unless otherwise specified, the left image of the data set serves as the reference image and the right image as the corresponding target image; a reference image and its target image form one stereo image pair. All stereo image pairs are rectified, i.e., they are offset only in the horizontal direction and not in the vertical direction.

Step 2: Preprocessing. Each input stereo image pair in the data set is randomly cropped to a size of 512 × 256 and then normalized so that the image pixel values lie in the range [-1, 1].
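The two preprocessing operations can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: images are plain nested lists of 8-bit grayscale values, the helper names are hypothetical, and the linear mapping p / 127.5 - 1 is only one possible way to reach the range [-1, 1], since the text does not say how the normalization is computed.

```python
import random


def random_crop_pair(left, right, crop_w, crop_h, rng=random):
    """Cut the same window out of both images so the pair stays rectified."""
    height, width = len(left), len(left[0])
    if crop_h > height or crop_w > width:
        raise ValueError("crop is larger than the image")
    top = rng.randint(0, height - crop_h)    # randint is inclusive at both ends
    x0 = rng.randint(0, width - crop_w)

    def crop(img):
        return [row[x0:x0 + crop_w] for row in img[top:top + crop_h]]

    return crop(left), crop(right)


def normalize(img):
    """Map 8-bit intensities [0, 255] linearly onto [-1, 1] (assumed mapping)."""
    return [[p / 127.5 - 1.0 for p in row] for row in img]


# Tiny hypothetical pair (6 rows x 8 columns) cropped to 4 x 2 instead of 512 x 256.
left = [[(r * 8 + c) % 256 for c in range(8)] for r in range(6)]
right = [[(r * 8 + c + 1) % 256 for c in range(8)] for r in range(6)]
left_c, right_c = random_crop_pair(left, right, crop_w=4, crop_h=2, rng=random.Random(0))
left_n, right_n = normalize(left_c), normalize(right_c)
```

Using one window for both images keeps corresponding pixels on the same row; during training the ground-truth disparity map would need the same window.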

Building the stereo matching network specifically involves the following modules:

Module 1: initial feature extraction module

The initial feature extraction module builds a weight-sharing Siamese network to extract features from the input stereo image pair. Its input is the stereo image pair to be matched and its output is two unary features. The Siamese network first downsamples the input stereo image pair once with a convolutional layer having a 5 × 5 kernel and a stride of 2, and then processes it further with 2 residual layers, where the first residual layer contains 3 residual blocks and the second contains 4. Each residual block has the structure BN-conv-BN-ReLU-conv-BN, where BN, conv and ReLU denote batch normalization, a convolutional layer and a rectified linear unit, respectively; the kernels are 3 × 3, the feature dimension is 32, and all strides are 1 except for the first residual block of the second residual layer, whose stride is 2. After these convolutions, the Siamese network outputs two unary features of size H/4 × W/4 × F, where H and W denote the height and width of the original input image and F denotes the feature dimension.

Module 2: correlation layer module

The correlation layer module performs patch dot-product operations both between the stereo feature pair output by the first residual layer of the Siamese network and between the original input stereo image pair, to obtain the similarity of the two groups of stereo pairs, i.e., the correlation layers M_f and M_c. Its input is a stereo pair and its output is a correlation layer containing the similarity measure. For single-channel unary features, the patch dot-product operation can be defined as follows:

c(x₁, x₂)=∑_{o∈[-k,k]×[-k,k]}<f₁(x₁+o),f₂(x₂+o)> (1)

where f₁ and f₂ denote the two input single-channel unary features, x₁ and x₂ denote the centers of the patches in f₁ and f₂, respectively, and k sets the patch size, each patch spanning (2k+1) × (2k+1) positions. The correlation layer compares each patch in f₁ with each patch in f₂. Given a maximum displacement d, for each patch center x₁ in f₁ the correlation c(x₁, x₂) is computed only for patch centers x₂ within a neighborhood of size 2d+1 in f₂, rather than over the whole of f₂. Limiting the displacement of the correlation operation effectively reduces the amount of computation.

Correlation layers effectively reflect the similarity of the two inputs of a stereo pair. The disparity refinement stage needs two correlation layers to guide the refinement: the stereo feature pair output by the first residual layer forms the coarse (d=20) correlation layer M_f, and the original input stereo image pair forms the fine (d=10) correlation layer M_c.
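A minimal sketch of the patch dot product in equation (1) and of the displacement limit is given below. It works on single-channel maps stored as nested lists. Two choices are assumptions made for the illustration and are not stated in the text: positions outside a map count as zero, and the search is restricted to horizontal shifts, which suits rectified stereo pairs. The function names are hypothetical.

```python
def patch_correlation(f1, f2, x1, x2, k):
    """c(x1, x2): sum over offsets o in [-k, k] x [-k, k] of f1(x1+o) * f2(x2+o).

    f1 and f2 are single-channel maps (lists of rows); x1 and x2 are
    (row, col) patch centers. Out-of-range positions count as zero (assumed).
    """
    def at(f, r, c):
        return f[r][c] if 0 <= r < len(f) and 0 <= c < len(f[0]) else 0.0

    total = 0.0
    for dr in range(-k, k + 1):
        for dc in range(-k, k + 1):
            total += at(f1, x1[0] + dr, x1[1] + dc) * at(f2, x2[0] + dr, x2[1] + dc)
    return total


def correlation_layer(f1, f2, k, d):
    """For every center x1, correlate only with x2 shifted by -d..d columns.

    Returns out[r][c] as a list of 2d+1 scores instead of one score per
    position of f2, which is what keeps the computation small.
    """
    return [[[patch_correlation(f1, f2, (r, c), (r, c + s), k)
              for s in range(-d, d + 1)]
             for c in range(len(f1[0]))]
            for r in range(len(f1))]


f = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(patch_correlation(f, f, (1, 1), (1, 1), k=1))  # 285.0
```

With k = 1 and identical maps, the centered patch covers all nine values, so the score is the sum of their squares.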

Module 3: context information module

The context information module builds a dense block to add context information to the two unary features. Its input is the two unary features and its output is two feature maps containing context information. The dense block contains 6 densely connected convolutional layers, with a growth rate of 16 for each concatenation. The convolutional layers have dilation rates of 1, 2, 4, 8, 16 and 1, respectively, which enlarges the receptive field without changing the spatial size of the input features; aggregating more context information at different scales through dense connections effectively alleviates mismatches in ill-posed regions. Finally, a 1x1 convolutional layer reduces the dimension of the feature maps to facilitate building the cost volume. The two feature maps containing context information output by the module have dimension H/4 × W/4 × F, where H and W denote the height and width of the original input image and F denotes the feature dimension.
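The effect of the growth rate and the dilation rates can be checked with a little arithmetic. The sketch below assumes 3 × 3 kernels and stride 1 inside the dense block, and an input of 32 channels (the feature dimension given for the Siamese network); neither the kernel size nor the stride of the dense block is stated in the text, and the function names are hypothetical.

```python
def dense_block_channels(in_channels, growth_rate, num_layers):
    """Channels at the input of each layer, plus the final concatenation."""
    return [in_channels + growth_rate * i for i in range(num_layers + 1)]


def receptive_field(dilations, kernel=3):
    """Receptive field of a chain of stride-1 dilated convolutions."""
    return 1 + sum((kernel - 1) * rate for rate in dilations)


print(dense_block_channels(32, 16, 6))       # [32, 48, 64, 80, 96, 112, 128]
print(receptive_field([1, 2, 4, 8, 16, 1]))  # 65
```

Under these assumptions the concatenated output has 128 channels, which is what the final 1x1 convolution reduces, and the longest path through the block sees a 65 × 65 window of the H/4 × W/4 feature map.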

Module 4: cost volume module

The cost volume module builds a cost volume from the two feature maps containing context information to compute the matching cost. Its input is the two feature maps containing context information and its output is a cost volume. The matching cost is computed by concatenating the reference feature map containing context information with the corresponding target feature map containing context information at each possible disparity and packing the results into a 4D cost volume. The cost volume output by the module has dimension H/4 × W/4 × (D+1)/4 × F, where H and W denote the height and width of the original input image, D denotes the maximum possible disparity value and F denotes the feature dimension.
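The concatenation step can be sketched with nested lists. This is only an illustration: feature maps are indexed as [row][column][channel], the function name is hypothetical, and filling positions whose shifted target pixel falls outside the image with zeros is an assumption. Note that plain concatenation yields 2F channels per entry; the text writes the last dimension as F and does not say how the two relate, so the sketch does not try to reconcile them.

```python
def build_cost_volume(ref_feat, tgt_feat, num_disp):
    """Concatenate reference and shifted target features at every disparity.

    ref_feat, tgt_feat: [H][W][F] nested lists.
    Returns a 4D list indexed as [disparity][row][column][channel].
    """
    height, width = len(ref_feat), len(ref_feat[0])
    zeros = [0.0] * len(ref_feat[0][0])
    volume = []
    for d in range(num_disp):
        # Reference pixel (x, y) is paired with target pixel (x - d, y).
        plane = [[ref_feat[y][x] + (tgt_feat[y][x - d] if x - d >= 0 else zeros)
                  for x in range(width)]
                 for y in range(height)]
        volume.append(plane)
    return volume


ref = [[[1.0], [2.0], [3.0]]]      # H = 1, W = 3, F = 1
tgt = [[[10.0], [20.0], [30.0]]]
volume = build_cost_volume(ref, tgt, num_disp=2)
print(volume[1][0])  # [[1.0, 0.0], [2.0, 10.0], [3.0, 20.0]]
```

Concatenating features instead of computing a distance leaves the comparison itself to the regularization network that follows.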

Module 5: regularization module

The regularization module uses a compact, small encoder-decoder structure to learn a regularization function on the cost volume and perform cost aggregation. Its input is the cost volume and its output is a regularized feature map. The encoder-decoder comprises an encoding stage and a decoding stage. The encoding stage contains 6 3D convolutional layers; each encoding level uses two convolutional layers with 3x3x3 kernels, and only the first of the two is followed by BN and ReLU. The third and fifth convolutional layers have a stride of 2 and the rest have a stride of 1. The decoding stage uses only two 3D deconvolution layers with a stride of 2 for upsampling, and before each upsampling the feature map of matching dimension from the encoding stage is added, so that both coarse high-level information and detailed low-level information are retained. Finally, two 3D convolutional layers further reduce the feature dimension to 1. The regularized feature map output by the regularization module has dimension H/4 × W/4 × (D+1)/4 × 1, where H and W denote the height and width of the original input image and D denotes the maximum possible disparity value.
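How the strides change the size of the volume can be traced with the usual convolution size formulas. The kernel size and strides below come from the text; the padding of 1, the output padding of 1 for the deconvolutions, and the example input size are assumptions made for this sketch, and the function names are hypothetical.

```python
def conv_out(n, kernel=3, stride=1, pad=1):
    """Output length of a convolution along one axis."""
    return (n + 2 * pad - kernel) // stride + 1


def deconv_out(n, kernel=3, stride=2, pad=1, output_pad=1):
    """Output length of a transposed convolution along one axis."""
    return (n - 1) * stride - 2 * pad + kernel + output_pad


def trace_shapes(shape):
    """Follow a (disparity, height, width) size through encoder and decoder."""
    strides = [1, 1, 2, 1, 2, 1]   # third and fifth encoder layers use stride 2
    shapes = [tuple(shape)]
    for s in strides:
        shapes.append(tuple(conv_out(n, stride=s) for n in shapes[-1]))
    for _ in range(2):             # two stride-2 deconvolution layers
        shapes.append(tuple(deconv_out(n) for n in shapes[-1]))
    return shapes


for step, size in enumerate(trace_shapes((48, 64, 128))):
    print(step, size)
```

Under these assumptions the volume is halved twice in the encoder and doubled twice in the decoder, so each decoder size has an encoder feature map of the same size available for the addition, and the output returns to the input size.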

Module 6: disparity computation module

The disparity computation module applies a differentiable soft argmin operation along the disparity dimension of the regularized feature map to regress a smooth, continuous initial disparity map. Its input is the regularized feature map and its output is the initial disparity map. The differentiable soft argmin operation is defined as follows:

d′ = ∑_{d=0}^{D_max} d × σ(−c_d)

where d′ denotes the initial disparity map, c_d the regularized feature map, d a possible disparity value, D_max the maximum disparity value, and σ(·) the softmax function. The initial disparity map d′ is obtained by summing the products of each disparity d and its probability, and the probability of each disparity d is computed from the regularized feature map c_d with σ(·). The initial disparity map output by the disparity computation module has dimension H/4 × W/4 × 1, where H and W denote the height and width of the original input image.
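The probability-weighted sum described above can be sketched for a single pixel. The sketch assumes the usual soft argmin convention in which the softmax is applied to the negated cost, so that a low cost yields a high probability; the function name is hypothetical.

```python
import math


def soft_argmin(costs):
    """Expected disparity under softmax(-cost) for one pixel.

    costs[d] is the regularized cost at disparity d = 0..D_max.
    """
    shift = max(-c for c in costs)                  # subtract the max for stability
    weights = [math.exp(-c - shift) for c in costs]
    total = sum(weights)
    return sum(d * w / total for d, w in enumerate(weights))


# A clear minimum at d = 2 gives a value close to 2; a flat profile gives the mean.
print(soft_argmin([10.0, 10.0, 0.0, 10.0]))
print(soft_argmin([1.0, 1.0, 1.0, 1.0]))
```

Unlike a hard argmin, the result is a continuous value with well-defined gradients, which is what allows sub-pixel estimates and end-to-end training.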

Module 7: disparity refinement module

The task of the disparity refinement module is to find an increment map that is added to or subtracted from the initial disparity map to further refine the disparity estimate. Its input is the initial disparity map and its output is the final disparity map. The two correlation layers M_f and M_c are defined in Module 2, and this stage uses them to guide the disparity refinement. The refinement proceeds as follows. First, the initial disparity map is upsampled by bilinear interpolation to the same resolution as M_f and merged with M_f. The result is passed through a convolutional layer with a 3x3 kernel and 32 channels, and then through 6 residual blocks with dilation rates of 1, 2, 4, 8, 1 and 1. Next, the output of the residual blocks is fed into a convolutional layer with an output dimension of 1 and a 3x3 kernel, without BN or ReLU, and its output is added to the previous disparity map. Finally, a ReLU ensures that the predicted disparity values are non-negative. The same procedure is then repeated once with M_f replaced by M_c, and its output is the final disparity map. The final disparity map output by the disparity refinement module has dimension H × W × 1, where H and W denote the height and width of the original input image.

Training the network model specifically comprises the following steps:

Step 1: The stereo image pairs of the training data set are fed into the stereo matching network for forward propagation. The learnable parameters of the model include weights and biases; the parameters are randomly initialized and the network model is trained from scratch.

Step 2: Introduce the smooth L₁ loss function:

L(d, d′) = (1/N) ∑_{i=1}^{N} smooth_{L₁}(d_i − d′_i)

smooth_{L₁}(x) = 0.5x² if |x| < 1, and |x| − 0.5 otherwise

where N is the number of pixels in the ground-truth disparity map, d is the ground-truth disparity map, d′ is the predicted disparity map, d_i is a pixel of the ground-truth disparity map, d′_i is the corresponding pixel of the predicted disparity map, and x is d_i − d′_i. Based on the smooth L₁ loss, backpropagation is performed using batch gradient descent to update the learnable parameters of the model, including weights and biases.
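A minimal sketch of a smooth L₁ loss over a disparity map follows. It assumes the standard definition, quadratic for |x| < 1 and linear otherwise, averaged over N pixels; the maps are flattened into plain lists and the function names are hypothetical.

```python
def smooth_l1(x):
    """0.5 * x^2 for |x| < 1, |x| - 0.5 otherwise."""
    ax = abs(x)
    return 0.5 * x * x if ax < 1.0 else ax - 0.5


def smooth_l1_loss(truth, pred):
    """Mean smooth L1 over all pixels of two flattened disparity maps."""
    if len(truth) != len(pred) or not truth:
        raise ValueError("maps must be non-empty and of equal length")
    return sum(smooth_l1(t - p) for t, p in zip(truth, pred)) / len(truth)


truth = [10.0, 20.0, 30.0]
pred = [10.5, 18.0, 30.0]
print(smooth_l1_loss(truth, pred))  # (0.125 + 1.5 + 0.0) / 3
```

The quadratic part keeps gradients small near zero error, while the linear part limits the influence of large errors, which makes the loss less sensitive to outliers than a squared error.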

Step 3: Repeat steps 1 and 2, iteratively training the network model parameters, to obtain the optimal stereo matching network model.

Binocular stereo matching is then performed using the trained stereo matching network model.

Beneficial effects:

The invention provides a binocular stereo matching method based on convolutional neural networks that strictly follows the 4 steps of a stereo matching algorithm (matching cost computation, matching cost aggregation, disparity computation and disparity refinement), designs each step in detail, and integrates the 4 steps into a single network that can be trained end to end. Integrating context information during feature extraction effectively alleviates the mismatching of pixels in ill-posed regions; the small encoder-decoder structure used for regularization significantly reduces memory usage and running time during training and inference; the disparity map is regressed at sub-pixel level; and the initial disparity map is further refined using similarity measures, which improves disparity prediction accuracy.

Brief description of the drawings

Fig. 1 is the network flow chart of the binocular stereo matching method based on convolutional neural networks provided by the invention;

Fig. 2 is the network structure diagram of the binocular stereo matching method based on convolutional neural networks provided by the invention;

Fig. 3 is a schematic diagram of the reference image and target image to be matched from the KITTI2015 data set, provided by an embodiment of the invention, where Fig. 3(a) is the reference image and Fig. 3(b) is the target image;

Fig. 4 is the disparity map of the embodiment's stereo image pair from the KITTI2015 data set, obtained with the method of the invention.

Detailed description of the embodiments

The object of the invention is to provide a binocular stereo matching method based on convolutional neural networks that can be trained end to end without any post-processing, to solve the problem that existing stereo matching methods based on convolutional neural networks cannot accurately find corresponding points for pixels in ill-posed regions, while significantly reducing memory usage and running time during training and inference.

The invention is described in detail below with reference to the drawings. Note that the described embodiment is intended only to aid understanding of the invention and does not limit it in any way.

Fig. 1 is the network flow chart of the binocular stereo matching method based on convolutional neural networks provided by the invention.

Fig. 2 is the network structure diagram of the binocular stereo matching method based on convolutional neural networks provided by the invention. The method specifically comprises:

Step 1: Data processing. The left and right images with ground-truth disparity values are randomly cropped to a size of 512 × 256, and the cropped images are normalized so that the pixel values lie in the range [-1, 1]. By default the left image is the reference image and the right image is the target image; a reference image and a target image form one stereo image pair. The training stereo image pairs come from the FlyingThings3D data set, and the transfer stereo image pairs come from the KITTI2015 data set.

Step 2: Build the stereo matching network. First, a deep representation for computing the stereo matching cost is learned. A feature representation is normally used instead of raw pixel intensities to compute the stereo matching cost. As with hand-crafted descriptors, feature representations are more robust to the ambiguities of illuminated surfaces, so the input stereo image pair first passes through 7 residual blocks (arranged in the 2 residual layers of Module 1) to extract deep feature representations. To better resolve mismatches in ill-posed regions, a dense block containing 6 convolutional layers integrates context information into the cost matching. Next, each reference unary feature is concatenated with the corresponding target unary feature at every possible disparity to form a 4D cost volume, which is used to find correspondences between the pixels of the two input images. Matching cost computation provides the initial similarity between the stereo images, and the cost aggregation stage yields more robust disparity predictions. To this end, a small 3D encoder-decoder structure is proposed to regularize the cost volume while significantly reducing memory usage and running time during training and inference. Then, a differentiable soft argmin operation regresses a smooth, continuous initial disparity map along the disparity dimension of the cost volume. Specifically, a softmax operation computes the probability of each disparity in the cost volume, and the predicted disparity is obtained by summing the products of each disparity and its probability. In the disparity refinement stage, similarity measures guide residual blocks with dilated convolutions to produce a residual map for disparity refinement. The sum of the initial disparity map and the refinement residual map is the final disparity map, which explicitly corrects and refines the initial disparity map.

Step 3: Train the network model. First, the preprocessed FlyingThings3D training stereo pairs are fed into the stereo matching network for forward propagation; the learnable parameters of the model are the weights and biases. Then the output disparity map and the ground-truth disparity map are fed into the L₁ loss function, and backpropagation is performed using batch gradient descent. Finally, the learnable parameters are updated iteratively according to the gradients to obtain the optimal stereo matching network model.
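The L₁ loss named in Step 3 is the mean absolute difference between the predicted and ground-truth disparity maps. A minimal sketch is given below; the function name and the optional `valid` mask are assumptions added for illustration (a mask is commonly needed when ground truth is sparse, but the text does not mention one).

```python
import numpy as np

def l1_loss(pred, target, valid=None):
    """Mean absolute disparity error, optionally restricted to valid pixels."""
    err = np.abs(pred - target)
    if valid is not None:
        err = err[valid]  # hypothetical mask: keep labelled pixels only
    return float(err.mean())
```

In training, a framework with automatic differentiation would compute the gradient of this value with respect to the weights and biases; that part is not shown here.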

Step 4: Transfer learning.

The stereo matching network model is obtained in Step 3. The model is now tested on real scenes by transfer learning with the KITTI2015 stereo pairs (if the training dataset already consists of real-scene images, no further transfer learning is needed and binocular stereo matching can be performed directly after training). Fig. 3 shows the stereo pair to be matched provided by the embodiment of the invention, where Fig. 3(a) is the reference image and Fig. 3(b) is the target image. In this embodiment, the stereo pair to be matched is taken from the KITTI2015 dataset. With reference to Fig. 1 and Fig. 2, transfer learning for the convolutional-neural-network-based stereo matching method is described below using the embodiment's stereo pair from the KITTI2015 dataset (third-order tensors have dimensions H × W × F and fourth-order tensors have dimensions H × W × D × F, where H and W are the height and width of the original input image, D is the maximum possible disparity, 192 by default, and F is the feature dimension):

1) The embodiment's stereo pair from the KITTI2015 dataset is randomly cropped into image patches of size 512 × 256 and then normalized so that pixel values lie in [-1, 1]. After preprocessing, the stereo pair is fed into the trained stereo matching network.

2) As shown in Fig. 2, features are extracted from the embodiment's input stereo pair. First, 2 residual layers extract features from the stereo pair; then a dense block containing 6 densely connected convolutional layers integrates contextual information, with an initial feature dimension of 32 and a growth rate of 16. The output feature map dimension at this point is 128 × 64 × 128. A convolutional layer with a 1×1 kernel and a feature dimension of 32 then reduces the dimensionality to facilitate building the cost volume.
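The feature dimension of 128 reported in step 2) follows from the dense-block bookkeeping: each densely connected layer appends growth-rate channels to its input. The short sketch below only reproduces that arithmetic; the function name is hypothetical and no convolution is performed.

```python
def dense_block_channels(initial, growth_rate, num_layers):
    """Channel count after a dense block that concatenates every layer's output."""
    channels = initial
    for _ in range(num_layers):
        channels += growth_rate  # each layer adds growth_rate new feature maps
    return channels

print(dense_block_channels(32, 16, 6))  # 32 + 6 * 16 = 128
```

The 128 × 64 spatial size is likewise consistent with a fourfold reduction of the 512 × 256 crop.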

3) The output features of the stereo pair are concatenated into a fourth-order tensor to build the cost volume. The output feature map dimension at this point is 128 × 64 × 48 × 32. This tensor first passes through an encoding process comprising 6 3D convolutions and then through two upsampling operations, after which the output feature map dimension is 128 × 64 × 48 × 32. It is then fed into two 3D convolutions, which respectively regularize the cost and reduce the feature dimension to 1; the output feature map dimension at this point is 128 × 64 × 48 × 1.
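Building a cost volume by concatenation, as in step 3), can be sketched as below. This is an illustration under stated assumptions, not the patented implementation: the function name is hypothetical, features are laid out as H × W × F, the left image is the reference so a reference pixel at column x is paired with the target pixel at column x - d, and positions without a valid partner are left at zero. Note that plain concatenation of two F-channel features yields 2F channels, so the feature size of 32 reported in the text may rely on details not stated there.

```python
import numpy as np

def build_cost_volume(ref_feat, tgt_feat, num_disp):
    """Concatenate reference and shifted target features for each disparity.

    ref_feat, tgt_feat: arrays of shape (H, W, F).
    Returns an array of shape (H, W, num_disp, 2 * F).
    """
    h, w, f = ref_feat.shape
    volume = np.zeros((h, w, num_disp, 2 * f), dtype=ref_feat.dtype)
    for d in range(min(num_disp, w)):
        # Reference pixel (y, x) is paired with target pixel (y, x - d).
        volume[:, d:, d, :f] = ref_feat[:, d:, :]
        volume[:, d:, d, f:] = tgt_feat[:, : w - d, :]
    return volume
```

The 48 disparity levels in the text are consistent with the default maximum disparity of 192 divided by the same factor of 4 that relates the 128 × 64 feature map to the 512 × 256 crop.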

4) Computation of the initial disparity. On the cost volume c_d, the softmax operation σ(·) computes the probability of each disparity d. The predicted disparity d′ is obtained by summing the product of each disparity d and its probability. The formula is as follows:

$$d' = \sum_{d} d \cdot p_d$$

where p_d is the probability that σ(·) assigns to disparity d.

The above operation performs disparity regression along the disparity dimension of the cost volume to predict a smooth, continuous initial disparity map. The output feature map dimension at this point is 128 × 64 × 1.
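The soft argmin of step 4) can be sketched in a few lines. The function name is hypothetical, the disparity axis is assumed to be the last one, and the softmax is applied to the negated cost on the assumption that a lower cost means a better match; the text itself only states that softmax is applied to the cost volume, so the sign convention here is a guess.

```python
import numpy as np

def soft_argmin(cost):
    """Expected disparity under a softmax over the last (disparity) axis.

    cost: array of shape (H, W, D). Returns an array of shape (H, W).
    """
    logits = -np.asarray(cost, dtype=np.float64)  # assumed: low cost = likely
    logits -= logits.max(axis=-1, keepdims=True)  # for numerical stability
    prob = np.exp(logits)
    prob /= prob.sum(axis=-1, keepdims=True)
    disparities = np.arange(prob.shape[-1], dtype=np.float64)
    return (prob * disparities).sum(axis=-1)
```

Because the result is a probability-weighted average rather than a hard index, it is differentiable and can take sub-pixel values, which is what makes the initial disparity map smooth and continuous.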

5) As shown in Fig. 2, correlation layers are used twice as guidance while residual layers carry out the disparity refinement; each generated residual map is added to the preceding disparity map to obtain the final disparity map. The first disparity map has dimension 256 × 128 × 1 and the second has dimension 512 × 256 × 1, which restores the original input image size, because each refinement network first upsamples its input with a bilinear interpolation operation.
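One refinement pass of step 5), reduced to its data flow, is a bilinear upsampling followed by the addition of a residual map. The sketch below is illustrative only: the function names are hypothetical, the residual is taken as a given input (the correlation-guided residual layers that would produce it are not modelled), the interpolation samples the corners of the input exactly, and disparity values are not rescaled after upsampling because the text does not say whether they are.

```python
import numpy as np

def bilinear_upsample(x, out_h, out_w):
    """Bilinearly resize a 2D map to (out_h, out_w), keeping corner values."""
    in_h, in_w = x.shape
    ys = np.linspace(0, in_h - 1, out_h)
    xs = np.linspace(0, in_w - 1, out_w)
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, in_h - 1)
    x1 = np.minimum(x0 + 1, in_w - 1)
    wy = (ys - y0)[:, None]  # vertical interpolation weights
    wx = (xs - x0)[None, :]  # horizontal interpolation weights
    top = x[y0][:, x0] * (1 - wx) + x[y0][:, x1] * wx
    bottom = x[y1][:, x0] * (1 - wx) + x[y1][:, x1] * wx
    return top * (1 - wy) + bottom * wy

def refine_step(disparity, residual):
    """Upsample the coarse disparity to the residual's size and add the residual."""
    out_h, out_w = residual.shape
    return bilinear_upsample(disparity, out_h, out_w) + residual
```

Applying `refine_step` twice, each time with a residual map twice as large in both directions, mirrors the progression from the initial map to the 256 × 128 map and then to the 512 × 256 map described above.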

6) The output disparity map and the ground-truth disparity map are fed into the L₁ loss function, and backpropagation is performed using batch gradient descent. Finally, the learnable parameters of the model, including weights and biases, are updated iteratively according to the gradients to obtain the optimal stereo matching network model.

After transfer learning is complete, binocular stereo matching is performed with the trained network.

Fig. 4 shows the disparity map of the embodiment's stereo pair from the KITTI2015 dataset obtained with the method of the invention. The disparity prediction in Fig. 4 shows that the method effectively addresses the problem that matching points cannot be found accurately in ill-posed regions, without any post-processing. Processing full KITTI2015 images (1242 × 375) reaches 5 Hz, a significant improvement in test-time speed compared with existing stereo matching networks.

The above is only a specific embodiment of the invention, and the scope of protection of the invention is not limited to it. Any transformation or substitution that a person familiar with the technology could conceive within the technical scope disclosed by the invention shall fall within the scope of the invention. The scope of protection of the invention shall therefore be determined by the scope of protection of the claims.
