Depth imaging method, electronic device, and computer-readable storage medium

Note: this technology, "Depth imaging method, electronic device, and computer-readable storage medium", was designed and created by 王亚运, 薛远, 曹天宇, and 户磊 on 2021-08-24. Abstract: The embodiments of the present application relate to the technical field of machine vision and disclose a depth imaging method, an electronic device, and a computer-readable storage medium. The depth imaging method includes: extracting features of an acquired object map and features of a preset reference map to obtain a feature map corresponding to the object map and a feature map corresponding to the reference map; calculating a correlation between the object map and the reference map according to the feature map corresponding to the object map, the feature map corresponding to the reference map, and a preset search range, wherein the search range includes a search range in the row direction and a search range in the column direction; acquiring a column deviation between the object map and the reference map according to the correlation; and acquiring a depth map corresponding to the object map according to that column deviation. Because the search range covers both directions, the method is far more robust to row deviation, which improves the quality of the obtained depth map.

1. A depth imaging method, comprising:

extracting features of an acquired object map and features of a preset reference map to obtain a feature map corresponding to the object map and a feature map corresponding to the reference map;

calculating a correlation between the object map and the reference map according to the feature map corresponding to the object map, the feature map corresponding to the reference map, and a preset search range; wherein the search range includes a search range in a row direction and a search range in a column direction;

acquiring a column deviation between the object map and the reference map according to the correlation;

and acquiring a depth map corresponding to the object map according to the column deviation between the object map and the reference map.

2. The depth imaging method of claim 1, wherein the size of the feature map corresponding to the object map is N × H × W, where N is the number of channels, H is the height of the object map, and W is the width of the object map;

before acquiring the column deviation between the object map and the reference map according to the correlation, the method further includes:

traversing each position of the feature map corresponding to the object map, and determining, based on the search range, a matching cost space [(2r+1) × (2c+1), H, W] of the feature map corresponding to the object map, where (2r+1) × (2c+1) is the number of channels of the matching cost space;

and the acquiring of the column deviation between the object map and the reference map according to the correlation includes:

performing an aggregation calculation on the matching cost space according to the correlation to obtain an optical flow disparity matrix of size 2 × H × W corresponding to the object map, the channels of the optical flow disparity matrix comprising a first channel representing a row deviation between the object map and the reference map and a second channel representing the column deviation between the object map and the reference map.

3. The depth imaging method according to claim 1, wherein the object map is a multi-frame image, and the acquiring of the column deviation between the object map and the reference map includes:

acquiring a column deviation between an ith frame of the object map and the reference map; wherein i is an integer greater than 0;

and if i is greater than 1, the acquiring of the column deviation between the ith frame of the object map and the reference map includes:

acquiring a column deviation between the ith frame of the object map and the (i-1)th frame of the object map;

and acquiring the column deviation between the ith frame and the reference map according to the column deviation between the ith frame and the (i-1)th frame and a column deviation between the (i-1)th frame and the reference map.

4. The depth imaging method according to claim 3, wherein, if i is equal to 1, the preset search range is a first search range;

and if i is greater than 1, the preset search range is a second search range; wherein the first search range is larger than the second search range.

5. The depth imaging method according to any one of claims 1 to 4, wherein the preset search range is [-r, r] × [-c, c], where [-r, r] is the search range in the row direction and [-c, c] is the search range in the column direction;

and the correlation between the object map and the reference map is determined, according to the feature map corresponding to the object map, the feature map corresponding to the reference map, and the preset search range, by the following formula:

$$c(x_1, r, c) = \sum_{o \in [-r,\,r] \times [-c,\,c]} f_1(x_1) \cdot f_2(x_1 + o)$$

wherein $c(x_1, r, c)$ represents the correlation between the object map and the reference map, $f_1(x_1)$ is the feature vector of the feature map corresponding to the object map at position $x_1$, and $f_2(x_1 + o)$ is a feature vector of the feature map corresponding to the reference map within the $[-r, r] \times [-c, c]$ range centered at position $x_1$.

6. The depth imaging method of claim 2, wherein the depth imaging method is implemented based on a pre-trained optical flow convolution network, and comprises:

inputting the acquired object map into the optical flow convolution network, and acquiring a depth map corresponding to the object map output by the optical flow convolution network;

wherein the optical flow convolution network comprises a feature extraction unit, a matching cost space construction unit, and an aggregation unit;

the feature extraction unit is configured to extract the features of the object map and the features of the reference map to obtain the feature map corresponding to the object map and the feature map corresponding to the reference map;

the matching cost space construction unit is configured to determine the correlation between the object map and the reference map according to the feature map corresponding to the object map, the feature map corresponding to the reference map, and the search range;

the matching cost space construction unit is further configured to traverse each position of the feature map corresponding to the object map and determine, based on the search range, the matching cost space of the feature map corresponding to the object map;

and the aggregation unit is configured to perform the aggregation calculation on the matching cost space according to the correlation to acquire the column deviation between the object map and the reference map, and to acquire the depth map corresponding to the object map according to the column deviation between the object map and the reference map.

7. The depth imaging method according to claim 6, wherein the object map is a multi-frame image, the optical flow convolution network includes a first optical flow convolution network and a second optical flow convolution network, the first optical flow convolution network corresponds to a first search range, the second optical flow convolution network corresponds to a second search range, and the first search range is larger than the second search range;

and the inputting of the acquired object map into the optical flow convolution network and acquiring of the depth map corresponding to the object map output by the optical flow convolution network comprises:

inputting the 1st frame of the object map into the first optical flow convolution network, and acquiring a depth map corresponding to the 1st frame output by the first optical flow convolution network;

and inputting an ith frame of the object map into the second optical flow convolution network, and acquiring a depth map corresponding to the ith frame output by the second optical flow convolution network; wherein i is an integer greater than 1.

8. The depth imaging method of claim 5 or 6, wherein the pre-trained optical flow convolution network is trained by:

acquiring a three-dimensional model dataset; wherein the three-dimensional model dataset comprises a plurality of three-dimensional models;

performing virtual imaging and three-dimensional rendering on the three-dimensional models according to a preset virtual monocular structured light camera to obtain a first training sample set; wherein training samples in the first training sample set comprise an object map, a reference map, and optical flow ground-truth data;

training an initial optical flow convolution network according to the first training sample set;

adjusting a rotation matrix and/or a translation matrix corresponding to the virtual monocular structured light camera, and performing virtual imaging and three-dimensional rendering on the three-dimensional models according to the adjusted virtual monocular structured light camera to obtain a second training sample set; wherein training samples in the second training sample set comprise an object map, a reference map, and optical flow ground-truth data;

and training, according to the second training sample set, the optical flow convolution network trained with the first training sample set, to obtain the pre-trained optical flow convolution network.

9. An electronic device, comprising:

at least one processor; and

a memory communicatively coupled to the at least one processor; wherein

the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the depth imaging method of any one of claims 1 to 8.

10. A computer-readable storage medium storing a computer program which, when executed by a processor, implements the depth imaging method of any one of claims 1 to 8.

Technical Field

The embodiments of the present application relate to the technical field of machine vision, and in particular to a depth imaging method, an electronic device, and a computer-readable storage medium.

Background

Depth perception technology based on machine vision is contactless, fast, accurate, and widely applicable, and is used extensively in three-dimensional printing, medical imaging, somatosensory devices, geographic mapping, industrial measurement, three-dimensional film and television, game production, and other fields. Its popularization has further expanded people's ability to perceive three-dimensional information, and it has important application value in current scientific research and engineering. An image obtained by depth perception imaging is called a depth map, and the pixel values of a depth map reflect the distance from objects in the scene to the camera.

Most currently common monocular structured light depth imaging methods based on convolutional networks simply output the column deviation between the object map and the reference map through the convolutional network and calculate the depth map corresponding to the object map from that column deviation.

However, a monocular structured light camera inevitably suffers collisions, impacts, scratches, and the like during use, which can change its internal structure; it is also easily affected by temperature changes. Both structural and temperature changes can cause a large row deviation between the object map and the reference map, and this row deviation severely degrades the quality of the calculated depth map, hindering the expansion of the application scenarios and scope of the depth imaging method.

Disclosure of Invention

An object of the embodiments of the present application is to provide a depth imaging method, an electronic device, and a computer-readable storage medium that greatly improve the robustness of the depth imaging method to row deviation, thereby improving the quality of the obtained depth map.

In order to solve the above technical problem, an embodiment of the present application provides a depth imaging method including the following steps: extracting features of an acquired object map and features of a preset reference map to obtain a feature map corresponding to the object map and a feature map corresponding to the reference map; calculating a correlation between the object map and the reference map according to the feature map corresponding to the object map, the feature map corresponding to the reference map, and a preset search range, wherein the search range includes a search range in a row direction and a search range in a column direction; acquiring a column deviation between the object map and the reference map according to the correlation; and acquiring a depth map corresponding to the object map according to the column deviation between the object map and the reference map.

An embodiment of the present application further provides an electronic device, including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the depth imaging method described above.

Embodiments of the present application also provide a computer-readable storage medium storing a computer program which, when executed by a processor, implements the above-described depth imaging method.

In the depth imaging method, the electronic device, and the computer-readable storage medium provided by the embodiments of the present application, features of the acquired object map and of the preset reference map are first extracted to obtain their corresponding feature maps. The features of the object map are then searched for on the feature map corresponding to the reference map within the preset search ranges in the row direction and the column direction, so as to calculate the correlation between the object map and the reference map. The column deviation between the object map and the reference map is obtained from the calculated correlation, and finally the depth map corresponding to the object map is obtained from that column deviation. Because the preset search range covers both the column direction and the row direction, the acquired column deviation accounts for the influence of the row deviation on the column deviation. This is more rigorous and accurate, matches the real behavior of a monocular structured light camera, greatly improves the robustness of the depth imaging method to row deviation, and improves the quality of the obtained depth map.

In addition, the size of the feature map corresponding to the object map is N × H × W, where N is the number of channels, H is the height of the object map, and W is the width of the object map. Before acquiring the column deviation between the object map and the reference map according to the correlation, the method further includes: traversing each position of the feature map corresponding to the object map and determining, based on the search range, a matching cost space [(2r+1) × (2c+1), H, W] of that feature map, where (2r+1) × (2c+1) is the number of channels of the matching cost space. The acquiring of the column deviation according to the correlation then includes: performing an aggregation calculation on the matching cost space according to the correlation to obtain an optical flow disparity matrix of size 2 × H × W corresponding to the object map, whose channels comprise a first channel representing the row deviation between the object map and the reference map and a second channel representing the column deviation. That is, the matching cost space is determined first and the aggregation calculation is then performed on it based on the correlation, so that features of all dimensions are aggregated in both the row direction and the column direction. This further improves the accuracy of the determined row and column deviations, and the matching cost space better accounts for the influence of the row deviation on the column deviation, further improving the robustness of the depth imaging method to row deviation.

In addition, the acquiring of the column deviation between the object map and the reference map includes: acquiring a column deviation between an ith frame of the object map and the reference map, where i is an integer greater than 0. If i is greater than 1, this includes: acquiring a column deviation between the ith frame of the object map and the (i-1)th frame of the object map; and acquiring the column deviation between the ith frame and the reference map according to the column deviation between the ith frame and the (i-1)th frame and the column deviation between the (i-1)th frame and the reference map.

In addition, if i is equal to 1, the preset search range is a first search range; if i is greater than 1, the preset search range is a second search range, the first search range being larger than the second. Since a subsequent frame differs little from the previous frame, an accurate column deviation can be obtained with the smaller second search range when acquiring the depth map corresponding to a subsequent frame, which reduces computation time and speeds up the generation of the depth map.

In addition, the depth imaging method is implemented based on a pre-trained optical flow convolution network and includes: inputting the acquired object map into the optical flow convolution network, and acquiring the depth map corresponding to the object map output by the network. The optical flow convolution network comprises a feature extraction unit, a matching cost space construction unit, and an aggregation unit. The feature extraction unit extracts the features of the object map and of the reference map to obtain their corresponding feature maps. The matching cost space construction unit determines the correlation between the object map and the reference map according to the two feature maps and the search range, traverses each position of the feature map corresponding to the object map, and determines the matching cost space of that feature map based on the search range. The aggregation unit performs the aggregation calculation on the matching cost space according to the correlation to acquire the column deviation between the object map and the reference map, and acquires the depth map corresponding to the object map from that column deviation. Using a pre-trained optical flow convolution network for depth imaging further improves imaging speed and better meets users' needs when performing depth imaging with a monocular structured light camera.

In addition, the object map is a multi-frame image, and the optical flow convolution network comprises a first optical flow convolution network corresponding to a first search range and a second optical flow convolution network corresponding to a second search range, the first search range being larger than the second. Inputting the acquired object map into the optical flow convolution network and acquiring the output depth map includes: inputting the 1st frame of the object map into the first optical flow convolution network and acquiring the depth map corresponding to the 1st frame that it outputs; and inputting the ith frame of the object map, i being an integer greater than 1, into the second optical flow convolution network and acquiring the depth map corresponding to the ith frame that it outputs. Two optical flow convolution networks are trained in advance: the first has a large search range and is suitable for acquiring the depth map corresponding to the 1st frame of the object map, while the second has a small search range and is suitable for acquiring the depth maps of subsequent frames, further reducing the time consumed by depth imaging.

Additionally, the pre-trained optical flow convolution network is trained by: acquiring a three-dimensional model dataset comprising a plurality of three-dimensional models; performing virtual imaging and three-dimensional rendering on the three-dimensional models with a preset virtual monocular structured light camera to obtain a first training sample set, whose training samples comprise an object map, a reference map, and optical flow ground-truth data; training an initial optical flow convolution network on the first training sample set; adjusting the rotation matrix and/or translation matrix corresponding to the virtual monocular structured light camera and performing virtual imaging and three-dimensional rendering on the three-dimensional models with the adjusted camera to obtain a second training sample set, whose training samples likewise comprise an object map, a reference map, and optical flow ground-truth data; and training, on the second training sample set, the network already trained on the first training sample set to obtain the pre-trained optical flow convolution network. This two-stage training helps the row deviation and column deviation output by the optical flow convolution network to be as accurate as possible.

Drawings

One or more embodiments are illustrated by the corresponding figures in the drawings, which are not meant to be limiting.

FIG. 1 is a first flow chart of a depth imaging method according to one embodiment of the present application;

FIG. 2 is a flow chart two of a depth imaging method according to another embodiment of the present application;

FIG. 3 is a schematic diagram of a model structure of an optical flow convolution network provided in an embodiment in accordance with the present application;

FIG. 4 is a flow chart for obtaining column offsets between an ith frame of an object map and a reference map, according to one embodiment of the present application;

FIG. 5 is a flow chart of training an optical flow convolution network, according to one embodiment of the present application;

fig. 6 is a schematic structural diagram of an electronic device according to another embodiment of the present application.

Detailed Description

To make the objects, technical solutions, and advantages of the embodiments of the present application clearer, the embodiments are described in detail below with reference to the accompanying drawings. Those of ordinary skill in the art will appreciate that numerous technical details are set forth in the embodiments to provide a better understanding of the present application; however, the technical solution claimed in the present application can be implemented without these technical details, and with various changes and modifications based on the following embodiments. The division into embodiments below is for convenience of description only, should not limit the specific implementation of the present application, and the embodiments may be combined with and refer to one another where not contradictory.

One embodiment of the present application relates to a depth imaging method applied to an electronic device. The electronic device may be the monocular structured light camera itself or a server built into the monocular structured light camera; in this embodiment and the following embodiments, the electronic device is described taking the built-in server as an example.

A specific flow of the depth imaging method of this embodiment may be as shown in fig. 1, and includes:

Step 101, extracting the features of the acquired object map and the features of a preset reference map to obtain a feature map corresponding to the object map and a feature map corresponding to the reference map.

Specifically, after acquiring an object map shot by the monocular structured light camera, the server may extract features of the object map and features of the preset reference map corresponding to the object map, obtaining a feature map for each; the obtained feature maps may be three-dimensional. The preset reference map may be set by a person skilled in the art according to actual needs, which is not specifically limited in the embodiments of the present application.

In an example, the extraction of the features of the object map and of the reference map may be implemented based on a feature extraction network, for example a network with a DenseNet-BC structure, which performs deep feature calculation and extraction on the object map and the reference map through several consecutive two-dimensional convolution layers, yielding the feature map corresponding to each. Feature reuse and bypass (skip) connections may be applied during extraction, which alleviates problems such as vanishing gradients and model degradation to a certain extent; the receptive field may also be expanded by techniques such as spatial pyramid pooling and dilated convolution. The network reduces the image resolution of the object map and the reference map to one quarter of the original, and the two extraction branches share weights.
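
As a rough illustration of this unit, a minimal PyTorch sketch follows. The layer widths, the plain-convolution design, and all names here are assumptions for illustration only; the actual network uses DenseNet-BC blocks, spatial pyramid pooling, and dilated convolutions.

```python
import torch
import torch.nn as nn

class FeatureExtractor(nn.Module):
    """Illustrative weight-shared feature extractor: stacked 2-D convolutions
    that reduce resolution to one quarter and output an N-channel feature map."""
    def __init__(self, out_channels: int = 32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, stride=2, padding=1),   # 1/2 resolution
            nn.ReLU(inplace=True),
            nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1),  # 1/4 resolution
            nn.ReLU(inplace=True),
            nn.Conv2d(32, out_channels, kernel_size=3, padding=1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

# The same module instance processes both images, so the weights are shared.
extractor = FeatureExtractor()
object_map = torch.rand(1, 1, 480, 640)     # dummy single-channel speckle image
reference_map = torch.rand(1, 1, 480, 640)
f1 = extractor(object_map)                  # [1, 32, 120, 160]
f2 = extractor(reference_map)
```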

And 102, calculating the relevance between the object graph and the reference graph according to the feature graph corresponding to the object graph, the feature graph corresponding to the reference graph and a preset search range.

Specifically, after obtaining the feature map corresponding to the object map and the feature map corresponding to the reference map, the server may search the feature map corresponding to the reference map within a preset search range based on the feature map corresponding to the object map, and calculate the correlation between the object map and the reference map from the search result. The preset search range may be set by a person skilled in the art according to actual needs, which is not specifically limited in the embodiments of the present application.

In one example, the server records the feature map corresponding to the object map as $f_1$ and the feature map corresponding to the reference map as $f_2$, and calculates the correlation between corresponding image patches of the two feature maps. The preset search range is $[-r, r] \times [-c, c]$, where $[-r, r]$ is the search range in the row direction and $[-c, c]$ is the search range in the column direction. The server can calculate the correlation between the feature of $f_1$ at position $x_1$ and the features of $f_2$ within the region centered at $x_1$ by the following formula:

$$c(x_1, r, c) = \sum_{o \in [-r,\,r] \times [-c,\,c]} f_1(x_1) \cdot f_2(x_1 + o)$$

where $c(x_1, r, c)$ is the correlation, $f_1(x_1)$ is the feature vector of the feature map corresponding to the object map at position $x_1$, and $f_2(x_1 + o)$ is a feature vector of the feature map corresponding to the reference map within the $[-r, r] \times [-c, c]$ range centered at position $x_1$.
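
A direct, unoptimized sketch of this correlation follows, assuming the feature maps f1 and f2 from the previous sketch. The nested loop mirrors the sum over offsets o in [-r, r] × [-c, c]; the per-offset dot products are kept as separate channels (summing them over offsets would recover the scalar c(x1, r, c)), so the result doubles as the per-offset matching cost space described later.

```python
import torch
import torch.nn.functional as F

def correlation_volume(f1: torch.Tensor, f2: torch.Tensor, r: int, c: int) -> torch.Tensor:
    """For every position x1, correlate f1(x1) with f2(x1 + o) for each offset
    o in [-r, r] x [-c, c]; returns a [B, (2r+1)*(2c+1), H, W] volume
    (one channel per offset)."""
    B, N, H, W = f1.shape
    f2_padded = F.pad(f2, (c, c, r, r))      # zero-pad so shifted reads stay in bounds
    costs = []
    for dy in range(-r, r + 1):              # row-direction offsets
        for dx in range(-c, c + 1):          # column-direction offsets
            f2_shifted = f2_padded[:, :, r + dy:r + dy + H, c + dx:c + dx + W]
            costs.append((f1 * f2_shifted).sum(dim=1))  # dot product over N channels
    return torch.stack(costs, dim=1)

cost_volume = correlation_volume(f1, f2, r=2, c=8)  # example search range (assumed values)
```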

Step 103, acquiring the column deviation between the object map and the reference map according to the correlation.

Step 104, acquiring the depth map corresponding to the object map according to the column deviation between the object map and the reference map.

In a specific implementation, after calculating the correlation between the object map and the reference map, the server may obtain the row deviation and the column deviation between them according to the correlation, obtain the depth map corresponding to the object map from the column deviation, and output the row deviation alongside the depth map so that the user can inspect and correct the internal structure of the monocular structured light camera.

In one example, the server may obtain the depth map corresponding to the object map by applying a linear transformation to the column deviation between the object map and the reference map; the linear transformation formula may be:

$$Z = \frac{f \cdot L}{d}$$

where Z represents the depth value of a pixel in the depth map, f is the focal length of the monocular structured light camera, L is the baseline length of the monocular structured light camera's depth imaging system, and d is the column deviation value of the pixel in the object map.
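
A per-pixel sketch of this transformation, continuing the PyTorch examples above (the guard against zero disparity is an added assumption, not part of the stated formula):

```python
import torch

def disparity_to_depth(d: torch.Tensor, f: float, L: float, eps: float = 1e-6) -> torch.Tensor:
    """Z = f * L / d per pixel; pixels whose column deviation is (near) zero
    are marked invalid with depth 0 instead of dividing by zero."""
    Z = torch.zeros_like(d)
    valid = d.abs() > eps
    Z[valid] = f * L / d[valid]
    return Z
```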

In one example, obtaining the depth map corresponding to the object map from the column deviation can be implemented by replacing ordinary two-dimensional convolution layers with the convolution unit structure of ShuffleNet-V2, which greatly improves depth imaging speed without reducing depth imaging accuracy.

In this embodiment, compared with the technical solution that simply determines the column deviation between the object map and the reference map and calculates the depth map from it, the method first extracts the features of the acquired object map and of the preset reference map to obtain their corresponding feature maps; it then searches for the features of the object map on the feature map corresponding to the reference map within the preset search ranges in the row direction and the column direction, calculates the correlation between the object map and the reference map, obtains the column deviation from that correlation, and finally obtains the depth map corresponding to the object map from the column deviation. In other words, when determining the column deviation, the embodiments of the present application do not rely on the column direction alone; the influence of the row deviation on the column deviation is also considered. Because the preset search range covers both the column and the row directions, the obtained column deviation accounts for the influence of the row deviation, which is more rigorous and accurate, matches the real behavior of a monocular structured light camera, greatly improves the robustness of the depth imaging method to row deviation, and improves the quality of the obtained depth map.

Another embodiment of the present application relates to a depth imaging method. The implementation details of this embodiment are described below; they are provided only to facilitate understanding and are not necessary for implementing the embodiment. A specific flow of the depth imaging method of this embodiment may be as shown in fig. 2, and includes:

step 201, extracting the features of the obtained object diagram and the features of the preset reference diagram to obtain a feature diagram corresponding to the object diagram and a feature diagram corresponding to the reference diagram.

Step 202, calculating the correlation between the object map and the reference map according to the feature map corresponding to the object map, the feature map corresponding to the reference map, and a preset search range.

Steps 201 to 202 are substantially the same as steps 101 to 102, and are not described herein again.

Step 203, traversing each position of the feature map corresponding to the object map, and determining a matching cost space of the feature map corresponding to the object map based on the search range.

In a specific implementation, the size of the feature map corresponding to the object map is N × H × W, where N is the number of channels, H is the height of the object map, and W is the width of the object map; the feature map corresponding to the reference map has the same size. The server may traverse each position, i.e., each pixel, of the feature map corresponding to the object map and determine, based on the search range, the matching cost space of that feature map. The matching cost space is a three-dimensional tensor that may be denoted [(2r+1) × (2c+1), H, W], where (2r+1) × (2c+1) is its number of channels.

In one example, the construction of the matching cost space can be implemented by a correlation layer network.

And 204, performing aggregation calculation on the matching cost space according to the relevance, and acquiring an optical flow parallax matrix with the channel number being 2 corresponding to the object graph, wherein the channels of the optical flow parallax matrix comprise a first channel representing the row deviation between the object graph and the reference graph and a second channel representing the column deviation between the object graph and the reference graph.

In a specific implementation, after determining the matching cost space of the feature map corresponding to the object map, the server may perform an aggregation calculation on it according to the correlation and obtain an optical flow disparity matrix of size 2 × H × W corresponding to the object map, whose channels comprise a first channel representing the row deviation between the object map and the reference map and a second channel representing the column deviation.

In one example, the aggregation calculation on the matching cost space can be implemented by an improved stacked hourglass network. The improved structure abandons the up- and down-sampling operations of an ordinary hourglass structure, retaining as much high-resolution feature information as possible, and performs the aggregation through a series of two-dimensional convolution layers, finally producing a three-dimensional tensor with 2 channels as the optical flow disparity matrix.
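
A minimal stand-in for this aggregation head is sketched below. The real improved stacked hourglass is considerably deeper; this only shows the constant-resolution chain of 2-D convolutions ending in a 2-channel output, with assumed layer widths.

```python
import torch.nn as nn

class AggregationHead(nn.Module):
    """Aggregates the [(2r+1)*(2c+1), H, W] matching cost space into a
    2-channel optical flow disparity matrix without any up/down-sampling,
    so the feature resolution is preserved throughout."""
    def __init__(self, in_channels: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, hidden, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(hidden, hidden, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(hidden, 2, kernel_size=3, padding=1),  # ch 0: row deviation, ch 1: column deviation
        )

    def forward(self, cost_volume):
        return self.net(cost_volume)  # [B, 2, H, W]
```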

Step 205, acquiring the depth map corresponding to the object map according to the column deviation between the object map and the reference map.

Step 205 is substantially the same as step 104, and is not described herein again.

In this embodiment, the size of the feature map corresponding to the object map is N × H × W, where N is the number of channels, H is the height of the object map, and W is the width of the object map. Before acquiring the column deviation between the object map and the reference map according to the correlation, the method traverses each position of the feature map corresponding to the object map and determines, based on the search range, a matching cost space [(2r+1) × (2c+1), H, W] of that feature map, where (2r+1) × (2c+1) is the number of channels of the matching cost space. Acquiring the column deviation according to the correlation then consists of performing an aggregation calculation on the matching cost space to obtain an optical flow disparity matrix of size 2 × H × W, whose first channel represents the row deviation between the object map and the reference map and whose second channel represents the column deviation. That is, the matching cost space is determined first and the aggregation calculation is then performed on it based on the correlation, so that features of all dimensions are aggregated in both the row and column directions. This further improves the accuracy of the determined row and column deviations, while the matching cost space better accounts for the influence of the row deviation on the column deviation, further improving the robustness of the depth imaging method to row deviation.

In one embodiment, the depth imaging method is implemented based on a pre-trained optical flow convolution network: the server inputs the acquired object map into the pre-trained network and acquires the depth map corresponding to the object map that the network outputs.

In a specific implementation, the model structure of the optical flow convolution network may be as shown in fig. 3; the network includes a feature extraction unit, a matching cost space construction unit, and an aggregation unit.

The feature extraction unit is configured to extract the features of the object map and the features of the reference map to obtain the feature map corresponding to each.

The matching cost space construction unit is configured to determine the correlation between the object map and the reference map according to the feature map corresponding to the object map, the feature map corresponding to the reference map, and the preset search range, to traverse each position of the feature map corresponding to the object map, and to determine the matching cost space of that feature map based on the search range.

The aggregation unit is configured to perform the aggregation calculation on the matching cost space according to the correlation to obtain the column deviation between the object map and the reference map, and to obtain the depth map corresponding to the object map according to that column deviation.
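
Wiring the three units together, a hypothetical end-to-end skeleton reusing the earlier sketches (the names and wiring are illustrative, not the patented architecture itself):

```python
import torch.nn as nn

class OpticalFlowNet(nn.Module):
    """Illustrative composition of the feature extraction, matching cost
    space construction, and aggregation units described above."""
    def __init__(self, r: int, c: int, feat_channels: int = 32):
        super().__init__()
        self.extractor = FeatureExtractor(feat_channels)               # feature extraction unit
        self.r, self.c = r, c                                          # preset search range
        self.aggregation = AggregationHead((2 * r + 1) * (2 * c + 1))  # aggregation unit

    def forward(self, object_map, reference_map):
        f1 = self.extractor(object_map)
        f2 = self.extractor(reference_map)
        cost = correlation_volume(f1, f2, self.r, self.c)  # matching cost space unit
        flow = self.aggregation(cost)                      # [B, 2, H', W']
        return flow[:, 0], flow[:, 1]                      # row deviation, column deviation
```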

In this embodiment, the server performs depth imaging with a pre-trained optical flow convolution network, which further improves the speed of depth imaging and better meets users' needs when performing depth imaging with a monocular structured light camera.

In one embodiment, the object map shot by the monocular structured light camera is a multi-frame image, and the server acquires the column deviation between the object map and the reference map by acquiring the column deviation between the ith frame of the object map and the reference map, where i is an integer greater than 0.

In a specific implementation, if i is equal to 1, i.e., the frame whose depth map is needed is the first frame of the object map, the server may directly acquire the column deviation between the 1st frame of the object map and the reference map. If i is greater than 1, i.e., a subsequent frame's depth map is needed, the server acquires the column deviation between the ith frame of the object map and the reference map through the steps shown in fig. 4, which specifically include:

step 301, acquiring a column offset between the ith frame of the object map and the (i-1) th frame of the object map.

In a specific implementation, when the server determines that the frame whose depth map is needed is not the 1st frame of the object map but the ith frame, it may first acquire the column deviation between the ith frame and the (i-1)th frame of the object map instead of directly acquiring the deviation against the reference map, which shortens the time needed to obtain the column deviation.

Step 302, acquiring the column deviation between the ith frame and the reference map according to the column deviation between the ith frame and the (i-1)th frame and the column deviation between the (i-1)th frame and the reference map.

In a specific implementation, after acquiring the column deviation between the ith frame and the (i-1)th frame of the object map, the server may superimpose onto it the previously acquired column deviation between the (i-1)th frame and the reference map, obtaining the column deviation between the ith frame of the object map and the reference map.
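
In sketch form, this superposition is a simple addition (any resampling or warping refinement an implementation might add is omitted here):

```python
def column_deviation_to_reference(col_dev_i_to_prev, col_dev_prev_to_ref):
    """Superimpose the frame-i -> frame-(i-1) column deviation onto the cached
    frame-(i-1) -> reference column deviation to obtain the frame-i -> reference
    column deviation."""
    return col_dev_i_to_prev + col_dev_prev_to_ref
```

The result is cached so that frame i+1 can repeat the same step against frame i.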

In this embodiment, the column deviation between the object map and the reference map is acquired frame by frame: for the ith frame, i being an integer greater than 0, if i is greater than 1, the server acquires the column deviation between the ith frame and the (i-1)th frame of the object map and then combines it with the column deviation between the (i-1)th frame and the reference map to obtain the column deviation between the ith frame and the reference map.

In one embodiment, the monocular structured light camera shoots a multi-frame object map, and the server acquires the column deviation between the ith frame of the object map and the reference map, where i is an integer greater than 0. If i is equal to 1, i.e., the depth map of the 1st frame is needed, the server sets the preset search range to a first search range; if i is greater than 1, i.e., the depth map of a subsequent frame is needed, the server sets it to a second search range, the first search range being larger than the second. Since a subsequent frame differs little from the previous frame, an accurate column deviation can be obtained with the smaller second search range when acquiring the depth map of a subsequent frame, further reducing computation time and increasing the generation speed of the depth map.

In one embodiment, the depth imaging method may be implemented based on a pre-trained optical flow convolution network: the server inputs the acquired object map into the pre-trained network and acquires the depth map corresponding to the object map that it outputs. The object map shot by the monocular structured light camera is a multi-frame image, and the pre-trained optical flow convolution network includes a first optical flow convolution network corresponding to a first search range and a second optical flow convolution network corresponding to a second search range, the first search range being larger than the second.

In a specific implementation, acquiring the depth map corresponding to the object map output by the optical flow convolution network includes the following steps: inputting the 1st frame of the object map into the first optical flow convolution network and acquiring the depth map corresponding to the 1st frame that it outputs; and inputting the ith frame of the object map into the second optical flow convolution network and acquiring the depth map corresponding to the ith frame that it outputs, where i is an integer greater than 1.
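
A hypothetical dispatch over the two networks, reusing the skeleton and the depth conversion sketched earlier (the search range values, focal length, and baseline are placeholders, not values from the text):

```python
net_first = OpticalFlowNet(r=8, c=64)   # first search range (larger; values assumed)
net_follow = OpticalFlowNet(r=2, c=8)   # second search range (smaller; values assumed)

def depth_for_frame(i: int, frame, reference_map, f: float = 500.0, L: float = 0.05):
    """Frame 1 goes through the large-range network; later frames use the
    cheaper small-range network."""
    net = net_first if i == 1 else net_follow
    _, col_dev = net(frame, reference_map)
    return disparity_to_depth(col_dev, f, L)
```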

In this embodiment, the server performs depth imaging with two pre-trained optical flow convolution networks: the first has a large search range and is suitable for acquiring the depth map corresponding to the 1st frame of the object map, while the second has a small search range and is suitable for acquiring the depth maps of subsequent frames, further reducing the time consumed by depth imaging.

In one embodiment, the pre-trained optical flow convolution network may be trained through the steps shown in fig. 5, which specifically include:

step 401, a three-dimensional model data set is obtained, wherein the three-dimensional model data set comprises a plurality of three-dimensional models.

In one example, the acquired three-dimensional model dataset may be the three-dimensional floating-objects dataset (FlyingThings3D) in the Scene Flow Datasets; the FlyingThings3D dataset consists of numerous three-dimensional objects floating in the air against a virtual background.

Step 402, performing virtual imaging and three-dimensional rendering on the three-dimensional model according to a preset virtual monocular structured light camera to obtain a first training sample set.

In a specific implementation, the server may perform virtual imaging and three-dimensional rendering on the three-dimensional models using the preset virtual monocular structured light camera to obtain a first training sample set. The rotation matrix of the preset virtual camera is the identity matrix and its translation matrix is the zero matrix, i.e., the preset virtual monocular structured light camera is considered to have no row offset. The training samples in the first training sample set include paired object and reference maps and optical flow ground-truth data.

In one example, after obtaining the first training sample set through virtual imaging and three-dimensional rendering, the server may preprocess the first training samples; the preprocessing includes image brightness adjustment, image contrast adjustment, image cropping, and the like, which greatly expands the first training sample set.
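
A sketch of such expansion preprocessing, assuming torchvision-style tensor transforms (the parameter ranges and crop sizes are illustrative; note that a crop must be applied identically to the paired object map, reference map, and optical flow ground truth to keep them consistent):

```python
import random
import torchvision.transforms.functional as TF

def expand_sample(img):
    """Randomly adjust brightness and contrast and crop a sub-window,
    to multiply the number of first-set training samples."""
    img = TF.adjust_brightness(img, random.uniform(0.8, 1.2))
    img = TF.adjust_contrast(img, random.uniform(0.8, 1.2))
    top, left = random.randint(0, 32), random.randint(0, 32)
    # The same (top, left) window must also be cut from the paired maps
    # and the optical flow ground truth.
    return TF.crop(img, top, left, img.shape[-2] - 64, img.shape[-1] - 64)
```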

Step 403, training an initial optical flow convolution network according to the first training sample set.

In a specific implementation, after obtaining the first training sample set, the server may train the initial optical flow convolution network on it; the first training sample set provides general basic data for training.

Step 404, adjusting the rotation matrix and/or translation matrix corresponding to the virtual monocular structured light camera, and performing virtual imaging and three-dimensional rendering on the three-dimensional models according to the adjusted virtual monocular structured light camera to obtain a second training sample set.

In a specific implementation, after training the initial optical flow convolution network on the first training sample set, the server may adjust the rotation matrix and/or translation matrix corresponding to the virtual monocular structured light camera, i.e., treat the camera's internal structure as having changed so that a row deviation exists, and perform virtual imaging and three-dimensional rendering on the three-dimensional models with the adjusted camera to obtain a second training sample set, whose training samples include paired object and reference maps and optical flow ground-truth data.

In one example, after obtaining the second training sample set with the adjusted virtual monocular structured light camera, the server preprocesses the second training samples; the preprocessing includes adding Gaussian noise and applying Gaussian blur, which simulates the low speckle reflectivity and the poor imaging sharpness and signal-to-noise ratio of real human skin, so that the depth imaging method adapts better to face and body regions.

In one example, the preprocessing of the second training samples further includes Local Contrast Normalization (LCN). In a monocular structured light system, the speckle brightness and contrast of the object map and the reference map may differ greatly depending on the shooting distance and the materials of objects in the scene; LCN improves the robustness of the optical flow convolution network to such large differences between the object map and the reference map.

In one example, LCN operation may be implemented by the following equation:

$$I_{LCN} = \frac{I - \mu}{\sigma + \eta}$$

where I is the original pixel brightness value, $I_{LCN}$ is the pixel brightness value after the LCN operation, μ is the mean brightness within a preset window around the pixel (the window size is generally 9 × 9 to 15 × 15), σ is the standard deviation of brightness within that window, and η is a preset constant that prevents the denominator from being 0.
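
A direct sketch of the LCN operation with a box window (the window size and η below are example values within the stated ranges):

```python
import torch
import torch.nn.functional as F

def local_contrast_normalization(img: torch.Tensor, window: int = 11, eta: float = 1e-4):
    """I_LCN = (I - mu) / (sigma + eta), where mu and sigma are the mean and
    standard deviation of brightness over a window x window neighborhood."""
    pad = window // 2
    kernel = torch.ones(1, 1, window, window) / (window * window)
    img_p = F.pad(img, (pad, pad, pad, pad), mode='reflect')
    mu = F.conv2d(img_p, kernel)                       # local mean
    var = F.conv2d(img_p * img_p, kernel) - mu * mu    # local variance
    sigma = var.clamp(min=0).sqrt()
    return (img - mu) / (sigma + eta)
```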

Step 405, training, according to the second training sample set, the optical flow convolution network trained with the first training sample set, to obtain the pre-trained optical flow convolution network.

In a specific implementation, after obtaining the second training sample set, the server may train the network already trained on the first training sample set using the second set, obtaining the pre-trained optical flow convolution network.

In one example, the server may dynamically adjust the learning rate during model training using the RMSProp optimizer, decreasing the learning rate stepwise as the number of iterations increases, and may train the optical flow convolution network with the Smooth-L1 loss as the loss function, the loss value being calculated as follows:

where Loss value of Loss, N is the number of marked pixels, dijIn order to be the true disparity value,in order to be a predicted disparity value for the first image,is a Smooth-L1-loss function.

In this embodiment, the server trains the optical flow convolution network twice. The first training is generalized training on a general optical flow network, giving it the ability to acquire basic optical flow deviations; the second training is targeted training for the monocular structured light camera, giving the optical flow convolution network robustness to row deviation.

The steps of the above methods are divided only for clarity of description; in implementation they may be combined into one step, or a step may be split into multiple steps, and all such variants are within the protection scope of this patent as long as the same logical relationship is included. Adding insignificant modifications to an algorithm or process, or introducing insignificant design changes, without changing its core design also falls within the scope of the patent.

Another embodiment of the present application relates to an electronic device, as shown in fig. 6, including: at least one processor 501; and a memory 502 communicatively coupled to the at least one processor 501; the memory 502 stores instructions executable by the at least one processor 501, and the instructions are executed by the at least one processor 501, so that the at least one processor 501 can execute the depth imaging method in the above embodiments.

Where the memory and processor are connected by a bus, the bus may comprise any number of interconnected buses and bridges, the buses connecting together one or more of the various circuits of the processor and the memory. The bus may also connect various other circuits such as peripherals, voltage regulators, power management circuits, and the like, which are well known in the art, and therefore, will not be described any further herein. A bus interface provides an interface between the bus and the transceiver. The transceiver may be one element or a plurality of elements, such as a plurality of receivers and transmitters, providing a means for communicating with various other apparatus over a transmission medium. The data processed by the processor is transmitted over a wireless medium via an antenna, which further receives the data and transmits the data to the processor.

The processor is responsible for managing the bus and general processing and may also provide various functions including timing, peripheral interfaces, voltage regulation, power management, and other control functions. And the memory may be used to store data used by the processor in performing operations.

Another embodiment of the present application relates to a computer-readable storage medium storing a computer program. The computer program realizes the above-described method embodiments when executed by a processor.

That is, as can be understood by those skilled in the art, all or part of the steps of the methods in the embodiments described above may be implemented by a program instructing related hardware. The program is stored in a storage medium and includes several instructions to enable a device (which may be a single-chip microcomputer, a chip, or the like) or a processor to execute all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.

It will be understood by those of ordinary skill in the art that the foregoing embodiments are specific examples for carrying out the present application, and that various changes in form and details may be made therein without departing from the spirit and scope of the present application in practice.
