Depth information processing method, device, apparatus, storage medium, and program product

Document No.: 569653 · Publication date: 2021-05-18

Reading note: this technology, "Depth information processing method, device, apparatus, storage medium, and program product", was designed and created by Song Xibin (宋希彬) and Zhang Liangjun (张良俊) on 2021-01-07. Its main content is as follows: the disclosure provides a depth information processing method, apparatus, device, storage medium, and program product, relating to the field of image processing technology, and in particular to computer vision, deep learning, and autonomous driving technology. The specific implementation scheme is: determining intermediate depth information of a target scene according to sparse depth information of the target scene through a sub-model unit of a depth information supplementary model; and taking the intermediate depth information determined by the tail sub-model unit in the depth information supplementary model as the dense depth information of the target scene. Embodiments of the disclosure can improve the accuracy of dense depth information prediction.

1. A depth information processing method, comprising:

determining intermediate depth information of a target scene according to sparse depth information of the target scene by a sub-model unit in a depth information supplementary model;

and taking the intermediate depth information determined by the tail sub-model unit in the depth information supplementary model as the dense depth information of the target scene.

2. The method of claim 1, wherein the determining intermediate depth information of a target scene according to sparse depth information of the target scene by a sub-model unit in the depth information supplementary model comprises:

taking the sparse depth information of the target scene as the input of a head sub-model unit in the depth information supplementary model to obtain the intermediate depth information determined by the head sub-model unit;

and for each sub-model unit other than the head sub-model unit in the depth information supplementary model, taking the intermediate depth information determined by the preceding sub-model unit as the input of that sub-model unit to obtain the intermediate depth information determined by that sub-model unit.

3. The method of claim 1, wherein the determining intermediate depth information of a target scene according to sparse depth information of the target scene by a sub-model unit in the depth information supplementary model comprises:

processing the input depth information through an hourglass network layer of a sub-model unit in the depth information supplementary model to obtain supplementary depth information;

and superposing the supplementary depth information and the depth information input to the sub-model unit through an accumulation layer of the sub-model unit to obtain the intermediate depth information determined by the sub-model unit.

4. The method of claim 3, wherein the superposing the supplementary depth information and the depth information input to the sub-model unit comprises:

determining a matching relationship between supplementary pixel points in the supplementary depth information and sparse pixel points in the input depth information through an accumulation layer of the sub-model unit;

and superposing the supplementary depth data of the supplementary pixel points and the sparse depth data of the matched sparse pixel points.

5. The method of claim 1, further comprising:

determining depth feature information according to the pixel information of the target scene through a sub-model unit in the depth information supplementary model;

and adjusting the intermediate depth information determined by the sub-model unit according to the depth feature information.

6. The method of claim 1, further comprising:

acquiring standard sparse depth information and standard dense depth information of a standard scene;

and training to obtain a depth information supplementary model according to the standard sparse depth information and the standard dense depth information.

7. The method of claim 6, wherein the obtaining standard sparse depth information and standard dense depth information for a standard scene comprises:

acquiring an image of the standard scene by using a depth sensor to obtain standard dense depth information;

and sampling the standard dense depth information to generate the standard sparse depth information.

8. The method of claim 6, wherein the obtaining standard sparse depth information and standard dense depth information for a standard scene comprises:

performing video acquisition on the standard scene with a radar device to obtain continuous multi-frame sparse depth information;

performing projection processing on the continuous multi-frame sparse depth information to generate the standard dense depth information;

and acquiring sparse depth information matched with the standard dense depth information from the continuous multi-frame sparse depth information, and determining the sparse depth information as the standard sparse depth information.

9. A depth information processing apparatus comprising:

the sparse depth information input module is used for determining intermediate depth information of a target scene according to sparse depth information of the target scene through a sub-model unit in a depth information supplementary model;

and the dense depth information generation module is used for taking the intermediate depth information determined by the tail sub-model unit in the depth information supplementary model as the dense depth information of the target scene.

10. The apparatus of claim 9, wherein the sparse depth information input module comprises:

the serial input unit is used for taking the sparse depth information of the target scene as the input of a head sub-model unit in the depth information supplementary model to obtain the intermediate depth information determined by the head sub-model unit;

and the intermediate transmission unit is used for taking, for each sub-model unit other than the head sub-model unit in the depth information supplementary model, the intermediate depth information determined by the preceding sub-model unit as the input of that sub-model unit to obtain the intermediate depth information determined by that sub-model unit.

11. The apparatus of claim 9, wherein the sparse depth information input module comprises:

the supplementary depth information acquisition unit is used for processing the input depth information through the hourglass network layer of the sub-model unit of the depth information supplementary model to obtain supplementary depth information;

and the supplementary depth information superposition unit is used for superposing the supplementary depth information and the depth information input to the sub-model unit through an accumulation layer of the sub-model unit to obtain the intermediate depth information determined by the sub-model unit.

12. The apparatus of claim 11, wherein the supplementary depth information superposition unit comprises:

the pixel matching subunit is used for determining the matching relationship between supplementary pixel points in the supplementary depth information and sparse pixel points in the input depth information through the accumulation layer of the sub-model unit;

and the pixel superposition subunit is used for superposing the supplementary depth data of the supplementary pixel points and the sparse depth data of the matched sparse pixel points.

13. The apparatus of claim 9, further comprising:

the pixel information input module is used for determining depth feature information according to the pixel information of the target scene through a sub-model unit in the depth information supplementary model;

and the intermediate depth information adjusting module is used for adjusting the intermediate depth information determined by the sub-model unit according to the depth feature information.

14. The apparatus of claim 9, further comprising:

the sample acquisition module is used for acquiring standard sparse depth information and standard dense depth information of a standard scene;

and the model training module is used for training to obtain a depth information supplementary model according to the standard sparse depth information and the standard dense depth information.

15. The apparatus of claim 14, wherein the sample acquisition module comprises:

the dense depth information acquisition unit is used for acquiring images of the standard scene by adopting a depth sensor to obtain standard dense depth information;

and the sparse depth information generating unit is used for sampling in the standard dense depth information to generate the standard sparse depth information.

16. The apparatus of claim 14, wherein the sample acquisition module comprises:

the sparse depth information acquisition unit is used for performing video acquisition on the standard scene with a radar device to obtain continuous multi-frame sparse depth information;

the sparse depth information fusion unit is used for performing projection processing on the continuous multi-frame sparse depth information to generate the standard dense depth information;

and the dense depth information generating unit is used for acquiring sparse depth information matched with the standard dense depth information from the continuous multi-frame sparse depth information and determining the sparse depth information as the standard sparse depth information.

17. An electronic device, comprising:

at least one processor; and

a memory communicatively coupled to the at least one processor; wherein

the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the depth information processing method of any one of claims 1 to 8.

18. A non-transitory computer-readable storage medium storing computer instructions for causing the computer to execute the depth information processing method according to any one of claims 1 to 8.

19. A computer program product comprising a computer program which, when executed by a processor, implements a depth information processing method according to any one of claims 1-8.

Technical Field

The present disclosure relates to the field of image processing technologies, and in particular, to computer vision technologies, deep learning technologies, and automatic driving technologies.

Background

Depth perception, the perception of the distances of different objects in the same scene, is an important component of many computer vision tasks, such as autonomous navigation.

For example, radar devices can typically generate only sparse depth maps that lack much depth data. Depth completion is a technique that takes the collected discrete (sparse) scene depth information as input and recovers dense scene depth information.

Disclosure of Invention

The present disclosure provides a depth information processing method, apparatus, device, storage medium, and program product.

According to an aspect of the present disclosure, there is provided a depth information processing method, including: determining intermediate depth information of a target scene according to sparse depth information of the target scene through a sub-model unit in a depth information supplementary model;

and taking the intermediate depth information determined by the tail sub-model unit in the depth information supplementary model as the dense depth information of the target scene.

According to another aspect of the present disclosure, there is provided a depth information processing apparatus including:

the sparse depth information input module is used for determining intermediate depth information of the target scene according to the sparse depth information of the target scene through a sub-model unit in the depth information supplementary model;

and the dense depth information generation module is used for taking the intermediate depth information determined by the tail sub-model unit in the depth information supplementary model as the dense depth information of the target scene.

According to another aspect of the present disclosure, there is provided an electronic device including:

at least one processor; and

a memory communicatively coupled to the at least one processor; wherein

the memory stores instructions executable by the at least one processor to enable the at least one processor to perform a depth information processing method according to any one of the embodiments of the present disclosure.

According to another aspect of the present disclosure, there is provided a non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform a depth information processing method according to any one of the embodiments of the present disclosure.

According to another aspect of the present disclosure, there is provided a computer program product comprising a computer program which, when executed by a processor, implements the depth information processing method according to any one of the embodiments of the present disclosure.

According to the technical solution of the present disclosure, the accuracy of dense depth information prediction is improved.

It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.

Drawings

The drawings are included to provide a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:

fig. 1 is a schematic diagram of a depth information processing method according to an embodiment of the present disclosure;

FIG. 2 is a schematic diagram of a depth information processing method according to an embodiment of the present disclosure;

FIG. 3 is a schematic diagram of a depth information supplemental model in which embodiments of the present disclosure may be implemented;

FIG. 4 is a schematic diagram of a depth information processing method according to an embodiment of the present disclosure;

FIG. 5 is a schematic diagram of a sub-model element in which embodiments of the present disclosure may be implemented;

FIG. 6 is a schematic diagram of a depth information supplemental model in which embodiments of the present disclosure may be implemented;

FIG. 7 is a schematic diagram of a sub-model element in which embodiments of the present disclosure may be implemented;

FIG. 8 is a schematic diagram of a depth information processing method according to an embodiment of the present disclosure;

fig. 9 is a schematic diagram of a depth information processing apparatus according to an embodiment of the present disclosure;

fig. 10 is a block diagram of an electronic device for implementing a depth information processing method according to an embodiment of the present disclosure.

Detailed Description

Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of the embodiments of the disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.

Fig. 1 is a flowchart of a depth information processing method disclosed in an embodiment of the present disclosure, and this embodiment may be applied to a case where sparse depth information of a target scene is supplemented to generate dense depth information of the target scene. The method of the embodiment may be executed by a depth information processing apparatus, which may be implemented in software and/or hardware and is specifically configured in an electronic device with certain data operation capability.

S101, determining intermediate depth information of the target scene according to sparse depth information of the target scene through a sub-model unit of the depth information supplementary model.

The depth information supplementary model is used to supplement sparse depth information into dense depth information, where supplementing depth information can be understood as predicting depth information. The depth information of a scene can be described by an image in which pixel points carry depth information. The difference between sparse and dense depth information lies in the proportion of effective pixels in the image: the proportion of pixel points carrying depth information in the image corresponding to sparse depth information (for example, 20%) is smaller than that in the image corresponding to dense depth information (for example, 80%). Moreover, in the image corresponding to sparse depth information, the pixel points carrying depth information are unevenly and sparsely distributed, whereas in the image corresponding to dense depth information they are uniformly and densely distributed. Dense depth information is therefore richer and denser than sparse depth information.
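For a concrete sense of what these ratios mean, the following minimal numpy sketch computes the proportion of effective pixels in a depth map; the function name and the zero-means-missing convention are assumptions for illustration, not from the disclosure:

```python
import numpy as np

def valid_pixel_ratio(depth_map: np.ndarray) -> float:
    """Fraction of pixels carrying depth data, treating 0 as 'no measurement'."""
    return float(np.count_nonzero(depth_map)) / depth_map.size
```

Under this measure, a sparse radar map would score around 0.2 and a dense one around 0.8 or higher, matching the example figures above.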

The depth information supplementary model may be a deep learning model trained in advance. It comprises a plurality of sub-model units, which may be connected in series and/or in parallel.

The target scene may be any application scene; for example, it may be a real road scene, or a scene containing three-dimensional objects. The sparse depth information of the target scene may be a depth information image collected from the target scene by a radar or a depth sensor, in which pixel points carry depth information. The sparse depth information acquisition equipment can be mounted on a robot, an unmanned vehicle, or a terminal device, which can collect the depth information of the target scene in real time while moving or stationary. In addition, the acquired depth information image is generally a monocular image.

The sparse depth information of the target scene serves as the input data of the depth information supplementary model. Optionally, it is used as the input of the head sub-model unit, or of every sub-model unit, in the depth information supplementary model. The sparse depth information is input into the depth information supplementary model, and each sub-model unit determines corresponding intermediate depth information. The intermediate depth information may be the depth prediction result output by a sub-model unit, usually the depth information formed by supplementing the depth information input to that sub-model unit. Each sub-model unit correspondingly outputs intermediate depth information.

S102, taking the intermediate depth information determined by the tail sub-model unit in the depth information supplementary model as the dense depth information of the target scene.

The tail sub-model unit is the last sub-model unit among the connected sub-model units. For example, when the sub-model units are connected purely in series, the last one is the tail sub-model unit; when several sub-model units are connected in parallel at the tail, all of them are tail sub-model units. When there is a single tail sub-model unit, the intermediate depth information it outputs is determined as the dense depth information of the target scene, i.e., as the output of the depth information supplementary model. The dense depth information of the target scene serves as the output data of the depth information supplementary model, which thus completes the sparse depth information into dense depth information.

According to this technical solution, the intermediate depth information of the target scene is determined in turn by the sub-model units of the depth information supplementary model according to the sparse depth information of the target scene, supplementing the sparse depth information step by step; finally, the intermediate depth information of the tail sub-model unit is determined as the dense depth information of the target scene. The sparse depth information is thus supplemented in multiple stages and sufficiently, improving the accuracy of depth information prediction.

Fig. 2 is a flowchart of another depth information processing method disclosed in an embodiment of the present disclosure, which is further optimized and expanded based on the above technical solution, and can be combined with the above optional embodiments.

S201, taking the sparse depth information of the target scene as the input of the head sub-model unit in the depth information supplementary model to obtain the intermediate depth information determined by the head sub-model unit.

The depth information supplementary model comprises sub-model units connected in series. The sparse depth information of the target scene is used as the input of the head sub-model unit in the depth information supplementary model; the head sub-model unit processes it to obtain the intermediate depth information determined by the head sub-model unit, which is transmitted to the next sub-model unit.

The head sub-model unit supplements the sparse depth information to obtain its intermediate depth information. In the image corresponding to this intermediate depth information, the proportion of pixel points carrying depth information is higher than in the image corresponding to the sparse depth information; that is, the intermediate depth information determined by the head sub-model unit is denser than the sparse depth information, and its depth information is richer.

S202, for each sub-model unit other than the head sub-model unit in the depth information supplementary model, taking the intermediate depth information determined by the preceding sub-model unit as the input of that sub-model unit to obtain the intermediate depth information determined by that sub-model unit.

The other sub-model units are the sub-model units other than the head one. Each is connected in series with its preceding sub-model unit, whose output, the intermediate depth information it determined, serves as the input of the current sub-model unit. The current sub-model unit processes that intermediate depth information to obtain its own intermediate depth information. All intermediate depth information corresponds to images of the same size as the image corresponding to the sparse depth information of the target scene, where the image size may include the number of channels, the image width, the image height, and the like.

In practice, the content learned by the head sub-model unit is the difference between the sparse depth information and the true dense depth information, while the content learned by each other sub-model unit may be the difference between the intermediate depth information output by the preceding sub-model unit and the true dense depth information. Through continuous learning, the intermediate depth information output by each sub-model unit gets closer to the true dense depth information. The true dense depth information is the real depth information of the target scene, that is, every pixel in the corresponding image carries depth information.

As shown in fig. 3, the depth information supplementary model includes a plurality of serially connected sub-model units 302-304, where the sub-model unit 302 is the head sub-model unit and the sub-model unit 304 is the tail sub-model unit. The sub-model unit 302 receives and processes the sparse depth information 301 to obtain intermediate depth information and transmits it to the next serially connected sub-model unit 303, and so on, until the sub-model unit 304 outputs intermediate depth information, which is determined as the dense depth information 305. Optionally, the depth information supplementary model includes at least two sub-model units.
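The disclosure publishes no source code, so the following PyTorch sketch is only one plausible reading of fig. 3; the class name DepthCompletionStack, the make_unit factory, and the default of three units are invented for illustration (a concrete sub-model unit is sketched after the fig. 5 discussion below):

```python
import torch
import torch.nn as nn

class DepthCompletionStack(nn.Module):
    """Serially connected sub-model units, as in fig. 3: the head unit takes
    the sparse depth map, every later unit takes the preceding unit's
    intermediate depth, and the tail unit's output is the dense depth."""
    def __init__(self, make_unit, num_units: int = 3):
        super().__init__()
        # At least two sub-model units, per the optional limitation above.
        self.units = nn.ModuleList(make_unit() for _ in range(max(num_units, 2)))

    def forward(self, sparse_depth: torch.Tensor) -> torch.Tensor:
        depth = sparse_depth
        for unit in self.units:   # multi-stage supplementation
            depth = unit(depth)   # each output is intermediate depth information
        return depth              # tail unit's intermediate depth = dense depth
```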

Existing deep learning models for depth information supplement perform only single-stage processing to obtain the supplemented depth information directly; since depth information is difficult to supplement in one step, the supplement is often insufficient. In this technical solution, among the multiple serially connected sub-model units, each sub-model unit supplements based on the intermediate depth information provided by the preceding sub-model unit, realizing multi-stage depth information supplement: the sparse depth information is gradually accumulated and supplemented, becoming progressively denser, so that the density of the depth information can be accurately improved.

S203, the intermediate depth information determined by the tail sub-model unit in the depth information supplementary model is used as the dense depth information of the target scene.

According to this technical solution, among the multiple serially connected sub-model units, each sub-model unit supplements based on the intermediate depth information provided by the preceding sub-model unit. Through these multi-stage supplement operations, the sparse depth information is gradually accumulated and supplemented and becomes progressively denser, so that the density of the depth information can be accurately improved and the prediction accuracy of the depth information is improved.

Fig. 4 is a flowchart of another depth information processing method disclosed in an embodiment of the present disclosure, which is further optimized and expanded based on the above technical solution, and can be combined with the above optional embodiments.

S401, processing the input depth information through the hourglass network layer of a sub-model unit of the depth information supplementary model to obtain supplementary depth information.

Each sub-model unit has the same structure, comprising an hourglass network layer and an accumulation layer; the hourglass network layer is connected to the accumulation layer, and the input of the accumulation layer includes the output of the hourglass network layer. The supplementary depth information may be the depth information that the input depth information lacks; it is added to the input depth information to supplement it and form the intermediate depth information. The image size corresponding to the supplementary depth information is the same as the image size corresponding to the input depth information.

The hourglass network layer (hourglass network) includes an encoder and a decoder (encoder-decoder). The encoder and the decoder have a symmetric structure and contain the same number of convolutional layers, so the image size corresponding to the supplementary depth information output by the hourglass network layer is the same as the image size corresponding to the input depth information. The encoder is used for feature extraction, and the decoder can be viewed as the inverse operation of the encoder. For example, the encoder may employ a deep convolutional neural network such as VGG (Visual Geometry Group) or a residual network (ResNet). Each convolution operation uses a k×k convolution kernel and c channels.

As shown in fig. 5, in the structure of a sub-model unit, the sub-model unit acquires input depth information 501, processes it sequentially with the encoder 502 and decoder 503 of the hourglass network layer, and feeds the result into the accumulation layer 504 as supplementary depth information; the accumulation layer 504 also receives the input depth information 501 and accumulates the two to obtain intermediate depth information 505. The input depth information 501 may be sparse depth information or intermediate depth information. In the hourglass network layer, pairs of convolutional layers connected by arrows in the encoder 502 and the decoder 503 correspond to each other and have the same size, and each convolutional layer in the decoder 503 is feature-fused with the output of its corresponding encoder convolutional layer. Illustratively, the decoder 503 and the encoder 502 each include 4 convolutional layers. In left-to-right order, the input to the first convolutional layer of the decoder 503 is the feature produced by the last (fourth) convolutional layer of the encoder 502, which has the smallest size. The input of each subsequent decoder convolutional layer is the output of the preceding decoder layer together with the output of the corresponding encoder layer: the decoder feature is first up-sampled and then fused with the same-size feature of the encoder 502 to obtain the output feature, where the fusion may be pixel-wise accumulation or a concatenation (concat) followed by convolution.
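A minimal PyTorch sketch of one sub-model unit as described above. The two-level hourglass, the channel widths, and the choice of concatenation for skip fusion are assumptions made here for illustration; the disclosure fixes only the symmetric encoder-decoder structure, the same-size skip fusion, and the size-preserving output:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def conv_block(c_in: int, c_out: int, stride: int = 1) -> nn.Sequential:
    return nn.Sequential(nn.Conv2d(c_in, c_out, 3, stride, 1), nn.ReLU(inplace=True))

class SubModelUnit(nn.Module):
    """One sub-model unit: a small hourglass plus an accumulation layer."""
    def __init__(self, base: int = 32):
        super().__init__()
        self.enc1 = conv_block(1, base)                   # encoder, full resolution
        self.enc2 = conv_block(base, base * 2, stride=2)  # encoder, 1/2 resolution
        self.dec2 = conv_block(base * 2, base)            # decoder mirror of enc2
        self.dec1 = conv_block(base * 2, base)            # decoder after skip fusion
        self.head = nn.Conv2d(base, 1, 3, padding=1)      # supplementary depth map

    def forward(self, depth_in: torch.Tensor) -> torch.Tensor:
        f1 = self.enc1(depth_in)
        f2 = self.enc2(f1)
        up = F.interpolate(self.dec2(f2), scale_factor=2)  # up-sample decoder feature
        fused = torch.cat([up, f1], dim=1)                 # fuse same-size encoder feature
        supplement = self.head(self.dec1(fused))           # supplementary depth information
        return depth_in + supplement  # accumulation layer: pixel-wise superposition
```

With even spatial dimensions, SubModelUnit()(torch.zeros(1, 1, 64, 64)) returns a tensor of the same shape, as the size-preserving property requires.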

S402, superposing the supplementary depth information and the depth information input to the sub-model unit through the accumulation layer of the sub-model unit to obtain the intermediate depth information determined by the sub-model unit.

The accumulation layer is used for pixel-level accumulation of the input depth information and the supplementary depth information, specifically a pixel-by-pixel depth accumulation operation on images of the same size.

Optionally, the superposing the supplementary depth information and the depth information input to the sub-model unit includes: determining a matching relation between supplementary pixel points in the supplementary depth information and sparse pixel points in the input depth information through the accumulation layer of the sub-model unit; and superposing the supplementary depth data of the supplementary pixel points and the sparse depth data of the matched sparse pixel points.

The matching relation is used to determine the correspondence between supplementary pixel points and sparse pixel points, i.e., which supplementary and sparse pixel points need to be superposed. Optionally, a supplementary pixel point and a sparse pixel point at the same position in the image match each other. In fact, since the image size of the supplementary depth information is the same as that of the input depth information, a matching relation can be established between the supplementary pixel point and the sparse pixel point at the same position. The supplementary depth data describes the depth value of a supplementary pixel point, and the sparse depth data describes the depth value of a sparse pixel point. The matched supplementary depth data and sparse depth data can be directly accumulated; the resulting depth data is taken as the depth data of the new pixel point at the matched position, and the new image combining all new pixel points with their new depth data is determined as the intermediate depth information.

By determining the supplementary pixel points and the matched sparse pixel points and superposing their depth data, pixel-level superposition of depth data is realized, and depth information can be accurately predicted pixel by pixel. Each new pixel point formed after superposition, together with its new depth data, constitutes the intermediate depth information, so the depth information of each pixel point in the intermediate depth information is accurately predicted and the prediction accuracy of the depth information is improved.

And S403, taking the intermediate depth information determined by the tail sub-model unit in the depth information supplementary model as the dense depth information of the target scene.

Optionally, the depth information processing method further includes: determining depth feature information according to the pixel information of the target scene through a sub-model unit in the depth information supplementary model; and adjusting the intermediate depth information determined by the sub-model unit according to the depth feature information.

The pixel information may refer to the pixel values of pixel points and describes the color features of the target scene. It can be acquired with a color camera under the same acquisition conditions as the sparse depth information, where "same acquisition conditions" means at least one of viewing angle, brightness, camera parameters, and the like is the same. In general, pixel values may be represented as red-green-blue (RGB). The depth feature information describes the depth features that the sub-model unit learns from the pixel information of the target scene; it is used to adjust and update the intermediate depth information determined by the sub-model unit, making it more accurate and richer. In fact, in the target scene, pixel points of one object that share the same depth tend to share the same pixel information: for example, within the same object, pixel points at the same position relative to the acquisition plane have the same depth and color. Because of this consistency between the pixel information and the depth information of some pixel points, the pixel information of the target scene can be fed into the sub-model unit for learning. Pixel information is rich and dense and can guide the recognition of the scene, for example the outline of each object: pixel points belonging to the same object are likely to have the same depth information, so their depth can be predicted accordingly. Adjusting the intermediate depth information with the depth feature information determined from the pixel information thus lets the sub-model unit predict depth better and improves the prediction accuracy of the depth information.

In a specific example, as shown in FIG. 6, the pixel information 606 and the input depth information are used together as the input of each sub-model unit (602-604). The sparse depth information 601 of the target scene and the pixel information 606 of the target scene are taken together as the input of the head sub-model unit 602 in the depth information supplementary model to obtain the intermediate depth information determined by the head sub-model unit; this intermediate depth information is obtained by using the depth feature information determined from the pixel information 606 to adjust the intermediate depth information determined from the sparse depth information 601. For each sub-model unit other than the head one (603 and 604), the intermediate depth information determined by the preceding sub-model unit and the pixel information of the target scene are taken together as its input to obtain the intermediate depth information it determines; that intermediate depth information is likewise obtained by using the depth feature information determined from the pixel information to adjust the intermediate depth information determined from the preceding sub-model unit's output.

As shown in fig. 7, the input depth information 701 and the pixel information 706 are processed by the hourglass network layer of a sub-model unit in the depth information supplementary model to obtain supplementary depth information, where the hourglass network layer includes an encoder 702 and a decoder 703. The supplementary depth information is, in effect, the supplementary depth determined from the input depth information 701 adjusted by the depth feature information that the hourglass network layer determines from the pixel information 706. The supplementary depth information and the depth information input to the sub-model unit are superposed by the accumulation layer 704 of the sub-model unit to obtain the intermediate depth information 705 determined by the sub-model unit; the intermediate depth information 705 is thus the depth information obtained by using the pixel information to adjust the intermediate depth information determined from the input depth information.

The color information assists depth prediction in each sub-model unit, making the dense depth prediction result more accurate.
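A sketch of how the fig. 7 guidance might look in code, assuming the pixel information is injected by channel-wise concatenation (the disclosure does not specify the injection mechanism); here hourglass stands for any size-preserving encoder-decoder that maps 4 input channels (1 depth + 3 RGB) to 1:

```python
import torch
import torch.nn as nn

class GuidedSubModelUnit(nn.Module):
    """Sub-model unit whose hourglass also sees the pixel (RGB) information."""
    def __init__(self, hourglass: nn.Module):
        super().__init__()
        self.hourglass = hourglass  # size-preserving encoder-decoder, 4 -> 1 channels

    def forward(self, depth_in: torch.Tensor, rgb: torch.Tensor) -> torch.Tensor:
        # Depth features guided by color: concatenate depth and RGB channels.
        supplement = self.hourglass(torch.cat([depth_in, rgb], dim=1))
        return depth_in + supplement  # accumulation layer, as before
```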

Multi-scale depth information features are extracted through the hourglass network layer, and the intermediate depth information output by the preceding sub-model unit is taken as the input of the next, so that the learning target of each sub-model unit is the residual between the preceding unit's intermediate depth information and the true dense depth information. The intermediate depth information thus iteratively approaches the true dense depth information, yielding a high-quality dense depth recovery result.

Fig. 8 is a flowchart of another depth information processing method disclosed in an embodiment of the present disclosure, which is further optimized and expanded based on the above technical solution, and can be combined with the above optional embodiments.

S801, acquiring standard sparse depth information and standard dense depth information of a standard scene.

The standard sparse depth information serves as the model input, and the standard dense depth information serves as the model's ground truth. Both are depth information acquired from the same standard scene. The standard sparse and standard dense depth information of a standard scene form one training sample for training the deep learning model to obtain the depth information supplementary model. One or more standard scenes can each be collected to form multiple sample pairs; the standard scenes of different sample pairs may be the same or different. A large number of sample pairs are collected to form a training data set for training the deep learning model into the depth information supplementary model.

Optionally, the obtaining standard sparse depth information and standard dense depth information of the standard scene includes: acquiring an image of the standard scene by using a depth sensor to obtain standard dense depth information; and sampling is carried out in the standard dense depth information to generate the standard sparse depth information.

The depth sensor can be a high-precision depth sensor that collects high-precision depth information. When the depth sensor performs image acquisition on the standard scene, the resulting depth information can be regarded as relatively dense and therefore used as the standard dense depth information. Sampling the standard dense depth information screens it down to a small number of pixel points carrying depth information, forming the standard sparse depth information; that is, the relatively dense depth information is converted into relatively sparse, discrete depth information.

Image acquisition of the standard scene by the depth sensor yields depth information taken as standard dense depth information, and sparse sampling of that standard dense depth information forms the standard sparse depth information, so training samples can be generated quickly.
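A minimal numpy sketch of this sampling step; the uniform-random scheme and the keep_ratio value are assumptions, since the disclosure only states that the dense map is sampled:

```python
import numpy as np

def sample_sparse(dense: np.ndarray, keep_ratio: float = 0.05, seed: int = 0) -> np.ndarray:
    """Screen a standard dense depth map down to standard sparse depth."""
    rng = np.random.default_rng(seed)
    ys, xs = np.nonzero(dense)  # pixel points that carry depth information
    keep = rng.choice(len(ys), size=int(len(ys) * keep_ratio), replace=False)
    sparse = np.zeros_like(dense)
    sparse[ys[keep], xs[keep]] = dense[ys[keep], xs[keep]]
    return sparse
```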

Optionally, the obtaining standard sparse depth information and standard dense depth information of the standard scene includes: adopting radar equipment to carry out video acquisition on the standard scene to obtain continuous multi-frame sparse depth information; performing projection processing on the continuous multi-frame sparse depth information to generate the standard dense depth information; and acquiring sparse depth information matched with the standard dense depth information from the continuous multi-frame sparse depth information, and determining the sparse depth information as the standard sparse depth information.

The radar device may be a low-precision depth acquisition device that collects low-precision depth information. Video acquisition of the standard scene with the radar device means continuously collecting the standard scene to form video stream data, which includes multiple frames of sparse depth information.

Different frames of sparse depth information can be given different acquisition viewing angles, so they can be treated as sparse depth information captured by a multi-view radar setup. Correspondingly, projecting the continuous multi-frame sparse depth information actually fuses multi-view sparse depth information into single-view dense depth information, which is determined as the standard dense depth information. Specifically, the continuous multi-frame sparse depth information is projected onto any one frame to obtain the dense depth information of that frame, and the sparse depth information of the selected frame is determined as the standard sparse depth information. The sparse depth information matched with the standard dense depth information may refer to sparse depth information with the same acquisition conditions as the standard dense depth information.

The fusion may determine, from the positional relationship among the pixel points of the multi-frame sparse depth information and their depth information, which pixel points fall in a given frame at a given acquisition viewing angle and what depth each carries. The positional relationship maps every pixel point into the same coordinate system and determines which pixel points belong to the same frame. The frame combining those pixel points and their depth information forms new depth information, which is determined as the standard dense depth information. Meanwhile, the originally collected sparse depth information of that frame is used as the standard sparse depth information and combined with the standard dense depth information to form a training sample.

Collecting video of a standard scene with radar and fusing multiple pieces of depth information into standard dense depth information, with any single piece used as the standard sparse depth information, allows training samples to be generated accurately without high-precision acquisition equipment, reducing the cost of generating training samples.
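A purely geometric numpy sketch of the projection step, assuming known per-frame camera poses and intrinsics; occlusion handling and the actual calibration pipeline of the disclosure are omitted, and all names are illustrative:

```python
import numpy as np

def fuse_frames(frames, poses, K):
    """Project consecutive sparse depth frames into the first frame to densify it.

    frames: list of HxW sparse depth maps; poses: list of 4x4 camera-to-world
    matrices (one per frame); K: 3x3 camera intrinsics.
    """
    h, w = frames[0].shape
    fused = frames[0].copy()
    world_to_ref = np.linalg.inv(poses[0])
    for depth, pose in zip(frames[1:], poses[1:]):
        ys, xs = np.nonzero(depth)                      # pixels carrying depth
        z = depth[ys, xs]
        rays = np.linalg.inv(K) @ np.stack([xs, ys, np.ones_like(xs)])
        pts = np.vstack([rays * z, np.ones(len(z))])    # homogeneous 3D points
        cam = (world_to_ref @ pose @ pts)[:3]           # into the reference camera
        uvz = K @ cam
        u = np.round(uvz[0] / uvz[2]).astype(int)
        v = np.round(uvz[1] / uvz[2]).astype(int)
        ok = (u >= 0) & (u < w) & (v >= 0) & (v < h) & (uvz[2] > 0)
        fused[v[ok], u[ok]] = uvz[2][ok]                # keep the projected depth
    return fused
```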

Optionally, one training sample may further include standard pixel information of a standard scene, and a color camera may be used to acquire the standard scene to obtain the standard pixel information, where an acquisition condition of the color camera is the same as an acquisition condition of the standard sparse depth information and the standard dense depth information.

The structure of the deep learning model may refer to the description of the structure of the depth information supplementary model above. The sub-model units have the same structure; before training, their order in the deep learning model can be adjusted, but once training starts, and after it finishes, the order is fixed. When training of the deep learning model is complete, the current deep learning model is determined as the depth information supplementary model. Training completion may mean that the sum of the loss values of the sub-model units is smaller than a first loss threshold, that the loss value of the tail sub-model unit is smaller than a second loss threshold, or that the number of iterations or training rounds reaches a threshold, among others.

During training, the training samples, learning rate, or number of training iterations can be changed in real time according to model performance to optimize the whole model. During application, if the loss value is too large, the number of sub-model units can be increased, the number of training rounds increased, or the training data enlarged to improve the current model, achieving self-optimization of the whole system. Increasing the number of sub-model units can mean that, for a depth information supplementary model pre-trained with M sub-model units of which only the first N (N < M) are applied, the number of applied sub-model units is increased, at most to M; increasing it beyond M requires additional training.
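As a hypothetical illustration of applying only the first N of M pre-trained units, reusing the DepthCompletionStack sketched earlier (all names are illustrative):

```python
# Run only the first n units of a pre-trained M-unit stack.
def truncated_forward(stack, sparse_depth, n):
    depth = sparse_depth
    for unit in stack.units[:n]:  # n <= M; going beyond M needs retraining
        depth = unit(depth)
    return depth
```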

S802, training to obtain a depth information supplementary model according to the standard sparse depth information and the standard dense depth information.

Optionally, the deep learning model further includes a confidence layer connected to the hourglass network layer, whose output also serves as the input of the confidence layer. The confidence layer applies a nonlinear activation function (sigmoid) to the supplementary depth information output by the hourglass network layer to obtain confidence information, i.e., a confidence value for each pixel point in the image. The confidence information is used to compute the loss value of the sub-model unit it belongs to. The loss value Loss of a sub-model unit can be calculated with the following formula:

Loss = Ic * ||D - D_gt||

where Ic is the confidence information, D is the intermediate depth information, and D_gt is the standard dense depth information. In practice, Ic is the confidence matrix determined from the confidence information, i.e., the matrix of per-pixel confidence values; D is the matrix corresponding to the intermediate depth information output by the accumulation layer; and D_gt is the matrix corresponding to the standard dense depth information in the training sample. Ic, D, and D_gt have the same size.
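In code, the per-unit loss might look like the following PyTorch sketch; the disclosure does not specify the norm or the reduction, so an L1 difference and a mean over pixels are assumptions:

```python
import torch

def sub_model_unit_loss(confidence: torch.Tensor,
                        intermediate: torch.Tensor,
                        dense_gt: torch.Tensor) -> torch.Tensor:
    """Loss = Ic * ||D - D_gt||, reduced to a scalar (L1 and mean assumed)."""
    return (confidence * (intermediate - dense_gt).abs()).mean()
```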

It should be noted that during training the deep learning model includes the confidence layer, while during application the depth information supplementary model may or may not include it.

S803, determining intermediate depth information of the target scene according to the sparse depth information of the target scene through a sub-model unit in the depth information supplementary model.

S804, the intermediate depth information determined by the tail sub-model unit in the depth information supplementary model is used as the dense depth information of the target scene.

By acquiring standard sparse depth information and standard dense depth information of a standard scene, the depth information supplementary model is obtained through training. The pre-trained model supplements sparse depth information step by step, so the sparse depth information is supplemented in multiple stages and sufficiently, improving the model's depth prediction precision. Moreover, since a training sample contains only standard sparse and standard dense depth information, extra data processing at the model's input and output is simplified and the prediction efficiency of dense depth information is improved.

According to an embodiment of the present disclosure, fig. 9 is a structural diagram of a depth information processing apparatus in an embodiment of the present disclosure, and the embodiment of the present disclosure is suitable for a case where sparse depth information of a target scene is supplemented to generate dense depth information of the target scene, and the apparatus is implemented by software and/or hardware and is specifically configured in an electronic device with a certain data operation capability.

A sparse depth information input module 901, configured to determine intermediate depth information of a target scene according to sparse depth information of the target scene through a sub-model unit in a depth information supplementary model;

a dense depth information generating module 902, configured to use the intermediate depth information determined by the tail sub-model unit in the depth information supplementary model as dense depth information of the target scene.

According to this technical solution, the intermediate depth information of the target scene is determined in turn by the sub-model units of the depth information supplementary model according to the sparse depth information of the target scene, supplementing the sparse depth information step by step; finally, the intermediate depth information of the tail sub-model unit is determined as the dense depth information of the target scene. The sparse depth information is thus supplemented in multiple stages and sufficiently, improving the accuracy of depth information prediction.

Further, the sparse depth information input module 901 includes: the serial input unit is used for taking the sparse depth information of the target scene as the input of the head sub-model unit in the depth information supplementary model to obtain the intermediate depth information determined by the head sub-model unit; and the intermediate transmission unit is used for taking, for each sub-model unit other than the head sub-model unit in the depth information supplementary model, the intermediate depth information determined by the preceding sub-model unit as the input of that sub-model unit to obtain the intermediate depth information determined by that sub-model unit.

Further, the sparse depth information input module 901 includes: the supplementary depth information acquisition unit is used for processing the input depth information through the hourglass network layer of the sub-model unit of the depth information supplementary model to obtain supplementary depth information; and the supplementary depth information superposition unit is used for superposing the supplementary depth information and the depth information input to the sub-model unit through the accumulation layer of the sub-model unit to obtain the intermediate depth information determined by the sub-model unit.

Further, the supplementary depth information superposition unit includes: the pixel matching subunit is used for determining the matching relationship between supplementary pixel points in the supplementary depth information and sparse pixel points in the input depth information through the accumulation layer of the sub-model unit; and the pixel superposition subunit is used for superposing the supplementary depth data of the supplementary pixel points and the sparse depth data of the matched sparse pixel points.

Further, the depth information processing apparatus further includes: the pixel information input module is used for determining depth feature information according to the pixel information of the target scene through a sub-model unit in the depth information supplementary model; and the intermediate depth information adjusting module is used for adjusting the intermediate depth information determined by the sub-model unit according to the depth feature information.

Further, the depth information processing apparatus further includes: the sample acquisition module is used for acquiring standard sparse depth information and standard dense depth information of a standard scene; and the model training module is used for training to obtain the depth information supplementary model according to the standard sparse depth information and the standard dense depth information.

Further, the sample acquisition module includes: the dense depth information acquisition unit is used for acquiring images of the standard scene with a depth sensor to obtain standard dense depth information; and the sparse depth information generating unit is used for sampling the standard dense depth information to generate the standard sparse depth information.

Further, the sample acquisition module includes: the sparse depth information acquisition unit is used for performing video acquisition on the standard scene with a radar device to obtain continuous multi-frame sparse depth information; the sparse depth information fusion unit is used for performing projection processing on the continuous multi-frame sparse depth information to generate the standard dense depth information; and the dense depth information generating unit is used for acquiring sparse depth information matched with the standard dense depth information from the continuous multi-frame sparse depth information and determining it as the standard sparse depth information.

The depth information processing apparatus can execute the depth information processing method provided by any embodiment of the present disclosure, and has the functional modules and beneficial effects corresponding to executing that method.

According to embodiments of the present disclosure, the present disclosure further provides an electronic device, a readable storage medium, and a computer program product.

FIG. 10 illustrates a schematic block diagram of an example electronic device 1000 that can be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other suitable computers. Electronic devices may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are meant as examples only and are not intended to limit the implementations of the disclosure described and/or claimed herein.

As shown in FIG. 10, the device 1000 includes a computing unit 1001, which can perform various appropriate actions and processes according to a computer program stored in a read-only memory (ROM) 1002 or loaded from a storage unit 1008 into a random access memory (RAM) 1003. The RAM 1003 can also store the various programs and data required for the operation of the device 1000. The computing unit 1001, the ROM 1002, and the RAM 1003 are connected to one another by a bus 1004. An input/output (I/O) interface 1005 is also connected to the bus 1004.

A number of components in the device 1000 are connected to the I/O interface 1005, including: an input unit 1006, such as a keyboard or a mouse; an output unit 1007, such as various types of displays and speakers; a storage unit 1008, such as a magnetic disk or an optical disc; and a communication unit 1009, such as a network card, a modem, or a wireless communication transceiver. The communication unit 1009 allows the device 1000 to exchange information/data with other devices over a computer network such as the Internet and/or various telecommunication networks.

The computing unit 1001 may be any of various general-purpose and/or special-purpose processing components with processing and computing capabilities. Some examples of the computing unit 1001 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various dedicated Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, or microcontroller. The computing unit 1001 executes the methods and processes described above, such as the depth information processing. For example, in some embodiments, the depth information processing may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as the storage unit 1008. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 1000 via the ROM 1002 and/or the communication unit 1009. When the computer program is loaded into the RAM 1003 and executed by the computing unit 1001, one or more steps of the depth information processing described above may be performed. Alternatively, in other embodiments, the computing unit 1001 may be configured to perform the depth information processing in any other suitable manner (e.g., by means of firmware).

Various implementations of the systems and techniques described above may be realized in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include: implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special-purpose or general-purpose and which receives data and instructions from, and transmits data and instructions to, a storage system, at least one input device, and at least one output device.

Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. The program code may be provided to a processor or controller of a general-purpose computer, a special-purpose computer, or another programmable data processing apparatus, such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowcharts and/or block diagrams to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine, or entirely on the remote machine or server.

In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.

The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.

The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.

It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be executed in parallel, sequentially, or in different orders, as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved; no limitation is imposed herein.

The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present disclosure should be included in the scope of protection of the present disclosure.
