Monocular absolute depth estimation method and device based on deep learning

Document No.: 551859  Publication date: 2021-05-14

Reading note: this technique, Monocular absolute depth estimation method and device based on deep learning, was devised by 徐枫, 唐瑞杰 and 杨东 on 2021-01-25. Abstract: the invention provides a monocular absolute depth estimation method and device based on deep learning, wherein the method comprises the following steps: acquiring the depth data and absolute depth corresponding to a sample RGB picture; inputting the relative depth of the sample RGB picture into an initial model and obtaining a reference absolute depth output by the initial model; calculating a loss value between the reference absolute depth and the absolute depth corresponding to the relative depth according to a preset first loss function; if the loss value is larger than a preset threshold, adjusting the initial model until the loss value is smaller than the preset threshold, thereby obtaining a trained target model corresponding to the initial model; and calculating the absolute depth of a scene from the target model. The method can therefore be well applied to scenarios, such as automatic driving, that place high requirements on scene depth measurement.

1. A monocular absolute depth estimation method based on deep learning is characterized by comprising the following steps:

acquiring depth data and absolute depth corresponding to a sample RGB picture;

inputting the relative depth of the sample RGB picture into an initial model, and acquiring a reference absolute depth output by the initial model;

calculating a loss value between the reference absolute depth and the absolute depth corresponding to the relative depth according to a preset first loss function;

if the loss value is larger than a preset threshold, adjusting the initial model until the loss value is smaller than the preset threshold, thereby obtaining a trained target model corresponding to the initial model;

and calculating the absolute depth of the scene according to the target model.

2. The method of claim 1, wherein the preset first loss function is:

wherein D is the true absolute depth, D̂ is the reference absolute depth estimated by the model, and N is the total number of pixels in the picture.

3. The method of claim 1, further comprising, after said calculating the absolute depth of the scene from the target model:

inputting the absolute depth of the sample RGB picture into the initial model, and acquiring the reference relative depth output by the model;

calculating a loss value between the reference relative depth and the relative depth corresponding to the absolute depth according to a preset second loss function;

and if the loss value is greater than a preset threshold value, adjusting the target model until the loss value is less than the preset threshold value, and finishing the training of the target model.

4. The method of claim 3, wherein the second loss function comprises:

wherein D̄ is the average of the true relative depths, and the average of the reference relative depths is defined analogously for the model estimate D̂.

5. The method of claim 1, wherein said adjusting said initial model comprises:

and adjusting the model parameters of the initial model according to a back propagation method.

6. The method of claim 1, wherein said calculating the absolute depth of the scene from the target model comprises:

acquiring a target image of the scene;

and inputting the target image into the target model to obtain the absolute depth of the scene.

7. A monocular absolute depth estimation device based on deep learning, comprising:

the first acquisition module is used for acquiring depth data and absolute depth corresponding to the sample RGB picture;

the second acquisition module is used for inputting the relative depth of the sample RGB picture into an initial model and acquiring the reference absolute depth output by the initial model;

the calculation module is used for calculating a loss value between the reference absolute depth and the absolute depth corresponding to the relative depth according to a preset first loss function;

the training module is used for adjusting the initial model when the loss value is larger than a preset threshold value, and training a target model corresponding to the initial model until the loss value is smaller than the preset threshold value;

and the processing module is used for calculating the absolute depth of the scene according to the target model.

8. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the method of any one of claims 1-6 when executing the computer program.

9. A non-transitory computer-readable storage medium having stored thereon a computer program, wherein the computer program, when executed by a processor, implements the method of any one of claims 1-6.

10. A computer program product, characterized in that the instructions in the computer program product, when executed by a processor, implement the method according to any one of claims 1-6.

Technical Field

The invention relates to the technical field of computer vision and deep learning, in particular to a monocular absolute depth estimation method and device based on deep learning.

Background

Monocular depth estimation is an important area of computer vision with very wide application in fields such as automatic driving. Before the introduction of deep learning, measuring the distance to objects ahead in automatic driving required expensive equipment such as radar, making the technology very costly. With deep learning, a car needs only a few cameras, or even a single camera, to capture RGB pictures of the scene ahead; a neural network algorithm then infers the depth information of that scene, i.e., the distance between the car and what lies in front of it. This greatly reduces the technical cost of automatic driving and further promotes its development.

However, current monocular depth estimation has a hidden problem: the estimated depth is not absolute but relative, i.e., the depth scale (distance unit) the model estimates for a scene is uncertain and varies as the scene changes. Monocular absolute depth estimation models currently used in automatic driving essentially solve this scale-ambiguity problem by training on picture-depth data with a uniform scale (in meters), but acquiring such data still depends on equipment such as radar, so mass collection is very expensive. Picture-depth data that are easy to obtain usually carry depth computed by structure-from-motion (SFM), whose scale is uncertain and therefore cannot be applied directly to scenarios, such as automatic driving, that require absolute depth.

Disclosure of Invention

The present invention is directed to solving, at least to some extent, one of the technical problems in the related art.

Therefore, a first objective of the present invention is to provide a monocular absolute depth estimation method based on deep learning, so as to achieve the effect of reducing the technical cost in the field of automatic driving.

The second purpose of the invention is to provide a monocular absolute depth estimation device based on deep learning.

A third object of the invention is to propose a computer device.

A fourth object of the invention is to propose a non-transitory computer-readable storage medium.

A fifth object of the invention is to propose a computer program product.

In order to achieve the above object, an embodiment of a first aspect of the present invention provides a method for monocular absolute depth estimation based on deep learning, including: acquiring depth data and absolute depth corresponding to a sample RGB picture;

inputting the relative depth of the sample RGB picture into an initial model, and acquiring a reference absolute depth output by the initial model;

calculating a loss value between the reference absolute depth and the absolute depth corresponding to the relative depth according to a preset first loss function;

if the loss value is larger than a preset threshold, adjusting the initial model until the loss value is smaller than the preset threshold, thereby obtaining a trained target model corresponding to the initial model;

and calculating the absolute depth of the scene according to the target model.

To achieve the above object, an embodiment of a second aspect of the present invention provides a monocular absolute depth estimation device based on deep learning, including: a first acquisition module, used for acquiring the depth data and absolute depth corresponding to a sample RGB picture;

the second acquisition module is used for inputting the relative depth of the sample RGB picture into an initial model and acquiring the reference absolute depth output by the initial model;

the calculation module is used for calculating a loss value between the reference absolute depth and the absolute depth corresponding to the relative depth according to a preset first loss function;

the training module is used for adjusting the initial model when the loss value is larger than a preset threshold value, and training a target model corresponding to the initial model until the loss value is smaller than the preset threshold value;

and the processing module is used for calculating the absolute depth of the scene according to the target model.

To achieve the above object, an embodiment of a third aspect of the present invention provides a computer device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor; when the processor executes the computer program, it implements the monocular absolute depth estimation method based on deep learning described in the embodiment of the first aspect.

In order to achieve the above object, a fourth aspect of the present invention provides a non-transitory computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the method for monocular absolute depth estimation based on deep learning as described in the first aspect of the present invention.

In order to achieve the above object, an embodiment of a fifth aspect of the present invention provides a computer program product; when the instructions in the computer program product are executed by a processor, the deep-learning-based monocular absolute depth estimation method described in the embodiment of the first aspect is implemented.

The embodiment of the invention at least has the following technical effects:

In monocular absolute depth estimation based on deep learning, the model input is a single RGB picture of a scene, and the method estimates the corresponding scene depth in meters. The method needs only a small amount of picture/absolute-depth data; most of the training data are easily acquired picture/relative-depth pairs, which can also be used to train the absolute depth estimation model through a specially designed loss function, finally yielding a monocular absolute depth estimation model applicable to a variety of scenes.

Additional aspects and advantages of the invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.

Drawings

The foregoing and/or additional aspects and advantages of the present invention will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:

fig. 1 is a schematic flowchart of a monocular absolute depth estimation method based on deep learning according to an embodiment of the present invention;

fig. 2 is a schematic view of a specific monocular absolute depth estimation scene based on deep learning according to an embodiment of the present invention; and

fig. 3 is a schematic structural diagram of a monocular absolute depth estimation device based on deep learning according to an embodiment of the present invention.

Detailed Description

Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the drawings are illustrative and intended to be illustrative of the invention and are not to be construed as limiting the invention.

With the wide application of deep learning, many fields of computer vision have developed greatly, and problems that conventional methods cannot solve can now be addressed by deep learning based on prior knowledge drawn from large amounts of data; monocular depth estimation, which estimates the depth of a scene from a single RGB picture, is one example. The present invention addresses a stricter problem, monocular absolute depth estimation, which estimates the absolute depth of a scene (for example, in meters) from a single RGB picture of the scene, and provides a solution based on deep learning.

The following describes a monocular absolute depth estimation method and apparatus based on deep learning according to an embodiment of the present invention with reference to the drawings.

Specifically, the invention provides a monocular absolute depth estimation method that needs only a small amount of data with depth at an absolute scale and trains the model mainly on a large amount of data with depth at a relative scale. The resulting model can estimate the absolute depth of a scene from a single RGB picture, which effectively reduces the technical cost in the field of automatic driving; the method generalizes well and can be applied to various scenarios that require absolute depth.

Fig. 1 is a schematic flowchart of a monocular absolute depth estimation method based on deep learning according to an embodiment of the present invention.

As shown in fig. 1, the monocular absolute depth estimation method based on deep learning includes the following steps:

Step 101, obtaining the depth data and absolute depth corresponding to a sample RGB picture.

In this embodiment, relative depth data and absolute depth data are collected for model training. The DeMoN and MegaDepth datasets are used as the relative depth datasets; each of their samples comprises an RGB picture and a relative depth with an uncertain scale computed by an SFM method. Additionally, the KITTI dataset is used as the absolute depth dataset; each of its samples contains an RGB picture and the absolute depth, in meters, acquired by radar.
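The two data sources described above can be illustrated with hypothetical sample records; all field names and file paths below are illustrative, not taken from the actual DeMoN, MegaDepth, or KITTI loaders:

```python
# Sample from the relative depth datasets (DeMoN, MegaDepth):
# depth computed by SFM, so its scale (distance unit) is unknown.
relative_sample = {
    "rgb": "demon/scene_0001.png",          # hypothetical path
    "depth": [[1.0, 2.0], [3.0, 4.0]],      # per-pixel SFM depth, scale-free
    "scale_known": False,
}

# Sample from the absolute depth dataset (KITTI):
# depth acquired by radar, expressed in meters.
absolute_sample = {
    "rgb": "kitti/drive_0001.png",          # hypothetical path
    "depth": [[5.2, 7.8], [9.1, 12.4]],     # per-pixel radar depth, in meters
    "scale_known": True,
}
```

A flag such as `scale_known` lets a single training loop route each sample to the appropriate loss, as described in the following steps.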

And 102, inputting the relative depth of the sample RGB picture into an initial model, and acquiring a reference absolute depth output by the initial model.

Model training is performed using the collected data. The method uses U-Net as the training model (initial model): the input is an RGB picture of a scene, and the output is the corresponding reference absolute depth.

Step 103, calculating a loss value between the reference absolute depth and the absolute depth corresponding to the relative depth according to a preset first loss function.

During training, the error between the model's prediction and the actual absolute/relative depth of the scene is calculated, and the model parameters are updated through a back-propagation algorithm. The error is computed differently for different sample types. For absolute depth samples, the loss function is designed as the following equation (1):

wherein D is the true absolute depth, D̂ is the reference absolute depth estimated by the model, and N is the total number of pixels in the picture.
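Equation (1) itself appears only as an image in the source; a plausible form consistent with the symbol definitions above is a per-pixel mean absolute error. The sketch below is an assumption, not the patent's exact formula:

```python
import numpy as np

def absolute_depth_loss(d_true, d_pred):
    """Hypothetical Eq. (1): mean absolute error between the true
    absolute depth D and the model's reference absolute depth D-hat,
    averaged over all N picture pixels."""
    d_true = np.asarray(d_true, dtype=float)
    d_pred = np.asarray(d_pred, dtype=float)
    return np.abs(d_true - d_pred).mean()
```

Because both depths here carry the same metric scale (meters), a direct per-pixel comparison is meaningful; no normalization is needed.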

Step 104, if the loss value is greater than a preset threshold, adjusting the initial model until the loss value is less than the preset threshold, thereby obtaining a trained target model corresponding to the initial model.

In this embodiment, if the loss value is greater than the preset threshold, the initial model is adjusted until the loss value is less than the preset threshold, yielding a trained target model corresponding to the initial model.

In one embodiment of the invention, the loss function is designed as equation (2) for the relative depth samples:

wherein D̄ is the average of the true relative depths, and the average of the reference relative depths is defined analogously for the model estimate D̂.

In an embodiment of the present invention, in order to further optimize the target model, the absolute depth of the sample RGB picture is input into the initial model and the reference relative depth output by the model is obtained; a loss value between the reference relative depth and the relative depth corresponding to the absolute depth is calculated according to a preset second loss function; and if the loss value is greater than a preset threshold, the target model is adjusted until the loss value is less than the preset threshold, completing the training of the target model.

With this design, the depth estimated by the model only needs to be proportionally close to the true result, which removes the requirement to estimate scale from relative samples. In the actual training process, the large amount of relative depth data teaches the model to estimate the relative distances between scene objects, while the specific scale information is determined by the absolute depth data, so the model can ultimately estimate the absolute depth of a scene.
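The mixed-supervision training step described above can be sketched as follows. The loss forms and the `scale_known` flag are assumptions carried over from the data description, and `model` is a stand-in for the U-Net:

```python
import numpy as np

def absolute_loss(d_true, d_pred):        # hypothetical Eq. (1)
    return np.abs(d_true - d_pred).mean()

def relative_loss(d_true, d_pred):        # hypothetical Eq. (2)
    return np.abs(d_true / d_true.mean() - d_pred / d_pred.mean()).mean()

def batch_loss(model, batch):
    """Route each sample to the loss matching its supervision type:
    radar samples (KITTI) supervise the metric scale, while SFM
    samples (DeMoN, MegaDepth) supervise only relative structure."""
    losses = []
    for sample in batch:
        pred = model(sample["rgb"])
        truth = np.asarray(sample["depth"], dtype=float)
        fn = absolute_loss if sample["scale_known"] else relative_loss
        losses.append(fn(truth, pred))
    return sum(losses) / len(losses)
```

The resulting scalar would then drive the back-propagation update mentioned above.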

Step 105, calculating the absolute depth of the scene from the target model.

In this embodiment, after model training is completed, given a new RGB picture of a scene, inputting it into the model yields the absolute depth corresponding to that scene.
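Inference is then a single forward pass. In the sketch below, `dummy_model` is merely a placeholder for the trained U-Net; a real model would map an H x W x 3 picture to an H x W depth map in meters:

```python
import numpy as np

def estimate_absolute_depth(model, rgb_picture):
    """Feed a single RGB picture of a new scene into the trained
    target model and return the per-pixel absolute depth in meters."""
    return model(rgb_picture)

# Placeholder standing in for the trained network: predicts a flat
# 10 m depth for every pixel of the input picture.
dummy_model = lambda img: np.full(np.asarray(img).shape[:2], 10.0)

rgb = np.zeros((4, 6, 3))                     # a 4x6 RGB picture
depth_map = estimate_absolute_depth(dummy_model, rgb)
```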

As shown in fig. 2, the main application scenario of the invention is the field of automatic driving. In use, a single camera is mounted on the car; while the car is driving, the camera continuously captures RGB pictures of the scene ahead, which are input into the trained neural network model to obtain the absolute depth of the scene, thereby guiding the operations of the automatic driving system.

In summary, the monocular absolute depth estimation method based on deep learning according to the embodiment of the present invention can be applied to various scenes requiring absolute depth. The method trains the model with a large amount of relative depth data and a small amount of absolute depth data, while the designed loss function eliminates the scale ambiguity of the relative depth data. Model training therefore has a large amount of usable data, and the result is a monocular absolute depth estimation model with strong generalization, which reduces the technical cost in the field of automatic driving and can also be applied effectively to other scenarios that need absolute depth.

In order to implement the above embodiments, the present invention further provides a monocular absolute depth estimation device based on deep learning.

Fig. 3 is a schematic structural diagram of a monocular absolute depth estimation device based on deep learning according to an embodiment of the present invention.

As shown in fig. 3, the apparatus for monocular absolute depth estimation based on deep learning includes: a first acquisition module 310, a second acquisition module 320, a calculation module 330, a training module 340, a processing module 350, wherein,

a first obtaining module 310, configured to obtain depth data and absolute depth corresponding to a sample RGB picture;

a second obtaining module 320, configured to input the relative depth of the sample RGB picture into an initial model, and obtain a reference absolute depth output by the initial model;

a calculating module 330, configured to calculate a loss value of the reference absolute depth and an absolute depth corresponding to the relative depth according to a preset first loss function;

a training module 340, configured to adjust the initial model when the loss value is greater than a preset threshold, and train a target model corresponding to the initial model until the loss value is less than the preset threshold;

a processing module 350, configured to calculate the absolute depth of the scene from the target model.

It should be noted that the foregoing explanation on the embodiment of the method for estimating monocular absolute depth based on deep learning is also applicable to the apparatus for estimating monocular absolute depth based on deep learning of this embodiment, and is not repeated here.

In order to implement the foregoing embodiments, the present invention further provides a computer device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, and when the processor executes the computer program, the processor implements the monocular absolute depth estimation method based on deep learning as described in the foregoing embodiments.

In order to achieve the above embodiments, the present invention also proposes a non-transitory computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the deep learning-based monocular absolute depth estimation method as described in the above embodiments.

In order to implement the above embodiments, the present invention further provides a computer program product, which when executed by an instruction processor in the computer program product implements the monocular absolute depth estimation method based on deep learning as described in the above embodiments.

In the description herein, references to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above are not necessarily intended to refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Furthermore, various embodiments or examples and features of different embodiments or examples described in this specification can be combined and combined by one skilled in the art without contradiction.

Furthermore, the terms "first", "second" and "first" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present invention, "a plurality" means at least two, e.g., two, three, etc., unless specifically limited otherwise.

Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing steps of a custom logic function or process, and alternate implementations are included within the scope of the preferred embodiment of the present invention in which functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art of the present invention.

The logic and/or steps represented in the flowcharts or otherwise described herein, e.g., an ordered listing of executable instructions that can be considered to implement logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of the computer-readable medium would include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CDROM). Additionally, the computer-readable medium could even be paper or another suitable medium upon which the program is printed, as the program can be electronically captured, via for instance optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner if necessary, and then stored in a computer memory.

It should be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the various steps or methods may be implemented in software or firmware stored in memory and executed by a suitable instruction execution system. If implemented in hardware, as in another embodiment, any one or combination of the following techniques, which are known in the art, may be used: a discrete logic circuit having a logic gate circuit for implementing a logic function on a data signal, an application specific integrated circuit having an appropriate combinational logic gate circuit, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), or the like.

It will be understood by those skilled in the art that all or part of the steps carried by the method for implementing the above embodiments may be implemented by hardware related to instructions of a program, which may be stored in a computer readable storage medium, and when the program is executed, the program includes one or a combination of the steps of the method embodiments.

In addition, functional units in the embodiments of the present invention may be integrated into one processing module, or each unit may exist alone physically, or two or more units are integrated into one module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode. The integrated module, if implemented in the form of a software functional module and sold or used as a stand-alone product, may also be stored in a computer readable storage medium.

The storage medium mentioned above may be a read-only memory, a magnetic or optical disk, etc. Although embodiments of the present invention have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present invention, and that variations, modifications, substitutions and alterations can be made to the above embodiments by those of ordinary skill in the art within the scope of the present invention.
