Three-dimensional image processing method and device based on neural network and electronic equipment

Document No.: 1954770 · Publication date: 2021-12-10

Reading note: This technology, "Three-dimensional image processing method and device based on neural network and electronic equipment", was designed and created by 罗天文, 戴磊, and 刘玉宇 on 2021-09-15. Its main content is as follows: the invention is applicable to the fields of artificial intelligence and digital healthcare, and discloses a three-dimensional image processing method, apparatus, and electronic device based on a neural network. In the method, first depth information representing a depth image of a target is acquired and input into a deep neural network model, which converts it into target three-dimensional image information of the target. The deep neural network model is trained according to sample three-dimensional image information of a three-dimensional sample and second depth information of the three-dimensional sample; after training, it can accurately convert a two-dimensional depth image into a three-dimensional image corresponding to the original three-dimensional object. Inputting the first depth information into the trained deep neural network model therefore yields target three-dimensional image information of high accuracy, improving the precision of three-dimensional image construction and, in turn, the precision of computer vision recognition.

1. A three-dimensional image processing method based on a neural network is characterized by comprising the following steps:

acquiring first depth information of a target, wherein the first depth information is used for representing a depth image of the target;

inputting the first depth information into an input layer of a deep neural network model, wherein the deep neural network model is obtained by training according to sample three-dimensional image information of a three-dimensional sample and second depth information of the three-dimensional sample, and the second depth information is used for representing a depth image of the three-dimensional sample;

inputting the first depth information into a convolution layer of the deep neural network model for convolution to obtain a first feature value;

inputting the first feature value into a feature-dimension-changing layer of the deep neural network model for feature conversion to obtain a first three-dimensional feature volume;

inputting the first three-dimensional feature volume into a three-dimensional deconvolution layer of the deep neural network model for deconvolution to obtain target three-dimensional image information of the target;

and outputting the target three-dimensional image information.

2. The three-dimensional image processing method based on the neural network according to claim 1, wherein the obtaining of the first depth information of the target includes:

acquiring two-dimensional image information of the target and depth image information corresponding to the two-dimensional image information;

and performing target detection on the two-dimensional image information to identify target image information used for representing the target from the two-dimensional image information, and obtaining corresponding first depth information from the depth image information according to the target image information.

3. The neural network-based three-dimensional image processing method according to claim 1, wherein the deep neural network model is obtained by training according to the following steps:

acquiring the sample three-dimensional image information of the three-dimensional sample;

obtaining the second depth information of the three-dimensional sample according to the sample three-dimensional image information;

inputting the second depth information into the input layer;

inputting the second depth information into the convolution layer for convolution to obtain a second feature value;

inputting the second feature value into the feature-dimension-changing layer to perform feature conversion to obtain a second three-dimensional feature volume;

inputting the second three-dimensional feature volume into the three-dimensional deconvolution layer for deconvolution to obtain training three-dimensional image information of the three-dimensional sample;

inputting the training three-dimensional image information and the sample three-dimensional image information into a loss function to calculate a loss value;

and obtaining a target weight parameter according to the loss value and adjusting the deep neural network model according to the target weight parameter.

4. The neural network-based three-dimensional image processing method according to claim 3, wherein the inputting the second depth information into the input layer includes:

performing a random first augmentation transformation on the second depth information to obtain third depth information, wherein the first augmentation transformation comprises one of adding a random Gaussian noise value, random scaling, random angle rotation, random translation, and random selection of a partial region of the depth image;

inputting the third depth information into the input layer.

5. The neural network-based three-dimensional image processing method according to claim 4, wherein the inputting the training three-dimensional image information and the sample three-dimensional image information into a loss function to calculate a loss value comprises:

converting the sample three-dimensional image information into first mesh information;

performing a second augmentation transformation corresponding to the first augmentation transformation on the first mesh information to obtain second mesh information matching the viewing angle of the third depth information, wherein the second augmentation transformation comprises performing one of scaling, angle rotation, and translation matching the transformation applied to the second depth information;

discretizing the second mesh information into sample three-dimensional voxel information;

inputting the training three-dimensional image information and the sample three-dimensional voxel information into the loss function to calculate the loss value.

6. The method of claim 3, wherein the obtaining a target weight parameter according to the loss value and adjusting the deep neural network model according to the target weight parameter comprises:

optimizing the loss value and performing back-propagation chain-rule differentiation on the optimized loss value to obtain a weight parameter gradient;

and performing gradient descent processing according to the weight parameter gradient to obtain the target weight parameter.

7. The three-dimensional image processing method based on the neural network as claimed in claim 6, wherein the performing gradient descent processing according to the weight parameter gradient to obtain the target weight parameter comprises:

and performing gradient descent processing according to the weight parameter gradient obtained by the last training to obtain the target weight parameter.

8. A three-dimensional image processing apparatus based on a neural network, comprising:

the image acquisition module is used for acquiring first depth information of a target, and the first depth information is used for representing a depth image of the target;

the processing module is connected with the image acquisition module and is used for inputting the first depth information into an input layer of a deep neural network model, wherein the deep neural network model is obtained by training according to sample three-dimensional image information of a three-dimensional sample and second depth information of the three-dimensional sample, and the second depth information is used for representing a depth image of the three-dimensional sample;

the processing module is further configured to input the first depth information into a convolution layer of the deep neural network model for convolution to obtain a first feature value, input the first feature value into a feature-dimension-changing layer of the deep neural network model for feature conversion to obtain a first three-dimensional feature volume, and input the first three-dimensional feature volume into a three-dimensional deconvolution layer of the deep neural network model for deconvolution to obtain target three-dimensional image information of the target;

the processing module is further used for outputting the target three-dimensional image information.

9. An electronic device, comprising a memory and a processor, wherein the memory stores a computer program and the processor, when executing the computer program, implements the neural network-based three-dimensional image processing method according to any one of claims 1 to 7.

10. A computer-readable storage medium, characterized in that the storage medium stores a program which, when executed by a processor, implements the neural network-based three-dimensional image processing method according to any one of claims 1 to 7.

Technical Field

The invention relates to the technical fields of artificial intelligence and digital healthcare, and in particular to a three-dimensional image processing method and device based on a neural network, and electronic equipment.

Background

In computer vision applications in the fields of artificial intelligence and digital healthcare, two-dimensional image information of an object often needs to be converted into a three-dimensional image, that is, a three-dimensional model. For example, face recognition is widely applied in many fields: a terminal acquires image information of a face, including depth image information, and constructs a three-dimensional image from it, thereby obtaining a three-dimensional image of the face, that is, a three-dimensional face model, which improves the accuracy of face recognition. However, in the related art, constructing a three-dimensional image from the depth image and two-dimensional image information acquired by a terminal device has many shortcomings. A depth image records the depth of each position on a two-dimensional plane as seen from a specific viewing angle, and part of a three-dimensional object is always occluded when viewed from any single viewing angle. Simply constructing a three-dimensional image from two-dimensional image information therefore yields an inaccurate three-dimensional image, and the accuracy of computer vision recognition cannot be improved.

Disclosure of Invention

The following is a summary of the subject matter described in detail herein. This summary is not intended to limit the scope of the claims.

The embodiment of the invention provides a three-dimensional image processing method and device based on a neural network, an electronic device, and a storage medium, which can improve the accuracy of constructing a three-dimensional image, thereby improving the accuracy of computer vision recognition.

In a first aspect, an embodiment of the present invention provides a three-dimensional image processing method based on a neural network, including:

acquiring first depth information of a target, wherein the first depth information is used for representing a depth image of the target;

inputting the first depth information into an input layer of a deep neural network model, wherein the deep neural network model is obtained by training according to sample three-dimensional image information of a three-dimensional sample and second depth information of the three-dimensional sample, and the second depth information is used for representing a depth image of the three-dimensional sample;

inputting the first depth information into a convolution layer of the deep neural network model for convolution to obtain a first feature value;

inputting the first feature value into a feature-dimension-changing layer of the deep neural network model for feature conversion to obtain a first three-dimensional feature volume;

inputting the first three-dimensional feature volume into a three-dimensional deconvolution layer of the deep neural network model for deconvolution to obtain target three-dimensional image information of the target;

and outputting the target three-dimensional image information.

In some embodiments, the obtaining the first depth information of the target includes:

acquiring two-dimensional image information of the target and depth image information corresponding to the two-dimensional image information;

and performing target detection on the two-dimensional image information to identify target image information used for representing the target from the two-dimensional image information, and obtaining corresponding first depth information from the depth image information according to the target image information.

In some embodiments, the deep neural network model is trained according to the following steps:

acquiring the sample three-dimensional image information of the three-dimensional sample;

obtaining the second depth information of the three-dimensional sample according to the sample three-dimensional image information;

inputting the second depth information into the input layer;

inputting the second depth information into the convolution layer for convolution to obtain a second feature value;

inputting the second feature value into the feature-dimension-changing layer to perform feature conversion to obtain a second three-dimensional feature volume;

inputting the second three-dimensional feature volume into the three-dimensional deconvolution layer for deconvolution to obtain training three-dimensional image information of the three-dimensional sample;

inputting the training three-dimensional image information and the sample three-dimensional image information into a loss function to calculate a loss value;

and obtaining a target weight parameter according to the loss value and adjusting the deep neural network model according to the target weight parameter.

In some embodiments, said inputting said second depth information into said input layer comprises:

performing a random first augmentation transformation on the second depth information to obtain third depth information, wherein the first augmentation transformation comprises one of adding a random Gaussian noise value, random scaling, random angle rotation, random translation, and random selection of a partial region of the depth image;

inputting the third depth information into the input layer.

In some embodiments, the inputting the training three-dimensional image information and the sample three-dimensional image information into a loss function to calculate a loss value comprises:

converting the sample three-dimensional image information into first mesh information;

performing a second augmentation transformation corresponding to the first augmentation transformation on the first mesh information to obtain second mesh information matching the viewing angle of the third depth information, wherein the second augmentation transformation comprises performing one of scaling, angle rotation, and translation matching the transformation applied to the second depth information;

discretizing the second mesh information into sample three-dimensional voxel information;

inputting the training three-dimensional image information and the sample three-dimensional voxel information into the loss function to calculate the loss value.

In some embodiments, the deriving a target weight parameter according to the loss value and adjusting the deep neural network model according to the target weight parameter includes:

optimizing the loss value and performing back-propagation chain-rule differentiation on the optimized loss value to obtain a weight parameter gradient;

and performing gradient descent processing according to the weight parameter gradient to obtain the target weight parameter.

In some embodiments, the performing a gradient descent process according to the weight parameter gradient to obtain the target weight parameter includes:

and performing gradient descent processing according to the weight parameter gradient obtained by the last training to obtain the target weight parameter.

In a second aspect, an embodiment of the present invention further provides a three-dimensional image processing apparatus based on a neural network, including:

the image acquisition module is used for acquiring first depth information of a target;

the processing module is connected with the image acquisition module and is used for inputting the first depth information into an input layer of a deep neural network model, wherein the deep neural network model is obtained by training according to sample three-dimensional image information of a three-dimensional sample and second depth information of the three-dimensional sample;

the processing module is further configured to input the first depth information into a convolution layer of the deep neural network model for convolution to obtain a first feature value, input the first feature value into a feature-dimension-changing layer of the deep neural network model for feature conversion to obtain a first three-dimensional feature volume, and input the first three-dimensional feature volume into a three-dimensional deconvolution layer of the deep neural network model for deconvolution to obtain target three-dimensional image information of the target;

the processing module is further used for outputting the target three-dimensional image information.

In a third aspect, an embodiment of the present invention further provides an electronic device, which includes a memory and a processor, where the memory stores a computer program, and the processor implements the neural network-based three-dimensional image processing method according to the first aspect when executing the computer program.

In a fourth aspect, the embodiment of the present invention further provides a computer-readable storage medium, where the storage medium stores a program, and the program is executed by a processor to implement the neural network-based three-dimensional image processing method according to the first aspect.

The embodiment of the invention at least comprises the following beneficial effects:

The embodiments of the invention disclose a three-dimensional image processing method and apparatus based on a neural network, an electronic device, and a storage medium. First depth information of a target is acquired, the first depth information being used for representing a depth image of the target, and the first depth information is input into a deep neural network model for conversion to obtain target three-dimensional image information of the target. Specifically, the first depth information is input into the input layer of the deep neural network model; the input layer passes it to the convolution layer for convolution to obtain a first feature value; the first feature value is input into the feature-dimension-changing layer for feature conversion to obtain a first three-dimensional feature volume; the three-dimensional deconvolution layer then deconvolves the first three-dimensional feature volume to obtain the target three-dimensional image information; and finally the target three-dimensional image information obtained by the deep neural network model is output. The deep neural network model is obtained by training according to sample three-dimensional image information of a three-dimensional sample and second depth information of the three-dimensional sample, and after training it can accurately convert a two-dimensional depth image into a three-dimensional image corresponding to the original three-dimensional object. Therefore, inputting the first depth information into the trained deep neural network model outputs target three-dimensional image information of high accuracy, which improves the accuracy of constructing the three-dimensional image and hence the accuracy of computer vision recognition.

Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.

Drawings

The accompanying drawings are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification; they illustrate embodiments of the invention and, together with the description, serve to explain the principles of the invention, and are not intended to limit the invention.

FIG. 1 is a schematic flow chart of a neural network-based three-dimensional image processing method according to an embodiment of the present invention;

FIG. 2 is a schematic flow chart of a neural network-based three-dimensional image processing method according to another embodiment of the present invention;

FIG. 3 is a schematic flow chart of a neural network-based three-dimensional image processing method according to another embodiment of the present invention;

FIG. 4 is a schematic flow chart of a neural network-based three-dimensional image processing method according to another embodiment of the present invention;

FIG. 5 is a schematic flow chart of a neural network-based three-dimensional image processing method according to another embodiment of the present invention;

FIG. 6 is a schematic flow chart of a neural network-based three-dimensional image processing method according to another embodiment of the present invention;

FIG. 7 is a schematic diagram of a neural network-based three-dimensional image processing apparatus according to an embodiment of the present invention;

FIG. 8 is a schematic diagram of an electronic device according to an embodiment of the invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.

It should be understood that, in the description of the embodiments of the present invention, "several" means one or more and "a plurality of" means two or more; "greater than", "less than", "exceeding", and the like are understood as excluding the stated number, while "above", "below", "within", and the like are understood as including the stated number. If "first", "second", and the like are used, they are only for the purpose of distinguishing technical features and are not to be understood as indicating or implying relative importance, implicitly indicating the number of the indicated technical features, or implicitly indicating the precedence of the indicated technical features.

Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The terminology used herein is for the purpose of describing embodiments of the invention only and is not intended to be limiting of the invention.

First, several terms referred to in the present application are explained:

Artificial Intelligence (AI): a new technical science that studies and develops theories, methods, technologies, and application systems for simulating, extending, and expanding human intelligence. Artificial intelligence is a branch of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can react in a manner similar to human intelligence; research in this field includes robotics, speech recognition, image recognition, natural language processing, and expert systems, among others. Artificial intelligence can simulate the information processes of human consciousness and thinking. It is also a theory, method, technology, and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to obtain the best results.

The embodiments of the present application can acquire and process related data based on artificial intelligence technology as defined above.

The artificial intelligence infrastructure generally includes technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, operation/interaction systems, and mechatronics. Artificial intelligence software technology mainly comprises computer vision, robotics, biometric recognition, speech processing, natural language processing, and machine learning/deep learning.

The gradient descent algorithm is a commonly used optimization algorithm in machine learning. Only the first derivative of the loss function needs to be computed in the solving process, so the computational cost is low. The basic idea is to pick a starting point, find the gradient direction there, and repeatedly step along the steepest descent until reaching the lowest point, that is, the convergence point where the cost function is minimal. Gradient descent has three common forms: Batch Gradient Descent, Stochastic Gradient Descent, and Mini-Batch Gradient Descent, all of which are also commonly used in deep learning for model training. A minimal sketch of the update rule is shown below.
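The following Python snippet is a minimal illustrative sketch (not part of the original disclosure) of plain gradient descent on a toy quadratic cost; the cost function and learning rate are assumptions chosen only to show the update rule.

```python
import numpy as np

# Gradient descent on the assumed toy cost J(theta) = ||theta||^2;
# only the first derivative of the cost is needed per step.
theta = np.array([3.0, -2.0])
learning_rate = 0.1
for _ in range(100):
    grad = 2.0 * theta             # first derivative of J(theta)
    theta -= learning_rate * grad  # step along the negative gradient
print(theta)                       # converges toward the minimum at the origin
```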

The relu activation function is used to add a nonlinear factor. Without an activation function, the input of each layer of nodes in a neural network model is a linear function of the output of the layer above, so no matter how many layers the network has, the output is a linear combination of the inputs, which is equivalent to having no hidden layer at all. Introducing the relu nonlinearity as the activation function improves the expressive power of the neural network, so that it is no longer a mere linear combination of its inputs. The relu activation function has no saturation region, so there is no vanishing-gradient problem; it involves no complicated exponential operations, so computation is simple, efficiency is improved, and actual convergence is fast; and it is more consistent with biological neural activation mechanisms. When a neural network model adopts the relu activation function, each sample can have its own weight coefficients, that is, a unique nonlinear transformation.
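An illustrative one-function sketch of the relu activation described above:

```python
import numpy as np

def relu(x: np.ndarray) -> np.ndarray:
    # Passes positive values through unchanged and zeroes out negatives,
    # introducing the nonlinearity discussed above.
    return np.maximum(x, 0.0)
```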

The medical cloud is a medical cloud platform created by combining cloud computing with medical technology on the basis of new technologies such as cloud computing, mobile technology, multimedia, 4G communication, big data, and the Internet of Things; it shares medical resources and expands the scope of medical services. Owing to the combination with cloud computing technology, the medical cloud improves the efficiency of medical institutions and makes it more convenient for residents to seek medical care. Appointment registration, electronic medical records, and medical insurance in existing hospitals are all products combining cloud computing with the medical field, and the medical cloud also has the advantages of data security, information sharing, dynamic expansion, and overall layout.

Based on this, embodiments of the present invention provide a three-dimensional image processing method and apparatus based on a neural network, an electronic device, and a storage medium, which can improve the accuracy of constructing a three-dimensional image, thereby improving the accuracy of computer vision recognition.

The embodiment of the invention provides a three-dimensional image processing method and apparatus based on a neural network, an electronic device, and a storage medium, which are explained through the following embodiments; the neural network-based three-dimensional image processing method in the embodiments of the disclosure is described first.

The embodiment of the invention provides a three-dimensional image processing method based on a neural network, which relates to the technical fields of artificial intelligence and digital healthcare and belongs to a branch of the artificial intelligence field. The method can be applied to a terminal, to a server, or to software running in a terminal or server. In some embodiments, the terminal may be a smartphone, tablet, laptop, desktop computer, smart watch, or the like. The server may be configured as an independent physical server, as a server cluster or distributed system composed of multiple physical servers, or as a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDN (content delivery network), and big data and artificial intelligence platforms. The software may be an application implementing the neural network-based three-dimensional image processing method, but is not limited to the above forms.

Fig. 1 is an optional flowchart of a neural network-based three-dimensional image processing method provided in an embodiment of the present disclosure; the method in Fig. 1 may include, but is not limited to, steps S110 to S160.

Step S110, obtaining first depth information of the target, where the first depth information is used to represent a depth image of the target.

In some embodiments of the present invention, the neural network-based three-dimensional image processing method first acquires first depth information of a target. The target is the object for which a three-dimensional image is to be constructed in the embodiments of the present invention; in one embodiment the target may be a human face, and it may be another object provided the requirements of the embodiments of the present invention are met. It should be noted that, in an embodiment, the method may also be applied to digital healthcare or a medical cloud, where a medical three-dimensional image is constructed by recognizing the first depth information of the target. The embodiments of the present invention take a human face as an example, which is not a limitation of the invention. The first depth information is used to represent a depth image of the target and may be acquired by a depth camera; the first depth information of the target is acquired so that processing can be performed on the depth image of the target.

Step S120, inputting the first depth information into an input layer of a deep neural network model, where the deep neural network model is obtained by training according to sample three-dimensional image information of a three-dimensional sample and second depth information of the three-dimensional sample, and the second depth information is used to represent a depth image of the three-dimensional sample.

In some embodiments of the present invention, after the first depth information is obtained, it is input into a trained deep neural network model, starting with the input layer of the model. The deep neural network model in the embodiments of the present invention is trained in advance according to sample three-dimensional image information of a three-dimensional sample and second depth information of the three-dimensional sample. The three-dimensional sample is a three-dimensional object corresponding to the target; when the target is a human face, the three-dimensional samples are a batch of three-dimensional face samples prepared in advance for neural network model training. In one embodiment, the deep neural network model outputs a three-dimensional model of the sample based on the input second depth information and is optimized after this output is compared against the sample three-dimensional image information; the second depth information is used to represent the depth image of the three-dimensional sample. Because the deep neural network model is trained in advance on the sample three-dimensional image information and the second depth information of the three-dimensional sample, its accuracy is high, and it can output an accurate three-dimensional model when faced with different depth information; therefore, after the first depth information is input into the deep neural network model, an accurate three-dimensional model can be output.

Step S130, inputting the first depth information into a convolution layer of the deep neural network model for convolution to obtain a first feature value.

In some embodiments of the present invention, the convolution layer performs two-dimensional convolution on the first depth information from the input layer, and a first feature value of the target is obtained after the convolution. The deep neural network model may set a plurality of convolution layers to convolve the first depth information so as to achieve the convolution effect. In one embodiment, the first feature value is a feature map, and the width and height of the depth image input to the deep neural network model are 320x320. The deep neural network model sets seven convolution layers (a first through a seventh convolution layer), each of which performs a two-dimensional convolution (conv2d) with a 3x3 kernel. The first convolution layer has a stride of 2x2 and outputs a feature map with a width and height of 160x160 and an output channel number C of 16, after which a relu activation function is applied to the output feature map. The second convolution layer has a stride of 2x2 and outputs a feature map of 80x80 with C of 32, followed by a relu activation function. The third convolution layer has a stride of 2x2 and outputs a feature map of 40x40 with C of 64, followed by a relu activation function. The fourth convolution layer has a stride of 2x2 and outputs a feature map of 20x20 with C of 128, followed by a relu activation function. The fifth convolution layer has a stride of 2x2 and outputs a feature map of 10x10 with C of 256, followed by a relu activation function. The sixth convolution layer has a stride of 2x2 and outputs a feature map of 5x5 with C of 512, followed by a relu activation function. The seventh convolution layer has a stride of 1x1 and outputs a feature map of 5x5 with C of 1280, followed by a relu activation function; the feature map output by the seventh convolution layer is the first feature value.

It should be noted that a channel number of 1 represents a single feature map; in general, the value of the channel number indicates how many feature maps there are.
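The following PyTorch sketch mirrors the seven-layer convolution stack described above; the layer widths, strides, and channel counts follow the text, while the single input channel, the padding, and all other details are assumptions and not part of the original disclosure.

```python
import torch
import torch.nn as nn

class DepthEncoder(nn.Module):
    """Sketch of the seven 2D convolution layers (assumed 1 input channel)."""
    def __init__(self):
        super().__init__()
        chans = [1, 16, 32, 64, 128, 256, 512]
        layers = []
        for c_in, c_out in zip(chans[:-1], chans[1:]):
            # 3x3 convolution, stride 2: 320 -> 160 -> 80 -> 40 -> 20 -> 10 -> 5
            layers += [nn.Conv2d(c_in, c_out, 3, stride=2, padding=1), nn.ReLU()]
        # Seventh layer: stride 1 keeps the 5x5 size, expands channels to 1280.
        layers += [nn.Conv2d(512, 1280, 3, stride=1, padding=1), nn.ReLU()]
        self.net = nn.Sequential(*layers)

    def forward(self, depth: torch.Tensor) -> torch.Tensor:
        # depth: (N, 1, 320, 320) -> first feature value: (N, 1280, 5, 5)
        return self.net(depth)
```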

Step S140, inputting the first feature value into a feature-dimension-changing layer of the deep neural network model for feature conversion to obtain a first three-dimensional feature volume.

In some embodiments of the present invention, the feature-dimension-changing layer performs a dimensional feature transformation on the first feature value from the convolution layer; this layer may correspond to a reshape layer in the deep neural network model. A first three-dimensional feature volume of the target is obtained after the feature transformation, and the deep neural network model may set a plurality of feature-dimension-changing layers to transform the first feature value so as to achieve the transformation effect. In one embodiment, the 5x5 feature map with 1280 channels output by the seventh convolution layer is reshaped, and the first three-dimensional feature volume with an output channel number C of 256 is output.
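A sketch of this reshape step, under the assumption (consistent with 1280 = 256 x 5 and with the 5x5x5 volume expected by the first three-dimensional deconvolution layer below) that the feature map is reinterpreted as a volume of depth 5; the axis order is also an assumption.

```python
import torch

feat = torch.randn(4, 1280, 5, 5)    # first feature value from the encoder
# Reinterpret 1280 channels as 256 channels x depth 5 (axis order assumed).
volume = feat.view(4, 256, 5, 5, 5)  # (N, C=256, D=5, H=5, W=5)
```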

Step S150, inputting the first three-dimensional feature volume into a three-dimensional deconvolution layer of the deep neural network model for deconvolution to obtain target three-dimensional image information of the target.

In some embodiments of the present invention, the three-dimensional deconvolution layer performs three-dimensional deconvolution on the first three-dimensional feature volume from the feature-dimension-changing layer, and target three-dimensional image information of the target is obtained after the three-dimensional deconvolution. The deep neural network model may set a plurality of three-dimensional deconvolution layers to deconvolve the first three-dimensional feature volume so as to achieve the three-dimensional deconvolution effect. In one embodiment, six three-dimensional deconvolution layers are set. The first three-dimensional deconvolution layer outputs a three-dimensional feature volume with a width and height of 10x10, a depth D of 10, and an output channel number C of 256, after which a relu activation function is applied to the output. The second outputs 20x20 with D of 20 and C of 128, followed by a relu activation function. The third outputs 40x40 with D of 40 and C of 64, followed by a relu activation function. The fourth outputs 80x80 with D of 80 and C of 32, followed by a relu activation function. The fifth outputs 160x160 with D of 160 and C of 16, followed by a relu activation function. The sixth outputs 320x320 with D of 320 and C of 1, followed by a relu activation function. The three-dimensional feature volume output by the sixth three-dimensional deconvolution layer is the target three-dimensional image information, that is, the final model output of the deep neural network model: target three-dimensional image information with a width and height of 320x320, a depth D of 320, and a channel number C of 1.

It should be noted that a channel number of 1 represents a single three-dimensional feature volume; in general, the value of the channel number indicates how many three-dimensional feature volumes there are.
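The following PyTorch sketch mirrors the six three-dimensional deconvolution layers described above; the channel counts and the doubling of width, height, and depth per layer follow the text, while the kernel size, stride, and padding are assumptions chosen so that each layer exactly doubles every spatial dimension.

```python
import torch
import torch.nn as nn

class VoxelDecoder(nn.Module):
    """Sketch of the six 3D deconvolution (transposed convolution) layers."""
    def __init__(self):
        super().__init__()
        chans = [256, 256, 128, 64, 32, 16, 1]
        layers = []
        for c_in, c_out in zip(chans[:-1], chans[1:]):
            # Kernel 4, stride 2, padding 1 doubles D, H, and W each layer:
            # 5 -> 10 -> 20 -> 40 -> 80 -> 160 -> 320
            layers += [nn.ConvTranspose3d(c_in, c_out, 4, stride=2, padding=1),
                       nn.ReLU()]
        self.net = nn.Sequential(*layers)

    def forward(self, volume: torch.Tensor) -> torch.Tensor:
        # volume: (N, 256, 5, 5, 5) -> voxels: (N, 1, 320, 320, 320)
        return self.net(volume)
```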

Step S160, outputting the target three-dimensional image information.

In some embodiments of the present invention, the first depth information is processed by the deep neural network model to obtain the target three-dimensional image information of the target, and the target three-dimensional image information obtained by this processing is output. That is, when the two-dimensional first depth information of a face is input into the deep neural network model, the model converts it and outputs the three-dimensional model of the face.

Referring to fig. 2, step S110 in the embodiment of the present invention may further include, but is not limited to, the following step S210 and step S220.

Step S210, acquiring two-dimensional image information of the target and depth image information corresponding to the two-dimensional image information.

Step S220, performing target detection on the two-dimensional image information to identify target image information used for representing the target from the two-dimensional image information, and obtaining corresponding first depth information from the depth image information according to the target image information.

In some embodiments of the present invention, after the two-dimensional image information of the target and the corresponding depth image information are obtained, target detection is required in order to obtain more accurate first depth information. In one embodiment, an RGB-D camera (where D denotes depth information) disposed in the terminal device acquires RGB-D image data that includes the two-dimensional image information and the depth image information. The depth channel D is extracted to obtain the depth image information, and the RGB channels are extracted to obtain a color RGB image, which is the two-dimensional image information of the target object. Target detection is then performed to identify the target image information representing the target from the two-dimensional image information. When the target object is a human face, a general face detector performs face detection on the RGB image to obtain a face detection frame, which may serve as the target image information, and the depth image of this area is cropped out to obtain the first depth information.

It should be noted that, in an embodiment, the target detection may further include expanding the face detection frame upward, downward, left, and right by a distance of 20% of the corresponding side length to obtain an enlarged face frame. This is because the face detection frame has errors; the value of 20% is taken as the frame expansion range based on practical experience to ensure that the face area is completely framed, and other expansion distances may also be used provided the requirements of the embodiments of the present invention are met, without specific limitation. Using the coordinates of the enlarged face frame, a partial depth image is cropped from the rectangular area at the corresponding coordinate position on the depth image; the result is the depth image of the face area. A data normalization operation is performed on this depth image to obtain a depth image with a size of 320x320 consistent with the input of the deep neural network model. This 320x320 depth image is taken as the first depth information, used as input data, and fed into the deep neural network model for computation, yielding target three-dimensional image information with a width and height of 320x320 and a depth of 320, that is, the three-dimensional face (voxel) model inferred by the deep learning model.
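A sketch of the 20% frame expansion and depth crop described above; the function name, the (x, y, w, h) box format, and the clamping to the image border are illustrative assumptions.

```python
import numpy as np

def crop_face_depth(depth: np.ndarray, box: tuple) -> np.ndarray:
    """Expand an (x, y, w, h) face box by 20% per side and crop the depth map."""
    x, y, w, h = box
    dx, dy = int(0.2 * w), int(0.2 * h)
    x0, y0 = max(x - dx, 0), max(y - dy, 0)  # clamp to the image border
    x1 = min(x + w + dx, depth.shape[1])
    y1 = min(y + h + dy, depth.shape[0])
    return depth[y0:y1, x0:x1]
```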

Referring to fig. 3, the deep neural network model in the embodiment of the present invention is obtained through the following training steps S310 to S380.

Step S310, obtaining sample three-dimensional image information of a three-dimensional sample;

Step S320, obtaining second depth information of the three-dimensional sample according to the sample three-dimensional image information.

Step S330, inputting second depth information into the input layer.

Step S340, inputting the second depth information into the convolution layer for convolution to obtain a second feature value.

Step S350, inputting the second feature value into the feature-dimension-changing layer for feature conversion to obtain a second three-dimensional feature volume.

Step S360, inputting the second three-dimensional feature volume into the three-dimensional deconvolution layer for deconvolution to obtain training three-dimensional image information of the three-dimensional sample.

Step S370, inputting the training three-dimensional image information and the sample three-dimensional image information into a loss function to calculate a loss value.

Step S380, obtaining a target weight parameter according to the loss value and adjusting the deep neural network model according to the target weight parameter.

In some embodiments of the present invention, information of the three-dimensional sample is input into the deep neural network model for training. First, sample three-dimensional image information of the three-dimensional sample is obtained; the sample three-dimensional image information represents the three-dimensional model data of the sample. The sample three-dimensional image information is then converted into a depth image by an algorithm and software to obtain the second depth information of the three-dimensional sample; in other words, the second depth information is obtained by directly converting the three-dimensional model data of the sample. The second depth information is then input into the deep neural network model for processing, starting with the input layer of the model.

Specifically, the second depth information is input into the convolution layer for convolution to obtain a second feature value; the second feature value is input into the feature-dimension-changing layer for feature conversion to obtain a second three-dimensional feature volume; and the second three-dimensional feature volume is input into the three-dimensional deconvolution layer for deconvolution to obtain training three-dimensional image information of the three-dimensional sample, which is the three-dimensional image obtained after the second depth information passes through the deep neural network model. The training three-dimensional image information and the sample three-dimensional image information are then input into a loss function to calculate a loss value. By optimizing the loss value, all weight information of the deep neural network model can be obtained, that is, the target weight parameters are obtained through optimization, and each weight in the deep neural network model is adjusted according to the target weight parameters, so that the deep neural network model is trained.

It should be noted that the loss value calculated by the loss function is used to update the weights: the tensors of each layer of the deep neural network model and the derivative of each weight are computed through backward chain-rule propagation, and the update amount of each target weight parameter is obtained by multiplying the derivative by the learning rate. The embodiment of the present invention provides a squared training loss function as follows:

l(θ) = ‖μ − y‖²    (1)

In formula (1), θ denotes all weight parameters of the deep neural network model, μ is the output value of the deep neural network model, that is, the training three-dimensional image information, and y is the sample three-dimensional image information. The result l(θ) of the loss function is the loss value, so the target weight parameters can be obtained by optimizing according to the calculated loss value, and the deep neural network model is adjusted to complete the training.
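A sketch of formula (1) in code; the sum reduction over voxels is an assumption, since the text states only that the loss is squared.

```python
import torch

mu = torch.rand(1, 1, 320, 320, 320)  # training three-dimensional image info
y = torch.rand(1, 1, 320, 320, 320)   # sample three-dimensional voxel info
loss = ((mu - y) ** 2).sum()          # l(theta) = ||mu - y||^2
```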

Referring to fig. 4, step S330 in the embodiment of the present invention may further include, but is not limited to, the following step S410 and step S420.

Step S410, performing a random first augmentation transformation on the second depth information to obtain third depth information, where the first augmentation transformation includes one of adding a random Gaussian noise value, random scaling, random angle rotation, random translation, and random selection of a partial region of the depth image.

Step S420, inputting the third depth information into the input layer.

In some embodiments of the present invention, before the second depth information is input into the deep neural network model for processing, data augmentation needs to be performed on it, that is, the first augmentation transformation is performed to obtain the third depth information, and the augmented third depth information is then input into the input layer of the deep neural network model. The purpose of data augmentation is to increase the diversity and variability of the data: each operation step uses a random number, or is itself performed at random, so randomness is increased by applying random data augmentation to the second depth information. One or more of the following individual augmentation operations are selected, randomly ordered, and applied in sequence to obtain the final augmented third depth information: adding a random Gaussian noise value to the second depth information, random scaling, random angle rotation, random translation, and random selection of a partial region of the depth image. Randomly selecting a partial region of the depth image means choosing a random region and setting its values to zero or to the maximum value, so as to simulate missing regions (holes) in the depth image data acquired by a depth camera.
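A sketch showing two of the listed augmentation operations (Gaussian noise and random region dropout); the probabilities, noise scale, and patch size are assumptions, and the remaining operations (scaling, rotation, translation) would be composed in the same randomized fashion.

```python
import numpy as np

rng = np.random.default_rng()

def augment_depth(depth: np.ndarray) -> np.ndarray:
    out = depth.astype(np.float32).copy()
    if rng.random() < 0.5:                      # add random Gaussian noise
        out += rng.normal(0.0, 5.0, out.shape)
    if rng.random() < 0.5:                      # zero a random patch to mimic
        h, w = out.shape                        # holes in depth-camera data
        y, x = rng.integers(0, h - 32), rng.integers(0, w - 32)
        out[y:y + 32, x:x + 32] = 0.0
    return out
```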

It should be noted that, in an embodiment, after the second depth information is subjected to the random first augmentation transformation to obtain the third depth information, training data normalization is also performed so that the output image meets the input requirements of the deep neural network model. One of the following two normalization operations is applied to the output depth image to obtain a normalized image of consistent size: if the width and height of the input image are greater than or equal to 320x320, an image of size 320x320 is cropped from the center of the image; otherwise, the image is centered, its top, bottom, left, and right sides are expanded outward and filled with zero values, and the image is padded to a size of 320x320. After the training data normalization is completed, the processed third depth information is input into the input layer of the deep neural network model.
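A sketch of the center-crop-or-zero-pad normalization described above; handling each axis independently is an assumption covering the case where only one dimension is undersized.

```python
import numpy as np

def normalize_depth(img: np.ndarray, size: int = 320) -> np.ndarray:
    # Center-crop any dimension larger than the target size...
    h, w = img.shape
    y, x = max((h - size) // 2, 0), max((w - size) // 2, 0)
    img = img[y:y + size, x:x + size]
    # ...then center the image and zero-pad any dimension still smaller.
    h, w = img.shape
    out = np.zeros((size, size), dtype=img.dtype)
    y, x = (size - h) // 2, (size - w) // 2
    out[y:y + h, x:x + w] = img
    return out
```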

Referring to fig. 5, step S370 in the embodiment of the present invention may further include, but is not limited to, the following step S510, step S520, step S530, and step S540.

Step S510, converting the sample three-dimensional image information into first mesh information.

Step S520, performing a second augmentation transformation corresponding to the first augmentation transformation on the first mesh information to obtain second mesh information matching the viewing angle of the third depth information, where the second augmentation transformation includes performing one of scaling, angle rotation, and translation matching the transformation applied to the second depth information.

Step S530, discretizing the second mesh information into sample three-dimensional voxel information.

Step S540, inputting the training three-dimensional image information and the sample three-dimensional voxel information into a loss function to calculate a loss value.

In some embodiments of the present invention, data augmentation similar to that applied to the second depth information needs to be performed on the sample three-dimensional image information. The obtained sample three-dimensional image information is first converted into first mesh information, which is a triangular mesh model of the three-dimensional sample, that is, a three-dimensional model representation format; it should be noted that the second depth information is also obtained by converting the first mesh information. After the first mesh information is obtained, a second augmentation transformation corresponding to the first augmentation transformation is performed on it to obtain second mesh information matching the viewing angle of the third depth information. For example, if the second depth information was randomly scaled, the first mesh information is scaled by the same factor; if the second depth information was rotated by a random angle, the first mesh information is rotated by the same angle; and if the second depth information was randomly translated, the first mesh information is translated by the same distance and direction. The purpose is to obtain a mesh model with the same viewing angle as the augmented depth information. The second mesh information obtained by the second augmentation transformation is then discretized into sample three-dimensional voxel information, which serves as the reference to be compared with the training three-dimensional image information; therefore, the training three-dimensional image information and the sample three-dimensional voxel information are input into the loss function to calculate the loss value.
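A sketch of discretizing a mesh into voxels using the trimesh library; the patent names no tool, so the library choice, the file name, and the pitch computation are assumptions.

```python
import trimesh

mesh = trimesh.load("face_sample.obj")   # hypothetical sample mesh file
pitch = mesh.extents.max() / 320         # roughly 320 voxels along the longest side
voxels = mesh.voxelized(pitch)           # discretize the mesh surface
grid = voxels.matrix.astype("float32")   # occupancy voxel grid
```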

It should be noted that, in an embodiment, after the sample three-dimensional image information is subjected to the second augmentation transformation and related operations to obtain the second mesh information, supervision-signal data normalization is also performed. One of the following two voxel normalization operations is applied to the output sample three-dimensional voxel information to obtain normalized voxel model data consistent with the volume size of the third depth information: if the three dimensions of the input sample three-dimensional voxel information are greater than or equal to 320x320x320, the sample three-dimensional voxel information is cropped to a model of size 320x320x320; otherwise, it is centered and expanded on the top, bottom, left, right, front, and back, filled with zero values, and padded to a model of size 320x320x320. After the supervision-signal data normalization is completed, the processed sample three-dimensional voxel information is input into the loss function.

Referring to fig. 6, step S380 in the embodiment of the present invention may further include, but is not limited to, the following step S610 and step S620.

Step S610, optimizing the loss value and performing back-propagation chain-rule differentiation on the optimized loss value to obtain weight-parameter gradients.

Step S620, performing gradient descent according to the weight-parameter gradients to obtain target weight parameters.

In some embodiments of the invention, gradients are computed from the optimized loss value. The optimization goal of network training is to drive the loss value L(θ) towards 0. The gradients dθ of all weight parameters in the network are computed by back-propagation chain-rule differentiation, after which the weight parameters are updated by gradient descent to obtain the target weight parameters.
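
A minimal sketch of this backward pass and gradient-descent update, written here with PyTorch and a stand-in linear model; the actual network, the loss function, and the learning rate are not fixed by this embodiment.

```python
import torch

# Stand-in model and loss purely for illustration; the real network is the
# convolution -> feature-dimension change -> 3-D deconvolution model above,
# and the embodiment does not name the loss function or learning rate.
model = torch.nn.Linear(8, 8)
loss_fn = torch.nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)

pred = model(torch.randn(4, 8))    # stands in for training 3-D image information
target = torch.randn(4, 8)         # stands in for sample 3-D voxel information

loss = loss_fn(pred, target)       # L(theta), to be driven towards 0
optimizer.zero_grad()
loss.backward()                    # back-propagation chain-rule differentiation -> dtheta
optimizer.step()                   # gradient-descent update of the weight parameters
```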

In some embodiments of the present invention, step S620 may further include: performing gradient descent according to the weight-parameter gradients obtained in the most recent training pass to obtain the target weight parameters. It should be noted that, in the embodiment of the present invention, the sample three-dimensional image information of all collected three-dimensional samples is trained over once in order to improve the accuracy of the deep neural network; completing such a pass yields new weight-parameter gradients, which are used to update the target weight parameters. One complete pass constitutes one epoch of training, and 200 epochs are trained in this embodiment. The weight parameters θ obtained after the final epoch are taken as the weight parameters of the finally required deep neural network model, that is, as the final target weight parameters, thereby optimizing the deep neural network model. This also realizes training from multiple different viewing angles and improves the accuracy of the deep neural network model. Provided that the requirements of the embodiment of the present invention are satisfied, a different number of training epochs may be used; the present invention is not specifically limited in this respect.
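
The 200-epoch schedule might look as follows, continuing the sketch above; `train_loader`, which would yield batches of depth images and matching voxel volumes, is an assumed stand-in for the real data pipeline.

```python
import torch

NUM_EPOCHS = 200                   # the epoch count used in this embodiment

for epoch in range(NUM_EPOCHS):
    # One epoch = one pass over all collected three-dimensional samples.
    for depth_batch, voxel_batch in train_loader:
        pred = model(depth_batch)
        loss = loss_fn(pred, voxel_batch)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

# The weights theta left after the final epoch are kept as the final
# target weight parameters of the deep neural network model.
torch.save(model.state_dict(), "target_weights.pt")
```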

Referring to fig. 7, an embodiment of the present invention further provides a three-dimensional image processing apparatus 100 based on a neural network, which can implement the above three-dimensional image processing method based on the neural network, and the apparatus includes:

an image obtaining module 101 configured to obtain first depth information of the target, where the first depth information is used to represent a depth image of the target.

The processing module 102 is connected to the image obtaining module 101 and is configured to input the first depth information into an input layer of a deep neural network model, where the deep neural network model is obtained by training according to sample three-dimensional image information of a three-dimensional sample and second depth information of the three-dimensional sample, the second depth information being used to represent a depth image of the three-dimensional sample.

The processing module 102 is further configured to input the first depth information into a convolution layer of the deep neural network model to perform convolution to obtain a first feature value, input the first feature value into a feature dimension change layer of the deep neural network model to perform feature conversion to obtain a first three-dimensional feature volume, and input the first three-dimensional feature volume into a three-dimensional deconvolution layer of the deep neural network model to perform deconvolution to obtain target three-dimensional image information of the target.

The processing module 102 is further configured to output the target three-dimensional image information.

It should be noted that, in the embodiment of the present invention, all of the deep neural network model processing may be performed in the processing module 102. The image obtaining module 101 may also be configured to obtain image information of a three-dimensional sample; in an embodiment, the image obtaining module 101 may perform three-dimensional scanning on the three-dimensional sample to obtain the sample three-dimensional image information, so as to facilitate training of the deep neural network model. The image obtaining module 101 may be a camera in a terminal device, and the processing module 102 may be a processor.
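
A hypothetical sketch of how the two modules could be composed in software is given below; `capture_depth()` stands in for whatever depth-camera API the terminal device provides and is not part of this disclosure.

```python
import torch

class ThreeDImageProcessingApparatus:
    """Illustrative pairing of the image obtaining module (101) with the
    processing module (102) running the trained deep neural network."""

    def __init__(self, model: torch.nn.Module, camera):
        self.model = model.eval()   # processing module 102: the trained network
        self.camera = camera        # image obtaining module 101: e.g. a terminal camera

    def process(self) -> torch.Tensor:
        depth = self.camera.capture_depth()           # first depth information
        with torch.no_grad():
            # convolution -> feature-dimension change -> 3-D deconvolution
            voxels = self.model(depth.unsqueeze(0))
        return voxels                                 # target 3-D image information
```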

Fig. 8 shows an electronic device 200 provided by an embodiment of the present invention. The electronic device 200 includes: a memory 201, a processor 202, and a computer program stored on the memory 201 and executable on the processor 202, where the computer program, when run, executes the neural network-based three-dimensional image processing method described above.

The processor 202 and the memory 201 may be connected by a bus or other means.

The memory 201, as a non-transitory computer-readable storage medium, may be used to store non-transitory software programs and non-transitory computer-executable programs, such as those implementing the neural network-based three-dimensional image processing method described in the embodiments of the present invention. The processor 202 implements the above-described neural network-based three-dimensional image processing method by executing the non-transitory software programs and instructions stored in the memory 201.

The memory 201 may include a program storage area and a data storage area, where the program storage area may store an operating system and an application program required for at least one function, and the data storage area may store data generated during execution of the neural network-based three-dimensional image processing method described above. Further, the memory 201 may include high-speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid-state storage device. In some embodiments, the memory 201 may optionally include memory located remotely from the processor 202, and such remote memory may be connected to the electronic device 200 via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.

Non-transitory software programs and instructions required to implement the neural network-based three-dimensional image processing method described above are stored in the memory 201, and when executed by the one or more processors 202, perform the neural network-based three-dimensional image processing method described above, for example, performing method steps S110 to S160 in fig. 1, method steps S210 to S220 in fig. 2, method steps S310 to S380 in fig. 3, method steps S410 to S420 in fig. 4, method steps S510 to S540 in fig. 5, and method steps S610 to S620 in fig. 6.

The embodiment of the invention also provides a computer-readable storage medium, which stores computer-executable instructions, and the computer-executable instructions are used for executing the three-dimensional image processing method based on the neural network.

In one embodiment, the computer-readable storage medium stores computer-executable instructions that are executed by one or more control processors, for example, to perform method steps S110-S160 in fig. 1, method steps S210-S220 in fig. 2, method steps S310-S380 in fig. 3, method steps S410-S420 in fig. 4, method steps S510-S540 in fig. 5, and method steps S610-S620 in fig. 6.

The above-described embodiments of the apparatus are merely illustrative; the units described as separate components may or may not be physically separate, that is, they may be located in one place or distributed over a plurality of network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.

One of ordinary skill in the art will appreciate that all or some of the steps, systems, and methods disclosed above may be implemented as software, firmware, hardware, and suitable combinations thereof. Some or all of the physical components may be implemented as software executed by a processor, such as a central processing unit, digital signal processor, or microprocessor, as hardware, or as an integrated circuit, such as an application-specific integrated circuit. Such software may be distributed on computer-readable media, which may include computer storage media (or non-transitory media) and communication media (or transitory media). As is well known to those of ordinary skill in the art, the term computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer-readable instructions, data structures, program modules, or other data. Computer storage media include, but are not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by a computer. In addition, communication media typically embody computer-readable instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave or other transport mechanism, and include any information delivery media, as is known to those skilled in the art.

It should also be appreciated that the various implementations provided by the embodiments of the present invention can be combined arbitrarily to achieve different technical effects.

One of ordinary skill in the art will appreciate that all or some of the steps of the methods, systems, functional modules/units in the devices disclosed above may be implemented as software, firmware, hardware, and suitable combinations thereof.

In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative: the division into units is only one logical division, and other divisions may be adopted in actual implementation; for instance, a plurality of units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices, or units, and may be electrical, mechanical, or in another form.

In addition, the functional units in the embodiments of the present application may be integrated into one processing unit, each unit may exist alone physically, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.

While the preferred embodiments of the present invention have been described in detail, it will be understood by those skilled in the art that the foregoing and various other changes, omissions and deviations in the form and detail thereof may be made without departing from the scope of this invention.
