Training method of texture generation model, image processing method and device

Document No.: 551854    Publication date: 2021-05-14

Note: This invention, Training method of texture generation model, image processing method and device (纹理生成模型的训练方法、图像处理方法及装置), was created by 姚光明, 袁燚, 范长杰 and 胡志鹏 on 2021-02-02. Abstract: The invention provides a training method of a texture generation model, an image processing method and a device, relating to the technical field of image generation, and comprising: obtaining a pre-rendered training image; inputting the training image into a texture generation model, and outputting a first illumination parameter and a first texture set for the training image through the texture generation model; calculating a joint loss value based on the first illumination parameter, the first texture set and a preset joint loss function; and training the texture generation model by using the joint loss value. The invention can generate higher-quality textures while significantly reducing the influence of illumination on the texture.

1. A training method of a texture generation model is characterized by comprising the following steps:

obtaining a pre-rendered training image;

inputting the training image into a texture generation model, and outputting a first illumination parameter and a first texture set for the training image through the texture generation model;

calculating a joint loss value based on the first illumination parameter, the first texture set and a preset joint loss function;

and training the texture generation model by using the joint loss value.

2. The method of claim 1, wherein the texture generation model comprises an illumination estimation network and a texture prediction network;

the step of outputting, by the texture generation model, a first illumination parameter and a first texture set for the training image comprises:

estimating, by the illumination estimation network, a first illumination parameter of the training image, and generating, by the texture prediction network, a first texture set of the training image.

3. The method of claim 2, wherein the illumination estimation network comprises a downsampling convolutional layer and a multi-layer perceptron layer;

the step of estimating a first illumination parameter of the training image by the illumination estimation network comprises:

extracting illumination features of the training image through the downsampling convolutional layer;

outputting a first illumination parameter of the training image from the illumination features through the multi-layer perceptron layer; wherein the first illumination parameter comprises one or more of an ambient parameter, a diffuse parameter, and an illumination intensity.

4. The method of claim 2, wherein the texture prediction network comprises a feature extraction module, a feature completion module, and a texture decoding module;

the step of generating a first texture set of the training image by the texture prediction network comprises:

extracting visible patch features and first invisible patch features of the training image through the feature extraction module;

completing the first invisible patch feature based on the visible patch feature through the feature completion module to obtain a second invisible patch feature;

and respectively decoding the visible patch features and the second invisible patch features through the texture decoding module to obtain a first texture set of the training image.

5. The method of claim 4, wherein the step of extracting, by the feature extraction module, visible patch features and first invisible patch features of the training image comprises:

extracting features of the training image through a feature encoder in the feature extraction module to obtain a feature map corresponding to the training image;

sampling the feature map based on the depth value of each pixel in the training image to obtain visible vertex features and invisible vertex features;

averaging visible vertex features and/or invisible vertex features belonging to the same patch to obtain visible patch features and first invisible patch features of the training image; if a patch contains one or more invisible vertices, determining that the corresponding feature of the patch is a first invisible patch feature; and if the patch does not contain any invisible vertex, determining that the corresponding feature of the patch is a visible patch feature.

6. The method of claim 5, wherein the step of sampling the feature map based on the depth values of the pixels in the training image to obtain visible vertex features and invisible vertex features comprises:

determining whether the depth value of each pixel in the training image is smaller than a preset depth buffer value;

if yes, determining the pixel as a visible vertex; if not, determining the pixel as an invisible vertex;

and sampling on the feature map based on the visible vertex and the invisible vertex by utilizing bilinear interpolation to respectively obtain visible vertex features and invisible vertex features.

7. The method of claim 4, wherein the feature completion module comprises a graph encoder and a graph decoder;

the step of completing, by the feature completion module, the first invisible patch feature based on the visible patch feature to obtain a second invisible patch feature includes:

performing, by the graph encoder, a convolution operation and a downsampling operation on the first invisible patch feature based on the visible patch feature and geometric information of a geometric model corresponding to the training image; wherein the graph encoder comprises a plurality of first graph convolution layers and a plurality of graph downsampling layers;

performing, by the graph decoder, a convolution operation and an upsampling operation on the feature output by the graph encoder based on the visible patch feature and the geometric information to obtain a second invisible patch feature; wherein the graph decoder includes a plurality of second graph convolution layers and a plurality of graph upsampling layers.

8. The method of claim 1, wherein the step of obtaining a pre-rendered training image comprises:

acquiring an original image;

rendering the original image by using random illumination parameters and a plurality of rendering angles to obtain pre-rendered images corresponding to the respective rendering angles;

and randomly selecting one pre-rendered image from the pre-rendered images as a training image.

9. The method of claim 8, wherein the joint loss function comprises a texture loss function, a cycle consistency loss function, and an adversarial loss function;

the step of calculating a joint loss value based on the first illumination parameter, the first texture set, and a preset joint loss function includes:

calculating a texture loss value based on the first texture set, a texture ground truth, and the texture loss function;

rendering the original image by using the first illumination parameter, the first texture set and a rendering angle corresponding to the training image to obtain a differentiably rendered image;

calculating a cycle consistency loss value based on the training image, the differentiably rendered image, and the cycle consistency loss function;

calculating an adversarial loss value based on the training image, the differentiably rendered image, and the adversarial loss function;

and performing a weighted summation of the texture loss value, the cycle consistency loss value, and the adversarial loss value to obtain a joint loss value.

10. The method of claim 9, wherein the step of calculating an adversarial loss value based on the training image, the differentiably rendered image, and the adversarial loss function comprises:

and linearly interpolating between the training image and the differentiably rendered image with a uniformly sampled weight, and calculating the adversarial loss value based on the image obtained by the interpolation, the training image, the differentiably rendered image, and the adversarial loss function.

11. An image processing method, comprising:

acquiring a target image to be processed;

inputting the target image to a texture generation model; wherein the texture generation model is obtained by training using the training method of the texture generation model according to any one of claims 1 to 10;

and generating a second illumination parameter and a second texture set corresponding to the target image through the texture generation model.

12. An apparatus for training a texture generation model, comprising:

the training image acquisition module is used for acquiring a training image obtained by prerendering;

a training image input module, configured to input the training image to a texture generation model, and output a first illumination parameter and a first texture set for the training image through the texture generation model;

a loss calculation module, configured to calculate a joint loss value based on the first illumination parameter, the first texture set, and a preset joint loss function;

and the training module is used for training the texture generation model by utilizing the joint loss value.

13. An image processing apparatus characterized by comprising:

the target image acquisition module is used for acquiring a target image to be processed;

a target image input module for inputting the target image to a texture generation model; wherein the texture generation model is obtained by training using the training method of the texture generation model according to any one of claims 1 to 10;

and the generating module is used for generating a second illumination parameter and a second texture set corresponding to the target image through the texture generating model.

14. A server, comprising a processor and a memory;

the memory has stored thereon a computer program which, when executed by the processor, performs the method of any one of claims 1 to 10, or performs the method of claim 11.

15. A computer storage medium for storing computer software instructions for use in the method of any one of claims 1 to 10 or for use in the method of claim 11.

Technical Field

The invention relates to the technical field of image generation, in particular to a training method of a texture generation model, an image processing method and an image processing device.

Background

Texture generation is an image generation technology that produces a corresponding texture map from an input image. For example, a texture map of a human body model can be generated from an input human body image and then applied to a reconstructed human body mesh. In the related art, a convolutional neural network is used to extract image features from the input image, the features are sampled at each vertex to obtain per-vertex color features, and these color features are fed into an MLP (Multi-Layer Perceptron) to obtain the final predicted vertex colors, so that rendering is performed based on the texture characterized by the vertex colors. However, textures generated in this way often make the rendered surface overly smooth, so the texture quality is poor, and because such textures are affected by illumination, the generated texture may contain shadows.

Disclosure of Invention

In view of the above, an object of the present invention is to provide a training method for a texture generation model, an image processing method, and an image processing apparatus, in which the trained texture generation model can generate high-quality textures while the influence of illumination on the texture is significantly reduced.

In a first aspect, an embodiment of the present invention provides a method for training a texture generation model, including: obtaining a pre-rendered training image; inputting the training image into a texture generation model, and outputting a first illumination parameter and a first texture set for the training image through the texture generation model; calculating a joint loss value based on the first illumination parameter, the first texture set and a preset joint loss function; and training the texture generation model by using the joint loss value.

In one embodiment, the texture generation model includes an illumination estimation network and a texture prediction network; the step of outputting, by the texture generation model, a first illumination parameter and a first texture set for the training image comprises: estimating, by the illumination estimation network, a first illumination parameter of the training image, and generating, by the texture prediction network, a first texture set of the training image.

In one embodiment, the illumination estimation network includes a downsampling convolutional layer and a multi-layer perceptron layer; the step of estimating a first illumination parameter of the training image by the illumination estimation network comprises: extracting illumination features of the training image through the downsampling convolutional layer; and outputting the first illumination parameter of the training image from the illumination features through the multi-layer perceptron layer; wherein the first illumination parameter comprises one or more of an ambient parameter, a diffuse parameter, and an illumination intensity.

In one embodiment, the texture prediction network comprises a feature extraction module, a feature completion module, and a texture decoding module; the step of generating a first texture set of the training image by the texture prediction network comprises: extracting visible patch features and first invisible patch features of the training image through the feature extraction module; completing the first invisible patch features based on the visible patch features through the feature completion module to obtain second invisible patch features; and respectively decoding the visible patch features and the second invisible patch features through the texture decoding module to obtain a first texture set of the training image.

In one embodiment, the step of extracting, by the feature extraction module, visible patch features and first invisible patch features of the training image includes: extracting features of the training image through a feature encoder in the feature extraction module to obtain a feature map corresponding to the training image; sampling the feature map based on the depth value of each pixel in the training image to obtain visible vertex features and invisible vertex features; and averaging the visible vertex features and/or invisible vertex features belonging to the same patch to obtain the visible patch features and first invisible patch features of the training image; if a patch contains one or more invisible vertices, the corresponding feature of the patch is determined to be a first invisible patch feature; and if the patch does not contain any invisible vertex, the corresponding feature of the patch is determined to be a visible patch feature.

In one embodiment, the step of sampling the feature map based on the depth values of the pixels in the training image to obtain visible vertex features and invisible vertex features includes: determining whether the depth value of each pixel in the training image is smaller than a preset depth buffer value; if yes, determining the pixel as a visible vertex; if not, determining the pixel as an invisible vertex; and sampling on the feature map by bilinear interpolation based on the visible vertices and the invisible vertices to obtain the visible vertex features and the invisible vertex features, respectively.

In one embodiment, the feature completion module includes a graph encoder and a graph decoder; the step of completing, by the feature completion module, the first invisible patch feature based on the visible patch feature to obtain a second invisible patch feature includes: performing, by the graph encoder, a convolution operation and a downsampling operation on the first invisible patch feature based on the visible patch feature and geometric information of a geometric model corresponding to the training image, wherein the graph encoder comprises a plurality of first graph convolution layers and a plurality of graph downsampling layers; and performing, by the graph decoder, a convolution operation and an upsampling operation on the feature output by the graph encoder based on the visible patch feature and the geometric information to obtain the second invisible patch feature, wherein the graph decoder includes a plurality of second graph convolution layers and a plurality of graph upsampling layers.

In one embodiment, the step of obtaining a pre-rendered training image includes: acquiring an original image; rendering the original image with random illumination parameters from a plurality of rendering angles to obtain pre-rendered images corresponding to the respective rendering angles; and randomly selecting one pre-rendered image from the pre-rendered images as the training image.

In one embodiment, the joint loss function includes a texture loss function, a cycle consistency loss function, and an adversarial loss function; the step of calculating a joint loss value based on the first illumination parameter, the first texture set, and a preset joint loss function includes: calculating a texture loss value based on the first texture set, a texture ground truth, and the texture loss function; rendering the original image with the first illumination parameter, the first texture set, and a rendering angle corresponding to the training image to obtain a differentiably rendered image; calculating a cycle consistency loss value based on the training image, the differentiably rendered image, and the cycle consistency loss function; calculating an adversarial loss value based on the training image, the differentiably rendered image, and the adversarial loss function; and performing a weighted summation of the texture loss value, the cycle consistency loss value, and the adversarial loss value to obtain the joint loss value.

In one embodiment, the step of calculating an adversarial loss value based on the training image, the differentiably rendered image, and the adversarial loss function comprises: linearly interpolating between the training image and the differentiably rendered image with a uniformly sampled weight, and calculating the adversarial loss value based on the interpolated image, the training image, the differentiably rendered image, and the adversarial loss function.

In a second aspect, an embodiment of the present invention further provides an image processing method, including: acquiring a target image to be processed; inputting the target image to a texture generation model; wherein the texture generation model is obtained by training with the training method of the texture generation model according to any one of the first aspect; and generating a second illumination parameter and a second texture set corresponding to the target image through the texture generation model.

In a third aspect, an embodiment of the present invention further provides a training apparatus for a texture generation model, including: the training image acquisition module is used for acquiring a training image obtained by prerendering; a training image input module, configured to input the training image to a texture generation model, and output a first illumination parameter and a first texture set for the training image through the texture generation model; a loss calculation module, configured to calculate a joint loss value based on the first illumination parameter, the first texture set, and a preset joint loss function; and the training module is used for training the texture generation model by utilizing the joint loss value.

In a fourth aspect, an embodiment of the present invention further provides an image processing apparatus, including: the target image acquisition module is used for acquiring a target image to be processed; a target image input module for inputting the target image to a texture generation model; wherein the texture generation model is obtained by training with the training method of the texture generation model according to any one of the first aspect; and the generating module is used for generating a second illumination parameter and a second texture set corresponding to the target image through the texture generating model.

In a fifth aspect, an embodiment of the present invention further provides a server, including a processor and a memory; the memory has stored thereon a computer program which, when executed by the processor, performs the method of any one of the aspects as provided in the first aspect, or performs the method as provided in the second aspect.

In a sixth aspect, the present invention further provides a computer storage medium for storing computer software instructions for the method provided in any one of the first aspect, or for the method provided in the second aspect.

The embodiments of the invention provide a training method and a training apparatus for a texture generation model. In this method, the first illumination parameter and the first texture set corresponding to the training image are output through the texture generation model, and the texture generation model is trained in combination with the joint loss function, so that the trained texture generation model can represent higher-quality, more detailed textures through the texture set. In addition, the illumination parameter can be predicted explicitly and is constrained by the joint loss function, so that the influence of illumination on the generated texture is significantly reduced and the texture quality is further improved.

According to the image processing method and apparatus provided by the embodiments of the invention, the target image to be processed is first obtained and then input into the texture generation model, so that the second illumination parameter and the second texture set corresponding to the target image are generated through the texture generation model. In this method, the texture generation model obtained with the above training method is used to process the target image and output the second illumination parameter and the second texture set corresponding to the target image, which effectively improves the texture quality and significantly reduces the influence of illumination on the texture.

Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.

In order to make the aforementioned and other objects, features and advantages of the present invention comprehensible, preferred embodiments accompanied with figures are described in detail below.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings can be obtained by those skilled in the art without creative efforts.

Fig. 1 is a schematic flow chart of human texture generation according to an embodiment of the present invention;

FIG. 2 is a schematic flowchart of a method for training a texture generation model according to an embodiment of the present invention;

FIG. 3 is a schematic structural diagram of a texture generation model according to an embodiment of the present invention;

FIG. 4a is a schematic diagram of a downsampled convolutional layer according to an embodiment of the present invention;

fig. 4b is a schematic structural diagram of an illumination estimation network according to an embodiment of the present invention;

FIG. 5 is a schematic structural diagram of a feature encoder according to an embodiment of the present invention;

fig. 6 is a schematic structural diagram of a feature completion module according to an embodiment of the present invention;

fig. 7a is a schematic structural diagram of a sub-decoder according to an embodiment of the present invention;

fig. 7b is a schematic diagram of an upsampling structure provided by an embodiment of the present invention;

FIG. 8 is a logic diagram of a joint loss function design according to an embodiment of the present invention;

FIG. 9 is a flowchart illustrating another training method for a texture generation model according to an embodiment of the present invention;

fig. 10 is a flowchart illustrating an image processing method according to an embodiment of the present invention;

FIG. 11 is a flowchart illustrating another image processing method according to an embodiment of the present invention;

FIG. 12 is a schematic structural diagram of a training apparatus for a texture generation model according to an embodiment of the present invention;

fig. 13 is a schematic structural diagram of an image processing apparatus according to an embodiment of the present invention;

fig. 14 is a schematic structural diagram of a server according to an embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions of the present invention will be clearly and completely described below with reference to the embodiments, and it is obvious that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

At present, existing texture generation technology suffers from problems such as poor texture quality and shadows baked into the texture. Referring to the flow chart of human body texture generation shown in FIG. 1, the input image is passed through a series of convolutional neural networks to extract image features, each human body vertex is sampled from the image features to obtain the color feature of each vertex, and the color features are then fed into an MLP to obtain the final predicted vertex colors. During training, the loss function of the model measures how close the predicted vertex colors are to the ground-truth colors; the smaller the loss, the closer the predicted vertex colors are to the ground truth, and the internal parameters of the generator are iteratively optimized to obtain the required generator. The optimization objective takes the form $\min \lVert T_v - T_v^{*} \rVert$, where $T_v$ is the predicted vertex color and $T_v^{*}$ is the ground-truth color. However, the texture generated by the above method is not satisfactory and is easily affected by illumination, which causes shadows in the texture. Based on this, the invention provides a training method for a texture generation model, an image processing method, and an image processing apparatus; the trained texture generation model can generate higher-quality textures while significantly reducing the influence of illumination on the texture.

To facilitate understanding of the present embodiment, first, a detailed description is given to a training method of a texture generating model disclosed in the present embodiment, referring to a flow chart of the training method of the texture generating model shown in fig. 2, where the method mainly includes the following steps S202 to S208:

step S202, training images obtained through prerendering are obtained. In an embodiment, the original image may be obtained through various manners such as network downloading and manual uploading, and the original image is rendered from multiple rendering angles by using the random illumination parameter, so as to obtain pre-rendered images of the rendering angles, optionally, each pre-rendered image may be used as a training image, and one or more pre-rendered images may also be selected from each pre-rendered image as a training image. The random illumination parameters may include environmental parameters, scattering parameters, illumination direction, illumination intensity, and the like.

Step S204, the training image is input into the texture generation model, and a first illumination parameter and a first texture set are output for the training image through the texture generation model. The first illumination parameter may include an ambient parameter, a diffuse parameter, an illumination direction, an illumination intensity, and the like, and the first texture set may include the texture corresponding to each patch. To facilitate understanding of patches, a human body model is taken as an example: the human body model is composed of vertices and patches, the vertices are a set of points with three-dimensional positions (x, y, z), and the patches determine the connectivity of the vertices; for example, the patch {1, 2, 3} connects the 1st, 2nd and 3rd vertices. In one embodiment, the texture generation model may pre-load the model parameters of the human body model, and the illumination estimation network and the texture prediction network inside the texture generation model output the first illumination parameter and the first texture set, respectively, according to the model parameters and the training image.

Step S206, a joint loss value is calculated based on the first illumination parameter, the first texture set, and a preset joint loss function. The joint loss function includes a texture loss function, a cycle consistency loss function, and an adversarial loss function. The texture loss function constrains the generated texture; the cycle consistency loss function makes the rendered image as close as possible to the pre-rendered image, which frees the embodiment of the invention from having to explicitly constrain the predicted illumination parameters while still allowing the texture generation model to learn to predict them; and the adversarial loss function makes the rendered image more realistic.

Step S208, the texture generation model is trained with the joint loss value. In one embodiment, the internal parameters of the texture generation model may be iteratively optimized based on the joint loss value, and training may be stopped when a preset condition is reached, where the preset condition may include reaching a maximum number of iterations, convergence of the joint loss value, or the like.

According to the training method of the texture generation model provided by the embodiment of the invention, the first illumination parameter and the first texture set corresponding to the training image are output through the texture generation model, and the texture generation model is trained in combination with the joint loss function, so that the trained texture generation model can represent higher-quality, more detailed textures through the texture set. In addition, the illumination parameter can be predicted explicitly and is constrained by the joint loss function, so that the influence of illumination on the generated texture is significantly reduced and the texture quality is further improved.

An embodiment of the present invention provides an implementation of obtaining a pre-rendered training image: an original image is obtained, the original image is rendered with random illumination parameters from a plurality of rendering angles to obtain pre-rendered images corresponding to the respective rendering angles, and one pre-rendered image is randomly selected from them as the training image. In practical applications, when generating the training images required for training, parallel (directional) light may be obtained using the random illumination parameters, the original image is rendered with this parallel light from a plurality of rendering angles to obtain multi-angle pre-rendered images, and a random pre-rendered image is input into the texture generation model to be trained as the training image.

To facilitate understanding of the above step S204, the embodiment of the present invention provides a texture generation model that includes an illumination estimation network and a texture prediction network. Specifically, referring to the structural diagram of a texture generation model shown in fig. 3, fig. 3 illustrates that the texture prediction network further includes a feature extraction module, a feature completion module, and a texture decoding module. The input of the illumination estimation network is the training image, and its output is the first illumination parameter; the input of the texture prediction network is the training image, and its output is the first texture set. Further, the input of the feature extraction module in the texture prediction network is the training image, and its output is the visible patch features and the first invisible patch features; the input of the feature completion module is the visible patch features and the first invisible patch features, and its output is the visible patch features and the second invisible patch features; the input of the texture decoding module is the visible patch features and the second invisible patch features, and its output is the first texture set.

On the basis of fig. 3, the embodiment of the present invention provides an implementation of outputting the first illumination parameter and the first texture set for the training image through the texture generation model: the first illumination parameter of the training image may be estimated through the illumination estimation network, and the first texture set of the training image may be generated through the texture prediction network. The embodiment of the invention can explicitly estimate the first illumination parameter of the image and train the texture generation model in combination with the joint loss function, so that the trained texture generation model can constrain the illumination parameter, thereby reducing the influence of illumination on the generated texture. In addition, the embodiment of the invention can represent higher-quality, more detailed textures by means of the texture set, thereby effectively improving the quality of the generated texture.

In an alternative embodiment, the illumination estimation network includes downsampling convolutional layers and a multi-layer perceptron (MLP) layer. For understanding, referring to the schematic structure of a downsampling convolutional layer shown in fig. 4a, the embodiment of the present invention estimates the illumination parameter L using a network structure consisting of a set of downsampling convolutional layers and one MLP layer, where the downsampling convolutional layer includes a first convolutional layer (k=3, s=1, p=1), a first BN (batch normalization) layer, a first ReLU (Rectified Linear Unit) layer, a second convolutional layer (k=3, s=1, p=1), a second BN layer, a second ReLU layer, and an average pooling layer. In addition, fig. 4b provides a schematic structural diagram of the illumination estimation network and shows the size of the feature map output by each downsampling convolutional layer; for example, the size of the training image I is (256, 256, 3), the size of the feature map output by the first downsampling convolutional layer is (128, 128, 64), the size of the feature map output by the second downsampling convolutional layer is (64, 64, 128), and so on.

Based on this, when estimating the first illumination parameter of the training image through the illumination estimation network, the illumination features of the training image may be extracted through the downsampling convolutional layers, and the first illumination parameter of the training image may be output from the illumination features through the MLP layer. The first illumination parameter L includes one or more of an ambient parameter α, a diffuse parameter β, and an illumination intensity d. In practical applications, taking the illumination estimation network shown in fig. 4b as an example, the first four downsampling convolutional layers are used to extract the illumination features of the training image, and the final MLP layer maps the illumination features into the illumination parameter space, which, assuming only parallel illumination is considered, optionally includes the direction and intensity of the illumination. Furthermore, when the texture generation model is trained, a random illumination parameter space can be generated, further strengthening the constraint of the texture generation model on illumination.
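The PyTorch sketch below illustrates one possible reading of this structure; the channel widths beyond the two given in fig. 4b, the two-layer MLP, and the output split into ambient, diffuse, direction, and intensity are assumptions made for illustration.

```python
import torch
import torch.nn as nn

class DownsampleConv(nn.Module):
    """Conv-BN-ReLU twice, then 2x2 average pooling (as described for Fig. 4a)."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
            nn.AvgPool2d(2),
        )

    def forward(self, x):
        return self.block(x)

class LightEstimator(nn.Module):
    """Stack of downsampling conv layers followed by an MLP that outputs
    the illumination parameters L (ambient, diffuse, direction, intensity)."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            DownsampleConv(3, 64),     # (256,256,3)  -> (128,128,64)
            DownsampleConv(64, 128),   # -> (64,64,128)
            DownsampleConv(128, 256),  # -> (32,32,256)  (widths beyond fig. 4b are assumed)
            DownsampleConv(256, 512),  # -> (16,16,512)
        )
        self.mlp = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(512, 128), nn.ReLU(inplace=True),
            nn.Linear(128, 6),   # ambient(1) + diffuse(1) + direction(3) + intensity(1): assumed split
        )

    def forward(self, image):
        return self.mlp(self.features(image))
```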

For ease of understanding, an embodiment of the present invention further provides an implementation of generating a first texture set of a training image through a texture prediction network, which may be referred to as the following steps 1 to 3:

step 1, extracting visible patch features and first invisible patch features of a training image through a feature extraction module. Wherein, the characteristics of the visible patch can be understood as the characteristics of the visible patch, and the characteristics of the first invisible patch can be understood as the characteristics of the invisible patch, wherein, if the patch contains one or more invisible vertexes, the patch is determined to be the invisible patch, and the corresponding characteristics of the patch are the first invisible patch characteristics; if the patch does not contain the invisible vertex, the patch is determined to be a visible patch, and the corresponding feature of the patch is determined to be a visible patch feature.

In an embodiment, the feature extraction module includes a feature encoder. The embodiment of the present invention provides an exemplary schematic structural diagram of the feature encoder, as shown in fig. 5, in which the network structure of the feature encoder is composed of four down-sampling blocks (downsampling convolutional layers). On this basis, the embodiment of the present invention provides an implementation of extracting the visible patch features and the first invisible patch features of the training image through the feature extraction module, as described in steps 1.1 to 1.3 below:

and 1.1, performing feature extraction on the training image through a feature encoder in the feature extraction module to obtain a feature map corresponding to the training image. In practical application, a training image is input to a feature encoder, and the feature encoder can encode the training image to obtain a feature map corresponding to the training image.

Step 1.2, the feature map is sampled based on the depth value of each pixel in the training image to obtain visible vertex features and invisible vertex features. Here, a visible vertex feature is the feature of a visible vertex $V_v$, and an invisible vertex feature is the feature of an invisible vertex $V_{inv}$. In an alternative embodiment, the step of sampling the feature map based on the depth values of the pixels in the training image to obtain the visible vertex features and the invisible vertex features may be performed according to the following steps 1.2.1 to 1.2.2:

and step 1.2.1, judging whether the depth value of each pixel in the training image is smaller than a preset depth buffer value. If yes, determining the pixel as a visible vertex; if not, the pixel is determined to be an invisible vertex. In one embodiment, the vertices are projected onto the image plane, and vertices with depth values less than the depth buffer value can be classified as visible vertices V by comparing the depth values of the respective vertices to a preset depth buffer valuevAnd classifying as invisible vertex V a vertex whose depth value is greater than the depth buffer valueinv

Step 1.2.2, the feature map is sampled by bilinear interpolation based on the visible vertices and the invisible vertices to obtain the visible vertex features and the invisible vertex features, respectively. In one embodiment, the feature map may be sampled at the projected positions of the visible vertices $V_v$ using bilinear interpolation to obtain the visible vertex features, and sampled at the projected positions of the invisible vertices $V_{inv}$ using bilinear interpolation to obtain the invisible vertex features. A sketch of this step is given below.
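A minimal PyTorch sketch of the visibility test and bilinear sampling, assuming the vertices have already been projected to normalized image coordinates and a per-vertex z-buffer value is available; the tensor names and the tolerance `eps` are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def sample_vertex_features(feature_map, vert_xy, vert_depth, depth_buffer, eps=1e-3):
    """feature_map: (1, C, H, W); vert_xy: (N, 2) in [-1, 1] image coords;
    vert_depth: (N,) depth of each projected vertex;
    depth_buffer: (N,) z-buffer value at each vertex's pixel."""
    # A vertex is visible if its depth does not exceed the z-buffer value.
    visible = vert_depth <= depth_buffer + eps           # (N,) bool mask

    # Bilinear sampling of the feature map at every projected vertex position.
    grid = vert_xy.view(1, 1, -1, 2)                     # (1, 1, N, 2)
    feats = F.grid_sample(feature_map, grid, mode="bilinear", align_corners=True)
    feats = feats.squeeze(0).squeeze(1).t()              # (N, C) per-vertex features

    return feats, visible
```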

Step 1.3, the visible vertex features and/or invisible vertex features belonging to the same patch are averaged to obtain the visible patch features and the first invisible patch features of the training image. In practical applications, if each patch is expected to predict a texture, the vertex features (including the visible vertex features and invisible vertex features) may be converted into patch features (including the visible patch features and the first invisible patch features); specifically, the vertex features of the vertices belonging to each patch may be averaged to obtain the corresponding patch feature. It should be noted that if any vertex of a patch is an invisible vertex, the patch is classified as an invisible patch and its feature is recorded as the invisible patch feature $F_{inv}$; otherwise, the patch is classified as a visible patch and its feature is recorded as the visible patch feature $F_v$. In addition, since the training image only shows the visible frontal region and all vertices are projected onto the image coordinate system, where invisible vertices coincide with visible ones, the invisible patch feature $F_{inv}$ actually corresponds to visible patch content, and therefore $F_{inv}$ needs to be further processed by the following step 2; a sketch of the averaging step is shown after this paragraph.
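As an illustration, the sketch below converts per-vertex features into per-patch (per-face) features by averaging each face's three vertices and marks a face as invisible when any of its vertices is invisible; tensor names follow the previous sketch and are assumptions.

```python
import torch

def vertex_to_face_features(vert_feats, visible, faces):
    """vert_feats: (N, C) per-vertex features; visible: (N,) bool mask;
    faces: (M, 3) long tensor of vertex indices per triangular patch."""
    face_feats = vert_feats[faces].mean(dim=1)      # (M, C): average of the 3 vertex features
    face_visible = visible[faces].all(dim=1)        # a face is visible only if all its vertices are
    F_v = face_feats[face_visible]                  # visible patch features
    F_inv = face_feats[~face_visible]               # first invisible patch features
    return F_v, F_inv, face_visible
```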

Step 2, the first invisible patch features are completed based on the visible patch features through the feature completion module to obtain the second invisible patch features. To generate a texture map that conforms to the human body model (also referred to as the geometric model), a Graph-UNet may be used to introduce the geometric information of the human body model into the texture generation model for further processing of the invisible features. In one embodiment, the visible patch features $F_v$ and the invisible patch features $F_{inv}$ may be input to the feature completion module: the visible patch features $F_v$ already allow good textures to be generated, but the invisible patch features $F_{inv}$ must be inferred from the visible patch features $F_v$ and the geometric information, so the feature completion module completes the invisible features $F_{inv}$ based on $F_v$ and the geometric information to obtain the second invisible patch features $F'_{inv}$.

To facilitate understanding of the above step 2, referring to a structural diagram of a feature completion module shown in fig. 6, fig. 6 illustrates that the feature completion module includes a graph encoder and a graph decoder, the graph encoder includes a plurality of first graph convolution layers and a plurality of graph down-sampling layers, the first graph convolution layers and the graph down-sampling layers are alternately connected, the graph decoder includes a plurality of second graph convolution layers and a plurality of graph up-sampling layers, and the second graph convolution layers and the graph up-sampling layers are alternately connected. In addition, FIG. 6 also notes the number of channels per convolution layer.

On the basis of the feature completion module shown in fig. 6, an embodiment of the present invention provides an implementation in which the feature completion module completes the first invisible patch features based on the visible patch features to obtain the second invisible patch features: the graph encoder performs convolution and downsampling operations on the first invisible patch features based on the visible patch features and the geometric information of the geometric model corresponding to the training image, and the graph decoder performs convolution and upsampling operations on the features output by the graph encoder based on the visible patch features and the geometric information to obtain the second invisible patch features. The graph downsampling operation is computed as follows:

$$ y = F^l p^l / \lVert p^l \rVert, \qquad idx = \mathrm{rank}(y, k); $$

$$ F^{l+1} = F^l[idx, :] \odot \big( \tanh(y[idx])\, \mathbf{1}_C^{\mathsf T} \big), \qquad A^{l+1} = A^l[idx, idx]; $$

where the input of the l-th graph downsampling layer is denoted as $F^l \in \mathbb{R}^{N \times C}$ and its output as $F^{l+1}$, $A^l$ is the l-th layer adjacency matrix, $A^{l+1}$ is the (l+1)-th layer adjacency matrix, $p^l$ is a trainable projection weight, $\mathrm{rank}(y, k)$ is a ranking function that returns the indices of the first k maxima of y, $F^l[idx, :]$ returns the rows corresponding to the indices $idx$, $\odot$ denotes element-wise multiplication, and $\mathbf{1}_C$ is a vector whose C elements are all 1. The graph upsampling layer is the inverse operation of the graph downsampling layer; its main function is to restore the downsampled graph structure to the previous structure, and it is expressed as:

$$ F^{l+1} = \mathrm{distribute}(0_{N \times C}, F^l, idx); $$

where the distribute function places the row vectors of $F^l$ into $0_{N \times C}$ at the corresponding indices $idx$, and $0_{N \times C} \in \mathbb{R}^{N \times C}$ is a matrix whose elements are all 0. In addition, graph convolution is a common operation in graph-structure processing, defined as:

$$ F^{l+1} = \sigma\!\big( \hat{D}^{-\frac{1}{2}} \hat{A} \hat{D}^{-\frac{1}{2}} F^l W^l \big); $$

where $\hat{A} = A + I$ is the adjacency matrix with added self-connections, $W^l$ is a trainable weight, and $\hat{D}$ is the diagonal matrix of vertex degrees of $\hat{A}$.
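The sketch below shows one way these graph operations could look in PyTorch, following the Graph-U-Net-style formulas above; the tanh gating, the ReLU nonlinearity, and the use of dense adjacency matrices are assumptions for illustration.

```python
import torch
import torch.nn as nn

class GraphConv(nn.Module):
    """F' = sigma(D^-1/2 (A + I) D^-1/2 F W) with a dense adjacency matrix."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.weight = nn.Linear(in_ch, out_ch, bias=False)

    def forward(self, feats, adj):
        a_hat = adj + torch.eye(adj.size(0), device=adj.device)
        d_inv_sqrt = a_hat.sum(dim=1).clamp(min=1e-8).pow(-0.5)
        norm = d_inv_sqrt.unsqueeze(1) * a_hat * d_inv_sqrt.unsqueeze(0)
        return torch.relu(self.weight(norm @ feats))

class GraphPool(nn.Module):
    """Keep the k highest-scoring nodes: y = F p / ||p||, idx = rank(y, k)."""
    def __init__(self, channels, k):
        super().__init__()
        self.p = nn.Parameter(torch.randn(channels))
        self.k = k

    def forward(self, feats, adj):
        y = feats @ (self.p / self.p.norm())
        idx = torch.topk(y, self.k).indices
        pooled = feats[idx] * torch.tanh(y[idx]).unsqueeze(1)   # gated selected rows
        return pooled, adj[idx][:, idx], idx

def graph_unpool(feats, idx, num_nodes):
    """Scatter pooled rows back to their original indices (zeros elsewhere)."""
    out = feats.new_zeros(num_nodes, feats.size(1))
    out[idx] = feats
    return out
```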

Step 3, the visible patch features and the second invisible patch features are decoded respectively through the texture decoding module to obtain the first texture set of the training image. In one embodiment, the texture decoding module may include two sub-decoders, one configured to decode the visible patch features and the other configured to decode the second invisible patch features, so that decoding the invisible patch features does not affect decoding the visible patch features. For ease of understanding, refer to the schematic diagram of a sub-decoder shown in fig. 7a, which includes a plurality of up-sampling blocks (upsampling structures), and to the schematic diagram of an upsampling structure shown in fig. 7b, which includes a first Upsample(2, 2) layer, a third convolutional layer conv(k=3, s=1, p=1), a third BN layer, a third ReLU layer, and a second Upsample(2, 2) layer.

In one embodiment, following the idea of cycle consistency, the embodiment of the present invention provides a joint loss function that includes a texture loss function, a cycle consistency loss function, and an adversarial loss function. Referring to the joint loss function design logic diagram shown in fig. 8, a texture loss value $L_{tex}$ may be calculated based on the texture ground truth and the first texture set; the training image is obtained by rendering with a renderer according to the texture ground truth, the random illumination parameters, and the mesh $(V, F)$, and a differentiably rendered image is obtained by rendering with a differentiable renderer according to the first illumination parameter and the first texture set, so that a cycle consistency loss value $L_r$ is calculated based on the training image and the differentiably rendered image; and an adversarial loss value $L_{adv}$ is calculated based on the training image and the differentiably rendered image. The embodiment of the invention constrains the generated texture and the predicted illumination parameter so that the generated texture is as close as possible to the ground-truth texture and the predicted illumination parameter is as close as possible to the random illumination parameter. In addition, the embodiment of the present invention may further render the generated texture with the predicted illumination parameter using the differentiable renderer at the rendering angle corresponding to the training image, and require that the rendered image (i.e., the differentiably rendered image) be as close as possible to the training image.

On this basis, an embodiment of the present invention provides an implementation manner for calculating a joint loss value based on a first illumination parameter, a first texture set, and a preset joint loss function, which is as follows:

step a, calculating a texture loss value based on the first texture set, the texture true value and the texture loss function. Wherein the texture loss function is as follows:wherein T is a true value of the texture,is a first texture set.

Step b, the original image is rendered with the first illumination parameter, the first texture set, and the rendering angle corresponding to the training image to obtain a differentiably rendered image. In an embodiment, the generated first texture set and the first illumination parameter may be rendered at the rendering angle corresponding to the training image using the differentiable rendering functionality of PyTorch3D, so as to obtain the differentiably rendered image.

Step c, a cycle consistency loss value is calculated based on the training image, the differentiably rendered image, and the cycle consistency loss function. The cycle consistency loss function enables the texture generation model to learn more effectively and converge more quickly; meanwhile, through the cycle consistency loss function, the embodiment of the invention does not need to explicitly constrain the first illumination parameter. In one embodiment, the cycle consistency loss function takes the form $L_{r} = \sum_{a \in A} \lVert I_a - \hat{I}_a \rVert_1$, where $I_a$ denotes the pre-rendered image at angle $a$, $\hat{I}_a$ denotes the differentiably rendered image obtained at angle $a$, and $A$ denotes all rendering angles.

Step d, an adversarial loss value is calculated based on the training image, the differentiably rendered image, and the adversarial loss function. In one embodiment, an image may be obtained by linearly interpolating between the training image and the differentiably rendered image with a uniformly sampled weight, and the adversarial loss value may be calculated based on this interpolated image, the training image, the differentiably rendered image, and the adversarial loss function. To make the texture more realistic, embodiments of the present invention use the WGAN-GP adversarial loss function, whose mathematical form is

$$ L_{adv} = \mathbb{E}\big[ D(\hat{I}) \big] - \mathbb{E}\big[ D(I) \big] + \lambda\, \mathbb{E}\Big[ \big( \lVert \nabla_{\bar{I}} D(\bar{I}) \rVert_2 - 1 \big)^2 \Big], $$

where $D$ is the discriminator, $I$ is the training image, $\hat{I}$ is the differentiably rendered image, $\bar{I}$ is the image obtained by linear interpolation between $I$ and $\hat{I}$ with a uniformly sampled weight, $\nabla_{\bar{I}} D(\bar{I})$ is the gradient of the discriminator with respect to $\bar{I}$, and $\lambda$ is the gradient-penalty weight.
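For illustration, a minimal WGAN-GP critic loss with gradient penalty is sketched below; `discriminator` is an assumed torch module, `real`/`fake` stand for the training image and the differentiably rendered image, and the default penalty weight is an assumption.

```python
import torch

def wgan_gp_loss(discriminator, real, fake, gp_weight=10.0):
    """Critic loss: E[D(fake)] - E[D(real)] + lambda * gradient penalty,
    with the penalty evaluated at uniformly interpolated images."""
    eps = torch.rand(real.size(0), 1, 1, 1, device=real.device)       # one weight per sample
    interp = (eps * real + (1.0 - eps) * fake).requires_grad_(True)   # linear interpolation
    d_interp = discriminator(interp)
    grads = torch.autograd.grad(outputs=d_interp.sum(), inputs=interp,
                                create_graph=True)[0]
    penalty = ((grads.flatten(1).norm(2, dim=1) - 1.0) ** 2).mean()
    return discriminator(fake).mean() - discriminator(real).mean() + gp_weight * penalty
```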

Step e, a weighted summation is performed on the texture loss value, the cycle consistency loss value, and the adversarial loss value to obtain the joint loss value. In one embodiment, the joint loss value is expressed as

$$ L_{total} = \lambda_{tex} L_{tex} + \lambda_{r} L_{r} + \lambda_{adv} L_{adv}, $$

where $\lambda_{tex}$ is the weight of the texture loss value, $\lambda_{r}$ is the weight of the cycle consistency loss value, and $\lambda_{adv}$ is the weight of the adversarial loss value.
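Putting the pieces together, the joint loss might be assembled as in the sketch below; the loss weights, the L1 choices, and the generator-side adversarial term are assumptions mirroring the hedged formulas above, while the discriminator itself would be trained separately with the gradient-penalty loss sketched earlier.

```python
import torch
import torch.nn.functional as F

def joint_loss(pred_textures, gt_textures, rendered, training_image,
               discriminator, w_tex=1.0, w_cycle=1.0, w_adv=0.1):
    """Weighted sum of texture, cycle-consistency and adversarial (generator) terms."""
    l_tex = F.l1_loss(pred_textures, gt_textures)     # texture loss L_tex
    l_cycle = F.l1_loss(rendered, training_image)     # cycle consistency loss L_r
    l_adv = -discriminator(rendered).mean()           # generator side of the WGAN loss
    return w_tex * l_tex + w_cycle * l_cycle + w_adv * l_adv
```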

To facilitate understanding of the training method of the texture generation model provided in the foregoing embodiment, an application example of the training method of the texture generation model is provided in the embodiment of the present invention, referring to a flow diagram of another training method of the texture generation model shown in fig. 9, the method mainly includes the following steps S902 to S912:

step S902, pre-rendering the original image to obtain a training image.

Step S904, a first illumination parameter and a first texture set corresponding to the training image are generated by the texture generation model.

Step S906, a joint loss value is calculated according to the first illumination parameter and the first texture set.

Step S908, optimize the network parameters of the texture generation model using the joint loss values.

In step S910, it is determined whether the maximum number of iterations is reached. If yes, go to step S912; if not, step S902 is executed.

Step S912, save the network parameters of the texture generation model.
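The overall loop of steps S902 to S912 could be organized roughly as follows; the dataset format, the optimizer choice, the save path, and the assumption that `model` returns both the illumination parameters and the texture set are illustrative, `joint_loss` refers to the helper sketched earlier, and discriminator updates are omitted for brevity.

```python
import torch

def train(model, discriminator, data_loader, renderer, max_iters=100_000,
          lr=1e-4, ckpt_path="texture_model.pt"):
    """Iterate: pre-rendered batch -> predict (light, textures) -> joint loss -> update."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    step = 0
    while step < max_iters:                                             # S910: stop at max iterations
        for training_image, gt_textures, mesh, angle in data_loader:    # S902: pre-rendered sample
            light, textures = model(training_image)                     # S904: predict light + textures
            rendered = renderer(mesh, textures, light, angle)           # differentiable re-render
            loss = joint_loss(textures, gt_textures, rendered,          # S906: joint loss value
                              training_image, discriminator)
            opt.zero_grad()
            loss.backward()                                             # S908: optimize parameters
            opt.step()
            step += 1
            if step >= max_iters:
                break
    torch.save(model.state_dict(), ckpt_path)                           # S912: save parameters
```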

On the basis of the training method of the texture generation model provided in the foregoing embodiment, an embodiment of the present invention provides an image processing method, which, referring to a flow diagram of an image processing method shown in fig. 10, mainly includes the following steps S1002 to S1006:

step S1002, a target image to be processed is acquired.

Step S1004, inputting the target image to the texture generation model; the texture generation model is obtained by training by using the training method of the texture generation model provided in the foregoing embodiment. In practical applications, the texture generation model should be preloaded with model parameters of the human body model, so as to determine a second illumination parameter and a second texture set of the target image based on the model parameters.

Step S1006, a second illumination parameter and a second texture set corresponding to the target image are generated through the texture generation model.

In the image processing method provided by the embodiment of the invention, the texture generation model trained with the above training method is used to process the target image and output the second illumination parameter and the second texture set corresponding to the target image, which effectively improves the texture quality and significantly reduces the influence of illumination on the texture.

In order to facilitate understanding of the image processing method provided in the foregoing embodiment, an application example of the image processing method is further provided in the embodiment of the present invention, referring to a schematic flow chart of another image processing method shown in fig. 11, where the method mainly includes the following steps S1102 to S1108:

step S1102, loading model parameters of the human body model.

In step S1104, a target image to be processed is input.

Step S1106, a second illumination parameter and a second texture set corresponding to the target image are generated through the texture generation model.

In step S1108, it is determined whether or not to end. If yes, ending; if not, step S1104 is performed.

In summary, the embodiment of the present invention adopts a texture-set-based representation to generate the texture of each patch, so as to represent higher-quality, detail-rich textures. In addition, the embodiment of the invention can explicitly predict the illumination parameters, construct a loss function through differentiable rendering, construct a cycle consistency loss function, and constrain the predicted illumination parameters, thereby effectively eliminating the influence of illumination on the generated texture.

For the training method of a texture generation model provided in the foregoing embodiment, an embodiment of the present invention provides a training apparatus of a texture generation model, and referring to a schematic structural diagram of a training apparatus of a texture generation model shown in fig. 12, the training apparatus mainly includes the following components:

a training image obtaining module 1202, configured to obtain a training image obtained by prerendering.

A training image input module 1204, configured to input a training image into the texture generation model, and output the first illumination parameter and the first texture set for the training image through the texture generation model.

A loss calculating module 1206, configured to calculate a joint loss value based on the first illumination parameter, the first texture set, and a preset joint loss function.

A training module 1208, configured to train the texture generation model with the joint loss value.

With the training apparatus of the texture generation model provided by the embodiment of the invention, the first illumination parameter and the first texture set corresponding to the training image are output by the texture generation model, and the texture generation model is trained with the joint loss function, so that the trained texture generation model can represent higher-quality, more detailed textures through the texture set. In addition, the illumination parameters are explicitly predicted and are constrained through the joint loss function, which significantly reduces the influence of illumination on the generated texture and further improves the texture quality.

In one embodiment, the texture generation model includes an illumination estimation network and a texture prediction network; the training image input module 1204 is further configured to: a first illumination parameter of a training image is estimated by an illumination estimation network, and a first texture set of the training image is generated by a texture prediction network.

In one embodiment, the illumination estimation network includes a downsampling convolutional layer and a multi-layer perceptron layer; the training image input module 1204 is further configured to: extracting illumination features of the training image through the downsampling convolutional layer; outputting the first illumination parameter of the training image according to the illumination features through the multi-layer perceptron layer; wherein the first illumination parameter comprises one or more of an environmental parameter, a scattering parameter, and an illumination intensity.
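
For illustration only, such an illumination estimation network may be sketched in PyTorch as follows; the layer sizes and the nine-dimensional output (standing in for ambient, scattering and intensity terms) are assumptions made for the sketch, not values taken from the embodiment:

    import torch
    import torch.nn as nn

    class IlluminationEstimator(nn.Module):
        """Downsampling convolutions followed by a multi-layer perceptron that
        regresses illumination parameters (all sizes are illustrative)."""
        def __init__(self, out_dim=9):  # e.g. ambient + scattering + intensity terms
            super().__init__()
            self.backbone = nn.Sequential(   # downsampling convolutional layers
                nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1),
            )
            self.mlp = nn.Sequential(        # multi-layer perceptron head
                nn.Flatten(),
                nn.Linear(128, 64), nn.ReLU(),
                nn.Linear(64, out_dim),
            )

        def forward(self, image):
            features = self.backbone(image)  # illumination features of the input image
            return self.mlp(features)        # predicted illumination parameters

    estimator = IlluminationEstimator()
    params = estimator(torch.rand(2, 3, 256, 256))  # -> tensor of shape (2, 9)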

In one embodiment, the texture prediction network comprises a feature extraction module, a feature completion module and a texture decoding module; the training image input module 1204 is further configured to: extracting visible patch features and first invisible patch features of the training image through a feature extraction module; completing the first invisible patch characteristic based on the visible patch characteristic through a characteristic completing module to obtain a second invisible patch characteristic; and respectively decoding the visible patch features and the second invisible patch features through a texture decoding module to obtain a first texture set of the training image.

In one embodiment, the training image input module 1204 is further configured to: extracting the features of the training image through a feature encoder in the feature extraction module to obtain a feature map corresponding to the training image; sampling the feature map based on the depth value of each pixel in the training image to obtain visible vertex features and invisible vertex features; averaging the visible vertex features and/or invisible vertex features belonging to the same patch to obtain the visible patch features and the first invisible patch features of the training image; if a patch contains one or more invisible vertices, the feature corresponding to that patch is determined as a first invisible patch feature; and if a patch contains no invisible vertex, the feature corresponding to that patch is determined as a visible patch feature.
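
The rule that a patch is treated as invisible as soon as it contains an invisible vertex can be sketched as follows; the per-vertex features, the triangle faces and the visibility mask are assumed to be given, with illustrative sizes:

    import torch

    vertex_feat = torch.rand(500, 64)            # per-vertex features sampled from the feature map
    faces = torch.randint(0, 500, (900, 3))      # (num_patches, 3 vertex indices)
    vertex_visible = torch.rand(500) > 0.3       # True where the vertex is visible

    # Average the features of the three vertices belonging to each patch.
    patch_feat = vertex_feat[faces].mean(dim=1)  # (num_patches, feature_dim)

    # A patch is visible only if it contains no invisible vertex; otherwise its
    # feature is treated as a first invisible patch feature.
    patch_visible = vertex_visible[faces].all(dim=1)
    visible_patch_feat = patch_feat[patch_visible]
    first_invisible_patch_feat = patch_feat[~patch_visible]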

In one embodiment, the training image input module 1204 is further configured to: judging whether the depth value of each pixel in the training image is smaller than a preset depth buffer value or not; if yes, determining the pixel as a visible vertex; if not, determining the pixel as an invisible vertex; and sampling on the feature map based on the visible vertex and the invisible vertex by utilizing bilinear interpolation to respectively obtain the visible vertex feature and the invisible vertex feature.
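
A minimal sketch of this visibility test and of the bilinear sampling, assuming the projected vertex positions, vertex depths and depth buffer values are already available, could look as follows:

    import torch
    import torch.nn.functional as F

    feature_map = torch.rand(1, 64, 64, 64)      # (1, C, H, W) feature map from the feature encoder
    vertex_xy = torch.rand(1, 500, 2) * 2 - 1    # projected vertex positions, normalized to [-1, 1]
    vertex_depth = torch.rand(500)               # depth value at each projected vertex
    depth_buffer = torch.rand(500)               # depth buffer value at the same positions

    # A vertex is visible when its depth value is smaller than the depth buffer value.
    visible = vertex_depth < depth_buffer

    # Bilinear interpolation on the feature map at the projected positions.
    grid = vertex_xy.view(1, 500, 1, 2)
    sampled = F.grid_sample(feature_map, grid, mode="bilinear", align_corners=True)
    vertex_feat = sampled.squeeze(-1).squeeze(0).t()   # (500, 64)

    visible_vertex_feat = vertex_feat[visible]
    invisible_vertex_feat = vertex_feat[~visible]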

In one embodiment, the feature completion module includes a graph encoder and a graph decoder; the training image input module 1204 is further configured to: performing a convolution operation and a down-sampling operation on the first invisible patch features through the graph encoder, based on the visible patch features and the geometric information of the geometric model corresponding to the training image; wherein the graph encoder includes a plurality of first graph convolution layers and a plurality of graph downsampling layers; performing a convolution operation and an up-sampling operation on the features output by the graph encoder through the graph decoder, based on the visible patch features and the geometric information, to obtain the second invisible patch features; wherein the graph decoder includes a plurality of second graph convolution layers and a plurality of graph upsampling layers.
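
A minimal sketch of the graph convolution used in such an encoder/decoder is given below; it uses a dense, row-normalized patch adjacency matrix and omits the graph downsampling and upsampling layers, which a real implementation would interleave between the convolutions:

    import torch
    import torch.nn as nn

    class GraphConv(nn.Module):
        """Minimal graph convolution over a dense, row-normalized adjacency matrix."""
        def __init__(self, in_dim, out_dim):
            super().__init__()
            self.linear = nn.Linear(in_dim, out_dim)

        def forward(self, x, adj):
            # Aggregate neighbouring patch features according to the mesh geometry, then transform.
            return torch.relu(self.linear(adj @ x))

    num_patches, dim = 900, 64
    adj = torch.rand(num_patches, num_patches)
    adj = adj / adj.sum(dim=1, keepdim=True)     # illustrative patch adjacency (from the geometric model)

    patch_feat = torch.rand(num_patches, dim)    # visible + first invisible patch features

    encoder = nn.ModuleList([GraphConv(dim, dim), GraphConv(dim, dim)])  # graph convolutions (+ downsampling in practice)
    decoder = nn.ModuleList([GraphConv(dim, dim), GraphConv(dim, dim)])  # graph convolutions (+ upsampling in practice)

    h = patch_feat
    for layer in encoder:
        h = layer(h, adj)
    for layer in decoder:
        h = layer(h, adj)
    second_invisible_patch_feat = h              # completed features for all patches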

In one embodiment, the training image acquisition module 1202 is further configured to: acquiring an original image; rendering the original image with random illumination parameters and a plurality of rendering angles to obtain pre-rendered images respectively corresponding to the rendering angles; and randomly selecting one pre-rendered image from the pre-rendered images as the training image.
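
This prerendering step can be illustrated with the sketch below; render_stub is a hypothetical placeholder for the actual renderer of the embodiment and simply returns its input so that the snippet remains runnable:

    import random
    import torch

    def render_stub(image, illumination, angle):
        """Placeholder for the real renderer, which would render the geometry
        under the given illumination parameters and viewing angle."""
        return image

    original = torch.rand(3, 256, 256)                 # original image
    angles = [0, 45, 90, 135, 180, 225, 270, 315]      # illustrative rendering angles (degrees)

    prerendered = []
    for angle in angles:
        illumination = torch.rand(9)                   # random illumination parameters (dimension assumed)
        prerendered.append(render_stub(original, illumination, angle))

    training_image = random.choice(prerendered)        # randomly pick one prerendered image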

In one embodiment, the joint loss function includes a texture loss function, a cycle-consistent loss function, and an adversarial loss function; the loss calculation module 1206 is further configured to: calculating a texture loss value based on the first texture set, the texture ground truth and the texture loss function; rendering the original image with the first illumination parameter, the first texture set and the rendering angle corresponding to the training image to obtain a differentiably rendered image; calculating a cycle-consistent loss value based on the training image, the differentiably rendered image and the cycle-consistent loss function; calculating an adversarial loss value based on the training image, the differentiably rendered image and the adversarial loss function; and performing a weighted summation of the texture loss value, the cycle-consistent loss value and the adversarial loss value to obtain the joint loss value.
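
A minimal sketch of such a joint loss is given below; the use of L1 terms for the texture and cycle-consistent losses, the non-saturating BCE form of the adversarial term, and the weights are assumptions made only for illustration:

    import torch
    import torch.nn.functional as F

    def joint_loss(pred_textures, gt_textures,
                   training_image, rendered_image,
                   disc_fake,
                   w_tex=1.0, w_cyc=1.0, w_adv=0.1):   # weights are illustrative
        # Texture loss: predicted texture set vs. texture ground truth (L1 assumed).
        texture_loss = F.l1_loss(pred_textures, gt_textures)
        # Cycle-consistent loss: training image vs. image re-rendered from the
        # predicted illumination parameters and texture set (L1 assumed).
        cycle_loss = F.l1_loss(rendered_image, training_image)
        # Adversarial loss from the discriminator logits on the rendered image.
        adv_loss = F.binary_cross_entropy_with_logits(disc_fake, torch.ones_like(disc_fake))
        # Weighted summation gives the joint loss value.
        return w_tex * texture_loss + w_cyc * cycle_loss + w_adv * adv_loss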

In one embodiment, the loss calculation module 1206 is further configured to: performing linear uniform sampling on the training image and the differentiably rendered image, and calculating the adversarial loss value based on the image obtained by the linear uniform sampling, the training image, the differentiably rendered image, and the adversarial loss function.
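
One common realization of such linear uniform sampling, assumed here only for illustration, is to blend each training image with the corresponding rendered image using a weight drawn uniformly from [0, 1], as in gradient-penalty-style adversarial training; the adversarial loss is then also evaluated on the blended images:

    import torch

    training_image = torch.rand(4, 3, 256, 256)    # real (training) images
    rendered_image = torch.rand(4, 3, 256, 256)    # differentiably rendered images

    # One interpolation weight per image, sampled uniformly from [0, 1].
    eps = torch.rand(4, 1, 1, 1)
    interpolated = eps * training_image + (1.0 - eps) * rendered_image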

As for the image processing method provided by the foregoing embodiment, an embodiment of the present invention further provides an image processing apparatus, referring to a schematic structural diagram of an image processing apparatus shown in fig. 13, the apparatus mainly includes the following components:

a target image obtaining module 1302, configured to obtain a target image to be processed.

A target image input module 1304, configured to input the target image to the texture generation model; the texture generation model is obtained by training with the training method of the texture generation model provided in the foregoing embodiments.

and a generating module 1306, configured to generate, through the texture generation model, a second illumination parameter and a second texture set corresponding to the target image.

With the image processing apparatus provided by the embodiment of the invention, the texture generation model trained with the above training method is used to process the target image and to output the second illumination parameter and the second texture set corresponding to the target image, so that the texture quality is effectively improved and the influence of illumination on the texture is significantly reduced.

The implementation principle and technical effects of the apparatus provided by the embodiment of the present invention are the same as those of the foregoing method embodiments; for the sake of brevity, where the apparatus embodiments are not mentioned, reference may be made to the corresponding contents in the foregoing method embodiments.

The embodiment of the invention provides a server, which particularly comprises a processor and a storage device; the storage means has stored thereon a computer program which, when executed by the processor, performs the method of any of the above described embodiments.

Fig. 14 is a schematic structural diagram of a server according to an embodiment of the present invention, where the server 100 includes: processor 140, memory 141, bus 142 and communication interface 143, said processor 140, communication interface 143 and memory 141 being connected by bus 142; processor 140 is operative to execute executable modules, such as computer programs, stored in memory 141.

The memory 141 may include a high-speed Random Access Memory (RAM) and may further include a non-volatile memory, such as at least one disk memory. The communication connection between the network element of the system and at least one other network element is realized through at least one communication interface 143 (which may be wired or wireless), and the Internet, a wide area network, a local area network, a metropolitan area network, and the like can be used.

The bus 142 may be an ISA bus, PCI bus, EISA bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one double-headed arrow is shown in FIG. 14, but that does not indicate only one bus or one type of bus.

The memory 141 is used for storing a program, the processor 140 executes the program after receiving an execution instruction, and the method executed by the apparatus defined by the flow process disclosed in any of the foregoing embodiments of the present invention may be applied to the processor 140, or implemented by the processor 140.

The processor 140 may be an integrated circuit chip having signal processing capabilities. In implementation, the steps of the above method may be performed by integrated logic circuits of hardware or instructions in the form of software in the processor 140. The Processor 140 may be a general-purpose Processor, and includes a Central Processing Unit (CPU), a Network Processor (NP), and the like; the device can also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA), or other Programmable logic devices, discrete Gate or transistor logic devices, discrete hardware components. The various methods, steps and logic blocks disclosed in the embodiments of the present invention may be implemented or performed. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like. The steps of the method disclosed in connection with the embodiments of the present invention may be directly implemented by a hardware decoding processor, or implemented by a combination of hardware and software modules in the decoding processor. The software module may be located in ram, flash memory, rom, prom, or eprom, registers, etc. storage media as is well known in the art. The storage medium is located in the memory 141, and the processor 140 reads the information in the memory 141 and completes the steps of the above method in combination with the hardware thereof.

An embodiment of the present invention further provides a computer-readable storage medium storing program code; the instructions included in the program code may be used to execute the method described in the foregoing method embodiments, and specific implementations may refer to the foregoing method embodiments, which are not repeated here.

The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.

Finally, it should be noted that the above-mentioned embodiments are only specific embodiments of the present invention, used to illustrate the technical solutions of the present invention rather than to limit them, and the protection scope of the present invention is not limited thereto. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that any person skilled in the art may still modify the technical solutions described in the foregoing embodiments, or easily conceive of changes to them, or make equivalent substitutions for some of their technical features within the technical scope of the present disclosure; such modifications, changes or substitutions do not depart from the spirit and scope of the embodiments of the present invention and shall all be covered within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.
