Method for constructing target three-dimensional key point extraction model and recognizing posture in two-dimensional graph

Document No.: 1673722  Publication date: 2019-12-31

Note: This invention, "Method for constructing target three-dimensional key point extraction model and recognizing posture in two-dimensional graph" (二维图形中目标三维关键点提取模型构建及姿态识别方法), was designed and created by 彭进业, 张少博, 赵万青, 祝轩, 李斌, 张薇, 乐明楠, 李展, 罗迒哉 and 王珺 on 2019-08-12. Its main content is as follows: The invention discloses a method for constructing a target three-dimensional key point extraction model and recognizing posture in a two-dimensional graph. By designing the network structure of the three-dimensional key point extraction model, the method can accurately and directly output the coordinates of the target's three-dimensional key points; by means of the designed key point loss function, the network autonomously learns, in an unsupervised manner, to extract key points with semantic consistency and geometric consistency, improving the accuracy of three-dimensional key point extraction.

1. A method for constructing a target three-dimensional key point extraction model in a two-dimensional graph is characterized by comprising the following steps:

step 1, acquiring a plurality of two-dimensional image groups containing targets to be recognized, wherein the two-dimensional images within each group differ in image acquisition angle;

obtaining a training image set;

step 2, inputting the training image set into a neural network for training;

the neural network comprises a feature extraction sub-network, and the feature extraction sub-network is respectively connected with a key point extraction sub-network and a target detection sub-network;

the feature extraction sub-network comprises a feature map extraction module and an interested region extraction module which are sequentially arranged;

the target detection sub-network comprises a target classification module and a bounding box detection module which are connected in parallel;

the key point extraction sub-network comprises a key point probability obtaining module and a key point output module which are connected in series;

the key point probability obtaining module is used for obtaining the probability that each pixel point is a three-dimensional key point;

the key point output module obtains the coordinates of each three-dimensional key point by using a formula I:

[x_i, y_i] = Σ_(u,v) P_i(u, v) · [u, v]   (Formula I)

wherein [x_i, y_i] are the coordinates of the i-th three-dimensional key point, i = 1, 2, …, I, I being a positive integer; P_i(u, v) is the probability, output by the key point probability obtaining module, that the pixel at (u, v) of the two-dimensional image is the i-th three-dimensional key point; (u, v) are two-dimensional image coordinates, and u and v are both positive integers;

and obtaining a three-dimensional key point extraction model.

2. The method for constructing the target three-dimensional key point extraction model in a two-dimensional graph according to claim 1, wherein the feature map extraction module comprises a feature pyramid network and a residual network which are sequentially arranged; the region-of-interest extraction module comprises a region proposal network.

3. The method for constructing the target three-dimensional key point extraction model in a two-dimensional graph according to claim 1, wherein the key point probability obtaining module comprises a plurality of convolution blocks, an upsampling layer and a softmax layer which are sequentially connected in series;

the convolution block comprises a convolution layer and a ReLU activation layer which are connected in sequence.

4. The method for constructing a three-dimensional key point extraction model of an object in a two-dimensional graph according to claim 1, wherein the loss function L of the three-dimensional key point extraction model is:

L = β · Σ_(negative samples) L_class + γ · Σ_(positive samples) (L_class + L_box + L_keypoints)

wherein β and γ are both greater than 0;

the negative samples are two-dimensional images of regions of interest, extracted by the region-of-interest extraction module, that do not contain a target; the positive samples are two-dimensional images of regions of interest, extracted by the region-of-interest extraction module, that contain a target;

and wherein the key point detection loss function is

L_keypoints = τ · L_dis + ε · L_dep + μ · L_con + ν · L_sep + ω · L_pose

wherein τ, ε, μ, ν and ω are all greater than 0.

5. The method for constructing a model for extracting three-dimensional key points of an object from a two-dimensional figure according to claim 4, wherein β, γ, τ, ε, μ, ν and ω are all 1, and δ is 0.08.

6. A method for extracting a target three-dimensional key point in a two-dimensional graph is characterized by comprising the following steps:

step A, collecting a two-dimensional image containing a target to be identified to obtain an image to be identified;

and step B, inputting the image to be recognized into the three-dimensional key point extraction model constructed by the method for constructing the three-dimensional key point extraction model of the target in the two-dimensional graph according to any one of claims 1 to 5, and obtaining a three-dimensional key point set of the target to be recognized, wherein the three-dimensional key point set comprises Q three-dimensional key points, and Q is a positive integer.

7. A method for recognizing a three-dimensional posture of an object in a two-dimensional image is used for obtaining a three-dimensional posture matrix of the object in the two-dimensional image, and is characterized by comprising the following steps:

step I, acquiring a two-dimensional image containing a target to be identified, to obtain an image to be identified;

step II, obtaining a three-dimensional key point set of the target to be recognized in the image to be recognized by adopting the method for extracting the three-dimensional key points of the target in the two-dimensional graph as claimed in claim 6;

step III, calculating the distance between the three-dimensional key point set of the target to be recognized in the image to be recognized and the three-dimensional key point set of each image in the reference image library;

the reference image library comprises a plurality of reference images and information of each reference image, wherein the information of each reference image comprises a three-dimensional key point set of each reference image and a three-dimensional attitude matrix of a target in each reference image, which are obtained by adopting the method for extracting the target three-dimensional key point in the two-dimensional graph of claim 6 for each reference image;

taking the image corresponding to the three-dimensional key point set with the minimum distance as a comparison image, and obtaining the three-dimensional key point set of the comparison image and a three-dimensional attitude matrix of a target in the comparison image;

step IV, subtracting the coordinates of the mass center of the three-dimensional key point set of the target to be recognized from the coordinates of each three-dimensional key point in the three-dimensional key point set of the target to be recognized to obtain a new three-dimensional key point set of the target to be recognized;

subtracting the coordinate of the mass center of the three-dimensional key point set of the comparison image from the coordinate of each three-dimensional key point in the three-dimensional key point set of the comparison image to obtain a new three-dimensional key point set of the comparison image;

step V, decomposing, by a singular value decomposition method,

S = Σ_(n=1..N_P) X'_n · (P'_n)^T

to obtain a rotation matrix R;

wherein X'_n are the coordinates of the n-th point in the new three-dimensional key point set of the target to be recognized, P'_n are the coordinates of the n-th point in the new three-dimensional key point set of the comparison image, and N_P is the total number of three-dimensional key points in the new three-dimensional key point set of the target to be recognized (or, equivalently, of the comparison image);

step VI, obtaining an attitude matrix T = [R | t], where t = μ_X − R · μ_P, μ_X is the average coordinate of the new three-dimensional key point set of the target to be recognized, and μ_P is the average coordinate of the new three-dimensional key point set of the comparison image;

step VII, obtaining a three-dimensional attitude matrix T_input of the target to be recognized in the image to be recognized by formula III:

T_input = T · T_ref   (Formula III)

wherein T_ref is the three-dimensional attitude matrix of the target in the comparison image.

Technical Field

The invention relates to a target three-dimensional gesture recognition method, in particular to a target three-dimensional key point extraction model construction and gesture recognition method in a two-dimensional graph.

Background

Target three-dimensional gesture recognition refers to recognizing the three-dimensional position and orientation of a target object, and is a key module in many computer vision applications such as augmented reality, robot control and unmanned driving. However, three-dimensional gesture recognition first requires extracting three-dimensional key points of the target object: finding the two-dimensional position of the object in the image and extracting key points such as the projections of the object's 3D bounding box onto the image. Existing methods achieve this by using a large amount of supervision information, but the workload of annotating three-dimensional information on images is enormous and requires a high level of professional knowledge and complicated preparation work; moreover, these methods cannot handle images with occlusion or complicated backgrounds.

In addition, even after the three-dimensional key points of the target are obtained, the three-dimensional posture of the target still cannot be accurately identified. Consequently, prior-art methods for acquiring the three-dimensional posture of a target object in a two-dimensional image suffer from low posture-acquisition accuracy, heavy workload, poor real-time performance and low robustness.

Disclosure of Invention

The invention aims to provide a method for constructing a target three-dimensional key point extraction model and recognizing a gesture in a two-dimensional image, so as to solve problems of the prior art such as the low accuracy of three-dimensional key point recognition of a target object in a two-dimensional image and the low accuracy of gesture recognition.

In order to realize the task, the invention adopts the following technical scheme:

a method for constructing a target three-dimensional key point extraction model in a two-dimensional graph is implemented according to the following steps:

step 1, acquiring a plurality of two-dimensional image groups containing targets to be recognized, wherein two-dimensional images in the two-dimensional image groups are different in image acquisition angle;

obtaining a training image set;

step 2, inputting the training image set into a neural network for training;

the neural network comprises a feature extraction sub-network, and the feature extraction sub-network is respectively connected with a key point extraction sub-network and a target detection sub-network;

the feature extraction sub-network comprises a feature map extraction module and a region-of-interest extraction module which are sequentially arranged;

the target detection sub-network comprises a target classification module and a bounding box detection module which are connected in parallel;

the key point extraction sub-network comprises a key point probability obtaining module and a key point output module which are connected in series;

the key point probability obtaining module is used for obtaining the probability that each pixel point is a three-dimensional key point;

the key point output module obtains the coordinates of each three-dimensional key point by using a formula I:

[x_i, y_i] = Σ_(u,v) P_i(u, v) · [u, v]   (Formula I)

wherein [x_i, y_i] are the coordinates of the i-th three-dimensional key point, i = 1, 2, …, I, I being a positive integer; P_i(u, v) is the probability, output by the key point probability obtaining module, that the pixel at (u, v) of the two-dimensional image is the i-th three-dimensional key point; (u, v) are two-dimensional image coordinates, and u and v are both positive integers;

and obtaining a three-dimensional key point extraction model.
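Formula I above amounts to a soft-argmax: each keypoint coordinate is the expectation of the pixel coordinate under that keypoint's probability map. A minimal numpy sketch (the function name and array shapes are illustrative assumptions, not taken from the patent):

```python
import numpy as np

def soft_argmax(prob_maps):
    """Formula I: expected pixel coordinates under each keypoint's
    probability map.  prob_maps has shape (I, H, W) and each (H, W)
    slice sums to 1 (e.g. the output of a spatial softmax)."""
    num_kp, height, width = prob_maps.shape
    # coordinate grids: u indexes columns (x), v indexes rows (y)
    v, u = np.mgrid[0:height, 0:width]
    xs = (prob_maps * u).sum(axis=(1, 2))   # x_i = sum_{u,v} u * P_i(u, v)
    ys = (prob_maps * v).sum(axis=(1, 2))   # y_i = sum_{u,v} v * P_i(u, v)
    return np.stack([xs, ys], axis=1)       # shape (I, 2)

# a probability map with all mass at pixel (u=3, v=2) recovers (3, 2)
p = np.zeros((1, 5, 5))
p[0, 2, 3] = 1.0
print(soft_argmax(p))  # [[3. 2.]]
```

Because the expectation is differentiable in the probabilities, this formulation lets the network learn keypoint locations end-to-end, unlike a hard argmax over the probability map.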

Furthermore, the feature map extraction module comprises a feature pyramid network and a residual network which are sequentially arranged; the region-of-interest extraction module comprises a region proposal network.

Further, the key point probability obtaining module comprises a plurality of convolution blocks, an up-sampling layer and a softmax layer which are sequentially connected in series;

the convolution block comprises a convolution layer and a ReLU activation layer which are connected in sequence.
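The softmax layer at the end of this module can be read as a spatial softmax: for each keypoint channel it normalizes the raw score map into a probability distribution P_i(u, v) over pixels. A minimal numpy sketch of that normalization (array shapes are assumptions for illustration):

```python
import numpy as np

def spatial_softmax(scores):
    """Turn raw score maps of shape (I, H, W) into per-keypoint
    probability maps P_i(u, v): each channel sums to 1 over all pixels."""
    num_kp, height, width = scores.shape
    flat = scores.reshape(num_kp, -1)
    flat = flat - flat.max(axis=1, keepdims=True)   # numerical stability
    exp = np.exp(flat)
    probs = exp / exp.sum(axis=1, keepdims=True)
    return probs.reshape(num_kp, height, width)

maps = spatial_softmax(np.random.randn(4, 8, 8))
print(maps.sum(axis=(1, 2)))  # each of the 4 channels sums to 1
```

The output of this normalization is exactly the P_i(u, v) consumed by Formula I's expectation.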

Further, the loss function L of the three-dimensional key point extraction model is as follows:

L = β · Σ_(negative samples) L_class + γ · Σ_(positive samples) (L_class + L_box + L_keypoints)

wherein the first term sums the classification loss functions of all negative samples, and the second term sums, over all positive samples, the target classification loss function L_class, the bounding box detection loss function L_box and the key point detection loss function L_keypoints; β and γ are both greater than 0;

the negative samples are two-dimensional images of regions of interest, extracted by the region-of-interest extraction module, that do not contain a target; the positive samples are two-dimensional images of regions of interest, extracted by the region-of-interest extraction module, that contain a target;

wherein the key point detection loss function is

L_keypoints = τ · L_dis + ε · L_dep + μ · L_con + ν · L_sep + ω · L_pose

wherein L_dis is the saliency loss function, L_dep is the depth prediction loss function, L_con is the three-dimensional consistency loss function, L_sep is the separation loss function, and L_pose is the relative pose estimation loss function; τ, ε, μ, ν and ω are all greater than 0.

Further, β, γ, τ, ε, μ, ν and ω are all 1, and δ is 0.08.
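Both losses above are plain weighted sums, so their combination can be sketched directly. In the sketch below the numeric loss values are placeholders and the weight defaults follow the values just stated; this is an illustration of the bookkeeping, not the patent's actual loss computation:

```python
def keypoint_loss(l_dis, l_dep, l_con, l_sep, l_pose,
                  tau=1.0, eps=1.0, mu=1.0, nu=1.0, omega=1.0):
    """L_keypoints as a weighted sum of the five component losses."""
    return tau * l_dis + eps * l_dep + mu * l_con + nu * l_sep + omega * l_pose

def total_loss(neg_class_losses, pos_losses, beta=1.0, gamma=1.0):
    """L = beta * sum of L_class over negative samples
           + gamma * sum of (L_class + L_box + L_keypoints) over positives.
    pos_losses is a list of (l_class, l_box, l_keypoints) tuples."""
    neg_term = beta * sum(neg_class_losses)
    pos_term = gamma * sum(lc + lb + lk for lc, lb, lk in pos_losses)
    return neg_term + pos_term

lk = keypoint_loss(0.5, 0.2, 0.1, 0.1, 0.1)
print(total_loss([0.3, 0.2], [(0.4, 0.3, lk)]))  # approximately 2.2
```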

A method for extracting a target three-dimensional key point in a two-dimensional graph is implemented according to the following steps:

step A, collecting a two-dimensional image containing a target to be identified to obtain an image to be identified;

and B, inputting the image to be recognized into a three-dimensional key point extraction model constructed by the construction method of the target three-dimensional key point extraction model in the two-dimensional graph to obtain a three-dimensional key point set of the target to be recognized, wherein the three-dimensional key point set comprises Q three-dimensional key points, and Q is a positive integer.

A method for recognizing a three-dimensional posture of a target in a two-dimensional image is used for obtaining a three-dimensional posture matrix of the target in the two-dimensional image and is executed according to the following steps:

step I, acquiring a two-dimensional image containing a target to be identified, to obtain an image to be identified;

step II, obtaining a three-dimensional key point set of the target to be recognized in the image to be recognized by adopting the method for extracting the three-dimensional key points of the target in the two-dimensional graph as claimed in claim 6;

step III, calculating the distance between the three-dimensional key point set of the target to be recognized in the image to be recognized and the three-dimensional key point set of each image in the reference image library;

the reference image library comprises a plurality of reference images and information of each reference image, wherein the information of each reference image comprises a three-dimensional key point set of each reference image and a three-dimensional attitude matrix of a target in each reference image, which are obtained by adopting the method for extracting the target three-dimensional key point in the two-dimensional graph of claim 6 for each reference image;

taking the image corresponding to the three-dimensional key point set with the minimum distance as a comparison image, and obtaining the three-dimensional key point set of the comparison image and a three-dimensional attitude matrix of a target in the comparison image;
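Step III reduces to a nearest-neighbour search over keypoint sets. A minimal sketch, under the assumption that corresponding keypoints share the same index across images and that the distance is the Euclidean norm of the stacked differences (the patent does not fix the metric here):

```python
import numpy as np

def nearest_reference(query_kps, reference_kps_list):
    """Return the index of the reference image whose keypoint set is
    closest to the query set.  Each set is an (N, 3) array of 3D
    keypoints, assumed ordered consistently across images."""
    dists = [np.linalg.norm(query_kps - ref) for ref in reference_kps_list]
    return int(np.argmin(dists))

query = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
refs = [query + 5.0, query + 0.1, query - 3.0]
print(nearest_reference(query, refs))  # 1 (the closest reference)
```

The selected index identifies the comparison image whose keypoint set and attitude matrix are then used in steps IV-VII.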

step IV, subtracting the coordinates of the mass center of the three-dimensional key point set of the target to be recognized from the coordinates of each three-dimensional key point in the three-dimensional key point set of the target to be recognized to obtain a new three-dimensional key point set of the target to be recognized;

subtracting the coordinate of the mass center of the three-dimensional key point set of the comparison image from the coordinate of each three-dimensional key point in the three-dimensional key point set of the comparison image to obtain a new three-dimensional key point set of the comparison image;

step V, decomposing, by a singular value decomposition method,

S = Σ_(n=1..N_P) X'_n · (P'_n)^T

to obtain a rotation matrix R;

wherein X'_n are the coordinates of the n-th point in the new three-dimensional key point set of the target to be recognized, P'_n are the coordinates of the n-th point in the new three-dimensional key point set of the comparison image, and N_P is the total number of three-dimensional key points in the new three-dimensional key point set of the target to be recognized (or, equivalently, of the comparison image);

step VI, obtaining an attitude matrix T = [R | t], where t = μ_X − R · μ_P, μ_X is the average coordinate of the new three-dimensional key point set of the target to be recognized, and μ_P is the average coordinate of the new three-dimensional key point set of the comparison image;

step VII, obtaining a three-dimensional attitude matrix T_input of the target to be recognized in the image to be recognized by formula III:

T_input = T · T_ref   (Formula III)

wherein T_ref is the three-dimensional attitude matrix of the target in the comparison image.
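Steps IV-VII are the classical Kabsch/Umeyama rigid alignment followed by composition with the reference pose. A numpy sketch under the assumption of ordered, corresponding keypoint sets (a minimal illustration, not the patent's exact implementation):

```python
import numpy as np

def estimate_pose(target_kps, ref_kps):
    """Steps IV-VI: align ref_kps (P) to target_kps (X), both (N, 3).
    Returns the 3x4 attitude matrix T = [R | t]."""
    mu_x = target_kps.mean(axis=0)          # centroid of X
    mu_p = ref_kps.mean(axis=0)             # centroid of P
    X = target_kps - mu_x                   # step IV: centre both sets
    P = ref_kps - mu_p
    S = X.T @ P                             # step V: S = sum_n X'_n (P'_n)^T
    U, _, Vt = np.linalg.svd(S)
    # force det(R) = 1 so R is a proper rotation, not a reflection
    D = np.diag([1.0, 1.0, np.linalg.det(U @ Vt)])
    R = U @ D @ Vt
    t = mu_x - R @ mu_p                     # step VI: t = mu_X - R * mu_P
    return np.hstack([R, t[:, None]])       # T = [R | t], shape (3, 4)

def compose(T, T_ref):
    """Step VII: T_input = T . T_ref, via 4x4 homogeneous matrices."""
    bottom = np.array([[0.0, 0.0, 0.0, 1.0]])
    return (np.vstack([T, bottom]) @ np.vstack([T_ref, bottom]))[:3]

# sanity check: rotate a point set 90 degrees about z and shift it,
# then recover that transform from the correspondences
Rz = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
P = np.array([[1.0, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 1]])
X = P @ Rz.T + np.array([2.0, 0.0, -1.0])
T = estimate_pose(X, P)
print(np.round(T, 6))  # recovers Rz and the translation [2, 0, -1]
```

The determinant correction is the standard guard in Kabsch alignment: without it, a degenerate point configuration can yield a reflection rather than a rotation.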

Compared with the prior art, the invention has the following technical effects:

1. In the method for constructing a target three-dimensional key point extraction model in a two-dimensional graph, the designed network structure enables the coordinates of the target's three-dimensional key points to be output accurately and directly; the designed key point loss function enables the network to learn autonomously, in an unsupervised manner, to extract key points with semantic consistency and geometric consistency, improving the accuracy of three-dimensional key point extraction;

2. In the method for constructing a target three-dimensional key point extraction model in a two-dimensional graph, the network training stage requires neither a three-dimensional model of any object nor three-dimensional annotations on the images; compared with existing methods, this greatly reduces the annotation workload and improves the efficiency of the extraction method;

3. In the method for recognizing the three-dimensional posture of a target in a two-dimensional graph, a three-dimensional spatial coordinate system is established by means of the comparison image, improving recognition accuracy.

Drawings

FIG. 1 is an internal structure diagram of a three-dimensional key point extraction model provided by the present invention;

FIG. 2 is a diagram of an internal structure of a keypoint probability acquisition module provided in an embodiment of the present invention;

FIG. 3 is an image to be recognized provided in an embodiment of the present invention;

fig. 4 is an image representation of a three-dimensional key point set obtained by performing three-dimensional key point extraction on the image to be recognized shown in fig. 3 according to an embodiment of the present invention.

Detailed Description

The present invention will be described in detail below with reference to the accompanying drawings and examples, so that those skilled in the art can better understand it. It is expressly noted that, in the following description, detailed descriptions of known functions and designs are omitted where they might obscure the subject matter of the present invention.

The following definitions or conceptual connotations relating to the present invention are provided for illustration:

Three-dimensional key points: points located on the more prominent structures of an object; they represent local features of the object's surface and are rotation-invariant with respect to the object.

Bounding box: used to mark the position of an object in the image.

Saliency loss function: uses light-and-dark (brightness) characteristics so that the key points fall on salient positions of the object.

Depth prediction loss function: trains the network using the epipolar geometry principle so that the depth of the key points can be accurately predicted.

Three-dimensional consistency loss function: for ensuring that the same area can be stably tracked under different viewing angles.

Separation loss function: keeps a certain distance between every two key points, preventing key points from overlapping.

Relative pose estimation loss function: a penalty term on angle, i.e. the difference between the true relative camera pose angle between a pair of input images and the relative angle estimated from the detected key points; this loss term helps generate a meaningful and natural set of 3D key points.

Rotation matrix: used to describe the rotation of an object around the x, y and z axes; it is a 3×3 orthogonal matrix with determinant 1.

Attitude matrix: of the form [R | t], where R is a rotation matrix and t is a translation vector; it describes the rotation information and translation information of an object in three-dimensional space.
