Three-dimensional model shape recognition method based on dynamic graph convolution

Document No.: 1862777    Publication date: 2021-11-19

Summary: This technology, a three-dimensional model shape recognition method based on dynamic graph convolution (基于动态图卷积的三维模型形状识别方法), was designed by 韩丽, 佟宇宁 and 兰鹏燕 on 2021-08-03. The invention discloses a three-dimensional model shape recognition method based on dynamic graph convolution. First, a graph convolution operator with dynamic weight assignment is proposed to establish the correspondence between filter weights and the neighborhood matrix of a model with arbitrary connectivity. Second, the sampling points and their K-neighborhoods are optimized by cross-validation, and a deformable convolution operator is used to compute more accurate neighborhood point information by learning an offset for each neighborhood point position in the receptive field. Finally, an adaptive dynamic graph convolutional neural network model is used to recognize non-rigid three-dimensional models effectively. Experimental results show that the method achieves higher recognition accuracy and robustness: its recognition accuracy is 2% to 4% higher than that of the latest shape recognition algorithms based on graph convolutional neural networks, and it solves the problem that conventional graph convolution cannot be applied to irregular meshes.

1. A three-dimensional model shape recognition method based on dynamic graph convolution is characterized by comprising the following steps:

step 1, establishing an adaptive dynamic graph convolutional neural network (ADCNN), wherein the ADCNN comprises 1 deformable 3D convolutional layer, 4 convolutional layers, 2 upsampling layers and 3 fully connected layers, named 3Dconv, Conv1, Conv2, Conv3, Conv4, UC1, UC2, FC1, FC2 and FC3 respectively, and all convolution kernels are 3 × 3 pixels in size;

the 3Dconv layer contains 32 convolution kernels initialized by the Xavier method, with ReLU as the activation function; Conv1 contains 64 convolution kernels initialized by the Xavier method and a pooling layer of 64 neurons, with ReLU as the activation function; Conv2 contains 32 convolution kernels initialized by the Xavier method and a pooling layer of 32 neurons, with ReLU as the activation function; Conv3 contains 64 convolution kernels initialized by the Xavier method and a pooling layer of 64 neurons, with ReLU as the activation function; Conv4 contains 32 convolution kernels initialized by the Xavier method and a pooling layer of 32 neurons, with ReLU as the activation function; the feature vector output by UC1 has dimension 128; the feature vector output by UC2 has dimension 64; the feature vectors output by FC1 and FC2 both have dimension 32; FC3 outputs a 1024-dimensional feature vector using a 1 × 1 convolution; the layers are arranged, from front to back, in the order 3Dconv, FC1, Conv1, FC2, Conv2, Conv3, UC1, Conv4, UC2 and FC3;
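The layer stack described in this step can be summarized as a plain data structure. The sketch below only records what the claim states (layer names, kernel counts, output dimensions and order); the dictionary keys, the type labels and the names `LAYER_ORDER`, `LAYERS` and `count_by_type` are illustrative assumptions, not part of the patent.

```python
from collections import Counter

# Layer order as listed in the claim, front to back.
LAYER_ORDER = ["3Dconv", "FC1", "Conv1", "FC2", "Conv2",
               "Conv3", "UC1", "Conv4", "UC2", "FC3"]

# Per-layer settings stated in the claim; key names are hypothetical.
LAYERS = {
    "3Dconv": {"type": "deformable_3d_conv", "kernels": 32, "init": "xavier", "act": "relu"},
    "Conv1": {"type": "conv", "kernels": 64, "pool_units": 64, "init": "xavier", "act": "relu"},
    "Conv2": {"type": "conv", "kernels": 32, "pool_units": 32, "init": "xavier", "act": "relu"},
    "Conv3": {"type": "conv", "kernels": 64, "pool_units": 64, "init": "xavier", "act": "relu"},
    "Conv4": {"type": "conv", "kernels": 32, "pool_units": 32, "init": "xavier", "act": "relu"},
    "UC1": {"type": "upsample", "out_dim": 128},
    "UC2": {"type": "upsample", "out_dim": 64},
    "FC1": {"type": "fc", "out_dim": 32},
    "FC2": {"type": "fc", "out_dim": 32},
    "FC3": {"type": "fc", "out_dim": 1024, "kernel": "1x1"},
}


def count_by_type(layers):
    """Count how many layers of each type the configuration contains."""
    return dict(Counter(spec["type"] for spec in layers.values()))


if __name__ == "__main__":
    for name in LAYER_ORDER:
        print(name, LAYERS[name])
    print(count_by_type(LAYERS))
```

Keeping the order and the per-layer settings separate makes it easy to check that the configuration matches the stated totals of 1 deformable, 4 convolutional, 2 upsampling and 3 fully connected layers.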

the 3Dconv layer performs convolution using the deformable three-dimensional convolution operator defined by formula (1) to obtain the 3D offset (Δx, Δy, Δz) of each neighborhood point;

wherein p_i denotes the i-th position in the 3D model, y(p_i) denotes the output feature generated by the deformable three-dimensional convolution operator at p_i, p_j denotes the j-th position in the 3 × 3 × 3 convolutional sampling grid centered on p_i, N_i denotes the neighborhood sampling set of p_i, |N_i| denotes the number of sampling points contained in N_i, Δp_j denotes the 3D offset of p_j, w(p_j) denotes the trainable weight of p_j with respect to p_i, and x(p_i + p_j + Δp_j) denotes the value of the sampled feature function of the 3D model at position p_i + p_j + Δp_j;
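Formula (1) is not reproduced in this text. One form consistent with the symbol definitions given here, offered as an assumption rather than as the patent's exact expression, is a neighborhood-normalized deformable convolution:

```latex
y(p_i) \;=\; \frac{1}{|N_i|} \sum_{p_j \in N_i} w(p_j)\, x\!\left(p_i + p_j + \Delta p_j\right)
```

The normalization by |N_i| is inferred only from the fact that |N_i| is defined alongside the other symbols; the original formula may place it differently.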

the Conv1, Conv2, Conv3 and Conv4 layers complete convolution operation by using convolution operators defined by formula (2);

wherein y_i denotes the i-th feature vector output by the convolutional layer, x_i denotes the i-th feature vector input to the convolutional layer, x_j denotes the j-th feature vector input to the convolutional layer, b denotes the trainable bias, W_m denotes the m-th trainable weight matrix with 1 ≤ m ≤ M, M denotes the total number of trainable weight matrices, and q_m(x_i, x_j) denotes the weight function assigning x_j to W_m, whose definition is given by formula (3);

wherein c_m and the other parameters appearing in formula (3) denote linear transformation parameter vectors, and the superscript T denotes the transpose of a vector;
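Formulas (2) and (3) are not reproduced in this text, so the following is only a sketch of one graph convolution with dynamic weight assignment that fits the symbol definitions. It assumes that y_i = b + (1/|N_i|) Σ_m Σ_{j∈N_i} q_m(x_i, x_j) W_m x_j, and that q_m is a softmax over m of a linear score u_m^T x_i + v_m^T x_j + c_m. The parameter names `u` and `v`, the softmax form, and the treatment of c_m as one scalar per weight matrix are assumptions made for illustration.

```python
import numpy as np


def dynamic_graph_conv(X, neighbors, W, u, v, c, b):
    """Graph convolution with dynamically assigned filter weights (sketch).

    X: (n, d_in) input feature vectors x_i.
    neighbors: list of index lists; neighbors[i] is the neighborhood N_i.
    W: (M, d_out, d_in) trainable weight matrices W_m.
    u, v: (M, d_in) assumed linear scoring parameters; c: (M,) assumed offsets.
    b: (d_out,) trainable bias.
    """
    n = X.shape[0]
    Y = np.tile(b, (n, 1)).astype(float)
    for i in range(n):
        N_i = neighbors[i]
        if len(N_i) == 0:
            continue  # isolated point: output is the bias only
        Xj = X[N_i]                                   # (k, d_in)
        scores = X[i] @ u.T + Xj @ v.T + c            # (k, M)
        scores = scores - scores.max(axis=1, keepdims=True)
        q = np.exp(scores)
        q = q / q.sum(axis=1, keepdims=True)          # softmax over m
        # Sum over neighbors j and weight matrices m of q[j, m] * W[m] @ Xj[j].
        Y[i] += np.einsum("jm,mod,jd->o", q, W, Xj) / len(N_i)
    return Y


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(5, 3))
    nbrs = [[1, 2], [0, 2, 3], [0, 1], [1, 4], [3]]
    M, d_out = 4, 6
    Y = dynamic_graph_conv(
        X, nbrs,
        W=rng.normal(size=(M, d_out, 3)),
        u=rng.normal(size=(M, 3)), v=rng.normal(size=(M, 3)),
        c=np.zeros(M), b=np.zeros(d_out),
    )
    print(Y.shape)
```

Because the weights q_m are computed from the features themselves rather than from a fixed neighbor ordering, the same operator applies to neighborhoods of any size and connectivity.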

step 2, generating a neighborhood matrix U of the 3D model sampling points;

step 2.1, inputting the position coordinates of each sampling point of n non-rigid 3D models to form a 3D model set D = {M₁, M₂, …, M_i, …, M_n}, where M_i denotes the i-th 3D model;

step 2.2, for each 3D model in D, selecting neighborhood points by using a KNN network, and thereby establishing the neighborhood matrix U of the sampling points;
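As a rough illustration of step 2.2, the sketch below builds a neighborhood matrix by brute-force K-nearest-neighbor search over the sampling point coordinates. The layout of U is not specified in the text; storing it as an n × K array of neighbor indices, and the function name `knn_neighborhood`, are assumptions.

```python
import numpy as np


def knn_neighborhood(points, k):
    """Return an (n, k) integer array whose row i lists the indices of the
    k nearest neighbors of point i (excluding i itself), by Euclidean distance."""
    points = np.asarray(points, dtype=float)
    diff = points[:, None, :] - points[None, :, :]   # pairwise differences
    dist2 = np.sum(diff * diff, axis=-1)             # squared distances, (n, n)
    np.fill_diagonal(dist2, np.inf)                  # a point is not its own neighbor
    return np.argsort(dist2, axis=1)[:, :k]


if __name__ == "__main__":
    pts = [[0, 0, 0], [1, 0, 0], [3, 0, 0], [7, 0, 0]]
    print(knn_neighborhood(pts, k=2))
```

The brute-force search costs O(n²) memory and time, which is adequate for a sketch; a spatial index would be the usual choice for large models.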

step 3, selecting one third of the data in the 3D model set D as the training data set, one third as the test data set and the remaining third as the validation data set, and then calculating the optimal number of neighborhood points by cross-validation;
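A minimal sketch of the data split and of choosing the number of neighborhood points K follows. It uses a single hold-out validation set rather than full k-fold cross-validation, and the `evaluate` callback, the candidate list and both function names are hypothetical placeholders; the text does not say how candidate values of K are scored.

```python
import random


def three_way_split(models, seed=0):
    """Shuffle and split into thirds: (train, test, validation)."""
    items = list(models)
    random.Random(seed).shuffle(items)
    third = len(items) // 3
    return items[:third], items[third:2 * third], items[2 * third:]


def select_k(candidates, train, val, evaluate):
    """Return the candidate K with the highest validation score.

    evaluate(k, train, val) is a user-supplied callback that trains with
    K neighborhood points and returns a validation accuracy.
    """
    return max(candidates, key=lambda k: evaluate(k, train, val))


if __name__ == "__main__":
    train, test, val = three_way_split(range(30))
    # Toy scoring function standing in for a real train-and-validate run.
    best = select_k([4, 8, 16, 32], train, val, lambda k, t, v: -abs(k - 16))
    print(len(train), len(test), len(val), best)
```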

step 4, training the adaptive dynamic graph convolutional neural network ADCNN using the 3D model data set obtained in step 2, with batch_size set to 16, the regularization parameter set to 0.5, the learning rate set to 0.01 and the dropout rate set to 0.5;

step 4.1, inputting the original coordinate information, neighborhood point position information and neighborhood matrix U of each 3D model in the training data set, uniformly adjusting the resolution of the feature image of each model to 224 × 224 pixels, and initializing the iteration counter iter ← 1;

step 4.2, for each 3D model, calculating the 3D offset (Δx, Δy, Δz) of each neighborhood point by using the deformable three-dimensional convolution operator of the 3Dconv layer, converting the 3D offset (Δx, Δy, Δz) of each neighborhood point into an integer offset (Δx_z, Δy_z, Δz_z) by using a trilinear interpolation algorithm, and adding the offset (Δx_z, Δy_z, Δz_z) to the original coordinates of each neighborhood point to obtain an updated 3D model;
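The text does not spell out how the interpolation in step 4.2 is applied. In deformable convolution generally, a learned offset moves a sampling position off the integer grid, and the feature value there is obtained by interpolating the surrounding grid values. The sketch below shows standard trilinear interpolation on a dense 3D feature volume under that assumption; the function name `trilinear_sample` and the dense-volume representation are illustrative, not taken from the patent.

```python
import numpy as np


def trilinear_sample(volume, pos):
    """Sample a 3D feature volume at a fractional position (x, y, z).

    Assumes every axis of `volume` has at least 2 entries and that `pos`
    lies within the volume bounds.
    """
    pos = np.asarray(pos, dtype=float)
    upper = np.array(volume.shape) - 1
    lo = np.clip(np.floor(pos).astype(int), 0, upper - 1)  # lower cell corner
    t = pos - lo                                           # fractional part per axis
    value = 0.0
    for dx in (0, 1):
        for dy in (0, 1):
            for dz in (0, 1):
                # Weight of each of the 8 cell corners.
                w = ((t[0] if dx else 1 - t[0])
                     * (t[1] if dy else 1 - t[1])
                     * (t[2] if dz else 1 - t[2]))
                value += w * volume[lo[0] + dx, lo[1] + dy, lo[2] + dz]
    return value


if __name__ == "__main__":
    vol = np.fromfunction(lambda x, y, z: x + 2 * y + 3 * z, (3, 3, 3))
    p_i = np.array([0.0, 1.0, 1.0])      # center position
    p_j = np.array([0.0, 0.0, 1.0])      # grid position relative to p_i
    delta = np.array([0.5, 0.25, -0.25])  # example learned offset
    print(trilinear_sample(vol, p_i + p_j + delta))
```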

step 4.3, inputting each updated 3D model in the training data set into each subsequent layer of the ADCNN network, thereby obtaining a 4096-dimensional output characteristic vector y for each 3D model;

step 4.4, inputting the characteristic vector y of each 3D model into a Softmax function to obtain the recognition matching result of the ADCNN on the training set and the prediction loss thereof;

step 4.5, letting iter ← iter + 1; if iter > TotalIter, the trained adaptive dynamic graph convolutional neural network ADCNN is obtained and the method proceeds to step 5; otherwise, the parameters of the ADCNN are updated from the prediction loss by using the error backpropagation algorithm based on stochastic gradient descent, and the method returns to step 4.2 to process all 3D models in the training set again, wherein TotalIter denotes a preset number of iterations;
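The control flow of steps 4.1 to 4.5 can be sketched as a loop around two callbacks. The hyperparameter values are the ones stated in step 4; the `forward` and `update` callbacks and the names `HYPERPARAMS` and `train` are hypothetical placeholders for the network's forward pass and its stochastic-gradient-descent update.

```python
# Values stated in step 4; the key names are illustrative.
HYPERPARAMS = {
    "batch_size": 16,
    "regularization": 0.5,
    "learning_rate": 0.01,
    "dropout": 0.5,
}


def train(models, total_iter, forward, update):
    """Run the iteration scheme of steps 4.1 to 4.5.

    forward(model) returns the prediction loss for one model;
    update(losses) applies one parameter update from the collected losses.
    Returns the mean loss recorded at each pass over the training set.
    """
    it = 1                                        # step 4.1: iter <- 1
    history = []
    while True:
        losses = [forward(m) for m in models]     # steps 4.2 to 4.4
        history.append(sum(losses) / len(losses))
        it += 1                                   # step 4.5: iter <- iter + 1
        if it > total_iter:
            return history                        # training finished
        update(losses)                            # otherwise update and repeat


if __name__ == "__main__":
    updates = []
    hist = train([1, 2, 3], total_iter=4,
                 forward=lambda m: float(m),
                 update=lambda losses: updates.append(len(losses)))
    print(len(hist), len(updates))
```

With this ordering the training set is processed TotalIter times and the parameters are updated TotalIter − 1 times, since the final pass exits before an update.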

step 5, utilizing the trained adaptive dynamic graph convolution neural network ADCNN to carry out shape recognition on the 3D model;

step 5.1, inputting the original coordinate information, neighborhood point position information and neighborhood matrix U of a 3D model InputModel to be processed, and uniformly adjusting the resolution of the feature image of the model to 224 × 224 pixels;

step 5.2, inputting the original coordinate information, neighborhood point position information, neighborhood matrix U and the adjusted characteristic image of the InputModel into an adaptive dynamic graph convolution neural network (ADCNN), and calculating to obtain a 4096-dimensional output characteristic vector y;

step 5.3, inputting the feature vector y into a Softmax function, and outputting the recognition result of the 3D model InputModel.
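Steps 4.4 and 5.3 both pass the network output through a Softmax function. The sketch below shows a numerically stable softmax, the predicted class, and a cross-entropy loss. Two points are assumptions: the input vector is taken to hold one score per shape class, and cross-entropy is used as the loss, since the text says only "prediction loss".

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of class scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_and_loss(scores, label=None):
    """Return (predicted class index, cross-entropy loss or None).

    The loss is computed only when the true label is given (training);
    at recognition time (step 5.3) only the prediction is used.
    """
    probs = softmax(scores)
    pred = max(range(len(probs)), key=probs.__getitem__)
    loss = None if label is None else -math.log(probs[label])
    return pred, loss


if __name__ == "__main__":
    print(predict_and_loss([1.0, 5.0, 2.0], label=1))
    print(predict_and_loss([1.0, 5.0, 2.0]))
```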

Technical Field

The invention belongs to the field of shape analysis and identification of three-dimensional models, and particularly relates to a three-dimensional model shape identification method which is wide in application range, high in identification precision, good in robustness, independent of shape descriptors and based on dynamic graph convolution.

Background

With the spread of the mobile internet, the rise of big data and the development of artificial intelligence, 3D models have become an essential part of everyday life. Particularly in fields such as virtual reality, autonomous driving and 3D game animation, the processing and analysis of 3D models is a key enabling technology. Efficient and accurate 3D model recognition has therefore become an important research topic in computer graphics and computer vision, and a large number of open 3D model databases have emerged, such as the McGill, Summer, ShapeGoogle, ShapeNet, SHREC and ModelNet model bases. These databases not only meet the needs of various users but also greatly help researchers working in 3D computer graphics. Faced with such complex, large-scale databases, many users urgently need an effective 3D model recognition algorithm that can find the required 3D models accurately and quickly among a large number of models.

Existing 3D model recognition algorithms are mainly classified into two categories, namely recognition algorithms based on manual feature descriptor extraction and recognition algorithms based on machine learning feature descriptors.

Firstly, recognition algorithms based on manual feature descriptor extraction require the user to manually extract, from the 3D model data, descriptors of the basic structural shape features of the model, such as geometric feature descriptors, skeleton feature descriptors, image projection feature descriptors and multi-feature fusion descriptors. This has given rise to recognition algorithms based on geometric feature descriptors, on skeleton feature descriptors, on image projection feature descriptors and on multi-feature fusion descriptors. Since these descriptors cannot capture all the geometric feature information of a 3D model, they must be carefully designed for the characteristics of each 3D model database to which the algorithm is applied; the designed features depend heavily on the specific database, and the recognition performance on some databases is not necessarily good. Such 3D model recognition methods therefore often have obvious limitations.

Secondly, recognition algorithms based on machine-learned feature descriptors usually extract low-level features from the surface of the model and then use a deep learning framework to learn deeper, more abstract high-level features of the 3D model; they require no manual intervention and are only slightly affected by factors such as model pose. Wang et al. extract features from a point cloud model, select feature points by farthest point sampling, perform convolution on the neighborhood of each feature point, and obtain local structure information after multi-layer pooling, spectral convolution and clustering; the information of feature points adjacent in spectral coordinates is then concatenated into a global feature descriptor of the model, which is used for shape recognition of the 3D model. Fang et al. extract a heat shape descriptor (HeatSD) from the heat kernel features of the 3D model, train on it with principal component analysis and linear discriminant analysis to generate an eigen-shape descriptor (ESD) and a Fisher shape descriptor (FSD) respectively, and feed the ESD and FSD into a deep neural network to learn DeepSD, which is used for 3D model shape recognition. Carlos et al. map a 3D model onto a sphere, generate an invariant feature map through successive spherical transformations, and obtain a rotation-invariant feature descriptor from a globally weighted average of the feature map to perform 3D shape recognition. Luciano et al. introduce a high-level discriminative feature classifier based on automatic geodesic moments and deep learning, which learns the high-level discriminative features of each model automatically through a hidden discriminative layer and a sparse autoencoder module and achieves a higher model recognition rate. Xie Zhige et al. propose an automatic feature learning algorithm for three-dimensional models based on an extreme learning machine and a convolutional neural network, and obtain good results in practical applications. Su et al. use 2D projection images taken from different angles as the input training signal, train a convolutional neural network on these images, and learn a classification and recognition model for 3D shapes; this model achieves higher shape recognition accuracy than models learned directly from 3D data. However, a conventional convolutional neural network (CNN) needs to extract shape descriptors of the 3D model to describe its features before it can recognize the shape, and its recognition accuracy and robustness are not satisfactory.

In this context, most 3D models are represented and stored as graphs, and a graph structure can naturally express related data structures found in real life; since the local structure of each node in a graph differs and is highly flexible, researchers have gone on to build deep learning models on graph structures. Because the graph convolutional neural network (GCNN) is well suited to modeling local structure and ubiquitous node dependencies, 3D model recognition methods based on graph convolutional neural networks have attracted wide attention. Their basic idea is to train a regressor on a multi-scale grid of fixed bounding boxes and to iteratively move or scale elements of the grid toward the target, thereby converting the target detection problem into the problem of finding a path from the fixed grid to a boundary that tightly encloses the target. Unfortunately, although such methods perform well on regular, stable mesh representations, they cannot easily be extended to 3D shape recognition on irregular meshes: the selection of neighborhood points is too rigid to fit irregular models well, the extent of the convolution kernel cannot grow or shrink as the object changes, and the extracted features do not match the model accurately enough.

In summary, the 3D model shape recognition method based on machine learning and convolutional neural network still faces the problems of limited application range, low recognition accuracy, low robustness, high dependence on shape descriptors, and the like. Therefore, how to extend the convolutional neural network to an irregular and transformed 3D model and realize the shape recognition of the 3D model has not been solved effectively.

Disclosure of Invention

The invention aims to solve the technical problems in the prior art, and provides a three-dimensional model shape recognition method which can effectively process irregular and transformed 3D models, has wide application range, high recognition precision and good robustness, does not depend on shape descriptors and is based on dynamic graph convolution.

The technical solution of the invention is as follows: a three-dimensional model shape recognition method based on dynamic graph convolution, comprising steps 1 to 5 as set out in claim 1.

Compared with the prior art, the invention has the following advantages. First, an adaptive dynamic subgraph convolution learning model is introduced to learn directly from the input coordinate information of the original model; a deformable convolution operator adds trainable offsets to the neighborhood positions of the original sampling points, and the resulting new neighborhood point information is fed, together with the original sampling points, into a graph convolutional neural network with dynamic weight assignment, so that the method adapts better to transformed three-dimensional models and removes the dependence of traditional methods on shape descriptors. Second, using a strategy of adaptively assigning filter weights, the correspondence between neighborhoods in the graph is computed from dynamically learned features, so that correspondences between neighborhoods in graphs of arbitrary connectivity can be described accurately without relying on predefined static coordinates on the graph, enabling more efficient shape recognition of non-rigid three-dimensional models. The method can therefore handle irregular and transformed 3D models effectively, offers a wide application range, high recognition accuracy, good robustness and independence from shape descriptors, and achieves an average recognition accuracy of 97.64% on the SHREC2010, SHREC2011 and SHREC2015 model sets.

Detailed Description

The three-dimensional model shape recognition method based on dynamic graph convolution of the invention is carried out according to steps 1 to 5 as set out in claim 1. In the present embodiment, the preset number of iterations in step 4.5 is TotalIter = 500.

The three 3D model sets SHREC2010, SHREC2011 and SHREC2015 were used as experimental data sets. Table 1 shows the recognition accuracy obtained on them by the ACNN (adaptive convolutional neural network) method, the GCNN (graph convolutional neural network) method and the method of the invention.

As can be seen from table 1, the ACNN method needs to extract the shape descriptor of the 3D model in advance to realize shape recognition, classification, segmentation, and the like of the 3D model, and cannot well fit the point set topology of the irregular model, and the recognition accuracy is only 84.52%, which is still not satisfactory; the convolution kernel range of the GCNN method cannot be enlarged or reduced along with the change of an object, the matching degree of the extracted features and the model and the selection of the neighborhood points have further improved space, and the identification accuracy is 95.04%; the method utilizes the self-adaptive dynamic subgraph convolution learning model to better adapt to the transformation model, gets rid of the dependence of the shape recognition method on the shape descriptor, introduces the deformable convolution operator and the self-adaptive distribution filter strategy, realizes more accurate 3D model shape recognition, and has the average recognition accuracy rate of 97.64 percent.

Table 1. Recognition accuracy obtained by different methods on each data set

Data set     ACNN      GCNN      The invention
SHREC2010    83.26%    94.65%    96.94%
SHREC2011    84.53%    95.10%    97.07%
SHREC2015    85.78%    95.36%    98.90%
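The average accuracies quoted in the text (84.52%, 95.04% and 97.64%) can be checked against the per-data-set figures in Table 1 with a few lines of arithmetic. The variable and function names below are illustrative.

```python
# Per-data-set accuracies from Table 1 (SHREC2010, SHREC2011, SHREC2015), in percent.
TABLE_1 = {
    "ACNN": [83.26, 84.53, 85.78],
    "GCNN": [94.65, 95.10, 95.36],
    "The invention": [96.94, 97.07, 98.90],
}


def mean_accuracy(values):
    """Arithmetic mean of a list of accuracies."""
    return sum(values) / len(values)


if __name__ == "__main__":
    for method, accs in TABLE_1.items():
        print(f"{method}: {mean_accuracy(accs):.2f}%")
```

Rounded to two decimals, the three means reproduce the averages stated in the description.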
