TM-SRC-based three-dimensional non-texture target pose estimation method

Document No.: 551877. Publication date: 2021-05-14.

Reading note: this technique, a TM-SRC-based three-dimensional texture-free target pose estimation method, was created by 王建林, 谭振国, 郭永奇 and 邱科鹏 on 2021-01-16. Abstract: The invention discloses a TM-SRC-based pose estimation method for three-dimensional texture-free targets. First, template images are generated offline from the three-dimensional target CAD model at different sampling viewpoints to build a template library. Next, prior knowledge of the sampling viewpoint poses is used to build a prior hierarchical tree; at the upper levels of the tree a similarity measure is used for online matching to obtain candidate templates, and the child nodes of the candidate templates are tracked down to the bottom level of the tree. Finally, at the bottom level of the tree, sparse representation classification is used to obtain the pose parameters associated with the best-matching template. The invention addresses the excessive time complexity of exhaustive search and the low robustness of existing template-matching methods to occlusion and cluttered backgrounds, and improves the robustness of three-dimensional texture-free target pose estimation under occlusion and cluttered backgrounds.

1. A TM-SRC-based three-dimensional texture-free target pose estimation method, characterized by comprising the following steps:

step one: building a template library based on the three-dimensional target CAD model: calculating a projection transformation matrix from the pose parameters of each viewpoint, parsing and rendering the three-dimensional target CAD model with the open graphics library OpenGL, generating templates at different viewpoints by projection, extracting and storing the features of the templates, and completing the construction of the template library offline;

step two: building a prior hierarchical tree to accelerate search and matching: using prior knowledge of the sampling viewpoint poses, grouping the templates of adjacent viewpoints into one class with the central viewpoint template as the class center, and down-sampling the template images to complete the prior hierarchical tree;

step three: inputting a test image of the three-dimensional target, down-sampling it to construct a test image pyramid, and loading the template features and index information of the prior hierarchical tree built offline; matching at the upper levels of the prior hierarchical tree with a similarity measure, applying sparse representation classification at the bottom level of the prior hierarchical tree to obtain the final matching template, and outputting the index of the final matching template and its associated pose parameters, thereby estimating the pose of the three-dimensional texture-free target.

2. The TM-SRC-based three-dimensional texture-free target pose estimation method according to claim 1, characterized in that step one comprises:

simulating the imaging process of a CCD camera with OpenGL to generate template images at different viewpoints, extracting and storing the features of the template images, and completing the construction of the template library offline; camera imaging is summarized as follows: a three-dimensional point in the world coordinate system $O_w$-$X_wY_wZ_w$ is transformed through the camera coordinate system $O_c$-$X_cY_cZ_c$, the image physical coordinate system $O_R$-$xy$ and the image pixel coordinate system $O_I$-$uv$, and is finally mapped to a pixel point $p_I(u, v)$ in the image plane; the transformation between the world coordinate system and the image pixel coordinate system is expressed as:

$$\begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = \eta K M \begin{bmatrix} X_w \\ Y_w \\ Z_w \\ 1 \end{bmatrix}, \quad K = \begin{bmatrix} f_x & \gamma & u_0 \\ 0 & f_y & v_0 \\ 0 & 0 & 1 \end{bmatrix}, \quad M = \begin{bmatrix} R & t \end{bmatrix}$$

where $\eta = 1/z_c$ is a scale factor and $z_c$ is the depth value; $(u_0, v_0)$ are the principal point coordinates; $f_x$ and $f_y$ are the ratios of the camera focal length to the pixel size in the x and y directions; $\gamma$ is the deviation from perpendicularity of the two coordinate axes of the $O_R$-$xy$ coordinate system; $K$ is the intrinsic matrix of the camera; $M$ is the extrinsic matrix of the camera, composed of the rotation matrix $R$ and the translation vector $t$;

constructing the template library based on the three-dimensional target CAD model by projecting the target model from different viewing angles; simulating the imaging process with the virtual camera of OpenGL, assuming that the three-dimensional target CAD model is at the center of a unit sphere and the virtual camera is on the surface of the sphere; projecting the three-dimensional target CAD model at different viewpoints by moving the virtual camera over the spherical surface; expressing the template pose with four parameters, namely the sampling viewpoint radius r, the rotation angles α and β about the x and y axes, and the rotation angle λ about the optical axis z; and sampling uniformly over the sphere in these four pose parameters to obtain the pose parameters of each viewpoint; the RGB values of the color assigned to each face are set to the three components of that face's normal vector, so that different faces of the target receive different colors, as required by the subsequent feature extraction;

after the template images are obtained, features are extracted from them for the subsequent similarity calculation: a large number of two-dimensional images are generated near each sampling viewpoint pose by randomly sampled projection; gradients and gradient directions are extracted from each image with the Sobel operator, a discrete difference operator; at each pixel, the gradient direction of the RGB channel with the largest gradient magnitude is taken as the gradient direction of the pixel, and the extracted gradient directions are quantized into 8 directions; a gradient direction histogram is built at each pixel from the quantized gradient directions; and finally, the quantized gradient directions whose frequency exceeds a set threshold are retained, a binary string is constructed as the feature of each pixel, and the maximum frequency of the histogram is retained as the weight for the subsequent matching similarity measure.

3. The TM-SRC-based three-dimensional texture-free target pose estimation method according to claim 1, characterized in that step two comprises:

building a prior hierarchical tree to accelerate search and matching; the hierarchical tree is built from prior knowledge of the sampling viewpoint poses: because the templates are generated by uniform sampling over the sphere in the four pose parameters, the pose parameters of each template are close to those of its adjacent templates, so adjacent templates are highly similar; on the basis of this prior knowledge, adjacent viewpoints are grouped into one class and the central viewpoint, as the class center, takes part in the next round of grouping, which builds the prior hierarchical tree; an image pyramid is constructed by down-sampling to further increase the matching speed.

4. The TM-SRC-based three-dimensional texture-free target pose estimation method according to claim 1, characterized in that step three comprises:

inputting a test image of the three-dimensional target, down-sampling it to construct a test image pyramid, and loading the template features and index information of the prior hierarchical tree built offline; introducing sparse representation classification into the existing template-matching pose estimation method to construct the TM-SRC-based pose estimation method; first, matching at the upper levels of the prior hierarchical tree generated in step two with a similarity measure, for which a robust similarity measure based on the PCOF feature is adopted:

$$s(x, y) = \frac{\sum_{i=1}^{n} \delta\left(ori^{I}_{(x + x_i,\, y + y_i)} \in ori^{T}_{i}\right)}{\sum_{i=1}^{n} w_i}$$

$$\delta\left(ori^{I} \in ori^{T}\right) = \begin{cases} w_i, & ori^{I} \wedge ori^{T} > 0 \\ 0, & \text{otherwise} \end{cases}$$

the similarity score is calculated with the bitwise AND operation (symbol $\wedge$), where $ori^{I}$ and $ori^{T}$ are the features of the test image and of the template at each pixel, $x$ and $y$ are the coordinates of the pixels carrying template features, and $x_i$, $y_i$ are the offsets in the x and y directions during sliding; $\delta(ori^{I} \in ori^{T})$ is the weight function of a single pixel, $n$ is the number of features of the template, and $w_i$ is the weight of the pixel, that is, the frequency of the main direction of the histogram; if the bitwise AND of the test image feature and the template feature is true, the corresponding weight is added to the numerator of the similarity score.

Technical Field

The invention relates to a three-dimensional texture-free target pose estimation method, belongs to the technical field of machine vision, and particularly relates to a three-dimensional texture-free target pose estimation method based on Template Matching and Sparse Representation Classification (TM-SRC).

Background

Three-dimensional target pose estimation is widely used in equipment manufacturing, robotic grasping, augmented reality and other fields. With the introduction of local feature descriptors such as SIFT (Scale-Invariant Feature Transform), SURF (Speeded-Up Robust Features) and ORB (Oriented FAST and Rotated BRIEF), estimating the pose of three-dimensional targets that carry texture information has become easier. Many targets, however, have little or no texture information, and methods based on local feature descriptors cannot be applied to such texture-free targets.

Existing target pose estimation methods fall into three main categories: methods based on geometric features, on deep learning, and on template matching. Geometric-feature methods usually take known structural features of the three-dimensional target, such as points, straight lines and curves, and compute the pose from these features through coordinate transformation; they depend heavily on the texture features of the target, and stable key points usually cannot be extracted from texture-free targets. Deep-learning methods have achieved good results in recent years, but target data are difficult to collect and label, and many scenarios lack the computing resources needed to run them. Template-matching methods compute the similarity between the test image and each template image and take the pose parameters associated with the most similar template in the template library as the pose estimate; because they do not depend on the texture features of the target, they perform well on texture-free targets and achieve high accuracy. Template matching still has limitations, however: exhaustive search over the template library has excessive time complexity, and existing template-matching pose estimation methods have low robustness to occlusion and cluttered backgrounds.

The invention therefore provides a TM-SRC-based three-dimensional texture-free target pose estimation method. It builds a prior hierarchical tree from prior knowledge of the sampling viewpoint poses to accelerate search and matching, matches at the upper levels of the tree with a similarity measure, and classifies at the bottom level of the tree with sparse representation classification. This realizes pose estimation of three-dimensional texture-free targets and improves its robustness under occlusion and cluttered backgrounds.

Disclosure of Invention

The invention provides a TM-SRC-based three-dimensional texture-free target pose estimation method aiming at improving robustness of three-dimensional texture-free target pose estimation under occlusion and disordered backgrounds, which comprises the following steps:

The first step specifically comprises:

The imaging process of a CCD camera is simulated with OpenGL to generate template images at different viewpoints; the features of the template images are extracted and stored, and the template library is built offline. Camera imaging can be summarized as follows: a three-dimensional point in the world coordinate system $O_w$-$X_wY_wZ_w$ is transformed through the camera coordinate system $O_c$-$X_cY_cZ_c$, the image physical coordinate system $O_R$-$xy$ and the image pixel coordinate system $O_I$-$uv$, and is finally mapped to a pixel point $p_I(u, v)$ in the image plane. The transformation between the world coordinate system and the image pixel coordinate system can be expressed as:

$$\begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = \eta K M \begin{bmatrix} X_w \\ Y_w \\ Z_w \\ 1 \end{bmatrix}, \quad K = \begin{bmatrix} f_x & \gamma & u_0 \\ 0 & f_y & v_0 \\ 0 & 0 & 1 \end{bmatrix}, \quad M = \begin{bmatrix} R & t \end{bmatrix} \qquad (1)$$

where $\eta = 1/z_c$ is a scale factor and $z_c$ is the depth value; $(u_0, v_0)$ are the principal point coordinates; $f_x$ and $f_y$ are the ratios of the camera focal length to the pixel size in the x and y directions; $\gamma$ is the deviation from perpendicularity of the two coordinate axes of the $O_R$-$xy$ coordinate system; $K$ is the intrinsic matrix of the camera; $M$ is the extrinsic matrix of the camera, composed of the rotation matrix $R$ and the translation vector $t$.
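The projection from world coordinates to pixel coordinates described above can be sketched in a few lines of Python. This is a minimal illustration, not the patented implementation: the function name `project_point` and the numeric values of `K`, `R` and `t` are assumptions chosen only to make the example concrete.

```python
def project_point(K, R, t, P_w):
    """Project a 3-D world point to pixel coordinates (u, v).

    K is the 3x3 intrinsic matrix, R the 3x3 rotation matrix, t the
    translation vector and P_w the world point, all as plain lists.
    """
    # Camera-frame coordinates: P_c = R * P_w + t (the extrinsic matrix M = [R | t]).
    P_c = [sum(R[i][j] * P_w[j] for j in range(3)) + t[i] for i in range(3)]
    z_c = P_c[2]
    if z_c <= 0:
        raise ValueError("point is not in front of the camera")
    # Homogeneous pixel coordinates before scaling: K * P_c.
    h = [sum(K[i][j] * P_c[j] for j in range(3)) for i in range(3)]
    eta = 1.0 / z_c  # scale factor, the inverse of the depth value
    return h[0] * eta, h[1] * eta


# Hypothetical intrinsics: f_x = f_y = 800, zero skew, principal point (320, 240).
K = [[800.0, 0.0, 320.0],
     [0.0, 800.0, 240.0],
     [0.0, 0.0, 1.0]]
R = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0]]
t = [0.0, 0.0, 700.0]  # camera 700 mm from the world origin along the optical axis

u, v = project_point(K, R, t, [35.0, -70.0, 0.0])
```

A point on the optical axis lands on the principal point, which is a quick sanity check for any chosen `K`.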

The template library is built from the three-dimensional target CAD model by projecting the target model from different viewing angles. The invention simulates the imaging process with the virtual camera of OpenGL, assuming that the three-dimensional target CAD model is at the center of a unit sphere and the virtual camera is on the surface of the sphere. Projecting the CAD model at different viewpoints is achieved by moving the virtual camera over the spherical surface, so the pose of a template can be represented by four parameters: the sampling viewpoint radius r, the rotation angles α and β about the x and y axes, and the rotation angle λ about the optical axis z. The RGB values of the color assigned to each face are set to the three components of that face's normal vector, so that different faces of the target receive different colors, as required by the subsequent feature extraction. A schematic diagram of template library sampling is shown in FIG. 1.
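Uniform sampling over the four pose parameters can be illustrated with a regular grid. The sketch below is an assumption-laden example: the helper names `sample_range` and `sample_viewpoints` are hypothetical, both end points of every range are included, and the toy ranges in the usage lines are not the sampling settings of the embodiment.

```python
from itertools import product


def sample_range(lo, hi, step):
    """Evenly spaced samples from lo to hi, both end points included (an assumption)."""
    count = int(round((hi - lo) / step)) + 1
    return [lo + k * step for k in range(count)]


def sample_viewpoints(r_range, alpha_range, beta_range, lam_range):
    """Return every (r, alpha, beta, lam) combination on a regular grid.

    Each argument is a (lo, hi, step) tuple: r is the viewpoint radius,
    alpha and beta are rotations about the x and y axes, and lam is the
    rotation about the optical axis.
    """
    return list(product(sample_range(*r_range),
                        sample_range(*alpha_range),
                        sample_range(*beta_range),
                        sample_range(*lam_range)))


# Toy ranges chosen only to keep the grid small.
poses = sample_viewpoints((680, 800, 60), (-40, 40, 40), (-40, 40, 40), (-180, 180, 90))
```

Each tuple in `poses` would then be turned into a rotation matrix and translation vector for rendering; how end points and the full-turn duplicate at ±180° are handled changes the final template count.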

After the template images are obtained, features must be extracted from them for the subsequent similarity calculation. The method adopts an existing image gradient feature, the Perspectively Cumulated Orientation Feature (PCOF), which is robust to small deformations of the target. Feature extraction proceeds as follows. First, a large number of two-dimensional images are generated near each sampling viewpoint pose by randomly sampled projection. Then gradients and gradient directions are extracted from each image with the Sobel operator, a discrete difference operator; at each pixel, the gradient direction of the RGB channel with the largest gradient magnitude is taken as the gradient direction of the pixel, and the extracted gradient directions are quantized into 8 directions. A gradient direction histogram is built at each pixel from the quantized gradient directions. Finally, the quantized gradient directions whose frequency exceeds a set threshold are retained, a binary string is constructed as the feature of each pixel, and the maximum frequency of the histogram is retained as the weight for the subsequent matching similarity measure.
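The per-pixel part of this feature extraction (quantize, accumulate a histogram over the randomly rendered images, threshold, keep the peak as the weight) can be sketched as follows. The function names are hypothetical, and quantizing over the full 360° range into equal 45° bins is an assumption; the source states only that 8 directions are used.

```python
import math

NUM_BINS = 8  # the source quantizes gradient directions into 8 directions


def quantize_direction(gx, gy):
    """Map a gradient (gx, gy) to one of 8 direction bins (assumed to span 360 degrees)."""
    angle = math.atan2(gy, gx) % (2 * math.pi)
    return int(angle / (2 * math.pi / NUM_BINS)) % NUM_BINS


def pcof_feature(observed_bins, threshold):
    """Build the binary feature and weight of one pixel.

    observed_bins holds the quantized direction seen at this pixel in each
    randomly rendered image. Bit b of the returned feature is set when bin b
    occurs more than `threshold` times; the weight is the histogram maximum.
    """
    hist = [0] * NUM_BINS
    for b in observed_bins:
        hist[b] += 1
    feature = 0
    for b, count in enumerate(hist):
        if count > threshold:
            feature |= 1 << b
    return feature, max(hist)


# Toy example: ten rendered images vote for bins 2, 3 and 7 at one pixel.
feature, weight = pcof_feature([2] * 6 + [3] * 3 + [7], threshold=2)
```

Storing the feature as an integer bit mask is what makes the later matching step a single bitwise AND per pixel.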

The second step specifically comprises:

A prior hierarchical tree is built to accelerate search and matching. The invention builds the hierarchical tree from prior knowledge of the sampling viewpoint poses: because step one samples uniformly over the sphere in the four pose parameters when generating the templates, the pose parameters of each template are close to those of its adjacent templates, so adjacent templates are highly similar. On the basis of this prior knowledge, adjacent viewpoints are grouped into one class, and the central viewpoint, as the class center, takes part in the next round of grouping, which builds the prior hierarchical tree. FIG. 2 is a schematic diagram of a prior hierarchical tree generated in a two-dimensional pose space; the method of the invention simply extends this to the four-dimensional pose space.
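One level of such a grouping can be sketched for a two-dimensional pose grid, in the spirit of FIG. 2. The function name `build_parent_level` and the block size of 3 viewpoints per axis are assumptions made for the example; the source does not state how many adjacent viewpoints form a class.

```python
def build_parent_level(grid_shape, group=3):
    """Group a regular 2-D grid of viewpoints into group x group blocks.

    Returns a dict mapping each class center (row, col) to the list of
    viewpoints in its class. For a full block the center is the middle
    viewpoint; for a truncated block at the border it is the middle entry
    of the child list.
    """
    rows, cols = grid_shape
    tree = {}
    for r0 in range(0, rows, group):
        for c0 in range(0, cols, group):
            children = [(r, c)
                        for r in range(r0, min(r0 + group, rows))
                        for c in range(c0, min(c0 + group, cols))]
            center = children[len(children) // 2]
            tree[center] = children
    return tree


# A 6 x 6 grid of sampled viewpoints collapses into 4 classes of 9 viewpoints each.
level = build_parent_level((6, 6), group=3)
```

Applying the same function to the grid of class centers yields the next level up; in four dimensions the blocks would span all four pose parameters instead of two.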

The third step specifically comprises:

A test image of the three-dimensional target is input and down-sampled to construct a test image pyramid, and the template features and index information of the prior hierarchical tree built offline are loaded. Sparse representation classification is introduced into the existing template-matching pose estimation method to construct the TM-SRC-based pose estimation method. First, matching is performed at the upper levels of the prior hierarchical tree generated in step two with a similarity measure, for which a robust similarity measure based on the PCOF feature is adopted:

$$s(x, y) = \frac{\sum_{i=1}^{n} \delta\left(ori^{I}_{(x + x_i,\, y + y_i)} \in ori^{T}_{i}\right)}{\sum_{i=1}^{n} w_i} \qquad (2)$$

$$\delta\left(ori^{I} \in ori^{T}\right) = \begin{cases} w_i, & ori^{I} \wedge ori^{T} > 0 \\ 0, & \text{otherwise} \end{cases} \qquad (3)$$

The similarity score is calculated with the bitwise AND operation (symbol $\wedge$), where $ori^{I}$ and $ori^{T}$ are the features of the test image and of the template at each pixel, $x$ and $y$ are the coordinates of the pixels carrying template features, and $x_i$, $y_i$ are the offsets in the x and y directions during sliding; $\delta(ori^{I} \in ori^{T})$ is the weight function of a single pixel, $n$ is the number of features of the template, and $w_i$ is the weight of the pixel, that is, the frequency of the main direction of the histogram. If the bitwise AND of the test image feature and the template feature is true, the corresponding weight is added to the numerator of the similarity score.
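The weighted bitwise-AND score can be sketched as below. The data layout (a dictionary of image features keyed by pixel and a list of template entries) and the function name are assumptions, as is normalizing by the sum of all template weights so that the score lies between 0 and 1.

```python
def similarity_score(image_features, template, x, y):
    """Weighted fraction of template features matched at image position (x, y).

    image_features maps (px, py) to an 8-bit direction mask of the test image.
    template is a list of (x_i, y_i, ori_t, w_i) entries: a pixel offset, the
    template's 8-bit direction mask and the weight of that pixel.
    """
    total_weight = sum(w for _, _, _, w in template)
    if total_weight == 0:
        return 0.0
    matched = 0.0
    for x_i, y_i, ori_t, w in template:
        ori_i = image_features.get((x + x_i, y + y_i), 0)
        if ori_i & ori_t:  # any shared direction bit counts as a match
            matched += w
    return matched / total_weight


# Toy template with three features and a toy test image.
template = [(0, 0, 0b00000110, 5), (1, 0, 0b00010000, 3), (0, 1, 0b00000001, 2)]
image_features = {(10, 10): 0b00000100, (11, 10): 0b00000001, (10, 11): 0b00000001}
score = similarity_score(image_features, template, 10, 10)
```

Here the first and third features match and the second does not, so 7 of the 10 weight units are matched.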

Matching at the upper levels of the prior hierarchical tree with the similarity measure yields the template with the highest similarity at each level; its child nodes are tracked to the next level, and when the bottom level of the prior hierarchical tree is reached, sparse representation classification is used for classification. Sparse representation classification describes a test sample as a linear combination of elements of an overcomplete dictionary. The overcomplete dictionary is formed by the $n$ templates $D = (d_1, d_2, \ldots, d_n)$. If the test image $y$ can be expressed as a linear combination of the overcomplete dictionary, then:

$$y = d_1\alpha_1 + d_2\alpha_2 + \cdots + d_n\alpha_n \qquad (4)$$

expressed in matrix form:

$$y = D\alpha \qquad (5)$$

where $\alpha = (\alpha_1, \alpha_2, \ldots, \alpha_n)^T$ is called the target sparse coefficient vector. Because the target is usually affected by noise or partial occlusion, equation (5) can be rewritten to account for them:

$$y = D\alpha + \varepsilon = \begin{bmatrix} D & I \end{bmatrix} \begin{bmatrix} \alpha \\ e \end{bmatrix} \qquad (6)$$

where $\varepsilon$ is the error vector, whose non-zero elements indicate pixels that are occluded or noisy; it can be modeled by the identity matrix $I$ and the trivial coefficient vector $e$.

The solution of equation (6) is not unique; a regularization term is used to obtain a unique solution, and the optimal solution should have as few non-zero entries as possible. The problem is therefore converted into an $l_1$-regularized least-squares problem, which usually has a sparse solution:

$$\min_{\alpha,\, e} \left\| y - \begin{bmatrix} D & I \end{bmatrix} \begin{bmatrix} \alpha \\ e \end{bmatrix} \right\|_2^2 + \lambda \left\| \begin{bmatrix} \alpha \\ e \end{bmatrix} \right\|_1 \qquad (7)$$

where $\|\cdot\|_2$ and $\|\cdot\|_1$ denote the $l_2$ and $l_1$ norms, respectively, and $\lambda$ is the regularization parameter.
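An $l_1$-regularized least-squares problem of this kind can be solved with many algorithms; the source does not name one. The sketch below uses the iterative shrinkage-thresholding algorithm (ISTA) as a stand-in and minimizes $\|y - [D, I]c\|_2^2 + \lambda\|c\|_1$ with $c = (\alpha, e)$. The function names, the value of `lam`, the iteration count, the toy dictionary and the rule of picking the template with the largest coefficient are all assumptions; other sparse-representation classifiers compare reconstruction residuals instead.

```python
def soft_threshold(v, thr):
    """Shrink v toward zero by thr (the proximal operator of the l1 norm)."""
    if v > thr:
        return v - thr
    if v < -thr:
        return v + thr
    return 0.0


def solve_l1(D, y, lam=0.1, iterations=2000):
    """Minimize ||y - [D, I] c||_2^2 + lam * ||c||_1 with ISTA.

    D is a list of template vectors (the dictionary columns) and y the test
    vector. Returns (alpha, e): template coefficients and trivial coefficients.
    """
    m, n = len(y), len(D)
    # Columns of B = [D, I]: the templates followed by the identity (trivial) templates.
    cols = [list(d) for d in D]
    cols += [[1.0 if i == j else 0.0 for i in range(m)] for j in range(m)]
    # Step size 1/L, where L = 2 * ||B||_F^2 bounds the gradient's Lipschitz constant.
    L = 2.0 * sum(v * v for col in cols for v in col)
    c = [0.0] * (n + m)
    for _ in range(iterations):
        residual = [sum(cols[j][i] * c[j] for j in range(n + m)) - y[i] for i in range(m)]
        grad = [2.0 * sum(col[i] * residual[i] for i in range(m)) for col in cols]
        c = [soft_threshold(c[j] - grad[j] / L, lam / L) for j in range(n + m)]
    return c[:n], c[n:]


def classify(D, y, lam=0.1):
    """Return the index of the template with the largest coefficient, and alpha."""
    alpha, _ = solve_l1(D, y, lam)
    return max(range(len(alpha)), key=lambda j: alpha[j]), alpha


# Three toy templates with disjoint supports; y is template 1 with one pixel occluded.
D = [[1.0] * 4 + [0.0] * 8,
     [0.0] * 4 + [1.0] * 4 + [0.0] * 4,
     [0.0] * 8 + [1.0] * 4]
y = [0.0] * 4 + [1.0, 1.0, 1.0, 0.0] + [0.0] * 4
label, alpha = classify(D, y)
```

In this toy case the occluded pixel is absorbed by a trivial coefficient in `e`, so the correct template still receives the dominant coefficient.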

Advantages of the invention: the method takes full account of the excessive time complexity of exhaustively searching the template library and of the limited robustness of template-matching pose estimation to occlusion and cluttered backgrounds. Introducing prior knowledge of the sampling viewpoint poses to build the prior hierarchical tree accelerates template matching and avoids the slow matching that arises when a few child nodes contain too many templates. Matching with the similarity measure at the upper levels of the tree and classifying with sparse representation classification at the bottom level effectively reduces mismatches. The method realizes pose estimation of three-dimensional texture-free targets and improves robustness under cluttered backgrounds and target occlusion.

Drawings

FIG. 1 is a schematic diagram of template library sampling;

FIG. 2 is a schematic diagram of the generation of an a priori hierarchical tree in a two dimensional pose space;

FIG. 3 is an example of a generated template image;

FIG. 4 is a pose estimation result of the TM-SRC-based three-dimensional target pose estimation method on a test set in the embodiment;

FIG. 5 is an example of a test image under severe occlusion together with the bottom-level templates to be matched;

FIG. 6 is a flow chart of the method implementation.

Detailed Description

The present invention is further described with reference to the following examples and the accompanying drawings, which are not intended to limit the scope of the invention as claimed.

Examples

The embodiment uses a public, challenging data set of three-dimensional texture-free targets to test and evaluate the TM-SRC-based three-dimensional texture-free target pose estimation method. The data set consists of several texture-free targets with strongly reflective surfaces. The camera captures the targets within ±40° about the x and y axes, ±180° about the z axis, and at a distance of 680 to 800 mm from the center of the three-dimensional target; the image resolution of the data set is 640 × 480. Images with occlusion and cluttered backgrounds are selected from the data set to build a test set. Every image in the test set has a cluttered background, and the test set is divided into three parts by occlusion condition (no occlusion, slight occlusion and severe occlusion), with 20 test images in each part.

The computer used in the embodiment has an Intel(R) Xeon(R) E5-2630 v2 2.60 GHz CPU and 64.00 GB of memory. The embodiment runs on the Visual Studio 2017 platform under Windows 10 and is implemented in C/C++ on top of the OpenCV vision library.

The method is applied to the estimation of the pose of the image target in the test set, and comprises the following specific steps:

Step one: template images at different viewpoints are generated from the three-dimensional target CAD model, and the PCOF features of the template images are extracted and stored, completing the construction of the template library offline. The pose of the target is represented by four parameters: the sampling viewpoint radius r, the rotation angles α and β about the x and y axes, and the rotation angle γ about the optical axis z. According to the pose range of the target images in the test set, the sampling range of the template library is chosen as r from 680 mm to 800 mm, α and β within ±40°, and γ within ±180°; the sampling intervals are 30 mm for r, 8° for α and β, and 5° for γ. The final number of sampling viewpoints in the template library is 36001. The CAD model file is first read and parsed with OpenGL, and is then projected with the projection transformation matrix obtained from the pose parameters of each sampling viewpoint to obtain the template images of the three-dimensional texture-free target. FIG. 3 shows an example of a generated template image of the three-dimensional texture-free target.

The features of the templates in the template library are then extracted and stored. PCOF features are extracted for all template images: a series of template images is generated near each sampling viewpoint for histogram statistics, and the random parameters of each template are confined to a limited range so that the shape changes they cause can be handled. The random range was determined experimentally as r ± 30 mm for the distance, ±8° for the rotation angles α and β about the x and y axes, and ±5° for the rotation angle γ about the optical axis. The number of templates generated at each sampling viewpoint is 800, and the threshold for extracting the main directions of the gradient direction histogram is set to 100.

Step two: and B, constructing a priori hierarchical tree by using priori knowledge of the sampling viewpoint pose when a template library is established in the first step, regarding each sampling template as one point in a four-dimensional space, then dividing adjacent points in the pose space into one class, tracking a central template of each class to the next layer in the four-dimensional space, dividing the pose space by using the priori knowledge to complete tree construction, and simultaneously performing down-sampling on each layer of templates to further accelerate matching.

The template library contains 36001 sampled templates, and a three-level prior hierarchical tree is finally built. From the bottom level to the top level, the numbers of templates are 36001, 4500 and 324, and the resolutions are 300 × 300, 150 × 150 and 75 × 75, respectively.

Step three: an image sample from the three-dimensional texture-free target test set is input. The test image is first down-sampled to build its image pyramid; the template features and index information of the prior hierarchical tree built offline are then loaded into memory. Sparse representation classification is introduced into the template-matching pose estimation method to strengthen robustness to foreground occlusion and cluttered backgrounds. Matching with the similarity measure at the upper levels of the prior hierarchical tree yields the candidate template with the highest similarity at each level, and the child-node templates of the candidate at the bottom level, together with their pose parameters, are obtained. Then 20 templates are generated by random sampling near the pose parameters of each child node to build an overcomplete dictionary. Finally, matching is performed by sparse representation classification to obtain the best-matching template, and the index of the best-matching template and its associated pose parameters are output. The result of the TM-SRC-based three-dimensional target pose estimation method is shown in FIG. 4.
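The top-down traversal in this step can be sketched independently of how the scores are computed. In the example below, `bottom_candidates`, the template identifiers and the toy scores are hypothetical; `score` stands in for the similarity measure evaluated at the matching pyramid level, and the returned list is what would be handed to sparse representation classification.

```python
def bottom_candidates(top_level, children, score):
    """Descend the tree greedily and return the bottom-level templates to classify.

    top_level: template ids at the top level of the tree.
    children:  dict mapping a template id to the ids of its child templates;
               bottom-level templates do not appear as keys.
    score:     callable giving the similarity of a template id to the test image.
    """
    best = max(top_level, key=score)
    while True:
        kids = children.get(best, [])
        if not kids:
            return [best]  # the tree has a single level
        if not any(k in children for k in kids):
            return kids  # kids are bottom-level templates
        best = max(kids, key=score)  # only the winner's children are examined


# Toy three-level tree with made-up similarity scores.
children = {"A": ["A1", "A2"], "B": ["B1", "B2"],
            "A1": ["a", "b"], "A2": ["c", "d"],
            "B1": ["e", "f"], "B2": ["g", "h"]}
scores = {"A": 0.4, "B": 0.7, "A1": 0.9, "A2": 0.2, "B1": 0.6, "B2": 0.8}
candidates = bottom_candidates(["A", "B"], children, lambda t: scores.get(t, 0.0))
```

Only two similarity evaluations per level are needed here instead of scoring all eight bottom-level templates, which is the source of the speed-up over exhaustive search.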

The steps above are the specific application of the invention to pose estimation of three-dimensional texture-free targets on the selected test set. First, to verify that introducing sparse representation classification at the bottom level of the prior hierarchical tree is justified, the results of matching at the bottom level with the similarity measure and with sparse representation classification are compared. FIG. 5 shows an example of a test image under severe occlusion together with the bottom-level templates to be matched. Table 1 gives the similarity score between each template and the test image and the absolute pose error of each template. As Table 1 shows, under severe occlusion the similarity scores between the templates and the test image are all very low. template_9 has the highest similarity score, but it is clearly not the correct template: both template_2 and template_8 have smaller absolute pose errors than template_9. If the similarity measure were still used for matching at the bottom level of the prior hierarchical tree, a wrong template could therefore score higher than the correct one and produce a mismatch. The invention instead matches at the bottom level of the prior hierarchical tree with sparse representation classification. As Table 2 shows, the invention correctly matches template_2, so the experiment confirms that introducing sparse representation classification at the bottom level of the prior hierarchical tree is justified.

TABLE 1 Similarity score between each template and the test image at the bottom level of the prior hierarchical tree, and the absolute pose error of each template

TABLE 2 Matching results of PCOF and TM-SRC at the bottom level of the prior hierarchical tree

In order to verify the effectiveness of the method, a pose estimation method based on template matching using PCOF features is set as an experimental comparison method, and the performance of the TM-SRC-based three-dimensional non-texture target pose estimation method is evaluated by taking the average absolute error as a performance evaluation index. The pose estimation results for the test data set are shown in table 3.

TABLE 3 Pose estimation results of the TM-SRC-based pose estimation method on the test data set

As Table 3 shows, on the test data set the mean absolute error of TM-SRC is slightly lower than that of PCOF under no occlusion and slight occlusion, and under severe occlusion TM-SRC performs far better than PCOF. The experimental results show that the TM-SRC-based three-dimensional texture-free target pose estimation method effectively improves the robustness of pose estimation when the three-dimensional texture-free target is occluded and the background is cluttered.
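The mean absolute error used as the evaluation index can be computed per pose parameter as sketched below. The function name and the sample poses are hypothetical toy values, not results from the experiments, and the sketch does not handle wrap-around of the rotation about the optical axis at ±180°.

```python
def mean_absolute_error(estimated, ground_truth):
    """Per-parameter mean absolute error over a list of poses.

    Each pose is a tuple (r, alpha, beta, gamma); both lists must have the
    same length and ordering.
    """
    n = len(estimated)
    dims = len(estimated[0])
    return [sum(abs(e[k] - g[k]) for e, g in zip(estimated, ground_truth)) / n
            for k in range(dims)]


# Toy poses: (r in mm, alpha, beta, gamma in degrees).
estimated = [(700, 10, -5, 90), (710, 12, -3, 80)]
ground_truth = [(690, 8, -5, 100), (710, 16, -6, 85)]
errors = mean_absolute_error(estimated, ground_truth)
```

Reporting the error separately for distance and for each angle keeps quantities with different units from being mixed into one number.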
