Hierarchical template matching method based on multi-dimensional pyramid

Document No.: 551880. Publication date: 2021-05-14.

Abstract: Hierarchical template matching method based on a multi-dimensional pyramid (一种基于多维金字塔的分层模板匹配方法), created by Xiong Zhenhua (熊振华), Chai Ziqi (柴子奇), Wu Jianhua (吴建华) and Zhu Xiangyang (朱向阳) on 2021-02-01. The invention discloses a hierarchical template matching method based on a multi-dimensional pyramid. Offline-rendered template data are clustered according to the viewpoint parameters used during rendering, and pyramid structures are established in multiple dimensions to make the matching process more efficient. The method comprises the following steps: step 1, in an offline generation process, obtaining a color image and a depth image; step 2, constructing a multi-dimensional template pyramid; step 3, in an online matching process, obtaining an input feature map; step 4, obtaining a high-level matching result; step 5, obtaining the approximate region where the object is located and using it as a region of interest (ROI) on the two-dimensional image; step 6, performing a matching test with the corresponding low-level pyramid templates; step 7, performing a random sample consensus (RANSAC) check on the matched poses to obtain the final detection and pose estimation result of the object. The invention works from a CAD model, is suitable for industrial applications, supports fast template lookup, and balances matching speed against matching accuracy.

1. A hierarchical template matching method based on a multi-dimensional pyramid is characterized in that offline rendered template data are clustered according to viewpoint parameters during rendering, pyramid structures under multiple dimensions are built, and efficiency optimization in a matching process is achieved, wherein the method comprises the following steps:

step 1, in an offline generation process, generating camera observation viewpoint parameters by recursively subdividing the triangular faces of a regular icosahedron, and rendering template data with OpenGL to obtain a color image and a depth image;

step 2, extracting features from the template data, reorganizing the templates according to the viewpoint parameters, and constructing a multi-dimensional template pyramid;

step 3, in an online matching process, first performing feature extraction on an input sample to obtain an input feature map;

step 4, performing a matching test over the entire input feature map using the templates of the high-level pyramid to obtain a high-level matching result;

step 5, performing non-maximum suppression on the matched positions on the feature map to obtain the approximate region where the object is located, and using that region as a region of interest (ROI) on the two-dimensional image;

step 6, within the ROI, continuing the matching test with the corresponding low-level pyramid templates, selected according to the indexes of the matched high-level pyramid templates;

step 7, after the templates of the bottom pyramid level have been matched, performing a random sample consensus (RANSAC) check on the matched poses to obtain the final detection and pose estimation result of the object.

2. The multi-dimensional pyramid-based hierarchical template matching method according to claim 1, wherein generating the camera observation viewpoint parameters by recursive subdivision of the triangular faces of the regular icosahedron in step 1 comprises the following steps:

step 1.1, constructing an inscribed regular icosahedron in a sphere of radius 1 to obtain its vertex coordinates and triangular face indexes;

step 1.2, taking the midpoint of each edge of each triangular face and projecting it onto the sphere to obtain a new vertex, thereby subdividing each triangular face of the regular icosahedron and obtaining the new vertices and triangular face indexes;

step 1.3, repeating the recursive triangle subdivision until the desired number of recursion levels is reached;

step 1.4, scaling all viewpoints by the target radii to obtain the coordinates of all viewpoints on spheres of different radii;

step 1.5, placing the target object model at the origin of the coordinate system, placing the camera at each obtained vertex position with its optical axis pointing to the origin of the coordinate system, setting the in-plane rotation angle of the camera about the optical axis, and rendering to obtain a color image and a depth image for each set of viewpoint parameters.

3. The multi-dimensional pyramid-based hierarchical template matching method of claim 1, wherein the multi-dimensional template pyramid construction in step 2 comprises the following steps:

step 2.1, extracting features from the original templates using the same feature construction as the original LINEMOD;

step 2.2, for the camera viewpoint parameters, building a pyramid index in the radius dimension; because the viewpoints at each radius lie on a real physical sphere, each such layer is called a physical radius layer; from the top of the pyramid to the bottom the sampling of the radius parameter becomes denser, and this pyramid is called the radius-dimension pyramid;

step 2.3, for the camera viewpoint parameters, building a pyramid index in the in-plane rotation dimension of the camera; because camera viewpoints with different in-plane rotations coincide in space, each such layer is called a virtual radius layer; from the top of the pyramid to the bottom the sampling of the in-plane rotation parameter becomes denser, and this pyramid is called the in-plane-rotation-dimension pyramid;

step 2.4, for the camera viewpoint parameters, building a pyramid index in the recursive-subdivision dimension of the triangular faces of the regular icosahedron; from top to bottom the pyramid levels correspond to the levels of recursive triangle subdivision, and this pyramid is called the recursive-subdivision-dimension pyramid.

4. The multi-dimensional pyramid-based hierarchical template matching method according to claim 1, wherein the multi-dimensional template pyramid in step 2 uses the hierarchical structure of the viewpoint parameters in space: in the physical radius dimension, the templates are organized from coarse to fine into a radius pyramid; in the in-plane rotation dimension, different in-plane rotations are treated as virtual radius layers and organized from coarse to fine into an in-plane rotation pyramid, in a manner similar to the physical radius layers; and in the recursive-subdivision dimension of the triangular faces of the regular icosahedron, the templates are organized from coarse to fine into a recursive-subdivision pyramid through the parent-child relationships of the recursive subdivision.

5. The multi-dimensional pyramid-based hierarchical template matching method according to claim 1, wherein the high-level pyramid matching in step 4 uses the templates in the high levels of the radius-dimension, recursive-subdivision-dimension and in-plane-rotation-dimension pyramids, and uses the SSE instruction set to accelerate the similarity computation in parallel.

6. The multi-dimensional pyramid-based hierarchical template matching method of claim 1, wherein the hierarchical template matching within the ROI in step 6 comprises the steps of:

step 6.1, in each ROI, taking the first N candidates of the initial matching result, obtaining their corresponding low-level template indexes in the three pyramids (radius dimension, recursive-subdivision dimension and in-plane-rotation dimension), and continuing the matching to obtain new candidates;

step 6.2, retaining the first N matching results among the new candidates, and repeating the process down to the bottom pyramid level.

7. The multi-dimensional pyramid-based hierarchical template matching method of claim 1, wherein the multi-dimensional pyramid constrains the connection relationships between similar viewpoints, so that nearest-neighbor templates can be indexed well.

8. The multi-dimensional pyramid-based hierarchical template matching method of claim 1, wherein a hierarchical search algorithm used in the matching method is divided into two stages, namely an initial search stage and a hierarchical search stage.

9. The multi-dimensional pyramid-based hierarchical template matching method of claim 8, wherein when an initial candidate value is generated in the initial search stage, a multi-dimensional pyramid high-level template is used to perform global search on an input image, so as to obtain an ROI on the image and an initial candidate matching template in the multi-dimensional pyramid.

10. The multi-dimensional pyramid-based hierarchical template matching method according to claim 8, wherein the hierarchical search proceeds from the initial candidates, which on the one hand reduces the global search space and on the other hand, as the search moves from the high levels of the pyramid to the low levels, increases the sampling density of the parameters in each dimension, so that the local search becomes finer-grained and the matching accuracy is ensured; and in the hierarchical search stage, matching is carried out only within the ROI, and the low-level pyramid templates in the multi-dimensional pyramid are searched according to the initial candidate matching templates.

Technical Field

The invention relates to the field of object detection and six-degree-of-freedom pose estimation, in particular to a hierarchical template matching method based on a multi-dimensional pyramid.

Background

Object recognition and 6D pose estimation is a key problem in machine vision. Its goal is to give a robot the information it needs to manipulate a target object, that is, to determine what the object is and where it is. The 6D pose of the target is the transformation between the object coordinate system and the coordinate system of the vision sensor (camera), and consists of a 3D translation and a 3D rotation. Object recognition and 6D pose estimation remains an important and challenging topic in many practical robot applications, and is a key technology for industrial robots performing intelligent tasks (grasping, polishing, assembly and the like).

The traditional template matching method extracts a series of image patches from an input image with a sliding window and compares each scene patch with the object templates using an image correlation coefficient. This requires image templates of the object under different poses, lighting conditions and backgrounds. Because such image templates generalize poorly, so many templates are needed that real-time performance suffers, and the approach was superseded early on by local feature point methods. Hinterstoisser proposed the LINE2D and LINEMOD algorithms in 2011 and 2012. These algorithms are still template matching frameworks, but their templates differ from traditional ones: a template consists of the contour edge features and surface normal features of the target object and does not depend on the object's texture. To recognize the object in arbitrary poses, these algorithms need object templates at many viewing angles and scales, so the number of templates is large. To speed up the similarity computation between image patches and templates, the algorithms apply a series of optimizations, such as parallel computation with SIMD instructions (the SSE instruction set). However, when the scale is unknown and the object pose varies over a wide range, the number of templates can reach the order of ten thousand, and the resulting poor real-time performance restricts the use of these algorithms in industry.

Therefore, those skilled in the art have been devoted to developing a hierarchical template matching method based on a multi-dimensional pyramid to solve these problems.

Disclosure of Invention

In view of the above defects in the prior art, the technical problem to be solved by the present invention is how to improve the matching real-time performance of the template matching method, more specifically, the LINEMOD method, in the large-scale search space, and improve the adaptability of the LINEMOD method in industrial applications.

In order to achieve the above object, the present invention provides a hierarchical template matching method based on a multidimensional pyramid, which clusters off-line rendered template data according to viewpoint parameters during rendering, establishes pyramid structures under multiple dimensions, and achieves efficiency optimization in a matching process, wherein the method comprises the following steps:

step 1, in an offline generation process, generating camera observation viewpoint parameters by recursively subdividing the triangular faces of a regular icosahedron, and rendering template data with OpenGL to obtain a color image and a depth image;

step 2, extracting features from the template data, reorganizing the templates according to the viewpoint parameters, and constructing a multi-dimensional template pyramid;

step 3, in an online matching process, first performing feature extraction on an input sample to obtain an input feature map;

step 4, performing a matching test over the entire input feature map using the templates of the high-level pyramid to obtain a high-level matching result;

step 5, performing non-maximum suppression on the matched positions on the feature map to obtain the approximate region where the object is located, and using that region as a region of interest (ROI) on the two-dimensional image;

step 6, within the ROI, continuing the matching test with the corresponding low-level pyramid templates, selected according to the indexes of the matched high-level pyramid templates;

step 7, after the templates of the bottom pyramid level have been matched, performing a random sample consensus (RANSAC) check on the matched poses to obtain the final detection and pose estimation result of the object.

Further, generating the camera observation viewpoint parameters by recursive subdivision of the triangular faces of the regular icosahedron in step 1 comprises the following steps:

step 1.1, constructing an inscribed regular icosahedron in a sphere of radius 1 to obtain its vertex coordinates and triangular face indexes;

step 1.2, taking the midpoint of each edge of each triangular face and projecting it onto the sphere to obtain a new vertex, thereby subdividing each triangular face of the regular icosahedron and obtaining the new vertices and triangular face indexes;

step 1.3, repeating the recursive triangle subdivision until the desired number of recursion levels is reached;

step 1.4, scaling all viewpoints by the target radii to obtain the coordinates of all viewpoints on spheres of different radii;

step 1.5, placing the target object model at the origin of the coordinate system, placing the camera at each obtained vertex position with its optical axis pointing to the origin of the coordinate system, setting the in-plane rotation angle of the camera about the optical axis, and rendering to obtain a color image and a depth image for each set of viewpoint parameters.
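The camera placement of step 1.5 can be sketched in Python with NumPy as follows. This is only an illustration under stated assumptions: the function name `camera_pose`, the choice of world "up" vector, and the convention that the camera's +z axis is the optical axis are not taken from the invention. A renderer that follows the usual OpenGL convention (camera looking along -z) would need an additional axis flip.

```python
import numpy as np

def camera_pose(cam_pos, inplane_deg=0.0):
    """World-to-camera rotation and translation for a camera at cam_pos
    whose optical axis (+z, an assumed convention) points at the origin."""
    cam_pos = np.asarray(cam_pos, dtype=float)
    z = -cam_pos / np.linalg.norm(cam_pos)       # optical axis toward the origin
    up = np.array([0.0, 0.0, 1.0])               # assumed world "up" direction
    if abs(np.dot(up, z)) > 0.999:               # viewpoint (almost) on the up axis
        up = np.array([0.0, 1.0, 0.0])
    x = np.cross(up, z)
    x /= np.linalg.norm(x)
    y = np.cross(z, x)                           # completes a right-handed frame
    R_cw = np.stack([x, y, z], axis=1)           # camera axes as columns
    a = np.deg2rad(inplane_deg)
    Rz = np.array([[np.cos(a), -np.sin(a), 0.0],
                   [np.sin(a),  np.cos(a), 0.0],
                   [0.0,        0.0,       1.0]])
    R_cw = R_cw @ Rz                             # in-plane rotation about the optical axis
    R_wc = R_cw.T                                # world -> camera
    t_wc = -R_wc @ cam_pos
    return R_wc, t_wc
```

Because the object sits at the world origin, its position in the camera frame is simply `t_wc`, which lies on the optical axis at a distance equal to the viewpoint radius for every in-plane angle.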

Further, the building of the multidimensional template pyramid in the step 2 comprises the following steps:

step 2.1, extracting features from the original templates using the same feature construction as the original LINEMOD;

step 2.2, for the camera viewpoint parameters, building a pyramid index in the radius dimension; because the viewpoints at each radius lie on a real physical sphere, each such layer is called a physical radius layer; from the top of the pyramid to the bottom the sampling of the radius parameter becomes denser, and this pyramid is called the radius-dimension pyramid;

step 2.3, for the camera viewpoint parameters, building a pyramid index in the in-plane rotation dimension of the camera; because camera viewpoints with different in-plane rotations coincide in space, each such layer is called a virtual radius layer; from the top of the pyramid to the bottom the sampling of the in-plane rotation parameter becomes denser, and this pyramid is called the in-plane-rotation-dimension pyramid;

step 2.4, for the camera viewpoint parameters, building a pyramid index in the recursive-subdivision dimension of the triangular faces of the regular icosahedron; from top to bottom the pyramid levels correspond to the levels of recursive triangle subdivision, and this pyramid is called the recursive-subdivision-dimension pyramid.

Further, the multi-dimensional template pyramid in step 2 uses the hierarchical structure of the viewpoint parameters in space: in the physical radius dimension, the templates are organized from coarse to fine into a radius pyramid; in the in-plane rotation dimension, different in-plane rotations are treated as virtual radius layers and organized from coarse to fine into an in-plane rotation pyramid, in a manner similar to the physical radius layers; and in the recursive-subdivision dimension of the triangular faces of the regular icosahedron, the templates are organized from coarse to fine into a recursive-subdivision pyramid through the parent-child relationships of the recursive subdivision.

Further, the high-level pyramid matching in step 4 uses the templates in the high levels of the radius-dimension, recursive-subdivision-dimension and in-plane-rotation-dimension pyramids, and uses the SSE instruction set to accelerate the similarity computation in parallel.

Further, the step 6 of performing hierarchical template matching within the ROI comprises the steps of:

step 6.1, in each ROI, taking the first N candidates of the initial matching result, obtaining their corresponding low-level template indexes in the three pyramids (radius dimension, recursive-subdivision dimension and in-plane-rotation dimension), and continuing the matching to obtain new candidates;

step 6.2, retaining the first N matching results among the new candidates, and repeating the process down to the bottom pyramid level.

Furthermore, the multi-dimensional pyramid constrains the connection relationships between similar viewpoints, so that nearest-neighbor templates can be indexed well.

Furthermore, the hierarchical search algorithm used by the matching method is divided into an initial search stage and a hierarchical search stage.

Further, when the initial candidate value is generated in the initial search stage, global search is performed on the input image by using the multi-dimensional pyramid high-level template to obtain an ROI on the image and an initial candidate matching template in the multi-dimensional pyramid.

Furthermore, the hierarchical search proceeds from the initial candidates, which on the one hand reduces the global search space and on the other hand, as the search moves from the high levels of the pyramid to the low levels, increases the sampling density of the parameters in each dimension, so that the local search becomes finer-grained and the matching accuracy is ensured; and in the hierarchical search stage, matching is carried out only within the ROI, and the low-level pyramid templates in the multi-dimensional pyramid are searched according to the initial candidate matching templates.

Compared with the prior art, the invention has the following characteristics:

A color image and depth image template of the object can be rendered quickly and conveniently from a CAD model, which makes the method suitable for industrial applications.

Based on the multiple dimensions of the viewpoint parameters, a multi-dimensional template pyramid is established to index the templates (viewpoint parameters), so that similar and adjacent templates can be looked up quickly from coarse to fine.

Matching is performed in two steps. The initial matching determines the ROI on the two-dimensional input image and the initial candidates in the multi-dimensional pyramid. The hierarchical search then, on the one hand, uses the initial candidates to reduce the global search space and, on the other hand, makes the local search finer-grained as the pyramid level increases, which balances matching speed against matching accuracy.

The conception, the specific structure and the technical effects of the present invention will be further described with reference to the accompanying drawings to fully understand the objects, the features and the effects of the present invention.

Drawings

FIG. 1 is a schematic overall flow chart of offline rendering and online matching according to a preferred embodiment of the present invention;

FIG. 2 is a schematic diagram of a triangular recursive subdivision dimension pyramid of the present invention;

FIG. 3 is a schematic diagram of the pyramid of the in-plane rotation (virtual radius layer) and radius (physical radius layer) dimensions of the present invention;

FIG. 4 is a general schematic of the multi-dimensional pyramid of the present invention;

FIG. 5 is a visualization of the results of the matching process of the present invention;

FIG. 6 is a diagram of the effect of matching and sorting the scattered parts one by one in the industrial scene of the invention.

Detailed Description

The technical contents of the preferred embodiments of the present invention will be more clearly and easily understood by referring to the drawings attached to the specification. The present invention may be embodied in many different forms of embodiments and the scope of the invention is not limited to the embodiments set forth herein.

In the drawings, structurally identical elements are represented by like reference numerals, and structurally or functionally similar elements are represented by like reference numerals throughout the several views. The size and thickness of each component shown in the drawings are arbitrarily illustrated, and the present invention is not limited to the size and thickness of each component. The thickness of the components may be exaggerated where appropriate in the figures to improve clarity.

In the hierarchical template matching method based on a multi-dimensional pyramid, during the offline generation process, camera observation viewpoint parameters are generated by recursively subdividing the triangular faces of a regular icosahedron, and the template data are rendered with OpenGL to obtain color images and depth images. The specific steps are as follows:

As shown in fig. 2, an inscribed regular icosahedron is constructed in a sphere of radius 1, and its vertex coordinates and triangular face indexes are obtained.

The midpoint of each edge of each triangular face is taken and projected onto the sphere to obtain a new vertex; this subdivides each triangular face of the regular icosahedron and yields the new vertices and new triangular face indexes.

The recursive triangle subdivision is repeated until the desired number of recursion levels is reached.

All viewpoints are scaled by the target radii to obtain the coordinates of all viewpoints on spheres of different radii. Fig. 3 is a schematic diagram of the pyramids in the in-plane rotation (virtual radius layer) and radius (physical radius layer) dimensions.

The target object model is placed at the origin of the coordinate system, the camera is placed at each obtained vertex position with its optical axis pointing to the origin of the coordinate system, the in-plane rotation angle of the camera about the optical axis is set, and a color image and a depth image are rendered for each set of viewpoint parameters.
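The viewpoint sampling described above can be illustrated with a short Python/NumPy sketch. The function names and the example radii are hypothetical, and the vertex and face tables are one common way of writing down a regular icosahedron rather than anything prescribed by the invention.

```python
import numpy as np

def icosahedron():
    """Vertices on the unit sphere and the 20 triangular faces."""
    phi = (1.0 + 5.0 ** 0.5) / 2.0
    verts = np.array([
        [-1,  phi, 0], [1,  phi, 0], [-1, -phi, 0], [1, -phi, 0],
        [0, -1,  phi], [0, 1,  phi], [0, -1, -phi], [0, 1, -phi],
        [ phi, 0, -1], [ phi, 0, 1], [-phi, 0, -1], [-phi, 0, 1],
    ], dtype=float)
    verts /= np.linalg.norm(verts, axis=1, keepdims=True)
    faces = [
        (0, 11, 5), (0, 5, 1), (0, 1, 7), (0, 7, 10), (0, 10, 11),
        (1, 5, 9), (5, 11, 4), (11, 10, 2), (10, 7, 6), (7, 1, 8),
        (3, 9, 4), (3, 4, 2), (3, 2, 6), (3, 6, 8), (3, 8, 9),
        (4, 9, 5), (2, 4, 11), (6, 2, 10), (8, 6, 7), (9, 8, 1),
    ]
    return verts, faces

def subdivide(verts, faces):
    """Split every triangle into four, projecting edge midpoints onto the sphere."""
    verts = list(verts)
    midpoint_of = {}                      # edge (i, j), i < j -> new vertex index

    def midpoint(i, j):
        key = (min(i, j), max(i, j))
        if key not in midpoint_of:        # each shared edge is split only once
            m = (verts[i] + verts[j]) / 2.0
            verts.append(m / np.linalg.norm(m))
            midpoint_of[key] = len(verts) - 1
        return midpoint_of[key]

    new_faces = []
    for a, b, c in faces:
        ab, bc, ca = midpoint(a, b), midpoint(b, c), midpoint(c, a)
        new_faces += [(a, ab, ca), (b, bc, ab), (c, ca, bc), (ab, bc, ca)]
    return np.array(verts), new_faces

def sample_viewpoints(levels, radii):
    """Unit-sphere viewpoints after `levels` subdivisions, scaled to each radius."""
    verts, faces = icosahedron()
    for _ in range(levels):
        verts, faces = subdivide(verts, faces)
    return {r: verts * r for r in radii}, faces
```

After n subdivision levels the mesh has 10·4^n + 2 vertices and 20·4^n faces, so two levels give 162 viewpoints per radius, for example via `sample_viewpoints(2, [600.0, 800.0])` with radii in whatever unit the CAD model uses.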

As shown in fig. 4, the general schematic diagram of the multi-dimensional pyramid, features are extracted from the template data, the templates are reorganized according to the viewpoint parameters, and a multi-dimensional template pyramid is constructed. The multi-dimensional pyramid constrains the connection relationships between similar viewpoints, so that nearest-neighbor templates can be indexed well. The specific steps are as follows:

Feature extraction is performed on the original templates using the same feature construction as the original LINEMOD.

For the camera viewpoint parameters, a pyramid index is built in the radius dimension. Because the viewpoints at each radius lie on a real physical sphere, each such layer is called a physical radius layer. From the top of the pyramid to the bottom the sampling of the radius parameter becomes denser, and this pyramid is called the radius-dimension pyramid.

For the camera viewpoint parameters, a pyramid index is built in the in-plane rotation dimension of the camera. Because camera viewpoints with different in-plane rotations coincide in space, each such layer is called a virtual radius layer. From the top of the pyramid to the bottom the sampling of the in-plane rotation parameter becomes denser, and this pyramid is called the in-plane-rotation-dimension pyramid.

For the camera viewpoint parameters, a pyramid index is built in the recursive-subdivision dimension of the triangular faces of the regular icosahedron. From top to bottom the pyramid levels correspond to the levels of recursive triangle subdivision, and this pyramid is called the recursive-subdivision-dimension pyramid.
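One possible way to represent such pyramid indexes is sketched below in Python with NumPy. The invention does not specify how parents and children are linked, so everything here is an assumption made for illustration: the function names, the rule that each finer level inserts one sample between neighbouring coarse samples, and the rule that a coarse icosahedron vertex has as children itself and the midpoints of its incident edges.

```python
import numpy as np

def build_scalar_pyramid(lo, hi, coarse_count, levels):
    """Pyramid over one scalar viewpoint parameter (radius or in-plane angle).

    values[0] is the top (coarsest) level; each lower level inserts one new
    sample between every pair of neighbouring samples.  children[k][i] lists
    the indexes at level k + 1 reachable from sample i at level k.
    """
    values, n = [], coarse_count
    for _ in range(levels):
        values.append(np.linspace(lo, hi, n))
        n = 2 * n - 1
    children = []
    for coarse, fine in zip(values[:-1], values[1:]):
        children.append({
            i: [j for j in (2 * i - 1, 2 * i, 2 * i + 1) if 0 <= j < len(fine)]
            for i in range(len(coarse))
        })
    return values, children

def subdivision_children(midpoint_of, n_coarse):
    """Children of each coarse vertex in the recursive-subdivision dimension.

    midpoint_of maps an edge (i, j) of the coarse mesh to the index of the
    vertex created at its midpoint by one subdivision step.
    """
    children = {v: [v] for v in range(n_coarse)}
    for (i, j), m in midpoint_of.items():
        children[i].append(m)
        children[j].append(m)
    return children
```

Letting neighbouring coarse samples share children, as both rules above do, means that a slightly wrong choice at a coarse level can still reach the correct fine sample, at the cost of some repeated evaluations.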

In the online matching process, firstly, the query data is subjected to feature extraction to obtain an input feature map.

On the entire input feature map, an initial search is performed with the templates of the high-level pyramid to obtain a high-level matching result and generate the initial candidates. In this process, the whole input image is searched globally with the templates in the high levels of the radius-dimension, recursive-subdivision-dimension and in-plane-rotation-dimension pyramids, with the aim of obtaining the ROI on the image and the initial candidates in the high levels of the multi-dimensional pyramid. All similarity computations use the SSE instruction set, consistent with LINEMOD, for parallel acceleration.
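For orientation, the kind of similarity score evaluated at each template position can be sketched as below. This is a plain NumPy illustration of a LINEMOD-style gradient-orientation measure (the absolute cosine of the orientation difference, maximized over a small neighbourhood), not the optimized form: LINEMOD itself quantizes orientations, precomputes response maps and evaluates them with SSE instructions. The function name, the feature layout `(dx, dy, theta)` and the neighbourhood half-width `T` are assumptions for this sketch, and surface-normal features are omitted.

```python
import numpy as np

def orientation_similarity(template_feats, image_ori, image_valid, offset, T=2):
    """Mean over template features of the best |cos| orientation agreement.

    template_feats: list of (dx, dy, theta) relative to the template anchor.
    image_ori:      HxW array of gradient orientations in radians.
    image_valid:    HxW boolean array, True where the gradient is strong enough.
    offset:         (x, y) position of the template anchor in the image.
    """
    ox, oy = offset
    h, w = image_ori.shape
    total = 0.0
    for dx, dy, theta in template_feats:
        x, y = ox + dx, oy + dy
        x0, x1 = max(0, x - T), min(w, x + T + 1)
        y0, y1 = max(0, y - T), min(h, y + T + 1)
        if x0 >= x1 or y0 >= y1:
            continue                          # feature falls outside the image
        patch = image_ori[y0:y1, x0:x1]
        mask = image_valid[y0:y1, x0:x1]
        if mask.any():
            # |cos| ignores the gradient sign, so contrast reversal does not matter
            total += np.abs(np.cos(patch[mask] - theta)).max()
    return total / len(template_feats)
```

A global search in this style evaluates the score for every high-level template at every image position and keeps the positions whose score exceeds a threshold.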

Non-maximum suppression is performed on the matched positions on the feature map to obtain the approximate region where the object is located, which is used as the ROI on the two-dimensional image.
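A minimal Python sketch of this step is given below, assuming a greedy distance-based suppression and a rectangular ROI around each surviving peak. The function name, the `min_dist` and `margin` parameters and the ROI shape are illustrative choices, since the text does not specify them.

```python
def nms_to_rois(matches, tmpl_w, tmpl_h, img_w, img_h, min_dist, margin=0.25):
    """Greedy non-maximum suppression followed by ROI construction.

    matches: list of (x, y, score), with (x, y) the matched template centre.
    Returns the surviving peaks and one ROI (x0, y0, x1, y1) per peak.
    """
    kept = []
    for x, y, s in sorted(matches, key=lambda m: m[2], reverse=True):
        # keep a match only if no stronger match lies within min_dist pixels
        if all((x - kx) ** 2 + (y - ky) ** 2 >= min_dist ** 2 for kx, ky, _ in kept):
            kept.append((x, y, s))
    half_w = int(round(tmpl_w * (0.5 + margin)))
    half_h = int(round(tmpl_h * (0.5 + margin)))
    rois = []
    for x, y, _ in kept:
        rois.append((max(0, x - half_w), max(0, y - half_h),
                     min(img_w, x + half_w), min(img_h, y + half_h)))
    return kept, rois
```

The margin enlarges each ROI beyond the coarse template footprint, so that finer templates with a slightly different scale or rotation still fit inside it.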

Within the ROI, the hierarchical search continues with the corresponding low-level pyramid templates, selected according to the indexes of the high-level pyramid templates matched in the initial search. The specific steps are as follows:

In each ROI, the first N values of the initial matching result (the candidates) are taken, their corresponding low-level template indexes in the three pyramids (radius dimension, recursive-subdivision dimension and in-plane-rotation dimension) are obtained, and matching continues to obtain new candidates.

The first N results among the new candidates are retained, and the process is repeated down to the bottom pyramid level.

The hierarchical search proceeds from the initial candidates. On the one hand this reduces the global search space; on the other hand, as the search moves from the high levels of the pyramid to the low levels, the sampling density of the parameters in each dimension increases, so that the local search becomes finer-grained and the matching accuracy is ensured.
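The top-N refinement loop described above can be sketched in Python as follows. The function and parameter names are hypothetical, and the template identifier is treated as an opaque key; in the method it would stand for a combination of indexes in the radius, recursive-subdivision and in-plane-rotation pyramids, and `score` would be the template similarity evaluated inside the ROI.

```python
def hierarchical_search(initial, children_per_level, score, top_n):
    """Keep the best top_n candidates at each level and expand only those.

    initial:            template ids at the top pyramid level.
    children_per_level: children_per_level[k][t] lists the ids at level k + 1
                        that refine template t at level k.
    score:              score(level, template_id) -> similarity, higher is better.
    """
    ranked = sorted(sorted(initial), key=lambda t: score(0, t), reverse=True)
    candidates = ranked[:top_n]
    for level, children in enumerate(children_per_level, start=1):
        # union of the children of all surviving candidates, without duplicates
        expanded = sorted({c for t in candidates for c in children[t]})
        ranked = sorted(expanded, key=lambda t: score(level, t), reverse=True)
        candidates = ranked[:top_n]
    return candidates
```

With top_n candidates kept per level, the number of similarity evaluations grows with the number of levels times the number of children per candidate, rather than with the total number of templates at the bottom level.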

As shown in fig. 5, a visualization of the results of the matching process, after the templates of the bottom pyramid level have been matched, a random sample consensus (RANSAC) check is performed on the matched poses to obtain the final detection and pose estimation result of the object.

Fig. 6 shows the effect of matching and sorting scattered parts one by one in an industrial scene using the hierarchical template matching method based on the multi-dimensional pyramid of the invention.

Fig. 1 is a schematic diagram of the overall offline rendering and online matching flow of the invention. In the offline template generation process: sampling of viewpoint parameters based on recursive triangle subdivision of the regular icosahedron → OpenGL-based rendering of color images and depth maps → feature extraction → establishment of the multi-dimensional template pyramid. In the online template matching process: input sample → feature extraction → feature map of the input sample → hierarchical template matching → screening of matching results. Both the establishment of the multi-dimensional template pyramid and the hierarchical template matching feed into the screening of matching results.

The hierarchical template matching method based on the multi-dimensional pyramid ensures the accuracy of target object recognition and pose estimation while keeping time complexity low and efficiency high, which ensures the real-time performance of the method in industrial applications. During template rendering, three pyramids (radius dimension, recursive-subdivision dimension and in-plane-rotation dimension) are established from the viewpoint parameters of the camera. During matching, matching is first performed with the high-level templates of all the pyramids, and hierarchical matching is then performed on local regions of the input image, which improves template matching efficiency. The multi-dimensional pyramid structure uses the hierarchical structure of the viewpoint parameters in space: in the physical radius dimension, the templates are organized from coarse to fine into a radius pyramid; in the in-plane rotation dimension, different in-plane rotations are treated as virtual radius layers and organized from coarse to fine into an in-plane rotation pyramid, in a manner similar to the physical radius layers; and in the recursive-subdivision dimension of the triangular faces of the regular icosahedron, the templates are organized from coarse to fine into a recursive-subdivision pyramid. The multi-dimensional pyramid constrains the connection relationships between similar viewpoints, so that nearest-neighbor templates can be indexed well. The hierarchical search algorithm proceeds from the initial candidates, which reduces the global search space, makes the local search finer-grained, and balances matching speed against matching accuracy.

The foregoing detailed description of the preferred embodiments of the invention has been presented. It should be understood that numerous modifications and variations could be devised by those skilled in the art in light of the present teachings without departing from the inventive concepts. Therefore, the technical solutions available to those skilled in the art through logic analysis, reasoning and limited experiments based on the prior art according to the concept of the present invention should be within the scope of protection defined by the claims.
