Binocular vision-based unmanned aerial vehicle and electric power tower distance measuring method

Document No.: 1859012 · Publication date: 2021-11-19

Reading note: This technique, "Binocular vision-based unmanned aerial vehicle and electric power tower distance measuring method", was designed and created by 吴志成, 林秀贵, 许家浩, 杨昌加, 王门鸿, 叶学知, 陈子良, 李博宁, 蔡志坚 and 林旭鸣 on 2021-08-04. Abstract: The invention relates to a binocular vision-based method for measuring the distance between an unmanned aerial vehicle and an electric power tower, comprising the following steps: step S1, acquiring pictures of the electric power tower with a binocular vision camera; step S2, performing instance segmentation on the electric power tower images captured by the binocular vision camera using the YOLACT algorithm to obtain segmented electric power tower images; step S3, performing feature matching on the same electric power tower segmented from the left and right views with the SURF feature matching method to obtain accurate feature point pairs; step S4, deriving an accurate depth map from triangulation and the disparity-depth relationship of binocular vision, and from it computing the distance between the unmanned aerial vehicle and the electric power tower. The invention can accurately measure the distance between the unmanned aerial vehicle and the electric power tower, ensuring the safety and stability of the aircraft during unmanned aerial vehicle inspection.

1. A binocular vision-based unmanned aerial vehicle and electric power tower distance measuring method is characterized by comprising the following steps:

step S1, acquiring pictures of the electric power tower by using a binocular vision camera;

step S2, performing instance segmentation on the electric power tower image captured by the binocular vision camera based on the YOLACT algorithm to obtain a segmented electric power tower image;

step S3, performing feature matching on the same electric power tower image segmented from the left and right views using the SURF feature matching method to obtain accurate feature point pairs;

and step S4, deriving an accurate depth map from triangulation and the disparity-depth relationship of binocular vision, and from it calculating the distance between the unmanned aerial vehicle and the electric power tower.

2. The binocular vision based unmanned aerial vehicle and electric power tower distance measuring method of claim 1, wherein the step S2 specifically comprises:

step S21, preprocessing the picture so that it matches the input size of the backbone, and inputting it into the backbone for feature extraction;

step S22, dividing the instance segmentation into two subtasks that YOLACT processes in parallel;

and step S23, predicting the mask coefficient of each instance through the prediction head and the NMS network.

3. The binocular vision-based unmanned aerial vehicle and electric power tower distance measuring method according to claim 2, wherein the backbone structure adopts a ResNet101+ FPN network, and specifically comprises the following steps:

a. ResNet101 contains five convolution modules, whose outputs correspond to C1 to C5 of the YOLACT network structure;

b. an FPN network is added after ResNet101; the FPN obtains P5 from C5 of ResNet101 through a convolution layer, then enlarges P5 once by bilinear interpolation and adds it to the convolved C4 to obtain P4, and P3 is obtained in the same way; further, convolving P5 gives P6, and convolving P6 gives P7; feature extraction is thereby completed and anchors of corresponding sizes are generated.

4. The binocular vision based unmanned aerial vehicle and electric power tower distance measuring method of claim 3, wherein the step S22 specifically comprises:

p3 generates a group of prototype masks for the whole picture through Protonet, wherein each picture has k prototype masks;

extracting the P3-layer features in the backbone, passing them through 3 × 3 convolutions, then through upsampling followed by another 3 × 3 convolution so that the feature map becomes one quarter the size of the original image, and finally reducing the number of channels to k with a 1 × 1 convolution, generating k prototype masks of size 138 × 138.

5. The binocular vision based unmanned aerial vehicle and electric power tower distance measuring method of claim 4, wherein the step S23 specifically comprises: additionally adding a mask-coefficient output on the basis of an Anchor-based detection model, namely outputting confidence scores of c categories, 4 regression quantities and k mask coefficients for each box; the mask coefficients have positive and negative values, and, based on the characteristic that the value range of the tanh activation function is (-1, 1), the tanh function is used for nonlinear activation when predicting the mask coefficients;

the prediction head is an improvement on RetinaNet: a shared convolutional network is followed by separate 3 × 3 convolutions, where a is the number of anchors at each P level; the anchors of each layer are concatenated after NMS to obtain all mask coefficients;

finally according to the formula

M = σ(PCᵀ)

where P is the h × w × k set of prototype masks, C is the n × k set of mask coefficients for the n instances retained after NMS and score thresholding, and σ is the sigmoid function; the resulting M has size h × w × n, i.e. n masks are predicted.

6. The binocular vision based unmanned aerial vehicle and electric power tower distance measuring method of claim 1, wherein the loss function of the backbone consists of a class confidence loss Lcls, a box regression loss Lbox and a mask loss Lmask; the class confidence loss Lcls is computed in the same way as in SSD, i.e. a softmax loss

f(i, j) is the (i, j)-th element of the score matrix f, and max fj is the largest of all classification scores of a sample; the box regression loss Lbox is computed in the same way as in SSD, i.e. a smooth-L1 loss

Lmask is the binary cross-entropy loss between the assembled masks M and the ground-truth masks Mgt

Lmask = BCE(M, Mgt).

7. The binocular vision based unmanned aerial vehicle and electric power tower distance measuring method of claim 1, wherein the step S3 specifically comprises:

step S31, constructing a Hessian matrix, namely the square matrix of second-order partial derivatives; for an image function f(x, y), the Hessian matrix is

H(f(x, y)) =
| ∂²f/∂x²    ∂²f/∂x∂y |
| ∂²f/∂x∂y   ∂²f/∂y²  |

then, through the discriminant of the Hessian matrix

det(H) = (∂²f/∂x²)·(∂²f/∂y²) − (∂²f/∂x∂y)²

the edge points of the image are judged and detected; in order to generate stable image features, a second-order standard Gaussian function is selected for filtering before the Hessian matrix is constructed, and the second-order partial derivatives are calculated by convolution with specific kernels, giving the filtered Hessian matrix, whose entries are the Gaussian-smoothed second-order derivatives (approximated below by the box-filter responses Dxx, Dxy and Dyy);

Step S32, a box filter is adopted to approximately replace a Gaussian filter, and a weight value changing along with the scale is introduced to balance the error, so that the discriminant of the Hessian matrix is changed into

det(H) = Dxx*Dyy − (0.9*Dxy)²

Dxx is the second-order partial derivative in the x direction, Dyy is the second-order partial derivative in the y direction, and Dxy is the mixed second-order partial derivative obtained by differentiating first in the x direction and then in the y direction;

step S33, varying the size of the filter through box filtering, and quickly computing the response image of the box filter via an integral image to construct a scale space; feature points can be quickly searched and located with the help of the scale space: each pixel processed by the Hessian matrix is compared with the 26 points in its neighborhood in the three-dimensional image space and scale space, candidate feature points are preliminarily determined using NMS, sub-pixel feature points are obtained by three-dimensional linear interpolation, feature points with weak response or wrong localization are filtered out, and the final stable feature points are screened out;

s34, distributing the principal direction of the characteristic point by the SURF algorithm, counting the haar wavelet response sum in the x-y direction in a sector area of 60 degrees within a certain radius range by counting the harr wavelet characteristics in the circular field of the characteristic point and taking the characteristic point as the center, giving different degrees of weight to the response values according to the distance from the characteristic point, and finally taking the sector with the maximum value as the principal direction of the characteristic point;

step S35, along the principal direction of the feature point, a square with side length 20s (s being the scale of the feature point) is framed around the feature point and divided into 16 sub-regions; each sub-region counts the Haar wavelet features of 25 pixels in the horizontal and vertical directions relative to the principal direction, yielding four values

∑dx,∑|dx|,∑dy,∑|dy|

Namely, a vector of each sub-region, which is used as a descriptor of SURF characteristics;

and step S36, matching the feature points: the matching degree is determined by computing the Euclidean distance between two feature points, where a shorter Euclidean distance means a better match, and feature points whose Hessian traces have opposite signs are quickly rejected.

8. The binocular vision based unmanned aerial vehicle and electric power tower distance measuring method of claim 1, wherein the step S4 specifically comprises:

the parallax obtained according to the triangulation principle is:

d=xl-xr

the relationship with the depth z is:

z = f·T / d

where the optical axes of the left and right cameras are parallel, xl and xr are the imaging points of the spatial point P on the left and right image planes, T is the distance between the optical centers of the left and right cameras, f is the focal length, and Ol and Or are the optical centers of the left and right cameras; once the parallax d is obtained, the depth z can be obtained;

according to the SURF algorithm, for the feature-matched images of the left and right binocular views after instance segmentation, the distance between matched points is counted, which is the parallax d;

and a depth map is solved from the relationship between binocular parallax and depth, and errors caused by geometric distortion and noise interference in stereo matching are eliminated, so as to further obtain the accurate distance between the unmanned aerial vehicle and the electric power tower.

Technical Field

The invention belongs to the field of electric power inspection systems and computer vision, and particularly relates to a binocular vision-based method for measuring the distance between an unmanned aerial vehicle and an electric power tower.

Background

With the rapid development of power systems, national requirements for the safe operation and power supply reliability of power lines are becoming increasingly high. Because power transmission lines and power towers play very important roles in the power grid, the safety and stability of their operating state are decisive for the integrity of the power grid structure. Therefore, in order to ensure the normal operation of the power station, daily routine inspection of the power tower is particularly important.

Traditional power inspection is carried out manually, which consumes a great deal of manpower and time. Unmanned aerial vehicle (UAV) inspection, with its simple operation, unique viewing angles and clear aerial images, is gradually replacing traditional manual inspection. However, because the UAV is subject to electromagnetic interference near the power tower, its remote controllability is reduced, which affects the accuracy of UAV control. In addition, most UAV inspection methods currently on the market cannot accurately feed back the exact distance between the UAV and the power tower.

Disclosure of Invention

In view of the above, the present invention provides a binocular vision-based method for measuring the distance between an unmanned aerial vehicle and an electric power tower, so as to solve the above problems.

In order to achieve the purpose, the invention adopts the following technical scheme:

a binocular vision-based unmanned aerial vehicle and electric power tower distance measuring method comprises the following steps:

s1, acquiring pictures of the electric power tower according to the binocular vision camera carried by the unmanned aerial vehicle;

step S2, performing instance segmentation on the electric power tower image captured by the binocular vision camera based on the YOLACT algorithm to obtain a segmented electric power tower image;

step S3, performing feature matching on the same electric power tower image segmented from the left and right views using the SURF feature matching method to obtain accurate feature point pairs;

and step S4, deriving an accurate depth map from triangulation and the disparity-depth relationship of binocular vision, and from it calculating the distance between the unmanned aerial vehicle and the electric power tower.

Further, the step S2 is specifically:

step S21, preprocessing the picture so that it matches the input size of the backbone, and inputting it into the backbone for feature extraction;

step S22, dividing the instance segmentation into two subtasks that YOLACT processes in parallel;

and step S23, predicting the mask coefficient of each instance through the prediction head and the NMS network.

Further, the backbone structure adopts a ResNet101+ FPN network, which specifically comprises the following steps:

a. ResNet101 contains five convolution modules, whose outputs correspond to C1 to C5 of the YOLACT network structure;

b. an FPN network is added after ResNet101; the FPN obtains P5 from C5 of ResNet101 through a convolution layer, then enlarges P5 once by bilinear interpolation and adds it to the convolved C4 to obtain P4, and P3 is obtained in the same way; further, convolving P5 gives P6, and convolving P6 gives P7; feature extraction is thereby completed and anchors of corresponding sizes are generated.

Further, the step S22 is specifically:

p3 generates a group of prototype masks for the whole picture through Protonet, wherein each picture has k prototype masks;

extracting the P3-layer features in the backbone, passing them through 3 × 3 convolutions, then through upsampling followed by another 3 × 3 convolution so that the feature map becomes one quarter the size of the original image, and finally reducing the number of channels to k with a 1 × 1 convolution, generating k prototype masks of size 138 × 138.

Further, the step S23 is specifically: additionally adding a mask-coefficient output on the basis of an Anchor-based detection model, namely outputting confidence scores of c categories, 4 regression quantities and k mask coefficients for each box; the mask coefficients have positive and negative values, and, based on the characteristic that the value range of the tanh activation function is (-1, 1), the tanh function is used for nonlinear activation when predicting the mask coefficients;

the prediction head is an improvement on RetinaNet: a shared convolutional network is followed by separate 3 × 3 convolutions, where a is the number of anchors at each P level; the anchors of each layer are concatenated after NMS to obtain all mask coefficients;

finally according to the formula

M = σ(PCᵀ)

where P is the h × w × k set of prototype masks, C is the n × k set of mask coefficients for the n instances retained after NMS and score thresholding, and σ is the sigmoid function; the resulting M has size h × w × n, i.e. n masks are predicted.

Further, the loss function of the backbone consists of a class confidence loss Lcls, a box regression loss Lbox and a mask loss Lmask; the class confidence loss Lcls is computed in the same way as in SSD, i.e. a softmax loss

f(i, j) is the (i, j)-th element of the score matrix f, and max fj is the largest of all classification scores of a sample; the box regression loss Lbox is computed in the same way as in SSD, i.e. a smooth-L1 loss

Lmask is the binary cross-entropy loss between the assembled masks M and the ground-truth masks Mgt

Lmask = BCE(M, Mgt).

Further, the step S3 is specifically:

step S31, constructing a Hessian matrix, namely the square matrix of second-order partial derivatives; for an image function f(x, y), the Hessian matrix is

H(f(x, y)) =
| ∂²f/∂x²    ∂²f/∂x∂y |
| ∂²f/∂x∂y   ∂²f/∂y²  |

then, through the discriminant of the Hessian matrix

det(H) = (∂²f/∂x²)·(∂²f/∂y²) − (∂²f/∂x∂y)²

the edge points of the image are judged and detected; in order to generate stable image features, a second-order standard Gaussian function is selected for filtering before the Hessian matrix is constructed, and the second-order partial derivatives are calculated by convolution with specific kernels, giving the filtered Hessian matrix, whose entries are the Gaussian-smoothed second-order derivatives (approximated below by the box-filter responses Dxx, Dxy and Dyy);

Step S32, a box filter is adopted to approximately replace a Gaussian filter, and a weight value changing along with the scale is introduced to balance the error, so that the discriminant of the Hessian matrix is changed into

det(H) = Dxx*Dyy − (0.9*Dxy)²

Dxx is the second-order partial derivative in the x direction, Dyy is the second-order partial derivative in the y direction, and Dxy is the mixed second-order partial derivative obtained by differentiating first in the x direction and then in the y direction;

step S33, varying the size of the filter through box filtering, and quickly computing the response image of the box filter via an integral image to construct a scale space; feature points can be quickly searched and located with the help of the scale space: each pixel processed by the Hessian matrix is compared with the 26 points in its neighborhood in the three-dimensional image space and scale space, candidate feature points are preliminarily determined using NMS, sub-pixel feature points are obtained by three-dimensional linear interpolation, feature points with weak response or wrong localization are filtered out, and the final stable feature points are screened out;

s34, distributing the principal direction of the characteristic point by the SURF algorithm, counting the haar wavelet response sum in the x-y direction in a sector area of 60 degrees within a certain radius range by counting the harr wavelet characteristics in the circular field of the characteristic point and taking the characteristic point as the center, giving different degrees of weight to the response values according to the distance from the characteristic point, and finally taking the sector with the maximum value as the principal direction of the characteristic point;

step S35, along the principal direction of the feature point, a square with side length 20s (s being the scale of the feature point) is framed around the feature point and divided into 16 sub-regions; each sub-region counts the Haar wavelet features of 25 pixels in the horizontal and vertical directions relative to the principal direction, yielding four values

∑dx,∑|dx|,∑dy,∑|dy|

Namely, a vector of each sub-region, which is used as a descriptor of SURF characteristics;

and step S36, matching the feature points: the matching degree is determined by computing the Euclidean distance between two feature points, where a shorter Euclidean distance means a better match, and feature points whose Hessian traces have opposite signs are quickly rejected.

Further, the step S4 is specifically:

the parallax obtained according to the triangulation principle is:

d=xl-xr

the relationship with the depth z is:

in which the optical axes of the left and right cameras are parallel, xlAnd xrIs the imaging point of point P on the left and right image planes, T is the distance between the optical centers of the left and right cameras, P is a point in space, f is the focal length, OlAnd OrIs the optical center of the left and right cameras; the depth z can be obtained by obtaining the parallax d;

according to the SURF algorithm, for the feature-matched images of the left and right binocular views after instance segmentation, the distance between matched points is counted, which is the parallax d;

and a depth map is solved from the relationship between binocular parallax and depth, and errors caused by geometric distortion and noise interference in stereo matching are eliminated, so as to further obtain the accurate distance between the unmanned aerial vehicle and the electric power tower.

Compared with the prior art, the invention has the following beneficial effects:

according to the binocular vision-based unmanned aerial vehicle and electric power tower distance measuring method, the distance between the unmanned aerial vehicle and the electric power tower can be accurately measured, and the safety and stability of the machine body during the inspection period of the unmanned aerial vehicle are guaranteed, so that the position of a fault can be accurately positioned, and the normal work and use of a circuit are greatly guaranteed.

Drawings

FIG. 1 is a network structure of YOLACT according to an embodiment of the present invention;

fig. 2 is a network structure of Protonet according to an embodiment of the present invention;

FIG. 3 is a network structure of a Prediction Head according to an embodiment of the present invention;

FIG. 4 illustrates SURF keypoint location in an embodiment of the invention;

FIG. 5 illustrates the principal direction determination of SURF algorithm feature points in accordance with an embodiment of the present invention;

FIG. 6 is a generation of a feature point descriptor in an embodiment of the present invention;

fig. 7 shows the relationship between binocular disparity and depth distance according to an embodiment of the present invention.

Detailed Description

The invention is further explained below with reference to the drawings and the embodiments.

The invention provides a binocular vision-based unmanned aerial vehicle and electric power tower distance measuring method, which comprises the following steps:

step S1, acquiring pictures of the electric power tower by using a binocular vision camera;

step S2, carrying out example segmentation on the electric power tower image shot by the binocular vision camera based on a YOLACT algorithm to obtain a segmented electric power tower image;

referring to fig. 1, in this embodiment, preferably, first, a series of pictures of the electric power tower captured by a binocular camera mounted on an unmanned aerial vehicle are example-divided based on the YOLACT algorithm.

1. To make the captured picture conform to the input size of the backbone, the picture needs to be preprocessed before being input into the backbone for feature extraction. The backbone structure is the same as in RetinaNet (a single-stage object detection model): specifically, ResNet101 (a residual neural network, where 101 denotes the total number of convolutional and fully connected layers) plus an FPN network is adopted.

11) ResNet101 contains five convolution modules, the outputs of which correspond to C1-C5 in FIG. 1;

12) To recognize large-scale images, an FPN network needs to be added after ResNet101. The FPN obtains P5 from C5 of ResNet101 through a convolution layer, then enlarges P5 once by bilinear interpolation and adds it to the convolved C4 to obtain P4; P3 is obtained in the same way. Further, convolving P5 gives P6, and convolving P6 gives P7. Feature extraction is thus completed, and anchors (anchor boxes, which set the area each layer actually responds to so that a layer responds to targets of a particular size) of corresponding sizes are generated: [24, 48, 96, 192, 384]. The next step then follows.
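As an illustrative sketch of this top-down pathway (assuming PyTorch, 256 FPN channels, and the standard ResNet101 channel widths of 512/1024/2048 for C3-C5, none of which are stated above), the lateral connections, the bilinear upsampling and the extra P6/P7 convolutions could look like:

import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleFPN(nn.Module):
    """Minimal FPN top-down pathway over ResNet101 outputs C3-C5 (channel widths assumed)."""
    def __init__(self, in_channels=(512, 1024, 2048), out_channels=256):
        super().__init__()
        # 1x1 lateral convolutions that project C3, C4, C5 to a common channel width
        self.lateral = nn.ModuleList(nn.Conv2d(c, out_channels, 1) for c in in_channels)
        # 3x3 convolutions that smooth the merged maps into P3, P4, P5
        self.smooth = nn.ModuleList(nn.Conv2d(out_channels, out_channels, 3, padding=1) for _ in in_channels)
        # extra strided convolutions that derive P6 from P5 and P7 from P6
        self.to_p6 = nn.Conv2d(out_channels, out_channels, 3, stride=2, padding=1)
        self.to_p7 = nn.Conv2d(out_channels, out_channels, 3, stride=2, padding=1)

    def forward(self, c3, c4, c5):
        p5 = self.lateral[2](c5)
        # enlarge P5 by bilinear interpolation and add the projected C4 to obtain P4; P3 likewise
        p4 = self.lateral[1](c4) + F.interpolate(p5, scale_factor=2, mode="bilinear", align_corners=False)
        p3 = self.lateral[0](c3) + F.interpolate(p4, scale_factor=2, mode="bilinear", align_corners=False)
        p3, p4, p5 = (s(p) for s, p in zip(self.smooth, (p3, p4, p5)))
        p6 = self.to_p6(p5)
        p7 = self.to_p7(F.relu(p6))
        return p3, p4, p5, p6, p7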

2. According to fig. 1, YOLACT divides the instance segmentation into two subtasks for parallel processing.

21) P3 generates a set of prototype masks for the full picture with k prototype masks per picture through Protonet.

a. According to fig. 2, the P3-layer features in the backbone are extracted and passed through 3 × 3 convolution layers, then further through upsampling (to enlarge the feature map) and another 3 × 3 convolution so that they become one quarter the size of the original image; finally a 1 × 1 convolution reduces the number of channels to k, generating k prototype masks of size 138 × 138.

The function of the Protonet is similar to a semantic segmentation model, but no separate loss is set for the Protonet part during training; only the final masks output by the whole network are supervised.
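A rough Protonet head consistent with this description might be built as follows; the 256-channel width and k = 32 prototypes are illustrative assumptions, not values given in the text:

import torch.nn as nn

def make_protonet(in_channels=256, k=32):
    """Protonet sketch: 3x3 convs on P3, upsample to 1/4 of the input image, 1x1 conv to k prototype channels."""
    return nn.Sequential(
        nn.Conv2d(in_channels, 256, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(256, 256, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(256, 256, 3, padding=1), nn.ReLU(inplace=True),
        nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False),  # enlarge the feature map
        nn.Conv2d(256, 256, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(256, k, 1),  # k prototype masks, e.g. 138 x 138 for a 550 x 550 input
    )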

22) P3-P7 predict the mask coefficient of each instance through prediction head and NMS network.

a. For the mask coefficients, a mask-coefficient output is additionally added on the basis of the classical Anchor-based detection model, i.e. confidence scores of c categories, 4 regression quantities and k mask coefficients are output for each box. The mask coefficients have positive and negative values, and, based on the characteristic that the value range of the tanh activation function is (-1, 1), the tanh function is used for nonlinear activation when predicting the mask coefficients.

b. According to fig. 3, the prediction head is an improvement on RetinaNet: a shared 3 × 3 convolutional network is followed by separate 3 × 3 convolutions for each branch, where a is the number of anchors at each P level. The anchors of each layer are concatenated after NMS to obtain all mask coefficients.
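A hedged sketch of such a prediction head, outputting per-anchor class scores, 4 box regressions and k tanh-activated mask coefficients (the channel width, a = 3 anchors and k = 32 are illustrative assumptions):

import torch
import torch.nn as nn

class PredictionHead(nn.Module):
    """Shared head applied to P3-P7: per anchor, c class scores, 4 box offsets, k mask coefficients."""
    def __init__(self, in_channels=256, num_classes=2, k=32, num_anchors=3):
        super().__init__()
        self.num_classes, self.k = num_classes, k
        # one 3x3 conv shared across pyramid levels, then separate 3x3 convs per output branch
        self.shared = nn.Sequential(nn.Conv2d(in_channels, 256, 3, padding=1), nn.ReLU(inplace=True))
        self.cls = nn.Conv2d(256, num_anchors * num_classes, 3, padding=1)
        self.box = nn.Conv2d(256, num_anchors * 4, 3, padding=1)
        self.coef = nn.Conv2d(256, num_anchors * k, 3, padding=1)

    def forward(self, p):
        x = self.shared(p)
        n = p.shape[0]
        flatten = lambda t, d: t.permute(0, 2, 3, 1).reshape(n, -1, d)
        cls = flatten(self.cls(x), self.num_classes)      # confidence scores per anchor
        box = flatten(self.box(x), 4)                     # 4 regression quantities per anchor
        coef = torch.tanh(flatten(self.coef(x), self.k))  # k mask coefficients in (-1, 1)
        return cls, box, coef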

3. Finally according to the formula

M = σ(PCᵀ)

the mask coefficients and the prototype masks are linearly combined, where P is the h × w × k set of prototype masks, C is the n × k set of mask coefficients for the n instances retained after NMS and score thresholding, and σ is the sigmoid function (an activation function with values between 0 and 1); the resulting M has size h × w × n, i.e. n masks are predicted.
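The linear combination itself is a single matrix product; a minimal sketch, assuming the prototypes are laid out as h × w × k and the coefficients as n × k:

import torch

def assemble_masks(prototypes, coeffs):
    """Linearly combine prototypes with mask coefficients: M = sigmoid(P @ C^T).

    prototypes: (h, w, k) prototype masks from the Protonet.
    coeffs:     (n, k) mask coefficients of the n instances kept after NMS and thresholding.
    returns:    (h, w, n) instance masks.
    """
    return torch.sigmoid(prototypes @ coeffs.t())

# tiny usage example with random tensors
P = torch.randn(138, 138, 32)
C = torch.randn(5, 32)
M = assemble_masks(P, C)   # shape (138, 138, 5): five predicted masks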

The loss function consists of a class confidence loss Lcls, a box regression loss Lbox and a mask loss Lmask. The class confidence loss Lcls is computed in the same way as in SSD, i.e. a softmax loss

f(i, j) is the (i, j)-th element of the score matrix f, and max fj is the largest of all classification scores of a sample. The box regression loss Lbox is computed in the same way as in SSD, i.e. a smooth-L1 loss

Lmask is the binary cross-entropy loss between the assembled masks M and the ground-truth masks Mgt

Lmask = BCE(M, Mgt)
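As an illustrative sketch of these three terms (not the exact training code; equal weighting of the terms is assumed here because the text does not specify weights):

import torch
import torch.nn.functional as F

def yolact_style_loss(cls_logits, cls_targets, box_pred, box_targets, masks, mask_targets):
    """Sum of class confidence loss (softmax CE as in SSD), box regression loss (smooth-L1 as in SSD),
    and mask loss (pixel-wise binary cross-entropy between assembled masks M and ground-truth masks Mgt)."""
    l_cls = F.cross_entropy(cls_logits, cls_targets)      # softmax loss over class scores
    l_box = F.smooth_l1_loss(box_pred, box_targets)       # smooth-L1 on the 4 regression quantities
    l_mask = F.binary_cross_entropy(masks, mask_targets)  # BCE(M, Mgt); masks already passed through sigmoid
    return l_cls + l_box + l_mask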

With the help of the above steps, instance segmentation is first performed on the pictures of the electric power towers captured by the unmanned aerial vehicle, the different electric power towers in the pictures are distinguished, and the segmentation result is passed to the next step.

Step S3, performing feature matching on the same electric power tower image segmented from the left and right views using the SURF feature matching method to obtain accurate feature point pairs;

preferably, in this embodiment, step S3 specifically includes:

firstly, constructing a Hessian matrix, namely a square matrix formed by independent variables of second-order partial derivatives of vectors. For vector f (x, y), its Hessian matrix is

Then, through the discriminant of the Hessian matrix

det(H) = (∂²f/∂x²)·(∂²f/∂y²) − (∂²f/∂x∂y)²

the edge points of the image are judged and detected. In order to generate stable image features, in this embodiment a second-order standard Gaussian function is selected for filtering before constructing the Hessian matrix, and the second-order partial derivatives are calculated by convolution with specific kernels, giving the filtered Hessian matrix, whose entries are the Gaussian-smoothed second-order derivatives (approximated below by the box-filter responses Dxx, Dxy and Dyy).

In order to increase the computation speed of the SURF algorithm to meet the requirements of feature matching, in this embodiment a box filter is used to approximately replace the Gaussian filter, and a weight that varies with the scale is introduced to balance the error, so that the discriminant of the Hessian matrix becomes

det(H) = Dxx*Dyy − (0.9*Dxy)²

Dxx is the second-order partial derivative in the x direction, Dyy is the second-order partial derivative in the y direction, and Dxy is the mixed second-order partial derivative obtained by differentiating first in the x direction and then in the y direction. Based on this discriminant, filtering the image reduces to adding and subtracting sums of pixels over different regions of the image, and the sums of pixel gray values can be computed rapidly with an integral image, so that feature points can be discriminated quickly and the computation speed is improved.
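A minimal sketch of the integral-image building block and the weighted discriminant is given below; the actual SURF box-filter lobe layouts for Dxx, Dyy and Dxy are omitted and assumed to be assembled from such rectangle sums:

import numpy as np

def integral_image(img):
    """Summed-area table with a leading zero row/column so rectangle sums take four lookups."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1))
    ii[1:, 1:] = img.cumsum(0).cumsum(1)
    return ii

def box_sum(ii, y0, x0, y1, x1):
    """Sum of pixels in the half-open rectangle [y0, y1) x [x0, x1), in constant time."""
    return ii[y1, x1] - ii[y0, x1] - ii[y1, x0] + ii[y0, x0]

def hessian_determinant(Dxx, Dyy, Dxy):
    """Weighted SURF discriminant det(H) = Dxx*Dyy - (0.9*Dxy)^2 from box-filter responses."""
    return Dxx * Dyy - (0.9 * Dxy) ** 2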

In order to detect extreme points at different scales, in this embodiment the size of the filter is varied by means of box filtering while the image size is kept unchanged, and the response image of the box filter is quickly computed via an integral image to construct a scale space. Feature points can be quickly searched and located with the help of the scale space: according to fig. 4, each pixel processed by the Hessian matrix is compared with the 26 points in its neighborhood in the three-dimensional image space and scale space, candidate feature points are preliminarily determined using NMS, sub-pixel feature points are obtained by three-dimensional linear interpolation, feature points with weak response or wrong localization are filtered out, and finally stable feature points are screened out.
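As a small illustration of the 26-neighbour comparison (assuming the box-filter responses are stacked into a 3-D array indexed by scale, row and column, and that the candidate is not on a border of that array):

import numpy as np

def is_scale_space_maximum(responses, s, y, x, threshold=0.0):
    """Check whether det(H) at (scale s, row y, col x) exceeds its 26 neighbours in the
    3x3x3 block spanning the adjacent scales and adjacent pixels (a simple NMS test)."""
    value = responses[s, y, x]
    if value <= threshold:
        return False
    block = responses[s - 1:s + 2, y - 1:y + 2, x - 1:x + 2]
    # the candidate must be strictly larger than every one of its 26 neighbours
    return value >= block.max() and np.count_nonzero(block == value) == 1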

Meanwhile, in order to ensure the rotation invariance of the feature descriptors, the SURF algorithm needs to assign a principal direction to each feature point. According to fig. 5, the Haar wavelet responses in the circular neighborhood of the feature point are counted: with the feature point as center, the sums of the Haar wavelet responses in the x and y directions are computed within a 60-degree sector swept over a certain radius, the response values are weighted according to their distance from the feature point, and the sector with the maximum value is finally taken as the principal direction of the feature point.

Along the principal direction of the feature point, a square with side length 20s (where s is the scale of the feature point) is framed around the feature point and divided into 16 sub-regions; each sub-region counts the Haar wavelet features of 25 pixels in the horizontal and vertical directions relative to the principal direction, yielding four values

∑dx,∑|dx|,∑dy,∑|dy|

i.e. the vector of each sub-region. According to fig. 6, since there are 16 sub-regions and four values per sub-region, a 16 × 4 = 64-dimensional vector is obtained as the SURF feature descriptor.

Finally, the feature points are matched: the matching degree is determined by computing the Euclidean distance between two feature points, where a shorter Euclidean distance means a better match; in addition, feature points whose Hessian traces have opposite signs can be quickly rejected.
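A simple sketch of this matching step, assuming 64-D SURF descriptors and a per-keypoint Hessian-trace sign; the distance threshold is an illustrative value, not one given in the text:

import numpy as np

def match_surf_descriptors(desc_l, desc_r, sign_l, sign_r, max_dist=0.3):
    """Match 64-D SURF descriptors between the left and right views by Euclidean distance.

    desc_l, desc_r: (Nl, 64) and (Nr, 64) descriptor arrays.
    sign_l, sign_r: sign of the Hessian trace of each keypoint; points with opposite signs
                    are rejected cheaply before any distance is computed.
    Returns a list of (i, j) index pairs.
    """
    matches = []
    for i, d in enumerate(desc_l):
        same_sign = np.flatnonzero(sign_r == sign_l[i])
        if same_sign.size == 0:
            continue
        dists = np.linalg.norm(desc_r[same_sign] - d, axis=1)
        j = same_sign[np.argmin(dists)]
        if dists.min() < max_dist:       # shorter distance => better match
            matches.append((i, int(j)))
    return matches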

Step S4, deriving an accurate depth map from triangulation and the disparity-depth relationship of binocular vision, and from it calculating the distance between the unmanned aerial vehicle and the electric power tower.

Preferably, in this embodiment, step S4 specifically consists of finding the correspondences between each pair of images and then obtaining a disparity map according to the principle of triangulation; the invention performs the disparity-to-depth conversion on the extracted feature points. From FIG. 7, the parallax is easily obtained as

d=xl-xr

with the depth z given by

z = f·T / d

where the optical axes of the left and right cameras are parallel, xl and xr are the imaging points of the spatial point P on the left and right image planes, T is the distance between the optical centers of the left and right cameras, f is the focal length, and Ol and Or are the optical centers of the left and right cameras. Therefore, the depth z can be obtained as long as the parallax d is obtained. According to the SURF algorithm, for the feature-matched images of the left and right binocular views after instance segmentation, the distance between matched points is counted, which is the parallax d; the depth map is then solved from the relationship between binocular parallax and depth, and errors that may arise from geometric distortion and noise interference in stereo matching are eliminated, so as to further obtain the accurate distance between the unmanned aerial vehicle and the power tower.
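A worked example of the disparity-to-depth relation z = f·T/d; the focal length, baseline and pixel coordinates below are made-up illustrative numbers:

def disparity_to_depth(x_left, x_right, focal_px, baseline_m):
    """Depth from the rectified-stereo relation z = f * T / d, with d = x_left - x_right.

    x_left, x_right: horizontal image coordinates (pixels) of the matched point in the two views.
    focal_px:        focal length in pixels; baseline_m: distance T between the optical centers (metres).
    """
    d = x_left - x_right
    if d <= 0:
        raise ValueError("disparity must be positive for a point in front of the cameras")
    return focal_px * baseline_m / d

# e.g. a matched tower point with 12.5 px disparity, f = 800 px, T = 0.12 m  ->  z = 7.68 m
z = disparity_to_depth(412.0, 399.5, focal_px=800.0, baseline_m=0.12)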

Preferably, in the present embodiment, in order to convert the captured 2D image information into 3D spatial object information and thereby reconstruct and identify the object, calibration of the binocular camera is required. The relationship between the three-dimensional geometric position of a point on a spatial object and its corresponding point in the image is determined by the geometric imaging model of the camera, and this model is determined by the camera parameters; camera calibration therefore obtains the intrinsic and extrinsic parameters and determines the relative pose of the two cameras, so as to establish the camera imaging model and determine the correspondence between object points in the spatial coordinate system and pixels in the image plane.

Preferably, in this embodiment, because the distance between the unmanned aerial vehicle and the electric power tower needs to be measured, the parameters of the binocular camera need to be calibrated before ranging; and because feature matching of the same target is performed after calibration, binocular rectification is also needed, so that the same detected feature point lies on the same horizontal line in the two images of the left and right cameras, and the corresponding distortion correction is carried out, thereby rectifying the images and greatly accelerating feature point matching.
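A hedged OpenCV sketch of this calibration and rectification step, assuming chessboard-style calibration data and initial intrinsic estimates (the text does not name a calibration target or library):

import cv2

def rectify_stereo_pair(obj_pts, img_pts_l, img_pts_r, K_l, D_l, K_r, D_r, size, img_l, img_r):
    """Calibrate the stereo rig and rectify one image pair so that matched points share a scanline.

    obj_pts / img_pts_*: lists of calibration-pattern corner coordinates (3-D and 2-D) per view.
    K_*, D_*:            initial intrinsic matrices and distortion coefficients per camera.
    size:                image size as (width, height); img_l / img_r: the pair to rectify.
    """
    # extrinsics between the two cameras (rotation R, translation T) plus refined intrinsics
    _, K_l, D_l, K_r, D_r, R, T, _, _ = cv2.stereoCalibrate(
        obj_pts, img_pts_l, img_pts_r, K_l, D_l, K_r, D_r, size,
        flags=cv2.CALIB_USE_INTRINSIC_GUESS)
    # rectification transforms that bring both image planes onto a common plane
    R1, R2, P1, P2, Q, _, _ = cv2.stereoRectify(K_l, D_l, K_r, D_r, size, R, T)
    map_lx, map_ly = cv2.initUndistortRectifyMap(K_l, D_l, R1, P1, size, cv2.CV_32FC1)
    map_rx, map_ry = cv2.initUndistortRectifyMap(K_r, D_r, R2, P2, size, cv2.CV_32FC1)
    rect_l = cv2.remap(img_l, map_lx, map_ly, cv2.INTER_LINEAR)
    rect_r = cv2.remap(img_r, map_rx, map_ry, cv2.INTER_LINEAR)
    return rect_l, rect_r, Q   # Q also converts disparity to depth via cv2.reprojectImageTo3D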

The above description is only a preferred embodiment of the present invention, and all equivalent changes and modifications made in accordance with the claims of the present invention should be covered by the present invention.
