Instrument detection method based on one-shot mechanism

Document No.: 1964287    Publication date: 2021-12-14

Reading note: This technique, "Instrument detection method based on one-shot mechanism" (基于one-shot机制的仪表检测方法), was designed and created by 李晖晖, 汪瑨昇, 王昕煜, 刘玉昊, 郭雷 and 刘航 on 2021-09-03. Its main content is as follows: The invention provides a method for detecting an instrument based on a one-shot mechanism. First, the acquired instrument image data set is enhanced using size transformation and perspective transformation. Then, an instrument detection network model based on the one-shot mechanism is constructed, comprising a ResNet-18-based feature extraction Siamese network module, an RPN module, an ROI Align pooling layer and a Grid Head exchange feature fusion network module. Next, the network model is trained on the enhanced image data set to obtain a trained network. Finally, the trained network processes the image of the instrument to be detected to obtain the final detection result. The invention can detect the position of an arbitrary quadrilateral instrument dial and achieves high instrument detection accuracy.

1. A method for detecting a meter based on a one-shot mechanism is characterized by comprising the following steps:

step 1: respectively carrying out size enhancement and perspective transformation enhancement processing on each instrument image acquired by monitoring equipment, wherein the images before and after enhancement processing jointly form an instrument image data set; carrying out perspective transformation enhancement processing on a template image of instrument equipment contained in the collected instrument image to obtain an instrument template image data set;

step 2: constructing an instrument detection network model based on a one-shot mechanism, comprising a ResNet-18-based feature extraction Siamese network module, an RPN module, an ROI Align pooling layer and a Grid Head exchange feature fusion network module; the instrument image to be detected and the instrument template image are input into the ResNet-18 Siamese network module, which outputs the image feature of the instrument to be detected and the image feature of the instrument template; the RPN module performs initial localization on the image feature of the instrument to be detected and outputs the coordinates of the instrument's rectangular bounding box; the ROI Align pooling layer pools the corresponding image features of the instrument to be detected to the same size according to the coordinates of the instrument's rectangular bounding box, and likewise pools the instrument template image features to the same size using the coordinates of the whole image; the pooled image features of the instrument to be detected and of the instrument template are then input simultaneously into the Grid Head exchange feature fusion network module, which outputs the four quadrilateral vertex positions of the instrument to be detected;

the Grid Head exchange feature fusion network module comprises a Siamese network module, a feature interaction fusion module and deconvolution layers; the Siamese network module consists of 8 convolution layers and takes as input the feature F_din of the image to be detected and the feature of the template image, extracts features from each, and records the extracted features as F_d and Q respectively; the feature interaction fusion module fuses the extracted features F_d and Q as follows:

the features F_d and Q are each divided evenly into M groups along the channel dimension; the feature maps corresponding to the i-th Grid point are denoted F_di and Q_i, and the feature maps corresponding to the j-th point in the source-point set S_i are denoted F_dj and Q_j, where i = 1, 2, ..., M, M is the number of Grid points, j = 1, 2, ..., K_i, and K_i is the number of source points in S_i; a source point is a Grid point whose distance to the i-th Grid point is 1, and all such points form the source-point set; then the feature maps F_di and Q_i are deconvolved to obtain the corresponding unfused heatmap feature maps, denoted h_F' and h_Q' respectively; the feature maps F_dj and Q_j are each passed through 2 convolution layers with 3 × 3 kernels to obtain the new feature maps to be fused, denoted T_d:j→i(F_dj) and T_q:j→i(Q_j); the feature map Q_i is passed through 2 convolution layers with 3 × 3 kernels to obtain the template feature to be fused T_qd:i(Q_i); the feature map Q_j is passed through 2 convolution layers with 3 × 3 kernels to obtain the template feature to be fused T_qd:j→i(Q_j), i = 1, 2, ..., M, j = 1, 2, ..., K_i; next, the feature maps F_di and Q_i, the feature maps to be fused T_d:j→i(F_dj) and T_q:j→i(Q_j), and the template features T_qd:i(Q_i) and T_qd:j→i(Q_j) are additively fused according to the following formula, i = 1, 2, ..., M, to obtain the fused feature maps F'_di and Q'_i:

Finally, the feature maps F'_di and Q'_i are subjected to a second additive fusion according to the following formula to obtain the second-stage fused feature maps F''_di and Q''_i:

where T'_j→i(F'_dj) denotes the new second-stage feature map to be fused, obtained by passing the feature map F'_dj through 2 convolution layers with the same structure as the convolution layers used earlier to obtain T_d:j→i(F_dj); T'_j→i(Q'_j) denotes the new second-stage feature map to be fused, obtained by passing the feature map Q'_j through 2 convolution layers with the same structure as the convolution layers used to obtain T_q:j→i(Q_j); T'_qd:i(Q_i) denotes the feature map Q_i processed by two convolution layers with 3 × 3 kernels; T'_qd:j→i(Q_j) denotes the feature map Q_j processed by two convolution layers with 3 × 3 kernels; i = 1, 2, ..., M, j = 1, 2, ..., K_i;

the deconvolution stage of the Grid Head exchange feature fusion network module consists of two deconvolution layers; the feature maps F''_di and Q''_i output by the fusion module are each input into the deconvolution layers, which output the final fused heatmap feature maps;

and step 3: taking the images in the image data set and the instrument template image data set obtained in step 1 as input, training the instrument detection network model based on the one-shot mechanism constructed in step 2 by stochastic gradient descent to obtain a trained network model; wherein the loss function of the network is calculated as:

Loss = L_cls + L_reg + L_seg (5)

where Loss represents the total loss of the network, L_cls represents the RPN module classification loss, L_reg represents the RPN module position regression loss, and L_cls and L_reg are consistent with the corresponding losses in Faster R-CNN; L_seg represents the cross-entropy loss of the Grid heatmaps in the Grid Head exchange feature fusion network module and is obtained by the following calculation:

L_seg = L_seg-unfused + L_seg-fused

where L_seg-unfused represents the cross-entropy loss corresponding to the unfused heatmap feature maps and L_seg-fused represents the cross-entropy loss of the finally fused heatmap feature maps, calculated according to the following formula:

where M is the number of Grid points, N is the number of pixels in a heatmap feature map, t_k,l represents the value of the k-th pixel in the fused heatmap feature map corresponding to the l-th Grid point, t'_k,l represents the value of the k-th pixel in the unfused heatmap feature map corresponding to the l-th Grid point, and t_k,l and t'_k,l take values in the range 0 to 1; the corresponding label value, i.e. the value of the k-th pixel in the label map of that Grid point, takes the value 0 or 1, where 1 indicates that the k-th pixel belongs to the predicted Grid point region and 0 indicates that it does not;

and step 4: inputting the instrument image to be detected and the template instrument image into the network model trained in step 3, outputting the predicted heatmap feature maps, and converting the generated heatmaps into the quadrilateral vertex positions of the instrument to be detected in the same way as the Grid R-CNN target detection network, specifically as follows:

where (I_x, I_y) represents the coordinates of a vertex of the instrument to be detected in the image, (P_x, P_y) represents the coordinates of the vertex of the bounding box generated by the RPN module, (H_x, H_y) represents the position of the final heatmap predicted point in the heatmap feature map, (w_p, h_p) represents the width and height of the bounding box generated by the RPN module, and (w_o, h_o) represents the width and height of the heatmap.

2. The method for detecting an instrument based on the one-shot mechanism according to claim 1, wherein the specific process of the perspective transformation enhancement is as follows:

obtaining the instrument annotation in an instrument image and denoting the 4 annotated vertices P1, P2, P3 and P4, where P1 and P3 are diagonally opposite vertices; drawing horizontal and vertical straight lines through vertex P1 and through vertex P3, the four lines intersecting to form 4 rectangular coordinate systems; letting coordinate system A be the one whose origin is closest to vertex P2 and coordinate system B the one whose origin is closest to vertex P4; randomly selecting one point in one of the four quadrant regions of coordinate system A as the transformed vertex P2' of vertex P2 and recording its position; randomly selecting one point in one of the four quadrant regions of coordinate system B as the transformed vertex P4' of vertex P4 and recording its position; solving the perspective transformation from the 4 vertex positions before and after transformation, i.e. P1, P2, P3, P4 and P1, P2', P3, P4', and transforming the original instrument image according to it to obtain a perspective-transformed image; traversing all the vertex and rectangular-coordinate-system quadrant regions in this way yields 16 perspective-transformed images for each instrument image.

Technical Field

The invention belongs to the technical field of computer vision and target detection, and particularly relates to a one-shot mechanism-based instrument detection method.

Background

Before the introduction of deep learning, instrument detection and localization mainly relied on image matching, using an instrument template to localize the instrument precisely. However, this approach depends heavily on the accuracy of image matching; methods based on traditional features such as the Scale-Invariant Feature Transform (SIFT) struggle to cope with complex conditions such as illumination changes, changes in image sharpness, and appearance variations among instruments from different manufacturers, so localization fails in some scenes. Detection methods based on HOG (Histogram of Oriented Gradients) features and SVM (Support Vector Machine) classifiers also fall short of the accuracy of today's convolutional neural networks. Research on instrument detection and localization based on convolutional neural networks therefore has significant practical value.

Using computer vision to detect and localize instruments poses two main difficulties. First, an instrument detection task usually needs to recover the dial position accurately and rectify the dial into a rectangle for subsequent reading recognition; in particular, instrument images captured by monitoring equipment are often tilted and must undergo a perspective transformation to obtain a rectangular dial. The algorithm therefore needs to predict an arbitrary quadrilateral position, whereas general object detection typically predicts only the minimum bounding box of a target. The minimum-bounding-box position alone cannot rectify the dial to a frontal view, so its localization quality does not meet the needs of downstream tasks. Even instance segmentation, which can provide a position mask of the dial, still faces the problem of reducing the errors introduced when the mask is used for the perspective transformation. Second, in different production environments, instruments with the same function look broadly similar, but their appearance varies across manufacturers, factory batches and versions. Current deep-learning detection algorithms depend on the data distribution of the training set; when the instruments to be tested differ noticeably from those in the training set, i.e. the instrument appearance in the test set differs from that in the training set, detection performance degrades substantially.

Disclosure of Invention

In order to overcome the defects of the prior art, the invention provides an instrument detection method based on a one-shot mechanism. First, the acquired instrument image data set is enhanced using size transformation and perspective transformation. Then, an instrument detection network model based on the one-shot mechanism is constructed, comprising a ResNet-18-based feature extraction Siamese network module, an RPN module, an ROI Align pooling layer and a Grid Head exchange feature fusion network module. Next, the network model is trained on the enhanced image data set to obtain a trained network. Finally, the trained network processes the image of the instrument to be detected to obtain the final detection result. The invention can detect the position of an arbitrary quadrilateral instrument dial, and the predicted quadrilateral position information makes it easy to rectify the dial into a rectangular dial viewed from the front; in addition, through the one-shot mechanism, the detection algorithm can exploit the information in the template picture, which improves instrument detection accuracy when the instrument appearance changes.

A method for detecting a meter based on a one-shot mechanism is characterized by comprising the following steps:

step 1: respectively carrying out size enhancement and perspective transformation enhancement processing on each instrument image acquired by monitoring equipment, wherein the images before and after enhancement processing jointly form an instrument image data set; carrying out perspective transformation enhancement processing on a template image of instrument equipment contained in the collected instrument image to obtain an instrument template image data set;

step 2: constructing an instrument detection network model based on a one-shot mechanism, comprising a ResNet-18-based feature extraction Siamese network module, an RPN module, an ROI Align pooling layer and a Grid Head exchange feature fusion network module; the instrument image to be detected and the instrument template image are input into the ResNet-18 Siamese network module, which outputs the image feature of the instrument to be detected and the image feature of the instrument template; the RPN module performs initial localization on the image feature of the instrument to be detected and outputs the coordinates of the instrument's rectangular bounding box; the ROI Align pooling layer pools the corresponding image features of the instrument to be detected to the same size according to the coordinates of the instrument's rectangular bounding box, and likewise pools the instrument template image features to the same size using the coordinates of the whole image; the pooled image features of the instrument to be detected and of the instrument template are then input simultaneously into the Grid Head exchange feature fusion network module, which outputs the four quadrilateral vertex positions of the instrument to be detected;

the Grid Head exchange feature fusion network module comprises a Siamese network module, a feature interaction fusion module and deconvolution layers; the Siamese network module consists of 8 convolution layers and takes as input the feature F_din of the image to be detected and the feature of the template image, extracts features from each, and records the extracted features as F_d and Q respectively; the feature interaction fusion module fuses the extracted features F_d and Q as follows:

the features F_d and Q are each divided evenly into M groups along the channel dimension; the feature maps corresponding to the i-th Grid point are denoted F_di and Q_i, and the feature maps corresponding to the j-th point in the source-point set S_i are denoted F_dj and Q_j, where i = 1, 2, ..., M, M is the number of Grid points, j = 1, 2, ..., K_i, and K_i is the number of source points in S_i; a source point is a Grid point whose distance to the i-th Grid point is 1, and all such points form the source-point set; then the feature maps F_di and Q_i are deconvolved to obtain the corresponding unfused heatmap feature maps, denoted h_F' and h_Q' respectively; the feature maps F_dj and Q_j are each passed through 2 convolution layers with 3 × 3 kernels to obtain the new feature maps to be fused, denoted T_d:j→i(F_dj) and T_q:j→i(Q_j); the feature map Q_i is passed through 2 convolution layers with 3 × 3 kernels to obtain the template feature to be fused T_qd:i(Q_i); the feature map Q_j is passed through 2 convolution layers with 3 × 3 kernels to obtain the template feature to be fused T_qd:j→i(Q_j), i = 1, 2, ..., M, j = 1, 2, ..., K_i; next, the feature maps F_di and Q_i, the feature maps to be fused T_d:j→i(F_dj) and T_q:j→i(Q_j), and the template features T_qd:i(Q_i) and T_qd:j→i(Q_j) are additively fused according to the following formula, i = 1, 2, ..., M, to obtain the fused feature maps F'_di and Q'_i:

Finally, the feature maps F'_di and Q'_i are subjected to a second additive fusion according to the following formula to obtain the second-stage fused feature maps F''_di and Q''_i:

where T'_j→i(F'_dj) denotes the new second-stage feature map to be fused, obtained by passing the feature map F'_dj through 2 convolution layers with the same structure as the convolution layers used earlier to obtain T_d:j→i(F_dj); T'_j→i(Q'_j) denotes the new second-stage feature map to be fused, obtained by passing the feature map Q'_j through 2 convolution layers with the same structure as the convolution layers used to obtain T_q:j→i(Q_j); T'_qd:i(Q_i) denotes the feature map Q_i processed by two convolution layers with 3 × 3 kernels; T'_qd:j→i(Q_j) denotes the feature map Q_j processed by two convolution layers with 3 × 3 kernels; i = 1, 2, ..., M, j = 1, 2, ..., K_i;

The deconvolution stage of the Grid Head exchange feature fusion network module consists of two deconvolution layers; the feature maps F''_di and Q''_i output by the fusion module are each input into the deconvolution layers, which output the final fused heatmap feature maps;

and step 3: taking the images in the image data set and the instrument template image data set obtained in step 1 as input, training the instrument detection network model based on the one-shot mechanism constructed in step 2 by stochastic gradient descent to obtain a trained network model; wherein the loss function of the network is calculated as:

Loss = L_cls + L_reg + L_seg (5)

where Loss represents the total loss of the network, L_cls represents the RPN module classification loss, L_reg represents the RPN module position regression loss, and L_cls and L_reg are consistent with the corresponding losses in Faster R-CNN; L_seg represents the cross-entropy loss of the Grid heatmaps in the Grid Head exchange feature fusion network module and is obtained by the following calculation:

L_seg = L_seg-unfused + L_seg-fused

where L_seg-unfused represents the cross-entropy loss corresponding to the unfused heatmap feature maps and L_seg-fused represents the cross-entropy loss of the finally fused heatmap feature maps, calculated according to the following formula:

where M is the number of Grid points, N is the number of pixels in a heatmap feature map, t_k,l represents the value of the k-th pixel in the fused heatmap feature map corresponding to the l-th Grid point, t'_k,l represents the value of the k-th pixel in the unfused heatmap feature map corresponding to the l-th Grid point, and t_k,l and t'_k,l take values in the range 0 to 1; the corresponding label value, i.e. the value of the k-th pixel in the label map of that Grid point, takes the value 0 or 1, where 1 indicates that the k-th pixel belongs to the predicted Grid point region and 0 indicates that it does not;

and step 4: inputting the instrument image to be detected and the template instrument image into the network model trained in step 3, outputting the predicted heatmap feature maps, and converting the generated heatmaps into the quadrilateral vertex positions of the instrument to be detected in the same way as the Grid R-CNN target detection network, specifically as follows:

where (I_x, I_y) represents the coordinates of a vertex of the instrument to be detected in the image, (P_x, P_y) represents the coordinates of the vertex of the bounding box generated by the RPN module, (H_x, H_y) represents the position of the final heatmap predicted point in the heatmap feature map, (w_p, h_p) represents the width and height of the bounding box generated by the RPN module, and (w_o, h_o) represents the width and height of the heatmap.

Further, the specific process of the perspective transformation enhancement is as follows:

obtaining the instrument annotation in an instrument image and denoting the 4 annotated vertices P1, P2, P3 and P4, where P1 and P3 are diagonally opposite vertices; drawing horizontal and vertical straight lines through vertex P1 and through vertex P3, the four lines intersecting to form 4 rectangular coordinate systems; letting coordinate system A be the one whose origin is closest to vertex P2 and coordinate system B the one whose origin is closest to vertex P4; randomly selecting one point in one of the four quadrant regions of coordinate system A as the transformed vertex P2' of vertex P2 and recording its position; randomly selecting one point in one of the four quadrant regions of coordinate system B as the transformed vertex P4' of vertex P4 and recording its position; solving the perspective transformation from the 4 vertex positions before and after transformation, i.e. P1, P2, P3, P4 and P1, P2', P3, P4', and transforming the original instrument image according to it to obtain a perspective-transformed image; traversing all the vertex and rectangular-coordinate-system quadrant regions in this way yields 16 perspective-transformed images for each instrument image.

The beneficial effects of the invention are as follows. Because the instrument images are acquired by monitoring equipment installed at a limited number of fixed positions, the imaging angle and the size of the instrument target vary little across the acquired images; applying size enhancement and perspective transformation enhancement increases the variation in imaging angle and target size in the instrument image set, so a network model with better generalization can be trained. Because the quadrilateral vertices of the instrument dial are predicted directly via heatmaps, the quadrilateral position of a rectangular dial can be predicted accurately under different imaging angles. By adopting a one-shot mechanism in the Grid Head exchange feature fusion network module, the similarity between the instrument appearance and the instrument template appearance is fully exploited, further improving dial detection and localization accuracy. The invention can effectively localize the instrument dial under different imaging angles, makes full use of the template picture information to improve instrument detection accuracy, and helps advance instrument detection using computer vision technology.

Drawings

FIG. 1 is a flow chart of a method for detecting a meter based on a one-shot mechanism according to the present invention;

FIG. 2 is a schematic diagram of a perspective transformation enhancement process in the present invention;

FIG. 3 is a perspective transformation enhanced instrument image;

FIG. 4 is a schematic diagram of the Grid Head exchange feature fusion network module of the present invention;

FIG. 5 is a schematic diagram of the distribution of 9 Grid points during quadrilateral position prediction according to the present invention;

FIG. 6 shows instrument detection result images obtained by the method of the present invention;

In the figure: (a) oil level gauge detection result image; (b) oil level gauge template picture; (c) drain gauge detection result image; (d) drain gauge template picture.

Detailed Description

The present invention is further described below with reference to the drawings and an embodiment; the invention includes, but is not limited to, the following embodiment.

As shown in fig. 1, the present invention provides a method for detecting a meter based on a one-shot mechanism, which is implemented as follows:

1. image data pre-processing

Instrument data in a monitoring scene are acquired from real monitoring equipment, and because the installation position of the monitoring equipment is fixed, the imaging angle of the instrument is relatively fixed. Exploiting the fact that each image contains exactly one instrument annotation, two enhancement modes, size enhancement and perspective transformation enhancement, are applied to the instrument images according to their annotations, so that the instrument image data contain instrument images of different sizes and different viewing angles.

The specific process of the annotation-based instrument size enhancement is as follows (a minimal implementation sketch follows the four steps below):

1) Obtain the instrument size annotation and compute the minimum bounding rectangle from the 4 annotated vertices.

2) Compute the aspect ratio of the instrument rectangle and randomly sample a new aspect ratio within a certain range, so that the aspect ratio of the rectangle varies within that range.

3) Set a different height range for each instrument type, to avoid generating unrealistic instrument targets whose height exceeds the normal range and that would harm the training of the detection algorithm. The height range can also be adjusted according to the annotated height, to avoid producing an oversized target from a small, blurry picture. A value is then randomly sampled within the determined height range as the transformed height.

4) Determine the transformed rectangle size from the randomly generated aspect ratio and height, then randomly place the rectangle within the picture according to the picture size. Finally, compute a transformation matrix from the new rectangle position and apply it to transform the annotation and the instrument image synchronously.
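A minimal sketch of this size-enhancement step, assuming OpenCV-style images and an annotation given as 4 (x, y) vertices; the helper name, sampling ranges and jitter values below are illustrative assumptions, not taken from the original description.

```python
import cv2
import numpy as np

def size_enhance(image, quad, h_range=(80, 240), ar_jitter=0.2):
    """Randomly rescale and reposition the instrument inside the image.

    image: HxWx3 uint8 image; quad: (4, 2) float32 array of annotated vertices.
    h_range and ar_jitter are illustrative sampling ranges (assumptions).
    Returns the transformed image and the transformed quadrilateral annotation.
    """
    img_h, img_w = image.shape[:2]
    x, y, w, h = cv2.boundingRect(quad.astype(np.float32))   # 1) minimum bounding rectangle

    # 2) sample a new aspect ratio around the original one, 3) sample a new height
    ar = (w / h) * np.random.uniform(1.0 - ar_jitter, 1.0 + ar_jitter)
    new_h = np.random.uniform(*h_range)
    new_w = ar * new_h

    # 4) randomly place the resized rectangle inside the picture
    new_x = np.random.uniform(0, max(img_w - new_w, 1))
    new_y = np.random.uniform(0, max(img_h - new_h, 1))

    src = np.float32([[x, y], [x + w, y], [x + w, y + h], [x, y + h]])
    dst = np.float32([[new_x, new_y], [new_x + new_w, new_y],
                      [new_x + new_w, new_y + new_h], [new_x, new_y + new_h]])

    # transformation matrix mapping the old rectangle to the new one,
    # applied synchronously to the image and to the vertex annotation
    M = cv2.getPerspectiveTransform(src, dst)
    warped = cv2.warpPerspective(image, M, (img_w, img_h))
    new_quad = cv2.perspectiveTransform(quad.reshape(1, 4, 2).astype(np.float32), M)[0]
    return warped, new_quad
```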

The perspective transformation enhancement based on vertex positions simulates more photos at different viewing angles, overcoming the imaging limitation of fixed monitoring camera positions. As shown in FIG. 2, the instrument annotation in an instrument image is obtained, with the solid black line showing the annotated quadrilateral. Horizontal and vertical lines are drawn through the two diagonally opposite vertices P1 and P3; the coordinate system whose origin is closest to vertex P2 is denoted A, and the coordinate system whose origin is closest to vertex P4 is denoted B, and the enhanced vertex position may appear randomly in any of the 4 quadrant regions of each coordinate system. The randomly sampled points P2' and P4', together with P1 and P3, form a new annotation; the perspective transformation matrix between the old and new annotation positions is solved, and an instrument picture at a new viewing angle is generated accordingly. Each annotation can thus yield at least 16 different viewing-angle cases, i.e. 16 perspective-transformed images are obtained for each instrument image. FIG. 3 shows the images at 16 viewing angles obtained after applying this perspective transformation enhancement to one instrument image.
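A minimal sketch of generating one such perspective-transformed view. It simplifies the construction above by sampling the offsets of P2 and P4 directly around those vertices; the quadrant encoding and sampling radius are illustrative assumptions.

```python
import cv2
import numpy as np

def perspective_enhance(image, quad, quadrant_a=0, quadrant_b=0, radius=30):
    """Generate one perspective-transformed view by moving P2 and P4.

    quad: (4, 2) float32 array ordered P1, P2, P3, P4 (P1 and P3 diagonal).
    quadrant_a / quadrant_b select one of the 4 quadrant regions for P2' / P4';
    radius bounds the random offset (an illustrative assumption, not from the patent).
    """
    img_h, img_w = image.shape[:2]
    p1, p2, p3, p4 = quad.astype(np.float32)

    signs = [(+1, +1), (+1, -1), (-1, +1), (-1, -1)]      # the 4 quadrant regions
    sa, sb = signs[quadrant_a], signs[quadrant_b]
    p2_new = p2 + np.float32(sa) * np.random.uniform(0, radius, size=2)
    p4_new = p4 + np.float32(sb) * np.random.uniform(0, radius, size=2)

    src = np.float32([p1, p2, p3, p4])
    dst = np.float32([p1, p2_new, p3, p4_new])            # the new annotation

    M = cv2.getPerspectiveTransform(src, dst)             # old -> new vertex positions
    warped = cv2.warpPerspective(image, M, (img_w, img_h))
    return warped, dst

# Traversing quadrant_a, quadrant_b over {0, 1, 2, 3} x {0, 1, 2, 3}
# yields the 16 perspective-transformed images per instrument image.
```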

Through these steps, an expanded image set is obtained for the training phase that covers instrument images of the different sizes and viewing angles encountered in real use, so the instrument detection network model can be trained better.

2. Instrument detection network model based on one-shot mechanism

To localize the instrument dial accurately, the backbone network must extract features effectively while keeping a high processing speed; considering also that the instrument data set is small, a ResNet-18 network is selected as the backbone for feature extraction. The instrument detection network model based on the one-shot mechanism constructed by the invention therefore comprises a ResNet-18-based feature extraction Siamese network module, an RPN module, an ROI Align pooling layer and a Grid Head exchange feature fusion network module.

The ResNet-18 network consists of 9 convolution layers, 1 pooling layer and 4 residual structures; the same network module extracts the features of the two pictures separately, a typical Siamese network structure. The picture of the instrument to be detected and the instrument template picture are input into the ResNet-18 Siamese network module, which outputs the picture feature of the instrument to be detected and the picture feature of the instrument template.
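A minimal sketch of the shared-weight (Siamese) feature extraction, using the stock torchvision ResNet-18 up to its last convolutional stage as a stand-in for the backbone described above; the exact layer counts and strides of the patent's backbone are not reproduced.

```python
import torch
import torch.nn as nn
import torchvision

class SiameseBackbone(nn.Module):
    """One ResNet-18 trunk applied to both the query and the template image."""
    def __init__(self):
        super().__init__()
        resnet = torchvision.models.resnet18()
        # keep everything up to the last residual stage (drop avgpool / fc)
        self.trunk = nn.Sequential(*list(resnet.children())[:-2])

    def forward(self, query_img, template_img):
        # weight sharing: the same trunk processes both inputs
        return self.trunk(query_img), self.trunk(template_img)

backbone = SiameseBackbone()
q_feat, t_feat = backbone(torch.randn(1, 3, 512, 512), torch.randn(1, 3, 512, 512))
print(q_feat.shape, t_feat.shape)   # torch.Size([1, 512, 16, 16]) for both
```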

The RPN module comprises one 3 × 3 convolution layer and two 1 × 1 convolution layers; it performs initial localization on the picture features of the instrument to be detected and outputs the coordinates of the instrument's rectangular bounding box.
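A minimal sketch of an RPN head with this layer layout, one shared 3 × 3 convolution followed by two 1 × 1 branches for objectness and box regression; the channel count and number of anchors are illustrative assumptions.

```python
import torch.nn as nn

class RPNHead(nn.Module):
    """One 3x3 conv followed by two 1x1 convs: objectness scores and box deltas."""
    def __init__(self, in_channels=512, num_anchors=9):   # illustrative values
        super().__init__()
        self.conv = nn.Conv2d(in_channels, in_channels, kernel_size=3, padding=1)
        self.cls_logits = nn.Conv2d(in_channels, num_anchors, kernel_size=1)
        self.bbox_pred = nn.Conv2d(in_channels, num_anchors * 4, kernel_size=1)

    def forward(self, feat):
        x = self.conv(feat).relu()
        return self.cls_logits(x), self.bbox_pred(x)
```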

The ROI Align pooling layer uses bilinear interpolation to compute pixel values at non-integer positions, normalizing feature maps of different sizes: input feature maps of different sizes are output as feature maps of the same size. The image features of the instrument to be detected are pooled to the same size according to the coordinates of the instrument's rectangular bounding box, and the instrument template image features are likewise pooled to the same size using the coordinates of the whole image.
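A minimal sketch of this pooling step using torchvision's `roi_align`; the box values, output size and spatial scale (feature stride) below are illustrative assumptions.

```python
import torch
from torchvision.ops import roi_align

q_feat = torch.randn(1, 512, 16, 16)      # feature of the image to be detected
t_feat = torch.randn(1, 512, 16, 16)      # feature of the template image

# RPN proposal for the query, given as (batch_index, x1, y1, x2, y2) in image pixels
proposal = torch.tensor([[0.0, 100.0, 120.0, 380.0, 360.0]])
# the template is pooled with its whole-image coordinates
whole_image = torch.tensor([[0.0, 0.0, 0.0, 511.0, 511.0]])

pooled_q = roi_align(q_feat, proposal, output_size=(14, 14), spatial_scale=1 / 32)
pooled_t = roi_align(t_feat, whole_image, output_size=(14, 14), spatial_scale=1 / 32)
print(pooled_q.shape, pooled_t.shape)      # both torch.Size([1, 512, 14, 14])
```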

The pooled image features of the instrument to be detected and of the instrument template are then input simultaneously into the Grid Head exchange feature fusion network module, which outputs the four quadrilateral vertex positions of the instrument to be detected. The invention proposes an interactive-fusion Grid Head module that keeps the main network structure of Grid R-CNN unchanged and uses the one-shot mechanism only inside the Grid Head module to improve detection precision; it exploits the characteristics of the instrument data set to play to the respective strengths of the RPN (Region Proposal Network) module and the Grid Head module: detection precision is improved while, with little data, the adverse effects of metric-learning loss functions and complex fusion strategies on network training are avoided.

The Grid Head exchange feature fusion network module comprises a Siamese network module, a feature interaction fusion module and deconvolution layers. The Siamese network module consists of 8 convolution layers and takes as input the feature F_din of the image to be detected and the feature of the template image, extracts features from each, and records the extracted features as F_d and Q respectively. The feature interaction fusion module fuses the extracted features F_d and Q as follows:

The features F_d and Q are each divided evenly into M groups along the channel dimension. The feature maps corresponding to the i-th Grid point are denoted F_di and Q_i, and the feature maps corresponding to the j-th point in the source-point set S_i are denoted F_dj and Q_j, where i = 1, 2, ..., M, M is the number of Grid points, j = 1, 2, ..., K_i, and K_i is the number of source points in S_i; a source point is a Grid point whose distance to the i-th Grid point is 1, and all such points form the source-point set. The feature maps F_di and Q_i are then deconvolved to obtain the corresponding unfused heatmap feature maps, denoted h_F' and h_Q' respectively. The feature maps F_dj and Q_j are each passed through 2 convolution layers with 3 × 3 kernels to obtain the new feature maps to be fused, denoted T_d:j→i(F_dj) and T_q:j→i(Q_j); the feature map Q_i is passed through 2 convolution layers with 3 × 3 kernels to obtain the template feature to be fused T_qd:i(Q_i); the feature map Q_j is passed through 2 convolution layers with 3 × 3 kernels to obtain the template feature to be fused T_qd:j→i(Q_j), i = 1, 2, ..., M, j = 1, 2, ..., K_i. Next, the feature maps F_di and Q_i, the feature maps to be fused T_d:j→i(F_dj) and T_q:j→i(Q_j), and the template features T_qd:i(Q_i) and T_qd:j→i(Q_j) are additively fused according to the following formula, i = 1, 2, ..., M, to obtain the fused feature maps F'_di and Q'_i:
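The additive-fusion formulas themselves are not reproduced in this text (they appear only as images in the original document). Following the additive fusion used in Grid R-CNN and the terms defined above, a plausible reconstruction, offered only as an interpretation, is:

F'_{di} = F_{di} + \sum_{j \in S_i} T_{d:j \to i}(F_{dj}) + T_{qd:i}(Q_i) + \sum_{j \in S_i} T_{qd:j \to i}(Q_j)

Q'_{i} = Q_{i} + \sum_{j \in S_i} T_{q:j \to i}(Q_j)

That is, the query-branch map F_di receives the transferred neighbour features of both branches together with the template feature of the same Grid point, while the template-branch map Q_i receives only its own neighbours.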

Finally, the feature maps F'_di and Q'_i are subjected to a second additive fusion according to the following formula to obtain the second-stage fused feature maps F''_di and Q''_i:
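The second-stage formulas likewise appear only as images in the original document; under the same reading, a plausible reconstruction is:

F''_{di} = F'_{di} + \sum_{j \in S_i} T'_{j \to i}(F'_{dj}) + T'_{qd:i}(Q_i) + \sum_{j \in S_i} T'_{qd:j \to i}(Q_j)

Q''_{i} = Q'_{i} + \sum_{j \in S_i} T'_{j \to i}(Q'_{j})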

where T'_j→i(F'_dj) denotes the new second-stage feature map to be fused, obtained by passing the feature map F'_dj through 2 convolution layers with the same structure as the convolution layers used earlier to obtain T_d:j→i(F_dj); T'_j→i(Q'_j) denotes the new second-stage feature map to be fused, obtained by passing the feature map Q'_j through 2 convolution layers with the same structure as the convolution layers used to obtain T_q:j→i(Q_j); T'_qd:i(Q_i) denotes the feature map Q_i processed by two convolution layers with 3 × 3 kernels; T'_qd:j→i(Q_j) denotes the feature map Q_j processed by two convolution layers with 3 × 3 kernels; i = 1, 2, ..., M, j = 1, 2, ..., K_i.

The deconvolution stage of the Grid Head exchange feature fusion network module consists of two deconvolution layers; the feature maps F''_di and Q''_i output by the fusion module are each input into the deconvolution layers, which output the final fused heatmap feature maps.
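A minimal PyTorch sketch of one exchange-fusion stage of this module, under the same reading as the reconstructed formulas above (template information is added into the query branch); the grid layout, channel count and source-point sets below are illustrative assumptions rather than the patent's exact implementation.

```python
import torch
import torch.nn as nn

def conv_block(ch):
    """Two 3x3 convolutions, the structure used for every transfer branch T(.)."""
    return nn.Sequential(
        nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True))

class ExchangeFusion(nn.Module):
    """One additive exchange-fusion stage over M per-Grid-point feature groups.

    neighbors[i] plays the role of the source-point set S_i.
    """
    def __init__(self, ch_per_point, neighbors):
        super().__init__()
        self.neighbors = neighbors
        self.t_qd = nn.ModuleList([conv_block(ch_per_point) for _ in neighbors])  # T_qd:i
        self.t_d, self.t_q, self.t_qdj = nn.ModuleDict(), nn.ModuleDict(), nn.ModuleDict()
        for i, js in neighbors.items():
            for j in js:
                key = f"{j}to{i}"
                self.t_d[key] = conv_block(ch_per_point)    # T_d:j->i  (query -> query)
                self.t_q[key] = conv_block(ch_per_point)    # T_q:j->i  (template -> template)
                self.t_qdj[key] = conv_block(ch_per_point)  # T_qd:j->i (template -> query)

    def forward(self, f_d, q):
        # f_d, q: lists of per-Grid-point feature maps F_di and Q_i, i = 0..M-1
        f_out, q_out = [], []
        for i, js in self.neighbors.items():
            f_i = f_d[i] + self.t_qd[i](q[i])   # query map + same-point template feature
            q_i = q[i]
            for j in js:
                key = f"{j}to{i}"
                f_i = f_i + self.t_d[key](f_d[j]) + self.t_qdj[key](q[j])
                q_i = q_i + self.t_q[key](q[j])
            f_out.append(f_i)
            q_out.append(q_i)
        return f_out, q_out

# Illustrative source-point sets for a 3x3 grid of points indexed 0..8 row-major
neighbors = {0: [1, 3], 1: [0, 2, 4], 2: [1, 5], 3: [0, 4, 6], 4: [1, 3, 5, 7],
             5: [2, 4, 8], 6: [3, 7], 7: [4, 6, 8], 8: [5, 7]}
fusion = ExchangeFusion(ch_per_point=64, neighbors=neighbors)
f_maps = [torch.randn(1, 64, 14, 14) for _ in range(9)]
q_maps = [torch.randn(1, 64, 14, 14) for _ in range(9)]
f_fused, q_fused = fusion(f_maps, q_maps)   # fused maps F'_di and Q'_i
```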

3. Network model training

Taking the images in the image data set and the instrument template image data set obtained in step 1 as input, the instrument detection network model based on the one-shot mechanism constructed in step 2 is trained by stochastic gradient descent to obtain a trained network model; the loss function of the network is calculated as:

Loss = L_cls + L_reg + L_seg (14)

where Loss represents the total loss of the network, L_cls represents the RPN module classification loss, L_reg represents the RPN module position regression loss, and L_cls and L_reg are consistent with the corresponding losses in Faster R-CNN; L_seg represents the cross-entropy loss of the Grid heatmaps in the Grid Head exchange feature fusion network module and is obtained by the following calculation:

L_seg = L_seg-unfused + L_seg-fused (15)

where L_seg-unfused represents the cross-entropy loss corresponding to the unfused heatmap feature maps and L_seg-fused represents the cross-entropy loss of the finally fused heatmap feature maps, calculated according to the following formula:
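The cross-entropy formula appears only as an image in the original document. Writing \hat{t}_{k,l} for the label value defined below, a plausible reconstruction of the fused term as the per-pixel binary cross-entropy used for heatmaps in Grid R-CNN is (the unfused term L_seg-unfused has the same form with t'_{k,l} in place of t_{k,l}; any normalization over M·N is not recoverable from the text):

L_{seg\text{-}fused} = -\sum_{l=1}^{M} \sum_{k=1}^{N} \left[ \hat{t}_{k,l} \log t_{k,l} + (1 - \hat{t}_{k,l}) \log(1 - t_{k,l}) \right]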

where M is the number of Grid points, N is the number of pixels in a heatmap feature map, t_k,l represents the value of the k-th pixel in the fused heatmap feature map corresponding to the l-th Grid point, t'_k,l represents the value of the k-th pixel in the unfused heatmap feature map corresponding to the l-th Grid point, and t_k,l and t'_k,l take values in the range 0 to 1; the corresponding label value, i.e. the value of the k-th pixel in the label map of that Grid point, takes the value 0 or 1, where 1 indicates that the k-th pixel belongs to the predicted Grid point region and 0 indicates that it does not.

4. Meter detection

The instrument image to be detected and the template instrument image are input into the network model trained in step 3, and the predicted heatmap feature maps are output. The generated heatmaps are converted into the quadrilateral vertex positions of the instrument to be detected in the same way as in the Grid R-CNN target detection network; a schematic of the Grid points and the quadrilateral vertex positions when there are 9 Grid points is shown in FIG. 5. The conversion is as follows:
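The conversion formula appears only as an image in the original document. A plausible reconstruction, following the heatmap-to-image mapping used in Grid R-CNN and assuming (P_x, P_y) is the top-left vertex of the RPN bounding box, is:

I_x = P_x + \frac{H_x}{w_o} \, w_p, \qquad I_y = P_y + \frac{H_y}{h_o} \, h_p

That is, the predicted point's position in the heatmap is rescaled by the ratio of the proposal size to the heatmap size and offset by the proposal's vertex.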

where (I_x, I_y) represents the coordinates of a vertex of the instrument to be detected in the image, (P_x, P_y) represents the coordinates of the vertex of the bounding box generated by the RPN module, (H_x, H_y) represents the position of the final heatmap predicted point in the heatmap feature map, (w_p, h_p) represents the width and height of the bounding box generated by the RPN module, and (w_o, h_o) represents the width and height of the heatmap.

To verify the effectiveness of the method, simulation experiments were carried out on hardware with CPU: i9-9900, memory: 16 GB, hard disk: 1 TB, and discrete graphics card: NVIDIA GeForce GTX 1080 Ti, 11 GB, under Ubuntu 16.04, using Python 3.6, OpenCV 3.4 and PyTorch 1.3. The data set used in the experiment was a self-built instrument data set; FIG. 6 shows result images obtained by the method of the present invention. It can be seen that the Grid point prediction mode localizes the instrument dials accurately, and the Grid Head exchange feature fusion network module based on the one-shot mechanism improves instrument detection and localization precision by exploiting the similarity among instruments of the same kind; the network model can accurately localize the instrument dial under different illumination conditions and different viewing angles.
