Pointer instrument image data synthesis method

Document No.: 1862006    Publication date: 2021-11-19

Reading note: This invention, a pointer instrument image data synthesis method, was designed and created by 田联房, 王昭霖 and 杜启亮 on 2021-07-28. The invention discloses a pointer instrument image data synthesis method comprising the following steps: 1) construct a dial positioning data set and prepare the data set to be enhanced, train a dial detection network, and use it to output dial images; 2) construct a range-dimension data set, train a character recognition network, and use it to output the text information of the range and dimension; 3) download character data sets of different styles, train character-generation GAN networks, use a trained network to output new range-dimension images that replace the original ones, and output range-dimension-enhanced dial images; 4) construct a pointer-scale data set, train a Mask R-CNN network, use it to output masks of the pointer and scales, apply morphological transformations to the pointer and scales, rotate the pointer to generate new pointer and scale images, output pointer-scale-enhanced dial images, and embed them at the dial position of the original instrument images. The invention offers diversity, flexibility and better generalization across different backgrounds, and augments the data available for meter reading.

1. A pointer instrument image data synthesis method is characterized by comprising the following steps:

1) marking dial plate images on instrument images to construct a dial plate positioning data set and prepare a data set to be enhanced, loading training parameters, training a dial plate detection network by using the dial plate positioning data set, obtaining an optimal dial plate detection network after training is finished, inputting the instrument data set to be enhanced into the optimal dial plate detection network, outputting the dial plate images and cutting the dial plate images; the dial plate detection network adopts a YOLO network, and a backbone network of the YOLO network is improved into a mobile lightweight network so as to reduce network parameters and calculated amount and improve operation speed;

2) cutting the dial plate image marked in the step 1), marking the range dimension on the dial plate image, marking the position and character information of the range dimension as the marking content, constructing a range dimension data set, loading a training parameter, training a character recognition network by using the range dimension data set, obtaining an optimal character recognition network after training is finished, inputting the dial plate image output in the step 1) into the character recognition network, and outputting the character information of the range dimension on the dial plate image; the character recognition network adopts a YOLO network, and a backbone network of the YOLO network is improved into a mobile lightweight network so as to reduce network parameters and calculated amount and improve operation speed;

3) downloading character data sets of different styles on the network, respectively loading training parameters with the character data sets of different styles to train characters to generate a GAN network, obtaining a plurality of corresponding characters to generate the GAN network after training is finished, inputting the text information of the range dimension output in the step 2) into any one trained character to generate the GAN network, generating a range dimension image of a new character style, replacing the original range dimension image on the dial plate image output in the step 1), and outputting a range dimension enhanced dial plate image;

4) cutting the dial plate image marked in the step 1), marking a pointer and scales on the dial plate by using a Mask, constructing a pointer scale data set, training a Mask R-CNN network by using the pointer scale data set according to loading parameters, obtaining an optimal Mask R-CNN network after training, inputting the range dimension enhanced dial plate image output in the step 3) into the optimal Mask R-CNN network, outputting a Mask of the pointer and the scales, obtaining a scale image according to the scale Mask, performing morphological transformation on the scale image to generate a new scale image and replace the original scale image, obtaining a pointer image according to the pointer Mask, performing morphological transformation on the pointer image, rotating the pointer image to generate a new pointer image to replace the original pointer image, and outputting a pointer scale enhanced dial plate image; and embedding the pointer scale enhanced dial image into the dial position of the original instrument image to complete the image data synthesis of the pointer instrument.

2. The method for synthesizing image data of a pointer instrument as claimed in claim 1, wherein in step 1), a camera collects various instrument images in different environments, the instrument images are preprocessed by filtering, image enhancement and graying to construct a dial positioning data set, abnormal data in the dial positioning data set are then removed, including abnormal images with dirty surfaces, extreme illumination and incomplete shooting, and the remaining data are labeled, the labeled content being the dial position.

3. The method for synthesizing image data of a pointer instrument as claimed in claim 1, wherein in step 1), the specific conditions of the dial detection network are as follows:

a. constructing a feature extraction network according to the requirements of real-time performance and high precision, wherein the feature extraction network consists of a plurality of combined convolution modules, and the method specifically comprises the following steps:

the first layer is a combined convolution module A which consists of a zero filling layer, a convolution layer, a batch normalization layer and an activation layer;

the second layer is a combined convolution module B which consists of a depth convolution layer, two batch normalization layers, two activation layers and a convolution layer;

the third layer is a combined convolution module C which consists of a zero filling layer, a deep convolution layer, two batch normalization layers, two activation layers and a convolution layer;

the fourth layer is a combined convolution module B which consists of a depth convolution layer, two batch normalization layers, two activation layers and a convolution layer;

the fifth layer is a combined convolution module C which consists of a zero filling layer, a depth convolution layer, two batch normalization layers, two activation layers and a convolution layer;

the sixth layer is a combined convolution module B which consists of a depth convolution layer, two batch normalization layers, two activation layers and a convolution layer;

the seventh layer is a combined convolution module C which consists of a zero filling layer, a deep convolution layer, two batch normalization layers, two activation layers and a convolution layer;

the eighth layer is a combined convolution module D which consists of five combined convolution modules B;

the ninth layer is a combined convolution module C which consists of a zero filling layer, a depth convolution layer, two batch normalization layers, two activation layers and a convolution layer;

the tenth layer is a combined convolution module B which consists of a depth convolution layer, two batch normalization layers, two activation layers and a convolution layer;

b. constructing prediction networks for outputting and predicting targets with different sizes according to the output of different layers of the feature extraction network, wherein the prediction networks comprise a large-size target prediction network, a medium-size target prediction network and a small-size target prediction network;

b1, the input of the large-size target prediction network is the tenth layer output of the feature extraction network, the large-size target prediction network is composed of a plurality of combined convolution modules and convolution layers, and the specific steps are as follows:

the first layer is a combined convolution module D which consists of five combined convolution modules B;

the second layer is a combined convolution module B which consists of a depth convolution layer, two batch normalization layers, two activation layers and a convolution layer;

the third layer is a convolution layer;

b2, the input of the medium-size target prediction network is the eighth layer output of the feature extraction network and the first layer output of the large-size target prediction network, the medium-size target prediction network is composed of a plurality of combined convolution modules and convolution layers, and the specific steps are as follows:

the first layer is an input fusion module A, which consists of a combined convolution module B, an up-sampling layer and a tensor splicing layer;

the second layer is a combined convolution module D which consists of five combined convolution modules B;

the third layer is a combined convolution module B which consists of a depth convolution layer, two batch normalization layers, two activation layers and a convolution layer;

the fourth layer is a convolution layer;

b3, inputting the small-size target prediction network into the sixth layer output of the feature extraction network and the second layer output of the medium-size target prediction network, wherein the small-size target prediction network consists of a plurality of combined convolution modules and convolution layers, and the specific steps are as follows:

the first layer is an input fusion module A, which consists of a combined convolution module B, an up-sampling layer and a tensor splicing layer;

the second layer is a combined convolution module D which consists of five combined convolution modules B;

the third layer is a combined convolution module B which consists of a depth convolution layer, two batch normalization layers, two activation layers and a convolution layer;

the fourth layer is a convolution layer;

finally, the outputs of the large-size target prediction network, the medium-size target prediction network and the small-size target prediction network pass through a non-maximum suppression layer to obtain the positions and categories of the predicted targets;

c. setting loss functions including a central coordinate loss function, a width and height loss function, a confidence coefficient loss function and a category loss function;

the center coordinate loss function is formulated as follows:

Loss_xy = mark_object * (2 - w*h) * Loss_log(xy_true, xy_predict)

where Loss_xy is the center-coordinate loss, mark_object is a flag indicating whether the anchor box contains an object, w is the anchor-box width, h is the anchor-box height, Loss_log is the binary cross-entropy loss, xy_true is the true center-coordinate value, and xy_predict is the predicted center-coordinate value;

the width-height loss function is formulated as follows:

Loss_wh = 0.5 * mark_object * (2 - w*h) * (wh_true - wh_predict)^2

where Loss_wh is the width-height loss, wh_true is the true width-height value, and wh_predict is the predicted width-height value;

the confidence loss function is formulated as follows:

Loss_confidence = mark_object * Loss_log(mark_object, c_predict) + (1 - mark_object) * Loss_log(mark_object, c_predict) * mark_ignore

where Loss_confidence is the confidence loss, c_predict is the confidence value of the prediction box, and mark_ignore is a flag for anchor boxes whose IoU is less than the threshold;

the class loss function is formulated as follows:

Loss_cls = mark_object * Loss_log(cls_true, cls_predict)

where Loss_cls is the class loss, cls_true is the true class, and cls_predict is the predicted class;

the total loss function is formulated as follows:

Loss = (Loss_xy + Loss_wh + Loss_confidence + Loss_cls) / num_f

where Loss is the total loss and num_f is the total number of inputs cast to a floating-point number;

loading the training parameters for the designed dial detection network, the training parameters being set as follows: the training optimizer is Adam, the initial learning rate is 0.001, the maximum number of training epochs is 500, and the batch size is 8; the training accuracy is checked on the validation set at intervals, the training completion flag is reaching the maximum number of training epochs or meeting the mean intersection-over-union (IoU) requirement, and the network is saved after training is completed.

4. The method for synthesizing image data of a pointer instrument as claimed in claim 1, wherein in step 3), the character generation GAN network is composed of a generation network and a discriminant network, and the specific conditions are as follows:

a. constructing the generator network according to the requirement of generating character images, wherein the generator network is composed of a plurality of combined convolution modules, specifically as follows:

the first layer is a full connection layer;

the second layer is a reconstruction layer;

the third layer is a combined convolution module A which consists of a zero filling layer, a convolution layer, a batch normalization layer and an activation layer;

the fourth layer is an upsampling layer;

the fifth layer is a combined convolution module A which consists of a zero filling layer, a convolution layer, a batch normalization layer and an activation layer;

the sixth layer is an upsampling layer;

the seventh layer is a combined convolution module A which consists of a zero filling layer, a convolution layer, a batch normalization layer and an activation layer;

the eighth layer is a combined convolution module A which consists of a zero filling layer, a convolution layer, a batch normalization layer and an activation layer;

b. constructing the discriminator network for judging the validity of character images, wherein its input is the image output by the generator network, and the discriminator network is composed of a plurality of combined convolution modules, specifically as follows:

the first layer is a combined convolution module E which consists of a zero filling layer, a convolution layer, a batch normalization layer, an activation layer and a Dropout layer;

the second layer is a combined convolution module E which consists of a zero filling layer, a convolution layer, a batch normalization layer, an activation layer and a Dropout layer;

the third layer is a zero filling layer;

the fourth layer is a combined convolution module E which consists of a zero filling layer, a convolution layer, a batch normalization layer, an activation layer and a Dropout layer;

the fifth layer is a combined convolution module E which consists of a zero filling layer, a convolution layer, a batch normalization layer, an activation layer and a Dropout layer;

the sixth layer is a global pooling layer;

the seventh layer is a full connection layer;

c. setting a loss function as a multi-class cross entropy, wherein the formula is as follows:

Loss = -Σ_{i=1}^{n} y_i * log(ŷ_i)

where Loss is the loss, n is the number of classes, y_i is the true probability of class i, and ŷ_i is the predicted probability of class i.

5. The pointer instrument image data synthesis method of claim 1, wherein in step 4), the Mask R-CNN network is a deep Mask R-CNN network composed of a base network, a region proposal network, a region feature aggregation module, a classification branch, a coordinate regression branch and a Mask branch, the specific conditions being as follows:

a. firstly, extracting features of an input image through a base network to obtain feature maps with different scales;

b. the region proposal network performs region proposal: candidate boxes of different scales are generated at each point on the feature map, coarse classification and coarse localization are performed, a large number of candidate boxes are screened out based on confidence and the idea of non-maximum suppression, and the remaining candidate boxes are sent to the subsequent network;

c. the feature-map regions containing candidate boxes of different sizes and scales are passed through the region feature aggregation module to obtain feature maps of fixed size; the region feature aggregation module divides each candidate box into a fixed number of cells without quantizing the cell boundaries, computes four fixed sampling positions in each cell, obtains the values at these four positions by bilinear interpolation, and performs max pooling over the four values;

d. the fixed-size feature map is taken as the input of the classification branch, the coordinate regression branch and the Mask branch; the classification branch outputs the feature-map category in one-hot encoded form, the coordinate regression branch predicts the coordinate and width-height offsets between the candidate box and the real target region, and the Mask branch outputs a binary mask image of the target expressed with values 0 and 1;

the deep Mask R-CNN network is characterized in that the base network and the Mask branch both adopt deep convolutional neural networks, with ResNet-50 as their backbone structure.

6. The method for synthesizing image data of a pointer instrument as claimed in claim 1, wherein in step 4), a scale image is obtained from the scale mask, and the scale image is subjected to morphological transformation to generate a new scale image, the morphological transformation methods including erosion, dilation, contrast adjustment, scaling and stretching; the new scale image replaces the original scale image, any part exceeding the original scale image covering it and any part smaller than the original scale image being adaptively filled according to the pixel values of the nearby non-scale region; a pointer image is obtained from the pointer mask, morphological transformation is applied to it to generate a new pointer image, the new pointer image is rotated clockwise within the range of 0-45 degrees, the rotated pointer image replaces the original pointer image, and a pointer-scale-enhanced dial image is output; the pointer-scale-enhanced dial image is embedded at the dial position of the original instrument image to complete the image data synthesis of the pointer instrument.

Technical Field

The invention relates to the technical field of image processing and neural networks, in particular to a pointer instrument image data synthesis method.

Background

Instruments serve as monitoring devices and mainly include pressure instruments, temperature instruments, flow instruments, electrical instruments and electronic measuring instruments; they are widely used in many aspects of industrial production and social life and provide great convenience. Compared with manual classification and reading, automatic classification and reading based on image processing has a wide application range and high efficiency, and has gradually become mainstream with the development of image processing and neural network technology. Key links in this approach are instrument localization and recognition, and the quality of the instrument image data set has an important influence on instrument localization and on the training of the classification network.

At present, research on instrument classification mainly focuses on using neural networks to classify different types of instrument images. This approach requires a certain amount of training data, and the instrument images to be recognized must be represented in training; the lack of image data for different types of instruments has long been a hot issue in neural network research, and without a high-quality data set the recognition and classification performance of a neural network on different instruments is not ideal. Research on instrument data enhancement currently focuses on traditional image processing, enhancing a data set through flipping, rotation, scaling and color perturbation. With the recent rapid development of GAN (Generative Adversarial Network) technology and image processing technology, generating images with a neural network to augment a data set has become possible.

In view of the above, a pointer instrument data synthesis method that produces diverse data has high practical application value.

Disclosure of Invention

The invention aims to overcome the defects and shortcomings of the prior art by providing a pointer instrument image data synthesis method that mainly uses neural networks and image processing technology to synthesize pointer instrument image data and thereby expand the data set.

In order to achieve the purpose, the technical scheme provided by the invention is as follows: a pointer instrument image data synthesis method comprises the following steps:

1) marking the dial on the instrument image to construct a dial positioning data set and prepare a data set to be enhanced, loading training parameters, training a dial detection network by using the dial positioning data set, obtaining an optimal dial detection network after training is finished, inputting the instrument data set to be enhanced into the optimal dial detection network, outputting a dial image and cutting the dial image; the dial plate detection network adopts a YOLO network, and a backbone network of the YOLO network is improved into a mobile lightweight network so as to reduce network parameters and calculated amount and improve operation speed;

2) cutting the dial plate image marked in the step 1), marking the range dimension on the dial plate image, marking the position and character information of the range dimension as the marking content, constructing a range dimension data set, loading a training parameter, training a character recognition network by using the range dimension data set, obtaining an optimal character recognition network after training is finished, inputting the dial plate image output in the step 1) into the character recognition network, and outputting the character information of the range dimension on the dial plate image; the character recognition network adopts a YOLO network, and a backbone network of the YOLO network is improved into a mobile lightweight network so as to reduce network parameters and calculated amount and improve operation speed;

3) downloading character data sets of different styles on the network, respectively loading training parameters with the character data sets of different styles to train characters to generate a GAN network, obtaining a plurality of corresponding characters to generate the GAN network after training is finished, inputting the text information of the range dimension output in the step 2) into any one trained character to generate the GAN network, generating a range dimension image of a new character style, replacing the original range dimension image on the dial plate image output in the step 1), and outputting a range dimension enhanced dial plate image;

4) cutting the dial plate image marked in the step 1), marking a pointer and scales on the dial plate by using a Mask, constructing a pointer scale data set, training a Mask R-CNN network by using the pointer scale data set according to loading parameters, obtaining an optimal Mask R-CNN network after training, inputting the range dimension enhanced dial plate image output in the step 3) into the optimal Mask R-CNN network, outputting a Mask of the pointer and the scales, obtaining a scale image according to the scale Mask, performing morphological transformation on the scale image to generate a new scale image and replace the original scale image, obtaining a pointer image according to the pointer Mask, performing morphological transformation on the pointer image, rotating the pointer image to generate a new pointer image to replace the original pointer image, and outputting a pointer scale enhanced dial plate image; and embedding the pointer scale enhanced dial image into the dial position of the original instrument image to complete the image data synthesis of the pointer instrument.

Further, in step 1), various instrument images in different environments are collected by a camera, and preprocessing operations of filtering, image enhancement and graying are performed on the instrument images to construct a dial positioning data set; abnormal data in the dial positioning data set, including abnormal images with dirty surfaces, extreme illumination and incomplete shooting, are then removed, and the remaining data are labeled, the labeled content being the dial position.

Further, in step 1), the specific situation of the dial plate detection network is as follows:

a. constructing a feature extraction network according to the requirements of real-time performance and high precision, wherein the feature extraction network consists of a plurality of combined convolution modules, and the method specifically comprises the following steps:

the first layer is a combined convolution module A which consists of a zero filling layer, a convolution layer, a batch normalization layer and an activation layer;

the second layer is a combined convolution module B which consists of a depth convolution layer, two batch normalization layers, two activation layers and a convolution layer;

the third layer is a combined convolution module C which consists of a zero filling layer, a deep convolution layer, two batch normalization layers, two activation layers and a convolution layer;

the fourth layer is a combined convolution module B which consists of a depth convolution layer, two batch normalization layers, two activation layers and a convolution layer;

the fifth layer is a combined convolution module C which consists of a zero filling layer, a depth convolution layer, two batch normalization layers, two activation layers and a convolution layer;

the sixth layer is a combined convolution module B which consists of a depth convolution layer, two batch normalization layers, two activation layers and a convolution layer;

the seventh layer is a combined convolution module C which consists of a zero filling layer, a deep convolution layer, two batch normalization layers, two activation layers and a convolution layer;

the eighth layer is a combined convolution module D which consists of five combined convolution modules B;

the ninth layer is a combined convolution module C which consists of a zero filling layer, a depth convolution layer, two batch normalization layers, two activation layers and a convolution layer;

the tenth layer is a combined convolution module B which consists of a depth convolution layer, two batch normalization layers, two activation layers and a convolution layer;

b. constructing prediction networks for outputting and predicting targets with different sizes according to the output of different layers of the feature extraction network, wherein the prediction networks comprise a large-size target prediction network, a medium-size target prediction network and a small-size target prediction network;

b1, the input of the large-size target prediction network is the tenth layer output of the feature extraction network, the large-size target prediction network is composed of a plurality of combined convolution modules and convolution layers, and the specific steps are as follows:

the first layer is a combined convolution module D which consists of five combined convolution modules B;

the second layer is a combined convolution module B which consists of a depth convolution layer, two batch normalization layers, two activation layers and a convolution layer;

the third layer is a convolution layer;

b2, the input of the medium-size target prediction network is the eighth layer output of the feature extraction network and the first layer output of the large-size target prediction network, the medium-size target prediction network is composed of a plurality of combined convolution modules and convolution layers, and the specific steps are as follows:

the first layer is an input fusion module A, which consists of a combined convolution module B, an up-sampling layer and a tensor splicing layer;

the second layer is a combined convolution module D which consists of five combined convolution modules B;

the third layer is a combined convolution module B which consists of a depth convolution layer, two batch normalization layers, two activation layers and a convolution layer;

the fourth layer is a convolution layer;

b3, inputting the small-size target prediction network into the sixth layer output of the feature extraction network and the second layer output of the medium-size target prediction network, wherein the small-size target prediction network consists of a plurality of combined convolution modules and convolution layers, and the specific steps are as follows:

the first layer is an input fusion module A, which consists of a combined convolution module B, an up-sampling layer and a tensor splicing layer;

the second layer is a combined convolution module D which consists of five combined convolution modules B;

the third layer is a combined convolution module B which consists of a depth convolution layer, two batch normalization layers, two activation layers and a convolution layer;

the fourth layer is a convolution layer;

finally, the outputs of the large-size target prediction network, the medium-size target prediction network and the small-size target prediction network pass through a non-maximum suppression layer to obtain the positions and categories of the predicted targets;

c. setting loss functions including a central coordinate loss function, a width and height loss function, a confidence coefficient loss function and a category loss function;

the center coordinate loss function is formulated as follows:

Loss_xy = mark_object * (2 - w*h) * Loss_log(xy_true, xy_predict)

where Loss_xy is the center-coordinate loss, mark_object is a flag indicating whether the anchor box contains an object, w is the anchor-box width, h is the anchor-box height, Loss_log is the binary cross-entropy loss, xy_true is the true center-coordinate value, and xy_predict is the predicted center-coordinate value;

the width-height loss function is formulated as follows:

Loss_wh = 0.5 * mark_object * (2 - w*h) * (wh_true - wh_predict)^2

where Loss_wh is the width-height loss, wh_true is the true width-height value, and wh_predict is the predicted width-height value;

the confidence loss function is formulated as follows:

Loss_confidence = mark_object * Loss_log(mark_object, c_predict) + (1 - mark_object) * Loss_log(mark_object, c_predict) * mark_ignore

where Loss_confidence is the confidence loss, c_predict is the confidence value of the prediction box, and mark_ignore is a flag for anchor boxes whose IoU is less than the threshold;

the class loss function is formulated as follows:

Loss_cls = mark_object * Loss_log(cls_true, cls_predict)

where Loss_cls is the class loss, cls_true is the true class, and cls_predict is the predicted class;

the total loss function is formulated as follows:

Loss = (Loss_xy + Loss_wh + Loss_confidence + Loss_cls) / num_f

where Loss is the total loss and num_f is the total number of inputs cast to a floating-point number;

loading the training parameters for the designed dial detection network, the training parameters being set as follows: the training optimizer is Adam, the initial learning rate is 0.001, the maximum number of training epochs is 500, and the batch size is 8; the training accuracy is checked on the validation set at intervals, the training completion flag is reaching the maximum number of training epochs or meeting the mean intersection-over-union (IoU) requirement, and the network is saved after training is completed.

Further, in step 3), the character generation GAN network is composed of a generator network and a discriminator network, the specific conditions being as follows:

a. constructing the generator network according to the requirement of generating character images, wherein the generator network is composed of a plurality of combined convolution modules, specifically as follows:

the first layer is a full connection layer;

the second layer is a reconstruction layer;

the third layer is a combined convolution module A which consists of a zero filling layer, a convolution layer, a batch normalization layer and an activation layer;

the fourth layer is an upsampling layer;

the fifth layer is a combined convolution module A which consists of a zero filling layer, a convolution layer, a batch normalization layer and an activation layer;

the sixth layer is an upsampling layer;

the seventh layer is a combined convolution module A which consists of a zero filling layer, a convolution layer, a batch normalization layer and an activation layer;

the eighth layer is a combined convolution module A which consists of a zero filling layer, a convolution layer, a batch normalization layer and an activation layer;

b. constructing the discriminator network for judging the validity of character images, wherein its input is the image output by the generator network, and the discriminator network is composed of a plurality of combined convolution modules, specifically as follows:

the first layer is a combined convolution module E which consists of a zero filling layer, a convolution layer, a batch normalization layer, an activation layer and a Dropout layer;

the second layer is a combined convolution module E which consists of a zero filling layer, a convolution layer, a batch normalization layer, an activation layer and a Dropout layer;

the third layer is a zero filling layer;

the fourth layer is a combined convolution module E which consists of a zero filling layer, a convolution layer, a batch normalization layer, an activation layer and a Dropout layer;

the fifth layer is a combined convolution module E which consists of a zero filling layer, a convolution layer, a batch normalization layer, an activation layer and a Dropout layer;

the sixth layer is a global pooling layer;

the seventh layer is a full connection layer;

c. setting a loss function as a multi-class cross entropy, wherein the formula is as follows:

Loss = -Σ_{i=1}^{n} y_i * log(ŷ_i)

where Loss is the loss, n is the number of classes, y_i is the true probability of class i, and ŷ_i is the predicted probability of class i.
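For readability, the multi-class cross-entropy above can also be written out directly. The following NumPy sketch is illustrative only; y_true and y_pred are assumed to be length-n probability vectors, and the function name is an assumption of this sketch.

```python
import numpy as np

def multiclass_cross_entropy(y_true, y_pred, eps=1e-7):
    """Loss = -sum_i y_i * log(y_hat_i) over the n classes."""
    return -np.sum(y_true * np.log(np.clip(y_pred, eps, 1.0)))
```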

Further, in step 4), the Mask R-CNN network is a deep Mask R-CNN network composed of a base network, a region proposal network (RPN), a region feature aggregation module (RoIAlign), a classification branch, a coordinate regression branch and a Mask branch, the specific conditions being as follows:

a. firstly, extracting features of an input image through a base network to obtain feature maps with different scales;

b. the RPN performs region proposal: candidate boxes of different scales are generated at each point on the feature map, coarse classification and coarse localization are performed by the RPN, a large number of candidate boxes are screened out based on confidence and the idea of non-maximum suppression, and the remaining candidate boxes are sent to the subsequent network;

c. the feature-map regions containing candidate boxes of different sizes and scales are passed through RoIAlign to obtain feature maps of fixed size; RoIAlign divides each candidate box into a fixed number of cells without quantizing the cell boundaries, computes four fixed sampling positions in each cell, obtains the values at these four positions by bilinear interpolation (see the sampling sketch after this list), and performs max pooling over the four values;

d. the fixed-size feature map is taken as the input of the classification branch, the coordinate regression branch and the Mask branch; the classification branch outputs the feature-map category in one-hot encoded form, the coordinate regression branch predicts the coordinate and width-height offsets between the candidate box and the real target region, and the Mask branch outputs a binary mask image of the target expressed with values 0 and 1;

the deep Mask R-CNN network is characterized in that the base network and the Mask branch both adopt deep convolutional neural networks, with ResNet-50 as their backbone structure.
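As an illustration of the bilinear interpolation used at each of the four sampling positions in RoIAlign, the sketch below samples a single-channel feature map at one non-integer coordinate. The function name and boundary handling are assumptions of this sketch, not details taken from the disclosure.

```python
import numpy as np

def bilinear_sample(feature, y, x):
    # feature: 2-D array (H, W); (y, x): non-integer sample position inside the map
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    y1 = min(y0 + 1, feature.shape[0] - 1)
    x1 = min(x0 + 1, feature.shape[1] - 1)
    wy, wx = y - y0, x - x0
    return ((1 - wy) * (1 - wx) * feature[y0, x0]
            + (1 - wy) * wx * feature[y0, x1]
            + wy * (1 - wx) * feature[y1, x0]
            + wy * wx * feature[y1, x1])
```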

Further, in step 4), a scale image is obtained from the scale mask, and the scale image is subjected to morphological transformation to generate a new scale image, the morphological transformation methods including erosion, dilation, contrast adjustment, scaling and stretching; the new scale image replaces the original scale image, any part exceeding the original scale image covering it and any part smaller than the original scale image being adaptively filled according to the pixel values of the nearby non-scale region; a pointer image is obtained from the pointer mask, morphological transformation is applied to it to generate a new pointer image, the new pointer image is rotated clockwise within the range of 0-45 degrees, the rotated pointer image replaces the original pointer image, and a pointer-scale-enhanced dial image is output; the pointer-scale-enhanced dial image is embedded at the dial position of the original instrument image to complete the image data synthesis of the pointer instrument.
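The following OpenCV sketch illustrates one way the pointer step could be carried out: a morphological transformation (dilation) on the masked pointer, followed by a clockwise rotation about the dial centre. Using the dial centre as the rotation pivot, the 3 × 3 kernel and the variable names are assumptions of this sketch, not details fixed by the description.

```python
import cv2
import numpy as np

def rotate_pointer(dial_img, pointer_mask, angle_deg):
    """angle_deg in (0, 45]; positive values rotate the pointer clockwise."""
    kernel = np.ones((3, 3), np.uint8)
    mask = cv2.dilate(pointer_mask, kernel)                  # morphological transformation (dilation)
    pointer = cv2.bitwise_and(dial_img, dial_img, mask=mask)
    h, w = dial_img.shape[:2]
    # getRotationMatrix2D uses counter-clockwise angles, so negate for a clockwise rotation
    rot = cv2.getRotationMatrix2D((w / 2, h / 2), -angle_deg, 1.0)
    new_pointer = cv2.warpAffine(pointer, rot, (w, h))
    new_mask = cv2.warpAffine(mask, rot, (w, h))
    return new_pointer, new_mask
```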

Compared with the prior art, the invention has the following advantages and beneficial effects:

1. the invention realizes the image data synthesis of the pointer instrument by using a method combining the neural network and the image processing, thereby expanding a data set.

2. The invention increases the meter data for detection, identification and reading by synthesizing the image data of the pointer meter, greatly increases the training and testing data of the reading system of the pointer meter, improves the capability of artificial intelligence in meter detection, identification and reading, improves the robustness of a network and effectively promotes the development of automatic meter identification.

Drawings

FIG. 1 is a flow chart of the method of the present invention.

FIG. 2 is a schematic diagram of a combined convolution module.

FIG. 3 is a schematic diagram of an input fusion module.

FIG. 4 is a diagram of a Mask R-CNN network structure.

Fig. 5 is an overall structure diagram of the ResNet-50.

FIG. 6 is an ID block structure diagram.

FIG. 7 is a conv block structure diagram.

FIG. 8 is a block diagram showing the structure of Mask branches.

Detailed Description

The present invention will be described in further detail with reference to examples and drawings, but the present invention is not limited thereto.

As shown in fig. 1, the pointer instrument image data synthesis method provided in this embodiment includes the following steps:

1) Preparing the dial positioning data set and the data set to be enhanced: a camera collects various instrument images in different environments, and preprocessing operations such as filtering, image enhancement and graying are performed on the instrument images using image processing techniques to construct a dial positioning data set; abnormal data in the dial positioning data set, including abnormal images with dirty surfaces, extreme illumination and incomplete shooting, are removed, and all remaining data are labeled, the labeled content being the dial position.
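A minimal OpenCV sketch of the preprocessing chain mentioned above (filtering, image enhancement, graying). Using a Gaussian filter and histogram equalization for the filtering and enhancement steps is an assumption of the sketch; the original text does not fix the concrete operators.

```python
import cv2

def preprocess_meter_image(path):
    img = cv2.imread(path)                        # raw instrument image from the camera
    img = cv2.GaussianBlur(img, (3, 3), 0)        # filtering to suppress noise
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)  # graying
    gray = cv2.equalizeHist(gray)                 # image enhancement (histogram equalization)
    return gray
```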

The dial detection network is trained with the dial positioning data set using the loaded training parameters; after training, the optimal dial detection network is obtained, the instrument data set to be enhanced is input into the optimal dial detection network, and dial images are output and cropped.

The dial detection network adopts a YOLO network whose backbone is improved into a lightweight mobile network to reduce the number of network parameters and the amount of computation and to increase the running speed. Unless otherwise stated, all activation layers in step 1) use the Leaky ReLU activation function. The dial detection network is constructed as follows:

a. constructing a feature extraction network

And constructing a feature extraction network according to the requirements of real-time performance and high precision. The feature extraction network is mainly composed of a plurality of combined convolution modules.

The feature extraction network structure is as follows:

the input image is 416 × 416 × 3.

The first layer is the combined convolution module A, as shown in FIG. 2(a). The module first passes through the zero-padding layer, giving an output of 418 × 418 × 3, and then through the convolution, batch normalization and activation layers with a (3, 3) kernel, stride 2 and 32 filters, giving an output of 208 × 208 × 32.

The second layer is a combined convolution module B, as shown in FIG. 2(b). The module first passes through the depthwise convolution, batch normalization and activation layers with a (3, 3) kernel, stride 1 and padding that keeps the input and output sizes consistent, giving an output of 208 × 208 × 32. It then passes through the convolution, batch normalization and activation layers with a (1, 1) kernel, stride 1, 64 filters and size-preserving padding, giving an output of 208 × 208 × 64.

The third layer is a combined convolution module C, as shown in FIG. 2(c). The module first passes through the zero-padding layer, giving an output of 210 × 210 × 64, then through the depthwise convolution, batch normalization and activation layers with a (3, 3) kernel and stride 2, giving an output of 104 × 104 × 64, and finally through the convolution, batch normalization and activation layers with a (1, 1) kernel, stride 1, 128 filters and size-preserving padding, giving an output of 104 × 104 × 128.

The fourth layer is a combined convolution module B, as shown in FIG. 2(b). The module first passes through the depthwise convolution, batch normalization and activation layers with a (3, 3) kernel, stride 1 and size-preserving padding, giving an output of 104 × 104 × 128. It then passes through the convolution, batch normalization and activation layers with a (1, 1) kernel, stride 1, 128 filters and size-preserving padding, giving an output of 104 × 104 × 128.

The fifth layer is a combined convolution module C, as shown in FIG. 2(c). The module first passes through the zero-padding layer, giving an output of 106 × 106 × 128, then through the depthwise convolution, batch normalization and activation layers with a (3, 3) kernel and stride 2, giving an output of 52 × 52 × 128, and finally through the convolution, batch normalization and activation layers with a (1, 1) kernel, stride 1, 256 filters and size-preserving padding, giving an output of 52 × 52 × 256.

The sixth layer is a combined convolution module B, as shown in FIG. 2(b). The module first passes through the depthwise convolution, batch normalization and activation layers with a (3, 3) kernel, stride 1 and size-preserving padding, giving an output of 52 × 52 × 256. It then passes through the convolution, batch normalization and activation layers with a (1, 1) kernel, stride 1, 256 filters and size-preserving padding, giving an output of 52 × 52 × 256.

The seventh layer is a combined convolution module C, as shown in FIG. 2(c). The module first passes through the zero-padding layer, giving an output of 54 × 54 × 256, then through the depthwise convolution, batch normalization and activation layers with a (3, 3) kernel and stride 2, giving an output of 26 × 26 × 256, and finally through the convolution, batch normalization and activation layers with a (1, 1) kernel, stride 1, 512 filters and size-preserving padding, giving an output of 26 × 26 × 512.

The eighth layer is a combined convolution module D, as shown in FIG. 2(d), which passes through five combined convolution modules B in sequence, as shown in FIG. 2(b). In each combined convolution module B, the input first passes through the depthwise convolution, batch normalization and activation layers with a (3, 3) kernel, stride 1 and size-preserving padding, giving an output of 26 × 26 × 512, and then through the convolution, batch normalization and activation layers with a (1, 1) kernel, stride 1, 512 filters and size-preserving padding, giving an output of 26 × 26 × 512. After the five identical combined convolution modules B in sequence, the output is 26 × 26 × 512.

The ninth layer is a combined convolution module C, as shown in FIG. 2(c). The module first passes through the zero-padding layer, giving an output of 28 × 28 × 512, then through the depthwise convolution, batch normalization and activation layers with a (3, 3) kernel and stride 2, giving an output of 13 × 13 × 512, and finally through the convolution, batch normalization and activation layers with a (1, 1) kernel, stride 1, 1024 filters and size-preserving padding, giving an output of 13 × 13 × 1024.

The tenth layer is a combined convolution module B, as shown in FIG. 2(b). The module first passes through the depthwise convolution, batch normalization and activation layers with a (3, 3) kernel, stride 1 and size-preserving padding, giving an output of 13 × 13 × 1024. It then passes through the convolution, batch normalization and activation layers with a (1, 1) kernel, stride 1, 1024 filters and size-preserving padding, giving an output of 13 × 13 × 1024.
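The combined convolution modules described above can be expressed compactly. The following tf.keras sketch shows modules A, B and C (module D is simply five B modules in sequence), using the kernel sizes, strides and Leaky ReLU activations given in the text; the function names and the exact padding choices are assumptions of the sketch rather than the patented implementation.

```python
import tensorflow as tf
from tensorflow.keras import layers

def module_a(x, filters, kernel=3):
    # zero padding -> convolution (stride 2) -> batch normalization -> Leaky ReLU
    x = layers.ZeroPadding2D(1)(x)
    x = layers.Conv2D(filters, kernel, strides=2)(x)
    x = layers.BatchNormalization()(x)
    return layers.LeakyReLU()(x)

def module_b(x, filters, kernel=3):
    # depthwise conv + BN + activation, then 1x1 conv + BN + activation, size preserved
    x = layers.DepthwiseConv2D(kernel, strides=1, padding='same')(x)
    x = layers.BatchNormalization()(x)
    x = layers.LeakyReLU()(x)
    x = layers.Conv2D(filters, 1, strides=1, padding='same')(x)
    x = layers.BatchNormalization()(x)
    return layers.LeakyReLU()(x)

def module_c(x, filters, kernel=3):
    # zero padding, a stride-2 depthwise conv, then a 1x1 conv; halves the spatial size
    x = layers.ZeroPadding2D(1)(x)
    x = layers.DepthwiseConv2D(kernel, strides=2)(x)
    x = layers.BatchNormalization()(x)
    x = layers.LeakyReLU()(x)
    x = layers.Conv2D(filters, 1, strides=1, padding='same')(x)
    x = layers.BatchNormalization()(x)
    return layers.LeakyReLU()(x)
```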

b. Building a predictive network

And constructing prediction networks for outputting and predicting targets with different sizes according to the output of different layers of the feature extraction network, wherein the prediction networks comprise a large-size target prediction network, a medium-size target prediction network and a small-size target prediction network.

b1 large-size target prediction network

The input is the tenth layer output of the feature extraction network, and the large-size target prediction network mainly comprises a plurality of combined convolution modules and convolution layers.

The input image is 13 × 13 × 1024.

The large-size target prediction network structure is as follows:

The first layer is the combined convolution module D, as shown in FIG. 2(d), which passes through five combined convolution modules B in sequence, as shown in FIG. 2(b). In the first combined convolution module B, the input passes through the depthwise convolution, batch normalization and activation layers with a (1, 1) kernel, stride 1 and size-preserving padding, giving an output of 13 × 13 × 1024, and then through the convolution, batch normalization and activation layers with a (1, 1) kernel, stride 1, 512 filters and size-preserving padding, giving an output of 13 × 13 × 512. In the second combined convolution module B, the input passes through the depthwise convolution, batch normalization and activation layers with a (3, 3) kernel, stride 1 and size-preserving padding, giving an output of 13 × 13 × 512, and then through the convolution, batch normalization and activation layers with a (1, 1) kernel, stride 1, 1024 filters and size-preserving padding, giving an output of 13 × 13 × 1024. After alternating between these two parameterizations of combined convolution module B, the output is 13 × 13 × 512.

The second layer is a combined convolution module B, as shown in FIG. 2(b). The module first passes through the depthwise convolution, batch normalization and activation layers with a (3, 3) kernel, stride 1 and size-preserving padding, giving an output of 13 × 13 × 512, and then through the convolution, batch normalization and activation layers with a (1, 1) kernel, stride 1, 1024 filters and size-preserving padding, giving an output of 13 × 13 × 1024.

The third layer is a convolutional layer. The convolution kernel is (1, 1), the step size is 1, the number of filters is 255, and the output is 13 × 13 × 255.

b2 medium size target prediction network

The input is the eighth layer output of the feature extraction network and the first layer output of the large-size target prediction network, and the medium-size target prediction network mainly comprises a plurality of combined convolution modules and convolution layers.

The input images are 26 × 26 × 512 and 13 × 13 × 512.

The medium-sized target prediction network structure is as follows:

The first layer is the input fusion module, as shown in FIG. 3. The 13 × 13 × 512 input first passes through a combined convolution module B, in which the depthwise convolution, batch normalization and activation layers use a (1, 1) kernel, stride 1 and size-preserving padding, giving an output of 13 × 13 × 512, followed by the convolution, batch normalization and activation layers with a (1, 1) kernel, stride 1, 512 filters and size-preserving padding, giving an output of 13 × 13 × 512. The result then passes through an upsampling layer with a sampling factor of 2, giving an output of 26 × 26 × 512. Finally, this output and the 26 × 26 × 512 input pass through a tensor concatenation layer, giving an output of 26 × 26 × 1024 (a code sketch of this fusion module appears at the end of this subsection).

The second layer is the combined convolution module D, as shown in FIG. 2(d), which passes through five combined convolution modules B in sequence, as shown in FIG. 2(b). In the first combined convolution module B, the input passes through the depthwise convolution, batch normalization and activation layers with a (1, 1) kernel, stride 1 and size-preserving padding, giving an output of 26 × 26 × 1024, and then through the convolution, batch normalization and activation layers with a (1, 1) kernel, stride 1, 256 filters and size-preserving padding, giving an output of 26 × 26 × 256. In the second combined convolution module B, the input passes through the depthwise convolution, batch normalization and activation layers with a (3, 3) kernel, stride 1 and size-preserving padding, giving an output of 26 × 26 × 256, and then through the convolution, batch normalization and activation layers with a (1, 1) kernel, stride 1, 512 filters and size-preserving padding, giving an output of 26 × 26 × 512. After alternating between these two parameterizations of combined convolution module B, the output is 26 × 26 × 256.

The third layer is a combined convolution module B, as shown in FIG. 2(b). The module first passes through the depthwise convolution, batch normalization and activation layers with a (3, 3) kernel, stride 1 and size-preserving padding, giving an output of 26 × 26 × 256, and then through the convolution, batch normalization and activation layers with a (1, 1) kernel, stride 1, 512 filters and size-preserving padding, giving an output of 26 × 26 × 512.

The fourth layer is a convolutional layer. The convolution kernel is (1, 1), the step size is 1, the number of filters is 255, and the output is 26 × 26 × 255.
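The input fusion module that begins both the medium-size and small-size prediction networks (combined convolution module B, 2× upsampling, tensor concatenation) can be sketched as follows, reusing the illustrative module_b helper from the earlier sketch; the function name is an assumption.

```python
def input_fusion(deep_feature, shallow_feature, filters):
    x = module_b(deep_feature, filters, kernel=1)       # e.g. 13x13x512 -> 13x13x512
    x = layers.UpSampling2D(size=2)(x)                  # 13x13 -> 26x26
    return layers.Concatenate()([x, shallow_feature])   # concatenate along the channel axis
```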

b3 small-size target prediction network

The input is the sixth layer output of the feature extraction network and the second layer output of the medium-size target prediction network, and the small-size target prediction network mainly comprises a plurality of combined convolution modules and convolution layers.

The input images are 52 × 52 × 256 and 26 × 26 × 256.

The small-size target prediction network structure is as follows:

The first layer is the input fusion module, as shown in FIG. 3. The 26 × 26 × 256 input first passes through a combined convolution module B, in which the depthwise convolution, batch normalization and activation layers use a (1, 1) kernel, stride 1 and size-preserving padding, giving an output of 26 × 26 × 256, followed by the convolution, batch normalization and activation layers with a (1, 1) kernel, stride 1, 256 filters and size-preserving padding, giving an output of 26 × 26 × 256. The result then passes through an upsampling layer with a sampling factor of 2, giving an output of 52 × 52 × 256. Finally, this output and the 52 × 52 × 256 input pass through a tensor concatenation layer, giving an output of 52 × 52 × 512.

The second layer is the combined convolution module D, as shown in FIG. 2(d), which passes through five combined convolution modules B in sequence, as shown in FIG. 2(b). In the first combined convolution module B, the input passes through the depthwise convolution, batch normalization and activation layers with a (1, 1) kernel, stride 1 and size-preserving padding, giving an output of 52 × 52 × 512, and then through the convolution, batch normalization and activation layers with a (1, 1) kernel, stride 1, 128 filters and size-preserving padding, giving an output of 52 × 52 × 128. In the second combined convolution module B, the input passes through the depthwise convolution, batch normalization and activation layers with a (3, 3) kernel, stride 1 and size-preserving padding, giving an output of 52 × 52 × 128, and then through the convolution, batch normalization and activation layers with a (1, 1) kernel, stride 1, 256 filters and size-preserving padding, giving an output of 52 × 52 × 256. After alternating between these two parameterizations of combined convolution module B, the output is 52 × 52 × 128.

The third layer is a combined convolution module B, as shown in fig. 2 (B). The module first goes through a depthwise convolution, a batch normalization layer and an activation layer; the convolution kernel is (3, 3), the step size is 1, padding keeps the input and output sizes consistent, and the output is 52 × 52 × 128. It then goes through a convolution layer, a batch normalization layer and an activation layer; the convolution kernel is (1, 1), the step size is 1, the number of filters is 256, padding keeps the input and output sizes consistent, and the output is 52 × 52 × 256.

The fourth layer is a convolutional layer. The convolution kernel is (1, 1), the step size is 1, the number of filters is 255, and the output is 52 × 52 × 255.

Finally, the output 13 × 13 × 255 of the large-size target prediction network, the output 26 × 26 × 255 of the medium-size target prediction network, and the output 52 × 52 × 255 of the small-size target prediction network are passed through the non-maximum suppression layer to obtain the predicted target positions and categories.

c. Setting a loss function

The loss function is set as the mean of the sum of the center coordinate loss, the width-height loss, the confidence loss and the category loss. The loss function formula is as follows:

$$\mathrm{Loss} = \left(\mathrm{Loss}_{xy} + \mathrm{Loss}_{wh} + \mathrm{Loss}_{confidence} + \mathrm{Loss}_{cls}\right) / num_f$$

where Loss denotes the total loss, Loss_xy the center coordinate loss, Loss_wh the width-height loss, Loss_confidence the confidence loss, Loss_cls the category loss, and num_f the total number of inputs as a floating-point number. The individual loss functions are as follows:

$$\mathrm{Loss}_{xy} = mark_{object} \cdot (2 - w \cdot h) \cdot \mathrm{Loss}_{log}(xy_{true}, xy_{predict})$$

$$\mathrm{Loss}_{wh} = 0.5 \cdot mark_{object} \cdot (2 - w \cdot h) \cdot (wh_{true} - wh_{predict})^2$$

$$\mathrm{Loss}_{confidence} = mark_{object} \cdot \mathrm{Loss}_{log}(mark_{object}, c_{predict}) + (1 - mark_{object}) \cdot \mathrm{Loss}_{log}(mark_{object}, c_{predict}) \cdot mark_{ignore}$$

$$\mathrm{Loss}_{cls} = mark_{object} \cdot \mathrm{Loss}_{log}(cls_{true}, cls_{predict})$$

where mark_object denotes the flag indicating whether an anchor box contains an object, w denotes the anchor box width, h denotes the anchor box height, Loss_log denotes the binary cross-entropy loss, xy_true denotes the ground-truth center coordinates, xy_predict the predicted center coordinates, wh_true the ground-truth width and height, wh_predict the predicted width and height, c_predict the confidence of the prediction box, mark_ignore the flag of anchor boxes whose IoU is less than the threshold, cls_true the ground-truth category, and cls_predict the predicted category.
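
As a concrete illustration, the following is a minimal per-anchor-cell sketch of these loss terms in PyTorch. The tensor layout, the assumption that the predictions have already passed through a sigmoid, and the use of the batch size as num_f are illustrative assumptions rather than the exact implementation of the invention.

```python
# Sketch of the loss terms above; predictions are assumed to be sigmoid outputs.
import torch.nn.functional as F

def yolo_loss(pred, true, obj_mask, ignore_mask):
    """pred / true: dicts of float tensors with keys 'xy' (B,A,H,W,2),
    'wh' (B,A,H,W,2), 'conf' (B,A,H,W), 'cls' (B,A,H,W,C).
    obj_mask / ignore_mask: (B,A,H,W) float tensors."""
    box_scale = 2.0 - true["wh"][..., 0] * true["wh"][..., 1]        # (2 - w*h)

    loss_xy = obj_mask * box_scale * F.binary_cross_entropy(
        pred["xy"], true["xy"], reduction="none").sum(-1)
    loss_wh = 0.5 * obj_mask * box_scale * ((true["wh"] - pred["wh"]) ** 2).sum(-1)

    bce_conf = F.binary_cross_entropy(pred["conf"], obj_mask, reduction="none")
    loss_conf = obj_mask * bce_conf + (1.0 - obj_mask) * bce_conf * ignore_mask

    loss_cls = obj_mask * F.binary_cross_entropy(
        pred["cls"], true["cls"], reduction="none").sum(-1)

    numf = float(obj_mask.shape[0])                                   # batch size as num_f
    return (loss_xy.sum() + loss_wh.sum() + loss_conf.sum() + loss_cls.sum()) / numf
```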

The training process comprises the following steps:

d1. Setting training parameters

The training optimizer is set to Adam, the initial learning rate to 0.001, the number of iterations to 500, and the batch size to 8. K-means clustering on all labels generates the initial prior boxes (38, 29), (65, 52), (94, 87), (142, 134), (195, 69), (216, 206), (337, 320), (397, 145), (638, 569).
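
For illustration, a minimal sketch of generating such prior boxes by K-means clustering over the labelled box widths and heights is given below. Clustering by IoU (rather than Euclidean distance) is an assumption commonly used for YOLO anchors and is not specified by the invention.

```python
# Sketch: K-means over (width, height) pairs of all labelled boxes, IoU-based.
import numpy as np

def kmeans_anchors(wh, k=9, iters=100, seed=0):
    """wh: (N, 2) array of labelled box widths and heights in pixels."""
    rng = np.random.default_rng(seed)
    anchors = wh[rng.choice(len(wh), size=k, replace=False)].astype(float)
    for _ in range(iters):
        inter = (np.minimum(wh[:, None, 0], anchors[None, :, 0]) *
                 np.minimum(wh[:, None, 1], anchors[None, :, 1]))
        union = wh[:, 0:1] * wh[:, 1:2] + anchors.prod(axis=1)[None, :] - inter
        assign = np.argmax(inter / union, axis=1)        # nearest anchor by IoU
        for j in range(k):
            if np.any(assign == j):
                anchors[j] = wh[assign == j].mean(axis=0)
    return anchors[np.argsort(anchors.prod(axis=1))]     # sorted by area

# e.g. anchors = kmeans_anchors(all_label_wh)  ->  nine (w, h) prior boxes
```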

d2. Online data enhancement

Data enhancement is performed on the input images to expand the data set. The data enhancement methods include: random mirror flipping, random noise addition and random contrast adjustment.
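
A brief sketch of these three online augmentations, applied to each training image with random probability, is shown below; the concrete probabilities and the noise/contrast ranges are assumptions.

```python
# Sketch of the listed augmentations: mirror flip, additive noise, contrast.
import random
import numpy as np
import cv2

def augment(img):
    if random.random() < 0.5:                       # random mirror flip
        img = cv2.flip(img, 1)
    if random.random() < 0.5:                       # random Gaussian noise
        noise = np.random.normal(0, 8, img.shape)
        img = np.clip(img.astype(np.float32) + noise, 0, 255).astype(np.uint8)
    if random.random() < 0.5:                       # random contrast adjustment
        img = cv2.convertScaleAbs(img, alpha=random.uniform(0.7, 1.3), beta=0)
    return img
```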

d3. Setting the training completion flag

A validation-set interval is set to monitor training accuracy. The training completion flag is that the maximum of 500 iterations is reached and the accuracy meets the requirement; the network is saved after training is completed.

2) Using the dial plate marked in the step 1), marking the range dimension on the dial plate, wherein the marked content is the position and character information of the range dimension, constructing a range dimension data set, loading training parameters, using the range dimension data set to train a character recognition network, and obtaining an optimal character recognition network after training; the character recognition network adopts a YOLO network, and a backbone network of the YOLO network is improved into a mobile lightweight network so as to reduce network parameters and calculated amount and improve operation speed. Inputting the dial plate image output in the step 1) into a character recognition network, and outputting the text information of the measuring range dimension in the dial plate.

3) Downloading character data sets of different styles on a network, respectively loading training parameters with the character data sets of different styles to train characters to generate a GAN network, obtaining a plurality of corresponding characters to generate the GAN network after training is finished, inputting the text information of the range dimension output in the step 2) into any one trained character to generate the GAN network, generating a range dimension image of a new character style, replacing the original range dimension image on the dial image output in the step 1), and outputting a range dimension synthesized dial image, namely the range dimension enhanced dial image.

The character generation GAN network comprises a generative network and a discriminant network, and is constructed through the following steps:

a. Building a generative network

The generative network is constructed according to the requirement of generating character images, and is mainly composed of a plurality of combined convolution modules.

The generated network structure is as follows:

the input random noise image is 28 × 28 × 1.

The first layer is a fully connected layer.

The second layer is the reconstruction (reshape) layer, and the output is 7 × 7 × 32.

The third layer is a combined convolution module A, as shown in fig. 2 (a). The module first passes through the zero-padding layer, and the output is 9 × 9 × 32. Then it passes through a convolution layer; the convolution kernel is (3, 3), the step size is 1, the number of filters is 64, and the output is 7 × 7 × 64.

The fourth layer is an up-sampling layer, and the output is 14 × 14 × 64.

The fifth layer is a combined convolution module A, as shown in fig. 2 (a). The module first passes through the zero-padding layer, and the output is 16 × 16 × 64. Then it passes through a convolution layer; the convolution kernel is (3, 3), the step size is 1, the number of filters is 128, and the output is 14 × 14 × 128.

The sixth layer is an up-sampling layer, and the output is 28 × 28 × 128.

The seventh layer is a combined convolution module A, as shown in fig. 2 (a). The module first passes through the zero-padding layer, and the output is 30 × 30 × 128. Then it passes through a convolution layer; the convolution kernel is (3, 3), the step size is 1, the number of filters is 64, and the output is 28 × 28 × 64.

The eighth layer is a combined convolution module A, as shown in fig. 2 (a). The module first passes through the zero-padding layer, and the output is 30 × 30 × 64. Then it passes through a convolution layer; the convolution kernel is (3, 3), the step size is 1, the number of filters is 1, and the output is 28 × 28 × 1.
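
Putting the eight layers together, a compact sketch of the generative network might look as follows. The flattening of the 28 × 28 × 1 noise input before the fully connected layer, the interpretation of the reconstruction layer as a reshape, and the BN/ReLU/Tanh activations are assumptions made for illustration.

```python
# Sketch of the generator: FC -> reshape 7x7x32 -> module A / upsample stack -> 28x28x1.
import torch
import torch.nn as nn

def conv_a(in_ch, out_ch):
    # "combined convolution module A": zero padding + 3x3 conv (+ assumed BN/ReLU)
    return nn.Sequential(nn.ZeroPad2d(1), nn.Conv2d(in_ch, out_ch, 3, stride=1),
                         nn.BatchNorm2d(out_ch), nn.ReLU())

class Generator(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(28 * 28, 7 * 7 * 32)        # first layer: fully connected
        self.body = nn.Sequential(
            conv_a(32, 64),                             # third layer: 7x7x32 -> 7x7x64
            nn.Upsample(scale_factor=2),                # fourth layer: -> 14x14x64
            conv_a(64, 128),                            # fifth layer: -> 14x14x128
            nn.Upsample(scale_factor=2),                # sixth layer: -> 28x28x128
            conv_a(128, 64),                            # seventh layer: -> 28x28x64
            nn.ZeroPad2d(1), nn.Conv2d(64, 1, 3), nn.Tanh(),  # eighth layer: -> 28x28x1
        )

    def forward(self, z):                               # z: (B, 1, 28, 28) random noise
        x = self.fc(z.flatten(1)).view(-1, 32, 7, 7)    # second layer: reshape
        return self.body(x)

img = Generator()(torch.randn(4, 1, 28, 28))            # -> (4, 1, 28, 28)
```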

b. Building discriminant networks

The discriminant network is constructed according to the need to discriminate the validity of character images. Its input is the image output by the generative network, and it is mainly composed of a plurality of combined convolution modules, as follows:

the input image is 28 × 28 × 1.

The first layer is the combined convolution module E, as shown in fig. 2 (E). The module first passes through the zero-padding layer, and the output is 30 × 30 × 1. Then it passes through a convolution layer; the convolution kernel is (3, 3), the step size is 2, the number of filters is 16, and the output is 14 × 14 × 16.

The second layer is a combined convolution module E, as shown in fig. 2 (E). The module first passes through the zero-padding layer, and the output is 16 × 16 × 16. Then it passes through a convolution layer; the convolution kernel is (3, 3), the step size is 2, the number of filters is 32, and the output is 7 × 7 × 32.

The third layer is a zero-padding layer, and the output is 8 × 8 × 32.

The fourth layer is a combined convolution module E, as shown in fig. 2 (E). The module first passes through the zero-padding layer, and the output is 10 × 10 × 32. Then it passes through a convolution layer, a batch normalization layer and an activation layer; the convolution kernel is (3, 3), the step size is 2, the number of filters is 64, and the output is 4 × 4 × 64.

The fifth layer is a combined convolution module E, as shown in fig. 2 (E). The module first passes through the zero-padding layer, and the output is 6 × 6 × 64. Then it passes through a convolution layer; the convolution kernel is (3, 3), the step size is 1, the number of filters is 128, and the output is 4 × 4 × 128.

The sixth layer is a global pooling layer with an output of 1 × 1 × 128.

The seventh layer is a fully connected layer; the output dimension is 1 + 10 + m, where m is the number of recognizable dimension categories.
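
A matching sketch of the discriminant network is given below, where combined convolution module E is taken as zero padding + 3 × 3 convolution + batch normalization + activation; the LeakyReLU activation is an assumption, and the (1 + 10 + m)-dimensional output follows the seventh layer described above.

```python
# Sketch of the discriminator: four module E blocks, global pooling, FC head.
import torch
import torch.nn as nn

def conv_e(in_ch, out_ch, stride):
    return nn.Sequential(nn.ZeroPad2d(1), nn.Conv2d(in_ch, out_ch, 3, stride=stride),
                         nn.BatchNorm2d(out_ch), nn.LeakyReLU(0.2))

class Discriminator(nn.Module):
    def __init__(self, m):
        super().__init__()
        self.features = nn.Sequential(
            conv_e(1, 16, 2),            # 28x28x1  -> 14x14x16
            conv_e(16, 32, 2),           # -> 7x7x32
            nn.ZeroPad2d((0, 1, 0, 1)),  # -> 8x8x32
            conv_e(32, 64, 2),           # -> 4x4x64
            conv_e(64, 128, 1),          # -> 4x4x128
            nn.AdaptiveAvgPool2d(1),     # global pooling -> 1x1x128
        )
        self.fc = nn.Linear(128, 1 + 10 + m)   # real/fake + digits + m dimension classes

    def forward(self, x):
        return self.fc(self.features(x).flatten(1))

logits = Discriminator(m=5)(torch.randn(4, 1, 28, 28))   # -> (4, 16)
```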

c. Setting a loss function

The loss function is set as the multi-class cross entropy:

$$\mathrm{Loss} = -\sum_{i=1}^{n} y_i \log(\hat{y}_i)$$

where Loss denotes the loss, n denotes the number of classes, y_i denotes the true probability of class i, and ŷ_i denotes the predicted probability of class i.

The generated image is converted into a three-channel image and subjected to a color space transformation, and it then replaces the range dimension image in the original dial image.
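
A sketch of this post-processing step is given below: the single-channel generated character image is converted to three channels, shifted in HSV color space, and pasted over the bounding box of the original range dimension text. The helper name, the HSV hue shift and the (x, y, w, h) box format are illustrative assumptions.

```python
# Sketch: 1-channel generated patch -> 3 channels -> HSV shift -> paste into dial.
import cv2
import numpy as np

def replace_range_text(dial_bgr, gen_gray, box, hue_shift=20):
    x, y, w, h = box                                    # range dimension box on the dial
    patch = cv2.cvtColor(gen_gray, cv2.COLOR_GRAY2BGR)  # 1 channel -> 3 channels
    hsv = cv2.cvtColor(patch, cv2.COLOR_BGR2HSV)
    hsv[..., 0] = (hsv[..., 0].astype(int) + hue_shift) % 180   # color space transform
    patch = cv2.cvtColor(hsv, cv2.COLOR_HSV2BGR)
    dial_bgr[y:y + h, x:x + w] = cv2.resize(patch, (w, h))      # replace original text
    return dial_bgr
```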

4) Using the dial plate images marked in step 1), the pointer and the scales on the dial plate are labeled with masks to construct a pointer scale data set. Training parameters are loaded and the Mask R-CNN network is trained with the pointer scale data set; the optimal Mask R-CNN network is obtained after training is finished. The range dimension enhanced dial image output in step 3) is input into the optimal Mask R-CNN network, which outputs the masks of the pointer image and the scale image.

The Mask R-CNN network structure is shown in FIG. 4 and mainly comprises the base network ResNet-50, a region proposal network (RPN), a region feature aggregation module (RoIAlign), a classification branch, a coordinate regression branch and a Mask branch. In the figure, conv is a conventional convolutional layer, Softmax is the classification output layer, and FC is a fully connected layer. The main structure of the whole network is introduced as follows:

the overall structure of the base network ResNet-50 is shown in FIG. 5 and mainly comprises ID blocks and conv blocks, which are mainly composed of the nonlinear activation function ReLU and combined convolution module F; the ID block is shown in FIG. 6, and the conv block is shown in FIG. 7. In the figures, CONV2D is a conventional convolutional layer, BatchNorm is a batch normalization layer, ReLU is a nonlinear activation function, MAXPOOL is a max pooling layer, AVGPOOL is an average pooling layer, and FC is a fully connected layer.

The RPN is composed of one 3 × 3 convolutional layer, two 1 × 1 convolutional layers and the nonlinear function Softmax. It mainly performs coarse classification and coordinate regression on the prior candidate boxes generated on the final feature map of the base network, and screens them based on the classification confidence and the overlap between rectangular boxes, yielding a certain number of potential candidate boxes for subsequent processing.

RoIAlign mainly pools the feature-map region where each candidate box is located into a feature map of fixed size. The feature map inside each candidate box is first evenly divided into 14 × 14 cells without quantizing the cell boundaries; four fixed sampling positions are then determined within each cell, the values at these four positions are computed by bilinear interpolation, and a max pooling operation is performed over the four values.
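
The bilinear interpolation used by RoIAlign to read a feature value at a non-integer sampling position can be illustrated with the following small sketch (NumPy, single-channel feature map for simplicity).

```python
# Sketch of bilinear sampling at a fractional (y, x) position on a feature map.
import numpy as np

def bilinear_sample(feat, y, x):
    """feat: (H, W) feature map; (y, x): fractional coordinates inside the map."""
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    y1, x1 = min(y0 + 1, feat.shape[0] - 1), min(x0 + 1, feat.shape[1] - 1)
    dy, dx = y - y0, x - x0
    return (feat[y0, x0] * (1 - dy) * (1 - dx) + feat[y0, x1] * (1 - dy) * dx +
            feat[y1, x0] * dy * (1 - dx) + feat[y1, x1] * dy * dx)

# Each of the 14 x 14 cells samples four such points and keeps their maximum.
```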

The classification branch is composed of a 3 × 3 convolution layer, a 1 × 1 convolution layer and an output layer Softmax, and outputs the object class and confidence within each candidate box. The coordinate regression branch is likewise composed of a 3 × 3 convolution layer, a 1 × 1 convolution layer and an output layer, and outputs the coordinate and width-height offsets between the candidate box and the ground-truth box. The Mask branch predicts the binary mask of the target; it is a fully convolutional structure that also follows the ResNet-50 design, the channels of the intermediate layers are all 256, and the number of channels of the last layer equals the number of categories, here 2. Its structural diagram is shown in FIG. 8.

For an input image, features are first extracted by ResNet-50, and the RPN performs region proposal to obtain a large number of potential candidate boxes. RoIAlign then extracts a fixed-size feature map from the feature-map region of each candidate box, which serves as the input of the classification branch, the coordinate regression branch and the Mask branch: the Mask branch yields the binary mask of the target, the classification branch yields its classification result, and the coordinate regression branch yields the positioning offsets used to correct the coordinates.

The dial scale masks are then extracted. A scale image is obtained according to the scale mask, and several morphological transformation methods are randomly selected and applied to it to generate a new scale image. The morphological transformation methods include: erosion, dilation, contrast adjustment, scaling and stretching.
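
A sketch of randomly selecting and applying several of these morphological transformations to the scale image is given below; kernel sizes, contrast factors and scaling ranges are assumptions.

```python
# Sketch: apply a random subset of {erode, dilate, contrast, scale, stretch}.
import random
import numpy as np
import cv2

def transform_scale(scale_img):
    kernel = np.ones((3, 3), np.uint8)
    ops = [lambda im: cv2.erode(im, kernel, iterations=1),
           lambda im: cv2.dilate(im, kernel, iterations=1),
           lambda im: cv2.convertScaleAbs(im, alpha=random.uniform(0.8, 1.2)),
           lambda im: cv2.resize(im, None, fx=random.uniform(0.9, 1.1),
                                 fy=random.uniform(0.9, 1.1)),
           lambda im: cv2.resize(im, None, fx=random.uniform(1.0, 1.2), fy=1.0)]
    for op in random.sample(ops, k=random.randint(1, 3)):   # several methods at random
        scale_img = op(scale_img)
    return scale_img
```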

A pointer image is obtained according to the pointer mask and subjected to morphological transformation to generate a new pointer image. The new pointer image is rotated clockwise within the range of 0-45 degrees, with the rotation center being the center point fitted from the obtained scale image, i.e. the pointer center. The rotated pointer image replaces the original pointer image, and the pointer scale enhanced dial image is output.
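
The clockwise rotation of the new pointer image around the fitted center could be sketched as follows. Fitting the center with cv2.minEnclosingCircle on the scale mask points is an illustrative assumption, since the invention only states that the center is obtained by fitting the scale image.

```python
# Sketch: rotate the pointer image clockwise by 0-45 degrees around the fitted center.
import random
import cv2

def rotate_pointer(pointer_img, scale_mask):
    pts = cv2.findNonZero(scale_mask)                    # binary uint8 scale mask assumed
    (cx, cy), _ = cv2.minEnclosingCircle(pts)            # fitted pointer center (assumption)
    angle = -random.uniform(0, 45)                       # negative angle = clockwise
    M = cv2.getRotationMatrix2D((cx, cy), angle, 1.0)
    h, w = pointer_img.shape[:2]
    return cv2.warpAffine(pointer_img, M, (w, h))
```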

The pointer scale enhanced dial image is embedded at the dial position of the original instrument image, completing the pointer instrument image data synthesis.
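
Finally, a minimal sketch of embedding the enhanced dial image back at the dial position detected in step 1), assuming the dial position is given as an (x, y, w, h) box:

```python
# Sketch: paste the enhanced dial back into the detected dial box of the instrument image.
import cv2

def embed_dial(instrument_img, enhanced_dial, dial_box):
    x, y, w, h = dial_box
    instrument_img[y:y + h, x:x + w] = cv2.resize(enhanced_dial, (w, h))
    return instrument_img
```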

In conclusion, the invention provides a new method for enhancing pointer instrument image data. Using synthesized data as an effective means of data enhancement can effectively alleviate the shortage of pointer instrument image data and promote the development of automatic instrument reading recognition, so the method has practical value and is worth popularizing.

The above embodiments are preferred embodiments of the present invention, but the present invention is not limited to the above embodiments, and any other changes, modifications, substitutions, combinations, and simplifications which do not depart from the spirit and principle of the present invention should be construed as equivalents thereof, and all such changes, modifications, substitutions, combinations, and simplifications are intended to be included in the scope of the present invention.
