Pointer instrument image data synthesis method

Document No.: 1862006    Publication date: 2021-11-19

Reading note: This invention, a pointer instrument image data synthesis method, was designed and created by 田联房, 王昭霖 and 杜启亮 on 2021-07-28. The invention discloses a pointer instrument image data synthesis method comprising the following steps: 1) construct a dial positioning data set and prepare the data set to be enhanced, train a dial detection network, and use it to output dial images; 2) construct a range-dimension data set, train a character recognition network, and use it to output the text information of the range and dimension; 3) download character data sets of different styles, train character-generation GAN networks, use a trained network to output new range-dimension images that replace the original ones, and output range-dimension-enhanced dial images; 4) construct a pointer-scale data set, train a Mask R-CNN network, use it to output masks of the pointer and scales, apply morphological transformations to the pointer and scales, rotate the pointer to generate new pointer and scale images, output pointer-scale-enhanced dial images, and embed them at the dial position of the original instrument images. The invention offers diversity, flexibility and better generalization across different backgrounds, and augments the data available for meter reading.

1. A pointer instrument image data synthesis method is characterized by comprising the following steps:

1) marking dial plate images on instrument images to construct a dial plate positioning data set and prepare a data set to be enhanced, loading training parameters, training a dial plate detection network by using the dial plate positioning data set, obtaining an optimal dial plate detection network after training is finished, inputting the instrument data set to be enhanced into the optimal dial plate detection network, outputting the dial plate images and cutting the dial plate images; the dial plate detection network adopts a YOLO network, and a backbone network of the YOLO network is improved into a mobile lightweight network so as to reduce network parameters and calculated amount and improve operation speed;

2) cutting the dial plate image marked in the step 1), marking the range dimension on the dial plate image, marking the position and character information of the range dimension as the marking content, constructing a range dimension data set, loading a training parameter, training a character recognition network by using the range dimension data set, obtaining an optimal character recognition network after training is finished, inputting the dial plate image output in the step 1) into the character recognition network, and outputting the character information of the range dimension on the dial plate image; the character recognition network adopts a YOLO network, and a backbone network of the YOLO network is improved into a mobile lightweight network so as to reduce network parameters and calculated amount and improve operation speed;

3) downloading character data sets of different styles on the network, respectively loading training parameters with the character data sets of different styles to train characters to generate a GAN network, obtaining a plurality of corresponding characters to generate the GAN network after training is finished, inputting the text information of the range dimension output in the step 2) into any one trained character to generate the GAN network, generating a range dimension image of a new character style, replacing the original range dimension image on the dial plate image output in the step 1), and outputting a range dimension enhanced dial plate image;

4) cutting the dial plate image marked in the step 1), marking a pointer and scales on the dial plate by using a Mask, constructing a pointer scale data set, training a Mask R-CNN network by using the pointer scale data set according to loading parameters, obtaining an optimal Mask R-CNN network after training, inputting the range dimension enhanced dial plate image output in the step 3) into the optimal Mask R-CNN network, outputting a Mask of the pointer and the scales, obtaining a scale image according to the scale Mask, performing morphological transformation on the scale image to generate a new scale image and replace the original scale image, obtaining a pointer image according to the pointer Mask, performing morphological transformation on the pointer image, rotating the pointer image to generate a new pointer image to replace the original pointer image, and outputting a pointer scale enhanced dial plate image; and embedding the pointer scale enhanced dial image into the dial position of the original instrument image to complete the image data synthesis of the pointer instrument.

2. The method for synthesizing image data of a pointer instrument as claimed in claim 1, wherein in step 1), a camera collects various instrument images in different environments, the instrument images are preprocessed by filtering, image enhancement and graying to construct a dial positioning data set, abnormal data in the dial positioning data set are then removed, including abnormal images with dirty surfaces, extreme illumination and incomplete shooting, and the remaining data are labeled, the labeled content being the dial position.

3. The method for synthesizing image data of a pointer instrument as claimed in claim 1, wherein in step 1), the specific conditions of the dial detection network are as follows:

a. constructing a feature extraction network according to the requirements of real-time performance and high precision, wherein the feature extraction network consists of a plurality of combined convolution modules, and the method specifically comprises the following steps:

the first layer is a combined convolution module A which consists of a zero filling layer, a convolution layer, a batch normalization layer and an activation layer;

the second layer is a combined convolution module B which consists of a depth convolution layer, two batch normalization layers, two activation layers and a convolution layer;

the third layer is a combined convolution module C which consists of a zero filling layer, a deep convolution layer, two batch normalization layers, two activation layers and a convolution layer;

the fourth layer is a combined convolution module B which consists of a depth convolution layer, two batch normalization layers, two activation layers and a convolution layer;

the fifth layer is a combined convolution module C which consists of a zero filling layer, a depth convolution layer, two batch normalization layers, two activation layers and a convolution layer;

the sixth layer is a combined convolution module B which consists of a depth convolution layer, two batch normalization layers, two activation layers and a convolution layer;

the seventh layer is a combined convolution module C which consists of a zero filling layer, a deep convolution layer, two batch normalization layers, two activation layers and a convolution layer;

the eighth layer is a combined convolution module D which consists of five combined convolution modules B;

the ninth layer is a combined convolution module C which consists of a zero filling layer, a depth convolution layer, two batch normalization layers, two activation layers and a convolution layer;

the tenth layer is a combined convolution module B which consists of a depth convolution layer, two batch normalization layers, two activation layers and a convolution layer;

b. constructing prediction networks for outputting and predicting targets with different sizes according to the output of different layers of the feature extraction network, wherein the prediction networks comprise a large-size target prediction network, a medium-size target prediction network and a small-size target prediction network;

b1, the input of the large-size target prediction network is the tenth layer output of the feature extraction network, the large-size target prediction network is composed of a plurality of combined convolution modules and convolution layers, and the specific steps are as follows:

the first layer is a combined convolution module D which consists of five combined convolution modules B;

the second layer is a combined convolution module B which consists of a depth convolution layer, two batch normalization layers, two activation layers and a convolution layer;

the third layer is a convolution layer;

b2, the input of the medium-size target prediction network is the eighth layer output of the feature extraction network and the first layer output of the large-size target prediction network, the medium-size target prediction network is composed of a plurality of combined convolution modules and convolution layers, and the specific steps are as follows:

the first layer is an input fusion module A, which consists of a combined convolution module B, an up-sampling layer and a tensor splicing layer;

the second layer is a combined convolution module D which consists of five combined convolution modules B;

the third layer is a combined convolution module B which consists of a depth convolution layer, two batch normalization layers, two activation layers and a convolution layer;

the fourth layer is a convolution layer;

b3, inputting the small-size target prediction network into the sixth layer output of the feature extraction network and the second layer output of the medium-size target prediction network, wherein the small-size target prediction network consists of a plurality of combined convolution modules and convolution layers, and the specific steps are as follows:

the first layer is an input fusion module A, which consists of a combined convolution module B, an up-sampling layer and a tensor splicing layer;

the second layer is a combined convolution module D which consists of five combined convolution modules B;

the third layer is a combined convolution module B which consists of a depth convolution layer, two batch normalization layers, two activation layers and a convolution layer;

the fourth layer is a convolution layer;

finally, the outputs of the large-size target prediction network, the medium-size target prediction network and the small-size target prediction network pass through a non-maximum suppression layer to obtain the positions and categories of the predicted targets;

c. setting loss functions including a central coordinate loss function, a width and height loss function, a confidence coefficient loss function and a category loss function;

the center coordinate loss function is formulated as follows:

Loss_xy = mark_object * (2 - w*h) * Loss_log(xy_true, xy_predict)

where Loss_xy is the center-coordinate loss, mark_object is a flag indicating whether the anchor box contains an object, w is the anchor-box width, h is the anchor-box height, Loss_log is the binary cross-entropy loss, xy_true is the true center-coordinate value, and xy_predict is the predicted center-coordinate value;

the width-height loss function is formulated as follows:

Loss_wh = 0.5 * mark_object * (2 - w*h) * (wh_true - wh_predict)^2

where Loss_wh is the width-height loss, wh_true is the true width-height value, and wh_predict is the predicted width-height value;

the confidence loss function is formulated as follows:

Loss_confidence = mark_object * Loss_log(mark_object, c_predict) + (1 - mark_object) * Loss_log(mark_object, c_predict) * mark_ignore

where Loss_confidence is the confidence loss, c_predict is the confidence value of the prediction box, and mark_ignore is a flag for anchor boxes whose IoU is less than the threshold;

the class loss function is formulated as follows:

Loss_cls = mark_object * Loss_log(cls_true, cls_predict)

where Loss_cls is the class loss, cls_true is the true class, and cls_predict is the predicted class;

the total loss function is formulated as follows:

Loss = (Loss_xy + Loss_wh + Loss_confidence + Loss_cls) / num_f

where Loss is the total loss and num_f is the total number of inputs cast to a floating-point number;

loading the training parameters for the designed dial detection network, the training parameters being set as follows: the training optimizer is Adam, the initial learning rate is 0.001, the maximum number of training epochs is 500, and the batch size is 8; the training accuracy is checked on the validation set at intervals, the training completion flag is reaching the maximum number of training epochs or meeting the mean intersection-over-union (IoU) requirement, and the network is saved after training is completed.

4. The method for synthesizing image data of a pointer instrument as claimed in claim 1, wherein in step 3), the character generation GAN network is composed of a generation network and a discriminant network, and the specific conditions are as follows:

a. constructing the generator network according to the requirement of generating character images, wherein the generator network is composed of a plurality of combined convolution modules, specifically as follows:

the first layer is a full connection layer;

the second layer is a reconstruction layer;

the third layer is a combined convolution module A which consists of a zero filling layer, a convolution layer, a batch normalization layer and an activation layer;

the fourth layer is an upsampling layer;

the fifth layer is a combined convolution module A which consists of a zero filling layer, a convolution layer, a batch normalization layer and an activation layer;

the sixth layer is an upsampling layer;

the seventh layer is a combined convolution module A which consists of a zero filling layer, a convolution layer, a batch normalization layer and an activation layer;

the eighth layer is a combined convolution module A which consists of a zero filling layer, a convolution layer, a batch normalization layer and an activation layer;

b. constructing the discriminator network for judging the validity of character images, wherein its input is the image output by the generator network, and the discriminator network is composed of a plurality of combined convolution modules, specifically as follows:

the first layer is a combined convolution module E which consists of a zero filling layer, a convolution layer, a batch normalization layer, an activation layer and a Dropout layer;

the second layer is a combined convolution module E which consists of a zero filling layer, a convolution layer, a batch normalization layer, an activation layer and a Dropout layer;

the third layer is a zero filling layer;

the fourth layer is a combined convolution module E which consists of a zero filling layer, a convolution layer, a batch normalization layer, an activation layer and a Dropout layer;

the fifth layer is a combined convolution module E which consists of a zero filling layer, a convolution layer, a batch normalization layer, an activation layer and a Dropout layer;

the sixth layer is a global pooling layer;

the seventh layer is a full connection layer;

c. setting a loss function as a multi-class cross entropy, wherein the formula is as follows:

Loss = -Σ_{i=1}^{n} y_i * log(ŷ_i)

where Loss is the loss, n is the number of classes, y_i is the true probability of class i, and ŷ_i is the predicted probability of class i.

5. The pointer instrument image data synthesis method of claim 1, wherein in step 4), the Mask R-CNN network is a deep Mask R-CNN network composed of a base network, a region proposal network, a region feature aggregation module, a classification branch, a coordinate regression branch and a Mask branch, the specific conditions being as follows:

a. firstly, extracting features of an input image through a base network to obtain feature maps with different scales;

b. the region proposal network performs region proposal: candidate boxes of different scales are generated at each point on the feature map, coarse classification and coarse localization are performed, a large number of candidate boxes are screened out based on confidence and the idea of non-maximum suppression, and the remaining candidate boxes are sent to the subsequent network;

c. the feature-map regions containing candidate boxes of different sizes and scales are passed through the region feature aggregation module to obtain feature maps of fixed size; the region feature aggregation module divides each candidate box into a fixed number of cells without quantizing the cell boundaries, computes four fixed sampling positions in each cell, obtains the values at these four positions by bilinear interpolation, and performs max pooling over the four values;

d. the fixed-size feature map is taken as the input of the classification branch, the coordinate regression branch and the Mask branch; the classification branch outputs the feature-map category in one-hot encoded form, the coordinate regression branch predicts the coordinate and width-height offsets between the candidate box and the real target region, and the Mask branch outputs a binary mask image of the target expressed with values 0 and 1;

the deep Mask R-CNN network is characterized in that the base network and the Mask branch both adopt deep convolutional neural networks, with ResNet-50 as their backbone structure.

6. The method for synthesizing image data of a pointer instrument as claimed in claim 1, wherein in step 4), a scale image is obtained from the scale mask, and the scale image is subjected to morphological transformation to generate a new scale image, the morphological transformation methods including erosion, dilation, contrast adjustment, scaling and stretching; the new scale image replaces the original scale image, any part exceeding the original scale image covering it and any part smaller than the original scale image being adaptively filled according to the pixel values of the nearby non-scale region; a pointer image is obtained from the pointer mask, morphological transformation is applied to it to generate a new pointer image, the new pointer image is rotated clockwise within the range of 0-45 degrees, the rotated pointer image replaces the original pointer image, and a pointer-scale-enhanced dial image is output; the pointer-scale-enhanced dial image is embedded at the dial position of the original instrument image to complete the image data synthesis of the pointer instrument.

Technical Field

The invention relates to the technical field of image processing and neural networks, in particular to a pointer instrument image data synthesis method.

Background

Instruments serve as monitoring devices and mainly include pressure instruments, temperature instruments, flow instruments, electrical instruments and electronic measuring instruments; they are widely used in many aspects of industrial production and social life and provide great convenience. Compared with manual classification and reading, automatic classification and reading based on image processing has a wide application range and high efficiency, and has gradually become mainstream with the development of image processing and neural network technology. Key links in this approach are instrument localization and recognition, and the quality of the instrument image data set has an important influence on instrument localization and on the training of the classification network.

At present, research on instrument classification mainly focuses on using neural networks to classify different types of instrument images. This approach requires a certain amount of training data, and the instrument images to be recognized must be represented in training; the lack of image data for different types of instruments has long been a hot issue in neural network research, and without a high-quality data set the recognition and classification performance of a neural network on different instruments is not ideal. Research on instrument data enhancement currently focuses on traditional image processing, enhancing a data set through flipping, rotation, scaling and color perturbation. With the recent rapid development of GAN (Generative Adversarial Network) technology and image processing technology, generating images with a neural network to augment a data set has become possible.

In view of the above, a pointer instrument data synthesis method that produces diverse data has high practical application value.

Disclosure of Invention

The invention aims to overcome the defects and shortcomings of the prior art by providing a pointer instrument image data synthesis method that mainly uses neural networks and image processing technology to synthesize pointer instrument image data and thereby expand the data set.

In order to achieve the purpose, the technical scheme provided by the invention is as follows: a pointer instrument image data synthesis method comprises the following steps:

1) marking the dial on the instrument image to construct a dial positioning data set and prepare a data set to be enhanced, loading training parameters, training a dial detection network by using the dial positioning data set, obtaining an optimal dial detection network after training is finished, inputting the instrument data set to be enhanced into the optimal dial detection network, outputting a dial image and cutting the dial image; the dial plate detection network adopts a YOLO network, and a backbone network of the YOLO network is improved into a mobile lightweight network so as to reduce network parameters and calculated amount and improve operation speed;

2) cutting the dial plate image marked in the step 1), marking the range dimension on the dial plate image, marking the position and character information of the range dimension as the marking content, constructing a range dimension data set, loading a training parameter, training a character recognition network by using the range dimension data set, obtaining an optimal character recognition network after training is finished, inputting the dial plate image output in the step 1) into the character recognition network, and outputting the character information of the range dimension on the dial plate image; the character recognition network adopts a YOLO network, and a backbone network of the YOLO network is improved into a mobile lightweight network so as to reduce network parameters and calculated amount and improve operation speed;

3) downloading character data sets of different styles on the network, respectively loading training parameters with the character data sets of different styles to train characters to generate a GAN network, obtaining a plurality of corresponding characters to generate the GAN network after training is finished, inputting the text information of the range dimension output in the step 2) into any one trained character to generate the GAN network, generating a range dimension image of a new character style, replacing the original range dimension image on the dial plate image output in the step 1), and outputting a range dimension enhanced dial plate image;

4) cutting the dial plate image marked in the step 1), marking a pointer and scales on the dial plate by using a Mask, constructing a pointer scale data set, training a Mask R-CNN network by using the pointer scale data set according to loading parameters, obtaining an optimal Mask R-CNN network after training, inputting the range dimension enhanced dial plate image output in the step 3) into the optimal Mask R-CNN network, outputting a Mask of the pointer and the scales, obtaining a scale image according to the scale Mask, performing morphological transformation on the scale image to generate a new scale image and replace the original scale image, obtaining a pointer image according to the pointer Mask, performing morphological transformation on the pointer image, rotating the pointer image to generate a new pointer image to replace the original pointer image, and outputting a pointer scale enhanced dial plate image; and embedding the pointer scale enhanced dial image into the dial position of the original instrument image to complete the image data synthesis of the pointer instrument.

Further, in step 1), various instrument images in different environments are collected by a camera, and preprocessing operations of filtering, image enhancement and graying are performed on the instrument images to construct a dial positioning data set; abnormal data in the dial positioning data set, including abnormal images with dirty surfaces, extreme illumination and incomplete shooting, are then removed, and the remaining data are labeled, the labeled content being the dial position.

Further, in step 1), the specific situation of the dial plate detection network is as follows:

a. constructing a feature extraction network according to the requirements of real-time performance and high precision, wherein the feature extraction network consists of a plurality of combined convolution modules, and the method specifically comprises the following steps:

the first layer is a combined convolution module A which consists of a zero filling layer, a convolution layer, a batch normalization layer and an activation layer;

the second layer is a combined convolution module B which consists of a depth convolution layer, two batch normalization layers, two activation layers and a convolution layer;

the third layer is a combined convolution module C which consists of a zero filling layer, a deep convolution layer, two batch normalization layers, two activation layers and a convolution layer;

the fourth layer is a combined convolution module B which consists of a depth convolution layer, two batch normalization layers, two activation layers and a convolution layer;

the fifth layer is a combined convolution module C which consists of a zero filling layer, a depth convolution layer, two batch normalization layers, two activation layers and a convolution layer;

the sixth layer is a combined convolution module B which consists of a depth convolution layer, two batch normalization layers, two activation layers and a convolution layer;

the seventh layer is a combined convolution module C which consists of a zero filling layer, a deep convolution layer, two batch normalization layers, two activation layers and a convolution layer;

the eighth layer is a combined convolution module D which consists of five combined convolution modules B;

the ninth layer is a combined convolution module C which consists of a zero filling layer, a depth convolution layer, two batch normalization layers, two activation layers and a convolution layer;

the tenth layer is a combined convolution module B which consists of a depth convolution layer, two batch normalization layers, two activation layers and a convolution layer;

b. constructing prediction networks for outputting and predicting targets with different sizes according to the output of different layers of the feature extraction network, wherein the prediction networks comprise a large-size target prediction network, a medium-size target prediction network and a small-size target prediction network;

b1, the input of the large-size target prediction network is the tenth layer output of the feature extraction network, the large-size target prediction network is composed of a plurality of combined convolution modules and convolution layers, and the specific steps are as follows:

the first layer is a combined convolution module D which consists of five combined convolution modules B;

the second layer is a combined convolution module B which consists of a depth convolution layer, two batch normalization layers, two activation layers and a convolution layer;

the third layer is a convolution layer;

b2, the input of the medium-size target prediction network is the eighth layer output of the feature extraction network and the first layer output of the large-size target prediction network, the medium-size target prediction network is composed of a plurality of combined convolution modules and convolution layers, and the specific steps are as follows:

the first layer is an input fusion module A, which consists of a combined convolution module B, an up-sampling layer and a tensor splicing layer;

the second layer is a combined convolution module D which consists of five combined convolution modules B;

the third layer is a combined convolution module B which consists of a depth convolution layer, two batch normalization layers, two activation layers and a convolution layer;

the fourth layer is a convolution layer;

b3, inputting the small-size target prediction network into the sixth layer output of the feature extraction network and the second layer output of the medium-size target prediction network, wherein the small-size target prediction network consists of a plurality of combined convolution modules and convolution layers, and the specific steps are as follows:

the first layer is an input fusion module A, which consists of a combined convolution module B, an up-sampling layer and a tensor splicing layer;

the second layer is a combined convolution module D which consists of five combined convolution modules B;

the third layer is a combined convolution module B which consists of a depth convolution layer, two batch normalization layers, two activation layers and a convolution layer;

the fourth layer is a convolution layer;

finally, the outputs of the large-size target prediction network, the medium-size target prediction network and the small-size target prediction network pass through a non-maximum suppression layer to obtain the positions and categories of the predicted targets;

c. setting loss functions including a central coordinate loss function, a width and height loss function, a confidence coefficient loss function and a category loss function;

the center coordinate loss function is formulated as follows:

Loss_xy = mark_object * (2 - w*h) * Loss_log(xy_true, xy_predict)

where Loss_xy is the center-coordinate loss, mark_object is a flag indicating whether the anchor box contains an object, w is the anchor-box width, h is the anchor-box height, Loss_log is the binary cross-entropy loss, xy_true is the true center-coordinate value, and xy_predict is the predicted center-coordinate value;

the width-height loss function is formulated as follows:

Loss_wh = 0.5 * mark_object * (2 - w*h) * (wh_true - wh_predict)^2

where Loss_wh is the width-height loss, wh_true is the true width-height value, and wh_predict is the predicted width-height value;

the confidence loss function is formulated as follows:

Loss_confidence = mark_object * Loss_log(mark_object, c_predict) + (1 - mark_object) * Loss_log(mark_object, c_predict) * mark_ignore

where Loss_confidence is the confidence loss, c_predict is the confidence value of the prediction box, and mark_ignore is a flag for anchor boxes whose IoU is less than the threshold;

the class loss function is formulated as follows:

Loss_cls = mark_object * Loss_log(cls_true, cls_predict)

where Loss_cls is the class loss, cls_true is the true class, and cls_predict is the predicted class;

the total loss function is formulated as follows:

Loss = (Loss_xy + Loss_wh + Loss_confidence + Loss_cls) / num_f

where Loss is the total loss and num_f is the total number of inputs cast to a floating-point number;

loading the training parameters for the designed dial detection network, the training parameters being set as follows: the training optimizer is Adam, the initial learning rate is 0.001, the maximum number of training epochs is 500, and the batch size is 8; the training accuracy is checked on the validation set at intervals, the training completion flag is reaching the maximum number of training epochs or meeting the mean intersection-over-union (IoU) requirement, and the network is saved after training is completed.

Further, in step 3), the character generation GAN network is composed of a generator network and a discriminator network, the specific conditions being as follows:

a. constructing the generator network according to the requirement of generating character images, wherein the generator network is composed of a plurality of combined convolution modules, specifically as follows:

the first layer is a full connection layer;

the second layer is a reconstruction layer;

the third layer is a combined convolution module A which consists of a zero filling layer, a convolution layer, a batch normalization layer and an activation layer;

the fourth layer is an upsampling layer;

the fifth layer is a combined convolution module A which consists of a zero filling layer, a convolution layer, a batch normalization layer and an activation layer;

the sixth layer is an upsampling layer;

the seventh layer is a combined convolution module A which consists of a zero filling layer, a convolution layer, a batch normalization layer and an activation layer;

the eighth layer is a combined convolution module A which consists of a zero filling layer, a convolution layer, a batch normalization layer and an activation layer;

b. constructing the discriminator network for judging the validity of character images, wherein its input is the image output by the generator network, and the discriminator network is composed of a plurality of combined convolution modules, specifically as follows:

the first layer is a combined convolution module E which consists of a zero filling layer, a convolution layer, a batch normalization layer, an activation layer and a Dropout layer;

the second layer is a combined convolution module E which consists of a zero filling layer, a convolution layer, a batch normalization layer, an activation layer and a Dropout layer;

the third layer is a zero filling layer;

the fourth layer is a combined convolution module E which consists of a zero filling layer, a convolution layer, a batch normalization layer, an activation layer and a Dropout layer;

the fifth layer is a combined convolution module E which consists of a zero filling layer, a convolution layer, a batch normalization layer, an activation layer and a Dropout layer;

the sixth layer is a global pooling layer;

the seventh layer is a full connection layer;

c. setting a loss function as a multi-class cross entropy, wherein the formula is as follows:

Loss = -Σ_{i=1}^{n} y_i * log(ŷ_i)

where Loss is the loss, n is the number of classes, y_i is the true probability of class i, and ŷ_i is the predicted probability of class i.
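For readability, the multi-class cross-entropy above can also be written out directly. The following NumPy sketch is illustrative only; y_true and y_pred are assumed to be length-n probability vectors, and the function name is an assumption of this sketch.

```python
import numpy as np

def multiclass_cross_entropy(y_true, y_pred, eps=1e-7):
    """Loss = -sum_i y_i * log(y_hat_i) over the n classes."""
    return -np.sum(y_true * np.log(np.clip(y_pred, eps, 1.0)))
```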

Further, in step 4), the Mask R-CNN network is a deep Mask R-CNN network composed of a base network, a region proposal network (RPN), a region feature aggregation module (RoIAlign), a classification branch, a coordinate regression branch and a Mask branch, the specific conditions being as follows:

a. firstly, extracting features of an input image through a base network to obtain feature maps with different scales;

b. the RPN performs region proposal: candidate boxes of different scales are generated at each point on the feature map, coarse classification and coarse localization are performed by the RPN, a large number of candidate boxes are screened out based on confidence and the idea of non-maximum suppression, and the remaining candidate boxes are sent to the subsequent network;

c. the feature-map regions containing candidate boxes of different sizes and scales are passed through RoIAlign to obtain feature maps of fixed size; RoIAlign divides each candidate box into a fixed number of cells without quantizing the cell boundaries, computes four fixed sampling positions in each cell, obtains the values at these four positions by bilinear interpolation (see the sampling sketch after this list), and performs max pooling over the four values;

d. the fixed-size feature map is taken as the input of the classification branch, the coordinate regression branch and the Mask branch; the classification branch outputs the feature-map category in one-hot encoded form, the coordinate regression branch predicts the coordinate and width-height offsets between the candidate box and the real target region, and the Mask branch outputs a binary mask image of the target expressed with values 0 and 1;

the deep Mask R-CNN network is characterized in that the base network and the Mask branch both adopt deep convolutional neural networks, with ResNet-50 as their backbone structure.
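As an illustration of the bilinear interpolation used at each of the four sampling positions in RoIAlign, the sketch below samples a single-channel feature map at one non-integer coordinate. The function name and boundary handling are assumptions of this sketch, not details taken from the disclosure.

```python
import numpy as np

def bilinear_sample(feature, y, x):
    # feature: 2-D array (H, W); (y, x): non-integer sample position inside the map
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    y1 = min(y0 + 1, feature.shape[0] - 1)
    x1 = min(x0 + 1, feature.shape[1] - 1)
    wy, wx = y - y0, x - x0
    return ((1 - wy) * (1 - wx) * feature[y0, x0]
            + (1 - wy) * wx * feature[y0, x1]
            + wy * (1 - wx) * feature[y1, x0]
            + wy * wx * feature[y1, x1])
```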

Further, in step 4), a scale image is obtained from the scale mask, and the scale image is subjected to morphological transformation to generate a new scale image, the morphological transformation methods including erosion, dilation, contrast adjustment, scaling and stretching; the new scale image replaces the original scale image, any part exceeding the original scale image covering it and any part smaller than the original scale image being adaptively filled according to the pixel values of the nearby non-scale region; a pointer image is obtained from the pointer mask, morphological transformation is applied to it to generate a new pointer image, the new pointer image is rotated clockwise within the range of 0-45 degrees, the rotated pointer image replaces the original pointer image, and a pointer-scale-enhanced dial image is output; the pointer-scale-enhanced dial image is embedded at the dial position of the original instrument image to complete the image data synthesis of the pointer instrument.
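The following OpenCV sketch illustrates one way the pointer step could be carried out: a morphological transformation (dilation) on the masked pointer, followed by a clockwise rotation about the dial centre. Using the dial centre as the rotation pivot, the 3 × 3 kernel and the variable names are assumptions of this sketch, not details fixed by the description.

```python
import cv2
import numpy as np

def rotate_pointer(dial_img, pointer_mask, angle_deg):
    """angle_deg in (0, 45]; positive values rotate the pointer clockwise."""
    kernel = np.ones((3, 3), np.uint8)
    mask = cv2.dilate(pointer_mask, kernel)                  # morphological transformation (dilation)
    pointer = cv2.bitwise_and(dial_img, dial_img, mask=mask)
    h, w = dial_img.shape[:2]
    # getRotationMatrix2D uses counter-clockwise angles, so negate for a clockwise rotation
    rot = cv2.getRotationMatrix2D((w / 2, h / 2), -angle_deg, 1.0)
    new_pointer = cv2.warpAffine(pointer, rot, (w, h))
    new_mask = cv2.warpAffine(mask, rot, (w, h))
    return new_pointer, new_mask
```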

Compared with the prior art, the invention has the following advantages and beneficial effects:

1. the invention realizes the image data synthesis of the pointer instrument by using a method combining the neural network and the image processing, thereby expanding a data set.

2. The invention increases the meter data for detection, identification and reading by synthesizing the image data of the pointer meter, greatly increases the training and testing data of the reading system of the pointer meter, improves the capability of artificial intelligence in meter detection, identification and reading, improves the robustness of a network and effectively promotes the development of automatic meter identification.

Drawings

FIG. 1 is a flow chart of the method of the present invention.

FIG. 2 is a schematic diagram of a combined convolution module.

FIG. 3 is a schematic diagram of an input fusion module.

FIG. 4 is a diagram of a Mask R-CNN network structure.

Fig. 5 is an overall structure diagram of the ResNet-50.

FIG. 6 is an ID block structure diagram.

FIG. 7 is a conv block structure diagram.

FIG. 8 is a block diagram showing the structure of Mask branches.

Detailed Description

The present invention will be described in further detail with reference to examples and drawings, but the present invention is not limited thereto.

As shown in fig. 1, the pointer instrument image data synthesis method provided in this embodiment includes the following steps:

1) Preparing the dial positioning data set and the data set to be enhanced: a camera collects various instrument images in different environments, and preprocessing operations such as filtering, image enhancement and graying are performed on the instrument images using image processing techniques to construct a dial positioning data set; abnormal data in the dial positioning data set, including abnormal images with dirty surfaces, extreme illumination and incomplete shooting, are removed, and all remaining data are labeled, the labeled content being the dial position.
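A minimal OpenCV sketch of the preprocessing chain mentioned above (filtering, image enhancement, graying). Using a Gaussian filter and histogram equalization for the filtering and enhancement steps is an assumption of the sketch; the original text does not fix the concrete operators.

```python
import cv2

def preprocess_meter_image(path):
    img = cv2.imread(path)                        # raw instrument image from the camera
    img = cv2.GaussianBlur(img, (3, 3), 0)        # filtering to suppress noise
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)  # graying
    gray = cv2.equalizeHist(gray)                 # image enhancement (histogram equalization)
    return gray
```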

The dial detection network is trained with the dial positioning data set using the loaded training parameters; after training, the optimal dial detection network is obtained, the instrument data set to be enhanced is input into the optimal dial detection network, and dial images are output and cropped.

The dial detection network adopts a YOLO network whose backbone is improved into a lightweight mobile network to reduce the number of network parameters and the amount of computation and to increase the running speed. Unless otherwise stated, all activation layers in step 1) use the Leaky ReLU activation function. The dial detection network is constructed as follows:

a. constructing a feature extraction network

And constructing a feature extraction network according to the requirements of real-time performance and high precision. The feature extraction network is mainly composed of a plurality of combined convolution modules.

The feature extraction network structure is as follows:

the input image is 416 × 416 × 3.

The first layer is the combined convolution module A, as shown in FIG. 2(a). The module first passes through the zero-padding layer, giving an output of 418 × 418 × 3, and then through the convolution, batch normalization and activation layers with a (3, 3) kernel, stride 2 and 32 filters, giving an output of 208 × 208 × 32.

The second layer is a combined convolution module B, as shown in FIG. 2(b). The module first passes through the depthwise convolution, batch normalization and activation layers with a (3, 3) kernel, stride 1 and padding that keeps the input and output sizes consistent, giving an output of 208 × 208 × 32. It then passes through the convolution, batch normalization and activation layers with a (1, 1) kernel, stride 1, 64 filters and size-preserving padding, giving an output of 208 × 208 × 64.

The third layer is a combined convolution module C, as shown in FIG. 2(c). The module first passes through the zero-padding layer, giving an output of 210 × 210 × 64, then through the depthwise convolution, batch normalization and activation layers with a (3, 3) kernel and stride 2, giving an output of 104 × 104 × 64, and finally through the convolution, batch normalization and activation layers with a (1, 1) kernel, stride 1, 128 filters and size-preserving padding, giving an output of 104 × 104 × 128.

The fourth layer is a combined convolution module B, as shown in FIG. 2(b). The module first passes through the depthwise convolution, batch normalization and activation layers with a (3, 3) kernel, stride 1 and size-preserving padding, giving an output of 104 × 104 × 128. It then passes through the convolution, batch normalization and activation layers with a (1, 1) kernel, stride 1, 128 filters and size-preserving padding, giving an output of 104 × 104 × 128.

The fifth layer is a combined convolution module C, as shown in FIG. 2(c). The module first passes through the zero-padding layer, giving an output of 106 × 106 × 128, then through the depthwise convolution, batch normalization and activation layers with a (3, 3) kernel and stride 2, giving an output of 52 × 52 × 128, and finally through the convolution, batch normalization and activation layers with a (1, 1) kernel, stride 1, 256 filters and size-preserving padding, giving an output of 52 × 52 × 256.

The sixth layer is a combined convolution module B, as shown in FIG. 2(b). The module first passes through the depthwise convolution, batch normalization and activation layers with a (3, 3) kernel, stride 1 and size-preserving padding, giving an output of 52 × 52 × 256. It then passes through the convolution, batch normalization and activation layers with a (1, 1) kernel, stride 1, 256 filters and size-preserving padding, giving an output of 52 × 52 × 256.

The seventh layer is a combined convolution module C, as shown in FIG. 2(c). The module first passes through the zero-padding layer, giving an output of 54 × 54 × 256, then through the depthwise convolution, batch normalization and activation layers with a (3, 3) kernel and stride 2, giving an output of 26 × 26 × 256, and finally through the convolution, batch normalization and activation layers with a (1, 1) kernel, stride 1, 512 filters and size-preserving padding, giving an output of 26 × 26 × 512.

The eighth layer is a combined convolution module D, as shown in FIG. 2(d), which passes through five combined convolution modules B in sequence, as shown in FIG. 2(b). In each combined convolution module B, the input first passes through the depthwise convolution, batch normalization and activation layers with a (3, 3) kernel, stride 1 and size-preserving padding, giving an output of 26 × 26 × 512, and then through the convolution, batch normalization and activation layers with a (1, 1) kernel, stride 1, 512 filters and size-preserving padding, giving an output of 26 × 26 × 512. After the five identical combined convolution modules B in sequence, the output is 26 × 26 × 512.

The ninth layer is a combined convolution module C, as shown in FIG. 2(c). The module first passes through the zero-padding layer, giving an output of 28 × 28 × 512, then through the depthwise convolution, batch normalization and activation layers with a (3, 3) kernel and stride 2, giving an output of 13 × 13 × 512, and finally through the convolution, batch normalization and activation layers with a (1, 1) kernel, stride 1, 1024 filters and size-preserving padding, giving an output of 13 × 13 × 1024.

The tenth layer is a combined convolution module B, as shown in FIG. 2(b). The module first passes through the depthwise convolution, batch normalization and activation layers with a (3, 3) kernel, stride 1 and size-preserving padding, giving an output of 13 × 13 × 1024. It then passes through the convolution, batch normalization and activation layers with a (1, 1) kernel, stride 1, 1024 filters and size-preserving padding, giving an output of 13 × 13 × 1024.
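The combined convolution modules described above can be expressed compactly. The following tf.keras sketch shows modules A, B and C (module D is simply five B modules in sequence), using the kernel sizes, strides and Leaky ReLU activations given in the text; the function names and the exact padding choices are assumptions of the sketch rather than the patented implementation.

```python
import tensorflow as tf
from tensorflow.keras import layers

def module_a(x, filters, kernel=3):
    # zero padding -> convolution (stride 2) -> batch normalization -> Leaky ReLU
    x = layers.ZeroPadding2D(1)(x)
    x = layers.Conv2D(filters, kernel, strides=2)(x)
    x = layers.BatchNormalization()(x)
    return layers.LeakyReLU()(x)

def module_b(x, filters, kernel=3):
    # depthwise conv + BN + activation, then 1x1 conv + BN + activation, size preserved
    x = layers.DepthwiseConv2D(kernel, strides=1, padding='same')(x)
    x = layers.BatchNormalization()(x)
    x = layers.LeakyReLU()(x)
    x = layers.Conv2D(filters, 1, strides=1, padding='same')(x)
    x = layers.BatchNormalization()(x)
    return layers.LeakyReLU()(x)

def module_c(x, filters, kernel=3):
    # zero padding, a stride-2 depthwise conv, then a 1x1 conv; halves the spatial size
    x = layers.ZeroPadding2D(1)(x)
    x = layers.DepthwiseConv2D(kernel, strides=2)(x)
    x = layers.BatchNormalization()(x)
    x = layers.LeakyReLU()(x)
    x = layers.Conv2D(filters, 1, strides=1, padding='same')(x)
    x = layers.BatchNormalization()(x)
    return layers.LeakyReLU()(x)
```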

b. Building a predictive network

And constructing prediction networks for outputting and predicting targets with different sizes according to the output of different layers of the feature extraction network, wherein the prediction networks comprise a large-size target prediction network, a medium-size target prediction network and a small-size target prediction network.

b1 large-size target prediction network

The input is the tenth layer output of the feature extraction network, and the large-size target prediction network mainly comprises a plurality of combined convolution modules and convolution layers.

The input image is 13 × 13 × 1024.

The large-size target prediction network structure is as follows:

The first layer is the combined convolution module D, as shown in FIG. 2(d), which passes through five combined convolution modules B in sequence, as shown in FIG. 2(b). In the first combined convolution module B, the input passes through the depthwise convolution, batch normalization and activation layers with a (1, 1) kernel, stride 1 and size-preserving padding, giving an output of 13 × 13 × 1024, and then through the convolution, batch normalization and activation layers with a (1, 1) kernel, stride 1, 512 filters and size-preserving padding, giving an output of 13 × 13 × 512. In the second combined convolution module B, the input passes through the depthwise convolution, batch normalization and activation layers with a (3, 3) kernel, stride 1 and size-preserving padding, giving an output of 13 × 13 × 512, and then through the convolution, batch normalization and activation layers with a (1, 1) kernel, stride 1, 1024 filters and size-preserving padding, giving an output of 13 × 13 × 1024. After alternating between these two parameterizations of combined convolution module B, the output is 13 × 13 × 512.

The second layer is a combined convolution module B, as shown in FIG. 2(b). The module first passes through the depthwise convolution, batch normalization and activation layers with a (3, 3) kernel, stride 1 and size-preserving padding, giving an output of 13 × 13 × 512, and then through the convolution, batch normalization and activation layers with a (1, 1) kernel, stride 1, 1024 filters and size-preserving padding, giving an output of 13 × 13 × 1024.

The third layer is a convolutional layer. The convolution kernel is (1, 1), the step size is 1, the number of filters is 255, and the output is 13 × 13 × 255.

b2 medium size target prediction network

The input is the eighth layer output of the feature extraction network and the first layer output of the large-size target prediction network, and the medium-size target prediction network mainly comprises a plurality of combined convolution modules and convolution layers.

The input images are 26 × 26 × 512 and 13 × 13 × 512.

The medium-sized target prediction network structure is as follows:

The first layer is the input fusion module, as shown in FIG. 3. The 13 × 13 × 512 input first passes through a combined convolution module B, in which the depthwise convolution, batch normalization and activation layers use a (1, 1) kernel, stride 1 and size-preserving padding, giving an output of 13 × 13 × 512, followed by the convolution, batch normalization and activation layers with a (1, 1) kernel, stride 1, 512 filters and size-preserving padding, giving an output of 13 × 13 × 512. The result then passes through an upsampling layer with a sampling factor of 2, giving an output of 26 × 26 × 512. Finally, this output and the 26 × 26 × 512 input pass through a tensor concatenation layer, giving an output of 26 × 26 × 1024 (a code sketch of this fusion module appears at the end of this subsection).

The second layer is the combined convolution module D, as shown in FIG. 2(d), which passes through five combined convolution modules B in sequence, as shown in FIG. 2(b). In the first combined convolution module B, the input passes through the depthwise convolution, batch normalization and activation layers with a (1, 1) kernel, stride 1 and size-preserving padding, giving an output of 26 × 26 × 1024, and then through the convolution, batch normalization and activation layers with a (1, 1) kernel, stride 1, 256 filters and size-preserving padding, giving an output of 26 × 26 × 256. In the second combined convolution module B, the input passes through the depthwise convolution, batch normalization and activation layers with a (3, 3) kernel, stride 1 and size-preserving padding, giving an output of 26 × 26 × 256, and then through the convolution, batch normalization and activation layers with a (1, 1) kernel, stride 1, 512 filters and size-preserving padding, giving an output of 26 × 26 × 512. After alternating between these two parameterizations of combined convolution module B, the output is 26 × 26 × 256.

The third layer is a combined convolution module B, as shown in FIG. 2(b). The module first passes through the depthwise convolution, batch normalization and activation layers with a (3, 3) kernel, stride 1 and size-preserving padding, giving an output of 26 × 26 × 256, and then through the convolution, batch normalization and activation layers with a (1, 1) kernel, stride 1, 512 filters and size-preserving padding, giving an output of 26 × 26 × 512.

The fourth layer is a convolutional layer. The convolution kernel is (1, 1), the step size is 1, the number of filters is 255, and the output is 26 × 26 × 255.
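The input fusion module that begins both the medium-size and small-size prediction networks (combined convolution module B, 2× upsampling, tensor concatenation) can be sketched as follows, reusing the illustrative module_b helper from the earlier sketch; the function name is an assumption.

```python
def input_fusion(deep_feature, shallow_feature, filters):
    x = module_b(deep_feature, filters, kernel=1)       # e.g. 13x13x512 -> 13x13x512
    x = layers.UpSampling2D(size=2)(x)                  # 13x13 -> 26x26
    return layers.Concatenate()([x, shallow_feature])   # concatenate along the channel axis
```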

b3 small-size target prediction network

The input is the sixth layer output of the feature extraction network and the second layer output of the medium-size target prediction network, and the small-size target prediction network mainly comprises a plurality of combined convolution modules and convolution layers.

The input images are 52 × 52 × 256 and 26 × 26 × 256.

The small-size target prediction network structure is as follows:

The first layer is the input fusion module, as shown in FIG. 3. The 26 × 26 × 256 input first passes through a combined convolution module B, in which the depthwise convolution, batch normalization and activation layers use a (1, 1) kernel, stride 1 and size-preserving padding, giving an output of 26 × 26 × 256, followed by the convolution, batch normalization and activation layers with a (1, 1) kernel, stride 1, 256 filters and size-preserving padding, giving an output of 26 × 26 × 256. The result then passes through an upsampling layer with a sampling factor of 2, giving an output of 52 × 52 × 256. Finally, this output and the 52 × 52 × 256 input pass through a tensor concatenation layer, giving an output of 52 × 52 × 512.

The second layer is the combined convolution module D, as shown in FIG. 2(d), which passes through five combined convolution modules B in sequence, as shown in FIG. 2(b). In the first combined convolution module B, the input passes through the depthwise convolution, batch normalization and activation layers with a (1, 1) kernel, stride 1 and size-preserving padding, giving an output of 52 × 52 × 512, and then through the convolution, batch normalization and activation layers with a (1, 1) kernel, stride 1, 128 filters and size-preserving padding, giving an output of 52 × 52 × 128. In the second combined convolution module B, the input passes through the depthwise convolution, batch normalization and activation layers with a (3, 3) kernel, stride 1 and size-preserving padding, giving an output of 52 × 52 × 128, and then through the convolution, batch normalization and activation layers with a (1, 1) kernel, stride 1, 256 filters and size-preserving padding, giving an output of 52 × 52 × 256. After alternating between these two parameterizations of combined convolution module B, the output is 52 × 52 × 128.

The third layer is a combined convolution module B, as shown in fig. 2 (B). The module first goes through a depthwise convolution, a batch normalization layer and an activation layer; the convolution kernel is (3, 3), the step size is 1, padding keeps the input and output sizes consistent, and the output is 52 × 52 × 128. It then goes through a convolution layer, a batch normalization layer and an activation layer; the convolution kernel is (1, 1), the step size is 1, the number of filters is 256, padding keeps the input and output sizes consistent, and the output is 52 × 52 × 256.

The fourth layer is a convolutional layer. The convolution kernel is (1, 1), the step size is 1, the number of filters is 255, and the output is 52 × 52 × 255.

Finally, the output 13 × 13 × 255 of the large-size target prediction network, the output 26 × 26 × 255 of the medium-size target prediction network, and the output 52 × 52 × 255 of the small-size target prediction network are passed through the non-maximum suppression layer to obtain the predicted target positions and categories.

c. Setting a loss function

The loss function is set as the mean of the sum of the center coordinate loss, the width-height loss, the confidence loss and the category loss. The loss function formula is as follows:

$$\mathrm{Loss} = \left(\mathrm{Loss}_{xy} + \mathrm{Loss}_{wh} + \mathrm{Loss}_{confidence} + \mathrm{Loss}_{cls}\right) / num_f$$

where Loss denotes the total loss, Loss_xy the center coordinate loss, Loss_wh the width-height loss, Loss_confidence the confidence loss, Loss_cls the category loss, and num_f the total number of inputs as a floating-point number. The individual loss functions are as follows:

$$\mathrm{Loss}_{xy} = mark_{object} \cdot (2 - w \cdot h) \cdot \mathrm{Loss}_{log}(xy_{true}, xy_{predict})$$

$$\mathrm{Loss}_{wh} = 0.5 \cdot mark_{object} \cdot (2 - w \cdot h) \cdot (wh_{true} - wh_{predict})^2$$

$$\mathrm{Loss}_{confidence} = mark_{object} \cdot \mathrm{Loss}_{log}(mark_{object}, c_{predict}) + (1 - mark_{object}) \cdot \mathrm{Loss}_{log}(mark_{object}, c_{predict}) \cdot mark_{ignore}$$

$$\mathrm{Loss}_{cls} = mark_{object} \cdot \mathrm{Loss}_{log}(cls_{true}, cls_{predict})$$

where mark_object denotes the flag indicating whether an anchor box contains an object, w denotes the anchor box width, h denotes the anchor box height, Loss_log denotes the binary cross-entropy loss, xy_true denotes the ground-truth center coordinates, xy_predict the predicted center coordinates, wh_true the ground-truth width and height, wh_predict the predicted width and height, c_predict the confidence of the prediction box, mark_ignore the flag of anchor boxes whose IoU is less than the threshold, cls_true the ground-truth category, and cls_predict the predicted category.
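
As a concrete illustration, the following is a minimal per-anchor-cell sketch of these loss terms in PyTorch. The tensor layout, the assumption that the predictions have already passed through a sigmoid, and the use of the batch size as num_f are illustrative assumptions rather than the exact implementation of the invention.

```python
# Sketch of the loss terms above; predictions are assumed to be sigmoid outputs.
import torch.nn.functional as F

def yolo_loss(pred, true, obj_mask, ignore_mask):
    """pred / true: dicts of float tensors with keys 'xy' (B,A,H,W,2),
    'wh' (B,A,H,W,2), 'conf' (B,A,H,W), 'cls' (B,A,H,W,C).
    obj_mask / ignore_mask: (B,A,H,W) float tensors."""
    box_scale = 2.0 - true["wh"][..., 0] * true["wh"][..., 1]        # (2 - w*h)

    loss_xy = obj_mask * box_scale * F.binary_cross_entropy(
        pred["xy"], true["xy"], reduction="none").sum(-1)
    loss_wh = 0.5 * obj_mask * box_scale * ((true["wh"] - pred["wh"]) ** 2).sum(-1)

    bce_conf = F.binary_cross_entropy(pred["conf"], obj_mask, reduction="none")
    loss_conf = obj_mask * bce_conf + (1.0 - obj_mask) * bce_conf * ignore_mask

    loss_cls = obj_mask * F.binary_cross_entropy(
        pred["cls"], true["cls"], reduction="none").sum(-1)

    numf = float(obj_mask.shape[0])                                   # batch size as num_f
    return (loss_xy.sum() + loss_wh.sum() + loss_conf.sum() + loss_cls.sum()) / numf
```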

The training process comprises the following steps:

d1. Setting training parameters

The training optimizer is set to Adam, the initial learning rate to 0.001, the number of iterations to 500, and the batch size to 8. K-means clustering on all labels generates the initial prior boxes (38, 29), (65, 52), (94, 87), (142, 134), (195, 69), (216, 206), (337, 320), (397, 145), (638, 569).
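
For illustration, a minimal sketch of generating such prior boxes by K-means clustering over the labelled box widths and heights is given below. Clustering by IoU (rather than Euclidean distance) is an assumption commonly used for YOLO anchors and is not specified by the invention.

```python
# Sketch: K-means over (width, height) pairs of all labelled boxes, IoU-based.
import numpy as np

def kmeans_anchors(wh, k=9, iters=100, seed=0):
    """wh: (N, 2) array of labelled box widths and heights in pixels."""
    rng = np.random.default_rng(seed)
    anchors = wh[rng.choice(len(wh), size=k, replace=False)].astype(float)
    for _ in range(iters):
        inter = (np.minimum(wh[:, None, 0], anchors[None, :, 0]) *
                 np.minimum(wh[:, None, 1], anchors[None, :, 1]))
        union = wh[:, 0:1] * wh[:, 1:2] + anchors.prod(axis=1)[None, :] - inter
        assign = np.argmax(inter / union, axis=1)        # nearest anchor by IoU
        for j in range(k):
            if np.any(assign == j):
                anchors[j] = wh[assign == j].mean(axis=0)
    return anchors[np.argsort(anchors.prod(axis=1))]     # sorted by area

# e.g. anchors = kmeans_anchors(all_label_wh)  ->  nine (w, h) prior boxes
```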

d2. Online data enhancement

Data enhancement is performed on the input images to expand the data set. The data enhancement methods include: random mirror flipping, random noise addition and random contrast adjustment.
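
A brief sketch of these three online augmentations, applied to each training image with random probability, is shown below; the concrete probabilities and the noise/contrast ranges are assumptions.

```python
# Sketch of the listed augmentations: mirror flip, additive noise, contrast.
import random
import numpy as np
import cv2

def augment(img):
    if random.random() < 0.5:                       # random mirror flip
        img = cv2.flip(img, 1)
    if random.random() < 0.5:                       # random Gaussian noise
        noise = np.random.normal(0, 8, img.shape)
        img = np.clip(img.astype(np.float32) + noise, 0, 255).astype(np.uint8)
    if random.random() < 0.5:                       # random contrast adjustment
        img = cv2.convertScaleAbs(img, alpha=random.uniform(0.7, 1.3), beta=0)
    return img
```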

d3. Setting the training completion flag

A validation-set interval is set to monitor training accuracy. The training completion flag is that the maximum of 500 iterations is reached and the accuracy meets the requirement; the network is saved after training is completed.

2) Using the dial plate marked in the step 1), marking the range dimension on the dial plate, wherein the marked content is the position and character information of the range dimension, constructing a range dimension data set, loading training parameters, using the range dimension data set to train a character recognition network, and obtaining an optimal character recognition network after training; the character recognition network adopts a YOLO network, and a backbone network of the YOLO network is improved into a mobile lightweight network so as to reduce network parameters and calculated amount and improve operation speed. Inputting the dial plate image output in the step 1) into a character recognition network, and outputting the text information of the measuring range dimension in the dial plate.

3) Downloading character data sets of different styles on a network, respectively loading training parameters with the character data sets of different styles to train characters to generate a GAN network, obtaining a plurality of corresponding characters to generate the GAN network after training is finished, inputting the text information of the range dimension output in the step 2) into any one trained character to generate the GAN network, generating a range dimension image of a new character style, replacing the original range dimension image on the dial image output in the step 1), and outputting a range dimension synthesized dial image, namely the range dimension enhanced dial image.

The character generation GAN network comprises a generative network and a discriminant network, and is constructed through the following steps:

a. Building a generative network

The generative network is constructed according to the requirement of generating character images, and is mainly composed of a plurality of combined convolution modules.

The generated network structure is as follows:

the input random noise image is 28 × 28 × 1.

The first layer is a fully connected layer.

The second layer is the reconstruction (reshape) layer, and the output is 7 × 7 × 32.

The third layer is a combined convolution module A, as shown in fig. 2 (a). The module first passes through the zero-padding layer, and the output is 9 × 9 × 32. Then it passes through a convolution layer; the convolution kernel is (3, 3), the step size is 1, the number of filters is 64, and the output is 7 × 7 × 64.

The fourth layer is an up-sampling layer, and the output is 14 × 14 × 64.

The fifth layer is a combined convolution module A, as shown in fig. 2 (a). The module first passes through the zero-padding layer, and the output is 16 × 16 × 64. Then it passes through a convolution layer; the convolution kernel is (3, 3), the step size is 1, the number of filters is 128, and the output is 14 × 14 × 128.

The sixth layer is an up-sampling layer, and the output is 28 × 28 × 128.

The seventh layer is a combined convolution module A, as shown in fig. 2 (a). The module first passes through the zero-padding layer, and the output is 30 × 30 × 128. Then it passes through a convolution layer; the convolution kernel is (3, 3), the step size is 1, the number of filters is 64, and the output is 28 × 28 × 64.

The eighth layer is a combined convolution module A, as shown in fig. 2 (a). The module first passes through the zero-padding layer, and the output is 30 × 30 × 64. Then it passes through a convolution layer; the convolution kernel is (3, 3), the step size is 1, the number of filters is 1, and the output is 28 × 28 × 1.
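
Putting the eight layers together, a compact sketch of the generative network might look as follows. The flattening of the 28 × 28 × 1 noise input before the fully connected layer, the interpretation of the reconstruction layer as a reshape, and the BN/ReLU/Tanh activations are assumptions made for illustration.

```python
# Sketch of the generator: FC -> reshape 7x7x32 -> module A / upsample stack -> 28x28x1.
import torch
import torch.nn as nn

def conv_a(in_ch, out_ch):
    # "combined convolution module A": zero padding + 3x3 conv (+ assumed BN/ReLU)
    return nn.Sequential(nn.ZeroPad2d(1), nn.Conv2d(in_ch, out_ch, 3, stride=1),
                         nn.BatchNorm2d(out_ch), nn.ReLU())

class Generator(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(28 * 28, 7 * 7 * 32)        # first layer: fully connected
        self.body = nn.Sequential(
            conv_a(32, 64),                             # third layer: 7x7x32 -> 7x7x64
            nn.Upsample(scale_factor=2),                # fourth layer: -> 14x14x64
            conv_a(64, 128),                            # fifth layer: -> 14x14x128
            nn.Upsample(scale_factor=2),                # sixth layer: -> 28x28x128
            conv_a(128, 64),                            # seventh layer: -> 28x28x64
            nn.ZeroPad2d(1), nn.Conv2d(64, 1, 3), nn.Tanh(),  # eighth layer: -> 28x28x1
        )

    def forward(self, z):                               # z: (B, 1, 28, 28) random noise
        x = self.fc(z.flatten(1)).view(-1, 32, 7, 7)    # second layer: reshape
        return self.body(x)

img = Generator()(torch.randn(4, 1, 28, 28))            # -> (4, 1, 28, 28)
```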

b. Building discriminant networks

The discriminant network is constructed according to the need to discriminate the validity of character images. Its input is the image output by the generative network, and it is mainly composed of a plurality of combined convolution modules, as follows:

the input image is 28 × 28 × 1.

The first layer is the combined convolution module E, as shown in fig. 2 (E). The module first passes through the zero-padding layer, and the output is 30 × 30 × 1. Then it passes through a convolution layer; the convolution kernel is (3, 3), the step size is 2, the number of filters is 16, and the output is 14 × 14 × 16.

The second layer is a combined convolution module E, as shown in fig. 2 (E). The module first passes through the zero-padding layer, and the output is 16 × 16 × 16. Then it passes through a convolution layer; the convolution kernel is (3, 3), the step size is 2, the number of filters is 32, and the output is 7 × 7 × 32.

The third layer is a zero-padding layer, and the output is 8 × 8 × 32.

The fourth layer is a combined convolution module E, as shown in fig. 2 (E). The module first passes through the zero-padding layer, and the output is 10 × 10 × 32. Then it passes through a convolution layer, a batch normalization layer and an activation layer; the convolution kernel is (3, 3), the step size is 2, the number of filters is 64, and the output is 4 × 4 × 64.

The fifth layer is a combined convolution module E, as shown in fig. 2 (E). The module first passes through the zero-padding layer, and the output is 6 × 6 × 64. Then it passes through a convolution layer; the convolution kernel is (3, 3), the step size is 1, the number of filters is 128, and the output is 4 × 4 × 128.

The sixth layer is a global pooling layer with an output of 1 × 1 × 128.

The seventh layer is a fully connected layer; the output dimension is 1 + 10 + m, where m is the number of recognizable dimension categories.
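
A matching sketch of the discriminant network is given below, where combined convolution module E is taken as zero padding + 3 × 3 convolution + batch normalization + activation; the LeakyReLU activation is an assumption, and the (1 + 10 + m)-dimensional output follows the seventh layer described above.

```python
# Sketch of the discriminator: four module E blocks, global pooling, FC head.
import torch
import torch.nn as nn

def conv_e(in_ch, out_ch, stride):
    return nn.Sequential(nn.ZeroPad2d(1), nn.Conv2d(in_ch, out_ch, 3, stride=stride),
                         nn.BatchNorm2d(out_ch), nn.LeakyReLU(0.2))

class Discriminator(nn.Module):
    def __init__(self, m):
        super().__init__()
        self.features = nn.Sequential(
            conv_e(1, 16, 2),            # 28x28x1  -> 14x14x16
            conv_e(16, 32, 2),           # -> 7x7x32
            nn.ZeroPad2d((0, 1, 0, 1)),  # -> 8x8x32
            conv_e(32, 64, 2),           # -> 4x4x64
            conv_e(64, 128, 1),          # -> 4x4x128
            nn.AdaptiveAvgPool2d(1),     # global pooling -> 1x1x128
        )
        self.fc = nn.Linear(128, 1 + 10 + m)   # real/fake + digits + m dimension classes

    def forward(self, x):
        return self.fc(self.features(x).flatten(1))

logits = Discriminator(m=5)(torch.randn(4, 1, 28, 28))   # -> (4, 16)
```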

c. Setting a loss function

The loss function is set as the multi-class cross entropy:

$$\mathrm{Loss} = -\sum_{i=1}^{n} y_i \log(\hat{y}_i)$$

where Loss denotes the loss, n denotes the number of classes, y_i denotes the true probability of class i, and ŷ_i denotes the predicted probability of class i.

The generated image is converted into a three-channel image and subjected to a color space transformation, and it then replaces the range dimension image in the original dial image.
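
A sketch of this post-processing step is given below: the single-channel generated character image is converted to three channels, shifted in HSV color space, and pasted over the bounding box of the original range dimension text. The helper name, the HSV hue shift and the (x, y, w, h) box format are illustrative assumptions.

```python
# Sketch: 1-channel generated patch -> 3 channels -> HSV shift -> paste into dial.
import cv2
import numpy as np

def replace_range_text(dial_bgr, gen_gray, box, hue_shift=20):
    x, y, w, h = box                                    # range dimension box on the dial
    patch = cv2.cvtColor(gen_gray, cv2.COLOR_GRAY2BGR)  # 1 channel -> 3 channels
    hsv = cv2.cvtColor(patch, cv2.COLOR_BGR2HSV)
    hsv[..., 0] = (hsv[..., 0].astype(int) + hue_shift) % 180   # color space transform
    patch = cv2.cvtColor(hsv, cv2.COLOR_HSV2BGR)
    dial_bgr[y:y + h, x:x + w] = cv2.resize(patch, (w, h))      # replace original text
    return dial_bgr
```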

4) Using the dial plate images marked in step 1), the pointer and the scales on the dial plate are labeled with masks to construct a pointer scale data set. Training parameters are loaded and the Mask R-CNN network is trained with the pointer scale data set; the optimal Mask R-CNN network is obtained after training is finished. The range dimension enhanced dial image output in step 3) is input into the optimal Mask R-CNN network, which outputs the masks of the pointer image and the scale image.

The Mask R-CNN network structure is shown in FIG. 4 and mainly comprises the base network ResNet-50, a region proposal network (RPN), a region feature aggregation module (RoIAlign), a classification branch, a coordinate regression branch and a Mask branch. In the figure, conv is a conventional convolutional layer, Softmax is the classification output layer, and FC is a fully connected layer. The main structure of the whole network is introduced as follows:

the overall structure of the base network ResNet-50 is shown in FIG. 5 and mainly comprises ID blocks and conv blocks, which are mainly composed of the nonlinear activation function ReLU and combined convolution module F; the ID block is shown in FIG. 6, and the conv block is shown in FIG. 7. In the figures, CONV2D is a conventional convolutional layer, BatchNorm is a batch normalization layer, ReLU is a nonlinear activation function, MAXPOOL is a max pooling layer, AVGPOOL is an average pooling layer, and FC is a fully connected layer.

The RPN is composed of one 3 × 3 convolutional layer, two 1 × 1 convolutional layers and the nonlinear function Softmax. It mainly performs coarse classification and coordinate regression on the prior candidate boxes generated on the final feature map of the base network, and screens them based on the classification confidence and the overlap between rectangular boxes, yielding a certain number of potential candidate boxes for subsequent processing.

RoIAlign mainly pools the feature-map region where each candidate box is located into a feature map of fixed size. The feature map inside each candidate box is first evenly divided into 14 × 14 cells without quantizing the cell boundaries; four fixed sampling positions are then determined within each cell, the values at these four positions are computed by bilinear interpolation, and a max pooling operation is performed over the four values.
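
The bilinear interpolation used by RoIAlign to read a feature value at a non-integer sampling position can be illustrated with the following small sketch (NumPy, single-channel feature map for simplicity).

```python
# Sketch of bilinear sampling at a fractional (y, x) position on a feature map.
import numpy as np

def bilinear_sample(feat, y, x):
    """feat: (H, W) feature map; (y, x): fractional coordinates inside the map."""
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    y1, x1 = min(y0 + 1, feat.shape[0] - 1), min(x0 + 1, feat.shape[1] - 1)
    dy, dx = y - y0, x - x0
    return (feat[y0, x0] * (1 - dy) * (1 - dx) + feat[y0, x1] * (1 - dy) * dx +
            feat[y1, x0] * dy * (1 - dx) + feat[y1, x1] * dy * dx)

# Each of the 14 x 14 cells samples four such points and keeps their maximum.
```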

The classification branch is composed of a 3 × 3 convolution layer, a 1 × 1 convolution layer and an output layer Softmax, and outputs the object class and confidence within each candidate box. The coordinate regression branch is likewise composed of a 3 × 3 convolution layer, a 1 × 1 convolution layer and an output layer, and outputs the coordinate and width-height offsets between the candidate box and the ground-truth box. The Mask branch predicts the binary mask of the target; it is a fully convolutional structure that also follows the ResNet-50 design, the channels of the intermediate layers are all 256, and the number of channels of the last layer equals the number of categories, here 2. Its structural diagram is shown in FIG. 8.

For an input image, features are first extracted by ResNet-50, and the RPN performs region proposal to obtain a large number of potential candidate boxes. RoIAlign then extracts a fixed-size feature map from the feature-map region of each candidate box, which serves as the input of the classification branch, the coordinate regression branch and the Mask branch: the Mask branch yields the binary mask of the target, the classification branch yields its classification result, and the coordinate regression branch yields the positioning offsets used to correct the coordinates.

The dial scale masks are then extracted. A scale image is obtained according to the scale mask, and several morphological transformation methods are randomly selected and applied to it to generate a new scale image. The morphological transformation methods include: erosion, dilation, contrast adjustment, scaling and stretching.
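
A sketch of randomly selecting and applying several of these morphological transformations to the scale image is given below; kernel sizes, contrast factors and scaling ranges are assumptions.

```python
# Sketch: apply a random subset of {erode, dilate, contrast, scale, stretch}.
import random
import numpy as np
import cv2

def transform_scale(scale_img):
    kernel = np.ones((3, 3), np.uint8)
    ops = [lambda im: cv2.erode(im, kernel, iterations=1),
           lambda im: cv2.dilate(im, kernel, iterations=1),
           lambda im: cv2.convertScaleAbs(im, alpha=random.uniform(0.8, 1.2)),
           lambda im: cv2.resize(im, None, fx=random.uniform(0.9, 1.1),
                                 fy=random.uniform(0.9, 1.1)),
           lambda im: cv2.resize(im, None, fx=random.uniform(1.0, 1.2), fy=1.0)]
    for op in random.sample(ops, k=random.randint(1, 3)):   # several methods at random
        scale_img = op(scale_img)
    return scale_img
```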

A pointer image is obtained according to the pointer mask and subjected to morphological transformation to generate a new pointer image. The new pointer image is rotated clockwise within the range of 0-45 degrees, with the rotation center being the center point fitted from the obtained scale image, i.e. the pointer center. The rotated pointer image replaces the original pointer image, and the pointer scale enhanced dial image is output.
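
The clockwise rotation of the new pointer image around the fitted center could be sketched as follows. Fitting the center with cv2.minEnclosingCircle on the scale mask points is an illustrative assumption, since the invention only states that the center is obtained by fitting the scale image.

```python
# Sketch: rotate the pointer image clockwise by 0-45 degrees around the fitted center.
import random
import cv2

def rotate_pointer(pointer_img, scale_mask):
    pts = cv2.findNonZero(scale_mask)                    # binary uint8 scale mask assumed
    (cx, cy), _ = cv2.minEnclosingCircle(pts)            # fitted pointer center (assumption)
    angle = -random.uniform(0, 45)                       # negative angle = clockwise
    M = cv2.getRotationMatrix2D((cx, cy), angle, 1.0)
    h, w = pointer_img.shape[:2]
    return cv2.warpAffine(pointer_img, M, (w, h))
```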

The pointer scale enhanced dial image is embedded at the dial position of the original instrument image, completing the pointer instrument image data synthesis.
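
Finally, a minimal sketch of embedding the enhanced dial image back at the dial position detected in step 1), assuming the dial position is given as an (x, y, w, h) box:

```python
# Sketch: paste the enhanced dial back into the detected dial box of the instrument image.
import cv2

def embed_dial(instrument_img, enhanced_dial, dial_box):
    x, y, w, h = dial_box
    instrument_img[y:y + h, x:x + w] = cv2.resize(enhanced_dial, (w, h))
    return instrument_img
```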

In conclusion, the invention provides a new method for enhancing pointer instrument image data. Using synthesized data as an effective means of data enhancement can effectively alleviate the shortage of pointer instrument image data and promote the development of automatic instrument reading recognition, so the method has practical value and is worth popularizing.

The above embodiments are preferred embodiments of the present invention, but the present invention is not limited to the above embodiments, and any other changes, modifications, substitutions, combinations, and simplifications which do not depart from the spirit and principle of the present invention should be construed as equivalents thereof, and all such changes, modifications, substitutions, combinations, and simplifications are intended to be included in the scope of the present invention.
