Method for automatically detecting text lines in test paper layouts

Document No.: 1545155  Publication date: 2020-01-17

Reading note: This technology, a method for automatically detecting text lines in test paper layouts, was devised by Yan Junfeng (严军峰), Yan Qi (闫琦), Chen Jiahai (陈家海), Ye Jiaming (叶家鸣) and Wu Bo (吴波) on 2019-09-19. Its main content is as follows. The invention relates to the technical field of image object detection and discloses a method for automatically detecting text lines in test paper layouts. The system is designed on the basis of an improved PixelLink network architecture, introduces the pyramid label generation idea of the PMDT algorithm, and mainly comprises simulation data generation, pyramid label generation, image feature extraction and fusion, and feature layer prediction. Building on the original algorithm, the method improves the text/non-text classification: following the PMDT idea, the per-pixel prediction of 0 or 1 is replaced by a prediction in the interval [0, 1], which addresses the problem that a text line is detected in several fragments when the character spacing is large. The method mainly comprises the following steps: data simulation, data preprocessing, network training and model output. The invention can improve the precision and recall of automatic text line detection in test paper layouts.

1. A method for automatically detecting text lines in test paper layouts, characterized in that: the system is designed on the basis of an improved PixelLink network architecture, introduces the pyramid label generation idea of the PMDT algorithm, and mainly comprises simulation data generation, pyramid label generation, image feature extraction and fusion, and feature layer prediction.

2. The method of claim 1, characterized in that the simulation data generation is as follows: training data are simulated in batches using a programming language; the simulation program controls the format and total number of generated samples through built-in parameters; during simulation a white background image is selected at random, text lines are written onto the background image in sequence according to different test paper layouts, and the coordinates of each written line are recorded; the spacing between text lines is chosen at random between 10 and 25 pixels; to make the model robust, the font of each text line is chosen at random from TTF fonts that include common print typefaces and a few handwriting-style typefaces; the simulation program generates 600,000 training samples, of which 50,000 are used as the test set and another 50,000 as the validation set; and results on the validation set are compared using text line detection precision and recall as metrics.

3. The method of claim 1, characterized in that the pyramid label generation is as follows: the method uses a pyramid label generation process based on the PMDT algorithm. The basic idea is to replace the ground truth of the original PixelLink algorithm, in which pixels inside a text box are 1 and all other pixels are 0, with ground-truth values in the interval [0, 1] (see FIG. 2). PMDT treats the center of the text box as the apex of a pyramid, where the value is 1; the base of the pyramid is the annotated boundary of the text region; and the value on each triangular face is obtained by linear interpolation. PMDT defines the pixel score of any point inside the text box as follows. Given the four corner points $A(x_a, y_a)$, $B(x_b, y_b)$, $C(x_c, y_c)$, $D(x_d, y_d)$, the value $score_p$ of a point $P(x_p, y_p)$ inside the box is computed as follows. First, the pyramid apex $O$ (the center of the box enclosing the text line) is computed as $x_o = (x_a + x_b + x_c + x_d)/4$, $y_o = (y_a + y_b + y_c + y_d)/4$. For each region $R_{OMN}$, where $M$ and $N$ are two adjacent points among $A$, $B$, $C$, $D$, i.e. the regions $R_{OAB}$, $R_{OBC}$, $R_{OCD}$, $R_{ODA}$, the vector $\overrightarrow{OP}$ can be decomposed using the following formula:

$$\overrightarrow{OP} = \alpha\,\overrightarrow{OM} + \beta\,\overrightarrow{ON}$$

4. The method of claim 1, characterized in that the image feature extraction and fusion is as follows: a VGG backbone network extracts features, which are then fused. The original PixelLink network extracts features with a VGG network; each of the four extracted feature maps is passed through a 1x1 convolution that outputs a feature map with 16 channels; each output feature map is concatenated with the feature map of the layer above and upsampled; and finally a feature map at 1/4 of the original image size is output for prediction.

5. The method of claim 1, characterized in that the feature layer prediction is as follows: the fused feature map is used for prediction. The original PixelLink network applies two separate 1x1 convolutions directly to the fused feature map: one yields a 2-channel feature map for per-pixel text/non-text prediction, and the other yields a 16-channel feature map for link prediction, i.e. deciding whether a pixel is connected to each of its 8 neighboring pixels. During prediction, separate score thresholds are applied to pixels and to links; pixels that satisfy the pixel score threshold are marked as text, and connected text regions are obtained according to the link threshold. In the present method, after the fused feature map is obtained, a 1x1 convolution yields a 16-channel feature map that is used directly for link prediction. Because the pixel score now lies between 0 and 1, the post-processing idea of PMDT is used: all pixels whose score is greater than 0.1 are collected and plane clustering is applied to them, so that this part of the post-processing follows the PMDT algorithm; finally, the coordinates of each detected text line region are output in combination with the link predictions.

6. The method of claim 1, characterized in that the method comprises the following specific steps:

step one, simulating training data: the method targets automatic detection of text lines in test paper layouts. Training such a detection model requires a large number of images annotated with text line positions, but manually annotating the coordinates of every text line in existing test paper layouts is impractical: annotation is slow and labeling errors are hard to avoid. A large volume of image data that closely resembles real test paper layouts is therefore simulated by a program, which produces the text line annotations at the same time. The simulation program generates 600,000 training samples, of which 50,000 are used as the test set and another 50,000 as the validation set; results on the validation set are compared using text line detection precision and recall as metrics;

step two, data preprocessing: the coordinates of the text lines in each simulated test paper layout are written to a txt file, stored as text line position coordinates in the form [xmin, ymin, xmax, ymax]; when a simulated layout contains several text lines, the coordinates of each line are appended in turn in the same format;

step three, training the neural network: the network components described above are integrated into a new algorithm for automatic detection of text lines in test paper layouts; the network as a whole is trained end to end, with the following hyperparameters:

(1) learning rate: the initial learning rate is set to 0.01 and is reduced by 10% every 10 training epochs;

(2) optimizer: the Adam or SGD optimizer is used (the choice is made during implementation according to how training behaves);

(3) other: the batch size is set to 8 (this value depends on GPU memory capacity), the total number of training epochs is 200, and the training data are randomly rotated between -45 degrees and 45 degrees during training;

step four, model prediction output: the trained model is loaded and run on 1,000 real and simulated images respectively; the text line detection precision and recall on the validation set are computed, and the model is evaluated and analyzed.

Technical Field

The invention relates to the technical field of image object detection, and in particular to a method for automatically detecting text lines in test paper layouts.

Background

Text line detection is an important step in scene text recognition and is widely used in OCR; its performance directly affects recognition accuracy. Traditional text line detection uses erosion and dilation in OpenCV to merge each text line into one large connected component and then obtains the line positions by finding connected components in the image, or it segments the text lines and then recognizes individual characters. In recent years, text line detection methods based on deep learning have emerged and achieve good results. The text line detection algorithms commonly used for printed page recognition aim to detect text line regions as rectangular boxes. A test paper layout is a special printed format: because pages are tilted during photographing or scanning, the text lines across the layout are not regularly aligned, and algorithms that output rectangular boxes cannot handle this well. Text line detection algorithms for natural scenes, such as PixelLink, can output the four corner coordinates of a text line region and localize lines as quadrilaterals, but when the spacing between characters is large they detect a line in several fragments and cannot detect the whole line. To address these problems, the idea of the PMDT algorithm is introduced into the PixelLink algorithm, and a new method dedicated to automatic detection of text lines in test paper layouts is proposed.

At present, many methods for automatic detection of text lines in test paper layouts are based on existing object detection methods such as YOLO, SSD, PixelLink and Mask R-CNN. YOLO requires anchors to be specified, and the anchors must be obtained by clustering the training samples. YOLO and SSD are only suitable when the test paper layout is not tilted; when a photographed or scanned test paper is tilted by some angle, these algorithms produce many false detections, so they place high demands on the input samples. The PixelLink algorithm outputs quadrilateral text regions: even if the input test paper layout is somewhat tilted, it can output the four corner coordinates of each text line, and the lines can then be aligned by a perspective transformation. However, when the character spacing within a line is large, the algorithm cannot localize the whole line region and only detects scattered fragments, which adds time-consuming post-processing. The Mask R-CNN algorithm first detects text line regions and then separates the text line from the background within each detected box, so its final result depends on the quality of the box detection. The PMDT algorithm improves on this shortcoming of Mask R-CNN by changing the pixel values inside a text box from a binary 0-or-1 classification to values in the interval [0, 1], and achieves good results.

Most existing approaches to automatic text line detection in test paper layouts use YOLO, SSD, PixelLink, Mask R-CNN and similar methods. These approaches are limited: they place high demands on how the test paper is positioned and on the spacing of text lines and characters.

Disclosure of Invention

(I) Technical problem to be solved

In view of the shortcomings of the prior art, the invention provides a method for automatically detecting text lines in test paper layouts, which addresses two problems: the PixelLink algorithm detects a text line in several fragments when the character spacing is large, and manually annotating text line information is time-consuming.

(II) technical scheme

To achieve the above purpose, the invention provides the following technical solution: a method for automatically detecting text lines in test paper layouts, in which the system is designed on the basis of an improved PixelLink network architecture, introduces the pyramid label generation idea of the PMDT algorithm, and mainly comprises four parts: simulation data generation, pyramid label generation, image feature extraction and fusion, and feature layer prediction.

Preferably, the simulation data generation is as follows: training data are simulated in batches using a programming language; the simulation program controls the format and total number of generated samples through built-in parameters; during simulation a white background image is selected at random, text lines are written onto the background image in sequence according to different test paper layouts, and the coordinates of each written line are recorded; the spacing between text lines is chosen at random between 10 and 25 pixels; to make the model robust, the font of each text line is chosen at random from TTF fonts that include common print typefaces and a few handwriting-style typefaces; the simulation program generates 600,000 training samples, of which 50,000 are used as the test set and another 50,000 as the validation set; and results on the validation set are compared using text line detection precision and recall as metrics.

Preferably, the pyramid label generation is as follows: the method uses a pyramid label generation process based on the PMDT algorithm. The basic idea is to replace the ground truth of the original PixelLink algorithm, in which pixels inside a text box are 1 and all other pixels are 0, with ground-truth values in the interval [0, 1] (see FIG. 2). PMDT treats the center of the text box as the apex of a pyramid, where the value is 1; the base of the pyramid is the annotated boundary of the text region; and the value on each triangular face is obtained by linear interpolation. PMDT defines the pixel score of any point inside the text box as follows. Given the four corner points $A(x_a, y_a)$, $B(x_b, y_b)$, $C(x_c, y_c)$, $D(x_d, y_d)$, the value $score_p$ of a point $P(x_p, y_p)$ inside the box is computed as follows. First, the pyramid apex $O$ (the center of the box enclosing the text line) is computed as $x_o = (x_a + x_b + x_c + x_d)/4$, $y_o = (y_a + y_b + y_c + y_d)/4$. For each region $R_{OMN}$, where $M$ and $N$ are two adjacent points among $A$, $B$, $C$, $D$, i.e. the regions $R_{OAB}$, $R_{OBC}$, $R_{OCD}$, $R_{ODA}$, the vector $\overrightarrow{OP}$ can be decomposed using the following formula:

$$\overrightarrow{OP} = \alpha\,\overrightarrow{OM} + \beta\,\overrightarrow{ON}$$

Thus $\alpha$ and $\beta$ can be obtained from

$$\begin{bmatrix} \alpha \\ \beta \end{bmatrix} = \begin{bmatrix} x_m - x_o & x_n - x_o \\ y_m - y_o & y_n - y_o \end{bmatrix}^{-1} \begin{bmatrix} x_p - x_o \\ y_p - y_o \end{bmatrix}$$

Because the point $P$ lies in the region $R_{OMN}$, $\alpha$ and $\beta$ satisfy $\alpha \ge 0$ and $\beta \ge 0$, and the value of $P$ is $score_p = \max(1 - (\alpha + \beta), 0)$. In this way the value of every pixel in each text line region is computed as a number between 0 and 1.

Preferably, the image feature extraction and fusion is as follows: a VGG backbone network extracts features, which are then fused. The original PixelLink network extracts features with a VGG network; each of the four extracted feature maps is passed through a 1x1 convolution that outputs a feature map with 16 channels; each output feature map is concatenated with the feature map of the layer above and upsampled; and finally a feature map at 1/4 of the original image size is output for prediction.

Preferably, the feature layer prediction is as follows: the fused feature map is used for prediction. The original PixelLink network applies two separate 1x1 convolutions directly to the fused feature map. One yields a 2-channel feature map for per-pixel text/non-text prediction. The other yields a 16-channel feature map for link prediction, i.e. deciding whether a pixel is connected to each of its 8 neighboring pixels. During prediction, separate score thresholds are applied to pixels and to links; pixels that satisfy the pixel score threshold are marked as text, and connected text regions are obtained according to the link threshold. In the present method, after the fused feature map is obtained, a 1x1 convolution yields a 16-channel feature map that is used directly for link prediction. Because the pixel score now lies between 0 and 1, the post-processing idea of PMDT is used: all pixels whose score is greater than 0.1 are collected and plane clustering is applied to them, so that this part of the post-processing follows the PMDT algorithm; finally, the coordinates of each detected text line region are output in combination with the link predictions.

Preferably, the method comprises the following specific steps:

step one, simulating training data: the method targets automatic detection of text lines in test paper layouts. Training such a detection model requires a large number of images annotated with text line positions, but manually annotating the coordinates of every text line in existing test paper layouts is impractical: annotation is slow and labeling errors are hard to avoid. A large volume of image data that closely resembles real test paper layouts is therefore simulated by a program, which produces the text line annotations at the same time. The simulation program generates 600,000 training samples, of which 50,000 are used as the test set and another 50,000 as the validation set; results on the validation set are compared using text line detection precision and recall as metrics;

step two, data preprocessing: the coordinates of the text lines in each simulated test paper layout are written to a txt file, stored as text line position coordinates in the form [xmin, ymin, xmax, ymax]; when a simulated layout contains several text lines, the coordinates of each line are appended in turn in the same format;

step three, training the neural network: the network components described above are integrated into a new algorithm for automatic detection of text lines in test paper layouts; the network as a whole is trained end to end, with the following hyperparameters:

(1) learning rate: the initial learning rate is set to 0.01 and is reduced by 10% every 10 training epochs;

(2) optimizer: the Adam or SGD optimizer is used (the choice is made during implementation according to how training behaves);

(3) other: the batch size is set to 8 (this value depends on GPU memory capacity), the total number of training epochs is 200, and the training data are randomly rotated between -45 degrees and 45 degrees during training;

step four, model prediction output: the trained model is loaded and run on 1,000 real and simulated images respectively; the text line detection precision and recall on the validation set are computed, and the model is evaluated and analyzed.

(III) advantageous effects

The invention provides a method for automatically detecting text lines in test paper layouts, with the following beneficial effects:

In view of the current situation, the invention introduces the core idea of PMDT into the PixelLink algorithm and proposes a new algorithm dedicated to automatic detection of text lines in test paper layouts. The new algorithm takes more account of the shape information of text lines, so the detection results are tighter, and it addresses both the fragmented detection that PixelLink produces when the character spacing is large and the time cost of manually annotating text line information.

Drawings

FIG. 1 is a flow chart of the overall implementation of the present invention;

FIG. 2 is a schematic diagram of pyramid label generation in the PMDT algorithm, as used in the overall implementation flow;

FIG. 3 is a flow chart of the backbone network of the PixelLink algorithm, as used in the overall implementation flow.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

As shown in FIGS. 1-3, the invention provides the following technical solution: a method for automatically detecting text lines in test paper layouts, comprising a data simulation part, a pyramid label generation part, an image feature extraction and fusion part, and a feature layer prediction part, wherein:

The data simulation part: this part describes how the training data used in the method are produced. The basic idea is to simulate training data in batches using a programming language. The simulation program controls the format and total number of generated samples through built-in parameters. During simulation a white background image is selected at random, text lines are written onto the background image in sequence according to different test paper layouts, and the coordinates of each written line are recorded; the spacing between text lines is chosen at random between 10 and 25 pixels. To make the model robust, the font of each text line is chosen at random from TTF fonts that include common print typefaces and a few handwriting-style typefaces. The simulation program generates 600,000 training samples, of which 50,000 are used as the test set and another 50,000 as the validation set; results on the validation set are compared using text line detection precision and recall as metrics.
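The layout side of such a simulator can be sketched as follows. This is only an illustration under stated assumptions: the page size, margin, line height and font file names are hypothetical, and the actual rendering of text onto a white background (which would need an imaging library) is left out. Only the 10 to 25 pixel line spacing and the random font choice come from the description.

```python
import random

# Hypothetical font file names; the description only says that print and
# handwriting-style TTF fonts are mixed.
FONTS = ("print_regular.ttf", "print_bold.ttf", "handwriting.ttf")


def simulate_layout(page_w=1240, page_h=1754, margin=60, line_h=32, seed=None):
    """Return one simulated page as a list of text-line records.

    Each record holds the line's [xmin, ymin, xmax, ymax] box and the font
    that would be used to render it. Page geometry values are assumptions.
    """
    rng = random.Random(seed)
    lines = []
    y = margin
    while y + line_h <= page_h - margin:
        # Vary the line width so both short and full-width lines occur.
        width = rng.randint((page_w - 2 * margin) // 3, page_w - 2 * margin)
        lines.append({
            "box": [margin, y, margin + width, y + line_h],
            "font": rng.choice(FONTS),
        })
        # Line spacing is drawn uniformly from 10 to 25 pixels.
        y += line_h + rng.randint(10, 25)
    return lines
```

Because the boxes are recorded as the lines are placed, the annotation comes for free, which is the point of simulating the data instead of labeling it by hand.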

The pyramid label generation part: this part describes the pyramid label generation process, based on the PMDT algorithm, that is used in the method. The basic idea is to replace the ground truth of the original PixelLink algorithm, in which pixels inside a text box are 1 and all other pixels are 0, with ground-truth values in the interval [0, 1]; see FIG. 2. PMDT treats the center of the text box as the apex of a pyramid, where the value is 1; the base of the pyramid is the annotated boundary of the text region; and the value on each triangular face is obtained by linear interpolation. PMDT defines the pixel score of any point inside the text box as follows. Given the four corner points $A(x_a, y_a)$, $B(x_b, y_b)$, $C(x_c, y_c)$, $D(x_d, y_d)$, the value $score_p$ of a point $P(x_p, y_p)$ inside the box is computed as follows.

First, the pyramid apex $O$ (the center of the box enclosing the text line) is computed as $x_o = (x_a + x_b + x_c + x_d)/4$, $y_o = (y_a + y_b + y_c + y_d)/4$. For each region $R_{OMN}$, where $M$ and $N$ are two adjacent points among $A$, $B$, $C$, $D$, i.e. the regions $R_{OAB}$, $R_{OBC}$, $R_{OCD}$, $R_{ODA}$, the vector $\overrightarrow{OP}$ can be decomposed using the following formula:

$$\overrightarrow{OP} = \alpha\,\overrightarrow{OM} + \beta\,\overrightarrow{ON}$$

Thus $\alpha$ and $\beta$ can be obtained from

$$\begin{bmatrix} \alpha \\ \beta \end{bmatrix} = \begin{bmatrix} x_m - x_o & x_n - x_o \\ y_m - y_o & y_n - y_o \end{bmatrix}^{-1} \begin{bmatrix} x_p - x_o \\ y_p - y_o \end{bmatrix}$$

Because the point $P$ lies in the region $R_{OMN}$, $\alpha$ and $\beta$ satisfy $\alpha \ge 0$ and $\beta \ge 0$, and the value of $P$ is $score_p = \max(1 - (\alpha + \beta), 0)$. In this way the value of every pixel in each text line region is computed as a number between 0 and 1.
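The score computation described above can be sketched in a few lines. This is a minimal illustration, not the patent's implementation: the function name `pyramid_score` is hypothetical, the corners are assumed to be given in order around the box, and the 2x2 system for alpha and beta is solved with Cramer's rule rather than an explicit matrix inverse.

```python
def pyramid_score(quad, p):
    """Pyramid label value in [0, 1] for point p relative to a text box.

    quad: four (x, y) corners A, B, C, D, listed in order around the box.
    p:    the (x, y) point to score.
    """
    # Pyramid apex O is the mean of the four corners.
    xo = sum(x for x, _ in quad) / 4.0
    yo = sum(y for _, y in quad) / 4.0
    px, py = p[0] - xo, p[1] - yo
    best = 0.0
    for i in range(4):
        # Adjacent corners M, N span one triangular region R_OMN.
        (xm, ym), (xn, yn) = quad[i], quad[(i + 1) % 4]
        mx, my = xm - xo, ym - yo
        nx, ny = xn - xo, yn - yo
        det = mx * ny - nx * my
        if det == 0:
            continue  # degenerate triangle, skip it
        # Solve OP = alpha * OM + beta * ON by Cramer's rule.
        alpha = (px * ny - nx * py) / det
        beta = (mx * py - px * my) / det
        if alpha >= 0 and beta >= 0:
            # P lies in this region: linear fall-off from apex to edge.
            best = max(best, max(1.0 - (alpha + beta), 0.0))
    return best
```

For the rectangle with corners (0, 0), (4, 0), (4, 2), (0, 2), the center scores 1, a corner or any edge point scores 0, and the point halfway between the center and the middle of the top edge scores 0.5. Points outside the box also score 0, because alpha + beta exceeds 1 there.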

The image feature extraction and fusion part: this part describes how the backbone network extracts features and fuses them. The original PixelLink network extracts features with a VGG network; the four extracted feature maps correspond to the network structure shown in FIG. 3. Each layer is passed through a 1x1 convolution that outputs a feature map with 16 channels; each output feature map is concatenated with the feature map of the layer above and upsampled; and finally a feature map at 1/4 of the original image size is output for prediction.

The feature layer prediction part: this part describes how the fused feature map is used for prediction. The original PixelLink network applies two separate 1x1 convolutions directly to the fused feature map: one yields a 2-channel feature map for per-pixel text/non-text prediction, and the other yields a 16-channel feature map for link prediction, i.e. deciding whether a pixel is connected to each of its 8 neighboring pixels. During prediction, separate score thresholds are applied to pixels and to links; pixels that satisfy the pixel score threshold are marked as text, and connected text regions are obtained according to the link threshold. In the present method, after the fused feature map is obtained, a 1x1 convolution yields a 16-channel feature map that is used directly for link prediction. Because the pixel score now lies between 0 and 1, the post-processing idea of PMDT is used: all pixels whose score is greater than 0.1 are collected and plane clustering is applied to them. Finally, the coordinates of each detected text line region are output in combination with the link predictions.
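How the 0.1 score threshold and the 8-neighbor links might combine can be illustrated with a simplified grouping routine. Note the assumptions: this sketch swaps the plane clustering step for a plain breadth-first connected-component search, assumes the 16-channel link output has already been reduced to one positive-link probability per neighbor, and uses a hypothetical link threshold of 0.5 and a hypothetical neighbor ordering. It outputs axis-aligned boxes rather than quadrilaterals.

```python
from collections import deque

# Offsets of the 8 neighbours; the channel order is an assumption.
NEIGHBOURS = [(-1, -1), (0, -1), (1, -1), (-1, 0),
              (1, 0), (-1, 1), (0, 1), (1, 1)]


def group_text_pixels(score, link, score_thr=0.1, link_thr=0.5):
    """Group pixels with score > score_thr into regions via 8-neighbour links.

    score: 2D list of floats in [0, 1].
    link:  link[y][x][k] is the probability that pixel (x, y) is linked to
           its k-th neighbour in NEIGHBOURS.
    Returns one [xmin, ymin, xmax, ymax] box per connected region.
    """
    h, w = len(score), len(score[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y0 in range(h):
        for x0 in range(w):
            if seen[y0][x0] or score[y0][x0] <= score_thr:
                continue
            # Breadth-first search from an unvisited text pixel.
            seen[y0][x0] = True
            queue = deque([(x0, y0)])
            xmin = xmax = x0
            ymin = ymax = y0
            while queue:
                x, y = queue.popleft()
                xmin, xmax = min(xmin, x), max(xmax, x)
                ymin, ymax = min(ymin, y), max(ymax, y)
                for k, (dx, dy) in enumerate(NEIGHBOURS):
                    nx, ny = x + dx, y + dy
                    if not (0 <= nx < w and 0 <= ny < h):
                        continue
                    if seen[ny][nx] or score[ny][nx] <= score_thr:
                        continue
                    if link[y][x][k] > link_thr:
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            boxes.append([xmin, ymin, xmax, ymax])
    return boxes
```

The low 0.1 threshold keeps the weakly scored pixels near the edge of each pyramid, so widely spaced characters on the same line are more likely to stay in one region.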

A method for automatically detecting text lines in test paper layouts comprises the following steps:

Step one, simulating training data: the method targets automatic detection of text lines in test paper layouts. Training such a detection model requires a large number of images annotated with text line positions, but manually annotating the coordinates of every text line in existing test paper layouts is impractical: annotation is slow and labeling errors are hard to avoid. A large volume of image data that closely resembles real test paper layouts is therefore simulated by a program, which produces the text line annotations at the same time. The simulation program generates 600,000 training samples, of which 50,000 are used as the test set and another 50,000 as the validation set; results on the validation set are compared using text line detection precision and recall as metrics.

Step two, data preprocessing: the coordinates of the text lines in each simulated test paper layout are written to a txt file, stored as text line position coordinates in the form [xmin, ymin, xmax, ymax]. When a simulated layout contains several text lines, the coordinates of each line are appended in turn in the same format.
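A possible on-disk form of these annotations is one record per text line. The sketch below assumes a comma-separated `xmin,ymin,xmax,ymax` record on each line of the txt file; the description fixes the field order but not the separator, so the delimiter and both function names are assumptions.

```python
def format_annotations(boxes):
    """Serialise boxes as one 'xmin,ymin,xmax,ymax' record per text line."""
    return "\n".join(",".join(str(int(v)) for v in box) for box in boxes)


def parse_annotations(text):
    """Parse the records written by format_annotations back into boxes."""
    boxes = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        xmin, ymin, xmax, ymax = (int(v) for v in line.split(","))
        boxes.append([xmin, ymin, xmax, ymax])
    return boxes
```

Writing the result of `format_annotations` to a file named after the image, and reading it back with `parse_annotations`, gives the training loader the same boxes the simulator recorded.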

Step three, training the neural network: the network components described above are integrated into a new algorithm for automatic detection of text lines in test paper layouts. The network as a whole is trained end to end, with the following settings:

(1) learning rate: the initial learning rate is set to 0.01 and is reduced by 10% every 10 training epochs;

(2) optimizer: the Adam or SGD optimizer is used (the choice is made during implementation according to how training behaves);

(3) other: the batch size is set to 8 (this value depends on GPU memory capacity), the total number of training epochs is 200, and the training data are randomly rotated between -45 degrees and 45 degrees during training.
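The settings above can be written down as a small, framework-independent sketch. Two readings are assumptions here: "reduced by 10% every 10 rounds" is taken to mean multiplying the learning rate by 0.9 at each step, and the rotation range is taken to be -45 to 45 degrees. The constant and function names are hypothetical.

```python
import random

INITIAL_LR = 0.01    # initial learning rate
DECAY_EVERY = 10     # epochs between learning-rate reductions
DECAY_FACTOR = 0.9   # assumption: "reduced by 10%" means multiply by 0.9
BATCH_SIZE = 8       # limited by GPU memory
TOTAL_EPOCHS = 200


def learning_rate(epoch):
    """Step-decayed learning rate for a zero-based epoch index."""
    return INITIAL_LR * DECAY_FACTOR ** (epoch // DECAY_EVERY)


def random_rotation_angle(rng=random):
    """Rotation angle in degrees for augmentation (assumed range -45..45)."""
    return rng.uniform(-45.0, 45.0)
```

Under this reading the learning rate at the start of the last ten epochs is 0.01 x 0.9^19, roughly 0.00135; if the intended schedule instead divides the rate by 10 at each step, only `DECAY_FACTOR` changes.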

Step four, model prediction output: the trained model is loaded and run on 1,000 real and simulated images respectively; the text line detection precision and recall on the validation set are computed, and the model is evaluated and analyzed.
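Precision and recall for detected text lines are commonly computed by matching predicted boxes to annotated boxes. The sketch below uses greedy one-to-one matching on intersection over union (IoU) with a threshold of 0.5; the matching rule, the threshold and the function names are assumptions, since the description names the two metrics without saying how detections are matched.

```python
def iou(a, b):
    """Intersection over union of two [xmin, ymin, xmax, ymax] boxes."""
    ix = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def precision_recall(pred, truth, iou_thr=0.5):
    """Greedy one-to-one matching of predictions to ground-truth boxes."""
    matched = set()
    true_pos = 0
    for p in pred:
        best_j, best_iou = -1, iou_thr
        for j, t in enumerate(truth):
            if j in matched:
                continue  # each ground-truth line can be matched once
            v = iou(p, t)
            if v >= best_iou:
                best_j, best_iou = j, v
        if best_j >= 0:
            matched.add(best_j)
            true_pos += 1
    precision = true_pos / len(pred) if pred else 0.0
    recall = true_pos / len(truth) if truth else 0.0
    return precision, recall
```

With one-to-one matching, a line detected as several fragments counts as at most one true positive, and usually none, so fragmented detection lowers both metrics.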

For test paper images, the invention uses deep learning to detect the text lines in a test paper automatically and outputs the position coordinates of all text lines, laying a foundation for building large-scale test paper databases and for automatic transcription of test papers.

In conclusion, the invention introduces the core idea of PMDT into the PixelLink algorithm and proposes a new algorithm dedicated to automatic detection of text lines in test paper layouts. The new algorithm takes more account of the shape information of text lines, so the detection results are tighter, and it addresses both the fragmented detection that PixelLink produces when the character spacing is large and the time cost of manually annotating text line information.

It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.

Although embodiments of the present invention have been shown and described, it will be appreciated by those skilled in the art that changes, modifications, substitutions and alterations can be made in these embodiments without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.
