Text recognition training optimization method based on deep neural network

Document No.: 1379197    Publication date: 2020-08-14

Application note: this technology, "Text recognition training optimization method based on deep neural network", was created by 夏路遥, 侯进 and 黄贤俊 on 2020-04-28. The invention discloses a text recognition training optimization method based on a deep neural network, belonging to the technical field of computer vision. By reducing the probability that samples with many consecutive recognition errors or many consecutive correct recognitions are added to training, the training method reduces the influence of manual labeling errors, raises the frequency at which low-frequency words appear in training, and screens the total number of training samples, so that the model converges faster and training time is reduced. The accuracy of the recognition model can thereby be improved.

1. A text recognition training optimization method based on a deep neural network is characterized by comprising the following steps:

(1) reading training data from a stored cache when training is started;

(2) judging the training state of the current training data, wherein the training state is the number of consecutive recognition errors or consecutive correct recognitions, the value ranges from 10 consecutive errors to 10 consecutive correct recognitions, and counts exceeding 10 are recorded as 10;

(3) determining the probability that the corresponding data is added to training next time, following the principle that the larger the number of consecutive errors or consecutive correct recognitions, the lower the probability of being added to training next time;

(4) training on the data according to the probability of being added to training next time determined in step (3);

(5) carrying out multiple rounds of training to obtain the training-optimized sample distribution.

2. The text recognition training optimization method according to claim 1, wherein the method for determining the probability that the corresponding data is added to training next time in step (3) specifically comprises: when the number of consecutive errors or consecutive correct recognitions is less than n, the probability that the corresponding data is added to training next time is 100%; when the number of consecutive errors or consecutive correct recognitions is greater than or equal to n, the probability is successively halved as that number increases; and n is 3, 4, 5, 6 or 7.

3. The text recognition training optimization method according to claim 1, wherein the method for determining the probability that the corresponding data is added to training next time in step (3) specifically comprises: when the number of consecutive errors is less than 4, the probability that the corresponding data is added to training next time is 100%, and when the number of consecutive errors is greater than or equal to 4, the probability is successively halved as the number of consecutive errors increases; when the number of consecutive correct recognitions is less than 7, the probability that the corresponding data is added to training next time is 100%, and when the number of consecutive correct recognitions is greater than or equal to 7, the probability is successively halved as the number of consecutive correct recognitions increases.

4. The text recognition training optimization method according to claim 1, wherein the method for determining the probability that the corresponding data is added to training next time in step (3) specifically comprises: when the number of consecutive errors or consecutive correct recognitions is less than n, the probability that the corresponding data is added to training next time is 100%; when the number of consecutive errors or consecutive correct recognitions is greater than or equal to n, the probability decreases by 10% for each increase in that number; and n is 7 or 8.

5. The text recognition training optimization method according to claim 1, wherein in step (4), a random decimal between 0 and 1 with three decimal places is generated; if the obtained value is greater than the probability that the corresponding data is added to training next time, the data is excluded from the current round of training; if the obtained value is less than or equal to that probability, the corresponding data is added to the current round of training; and the training state of the data is updated according to whether the trained data is recognized correctly.

6. A text recognition method based on a deep neural network is characterized by comprising the following steps:

(1) manually annotating the original text, marking the text regions and the text content;

(2) establishing a text detection model, and training the text detection model using the text region data labeled in step (1);

(3) establishing a text recognition model, and training the text recognition model using the text recognition training optimization method of claim 1 in combination with the text content labeled in step (1).

7. The text recognition method of claim 6, wherein the text detection model is one of Faster R-CNN, SSD, YOLO-v2, EAST, RRCNN, TextBoxes and CTPN.

8. The text recognition method according to claim 6, wherein the text detection model is Faster R-CNN, and detection comprises the following steps:

1) extracting abstract features of the text picture by a deep convolutional network;

2) generating candidate text regions using a region proposal network (RPN);

3) regressing the precise bill region from the candidate regions.

9. The text recognition method of claim 6, wherein the text recognition model is a CNN + RNN model.

10. The text recognition method of claim 6, wherein the text recognition model specifically comprises the steps of:

1) extracting text picture features using a convolutional network;

2) inputting the extracted features into a bidirectional recurrent neural network composed of LSTM units;

3) merging repeated characters and removing blank placeholders with the CTC algorithm, and outputting the character sequence with the maximum probability.

Technical Field

The invention belongs to the technical field of computer vision, and particularly relates to text detection and recognition technology.

Background Art

OCR (Optical Character Recognition) refers to the process in which an electronic device (e.g., a scanner or a digital camera) examines characters printed on paper and translates the text image into computer text using a character recognition method. At present, many companies need to convert pictures into electronic data, for example converting the large numbers of text pictures of reimbursement bills or personal certificates provided by users into the related data required by their systems, and this conversion relies on OCR. Current OCR technology is divided into two modules, detection and recognition: the detection module is responsible for detecting character regions, and the recognition module is responsible for cutting out the detected regions and recognizing them as the corresponding characters.

Existing OCR technology needs a large amount of text labeling data to train the recognition model, and such data usually suffers from manual labeling errors and from inconsistent labeling of blurred fields by different annotators, which affects the training of the recognition model. The accuracy of the recognition model is also affected by the distribution of the raw data samples: high-frequency texts are recognized with high accuracy, but low-frequency texts are recognized with low accuracy and are easily misrecognized as similar high-frequency words, and training the recognition model takes a long time. The invention mainly solves the problems caused by the uneven distribution of data samples.

Disclosure of Invention

In view of the above technical problems, the invention provides a text recognition training optimization method based on a deep neural network. The training method can reduce the influence of manual labeling errors, raise the frequency at which low-frequency words appear in training, and screen the total number of training samples, so that the model converges faster and training time is reduced; the accuracy of the recognition model can thereby be improved.

The invention comprises the following technical scheme:

a text recognition training optimization method based on a deep neural network comprises the following steps:

(1) reading training data from a stored cache when training is started;

(2) judging the training state of the current training data, wherein the training state is the number of consecutive recognition errors or consecutive correct recognitions, the value ranges from 10 consecutive errors to 10 consecutive correct recognitions, and counts exceeding 10 are recorded as 10;

(3) determining the probability that the corresponding data is added to training next time, following the principle that the larger the number of consecutive errors or consecutive correct recognitions, the lower the probability of being added to training next time;

(4) training on the data according to the probability of being added to training next time determined in step (3);

(5) carrying out multiple rounds of training to obtain the training-optimized sample distribution.

In the above text recognition training optimization method, as the number of training rounds increases, the probability of adding data with higher counts of consecutive recognition errors or consecutive correct recognitions keeps decreasing: the probability of adding samples that are easy to recognize correctly (consecutively correct) is continuously reduced, and so is the probability of adding samples that are consistently recognized incorrectly (consecutively wrong). Samples that are easy to recognize correctly are typically high-frequency words, and lowering their probability of being added can reduce overfitting to such samples. Samples that are always recognized incorrectly are often mislabeled or blurred, and they have a negative effect on the model. The remaining samples, which frequently switch between correct and incorrect states, are exactly the ones we wish to train on to improve accuracy.
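For illustration only, the training state of step (2) could be kept as one signed counter per sample; this representation and the function below are a minimal sketch, not the patent's prescribed implementation:

```python
# Minimal sketch of the per-sample training state from step (2).
# The signed-counter representation is an assumption for illustration:
# +k means k consecutive correct recognitions, -k means k consecutive errors.

def update_state(state: int, recognized_correctly: bool) -> int:
    """Update a sample's consecutive-correct/consecutive-error counter."""
    if recognized_correctly:
        state = state + 1 if state >= 0 else 1  # streak continues or resets
    else:
        state = state - 1 if state <= 0 else -1
    return max(-10, min(10, state))  # counts beyond 10 are recorded as 10
```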

Optionally, in the above text recognition training optimization method, the number of training rounds in step (5) exceeds 100.

As an optional mode, in the above text recognition training optimization method, the method for determining the probability that the corresponding data is added to training next time in step (3) specifically comprises: when the number of consecutive errors or consecutive correct recognitions is less than n, the probability that the corresponding data is added to training next time is 100%; when the number of consecutive errors or consecutive correct recognitions is greater than or equal to n, the probability is successively halved as that number increases. The value of n is 3, 4, 5, 6 or 7 and can be chosen flexibly according to the practical application.

As an optional mode, in the above text recognition training optimization method, the method for determining the probability that the corresponding data is added to training next time in step (3) specifically comprises: when the number of consecutive errors is less than 4, the probability that the corresponding data is added to training next time is 100%, and when the number of consecutive errors is greater than or equal to 4, the probability is successively halved as the number of consecutive errors increases; when the number of consecutive correct recognitions is less than 7, the probability that the corresponding data is added to training next time is 100%, and when the number of consecutive correct recognitions is greater than or equal to 7, the probability is successively halved as the number of consecutive correct recognitions increases.

As an optional mode, in the above text recognition training optimization method, the method for determining the probability that the corresponding data is added to training next time in step (3) specifically comprises: when the number of consecutive errors or consecutive correct recognitions is less than n, the probability that the corresponding data is added to training next time is 100%; when the number of consecutive errors or consecutive correct recognitions is greater than or equal to n, the probability decreases by 10% for each increase in that number; and n is 5, 6, 7 or 8.

As an optional mode, in the above text recognition training optimization method, the method for determining the probability that the corresponding data is added to training next time in step (3) specifically comprises: when the number of consecutive errors is less than 6, the probability that the corresponding data is added to training next time is 100%; when the number of consecutive errors is 6, the probability is 85%; and when the number of consecutive errors is 7, 8, 9 or 10, the probability is 70%, 55%, 40% or 25%, respectively. When the number of consecutive correct recognitions is less than 5, the probability that the corresponding data is added to training next time is 100%; when the number of consecutive correct recognitions is 5, the probability is 85%; and when the number of consecutive correct recognitions is 6, 7, 8, 9 or 10, the probability is 70%, 55%, 40%, 25% or 10%, respectively.

In the above text recognition training optimization method, the method for determining the probability that the corresponding data is added to training next time in step (3) is not limited to the above optional modes; it only needs to follow the principle that the larger the number of consecutive correct recognitions or consecutive errors, the lower the probability of the sample being added. Two of the optional schedules are sketched below.
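A minimal sketch of the halving schedule and the 10%-decrement schedule, assuming the signed-counter state from the earlier sketch; the default values of n are illustrative choices from the ranges given above:

```python
def halving_probability(streak: int, n: int = 4) -> float:
    """Halving schedule: 100% below the threshold n, then successively
    halved for each additional consecutive correct/error count."""
    streak = abs(streak)  # the schedule depends only on the streak length
    if streak < n:
        return 1.0
    return 0.5 ** (streak - n + 1)

def linear_probability(streak: int, n: int = 7) -> float:
    """Linear schedule: 100% below n, then decreased by 10% per count."""
    streak = abs(streak)
    if streak < n:
        return 1.0
    return max(0.0, 1.0 - 0.1 * (streak - n + 1))
```

With n = 4, for example, halving_probability gives 100% for a streak of 3, 50% at 4, 25% at 5, and so on, matching the "successively halved" rule.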

Optionally, in the above text recognition training optimization method, in step (4), a random decimal between 0 and 1 with three decimal places is generated; if the obtained value is greater than the probability that the corresponding data is added to training next time, the data is excluded from the current round of training; if the obtained value is less than or equal to that probability, the corresponding data is added to the current round of training; and the training state of the data is updated according to whether the trained data is recognized correctly. A sketch of one such round is given below.
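This sketch of one training round combines the helpers sketched above; train_step is a hypothetical stand-in for the actual training call, which is assumed here to also report whether the sample was recognized correctly:

```python
import random

def run_round(samples, states, probability=halving_probability):
    """One round of step (4): decide per sample whether it joins this
    round, train on it if so, and update its training state."""
    for i, sample in enumerate(samples):
        # Three-decimal random number between 0 and 1, as described above.
        r = random.randint(0, 1000) / 1000.0
        if r > probability(states[i]):
            continue  # the sample sits out the current round
        # train_step is hypothetical: it trains on the sample and returns
        # whether the sample was recognized correctly afterwards.
        correct = train_step(sample)
        states[i] = update_state(states[i], correct)
```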

The invention also provides a text recognition method based on a deep neural network, which comprises the following steps:

(1) manually annotating the original text, marking the text regions and the text content;

(2) establishing a text detection model, and training the text detection model using the text region data labeled in step (1);

(3) establishing a text recognition model, and training the text recognition model using the text recognition training optimization method described above in combination with the text content labeled in step (1); the two stages can be composed as sketched below.
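For illustration, the two-stage pipeline could be composed as follows; detector and recognizer are hypothetical stand-ins for the trained detection and recognition models, and the image is assumed to be an array indexable by pixel coordinates:

```python
def recognize_text(image, detector, recognizer):
    """Two-stage OCR pipeline: detect text regions, then recognize
    the text inside each detected region."""
    results = []
    for box in detector(image):                  # step (2): region detection
        x0, y0, x1, y1 = box
        crop = image[y0:y1, x0:x1]               # cut out the detected region
        results.append((box, recognizer(crop)))  # step (3): text recognition
    return results
```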

Optionally, in the above text recognition method, the text detection model may use any general object detection algorithm such as Faster R-CNN, SSD or YOLO-v2, or may use an algorithm specifically optimized for text detection, such as EAST, RRCNN, TextBoxes or CTPN.

Optionally, in the above text recognition method, the text detection model is Faster R-CNN, and detection specifically comprises the following steps:

1) extracting abstract features (feature maps) of the text picture with a deep convolutional network (conv layers);

2) generating candidate text regions using a region proposal network (RPN);

3) regressing the precise bill region from the candidate regions.

This model is based on detection with modern deep convolutional neural networks and builds on the mature Faster R-CNN framework, whose basic version already achieves high accuracy on larger objects. The flow of the framework is as follows: I. extract features from the picture; II. enumerate a large number of rectangles that attempt to cover the corresponding objects; III. classify the enumerated rectangles into two types, positive and negative samples; IV. cut the positive samples out of the feature map, and then regress the target boundary according to the feature map.
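As a non-authoritative sketch of this flow, an off-the-shelf Faster R-CNN from torchvision could be fine-tuned for a single "text" class (background plus text); the conversion of the labeled regions into torchvision's box format is assumed, and this is not the patent's own implementation:

```python
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

# Load a pretrained Faster R-CNN and replace its head with a
# two-class predictor: background + text region.
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True)
in_features = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes=2)

# One training step: images is a list of CHW float tensors; targets is a
# list of dicts with "boxes" (N x 4) and "labels" (N) from the annotations.
def detection_step(images, targets, optimizer):
    model.train()
    losses = model(images, targets)  # in train mode, returns a loss dict
    loss = sum(losses.values())
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```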

Optionally, in the above text recognition method, the text recognition model is a CNN + RNN model. The model recognizes character strings through a deep recurrent network: it combines CNN and RNN, extracts image features with the CNN, slices the feature map horizontally, infers the text with an LSTM recurrent network, and computes the difference between the predicted character string and the label with a CTC loss function, completing end-to-end training.
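A minimal CRNN sketch of this CNN + RNN structure in PyTorch; the layer sizes and the input height are illustrative assumptions, as the patent does not fix them:

```python
import torch.nn as nn

class CRNN(nn.Module):
    """CNN feature extractor + bidirectional LSTM + per-timestep classifier,
    trained end to end with CTC loss. Layer sizes are illustrative."""
    def __init__(self, num_classes: int, img_height: int = 32):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2, 2),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2, 2),
        )
        feat_height = img_height // 4           # height after two poolings
        self.rnn = nn.LSTM(128 * feat_height, 256,
                           bidirectional=True, batch_first=True)
        self.fc = nn.Linear(512, num_classes)   # classes include CTC blank

    def forward(self, x):                       # x: (B, 1, H, W)
        f = self.cnn(x)                         # (B, 128, H/4, W/4)
        b, c, h, w = f.shape
        f = f.permute(0, 3, 1, 2).reshape(b, w, c * h)  # horizontal slices
        out, _ = self.rnn(f)                    # (B, W/4, 512)
        return self.fc(out)                     # per-timestep class logits

# CTC loss expects log-probabilities of shape (T, B, C).
ctc = nn.CTCLoss(blank=0, zero_infinity=True)
```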

As an alternative, in the text recognition method, the text recognition model specifically includes the following steps:

1) extracting text picture features using a convolutional network;

2) inputting the extracted features into a bidirectional recurrent neural network composed of LSTM units;

3) merging repeated characters and removing blank placeholders with the CTC algorithm, and outputting the character sequence with the maximum probability; a greedy decoding sketch is given below.
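A minimal greedy (best-path) CTC decoding sketch for step 3); treating class index 0 as the blank placeholder is an assumption:

```python
def ctc_greedy_decode(logits, blank: int = 0):
    """Best-path CTC decoding: take the most probable class at each
    timestep, collapse repeated characters, then drop blank placeholders."""
    best_path = logits.argmax(dim=-1).tolist()  # (T,) class indices
    decoded, previous = [], None
    for idx in best_path:
        if idx != previous and idx != blank:    # collapse repeats, skip blanks
            decoded.append(idx)
        previous = idx
    return decoded
```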

All of the features disclosed in this specification, or all of the steps in any method or process so disclosed, may be combined in any combination, except combinations of features and/or steps that are mutually exclusive.

The invention has the beneficial effects that:

the training method can reduce the influence of manual labeling errors, raise the frequency at which low-frequency words appear, and screen the total number of training samples, so that the model converges more quickly and training time is reduced; the accuracy of the recognition model can be improved.

Description of the Drawings:

FIG. 1 is a schematic structural diagram of the intelligent warehousing accessory video identification retrieval system according to the present invention;

FIG. 2 is a flowchart illustrating the operation of the intelligent warehousing accessory video identification retrieval system according to the present invention;

FIG. 3 is a schematic structural view of the data acquisition front-end smart wearable glasses of the present invention;

FIG. 4 is a schematic view of the box label and the accessory label according to the present invention.

Detailed Description of the Embodiments:

the present invention is described in detail below, taking the detection and recognition of text on bills as an example. The scope of the above-described subject matter of the invention should not be understood as limited to the following examples; the methods of the invention may also be applied to the detection and recognition of text other than bills. Any modification made without departing from the spirit and principle of the present invention, and any equivalent replacement or improvement made using common knowledge and conventional means in the field, shall fall within the protection scope of the present invention.
