End-to-end method for extracting relations between text detection targets based on a deep neural network

Document No.: 1379196  Publication date: 2020-08-14

Note: This technology, "An end-to-end method for extracting relations between text detection targets based on a deep neural network", was designed and created by Cong Jianting, Hou Jin and Huang Xianjun on 2020-04-28. Its main content is as follows: The invention discloses an end-to-end method for extracting relations between text detection targets based on a deep neural network, belonging to the technical field of computer vision. The method adds a matching-relation judgment module for detection targets to the second-stage structural flow of an existing two-stage object detection algorithm based on a deep neural network, thereby realizing training and prediction of whether detection targets match. By building structured character extraction into the deep-learning network structure, an end-to-end extraction function is realized, saving the maintenance cost of the extraction function. The deep-neural-network-based method of the invention can realize end-to-end training and prediction in text detection, achieves strong robustness, and no longer needs a rule base for relation extraction, thus reducing the development and maintenance cost of relation-extraction resources.

1. An end-to-end method for extracting relations between text detection targets based on a deep neural network, characterized in that a matching-relation judgment module for detection targets is added to the second-stage structural flow of an existing two-stage object detection algorithm based on a deep neural network, so that training and prediction of whether detection targets match are realized.

2. The end-to-end method for extracting relations between text detection targets according to claim 1, wherein the matching-relation judgment module for detection targets judges matching relations only between foreground roi feature sequences.

3. The end-to-end method for extracting relations between text detection targets according to claim 2, wherein the method for distinguishing the foreground roi feature sequence from the background roi feature sequence is specifically: the IoU between the position coordinates of each roi and the ground-truth position is computed; an roi whose IoU is higher than 0.50 is judged foreground, and one whose IoU is lower than 0.50 is judged background.

4. The end-to-end method for extracting relations between text detection targets according to claim 1, wherein the matching-relation judgment module for detection targets specifically comprises the following procedure:

(1) obtaining foreground roi sequence features;

(2) connecting any two foreground roi features;

(3) judging, based on the labelled truth value, whether the two connected foreground roi features have a matching relation; if they do, the training class label is set to 1, and if not, it is set to 0;

(4) passing the connected feature vector through a feature extraction network and performing classification, thereby realizing the judgment of any two spliced foreground roi features.

5. The end-to-end method for extracting relations between text detection targets according to claim 4, wherein in step (2) each roi feature has shape 1 × 1024, and the two features are concatenated (concat), so that the connected feature vector has shape 1 × 2048.

6. The end-to-end method for extracting relations between text detection targets according to claim 1, wherein the text detection base framework is an arbitrary two-stage object detection algorithm.

7. The end-to-end method for extracting relations between text detection targets according to claim 1, wherein the text detection base framework is one of Faster RCNN, R2CNN and Mask RCNN.

8. The end-to-end method for extracting relations between text detection targets according to claim 1, wherein, based on the Faster RCNN framework, a matching-relation judgment module for detection targets is added to the second-stage RCNN network structure flow.

9. The end-to-end method for extracting relations between text detection targets according to claim 8, characterized in that the specific flow is as follows:

(1) inputting an image;

(2) the first stage: extracting target candidate regions through an RPN (Region Proposal Network) to generate an roi feature sequence;

(3) the second stage: the roi feature sequence produced by the first-stage RPN enters the second-stage RCNN, which separates the foreground roi feature sequence from the background roi feature sequence; the matching relations between foreground roi feature sequences are then judged by the text-detection-target matching-relation judgment module.

10. The end-to-end method for extracting relations between text detection targets according to claim 9, specifically comprising the following steps:

(1) inputting an image;

(2) the first stage: extracting target candidate regions through an RPN (Region Proposal Network) to generate an roi feature sequence;

(3) the second stage: the roi feature sequence produced by the first-stage RPN enters the second-stage RCNN; the IoU between each roi's position coordinates and the ground-truth position separates the foreground roi feature sequence from the background roi feature sequence, an roi whose IoU is higher than 0.50 being judged foreground and one whose IoU is lower than 0.50 being judged background; two foreground roi features are then randomly selected, each with shape 1 × 1024, and concatenated (concat), so that the connected feature vector has shape 1 × 2048; for any two spliced foreground roi features, whether they have a matching relation is judged based on the labelled truth value, the training class label being set to 1 if so and to 0 if not; the connected feature vector is passed through a fully-connected or convolutional feature extraction network and then into softmax for classification, and the text detection result is finally output.

Technical Field

The invention belongs to the technical field of computer vision, and particularly relates to an end-to-end method for extracting relations between text detection targets based on a deep neural network.

Background Art

OCR (Optical Character Recognition) refers to the process in which an electronic device (e.g., a scanner or digital camera) examines characters printed on paper and translates the character image into computer text using character recognition methods. Deep learning has made great progress in the field of image recognition. Traditional image recognition methods use low-level visual features such as colour and HOG; deep neural networks can learn higher-level, more abstract features, which makes their performance far superior to traditional approaches. In particular, since 2014 deep learning has produced excellent results in object detection, object segmentation and related fields, giving rise to a series of methods such as DeepLab, YOLO and Faster RCNN, whose recognition accuracy exceeds human level on specific tasks and which are used at large scale in production environments. In the prior art, however, structured extraction is usually performed on the recognition result after character recognition, and most structured extraction functions are implemented by building a rule base, for example enumeration based on prior knowledge and template rules. Such methods generally require a large amount of code written for specific problems, and suffer from high development and maintenance costs, poor generalization, insufficient robustness and poor maintainability.

Disclosure of Invention

In view of the above technical problems, the invention provides an end-to-end method for extracting relations between text detection targets based on a deep neural network, which realizes an end-to-end extraction function by building structured character extraction into the deep-learning network structure, thereby saving the maintenance cost of the extraction function.

The invention comprises the following technical scheme:

An end-to-end method for extracting relations between text detection targets based on a deep neural network, in which a matching-relation judgment module for detection targets is added to the second-stage structural flow of an existing two-stage object detection algorithm based on a deep neural network, so that training and prediction of whether detection targets match are realized. By explicitly labelling text-target matching relations, the method realizes end-to-end training and prediction of the matching relation, has better robustness, and at the same time has very low maintenance cost.

As an optional mode, in the end-to-end method for extracting relations between text detection targets, the matching-relation judgment module for detection targets judges matching relations only between foreground roi (region of interest) feature sequences.

As an optional mode, in the end-to-end method for extracting relations between text detection targets, the method for distinguishing the foreground roi feature sequence from the background roi feature sequence is specifically: the IoU between the position coordinates of each roi and the ground-truth position is computed; an roi whose IoU is higher than 0.50 is judged foreground, and one whose IoU is lower than 0.50 is judged background. Here IoU (Intersection-over-Union) is defined as the overlap degree of two rectangular boxes (bounding boxes): for rectangular boxes A and B, IoU = area(A ∩ B) / area(A ∪ B), i.e., the ratio of the overlapping area of A and B to the area of their union.
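The IoU-based foreground/background split described above can be sketched as follows. This is an illustrative sketch only (not the patented implementation); boxes are assumed to be axis-aligned tuples (x1, y1, x2, y2), and the function names are invented for illustration:

```python
def iou(box_a, box_b):
    """Intersection-over-Union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Coordinates of the intersection rectangle (may be empty).
    ix1, iy1 = max(ax1, bx1), max(ay1, by1)
    ix2, iy2 = min(ax2, bx2), min(ay2, by2)
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def label_roi(roi_box, gt_box, threshold=0.50):
    """Judge an roi foreground or background by its IoU with the ground truth."""
    return "foreground" if iou(roi_box, gt_box) > threshold else "background"
```

For example, two unit-overlap 2 × 2 boxes give IoU = 1/7, well below the 0.50 threshold, so such an roi would be judged background.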

As an optional mode, in the end-to-end method for extracting relations between text detection targets, the matching-relation judgment module for detection targets specifically comprises the following procedure:

(1) obtaining foreground roi sequence features;

(2) connecting any two foreground roi features;

(3) judging, based on the labelled truth value, whether the two connected foreground roi features have a matching relation; if they do, the training class label is set to 1, and if not, it is set to 0;

(4) passing the connected feature vector through a feature extraction network and performing classification, thereby realizing the judgment of any two spliced foreground roi features.

Alternatively, in the end-to-end method for extracting relations between text detection targets, in step (2) each roi feature has shape 1 × 1024, and the two features are concatenated (concat), so that the connected feature vector has shape 1 × 2048.

Alternatively, in the end-to-end method for extracting relations between text detection targets, in step (2) each roi feature has shape 1 × 512, and the two features are concatenated (concat), so that the connected feature vector has shape 1 × 1024.

Alternatively, in the end-to-end method for extracting relations between text detection targets, in step (2) each roi feature has shape 1 × 2048, and the two features are concatenated (concat), so that the connected feature vector has shape 1 × 4096.
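The pairwise concatenation of steps (1)–(2) can be sketched as below. This is a sketch under the assumption that each foreground roi feature is a NumPy vector of dimension D (512, 1024 or 2048 as in the variants above); the function name is illustrative, not from the patent:

```python
import numpy as np
from itertools import combinations

def make_pair_features(fg_rois):
    """Concatenate every unordered pair of foreground roi feature vectors,
    each of shape (D,), into a connected vector of shape (2 * D,)."""
    pairs, feats = [], []
    for i, j in combinations(range(len(fg_rois)), 2):
        pairs.append((i, j))  # remember which two rois were spliced
        feats.append(np.concatenate([fg_rois[i], fg_rois[j]]))
    # Stack into an (num_pairs, 2 * D) batch for the downstream classifier.
    return pairs, np.stack(feats) if feats else np.empty((0, 0))
```

With D = 1024, three foreground rois yield three spliced vectors of shape 1 × 2048, matching the shapes stated in the text.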

Optionally, in the end-to-end method for extracting relations between text detection targets, the text detection base framework is an arbitrary two-stage object detection algorithm, such as any one of Faster RCNN, R2CNN and Mask RCNN.

Optionally, in the end-to-end method for extracting relations between text detection targets, based on the Faster RCNN framework, a matching-relation judgment module for detection targets is added to the second-stage RCNN network structure flow.

As an optional mode, in the end-to-end method for extracting relations between text detection targets, the specific flow is as follows:

(1) inputting an image;

(2) the first stage: extracting target candidate regions through an RPN (Region Proposal Network) to generate an roi feature sequence;

(3) the second stage: the roi feature sequence produced by the first-stage RPN enters the second-stage RCNN, which separates the foreground roi feature sequence from the background roi feature sequence; the matching relations between foreground roi feature sequences are then judged by the text-detection-target matching-relation judgment module.

As an optional mode, the end-to-end method for extracting relations between text detection targets specifically comprises the following steps:

(1) inputting an image;

(2) the first stage: extracting target candidate regions through an RPN (Region Proposal Network) to generate an roi feature sequence;

(3) the second stage: the roi feature sequence produced by the first-stage RPN enters the second-stage RCNN; the IoU between each roi's position coordinates and the ground-truth position separates the foreground roi feature sequence from the background roi feature sequence, an roi whose IoU is higher than 0.50 being judged foreground and one whose IoU is lower than 0.50 being judged background; two foreground roi features are then randomly selected, each with shape 1 × 1024, and concatenated (concat), so that the connected feature vector has shape 1 × 2048; for any two spliced foreground roi features, whether they have a matching relation is judged based on the labelled truth value, the training class label being set to 1 if so and to 0 if not; the connected feature vector is passed through a fully-connected or convolutional feature extraction network and then into softmax for classification, and the text-target matching-relation judgment result is finally output.
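The fully-connected classification head at the end of step (3) can be sketched as follows. This is a minimal NumPy sketch, not the patented implementation: the class name, layer width and random placeholder weights are all illustrative assumptions (in the method, the weights would be learned from the labelled truth values):

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    """Row-wise softmax with the usual max-subtraction for stability."""
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

class MatchHead:
    """Sketch of the matching-relation judgment head: a small fully-connected
    network over concatenated roi-feature pairs, ending in a 2-way softmax
    (class 1 = match, class 0 = no match)."""

    def __init__(self, roi_dim=1024, hidden=512):
        # Placeholder weights; in training these are fitted with the 0/1 labels.
        self.w1 = rng.standard_normal((2 * roi_dim, hidden)) * 0.01
        self.b1 = np.zeros(hidden)
        self.w2 = rng.standard_normal((hidden, 2)) * 0.01
        self.b2 = np.zeros(2)

    def forward(self, pair_feats):
        """pair_feats: (N, 2 * roi_dim) spliced foreground roi features."""
        h = np.maximum(0.0, pair_feats @ self.w1 + self.b1)   # ReLU layer
        return softmax(h @ self.w2 + self.b2)                 # (N, 2) probabilities
```

Each output row gives the probabilities of "no match" and "match" for one spliced pair, so the predicted matching relation is simply the argmax of the row.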

All of the features disclosed in this specification, or all of the steps in any method or process so disclosed, may be combined in any combination, except combinations of features and/or steps that are mutually exclusive.

The invention has the beneficial effects that:

the method based on the deep neural network can realize end-to-end training and prediction in text detection, achieves strong robustness, no longer needs a rule base for relation extraction, and thus reduces the development and maintenance cost of relation-extraction resources.

Description of the drawings:

fig. 1 is a schematic diagram of the network structure of Faster RCNN used in embodiment 1 of the present invention;

fig. 2 is a schematic diagram of adding a text detection target matching relationship determination module to an RCNN network structure in embodiment 1 of the present invention;

FIG. 3 is a schematic diagram of a process for implementing the roi foreground target relationship determination module;

Detailed description of the embodiments:

the present invention will be described in further detail with reference to the following examples. This should not be understood as limiting the scope of the above-described subject matter of the present invention to the following examples. Any modification made without departing from the spirit and principle of the present invention and equivalent replacement or improvement made by the common knowledge and conventional means in the field shall be included in the protection scope of the present invention.
