Paper text image processing method and device and electronic equipment

Document No.: 1956886 Publication date: 2021-12-10

Reading note: this technology, "Paper text image processing method and device and electronic equipment", was designed by Guo Yanjun (郭彦军), Hao Zhijun (郝志军) and Liu Ziqiang (刘子强) on 2021-09-10. Main content: The invention discloses a paper text image processing method, apparatus and electronic device. The method comprises: acquiring a paper text image; performing text recognition on the paper text image to obtain a positioning box of the text contained in the paper text image and coordinate information of the positioning box; classifying the text in the positioning box according to the coordinate information of the positioning box; and, when the text in the positioning box is reverse text, performing a clearing operation on the recognized reverse text. By recognizing the paper text image and removing the reverse text from it, the converted text becomes clearer and the efficiency of digitizing paper text is improved.

1. A paper text image processing method, characterized by comprising the following steps:

acquiring a paper text image;

performing text recognition on the paper text image to obtain a positioning box of the text contained in the paper text image and coordinate information of the positioning box;

classifying the text in the positioning box according to the coordinate information of the positioning box;

and when the text in the positioning box is reverse text, performing a clearing operation on the recognized reverse text.

2. The method of claim 1, wherein, when the text contains reverse text, after the clearing operation is performed on the recognized reverse text, the method further comprises:

performing high-definition processing on the paper text image subjected to the clearing operation to obtain a high-definition paper text image;

when the paper text image contains a picture, determining the position information of the picture in the high-definition paper text image;

and replacing the picture in the high-definition paper text image with the picture in the paper text image according to the position information.

3. The method of claim 1, wherein before performing text recognition on the paper text image to obtain a positioning box of text included in the paper text image and coordinate information of the positioning box, the method further comprises:

performing deskewing and edge-trimming processing on the paper text image.

4. The method according to claim 1 or 3, wherein the performing text recognition on the paper text image to obtain a positioning box of a text included in the paper text image and coordinate information of the positioning box comprises:

reducing the paper text image to obtain a reduced paper text image;

performing feature extraction on the reduced paper text image to obtain a feature map;

performing sliding-window feature extraction on the feature map to obtain feature subgraphs;

and obtaining a feature vector of the paper text image based on the feature subgraphs, and obtaining the coordinate information of the positioning box of the text according to the feature vector.

5. The method of claim 4, wherein deriving feature vectors for paper text images based on the feature subgraph comprises:

inputting the feature subgraph into a recurrent neural network to obtain the sequence features of the feature subgraph;

inputting the sequence features of the feature subgraph into the recurrent neural network again to perform a restoration operation, obtaining a restored feature subgraph;

and inputting the restored feature subgraph into a fully connected layer to obtain a feature vector containing all the information.

6. The method of claim 1, wherein, when the text in the positioning box is reverse text, performing a clearing operation on the recognized reverse text comprises:

enlarging the positioning box of reverse text that meets a target condition;

and removing the reverse text in the enlarged positioning box.

7. A paper-based text image processing apparatus, comprising:

the acquisition module is used for acquiring a paper text image;

the recognition module is used for performing text recognition on the paper text image to obtain a positioning box of a text contained in the paper text image and coordinate information of the positioning box;

the classification module is used for classifying the text in the positioning box according to the coordinate information of the positioning box;

and the clearing module is used for performing a clearing operation on the recognized reverse text when the text in the positioning box is reverse text.

8. The apparatus of claim 7, further comprising:

the high-definition processing module is used for performing high-definition processing on the paper text image subjected to the clearing operation to obtain a high-definition paper text image;

the positioning module is used for determining the position information of the picture in the high-definition paper text image when the paper text image contains the picture;

and the replacing module is used for replacing the picture in the high-definition paper text image by using the picture in the paper text image according to the position information.

9. An electronic device, comprising: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to cause the at least one processor to perform the steps of the paper text image processing method as claimed in any one of claims 1-6.

10. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the paper text image processing method according to any one of claims 1 to 6.

Technical Field

The invention relates to the technical field of image recognition, in particular to a paper text image processing method and device and electronic equipment.

Background

With the development of the internet, digital storage management can markedly improve the efficiency and security of storage, management, retrieval and the like. When paper documents are digitized, however, their condition varies widely. For example, in documents that have been stored for a long time, the information on the back of a sheet may show through to the front, forming reverse text on the front of the sheet. Because archive digitization is subject to strict standards, a great deal of manpower and time is needed to keep the information of the paper archive complete and clear during digitization, which seriously reduces digitization efficiency.

Disclosure of Invention

Therefore, the technical problem to be solved by the present invention is to overcome the defect that text on the back of a sheet of a paper document shows through to the front and interferes with recognition of the digitized text. To this end, the invention provides a paper text image processing method, apparatus and electronic device.

According to a first aspect, an embodiment of the present invention discloses a paper text image processing method, including: acquiring a paper text image; performing text recognition on the paper text image to obtain a positioning box of the text contained in the paper text image and coordinate information of the positioning box; classifying the text in the positioning box according to the coordinate information of the positioning box; and, when the text in the positioning box is reverse text, performing a clearing operation on the recognized reverse text.

Optionally, when the text contains reverse text, after the clearing operation is performed on the recognized reverse text, the method further includes: performing high-definition processing on the paper text image subjected to the clearing operation to obtain a high-definition paper text image; when the paper text image contains a picture, determining the position information of the picture in the high-definition paper text image; and replacing the picture in the high-definition paper text image with the picture in the paper text image according to the position information.

Optionally, before performing text recognition on the paper text image to obtain a positioning box of the text contained in the paper text image and coordinate information of the positioning box, the method further includes: performing deskewing and edge-trimming processing on the paper text image.

Optionally, performing text recognition on the paper text image to obtain a positioning box of the text contained in the paper text image and coordinate information of the positioning box includes: reducing the paper text image to obtain a reduced paper text image; performing feature extraction on the reduced paper text image to obtain a feature map; performing sliding-window feature extraction on the feature map to obtain feature subgraphs; and obtaining a feature vector of the paper text image based on the feature subgraphs, and obtaining the coordinate information of the positioning box of the text according to the feature vector.

Optionally, obtaining a feature vector of the paper text image based on the feature subgraph includes: inputting the feature subgraph into a recurrent neural network to obtain the sequence features of the feature subgraph; inputting the sequence features of the feature subgraph into the recurrent neural network again to perform a restoration operation, obtaining a restored feature subgraph; and inputting the restored feature subgraph into a fully connected layer to obtain a feature vector containing all the information.

Optionally, when the text in the positioning box is reverse text, performing a clearing operation on the recognized reverse text includes: enlarging the positioning box of reverse text that meets a target condition; and removing the reverse text in the enlarged positioning box.

According to a second aspect, an embodiment of the present invention further discloses a paper text image processing apparatus, including: an acquisition module for acquiring a paper text image; a recognition module for performing text recognition on the paper text image to obtain a positioning box of the text contained in the paper text image and coordinate information of the positioning box; a classification module for classifying the text in the positioning box according to the coordinate information of the positioning box; and a clearing module for performing a clearing operation on the recognized reverse text when the text in the positioning box is reverse text.

Optionally, the apparatus further comprises: the high-definition processing module is used for performing high-definition processing on the paper text image subjected to the clearing operation to obtain a high-definition paper text image; the positioning module is used for determining the position information of the picture in the high-definition paper text image when the paper text image contains the picture; and the replacing module is used for replacing the picture in the high-definition paper text image by using the picture in the paper text image according to the position information.

According to a third aspect, an embodiment of the present invention further discloses an electronic device, including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to cause the at least one processor to perform the steps of the paper text image processing method according to the first aspect or any one of the optional embodiments of the first aspect.

According to a fourth aspect, the embodiments of the present invention further disclose a computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, implements the steps of the paper text image processing method according to the first aspect or any one of the optional embodiments of the first aspect.

The technical scheme of the invention has the following advantages:

The paper text image processing method provided by the invention acquires a paper text image, performs text recognition on it to obtain a positioning box of the text contained in the image and coordinate information of the positioning box, classifies the text in the positioning box according to that coordinate information, and, when the text in the positioning box is reverse text, performs a clearing operation on the recognized reverse text. Removing the reverse text by recognizing the paper text image makes the converted text clearer and improves the efficiency of digitizing paper text.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings can be obtained by those skilled in the art without creative efforts.

Fig. 1 is a flowchart of a specific example of a paper text image processing method in an embodiment of the present invention;

Fig. 2 is a schematic block diagram of a specific example of a paper text image processing apparatus in an embodiment of the present invention;

Fig. 3 is a diagram of a specific example of an electronic device in an embodiment of the present invention.

Detailed Description

The technical solutions of the present invention will be described clearly and completely with reference to the accompanying drawings, and it should be understood that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

In the description of the present invention, it should be noted that the terms "center", "upper", "lower", "left", "right", "vertical", "horizontal", "inner", "outer", etc., indicate orientations or positional relationships based on the orientations or positional relationships shown in the drawings, and are only for convenience of description and simplicity of description, but do not indicate or imply that the device or element being referred to must have a particular orientation, be constructed and operated in a particular orientation, and thus, should not be construed as limiting the present invention. Furthermore, the terms "first," "second," and "third" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance.

In the description of the present invention, it should be noted that, unless otherwise explicitly specified or limited, the terms "mounted," "coupled," and "connected" are to be construed broadly, e.g., as meaning a fixed connection, a removable connection, or an integral connection; a mechanical or an electrical connection; a direct connection between two elements or an indirect connection through an intermediate medium; internal communication between two elements; or a wireless or wired connection. The specific meanings of the above terms in the present invention can be understood by those skilled in the art according to the specific case.

In addition, the technical features involved in the different embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.

An embodiment of the invention discloses a paper text image processing method which, as shown in Fig. 1, comprises the following steps:

Step 101: acquire a paper text image.

For example, the paper text image is the digital electronic document obtained by scanning, photographing or otherwise converting a paper document. The paper document may be a paper archive document, in which case the archive image is obtained by scanning with a scanner set to fixed parameters.

Step 102: perform text recognition on the paper text image to obtain a positioning box of the text contained in the paper text image and coordinate information of the positioning box.

For example, the text information in the paper text image is recognized; the recognized content can be the text contained in the image, and recognizing the text yields the positioning box of the text and the coordinate information of the positioning box. The text in the archive image can be detected and located with a CTPN (Connectionist Text Proposal Network), a network for horizontal text detection: for example, the archive image is reduced, and the text in the reduced archive image is located with a CTPN model to obtain the positioning boxes.
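When detection runs on a reduced copy of the image, the box coordinates it returns refer to that reduced copy. The sketch below maps such boxes back to the original resolution. The source does not say how, or whether, this mapping is performed; the function name `scale_boxes_to_original`, the `(x1, y1, x2, y2)` box layout and the single uniform scale factor are assumptions made for illustration.

```python
from typing import List, Tuple

# Assumed box layout: (x1, y1, x2, y2) in pixel coordinates.
Box = Tuple[int, int, int, int]


def scale_boxes_to_original(boxes: List[Box], scale: float) -> List[Box]:
    """Map boxes found on a reduced image back to the original image.

    `scale` is the reduction factor applied to the original image
    (reduced_size = original_size * scale), so coordinates are divided by it.
    """
    if scale <= 0:
        raise ValueError("scale must be positive")
    return [
        (round(x1 / scale), round(y1 / scale), round(x2 / scale), round(y2 / scale))
        for (x1, y1, x2, y2) in boxes
    ]
```

For instance, a box found on a half-size image (`scale=0.5`) has every coordinate doubled when mapped back.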

Step 103: classify the text in the positioning box according to the coordinate information of the positioning box.

For example, the text in the positioning boxes may include forward text, formed by the text on the front side of the paper archive document, and reverse text, formed by text on the back side showing through to the front side. All forward and reverse text is located, and whether a box contains forward or reverse text can be decided by binary classification.

Step 104: when the text in the positioning box is reverse text, perform a clearing operation on the recognized reverse text.

For example, the reverse text may be removed as follows: the text in each located positioning box is input into the image classification model miniVGG; the boxes classified as reverse text with a confidence exceeding 0.9 are selected and enlarged to obtain the boxes in the paper text image whose text is reverse text; and the text in those boxes is removed.
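The selection by confidence can be sketched as a simple filter. The classifier itself (miniVGG in the text) is not shown; the `ClassifiedBox` record and the function name `select_reverse_boxes` are hypothetical, and only the 0.9 threshold comes from the description.

```python
from dataclasses import dataclass
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # assumed layout: (x1, y1, x2, y2)


@dataclass
class ClassifiedBox:
    """Hypothetical record: a positioning box plus the classifier's output."""
    box: Box
    reverse_confidence: float  # confidence that the box contains reverse text


def select_reverse_boxes(items: List[ClassifiedBox], threshold: float = 0.9) -> List[Box]:
    """Keep only boxes whose reverse-text confidence exceeds the threshold."""
    return [item.box for item in items if item.reverse_confidence > threshold]
```

The comparison is strict, matching "exceeding 0.9": a box scored at exactly 0.9 is left untouched.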

The paper text image processing method provided by the invention acquires a paper text image and performs text recognition on it to obtain a positioning box of the text contained in the image and coordinate information of the positioning box; classifies the text in the positioning box according to that coordinate information; and, when the text in the positioning box is reverse text, performs a clearing operation on the recognized reverse text. Removing the reverse text by recognizing the paper text image makes the converted text clearer and improves the efficiency of digitizing paper text.

As an optional embodiment of the present invention, after step 104, the method further comprises: performing high-definition processing on the paper text image subjected to the clearing operation to obtain a high-definition paper text image; when the paper text image contains a picture, determining the position information of the picture in the high-definition paper text image; and replacing the picture in the high-definition paper text image with the picture in the paper text image according to the position information.

Illustratively, the archive image from which the reverse text has been removed is subjected to high-definition processing so that the digitized archive image is clear and tidy. An AutoEncoder deep learning method can be used for this: the archive image with the reverse text removed is processed into a high-definition image, the result is fine-tuned, and rule-based normalization is applied to the fine-tuned image to obtain the final high-definition archive image. For example, the archive image with the reverse text removed is segmented, each segment is encoded by an encoder and then decoded by a decoder to obtain a high-definition segment, and the high-definition segments are merged into one high-definition archive image. The high-definition processing method is not limited in this embodiment, and those skilled in the art can choose it according to actual needs. Pictures such as portraits in the high-definition archive image are then identified to obtain their coordinate information, and the corresponding portrait picture from the original archive image is pasted into the high-definition archive image at those coordinates, so that the portrait picture stays unchanged and the authenticity of the archive is preserved.
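The final paste-back step can be sketched as copying pixel regions from the original image into the enhanced one. This is a minimal illustration, not the patented implementation: it assumes both images have the same pixel dimensions, represents them as nested lists of gray values, and uses a hypothetical function name `restore_pictures` with boxes whose right and bottom edges are exclusive.

```python
from typing import List, Tuple

Image = List[List[int]]          # rows of gray values (assumed representation)
Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2), x2 and y2 exclusive


def restore_pictures(original: Image, enhanced: Image, picture_boxes: List[Box]) -> Image:
    """Return a copy of `enhanced` with each picture region taken from `original`."""
    same_height = len(original) == len(enhanced)
    if not same_height or any(len(a) != len(b) for a, b in zip(original, enhanced)):
        raise ValueError("original and enhanced images must have the same size")
    result = [row[:] for row in enhanced]  # leave the input untouched
    for x1, y1, x2, y2 in picture_boxes:
        for y in range(y1, y2):
            result[y][x1:x2] = original[y][x1:x2]
    return result
```

Working on a copy keeps the enhanced image available for inspection if the picture boxes turn out to be wrong.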

As an optional embodiment of the present invention, before step 102, the method further includes: performing deskewing and edge-trimming processing on the paper text image.

Illustratively, when the digitized archive image is acquired, it may be skewed because of the position of the scanner, operating errors during scanning and the like. Deskewing and edge trimming are therefore performed before text recognition to obtain the effective part of the archive image, which facilitates recognition and high-definition processing of the archive image and improves recognition accuracy.
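The source does not specify how trimming is done. One simple possibility, sketched below, is to crop the image to the bounding box of pixels that differ from the page background. Deskewing is not covered here. The function name `trim_margins`, the white background value 255 and the tolerance are assumptions.

```python
from typing import List

Image = List[List[int]]  # rows of gray values (assumed representation)


def trim_margins(image: Image, background: int = 255, tolerance: int = 10) -> Image:
    """Crop away border rows and columns that contain only near-background pixels."""

    def is_content(value: int) -> bool:
        return abs(value - background) > tolerance

    rows = [y for y, row in enumerate(image) if any(is_content(v) for v in row)]
    if not rows:
        return [row[:] for row in image]  # blank page: nothing to trim
    cols = [
        x for x in range(len(image[0]))
        if any(is_content(row[x]) for row in image)
    ]
    top, bottom = rows[0], rows[-1]
    left, right = cols[0], cols[-1]
    return [row[left:right + 1] for row in image[top:bottom + 1]]
```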

As an optional embodiment of the present invention, step 102 includes: reducing the paper text image to obtain a reduced paper text image; performing feature extraction on the reduced paper text image to obtain a feature map; performing sliding-window feature extraction on the feature map to obtain feature subgraphs; and obtaining a feature vector of the paper text image based on the feature subgraphs, and obtaining the coordinate information of the positioning box of the text according to the feature vector.

Illustratively, the archive image may be recognized through its features. For example, feature extraction is performed on the archive image with a VGG16 convolutional model, taking the fifth convolutional layer as the output to obtain a feature matrix. The feature extraction includes: reducing the deskewed and trimmed archive image as input to the CTPN text detection algorithm; convolving the reduced archive image with the VGG16 model to obtain a feature map of the archive; and sliding a 3x3 spatial window densely over the feature map with a stride of 1 to obtain a series of 3x3 feature subgraphs.
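The dense 3x3 sliding window with stride 1 can be illustrated on a small single-channel feature map. This is a simplified sketch: a real VGG16 feature map has many channels, and the source does not say whether the border is padded. Here no padding is used, so an H x W map yields (H-2) x (W-2) windows. The function name `sliding_windows` is hypothetical.

```python
from typing import List

FeatureMap = List[List[float]]  # single-channel map, rows of values (assumption)


def sliding_windows(feature_map: FeatureMap, size: int = 3, stride: int = 1) -> List[List[FeatureMap]]:
    """Return a grid of size x size sub-maps taken with the given stride, no padding."""
    height, width = len(feature_map), len(feature_map[0])
    windows = []
    for top in range(0, height - size + 1, stride):
        row_of_windows = []
        for left in range(0, width - size + 1, stride):
            window = [r[left:left + size] for r in feature_map[top:top + size]]
            row_of_windows.append(window)
        windows.append(row_of_windows)
    return windows
```

Keeping the windows in a grid, rather than a flat list, preserves which windows lie on the same row of the feature map, which matters when the rows are later treated as sequences.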

As an optional embodiment of the present invention, obtaining a feature vector of the paper text image based on the feature subgraph includes: inputting the feature subgraph into a recurrent neural network to obtain the sequence features of the feature subgraph; inputting the sequence features of the feature subgraph into the recurrent neural network again to perform a restoration operation, obtaining a restored feature subgraph; and inputting the restored feature subgraph into a fully connected layer to obtain a feature vector containing all the information.

Illustratively, the feature subgraphs are those obtained with the sliding window. Within the CTPN, a BLSTM operation can be applied to them: a reshape operation is performed on the 3x3 feature subgraphs to adjust the shape of their feature matrix, the result is fed to the BLSTM to obtain sequence features, and a reshape operation is performed again on the sequence features to restore 3x3 feature subgraphs that now carry both spatial and sequence features. The restored feature subgraphs are input into the fully connected layer to obtain a one-dimensional feature vector containing all the information. The CTPN then computes coordinate information, category information and edge-adjustment information from the one-dimensional feature vector to obtain the coordinate information of the positioning box.
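The two reshape operations can be sketched with NumPy. The source describes them only loosely, so the tensor layout here is an assumption: features are stored as (N, C, H, W), each row of the feature map is treated as one sequence of W steps for the recurrent layer, and the output is reshaped back to (N, D, H, W). The recurrent layer itself is omitted, and the function names are hypothetical.

```python
import numpy as np


def to_row_sequences(features: np.ndarray) -> np.ndarray:
    """Reshape (N, C, H, W) features into (N*H, W, C): one sequence per map row."""
    n, c, h, w = features.shape
    return features.transpose(0, 2, 3, 1).reshape(n * h, w, c)


def from_row_sequences(sequences: np.ndarray, n: int, h: int) -> np.ndarray:
    """Restore (N*H, W, D) sequence features to the spatial layout (N, D, H, W)."""
    nh, w, d = sequences.shape
    if nh != n * h:
        raise ValueError("first dimension must equal n * h")
    return sequences.reshape(n, h, w, d).transpose(0, 3, 1, 2)
```

In a full model, a bidirectional LSTM would run between the two calls and change the last dimension from C to its own output size D. With no layer in between, the round trip returns the original tensor.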

As an optional embodiment of the present invention, step 104 includes: enlarging the positioning box of reverse text that meets a target condition; and removing the reverse text in the enlarged positioning box.

Illustratively, before the reverse text is removed, its positioning box is enlarged, and the reverse text is removed within the enlarged box. This improves the precision of the removal and avoids mistakenly removing forward text because of inaccurate recognition.
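A minimal sketch of the enlarge-then-clear step follows. The source gives neither the amount of enlargement nor the clearing method; the fixed pixel `margin`, the fill with a white background value, and the function names `enlarge_box` and `clear_box` are all assumptions.

```python
from typing import List, Tuple

Image = List[List[int]]          # rows of gray values (assumed representation)
Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2), x2 and y2 exclusive


def enlarge_box(box: Box, margin: int, width: int, height: int) -> Box:
    """Grow the box by `margin` pixels on every side, clamped to the image bounds."""
    x1, y1, x2, y2 = box
    return (
        max(0, x1 - margin),
        max(0, y1 - margin),
        min(width, x2 + margin),
        min(height, y2 + margin),
    )


def clear_box(image: Image, box: Box, background: int = 255) -> Image:
    """Return a copy of the image with the box region filled with the background value."""
    x1, y1, x2, y2 = box
    result = [row[:] for row in image]
    for y in range(y1, y2):
        result[y][x1:x2] = [background] * (x2 - x1)
    return result
```

Clamping keeps an enlarged box near the page edge from indexing outside the image.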

An embodiment of the invention also discloses a paper text image processing apparatus. As shown in Fig. 2, the apparatus comprises:

The acquisition module 201 is configured to acquire a paper text image. For details, see step 101 above; they are not repeated here.

The recognition module 202 is configured to perform text recognition on the paper text image to obtain a positioning box of the text contained in the paper text image and coordinate information of the positioning box. For details, see step 102 above; they are not repeated here.

The classification module 203 is configured to classify the text in the positioning box according to the coordinate information of the positioning box. For details, see step 103 above; they are not repeated here.

The clearing module 204 is configured to perform a clearing operation on the recognized reverse text when the text in the positioning box is reverse text. For details, see step 104 above; they are not repeated here.

The paper text image processing apparatus provided by the invention acquires a paper text image; performs text recognition on it to obtain a positioning box of the text contained in the image and coordinate information of the positioning box; classifies the text in the positioning box according to that coordinate information; and, when the text in the positioning box is reverse text, performs a clearing operation on the recognized reverse text. Removing the reverse text by recognizing the paper text image makes the converted text clearer and improves the efficiency of digitizing paper text.
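The way the four modules cooperate can be sketched as a small class that wires together four interchangeable callables. The class name `PaperTextImageProcessor`, its field names and the callable signatures are hypothetical; the actual detection, classification and clearing logic is supplied by the caller.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Tuple

Box = Tuple[int, int, int, int]  # assumed layout: (x1, y1, x2, y2)


@dataclass
class PaperTextImageProcessor:
    """Hypothetical wiring of modules 201 to 204; each field is a plug-in function."""
    acquire: Callable[[], Any]             # acquisition module 201
    recognize: Callable[[Any], List[Box]]  # recognition module 202
    classify: Callable[[Any, Box], bool]   # classification module 203: True for reverse text
    clear: Callable[[Any, Box], Any]       # clearing module 204

    def run(self) -> Any:
        """Acquire an image, locate its text boxes and clear those holding reverse text."""
        image = self.acquire()
        for box in self.recognize(image):
            if self.classify(image, box):
                image = self.clear(image, box)
        return image
```

Passing the modules in as functions makes it possible to swap, for example, the classifier without touching the rest of the flow.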

As an optional embodiment of the present invention, the apparatus further comprises: the high-definition processing module is used for performing high-definition processing on the paper text image subjected to the clearing operation to obtain a high-definition paper text image; the positioning module is used for determining the position information of the picture in the high-definition paper text image when the paper text image contains the picture; and the replacing module is used for replacing the picture in the high-definition paper text image by using the picture in the paper text image according to the position information.

An embodiment of the present invention further provides an electronic device, as shown in fig. 3, the electronic device may include a processor 301 and a memory 302, where the processor 301 and the memory 302 may be connected by a bus or in another manner, and fig. 3 takes the connection by the bus as an example.

Processor 301 may be a Central Processing Unit (CPU). The Processor 301 may also be other general purpose processors, Digital Signal Processors (DSPs), Application Specific Integrated Circuits (ASICs), Field Programmable Gate Arrays (FPGAs) or other Programmable logic devices, discrete Gate or transistor logic devices, discrete hardware components, or combinations thereof.

The memory 302, which is a non-transitory computer readable storage medium, may be used to store non-transitory software programs, non-transitory computer executable programs, and modules, such as program instructions/modules corresponding to the paper text image processing method in the embodiments of the present invention. The processor 301 executes various functional applications and data processing of the processor by running non-transitory software programs, instructions and modules stored in the memory 302, namely, implements the paper text image processing method in the above method embodiment.

The memory 302 may include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created by the processor 301, and the like. Further, the memory 302 may include high speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, the memory 302 may optionally include memory located remotely from the processor 301, which may be connected to the processor 301 via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.

The one or more modules are stored in the memory 302 and, when executed by the processor 301, perform a paper text image processing method as in the embodiment shown in fig. 1.

The details of the electronic device may be understood with reference to the corresponding related description and effects in the embodiment shown in fig. 1, and are not described herein again.

It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program, which can be stored in a computer-readable storage medium, and when executed, can include the processes of the embodiments of the methods described above. The storage medium may be a magnetic Disk, an optical Disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a Flash Memory (Flash Memory), a Hard Disk (Hard Disk Drive, abbreviated as HDD), a Solid State Drive (SSD), or the like; the storage medium may also comprise a combination of memories of the kind described above.

Although the embodiments of the present invention have been described in conjunction with the accompanying drawings, those skilled in the art may make various modifications and variations without departing from the spirit and scope of the invention, and such modifications and variations fall within the scope defined by the appended claims.
