Method, apparatus and storage medium for locating each character in a recognized text line

Document No.: 1816894  Publication date: 2021-11-09

Reading note: This technique, "Method, apparatus and storage medium for locating each character in a recognized text line", was designed and created by 张明捷, 汪留安, and 孙俊 on 2020-05-09. Its main content is as follows: A method, an apparatus, and a computer-readable storage medium for locating each character in a recognized line of text are disclosed. The method comprises: step S1: marking a core stroke for each character in the line of text, the mark indicating to which character in the line of text the stroke belongs; step S2: based on the marked strokes, marking unmarked stuck strokes that adhere to the marked strokes and unmarked isolated strokes, wherein an isolated stroke is the only unmarked stroke between two marked strokes; and step S3: merging together the first N pairs or the first M% of pairs of two adjacent strokes that are closest to each other, wherein the two adjacent strokes are not marked to different characters, N is an integer greater than or equal to 1, and M is any value between 0 and 100; steps S2 and S3 are repeated until all strokes are marked to characters in the line of text.

1. A method for locating each character in a recognized line of text, comprising:

step S1: marking a core stroke for each character in the line of text, the mark indicating to which character in the line of text the stroke belongs;

step S2: based on the marked strokes, marking unmarked stuck strokes that adhere to the marked strokes and unmarked isolated strokes, wherein an isolated stroke is the only unmarked stroke between two marked strokes; and

step S3: merging together the first N pairs or the first M% of pairs of two adjacent strokes that are closest to each other, wherein the two adjacent strokes are not marked to different characters, and wherein N is an integer greater than or equal to 1 and M is any value between 0 and 100,

wherein steps S2 and S3 are repeated until all strokes are marked to characters in the line of text.

2. The method of claim 1, wherein said marking of core strokes comprises, in order:

marking strokes whose stroke range contains or is contained in the recognition range of a character in the text line and which overlap the core range of the character as core strokes of the character,

for a character in the text line that has not yet been marked with a core stroke, marking unmarked strokes that overlap the core range of the character as core strokes of the character, and

for a character in the text line that has not yet been marked with a core stroke, marking the unmarked stroke with the largest proportion of overlap with the recognition range of the character as the core stroke of the character.

3. The method of claim 2,

wherein, if a stroke that overlaps the core range of the character is simultaneously marked as a core stroke of a plurality of characters, the stroke is not marked as a core stroke of any character,

and wherein, if a stroke whose stroke range contains or is contained in the recognition range of the character and which overlaps the core range of the character, or a stroke having the largest overlap ratio with the recognition range of the character, is simultaneously marked as a core stroke of a plurality of characters, the stroke is marked as a core stroke of the leftmost character.

4. The method of claim 1, wherein marking the unmarked stuck strokes and the unmarked isolated strokes based on the marked strokes comprises:

merging an unmarked stuck stroke into the marked strokes to which it adheres;

merging an unmarked isolated stroke into the marked stroke closest to it; and

repeating the above steps until all the stuck strokes and the isolated strokes are marked.

5. The method according to claim 1, wherein the step S3 further comprises:

merging together all strokes having the same mark obtained in steps S1 and S2;

calculating the distance between two adjacent strokes; and

merging together the first N pairs or the first M% of pairs of two adjacent strokes that are closest to each other.

6. The method of claim 4 or 5, wherein the distance between two adjacent strokes is based on the difference between the maximum of the leftmost positions of the two strokes and the minimum of their rightmost positions.

7. The method of any of claims 1-5, wherein:

if the absolute value of the difference between the distances from a stroke to its two adjacent strokes is less than a first threshold, the stroke is marked to the character whose recognition range overlaps most with the range of the stroke; and

if the absolute value of the difference between the distances from a stroke to its two adjacent strokes is greater than said first threshold, the stroke is marked to the character closest to it.

8. The method of claim 7, further comprising modifying an end position obtained by a recognition engine for each character in the recognized line of text prior to marking the core stroke, the modifying comprising:

in the case where the character is different from its next character:

if the character is present in the first P candidates of the recognition result of the next time stamp after the end position of the character and the corresponding confidence is greater than a second threshold, re-designating that next time stamp as the end position of the character,

iteratively performing the re-designating step until the time stamp immediately before the end position of the next character is reached; and

in the case that the character is the same as its next character:

if the character is present in the first Q candidates of each time stamp between the two identical character end positions, the end position of the character is not changed, and

if the character does not exist in the first Q candidates of each time stamp between the two identical character end positions, the same processing as the case where the character is different from the next character is performed, where P and Q are positive integers greater than 1 and P is smaller than Q.

9. An apparatus for locating each character in a recognized line of text, comprising:

a first marking means configured to mark a core stroke for each character in the line of text, the mark indicating to which character in the line of text the stroke belongs;

a second marking means configured to mark, based on the marked strokes, unmarked stuck strokes that adhere to the marked strokes and unmarked isolated strokes, wherein an isolated stroke is the only unmarked stroke between two marked strokes; and

a merging means configured to merge together the first N pairs or the first M% of pairs of two adjacent strokes that are closest to each other, wherein the two adjacent strokes are not marked to different characters, and wherein N is an integer greater than or equal to 1 and M is any value between 0 and 100,

wherein all strokes are marked to characters in the text line after processing by the second marking means and the merging means.

10. A computer-readable storage medium storing a program executable by a processor to:

operation S1: marking a core stroke for each character in the line of text, the mark indicating to which character in the line of text the stroke belongs;

operation S2: based on the marked strokes, marking unmarked stuck strokes that adhere to the marked strokes and unmarked isolated strokes, wherein an isolated stroke is the only unmarked stroke between two marked strokes; and

operation S3: merging together the first N pairs or the first M% of pairs of two adjacent strokes that are closest to each other, wherein the two adjacent strokes are not marked to different characters, and wherein N is an integer greater than or equal to 1 and M is any value between 0 and 100,

wherein operations S2 and S3 are repeated until all strokes are marked to characters in the line of text.

Technical Field

The present disclosure relates to the field of text recognition, and in particular to locating each character in recognized text.

Background

Single-character localization is a long-standing, fundamental, and challenging research problem in the fields of Optical Character Recognition (OCR) and Scene Text Recognition (STR). A good single-character localization algorithm can improve the accuracy of character recognition, labeling, and classification of text-line content.

To date, a number of algorithms have been proposed for character localization. These algorithms fall mainly into four types: edge-based methods, texture-based methods, connected-component (CC) based methods, and deep-learning-based methods. Edge-based methods focus mainly on extracting text with high contrast between the text and the background; however, they are affected by shadows and highlights. Texture-based methods use texture analysis algorithms to locate the distribution region of the text, commonly via spatial variance, Gabor filtering, wavelet transforms, and the like. CC-based methods perform connected-component analysis on the text-line image to extract possible characters, and then assemble these character regions into text segments. The methods above mainly focus on extracting the distribution region of characters from a natural-scene image, but cannot directly locate the specific region of each character. In recent years, owing to the rapid development of deep learning, neural networks such as Convolutional Neural Networks (CNNs) have been used increasingly often to recognize and locate characters in lines of text.

Further, the Convolutional Recurrent Neural Network (CRNN) is a newer and very efficient recognition algorithm; it is an end-to-end general framework well suited to sequences of arbitrary length. CRNN is therefore widely used in the field of OCR because of its high recognition accuracy. However, CRNN's character-localization performance is poor and often contains serious errors, so further improvement is needed.

Disclosure of Invention

The following presents a simplified summary of the disclosure in order to provide a basic understanding of some aspects of the disclosure. It should be understood that this summary is not an exhaustive overview of the disclosure. It is not intended to identify key or critical elements of the disclosure or to delineate the scope of the disclosure. Its sole purpose is to present some concepts in a simplified form as a prelude to the more detailed description that is discussed later.

According to one aspect of the invention, a method is provided for locating each character in a recognized line of text. The method comprises the following steps: step S1: marking a core stroke for each character in the line of text, the mark indicating to which character in the line of text the stroke belongs; step S2: based on the marked strokes, marking unmarked stuck strokes that adhere to the marked strokes and unmarked isolated strokes, wherein an isolated stroke is the only unmarked stroke between two marked strokes; and step S3: merging together the first N pairs or the first M% of pairs of two adjacent strokes that are closest to each other, wherein the two adjacent strokes are not marked to different characters, and wherein N is an integer greater than or equal to 1 and M is any value between 0 and 100; steps S2 and S3 are repeated until all strokes are marked to characters in the line of text.

According to another aspect of the present invention, there is provided an apparatus for locating each character in a recognized line of text, comprising: a first marking means configured to mark a core stroke for each character in the line of text, the mark indicating to which character in the line of text the stroke belongs; a second marking means configured to mark, based on the marked strokes, unmarked stuck strokes that adhere to the marked strokes and unmarked isolated strokes, wherein an isolated stroke is the only unmarked stroke between two marked strokes; and a merging means configured to merge together the first N pairs or the first M% of pairs of two adjacent strokes that are closest to each other, wherein the two adjacent strokes are not marked to different characters, and wherein N is an integer greater than or equal to 1 and M is any value between 0 and 100, wherein all strokes are marked to characters in the text line after processing by the second marking means and the merging means.

According to other aspects of the invention, corresponding computer program code, computer readable storage medium and computer program product are also provided.

The method and apparatus for locating each character in a recognized text line can improve the localization precision of individual characters, which facilitates combining them with other recognition engines to further improve text recognition accuracy.

These and other advantages of the present invention will become more apparent from the following detailed description of the preferred embodiments of the present invention, taken in conjunction with the accompanying drawings.

Drawings

To further clarify the above and other advantages and features of the present disclosure, a more particular description of embodiments of the present disclosure will be rendered by reference to the appended drawings, which are incorporated in and form a part of this specification together with the detailed description that follows. Elements having the same function and structure are denoted by the same reference numerals. It is appreciated that these drawings depict only typical examples of the disclosure and are therefore not to be considered limiting of its scope. In the drawings:

FIG. 1 illustrates a flow diagram of a method of correcting an end position of each character in a recognized line of text prior to marking core strokes, according to one embodiment;

FIG. 2a shows the softmax values of the candidates at each timestamp obtained by CRNN recognition;

FIG. 2b shows the CRNN recognition and over-segmentation results;

FIG. 2c shows the new end position of each character after correction;

FIG. 3 illustrates a method for locating each character in a recognized line of text, according to one embodiment of the invention;

FIG. 4 illustrates a flow diagram for marking core strokes in accordance with one embodiment;

FIG. 5 illustrates a flow diagram of marking sticky strokes and isolated strokes in accordance with one embodiment;

FIG. 6 illustrates a flow diagram for merging the first N pairs or the first M% of pairs of two adjacent strokes that are closest to each other, according to one embodiment;

FIG. 7a exemplarily shows a result of marking core strokes;

FIG. 7b exemplarily shows the result of stuck-stroke classification;

FIG. 7c exemplarily shows the result of isolated-stroke classification;

FIG. 7d exemplarily shows the result of merging the first N pairs or the first M% of pairs of two adjacent strokes that are closest to each other;

FIG. 7e exemplarily shows the result of all strokes being marked to recognized characters;

FIG. 8 is a block diagram of an apparatus for locating each character in a recognized line of text, according to one embodiment;

FIG. 9 is a block diagram of an exemplary architecture of a general purpose personal computer in which methods and/or apparatus according to embodiments of the invention may be implemented.

Detailed Description

Exemplary embodiments of the present disclosure will be described hereinafter with reference to the accompanying drawings. In the interest of clarity and conciseness, not all features of an actual implementation are described in the specification. It will of course be appreciated that in the development of any such actual embodiment, numerous implementation-specific decisions must be made to achieve the developers' specific goals, such as compliance with system-related and business-related constraints, which will vary from one implementation to another. Moreover, it will be appreciated that such a development effort might be complex and time-consuming, but would nevertheless be a routine undertaking for those of ordinary skill in the art having the benefit of this disclosure.

Here, it should be further noted that, in order to avoid obscuring the present disclosure with unnecessary details, only the device structures and/or processing steps closely related to the scheme according to the present disclosure are shown in the drawings, and other details not so relevant to the present disclosure are omitted.

As described above, the existing text recognition methods are not satisfactory in terms of character-localization performance.

To overcome the defects of the prior art, the invention provides a novel single-character localization algorithm based on CRNN and an over-segmentation algorithm. First, a handwritten single line of text is recognized using CRNN and the position of each character is located. This localization result is relatively coarse and provides only a rough range for each character. Meanwhile, the image is over-segmented, i.e., cut into individual strokes. Then, the end position of each character given by the CRNN is corrected using the softmax scores generated during CRNN recognition, so as to improve the localization precision of the CRNN. Next, each recognized character is initialized, and the core strokes of each character in the text line are found and marked, thereby matching each character in the line to at least one stroke as accurately as possible. Finally, based on the core strokes and some prior knowledge, each unmarked stroke is marked by a suitable algorithm so that every stroke is assigned to a character recognized by the CRNN, thereby obtaining the strokes corresponding to each character, i.e., the distribution area of each character.

How the end position of each character in the recognized text line, obtained using the CRNN and the over-segmentation algorithm, is corrected before marking the core strokes is first described with reference to FIG. 1 and FIGS. 2a to 2c.

FIG. 1 illustrates a flow diagram of a method for correcting the end position of each character in a recognized line of text prior to marking core strokes, according to one embodiment.

First, in step 101, an image text line is recognized using the CRNN algorithm, and the characters in the recognized text line are segmented into strokes using the over-segmentation algorithm.

Specifically, in the present embodiment, as shown in FIGS. 2a and 2b, the image text line is first recognized using the CRNN algorithm to obtain the content of the text line, the possible candidates for each character with their corresponding confidences, and the ending timestamp position of each character. FIG. 2a shows the top ten candidates predicted at the 12th to 14th timestamps and their corresponding confidence values. The vertical bars in FIG. 2b show the ending timestamp position of each character given by CRNN, i.e., the position with the highest probability among several adjacent timestamps. For example, in FIG. 2b, the probability of the character "mountain" is highest at the 12th timestamp, so the end position of the character "mountain" is the 12th timestamp. As can be seen from FIG. 2b, the CRNN provides only a coarse and less trustworthy end position for each character.

Next, each character recognized by the CRNN is segmented into discrete strokes using the over-segmentation algorithm; the result is shown in the boxes of FIG. 2b.

As can be seen from FIG. 2b, the character ending positions provided by CRNN are not accurate. Preferably, the CRNN-provided end positions may be slightly modified in advance to correct some relatively obvious and simple errors.

Next, in step 102, a character in the text line recognized by the CRNN is selected as the current character. And proceeds to step 103 to determine if the current character is the same as its next character.

In the case where the current character is different from its next character (i.e., "no" in step 103), the process proceeds to step 1031. In step 1031, the next timestamp after the end position of the current character is found, and, for the recognition result of this timestamp, it is determined whether the current character is present in its top P candidates with a corresponding confidence greater than the threshold TH2.

If so, then in step 1033, the next timestamp is re-designated as the end position of the current character. The process then proceeds to step 1036. In step 1036, it is determined whether the timestamp following the new end position is the end position of the next character (as recognized by CRNN). If yes, the end position of the current character has been corrected, and the process continues to step 104. If not, the process returns to step 1031 to continue searching for the end position of the current character.

If the result of step 1031 is negative, the process proceeds to step 1035, and the end position of the current character is not corrected. The process then proceeds to step 104.

In step 104, it is determined whether the current character is the last of the characters to be corrected. If so, the process ends, and the end positions of all the characters to be corrected have each been corrected once. If not, the process proceeds to step 105. In step 105, the next of the characters to be corrected is selected as the new current character, and the process returns to step 103; the iterative judgment is performed according to the same logic until the end positions of all the characters to be corrected have been corrected.

Preferably, P is 2 and TH2 is 0.01. It should be understood, however, that the values of P and TH2 are not limited thereto, but may be set as desired.
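The correction loop for the case where the current character differs from its next character (steps 1031 to 1036) can be sketched as follows. The per-timestamp data layout and all names are illustrative assumptions, not taken from this disclosure:

```python
# A minimal sketch of the end-position correction (steps 1031-1036) for
# the case where the current character differs from its next character.
# The candidate layout and function names are illustrative assumptions.

def correct_end_position(char, end_ts, next_char_end_ts, candidates,
                         P=2, TH2=0.01):
    """Extend the end timestamp of `char` while the following timestamps
    still contain `char` among their top-P candidates with confidence
    greater than TH2.

    candidates[t] is a list of (character, confidence) pairs sorted by
    descending confidence, as a CRNN softmax would produce per timestamp.
    """
    ts = end_ts
    # Stop once the timestamp before the next character's end position
    # is reached (cf. step 1036).
    while ts + 1 < next_char_end_ts:
        top_p = candidates[ts + 1][:P]
        if any(c == char and conf > TH2 for c, conf in top_p):
            ts += 1   # re-designate the end position (cf. step 1033)
        else:
            break     # leave the end position unchanged (cf. step 1035)
    return ts
```

In the "mountain" example discussed below, the character ends at timestamp 12, appears as the 2nd candidate of timestamp 13 with confidence 0.02 > 0.01, and falls below the threshold at timestamp 14, so the corrected end position is 13.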

For example, as shown in FIG. 2a, the 12th timestamp has the highest confidence for the character "mountain", and is therefore the ending location of the character "mountain" as recognized by CRNN. However, as can be seen from FIG. 2b, this end position is clearly inaccurate, since part of the strokes of the character "mountain" are not included. In fact, the 13th timestamp still belongs to the character "mountain", while the 14th timestamp does not.

To correct the ending position of the character "mountain", a search is performed sequentially backwards over the timestamps, starting from its ending timestamp 12, according to the CRNN recognition result (as shown in FIG. 2a). The search finds that the character "mountain" is the 2nd candidate at the 13th timestamp with a confidence of 0.02, which is above the threshold of 0.01, so timestamp 13 is considered to still belong to the character "mountain". Further, the confidence of the character "mountain" at timestamp 14 is below the threshold of 0.01. Therefore, timestamp 13, rather than timestamp 14, is marked as the end position of the character "mountain"; that is, the end position of the character "mountain" is extended backward to the 13th timestamp. The corrected end position is shown in FIG. 2c.

In the case where the current character is the same as its next character (i.e., "yes" in step 103), the process proceeds to step 1032. In step 1032, for each timestamp between the end positions of the two identical characters, it is determined whether the current character is present in the top Q candidates of that timestamp. If yes, the process proceeds to step 104, where it is determined whether the current character is the last of the characters to be corrected. If so, the process ends, and the end positions of all the characters to be corrected have each been corrected once. If not, the process proceeds to step 105. In step 105, the next of the characters to be corrected is selected as the new current character, and the process returns to step 103; the iterative judgment is performed according to the same logic until the end positions of all the characters to be corrected have been corrected.

It should be noted that, to prevent repeated characters from confusing this correction method, the text recognized by the CRNN needs to be checked in advance. If two consecutive identical characters appear at some position, a confusability check is needed, i.e., step 1032: if the character appears in the top Q candidates of all the timestamps between the end positions of the two characters, the information is considered ambiguous and this correction method cannot correct it accurately, so the end position of the first of the two repeated characters (i.e., the current character) is not corrected.

Preferably, Q is taken to be 5; P and Q are positive integers greater than 1, with P less than Q. However, it is to be understood that the values of P and Q are not limited thereto, but may be set as needed.
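The repeated-character check of step 1032 can be sketched as follows; the data layout mirrors the sketch style used above and is an illustrative assumption:

```python
# Sketch of the confusability check (step 1032) for two consecutive
# identical characters. candidates[t] is a list of (character, confidence)
# pairs sorted by descending confidence; layout is illustrative.

def repeated_char_confusable(char, end_ts_first, end_ts_second,
                             candidates, Q=5):
    """Return True if `char` appears in the top-Q candidates of every
    timestamp strictly between the end positions of the two identical
    characters; in that case the boundary is considered ambiguous and
    the first character's end position is left unchanged."""
    for ts in range(end_ts_first + 1, end_ts_second):
        top_q = [c for c, _ in candidates[ts][:Q]]
        if char not in top_q:
            return False
    return True
```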

With this correction method, the end positions of the characters in the text line recognized by the CRNN can be corrected, thereby improving the localization performance of the CRNN.

A method 300 for locating each character in a recognized line of text in accordance with an embodiment of the present invention is described below in conjunction with fig. 3-7 e.

As shown in fig. 3, first, in step S1, each character in the text line is marked with a core stroke indicating to which character in the text line the stroke belongs.

Preferably, in this embodiment, the text line is a text line recognized by CRNN. However, it should be understood that the invention is not so limited; the text line may also be one recognized by any other existing recognition engine, with or without position correction.

In order to establish a correspondence between each over-segmented stroke and the characters recognized by the CRNN, at least one core stroke needs to be found and marked for each recognized character. A core stroke is the stroke most likely to belong to a character among all of that character's strokes. Since the selection of core strokes can significantly affect the performance of the subsequent steps, only those strokes most likely to belong to a character are selected as core strokes in step S1.

Based on the ending position of each character given by CRNN, an approximate distribution range of each character, referred to herein as the recognition range of the character, can be obtained. While this range is often judged less accurately near its borders, it is often judged more accurately in some core areas (e.g., areas to the middle right). Moreover, the larger the overlap between a stroke and the recognition range of a character, the more likely the stroke belongs to that character.

FIG. 4 illustrates a flow diagram for marking core strokes (i.e., step S1), according to one embodiment.

In step 401, strokes whose stroke range contains or is contained in the recognition range of a character in a text line and which overlap the core range of the character are marked as core strokes of the character.

Specifically, in this embodiment, if a stroke overlaps the core region of a character and the stroke is completely contained within the CRNN recognition range of the character or, conversely, the distribution range of the stroke completely covers the CRNN recognition range of the character, the stroke is marked as the core stroke of the character.

Preferably, if the CRNN recognition range of a character is normalized to 0 to 1 in the lateral direction, the region from 0.4 to 0.8 within the recognition range is generally regarded as the core region of the character. It should be understood that this interval is only an example, and the present invention is not limited thereto.
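The step 401 test (containment plus core-region overlap) can be sketched as follows. The interval representation of stroke and character ranges is an assumption for illustration:

```python
# Sketch of the step 401 core-stroke test. Ranges are horizontal pixel
# intervals (left, right); this representation is an illustrative
# assumption, not specified by the disclosure.

def overlaps(a, b):
    """True if horizontal intervals a=(l1, r1) and b=(l2, r2) overlap."""
    return max(a[0], b[0]) <= min(a[1], b[1])

def is_core_stroke(stroke, char_range, core=(0.4, 0.8)):
    """The stroke range contains, or is contained in, the character's
    recognition range AND overlaps the character's core region (here
    the 0.4-0.8 sub-interval of the recognition range)."""
    sl, sr = stroke
    cl, cr = char_range
    contained = (cl <= sl and sr <= cr) or (sl <= cl and cr <= sr)
    width = cr - cl
    core_range = (cl + core[0] * width, cl + core[1] * width)
    return contained and overlaps(stroke, core_range)
```

For a character with recognition range (0, 10), a stroke (5, 7) qualifies, a stroke (0, 3) is contained but misses the core region (4, 8), and a stroke (-2, 12) covering the whole recognition range qualifies as well.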

Next, at step 402, it is determined whether there are characters in the line of text that have not yet been marked with a core stroke. If so, then in step 403, the unlabeled strokes that overlap the core range of the character are labeled as the core strokes of the character. If not, then proceed to step S2, which will be described in detail below.

Specifically, in this embodiment, if, after step 401 is performed, there are characters in the text line that have no core stroke, the constraint needs to be slightly relaxed and a search performed among the remaining unmarked strokes, so as to find the most probable stroke to serve as the core stroke of each such character. Thus, in step 403, the unmarked strokes that overlap the core range of the character are marked as core strokes of the character.

Next, in step 404, it is determined whether there are characters in the line of text that have not yet been marked with a core stroke. If so, then in step 405, the unmarked stroke having the greatest proportion of overlap with the recognition range of the character is marked as the core stroke of the character. If not, then proceed to step S2, which will be described in detail below.

It should be noted that if, after step 405, a character without a marked core stroke still exists in the text line, the CRNN is deemed to have made a severe recognition or localization error for that character; the character is therefore discarded and not located.

It should also be noted that, in step 403, if a stroke is simultaneously marked as a core stroke of multiple characters, the stroke is not marked as a core stroke of any character. In contrast, in steps 401 and 405, if a stroke is simultaneously marked as a core stroke of multiple characters, the stroke is marked as the core stroke of the leftmost character.

FIG. 7a shows an example of a text line marked with core strokes by the method shown in FIG. 4, where dark and light colors represent core strokes corresponding to different characters, and gray represents unmarked strokes.

Returning to FIG. 3, after a core stroke has been marked for each character in the line of text, processing continues with step S2. In step S2, based on the marked strokes, unmarked stuck strokes that adhere to the marked strokes and unmarked isolated strokes are marked, wherein an isolated stroke is the only unmarked stroke between two marked strokes.

Specifically, in the present embodiment, after each character has been marked with core strokes as far as possible, the other unmarked strokes need to be marked based on these marked strokes. Strokes having the same mark can be considered as a whole, i.e., a set of marked strokes of one category.

FIG. 5 illustrates a flow diagram of marking sticky strokes and isolated strokes, according to one embodiment.

As shown in FIG. 5, in step 501, if an unmarked stroke adheres to a marked stroke, the unmarked stroke is merged into the set of that marked stroke.

It should be noted that, in some special cases, a stroke may adhere to two or more sets of marked strokes. In that case, since its category cannot be determined, classification and merging are postponed.

FIG. 7b shows an example of stuck-stroke classification. As shown in FIG. 7b, all the stuck strokes are assigned to the recognized characters to which they belong.

Next, in step 502, the unlabeled isolated stroke is merged to the labeled stroke that is closest to it.

Specifically, in the present embodiment, an isolated stroke refers to only one unlabeled stroke between two labeled strokes, and this unlabeled stroke is referred to as an isolated stroke, and obviously, when an isolated stroke exists, this stroke belongs to either the set to which the left-side stroke belongs or the set to which the right-side stroke belongs. Thus, isolated strokes can be categorized by the following distance formula:

D=max(s1,c1)-min(s2,c2)+1 (1)

wherein s is1,s2Pixel coordinates representing the left and right edges of an isolated stroke, respectively, and c1,c2Pixel coordinates representing the left and right edges of the object from which the distance is calculated.

It should be noted that the object for calculating distance may be another marked stroke, or a set of marked strokes, or the recognition range of characters provided by CRNN.

It should also be noted that when D >1, no overlapping region between the two objects is indicated; when D is less than or equal to 1, the overlapping area exists between the two, and the width of the overlapping area is 2-D pixels.

It should be understood that, because there is a certain randomness when writing, when the distance between an isolated stroke and two adjacent strokes is very close, i.e. the absolute value of the difference between the two distances is smaller than the threshold TH1, the attribution of the isolated stroke cannot be judged only according to the distance information, and new information needs to be introduced. In this case, the method according to the present embodiment calculates the overlap width of the isolated stroke and the recognition range of two adjacent characters according to formula (1), and assigns the isolated stroke to the character class having the largest overlap width.

Preferably, the threshold TH1 is 8 pixels. It should be understood, however, that the present invention is not so limited, and the value of TH1 may be set as desired.

Finally, in step 503, it is determined whether all the stuck strokes and isolated strokes are marked. If so, it proceeds to step S3, which will be described below. If not, return to step 501.

In particular, in the present embodiment, since new stuck or isolated strokes may be generated after sorting the stuck and isolated strokes, steps 501 and 502 need to be iterated until no new stuck or isolated strokes are generated.

FIG. 7c illustrates an example of isolated stroke classification. In this example, the stroke "1" located in the middle right of the character "rest" is closer to the character "rest" and is therefore classified into the category of the character "rest".

Returning to fig. 3, the process proceeds to step S3. In step S3, two adjacent strokes that are the first N pairs or the first M% pairs closest to each other are merged together, wherein the two adjacent strokes are not labeled to different characters, and wherein N is an integer greater than or equal to 1 and M is any value between 0 and 100.

Specifically, in the present embodiment, after steps S1 and S2, there may still be some strokes that are not classified, and they can only be classified using a greedy strategy.

FIG. 6 illustrates a flow diagram for merging together two adjacent strokes that are the first N pairs or the first M% pairs that are closest apart from each other, according to one embodiment.

First, in step 601, all the strokes with the same mark obtained through steps S1 and S2 are merged together and regarded as one stroke.

Next, in step 602, a distance between two adjacent strokes is calculated. Specifically, in the present embodiment, the distance between all adjacent strokes may be calculated according to formula (1), for example.

Next, in step 603, the first N pairs or the first M% pairs of two adjacent strokes that are closest to each other are merged together, wherein the two adjacent strokes are not labeled to different characters, and wherein N is an integer greater than or equal to 1 and M is any value between 0 and 100. Specifically, in this embodiment, all the distances calculated in step 602 are sorted according to size, the smallest top N or M% distances are selected, and the adjacent stroke pairs corresponding to the distances are combined into one class, that is, belong to the same character.

Note that if the selected adjacent stroke pair has been marked to two different CRNN recognition characters, then this distance is skipped and no selection is made. Furthermore, the strategy in categorizing strokes for merging is similar to the isolated stroke categorization strategy, namely: firstly, merging strokes to adjacent strokes which are closer to each other according to the distance between the strokes; if the stroke is close to the two distances of the left and right strokes, the stroke is classified to the character with the larger overlap.

It should also be noted that in calculating the distance, it is possible to merge several unlabeled strokes with each other, resulting in a new category that is not present in the character recognized by the CRNN. Therefore, this new class is only an intermediate variable in the fusion merge. After multiple iterations in the whole process, the newly generated temporary categories are gradually fused and merged into the categories of the recognition characters generated by the CRNN, and the categories of the final result only correspond to the recognition characters generated by the CRNN.

FIG. 7d shows an example of merging the top N pairs or the top M% pairs of the most recent strokes. In this example, N is chosen to be 1, and stroke "1" is closest in distance to stroke set "け", so they are labeled here as the same category and merged.

Returning to FIG. 3, finally, in step S4, it is determined whether there are still unlabeled strokes. If so, the process returns to step S2. If not, the method 300 ends.

Specifically, in this embodiment, it is determined whether there are any more strokes that have not been marked onto the character recognized by the CRNN. If so, go back to step S2. After many iterations, all strokes are classified and the newly generated provisional category is also merged to disappear. Thus, all strokes are categorically labeled onto the CRNN-recognized character, in other words, the position information corresponding to all recognized characters is accurately located.

FIG. 7e shows an example where all strokes are marked to a recognized character. As shown in FIG. 7e, all strokes corresponding to the recognized character are found.

The method 300 according to this embodiment assigns each stroke to their corresponding recognized character with high accuracy based on the CRNN's ending position and the spatial distribution between the strokes, thereby achieving stroke-level positioning of each character. The positioning result provides possibility for mutually fusing the recognition results among a plurality of recognition engines.

The recognition accuracy as shown in table 1 below was obtained by testing the method 300 according to the present embodiment using a plurality of handwritten japanese data sets (AIBU, questonaire, gene, and PFU). As can be seen from table 1, the average accuracy of the method 300 is about 95%, thus providing a very high degree of accuracy.

Data set (number of images) AIBU(897) Questionnaire(319) Cogent(26) PFU(25) In total (1267)
Accuracy of 95.0% 94.2% 93.5% 94.9% 95.0%

TABLE 1

The methods discussed above may be implemented entirely by computer-executable programs, or may be implemented partially or entirely using hardware and/or firmware. When implemented in hardware and/or firmware, or when a computer-executable program is loaded into a hardware device capable of running the program, an apparatus for locating each character in the identified lines of text as described below is implemented. In the following, a summary of these devices is given without repeating some details that have been discussed above, but it should be noted that, although these devices may perform the methods described in the foregoing, the methods do not necessarily employ or be performed by those components of the described devices.

Fig. 8 shows an apparatus 800 for locating each character in a recognized text line according to an embodiment, comprising a first marking means 801, a second marking means 802 and a merging means 803. The first marking means 801 is used to mark each character in the text line with a core stroke, which indicates to which character in the text line the stroke belongs. The second marking means 802 is configured to mark, based on the marked strokes, unmarked stuck strokes and unmarked isolated strokes that are stuck to the marked strokes, wherein an isolated stroke refers to the only one unmarked stroke between two marked strokes. The merging means 803 is configured to merge together two adjacent strokes of the first N pairs or the first M% pairs that are closest to each other, wherein the two adjacent strokes are not labeled to different characters, and wherein N is an integer greater than or equal to 1 and M is any value between 0 and 100. After processing by the second marking means and the merging means, all strokes are marked to characters in the text line.

The apparatus 800 for locating each character in a recognized line of text shown in fig. 8 corresponds to the method 300 for locating each character in a recognized line of text shown in fig. 3. Accordingly, the relevant details of the devices in the apparatus 800 for locating each character in the recognized text line have been given in detail in the description of the method 300 for locating each character in the recognized text line of fig. 3, and will not be described again here.

Each constituent module and unit in the above-described apparatus may be configured by software, firmware, hardware, or a combination thereof. The specific means or manner in which the configuration can be used is well known to those skilled in the art and will not be described further herein. In the case of implementation by software or firmware, a program constituting the software is installed from a storage medium or a network to a computer (for example, a general-purpose computer 900 shown in fig. 9) having a dedicated hardware configuration, and the computer can execute various functions and the like when various programs are installed.

FIG. 9 is a block diagram of an exemplary architecture of a general purpose personal computer in which methods and/or apparatus according to embodiments of the invention may be implemented. As shown in fig. 9, a Central Processing Unit (CPU)901 performs various processes in accordance with a program stored in a Read Only Memory (ROM)902 or a program loaded from a storage section 908 to a Random Access Memory (RAM) 903. In the RAM 903, data necessary when the CPU901 executes various processes and the like is also stored as necessary. The CPU901, ROM 902, and RAM 903 are connected to each other via a bus 904. An input/output interface 905 is also connected to bus 904.

The following components are connected to the input/output interface 905: an input section 906 (including a keyboard, a mouse, and the like), an output section 907 (including a display such as a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), and the like, and a speaker, and the like), a storage section 908 (including a hard disk, and the like), a communication section 909 (including a network interface card such as a LAN card, a modem, and the like). The communication section 909 performs communication processing via a network such as the internet. The driver 910 may also be connected to the input/output interface 905 as necessary. A removable medium 911 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 910 as necessary, so that a computer program read out therefrom is mounted in the storage section 908 as necessary.

In the case where the series of processes described above is implemented by software, a program constituting the software is installed from a network such as the internet or a storage medium such as the removable medium 911.

It will be understood by those skilled in the art that such a storage medium is not limited to the removable medium 911 shown in fig. 9 in which the program is stored, distributed separately from the apparatus to provide the program to the user. Examples of the removable medium 911 include a magnetic disk (including a floppy disk (registered trademark)), an optical disk (including a compact disk read only memory (CD-ROM) and a Digital Versatile Disk (DVD)), a magneto-optical disk (including a Mini Disk (MD) (registered trademark)), and a semiconductor memory. Alternatively, the storage medium may be the ROM 902, a hard disk included in the storage section 908, or the like, in which programs are stored, and which is distributed to users together with the device including them.

The invention also provides a corresponding computer program code and a computer program product with a machine readable instruction code stored. Which when read and executed by a machine may perform the method 300 in accordance with an embodiment of the present invention as described above.

Accordingly, storage media configured to carry the above-described program product having machine-readable instruction code stored thereon are also included in the present disclosure. Including, but not limited to, floppy disks, optical disks, magneto-optical disks, memory cards, memory sticks, and the like.

Through the above description, the embodiments of the present disclosure provide the following technical solutions, but are not limited thereto.

Supplementary note 1. a method for locating each character in a recognized line of text, comprising:

step S1: marking each character in the line of text with a core stroke, the marking indicating to which character in the line of text the stroke belongs;

step S2: based on the marked strokes, marking unmarked stuck strokes and unmarked isolated strokes which are stuck with the marked strokes, wherein the isolated strokes refer to only one unmarked stroke between the two marked strokes; and

step S3: merging together two adjacent strokes that are the first N or M% closest apart from each other, wherein the two adjacent strokes are not labeled to different characters, and wherein N is an integer greater than or equal to 1 and M is any value between 0 and 100,

steps S2 and S3 are repeated until all strokes are marked to a character in the line of text.

Supplementary notes 2. the method according to supplementary notes 1, wherein said marking core strokes comprises in sequence:

marking strokes whose stroke range contains or is contained in the recognition range of the character in the text line and which overlap with the core range of the character as core strokes of the character,

for a character in the text line that has not yet been marked with a core stroke, marking unmarked strokes that overlap the core extent of the character as core strokes of the character, an

For a character in the text line that has not yet marked a core stroke, the unmarked stroke with the largest proportion of overlap with the recognition range of the character is marked as the core stroke of the character.

Appendix 3. according to the method of appendix 2,

wherein if a stroke that overlaps the core range of the character is simultaneously marked as a core stroke of a plurality of characters, the stroke is not marked as a core stroke of any character,

and wherein a stroke is marked as a core stroke of the leftmost character if the stroke range is contained in or contained in the recognition range of the character and overlaps the core range of the character, or a stroke having the largest overlap ratio with the recognition range of the character is marked as a core stroke of a plurality of characters at the same time.

Supplementary note 4. the method according to supplementary note 3, wherein, setting the recognition range to 1, the core range is 0.4 to 0.8.

Additional note 5. the method of additional note 1, wherein marking the unlabeled stuck strokes and the unlabeled isolated strokes based on the marked strokes comprises:

merging the unlabeled sticky strokes to the labeled strokes if the unlabeled sticky strokes are sticky with the labeled strokes; and

merging the unmarked isolated stroke to the marked stroke closest to the unmarked isolated stroke; and

repeating the above steps until all the stuck strokes and the isolated strokes are marked.

Supplementary note 6. the method according to supplementary note 1, wherein the step S3 further includes:

merging all strokes with the same mark obtained in step S1 and step S2 together;

calculating the distance between two adjacent strokes; and

the first N pairs or the first M% pairs of two adjacent strokes that are closest to each other are merged together.

Additional notes 7. the method according to additional notes 5 or 6, wherein the distance is based on a difference between a maximum value among a leftmost position of one stroke and a leftmost position of the other stroke of the two adjacent strokes and a minimum value among a rightmost position of the one stroke and a rightmost position of the other stroke.

Supplementary note 8. the method according to any one of supplementary notes 1 to 6, wherein:

if the absolute value of the difference between the distance of a stroke and its two adjacent strokes is less than a first threshold, marking the stroke to the character whose recognition range overlaps most with the range of the stroke; and if the absolute value of the difference between the distance of a stroke and its two adjacent strokes is greater than said first threshold, marking the stroke to the character that is closest to it.

Supplementary note 9. the method according to supplementary note 8, wherein the first threshold is 8 pixels.

Reference 10. the method according to any of the references 1 to 6, wherein the recognized text lines are obtained by a convolutional recurrent neural network and the strokes of the character are obtained by an over-segmentation algorithm.

Supplementary notes 11. the method according to supplementary notes 10, further comprising modifying the end position obtained by the convolutional recurrent neural network for each character in the recognized text line prior to marking the core stroke.

Supplementary note 12. the method according to supplementary note 11, wherein the correcting comprises:

in the case where the character is different from its next character:

if the character exists in the first P candidates of the recognition result of the next time stamp after the character end position and the corresponding confidence degree is larger than a second threshold value, the next time stamp is divided into the end positions of the character again,

iteratively performing the step of subdividing until the time stamp before the end position of the next one of the characters is moved; and

in the case that the character is the same as its next character:

if the character is present in the first Q candidates of each time stamp between the two identical character end positions, the end position of the character is not changed, and

if the character does not exist in the first Q candidates of each time stamp between the two identical character end positions, the same processing as in the case where the character is different from the next character is performed,

wherein P and Q are positive integers greater than 1 and P is less than Q.

Supplementary note 13 the method according to supplementary note 12, wherein the second threshold is 0.01.

Supplementary notes 14. the method according to any of supplementary notes 1 to 6, further comprising fusing text lines recognized by different recognition models with the final localization result to obtain an accurate recognition result.

Reference 15. an apparatus for locating each character in a recognized line of text, comprising:

a first marking device configured to mark each character in the line of text with a core stroke, the mark indicating to which character in the line of text the stroke belongs;

second marking means configured to mark, based on the marked strokes, unmarked stuck strokes that are stuck with the marked strokes and unmarked isolated strokes, wherein the isolated strokes refer to only one unmarked stroke between two marked strokes; and

merging means configured to merge together two adjacent strokes of the first N pairs or the first M% pairs that are closest to each other, wherein the two adjacent strokes are not labeled to different characters, and wherein N is an integer greater than or equal to 1 and M is any value between 0 and 100,

wherein all strokes are marked to characters in the text line after processing by the second marking means and the merging means.

Supplementary note 16. the apparatus according to supplementary note 15, wherein the first marking device is further configured to:

marking strokes whose stroke range contains or is contained in the recognition range of the character in the text line and which overlap with the core range of the character as core strokes of the character,

for a character in the text line that has not yet been marked with a core stroke, marking unmarked strokes that overlap the core extent of the character as core strokes of the character, an

For a character in the text line that has not yet marked a core stroke, the unmarked stroke with the largest proportion of overlap with the recognition range of the character is marked as the core stroke of the character.

Apparatus according to annex 17, or 16, wherein the second labeling means is further configured to:

merging the unlabeled sticky strokes to the labeled strokes if the unlabeled sticky strokes are sticky with the labeled strokes; and

merging the unmarked isolated stroke to the marked stroke closest to the unmarked isolated stroke; and

the above process is repeated until all the stuck strokes and isolated strokes are marked.

Supplementary note 18. the apparatus according to supplementary note 17, wherein the merging means is further configured to:

merging all strokes with the same label together;

calculating the distance between two adjacent strokes; and

the first N pairs or the first M% pairs of two adjacent strokes that are closest to each other are merged together.

Supplementary notes 19. apparatus according to supplementary notes 15, further comprising correction means configured to correct the end position obtained by the convolutional recurrent neural network for each character in the recognized text line before marking the core stroke

Note 20. a computer-readable storage medium storing a program executable by a processor to perform the operations of:

operation S1: marking each character in the line of text with a core stroke, the marking indicating to which character in the line of text the stroke belongs;

operation S2: based on the marked strokes, marking unmarked stuck strokes and unmarked isolated strokes which are stuck with the marked strokes, wherein the isolated strokes refer to only one unmarked stroke between the two marked strokes; and

operation S3: merging together two adjacent strokes that are the first N or M% closest apart from each other, wherein the two adjacent strokes are not labeled to different characters, and wherein N is an integer greater than or equal to 1 and M is any value between 0 and 100,

operations S2 and S3 are repeated until all strokes are marked to a character in the line of text.

Finally, it should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Furthermore, without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.

Although the embodiments of the present invention have been described in detail with reference to the accompanying drawings, it should be understood that the above described embodiments are only configured to illustrate the present invention and do not constitute a limitation of the present invention. It will be apparent to those skilled in the art that various modifications and variations can be made in the above-described embodiments without departing from the spirit and scope of the invention. Accordingly, the scope of the invention is to be defined only by the claims appended hereto, and by their equivalents.

29页详细技术资料下载
上一篇:一种医用注射器针头装配设备
下一篇:一种图像处理方法、模型的训练方法以及相关设备

网友询问留言

已有0条留言

还没有人留言评论。精彩留言会获得点赞!

精彩留言,会给你点赞!