Text character segmentation method and device

Document No. 1953716, published 2021-12-10

Reading note: this technology, "Text character segmentation method and device" (文本字符分割的方法和装置), was designed and created by 肖杨, 王亚领, 钟能, and 刘设伟 on 2021-09-10. Its main content is as follows. The invention discloses a method and a device for segmenting text characters, relating to the field of computer technology. One embodiment of the method comprises: obtaining the central region coordinates of each character in a text line image using a deep learning network; performing image processing on the text line image to obtain the boundary of the text line; determining the segmentation point between adjacent characters according to the central region coordinates of each character in the text line image and the position of each character's central region in the vertical projection image of the text line image; and performing text character segmentation according to the boundary of the text line and the segmentation points between adjacent characters. This embodiment performs character segmentation precisely and produces accurate segmentation results, which improves the accuracy of text recognition, increases the accuracy of OCR results, effectively replaces manual operation, and saves labor and time costs.

1. A method of text character segmentation, comprising:

obtaining the central region coordinates of each character in the text line image by using a deep learning network;

performing image processing on the text line image to acquire the boundary of the text line;

determining a segmentation point between adjacent characters according to the coordinates of the central area of each character in the text line image and the position of the central area of each character in the vertical projection image of the text line image;

and performing text character segmentation according to the boundary of the text line and the segmentation points between adjacent characters.

2. The method of claim 1, wherein obtaining the coordinates of the center region of each character in the image of the text line using a deep learning network comprises:

performing feature extraction on the text line image by using a convolutional neural network to obtain a feature map;

converting the feature map into a feature vector sequence according to a set feature vector sequence length;

and inputting the feature vector sequence into a recurrent neural network to obtain the central area coordinates of each character in the text line image.

3. The method of claim 2, further comprising, before performing feature extraction on the text line image using the convolutional neural network to obtain the feature map:

scaling the text line image according to a set scaling factor;

wherein inputting the feature vector sequence into the recurrent neural network to obtain the central area coordinates of each character in the text line image comprises:

inputting the feature vector sequence into the recurrent neural network to obtain the coordinates of the central area of each character in the scaled text line image;

and calculating the coordinates of the central area of each character in the text line image according to the scaling factor and the obtained coordinates of the central area of each character in the scaled text line image.

4. The method of claim 1, wherein image processing the text line image to obtain the boundaries of the text lines comprises:

carrying out binarization processing on the text line image to obtain a binary image;

acquiring horizontal direction projection and vertical direction projection of the binary image;

and determining the upper and lower boundaries of the text line image according to the horizontal direction projection, and determining the left and right boundaries of the text line image according to the vertical direction projection.

5. The method of claim 4, wherein obtaining the horizontal direction projection and the vertical direction projection of the binary image comprises:

calculating the sum of the pixel values of each row in the binary image to obtain the horizontal direction projection of the binary image;

and calculating the sum of the pixel values of each column in the binary image to obtain the vertical direction projection of the binary image.

6. The method according to claim 4, wherein the pixel value of each pixel point in the binary image is 0 or 255;

determining the upper and lower boundaries of the text line image according to the horizontal direction projection, and determining the left and right boundaries of the text line image according to the vertical direction projection comprises:

according to the horizontal direction projection, sequentially obtaining the sum of the pixel values of each horizontal row from top to bottom, and taking the first row whose pixel-value sum is not 0 as the upper boundary of the text line image; and sequentially obtaining the sum of the pixel values of each horizontal row from bottom to top, and taking the first row whose pixel-value sum is not 0 as the lower boundary of the text line image;

according to the vertical direction projection, sequentially obtaining the sum of the pixel values of each vertical column from left to right, and taking the first column whose pixel-value sum is not 0 as the left boundary of the text line image; and sequentially obtaining the sum of the pixel values of each vertical column from right to left, and taking the first column whose pixel-value sum is not 0 as the right boundary of the text line image.

7. The method of claim 1, wherein determining the segmentation points between adjacent characters based on the coordinates of the center region of each character in the text line image and the position of the center region of each character in the vertically projected image of the text line image comprises:

judging whether a blank interval region exists between the central areas of adjacent characters according to the central area coordinates of each character in the text line image and the positions of the central areas of the characters in the vertical projection image of the text line image, wherein a blank interval region is a run of columns in the vertical projection image whose pixel-value sums are continuously 0;

in the case that a blank interval region exists between the central areas of the adjacent characters, selecting the center of the blank interval region closest to the central area of the left character as the segmentation point between the adjacent characters;

and in the case that no blank interval region exists between the central areas of the adjacent characters, selecting the column that is closest to the central area of the left character and has the smallest pixel-value sum as the segmentation point between the adjacent characters.

8. An apparatus for text character segmentation, comprising:

the first processing module is used for acquiring the central region coordinates of each character in the text line image by using a deep learning network;

the second processing module is used for carrying out image processing on the text line image so as to obtain the boundary of the text line;

the segmentation point determination module is used for determining segmentation points between adjacent characters according to the coordinates of the central area of each character in the text line image and the position of the central area of each character in the vertical projection image of the text line image;

and the character segmentation module is used for segmenting text characters according to the boundaries of the text lines and segmentation points between adjacent characters.

9. An electronic device for text character segmentation, comprising:

one or more processors;

a storage device for storing one or more programs,

wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-7.

10. A computer-readable medium, on which a computer program is stored, which, when being executed by a processor, carries out the method according to any one of claims 1-7.

Technical Field

The invention relates to the technical field of computers, in particular to a text character segmentation method and a text character segmentation device.

Background

In the insurance claim settlement process, clients upload various claim images, including cards, medical bills, and the like. Character recognition is then performed with OCR (Optical Character Recognition) technology to automate the entry of card and medical information, assist information extraction, build medical knowledge graphs, and support important tasks such as automatic claim settlement and claim quality inspection.

However, in the image data uploaded by clients, characters may be blurred, skewed, or occluded owing to the photographing device, angle, distance, lighting, printing or scanning quality, and defects of the bill itself. This reduces the accuracy of character recognition, and key fields are often partially misrecognized. In such cases, the field text line image must be segmented into characters before further character recognition, so as to improve the accuracy of field text recognition. At present, character segmentation is usually performed with conventional image processing or deep learning methods.

However, in the process of implementing the present invention, the inventors found that the commonly used character segmentation methods struggle to precisely segment blurred characters with severe noise interference, so the character segmentation results are not accurate enough, which seriously affects the accuracy of text recognition.

Disclosure of Invention

In view of this, embodiments of the present invention provide a method and an apparatus for text character segmentation that perform character segmentation precisely and produce accurate segmentation results, thereby improving the accuracy of text recognition, increasing the accuracy of OCR results, effectively replacing manual operation, and saving labor and time costs.

To achieve the above object, according to an aspect of an embodiment of the present invention, a method for text character segmentation is provided.

A method of text character segmentation, comprising:

obtaining the central region coordinates of each character in the text line image by using a deep learning network;

performing image processing on the text line image to acquire the boundary of the text line;

determining a segmentation point between adjacent characters according to the coordinates of the central area of each character in the text line image and the position of the central area of each character in the vertical projection image of the text line image;

and performing text character segmentation according to the boundary of the text line and the segmentation points between adjacent characters.

Optionally, the obtaining of the coordinates of the center region of each character in the text line image using the deep learning network includes:

performing feature extraction on the text line image by using a convolutional neural network to obtain a feature map;

converting the feature map into a feature vector sequence according to a set feature vector sequence length;

and inputting the feature vector sequence into a recurrent neural network to obtain the central area coordinates of each character in the text line image.

Optionally, before extracting features of the text line image using a convolutional neural network to obtain a feature map, the method further includes:

scaling the text line image according to a set scaling factor;

and wherein inputting the feature vector sequence into the recurrent neural network to obtain the central area coordinates of each character in the text line image includes:

inputting the feature vector sequence into the recurrent neural network to obtain the coordinates of the central area of each character in the scaled text line image;

and calculating the coordinates of the central area of each character in the text line image according to the scaling factor and the obtained coordinates of the central area of each character in the scaled text line image.

Optionally, the image processing the text line image to obtain the boundary of the text line includes:

carrying out binarization processing on the text line image to obtain a binary image;

acquiring horizontal direction projection and vertical direction projection of the binary image;

and determining the upper and lower boundaries of the text line image according to the horizontal direction projection, and determining the left and right boundaries of the text line image according to the vertical direction projection.

Optionally, acquiring the horizontal direction projection and the vertical direction projection of the binary image comprises:

calculating the sum of the pixel values of each row in the binary image to obtain the horizontal direction projection of the binary image;

and calculating the sum of the pixel values of each column in the binary image to obtain the vertical direction projection of the binary image.

Optionally, the pixel value of each pixel point in the binary image is 0 or 255;

determining the upper and lower boundaries of the text line image according to the horizontal direction projection, and determining the left and right boundaries of the text line image according to the vertical direction projection comprises:

according to the horizontal direction projection, sequentially obtaining the sum of the pixel values of each horizontal row from top to bottom, and taking the first row whose pixel-value sum is not 0 as the upper boundary of the text line image; and sequentially obtaining the sum of the pixel values of each horizontal row from bottom to top, and taking the first row whose pixel-value sum is not 0 as the lower boundary of the text line image;

according to the vertical direction projection, sequentially obtaining the sum of the pixel values of each vertical column from left to right, and taking the first column whose pixel-value sum is not 0 as the left boundary of the text line image; and sequentially obtaining the sum of the pixel values of each vertical column from right to left, and taking the first column whose pixel-value sum is not 0 as the right boundary of the text line image.

Optionally, determining the segmentation point between the adjacent characters according to the coordinates of the central region of each character in the text line image and the position of the central region of each character in the vertically projected image of the text line image comprises:

judging whether a blank interval region exists between the central areas of adjacent characters according to the central area coordinates of each character in the text line image and the positions of the central areas of the characters in the vertical projection image of the text line image, wherein a blank interval region is a run of columns in the vertical projection image whose pixel-value sums are continuously 0;

in the case that a blank interval region exists between the central areas of the adjacent characters, selecting the center of the blank interval region closest to the central area of the left character as the segmentation point between the adjacent characters;

and in the case that no blank interval region exists between the central areas of the adjacent characters, selecting the column that is closest to the central area of the left character and has the smallest pixel-value sum as the segmentation point between the adjacent characters.

According to another aspect of the embodiments of the present invention, there is provided an apparatus for text character segmentation.

An apparatus for text character segmentation, comprising:

the first processing module is used for acquiring the central region coordinates of each character in the text line image by using a deep learning network;

the second processing module is used for carrying out image processing on the text line image so as to obtain the boundary of the text line;

the segmentation point determination module is used for determining segmentation points between adjacent characters according to the coordinates of the central area of each character in the text line image and the position of the central area of each character in the vertical projection image of the text line image;

and the character segmentation module is used for segmenting text characters according to the boundaries of the text lines and segmentation points between adjacent characters.

According to yet another aspect of an embodiment of the present invention, an electronic device for text character segmentation is provided.

An electronic device for text character segmentation, comprising: one or more processors; and a storage device for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the method of text character segmentation provided by the embodiments of the present invention.

According to yet another aspect of embodiments of the present invention, a computer-readable medium is provided.

A computer readable medium, on which a computer program is stored, which when executed by a processor implements a method of text character segmentation as provided by embodiments of the present invention.

One embodiment of the above invention has the following advantages or benefits. The technical solution obtains the central region coordinates of each character in the text line image using a deep learning network; performs image processing on the text line image to obtain the boundary of the text line; determines the segmentation point between adjacent characters according to the central region coordinates of each character and the position of each character's central region in the vertical projection image of the text line image; and performs text character segmentation according to the boundary of the text line and the segmentation points between adjacent characters. By combining a deep learning network with image processing technology to determine the segmentation points between adjacent characters, the solution achieves precise character segmentation with accurate results, which improves the accuracy of character recognition, increases the reliability of OCR results, effectively replaces manual operation, and saves labor and time costs.

Further effects of the above-mentioned non-conventional alternatives will be described below in connection with the embodiments.

Drawings

The drawings are included to provide a better understanding of the invention and are not to be construed as unduly limiting the invention. Wherein:

FIG. 1 is a schematic diagram illustrating a prior art implementation of text character segmentation;

FIG. 2 is a diagram illustrating a character segmentation error condition in the prior art;

FIG. 3 is a diagram illustrating the main steps of a method for text character segmentation according to an embodiment of the present invention;

FIG. 4 is a process flow diagram of deep learning branching according to one embodiment of the invention;

FIG. 5 is a process flow diagram of text character segmentation, in accordance with one embodiment of the present invention;

FIG. 6 is a flow chart illustrating an implementation of text character segmentation according to an embodiment of the present invention;

FIG. 7 is a process flow diagram of text character segmentation in accordance with another embodiment of the present invention;

FIG. 8 is a schematic diagram of the main blocks of an apparatus for text character segmentation according to an embodiment of the present invention;

FIG. 9 is an exemplary system architecture diagram in which embodiments of the present invention may be employed;

FIG. 10 is a schematic block diagram of a computer system suitable for use in implementing a terminal device or server according to an embodiment of the present invention.

Detailed Description

Exemplary embodiments of the present invention are described below with reference to the accompanying drawings, in which various details of embodiments of the invention are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the invention. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.

At present, in the insurance claim settlement process, card and medical image text information must be entered or verified manually, which consumes time and labor. Using OCR technology to recognize claim images can save labor and shorten claim settlement time, but image quality affects text recognition accuracy. Applying character segmentation to key fields before further character recognition can effectively improve the accuracy of text recognition and effectively replace manual entry or review.

Fig. 1 is a schematic diagram of a prior art implementation principle of text character segmentation. As shown in fig. 1, the conventional character segmentation method generally includes the following steps: firstly, performing binarization processing and the like on a text line image to obtain a binary image; then, horizontally projecting and vertically projecting the binary image, obtaining upper and lower boundaries of a text line based on a horizontal projection result, and obtaining left and right boundaries of the text line and the boundary position of each character based on vertical projection; and then, the segmentation of each character is realized by utilizing the upper, lower, left and right boundaries and the character boundaries, or the segmentation of each character is realized by combining the boundaries and setting the character width.

However, this method cannot segment characters precisely, so text recognition accuracy is greatly reduced. Fig. 2 is a diagram illustrating character segmentation error conditions in the prior art. As shown in fig. 2, character segmentation errors in the prior art mainly appear in the following three aspects: 1. for characters with separated strokes, the segmentation effect is poor; 2. Chinese characters, letters, punctuation marks, and other characters have inconsistent widths within a text line, and a character-width threshold is difficult to set so as to suit them all; 3. blurred text lines, printing/scanning quality, complex strokes, noise, and similar factors cause characters to stick together in the vertical projection, giving a poor segmentation effect.

Accordingly, the invention provides a text character segmentation method and device that can perform character segmentation on key fields in cards and medical images in claim settlement cases, and that mainly solve the following problems: 1. character segmentation made difficult by stroke-separated characters in the text line; 2. character segmentation made difficult by the inconsistent widths of Chinese characters, letters, punctuation marks, and other characters in a text line; 3. character segmentation made difficult by characters sticking together owing to blur, printing/scanning quality, complex strokes, noise, and similar factors. The method thereby improves the accuracy of character recognition and increases the credibility of OCR results, effectively replacing manual work, which is a key link in realizing claim quality inspection and automatic claim settlement.

The text character segmentation method provided by the invention can accurately segment text characters in claim images. It is highly general: it is suitable for segmenting text line characters in printed, scanned, and electronic images, and the character types it can segment include Chinese characters, English letters, punctuation marks, and the like. The method combines a deep learning algorithm with an image processing algorithm to achieve accurate character segmentation: a deep learning network predicts the number of characters in the text line image and an estimated center coordinate for each character; an image processing method obtains the vertical and horizontal projections of the text line; character boundaries are then determined by combining the character count and estimated center coordinates from the deep learning result with the vertical and horizontal projections from the image processing result, achieving accurate character segmentation.

Fig. 3 is a schematic diagram of the main steps of a method for text character segmentation according to an embodiment of the present invention. As shown in fig. 3, the method for segmenting text characters according to the embodiment of the present invention mainly includes the following steps S301 to S304.

Step S301: obtaining the central region coordinates of each character in the text line image by using a deep learning network;

step S302: performing image processing on the text line image to acquire the boundary of the text line;

step S303: determining a segmentation point between adjacent characters according to the coordinates of the central area of each character in the text line image and the position of the central area of each character in the vertical projection image of the text line image;

step S304: and according to the boundary of the text line and the dividing point between the adjacent characters, performing text character division.

Through the above steps S301 to S304, the central area coordinates of each character in the text line image are predicted using a deep learning network, and the boundary of the text line is obtained using image processing technology; the segmentation points between adjacent characters are then determined by combining the central area coordinates of each character with the positions of the characters in the vertical projection image; finally, the text characters are segmented according to the boundary of the text line and the segmentation points between adjacent characters. In this way, the deep learning network and image processing technology are combined to determine the segmentation points used for text character segmentation, achieving precise character segmentation with accurate results, which improves the accuracy of character recognition, increases the reliability of OCR results, effectively replaces manual operation, and saves labor and time costs. In specific execution, steps S301 and S302 have no fixed order and may be executed sequentially or simultaneously.

According to an embodiment of the present invention, when obtaining the center region coordinates of each character in the text line image using the deep learning network in step S301, the following steps may be specifically performed:

step S3011: performing feature extraction on the text line image by using a convolutional neural network to obtain a feature map;

step S3012: converting the feature map into a feature vector sequence according to a set feature vector sequence length;

step S3013: and inputting the feature vector sequence into a recurrent neural network to obtain the central area coordinates of each character in the text line image.

In step S3011, feature extraction on the text line image may be performed with a convolutional neural network (CNN), preferably a ResNet; the recurrent neural network (RNN) used in step S3013 is preferably a bidirectional LSTM (Long Short-Term Memory) network. By inputting the feature vector sequence into the RNN, it is possible to predict whether the text line image region corresponding to each feature vector contains a character, and hence the number of characters contained in the text line image. If a text line image region contains a character, that region is taken as the central area of the character.
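
As a non-limiting illustration, this predictor can be sketched as a small CRNN-style network in PyTorch. The backbone below merely stands in for the preferred ResNet; the layer sizes, the label convention, and the pooling schedule (chosen so the horizontal step is 4 pixels, matching the embodiment below) are assumptions for illustration, not values prescribed by this disclosure:

    import torch
    import torch.nn as nn

    class CharCenterPredictor(nn.Module):
        """Sketch of the CNN -> feature-vector sequence -> BiLSTM predictor.
        All layer sizes are illustrative assumptions."""

        def __init__(self, channels: int = 3, feat_dim: int = 256, hidden: int = 128):
            super().__init__()
            # Small backbone standing in for the preferred ResNet; two 2x2
            # max pools give a horizontal pixel step of STEP = 4.
            self.cnn = nn.Sequential(
                nn.Conv2d(channels, 64, 3, padding=1), nn.ReLU(),
                nn.MaxPool2d(2),                      # height 32 -> 16, width -> W/2
                nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(),
                nn.MaxPool2d(2),                      # height 16 -> 8,  width -> W/4
                nn.Conv2d(128, feat_dim, 3, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d((1, None)),      # collapse height to 1
            )
            self.rnn = nn.LSTM(feat_dim, hidden, bidirectional=True, batch_first=True)
            self.head = nn.Linear(2 * hidden, 2)      # 'no character' / 'has character'

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            f = self.cnn(x)                           # (B, C1, 1, T)
            seq = f.squeeze(2).permute(0, 2, 1)       # (B, T, C1): feature sequence
            out, _ = self.rnn(seq)                    # (B, T, 2*hidden)
            return self.head(out)                     # per-time-step logits

    logits = CharCenterPredictor()(torch.randn(1, 3, 32, 128))
    print(logits.shape)  # torch.Size([1, 32, 2]): T = 128/4 = 32 time steps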

According to another embodiment of the present invention, before the step S3011, performing feature extraction on the text line image by using a convolutional neural network to obtain a feature map, the method further includes:

scaling the text line image according to a set scaling factor;

and step S3013, inputting the feature vector sequence into the recurrent neural network to obtain the central area coordinates of each character in the text line image, may specifically include:

inputting the feature vector sequence into the recurrent neural network to obtain the coordinates of the central area of each character in the scaled text line image;

and calculating the coordinates of the central area of each character in the text line image according to the scaling factor and the obtained coordinates of the central area of each character in the scaled text line image.

In other words, the central region coordinates of each character in the original text line image are obtained by scaling the text line image before feature extraction and, after the output of the recurrent neural network is obtained, applying the inverse scaling to the coordinate values according to the scaling factor. Scaling the image reduces the dimensionality of the feature vectors and the amount of computation in the recurrent neural network, thereby improving the efficiency of computing the central region coordinates of the characters in the text line image.
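
A minimal sketch of that inverse mapping, assuming each center region is returned as a pair of horizontal pixel bounds in the scaled image (the function name is illustrative):

    def to_original_coords(scaled_regions, ratio):
        """Map per-character center regions (x_left, x_right) found in the
        scaled image back to the original image by undoing the scaling."""
        return [(int(x0 / ratio), int(x1 / ratio)) for x0, x1 in scaled_regions]

    # ratio = H/h, e.g. 32/64 = 0.5: a region (40, 44) in the scaled image
    # maps back to (80, 88) in the original image.
    print(to_original_coords([(40, 44)], ratio=0.5))  # [(80, 88)]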

Fig. 4 is a schematic processing flow diagram of the deep learning branch according to an embodiment of the present invention, which shows a main processing flow for acquiring the coordinates of the center region of each character in the text line image by using the deep learning network. The processing flow of the deep learning branch is as follows:

1. perform an image scaling (resize) operation on the text line image: let the width and height of the original image be w and h, and scale the height of the text line image to a fixed value H (preferably H = 32), so that the scaling factor is ratio = H/h; the width of the text line image is scaled to W in proportion to the aspect ratio of the original image. The resized image then has size H × W × C, where C denotes the number of image channels;

2. perform CNN feature extraction on the resized image, ResNet being the preferred CNN. Obtain a feature map of the resized image of size 1 × (W/STEP) × C1, where STEP denotes the horizontal pixel step size in the resized image and C1 denotes the number of channels of the feature map;

3. convert the feature map obtained by CNN feature extraction into a feature vector sequence, the length of which (hereinafter the number of "time steps") is T = W/STEP, each time-step feature vector having size C1 × 1. Because the convolutional layers, max pooling layers, and activation function layers in the CNN operate on local areas, they are translation-invariant, so each feature vector corresponds to an H × STEP area of the resized image;

4. input the feature vector sequence into the RNN, preferably a bidirectional LSTM, and predict for each element of the sequence a label with the value "has character" or "no character". Each time step predicted as "has character" corresponds to the coordinates of an H × STEP area in the resized image, which is the estimated center area of a character in the resized image. The number of time steps labeled "has character" is the number of characters in the text line;

5. combine the scaling factor ratio to convert the area coordinates of each "has character" time step from the resized image into area coordinates in the original image, thereby obtaining the central area coordinates of each character in the original image (see the sketch below).
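
Read as code, steps 4 and 5 amount to mapping each "has character" time step to a pixel strip and undoing the resize. A minimal sketch under the conventions above (the function name and label strings are illustrative):

    def char_center_regions(labels, step, ratio):
        """Steps 4-5: each time step labeled 'has character' covers an
        H x STEP strip of the resized image; dividing by the scaling
        factor maps that strip back to the original image. The number of
        returned regions equals the number of characters in the line."""
        regions = []
        for i, label in enumerate(labels):
            if label == "has character":
                x0, x1 = i * step, (i + 1) * step   # strip in the resized image
                regions.append((int(x0 / ratio), int(x1 / ratio)))
        return regions

    # Toy run with STEP = 4 and ratio = 0.5, so each strip is 8 original
    # pixels wide:
    labels = ["no character", "has character", "has character",
              "no character", "has character"]
    print(char_center_regions(labels, step=4, ratio=0.5))
    # [(8, 16), (16, 24), (32, 40)] -> 3 characters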

According to one embodiment of the present invention, when performing image processing on the text line image to obtain the boundary of the text line in step S302, the method specifically includes:

step S3021: carrying out binarization processing on the text line image to obtain a binary image;

step S3022: acquiring horizontal direction projection and vertical direction projection of the binary image;

step S3023: and determining the upper and lower boundaries of the text line image according to the horizontal direction projection, and determining the left and right boundaries of the text line image according to the vertical direction projection.

When binarizing the text line image, a common approach is to convert the image into a grayscale image and then convert the grayscale image into a binary image by a method such as adaptive thresholding. The horizontal direction projection of the binary image is obtained by calculating the sum of the pixel values of each row in the binary image, and the vertical direction projection by calculating the sum of the pixel values of each column.
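
A minimal OpenCV/NumPy sketch of this branch. Otsu thresholding (named in the embodiment below) stands in for "a method such as adaptive thresholding"; the inverted threshold is an assumption that the input has dark text on a light background, so that strokes become foreground (255) and blank columns sum to 0:

    import cv2
    import numpy as np

    def binarize_and_project(image_bgr: np.ndarray):
        """Binarize a text-line image and compute both projections.
        Returns (binary, h_proj, v_proj), where h_proj[r] is the
        pixel-value sum of row r and v_proj[c] that of column c."""
        gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
        # Otsu chooses the threshold automatically; THRESH_BINARY_INV
        # assumes dark text on a light background.
        _, binary = cv2.threshold(gray, 0, 255,
                                  cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
        h_proj = binary.sum(axis=1, dtype=np.int64)  # horizontal projection
        v_proj = binary.sum(axis=0, dtype=np.int64)  # vertical projection
        return binary, h_proj, v_proj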

In the embodiment of the present invention, the pixel value of the pixel point in the binary image is 0 or 255. Further, when determining the upper and lower boundaries of the text line image according to the horizontal direction projection and determining the left and right boundaries of the text line image according to the vertical direction projection, step S3023 is specifically implemented by the following steps:

according to the horizontal direction projection, sequentially obtaining the sum of the pixel values of each horizontal row from top to bottom, and taking the first row whose pixel-value sum is not 0 as the upper boundary of the text line image; and sequentially obtaining the sum of the pixel values of each horizontal row from bottom to top, and taking the first row whose pixel-value sum is not 0 as the lower boundary of the text line image;

according to the vertical direction projection, sequentially obtaining the sum of the pixel values of each vertical column from left to right, and taking the first column whose pixel-value sum is not 0 as the left boundary of the text line image; and sequentially obtaining the sum of the pixel values of each vertical column from right to left, and taking the first column whose pixel-value sum is not 0 as the right boundary of the text line image.
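
In NumPy terms, the four scans reduce to taking the first and last non-zero entries of each projection. A minimal sketch under that reading (assuming the projections computed above, with at least one foreground pixel):

    import numpy as np

    def text_line_boundaries(h_proj: np.ndarray, v_proj: np.ndarray):
        """First non-zero row from the top/bottom gives the upper/lower
        boundary; first non-zero column from the left/right gives the
        left/right boundary."""
        rows = np.flatnonzero(h_proj)
        cols = np.flatnonzero(v_proj)
        top, bottom = int(rows[0]), int(rows[-1])
        left, right = int(cols[0]), int(cols[-1])
        return top, bottom, left, right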

According to another embodiment of the present invention, when determining the dividing point between the adjacent characters according to the coordinates of the central area of each character in the text line image and the position of the central area of each character in the vertically projected image of the text line image in step S303, the method may specifically include:

judging whether a blank interval region exists between the central areas of adjacent characters according to the central area coordinates of each character in the text line image and the positions of the central areas of the characters in the vertical projection image of the text line image, wherein a blank interval region is a run of columns in the vertical projection image whose pixel-value sums are continuously 0;

in the case that a blank interval region exists between the central areas of the adjacent characters, selecting the center of the blank interval region closest to the central area of the left character as the segmentation point between the adjacent characters;

and in the case that no blank interval region exists between the central areas of the adjacent characters, selecting the column that is closest to the central area of the left character and has the smallest pixel-value sum as the segmentation point between the adjacent characters.

With the above method for determining segmentation points, exactly one segmentation point is found between each pair of adjacent characters, because the deep neural network outputs the exact number of characters in the text line together with an estimated center coordinate for each character. This solves two problems of purely projection-based methods: stroke-separated characters can produce more than one candidate segmentation point between adjacent characters in the vertical projection image, and sticking characters can produce none, both of which lead to wrong character segmentation. In the technical solution of the invention, a segmentation point is in essence a vertical column between adjacent characters, and its mapping in the vertical projection image is the segmentation point.
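
A minimal sketch of this selection rule, assuming v_proj is the vertical projection computed earlier and each character center is given as an x-range, sorted left to right (names are illustrative; the two center regions are assumed not to overlap):

    import numpy as np

    def split_point(v_proj: np.ndarray, left_center, right_center) -> int:
        """One segmentation point between two adjacent character center
        regions (x-ranges, left_center to the left of right_center)."""
        lo, hi = left_center[1], right_center[0]   # span between the centers
        span = v_proj[lo:hi]
        # Collect blank interval regions: runs of zero-sum columns.
        gaps, start = [], None
        for i, v in enumerate(span):
            if v == 0 and start is None:
                start = i
            elif v != 0 and start is not None:
                gaps.append((start, i))
                start = None
        if start is not None:
            gaps.append((start, len(span)))
        if gaps:
            # Case a: center of the blank region nearest the left character.
            g0, g1 = gaps[0]
            return lo + (g0 + g1) // 2
        # Case b: no blank region - the minimum-sum column; np.argmin
        # returns the first minimum, i.e. the one nearest the left center.
        return lo + int(np.argmin(span))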

FIG. 5 is a process flow diagram of text character segmentation according to an embodiment of the present invention, showing the deep learning branch and the image processing branch combined. As shown in fig. 5, the top row shows the processing result of the deep learning branch of fig. 4, where the small vertical rectangles are the obtained center regions of the characters; the second row shows the vertical projection output by the image processing branch. The position of each character's center region in the vertical projection can be obtained from the center region coordinates and the vertical projection of the text line image, as shown in the third row. Then, from the center region coordinates and these positions, the segmentation point between the center regions of each pair of adjacent characters, that is, the segmentation point of the adjacent characters, can be obtained, as indicated by the arrows in the fourth row. Finally, the characters are segmented using the upper and lower boundaries and left and right boundaries of the text line and the segmentation points of adjacent characters, giving the result shown in the last row.

Fig. 6 is a schematic flow chart of implementing text character segmentation according to an embodiment of the present invention. As shown in fig. 6, the text line image is processed by two branches, drawn as two boxes: the left box is the deep learning branch and the right box is the image processing branch. In the deep learning branch, the image is first scaled: the height is scaled to a fixed value H according to the scaling factor, and the width is scaled proportionally to W according to the aspect ratio of the original image. CNN feature extraction is performed on the scaled image to obtain a feature map, which is converted into a feature vector sequence; a recurrent neural network then performs character prediction on the sequence and obtains the region coordinates corresponding to the feature vectors that contain characters; finally, the central region coordinates of each character in the original text line image are computed from the scaling factor and the previously obtained region coordinates. In the image processing branch, the image is first binarized; the binarized image is then projected to obtain the horizontal and vertical projections; the upper and lower boundaries of the text line are determined from the horizontal projection and the left and right boundaries from the vertical projection, thereby determining the boundary of the text line. The segmentation points between characters are then determined by combining the central region coordinates output by the deep learning branch with the positions of the central regions in the vertical projection image. Finally, the characters are segmented according to the boundary of the text line and the segmentation points between adjacent characters.

The implementation of the present invention is further illustrated below with reference to fig. 7 and another embodiment. FIG. 7 is a process flow diagram of text character segmentation according to another embodiment of the invention. This embodiment shows the character segmentation process for the item name field of a medical invoice during insurance claim settlement, which achieves accurate character segmentation and improves the accuracy of character recognition. The processing flow of text character segmentation is as follows:

1) the deep learning and image processing branches are performed simultaneously on the text line image of the input item name, the deep learning branch comprising steps 2) to 6) below and the image processing branch comprising steps 7) to 9);

2) deep learning branch: perform a resize operation on the text line RGB image. Let the width and height of the original image be w and h; the height of the scaled text line image is the fixed value H = 32, so the scaling factor is ratio = 32/h, and the width of the text line is scaled proportionally to W = w × ratio according to the aspect ratio of the original image. The resized image has size 32 × W × C, where C = 3 denotes the number of image channels;

3) perform CNN feature extraction on the resized image. The CNN may be one of ResNet, VGG, GoogLeNet, or MobileNet, preferably ResNet. Obtain the feature map of the resized image, of size 1 × (W/STEP) × C1, where STEP = 4 denotes the horizontal pixel step size in the resized image and C1 denotes the number of channels of the feature map;

4) convert the feature map extracted by the CNN into a feature vector sequence s = (t1, t2, ……, tT). The number of time steps of the sequence is T = W/STEP with STEP = 4, and each time-step feature vector ti (i ∈ [1, T]) has size C1 × 1. Because the convolutional layers, max pooling layers, and activation function layers in the CNN operate on local areas and are therefore translation-invariant, the i-th feature vector ti corresponds to a 32 × 4 rectangular area in the resized image with coordinates [[i × 4, 0], [(i + 1) × 4, 32]], where P1 = [i × 4, 0] and P2 = [(i + 1) × 4, 32] are the coordinates of the top-left and bottom-right vertices of the rectangular area, respectively;

5) input the converted feature vector sequence into the RNN, preferably a bidirectional LSTM, to predict the labels of the sequence s = (t1, t2, ……, tT), outputting a predicted label list l = (y(t1), y(t2), ..., y(tT)) with y(ti) ∈ {"no character", "has character"}. Each time step predicted as "has character" corresponds to a 32 × 4 area coordinate in the resized image, which is the estimated coordinate of a character center area in the resized image;

6) for the output of the deep learning branch, take the time steps labeled "has character" and their corresponding area coordinates in the resized image: the number of time steps labeled "has character" is the number of characters in the text line. For the area coordinates of a "has character" time step in the resized image, suppose for example that time step ti is predicted as "has character" and corresponds to the area coordinates [[i × 4, 0], [(i + 1) × 4, 32]] in the resized image; combined with the scaling factor ratio, these convert to the area coordinates [[(i × 4)/ratio, 0], [((i + 1) × 4)/ratio, h]] in the original image, giving the estimated central area coordinates of the character in the original image;
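
To make the arithmetic of steps 2) and 6) concrete, here is a toy check under the embodiment's constants (H = 32, STEP = 4) for a hypothetical 64-pixel-high, 400-pixel-wide original image; the dimensions are assumptions for illustration only:

    h, w = 64, 400
    H, STEP = 32, 4                 # fixed resize height, horizontal step size
    ratio = H / h                   # 0.5
    W = round(w * ratio)            # 200, keeping the original aspect ratio
    T = W // STEP                   # 50 time steps / feature vectors

    i = 10                          # suppose t_10 is predicted 'has character'
    resized = [[i * STEP, 0], [(i + 1) * STEP, H]]                  # [[40,0],[44,32]]
    original = [[i * STEP / ratio, 0], [(i + 1) * STEP / ratio, h]]
    print(original)                 # [[80.0, 0], [88.0, 64]] in the original image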

7) image processing branch: binarize the text line image by converting it into a grayscale image and then converting the grayscale image into a binary image using the OTSU threshold segmentation method; the pixel value of each pixel point in the binary image is 0 or 255;

8) obtain the horizontal and vertical projection images of the binary image: the horizontal projection is the sum of the pixel values of each row of pixel points of the binary text line image, and the vertical projection is the sum of the pixel values of each column of pixel points;

9) determine the upper and lower boundaries of the text line image using the horizontal direction projection output by the image processing branch, and the left and right boundaries using the vertical direction projection: according to the horizontal projection, sequentially obtain the sum of the pixel values of each horizontal row from top to bottom, and take the first row whose pixel-value sum is not 0 as the upper boundary of the text line image; sequentially obtain the sum of the pixel values of each horizontal row from bottom to top, and take the first row whose pixel-value sum is not 0 as the lower boundary; according to the vertical projection, sequentially obtain the sum of the pixel values of each vertical column from left to right, and take the first column whose pixel-value sum is not 0 as the left boundary; and sequentially obtain the sum of the pixel values of each vertical column from right to left, and take the first column whose pixel-value sum is not 0 as the right boundary;

10) obtain the position of each character center area in the vertical projection image using the estimated character center area coordinates, and search for the unique segmentation point between the estimated center positions of adjacent characters:

a. when blank interval regions (runs of columns in the vertical projection image whose pixel-value sums are continuously 0) exist between the adjacent character center areas, take the center column of the blank interval region closest to the left character's center area as the segmentation point between the adjacent characters;

b. when no blank interval region exists between the adjacent character center areas, select, between the two center areas, the column that is closest to the left character's center area and has the smallest pixel-value sum as the segmentation point between the adjacent characters;

11) segment the characters using the upper and lower boundaries and left and right boundaries of the text line and the segmentation points between adjacent characters.

Following this processing flow, the item name is segmented into accurate characters, which are then used for character recognition, improving the final recognition accuracy of the item name field.
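
Step 11) then reduces to array slicing. A minimal sketch, assuming the boundaries and segmentation points computed as above (the function name and example dimensions are illustrative):

    import numpy as np

    def crop_characters(image, top, bottom, left, right, splits):
        """Step 11: cut the text line at its boundaries and at the
        segmentation points between adjacent characters."""
        xs = [left] + sorted(splits) + [right + 1]
        return [image[top:bottom + 1, xs[k]:xs[k + 1]]
                for k in range(len(xs) - 1)]

    # A 40 x 200 line with segmentation points at x = 50, 100, 150 yields
    # four character crops:
    line = np.zeros((40, 200), dtype=np.uint8)
    chars = crop_characters(line, top=5, bottom=34, left=10, right=189,
                            splits=[50, 100, 150])
    print([c.shape for c in chars])  # [(30, 40), (30, 50), (30, 50), (30, 40)]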

According to another aspect of the invention, a device for segmenting text characters is also provided. Fig. 8 is a schematic diagram of main blocks of an apparatus for text character segmentation according to an embodiment of the present invention, and as shown in fig. 8, an apparatus 800 for text character segmentation according to an embodiment of the present invention mainly includes a first processing module 801, a second processing module 802, a segmentation point determining module 803, and a character segmentation module 804.

A first processing module 801, configured to obtain center region coordinates of each character in the text line image using a deep learning network;

a second processing module 802, configured to perform image processing on the text line image to obtain a boundary of a text line;

a dividing point determining module 803, configured to determine a dividing point between adjacent characters according to the coordinates of the central region of each character in the text line image and the position of the central region of each character in the vertically projected image of the text line image;

and the character segmentation module 804 is used for performing text character segmentation according to the boundary of the text line and the segmentation points between the adjacent characters.

According to an embodiment of the present invention, the first processing module 801 may further be configured to:

performing feature extraction on the text line image by using a convolutional neural network to obtain a feature map;

converting the feature map into a feature vector sequence according to a set feature vector sequence length;

and inputting the feature vector sequence into a recurrent neural network to obtain the central area coordinates of each character in the text line image.

According to another embodiment of the present invention, before feature extraction is performed on the text line image by using the convolutional neural network to obtain the feature map, the first processing module 801 may further be configured to:

scaling the text line image according to a set scaling factor;

and, when the feature vector sequence is input into the recurrent neural network to obtain the central region coordinates of each character in the text line image, the first processing module 801 may further be configured to:

inputting the feature vector sequence into the recurrent neural network to obtain the coordinates of the central area of each character in the scaled text line image;

and calculating the coordinates of the central area of each character in the text line image according to the scaling factor and the obtained coordinates of the central area of each character in the scaled text line image.

According to yet another embodiment of the invention, the second processing module 802 may be further configured to:

carrying out binarization processing on the text line image to obtain a binary image;

acquiring horizontal direction projection and vertical direction projection of the binary image;

and determining the upper and lower boundaries of the text line image according to the horizontal direction projection, and determining the left and right boundaries of the text line image according to the vertical direction projection.

According to yet another embodiment of the invention, the second processing module 802 may be further configured to:

calculating the sum of the pixel values of each row in the binary image to obtain the horizontal direction projection of the binary image;

and calculating the sum of the pixel values of each column in the binary image to obtain the vertical direction projection of the binary image.

According to another embodiment of the present invention, the pixel value of each pixel point in the binary image is 0 or 255;

the second processing module 802 may also be configured to:

according to the horizontal direction projection, sequentially obtaining the sum of the pixel values of each horizontal row from top to bottom, and taking the first row whose pixel-value sum is not 0 as the upper boundary of the text line image; and sequentially obtaining the sum of the pixel values of each horizontal row from bottom to top, and taking the first row whose pixel-value sum is not 0 as the lower boundary of the text line image;

according to the vertical direction projection, sequentially obtaining the sum of the pixel values of each vertical column from left to right, and taking the first column whose pixel-value sum is not 0 as the left boundary of the text line image; and sequentially obtaining the sum of the pixel values of each vertical column from right to left, and taking the first column whose pixel-value sum is not 0 as the right boundary of the text line image.

According to yet another embodiment of the present invention, the dividing point determining module 803 may be further configured to:

judging whether a blank interval region exists between the central areas of adjacent characters according to the central area coordinates of each character in the text line image and the positions of the central areas of the characters in the vertical projection image of the text line image, wherein a blank interval region is a run of columns in the vertical projection image whose pixel-value sums are continuously 0;

in the case that a blank interval region exists between the central areas of the adjacent characters, selecting the center of the blank interval region closest to the central area of the left character as the segmentation point between the adjacent characters;

and in the case that no blank interval region exists between the central areas of the adjacent characters, selecting the column that is closest to the central area of the left character and has the smallest pixel-value sum as the segmentation point between the adjacent characters.

In the technical solution of the embodiments of the present invention, the central region coordinates of each character in the text line image are obtained using a deep learning network; image processing is performed on the text line image to obtain the boundary of the text line; the segmentation point between adjacent characters is determined according to the central region coordinates of each character and the position of each character's central region in the vertical projection image of the text line image; and text character segmentation is performed according to the boundary of the text line and the segmentation points between adjacent characters. By combining a deep learning network with image processing technology to determine the segmentation points between adjacent characters, the solution achieves precise character segmentation with accurate results, which improves the accuracy of character recognition, increases the reliability of OCR results, effectively replaces manual operation, and saves labor and time costs.

Fig. 9 shows an exemplary system architecture 900 to which the method or apparatus of text character segmentation of an embodiment of the present invention may be applied.

As shown in fig. 9, the system architecture 900 may include terminal devices 901, 902, 903, a network 904, and a server 905. The network 904 is the medium used to provide communication links between the terminal devices 901, 902, 903 and the server 905. The network 904 may include various connection types, such as wired or wireless communication links, or fiber optic cables.

A user may use the terminal devices 901, 902, 903 to interact with the server 905 over the network 904 to receive or send messages and the like. Various communication client applications may be installed on the terminal devices 901, 902, 903, such as a text recognition application, a character segmentation application, a picture processing application, a text processing application, a mailbox client, social platform software, and the like (for example only).

The terminal devices 901, 902, 903 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smart phones, tablet computers, laptop portable computers, desktop computers, and the like.

The server 905 may be a server providing various services, such as a background management server (for example only) providing support for character segmentation requests sent by users with the terminal devices 901, 902, 903. The background management server may perform deep learning network processing and image processing on received data such as text line images, determine the segmentation points between adjacent characters, perform processes such as text character segmentation, and feed the processing results (e.g., the center region coordinates of each character, the boundary of the text line, and the character segmentation result, for example only) back to the terminal devices.

It should be noted that the method for segmenting text characters provided by the embodiment of the present invention is generally executed by the server 905, and accordingly, the apparatus for segmenting text characters is generally disposed in the server 905.

It should be understood that the number of terminal devices, networks, and servers in fig. 9 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.

Referring now to FIG. 10, shown is a block diagram of a computer system 1000 suitable for implementing a terminal device or a server of an embodiment of the present invention. The terminal device or server shown in fig. 10 is only an example, and should not impose any limitation on the functions and scope of use of the embodiments of the present invention.

As shown in fig. 10, the computer system 1000 includes a Central Processing Unit (CPU) 1001 that can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 1002 or a program loaded from a storage section 1008 into a Random Access Memory (RAM) 1003. In the RAM 1003, various programs and data necessary for the operation of the system 1000 are also stored. The CPU 1001, the ROM 1002, and the RAM 1003 are connected to each other via a bus 1004. An input/output (I/O) interface 1005 is also connected to the bus 1004.

The following components are connected to the I/O interface 1005: an input section 1006 including a keyboard, a mouse, and the like; an output section 1007 including a display such as a Cathode Ray Tube (CRT) or a Liquid Crystal Display (LCD), a speaker, and the like; a storage section 1008 including a hard disk and the like; and a communication section 1009 including a network interface card such as a LAN card, a modem, or the like. The communication section 1009 performs communication processing via a network such as the Internet. A drive 1010 is also connected to the I/O interface 1005 as necessary. A removable medium 1011, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 1010 as necessary, so that a computer program read out therefrom is installed into the storage section 1008 as necessary.

In particular, according to the embodiments of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flowchart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 1009 and/or installed from the removable medium 1011. When executed by the Central Processing Unit (CPU) 1001, the computer program performs the above-described functions defined in the system of the present invention.

It should be noted that the computer readable medium shown in the present invention can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present invention, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present invention, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.

The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

The units or modules described in the embodiments of the present invention may be implemented by software, or may be implemented by hardware. The described units or modules may also be provided in a processor, and may be described as: a processor includes a first processing module, a second processing module, a segmentation point determination module, and a character segmentation module. The names of these units or modules do not, in some cases, constitute a limitation on the units or modules themselves; for example, the first processing module may also be described as "a module for acquiring the center region coordinates of each character in the text line image using a deep learning network".

As another aspect, the present invention also provides a computer-readable medium, which may be contained in the apparatus described in the above embodiments, or may exist separately without being assembled into the apparatus. The computer readable medium carries one or more programs which, when executed by the apparatus, cause the apparatus to: obtain the center region coordinates of each character in the text line image by using a deep learning network; perform image processing on the text line image to acquire the boundary of the text line; determine the segmentation point between adjacent characters according to the center region coordinates of each character in the text line image and the position of each character's center region in the vertical projection image of the text line image; and perform text character segmentation according to the boundary of the text line and the segmentation points between adjacent characters.

The above-described embodiments should not be construed as limiting the scope of the invention. Those skilled in the art will appreciate that various modifications, combinations, sub-combinations, and substitutions can occur, depending on design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.
