Text detection method, device, medium and electronic equipment based on image processing

Document No.: 1521505  Publication date: 2020-02-11

Reading note: This technology, "Text detection method, device, medium and electronic equipment based on image processing" (基于图像处理的文本检测方法、装置、介质及电子设备), was designed and created by 张秋晖, 刘岩, 朱兴杰 and 丁笑天 on 2019-10-24. Abstract: An embodiment of the present invention provides a text detection method, device, medium and electronic equipment based on image processing. The method includes: obtaining the text inclination angle of an image to be detected through the Hough transform, and rotating the image to be detected by the text inclination angle to obtain a tilt-corrected image; inputting the tilt-corrected image into a pre-trained text region detection model to obtain text candidate regions in the tilt-corrected image; determining one or more text candidate images corresponding to the image to be detected based on the text candidate regions, and graying each text candidate image; determining a binarization segmentation threshold according to the gray value of each pixel point in the grayed text candidate image, and binarizing the text candidate image with the binarization segmentation threshold; and recognizing the character information in the binarized text candidate image. The method can improve character recognition efficiency and accuracy.

1. A text detection method based on image processing is characterized by comprising the following steps:

acquiring a text inclination angle of an image to be detected through Hough transform, and rotating the image to be detected by utilizing the text inclination angle to obtain an inclination correction image;

inputting the inclination correction image into a pre-trained text region detection model to obtain a text region to be selected in the inclination correction image;

determining one or more text images to be selected corresponding to the images to be detected based on the text areas to be selected, and carrying out gray processing on each text image to be selected;

determining a binarization segmentation threshold value according to the gray value of each pixel point in the text image to be selected after the graying processing, and performing binarization processing on the text image to be selected by using the binarization segmentation threshold value;

and identifying character information in the text image to be selected after binarization processing.

2. The method for detecting text based on image processing according to claim 1, wherein the obtaining of the text inclination angle of the image to be detected through hough transform comprises:

carrying out binarization processing on an image to be detected to obtain a binarized image, and mapping pixel coordinates of each pixel point in the binarized image from a rectangular coordinate space to a polar coordinate space;

traversing target pixel points in the binary image, and calculating coordinate values of the target pixel points in the polar coordinate space by using a target function to determine polar coordinate point count values falling into each space grid in the polar coordinate space;

determining the spatial grid of which the polar coordinate point count value is greater than a preset threshold value as a target spatial grid so as to determine an inclination correction straight line corresponding to the target spatial grid in the binary image;

and determining the text inclination angle of the image to be detected according to the inclination angle of the inclination correction straight line.

3. The image-processing-based text detection method according to claim 2, wherein said traversing a target pixel point in the binarized image, and calculating a coordinate value of the target pixel point in the polar coordinate space using an objective function to determine a polar coordinate point count value falling within each spatial grid in the polar coordinate space, comprises:

determining the pixel value of a pixel point corresponding to the text position in the binary image as a target pixel value;

dividing the polar coordinate space by taking a preset length and a preset angle as intervals to obtain a plurality of spatial grids;

traversing target pixel points with target pixel values in the binary image to obtain coordinate values of the target pixel points in a rectangular coordinate space;

calculating the coordinate value of each polar coordinate point of the target pixel point in a polar coordinate space by using a target function according to the coordinate value of the target pixel point in a rectangular coordinate space;

and determining the count value of the polar coordinate point falling into each space grid in the polar coordinate space according to the coordinate value of each polar coordinate point in the polar coordinate space.

4. The image-processing-based text detection method of claim 1, wherein the text region detection model is a convolutional neural network model having a plurality of convolutional pooling units.

5. The image-processing-based text detection method according to claim 4, wherein the text region detection model includes a first convolution pooling unit, a second convolution pooling unit, a third convolution pooling unit, a fourth convolution pooling unit, and a fifth convolution pooling unit connected in sequence;

the first convolution pooling unit includes a first convolution layer and a first pooling layer;

the second convolution pooling unit comprises two second convolution layers and one second pooling layer;

the third convolution pooling unit comprises three third convolution layers and a third pooling layer;

the fourth convolution pooling unit comprises three fourth convolution layers and one fourth pooling layer;

the fifth convolution pooling unit includes three fifth convolution layers and one fifth pooling layer.

6. The image-processing-based text detection method according to claim 5, wherein inputting the tilt-corrected image into a pre-trained text region detection model to obtain a text candidate region in the tilt-corrected image comprises:

inputting the inclination correction image into a pre-trained text region detection model;

carrying out convolution pooling on the inclination correction image by each convolution pooling unit of the text region detection model in sequence;

and obtaining feature maps output by the third convolution pooling unit, the fourth convolution pooling unit and the fifth convolution pooling unit, and determining a text candidate area in the tilt correction image according to the feature maps.

7. The method for detecting the text based on the image processing according to any one of claims 1 to 6, wherein the determining a binarization segmentation threshold according to the gray value of each pixel point in the grayed text image to be selected and performing binarization processing on the text image to be selected by using the binarization segmentation threshold comprises:

performing convolution processing on the text image to be selected after the graying processing by using a dilation convolution kernel to obtain a dilated image;

carrying out convolution processing on the dilated image by utilizing an erosion convolution kernel to obtain an eroded image;

counting the gray value of each pixel point in the eroded image to obtain a binarization segmentation threshold value, and determining a binarization threshold value range according to the binarization segmentation threshold value;

determining pixel points of the eroded image with the gray value within the binarization threshold range as first pixel points, and determining pixel points of the eroded image with the gray value outside the binarization threshold range as second pixel points;

and setting the pixel value of the first pixel point as a first pixel value and setting the pixel value of the second pixel point as a second pixel value.

8. A text detection apparatus based on image processing, comprising:

the inclination correction module is configured to obtain a text inclination angle of an image to be detected through Hough transformation, and rotate the image to be detected by utilizing the text inclination angle to obtain an inclination correction image;

the region detection module is configured to input the inclination correction image into a pre-trained text region detection model to obtain a text candidate region in the inclination correction image;

the image determining module is configured to determine one or more text images to be selected corresponding to the images to be detected based on the text areas to be selected, and perform graying processing on each text image to be selected;

the binarization processing module is configured to determine a binarization segmentation threshold value according to the gray value of each pixel point in the text image to be selected after the graying processing, and carry out binarization processing on the text image to be selected by using the binarization segmentation threshold value;

and the character recognition module is configured to recognize character information in the text image to be selected after binarization processing.

9. A computer-readable medium, on which a computer program is stored, which, when being executed by a processor, carries out the text detection method based on image processing of any one of claims 1 to 7.

10. An electronic device, comprising:

one or more processors;

storage means for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to carry out the text detection method based on image processing of any one of claims 1 to 7.

Technical Field

The present invention relates to the field of computer technologies, and in particular, to a text detection method based on image processing, a text detection device based on image processing, a computer-readable medium, and an electronic device.

Background

With the development of computer vision and deep neural networks, text recognition technology has advanced greatly. It can be used to recognize certificates such as identity cards and has broad application prospects in bill recognition. In practical applications, however, a certain degree of image distortion is introduced during image acquisition, processing, transmission and other links, so the application range of bill recognition is narrow, its accuracy is low, and it cannot fully match manual processing. It is therefore of great significance to establish an effective image processing technique, applying suitable image processing means to improve the accuracy of image recognition.

Existing character recognition methods fall mainly into traditional image algorithms and machine-learning (neural network) methods. Traditional image algorithms (such as threshold segmentation and straight-line detection) achieve a good segmentation effect when the content format is relatively fixed and the picture is clear, but they cannot be applied to pictures with complex formats or blurry pictures. Neural network methods currently rely mainly on convolutional network frameworks such as CTPN, SSD and EAST; they achieve good results on invoice detection with complex formats, but the detection effect depends heavily on the training sample data, and the detected text area is usually larger than the actual area, which limits their practical application.

It is to be noted that the information disclosed in the above background section is only for enhancement of understanding of the background of the present invention and therefore may include information that does not constitute prior art known to a person of ordinary skill in the art.

Disclosure of Invention

Embodiments of the present invention provide a text detection method based on image processing, a text detection device based on image processing, a computer readable medium, and an electronic device, so as to overcome technical problems of low text recognition efficiency, poor recognition accuracy, and the like, caused by defects and limitations of related technologies, at least to a certain extent.

Additional features and advantages of the invention will be set forth in the detailed description which follows, or may be learned by practice of the invention.

According to a first aspect of the embodiments of the present invention, there is provided a text detection method based on image processing, including:

acquiring a text inclination angle of an image to be detected through Hough transform, and rotating the image to be detected by utilizing the text inclination angle to obtain an inclination correction image;

inputting the inclination correction image into a pre-trained text region detection model to obtain a text region to be selected in the inclination correction image;

determining one or more text images to be selected corresponding to the images to be detected based on the text areas to be selected, and carrying out gray processing on each text image to be selected;

determining a binarization segmentation threshold value according to the gray value of each pixel point in the text image to be selected after the graying processing, and performing binarization processing on the text image to be selected by using the binarization segmentation threshold value;

and identifying character information in the text image to be selected after binarization processing.

In some embodiments of the present invention, based on the above technical solutions, the obtaining of the text inclination angle of the image to be detected through hough transform includes:

carrying out binarization processing on an image to be detected to obtain a binarized image, and mapping pixel coordinates of each pixel point in the binarized image from a rectangular coordinate space to a polar coordinate space;

traversing target pixel points in the binary image, and calculating coordinate values of the target pixel points in the polar coordinate space by using a target function to determine polar coordinate point count values falling into each space grid in the polar coordinate space;

determining the spatial grid of which the polar coordinate point count value is greater than a preset threshold value as a target spatial grid so as to determine an inclination correction straight line corresponding to the target spatial grid in the binary image;

and determining the text inclination angle of the image to be detected according to the inclination angle of the inclination correction straight line.

In some embodiments of the present invention, based on the above technical solution, the traversing the target pixel point in the binarized image, and calculating the coordinate value of the target pixel point in the polar coordinate space by using a target function to determine the polar coordinate point count value falling into each spatial grid in the polar coordinate space includes:

determining the pixel value of a pixel point corresponding to the text position in the binary image as a target pixel value;

dividing the polar coordinate space by taking a preset length and a preset angle as intervals to obtain a plurality of spatial grids;

traversing target pixel points with target pixel values in the binary image to obtain coordinate values of the target pixel points in a rectangular coordinate space;

calculating the coordinate value of each polar coordinate point of the target pixel point in a polar coordinate space by using a target function according to the coordinate value of the target pixel point in a rectangular coordinate space;

and determining the count value of the polar coordinate point falling into each space grid in the polar coordinate space according to the coordinate value of each polar coordinate point in the polar coordinate space.

In some embodiments of the present invention, based on the above technical solution, the text region detection model is a convolutional neural network model having a plurality of convolutional pooling units.

In some embodiments of the present invention, based on the above technical solution, the text region detection model includes a first convolution pooling unit, a second convolution pooling unit, a third convolution pooling unit, a fourth convolution pooling unit, and a fifth convolution pooling unit, which are connected in sequence;

the first convolution pooling unit includes a first convolution layer and a first pooling layer;

the second convolution pooling unit comprises two second convolution layers and one second pooling layer;

the third convolution pooling unit comprises three third convolution layers and a third pooling layer;

the fourth convolution pooling unit comprises three fourth convolution layers and one fourth pooling layer;

the fifth convolution pooling unit includes three fifth convolution layers and one fifth pooling layer.

In some embodiments of the present invention, based on the above technical solution, inputting the tilt-corrected image into a pre-trained text region detection model to obtain a text candidate region in the tilt-corrected image, including:

inputting the inclination correction image into a pre-trained text region detection model;

carrying out convolution pooling on the inclination correction image by each convolution pooling unit of the text region detection model in sequence;

and obtaining feature maps output by the third convolution pooling unit, the fourth convolution pooling unit and the fifth convolution pooling unit, and determining a text candidate area in the tilt correction image according to the feature maps.

In some embodiments of the present invention, based on the above technical solutions, the determining a binarization segmentation threshold according to a gray value of each pixel point in the grayed text image to be selected, and performing binarization processing on the text image to be selected by using the binarization segmentation threshold includes:

performing convolution processing on the text image to be selected after the graying processing by using a dilation convolution kernel to obtain a dilated image;

carrying out convolution processing on the dilated image by utilizing an erosion convolution kernel to obtain an eroded image;

counting the gray value of each pixel point in the eroded image to obtain a binarization segmentation threshold value, and determining a binarization threshold value range according to the binarization segmentation threshold value;

determining pixel points of the eroded image with the gray value within the binarization threshold range as first pixel points, and determining pixel points of the eroded image with the gray value outside the binarization threshold range as second pixel points;

and setting the pixel value of the first pixel point as a first pixel value and setting the pixel value of the second pixel point as a second pixel value.

According to a second aspect of the embodiments of the present invention, there is provided a text detection apparatus based on image processing, including:

the inclination correction module is configured to obtain a text inclination angle of an image to be detected through Hough transformation, and rotate the image to be detected by utilizing the text inclination angle to obtain an inclination correction image;

the region detection module is configured to input the inclination correction image into a pre-trained text region detection model to obtain a text candidate region in the inclination correction image;

the image determining module is configured to determine one or more text images to be selected corresponding to the images to be detected based on the text areas to be selected, and perform graying processing on each text image to be selected;

the binarization processing module is configured to determine a binarization segmentation threshold value according to the gray value of each pixel point in the text image to be selected after the graying processing, and carry out binarization processing on the text image to be selected by using the binarization segmentation threshold value;

and the character recognition module is configured to recognize character information in the text image to be selected after binarization processing.

According to a third aspect of embodiments of the present invention, there is provided a computer-readable medium, on which a computer program is stored, which when executed by a processor, implements the image processing-based text detection method as described in the first aspect of the embodiments above.

According to a fourth aspect of embodiments of the present invention, there is provided an electronic apparatus, including: one or more processors; a storage device for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the image processing-based text detection method as described in the first aspect of the embodiments above.

The technical scheme provided by the embodiment of the invention has the following beneficial effects:

in the technical scheme provided by some embodiments of the invention, the text region in the image to be detected is accurately positioned, and then the text candidate image obtained by positioning is utilized to perform character recognition, so that the influence of other elements in the image to be detected on the character recognition effect can be reduced, and the character recognition efficiency and accuracy can be improved. In addition, the embodiment of the invention can also reduce the manual checking time, improve the checking efficiency and reduce the workload of manual checking, thereby saving a large amount of labor cost.

It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention, as claimed.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and together with the description, serve to explain the principles of the invention. It is obvious that the drawings in the following description are only some embodiments of the invention, and that for a person skilled in the art, other drawings can be derived from them without inventive effort. In the drawings:

FIG. 1 schematically illustrates a flow chart of steps of a text detection method based on image processing in an embodiment of the present invention;

FIG. 2 is a flow chart schematically illustrating the steps of detecting the tilt angle of a text in an embodiment of the present invention;

FIG. 3 schematically illustrates a flow chart of steps for determining polar coordinate point count values within respective spatial grids in an embodiment of the present invention;

FIG. 4 is a flow chart that schematically illustrates steps for obtaining a text candidate area in an embodiment of the present invention;

FIG. 5 is a flow chart schematically illustrating the steps of binarization processing of a candidate image of a text in the embodiment of the present invention;

FIG. 6 is a block diagram schematically illustrating a structure of a text detection apparatus based on image processing according to an embodiment of the present invention;

FIG. 7 illustrates a schematic structural diagram of a computer system suitable for use with the electronic device to implement an embodiment of the invention.

Detailed Description

Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments may, however, be embodied in many different forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of example embodiments to those skilled in the art.

Furthermore, the described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to provide a thorough understanding of embodiments of the invention. One skilled in the relevant art will recognize, however, that the invention may be practiced without one or more of the specific details, or with other methods, components, devices, steps, and so forth. In other instances, well-known methods, devices, implementations or operations have not been shown or described in detail to avoid obscuring aspects of the invention.

The block diagrams shown in the figures are functional entities only and do not necessarily correspond to physically separate entities. I.e. these functional entities may be implemented in the form of software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor means and/or microcontroller means.

The flow charts shown in the drawings are merely illustrative and do not necessarily include all of the contents and operations/steps, nor do they necessarily have to be performed in the order described. For example, some operations/steps may be decomposed, and some operations/steps may be combined or partially combined, so that the actual execution sequence may be changed according to the actual situation.

In the field of insurance claim settlement, when a customer applies for a claim, paper documents such as insurance policies and medical bills need to be uploaded to the insurance company's background database and manually checked and entered. Certain distortion and blurring may be introduced while the image data is captured, transmitted and stored, so manual information checking suffers from low efficiency and a high error rate.

The method combines a deep network with traditional image processing techniques. Its main principle is to first locate the orientation of the characters in the image according to the Hough transform and perform tilt correction of the image, then use a deep neural network to obtain the candidate character regions, and finally position the characters accurately with an image thresholding method. The method combines the strong generalization of the deep neural network with the accuracy of the threshold segmentation method, so it can improve the accuracy of character region positioning to a certain extent and thereby improve the efficiency and accuracy of text recognition.

Fig. 1 schematically shows a flow chart of steps of a text detection method based on image processing in an embodiment of the present invention. As shown in fig. 1, the method may mainly include the following steps:

and S110, obtaining a text inclination angle of the image to be detected through Hough transformation, and rotating the image to be detected by utilizing the text inclination angle to obtain an inclination correction image.

This step first finds the orientation of the characters in the image to be detected by the Hough transform to obtain a text inclination angle, and then rotates the image to be detected by the text inclination angle to obtain a tilt-corrected image. Optionally, before the tilt correction is performed, the image to be detected may be preprocessed; for example, the image to be detected based on the RGB color space may be sharpened, denoised, and so on.
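A minimal sketch of this tilt-correction step, assuming OpenCV and NumPy are available; the Otsu binarization, the Hough parameters, the median angle pick, and the rotation sign convention are illustrative assumptions rather than the exact choices of the embodiment:

```python
import cv2
import numpy as np

def correct_tilt(image_bgr, angle_limit_deg=30):
    """Estimate the dominant text angle with a Hough transform and rotate to correct it."""
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    # Binarize so that text pixels become foreground votes for the Hough transform.
    _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
    lines = cv2.HoughLines(binary, rho=1, theta=np.pi / 180, threshold=200)
    angle_deg = 0.0
    if lines is not None:
        # theta is the angle of the line normal; theta - 90 deg is the deviation from horizontal.
        angles = [np.degrees(theta) - 90 for rho, theta in lines[:, 0]]
        angles = [a for a in angles if abs(a) <= angle_limit_deg]
        if angles:
            angle_deg = float(np.median(angles))  # robust pick of the text inclination angle
    h, w = image_bgr.shape[:2]
    # Depending on the angle convention the sign of the rotation may need to be flipped.
    rot = cv2.getRotationMatrix2D((w / 2, h / 2), angle_deg, 1.0)
    return cv2.warpAffine(image_bgr, rot, (w, h), flags=cv2.INTER_LINEAR,
                          borderMode=cv2.BORDER_REPLICATE)
```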

And S120, inputting the inclination correction image into a pre-trained text region detection model to obtain a text candidate region in the inclination correction image.

In order to accurately position the text position in the image, the embodiment of the invention can pre-construct the text region detection model, and train the text region detection model by using the data marked with the character position so as to improve the capability of the model for accurately identifying and acquiring the text candidate region in the image. The tilt-corrected image obtained in step S110 is input to the trained text region detection model, and the text region candidate in the tilt-corrected image may be output by the text region detection model. For a tilt-corrected image, the step may obtain one or more text candidate regions from the output of the text region detection model, specifically related to the distribution of characters in the image to be detected.

Step S130, one or more text candidate images corresponding to the images to be detected are determined based on the text candidate areas, and graying processing is carried out on each text candidate image.

According to the text candidate areas determined in step S120, in this step the image to be detected may be segmented, and one or more text candidate images corresponding to the text candidate areas are cut out of the image to be detected. Generally, a text candidate area determined in step S120 is a rectangular area, and this step may use the four vertex coordinates of the rectangular area to segment the text candidate image from the image to be detected. After the text candidate images are obtained, this step also grays each text candidate image so that character recognition can be performed subsequently. Optionally, after multiple text candidate areas are obtained in step S120, the text candidate areas may be screened in this step. For example, this step may count the heights of all the candidate text regions and calculate their average value, and then determine a height filtering range according to the height average, for example 0.7-1.2 times the height average. Screening the text candidate regions with this height range filters out candidate regions that are excessively large or small, which improves the accuracy of text positioning and avoids text candidate regions that contain or overlap one another.
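A sketch of how the candidate regions might be cropped, height-filtered, and grayed, under the assumption that each region is an axis-aligned rectangle given as (x, y, w, h); the 0.7-1.2 height window follows the example above:

```python
import cv2

def crop_text_candidates(image, boxes, low=0.7, high=1.2):
    """boxes: list of (x, y, w, h) rectangles returned by the text region detection model."""
    if not boxes:
        return []
    mean_h = sum(h for _, _, _, h in boxes) / len(boxes)
    kept = [b for b in boxes if low * mean_h <= b[3] <= high * mean_h]  # drop outlier heights
    crops = []
    for x, y, w, h in kept:
        region = image[y:y + h, x:x + w]
        crops.append(cv2.cvtColor(region, cv2.COLOR_BGR2GRAY))  # graying for later binarization
    return crops
```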

And S140, determining a binarization segmentation threshold value according to the gray value of each pixel point in the text image to be selected after the graying processing, and performing binarization processing on the text image to be selected by using the binarization segmentation threshold value.

Most of the area in the text candidate image is occupied by the text to be recognized, so a binarization segmentation threshold can be determined according to the gray value of each pixel point in the grayed text candidate image. With this binarization segmentation threshold, the pixel points in the text candidate image can be divided into two classes and then binarized to obtain a corresponding binarized image. Alternatively, the binarization segmentation threshold used for binarization in this step may correspond to a single value or to a numerical range.

And S150, identifying character information in the text image to be selected after binarization processing.

After the binarization processing in step S140, the character information in the binarized text candidate image can be recognized in this step, thereby completing text detection and character recognition for the image to be detected. The character recognition in this step may specifically adopt Optical Character Recognition (OCR) or any other text recognition technology.
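For the recognition step itself any OCR engine can be plugged in; a sketch with the pytesseract wrapper, which is an assumption for illustration and not a component named by the embodiment:

```python
import pytesseract

def recognize_text(binary_crops, lang="chi_sim+eng"):
    """Run OCR over each binarized text candidate image and return the recognized strings."""
    return [pytesseract.image_to_string(crop, lang=lang).strip() for crop in binary_crops]
```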

The text detection method based on image processing provided by the embodiment of the invention can accurately position the text region in the image to be detected, and then perform character recognition by using the positioned image to be selected of the text, thereby reducing the influence of other elements in the image to be detected on the character recognition effect, and improving the character recognition efficiency and accuracy. In addition, the embodiment of the invention can also reduce the manual checking time, improve the checking efficiency and reduce the workload of manual checking, thereby saving a large amount of labor cost.

The method adopts the idea of the Hough transform to detect the direction of the characters in the image and perform inclination correction, which gives a better effect than traditional methods that detect the characters in the image directly. Fig. 2 schematically shows a flowchart of the steps of detecting the tilt angle of the text in the embodiment of the present invention. As shown in fig. 2, on the basis of the above embodiment, the obtaining of the text inclination angle of the image to be detected through the Hough transform in step S110 may include the following steps:

and S210, carrying out binarization processing on the image to be detected to obtain a binarized image, and mapping the pixel coordinates of each pixel point in the binarized image from a rectangular coordinate space to a polar coordinate space.

The image to be detected is generally a color image based on the RGB color space. This step first binarizes the image to be detected to obtain a binarized image: the original image to be detected is converted into a gray image, and the gray image is then binarized. Binarization sets the gray value of each pixel point to 0 or 255, so that the whole image shows an obvious visual effect containing only black and white. Each pixel point of the binarized image can be represented as a coordinate point in a rectangular coordinate space, and by establishing a mapping between the rectangular coordinate space and a polar coordinate space, the pixel coordinates of each pixel point in the binarized image are converted from rectangular coordinates to polar coordinates. For example, a pixel point with coordinate value (x, y) in the rectangular coordinate space has coordinate value (λ, μ) in the polar coordinate space; the value ranges of λ and μ are given as formula images in the original and are not reproduced here, where len denotes the size of the binarized image, for example the length of a rectangular image.
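For the standard Hough parameterization λ = x·cos μ + y·sin μ used in the following steps, one consistent choice of ranges for an image of size len is the following; this is an assumption for illustration, not the ranges given in the original formula images:

```latex
\lambda \in \bigl[-\sqrt{2}\,\mathrm{len},\ \sqrt{2}\,\mathrm{len}\bigr],
\qquad
\mu \in \Bigl[-\tfrac{\pi}{2},\ \tfrac{\pi}{2}\Bigr]
```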

Step S220, traversing target pixel points in the binary image, and calculating coordinate values of the target pixel points in the polar coordinate space by using a target function so as to determine polar coordinate point count values falling into each space grid in the polar coordinate space.

Based on the specified value ranges of λ and μ, a plurality of spatial grids for uniform discretized sampling can be determined in the polar coordinate space. According to the Hough transform principle, one coordinate point in the original rectangular coordinate space corresponds to one curve in the polar coordinate space. In this step, the target pixel points with the target pixel value in the binarized image are traversed, and the polar coordinate point count values falling into each spatial grid in the polar coordinate space can then be determined according to the position coordinates of the target pixel points.

And step S230, determining the spatial grid with the polar coordinate point count value larger than a preset threshold value as a target spatial grid so as to determine an inclination correction straight line corresponding to the target spatial grid in the binary image.

By counting the number of polar coordinate points falling into each spatial grid, the spatial grids whose count value is larger than the preset threshold can be determined as target spatial grids. According to the Hough transform principle, a coordinate point in the polar coordinate space corresponds to a straight line in the original rectangular coordinate space. In this step, each target spatial grid therefore determines one or a group of inclination correction straight lines corresponding to that grid in the binarized image.

And S240, determining the text inclination angle of the image to be detected according to the inclination angle of the inclination correction straight line.

The tilt correction straight lines obtained in step S230 roughly characterize the arrangement direction of the characters in the image to be detected, so the text inclination angle of the image to be detected can be determined from the inclination angles of these lines. Optionally, the inclination correction straight lines may be screened: for example, the straight lines whose inclination angles lie within a preset range (e.g., between -30 degrees and 30 degrees) may be determined as target straight lines, the most frequent inclination angle among the target straight lines is then selected, and that angle is used as the text inclination angle of the image to be detected.
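A small illustration of this selection rule; the ±30 degree window and the "most frequent angle" criterion follow the example above, while rounding to whole degrees for counting is an assumption:

```python
from collections import Counter

def pick_text_angle(line_angles_deg, limit=30):
    """Keep tilt-correction lines within +/-limit degrees and return the most frequent angle."""
    candidates = [round(a) for a in line_angles_deg if -limit <= a <= limit]
    return Counter(candidates).most_common(1)[0][0] if candidates else 0
```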

FIG. 3 schematically illustrates a flow chart of steps for determining polar coordinate point count values within respective spatial grids in an embodiment of the present invention. As shown in fig. 3, on the basis of the above embodiment, step S220, traversing the target pixel points in the binarized image and calculating the coordinate values of the target pixel points in the polar coordinate space by using an objective function to determine the polar coordinate point count value falling into each spatial grid in the polar coordinate space, may include the following steps:

and S310, determining the pixel value of the pixel point corresponding to the text position in the binary image as a target pixel value.

The pixel value of each pixel point in the binarized image can only have two values, and taking a black-and-white image as an example, the pixel value of each pixel point can only be 0 or 255. If the pixel value of the pixel point corresponding to the text position in the binarized image is 0, 0 may be taken as the target pixel value. If the pixel value of the pixel point corresponding to the text position in the binarized image is 255, 255 may be taken as the target pixel value.

And S320, dividing the polar coordinate space by taking the preset length and the preset angle as intervals to obtain a plurality of space grids.

The whole polar coordinate space can be divided into a plurality of spatial grids by taking the preset length and the preset angle as the sampling intervals of the two polar coordinate parameters. For example, the preset length may be 0.02 × len, and the preset angle may be π/30. For the polar coordinate parameters (λ, μ), the space can be divided with λ sampled at intervals of 0.02 × len and μ sampled at intervals of π/30, and each spatial grid determined by this division serves as a sampling node, so that uniform discretized sampling is realized.

And S330, traversing target pixel points with target pixel values in the binary image to acquire coordinate values of all the target pixel points in a rectangular coordinate space.

And traversing each pixel point in the binary image, determining a target pixel point with a target pixel value by acquiring the pixel value of each pixel point, and acquiring the coordinate value of each target pixel point in a rectangular coordinate space. Each target pixel point can be regarded as a pixel point corresponding to the position of the text in the image to be detected.

And S340, calculating the coordinate value of each polar coordinate point of the target pixel point in the polar coordinate space by using a target function according to the coordinate value of the target pixel point in the rectangular coordinate space.

Each target pixel point of the binarized image, located in the rectangular coordinate space, corresponds to a curve in the polar coordinate space. For example, if the coordinate value of a certain target pixel point in the rectangular coordinate space is (x₀, y₀), then the curve equation determined by this target pixel point in the polar coordinate space is λ = x₀·cos μ + y₀·sin μ. This curve equation serves as the objective function, that is, it represents the coordinate values of each polar coordinate point corresponding to the target pixel point in the polar coordinate space.

And S350, determining the polar coordinate point count value falling into each space grid in the polar coordinate space according to the coordinate value of each polar coordinate point in the polar coordinate space.

By sampling the curve corresponding to each target pixel point with the spatial grids obtained by the division in step S320, the polar coordinate point count value falling within each spatial grid in the polar coordinate space can be determined. Coordinate points lying on the same straight line in the rectangular coordinate space correspond to a set of curves in the polar coordinate space, and these curves share a common intersection point, so the spatial grid in which the intersection point is located will have a relatively high polar coordinate point count value. In other words, a spatial grid with a higher polar coordinate point count value corresponds to a straight line that determines the direction in which characters are arranged in the rectangular coordinate space. The inclination angle of the straight line determined in this way can subsequently be used to perform tilt correction on the image to be detected, thereby improving the recognition efficiency and accuracy of the subsequent character recognition process.
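A sketch of the voting procedure in steps S310 to S350, assuming a square binarized image of side len, text pixels equal to 0, and the grid spacings 0.02·len and π/30 from the example above; the λ and μ ranges are assumptions, since the original gives them only as formula images:

```python
import numpy as np

def hough_accumulator(binary, target_value=0):
    """Vote each text pixel's curve lambda = x*cos(mu) + y*sin(mu) into a (lambda, mu) grid."""
    length = max(binary.shape)                       # len: size of the binarized image
    d_lambda, d_mu = 0.02 * length, np.pi / 30       # preset length and preset angle intervals
    mus = np.arange(-np.pi / 2, np.pi / 2, d_mu)     # assumed mu range
    lambda_max = np.sqrt(2) * length                 # assumed lambda range: [-sqrt(2)*len, sqrt(2)*len]
    n_lambda = int(np.ceil(2 * lambda_max / d_lambda))
    acc = np.zeros((n_lambda, len(mus)), dtype=np.int32)
    ys, xs = np.nonzero(binary == target_value)      # target pixel points at the text positions
    for x, y in zip(xs, ys):
        lambdas = x * np.cos(mus) + y * np.sin(mus)  # objective function for this pixel point
        rows = ((lambdas + lambda_max) / d_lambda).astype(int)
        acc[rows, np.arange(len(mus))] += 1          # one polar coordinate point per spatial grid
    return acc, mus, d_lambda, lambda_max
```

Grids whose accumulated count exceeds the preset threshold then give the inclination correction straight lines described in step S230.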

In some embodiments of the present invention, the text region detection model for text region detection of the image to be detected may be a convolutional neural network model having a plurality of convolutional pooling units. Preferably, the text region detection model in the embodiment of the present invention may include a first convolution pooling unit, a second convolution pooling unit, a third convolution pooling unit, a fourth convolution pooling unit, and a fifth convolution pooling unit, which are connected in sequence.

The first convolution pooling unit comprises a first convolution layer and a first pooling layer; for example, the first convolutional layer may perform convolutional processing on input data using 64 convolutional kernels of 3 × 3, and the first pooling layer may perform pooling on the convolved data using maximum pooling.

The second convolution pooling unit comprises two second convolution layers and a second pooling layer; for example, two second convolution layers may each perform convolution processing on data output by the first convolution pooling unit using 128 convolution kernels of 3 × 3, and the second pooling layer may perform pooling processing on the convolved data using maximum pooling.

The third convolution pooling unit comprises three third convolution layers and a third pooling layer; for example, the first two third convolutional layers may perform convolution processing on the data output by the second convolution pooling unit using 256 convolution kernels of 3 × 3, the remaining third convolutional layer may continue the convolution processing using 256 convolution kernels of 1 × 1, and the third pooling layer may perform pooling on the convolved data using maximum pooling.

The fourth convolution pooling unit comprises three fourth convolution layers and one fourth pooling layer; for example, the first two fourth convolutional layers may perform convolution processing on the data output by the third convolution pooling unit by using 512 3 × 3 convolution kernels, the third fourth convolutional layer may continue convolution processing by using 512 1 × 1 convolution kernels, and the fourth pooling layer may perform pooling processing on the convolved data by using maximum pooling.

The fifth convolution pooling unit comprises three fifth convolution layers and one fifth pooling layer; for example, the first two fifth convolutional layers may perform convolution processing on the data output by the fourth convolution pooling unit by using 512 3 × 3 convolution kernels, the third fifth convolutional layer may continue convolution processing by using 512 1 × 1 convolution kernels, and the fifth pooling layer may perform pooling processing on the convolved data by using maximum pooling.
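The five convolution-pooling units described above follow a VGG-like layout; a sketch in PyTorch is given below. The ReLU activations, the padding choices, and returning the outputs of units 3-5 from forward (to match the next section) are assumptions about details the text leaves open:

```python
import torch
import torch.nn as nn

def _unit(in_ch, out_ch, n_conv, last_kernel=3):
    layers, ch = [], in_ch
    for i in range(n_conv):
        k = last_kernel if i == n_conv - 1 else 3          # last layer of units 3-5 uses 1x1 kernels
        layers += [nn.Conv2d(ch, out_ch, kernel_size=k, padding=k // 2), nn.ReLU(inplace=True)]
        ch = out_ch
    layers.append(nn.MaxPool2d(kernel_size=2, stride=2))   # max pooling closes each unit
    return nn.Sequential(*layers)

class TextRegionBackbone(nn.Module):
    """Five convolution-pooling units; forward returns the feature maps of units 3, 4 and 5."""
    def __init__(self):
        super().__init__()
        self.unit1 = _unit(3, 64, 1)                        # one 3x3 conv layer + pooling
        self.unit2 = _unit(64, 128, 2)                      # two 3x3 conv layers + pooling
        self.unit3 = _unit(128, 256, 3, last_kernel=1)      # two 3x3 + one 1x1 conv layers + pooling
        self.unit4 = _unit(256, 512, 3, last_kernel=1)
        self.unit5 = _unit(512, 512, 3, last_kernel=1)

    def forward(self, x):                                   # x: e.g. a 1x3x512x512 tilt-corrected image
        f1 = self.unit1(x)
        f2 = self.unit2(f1)
        f3 = self.unit3(f2)
        f4 = self.unit4(f3)
        f5 = self.unit5(f4)
        return f3, f4, f5
```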

By using the text region detection model provided by the embodiment of the invention, the deep-level features of the image to be detected can be extracted through multilayer convolution, so that the text candidate region is obtained. Fig. 4 schematically shows a flowchart of steps for obtaining a text candidate area in the embodiment of the present invention. As shown in fig. 4, on the basis of the above embodiment, step S120, inputting the tilt-corrected image into a pre-trained text region detection model to obtain a text candidate region in the tilt-corrected image, may include the following steps:

and S410, inputting the inclination correction image into a pre-trained text region detection model.

The tilt-corrected image is input as input data to a text region detection model trained in advance, and is subjected to detection processing by the text region detection model. The tilt-corrected image input to the text region detection model may be, for example, a 512-by-512 three-channel image. If the original tilt-corrected image does not conform to the specified image size, it may be resized in advance and then input to the text region detection model.

And S420, carrying out convolution pooling on the inclination correction image by each convolution pooling unit of the text area detection model in sequence.

The text region detection model comprises a plurality of convolution pooling units which are connected in sequence, and each convolution pooling unit can gradually acquire deep features of the inclination correction image along with deep convolution pooling.

And S430, obtaining feature maps output by the third convolution pooling unit, the fourth convolution pooling unit and the fifth convolution pooling unit, and determining a text candidate area in the inclination correction image according to the feature maps.

For a text region detection model with five convolution pooling units, the feature maps output by the third, fourth and fifth convolution pooling units can be extracted, and a sigmoid activation function is then used to regress the position of the circumscribed quadrilateral containing the characters in the tilt-corrected image, thereby determining the text candidate regions in the tilt-corrected image. Using several convolution pooling units as output layers takes into account that the candidate character regions in the tilt-corrected image differ in size, which improves the accuracy of determining the text candidate regions.
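A possible way to attach the sigmoid regression described here, predicting a text score and quadrilateral corner coordinates from each of the three feature maps; the 1x1 prediction convolutions and the 1 + 8 outputs per location are assumptions about details the text does not specify:

```python
import torch
import torch.nn as nn

class QuadRegressionHead(nn.Module):
    """Per-location text score + 8 normalized corner offsets, squashed with a sigmoid."""
    def __init__(self, channels=(256, 512, 512)):           # unit-3, unit-4 and unit-5 channels
        super().__init__()
        self.heads = nn.ModuleList(nn.Conv2d(c, 1 + 8, kernel_size=1) for c in channels)

    def forward(self, feature_maps):
        outputs = []
        for head, fmap in zip(self.heads, feature_maps):
            outputs.append(torch.sigmoid(head(fmap)))        # values in (0, 1): score + quad corners
        return outputs
```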

For the text candidate image obtained by segmenting the text candidate region, the embodiment of the present invention can correct the text candidate image by using a thresholding technique. Fig. 5 schematically shows a flowchart of the steps of binarizing a text candidate image in the embodiment of the invention. As shown in fig. 5, on the basis of the above embodiment, step S140, determining a binarization segmentation threshold according to the gray value of each pixel point in the grayed text image to be selected and binarizing the text image to be selected with the binarization segmentation threshold, may further include the following steps:

and S510, performing convolution processing on the text image to be selected after the graying processing by using an expansion convolution kernel to obtain an expansion image.

Although the text candidate image excludes the elements of the image to be detected other than the text, relatively large gaps still exist between the characters in the image. The dilation operation convolves the grayed text candidate image with a dilation convolution kernel; the specific example kernel used in the embodiment of the invention is given as a matrix image in the original and is not reproduced here.

and S520, performing convolution processing on the expanded image by using the corrosion convolution kernel to obtain a corrosion image.

After the dilation operation is completed, a dilated image is obtained; in this step an erosion operation is performed on the dilated image to obtain an eroded image, so that the character positions in the eroded image regress more accurately to the region of the circumscribed quadrilateral of the characters. The erosion operation convolves the dilated image with an erosion convolution kernel; the specific example kernel used in the embodiment of the invention is given as a matrix image in the original and is not reproduced here.

s530, counting the gray value of each pixel point in the corrosion image to obtain a binarization segmentation threshold value, and determining a binarization threshold value range according to the binarization segmentation threshold value.

In the eroded image obtained after the dilation and erosion operations, the characters occupy most of the space of the image, so the gray values of the pixel points in the eroded image can be averaged, the gray average can be used as the binarization segmentation threshold, and a binarization threshold range can then be determined from the binarization segmentation threshold. For example, if the binarization segmentation threshold obtained from the gray-value statistics is p, the binarization threshold range may be determined as [0.85 × p, 1.15 × p] in the embodiment of the present invention.

And S540, determining pixel points of the eroded image with the gray value within the binarization threshold range as first pixel points, and determining pixel points of the eroded image with the gray value outside the binarization threshold range as second pixel points.

Each pixel point in the eroded image is classified using the binarization threshold range determined in step S530: if the gray value of a pixel point falls within the binarization threshold range, the pixel point is determined as a first pixel point; if the gray value of a pixel point falls outside the binarization threshold range, the pixel point is determined as a second pixel point.

Step S550, setting the pixel value of the first pixel point as a first pixel value and setting the pixel value of the second pixel point as a second pixel value.

Based on the pixel point classification determined in step S540, this step re-assigns the pixel value of each pixel point to complete the binarization of the text candidate image: specifically, the pixel value of the first pixel points may be set to a first pixel value, and the pixel value of the second pixel points may be set to a second pixel value.
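A sketch of the whole fig. 5 pipeline with OpenCV. The 3 × 3 all-ones kernels stand in for the dilation and erosion kernels that appear only as matrix images in the original, the [0.85·p, 1.15·p] window follows the example above, and the choice of which gray level is assigned to the first and second pixel points is also an assumption:

```python
import cv2
import numpy as np

def binarize_candidate(gray_crop, low=0.85, high=1.15, kernel_size=3):
    """Dilate, erode, then binarize around the mean gray value of the eroded image."""
    kernel = np.ones((kernel_size, kernel_size), np.uint8)   # assumed stand-in kernels
    # Note: for dark text on a light background the effect of dilate/erode is visually swapped;
    # inverting the crop first may be needed for the gaps between characters to close.
    dilated = cv2.dilate(gray_crop, kernel)
    eroded = cv2.erode(dilated, kernel)
    p = eroded.mean()                                         # binarization segmentation threshold
    in_range = (eroded >= low * p) & (eroded <= high * p)     # first pixel points
    return np.where(in_range, 0, 255).astype(np.uint8)       # assumed first/second pixel values
```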

By combining image processing techniques with a convolutional neural network, this method yields more accurate position information for the characters detected in the image to be detected than traditional character detection methods, is more generally applicable, and can improve the accuracy of subsequent character recognition.

The following describes embodiments of the apparatus of the present invention, which can be used to perform the above-mentioned text detection method based on image processing.

Fig. 6 schematically shows a block diagram of a structure of a text detection apparatus based on image processing in an embodiment of the present invention. As shown in fig. 6, the text detection apparatus 600 may mainly include: a tilt correction module 610, an area detection module 620, an image determination module 630, a binarization processing module 640, and a character recognition module 650.

The tilt correction module 610 is configured to obtain a text tilt angle of the image to be detected through hough transform, and rotate the image to be detected by using the text tilt angle to obtain a tilt correction image.

The region detection module 620 is configured to input the tilt-corrected image into a pre-trained text region detection model to obtain a text candidate region in the tilt-corrected image.

The image determining module 630 is configured to determine one or more text candidate images corresponding to the images to be detected based on the text candidate areas, and perform graying processing on each text candidate image.

The binarization processing module 640 is configured to determine a binarization segmentation threshold according to the gray value of each pixel point in the grayed text image to be selected, and perform binarization processing on the text image to be selected by using the binarization segmentation threshold.

The character recognition module 650 is configured to recognize character information in the binarized text candidate image.

In some embodiments of the present invention, the tilt correction module 610 may further include:

the coordinate conversion unit is configured to perform binarization processing on an image to be detected to obtain a binarized image, and map pixel coordinates of each pixel point in the binarized image from a rectangular coordinate space to a polar coordinate space;

a coordinate point counting unit configured to traverse a target pixel point in the binarized image, calculate a coordinate value of the target pixel point in the polar coordinate space using a target function to determine a polar coordinate point count value falling within each spatial grid in the polar coordinate space;

a straight line determination unit configured to determine a spatial grid of which the polar coordinate point count value is larger than a preset threshold value as a target spatial grid to determine a tilt correction straight line in the binarized image corresponding to the target spatial grid;

an angle determination unit configured to determine a text inclination angle of the image to be detected according to an inclination angle of the inclination correction straight line.

In some embodiments of the present invention, the coordinate point counting unit may further include:

a target pixel value determination subunit configured to determine, as a target pixel value, a pixel value of a pixel point in the binarized image corresponding to a text position;

a space grid dividing subunit configured to divide the polar coordinate space into a plurality of space grids at intervals of a preset length and a preset angle;

a coordinate value obtaining subunit, configured to traverse target pixel points having target pixel values in the binarized image to obtain coordinate values of the target pixel points in a rectangular coordinate space;

the coordinate value determining subunit is configured to calculate, according to the coordinate value of the target pixel point in the rectangular coordinate space, the coordinate value of each polar coordinate point of the target pixel point in the polar coordinate space by using a target function;

a coordinate point count subunit configured to determine a polar coordinate point count value falling within each spatial grid in the polar coordinate space from the coordinate values of each polar coordinate point in the polar coordinate space.

In some embodiments of the present invention, the text region detection model is a convolutional neural network model having a plurality of convolutional pooling units.

In some embodiments of the present invention, the text region detection model includes a first convolution pooling unit, a second convolution pooling unit, a third convolution pooling unit, a fourth convolution pooling unit, and a fifth convolution pooling unit connected in sequence;

the first convolution pooling unit includes a first convolution layer and a first pooling layer;

the second convolution pooling unit comprises two second convolution layers and one second pooling layer;

the third convolution pooling unit comprises three third convolution layers and a third pooling layer;

the fourth convolution pooling unit comprises three fourth convolution layers and one fourth pooling layer;

the fifth convolution pooling unit includes three fifth convolution layers and one fifth pooling layer.

In some embodiments of the invention, the region detection module 620 may include:

an image input unit configured to input the tilt-corrected image into a pre-trained text region detection model;

a convolution pooling processing unit configured to sequentially perform convolution pooling processing on the tilt-corrected image by the respective convolution pooling units of the text region detection model;

and the feature output unit is configured to acquire the feature maps output by the third, fourth and fifth convolution pooling units and to determine a text candidate region in the tilt-corrected image according to the feature maps.
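The following PyTorch sketch illustrates one plausible reading of this structure. The channel widths, 3x3 kernels, ReLU activations and 2x2 max pooling are assumptions in the spirit of a VGG-style backbone, and TextRegionBackbone is a hypothetical name; its forward pass returns the feature maps of the third, fourth and fifth units, from which a separate detection head (not shown) would derive the text candidate regions.

import torch
import torch.nn as nn

def conv_pool_unit(in_ch, out_ch, num_convs):
    # One convolution-pooling unit: num_convs 3x3 convolutions followed by 2x2 max pooling.
    layers = []
    for i in range(num_convs):
        layers += [nn.Conv2d(in_ch if i == 0 else out_ch, out_ch, kernel_size=3, padding=1),
                   nn.ReLU(inplace=True)]
    layers.append(nn.MaxPool2d(kernel_size=2, stride=2))
    return nn.Sequential(*layers)

class TextRegionBackbone(nn.Module):
    def __init__(self):
        super().__init__()
        self.unit1 = conv_pool_unit(3, 64, num_convs=1)     # one conv layer + pooling
        self.unit2 = conv_pool_unit(64, 128, num_convs=2)   # two conv layers + pooling
        self.unit3 = conv_pool_unit(128, 256, num_convs=3)  # three conv layers + pooling
        self.unit4 = conv_pool_unit(256, 512, num_convs=3)  # three conv layers + pooling
        self.unit5 = conv_pool_unit(512, 512, num_convs=3)  # three conv layers + pooling

    def forward(self, x):
        x = self.unit1(x)
        x = self.unit2(x)
        f3 = self.unit3(x)   # feature map of the third unit
        f4 = self.unit4(f3)  # feature map of the fourth unit
        f5 = self.unit5(f4)  # feature map of the fifth unit
        return f3, f4, f5

# Example usage on an assumed input size:
# f3, f4, f5 = TextRegionBackbone()(torch.randn(1, 3, 512, 512))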

In some embodiments of the present invention, the binarization processing module 640 may include:

the dilation processing unit is configured to perform convolution processing on the grayed text candidate image by using a dilation convolution kernel to obtain a dilated image;

an erosion processing unit configured to perform convolution processing on the dilated image by using an erosion convolution kernel to obtain an eroded image;

the segmentation threshold determining unit is configured to count the gray values of the pixel points in the eroded image to obtain a binarization segmentation threshold, and to determine a binarization threshold range according to the binarization segmentation threshold;

the pixel point classification unit is configured to determine pixel points of the eroded image whose gray values fall within the binarization threshold range as first pixel points, and to determine pixel points of the eroded image whose gray values fall outside the binarization threshold range as second pixel points;

a pixel value setting unit configured to set a pixel value of the first pixel point as a first pixel value and set a pixel value of the second pixel point as a second pixel value.
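The sketch below illustrates one plausible reading of this module: the candidate image is grayed, dilated and then eroded (taking convolution with the dilation and erosion kernels to mean the corresponding morphological operations), a segmentation threshold is derived from the gray-value statistics of the eroded image, and the pixels are mapped to two output values. The 3x3 kernel, the use of Otsu's method, the assumed threshold range and the two output pixel values are illustrative assumptions.

import cv2
import numpy as np

def binarize_candidate(candidate_bgr, first_value=0, second_value=255):
    gray = cv2.cvtColor(candidate_bgr, cv2.COLOR_BGR2GRAY)

    # Dilation followed by erosion smooths gaps in the character strokes.
    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (3, 3))
    dilated = cv2.dilate(gray, kernel)
    eroded = cv2.erode(dilated, kernel)

    # Derive a segmentation threshold from the gray-value statistics of the
    # eroded image (here simply Otsu's threshold over its histogram).
    threshold, _ = cv2.threshold(eroded, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

    # Assumed binarization threshold range [0, threshold]: gray values inside the
    # range (dark text pixels) become the first pixel value, all others the second.
    inside = eroded <= threshold
    return np.where(inside, first_value, second_value).astype(np.uint8)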

Since each functional module of the text detection apparatus based on image processing of the present invention corresponds to a step of the above-described exemplary embodiment of the text detection method based on image processing, for details not disclosed in the apparatus embodiments of the present invention, reference is made to the above-described embodiments of the text detection method based on image processing of the present invention.

Referring now to FIG. 7, a block diagram of a computer system 700 suitable for implementing the electronic device of an embodiment of the present invention is shown. The computer system 700 of the electronic device shown in fig. 7 is only an example and should not impose any limitation on the functions and the scope of use of the embodiments of the present invention.

As shown in fig. 7, the computer system 700 includes a Central Processing Unit (CPU) 701, which can perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM) 702 or a program loaded from a storage section 708 into a Random Access Memory (RAM) 703. In the RAM 703, various programs and data necessary for system operation are also stored. The CPU 701, the ROM 702, and the RAM 703 are connected to each other via a bus 704. An input/output (I/O) interface 705 is also connected to the bus 704.

The following components are connected to the I/O interface 705: an input portion 706 including a keyboard, a mouse, and the like; an output section 707 including a display such as a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), and the like, and a speaker; a storage section 708 including a hard disk and the like; and a communication section 709 including a network interface card such as a LAN card, a modem, or the like. The communication section 709 performs communication processing via a network such as the internet. A drive 710 is also connected to the I/O interface 705 as needed. A removable medium 711 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 710 as necessary, so that a computer program read out therefrom is mounted into the storage section 708 as necessary.

In particular, according to an embodiment of the present invention, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the invention include a computer program product comprising a computer program embodied on a computer-readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program can be downloaded and installed from a network through the communication section 709, and/or installed from the removable medium 711. When executed by the Central Processing Unit (CPU) 701, the computer program performs the above-described functions defined in the system of the present application.

It should be noted that the computer readable medium shown in the present invention can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present invention, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present invention, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.

The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

The units described in the embodiments of the present invention may be implemented by software or by hardware, and the described units may also be disposed in a processor. The names of these units do not in any way limit the units themselves.

As another aspect, the present application also provides a computer-readable medium, which may be contained in the electronic device described in the above embodiments; or may exist separately without being assembled into the electronic device. The computer readable medium carries one or more programs which, when executed by an electronic device, cause the electronic device to perform text detection based on image processing as described in the above embodiments.

For example, the electronic device may implement the following as shown in fig. 1: S110, obtaining a text tilt angle of an image to be detected through Hough transform, and rotating the image to be detected by using the text tilt angle to obtain a tilt-corrected image; S120, inputting the tilt-corrected image into a pre-trained text region detection model to obtain a text candidate region in the tilt-corrected image; S130, determining one or more text candidate images corresponding to the image to be detected based on the text candidate regions, and performing graying processing on each text candidate image; S140, determining a binarization segmentation threshold according to the gray values of the pixel points in the grayed text candidate image, and performing binarization processing on the text candidate image by using the binarization segmentation threshold; and S150, recognizing character information in the binarized text candidate image.
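Tying the sketches above together, a rough end-to-end outline of S110 through S150 might read as follows. Here predict_regions is a hypothetical method name for the trained text region detection model, estimate_skew_and_rotate and binarize_candidate are the earlier sketches, and pytesseract merely stands in for whatever character recognizer an embodiment actually uses.

import pytesseract  # assumed OCR engine for the recognition step

def detect_text(image_bgr, region_model):
    corrected, _ = estimate_skew_and_rotate(image_bgr)       # S110: tilt correction
    boxes = region_model.predict_regions(corrected)          # S120: candidate regions (assumed API)
    texts = []
    for x, y, w, h in boxes:
        candidate = corrected[y:y + h, x:x + w]              # S130: crop a candidate image
        binary = binarize_candidate(candidate)               # S130/S140: gray + binarize
        texts.append(pytesseract.image_to_string(binary))    # S150: recognize character information
    return texts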

As another example, the electronic device may implement the various method steps shown in fig. 2-5.

It should be noted that although in the above detailed description several modules or units of the device for action execution are mentioned, such a division is not mandatory. Indeed, the features and functionality of two or more modules or units described above may be embodied in one module or unit, according to embodiments of the invention. Conversely, the features and functions of one module or unit described above may be further divided into embodiments by a plurality of modules or units.

Through the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments described herein may be implemented by software, or by software in combination with necessary hardware. Therefore, the technical solution according to the embodiment of the present invention can be embodied in the form of a software product, which can be stored in a non-volatile storage medium (which can be a CD-ROM, a usb disk, a removable hard disk, etc.) or on a network, and includes several instructions to enable a computing device (which can be a personal computer, a server, a touch terminal, or a network device, etc.) to execute the method according to the embodiment of the present invention.

Other embodiments of the invention will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the invention and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the invention being indicated by the following claims.

It will be understood that the invention is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the invention is limited only by the appended claims.
