Text image angle deviation rectifying method and device and computer readable storage medium

Document No.: 1545166    Publication date: 2020-01-17

Abstract: This invention, "Text image angle deviation rectifying method and device and computer readable storage medium" (文本图像角度纠偏方法、装置及计算机可读存储介质), was designed by 王博 on 2019-09-06. The invention relates to artificial intelligence technology and discloses a text image angle deviation rectifying method. The method comprises: acquiring a text image and preprocessing it to obtain a binary text image; detecting skewed text in the binary text image through an iterative algorithm to obtain a skewed text image, and cropping the skewed text image to obtain a binary copy image; progressively rotating the binary copy image and converting each rotated copy into a frequency projection histogram, giving a frequency projection histogram set; and calculating the standard deviation of the peak top points and peak valley points of each histogram in the set to obtain a standard deviation set, and taking the rotation angle corresponding to the maximum standard deviation as the deviation correction angle of the text image, thereby completing the angle deviation correction of the text image. The invention also provides a text image angle deviation rectifying device and a computer readable storage medium. The invention achieves accurate correction of the text image angle.

1. A text image angle deviation rectifying method is characterized by comprising the following steps:

acquiring a text image, and carrying out preprocessing operation on the text image to obtain a binary text image;

detecting a skewed text in the binary text image through an iterative algorithm to obtain a skewed text image, and cutting the skewed text image to obtain a binary copy image;

the binary copy image is rotated progressively, the binary copy image after progressive rotation is converted into a frequency projection histogram, and a frequency projection histogram set of the binary copy image is obtained according to the progressive rotation angle of the binary copy image;

and calculating the standard deviation of the peak top point and the peak valley point of the frequency projection histogram set to obtain a standard deviation set, and taking the maximum standard deviation in the standard deviation set as the deviation correction angle of the text image, thereby completing the angle deviation correction of the text image.

2. The method for correcting the angle of the text image according to claim 1, wherein the pre-processing the text image to obtain a binarized text image comprises:

and denoising the text image through a self-adaptive image denoising filter, performing contrast enhancement on the text image subjected to the denoising by using a contrast stretching mode, and performing thresholding operation on the text image subjected to the contrast enhancement according to an OTSU algorithm to obtain the binary text image.

3. The method for rectifying an angle of a text image according to claim 1, wherein said converting the binary copy image after the progressive rotation into a frequency projection histogram comprises:

performing Fourier transform on the binary copy image after progressive rotation;

calculating the amplitude spectrum and the phase spectrum of the binary copy image after Fourier transform;

and constructing the frequency projection histogram according to the amplitude spectrum and the phase spectrum.

4. The method for rectifying an angle of a text image according to claim 3, wherein the Fourier transform method comprises:

Figure FDA0002194256060000011

it is transformed into:

wherein u = 0, 1, 2, 3, …, M-1; v = 0, 1, 2, 3, …, N-1; x = 0, 1, 2, 3, …, M-1; y = 0, 1, 2, 3, …, N-1; M and N are the numbers of pixel points along the length and width of the binary copy image; x and y are spatial coordinates; f(x, y) is the sampling value of the binary copy image in the spatial domain; F(u, v) is the sampling value of the binary copy image in the Fourier transform domain; and u and v are coordinates in the transform domain.

5. The method of any one of claims 1 to 4, wherein the method of calculating the standard deviation of the peak-to-peak and peak-to-valley points in the frequency projection histogram set comprises:

where σ denotes the standard deviation of the frequency projection histogram, x_i represents the i-th peak top point in the frequency projection histogram, n represents the number of peak top points, y_j represents the j-th peak valley point, m represents the number of peak valley points, and μ is the average of all peak top points and peak valley points.

6. A text image angle deviation rectifying apparatus, comprising a memory and a processor, wherein the memory stores a text image angle deviation rectifying program operable on the processor, and the text image angle deviation rectifying program when executed by the processor implements the steps of:

acquiring a text image, and carrying out preprocessing operation on the text image to obtain a binary text image;

detecting a skewed text in the binary text image through an iterative algorithm to obtain a skewed text image, and cutting the skewed text image to obtain a binary copy image;

the binary copy image is rotated progressively, the binary copy image after progressive rotation is converted into a frequency projection histogram, and a frequency projection histogram set of the binary copy image is obtained according to the progressive rotation angle of the binary copy image;

and calculating the standard deviation of the peak top point and the peak valley point of the frequency projection histogram set to obtain a standard deviation set, and taking the maximum standard deviation in the standard deviation set as the deviation correction angle of the text image, thereby completing the angle deviation correction of the text image.

7. The apparatus for correcting an angle of a text image according to claim 6, wherein said preprocessing the text image to obtain a binarized text image comprises:

and denoising the text image through a self-adaptive image denoising filter, performing contrast enhancement on the text image subjected to the denoising by using a contrast stretching mode, and performing thresholding operation on the text image subjected to the contrast enhancement according to an OTSU algorithm to obtain the binary text image.

8. The apparatus for rectifying an angle of a text image according to claim 6, wherein said converting the binary copy image after the progressive rotation into a frequency projection histogram comprises:

performing Fourier transform on the binary copy image after progressive rotation;

calculating the amplitude spectrum and the phase spectrum of the binary copy image after Fourier transform;

and constructing the frequency projection histogram according to the amplitude spectrum and the phase spectrum.

9. The text image angle rectifying device according to claim 8, wherein the fourier transform method comprises:

Figure FDA0002194256060000031

it is transformed into:

wherein u = 0, 1, 2, 3, …, M-1; v = 0, 1, 2, 3, …, N-1; x = 0, 1, 2, 3, …, M-1; y = 0, 1, 2, 3, …, N-1; M and N are the numbers of pixel points along the length and width of the binary copy image; x and y are spatial coordinates; f(x, y) is the sampling value of the binary copy image in the spatial domain; F(u, v) is the sampling value of the binary copy image in the Fourier transform domain; and u and v are coordinates in the transform domain.

10. A computer readable storage medium having stored thereon a text image angle rectification program executable by one or more processors to perform the steps of the text image angle rectification method according to any one of claims 1 to 5.

Technical Field

The invention relates to the technical field of artificial intelligence, in particular to a text image angle deviation rectifying method and device based on projection and a computer readable storage medium.

Background

Optical character recognition technology is widely applied today. Optical Character Recognition (OCR) is the process of recognizing optical characters in a picture by means of image processing and pattern recognition and translating them into computer characters. The main OCR pipeline takes an input image and performs preprocessing, binarization, denoising, character segmentation and character recognition. Most OCR algorithms are based on decision trees and Support Vector Machines (SVM), and their recognition accuracy is very sensitive to character skew. In practice, however, a text image can hardly be captured with zero skew, and accurately calculating the deviation correction angle remains difficult.

Disclosure of Invention

The invention provides a text image angle deviation correcting method, a text image angle deviation correcting device and a computer readable storage medium, and mainly aims to present an accurate deviation correcting result to a user when the user corrects the text image angle in a knowledge base.

In order to achieve the above object, the present invention provides a text image angle deviation rectifying method, which includes:

acquiring a text image, and carrying out preprocessing operation on the text image to obtain a binary text image;

detecting a skewed text in the binary text image through an iterative algorithm to obtain a skewed text image, and cutting the skewed text image to obtain a binary copy image;

the binary copy image is rotated progressively, the binary copy image after progressive rotation is converted into a frequency projection histogram, and a frequency projection histogram set of the binary copy image is obtained according to the progressive rotation angle of the binary copy image;

and calculating the standard deviation of the peak top point and the peak valley point of the frequency projection histogram set to obtain a standard deviation set, and taking the maximum standard deviation in the standard deviation set as the deviation correction angle of the text image, thereby completing the angle deviation correction of the text image.

Optionally, the preprocessing the text image to obtain a binarized text image includes:

and denoising the text image through a self-adaptive image denoising filter, performing contrast enhancement on the text image subjected to the denoising by using a contrast stretching mode, and performing thresholding operation on the text image subjected to the contrast enhancement according to an OTSU algorithm to obtain the binary text image.

Optionally, the converting the binary copy image after the progressive rotation into a frequency projection histogram includes:

performing Fourier transform on the binary copy image after progressive rotation;

calculating the amplitude spectrum and the phase spectrum of the binary copy image after Fourier transform;

and constructing the frequency projection histogram according to the amplitude spectrum and the phase spectrum.

Optionally, the fourier transform method comprises:

Figure BDA0002194256070000021

it is transformed into:

Figure BDA0002194256070000022

wherein u = 0, 1, 2, 3, …, M-1; v = 0, 1, 2, 3, …, N-1; x = 0, 1, 2, 3, …, M-1; y = 0, 1, 2, 3, …, N-1; M and N are the numbers of pixel points along the length and width of the binary copy image; x and y are spatial coordinates; f(x, y) is the sampling value of the binary copy image in the spatial domain; F(u, v) is the sampling value of the binary copy image in the Fourier transform domain; and u and v are coordinates in the transform domain.
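The two equation images referenced above are not reproduced in this text. As a hedged reconstruction only, assuming they show the conventional two-dimensional discrete Fourier transform and its inverse, with the normalisation factor placed on the inverse (the original figures may place it differently), a pair consistent with the variable definitions above would read:

```latex
F(u,v) = \sum_{x=0}^{M-1} \sum_{y=0}^{N-1} f(x,y)\,
         e^{-j 2\pi \left( \frac{ux}{M} + \frac{vy}{N} \right)}

f(x,y) = \frac{1}{MN} \sum_{u=0}^{M-1} \sum_{v=0}^{N-1} F(u,v)\,
         e^{\,j 2\pi \left( \frac{ux}{M} + \frac{vy}{N} \right)}
```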

Optionally, the method for calculating a standard deviation of peak top points and peak bottom points in the frequency projection histogram set includes:

Figure BDA0002194256070000023

where σ denotes the standard deviation of the frequency projection histogram, x_i represents the i-th peak top point in the frequency projection histogram, n represents the number of peak top points, y_j represents the j-th peak valley point, m represents the number of peak valley points, and μ is the average of all peak top points and peak valley points.
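The equation image for the standard deviation is likewise not reproduced here. One form consistent with the symbol definitions above, offered as an assumption rather than as the formula in the original figure, pools the n peak top points and the m peak valley points around their common mean:

```latex
\mu = \frac{1}{n+m} \left( \sum_{i=1}^{n} x_i + \sum_{j=1}^{m} y_j \right),
\qquad
\sigma = \sqrt{ \frac{1}{n+m} \left[ \sum_{i=1}^{n} (x_i - \mu)^2
              + \sum_{j=1}^{m} (y_j - \mu)^2 \right] }
```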

In addition, in order to achieve the above object, the present invention further provides a text image angle deviation rectifying device, which includes a memory and a processor, wherein the memory stores a text image angle deviation rectifying program operable on the processor, and when the text image angle deviation rectifying program is executed by the processor, the following steps are implemented:

acquiring a text image, and carrying out preprocessing operation on the text image to obtain a binary text image;

detecting a skewed text in the binary text image through an iterative algorithm to obtain a skewed text image, and cutting the skewed text image to obtain a binary copy image;

the binary copy image is rotated progressively, the binary copy image after progressive rotation is converted into a frequency projection histogram, and a frequency projection histogram set of the binary copy image is obtained according to the progressive rotation angle of the binary copy image;

and calculating the standard deviation of the peak top point and the peak valley point of the frequency projection histogram set to obtain a standard deviation set, and taking the maximum standard deviation in the standard deviation set as the deviation correction angle of the text image, thereby completing the angle deviation correction of the text image.

Optionally, the preprocessing the text image to obtain a binarized text image includes:

and denoising the text image through a self-adaptive image denoising filter, performing contrast enhancement on the text image subjected to the denoising by using a contrast stretching mode, and performing thresholding operation on the text image subjected to the contrast enhancement according to an OTSU algorithm to obtain the binary text image.

Optionally, the converting the binary copy image after the progressive rotation into a frequency projection histogram includes:

performing Fourier transform on the binary copy image after progressive rotation;

calculating the amplitude spectrum and the phase spectrum of the binary copy image after Fourier transform;

and constructing the frequency projection histogram according to the amplitude spectrum and the phase spectrum.

Optionally, the fourier transform method comprises:

Figure BDA0002194256070000031

it is transformed into:

Figure BDA0002194256070000032

wherein u = 0, 1, 2, 3, …, M-1; v = 0, 1, 2, 3, …, N-1; x = 0, 1, 2, 3, …, M-1; y = 0, 1, 2, 3, …, N-1; M and N are the numbers of pixel points along the length and width of the binary copy image; x and y are spatial coordinates; f(x, y) is the sampling value of the binary copy image in the spatial domain; F(u, v) is the sampling value of the binary copy image in the Fourier transform domain; and u and v are coordinates in the transform domain.

In addition, to achieve the above object, the present invention also provides a computer readable storage medium having a text image angle rectification program stored thereon, where the text image angle rectification program is executable by one or more processors to implement the steps of the text image angle rectification method as described above.

According to the text image angle deviation rectifying method, the text image angle deviation rectifying device and the computer readable storage medium, when a user conducts text image angle deviation rectifying, preprocessing operation is conducted on an obtained text image, an inclined text image in the text image is analyzed and processed to obtain a frequency projection histogram set of the text image, the standard deviation of a peak top point and a peak valley point of the frequency projection histogram set is calculated, the maximum standard deviation is used as the deviation rectifying angle of the text image, and therefore an accurate text image angle deviation rectifying result can be presented to the user.

Drawings

Fig. 1 is a schematic flow chart of a text image angle deviation rectifying method according to an embodiment of the present invention;

Fig. 2 is a schematic diagram of an internal structure of a text image angle deviation correcting device according to an embodiment of the present invention;

Fig. 3 is a schematic block diagram of a text image angle deviation rectifying program in the text image angle deviation rectifying device according to an embodiment of the present invention.

The implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.

Detailed Description

It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.

The invention provides a text image angle deviation rectifying method. Referring to fig. 1, a schematic flow chart of a text image angle deviation rectifying method according to an embodiment of the present invention is shown. The method may be performed by an apparatus, which may be implemented by software and/or hardware.

In this embodiment, the text image angle rectification method includes:

S1, acquiring a text image, and carrying out preprocessing operation on the text image to obtain a binary text image.

In a preferred embodiment of the present invention, the text image may be image data such as a certificate or an invoice. The preprocessing operation is as follows: the text image is denoised with an adaptive image denoising filter, the denoised text image is contrast-enhanced by contrast stretching, and the contrast-enhanced text image is thresholded with the OTSU algorithm to obtain the binary text image. In detail, the preprocessing operation is implemented as follows:

a. noise reduction:

The invention denoises the text image with the adaptive image denoising filter in order to filter out salt-and-pepper noise while largely preserving the details of the text image. Salt-and-pepper noise consists of white or black points that appear randomly in the image, and the adaptive image denoising filter is a signal extractor that recovers the original signal from a noise-contaminated signal.

In the preferred embodiment of the present invention, the text image is denoted f(x, y); under the action of the degradation function H and the influence of the salt-and-pepper noise η(x, y), a degraded image g(x, y) is obtained. This gives the image degradation formula g(x, y) = η(x, y) + f(x, y). The text image is then denoised with an adaptive filter method, calculated by the following formula:

Figure BDA0002194256070000051

where the symbols in the formula denote, respectively, the noise variance of the text image, the mean of the pixel gray levels in a window around the point (x, y) (Figure BDA0002194256070000053), and the variance of the pixel gray levels within that window.
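The denoising formula itself survives only as an image reference. The sketch below assumes it is the conventional adaptive local noise-reduction filter, in which each pixel is pulled toward its local mean by the ratio of noise variance to local variance; that form matches the three quantities named above but is not confirmed by the text. The names `adaptive_local_filter`, `noise_var` and `radius` are illustrative.

```python
from statistics import fmean, pvariance


def adaptive_local_filter(g, noise_var, radius=1):
    """Adaptive local noise reduction on a 2-D list of gray levels.

    Assumed form: out = g - (noise_var / local_var) * (g - local_mean),
    with the ratio clipped to 1 when noise_var exceeds local_var.
    """
    rows, cols = len(g), len(g[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # Gather the window around (r, c), clipped at the image borders.
            window = [
                g[i][j]
                for i in range(max(0, r - radius), min(rows, r + radius + 1))
                for j in range(max(0, c - radius), min(cols, c + radius + 1))
            ]
            local_mean = fmean(window)
            local_var = pvariance(window)
            # Clip the ratio to 1 so the output never overshoots the mean.
            ratio = 1.0 if local_var <= noise_var else noise_var / local_var
            out[r][c] = g[r][c] - ratio * (g[r][c] - local_mean)
    return out
```

With `noise_var = 0` the image passes through unchanged; as `noise_var` grows relative to the local variance, the output approaches a plain mean filter.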

b. Contrast enhancement:

Contrast refers to the difference between the maximum and minimum brightness in an imaging system; low contrast makes image processing more difficult. The preferred embodiment of the invention adopts contrast stretching, which enhances the contrast of the text image by widening the dynamic range of its gray levels. Contrast stretching is also called gray stretching.

Furthermore, the invention performs gray scale stretching on the specific area according to the piecewise linear transformation function in the contrast stretching method, thereby further improving the contrast of the output image. When contrast stretching is performed, gray value transformation is essentially achieved. The invention realizes gray value conversion by linear stretching, wherein the linear stretching refers to pixel level operation with linear relation between input and output gray values, and a gray conversion formula is as follows:

$D_b = f(D_a) = a \cdot D_a + b$

where $a$ is the slope of the line, $b$ is its intercept on the Y-axis, $D_a$ is the gray value of the input image, and $D_b$ is the gray value of the output image. When $a > 1$, the contrast of the output image is enhanced compared with the original image; when $a < 1$, it is weakened.
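As a small illustration of the linear gray transform, the sketch below applies the mapping to every pixel of an image stored as a 2-D list. Clipping to the 8-bit range 0 to 255 and rounding to integers are assumptions added here; the name `linear_stretch` is illustrative.

```python
def linear_stretch(image, a, b):
    """Apply D_b = a * D_a + b to each pixel, clipped to the range 0..255."""
    return [[min(255, max(0, round(a * p + b))) for p in row] for row in image]


# Two gray levels 20 apart end up 40 apart when the slope is 2.
stretched = linear_stretch([[100, 120]], 2, -100)
```

A piecewise linear version, as mentioned in the text, would apply different `(a, b)` pairs to different gray-level intervals.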

c. Image thresholding operation:

The invention binarizes the contrast-enhanced text image with the OTSU algorithm to obtain a binary image. In the preferred embodiment, a gray level $t$ is preset as the segmentation threshold between the foreground and background of the contrast-enhanced text image. The foreground points account for a proportion $w_0$ of the image and have average gray $u_0$; the background points account for a proportion $w_1$ and have average gray $u_1$. The total average gray of the contrast-enhanced text image is then:

$u = w_0 u_0 + w_1 u_1$

and the variance between the foreground and background of the contrast-enhanced text image is:

$g = w_0 (u_0 - u)^2 + w_1 (u_1 - u)^2 = w_0 w_1 (u_0 - u_1)^2$

where the second equality follows from $w_0 + w_1 = 1$. When the variance $g$ is at its maximum, the difference between foreground and background is greatest and the gray level $t$ is the optimal threshold. Gray levels in the contrast-enhanced text image greater than $t$ are set to 255 and gray levels less than $t$ are set to 0, which yields the binary text image.
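The threshold search can be sketched as an exhaustive scan over the 256 gray levels that keeps the level maximising $g = w_0 w_1 (u_0 - u_1)^2$. The code assumes 8-bit integer pixels in a 2-D list and treats the brighter class as foreground; since $g$ is symmetric in the two classes, that choice does not affect the result. Function names are illustrative.

```python
def otsu_threshold(image):
    """Return the gray level t that maximises g = w0 * w1 * (u0 - u1) ** 2."""
    hist = [0] * 256
    for row in image:
        for p in row:
            hist[p] += 1
    total = sum(hist)
    total_sum = sum(level * count for level, count in enumerate(hist))
    best_t, best_g = 0, -1.0
    n_below, sum_below = 0, 0  # statistics of pixels with gray level <= t
    for t in range(256):
        n_below += hist[t]
        sum_below += t * hist[t]
        n_above = total - n_below
        if n_below == 0 or n_above == 0:
            continue  # one class is empty, variance undefined
        w1, w0 = n_below / total, n_above / total
        u1 = sum_below / n_below
        u0 = (total_sum - sum_below) / n_above
        g = w0 * w1 * (u0 - u1) ** 2
        if g > best_g:
            best_t, best_g = t, g
    return best_t


def binarize(image, t):
    """Set gray levels above t to 255 and all others to 0."""
    return [[255 if p > t else 0 for p in row] for row in image]
```

When several thresholds give the same variance (for example across an empty stretch of the histogram), this scan keeps the lowest one.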

Further, the preprocessing operation of the present invention may further include performing dimension reduction on the binarized text image by a principal component analysis method, so that the binarized text image can be processed more efficiently. Wherein, the principal component analysis method is a method for converting a group of variables with possible correlation into a group of linearly uncorrelated variables through orthogonal transformation.

S2, detecting the skewed text in the binary text image through an iterative algorithm to obtain a skewed text image, and cropping the skewed text image to obtain a binary copy image.

In the preferred embodiment of the invention, the skewed text in the binary text image is detected by the AdaBoost iterative algorithm to form a skewed text image. AdaBoost is a detection algorithm whose core is iteration: a weak classifier is constructed for each of several training sets, and the weak classifiers are combined into a final strong classifier. The algorithm works by adjusting the data distribution: the weight of each sample is set according to whether that sample was classified correctly in each training round and according to the overall accuracy of the previous classification. The newly weighted data set is used to train the next classifier, and the classifiers obtained in each round are combined to form the final decision classifier.

The invention divides the binary text image into different regions to obtain training samples $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$, where negative samples (background) are denoted by $y_i = 0$ and positive samples (foreground, i.e. containing skewed text) by $y_i = 1$. Preferably, the weak classifier constructed by the invention is as follows:

Figure BDA0002194256070000071

where $f$ is the feature, $\theta$ is a threshold, $p$ indicates the direction of the inequality sign, and $x$ represents a detection sub-window. Among the constructed weak classifiers, the one with the minimum classification error rate $\varepsilon_t$ is selected as the best weak classifier $h_t(x)$, where $\varepsilon_t$ is calculated as:

$\varepsilon_t = \min_{f,p,\theta} \sum_i \frac{w_i}{\sum w_i} \left| h(x_i, f, p, \theta) - y_i \right|$

where $w$ is the weight. The final strong classifier is then obtained:

Figure BDA0002194256070000072

where $\beta_t = \varepsilon_t / (1 - \varepsilon_t)$.
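The weak and strong classifier formulas appear only as image references. The sketch below assumes the threshold-stump form commonly used with AdaBoost detectors ($h = 1$ if $p f(x) < p\theta$, else 0) and a strong classifier that compares an $\alpha$-weighted vote with half the total $\alpha$, where $\alpha_t = \log(1/\beta_t)$. Both forms are assumptions, and the sketch uses a single scalar feature per sample for brevity. All function names are illustrative.

```python
import math


def stump(value, p, theta):
    """Weak classifier: 1 if p * value < p * theta, else 0 (assumed form)."""
    return 1 if p * value < p * theta else 0


def best_stump(values, labels, weights):
    """Return (error, p, theta) with the minimum normalised weighted error."""
    total = sum(weights)
    best = None
    for theta in sorted(set(values)):
        for p in (1, -1):
            err = sum(w / total * abs(stump(v, p, theta) - y)
                      for v, y, w in zip(values, labels, weights))
            if best is None or err < best[0]:
                best = (err, p, theta)
    return best


def train(values, labels, rounds=3):
    """Run a few boosting rounds; return a list of (alpha, p, theta)."""
    weights = [1.0 / len(values)] * len(values)
    model = []
    for _ in range(rounds):
        err, p, theta = best_stump(values, labels, weights)
        err = min(max(err, 1e-10), 1 - 1e-10)  # keep beta finite and non-zero
        beta = err / (1 - err)
        model.append((math.log(1 / beta), p, theta))
        # Shrink the weights of correctly classified samples by beta.
        weights = [w * (beta if stump(v, p, theta) == y else 1.0)
                   for v, y, w in zip(values, labels, weights)]
    return model


def predict(model, value):
    """Strong classifier: weighted vote against half of the total alpha."""
    score = sum(alpha * stump(value, p, theta) for alpha, p, theta in model)
    return 1 if score >= 0.5 * sum(alpha for alpha, _, _ in model) else 0
```

In a real detector each sub-window would yield many features, and `best_stump` would also search over the feature index $f$.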

Further, the method detects the skewed text in the binary text image by means of a cascade classifier. The cascade classifier is a text detection classifier formed by cascading the strong classifiers obtained through training, and it is a degenerate decision tree: the 2nd-layer classifier is triggered only by the positive samples produced by the 1st layer, the 3rd-layer classifier only by the positive samples produced by the 2nd layer, and so on. Finally, all skewed text images in the binary text image are detected under general conditions and cropped to obtain the binary copy image.
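The cascade logic can be sketched in a few lines: a sub-window reaches layer k+1 only if layer k labelled it positive, so a single rejection ends the evaluation. Here each stage is assumed to be a callable returning 0 or 1; the name `cascade_predict` is illustrative.

```python
def cascade_predict(stages, x):
    """Return 1 only if every stage accepts x; stop at the first rejection."""
    for stage in stages:
        if stage(x) == 0:
            return 0  # rejected here, later stages are never evaluated
    return 1


# Illustrative stages: each lambda stands in for a trained strong classifier.
stages = [lambda v: 1 if v > 0 else 0, lambda v: 1 if v < 10 else 0]
```

Because most background regions are rejected by the early layers, the later and more expensive layers run on only a small share of the sub-windows.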

S3, the binary copy image is rotated progressively, the binary copy image after progressive rotation is converted into a frequency projection histogram, and a frequency projection histogram set of the binary copy image is obtained according to the progressive rotation angle of the binary copy image.

In the preferred embodiment of the present invention, the binary copy image is rotated progressively according to a preset angle. Preferably, the binary copy image is rotated progressively between -45° and 45° in steps of 2°, and the numbers of pixel points along the length and width of the binary copy image are calculated after each progressive rotation.
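A minimal sketch of the progressive rotation follows, assuming nearest-neighbour resampling about the image centre, an output canvas of the same size as the input, and zero fill for pixels that fall outside the source. The text specifies none of these; the function names and the sign convention of the angle are likewise illustrative.

```python
import math


def rotate_binary(image, angle_deg):
    """Rotate a 2-D list about its centre by angle_deg (nearest neighbour)."""
    rows, cols = len(image), len(image[0])
    cy, cx = (rows - 1) / 2, (cols - 1) / 2
    cos_a = math.cos(math.radians(angle_deg))
    sin_a = math.sin(math.radians(angle_deg))
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # Inverse mapping: locate the source pixel for each output pixel.
            dx, dy = c - cx, r - cy
            src_c = round(cx + cos_a * dx + sin_a * dy)
            src_r = round(cy - sin_a * dx + cos_a * dy)
            if 0 <= src_r < rows and 0 <= src_c < cols:
                out[r][c] = image[src_r][src_c]
    return out


def progressive_rotations(image, start=-45, stop=45, step=2):
    """Map each angle in [start, stop] (inclusive, in steps) to a rotated copy."""
    return {a: rotate_binary(image, a) for a in range(start, stop + 1, step)}
```

With the default endpoints the grid contains 46 angles (-45, -43, ..., 45), all odd, so 0° itself is not sampled.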

Further, the binary copy image after progressive rotation is converted into a frequency projection histogram through a Fourier transform algorithm. In detail, the fourier transform method includes:

Figure BDA0002194256070000073

it is transformed into:

Figure BDA0002194256070000081

wherein u = 0, 1, 2, 3, …, M-1; v = 0, 1, 2, 3, …, N-1; x = 0, 1, 2, 3, …, M-1; y = 0, 1, 2, 3, …, N-1; M and N are the numbers of pixel points along the length and width of the binary copy image; x and y are spatial coordinates; f(x, y) is the sampling value of the binary copy image in the spatial domain; F(u, v) is the sampling value of the binary copy image in the Fourier transform domain; and u and v are coordinates in the transform domain. When the binary copy image is a square matrix, M is equal to N. F(u, v) is referred to as the frequency spectrum of the binary copy image signal f(x, y). The amplitude spectrum and phase spectrum of the Fourier-transformed binary copy image are calculated, respectively, as:

Figure BDA0002194256070000082

Figure BDA0002194256070000083

where $F(u, v) = R(u, v) + jI(u, v) = |F(u, v)|\, e^{j\varphi(u, v)}$, $|F(u, v)|$ represents the amplitude spectrum of the binary copy image, and $\varphi(u, v)$ represents the phase spectrum of the binary copy image.
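The transform and the two spectra can be sketched directly from the definitions. The code assumes the unnormalised forward transform $F(u,v) = \sum_x \sum_y f(x,y)\, e^{-j2\pi(ux/M + vy/N)}$, the magnitude $\sqrt{R^2 + I^2}$ and the phase $\operatorname{atan2}(I, R)$; the original formulas are image references and may differ in normalisation. The direct double sum costs on the order of $(MN)^2$ operations, so it is only suitable for illustration on small images.

```python
import cmath


def dft2(f):
    """Naive 2-D DFT of a 2-D list (assumed unnormalised forward transform)."""
    M, N = len(f), len(f[0])
    return [[sum(f[x][y] * cmath.exp(-2j * cmath.pi * (u * x / M + v * y / N))
                 for x in range(M) for y in range(N))
             for v in range(N)]
            for u in range(M)]


def magnitude_and_phase(F):
    """Return (|F(u, v)|, phi(u, v)) for F(u, v) = R(u, v) + j I(u, v)."""
    magnitude = [[abs(c) for c in row] for row in F]
    phase = [[cmath.phase(c) for c in row] for row in F]
    return magnitude, phase
```

For a constant image, all of the energy lands in the $(0, 0)$ coefficient, which equals the sum of the pixel values.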

Furthermore, the invention constructs a frequency projection histogram according to the calculated amplitude spectrum and phase spectrum of the binary copy image, and different frequency projection histograms, namely the frequency projection histogram set of the binary copy image, can be obtained according to different progressive rotation angles of the binary copy image.

S4, calculating the standard deviation of the peak top points and peak valley points of each histogram in the frequency projection histogram set to obtain a standard deviation set, and taking the rotation angle corresponding to the maximum standard deviation in the standard deviation set as the deviation correction angle of the text image, thereby completing the angle deviation correction of the text image.

In a preferred embodiment of the present invention, the method for calculating the standard deviation of the peak top and the peak bottom in the frequency projection histogram set comprises:

Figure BDA0002194256070000084

where σ denotes the standard deviation of the frequency projection histogram, x_i represents the i-th peak top point in the frequency projection histogram, n represents the number of peak top points, y_j represents the j-th peak valley point, m represents the number of peak valley points, and μ is the average of all peak top points and peak valley points. The resulting standard deviation reflects the degree of dispersion between the peak valley points and the peak top points.

Further, the method calculates the standard deviation of every histogram in the frequency projection histogram set to obtain the standard deviation set. Given the structural characteristics of text images, the rotation at which the standard deviation is largest gives the best orientation of the corrected text image; the corresponding angle is taken as the deviation correction angle, and the original image is rotated by that angle to correct it.
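The selection step can be sketched independently of how each histogram is built. The code below assumes that peak top points and peak valley points are the strict local maxima and minima of a histogram, and that the standard deviation pools both sets around their common mean; the exact formula is an image reference in the original, so this is one plausible reading. The histograms are assumed to be supplied as a dictionary keyed by rotation angle, and all names are illustrative.

```python
import math


def peaks_and_valleys(hist):
    """Return (peaks, valleys): values at strict local maxima and minima."""
    peaks, valleys = [], []
    for i in range(1, len(hist) - 1):
        if hist[i] > hist[i - 1] and hist[i] > hist[i + 1]:
            peaks.append(hist[i])
        elif hist[i] < hist[i - 1] and hist[i] < hist[i + 1]:
            valleys.append(hist[i])
    return peaks, valleys


def peak_valley_std(hist):
    """Standard deviation of all peak and valley values around their mean."""
    peaks, valleys = peaks_and_valleys(hist)
    points = peaks + valleys
    if not points:
        return 0.0
    mu = sum(points) / len(points)
    return math.sqrt(sum((p - mu) ** 2 for p in points) / len(points))


def best_angle(histograms):
    """Return the angle whose histogram has the largest peak/valley spread."""
    return max(histograms, key=lambda a: peak_valley_std(histograms[a]))


# Toy histograms: the sharpest alternation of peaks and valleys is at 0.
demo = {-2: [3, 4, 3, 4, 3, 4, 3], 0: [0, 9, 0, 9, 0, 9, 0], 2: [2, 5, 2, 5, 2, 5, 2]}
chosen = best_angle(demo)
```

The intuition is that when text lines are aligned with the projection direction, rows of text and the gaps between them separate cleanly, so peaks and valleys differ most.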

The invention also provides a text image angle deviation correcting device. Fig. 2 is a schematic diagram of an internal structure of a text image angle deviation rectifying device according to an embodiment of the present invention.

In this embodiment, the text image angle deviation rectifying device 1 may be a PC (personal computer), a terminal device such as a smartphone, a tablet computer, or a portable computer, or may be a server. The text image angle deviation rectifying device 1 comprises at least a memory 11, a processor 12, a communication bus 13, and a network interface 14.

The memory 11 includes at least one type of readable storage medium, such as a flash memory, a hard disk, a multimedia card, a card-type memory (e.g., SD or DX memory), a magnetic memory, a magnetic disk, or an optical disk. In some embodiments, the memory 11 may be an internal storage unit of the text image angle deviation rectifying device 1, such as its hard disk. In other embodiments, the memory 11 may be an external storage device of the text image angle deviation rectifying device 1, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card provided on the device. Further, the memory 11 may include both an internal storage unit and an external storage device of the text image angle deviation rectifying device 1. The memory 11 may be used not only to store application software installed in the text image angle deviation rectifying device 1 and various kinds of data, such as the code of the text image angle deviation rectifying program 01, but also to temporarily store data that has been output or is to be output.

In some embodiments, the processor 12 may be a central processing unit (CPU), a controller, a microcontroller, a microprocessor, or another data processing chip, and is used to run program code stored in the memory 11 or to process data, for example to execute the text image angle deviation rectifying program 01.

The communication bus 13 is used to realize connection and communication between these components.

The network interface 14 may optionally include a standard wired interface and a wireless interface (e.g., a Wi-Fi interface), and is typically used to establish a communication link between the device 1 and other electronic devices.

Optionally, the device 1 may further comprise a user interface, which may include a display and an input unit such as a keyboard, and optionally a standard wired interface and a wireless interface. In some embodiments, the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (organic light-emitting diode) touch device, or the like. The display, which may also be referred to as a display screen or display unit, is used to display information processed in the text image angle deviation rectifying device 1 and to display a visual user interface.

FIG. 2 shows only the text image angle deviation rectifying device 1 with the components 11-14 and the text image angle deviation rectifying program 01. Those skilled in the art will understand that the structure shown in FIG. 2 does not limit the text image angle deviation rectifying device 1, which may include fewer or more components than those shown, combine some components, or arrange the components differently.

In the embodiment of the device 1 shown in FIG. 2, the memory 11 stores the text image angle deviation rectifying program 01; when the processor 12 executes the text image angle deviation rectifying program 01 stored in the memory 11, the following steps are implemented:

Step one: acquire a text image, and perform a preprocessing operation on the text image to obtain a binary text image.

In a preferred embodiment of the present invention, the text image may be image data such as a certificate or an invoice. The preprocessing operation is as follows: the text image is denoised with an adaptive image noise reduction filter; the contrast of the denoised text image is enhanced by contrast stretching; and the contrast-enhanced text image is thresholded according to the OTSU algorithm to obtain the binary text image. In detail, the preprocessing operation is implemented as follows:

d. Noise reduction:

The invention denoises the text image with the adaptive image noise reduction filter, which filters out the salt-and-pepper noise of the text image while largely preserving the details of the text image. Salt-and-pepper noise consists of white or black points that appear randomly in the image, and the adaptive image noise reduction filter is a signal extractor that recovers the original signal from a signal contaminated by noise.

In the preferred embodiment of the present invention, the text image is denoted f(x, y); under the action of a degradation function H and the influence of salt-and-pepper noise η(x, y), it yields a degraded image g(x, y). The image degradation formula is then g(x, y) = η(x, y) + f(x, y), and the text image is denoised by an adaptive filter method, where the noise reduction is calculated as follows:

$$\hat{f}(x, y) = g(x, y) - \frac{\sigma_\eta^2}{\sigma_L^2}\left[g(x, y) - m_L\right]$$

where $\hat{f}(x, y)$ is the denoised estimate, $\sigma_\eta^2$ is the variance of the noise of the text image, $m_L$ is the mean of the pixel gray levels in a window around point (x, y), and $\sigma_L^2$ is the variance of the pixel gray levels within a window around point (x, y).
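By way of illustration only, the noise reduction step can be sketched in Python as follows. The sketch assumes the widely used adaptive local noise-reduction filter, in which each pixel g(x, y) is moved toward the local mean by the ratio of the noise variance to the local variance. This form is consistent with the three quantities defined above, but it is an assumption, as are the function name `adaptive_denoise`, the square window of radius `radius`, and the capping of the ratio at 1.

```python
def adaptive_denoise(image, noise_var, radius=1):
    """Adaptive local noise-reduction filter on a 2-D list of gray levels.

    Assumed form: f_hat = g - (noise_var / local_var) * (g - local_mean).
    """
    height, width = len(image), len(image[0])
    output = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            # Collect the window around (x, y), clipped at the image border.
            window = [
                image[j][i]
                for j in range(max(0, y - radius), min(height, y + radius + 1))
                for i in range(max(0, x - radius), min(width, x + radius + 1))
            ]
            local_mean = sum(window) / len(window)
            local_var = sum((p - local_mean) ** 2 for p in window) / len(window)
            # Cap the ratio at 1 so the result stays between g and the local mean
            # (this also avoids dividing by a zero local variance).
            ratio = 1.0 if local_var <= noise_var else noise_var / local_var
            output[y][x] = image[y][x] - ratio * (image[y][x] - local_mean)
    return output
```

With `noise_var` set to 0 the image is returned unchanged, while a very large `noise_var` reduces the filter to a plain local mean, which is the behaviour expected of this family of filters.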

e. Contrast enhancement:

Contrast refers to the difference between the maximum and minimum brightness in an imaging system; low contrast makes image processing more difficult. The preferred embodiment of the invention uses contrast stretching, which enhances the contrast of the text image by widening the dynamic range of its gray levels. Contrast stretching is also called gray-level stretching.

Furthermore, the invention applies gray-level stretching to a specific region according to a piecewise linear transformation function in the contrast stretching method, thereby further improving the contrast of the output image. Contrast stretching is essentially a gray-value transformation. The invention realizes the gray-value transformation by linear stretching, a pixel-level operation in which the input and output gray values are linearly related. The gray-level transformation formula is as follows:

$$D_b = f(D_a) = a \cdot D_a + b$$

where a is the slope of the line, b is its intercept on the Y-axis, $D_a$ denotes the gray value of the input image, and $D_b$ denotes the gray value of the output image. When a > 1, the contrast of the output image is enhanced compared with the original image; when a < 1, the contrast of the output image is weakened compared with the original image.
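As a minimal sketch of the linear stretching described above, the following Python function applies the transformation to every pixel of an image stored as a list of rows. The function name `linear_stretch`, the rounding, and the clipping of results to the 8-bit range 0 to 255 are assumptions made for the example and are not stated in the text.

```python
def linear_stretch(image, a, b):
    """Apply D_b = a * D_a + b to each pixel, then clip to 0..255 (assumed range)."""
    return [[min(255, max(0, round(a * p + b))) for p in row] for row in image]


# Gray levels 100 and 120 differ by 20; with slope a = 2 they differ by 40.
stretched = linear_stretch([[100, 120]], a=2, b=-100)
```

A piecewise linear stretch of a specific region can be obtained in the same way by choosing different values of a and b for different input gray-level intervals.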

f. Image thresholding operation:

The invention binarizes the contrast-enhanced text image with the OTSU algorithm to obtain a binarized image. In the preferred embodiment of the present invention, let the gray level t be the segmentation threshold between the foreground and the background of the contrast-enhanced text image; let the proportion of foreground points in the contrast-enhanced text image be $w_0$ with average gray level $u_0$, and the proportion of background points be $w_1$ with average gray level $u_1$. The total average gray level of the contrast-enhanced text image is then:

$$u = w_0 u_0 + w_1 u_1$$

The variance between the foreground and the background of the contrast-enhanced text image is:

$$g = w_0 (u_0 - u)^2 + w_1 (u_1 - u)^2 = w_0 w_1 (u_0 - u_1)^2$$

When the variance g is maximal, the difference between the foreground and the background is greatest, and the corresponding gray level t is the optimal threshold. Gray levels greater than t in the contrast-enhanced text image are set to 255 and gray levels less than t are set to 0, which yields the binary text image of the contrast-enhanced text image.
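The threshold search described above can be sketched in Python as follows. The sketch tries every gray level t from 0 to 255 and keeps the one that maximizes g = w0 * w1 * (u0 - u1)^2. The function names `otsu_threshold` and `binarize` are hypothetical, the image is assumed to be a flat list of 8-bit gray levels, and pixels exactly equal to t are assigned to the background, a case the text leaves open.

```python
def otsu_threshold(pixels):
    """Return the gray level t (0..255) that maximizes the variance g."""
    histogram = [0] * 256
    for p in pixels:
        histogram[p] += 1
    total = len(pixels)
    best_t, best_g = 0, -1.0
    for t in range(256):
        n1 = sum(histogram[: t + 1])   # background: gray level <= t (assumption)
        n0 = total - n1                # foreground: gray level > t
        if n0 == 0 or n1 == 0:
            continue                   # one class is empty, g is undefined
        u1 = sum(i * histogram[i] for i in range(t + 1)) / n1
        u0 = sum(i * histogram[i] for i in range(t + 1, 256)) / n0
        w0, w1 = n0 / total, n1 / total
        g = w0 * w1 * (u0 - u1) ** 2
        if g > best_g:                 # keep the first threshold reaching the maximum
            best_t, best_g = t, g
    return best_t


def binarize(pixels, t):
    """Set gray levels above t to 255 and all others to 0."""
    return [255 if p > t else 0 for p in pixels]
```

For an image with two clearly separated gray levels, any t between them gives the same g; the sketch returns the lowest such t.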

Further, the preprocessing operation of the present invention may also include reducing the dimensionality of the binary text image by principal component analysis, so that the binary text image can be processed more efficiently. Principal component analysis converts a set of possibly correlated variables into a set of linearly uncorrelated variables through an orthogonal transformation.

Step two: detect skewed text in the binary text image through an iterative algorithm to obtain a skewed text image, and crop the skewed text image to obtain a binary copy image.

In the preferred embodiment of the invention, the skewed text in the binary text image is detected by the AdaBoost iterative algorithm to form a skewed text image. AdaBoost is an iterative detection algorithm: weak classifiers are trained on different training sets, and the weak classifiers are then combined into a final strong classifier. The algorithm works by adjusting the data distribution: the weight of each sample is set according to whether that sample was classified correctly in the current training set and according to the overall classification accuracy of the previous round. The data set with the updated weights is used to train the next classifier, and the classifiers obtained in each round of training are finally combined into the final decision classifier.

The invention divides the binary text image into different regions to obtain training samples $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$, where a negative sample (background) is denoted by $y_i = 0$ and a positive sample (foreground, i.e., containing skewed text) is denoted by $y_i = 1$. Preferably, the weak classifier constructed by the invention is as follows:

$$h(x, f, p, \theta) = \begin{cases} 1, & p\,f(x) < p\,\theta \\ 0, & \text{otherwise} \end{cases}$$

where f is a feature, θ is a threshold, p indicates the direction of the inequality sign, and x represents a detection sub-window. From the set of constructed weak classifiers, the one with the minimum classification error rate $\varepsilon_t$ is selected as the best weak classifier $h_t(x)$, where $\varepsilon_t$ is calculated as:

$$\varepsilon_t = \min_{f, p, \theta} \sum_i \frac{w_i}{\sum_k w_k}\left|h(x_i, f, p, \theta) - y_i\right|$$

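where $w_i$ is the weight of the i-th training sample. The final strong classifier is then obtained:

$$C(x) = \begin{cases} 1, & \sum_{t=1}^{T} \alpha_t h_t(x) \ge \frac{1}{2}\sum_{t=1}^{T} \alpha_t \\ 0, & \text{otherwise} \end{cases}$$

where T is the number of training rounds, $\alpha_t = \log\frac{1}{\beta_t}$, and

$$\beta_t = \frac{\varepsilon_t}{1 - \varepsilon_t}.$$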

Further, the invention detects the skewed text in the binary text image by means of a cascade classifier. The cascade classifier is a text detection classifier formed by connecting the trained strong classifiers in series; it is a degenerate decision tree. In the cascade classifier, the layer-2 classifier is triggered only by the positive samples produced by layer 1, the layer-3 classifier is triggered only by the positive samples produced by layer 2, and so on. In this way, all skewed text images in the binary text image are detected in a general environment, and the skewed text images are cropped to obtain the binary copy image.
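The triggering rule of the cascade can be sketched in a few lines of Python. Each stage is assumed to be a callable that returns 1 (positive) or 0 (negative) for a detection sub-window; the function name `cascade_classify` and the two example stages are hypothetical and serve only to show the control flow.

```python
def cascade_classify(stages, window):
    """Return 1 only if every stage accepts the window.

    A window rejected at one layer never reaches the later layers, which is
    what makes the cascade behave like a degenerate decision tree.
    """
    for stage in stages:
        if stage(window) == 0:
            return 0
    return 1


# Hypothetical stages operating on a flat list of 0/255 pixels.
def has_foreground(window):
    return 1 if any(p == 255 for p in window) else 0


def enough_foreground(window):
    return 1 if sum(p == 255 for p in window) / len(window) > 0.1 else 0


is_text = cascade_classify([has_foreground, enough_foreground], [0, 255, 255, 0])
```

Because most background windows are rejected by the first layers, the later and typically more expensive layers run only on a small fraction of the windows.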

Step three: progressively rotate the binary copy image, convert each progressively rotated binary copy image into a frequency projection histogram, and obtain a frequency projection histogram set of the binary copy image according to the progressive rotation angles of the binary copy image.

The preferred embodiment of the present invention progressively rotates the binary copy image by a preset angle step. Preferably, the binary copy image is rotated between -45° and 45° in steps of 2°, and the numbers of pixels along the length and the width of the binary copy image are calculated after each rotation step.
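A minimal Python sketch of the progressive rotation is given below. It assumes the binary copy image is represented by the coordinates of its foreground pixels, that rotation is performed about the origin with nearest-pixel rounding, and that the numbers of pixels along the length and width are taken from the bounding box of the rotated points. The function names `candidate_angles`, `rotate_points`, and `bounding_size` are hypothetical.

```python
import math


def candidate_angles(start=-45, stop=45, step=2):
    """Angles in degrees visited by the progressive rotation."""
    return list(range(start, stop + 1, step))


def rotate_points(points, angle_deg):
    """Rotate foreground pixel coordinates (x, y) about the origin."""
    t = math.radians(angle_deg)
    c, s = math.cos(t), math.sin(t)
    return [(round(x * c - y * s), round(x * s + y * c)) for x, y in points]


def bounding_size(points):
    """Numbers of pixels along the two axes of the bounding box."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return max(xs) - min(xs) + 1, max(ys) - min(ys) + 1


line = [(x, 0) for x in range(10)]           # a horizontal 10-pixel stroke
sizes = {a: bounding_size(rotate_points(line, a)) for a in candidate_angles()}
```

With a 2° step starting at -45°, the grid contains 46 odd-valued angles from -45° to 45°, so 0° itself is not among the candidates; a different start or step can be chosen if that angle must be tested.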

Further, each progressively rotated binary copy image is converted into a frequency projection histogram by means of a Fourier transform. In detail, the Fourier transform is:

$$F(u, v) = \sum_{x=0}^{M-1}\sum_{y=0}^{N-1} f(x, y)\, e^{-j2\pi\left(\frac{ux}{M} + \frac{vy}{N}\right)}$$

and the inverse transform is:

$$f(x, y) = \frac{1}{MN}\sum_{u=0}^{M-1}\sum_{v=0}^{N-1} F(u, v)\, e^{j2\pi\left(\frac{ux}{M} + \frac{vy}{N}\right)}$$

where u = 0, 1, 2, ..., M-1; v = 0, 1, 2, ..., N-1; x = 0, 1, 2, ..., M-1; y = 0, 1, 2, ..., N-1; M and N are the numbers of pixels along the length and the width of the binary copy image; x and y are spatial coordinates; f(x, y) is the sampled value of the binary copy image in the spatial domain; F(u, v) is the sampled value of the binary copy image in the Fourier transform domain; and u and v are coordinates in the transform domain. When the binary copy image is a square matrix, M is equal to N. F(u, v) is referred to as the frequency spectrum of the binary copy image signal f(x, y). The magnitude spectrum and the phase spectrum of the Fourier-transformed binary copy image are calculated respectively as:

$$|F(u, v)| = \sqrt{R^2(u, v) + I^2(u, v)}$$

$$\phi(u, v) = \arctan\frac{I(u, v)}{R(u, v)}$$

where $F(u, v) = R(u, v) + jI(u, v) = |F(u, v)|\,e^{j\phi(u, v)}$, R(u, v) and I(u, v) are the real and imaginary parts of F(u, v), |F(u, v)| represents the magnitude spectrum of the binary copy image, and φ(u, v) represents the phase spectrum of the binary copy image.
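For illustration, the transform and the two spectra can be computed directly from their definitions with the Python standard library, as in the sketch below. It assumes the unnormalized forward discrete Fourier transform (any 1/(MN) factor is left to the inverse), an image stored as a list of M rows of N values, and the quadrant-aware `math.atan2` in place of a plain arctangent of I/R. The function names `dft2` and `magnitude_and_phase` are hypothetical, and the direct summation is only practical for small images.

```python
import cmath
import math


def dft2(f):
    """Direct 2-D discrete Fourier transform of an M x N list of lists."""
    M, N = len(f), len(f[0])
    return [[sum(f[x][y] * cmath.exp(-2j * math.pi * (u * x / M + v * y / N))
                 for x in range(M) for y in range(N))
             for v in range(N)]
            for u in range(M)]


def magnitude_and_phase(F):
    """Return |F(u, v)| = sqrt(R^2 + I^2) and phi(u, v) = atan2(I, R)."""
    magnitude = [[abs(c) for c in row] for row in F]
    phase = [[math.atan2(c.imag, c.real) for c in row] for row in F]
    return magnitude, phase


spectrum = dft2([[1, 1], [1, 1]])
magnitude, phase = magnitude_and_phase(spectrum)
```

For the constant 2 x 2 image in the example, all of the energy falls into the zero-frequency term F(0, 0), and the remaining coefficients vanish up to rounding error.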

Furthermore, the invention constructs a frequency projection histogram from the calculated magnitude spectrum and phase spectrum of the binary copy image. Different progressive rotation angles of the binary copy image yield different frequency projection histograms, which together form the frequency projection histogram set of the binary copy image.

Step four: calculate the standard deviation of the peak points and valley points of each histogram in the frequency projection histogram set to obtain a standard deviation set, and take the rotation angle corresponding to the maximum standard deviation in the standard deviation set as the correction angle of the text image, thereby completing the angle correction of the text image.

In a preferred embodiment of the present invention, the standard deviation of the peak points and valley points of a histogram in the frequency projection histogram set is calculated as:

$$\sigma = \sqrt{\frac{1}{n + m}\left[\sum_{i=1}^{n}(x_i - \mu)^2 + \sum_{j=1}^{m}(y_j - \mu)^2\right]}$$

where σ denotes the standard deviation of the frequency projection histogram, $x_i$ represents the i-th peak point in the frequency projection histogram, n represents the number of peak points, $y_j$ represents the j-th valley point in the frequency projection histogram, m represents the number of valley points, and μ is the average of all peak points and valley points. The standard deviation reflects the degree of dispersion between the valley points and the peak points.

Further, the invention calculates the standard deviation of every histogram in the frequency projection histogram set to obtain the standard deviation set. Owing to the structural characteristics of text images, the text image is in its best corrected orientation when the standard deviation is maximal; the corresponding rotation angle is taken as the correction angle of the text image, and the original image is rotated by this correction angle to correct it.
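The selection of the correction angle can be sketched in Python as follows. The sketch assumes that each histogram is a list of bin counts, that peak and valley points are simple local maxima and minima among the interior bins, and that the population standard deviation (division by n + m) is used. The function names `peaks_and_valleys`, `peak_valley_std`, and `best_angle` are hypothetical.

```python
import math


def peaks_and_valleys(hist):
    """Collect local maxima (peaks) and local minima (valleys) of interior bins."""
    peaks, valleys = [], []
    for i in range(1, len(hist) - 1):
        if hist[i] > hist[i - 1] and hist[i] >= hist[i + 1]:
            peaks.append(hist[i])
        elif hist[i] < hist[i - 1] and hist[i] <= hist[i + 1]:
            valleys.append(hist[i])
    return peaks, valleys


def peak_valley_std(hist):
    """Standard deviation of all peak and valley values about their common mean."""
    peaks, valleys = peaks_and_valleys(hist)
    values = peaks + valleys
    if not values:
        return 0.0
    mu = sum(values) / len(values)
    return math.sqrt(sum((v - mu) ** 2 for v in values) / len(values))


def best_angle(histograms):
    """histograms maps a rotation angle to its histogram; return the angle
    whose histogram has the largest peak/valley standard deviation."""
    return max(histograms, key=lambda angle: peak_valley_std(histograms[angle]))


aligned = [0, 9, 0, 9, 0, 9, 0]   # text lines and gaps clearly separated
skewed = [4, 5, 4, 5, 4, 5, 4]    # lines smeared across neighbouring bins
angle = best_angle({-3: skewed, 1: aligned, 5: skewed})
```

In the example, the histogram with sharply separated peaks and valleys has the larger standard deviation, so its angle is selected, which mirrors the reasoning that well-aligned text lines produce the most pronounced peaks and valleys.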

Optionally, in other embodiments, the text image angle deviation rectifying program may be divided into one or more modules, which are stored in the memory 11 and executed by one or more processors (in this embodiment, the processor 12) to implement the present invention.

For example, FIG. 3 is a schematic diagram of the program modules of the text image angle deviation rectifying program in an embodiment of the text image angle deviation rectifying device of the present invention. In this embodiment, the text image angle deviation rectifying program may be divided into a text image preprocessing module 10, a text image detection module 20, an image conversion module 30, and a calculation module 40. By way of example:

The text image preprocessing module 10 is configured to: acquire a text image, and perform a preprocessing operation on the text image to obtain a binary text image.

The text image detection module 20 is configured to: detect skewed text in the binary text image through an iterative algorithm to obtain a skewed text image, and crop the skewed text image to obtain a binary copy image.

The image conversion module 30 is configured to: progressively rotate the binary copy image, convert each progressively rotated binary copy image into a frequency projection histogram, and obtain a frequency projection histogram set of the binary copy image according to the progressive rotation angles of the binary copy image.

The calculation module 40 is configured to: calculate the standard deviation of the peak points and valley points of each histogram in the frequency projection histogram set to obtain a standard deviation set, and take the rotation angle corresponding to the maximum standard deviation in the standard deviation set as the correction angle of the text image, thereby completing the angle correction of the text image.

When executed, the text image preprocessing module 10, the text image detection module 20, the image conversion module 30, and the calculation module 40 implement functions or operation steps substantially the same as those of the above embodiments, which are not repeated here.

In addition, an embodiment of the present invention further provides a computer-readable storage medium on which a text image angle deviation rectifying program is stored, the text image angle deviation rectifying program being executable by one or more processors to implement the following operations:

acquiring a text image, and performing a preprocessing operation on the text image to obtain a binary text image;

detecting skewed text in the binary text image through an iterative algorithm to obtain a skewed text image, and cropping the skewed text image to obtain a binary copy image;

progressively rotating the binary copy image, converting each progressively rotated binary copy image into a frequency projection histogram, and obtaining a frequency projection histogram set of the binary copy image according to the progressive rotation angles of the binary copy image;

calculating the standard deviation of the peak points and valley points of each histogram in the frequency projection histogram set to obtain a standard deviation set, and taking the rotation angle corresponding to the maximum standard deviation in the standard deviation set as the correction angle of the text image, thereby completing the angle correction of the text image.

The embodiments of the computer-readable storage medium of the present invention are substantially the same as the embodiments of the text image angle deviation rectifying device and method described above, and are not described again here.

It should be noted that the above numbering of the embodiments of the present invention is merely for description and does not represent the merits of the embodiments. The terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, apparatus, article, or method that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, apparatus, article, or method. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in the process, apparatus, article, or method that includes the element.

From the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, and can of course also be implemented by hardware, although in many cases the former is the better implementation. Based on this understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium as described above (e.g., ROM/RAM, a magnetic disk, or an optical disk) and includes instructions for enabling a terminal device (e.g., a mobile phone, a computer, a server, or a network device) to execute the method according to the embodiments of the present invention.

The above description is only a preferred embodiment of the present invention and is not intended to limit the scope of the present invention. All equivalent structures or equivalent processes derived from the contents of this specification and the accompanying drawings, whether applied directly or indirectly in other related technical fields, are likewise included in the scope of the present invention.
