Method and device for recognizing characters in image and computer readable storage medium

Document No.: 1545167    Published: 2020-01-17

Reading note: This technology, "Method and device for recognizing characters in an image and computer-readable storage medium", was designed and created by 陈少琼 (Chen Shaoqiong) and 卢宁 (Lu Ning) on 2019-09-06. Its main content is as follows: The invention relates to artificial intelligence technology and discloses a method for recognizing text in an image, comprising: acquiring an original image data set and a label set; performing local brightening on the original image data set to obtain a standard image data set; applying an affine transformation to the standard image data set to generate a feature candidate region set; applying a convolution operation and a pooling operation with initial internal parameters to the feature candidate region set, followed by an activation operation, to obtain a text set; comparing the text set against the label set; predicting again if the match accuracy between the text set and the label set is below a preset threshold; and, once the match accuracy exceeds the preset threshold, receiving an image input by a user, recognizing the text in that image, and outputting it. The invention also provides a device for recognizing text in an image and a computer-readable storage medium. The invention can realize accurate recognition of text within images.

1. A method for recognizing text in an image, the method comprising:

Step A: acquiring an original image data set containing text and a label set, and performing local brightening on the original image data set to obtain a standard image data set;

Step B: performing an affine transformation on the standard image data set to generate a feature candidate region set;

Step C: extracting a feature matrix set from the feature candidate region set by a convolution operation and a pooling operation with initial internal parameters, and predicting a text set by applying an activation operation to the feature matrix set;

Step D: comparing the text set against the label set item by item; if the match accuracy between the text set and the label set is below a preset threshold, adjusting the internal parameters of the convolution and pooling operations and returning to Step C to predict again; and if the match accuracy exceeds the preset threshold, outputting the internal parameters of the convolution and pooling operations as the optimal internal parameters;

Step E: receiving an image input by a user, performing the affine transformation on the input image, applying the convolution and pooling operations with the optimal internal parameters, and recognizing and outputting the text in the image through the activation operation.

2. The method for recognizing text in an image according to claim 1, wherein the label set comprises a label text set and a label position set;

the label text set records the text of each image in the original image data set;

the label position set records the coordinate position, within each image, of the text of that image in the original image data set.

3. The method for recognizing text in an image according to claim 2, wherein the local brightening process comprises:

finding a text image region g(x, y) in the original image data set according to the label position set;

calculating a linear brightness enhancement function e(x, y) from the text image region g(x, y); and

enhancing the brightness of the text image region g(x, y) according to the linear brightness enhancement function e(x, y) to complete the local brightening process.

4. The method for recognizing text in an image according to claim 3, wherein enhancing the brightness of the text image region g(x, y) according to the linear brightness enhancement function e(x, y) comprises calculating the enhanced text image region using the following formula [formula image not reproduced in the source]:

where f(x, y) is the enhanced text image region, N is the average brightness value of the text image region, (x1, y1) is the lower-left coordinate of the text image region, N1 is the brightness value at the lower-left coordinate of the text image region, (x4, y4) is the upper-right coordinate of the text image region, and N4 is the brightness value at the upper-right coordinate of the text image region.

5. The method for recognizing text in an image according to claim 1, wherein the convolution operation and the pooling operation comprise:

constructing a convolution template in advance and determining a convolution stride;

computing the convolution template against the feature candidate region set according to the convolution stride to obtain a convolved matrix set, completing the convolution operation; and

selecting the maximum or average value of each matrix in the convolved matrix set to replace that matrix, completing the pooling operation.

6. A device for recognizing text in an image, the device comprising a memory and a processor, the memory storing an in-image text recognition program operable on the processor, wherein the in-image text recognition program, when executed by the processor, implements the following steps:

Step A: acquiring an original image data set containing text and a label set, and performing local brightening on the original image data set to obtain a standard image data set;

Step B: performing an affine transformation on the standard image data set to generate a feature candidate region set;

Step C: extracting a feature matrix set from the feature candidate region set by a convolution operation and a pooling operation with initial internal parameters, and predicting a text set by applying an activation operation to the feature matrix set;

Step D: comparing the text set against the label set item by item; if the match accuracy between the text set and the label set is below a preset threshold, adjusting the internal parameters of the convolution and pooling operations and returning to Step C to predict again; and if the match accuracy exceeds the preset threshold, outputting the internal parameters of the convolution and pooling operations as the optimal internal parameters;

Step E: receiving an image input by a user, performing the affine transformation on the input image, applying the convolution and pooling operations with the optimal internal parameters, and recognizing and outputting the text in the image through the activation operation.

7. The device for recognizing text in an image according to claim 6, wherein the label set comprises a label text set and a label position set;

the label text set records the text of each image in the original image data set;

the label position set records the coordinate position, within each image, of the text of that image in the original image data set.

8. The device for recognizing text in an image according to claim 7, wherein the local brightening process comprises:

finding a text image region g(x, y) in the original image data set according to the label position set;

calculating a linear brightness enhancement function e(x, y) from the text image region g(x, y); and

enhancing the brightness of the text image region g(x, y) according to the linear brightness enhancement function e(x, y) to complete the local brightening process.

9. The device for recognizing text in an image according to claim 8, wherein enhancing the brightness of the text image region g(x, y) according to the linear brightness enhancement function e(x, y) comprises calculating the enhanced text image region using the following formula [formula image not reproduced in the source]:

where f(x, y) is the enhanced text image region, N is the average brightness value of the text image region, (x1, y1) is the lower-left coordinate of the text image region, N1 is the brightness value at the lower-left coordinate of the text image region, (x4, y4) is the upper-right coordinate of the text image region, and N4 is the brightness value at the upper-right coordinate of the text image region.

10. A computer-readable storage medium having stored thereon an in-image text recognition program executable by one or more processors to perform the steps of the method for recognizing text in an image according to any one of claims 1 to 5.

Technical Field

The present invention relates to the field of artificial intelligence technologies, and in particular, to a method and an apparatus for recognizing characters in an image, and a computer-readable storage medium.

Background

Intelligently recognizing the text in an image can effectively reduce labor input and allows images to be classified efficiently according to their text. Traditional methods perform text recognition based on color denoising, grayscale conversion, histogram construction, and the like; although the principle is simple and intuitive, such methods place high requirements on the collected text-image library and have a low recognition rate for complex text.

Disclosure of Invention

The invention provides a method and a device for recognizing text in an image and a computer-readable storage medium, the main aim being to provide accurate recognition of the text within an image.

In order to achieve the above object, the present invention provides a method for recognizing text in an image, comprising:

Step A: acquiring an original image data set containing text and a label set, and performing local brightening on the original image data set to obtain a standard image data set;

Step B: performing an affine transformation on the standard image data set to generate a feature candidate region set;

Step C: extracting a feature matrix set from the feature candidate region set by a convolution operation and a pooling operation with initial internal parameters, and predicting a text set by applying an activation operation to the feature matrix set;

Step D: comparing the text set against the label set item by item; if the match accuracy between the text set and the label set is below a preset threshold, adjusting the internal parameters of the convolution and pooling operations and returning to Step C to predict again; and if the match accuracy exceeds the preset threshold, outputting the internal parameters of the convolution and pooling operations as the optimal internal parameters;

Step E: receiving an image input by a user, performing the affine transformation on the input image, applying the convolution and pooling operations with the optimal internal parameters, and recognizing and outputting the text in the image through the activation operation.

Optionally, the label set comprises a label text set and a label position set;

the label text set records the text of each image in the original image data set;

the label position set records the coordinate position, within each image, of the text of that image in the original image data set.

Optionally, the local brightening process comprises:

finding a text image region g(x, y) in the original image data set according to the label position set;

calculating a linear brightness enhancement function e(x, y) from the text image region g(x, y); and

enhancing the brightness of the text image region g(x, y) according to the linear brightness enhancement function e(x, y) to complete the local brightening process.

Optionally, enhancing the brightness of the text image region g(x, y) according to the linear brightness enhancement function e(x, y) comprises calculating the enhanced text image region using the following formula [formula image not reproduced in the source]:

where f(x, y) is the enhanced text image region, N is the average brightness value of the text image region, (x1, y1) is the lower-left coordinate of the text image region, N1 is the brightness value at the lower-left coordinate of the text image region, (x4, y4) is the upper-right coordinate of the text image region, and N4 is the brightness value at the upper-right coordinate of the text image region.

Optionally, the convolution operation and the pooling operation comprise:

constructing a convolution template in advance and determining a convolution stride;

computing the convolution template against the feature candidate region set according to the convolution stride to obtain a convolved matrix set, completing the convolution operation; and

selecting the maximum or average value of each matrix in the convolved matrix set to replace that matrix, completing the pooling operation.

In addition, in order to achieve the above object, the present invention further provides a device for recognizing text in an image, the device comprising a memory and a processor, wherein the memory stores an in-image text recognition program operable on the processor, and the in-image text recognition program, when executed by the processor, implements the following steps:

Step A: acquiring an original image data set containing text and a label set, and performing local brightening on the original image data set to obtain a standard image data set;

Step B: performing an affine transformation on the standard image data set to generate a feature candidate region set;

Step C: extracting a feature matrix set from the feature candidate region set by a convolution operation and a pooling operation with initial internal parameters, and predicting a text set by applying an activation operation to the feature matrix set;

Step D: comparing the text set against the label set item by item; if the match accuracy between the text set and the label set is below a preset threshold, adjusting the internal parameters of the convolution and pooling operations and returning to Step C to predict again; and if the match accuracy exceeds the preset threshold, outputting the internal parameters of the convolution and pooling operations as the optimal internal parameters;

Step E: receiving an image input by a user, performing the affine transformation on the input image, applying the convolution and pooling operations with the optimal internal parameters, and recognizing and outputting the text in the image through the activation operation.

Optionally, the label set comprises a label text set and a label position set;

the label text set records the text of each image in the original image data set;

the label position set records the coordinate position, within each image, of the text of that image in the original image data set.

Optionally, the local brightening process comprises:

finding a text image region g(x, y) in the original image data set according to the label position set;

calculating a linear brightness enhancement function e(x, y) from the text image region g(x, y); and

enhancing the brightness of the text image region g(x, y) according to the linear brightness enhancement function e(x, y) to complete the local brightening process.

Optionally, enhancing the brightness of the text image region g(x, y) according to the linear brightness enhancement function e(x, y) comprises calculating the enhanced text image region using the following formula [formula image not reproduced in the source]:

where f(x, y) is the enhanced text image region, N is the average brightness value of the text image region, (x1, y1) is the lower-left coordinate of the text image region, N1 is the brightness value at the lower-left coordinate of the text image region, (x4, y4) is the upper-right coordinate of the text image region, and N4 is the brightness value at the upper-right coordinate of the text image region.

Further, to achieve the above object, the present invention also provides a computer readable storage medium having stored thereon an in-image text recognition program executable by one or more processors to implement the steps of the in-image text recognition method as described above.

The invention performs local brightening on the text portions of the original image data set, which can improve the text recognition rate. It further uses an affine transformation to extract the feature points of the text, which benefits the subsequent convolution and pooling operations; because the convolution and pooling operations can make maximal use of these feature points for learning and recognition, the text recognition accuracy is effectively improved. Therefore, the method and device for recognizing text in an image and the computer-readable storage medium can realize accurate and efficient recognition of text in images.

Drawings

Fig. 1 is a schematic flow chart of a method for recognizing text in an image according to an embodiment of the present invention;

Fig. 2 is a schematic diagram of the internal structure of an in-image text recognition device according to an embodiment of the present invention;

Fig. 3 is a block diagram of the in-image text recognition program in the in-image text recognition device according to an embodiment of the present invention.

The implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.

Detailed Description

It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.

The invention provides a method for recognizing text in an image. Fig. 1 is a schematic flow chart of a method for recognizing text in an image according to an embodiment of the present invention. The method may be performed by a device, and the device may be implemented by software and/or hardware.

In this embodiment, the method for recognizing text in an image includes:

s1, obtaining an original image data set containing characters and a label set, and carrying out local brightening treatment on the original image data set to obtain a standard image data set.

Preferably, the original image data set containing text is composed of images that each contain text. For example, image A is a photograph of a bus stop sign and therefore contains various kinds of stop-sign text; image B is a photograph of a college entrance examination scene and therefore contains encouraging banner text; image C is a street scene of a snack street and therefore contains the text of various restaurant names; and so on, these images together composing the original image data set.

Preferably, the label set comprises two parts. The first part records the text of each image in the original image data set, i.e., the label text set; for example, the label set records the text within image B ("hard work brings success", "persistence ensures success") and the text within image C ("Hunan cuisine", "hot pot", "Sichuan spicy hot pot"). The second part records the coordinate position, within each image, of that image's text, i.e., the label position set; for example, taking the lower-left corner of image B as the origin, the position of the text in image B is ((x1, y1), (x2, y2), (x3, y3), (x4, y4)), where (x1, y1) denotes the lower-left corner of the text region, (x2, y2) the upper-left corner, (x3, y3) the lower-right corner, and (x4, y4) the upper-right corner, thereby determining the region of the text within the image.
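As a purely illustrative sketch of this two-part label structure (the dictionary layout, field names, and coordinate values below are hypothetical, not taken from the patent):

```python
# Each entry pairs an image with its label text and its label position.
# Corners follow the order described above: (x1, y1) lower-left, (x2, y2) upper-left,
# (x3, y3) lower-right, (x4, y4) upper-right, from the image's lower-left origin.
label_set = {
    "image_B": {
        "label_text": "hard work brings success",
        "label_position": ((10, 12), (10, 40), (150, 12), (150, 40)),
    },
    "image_C": {
        "label_text": "Sichuan spicy hot pot",
        "label_position": ((5, 8), (5, 30), (90, 8), (90, 30)),
    },
}
```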

Specifically, the local brightening process includes finding the text image region g(x, y) in the original image data set according to the label position set, calculating a linear brightness enhancement function e(x, y) from it, and enhancing the brightness of the text image region g(x, y) according to e(x, y) to complete the local brightening. The local brightening enhances the brightness contrast between the text image region and the other, non-text image regions, which facilitates the subsequent text recognition. Further, the linear enhancement function e(x, y) enhances the brightness of the text image region g(x, y) as follows [formula image not reproduced in the source]:

where f(x, y) is the enhanced text image region, N is the average brightness value of the text image region, (x1, y1) is the lower-left coordinate of the text image region, N1 is the brightness value at the lower-left coordinate of the text image region, (x4, y4) is the upper-right coordinate of the text image region, and N4 is the brightness value at the upper-right coordinate of the text image region.

S2: performing an affine transformation on the standard image data set to generate a feature candidate region set.

Specifically, the affine transformation includes: performing convolution extraction on the standard image data set to generate a spatial transformation matrix set, and then performing a matrix operation between the spatial transformation matrix set and the standard image data set to generate the feature candidate region set.

Preferably, the convolution extraction is Z(x, y) = ∫ F(x, y) · T dm, where Z(x, y) is the spatial transformation matrix set, F(x, y) is the standard image data set (including the text image region f(x, y) enhanced above), T is the standard matrix extracted by the convolution, and m is the preset difference between the standard image data set and the standard matrix.

Further, the matrix operation is as follows [formula image not reproduced in the source]:

where the symbol carrying the superscript t represents a pixel within the standard image data set, t denoting the standard image data set, and the symbol carrying the superscript s represents a feature value of the feature candidate region set, s denoting the feature candidate region set.

S3: extracting a feature matrix set from the feature candidate region set by convolution and pooling operations with initial internal parameters, and predicting a text set by applying an activation operation to the feature matrix set.

The convolution and pooling operations comprise: constructing a convolution template in advance and determining a convolution stride; computing the convolution template against the feature candidate region set according to the convolution stride to obtain a convolved matrix set, which completes the convolution operation; and selecting the maximum or average value of each matrix in the convolved matrix set to replace that matrix, which completes the pooling operation. The initial internal parameters are therefore the convolution template, the convolution stride, and the pooling mode.

Further, the pre-constructed convolution template may be a standard 3 × 3 matrix [the example template in the source is an image and is not reproduced]. The convolution is computed by sliding the template from left to right with a stride of 1. Taking a 9 × 9 feature candidate region matrix from the feature candidate region set, the template is first aligned with the top-left 3 × 3 block of that matrix, the corresponding entries are multiplied (e.g., 1 × 0, 0 × 3, 1 × 1, and so on) and the products summed to give one output value; the template then moves one step to the right with stride 1 and the same computation is performed, and so on until the whole region has been traversed. Completing the convolution operation in this way produces a large number of small matrices. The pooling operation therefore reduces the dimensionality of these small matrices, preferably by the maximization principle: each small matrix is replaced by its maximum value (3 and 7 in the source's example), which completes the pooling operation.

Preferably, the convolution and pooling operations are repeated, for example 16 times, to obtain the final feature matrix set.
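As a minimal runnable sketch of the stride-1 template convolution and maximum pooling walked through above (the 3 × 3 template values and the 9 × 9 input are placeholders, since the source's example matrices are images that did not survive extraction):

```python
import numpy as np

def convolve2d(region: np.ndarray, template: np.ndarray, stride: int = 1) -> np.ndarray:
    """Slide a convolution template over a feature candidate region.

    Each output entry is the sum of the elementwise products between the
    template and the block it currently covers (stride-1, no padding).
    """
    th, tw = template.shape
    rh, rw = region.shape
    out_h = (rh - th) // stride + 1
    out_w = (rw - tw) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            block = region[i * stride:i * stride + th, j * stride:j * stride + tw]
            out[i, j] = np.sum(block * template)   # multiply corresponding entries, sum
    return out

def max_pool(matrix: np.ndarray) -> float:
    """Pooling by the maximization principle: replace a matrix by its maximum."""
    return float(matrix.max())

# Hypothetical 3x3 template and 9x9 feature candidate region (values are placeholders).
template = np.array([[1, 0, 1], [0, 1, 0], [1, 0, 1]])
region = np.arange(81).reshape(9, 9) % 5
conv = convolve2d(region, template, stride=1)   # 7x7 convolved matrix
pooled = max_pool(conv)                          # single pooled value
```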

Preferably, the activation operation passes the feature matrix set through a softmax function for probability estimation and selects the text prediction with the highest probability as the final predicted text for output. The source's formula image is not reproduced; in its standard form the softmax function is

P(word) = e^{z_word} / Σ_{j=1}^{k} e^{z_j},

where P(word) is the output probability of the word, z denotes a candidate word's score from the feature matrix set (notation assumed here), k is the data size of the feature matrix set, e is the base of the natural logarithm (an infinite non-repeating decimal), and j indexes the selectable word range. For example, if P(word) is computed as 0.87 when the word is "spicy hot pot" and as 0.24 when the word is "Hunan restaurant", the feature matrix is taken to represent the text "spicy hot pot".
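A small numeric illustration of that selection step (the vocabulary and scores are invented; only the softmax form itself is standard):

```python
import numpy as np

def softmax(scores: np.ndarray) -> np.ndarray:
    """Standard softmax: exponentiate and normalize to probabilities."""
    exp = np.exp(scores - scores.max())   # subtract max for numerical stability
    return exp / exp.sum()

vocab = ["spicy hot pot", "Hunan restaurant", "Sichuan hot pot"]  # hypothetical word range
scores = np.array([2.1, 0.3, 0.9])                                # hypothetical activations
probs = softmax(scores)
prediction = vocab[int(np.argmax(probs))]   # pick the highest-probability word
```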

S4: comparing the text set against the label set item by item; if the match accuracy between the text set and the label set is below a preset threshold, adjusting the internal parameters of the convolution and pooling operations and returning to S3 to predict again; if the match accuracy exceeds the preset threshold, outputting the internal parameters of the convolution and pooling operations as the optimal internal parameters.

Preferably, the text set and the label set are compared entry by entry. If, for example, the text predicted for image A is "I love China" while the label set records the text of image A as "happy China", the comparison for image A counts as an error; the match accuracy is accumulated over the whole set by analogy, and the preset threshold may typically be set to 90.5%.

The readjustment adjusts the convolution template and the convolution stride, as well as the stride and pooling mode of the pooling operation.
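A schematic of the S3/S4 loop under an assumed generic model interface (predict, adjust_parameters, and parameters are hypothetical names; the patent does not specify an update rule):

```python
def train_until_threshold(model, images, labels, threshold=0.905, max_rounds=100):
    """Repeat S3 and S4: predict a text set, compare it with the label set,
    adjust the internal parameters, and stop once accuracy exceeds the threshold."""
    for _ in range(max_rounds):
        predicted = [model.predict(img) for img in images]           # S3: predict text set
        accuracy = sum(p == t for p, t in zip(predicted, labels)) / len(labels)
        if accuracy > threshold:                                     # S4: compare and decide
            return model.parameters()    # output the optimal internal parameters
        model.adjust_parameters()        # adjust template, stride, pooling mode
    raise RuntimeError("match accuracy never exceeded the preset threshold")
```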

S5: receiving an image input by a user, performing the affine transformation on the input image, applying the convolution and pooling operations with the optimal internal parameters, and recognizing and outputting the text in the image through the activation operation.

For example, on receiving a user-input image of an academic conference, the text in the image is predicted through the affine transformation, the convolution operation, the pooling operation, and the activation operation described above, such as banner text announcing that Professor Huang has been invited to lecture.

The invention also provides a device for recognizing text in an image. Fig. 2 is a schematic diagram of the internal structure of an in-image text recognition device according to an embodiment of the present invention.

In the present embodiment, the in-image text recognition device 1 may be a PC (Personal Computer), a terminal device such as a smartphone, a tablet computer, or a portable computer, or a server. The in-image text recognition device 1 comprises at least a memory 11, a processor 12, a communication bus 13, and a network interface 14.

The memory 11 includes at least one type of readable storage medium, including a flash memory, a hard disk, a multimedia card, a card-type memory (e.g., SD or DX memory), a magnetic memory, a magnetic disk, an optical disk, and the like. In some embodiments the memory 11 may be an internal storage unit of the in-image text recognition device 1, for example a hard disk of the device 1. In other embodiments the memory 11 may be an external storage device of the device 1, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a Flash Card provided on the device 1. Further, the memory 11 may include both an internal storage unit and an external storage device of the device 1. The memory 11 can be used not only to store application software installed in the device 1 and various types of data, such as the code of the in-image text recognition program 01, but also to temporarily store data that has been output or is to be output.

The processor 12, which in some embodiments may be a Central Processing Unit (CPU), controller, microcontroller, microprocessor, or other data processing chip, is configured to run the program code or process the data stored in the memory 11, for example to execute the in-image text recognition program 01.

The communication bus 13 implements the connections and communication among these components.

The network interface 14 may optionally include a standard wired interface and a wireless interface (e.g., a Wi-Fi interface), and is typically used to establish a communication link between the device 1 and other electronic devices.

Optionally, the device 1 may further comprise a user interface, which may include a display (Display) and an input unit such as a keyboard (Keyboard), and may optionally include a standard wired interface and a wireless interface. Alternatively, in some embodiments, the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode) touch device, or the like. The display, which may also be referred to as a display screen or display unit, is used to display the information processed in the in-image text recognition device 1 and to display a visualized user interface.

Fig. 2 shows only the in-image text recognition device 1 with the components 11 to 14 and the in-image text recognition program 01. It will be understood by those skilled in the art that the structure shown in Fig. 2 does not limit the in-image text recognition device 1, which may include fewer or more components than shown, a combination of certain components, or a different arrangement of components.

In the embodiment of the device 1 shown in Fig. 2, the memory 11 stores the in-image text recognition program 01, and the processor 12 implements the following steps when executing the in-image text recognition program 01 stored in the memory 11:

the method comprises the steps of firstly, obtaining an original image data set containing characters and a label set, and carrying out local brightening processing on the original image data set to obtain a standard image data set.

Preferably, the original image data set containing text is composed of images that each contain text. For example, image A is a photograph of a bus stop sign and therefore contains various kinds of stop-sign text; image B is a photograph of a college entrance examination scene and therefore contains encouraging banner text; image C is a street scene of a snack street and therefore contains the text of various restaurant names; and so on, these images together composing the original image data set.

Preferably, the label set comprises two parts. The first part records the text of each image in the original image data set, i.e., the label text set; for example, the label set records the text within image B ("hard work brings success", "persistence ensures success") and the text within image C ("Hunan cuisine", "hot pot", "Sichuan spicy hot pot"). The second part records the coordinate position, within each image, of that image's text, i.e., the label position set; for example, taking the lower-left corner of image B as the origin, the position of the text in image B is ((x1, y1), (x2, y2), (x3, y3), (x4, y4)), where (x1, y1) denotes the lower-left corner of the text region, (x2, y2) the upper-left corner, (x3, y3) the lower-right corner, and (x4, y4) the upper-right corner, thereby determining the region of the text within the image.

Specifically, the local brightening process includes finding the text image region g(x, y) in the original image data set according to the label position set, calculating a linear brightness enhancement function e(x, y) from it, and enhancing the brightness of the text image region g(x, y) according to e(x, y) to complete the local brightening. The local brightening enhances the brightness contrast between the text image region and the other, non-text image regions, which facilitates the subsequent text recognition. Further, the linear enhancement function e(x, y) enhances the brightness of the text image region g(x, y) as follows [formula image not reproduced in the source]:

where f(x, y) is the enhanced text image region, N is the average brightness value of the text image region, (x1, y1) is the lower-left coordinate of the text image region, N1 is the brightness value at the lower-left coordinate of the text image region, (x4, y4) is the upper-right coordinate of the text image region, and N4 is the brightness value at the upper-right coordinate of the text image region.

Step two: performing an affine transformation on the standard image data set to generate a feature candidate region set.

Specifically, the affine transformation includes: performing convolution extraction on the standard image data set to generate a spatial transformation matrix set, and then performing a matrix operation between the spatial transformation matrix set and the standard image data set to generate the feature candidate region set.

Preferably, the convolution extraction is Z(x, y) = ∫ F(x, y) · T dm, where Z(x, y) is the spatial transformation matrix set, F(x, y) is the standard image data set (including the text image region f(x, y) enhanced above), T is the standard matrix extracted by the convolution, and m is the preset difference between the standard image data set and the standard matrix.

Further, the matrix operation is as follows [formula image not reproduced in the source]:

where the symbol carrying the superscript t represents a pixel within the standard image data set, t denoting the standard image data set, and the symbol carrying the superscript s represents a feature value of the feature candidate region set, s denoting the feature candidate region set.

Step three: extracting a feature matrix set from the feature candidate region set by convolution and pooling operations with initial internal parameters, and predicting a text set by applying an activation operation to the feature matrix set.

The convolution and pooling operations comprise: constructing a convolution template in advance and determining a convolution stride; computing the convolution template against the feature candidate region set according to the convolution stride to obtain a convolved matrix set, which completes the convolution operation; and selecting the maximum or average value of each matrix in the convolved matrix set to replace that matrix, which completes the pooling operation. The initial internal parameters are therefore the convolution template, the convolution stride, and the pooling mode.

Further, the pre-constructed convolution template may be a standard 3 × 3 matrix [the example template in the source is an image and is not reproduced]. The convolution is computed by sliding the template from left to right with a stride of 1. Taking a 9 × 9 feature candidate region matrix from the feature candidate region set, the template is first aligned with the top-left 3 × 3 block of that matrix, the corresponding entries are multiplied (e.g., 1 × 0, 0 × 3, 1 × 1, and so on) and the products summed to give one output value; the template then moves one step to the right with stride 1 and the same computation is performed, and so on until the whole region has been traversed. Completing the convolution operation in this way produces a large number of small matrices. The pooling operation therefore reduces the dimensionality of these small matrices, preferably by the maximization principle: each small matrix is replaced by its maximum value (3 and 7 in the source's example), which completes the pooling operation.

Preferably, the convolution and pooling operations are repeated, for example 16 times, to obtain the final feature matrix set.

Preferably, the activation operation passes the feature matrix set through a softmax function for probability estimation and selects the text prediction with the highest probability as the final predicted text for output. The source's formula image is not reproduced; in its standard form the softmax function is

P(word) = e^{z_word} / Σ_{j=1}^{k} e^{z_j},

where P(word) is the output probability of the word, z denotes a candidate word's score from the feature matrix set (notation assumed here), k is the data size of the feature matrix set, e is the base of the natural logarithm (an infinite non-repeating decimal), and j indexes the selectable word range. For example, if P(word) is computed as 0.87 when the word is "spicy hot pot" and as 0.24 when the word is "Hunan restaurant", the feature matrix is taken to represent the text "spicy hot pot".

Step four: comparing the text set against the label set item by item; if the match accuracy between the text set and the label set is below a preset threshold, adjusting the internal parameters of the convolution and pooling operations and returning to step three to predict again; if the match accuracy exceeds the preset threshold, outputting the internal parameters of the convolution and pooling operations as the optimal internal parameters.

Preferably, the text set and the label set are compared entry by entry. If, for example, the text predicted for image A is "I love China" while the label set records the text of image A as "happy China", the comparison for image A counts as an error; the match accuracy is accumulated over the whole set by analogy, and the preset threshold may typically be set to 90.5%.

The readjustment adjusts the convolution template and the convolution stride, as well as the stride and pooling mode of the pooling operation.

Step five: receiving an image input by a user, performing the affine transformation on the input image, applying the convolution and pooling operations with the optimal internal parameters, and recognizing and outputting the text in the image through the activation operation.

For example, on receiving a user-input image of an academic conference, the text in the image is predicted through the affine transformation, the convolution operation, the pooling operation, and the activation operation described above, such as banner text announcing that Professor Huang has been invited to lecture.

Alternatively, in other embodiments, the in-image text recognition program may be further divided into one or more modules, and the one or more modules are stored in the memory 11 and executed by one or more processors (the processor 12 in this embodiment) to implement the present invention.

For example, referring to Fig. 3, which is a schematic diagram of the program modules of the in-image text recognition program in an embodiment of the in-image text recognition device of the present invention, the in-image text recognition program may be divided into a data receiving and processing module 10, a feature extraction module 20, a model training module 30, and a text recognition output module 40. Illustratively:

the data receiving and processing module 10 is configured to: the method comprises the steps of obtaining an original image data set containing characters and a label set, and carrying out local brightening processing on the original image data set to obtain a standard image data set.

The feature extraction module 20 is configured to: perform an affine transformation on the standard image data set to generate a feature candidate region set.

The model training module 30 is configured to: extract a feature matrix set from the feature candidate region set by convolution and pooling operations with initial internal parameters, predict a text set by applying an activation operation to the feature matrix set, and compare the text set against the label set item by item; if the match accuracy between the text set and the label set is below a preset threshold, adjust the internal parameters of the convolution and pooling operations and predict again; if the match accuracy exceeds the preset threshold, output the internal parameters of the convolution and pooling operations as the optimal internal parameters.

The text recognition output module 40 is configured to: receive an image input by a user, perform the affine transformation on the input image, apply the convolution and pooling operations with the optimal internal parameters, and recognize and output the text in the image through the activation operation.

The functions or operation steps implemented by the data receiving and processing module 10, the feature extraction module 20, the model training module 30, and the text recognition output module 40 when executed are substantially the same as in the above embodiments and are not repeated here.

Furthermore, an embodiment of the present invention further provides a computer-readable storage medium having stored thereon an in-image text recognition program executable by one or more processors to implement the following operations:

the method comprises the steps of obtaining an original image data set containing characters and a label set, and carrying out local brightening processing on the original image data set to obtain a standard image data set.

And carrying out affine transformation on the standard image data set to generate a feature candidate region set.

Extracting a feature matrix set from the feature candidate area set based on convolution operation and pooling operation with initial internal parameters, predicting a character set after activation operation is carried out according to the feature matrix set, carrying out identical comparison on the character set and the tag set, adjusting the internal parameters of the convolution operation and the pooling operation if the identical accuracy of the character set and the tag set is smaller than a preset threshold value, returning to the step C for re-prediction, and outputting the internal parameters of the convolution operation and the pooling operation as the optimal internal parameters if the identical accuracy of the character set and the tag set is larger than the preset threshold value.

And receiving an image input by a user, performing affine transformation on the image input by the user, performing convolution operation and pooling operation with the optimal internal parameters, and identifying and outputting characters in the image through the activation operation.

It should be noted that the above numbering of the embodiments of the present invention is merely for description and does not represent the merits of the embodiments. The terms "comprises", "comprising", or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, device, article, or method that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such a process, device, article, or method. Without further limitation, an element preceded by the phrase "comprising a …" does not exclude the presence of other identical elements in the process, device, article, or method that includes that element.

Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) as described above and includes instructions for enabling a terminal device (e.g., a mobile phone, a computer, a server, or a network device) to execute the method according to the embodiments of the present invention.

The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.
