Text image direction correction method and device and electronic equipment

Document No.: 169635. Publication date: 2021-10-29.

Reading note: This technology, a text image direction correction method, device and electronic equipment, was designed by 李霄鹏, 袁景伟, 胡亚龙, 黄宇飞 and 王岩 on 2021-07-23. Main content: The invention belongs to the technical field of image processing and provides a text image direction correction method, a device and electronic equipment. The method comprises: acquiring a first text image; rotating the first text image by N predetermined rotation angles to obtain N rotated images, where N is a natural number greater than or equal to two; estimating the correction direction of the first text image from the N rotated images; and correcting the direction of the first text image according to that correction direction. Because the correction direction is judged jointly from rotated images in multiple directions obtained by synchronously rotating the first text image, the method achieves higher accuracy. It can quickly identify the correction direction of a text image, including a text image that contains no characters, and offers simple operation, fast recognition and a wide application range.

1. A method for correcting the direction of a text image is characterized by comprising the following steps:

acquiring a first text image;

rotating the first text image according to N predetermined rotation angles to obtain N rotated images, wherein N is a natural number which is greater than or equal to two;

estimating a correction direction of the first text image according to the N rotated images, wherein the correction direction refers to the direction to which the first text image is rotated to facilitate subsequent image recognition;

and carrying out direction correction on the first text image according to the correction direction of the first text image.

2. The method according to claim 1, wherein estimating the correction direction of the first text image based on the N rotated images comprises:

inputting the N rotated images into a trained direction estimation model to evaluate the probability that the direction of each rotated image is correct; and taking the direction of the rotated image with the highest probability as the correction direction of the first text image.

Optionally, the direction estimation model is generated after an image classification model is trained.

3. The method according to claim 2, wherein the direction estimation model comprises: N neural networks and a judgment module connected to each of the N neural networks; the N neural networks respectively calculate the probability that the direction of the input rotated image is correct; the judgment module determines the correction direction of the first text image according to the probabilities that the directions of the rotated images are correct;

optionally, the neural network includes a shallow convolutional network and a self-attention neural network connected in sequence, and the self-attention neural network is connected to the judgment module; the shallow convolutional network is used for extracting feature data of the rotated image; the self-attention neural network is used for obtaining, from the feature data, the probability that the direction of the rotated image is correct;

optionally, the shallow convolutional network comprises: a plurality of convolution blocks and a fully connected layer;

optionally, each of the convolution blocks includes a convolution layer, a pooling layer, a batch normalization layer, and an activation layer.

4. The method according to claim 3, wherein the self-attention neural network is based on a Transformer model;

optionally, the self-attention neural network comprises: a self-attention module and a binary classification module, wherein the self-attention module is used for converting the input image into a form meeting the requirements of the binary classification module, and the binary classification module is used for obtaining the probability that the direction of the rotated image is correct;

optionally, the binary classification module comprises a fully connected layer and a softmax layer.

5. The method for correcting the orientation of a text image according to claim 1, wherein rotating the first text image by N predetermined rotation angles to obtain N rotated images comprises:

converting the first text image into a first text image matrix;

and performing matrix operation on the first text image matrix to obtain N rotated image matrices corresponding to the N rotated images.

6. The method according to claim 4, wherein there are four predetermined rotation angles, namely: the first text image is not rotated, the first text image is rotated by 90 degrees in a first direction, the first text image is rotated by 180 degrees in the first direction, and the first text image is rotated by 270 degrees in the first direction;

performing matrix operation on the first text image matrix to obtain N rotated image matrices corresponding to the N rotated images, including:

directly taking the first text image matrix as the first rotated image matrix;

transposing the first text image matrix and then performing axisymmetric processing on the matrix elements by taking the vertical bisector of the transposed matrix as a symmetry axis, to obtain a second rotated image matrix;

performing axisymmetric processing on the matrix elements by taking the vertical bisector of the first text image matrix as a symmetry axis, and then performing axisymmetric processing on the matrix elements by taking the horizontal bisector of the processed matrix as a symmetry axis, to obtain a third rotated image matrix;

and after the first text image matrix is transposed, performing axisymmetric processing on matrix elements by taking a horizontal bisector of the transposed matrix as a symmetry axis to obtain a fourth rotated image matrix.

7. A method for searching questions by photographing is characterized by comprising the following steps:

collecting a first text image containing a target question to be searched;

performing direction correction on the first text image by adopting the text image direction correction method according to any one of claims 1-6 before performing target question recognition;

and recognizing the first text image.

8. An apparatus for rectifying the orientation of a text image, the apparatus comprising:

the acquisition module is used for acquiring a first text image;

the rotation module is used for rotating the first text image according to N predetermined rotation angles to obtain N rotated images, wherein N is a natural number which is greater than or equal to two;

the estimation module is used for estimating the correction direction of the first text image according to the N rotated images, wherein the correction direction refers to the direction to which the first text image is rotated to facilitate subsequent image recognition;

and the correction module is used for correcting the direction of the first text image according to the correction direction of the first text image.

9. An electronic device comprising a processor and a memory, the memory for storing a computer-executable program, characterized in that:

the computer program, when executed by the processor, performs the method of any of claims 1-6.

10. A computer-readable medium storing a computer-executable program, wherein the computer-executable program, when executed, implements the method of any of claims 1-6.

Technical Field

The present invention belongs to the field of image processing technology, in particular to image direction correction technology, and more particularly relates to a text image direction correction method, a device, electronic equipment and a computer-readable medium.

A text image as referred to herein is an image containing text and/or graphics;

the text image direction refers to the arrangement direction of the characters and/or patterns in the text image relative to normal reading habit, for example whether the characters are upright, not skewed and not inverted;

the recognition direction as referred to herein is the direction of the fonts or patterns that is preset for the question recognition process, and generally coincides with the usual arrangement direction of text or images when reading.

Background

At present, more and more photograph-based question search products are appearing on the market. With such a product, a user only needs to photograph and upload the question to be searched in order to find the same or similar questions and their answers, which greatly facilitates question retrieval.

Existing photograph-based question search is mainly realized through image recognition followed by retrieval: the characters or patterns in the text image uploaded by the user are recognized and then passed to a retrieval system, which quickly searches the existing questions in a question bank to find the same or similar questions and their answers. In practice, the direction of the fonts or patterns in the text image taken by the user (i.e., the text image direction) is not always consistent with the direction preset for question recognition (i.e., the normal reading direction). The fonts or patterns then appear inverted (or severely skewed) during question recognition, which causes recognition errors, lowers the accuracy of the question search, and degrades the user experience.

Disclosure of Invention

(I) technical problem to be solved

The invention aims to solve the technical problem of question recognition errors caused by inconsistency between the direction of a photographed text image and the recognition direction.

(II) technical scheme

In order to solve the above technical problem, an aspect of the present invention provides a method for correcting a direction of a text image, including:

acquiring a first text image;

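rotating the first text image according to N predetermined rotation angles to obtain N rotated images, wherein N is a natural number which is greater than or equal to two;

estimating a correction direction of the first text image according to the N rotated images;

and carrying out direction correction on the first text image according to the correction direction of the first text image.

According to a preferred embodiment of the present invention, estimating the correction direction of the first text image according to the N rotated images includes:

inputting the N rotated images into a trained direction estimation model to evaluate the probability that the direction of each rotated image is correct; and taking the direction of the rotated image with the highest probability as the correction direction of the first text image.

Optionally, the direction estimation model is generated after an image classification model is trained.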

According to a preferred embodiment of the present invention, the direction estimation model includes: N neural networks and a judgment module connected to each of the N neural networks; the N neural networks respectively calculate the probability that the direction of the input rotated image is correct; the judgment module determines the correction direction of the first text image according to the probabilities that the directions of the rotated images are correct;

optionally, the shallow convolutional network comprises: a plurality of convolution blocks and a fully connected layer;

optionally, each of the convolution blocks includes a convolution layer, a pooling layer, a batch normalization layer, and an activation layer.

According to a preferred embodiment of the present invention, the self-attention neural network is based on a Transformer model;

optionally, the binary classification module comprises a fully connected layer and a softmax layer.

According to a preferred embodiment of the present invention, performing a rotation operation on the first text image according to N predetermined rotation angles to obtain N rotated images includes:

converting the first text image into a first text image matrix;

and performing matrix operation on the first text image matrix to obtain N rotated image matrices corresponding to the N rotated images.

According to a preferred embodiment of the present invention, there are four predetermined rotation angles, namely: the first text image is not rotated, the first text image is rotated by 90 degrees in a first direction, the first text image is rotated by 180 degrees in the first direction, and the first text image is rotated by 270 degrees in the first direction;

performing matrix operation on the first text image matrix to obtain N rotated image matrices corresponding to the N rotated images, including:

directly taking the first text image matrix as the first rotated image matrix;

The second aspect of the present invention provides a method for searching questions by taking pictures, comprising:

collecting a first text image containing a target question to be searched;

before target question recognition is carried out, carrying out direction correction on the first text image by adopting the text image direction correction method;

and recognizing the first text image.

A third aspect of the present invention provides an apparatus for correcting a direction of a text image, the apparatus comprising:

the acquisition module is used for acquiring a first text image;

and the correction module is used for correcting the direction of the first text image according to the correction direction of the first text image.

A fourth aspect of the invention provides an electronic device comprising a processor and a memory for storing a computer-executable program, wherein the method is performed when the computer program is executed by the processor.

The fifth aspect of the present invention also provides a computer-readable medium storing a computer-executable program, which when executed, implements the method.

(III) advantageous effects

In the invention, the first text image is rotated by N predetermined rotation angles to obtain N rotated images, and the correction direction of the first text image is estimated from the N rotated images, for example by inputting the N rotated images as feature data into a trained direction estimation model. The direction of the first text image is then corrected according to the correction direction, so that the text image direction is consistent with the recognition direction, which improves question recognition accuracy and user experience. Compared with the prior art, the correction direction is judged jointly from rotated images in multiple directions obtained by rotating the first text image, which gives higher accuracy. The invention needs only the text image as input and does not need to detect or recognize text lines, so it can quickly identify the correction direction and can also correct the direction of text images that contain no characters. It therefore offers simple operation, fast recognition and a wide application range.

The direction estimation model of the invention may comprise N neural networks and a judgment module connected to each of the N neural networks; the N neural networks respectively calculate the probability that the direction of the input rotated image is correct; the judgment module determines the correction direction of the first text image according to the probabilities that the directions of the rotated images are correct. Compared with a conventional convolutional neural network, this model recognizes the direction more accurately.

Drawings

FIG. 1 is a flow chart of a method for correcting the orientation of a text image according to the present invention;

FIGS. 2 a-2 e are schematic diagrams illustrating a rotation operation performed on a first text image according to the present invention;

FIGS. 3 a-3 d are schematic illustrations of the horizontal bisector of the matrix and the vertical bisector of the matrix of the present invention;

FIG. 4 is a schematic diagram of a direction estimation model according to the present invention;

FIG. 5 is a schematic diagram of a neural network of the present invention;

FIG. 6 is a schematic flow chart illustrating estimating a correction direction of the first text image according to the present invention;

FIG. 7 is a schematic structural diagram of a device for correcting the orientation of a text image according to the present invention;

FIG. 8 is a schematic structural diagram of an electronic device of one embodiment of the invention;

FIG. 9 is a schematic diagram of a computer-readable recording medium of an embodiment of the present invention.

Detailed Description

In describing particular embodiments, specific details of structures, properties, effects, or other features are set forth in order to provide a thorough understanding of the embodiments by one skilled in the art. However, it is not excluded that a person skilled in the art may implement the invention in a specific case without the above-described structures, performances, effects or other features.

The flow chart in the drawings is only an exemplary flow demonstration, and does not represent that all the contents, operations and steps in the flow chart are necessarily included in the scheme of the invention, nor does it represent that the execution is necessarily performed in the order shown in the drawings. For example, some operations/steps in the flowcharts may be divided, some operations/steps may be combined or partially combined, and the like, and the execution order shown in the flowcharts may be changed according to actual situations without departing from the gist of the present invention.

The block diagrams in the figures generally represent functional entities and do not necessarily correspond to physically separate entities. That is, these functional entities may be implemented in the form of software, or in one or more hardware modules or integrated circuits, or in different network and/or processing unit devices and/or microcontroller devices.

The same reference numerals denote the same or similar elements, components, or parts throughout the drawings, and thus a repetitive description thereof may be omitted hereinafter. It will be further understood that, although the terms first, second, third, etc. may be used herein to describe various elements, components, or sections, these elements, components, or sections should not be limited by these terms. That is, these terms are used only to distinguish one from another. For example, a first device may also be referred to as a second device without departing from the spirit of the present invention. Furthermore, the term "and/or" is intended to include all combinations of any one or more of the listed items.

Explanation of terms used herein:

Text image: an image containing text, patterns, or both, such as a photo of a test paper page or a book cover.

Text image direction correction: correcting an image whose direction differs from the recognition direction into an image whose direction is the same as the recognition direction.

Recognition direction: the arrangement direction of fonts or patterns that is preset for a recognition process such as question recognition. Generally, when an image is shot normally (strictly as required, without deviation or distortion), the arrangement of the question text and graphics in the image conforms to people's reading habit, for example the characters are upright, not inverted and not skewed.

In order to solve the technical problem, the invention provides a text image direction correction method. The method rotates a first text image by N predetermined rotation angles to obtain N rotated images; then, for example, inputs the N rotated images as feature data into a trained direction estimation model to estimate the correction direction of the first text image; and finally corrects the direction of the first text image according to the correction direction, thereby ensuring that the text image direction is consistent with the recognition direction and improving question recognition accuracy and user experience.

The correction direction of the first text image can be estimated from the N rotated images by a direction estimation model, which comprises N neural networks and a judgment module connected to each of the N neural networks; the N neural networks respectively calculate the probability that the direction of the input rotated image is correct; the judgment module determines the correction direction of the first text image according to the probabilities that the directions of the rotated images are correct. The direction of a rotated image (obtained by rotating the first text image by a predetermined angle) is correct when it is consistent with, or close to, the image direction required for subsequent recognition. Compared with a conventional convolutional neural network, this model recognizes the direction more accurately.

The neural network may comprise a shallow convolutional network and a self-attention neural network connected in sequence; the self-attention neural network is connected to the judgment module. The shallow convolutional network is used for extracting feature data of the rotated image; the self-attention neural network is used for obtaining, from the feature data, the probability that the direction of the rotated image is correct. Optionally, the shallow convolutional network comprises a plurality of convolution blocks and a fully connected layer, and each convolution block may include a convolution layer, a pooling layer, a batch normalization (BN) layer, and a ReLU activation layer.

In order that the objects, technical solutions and advantages of the present invention will become more apparent, the present invention will be further described in detail with reference to the accompanying drawings in conjunction with the following specific embodiments.

Fig. 1 is a schematic flow chart of a text image direction correction method according to the present invention, and as shown in fig. 1, the method includes the following steps:

S1, acquiring a first text image;

In this step, a first text image is obtained whose direction is to be corrected so that operations such as image recognition can be performed better. In a photograph-based question search scenario, for example, the first text image can be a photo collected by the user that contains the target question to be searched.

Here, the first text image may be an original text image obtained directly from the acquisition end without any processing, or may be the original text image after image processing that facilitates image recognition (such as filtering, brightness adjustment, and the like).

In this step, for example, a first text image containing the target question searched by the user can be obtained directly through an image collector such as a camera or a scanner; it may also be retrieved from memory. The target question in the first text image may contain only text (such as a senior-grade reading question), only patterns (such as a kindergarten question), or both text and patterns (such as a geometry question); the present invention is not limited in this respect.

S2, rotating the first text image according to N predetermined rotation angles to obtain N rotated images;

In order to improve the accuracy of direction estimation, the method rotates the first text image by each of a plurality of predetermined rotation angles to obtain a plurality of rotated images, and inputs the rotated images into a direction estimation model to estimate the predetermined rotation angle (namely, the correction direction) to which the first text image needs to be rotated when its direction is corrected.

The rotations of the first text image may be performed synchronously, or the rotation effect may be achieved by image processing, for example.

Illustratively, this step may include:

S21, converting the first text image into a first text image matrix;

In one embodiment, Python, MATLAB, etc. may be used to convert the first text image into a corresponding first text image matrix.

S22, performing matrix operation on the first text image matrix to obtain N rotated image matrices corresponding to the N rotated images.

The predetermined rotation angle herein refers to a preset angle by which the first text image is rotated from the starting direction to the ending direction. The number N of predetermined rotation angles is a natural number equal to or greater than two. In the present invention, the greater the number of predetermined rotation angles, the higher the estimation accuracy of the direction estimation model, but also the greater the calculation amount of the model. In order to balance estimation accuracy and calculation amount, the present invention preferably sets the number N of predetermined rotation angles to four: the first text image is not rotated, the first text image is rotated by 90 degrees in the first direction, the first text image is rotated by 180 degrees in the first direction, and the first text image is rotated by 270 degrees in the first direction. The first direction may be clockwise or counterclockwise.

The following describes the rotation process of the first text image with reference to fig. 2a to 2e, taking the first direction as the clockwise direction as an example.

Fig. 2a shows the first text image. For convenience of explaining the rotation process, the starting direction of the first text image may be preset, for example as the left-to-right reading direction of the characters and/or patterns in the first text image, i.e. the left-to-right direction indicated by the arrow OA in fig. 2a. The rotations of the first text image can then be referenced to this starting direction. The first predetermined rotation angle is zero degrees: the first text image is not rotated, and the first text image matrix is used directly as the first rotated image matrix. Fig. 2b is the first rotated image, obtained without rotating the starting direction OA of the first text image. The second predetermined rotation angle is 90 degrees clockwise relative to OA: to rotate the first text image 90 degrees clockwise, the first text image matrix is transposed, and the matrix elements are then axisymmetrically processed with the vertical bisector of the transposed matrix as the symmetry axis, giving the second rotated image matrix. Fig. 2c is the second rotated image, obtained by rotating the starting direction OA of the first text image by 90 degrees clockwise.

The third predetermined rotation angle is 180 degrees clockwise relative to OA: to rotate the first text image 180 degrees in the first direction, the matrix elements are axisymmetrically processed with the vertical bisector of the first text image matrix as the symmetry axis, and then axisymmetrically processed again with the horizontal bisector of the processed matrix as the symmetry axis, giving the third rotated image matrix. Fig. 2d is the third rotated image, obtained by rotating the starting direction OA of the first text image by 180 degrees clockwise. The fourth predetermined rotation angle is 270 degrees clockwise relative to OA: to rotate the first text image 270 degrees in the first direction, the first text image matrix is transposed, and the matrix elements are then axisymmetrically processed with the horizontal bisector of the transposed matrix as the symmetry axis, giving the fourth rotated image matrix. Fig. 2e is the fourth rotated image, obtained by rotating the starting direction OA of the first text image by 270 degrees clockwise.

Wherein the horizontal bisector of the matrix refers to the straight line where the middle row of the matrix is located, and for the matrix with odd rows, the straight line where the middle row is located refers to the straight line connecting the various elements in the middle row of the matrix, such as the straight line M1N1 in fig. 3 a. For an even row matrix, the straight line in the middle row refers to the row dividing line that divides the middle two rows of the matrix, as shown by line M2N2 in fig. 3 b. The vertical bisector of the matrix refers to the line in which the middle column of the matrix lies, and for an odd column matrix, the line in which the middle column lies refers to the line connecting the various elements in the middle column of the matrix, as in line M3N3 in fig. 3 c. For an even column matrix, the straight line in the middle column refers to the column dividing line that divides the middle two columns of the matrix, as shown by line M4N4 in fig. 3 d.

The above matrix operations can be performed, for example, by calling OpenCV's cv2.flip function through its Python interface.
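The four matrix operations described above can be sketched without any image library. The following is only an illustrative sketch in plain Python, with the image matrix assumed to be a list of rows; the function names (`transpose`, `flip_about_vertical_bisector`, `flip_about_horizontal_bisector`, `rotated_matrices`) are hypothetical and do not come from the patent, and the first direction is assumed to be clockwise.

```python
from typing import Dict, List

Matrix = List[List[int]]


def transpose(m: Matrix) -> Matrix:
    """Swap rows and columns."""
    return [list(col) for col in zip(*m)]


def flip_about_vertical_bisector(m: Matrix) -> Matrix:
    """Mirror the elements left-to-right (symmetry axis: vertical bisector)."""
    return [row[::-1] for row in m]


def flip_about_horizontal_bisector(m: Matrix) -> Matrix:
    """Mirror the elements top-to-bottom (symmetry axis: horizontal bisector)."""
    return [list(row) for row in m[::-1]]


def rotated_matrices(m: Matrix) -> Dict[int, Matrix]:
    """Return the four rotated image matrices keyed by clockwise angle in degrees."""
    return {
        # First matrix: the image matrix itself, not rotated.
        0: [list(row) for row in m],
        # Second matrix: transpose, then mirror about the vertical bisector.
        90: flip_about_vertical_bisector(transpose(m)),
        # Third matrix: mirror about the vertical, then the horizontal bisector.
        180: flip_about_horizontal_bisector(flip_about_vertical_bisector(m)),
        # Fourth matrix: transpose, then mirror about the horizontal bisector.
        270: flip_about_horizontal_bisector(transpose(m)),
    }


image = [[1, 2, 3],
         [4, 5, 6]]
rotations = rotated_matrices(image)
```

For a 2 x 3 input, the 90 and 270 degree results are 3 x 2 matrices, which matches the change of aspect ratio when a page is turned on its side.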

S3, estimating the correction direction of the first text image according to the N rotated images;

The correction direction refers to the direction to which the first text image is rotated to facilitate subsequent image recognition; it is generally a direction that conforms to the user's reading habit, that is, the user sees the characters in the image as upright, not inverted, and not skewed.

In this step, the correction direction of the first text image can be estimated from the N rotated images, and the specific implementation is not limited. For example, a test text may be added to the first text image, the test text in each of the N rotated images may then be recognized, and the rotated image whose direction is closest to correct may be determined from the recognition accuracy: the image direction that gives the best recognition result is judged to be the closest.

This step may further evaluate which of the directions of the N rotated images is to be used as the correction direction of the first text image, that is, which predetermined rotation angle is to be applied when the direction of the first text image is corrected. In one embodiment, the correction direction of the first text image may be determined by calculating the probability that the direction of each rotated image is correct and selecting the rotation angle corresponding to the rotated image with the highest probability. This embodiment can be implemented by the following direction estimation model.

For example, the N rotated images may be input into a trained direction estimation model to evaluate the probability that the direction of each rotated image is correct, and the predetermined rotation angle corresponding to the rotated image with the highest probability is taken as the correction direction of the first text image. The probability that the direction of a rotated image is correct is the probability that its direction is the recognition direction, where the recognition direction is the direction of the fonts or patterns of the input image that the recognition scheme presets for question recognition. The direction estimation model estimates this probability for each rotated image, and the predetermined rotation angle corresponding to the rotated image with the highest probability is used as the correction direction, i.e. the target direction to which the first text image needs to be corrected. Because this target direction is consistent with the recognition direction, failed or inaccurate recognition caused by inverted or skewed characters and/or patterns during question recognition can be avoided or reduced.

Optionally, the direction estimation model is generated after an image classification model is trained. The direction of the original image is estimated by classifying each rotated image of the original image, and the correction direction of the original image is thereby obtained.
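The selection rule described above (take the predetermined rotation angle whose rotated image has the highest probability of being correctly oriented) can be sketched as follows. This is a minimal illustration only: the function name `pick_correction_angle` and the probability values are hypothetical, and the probabilities are assumed to have already been produced by some direction estimation model.

```python
from typing import Sequence


def pick_correction_angle(angles: Sequence[int],
                          probabilities: Sequence[float]) -> int:
    """Return the predetermined angle whose rotated image is most likely upright.

    angles[i] is the rotation applied to obtain the i-th rotated image and
    probabilities[i] is the estimated probability that this image is correctly
    oriented. On a tie, the first angle with the highest probability is returned.
    """
    if not angles or len(angles) != len(probabilities):
        raise ValueError("angles and probabilities must be non-empty and equal in length")
    best_index = max(range(len(angles)), key=lambda i: probabilities[i])
    return angles[best_index]


# Hypothetical model outputs for the four rotated images.
angle = pick_correction_angle([0, 90, 180, 270], [0.05, 0.10, 0.80, 0.05])
```

With these example values the image rotated by 180 degrees scores highest, so 180 degrees would be used as the correction direction of the first text image.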

For example, as shown in fig. 4, the direction estimation model 30 may include: N neural networks 31, and a determination module 32 connected to each of the N neural networks 31;

the N neural networks 31 respectively calculate the probability that the direction of the input rotated image is correct; the neural network 31 comprises a shallow convolutional network 311 and a self-attention neural network 312 which are connected in sequence; the self-attention neural network 312 is connected to the determination module 32.

The determination module 32 determines the correction direction of the first text image according to the probabilities that the directions of the rotated images are correct.

For example, in order to improve the accuracy of the direction judgment, the determination module 32 may comprehensively judge the correction direction of the first text image according to the probability that the direction of each rotated image is correct. For example, the direction of the rotated image with the highest probability is directly determined as the correction direction of the first text image, or a specified direction between the direction of the rotated image with the highest probability and that of the rotated image with the second-highest probability is used as the correction direction of the first text image.

Illustratively, the shallow convolutional network 311 includes a plurality of convolution blocks (conv1, conv2, conv3) and a fully connected layer, and is used to extract feature data of the rotated image; the self-attention neural network 312 is used to obtain, from the feature data, the probability that the direction of the rotated image is correct. Optionally, each convolution block includes a convolution layer, a pooling layer, a batch normalization (BN) layer, and a ReLU activation layer.

In one embodiment, as shown in fig. 5, the shallow convolutional network 311 comprises three convolution blocks (conv1, conv2 and conv3) and a fully connected layer. Each convolution block is composed of a convolution layer, a pooling layer, a BN (Batch Normalization) layer, and a ReLU (Rectified Linear Unit) layer. The input rotated image passes through the three convolution blocks and the fully connected layer in sequence. For example, the data format of the rotated image matrix is [h1, w1, c1], the data format after the convolution blocks is [h2, w2, c2], and the data format after the fully connected layer is [T, c3]. Here h1 and w1 are the height and width after the rotated image is resized to a fixed size, for example 512 × 512, and c1 is 3, which indicates that the input image is a 3-channel color image; h2, w2 and c2 give the size of the feature map of the rotated image, such as 16 x 128; T and c3 give the size of the output, such as 16*256.

The self-attention neural network is used to obtain, from the feature data, the probability that the direction of the rotated image is correct (i.e., consistent with the recognition direction). The self-attention neural network 312 may be based on a Transformer model.

The self-attention neural network 312 includes a self-attention module and a binary classification module. The self-attention module converts its input into the format required by the binary classification module, and the binary classification module outputs the probability that the direction of the rotated image is correct.

Illustratively, the self-attention module includes a position-wise feed-forward network, which in the encoding stage encodes the input data together with its position information, and a multi-head self-attention layer. The multi-head self-attention layer uses a self-attention mechanism in each sub-layer to relate the input data at different positions of the same input sequence. The attention is called multi-head because several attention layers are stacked in parallel, each applying a different linear transformation to the same input. This helps the model capture different aspects of the input and improves its expressive power. Illustratively, the self-attention module has an input data size of 16 × 256 and an output size of 16 × 200. The binary classification module is composed of a fully connected layer and an activation layer (such as softmax). This architecture enables parallel processing, shorter training times and higher accuracy without any recurrent components.

Further, when the maximum of the probabilities that the directions of the rotated images are correct is smaller than a preset probability, the correction direction of the first text image is not determined yet. Instead, the N preset rotation angles for the next rotation are determined from the probability that the direction of each rotated image is correct, and the probability calculation and judgment continue until the maximum probability is greater than or equal to the preset probability, at which point the correction direction is determined.
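The self-attention mechanism mentioned above can be illustrated with a minimal sketch of single-head scaled dot-product self-attention, the standard building block of Transformer models. It is not the network of this embodiment: the learned query, key and value projections are replaced by the identity for brevity, and the function names are assumptions. A multi-head layer would run several such heads in parallel, each with its own linear transformations, and concatenate their outputs.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(sequence):
    """Single-head scaled dot-product self-attention.

    sequence -- list of T feature vectors, each of dimension d
    Returns (outputs, weights): outputs has the same T x d shape as the
    input, and weights[i][j] is how strongly position i attends to j.
    Assumption: queries, keys and values are the inputs themselves
    (identity projections), whereas a trained model learns them.
    """
    dim = len(sequence[0])
    scale = math.sqrt(dim)
    weights = []
    for query in sequence:
        # Similarity of this position to every position in the sequence.
        scores = [sum(q * k for q, k in zip(query, key)) / scale
                  for key in sequence]
        weights.append(softmax(scores))
    outputs = [
        [sum(row[j] * sequence[j][c] for j in range(len(sequence)))
         for c in range(dim)]
        for row in weights
    ]
    return outputs, weights


features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = self_attention(features)
print(len(outputs), len(outputs[0]))  # same shape as the input: 3 2
```

Each row of `weights` sums to one, so every output vector is a weighted average of the input vectors.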

Illustratively, suppose the predetermined rotation angle of rotated image A is 45 degrees counterclockwise and the predetermined rotation angle of rotated image B is 90 degrees counterclockwise. If the probability that the direction of rotated image A is correct is calculated to be the largest and that of B is the second largest (or almost the same), it can be determined that the correction direction of the first text image lies between the directions of rotated image A and rotated image B and is close to the direction of rotated image A; that is, the correction direction is a counterclockwise rotation between 45 degrees and 90 degrees, closer to 45 degrees. If the probability that the direction of rotated image A is correct is smaller than the preset probability (that is, correcting the image by rotating it 45 degrees counterclockwise would not give an acceptable recognition accuracy), the angles of the second rotation are chosen within this range (the range may be expanded appropriately according to the actual situation), for example counterclockwise rotations of 60, 75, 80 and 85 degrees. The probability that the direction of each rotated image after the second rotation is correct is then calculated, and the process iterates in the same way until the maximum probability is greater than or equal to the preset probability, at which point the correction direction is determined from the rotation angle of the rotated image with the maximum probability.
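The iteration described above can be sketched as a coarse-to-fine search. This is only an illustration under stated assumptions: `score(angle)` is a hypothetical stand-in for rotating the image and running the direction estimation model, the next round's candidates are simply spaced evenly between the two most probable angles (the text leaves the exact choice open), and wrap-around between angles such as 270 and 0 degrees is not handled.

```python
def refine_correction_angle(score, angles, threshold, max_rounds=5):
    """Coarse-to-fine search for a rotation angle that scores well enough.

    score      -- hypothetical callable: angle -> probability that the image
                  rotated by that angle has the correct direction
    angles     -- initial preset rotation angles (at least two)
    threshold  -- the preset probability that must be reached
    max_rounds -- assumed safety limit on the number of rotations rounds
    Returns (angle, probability) of the best candidate found.
    """
    if len(angles) < 2:
        raise ValueError("at least two preset angles are required")
    candidates = list(angles)
    best = candidates[0]
    for _ in range(max_rounds):
        ranked = sorted(((score(a), a) for a in candidates), reverse=True)
        (best_prob, best), (_, second) = ranked[0], ranked[1]
        if best_prob >= threshold:
            return best, best_prob
        # Next round: five evenly spaced angles between the two best ones,
        # endpoints included so the current best is never lost.
        low, high = min(best, second), max(best, second)
        candidates = [low + (high - low) * i / 4 for i in range(5)]
    return best, score(best)


def toy_score(angle):
    """Toy scoring function peaking at 70 degrees (illustrative only)."""
    return max(0.0, 1.0 - abs(angle - 70) / 60)


angle, prob = refine_correction_angle(toy_score, [0, 45, 90, 135], 0.95)
print(angle)  # -> 67.5
```

In the toy run the first round ranks 90 and 45 degrees highest but below the threshold, so the second round samples between them and accepts 67.5 degrees.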

Fig. 6 is a schematic flow chart illustrating the estimation of the correction direction of the first text image according to the present invention, where four predetermined rotation angles are taken as an example in fig. 6, the method includes:

S61, acquiring a first text image;

S62, performing a rotation operation on the first text image according to four preset rotation angles to obtain four rotated images.

Wherein the four predetermined angles of rotation include: the first text image orientation is not rotated, the first text image orientation is rotated 90 degrees in the first direction, the first text image orientation is rotated 180 degrees in the first direction, and the first text image orientation is rotated 270 degrees in the first direction. The first direction may be a clockwise direction or a counterclockwise direction. Each arrow in fig. 6 represents a predetermined rotational direction.

S63, inputting the four rotated images into a shallow convolution network and a self-attention module respectively, and estimating the probability of the correct direction of each rotated image to obtain the estimated result of each rotated image;

In fig. 6, the shallow convolutional network is labeled CNN, and the estimation results of the rotated images are labeled result 1 to result 4.

S64, judging the correction direction of the first text image according to the estimation result of each rotated image.

In order to improve the accuracy of the direction estimation model, the direction estimation model may be trained in advance.

The training set is obtained by performing data augmentation on historical text images. A historical text image may be an image containing a question, such as a test paper, a book page, or a cover. Data augmentation here means adding Gaussian noise and salt-and-pepper noise to the original image and then cropping the result.
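The augmentation described above can be sketched as follows for a grayscale image stored as a list of rows of 0 to 255 pixel values. The function names, the noise strength `sigma`, the salt-and-pepper fraction `amount` and the random crop position are illustrative assumptions; the text does not specify these parameters.

```python
import random


def add_gaussian_noise(image, sigma, rng):
    """Add zero-mean Gaussian noise to every pixel, clamped to 0..255."""
    return [[min(255, max(0, round(p + rng.gauss(0.0, sigma)))) for p in row]
            for row in image]


def add_salt_pepper_noise(image, amount, rng):
    """Set roughly `amount` of the pixels to black (pepper) or white (salt)."""
    noisy = []
    for row in image:
        new_row = []
        for p in row:
            r = rng.random()
            if r < amount / 2:
                new_row.append(0)      # pepper
            elif r < amount:
                new_row.append(255)    # salt
            else:
                new_row.append(p)
        noisy.append(new_row)
    return noisy


def random_crop(image, crop_h, crop_w, rng):
    """Cut a crop_h x crop_w window at a random position."""
    height, width = len(image), len(image[0])
    if crop_h > height or crop_w > width:
        raise ValueError("crop is larger than the image")
    top = rng.randint(0, height - crop_h)
    left = rng.randint(0, width - crop_w)
    return [row[left:left + crop_w] for row in image[top:top + crop_h]]


def augment(image, crop_h, crop_w, sigma=10.0, amount=0.02, seed=None):
    """Gaussian noise, then salt-and-pepper noise, then a crop."""
    rng = random.Random(seed)
    noisy = add_gaussian_noise(image, sigma, rng)
    noisy = add_salt_pepper_noise(noisy, amount, rng)
    return random_crop(noisy, crop_h, crop_w, rng)


gray = [[128] * 8 for _ in range(6)]          # a flat 6 x 8 test image
sample = augment(gray, crop_h=4, crop_w=5, seed=0)
print(len(sample), len(sample[0]))            # 4 5
```

Calling `augment` repeatedly with different seeds yields several training samples from one historical image.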

S4, correcting the direction of the first text image according to the correction direction of the first text image.

Illustratively, with preset angles of 0 degrees, 90 degrees, 180 degrees and 270 degrees: if the correction direction is that the first text image is not rotated, the first text image is recognized directly without correction in this step; if the correction direction is a rotation of the first text image by 90 degrees in the first direction, this step transposes the first text image matrix, then performs axisymmetric processing on the matrix elements with the horizontal bisector of the transposed matrix as the symmetry axis to obtain the corrected first text image, which is then recognized; if the correction direction is a rotation of the first text image by 180 degrees in the first direction, axisymmetric processing is performed on the matrix elements with the vertical bisector of the first text image matrix as the symmetry axis, and then again with the horizontal bisector of the processed matrix as the symmetry axis, to obtain the corrected first text image, which is then recognized; if the correction direction is a rotation of the first text image by 270 degrees in the first direction, this step transposes the first text image matrix, then performs axisymmetric processing on the matrix elements with the vertical bisector of the transposed matrix as the symmetry axis to obtain the corrected first text image, which is then recognized. Here, the first direction is the same as the first direction in step S2: if the first direction of the rotation operation is clockwise, the first direction in this step is also clockwise, and vice versa.

In this method, the first text image is rotated synchronously according to N preset rotation angles to obtain N rotated images, the N rotated images are input as feature data into a trained direction estimation model to identify the correction direction of the first text image, and the direction of the first text image is corrected accordingly, so that the direction of the text image is consistent with the recognition direction. This improves question recognition accuracy and the user experience. Compared with the prior art, the correction direction of the first text image is judged comprehensively from rotated images in multiple directions obtained by rotating the first text image synchronously, which gives higher accuracy. The method needs only the text image as input and does not need to detect or recognize text lines, so it can identify the correction direction quickly and can also identify the correction direction of a text image that contains no characters. It therefore has the advantages of simple operation, fast recognition and a wide application range.

The embodiment of the invention also provides a photo-based question search method, which comprises the following steps:

S101, acquiring a first text image containing a target question to be searched;

S102, before the target question is recognized, correcting the direction of the first text image by any one of the text image direction correction methods described above;

S103, recognizing the first text image.

Fig. 7 is a schematic structural framework diagram of a text image orientation correction device provided by the present invention, and as shown in fig. 7, the device includes:

an obtaining module 71, configured to obtain a first text image;

a rotation module 72, configured to perform a rotation operation on the first text image according to N predetermined rotation angles to obtain N rotation images, where N is a natural number greater than or equal to two;

an estimating module 73, configured to estimate, according to the N rotated images, a correction direction of the first text image, where the correction direction is a direction in which the first text image is rotated for facilitating subsequent image identification;

and the correcting module 74 is configured to correct the direction of the first text image according to the correcting direction of the first text image.

Illustratively, the estimation module 73 is configured to input the N rotation images into a trained direction estimation model to estimate a probability that a direction of each rotation image is correct; and taking the direction of the rotating image with the highest probability as the rectification direction of the first text image.

Optionally, the direction estimation model is generated after an image classification model is trained.

Illustratively, the direction estimation model includes: N neural networks and a determination module connected to each of the N neural networks; the N neural networks respectively calculate the probability that the direction of the input rotated image is correct; the determination module determines the correction direction of the first text image according to the probabilities that the directions of the rotated images are correct;

optionally, the shallow convolutional network comprises: a plurality of convolution blocks and a fully connected layer;

optionally, each of the convolution blocks includes a convolution layer, a pooling layer, a batch normalization layer, and an excitation layer.

Illustratively, the self-attention neural network is based on a Transformer model;

optionally, the binary classification module comprises a fully connected layer and a softmax layer.

The rotation module 72 includes:

the conversion module is used for converting the first text image into a first text image matrix;

and the operation module is used for performing matrix operation on the first text image matrix to obtain N rotation matrixes corresponding to the N rotation images.

Illustratively, the predetermined rotation angles are four, including: the first text image is not rotated, the first text image is rotated by 90 degrees in a first direction, the first text image direction is rotated by 180 degrees in a first direction, and the first text image direction is rotated by 270 degrees in a first direction;

the operation module is used for directly taking the first text image matrix as the first rotating image matrix; after the first text image matrix is transposed, the vertical bisector of the transposed matrix is used as a symmetry axis to carry out axial symmetry processing on matrix elements, and a second rotary image matrix is obtained; performing axial symmetry processing on the matrix elements by taking the vertical bisector of the first text image matrix as a symmetry axis, and performing axial symmetry processing on the matrix elements by taking the horizontal bisector of the processed matrix as a symmetry axis to obtain a third rotation image matrix; and after the first text image matrix is transposed, performing axisymmetric processing on matrix elements by taking a horizontal bisector of the transposed matrix as a symmetry axis to obtain a fourth rotated image matrix.
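The four matrix operations above can be sketched on a nested-list matrix as follows. The sketch reads "axisymmetric processing about the vertical bisector" as a left-right mirror and "about the horizontal bisector" as a top-bottom mirror; this reading and the function names are assumptions. Under this reading the second, third and fourth matrices are clockwise rotations by 90, 180 and 270 degrees.

```python
def transpose(matrix):
    """Swap rows and columns."""
    return [list(column) for column in zip(*matrix)]


def mirror_about_vertical_bisector(matrix):
    """Left-right mirror: reverse the elements inside every row."""
    return [row[::-1] for row in matrix]


def mirror_about_horizontal_bisector(matrix):
    """Top-bottom mirror: reverse the order of the rows."""
    return [list(row) for row in matrix[::-1]]


def four_rotations(matrix):
    """Build the four rotated image matrices from one image matrix."""
    first = [list(row) for row in matrix]                       # unrotated
    second = mirror_about_vertical_bisector(transpose(matrix))  # 90 degrees
    third = mirror_about_horizontal_bisector(
        mirror_about_vertical_bisector(matrix))                 # 180 degrees
    fourth = mirror_about_horizontal_bisector(transpose(matrix))  # 270 degrees
    return first, second, third, fourth


image = [[1, 2, 3],
         [4, 5, 6]]
for rotated in four_rotations(image):
    print(rotated)
# [[1, 2, 3], [4, 5, 6]]
# [[4, 1], [5, 2], [6, 3]]
# [[6, 5, 4], [3, 2, 1]]
# [[3, 6], [2, 5], [1, 4]]
```

Because only transposition and mirroring are used, no pixel interpolation is involved and the rotated matrices contain exactly the original pixel values.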

Fig. 8 is a schematic structural diagram of an electronic device according to an embodiment of the present invention, which includes a processor and a memory, where the memory stores a computer-executable program, and when the computer program is executed by the processor, the processor executes a text image orientation correction method.

As shown in fig. 8, the electronic device is in the form of a general purpose computing device. There may be one or more processors, which can work together. The invention does not exclude distributed processing, i.e. the processors may be distributed over different physical devices. The electronic device of the present invention is not limited to a single entity, and may be a combination of a plurality of physical devices.

The memory stores a computer executable program, typically machine readable code. The computer readable program may be executed by the processor to enable an electronic device to perform the method of the invention, or at least some of the steps of the method.

The memory may include volatile memory, such as Random Access Memory (RAM) and/or cache memory, and may also be non-volatile memory, such as read-only memory (ROM).

Optionally, in this embodiment, the electronic device further includes an I/O interface, which is used for data exchange between the electronic device and an external device. The I/O interface may be a local bus representing one or more of several types of bus structures, including a memory unit bus or memory unit controller, a peripheral bus, an accelerated graphics port, a processing unit, and/or a memory storage device using any of a variety of bus architectures.

It should be understood that the electronic device shown in fig. 8 is only one example of the present invention, and elements or components not shown in the above example may be further included in the electronic device of the present invention. For example, some electronic devices further include a display unit such as a display screen, and some electronic devices further include a human-computer interaction element such as a button, a keyboard, and the like. Electronic devices are considered to be covered by the present invention as long as the electronic devices are capable of executing a computer-readable program in a memory to implement the method of the present invention or at least a part of the steps of the method.

Fig. 9 is a schematic diagram of a computer-readable recording medium of an embodiment of the present invention. As shown in fig. 9, the computer-readable recording medium stores a computer-executable program which, when executed, implements the text image direction correction method of the present invention described above. A computer readable signal medium may include a propagated data signal with readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.

Program code for carrying out operations of the present invention may be written in any combination of one or more programming languages, including object oriented programming languages such as Java or C++ and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the user computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device (e.g., through the internet using an internet service provider).

From the above description of the embodiments, those skilled in the art will readily appreciate that the present invention can be implemented by hardware capable of executing a specific computer program, such as the system of the present invention and the electronic processing units, servers, clients, mobile phones, control units, processors, etc. included in the system, and that the present invention can also be implemented by a vehicle including at least a part of the above system or components. The invention can also be implemented by computer software executing the method of the invention, for example, by control software executed by a microprocessor, an electronic control unit, a client, a server, etc. of a live device. It should be noted that the computer software for executing the method of the present invention is not limited to execution by one specific hardware entity, but may also be executed in a distributed manner by unspecified hardware entities. The software product may be stored in a computer readable storage medium (which may be a CD-ROM, a USB flash drive, a removable hard disk, etc.) or may be stored in a distributed manner on a network, as long as it enables an electronic device to execute the method according to the present invention.

While the foregoing embodiments have described the objects, aspects and advantages of the present invention in further detail, it should be understood that the present invention is not inherently related to any particular computer, virtual machine or electronic device, and various general-purpose machines may be used to implement the present invention. The invention is not limited to the specific embodiments described above; all modifications, changes and equivalents that come within the spirit and scope of the invention are intended to be covered.
