Character recognition method and device

Document No.: 1113566    Publication date: 2020-09-29

Note: This technology, "Character recognition method and device" (文字识别方法及装置), was designed and created by Xu Yangliu (徐杨柳) on 2019-03-19. Abstract: The invention discloses a character recognition method and device, belonging to the field of image recognition. The method comprises: augmenting an input image to obtain a plurality of images, where the plurality of images contain the same character to be recognized and the orientations of the character in the plurality of images are different; fusing the plurality of images to obtain a fused image, where the fused image contains feature information of the character in multiple orientations, the multiple orientations including the orientations of the character in the plurality of images; and performing character recognition on the fused image and outputting a character recognition result. The invention can observe character information in multiple different orientations simultaneously during character recognition, and can meet increasingly complex requirements for recognizing characters in multiple different orientations.

1. A method for recognizing a character, the method comprising:

augmenting an input image to obtain a plurality of images, wherein the plurality of images contain the same character to be recognized, and the orientations of the character in the plurality of images are different;

fusing the plurality of images to obtain a fused image, wherein the fused image comprises feature information of the character in multiple orientations, and the multiple orientations comprise the orientations of the character in the plurality of images; and

performing character recognition on the fused image, and outputting a character recognition result.

2. The method of claim 1, wherein the fusing the plurality of images to obtain a fused image comprises:

concatenating the plurality of images in the channel dimension to obtain the fused image; or

fusing the plurality of images through a convolutional neural network to obtain the fused image; or

fusing the plurality of images through a deep decision tree to obtain the fused image.

3. The method of claim 2, wherein said fusing the plurality of images through a convolutional neural network to obtain the fused image comprises:

learning, by the convolutional neural network, weights of the plurality of images;

performing weighted summation of the plurality of images in the channel dimension according to the weights of the plurality of images to obtain the fused image; or

weighting the plurality of images according to the weights of the plurality of images and then concatenating the weighted images in the channel dimension to obtain the fused image.

4. The method according to claim 1, wherein the performing character recognition on the fused image and outputting a character recognition result comprises:

extracting features of the fused image; and

decoding the extracted features to obtain the character recognition result.

5. The method of claim 1, wherein the augmenting the input image to obtain a plurality of images comprises:

augmenting the input image by at least one augmentation mode to obtain the plurality of images, wherein the at least one augmentation mode comprises rotation, mirror flipping, and warping.

6. A character recognition apparatus, comprising:

an augmentation module, configured to augment an input image to obtain a plurality of images, wherein the plurality of images contain the same character to be recognized, and the orientations of the character in the plurality of images are different;

a fusion module, configured to fuse the plurality of images to obtain a fused image, wherein the fused image comprises feature information of the character in multiple orientations, and the multiple orientations comprise the orientations of the character in the plurality of images; and

a recognition module, configured to perform character recognition on the fused image and output a character recognition result.

7. The apparatus of claim 6, wherein the fusion module is configured to concatenate the plurality of images in the channel dimension to obtain the fused image; or

the fusion module is configured to fuse the plurality of images through a convolutional neural network to obtain the fused image; or

the fusion module is configured to fuse the plurality of images through a deep decision tree to obtain the fused image.

8. The apparatus of claim 7, wherein the fusion module is configured to learn the weights of the plurality of images through the convolutional neural network, and either perform weighted summation of the plurality of images in the channel dimension according to those weights to obtain the fused image, or weight the plurality of images according to those weights and then concatenate the weighted images in the channel dimension to obtain the fused image.

9. The apparatus of claim 6, wherein the recognition module is configured to extract features of the fused image and decode the extracted features to obtain the character recognition result.

10. The apparatus of claim 6, wherein the augmentation module is configured to augment the input image by at least one augmentation mode to obtain the plurality of images, the at least one augmentation mode comprising rotation, mirror flipping, and warping.

11. An electronic device comprising a processor and a memory, wherein the memory is configured to store at least one instruction, and the processor is configured to execute the at least one instruction stored in the memory to implement the method steps of any one of claims 1-5.

12. A computer-readable storage medium having stored therein at least one instruction which, when executed by a processor, implements the method steps of any of claims 1-5.

Technical Field

The invention relates to the field of image recognition, in particular to a character recognition method and device.

Background

Character recognition, for example OCR (Optical Character Recognition), is a technique by which an electronic device converts optical characters in an image into text-format characters. With the development of character recognition technology, the demands placed on character recognition keep rising, and how to accurately and effectively recognize characters in various different orientations has become an urgent problem to solve.

Disclosure of Invention

The embodiments of the present invention provide a character recognition method and a character recognition device, which can address the significant limitations of character recognition in the related art. The technical solutions are as follows:

In a first aspect, a character recognition method is provided, the method comprising:

augmenting an input image to obtain a plurality of images, wherein the plurality of images contain the same character to be recognized, and the orientations of the character in the plurality of images are different;

fusing the plurality of images to obtain a fused image, wherein the fused image comprises feature information of the character in multiple orientations, and the multiple orientations comprise the orientations of the character in the plurality of images; and

performing character recognition on the fused image, and outputting a character recognition result.

In one possible implementation manner, the fusing the plurality of images to obtain a fused image includes:

concatenating the plurality of images in the channel dimension to obtain the fused image; or

fusing the plurality of images through a convolutional neural network to obtain the fused image; or

fusing the plurality of images through a deep decision tree to obtain the fused image.

In one possible implementation manner, the fusing the plurality of images through a convolutional neural network to obtain the fused image includes:

learning, by the convolutional neural network, weights of the plurality of images;

performing weighted summation of the plurality of images in the channel dimension according to the weights of the plurality of images to obtain the fused image; or

weighting the plurality of images according to the weights of the plurality of images and then concatenating the weighted images in the channel dimension to obtain the fused image.

In a possible implementation manner, the performing character recognition on the fused image and outputting a character recognition result includes:

extracting features of the fused image; and

decoding the extracted features to obtain the character recognition result.

In one possible implementation, the augmenting the input image to obtain a plurality of images includes:

augmenting the input image by at least one augmentation mode to obtain the plurality of images, wherein the at least one augmentation mode comprises rotation, mirror flipping, and warping.

In a second aspect, a character recognition apparatus is provided, the apparatus comprising:

an augmentation module, configured to augment an input image to obtain a plurality of images, wherein the plurality of images contain the same character to be recognized, and the orientations of the character in the plurality of images are different;

a fusion module, configured to fuse the plurality of images to obtain a fused image, wherein the fused image comprises feature information of the character in multiple orientations, and the multiple orientations comprise the orientations of the character in the plurality of images; and

a recognition module, configured to perform character recognition on the fused image and output a character recognition result.

In a possible implementation manner, the fusion module is configured to concatenate the plurality of images in the channel dimension to obtain the fused image; or

the fusion module is configured to fuse the plurality of images through a convolutional neural network to obtain the fused image; or

the fusion module is configured to fuse the plurality of images through a deep decision tree to obtain the fused image.

In one possible implementation, the fusion module is configured to learn the weights of the plurality of images through the convolutional neural network, and either perform weighted summation of the plurality of images in the channel dimension according to those weights to obtain the fused image, or weight the plurality of images according to those weights and then concatenate the weighted images in the channel dimension to obtain the fused image.

In one possible implementation, the recognition module is configured to extract features of the fused image and decode the extracted features to obtain the character recognition result.

In one possible implementation, the augmentation module is configured to augment the input image by at least one augmentation mode to obtain the plurality of images, where the at least one augmentation mode includes rotation, mirror flipping, and warping.

In a third aspect, an electronic device is provided that includes a processor and a memory; the memory is used for storing at least one instruction; the processor is configured to execute at least one instruction stored in the memory to implement the method steps of any one of the implementation manners of the first aspect.

In a fourth aspect, a computer-readable storage medium is provided, in which at least one instruction is stored, and the at least one instruction, when executed by a processor, implements the method steps of any one of the implementations of the first aspect.

The technical solutions provided by the embodiments of the present invention have at least the following beneficial effects:

the method comprises the steps of obtaining a plurality of images containing characters in different orientations by amplifying input images, further fusing the images in different orientations, extracting features of the fused images, and decoding the extracted features to obtain character recognition results. Because the fused image contains the character information in various different orientations, the character information in various different orientations can be observed simultaneously in the character recognition process, and the increasingly complex character recognition requirements in various different orientations can be met.

Drawings

To illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings in the following description show only some embodiments of the present invention; those skilled in the art can derive other drawings from them without creative effort.

Fig. 1 is a flowchart of a character recognition method according to an embodiment of the present invention;

Fig. 2 is a flowchart of a character recognition method according to an embodiment of the present invention;

Fig. 3 is a schematic diagram of rotation augmentation provided by an embodiment of the present invention;

Fig. 4 is a schematic illustration of an augmentation and fusion process provided by an embodiment of the present invention;

Fig. 5 is a schematic structural diagram of a character recognition network according to an embodiment of the present invention;

Fig. 6 is a schematic structural diagram of a character recognition apparatus according to an embodiment of the present invention;

Fig. 7 is a schematic structural diagram of an electronic device 700 according to an embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, embodiments of the present invention will be described in detail with reference to the accompanying drawings.

Fig. 1 is a flowchart of a character recognition method according to an embodiment of the present invention. Referring to Fig. 1, the method includes:

101. Augment the input image to obtain a plurality of images, where the plurality of images contain the same character to be recognized and the orientations of the character in the plurality of images are different.

102. Fuse the plurality of images to obtain a fused image, where the fused image contains feature information of the character in multiple orientations, and the multiple orientations include the orientations of the character in the plurality of images.

103. Perform character recognition on the fused image and output a character recognition result.

According to the method provided by the embodiment of the present invention, the input image is augmented to obtain a plurality of images containing the character in different orientations; these images are then fused, features are extracted from the fused image, and the extracted features are decoded to obtain the character recognition result. Because the fused image contains character information in multiple different orientations, character information in multiple different orientations can be observed simultaneously during recognition, meeting increasingly complex requirements for recognizing characters in multiple different orientations.

In one possible implementation, the fusing the plurality of images to obtain a fused image includes:

concatenating the plurality of images in the channel dimension to obtain the fused image; or

fusing the plurality of images through a convolutional neural network to obtain the fused image; or

fusing the plurality of images through a deep decision tree to obtain the fused image.

In one possible implementation, the fusing the plurality of images through a convolutional neural network to obtain the fused image includes:

learning weights of the plurality of images through the convolutional neural network;

performing weighted summation of the plurality of images in the channel dimension according to the weights of the plurality of images to obtain the fused image; or

weighting the plurality of images according to the weights of the plurality of images and then concatenating the weighted images in the channel dimension to obtain the fused image.

In a possible implementation manner, the performing character recognition on the fused image and outputting a character recognition result includes:

extracting features of the fused image; and

decoding the extracted features to obtain the character recognition result.

In one possible implementation, the augmenting the input image to obtain a plurality of images includes:

augmenting the input image by at least one augmentation mode to obtain the plurality of images, wherein the at least one augmentation mode comprises rotation, mirror flipping, and warping.

Any combination of the above optional technical solutions may form an optional embodiment of the present invention; details are not repeated here.

Fig. 2 is a flowchart of a character recognition method according to an embodiment of the present invention. The method is performed by an electronic device. Referring to Fig. 2, the method includes:

201. Augment the input image to obtain a plurality of images, where the plurality of images contain the same character to be recognized and the orientations of the character in the plurality of images are different.

The input image may be one or more feature maps. The orientations of the characters in a plurality of feature maps may be the same, as when the feature maps are obtained by feature extraction on the same image, in which case the feature maps contain feature information of the character in the same orientation. The orientations of the characters in the plurality of feature maps may also be different; if the feature maps are obtained by augmentation, they contain feature information of the character in multiple orientations. A feature map is a data structure composed of a three-dimensional matrix whose three dimensions are width, height, and number of channels; an RGB image can be regarded as a feature map with 3 channels. Feature maps are used in convolutional neural networks. A convolutional neural network is a feedforward artificial neural network whose neurons respond to surrounding units within a limited receptive field; through weight sharing and feature aggregation, it effectively extracts structural information from an image.

In the embodiment of the present invention, the electronic device can augment the input image at multiple angles using at least one augmentation mode to obtain a plurality of images containing the character in different orientations. The at least one augmentation mode may include rotation, mirror flipping, warping, and the like. Optionally, when augmenting the input image, the electronic device may also use an auxiliary feature extraction method such as a color histogram, for example extracting color features of the input image with a color histogram to enhance the effect of feature-map augmentation.

Taking rotation augmentation alone as an example, augmentation at multiple angles can be represented by the following formula:

R_d = rotate(d, F_in)

where F_in is the input image, d is the rotation angle, rotate is the rotation operation, and R_d is the augmented image. After F_in is rotated at multiple angles, multiple images R_d are obtained, and these may include F_in itself, as when the set of angles includes 0 degrees.

Referring to Fig. 3, a schematic diagram of rotation augmentation according to an embodiment of the present invention, the electronic device may augment the input image at different angles. As shown in Fig. 3, after the input image is augmented at 4 angles (0 degrees, 90 degrees, 180 degrees, and 270 degrees), 4 images are obtained, and the character ("fill") has a different orientation in each of the 4 images. Of course, the angles in Fig. 3 are only an example; in practice the electronic device may select other angles, rotate a different number of times, or perform other transformation operations (mirror flipping, warping, etc.) to augment the input image.
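
As an illustration, rotation augmentation at angles that are multiples of 90 degrees can be written in a few lines. The following is a minimal sketch, assuming the input is a NumPy array of shape (height, width, channels); the function names are illustrative, not taken from the patent.

    import numpy as np

    def rotate(d: int, f_in: np.ndarray) -> np.ndarray:
        # R_d = rotate(d, F_in): counter-clockwise rotation by d degrees,
        # restricted here to multiples of 90 degrees for simplicity.
        assert d % 90 == 0, "this sketch only handles multiples of 90 degrees"
        return np.rot90(f_in, k=(d // 90) % 4, axes=(0, 1))

    def augment(f_in: np.ndarray, angles=(0, 90, 180, 270)) -> list:
        # The 0-degree copy is F_in itself, matching the note above.
        return [rotate(d, f_in) for d in angles]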

If the input image is a single feature map, the electronic device may augment that feature map to obtain a plurality of images and then perform the following steps 202 to 204 on them. If the input image is a plurality of feature maps, the electronic device may augment each feature map separately to obtain multiple groups of images, each group containing the images augmented from one feature map, and then perform steps 202 to 204 on each group.

202. Fuse the plurality of images to obtain a fused image, where the fused image contains feature information of the character in multiple orientations, and the multiple orientations include the orientations of the character in the plurality of images.

In the embodiment of the present invention, the multiple images obtained in step 201 contain feature information of the character in different orientations, and the electronic device may fuse this information into one image to obtain the fused image. Fusing the augmented images may include, but is not limited to, the following possible implementations:

in the first mode, the multiple images are connected in the channel dimension to obtain the fused image.

In this mode, the electronic device concatenates the plurality of images along the channel dimension, fusing them into a new image. Taking 4 images of 3 channels each, F1, F2, F3, and F4, as an example, fusing F1, F2, F3, and F4 in the first mode yields 1 image of 12 channels.
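
A sketch of this first fusion mode, assuming the augmented images are NumPy arrays of shape (height, width, channels) with equal spatial sizes (for 90-degree rotations this holds when the images are square):

    import numpy as np

    def fuse_concat(images: list) -> np.ndarray:
        # Four (H, W, 3) images -> one (H, W, 12) fused image, e.g. the
        # four rotated copies produced by the augmentation step.
        return np.concatenate(images, axis=-1)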

In the second mode, the weights of the plurality of images are learned through a convolutional neural network, and the images are weighted and summed in the channel dimension according to those weights to obtain the fused image.

The weights of the plurality of images can be computed by a convolutional neural network with a Softmax layer. Specifically, the plurality of images are processed by convolution layers to obtain their feature maps, and the feature maps are processed by the Softmax layer to obtain the weights of the plurality of images.

In one possible implementation, the training process of the convolutional neural network may include: training the network parameters of the convolutional neural network by back-propagation, based on sample images in different orientations and randomly initialized weights, so that the trained network learns the weights of images in different orientations. During training, a shallow, small-scale convolutional neural network can be obtained from a small-scale set of sample images.

Further, after the electronic device obtains the weights of the plurality of images through the convolutional neural network, it can perform weighted summation of the plurality of images in each channel dimension and fuse them into a new image. Taking 4 images of 3 channels each, F1, F2, F3, and F4, as an example, fusing them in the second mode means computing the weighted sum F = a1 × F1 + a2 × F2 + a3 × F3 + a4 × F4, which yields 1 image of 3 channels, F.
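
A hedged PyTorch sketch of this second mode follows. The small scoring network (one convolution, global average pooling, and a linear layer feeding the Softmax) is an assumed design: the patent only states that a convolutional neural network with a Softmax layer produces the weights.

    import torch
    import torch.nn as nn

    class WeightedSumFusion(nn.Module):
        # Learns one scalar weight per augmented view, normalizes the
        # weights with Softmax, and sums the views with those weights.
        def __init__(self, channels: int):
            super().__init__()
            self.score = nn.Sequential(
                nn.Conv2d(channels, 8, kernel_size=3, padding=1),
                nn.ReLU(),
                nn.AdaptiveAvgPool2d(1),
                nn.Flatten(),
                nn.Linear(8, 1),
            )

        def forward(self, views):
            # views: list of V tensors, each of shape (N, C, H, W)
            scores = torch.cat([self.score(v) for v in views], dim=1)  # (N, V)
            weights = torch.softmax(scores, dim=1)   # a1..aV, summing to 1
            stacked = torch.stack(views, dim=1)      # (N, V, C, H, W)
            w = weights.view(weights.shape[0], weights.shape[1], 1, 1, 1)
            return (w * stacked).sum(dim=1)          # F = sum_i a_i * F_i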

Taking 4 RGB images as an example: an RGB image has 3 color channels, R (red), G (green), and B (blue), so each pixel is represented by 3 values. The electronic device can weight and sum the values of the pixels of the 4 RGB images in the R channel, in the G channel, and in the B channel respectively, obtaining 1 RGB image.

In the third mode, the weights of the plurality of images are learned through a convolutional neural network, and the images are weighted and then concatenated in the channel dimension according to those weights to obtain the fused image.

In this mode, after the electronic device obtains the weights of the plurality of images through the convolutional neural network, it can concatenate the weighted images in each channel dimension and fuse them into a new image. Taking 4 images of 3 channels each, F1, F2, F3, and F4, as an example, fusing them in the third mode means first weighting to obtain a1 × F1, a2 × F2, a3 × F3, and a4 × F4, and then concatenating a1 × F1, a2 × F2, a3 × F3, and a4 × F4 in the channel dimension to obtain 1 image of 12 channels.
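
The third mode differs from the previous sketch only in the final step: instead of summing, the weighted views are concatenated along the channel dimension. Again a sketch, assuming a (N, V) weight tensor like the one computed by the Softmax in the previous example:

    import torch

    def fuse_weighted_concat(views, weights):
        # views: list of V tensors (N, C, H, W); weights: (N, V) from Softmax
        scaled = [v * weights[:, i].view(-1, 1, 1, 1)
                  for i, v in enumerate(views)]
        return torch.cat(scaled, dim=1)  # e.g. 4 x 3 channels -> 12 channels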

The second and third modes are two possible implementation modes for obtaining the fused image by fusing the multiple images through a convolutional neural network.

In the fourth mode, the plurality of images are fused through a deep decision tree to obtain the fused image.

The fourth mode replaces the convolutional neural network with a deep decision tree (Deep Forest), through which the weights of the plurality of images are learned. The images are then either weighted and summed in the channel dimension, or weighted and concatenated in the channel dimension, to obtain the fused image. The weighted-summation process is as described in the second mode, and the weight-then-concatenate process is as described in the third mode; details are not repeated.

Of course, besides the deep decision tree, fusion may be performed in other ways, which is not limited in the embodiments of the present invention.

Referring to Fig. 4, a schematic diagram of the augmentation and fusion process provided by an embodiment of the present invention, Augmentation denotes the augmentation operation and Concatenate denotes concatenation in the channel dimension. Panel (a) of Fig. 4 shows the case without augmentation and fusion. Panel (b) corresponds to the first mode: the images are directly concatenated in the channel dimension. Panel (c) corresponds to the second mode: each image is given a new weight (computed in Fig. 4 by a convolutional neural network with a Softmax layer, which may be followed by a Scale layer), and a new image is obtained by weighted summation (Sum). Panel (d) corresponds to the third mode: each image is likewise given a new weight (computed by a convolutional neural network with a Softmax layer, which may be followed by a Scale layer), and the weighted images are concatenated in the channel dimension to obtain a new image.

Taking the rotation augmentation at 4 angles (0 degrees, 90 degrees, 180 degrees, and 270 degrees) in step 201 as an example, the fusion of step 202 can be expressed by the following formula:

F_out = fuse(R_0, R_90, R_180, R_270)

where each R is an image obtained after augmentation (R_0 by 0-degree rotation, R_90 by 90-degree rotation, R_180 by 180-degree rotation, and R_270 by 270-degree rotation), fuse is the fusion operation, and F_out is the fused output image.

As shown in Fig. 3, the 4 images obtained by augmentation are fused into a new image, and that image contains the character information of the 4 orientations.

The above steps 201 and 202 constitute an augmentation-and-fusion (FAME) process, an image processing method that augments a group of original images in one orientation into multiple groups of images in other orientations and then fuses them, thereby achieving recognition of characters in different orientations.

It should be noted that the augmentation and fusion processes represented by steps 201 and 202 may be repeated: after the electronic device performs one round of augmentation and fusion, the resulting image can be augmented and fused again, so that the features are fused more thoroughly.
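
A minimal sketch of this repetition, assuming square inputs and mode-1 (concatenation) fusion; the fused output of one round simply becomes the input of the next, and the function names are illustrative:

    import numpy as np

    def fame_round(x: np.ndarray, angles=(0, 90, 180, 270)) -> np.ndarray:
        views = [np.rot90(x, k=d // 90) for d in angles]  # step 201: augment
        return np.concatenate(views, axis=-1)             # step 202: fuse

    def repeated_fame(x: np.ndarray, rounds: int = 2) -> np.ndarray:
        for _ in range(rounds):
            x = fame_round(x)  # channels grow 4x per round with mode-1 fusion
        return x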

203. Extract features of the fused image.

In the embodiment of the present invention, after the electronic device obtains the fully fused image in step 202, deeper features can be extracted on that basis. Feature extraction methods include, but are not limited to, convolutional neural networks and deep decision trees. Taking feature extraction with a convolutional neural network as an example, the electronic device inputs the fused image to the convolutional neural network, which outputs the features of the fused image.

Optionally, after extracting the features of the fused image, the electronic device may further encode the extracted features to achieve a better recognition effect. Encoding methods include, but are not limited to, BiLSTM (Bidirectional Long Short-Term Memory) encoding.
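
A hedged sketch of this optional encoding step, following the common OCR practice of pooling the feature-map height to 1 and treating the width as a sequence; the layer sizes are assumptions, not values from the patent:

    import torch
    import torch.nn as nn

    class BiLSTMEncoder(nn.Module):
        def __init__(self, in_channels: int, hidden: int = 256):
            super().__init__()
            self.rnn = nn.LSTM(in_channels, hidden,
                               bidirectional=True, batch_first=True)

        def forward(self, feat: torch.Tensor) -> torch.Tensor:
            # feat: (N, C, 1, W), e.g. a fused feature map pooled to height 1
            seq = feat.squeeze(2).permute(0, 2, 1)  # (N, W, C): width as time
            out, _ = self.rnn(seq)                  # (N, W, 2 * hidden)
            return out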

204. Decode the extracted features to obtain the character recognition result.

In the embodiment of the present invention, decoding refers to the process of going from features to characters. After extracting the features of the fused image in step 203, the electronic device decodes the extracted features and outputs the recognition result. Decoding methods include, but are not limited to, Attention-based decoding mechanisms and CTC (Connectionist Temporal Classification)-based decoding.
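
For the CTC option, a minimal greedy decoding sketch (collapse repeated labels, then drop blanks); treating index 0 as the blank symbol is an assumption of this example, not something the patent specifies:

    import torch

    def ctc_greedy_decode(logits: torch.Tensor, blank: int = 0) -> list:
        # logits: (T, num_classes) per-timestep scores over the character set
        best = logits.argmax(dim=-1).tolist()
        out, prev = [], None
        for idx in best:
            if idx != blank and idx != prev:
                out.append(idx)  # keep a label only when it starts a new run
            prev = idx
        return out  # class indices; mapped to characters downstream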

It should be noted that, in the embodiment of the present invention, the structures of feature extraction and feature decoding are not constrained, that is, no matter what manner is used in step 203 to perform feature extraction, any decoding manner may be used in step 204 to decode the features extracted in step 203.

Referring to Fig. 5, a schematic structural diagram of a character recognition network according to an embodiment of the present invention: the augmentation in Fig. 5 may be 4-angle rotation augmentation, and the fusion (Combination) in Fig. 5 may be concatenation of the images in the channel dimension. Fig. 5 draws the structure of the entire network with the example of applying augmentation and fusion three times (previous conv1, previous conv2, and previous conv3) to a 9-layer convolutional network. Conv (convolution) denotes convolution layers with 3 × 3 kernels and stride 1, and the number after Conv is the number of convolution kernels. Every convolution layer is followed by a ReLU (Rectified Linear Unit) activation layer (not shown in Fig. 5), and some convolution layers are followed by a pooling layer, denoted /( ) in the figure; the four numbers in /( ) are the parameters of the pooling layer. The character recognition network shown in Fig. 5 may also include a BiLSTM (Bidirectional Long Short-Term Memory) layer and an Attention layer.
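
A sketch of the repeated building block that Fig. 5 describes, with the 3 × 3 kernel and stride 1 taken from the text; treating the four pooling numbers as (kernel_h, kernel_w, stride_h, stride_w) is an assumption of this example:

    import torch.nn as nn

    def conv_block(in_ch: int, out_ch: int, pool=None) -> nn.Sequential:
        # Conv 3x3, stride 1, followed by ReLU; pooling only where Fig. 5
        # marks a /( ) after the convolution.
        layers = [nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=1, padding=1),
                  nn.ReLU(inplace=True)]
        if pool is not None:
            kh, kw, sh, sw = pool
            layers.append(nn.MaxPool2d(kernel_size=(kh, kw), stride=(sh, sw)))
        return nn.Sequential(*layers)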

Step 203 and step 204 are one possible implementation manner of performing character recognition on the fused image and outputting a character recognition result.

According to the character recognition method provided by the embodiment of the present invention, during character recognition the input image is augmented, and the resulting images in different orientations are fused, so that the recognition network can observe multiple orientations of the input image at the same time. The method is easy to implement, and the ability to recognize characters in multiple orientations is obtained without calibrating the position of each character. The basic framework of character recognition may be based on a convolutional neural network (as shown in Fig. 5) or on other algorithms; the augmentation may be rotation, mirror flipping, warping, etc.; and the fusion means may be direct concatenation in the channel dimension, weighted summation, or other fusion methods.

According to the method provided by the embodiment of the present invention, the input image is augmented to obtain a plurality of images containing the character in different orientations; these images are then fused, features are extracted from the fused image, and the extracted features are decoded to obtain the character recognition result. Because the fused image contains character information in multiple different orientations, character information in multiple different orientations can be observed simultaneously during recognition, meeting increasingly complex requirements for recognizing characters in multiple different orientations.

Fig. 6 is a schematic structural diagram of a character recognition device according to an embodiment of the present invention. Referring to fig. 6, the apparatus includes:

an augmentation module 601, configured to augment an input image to obtain a plurality of images, where the plurality of images contain the same character to be recognized and the orientations of the character in the plurality of images are different;

a fusion module 602, configured to fuse the plurality of images to obtain a fused image, where the fused image includes feature information of the character in multiple orientations, and the multiple orientations include the orientations of the character in the plurality of images; and

a recognition module 603, configured to perform character recognition on the fused image and output a character recognition result.

In one possible implementation, the fusion module 602 is configured to concatenate the multiple images in the channel dimension to obtain the fused image; or

the fusion module 602 is configured to fuse the multiple images through a convolutional neural network to obtain the fused image; or

the fusion module 602 is configured to fuse the multiple images through a deep decision tree to obtain the fused image.

In one possible implementation, the fusion module 602 is configured to learn the weights of the plurality of images through the convolutional neural network, and either perform weighted summation of the plurality of images in the channel dimension according to those weights to obtain the fused image, or weight the plurality of images according to those weights and then concatenate the weighted images in the channel dimension to obtain the fused image.

In one possible implementation, the recognition module 603 is configured to extract features of the fused image and decode the extracted features to obtain the character recognition result.

In one possible implementation, the augmentation module 601 is configured to augment the input image by at least one augmentation mode to obtain the plurality of images, where the at least one augmentation mode includes rotation, mirror flipping, and warping.

According to the device provided by the embodiment of the present invention, the input image is augmented to obtain a plurality of images containing the character in different orientations; these images are then fused, features are extracted from the fused image, and the extracted features are decoded to obtain the character recognition result. Because the fused image contains character information in multiple different orientations, character information in multiple different orientations can be observed simultaneously during recognition, meeting increasingly complex requirements for recognizing characters in multiple different orientations.

It should be noted that the character recognition apparatus provided in the above embodiment is illustrated only by the division of the functional modules above; in practical applications, the functions may be allocated to different functional modules as needed, that is, the internal structure of the device may be divided into different functional modules to complete all or part of the functions described above. In addition, the character recognition apparatus and the character recognition method provided by the above embodiments belong to the same concept; their specific implementation processes are described in the method embodiments and are not repeated here.

Fig. 7 is a schematic structural diagram of an electronic device 700 according to an embodiment of the present invention. The electronic device 700 may vary considerably in configuration or performance, and may include one or more processors (CPUs) 701 and one or more memories 702, where the memory 702 stores at least one instruction that is loaded and executed by the processor 701 to implement the character recognition method provided by the foregoing method embodiments. Of course, the electronic device 700 may also have components such as a wired or wireless network interface, a keyboard, and an input/output interface for input and output, and may include other components for implementing device functions, which are not described here.

In an exemplary embodiment, a computer-readable storage medium, such as a memory storing at least one instruction, is also provided; when executed by a processor, the instruction implements the character recognition method in the above embodiments. For example, the computer-readable storage medium may be a Read-Only Memory (ROM), a Random Access Memory (RAM), a Compact Disc Read-Only Memory (CD-ROM), a magnetic tape, a floppy disk, an optical data storage device, or the like.

It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program instructing relevant hardware, where the program may be stored in a computer-readable storage medium, and the above-mentioned storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.

The above description is only exemplary of the present invention and should not be taken as limiting the invention, as any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.
