Image analysis method, system, device and medium based on OCR

文档序号：1379192 发布日期：2020-08-14 浏览：5次中文

阅读说明：本技术 一种基于ocr的图像分析方法、系统、设备及介质 (Image analysis method, system, device and medium based on OCR ) 是由周曦姚志强林旸焜许梅芳于 2020-04-21 设计创作，主要内容包括：本发明提供一种基于OCR的图像分析方法、系统、设备及介质,包括：根据预先获取的目标图像的特征对目标图像进行语义分割,获取语义分割结果；基于所述语义分割结果对目标图像的版面进行分析。本发明通过对目标图像进行语义分割,根据分割后的结果进行版面分析,能够在遮挡、成像角度、旋转、光照等干扰下,快速、准确地检测目标图像中的文本；并且对于不规范的图像版式或表面,本发明也能结构化提取字段信息。(The invention provides an image analysis method, system, device and medium based on OCR, comprising: performing semantic segmentation on the target image according to the characteristics of the target image acquired in advance to acquire a semantic segmentation result; and analyzing the layout of the target image based on the semantic segmentation result. According to the invention, through semantic segmentation of the target image and layout analysis according to the segmented result, the text in the target image can be rapidly and accurately detected under the interference of shielding, imaging angle, rotation, illumination and the like; and for irregular image formats or surfaces, the method can also structurally extract field information.)

1. An OCR-based image analysis method, comprising the steps of:

performing semantic segmentation on the target image according to the characteristics of the target image acquired in advance to acquire a semantic segmentation result;

and analyzing the layout of the target image based on the semantic segmentation result.

2. An OCR-based image analysis method according to claim 1 and further comprising preprocessing the target image before obtaining the features of the target image, said preprocessing including at least one of:

and carrying out position correction on the text in the target image through a text position correction algorithm, and converting the text box label of the target image into a label at a pixel level.

3. An OCR-based image analysis method according to claim 2 wherein the annotated content includes one of: text line attributes, corners, offsets.

4. An OCR-based image analysis method according to any of claims 1 to 3, characterized in that based on a deep learning ensemble scheme, training is performed using random gradient descent with momentum to train one or more layout analysis deep learning network models;

and inputting the pre-acquired features of the target images and the context correlation attributes among the target image levels into one or more trained layout analysis deep learning network models to perform semantic segmentation on the target images.

5. An OCR-based image analysis method according to claim 4, wherein the semantic segmentation of the target image comprises at least one of: the method comprises the steps of a text box of a regression target image, an anchor detection box of the regression target image, pixel points in a prediction target image and pixel points in a corner region in the prediction target image.

6. An OCR-based image analysis method according to claim 5, wherein the obtained semantic segmentation results comprise at least one of: acquiring corner positions through a text box of a regression target image, acquiring boundary positions of the text box of the target image through an anchor detection box of the regression target image and pixel points in a corner region in a prediction target image, and acquiring the precise boundary positions of the text box of the target image through pixel points in the corner region in the prediction target image.

7. An OCR-based image analysis method according to claim 6 and also comprising corner matching in combination with a plurality of weights; including at least one of the following weights: distance, aspect ratio, angle.

8. An OCR-based image analysis method according to claim 6,

classifying all predicted pixel points to obtain the attribute of a text box of a target image;

and analyzing the layout of the target image based on the attribute of the text box.

9. An OCR-based image analysis method as claimed in claim 4, further comprising increasing one or more disturbance parameters to improve robustness of one or more layout deep learning network models when training the one or more layout deep learning network models.

10. An OCR-based image analysis method according to claim 9 and wherein said disturbance parameter comprises at least one of: background, rotation, perspective, distortion, noise, gaussian blur, motion blur.

11. An OCR-based image analysis method as in claim 4, wherein in training one or more layout analysis deep learning network models, model losses are calculated using different methods for predictions of different attributes, and all model losses are weighted to obtain the total loss of the model.

12. An OCR-based image analysis method according to claim 11 and wherein the calculated model loss comprises at least one of: smooth L1 loss, cross entropy loss.

13. An OCR-based image analysis method according to claim 1 wherein the features of the target object include at least one of: global features of the target image, local features of the target image, and a correlation between target image level contexts.

14. An OCR-based image analysis method according to claim 1 or 13, wherein the features of the target image are obtained through a convolutional neural network or a full convolutional network, and the features include a global feature of the target image and a local feature of the target image.

15. An OCR-based image analysis method according to claim 14 and further comprising boosting the receptive field of a full convolution network using a parallel architecture formed jointly by a concatenation of hole convolutions and different sampling rate hole convolutions.

16. An OCR-based image analysis system, comprising:

the segmentation module is used for performing semantic segmentation on the target image according to the characteristics of the target image acquired in advance to acquire a semantic segmentation result;

and the analysis module is used for analyzing the layout of the target image based on the semantic segmentation result.

17. An OCR-based image analysis system according to claim 16 and further comprising, prior to obtaining features of the target image, preprocessing of the target image including at least one of:

and carrying out position correction on the text in the target image through a text position correction algorithm, and converting the text box label of the target image into a label at a pixel level.

18. An OCR-based image analysis system according to claim 17 and wherein the annotated content comprises one of: text line attributes, corners, offsets.

19. An OCR-based image analysis system according to any one of claims 16 to 18 wherein a deep learning based ensemble scheme is trained using random gradient descent with momentum to train out one or more layout deep learning network models;

20. An OCR-based image analysis system according to claim 19 wherein the semantic segmentation of the target image includes at least one of: the method comprises the steps of a text box of a regression target image, an anchor detection box of the regression target image, pixel points in a prediction target image and pixel points in a corner region in the prediction target image.

21. An OCR-based image analysis system according to claim 20 and wherein the obtained semantic segmentation results include at least one of: acquiring corner positions through a text box of a regression target image, acquiring boundary positions of the text box of the target image through an anchor detection box of the regression target image and pixel points in a corner region in a prediction target image, and acquiring the precise boundary positions of the text box of the target image through pixel points in the corner region in the prediction target image.

22. An OCR-based image analysis system according to claim 21 and further comprising corner matching in combination with a plurality of weights; including at least one of the following weights: distance, aspect ratio, angle.

23. An OCR-based image analysis system according to claim 21,

classifying all predicted pixel points to obtain the attribute of a text box of a target image;

and analyzing the layout of the target image based on the attribute of the text box.

24. An OCR-based image analysis system according to claim 19 and further comprising adding one or more interference parameters to improve robustness of said one or more layout deep learning network models when training said one or more layout deep learning network models.

25. An OCR-based image analysis system according to claim 24 and wherein said disturbance parameter comprises at least one of: background, rotation, perspective, distortion, noise, gaussian blur, motion blur.

26. An OCR based image analysis system according to claim 19 wherein in training one or more layout deep learning network models, model losses are calculated using different methods for predictions of different attributes and all model losses are weighted to obtain the total loss of the model.

27. An OCR-based image analysis system according to claim 16 wherein the calculated model loss includes at least one of: smooth L1 loss, cross entropy loss.

28. An OCR-based image analysis system according to claim 16 and wherein said target object features include at least one of: global features of the target image, local features of the target image, and a correlation between target image level contexts.

29. An OCR-based image analysis system according to claim 16 or 28 and wherein the features of the target image are obtained by convolutional neural network, full convolutional network, including obtaining global features of the target image, local features of the target image.

30. An OCR-based image analysis system according to claim 29 and further comprising using a parallel architecture formed jointly by a concatenation of hole convolutions and different sampling rate hole convolutions to boost the receptive field of a full convolution network.

31. An OCR-based image analysis apparatus, comprising:

performing semantic segmentation on the target image according to the characteristics of the target image acquired in advance to acquire a semantic segmentation result;

and analyzing the layout of the target image based on the semantic segmentation result.

32. An apparatus, comprising:

one or more processors; and

one or more machine-readable media having instructions stored thereon that, when executed by the one or more processors, cause the apparatus to perform the method recited by one or more of claims 1-15.

33. One or more machine-readable media having instructions stored thereon, which when executed by one or more processors, cause an apparatus to perform the method recited by one or more of claims 1-15.

Technical Field

The present invention relates to the field of image technologies, and in particular, to an image analysis method, system, device, and medium based on OCR.

Background

Layout analysis or layout analysis is an important issue in the field of OCR (Optical Character recognition), and aims to determine whether a given picture or image includes a designated object and to obtain an accurate position and boundary of the designated object. In the field of OCR, semantic segmentation and a general object detection framework have been widely adopted for scene text detection tasks. Due to interference such as shielding, imaging angle, rotation and illumination, common target detection in the prior art hardly meets the requirements of rapid and accurate text detection at the same time, and field information cannot be extracted structurally for irregular image formats or layouts.

Disclosure of Invention

In view of the above-mentioned shortcomings of the prior art, it is an object of the present invention to provide an OCR-based image analysis method, system, device and medium for solving the problems in the prior art.

To achieve the above and other related objects, the present invention provides an OCR-based image analysis method, including the steps of:

performing semantic segmentation on the target image according to the characteristics of the target image acquired in advance to acquire a semantic segmentation result;

and analyzing the layout of the target image based on the semantic segmentation result.

Optionally, before acquiring the features of the target image, the method further includes preprocessing the target image, where the preprocessing includes at least one of:

and carrying out position correction on the text in the target image through a text position correction algorithm, and converting the text box label of the target image into a label at a pixel level.

Optionally, the annotated content comprises one of: text line attributes, corners, offsets.

Optionally, training by using random gradient descent with momentum based on an overall scheme of deep learning to train one or more layout analysis deep learning network models;

Optionally, the semantic segmentation performed on the target image comprises at least one of: the method comprises the steps of a text box of a regression target image, an anchor detection box of the regression target image, pixel points in a prediction target image and pixel points in a corner region in the prediction target image.

Optionally, the obtained semantic segmentation result includes at least one of: acquiring corner positions through a text box of a regression target image, acquiring boundary positions of the text box of the target image through an anchor detection box of the regression target image and pixel points in a corner region in a prediction target image, and acquiring the precise boundary positions of the text box of the target image through pixel points in the corner region in the prediction target image.

Optionally, carrying out corner matching by combining a plurality of weights; including at least one of the following weights: distance, aspect ratio, angle.

Optionally, classifying all predicted pixel points to obtain the attribute of a text box of the target image;

and analyzing the layout of the target image based on the attribute of the text box.

Optionally, when training one or more layout analysis deep learning network models, further comprising adding one or more interference parameters to improve robustness of the one or more layout analysis deep learning network models.

Optionally, the interference parameter comprises at least one of: background, rotation, perspective, distortion, noise, gaussian blur, motion blur.

Optionally, when one or more layout analysis deep learning network models are trained, for predictions of different attributes, model losses are calculated by using different methods, and all the model losses are weighted to obtain a total loss of the model.

Optionally, the calculated model loss comprises at least one of: smooth L1 loss, cross entropy loss.

Optionally, the characteristics of the target object include at least one of: global features of the target image, local features of the target image, and a correlation between target image level contexts.

Optionally, the features of the target image are obtained through a convolutional neural network or a full convolutional network, including obtaining global features of the target image and local features of the target image.

Optionally, the method further comprises the step of improving the receptive field of the full convolution network by using a parallel architecture formed by the hole convolution cascade and the hole convolutions with different sampling rates.

The present invention also provides an OCR-based image analysis system, including: