Entry region extraction device and entry region extraction program

Document No.: 1220302  Published: 2020-09-04

Summary: This technology, Entry region extraction device and entry region extraction program, was created by 松本光弘, 片冈江利, 渡边阳介, 山本俊辅, 菅野干人 and 小出高道 on 2018-08-24. Its main content is as follows. In an entry region extraction device (10), a learning unit (22) learns features corresponding to the types of documents from images (30) of a plurality of documents to obtain a learning model (31). Using the learning model (31), a feature region extraction unit (23) extracts, from an image (40) of a document sample, a feature region (43), which is a region representing a feature corresponding to the type of the document sample. An entry region extraction unit (25) extracts an entry region (46), which is an entry field region, from the remaining region of the image (40) of the document sample, excluding the feature region (43) extracted by the feature region extraction unit (23).

1. An entry region extraction device comprising:

a feature region extraction unit that extracts a feature region, which is a region indicating a feature corresponding to the type of a document sample, from an image of at least one document sample, using a learning model obtained by learning features corresponding to the types of documents from images of a plurality of documents, each document including entry fields determined according to its type and having content entered individually in those entry fields; and

an entry region extraction unit that extracts an entry region, which is an entry field region, from the remaining region of the image of the document sample, excluding the feature region extracted by the feature region extraction unit.

2. The entry region extraction device according to claim 1, wherein

the feature region extraction unit quantifies, for each pixel of the image of the document sample, the significance of the feature corresponding to the type of the document sample, binarizes the significance values using a threshold, and thereby divides the image of the document sample into the feature region and the remaining region.

3. The entry region extraction device according to claim 1 or 2, wherein

the entry region extraction device further includes an object region extraction unit that extracts, from the image of the document sample, one or more object regions, which are regions in which objects are written in the document sample, as candidates for the entry region; and

the entry region extraction unit excludes, from the candidates for the entry region, any object region that overlaps the feature region among the object regions extracted by the object region extraction unit.

4. The entry region extraction device according to claim 3, wherein

the entry region extraction unit merges two or more mutually overlapping object regions among the object regions extracted by the object region extraction unit.

5. The entry region extraction device according to claim 3 or 4, wherein

the entry region extraction unit merges two or more object regions whose mutual distance is equal to or less than a threshold among the object regions extracted by the object region extraction unit.

6. The entry region extraction device according to any one of claims 3 to 5, wherein

the entry region extraction unit excludes, from the candidates for the entry region, any object region whose area is equal to or smaller than a threshold among the object regions extracted by the object region extraction unit.

7. The entry region extraction device according to any one of claims 3 to 6, wherein

the object region extraction unit recognizes at least characters and marks as objects.

8. The entry region extraction device according to any one of claims 1 to 7, wherein

the entry region extraction device further includes an entry region synthesis unit that, when the feature region extraction unit has extracted feature regions from the images of a plurality of document samples, synthesizes two or more entry regions that overlap one another, the two or more entry regions having been extracted by the entry region extraction unit from the remaining regions, other than the feature regions, of the images of different document samples.

9. An entry region extraction program that causes a computer to execute:

a feature region extraction process of extracting a feature region, which is a region indicating a feature corresponding to the type of a document sample, from an image of at least one document sample, using a learning model obtained by learning features corresponding to the types of documents from images of a plurality of documents, each document including entry fields determined according to its type and having content entered individually in those entry fields; and

an entry region extraction process of extracting an entry region, which is an entry field region, from the remaining region of the image of the document sample, excluding the feature region extracted by the feature region extraction process.

Technical Field

The present invention relates to an entry region extraction device and an entry region extraction program.

Background

Patent Document 1 describes a technique in which a plurality of standard-form documents are each divided into a plurality of small regions, feature quantities (the average density of the RGB components and the average color) are extracted for each small region, and an entry field region is extracted based on the amount of deviation of those feature quantities.
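The prior-art approach of Patent Document 1 can be illustrated with a minimal Python sketch. It assumes that the standard documents are same-sized RGB images of the same form, that the "deviation" of a block is the largest per-channel population standard deviation of the block means across documents, and that `block_size` and `threshold` are hypothetical tuning parameters; Patent Document 1 as summarized here specifies none of these.

```python
import statistics

RGB = tuple[int, int, int]
Image = list[list[RGB]]


def block_means(img: Image, block_size: int) -> list[tuple[float, ...]]:
    """Average R, G and B value of each block, in row-major block order."""
    h, w = len(img), len(img[0])
    means = []
    for by in range(0, h, block_size):
        for bx in range(0, w, block_size):
            pixels = [img[y][x]
                      for y in range(by, min(by + block_size, h))
                      for x in range(bx, min(bx + block_size, w))]
            means.append(tuple(sum(p[c] for p in pixels) / len(pixels) for c in range(3)))
    return means


def variable_blocks(images: list[Image], block_size: int,
                    threshold: float) -> list[tuple[int, int]]:
    """Top-left (x, y) of blocks whose mean colour varies across documents."""
    h, w = len(images[0]), len(images[0][0])
    origins = [(bx, by) for by in range(0, h, block_size)
               for bx in range(0, w, block_size)]
    per_doc = [block_means(img, block_size) for img in images]
    candidates = []
    for k, origin in enumerate(origins):
        # Deviation of block k: spread of each channel mean across the documents.
        deviation = max(statistics.pstdev([doc[k][c] for doc in per_doc])
                        for c in range(3))
        if deviation > threshold:
            candidates.append(origin)
    return candidates
```

Both `block_size` and `threshold` directly change which blocks are reported, which illustrates the parameter sensitivity attributed to this technique.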

Patent Document 2 describes a technique in which intermediate images generated by a neural network are processed, and images of regions representing features of an object are extracted based on changes in the loss function and then synthesized.

Patent Document 3 describes a technique in which character regions are extracted by applying locally adaptive thresholding (locally adaptive binarization of the color components) and dilation to a low-resolution image.
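A minimal sketch of locally adaptive thresholding followed by dilation, assuming a single low-resolution grayscale channel stored as a list of rows. The mean-minus-offset rule and the `radius` and `offset` values are illustrative assumptions, not the method of Patent Document 3; running it separately on each color component would correspond to the per-component binarization mentioned there.

```python
Gray = list[list[int]]
Mask = list[list[int]]


def _window(h: int, w: int, y: int, x: int, r: int) -> list[tuple[int, int]]:
    """Coordinates of the (2r+1) x (2r+1) neighbourhood, clipped to the image."""
    return [(yy, xx)
            for yy in range(max(0, y - r), min(h, y + r + 1))
            for xx in range(max(0, x - r), min(w, x + r + 1))]


def adaptive_binarize(gray: Gray, radius: int = 1, offset: float = 10.0) -> Mask:
    """1 where a pixel is darker than its local mean by more than `offset`."""
    h, w = len(gray), len(gray[0])
    mask = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            win = _window(h, w, y, x, radius)
            local_mean = sum(gray[yy][xx] for yy, xx in win) / len(win)
            mask[y][x] = 1 if gray[y][x] < local_mean - offset else 0
    return mask


def dilate(mask: Mask, radius: int = 1) -> Mask:
    """Binary dilation with a square structuring element of side 2*radius+1."""
    h, w = len(mask), len(mask[0])
    return [[1 if any(mask[yy][xx] for yy, xx in _window(h, w, y, x, radius)) else 0
             for x in range(w)]
            for y in range(h)]
```

Dilation joins neighbouring strokes into connected blobs, whose bounding boxes can then serve as character-region candidates.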

Disclosure of Invention

Problems to be solved by the invention

The technique described in Patent Document 1 is susceptible to variations and noise, and its extraction accuracy depends strongly on parameters. Because determining the small regions and setting the threshold for the amount of deviation are complicated, applying the technique of Patent Document 1 may not achieve sufficient accuracy in practice.

The object of the present invention is to extract an entry field region from a document sample with high accuracy.

Means for solving the problems

An entry region extraction device according to an aspect of the present invention includes:

a feature region extraction unit that extracts a feature region, which is a region indicating a feature corresponding to the type of a document sample, from an image of at least one document sample, using a learning model obtained by learning features corresponding to the types of documents from images of a plurality of documents, each document including entry fields determined according to its type and having content entered individually in those entry fields; and

an entry region extraction unit that extracts an entry region, which is an entry field region, from the remaining region of the image of the document sample, excluding the feature region extracted by the feature region extraction unit.
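The core idea, extracting entry regions only from the part of the image outside the learned feature region, can be sketched in Python under several assumptions: the learning model's output is already available as a per-pixel binary feature mask, the document sample has been binarized into ink (1) and background (0), and an entry region is taken to be the bounding box of a 4-connected group of ink pixels outside the mask. `extract_entry_regions` is a hypothetical name, and the connected-component step is one plausible choice rather than the procedure defined in this document.

```python
from collections import deque

Box = tuple[int, int, int, int]  # (x0, y0, x1, y1), half-open pixel coordinates


def extract_entry_regions(ink: list[list[int]], feature: list[list[int]]) -> list[Box]:
    """Bounding boxes of 4-connected ink components outside the feature mask."""
    h, w = len(ink), len(ink[0])
    # Remaining region: ink pixels that the feature mask does not cover.
    remaining = [[ink[y][x] and not feature[y][x] for x in range(w)] for y in range(h)]
    seen = [[False] * w for _ in range(h)]
    boxes: list[Box] = []
    for sy in range(h):
        for sx in range(w):
            if not remaining[sy][sx] or seen[sy][sx]:
                continue
            # Breadth-first flood fill of one component, tracking its extent.
            x0, y0, x1, y1 = sx, sy, sx + 1, sy + 1
            queue = deque([(sx, sy)])
            seen[sy][sx] = True
            while queue:
                x, y = queue.popleft()
                x0, y0 = min(x0, x), min(y0, y)
                x1, y1 = max(x1, x + 1), max(y1, y + 1)
                for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                    if 0 <= nx < w and 0 <= ny < h and remaining[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            boxes.append((x0, y0, x1, y1))
    return boxes
```

For example, with ink `[[1, 1, 0, 0, 0], [0, 0, 0, 1, 1], [0, 0, 0, 1, 0]]` and a feature mask covering only the two top-left pixels, the result is `[(3, 1, 5, 3)]`: the printed content is never considered, so no fine grid or deviation threshold is needed.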

The feature region extraction unit quantifies, for each pixel of the image of the document sample, the significance of the feature corresponding to the type of the document sample, binarizes the significance values using a threshold, and thereby divides the image of the document sample into the feature region and the remaining region.
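A minimal sketch of the per-pixel significance thresholding, assuming the significance map is a 2-D list of floats produced by the learning model (a class-activation-style heat map is one possible source, which is an assumption). The min-max normalization, the default threshold of 0.5 and the use of `>=` are illustrative choices; `split_by_significance` is a hypothetical name.

```python
def split_by_significance(significance: list[list[float]],
                          threshold: float = 0.5) -> list[list[int]]:
    """Binary mask: 1 = feature region, 0 = remaining region."""
    values = [v for row in significance for v in row]
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # avoid division by zero on a flat map
    # Normalize to [0, 1] so the threshold does not depend on the model's output scale.
    return [[1 if (v - lo) / span >= threshold else 0 for v in row]
            for row in significance]
```

Pixels with mask value 0 form the remaining region that is then searched for entry regions.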

The entry region extraction device further includes an object region extraction unit that extracts, from the image of the document sample, one or more object regions, which are regions in which objects are written in the document sample, as candidates for the entry region; and

the entry region extraction unit excludes, from the candidates for the entry region, any object region that overlaps the feature region among the object regions extracted by the object region extraction unit.
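Excluding object regions that overlap the feature region can be sketched as follows, assuming object regions are axis-aligned bounding boxes `(x0, y0, x1, y1)` in half-open pixel coordinates and the feature region is a binary mask. Treating a single shared pixel as overlap is an assumption; an overlap-ratio threshold would be an equally plausible reading. The function names are hypothetical.

```python
Box = tuple[int, int, int, int]  # (x0, y0, x1, y1), half-open pixel coordinates


def overlaps_feature(box: Box, feature: list[list[int]]) -> bool:
    """True if any pixel inside the box belongs to the feature region."""
    x0, y0, x1, y1 = box
    return any(feature[y][x] for y in range(y0, y1) for x in range(x0, x1))


def drop_feature_overlaps(objects: list[Box], feature: list[list[int]]) -> list[Box]:
    """Keep only object regions that lie entirely in the remaining region."""
    return [b for b in objects if not overlaps_feature(b, feature)]
```

Dropping the whole object, rather than only its masked pixels, keeps a printed label from leaving behind a fragment that looks like an entry.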

The entry region extraction unit merges two or more mutually overlapping object regions among the object regions extracted by the object region extraction unit.
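Merging overlapping object regions can be sketched by repeatedly replacing any overlapping pair of boxes with their common bounding box until no pair overlaps, so that chains of overlaps collapse into one region. Boxes are assumed to be `(x0, y0, x1, y1)` in half-open pixel coordinates; the repeated pairwise scan is chosen for clarity rather than efficiency, and the function names are hypothetical.

```python
Box = tuple[int, int, int, int]  # (x0, y0, x1, y1), half-open pixel coordinates


def boxes_overlap(a: Box, b: Box) -> bool:
    """True if the two half-open boxes share at least one pixel."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def bounding_union(a: Box, b: Box) -> Box:
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def merge_overlapping(boxes: list[Box]) -> list[Box]:
    merged = list(boxes)
    changed = True
    while changed:
        changed = False
        for i in range(len(merged)):
            for j in range(i + 1, len(merged)):
                if boxes_overlap(merged[i], merged[j]):
                    # Replace the pair by its bounding box and rescan from scratch,
                    # because the grown box may now overlap boxes already checked.
                    merged[i] = bounding_union(merged[i], merged.pop(j))
                    changed = True
                    break
            if changed:
                break
    return merged
```

For example, `merge_overlapping([(0, 0, 2, 2), (3, 0, 5, 2), (1, 0, 4, 1)])` returns `[(0, 0, 5, 2)]`: the first two boxes do not overlap each other, but both overlap the third.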

The entry region extraction unit merges two or more object regions whose mutual distance is equal to or less than a threshold among the object regions extracted by the object region extraction unit.
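A distance-based merge can be sketched by repeatedly replacing any pair of boxes that lie within a threshold distance of each other with their common bounding box. The patent text does not define the distance measure, so this sketch assumes the Euclidean length of the gap between box edges (0 when boxes touch or overlap); `box_gap`, `merge_nearby` and `max_gap` are hypothetical names, and boxes are half-open `(x0, y0, x1, y1)`.

```python
import math

Box = tuple[int, int, int, int]  # (x0, y0, x1, y1), half-open pixel coordinates


def box_gap(a: Box, b: Box) -> float:
    """Euclidean length of the gap between two boxes (0 if they touch or overlap)."""
    dx = max(0, b[0] - a[2], a[0] - b[2])
    dy = max(0, b[1] - a[3], a[1] - b[3])
    return math.hypot(dx, dy)


def merge_nearby(boxes: list[Box], max_gap: float) -> list[Box]:
    merged = list(boxes)
    changed = True
    while changed:
        changed = False
        for i in range(len(merged)):
            for j in range(i + 1, len(merged)):
                if box_gap(merged[i], merged[j]) <= max_gap:
                    a, b = merged[i], merged.pop(j)
                    merged[i] = (min(a[0], b[0]), min(a[1], b[1]),
                                 max(a[2], b[2]), max(a[3], b[3]))
                    # Rescan: the grown box may now be close to boxes already checked.
                    changed = True
                    break
            if changed:
                break
    return merged
```

This joins, for example, separately detected words of one handwritten entry when the spacing between them is at most `max_gap` pixels.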

The entry region extraction unit excludes, from the candidates for the entry region, any object region whose area is equal to or smaller than a threshold among the object regions extracted by the object region extraction unit.
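Removing small object regions, such as isolated specks of noise, amounts to an area filter. The sketch assumes half-open `(x0, y0, x1, y1)` boxes and hypothetical function names; because regions whose area is equal to or smaller than the threshold are excluded, only strictly larger regions are kept.

```python
Box = tuple[int, int, int, int]  # (x0, y0, x1, y1), half-open pixel coordinates


def area(box: Box) -> int:
    return (box[2] - box[0]) * (box[3] - box[1])


def drop_small_regions(boxes: list[Box], area_threshold: int) -> list[Box]:
    """Exclude regions whose area is equal to or smaller than the threshold."""
    return [b for b in boxes if area(b) > area_threshold]
```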

The object region extraction unit recognizes at least characters and marks as objects.

The entry region extraction device further includes an entry region synthesis unit that, when the feature region extraction unit has extracted feature regions from the images of a plurality of document samples, synthesizes two or more entry regions that overlap one another, the two or more entry regions having been extracted by the entry region extraction unit from the remaining regions, other than the feature regions, of the images of different document samples.
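Synthesis across document samples can be sketched with a union-find structure, assuming all samples have been aligned to the same pixel coordinate frame and that entry regions are half-open `(x0, y0, x1, y1)` boxes. Two regions are linked only if they overlap and come from different samples; each linked group is replaced by its common bounding box. Returning the indices of the contributing samples is an illustrative extra, and `synthesize_across_samples` is a hypothetical name.

```python
Box = tuple[int, int, int, int]  # (x0, y0, x1, y1), half-open pixel coordinates


def synthesize_across_samples(per_sample: list[list[Box]]) -> list[tuple[Box, list[int]]]:
    """Merge overlapping entry regions from different samples into one region each."""
    items = [(s, b) for s, boxes in enumerate(per_sample) for b in boxes]
    parent = list(range(len(items)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, (si, a) in enumerate(items):
        for j in range(i + 1, len(items)):
            sj, b = items[j]
            overlap = a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]
            if si != sj and overlap:
                parent[find(i)] = find(j)

    # Collect each group's bounding box and the samples that contributed to it.
    groups: dict[int, tuple[Box, set[int]]] = {}
    for i, (s, b) in enumerate(items):
        r = find(i)
        if r in groups:
            g, samples = groups[r]
            union = (min(g[0], b[0]), min(g[1], b[1]), max(g[2], b[2]), max(g[3], b[3]))
            groups[r] = (union, samples | {s})
        else:
            groups[r] = (b, {s})
    return [(box, sorted(samples)) for box, samples in groups.values()]
```

Because grouping is transitive, a chain of overlapping regions from different samples yields a single entry region that covers where the field was filled in across all of those samples.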

An entry region extraction program according to an aspect of the present invention causes a computer to execute:

a feature region extraction process of extracting a feature region, which is a region indicating a feature corresponding to the type of a document sample, from an image of at least one document sample, using a learning model obtained by learning features corresponding to the types of documents from images of a plurality of documents, each document including entry fields determined according to its type and having content entered individually in those entry fields; and

an entry region extraction process of extracting an entry region, which is an entry field region, from the remaining region of the image of the document sample, excluding the feature region extracted by the feature region extraction process.

Effects of the invention

In the present invention, the entry field region is extracted from the remaining region of the image, that is, the region other than the feature region extracted using the learning model, without dividing the image of the document sample into fine small regions. The influence of variations and noise can therefore be suppressed, and the entry field region can be extracted from a document sample with high accuracy.

Drawings

Fig. 1 is a block diagram showing the configuration of an entry region extraction device according to embodiment 1.

Fig. 2 is a flowchart showing the operation of the entry region extraction device according to embodiment 1.

Fig. 3 is a diagram illustrating a flow of processing performed by the entry region extraction device according to embodiment 1.

Fig. 4 is a diagram illustrating an example of the binarization process performed by the entry region extraction device according to embodiment 1.

Fig. 5 is a diagram illustrating an example of the feature region extraction processing performed by the entry region extraction device according to embodiment 1.

Fig. 6 is a diagram illustrating an example of the object region extraction process performed by the entry region extraction device according to embodiment 1.

Fig. 7 is a diagram illustrating an example of entry region extraction processing performed by the entry region extraction device according to embodiment 1.

Fig. 8 is a diagram showing an example of entry region merging processing performed by the entry region extraction device according to embodiment 1.

Fig. 9 is a diagram illustrating an example of entry region synthesis processing performed by the entry region extraction device according to embodiment 1.

Detailed Description

Hereinafter, embodiments of the present invention will be described with reference to the drawings. In the drawings, the same or corresponding portions are denoted by the same reference numerals. In the description of the embodiments, the description of the same or corresponding portions is omitted or simplified as appropriate. The present invention is not limited to the embodiments described below, and various modifications can be made as necessary. For example, the embodiments described below may be partially implemented.
