Image processing system, image processing method, and storage medium


Abstract: This technology, "Image processing system, image processing method, and storage medium", was designed and created by 世渡秀和 (application dated 2020-11-25). The invention relates to an image processing system, an image processing method, and a storage medium. In order to reduce the possibility of misjudgment of character string tampering in image data, the present invention includes: a generation unit configured to generate a learning model by performing machine learning processing based on a tampered character image, a character image before tampering, and an image representing a difference between the tampered character image and the character image before tampering; an input unit configured to input image data; and an estimation unit configured to estimate whether or not the image data input by the input unit includes a tampered character by using the learning model generated by the generation unit.

1. An image processing system comprising:

generating means for generating a learning model by performing machine learning processing based on a falsified image, an image before falsification, and an image representing a difference between the falsified image and the image before falsification;

input means for inputting image data; and

estimating means for estimating whether or not the image data input by the input means includes a falsified image by using the learning model generated by the generating means.

2. The image processing system according to claim 1, further comprising display means, wherein in a case where the estimation means estimates that the image data input by the input means includes the falsified image, the display means emphasizes the falsified portion in the falsified image.

3. The image processing system according to claim 2, wherein the display means displays an image represented by the image data input by the input means together with the image in which the tampered portion is emphasized.

4. The image processing system of claim 2, further comprising:

receiving means;

first determining means for determining that a specified pixel is a tampered pixel in a case where the receiving means receives, from a user, a designation of the pixel in the tampered image displayed by the display means; and

second determining means for determining that a specified pixel is not a falsified pixel in a case where the receiving means receives, from a user, a designation of the pixel in the falsified image displayed by the display means.

5. The image processing system according to claim 2, wherein in a case where the estimation means estimates that the image data input by the input means does not include the falsified image, the display means displays information indicating that falsification is not detected.

6. The image processing system according to claim 1, wherein the learning means generates a learning model by performing machine learning processing based on a falsified image represented by the image data input by the input means, an image before falsification represented by the image data input by the input means, and an image representing a difference between the falsified image and the image before falsification.

7. The image processing system according to claim 6, further comprising third determination means for determining, based on specified identification information, a learning model to be updated by the machine learning process based on a falsified image represented by the image data input by the input means and an image before falsification represented by the image data input by the input means.

8. The image processing system according to claim 7, further comprising storage means for storing the learning model and the identification information in an associated manner,

wherein the third determination means determines the learning model corresponding to the specified identification information as the learning model to be updated by the machine learning process based on the image before falsification input by the input means.

9. The image processing system of claim 1, further comprising:

determination means for determining whether an image represented by the image data input by the input means is a falsified image or an image before falsification; and

storage means for storing, in association with each other, image data determined by the determination means to represent a falsified image and image data determined by the determination means to represent an image before falsification.

10. The image processing system according to claim 9, wherein the storage means stores, in association with each other and based on an identifier included in the image data input by the input means, image data determined by the determination means to represent a falsified image and image data determined by the determination means to represent an image before falsification.

11. The image processing system of claim 10, wherein the identifier is a two-dimensional code.

12. The image processing system according to claim 1,

wherein the image processing system includes at least an image processing apparatus, a learning apparatus, and an estimation apparatus,

wherein the image processing apparatus includes the input means,

wherein the learning apparatus includes learning means, and

wherein the estimation apparatus includes the estimating means.

13. The image processing system according to claim 12,

wherein the image processing apparatus further includes transmitting means for transmitting the image data input by the input means to the learning apparatus,

wherein the learning apparatus further includes receiving means for receiving the image data transmitted from the image processing apparatus via the transmitting means, and

wherein the learning means performs the machine learning process based on the image data received by the receiving means.

14. The image processing system according to claim 12,

wherein the image processing system includes at least the image processing apparatus, the learning apparatus, the estimation apparatus, and a character recognition apparatus,

wherein the character recognition apparatus includes character recognition means for performing character recognition processing, and

wherein the estimating means of the estimation apparatus estimates whether or not the image data input by the input means includes a falsified image, based on the image data input by the image processing apparatus, the learning model generated by the learning apparatus, and a result of the character recognition processing output by the character recognition apparatus.

15. An image processing method comprising:

generating a learning model by performing machine learning processing based on a falsified image, an image before falsification, and an image representing a difference between the falsified image and the image before falsification;

inputting image data; and

estimating whether or not the input image data includes a falsified image by using the generated learning model.

16. A non-transitory computer-readable storage medium storing a program that, when executed by a computer, causes the computer to perform an image processing method comprising:

generating a learning model by performing machine learning processing based on a falsified image, an image before falsification, and an image representing a difference between the falsified image and the image before falsification;

inputting image data; and

estimating whether or not the input image data includes a falsified image by using the generated learning model.

Technical Field

The invention relates to an image processing system, an image processing method and a storage medium.

Background

Conventional techniques to detect tampering (alteration) in image data are known.

Japanese Patent Laid-Open No. 2009-200794 discusses an image processing apparatus for determining tampering. The image processing apparatus divides image data into a plurality of groups, each including pixels whose luminance values are close to one another, and performs character recognition processing for each group. The image processing apparatus determines that the image has been tampered with if the result of performing the character recognition processing for each group and the result of performing the character recognition processing without grouping the image data differ from each other.

Disclosure of Invention

The technique discussed in Japanese Patent Laid-Open No. 2009-200794 judges the presence or absence of tampering based only on luminance. Thus, even if a character has not been falsified, the technique may erroneously judge it to be falsified because of a change in the brightness of the character caused by ink blurring or writing pressure.

The present invention has been devised in view of the above problems, and aims to reduce the possibility of erroneously judging that a character string in image data has been falsified.

An image processing system according to the present invention includes: generating means for generating a learning model by performing machine learning processing based on a falsified image, an image before falsification, and an image representing a difference between the falsified image and the image before falsification; input means for inputting image data; and estimating means for estimating whether or not the image data input by the input means includes a falsified image by using the learning model generated by the generating means.

An image processing method according to the present invention includes: generating a learning model by performing machine learning processing based on a falsified image, an image before falsification, and an image representing a difference between the falsified image and the image before falsification; inputting image data; and estimating whether or not the input image data includes a falsified image by using the generated learning model.

A non-transitory computer-readable storage medium according to the present invention stores a program that, when executed by a computer, causes the computer to perform an image processing method including: generating a learning model by performing machine learning processing based on a falsified image, an image before falsification, and an image representing a difference between the falsified image and the image before falsification; inputting image data; and estimating whether or not the input image data includes a falsified image by using the generated learning model.

Further features of the invention will become apparent from the following description of exemplary embodiments with reference to the attached drawings.

Drawings

Fig. 1 is a block diagram showing an example of an image processing system according to an exemplary embodiment.

Fig. 2A is a block diagram showing an example of the structure of an image processing apparatus. Fig. 2B is a block diagram showing an example of the structure of the learning apparatus. Fig. 2C is a block diagram showing an example of the structure of the tamper detection server.

Fig. 3A is a sequence diagram showing an example of the overall processing flow performed by the image processing system in the learning phase according to the exemplary embodiment. Fig. 3B is a sequence diagram showing an example of the overall processing flow performed by the image processing system in the tamper detection stage according to the exemplary embodiment.

Fig. 4 is a schematic diagram illustrating an example of a blank learning original.

Fig. 5 is a schematic diagram showing an example of a Graphical User Interface (GUI) for receiving an instruction to read an original.

Fig. 6A is a schematic diagram showing an example of an original learning image. Fig. 6B is a schematic diagram showing an example of a falsified learning image. Fig. 6C is a schematic diagram showing an example of learning data generated from an original learning image and a falsified learning image.

Fig. 7A is a schematic diagram showing an example of a processing target image. Fig. 7B is a schematic diagram showing an example of a bitmap as a result of tamper detection. Fig. 7C is a schematic diagram showing an example of an emphasized image in which a tampered portion is emphasized.

Fig. 8 is a flowchart showing an example of a specific processing flow performed by the image processing apparatus at the stage of tamper detection.

Fig. 9 is a flowchart showing an example of a specific processing flow performed by the tamper detection server in the tamper detection phase.

Fig. 10 is a schematic diagram showing an example of a GUI for setting and specifying tamper detection.

Fig. 11A is a schematic diagram showing an example of a GUI for displaying a list of tamper detection results. Fig. 11B is a schematic diagram showing an example of a GUI for displaying details of a tampering detection result.

Fig. 12 is a schematic diagram showing an example of a GUI for enabling a user to correct a tampering detection result.

Fig. 13A is a schematic diagram showing a first example of a GUI in which an emphasized image and a comparative image are arranged in a contrasting manner. Fig. 13B is a schematic diagram showing a second example of a GUI in which an emphasized image and a comparative image are arranged in a contrasting manner.

Fig. 14A is an example of a comparison image in the first comparison mode. Fig. 14B is an example of a comparison image in the second comparison mode. Fig. 14C is an example of a comparison image in the third comparison mode. Fig. 14D is an example of a comparison image in the fourth comparison mode.

Fig. 15 is a schematic diagram showing another example of a GUI for enabling a user to correct a tampering detection result.

Fig. 16 is a flowchart showing an example of a specific display control processing flow performed by the image processing apparatus in a case where falsification is detected in the target document.

Fig. 17 is a schematic diagram showing an example of a GUI in which an emphasized image and a comparative image are arranged in a contrasting manner according to a modification.

Detailed Description

Exemplary embodiments will be described in detail below with reference to the accompanying drawings. The following exemplary embodiments do not limit the invention as set forth in the appended claims. Although a number of features are described in the exemplary embodiments, not all of them are essential to the invention, and the features may be combined in any way. In the drawings, the same reference numerals are assigned to identical or similar structures, and duplicate descriptions of these structures are omitted.

<1. System overview>

Fig. 1 is a schematic diagram showing an example of the structure of an image processing system 100 according to an exemplary embodiment. The image processing system 100 includes an image processing device 101, a learning device 102, a tampering detection server 103, and an Optical Character Recognition (OCR) server 104. The image processing device 101, the learning device 102, the falsification detection server 103, and the OCR server 104 are connected to each other via a network 105.

The image processing apparatus 101 may be, for example, a multifunction peripheral (MFP) having printing and image reading functions, or a digital scanner dedicated to image reading. The image processing apparatus 101 includes a reading unit 111 and a display control unit 112. The reading unit 111 reads the original 11 to generate a read image. More specifically, the reading unit 111 acquires a read image of the original 11. The original 11 typically includes a character string, and thus the read image includes a character image.

For example, the image processing apparatus 101 may support generation of learning data in the learning phase. More specifically, an operator handwrites characters in a prepared blank learning original, and sets the handwritten learning original on the image processing apparatus 101. The learning original may be, for example, a table document having writing fields at one or more predetermined positions. The learning original may carry visual identification information (e.g., a printed number, a barcode, or a two-dimensional code) for uniquely identifying each individual learning original. The image processing apparatus 101 is also capable of printing a blank learning original. The reading unit 111 reads the set learning original to generate a read image 12. The read image 12 is treated as an image of the original version of the learning original. In this specification, the read image 12 is also referred to as an original learning image. The operator or another person then tampers with the written learning original (e.g., by adding pen strokes to it), and sets the tampered learning original on the image processing apparatus 101. The reading unit 111 reads the tampered learning original to generate a read image 13. In this specification, the read image 13 is also referred to as a falsified learning image. A plurality of pairs of the original learning image 12 and the falsified learning image 13 are generated by repeating the following sequence: handwriting characters in a learning original; reading the original version with the image processing apparatus 101; intentionally tampering with the learning original; and reading the tampered version with the image processing apparatus 101. The image processing apparatus 101 transmits these pairs of the original learning image 12 and the falsified learning image 13 to the learning apparatus 102 via the network 105. The learning apparatus 102 performs machine learning by using learning data generated from these pairs, as described below. Notwithstanding the above description, the original learning image 12 and the falsified learning image 13 may also be generated by an apparatus other than the image processing apparatus 101.

In the falsification detection stage, the image processing apparatus 101 reads a target original including handwritten characters to generate a read image 21. In this specification, the read image 21 is also referred to as a processing target image. The image processing apparatus 101 transmits the generated processing target image 21 to the falsification detection server 103 via the network 105. The display control unit 112 of the image processing apparatus 101 receives the detection result data 32 from the falsification detection server 103. The detection result data 32 indicates the result of tamper detection performed using the processing target image 21. The display control unit 112 then controls on-screen display of the falsification detection result based on the detection result data 32. Various examples of this display control will be described below.

The learning apparatus 102 may be an information processing apparatus, such as a computer or a workstation, that performs supervised learning processing. The learning apparatus 102 includes a data processing unit 121, a learning unit 122, and a storage unit 123. The data processing unit 121 accumulates, in the storage unit 123, the above-described pairs of the original learning image 12 and the falsified learning image 13 generated by the image processing apparatus 101 (or another apparatus). The data processing unit 121 generates learning data based on the accumulated pairs. The learning unit 122 generates and/or updates a learned model (learning model) 41 for tamper detection by machine learning processing using learning data generated from learning images (for example, pairs of the original learning image 12 and the falsified learning image 13), which are read images of learning originals. The learning unit 122 stores the generated and/or updated learned model 41 in the storage unit 123. For example, if a neural network model is used as the machine learning model, the learned model 41 is a data set including parameters such as weights and biases for each node of the neural network. Deep learning based on a multi-layer neural network may be used as a machine learning technique for generating and/or updating the neural network model. Some examples of the generation of learning data and the generation and/or updating of the learned model will be described in detail below. The learning unit 122 provides the learned model 41 to the tamper detection server 103 in response to a request from the tamper detection server 103, as described below.
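To make the above concrete, the following is a minimal sketch of what the learned model 41 could look like as a fully convolutional network that outputs one tamper score per input pixel. The patent fixes neither an architecture nor a framework, so the layer configuration and the use of PyTorch here are illustrative assumptions only.

```python
# Minimal sketch of a per-pixel tamper classifier as a fully convolutional
# network (FCN). The architecture is an assumption; the patent only states
# that the learned model 41 is a set of weights and biases.
import torch
import torch.nn as nn

class TamperFCN(nn.Module):
    def __init__(self):
        super().__init__()
        # Stacked 3x3 convolutions with padding keep the spatial size,
        # so the output can be read as one tamper score per input pixel.
        self.body = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 1, 3, padding=1),
        )

    def forward(self, x):
        # x: (batch, 1, H, W) grayscale character image scaled to [0, 1].
        # Returns per-pixel logits; sigmoid(logit) > 0.5 marks a tampered pixel.
        return self.body(x)

model = TamperFCN()
scores = torch.sigmoid(model(torch.rand(1, 1, 64, 64)))  # shape (1, 1, 64, 64)
```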

The tamper detection server 103 may be an information processing apparatus such as a computer or a workstation. The falsification detection server 103 detects a falsified portion included in the target original by using the processing target image 21 received from the image processing apparatus 101. The falsification detection server 103 includes an image acquisition unit 131 and a detection unit 132. The image acquisition unit 131 acquires the processing target image 21, which is a read image of the target original. The detection unit 132 detects a tampered portion included in the target original using the processing target image 21. In the present exemplary embodiment, the detection unit 132 performs tamper detection using the above-described learned model 41 supplied from the learning apparatus 102. More specifically, the detection unit 132 uses the learned model 41 to estimate whether each of a plurality of pixels in the processing target image 21 belongs to a falsified portion (i.e., per-pixel falsification detection). A modification in which the detection unit 132 instead determines whether each of one or more characters in the processing target image 21 includes a falsified portion (i.e., per-character falsification detection) will be described below. As a result of the falsification detection, the detection unit 132 generates the detection result data 32, which indicates which portion of the processing target image 21 has been determined to be falsified, and supplies the generated detection result data 32 to the image processing apparatus 101. The detection result data 32 may include, for example, bitmap data indicating whether or not each pixel of the processing target image 21 belongs to a tampered portion. The image processing apparatus 101 presents the falsification detection result indicated by the detection result data 32 to the user, and the user verifies the result. To support this verification, the image processing apparatus 101 or the falsification detection server 103 generates an emphasized image in which the pixels of the processing target image 21 determined to belong to a falsified portion are emphasized. In the case where the falsification detection server 103 generates the emphasized image, the falsification detection server 103 transmits the generated emphasized image to the image processing apparatus 101 together with the above-described bitmap data.

In the present exemplary embodiment, the processing target image 21 is applied to the learned model 41 in units of characters. Thus, the falsification detection server 103 transmits the processing target image 21 to the OCR server 104 to request the OCR server 104 to recognize the characters included in the processing target image 21. The OCR server 104 may be an information processing apparatus such as a computer or a workstation. The OCR server 104 performs Optical Character Recognition (OCR) in response to a request from the tamper detection server 103. The OCR server 104 includes a character recognition unit 141. The character recognition unit 141 performs OCR on the processing target image 21 using a known technique, thereby recognizing the characters in the processing target image 21 and the positions of their character areas. The character recognition unit 141 transmits the recognition result data 31 indicating the recognition result to the falsification detection server 103.
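The patent leaves the OCR technique open ("a known technique"). As one possibility, the character recognition unit 141 could recover characters together with their character-area positions as sketched below; the Tesseract engine via the pytesseract package and the file name are assumptions, and the recognition result data 31 would carry equivalent text-plus-position information in practice.

```python
# Hedged sketch of character recognition with bounding boxes, assuming
# Tesseract via pytesseract; any OCR engine returning text plus character
# area positions would serve the same role.
from PIL import Image
import pytesseract

image = Image.open("processing_target_image.png")  # placeholder file name
data = pytesseract.image_to_data(image, output_type=pytesseract.Output.DICT)

regions = []
for i, text in enumerate(data["text"]):
    if text.strip():  # skip empty detections
        box = (data["left"][i], data["top"][i],
               data["width"][i], data["height"][i])  # (left, top, width, height)
        regions.append((text, box))  # recognized text and its area position
```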

<2. Apparatus structure>

Fig. 2A is a block diagram showing an example of the structure of the image processing apparatus 101. Fig. 2B is a block diagram showing an example of the structure of the learning device 102. Fig. 2C is a block diagram showing an example of the structure of the tamper detection server 103.

(1) Image processing apparatus

The image processing apparatus 101 includes a Central Processing Unit (CPU) 201, a Read Only Memory (ROM) 202, a Random Access Memory (RAM) 204, a printer device 205, a scanner device 206, a conveyance device 207, a storage device 208, an input device 209, a display device 210, and an external interface (I/F) 211. The data bus 203 is a communication line for interconnecting these devices included in the image processing apparatus 101.

The CPU 201 is configured to control the entire image processing apparatus 101. The CPU 201 executes a boot program stored in the ROM 202, which is a nonvolatile memory, to start an Operating System (OS) of the image processing apparatus 101. The CPU 201 executes the controller program stored in the storage device 208 under the OS. The controller program is a program for controlling the respective devices of the image processing apparatus 101. The RAM 204 serves as a main memory device of the CPU 201. The RAM 204 provides the CPU 201 with a temporary storage area (i.e., a work area).

The printer device 205 is configured to print an image on paper (also referred to as a recording material or a sheet). The printer device 205 may employ an electrophotographic method using a photosensitive drum or a photosensitive belt, an inkjet method that discharges ink from a micro-nozzle array to print an image directly on a sheet, or any other printing method. The scanner device 206, which includes an optical reading device such as a Charge Coupled Device (CCD) for optically scanning an original, converts an electric signal supplied from the optical reading device into image data of a read image. The conveying device 207, which may be an Automatic Document Feeder (ADF), conveys originals placed on the ADF one by one to the scanner device 206. The scanner device 206 is capable of reading not only an original conveyed from the conveying device 207 but also an original set on an original positioning plate (not shown) of the image processing apparatus 101.

The storage device 208 may be a readable/writable secondary storage device including a nonvolatile memory such as a Hard Disk Drive (HDD) or a Solid State Drive (SSD). The storage device 208 stores various types of data including the above-described controller program, setting data, and image data. The input device 209, such as a touch panel or hardware keys, receives user input such as operation instructions or information input from the user. The input device 209 transmits an input signal indicating the content of the received user input to the CPU 201. The display device 210, such as a Liquid Crystal Display (LCD) or a Cathode Ray Tube (CRT), displays images (e.g., user interface images) generated by the CPU 201 on a screen. For example, the CPU 201 can determine what operation the user has performed based on the pointing position indicated by an input signal received from the input device 209 and the layout of the user interface displayed by the display device 210. According to the determination result, the CPU 201 controls the operation of the corresponding device or changes the content displayed by the display device 210.

The external interface (I/F) 211 transmits and receives various types of data, including image data, to and from external devices via the network 105. The network 105 may be, for example, a Local Area Network (LAN), a telephone line, a short-range wireless (e.g., infrared) network, or any other type of network. The external I/F 211 can receive Page Description Language (PDL) data describing rendering content used for printing from external devices such as the learning apparatus 102 and a Personal Computer (PC) (not shown). The CPU 201 interprets the PDL data received by the external I/F 211 to generate image data. The image data may be sent to the printer device 205 for printing, or may be sent to the storage device 208 for storage. The external I/F 211 may transmit the image data of the read image acquired by the scanner device 206 to the falsification detection server 103 for falsification detection, and receive the detection result data 32 from the falsification detection server 103.

(2) Learning apparatus

The learning apparatus 102 includes a CPU 231, a ROM 232, a RAM 234, a storage device 235, an input device 236, a display device 237, an external I/F 238, and a Graphics Processing Unit (GPU) 239. The data bus 233 is a communication line for interconnecting these devices included in the learning apparatus 102.

The CPU 231 is configured to control the entire learning apparatus 102. The CPU 231 executes a boot program stored in the ROM 232, which is a nonvolatile memory, to start the OS of the learning apparatus 102. The CPU 231 executes, on the OS, the learning data generation program and the learning program stored in the storage device 235. The learning data generation program is a program for generating learning data based on pairs of the original learning image 12 and the falsified learning image 13. The learning program is a program for generating and/or updating a learned model (e.g., a neural network model) for tamper detection through machine learning. The RAM 234 serves as a main memory device of the CPU 231 and provides the CPU 231 with a temporary storage area (i.e., a work area).

The storage device 235 may be a readable/writable secondary storage device including a nonvolatile memory such as an HDD or an SSD. The storage device 235 stores various types of data including the above-described programs, learning images, learning data, and model data. The input device 236, such as a mouse or a keyboard, receives user input such as operation instructions and information input from the user. The display device 237, such as an LCD or a CRT, displays images generated by the CPU 231 on a screen. The external I/F 238 transmits and receives data related to the learning process to and from external devices via the network 105. For example, the external I/F 238 can receive pairs of the original learning image 12 and the falsified learning image 13 from the image processing apparatus 101. The external I/F 238 may send the learned model 41 generated and/or updated by machine learning to the tamper detection server 103. The GPU 239, a processor capable of highly parallel processing, accelerates the learning processing for generating and/or updating the learned model 41 in cooperation with the CPU 231.

(3) Tamper detection server

The tamper detection server 103 includes a CPU 261, a ROM 262, a RAM 264, a storage device 265, an input device 266, a display device 267, and an external I/F 268. The data bus 263 is a communication line for interconnecting these devices included in the tamper detection server 103.

The CPU 261 is a controller for controlling the entire tamper detection server 103. The CPU 261 executes a boot program stored in the ROM 262, which is a nonvolatile memory, to start the OS of the tamper detection server 103. The CPU 261 executes, on the OS, the tamper detection program stored in the storage device 265. The falsification detection program is a program for detecting a falsified portion included in a target original by using a read image of the target original (i.e., a processing target image) acquired from a client apparatus (e.g., the image processing apparatus 101). The RAM 264 serves as a main memory device of the CPU 261 and provides the CPU 261 with a temporary storage area (i.e., a work area).

The storage device 265 may be a readable/writable secondary storage device including a nonvolatile memory such as an HDD or an SSD. The storage device 265 stores various types of data such as the above-described program, image data, and the detection result data 32. The input device 266, such as a mouse or a keyboard, receives user input such as operation instructions and information input from the user. The display device 267, such as an LCD or a CRT, displays images generated by the CPU 261 on a screen. The external I/F 268 transmits and receives data related to tamper detection to and from external devices via the network 105. For example, the external I/F 268 may receive the processing target image 21 from the image processing apparatus 101 and transmit the detection result data 32 to the image processing apparatus 101. The external I/F 268 may send a request to the learning apparatus 102 to provide the learned model 41 and receive the learned model 41 from the learning apparatus 102. The external I/F 268 may transmit a request for performing OCR to the OCR server 104 and receive the recognition result data 31 indicating the OCR result from the OCR server 104.

Although not shown in fig. 2C, the structure of the OCR server 104 may also be similar to that of the tamper detection server 103.

<3. Processing flow>

Fig. 3A is a sequence diagram showing an example of a schematic processing flow in the learning phase performed by the image processing system 100. Fig. 3B is a sequence diagram showing an example of a schematic processing flow in the tamper detection phase by the image processing system 100.

<3-1. Learning stage>

In step S301, in the learning stage, the operator sets a learning original filled with handwritten characters on the image processing apparatus 101 and instructs the image processing apparatus 101 to read the original. In this case, the operator inputs information indicating that the set learning original is an untampered original to the image processing apparatus 101 via the input device 209. In step S302, according to the operator's instruction, the reading unit 111 of the image processing apparatus 101 reads the set learning original to generate a read image 12, and attaches to it a flag indicating that the read image 12 is an original learning image. In step S303, the operator sets the falsified learning original on the image processing apparatus 101 and instructs the image processing apparatus 101 to read the original. In this case, the operator inputs information indicating that the set learning original includes a tampered portion to the image processing apparatus 101 via the input device 209. In step S304, according to the operator's instruction, the reading unit 111 reads the set learning original to generate a read image 13, and attaches to it a flag indicating that the read image 13 is a falsified learning image. In steps S302 and S304, the reading unit 111 reads the identification information included in the learning original to recognize that the original learning image 12 and the falsified learning image 13 are a pair consisting of the original version and the falsified version of the same learning original. The reading unit 111 associates a document Identifier (ID) for identifying the recognized learning original with the original learning image 12 and the falsified learning image 13. The reading unit 111 further associates with them a data set ID that identifies the unit in which a learning model is generated and/or updated. As an example, in the case where one learned model is generated and/or updated for each image processing system, the data set ID may be an identifier that uniquely identifies the image processing apparatus 101. As another example, where one learned model is generated and/or updated for each user, the data set ID may be an identifier that uniquely identifies each user. As yet another example, where one learned model is generated and/or updated for each user group, the data set ID may be an identifier that uniquely identifies each user group. In step S305, the reading unit 111 transmits the learning images and the related data thus generated (e.g., the original learning image 12, the falsified learning image 13, the flags, the document ID, and the data set ID) to the learning apparatus 102.
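For illustration, one record of the data transmitted in step S305 could be organized as follows. The patent names the constituent items (the read image, the original/tampered flag, the document ID, and the data set ID) but prescribes no wire format, so every field name in this sketch is hypothetical.

```python
# Hypothetical shape of one learning record sent to the learning apparatus
# in step S305; all key names are illustrative, not taken from the patent.
with open("read_image_13.png", "rb") as f:  # placeholder file name
    image_bytes = f.read()

learning_record = {
    "image": image_bytes,            # the read image (original or falsified)
    "is_tampered": True,             # flag attached by the reading unit 111
    "document_id": "7f9c2ba4-0000",  # placeholder document ID from the 2D code
    "dataset_id": "device-101",      # unit of learned-model generation/update
}
```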

Fig. 4 is a schematic diagram illustrating an example of a blank learning original. In the example shown in Fig. 4, the learning original 401 is a table-form document including eight character writing fields 402. The operator writes an arbitrary character in each character writing field 402 to produce the original version of the learning original. The learning original 401 has identification information 403 located in its upper-right corner. The identification information 403 (also referred to as embedded information) is a two-dimensional code that visually represents the document ID uniquely identifying the learning original 401. The positions and sizes of the character writing fields in the learning original identified by a document ID are defined in advance and shared with the learning apparatus 102. The document ID may be, for example, a Universally Unique Identifier (UUID).
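As a sketch of how such a learning original could be prepared, a UUID document ID can be generated and rendered as a two-dimensional code to be printed as the identification information 403. The Python uuid and qrcode packages used below are assumptions; the patent only requires some visual encoding of the document ID.

```python
# Hedged sketch: generate a document ID and its two-dimensional code.
import uuid
import qrcode

document_id = str(uuid.uuid4())           # e.g. "7f9c2ba4-..." (UUID)
code = qrcode.make(document_id)           # PIL image of the 2D code
code.save("learning_original_403.png")    # to be printed in the upper-right corner
```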

Fig. 5 is a schematic diagram illustrating an example of a Graphical User Interface (GUI) used by the image processing apparatus 101 to receive an instruction for reading an original. The operation window 500 shown in Fig. 5 may be displayed, for example, on the screen of the display device 210. The operation window 500 includes a preview area 501, type selection buttons 502 and 503, a scan button 505, and a start transmission button 506. The type selection buttons 502 and 503 enable the operator to specify the type (also referred to as an attribute) of the original to be read or of the already read original: an original version or a tampered version. In the case where the user operates (e.g., taps) the type selection button 502, the reading unit 111 attaches, to the read image, a flag indicating that the read image is a falsified learning image. In contrast, in the case where the user operates the type selection button 503, the reading unit 111 attaches, to the read image, a flag indicating that the read image is an original learning image. In the operation window 500, the button corresponding to the specified type (original or tampered version) may be displayed in an emphasized manner. The scan button 505 is used to trigger reading of the original set on the image processing apparatus 101. When the user operates the scan button 505 and scanning is completed, a preview of the read image is displayed in the preview area 501. Before starting data transmission, the operator may set another original on the image processing apparatus 101 and operate the scan button 505 again, so that a plurality of read images and their related data are accumulated in the image processing apparatus 101. When the scanning of at least one learning original is completed and the original type has been specified, the start transmission button 506 becomes operable. The start transmission button 506 is used to trigger the transmission of the read images and related data. When the user operates the start transmission button 506, the reading unit 111 transmits the learning images and the related data to the learning apparatus 102 (see step S305 shown in Fig. 3A).

In step S305 shown in Fig. 3A, the data processing unit 121 of the learning apparatus 102 receives the learning images and the related data from the image processing apparatus 101. In step S306, the data processing unit 121 stores the received learning images and related data in the storage unit 123. When an amount of data sufficient for learning has been accumulated, the data processing unit 121 and the learning unit 122 start the machine learning processing. In step S307, the data processing unit 121 reads pairs of the original learning image 12 and the falsified learning image 13 from the storage unit 123 and generates learning data from these read images.

Figs. 6A, 6B, and 6C illustrate how learning data is generated based on learning images. Figs. 6A and 6B show an example of the original learning image 12 and an example of the falsified learning image 13, respectively. As described above, the original learning image 12 and the falsified learning image 13 forming a pair share the same character writing fields. In the example shown in Fig. 6A, the number "1" has been written in the character writing field 402a of the original learning image 12. In contrast, the content of the character writing field 402b of the falsified learning image 13 has been falsified: because of an added stroke, the character in the character writing field 402b looks like the number "4". The data processing unit 121 cuts out partial images of the character writing fields from the original learning image 12 and the falsified learning image 13, and calculates a difference image between the partial images cut out from the same character writing field. The data processing unit 121 can obtain the positions of the character writing fields to be cut out based on the document ID acquired by reading the identification information 403. In the difference image, a pixel whose value is zero or whose absolute value is equal to or smaller than a predetermined threshold is assumed not to belong to a falsified portion, and a pixel whose absolute value is larger than the threshold is assumed to belong to a falsified portion. The character region image 611 shown in Fig. 6C is a partial image cut out from the character writing field 402b of the falsified learning image 13. The character region image 611 serves as an input learning image that the learning unit 122 feeds to the machine learning model. The binary image 612 shown in Fig. 6C is generated by the following process: cutting out a partial image from the character writing field 402a of the original learning image 12; subtracting the partial image (after alignment as needed) from the character region image 611; and binarizing the resulting image based on the threshold. In the binary image 612, a pixel indicating true (e.g., a white pixel) belongs to the tampered portion, and a pixel indicating false (e.g., a black pixel) does not. The binary image 612 serves as a teacher image, which the learning unit 122 treats as teacher data for machine learning.
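The derivation of the teacher image 612 described above reduces to cropping the shared writing field, subtracting, and thresholding. A minimal Python sketch follows, assuming numpy and Pillow; the field coordinates and the threshold value are illustrative, since in the actual system they come from the document format and tuning.

```python
# Sketch of deriving a teacher image (cf. binary image 612) from a pair of
# learning images; coordinates and threshold are illustrative assumptions.
import numpy as np
from PIL import Image

def make_teacher_image(original_path, tampered_path, box, threshold=32):
    # box = (left, upper, right, lower) of the character writing field,
    # known in advance from the document ID.
    orig = np.asarray(Image.open(original_path).convert("L").crop(box), dtype=np.int16)
    tamp = np.asarray(Image.open(tampered_path).convert("L").crop(box), dtype=np.int16)
    diff = np.abs(tamp - orig)   # difference image
    return diff > threshold      # True (white) = pixel belongs to a tampered portion

teacher = make_teacher_image("original_12.png", "tampered_13.png",
                             box=(100, 200, 164, 264))  # placeholder coordinates
```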

The learning images may include an original learning image 12 that has no corresponding falsified learning image 13. In this case, the data processing unit 121 may generate, as the input learning image, a partial image cut out from the character writing field 402a of the original learning image 12, and generate, as the teacher image, a binary image of the same size in which all pixels indicate false (i.e., indicating that the entire image includes no tampered portion).

The data processing unit 121 generates a plurality of the above-described input images and corresponding teacher images based on a plurality of pairs of the original learning image 12 and the falsified learning image 13 associated with the same data set ID. In step S308, the learning unit 122 repeatedly performs learning processing using these input images and teacher images within the scope of the same data set ID, thereby generating and/or updating the learned model 41 for tamper detection. The learned model 41 is not limited to a particular model, and may be, for example, a Fully Convolutional Network (FCN) model. One iteration of the learning process may include the following steps: inputting an input image to the model; calculating the error between the output data calculated by the model (with its provisional parameter values) and the teacher data; and adjusting the parameter values so as to reduce the error. For example, cross entropy may be used as the error indicator, and back-propagation may be used as the technique for adjusting the parameter values. The learning unit 122 may repeat the learning process until it determines that the learning has converged, or until the number of repetitions reaches an upper limit. The learning unit 122 then stores the generated and/or updated learned model 41 (the set of model parameters constituting the learned model 41) in the storage unit 123 in association with the corresponding data set ID. The learning unit 122 may generate and/or update different learned models 41 for two or more different data set IDs. The learning unit 122 may update a previously generated and/or updated learned model 41 by additional learning processing using newly acquired learning images. The learning unit 122 may select the learning data to be input to the learning process by an online learning method, a batch learning method, or a mini-batch learning method.
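A single iteration of this learning process (forward pass, cross-entropy error against the teacher image, parameter adjustment by back-propagation) might look as follows. This PyTorch sketch reuses the hypothetical TamperFCN from the earlier sketch; the optimizer choice and learning rate are assumptions, and batching and convergence checks are omitted.

```python
# Sketch of one learning iteration in step S308, under the assumptions above.
import torch

model = TamperFCN()  # hypothetical FCN from the earlier sketch
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = torch.nn.BCEWithLogitsLoss()  # binary cross entropy on per-pixel logits

def learning_step(input_image, teacher_image):
    # input_image:   (1, 1, H, W) character region image (cf. image 611)
    # teacher_image: (1, 1, H, W) float tensor, 1.0 = tampered (cf. image 612)
    optimizer.zero_grad()
    logits = model(input_image)
    loss = loss_fn(logits, teacher_image)  # error against the teacher data
    loss.backward()                        # back-propagation
    optimizer.step()                       # adjust weights and biases
    return loss.item()
```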

<3-2. Tamper detection stage>

(1) Schematic process flow

In the falsification detection stage, in step S351, the user sets a target original on the image processing apparatus 101 and instructs the image processing apparatus 101 to read the original. The user may be the same person as, or a different person from, the operator involved in the learning stage. In step S352, the reading unit 111 of the image processing apparatus 101 reads the set target original according to the user's instruction to generate a read image 21. In step S353, the user instructs the image processing apparatus 101 to perform falsification detection on the target original. In step S354, according to the falsification detection instruction, the reading unit 111 attaches to the read image 21 a flag indicating that the read image 21 is a processing target image, and acquires setting data related to falsification detection (for example, from a memory). The acquired setting data may include, for example, a data set ID for identifying the learning model to be used for tamper detection (e.g., an identifier identifying the image processing apparatus 101, a user, or a user group). In step S355, the reading unit 111 transmits the processing target image 21 and the related data to the falsification detection server 103 together with a falsification detection request.

In step S355, the image acquisition unit 131 of the falsification detection server 103 receives, from the image processing apparatus 101, the processing target image 21 (the read image of the target original), the related data, and the falsification detection request. The image acquisition unit 131 outputs the received image and data to the detection unit 132. In step S356, the detection unit 132 requests the learning apparatus 102 to provide the latest learned model 41. The model request sent to the learning apparatus 102 can include a data set ID. Upon receiving the model request, in step S357, the learning unit 122 of the learning apparatus 102 reads the latest learned model 41, identified for example by the data set ID, from the storage unit 123 and sends it to the detection unit 132. In step S358, the detection unit 132 transmits the processing target image 21 to the OCR server 104 to request the OCR server 104 to recognize the characters included in the processing target image 21. After receiving the OCR request, in step S359, the character recognition unit 141 of the OCR server 104 performs OCR on the processing target image 21 to recognize the characters and the character area positions in the processing target image 21. In step S360, the character recognition unit 141 sends the recognition result data 31 indicating the recognition result to the detection unit 132. In step S361, the detection unit 132 applies the processing target image 21 to the learned model 41 supplied from the learning apparatus 102, thereby detecting a tampered portion included in the target original. As described above, the learned model 41 is a model generated and/or updated by machine learning using learning images, which are read images of learning originals. In the falsification detection process, for example, the detection unit 132 applies a partial image of the processing target image 21 to the learned model 41 for each character region recognized as a result of the OCR. Thus, for the character region of each character recognized in the processing target image 21, a falsification detection result is generated as bitmap data indicating whether or not each pixel belongs to a falsified portion. In step S362, the detection unit 132 transmits the detection result data 32 to the image processing apparatus 101. The detection result data 32 includes integrated bitmap data that is generated by integrating the bitmap data obtained for the individual character regions, has the same size as the processing target image 21, and indicates whether or not each pixel belongs to a tampered portion. In the following description, this integrated bitmap data is referred to as a detection result image. The detection unit 132 may additionally generate, using the processing target image 21, an emphasized image that emphasizes the pixels determined to belong to a tampered portion (hereinafter referred to as tampered pixels), and include the generated emphasized image in the detection result data 32.

Fig. 7A shows a processing target image 21a as an example. Fig. 7B shows a detection result image 32a indicating the falsification detection result for the processing target image 21a. Fig. 7C shows an emphasized image 32b emphasizing the tampered portion of the processing target image 21a. The processing target image 21a is an image generated by reading a contract as the target original. The processing target image 21a includes a plurality of character regions. The character in the character region 701a looks like the number "4". These character regions may be recognized as a result of OCR by the character recognition unit 141 of the OCR server 104. The detection unit 132 of the falsification detection server 103 cuts out a character region image for each character region from the processing target image 21a and applies the learned model 41 to each character region image. This enables the detection unit 132 to determine whether each pixel in each character region belongs to a tampered portion. The detection result image 32a is a binary image into which the falsification detection results for the entire image are integrated. In the detection result image 32a, pixels determined to belong to the falsified portion indicate true (for example, black pixels), and pixels determined not to belong to the falsified portion indicate false (for example, white pixels). For example, the pixels of some of the strokes constituting the number "4" in the character region 701a indicate true. This implies the possibility that these strokes were added later by tampering. The emphasized image 32b is an image generated from the processing target image 21a by changing, to a specific color, the color of the pixels that indicate true in the detection result image 32a. The specific color is not limited, and may be, for example, red (RGB = [255, 0, 0]). In the example shown in Fig. 7C, the color of some strokes of the numeral "4" in the character region 701a within the emphasized image 32b is changed. The technique for emphasizing the pixels judged to belong to the tampered portion is not limited to changing their color; any method may be used, such as thickening the lines or making them blink.
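Producing the emphasized image 32b from the detection result image 32a is a per-pixel recoloring, sketched below. The use of numpy and Pillow and the file names are assumptions; the sketch follows the convention above that black pixels in the detection result image indicate true (tampered).

```python
# Sketch of generating an emphasized image: recolor tampered pixels red.
import numpy as np
from PIL import Image

target = np.asarray(Image.open("processing_target_21a.png").convert("RGB")).copy()
result = np.asarray(Image.open("detection_result_32a.png").convert("L"))
tampered = result < 128            # black pixels indicate true (tampered)

target[tampered] = (255, 0, 0)     # RGB = [255, 0, 0], the example color above
Image.fromarray(target).save("emphasized_32b.png")
```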

In step S362 shown in Fig. 3B, the display control unit 112 of the image processing apparatus 101 receives the above-described detection result data 32 from the falsification detection server 103. In step S363, the display control unit 112 displays the falsification detection result on the screen based on the detection result data 32. In the present exemplary embodiment, in a case where the detection result data 32 indicates that a falsified portion has been detected in the target original, the display control unit 112 displays on the screen, in a contrasting manner, an emphasized image and a comparison image of an area including the falsified portion. Here, the emphasized image is an image emphasizing the tampered portion in the processing target image 21, and the comparison image is an image showing the tampered portion as it originally appears in the processing target image 21. This enables the user to grasp, from the emphasized image, which part of the image is suspected of having been tampered with, and to judge whether the image has actually been tampered with by examining the color tone and shading of the relevant part in the comparison image. Examples of such contrasting displays and some variations of the comparison image will be described below.

(2) Specific processing flow in the tamper detection stage (image processing apparatus)

Fig. 8 is a flowchart showing an example of a specific processing flow performed by the image processing apparatus 101 at the stage of tamper detection. The processing shown in fig. 8 is performed by the image processing apparatus 101 under the control of the CPU 201 that executes the controller program loaded from the storage device 208 into the RAM 204. The process may be started when a predetermined operation by the user is detected via the input device 209 of the image processing apparatus 101.

In step S801, the reading unit 111 reads the target original set on the conveying device 207 by using the scanner device 206 to generate a processing target image. The processing target image may be, for example, a full-color (three-channel RGB) image. In step S802, the reading unit 111 receives a tamper detection instruction input by the user via the input device 209. In step S803, the reading unit 111 transmits the processing target image and the related data (e.g., the data set ID) together with a tamper detection request to the tamper detection server 103 via the external I/F 211. In step S804, the display control unit 112 waits for the detection result data 32 from the tamper detection server 103. Upon receiving the detection result data 32 from the falsification detection server 103 via the external I/F 211 (YES in step S804), the processing proceeds to step S805. In step S805, the display control unit 112 determines whether the detection result data 32 indicates that the target original includes a falsified portion. If the display control unit 112 determines that the target original includes a tampered portion (YES in step S805), the processing proceeds to step S806. In contrast, if the display control unit 112 determines that the target original does not include a falsified portion (NO in step S805), the processing proceeds to step S810. In step S806, the display control unit 112 determines, based on user input, whether to display the emphasized image and the comparison image in a contrasting manner as the falsification detection result, or to display only the emphasized image. If it is determined that the emphasized image and the comparison image are to be displayed in a contrasting manner (YES in step S806), the processing proceeds to step S807; if only the emphasized image is to be displayed (NO in step S806), the processing proceeds to step S808. In step S807, the display control unit 112 displays on the screen of the display device 210, in a contrasting manner, an emphasized image that emphasizes the tampered portion in the processing target image and a comparison image that shows the tampered portion as it originally appears in the processing target image. Examples of this contrasting display will be described below with reference to Figs. 13A, 13B, and 15. In step S808, the display control unit 112 displays on the screen of the display device 210 an emphasized image that emphasizes the tampered portion in the processing target image. Examples will be described below with reference to Figs. 11B and 12. In step S810, the display control unit 112 displays on the screen of the display device 210 information indicating that no tampering has been detected in the processing target image. In addition to displaying the falsification detection result, the image processing apparatus 101 may store the processing target image, the emphasized image, and the comparison image as image data in the storage device 208, transmit these images to other apparatuses via the external I/F 211, or print one or more of them with the printer device 205.

(3) Specific processing flow of tamper detection stage (tamper detection server)

Fig. 9 is a flowchart showing an example of a specific processing flow performed by the tamper detection server 103 at the tamper detection stage. The processing shown in fig. 9 is performed by the tamper detection server 103 under the control of the CPU 261 executing the controller program loaded from the storage device 265 into the RAM 264. The process may be started when a tamper detection request is received from the image processing apparatus 101 via the external I/F 268. The process of waiting for a tamper detection request may be started when the power of the tamper detection server 103 is turned on.

In step S901, the image acquisition unit 131 receives a processing target image, related data (e.g., a data set ID), and a tamper detection request from the image processing apparatus 101 via the external I/F 268. In step S902, the detection unit 132 transmits a request for providing a learned model to the learning apparatus 102 via the external I/F 268, and acquires the learned model from the learning apparatus 102. The detection unit 132 acquires, for example, the learned model identified by the data set ID received together with the tamper detection request. The detection unit 132 builds a neural network model on, for example, the RAM 264, and reflects the values of the model parameters received from the learning device 102 in the built model. In step S903, the detection unit 132 transmits a request for OCR on the processing target image to the OCR server 104 via the external I/F 268 together with the processing target image, and receives recognition result data representing the OCR result from the OCR server 104. In step S904, the detection unit 132 cuts out, from the processing target image, a character region image of one of the characters recognized in the processing target image, and applies the cut-out character region image to the learned model acquired in step S902. The detection unit 132 thus determines whether each of a plurality of pixels within the character region image belongs to a tampered portion. The character region image may be grayscaled before being applied to the learned model. The result of this determination is bitmap data similar to the binary image 612 shown in fig. 6C. In step S905, the detection unit 132 determines whether any unprocessed character region remains in the processing target image. If an unprocessed character region remains in the processing target image (yes in step S905), the processing returns to step S904, and the detection unit 132 repeats the determination of step S904 for the next character region. In contrast, if no unprocessed character region remains in the processing target image (no in step S905), the processing proceeds to step S906. In step S906, the detection unit 132 integrates the detection results acquired for the respective character regions through the repetition of step S904 into one piece of bitmap data, thereby generating a detection result image. In step S907, the detection unit 132 generates an emphasized image for emphasizing the tampered portion in the processing target image. In step S908, the detection unit 132 transmits the detection result data 32 including the detection result image and the emphasized image to the image processing apparatus 101 via the external I/F 268.

In the above example, the detection unit 132 cuts out the character region images from the processing target image based on the OCR result. However, it is not necessary to use OCR. For example, if the target document is a table having a known format, the detection unit 132 may cut out, as a character region image, an image of a partial area (for example, a square area) at a predetermined position in the processing target image according to the known format.
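The per-region determination and integration of steps S904 to S906 can be sketched as follows. This is a minimal illustration under stated assumptions, not the embodiment's implementation: the learned model is assumed to be a callable mapping a grayscale character region image to a same-sized binary map, and the character regions may come either from the OCR result or from a known document format, as described above.

```python
# Minimal sketch of steps S904 to S906, under two assumptions: `model` is a
# callable mapping a grayscale character-region image to a same-sized map
# of 0/1 values (1 = pixel determined to belong to a tampered portion), and
# `boxes` is a list of (x, y, width, height) character regions taken from
# the OCR result or from a known document format.
import numpy as np

def detect_tampering(target_image: np.ndarray, boxes, model) -> np.ndarray:
    height, width = target_image.shape[:2]
    detection_result = np.zeros((height, width), dtype=np.uint8)
    # Grayscale the image once before the per-region inference (cf. S904).
    gray = target_image.mean(axis=2).astype(np.uint8)
    for (x, y, w, h) in boxes:                       # S904/S905: each region
        region = gray[y:y + h, x:x + w]
        detection_result[y:y + h, x:x + w] = model(region)
    return detection_result                          # S906: one integrated bitmap
```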

<4. Details of display control>

Fig. 10 is a schematic diagram showing an example of a GUI for setting and specifying tamper detection. The GUI may be displayed on the screen of the display device 210 under the control of the display control unit 112 of the image processing apparatus 101. The setting window 1000 shown in fig. 10 includes a preview area 1001, an emphasize tampering button 1002, a change file name button 1003, a folder assignment button 1004, a scan button 1005, and a start transmission button 1006. The emphasize tampering button 1002 is used to set whether to enable the emphasis of a tampered portion when displaying the tampering detection result. The change file name button 1003 is used to set whether to enable automatic change of the file name of the read image data based on the falsification detection result. The folder assignment button 1004 is used to set whether to enable automatic folder assignment for the read image data based on the falsification detection result. In the setting window 1000, a button corresponding to an enabled setting may be displayed in an emphasized manner. The scan button 1005 is used to trigger reading of an original set on the image processing apparatus 101. When the user operates the scan button 1005 and the scanning is completed, a preview image of the read image is displayed in the preview area 1001. Before starting data transmission, the user may set another original on the image processing apparatus 101 and operate the scan button 1005 again to instruct the image processing apparatus 101 to accumulate a plurality of read images and related data in the image processing apparatus 101. When the scanning of at least one target original is completed, the start transmission button 1006 becomes operable. The start transmission button 1006 is used to input a tamper detection instruction for triggering transmission of the processing target image and the related data. When the user operates the start transmission button 1006, the reading unit 111 transmits the processing target image and the related data to the falsification detection server 103 together with the falsification detection request.

Fig. 11A and 11B are schematic diagrams showing an example of a GUI for displaying a list of tamper detection results and details of a tamper detection result. The GUI may be displayed on the screen of the display device 210 under the control of the display control unit 112. The list window 1100 shown in fig. 11A includes a list area 1101, a change file name button 1103, a folder assignment button 1104, and an OK button 1105. The list area 1101 displays a list of three list items 1102a, 1102b, and 1102c corresponding to the falsification detection results of three different target documents. Each list item includes a data item "original ID" for identifying the target original (or the corresponding read image), a "date and time" indicating when tampering detection was performed, and "presence or absence of tampering" indicating whether tampering was detected in the original. At the right end of each list item, a preview of the read image is displayed. In the example shown in fig. 11A, as a result of falsification detection, the two target documents having the document IDs "Scan 1" and "Scan 3" are determined to include falsified portions. In contrast, the target document having the document ID "Scan 2" is determined not to include a tampered portion. The display control unit 112 may determine a target document to be "with falsification" in a case where one or more pixels in the detection result image included in the detection result data 32 are determined to belong to a falsified portion. When the user operates (e.g., taps) the list item 1102a, the display control unit 112 displays, on the screen, a detailed window 1150 (fig. 11B) for the target document identified by the document ID "Scan 1". The change file name button 1103 triggers execution of automatic change of the file names of the read image data based on the falsification detection results. If the user operates the change file name button 1103, the display control unit 112 automatically changes the file name of the read image data corresponding to each target document whose check box has been selected. For example, in the case of read image data "with falsification", a predetermined prefix or suffix (e.g., "with falsification") indicating that falsification exists may be appended to the file name of the read image data. In the case of read image data that is "tamper-free", a predetermined prefix or suffix (e.g., "tamper-free") indicating that there is no tampering may be appended to the file name. The folder assignment button 1104 triggers execution of automatic folder assignment of the read image data based on the falsification detection results. If the user operates the folder assignment button 1104, the display control unit 112 automatically assigns the read image data corresponding to each target document whose check box has been selected to one of a plurality of folders. For example, the read image data "with falsification" may be stored in a first folder (e.g., folder name "with falsification"), and the "tamper-free" read image data may be stored in a second folder (e.g., folder name "tamper-free"). When the user operates the OK button 1105, the display control unit 112 ends the display of the list window 1100.
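The automatic file-name change and folder assignment triggered by the buttons 1103 and 1104 can be sketched as follows. The suffix and folder-name strings simply mirror the examples in the text; the directory layout is an assumption made for illustration only.

```python
# Illustrative sketch of the automatic file-name change and folder
# assignment based on the falsification detection result. The suffix and
# folder-name strings follow the examples in the text; the directory
# layout under `root` is a hypothetical assumption.
from pathlib import Path

def file_detection_result(image_path: Path, falsified: bool, root: Path) -> Path:
    label = "with_falsification" if falsified else "tamper_free"
    folder = root / label                       # automatic folder assignment
    folder.mkdir(parents=True, exist_ok=True)
    # Automatic file-name change: append a suffix indicating the result.
    new_path = folder / f"{image_path.stem}_{label}{image_path.suffix}"
    image_path.rename(new_path)                 # move and rename in one step
    return new_path
```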

The detailed window 1150 shown in fig. 11B includes an image display region 1151, a correction button 1153, a contrast display button 1154, a send button 1155, a print button 1156, and an OK button 1157. The image display region 1151 is a region for displaying an emphasized image of the target document designated in the list window 1100. In the example shown in fig. 11B, the emphasized image 32b is displayed in the image display region 1151. When the user operates the correction button 1153, the display control unit 112 displays, on the screen, a correction window 1200 (described below) for enabling the user to correct the falsification detection result. When the user operates the contrast display button 1154, the display control unit 112 displays, on the screen, a contrast display window 1300 or 1350 (described below) in which the emphasized image and the comparison image are arranged in a contrasting manner. The contrast display button 1154 may become operable only for a target document determined to include a falsified portion as a result of the falsification detection. When the user operates the send button 1155, the display control unit 112 transmits either one or both of the processing target image and the emphasized image of the designated target document to another apparatus. The image may be transmitted by any method, such as attaching the image to an e-mail, or sending an e-mail or other message containing a link to the image on a file server. The transmission destination may be a destination registered in the image processing apparatus 101, or may be designated by the user via a destination pop-up window (not shown). When the user operates the print button 1156, the display control unit 112 instructs the printer device 205 to print either one or both of the processing target image and the emphasized image of the designated target document. When the user operates the OK button 1157, the display control unit 112 ends the display of the detailed window 1150, and redisplays the list window 1100 on the screen.

Fig. 12 is a schematic diagram showing an example of a GUI for enabling the user to correct a tampering detection result. The GUI may be displayed on the screen of the display device 210 under the control of the display control unit 112. The correction window 1200 shown in fig. 12 includes an image display area 1201, a designate tampered pixel button 1203, a cancel tampered pixel button 1204, and an OK button 1205. The image display area 1201 is an area for displaying the emphasized image of the designated target document. In the example shown in fig. 12, the emphasized image 32b is displayed in the image display area 1201. When the user operates the designate tampered pixel button 1203, the display control unit 112 changes a pixel designated by the user in the image display area 1201 to a pixel belonging to a tampered portion in the tamper detection result. When the user operates the cancel tampered pixel button 1204, the display control unit 112 changes a pixel designated by the user in the image display area 1201 to a pixel that does not belong to a tampered portion in the tamper detection result. When the user operates the OK button 1205, the display control unit 112 reflects the user's correction of the falsification detection result in the detection result data 32, and ends the display of the correction window 1200. When the user corrects the falsification detection result in this manner, the automatic change of file names or the automatic folder assignment may be performed based on the corrected falsification detection result. These automatic functions are triggered by operating the button 1103 or 1104 in the list window 1100 shown in fig. 11A. This makes it possible to more accurately carry out the manual or systematic document processing that follows the verification of the presence or absence of tampering (e.g., according to a file name or a storage folder), based on the corrected tamper detection result.

When the user corrects the falsification detection result via the above-described correction window 1200, the learned model may be updated based on the corrected falsification detection result. More specifically, the display control unit 112 transmits the pair of the processing target image and the corrected detection result image to the learning device 102 together with a model update request. Upon receiving the model update request, the learning unit 122 of the learning device 102 may update the learned model by using, for each character region in the processing target image, the character region image as an input image and the character region image of the same character region in the detection result image as a teacher image. In this way, pixel patterns that the current learned model tends to detect erroneously are relearned, effectively improving the accuracy of falsification detection using the learned model.
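The pairing of input and teacher images for such relearning can be sketched as follows; the box list and the array layout are assumptions made for illustration.

```python
# Sketch of assembling relearning pairs from a corrected detection result,
# as sent with the model update request. `boxes` (character regions as
# (x, y, width, height)) and the numpy array layout are assumptions.
import numpy as np

def build_update_pairs(target_image: np.ndarray,
                       corrected_result: np.ndarray, boxes):
    pairs = []
    for (x, y, w, h) in boxes:
        input_image = target_image[y:y + h, x:x + w]        # from the read image
        teacher_image = corrected_result[y:y + h, x:x + w]  # corrected bitmap
        pairs.append((input_image, teacher_image))
    return pairs  # one (input, teacher) pair per character region
```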

Fig. 13A is a schematic diagram showing a first example of a GUI in which an emphasized image and a comparison image are arranged in a contrasting manner. The GUI may be displayed on the screen of the display device 210 under the control of the display control unit 112. In the first example, the GUI displays an emphasized image and a comparison image each representing the entire target document. The contrast display window 1300 shown in fig. 13A includes an emphasized image display region 1301, a comparison image display region 1302, a character-based display button 1303, a correction button 1305, and an OK button 1306. In the example shown in fig. 13A, the emphasized image 32b is displayed in the emphasized image display region 1301, and the comparison image 21a is displayed in the comparison image display region 1302. The emphasized image 32b is an image for emphasizing the pixels determined to belong to a falsified portion in the read image of the target document. The comparison image, which is identical to the processing target image 21a, represents the pixels determined to belong to the falsified portion as they are (without emphasis) in the read image. When the user operates the character-based display button 1303, the display control unit 112 changes the screen to a contrast display window 1350 (described below) in which emphasized images and comparison images cut out for each character region are arranged in a contrasting manner. When the user operates the correction button 1305, the display control unit 112 displays, on the screen, the above-described correction window 1200 that enables the user to correct the falsification detection result. When the user operates the OK button 1306, the display control unit 112 ends the contrast display.

Fig. 13B is a schematic diagram showing a second example of a GUI in which an emphasized image and a comparison image are arranged in a contrasting manner. The GUI may be displayed on the screen of the display device 210 under the control of the display control unit 112. In the second example, the GUI displays the emphasized image and the comparison image as character-based partial images (character region images). The contrast display window 1350 shown in fig. 13B includes a list area 1351, an entire display button 1353, a change comparison mode button 1354, and an OK button 1356. The list area 1351 displays the character regions of one or more characters recognized in the target document in a list form in a vertically scrollable manner. In the example shown in fig. 13B, two list items 1352a and 1352b are displayed in the list area 1351. Each list item may be uniquely identified by a combination of the original ID of the target original and the number ("No.") assigned to each character region. Each list item includes an emphasized image display region 1361 at the center and a comparison image display region 1362 at the right end. Each emphasized image display region 1361 displays a partial image of one character region of the emphasized image, including the emphasized falsified portion. In this case, the partial image is also referred to as an emphasized image. Each comparison image display region 1362 displays a partial image of one character region of the processing target image, including the falsified portion without emphasis. In this case, the partial image is also referred to as a comparison image. However, the content of the comparison image displayed in the comparison image display region 1362 may be changed according to a comparison mode (described below). When the user operates (e.g., taps) an emphasized image display region 1361 or a comparison image display region 1362, the display control unit 112 displays, on the screen, a correction window 1500 (described below) that enables the user to correct the falsification detection result for each character region. When the user operates the entire display button 1353, the display control unit 112 changes the screen to the contrast display window 1300 described above. When the user operates the change comparison mode button 1354, the display control unit 112 changes the comparison mode setting (described below) according to the user input. When the user operates the OK button 1356, the display control unit 112 ends the contrast display.

The display control unit 112 may support only a single comparison mode, or may dynamically switch the comparison mode used for the contrast display among a plurality of candidate comparison modes. In the case of a single comparison mode, the contrast display window 1350 need not include the change comparison mode button 1354. In the case of switching the comparison mode, the candidates may include, for example, two or more of the following modes:

Comparison mode C1: the comparison image is an image of a character region including a falsified portion that is not emphasized.

Comparison mode C2: the comparison image includes a character region including a falsified portion that is not emphasized, together with a peripheral region of the character region.

Comparison mode C3: the comparison image includes an image of another character region representing the same character as the character represented by the emphasized image.

Comparison mode C4: the comparison image is an image of a character region whose falsified portion is suppressed or not displayed.

The display control unit 112 may display a list of these comparison mode candidates on the screen to enable the user to specify a desired comparison mode. Alternatively, the comparison mode setting may be toggled (changed in sequence) in a predetermined order by each user operation of the change comparison mode button 1354. The display control unit 112 can change the content of the comparison image displayed in the comparison image display region 1362 within the contrast display window 1350 in accordance with the comparison mode designated by the user in this manner.
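The toggle behavior of the change comparison mode button 1354 amounts to cycling through the candidates in a fixed order, as in the following minimal sketch (the mode identifiers reuse the labels C1 to C4 above).

```python
# Minimal sketch of toggling the comparison mode setting in a predetermined
# order on each press of the change comparison mode button 1354.
COMPARISON_MODES = ["C1", "C2", "C3", "C4"]

def next_comparison_mode(current: str) -> str:
    i = COMPARISON_MODES.index(current)
    return COMPARISON_MODES[(i + 1) % len(COMPARISON_MODES)]  # wraps C4 -> C1
```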

Fig. 14A shows an example of a comparison image in the comparison mode C1. The comparison image 1401 shown in fig. 14A is an image cut out from the processing target image at the same position and with the same size as the character region of the character represented by the corresponding emphasized image. In the comparison image 1401, the tampered portion is not emphasized, but is represented as it is in the read image generated by reading the target document. The comparison mode C1 enables the user to grasp at a glance the correspondence between the emphasized image and the character portion in the comparison image.

Fig. 14B shows an example of a comparison image in the comparison mode C2. The comparison image 1402 shown in fig. 14B is an image cut out from the processing target image so as to include the peripheral region of the comparison image 1401. In the comparison image 1402, the falsified portion is not emphasized, but is represented as it is in the read image generated by reading the target document. The comparison mode C2 enables the user to evaluate elements such as color sense, shading, and stroke features based on the characters before and after the character suspected of being falsified, thereby verifying the falsification detection result.

Fig. 14C shows an example of comparison images in the comparison mode C3. In the comparison mode C3, for example, the comparison images 1401 and 1403 are displayed together in the comparison image display region 1362 within the contrast display window 1350. The comparison image 1403 is an image of another character region representing the same character as the character represented in the character region of the corresponding emphasized image. The display control unit 112 may determine whether two characters at different positions are the same, for example, based on the OCR result. In the comparison image 1403 as well, the tampered portion is not emphasized, but is represented as it is in the read image generated by reading the target document. The comparison mode C3 enables the user to refer to elements such as color sense, shading, and stroke features of the same character as the character suspected of being falsified, thereby verifying the falsification detection result.

Fig. 14D shows an example of a comparison image in the comparison mode C4. The comparison image 1404 shown in fig. 14D is an image cut out from the processing target image at the same position and with the same size as the character region of the character represented by the corresponding emphasized image. However, unlike the comparison image 1401, the tampered portion in the comparison image 1404 is suppressed or not displayed. The comparison mode C4 enables the user to view the contents of the original in a state without the strokes that are likely to have been added by falsification, and to verify the falsification detection result by focusing only on pixels determined not to belong to a falsified portion.

Fig. 15 is a schematic diagram showing another example of a GUI for enabling the user to correct a tampering detection result. The GUI may be displayed on the screen of the display device 210 under the control of the display control unit 112. The correction window 1500 shown in fig. 15 includes an emphasized image display region 1501, a comparison image display region 1502, a change comparison mode button 1503, a designate tampered pixel button 1504, a cancel tampered pixel button 1505, and an OK button 1506. The emphasized image display region 1501 is a region for displaying an emphasized image of a character region in the designated target document. In the example shown in fig. 15, an emphasized image 1511 is displayed in the emphasized image display region 1501. The comparison image display region 1502 is a region for displaying a comparison image of a character region in the designated target document. In the example shown in fig. 15, a comparison image 1512 is displayed in the comparison image display region 1502. In this example, the above-described comparison mode C1 is set. When the user operates the change comparison mode button 1503, the display control unit 112 switches the comparison mode setting among the two or more comparison mode candidates described above, and displays a comparison image corresponding to the newly set comparison mode in the comparison image display region 1502. When the user operates the designate tampered pixel button 1504, the display control unit 112 changes a pixel designated by the user in the emphasized image display region 1501 to a pixel belonging to a tampered portion in the tamper detection result. When the user operates the cancel tampered pixel button 1505, the display control unit 112 changes a pixel designated by the user in the emphasized image display region 1501 to a pixel that does not belong to a tampered portion in the tamper detection result. When the user operates the OK button 1506, the display control unit 112 reflects the user's correction of the falsification detection result in the detection result data 32, and ends the display of the correction window 1500.

Fig. 16 is a flowchart illustrating an example of a specific display control processing flow performed by the image processing apparatus 101 in a case where tampering of the target original is detected. The processing shown in fig. 16 is performed by the image processing apparatus 101 under the control of the CPU 201 executing the controller program loaded from the storage device 208 into the RAM 204. The process may correspond to step S807 shown in fig. 8.

In step S1601, the display control unit 112 determines which of the character-based contrast display and the entire contrast display is designated as the display mode of the contrast display. For example, when the user operates the contrast display button 1154 of the detailed window 1150 shown in fig. 11B or the entire display button 1353 of the contrast display window 1350 shown in fig. 13B, the display control unit 112 may determine that the entire contrast display is designated. In contrast, when the user operates the character-based display button 1303 of the contrast display window 1300 shown in fig. 13A, the display control unit 112 may determine that the character-based contrast display is designated. If the display control unit 112 determines that the character-based contrast display is designated (yes in step S1601), the processing proceeds to step S1602. In contrast, if the display control unit 112 determines that the entire contrast display is designated (no in step S1601), the processing proceeds to step S1620.

In step S1602, the display control unit 112 acquires character region data indicating the position and size of one or more character regions in the image. For example, the display control unit 112 may receive, as the character region data, the recognition result data 31 representing the result of character recognition by the OCR server 104, together with the detection result data 32 from the tampering detection server 103. Alternatively, if the target original is a table having a known format, the display control unit 112 may acquire, from the storage device 208, character region data including the predefined positions and sizes of the character regions included in the known format. The subsequent steps S1603 to S1612 are repeated for each character region that is indicated by the character region data and includes pixels determined, based on the detection result data 32, to belong to a tampered portion. Referring to the processing target image 21a and the detection result image 32a shown in fig. 7A, for example, the character region 701a and the two adjacent square character regions on its right side are character regions including pixels determined to belong to a tampered portion. Thus, steps S1603 to S1612 are repeated for these three character regions.

In step S1603, the display control unit 112 selects one of the character regions including a tampered portion. Hereinafter, the selected character region is referred to as the selection region. In step S1604, the display control unit 112 cuts out a partial image of the selection region from the emphasized image according to the position and size indicated in the character region data. In step S1605, the display control unit 112 determines whether the currently set comparison mode is the comparison mode C2. If the comparison mode C2 is currently set (yes in step S1605), the processing proceeds to step S1606. If the comparison mode C1, C3, or C4 is currently set (no in step S1605), the processing proceeds to step S1607. In step S1606, the display control unit 112 cuts out, from the read image, a partial image including the selection region and a peripheral region outside the selection region; this partial image is used as the comparison image in the comparison mode C2. As an example, the size of the partial image including the peripheral region may be W times the size of the selection region in the horizontal direction and H times the size of the selection region in the vertical direction (the magnifications W and H are preset values larger than 1, for example, W = 4 and H = 2). As another example, the peripheral region may be dynamically set as a region including N characters (N is a preset integer) in the vicinity of the selection region. In step S1607, the display control unit 112 cuts out a partial image of the selection region from the read image. In the case where the comparison mode C1 is currently set, the partial image cut out in step S1607 is used as the comparison image. If the comparison mode C3 is currently set (yes in step S1608), the processing proceeds to step S1609. In step S1609, the display control unit 112 cuts out, from the read image, a partial image of one or more other character regions representing the same character as the character represented in the selection region. Such another character region is desirably a region that does not include pixels determined to belong to a tampered portion. The display control unit 112 may request the OCR server 104 to perform OCR, and recognize another character region representing the same character as the character represented in the selection region based on the character recognition result returned from the OCR server 104. If no such other character region exists in the target document, step S1609 may be skipped. In the comparison mode C3, a combination of the partial images cut out in steps S1607 and S1609 (for example, an image including the two partial images arranged side by side) is used as the comparison image. If the comparison mode C4 is currently set (no in step S1608 and yes in step S1610), the processing proceeds to step S1611. In step S1611, the display control unit 112 suppresses the values of the pixels determined to belong to a tampered portion in the partial image cut out in step S1607. For example, the display control unit 112 may correct the pixel values to the background color (e.g., white) or a color close to the background color. In the comparison mode C4, the partial image processed in this manner is used as the comparison image.
In step S1612, the display control unit 112 determines whether there remains an unprocessed character region including pixels determined to belong to a tampered portion. If such an unprocessed character region remains (yes in step S1612), the processing returns to step S1603, and the display control unit 112 repeats the above steps S1603 to S1611 for the next character region. If no unprocessed character region remains (no in step S1612), the processing proceeds to step S1613.
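The per-mode cutting of steps S1604 to S1611 can be sketched as follows. This is a minimal illustration under assumed conventions: numpy arrays in (height, width, channels) layout, character regions as (x, y, width, height), and a background color parameter standing in for the suppression color of the comparison mode C4. The lookup of another same-character region for the comparison mode C3 is omitted here.

```python
# Minimal sketch of cutting a comparison image for the selection region
# according to the comparison mode (steps S1606, S1607, and S1611). Array
# layout, box format, and the background color are assumptions; the lookup
# of another same-character region for mode C3 is handled elsewhere.
import numpy as np

def make_comparison_image(read_image: np.ndarray, detection_result: np.ndarray,
                          box, mode: str, w_scale: int = 4, h_scale: int = 2,
                          background: int = 255) -> np.ndarray:
    x, y, w, h = box
    if mode == "C2":                        # S1606: include a peripheral region
        cx, cy = x + w // 2, y + h // 2
        half_w, half_h = w * w_scale // 2, h * h_scale // 2
        x0, y0 = max(cx - half_w, 0), max(cy - half_h, 0)
        return read_image[y0:cy + half_h, x0:cx + half_w].copy()
    part = read_image[y:y + h, x:x + w].copy()   # S1607: the selection region
    if mode == "C4":                        # S1611: suppress tampered pixels
        mask = detection_result[y:y + h, x:x + w].astype(bool)
        part[mask] = background             # overwrite with the background color
    return part                             # C1 (also the first image for C3)
```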

In step S1613, the display control unit 112 displays one or more pairs of emphasized images and comparison images, one pair for each character region, in a contrasting manner in the contrast display window. Such character-based contrast display enables the user to grasp, from the emphasized image, which part of each character is likely to have been tampered with, and to judge whether each character has actually been tampered with by checking the color sense or shading of the relevant part in the comparison image.

In step S1620, the display control unit 112 displays the emphasized image representing the entire target document and the comparison image (the read image) in a contrasting manner in the contrast display window. As described above, the present exemplary embodiment enables smooth switching between the entire contrast display representing the whole original and the above-described character-based contrast display. The user can thus switch between roughly grasping, in the entire display, where characters in the document are likely to have been falsified and verifying each individual character in the character-based display, and can therefore efficiently verify the falsification detection result for the target document.

In fig. 16, the display control unit 112 uses the emphasized image received from the falsification detection server 103. However, the display control unit 112 may instead generate the emphasized image from the processing target image (the read image) based on the falsification detection result.
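Generating the emphasized image locally amounts to overlaying the detection result bitmap on the read image. The following sketch paints detected pixels red; the emphasis color and array conventions are assumptions, since the embodiment leaves the emphasis style open.

```python
# Sketch of generating an emphasized image from the read image and the
# detection result bitmap. Painting tampered pixels red is an assumed
# emphasis style; the embodiment does not prescribe a particular one.
import numpy as np

def make_emphasized_image(read_image: np.ndarray,
                          detection_result: np.ndarray) -> np.ndarray:
    emphasized = read_image.copy()
    mask = detection_result.astype(bool)     # 1 = pixel in a tampered portion
    emphasized[mask] = (255, 0, 0)           # emphasize tampered pixels in red
    return emphasized
```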

<5. Modified examples>

The above description has focused on the technique of applying the character region image of each character region to the learned model to determine whether each pixel in the character region belongs to a tampered portion. However, the technique for tamper detection is not limited to this example. For example, as a first modification, tamper detection may be performed not in units of pixels but in units of characters. Character-based tamper detection may utilize, for example, a learned model generated and/or updated by using, as teacher data, a flag indicating whether each character image includes a tampered portion, instead of a teacher image (e.g., the binary image 612 shown in fig. 6C) indicating whether each pixel belongs to a tampered portion. The learned model may be any judgment-type model that outputs a binary flag indicating whether one character region image (the input image) includes a tampered portion. For example, a Visual Geometry Group (VGG) neural network model may be utilized.
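As an illustration of such a judgment-type model, the following sketch defines a small convolutional classifier in PyTorch that outputs one logit per character region image. The layer sizes and the 32x32 grayscale input are assumptions; the text only names VGG as one possible model family.

```python
# Illustrative judgment-type model for character-based tamper detection:
# a small CNN that outputs a single logit per character region image
# (sigmoid(logit) > 0.5 is read as "includes a tampered portion"). The
# layer sizes and the 32x32 grayscale input are assumptions; the text
# mentions VGG as one possible architecture.
import torch
import torch.nn as nn

class CharTamperClassifier(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.head = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32 * 8 * 8, 1),   # assumes 32x32 inputs -> 8x8 feature maps
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (N, 1, 32, 32)
        return self.head(self.features(x))               # raw logit per image
```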

Fig. 17 shows an example of a GUI in which an emphasized image and a comparison image are arranged in a contrasting manner based on the result of character-based tampering detection according to the first modification. The contrast display window 1700 shown in fig. 17 includes an emphasized image display region 1701, a comparison image display region 1702, a character-based display button 1703, a correction button 1305, and an OK button 1306. In the example shown in fig. 17, an emphasized image 1732 is displayed in the emphasized image display region 1701, and a comparison image 1721 is displayed in the comparison image display region 1702. The emphasized image 1732 is an image for emphasizing the characters in the character regions determined to include a tampered portion in the read image of the target document. The comparison image 1721 is identical to the processing target image 21a; it represents the characters in the character regions determined to include a tampered portion as they are (without emphasis) in the read image. When the user operates the character-based display button 1703, the display control unit 112 may change the screen to a contrast display window (not shown) in which emphasized images and comparison images cut out for each character region are arranged in a contrasting manner.

In a second modification, instead of using a judgment-type model, tampering detection may be performed by using an autoencoder-type model (for example, a variational autoencoder (VAE)) that encodes a character region image to extract a feature amount of the character. The learning process of a VAE includes an encoder process for calculating a dimension-reduced feature amount from an input image and a decoder process for restoring the input image from the calculated feature amount. The encoder has a neural network structure, and the decoder has the inverse structure of the encoder. The model error is evaluated as the difference (e.g., cross entropy) between the input image and the restored image, and the values of the model parameters are adjusted by, for example, backpropagation so that the error is reduced. In this case, falsified learning images and teacher data are not used. In the learning stage, the learning device 102 generates and/or updates a learned model for extracting feature amounts of untampered characters from character region images through learning processing that uses a plurality of original learning images. In general, it is not realistic to generate and/or update a single model from which appropriate feature amounts can be extracted for all characters. The learning device 102 can therefore learn different model parameter values for each character type. For example, the characters "1", "2", and "3" may be treated as different character types. In the falsification detection stage, the falsification detection server 103 applies each character region image to the encoder of the learned model corresponding to the character type recognized as a result of OCR, and extracts a feature amount from the character region image. The falsification detection server 103 also acquires a reference feature amount extracted in advance from a known untampered character region image of the same character type. The falsification detection server 103 may determine that the character region image does not include a tampered portion if the difference between the two feature amounts satisfies a predetermined condition (for example, if the Manhattan distance is equal to or smaller than a threshold value). In contrast, if the Manhattan distance between the feature amount extracted from the character region image and the reference feature amount exceeds the threshold, the falsification detection server 103 may determine that the character region image includes a tampered portion. According to the present modification, as in the first modification, the falsification detection server 103 determines whether each character recognized in the processing target image includes a tampered portion (i.e., character-based falsification detection). The contrast display of the emphasized image and the comparison image according to the present modification can be performed in a manner similar to the example described above with reference to fig. 17.
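The decision rule of this modification reduces to a distance comparison in feature space, as in the sketch below; the encoder is assumed to be available as a callable returning a one-dimensional feature vector.

```python
# Sketch of the second modification's decision rule: compare the feature
# amount of a character region image with a pre-extracted reference
# feature of the same character type using the Manhattan (L1) distance.
# `encoder` is assumed to be the encoder of the per-character-type learned
# model, returning a 1-D numpy vector.
import numpy as np

def includes_tampered_portion(region_image: np.ndarray, encoder,
                              reference_feature: np.ndarray,
                              threshold: float) -> bool:
    feature = encoder(region_image)
    distance = np.abs(feature - reference_feature).sum()  # Manhattan distance
    return distance > threshold   # exceeding the threshold suggests tampering
```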

The above-described exemplary embodiments mainly focus on examples in which the emphasized image and the comparison image are horizontally arranged in the contrast display window. However, the emphasized image and the comparison image may be arranged in any desired direction other than the horizontal direction, or may be displayed in different windows. As a third modification, the emphasized image and the comparison image may be displayed at different timings instead of being spatially arranged side by side. For example, display of the emphasized image for X seconds and display of the comparison image for X seconds may be alternately repeated in a single image display region. The term "contrast display" in the present specification covers all of these display modes.

<6. Summary>

Exemplary embodiments of the present invention have been described above in detail with reference to figs. 1 to 17. According to the above-described exemplary embodiments, a falsified portion included in a target original is detected by using a read image of the target original, and an emphasized image for emphasizing the falsified portion in the read image and a comparison image that represents the falsified portion as it is in the read image are displayed on the screen in a contrasting manner. When the user verifies the falsification detection result, this configuration makes it possible to clearly present to the user which portion of the image is likely to have been falsified, while also presenting, in the comparison image, the original color sense and shading of the relevant portion. The user can thus grasp the position of the tampered portion and easily verify the detection result based on the color sense or shading of that portion.

The above-described exemplary embodiments enable the emphasized image and the comparison image to be displayed in a contrasting manner according to a comparison mode selected by the user from two or more comparison modes different in content of the comparison image. The first comparison mode enables the user to immediately grasp the correspondence between the emphasized image and the portion constituted by the characters in the comparison image. The second comparison mode enables the user to evaluate elements such as color sense, shading, and stroke features based on characters before and after the character that is likely to be falsified, thereby verifying the falsification detection result. The third comparison mode enables the user to refer to elements such as color sensation, shading, and stroke features of the same character (in another character region) as the character that is likely to be falsified, thereby verifying the falsification detection result. The fourth comparison mode enables the user to view the contents of the original in a state where there is no stroke that is likely to be added by falsification, and verifies the falsification detection result by focusing only on pixels that are determined not to belong to a falsified portion. Allowing flexible switching between these comparison modes enables the user to efficiently perform operations for verifying the tampering detection result.

The above-described exemplary embodiments may provide a user interface that enables the user to correct the tampering detection result indicating which portion of the read image is determined to have been tampered with. This configuration enables the user to immediately correct a detection error upon finding it while monitoring the contrast display of the emphasized image and the comparison image. The tamper detection result, appropriately corrected by the user, can thus be smoothly handed over to the manual or systematic processing that follows the verification of the presence or absence of tampering.

Certain exemplary embodiments make it possible to determine whether each of a plurality of pixels in the read image belongs to a tampered portion, and to emphasize, in the emphasized image, the pixels determined to belong to the tampered portion. In this case, a detailed tamper detection result can be visually presented to the user: for example, in the case of tampering in which strokes are added to an existing character, only the added strokes are emphasized. A specific modification makes it possible to determine whether each of one or more character regions in the read image includes a tampered portion, and to emphasize, in the emphasized image, the characters in the character regions determined to include a tampered portion. In this case, since the computational load required for tamper detection is low, the tamper detection result can be presented to the user quickly even if the read image has a large amount of data.

OTHER EMBODIMENTS

Embodiments of the present invention can also be realized by a method in which software (programs) that performs the functions of the above-described embodiments is supplied to a system or an apparatus through a network or various storage media, and a computer, a central processing unit (CPU), or a micro processing unit (MPU) of the system or the apparatus reads out and executes the programs.

While the present invention has been described with reference to exemplary embodiments, it is to be understood that the invention is not limited to the disclosed exemplary embodiments. The scope of the following claims is to be accorded the broadest interpretation so as to encompass all such modifications and equivalent structures and functions.
