Automatic image correction using machine learning

Document No.: 1102610    Publication date: 2020-09-25

Reading note: This technology, "Automatic image correction using machine learning" (使用机器学习的自动图像校正), was created by Cristian Canton Ferrer, Brian Dolhansky, Thomas Ward Meyer, and Jonathan Morton on 2017-12-28. Its main content is summarized as follows: In one embodiment, a computing system may access a training image and a reference image of a person, together with an incomplete image. A generator may generate a repair image based on the incomplete image, and a discriminator may be used to determine whether each of the repair image, the training image, and the reference image is likely to have been generated by the generator. The system may calculate losses based on those determinations and update the discriminator accordingly. Using the updated discriminator, the system may determine whether a second repair image generated by the generator is likely to have been generated by the generator. The system may calculate a loss based on that determination and update the generator accordingly. Once training is complete, the generator may be used to generate a modified version of a given image, for example to make a person's eyes appear open even though the eyes are closed in the input image.

1. A method comprising, by a computing system:

accessing a training image of a person, a reference image of the person, and an incomplete image, wherein the incomplete image corresponds to the training image with a removed portion, wherein the training image, the reference image, and the incomplete image are associated with a training sample of a training dataset;

generating, using a generator, a repair image based on the incomplete image, wherein the repair image includes a repaired portion corresponding to the removed portion of the incomplete image;

determining, using a discriminator, whether each of the repair image, the training image, and the reference image is likely to be generated by the generator;

calculating a first loss based on the determination of whether the repair image is likely to be generated by the generator;

calculating a second loss based on the determination of whether the training image is likely to be generated by the generator;

calculating a third loss based on the determination of whether the reference image is likely to be generated by the generator;

updating the discriminator based on the first loss, the second loss, and the third loss;

determining, using an updated discriminator, whether a second repair image generated by the generator is likely to be generated by the generator, wherein the second repair image is associated with a second training image;

calculating a fourth loss based on the determination of whether the second repair image is likely to be generated by the generator; and

updating the generator based on the fourth loss;

wherein the updated generator is configured to receive an input image and generate a modified version of the input image, wherein the modified version of the input image comprises a portion repaired by the updated generator.

2. The method of claim 1, wherein prior to generating the repair image using the generator, the method further comprises:

calculating a reconstruction loss based on a third training image and an associated third repair image generated by the generator; and

updating the generator based on the reconstruction loss.

3. The method of claim 1, wherein the removed portion and the repaired portion are associated with a facial feature type.

4. The method of claim 3, wherein prior to generating the repair image using the generator, the method further comprises:

training an auto-encoder using training samples of images of the facial feature type, the auto-encoder comprising an encoder and a decoder;

accessing a third training image and an associated third repair image generated by the generator;

generating, using the encoder, a first encoded representation of a portion of the third training image associated with the facial feature type;

generating, using the encoder, a second encoded representation of a portion of the third repair image associated with the facial feature type;

calculating a perceptual loss based on the first encoded representation and the second encoded representation; and

updating the generator based on the perceptual loss.

5. The method of claim 4, wherein the training samples of images of the facial feature type are associated with a second training dataset different from the training dataset.

6. The method of claim 1, wherein the training image and the reference image depict the person's face from different perspectives.

7. The method of claim 1,

wherein the input image is an image of a second person with eyes closed or partially closed; and

wherein the modified version of the input image is an image of the second person with open eyes.

8. A system, comprising: one or more processors and one or more computer-readable non-transitory storage media coupled to the one or more processors, the one or more computer-readable non-transitory storage media comprising instructions that are operable when executed by the one or more processors to cause the system to perform operations comprising:

accessing a training image of a person, a reference image of the person, and an incomplete image, wherein the incomplete image corresponds to the training image with a removed portion, wherein the training image, the reference image, and the incomplete image are associated with a training sample of a training dataset;

generating, using a generator, a repair image based on the incomplete image, wherein the repair image includes a repaired portion corresponding to the removed portion of the incomplete image;

determining, using a discriminator, whether each of the repair image, the training image, and the reference image is likely to be generated by the generator;

calculating a first loss based on the determination of whether the repair image is likely to be generated by the generator;

calculating a second loss based on the determination of whether the training image is likely to be generated by the generator;

calculating a third loss based on the determination of whether the reference image is likely to be generated by the generator;

updating the discriminator based on the first loss, the second loss, and the third loss;

determining, using an updated discriminator, whether a second repair image generated by the generator is likely to be generated by the generator, wherein the second repair image is associated with a second training image;

calculating a fourth loss based on the determination of whether the second repair image is likely to be generated by the generator; and

updating the generator based on the fourth loss;

wherein the updated generator is configured to receive an input image and generate a modified version of the input image, wherein the modified version of the input image comprises a portion repaired by the updated generator.

9. The system of claim 8, wherein prior to generating the repair image using the generator, the one or more processors, when executing the instructions, are further operable to perform operations comprising:

calculating a reconstruction loss based on a third training image and an associated third repair image generated by the generator; and

updating the generator based on the reconstruction loss.

10. The system of claim 8, wherein the removed portion and the repaired portion are associated with a facial feature type.

11. The system of claim 10, wherein prior to generating the repair image using the generator, the one or more processors, when executing the instructions, are further operable to perform operations comprising:

training an auto-encoder using training samples of images of the facial feature type, the auto-encoder comprising an encoder and a decoder;

accessing a third training image and an associated third repair image generated by the generator;

generating, using the encoder, a first encoded representation of a portion of the third training image associated with the facial feature type;

generating, using the encoder, a second encoded representation of a portion of the third repair image associated with the facial feature type;

calculating a perceptual loss based on the first encoded representation and the second encoded representation; and

updating the generator based on the perceptual loss.

12. The system of claim 11, wherein the training samples of images of the facial feature type are associated with a second training dataset different from the training dataset.

13. The system of claim 8, wherein the training image and the reference image depict the person's face from different angles.

14. The system of claim 8,

wherein the input image is an image of a second person with eyes closed or partially closed; and

wherein the modified version of the input image is an image of the second person with open eyes.

15. One or more computer-readable non-transitory storage media embodying software that is operable when executed to cause one or more processors to perform operations comprising:

accessing a training image of a person, a reference image of the person, and an incomplete image, wherein the incomplete image corresponds to the training image with a removed portion, wherein the training image, the reference image, and the incomplete image are associated with a training sample of a training dataset;

generating, using a generator, a repair image based on the incomplete image, wherein the repair image includes a repaired portion corresponding to the removed portion of the incomplete image;

determining, using a discriminator, whether each of the repair image, the training image, and the reference image is likely to be generated by the generator;

calculating a first loss based on the determination of whether the repair image is likely to be generated by the generator;

calculating a second loss based on the determination of whether the training image is likely to be generated by the generator;

calculating a third loss based on the determination of whether the reference image is likely to be generated by the generator;

updating the discriminator based on the first loss, the second loss, and the third loss;

determining, using an updated discriminator, whether a second repair image generated by the generator is likely to be generated by the generator, wherein the second repair image is associated with a second training image;

calculating a fourth loss based on the determination of whether the second repair image is likely to be generated by the generator; and

updating the generator based on the fourth loss;

wherein the updated generator is configured to receive an input image and generate a modified version of the input image, wherein the modified version of the input image comprises a portion repaired by the updated generator.

16. The media of claim 15, wherein prior to generating the repair image using the generator, the software is further operable when executed to cause the one or more processors to perform operations comprising:

calculating a reconstruction loss based on a third training image and an associated third repair image generated by the generator; and

updating the generator based on the reconstruction loss.

17. The media of claim 15, wherein the removed portion and the repaired portion are associated with a facial feature type.

18. The media of claim 17, wherein prior to generating the repair image using the generator, the software is further operable when executed to cause the one or more processors to perform operations comprising:

training an auto-encoder using training samples of images of the facial feature type, the auto-encoder comprising an encoder and a decoder;

accessing a third training image and an associated third repair image generated by the generator;

generating, using the encoder, a first encoded representation of a portion of the third training image associated with the facial feature type;

generating, using the encoder, a second encoded representation of a portion of the third repair image associated with the facial feature type;

calculating a perceptual loss based on the first encoded representation and the second encoded representation; and

updating the generator based on the perceptual loss.

19. The media of claim 18, wherein the training samples of images of the facial feature type are associated with a second training dataset different from the training dataset.

20. The media of claim 15, wherein the training image and the reference image depict the person's face from different perspectives.

21. A method comprising, by a computing system:

accessing a training image of a person, a reference image of the person, and an incomplete image, wherein the incomplete image corresponds to the training image with a removed portion, wherein the training image, the reference image, and the incomplete image are associated with a training sample of a training dataset;

generating, using a generator, a repair image based on the incomplete image, wherein the repair image includes a repaired portion corresponding to the removed portion of the incomplete image;

determining, using a discriminator, whether each of the repair image, the training image, and the reference image is likely to be generated by the generator;

calculating a first loss based on the determination of whether the repair image is likely to be generated by the generator;

calculating a second loss based on the determination of whether the training image is likely to be generated by the generator;

calculating a third loss based on the determination of whether the reference image is likely to be generated by the generator;

updating the discriminator based on the first loss, the second loss, and the third loss;

determining, using an updated discriminator, whether a second repair image generated by the generator is likely to be generated by the generator, wherein the second repair image is associated with a second training image;

calculating a fourth loss based on the determination of whether the second repair image is likely to be generated by the generator; and

updating the generator based on the fourth loss;

wherein the updated generator is configured to receive an input image and generate a modified version of the input image, wherein the modified version of the input image comprises a portion repaired by the updated generator.

22. The method of claim 21, wherein prior to generating the repair image using the generator, the method further comprises:

calculating a reconstruction loss based on a third training image and an associated third repair image generated by the generator; and

updating the generator based on the reconstruction loss.

23. The method of claim 21 or 22, wherein the removed portion and the repaired portion are associated with a facial feature type.

24. The method of claim 23, wherein prior to generating the repair image using the generator, the method further comprises:

training an auto-encoder using training samples of images of the facial feature type, the auto-encoder comprising an encoder and a decoder;

accessing a third training image and an associated third repair image generated by the generator;

generating, using the encoder, a first encoded representation of a portion of the third training image associated with the facial feature type;

generating, using the encoder, a second encoded representation of a portion of the third repair image associated with the facial feature type;

calculating a perceptual loss based on the first encoded representation and the second encoded representation; and

updating the generator based on the perceptual loss.

25. The method of claim 24, wherein the training samples of images of the facial feature type are associated with a second training dataset different from the training dataset.

26. The method of any of claims 21 to 25, wherein the training image and the reference image depict the person's face from different perspectives.

27. The method of any one of claims 21 to 26,

wherein the input image is an image of a second person with eyes closed or partially closed; and

wherein the modified version of the input image is an image of the second person with open eyes.

28. A system, comprising: one or more processors and one or more computer-readable non-transitory storage media coupled to the one or more processors, the one or more computer-readable non-transitory storage media comprising instructions which, when executed by the one or more processors, are operable to cause the system to perform a method according to any one of claims 21 to 27 or perform operations comprising:

accessing a training image of a person, a reference image of the person, and an incomplete image, wherein the incomplete image corresponds to the training image with a removed portion, wherein the training image, the reference image, and the incomplete image are associated with a training sample of a training dataset;

generating, using a generator, a repair image based on the incomplete image, wherein the repair image includes a repaired portion corresponding to the removed portion of the incomplete image;

determining, using a discriminator, whether each of the repair image, the training image, and the reference image is likely to be generated by the generator;

calculating a first loss based on the determination of whether the repair image is likely to be generated by the generator;

calculating a second loss based on the determination of whether the training image is likely to be generated by the generator;

calculating a third loss based on the determination of whether the reference image is likely to be generated by the generator;

updating the discriminator based on the first loss, the second loss, and the third loss;

determining, using an updated discriminator, whether a second repair image generated by the generator is likely to be generated by the generator, wherein the second repair image is associated with a second training image;

calculating a fourth loss based on the determination of whether the second repair image is likely to be generated by the generator; and

updating the generator based on the fourth loss;

wherein the updated generator is configured to receive an input image and generate a modified version of the input image, wherein the modified version of the input image comprises a portion repaired by the updated generator.

29. One or more computer-readable non-transitory storage media embodying software that is operable when executed to cause one or more processors to perform a method according to any one of claims 21 to 27 or to perform operations comprising:

accessing a training image of a person, a reference image of the person, and an incomplete image, wherein the incomplete image corresponds to the training image with a removed portion, wherein the training image, the reference image, and the incomplete image are associated with a training sample of a training dataset;

generating, using a generator, a repair image based on the incomplete image, wherein the repair image includes a repaired portion corresponding to the removed portion of the incomplete image;

determining, using a discriminator, whether each of the repair image, the training image, and the reference image is likely to be generated by the generator;

calculating a first loss based on the determination of whether the repair image is likely to be generated by the generator;

calculating a second loss based on the determination of whether the training image is likely to be generated by the generator;

calculating a third loss based on the determination of whether the reference image is likely to be generated by the generator;

updating the discriminator based on the first loss, the second loss, and the third loss;

determining, using an updated discriminator, whether a second repair image generated by the generator is likely to be generated by the generator, wherein the second repair image is associated with a second training image;

calculating a fourth loss based on the determination of whether the second repair image is likely to be generated by the generator; and

updating the generator based on the fourth loss;

wherein the updated generator is configured to receive an input image and generate a modified version of the input image, wherein the modified version of the input image comprises a portion repaired by the updated generator.

Technical Field

The present disclosure relates generally to computer image processing.

Background

A social networking system, which may include a social networking website, may enable its users (e.g., individuals or organizations) to interact with it and with each other through it. The social networking system may create and store a user profile (user profile) associated with the user in the social networking system with input from the user. The user profile may include demographic information, communication channel information, and information about personal interests of the user. The social networking system may also create and store relationship records of the user with other users of the social networking system using input from the user, as well as provide services to facilitate social interactions between or among users. For example, a user may post photos on a social networking system and allow other users to view, comment, and tag the photos.

When photographing a person, a common problem is that the snapshot captured at a given moment may be less than ideal. For example, the subject's eyes may be closed or partially closed, the mouth may be shaped in an unflattering way, or the nose may be wrinkled. Capturing an ideal photograph is even harder when the subject includes multiple people, because of the inherent difficulty of posing everyone at the instant the photograph is taken. While image-processing software may allow a user to manually edit a photograph to correct unwanted features, the process is cumbersome and time-consuming, requires advanced image-editing skills, and the extent of what can be edited may be limited.

Summary of Particular Embodiments

The subject matter described herein provides an automated process for modifying an image (e.g., a photograph or a frame in a video). In certain embodiments, undesired portions of an image may be automatically replaced with desired substitutes. For example, particular embodiments may take as input an image of a person whose eyes are closed (the undesired portion) and output a modified version of the image in which the person's eyes are open. In particular embodiments, a machine learning model may be trained to replace image pixels corresponding to a particular facial feature of a person (e.g., eyes, mouth, etc.), including surrounding areas, with automatically generated pixels that depict how the facial feature might appear if it were positioned differently (e.g., eyes open, mouth smiling, etc.). In particular embodiments, the machine learning model may be based on a generative adversarial network (GAN) and trained using training data samples, each training data sample including a training image of a person and one or more additional reference images of the person. Based on these images, the machine learning model can learn how certain feature types should be replaced to produce realistic modifications (e.g., experiments have shown that even the reflection patterns in a pair of machine-generated eyes appear realistic in the context of the underlying image).
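To make the adversarial training procedure recited in the claims easier to picture, the following is a minimal PyTorch-style sketch of one discriminator update (using the first, second, and third losses) followed by one generator update (using the fourth loss). The tiny `Generator` and `Discriminator` networks, the 64x64 image size, the conditioning on the reference image, and the binary cross-entropy losses are illustrative assumptions; the disclosure does not prescribe a particular architecture or loss formulation.

```python
import torch
import torch.nn as nn

# Toy stand-ins for the generator and discriminator. The actual networks would be
# deep convolutional models; these placeholders assume 64x64 RGB inputs.
class Generator(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(6, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 3, 3, padding=1), nn.Sigmoid())

    def forward(self, incomplete, reference):
        # Condition on both the incomplete image and a reference image of the same person.
        return self.net(torch.cat([incomplete, reference], dim=1))

class Discriminator(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.LeakyReLU(0.2),  # 3x64x64 -> 32x32x32
            nn.Flatten(), nn.Linear(32 * 32 * 32, 1))

    def forward(self, image):
        # Logit for "this image was generated by the generator".
        return self.net(image)

generator, discriminator = Generator(), Discriminator()
opt_d = torch.optim.Adam(discriminator.parameters(), lr=2e-4)
opt_g = torch.optim.Adam(generator.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

def discriminator_step(training_img, reference_img, incomplete_img):
    # The generator fills in the removed portion of the incomplete image.
    repaired = generator(incomplete_img, reference_img).detach()
    generated = torch.ones(repaired.size(0), 1)   # label: "generated by the generator"
    real = torch.zeros(repaired.size(0), 1)       # label: "not generated"
    first_loss = bce(discriminator(repaired), generated)    # repair image
    second_loss = bce(discriminator(training_img), real)    # training image
    third_loss = bce(discriminator(reference_img), real)    # reference image
    d_loss = first_loss + second_loss + third_loss
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

def generator_step(incomplete_img2, reference_img2):
    # A second repair image is scored by the updated discriminator; the fourth
    # loss pushes the generator to make its output look "not generated".
    repaired2 = generator(incomplete_img2, reference_img2)
    fourth_loss = bce(discriminator(repaired2), torch.zeros(repaired2.size(0), 1))
    opt_g.zero_grad(); fourth_loss.backward(); opt_g.step()
```

In the claimed procedure these two steps alternate over the training dataset, with each generator step evaluated against the discriminator weights produced by the most recent discriminator update.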

In an embodiment, a method includes, by a computing system:

accessing a training image of a person, a reference image of the person, and an incomplete image, wherein the incomplete image corresponds to the training image with a removed portion, wherein the training image, the reference image, and the incomplete image are associated with a training sample of a training dataset;

generating, using a generator, a repair (in-painted) image based on the incomplete image, wherein the repair image includes a repaired portion corresponding to the removed portion of the incomplete image;

determining, using a discriminator, whether each of the repair image, the training image, and the reference image is likely to be generated by the generator;

calculating a first loss based on the determination of whether the repair image is likely to be generated by the generator;

calculating a second loss based on the determination of whether the training image is likely to be generated by the generator;

calculating a third loss based on the determination of whether the reference image is likely to be generated by the generator;

updating the discriminator based on the first loss, the second loss, and the third loss;

determining, using the updated discriminator, whether a second repair image generated by the generator is likely to be generated by the generator, wherein the second repair image is associated with a second training image;

calculating a fourth loss based on the determination of whether the second repair image is likely to be generated by the generator; and

updating the generator based on the fourth loss;

wherein the updated generator is configured to receive an input image and generate a modified version of the input image, wherein the modified version of the input image includes a portion repaired by the updated generator.

In an embodiment, before generating the repair image using the generator, the method further comprises:

calculating a reconstruction loss based on a third training image and an associated third repair image generated by the generator; and

updating the generator based on the reconstruction loss (a brief sketch of this step follows below).
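A minimal sketch of the reconstruction-loss step described above, assuming a pixel-wise L1 distance; the disclosure does not commit to a specific distance measure, so the choice of L1 here is purely illustrative.

```python
import torch
import torch.nn.functional as F

def reconstruction_loss(training_img: torch.Tensor, repaired_img: torch.Tensor) -> torch.Tensor:
    # Pixel-wise distance between the generator's repaired image and the original
    # training image; minimizing it pre-trains the generator to reproduce the
    # removed portion before adversarial training begins.
    return F.l1_loss(repaired_img, training_img)
```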

In an embodiment, the removed portion and the repaired portion are associated with a facial feature type.

In an embodiment, before generating the repair image using the generator, the method further comprises:

training an auto-encoder using training samples of images of the facial feature type, the auto-encoder comprising an encoder and a decoder;

accessing a third training image and an associated third repair image generated by the generator;

generating, using the encoder, a first encoded representation of a portion of the third training image associated with the facial feature type;

generating, using the encoder, a second encoded representation of a portion of the third repair image associated with the facial feature type;

calculating a perceptual loss based on the first encoded representation and the second encoded representation; and

updating the generator based on the perceptual loss (a sketch of this step is given below).
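One way to read the perceptual-loss embodiment above: the encoder of a separately trained auto-encoder maps the facial-feature crops of the training image and of the repaired image into a shared latent space, and the loss is the distance between the two codes. The small architecture, the 32x32 crop size, and the use of a mean-squared distance in the sketch below are assumptions for illustration only.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureAutoEncoder(nn.Module):
    """Auto-encoder trained beforehand on crops of one facial feature type (e.g., eyes),
    possibly drawn from a second training dataset. Assumes 32x32 RGB crops."""
    def __init__(self, latent_dim: int = 128):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),  # 3x32x32 -> 32x16x16
            nn.Flatten(), nn.Linear(32 * 16 * 16, latent_dim))
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 32 * 16 * 16), nn.ReLU(),
            nn.Unflatten(1, (32, 16, 16)),
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1), nn.Sigmoid())

    def forward(self, x):
        return self.decoder(self.encoder(x))

def perceptual_loss(ae: FeatureAutoEncoder,
                    training_crop: torch.Tensor,
                    repaired_crop: torch.Tensor) -> torch.Tensor:
    # Encode the facial-feature region of the training image and of the repaired
    # image, then penalize the distance between the two encoded representations.
    with torch.no_grad():
        first_code = ae.encoder(training_crop)    # first encoded representation
    second_code = ae.encoder(repaired_crop)       # second encoded representation
    return F.mse_loss(second_code, first_code)
```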

In an embodiment, the training samples of images of the facial feature type are associated with a second training dataset different from the training dataset.

In an embodiment, the training image and the reference image depict a person's face from different angles.

In an embodiment, the input image is an image of a second person with eyes closed or partially closed; and the modified version of the input image is an image of the second person with their eyes open.
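For the eyes-open example above, inference with the trained generator could look roughly like the sketch below. The fixed rectangular eye mask and the 64x64 face-crop size are placeholders; a real system would locate the eye region (e.g., with a facial-landmark detector) and choose its own compositing strategy.

```python
import torch

def open_eyes(generator, input_img: torch.Tensor, reference_img: torch.Tensor) -> torch.Tensor:
    """input_img: 1x3x64x64 face crop with closed eyes; reference_img: same person, eyes open."""
    eye_mask = torch.zeros_like(input_img)
    eye_mask[:, :, 20:32, 12:52] = 1.0           # assumed eye region of the 64x64 crop
    incomplete = input_img * (1.0 - eye_mask)    # remove the closed-eye pixels
    with torch.no_grad():
        repaired = generator(incomplete, reference_img)
    # Keep original pixels outside the mask; use generated pixels inside it.
    return input_img * (1.0 - eye_mask) + repaired * eye_mask
```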

In an embodiment, there is provided a system comprising: one or more processors and one or more computer-readable non-transitory storage media coupled to the one or more processors, the one or more computer-readable non-transitory storage media comprising instructions which, when executed by the one or more processors, are operable to cause the system to perform a method according to any embodiment of the invention or to perform operations comprising:

accessing a training image of a person, a reference image of the person, and an incomplete image, wherein the incomplete image corresponds to the training image with a removed portion, wherein the training image, the reference image, and the incomplete image are associated with a training sample of a training dataset;

generating, using a generator, a repair image based on the incomplete image, wherein the repair image includes a repaired portion corresponding to the removed portion of the incomplete image;

determining, using a discriminator, whether each of the repair image, the training image, and the reference image is likely to be generated by the generator;

calculating a first loss based on the determination of whether the repair image is likely to be generated by the generator;

calculating a second loss based on the determination of whether the training image is likely to be generated by the generator;

calculating a third loss based on the determination of whether the reference image is likely to be generated by the generator;

updating the discriminator based on the first loss, the second loss, and the third loss;

determining, using the updated discriminator, whether a second repair image generated by the generator is likely to be generated by the generator, wherein the second repair image is associated with a second training image;

calculating a fourth loss based on the determination of whether the second repair image is likely to be generated by the generator; and

updating the generator based on the fourth loss;

wherein the updated generator is configured to receive an input image and generate a modified version of the input image, wherein the modified version of the input image includes a portion repaired by the updated generator.

In an embodiment, one or more computer-readable non-transitory storage media are provided embodying software that, when executed, is operable to cause one or more processors to perform a method according to any embodiment of the invention or to perform operations comprising:

accessing a training image of a person, a reference image of the person, and an incomplete image, wherein the incomplete image corresponds to the training image with a removed portion, wherein the training image, the reference image, and the incomplete image are associated with a training sample of a training dataset;

generating, using a generator, a repair image based on the incomplete image, wherein the repair image includes a repaired portion corresponding to the removed portion of the incomplete image;

determining, using a discriminator, whether each of the repair image, the training image, and the reference image is likely to be generated by the generator;

calculating a first loss based on the determination of whether the repair image is likely to be generated by the generator;

calculating a second loss based on the determination of whether the training image is likely to be generated by the generator;

calculating a third loss based on the determination of whether the reference image is likely to be generated by the generator;

updating the discriminator based on the first loss, the second loss, and the third loss;

determining, using the updated discriminator, whether a second repair image generated by the generator is likely to be generated by the generator, wherein the second repair image is associated with a second training image;

calculating a fourth loss based on the determination of whether the second repair image is likely to be generated by the generator; and

updating the generator based on the fourth loss;

wherein the updated generator is configured to receive an input image and generate a modified version of the input image, wherein the modified version of the input image includes a portion repaired by the updated generator.

The embodiments disclosed herein are merely examples, and the scope of the present disclosure is not limited to them. Particular embodiments may include all, some, or none of the components, elements, features, functions, operations, or steps of the embodiments disclosed above. Embodiments in accordance with the present invention are specifically disclosed in the accompanying claims directed to methods, storage media, systems, and computer program products, wherein any feature mentioned in one claim category (e.g., method) may also be claimed in another claim category (e.g., system). The dependencies or references back in the appended claims are chosen for formal reasons only. However, any subject matter resulting from an intentional back-reference to any previous claim (in particular multiple dependencies) may also be claimed, such that any combination of a claim and its features is disclosed and may be claimed regardless of the dependency selected in the appended claims. The claimed subject matter comprises not only the combination of features set out in the appended claims, but also any other combination of features in the claims, wherein each feature mentioned in the claims may be combined with any other feature or combination of features in the claims. Furthermore, any embodiments and features described or depicted herein may be claimed in a separate claim and/or in any combination with any embodiment or feature described or depicted herein or with any feature of the appended claims.
