Image processing method, image processing device, electronic equipment and storage medium

Document No.: 1954556    Publication date: 2021-12-10

Reading note: This technique, "Image processing method, image processing device, electronic equipment and storage medium", was designed and created by Xu Yi (徐屹) on 2020-05-21. Its main content is as follows: The disclosure relates to an image processing method, an image processing device, electronic equipment and a storage medium, and belongs to the field of computer vision. The method comprises the following steps: acquiring a human body image to be processed, and partitioning the human body region of the human body image to obtain N sub-image regions; performing edge extraction and color block extraction on each of the N sub-image regions to obtain an edge extraction result and N color blocks, wherein color block extraction refers to uniformly configuring the color values of the pixel points included in each sub-image region to the same value, the same value being determined according to the original color values of those pixel points; superimposing the edge extraction result onto the N color blocks to obtain a fused image; performing human body key point prediction on the human body region of the human body image to obtain a human body key point prediction result; and drawing wrinkles on the fused image based on the human body key point prediction result and a set wrinkle occurrence rule to obtain a target image. The disclosure enriches image processing modes and achieves a good processing effect.

1. An image processing method, characterized in that the method comprises:

acquiring a human body image to be processed, and carrying out partition processing on a human body area of the human body image to obtain N sub-image areas, wherein the value of N is a positive integer;

respectively carrying out edge extraction and color block extraction on each sub-image region in the N sub-image regions to obtain an edge extraction result and N color blocks, wherein the color block extraction refers to uniformly configuring color values of all pixel points included in each sub-image region into the same value, and the same value is determined according to original color values of all the pixel points;

superposing the edge extraction result to the N color blocks to obtain a fused image;

predicting human body key points of the human body region of the human body image to obtain a human body key point prediction result;

and drawing folds on the fusion image based on the human body key point prediction result and the set fold occurrence rule to obtain a target image.

2. The image processing method according to claim 1, wherein the partitioning the human body region of the human body image includes:

partitioning the human body area according to the human body parts and the clothes included in the human body area to obtain masks for indicating the N sub-image areas, wherein the masks corresponding to each sub-image area are respectively represented by different colors;

wherein one sub-image area corresponds to one color block, and each sub-image area comprises either a human body part or an article of clothing.

3. The image processing method according to claim 1, wherein the performing edge extraction on each of the N sub-image regions respectively comprises:

for each sub-image area, carrying out filtering processing on the sub-image area to obtain a filtering image;

calculating gradient data of each pixel point in the filtering image to obtain a gradient image;

according to the gradient data of each pixel point, filtering the pixel points included in the gradient image to obtain the residual pixel points which are not filtered;

based on the gradient strength of the residual pixel points and the two set thresholds, screening the residual pixel points to obtain screened pixel points;

and connecting the screened pixel points to obtain the edge extraction result.

4. The method according to claim 1, wherein said performing color block extraction on each sub-image region of the N sub-image regions comprises:

for each sub-image area, acquiring the color average value of all pixel points in the sub-image area;

and configuring the color value of each pixel point in the sub-image area as the color average value to obtain a color block corresponding to the sub-image area.

5. The image processing method of claim 4, wherein the obtaining the color average of all the pixels in the sub-image region comprises:

respectively acquiring a first color average value of all pixel points in the sub-image region in an R channel, a second color average value in a G channel and a third color average value in a B channel;

the configuring the color value of each pixel point in the sub-image region as the color average value includes:

and configuring the color value of each pixel point in the sub-image area in the R channel as the first color average value, configuring the color value in the G channel as the second color average value, and configuring the color value in the B channel as the third color average value.

6. The image processing method according to claim 1, wherein the step of drawing a wrinkle on the fused image based on the human body key point prediction result and the set wrinkle occurrence rule to obtain a target image comprises:

generating a plurality of selectable items based on the human keypoint prediction result and the wrinkle occurrence rule; each selectable item corresponds to two human key points in the human key point prediction result;

displaying the plurality of selectable items;

determining M target selectable items selected by a user in the plurality of selectable items, wherein the value of M is a positive integer;

for each target selectable item, taking two human body key points corresponding to the target selectable item as a starting point and an end point of a fold to be drawn respectively;

and connecting the determined starting point and the corresponding end point to obtain a fold drawn on the fused image.

7. The image processing method according to claim 6, wherein connecting the determined start point and the corresponding end point to obtain a wrinkle rendered on the fused image comprises:

and connecting the determined starting point and the corresponding end point by adopting a Bezier curve to obtain a fold drawn on the fused image.

8. An image processing apparatus, characterized in that the apparatus comprises:

an acquisition module configured to acquire a human body image to be processed;

the first processing module is configured to perform partition processing on a human body region of the human body image to obtain N sub-image regions, wherein the value of N is a positive integer;

the extraction module is configured to perform edge extraction and color block extraction on each sub-image region in the N sub-image regions respectively to obtain an edge extraction result and N color blocks, wherein the color block extraction refers to uniformly configuring color values of all pixel points included in each sub-image region to be the same value, and the same value is determined according to original color values of all the pixel points;

the fusion module is configured to overlay the edge extraction result onto the N color blocks to obtain a fusion image;

the prediction module is configured to predict human key points of the human body region of the human body image to obtain a human key point prediction result;

and the second processing module is configured to draw folds on the fusion image based on the human body key point prediction result and the set fold occurrence rule to obtain a target image.

9. An electronic device, comprising:

a processor;

a memory for storing the processor-executable instructions;

wherein the processor is configured to execute the instructions to implement the image processing method of any one of claims 1 to 7.

10. A computer-readable storage medium, wherein instructions in the storage medium, when executed by a processor of an electronic device, enable the electronic device to perform the image processing method of any of claims 1 to 7.

Technical Field

The present disclosure relates to the field of computer vision technologies, and in particular, to an image processing method and apparatus, an electronic device, and a storage medium.

Background

Stylizing images in an artistic manner has long been a popular research direction in the field of computer vision. Stylization is a specific application of image processing algorithms; it aims to convert an image into a certain style type while keeping the other elements in the image unchanged, so as to achieve a specific visual effect desired by the user.

Taking the two-dimensional (anime) style as an example, the technology converts an image into a two-dimensional cartoon style, that is, it cartoonizes the image. The better the image processing effect, the more satisfactory the resulting cartoon effect; therefore, how to convert a real image into a high-quality painting-style image has become a problem to be solved by those skilled in the art.

Disclosure of Invention

The present disclosure provides an image processing method, an image processing apparatus, an electronic device, and a storage medium, which not only enrich image processing modes, but also have a good image processing effect. The technical scheme of the disclosure is as follows:

according to a first aspect of embodiments of the present disclosure, there is provided an image processing method, the method including:

acquiring a human body image to be processed, and carrying out partition processing on a human body area of the human body image to obtain N sub-image areas, wherein the value of N is a positive integer;

respectively carrying out edge extraction and color block extraction on each sub-image region in the N sub-image regions to obtain an edge extraction result and N color blocks, wherein the color block extraction refers to uniformly configuring color values of all pixel points included in each sub-image region into the same value, and the same value is determined according to original color values of all the pixel points;

superposing the edge extraction result to the N color blocks to obtain a fused image;

predicting human body key points of the human body region of the human body image to obtain a human body key point prediction result;

and drawing folds on the fusion image based on the human body key point prediction result and the set fold occurrence rule to obtain a target image.

In a possible implementation manner, the partitioning the human body region of the human body image includes:

partitioning the human body area according to the human body parts and the clothes included in the human body area to obtain masks for indicating the N sub-image areas, wherein the masks corresponding to each sub-image area are respectively represented by different colors;

wherein one sub-image area corresponds to one color block, and each sub-image area comprises either a human body part or an article of clothing.

In a possible implementation manner, the performing edge extraction on each of the N sub-image regions respectively includes:

for each sub-image area, carrying out filtering processing on the sub-image area to obtain a filtering image;

calculating gradient data of each pixel point in the filtering image to obtain a gradient image;

according to the gradient data of each pixel point, filtering the pixel points included in the gradient image to obtain the residual pixel points which are not filtered;

based on the gradient strength of the residual pixel points and the two set thresholds, screening the residual pixel points to obtain screened pixel points;

and connecting the screened pixel points to obtain the edge extraction result.

In a possible implementation manner, the gradient data includes a gradient strength and a gradient direction, and the filtering processing of the pixel points included in the gradient image according to the gradient data of each pixel point includes:

for each pixel point in the gradient image, comparing the gradient strength of the pixel point with the gradient strength of two adjacent pixel points;

if the gradient intensity of the pixel point is greater than the gradient intensities of the two pixel points, the pixel point is reserved;

if the gradient intensity of the pixel point is minimum or smaller than the gradient intensity of any one of the two pixel points, filtering the pixel point;

wherein the two adjacent pixel points are located in the gradient direction of the pixel point, on both sides of the pixel point.

In one possible implementation, the two thresholds include a first threshold and a second threshold, the first threshold being greater than the second threshold; the screening processing is carried out on the residual pixel points based on the gradient strength of the residual pixel points and two set thresholds, and the screening processing comprises the following steps:

for each pixel point in the residual pixel points, if the gradient intensity of the pixel point is greater than the set first threshold value, the pixel point is reserved and marked as a first-type pixel point; or,

if the gradient strength of the pixel point is smaller than the first threshold value and larger than the second threshold value, and the pixel points adjacent to the pixel point comprise a first-type pixel point, the pixel point is reserved; or,

if the gradient strength of the pixel point is smaller than the first threshold value and larger than the second threshold value, and the pixel points adjacent to the pixel point do not comprise a first-type pixel point, filtering the pixel point; or,

and if the gradient strength of the pixel point is smaller than the second threshold value, filtering the pixel point.

In a possible implementation manner, the performing color block extraction on each sub-image region of the N sub-image regions includes:

for each sub-image area, acquiring the color average value of all pixel points in the sub-image area;

and configuring the color value of each pixel point in the sub-image area as the color average value to obtain a color block corresponding to the sub-image area.

In a possible implementation manner, the obtaining the color average value of all pixel points in the sub-image region includes:

respectively acquiring a first color average value of all pixel points in the sub-image region in an R channel, a second color average value in a G channel and a third color average value in a B channel;

the configuring the color value of each pixel point in the sub-image region as the color average value includes:

and configuring the color value of each pixel point in the sub-image area in the R channel as the first color average value, configuring the color value in the G channel as the second color average value, and configuring the color value in the B channel as the third color average value.

In a possible implementation manner, the number of key points included in the human body key point prediction result is greater than a target threshold, and the predicting of the human body key points in the human body region of the human body image includes:

predicting key points of the human body image on the basis of a key point prediction model;

the key point prediction model is obtained by training a deep neural network based on a specified training data set, each sample human body image in the specified training data set corresponds to label information, and the label information marks corresponding mapping points when the marking points in the sample human body images are mapped to corresponding three-dimensional human body models;

the generation process of the label information comprises the following steps: firstly, segmenting the human body parts of the sample human body image; then sampling each segmented human body part with approximately equidistant marker points to obtain a plurality of marker points marking that human body part; and locating, on the three-dimensional human body model, the mapping point corresponding to each marker point.

In a possible implementation manner, the drawing a wrinkle on the fused image based on the human body key point prediction result and the set wrinkle occurrence rule to obtain a target image includes:

generating a plurality of selectable items based on the human keypoint prediction result and the wrinkle occurrence rule; each selectable item corresponds to two human key points in the human key point prediction result;

displaying the plurality of selectable items;

determining M target selectable items selected by a user in the plurality of selectable items, wherein the value of M is a positive integer;

for each target selectable item, taking two human body key points corresponding to the target selectable item as a starting point and an end point of a fold to be drawn respectively;

and connecting the determined starting point and the corresponding end point to obtain a fold drawn on the fused image.

In one possible implementation, the generating a plurality of selectable items based on the human keypoint prediction result and the wrinkle occurrence rule includes:

determining a wrinkle occurrence region based on the wrinkle occurrence rule; wherein the wrinkle occurrence region refers to a region on a human body where wrinkles exist;

screening human key points from the human key point prediction result according to the determined fold occurrence area;

generating the plurality of selectable items according to the screened human key points; wherein each selectable item corresponds to two of the screened human key points.

In one possible implementation, connecting the determined start point and the corresponding end point to obtain a wrinkle drawn on the fused image includes:

and connecting the determined starting point and the corresponding end point by adopting a Bezier curve to obtain a fold drawn on the fused image.

In one possible implementation, the connection rule of the bezier curve includes:

randomly generating a first included angle value and a second included angle value within a specified included angle value interval;

taking the first included angle value as the tangential direction of the starting point, and taking the second included angle value as the tangential direction of the end point; generating the wrinkle based on a tangential direction of the starting point and a tangential direction of the ending point;

the first included angle value is the angle between a first tangent line passing through the starting point and a specified straight line, and the second included angle value is the angle between a second tangent line passing through the end point and the specified straight line; the first tangent line and the second tangent line are located on the same side of the specified straight line, and the specified straight line passes through the starting point and the end point.

According to a second aspect of the embodiments of the present disclosure, there is provided an image processing apparatus, the apparatus including:

an acquisition module configured to acquire a human body image to be processed;

the first processing module is configured to perform partition processing on a human body region of the human body image to obtain N sub-image regions, wherein the value of N is a positive integer;

the extraction module is configured to perform edge extraction and color block extraction on each sub-image region in the N sub-image regions respectively to obtain an edge extraction result and N color blocks, wherein the color block extraction refers to uniformly configuring color values of all pixel points included in each sub-image region to be the same value, and the same value is determined according to original color values of all the pixel points;

the fusion module is configured to overlay the edge extraction result onto the N color blocks to obtain a fusion image;

the prediction module is configured to predict human key points of the human body region of the human body image to obtain a human key point prediction result;

and the second processing module is configured to draw folds on the fusion image based on the human body key point prediction result and the set fold occurrence rule to obtain a target image.

In a possible implementation manner, the first processing module is configured to perform partition processing on the human body region according to the human body parts and clothing included in the human body region, to obtain masks for indicating the N sub-image regions, where the masks corresponding to each sub-image region are respectively represented by different colors; wherein one sub-image area corresponds to one color block, and each sub-image area comprises either a human body part or an article of clothing.

In one possible implementation manner, the extraction module includes:

the first processing unit is configured to carry out filtering processing on each sub-image area to obtain a filtering image;

the calculation unit is configured to calculate gradient data of each pixel point in the filtering image to obtain a gradient image;

the second processing unit is configured to filter the pixel points included in the gradient image according to the gradient data of each pixel point to obtain the residual pixel points which are not filtered;

the third processing unit is configured to perform screening processing on the residual pixel points based on the gradient strength of the residual pixel points and the two set thresholds to obtain screened pixel points;

and the connecting unit is configured to connect the screened pixel points to obtain the edge extraction result.

In a possible implementation manner, the second processing unit is configured to, for each pixel point in the gradient image, compare the gradient strength of the pixel point with the gradient strength of two adjacent pixel points; if the gradient intensity of the pixel point is greater than the gradient intensities of the two pixel points, the pixel point is reserved; if the gradient intensity of the pixel point is minimum or smaller than the gradient intensity of any one of the two pixel points, filtering the pixel point; wherein the two adjacent pixel points are located in the gradient direction of the pixel point, on both sides of the pixel point.

In a possible implementation manner, the third processing unit is configured to, for each of the remaining pixel points, if the gradient strength of the pixel point is greater than the set first threshold, retain the pixel point and mark it as a first-type pixel point; or, if the gradient strength of the pixel point is smaller than the first threshold and larger than the second threshold, and the neighborhood pixel points of the pixel point comprise a first-type pixel point, keep the pixel point; or, if the gradient strength of the pixel point is smaller than the first threshold and larger than the second threshold, and the neighborhood pixel points of the pixel point do not comprise a first-type pixel point, filter the pixel point; or, if the gradient strength of the pixel point is smaller than the second threshold, filter the pixel point.

In one possible implementation manner, the extraction module further includes:

the acquisition unit is configured to acquire the color average value of all pixel points in each sub-image area;

and the extraction unit is configured to configure the color value of each pixel point in the sub-image area as the color average value to obtain a color block corresponding to the sub-image area.

In a possible implementation manner, the obtaining unit is configured to obtain a first color average value of all pixel points in the sub-image region in an R channel, a second color average value in a G channel, and a third color average value in a B channel, respectively;

the extracting unit is configured to configure the color value of each pixel point in the sub-image region in the R channel as the first color average value, the color value in the G channel as the second color average value, and the color value in the B channel as the third color average value.

In one possible implementation manner, the second processing module includes:

a determination unit configured to generate a plurality of selectable items based on the human body key point prediction result and the wrinkle occurrence rule; each selectable item corresponds to two human key points in the human key point prediction result; displaying the plurality of selectable items; determining M target selectable items selected by a user in the plurality of selectable items, wherein the value of M is a positive integer; for each target selectable item, taking two human body key points corresponding to the target selectable item as a starting point and an end point of a fold to be drawn respectively;

and the drawing unit is configured to connect the determined starting point and the corresponding end point to obtain a fold drawn on the fused image.

In a possible implementation manner, the rendering unit is configured to connect the determined start point and the corresponding end point by using a bezier curve to obtain a wrinkle rendered on the fused image.

According to a third aspect of the embodiments of the present disclosure, there is provided an electronic apparatus including:

a processor;

a memory for storing the processor-executable instructions;

wherein the processor is configured to execute the instructions to implement the image processing method of the first aspect.

According to a fourth aspect of embodiments of the present disclosure, there is provided a computer-readable storage medium, wherein instructions, when executed by a processor of an electronic device, enable the electronic device to perform the image processing method of the first aspect.

According to a fifth aspect of embodiments of the present disclosure, there is provided a computer program product, wherein instructions of the computer program product, when executed by a processor of an electronic device, enable the electronic device to perform the image processing method of the first aspect.

The technical scheme provided by the embodiment of the disclosure at least has the following beneficial effects:

after the human body image to be processed is sequentially subjected to human body partitioning, edge extraction, color block extraction and other processing, the edge extraction result is superimposed onto the extracted color blocks to form a fused image; then, human body key point prediction is performed on the human body region of the human body image to obtain a human body key point prediction result; and finally, wrinkles are drawn on the fused image based on the human body key point prediction result and the set wrinkle occurrence rule to obtain a target image. That is, the embodiments of the present disclosure can add wrinkles to the fused image based on the predicted human body key points and the wrinkle occurrence rule, so that the human body image to be processed is converted into an image with a certain painting style, thereby enriching the image processing modes. For example, when the wrinkle occurrence rule is one set for the two-dimensional cartoon style, a cartoon-type image having a two-dimensional style can be obtained. In addition, because the edge extraction and the color block extraction are carried out after the human body is partitioned, both are semantically selective rather than disordered and random, and the edge extraction effect is preserved even when the human body boundary is close in color to the background. Because the wrinkles are then drawn based on the predicted human body key points, more accurate wrinkles can be obtained without cluttering the picture, and the wrinkles are guaranteed to appear only at the required positions, so the image processing effect is better.

It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the principles of the disclosure and are not to be construed as limiting the disclosure.

Fig. 1 is a schematic diagram illustrating an implementation environment involved with an image processing method according to an example embodiment.

FIG. 2 is a flow diagram illustrating an image processing method according to an exemplary embodiment.

FIG. 3 is a flow diagram illustrating an image processing method according to an exemplary embodiment.

FIG. 4 is a diagram illustrating an image processing effect according to an exemplary embodiment.

FIG. 5 is a diagram illustrating an image processing effect according to an exemplary embodiment.

FIG. 6 is a diagram illustrating an image processing effect according to an exemplary embodiment.

FIG. 7 is a diagram illustrating an image processing effect according to an exemplary embodiment.

Fig. 8 is a block diagram illustrating an image processing apparatus according to an exemplary embodiment.

FIG. 9 is a block diagram illustrating an electronic device in accordance with an example embodiment.

Detailed Description

In order to make the technical solutions of the present disclosure better understood by those of ordinary skill in the art, the technical solutions in the embodiments of the present disclosure will be clearly and completely described below with reference to the accompanying drawings.

It should be noted that the terms "first," "second," and the like in the description and claims of the present disclosure and in the above-described drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the disclosure described herein are capable of operation in sequences other than those illustrated or otherwise described herein. The implementations described in the exemplary embodiments below are not intended to represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present disclosure, as detailed in the appended claims.

The user information to which the present disclosure relates may be information authorized by the user or sufficiently authorized by each party.

Before explaining the embodiments of the present disclosure in detail, some terms related to the embodiments of the present disclosure are explained.

Quadratic element: the early japanese animation and game works are both formed by two-dimensional images, and the picture is a plane, so the picture is called a "two-dimensional world" and is called a "two-dimensional" for short, and the picture is opposite to the picture, namely a "three-dimensional" which is the existing one, namely the real world. Quadratic element means the wonderful world that human beings imagine, shows the visual experience of abusing viewers with various longitudes, and in essence is also the blurred longing of dream life and the expectation of a nice future in human mind in the cubic world.

Cartoon: an art form that depicts life or current affairs with simple and exaggerated drawings.

Dense human body key points: a sufficiently dense set of human key points; that is, the key points are no longer limited to a coarse division such as head, neck, shoulders, elbows, hands, hips, knees and feet, but are distributed densely enough that their number may reach tens. In one possible implementation, the dense human body key points include, but are not limited to: forehead, left eye, right eye, left ear, right ear, mouth, left shoulder, right shoulder, left elbow, right elbow, left wrist, right wrist, left palm, right palm, chest, left hip, right hip, left knee, right knee, left ankle, right ankle, left sole, right sole, and the like.

An implementation environment related to an image processing method provided by the embodiment of the present disclosure is described below.

The image processing method can be applied in interactive scenarios, such as video calls and live video streaming; it can also be applied in non-interactive scenarios, for example while a user is shooting an image or video, or to perform image processing on a human body image or video stored locally by the user, which is not specifically limited in the embodiments of the present disclosure.

Taking an application in a non-interactive scenario as an example, referring to fig. 1, the implementation environment includes a user 101 and an electronic device 102, where the electronic device 102 generally refers to a mobile computer device such as a tablet computer, a smart phone, and the like. The electronic device 102 is configured to execute the image processing method.

In addition, if the application is applied in an interactive scenario, the implementation environment shown in fig. 1 further includes a server in data communication with the electronic device 102 and at least one other electronic device in data communication with the server.

Based on the implementation environment, the embodiments of the present disclosure provide an image processing method in which, after a human body image to be processed is sequentially subjected to human body partitioning, edge extraction, color block extraction and other processing, the edge extraction result is superimposed onto the extracted color blocks to form a fused image; then, human body key point prediction is performed on the human body region of the human body image to obtain a human body key point prediction result; and finally, wrinkles are drawn on the fused image based on the human body key point prediction result and the set wrinkle occurrence rule to obtain a target image. That is, the embodiments of the present disclosure can add wrinkles to the fused image based on the predicted human body key points and the wrinkle occurrence rule, so that the human body image to be processed is converted into an image with a certain painting style, thereby enriching the image processing modes. For example, when the wrinkle occurrence rule is one set for the two-dimensional cartoon style, a cartoon-type image having a two-dimensional style can be obtained; that is, abstract, concise and clean wrinkles can be added, realizing a clean, flat and abstract two-dimensional style conversion of the human body image. Illustratively, the disclosed embodiments can convert the human body image to be processed into a cartoon-type image with a two-dimensional style (flattened, abstract, and with an outlined effect).

In addition, because the edge extraction and the color block extraction are carried out after the human body is partitioned, both are semantically selective rather than disordered and random, and the edge extraction effect is preserved even when the human body boundary is close in color to the background. Because the wrinkles are then drawn based on the predicted human body key points, more accurate wrinkles can be obtained without cluttering the picture; for example, the wrinkles mainly appear at limb boundaries, joints, clothing boundaries, pockets, trouser legs, skirt hems and the like, and have a definite meaning, so the image processing effect is better.

Fig. 2 is a flowchart illustrating an image processing method according to an exemplary embodiment. As shown in fig. 2, the method is used in the electronic device shown in fig. 1 and includes the following steps.

In step 201, a human body image to be processed is acquired, and a human body region of the human body image is subjected to partition processing to obtain N sub-image regions, where a value of N is a positive integer.

In step 202, edge extraction and color block extraction are respectively performed on each sub-image region in the N sub-image regions to obtain an edge extraction result and N color blocks, where color block extraction refers to uniformly configuring the color values of the pixel points included in each sub-image region to the same value, and the same value is determined according to the original color values of those pixel points.

In step 203, the edge extraction result is superimposed on the N color patches to obtain a fused image.

In step 204, the human body region of the human body image is subjected to human body key point prediction to obtain a human body key point prediction result.

In step 205, based on the human body key point prediction result and the set wrinkle occurrence rule, a wrinkle is drawn on the fusion image, and a target image is obtained.

According to the method provided by the embodiments of the present disclosure, after the human body image to be processed is sequentially subjected to human body partitioning, edge extraction, color block extraction and other processing, the edge extraction result is superimposed onto the extracted color blocks to form a fused image; then, human body key point prediction is performed on the human body region of the human body image to obtain a human body key point prediction result; and finally, wrinkles are drawn on the fused image based on the human body key point prediction result and the set wrinkle occurrence rule to obtain a target image. That is, the embodiments of the present disclosure can add wrinkles to the fused image based on the predicted human body key points and the wrinkle occurrence rule, so that the human body image to be processed is converted into an image with a certain painting style, thereby enriching the image processing modes. For example, when the wrinkle occurrence rule is one set for the two-dimensional cartoon style, a cartoon-type image having a two-dimensional style can be obtained. In addition, because the edge extraction and the color block extraction are carried out after the human body is partitioned, both are semantically selective rather than disordered and random, and the edge extraction effect is preserved even when the human body boundary is close in color to the background. Because the wrinkles are then drawn based on the predicted human body key points, more accurate wrinkles can be obtained without cluttering the picture, and the wrinkles are guaranteed to appear only at the required positions, so the image processing effect is better.

In a possible implementation manner, the partitioning the human body region of the human body image includes:

partitioning the human body area according to the human body parts and the clothes included in the human body area to obtain masks for indicating the N sub-image areas, wherein the masks corresponding to each sub-image area are respectively represented by different colors;

wherein one sub-image area corresponds to one color block, and each sub-image area comprises either a human body part or an article of clothing.

In a possible implementation manner, the performing edge extraction on each of the N sub-image regions respectively includes:

for each sub-image area, carrying out filtering processing on the sub-image area to obtain a filtering image;

calculating gradient data of each pixel point in the filtering image to obtain a gradient image;

according to the gradient data of each pixel point, filtering the pixel points included in the gradient image to obtain the residual pixel points which are not filtered;

based on the gradient strength of the residual pixel points and the two set thresholds, screening the residual pixel points to obtain screened pixel points;

and connecting the screened pixel points to obtain the edge extraction result.

In a possible implementation manner, the gradient data includes a gradient strength and a gradient direction, and the filtering processing of the pixel points included in the gradient image according to the gradient data of each pixel point includes:

for each pixel point in the gradient image, comparing the gradient strength of the pixel point with the gradient strength of two adjacent pixel points;

if the gradient intensity of the pixel point is greater than the gradient intensities of the two pixel points, the pixel point is reserved;

if the gradient intensity of the pixel point is minimum or smaller than the gradient intensity of any one of the two pixel points, filtering the pixel point;

wherein the two adjacent pixel points are located in the gradient direction of the pixel point, on both sides of the pixel point.

In one possible implementation, the two thresholds include a first threshold and a second threshold, the first threshold being greater than the second threshold;

the screening processing is carried out on the residual pixel points based on the gradient strength of the residual pixel points and two set thresholds, and the screening processing comprises the following steps:

for each pixel point in the residual pixel points, if the gradient intensity of the pixel point is greater than the set first threshold value, the pixel point is reserved and marked as a first-type pixel point; or,

if the gradient strength of the pixel point is smaller than the first threshold value and larger than the second threshold value, and the pixel points adjacent to the pixel point comprise a first-type pixel point, the pixel point is reserved and marked as a second-type pixel point; or,

if the gradient strength of the pixel point is smaller than the first threshold value and larger than the second threshold value, and the pixel points adjacent to the pixel point do not comprise a first-type pixel point, filtering the pixel point; or,

and if the gradient strength of the pixel point is smaller than the second threshold value, filtering the pixel point.

In a possible implementation manner, the performing color block extraction on each sub-image region of the N sub-image regions includes:

for each sub-image area, acquiring the color average value of all pixel points in the sub-image area;

and configuring the color value of each pixel point in the sub-image area as the color average value to obtain a color block corresponding to the sub-image area.

In a possible implementation manner, the obtaining the color average value of all pixel points in the sub-image region includes:

respectively acquiring a first color average value of all pixel points in the sub-image region in an R channel, a second color average value in a G channel and a third color average value in a B channel;

the configuring the color value of each pixel point in the sub-image region as the color average value includes:

and configuring the color value of each pixel point in the sub-image area in the R channel as the first color average value, configuring the color value in the G channel as the second color average value, and configuring the color value in the B channel as the third color average value.
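As a purely illustrative sketch (not part of the claimed solution), the per-channel averaging described above could be implemented as follows, assuming that image is an H x W x 3 RGB array and region_mask is a boolean mask marking one sub-image region; both names are introduced only for this example.

```python
import numpy as np

def extract_color_block(image, region_mask):
    """Fill one sub-image region with its per-channel average color."""
    block = image.copy()
    for c in range(3):                               # R, G and B channels
        channel = block[..., c]                      # view into the copied image
        mean_value = channel[region_mask].mean()     # first/second/third color average value
        channel[region_mask] = mean_value            # configure every pixel in the region
    return block
```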

In a possible implementation manner, the number of key points included in the human body key point prediction result is greater than a target threshold, and the predicting of the human body key points in the human body region of the human body image includes:

predicting key points of the human body image on the basis of a key point prediction model;

the key point prediction model is obtained by training a deep neural network based on a specified training data set, each sample human body image in the specified training data set corresponds to label information, and the label information marks corresponding mapping points when the marking points in the sample human body images are mapped to corresponding three-dimensional human body models;

the generation process of the label information comprises the following steps: firstly, segmenting the human body parts of the sample human body image; then sampling each segmented human body part with approximately equidistant marker points to obtain a plurality of marker points marking that human body part; and locating, on the three-dimensional human body model, the mapping point corresponding to each marker point.

In a possible implementation manner, the drawing a wrinkle on the fused image based on the human body key point prediction result and the set wrinkle occurrence rule to obtain a target image includes:

generating a plurality of selectable items based on the human keypoint prediction result and the wrinkle occurrence rule; each selectable item corresponds to two human key points in the human key point prediction result;

displaying the plurality of selectable items;

determining M target selectable items selected by a user in the plurality of selectable items, wherein the value of M is a positive integer;

for each target selectable item, taking two human body key points corresponding to the target selectable item as a starting point and an end point of a fold to be drawn respectively;

and connecting the determined starting point and the corresponding end point to obtain a fold drawn on the fused image.

In one possible implementation, the generating a plurality of selectable items based on the human keypoint prediction result and the wrinkle occurrence rule includes:

determining a wrinkle occurrence region based on the wrinkle occurrence rule; wherein the wrinkle occurrence region refers to a region on a human body where wrinkles exist;

screening human key points from the human key point prediction result according to the determined fold occurrence area;

generating the plurality of selectable items according to the screened human key points; wherein each selectable item corresponds to two of the screened human key points.
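The option-generation step described above may be sketched, purely for illustration, as follows. The representation of key points as (name, region, coordinates) tuples and of the wrinkle occurrence rule as a set of region names are assumptions made only for this example and are not prescribed by the disclosure.

```python
from itertools import combinations

def generate_selectable_items(keypoints, wrinkle_regions):
    """keypoints: iterable of (name, region, (x, y)); wrinkle_regions: set of region names."""
    # Screen out the key points that fall inside a wrinkle occurrence region.
    candidates = [kp for kp in keypoints if kp[1] in wrinkle_regions]
    # Each selectable item corresponds to two of the screened key points.
    return list(combinations(candidates, 2))

# Example usage with hypothetical key points:
items = generate_selectable_items(
    [("left_elbow", "elbow", (120, 200)),
     ("left_wrist", "elbow", (140, 260)),
     ("chest", "torso", (160, 150))],
    wrinkle_regions={"elbow"},
)
```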

In one possible implementation, connecting the determined start point and the corresponding end point to obtain a wrinkle drawn on the fused image includes:

and connecting the determined starting point and the corresponding end point by adopting a Bezier curve to obtain a fold drawn on the fused image.

In one possible implementation, the connection rule of the bezier curve includes:

randomly generating a first included angle value and a second included angle value within a specified included angle value interval;

taking the first included angle value as the tangential direction of the starting point, and taking the second included angle value as the tangential direction of the end point; generating the wrinkle based on a tangential direction of the starting point and a tangential direction of the ending point;

the first included angle value is the angle between a first tangent line passing through the starting point and a specified straight line, and the second included angle value is the angle between a second tangent line passing through the end point and the specified straight line; the first tangent line and the second tangent line are located on the same side of the specified straight line, and the specified straight line passes through the starting point and the end point.
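A minimal sketch of this connection rule, assuming a cubic Bezier curve rendered with OpenCV, is given below; the angle interval, the sampling resolution and the drawing color are illustrative assumptions rather than values fixed by the disclosure.

```python
import numpy as np
import cv2

def _rotate(v, theta):
    """Rotate a 2-D vector v by theta radians."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([c * v[0] - s * v[1], s * v[0] + c * v[1]])

def draw_wrinkle(image, start, end, angle_range=(0.2, 0.6)):
    """Connect two key points with a cubic Bezier curve whose end tangents
    form randomly sampled angles with the straight line through start/end."""
    p0, p3 = np.asarray(start, dtype=float), np.asarray(end, dtype=float)
    chord = p3 - p0
    length = np.linalg.norm(chord)
    u = chord / length                                 # direction of the specified straight line
    a1, a2 = np.random.uniform(*angle_range, size=2)   # first / second included angle values (radians)
    # Tilting the start tangent by +a1 and the end tangent by -a2 keeps
    # both tangents on the same side of the line through start and end.
    p1 = p0 + _rotate(u,  a1) * length / 3.0
    p2 = p3 - _rotate(u, -a2) * length / 3.0
    t = np.linspace(0.0, 1.0, 32)[:, None]
    curve = ((1 - t) ** 3 * p0 + 3 * (1 - t) ** 2 * t * p1
             + 3 * (1 - t) * t ** 2 * p2 + t ** 3 * p3)
    pts = curve.astype(np.int32).reshape(-1, 1, 2)
    cv2.polylines(image, [pts], False, (60, 60, 60), 1)  # draw the wrinkle on the fused image
    return image
```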

All the above optional technical solutions may be combined arbitrarily to form the optional embodiments of the present disclosure, and are not described herein again.

FIG. 3 is a flow chart illustrating an image processing method according to an exemplary embodiment. As shown in FIG. 3, the method is used in the electronic device shown in FIG. 1 and includes the following steps.

In step 301, the electronic device obtains a human body image to be processed, and performs partition processing on a human body region of the image to be processed based on an image semantic segmentation model to obtain a mask for indicating N sub-image regions.

In the embodiments of the present disclosure, a human body image refers to an image that includes a human body.

In addition, the to-be-processed human body image acquired by the electronic device may be a video frame from a video call or a live video broadcast, an image currently being shot or previously shot by the user, or a video frame from a pre-recorded video, which is not specifically limited in the embodiments of the present disclosure.

The embodiments of the present disclosure illustrate the whole image processing flow with a single image as an example; the flow can likewise be applied to multiple images or to each video frame in a video.

This step is used to predict different partitions of the human body on the human body image to be processed by a trained convolutional neural network (i.e., image semantic segmentation model).

Exemplarily, after the human body partition processing by the image semantic segmentation model, the mask of each partition (also referred to as a sub-image region in this document) is obtained. Taking the N sub-image regions as an example, as shown in step (1) in fig. 4, the mask corresponding to each sub-image region may be represented by a different, highly distinguishable color, where the value of N is a positive integer.

The first point to be noted is that the human body partitioning determines the number of color blocks in the subsequent color block extraction: however many partitions are obtained in this step, that many color blocks will be obtained when color block extraction is finally performed.

The second point to be noted is that the N sub-image regions may include only human body parts; as shown in step (1) in fig. 4, after the human body partition processing, masks of the head, the upper garment, the lower garment, the two arms and the two legs are obtained. Step (1) in fig. 4 shows the most basic human body partition mode, and the final picture result is close to a simpler-style two-dimensional picture in which the upper garment and the lower garment are each a single solid color rather than two or more colors.

The third point to be noted is that, besides the human body parts, the N sub-image regions may further include a first type of decoration that decorates the human body and a second type of decoration that decorates the apparel. For example, the first type of decoration may be a tie, and the second type of decoration may be a pocket or a logo on the upper or lower garment. That is, the human body region sometimes includes finer partitions (such as a tie, a pocket on clothes, or a logo), and the convolutional neural network can also be trained to predict these partitions when performing the human body partition processing, so that the picture contains richer detail.

As described above, in the embodiment of the present disclosure, the human body image to be processed is subjected to semantic segmentation processing based on the image semantic segmentation model, so as to obtain the N sub-image regions. The image semantic segmentation model is sensitive to the edges generally, so that the image semantic segmentation model can be used for obtaining more accurate segmentation edges, and the segmentation effect is ensured. In one possible implementation, the training process of the image semantic segmentation model includes, but is not limited to:

3011. and acquiring a sample human body image and a labeling segmentation result of the sample human body image, and inputting the sample human body image into a convolutional neural network.

The number of sample human body images may be in the thousands, and each training sample image corresponds to a manually annotated segmentation result.

Illustratively, the labeling segmentation result is obtained by manually labeling each region included in the human body region of the sample human body image.

3012. And determining whether the prediction segmentation result of the sample human body image output by the convolutional neural network is matched with the annotation segmentation result based on the target loss function.

As an example, the target loss function may be a cross-entropy loss function, and the convolutional neural network may be a fully convolutional network, which is not particularly limited in the embodiments of the present disclosure.

3013. And if the prediction segmentation result does not match the annotation segmentation result, iteratively updating the network parameters of the convolutional neural network until the model converges, so as to obtain the image semantic segmentation model.
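The following is a minimal training sketch consistent with steps 3011 to 3013, written in PyTorch with a fully convolutional network and a cross-entropy loss; the number of classes, the optimizer and the learning rate are illustrative assumptions, and the disclosure does not fix a particular network or particular hyper-parameters.

```python
import torch
import torch.nn as nn
from torchvision.models.segmentation import fcn_resnet50

num_classes = 8                                   # illustrative: background + 7 body/clothing partitions
model = fcn_resnet50(num_classes=num_classes)     # a fully convolutional segmentation network
criterion = nn.CrossEntropyLoss()                 # the target loss function of step 3012
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

def train_step(images, label_masks):
    """images: (B, 3, H, W) float tensor; label_masks: (B, H, W) long tensor of partition ids."""
    optimizer.zero_grad()
    logits = model(images)["out"]                 # predicted segmentation result (B, C, H, W)
    loss = criterion(logits, label_masks)         # compare prediction with the annotation (step 3012)
    loss.backward()                               # step 3013: iteratively update the parameters
    optimizer.step()
    return loss.item()
```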

In step 302, the electronic device performs edge extraction on the N sub-image regions, respectively.

This step performs edge extraction on the result obtained in the previous step. Since the human body has already been partitioned in the previous step and the mask of each partition is represented by a different, highly distinguishable color, edge extraction can be performed directly on that result; exemplarily, a Canny edge extractor can be used.

Canny edge extraction is a technique for extracting useful structural information from different visual objects while greatly reducing the amount of data to be processed, and it is widely used in various computer vision systems. In general, the Canny edge extraction algorithm can be divided into the following steps:

3021. and for each sub-image area, carrying out filtering processing on the sub-image area to obtain a filtering image.

In this step, a Gaussian filter can be used to perform Gaussian filtering to smooth the image and filter out noise.

In order to reduce the influence of noise on the edge extraction result as much as possible, the noise must be filtered out to prevent false detections caused by it. To smooth the image, a Gaussian filter is convolved with the image, reducing the evident effect of noise on the edge extractor. In addition, the choice of the Gaussian convolution kernel size affects the performance of the Canny edge extractor: the larger the kernel, the less sensitive the edge extractor is to noise, but the localization error of the edge extraction also increases slightly.
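As an illustrative sketch only, the smoothing of step 3021 could be performed with OpenCV as follows; the 5 x 5 kernel size and the sigma value are assumptions chosen for this example, with the trade-off noted above (a larger kernel suppresses noise more strongly but slightly increases the localization error).

```python
import cv2

def smooth(sub_image):
    """Step 3021: grayscale conversion followed by Gaussian filtering."""
    gray = cv2.cvtColor(sub_image, cv2.COLOR_BGR2GRAY)
    return cv2.GaussianBlur(gray, (5, 5), sigmaX=1.4)   # illustrative kernel size and sigma
```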

3022. And calculating the gradient data of each pixel point in the filtering image to obtain a gradient image.

The gradient data comprises gradient strength and gradient direction, and the gradient strength and the gradient direction of each pixel point in the filtering image are calculated.

Since edges in an image can point in various directions, the Canny edge extraction algorithm uses four operators to detect horizontal, vertical and diagonal edges in the image. The edge detection operators return the first derivative values in the horizontal (Gx) and vertical (Gy) directions, from which the gradient strength and gradient direction of each pixel point can be determined.
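Illustratively, step 3022 could be sketched with Sobel operators as follows, where filtered is assumed to be the smoothed grayscale sub-image produced by the previous step.

```python
import cv2
import numpy as np

def gradient_image(filtered):
    """Step 3022: per-pixel gradient strength and gradient direction."""
    gx = cv2.Sobel(filtered, cv2.CV_64F, 1, 0, ksize=3)  # first derivative in the horizontal direction (Gx)
    gy = cv2.Sobel(filtered, cv2.CV_64F, 0, 1, ksize=3)  # first derivative in the vertical direction (Gy)
    strength = np.hypot(gx, gy)                          # gradient strength
    direction = np.arctan2(gy, gx)                       # gradient direction in radians
    return strength, direction
```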

3023. And according to the gradient data of each pixel point, filtering the pixel points included in the gradient image to obtain the residual pixel points which are not filtered.

In this step, non-maximum suppression is applied to eliminate spurious responses caused by edge detection. Non-maximum suppression is an edge-thinning technique: its effect is to "thin" the edges. After the gradient of the image has been computed, edges extracted from the gradient values alone remain blurred, whereas non-maximum suppression suppresses all gradient values outside the local maxima to 0. In other words, where grayscale changes are concentrated, only the pixel with the largest grayscale change along the gradient direction within the local range is retained and the others are discarded, so a large proportion of pixel points can be eliminated. An edge that is several pixels wide thereby becomes an edge a single pixel wide; that is, a "fat" edge becomes a "thin" edge.

In a possible implementation manner, filtering the pixel points included in the gradient image according to the gradient data of each pixel point comprises: for each pixel point in the gradient image, comparing the gradient strength of the pixel point with the gradient strengths of its two adjacent pixel points; if the gradient strength of the pixel point is greater than the gradient strengths of those two pixel points, the pixel point is retained; if the gradient strength of the pixel point is the smallest, or is smaller than the gradient strength of either of the two pixel points, the pixel point is filtered out. The two adjacent pixel points lie along the gradient direction of the pixel point, one on each side of it.
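The comparison described above can be sketched as follows; this is a straightforward per-pixel loop for clarity, whereas production code would vectorize it or simply rely on cv2.Canny, which performs these steps internally:

```python
import numpy as np

def non_max_suppression(strength, direction):
    """Keep a pixel only if its gradient strength is the local maximum along the
    gradient direction, compared with its two neighbors on either side."""
    h, w = strength.shape
    out = np.zeros_like(strength)
    angle = np.rad2deg(direction) % 180           # fold directions into [0, 180)
    for i in range(1, h - 1):
        for j in range(1, w - 1):
            a = angle[i, j]
            if a < 22.5 or a >= 157.5:            # roughly horizontal gradient
                n1, n2 = strength[i, j - 1], strength[i, j + 1]
            elif a < 67.5:                        # roughly 45 degrees
                n1, n2 = strength[i - 1, j + 1], strength[i + 1, j - 1]
            elif a < 112.5:                       # roughly vertical gradient
                n1, n2 = strength[i - 1, j], strength[i + 1, j]
            else:                                 # roughly 135 degrees
                n1, n2 = strength[i - 1, j - 1], strength[i + 1, j + 1]
            if strength[i, j] >= n1 and strength[i, j] >= n2:
                out[i, j] = strength[i, j]        # retained (remaining) pixel point
    return out
```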

3024. Based on the gradient strength of the remaining pixel points and two set thresholds, screen the remaining pixel points to obtain the screened pixel points, and connect the screened pixel points to obtain the edge extraction result.

That is, real and potential edges are determined by applying double-threshold (Double-Threshold) screening, and edge extraction is finally completed by suppressing isolated weak edges.

In one possible implementation, the two thresholds include a first threshold and a second threshold, the first threshold being greater than the second threshold; illustratively, the first threshold is also referred to as a high threshold and the second threshold is also referred to as a low threshold.

Illustratively, based on the gradient strength of the remaining pixel points and two set thresholds, the remaining pixel points are subjected to a screening process, including but not limited to:

For each of the remaining pixel points: if the gradient strength of the pixel point is greater than the set first threshold, the pixel point is retained and marked as a first-type pixel point; or, if the gradient strength of the pixel point is smaller than the first threshold and larger than the second threshold, and the neighborhood pixel points of the pixel point include a first-type pixel point, the pixel point is retained and marked as a second-type pixel point; or, if the gradient strength of the pixel point is smaller than the first threshold and larger than the second threshold, and the neighborhood pixel points of the pixel point do not include a first-type pixel point, the pixel point is filtered out; or, if the gradient strength of the pixel point is smaller than the second threshold, the pixel point is filtered out.

Wherein, the first-type pixel points are also called strong edge pixel points, and the second-type pixel points are also called weak edge pixel points.

After non-maximum suppression is applied, the remaining pixel points represent the actual edges in the image more accurately. However, some edge pixel points caused by noise and color variation still remain. To remove these spurious responses, edge pixel points with weak gradient values are filtered out while edge pixel points with high gradient values are retained; illustratively, this is achieved by selecting a high threshold and a low threshold. If the gradient value of an edge pixel point is higher than the high threshold, it is marked as a strong edge pixel point; if its gradient value is smaller than the high threshold and larger than the low threshold, it is marked as a weak edge pixel point; if its gradient value is smaller than the low threshold, the edge pixel point is suppressed.

In addition, the pixel points classified as strong edges are already confirmed as edges, because they are extracted from the true edges in the image. Weak edge pixel points, however, are ambiguous: they may come from real edges, or they may be caused by noise or color changes. To obtain an accurate edge extraction result, weak edge pixel points of the latter kind should be suppressed. Typically, weak edge pixel points caused by real edges are connected to strong edge pixel points, whereas noise responses are not. To track edge connectivity, each weak edge pixel point and its 8 neighborhood pixel points are checked; as long as one of the neighbors is a strong edge pixel point, the weak edge pixel point is kept as a real edge.
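A compact sketch of the double-threshold screening and the 8-neighborhood connectivity check described above (the threshold values are inputs chosen by the caller; in practice cv2.Canny(gray, low_thresh, high_thresh) bundles all of the preceding steps):

```python
import numpy as np

def double_threshold_and_link(nms_strength, high_thresh, low_thresh):
    """Mark strong (first-type) and weak (second-type) edge pixels, then keep a weak
    pixel only if at least one of its 8 neighbors is a strong edge pixel."""
    strong = nms_strength >= high_thresh
    weak = (nms_strength >= low_thresh) & ~strong
    edges = strong.copy()
    h, w = nms_strength.shape
    for i, j in zip(*np.nonzero(weak)):
        i0, i1 = max(i - 1, 0), min(i + 2, h)
        j0, j1 = max(j - 1, 0), min(j + 2, w)
        if strong[i0:i1, j0:j1].any():            # weak edge connected to a strong edge
            edges[i, j] = True
    return edges                                  # boolean edge extraction result
```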

Wherein, step (2) in fig. 4 shows the edge extraction result of the human body partition.

In addition, the execution sequence of the step 302 and the step 303 described below may be arbitrary, and the edge extraction step may be executed first, or the color block extraction step may be executed first, which is not specifically limited in this embodiment of the disclosure. The embodiments of the present disclosure are described only by taking the example of performing the edge extraction first and then performing the patch extraction.

In step 303, the electronic device performs color block extraction on the N sub-image regions, respectively, to obtain N color blocks, and superimposes the obtained edge extraction result on the N color blocks, to obtain a fused image.

This step is used for color block extraction. Color block extraction refers to uniformly configuring the color values of all pixel points included in each sub-image region to the same value, where the same value is determined according to the original color values of those pixel points.

In short, a color average is first extracted: for each partition, the color average of all the pixel points in the partition is calculated (for example, the R, G, and B channels are averaged separately), and then all the pixel points in the partition are painted with that average color. After the color blocks are extracted, the edges extracted in step 302 can be superimposed on the picture, giving the picture effect of step (3) in Fig. 4.

In detail, the color block extraction is performed on the N sub-image regions respectively, and the method comprises the following steps:

3031. For each sub-image region, acquire the color average value of all pixel points in the sub-image region.

Illustratively, the color average of all pixel points in the sub-image region is obtained, including but not limited to: acquiring a first color average value of all pixel points in the sub-image region in an R channel; acquiring a second color average value of all pixel points in the sub-image region in a G channel; and acquiring the third color average value of all pixel points in the sub-image area in the B channel.

3032. Configure the color value of each pixel point in the sub-image region as the color average value to obtain the color block corresponding to the sub-image region.

Correspondingly, configuring the color value of each pixel point in the sub-image area as a color average value, including: configuring the color value of each pixel point in the sub-image area in the R channel into a first color average value; configuring the color value of each pixel point in the sub-image area in the G channel as a second color average value; and configuring the color value of each pixel point in the sub-image area in the B channel as a third color average value.
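A simplified sketch of steps 3031-3032 and the subsequent superimposition, assuming each sub-image region is supplied as a boolean mask over an H×W×3 image (the channel order and edge color are illustrative choices):

```python
import numpy as np

def extract_color_blocks(image, region_masks):
    """For each sub-image region, average every channel over the region's pixels and
    paint the whole region with that average, producing one color block per region."""
    blocks = image.copy()
    for mask in region_masks:                      # one boolean H x W mask per sub-image region
        if not mask.any():
            continue
        for c in range(blocks.shape[2]):           # R, G, B channels averaged separately
            channel = blocks[..., c]
            channel[mask] = channel[mask].mean()   # fill the region with its average value
    return blocks

def overlay_edges(blocks, edges, edge_color=(0, 0, 0)):
    """Superimpose the edge extraction result (boolean mask) onto the color blocks."""
    fused = blocks.copy()
    fused[edges] = edge_color
    return fused                                   # fused image
```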

In step 304, the electronic device performs human key point prediction on a human body region of the human body image to obtain a human key point prediction result.

In one possible implementation, the human body key point prediction performed on the human body region of the human body image includes, but is not limited to: predicting the human body key points in the human body region of the human body image based on a key point prediction model. The key point prediction model is obtained by training a deep neural network on a specified training data set; each sample human body image in the specified training data set corresponds to label information, and the label information records the mapping points obtained when the mark points in the sample human body image are mapped onto the corresponding three-dimensional human body model. The label information is generated as follows: first, the human body parts of the sample human body image are segmented; then each segmented human body part is sampled with approximately equidistant mark points to obtain a plurality of mark points marking that human body part; and a mapping point corresponding to each mark point is located on the three-dimensional human body model.

Take dense human body key point prediction as an example; step (4) in Fig. 4 shows the dense human body key point prediction result. Illustratively, the predicted dense human body key points include, but are not limited to: the left shoulder, right shoulder, left elbow, right elbow, left wrist, right wrist, left palm, right palm, chest, left hip, right hip, left knee, right knee, left ankle, right ankle, left sole, and right sole, which is not specifically limited in the embodiments of the present disclosure.

Illustratively, the dense human body key point prediction is performed on a human body region of a human body image, and comprises the following steps: based on a dense pose estimation model (DensePose), dense human body key point prediction is carried out on a human body region of a human body image.

The dense pose estimation model is obtained by training a deep neural network on a specified training data set, and it is able to construct a dense mapping between a three-dimensional human body model and a two-dimensional human body image. In addition, the specified training data set is the DensePose-COCO data set, which contains about 50 thousand images from the COCO data set that were manually annotated; that is, the annotators manually constructed more than 5 million correspondences on the DensePose-COCO data set.

That is, what needs to be established is a dense correspondence between two-dimensional human body images and the three-dimensional human body model: each sample human body image in the specified training data set corresponds to label information, and the label information records the dense correspondence between the sample human body image and the corresponding three-dimensional human body model. The label information is generated as follows: the human body parts of the sample human body image are first segmented, each segmented human body part is then sampled with approximately equidistant mark points, and a mapping point corresponding to each mark point is located on the three-dimensional human body model.

In addition, the deep neural network (DensePose-RCNN) is a variant of Mask-RCNN: it achieves pixel-aligned prediction through the segmentation mask and ROI layers of Mask-RCNN, introduces a fully convolutional network on top of RoI Align, and classifies pixels. That is, DensePose-RCNN combines an R-CNN structure with feature pyramid network features and the RoI Align region feature aggregation structure. RoI Align is a region feature aggregation method that largely resolves the region mismatch caused by the two quantization steps in the RoI Pooling operation, thereby improving the accuracy of the prediction result.

It should be noted that the DensePose algorithm maps two-dimensional image coordinates onto a three-dimensional model using deep learning and processes the dense coordinates at a rate of multiple frames per second, finally achieving accurate localization and pose estimation of dynamic objects. In detail, the DensePose algorithm can project the surface pixels of a human body in a two-dimensional human body image onto the surface of a three-dimensional human body; it can also transform the three-dimensional model after estimating the UV coordinates of the human body mark points in the two-dimensional image, converting spatial coordinates into UV coordinates that are then attached to the two-dimensional human body image. That is, after the UV coordinates of the mark points are manually annotated, the surface of a three-dimensional figure can be projected onto the two-dimensional image through a transformation, and an appropriate transformation can be applied according to the pose of the human body in the two-dimensional image, so that the surface of the three-dimensional model fits tightly to the two-dimensional human body.
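For the purposes of the wrinkle-drawing step, only the prediction result in a usable form is needed. The sketch below assumes a hypothetical `dense_pose_model` callable wrapping a pretrained dense pose estimator, and reduces its output to a mapping from named key points to pixel coordinates; the real DensePose output is richer, and this is only an interface sketch, not the disclosure's implementation.

```python
from typing import Callable, Dict, Tuple
import numpy as np

def predict_dense_keypoints(
    body_image: np.ndarray,
    dense_pose_model: Callable[[np.ndarray], Dict[str, Tuple[int, int]]],
) -> Dict[str, Tuple[int, int]]:
    """Return named dense human key points (e.g. 'left_elbow') mapped to (x, y)
    pixel coordinates -- the form consumed by the wrinkle drawing step below."""
    return dense_pose_model(body_image)
```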

In step 305, the electronic device draws a wrinkle on the fused image based on the human body key point prediction result and the set wrinkle occurrence rule, so as to obtain a target image.

The wrinkle occurrence rule may be set for a two-dimensional (anime) cartoon style, or for a sketch, an abstract drawing, or another painting style, which is not specifically limited in the embodiments of the present disclosure.

In a possible implementation manner, based on the human body key point prediction result and the set wrinkle occurrence rule, a wrinkle is drawn on the fused image to obtain a target image, including but not limited to:

Step a. Generate a plurality of selectable items based on the human body key point prediction result and the wrinkle occurrence rule, where each selectable item corresponds to two human body key points in the human body key point prediction result.

Illustratively, generating the plurality of selectable items based on the human body key point prediction result and the wrinkle occurrence rule includes, but is not limited to: determining a wrinkle occurrence region based on the wrinkle occurrence rule, where the wrinkle occurrence region is a region of the human body where wrinkles appear; screening human body key points from the human body key point prediction result according to the determined wrinkle occurrence region; and generating the plurality of selectable items from the screened human body key points.

Wherein, each selectable item corresponds to two human body key points among the screened human body key points. For example, the two key points "middle of the right front of the torso" and "lower right front of the torso" correspond to one selectable item (a sketch of this pairing is given after step c below).

Step b. Display the plurality of selectable items, and determine, among them, the M target selectable items selected by the user.

Wherein, the value of M is a positive integer. That is, after presenting the plurality of selectable items, the terminal may determine which selectable items are selected by the user, and draw a wrinkle according to the key points corresponding to the selectable items. For example, if the user selects a selectable item a corresponding to two key points, the middle of the right front of the torso and below the right front of the torso, the terminal knows that a wrinkle needs to be drawn between the two key points.

Step c. For each target selectable item, take the two human body key points corresponding to the target selectable item as the starting point and the end point, respectively, of a wrinkle to be drawn, and connect the determined starting point and the corresponding end point to obtain a wrinkle drawn on the fused image.
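A sketch of the pair selection in step a, assuming the key point prediction result is available as a name-to-coordinate dictionary and the wrinkle occurrence rule is given as a list of key point name pairs (the names are illustrative, not labels defined by this disclosure):

```python
from typing import Dict, List, Tuple

def generate_selectable_items(
    keypoints: Dict[str, Tuple[int, int]],
    wrinkle_rule: List[Tuple[str, str]],
) -> List[Tuple[Tuple[int, int], Tuple[int, int]]]:
    """Keep only the key point pairs named by the wrinkle occurrence rule whose key
    points were actually predicted; each remaining pair is one selectable item."""
    items = []
    for name_a, name_b in wrinkle_rule:   # e.g. ("torso_front_right_middle", "torso_front_right_lower")
        if name_a in keypoints and name_b in keypoints:
            items.append((keypoints[name_a], keypoints[name_b]))
    return items
```

The M items the user then selects from this list provide the start and end points that are fed to the curve drawing described below.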

In one possible implementation, the determined start point and the corresponding end point are connected to obtain a wrinkle drawn on the fused image, including but not limited to: and connecting the determined starting point and the corresponding end point by adopting a Bezier curve to obtain a fold drawn on the fused image. The bezier curve may be a second-order bezier curve.

For example, the connection rule of the second-order Bezier curve may include: randomly generating a first included angle value and a second included angle value within a specified included angle interval; taking the first included angle value as the tangential direction at the starting point and the second included angle value as the tangential direction at the end point; and generating the wrinkle based on the tangential direction at the starting point and the tangential direction at the end point.

The first included angle is the angle between a first tangent line passing through the starting point and a specified straight line, and the second included angle is the angle between a second tangent line passing through the end point and the specified straight line. The first tangent line and the second tangent line lie on the same side of the specified straight line, and the specified straight line passes through the starting point and the end point.

Illustratively, to ensure that the curve looks more natural, the specified angle value interval may be 10 to 20 degrees.
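A sketch of this connection rule: the control point of a second-order Bezier curve lies on both end-point tangents, so it can be taken as the intersection of the two tangent lines, each making a randomly drawn 10-20 degree angle with the straight line through the start and end points, on the same side. The sampling density and the use of numpy are illustrative choices, not requirements of this disclosure.

```python
import random
import numpy as np

def _rotate(v, angle_rad):
    c, s = np.cos(angle_rad), np.sin(angle_rad)
    return np.array([c * v[0] - s * v[1], s * v[0] + c * v[1]])

def quadratic_bezier_wrinkle(start, end, angle_range_deg=(10.0, 20.0), num_points=30):
    """Connect a wrinkle start and end point with a second-order Bezier curve whose
    end-point tangents make random 10-20 degree angles with the start-end line,
    both on the same side of that line."""
    p0, p2 = np.asarray(start, float), np.asarray(end, float)
    u = (p2 - p0) / np.linalg.norm(p2 - p0)              # direction of the specified straight line
    a1 = np.deg2rad(random.uniform(*angle_range_deg))    # first included angle
    a2 = np.deg2rad(random.uniform(*angle_range_deg))    # second included angle
    t1 = _rotate(u, a1)                                  # tangent direction at the starting point
    t2 = _rotate(-u, -a2)                                # tangent direction at the end point (same side)
    # Control point = intersection of the two tangent lines.
    A = np.array([[t1[0], -t2[0]], [t1[1], -t2[1]]])
    s, _ = np.linalg.solve(A, p2 - p0)
    p1 = p0 + s * t1
    t = np.linspace(0.0, 1.0, num_points)[:, None]
    return (1 - t) ** 2 * p0 + 2 * (1 - t) * t * p1 + t ** 2 * p2   # sampled curve points
```

The sampled points can then be rasterized onto the fused image, for example with cv2.polylines, to obtain the drawn wrinkle.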

Illustratively, take the two-dimensional (anime) stylization of the human body in the human body image as an example: the cartoon-type image processing is performed only on the human body, i.e., only the transformation of the body is considered, and the head and the background are ignored. When a human body is stylized in this way, wrinkles are essential; without them the image looks too simple and lacks detail, and the posture and dynamics of the human body are difficult to convey. However, the wrinkles on a human body are not a few randomly placed curves; they follow a relatively common distribution pattern. For example, wrinkles mainly appear at limb boundaries, joints, clothing boundaries, pockets, trouser creases, skirts, and other positions with a definite meaning.

Illustratively, wrinkles typically occur for several reasons:

3051. Gravity. That is, clothes droop under the influence of gravity: the upper part of the cloth clings to the body, the lower part hangs down loosely, and wrinkles are produced where it hangs.

3052. Bending. That is, when a joint bends, the clothing forms radial wrinkles at the joint.

3053. Looseness. That is, this kind of wrinkle has a wide edge and looks loose.

In one possible implementation, the types of wrinkles include, but are not limited to, the following: a. creases at the elbow; b. folds at the joints; c. folds formed where pieces of cloth are spliced and sewn together; d. folds produced when the arm is lifted and the cloth under the armpit is pulled upward; e. folds where the trouser leg drapes over the instep; f. wrinkles appearing in a skirt.

It should be noted that the type and style of the wrinkles in an actual cartoon may follow the picture style of a particular cartoon artist.

Based on the above description, to draw wrinkles it is necessary to identify the specific position of each human body key point in the picture; moreover, the key points must be dense enough, covering not just the shoulders, hips, elbows, knees, and so on, but dozens of key points, which is why dense human body key points need to be predicted in the above steps.

As an example, after the dense human body key points are obtained, the wrinkles may be drawn as follows: first, the two end points of each wrinkle are determined, and then each determined pair of end points is connected with a Bezier curve, so that the wrinkles look more natural and better match the abstract characteristics of the two-dimensional (anime) cartoon style.

In a possible implementation, the two end points of a wrinkle are selected according to the regions where wrinkles usually appear on the human body; for example, following the drape of the garment, a wrinkle may run from the middle of the right front of the torso to the lower right front of the torso, these two points being the start point and the end point of that wrinkle. For example, the wrinkle occurrence region may be a limb boundary, a joint, a clothing boundary, a pocket, a trouser crease, a skirt, and the like, and may follow the picture style of a cartoon artist; this is not limited in the present disclosure. Drawing the wrinkles on the human body from which the color blocks have been extracted yields the final picture effect shown in step (5) of Fig. 4.

The method provided by the embodiment of the disclosure has at least the following beneficial effects:

after the human body image to be processed is sequentially subjected to human body partitioning, edge extraction, color block extraction, and the like, the edge extraction result is superimposed on the extracted color blocks to form a fused image; then, human body key point prediction is performed on the human body region of the human body image to obtain a human body key point prediction result; and finally, wrinkles are drawn on the fused image based on the human body key point prediction result and the set wrinkle occurrence rule to obtain the target image. That is, the embodiments of the present disclosure can add wrinkles to the fused image based on the predicted human body key points and the wrinkle occurrence rule, so that the human body image to be processed is converted into an image with a certain painting style, which enriches the available image processing modes. For example, when the wrinkle occurrence rule is set for a two-dimensional (anime) cartoon style, a cartoon-type image with that style can be obtained. In addition, because the edge extraction and color block extraction are carried out after the human body is partitioned, both operations are semantically selective rather than disordered and random, which preserves the edge extraction effect even when the human body boundary and the background color are close. Because the wrinkles are then drawn based on the predicted human body key points, more accurate wrinkles can be obtained, the image does not become cluttered, and the wrinkles appear only where they are needed, so the image processing effect is better.

The middle image of Fig. 7 shows the effect of performing two-dimensional (anime) stylization on the human body contained in the human body image to be processed (the left image in Fig. 7). Here the cartoon-type image processing is performed only on the human body, i.e., only the transformation of the body is considered, and the head and the background are ignored. It is also conceivable to perform the two-dimensional stylization on the head and the background as well, obtaining the picture effect shown in the right image of Fig. 7; this is not specifically limited in the embodiments of the present disclosure.

Fig. 8 is a block diagram illustrating an image processing apparatus according to an exemplary embodiment. Referring to fig. 8, the apparatus includes an acquisition module 801, a first processing module 802, an extraction module 803, a fusion module 804, a prediction module 805, and a second processing module 806.

An acquisition module 801 configured to acquire a human body image to be processed;

a first processing module 802, configured to perform partition processing on a human body region of the human body image to obtain N sub-image regions, where the value of N is a positive integer;

an extracting module 803, configured to perform edge extraction and color block extraction on each sub-image region in the N sub-image regions, respectively, to obtain an edge extraction result and N color blocks, where the color block extraction refers to uniformly configuring color values of each pixel point included in each sub-image region to be the same value, and the same value is determined according to an original color value of each pixel point;

a fusion module 804 configured to superimpose the edge extraction result on the N color blocks to obtain a fusion image;

a prediction module 805 configured to perform human key point prediction on a human body region of the human body image to obtain a human key point prediction result;

and a second processing module 806, configured to draw a wrinkle on the fused image based on the human body key point prediction result and the set wrinkle occurrence rule, so as to obtain a target image.

According to the apparatus provided by the embodiments of the present disclosure, after the human body image to be processed is sequentially subjected to human body partitioning, edge extraction, color block extraction, and the like, the edge extraction result is superimposed on the extracted color blocks to form a fused image; then, human body key point prediction is performed on the human body region of the human body image to obtain a human body key point prediction result; and finally, wrinkles are drawn on the fused image based on the human body key point prediction result and the set wrinkle occurrence rule to obtain the target image. That is, the embodiments of the present disclosure can add wrinkles to the fused image based on the predicted human body key points and the wrinkle occurrence rule, so that the human body image to be processed is converted into an image with a certain painting style, which enriches the available image processing modes. For example, when the wrinkle occurrence rule is set for a two-dimensional (anime) cartoon style, a cartoon-type image with that style can be obtained. In addition, because the edge extraction and color block extraction are carried out after the human body is partitioned, both operations are semantically selective rather than disordered and random, which preserves the edge extraction effect even when the human body boundary and the background color are close. Because the wrinkles are then drawn based on the predicted human body key points, more accurate wrinkles can be obtained, the image does not become cluttered, and the wrinkles appear only where they are needed, so the image processing effect is better.

In a possible implementation manner, the first processing module is configured to perform partition processing on the human body region according to a human body part and clothing included in the human body region, to obtain masks for indicating the N sub-image regions, where the masks corresponding to each sub-image region are respectively represented by different colors; wherein, a sub-image area corresponds to a color block, and each sub-image area comprises a human body part or a dress.

In one possible implementation manner, the extraction module includes:

the first processing unit is configured to carry out filtering processing on each sub-image area to obtain a filtering image;

the calculation unit is configured to calculate gradient data of each pixel point in the filtering image to obtain a gradient image;

the second processing unit is configured to filter the pixel points included in the gradient image according to the gradient data of each pixel point to obtain the residual pixel points which are not filtered;

the third processing unit is configured to perform screening processing on the residual pixel points based on the gradient strength of the residual pixel points and the two set thresholds to obtain screened pixel points;

and the connecting unit is configured to connect the screened pixel points to obtain the edge extraction result.

In a possible implementation manner, the second processing unit is configured to, for each pixel point in the gradient image, compare the gradient strength of the pixel point with the gradient strengths of its two adjacent pixel points; if the gradient strength of the pixel point is greater than the gradient strengths of those two pixel points, retain the pixel point; if the gradient strength of the pixel point is the smallest, or is smaller than the gradient strength of either of the two pixel points, filter out the pixel point. The two adjacent pixel points lie along the gradient direction of the pixel point, one on each side of it.

In a possible implementation manner, the third processing unit is configured to, for each of the remaining pixels, if the gradient strength of the pixel is greater than the set first threshold, retain the pixel, and mark the pixel as a first-class pixel; or if the gradient strength of the pixel point is smaller than the first threshold and larger than the second threshold, and the neighborhood pixel points of the pixel point comprise the first type pixel point, reserving the pixel point, and marking the pixel point as a second type pixel point; or, if the gradient strength of the pixel point is smaller than the first threshold and larger than the second threshold, and the neighborhood pixel points of the pixel point do not include the first type pixel point, filtering the pixel point; or, if the gradient strength of the pixel point is smaller than the second threshold, filtering the pixel point.

In one possible implementation manner, the extraction module further includes:

the acquisition unit is configured to acquire the color average value of all pixel points in each sub-image area;

and the extraction unit is configured to configure the color value of each pixel point in the sub-image area as the color average value to obtain a color block corresponding to the sub-image area.

In a possible implementation manner, the obtaining unit is configured to obtain a first color average value of all pixel points in the sub-image region in an R channel, a second color average value in a G channel, and a third color average value in a B channel, respectively;

the extracting unit is configured to configure the color value of each pixel point in the sub-image region in the R channel as the first color average value, the color value in the G channel as the second color average value, and the color value in the B channel as the third color average value.

In one possible implementation manner, the second processing module includes:

a determination unit configured to generate a plurality of selectable items based on the human body key point prediction result and the wrinkle occurrence rule; each selectable item corresponds to two human key points in the human key point prediction result; displaying the plurality of selectable items; determining M target selectable items selected by a user in the plurality of selectable items, wherein the value of M is a positive integer; for each target selectable item, taking two human body key points corresponding to the target selectable item as a starting point and an end point of a fold to be drawn respectively;

and the drawing unit is configured to connect the determined starting point and the corresponding end point to obtain a fold drawn on the fused image.

In a possible implementation manner, the rendering unit is configured to connect the determined start point and the corresponding end point by using a bezier curve to obtain a wrinkle rendered on the fused image.

All the above optional technical solutions may be combined arbitrarily to form the optional embodiments of the present disclosure, and are not described herein again.

With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.

Fig. 9 shows a block diagram of an electronic device according to an exemplary embodiment of the present disclosure.

In general, the apparatus 900 includes: a processor 901 and a memory 902.

Processor 901 may include one or more processing cores, such as a 4-core processor, an 8-core processor, and so forth. The processor 901 may be implemented in at least one hardware form of a DSP (Digital Signal Processing), an FPGA (Field-Programmable Gate Array), and a PLA (Programmable Logic Array). The processor 901 may also include a main processor and a coprocessor, where the main processor is a processor for Processing data in an awake state, and is also called a Central Processing Unit (CPU); a coprocessor is a low power processor for processing data in a standby state. In some embodiments, the processor 901 may be integrated with a GPU (Graphics Processing Unit), which is responsible for rendering and drawing the content required to be displayed on the display screen. In some embodiments, the processor 901 may further include an AI (Artificial Intelligence) processor for processing computing operations related to machine learning.

Memory 902 may include one or more computer-readable storage media, which may be non-transitory. The memory 902 may also include high-speed random access memory, as well as non-volatile memory, such as one or more magnetic disk storage devices, flash memory storage devices. In some embodiments, a non-transitory computer readable storage medium in memory 902 is used to store at least one instruction for execution by processor 901 to implement an image processing method performed by an electronic device provided by method embodiments in the present disclosure.

In some embodiments, the apparatus 900 may further optionally include: a peripheral interface 903 and at least one peripheral. The processor 901, memory 902, and peripheral interface 903 may be connected by buses or signal lines. Various peripheral devices may be connected to the peripheral interface 903 via a bus, signal line, or circuit board. Specifically, the peripheral device includes: at least one of a radio frequency circuit 904, a touch display screen 905, a camera 906, an audio circuit 907, a positioning component 908, and a power supply 909.

The peripheral interface 903 may be used to connect at least one peripheral related to I/O (Input/Output) to the processor 901 and the memory 902. In some embodiments, the processor 901, memory 902, and peripheral interface 903 are integrated on the same chip or circuit board; in some other embodiments, any one or two of the processor 901, the memory 902 and the peripheral interface 903 may be implemented on a separate chip or circuit board, which is not limited by this embodiment.

The Radio Frequency circuit 904 is used for receiving and transmitting RF (Radio Frequency) signals, also called electromagnetic signals. The radio frequency circuitry 904 communicates with communication networks and other communication devices via electromagnetic signals. The radio frequency circuit 904 converts an electrical signal into an electromagnetic signal to transmit, or converts a received electromagnetic signal into an electrical signal. Optionally, the radio frequency circuit 904 comprises: an antenna system, an RF transceiver, one or more amplifiers, a tuner, an oscillator, a digital signal processor, a codec chipset, a subscriber identity module card, and so forth. The radio frequency circuit 904 may communicate with other terminals via at least one wireless communication protocol. The wireless communication protocols include, but are not limited to: the world wide web, metropolitan area networks, intranets, generations of mobile communication networks (2G, 3G, 4G, and 5G), Wireless local area networks, and/or WiFi (Wireless Fidelity) networks. In some embodiments, the radio frequency circuit 904 may also include NFC (Near Field Communication) related circuits, which are not limited by this disclosure.

The display screen 905 is used to display a UI (User Interface). The UI may include graphics, text, icons, video, and any combination thereof. When the display screen 905 is a touch display screen, the display screen 905 also has the ability to capture touch signals on or over the surface of the display screen 905. The touch signal may be input to the processor 901 as a control signal for processing. At this point, the display 905 may also be used to provide virtual buttons and/or a virtual keyboard, also referred to as soft buttons and/or a soft keyboard. In some embodiments, the display screen 905 may be one, providing the front panel of the device 900; in other embodiments, the display 905 may be at least two, respectively disposed on different surfaces of the device 900 or in a folded design; in still other embodiments, the display 905 may be a flexible display, disposed on a curved surface or on a folded surface of the device 900. Even more, the display screen 905 may be arranged in a non-rectangular irregular figure, i.e. a shaped screen. The Display panel 905 can be made of LCD (Liquid Crystal Display), OLED (Organic Light-Emitting Diode), and other materials.

The camera assembly 906 is used to capture images or video. Optionally, camera assembly 906 includes a front camera and a rear camera. Generally, a front camera is disposed at a front panel of the terminal, and a rear camera is disposed at a rear surface of the terminal. In some embodiments, the number of the rear cameras is at least two, and each rear camera is any one of a main camera, a depth-of-field camera, a wide-angle camera and a telephoto camera, so that the main camera and the depth-of-field camera are fused to realize a background blurring function, and the main camera and the wide-angle camera are fused to realize panoramic shooting and VR (Virtual Reality) shooting functions or other fusion shooting functions. In some embodiments, camera assembly 906 may also include a flash. The flash lamp can be a monochrome temperature flash lamp or a bicolor temperature flash lamp. The double-color-temperature flash lamp is a combination of a warm-light flash lamp and a cold-light flash lamp, and can be used for light compensation at different color temperatures.

Audio circuit 907 may include a microphone and a speaker. The microphone is used for collecting sound waves of a user and the environment, converting the sound waves into electric signals, and inputting the electric signals to the processor 901 for processing, or inputting the electric signals to the radio frequency circuit 904 for realizing voice communication. The microphones may be multiple and placed at different locations on the device 900 for stereo sound acquisition or noise reduction purposes. The microphone may also be an array microphone or an omni-directional pick-up microphone. The speaker is used to convert electrical signals from the processor 901 or the radio frequency circuit 904 into sound waves. The loudspeaker can be a traditional film loudspeaker or a piezoelectric ceramic loudspeaker. When the speaker is a piezoelectric ceramic speaker, the speaker can be used for purposes such as converting an electric signal into a sound wave audible to a human being, or converting an electric signal into a sound wave inaudible to a human being to measure a distance. In some embodiments, audio circuit 907 may also include a headphone jack.

The positioning component 908 is used to locate the current geographic location of the device 900 for navigation or LBS (Location Based Service). The positioning component 908 may be a positioning component based on the GPS (Global Positioning System) of the United States, the BeiDou system of China, or the Galileo system of the European Union.

A power supply 909 is used to supply power to the various components in the device 900. The power source 909 may be alternating current, direct current, disposable or rechargeable. When the power source 909 includes a rechargeable battery, the rechargeable battery may be a wired rechargeable battery or a wireless rechargeable battery. The wired rechargeable battery is a battery charged through a wired line, and the wireless rechargeable battery is a battery charged through a wireless coil. The rechargeable battery may also be used to support fast charge technology.

In some embodiments, the device 900 also includes one or more sensors 910. The one or more sensors 910 include, but are not limited to: acceleration sensor 911, gyro sensor 912, pressure sensor 913, fingerprint sensor 914, optical sensor 915, and proximity sensor 916.

The acceleration sensor 911 may detect the magnitude of acceleration in three coordinate axes of a coordinate system established with the apparatus 900. For example, the acceleration sensor 911 may be used to detect the components of the gravitational acceleration in three coordinate axes. The processor 901 can control the touch display 905 to display the user interface in a landscape view or a portrait view according to the gravitational acceleration signal collected by the acceleration sensor 911. The acceleration sensor 911 may also be used for acquisition of motion data of a game or a user.

The gyro sensor 912 may detect a body direction and a rotation angle of the device 900, and the gyro sensor 912 may cooperate with the acceleration sensor 911 to acquire a 3D motion of the device 900 by the user. The processor 901 can implement the following functions according to the data collected by the gyro sensor 912: motion sensing (such as changing the UI according to a user's tilting operation), image stabilization at the time of photographing, game control, and inertial navigation.

The pressure sensors 913 may be disposed on the side bezel of the device 900 and/or underneath the touch display screen 905. When the pressure sensor 913 is disposed on the side frame of the device 900, the user's holding signal of the device 900 may be detected, and the processor 901 performs left-right hand recognition or shortcut operation according to the holding signal collected by the pressure sensor 913. When the pressure sensor 913 is disposed at a lower layer of the touch display 905, the processor 901 controls the operability control on the UI interface according to the pressure operation of the user on the touch display 905. The operability control comprises at least one of a button control, a scroll bar control, an icon control and a menu control.

The fingerprint sensor 914 is used for collecting a fingerprint of the user, and the processor 901 identifies the user according to the fingerprint collected by the fingerprint sensor 914, or the fingerprint sensor 914 identifies the user according to the collected fingerprint. Upon recognizing that the user's identity is a trusted identity, processor 901 authorizes the user to perform relevant sensitive operations including unlocking the screen, viewing encrypted information, downloading software, paying, and changing settings, etc. The fingerprint sensor 914 may be disposed on the front, back, or side of the device 900. When a physical key or vendor Logo is provided on device 900, fingerprint sensor 914 may be integrated with the physical key or vendor Logo.

The optical sensor 915 is used to collect ambient light intensity. In one embodiment, the processor 901 may control the display brightness of the touch display 905 based on the ambient light intensity collected by the optical sensor 915. Specifically, when the ambient light intensity is high, the display brightness of the touch display screen 905 is increased; when the ambient light intensity is low, the display brightness of the touch display screen 905 is turned down. In another embodiment, the processor 901 can also dynamically adjust the shooting parameters of the camera assembly 906 according to the ambient light intensity collected by the optical sensor 915.

A proximity sensor 916, also known as a distance sensor, is typically provided on the front panel of the device 900. The proximity sensor 916 is used to capture the distance between the user and the front of the device 900. In one embodiment, the processor 901 controls the touch display 905 to switch from the bright screen state to the dark screen state when the proximity sensor 916 detects that the distance between the user and the front face of the device 900 is gradually decreased; when the proximity sensor 916 detects that the distance between the user and the front of the device 900 becomes gradually larger, the touch display 905 is controlled by the processor 901 to switch from the breath screen state to the bright screen state.

Those skilled in the art will appreciate that the configuration shown in fig. 9 does not constitute a limitation of the device 900 and may include more or fewer components than shown, or combine certain components, or employ a different arrangement of components.

In an exemplary embodiment, a computer-readable storage medium comprising instructions, such as a memory comprising instructions, executable by a processor of the electronic device 900 to perform the image processing method described above is also provided. Alternatively, the storage medium may be a non-transitory computer readable storage medium, which may be, for example, a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.

In an exemplary embodiment, a computer program product is also provided, in which instructions, when executed by a processor of an electronic device, enable the electronic device to perform the image processing method as described in the above method embodiment.

Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.

It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.
