Image processing method and device, electronic device and storage medium

Document No.: 197392    Publication date: 2021-11-02

Reading note: This technology, "Image processing method and device, electronic device and storage medium", was created by 王柏润, 张学森, 刘春亚, 陈景焕 and 伊帅 on 2021-05-19. Its main content is as follows.

The disclosure relates to an image processing method and apparatus, a neural network training method and apparatus, a motion recognition method and apparatus, an electronic device, and a storage medium. The image processing method comprises the following steps: acquiring, from an image, a human body detection frame, a target key point corresponding to a target body part, and first association relation information between the human body detection frame and the target key point; generating a target detection frame for the target body part according to the target key point and the human body detection frame; and determining third association relation information according to the first association relation information and pre-labeled second association relation information, wherein the second association relation information represents an association relation between a first body part and the human body detection frame, and the third association relation information represents an association relation between the target detection frame and a first detection frame for the first body part.

1. An image processing method, comprising:

acquiring, from an image, a human body detection frame, a target key point corresponding to a target body part, and first association relation information between the human body detection frame and the target key point;

generating a target detection frame for the target body part according to the target key point and the human body detection frame; and

determining third association relation information according to the first association relation information and second association relation information labeled in advance, wherein,

the second association relation information represents an association relation between a first body part and the human body detection frame, and

the third association relation information represents an association relation between the target detection frame and a first detection frame for the first body part.

2. The image processing method according to claim 1, wherein the acquiring of the human body detection frame in the image, the target key point corresponding to the target body part, and the first association relation information between the human body detection frame and the target key point comprises:

acquiring a human body detection frame in the image and human body key points in the human body detection frame;

extracting target key points corresponding to the target body part from the human body key points;

and generating first association relation information between the human body detection frame and the extracted target key points.

3. The image processing method according to claim 1, wherein

the target detection frame takes the target key point as a positioning point and bears a preset area ratio to at least one of the human body detection frame and a preset detection frame, and

the preset detection frame is a pre-labeled detection frame for a preset body part.

4. The image processing method according to claim 3, wherein the area of the target detection frame is determined according to the following parameters:

a first weight of the human body detection frame,

the preset area ratio between the human body detection frame and the target detection frame,

the area of the human body detection frame,

a second weight of the preset detection frame,

the preset area ratio between the preset detection frame and the target detection frame, and

the area of the preset detection frame.

5. The image processing method according to any one of claims 1-4, wherein the determining third association relation information according to the first association relation information and pre-labeled second association relation information comprises:

associating the first detection frame associated with the human body detection frame with the target detection frame to generate the third association relation information.

6. The image processing method according to any one of claims 1-5, wherein the method further comprises:

in a case where the target body part includes at least one of two first symmetric parts of a human body, acquiring orientation distinguishing information of the target body part.

7. The image processing method according to claim 6, wherein the determining third association relation information according to the first association relation information and pre-labeled second association relation information comprises:

acquiring orientation distinguishing information of the first body part in a case where the first body part includes at least one of two second symmetric parts of a human body;

associating, according to the first association relation information and the pre-labeled second association relation information, the first detection frame which is associated with the human body detection frame and has the same orientation distinguishing information with the target detection frame; and

generating the third association relation information according to a result of associating the first detection frame with the target detection frame.

8. The image processing method according to claim 6 or 7, wherein the acquiring orientation distinguishing information of the target body part comprises:

determining the orientation distinguishing information of the target body part based on the human body detection frame and the target key point corresponding to the target body part.

9. The image processing method according to any one of claims 6-8, wherein the method further comprises:

generating a relevance label of the target body part based on the third association relation information and the orientation distinguishing information of the target body part.

10. The image processing method according to any one of claims 1-9, wherein the first body part and the target body part are each one of the following: face, hand, elbow, knee, shoulder, and foot.

11. The image processing method according to any one of claims 1-10, further comprising:

generating fifth association relation information according to the second association relation information and pre-labeled fourth association relation information, wherein

the fourth association relation information represents an association relation between a second body part and the human body detection frame, and

the fifth association relation information represents an association relation between the target detection frame and a second detection frame for the second body part.

12. The image processing method according to claim 11, wherein

the first body part is different from the second body part, and

the second body part is one of the following: face, hand, elbow, knee, shoulder, and foot.

13. The image processing method according to any one of claims 1-12, further comprising:

displaying corresponding association relation indication information on the image according to the third association relation information, or according to the second association relation information and the third association relation information.

14. A method of training a neural network for detecting an association relation between body parts in an image, the method comprising:

training the neural network by using an image training set;

wherein the images in the image training set contain annotation information,

the annotation information includes association relation information between a first body part and a target body part in the images, and

the association relation information is determined according to the method of any one of claims 1-13.

15. A method of motion recognition, the method comprising:

recognizing a motion of a human body in an image based on association relation information between a first body part and a target body part in the image,

wherein the association relation information is obtained from a neural network trained by the method of claim 14.

16. An image processing apparatus, comprising:

a key point acquisition module, configured to acquire, from an image, a human body detection frame, a target key point corresponding to a target body part, and first association relation information between the human body detection frame and the target key point;

a detection frame generation module, configured to generate a target detection frame for the target body part according to the target key point and the human body detection frame; and

an association relation determination module, configured to determine third association relation information according to the first association relation information and pre-labeled second association relation information, wherein the second association relation information represents an association relation between a first body part and the human body detection frame, and the third association relation information represents an association relation between the target detection frame and a first detection frame for the first body part.

17. An apparatus for training a neural network, the neural network being configured to detect an association relation between body parts in an image, the apparatus comprising:

a training module, configured to train the neural network using an image training set;

wherein the images in the image training set contain annotation information, the annotation information including association relation information between a first body part and a target body part in the images, the association relation information being determined according to the method of any one of claims 1-13.

18. A motion recognition apparatus, comprising:

a recognition module, configured to recognize a motion of a human body in an image based on association relation information between a first body part and a target body part in the image, wherein the association relation information is obtained from a neural network trained by the method of claim 14.

19. An electronic device, comprising:

a memory; and

a processor;

wherein the memory is configured to store computer instructions executable by the processor, and the processor is configured to implement the method of any one of claims 1-15 when executing the computer instructions.

20. A computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the method of any one of claims 1-15.

Technical Field

The present disclosure relates to the field of image processing technologies, and in particular, to an image processing method and apparatus, an electronic device, and a storage medium.

Background

With the development of artificial intelligence technology, neural networks are increasingly applied to data detection and discrimination, reducing labor cost and improving efficiency and accuracy. Training a neural network requires large-scale labeled training samples as a training set. However, body parts in images currently cannot be labeled efficiently and accurately, so it is difficult to obtain sufficient training samples, which adversely affects the efficiency and accuracy of model training.

Disclosure of Invention

The present disclosure provides an image processing method and apparatus, an electronic device, and a storage medium to address shortcomings of the related art.

According to a first aspect of the present disclosure, there is provided an image processing method including: acquiring, from an image, a human body detection frame, a target key point corresponding to a target body part, and first association relation information between the human body detection frame and the target key point; generating a target detection frame for the target body part according to the target key point and the human body detection frame; and determining third association relation information according to the first association relation information and pre-labeled second association relation information, wherein the second association relation information represents an association relation between a first body part and the human body detection frame, and the third association relation information represents an association relation between the target detection frame and a first detection frame for the first body part.

According to a second aspect of the present disclosure, there is provided a method of training a neural network for detecting an association relation between body parts in an image, the method comprising: training the neural network using an image training set, wherein the images in the image training set contain annotation information, the annotation information comprising association relation information between a first body part and a target body part in the images, the association relation information being determined according to the method of the first aspect.

According to a third aspect of the present disclosure, there is provided a motion recognition method, the method comprising: recognizing a motion of a human body in an image based on association relation information between a first body part and a target body part in the image, wherein the association relation information is obtained from a neural network trained by the method of the second aspect.

According to a fourth aspect of the present disclosure, there is provided an image processing apparatus comprising: a key point acquisition module, configured to acquire, from an image, a human body detection frame, a target key point corresponding to a target body part, and first association relation information between the human body detection frame and the target key point; a detection frame generation module, configured to generate a target detection frame for the target body part according to the target key point and the human body detection frame; and an association relation determination module, configured to determine third association relation information according to the first association relation information and pre-labeled second association relation information, wherein the second association relation information represents an association relation between a first body part and the human body detection frame, and the third association relation information represents an association relation between the target detection frame and a first detection frame for the first body part.

According to a fifth aspect of the present disclosure, there is provided an apparatus for training a neural network for detecting an association relation between body parts in an image, the apparatus comprising: a training module, configured to train the neural network using an image training set, wherein the images in the image training set contain annotation information, the annotation information comprising association relation information between a first body part and a target body part in the images, the association relation information being determined according to the method of the first aspect.

According to a sixth aspect of the present disclosure, there is provided a motion recognition apparatus comprising: a recognition module, configured to recognize a motion of a human body in an image based on association relation information between a first body part and a target body part in the image, wherein the association relation information is obtained from a neural network trained by the method of the second aspect.

According to a seventh aspect of the present disclosure, there is provided an electronic device comprising a memory and a processor, the memory being configured to store computer instructions executable by the processor, and the processor being configured to implement the method of the first, second, or third aspect when executing the computer instructions.

According to an eighth aspect of the present disclosure, there is provided a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the method of the first, second or third aspect.

According to the above embodiments, by acquiring the human body detection frames in the image, the target key points corresponding to the target body parts, and the first association relation information between the human body detection frames and the target key points, the human body detection frame corresponding to each human body in the image and the target key points associated with each human body detection frame can be accurately acquired. A target detection frame for the target body part is then generated according to the target key points and the human body detection frame. Finally, third association relation information between the target body part and the first body part is determined according to the first association relation information and the pre-labeled second association relation information between the first body part and the human body detection frame, thereby automatically associating the target body part with the first body part. The determined third association relation information can serve as annotation information for the target body part in the image, which avoids the inefficiency of manual labeling and improves the efficiency of association labeling between body parts in images.

It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and together with the description, serve to explain the principles of the disclosure.

Fig. 1 is a flowchart illustrating an image processing method according to an embodiment of the present disclosure.

Fig. 2 is a schematic diagram illustrating a processing result of an image according to an embodiment of the present disclosure.

Fig. 3 is a schematic structural diagram of an image processing apparatus shown in an embodiment of the present disclosure.

Fig. 4 is a schematic structural diagram of an electronic device shown in an embodiment of the present disclosure.

Detailed Description

Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The implementations described in the exemplary embodiments below are not intended to represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present disclosure, as detailed in the appended claims.

The terminology used in the present disclosure is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. As used in this disclosure and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items.

It is to be understood that although the terms first, second, third, etc. may be used herein to describe various information, such information should not be limited to these terms. These terms are only used to distinguish one type of information from another. For example, first information may also be referred to as second information, and similarly, second information may also be referred to as first information, without departing from the scope of the present disclosure. The word "if" as used herein may be interpreted as "when", "upon", or "in response to determining", depending on the context.

With the development of artificial intelligence technology, neural networks can detect and discriminate data, reducing labor cost and improving efficiency and accuracy. Training a neural network requires large-scale labeled training samples as a training set. Human body images used for training a motion recognition model need to have the various parts of the human body labeled, and in the related art this labeling cannot be performed efficiently and accurately, which adversely affects the efficiency and accuracy of model training.

In view of this, in a first aspect, at least one embodiment of the present disclosure provides an image processing method. Referring to fig. 1, which illustrates the flow of the method, the method includes steps S101 to S103.

The image targeted by the image processing method may be an image for training a neural network model, where the neural network model may be a model for recognizing human motion; for example, the model may be used to recognize the motions of players in a table game scene. In an exemplary application scenario, a video of a table game may be recorded and input into the model, and the model may recognize the motion of each person in each frame of the video; the model may perform motion recognition by recognizing several parts of the human body. The image targeted by the image processing method includes at least one human body, and the positions of several body parts of the human body are labeled in advance, for example with rectangular frames.

In step S101, a human body detection frame, a target key point corresponding to a target body part, and first association relation information between the human body detection frame and the target key point are acquired from an image.

The image includes at least one human body, and each human body corresponds to one human body detection frame. A human body detection frame can completely enclose the corresponding human body and may be the smallest frame enclosing it. The shape of the human body detection frame may be a rectangle or another reasonable shape; the present disclosure is not particularly limited in this regard. The human body detection frame contains at least one target key point, and the target key points correspond to target body parts of the human body, such as the wrist, shoulder, or elbow. One target body part corresponds to at least one target key point. The numbers of target key points corresponding to different target body parts may be the same or different; the present disclosure is not particularly limited in this regard.

In this step, the human body detection frame may be acquired as follows: human body key points are detected from the image, the edges of the human body object are determined, and a human body detection frame enclosing the human body object is then constructed, thereby determining the position of the human body detection frame in the image. Specifically, when the human body detection frame is rectangular, the coordinates of the four vertices of the rectangular frame may be obtained.

Acquiring the target key point corresponding to the target body part may include obtaining position information of the target key point in the image, for example the position coordinates of one or more pixels corresponding to the target key point. The position of the target key point may be determined by performing target key point detection on the human body detection frame, or by detecting the target key point in the image according to the relative position characteristics of the target body part within the human body.

The first association relation information between the target key point and the human body detection frame includes the attribution relation between the target key point and the human body corresponding to the human body detection frame: when the target key point belongs to the human body in the human body detection frame, the target key point is associated with that human body detection frame; conversely, when the target key point does not belong to the human body in the human body detection frame, it is not associated with that frame. The first association relation information may be determined based on the positions of the human body detection frame and the target key point.
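
As an illustration only, this containment test can be sketched in a few lines of Python; the data layout (frames as corner coordinates, key points as pixel coordinates) and all names are assumptions of this sketch, not details taken from the disclosure.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) corner coordinates
Point = Tuple[float, float]              # (x, y) pixel coordinates

def contains(box: Box, point: Point) -> bool:
    """A key point is associated with a human detection frame when it lies inside it."""
    x1, y1, x2, y2 = box
    x, y = point
    return x1 <= x <= x2 and y1 <= y <= y2

def first_association(
    human_boxes: Dict[int, Box],
    target_keypoints: List[Point],
) -> Dict[int, List[Point]]:
    """Map each human detection frame id to the target key points it contains."""
    relation: Dict[int, List[Point]] = {i: [] for i in human_boxes}
    for kp in target_keypoints:
        for box_id, box in human_boxes.items():
            if contains(box, kp):
                relation[box_id].append(kp)
                break  # assume a key point belongs to at most one human body
    return relation
```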

In one example, the target body part includes any one of the following: face, hand, elbow, knee, shoulder, and foot; accordingly, the target key points corresponding to the target body part include any one of the following: face key points, hand key points, elbow key points, knee key points, shoulder key points, and foot key points.

In step S102, a target detection frame for the target body part is generated according to the target key point and the human body detection frame.

The target body part is a body part of the human body that needs to be labeled and/or associated in the image. According to the acquired positions of the target key points, an enclosing frame surrounding the target key points can be generated as the detection frame for the corresponding target body part.

When there are multiple target body parts to be labeled, they may be labeled in batches, in which case the detection frames for the target body parts are determined in batches in this step; alternatively, the target body parts may be labeled in sequence, in which case the detection frames are determined one by one.

In this step, the detection frame for the target body part is determined according to one or more target key points and the corresponding human body detection frame. The detection frame for the target body part may be used as a position label for the target body part.

As an example, fig. 2 shows a schematic diagram of detection frames for a target body part. As shown in fig. 2, the image includes three human bodies 210, 220, and 230, together with an elbow detection frame 212 corresponding to the human body 210, an elbow detection frame 222 corresponding to the human body 220, and an elbow detection frame 232 corresponding to the human body 230. The elbow detection frames 212 come in a pair, that is, they include the left elbow and the right elbow.

In step S103, third association relation information is determined according to the first association relation information and pre-labeled second association relation information, where the second association relation information represents an association relation between a first body part and the human body detection frame, and the third association relation information represents an association relation between the target detection frame and a first detection frame for the first body part.

The first body part may be a body part that has already been annotated, and its annotation information may include the position of the first detection frame for the first body part and its correspondence with a human body. Optionally, the annotation information of the first body part further includes, but is not limited to, at least one of a part name and orientation distinguishing information.

The second association relation information may be obtained based on the annotation information of the first body part; the association relation between the first body part and the human body detection frame may be determined from the association relation between the first body part and the human body within the human body detection frame.

The third association relation information may be determined as follows: the human body detection frame is associated with the target detection frame that is associated with it; then, according to the association result between the human body detection frame and the target detection frame and the second association relation information, the target detection frame associated with a given human body detection frame is associated with the first detection frame for the first body part associated with the same human body detection frame, thereby obtaining the third association relation information.
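
A minimal sketch of this bridging step, assuming the earlier steps have already mapped each target detection frame and each pre-labeled first detection frame to its human detection frame (all identifiers and structures here are illustrative):

```python
from typing import Dict, List, Tuple

def third_association(
    target_to_human: Dict[int, int],  # target frame id -> human frame id (first relation)
    first_to_human: Dict[int, int],   # first frame id -> human frame id (pre-labeled second relation)
) -> List[Tuple[int, int]]:
    """Pair target and first detection frames that share the same human detection frame."""
    pairs: List[Tuple[int, int]] = []
    for target_id, human_id in target_to_human.items():
        for first_id, labeled_human_id in first_to_human.items():
            if labeled_human_id == human_id:
                pairs.append((target_id, first_id))
    return pairs
```

Each returned pair corresponds to one item of third association relation information.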

In one example, the first body part is a human face and the target body part is an elbow; the third association relation information between the face and the elbow may then be determined according to the above method. Referring to fig. 2, three human bodies 210, 220, and 230 are shown. The first body part of the human body 210 is the face 211 and its target body part is the elbow 212, so third association relation information between the face 211 and the elbow 212 can be determined. Similarly, the first body part of the human body 220 is the face 221 and its target body part is the elbow 222, so third association relation information between the face 221 and the elbow 222 can be determined; the first body part of the human body 230 is the face 231 and its target body part is the elbow 232, so third association relation information between the face 231 and the elbow 232 can be determined.

It should be understood that the elbow is just one example of the target body part; in practical applications, the target body part may also be a wrist, a shoulder, a neck, a knee, and so on. In some scenarios, face information is used to distinguish different people and may be associated with their identity information. The above method uses the labeled face in the image, with the human body detection frame as an intermediary, to associate the face and the elbow of the same human body, so that the identity information of the human body corresponding to the elbow can be determined. This facilitates detecting, from an image, the association relation between a face and body parts other than the face, and determining the identity information of the person corresponding to those other body parts.

In another example, the first body part is a human hand and the target body part is an elbow, and third association relation information between the hand and the elbow may be determined. Referring to fig. 2, the first body part of the human body 210 is the hand 213 and its target body part is the elbow 212, so third association relation information between the hand 213 and the elbow 212 can be determined; the first body part of the human body 220 is the hand 223 and its target body part is the elbow 222, so third association relation information between the hand 223 and the elbow 222 can be determined; the first body part of the human body 230 is the hand 233 and its target body part is the elbow 232, so third association relation information between the hand 233 and the elbow 232 can be determined.

The target detection frame and the third association relation information can be used as annotation information for the target body part in the image, so the method achieves automatic labeling of the target body part. When a neural network for recognizing human motion or body parts is trained on such images, a large number of images can be labeled automatically and quickly, providing sufficient training samples for the training of the neural network and reducing the difficulty of obtaining them.

According to the above embodiment, the human body detection frame, the target key point corresponding to the target body part, and the first association relation information between the human body detection frame and the target key point are acquired from the image; a target detection frame for the target body part is then generated according to the target key point and the human body detection frame; finally, third association relation information between the target detection frame and the first detection frame for the first body part is determined according to the first association relation information and the pre-labeled second association relation information between the human body detection frame and the first body part. The target body part is thereby automatically associated with the first body part, and the association relation between them is labeled automatically, which avoids the inefficiency of manual labeling and improves the efficiency of association labeling between body parts in images.

In some embodiments of the present disclosure, the human body detection frame, the target key point, and the first association relation information between them may be acquired from an image in the following manner: first, a human body detection frame in the image and the human body key points within it are acquired; next, target key points corresponding to the target body part are extracted from the human body key points; finally, first association relation information between the human body detection frame and the extracted target key points is generated.

The human body detection frame contains at least one human body key point, and the human body key points may correspond to at least one body part of the human body, such as the wrist, shoulder, elbow, hand, foot, or face. One body part corresponds to at least one human body key point. The numbers of human body key points corresponding to different body parts may be the same or different; the present disclosure is not particularly limited in this regard.

In this step, the human body key points can be obtained as follows: the image is input into a neural network for detecting human body objects in images, and the position information of the human body key points output by the neural network is acquired. Optionally, the neural network may also output the position information of the human body detection frame. Such a neural network is a model trained on massive data; it can accurately extract features at each position of the image and recognize the image content from the extracted features. For example, it can recognize the human body key points in the image and determine their position information, and optionally recognize the human body detection frame in the image and determine its position information.

In this step, the corresponding edges of the human body can be determined based on the detected position information of the human body key points, and a human body detection frame enclosing the human body can then be constructed, thereby determining its position in the image. The attribution relation between the human body detection frame and the human body key points can be determined based on the positional containment relation between them in the image.

In one example, the body parts corresponding to the acquired human body key points include at least one of the following: face, hand, elbow, knee, shoulder, and foot; correspondingly, the human body key points include at least one of the following: face key points, hand key points, elbow key points, knee key points, shoulder key points, and foot key points.

In this step, the position information of all the human body key points can be screened according to the relative position characteristics of the target body part within the human body, so that the human body key points matching the relative position characteristics of the target body part are determined as the target key points. In one example, the human body detection frame contains face key points, hand key points, elbow key points, knee key points, shoulder key points, and foot key points; when the target part is the elbow, the elbow key points can be extracted from the human body key points as the target key points.
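
For illustration, assuming the pose estimator names its key points in a COCO-like style (e.g. "left_elbow", "right_elbow"), the extraction can be a simple name filter; the naming convention is an assumption of this sketch, not a requirement of the disclosure.

```python
from typing import Dict, Tuple

def extract_target_keypoints(
    human_keypoints: Dict[str, Tuple[float, float]],  # e.g. {"left_elbow": (x, y), ...}
    target_part: str,                                 # e.g. "elbow"
) -> Dict[str, Tuple[float, float]]:
    """Keep only the human key points whose name matches the target body part."""
    return {
        name: pos
        for name, pos in human_keypoints.items()
        if target_part in name  # matches both "left_elbow" and "right_elbow"
    }
```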

In this step, the first association relation information between the target key points and the human body detection frame may be determined according to the attribution relation between the extracted target key points and the human body detection frame.

In some embodiments of the present disclosure, the target detection frame takes the target key point as a positioning point and bears a preset area ratio to at least one of the human body detection frame and a preset detection frame, where the preset detection frame is a pre-labeled detection frame for a preset body part.

The positioning point of the target detection frame may be the center of the detection frame, that is, the target key point is the center of the target detection frame.

The preset area ratio may lie within a preset ratio interval, which can be obtained from prior knowledge such as ergonomics, or determined statistically from the area ratios of the target body part, the preset body part, and the human body in a set of images. The preset area ratios between the human body detection frame and the detection frames for different target body parts may differ; that is, the preset area ratio between each target detection frame and the human body detection frame can be set independently. Likewise, the preset area ratios between the detection frame for the target body part and different preset detection frames may differ; that is, the preset area ratio between the target detection frame and each preset detection frame can be set independently.

In this way, the target detection frame can be constructed quickly, achieving position labeling of the target body part.

In this step, the area of the target detection frame may be determined according to the following parameters: the first weight of the human body detection frame, the preset area ratio between the human body detection frame and the target detection frame, the area of the human body detection frame, the second weight of the preset detection frame, the preset area ratio between the preset detection frame and the target detection frame, and the area of the preset detection frame. That is, the target detection frame may satisfy only the preset area ratio with the human body detection frame (the first weight is 1 and the second weight is 0); or only the preset area ratio with the preset detection frame (the first weight is 0 and the second weight is 1); or it may satisfy the corresponding preset area ratios with both the human body detection frame and the preset detection frame, in which case the first weight and the second weight are both between 0 and 1 and sum to 1.

Specifically, the area of the target detection frame may be determined according to the following formula:

S = w1 × t1 × S1 + w2 × t2 × S2

where S is the area of the target detection frame, w1 is the first weight, t1 is the preset area ratio of the target detection frame to the human body detection frame, S1 is the area of the human body detection frame, w2 is the second weight, t2 is the preset area ratio of the target detection frame to the preset detection frame, and S2 is the area of the preset detection frame.

The target detection frame may have the same shape as the human body detection frame: for example, if the human body detection frame is rectangular, the target detection frame may also be rectangular, with the same length-to-width ratio as the human body detection frame. For example, if the preset area ratio between the target detection frame for the elbow and the human body detection frame is 1:9 and the human body detection frame is rectangular, the length and width of the human body detection frame can both be scaled down to 1/3 to obtain the length and width of the target detection frame.
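
Putting the pieces together, a hypothetical sketch of generating the target detection frame: the area follows the weighted formula above, the frame is centered on the target key point (its positioning point), and the aspect ratio is copied from the human body detection frame. The weights and ratios below are illustrative values only, not values from the disclosure.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)

def box_area(box: Box) -> float:
    x1, y1, x2, y2 = box
    return (x2 - x1) * (y2 - y1)

def generate_target_box(
    keypoint: Tuple[float, float],
    human_box: Box,
    preset_box: Box,
    w1: float = 0.5, t1: float = 1 / 9,  # weight and area ratio vs. the human frame
    w2: float = 0.5, t2: float = 2.0,    # weight and area ratio vs. the preset frame
) -> Box:
    """Build a rectangle centered on the key point with area S = w1*t1*S1 + w2*t2*S2."""
    area = w1 * t1 * box_area(human_box) + w2 * t2 * box_area(preset_box)
    hx1, hy1, hx2, hy2 = human_box
    aspect = (hx2 - hx1) / (hy2 - hy1)   # keep the human frame's width/height ratio
    height = (area / aspect) ** 0.5
    width = aspect * height
    cx, cy = keypoint
    return (cx - width / 2, cy - height / 2, cx + width / 2, cy + height / 2)
```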

The target detection frame may also differ in shape from the corresponding human body detection frame, and the detection frame shape may be preset per body part; for example, the human body detection frame may be rectangular while the face detection frame is circular. When the target detection frame and the human body detection frame are both rectangular, their aspect ratios may differ, and the aspect ratio of the rectangular detection frame may be preset per body part.

In some scenes, the size of the face can to some extent represent the depth information of the human body; that is, the area of the face detection frame can represent the human body's depth. The face can therefore be used as the preset body part, that is, the area of the target detection frame is determined by combining the human body detection frame and the face detection frame.

In an embodiment of the present disclosure, determining the target detection frame may mean determining the position of the detection frame for the target body part in the image; for example, when the detection frame is rectangular, the coordinates of its four vertices may be determined. In this embodiment, the target detection frame is generated according to constraints on shape, area, preset weights, and positioning-point position, so a more accurate target detection frame can be obtained, and the annotation information of the target body part generated from it is accordingly more accurate. In addition, the method automatically generates the target detection frame for the target body part, which avoids the inefficiency of manual labeling and improves the labeling efficiency for the target body part.

The parts of the human body include individual parts, such as the face and the neck, and symmetric parts, such as the hands, elbows, knees, shoulders, and feet. Symmetric parts occur in pairs and carry orientation distinguishing information that distinguishes the orientation of the body part within the human body, for example left and right: the orientation distinguishing information of the left hand, left elbow, and left arm is "left", and that of the right hand, right elbow, and right arm is "right". The first body part may be an individual part or a symmetric part, and likewise the target body part; the types of the first body part and the target body part determine how the third association relation information is generated.

In the first case, when the first body part is an individual part and the target body part is an individual part, the third association relation information may be generated as follows: the first detection frame for the first body part and the target detection frame for the target body part that are associated with the same human body detection frame are associated with each other to generate the third association relation information. For example, if the first body part is the face and the target body part is the neck, the third association relation information between the face and the neck is determined.

In the second case, when the first body part is an individual part and the target body part includes at least one of two first symmetric parts of a human body, the third association relation information is determined as follows: first, the orientation distinguishing information of the target body part is acquired; then, according to the first association relation information and the pre-labeled second association relation information, the first detection frame and the target detection frame associated with the same human body detection frame are associated to generate the third association relation information. The target detection frame, the third association relation information, and the orientation distinguishing information of the target body part can be used as annotation information of the target body part in the image.

For example, if the first body part is the face and the target body part includes the left elbow and the right elbow, the third association relation information between the face and the left elbow and between the face and the right elbow is determined. The detection frame for the left elbow, the third association relation information between the face and the left elbow, and the orientation distinguishing information "left" may then be used as the annotation information of the left elbow, and likewise the detection frame for the right elbow, the third association relation information between the face and the right elbow, and the orientation distinguishing information "right" as the annotation information of the right elbow.

In the third case, when the first body part includes at least one of two second symmetric parts of a human body and the target body part is an individual part, the third association relation information is determined as follows: first, the orientation distinguishing information of the first body part is acquired; then, according to the first association relation information and the pre-labeled second association relation information, the first detection frame and the target detection frame associated with the same human body detection frame are associated to generate the third association relation information. The target detection frame, the third association relation information, and the orientation distinguishing information of the first body part may be used as annotation information of the target body part in the image.

For example, if the target body part is the face and the first body part includes the left elbow, the third association relation information between the face and the left elbow is determined; the detection frame for the face, the third association relation information between the face and the left elbow, and the orientation distinguishing information "left" may then be used as the annotation information of the face.

In the fourth case, when the target body part includes at least one of two first symmetric parts of a human body and the first body part includes at least one of two second symmetric parts of a human body, the third association relation information is determined as follows: first, the orientation distinguishing information of the target body part and of the first body part is obtained; then, according to the first association relation information and the pre-labeled second association relation information, the first detection frame that is associated with the same human body detection frame and has the same orientation distinguishing information is associated with the target detection frame; finally, the third association relation information is generated according to the result of associating the first detection frame with the target detection frame. The target detection frame, the third association relation information, and the orientation distinguishing information of the target body part can be used as the annotation information of the target body part in the image.

For example, if the first body part includes the left hand and the right hand and the target body part includes the left elbow and the right elbow, then by matching frames that share the same human body detection frame and the same orientation distinguishing information, the third association relation information between the left hand and the left elbow and between the right hand and the right elbow may be determined. The detection frame for the left elbow, the third association relation information between the left hand and the left elbow, and the orientation distinguishing information "left" may then be used as the annotation information of the left elbow; the detection frame for the right elbow, the third association relation information between the right hand and the right elbow, and the orientation distinguishing information "right" may be used as the annotation information of the right elbow.
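
A sketch of the matching rule in this fourth case, assuming each detection frame has already been tagged with its human detection frame and its orientation distinguishing information (the data layout is illustrative, not from the disclosure):

```python
from typing import Dict, List, Tuple

def match_symmetric_parts(
    target_frames: Dict[int, Tuple[int, str]],  # target frame id -> (human frame id, "left"/"right")
    first_frames: Dict[int, Tuple[int, str]],   # first frame id -> (human frame id, "left"/"right")
) -> List[Tuple[int, int]]:
    """Pair frames that share both the human detection frame and the orientation."""
    pairs: List[Tuple[int, int]] = []
    for t_id, (t_human, t_side) in target_frames.items():
        for f_id, (f_human, f_side) in first_frames.items():
            if t_human == f_human and t_side == f_side:
                pairs.append((t_id, f_id))
    return pairs
```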

The second association relation information can be obtained based on the annotation information of the first body part; that is, the annotation information of the first body part may include the correspondence among the first body part, the human body, and the human body detection frame. The second association relation information may also be obtained from the correspondence between the human body detection frame and the human body key points within it; specifically, the correspondence among the first body part, the human body, and the human body detection frame can be obtained from the correspondence between the first body part and its human body key points, together with the correspondence between the human body key points and the human body detection frame.

The annotation information of the first body part may further include the orientation distinguishing information corresponding to the at least one second symmetric part, that is, each second symmetric part is labeled "left" or "right", so that the orientation distinguishing information of the first body part can be read from its annotation information. The orientation distinguishing information of the first body part may also be determined based on the human body detection frame and the human body key points corresponding to the first body part: the two second symmetric parts have different human body key points, so the orientation distinguishing information of a second symmetric part can be determined according to the human body key points it contains. If the key point is a left-side key point, the orientation distinguishing information of the corresponding second symmetric part is "left"; if it is a right-side key point, the orientation distinguishing information is "right". The orientation distinguishing information of the target body part may likewise be determined based on the human body detection frame and the target key points corresponding to the target body part, in the same way as for the first body part, which is not repeated here.
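
Since the two symmetric parts have distinct key points, the orientation can be read directly off the key point identity. A minimal sketch, assuming left/right-prefixed key point names as in the earlier extraction example (the naming scheme is an assumption of this sketch):

```python
def orientation_of(keypoint_name: str) -> str:
    """Return the orientation distinguishing information for a symmetric part's key point."""
    if keypoint_name.startswith("left_"):
        return "left"
    if keypoint_name.startswith("right_"):
        return "right"
    raise ValueError(f"{keypoint_name} does not belong to a symmetric part")
```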

The target detection frame for the target body part and the first detection frame for the first body part that are associated with the same human body detection frame may be determined according to the positional attribution relation; that is, the target detection frame and the first detection frame contained within the same human body detection frame are taken as the target detection frame and the first detection frame associated with that human body detection frame.

In the embodiments of the present disclosure, the third association relation information is determined in different manners for different types of the first body part and the target body part, which improves the accuracy of the association relation between the first body part and the target body part.

In the embodiments of the present disclosure, after the third association relation information is determined, a relevance label of the target body part may be generated based on the third association relation information and the orientation distinguishing information of the target body part.

When a neural network for recognizing human motion or recognizing body parts is trained on such images, the relevance label can be one of the labels of the target body part in the image. Further, the relevance label can contain orientation distinguishing information, so that the orientations of symmetric body parts can be distinguished, which further improves the labeling accuracy of the target body part and thus the training efficiency and quality of the neural network.

In some embodiments of the present disclosure, the image processing method further comprises: generating fifth association relation information according to the second association relation information and pre-labeled fourth association relation information, where the fourth association relation information represents an association relation between a second body part and the human body detection frame, and the fifth association relation information represents an association relation between the target detection frame and a second detection frame for the second body part.

The second body part is a labeled body part, and its annotation information may include the position of the detection frame for the second body part, the part name, orientation distinguishing information, the correspondence with a human body, and the like. The fourth association relation information may therefore be obtained based on the annotation information of the second body part; that is, the association relation between the second body part and the human body detection frame may be determined from the association relation between the second body part and the human body within the human body detection frame.

The fourth association relation information may also be obtained from the correspondence between the human body detection frame and the human body key points within it, in the same way as for the first body part, which is not repeated here.

There are four cases here as well: in the first case, the first body part and the second body part are both individual parts; in the second case, the first body part is a symmetric part and the second body part is an individual part; in the third case, the first body part is an individual part and the second body part is a symmetric part; and in the fourth case, the first body part and the second body part are both symmetric parts.

In one example, the first body part is different from the second body part, and the second body part is one of: face, hand, elbow, knee, shoulder and foot.

For example, if the first body part is the face and the second body part is the hand, fifth association relation information between the face and the hand can be determined. Referring to fig. 2, the first body part of the human body 210 is the face 211 and its second body part is the hand 213, so fifth association relation information between the face 211 and the hand 213 can be determined; the first body part of the human body 220 is the face 221 and its second body part is the hand 223, so fifth association relation information between the face 221 and the hand 223 can be determined; the first body part of the human body 230 is the face 231 and its second body part is the hand 233, so fifth association relation information between the face 231 and the hand 233 can be determined.

In the embodiments of the present disclosure, determining the fifth association relation information further enriches the annotation information of the image, so that the image can be applied to the training of a multi-task neural network, for example a neural network for detecting the associations among the elbow, the face, and the hand. This reduces the difficulty of sample collection for multi-task neural network training and can improve its training quality.

In some embodiments of the present disclosure, the image processing method further comprises: displaying corresponding association relation indication information on the image according to the third association relation information, or according to the second association relation information and the third association relation information.

The association relationship marking information may be displayed in the form of a connection line; that is, the third association relationship information may be displayed as a connection line between the target detection frame for the target body part and the first detection frame for the first body part.

In one example, where the target body part is the left hand and the first body part is the left elbow, after the third association relationship information between the left hand and the left elbow is determined, the detection frame for the left hand and the detection frame for the left elbow may be joined by a connection line serving as the corresponding association relationship marking information. Specifically, referring to fig. 2, which shows three human bodies 210, 220, and 230: the target body part of the human body 210 is the left hand 213 and the first body part is the left elbow 212, so the detection frames for the left hand 213 and the left elbow 212 may be connected by a line serving as the marking information of the third association relationship information between them; similarly, the detection frames for the left hand 223 and the left elbow 222 of the human body 220 may be connected, as may the detection frames for the left hand 233 and the left elbow 232 of the human body 230.
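
A rough sketch of this connection-line display is given below; the (x1, y1, x2, y2) box format and the use of OpenCV drawing primitives are assumptions of this illustration, not part of the disclosed method.

```python
import cv2  # OpenCV, assumed here purely for illustration

def draw_association(image, box_a, box_b, color=(0, 255, 0)):
    """Draw two detection frames and a connecting line between their centers.

    Boxes are assumed to be (x1, y1, x2, y2) pixel coordinates.
    """
    for (x1, y1, x2, y2) in (box_a, box_b):
        cv2.rectangle(image, (int(x1), int(y1)), (int(x2), int(y2)), color, 2)
    center = lambda b: (int((b[0] + b[2]) / 2), int((b[1] + b[3]) / 2))
    cv2.line(image, center(box_a), center(box_b), color, 2)
    return image
```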

Correspondingly, corresponding association relationship marking information may be displayed on the image according to the fifth association relationship information, or according to both the fourth association relationship information and the fifth association relationship information. The fifth association relationship information may be displayed as a connection line between the target detection frame for the target body part and the second detection frame for the second body part.

When the third association relationship information and the fifth association relationship information are both displayed on the image, association relationship marking information linking the first body part, the target body part, and the second body part is formed. For example, if the first body part is a human face, the target body part is a left elbow, and the second body part is a left hand, marking information associating the human face, the left elbow, and the left hand is formed. Specifically, referring to fig. 2, for the human body 210 the detection frame for the human face 211, the detection frame for the left elbow 212, and the detection frame for the left hand 213 may be connected in sequence to form the association relationship marking information of the human face 211, the left elbow 212, and the left hand 213; the same applies to the human face 221, left elbow 222, and left hand 223 of the human body 220, and to the human face 231, left elbow 232, and left hand 233 of the human body 230.

The association relationship marking information is not limited to display as connection lines; other modes may also be used, such as drawing the detection frames of different body parts associated with the same human body in the same color, or labeling the different parts of the same human body with the same person identifier.
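
The same-color variant could, for instance, be realized with a fixed palette indexed by a person identifier, as in the following sketch; the palette and the identifier scheme are illustrative assumptions.

```python
# Illustrative palette; any deterministic mapping from person id to a color
# works, as long as all frames of one human body share the same color.
PALETTE = [(255, 0, 0), (0, 255, 0), (0, 0, 255), (255, 255, 0)]

def color_for_person(person_id: int):
    """Pick a stable color per person so associated parts match visually."""
    return PALETTE[person_id % len(PALETTE)]
```

For example, `color_for_person(body_id)` could be passed as the `color` argument of the drawing sketch shown earlier.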

In the embodiments of the present disclosure, by displaying at least one of the third association relationship information and the fifth association relationship information, the labeling result can be presented visually, which makes it convenient for annotators to check the association labeling result. When the method is applied to human body motion detection and tracking, the detection and tracking results can likewise be presented through the association relationship marking information, so that the detection results of the association relationships can be evaluated.

According to a second aspect of embodiments of the present disclosure, there is provided a training method of a neural network for detecting an association relationship between body parts in an image, the method including: training the neural network by using an image training set; wherein the images in the image training set contain annotation information, the annotation information comprising association information between a first body part and a target body part in the images, the association information being determined according to the method of the first aspect.

Because the images in the image training set are labeled with the third association relationship information obtained by the above image processing method, more accurate and reliable annotation information is available, and the trained neural network for detecting the association relationships between body parts in images therefore has higher accuracy.
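
A schematic training loop over such an annotated image set might look as follows; the dataset interface, network, and loss are placeholders, since the disclosure does not fix a particular architecture or objective.

```python
import torch
from torch.utils.data import DataLoader

def train_association_network(model, dataset, epochs=10, lr=1e-4):
    """Sketch of supervised training on images annotated with part associations.

    `dataset` is assumed to yield (image_batch, association_labels) pairs;
    the binary cross-entropy objective is a placeholder choice.
    """
    loader = DataLoader(dataset, batch_size=16, shuffle=True)
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        for images, association_labels in loader:
            predictions = model(images)  # predicted association scores
            loss = torch.nn.functional.binary_cross_entropy_with_logits(
                predictions, association_labels)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return model
```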

According to a third aspect of the embodiments of the present disclosure, there is provided a motion recognition method, the method including: identifying the motion of a human body in an image based on the association relationship information of a first body part and a target body part in the image, wherein the association relationship information is obtained by a neural network trained by the method according to the second aspect.

Using the association relationship information between body parts predicted by the neural network for detecting the association relationships between body parts in images, different body parts of the same human body can be accurately associated during motion detection; the relative positions and angles between these parts can then be analyzed jointly to determine the human body motion, yielding a more accurate motion recognition result.
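
As one possible illustration of this relative-position and angle analysis, the sketch below derives a forearm angle from the centers of associated elbow and hand detection frames and applies a toy "hand raised" rule; both the box format and the rule are assumptions introduced here, not part of the disclosed method.

```python
import math

def forearm_angle_degrees(elbow_box, hand_box):
    """Angle of the elbow-to-hand direction above horizontal, in degrees.

    Boxes are assumed to be (x1, y1, x2, y2) tuples; box centers stand in
    for the joint positions.
    """
    cx = lambda b: (b[0] + b[2]) / 2.0
    cy = lambda b: (b[1] + b[3]) / 2.0
    dx = cx(hand_box) - cx(elbow_box)
    dy = cy(elbow_box) - cy(hand_box)  # flip sign: image y grows downward
    return math.degrees(math.atan2(dy, dx))

def is_hand_raised(elbow_box, hand_box, threshold_deg=45.0):
    """Toy rule: the hand counts as 'raised' if the forearm points steeply up."""
    return forearm_angle_degrees(elbow_box, hand_box) > threshold_deg
```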

Referring to fig. 3, according to a fourth aspect of the embodiments of the present disclosure, there is provided an image processing apparatus including:

a key point obtaining module 301, configured to obtain a human body detection frame in an image, a target key point corresponding to a target body part, and first association relationship information between the human body detection frame and the target key point;

a detection frame generation module 302, configured to generate a target detection frame for the target body part according to the target key point and the human body detection frame;

an association relationship determining module 303, configured to determine third association relationship information according to the first association relationship information and second association relationship information labeled in advance, where the second association relationship information represents the association relationship between a first body part and the human body detection frame, and the third association relationship information represents the association relationship between the target detection frame and the first detection frame for the first body part.
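
Purely to visualize the division of labor among these three modules, a minimal skeleton is sketched below; it is not a disclosed implementation, and all names are illustrative.

```python
class ImageProcessingDevice:
    """Skeleton mirroring the three modules of the fourth-aspect apparatus."""

    def acquire_keypoints(self, image):
        """Key point obtaining module: return the human body detection frame,
        the target key points, and the first association relationship info."""
        raise NotImplementedError

    def generate_target_frame(self, target_keypoint, body_frame):
        """Detection frame generation module: build the target detection frame."""
        raise NotImplementedError

    def determine_association(self, first_info, second_info):
        """Association relationship determining module: derive the third
        association relationship information."""
        raise NotImplementedError
```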

According to a fifth aspect of embodiments of the present disclosure, there is provided a training apparatus of a neural network for detecting an association relationship between body parts in an image, the apparatus including:

a training module, configured to train the neural network by using an image training set.

Wherein the images in the image training set contain annotation information, the annotation information comprising association information between a first body part and a target body part in the images, the association information being determined according to the method of the first aspect.

According to a sixth aspect of the embodiments of the present disclosure, there is provided a motion recognition apparatus, the apparatus including:

a recognition module, configured to identify the motion of the human body in the image based on the association relationship information of the first body part and the target body part in the image, wherein the association relationship information is obtained by a neural network trained by the method according to the second aspect.

With regard to the apparatuses in the above embodiments, the specific manner in which each module performs its operations has been described in detail in the embodiments of the corresponding methods, and will not be elaborated here.

Referring to fig. 4, according to a seventh aspect of the embodiments of the present disclosure, there is provided an electronic device, including a processor and a memory for storing computer instructions executable by the processor, wherein the processor is configured to implement the method of the first, second, or third aspect when executing the computer instructions.

According to an eighth aspect of embodiments of the present disclosure, there is provided a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the method of the first, second or third aspect.

In this disclosure, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance. The term "plurality" means two or more unless expressly limited otherwise.

Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This disclosure is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.

It is to be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings, and that various modifications and changes may be made without departing from the scope of the present disclosure. The scope of the present disclosure is to be limited only by the following claims.
