Neural network training method and device, and associated object detection method and device

Document No.: 144500 | Publication date: 2021-10-22

Reading note: This technology, "Neural network training method and device, and associated object detection method and device" (神经网络的训练方法和装置、关联对象的检测方法和装置), was designed and created by 张学森, 刘春亚, 王柏润, and 陈景焕 on 2021-04-28. Its main content is as follows: the present disclosure provides a training method and apparatus for a neural network, and a detection method and apparatus for an associated object. The training method of the neural network includes: detecting first class objects and second class objects in an image; generating at least one candidate object group based on the detected first class objects and second class objects, wherein the candidate object group includes at least one first class object and at least two second class objects; determining, based on a neural network, the matching degree between the first class object and each second class object in the same candidate object group; determining a group association loss of the candidate object group according to the matching degree between the first class object and each second class object in the same candidate object group, wherein the group association loss is positively correlated with the matching degree between the first class object and each non-associated second class object; and adjusting network parameters of the neural network according to the group association loss.

1. A method of training a neural network, the method comprising:

detecting first class objects and second class objects in an image;

generating at least one candidate object group based on the detected first class objects and second class objects, wherein the candidate object group comprises at least one first class object and at least two second class objects;

determining, based on a neural network, a matching degree between the first class object and each second class object in the same candidate object group;

determining a group association loss of the candidate object group according to the matching degree between the first class object and each second class object in the same candidate object group, wherein the group association loss is positively correlated with the matching degree between the first class object and each non-associated second class object;

and adjusting network parameters of the neural network according to the group association loss.

2. The method of claim 1, wherein the group association loss is further negatively correlated with the matching degree between the first class object and the associated second class object within the candidate object group.

3. The method according to claim 1 or 2, characterized in that the method further comprises:

determining that training of the neural network is completed when the group association loss is smaller than a preset loss value.

4. The method according to any one of claims 1 to 3, wherein the detecting first class objects and second class objects in the image comprises:

extracting a feature map of the image;

determining a first class object and a second class object in the image according to the feature map;

wherein the determining the matching degree between the first class object and each second class object in the same candidate object group based on the neural network comprises:

determining a first feature of the first class object according to the feature map;

determining, according to the feature map, a second feature of each second class object in the candidate object group to obtain a second feature set corresponding to the first feature;

concatenating each second feature in the second feature set with the first feature to obtain a concatenated feature set;

and determining, based on the neural network, the matching degree between the first class object and the second class object corresponding to each concatenated feature in the concatenated feature set.

5. The method according to any one of claims 1 to 4, wherein each second class object in the candidate object group satisfies a preset relative positional relationship with the first class object; or

an overlapping area exists between the detection frame of each second class object in the candidate object group and the detection frame of the first class object.

6. The method of any of claims 1-5, wherein the first class objects comprise first human body part objects and the second class objects comprise human body objects; alternatively, the first class objects comprise human body objects and the second class objects comprise first human body part objects.

7. The method of claim 6, wherein the first human body part object comprises a human face object or a human hand object.

8. The method according to any one of claims 1 to 7, further comprising:

detecting a third class of objects in the image;

wherein the generating at least one candidate object group based on the detected first class objects and second class objects comprises:

generating at least one candidate object group based on the detected first class objects, second class objects and third class objects, each candidate object group further comprising at least two of the third class objects;

the method further comprising: determining, based on the neural network, the matching degree between the first class object and each third class object in the same candidate object group;

and the group association loss is further positively correlated with the matching degree between the first class object and each non-associated third class object.

9. The method of claim 8, wherein the third class of objects comprises a second human body part object.

10. A method for detecting an associated object, comprising:

detecting first class objects and second class objects in an image;

generating at least one object group based on the detected first class objects and second class objects, wherein the object group comprises one first class object and at least two second class objects;

determining the matching degree of the first class object and each second class object in the same object group;

and determining the second class objects associated with the first class objects based on the matching degree of the first class objects and each second class object in the same object group.

11. The method of claim 10, wherein generating at least one group of objects based on the detected first class of objects and second class of objects comprises:

performing a combining operation for each detected first class object;

the combining operation comprising:

combining the first class object with any two detected second class objects into one object group; or, combining the first class object with each of the detected second class objects into one object group.

12. The method according to claim 10 or 11, wherein generating at least one object group based on the detected first class object and second class object comprises:

determining, according to the detected position information of the first class object and the second class objects, at least two second class objects that satisfy a preset relative positional relationship with the first class object as candidate associated objects of the first class object;

and combining the first class object and each candidate associated object of the first class object into an object group.

13. The method of claim 10 or 11, wherein the first class objects comprise first human body part objects and the second class objects comprise human body objects; alternatively, the first class objects comprise human body objects and the second class objects comprise first human body part objects.

14. The method of claim 13, wherein the first human body part object comprises a human face object or a human hand object.

15. The method of claim 10, further comprising: detecting third class objects in the image;

wherein the generating at least one object group based on the detected first class objects and second class objects comprises:

generating at least one object group based on the detected first class objects, second class objects and third class objects, wherein the object group further comprises at least two third class objects;

the method further comprises the following steps:

determining the matching degree of the first class object and each third class object in the same object group;

and determining a third class object associated with the first class object based on the matching degree of the first class object and each third class object in the same object group.

16. The method of claim 15, wherein the third class of objects comprises a second human body part object.

17. The method according to any one of claims 10 to 16, wherein the determining the matching degree between the first class object and each second class object in the same object group comprises:

determining, based on a pre-trained neural network, the matching degree between the first class object and each second class object in the same object group, wherein the neural network is trained according to the method of any one of claims 1 to 9.

18. An apparatus for detecting an associated object, comprising:

a detection module configured to detect first class objects and second class objects in an image;

an object group generating module, configured to generate at least one object group based on the detected first class objects and second class objects, where the object group includes one first class object and at least two second class objects;

a determining module configured to determine the matching degree between the first class object and each second class object in the same object group;

and an associated object determining module configured to determine the second class object associated with the first class object based on the matching degree between the first class object and each second class object in the same object group.

19. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the method of any of claims 1-17 when executing the program.

20. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the method of any one of claims 1 to 17.

21. A computer program product comprising a computer program which, when executed by a processor, implements the method of any one of claims 1 to 17.

22. A computer program comprising computer readable code, wherein the computer readable code when run in an electronic device causes a processor in the electronic device to implement the method of any of claims 1 to 17.

Technical Field

The present disclosure relates to the technical field of computer vision, and in particular to a training method and apparatus for a neural network, and a detection method and apparatus for an associated object.

Background

In intelligent scene detection, object detection and recognition is an important research direction. Multidimensional analysis of objects can yield rich object information, which helps in studying the state and variation trend of objects. In specific object detection and recognition scenarios, the association relationships between objects in an image can be analyzed, the potential relationships between objects can be extracted automatically, and association information beyond the objects' own attributes can be obtained.

For a scene with multiple objects, especially one in which several objects in the image occlude or overlap one another, analyzing the association relationships between the objects is difficult, and determining associated objects only from prior knowledge such as inter-object positional relationships rarely yields accurate detection results; for example, missed detections or false detections may occur. For instance, when intelligently monitoring a multiplayer game, body parts such as the hands and faces of different people in a video need to be associated with the corresponding human bodies in order to identify the actions of different people, and several human bodies may occlude or overlap one another, which increases the difficulty of detecting the association relationship between human body parts and human bodies.

Disclosure of Invention

The disclosure provides a training method and device of a neural network and a detection method and device of a related object.

According to a first aspect of embodiments of the present disclosure, there is provided a training method for a neural network, the method including: detecting first class objects and second class objects in an image; generating at least one candidate object group based on the detected first class objects and second class objects, wherein the candidate object group includes at least one first class object and at least two second class objects; determining, based on a neural network, the matching degree between the first class object and each second class object in the same candidate object group; determining a group association loss of the candidate object group according to the matching degree between the first class object and each second class object in the same candidate object group, wherein the group association loss is positively correlated with the matching degree between the first class object and each non-associated second class object; and adjusting network parameters of the neural network according to the group association loss.

In some optional embodiments, the group association loss is further negatively correlated with the matching degree between the first class object and the associated second class object within the candidate object group.

In some optional embodiments, the method further comprises: determining that training of the neural network is completed when the group association loss is smaller than a preset loss value.

In some optional embodiments, the detecting first class objects and second class objects in the image includes: extracting a feature map of the image; and determining the first class objects and second class objects in the image according to the feature map. The determining, based on the neural network, the matching degree between the first class object and each second class object in the same candidate object group includes: determining a first feature of the first class object according to the feature map; determining, according to the feature map, a second feature of each second class object in the candidate object group to obtain a second feature set corresponding to the first feature; concatenating each second feature in the second feature set with the first feature to obtain a concatenated feature set; and determining, based on the neural network, the matching degree between the first class object and the second class object corresponding to each concatenated feature in the concatenated feature set.

In some optional embodiments, each second class object in the candidate object group and the first class object satisfy a preset relative position relationship; or, there is an overlapping area between the detection frame of each second class object in the candidate object group and the detection frame of the first class object.

In some optional embodiments, the first class objects comprise first human body part objects and the second class objects comprise human body objects; alternatively, the first class objects comprise human body objects and the second class objects comprise first human body part objects.

In some alternative embodiments, the first human body part object comprises a human face object or a human hand object.

In some optional embodiments, the method further comprises: detecting third class objects in the image; the generating at least one candidate object group based on the detected first class objects and second class objects comprises: generating at least one candidate object group based on the detected first class objects, second class objects, and third class objects, each candidate object group further comprising at least two third class objects; the method further comprises: determining, based on the neural network, the matching degree between the first class object and each third class object in the same candidate object group; and the group association loss is further positively correlated with the matching degree between the first class object and each non-associated third class object.

In some alternative embodiments, the third class of objects comprises a second body part object.

According to a second aspect of the embodiments of the present disclosure, there is provided a method for detecting an associated object, including: detecting first class objects and second class objects in an image; generating at least one object group based on the detected first class objects and second class objects, wherein the object group comprises one first class object and at least two second class objects; determining the matching degree between the first class object and each second class object in the same object group; and determining the second class object associated with the first class object based on the matching degree between the first class object and each second class object in the same object group.

In some optional embodiments, the generating at least one object group based on the detected first class objects and second class objects comprises: performing a combining operation for each detected first class object, the combining operation comprising: combining the first class object with any two detected second class objects into one object group; or, combining the first class object with each of the detected second class objects into one object group.

In some optional embodiments, the generating at least one object group based on the detected first class objects and second class objects comprises: determining, according to the detected position information of the first class object and the second class objects, at least two second class objects that satisfy a preset relative positional relationship with the first class object as candidate associated objects of the first class object; and combining the first class object and each candidate associated object of the first class object into an object group.

In some optional embodiments, the first class objects comprise first human body part objects and the second class objects comprise human body objects; alternatively, the first class objects comprise human body objects and the second class objects comprise first human body part objects.

In some alternative embodiments, the first human body part object comprises a human face object or a human hand object.

In some optional embodiments, the method further comprises: detecting third class objects in the image; the generating at least one object group based on the detected first class objects and second class objects comprises: generating at least one object group based on the detected first class objects, second class objects, and third class objects, wherein the object group further comprises at least two third class objects; the method further comprises: determining the matching degree between the first class object and each third class object in the same object group; and determining the third class object associated with the first class object based on the matching degree between the first class object and each third class object in the same object group.

In some alternative embodiments, the third class of objects comprises a second body part object.

In some optional embodiments, the determining the matching degree between the first class object and each second class object in the same object group includes: determining, based on a pre-trained neural network, the matching degree between the first class object and each second class object in the same object group, wherein the neural network is trained according to any one of the methods provided in the first aspect.

According to a third aspect of embodiments of the present disclosure, there is provided an apparatus for training a neural network, the apparatus including: an object detection module configured to detect first class objects and second class objects in an image; a candidate object group generating module configured to generate at least one candidate object group based on the detected first class objects and second class objects, where the candidate object group includes at least one first class object and at least two second class objects; a matching degree determining module configured to determine, based on a neural network, the matching degree between the first class object and each second class object in the same candidate object group; a group association loss determining module configured to determine a group association loss of the candidate object group according to the matching degree between the first class object and each second class object in the same candidate object group, where the group association loss is positively correlated with the matching degree between the first class object and each non-associated second class object; and a network parameter adjusting module configured to adjust network parameters of the neural network according to the group association loss.

According to a fourth aspect of the embodiments of the present disclosure, there is provided a detection apparatus for an associated object, including: a detection module configured to detect first class objects and second class objects in an image; an object group generating module configured to generate at least one object group based on the detected first class objects and second class objects, where the object group includes one first class object and at least two second class objects; a determining module configured to determine the matching degree between the first class object and each second class object in the same object group; and an associated object determining module configured to determine the second class object associated with the first class object based on the matching degree between the first class object and each second class object in the same object group.

According to a fifth aspect of the embodiments of the present disclosure, there is provided a computer device, including a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor implements the training method of the neural network according to any one of the first aspects or implements the detection method of the associated object according to any one of the second aspects when executing the program.

According to a sixth aspect of the embodiments of the present disclosure, there is provided a computer-readable storage medium on which a computer program is stored, the program, when executed by a processor, implementing the method for training a neural network according to any one of the first aspects or the method for detecting a related object according to any one of the second aspects.

According to a seventh aspect of embodiments of the present disclosure, there is provided a computer program product comprising a computer program which, when executed by a processor, implements the method for training a neural network according to any one of the first aspect, or implements the method for detecting a related object according to any one of the second aspect.

In the embodiments of the present disclosure, on the basis of detecting first class objects and second class objects in an image, a candidate object group is generated from at least one first class object and at least two second class objects, the matching degrees between the first class object and the second class objects are determined based on a neural network, the group association loss of the corresponding candidate object group is obtained from the determined matching degrees, and the network parameters of the neural network are adjusted according to the group association loss, thereby completing the training of the neural network. In this training scheme, the loss function (the group association loss) is computed from the matching degrees of multiple matching pairs formed by the first class object and the second class objects in a candidate object group, and the network parameters are then adjusted according to the group association loss corresponding to that candidate object group. This scheme uses multiple matching pairs to achieve global optimization of the neural network: minimizing the loss function suppresses the matching degrees of incorrect matching pairs and pushes the objects in each incorrect pair apart, while encouraging the matching degrees of correct matching pairs and drawing the objects in each correct pair closer together. Therefore, the trained neural network can more accurately detect the correct matching pairs of first class objects and second class objects in an image, and more accurately determine the association relationship between them.

It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and together with the description, serve to explain the principles of the disclosure.

FIG. 1 is a flow diagram illustrating a method of training a neural network in accordance with an exemplary embodiment;

FIG. 2 is a schematic diagram illustrating a detected image according to an exemplary embodiment;

FIG. 3 is a schematic diagram of a neural network framework shown in accordance with an exemplary embodiment;

FIG. 4 is a flow diagram illustrating a method for matching determination, according to an exemplary embodiment;

FIG. 5 illustrates a method of detecting an associated object, according to an example embodiment;

FIG. 6 illustrates a training apparatus for a neural network, in accordance with an exemplary embodiment;

FIG. 7 illustrates another training apparatus for a neural network, in accordance with an exemplary embodiment;

FIG. 8 illustrates a detection apparatus associated with an object, according to an exemplary embodiment;

FIG. 9 is a schematic diagram illustrating a configuration of a computer device, according to an example embodiment.

Detailed Description

Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The specific manner described in the following exemplary embodiments does not represent all aspects consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present disclosure, as detailed in the appended claims.

The terminology used in the present disclosure is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. As used in this disclosure and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items.

It is to be understood that although the terms first, second, third, etc. may be used herein to describe various information, such information should not be limited to these terms. These terms are only used to distinguish one type of information from another. For example, first information may also be referred to as second information, and similarly, second information may also be referred to as first information, without departing from the scope of the present disclosure. The word "if" as used herein may be interpreted as "upon", "when", or "in response to determining", depending on the context.

Associating human body parts with human bodies is an important link in intelligent video analysis. For example, in a scenario of intelligently monitoring a multiplayer chess and card game, the system needs to associate different human hands in a video with the corresponding human bodies to determine the actions made by different people, so as to intelligently monitor the different participants during the game.

The present disclosure provides a training method for a neural network that adjusts the network parameters of the neural network in a better-optimized way, so that the trained neural network can more accurately detect the matching degree between a human body part and a human body, and thereby determine the association relationship between human body parts and human bodies in an image. During training, a candidate object group may be generated based on the first class objects and second class objects detected in an image, the matching degree between the first class object and each second class object in the same candidate object group is determined based on the neural network, the group association loss of the corresponding candidate object group is obtained from the determined matching degrees, and the network parameters of the neural network are adjusted according to the group association loss.

In order to make the training method of the neural network provided by the present disclosure clearer, the following describes in detail the implementation process of the scheme provided by the present disclosure with reference to the accompanying drawings and specific embodiments.

Referring to fig. 1, fig. 1 is a flowchart illustrating a method for training a neural network according to an embodiment of the present disclosure. As shown in fig. 1, the process includes:

Step 101, detecting a first class object and a second class object in an image.

The detected image may be an image containing multiple classes of objects. The object classes are predefined and include, for example, persons and articles; they may also be divided according to attribute characteristics of persons, such as gender and age, or according to characteristics of articles, such as color and function.

In some embodiments, the objects in the image include human body part objects and human body objects; that is, the first class objects and second class objects may be human body part objects or human body objects. A human body part object is a local part of a human body, such as a human hand, a human face, or a human foot. For example, when an intelligent monitoring device is used to monitor a multiplayer chess and card game, the image acquired by the intelligent monitoring device may be used as the image detected in this step.

As shown in fig. 2, an image of a multiplayer game scene collected by an intelligent monitoring device may be used as the detected image in the embodiments of the present disclosure. The captured image contains a plurality of human body objects participating in the game, human body B1, human body B2, and human body B3, together with the corresponding human hand objects (human body part objects): human hand H1 and human hand H2 corresponding to human body B1; human hand H3 corresponding to human body B2; and human hand H4 and human hand H5 corresponding to human body B3. In fig. 2, human body objects are indicated by human body detection boxes, and human hand objects are indicated by human hand detection boxes.

In the embodiment of the present disclosure, the first class object in the image is different from the second class object, and a certain association relationship exists between the first class object and the second class object. Wherein, in case that the first class of objects includes human body part objects, the second class of objects may include human body part objects of a different type from the human body part objects included in the first class of objects, or the second class of objects may include human body objects. Alternatively, where the second class of objects comprises human body part objects, the first class of objects may comprise human body part objects of a different type than the human body part objects comprised by the second class of objects, or the first class of objects may comprise human body objects. The types of the human body part objects correspond to the body parts to which the human body part objects refer, for example, the human face object, the human hand object and the elbow object correspond to the human face, the human hand and the elbow respectively, and the types of the human body part objects are different from each other.

In some optional embodiments, the first class of objects comprises first human body part objects, and the second class of objects comprises human body objects; alternatively, the first class of objects comprises human objects and the second class of objects comprises first human part objects. Wherein the first human body part object comprises a human face object or a human hand object.

For example, taking the human hand object as the first class object and the human body object as the second class object, this step may detect the human hand objects and human body objects in the image. As shown in fig. 2, this step may detect the first class objects human hand H1, human hand H2, human hand H3, human hand H4, and human hand H5, and the second class objects human body B1, human body B2, and human body B3.

It is understood that the image detected in this step can be obtained in many different ways for training the neural network, and the embodiments of the present disclosure are not limited in this respect. For example, images of different scenes can be acquired by an intelligent monitoring device, such as images of a multiplayer board game. Alternatively, images containing human body part objects and human body objects can be selected from different image databases.

It should be noted that the manner of detecting the first class objects and second class objects in the image in this step may take various forms, and this embodiment is not limited in this respect. For example, the first class objects in the image may be detected first and then the second class objects, so that both the first class objects and the second class objects are finally obtained. Alternatively, the first class objects and the second class objects may be detected simultaneously in a single detection pass.

In some possible implementations, a detection network capable of detecting both the first class objects and the second class objects in an image may be trained in advance, and the first class objects and second class objects may then be obtained from the image in a single pass using this trained detection network. For example, a joint human face and human body detection neural network may be trained in advance, and in this embodiment, human face objects and human body objects may be detected from the image simultaneously using this trained joint detection network.
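As an illustration only, such a joint detection step can be sketched with an off-the-shelf detector. The snippet below is a minimal sketch, assuming a torchvision Faster R-CNN fine-tuned so that label 1 denotes the first class (e.g., human hand) and label 2 the second class (e.g., human body); the class count, label mapping, and score threshold are illustrative assumptions, not part of this disclosure.

```python
# Minimal sketch: one detector returns both object classes in a single pass.
# Assumes torchvision >= 0.13; label indices 1 (hand) and 2 (body) are hypothetical.
import torch
import torchvision

model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights=None, num_classes=3)
model.eval()

def detect_objects(image: torch.Tensor, score_thresh: float = 0.5):
    """image: float tensor (C, H, W); returns (first_class_boxes, second_class_boxes)."""
    with torch.no_grad():
        out = model([image])[0]  # dict with 'boxes', 'labels', 'scores'
    keep = out["scores"] >= score_thresh
    boxes, labels = out["boxes"][keep], out["labels"][keep]
    return boxes[labels == 1], boxes[labels == 2]
```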

Step 102, generating at least one candidate object group based on the detected first class objects and second class objects, wherein the candidate object group comprises at least one first class object and at least two second class objects.

In the case where first class objects and second class objects are detected in the image, this step may generate a candidate object group based on one detected first class object and at least two detected second class objects; alternatively, this step may generate a candidate object group based on at least two first class objects and at least two second class objects. Since a plurality of first class objects may be detected in the image, a plurality of candidate object groups may likewise be generated from them.

Continue with the first class objects detected in fig. 2, human hand H1, human hand H2, human hand H3, human hand H4, and human hand H5, and the second class objects, human body B1, human body B2, and human body B3, as an example. This step may generate corresponding candidate object groups based on the first class objects and second class objects detected in fig. 2. Illustratively, human hand H1, human body B1, human body B2, and human body B3 may be combined into one candidate object group; alternatively, human hand H1, human hand H2, human body B1, human body B2, and human body B3 may be combined into another candidate object group. It is understood that more candidate object groups can be generated from different combinations, which are not exhausted here.
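The combination logic can be written out in a few lines. The sketch below is a hypothetical illustration of two combination modes that mirror the claims (one group per pair of second class objects, or a single group holding all of them); the function name and the `pairwise` switch are illustrative, not the disclosure's API.

```python
# Minimal sketch of candidate-group generation: each first class object is
# combined either with every pair of second class objects, or with all of them.
from itertools import combinations

def build_candidate_groups(first_objs, second_objs, pairwise=True):
    groups = []
    for f in first_objs:
        if pairwise:
            # one group per pair of second class objects (>= 2 per group)
            for s1, s2 in combinations(second_objs, 2):
                groups.append((f, [s1, s2]))
        else:
            # one group holding the first class object and all second class objects
            groups.append((f, list(second_objs)))
    return groups
```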

In some optional embodiments, each second class object in the candidate object group and the first class object satisfy a preset relative position relationship; or, there is an overlapping area between the detection frame of each second class object in the candidate object group and the detection frame of the first class object.

In the above embodiment, the relative positional relationship may be preset. For any detected first class object, a second class object that satisfies the preset relative positional relationship with the first class object is added to the candidate object group in which that first class object is located. This ensures that the first class object and the second class objects in the same candidate object group satisfy the preset relative positional relationship. The preset relative positional relationship may include at least one of the following: the positional distance between the first class object and the second class object is smaller than a preset threshold; their detection frames have an overlapping area. That is, the distance between the first class object and each second class object in the same candidate object group is smaller than a preset threshold, and/or an overlapping area exists between the detection frame of the first class object and the detection frame of each second class object in the same candidate object group.

In this optional embodiment, requiring the preset relative positional relationship to be satisfied ensures that the first class object and the second class objects in the same candidate object group are objects that may be associated, from which the second class object correctly associated with the first class object is then further determined. In this way, objects among the detected first class and second class objects that may have an association relationship are preliminarily gathered into the same candidate object group, so that the second class object correctly associated with the first class object can then be specifically determined from the candidate object group, improving the accuracy of the computed matching degrees between first class objects and second class objects.

Taking fig. 2 as an example, the preset relative positional relationship may be that the detection frames overlap: in the same candidate object group, the detection frame of the first class object, human hand H5, has an overlapping area with the detection frames of the second class objects, human body B2 and human body B3, respectively.
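As a sketch of this positional filter, the following hypothetical helper keeps only the second class objects whose detection box overlaps the first class object's box; boxes are assumed to be (x1, y1, x2, y2) tuples, and the overlap test is an illustrative choice.

```python
# Minimal sketch: a second class object joins a first class object's candidate
# group only if their detection boxes share an overlapping area.
def boxes_overlap(a, b):
    # axis-aligned boxes (x1, y1, x2, y2); True if they share any area
    return not (a[2] <= b[0] or b[2] <= a[0] or a[3] <= b[1] or b[3] <= a[1])

def candidate_group_for(first_box, second_boxes):
    members = [s for s in second_boxes if boxes_overlap(first_box, s)]
    # a candidate group needs at least two second class objects
    return (first_box, members) if len(members) >= 2 else None
```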

Step 103, determining the matching degree between the first class object and each second class object in the same candidate object group based on a neural network.

In this step, a neural network for detecting the matching degree between a first class object and a second class object may be preset. For example, a neural network known to be usable for detecting correlations between objects may be trained in advance with training samples, yielding a neural network usable in this step. Based on the preset neural network, this step detects and determines the matching degree between the first class object and each second class object in the same candidate object group. The matching degree characterizes the degree of association between the detected first class object and second class object. Its specific representation may take many forms, and the embodiments of the present disclosure are not limited in this respect; for example, a numerical value, a percentage, or a grade may be used.

Taking fig. 2 as an example, the candidate object group G1 includes a first class object, human hand H5, and second class objects, human body B2 and human body B3. Based on the preset neural network, this step may determine, within candidate object group G1, the matching degree M1 between human hand H5 and human body B2, and the matching degree M2 between human hand H5 and human body B3.

Step 104, determining the group association loss of the candidate object group according to the matching degree between the first class object and each second class object in the same candidate object group, wherein the group association loss is positively correlated with the matching degree between the first class object and each non-associated second class object.

In this embodiment, the association relationships between first class objects and second class objects may be labeled in advance. A first class object being associated with a second class object indicates that the two have a specific similarity relationship, the same attribution relationship (e.g., belonging to the same person), or the like. The association relationships between the first class objects and second class objects in the detected image can be labeled manually to obtain labeling information. Thereby, the second class objects associated with a first class object and those not associated with it within the same candidate object group can be distinguished.

With reference to fig. 2, two matching degrees are obtained for candidate object group G1: matching degree M1 and matching degree M2. In this step, the group association loss corresponding to candidate object group G1 may be determined from these two matching degrees. Since the first class object, human hand H5, is not associated with the second class object, human body B2, the group association loss of G1 is positively correlated with the matching degree M1.

Because the group association loss is positively correlated with the matching degree between the first class object and each non-associated second class object, minimizing the group association loss suppresses this matching degree and pushes the first class object and the non-associated second class object apart, so that after training the neural network is better able to distinguish the first class object from non-associated second class objects.

In some optional embodiments, the group association loss is further negatively correlated with the matching degree between the first class object and the associated second class object within the candidate object group. For example, since the first class object, human hand H5, is associated with the second class object, human body B3, the group association loss of G1 is negatively correlated with the matching degree M2.

Because the group association loss is negatively correlated with the matching degree between the first class object and the associated second class object, minimizing the group association loss encourages this matching degree and draws the first class object and the associated second class object closer together, so that after training the neural network is better able to recognize the second class object associated with the first class object. In this way, global optimization of the neural network is achieved and the accuracy of the computed matching degree between first class objects and second class objects is improved.

A specific example is given below of how the loss function (which yields the group association loss) can be set so that the group association loss is positively correlated with the matching degree between the first class object and each non-associated second class object, and negatively correlated with the matching degree between the first class object and the associated second class object.

The preset loss function is exemplarily explained with reference to the image shown in fig. 2. The candidate object group G2 includes a first class object, human hand H3, and second class objects, human body B1, human body B2, and human body B3, where human hand H3 is associated with human body B2 (that is, human hand H3 and human body B2 belong to the same person). Denote the matching degree between human hand H3 and human body B2 as $s_p$, the matching degree between human hand H3 and human body B1 as $s_{n1}$, the matching degree between human hand H3 and human body B3 as $s_{n2}$, and the group association loss as $L_{Group}$. Illustratively, the loss function may be preset as:

$$L_{Group} = -\log\frac{\exp(s_p)}{\exp(s_p) + \exp(s_{n1}) + \exp(s_{n2})}$$

The group association loss of the candidate object group is calculated according to this loss function. The loss function is negatively correlated with the matching degree between the associated first class object and second class object in the group, and positively correlated with the matching degree between the first class object and each non-associated second class object in the group; in addition, it enables the neural network to converge quickly.
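Written out in code, the loss above is exactly a softmax cross-entropy over the group's matching scores with the associated pair as the target class. The following PyTorch sketch illustrates the formula under that reading; it is not the disclosure's own implementation, and the example scores are made up.

```python
# L_Group = -log( exp(s_p) / (exp(s_p) + sum_i exp(s_ni)) ):
# cross-entropy over [s_p, s_n1, ..., s_nN] with target index 0 (the matched pair).
import torch
import torch.nn.functional as F

def group_association_loss(s_p: torch.Tensor, s_n: torch.Tensor) -> torch.Tensor:
    """s_p: scalar score of the associated pair; s_n: (N,) scores of non-associated pairs."""
    logits = torch.cat([s_p.reshape(1), s_n]).unsqueeze(0)  # shape (1, N+1)
    target = torch.zeros(1, dtype=torch.long)               # matched pair sits at index 0
    return F.cross_entropy(logits, target)

# e.g., for group G2 above: s_p for (H3, B2); s_n for (H3, B1) and (H3, B3)
loss = group_association_loss(torch.tensor(2.3), torch.tensor([0.4, -0.1]))
```

Raising a non-associated score increases this loss while raising the associated score decreases it, matching the positive and negative correlations stated above.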

Step 105, adjusting network parameters of the neural network according to the group association loss.

In some optional embodiments, a large number of sample images may be used as the detected images in this embodiment to train the neural network until a preset training requirement is met. In one possible implementation, the neural network is determined to have completed training when the group association loss is smaller than a preset loss value. In this implementation, minimizing the loss function suppresses the matching degree between the first class object and each non-associated second class object and pushes them apart, while encouraging the matching degree between the first class object and the associated second class object and drawing them closer together. In another possible implementation, the neural network is determined to have completed training when the number of training iterations reaches a preset threshold.
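A minimal training loop under these two stopping criteria might look like the sketch below. Here `pair_head`, `make_candidate_groups`, and `score_pairs` are hypothetical stand-ins for the matching detection network and the grouping and scoring steps described above, and `group_association_loss` is the function from the previous sketch; only the two stopping conditions come from the text.

```python
# Hypothetical training-loop sketch: stop when the group association loss
# falls below a preset value, or when a preset iteration count is reached.
import torch

def train(pair_head, batches, make_candidate_groups, score_pairs,
          loss_threshold=0.01, max_steps=100_000, lr=1e-4):
    optimizer = torch.optim.Adam(pair_head.parameters(), lr=lr)
    step = 0
    for batch in batches:
        for group in make_candidate_groups(batch):
            s_p, s_n = score_pairs(pair_head, group)   # matching degrees from the network
            loss = group_association_loss(s_p, s_n)    # see the earlier sketch
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            step += 1
            if loss.item() < loss_threshold or step >= max_steps:
                return pair_head                       # training deemed complete
    return pair_head
```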

In the embodiments of the present disclosure, on the basis of detecting first class objects and second class objects in an image, a candidate object group is generated from at least one first class object and at least two second class objects, the matching degrees between the first class object and the second class objects are determined based on the neural network, the group association loss of the corresponding candidate object group is obtained from the determined matching degrees, and the network parameters of the neural network are adjusted according to the group association loss, thereby completing the training of the neural network.

In this training scheme, the loss function (the group association loss) is computed from the matching degrees of the multiple matching pairs formed by the first class object and the second class objects in a candidate object group, and the network parameters of the neural network are then adjusted according to the group association loss of that candidate object group. Compared with computing the loss from the matching degree of only one matching pair of a first class object and a second class object, this scheme uses multiple matching pairs to achieve global optimization of the neural network. Minimizing the loss function suppresses the matching degrees of incorrect matching pairs, pushing the objects in each incorrect pair apart, and encourages the matching degrees of correct matching pairs, drawing the objects in each correct pair closer together. Therefore, the trained neural network can more accurately detect the correct matching pairs of first class objects and second class objects in an image, and more accurately determine the association relationship between them.

For a multi-object scene, especially one in which multiple objects in an image occlude or overlap one another, analyzing the association relationships between the objects is difficult. In the related art, the association relationship is determined only from prior knowledge such as inter-object positional relationships, so missed or false detections may occur and accurate detection results are hard to obtain. With the neural network obtained by the training method of this embodiment, multiple first class objects and second class objects that may have association relationships in an image are treated as detection objects within the same group, in the form of a candidate object group, so that global optimization of the association detection of the multiple matching pairs formed by first class objects and second class objects is achieved on the basis of the candidate object group, improving the accuracy of the computed matching degrees between first class objects and second class objects.

Fig. 3 is a schematic diagram of the network architecture of an association detection network provided by at least one embodiment of the present disclosure. Based on this association detection network, training of the neural network or detection of the association relationship between first class objects and second class objects in an image may be implemented. As shown in fig. 3, the association detection network may include:

The feature extraction network 31 is used to perform feature extraction on the image to obtain a feature map. In one example, the feature extraction network 31 may include a backbone network and an FPN (Feature Pyramid Network); the feature map is extracted by processing the image sequentially through the backbone network and the FPN.

For example, the backbone network may use VGGNet, ResNet, or the like, and the FPN may convert the feature map obtained by the backbone network into a feature map with a pyramidal multi-layer structure. The backbone network is the image feature extraction part of the association detection network; the FPN corresponds to the neck part of the network architecture and performs feature enhancement processing, strengthening the shallow features extracted by the backbone.

The object detection network 32 is used to determine the first class objects and second class objects in the image according to the feature map extracted from the image.

As shown in fig. 3, the object detection network 32 may include an RPN (Region Proposal Network) and an RCNN (Region-CNN). The RPN predicts anchor boxes (anchors) based on the feature map output by the FPN, and the RCNN predicts detection boxes (bbox) based on the anchors and the feature map output by the FPN, where each detection box contains a first class object or a second class object. The RCNN may output multiple detection boxes.

The matching detection network 33 (Pair Head) is the neural network to be trained in the embodiments of the present disclosure. It is configured to determine the first feature corresponding to a first class object and the second feature corresponding to a second class object based on the first class or second class objects in the detection boxes output by the RCNN and on the feature map output by the FPN.

The object detection network 32 and the matching detection network 33 together correspond to the head part of the association detection network, i.e., the detector that outputs the detection results. In the embodiments of the present disclosure, the detection results include the first class objects, the second class objects, and the corresponding association relationships.

It should be noted that the association detection network formed by the feature extraction network 31, the object detection network 32, and the matching detection network 33 is not limited to a specific network structure in the embodiments of the present disclosure; the structure shown in fig. 3 serves as an exemplary illustration. For example, instead of using the FPN in fig. 3, the first class or second class objects may be determined directly from the feature map extracted by the backbone through the RPN/RCNN or the like. For another example, fig. 3 illustrates a two-stage detection framework, but detection may also be performed in a single stage in practice.
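For orientation, the wiring of fig. 3 can be sketched as follows, assuming a torchvision ResNet-50 + FPN as the feature extraction network 31. The object detection network 32 (RPN + RCNN) is only indicated by a comment, and the PairHead layer sizes are illustrative assumptions rather than the disclosure's architecture.

```python
# Skeleton of the association detection network in fig. 3 (torchvision >= 0.13).
import torch
import torch.nn as nn
from torchvision.models.detection.backbone_utils import resnet_fpn_backbone

class PairHead(nn.Module):
    """Matching detection network 33: scores one concatenated (first, second) feature pair."""
    def __init__(self, feat_dim: int = 256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(2 * feat_dim, 256), nn.ReLU(), nn.Linear(256, 1))

    def forward(self, first_feat, second_feat):
        # step 403 (concatenation) followed by step 404 (matching degree)
        return self.mlp(torch.cat([first_feat, second_feat], dim=-1)).squeeze(-1)

backbone = resnet_fpn_backbone(backbone_name="resnet50", weights=None)  # network 31
pair_head = PairHead(feat_dim=backbone.out_channels)                    # network 33
# The object detection network 32 (RPN + RCNN) sits between the two, turning
# the FPN feature maps into detection boxes for first and second class objects.
```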

Based on the network structure of the association detection network shown in fig. 3, a process of training the neural network (the matching detection network 33) by using the association detection network will be described in detail in the following embodiments.

In the embodiments of the present disclosure, the image may be input into the association detection network; the feature extraction network 31 performs feature extraction on the image to obtain a feature map, and the object detection network 32 detects the detection boxes corresponding to the first class objects and to the second class objects in the image according to the feature map, thereby determining the first class objects and second class objects in the image. At least one candidate object group is then generated based on the detected first class objects and second class objects, and the matching degree between the first class object and each second class object in the same candidate object group is determined by the matching detection network 33, i.e., by the neural network.

The specific implementation by which the matching detection network 33 determines the matching degree, that is, the implementation of step 103 of determining the matching degree between the first class object and each second class object in the same candidate object group based on the neural network, may include the following steps, as shown in fig. 4:

Step 401, determining a first feature of the first class object according to the feature map.

The matching detection network 33 may determine the first feature of the first class object according to the feature map extracted by the feature extraction network 31, in combination with the detection box of the first class object output by the object detection network 32.

Step 402, determining a second feature of each second class object in the candidate object group according to the feature map, and obtaining a second feature set corresponding to the first feature.

The matching detection network 33 may determine the second feature corresponding to the second class object according to the feature map output by the feature extraction network 31 and by combining the detection frame corresponding to the second class object output by the object detection network 32. Based on the same principle, the second feature of each second class object in the candidate object group can be obtained, and a second feature set corresponding to the candidate object group is formed.

And 403, splicing each second feature in the second feature set with the first feature to obtain a spliced feature set.

For each second feature in the second feature set, the matching detection network 33 may splice the second feature with the first feature to obtain a "first feature-second feature" spliced feature. The specific manner of splicing the first feature and the second feature is not limited in the embodiments of the present disclosure. In a possible implementation manner, when the first feature and the second feature are represented by feature vectors, the feature vector corresponding to the first feature and the feature vector corresponding to the second feature may be directly concatenated, and the concatenated vector is used as the spliced feature of the first class object and the second class object.
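As a minimal sketch of this splicing step (a hedged illustration assuming PyTorch; the 256-dimensional features and variable names are assumptions, not specified by the disclosure):

```python
import torch

def splice_features(first_feat: torch.Tensor, second_feats: torch.Tensor) -> torch.Tensor:
    """Concatenate the first feature with each second feature in the set.

    first_feat:   (D,)   feature of the first class object
    second_feats: (K, D) features of the K second class objects in the group
    returns:      (K, 2D) "first feature-second feature" spliced features
    """
    repeated = first_feat.unsqueeze(0).expand(second_feats.size(0), -1)  # (K, D)
    return torch.cat([repeated, second_feats], dim=-1)                   # (K, 2D)

# e.g. a 256-dim hand feature spliced with 3 body features -> shape (3, 512)
spliced = splice_features(torch.randn(256), torch.randn(3, 256))
print(spliced.shape)  # torch.Size([3, 512])
```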

Step 404, determining a matching degree between the second class object and the first class object corresponding to the splicing features in the splicing feature set based on the neural network.

The matching detection network 33 may determine the matching degree between the corresponding first class object and second class object according to their spliced feature. In a possible implementation manner, the spliced feature vector may be input into a preset matching degree calculation function, which calculates the matching degree between the corresponding first class object and second class object. In another possible implementation manner, a matching degree calculation neural network meeting the requirements may be trained in advance on training samples; when the matching degree needs to be calculated, the spliced feature vector is input into this network, which outputs the matching degree between the first class object and the second class object.
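The second implementation might look like the following sketch: a small PyTorch MLP with a sigmoid output mapping each spliced feature to a matching degree in [0, 1]. The layer sizes are assumptions for illustration; the disclosure does not fix the architecture of the matching degree calculation neural network.

```python
import torch
import torch.nn as nn

class MatchScorer(nn.Module):
    """Maps a spliced "first feature-second feature" vector to a matching degree."""
    def __init__(self, spliced_dim: int = 512, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(spliced_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
            nn.Sigmoid(),  # matching degree in [0, 1]
        )

    def forward(self, spliced: torch.Tensor) -> torch.Tensor:
        # spliced: (K, 2D) -> (K,) one matching degree per second class object
        return self.net(spliced).squeeze(-1)

scorer = MatchScorer()
degrees = scorer(torch.randn(3, 512))  # e.g. tensor([0.52, 0.48, 0.55])
```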

In the embodiment of the disclosure, the first class object and the second class object in the image are determined from the extracted feature map of the image. When determining the matching degree between the first class object and the second class object, the first feature and the second feature determined from the feature map are spliced to obtain a spliced feature, and the matching degree between the corresponding first class object and second class object is then determined based on the neural network. In this way, the association relationship between the first class object and the second class object in the image is detected in the form of candidate object groups, which can improve detection efficiency.

After determining the matching degree between the first class object and each second class object in the same candidate object group, the embodiment of the present disclosure may further calculate the group association loss according to the determined multiple matching degrees through a preset loss function. Then, the network parameters of the matching detection network 33 in the correlation detection network are adjusted according to the group correlation loss, so as to realize the training of the neural network. In a possible implementation manner, the network parameters of one or more of the feature extraction network 31, the object detection network 32 and the matching detection network 33 in the association detection network may be adjusted according to the group association loss, so as to implement training of a part of or the whole association detection network.
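As one concrete (but purely illustrative) realization of such a preset loss function, a softmax cross-entropy over the group's matching degrees has the stated correlations: it grows with the matching degrees of non-associated second class objects and shrinks as the matching degree of the associated one rises. This is a sketch, not the disclosure's exact formulation:

```python
import torch
import torch.nn.functional as F

def group_association_loss(degrees: torch.Tensor, associated_idx: int) -> torch.Tensor:
    """degrees: (K,) matching degrees between one first class object and the K
    second class objects in its candidate object group; associated_idx marks
    the ground-truth associated second class object."""
    # Treating the degrees as logits, minimizing cross-entropy raises the
    # associated degree and suppresses the non-associated ones.
    return F.cross_entropy(degrees.unsqueeze(0), torch.tensor([associated_idx]))

degrees = torch.tensor([0.10, 0.95, 0.20], requires_grad=True)
loss = group_association_loss(degrees, associated_idx=1)
loss.backward()  # gradients flow back to adjust the matching detection network
```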

In some optional embodiments, based on the above training process, a sufficient number of images are used as training samples to train the association detection network until it meets the requirements. After training is completed, when the association relationship between the first class object and the second class object in an image to be detected needs to be determined, the image may be input into the pre-trained association detection network, which outputs the matching degree between the first class object and the second class object in the image, thereby giving the association result between them. The association detection network here is a network trained according to the training method in any embodiment of the present disclosure.

It will be appreciated that the association results output by the association detection network may be presented in a number of different forms. For example, taking fig. 2 as the image to be detected, the association result may be output as: human hands H1, H2 - human body B1; human hand H3 - human body B2; human hands H4, H5 - human body B3. Alternatively, the result may be output with matching degrees: the matching degree of human hand H3 to human body B1 is 0.01; the matching degree of human hand H3 to human body B2 is 0.99; the matching degree of human hand H3 to human body B3 is 0.02; and so on. The above presentation forms are merely exemplary illustrations and do not constitute any limitation on the association result.

In some optional embodiments, after the first class object and the second class object in the image are detected, a third class object may also be detected from the image. Wherein the third class of objects is a body part object distinct from the first class of objects or the second class of objects. For example, in the case where the first type of object is a human hand object and the second type of object is a human body object, the third type of object may be a human face object. In the present embodiment, a human hand object, a human body object, and a human face object can be detected from an image.

In one possible implementation, the third class of objects includes a second human body part object, i.e., a human body part distinct from the first human body part object. For example, the second human body part object includes a human face object or a human hand object; where the first human body part object is a human hand object, the second human body part object may be a human face object or a human foot object.

In the case where the first class object, the second class object, and the third class object are detected from the image, the present embodiment may generate at least one candidate object group based on the detected first class object, second class object, and third class object, where each candidate object group includes at least two third class objects.

For example, a set of candidate objects may be generated based on a first class of objects, at least two second class of objects, and at least two third class of objects. Alternatively, a set of candidate objects may be generated based on at least two objects of the first type, at least two objects of the second type and at least two objects of the third type.

After determining the matching degrees between the first class object and each second class object in the same candidate object group based on the neural network, the embodiment further includes: and determining the matching degree between the first class of objects and each third class of objects in the same candidate object group respectively based on the neural network.

When determining the group association loss corresponding to the candidate object group, the group association loss may be determined based on the matching degree between the first class object and each second class object in the same candidate object group, in combination with the matching degree between the first class object and each third class object in the same candidate object group. The group association loss is positively correlated with the matching degree between the first class object and the non-associated third class object. Therefore, minimizing the loss function suppresses the matching degree between the first class object and the non-associated third class object, enlarging the distance between them.

In one possible implementation, the group association loss is also inversely related to the matching degree between the first class object and the associated third class object. Minimizing the loss function then encourages a higher matching degree between the first class object and the associated third class object, shortening the distance between them.
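Under the same illustrative assumptions as the loss sketch above, the group association loss for a group containing second and third class objects might simply sum a first-second term and a first-third term (again a hedged sketch, not the disclosure's exact formulation):

```python
import torch
import torch.nn.functional as F

# one candidate object group: a hand, two bodies, two faces (hypothetical degrees)
body_degrees = torch.tensor([[0.90, 0.10]], requires_grad=True)  # hand vs. each body
face_degrees = torch.tensor([[0.15, 0.85]], requires_grad=True)  # hand vs. each face

loss = (F.cross_entropy(body_degrees, torch.tensor([0]))    # associated body: index 0
        + F.cross_entropy(face_degrees, torch.tensor([1])))  # associated face: index 1
loss.backward()  # one group loss trains both association branches at once
```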

In the embodiment of the disclosure, a candidate object group is generated by using the first class object, the second class object and the third class object detected in the image, and based on the matching degrees between the first class object and the second class object and between the first class object and the third class object, the group association loss corresponding to the candidate object group is determined, so as to adjust the network parameters of the neural network. The neural network obtained by training in the mode can simultaneously detect the matching degree between the first class object and the second class object and between the first class object and the third class object, so that the incidence relation among the first class object, the second class object and the third class object can be simultaneously determined.

Taking fig. 2 as an example, based on the neural network obtained by training in this embodiment, the association relationship among the human hand object, the human body object, and the human face object can be detected and determined from fig. 2 at the same time. For example, it may be determined simultaneously: the first type of object human hands H1 and H2, the second type of object human body B1 and the third type of object human face F1 have correct association relation; the first type of object human hand H3, the second type of object human body B2 and the third type of object human face F2 have correct association relation; the first-class object human hands H4 and H5, the second-class object human body B3 and the third-class object human face F3 are correctly associated.

Based on the concept of the method for training the neural network in the embodiment disclosed above, referring to fig. 5, the present disclosure further provides a method for detecting a related object. As shown in fig. 5, the method comprises the steps of:

step 501, a first class object and a second class object in an image are detected.

In this step, the first class object and the second class object may be detected from the image on which associated object detection is to be performed.

In some optional embodiments, the first class of objects comprises first human body part objects, and the second class of objects comprises human body objects; alternatively, the first class of objects comprises human objects and the second class of objects comprises first human part objects. In one possible implementation, the first human body part object includes a human face object or a human hand object.

Step 502, generating at least one object group based on the detected first class objects and second class objects, wherein the object group comprises one first class object and at least two second class objects.

When a first class object and second class objects are detected in the image, this step may generate an object group based on one first class object and at least two second class objects. Since a plurality of first class objects may be detected in the image, a plurality of object groups may accordingly be generated.

The generation manner for generating the object group based on the first class object and the second class object may include multiple implementations, which is not limited in this embodiment. In some optional embodiments, the generating at least one object group based on the detected first class object and second class object comprises: performing a combining operation on the detected first-class objects; the combining operation includes: combining the first-class object and any two detected second-class objects into an object group; alternatively, the first-type object and each of the detected second-type objects are combined into one object group.

In the above alternative embodiment, after the first class object and the second class objects in the image are detected, a combining operation may be performed to obtain the corresponding object groups. For example, the first class object may be combined with any at least two detected second class objects to obtain a corresponding object group. Alternatively, the first class object may be combined with each detected second class object to obtain a corresponding object group.

Taking fig. 2 as an example, the first class objects detected in fig. 2 are human hand H1, human hand H2, human hand H3, human hand H4, and human hand H5, and the second class objects are human body B1, human body B2, and human body B3. Consider the combining operation performed for the first class object human hand H5. For example, two second class objects, say human body B2 and human body B3, may be selected arbitrarily and combined with human hand H5 to obtain an object Group1 (human hand H5, human body B2, and human body B3). Alternatively, human hand H5 may be combined with each detected second class object (human body B1, human body B2, and human body B3) to obtain an object Group2 (human hand H5, human body B1, human body B2, and human body B3), as sketched below.
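In plain Python, the two combining operations reduce to the following sketch (object labels follow fig. 2; the list representation is an assumption for illustration):

```python
from itertools import combinations

hands = ["H1", "H2", "H3", "H4", "H5"]
bodies = ["B1", "B2", "B3"]

# Option 1: one first class object with any two detected second class objects
groups_for_h5 = [("H5",) + pair for pair in combinations(bodies, 2)]
# -> [('H5', 'B1', 'B2'), ('H5', 'B1', 'B3'), ('H5', 'B2', 'B3')]

# Option 2: one first class object with every detected second class object
group2 = ("H5",) + tuple(bodies)
# -> ('H5', 'B1', 'B2', 'B3'), i.e. Group2 from the example above
```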

In some optional embodiments, the generating at least one object group based on the detected first class object and second class object comprises: determining at least two second-class objects meeting a preset relative position relation with the first-class object as candidate associated objects of the first-class object according to the detected position information of the first-class object and the second-class object; and combining the first class object and each candidate related object of the first class object into an object group.

In the above embodiment, a relative position relationship may be preset, and at least two second class objects satisfying the relative position relationship with the first class object are determined as candidate associated objects of the first class object according to the position information of the first class object and the second class objects. Taking fig. 2 as an example, the relative position relationship may be preset as: an overlapping area exists between the detection frames of the first class object and the second class object. Since the detection frame of human hand H5 overlaps the detection frames of human body B2 and human body B3, respectively, human body B2 and human body B3 may be taken as candidate associated objects of human hand H5 in the present embodiment. Further, human hand H5, human body B2, and human body B3 may be combined into one object group.
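A sketch of this overlap test, assuming detection frames are represented as (x1, y1, x2, y2) tuples and the coordinate values below are hypothetical:

```python
def boxes_overlap(a, b):
    """True if axis-aligned boxes a and b share any area; boxes are (x1, y1, x2, y2)."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]

hand_box = (40, 50, 60, 70)            # hypothetical detection frame of hand H5
body_boxes = {"B1": (0, 0, 30, 90),    # hypothetical detection frames of the bodies
              "B2": (25, 10, 55, 95),
              "B3": (50, 5, 85, 100)}

candidates = [name for name, box in body_boxes.items() if boxes_overlap(hand_box, box)]
print(candidates)  # ['B2', 'B3'] -> combined with H5 into one object group
```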

Step 503, determining the matching degree between the first class object and each second class object in the same object group.

After generating the object group based on the first class object and the second class object, this step may determine a matching degree between the first class object and each of the second class objects in the same object group.

In some optional embodiments, the determining the matching degree between the first class object and each of the second class objects in the same object group includes: determining the matching degree of the first class of objects and each second class of objects in the same object group respectively based on a pre-trained neural network; the neural network is obtained by training according to a training method of the neural network provided in any embodiment of the disclosure. For example, the image to be detected for the associated object may be input into the association detection network shown in fig. 3, and the neural network (the matching detection network 33) outputs the matching degree between the first class object and each second class object in the same object group.

Step 504, determining the second class objects associated with the first class objects based on the matching degrees of the first class objects and the second class objects in the same object group.

Taking fig. 2 as an example, with the same object group including human hand H5, human body B2, and human body B3, this embodiment obtains the matching degrees between human hand H5 and human body B2 and human body B3, respectively: matching degree m1 and matching degree m2. This step may determine, based on these two matching degrees, which human body the human hand H5 is associated with. In a possible implementation manner, the first class object and the second class object with the largest matching degree in the same object group are determined as having the corresponding association relationship. In conjunction with fig. 2, when the matching degree m2 is greater than the matching degree m1, it may be determined that human hand H5 is associated with human body B3.
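In code, this decision rule is a simple argmax over the group's matching degrees (a sketch with hypothetical scores):

```python
degrees = {"B2": 0.12, "B3": 0.94}          # matching degrees of hand H5 in its group
associated = max(degrees, key=degrees.get)  # second class object with the largest degree
print(associated)  # 'B3' -> human hand H5 is associated with human body B3
```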

In the embodiment of the present disclosure, on the basis of detecting a first class object and a second class object in an image, an object group is generated based on one first class object and at least two second class objects, the matching degrees of the first class object and each second class object in the same object group are determined, and the second class object associated with the first class object is determined according to a plurality of matching degrees determined in the object group.

According to the method for detecting associated objects, the second class object associated with the first class object can be determined from a plurality of second class objects in the form of an object group. Compared with directly detecting the association relationship of a single matching pair formed by a first class object and a second class object, the object group form realizes global optimization over a plurality of matching pairs, so that the second class object associated with the first class object can be determined more accurately.

For a scene with multiple objects, especially one where objects in the image occlude or overlap one another, the method provided by this embodiment takes the multiple first class objects and second class objects that may have association relationships as detection objects within the same group, in the form of object groups. On this basis, it realizes global optimization of the association detection over the multiple matching pairs formed by the first class objects and second class objects in the image, thereby improving the accuracy of the matching degree calculation results between them.

In some alternative embodiments, after the first class object and the second class object in the image are detected, a third class object may also be detected in the image. The third class of objects comprises a second human body part object; for example, the second human body part object comprises a human face object or a human hand object.

An object group is generated based on the detected objects of the first type, the at least two objects of the second type and the at least two objects of the third type in the image. Then, in the same object group, the matching degree of the first class object with each second class object is determined, and the matching degree of the first class object with each third class object is determined. Determining second class objects correspondingly associated with the first class objects based on the matching degree of the first class objects and each second class object in the same object group; and determining the third class objects correspondingly associated with the first class objects based on the matching degree of the first class objects and each third class object in the same object group.

In the above optional embodiment, when detecting associated objects, the second class object associated with the first class object and the third class object associated with the first class object may be determined simultaneously. That is, this association detection manner determines the association relationships among the first class object, the second class object, and the third class object in the image at the same time, without separately detecting the association relationship between the first class object and the second class object, or between the first class object and the third class object. For a multi-object scene, especially one with occlusion or overlap among multiple objects in the image, the present embodiment takes the first class objects, second class objects, and third class objects that may have association relationships as detection objects within the same group, in the form of object groups, and determines the association relationships among them simultaneously on that basis.

As shown in fig. 6, the present disclosure provides a training apparatus for a neural network, which may perform a training method for a neural network according to any embodiment of the present disclosure. The apparatus may include an object detection module 601, a candidate set generation module 602, a degree of match determination module 603, a group association loss determination module 604, and a network parameter adjustment module 605. Wherein:

an object detection module 601, configured to detect a first class object and a second class object in an image;

a candidate group generating module 602, configured to generate at least one candidate group based on the detected first class objects and second class objects, where the candidate group includes at least one of the first class objects and at least two of the second class objects;

a matching degree determining module 603, configured to determine, based on a neural network, matching degrees between the first class of objects and each second class of objects in the same candidate object group, respectively;

a group association loss determining module 604, configured to determine a group association loss of the candidate object group according to the matching degrees of the first class object and each second class object in the same candidate object group, where the group association loss is positively correlated with the matching degree between the first class object and a non-associated second class object;

a network parameter adjusting module 605, configured to adjust a network parameter of the neural network according to the group association loss.

In some optional embodiments, the group association loss is also inversely related to the matching degree between the first class object and the associated second class object within the candidate object group.

In some optional embodiments, as illustrated in fig. 7, the apparatus further comprises: a training completion determining module 701, configured to determine that the neural network completes training when the group association loss is smaller than a preset loss value.

In some optional embodiments, the object detection module 601, when configured to detect the first class object and the second class object in the image, includes: extracting a feature map of the image; determining a first class object and a second class object in the image according to the feature map; the matching degree determining module 603, when configured to determine, based on a neural network, matching degrees between the first class of objects and the second class of objects in the same candidate object group, includes: determining a first characteristic of the first class of objects according to the characteristic diagram; according to the feature map, determining second features of all second-class objects in the candidate object group to obtain a second feature set corresponding to the first features; splicing each second feature in the second feature set with the first feature respectively to obtain a spliced feature set; and determining the matching degree between the second class object and the first class object corresponding to the splicing features in the splicing feature set based on the neural network.

In some optional embodiments, each second class object in the candidate object group and the first class object satisfy a preset relative position relationship; or, there is an overlapping area between the detection frame of each second class object in the candidate object group and the detection frame of the first class object.

In some optional embodiments, the first class of objects comprises first human body part objects, and the second class of objects comprises human body objects; alternatively, the first class of objects comprises human objects and the second class of objects comprises first human part objects.

In some alternative embodiments, the first human body part object comprises a human face object or a human hand object.

In some optional embodiments, the object detection module 601 is further configured to detect a third class of objects in the image; the candidate group generating module 602, when configured to generate at least one candidate group based on the detected first class object and the second class object, includes: generating at least one candidate object group based on the detected first class objects, second class objects and third class objects, each candidate object group further comprising at least two of the third class objects; the matching degree determining module 603 is further configured to determine, based on a neural network, matching degrees between the first class of objects and each third class of objects in the same candidate object group; the group association loss positively correlates to a degree of match between the first class of objects and a non-associated third class of objects.

In some alternative embodiments, the third class of objects comprises a second body part object.

As shown in fig. 8, the present disclosure provides a related object detection apparatus, which may perform the related object detection method according to any embodiment of the present disclosure. The apparatus may include a detection module 801, an object group generation module 802, a determination module 803, and an associated object determination module 804. Wherein:

a detection module 801, configured to detect a first class object and a second class object in an image;

an object group generating module 802, configured to generate at least one object group based on the detected first class objects and second class objects, where the object group includes one first class object and at least two second class objects;

a determining module 803, configured to determine matching degrees between the first class object and each second class object in the same object group;

an associated object determining module 804, configured to determine, based on matching degrees between the first class object and each second class object in the same object group, a second class object associated with the first class object.

In some optional embodiments, the object group generating module 802, when configured to generate at least one object group based on the detected first class object and the second class object, includes: performing a combining operation on the detected first-class objects; the combining operation includes: combining the first-class object and any two detected second-class objects into an object group; alternatively, the first-type object and each of the detected second-type objects are combined into one object group.

In some optional embodiments, the object group generating module 802, when configured to generate at least one object group based on the detected first class object and the second class object, includes: determining at least two second-class objects meeting a preset relative position relation with the first-class object as candidate associated objects of the first-class object according to the detected position information of the first-class object and the second-class object; and combining the first class object and each candidate related object of the first class object into an object group.

In some optional embodiments, the first class of objects comprises first human body part objects, and the second class of objects comprises human body objects; alternatively, the first class of objects comprises human objects and the second class of objects comprises first human part objects.

In some alternative embodiments, the first human body part object comprises a human face object or a human hand object.

In some optional embodiments, the detecting module 801 is further configured to detect a third class of objects in the image; the object group generating module 802, when configured to generate at least one object group based on the detected first class object and the second class object, includes: generating at least one object group based on the detected first class objects, second class objects and third class objects, wherein the object group further comprises at least two third class objects; the determining module 803 is further configured to determine matching degrees between the first class of objects and each third class of objects in the same object group; the associated object determining module 804 is further configured to determine a third class object associated with the first class object based on the matching degree between the first class object and each third class object in the same object group.

In some alternative embodiments, the third class of objects comprises a second body part object.

In some optional embodiments, the determining module 803, when configured to determine the matching degrees of the first class of objects and the second class of objects in the same object group, includes: determining the matching degree of the first class of objects and each second class of objects in the same object group respectively based on a pre-trained neural network; the neural network is obtained according to a training method of the neural network provided by any embodiment of the disclosure.

For the device embodiments, since they substantially correspond to the method embodiments, reference may be made to the partial description of the method embodiments for relevant points. The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules can be selected according to actual needs to achieve the purpose of at least one embodiment of the present disclosure. One of ordinary skill in the art can understand and implement it without inventive effort.

The present disclosure also provides a computer device, including a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor is capable of implementing the neural network training method according to any embodiment of the present disclosure, or implementing the associated object detection method according to any embodiment of the present disclosure when executing the program.

Fig. 9 is a schematic diagram illustrating a more specific hardware structure of a computer device according to an embodiment of the present disclosure, where the computer device may include: a processor 1010, a memory 1020, an input/output interface 1030, a communication interface 1040, and a bus 1050. Wherein the processor 1010, memory 1020, input/output interface 1030, and communication interface 1040 are communicatively coupled to each other within the device via bus 1050.

The processor 1010 may be implemented by a general-purpose CPU (Central Processing Unit), a microprocessor, an Application-Specific Integrated Circuit (ASIC), or one or more integrated circuits, and is configured to execute related programs to implement the technical solutions provided in the embodiments of the present disclosure.

The memory 1020 may be implemented in the form of a ROM (Read-Only Memory), a RAM (Random Access Memory), a static storage device, a dynamic storage device, or the like. The memory 1020 may store an operating system and other application programs; when the technical solutions provided by the embodiments of the present disclosure are implemented by software or firmware, the relevant program code is stored in the memory 1020 and called and executed by the processor 1010.

The input/output interface 1030 is used for connecting an input/output module to input and output information. The i/o module may be configured as a component in a device (not shown) or may be external to the device to provide a corresponding function. The input devices may include a keyboard, a mouse, a touch screen, a microphone, various sensors, etc., and the output devices may include a display, a speaker, a vibrator, an indicator light, etc.

The communication interface 1040 is used for connecting a communication module (not shown in the drawings) to implement communication interaction between the present device and other devices. The communication module may communicate in a wired manner (such as USB or a network cable) or in a wireless manner (such as a mobile network, Wi-Fi, or Bluetooth).

Bus 1050 includes a path that transfers information between various components of the device, such as processor 1010, memory 1020, input/output interface 1030, and communication interface 1040.

It should be noted that although the above-mentioned device only shows the processor 1010, the memory 1020, the input/output interface 1030, the communication interface 1040 and the bus 1050, in a specific implementation, the device may also include other components necessary for normal operation. In addition, those skilled in the art will appreciate that the above-described apparatus may also include only those components necessary to implement the embodiments of the present description, and not necessarily all of the components shown in the figures.

The present disclosure also provides a non-transitory computer-readable storage medium on which a computer program is stored, which, when executed by a processor, is capable of implementing the method for training a neural network of any of the embodiments of the present disclosure, or implementing the method for detecting a related object of any of the embodiments of the present disclosure.

The non-transitory computer readable storage medium may be, among others, ROM, Random Access Memory (RAM), CD-ROM, magnetic tape, floppy disk, optical data storage device, and the like, and the present disclosure is not limited thereto.

In some optional embodiments, the disclosed embodiments provide a computer program product comprising computer readable code that when run on a device, a processor in the device performs a training method for a neural network implementing any of the embodiments of the present disclosure, or a detection method for an associated object implementing any of the embodiments of the present disclosure. The computer program product may be embodied in hardware, software or a combination thereof.

Other embodiments of the present disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This disclosure is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.

It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

The above description is only exemplary of the present disclosure and is not intended to limit the present disclosure, so that any modification, equivalent replacement, or improvement made within the spirit and principle of the present disclosure should be included in the scope of the present disclosure.
