Object detection network training and detection method, device, equipment and storage medium

Document No.: 1926659  Publication date: 2021-12-03

Note: This technology, "Object detection network training and detection method, device, equipment and storage medium", was designed and created by 王柏润, 张学森, 刘春亚, 陈景焕, and 伊帅 on 2021-09-13. Its main content is as follows: The application provides a training and detection method, apparatus, device, and storage medium for an object detection network. The training method of the object detection network comprises: performing object detection on each image in an image dataset input to the object detection network to obtain the confidence that each object contained in each image is predicted as each of a plurality of preset types, the plurality of preset types comprising one or more annotated types labeled by the image dataset and one or more non-annotated types not labeled by the image dataset; for each object, determining reference annotation information of the object for each non-annotated type according to the non-associated confidence that the object is predicted as that non-annotated type; for each object, determining loss information of the object being predicted as each preset type according to the confidence that the object is predicted as each preset type, the real annotation information of the object, and the reference annotation information of the object for each non-annotated type; and adjusting network parameters of the object detection network based on the loss information of each object being predicted as each preset type.

1. A method of training an object detection network, comprising:

performing object detection on each image in an image dataset input to the object detection network to obtain a confidence that each object contained in each image is predicted as each of a plurality of preset types, wherein the plurality of preset types comprise one or more annotated types labeled by the image dataset and one or more non-annotated types not labeled by the image dataset;

for each object, determining reference annotation information of the object for each non-annotated type according to a non-associated confidence that the object is predicted as that non-annotated type;

for each object, determining loss information of the object being predicted as each preset type according to the confidence that the object is predicted as each preset type, real annotation information of the object, and the reference annotation information of the object for each non-annotated type;

and adjusting network parameters of the object detection network based on the loss information of each object being predicted as each preset type.

2. The method of claim 1, wherein determining the reference annotation information of the object for the non-annotated type according to the non-associated confidence that the object is predicted as the non-annotated type comprises:

in a case that the non-associated confidence reaches a preset positive sample confidence, determining that the reference annotation information of the object for the non-annotated type is first preset reference annotation information;

in a case that the non-associated confidence does not reach a preset negative sample confidence, determining that the reference annotation information of the object for the non-annotated type is second preset reference annotation information;

wherein the positive sample confidence is not less than the negative sample confidence.

3. The method of claim 2, further comprising:

in a case that the non-associated confidence reaches the negative sample confidence but does not reach the positive sample confidence, determining that the reference annotation information of the object for the non-annotated type is third preset reference annotation information.

4. The method according to any one of claims 1 to 3, wherein each of the preset types is determined to be an annotated type or a non-annotated type by:

acquiring the one or more annotated types labeled in the image dataset;

taking each preset type in turn as a current type, and performing:

determining whether the current type matches one of the one or more annotated types;

and if not, determining the current type to be a non-annotated type.

5. The method according to any one of claims 1 to 4, wherein determining the loss information of the object being predicted as each preset type according to the confidence that the object is predicted as each preset type, the real annotation information of the object, and the reference annotation information of the object for each non-annotated type comprises:

for each non-annotated type, determining first loss information of the object being predicted as the non-annotated type based on a difference between the non-associated confidence that the object is predicted as the non-annotated type and the reference annotation information of the object for the non-annotated type;

for each annotated type, determining second loss information of the object being predicted as the annotated type according to a difference between the confidence that the object is predicted as the annotated type and the real annotation information of the object.

6. The method of claim 5, wherein adjusting the network parameters of the object detection network based on the loss information of each object being predicted as each preset type comprises:

for each object, determining the sum of the first loss information and the second loss information of the object to obtain total loss information of the object;

determining a gradient of descent in a back propagation process according to the total loss information of each of the objects;

and adjusting the network parameters of the object detection network through back propagation according to the descending gradient.

7. The method according to any one of claims 1 to 6, wherein at least two of the image data sets input to the object detection network are not identical in annotation type.

8. A human body object detection method, comprising:

acquiring a scene image;

performing object detection on the scene image through an object detection network to obtain a human body object contained in the scene image and confidences that the human body object is predicted as each preset type, wherein the object detection network is obtained by training according to the training method of the object detection network of any one of claims 1 to 7;

determining a highest confidence among the confidences that the human body object is predicted as each preset type; and

determining the preset type corresponding to the highest confidence as an object type of the human body object.

9. The method of claim 8, wherein:

the human body object includes at least one of: a face, a hand, an elbow, a shoulder, a leg, and a torso;

the preset types include at least one of: a face class, a hand class, an elbow class, a shoulder class, a leg class, a torso class, and a background class.

10. A human body object detection method, comprising:

acquiring a plurality of image sets, wherein object types annotated by at least two image sets among the plurality of image sets are not identical;

performing object detection on each image in the plurality of image sets through an object detection network to obtain a human body object contained in each image and confidences that the human body object is predicted as each preset type, wherein the object detection network is obtained by training according to the training method of the object detection network of any one of claims 1 to 7;

determining a highest confidence among the confidences that the human body object is predicted as each preset type; and

determining the preset type corresponding to the highest confidence as an object type of the human body object.

11. An apparatus for training an object detection network, comprising:

a detection module, configured to perform object detection on each image in an image dataset input to the object detection network to obtain confidences that each object contained in each image is predicted as each of a plurality of preset types;

a first determining module, configured to determine, according to one or more annotated types labeled by the image dataset among the plurality of preset types, one or more non-annotated types not labeled by the image dataset among the plurality of preset types;

a second determining module, configured to determine, for each object, reference annotation information of the object for each non-annotated type according to a non-associated confidence that the object is predicted as that non-annotated type;

a third determining module, configured to determine, for each object, loss information of the object being predicted as each preset type according to the confidence that the object is predicted as each preset type, real annotation information of the object, and the reference annotation information of the object for each non-annotated type;

and an adjusting module, configured to adjust network parameters of the object detection network based on the loss information of each object being predicted as each preset type.

12. A human body object detection apparatus, comprising:

a first acquisition module, configured to acquire a scene image;

a first prediction module, configured to perform object detection on the scene image through an object detection network to obtain a human body object contained in the scene image and confidences that the human body object is predicted as each preset type, wherein the object detection network is obtained by training according to the training method of the object detection network of any one of claims 1 to 7;

a first object type determining module, configured to determine a highest confidence among the confidences that the human body object is predicted as each preset type, and determine the preset type corresponding to the highest confidence as an object type of the human body object.

13. A human body object detection apparatus, comprising:

a second acquisition module, configured to acquire a plurality of image sets, wherein object types annotated by at least two image sets among the plurality of image sets are not identical;

a second prediction module, configured to perform object detection on each image in the plurality of image sets through an object detection network to obtain a human body object contained in each image and confidences that the human body object is predicted as each preset type, wherein the object detection network is obtained by training according to the training method of the object detection network of any one of claims 1 to 7;

a second object type determining module, configured to determine a highest confidence among the confidences that the human body object is predicted as each preset type, and determine the object type corresponding to the highest confidence as the object type of the human body object.

14. An electronic device, comprising a memory for storing computer instructions executable on a processor, and the processor, configured to implement the method of any one of claims 1 to 10 when executing the computer instructions.

15. A computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the method of any one of claims 1 to 10.

Technical Field

The present application relates to computer technologies, and in particular, to a method, an apparatus, a device, and a storage medium for training and detecting an object detection network.

Background

Object detection techniques are very important in the field of computer vision. In order to improve the universality of an object detection network, a single network often needs to support detection tasks for multiple object types. In practice, a single training sample set may not be labeled with all object types that the object detection network can detect; therefore, a plurality of training sample sets, which together label all the object types, are required to jointly train the object detection network.

Disclosure of Invention

In view of the above, the present application at least discloses a training method for an object detection network, which includes: performing object detection on each image in an image dataset input to the object detection network to obtain a confidence that each object contained in each image is predicted as each of a plurality of preset types, wherein the plurality of preset types comprise one or more annotated types labeled by the image dataset and one or more non-annotated types not labeled by the image dataset; for each object, determining reference annotation information of the object for each non-annotated type according to the non-associated confidence that the object is predicted as that non-annotated type; for each object, determining loss information of the object being predicted as each preset type according to the confidence that the object is predicted as each preset type, the real annotation information of the object, and the reference annotation information of the object for each non-annotated type; and adjusting network parameters of the object detection network based on the loss information of each object being predicted as each preset type.

In some embodiments, determining the reference annotation information of the object for the non-annotated type according to the non-associated confidence that the object is predicted as the non-annotated type includes: determining the reference annotation information to be first preset reference annotation information in a case that the non-associated confidence reaches a preset positive sample confidence; and determining the reference annotation information to be second preset reference annotation information in a case that the non-associated confidence does not reach a preset negative sample confidence, wherein the positive sample confidence is not less than the negative sample confidence.

In some embodiments, the method further comprises: determining the reference annotation information to be third preset reference annotation information in a case that the non-associated confidence reaches the negative sample confidence but does not reach the positive sample confidence.

In some embodiments, each of the preset types is determined to be an annotated type or a non-annotated type by: acquiring the one or more annotated types labeled in the image dataset; taking each preset type in turn as a current type, and performing: determining whether the current type matches one of the one or more annotated types; and if not, determining the current type to be a non-annotated type.

In some embodiments, determining the loss information of the object being predicted as each preset type according to the confidence that the object is predicted as each preset type, the real annotation information of the object, and the reference annotation information of the object for each non-annotated type includes: for each non-annotated type, determining first loss information of the object being predicted as the non-annotated type based on a difference between the non-associated confidence that the object is predicted as the non-annotated type and the reference annotation information; and for each annotated type, determining second loss information of the object being predicted as the annotated type according to a difference between the confidence that the object is predicted as the annotated type and the real annotation information of the object.

In some embodiments, adjusting the network parameters of the object detection network based on the loss information of each object being predicted as each preset type includes: for each object, determining the sum of the first loss information and the second loss information of the object to obtain total loss information of the object; determining a descending gradient in a back-propagation process according to the total loss information of each object; and adjusting the network parameters of the object detection network through back propagation according to the descending gradient.

In some embodiments, annotation types of at least two image datasets input to the object detection network are not identical.

The application also provides a human body object detection method, which comprises: acquiring a scene image; performing object detection on the scene image through an object detection network to obtain a human body object contained in the scene image and confidences that the human body object is predicted as each preset type, wherein the object detection network includes a detection network obtained by training according to the network training method shown in any one of the embodiments; determining a highest confidence among the confidences that the human body object is predicted as each preset type; and determining the preset type corresponding to the highest confidence as an object type of the human body object.

In some embodiments, the human body object includes at least one of: a face, a hand, an elbow, a shoulder, a leg, and a torso; and the preset types include at least one of: a face class, a hand class, an elbow class, a shoulder class, a leg class, a torso class, and a background class.

The application also provides a human body object detection method, which comprises: acquiring a plurality of image sets, wherein object types annotated by at least two image sets among the plurality of image sets are not identical; performing object detection on images in the plurality of image sets through an object detection network to obtain human body objects contained in the images and confidences that the human body objects are predicted as each preset type, wherein the object detection network includes a detection network obtained by training according to the network training method shown in any one of the embodiments; determining a highest confidence among the confidences that each human body object is predicted as each preset type; and determining the object type corresponding to the highest confidence as the object type of the human body object.

The present application further provides a training apparatus for an object detection network, including: a detection module, configured to perform object detection on images input to the object detection network to obtain confidences that each object contained in each image is predicted as each preset type; a first determining module, configured to determine, according to the object types labeled in the image to which each object belongs, the non-annotated types among the preset types that do not belong to those object types; a second determining module, configured to determine, according to the non-associated confidence that the object is predicted as each non-annotated type, reference annotation information of the object for that non-annotated type; a third determining module, configured to determine, according to the confidences that the object is predicted as each preset type, the real annotation information of the object, and the reference annotation information, loss information of the object being predicted as each preset type; and an adjusting module, configured to adjust network parameters of the object detection network based on the loss information.

The application also provides a human body object detection apparatus, including: a first acquisition module, configured to acquire a scene image; a first prediction module, configured to perform object detection on the scene image through an object detection network to obtain a human body object contained in the scene image and confidences that the human body object is predicted as each preset type, wherein the object detection network includes a detection network obtained by training according to the network training method shown in any one of the embodiments; and a first object type determining module, configured to determine a highest confidence among the confidences that the human body object is predicted as each preset type and determine the preset type corresponding to the highest confidence as an object type of the human body object.

The application also provides a human body object detection apparatus, including: a second acquisition module, configured to acquire a plurality of image sets, wherein object types annotated by at least two image sets among the plurality of image sets are not identical; a second prediction module, configured to perform object detection on images in the plurality of image sets through an object detection network to obtain human body objects contained in the images and confidences that the human body objects are predicted as each preset type, wherein the object detection network includes a detection network obtained by training according to the network training method shown in any one of the embodiments; and a second object type determining module, configured to determine a highest confidence among the confidences that each human body object is predicted as each preset type and determine the object type corresponding to the highest confidence as the object type of the human body object.

The present application further proposes an electronic device, which comprises a memory for storing computer instructions executable on a processor, and a processor for implementing the method shown in any of the foregoing embodiments when executing the computer instructions.

The present application also proposes a computer-readable storage medium, on which a computer program is stored, which when executed by a processor implements the method shown in any of the previous embodiments.

In the above technical solution, object detection may be performed on each image in an image dataset input to an object detection network, to obtain the confidence that each object contained in each image is predicted as each of a plurality of preset types, where the plurality of preset types include one or more annotated types labeled by the image dataset and one or more non-annotated types not labeled by the image dataset; reference annotation information of each object for each non-annotated type is determined according to the non-associated confidence that the object is predicted as that non-annotated type; loss information of each object being predicted as each preset type is determined according to the confidence that the object is predicted as each preset type, the real annotation information of the object, and the reference annotation information of the object for each non-annotated type; and the network parameters of the object detection network are adjusted based on the loss information.

In this way, reference annotation information can be added for a detected object in the case that the object is predicted as a non-annotated type, so that accurate loss information can be determined from the added reference annotation information during network training. The network thus learns accurate information, which improves detection accuracy and reduces the false alarm rate.

It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application.

Drawings

In order to more clearly illustrate one or more embodiments of the present application or the technical solutions in the related art, the drawings used in the description of the embodiments or the related art are briefly introduced below. Obviously, the drawings in the following description are only some of the embodiments described in the present application, and other drawings can be obtained by those skilled in the art without inventive effort.

FIG. 1 is a flow chart of a method for training an object detection network according to the present disclosure;

fig. 2 is a schematic flow chart of a loss information determination method according to the present application;

fig. 3 is a schematic flowchart of an object detection network training method according to the present application;

FIG. 4 is a flow chart of a method of determining sub-loss information according to the present application;

fig. 5 is a schematic method flow diagram of a human body object detection method according to the present application;

fig. 6 is a schematic method flow diagram of a human body object detection method according to the present application;

FIG. 7 is a schematic structural diagram of an object detection network training apparatus according to the present application;

fig. 8 is a schematic diagram of a hardware structure of an electronic device shown in the present application.

Detailed Description

Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present application. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present application, as detailed in the appended claims.

The terminology used herein is for the purpose of describing particular embodiments only and is not intended to limit the application. As used in this application and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items. It should also be understood that the word "if" as used herein may be interpreted as "when," "upon," or "in response to determining," depending on the context.

In the following, a human body detection scenario is taken as an example to introduce a method for performing joint training in the related art.

In the above scenario, the object detection network (hereinafter referred to as a detection network) may detect a face object, a hand object, and an elbow object included in the target image.

In the above scenario, the detection network may be trained on image dataset 1 and image dataset 2. Dataset 1 is labeled with objects of the face type and the hand type, and dataset 2 is labeled with objects of the face type and the elbow type. In some instances, the annotation may be in one-hot form. For example, in dataset 1, the annotation information for an object of the face type may be [1, 0, 0], meaning that the true value of the object being predicted as the face type is 1, the true value as the hand type is 0, and the true value as the background type is 0. As another example, the annotation information for an object of the hand type may be [0, 1, 0], meaning that the true value of the object being predicted as the face type is 0, the true value as the hand type is 1, and the true value as the background type is 0.

It is understood that, on one hand, neither dataset 1 nor dataset 2 fully labels all object types detectable by the object detection network, while the combination of dataset 1 and dataset 2 labels all of them. On the other hand, an unlabeled object in dataset 1 or dataset 2 is regarded as belonging to the background class, i.e., the real annotation information corresponding to the unlabeled object is [0, 0, 1]. For example, the real annotation information corresponding to an unlabeled elbow object in dataset 1 is [0, 0, 1].
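As an illustration, here is a minimal sketch of this partial one-hot labeling convention (the class names and helper below are hypothetical, not part of the application):

```python
import numpy as np

# Dataset 1 labels only [face, hand, background]; an unlabeled object
# (e.g., an elbow) falls back to the background class.
DATASET1_CLASSES = ["face", "hand", "background"]

def one_hot(label, classes=DATASET1_CLASSES):
    """Encode an annotated object as a one-hot vector over the dataset's label space."""
    vec = np.zeros(len(classes), dtype=np.float32)
    vec[classes.index(label)] = 1.0
    return vec

print(one_hot("face"))        # [1. 0. 0.]  face object
print(one_hot("hand"))        # [0. 1. 0.]  hand object
print(one_hot("background"))  # [0. 0. 1.]  unlabeled elbow treated as background
```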

In training, the detection network may be trained based on dataset 1 and dataset 2. Here, the configuration of the detection network is not particularly limited.

In one iteration of training, the acquired data set 1 and data set 2 may be input into the object detection network, so as to obtain detection frames of each object included in the data set 1 and data set 2, and a detection result for a type of the object in the detection frames; the type detection result includes confidence levels respectively corresponding to a plurality of preset types of the object, such as a face object, a hand object, an elbow object, a background and the like.

Then, for each detected detection frame, the face type, the hand type, the elbow type, and the background type may be determined as the current type, and sub-loss information of the object in the detection frame, which is predicted as the current type, may be determined.

When the sub-loss information is determined, it may be determined whether the current type matches an object type marked in the image data set.

And if the current type matches the object type marked by the image data set, determining that the type of the object in the detection frame is predicted as the sub-loss information of the current type based on the real marking information and the confidence of the object in the detection frame.

And if the current type is not matched with the object type marked by the image data set, setting the sub-loss information to be 0.

For example, suppose detection box 1 corresponds to an object detected by the object detection network in an image from dataset 1, and the object in detection box 1 is an elbow object, which is not labeled in dataset 1. The annotation information of the object is [0, 0, 1], that is, the true value of the object being predicted as a face is 0, the true value as a hand is 0, and the true value as background is 1. Assume that the type detection result for the object in detection box 1 is [0.1, 0.1, 0.7, 0.1], that is, the confidence of the object being a face is 0.1, a hand 0.1, an elbow 0.7, and background 0.1.

Since the data set 1 is labeled for the face type, the sub-loss information of the object predicted to be a face can be determined based on the true value 0 of the object predicted to be a face and the confidence 0.1 that the object is predicted to be a face.

Since dataset 1 is labeled for a human hand type, the child loss information for the object being predicted as a human hand may be determined based on the true value 0 of the object being predicted as a human hand and the confidence 0.1 that the object is predicted as a human hand.

Since the data set 1 does not label the elbow type, it is not necessary to consider the sub-loss information of the object predicted as the elbow, that is, the sub-loss information of the object predicted as the elbow may be set to 0.

Since dataset 1 is annotated for the background type, the sub-loss information for the object when predicted to be background can be determined based on the true value 1 of the object when predicted to be background and the confidence 0.1 of the object when predicted to be background.

After the sub-loss information of the object in the detection frame being predicted as each object type is determined, the sum of the determined sub-loss information corresponding to each object type may be taken as the loss information corresponding to the object in the detection frame, where this loss information represents the difference between the type detection result for the object in the detection frame and the real annotation information.

After determining the loss information corresponding to the objects in each detection frame, the sum of the loss information corresponding to the objects in each detection frame detected in the image may be determined as the total loss information of the iteration, and the network parameters of the detection network may be adjusted according to the total loss information.

Finally, the above iteration process is repeated until the detection network converges and training ends.

It is understood that, in the related art, the loss information of an object in an image being predicted as a type that is not labeled in the image dataset to which the image belongs is set to 0. Since loss information closer to 0 means a more correct detection result, a neural network is generally trained with the goal of driving the loss toward 0. Consequently, during iterative training an unlabeled object may be classified into the unlabeled (non-background) type rather than into the background type, even though under this convention it should be classified as background. Inaccurate loss information is therefore introduced in the related art, the detection network learns inaccurate information, and the false alarm rate of the detection network is relatively high.

For example, in the example of determining the loss information corresponding to detection box 1, the object contained in detection box 1 is an elbow object (an unlabeled object). Its classification should be determined as the background type, but in that example the prediction of the unlabeled type goes entirely unpenalized, so the object is effectively allowed to be classified into a type not labeled in the image. It can be seen that inaccurate loss information may be introduced in the related art, which causes the detection network to learn inaccurate information and leads to false alarms of the object detection network.
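To make the related-art behavior concrete, here is a minimal sketch (PyTorch, illustrative only; the masking helper and the class ordering are assumptions, not from the application) showing how ignoring non-annotated types leaves the 0.7 elbow confidence of detection box 1 unpenalized:

```python
import torch
import torch.nn.functional as F

def related_art_loss(confidences, true_label, labeled_mask):
    """Per-object loss in the related art: per-class sub-losses are computed
    only for types annotated in the object's dataset; sub-losses for
    non-annotated types are forced to 0 (i.e., ignored)."""
    per_class = F.binary_cross_entropy(confidences, true_label, reduction="none")
    return (per_class * labeled_mask).sum()

# Detection box 1: an elbow object in dataset 1, with preset types assumed
# ordered as [face, hand, elbow, background].
conf = torch.tensor([0.1, 0.1, 0.7, 0.1])   # predicted confidences
gt   = torch.tensor([0.0, 0.0, 0.0, 1.0])   # unlabeled object treated as background
mask = torch.tensor([1.0, 1.0, 0.0, 1.0])   # elbow type is not labeled in dataset 1

# The 0.7 elbow confidence contributes nothing to the loss, so the network
# is never discouraged from predicting "elbow" for this background sample.
print(related_art_loss(conf, gt, mask))
```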

In view of this, the present application provides a training method for an object detection network. The method adds reference annotation information for a detected object in the case that the object is predicted as a non-annotated type, so that accurate loss information can be determined from the added reference annotation information during network training. The network thus learns accurate information, which improves detection accuracy and reduces the false alarm rate.

A non-annotated type is an object type that can be predicted by the object detection network but is not labeled in the image dataset.

Referring to fig. 1, fig. 1 is a flowchart illustrating a method of network training according to the present application.

The training method illustrated in fig. 1 may be applied to an electronic device. The electronic device may execute the training method by installing a software system corresponding to the training method. The type of the electronic device may be a notebook computer, a server, a mobile phone, a PAD terminal, etc., and is not particularly limited in this application. The electronic device may be a client device or a server device, and is not particularly limited herein.

As shown in fig. 1, the method may include:

s102, carrying out object detection on each image in the image data set input into the object detection network to obtain confidence of each preset type of the object contained in each image, wherein the object is predicted to be in each preset type. The preset types include all object types that can be detected by the object detection network, for example, an object type labeled by the image data set (hereinafter, may be referred to as labeled type labeled category for short) and an object type not labeled by the image data set (hereinafter, may be referred to as non-labeled type non-labeled category for short). Accordingly, the confidence level that the object is predicted as each of the preset types includes a confidence level that the object is predicted as the annotated type (hereinafter, may be referred to as an associated confidence level), and a confidence level that the object is predicted as the non-annotated type (hereinafter, may be referred to as a non-associated confidence level).

The object detection network described above may be used for object detection in images. For example, the object detection network may be a human body object detection network, in which case human body objects in the target image can be detected through the detection network. The object detection network may be a network constructed based on RCNN (Region-based Convolutional Neural Network), Fast R-CNN, or Faster R-CNN. The present application does not limit the network structure of the object detection network.

The output result of the above object detection network may be a confidence level that the object included in the input image is predicted to be of each preset type.

The preset types can be set in advance by developers according to requirements. Assuming that the object detection network needs to detect face, hand, and elbow objects appearing in the image, the preset types can be set as a face class, a hand class, an elbow class, and a background class.

The image of the input object detection network may be from a plurality of image data sets, and the types of objects marked by at least two of the image data sets are not identical.

The image dataset may comprise a number of annotated image samples. The object types labeled in these images may cover only a part of the preset types. For example, if the preset types include a face class, a hand class, an elbow class, and a background class, the images may be labeled only with the face class or only with the hand class.

At present, image datasets labeled for only part of the object types are widely available. In this application, such image datasets can be used to train the object detection network; moreover, a plurality of image datasets carrying annotation information for different object types can be fused to train an object detection network for multiple object types, thereby reducing the training cost.

The confidence represents the degree of reliability with which an object detected from the image is predicted as each preset type, and may be represented by a probability value. According to the difference between the annotation information and the confidence, loss information corresponding to the detection result of the object detection network for the object can be determined.

In some examples, in S102, images of a plurality of image data sets may be input into the object detection network for calculation, and an object included in each image data set and a type detection result of the object may be obtained.

Then, S104 may be executed to determine, according to the annotated types labeled by the image dataset, the non-annotated types among the preset types that are not labeled by the image dataset.

The annotation type is specifically an object type annotated by the image data set. In some examples, when constructing an image dataset, object type information labeled for the image dataset may be packaged into the image dataset. At this time, the type of the object marked by the image in the image data set can be determined by acquiring the marked object type information.

The non-labeled type is specifically an object type that does not belong to the labeled type in the preset types. For example, the preset types include a face type, a human hand type, an elbow type, and a background type, the object type labeled by the image data set includes a face type, a human hand type, and a background type, and the elbow type in the preset types is the non-labeled type.

In some examples, when determining the non-annotation type, the type of the object annotated in the image data set may be obtained as an annotation type. Then, respectively determining each preset type as a current type, and executing: determining whether the current type matches an annotation type of the image data set; and if not, determining the current type as the non-labeling type.

In some examples, the same object type may be characterized using the same identification, and different object types may be characterized using different identifications. At this time, whether the current type is matched with the label type can be determined by determining whether the identifier of the current type is consistent with the identifier corresponding to the label type.

Therefore, the non-labeling type in the preset types can be determined, and then the reference labeling information of the object predicted as the non-labeling type can be determined, so that accurate loss information is obtained, and the network training effect is improved.
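A minimal sketch of this annotated/non-annotated split (the function and type names are illustrative assumptions, not from the application):

```python
def split_preset_types(preset_types, dataset_annotated_types):
    """Classify each preset type as annotated or non-annotated for one dataset,
    matching types by their identifiers."""
    annotated, non_annotated = [], []
    for current_type in preset_types:
        if current_type in dataset_annotated_types:
            annotated.append(current_type)
        else:
            non_annotated.append(current_type)
    return annotated, non_annotated

# Dataset 1 from the running example: the elbow class is the non-annotated type.
print(split_preset_types(["face", "hand", "elbow", "background"],
                         {"face", "hand", "background"}))
# (['face', 'hand', 'background'], ['elbow'])
```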

After determining the non-labeling type, S106 may be executed to determine reference labeling information of the object to the non-labeling type according to the non-related confidence that the object is predicted as the non-labeling type.

The reference label information of the object for the non-label type is information that is virtually labeled when the object is predicted to be the non-label type.

When an object is predicted as a non-annotated type, accurate loss information may not be determinable because annotation information corresponding to the object for that type cannot be acquired. In the related art the loss information is therefore set to 0, i.e., the loss when the object is predicted as the non-annotated type is simply not considered, which may introduce wrong loss information during model training. In this application, when the object is predicted as the non-annotated type, reference annotation information is virtually labeled for the object, so that more accurate loss information can be introduced and the network training effect is further improved.

In some examples, whether the object is a positive or negative sample of the non-annotated type may be determined based on a non-relevant confidence that the object is predicted to be the non-annotated type.

If the object is a positive sample, the reference annotation information may be determined to be the first preset reference annotation information (an empirical value). For example, the first preset reference annotation information may be 1.

If the object is a negative sample, the reference annotation information may be determined to be the second preset reference annotation information (an empirical value). For example, the second preset reference annotation information may be 0.

In some examples, when the object is determined to be a positive sample or a negative sample of the non-labeled type, the object type of the object (unlabeled object) may be predicted by using a trained object type determination network, so as to obtain the object type of the object. The object type determination network may be understood as a teacher model, which is obtained by training a plurality of training samples labeled with the preset types.

If the object type of the object obtained through the object type determination network is consistent with the non-labeled type, the object can be determined to be a positive sample of the non-labeled type.

And if the object type of the object obtained by the object type determination network is inconsistent with the non-labeled type, determining that the object is a negative sample of the non-labeled type.

In some examples, a first preset threshold may be set. If the detected non-correlation confidence that the object is of the non-labeled type reaches the first preset threshold, the object may be considered as a positive sample of the non-labeled type. Conversely, the object may be considered as a negative example of the non-labeled type.

In some examples, a second preset threshold may be set. And if the non-correlation confidence does not reach the second preset threshold, the object can be regarded as a negative sample of the non-labeling type. Conversely, the object may be considered as a positive sample of the non-labeled type.

By applying a threshold judgment to the non-associated confidence, the time and computation needed to determine the true value are reduced, the efficiency of determining the true value is improved, and the network training efficiency is improved in turn.

In some examples, a positive sample confidence and a negative sample confidence may both be set. If the non-associated confidence reaches the positive sample confidence, the object can be regarded as a positive sample; if it does not reach the negative sample confidence, the object can be regarded as a negative sample.

In this example, by setting the confidence of the positive sample and the confidence of the negative sample, more accurate positive sample and negative sample can be determined, so that more accurate information is provided for network training, and the accuracy of network detection is improved.

In some examples, the reference annotation information is determined to be the third preset reference annotation information when the non-associated confidence level reaches the negative sample confidence level and the positive sample confidence level is not reached.

The third preset reference annotation information may be an empirical value, and may be set to 0 in some examples.

In this example, objects are classified not only into positive samples and negative samples but also into difficult samples, and the loss information when the object is a difficult sample is set according to the third preset reference annotation information (for example, 0). As a result, during training the network may not learn the information provided by difficult samples and learns only the information provided by positive and negative samples, thereby providing more accurate information for network training and improving detection accuracy.
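A minimal sketch of this reference-annotation assignment (the threshold values are illustrative assumptions; the application only requires that the positive sample confidence is not less than the negative sample confidence):

```python
def reference_label(non_assoc_confidence, pos_conf=0.9, neg_conf=0.3):
    """Assign virtual (reference) annotation information for a non-annotated type.

    Returns 1.0 for a positive sample (first preset reference annotation information),
    0.0 for a negative sample (second preset reference annotation information), and
    None for a difficult sample, whose loss contribution is then suppressed
    (third preset reference annotation information).
    """
    if non_assoc_confidence >= pos_conf:
        return 1.0
    if non_assoc_confidence < neg_conf:
        return 0.0
    return None

print(reference_label(0.95))  # 1.0  -> positive sample of the non-annotated type
print(reference_label(0.10))  # 0.0  -> negative sample
print(reference_label(0.50))  # None -> difficult sample, ignored in the loss
```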

After determining the reference annotation information of the object predicted as the unlabeled type, S108 may be executed to determine loss information of the object predicted as each preset type according to the confidence that the object is predicted as each preset type, the real annotation information of the object, and the reference annotation information.

The loss information may be determined in two ways according to whether the object is predicted as an unmarked type.

In some examples, in response to the object being predicted as a non-annotated type, first loss information for the object being predicted as the non-annotated type may be determined based on a difference between the non-associated confidence and the reference annotation information.

For example, the first loss information may be obtained by taking the non-correlation confidence and the reference label information as input according to a preset first loss function. It should be noted that the present application does not limit the specific type of the first loss function.

In some examples, in response to the object being predicted as the annotation type, second loss information indicating that the object is predicted as the annotation type may be determined according to a difference between a confidence that the object is predicted as the annotation type and actual annotation information corresponding to the object. The labeling type comprises types except the non-labeling type in the preset types.

For example, a true value of the object predicted as the annotation type may be obtained according to the real annotation information of the image to which the object belongs, and then the second loss information may be obtained by taking, as input, a confidence that the object is predicted as the annotation type and the true value of the object predicted as the annotation type according to a preset second loss function. It should be noted that the present application does not limit the specific type of the second loss function.
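A minimal sketch of the two loss terms (binary cross-entropy is used here only as an example; the application does not fix the first or second loss function):

```python
import torch
import torch.nn.functional as F

def per_type_loss(confidence, target):
    """Difference between a predicted confidence and an annotation value."""
    return F.binary_cross_entropy(confidence, target)

# Second loss: annotated type, confidence vs. real annotation value (true value 0 here).
second_loss = per_type_loss(torch.tensor(0.1), torch.tensor(0.0))

# First loss: non-annotated type, non-associated confidence vs. reference annotation
# information (a negative sample with reference value 0 in this illustration).
first_loss = per_type_loss(torch.tensor(0.7), torch.tensor(0.0))

print(first_loss, second_loss)
```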

In step S110, network parameters of the object detection network are adjusted based on the loss information.

In some examples, for each object in the image, a sum of the first loss information and the second loss information corresponding to the object may be determined, so that total loss information detected for the image may be obtained.

And then, determining a descending gradient in the back propagation process according to the total loss information, and adjusting the network parameters of the object detection network through back propagation according to the descending gradient.
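A minimal sketch of the parameter update in S110, using a dummy classification head in place of the real object detection network (illustrative only; the optimizer choice and tensor shapes are assumptions):

```python
import torch
import torch.nn as nn

detector = nn.Linear(16, 4)                     # stand-in head over 4 preset types
optimizer = torch.optim.SGD(detector.parameters(), lr=0.01)

features = torch.randn(8, 16)                   # stand-in features of 8 detected objects
confidences = detector(features).sigmoid()      # confidence per preset type
targets = torch.randint(0, 2, (8, 4)).float()   # real + reference annotation values

total_loss = nn.functional.binary_cross_entropy(confidences, targets)

optimizer.zero_grad()
total_loss.backward()   # descending gradient obtained through back propagation
optimizer.step()        # network parameters adjusted along the descending gradient
```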

In some examples, the image may include a plurality of objects. The detection network can detect a plurality of preset types of objects. At this time, the images may be sequentially input into the detection network, so as to obtain a detection frame of each object in the image and a confidence level that each object is predicted to be of each preset type.

Referring to fig. 2, fig. 2 is a schematic flow chart of a loss information determining method according to the present application.

As shown in fig. 2, the detection frames corresponding to the detected objects may be sequentially set as the target detection frames, and S202 and S204 may be executed:

s202, determining an image data set corresponding to the image to which the object in the target detection frame belongs. The above-described object detection in-frame object will be simply referred to as an in-frame object hereinafter.

S204, sequentially taking each preset type as a current type, and executing S2042-S2048:

s2042, determining whether the current type is matched with one of the label types of the image data set.

S2044, if the matching is successful, acquiring a true annotation value when the in-frame object is predicted to be the current type from the real annotation information corresponding to the image data set; and then determining sub-loss information when the in-frame object is predicted as the current type according to the difference between the labeling truth value and the detected confidence coefficient.

S2046, if the matching is not successful, determining reference marking information of the object in the frame according to the non-relevant confidence coefficient when the object in the frame is predicted to be of the current type; and then determining sub-loss information when the object in the frame is of the current type according to the difference between the reference marking information and the non-correlation confidence coefficient.

After the sub-loss information corresponding to the in-frame object being predicted as each object type is determined, S2048 may be performed to determine the loss information of the detection result for the in-frame object by summing or averaging the sub-loss information.

After the above steps are completed with each detection frame in the image as a target detection frame, loss information of the detection result of the image detection can be obtained.

In some examples, when the training sample set of the object detection network consists of a plurality of image datasets, after the total loss information detected for each image is determined, the total loss information corresponding to each image in each image dataset may be obtained; the total loss information over the detection results for the images in all the image datasets may then be determined by, for example, averaging, and used to update the network parameters.

This completes one round of training of the object detection network. The above steps may then be repeated for multiple rounds of training until the detection network converges. The convergence condition may be, for example, that a preset training time is reached, or that the variation of the joint learning loss function over M consecutive forward propagations (M being a positive integer greater than 1) is smaller than a certain threshold. The present application does not specifically limit the condition for model convergence.

In the above technical solution, object detection may be performed on each image in an image dataset input to an object detection network, to obtain the confidence that each object contained in each image is predicted as each of a plurality of preset types, where the plurality of preset types include one or more annotated types labeled by the image dataset and one or more non-annotated types not labeled by the image dataset; reference annotation information of each object for each non-annotated type is determined according to the non-associated confidence that the object is predicted as that non-annotated type; loss information of each object being predicted as each preset type is determined according to the confidence that the object is predicted as each preset type, the real annotation information of the object, and the reference annotation information of the object for each non-annotated type; and the network parameters of the object detection network are adjusted based on the loss information.

In this way, reference annotation information can be added for a detected object in the case that the object is predicted as a non-annotated type, so that accurate loss information can be determined from the added reference annotation information during network training. The network thus learns accurate information, which improves detection accuracy and reduces the false alarm rate.

The following description of the embodiments is made in conjunction with a human detection network training scenario.

The human body detection network is specifically used for detecting a human face object, a human hand object and an elbow object contained in a target image. The human body detection network can be a detection network constructed based on a FASTER-RCNN network.

In the above scenario, the detection network may be trained by the image data set 1 and the image data set 2. It will be appreciated that in practical applications more data sets may be used.

Wherein, the data set 1 is labeled with objects of human face type and human hand type. The above data set 2 is labeled with objects of a face type and an elbow type.

In some instances, the annotation may be in one-hot form. For example, in dataset 1, the annotation information for an object of the face type may be [1, 0, 0], meaning that the true value of the object being predicted as the face type is 1, the true value as the hand type is 0, and the true value as the background type is 0. As another example, the annotation information for an object of the hand type may be [0, 1, 0], meaning that the true value of the object being predicted as the face type is 0, the true value as the hand type is 1, and the true value as the background type is 0.

It is understood that objects of the elbow type are not labeled in dataset 1, so the elbow type may be regarded as a non-annotated type corresponding to dataset 1. Similarly, since objects of the hand type are not labeled in dataset 2, the hand type may be regarded as a non-annotated type corresponding to dataset 2.

In this application, the number of training iterations may be preset to P, the initial network parameters of the detection network to Q, and the loss function to L, with the network parameters adjusted by stochastic gradient descent.

A positive sample confidence E and a negative sample confidence F may also be preset. When the confidence of an object being predicted as a non-annotated type reaches E, the object can be regarded as a positive sample, and the corresponding reference annotation information is 1. When that confidence does not reach F, the object can be regarded as a negative sample, and the corresponding reference annotation information is 0. When the confidence of the object being predicted as the non-annotated type lies between F and E, the object may be regarded as a difficult sample.

Referring to fig. 3, fig. 3 is a schematic flow chart of a network training method according to the present application. It should be noted that fig. 3 illustrates a method for adjusting network parameters in a round of iterative training.

As shown in fig. 3, in one round of iterative training, S302 may be performed by the human body detection network: each image included in data set 1 and data set 2 is passed through the detection network once, so as to obtain a detection frame for each object included in each image, together with the confidences that the object in each detection frame is predicted as the face class, the hand class, the elbow class, and the background class.
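As an illustrative sketch only, the output of S302 could take the following shape in code; the detector interface (a dict with "boxes" and "class_logits" per image), the class order, and the function name are assumptions rather than requirements of this application:

```python
import torch

CLASSES = ["face", "hand", "elbow", "background"]

def detect(detector, images):
    """One forward pass of S302: for every input image, return the detection
    frames and, for each frame, a confidence vector over the four preset types.

    `detector` is assumed to return, per image, a dict with "boxes" (N x 4)
    and "class_logits" (N x len(CLASSES)); this interface is illustrative.
    """
    results = []
    for output in detector(images):
        confidences = output["class_logits"].softmax(dim=-1)  # N x len(CLASSES)
        results.append((output["boxes"], confidences))
    return results
```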

Then, S304 may be performed by the total loss determining unit to determine the total loss information corresponding to the round of training.

When determining the total loss information, each detection frame obtained from the current input image may be taken as a target detection frame, and the following steps may be performed for it:

First, the image data set to which the object in the target detection frame (hereinafter referred to as the in-frame object) belongs is determined.

Then, each of the four preset types (face, hand, elbow, background) is taken in turn as the current type, and the sub-loss information for the in-frame object being predicted as the current type is determined.

Referring to fig. 4, fig. 4 is a flowchart illustrating a method of determining sub-loss information according to the present application.

As shown in fig. 4, S3042 may be executed to determine whether the current type matches an annotation type labeled in the corresponding data set. If there is a match, the sub-loss information may be determined as L(confidence, true). Here L represents a preset loss function, which may be a logarithmic loss function, a quadratic loss function, a cross-entropy loss function, or the like; the application does not limit the type of loss function. L(confidence, true) represents the difference, determined by the preset loss function, between the confidence that the in-frame object is predicted as the current type and the real annotation information.

If not, S3044 may be executed to determine whether the non-relevant confidence that the in-frame object is predicted as the current type reaches the threshold E. If so, the sub-loss information may be determined as L(confidence, 1), which represents the difference between the confidence that the in-frame object is predicted as the current type and the first reference labeling information (i.e., 1).

If the confidence does not reach the threshold E, S3046 may be further executed to determine whether the non-relevant confidence fails to reach the threshold F; if so, the sub-loss information may be determined as L(confidence, 0), which represents the difference between the confidence that the in-frame object is predicted as the current type and the second reference labeling information (i.e., 0).

Otherwise, that is, when the non-relevant confidence reaches F but does not reach E and the in-frame object is a difficult sample, the sub-loss information may be determined to be 0.

After the above steps are completed for the input images in each of the data sets 1 and 2, loss information corresponding to the detection of each input image can be obtained, and then the total loss information can be determined by means of, for example, summing or averaging.
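The branching of fig. 4 and the aggregation into total loss information might be sketched as follows; binary cross-entropy stands in for the otherwise unspecified loss function L, and the data structures (per-frame confidence vector, set of annotated types, ground-truth dictionary) are illustrative assumptions:

```python
import torch
import torch.nn.functional as nnf

CLASSES = ["face", "hand", "elbow", "background"]
E, F = 0.85, 0.30   # assumed positive/negative sample confidences

def sub_loss(conf, current_type, labeled_types, gt):
    """Sub-loss for one in-frame object and one current type (S3042-S3046).

    conf          -- confidence that the in-frame object is this type (0-dim tensor)
    labeled_types -- annotation types of the data set the image belongs to
    gt            -- real labeling information, e.g. {"face": 1.0, "hand": 0.0, ...}
    """
    if current_type in labeled_types:            # S3042: L(confidence, true)
        target = conf.new_tensor(gt.get(current_type, 0.0))
    elif conf.item() >= E:                       # S3044: L(confidence, 1)
        target = conf.new_tensor(1.0)
    elif conf.item() < F:                        # S3046: L(confidence, 0)
        target = conf.new_tensor(0.0)
    else:                                        # difficult sample: loss is 0
        return conf.new_tensor(0.0)
    return nnf.binary_cross_entropy(conf, target)

def total_loss(frames):
    """Sum the sub-losses of every detection frame over all preset types.
    `frames` is a list of (confidence_vector, labeled_types, gt) triples."""
    losses = [sub_loss(conf_vec[i], cls, labeled, gt)
              for conf_vec, labeled, gt in frames
              for i, cls in enumerate(CLASSES)]
    return torch.stack(losses).sum()
```

In this sketch a difficult sample contributes a constant zero, so it produces no gradient, which matches the behavior described above.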

Finally, S306 may be executed by the parameter adjusting unit to adjust the network parameters of the detection network according to the total loss information and stochastic gradient descent.
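A self-contained toy illustration of the S306 update follows; the linear layer and the random inputs merely stand in for the detection network and its total loss, and the learning rate is an assumed value:

```python
import torch

detector = torch.nn.Linear(8, 4)                            # stand-in for the detection network
optimizer = torch.optim.SGD(detector.parameters(), lr=0.01)

confidences = torch.sigmoid(detector(torch.randn(3, 8)))    # fake per-frame confidences
total = confidences.sum()                                    # stand-in for the total loss information

optimizer.zero_grad()
total.backward()     # descending gradient obtained by back propagation
optimizer.step()     # network parameters adjusted according to the gradient
```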

The above iteration process is then repeated until the detection network converges and training is completed.

In the above example, on one hand, when the object is predicted as an unlabeled type, the reference labeling information of the object is determined according to the corresponding confidence level, so that more accurate loss information is determined, more accurate information is provided for network training, and the network detection accuracy is further improved.

On the other hand, in the above example, when the object is a difficult sample, the corresponding loss information is determined to be 0, so that, compared with the related art, the introduction of inaccurate information is reduced, and the false alarm rate of the detection network is reduced.

The application also provides a human body object detection method. Referring to fig. 5, fig. 5 is a schematic flow chart of a method for detecting a human body object according to the present application.

As shown in fig. 5, the method may include:

S502, acquiring a scene image.

S504, performing object detection on the scene image through an object detection network to obtain human objects included in the scene image and confidence levels of the human objects predicted as each preset type, where the object detection network may include a network trained according to the object detection network training method shown in any one of the foregoing embodiments.

S506, determining the highest confidence coefficient of the confidence coefficients of the human body object predicted as each preset type, and determining the preset type corresponding to the highest confidence coefficient as the object type of the human body object.
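As an illustrative sketch of S506 (the class order and function name below are assumptions, not specified by this application):

```python
import numpy as np

PRESET_TYPES = ["face", "hand", "elbow", "background"]  # assumed class order

def object_type(confidences):
    """S506: take the preset type with the highest confidence as the object
    type of the detected human body object."""
    return PRESET_TYPES[int(np.argmax(confidences))]

print(object_type([0.05, 0.80, 0.10, 0.05]))  # -> "hand"
```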

The scene can be any scene needing human body object detection. For example, the above-described scenario may be a dangerous driving behavior detection scenario. At this time, the human body object appearing in the captured scene image can be detected, and the detected human body object is subjected to behavior matching, so that whether dangerous behaviors are performed or not is determined. As another example, the scene may be a table game scene. At this time, a human body object appearing in the captured scene image may be detected and the detected human body object may be correlated, thereby determining an executor performing an action such as placing a game chip.

In some examples, the human body object and the preset types may be set according to business requirements. In some examples, the human body object may include at least one of: a face, a hand, an elbow, a shoulder, a leg, and a torso. The preset types may include at least one of the following: a face class, a hand class, an elbow class, a shoulder class, a leg class, a torso class, and a background class. Therefore, various types of human body objects appearing in the image can be detected, and the method is applicable to more business scenarios.

In the above example, since the object detection network trained by the object detection network training method shown in any of the foregoing embodiments is used to perform object detection on the scene image, the accuracy rate of detecting the human body object in the image can be improved.

The application also provides a human body object detection method. Referring to fig. 6, fig. 6 is a schematic flow chart of a method for detecting a human body object according to the present application.

S602, acquiring a plurality of image sets; wherein the object types marked by at least two image sets in the plurality of image sets are not completely the same.

S604, performing object detection on the images in the plurality of image sets through an object detection network to obtain human body objects contained in the images and confidence degrees that the human body objects are predicted to be of each preset type; the object detection network may include a network obtained by training according to the object detection network training method shown in any one of the embodiments;

S606, determining the highest confidence among the confidences that the human body object is predicted as each preset type, and determining the object type corresponding to the highest confidence as the object type of the human body object.

An image data set may comprise a number of annotated image samples, and the object types labeled in the image data set may be only a part of the preset types. For example, if the preset types include the face class, the hand class, the elbow class, and the background class, the object types labeled may be only the face class or the hand class.

In the above example, object detection is performed on the images in the image sets by an object detection network trained according to the object detection network training method shown in any one of the foregoing embodiments. Therefore, the object detection network can be trained using image data sets that are labeled for only part of the object types, and an object detection network for a plurality of object types can be trained by fusing a plurality of image data sets carrying annotation information of different object types, thereby reducing the training cost.

Corresponding to any one of the above embodiments, the present application further provides a training apparatus for an object detection network.

Referring to fig. 7, fig. 7 is a schematic structural diagram of a training apparatus for an object detection network according to the present application.

As shown in fig. 7, the device 70 may include: a detection module 71, configured to perform object detection on each image in the image data set input to the object detection network, so as to obtain the confidence that an object included in each image is predicted as each preset type in a plurality of preset types; a first determining module 72, configured to determine, according to the annotation types annotated by the image data set among the plurality of preset types, the non-annotated types that are not annotated by the image data set among the plurality of preset types; a second determining module 73, configured to determine, for each object, the reference labeling information of the object for a non-annotated type according to the non-relevant confidence that the object is predicted as that non-annotated type; a third determining module 74, configured to determine, for each object, the loss information of the object being predicted as each preset type according to the confidence that the object is predicted as each preset type, the real labeling information of the object, and the reference labeling information of the object for each non-annotated type; and an adjusting module 75, configured to adjust the network parameters of the object detection network based on the loss information of each object being predicted as each preset type.

In some illustrated embodiments, the second determining module 73 is specifically configured to: determine the reference labeling information as first preset reference labeling information when the non-relevant confidence reaches a preset positive sample confidence; and determine the reference labeling information as second preset reference labeling information when the non-relevant confidence does not reach a preset negative sample confidence; wherein the positive sample confidence is not less than the negative sample confidence.

In some illustrated embodiments, the second determining module 73 is further configured to: determine the reference labeling information as third preset reference labeling information when the non-relevant confidence reaches the negative sample confidence but does not reach the positive sample confidence.

In some illustrated embodiments, the first determining module 72 is specifically configured to: acquire the object types annotated in the image data set as the annotation types; take each preset type in turn as the current type and execute: determining whether the current type matches any of the annotation types; and if not, determining the current type as a non-annotated type.

In some illustrated embodiments, the third determining module 74 is specifically configured to: determine first loss information of the object being predicted as a non-annotated type based on the difference between the non-relevant confidence and the reference labeling information; and determine second loss information of the object being predicted as an annotation type according to the difference between the confidence that the object is predicted as the annotation type and the real labeling information corresponding to the object; wherein the annotation types comprise the types other than the non-annotated types among the preset types.

In some illustrated embodiments, the adjusting module 75 is specifically configured to: determining the sum of the first loss information and the second loss information corresponding to each object in the image to obtain total loss information; determining a descending gradient in the back propagation process according to the total loss information; and adjusting the network parameters of the object detection network through back propagation according to the descending gradient.

In some embodiments shown, at least two of the image data sets input to the object detection network are not identical in annotation type.

The application also provides a human object detection device, including: the first acquisition module is used for acquiring a scene image; the first prediction module is used for carrying out object detection on the scene image through an object detection network to obtain human body objects contained in the scene image and the confidence coefficients of the human body objects which are predicted to be in each preset type; the object detection network comprises a detection network obtained by training according to the network training method shown in any one of the embodiments; and the first object type determining module is used for determining the highest confidence coefficient of the confidence coefficients of the human body object predicted as each preset type and determining the preset type corresponding to the highest confidence coefficient as the object type of the human body object.

In some illustrated embodiments, the human body object includes at least one of: a face, a hand, an elbow, a shoulder, a leg, and a torso; and the preset types include at least one of the following: a face class, a hand class, an elbow class, a shoulder class, a leg class, a torso class, and a background class.

The application also provides a human object detection device, including: a second acquisition module for acquiring a plurality of image sets, wherein the object types annotated by at least two image sets in the plurality of image sets are not completely the same; a second prediction module, configured to perform object detection on images in the plurality of image sets through an object detection network, so as to obtain the human body objects included in the images and the confidences that the human body objects are predicted as each preset type, the object detection network comprising a detection network obtained by training according to the network training method shown in any one of the embodiments; and a second object type determining module, configured to determine the highest confidence among the confidences that a human body object is predicted as each preset type, and determine the object type corresponding to the highest confidence as the object type of the human body object.

In some illustrated embodiments, the human body object includes at least one of: a face, a hand, an elbow, a shoulder, a leg, and a torso; and the preset types include at least one of the following: a face class, a hand class, an elbow class, a shoulder class, a leg class, a torso class, and a background class.

The embodiments of the object detection network training apparatus and the human body object detection apparatus shown in the present application can be applied to electronic devices. Accordingly, the present application discloses an electronic device, which may comprise: a memory for storing computer instructions executable on a processor, and a processor for executing the instructions to implement the method shown in any one of the above embodiments.

Referring to fig. 8, fig. 8 is a schematic diagram of a hardware structure of an electronic device shown in the present application.

As shown in fig. 8, the electronic device may include a processor for executing instructions, a network interface for making network connections, a memory for storing operation data for the processor, and a non-volatile memory for storing instructions corresponding to the object detection network training apparatus or the human object detection apparatus.

The embodiments of the apparatus may be implemented by software, by hardware, or by a combination of hardware and software. Taking a software implementation as an example, the apparatus, as a logical device, is formed by the processor of the electronic device in which it is located reading the corresponding computer program instructions from the non-volatile memory into the memory and running them. In terms of hardware, in addition to the processor, the memory, the network interface, and the non-volatile memory shown in fig. 8, the electronic device in which the apparatus is located may also include other hardware according to the actual functions of the electronic device, which is not described again here.

It is to be understood that, in order to increase the processing speed, the instructions corresponding to the object detection network training apparatus or the human body object detection apparatus may also be stored directly in the memory, which is not limited herein.

The present application proposes a computer-readable storage medium, on which a computer program is stored, which when executed by a processor implements a method as shown in any of the previous embodiments.

One skilled in the art will recognize that one or more embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, one or more embodiments of the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, one or more embodiments of the present application may take the form of a computer program product embodied on one or more computer-usable storage media (which may include, but are not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

"and/or" in this application means having at least one of the two, for example, "a and/or B" may include three schemes: A. b, and "A and B".

The embodiments in the present application are described in a progressive manner, and the same and similar parts among the embodiments can be referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the data processing apparatus embodiment, since it is substantially similar to the method embodiment, the description is relatively simple, and for the relevant points, reference may be made to part of the description of the method embodiment.

Specific embodiments of the present application have been described above. Other embodiments are within the scope of the following claims. In some cases, the acts or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.

Embodiments of the subject matter and functional operations described in this application may be implemented in the following: digital electronic circuitry, tangibly embodied computer software or firmware, computer hardware that may include the structures disclosed in this application and their structural equivalents, or combinations of one or more of them. Embodiments of the subject matter described in this application can be implemented as one or more computer programs, i.e., one or more modules of computer program instructions, encoded on a tangible, non-transitory program carrier for execution by, or to control the operation of, data processing apparatus. Alternatively or additionally, the program instructions may be encoded on an artificially generated propagated signal, e.g., a machine-generated electrical, optical, or electromagnetic signal, that is generated to encode and transmit information to suitable receiver apparatus for execution by the data processing apparatus. The computer storage medium may be a machine-readable storage device, a machine-readable storage substrate, a random or serial access memory device, or a combination of one or more of them.

The processes and logic flows described in this application can be performed by one or more programmable computers executing one or more computer programs to perform corresponding functions by operating on input data and generating output. The processes and logic flows described above can also be performed by, and apparatus can also be implemented as, special purpose logic circuitry, e.g., an FPGA (field programmable gate array) or an ASIC (application-specific integrated circuit).

Computers suitable for executing computer programs may include, for example, general and/or special purpose microprocessors, or any other type of central processing unit. Generally, a central processing unit will receive instructions and data from a read-only memory and/or a random access memory. The basic components of a computer may include a central processing unit for implementing or executing instructions and one or more memory devices for storing instructions and data. Generally, a computer will also include, or be operatively coupled to receive data from or transfer data to, or both, one or more mass storage devices for storing data, e.g., magnetic, magneto-optical disks, or optical disks. However, a computer does not necessarily have such a device. Moreover, a computer may be embedded in another device, e.g., a mobile telephone, a Personal Digital Assistant (PDA), a mobile audio or video player, a game console, a Global Positioning System (GPS) receiver, or a portable storage device such as a Universal Serial Bus (USB) flash drive, to name a few.

Computer-readable media suitable for storing computer program instructions and data can include all forms of non-volatile memory, media and memory devices, including by way of example semiconductor memory devices (e.g., EPROM, EEPROM, and flash memory devices), magnetic disks (e.g., internal hard disk or removable disks), magneto-optical disks, and CD ROM and DVD-ROM disks. The processor and the memory can be supplemented by, or incorporated in, special purpose logic circuitry.

Although this application contains many specific implementation details, these should not be construed as limiting the scope of any disclosure or of what may be claimed, but rather as merely describing features of particular disclosed embodiments. Certain features that are described in this application in the context of separate embodiments can also be implemented in combination in a single embodiment. In other instances, features described in connection with one embodiment may be implemented as discrete components or in any suitable subcombination. Moreover, although features may be described above as acting in certain combinations and even initially claimed as such, one or more features from a claimed combination can in some cases be excised from the combination, and the claimed combination may be directed to a subcombination or variation of a subcombination.

Similarly, while operations are depicted in the drawings in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order, or that all illustrated operations be performed, to achieve desirable results. In some cases, multitasking and parallel processing may be advantageous. Moreover, the separation of various system modules and components in the embodiments described above should not be understood as requiring such separation in all embodiments, and it should be understood that the described program components and systems can generally be integrated together in a single software product or packaged into multiple software products.

Thus, particular embodiments of the subject matter have been described. Other embodiments are within the scope of the following claims. In some cases, the actions recited in the claims can be performed in a different order and still achieve desirable results. Further, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some implementations, multitasking and parallel processing may be advantageous.

The above description is only for the purpose of illustrating the preferred embodiments of the present application and is not intended to limit the present application to the particular embodiments of the present application, and any modifications, equivalents, improvements, etc. made within the spirit and principles of the present application should be included within the scope of the present application.
