Sample weighted learning system for object detection of images and method thereof

Document No.: 1905440    Publication date: 2021-11-30

Reading note: This technique, "Sample weighted learning system for object detection of images and method thereof", was designed and created by Pan Yingwei, Wang Yu, Liu Jingen, Yao Ting, and Mei Tao on 2020-05-20. Its main content is as follows: The invention provides a sample weighted learning system for object detection of images and a method thereof, the method comprising: a first step of receiving, for each sample, input first, second, third, and fourth features; a second step of transforming the input first, second, third, and fourth features; a third step of generating a joint sample feature from the first, second, third, and fourth dense features using a transformation function; a fourth step of predicting, for each sample, a sample weight of the classification loss and a sample weight of the regression loss from the generated sample feature; a fifth step of calculating a loss function from the predicted sample weights of the classification loss and the regression loss; and a sixth step of adjusting the transformation function used by the sample feature generation device according to the calculated loss function.

1. A sample weight generation system for image processing, the system comprising:

a feature transformation device for transforming the input first, second, third and fourth features into first, second, third and fourth dense features, respectively;

a sample feature generation device for generating a joint sample feature from the first dense feature, the second dense feature, the third dense feature and the fourth dense feature using a transformation function;

and a weight prediction device for predicting, for the sample, a sample weight of the classification loss and a sample weight of the regression loss from the generated sample feature.

2. A sample weighted learning system for object detection of images, the system comprising:

an input device for receiving, for each sample, input of a first feature, a second feature, a third feature, and a fourth feature;

a feature transformation device for transforming the input first, second, third and fourth features into first, second, third and fourth dense features, respectively;

a sample feature generation device for generating a joint sample feature from the first dense feature, the second dense feature, the third dense feature and the fourth dense feature using a transformation function;

a weight prediction device for predicting, for each sample, a sample weight of the classification loss and a sample weight of the regression loss from the generated sample feature;

a loss function calculation device for calculating a loss function from the predicted sample weight of the classification loss and the predicted sample weight of the regression loss; and

a feature transformation adjusting device for adjusting the transformation function used by the sample feature generating device based on the calculated loss function.

3. A method of sample weight generation for image processing, the method comprising:

transforming the input first feature, second feature, third feature and fourth feature into a first dense feature, a second dense feature, a third dense feature and a fourth dense feature, respectively;

generating a joint sample feature from the first dense feature, the second dense feature, the third dense feature, and the fourth dense feature using a transformation function;

and predicting, for the sample, a sample weight of the classification loss and a sample weight of the regression loss from the generated sample feature.

4. The sample weight generation method of claim 3, wherein:

the predicted sample weight of the classification loss and the predicted sample weight of the regression loss are obtained by a first exponential function and a second exponential function, respectively.

5. A sample weighted learning method for object detection of an image, the method comprising:

a first step of receiving, for each sample, input first, second, third, and fourth features;

a second step of transforming the input first, second, third, and fourth features;

a third step of generating a joint sample feature from the first, second, third, and fourth dense features using a transformation function;

a fourth step of predicting, for each sample, a sample weight of the classification loss and a sample weight of the regression loss from the generated sample feature;

a fifth step of calculating a loss function from the predicted sample weights of the classification loss and the regression loss; and

a sixth step of adjusting the transformation function according to the calculated loss function.

6. The sample weighted learning method of claim 5, wherein the second step further comprises:

a step of transforming the input first feature into a first dense feature;

a step of transforming the input second feature into a second dense feature;

a step of transforming the input third feature into a third dense feature; and

a step of transforming the input fourth feature into a fourth dense feature.

7. The sample weighted learning method of claim 5, wherein the fourth step further comprises:

predicting a sample weight of a classification loss; and

predicting a sample weight of the regression loss.

8. The sample weighted learning method of claim 5, wherein the first feature is a classification loss, the second feature is a regression loss, the third feature is an intersection-over-union (IoU), and the fourth feature is a classification probability.

9. The sample weighted learning method of claim 5, wherein the first step further comprises receiving a fifth feature; and the method further comprises the step of transforming the input fifth feature into a fifth dense feature.

10. The sample weighted learning method of claim 9, wherein the fifth feature is a mask loss.

11. The sample weighted learning method of claim 5, wherein:

the predicted sample weight of the classification loss and the predicted sample weight of the regression loss are obtained by a first exponential function and a second exponential function, respectively.

12. The sample weighted learning method of claim 5, wherein:

the sample weights of the classification loss of a group of samples, including positive and negative samples, are averaged, and the average is used as the sample weight of the classification loss of each sample in the group.

13. The sample weighted learning method of claim 5, wherein the sixth step further comprises:

deriving a gradient from the calculated loss function, and then adjusting the transformation function according to the derived gradient.

14. A method for object detection in image processing, comprising:

receiving an input image; and

performing object detection by obtaining a sample weight of the classification loss and a sample weight of the regression loss of the image using the sample weight generation method according to any one of claims 3 to 4 or the sample weighted learning method according to any one of claims 5 to 13.

15. A system for object detection in image processing, comprising:

an input device for receiving an input image; and

the sample weight generation system of claim 1 or the sample weighted learning system of claim 2, for obtaining a sample weight of the classification loss and a sample weight of the regression loss of the input image for object detection.

16. An electronic device for image processing, the device comprising a memory, a processor, and a computer program stored on the memory, wherein the processor, when executing the computer program, implements the sample weight generation method of any one of claims 3 to 4 or the sample weighted learning method of any one of claims 5 to 13.

17. A computer-readable storage medium storing a computer program having program code that, when run on a computer, executes the sample weight generation method according to any one of claims 3 to 4 or the sample weighted learning method according to any one of claims 5 to 13.

Technical Field

The present invention relates to the field of image processing, and in particular, to a sample weight generation system and method for image processing, and a sample weight learning system and method for object detection of an image.

Background

Modern region-based object detection in images is a multi-task learning problem consisting of object classification and localization. It involves region sampling (sliding windows or region proposals), region classification and regression, and non-maximum suppression. Region sampling converts object detection into a classification task, in which a large number of regions are classified and regressed. Such detectors can be categorized into one-stage and two-stage detectors according to the manner of region search.

Typically, the most accurate object detectors are based on a two-stage framework, such as Faster R-CNN, which rapidly narrows down the regions (mostly from the background) during the region proposal phase. In contrast, one-stage detectors, such as SSD and YOLO, achieve faster detection speed but lower accuracy. This is due to the class imbalance problem (i.e., the imbalance between foreground and background regions), which is a classical challenge in object detection.

The two-stage detector handles class imbalance through the region proposal mechanism, and then employs various efficient sampling strategies, such as selecting samples at a fixed foreground-to-background ratio and difficult sample mining. Although similar difficult sample mining can be applied to one-stage detectors, it is inefficient because there are a large number of simple negative samples.

Sample weighting is a complex and dynamic process. When applied to the loss function of a multi-task problem, there are various uncertainties in the individual samples. If the detector devotes its capacity to accurate classification but produces poor localization results, the localization errors will compromise the average precision, especially under a high IoU criterion, and vice versa.

Disclosure of Invention

According to the invention, the weighting of samples in image processing is not only data-dependent but also task-dependent. On the one hand, unlike prior techniques, the importance of an image sample should be determined by its intrinsic properties relative to the ground-truth annotation and by its response to the loss function. On the other hand, object detection in images is a multi-task problem, and the weighting of image samples should be balanced between the different tasks.

According to an aspect of the present invention, there is provided a sample weight generation system for image processing, the system comprising:

a feature transformation device for transforming the input first, second, third and fourth features into first, second, third and fourth dense features, respectively;

a sample feature generation device for generating a joint sample feature from the first dense feature, the second dense feature, the third dense feature and the fourth dense feature using a transformation function;

and a weight prediction device for predicting, for the sample, a sample weight of the classification loss and a sample weight of the regression loss from the generated sample feature.

A sample weight generation system according to an aspect of the invention, wherein:

the sample weight of the classification loss and the sample weight of the regression loss predicted by the weight prediction device are obtained by a first exponential function and a second exponential function, respectively.
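One plausible reading of the two exponential functions can be sketched as follows; the negative sign (so that a larger network output yields a smaller weight) and the function name are illustrative assumptions, not details stated above:

```python
import math

def predict_weights(m_cls, m_reg):
    """Map two unconstrained scalars (e.g. outputs of the weight
    prediction device) to positive sample weights via a first and a
    second exponential function. The negative sign is an assumed
    convention so larger outputs mean smaller weights."""
    w_cls = math.exp(-m_cls)  # first exponential function
    w_reg = math.exp(-m_reg)  # second exponential function
    return w_cls, w_reg

# A zero output yields a neutral weight of 1.0 for either task.
assert predict_weights(0.0, 0.0) == (1.0, 1.0)
```

The exponential guarantees strictly positive weights regardless of the sign of the network output, which is the usual reason for this parameterization.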

According to an aspect of the invention, a sample weighted learning system for object detection of an image is proposed, the system comprising:

an input device for receiving, for each sample, input of a first feature, a second feature, a third feature, and a fourth feature;

a feature transformation device for transforming the input first, second, third and fourth features into first, second, third and fourth dense features, respectively;

a sample feature generation device for generating a joint sample feature from the first dense feature, the second dense feature, the third dense feature and the fourth dense feature using a transformation function;

a weight prediction device for predicting, for each sample, a sample weight of the classification loss and a sample weight of the regression loss from the generated sample feature;

a loss function calculation device for calculating a loss function from the predicted sample weight of the classification loss and the predicted sample weight of the regression loss;

and

a feature transformation adjusting device for adjusting the transformation function based on the calculated loss function.

The sample weighted learning system according to an aspect of the present invention, wherein the system further comprises:

a gradient calculation device for deriving a gradient from the calculated loss function to adjust the transformation function used by the sample feature generation device.

The sample weighted learning system according to an aspect of the present invention, wherein the feature transformation device further includes:

first feature transformation means for transforming an input first feature into a first dense feature;

second feature transformation means for transforming the input second features into second dense features;

third feature transformation means for transforming an input third feature into a third dense feature; and

fourth feature transformation means for transforming the input fourth feature into a fourth dense feature.

The sample weighted learning system according to an aspect of the present invention, wherein the weight prediction apparatus further includes:

classification loss weight prediction means for predicting a sample weight of a classification loss; and

regression loss weight prediction means for predicting a sample weight of the regression loss.

The sample weighted learning system according to an aspect of the present invention, wherein the first feature is a classification loss, the second feature is a regression loss, the third feature is an intersection-over-union (IoU), and the fourth feature is a classification probability.

A sample weighted learning system according to an aspect of the invention, wherein:

the input device is further configured to receive a fifth feature; and

the sample weighted learning system further comprises:

fifth feature transformation means for transforming the input fifth feature into a fifth dense feature.

A sample weighted learning system according to an aspect of the invention, wherein:

the fifth characteristic is mask loss.

A sample weighted learning system according to an aspect of the invention, wherein:

the predicted sample weight of the classification loss and the predicted sample weight of the regression loss are obtained by a first exponential function and a second exponential function, respectively.

A sample weighted learning system according to an aspect of the invention, wherein:

the sample weights of the classification loss of a group of samples, including positive and negative samples, are averaged, and the average is used as the sample weight of the classification loss of each sample in the group.
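The group averaging rule above admits a direct reading (the function name is ours for illustration):

```python
def average_group_cls_weights(weights):
    """Replace each classification-loss weight in a group of samples
    (positive and negative) by the group mean, so that every sample in
    the group shares the same classification weight."""
    mean_w = sum(weights) / len(weights)
    return [mean_w] * len(weights)

# Four samples in one group: all end up with the mean weight 1.0.
assert average_group_cls_weights([0.5, 1.5, 1.0, 1.0]) == [1.0, 1.0, 1.0, 1.0]
```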

According to an aspect of the present invention, there is provided a sample weight generation method for image processing, the method including:

transforming the input first feature, second feature, third feature and fourth feature into a first dense feature, a second dense feature, a third dense feature and a fourth dense feature, respectively;

generating a joint sample feature from the first dense feature, the second dense feature, the third dense feature, and the fourth dense feature using a transformation function;

and predicting, for the sample, a sample weight of the classification loss and a sample weight of the regression loss from the generated sample feature.

A sample weight generating method according to an aspect of the present invention, wherein:

the predicted sample weight of the classification loss and the predicted sample weight of the regression loss are obtained by a first exponential function and a second exponential function, respectively.

According to an aspect of the present invention, a sample weighted learning method for object detection of an image is presented, the method comprising:

a first step of receiving, for each sample, input first, second, third, and fourth features;

a second step of transforming the input first, second, third, and fourth features;

a third step of generating a joint sample feature from the first, second, third, and fourth dense features using a transformation function;

a fourth step of predicting, for each sample, a sample weight of the classification loss and a sample weight of the regression loss from the generated sample feature;

a fifth step of calculating a loss function from the predicted sample weights of the classification loss and the regression loss; and

a sixth step of adjusting the transformation function used by the sample feature generation device according to the calculated loss function.

The sample weighted learning method according to an aspect of the present invention, wherein the second step further includes:

a step of transforming the input first feature into a first dense feature;

a step of transforming the input second feature into a second dense feature;

a step of transforming the input third feature into a third dense feature; and

a step of transforming the input fourth feature into a fourth dense feature.

The sample weighted learning method according to an aspect of the present invention, wherein the fourth step further includes:

predicting a sample weight of a classification loss; and

predicting a sample weight of the regression loss.

The sample weighted learning method according to an aspect of the present invention, wherein the first feature is a classification loss, the second feature is a regression loss, the third feature is an intersection-over-union (IoU), and the fourth feature is a classification probability.

A sample weighted learning method according to an aspect of the invention, wherein the first step further comprises receiving a fifth feature; and the method further comprises the step of transforming the input fifth feature into a fifth dense feature.

The sample weighted learning method according to an aspect of the present invention, wherein the fifth feature is a mask loss.

A sample weighted learning method according to an aspect of the invention, wherein:

the predicted sample weight of the classification loss and the predicted sample weight of the regression loss are obtained by a first exponential function and a second exponential function, respectively.

A sample weighted learning method according to an aspect of the invention, wherein:

the sample weights of the classification loss of a group of samples, including positive and negative samples, are averaged, and the average is used as the sample weight of the classification loss of each sample in the group.

The sample weighted learning method according to an aspect of the present invention, wherein the sixth step further includes:

a gradient is derived from the calculated loss function and the transformation function is then adjusted according to the derived gradient.

The sample weight generation system or the sample weight learning system according to an aspect of the present invention can also be applied to a system of object detection in image processing.

The sample weight generation method or the sample weight learning method according to an aspect of the present invention can also be applied to a method of object detection in image processing.

The sample weighted learning method for image object detection according to the invention is simple and effective, and can balance the classification and regression tasks by learning weights sample by sample with a sample weighting network. Specifically, in addition to the basic detection network, the invention designs a sample weighting network to predict the weights of the classification loss and the regression loss of image samples. The sample weighting network takes as input the classification loss, the regression loss, the IoU (intersection-over-union) value, and the classification probability, and uses a function that converts the current context features of a sample into a sample weight. The sample weighting network according to the present invention has been fully evaluated on the MS COCO and Pascal VOC datasets with various one-stage and two-stage detectors.
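As an illustration of how the predicted weights could enter training, here is a minimal sketch that assumes the per-sample loss is the weighted sum of its two task losses, averaged over the batch; the composition and the function name are illustrative assumptions, not a definitive implementation:

```python
def weighted_detection_loss(samples):
    """Each sample is (cls_loss, reg_loss, w_cls, w_reg), where the two
    weights are produced by the sample weighting network. The per-sample
    loss is the weighted sum of its task losses; the batch loss is the
    mean over all samples (assumed composition)."""
    per_sample = [w_c * l_c + w_r * l_r for (l_c, l_r, w_c, w_r) in samples]
    return sum(per_sample) / len(per_sample)

batch = [
    (0.8, 0.4, 1.0, 0.5),   # an ordinary sample
    (1.2, 0.6, 0.25, 1.0),  # high classification loss, but down-weighted
]
loss = weighted_detection_loss(batch)
assert abs(loss - 0.95) < 1e-9
```

The second sample shows the intended effect: a "difficult" sample (large classification loss) need not dominate training once its classification weight is learned to be small.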

In summary, the present invention proposes a generic loss function for object detection of images, which covers most region-based object detectors and their sampling strategies, and designs a unified sample weighting network on this basis. Compared with previous image sample weighting methods, the method according to the invention has the following advantages: (1) the sample weights for the classification task and the regression task are learned jointly; (2) it is data-dependent, so that soft weights for each individual sample can be learned from the training data; (3) it can be applied to various one-stage and two-stage detectors, is easily plugged into most object detectors, and achieves significant performance gains without impacting inference time.

Drawings

For a more complete understanding of the present invention, and the advantages thereof, reference is now made to the following descriptions taken in conjunction with the accompanying drawings, in which:

figs. 1(a)-1(c) show schematic diagrams of samples with different weights and classification losses during object detection training of images, wherein fig. 1(a) shows samples with a larger classification loss but a smaller weight, fig. 1(b) shows samples with a smaller classification loss but a larger weight, and fig. 1(c) shows an inconsistency between the classification probability and IoU;

figs. 2(a)-2(c) show architectural diagrams of a sample weighted learning system for object detection of images according to an embodiment of the invention, wherein fig. 2(a) shows a block diagram of a two-stage detector, fig. 2(c) shows a schematic diagram of a sample weighting network that processes images according to an embodiment of the invention, and fig. 2(b) shows a schematic diagram of how the loss function in image processing is obtained from fig. 2(c);

FIG. 3 shows a block diagram of a sample weighted learning system for object detection of images, in accordance with an embodiment of the invention;

FIG. 4 shows a flow diagram of a sample weighted learning method for object detection of images according to an embodiment of the invention;

fig. 5(a) -5(d) are schematic diagrams illustrating comparative effects of applying the sample weighted learning system for object detection of images to object detection according to the embodiment of the present invention.

Fig. 6 schematically shows a block diagram of an electronic device according to an embodiment of the disclosure.

Detailed Description

Hereinafter, embodiments of the present invention will be described with reference to the accompanying drawings. It is to be understood that such description is merely illustrative and not intended to limit the scope of the present invention. In the following detailed description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the embodiments of the invention. It may be evident, however, that one or more embodiments may be practiced without these specific details. Moreover, in the following description, descriptions of well-known structures and techniques are omitted so as to not unnecessarily obscure the concepts of the present invention.

The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. The terms "comprises," "comprising," and the like, as used herein, specify the presence of stated features, steps, operations, and/or components, but do not preclude the presence or addition of one or more other features, steps, operations, or components.

All terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art unless otherwise defined. It is noted that the terms used herein should be interpreted as having a meaning that is consistent with the context of this specification and should not be interpreted in an idealized or overly formal sense.

Where a convention analogous to "at least one of A, B and C, etc." is used, in general such a construction is intended in the sense one having skill in the art would understand the convention (e.g., "a system having at least one of A, B and C" would include but not be limited to systems that have a alone, B alone, C alone, a and B together, a and C together, B and C together, and/or A, B, C together, etc.). Where a convention analogous to "A, B or at least one of C, etc." is used, in general such a construction is intended in the sense one having skill in the art would understand the convention (e.g., "a system having at least one of A, B or C" would include but not be limited to systems that have a alone, B alone, C alone, a and B together, a and C together, B and C together, and/or A, B, C together, etc.).

Some block diagrams and/or flow diagrams are shown in the figures. It will be understood that some blocks of the block diagrams and/or flowchart illustrations, or combinations thereof, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the instructions, when executed via the processor, create means for implementing the functions/acts specified in the block diagrams and/or flowchart blocks. The techniques of the present invention may be implemented in hardware and/or in software (including firmware, microcode, etc.). Furthermore, the techniques of this disclosure may take the form of a computer program product on a computer-readable storage medium having instructions stored thereon for use by or in connection with an instruction execution system.

The aforementioned "difficult" samples generally refer to samples whose classification loss in image processing is large. However, a "difficult" sample is not necessarily important. As shown in fig. 1(a) (all image samples are selected from the training process), some samples have a higher classification loss but a smaller weight ("difficult" but not important). Conversely, a "simple" sample may be important if it captures the gist of the object category, as shown in fig. 1(b). In addition, the assumption that the bounding-box regression is accurate when the classification score is high does not always hold, as shown in fig. 1(c); inconsistencies between classification and regression sometimes occur.

The invention is applied to object detection in the field of image processing. Where object detection refers to the ability of a computer and software system to locate objects in an image (or scene) and identify each object, the sample referred to herein may be, for example, a small region in the object (image). In the object detection process, an image is input, and an identified (positioned) object is output.

The invention recasts the sample weighting problem in a probabilistic form and measures the importance of a sample by reflecting its uncertainty. This probabilistic modeling solves not only the sample weighting problem but also the balance problem between the classification and localization tasks. The invention makes the sample weighting process flexible and learnable through deep learning.

Figs. 2(a)-2(c) show a block diagram of a sample weighted learning system for object detection of images according to an embodiment of the present invention. Fig. 2(a) shows a block diagram of a two-stage detector (detection network), which may be replaced with a one-stage detector. Fig. 2(c) shows a schematic diagram of the sample weighting network in the sample weighted learning system, and fig. 2(b) shows a schematic diagram of how the loss function in image processing is obtained from fig. 2(c). In fig. 2(c), each sample is compared with its ground-truth label to calculate the input features. According to one embodiment of the invention, the input includes four initial features associated with each image sample, denoted $L_i^{cls}$, $L_i^{reg}$, $IoU_i$, and $Prob_i$, which are respectively the classification loss, the regression loss, the IoU, and the classification probability of the sample. Four functions F, G, H, and K, corresponding to the four input features and each implementable by an MLP neural network, then transform the four features into four dense features. The sample weight of the classification loss and the sample weight of the regression loss are obtained from these dense features and provided to fig. 2(b) for further processing. As can be seen from fig. 2(c), the sample weighting network that produces the two sample weights consists of two levels of multi-layer perceptron (MLP) networks. The losses of all samples can be averaged to optimize the model parameters. The total loss $L_i$ generated in fig. 2(b) is transmitted back to the sample weighting network and the detection network, respectively, so as to adjust the four functions used in the sample weighting network and the relevant parameters of the detection network.
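The two-level MLP structure of the sample weighting network can be sketched as follows. The layer sizes, the random initialisation, and the names `mlp`, `sample_weights`, and `d` are ours for illustration; only the overall shape (four per-feature transforms, concatenation into a joint feature, two exponential weight heads) follows the description above:

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp(x, w1, b1, w2, b2):
    """A two-layer perceptron block with ReLU, used here both for the
    per-feature transforms (F, G, H, K) and for the two weight heads."""
    h = np.maximum(x @ w1 + b1, 0.0)
    return h @ w2 + b2

# Illustrative sizes (assumption): each scalar input feature is lifted
# to a d-dimensional dense feature; the joint sample feature is the
# concatenation of the four dense features.
d = 8
transforms = [  # randomly initialised parameters for F, G, H, K
    (rng.normal(size=(1, d)), np.zeros(d), rng.normal(size=(d, d)), np.zeros(d))
    for _ in range(4)
]
head_cls = (rng.normal(size=(4 * d, d)), np.zeros(d), rng.normal(size=(d, 1)), np.zeros(1))
head_reg = (rng.normal(size=(4 * d, d)), np.zeros(d), rng.normal(size=(d, 1)), np.zeros(1))

def sample_weights(cls_loss, reg_loss, iou, prob):
    """Predict (w_cls, w_reg) for one sample from its four features."""
    feats = [mlp(np.array([[v]]), *p)
             for v, p in zip((cls_loss, reg_loss, iou, prob), transforms)]
    joint = np.concatenate(feats, axis=1)            # joint sample feature
    m_cls = np.clip(mlp(joint, *head_cls), -20, 20)  # clip for stability
    m_reg = np.clip(mlp(joint, *head_reg), -20, 20)
    # Exponentials map the unconstrained head outputs to positive weights.
    return float(np.exp(-m_cls.item())), float(np.exp(-m_reg.item()))

w_cls, w_reg = sample_weights(1.2, 0.7, 0.55, 0.9)
assert w_cls > 0.0 and w_reg > 0.0
```

In training, these parameters would of course be learned jointly with the detector rather than fixed at random initialisation.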
A block diagram of a sample weighted learning system for object detection according to an embodiment of the present invention will be described in detail below with reference to fig. 3.

As shown in fig. 3, a sample weighted learning system for object detection of images includes: an input device 30 for receiving input image data, which may be sample-based features; a sample weighting network 32 for obtaining sample weights by learning from the input image data (e.g., features); a loss function calculation device 34 for calculating a loss function according to the sample weights; a gradient calculation device 36 for calculating a gradient from the loss function and providing it to the sample weighting network 32; and a detection network (not shown).

The sample weighting network 32 includes: a feature transformation adjusting device 301, a feature transformation device 302, a sample feature generating device 303, and a weight predicting device 304.

The feature transformation apparatus 302 includes a first feature transformation device 3021, a second feature transformation device 3022, a third feature transformation device 3023, and a fourth feature transformation device 3024, which respectively transform the input image data (e.g., features) using transformation functions. The feature transformation adjusting device 301 adjusts the transformation functions used by the first feature transformation device 3021, the second feature transformation device 3022, the third feature transformation device 3023, and the fourth feature transformation device 3024.

The weight prediction apparatus 304 includes a classification weight prediction means 3041 for predicting a sample weight of a classification loss from the generated sample feature and a regression weight prediction means 3042 for predicting a sample weight of a regression loss.

The loss function calculation device 34 is configured to calculate a loss function according to the predicted sample weights, and the weights of the samples may include a sample weight of the classification loss and a sample weight of the regression loss.

Before the sample weighted learning system for object detection is described in detail, the algorithm employed by the present invention will be described. The algorithm can detect objects in images more effectively.

Recent studies of object detection, including one-stage and two-stage object detectors, follow a similar region-based paradigm. Given a set of anchors {a_i} (samples from images, i being a natural number), i.e., prior boxes that are typically placed over the image to densely cover spatial locations, scales and aspect ratios, the multi-task training objective for the image may be summarized as follows:

L = (1/N1) Σ_i L^cls_i + (λ/N2) Σ_i L^reg_i    (1)

where L^cls_i (L^reg_i) is the classification (regression) loss of the sample a_i used for classification (regression). N1 and N2 are the numbers of training samples and of foreground samples, respectively; the relationship N2 ≤ N1 holds for most object detectors. Let w^cls_i and w^reg_i respectively be sample a_i's classification-loss weight and regression-loss weight; the generalized loss function of two-stage and one-stage detectors with different sampling strategies is then as follows:

L = Σ_i w^cls_i L^cls_i + λ Σ_i 1[a_i ∈ FG] w^reg_i L^reg_i    (2)

where 1[·] is an indicator function that outputs 1 when the condition is satisfied and 0 otherwise, and FG denotes the set of foreground samples. As a result, w^cls_i and w^reg_i can be adopted to represent various sampling strategies. Here, hard sampling can be interpreted as a special case of sample weighting, so that soft sampling becomes possible.
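As a concrete illustration, the generalized weighted loss above can be sketched as follows. This is a minimal numpy sketch, not the patented implementation; the function name `weighted_detection_loss` is an illustrative assumption.

```python
import numpy as np

def weighted_detection_loss(l_cls, l_reg, w_cls, w_reg, is_fg, lam=1.0):
    """Per-sample weighted multi-task loss: every sample contributes a
    weighted classification term, and only foreground samples (indicator
    is_fg) contribute a weighted regression term."""
    l_cls, l_reg = np.asarray(l_cls, float), np.asarray(l_reg, float)
    w_cls, w_reg = np.asarray(w_cls, float), np.asarray(w_reg, float)
    fg = np.asarray(is_fg, float)  # indicator: 1 for foreground, else 0
    return float(np.sum(w_cls * l_cls) + lam * np.sum(fg * w_reg * l_reg))
```

Choosing uniform weights w^cls_i = 1/N1 and w^reg_i = 1/N2 recovers the plain multi-task objective, while hard sampling corresponds to weights of 0 or 1.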

The present invention jointly learns sample weights for both classification and regression from a data-driven perspective. While previous methods focused on re-weighting either the classification loss (e.g., OHEM and Focal-Loss) or the regression loss (e.g., KL-Loss), the present invention re-weights the classification and regression losses jointly. In addition, unlike the mining of "difficult" samples (those with high classification loss) in the OHEM and Focal-Loss methods, the present invention focuses on the samples that are important for object detection in the image, which may also be "simple" samples.

The invention reformulates the sample weighting problem in a probabilistic form and measures the importance of a sample by the uncertainty it reflects. This makes the sample weighting process flexible and learnable through deep learning; the probabilistic modeling not only solves the sample weighting problem but also addresses the balance between the classification and localization tasks.

A sample weighted learning system for object detection of images according to an embodiment of the present invention will be specifically described below with reference to fig. 2(a) -2 (c).

As shown in Figs. 2(a)-2(c), the vector gt_i represents the ground-truth annotated bounding box coordinates, and ŷ_i denotes the estimated bounding box coordinates. By comparing gt_i with ŷ_i (associated with a_i), four distinguishing features are obtained for each sample: L^cls_i, L^reg_i, IoU_i and Prob_i, which are respectively the classification loss, the regression loss, the intersection-over-union and the classification probability. The four features are input as input data by the input device 30 into the sample weighting network 32. How the classification loss L^cls_i and the regression loss L^reg_i are obtained will be described in detail later.

Rather than directly using the visual features of a sample (which would discard the information of the object's true label in the corresponding image), the present invention designs four distinguishing features from the detector itself. These exploit the interaction between the estimate and the true label (i.e., IoU and the classification score), since both the classification and regression losses inherently reflect the uncertainty of the prediction to some extent.

For negative samples, the features IoU_i and Prob_i are set to 0. Here, a positive sample may be a sample that contains an object (an object in the image), and a negative sample may be a sample that does not contain an object.

The input device 30 inputs the four obtained features L^cls_i, L^reg_i, IoU_i and Prob_i to the sample weighting network 32. The four input features are processed by the first feature transformation device 3021, the second feature transformation device 3022, the third feature transformation device 3023 and the fourth feature transformation device 3024, respectively.

According to an embodiment of the present invention, the first feature transformation device 3021, the second feature transformation device 3022, the third feature transformation device 3023 and the fourth feature transformation device 3024 may be, or may use, four different functions F, G, H and K, respectively, for transforming the inputs into dense features for a more comprehensive representation, which are provided to the sample feature generation device 303. According to one embodiment of the invention, these functions are transformation functions, each of which may be implemented by an MLP neural network that maps a one-dimensional value to a higher-dimensional feature. The transformed features are then encapsulated by the sample feature generation device 303 into a sample-level feature d_i:

d_i = F(L^cls_i) ⊕ G(L^reg_i) ⊕ H(IoU_i) ⊕ K(Prob_i)

where ⊕ denotes feature combination (e.g., concatenation or summation).
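The four transformation functions and the sample-level feature d_i can be sketched as follows. This is a minimal numpy sketch under stated assumptions: one hidden ReLU layer per function, fixed random weights standing in for learned parameters, and summation as the combination; `make_mlp` and `joint_sample_feature` are hypothetical names.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_mlp(in_dim=1, hidden=32, out_dim=64):
    """One-hidden-layer MLP mapping a scalar feature to a dense vector."""
    W1, b1 = rng.normal(0, 0.1, (in_dim, hidden)), np.zeros(hidden)
    W2, b2 = rng.normal(0, 0.1, (hidden, out_dim)), np.zeros(out_dim)
    def forward(x):
        h = np.maximum(0.0, np.atleast_2d(x) @ W1 + b1)  # ReLU hidden layer
        return h @ W2 + b2
    return forward

# Four independent transformation functions for the four scalar inputs.
F, G, H, K = (make_mlp() for _ in range(4))

def joint_sample_feature(l_cls, l_reg, iou, prob):
    """d_i: combine the four dense features (summation chosen here)."""
    return F([[l_cls]]) + G([[l_reg]]) + H([[iou]]) + K([[prob]])

d = joint_sample_feature(0.7, 0.3, 0.55, 0.9)
```

The sketch is deterministic given the fixed seed; in the actual system the MLP parameters would be trained via the back-propagated gradient.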

the sample feature generation device 303 supplies the generated joint sample features to the weight prediction device 304. The classification loss weight prediction device 3041 and the regression loss weight prediction device 3042 of the weight prediction device 304 respectively predict d from the generated sample featuresiSample weights lost to middle learning classificationAnd sample weights of regression losses

w^cls_i = exp(-W_cls(d_i))  and  w^reg_i = exp(-W_reg(d_i))

where W_cls and W_reg denote two separate MLP networks for predicting the weights of the classification loss and the regression loss, respectively.
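The two prediction heads can be sketched in the same spirit as the transformation functions above: small MLPs whose unconstrained scalar outputs are exponentiated so the resulting weights stay positive. This is a hypothetical sketch (one hidden ReLU layer, fixed random weights, a 64-dimensional joint feature); `make_head` and `predict_weights` are assumed names.

```python
import numpy as np

rng = np.random.default_rng(1)

def make_head(in_dim=64, hidden=32):
    """Small MLP head mapping the joint feature d_i to one scalar."""
    W1, b1 = rng.normal(0, 0.1, (in_dim, hidden)), np.zeros(hidden)
    W2, b2 = rng.normal(0, 0.1, (hidden, 1)), np.zeros(1)
    def forward(d):
        h = np.maximum(0.0, np.atleast_2d(d) @ W1 + b1)
        return (h @ W2 + b2).ravel()
    return forward

W_cls, W_reg = make_head(), make_head()

def predict_weights(d):
    """exp(-m) keeps the weights positive for unconstrained outputs m."""
    return np.exp(-W_cls(d)), np.exp(-W_reg(d))

w_cls, w_reg = predict_weights(np.zeros(64))
```

A zero feature vector yields m = 0 for both heads and hence unit weights, the neutral starting point before any learning.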

The weight prediction device 304 supplies the predicted classification loss weight and regression loss weight to the loss function calculation device 34.

How the loss function calculation device 34 calculates the loss function will be described in detail below.

The object detection target may be decomposed into a regression task and a classification task. Given the i-th sample, the regression task is first modeled as a Gaussian likelihood with the predicted position offset ŷ_i as the mean and standard deviation σ_i:

P(gt_i | a_i) = N(gt_i; ŷ_i, σ_i^2)

where the vector gt_i represents the ground-truth annotated bounding box coordinates and ŷ_i the estimated bounding box coordinates. To optimize the regression network, the log probability of the likelihood is maximized:

log P(gt_i | a_i) = -(gt_i - ŷ_i)^2 / (2σ_i^2) - log σ_i - (1/2) log 2π

By defining L^reg_i (obtained from gt_i and ŷ_i), multiplying the equation by -1 and ignoring the constant, the loss function calculation device 34 obtains the following regression loss:

L'^reg_i = (1/σ_i^2) L^reg_i + λ2 log σ_i^2    (8)
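The behavior of this variance-weighted regression loss can be evaluated directly; a minimal sketch (the function name is illustrative):

```python
import math

def weighted_reg_loss(l_reg, sigma_sq, lam2=1.0):
    """Equation (8): l_reg / sigma^2 + lam2 * log(sigma^2).
    The 1/sigma^2 factor down-weights uncertain samples; the log term
    keeps sigma^2 from growing without bound (the trivial solution)."""
    return l_reg / sigma_sq + lam2 * math.log(sigma_sq)
```

A confident sample (small variance) keeps most of its loss, while an uncertain one is softly down-weighted; the minimum over sigma^2 sits strictly below either extreme.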

for object detector training of images, there are two opposite sample weighting strategies. On the one hand, some people prefer "difficult" samples, which can effectively speed up the training process through larger magnitude losses and gradients. On the other hand, some believe that the "simple" example requires more attention when ranking is more important for evaluating the metrics and the category imbalance problem is less important. However, it is often impractical to manually determine how difficult or noisy a training sample is. Thus, the sample horizontal variance as referred to in equation (8) introduces greater flexibility as it allows for automatic adjustment of the sample weights based on the significance of each sample feature.

Setting the derivative of equation (8) with respect to the variance σ_i^2 to zero and solving (assuming λ2 = 1), the optimal variance satisfies σ_i^2 = L^reg_i. Reinserting this value into equation (8) and ignoring the constant, the overall regression objective reduces to log L^reg_i. The log function is concave and non-decreasing: it largely preserves small L^reg_i values but applies only a soft penalty to large L^reg_i values. This makes the algorithm robust to outliers and noisy samples with large gradients, which could otherwise degrade performance. It also prevents the algorithm from paying too much attention to difficult samples whose L^reg_i is very large. Thus, the regression loss of equation (8) favors the selection of samples with large IoU, as it encourages the loss to move toward 0 faster. This in turn motivates the feature learning process to put more weight on these samples, while samples with relatively small IoU still maintain a moderate gradient during training.
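The claim that the optimal variance equals L^reg_i (for λ2 = 1), reducing the objective to 1 + log L^reg_i (the constant 1 being what the text ignores), can be sanity-checked numerically. A sketch with a simple grid search; the helper names and grid bounds are arbitrary choices:

```python
import math

def weighted_reg_loss(l_reg, sigma_sq, lam2=1.0):
    """Equation (8) as a function of the variance."""
    return l_reg / sigma_sq + lam2 * math.log(sigma_sq)

def argmin_sigma_sq(l_reg, lo=1e-3, hi=100.0, n=100000):
    """Grid-search the variance minimizing equation (8) with lam2 = 1."""
    best_s, best_v = lo, float("inf")
    for k in range(n):
        s = lo + (hi - lo) * k / (n - 1)
        v = weighted_reg_loss(l_reg, s)
        if v < best_v:
            best_s, best_v = s, v
    return best_s, best_v
```

For L^reg_i = 3 the minimizer lands at σ^2 ≈ 3 with minimum value ≈ 1 + log 3, matching the closed-form derivation.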

In equation (8), λ2 is a constant that absorbs the overall scale of the loss in object detection of the image. By writing w^reg_i = 1/σ_i^2, equation (8) can be roughly regarded as a weighted version of the regression loss, where the term λ2 log σ_i^2 acts as a regularizer that prevents the trivial solution (all weights collapsing toward zero). As the variance σ_i^2 increases, the weight 1/σ_i^2 decreases. Intuitively, this weighting strategy places more weight on confident samples and penalizes errors made by these samples more heavily during training. For the classification task, the likelihood is formulated as a softmax function:

P(y_i | a_i) = softmax(s_i / t_i)    (9)

where the temperature t_i controls the flatness of the distribution, and s_i and y_i are respectively the logits of sample a_i and its true label. The distribution of P(y_i | a_i) is in fact a Boltzmann distribution. To make its form consistent with that of the regression task, define σ_i^2 = t_i. Letting L^cls_i denote the cross-entropy loss (obtained from s_i and y_i), the loss function calculation device 34 approximates the classification loss as:

L'^cls_i ≈ (1/σ_i^2) L^cls_i + λ1 log σ_i^2    (10)

the loss function calculation device 34 combines the weighted classification loss (equation (10)) with the weighted regression loss (equation (8)) to produce the following total loss:

Note that directly predicting σ_i^2 may cause implementation difficulty, because σ_i^2 is expected to be positive, and placing it in the denominator carries a potential hazard of division by zero. According to one embodiment of the invention, to further optimize the equation, the network predicts m_i = log σ_i^2 instead, which makes the optimization numerically more stable and allows unconstrained prediction outputs. The loss function calculation device 34 thus obtains the final total loss function:

L_i = exp(-m^cls_i) L^cls_i + λ1 m^cls_i + exp(-m^reg_i) L^reg_i + λ2 m^reg_i    (12)
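The log-variance parametrization of the total per-sample loss can be sketched directly (the function name is illustrative):

```python
import math

def total_loss(l_cls, l_reg, m_cls, m_reg, lam1=1.0, lam2=1.0):
    """Equation-(12)-style loss: predicting m = log(sigma^2) avoids
    division by zero and allows unconstrained network outputs."""
    return (math.exp(-m_cls) * l_cls + lam1 * m_cls
            + math.exp(-m_reg) * l_reg + lam2 * m_reg)
```

Setting m = 0 (i.e., unit variance) recovers the plain unweighted sum of the two losses, and m = log σ^2 reproduces the σ^2 form of equation (11) term by term.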

the invention customizes different weights for each sampleAndallowing the multitask balancing weights to be adjusted at the sample level. The loss function computation device 34 can efficiently drive the network to learn useful sample weights through the network design.

The loss function calculation device 34 calculates the loss function L_i according to equation (12) and then provides it to the gradient calculation device 36. The gradient calculation device 36 calculates a gradient based on the loss function and supplies it to the sample weighting network 32; the feature transformation adjusting device 301 then adjusts, based on the gradient, the functions used by the first feature transformation device 3021, the second feature transformation device 3022, the third feature transformation device 3023 and the fourth feature transformation device 3024. Since the feature transformation apparatus 302 can be dynamically adjusted based on the gradient, the classification loss weight and the regression loss weight of each sample can be dynamically learned. The above describes the sample weighted learning system for object detection of an image; obviously, the sample weight of the classification loss and the sample weight of the regression loss can also be predicted for a sample by a sample weight generation system (not shown) for images constituted by the feature transformation apparatus 302, the sample feature generation device 303 and the weight prediction device 304 of the present invention.

A sample weighted learning method for object detection according to an embodiment of the present invention is described below with reference to fig. 3 and 4.

At S410, a sample weighted learning system for object detection of images receives input sample-related features.

In S420, the first feature transforming means 3021, the second feature transforming means 3022, the third feature transforming means 3023, and the fourth feature transforming means 3024 transform features relating to the sample of the input image to obtain dense features, respectively.

At S430, the sample feature generating device 303 generates a joint sample feature based on the transformed dense features.

At S440, the weight prediction apparatus 304 generates a sample weight of classification loss and a sample weight of regression loss from the generated sample features.

At S450, the loss function calculation apparatus 34 calculates a loss function from the sample weight of the predicted classification loss and the sample weight of the regression loss.

At S460, the gradient calculation device 36 calculates a gradient from the calculated loss function.

In S470, the feature transformation adjusting device 301 adjusts the first feature transformation device 3021, the second feature transformation device 3022, the third feature transformation device 3023 and the fourth feature transformation device 3024 according to the obtained gradient, and S420-S470 are performed again on the next sample until all samples are processed. The present invention can perform S410-S470 on a per-sample basis; preferably, S410-S470 are performed on each batch (each group) of samples, where each batch may include at least four samples, thereby increasing computational efficiency.
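The per-batch processing can be sketched as a simple chunking helper; this is illustrative only, as the actual batching would be handled by the training framework:

```python
def batched(samples, batch_size=4):
    """Yield successive fixed-size batches; the last batch may be smaller.
    Processing samples in groups of at least four amortizes the cost of
    steps S410-S470 across the batch."""
    for i in range(0, len(samples), batch_size):
        yield samples[i:i + batch_size]
```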

The method according to the present invention can calculate a loss function by adaptively learning a sample weight of a classification loss and a sample weight of a regression loss in object detection of image processing, thereby more preferably training an accurate object detection model.

Since the sample weighted learning system for object detection according to the invention makes no assumptions about the underlying object detector, it can be used with most region-based object detectors, including Faster R-CNN, RetinaNet and Mask R-CNN. The method according to the invention is generic, requiring only minimal modifications to the original framework. Faster R-CNN consists of a Region Proposal Network (RPN) and a Fast R-CNN network. The RPN is kept unchanged, and the sample weighted learning method for object detection of images according to the present invention is inserted into the Fast R-CNN branch. For each sample, L^cls_i, L^reg_i, IoU_i and Prob_i are first calculated as inputs to the Sample Weighting Network (SWN). The predicted weights w^cls_i and w^reg_i are then inserted into equation (12), and the gradient is propagated back to the basic detection network and the sample weighting network. For RetinaNet, the present invention follows a similar process to generate the classification and regression weights for each sample. Since Mask R-CNN has an additional mask branch, the method of the present invention can incorporate another branch into the sample weighting network to generate adaptive weights for the mask loss, where classification, bounding box regression and mask prediction are estimated jointly. According to one example of the present invention, to match the extra mask weight, the mask loss is also taken as an input to the sample weighting network, so that the mask loss, together with the other four inputs L^cls_i, L^reg_i, IoU_i and Prob_i, is used to compute the prediction weights (steps S410-S470 are performed). Thereby, according to an embodiment of the present invention, the sample weighting network for image processing may further include a fifth feature transformation device (not shown) for transforming the input mask loss into a dense feature using another function, which may also be implemented by an MLP neural network.
Then, the dense feature obtained by the fifth feature transformation means is supplied to the sample feature generation unit 303 together with the four dense features obtained by the four functions F, G, H and K to generate a sample feature. The weight prediction device 304 generates a classification lost sample weight and a regression lost sample weight from the generated sample features.

According to an embodiment of the present invention, a sample weight generation system (or method) for image processing or a sample weight learning system (or method) for image processing may be applied to a system (or method) (not shown) for object detection in image processing, wherein the system for object detection in image processing may include an input device that receives an input image, and the sample weight learning system according to the present invention is used to perform sample weight learning on the input image to obtain a sample weight of classification loss and a sample weight of regression loss of the image so that object detection may be performed or performed according to the sample weight generated by the sample weight generation system.

According to one embodiment of the present invention, the predicted classification weights were found to be unstable, because the uncertainty between negative and positive samples is much greater than the uncertainty of the regression. Thus, the classification weights of the positive and the negative samples in each batch are averaged separately, as a smoothed version of the classification-loss weight prediction. Here, each batch may be a manually defined number of samples in the training process.
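The described smoothing, which averages the classification weights separately over the positives and the negatives within a batch, can be sketched as follows (the function name is illustrative):

```python
import numpy as np

def smooth_cls_weights(w_cls, is_pos):
    """Replace each sample's classification weight with the mean weight
    of its group (positives vs. negatives) within the batch."""
    w = np.asarray(w_cls, dtype=float)
    pos = np.asarray(is_pos, dtype=bool)
    out = np.empty_like(w)
    if pos.any():
        out[pos] = w[pos].mean()
    if (~pos).any():
        out[~pos] = w[~pos].mean()
    return out
```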

Fig. 5(a)-5(d) show qualitative performance comparisons between RetinaNet and RetinaNet + SWN on the COCO dataset. Following the common threshold of 0.5 for visualizing detected objects, the present invention only illustrates detections whose scores are above the threshold. As shown in Figs. 5(a)-5(d), some so-called "simple" objects missed by RetinaNet (e.g., children, sofas, baby bottles, etc.) are successfully detected by RetinaNet enhanced with the Sample Weighting Network (SWN). The present invention speculates that the original RetinaNet may concentrate too much on "difficult" samples. As a result, the "simple" samples receive less attention and contribute less to model training; the scores of these "simple" samples decrease, leading to missed detections. The purpose of Figs. 5(a)-5(d) is not to show a deficiency of RetinaNet in score calibration, since a "simple" sample can be detected anyway when the threshold is lowered; rather, it shows that the sample weight learning system of the present invention does not assign less weight to "simple" samples.

There is another line of research aimed at improving bounding box regression; that is, it attempts to optimize the regression loss by learning with IoU as supervision or in conjunction with NMS. Based on the Faster R-CNN + ResNet-50 + FPN framework and evaluated on COCO val2017, the performance comparison shows that the sample weighted learning system and its extension SWN + Soft-NMS are superior to IoU-Net and IoU-Net + NMS. The comparison further demonstrates the advantage of jointly learning the sample weights for both classification and regression.

Fig. 6 schematically shows a block diagram of an electronic device for image processing according to an embodiment of the present disclosure. The electronic device shown in fig. 6 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present disclosure.

As shown in fig. 6, the electronic device 600 includes a processor 610 and a computer-readable storage medium 620. The electronic device 600 may perform a method according to an embodiment of the present disclosure.

In particular, the processor 610 may comprise, for example, a general purpose microprocessor, an instruction set processor and/or related chip set and/or a special purpose microprocessor (e.g., an Application Specific Integrated Circuit (ASIC)), or the like. The processor 610 may also include onboard memory for caching purposes. The processor 610 may be a single processing unit or a plurality of processing units for performing the different actions of the method flows according to embodiments of the present disclosure.

Computer-readable storage medium 620, for example, may be a non-volatile computer-readable storage medium, specific examples including, but not limited to: magnetic storage devices, such as magnetic tape or Hard Disk Drives (HDDs); optical storage devices, such as compact disks (CD-ROMs); a memory, such as a Random Access Memory (RAM) or a flash memory; and so on.

The computer-readable storage medium 620 may include a computer program 621, which computer program 621 may include code/computer-executable instructions that, when executed by the processor 610, cause the processor 610 to perform a method according to an embodiment of the disclosure, or any variation thereof.

The computer program 621 may be configured with, for example, computer program code comprising computer program modules. For example, in an example embodiment, the code in computer program 621 may include one or more program modules, including, for example, modules 621A, 621B, and so on. It should be noted that the division and number of the modules are not fixed; those skilled in the art may use suitable program modules or combinations of program modules according to the actual situation, so that when these program modules are executed by the processor 610, the processor 610 may execute the method according to the embodiments of the present disclosure or any variation thereof.

According to an embodiment of the present disclosure, at least one of the devices or apparatuses illustrated in fig. 3 may be implemented as computer program modules described with reference to fig. 6, which, when executed by the processor 610, may implement the respective operations described above.

The present invention also provides a computer-readable storage medium, which may be contained in the device/system described in the above embodiments; or may exist separately and not be incorporated into the device/system. The computer-readable storage medium carries one or more programs which, when executed, implement the method according to an embodiment of the present invention.

The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

While the present invention has been shown and described with reference to certain exemplary embodiments thereof, it will be understood by those skilled in the art that various changes in form and detail may be made therein without departing from the spirit and scope of the invention as defined by the appended claims and their equivalents. Accordingly, the scope of the present invention should not be limited to the above-described embodiments, but should be defined by the appended claims as well as their equivalents.
