Self-adaptive rapid target detection method based on Faster-RCNN

Document No. 1545164, published 2020-01-17

Abstract: This technology, a self-adaptive rapid target detection method based on Faster-RCNN, was designed by 张良 (Zhang Liang) and 曹之君 (Cao Zhijun) on 2019-09-05. The method comprises the following steps: inputting an original image into a bottom-layer feature extraction network to obtain a feature map; inputting the feature map into a convolutional layer of the region proposal network for training; scoring the candidate regions by degree of overlap, then adaptively selecting candidate regions and inputting them into the classification and regression layers for training, to obtain candidate regions containing targets; and sending the candidate regions containing targets, together with the feature map, into the final classification and regression layer to determine the class of each target. The invention changes the bottom-layer feature extraction network from the VGG network to a residual network, so that the network is deeper, the extracted features are more abstract and comprehensive, and the recognition rate of target detection improves. The number of candidate regions is selected adaptively: feedback from the training results makes the number vary dynamically between 300 and 2000, which effectively reduces training time while achieving a higher recognition rate.

1. A self-adaptive rapid target detection method based on Faster-RCNN, characterized in that the method comprises the following steps, carried out in sequence:

(1) inputting an original image into a bottom-layer feature extraction network and obtaining a feature map through several convolutions;

(2) inputting the feature map into a convolutional layer of a region proposal network (RPN) for training, and establishing a mapping from the feature map to the original image through preset anchor points, so that each pixel on the feature map corresponds to k candidate regions in the original image;

(3) scoring all candidate regions by degree of overlap, then adaptively selecting between 300 and 2000 of the highest-scoring candidate regions and inputting them into the classification and regression layers of the region proposal network for training, to obtain candidate regions containing targets;

(4) sending the candidate regions containing targets, together with the feature map, into the final classification and regression layer of the region proposal network, and using an ROI pooling operation to determine which class the target in each candidate region belongs to, thereby obtaining the final recognition result.

2. The self-adaptive rapid target detection method based on Faster-RCNN according to claim 1, characterized in that: in step (1), the bottom-layer feature extraction network is a RESNET58 residual network.

3. The self-adaptive rapid target detection method based on Faster-RCNN according to claim 1, characterized in that: in step (2), the mapping from the feature map to the original image is established through the preset anchor points as follows: several candidate regions are generated from each preset anchor point; each pixel on the feature map corresponds to one region in the original image; that region is then adjusted to three aspect ratios, 1:1, 1:2 and 2:1, and the anchor is set to different sizes, so that each pixel on the feature map corresponds to 9 candidate regions in the original image, that is, k = 9.

4. The self-adaptive rapid target detection method based on Faster-RCNN according to claim 1, characterized in that: in step (3), the number of candidate regions, between 300 and 2000, is selected adaptively as follows: the average regression loss, total_loss, is calculated once every N training iterations; a value of total_loss between half and double its own value is regarded as a reasonable jitter interval, and feedback adjustment is considered necessary when total_loss leaves that interval; when total_loss doubles or more, the number of candidate regions is automatically increased to (1 + Number_rate_up) times its current value; when total_loss falls to half or less, the number of candidate regions is reduced to (1 - Number_rate_down) times its current value; in this way the number of candidate regions changes adaptively within the interval 300 to 2000.

Technical Field

The invention belongs to the technical field of computer vision and image processing, and in particular relates to a self-adaptive rapid target detection method based on Faster-RCNN.

Background

Target detection, also called target extraction, is a form of image segmentation based on the geometric and statistical characteristics of targets. It combines segmentation and recognition of targets into a single task, and its accuracy and real-time performance are key capabilities of the overall system. Target detection is an important problem in computer vision and has significant research value in fields such as pedestrian tracking, license plate recognition and autonomous driving. In recent years, as deep learning has dramatically improved the accuracy of image classification, target detection algorithms based on deep learning have become mainstream.

Since the concept of target detection was proposed, researchers at home and abroad have continuously explored the problem. Most traditional target detection algorithms are based on a sliding-window framework or on feature point matching. In 2012, AlexNet won the annual ImageNet Large Scale Visual Recognition Challenge with results far beyond those of traditional algorithms, bringing deep neural networks back into public view. The introduction of R-CNN in 2014 gradually made CNN-based target detection algorithms mainstream.

Disclosure of Invention

In order to solve the above problems, an object of the present invention is to provide a self-adaptive rapid target detection method based on Faster-RCNN.

In order to achieve the above object, the self-adaptive rapid target detection method based on Faster-RCNN provided by the invention comprises the following steps, carried out in sequence:

(1) inputting an original image into a bottom-layer feature extraction network and obtaining a feature map through several convolutions;

(2) inputting the feature map into a convolutional layer of a region proposal network (RPN) for training, and establishing a mapping from the feature map to the original image through preset anchor points, so that each pixel on the feature map corresponds to k candidate regions in the original image;

(3) scoring all candidate regions by degree of overlap, then adaptively selecting between 300 and 2000 of the highest-scoring candidate regions and inputting them into the classification and regression layers of the region proposal network for training, to obtain candidate regions containing targets;

(4) sending the candidate regions containing targets, together with the feature map, into the final classification and regression layer of the region proposal network, and using an ROI pooling operation to determine which class the target in each candidate region belongs to, thereby obtaining the final recognition result.

In step (1), the bottom-layer feature extraction network is a RESNET58 residual network.

In step (2), the mapping from the feature map to the original image is established through the preset anchor points as follows: several candidate regions are generated from each preset anchor point; each pixel on the feature map corresponds to one region in the original image; that region is then adjusted to three aspect ratios, 1:1, 1:2 and 2:1, and the anchor is set to different sizes, so that each pixel on the feature map corresponds to 9 candidate regions in the original image, that is, k = 9.

In step (3), the number of candidate regions, between 300 and 2000, is selected adaptively as follows: the average regression loss, total_loss, is calculated once every N training iterations; a value of total_loss between half and double its own value is regarded as a reasonable jitter interval, and feedback adjustment is considered necessary when total_loss leaves that interval; when total_loss doubles or more, the number of candidate regions is automatically increased to (1 + Number_rate_up) times its current value; when total_loss falls to half or less, the number of candidate regions is reduced to (1 - Number_rate_down) times its current value; in this way the number of candidate regions changes adaptively within the interval 300 to 2000.

The self-adaptive rapid target detection method based on Faster-RCNN provided by the invention has the following advantages:

1. The bottom-layer feature extraction network is changed from the VGG network to a residual network, increasing the network depth from the original 16 layers to 50 layers. The extracted features are therefore naturally more abstract and comprehensive, which improves the recognition rate of target detection.

2. A rapid target detection method with a region number adjusting layer is proposed to improve the classical region proposal network. During training, a region number adjusting layer is introduced; it judges the current training effect in real time, adjusts the number of candidate regions accordingly, and determines the optimal number of candidate regions when training finishes. Through feedback from the training results, the number of candidate regions varies dynamically between 300 and 2000. Experiments show that, compared with the traditional Faster-RCNN network, the running speed improves by 18 percentage points and the recognition rate improves by 3 percentage points, with stronger adaptability to the environment; training time is thus effectively reduced and the recognition rate is higher.

Drawings

FIG. 1 is a general flowchart of the self-adaptive rapid target detection method based on Faster-RCNN provided by the invention;

FIG. 2 is a block diagram of the bottom-layer feature extraction network used in the self-adaptive rapid target detection method based on Faster-RCNN provided by the invention;

FIG. 3 is a schematic diagram of a face ROI result extracted by the self-adaptive rapid target detection method based on Faster-RCNN provided by the invention.

Detailed Description

The self-adaptive rapid target detection method based on Faster-RCNN provided by the invention is described in detail below with reference to the accompanying drawings and specific embodiments.

As shown in FIG. 1, the self-adaptive rapid target detection method based on Faster-RCNN provided by the invention comprises the following steps, carried out in sequence:

(1) An original image from the VOC2007 data set is input into the RESNET58 residual network, which serves as the bottom-layer feature extraction network shown in FIG. 2, and a feature map is obtained through several convolutions. Conventionally, VGG16 is chosen as the bottom-layer feature extraction network. As the number of network layers increases, however, the training result converges less well, to the point that a deeper network can train worse than a shallower one. To solve this network degradation problem, the invention adopts the RESNET58 residual network as the bottom-layer feature extraction network, increasing its depth from 16 layers to 58 layers, which can greatly improve the training effect. The structure of the bottom-layer feature extraction network is shown in Table 1.
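The degradation problem described in step (1) is what the shortcut connection of a residual network addresses: each block outputs its input plus a learned correction, so a block whose correction is zero simply passes its input through. The following is a minimal pure-Python sketch of that idea only. The names `residual_block`, `zero_transform` and `halve` are hypothetical and stand in for convolution stacks; this is not the network structure of Table 1.

```python
from typing import Callable, List

Vector = List[float]


def residual_block(x: Vector, transform: Callable[[Vector], Vector]) -> Vector:
    """Return transform(x) + x, the identity-shortcut form of a residual block."""
    fx = transform(x)
    if len(fx) != len(x):
        raise ValueError("transform must preserve the input length")
    return [a + b for a, b in zip(fx, x)]


def zero_transform(x: Vector) -> Vector:
    """Hypothetical stand-in for a convolution stack whose weights are all zero."""
    return [0.0 for _ in x]


def halve(x: Vector) -> Vector:
    """Hypothetical stand-in for a learned transformation."""
    return [0.5 * v for v in x]


features = [1.0, -2.0, 3.0]
# With a zero transform the block reduces to an identity mapping.
identity_out = residual_block(features, zero_transform)
# With a non-zero transform the block adds a correction to its input.
corrected_out = residual_block(features, halve)
```

Because an added block can always fall back to the identity mapping, stacking more blocks should not, in principle, make the training result worse than that of the shallower network.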

(2) The feature map is input into a convolutional layer of the region proposal network (RPN) for training, and a mapping from the feature map to the original image is established through preset anchor points, so that each pixel on the feature map corresponds to k candidate regions in the original image.

The core idea of Faster-RCNN is to generate several candidate regions from each preset anchor point. Each pixel on the feature map corresponds to a region in the original image, and that region is then adjusted. In the invention, the region takes three aspect ratios, 1:1, 1:2 and 2:1, and the anchor takes three sizes, large, medium and small, so that each pixel on the feature map corresponds to 9 candidate regions in the original image, that is, k = 9. The number of candidate regions in the original image is therefore 9 times the number of pixels in the feature map, and the candidate regions can be considered to cover every target to be detected. The feature-map pixels for which this correspondence is set are called anchor points. They resemble ship anchors fixed in the ocean: just as a ship can be found by following its anchor, each anchor point leads to its candidate regions in the original image.
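The anchor mapping described above can be sketched in pure Python as follows. The three aspect ratios come from the text; the pixel sizes in `ANCHOR_SIZES` and the feature-map `STRIDE` are assumptions made for illustration, since the patent only states that the sizes are large, medium and small. The function names are hypothetical.

```python
import math
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)

ASPECT_RATIOS = (1.0, 0.5, 2.0)       # width:height of 1:1, 1:2 and 2:1 (from the text)
ANCHOR_SIZES = (128.0, 256.0, 512.0)  # assumed small, medium, large side lengths in pixels
STRIDE = 16                           # assumed image pixels per feature-map pixel


def anchors_at(fx: int, fy: int) -> List[Box]:
    """Return the k = 9 candidate regions for feature-map pixel (fx, fy)."""
    cx = (fx + 0.5) * STRIDE  # centre of the pixel, in original-image coordinates
    cy = (fy + 0.5) * STRIDE
    boxes: List[Box] = []
    for size in ANCHOR_SIZES:
        for ratio in ASPECT_RATIOS:
            # Keep the area at size**2 while changing the width:height ratio.
            w = size * math.sqrt(ratio)
            h = size / math.sqrt(ratio)
            boxes.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return boxes


def all_anchors(feat_w: int, feat_h: int) -> List[Box]:
    """Return the candidate regions for every pixel of a feat_w x feat_h feature map."""
    return [box
            for fy in range(feat_h)
            for fx in range(feat_w)
            for box in anchors_at(fx, fy)]


anchors = all_anchors(4, 3)  # 4 * 3 feature-map pixels, 9 regions each
```

For a real feature map of several thousand pixels this yields tens of thousands of candidate regions, which is why step (3) keeps only a small, adaptively chosen subset.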

(3) All candidate regions are scored by degree of overlap; between 300 and 2000 of the highest-scoring candidate regions are then selected adaptively and input into the classification and regression layers of the region proposal network for training, to obtain candidate regions containing targets.

Conventionally, all candidate regions are scored by degree of overlap, with the overlap and the score inversely related, and the 2000 highest-scoring candidate regions are then selected for training. Because step (2) generates too many candidate regions, the training cost is too high and training takes too long. The invention optimizes this step: an NP layer (the region number adjusting layer) is introduced during training to feed back the training result and adaptively adjust the number of candidate regions, so that most candidate regions are discarded and training time is shortened. The candidate regions are selected adaptively as follows. The average regression loss, total_loss, is calculated once every N training iterations. A value of total_loss between half and double its own value is regarded as a reasonable jitter interval, and feedback adjustment is considered necessary when total_loss leaves that interval. When total_loss doubles or more, the number of candidate regions is automatically increased to (1 + Number_rate_up) times its current value. When total_loss falls to half or less, the number of candidate regions is reduced to (1 - Number_rate_down) times its current value. In this way the number of candidate regions changes adaptively within the interval 300 to 2000, which improves the running speed by 18%; the results for this method and for its control group are shown in Table 2. Candidate regions containing targets are finally obtained.
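The feedback rule of step (3) can be sketched as below. Several points are assumptions rather than statements from the patent: the current window's total_loss is compared against the previous window's value, the rates default to 0.2, the result is rounded to an integer, and the count is clamped to the 300 to 2000 interval. `adjust_num_proposals` and `select_top` are hypothetical helper names.

```python
from typing import List, Sequence, Tuple

MIN_PROPOSALS = 300   # lower bound from the text
MAX_PROPOSALS = 2000  # upper bound from the text


def adjust_num_proposals(num: int, prev_loss: float, curr_loss: float,
                         rate_up: float = 0.2, rate_down: float = 0.2) -> int:
    """Update the candidate-region count from two consecutive total_loss values.

    rate_up and rate_down play the roles of Number_rate_up and
    Number_rate_down; their default values here are assumed.
    """
    if curr_loss >= 2.0 * prev_loss:
        # Loss at least doubled: keep more candidate regions.
        num = int(round(num * (1.0 + rate_up)))
    elif curr_loss <= 0.5 * prev_loss:
        # Loss at least halved: keep fewer candidate regions.
        num = int(round(num * (1.0 - rate_down)))
    # Inside the jitter interval the count is left unchanged.
    return max(MIN_PROPOSALS, min(MAX_PROPOSALS, num))


def select_top(scored: Sequence[Tuple[float, int]], num: int) -> List[int]:
    """Return the ids of the num highest-scoring (score, region_id) pairs."""
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    return [region_id for _, region_id in ranked[:num]]


# Hypothetical total_loss averages, one per window of N training iterations.
window_means = [1.0, 2.4, 2.0, 0.9, 0.4]
num_proposals = MAX_PROPOSALS
history = [num_proposals]
for prev, curr in zip(window_means, window_means[1:]):
    num_proposals = adjust_num_proposals(num_proposals, prev, curr)
    history.append(num_proposals)
```

In this toy run the count stays at the upper bound while the loss rises or jitters, then shrinks once the loss halves between windows.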

(4) The candidate regions containing targets, together with the feature map, are sent into the final classification and regression layer of the region proposal network, and an ROI (region of interest) pooling operation is used to determine which class the target in each candidate region belongs to, thereby obtaining the final recognition result. FIG. 3 is a schematic diagram of a face ROI result extracted by the self-adaptive rapid target detection method based on Faster-RCNN provided by the invention.
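ROI pooling converts candidate regions of different sizes into a fixed-size grid so that the same classification and regression layer can process all of them. The sketch below shows max pooling over a single-channel feature map in pure Python. The bin layout, the 2 x 2 output size and the name `roi_max_pool` are assumptions for illustration; the patent does not specify them.

```python
from typing import List, Tuple

FeatureMap = List[List[float]]
Roi = Tuple[int, int, int, int]  # (x0, y0, x1, y1) in feature-map pixels, x1 and y1 exclusive


def roi_max_pool(feature: FeatureMap, roi: Roi, out_h: int, out_w: int) -> FeatureMap:
    """Max-pool the ROI of a 2-D feature map into an out_h x out_w grid."""
    x0, y0, x1, y1 = roi
    roi_h, roi_w = y1 - y0, x1 - x0
    if roi_h < 1 or roi_w < 1:
        raise ValueError("ROI must cover at least one pixel")
    pooled: FeatureMap = []
    for i in range(out_h):
        # Floor for the bin start and ceiling for the bin end keep every bin non-empty.
        r_start = y0 + (i * roi_h) // out_h
        r_end = y0 + ((i + 1) * roi_h + out_h - 1) // out_h
        row: List[float] = []
        for j in range(out_w):
            c_start = x0 + (j * roi_w) // out_w
            c_end = x0 + ((j + 1) * roi_w + out_w - 1) // out_w
            row.append(max(feature[r][c]
                           for r in range(r_start, r_end)
                           for c in range(c_start, c_end)))
        pooled.append(row)
    return pooled


# Toy 4 x 6 feature map holding the values 1.0 to 24.0 in row-major order.
feature_map = [[float(r * 6 + c + 1) for c in range(6)] for r in range(4)]
pooled_a = roi_max_pool(feature_map, (1, 0, 5, 4), 2, 2)  # 4 x 4 region
pooled_b = roi_max_pool(feature_map, (0, 0, 3, 3), 2, 2)  # 3 x 3 region
```

Both regions produce a 2 x 2 output even though their sizes differ, which is the property the final classification layer relies on.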

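TABLE 1 Structure of the bottom-layer feature extraction network

[Table 1 image not available in this text; original image reference BDA0002192197410000061]

TABLE 2

[Table 2 image not available in this text; original image reference BDA0002192197410000062]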
