Pedestrian height estimation method suitable for top-view shooting with a fisheye camera

Document No.: 1787277 Publication date: 2019-12-10

Reading note: This technology, "A pedestrian height estimation method suitable for top-view shooting with a fisheye camera", was designed by 谢龙汉 and 杨烈 on 2019-07-27. Abstract: The invention provides a pedestrian height estimation method suitable for top-view shooting with a fisheye camera. A deep convolutional neural network for pedestrian head detection is trained on a pedestrian head detection data set captured by a top-view fisheye camera. The trained head detection network is then used to detect pedestrian heads in the fisheye image and obtain their positions. Next, the image is rotated according to the head position of a pedestrian, and that pedestrian is cropped out of the image. The cropped pedestrian image is padded and semantically segmented to obtain an accurate bounding box of the pedestrian, from which the pedestrian's height in the image is calculated. The pedestrian's actual height is then estimated from the height in the image and the mathematical model of the fisheye camera. Repeating this process for each pedestrian in the image in turn yields height estimates for multiple pedestrians at the same time. The invention thus provides an accurate and effective pedestrian height estimation method for top-view fisheye camera scenes.

1. A pedestrian height estimation method suitable for top-view shooting with a fisheye camera, characterized by comprising the following steps:

S1, building a pedestrian head detection data set from top-view fisheye camera footage, and training a deep convolutional neural network for pedestrian head detection on the data set to obtain a trained pedestrian head detection neural network;

S2, detecting pedestrian heads with the trained pedestrian head detection neural network, rotating the image according to the head position of the target pedestrian so that the target pedestrian is upright in the image, and then cropping the target pedestrian from the image;

S3, padding the cropped image and performing semantic segmentation on the pedestrian in the padded image to obtain an accurate bounding box of the pedestrian in the image, and calculating the pedestrian's height in the image from the accurate bounding box;

S4, estimating the height of the target pedestrian from the mathematical model of the fisheye camera and the target pedestrian's height in the image;

S5, selecting a different target pedestrian and repeating S2-S4, so that the heights of the pedestrians in the image are estimated one by one and height estimation of multiple pedestrians is achieved.

2. The pedestrian height estimation method according to claim 1, wherein step S1 comprises the following steps:

S1.1, first collecting videos shot from a top view by a fisheye camera in different scenes, then extracting sample images from the videos, manually marking the head of each pedestrian appearing in each image with a square box, and writing the coordinates of the upper-left and lower-right corners of each annotation box into an annotation file to complete the data set, wherein the coordinates of the upper-left and lower-right corners of the annotation box are used to compute the loss during the fine-tuning training of S1.3, so that the network parameters are optimized according to the loss value and regression of the pedestrian head bounding box is achieved;

S1.2, constructing a deep neural network for pedestrian head detection, namely the detector, based on the Single Shot MultiBox Detector (SSD), wherein the feature extraction part uses VGG-16, and pre-training the detector on the Pascal VOC data set by batch gradient descent, the basic structure and loss function of the pedestrian head detection neural network being the same as those of the SSD framework;

S1.3, fine-tuning the pre-trained detector on the pedestrian head detection data set for top-view fisheye camera scenes built in S1.1; during fine-tuning, the structure and loss function of the pedestrian head detection neural network are the same as in pre-training; the network parameters obtained by pre-training are loaded first, and training then proceeds by batch gradient descent to obtain the trained pedestrian head detection neural network.

3. The pedestrian height estimation method according to claim 1, wherein step S2 comprises the following steps:

S2.1, reading images frame by frame from the top-view video shot by the fisheye camera of S1, and performing pedestrian head detection on each image with the trained pedestrian head detection neural network to obtain the head position of each pedestrian in the image;

S2.2, selecting a target pedestrian $O_i$ from the image, where i is the serial number of the pedestrian in the image, i = 1, 2, 3, ...; calculating its head center point $H_i$ from its head bounding box, and then calculating the angle $\theta$ between the line $C_0H_i$ connecting the image center point $C_0$ and the head center point $H_i$ and the vertical upward direction; rotating the image by $\theta$ so that the pedestrian is upright;

S2.3, assuming that the height of the pedestrian in the image is $H_0$ and the width of the pedestrian in the image is $W_0$; obtaining a rough bounding box of the target pedestrian from $H_0$ and $W_0$, and cropping the pedestrian from the image according to the rough bounding box.

4. The pedestrian height estimation method according to claim 1, wherein step S3 comprises the following steps:

S3.1, constructing in advance a neural network for semantic segmentation based on the mask region-based convolutional neural network (Mask R-CNN), wherein the convolutional part of the network uses ResNet-50, and training the network on the MS COCO data set by batch gradient descent to obtain the network parameters of the trained semantic segmentation neural network, the loss function and basic structure of the semantic segmentation neural network being the same as those of Mask R-CNN;

S3.2, after training, loading the trained network parameters and changing the category list of the semantic segmentation neural network to a list containing only the person category, so that only human bodies are segmented;

S3.3, padding the image cropped in S2.3 with black according to the aspect ratio of the input image of the semantic segmentation neural network, and resizing the padded image to match the input requirement of the semantic segmentation network;

S3.4, performing semantic segmentation on the resized image with the pre-constructed and trained semantic segmentation neural network to obtain an accurate bounding box of the pedestrian in the image, and taking the midpoint a of the upper edge and the midpoint b of the lower edge of the accurate bounding box as the top of the head and the center of the soles of the pedestrian in the image, respectively, wherein the pixel coordinates of a are $(u_1, v_1)$ and the pixel coordinates of b are $(u_2, v_2)$.

5. The pedestrian height estimation method according to claim 1, wherein step S4 comprises the following steps:

S4.1, from the mathematical model $r = f \cdot g(\theta)$ of the fisheye camera and the imaging geometry of a pedestrian in a top-view fisheye camera scene, the following are obtained:

$r_a = f \cdot g(\alpha)$; (1)

$r_b = f \cdot g(\beta)$; (2)

$\tan(\alpha) = D/(H-h)$; (3)

$\tan(\beta) = D/H$; (4)

where f is the focal length of the camera, $g(\theta) = 2\sin(\theta/2)$, H is the installation height of the camera, h is the actual height of the pedestrian, i.e. the estimated value of the pedestrian height, r is the pixel distance from an image point to the image center, $r_a$ is the pixel distance from point a to the image center, $r_b$ is the pixel distance from point b to the image center, D is the actual horizontal distance between the pedestrian and the camera installation position, $\alpha$ is the angle between the vertical downward direction and the line connecting point a and the camera center point, and $\beta$ is the angle between the vertical downward direction and the line connecting point b and the camera center point;

Combining (1), (2), (3) and (4) gives:

$h = H\left\{1 - \dfrac{\tan[g^{-1}(r_b/f)]}{\tan[g^{-1}(r_a/f)]}\right\}$; (5)

S4.2, from the image center point $C_0(c_x, c_y)$ and the coordinates of the two points a and b, the following are obtained:

$r_a = \sqrt{(u_1 - c_x)^2 + (v_1 - c_y)^2}$, $r_b = \sqrt{(u_2 - c_x)^2 + (v_2 - c_y)^2}$;

substituting the values of $r_a$ and $r_b$ into (5) gives the estimated value h of the pedestrian's height.

Technical Field

The invention relates mainly to pedestrian height estimation in video images, and in particular to a pedestrian height estimation method suitable for top-view shooting with a fisheye camera.

Background

In recent years, with the rapid development of the information industry and the continuous improvement of computer performance, using computers to detect pedestrian information in video images has become a main task in the development of intelligent video surveillance systems. A pedestrian's height is one of the important pieces of information that a surveillance system needs to acquire. Some pedestrian height estimation methods already exist for ordinary cameras. However, as surveillance coverage continues to expand, ordinary cameras can no longer meet the requirements. A fisheye camera has a large field of view, which can reach or even exceed 180 degrees, so its coverage is far larger than that of an ordinary camera. Using fisheye cameras therefore reduces the number of cameras needed and lowers surveillance cost, and fisheye cameras are increasingly widely used in security surveillance.

However, images taken by a fisheye camera are heavily distorted, which makes pedestrian height estimation difficult. In addition, to cover a large area, the fisheye camera is installed at the center of the top of the monitored area, and the resulting top-view shooting greatly increases the difficulty of pedestrian height estimation. Pedestrian height estimation in top-view fisheye camera scenes is therefore a very challenging task, and at present no method exists for estimating the heights of multiple pedestrians simultaneously in such scenes.

Disclosure of Invention

To solve the above technical problem, the invention provides a pedestrian height estimation method suitable for top-view fisheye camera scenes. By combining a semantic segmentation neural network with the mathematical model of the fisheye camera, the method estimates the heights of multiple pedestrians simultaneously in a top-view fisheye camera scene.

The purpose of the invention is achieved by at least one of the following technical solutions.

A pedestrian height estimation method suitable for top-view shooting with a fisheye camera comprises the following steps:

S1, building a pedestrian head detection data set from top-view fisheye camera footage, and training a deep convolutional neural network for pedestrian head detection on the data set to obtain a trained pedestrian head detection neural network;

S2, detecting pedestrian heads with the trained pedestrian head detection neural network, rotating the image according to the head position of the target pedestrian so that the target pedestrian is upright in the image, and then cropping the target pedestrian from the image;

S3, padding the cropped image and performing semantic segmentation on the pedestrian in the padded image to obtain an accurate bounding box of the pedestrian in the image, and calculating the pedestrian's height in the image from the accurate bounding box;

S4, estimating the height of the target pedestrian from the mathematical model of the fisheye camera and the target pedestrian's height in the image;

S5, selecting a different target pedestrian and repeating S2-S4, so that the heights of the pedestrians in the image are estimated one by one and height estimation of multiple pedestrians is achieved.

Further, step S1 specifically comprises the following steps:

S1.1, first collecting videos shot from a top view by a fisheye camera in different scenes, then extracting sample images from the videos, manually marking the head of each pedestrian appearing in each image with a square box, and writing the coordinates of the upper-left and lower-right corners of each annotation box into an annotation file to complete the data set, wherein the coordinates of the upper-left and lower-right corners of the annotation box are used to compute the loss during the fine-tuning training of S1.3, so that the network parameters are optimized according to the loss value and regression of the pedestrian head bounding box is achieved;

S1.2, constructing a deep neural network for pedestrian head detection (the detector) based on the Single Shot MultiBox Detector (SSD), wherein the feature extraction part uses VGG-16, and pre-training the detector on the Pascal VOC data set by batch gradient descent, the basic structure and loss function of the pedestrian head detection neural network being the same as those of the SSD framework;

S1.3, fine-tuning the pre-trained detector on the pedestrian head detection data set for top-view fisheye camera scenes built in S1.1. During fine-tuning, the structure and loss function of the pedestrian head detection neural network are the same as in pre-training; the network parameters obtained by pre-training are loaded first, and training then proceeds by batch gradient descent to obtain the trained pedestrian head detection neural network.

Further, step S2 specifically comprises the following steps:

S2.1, reading images frame by frame from the top-view video shot by the fisheye camera of S1, and performing pedestrian head detection on each image with the trained pedestrian head detection neural network to obtain the head position of each pedestrian in the image;

S2.2, selecting a target pedestrian $O_i$ from the image, where i is the serial number of the pedestrian in the image, i = 1, 2, 3, ...; calculating its head center point $H_i$ from its head bounding box, and then calculating the angle $\theta$ between the line $C_0H_i$ connecting the image center point $C_0$ and the head center point $H_i$ and the vertical upward direction; rotating the image by $\theta$ so that the pedestrian is upright;

S2.3, assuming that the height of the pedestrian in the image is $H_0$ and the width of the pedestrian in the image is $W_0$; obtaining a rough bounding box of the target pedestrian from $H_0$ and $W_0$, and cropping the pedestrian from the image according to the rough bounding box.
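The rough bounding box of S2.3 can be sketched as follows. This is only an illustration: the function name `rough_box`, the placement of the box (centered on the head and extending downward from the top of the head box), the clamping to the image, and the assumption that $H_0$ and $W_0$ are already expressed in pixels are not stated in the source.

```python
def rough_box(head_box, w0, h0, image_size):
    """Return (x1, y1, x2, y2) of a rough pedestrian box in the rotated image.

    head_box   -- (x1, y1, x2, y2) of the detected head, in pixels
    w0, h0     -- assumed pedestrian width and height in the image, in pixels
    image_size -- (width, height) of the rotated image
    """
    hx1, hy1, hx2, _ = head_box
    cx = (hx1 + hx2) / 2.0
    # Assumption: centre the box on the head and extend it downward from the
    # top of the head box, since the rotated pedestrian is upright.
    x1 = cx - w0 / 2.0
    y1 = hy1
    x2 = cx + w0 / 2.0
    y2 = hy1 + h0
    width, height = image_size
    # Clamp to the image so the crop never reaches outside the frame.
    return (max(0, int(round(x1))), max(0, int(round(y1))),
            min(width, int(round(x2))), min(height, int(round(y2))))
```

For a 40-pixel-wide head box and `w0 = 160`, the crop is four head widths wide, which matches the factor used in the embodiment.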

Further, step S3 specifically comprises the following steps:

S3.1, constructing in advance a neural network for semantic segmentation based on the mask region-based convolutional neural network (Mask R-CNN), wherein the convolutional part of the network uses ResNet-50, and training the network on the MS COCO data set by batch gradient descent to obtain the network parameters of the trained semantic segmentation neural network, the loss function and basic structure of the semantic segmentation neural network being the same as those of Mask R-CNN;

S3.2, after training, loading the trained network parameters and changing the category list of the semantic segmentation neural network to a list containing only the person category, so that only human bodies are segmented;

S3.3, padding the image cropped in S2.3 with black according to the aspect ratio of the input image of the semantic segmentation neural network, and resizing the padded image to match the input requirement of the semantic segmentation network;

S3.4, performing semantic segmentation on the resized image with the pre-constructed and trained semantic segmentation neural network to obtain an accurate bounding box of the pedestrian in the image, and taking the midpoint a of the upper edge and the midpoint b of the lower edge of the accurate bounding box as the top of the head and the center of the soles of the pedestrian in the image, respectively, wherein the pixel coordinates of a are $(u_1, v_1)$ and the pixel coordinates of b are $(u_2, v_2)$.
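The padding and resizing of S3.3 comes down to a little arithmetic, sketched below. The source only says that the crop is padded with black to the network's aspect ratio and then resized; the centered padding, the helper names `pad_to_aspect` and `to_crop_coords`, and the mapping back to crop coordinates are assumptions added for illustration.

```python
def pad_to_aspect(width, height, target_w, target_h):
    """Return (padded_w, padded_h, left, top, scale) for a width x height crop.

    The crop is placed on a black canvas with the aspect ratio
    target_w / target_h; `left` and `top` are the offsets of the crop on that
    canvas and `scale` is the resize factor from canvas to network input.
    """
    target_ratio = target_w / target_h
    if width / height < target_ratio:
        # Crop is too narrow: add black columns on the left and right.
        padded_w = int(round(height * target_ratio))
        padded_h = height
    else:
        # Crop is too wide (or already matches): add black rows instead.
        padded_w = width
        padded_h = int(round(width / target_ratio))
    left = (padded_w - width) // 2
    top = (padded_h - height) // 2
    scale = target_w / padded_w
    return padded_w, padded_h, left, top, scale


def to_crop_coords(u, v, left, top, scale):
    """Map a pixel of the resized network input back to the unpadded crop."""
    return u / scale - left, v / scale - top
```

Keeping `left`, `top` and `scale` makes it possible to carry points found in the network input, such as a and b, back to the coordinates of the cropped image.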

Further, step S4 specifically comprises the following steps:

S4.1, from the mathematical model $r = f \cdot g(\theta)$ of the fisheye camera and the imaging geometry of a pedestrian in a top-view fisheye camera scene, the following are obtained:

$r_a = f \cdot g(\alpha)$; (1)

$r_b = f \cdot g(\beta)$; (2)

$\tan(\alpha) = D/(H-h)$; (3)

$\tan(\beta) = D/H$; (4)

where f is the focal length of the camera, $g(\theta) = 2\sin(\theta/2)$, H is the installation height of the camera, h is the actual height of the pedestrian, i.e. the estimated value of the pedestrian height, r is the pixel distance from an image point to the image center, $r_a$ is the pixel distance from point a to the image center, $r_b$ is the pixel distance from point b to the image center, D is the actual horizontal distance between the pedestrian and the camera installation position, $\alpha$ is the angle between the vertical downward direction and the line connecting point a and the camera center point, and $\beta$ is the angle between the vertical downward direction and the line connecting point b and the camera center point.

Combining (1), (2), (3) and (4) gives:

$h = H\left\{1 - \dfrac{\tan[g^{-1}(r_b/f)]}{\tan[g^{-1}(r_a/f)]}\right\}$; (5)

S4.2, from the image center point $C_0(c_x, c_y)$ and the coordinates of the two points a and b, the following are obtained:

$r_a = \sqrt{(u_1 - c_x)^2 + (v_1 - c_y)^2}$, $r_b = \sqrt{(u_2 - c_x)^2 + (v_2 - c_y)^2}$;

substituting the values of $r_a$ and $r_b$ into (5) gives the estimated value h of the pedestrian's height.
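The elimination that leads from (1)-(4) to the closed form for h can be written out step by step. The derivation uses only equations (1)-(4); the explicit inverse in the last line assumes the equisolid-angle model $g(\theta) = 2\sin(\theta/2)$, i.e. $r = 2f\sin(\theta/2)$.

```latex
\begin{align*}
D &= (H-h)\tan\alpha = H\tan\beta
  && \text{from (3) and (4)} \\
h &= H\left(1 - \frac{\tan\beta}{\tan\alpha}\right)
  && \text{eliminating } D \\
\alpha &= g^{-1}(r_a/f), \quad \beta = g^{-1}(r_b/f)
  && \text{from (1) and (2)} \\
g^{-1}(x) &= 2\arcsin(x/2)
  && \text{for } g(\theta) = 2\sin(\theta/2)
\end{align*}
```

Because the head is seen at a larger angle from the vertical than the soles ($\alpha > \beta$ whenever $0 < h < H$), the ratio $\tan\beta/\tan\alpha = (H-h)/H$ lies between 0 and 1, so h comes out positive and smaller than H.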

Compared with the prior art, the invention has the following advantages and effects:

The invention uses a deep neural network to detect pedestrian heads and thereby obtain the approximate position of each pedestrian in the image, and then crops each pedestrian out of the image separately according to that position. The pedestrians in the image are thus handled independently, interference between pedestrians is avoided, and the heights of multiple pedestrians are estimated at the same time. The invention obtains each pedestrian's height in the image by semantically segmenting the pedestrian with a deep neural network, and then estimates the pedestrian's actual height from the height in the image.

Drawings

Fig. 1 is a flow chart of the steps of the pedestrian height estimation method suitable for top-view shooting with a fisheye camera.

Fig. 2 shows the result of pedestrian head detection in a top-view fisheye camera scene.

Fig. 3 shows the result of rotating the image according to the head position of a pedestrian.

Fig. 4 shows the result of cropping the pedestrian from the image and padding the crop.

Fig. 5 shows the result of semantic segmentation of the pedestrian in the image.

Fig. 6 is a schematic diagram of the imaging of a pedestrian in a top-view fisheye camera scene.

Detailed Description

The implementation of the invention is further described below with reference to an embodiment and the drawings, but the implementation and protection of the invention are not limited thereto.

As shown in Fig. 1, a pedestrian height estimation method suitable for top-view shooting with a fisheye camera comprises the following steps:

S1, building a pedestrian head detection data set from top-view fisheye camera footage, and training a deep convolutional neural network for pedestrian head detection on the data set to obtain a trained pedestrian head detection neural network, specifically comprising the following steps:

S1.1, first, videos shot from a top view by a fisheye camera in different scenes are collected, and sample images are extracted from the videos. The head of each pedestrian appearing in each image is manually marked with a square box, as shown in Fig. 2, and the coordinates of the upper-left and lower-right corners of each annotation box are written into an annotation file to complete the data set. These corner coordinates are used to compute the loss during the fine-tuning training of S1.3, so that the network parameters are optimized according to the loss value and regression of the pedestrian head bounding box is achieved;

S1.2, the VGG-16-based SSD model ssd_512_vgg16_atrous_voc is loaded from the Model Zoo of the third-party library GluonCV as the detector, together with its network parameters trained on the Pascal VOC data set as pre-training parameters;

S1.3, the detector loaded with the pre-training parameters is fine-tuned on the top-view fisheye pedestrian head detection data set built in S1.1 to obtain the trained pedestrian head detection neural network. Fine-tuning trains the detector by batch gradient descent with a batch size of 16, a learning rate of 0.0005, and 500 training epochs;
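The annotation file of S1.1 can be pictured with a minimal sketch. The CSV layout, the column names and the helpers `write_annotations` and `read_annotations` are hypothetical; the source only states that the upper-left and lower-right corners of each head box are written to an annotation file.

```python
import csv
import io

# Hypothetical layout: one row per head box, holding the image name and the
# upper-left (x1, y1) and lower-right (x2, y2) corners in pixels.
FIELDS = ["image", "x1", "y1", "x2", "y2"]


def write_annotations(stream, boxes):
    """Write (image, x1, y1, x2, y2) tuples to a text stream as CSV."""
    writer = csv.writer(stream)
    writer.writerow(FIELDS)
    for row in boxes:
        writer.writerow(row)


def read_annotations(stream):
    """Read the CSV back into a dict: image name -> list of corner tuples."""
    result = {}
    for row in csv.DictReader(stream):
        corners = tuple(int(row[key]) for key in FIELDS[1:])
        result.setdefault(row["image"], []).append(corners)
    return result


# Usage: round-trip two head boxes of one frame through an in-memory file.
buffer = io.StringIO()
write_annotations(buffer, [("frame_0001.jpg", 120, 80, 160, 120),
                           ("frame_0001.jpg", 300, 210, 338, 248)])
buffer.seek(0)
annotations = read_annotations(buffer)
```

Grouping the boxes by image name mirrors how a detector's data loader would usually consume them, with all head boxes of one frame delivered together.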

S2, detecting pedestrian heads with the trained pedestrian head detection neural network, rotating the image according to the head position of the target pedestrian so that the target pedestrian is upright in the image, and then cropping the target pedestrian from the image, specifically comprising the following steps:

S2.1, images are read frame by frame from the top-view video shot by the fisheye camera of S1, and pedestrian head detection is performed on each image with the trained pedestrian head detection neural network to obtain the head position of each pedestrian in the image;

S2.2, as shown in Fig. 3a, a target pedestrian $O_i$ is selected from the image, where i is the serial number of the pedestrian in the image, i = 1, 2, 3, ...; its head center point $H_i$ is calculated from its head bounding box, and then the angle $\theta$ between the line $C_0H_i$ connecting the image center point $C_0$ and the head center point $H_i$ and the vertical upward direction is calculated; the image is rotated by $\theta$ so that, as shown in Fig. 3b, the pedestrian is upright;

S2.3, the height of the pedestrian in the image is assumed to be $H_0$ and the width of the pedestrian in the image to be $W_0$, where $H_0$ = 2.5 m and $W_0$ is 4 times the width of the detected pedestrian head bounding box; as shown in Fig. 4a, a rough bounding box of the target pedestrian is obtained from $H_0$ and $W_0$, and the pedestrian is cropped from the image according to the rough bounding box;
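The angle of S2.2 and its effect on a single pixel can be sketched as follows. The function names and the sign convention (image y axis pointing down, $\theta$ measured clockwise on screen from "straight up") are assumptions; the source does not state which convention or library was used.

```python
import math


def rotation_angle(center, head):
    """Angle in radians, clockwise on screen, from 'straight up' to the line
    center -> head. Image coordinates are assumed to have y growing downward."""
    dx = head[0] - center[0]
    dy = head[1] - center[1]
    return math.atan2(dx, -dy)


def rotate_about_center(point, center, theta):
    """Rotate a pixel counter-clockwise on screen by theta about center."""
    dx = point[0] - center[0]
    dy = point[1] - center[1]
    x = center[0] + dx * math.cos(theta) + dy * math.sin(theta)
    y = center[1] - dx * math.sin(theta) + dy * math.cos(theta)
    return x, y


# Usage: a head to the right of the image centre ends up directly above it.
c0 = (500.0, 500.0)
head = (700.0, 500.0)
theta = rotation_angle(c0, head)
upright_head = rotate_about_center(head, c0, theta)
```

With OpenCV, the same rotation would typically be applied to the whole frame with `cv2.getRotationMatrix2D` (angle in degrees, positive meaning counter-clockwise) followed by `cv2.warpAffine`; that choice of library is an assumption, not something the source specifies.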

S3, padding the cropped image and performing semantic segmentation on the pedestrian in the padded image to obtain an accurate bounding box of the pedestrian in the image, and calculating the pedestrian's height in the image from the accurate bounding box, specifically comprising the following steps:

S3.1, the ResNet-50-based Mask R-CNN network model mask_rcnn_resnet50_v1b_coco is loaded in advance from the Model Zoo of the third-party library GluonCV for semantic segmentation of the image, together with its network parameters trained on the MS COCO data set;

S3.2, after training, the trained network parameters are loaded and the category list of the semantic segmentation neural network is changed to a list containing only the person category, so that only human bodies are segmented;

S3.3, the image cropped in S2.3 is padded with black according to the aspect ratio of the input image of the semantic segmentation neural network, and the padded image is resized to match the input requirement of the semantic segmentation network;

S3.4, semantic segmentation is performed on the padded image with the pre-constructed and trained semantic segmentation neural network, as shown in Fig. 5a, to obtain an accurate bounding box of the pedestrian in the image; as shown in Fig. 5b, the midpoint a of the upper edge and the midpoint b of the lower edge of the accurate bounding box are taken as the top of the head and the center of the soles of the pedestrian in the image, respectively, where the pixel coordinates of a are $(u_1, v_1)$ and the pixel coordinates of b are $(u_2, v_2)$.
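Taking points a and b from a segmentation result, as in S3.4, can be sketched as below. The binary mask as a list of rows and the function name `head_and_sole_points` are assumptions for illustration; a real Mask R-CNN output would first have to be thresholded into such a mask.

```python
def head_and_sole_points(mask):
    """Return (a, b) from a binary person mask, or None if the mask is empty.

    mask -- 2D sequence of 0/1 values, indexed as mask[v][u]
    a    -- midpoint of the upper edge of the tight box (top of the head)
    b    -- midpoint of the lower edge of the tight box (centre of the soles)
    """
    rows = [v for v, row in enumerate(mask) if any(row)]
    if not rows:
        return None  # nothing was segmented in this crop
    cols = [u for row in mask for u, px in enumerate(row) if px]
    u_mid = (min(cols) + max(cols)) / 2.0
    v_min, v_max = rows[0], rows[-1]
    return (u_mid, v_min), (u_mid, v_max)


# Usage: a tiny 5 x 5 mask whose tight box spans columns 1-3 and rows 1-3.
demo_mask = [
    [0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 0, 1, 0],
    [0, 0, 0, 0, 0],
]
points = head_and_sole_points(demo_mask)
```

The difference between the two v coordinates is the pedestrian's height in the image, in pixels of the segmented image.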

S4, estimating the height of the target pedestrian from the mathematical model of the fisheye camera and the target pedestrian's height in the image, specifically comprising the following steps:

S4.1, Fig. 6 is a schematic diagram of the imaging process of the fisheye camera, which shows a semicircular arc, the focal point O of the lens, and the imaging plane EI of the fisheye camera. From the mathematical model $r = f \cdot g(\theta)$ of the fisheye camera and the imaging geometry of a pedestrian in a top-view fisheye camera scene, the following are obtained:

$r_a = f \cdot g(\alpha)$; (1)

$r_b = f \cdot g(\beta)$; (2)

$\tan(\alpha) = D/(H-h)$; (3)

$\tan(\beta) = D/H$; (4)

where f is the focal length of the camera, $g(\theta) = 2\sin(\theta/2)$, A is the top of the head of the actual pedestrian, B is the center of the soles of the actual pedestrian, H is the installation height of the camera, h is the actual height of the pedestrian, i.e. the estimated value of the pedestrian height, r is the pixel distance from an image point to the image center, $r_a$ is the pixel distance from point a to the image center, $r_b$ is the pixel distance from point b to the image center, D is the actual horizontal distance between the pedestrian and the camera installation position, $\alpha$ is the angle between the vertical downward direction and the line connecting point a and the camera center point, and $\beta$ is the angle between the vertical downward direction and the line connecting point b and the camera center point;

Combining (1), (2), (3) and (4) gives:

$h = H\left\{1 - \dfrac{\tan[g^{-1}(r_b/f)]}{\tan[g^{-1}(r_a/f)]}\right\}$; (5)

S4.2, from the image center point $C_0(c_x, c_y)$ and the coordinates of the two points a and b, the following are obtained:

$r_a = \sqrt{(u_1 - c_x)^2 + (v_1 - c_y)^2}$, $r_b = \sqrt{(u_2 - c_x)^2 + (v_2 - c_y)^2}$;

substituting the values of $r_a$ and $r_b$ into (5) gives the estimated value h of the pedestrian's height.
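Step S4 as a whole can be sketched in a few lines. The sketch assumes the equisolid-angle model $r = 2f\sin(\theta/2)$, a focal length f expressed in pixels (since $r_a$ and $r_b$ are pixel distances), and points a and b given in the same pixel frame as the image center $C_0$; the function names are hypothetical. The height follows from eliminating D between $\tan(\alpha) = D/(H-h)$ and $\tan(\beta) = D/H$, which gives $h = H(1 - \tan\beta/\tan\alpha)$.

```python
import math


def g_inverse(x):
    """Inverse of g(theta) = 2*sin(theta/2), the assumed equisolid model."""
    return 2.0 * math.asin(x / 2.0)


def estimate_height(a, b, center, f, cam_height):
    """Estimate the pedestrian height h, in the same unit as cam_height.

    a, b       -- pixel coordinates of the head top and the sole centre
    center     -- image centre C0 = (cx, cy), in the same frame as a and b
    f          -- focal length in pixels (assumed unit)
    cam_height -- installation height H of the camera
    """
    r_a = math.hypot(a[0] - center[0], a[1] - center[1])
    r_b = math.hypot(b[0] - center[0], b[1] - center[1])
    if r_a == 0:
        raise ValueError("head lies on the optical axis; height is undefined")
    alpha = g_inverse(r_a / f)  # angle of the head ray from straight down
    beta = g_inverse(r_b / f)   # angle of the sole ray from straight down
    # Eliminate D from tan(alpha) = D/(H-h) and tan(beta) = D/H.
    return cam_height * (1.0 - math.tan(beta) / math.tan(alpha))


# Usage with synthetic values: camera at 3 m, pedestrian 2 m away, 1.7 m tall.
F_PX, H_CAM, D_TRUE, H_TRUE = 300.0, 3.0, 2.0, 1.7
ra = 2 * F_PX * math.sin(math.atan(D_TRUE / (H_CAM - H_TRUE)) / 2)
rb = 2 * F_PX * math.sin(math.atan(D_TRUE / H_CAM) / 2)
h_est = estimate_height((1000.0, 1000.0 - ra), (1000.0, 1000.0 - rb),
                        (1000.0, 1000.0), F_PX, H_CAM)
```

The synthetic check projects a known pedestrian through the same camera model and recovers the height, so it tests the algebra only; it says nothing about accuracy on real footage. When a and b come from the padded and resized crop, they would first need to be mapped back into the frame of $C_0$.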

S5, selecting a different target pedestrian and repeating S2-S4, so that the heights of the pedestrians in the image are estimated one by one and height estimation of multiple pedestrians is achieved.
