Efficient target person trajectory generation method based on pedestrian re-identification

Document No.: 987772 Publication date: 2020-11-06

Reading note: this technology, "Efficient target person trajectory generation method based on pedestrian re-identification", was designed by 俞一奇, 邱彦林, and 陈尚武 on 2020-07-20. Main content: the invention provides an efficient target person trajectory generation method based on pedestrian re-identification. Drawing on the idea of depthwise separable convolution in the current mainstream MobileNet-V2 network, a lightweight convolutional neural network is designed and trained on existing public pedestrian re-identification data sets to obtain trained models; the trained models are evaluated by mAP and Rank-k, and the best-performing one is selected as the pedestrian re-identification model. Based on video data from existing monitoring scenes, the invention designs a target person trajectory generation scheme with low resource usage and high accuracy: a worker only needs to provide a picture of the target person to match all pictures of similar, suspicious persons in the database to be retrieved; after manual screening, a second retrieval is performed with the confirmed pictures, and this is repeated until no new confirmed picture of the target person is produced; the complete trajectory of the target person is then obtained by combining the time and location information of the pictures.

1. An efficient target person trajectory generation method based on pedestrian re-identification, characterized by comprising the following steps:

step (1): drawing on the idea of depthwise separable convolution in the current mainstream MobileNet-V2 network, designing a lightweight convolutional neural network, training it on existing public pedestrian re-identification data sets to obtain trained models, evaluating the trained models by mAP and Rank-k, and selecting the best-performing one, namely the model with the largest mAP and Rank-k values, as the pedestrian re-identification model;

step (2): acquiring surveillance camera data from different locations within the same time period, according to the specific requirements;

step (3): detecting the pedestrians in the video data of step (2) with a current mainstream object detection algorithm; to avoid the large data redundancy caused by the same target being detected in several adjacent frames, using an object tracking algorithm and computing the IoU (intersection over union) between tracking boxes and detection boxes in adjacent frames to judge whether they are the same target; in general, when the IoU is greater than 0.4, the persons in the tracking box and the detection box are considered the same target, and in that case only one picture of that person passing the current camera is kept;

step (4): naming each picture according to the IP address of the camera and the time information of the current picture;

step (5): extracting the features of all candidate pictures from step (3) with the pedestrian re-identification model of step (1); the candidate pictures are the pictures among which the target person is to be searched for; a picture feature is the output of the last fully connected layer of the lightweight convolutional neural network of step (1), usually a group of n-dimensional floating-point data, which is stored in a database;

step (6): using the same method as in step (5), extracting the two-dimensional feature array x[n × m] of the m person pictures to be retrieved and storing it in the database; extracting the corresponding feature y[n] from the target person picture and matching it against the features of all pictures in the database, where feature matching means calculating the degree of similarity between the features x[n × m] of the person pictures to be retrieved and the feature y[n] of the target person picture;

to avoid missing the target, the threshold for the first round of target matching is set somewhat lower, at 0.6, so that more candidate persons are matched; the correct target persons are then screened out by manual judgment;

matching uses cosine similarity, computed by the formula below; to avoid missing target pictures, a relatively low similarity threshold is used for matching (sim(X, Y) ≥ 0.6), where the vector X is the picture feature of a person to be retrieved in a candidate picture and the vector Y is the picture feature of the target person;

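$$\mathrm{sim}(X, Y) = \frac{X \cdot Y}{\lVert X \rVert \, \lVert Y \rVert} = \frac{\sum_{i=1}^{n} x_i y_i}{\sqrt{\sum_{i=1}^{n} x_i^2}\,\sqrt{\sum_{i=1}^{n} y_i^2}}$$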

step (7): matching the persons confirmed in the previous step against the persons in the database again, this time with a raised threshold to ensure matching accuracy, the threshold usually being set to 0.8, and then manually screening the candidate person pictures; repeating steps (6) to (7) until no new target person picture is produced;

step (8): arranging the final target person pictures in ascending order of their time information to obtain the complete trajectory information of the target.

2. The efficient target person trajectory generation method based on pedestrian re-identification according to claim 1, wherein the design and training of the pedestrian re-identification model in step (1) comprise the following steps:

S11, selecting PyTorch as the training framework for the model;

S12, to ensure that the model can process image data in real time, using the current mainstream MobileNet-V2 model as the backbone network; because it uses a depthwise separable structure, it reduces the number of model parameters while increasing the forward inference speed of the model;

S13, to improve the accuracy of the model, first training it for classification on the ImageNet data set, and then training it on a public pedestrian re-identification data set;

S14, evaluating the trained models by mAP and Rank-k, and selecting the best-performing model as the final pedestrian re-identification model.

3. The efficient target person trajectory generation method based on pedestrian re-identification according to claim 1, wherein the method of combining the object detection and object tracking algorithms in step (2) is as follows:

S21, taking the initial frame of the video as the input of the object detection algorithm, detecting the corresponding target pedestrians, and marking each pedestrian with a detection box;

S22, taking the detection boxes as the input of the object tracking algorithm, tracking through the subsequent 3 frames of images, and obtaining the tracking boxes in the last of those frames;

S23, computing the IoU of every tracking box from step S22 with every detection box from step S21, one pair at a time; if an IoU value is the maximum and is greater than 0.4, the tracking box and the detection box are considered to be the same person; otherwise they are not considered to be the same person.

Technical Field

The invention belongs to the technical field of computer vision analysis, and particularly relates to an efficient target person trajectory generation method based on pedestrian re-identification.

Background

With the accelerating pace of urbanization, urban population density increases day by day, and more and more surveillance cameras are installed in public places to safeguard people's lives and property. Faced with massive amounts of video data every day, searching for a target person by manual screening costs a great deal of manpower, material resources, and time, and its recognition rate is not high. The purpose of pedestrian re-identification is to match pictures of a person captured by one camera with pictures of the same person captured by different cameras. Because it provides practical technical support for applications in the security monitoring field, it has received extensive attention from industry and academia.

At present, generating pedestrian trajectories quickly and accurately has great application value in many industries. For example: in the security industry, given a photo of a criminal suspect, more pictures of the suspect can be obtained by searching historical surveillance video, so that the target person can be located more quickly and accurately; in commerce, identifying where customers linger and analyzing their behavior makes it possible to understand user needs intelligently and push targeted advertisements; in addition, in some large public places (such as amusement parks and railway stations), the technique can be used to search for lost persons, since with re-identification the police need only a provided picture to quickly locate the lost person's current position.

Of course, generating target person trajectories quickly, efficiently, and accurately with pedestrian re-identification still faces several important problems: 1. how to design a lightweight and highly accurate pedestrian re-identification model that runs on low-resource, low-cost equipment; 2. how to effectively and accurately extract all persons to be searched from each surveillance video by means of pedestrian detection and object tracking; 3. how to match as many target person pictures as possible among the acquired pedestrian pictures, so as to generate more detailed trajectory information.

In view of the above problems, it is necessary to design an efficient target person trajectory generation method based on pedestrian re-identification that meets the practical application requirements of current monitoring scenes.

Disclosure of Invention

In view of the above, the invention provides an efficient target person trajectory generation method based on pedestrian re-identification. Based on video data from existing monitoring scenes, it designs a target person trajectory generation scheme with low resource usage and high accuracy: a worker only needs to provide a picture of the target person to match all pictures of similar, suspicious persons in the database to be retrieved; after manual screening, a second retrieval is performed with the confirmed pictures, and these steps are repeated until no new confirmed picture of the target person is produced; the complete trajectory of the target person is then obtained by combining the time and location information of the pictures. The method can greatly improve working efficiency and reduce labor and hardware costs.

To achieve this purpose, the invention provides the following technical scheme:

the invention relates to an efficient target person trajectory generation method based on pedestrian re-identification, which comprises the following modules:

1. a pedestrian re-identification model generation module, which covers network design, model training, and performance testing;

2. a surveillance video data acquisition module, mainly used to collect surveillance data from several non-overlapping cameras within the same time period;

3. a database picture generation module, used to store the persons appearing in the video in a database in the form of pictures, together with the location and time information of every picture;

4. a feature generation and matching module, mainly used to generate features for the database pictures of module 3 by means of module 1, store the features in the database, and screen out the correct target pictures through feature matching combined with human intervention;

5. a trajectory generation module, mainly used to take the correct pictures obtained by matching in the previous module, arrange them in ascending order of their time information, and finally obtain the trajectory of the target person.

An efficient target person trajectory generation method based on pedestrian re-identification comprises the following steps:

step (1): drawing on the idea of depthwise separable convolution in the current mainstream MobileNet-V2 network, designing a lightweight convolutional neural network (with few parameters and few floating-point operations), training it on existing public pedestrian re-identification data sets to obtain trained models, evaluating the trained models by mAP and Rank-k, and selecting the best-performing one, namely the model with the largest mAP and Rank-k values, as the pedestrian re-identification model;

step (2): acquiring surveillance camera data from different locations within the same time period, according to the specific requirements;

step (3): detecting the pedestrians in the video data of step (2) with a current mainstream object detection algorithm (specifically, the YOLO-V3 algorithm); to avoid the large data redundancy caused by the same target being detected in several adjacent frames, using an object tracking algorithm (specifically, the KCF algorithm) and computing the IoU (intersection over union) between tracking boxes and detection boxes in adjacent frames to judge whether they are the same target; in general, when the IoU is greater than 0.4, the persons in the tracking box and the detection box are considered the same target, and in that case only one picture of that person passing the current camera is kept;

step (4): naming each picture according to the IP address of the camera and the time information of the current picture; under this naming rule, a picture named 192.168.130.1_20191212114309.png comes from the camera with IP 192.168.130.1, and the person appeared in that camera at 11:43:09 on December 12, 2019;

step (5): extracting the features of all candidate pictures from step (3) with the pedestrian re-identification model of step (1); the candidate pictures are the pictures among which the target person is to be searched for; a picture feature is the output of the last fully connected layer of the lightweight convolutional neural network of step (1), usually a group of n-dimensional floating-point data, which is stored in a database;

step (6): using the same method as in step (5), extracting the two-dimensional feature array x[n × m] of the m person pictures to be retrieved and storing it in the database; extracting the corresponding feature y[n] from the target person picture and matching it against the features of all pictures in the database, where feature matching means calculating the degree of similarity between the features x[n × m] of the person pictures to be retrieved and the feature y[n] of the target person picture;

to avoid missing the target, the threshold for the first round of target matching is set somewhat lower, at 0.6, so that more candidate persons are matched; the correct target persons are then screened out by manual judgment;

matching uses cosine similarity, computed by the formula below; to avoid missing target pictures, a relatively low similarity threshold is used for matching (sim(X, Y) ≥ 0.6), where the vector X is the picture feature of a person to be retrieved in a candidate picture and the vector Y is the picture feature of the target person;

$$\mathrm{sim}(X, Y) = \frac{X \cdot Y}{\lVert X \rVert \, \lVert Y \rVert} = \frac{\sum_{i=1}^{n} x_i y_i}{\sqrt{\sum_{i=1}^{n} x_i^2}\,\sqrt{\sum_{i=1}^{n} y_i^2}}$$

step (7): matching the persons confirmed in the previous step against the persons in the database again, this time with a raised threshold to ensure matching accuracy, the threshold usually being set to 0.8, and then manually screening the candidate person pictures; repeating steps (6) to (7) until no new target person picture is produced;

step (8): arranging the final target person pictures in ascending order of their time information to obtain the complete trajectory information of the target.
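The naming rule of step (4) and the time ordering of step (8) can be illustrated with a short sketch. This is only an illustration under stated assumptions: the function names `make_name`, `parse_name`, and `build_trajectory` are hypothetical, and the `.png` extension is taken from the single example name given in the text.

```python
from datetime import datetime
from typing import List, Tuple

STAMP_FORMAT = "%Y%m%d%H%M%S"  # e.g. 20191212114309


def make_name(camera_ip: str, seen_at: datetime, ext: str = "png") -> str:
    """Build '<camera IP>_<YYYYMMDDHHMMSS>.<ext>' as described in step (4)."""
    return f"{camera_ip}_{seen_at.strftime(STAMP_FORMAT)}.{ext}"


def parse_name(name: str) -> Tuple[str, datetime]:
    """Recover (camera IP, timestamp) from a picture name."""
    stem = name.rsplit(".", 1)[0]            # drop the file extension
    camera_ip, stamp = stem.rsplit("_", 1)   # an IPv4 address has no underscore
    return camera_ip, datetime.strptime(stamp, STAMP_FORMAT)


def build_trajectory(names: List[str]) -> List[Tuple[datetime, str]]:
    """Step (8): order confirmed pictures by time, earliest first."""
    return sorted((seen_at, ip) for ip, seen_at in map(parse_name, names))


if __name__ == "__main__":
    confirmed = [
        "192.168.130.2_20191212114500.png",
        "192.168.130.1_20191212114309.png",
    ]
    for seen_at, ip in build_trajectory(confirmed):
        print(seen_at.isoformat(), ip)
```

Mapping each camera IP to a physical location would be a separate lookup table, which the text does not specify.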

Preferably, the design and training of the pedestrian re-identification model in step (1) comprise the following steps:

S11, selecting PyTorch as the training framework for the model;

S12, to ensure that the model can process image data in real time, using the current mainstream MobileNet-V2 model as the backbone network; because it uses a depthwise separable structure, it reduces the number of model parameters while increasing the forward inference speed of the model;

S13, to improve the accuracy of the model, first training it for classification on the ImageNet data set, and then training it on a public pedestrian re-identification data set;

S14, evaluating the trained models by mAP and Rank-k, and selecting the best-performing model as the final pedestrian re-identification model;

mAP (mean average precision) is the mean of the average precision values; it measures the quality of the model over all categories of the test set, and on the same test set a larger mAP value indicates a better classification effect;

Rank-k is the probability that a correct result appears among the top k (highest-confidence) pictures of the search results. It is obtained by counting the cumulative accuracy of correct results over the first to the k-th samples in the ranking; on the same test set, the higher a model's Rank-k accuracy, the higher its retrieval precision.
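As a rough illustration of the two metrics, the sketch below computes Rank-k and mAP from ranked retrieval results. It is a simplified sketch, not the evaluation protocol of any particular data set: each query is assumed to be given as a list of booleans (hypothetical input format) marking whether the picture at each rank is a correct match, and data-set-specific rules such as excluding same-camera pictures are not modeled.

```python
from typing import List, Tuple


def rank_k_hit(ranked_is_match: List[bool], k: int) -> bool:
    """True if at least one correct match appears in the top k results."""
    return any(ranked_is_match[:k])


def average_precision(ranked_is_match: List[bool]) -> float:
    """Mean of the precision values measured at each correct match."""
    hits = 0
    precisions = []
    for position, is_match in enumerate(ranked_is_match, start=1):
        if is_match:
            hits += 1
            precisions.append(hits / position)
    return sum(precisions) / len(precisions) if precisions else 0.0


def evaluate(queries: List[List[bool]], k: int) -> Tuple[float, float]:
    """Return (Rank-k accuracy, mAP) averaged over all queries."""
    rank_k = sum(rank_k_hit(q, k) for q in queries) / len(queries)
    m_ap = sum(average_precision(q) for q in queries) / len(queries)
    return rank_k, m_ap


if __name__ == "__main__":
    # Two queries, each with three ranked gallery pictures.
    results = [[True, False, True], [False, True, False]]
    print(evaluate(results, k=1))
```

In this toy input the first query has a correct match at rank 1 and the second does not, so Rank-1 is 0.5 while Rank-2 is 1.0.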

Preferably, the method of combining the object detection and object tracking algorithms in step (2) is as follows:

S21, taking the initial frame of the video as the input of the object detection algorithm (specifically, the YOLO-V3 algorithm), detecting the corresponding target pedestrians, and marking each pedestrian with a detection box;

S22, taking the detection boxes as the input of the object tracking algorithm (specifically, the KCF algorithm), tracking through the subsequent 3 frames of images, and obtaining the tracking boxes in the last of those frames;

S23, computing the IoU of every tracking box from step S22 with every detection box from step S21, one pair at a time; if an IoU value is the maximum and is greater than 0.4, the tracking box and the detection box are considered to be the same person; otherwise they are not considered to be the same person.

The lightweight convolutional neural network is a type of convolutional neural network that uses depthwise separable convolution;
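To make the parameter saving concrete, the following sketch compares the weight count of a standard convolution with that of a depthwise separable convolution (a depthwise k × k convolution followed by a 1 × 1 pointwise convolution). The layer sizes are arbitrary illustrative values, not taken from the invention, and bias terms are ignored.

```python
def standard_conv_params(k: int, c_in: int, c_out: int) -> int:
    """Weights of a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(k: int, c_in: int, c_out: int) -> int:
    """Weights of a depthwise k x k conv plus a 1 x 1 pointwise conv."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 conv that mixes channels
    return depthwise + pointwise


if __name__ == "__main__":
    k, c_in, c_out = 3, 32, 64  # illustrative sizes only
    std = standard_conv_params(k, c_in, c_out)
    sep = depthwise_separable_params(k, c_in, c_out)
    print(std, sep, round(std / sep, 2))  # 18432 2336 7.89
```

For a 3 × 3 kernel the reduction factor approaches 9 as the number of output channels grows, which is why this structure suits low-resource equipment.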

existing public pedestrian re-identification data sets include Market-1501, VIPeR, Duke01-03, and so on;

IoU (intersection over union) is the overlap rate between a generated candidate box and the original annotated box (ground truth box), that is, the ratio of their intersection to their union; a larger IoU indicates that the candidate box and the ground truth box are closer in position.


Compared with the prior art, the invention has the following beneficial effects:

At present, screening out the same target person from massive surveillance video data by manpower, and generating that person's trajectory over a specific time period from the screened pictures, is time-consuming work. Manual screening also tends to miss targets and has low accuracy. The efficient target person trajectory generation method designed here, which combines existing pedestrian detection, object tracking, and pedestrian re-identification techniques, can therefore generate the target's trajectory quickly and accurately. A worker only needs to provide a picture of the target person to be retrieved and the video of the corresponding time period, and, with a modest amount of manual intervention during verification, the trajectory of the target person can be generated promptly. The method can greatly reduce current manpower, material, and financial costs, and the generated target person trajectories have strong demand and broad prospects in commerce, the security industry, and large public places.

Drawings

To illustrate the embodiments of the present invention or the technical solutions of the prior art more clearly, the drawings used in describing them are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.

FIG. 1 is an overall flowchart of the efficient target person trajectory generation method based on pedestrian re-identification according to an embodiment of the present invention;

FIG. 2 is a flowchart of the verification and matching of target person pictures according to an embodiment of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings. Obviously, the described embodiments are only some of the embodiments of the present invention, not all of them. All other embodiments that a person skilled in the art can derive from the embodiments of the present invention without creative effort fall within the protection scope of the present invention.

An embodiment of the invention provides an efficient target person trajectory generation method based on pedestrian re-identification. As shown in FIGS. 1 and 2, it mainly comprises the following steps:

step (1): drawing on the idea of depthwise separable convolution in the current mainstream MobileNet-V2 network, designing a lightweight convolutional neural network (with few parameters and few floating-point operations), training it on existing public pedestrian re-identification data sets to obtain trained models, evaluating the trained models by mAP and Rank-k, and selecting the best-performing one, namely the model with the largest mAP and Rank-k values, as the pedestrian re-identification model;

step (2): acquiring surveillance camera data from different locations within the same time period, according to the specific requirements;

step (3): detecting the pedestrians in the video data of step (2) with a current mainstream object detection algorithm (specifically, the YOLO-V3 algorithm); to avoid the large data redundancy caused by the same target being detected in several adjacent frames, using an object tracking algorithm (specifically, the KCF algorithm) and computing the IoU (intersection over union) between tracking boxes and detection boxes in adjacent frames to judge whether they are the same target; in general, when the IoU is greater than 0.4, the persons in the tracking box and the detection box are considered the same target, and in that case only one picture of that person passing the current camera is kept;

step (4): naming each picture according to the IP address of the camera and the time information of the current picture; under this naming rule, a picture named 192.168.130.1_20191212114309.png comes from the camera with IP 192.168.130.1, and the person appeared in that camera at 11:43:09 on December 12, 2019;

step (5): extracting the features of all candidate pictures from step (3) with the pedestrian re-identification model of step (1); the candidate pictures are the pictures among which the target person is to be searched for; a picture feature is the output of the last fully connected layer of the lightweight convolutional neural network of step (1), usually a group of n-dimensional floating-point data, which is stored in a database;

step (6): using the same method as in step (5), extracting the two-dimensional feature array x[n × m] of the m person pictures to be retrieved and storing it in the database; extracting the corresponding feature y[n] from the target person picture and matching it against the features of all pictures in the database, where feature matching means calculating the degree of similarity between the features x[n × m] of the person pictures to be retrieved and the feature y[n] of the target person picture;

to avoid missing the target, the threshold for the first round of target matching is set somewhat lower, at 0.6, so that more candidate persons are matched; the correct target persons are then screened out by manual judgment;

matching uses cosine similarity, computed by the formula below; to avoid missing target pictures, a relatively low similarity threshold is used for matching (sim(X, Y) ≥ 0.6), where the vector X is the picture feature of a person to be retrieved in a candidate picture and the vector Y is the picture feature of the target person;

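$$\mathrm{sim}(X, Y) = \frac{X \cdot Y}{\lVert X \rVert \, \lVert Y \rVert} = \frac{\sum_{i=1}^{n} x_i y_i}{\sqrt{\sum_{i=1}^{n} x_i^2}\,\sqrt{\sum_{i=1}^{n} y_i^2}}$$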

step (7): matching the persons confirmed in the previous step against the persons in the database again, this time with a raised threshold to ensure matching accuracy, the threshold usually being set to 0.8, and then manually screening the candidate person pictures; repeating steps (6) to (7) until no new target person picture is produced;

step (8): arranging the final target person pictures in ascending order of their time information to obtain the complete trajectory information of the target.
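The two-threshold retrieval loop of steps (6) and (7) might be sketched as follows. This is a minimal sketch under stated assumptions rather than the implementation of the invention: the feature database is modeled as a plain dictionary `gallery` from picture name to feature vector, and manual screening is modeled by a hypothetical callback `confirm` that returns the subset of candidates a human accepts.

```python
import math
from typing import Callable, Dict, List, Set

Feature = List[float]


def cosine_similarity(x: Feature, y: Feature) -> float:
    """sim(X, Y) = X.Y / (|X| |Y|); returns 0.0 for a zero vector."""
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return dot / norm if norm else 0.0


def search(query: Feature, gallery: Dict[str, Feature], threshold: float) -> Set[str]:
    """Names of all gallery pictures whose similarity reaches the threshold."""
    return {name for name, feat in gallery.items()
            if cosine_similarity(feat, query) >= threshold}


def iterative_retrieval(
    target: Feature,
    gallery: Dict[str, Feature],
    confirm: Callable[[Set[str]], Set[str]],  # hypothetical stand-in for manual screening
    first_threshold: float = 0.6,
    later_threshold: float = 0.8,
) -> Set[str]:
    """Step (6) with a low threshold, then step (7) repeated with a higher one."""
    confirmed: Set[str] = set()
    frontier = confirm(search(target, gallery, first_threshold))
    while frontier:  # stop when a round yields no new confirmed picture
        confirmed |= frontier
        candidates: Set[str] = set()
        for name in frontier:
            candidates |= search(gallery[name], gallery, later_threshold)
        frontier = confirm(candidates - confirmed)
    return confirmed


if __name__ == "__main__":
    gallery = {"a": [1.0, 0.1], "b": [1.0, 1.0], "e": [1.0, 2.0],
               "c": [0.0, 1.0], "z": [-1.0, 0.0]}
    accept_all = lambda names: set(names)
    print(sorted(iterative_retrieval([1.0, 0.0], gallery, accept_all)))
```

In the toy gallery, pictures "e" and "c" fall below 0.6 against the original target and are reached only through later rounds, which is the reason the text repeats the search with confirmed pictures.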

The design and training of the pedestrian re-identification model in step (1) comprise the following steps:

S11, selecting PyTorch as the training framework for the model;

S12, to ensure that the model can process image data in real time, using the current mainstream MobileNet-V2 model as the backbone network; because it uses a depthwise separable structure, it reduces the number of model parameters while increasing the forward inference speed of the model;

S13, to improve the accuracy of the model, first training it for classification on the ImageNet data set, and then training it on a public pedestrian re-identification data set;

S14, evaluating the trained models by mAP and Rank-k, and selecting the best-performing model as the final pedestrian re-identification model;

mAP (mean average precision) is the mean of the average precision values; it measures the quality of the model over all categories of the test set, and on the same test set a larger mAP value indicates a better classification effect;

Rank-k is the probability that a correct result appears among the top k (highest-confidence) pictures of the search results. It is obtained by counting the cumulative accuracy of correct results over the first to the k-th samples in the ranking; on the same test set, the higher a model's Rank-k accuracy, the higher its retrieval precision.

The specific method of combining the object detection and object tracking algorithms in step (2) is as follows:

S21, taking the initial frame of the video as the input of the object detection algorithm (specifically, the YOLO-V3 algorithm), detecting the corresponding target pedestrians, and marking each pedestrian with a detection box;

S22, taking the detection boxes as the input of the object tracking algorithm (specifically, the KCF algorithm), tracking through the subsequent 3 frames of images, and obtaining the tracking boxes in the last of those frames;

S23, computing the IoU of every tracking box from step S22 with every detection box from step S21, one pair at a time; if an IoU value is the maximum and is greater than 0.4, the tracking box and the detection box are considered to be the same person; otherwise they are not considered to be the same person.
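The IoU test of step S23 can be sketched as below. The detector and tracker themselves (YOLO-V3 and KCF in the text) are not modeled; the sketch only assumes that both produce axis-aligned boxes in `(x1, y1, x2, y2)` form, and the function names are hypothetical.

```python
from typing import List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), assumed format


def iou(a: Box, b: Box) -> float:
    """Intersection area divided by union area of two boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def match_tracks(
    tracking_boxes: List[Box],
    detection_boxes: List[Box],
    threshold: float = 0.4,
) -> List[Optional[int]]:
    """For each tracking box, the index of the detection box with the
    largest IoU if that IoU exceeds the threshold, otherwise None."""
    matches: List[Optional[int]] = []
    for track in tracking_boxes:
        scores = [iou(track, det) for det in detection_boxes]
        best = max(range(len(scores)), key=scores.__getitem__, default=None)
        if best is not None and scores[best] > threshold:
            matches.append(best)   # same person: no new picture needs saving
        else:
            matches.append(None)   # no detection box is the same person
    return matches


if __name__ == "__main__":
    tracks = [(0, 0, 10, 10), (100, 100, 110, 110)]
    detections = [(50, 50, 60, 60), (1, 0, 11, 10)]
    print(match_tracks(tracks, detections))  # [1, None]
```

A tracking box matched to a detection box is treated as the same person, so only one picture of that person is kept for the current camera.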

Various alterations and modifications will no doubt become apparent to those skilled in the art after having read the above description. Therefore, the appended claims should be construed to cover all such variations and modifications as fall within the true spirit and scope of the invention. Any and all equivalent ranges and contents within the scope of the claims should be considered to be within the intent and scope of the present invention.
