Vehicle re-identification method and system based on multi-view and convolutional attention module


Reading note: this technology, 基于多视角和卷积注意力模块的车辆重识别方法和系统 (Vehicle re-identification method and system based on multi-view and convolutional attention module), was designed and created by 张浩, 鲁统伟, 贾世海 and 唐佳珊 on 2021-07-05. The invention provides a vehicle re-identification method and system based on a multi-view and convolutional attention module. By adding a convolutional attention module, the model attends to the effective features of a vehicle while ignoring invalid ones, improving its discriminative ability. Common-plane computation in the local module gives the model stronger robustness to viewing-angle interference, solving the problem that the same vehicle is hard to recognize under different viewing angles, and realizing re-identification of vehicles with similar appearance and of the same vehicle under different viewing angles. By combining the global feature representation module and the local feature representation module, the model learns better feature information, the feature expression capability of the network is enhanced, and the vehicle re-identification task is performed more effectively.

1. A vehicle re-identification method based on a multi-view and convolutional attention module, characterized by comprising the following steps:

S1: extracting part of the vehicle images from the vehicle image data set for semantic annotation; training on the annotated vehicle images with a deep learning segmentation network to obtain a semantic segmentation model; inputting all vehicle images in the data set into the trained semantic segmentation model for inference, so that a Mask is generated for each image;

S2: constructing a vehicle re-identification network based on a multi-view and convolutional attention module, wherein the network comprises a global feature representation module and a local feature representation module that share a ResNet50 backbone, and each bottleneck layer of the ResNet50 backbone contains a convolutional attention module CBAM;

the ResNet50 backbone feeds the extracted features and the Mask into the global feature representation module and the local feature representation module; the global feature representation module applies several kinds of pooling to the incoming features, then concatenates, fuses and classifies them; the local feature representation module aligns the incoming features with the Mask to generate local features for different viewing angles, and the local distances between different vehicles under different viewing angles are calculated by a common-plane calculation method;

S3: calculating the global loss of the features classified by the global feature representation module with a global loss function; calculating the local loss from the local distances produced by the local feature representation module with a local loss function; optimizing the vehicle re-identification network through the total loss function obtained by the weighted addition of the global and local loss functions;

S4: inputting the vehicle images and the corresponding Masks into the vehicle re-identification network for training, generating a vehicle re-identification model through iteration, and saving the model;

S5: inputting the vehicle image to be retrieved into the trained vehicle re-identification model for inference, and returning the vehicle retrieval result.

2. The vehicle re-identification method based on a multi-view and convolutional attention module of claim 1, wherein step S1 specifically comprises: the vehicle data set comprises a training set and a test set, and the test set comprises a query set and a gallery set; when annotating the images, the different viewing angles, including the front, rear, side and top of the vehicle, are marked with different colors; and the annotated images are trained with the deep learning segmentation network U-Net.

3. The vehicle re-identification method based on a multi-view and convolutional attention module of claim 2, wherein step S2 specifically comprises:

S21: the ResNet50 backbone extracts the features and feeds the features and the Mask into the global feature representation module and the local feature representation module;

S22: the convolutional attention module CBAM comprises a channel attention module and a spatial attention module, which are added in sequence into the bottleneck layers of the ResNet50 backbone, the backbone shared by the global and local feature representation modules, to enhance the discriminative ability of the model;

S23: the global feature representation module applies global max pooling and global average pooling to the incoming features respectively, then concatenates the two pooled features along the channel dimension, fuses them and classifies the result;

S24: the local feature representation module aligns the features extracted by the ResNet50 backbone with the Mask; let the features extracted by the backbone be F(m, n) and the Mask be {Si | i ∈ {1, 2, 3, 4}}, where the values of i denote the front, rear, side and top viewing angles of the vehicle respectively; the local features Fi of the front, rear, side and top viewing angles are then generated as:

S25: the local feature representation module calculates, from the Mask, the view-angle scores ai of the different viewing angles, including the front, rear, side and top of the vehicle;

the common-plane score of vehicle image x and vehicle image y is then calculated from their view-angle scores;

let the Euclidean distance be D; from the common-plane score and the local features Fi obtained in step S24, the local distance between the two vehicles is calculated.

4. The vehicle re-identification method based on a multi-view and convolutional attention module of claim 3, wherein step S3 specifically comprises:

using cross-entropy loss and center loss in the global feature representation module, and triplet loss in the local feature representation module; let dap be the local distance between the anchor and the positive sample, dan the local distance between the anchor and the negative sample, and α the margin coefficient, and write the function max(0, ·) as [·]+; the triplet loss is then [dap − dan + α]+, and the total loss function is the weighted sum of the global and local losses.

5. The vehicle re-identification method based on a multi-view and convolutional attention module of claim 4, wherein step S4 specifically comprises:

S41: inputting the training-set vehicle images and the corresponding Masks into the vehicle re-identification network based on the multi-view and convolutional attention module, specifying the number of training iterations, and training in batches;

S42: selecting the Adam algorithm as the optimizer during training and using a batch training method, with a single-batch sample size of 64 and 140 training epochs;

S43: adopting a multi-step learning-rate adjustment strategy, with the learning rates for epochs [0, 20], [20, 50], [50, 90] and [90, 140] set to 3.5e-5, 3.5e-4, 3.5e-5 and 3.5e-6 respectively;

S44: randomly cropping all input vehicle images to 256 × 256, and augmenting the data with random-brightness and random-erasing methods;

S45: generating the vehicle re-identification model and saving the best model result.

6. The vehicle re-identification method based on a multi-view and convolutional attention module of claim 5, wherein step S5 specifically comprises:

S51: calculating, with the Euclidean distance, the global distance Dglobal between the global features of the vehicle image to be retrieved and the global features of the gallery vehicle images;

S52: calculating the local distance Dlocal from the quantities obtained in step S25;

S53: let the weight coefficients be λ1 and λ2; the total distance between the features of the vehicle image to be retrieved and the features of the gallery vehicle images is:

Dtotal = λ1·Dglobal + λ2·Dlocal

7. A vehicle re-identification system for use in the vehicle re-identification method based on a multi-view and convolutional attention module of any one of claims 1 to 6, characterized in that the system comprises a data preprocessing module, a training module and an inference module, connected in sequence;

the data preprocessing module is used to select part of the data to train the semantic segmentation network and to generate the Mask corresponding to each image;

the training module is used to build the network model, input the training images and Masks to train the model, and obtain the best model after the given number of training iterations;

the inference module is used to input the vehicle image to be retrieved and output the retrieval result images.

8. The vehicle re-identification system according to claim 7, characterized in that:

the training module comprises a ResNet50 backbone, a global feature representation module and a local feature representation module each connected after the backbone, and a total loss function connected after both the global feature representation module and the local feature representation module;

each bottleneck layer of the ResNet50 backbone contains a convolutional attention module CBAM, which comprises a channel attention module and a spatial attention module;

the global feature representation module comprises a global pooling layer, a concatenation-fusion module, a classifier and a global loss function, connected in sequence; the global pooling layer comprises a global max pooling layer and a global average pooling layer;

the local feature representation module comprises a local feature alignment module, a local pooling layer, a local distance module and a local loss function which are connected in sequence.

9. A computer storage medium, characterized in that a computer program executable by a computer processor is stored therein, the computer program performing the vehicle re-identification method based on a multi-view and convolutional attention module of any one of claims 1 to 6.

Technical Field

The invention belongs to the technical field of vehicle image retrieval, and particularly relates to a vehicle re-identification method and system based on a multi-view and convolutional attention module.

Background

Vehicle re-identification is a cross-camera vehicle image retrieval problem in specific surveillance scenes. In recent years, with the growth of surveillance cameras and the development of intelligent transportation, vehicle re-identification has received more and more attention. Thanks to advances in deep learning, vehicle re-identification technology has improved greatly. The emergence of many excellent backbones, such as ResNet, lays the foundation for feature extraction. Meanwhile, the release of vehicle re-identification data sets such as VeRi776 and VERI-Wild has also promoted the development of the field.

However, vehicle re-identification remains a challenging task in computer vision. First, it is still very difficult to distinguish vehicles with similar appearance. Second, changes in the vehicle viewing angle greatly increase the difficulty of re-identification.

To solve the re-identification problem for similar vehicles, researchers have done much work to improve the discriminative ability of features. Liu et al. propose a progressive, coarse-to-fine search method. Liu et al. propose a region-aware model that learns features from global appearance and local details, where color and some detail cues are also used in training to distinguish vehicles with similar appearance. He et al. focus on details such as windows, lights and car logos. Wang et al. mark the orientation information of the vehicle with 20 key points and propose an orientation-based feature extraction method. To better optimize models, effective loss functions such as triplet loss, center loss and ID softmax have been proposed.

Changes in vehicle viewing angle are another challenge in re-identification. Zhou et al. propose an attention inference model; following the idea of generative adversarial networks, a multi-view generation network is designed to generate vehicle view feature vectors. To cope with the influence of viewing angle, Zhu et al. propose learning four directional features using a four-direction average pooling layer to improve robustness to viewing-angle changes. Lin et al. divide the retrieval task into matching vehicles under the same viewing angle and matching vehicles under different viewing angles, and design two corresponding loss functions to address the re-identification difficulty caused by camera viewing-angle changes.

Although the above methods improve model performance to some extent, they do not effectively combine feature representation with viewing-angle variation. Therefore, in addition to effective feature extraction and representation, a solution to the viewing-angle problem should be integrated.

Disclosure of Invention

The technical problem to be solved by the invention is as follows: to provide a vehicle re-identification method and system based on multi-view and convolutional attention modules, for re-identifying vehicles with similar appearance and the same vehicle under different viewing angles.

The technical solution adopted by the invention to solve this problem is as follows: the vehicle re-identification method based on the multi-view and convolutional attention module comprises the following steps:

S1: extracting part of the vehicle images from the vehicle image data set for semantic annotation; training on the annotated vehicle images with a deep learning segmentation network to obtain a semantic segmentation model; inputting all vehicle images in the data set into the trained semantic segmentation model for inference, so that a Mask is generated for each image;

S2: constructing a vehicle re-identification network based on a multi-view and convolutional attention module, wherein the network comprises a global feature representation module and a local feature representation module that share a ResNet50 backbone, and each bottleneck layer of the ResNet50 backbone contains a convolutional attention module CBAM; the ResNet50 backbone feeds the extracted features and the Mask into the global feature representation module and the local feature representation module; the global feature representation module applies several kinds of pooling to the incoming features, then concatenates, fuses and classifies them; the local feature representation module aligns the incoming features with the Mask to generate local features for different viewing angles, and the local distances between different vehicles under different viewing angles are calculated by a common-plane calculation method;

S3: calculating the global loss of the features classified by the global feature representation module with a global loss function; calculating the local loss from the local distances produced by the local feature representation module with a local loss function; optimizing the vehicle re-identification network through the total loss function obtained by the weighted addition of the global and local loss functions;

S4: inputting the vehicle images and the corresponding Masks into the vehicle re-identification network for training, generating a vehicle re-identification model through iteration, and saving the model;

S5: inputting the vehicle image to be retrieved into the trained vehicle re-identification model for inference, and returning the vehicle retrieval result.

According to this scheme, step S1 specifically comprises: the vehicle data set comprises a training set and a test set, and the test set comprises a query set and a gallery set; when annotating the images, the different viewing angles, including the front, rear, side and top of the vehicle, are marked with different colors; and the annotated images are trained with the deep learning segmentation network U-Net.

Further, in step S2, the specific steps include:

S21: the ResNet50 backbone extracts the features and feeds the features and the Mask into the global feature representation module and the local feature representation module;

S22: the convolutional attention module CBAM comprises a channel attention module and a spatial attention module, which are added in sequence into the bottleneck layers of the ResNet50 backbone, the backbone shared by the global and local feature representation modules, to enhance the discriminative ability of the model;

S23: the global feature representation module applies global max pooling and global average pooling to the incoming features respectively, then concatenates the two pooled features along the channel dimension, fuses them and classifies the result;

S24: the local feature representation module aligns the features extracted by the ResNet50 backbone with the Mask; let the features extracted by the backbone be F(m, n) and the Mask be {Si | i ∈ {1, 2, 3, 4}}, where the values of i denote the front, rear, side and top viewing angles of the vehicle respectively; the local features Fi of the front, rear, side and top viewing angles are then generated as:

S25: the local feature representation module calculates, from the Mask, the view-angle scores ai of the different viewing angles, including the front, rear, side and top of the vehicle;

the common-plane score of vehicle image x and vehicle image y is calculated from their view-angle scores;

let the Euclidean distance be D; from the common-plane score and the local features Fi obtained in step S24, the local distance between the two vehicles is calculated.

Further, in step S3, the specific steps include:

using cross-entropy loss and center loss in the global feature representation module, and triplet loss in the local feature representation module; let dap be the local distance between the anchor and the positive sample, dan the local distance between the anchor and the negative sample, and α the margin coefficient, and write the function max(0, ·) as [·]+; the triplet loss is then [dap − dan + α]+, and the total loss function is the weighted sum of the global and local losses.

further, in step S4, the specific steps include:

S41: inputting the training-set vehicle images and the corresponding Masks into the vehicle re-identification network based on the multi-view and convolutional attention module, specifying the number of training iterations, and training in batches;

S42: selecting the Adam algorithm as the optimizer during training and using a batch training method, with a single-batch sample size of 64 and 140 training epochs;

S43: adopting a multi-step learning-rate adjustment strategy, with the learning rates for epochs [0, 20], [20, 50], [50, 90] and [90, 140] set to 3.5e-5, 3.5e-4, 3.5e-5 and 3.5e-6 respectively;

S44: randomly cropping all input vehicle images to 256 × 256, and augmenting the data with random-brightness and random-erasing methods;

S45: generating the vehicle re-identification model and saving the best model result.

Further, in step S5, the specific steps include:

S51: calculating, with the Euclidean distance, the global distance Dglobal between the global features of the vehicle image to be retrieved and the global features of the gallery vehicle images;

S52: calculating the local distance Dlocal from the quantities obtained in step S25;

S53: let the weight coefficients be λ1 and λ2; the total distance between the features of the vehicle image to be retrieved and the features of the gallery vehicle images is:

Dtotal = λ1·Dglobal + λ2·Dlocal

The vehicle re-identification system based on the multi-view and convolutional attention module comprises a data preprocessing module, a training module and an inference module connected in sequence; the data preprocessing module is used to select part of the data to train the semantic segmentation network and to generate the Mask corresponding to each image; the training module is used to build the network model, input the training images and Masks to train the model, and obtain the best model after the given number of training iterations; the inference module is used to input the vehicle image to be retrieved and output the retrieval result images.

Further, the training module comprises a backbone network ResNet50, a global feature representation module and a local feature representation module which are respectively connected to the rear stage of the backbone network ResNet50, and a total loss function which is simultaneously connected to the rear stage of the global feature representation module and the rear stage of the local feature representation module; the bottleneck layer bottleeck of the backbone network ResNet50 comprises a convolution attention module CBAM which comprises a channel attention module and a space attention module; the global feature representation module comprises a global pooling layer, a splicing fusion module, a classifier and a global loss function which are connected in sequence; the global pooling layer comprises a global maximum pooling layer and a global average pooling layer; the local feature representation module comprises a local feature alignment module, a local pooling layer, a local distance module and a local loss function which are connected in sequence.

A computer storage medium, in which a computer program executable by a computer processor is stored, the computer program performing the vehicle re-identification method based on a multi-view and convolutional attention module.

The invention has the following beneficial effects:

1. In the vehicle re-identification method and system based on the multi-view and convolutional attention module, the convolutional attention module enables the model to attend to the effective features of the vehicle while ignoring invalid ones, improving the discriminative ability of the model; the common-plane calculation in the local module gives the model stronger robustness to viewing-angle interference, solving the problem that the same vehicle is hard to recognize under different viewing angles; re-identification of vehicles with similar appearance and of the same vehicle under different viewing angles is thereby realized.

2. The invention combines the global feature representation module and the local feature representation module, so that the model can learn better feature information, the feature expression capability of the network is enhanced, and the vehicle re-identification task is performed more effectively.

Drawings

FIG. 1 is a flow chart of an embodiment of the present invention.

Fig. 2 is a network configuration diagram of an embodiment of the present invention.

Fig. 3 is a system flow diagram of an embodiment of the invention.

FIG. 4 is a graph of the results of the example of the present invention.

Detailed Description

The present invention will be described in further detail with reference to the accompanying drawings and specific embodiments.

Referring to fig. 1, the vehicle re-identification method based on the multi-view and convolutional attention module according to the embodiment of the present invention comprises the following steps:

S1: extract part of the images from the vehicle re-identification data set, annotate them with labelme, and train a semantic segmentation network. Input all images in the data set into the trained semantic segmentation network so that a Mask is generated for each image;

the invention adopts a VeRi776 vehicle data set which comprises a training set and a test set (further divided into a query set and a gallery set). 1665 images are extracted from the training set, and 400 images are extracted from the testing set for semantic annotation. In the labeling, the front, rear, side, and top of the vehicle are labeled with different colors, respectively, using labelme software. And then training the labeled image by using U-Net to obtain a semantic segmentation model. And finally, inputting all images in the training set and the test set into the trained segmentation model for reasoning, and generating a corresponding Mask for each image in the VeRi data set.

S2: design the vehicle re-identification network structure based on multi-view and convolutional attention, comprising a global feature representation module and a local feature representation module;

S21: the network structure is divided into two branches, namely a global feature representation module and a local feature representation module, which share a backbone network.

S22: a convolutional attention module (CBAM) is added to the ResNet50 backbone to enhance the discriminative power of the model.

The convolutional attention module comprises a channel attention module and a spatial attention module, which are added in sequence into the bottlenecks of the ResNet50 network serving as the backbone shared by the global and local feature representation modules. After ResNet50 extracts the global features, one branch flows into the global module and the other into the local module.
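The sketch below shows one plausible PyTorch implementation of the CBAM block (channel attention followed by spatial attention); the reduction ratio of 16 and the 7 × 7 spatial kernel are common CBAM defaults assumed here, not values stated in the patent.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.mlp = nn.Sequential(  # shared MLP over pooled channel descriptors
            nn.Conv2d(channels, channels // reduction, 1, bias=False),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1, bias=False),
        )

    def forward(self, x):
        avg = self.mlp(torch.mean(x, dim=(2, 3), keepdim=True))
        mx = self.mlp(torch.amax(x, dim=(2, 3), keepdim=True))
        return torch.sigmoid(avg + mx)          # (B, C, 1, 1) channel weights

class SpatialAttention(nn.Module):
    def __init__(self, kernel_size=7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2, bias=False)

    def forward(self, x):
        avg = torch.mean(x, dim=1, keepdim=True)
        mx, _ = torch.max(x, dim=1, keepdim=True)
        return torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))  # (B, 1, H, W)

class CBAM(nn.Module):
    """Channel attention followed by spatial attention, as described in S22."""
    def __init__(self, channels):
        super().__init__()
        self.ca = ChannelAttention(channels)
        self.sa = SpatialAttention()

    def forward(self, x):
        x = x * self.ca(x)      # reweight informative channels
        return x * self.sa(x)   # reweight informative spatial positions
```

In a ResNet50 bottleneck, this block would typically be applied to the residual branch output before the skip addition; the exact insertion point is not specified by the patent.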

S23: in the global feature representation module, the features extracted by the ResNet50 network undergo global max pooling and global average pooling respectively, and the two pooled features are concatenated along the channel dimension to enhance feature robustness. The result is then sent to a classifier for classification, and the global loss is computed.
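The following sketch shows this global branch. The 2048-channel output is ResNet50's standard final-stage width, while the identity count of 576 (the number of training identities in VeRi776) and the bare linear classifier are assumptions about the head.

```python
import torch
import torch.nn as nn

class GlobalBranch(nn.Module):
    """S23: global max pooling and global average pooling on the backbone
    features, concatenated along the channel dimension, then classified."""
    def __init__(self, in_channels=2048, num_ids=576):
        super().__init__()
        self.classifier = nn.Linear(2 * in_channels, num_ids)

    def forward(self, feat):                  # feat: (B, C, H, W)
        gmp = torch.amax(feat, dim=(2, 3))    # (B, C) global max pooling
        gap = torch.mean(feat, dim=(2, 3))    # (B, C) global average pooling
        g = torch.cat([gmp, gap], dim=1)      # (B, 2C) fused global feature
        return g, self.classifier(g)          # feature for distance, logits for loss
```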

S24: in the local feature representation module, the Mask and the features extracted by ResNet50 are aligned to generate local features Fi for the four viewing angles of the vehicle: front, rear, side and top. The local feature Fi is expressed in terms of F(m, n), the feature extracted by ResNet50, and the Mask of the vehicle image, {Si | i ∈ {1, 2, 3, 4}}, whose values of i denote the front, rear, side and top regions respectively.
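The exact expression for Fi appears only as an image in the original document; the sketch below assumes it is a masked average pooling of F(m, n) over each view region Si, which matches the alignment described above.

```python
import torch

def local_features(feat, mask):
    """Assumed form of S24: Fi = average of F(m, n) over view region Si.
    feat: (B, C, H, W) backbone features; mask: (B, H, W) of view IDs 1..4,
    already resized to the feature map's spatial resolution."""
    feats = []
    for i in range(1, 5):                             # front, rear, side, top
        region = (mask == i).unsqueeze(1).float()     # (B, 1, H, W) indicator of Si
        area = region.sum(dim=(2, 3)).clamp(min=1.0)  # avoid division by zero
        feats.append((feat * region).sum(dim=(2, 3)) / area)  # (B, C)
    return torch.stack(feats, dim=1)                  # (B, 4, C) local features Fi
```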

S25: in the local feature representation module, the Mask of the image is used to calculate the view-angle scores ai of the four parts.

Then, the common-plane score of the two vehicle images x and y is calculated from their view-angle scores.

Finally, the local distance between the two vehicles is calculated from the preceding local features and view-angle scores.
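The formulas for ai, the common-plane score and the local distance survive only as images in the original document, so the sketch below is an assumption-laden reading: ai is taken as the area fraction of view region Si, the common-plane score as the element-wise minimum of the two images' view scores (a view counts only insofar as it is visible in both), and the local distance as the score-weighted mean of per-view Euclidean distances.

```python
import torch

def view_scores(mask):
    """Assumed S25 view-angle score ai: area fraction of each view region Si
    in a single (H, W) Mask."""
    total = mask.numel()
    return torch.tensor([(mask == i).sum().item() / total for i in range(1, 5)])

def local_distance(fx, fy, ax, ay):
    """Assumed common-plane weighting between images x and y.
    fx, fy: (4, C) local features Fi; ax, ay: (4,) view scores ai."""
    c = torch.minimum(ax, ay)                        # common-plane scores
    d = torch.norm(fx - fy, dim=1)                   # per-view Euclidean distance D
    return (c * d).sum() / c.sum().clamp(min=1e-6)   # weighted local distance
```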

S3: design the loss function; the global module and the local module of the network use different loss functions, and the model is finally optimized with the loss function obtained by adding them;

the loss function contains two parts: and the global loss function and the local loss function are added to obtain a total loss function so as to optimize the model. Cross-entropy loss (cross-entropy loss) and center loss (center loss) are used at the global module. Triple loss (triplet loss) is used in the local module, and the loss function is expressed as:

where dap denotes the local distance between the anchor and the positive sample, dan denotes the local distance between the anchor and the negative sample, α denotes the margin coefficient, and [·]+ denotes the function max(0, ·); the triplet loss is thus [dap − dan + α]+.
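A direct translation of this triplet term into code, using the local distances defined above; the margin value 0.3 is an assumed default, as the patent does not state α.

```python
import torch

def local_triplet_loss(d_ap, d_an, alpha=0.3):
    """[d_ap - d_an + alpha]+ averaged over a batch of triplets.
    d_ap, d_an: tensors of anchor-positive / anchor-negative local distances."""
    return torch.clamp(d_ap - d_an + alpha, min=0.0).mean()
```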

S4: input the training images and the Masks generated from them into the network for training, generate the vehicle re-identification model and save it;

The training set is input into the network structure, the number of training iterations is specified, training proceeds in batches, and the best model result is saved.

During training, Adam is selected as the optimizer, batch training is used with a batch size of 64, and training runs for 140 epochs. A multi-step learning-rate adjustment strategy is adopted, with learning rates of 3.5e-5, 3.5e-4, 3.5e-5 and 3.5e-6 for epochs [0, 20], [20, 50], [50, 90] and [90, 140] respectively. All input images are first randomly cropped to 256 × 256, and random-brightness and random-erasing data augmentation is applied.
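These hyper-parameters translate into a configuration along the following lines; the base-rate/LambdaLR factoring, the pre-crop resize to 288 × 288, the brightness-jitter strength and the erasing probability are assumptions, while the rates, batch size and epoch counts come from the text.

```python
import torch.nn as nn
from torch.optim import Adam
from torch.optim.lr_scheduler import LambdaLR
from torchvision import transforms

model = nn.Linear(1, 1)  # placeholder for the re-identification network
BASE_LR = 3.5e-4
optimizer = Adam(model.parameters(), lr=BASE_LR)

def lr_factor(epoch):
    # Stated schedule: 3.5e-5 / 3.5e-4 / 3.5e-5 / 3.5e-6 over epochs
    # [0, 20), [20, 50), [50, 90), [90, 140].
    if epoch < 20:
        return 0.1
    if epoch < 50:
        return 1.0
    if epoch < 90:
        return 0.1
    return 0.01

scheduler = LambdaLR(optimizer, lr_lambda=lr_factor)

# S44 augmentation: random crop to 256 x 256, random brightness, random erasing.
train_tf = transforms.Compose([
    transforms.Resize((288, 288)),           # assumed pre-crop size
    transforms.RandomCrop((256, 256)),
    transforms.ColorJitter(brightness=0.2),  # random brightness (strength assumed)
    transforms.ToTensor(),
    transforms.RandomErasing(p=0.5),         # erasing probability assumed
])
```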

S5: input the vehicle image to be retrieved, run inference with the trained vehicle re-identification model, and return the vehicle retrieval result.

In the inference stage, the Euclidean distance is first used to calculate the global distance Dglobal between the global features of the query vehicle and the global features of the gallery set; the local distance Dlocal is then calculated as described above. The total distance between the query set and the gallery set is calculated as follows; in the present invention, λ1 = 0.7 and λ2 = 0.3.

The distance between the features of the two images is expressed as:

Dtotal = λ1·Dglobal + λ2·Dlocal

where Dglobal is the Euclidean distance between the global features, Dlocal is the local distance described above, and λ1, λ2 are the weight coefficients.
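Ranking a query against the gallery then reduces to sorting by this weighted sum; the tensors below are toy values for illustration.

```python
import torch

def total_distance(d_global, d_local, lam1=0.7, lam2=0.3):
    """Dtotal = lam1 * Dglobal + lam2 * Dlocal, with the weights the
    embodiment fixes at 0.7 and 0.3."""
    return lam1 * d_global + lam2 * d_local

# d_g, d_l: (num_gallery,) distances from one query to every gallery image
d_g = torch.tensor([0.8, 0.2, 0.5])
d_l = torch.tensor([0.6, 0.3, 0.1])
ranked = torch.argsort(total_distance(d_g, d_l))  # gallery indices, best first
```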

The present invention also provides a vehicle re-identification system based on the multi-view and convolutional attention module, as shown in fig. 3, comprising:

the data preprocessing module 101 selects part of data to train a semantic segmentation network and generates Mask corresponding to the image;

the training module 102 inputs a training image and a Mask training model, and obtains an optimal model after training for a given number of times;

the reasoning module 103 inputs the vehicle image to be retrieved and outputs a retrieval result image;

test examples:

three metrics are used: mAP, CMC Rank @1, and CMC Rank @5 tested our method on the VeRi776 dataset. The method of the present invention was first compared to several methods focusing on characterization, including PROVID, RAM, PNVR, OIFE. Meanwhile, the method is compared with methods for solving viewpoint problems such as VAMI, QD-DLF, MVL, VANET and the like.

Table 1 Comparison of the present invention with eight excellent methods

As can be seen from Table 1, the method of the present invention achieves higher scores on all three metrics than the other eight methods, demonstrating that it is superior to the compared methods.

The above embodiments are only used for illustrating the design idea and features of the present invention, and the purpose of the present invention is to enable those skilled in the art to understand the content of the present invention and implement the present invention accordingly, and the protection scope of the present invention is not limited to the above embodiments. Therefore, all equivalent changes and modifications made in accordance with the principles and concepts disclosed herein are intended to be included within the scope of the present invention.
