Vehicle re-identification method, electronic device and related products

Document No.: 616060  Publication date: 2021-05-07

Reading note: this technology, "Vehicle re-identification method, electronic device and related products", was designed and created by Fan Yan, Zhang Peng, He Wu and Wu Weihua on 2021-01-18. Its main content is as follows: the embodiment of the application discloses a vehicle re-identification method, an electronic device and related products, applied to an electronic device, wherein the method comprises: respectively inputting a first query image and a first base library image into a preset style conversion model to obtain a second query image and a second base library image; inputting first vehicle training data into the preset style conversion model to obtain second vehicle training data; training a first feature extraction network through the first vehicle training data and the second vehicle training data to obtain a second feature extraction network; inputting the second query image into the second feature extraction network for feature extraction to obtain query image features; inputting the second base library image into the second feature extraction network for feature extraction to obtain base library image features; and matching the query image features with the base library image features to obtain a matching result, and displaying the matching result. By adopting the embodiment of the application, the accuracy of vehicle re-identification can be improved.

1. A vehicle re-identification method, applied to an electronic device, comprising the following steps:

acquiring a first query image, a first base library image and first vehicle training data;

respectively inputting the first query image and the first base image into a preset style conversion model to obtain a second query image and a second base image, wherein the preset style conversion model is used for converting input data into output data with uniform style;

inputting the first vehicle training data into the preset style conversion model to obtain second vehicle training data;

training a first feature extraction network through the first vehicle training data and the second vehicle training data to obtain a second feature extraction network;

inputting the second query image into the second feature extraction network for feature extraction to obtain query image features;

inputting the second base image into the second feature extraction network for feature extraction to obtain base image features;

and matching the query image features with the base library image features to obtain a matching result, and displaying the matching result.

2. The method of claim 1, wherein the first feature extraction network comprises a variational auto-encoder module and a depth feature extraction module, and wherein the training a first feature extraction network through the first vehicle training data and the second vehicle training data to obtain a second feature extraction network comprises:

reconstructing training data i through the variational auto-encoder module to obtain a reconstructed image, wherein the training data i is any training data in the first vehicle training data or the second vehicle training data;

determining a residual image through the training data i and the reconstructed image;

determining a combined image according to the reconstructed image and the residual image;

performing feature extraction on the combined image through the depth feature extraction module to obtain training features;

and adjusting the model parameters of the first feature extraction network through the training features and a preset loss function to obtain the second feature extraction network.

3. The method of claim 2, wherein determining a combined image from the reconstructed image and the residual image comprises:

determining a first feature point distribution density of the reconstructed image;

determining a second feature point distribution density of the training data i;

determining a ratio between the first feature point distribution density and the second feature point distribution density;

dividing the reconstructed image into a plurality of regions, and determining a feature point distribution density of each of the plurality of regions to obtain a plurality of feature point distribution densities;

determining a target mean square error according to the distribution densities of the plurality of feature points;

determining a target fine-tuning coefficient corresponding to the target mean square error according to a mapping relation between a preset mean square error and the fine-tuning coefficient;

fine-tuning the ratio according to the target fine-tuning coefficient to obtain a convex parameter;

determining the combined image according to the convex parameter, the reconstructed image and the residual image.

4. The method of claim 1, wherein inputting the second query image into the second feature extraction network for feature extraction to obtain query image features comprises:

inputting the second query image into a backbone network for feature extraction to obtain a feature map;

carrying out global feature extraction on the feature map to obtain global features, wherein the global features are the most salient global appearance representation of the vehicle;

carrying out local region detection on the second query image to obtain a region of interest;

dividing the global features into S × S grids, wherein S is an integer greater than 1;

projecting the region of interest onto the S × S grids, and extracting a local feature vector of each projected grid by using local average pooling to obtain local features, wherein the local features are detail features of a specific region of the vehicle; the local features in turn optimize the global feature representation in a back-propagation process;

and performing aggregation operation on the global features and the local features to obtain the query image features.

5. The method according to any one of claims 1-4, further comprising:

acquiring initial training data and corresponding label data, wherein the initial training data come from a plurality of cameras;

acquiring a style conversion network based on a generative adversarial network, wherein a style attention module is preset in a generator of the generative adversarial network of the style conversion network and is used for obtaining style-related attention features from shallow image features;

and using the initial training data and the corresponding label data to train the style conversion network to obtain the preset style conversion model.

6. A vehicle re-identification apparatus, applied to an electronic device, comprising: an acquisition unit, a conversion unit, a training unit, an extraction unit and a presentation unit, wherein,

the acquisition unit is used for acquiring a first query image, a first base library image and first vehicle training data;

the conversion unit is used for respectively inputting the first query image and the first base image into a preset style conversion model to obtain a second query image and a second base image, and the preset style conversion model is used for converting input data into output data with uniform style;

the conversion unit is further configured to input the first vehicle training data into the preset style conversion model to obtain second vehicle training data;

the training unit is used for training a first feature extraction network through the first vehicle training data and the second vehicle training data to obtain a second feature extraction network;

the extracting unit is used for inputting the second query image into the second feature extraction network for feature extraction to obtain query image features;

the extraction unit is further configured to input the second base image into the second feature extraction network for feature extraction, so as to obtain base image features;

and the display unit is used for matching the query image features with the base library image features to obtain a matching result and displaying the matching result.

7. The apparatus of claim 6, wherein the first feature extraction network comprises a variational auto-encoder module and a depth feature extraction module, and wherein the training unit is specifically configured to, in the training a first feature extraction network through the first vehicle training data and the second vehicle training data to obtain a second feature extraction network:

reconstructing training data i through the variational auto-encoder module to obtain a reconstructed image, wherein the training data i is any training data in the first vehicle training data or the second vehicle training data;

determining a residual image through the training data i and the reconstructed image;

determining a combined image according to the reconstructed image and the residual image;

performing feature extraction on the combined image through the depth feature extraction module to obtain training features;

and adjusting the model parameters of the first feature extraction network through the training features and a preset loss function to obtain the second feature extraction network.

8. The apparatus according to claim 7, characterized in that the training unit is specifically configured to, in said determining a combined image from the reconstructed image and the residual image:

determining a first feature point distribution density of the reconstructed image;

determining a second feature point distribution density of the training data i;

determining a ratio between the first feature point distribution density and the second feature point distribution density;

dividing the reconstructed image into a plurality of regions, and determining a feature point distribution density of each of the plurality of regions to obtain a plurality of feature point distribution densities;

determining a target mean square error according to the distribution densities of the plurality of feature points;

determining a target fine-tuning coefficient corresponding to the target mean square error according to a mapping relation between a preset mean square error and the fine-tuning coefficient;

fine-tuning the ratio according to the target fine-tuning coefficient to obtain a convex parameter;

determining the combined image according to the convex parameter, the reconstructed image and the residual image.

9. An electronic device, comprising a processor and a memory, the memory being used for storing one or more programs configured to be executed by the processor, the programs comprising instructions for performing the steps in the method of any one of claims 1-5.

10. A computer-readable storage medium, characterized in that a computer program for electronic data exchange is stored thereon, wherein the computer program causes a computer to perform the method according to any one of claims 1-5.

Technical Field

The application relates to the technical field of image processing, and in particular to a vehicle re-identification method, an electronic device and related products.

Background

Vehicle re-identification means that, given a vehicle image, the same vehicle is identified in a vehicle image database. In a real traffic monitoring system, vehicle re-identification can play a role in positioning, supervising and investigating a target vehicle. A vehicle re-identification task usually involves multiple cameras, and due to various factors (environment, light, etc.), the style of the images captured by each camera for the same vehicle will usually differ. In addition, even the same camera captures images of different styles at different times (morning, noon, afternoon, etc.). Such changes in image style have a great influence on the result of the final vehicle re-identification task. Moreover, changes in the relative position between the camera and the vehicle cause the same vehicle to present different angles in different images, so that the appearance of the vehicle changes significantly. In addition, since vehicles with different identities can have the same brand, model and color, extracting discriminative features is extremely critical to the vehicle re-identification task. Therefore, how to improve the accuracy of vehicle re-identification in real traffic monitoring scenarios is a problem to be solved.

Disclosure of Invention

The embodiment of the application provides a vehicle re-identification method and related products, which can improve the accuracy of vehicle re-identification.

In a first aspect, an embodiment of the present application provides a vehicle re-identification method, which is applied to an electronic device, and the method includes:

acquiring a first query image, a first base library image and first vehicle training data;

respectively inputting the first query image and the first base image into a preset style conversion model to obtain a second query image and a second base image, wherein the preset style conversion model is used for converting input data into output data with uniform style;

inputting the first vehicle training data into the preset style conversion model to obtain second vehicle training data;

training a first feature extraction network through the first vehicle training data and the second vehicle training data to obtain a second feature extraction network;

inputting the second query image into the second feature extraction network for feature extraction to obtain query image features;

inputting the second base image into the second feature extraction network for feature extraction to obtain base image features;

and matching the query image features with the base library image features to obtain a matching result, and displaying the matching result.

In a second aspect, an embodiment of the present application provides a vehicle re-identification apparatus, which is applied to an electronic device, and the apparatus includes: an acquisition unit, a conversion unit, a training unit, an extraction unit and a presentation unit, wherein,

the acquisition unit is used for acquiring a first query image, a first base library image and first vehicle training data;

the conversion unit is used for respectively inputting the first query image and the first base image into a preset style conversion model to obtain a second query image and a second base image, and the preset style conversion model is used for converting input data into output data with uniform style;

the conversion unit is further configured to input the first vehicle training data into the preset style conversion model to obtain second vehicle training data;

the training unit is used for training a first feature extraction network through the first vehicle training data and the second vehicle training data to obtain a second feature extraction network;

the extracting unit is used for inputting the second query image into the second feature extraction network for feature extraction to obtain query image features;

the extraction unit is further configured to input the second base image into the second feature extraction network for feature extraction, so as to obtain base image features;

and the display unit is used for matching the query image features with the base library image features to obtain a matching result and displaying the matching result.

In a third aspect, an embodiment of the present application provides an electronic device, including a processor, a memory, a communication interface, and one or more programs, where the one or more programs are stored in the memory and configured to be executed by the processor, and the program includes instructions for executing the steps in the first aspect of the embodiment of the present application.

In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium, where the computer-readable storage medium stores a computer program for electronic data exchange, where the computer program enables a computer to perform some or all of the steps described in the first aspect of the embodiment of the present application.

In a fifth aspect, embodiments of the present application provide a computer program product, where the computer program product includes a non-transitory computer-readable storage medium storing a computer program, where the computer program is operable to cause a computer to perform some or all of the steps as described in the first aspect of the embodiments of the present application. The computer program product may be a software installation package.

The embodiment of the application has the following beneficial effects:

it can be seen that the vehicle re-identification method, the electronic device and the related products described in the embodiments of the present application are applied to an electronic device. A first query image, a first base library image and first vehicle training data are acquired; the first query image and the first base library image are respectively input into a preset style conversion model to obtain a second query image and a second base library image, the preset style conversion model being used to convert input data into output data with a uniform style; the first vehicle training data is input into the preset style conversion model to obtain second vehicle training data; a first feature extraction network is trained through the first vehicle training data and the second vehicle training data to obtain a second feature extraction network; the second query image is input into the second feature extraction network for feature extraction to obtain query image features; the second base library image is input into the second feature extraction network for feature extraction to obtain base library image features; and the query image features are matched with the base library image features to obtain a matching result, which is displayed. In this way, the style differences between different pictures are eliminated, so that the query and the gallery can be matched better, vehicle-specific discriminative features are extracted, and the accuracy of vehicle re-identification is significantly improved by the whole process.

Drawings

In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present application, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.

Fig. 1A is a schematic structural diagram of an electronic device according to an embodiment of the present disclosure;

fig. 1B is a schematic flowchart of a vehicle re-identification method according to an embodiment of the present application;

fig. 1C is a schematic structural diagram of a feature extraction network according to an embodiment of the present application;

fig. 1D is a schematic structural diagram of another feature extraction network provided in this embodiment of the present application;

fig. 1E is a schematic structural diagram of another feature extraction network provided in this embodiment of the present application;

FIG. 1F is a schematic flow chart diagram illustrating another vehicle re-identification method according to an embodiment of the present disclosure;

FIG. 2 is a schematic flow chart diagram illustrating another vehicle re-identification method provided by the embodiment of the present application;

fig. 3 is a schematic structural diagram of another electronic device provided in an embodiment of the present application;

fig. 4 is a block diagram of functional units of a vehicle re-identification apparatus according to an embodiment of the present application.

Detailed Description

The terms "first," "second," and the like in the description and claims of the present application and in the above-described drawings are used for distinguishing between different objects and not for describing a particular order. Furthermore, the terms "include" and "have," as well as any variations thereof, are intended to cover non-exclusive inclusions. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those steps or elements listed, but may include other steps or elements not listed or inherent to such process, method, article, or apparatus in one possible example.

Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the application. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein can be combined with other embodiments.

In order to make the technical solutions of the present application better understood, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.

The electronic device according to the embodiment of the present application may be a handheld device, an intelligent robot, a vehicle-mounted device, a wearable device, a computing device or other processing devices connected to a wireless modem, and various forms of User Equipment (UE), a mobile station (mobile station, MS), a terminal device (terminal device), and the like, and the electronic device may also be a server or an intelligent home device.

In the embodiment of the application, the smart home device may be at least one of the following: a refrigerator, a washing machine, an electric rice cooker, a smart curtain, a smart lamp, a smart bed, a smart garbage bin, a microwave oven, a steamer, an air conditioner, a range hood, a server, a smart door, a smart window, a wardrobe, a smart speaker, a smart house, a smart chair, a smart clothes hanger, a smart shower, a water dispenser, a water purifier, an air purifier, a doorbell, a monitoring system, a smart garage, a television, a projector, a smart dining table, a smart sofa, a massage chair, a treadmill, and the like; of course, other devices may also be included.

As shown in fig. 1A, fig. 1A is a schematic structural diagram of an electronic device according to an embodiment of the present disclosure. The electronic device includes a processor, a memory, a signal processor, a transceiver, a display screen, a speaker, a microphone, a Random Access Memory (RAM), a camera, a sensor, a network module, and the like. The memory, the signal processor (DSP), the speaker, the microphone, the RAM, the camera, the sensor and the network module are connected with the processor, and the transceiver is connected with the signal processor.

The processor is the control center of the electronic device. It connects various parts of the entire electronic device by using various interfaces and lines, and executes various functions of the electronic device and processes data by running or executing the software programs and/or modules stored in the memory and calling the data stored in the memory, thereby performing overall monitoring of the electronic device. The processor may be a Central Processing Unit (CPU), a Graphics Processing Unit (GPU) or a Network Processing Unit (NPU).

Further, the processor may integrate an application processor, which mainly handles operating systems, user interfaces, application programs, etc., and a modem processor, which mainly handles wireless communications. It will be appreciated that the modem processor described above may not be integrated into the processor.

The memory is used for storing software programs and/or modules, and the processor executes various functional applications of the electronic device and vehicle re-identification by running the software programs and/or modules stored in the memory. The memory mainly comprises a program storage area and a data storage area, wherein the program storage area can store an operating system, a software program required by at least one function, and the like; the data storage area may store data created according to the use of the electronic device, and the like. Further, the memory may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device.

Wherein the sensor comprises at least one of: light-sensitive sensors, gyroscopes, infrared proximity sensors, vibration detection sensors, pressure sensors, etc. Among them, the light sensor, also called an ambient light sensor, is used to detect the ambient light brightness. The light sensor may include a light sensitive element and an analog to digital converter. The photosensitive element is used for converting collected optical signals into electric signals, and the analog-to-digital converter is used for converting the electric signals into digital signals. Optionally, the light sensor may further include a signal amplifier, and the signal amplifier may amplify the electrical signal converted by the photosensitive element and output the amplified electrical signal to the analog-to-digital converter. The photosensitive element may include at least one of a photodiode, a phototransistor, a photoresistor, and a silicon photocell.

The camera may be a visible light camera (general view angle camera, wide angle camera), an infrared camera, or a dual camera (having a distance measurement function), which is not limited herein.

The network module may be at least one of: a bluetooth module, a wireless fidelity (Wi-Fi), etc., which are not limited herein.

Based on the electronic device described in fig. 1A, the following vehicle re-identification method can be performed, and the specific steps are as follows:

acquiring a first query image, a first base library image and first vehicle training data;

respectively inputting the first query image and the first base image into a preset style conversion model to obtain a second query image and a second base image, wherein the preset style conversion model is used for converting input data into output data with uniform style;

inputting the first vehicle training data into the preset style conversion model to obtain second vehicle training data;

training a first feature extraction network through the first vehicle training data and the second vehicle training data to obtain a second feature extraction network;

inputting the second query image into the second feature extraction network for feature extraction to obtain query image features;

inputting the second base image into the second feature extraction network for feature extraction to obtain base image features;

and matching the query image features with the base library image features to obtain a matching result, and displaying the matching result.

It can be seen that the electronic device described in this embodiment of the present application acquires a first query image, a first base library image, and first vehicle training data; inputs the first query image and the first base library image respectively into a preset style conversion model to obtain a second query image and a second base library image, the preset style conversion model being used to convert input data into output data with a uniform style; inputs the first vehicle training data into the preset style conversion model to obtain second vehicle training data; trains a first feature extraction network through the first vehicle training data and the second vehicle training data to obtain a second feature extraction network; inputs the second query image into the second feature extraction network for feature extraction to obtain query image features; inputs the second base library image into the second feature extraction network for feature extraction to obtain base library image features; and matches the query image features with the base library image features to obtain a matching result, which is displayed. In this way, the style differences between different pictures can be eliminated, better matching between the query and the gallery can be achieved, and vehicle-specific discriminative features can be extracted, so that the whole process can significantly improve the accuracy of vehicle re-identification, especially for vehicle images with large differences in style and viewing angle, as well as for images of vehicles with similar brands, models, colors, and the like.

The embodiment of the application provides a unified style conversion method for eliminating the style differences among different cameras; that is, vehicle images of different styles are converted by a style conversion model into vehicle images with a unified camera style. In addition, because vehicles with different identities can have the same brand, model and color, how to extract discriminative features is very critical to the vehicle re-identification task.

Referring to fig. 1B, fig. 1B is a schematic flowchart of a vehicle re-identification method according to an embodiment of the present application. As shown in the drawing, the vehicle re-identification method is applied to the electronic device shown in fig. 1A, and includes:

101. A first query image, a first base library image and first vehicle training data are obtained.

The first query image may be a vehicle image, the first base image may be a vehicle image, and the first vehicle training data may be a vehicle image.

102. And respectively inputting the first query image and the first base library image into a preset style conversion model to obtain a second query image and a second base library image, wherein the preset style conversion model is used for converting input data into output data with uniform style.

The preset style conversion model can be stored in the electronic device in advance, and is used for converting input data into output data with uniform style.

103. And inputting the first vehicle training data into the preset style conversion model to obtain second vehicle training data.

The electronic equipment can input the first vehicle training data into the preset style conversion model to unify the styles, and then the second vehicle training data can be obtained.

Optionally, before step 101, the following steps may be further included:

a1, acquiring initial training data and corresponding label data, wherein the initial training data come from a plurality of cameras;

a2, acquiring a style conversion network based on a generative adversarial network, wherein a style attention module is preset in a generator of the generative adversarial network of the style conversion network, and the style attention module is used for obtaining style-related attention features from shallow image features;

and A3, using the initial training data and the corresponding label data to train the style conversion network to obtain the preset style conversion model.

In a specific implementation, the electronic device may acquire a generative adversarial network, and acquire initial training data and corresponding label data, the initial training data being from a plurality of cameras. The electronic device acquires a style conversion network of the generative adversarial network, in whose generator a style attention module is preset; the style attention module is used to obtain style-related attention features from shallow image features. In this embodiment, in order to generate images with stable shapes and a uniform style, the style attention module is added to the generator of the style conversion network, and the shallow image features can be converted into style-related attention features by the style attention module. The style conversion network is trained with each camera and the data of all cameras in the training set as a group, so as to obtain the preset style conversion model; that is, the initial training data and the corresponding label data are used to train the style conversion network to obtain the preset style conversion model.

In the embodiment of the present application, the preset style conversion model may be implemented based on a generative adversarial network (GAN). In order to ensure that the style conversion model can generate images with a stable shape and a uniform style, in a specific implementation, a style attention module may be added to the generator of the style conversion model; that is, style-related attention features may be obtained from the shallow image features through this module.

The formula is defined as follows:

A(x) = Sigmoid(A_style(G1(x)))

wherein A_style is the style attention module, G1(x) is the shallow image feature output by the generator of the preset style conversion model, and x is the input image. On this basis, the loss function of the entire style conversion model can also be obtained.

In the loss function, Loss represents the loss of the whole style conversion model, L is the total number of cameras, c is the camera index, the attention term is the stylistic attention response of the i-th image captured by the c-th camera, and L_U is a joint loss function consisting of the standard GAN loss, the feature matching loss, the identity mapping loss and the cycle reconstruction loss. For example, during training, one image captured by a certain camera (e.g., camera c = 1) is selected, and the images captured by all cameras are respectively grouped with it (c = 1, c = 2, and so on); therefore y carries the superscript c, that is, one image x of any camera and one image y of each camera are selected respectively.

In a specific implementation, in the training stage, the images in the training set may be trained with each camera and all cameras as a group, for example, all the images may be adjusted to 320 × 320, and the trained style conversion model may generate images with uniform style.
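
As an illustration of the style attention branch, a minimal PyTorch sketch is given below; the module name, the single 1×1 convolution and the channel sizes are assumptions for illustration, since the patent only defines A(x) = Sigmoid(A_style(G1(x))):

```python
import torch
import torch.nn as nn

class StyleAttention(nn.Module):
    """Hypothetical style attention module A_style: maps the shallow generator
    feature G1(x) to a style-related attention map A(x) with values in (0, 1)."""
    def __init__(self, in_channels=64):
        super().__init__()
        # A single 1x1 convolution is assumed here; the patent does not fix the layers.
        self.attn = nn.Conv2d(in_channels, 1, kernel_size=1)

    def forward(self, shallow_feature):
        # A(x) = Sigmoid(A_style(G1(x)))
        return torch.sigmoid(self.attn(shallow_feature))

# Usage with assumed shapes: shallow features for a batch of two 320x320 inputs.
g1_x = torch.randn(2, 64, 80, 80)
attention_map = StyleAttention(in_channels=64)(g1_x)   # shape (2, 1, 80, 80)
```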

104. And training a first feature extraction network through the first vehicle training data and the second vehicle training data to obtain a second feature extraction network.

The electronic device can train the first feature extraction network through the first vehicle training data and the second vehicle training data, thereby improving the performance of the feature extraction network and obtaining the second feature extraction network. Attention can be focused on regions with discriminative features (such as vehicle annual inspection marks, ornaments, pendants, painted lettering, collision marks and the like), so as to improve the robustness to orientation and the recognition of fine-grained details.

Optionally, the first feature extraction network includes a variational auto-encoder module and a depth feature extraction module, and the step 104 of training the first feature extraction network through the first vehicle training data and the second vehicle training data to obtain a second feature extraction network may include the following steps:

41. reconstructing training data i through the variational auto-encoder module to obtain a reconstructed image, wherein the training data i is any training data in the first vehicle training data or the second vehicle training data;

42. determining a residual image through the training data i and the reconstructed image;

43. determining a combined image according to the reconstructed image and the residual image;

44. performing feature extraction on the combined image through the depth feature extraction module to obtain training features;

45. and adjusting the model parameters of the first feature extraction network through the training features and a preset loss function to obtain the second feature extraction network.

The preset loss function may be preset or set by default in the system; for example, the preset loss function may be a triplet loss function. In a specific implementation, as shown in fig. 1C, taking training data i as an example, where the training data i is any training data in the first vehicle training data or the second vehicle training data, the electronic device may reconstruct the training data i through the variational auto-encoder (VAE) module to obtain a reconstructed image, determine a residual image through the training data i and the reconstructed image, and determine a combined image according to the reconstructed image and the residual image. The backbone network of the depth feature extraction module may be a ResNet-50 backbone. Finally, the model parameters of the first feature extraction network are adjusted through the training features and the preset loss function to obtain the second feature extraction network.

In the embodiment of the present application, the feature extraction network may be composed of two modules: a self-supervised residual generation module and a depth feature extraction module. In a particular implementation, the input image may be passed through a VAE-based reconstruction module to remove vehicle-specific details, and the reconstructed image is then subtracted from the input image to form a residual image containing the vehicle-specific details. Then, a convex combination of the input and the residual (with a trainable parameter α) is computed and subjected to depth feature extraction through the re-identification backbone. The entire network is trained with triplet and cross-entropy losses, which are separated by a batch normalization layer (BN Neck).

Based on the above-described embodiments of the present application, the electronic device may follow the flow of the self-supervised-attention vehicle feature extraction method: the input image is passed through a reconstruction module based on a variational auto-encoder (VAE), which recreates the overall shape and structure of the vehicle image while obscuring the distinguishing details, so as to generate a vehicle image template free of manufacturer logos, windshield stickers, wheel patterns, grilles, bumpers, and front/rear light designs. The reconstructed image is then subtracted from the input image to highlight the salient regions and eliminate background distractors, forming a residual image containing vehicle-specific details. A convex combination of the original input image and the residual (with a trainable parameter α) is then computed and input to the ResNet-50 backbone for depth feature extraction, so as to generate robust discriminative features.
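
The pipeline described above can be sketched in PyTorch as follows; the VAE is treated as an already-built reconstruction module, and the exact backbone wiring and the clamping of α are assumptions rather than the patent's exact design:

```python
import torch
import torch.nn as nn
from torchvision.models import resnet50

class SelfSupervisedResidualReID(nn.Module):
    """Sketch: VAE reconstruction -> residual -> convex combination with a trainable
    alpha -> ResNet-50 backbone -> BN Neck. The 'vae' argument is an assumed,
    already-built reconstruction module."""
    def __init__(self, vae, feat_dim=2048):
        super().__init__()
        self.vae = vae
        self.alpha = nn.Parameter(torch.tensor(0.5))            # trainable convex weight
        backbone = resnet50(weights=None)
        self.backbone = nn.Sequential(*list(backbone.children())[:-1])
        self.bnneck = nn.BatchNorm1d(feat_dim)                   # BN Neck between the two losses

    def forward(self, x):
        recon = self.vae(x)                      # coarse template without vehicle-specific details
        residual = x - recon                     # keeps logos, stickers, dents and other details
        a = torch.clamp(self.alpha, 0.0, 1.0)
        combined = a * x + (1.0 - a) * residual  # convex combination of input and residual
        feat = self.backbone(combined).flatten(1)   # feature used by the triplet loss
        feat_bn = self.bnneck(feat)                 # feature used by the cross-entropy loss
        return feat, feat_bn
```

In training, feat would feed the triplet loss and feat_bn the cross-entropy classifier, matching the two branches separated by the BN Neck described above.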

Further, optionally, the step 43 of determining a combined image according to the reconstructed image and the residual image may include the following steps:

431. determining a first feature point distribution density of the reconstructed image;

432. determining a second feature point distribution density of the training data i;

433. determining a ratio between the first feature point distribution density and the second feature point distribution density;

434. dividing the reconstructed image into a plurality of regions, and determining a feature point distribution density of each of the plurality of regions to obtain a plurality of feature point distribution densities;

435. determining a target mean square error according to the distribution densities of the plurality of feature points;

436. determining a target fine-tuning coefficient corresponding to the target mean square error according to a mapping relation between a preset mean square error and the fine-tuning coefficient;

437. fine-tuning the ratio according to the target fine-tuning coefficient to obtain a convex parameter;

438. determining the combined image according to the convex parameter, the reconstructed image and the residual image.

In a specific implementation, the electronic device may determine a first feature point distribution density of the reconstructed image, and specifically, may determine a total number of feature points of the reconstructed image and an image area, and use a ratio between the total number of feature points and the image area as the feature point distribution density. Similarly, the electronic device may further determine a second feature point distribution density of the training data i, and further determine a ratio between the first feature point distribution density and the second feature point distribution density, where the ratio is the first feature point distribution density/the second feature point distribution density.

Further, the reconstructed image may be divided into a plurality of regions, and the feature point distribution density of each of the plurality of regions is determined to obtain a plurality of feature point distribution densities. A target mean square error is then determined according to the plurality of feature point distribution densities; the mean square error reflects the correlation between the regions, that is, the variation between neighboring regions of the image. A mapping relationship between a preset mean square error and a fine-tuning coefficient may be pre-stored in the electronic device, so that a target fine-tuning coefficient corresponding to the target mean square error can be determined according to this mapping relationship, and the ratio is fine-tuned according to the target fine-tuning coefficient to obtain the convex parameter. The specific calculation formula is as follows:

convex parameter = (1 + target fine-tuning coefficient) × ratio

Finally, the electronic device may determine the combined image from the convex parameter, the reconstructed image, and the residual image, i.e., combined image = convex parameter × reconstructed image + residual image.
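
The steps above can be made concrete with a short Python/OpenCV sketch; the keypoint detector (ORB), the 4×4 region grid, and the mean-square-error-to-coefficient lookup are illustrative assumptions, since the patent only specifies the quantities, not how they are computed:

```python
import numpy as np
import cv2

def feature_point_density(image):
    """Feature point distribution density = number of detected keypoints / image area.
    ORB is used here only as an example detector; the patent does not name one."""
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY) if image.ndim == 3 else image
    keypoints = cv2.ORB_create().detect(gray, None)
    h, w = gray.shape[:2]
    return len(keypoints) / float(h * w)

def convex_parameter(reconstructed, original, mse_to_coefficient, rows=4, cols=4):
    """Steps 431-437: density ratio, per-region densities, mean square error,
    coefficient lookup, and the fine-tuned ratio. The 4x4 region layout and the
    lookup callable are assumptions."""
    ratio = feature_point_density(reconstructed) / max(feature_point_density(original), 1e-8)

    h, w = reconstructed.shape[:2]
    densities = [feature_point_density(reconstructed[i * h // rows:(i + 1) * h // rows,
                                                     j * w // cols:(j + 1) * w // cols])
                 for i in range(rows) for j in range(cols)]
    mse = float(np.mean((np.asarray(densities) - np.mean(densities)) ** 2))

    coefficient = mse_to_coefficient(mse)          # preset mean-square-error -> coefficient mapping
    return (1.0 + coefficient) * ratio             # convex parameter = (1 + coefficient) * ratio

def combined_image(reconstructed, residual, cp):
    """Step 438: combined image = convex parameter * reconstructed image + residual image."""
    out = cp * reconstructed.astype(np.float32) + residual.astype(np.float32)
    return np.clip(out, 0, 255).astype(np.uint8)
```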

105. And inputting the second query image into the second feature extraction network for feature extraction to obtain the query image features.

In specific implementation, the purpose of feature extraction is to learn strong and discriminative vehicle features to adapt to larger viewpoint changes among different cameras and to distinguish fine-grained details among similar vehicles.

In the embodiment of the application, the self-supervised-attention vehicle feature extraction method can automatically highlight the salient regions in the vehicle image, and these vehicle-specific salient regions carry key details that are important for identifying a vehicle under large viewing angle changes and for distinguishing two visually similar vehicles. Specifically, a variational auto-encoder (VAE) is designed so that a vehicle image template free of manufacturer logos, windshield stickers, wheel patterns, grilles, bumpers, and front/rear light designs can be generated, and a residual image is constructed using the obtained rough template and its pixel difference from the original image; the residual image contains the key details required for re-identification and serves as pseudo-saliency or pseudo-attention for highlighting regions in the image. In addition, benefiting from the self-supervised attention generation, no additional labels such as feature points or attributes are required.

Optionally, the step 105 of inputting the second query image into the second feature extraction network for feature extraction to obtain query image features may include the following steps:

51. inputting the second query image into a backbone network for feature extraction to obtain a feature map;

52. carrying out global feature extraction on the feature map to obtain global features, wherein the global features are the most salient global appearance representation of the vehicle;

53. carrying out local region detection on the second query image to obtain a region of interest;

54. dividing the global features into S × S grids, wherein S is an integer greater than 1;

55. projecting the region of interest onto the S × S grids, and extracting a local feature vector of each projected grid by using local average pooling to obtain local features, wherein the local features are detail features of a specific region of the vehicle; the local features in turn optimize the global feature representation in a back-propagation process;

56. and performing aggregation operation on the global features and the local features to obtain the query image features.

In specific implementation, the electronic device may input the second query image to the backbone network to perform feature extraction, obtain a feature map, perform global feature extraction on the feature map to obtain global features, perform local feature extraction on the feature map to obtain local features, and perform aggregation operation on the global features and the local features to obtain query image features.

Specifically, as shown in fig. 1D, the whole feature extraction network may be composed of two modules: a global feature extraction module for extracting the most salient global appearance representation of the vehicle, and a local feature extraction module for extracting the detail features of a specific region of the vehicle. The local features are derived from the global features, and the global feature representation is optimized by the local features in the back-propagation process, so that the global features and the local region feature representation reinforce each other. Finally, feature aggregation is performed on the extracted global features and local features to obtain the final vehicle features comprising both the vehicle global features and the local region features.

Further, as shown in fig. 1E, the input image may be subjected to local region detection to obtain a specific region of interest ROI (window, license plate, and emblem) of the vehicle target, then the ROI generated by the local detection module is projected onto the global feature map, the spatial size of the global feature map is S × S, S is an integer greater than 1, the input image is divided into S × S grids, each grid unit overlapping with the ROI is marked as a portion corresponding to the ROI, and then local feature vectors of each ROI projection region are extracted using local average pooling to obtain local features.
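
The ROI projection and local average pooling described here can be sketched in Python (PyTorch); the grid size S = 8, the feature dimension, the normalized ROI format and the concatenation-based aggregation are assumptions for illustration:

```python
import math
import torch

def extract_local_features(global_feat_map, rois):
    """Project each normalized ROI (x1, y1, x2, y2 in [0, 1]) onto the S x S grid of the
    global feature map (shape C x S x S) and apply local average pooling per ROI."""
    C, S, _ = global_feat_map.shape
    local_vectors = []
    for x1, y1, x2, y2 in rois:
        c1, r1 = int(x1 * S), int(y1 * S)
        c2, r2 = max(c1 + 1, math.ceil(x2 * S)), max(r1 + 1, math.ceil(y2 * S))
        cells = global_feat_map[:, r1:r2, c1:c2]       # grid cells overlapping this ROI
        local_vectors.append(cells.mean(dim=(1, 2)))   # local average pooling -> (C,)
    return torch.stack(local_vectors) if local_vectors else torch.zeros(0, C)

def aggregate_query_feature(global_feat_map, rois):
    """Step 56: aggregate the global feature and the local features; concatenating the
    global vector with the mean local vector is assumed as the aggregation operator."""
    global_vector = global_feat_map.mean(dim=(1, 2))
    local_vectors = extract_local_features(global_feat_map, rois)
    local_part = local_vectors.mean(dim=0) if len(local_vectors) else torch.zeros_like(global_vector)
    return torch.cat([global_vector, local_part])

# Usage with assumed sizes: a 2048 x 8 x 8 feature map and ROIs for windshield, plate, emblem.
feat_map = torch.randn(2048, 8, 8)
rois = [(0.2, 0.1, 0.8, 0.4), (0.35, 0.75, 0.65, 0.9), (0.45, 0.55, 0.55, 0.65)]
query_feature = aggregate_query_feature(feat_map, rois)    # shape (4096,)
```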

106. And inputting the second base image into the second feature extraction network for feature extraction to obtain base image features.

The electronic equipment can also input the second base image into a second feature extraction network for feature extraction to obtain base image features. The detailed description thereof may refer to step 105, which is not described herein again.

107. And matching the query image features with the base library image features to obtain a matching result, and displaying the matching result.

In a specific implementation, the electronic device may match the query image features with the base library image features to obtain a plurality of matching values, select the top K matching values, use the K matching values as the matching result, and display the matching result, where K is a positive integer.

For example, as shown in fig. 1F, the electronic device may input the query image (query) and the base library images (gallery) into the style conversion model respectively to generate a query image and base library images with a uniform style, so that gallery pictures whose style differs from that of the query image are changed into pictures with the same style as the query image. Furthermore, the style-unified query image and base library images can be respectively input into the feature extraction network to obtain discriminative query image features and base library image features. The query image features are then matched against the image features in the base library, the feature similarities are calculated and ranked, and the top-K vehicle images with the highest similarity in the base library are returned. In this way, the style differences between different images can be eliminated, the query and the gallery can be matched better, and the whole process can significantly improve the accuracy of vehicle re-identification, especially for vehicle images with large differences in style and viewing angle, as well as for images of vehicles with similar brands, models and colors.
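
A minimal PyTorch sketch of the matching and top-K ranking step follows; cosine similarity is assumed as the feature similarity measure, since the patent only states that similarities are computed and ranked:

```python
import torch
import torch.nn.functional as F

def match_topk(query_feature, gallery_features, k=10):
    """Step 107 sketch: similarity between one query feature and every base library
    (gallery) feature, returning the top-K indices and scores."""
    q = F.normalize(query_feature.unsqueeze(0), dim=1)
    g = F.normalize(gallery_features, dim=1)
    similarities = (q @ g.t()).squeeze(0)            # one score per base library image
    scores, indices = torch.topk(similarities, k=min(k, similarities.numel()))
    return indices.tolist(), scores.tolist()

# Usage with assumed dimensions: 4096-d features, 1000 base library images.
query = torch.randn(4096)
gallery = torch.randn(1000, 4096)
top_ids, top_scores = match_topk(query, gallery, k=10)
```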

In a possible example, between the above steps 101 to 102, the following steps may be further included:

b1, acquiring a target face image;

b2, carrying out image quality evaluation on the target face image to obtain a face image quality evaluation value;

b3, when the face image quality evaluation value is larger than the preset image quality evaluation value, executing step 102.

In this embodiment, the preset image quality evaluation value may be pre-stored in the electronic device, and may be set by the user or default by the system.

In specific implementation, the electronic device may perform image quality evaluation on the target face image by using at least one image quality evaluation index to obtain a face image quality evaluation value, where the image quality evaluation index may be at least one of the following: face deviation degree, face integrity degree, definition degree, feature point distribution density, average gradient, information entropy, signal-to-noise ratio and the like, which are not limited herein. The human face deviation degree is the deviation degree between the human face angle in the image and the human face angle of the front face, and the human face integrity degree is the ratio of the area of the human face in the image to the area of the complete human face.
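
The quality indices defined above can be illustrated with a small Python sketch; the pose-angle averaging and the exact inputs are assumptions, since the patent only defines the quantities as deviations or ratios:

```python
def face_deviation_degree(yaw, pitch, roll):
    """Deviation between the detected face pose (degrees) and a frontal face (all zeros);
    averaging the three angles is an assumption."""
    return (abs(yaw) + abs(pitch) + abs(roll)) / 3.0

def face_integrity_degree(visible_face_area, full_face_area):
    """Ratio of the face area visible in the image to the area of the complete face."""
    return visible_face_area / float(full_face_area) if full_face_area else 0.0

def feature_point_distribution_density(num_feature_points, image_area):
    """Ratio of the total number of feature points to the image area."""
    return num_feature_points / float(image_area) if image_area else 0.0
```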

In one possible example, the step B2, performing image quality evaluation on the target face image to obtain a face image quality evaluation value, may include the following steps:

b21, acquiring a target face deviation degree of a target face image, a target face integrity degree of the target face image, a target feature point distribution density of the target face image and a target information entropy;

b22, when the target face deviation degree is greater than a preset deviation degree and the target face integrity degree is greater than a preset integrity degree, determining a target first reference evaluation value corresponding to the target face deviation degree according to a mapping relation between the preset face deviation degree and the first reference evaluation value;

b23, determining a target second reference evaluation value corresponding to the target face integrity according to a preset mapping relation between the face integrity and the second reference evaluation value;

b24, determining a target weight pair corresponding to the target feature point distribution density according to a preset mapping relation between the feature point distribution density and the weight pair, wherein the target weight pair comprises a target first weight and a target second weight, the target first weight is a weight corresponding to the first reference evaluation value, and the target second weight is a weight corresponding to the second reference evaluation value;

b25, performing weighted operation according to the target first weight, the target second weight, the target first reference evaluation value and the target second reference evaluation value to obtain a first reference evaluation value;

b26, determining a first image quality evaluation value corresponding to the target feature point distribution density according to a preset mapping relation between the feature point distribution density and the image quality evaluation value;

b27, determining a target image quality deviation value corresponding to the target information entropy according to a mapping relation between a preset information entropy and an image quality deviation value;

b28, acquiring a first shooting parameter of the target face image;

b29, determining a target optimization coefficient corresponding to the first shooting parameter according to a preset mapping relation between the shooting parameter and the optimization coefficient;

b30, adjusting the first image quality evaluation value according to the target optimization coefficient and the target image quality deviation value to obtain a second reference evaluation value;

b31, acquiring target environment parameters corresponding to the target face image;

b32, determining a target weight coefficient pair corresponding to the target environment parameter according to a mapping relation between preset environment parameters and the weight coefficient pair, wherein the target weight coefficient pair comprises a target first weight coefficient and a target second weight coefficient, the target first weight coefficient is a weight coefficient corresponding to the first reference evaluation value, and the target second weight coefficient is a weight coefficient corresponding to the second reference evaluation value;

b33, performing a weighting operation according to the target first weight coefficient, the target second weight coefficient, the first reference evaluation value and the second reference evaluation value to obtain the face image quality evaluation value of the target face image.

In the embodiment of the application, the preset deviation degree and the preset integrity degree can be set by the user or set by default in the system; the face can be successfully recognized only when the face deviation degree and the face integrity degree are within a certain range. The electronic device may pre-store a mapping relationship between the preset face deviation degree and the first reference evaluation value, a mapping relationship between the preset face integrity degree and the second reference evaluation value, and a mapping relationship between the preset feature point distribution density and the weight pair, where the weight pair may include a first weight and a second weight, the sum of the first weight and the second weight is 1, the first weight is the weight corresponding to the first reference evaluation value, and the second weight is the weight corresponding to the second reference evaluation value. The electronic device may further pre-store a mapping relationship between the preset feature point distribution density and the image quality evaluation value, a mapping relationship between the preset information entropy and the image quality deviation value, a mapping relationship between the preset shooting parameter and the optimization coefficient, and a mapping relationship between the preset environment parameter and the weight coefficient pair. The weight coefficient pair may include a first weight coefficient and a second weight coefficient, the first weight coefficient being the weight coefficient corresponding to the first reference evaluation value, the second weight coefficient being the weight coefficient corresponding to the second reference evaluation value, and the sum of the first weight coefficient and the second weight coefficient being 1.

The value range of the image quality evaluation value can be 0-1, or 0-100. The image quality deviation value may be a positive real number, for example, 0 to 1, or may be greater than 1. The value range of the optimization coefficient can be-1 to 1, for example, the optimization coefficient can be-0.1 to 0.1. In the embodiment of the present application, the shooting parameter may be at least one of the following: exposure time, shooting mode, sensitivity ISO, white balance parameters, focal length, focus, region of interest, etc., without limitation. The environmental parameter may be at least one of: ambient brightness, ambient temperature, ambient humidity, weather, atmospheric pressure, magnetic field interference strength, etc., and are not limited thereto.

In specific implementation, the electronic device may obtain a target face deviation degree of a target face image, a target face integrity degree of the target face image, a target feature point distribution density of the target face image, and a target information entropy, where the target feature point distribution density may be a ratio between a total number of feature points of the target face image and an area of the target face image.

Furthermore, when the target face deviation degree is greater than the preset deviation degree and the target face integrity degree is greater than the preset integrity degree, the electronic device may determine a target first reference evaluation value corresponding to the target face deviation degree according to the mapping relationship between the preset face deviation degree and the first reference evaluation value, and may determine a target second reference evaluation value corresponding to the target face integrity degree according to the mapping relationship between the preset face integrity degree and the second reference evaluation value. The electronic device may also determine a target weight pair corresponding to the target feature point distribution density according to the mapping relationship between the preset feature point distribution density and the weight pair, where the target weight pair includes a target first weight and a target second weight, the target first weight is the weight corresponding to the first reference evaluation value, and the target second weight is the weight corresponding to the second reference evaluation value. Then, a weighted operation may be performed on the target first weight, the target second weight, the target first reference evaluation value and the target second reference evaluation value to obtain the first reference evaluation value, and the specific calculation formula is as follows:

First reference evaluation value = target first reference evaluation value × target first weight + target second reference evaluation value × target second weight

In this way, the image quality can be evaluated in terms of the face angle (deviation degree) and the face integrity.

Further, the electronic device may determine a first image quality evaluation value corresponding to the target feature point distribution density according to the mapping relationship between the preset feature point distribution density and the image quality evaluation value, and determine a target image quality deviation value corresponding to the target information entropy according to the mapping relationship between the preset information entropy and the image quality deviation value. Because some noise is introduced when an image is generated, due to external causes (weather, light, angle, jitter and the like) or internal causes (system, GPU), and this noise affects the image quality, the evaluation value can be adjusted to a certain degree so that the image quality is evaluated objectively.

Further, the electronic device may obtain a first shooting parameter of the target face image and determine a target optimization coefficient corresponding to the first shooting parameter according to the mapping relationship between the preset shooting parameter and the optimization coefficient. Since the shooting parameter settings can also influence the image quality evaluation, the influence component of the shooting parameters on the image quality needs to be determined. Finally, the first image quality evaluation value may be adjusted according to the target optimization coefficient and the target image quality deviation value to obtain a second reference evaluation value, which may be obtained according to the following formulas:

When the image quality evaluation value is on a 0-100 scale, the specific calculation formula is as follows:

Second reference evaluation value = (first image quality evaluation value + target image quality deviation value) × (1 + target optimization coefficient)

When the image quality evaluation value is on a 0-1 scale, the specific calculation formula is as follows:

Second reference evaluation value = first image quality evaluation value × (1 + target image quality deviation value) × (1 + target optimization coefficient)

Further, the electronic device may acquire a target environment parameter corresponding to the target face image and determine a target weight coefficient pair corresponding to the target environment parameter according to the mapping relationship between the preset environment parameter and the weight coefficient pair, where the target weight coefficient pair includes a target first weight coefficient and a target second weight coefficient, the target first weight coefficient is the weight coefficient corresponding to the first reference evaluation value, and the target second weight coefficient is the weight coefficient corresponding to the second reference evaluation value. A weighting operation may then be performed on the target first weight coefficient, the target second weight coefficient, the first reference evaluation value and the second reference evaluation value to obtain the face image quality evaluation value of the target face image, and the specific calculation formula is as follows:

Face image quality evaluation value of the target face image = first reference evaluation value × target first weight coefficient + second reference evaluation value × target second weight coefficient

Therefore, the image quality can be evaluated objectively by combining the influences of internal and external environmental factors, shooting settings, face angle, face integrity and the like, which improves the accuracy of face image quality evaluation.
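As an illustration only, the following minimal sketch strings the three formulas above together, assuming all preset mappings have already been resolved to concrete numbers; every input value and parameter name is hypothetical.

def face_image_quality(target_first_ref, target_second_ref,
                       target_w1, target_w2,
                       first_quality, deviation_value, optimization_coeff,
                       coeff1, coeff2, scale_0_100=True):
    # First reference evaluation value: weighted fusion of the deviation-based
    # and integrity-based reference evaluation values.
    first_ref = target_first_ref * target_w1 + target_second_ref * target_w2

    # Second reference evaluation value: image quality adjusted by the entropy-based
    # deviation value and the shooting-parameter optimization coefficient.
    if scale_0_100:
        second_ref = (first_quality + deviation_value) * (1 + optimization_coeff)
    else:
        second_ref = first_quality * (1 + deviation_value) * (1 + optimization_coeff)

    # Final face image quality evaluation value: weighted fusion of both references.
    return first_ref * coeff1 + second_ref * coeff2

# Hypothetical example on a 0-100 scale; each weight pair sums to 1.
print(face_image_quality(80, 90, 0.6, 0.4, 75, 3.0, 0.05, 0.5, 0.5))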

It can be seen that the vehicle re-identification method described in the embodiment of the present application is applied to an electronic device. A first query image, a first base library image and first vehicle training data are acquired; the first query image and the first base library image are respectively input into a preset style conversion model to obtain a second query image and a second base library image, where the preset style conversion model is used for converting input data into output data with a uniform style; the first vehicle training data is input into the preset style conversion model to obtain second vehicle training data; a first feature extraction network is trained through the first vehicle training data and the second vehicle training data to obtain a second feature extraction network; the second query image is input into the second feature extraction network for feature extraction to obtain query image features; the second base library image is input into the second feature extraction network for feature extraction to obtain base library image features; and the query image features are matched with the base library image features to obtain a matching result, and the matching result is displayed. In this way, style differences between different pictures can be eliminated so that the query and gallery images are matched better, distinguishing features specific to a vehicle can be extracted, and the accuracy of vehicle re-identification can be remarkably improved by the whole process.

Referring to fig. 2, in keeping with the embodiment shown in fig. 1B, fig. 2 is a schematic flowchart of a vehicle re-identification method provided in the embodiment of the present application, applied to the electronic device shown in fig. 1A, where the vehicle re-identification method includes:

201. Initial training data and corresponding label data are obtained, wherein the initial training data come from a plurality of cameras.

202. A style conversion network based on a generative adversarial network is obtained, wherein a style attention module is preset in the generator of the generative adversarial network and is used for converting shallow image features into style-related attention features.

203. The style conversion network is trained by using the initial training data and the corresponding label data to obtain a preset style conversion model.

204. A first query image, a first base library image and first vehicle training data are obtained.

205. The first query image and the first base library image are respectively input into the preset style conversion model to obtain a second query image and a second base library image, wherein the preset style conversion model is used for converting input data into output data with a uniform style.

206. The first vehicle training data is input into the preset style conversion model to obtain second vehicle training data.

207. A first feature extraction network is trained through the first vehicle training data and the second vehicle training data to obtain a second feature extraction network.

208. The second query image is input into the second feature extraction network for feature extraction to obtain query image features.

209. The second base library image is input into the second feature extraction network for feature extraction to obtain base library image features.

210. The query image features are matched with the base library image features to obtain a matching result, and the matching result is displayed.

For the detailed description of the steps 201 to 210, reference may be made to the corresponding steps of the vehicle re-identification method described in the foregoing fig. 1B, and details are not repeated here.
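Purely as an illustration of how steps 204 to 210 fit together at inference time, the following sketch assumes hypothetical style_model.convert and feat_net.extract interfaces (these names are not part of the disclosure) and uses cosine similarity for the matching step.

import numpy as np

def vehicle_reid_inference(style_model, feat_net, query_img, gallery_imgs):
    # Step 205: convert the query image and the base library (gallery) images to a uniform style.
    query2 = style_model.convert(query_img)
    gallery2 = [style_model.convert(img) for img in gallery_imgs]

    # Steps 208-209: extract features with the trained second feature extraction network.
    q = feat_net.extract(query2)
    g = np.stack([feat_net.extract(img) for img in gallery2])

    # Step 210: cosine-similarity matching; return base library indices ranked best-first.
    q = q / (np.linalg.norm(q) + 1e-12)
    g = g / (np.linalg.norm(g, axis=1, keepdims=True) + 1e-12)
    scores = g @ q
    return np.argsort(-scores)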

It can be seen that the vehicle re-identification method described in the embodiment of the application, when applied to an electronic device, can eliminate style differences between different pictures so that the query and gallery images are matched better, can extract distinguishing features specific to a vehicle, and can remarkably improve the accuracy of vehicle re-identification in the whole process. It is particularly beneficial for images with large differences in image style and vehicle angle, and for vehicles with similar brand, model, color and the like.

Referring to fig. 3, fig. 3 is a schematic structural diagram of an electronic device according to an embodiment of the present application, and as shown in the drawing, the electronic device includes a processor, a memory, a communication interface, and one or more programs, where the one or more programs are stored in the memory and configured to be executed by the processor, and in an embodiment of the present application, the programs include instructions for performing the following steps:

acquiring a first query image, a first base library image and first vehicle training data;

respectively inputting the first query image and the first base image into a preset style conversion model to obtain a second query image and a second base image, wherein the preset style conversion model is used for converting input data into output data with uniform style;

inputting the first vehicle training data into the preset style conversion model to obtain second vehicle training data;

training a first feature extraction network through the first vehicle training data and the second vehicle training data to obtain a second feature extraction network;

inputting the second query image into the second feature extraction network for feature extraction to obtain query image features;

inputting the second base image into the second feature extraction network for feature extraction to obtain base image features;

and matching the query image features with the image features of the bottom library to obtain matching results, and displaying the matching results.

It can be seen that the electronic device described in this embodiment of the present application acquires a first query image, a first base library image and first vehicle training data; inputs the first query image and the first base library image respectively into a preset style conversion model to obtain a second query image and a second base library image, where the preset style conversion model is used for converting input data into output data with a uniform style; inputs the first vehicle training data into the preset style conversion model to obtain second vehicle training data; trains a first feature extraction network through the first vehicle training data and the second vehicle training data to obtain a second feature extraction network; inputs the second query image into the second feature extraction network for feature extraction to obtain query image features; inputs the second base library image into the second feature extraction network for feature extraction to obtain base library image features; and matches the query image features with the base library image features to obtain a matching result and displays the matching result. In this way, style differences between different pictures can be eliminated so that the query and gallery images are matched better, distinguishing features specific to a vehicle can be extracted, and the accuracy of vehicle re-identification can be remarkably improved by the whole process, especially for images with large differences in image style and vehicle angle and for vehicles with similar brand, model, color and the like.

Optionally, the first feature extraction network includes a variable automatic encoder module and a depth feature extraction module, and in the aspect of training the first feature extraction network through the first vehicle training data and the second vehicle training data to obtain a second feature extraction network, the program includes instructions for performing the following steps:

reconstructing training data i through the variable automatic encoder module to obtain a reconstructed image, wherein the training data i is any training data in the first vehicle training data or the second vehicle training data;

determining a residual image through the training data i and the reconstructed image;

determining a combined image according to the reconstructed image and the residual image;

performing feature extraction on the combined image through the depth feature extraction module to obtain training features;

and adjusting the model parameters of the first feature extraction network through the training features and a preset loss function to obtain the second feature extraction network.
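The following is a minimal sketch of one such training step, assuming the variable automatic encoder module behaves like a variational auto-encoder that returns a reconstruction, sketching the combined image as a convex combination of the reconstructed and residual images, and using PyTorch-style modules; the module implementations and the parameter lam are assumptions, not the disclosed design.

import torch

def training_step(vae, depth_extractor, loss_fn, optimizer, training_data_i, label_i, lam=0.5):
    # Reconstruct training data i with the variable automatic encoder module.
    reconstructed = vae(training_data_i)

    # Residual image: difference between training data i and its reconstruction.
    residual = training_data_i - reconstructed

    # Combined image, sketched here as a convex combination with parameter lam.
    combined = lam * reconstructed + (1.0 - lam) * residual

    # Feature extraction on the combined image by the depth feature extraction module.
    features = depth_extractor(combined)

    # Adjust the model parameters with the preset loss function.
    loss = loss_fn(features, label_i)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()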

Optionally, in the aspect of determining a combined image according to the reconstructed image and the residual image, the program includes instructions for performing the following steps:

determining a first feature point distribution density of the reconstructed image;

determining a second feature point distribution density of the training data i;

determining a ratio between the first feature point distribution density and the second feature point distribution density;

dividing the reconstructed image into a plurality of regions, and determining the distribution density of each characteristic point in the plurality of regions to obtain a plurality of characteristic point distribution densities;

determining a target mean square error according to the distribution densities of the plurality of feature points;

determining a target fine-tuning coefficient corresponding to the target mean square error according to a mapping relation between a preset mean square error and the fine-tuning coefficient;

fine-tuning the ratio according to the target fine-tuning coefficient to obtain a convex parameter;

determining the combined image according to the convex parameter, the reconstructed image and the residual image.
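The convex parameter computation above could be sketched as follows; interpreting the target mean square error as the spread of the per-region densities around their mean, and the fine-tuning as a multiplicative adjustment, are assumptions made for illustration, and mse_to_coeff stands in for the preset mapping relationship.

import numpy as np

def convex_parameter(recon_density, input_density, region_densities, mse_to_coeff):
    # Ratio between the reconstructed image's density and training data i's density.
    ratio = recon_density / input_density

    # Target mean square error of the per-region feature point distribution densities.
    d = np.asarray(region_densities, dtype=float)
    target_mse = float(np.mean((d - d.mean()) ** 2))

    # Fine-tune the ratio with the coefficient from the preset mapping relationship.
    return ratio * (1.0 + mse_to_coeff(target_mse))

# Hypothetical example with a made-up mapping from mean square error to coefficient.
lam = convex_parameter(0.38, 0.40, [0.38, 0.41, 0.45, 0.44], lambda m: 0.05 if m < 0.01 else 0.1)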

Optionally, in the aspect that the second query image is input to the second feature extraction network for feature extraction, so as to obtain a query image feature, the program includes instructions for performing the following steps:

inputting the second query image into a backbone network for feature extraction to obtain a feature map;

carrying out global feature extraction on the feature map to obtain global features, wherein the global features represent the most salient global appearance of the vehicle;

carrying out local region detection on the second query image to obtain a region of interest;

dividing the global features into S × S grids, wherein S is an integer larger than 1;

projecting the region of interest onto the S × S grids, and extracting a local feature vector from each projected grid by using local average pooling to obtain local features, wherein the local features are detailed features of specific regions of the vehicle; the local features in turn optimize the global feature representation during back propagation;

and performing aggregation operation on the global features and the local features to obtain the query image features.
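A minimal sketch of this global/local aggregation is given below; the backbone output shape, the way regions of interest are projected onto the grid, and the aggregation by concatenation are all assumptions chosen for illustration.

import numpy as np

def query_image_features(feature_map, roi_boxes, S=4):
    # feature_map: (C, H, W) array from the backbone, with H, W >= S;
    # roi_boxes: (x0, y0, x1, y1) boxes in feature-map coordinates.
    C, H, W = feature_map.shape

    # Global feature: global average pooling over the whole feature map.
    global_feat = feature_map.reshape(C, -1).mean(axis=1)

    # Divide the map into S x S grid cells and average-pool each cell.
    hs, ws = H // S, W // S
    grid = np.zeros((S, S, C))
    for i in range(S):
        for j in range(S):
            cell = feature_map[:, i * hs:(i + 1) * hs, j * ws:(j + 1) * ws]
            grid[i, j] = cell.reshape(C, -1).mean(axis=1)

    # Project each region of interest onto the grid and pool the covered cells.
    local_feats = []
    for (x0, y0, x1, y1) in roi_boxes:
        i0 = min(int(y0 // hs), S - 1)
        j0 = min(int(x0 // ws), S - 1)
        i1 = max(i0 + 1, min(S, int(np.ceil(y1 / hs))))
        j1 = max(j0 + 1, min(S, int(np.ceil(x1 / ws))))
        local_feats.append(grid[i0:i1, j0:j1].reshape(-1, C).mean(axis=0))

    # Aggregate the global feature and the local features (here: concatenation).
    return np.concatenate([global_feat] + local_feats)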

Optionally, the program further comprises instructions for performing the steps of:

acquiring initial training data and corresponding label data, wherein the initial training data come from a plurality of cameras;

acquiring a style conversion network based on a generative adversarial network, wherein a style attention module is preset in the generator of the generative adversarial network and is used for converting shallow image features into style-related attention features;

and using the initial training data and the corresponding label data to train the style conversion network to obtain the preset style conversion model.
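The exact architecture of the style attention module is not spelled out here; the sketch below shows one plausible reading as channel attention over shallow generator features (image style is largely carried by channel statistics), purely as an assumption-laden illustration in PyTorch.

import torch
import torch.nn as nn

class StyleAttention(nn.Module):
    # One possible style attention module: squeeze-and-excitation style channel
    # attention applied to shallow generator features.
    def __init__(self, channels, reduction=8):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, shallow_features):
        b, c, _, _ = shallow_features.shape
        # Channel descriptor from global average pooling of the shallow features.
        weights = self.fc(self.pool(shallow_features).view(b, c)).view(b, c, 1, 1)
        # Re-weight the shallow features to obtain style-related attention features.
        return shallow_features * weights

# Usage inside a generator (shapes are illustrative):
attn = StyleAttention(64)
style_feats = attn(torch.randn(2, 64, 128, 128))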

The above description has introduced the solution of the embodiment of the present application mainly from the perspective of the method-side implementation process. It can be understood that, in order to implement the above functions, the electronic device includes corresponding hardware structures and/or software modules for performing the respective functions. Those skilled in the art will readily appreciate that the units and algorithm steps of the examples described in connection with the embodiments provided herein can be implemented as hardware or a combination of hardware and computer software. Whether a function is performed by hardware or by computer software driving hardware depends on the particular application and design constraints of the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.

In the embodiment of the present application, the functional units may be divided according to the above method example, for example, each functional unit may be divided corresponding to each function, or two or more functions may be integrated into one processing unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit. It should be noted that the division of the unit in the embodiment of the present application is schematic, and is only a logic function division, and there may be another division manner in actual implementation.

Fig. 4 is a block diagram of functional units of a vehicle re-identification apparatus 400 according to an embodiment of the present application. The apparatus 400 is applied to an electronic device, and the apparatus 400 includes: an obtaining unit 401, a conversion unit 402, a training unit 403, an extraction unit 404 and a display unit 405, wherein,

the obtaining unit 401 is configured to obtain a first query image, a first base image, and first vehicle training data;

the conversion unit 402 is configured to input the first query image and the first base image into a preset style conversion model respectively to obtain a second query image and a second base image, where the preset style conversion model is configured to convert input data into output data with a uniform style;

the conversion unit 402 is further configured to input the first vehicle training data into the preset style conversion model to obtain second vehicle training data;

the training unit 403 is configured to train a first feature extraction network through the first vehicle training data and the second vehicle training data to obtain a second feature extraction network;

the extracting unit 404 is configured to input the second query image into the second feature extraction network to perform feature extraction, so as to obtain a query image feature;

the extracting unit 404 is further configured to input the second base image into the second feature extraction network for feature extraction, so as to obtain a base image feature;

the display unit 405 is configured to match the query image features with the image features of the base library to obtain a matching result, and display the matching result.

It can be seen that the vehicle re-identification apparatus described in the embodiment of the present application is applied to an electronic device. A first query image, a first base library image and first vehicle training data are acquired; the first query image and the first base library image are respectively input into a preset style conversion model to obtain a second query image and a second base library image, where the preset style conversion model is used for converting input data into output data with a uniform style; the first vehicle training data is input into the preset style conversion model to obtain second vehicle training data; a first feature extraction network is trained through the first vehicle training data and the second vehicle training data to obtain a second feature extraction network; the second query image is input into the second feature extraction network for feature extraction to obtain query image features; the second base library image is input into the second feature extraction network for feature extraction to obtain base library image features; and the query image features are matched with the base library image features to obtain a matching result, and the matching result is displayed. In this way, style differences between different pictures can be eliminated so that the query and gallery images are matched better, distinguishing features specific to a vehicle can be extracted, and the accuracy of vehicle re-identification can be remarkably improved by the whole process.

Optionally, the first feature extraction network includes a variable automatic encoder module and a depth feature extraction module, and in terms of obtaining a second feature extraction network by training the first feature extraction network through the first vehicle training data and the second vehicle training data, the training unit 403 is specifically configured to:

reconstructing training data i through the variable automatic encoder module to obtain a reconstructed image, wherein the training data i is any training data in the first vehicle training data or the second vehicle training data;

determining a residual image through the training data i and the reconstructed image;

determining a combined image according to the reconstructed image and the residual image;

performing feature extraction on the combined image through the depth feature extraction module to obtain training features;

and adjusting the model parameters of the first feature extraction network through the training features and a preset loss function to obtain the second feature extraction network.

Optionally, in the aspect of determining a combined image according to the reconstructed image and the residual image, the training unit 403 is specifically configured to:

determining a first feature point distribution density of the reconstructed image;

determining a second feature point distribution density of the training data i;

determining a ratio between the first feature point distribution density and the second feature point distribution density;

dividing the reconstructed image into a plurality of regions, and determining the distribution density of each characteristic point in the plurality of regions to obtain a plurality of characteristic point distribution densities;

determining a target mean square error according to the distribution densities of the plurality of feature points;

determining a target fine-tuning coefficient corresponding to the target mean square error according to a mapping relation between a preset mean square error and the fine-tuning coefficient;

fine-tuning the ratio according to the target fine-tuning coefficient to obtain a convex parameter;

determining the combined image according to the convex parameter, the reconstructed image and the residual image.

Optionally, in respect that the second query image is input to the second feature extraction network for feature extraction to obtain a query image feature, the extraction unit 404 is specifically configured to:

inputting the second query image into a backbone network for feature extraction to obtain a feature map;

carrying out global feature extraction on the feature map to obtain global features, wherein the global features represent the most salient global appearance of the vehicle;

carrying out local region detection on the second query image to obtain a region of interest;

dividing the global features into S × S grids, wherein S is an integer larger than 1;

projecting the region of interest onto the S × S grids, and extracting a local feature vector from each projected grid by using local average pooling to obtain local features, wherein the local features are detailed features of specific regions of the vehicle; the local features in turn optimize the global feature representation during back propagation;

and performing aggregation operation on the global features and the local features to obtain the query image features.

Optionally, the apparatus 400 is further specifically configured to:

acquiring initial training data and corresponding label data, wherein the initial training data come from a plurality of cameras;

acquiring a style conversion network based on a generative adversarial network, wherein a style attention module is preset in the generator of the generative adversarial network and is used for converting shallow image features into style-related attention features;

and using the initial training data and the corresponding label data to train the style conversion network to obtain the preset style conversion model.

It can be understood that the functions of the program modules of the vehicle weight recognition apparatus of this embodiment may be specifically implemented according to the method in the foregoing method embodiment, and the specific implementation process may refer to the related description of the foregoing method embodiment, which is not described herein again.

Embodiments of the present application also provide a computer storage medium, where the computer storage medium stores a computer program for electronic data exchange, the computer program enabling a computer to execute part or all of the steps of any one of the methods described in the above method embodiments, and the computer includes an electronic device.

Embodiments of the present application also provide a computer program product, which includes a non-transitory computer-readable storage medium storing a computer program operable to cause a computer to perform some or all of the steps of any of the methods described in the above method embodiments. The computer program product may be a software installation package, and the computer includes an electronic device.

It should be noted that, for simplicity of description, the above-mentioned method embodiments are described as a series of acts or combination of acts, but those skilled in the art will recognize that the present application is not limited by the order of acts described, as some steps may occur in other orders or concurrently depending on the application. Further, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments and that the acts and modules referred to are not necessarily required in this application.

In the foregoing embodiments, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.

In the embodiments provided in the present application, it should be understood that the disclosed apparatus may be implemented in other manners. For example, the above-described apparatus embodiments are merely illustrative; the above division of units is only one type of logical function division, and other division manners may be adopted in practice; for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the shown or discussed mutual coupling, direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in electrical or other forms.

The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.

In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.

The integrated unit may be stored in a computer-readable memory if it is implemented in the form of a software functional unit and sold or used as a stand-alone product. Based on such understanding, the technical solution of the present application, in essence, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product, which is stored in a memory and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the methods of the embodiments of the present application. The aforementioned memory includes various media capable of storing program codes, such as a USB flash drive, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic disk or an optical disk.

Those skilled in the art will appreciate that all or part of the steps in the methods of the above embodiments may be implemented by associated hardware instructed by a program, and the program may be stored in a computer-readable memory, which may include: a flash memory disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, and the like.

The foregoing detailed description of the embodiments of the present application has been presented to illustrate the principles and implementations of the present application, and the above description of the embodiments is only provided to help understand the method and the core concept of the present application; meanwhile, for a person skilled in the art, according to the idea of the present application, there may be variations in the specific embodiments and the application scope, and in summary, the content of the present specification should not be construed as a limitation to the present application.
