Vehicle-mounted sensing device joint learning method for model structure optimization under edge computing


Note: this technique, "Vehicle-mounted sensing device joint learning method for model structure optimization under edge computing", was designed and created by 黄泽茵, 李贺, 李柔仪, 李培春, 余荣, 谭北海, and 朱璟 on 2021-07-12. Its main content: the invention discloses a vehicle-mounted sensing device joint learning method for model structure optimization under edge computing, which comprises: establishing a neural network model suitable for the vehicle-mounted device as the local model according to the target detection algorithm adopted by the vehicle-mounted device, training the local model using initialization parameters provided by a central server, and performing a local gradient update to obtain an updated gradient; applying gradient sparsification, local gradient quantization, and lossless compression to the local model; uploading the quantized local gradient and the compressed binarization mask matrix to the central server in pipeline form; after the vehicle-mounted device completes compression and uploading of the local model gradient, performing neuron-by-neuron gradient aggregation at the central server; and acquiring the global aggregation gradient on the vehicle-mounted device, updating the local model, and performing road perception with the updated model.

1. A vehicle-mounted sensing device joint learning method for model structure optimization under edge computing, characterized by comprising the following steps:

Step 1, local training of model

Establishing a neural network model suitable for the vehicle-mounted device as the local model according to the target detection algorithm adopted by the vehicle-mounted device, training the local model using the initialization parameters provided by a central server, and updating the local gradient to obtain an updated gradient;

Step 2, model structured compression

Step 2.1, gradient sparsification

Sparsifying the local gradient and obtaining a binary mask, where the sparsification is carried out layer by layer on the local model:

first, the L2 norm of each convolution kernel is calculated:

$$\|x_i\|_2 = \sqrt{\textstyle\sum_m x_{i,m}^2}, \quad i = 1, \dots, n$$

where $x_i$ denotes the parameters of the i-th convolution kernel and $n$ is the total number of convolution kernels;

second, according to the set sparsification rate, the gradients of the convolution kernels with the smaller norms are set to zero, the non-zero gradient tensor is output after the zero gradient tensors are removed, and at the same time the n convolution kernels are gradient-binarized, with zero gradient parameters set to 0 and non-zero gradient parameters set to 1, so as to output a binarization mask matrix;

step 2.2, quantifying local gradients

Quantizing the sparsified local gradient, with a fixed quantization bit width set for the convolutional layers and the fully-connected layers:

first, the non-zero gradients are clustered: non-zero gradient parameters with similar values are grouped into one class, and the weighted average of the parameters in each class gives the cluster-center true value of that layer's gradient; every parameter in the same class shares one true value, and only the index value corresponding to that value is stored;

second, constructing a coding correspondence table in which the true values are represented by short binary index values in direct one-to-one correspondence, thereby reducing the quantization bit width; the quantized local gradient is denoted $\tilde{\Delta}_{i,j+1}$;

step 2.3, lossless compression

The compression process is divided into two parts: first, encoding the quantized local gradient; second, compressing the binarization mask matrix with a sparse matrix representation method; the encoding itself is divided into two steps:

firstly, constructing a binary tree:

according to the frequency with which each index value from step 2.2 appears among all index values, the two index values with the lowest frequencies are taken to construct an initial binary tree, and their frequency values are added to serve as the frequency value of a new element; this is then compared with the remaining index values, and the two smallest elements are repeatedly merged in this way until the whole weighted binary tree is constructed;

secondly, coding is carried out:

each left branch of the binary tree is encoded as 0 and each right branch as 1, and the tree is traversed to obtain the codes of all symbols; the more frequently a symbol occurs, the higher it sits in the tree and the shorter its code, while the less frequently it occurs, the lower it sits and the longer its code, so that the total storage space is reduced as much as possible;

Step 3, pipeline compression transmission

Uploading the quantized local gradient and the compressed binary mask matrix to a central server in a pipeline form;

Step 4, neuron-by-neuron aggregation

After the vehicle-mounted equipment completes the gradient compression and uploading of the local model, the central server performs gradient aggregation:

the superscript k indexes each gradient parameter, and the gradient of the uncompressed local model has K parameters in total; in the j-th global iteration, the compressed local gradient uploaded by vehicle-mounted device i is denoted $\tilde{\Delta}_{i,j+1}$ and the corresponding mask matrix is $M_{i,j+1}$; the global aggregation gradient $\Delta_j$ can be aggregated neuron by neuron and is denoted $\Delta_j = (\Delta_j^{(1)}, \dots, \Delta_j^{(K)})$, with each element of the global gradient calculated by:

$$\Delta_j^{(k)} = \frac{\sum_i D_i \, M_{i,j+1}^{(k)} \, \tilde{\Delta}_{i,j+1}^{(k)}}{\sum_i D_i \, M_{i,j+1}^{(k)}}$$

where $D_i$ denotes the local data size of the i-th vehicle-mounted device;

obtaining a global aggregation gradient through neuron-by-neuron aggregation; in the global aggregation gradient, some weights were pruned during compression, so only the retained weights are aggregated, by weighted averaging, to serve as the update weights of the global aggregation gradient;

Step 5, acquiring the global aggregation gradient by the vehicle-mounted device, updating the local model, and performing road perception with the updated model, the real-time updated model improving road perception performance.

2. The vehicle-mounted sensing device joint learning method for model structure optimization under edge computing according to claim 1, wherein training the local model using the initialization parameters provided by the central server and updating the local gradient to obtain the updated gradient comprises the following steps:

step 1.1, initializing the parameters of the neural network model at the central server to obtain the initialized model parameters $w_0$, and performing j rounds of iterative training on the model with the target data set to obtain the initialized global model parameters $w_j$, which are stored in the central server;

step 1.2, the vehicle-mounted device i downloads the initialized global model parameters $w_j$ from the central server, continuously acquires image data as its private training data, and inputs the data into the neural network model for continuous local update training, obtaining new model parameters $w_{i,j+1}$ that improve the local model; the local update gradient is:

$$\Delta_{i,j+1} = w_{i,j+1} - w_j$$

3. The vehicle-mounted sensing device joint learning method for model structure optimization under edge computing according to claim 1, wherein, for a convolutional layer gradient of parameter size $O_{out} \times O_{in} \times k \times k$, where $O_{out}$, $O_{in}$, and $k$ respectively denote the number of output channels, the number of input channels, and the convolution kernel size, a two-dimensional parameter of size $k \times k$ is defined as one convolution kernel and a three-dimensional parameter of size $O_{in} \times k \times k$ is defined as a convolution filter.

4. The vehicle-mounted sensing device joint learning method for model structure optimization under edge computing according to claim 1, wherein the bit widths of the convolutional layers and the fully-connected layers in the local model are fixed at 4 bits and 2 bits, respectively.

5. The vehicle-mounted sensing device joint learning method for model structure optimization under edge computing according to claim 1, wherein the pipeline compression transmission comprises:

when a plurality of vehicle-mounted devices perform local model training, as soon as a given vehicle-mounted device completes the structured model compression of the first layer of its neural network, the compression result of that first layer can be transmitted immediately; meanwhile, the remaining vehicle-mounted devices repeat these operations in pipeline fashion, so that the locally updated models are quickly uploaded to the cloud server.

6. A vehicle-mounted sensing device joint learning apparatus for model structure optimization under edge computing, characterized by comprising:

the model local training module is used for establishing a neural network model suitable for the vehicle-mounted equipment as a local model according to a target detection algorithm adopted by the vehicle-mounted equipment, training the local model by using an initialization parameter provided by the central server, and updating a local gradient to obtain an updated gradient;

the model structured compression module, used for performing gradient sparsification, local gradient quantization, and lossless compression on the local model;

the pipeline compression transmission module, used for uploading the quantized local gradient and the compressed binarization mask matrix to the central server in pipeline form;

the neuron-by-neuron aggregation module is used for performing neuron-by-neuron gradient aggregation by the central server after the vehicle-mounted equipment completes local model gradient compression and uploading;

and the road perception module is used for acquiring the global aggregation gradient through the vehicle-mounted equipment, updating the local model and carrying out road perception by using the updated model.

7. A terminal device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor, when executing the computer program, implements the steps of the vehicle-mounted sensing device joint learning method for model structure optimization under edge computing according to any one of claims 1 to 5.

8. A computer-readable storage medium in which a computer program is stored, the computer program, when executed by a processor, implementing the steps of the vehicle-mounted sensing device joint learning method for model structure optimization under edge computing according to any one of claims 1 to 5.

Technical Field

The invention relates to the field of edge intelligent device updating, and in particular to a vehicle-mounted sensing device joint learning method for model structure optimization under edge computing.

Background

In recent years, privacy-preserving artificial intelligence computing centered on joint learning, and joint learning in edge computing environments in particular, have received close attention from national ministries and research institutions; how to improve the efficiency of joint learning as much as possible with limited computing, communication, data, and energy resources has become a core problem in this setting. However, with the rapid development of intelligent applications in edge computing environments, joint learning faces a shortage of computing, communication, and data resources: from the localized update of the neural network model to the layer-by-layer transmission of gradient parameters, communication network management and intelligent computation in a joint learning system have not been fully integrated. Much current research simplifies joint learning into a model of computation and communication, converting the optimization of joint learning into a traditional communication and computation scheduling problem, which severely limits the development of resource-efficient joint learning. Optimizing the whole process, from model structure compression to parameter aggregation, is therefore necessary to improve the efficiency of joint learning in resource-limited scenarios.

Since vehicle-mounted edge devices are inherently heterogeneous in computing power, communication conditions, data distribution, and so on, these inherent characteristics of edge computing degrade the performance of joint learning. Existing joint learning research focuses mainly on algorithms and does not consider the heterogeneous communication conditions of vehicle-mounted devices; because the global iteration time of joint learning is determined by the worst-performing vehicle-mounted device, heterogeneous communication and computation increase the delay overhead of the training process. Model iteration may require hundreds of communication rounds between the central server and the clients, and deploying joint learning in mobile edge computing does not by itself overcome the difficulties of resource-constrained mobile clients; existing solutions cannot adequately relieve the shortage of computing, communication, and data resources, and they restrict clients to training models with the same neural architecture.

Existing neural network compression schemes for vehicle-mounted edge devices either compress only the upstream communication from client to server (leaving downstream communication uncompressed) or perform well only under ideal conditions, so their compression is severely limited. Meanwhile, clients with very different computing and communication capabilities appear, and a single neural architecture is difficult to adapt to every client's hardware configuration.

Disclosure of Invention

The invention aims to provide a vehicle-mounted sensing device joint learning method for model structure optimization under edge computing, which enables massive numbers of vehicle-mounted sensing devices to participate in joint learning together, improves learning efficiency in resource-limited scenarios, and thereby improves the road perception performance of vehicle-mounted devices.

To achieve this aim, the invention adopts the following technical scheme:

a vehicle-mounted sensing device joint learning method for model structure optimization under edge calculation comprises the following steps:

step 1, local training of model

Establishing a neural network model suitable for the vehicle-mounted device as the local model according to the target detection algorithm adopted by the vehicle-mounted device, training the local model using the initialization parameters provided by the central server, and updating the local gradient to obtain the updated gradient $\Delta_{i,j+1}$;

Step 2, model structured compression

Step 2.1, gradient sparsification

Sparsifying the local gradient and obtaining a binary mask, where the sparsification is carried out layer by layer on the local model:

first, the L2 norm of each convolution kernel is calculated:

$$\|x_i\|_2 = \sqrt{\textstyle\sum_m x_{i,m}^2}, \quad i = 1, \dots, n$$

where $x_i$ denotes the parameters of the i-th convolution kernel and $n$ is the total number of convolution kernels;

second, according to the set sparsification rate, the gradients of the convolution kernels with the smaller norms are set to zero, the non-zero gradient tensor is output after the zero gradient tensors are removed, and at the same time the n convolution kernels are gradient-binarized, with zero gradient parameters set to 0 and non-zero gradient parameters set to 1, so as to output the binarization mask matrix $M_{i,j+1}$;

Step 2.2, quantizing the local gradient

Quantizing the sparsified local gradient, with a fixed quantization bit width set for the convolutional layers and the fully-connected layers:

first, the non-zero gradients are clustered: non-zero gradient parameters with similar values are grouped into one class, and the weighted average of the parameters in each class gives the cluster-center true value of that layer's gradient; every parameter in the same class shares one true value, and only the index value corresponding to that value is stored;

second, constructing a coding correspondence table in which the true values are represented by short binary index values in direct one-to-one correspondence, thereby reducing the quantization bit width; the quantized local gradient is denoted $\tilde{\Delta}_{i,j+1}$;

Step 2.3, lossless compression

The compression process is divided into two parts: first, encoding the quantized local gradient $\tilde{\Delta}_{i,j+1}$; second, compressing the binarization mask matrix $M_{i,j+1}$ with a sparse matrix representation method; the encoding itself is divided into two steps:

firstly, constructing a binary tree:

according to the frequency with which each index value from step 2.2 appears among all index values, the two index values with the lowest frequencies are taken to construct an initial binary tree, and their frequency values are added to serve as the frequency value of a new element; this is then compared with the remaining index values, and the two smallest elements are repeatedly merged in this way until the whole weighted binary tree is constructed;

secondly, coding is carried out:

each left branch of the binary tree is encoded as 0 and each right branch as 1, and the tree is traversed to obtain the codes of all symbols; the more frequently a symbol occurs, the higher it sits in the tree and the shorter its code, while the less frequently it occurs, the lower it sits and the longer its code, so that the total storage space is reduced as much as possible;

step 3, pipeline compression transmission

Uploading the quantized local gradient and the compressed binary mask matrix to a central server in a pipeline form;

step 4, neuron-by-neuron polymerization

After the vehicle-mounted equipment completes the gradient compression and uploading of the local model, the central server performs gradient aggregation:

the superscript k indexes each gradient parameter, and the gradient of the uncompressed local model has K parameters in total; in the j-th global iteration, the compressed local gradient uploaded by vehicle-mounted device i is denoted $\tilde{\Delta}_{i,j+1}$ and the corresponding mask matrix is $M_{i,j+1}$; the global aggregation gradient $\Delta_j$ can be aggregated neuron by neuron and is denoted $\Delta_j = (\Delta_j^{(1)}, \dots, \Delta_j^{(K)})$, with each element of the global gradient calculated by:

$$\Delta_j^{(k)} = \frac{\sum_i D_i \, M_{i,j+1}^{(k)} \, \tilde{\Delta}_{i,j+1}^{(k)}}{\sum_i D_i \, M_{i,j+1}^{(k)}}$$

where $D_i$ denotes the local data size of the i-th vehicle-mounted device;

obtaining a global aggregation gradient through neuron-by-neuron aggregation; in the global aggregation gradient, some weights were pruned during compression, so only the retained weights are aggregated, by weighted averaging, to serve as the update weights of the global aggregation gradient;

Step 5, acquiring the global aggregation gradient by the vehicle-mounted device, updating the local model, and performing road perception with the updated model, the real-time updated model improving road perception performance.

Further, training the local model using the initialization parameters provided by the central server and updating the local gradient to obtain the updated gradient $\Delta_{i,j+1}$ comprises the following steps:

step 1.1, initializing the parameters of the neural network model at the central server to obtain the initialized model parameters $w_0$, and performing j rounds of iterative training on the model with the target data set to obtain the initialized global model parameters $w_j$, which are stored in the central server;

step 1.2, the vehicle-mounted device i downloads the initialized global model parameters $w_j$ from the central server, continuously acquires image data as its private training data, and inputs the data into the neural network model for continuous local update training, obtaining new model parameters $w_{i,j+1}$ that improve the local model; the local update gradient is:

$$\Delta_{i,j+1} = w_{i,j+1} - w_j$$

Further, for a convolutional layer gradient of parameter size $O_{out} \times O_{in} \times k \times k$, where $O_{out}$, $O_{in}$, and $k$ respectively denote the number of output channels, the number of input channels, and the convolution kernel size, a two-dimensional parameter of size $k \times k$ is defined as one convolution kernel and a three-dimensional parameter of size $O_{in} \times k \times k$ is defined as a convolution filter.

Further, the bit widths of the convolutional layers and the fully-connected layers in the local model are fixed at 4 bits and 2 bits, respectively.

Further, the pipeline compression transmission includes:

when a plurality of vehicle-mounted devices perform local model training, as soon as a given vehicle-mounted device completes the structured model compression of the first layer of its neural network, the compression result of that first layer can be transmitted immediately; meanwhile, the remaining vehicle-mounted devices repeat these operations in pipeline fashion, so that the locally updated models are quickly uploaded to the cloud server.

A vehicle-mounted sensing device joint learning apparatus for model structure optimization under edge computing comprises:

the model local training module is used for establishing a neural network model suitable for the vehicle-mounted equipment as a local model according to a target detection algorithm adopted by the vehicle-mounted equipment, training the local model by using an initialization parameter provided by the central server, and updating a local gradient to obtain an updated gradient;

the model structured compression module, used for performing gradient sparsification, local gradient quantization, and lossless compression on the local model;

the pipeline compression transmission module, used for uploading the quantized local gradient and the compressed binarization mask matrix to the central server in pipeline form;

the neuron-by-neuron aggregation module is used for performing neuron-by-neuron gradient aggregation by the central server after the vehicle-mounted equipment completes local model gradient compression and uploading;

and the road perception module is used for acquiring the global aggregation gradient through the vehicle-mounted equipment, updating the local model and carrying out road perception by using the updated model.

A terminal device comprises a memory, a processor, and a computer program stored in the memory and executable on the processor; when executing the computer program, the processor implements the steps of the vehicle-mounted sensing device joint learning method for model structure optimization under edge computing.

A computer-readable storage medium stores a computer program which, when executed by a processor, implements the steps of the vehicle-mounted sensing device joint learning method for model structure optimization under edge computing.

Compared with the prior art, the invention has the following technical characteristics:

1. Modern deep learning models are usually very large; compressing the neural network model through convolution-kernel-based sparsification and coding greatly reduces the size of the neural network. In joint learning, hundreds of rounds of model transmission between the terminal devices and the central server may be needed before the global model converges; this design adopts a multi-process pipeline working mode that overlaps the two tasks of compression and transmission when the model is uploaded, improving communication efficiency. The method is built around optimization of the neural network structure, optimizing joint learning in the edge computing environment from inside the neural network, so that more vehicle-mounted sensing devices can participate in joint learning together under limited communication and computing resources.

2. The invention provides a gradient compression scheme combining dynamic gradient pruning, fixed quantization, and lossless coding, with differentiated gradient compression ratios across vehicle-mounted devices. In this scheme, the gradient update of the local model is compressed dynamically, so that the compression ratio changes with the resource state of the training node and multiple devices use different compression ratios within the same global iteration; this reduces the straggler ("barrel") effect on training delay in joint learning and overcomes the large delays caused by heterogeneous communication environments at the edge.

3. The invention provides a joint learning scheme with neuron-by-neuron aggregation, which aggregates, neuron by neuron, the model parameters produced after compression at different edge nodes. The model can be trained without directly accessing the data: the data stay where they are, the edge nodes upload only the trained network parameters, and the aggregated parameters are distributed back to the devices for further training, repeating cyclically until the model converges. The whole process never touches user information directly, so information security is fully guaranteed. In addition, the central server aggregates model gradients with several different compression ratios to obtain the updated global model.

Drawings

FIG. 1 is a schematic diagram of structured compression of a model;

FIG. 2 is a pipeline gradient compression and transmission scheme based on multiple threads;

FIG. 3 is a schematic diagram of neuron-by-neuron gradient aggregation.

Detailed Description

The invention provides a vehicle-mounted sensing device joint learning method based on model structure optimization in an edge computing environment, applied mainly to optimizing model training in vehicle-mounted sensing devices. For model structure optimization it adopts an elastic gradient compression strategy that allows multiple training nodes to use different compression strategies within the same global iteration. After the training nodes finish the localized model update and gradient compression, joint learning proceeds at the server: the optimized model is uploaded to the central server in a multi-process pipeline working mode, weighted-average aggregation is then performed with a neuron-by-neuron gradient aggregation strategy, and the server processes the result uniformly and returns the aggregated model to the edge nodes, where iterative learning continues until the optimal jointly trained model is obtained. The method is thus applicable to massive numbers of mobile vehicle-mounted devices and realizes energy-efficient edge computing.

Referring to the drawings, the vehicle-mounted sensing device joint learning method for model structure optimization under edge computing comprises the following steps:

step 1, local training of model

Step 1.1, establish a neural network model suitable for the vehicle-mounted device as the local model according to the target detection algorithm adopted by the vehicle-mounted device, and initialize its parameters at the central server to obtain the initialized model parameters $w_0$; the model is then iteratively trained j times on the target data set to obtain the initialized global model parameters $w_j$, which are stored in the central server.

The target detection algorithm may be, for example, the YOLOv3 algorithm, and the target data set consists of road scene images acquired from driving perspectives.

Step 1.2, vehicle-mounted device i downloads the initialized global model parameters $w_j$ from the central server, continuously collects image data as the private training data of each vehicle-mounted device (edge node), and inputs the data into the neural network model for continuous local update training, obtaining new model parameters $w_{i,j+1}$ that improve the local model; the local update gradient is:

$$\Delta_{i,j+1} = w_{i,j+1} - w_j$$

the model is then optimized as follows and transmitted back to the cloud server for aggregation.

Step 2, model structured compression

Step 2.1, gradient sparsification

The invention provides a convolution-kernel-based sparsification method, which sparsifies the local gradient and obtains a binary mask; in algorithmic form:

$$M_{i,j+1} = Sp(\Delta_{i,j+1})$$

For a convolutional layer gradient of parameter size $O_{out} \times O_{in} \times k \times k$, where $O_{out}$, $O_{in}$, and $k$ respectively denote the number of output channels, the number of input channels, and the convolution kernel size, the invention defines a two-dimensional parameter of size $k \times k$ as one convolution kernel (i.e., each convolution kernel has $k \times k$ parameters) and a three-dimensional parameter of size $O_{in} \times k \times k$ as a convolution filter; $Sp(\cdot)$ denotes the gradient sparsification operation. The whole sparsification process is performed layer by layer, as shown in FIG. 1; the specific process is as follows:

First, the L2 norm of each convolution kernel is calculated:

$$\|x_i\|_2 = \sqrt{\textstyle\sum_m x_{i,m}^2}, \quad i = 1, \dots, n$$

where $x_i$ denotes the parameters of the i-th convolution kernel and $n$ is the total number of convolution kernels.

Second, according to the set sparsification rate, the gradients of the convolution kernels with the smaller norms are set to zero, and the non-zero gradient tensor is output after the zero gradient tensors are removed; at the same time the n convolution kernels are gradient-binarized, with zero gradient parameters set to 0 and non-zero gradient parameters set to 1, so as to output the binarization mask matrix $M_{i,j+1}$. The size of the non-zero gradient tensor depends on the sparsification rate, and the rows and columns of the binary mask depend on the numbers of input and output channels.
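For illustration, a minimal sketch of this convolution-kernel sparsification follows, assuming the layer gradient is a numpy array of shape (O_out, O_in, k, k); the function name and the `sparsity` argument are illustrative rather than taken from the patent.

```python
# Kernel-wise sparsification of step 2.1: zero the kernels with the
# smallest L2 norms and emit the binarization mask.
import numpy as np

def sparsify_layer(grad: np.ndarray, sparsity: float):
    """Return (non-zero gradient tensor, mask) for one conv layer."""
    o_out, o_in, k, _ = grad.shape
    kernels = grad.reshape(o_out * o_in, k * k).copy()  # n = O_out*O_in kernels
    norms = np.linalg.norm(kernels, axis=1)             # L2 norm per kernel
    n_prune = int(len(norms) * sparsity)                # set sparsification rate
    prune_idx = np.argsort(norms)[:n_prune]             # smallest-norm kernels
    mask = np.ones(len(norms), dtype=np.uint8)
    mask[prune_idx] = 0                                 # 0 = pruned, 1 = kept
    kernels[prune_idx] = 0.0
    nonzero = kernels[mask.astype(bool)]                # zero tensors removed
    return nonzero, mask.reshape(o_out, o_in)           # mask rows/cols = channels

# Example: prune 25% of the kernels of a random 16x8x3x3 gradient
nz, m = sparsify_layer(np.random.randn(16, 8, 3, 3).astype(np.float32), 0.25)
```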

Step 2.2, quantizing the local gradient

Quantize the local gradient sparsified in step 2.1; in algorithmic form:

$$\tilde{\Delta}_{i,j+1} = Qt(M_{i,j+1} \odot \Delta_{i,j+1})$$

where $M_{i,j+1}$ denotes the binarization mask matrix produced by node i in the j-th iteration, $\Delta_{i,j+1}$ is the local gradient obtained by vehicle-mounted device i updating from the global model parameters $w_j$, the operator $\odot$ denotes element-by-element multiplication between two high-dimensional vectors, and $Qt(\cdot)$ denotes the local gradient quantization process.

In the invention, fixed quantization bit widths are set for the convolutional layers and the fully-connected layers, fixed at 4 bits and 2 bits respectively. Taking 2 bits as an example, the operation flow is shown in FIG. 1; the specific quantization operation is as follows:

First, the non-zero gradients are clustered: non-zero gradient parameters with similar values are grouped into one class, and the weighted average of the parameters in each class gives the cluster-center true value of that layer's gradient; every parameter in the same class shares one true value, and only the index value corresponding to that value is stored.

Second, a coding correspondence table is constructed: the true values are represented by short binary index values in direct one-to-one correspondence, reducing the 32-bit bit width to 4 bits or 2 bits and greatly reducing the size of the gradient; the quantized local gradient is $\tilde{\Delta}_{i,j+1}$.
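The patent states only that non-zero values with similar magnitudes are clustered and that all members of a class share a weighted-average center; the sketch below fills in the clustering with a plain one-dimensional k-means (an assumption) and returns the per-parameter index values together with the codebook of cluster-center true values.

```python
# Fixed-bit-width quantization of step 2.2 via 1-D k-means (assumed
# clustering rule); 2**bits classes, indices stored instead of values.
import numpy as np

def quantize(nonzero_grad: np.ndarray, bits: int, iters: int = 10):
    """Return (index values, codebook of cluster-center true values)."""
    levels = 2 ** bits                                   # 4-bit -> 16, 2-bit -> 4
    centers = np.linspace(nonzero_grad.min(), nonzero_grad.max(), levels)
    for _ in range(iters):
        idx = np.abs(nonzero_grad[:, None] - centers[None, :]).argmin(axis=1)
        for c in range(levels):                          # class center = mean
            members = nonzero_grad[idx == c]
            if members.size:
                centers[c] = members.mean()
    idx = np.abs(nonzero_grad[:, None] - centers[None, :]).argmin(axis=1)
    return idx.astype(np.uint8), centers                 # indices + codebook

# Example: 2-bit quantization of 1000 sparsified gradient values
indices, codebook = quantize(np.random.randn(1000).astype(np.float32), bits=2)
```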

Step 2.3, lossless compression

The compression process is divided into two parts: first, encoding the quantized local gradient $\tilde{\Delta}_{i,j+1}$ obtained in step 2.2; second, compressing the binarization mask matrix $M_{i,j+1}$ generated in step 2.1 with a sparse matrix representation method; the encoding itself is divided into two steps:

firstly, constructing a binary tree:

and 2.2, according to the frequency of each index value in all the index values in the step 2.2, taking two index values with the lowest frequency to construct an initial binary tree, adding the frequency values of the two to be used as the frequency value of a new element, comparing the frequency values with other index values, sequentially taking two smallest continuous additions, and constructing the whole binary tree with the weight (namely the frequency value).

Since most of the gradients produced by neural network training are close to zero, the frequencies of the index values in step 2.2 also differ greatly, so the quantized gradient is further compressed by this encoding.

Secondly, coding is carried out:

Each left branch of the binary tree is encoded as 0 and each right branch as 1, and the tree is traversed to obtain the codes of all symbols; the more frequently a symbol occurs, the higher it sits in the tree and the shorter its code, while the less frequently it occurs, the lower it sits and the longer its code, so that the total storage space is reduced as much as possible.
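This two-step procedure is the classical Huffman construction. A compact sketch is given below: the weighted binary tree is built by repeatedly merging the two lowest-frequency elements, and codes are read off by labelling left branches 0 and right branches 1 (the helper names are illustrative).

```python
# Huffman coding of the step 2.2 index values (step 2.3, gradient encoding).
import heapq
from collections import Counter

def huffman_codes(symbols):
    """Return {symbol: bit string}; frequent symbols get shorter codes."""
    freq = Counter(symbols)
    # Heap entries: (frequency, tie-breaker, tree); tree = symbol or (left, right)
    heap = [(f, i, s) for i, (s, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:                       # merge the two rarest elements
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, count, (left, right)))
        count += 1
    codes = {}
    def walk(tree, prefix=""):
        if isinstance(tree, tuple):            # left branch -> 0, right -> 1
            walk(tree[0], prefix + "0")
            walk(tree[1], prefix + "1")
        else:
            codes[tree] = prefix or "0"        # single-symbol edge case
    walk(heap[0][2])
    return codes

# Index 0 dominates, so it receives the shortest code:
print(huffman_codes([0, 0, 0, 0, 1, 1, 2, 3]))
```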

Step 3, pipeline compression transmission

The quantized local gradient and the compressed binarization mask matrix are uploaded to the central server; by adopting a multi-process-based pipeline working mode, the computation delay required for compressing the gradient can be reduced to a negligible level. The two tasks of step 2, gradient compression and gradient uploading, are overlapped through a pipeline mechanism; a schematic diagram is shown in FIG. 2.

When a plurality of vehicle-mounted devices perform local model training, as soon as a given vehicle-mounted device completes the gradient compression of the first layer of its neural network, the compressed gradient of that first layer can be transmitted immediately; meanwhile the remaining layers are gradient-compressed and then transmitted, and the other vehicle-mounted devices repeat the operation in pipeline fashion, so that the locally updated models are quickly uploaded to the cloud server.
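A minimal threading sketch of this overlap follows; `compress_layer` and `upload` are hypothetical stand-ins for the per-layer structured compression of step 2 and the network transmission, with a queue handing each layer's result to the uploader as soon as it is ready.

```python
# Pipelined compress-and-upload of step 3: layer l+1 is compressed
# while layer l is being transmitted.
import queue
import threading

def pipeline_upload(layers, compress_layer, upload):
    q = queue.Queue()

    def producer():                        # compression thread
        for layer in layers:
            q.put(compress_layer(layer))   # first layer sent as soon as ready
        q.put(None)                        # end-of-model marker

    t = threading.Thread(target=producer)
    t.start()
    while (packet := q.get()) is not None:
        upload(packet)                     # overlaps with compression above
    t.join()

# Toy usage: "compress" each layer to its size and "upload" by printing it
pipeline_upload([list(range(n)) for n in (100, 50, 10)], len, print)
```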

Step 4, neuron-by-neuron polymerization

After the vehicle-mounted devices complete compression and uploading of the local model gradients, the central server performs gradient aggregation. Because the gradients uploaded by the edge devices have been pruned to different degrees, they cannot be aggregated directly; the scheme therefore provides a neuron-by-neuron gradient aggregation method.

Specifically, let the superscript k index the gradient parameters, so that the gradient of the uncompressed local model has K parameters in total. In the j-th global iteration, the compressed local gradient uploaded by vehicle-mounted device i is denoted $\tilde{\Delta}_{i,j+1}$ and the corresponding mask matrix is $M_{i,j+1}$. The global aggregation gradient $\Delta_j$ can be aggregated neuron by neuron and is denoted $\Delta_j = (\Delta_j^{(1)}, \dots, \Delta_j^{(K)})$; each element of the global gradient can be calculated by:

$$\Delta_j^{(k)} = \frac{\sum_i D_i \, M_{i,j+1}^{(k)} \, \tilde{\Delta}_{i,j+1}^{(k)}}{\sum_i D_i \, M_{i,j+1}^{(k)}}$$

where $D_i$ denotes the local data size of the i-th vehicle-mounted device.

Through neuron-by-neuron aggregation, the global aggregation gradient is obtained, as shown in FIG. 3. In the global aggregation gradient, the weight of each node comes from different vehicle-mounted devices; some weights were pruned during compression, so only the retained weights are aggregated, by weighted averaging, as the update weights of the global aggregation gradient, and a weight with no update keeps a gradient value of zero. In engineering practice the method can be implemented in vectorized, parallel form, so the delay overhead of gradient aggregation is negligible.
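A vectorized sketch of this aggregation rule follows, using the weighted-average formula above; the flattened (num_devices, K) layout and the array names (D for local data sizes) are illustrative.

```python
# Neuron-by-neuron aggregation of step 4: weighted average over the
# devices whose masks retained each parameter; untouched entries stay 0.
import numpy as np

def aggregate(grads: np.ndarray, masks: np.ndarray, D: np.ndarray) -> np.ndarray:
    """grads, masks: (num_devices, K); D: (num_devices,). Returns Delta_j."""
    weights = D[:, None] * masks                 # pruned weights drop out
    denom = weights.sum(axis=0)                  # per-neuron normalizer
    num = (weights * grads).sum(axis=0)
    return np.divide(num, denom, out=np.zeros_like(num), where=denom > 0)

# Two devices, K = 4; device 2 pruned parameters 2 and 3
g = np.array([[0.2, 0.4, -0.1, 0.0], [0.6, 0.0, 0.0, 0.8]])
m = np.array([[1, 1, 1, 1], [1, 0, 0, 1]])
print(aggregate(g, m, np.array([100.0, 300.0])))  # -> [0.5, 0.4, -0.1, 0.6]
```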

Step 5, the operations above, local training, structured compression, pipelined uploading, and neuron-by-neuron aggregation of the model, complete the joint learning process of the vehicle-mounted sensing devices; each vehicle-mounted device acquires the global aggregation gradient, updates its local model, and uses the updated model for road perception, so the real-time updated model improves road perception performance.

Based on neural network structure optimization and targeting the limited computing, communication, and data resources of the edge computing environment, a joint learning framework based on model structure compression is designed. It is optimized from the computational angle, reducing the computation overhead required for localized training on vehicle-mounted edge devices and shortening the convergence time of joint learning, while providing privacy protection for the vehicle-mounted edge devices and greatly reducing the bandwidth resources consumed during training. A neuron-by-neuron joint learning method is adopted during model aggregation, solving the problem that structurally incomplete models cannot be aggregated directly; this finally improves joint learning efficiency and realizes optimal utilization of local resources.

Secondly, the elastic joint learning framework provided by the invention can adopt different gradient compression ratios for different vehicle-mounted edge devices, optimizing joint learning according to each terminal device's hardware configuration, channel conditions, and training data size. In this way, the complexity of model inference can be reduced, so that edge terminals can effectively perform localized inference with lightweight neural networks; multiple submodels can be scheduled effectively at the central server, balancing the utilization of training data across different submodels and effectively coping with the heterogeneous communication and computing resources of the edge computing environment. Meanwhile, a training node can choose a smaller gradient compression ratio when resources are sufficient, improving the global model accuracy of a single global iteration, and a larger gradient compression ratio when resources are scarce, greatly reducing the bandwidth resources required for model transmission.

Finally, building on the prior art, the invention uses a multithreading-based pipeline compression transmission method when uploading model parameters; by overlapping the two tasks through a pipeline mechanism, model compression and transmission proceed almost synchronously, the delay overhead introduced by gradient compression is reduced to a negligible level, communication speed is greatly improved, and communication cost is reduced.

The above embodiments are only used to illustrate the technical solutions of the present application, and not to limit the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not substantially depart from the spirit and scope of the embodiments of the present application and are intended to be included within the scope of the present application.
