Federated learning method robust to data pollution in a wireless edge network

Document No.: 1938685  Publication date: 2021-12-07

Note: This technology, "Federated learning method robust to data pollution in a wireless edge network" (无线边缘网络中对数据污染具有鲁棒性的联邦学习方法), was designed and created by 李文玲, 李钰浩, and 刘杨 on 2021-09-07. Its main content comprises the following steps: build the model structure and initialize the global parameters; the central server broadcasts the global parameters to the clients of the wireless edge network, and each client takes them as the initial values for the current training round; each client computes the gradient value $g_i^t$ and further updates the deviation coefficient $\beta_i^t$; each client updates the iteration coefficient $a^t$; each client updates the first-order momentum $m_i^t$ and the second-order momentum $v_i^t$; each client updates the model parameters $x_i^t$; steps three to six are repeated until the number of iterations reaches a preset value; each client uploads its local parameters to the central server; the central server receives and aggregates the clients' local parameters to obtain the updated global parameters; steps two to nine are repeated until the performance of the global model meets the requirement. This federated learning method improves the robustness of the algorithm in the face of poisoned data and reduces the performance impact caused by differences among local models.

1. A federated learning method robust to data pollution in a wireless edge network is characterized by comprising the following steps:

step one: building a model structure for learning and initializing the global parameters, wherein the global parameters comprise: the global model parameters, the global first-order momentum, and the global second-order momentum;

step two: the central server broadcasts the global parameters to the clients of the wireless edge network, and each client takes the global parameters as the initial values for the current round of training;

step three: each client uses the historical model parameters $x_i^{t-1}$ to compute a gradient value $g_i^t$ on its local data set and obtains the deviation coefficient $\beta_i^t$ between the gradient value and the historical first-order momentum $m_i^{t-1}$;

step four: each client updates the second-order momentum iteration coefficient $a^t$;

step five: each client uses the deviation coefficient $\beta_i^t$, the iteration coefficient $a^t$, the gradient value $g_i^t$, and the historical momentum values $m_i^{t-1}$ and $v_i^{t-1}$ to update the first-order momentum $m_i^t$ and the second-order momentum $v_i^t$;

step six: each client updates the model parameters $x_i^t$ using the updated first-order and second-order momentum;

step seven: repeating steps three to six until the number of iterations reaches a preset iteration threshold;

step eight: each client uploads its local model parameters $x_i^t$, first-order momentum $m_i^t$, and second-order momentum $v_i^t$ to the central server;

step nine: the central server receives the clients' local parameters and performs parameter aggregation to obtain the updated global parameters $x^t$, $m^t$, $v^t$;

step ten: repeating steps two to nine until the performance of the global model meets the requirement.

2. The federated learning method as claimed in claim 1, wherein in step two each client takes the global parameters as the initial values of the current training round, expressed as follows:

$x_i^{t'} = x^{t'}, \quad m_i^{t'} = m^{t'}, \quad v_i^{t'} = v^{t'}$

where the subscript $i$ denotes the $i$-th client, the superscript $t'$ denotes the initial time of the current training round, $x^{t'}$ is the global model parameter at the initial time, $m^{t'}$ is the global first-order momentum at the initial time, and $v^{t'}$ is the global second-order momentum at the initial time.

3. The federated learning method as claimed in claim 1, wherein the deviation coefficient $\beta_i^t$ in step three is computed componentwise from the gradient value and the historical first-order momentum $m_i^{t-1}$, the gradient value being

$g_i^t = \nabla f_i(x_i^{t-1}; \xi_i^t), \quad \xi_i^t \subseteq D_i$

where the subscript $i$ denotes the $i$-th client, the superscript $t$ denotes the current iteration time, $d$ is the vector dimension, the subscript $j$ denotes the $j$-th component of a vector, $g$ denotes the gradient value, $m$ the first-order momentum, and $v$ the second-order momentum; $\xi_i^t$ is the data randomly sampled by the $i$-th client at time $t$, $D_i$ is the local data set of the $i$-th client, $x_i^{t-1}$ is the model parameter of the $i$-th client at time $t-1$, and $f_i$ is the local loss function of the $i$-th client.

4. The federated learning method as claimed in claim 1, wherein the iteration coefficient $a^t$ in step four is updated as a function of the iteration time $t$, such that $a^t$ approaches 1 as the iterations proceed,

where $\gamma$ is a predetermined constant of the update.

5. The federated learning method as claimed in claim 1, wherein the first-order momentum $m_i^t$ and the second-order momentum $v_i^t$ in step five are updated as follows:

$m_i^t = \beta_i^t\, m_i^{t-1} + (1 - \beta_i^t)\, g_i^t, \qquad v_i^t = a^t\, v_i^{t-1} + (1 - a^t)\, (g_i^t)^2$

6. The federated learning method as claimed in claim 1, wherein the model parameters $x_i^t$ in step six are updated as follows:

$x_i^t = x_i^{t-1} - \dfrac{\alpha}{\sqrt{v^{t'}}}\, m_i^t$

where $v^{t'}$ is the global second-order momentum at the initial time and $\alpha$ is the preset global learning rate.

7. The federated learning method as claimed in claim 1, wherein the parameter aggregation in step nine is a weighted average over the first-order momentum $m_i^t$, the second-order momentum $v_i^t$, and the model parameters $x_i^t$, specifically:

$x^t = \sum_{i=1}^{N} p_i\, x_i^t, \qquad m^t = \sum_{i=1}^{N} p_i\, m_i^t, \qquad v^t = \sum_{i=1}^{N} p_i\, v_i^t$

where $p_i$ is the weight of the $i$-th client and $N$ is the number of clients.

Technical Field

The invention belongs to the field of federated learning, and particularly relates to a federated learning method for polluted client data sets in a wireless edge network.

Background

Data is the foundation of machine learning: as the main direction of artificial intelligence, machine learning requires data to train artificial intelligence models. In most industries, however, owing to industry competition, privacy and security concerns, complex administrative procedures, and similar problems, data often exist as isolated islands, and the performance of a model trained only on the data within a single island frequently cannot meet the task requirements. The federated learning framework emerged in response to this dilemma of data islands and data privacy.

Under the federated learning framework there are a central server and multiple mutually independent clients, each holding local data that differ from one another and cannot be shared. During training, the server broadcasts the global parameters to the clients; each client trains on its own data set starting from the downloaded global model parameters, then uploads only its local parameters to the server for aggregation, and the final model parameters are obtained through repeated download-train-upload-aggregate rounds. Under this framework the clients' data are protected and the data-island problem is resolved.

The classic method of federated learning is federated averaging: after each client uploads its parameters to the server, the server takes a weighted average of the local parameters to obtain the global parameters and then broadcasts them back to each client. The Adam algorithm, a refinement of SGD, has the advantages of fast convergence and easily tuned hyper-parameters: it builds first-order and second-order momentum from gradient information, which lets the parameters converge quickly and the learning rate adapt automatically, so Adam is widely used for the local training step of federated learning. In a practical scenario, however, if a local client's data set is polluted by a network attack or for other reasons, the stochastic gradients computed during training inevitably contain abnormal values; and because Adam's first-order and second-order momentum depend directly on the gradient values at every parameter update, its robustness to such outliers is extremely poor. In addition, the local models produced by different clients' training usually differ, and a global model aggregated under such differences has unstable performance.
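By way of illustration only (this sketch is not from the patent), the following minimal NumPy example shows a standard Adam step and how a single poisoned gradient skews both momentum terms for many subsequent steps; all names and constants are illustrative.

```python
import numpy as np

def adam_step(x, m, v, g, t, lr=0.01, b1=0.9, b2=0.999, eps=1e-8):
    """One standard Adam update; m and v inherit any outlier in g."""
    m = b1 * m + (1 - b1) * g        # first-order momentum
    v = b2 * v + (1 - b2) * g**2     # second-order momentum
    m_hat = m / (1 - b1**t)          # bias corrections
    v_hat = v / (1 - b2**t)
    x = x - lr * m_hat / (np.sqrt(v_hat) + eps)
    return x, m, v

x, m, v = np.zeros(3), np.zeros(3), np.zeros(3)
clean = np.array([0.1, -0.2, 0.05])
for t in range(1, 11):
    g = clean if t != 5 else clean + 100.0   # one poisoned gradient at t = 5
    x, m, v = adam_step(x, m, v, g, t)
print(m, v)   # both momenta remain skewed well after the single outlier
```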

Disclosure of Invention

In view of this, the present invention provides a federated learning method robust to data pollution in a wireless edge network, so as to improve the robustness of the algorithm in the face of poisoned data and to reduce the performance impact caused by differences among local models.

The specific technical scheme is as follows:

a federated learning method robust to data pollution in a wireless edge network comprises the following steps:

step one: building a model structure for learning and initializing the global parameters, wherein the global parameters comprise: the global model parameters, the global first-order momentum, and the global second-order momentum;

step two: the central server broadcasts the global parameters to the clients of the wireless edge network, and each client takes the global parameters as the initial values for the current round of training;

step three: each client uses the historical model parameters $x_i^{t-1}$ to compute a gradient value $g_i^t$ on its local data set and obtains the deviation coefficient $\beta_i^t$ between the gradient value and the historical first-order momentum $m_i^{t-1}$;

step four: each client updates the second-order momentum iteration coefficient $a^t$;

step five: each client uses the deviation coefficient $\beta_i^t$, the iteration coefficient $a^t$, the gradient value $g_i^t$, and the historical momentum values $m_i^{t-1}$ and $v_i^{t-1}$ to update the first-order momentum $m_i^t$ and the second-order momentum $v_i^t$;

step six: each client updates the model parameters $x_i^t$ using the updated first-order and second-order momentum;

step seven: repeating steps three to six until the number of iterations reaches the preset iteration threshold;

step eight: each client uploads its local model parameters $x_i^t$, first-order momentum $m_i^t$, and second-order momentum $v_i^t$ to the central server;

step nine: the central server receives the clients' local parameters and performs parameter aggregation to obtain the updated global parameters $x^t$, $m^t$, $v^t$;

step ten: repeating steps two to nine until the performance of the global model meets the requirement.

In step two, each client takes the global parameters as the initial values of the current training round, expressed as follows:

$x_i^{t'} = x^{t'}, \quad m_i^{t'} = m^{t'}, \quad v_i^{t'} = v^{t'}$

where the subscript $i$ denotes the $i$-th client, the superscript $t'$ denotes the initial time of the current training round, $x^{t'}$ is the global model parameter at the initial time, $m^{t'}$ is the global first-order momentum at the initial time, and $v^{t'}$ is the global second-order momentum at the initial time.

The deviation coefficient $\beta_i^t$ in step three is computed componentwise from the gradient value and the historical first-order momentum $m_i^{t-1}$, the gradient value being

$g_i^t = \nabla f_i(x_i^{t-1}; \xi_i^t), \quad \xi_i^t \subseteq D_i$

where the subscript $i$ denotes the $i$-th client, the superscript $t$ denotes the current iteration time, $d$ is the vector dimension, the subscript $j$ denotes the $j$-th component of a vector, $g$ denotes the gradient value, $m$ the first-order momentum, and $v$ the second-order momentum; $\xi_i^t$ is the data randomly sampled by the $i$-th client at time $t$, $D_i$ is the local data set of the $i$-th client, $x_i^{t-1}$ is the model parameter of the $i$-th client at time $t-1$, and $f_i$ is the local loss function of the $i$-th client.
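The patent gives the update formula of the deviation coefficient only as an image, so the concrete form below is an assumption: a normalized componentwise distance between the sampled gradient and the previous first-order momentum, averaged over the $d$ components, which lies in $[0, 1]$ and approaches 1 for outlier gradients as the text requires. A minimal sketch:

```python
import numpy as np

def deviation_coefficient(g, m_prev, eps=1e-12):
    """Assumed form of the deviation coefficient beta_i^t in [0, 1]:
    a normalized componentwise distance between the sampled gradient
    g_i^t and the historical first-order momentum m_i^{t-1}, averaged
    over the d vector components; it approaches 1 when g deviates
    strongly from m_prev, i.e. when g looks like an outlier."""
    num = np.abs(g - m_prev)
    den = np.abs(g) + np.abs(m_prev) + eps
    return float(np.mean(num / den))

rng = np.random.default_rng(0)
m_prev = np.full(4, 0.1)
g_clean = m_prev + 0.01 * rng.standard_normal(4)
g_poison = m_prev + 50.0                         # gradient from polluted data
print(deviation_coefficient(g_clean, m_prev))    # small -> trust the gradient
print(deviation_coefficient(g_poison, m_prev))   # near 1 -> distrust it
```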

The iteration coefficient $a^t$ in step four is updated as a function of the iteration time $t$, such that $a^t$ approaches 1 as the iterations proceed,

where $\gamma$ is a predetermined constant of the update.

The first-order momentum $m_i^t$ and the second-order momentum $v_i^t$ in step five are updated as follows:

$m_i^t = \beta_i^t\, m_i^{t-1} + (1 - \beta_i^t)\, g_i^t, \qquad v_i^t = a^t\, v_i^{t-1} + (1 - a^t)\, (g_i^t)^2$
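A sketch of steps four and five under the same caveat: the convex-combination forms follow the patent's own limiting analysis ($\beta_i^t \to 1$ freezes $m$, $a^t \to 1$ freezes $v$ late in training), while the schedule $a^t = 1 - \gamma/t$ is an assumed choice, used only because it depends on nothing but $t$ and a constant $\gamma$ and tends to 1.

```python
def iteration_coefficient(t, gamma=0.9):
    # Assumed schedule: a function of the iteration time t and the
    # predetermined constant gamma that approaches 1 as t grows.
    return 1.0 - gamma / t

def update_momenta(m_prev, v_prev, g, beta, a):
    # beta -> 1 gives m = m_prev, so an outlier gradient cannot flip
    # the update direction (the patent's robustness argument).
    m = beta * m_prev + (1.0 - beta) * g
    # a -> 1 late in training makes v nearly constant, so the adaptive
    # step no longer blows up when gradients become small.
    v = a * v_prev + (1.0 - a) * g**2
    return m, v
```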

The model parameters $x_i^t$ in step six are updated as follows:

$x_i^t = x_i^{t-1} - \dfrac{\alpha}{\sqrt{v^{t'}}}\, m_i^t$

where $v^{t'}$ is the global second-order momentum at the initial time and $\alpha$ is the preset global learning rate.
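A sketch of step six; the square root and the small `eps` are Adam-style assumptions, while the point taken from the text is that the denominator is the global second-order momentum $v^{t'}$ fixed at the start of the round, not the client's own $v_i^t$.

```python
import numpy as np

def local_update(x_prev, m, v_global_init, alpha=0.01, eps=1e-8):
    # Every client divides by the same global v^{t'}, fixed within the
    # round, so their effective step sizes match and the local models
    # drift apart less than under per-client adaptive rates.
    return x_prev - alpha * m / (np.sqrt(v_global_init) + eps)
```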

In step nine the parameter aggregation is a weighted average over the first-order momentum $m_i^t$, the second-order momentum $v_i^t$, and the model parameters $x_i^t$, specifically:

$x^t = \sum_{i=1}^{N} p_i\, x_i^t, \qquad m^t = \sum_{i=1}^{N} p_i\, m_i^t, \qquad v^t = \sum_{i=1}^{N} p_i\, v_i^t$

where $p_i$ is the weight of the $i$-th client and $N$ is the number of clients.
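Step nine as code; normalizing the weights $p_i$ (e.g. proportionally to local data set sizes, a common federated-averaging choice) is an assumption, since the weights are not specified here.

```python
import numpy as np

def aggregate(params, weights):
    """Weighted average of per-client vectors: sum_i p_i * params[i].
    `params` is a list of N equally shaped arrays (x_i, m_i or v_i)."""
    p = np.asarray(weights, dtype=float)
    p = p / p.sum()                    # normalize so the p_i sum to 1
    return sum(pi * xi for pi, xi in zip(p, params))

print(aggregate([np.ones(3), 3 * np.ones(3)], weights=[0.5, 0.5]))  # [2. 2. 2.]
```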

Compared with the prior art, the invention has the following beneficial effects:

1. The invention uses the deviation coefficient $\beta_i^t$ to detect abnormal gradient values and to control the update direction when an abnormal value occurs. Concretely, when $g_i^t$ is an abnormal value, $\beta_i^t$ approaches 1 and hence $m_i^t \approx m_i^{t-1}$, so the update direction of the model parameters is not influenced by the abnormal value. This embodies the robustness of the method to abnormal gradient values and reduces the influence of poisoned data on model performance.

2. The invention uses the iteration coefficient $a^t$ to reduce the dependence of the second-order momentum on the gradient value in the later iteration stage, when the second-order momentum is adjusting the learning rate. In the later stage of training $a^t$ approaches 1, so $v_i^t \approx v_i^{t-1}$; this solves the problem of an overly large learning rate caused by small gradient values late in training and removes the influence of abnormal values, embodying the robustness of the method to abnormal gradient values and improving model performance.

3. The invention takes the global second-order momentum as the denominator of the updating step length, uses the same learning rate in the local training process of different clients, and reduces the difference of local models in the updating process, thereby ensuring that the model performance is more stable.

Drawings

Fig. 1 is a schematic structural diagram of a wireless edge network according to the present invention.

Fig. 2 is a flow chart of the federal learning method robust to data pollution in a wireless edge network of the present invention.

FIG. 3 is a comparison of experimental results of the method of the present invention and prior art.

Detailed Description

The invention is described in further detail below with reference to the figures and examples.

Fig. 1 is a system structure diagram of the wireless edge network according to the present invention, which includes a central server and N clients. The data are distributed among the N clients, and the clients and the server transmit only parameters, never data; the server maintains the global model and each client maintains a local model. To obtain a global model with better performance, the model is trained by federated learning.

Fig. 2 is a flow chart of the federated learning method robust to data pollution in a wireless edge network of the present invention. The global parameters are first initialized and broadcast, and the N clients perform local training on their local data sets using the downloaded parameters. After local training, the clients upload their local parameters to the server, which takes a weighted average of the parameters and evaluates the resulting global model; if the performance requirement is met the algorithm ends, otherwise the loop continues. The method specifically comprises the following steps:

step one: building a model structure for learning and initializing the global parameters, wherein the global parameters comprise: the global model parameters, the global first-order momentum, and the global second-order momentum;

step two: the central server broadcasts the global parameters to the client devices of the wireless edge network, and each client takes the global parameters as the initial values for the current round of training;

step three: each client uses the historical model parameters $x_i^{t-1}$ to compute a gradient value $g_i^t$ on its local data set and obtains the deviation coefficient $\beta_i^t$ between the gradient value and the historical first-order momentum $m_i^{t-1}$;

step four: each client further updates the second-order momentum iteration coefficient $a^t$;

step five: each client uses the deviation coefficient $\beta_i^t$, the iteration coefficient $a^t$, the gradient value $g_i^t$, and the historical momentum values $m_i^{t-1}$ and $v_i^{t-1}$ to update the first-order momentum $m_i^t$ and the second-order momentum $v_i^t$;

step six: each client updates the model parameters $x_i^t$ using the updated first-order and second-order momentum;

step seven: repeating steps three to six until the number of iterations reaches the preset iteration threshold;

step eight: each client uploads its local model parameters $x_i^t$, first-order momentum $m_i^t$, and second-order momentum $v_i^t$ to the central server;

step nine: the central server receives the clients' local parameters and performs parameter aggregation to obtain the updated global parameters $x^t$, $m^t$, $v^t$;

step ten: repeating steps two to nine until the performance of the global model meets the requirement.

The process of local training is explained as follows:

when the local training starts, the client acquires global parameters including a global model parameter, a global first-order momentum and a global second-order momentum as initial parameter values of the local training:

$x_i^{t'} = x^{t'}, \quad m_i^{t'} = m^{t'}, \quad v_i^{t'} = v^{t'}$

where the subscript $i$ denotes the $i$-th client, the superscript $t'$ denotes the initial time of the current training round, $x^{t'}$ is the global model parameter at the initial time, $m^{t'}$ is the global first-order momentum at the initial time, and $v^{t'}$ is the global second-order momentum at the initial time.

Taking the $i$-th client as an example, at the beginning of each iteration partial data $\xi_i^t$ are randomly sampled from the local data set $D_i$, and the gradient value is computed as $g_i^t = \nabla f_i(x_i^{t-1}; \xi_i^t)$, where $x_i^{t-1}$ is the model parameter of the $i$-th client at time $t-1$ and $f_i$ is its local loss function. The gradient value and the previous first-order momentum $m_i^{t-1}$ are then used to construct the deviation coefficient $\beta_i^t$.

Here the subscript $i$ denotes the $i$-th client, the superscript $t$ the current iteration time, $d$ the vector dimension, and the subscript $j$ the $j$-th component of a vector; $g$ denotes the gradient value, $m$ the first-order momentum, and $v$ the second-order momentum. The iteration time $t$ is then used to compute the iteration coefficient $a^t$,

where $\gamma$ is a predetermined constant. The first-order momentum and second-order momentum of the current iteration are then obtained as

$m_i^t = \beta_i^t\, m_i^{t-1} + (1 - \beta_i^t)\, g_i^t, \qquad v_i^t = a^t\, v_i^{t-1} + (1 - a^t)\, (g_i^t)^2$

It follows that when an abnormal gradient value occurs, the difference between $g_i^t$ and $m_i^{t-1}$ grows, so $\beta_i^t$ approaches 1 and $m_i^t$ stays close to $m_i^{t-1}$; the influence of the abnormal value on the update direction is suppressed, and the outlier is thereby controlled. Meanwhile, in the later stage of training $a^t$ approaches 1, which prevents the learning rate from becoming too large or too small because of abnormal values as the parameters approach the optimum, further strengthening the robustness of the algorithm.
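Plugging numbers into the assumed scalar update $m^t = \beta\, m^{t-1} + (1-\beta)\, g^t$ makes the damping concrete (values illustrative):

```python
m_prev, g = 0.10, 50.0    # a poisoned scalar gradient arrives
beta = abs(g - m_prev) / (abs(g) + abs(m_prev))   # ~0.996, near 1
m_t = beta * m_prev + (1 - beta) * g              # ~0.30
print(beta, m_t)   # vs. ~5.1 from a plain beta = 0.9 Adam-style update
```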

Using the local model parameters $x_i^{t-1}$ of the previous iteration, the global learning rate $\alpha$, the first-order momentum $m_i^t$, and the global second-order momentum $v^{t'}$ at the initial time, the local model parameter at the current time is computed as

$x_i^t = x_i^{t-1} - \dfrac{\alpha}{\sqrt{v^{t'}}}\, m_i^t$

Because the global second-order momentum serves as the denominator of the learning rate when the local parameters are updated, different clients share the same update step size; this reduces the differences among the local models and improves the performance of the global model.

When the number of local iterations reaches the preset value, the local parameters are uploaded and the models are fused:

$x^t = \sum_{i=1}^{N} p_i\, x_i^t, \qquad m^t = \sum_{i=1}^{N} p_i\, m_i^t, \qquad v^t = \sum_{i=1}^{N} p_i\, v_i^t$

where $p_i$ is the weight of the $i$-th client.
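Tying the sketches together: a toy driver that reuses the helper functions sketched above (`deviation_coefficient`, `iteration_coefficient`, `update_momenta`, `local_update`, `aggregate`) on synthetic least-squares clients, one of them polluted. Everything here, including the warm start that our assumed $\beta$ needs at $m = 0$, is illustrative rather than the patented procedure.

```python
import numpy as np

rng = np.random.default_rng(1)

def make_client(n=200, d=5, poisoned=False):
    """Synthetic least-squares client; optionally label-polluted."""
    A = rng.standard_normal((n, d))
    y = A @ np.ones(d) + 0.1 * rng.standard_normal(n)
    if poisoned:
        y = y + 20.0 * rng.standard_normal(n)     # polluted labels
    return A, y

def stochastic_gradient(x, data, batch=32):
    A, y = data
    idx = rng.integers(0, len(y), size=batch)
    Ab, yb = A[idx], y[idx]
    return Ab.T @ (Ab @ x - yb) / batch

def client_round(x, m, v, v_init, data, n_iters=20):
    for t in range(1, n_iters + 1):               # steps three to seven
        g = stochastic_gradient(x, data)
        if t == 1 and not m.any():
            m = g      # warm start: the assumed beta degenerates at m = 0
        beta = deviation_coefficient(g, m)
        a = iteration_coefficient(t)
        m, v = update_momenta(m, v, g, beta, a)
        x = local_update(x, m, v_init)
    return x, m, v

d, N = 5, 4
x, m, v = np.zeros(d), np.zeros(d), np.ones(d)
clients = [make_client(poisoned=(i == 0)) for i in range(N)]
for _ in range(30):                               # steps two to nine
    out = [client_round(x, m, v, v, data) for data in clients]
    xs, ms, vs = zip(*out)
    x, m, v = (aggregate(list(s), [1.0 / N] * N) for s in (xs, ms, vs))
print(np.round(x, 2))   # should move toward the all-ones solution
```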

In practice, the MNIST handwritten digit training set is evenly distributed to ten clients, and Gaussian noise with mean 0 and variance 0.4 is added to the one-hot label of each picture with probability 50%; a logistic regression model is then trained. The results of the global model on the test set are shown in Fig. 3: the accuracy and stability of the proposed method are superior to the prior art. The method thus trains a model on a polluted data set while largely eliminating the influence of the poisoned data on model performance, and it is both accurate and stable.
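A sketch of the described data corruption, assuming MNIST is already available as NumPy arrays (loading is elided) and that the integer labels are one-hot encoded before the noise is injected.

```python
import numpy as np

rng = np.random.default_rng(0)

def corrupt_onehot_labels(y, num_classes=10, p=0.5, var=0.4):
    """Add zero-mean Gaussian noise of the given variance to the one-hot
    label of each sample independently with probability p, as in the
    described MNIST experiment; y holds integer class labels."""
    onehot = np.eye(num_classes)[y]                   # (n, 10) one-hot
    mask = rng.random(len(y)) < p                     # samples to poison
    noise = np.sqrt(var) * rng.standard_normal(onehot.shape)
    onehot[mask] += noise[mask]
    return onehot

def split_to_clients(X, y_onehot, n_clients=10):
    """Evenly partition the training set across the clients."""
    idx = rng.permutation(len(X))
    return [(X[s], y_onehot[s]) for s in np.array_split(idx, n_clients)]
```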

The above description is only exemplary of the present invention and should not be taken as limiting the scope of the present invention, and any modifications, equivalents, improvements and the like that are within the spirit and principle of the present invention should be included in the scope of the present invention.
