Improving private model utility by minimizing expected loss under noise


Reading note: This technology, "Improving private model utility by minimizing expected loss under noise," was created by K·雷诺, 莫尔钱, and J·瓜哈尔多 on 2021-04-28. Its main content: Improving the utility of the private model by minimizing the expected loss under noise is provided. Training of the model is performed to minimize the Expected Loss Under Noise (ELUN) while maintaining differential privacy. Noise is added to the weights of the machine learning model as random samples drawn from the noise distribution, the noise being added according to the privacy budget. ELUN is minimized by using a loss function that anticipates the noise added to the weights of the machine learning model, to find a point in the parameter space for which the loss is robust to the noise in the weights. The addition of noise and the minimization of ELUN are iterated until the weights converge and the optimization constraints are satisfied. The model is then utilized for arbitrary inputs while preserving the privacy of the training data used to train the model.

1. A method for training and utilizing a model to minimize Expected Loss Under Noise (ELUN) while maintaining differential privacy, the method comprising:

adding noise as random samples drawn from a noise distribution to weights of a machine learning model, the noise added according to a privacy budget;

minimizing ELUN by using a loss function that anticipates the noise added to the weights of the machine learning model to find a point in the parameter space for which the loss is robust to noise in the weights;

iteratively adding noise and minimizing ELUN until the weights converge and an optimization constraint is satisfied; and

utilizing the model for any input while protecting privacy of training data used to train the model.

2. The method of claim 1, wherein the noise comprises Laplacian noise.

3. The method of claim 1, wherein the noise comprises Gaussian noise.

4. The method of claim 1, wherein the noise is approximated via random samples drawn from a Laplace distribution.

5. The method of claim 1, wherein the machine learning model is a linear machine learning model.

6. The method of claim 1, wherein the machine learning model comprises one or more of a support vector machine, a Convolutional Neural Network (CNN), or a Deep Neural Network (DNN).

7. The method of claim 1, wherein minimizing ELUN comprises optimizing according to gradient descent.

8. The method of claim 1, wherein ELUN is denoted by $\tilde{\mathcal{L}}$ and is given by:

$$\tilde{\mathcal{L}}(\theta; x, y) = \mathbb{E}_{\theta' \sim \eta_{\theta}}\left[\mathcal{L}(\theta'; x, y)\right] = \int \mathcal{L}(\theta'; x, y)\,\eta_{\theta}(\theta')\,d\theta'$$

wherein $\mathcal{L}(\theta; x, y)$ is the loss function defined on the model parameters $\theta$ and a labeled point $(x, y)$; and $\eta_{c}$ is the distribution of noise over the possible model parameters, centered at $c$.

9. A system for training and utilizing models to minimize Expected Loss Under Noise (ELUN) while maintaining differential privacy, the system comprising:

a memory storing a machine learning model; and

a computing device programmed to:

add noise as random samples drawn from a noise distribution to weights of the machine learning model, the noise added according to a privacy budget;

minimize ELUN by using a loss function that anticipates the noise added to the weights of the machine learning model to find a point in the parameter space for which the loss is robust to noise in the weights;

iteratively add noise and minimize ELUN until the weights converge and an optimization constraint is satisfied; and

utilize the model for any input while protecting privacy of training data used to train the model.

10. The system of claim 9, wherein the noise comprises one or more of Laplacian noise or Gaussian noise.

11. The system of claim 9, wherein the noise is approximated via random samples drawn from a Laplace distribution.

12. The system of claim 9, wherein the machine learning model comprises one or more of a linear machine learning model, a support vector machine, a Convolutional Neural Network (CNN), or a Deep Neural Network (DNN).

13. The system of claim 9, wherein minimizing ELUN comprises optimizing according to gradient descent.

14. The system of claim 9, wherein ELUN is denoted by $\tilde{\mathcal{L}}$ and is given by:

$$\tilde{\mathcal{L}}(\theta; x, y) = \mathbb{E}_{\theta' \sim \eta_{\theta}}\left[\mathcal{L}(\theta'; x, y)\right] = \int \mathcal{L}(\theta'; x, y)\,\eta_{\theta}(\theta')\,d\theta'$$

wherein $\mathcal{L}(\theta; x, y)$ is the loss function defined on the model parameters $\theta$ and a labeled point $(x, y)$; and $\eta_{c}$ is the distribution of noise over the possible model parameters, centered at $c$.

15. A non-transitory computer-readable medium comprising instructions for training and utilizing a model to minimize Expected Loss Under Noise (ELUN) while maintaining differential privacy, wherein the instructions, when executed by a processor, cause the processor to perform operations comprising:

adding noise as random samples drawn from a noise distribution to weights of a machine learning model, the noise added according to a privacy budget;

minimizing ELUN by using a loss function that anticipates the noise added to the weights of the machine learning model to find a point in the parameter space for which the loss is robust to noise in the weights;

iteratively adding noise and minimizing ELUN until the weights converge and an optimization constraint is satisfied; and

utilizing the model for any input while protecting privacy of training data used to train the model.

16. The medium of claim 15, wherein the noise comprises one or more of Laplacian noise or Gaussian noise.

17. The medium of claim 15, wherein the noise is approximated via random samples drawn from a Laplace distribution.

18. The medium of claim 15, wherein the machine learning model is a linear machine learning model.

19. The medium of claim 15, wherein minimizing ELUN comprises optimizing according to gradient descent.

20. The medium of claim 15, wherein ELUN is denoted by $\tilde{\mathcal{L}}$ and is given by:

$$\tilde{\mathcal{L}}(\theta; x, y) = \mathbb{E}_{\theta' \sim \eta_{\theta}}\left[\mathcal{L}(\theta'; x, y)\right] = \int \mathcal{L}(\theta'; x, y)\,\eta_{\theta}(\theta')\,d\theta'$$

wherein $\mathcal{L}(\theta; x, y)$ is the loss function defined on the model parameters $\theta$ and a labeled point $(x, y)$; and $\eta_{c}$ is the distribution of noise over the possible model parameters, centered at $c$.

Technical Field

The present disclosure relates to improving private model utility by minimizing the expected loss under noise.

Background

As machine learning has become ubiquitous even in privacy-sensitive areas, recent research has demonstrated specific privacy threats, as well as explored robust privacy defenses, most notably differential privacy. When machine learning algorithms are applied to private training data, the resulting model may inadvertently reveal information about the data through details of its behavior or its structure and parameters.

Disclosure of Invention

According to one or more illustrative examples, a method includes performing training of a model to minimize Expected Loss Under Noise (ELUN) while maintaining differential privacy. Noise is added to the weights of the machine learning model as random samples drawn from the noise distribution, the noise being added according to the privacy budget. ELUN is minimized by using a loss function that anticipates the noise added to the weights of the machine learning model to find a point in the parameter space for which the loss is robust to the noise in the weights. The addition of noise and minimization of ELUN are iterated until the weights converge and the optimization constraints are satisfied. The model is utilized for any input while preserving privacy of training data used to train the model.

According to one or more illustrative examples, a system for training and utilizing a model to minimize Expected Loss Under Noise (ELUN) while maintaining differential privacy includes: a memory storing a machine learning model; and a computing device. The computing device is programmed to: adding noise as random samples drawn from a noise distribution to weights of a machine learning model, the noise added according to a privacy budget; minimizing ELUN by using a loss function that predicts noise added to the weights of the machine learning model to find a point in the parameter space for which the loss is robust to noise in the weights; iteratively adding noise and minimizing ELUN until the weights converge and an optimization constraint is satisfied; and utilizing the model for any input while protecting privacy of training data used to train the model.

According to one or more illustrative examples, a non-transitory computer-readable medium includes instructions for training and utilizing a model to minimize Expected Loss Under Noise (ELUN) while maintaining differential privacy, which when executed by a processor, cause the processor to perform operations comprising: adding noise as random samples drawn from a noise distribution to weights of a machine learning model, the noise added according to a privacy budget; minimizing ELUN by using a loss function that predicts noise added to the weights of the machine learning model to find a point in the parameter space for which the loss is robust to noise in the weights; iteratively adding noise and minimizing ELUN until the weights converge and an optimization constraint is satisfied; and utilizing the model for any input while protecting privacy of training data used to train the model.

Drawings

FIG. 1 provides an example of a non-convex loss function;

FIG. 2 illustrates the expected loss of a simple one-dimensional logistic regression problem as a function of weight;

FIG. 3 illustrates a first algorithm for generating a differentially private model trained to minimize ELUN;

FIG. 4 illustrates an alternative algorithm for generating a differentially private model trained to minimize ELUN;

FIG. 5 illustrates the training and testing accuracy of linear models trained using different methods;

FIG. 6 illustrates an example process for training and utilizing a model to minimize expected loss under noise while maintaining differential privacy; and

FIG. 7 illustrates an example computing device.

Detailed Description

Embodiments of the present disclosure are described herein. However, it is to be understood that the disclosed embodiments are merely examples and that other embodiments may take various and alternative forms. The figures are not necessarily to scale; some features may be exaggerated or minimized to show details of particular components. Therefore, specific structural and functional details disclosed herein are not to be interpreted as limiting, but merely as a representative basis for teaching one skilled in the art to variously employ the embodiments. As one of ordinary skill in the art will appreciate, various features illustrated and described with reference to any one figure may be combined with features illustrated in one or more other figures to produce embodiments that are not explicitly illustrated or described. The combination of features illustrated provides a representative embodiment for a typical application. However, various combinations and modifications of the features consistent with the teachings of the present disclosure may be desired for particular applications or implementations.

Machine learning has become increasingly prevalent, including in sensitive areas where privacy is a concern. Previous work highlights privacy vulnerabilities in machine learning models; in particular, an adversary with access to a model can learn sensitive information about the private data on which the model was trained.

To combat privacy threats, a number of approaches have been proposed, most notably differential privacy, which gives provable privacy guarantees. A typical way to achieve differential privacy for linear machine learning models is to add noise to the weights of the model. Unfortunately, this noise can often significantly detract from the utility of the model.

While this utility trade-off may be unavoidable to some extent, it can be mitigated by finding a point in the parameter space for which the loss is robust to noise in the weights. This intuition can be built upon to improve the utility of the private model.

As discussed in detail herein, the described method involves three contributions. First, a novel loss function, Expected Loss Under Noise (ELUN), is described that extends any loss function to anticipate the noise that will be added to the linear model parameters. Second, theoretical analysis of ELUN indicates that a model trained to minimize ELUN can be made differentially private using the same amount of noise that would be needed for the original loss function; this directly implies the existence of a differentially private algorithm for training linear models with ELUN. Third, a practical algorithm is provided for obtaining a model that minimizes ELUN with differential privacy.

One way to achieve differential privacy for a linear machine learning model is to add noise to the weights of the model. Unfortunately, this noise can often significantly detract from the utility of the model. While this utility trade-off may be unavoidable to some extent, it may be possible to mitigate it by finding a point in the parameter space for which the loss is robust to noise in the weights.

More formally, a model can be trained that minimizes the expected loss under noise, i.e., that achieves the smallest possible loss in expectation when accounting for the uncertainty over the noisy weights. To do this, the following definition can be used:

Definition 1 (Expected Loss Under Noise). Let $\mathcal{L}(\theta; x, y)$ be a loss function defined on model parameters $\theta$ and a labeled point $(x, y)$, and let $\eta_{c}$ be the noise distribution over the possible model parameters, centered at $c$. The Expected Loss Under Noise (ELUN), denoted $\tilde{\mathcal{L}}$, is given by

$$\tilde{\mathcal{L}}(\theta; x, y) = \mathbb{E}_{\theta' \sim \eta_{\theta}}\left[\mathcal{L}(\theta'; x, y)\right] = \int \mathcal{L}(\theta'; x, y)\,\eta_{\theta}(\theta')\,d\theta'.$$

The standard distribution used in the context of differential privacy is the Laplace distribution, which has probability density function (PDF) $\mathrm{Lap}(z; c, b) = \frac{1}{2b}\exp\!\left(-\frac{|z - c|}{b}\right)$, centered at $c$ with scale $b$. Applying this noise distribution to Definition 1, the expected loss under Laplace noise is given by Equation 1:

$$\tilde{\mathcal{L}}(\theta; x, y) = \int \mathcal{L}(\theta'; x, y)\,\prod_{i} \frac{1}{2b}\exp\!\left(-\frac{|\theta'_{i} - \theta_{i}|}{b}\right)\,d\theta' \qquad (1)$$
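As a one-dimensional illustration of Equation 1, the integral can be evaluated numerically. The following is a minimal sketch, not part of the disclosed embodiments; the squared-error loss, the grid bounds, and the parameter values are illustrative assumptions.

```python
# Minimal sketch (illustrative values): numerically evaluates Equation 1 in one
# dimension for a toy squared-error loss L(theta; x, y) = (y - theta * x)^2,
# using Laplace noise with scale b centered at theta.
import numpy as np

def elun_1d(loss, theta, x, y, b, grid_half_width=30.0, grid_points=20001):
    """Approximate the integral of loss(theta'; x, y) * Lap(theta'; theta, b) d theta'."""
    thetas = np.linspace(theta - grid_half_width, theta + grid_half_width, grid_points)
    pdf = np.exp(-np.abs(thetas - theta) / b) / (2.0 * b)   # Laplace PDF centered at theta
    return np.trapz(loss(thetas, x, y) * pdf, thetas)

sq_loss = lambda t, x, y: (y - t * x) ** 2
print(elun_1d(sq_loss, theta=1.0, x=2.0, y=1.5, b=0.5))
# For squared error, ELUN exceeds the noise-free loss by 2 * b^2 * x^2
# (the Laplace noise variance times x^2), which this estimate can be checked against.
```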

Remark 1. The point that minimizes the ELUN $\tilde{\mathcal{L}}$ is not necessarily the same as the point that minimizes the original loss $\mathcal{L}$.

FIG. 1 provides an example of a non-convex loss function for which this is the case under Laplace noise with a suitably large scale. However, even for convex loss functions, for example in logistic regression, the optimal parameters may differ when using ELUN.

For example, consider a one-dimensional logistic regression problem in which the data are generated according to a distribution $\mathcal{D}$ defined as follows:

1. $y$ is drawn uniformly at random from $\{0, 1\}$.

2. $x$ is drawn according to $\mathcal{N}(\mu_{y}, \sigma^{2})$, i.e., from a normal distribution with mean $\mu_{y}$ and variance $\sigma^{2}$, where the mean $\mu_{y}$ depends on the label $y$.

For a linear logistic model $f_{w}(x) = \sigma(wx)$, the expected loss over $\mathcal{D}$ can be calculated as a function of $w$, because it is known how the data are generated; this is given by Equation 2. When $\mathcal{L}$ is chosen to be the binary cross-entropy, Equation 2 becomes Equation 3.

$$\mathbb{E}_{(x, y) \sim \mathcal{D}}\left[\mathcal{L}(w; x, y)\right] \qquad (2)$$

$$\mathbb{E}_{(x, y) \sim \mathcal{D}}\left[-y \log \sigma(wx) - (1 - y) \log\left(1 - \sigma(wx)\right)\right] \qquad (3)$$

Meanwhile, when Laplace noise with scale $b$ and the binary cross-entropy loss are used, the expected ELUN over $\mathcal{D}$ is given by Equation 4:

$$\mathbb{E}_{(x, y) \sim \mathcal{D}}\left[\tilde{\mathcal{L}}(w; x, y)\right] = \mathbb{E}_{(x, y) \sim \mathcal{D}}\left[\int \left(-y \log \sigma(w'x) - (1 - y) \log\left(1 - \sigma(w'x)\right)\right) \frac{1}{2b}\exp\!\left(-\frac{|w' - w|}{b}\right) dw'\right] \qquad (4)$$

where $\sigma$ denotes the logistic (sigmoid) function.

FIG. 2 illustrates the expected loss of the simple one-dimensional logistic regression problem 200 as a function of the weight $w$, for $\mu = 1$, $\sigma = 1.2$ (A), $1.0$ (B), and $0.8$ (C), and $b = 1.0$ and $2.0$. Notably, the optimal weight, i.e., the point at which each curve reaches its minimum, is greater when using ELUN (Equation 4) than when simply using the binary cross-entropy (Equation 3).

This is due to asymmetry in the binary cross-entropy loss function: at the noise-free optimal weight, slightly underestimating $w$ incurs a higher cost in loss than slightly overestimating it. Therefore, when noise is to be added to $w$, it is preferable to slightly overestimate the weight, so as to avoid the disproportionately high cost incurred when the noise results in too small a weight.
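The following sketch illustrates this effect numerically for the one-dimensional example above. It is not the patent's reference implementation; the choice of class means at $\pm\mu$, the parameter values, and the sample-based approximation of the expectations are assumptions made for illustration.

```python
# Minimal sketch: compares the noise-free expected binary cross-entropy with a
# sampled ELUN estimate for one-dimensional logistic regression, to show that the
# ELUN-optimal weight tends to be larger (assumed values: mu, sigma, b below).
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, b = 1.0, 1.0, 1.0          # assumed example parameters
R, N = 200, 5000                      # noise samples (ELUN) and data samples (expectation over D)

# Draw (x, y): y ~ Uniform{0, 1}, x ~ N(mu_y, sigma^2); class means assumed at +/- mu.
y = rng.integers(0, 2, size=N)
x = rng.normal(np.where(y == 1, mu, -mu), sigma)

def bce(w, x, y):
    """Binary cross-entropy of the linear logistic model sigma(w * x)."""
    p = np.clip(1.0 / (1.0 + np.exp(-w * x)), 1e-12, 1 - 1e-12)
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

def expected_loss(w):
    return bce(w, x, y).mean()

noise = rng.laplace(scale=b, size=R)   # fixed Laplace samples for the ELUN estimate

def elun(w):
    return np.mean([bce(w + d, x, y).mean() for d in noise])

ws = np.linspace(0.0, 5.0, 51)
w_plain = ws[np.argmin([expected_loss(w) for w in ws])]
w_elun = ws[np.argmin([elun(w) for w in ws])]
print(f"optimal w (expected loss): {w_plain:.2f}, optimal w (ELUN): {w_elun:.2f}")
# Typically w_elun >= w_plain, matching the asymmetry argument above.
```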

Differential privacy is a commonly used, strong privacy notion. In the context of machine learning, it is a property of a learning rule stating that the presence or absence of any particular training point does not significantly affect which model the rule learns. More formally, Definition 2 gives differential privacy (stated in the context of ML).

Definition 2 (Differential privacy (Dwork)). Let $M$ be a (randomized) mechanism that, given a data set $X$, returns a model $f$. $M$ is $\epsilon$-differentially private if, for all sets of models $F$, and for all neighboring data sets $X$ and $X'$ (data sets differing in a single point),

$$\Pr\left[M(X) \in F\right] \le e^{\epsilon}\,\Pr\left[M(X') \in F\right].$$

When a model $f$ is learned by an $\epsilon$-differentially private mechanism, $f$ can itself be said to be $\epsilon$-differentially private.

One common way to implement differential privacy is to add Laplace noise to the output of a non-private mechanism $M$. In the context of a linear machine learning model, this corresponds to adding noise to each weight of the trained model. (It should be noted that the use of a linear model is merely one example; other types of models, such as support vector machines, Convolutional Neural Networks (CNNs), or Deep Neural Networks (DNNs), may additionally or alternatively be used.) The scale of the noise is determined by the privacy budget $\epsilon$ and by the sensitivity of $M$, that is, the maximum amount by which the output of $M$ can differ on neighboring inputs.

Wu et al. use variants of strong uniform RO stability to bound the sensitivity of learning rules that learn linear models over a strongly convex, Lipschitz-continuous loss function. The result is summarized in Theorem 1.

Theorem 1 (Wu et al.). Let $M$ be a learning rule that minimizes a $\lambda$-strongly convex loss function $\mathcal{L}(\theta; x, y)$ (where a regularizer may be used to obtain strong convexity), such that for all $(x, y)$, $\mathcal{L}$ is $\rho$-Lipschitz with respect to $\theta$. The sensitivity of $M$ on a data set of size $n$ is bounded by $\frac{2\rho}{n\lambda}$.

Therefore, for a $\lambda$-strongly convex, $\rho$-Lipschitz loss function, $M$ can be made $\epsilon$-differentially private by adding Laplace noise with scale $b = \frac{2\rho}{n\lambda\epsilon}$ to each weight.
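A minimal sketch of this output-perturbation step follows, assuming the noise scale $\frac{2\rho}{n\lambda\epsilon}$ discussed above and illustrative parameter values; the function name and interface are hypothetical, not part of the disclosed embodiments.

```python
# Minimal sketch: adds Laplace noise, scaled by the sensitivity bound 2*rho/(n*lam),
# to the weights of an already-trained linear model to obtain an epsilon-DP model.
import numpy as np

def output_perturbation(weights, n, rho, lam, epsilon, rng=None):
    """weights: trained weight vector; n: training set size; rho: input norm bound;
    lam: strong-convexity (regularization) constant; epsilon: privacy budget."""
    if rng is None:
        rng = np.random.default_rng()
    b = 2.0 * rho / (n * lam * epsilon)          # Laplace scale from the sensitivity bound
    return weights + rng.laplace(scale=b, size=weights.shape)

# Usage (illustrative values): inputs assumed clipped to norm rho during preprocessing.
w = np.array([0.7, -1.3, 0.2])
w_private = output_perturbation(w, n=10_000, rho=1.0, lam=2.0, epsilon=0.1)
```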

In the case of logistic regression or softmax regression, commonly used for classification problems, $\mathcal{L}$ is the binary or categorical cross-entropy; when the inputs are bounded in norm by $\rho$, it is $\rho$-Lipschitz. In some cases such a bound exists naturally, for example for images with pixel values in the range [0, 1]; in other cases it may be achieved by a preprocessing step in which the values are clipped. The cross-entropy can be made strongly convex by adding a regularization term.

A note on clipping: the clipping bound $\rho$ should be chosen appropriately for the data set; however, care should be taken regarding the privacy impact of choosing $\rho$ based on the data. If $\rho$ can be selected a priori or can be assumed to be public, there is no privacy issue. If $\rho$ is selected based on the data, for example as the maximum norm of the data, then it may be desirable to select $\rho$ in a differentially private manner and to factor that selection into the privacy analysis.
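The clipping preprocessing mentioned above could look like the following minimal sketch, assuming $\rho$ is chosen a priori or is public (and therefore raises no privacy issue); the helper name is hypothetical.

```python
# Minimal sketch: scale each feature vector down so its L2 norm is at most rho.
import numpy as np

def clip_rows(X, rho):
    """Clip each row of X to have L2 norm at most rho."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    scale = np.minimum(1.0, rho / np.maximum(norms, 1e-12))   # avoid division by zero
    return X * scale
```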

Proposition 1. If the loss function $\mathcal{L}(\theta; x, y)$ is $\rho$-Lipschitz with respect to $\theta$ for all $(x, y)$, then the ELUN $\tilde{\mathcal{L}}(\theta; x, y)$ is also $\rho$-Lipschitz with respect to $\theta$ for all $(x, y)$.

Proof. Let $\eta_{c}(\theta')$ be the PDF of the noise distribution over the possible model parameters, centered at $c$, so that $\eta_{c}(\theta') = \eta_{0}(\theta' - c)$. It can be assumed that $\mathcal{L}$ is $\rho$-Lipschitz with respect to $\theta$ for all $(x, y)$, so that $|\mathcal{L}(\theta_{1}; x, y) - \mathcal{L}(\theta_{2}; x, y)| \le \rho\,\lVert\theta_{1} - \theta_{2}\rVert$. Let $\tilde{\mathcal{L}}$ be the ELUN. This gives, for all $\theta_{1}$, $\theta_{2}$, $x$, and $y$:

$$\left|\tilde{\mathcal{L}}(\theta_{1}; x, y) - \tilde{\mathcal{L}}(\theta_{2}; x, y)\right| = \left|\int \mathcal{L}(\theta'; x, y)\,\eta_{\theta_{1}}(\theta')\,d\theta' - \int \mathcal{L}(\theta'; x, y)\,\eta_{\theta_{2}}(\theta')\,d\theta'\right|$$

$$= \left|\int \left(\mathcal{L}(\theta_{1} + \delta; x, y) - \mathcal{L}(\theta_{2} + \delta; x, y)\right)\eta_{0}(\delta)\,d\delta\right| \qquad (5)$$

$$\le \int \left|\mathcal{L}(\theta_{1} + \delta; x, y) - \mathcal{L}(\theta_{2} + \delta; x, y)\right|\,\eta_{0}(\delta)\,d\delta \qquad (6)$$

$$\le \int \rho\,\lVert\theta_{1} - \theta_{2}\rVert\,\eta_{0}(\delta)\,d\delta \qquad (7)$$

$$= \rho\,\lVert\theta_{1} - \theta_{2}\rVert \qquad (8)$$

Equation 5 follows by re-indexing the integrals, since $\eta_{c}(\theta') = \eta_{0}(\theta' - c)$; Equation 6 follows because the absolute value of an integral is at most the integral of the absolute value; Equation 7 follows by the assumption that $\mathcal{L}$ is $\rho$-Lipschitz; and Equation 8 follows because $\eta_{0}$ is a probability measure. Therefore, $\tilde{\mathcal{L}}$ is $\rho$-Lipschitz with respect to $\theta$ for all $(x, y)$. $\square$

Thus, Theorem 1 can be applied to the ELUN with the same scale of noise that would be added for the original loss function. This gives a way to generate a differentially private model that is trained to minimize ELUN, as detailed in Algorithm 1 shown in FIG. 3. Because the sensitivity, and thus the scale of the noise that must be added, is identical for $\mathcal{L}$ and $\tilde{\mathcal{L}}$, the model learned by Algorithm 1 anticipates the exact amount of noise added to it. Thus, the resulting model is the optimal post-noise model with respect to the original loss function $\mathcal{L}$.

In general, the ELUN (Equation 1) for Laplace noise cannot be solved analytically. Numerical solutions are possible; however, in high dimensions computing the integral becomes intractable, because the effort required to compute it scales exponentially with the dimension. This means that it is not always possible to apply Algorithm 1 directly and efficiently. Thus, in practice, the ELUN is approximated, which can be done efficiently via sampling.

FIG. 4 illustrates Algorithm 2, which describes a practical alternative to Algorithm 1. Essentially, a resolution $R$ is selected, and $R$ random samples drawn from the Laplace distribution are used to approximate the expectation over the noise. In practice, the argmin can be found via a standard optimization algorithm, e.g., gradient descent.
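A minimal sketch of this sampling-based approach is given below, written with PyTorch (mentioned later as one possible implementation package). It is not the exact Algorithm 2 of FIG. 4: the interface, hyperparameters, and the use of binary cross-entropy with an L2 regularizer are assumptions, and the scale $b$ would be set from the privacy budget as in Theorem 1 (e.g., $b = \frac{2\rho}{n\lambda\epsilon}$).

```python
# Minimal sketch: approximate ELUN with R Laplace samples on the weights, minimize it
# by gradient descent, then add fresh Laplace noise of the same scale b for privacy.
import torch

def train_elun_logistic(X, y, b, R=50, lam=2.0, steps=500, lr=0.1):
    """X: (n, d) float tensor with rows clipped to norm rho; y: (n,) tensor of 0/1 labels;
    b: Laplace noise scale (set from the privacy budget as in Theorem 1)."""
    n, d = X.shape
    w = torch.zeros(d, requires_grad=True)
    opt = torch.optim.SGD([w], lr=lr)
    laplace = torch.distributions.Laplace(0.0, b)
    for _ in range(steps):
        opt.zero_grad()
        noise = laplace.sample((R, d))                       # R noise draws for the weights
        logits = X @ (w + noise).T                           # (n, R) logits under noisy weights
        targets = y.float().unsqueeze(1).expand_as(logits)
        # Mean over both data points and noise samples approximates the expected ELUN.
        loss = torch.nn.functional.binary_cross_entropy_with_logits(logits, targets)
        loss = loss + 0.5 * lam * (w ** 2).sum()             # regularizer for strong convexity
        loss.backward()
        opt.step()
    # Output perturbation with the same scale b that training anticipated.
    return (w + laplace.sample((d,))).detach()
```

Because the same scale $b$ is used both for the noise anticipated during training and for the noise added afterward, the learned weights are, per the discussion above, optimized for exactly the perturbation they will receive.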

Note that, in the limit as the number of samples grows, the sample average converges to the integral over the probability density function, and Proposition 1 still applies via essentially the same proof (with sums replacing the integrals and the sample weights $\frac{1}{R}$ replacing $\eta_{0}$). Thus, the model returned by Algorithm 2 is also $\epsilon$-differentially private.

As noted in Remark 1, ELUN makes it possible to specify a model that may be better post-noise than a post-noise model trained using the original loss function (e.g., as done by Wu et al.). Evidence that this potential advantage can be achieved in practice is now shown: the utility of differentially private models trained with ELUN tends to exceed that of differentially private models trained with cross-entropy, particularly for small values of $\epsilon$ (greater privacy assurance).

FIG. 5 illustrates the training and testing accuracy of linear models trained using different methods. As shown, the figure indicates, on various data sets and for various values of $\epsilon$, the training and testing accuracy obtained using Algorithm 2 (blue, solid line), without differential privacy (black, dotted line), and with output perturbation (red, dotted line). Results were averaged over 100 trials on each data set, with the remaining hyperparameters set to 0.05 and 2.0, and with $R = 50$.

For small $\epsilon$, where the privacy guarantees are strongest, Algorithm 2 consistently outperforms the previous work, often by a significant margin. For large $\epsilon$, both differentially private models approach the performance of the non-private model; however, it is important to note that for large $\epsilon$ the privacy guarantee becomes meaningless, as shown by Yeom et al.

Notably, the parameters learned using ELUN generalize well; despite the fact that ELUN is minimized on the training data, Algorithm 2 also outperforms the previous work on the test data.

Therefore, for small $\epsilon$ (corresponding to strong privacy guarantees), the differentially private training mechanism produces models that perform better than comparable state-of-the-art methods.

Fig. 6 illustrates an example process 600 for training and utilizing a model to minimize Expected Loss Under Noise (ELUN) while maintaining differential privacy. In an example, process 600 may be performed by one or more computing devices, such as computing device 700 described herein.

At operation 602, noise is added to the weights of the machine learning model as random samples drawn from the noise distribution. In an example, the noise may be added according to a privacy budget. The noise may be Laplacian noise drawn according to the Laplace probability density function, and may be approximated via random samples drawn from the Laplace distribution. It should be noted that this is only an example, and other noise distributions, such as Gaussian noise, may be used. The machine learning model may be a linear model.

In operation 604, the ELUN is minimized by using a loss function that predicts the noise added to the weights of the machine learning model to find a point in the parameter space for which the loss is robust to the noise in the weights. Minimizing the ELUN may include using standard optimization algorithms, such as gradient descent.

At operation 606, the model is evaluated to identify whether the model parameters have converged and whether a given optimization constraint is satisfied. If not, control returns to operation 602 to perform additional iterations. If so, the model is considered complete and control passes to operation 608.

At operation 608, the model is utilized for any input while protecting the privacy of the training data used to train the model. After operation 608, the process 600 ends.

Fig. 7 illustrates an example computing device 700. The algorithm and/or method techniques of one or more embodiments discussed herein may be implemented using such a computing device. The computing device 700 may include a memory 702, a processor 704, and non-volatile storage 706. Processor 704 may include one or more devices selected from a High Performance Computing (HPC) system, including a high performance core, microprocessor, microcontroller, digital signal processor, microcomputer, central processing unit, field programmable gate array, programmable logic device, state machine, logic circuitry, analog circuitry, digital circuitry, or any other device that manipulates signals (analog or digital) based on computer-executable instructions residing in memory 702. Memory 702 may include a single memory device or multiple memory devices, including but not limited to Random Access Memory (RAM), volatile memory, non-volatile memory, Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), flash memory, cache memory, or any other device capable of storing information. Non-volatile storage 706 may include one or more persistent data storage devices, such as hard disk drives, optical drives, tape drives, non-volatile solid-state devices, cloud storage, or any other device capable of persistently storing information.

The processor 704 may be configured to read into the memory 702 and execute computer-executable instructions that reside in the program instructions 708 of the non-volatile storage 706 and that embody the algorithms and/or method techniques of one or more embodiments. The program instructions 708 may include an operating system and applications. The program instructions 708 may be compiled or interpreted from computer programs created using a variety of programming languages and/or techniques, including, without limitation, Java, C++, C#, Objective C, Fortran, Pascal, JavaScript, Python, Perl, and PL/SQL, alone or in combination. In one embodiment, PyTorch, which is a package for the Python programming language, may be used to implement the code of the machine learning model of one or more embodiments.

When executed by the processor 704, the computer-executable instructions of the program instructions 708 may cause the computing device 700 to implement one or more of the algorithm and/or method techniques disclosed herein. The non-volatile storage 706 may also include data 710 that supports the functions, features, and processes of one or more embodiments described herein. This data 710 may include training data, models, sampling noise, model inputs, and model outputs, as some examples.

The processes, methods, or algorithms disclosed herein may be delivered to/implemented by a processing device, controller, or computer, which may include any existing programmable or dedicated electronic control unit. Similarly, the processes, methods or algorithms may be stored as data and instructions executable by a controller or computer in a variety of forms, including, but not limited to, information permanently stored on non-writable storage media such as ROM devices and information alterably stored on writable storage media such as floppy disks, magnetic tapes, CDs, RAM devices and other magnetic and optical media. A process, method, or algorithm may also be implemented in a software executable object. Alternatively, the processes, methods, or algorithms may be embodied in whole or in part using suitable hardware components such as Application Specific Integrated Circuits (ASICs), Field Programmable Gate Arrays (FPGAs), state machines, controllers or other hardware components or devices, or a combination of hardware, software, and firmware components.

While exemplary embodiments are described above, it is not intended that these embodiments describe all possible forms encompassed by the claims. The words used in the specification are words of description rather than limitation, and it is understood that various changes may be made without departing from the spirit and scope of the disclosure. As previously mentioned, the features of the various embodiments may be combined to form further embodiments of the invention that may not be explicitly described or illustrated. While various embodiments may have been described as providing advantages over or being preferred over other embodiments or prior art implementations in terms of one or more desired characteristics, those of ordinary skill in the art realize that one or more features or characteristics may be compromised to achieve desired overall system attributes, which depend on the specific application and implementation. These attributes may include, but are not limited to, cost, strength, durability, life cycle cost, marketability, appearance, packaging, size, serviceability, weight, manufacturability, ease of assembly, and the like. As such, to the extent that any embodiment is described as being less desirable in one or more respects than other embodiments or the prior art, such embodiments are not outside the scope of the present disclosure and may be desirable for particular applications.
