Fixed-point number coding and operation system for privacy-preserving machine learning


Note: this technology, "A fixed-point number coding and operation system for privacy-preserving machine learning", was designed by 汤定一 and 韩伟力 on 2020-06-22. Its main content is as follows: the invention belongs to the technical field of cyberspace security, and specifically relates to a fixed-point number coding and operation system for privacy-preserving machine learning. The system comprises a fixed-point representation module, a fixed-point addition module and a fixed-point multiplication module. The invention applies a finite-field fixed-point encoding scheme to privacy-preserving machine learning, aiming to provide a complete solution for fixed-point encoding and arithmetic in that setting, and realizes a fixed-point encoding scheme and operation mechanism for privacy-preserving machine learning. Models trained with a machine learning framework that represents fixed-point numbers using the system of the invention (such as linear regression, logistic regression, BP neural networks and LSTM neural networks) can perform prediction and classification tasks with nearly the same accuracy as existing machine learning frameworks.

1. A fixed-point number coding and operation system for privacy-preserving machine learning, characterized by comprising a fixed-point representation module, a fixed-point addition module and a fixed-point multiplication module; wherein:

the fixed-point representation module is used for encoding the fixed-point numbers required by machine learning in privacy-preserving machine learning;

the fixed-point addition module is used for performing fixed-point addition in privacy-preserving machine learning;

and the fixed-point multiplication module is used for performing fixed-point multiplication in privacy-preserving machine learning.

2. The system of claim 1, wherein the fixed-point representation module uses x̃ ∈ Q⟨k,f⟩ to denote a fixed-point number, which is encoded into the integer field in two steps:

first, the fixed-point number x̃ ∈ Q⟨k,f⟩ is mapped by the function int: Q⟨k,f⟩ → Z⟨k⟩ to a signed integer x̄ ∈ Z⟨k⟩;

second, the signed integer x̄ ∈ Z⟨k⟩ is encoded by the function fld: Z⟨k⟩ → Z_q into the integer field Z_q, where q > 2^k and signed integers are encoded in the integer field in a form similar to two's complement.

3. The system of claim 1, wherein in the fixed-point addition module, for two fixed-point numbers x̃_1 ∈ Q⟨k_1,f_1⟩ and x̃_2 ∈ Q⟨k_2,f_2⟩:

if f_1 = f_2, the two fixed-point numbers are added using the same protocol as for integers;

otherwise, one of the operands is scaled so that their fractional bit counts match; for example, when f_2 > f_1, first compute x̃_1′ = x̃_1 · 2^(f_2−f_1) to align the decimal points, then add the integer representations x̄_1′ + x̄_2.

4. The system of claim 1, wherein in the fixed-point multiplication module, when a secret-shared fixed-point number is multiplied by a constant, each participant only performs the multiplication locally and no communication is needed;

when two secret-shared fixed-point numbers are multiplied, truncation and degree-reduction operations are required after the multiplication, where the degree reduction uses the same protocol as for integers;

for two fixed-point numbers x̃_1 ∈ Q⟨k_1,f_1⟩ and x̃_2 ∈ Q⟨k_2,f_2⟩, the fixed-point product x̃_1 · x̃_2 has fractional bit count f_1 + f_2;

the truncation after the multiplication is performed by the Div2mP operation, so that for two secret-shared inputs [x̄_1] and [x̄_2] the product is obtained as [c] ← Div2mP([x̄_1]·[x̄_2], k+f, f), with the fractional bit count reduced back to f.

5. The system of claim 4, wherein the Div2mP operation is expressed by the following function, where values enclosed in square brackets denote secret-shared values:

[c] ← Div2mP([a], k, m)

the input to Div2mP contains 3 parameters:

1) a secret-shared signed integer [a], a ∈ Z⟨k⟩;

2) the bit length k of the signed integer in binary;

3) a public constant m ∈ [1 .. k−1];

the function shifts the secret-shared signed integer a right by m bits, in other words divides a by 2^m; the P in the function name indicates that the division result is rounded probabilistically to one of the two adjacent integers, possibly down or up, landing on a given integer with probability 1 − α, where α is the distance between a/2^m and that integer; the function returns [c] = [⌊a/2^m⌋ + u], where u ∈ {0, 1} is a random bit.

Technical Field

The invention belongs to the technical field of cyberspace security, and specifically relates to a fixed-point number coding and operation system for privacy-preserving machine learning.

Background

Machine learning has been widely applied to real-world scenarios. For example, Internet companies collect large amounts of user behavior data to train more accurate recommendation models, hospitals collect health data to build diagnostic models, and financial enterprises use historical transaction records to train more accurate fraud-detection models.

In machine learning, the scale of the data plays an important role in the accuracy of the model. However, data distributed among multiple data sources or individuals cannot simply be pooled. Privacy legislation such as the GDPR, concerns about maintaining competitive advantage, and questions of data ownership prevent data from being shared openly. Privacy-preserving machine learning based on secure multiparty computation allows different parties to train models on their joint data without revealing any information beyond the final model.

In this invention, privacy-preserving machine learning refers specifically to privacy-preserving machine learning based on a secure multiparty computation protocol. It allows different parties to train various models on their joint data without revealing anything beyond the final model (or an encrypted version of it). The secure multiparty computation protocol used here is the BGW protocol.

In machine learning, calculations are typically performed with floating-point numbers, and training data is usually distributed in [0, 1] after normalization. Parameters of the trained model, such as neural network weights, are usually distributed in [-1, 1], and floating-point numbers provide roughly 6 to 8 significant digits. In short, machine learning needs a signed decimal representation that covers at least the range [-1, 1] with 6 to 8 significant digits after the decimal point. The original BGW protocol supports only basic operations on finite-field elements, such as addition, subtraction, multiplication and inversion. Representing signed fractional numbers is therefore the challenge in making the BGW protocol support machine learning.

We first introduce the Shamir secret sharing protocol, on which the BGW protocol is based.

Shamir secret sharing

Shamir's secret sharing (a cryptographic secret sharing scheme proposed by Shamir in 1979) is a form of secret sharing in which the secret is divided into multiple parts and each participant receives its own unique part, where any k parties (k is called the threshold) suffice to reconstruct the original secret. The goal is to divide the secret S into n pieces of data s_1, s_2, …, s_n such that:

1. knowledge of any k or more pieces s_i allows the secret S to be recovered;

2. knowledge of any k−1 or fewer pieces s_i leaves S completely undetermined.

This scheme is called a (k, n) threshold scheme. Without loss of generality, we assume the secret S is an element of a finite field F_p, where 0 < k ≤ n, S < p and p is prime. Select k−1 random positive integers a_1, a_2, …, a_{k−1} satisfying a_i < p, and let a_0 = S. Build the polynomial f(x) = a_0 + a_1·x + a_2·x² + … + a_{k−1}·x^{k−1}, and construct any n points on it, e.g., (i, f(i)) for i = 1, …, n. Each participant holds one point (i, f(i)) together with the prime p, which defines the finite field in use. Given any subset of k of the n shares, the secret can be recovered by Lagrange interpolation.
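To make the (k, n) threshold scheme concrete, here is a minimal Python sketch of sharing and Lagrange reconstruction over F_p; the prime and the sample secret are illustrative choices, not values prescribed by the invention.

```python
import random

P = 2**61 - 1  # an illustrative prime; any prime p > S works

def share(secret, k, n, p=P):
    """Split `secret` into n shares; any k of them reconstruct it."""
    coeffs = [secret] + [random.randrange(p) for _ in range(k - 1)]  # a_0 = S
    # Share i is the point (i, f(i)) on f(x) = a_0 + a_1 x + ... + a_{k-1} x^{k-1}.
    return [(i, sum(c * pow(i, j, p) for j, c in enumerate(coeffs)) % p)
            for i in range(1, n + 1)]

def reconstruct(shares, p=P):
    """Recover f(0) = S by Lagrange interpolation at x = 0."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * -xj % p          # numerator of l_i(0)
                den = den * (xi - xj) % p    # denominator of l_i(0)
        secret = (secret + yi * num * pow(den, -1, p)) % p
    return secret

shares = share(123456789, k=3, n=5)
assert reconstruct(shares[:3]) == 123456789  # any 3 shares suffice
assert reconstruct(shares[2:]) == 123456789
```

The modular inverse `pow(den, -1, p)` requires Python 3.8 or later.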

BGW protocol

The computation model of the BGW protocol (proposed by Ben-Or, Goldwasser and Wigderson in 1988; it implements distributed addition and multiplication among three or more parties) is a fully synchronous network in which the pairwise communication channels between parties are secure.

Suppose α_0, …, α_{n−1} are n distinct non-zero points of a field F. A dealer holding a secret input s ∈ F shares it by selecting t random elements a_1, …, a_t ∈ F and forming

f(x) = s + a_1·x + … + a_t·x^t,

and each party P_i receives the value s_i = f(α_i).

Let a, b ∈ F be two secrets shared by the polynomials f(x) and g(x) respectively, and let c ∈ F, c ≠ 0 be a constant. Linear operations (e.g., c·a and a + b) do not require any communication between the participants.

In computing the product of two shared secrets, we need to randomize the coefficients of the product polynomial h(x) = f(x)·g(x) and perform a polynomial degree-reduction operation, while keeping the constant term h_0 unchanged.

Let

h(x) = h_0 + h_1·x + … + h_{2t}·x^{2t}

and let

s_i = h(α_i) = f(α_i)·g(α_i),

where the s_i (i = 0, …, n−1) are the shares of h_0 held by the parties. Define the truncation of the polynomial h(x) as

truncate(h(x)) ← k(x) = h_0 + h_1·x + … + h_t·x^t,

and let r_i = k(α_i) (i = 0, …, n−1).

Let S = (s_0, …, s_{n−1}) and R = (r_0, …, r_{n−1}). There exists a constant n × n matrix A such that

R = S·A.

Let B = (b_{i,j}) be the n × n Vandermonde matrix of the points α_i, and let P be the linear projection that zeroes the coefficients of degree above t. Then S·(B^{−1}·P·B) = R, i.e., A = B^{−1}·P·B.

The coefficients of the truncated product polynomial k(x) may not be completely random. To randomize the coefficients, each participant P_i randomly selects a polynomial q_i(x) of degree 2t with zero free coefficient and distributes its shares to the other parties.

In the degree reduction we can then use

h̃(x) = h(x) + Σ_{i=1}^{n} q_i(x)

in place of h(x); it satisfies

h̃(0) = h(0) = h_0,

while the coefficients of x^i for 1 ≤ i ≤ t are completely random.
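The following plaintext simulation, for intuition only, walks through this truncation step: shares of the product lie on a degree-2t polynomial h(x), and dropping the coefficients above degree t leaves the constant term h_0 = a·b intact. A real BGW execution never reconstructs h; it applies R = S·A to the shares. The field size and parameters below are illustrative.

```python
import random

P = 2**61 - 1
T, N = 2, 5                       # degree t, with n = 2t + 1 parties
ALPHAS = list(range(1, N + 1))    # evaluation points alpha_i

def share_poly(s, t, p=P):
    """Coefficients [s, a_1, ..., a_t] of a random sharing polynomial."""
    return [s] + [random.randrange(p) for _ in range(t)]

def evaluate(coeffs, x, p=P):
    return sum(c * pow(x, j, p) for j, c in enumerate(coeffs)) % p

def mul_by_linear(poly, root, p=P):
    """Multiply a coefficient list by (x - root) mod p."""
    out = [0] * (len(poly) + 1)
    for j, c in enumerate(poly):
        out[j] = (out[j] - root * c) % p
        out[j + 1] = (out[j + 1] + c) % p
    return out

def interpolate(points, p=P):
    """Coefficients of the unique polynomial through the given points."""
    coeffs = [0] * len(points)
    for i, (xi, yi) in enumerate(points):
        basis, den = [1], 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                basis = mul_by_linear(basis, xj, p)
                den = den * (xi - xj) % p
        scale = yi * pow(den, -1, p) % p
        for j, c in enumerate(basis):
            coeffs[j] = (coeffs[j] + scale * c) % p
    return coeffs

a, b = 1234, 5678
f = share_poly(a, T)
g = share_poly(b, T)
s = [(x, evaluate(f, x) * evaluate(g, x) % P) for x in ALPHAS]  # degree-2t shares
h = interpolate(s)            # h(x) = f(x) g(x), degree 2t
k = h[:T + 1]                 # truncate(h(x)) = h_0 + h_1 x + ... + h_t x^t
r = [(x, evaluate(k, x)) for x in ALPHAS]
assert k[0] == a * b == interpolate(r)[0]   # constant term survives truncation
```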

Disclosure of Invention

The invention provides a fixed-point number coding and operation system for privacy-preserving machine learning. Through its fixed-point representation and operations, the system encodes fixed-point numbers and supports fixed-point arithmetic in privacy-preserving machine learning based on the BGW protocol. The scheme provides secure and efficient fixed-point representation and arithmetic support for privacy-preserving machine learning.

The invention applies a finite-field fixed-point encoding scheme to privacy-preserving machine learning, aiming to provide a complete solution for fixed-point encoding and arithmetic in this setting, and realizes a fixed-point coding and operation system for privacy-preserving machine learning. The system comprises a fixed-point representation module, a fixed-point addition module and a fixed-point multiplication module; by combining encoding with arithmetic, it ultimately enables privacy-preserving machine learning to operate on fixed-point numbers. The invention can be used in machine learning frameworks based on secure multiparty computation protocols, providing the fixed-point encoding scheme and operation mechanism that machine learning requires. The technical scheme of the invention is described in detail below.

The invention provides a fixed-point number coding and operation system for privacy-preserving machine learning, comprising a fixed-point representation module, a fixed-point addition module and a fixed-point multiplication module; wherein:

the fixed-point representation module is used for encoding the fixed-point numbers required by machine learning in privacy-preserving machine learning;

the fixed-point addition module is used for performing fixed-point addition in privacy-preserving machine learning;

and the fixed-point multiplication module is used for performing fixed-point multiplication in privacy-preserving machine learning.

In the present invention, the fixed-point representation module uses x̃ ∈ Q⟨k,f⟩ to represent a fixed-point number: a rational number in the range [−2^(k−f−1), 2^(k−f−1) − 2^(−f)], sampled at intervals of 2^(−f). The fixed-point number is encoded into the integer field in two steps. First, the fixed-point number x̃ ∈ Q⟨k,f⟩ is mapped by the function int: Q⟨k,f⟩ → Z⟨k⟩ to a signed integer x̄ ∈ Z⟨k⟩. Second, the signed integer x̄ ∈ Z⟨k⟩ is encoded by the function fld: Z⟨k⟩ → Z_q into the integer field Z_q, where q > 2^k and signed integers are encoded in the integer field in a form similar to two's complement.
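A minimal Python sketch of this two-step encoding, assuming the parameters q = 10^12 + 3, k = 23 and f = 20 used in the embodiment below; the function names are ours, not the invention's.

```python
Q = 10**12 + 3   # field modulus q > 2^k (the value used in this embodiment)
K, F = 23, 20    # bit length k and fractional bit count f
assert Q > 2**K

def fxp_int(x_tilde: float) -> int:
    """int: Q<k,f> -> Z<k>; scale a fixed-point value to a signed integer."""
    return round(x_tilde * 2**F)

def fld(x_bar: int, q: int = Q) -> int:
    """fld: Z<k> -> Z_q; two's-complement-like embedding in the integer field."""
    return x_bar % q   # Python's % maps a negative x to q + x, as required

assert fld(fxp_int(1.0)) == 2**20        # 1 encodes to 2^f
assert fld(fxp_int(-1.0)) == Q - 2**20   # -1 encodes to q - 2^f
```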

In the invention, in the fixed-point addition module, let x̃_1 ∈ Q⟨k_1,f_1⟩ and x̃_2 ∈ Q⟨k_2,f_2⟩, with integer representations x̄_1 and x̄_2. If f_1 = f_2, the two fixed-point numbers are added using the same protocol as for integers. Otherwise, one of the operands must be scaled so that their fractional bit counts match. For example, when f_2 > f_1, first align the decimal points by computing x̃_1′ = x̃_1 · 2^(f_2−f_1), then compute x̄_1′ + x̄_2 to obtain the sum. Whether a secret-shared fixed-point number is added to a constant or to another secret-shared fixed-point number, each participant only needs to perform the addition locally, without any communication.
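A sketch of the alignment rule on plaintext integer representations; in the secret-shared setting each party performs the same arithmetic locally on its shares. The sample values are illustrative.

```python
def fxp_add(x1_bar: int, f1: int, x2_bar: int, f2: int):
    """Add two fixed-point numbers given as (signed integer, fractional bits)."""
    if f1 < f2:
        x1_bar <<= f2 - f1   # align decimal points: x1' = x1 * 2^(f2 - f1)
    elif f2 < f1:
        x2_bar <<= f1 - f2
    return x1_bar + x2_bar, max(f1, f2)

# 1.5 with f = 4 plus 0.25 with f = 8 gives 1.75 with f = 8
s, f = fxp_add(24, 4, 64, 8)   # 24 = 1.5 * 2^4, 64 = 0.25 * 2^8
assert s / 2**f == 1.75
```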

In the invention, in the fixed-point multiplication module, when a secret-shared fixed-point number is multiplied by a constant, each participant only performs the multiplication locally and no communication is needed. When two secret-shared fixed-point numbers are multiplied, truncation and degree-reduction operations are required after the multiplication, where the degree reduction uses the same protocol as for integers. Let x̃_1 ∈ Q⟨k_1,f_1⟩ and x̃_2 ∈ Q⟨k_2,f_2⟩ be two fixed-point numbers. The fixed-point product x̃_1·x̃_2 has fractional bit count f_1 + f_2. Since directly multiplying two fixed-point numbers increases the number of fractional bits, while in machine learning the result of a multiplication usually participates as an operand in subsequent multiplications, the fractional bit count must be reduced by a truncation operation. The truncation after multiplication is performed by the Div2mP operation, so that from two secret-shared inputs [x̄_1] and [x̄_2] the multiplication yields [c] ← Div2mP([x̄_1]·[x̄_2], k+f, f). The absolute error of the truncation is bounded above by 2^(−f), an error range acceptable in machine learning, and the Div2mP operation needs only one round of interaction, making it suitable for machine learning scenarios that require a large number of multiplications.
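A plaintext sketch of this multiplication rule; it truncates by deterministic flooring, whereas the shared protocol uses the probabilistic Div2mP, so results may differ by one unit in the last place.

```python
def fxp_mul(x1_bar: int, f1: int, x2_bar: int, f2: int, f_out: int) -> int:
    prod = x1_bar * x2_bar             # fractional bit count is now f1 + f2
    return prod >> (f1 + f2 - f_out)   # truncate back to f_out fractional bits

F = 20
a_bar = round(1.5 * 2**F)
b_bar = round(-0.25 * 2**F)
c_bar = fxp_mul(a_bar, F, b_bar, F, F)
assert abs(c_bar / 2**F - 1.5 * -0.25) <= 2**-F   # |error| bounded by 2^-f
```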

In the present invention, the Div2mP operation is expressed by the following function, where values enclosed in square brackets denote secret-shared values:

[c] ← Div2mP([a], k, m)

The input to Div2mP contains 3 parameters:

1) a secret-shared signed integer [a], a ∈ Z⟨k⟩;

2) the bit length k of the signed integer in binary;

3) a public constant m ∈ [1 .. k−1].

The function shifts the secret-shared signed integer a right by m bits, in other words divides a by 2^m. The P in the function name indicates that the division result is rounded probabilistically to one of the two adjacent integers, possibly down or up, landing on a given integer with probability 1 − α, where α is the distance between a/2^m and that integer. The function returns

[c] = [⌊a/2^m⌋ + u],

where u ∈ {0, 1} is a random bit.
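A plaintext sketch of this rounding behaviour, with the secret-shared machinery (PRandBit and friends) replaced by a local biased coin, and assuming a ≥ 0; the signed case goes through the two's-complement-like encoding.

```python
import random

def div2m_p(a: int, m: int) -> int:
    """Return floor(a / 2^m) + u, where P(u = 1) equals the dropped fraction."""
    low = a & ((1 << m) - 1)                       # the m bits being dropped
    u = 1 if random.random() < low / 2**m else 0   # biased random bit
    return (a >> m) + u

# The rounding is unbiased: E[div2m_p(a, m)] = a / 2^m.
samples = [div2m_p(1000003, 10) for _ in range(100_000)]
print(sum(samples) / len(samples), 1000003 / 2**10)   # both around 976.57
```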

Compared with the prior art, the technical effects of the invention are as follows:

the invention provides a fixed point number representation and operation system for privacy protection machine learning. The invention enables safe multi-party calculation based on the BGW protocol to support fixed point number coding and operation and meets the requirement of machine learning operation. According to our experimental results, models (such as linear regression, logistic regression, BP neural network and LSTM neural network) trained by using the machine learning framework of the system representing fixed point number can execute the prediction and classification tasks with almost the same precision compared with the existing machine learning framework. Furthermore, experiments show that the linear regression based on the fixed point number scheme is substantially consistent with the linear regression based on the floating point number scheme in terms of accuracy and loss functions.

Drawings

FIG. 1 is a diagram of the fixed-point encoding scheme and operation mechanism in privacy-preserving machine learning.

FIG. 2 shows the encoding scheme for signed fixed-point numbers in the integer field.

FIG. 3 shows the truncation method for the fractional part in fixed-point multiplication.

FIG. 4 compares linear regression based on the fixed-point scheme with floating-point linear regression in terms of accuracy.

FIG. 5 compares linear regression based on the fixed-point scheme with floating-point linear regression in terms of the loss function.

Detailed Description

The following examples describe the implementation and operation of the present invention in detail, but the scope of the invention is not limited to these examples.

In this embodiment, the fixed-point encoding scheme and operation mechanism are used in a privacy-preserving machine learning framework, mainly in three respects: the framework's matrix library encodes matrix elements as fixed-point numbers; type conversion between fixed-point and floating-point numbers is performed before training starts and after training finishes; and the fixed-point addition and multiplication mechanisms are used by the framework's predefined operators.

1. Design of the fixed-point representation module

As shown in FIG. 1, the machine learning framework performs its computations through a matrix library during training. The matrix library is responsible for storing data in matrix form and for providing operations between matrices. All elements of a matrix share the same data type; that is, the data type of the matrix is the data type of each of its elements. In a classical machine learning framework such as TensorFlow, matrix data types typically include integers, 32-bit floating-point numbers and 64-bit floating-point numbers. In the privacy-preserving machine learning framework based on the BGW protocol, matrices use the fixed-point data type, i.e., all elements of all matrices use the fixed-point encoding scheme. The encoding of fixed-point numbers in the integer field is shown in FIG. 2.

Each element of a matrix is stored as a 64-bit integer, and secure computation is carried out over a finite field Z_p. The finite field size p is set to 10^12 + 3. The random bit generation protocol (PRandBit) on which the Div2mP protocol relies requires the prime p to satisfy p mod 4 = 3. The signed fixed-point type Q⟨k,f⟩ with k = 23 and f = 20 can represent signed fixed-point numbers in the range [−4, 4 − 2^(−20)], sampled at intervals of 2^(−20). Since normalized data and model parameters in machine learning are typically distributed in [−1, 1], this meets basic training requirements.
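A quick check of these concrete parameters, using the range formula [−2^(k−f−1), 2^(k−f−1) − 2^(−f)] from the representation module:

```python
p = 10**12 + 3
k, f = 23, 20
assert p % 4 == 3          # required by the PRandBit protocol
assert p > 2**k            # the field must contain the k-bit encoding
lo, hi = -2**(k - f - 1), 2**(k - f - 1) - 2**-f
print(lo, hi)              # -4 3.9999990463256836, in steps of 2^-20
```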

Matrices using this fixed-point encoding scheme perform the degree-reduction operation in the matrix library with the same protocol as for integers.

2. Conversion between fixed-point and floating-point numbers before and after training

In privacy-preserving machine learning, data represented as floating-point numbers in the data files must be converted into the fixed-point numbers stored by the matrix library before training, and training begins after the secret sharing step. After training completes, the model parameters represented as fixed-point numbers must be converted back into floating-point numbers; it is assumed that the model parameters are disclosed among all participants once training is finished.

The conversion from floating-point to fixed-point numbers takes two steps. First, the floating-point number is multiplied by 2^f and the result is rounded: with f = 20, the floating-point value 1 becomes 2^20 after the first step. Second, the result of the first step is reduced modulo q: the value 1, with q = 10^12 + 3, remains 2^20 after the second step, while the value −1 becomes −2^20 after the first step and q − 2^20 after the second step, i.e., it is mapped to the corresponding value in the finite field.

The conversion from fixed-point back to floating-point numbers also takes two steps. First, the sign is determined: values in the integer field smaller than ⌈q/2⌉ are regarded as positive, and larger values as negative. If positive, the value is unchanged; otherwise the value is replaced by −(q − value). Second, the result of the first step is divided by 2^f, yielding the floating-point number corresponding to the fixed-point number.
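A sketch of both conversions, assuming the parameters q = 10^12 + 3 and f = 20 of this embodiment:

```python
Q, F = 10**12 + 3, 20

def float_to_fixed(x: float) -> int:
    return round(x * 2**F) % Q   # step 1: scale and round; step 2: reduce mod q

def fixed_to_float(v: int) -> float:
    if v > Q // 2:               # field elements above q/2 encode negatives
        v -= Q                   # recover the signed integer -(q - v)
    return v / 2**F              # undo the 2^f scaling

assert float_to_fixed(1.0) == 2**20
assert float_to_fixed(-1.0) == Q - 2**20
assert fixed_to_float(float_to_fixed(-1.0)) == -1.0
```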

3. Design of the fixed-point addition and multiplication modules

Let a, b ∈ F be two secrets shared using the polynomials f(x) and g(x) respectively, and let the constant c ∈ F, c ≠ 0. Computing c·a and a + b requires only local operations at each party P_i, without any communication between the parties.

Multiplication of secret-shared fixed-point numbers requires interaction, since under the BGW protocol a polynomial degree reduction is needed after every multiplication. As shown in FIG. 3, because fixed-point numbers are used, a scaling operation (i.e., truncation of the fractional part) is also required. In the privacy-preserving machine learning framework, the multiplication c = a × b must therefore be changed to [c] ← Div2mPD([a]·[b], k + f, f). Thanks to Div2mPD, only one round of interaction is needed to complete the polynomial truncation and the fractional scaling in parallel, which greatly improves the efficiency of multiplication, an important basic operation in machine learning frameworks.

4. Experimental comparison of fixed-point and floating-point linear regression in terms of Accuracy and Loss

This example shows experimentally that linear regression using the fixed-point scheme is substantially consistent with linear regression using the floating-point scheme in terms of Accuracy and the Loss function.

The experiment uses the MNIST handwritten digit dataset. The training set contains 60000 pictures labeled 0-9, and the test set contains 10000 labeled pictures. Each picture is 28 × 28 pixels and contains 784 features represented by grayscale values 0-255.

A linear regression model is used, with Accuracy and Loss as metrics. The linear regression parameters are trained by stochastic gradient descent: w ← w − α·x^T·(x·w − y), where w is the parameter matrix to be trained, of size 784 × 1; α is the learning rate, set to 0.001; x is a mini-batch of training data of size 128 × 784, with one mini-batch of 128 training samples per iteration; and y is the label vector corresponding to the mini-batch, of size 128 × 1. The training stage runs 30000 iterations, keeping the weight matrix from the iteration with the highest accuracy.
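A NumPy sketch of this update rule on synthetic data in place of MNIST; the shapes, learning rate and batch size follow the text, while the ground-truth weights and Gaussian inputs are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
w_true = np.full((784, 1), 0.01)   # hypothetical ground-truth weights
w = np.zeros((784, 1))             # parameter matrix to be trained, 784 x 1
alpha = 0.001                      # learning rate from the text

for step in range(1000):           # the experiment runs 30000 iterations
    x = 0.1 * rng.standard_normal((128, 784))   # one mini-batch, 128 x 784
    y = x @ w_true                               # labels, 128 x 1
    residual = x @ w - y
    w -= alpha * x.T @ residual    # w <- w - alpha * x^T (x w - y)

print(float(np.sum(residual**2)))  # squared loss shrinks as w nears w_true
```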

Linear regression based on the fixed-point scheme is compared with floating-point linear regression; the results are shown in FIG. 4 and FIG. 5. Exactly the same training procedure is used: both models are loaded with the same initial weight matrix before training, and the training data are fed to the models in the same order; their accuracy and loss curves are then compared. In FIG. 4 and FIG. 5, 'experiment' denotes linear regression based on the fixed-point scheme and 'control' denotes linear regression based on floating point. The highest accuracy of the floating-point scheme is 98.75%, with a corresponding loss of 141.4; the highest accuracy of linear regression based on the fixed-point scheme is also 98.75%, with a corresponding loss of 139.6. The experimental results show that linear regression using the fixed-point scheme is substantially consistent with linear regression using the floating-point scheme in terms of accuracy and the loss function.
