Data query method, device, equipment and storage medium based on differential privacy

Document No.: 1952661 Publication date: 2021-12-10

Reading note: This technology, "Data query method, device, equipment and storage medium based on differential privacy", was designed by 吕子剑 on 2021-09-18. The invention relates to the field of big data and discloses a data query method, apparatus, device and storage medium based on differential privacy, which address the slow query speed of prior-art data query methods based on differential privacy. The method comprises: receiving a data query request and extracting the data type corresponding to the data query request; acquiring a second encoded data set from the client database; screening the second encoded data set for the second encoded data corresponding to the data type to obtain target encoded data; performing instant random response processing on the target encoded data according to a preset instant random response rule to obtain third encoded data; and performing statistical processing on the third encoded data to obtain an estimated frequency, and generating a data query result from the estimated frequency, wherein the data query result satisfies localized differential privacy. The invention also relates to blockchain technology: information related to the private data may be stored in a blockchain.

1. A data query method based on differential privacy is characterized by comprising the following steps:

receiving a data query request, and extracting a data type corresponding to the data query request;

acquiring a second encoding data set in the client database;

screening out second coded data corresponding to the data type from the second coded data set to obtain target coded data;

carrying out instant random response processing on the target coded data according to a preset instant random response rule to obtain third coded data;

and carrying out statistical processing on the third coded data to obtain an estimation frequency, and generating a data query result according to the estimation frequency, wherein the data query result meets the requirement of localized differential privacy.

2. The differential privacy-based data query method according to claim 1, further comprising, before the receiving a data query request:

extracting original data and a preset coding table set in a client database, wherein the coding table set comprises at least one coding table;

acquiring the data type of the original data, and screening out a corresponding code table in the code table set according to the data type;

encoding the original data based on the corresponding encoding table to obtain first encoded data, wherein the first encoded data are binary data;

performing permanent random response mapping on the first coded data according to a preset permanent random response rule to obtain second coded data, wherein the second coded data are binary data;

a second set of encoded data is composed based on the second encoded data.

3. The differential privacy-based data query method according to claim 2, further comprising, before the extracting raw data and a preset encoding table set in a client database:

acquiring project characteristics in original data in a client database, and classifying the original data according to the project characteristics to obtain a plurality of data types;

coding each original data according to the data type of each original data to obtain a plurality of coding tables;

and forming a coding table set based on the plurality of coding tables.

4. The differential privacy-based data query method according to claim 3, wherein the permanent random response mapping of the first encoded data according to a preset permanent random response rule to obtain the second encoded data comprises:

extracting each digit of the first coded data to obtain a first digit sequence;

identifying each first digit in the first sequence of digits;

according to the value of the first number, outputting a real value according to a first mapping probability, and outputting a random value according to a second mapping probability to obtain an output result, wherein the real value and the random value are binary values, and the sum of the first mapping probability and the second mapping probability is 1;

obtaining a second digital sequence according to the output result;

second encoded data is generated from the second digital sequence.

5. The data query method based on differential privacy as claimed in claim 3, wherein the performing instant random response processing on the target encoded data according to a preset instant random response rule to obtain third encoded data comprises:

extracting each digit of the target coding data to obtain a third digit sequence;

identifying each third digit in the third sequence of digits;

judging whether the value of the third number is 1;

if so, outputting a true value according to a third mapping probability to obtain third encoded data;

if not, outputting the true value according to the fourth mapping probability to obtain third coded data.

6. The differential privacy-based data query method according to any one of claims 1-5, wherein before the statistical processing of the third encoded data, the method further comprises:

calling a lossless compression tool to compress the third coded data to obtain compressed third coded data;

transmitting the compressed third encoding data to the data statistics server;

and decoding the compressed third encoded data according to a decoding dictionary.

7. A differential privacy-based data query apparatus, comprising:

the receiving module is used for receiving a data query request and extracting a data type corresponding to the data query request;

the acquisition module is used for acquiring a second encoding data set in the client database;

the first response module is used for screening out second coded data corresponding to the data type in the second coded data set to obtain target coded data;

the second response module is used for carrying out instant random response processing on the target coded data according to a preset instant random response rule to obtain third coded data;

and the result generation module is used for carrying out statistical processing on the third coded data to obtain the estimation frequency, and generating a data query result according to the estimation frequency, wherein the data query result meets the requirement of localized differential privacy.

8. The differential privacy-based data query device according to claim 7, further comprising an encoded data set generation module, wherein the encoded data set generation module comprises:

the extraction unit is used for extracting original data and a preset coding table set in a client database, wherein the coding table set comprises at least one coding table;

the screening unit is used for acquiring the data type of the original data and screening out the corresponding coding table in the coding table set according to the data type;

the first coding unit is used for coding the original data based on the corresponding coding table to obtain first coded data, wherein the first coded data are binary data;

the second coding unit is used for carrying out permanent random response mapping on the first coded data according to a preset permanent random response rule to obtain second coded data, wherein the second coded data are binary data;

a generating unit for composing a second encoded data set based on the second encoded data.

9. A differential privacy data query device, the differential privacy data query device comprising: a memory and at least one processor, the memory having instructions stored therein;

the at least one processor invoking the instructions in the memory to cause the differential privacy data query device to perform the steps of the differential privacy data query method of any one of claims 1-6.

10. A computer-readable storage medium having instructions stored thereon, wherein the instructions, when executed by a processor, implement the steps of the differential privacy data query method according to any one of claims 1-6.

Technical Field

The invention relates to the field of big data, in particular to a data query method, a data query device, data query equipment and a storage medium based on differential privacy.

Background

A database or organization that holds data must prevent user privacy from being revealed when it opens a data query function, so differential privacy is generally used to protect the data while the query function is provided. Differential privacy is a technique from cryptography that aims to maximize the accuracy of queries against a statistical database while minimizing the chance of identifying individual records.

In the prior art, when data query is performed using differential privacy, a Bloom filter is generally used to map the original data into a bit string so that private information is not disclosed, and the resulting bit string is then processed further to obtain the privacy-protected data content.

Disclosure of Invention

The invention mainly aims to solve the problem that a data query method based on differential privacy in the prior art is slow in query speed.

A first aspect of the present invention provides a data query method based on differential privacy, comprising: receiving a data query request, and extracting a data type corresponding to the data query request; acquiring a second encoding data set in the client database; screening out second coded data corresponding to the data type from the second coded data set to obtain target coded data; carrying out instant random response processing on the target coded data according to a preset instant random response rule to obtain third coded data; and carrying out statistical processing on the third coded data to obtain an estimation frequency, and generating a data query result according to the estimation frequency, wherein the data query result meets the requirement of localized differential privacy.

Optionally, in a first implementation manner of the first aspect of the present invention, before the receiving the data query request, the method further includes: extracting original data and a preset coding table set in a client database, wherein the coding table set comprises at least one coding table; acquiring the data type of the original data, and screening out a corresponding code table in the code table set according to the data type; encoding the original data based on the corresponding encoding table to obtain first encoded data, wherein the first encoded data are binary data; performing permanent random response mapping on the first coded data according to a preset permanent random response rule to obtain second coded data, wherein the second coded data are binary data; a second set of encoded data is composed based on the second encoded data.

Optionally, in a second implementation manner of the first aspect of the present invention, before the extracting the original data and the preset encoding table set in the client database, the method further includes: acquiring project characteristics in original data in a client database, and classifying the original data according to the project characteristics to obtain a plurality of data types; coding each original data according to the data type of each original data to obtain a plurality of coding tables; and forming a coding table set based on the plurality of coding tables.

Optionally, in a third implementation manner of the first aspect of the present invention, the performing permanent random response mapping on the first encoded data according to a preset permanent random response rule to obtain second encoded data includes: extracting each digit of the first coded data to obtain a first digit sequence; identifying each first digit in the first sequence of digits; according to the value of the first number, outputting a real value according to a first mapping probability, and outputting a random value according to a second mapping probability to obtain an output result, wherein the real value and the random value are binary values, and the sum of the first mapping probability and the second mapping probability is 1; obtaining a second digital sequence according to the output result; second encoded data is generated from the second digital sequence.

Optionally, in a fourth implementation manner of the first aspect of the present invention, the performing, according to a preset immediate random response rule, immediate random response processing on the target encoded data to obtain third encoded data includes: extracting each digit of the target coding data to obtain a third digit sequence; identifying each third digit in the third sequence of digits; judging whether the value of the third number is 1, if so, outputting a true value according to a third mapping probability to obtain third coded data; if not, outputting the true value according to the fourth mapping probability to obtain third coded data.

Optionally, in a fifth implementation manner of the first aspect of the present invention, before performing statistical processing on the third encoded data, the method further includes: calling a lossless compression tool to compress the third coded data to obtain compressed third coded data; transmitting the compressed third encoding data to the data statistics server; and decoding the compressed third encoded data according to a decoding dictionary.

A second aspect of the present invention provides a data query apparatus based on differential privacy, comprising: a receiving module, used for receiving a data query request and extracting a data type corresponding to the data query request; an acquisition module, used for acquiring a second encoding data set in the client database; a first response module, used for screening out second coded data corresponding to the data type in the second coded data set to obtain target coded data; a second response module, used for carrying out instant random response processing on the target coded data according to a preset instant random response rule to obtain third coded data; and a result generation module, used for carrying out statistical processing on the third coded data to obtain the estimation frequency, and generating a data query result according to the estimation frequency, wherein the data query result meets the requirement of localized differential privacy.

Optionally, in a first implementation manner of the second aspect of the present invention, the differential privacy-based data query apparatus further includes an encoded data set generating module, where the encoded data set generating module includes: an extraction unit, used for extracting original data and a preset coding table set in a client database, wherein the coding table set comprises at least one coding table; a screening unit, used for acquiring the data type of the original data and screening out the corresponding coding table in the coding table set according to the data type; a first coding unit, used for coding the original data based on the corresponding coding table to obtain first coded data, wherein the first coded data are binary data; a second coding unit, used for carrying out permanent random response mapping on the first coded data according to a preset permanent random response rule to obtain second coded data, wherein the second coded data are binary data; and a generating unit, used for composing a second encoded data set based on the second encoded data.

Optionally, in a second implementation manner of the second aspect of the present invention, the data query apparatus based on differential privacy further includes a coding table set generating module, where the coding table set generating module is specifically configured to: acquiring project characteristics in original data in a client database, and classifying the original data according to the project characteristics to obtain a plurality of data types; coding each original data according to the data type of each original data to obtain a plurality of coding tables; and forming a coding table set based on the plurality of coding tables.

Optionally, in a third implementation manner of the second aspect of the present invention, the second encoding unit is specifically configured to: extracting each digit of the first coded data to obtain a first digit sequence; identifying each first digit in the first sequence of digits; according to the value of the first number, outputting a real value according to a first mapping probability, and outputting a random value according to a second mapping probability to obtain an output result, wherein the real value and the random value are binary values, and the sum of the first mapping probability and the second mapping probability is 1; obtaining a second digital sequence according to the output result; second encoded data is generated from the second digital sequence.

Optionally, in a fourth implementation manner of the second aspect of the present invention, the second response module is specifically configured to: extracting each digit of the target coding data to obtain a third digit sequence; identifying each third digit in the third sequence of digits; judging whether the value of the third number is 1, if so, outputting a true value according to a third mapping probability to obtain third coded data; if not, outputting the true value according to the fourth mapping probability to obtain third coded data.

Optionally, in a fifth implementation manner of the second aspect of the present invention, the data query apparatus based on differential privacy further includes a data compression module, where the data compression module is specifically configured to: calling a lossless compression tool to compress the third coded data to obtain compressed third coded data; transmitting the compressed third encoding data to the data statistics server; and decoding the compressed third encoded data according to a decoding dictionary.

The third aspect of the present invention provides a data query device based on differential privacy, including: a memory and at least one processor, the memory having instructions stored therein; the at least one processor invokes the instructions in the memory to cause the differential privacy-based data query device to perform the steps of the differential privacy-based data query method described above.

A fourth aspect of the present invention provides a computer-readable storage medium having stored therein instructions, which, when run on a computer, cause the computer to perform the steps of the differential privacy-based data query method described above.

In the technical scheme provided by the invention, a data query request is received, and a data type corresponding to the data query request is extracted; acquiring a second encoding data set in the client database; screening second coded data corresponding to the data type in the second coded data set to obtain target coded data; carrying out instant random response processing on the target coded data according to a preset instant random response rule to obtain third coded data; and performing statistical processing on the third encoded data to obtain an estimation frequency, and generating a data query result according to the estimation frequency. According to the technical scheme, the data query speed is increased while differential privacy protection is performed on the data.

Drawings

FIG. 1 is a schematic diagram of a first embodiment of a data query method based on differential privacy according to an embodiment of the present invention;

FIG. 2 is a diagram illustrating a second embodiment of a data query method based on differential privacy according to an embodiment of the present invention;

FIG. 3 is a diagram illustrating a third embodiment of a data query method based on differential privacy according to an embodiment of the present invention;

FIG. 4 is a schematic diagram of an embodiment of a data query apparatus based on differential privacy according to an embodiment of the present invention;

FIG. 5 is a schematic diagram of another embodiment of a data query device based on differential privacy according to an embodiment of the present invention;

FIG. 6 is a schematic diagram of an embodiment of a data query device based on differential privacy according to an embodiment of the present invention.

Detailed Description

The embodiment of the invention provides a data query method, a data query device, data query equipment and a storage medium based on differential privacy, which are used for receiving a data query request and extracting a data type corresponding to the data query request; acquiring a second encoding data set in the client database; screening second coded data corresponding to the data type in the second coded data set to obtain target coded data; carrying out instant random response processing on the target coded data according to a preset instant random response rule to obtain third coded data; and performing statistical processing on the third encoded data to obtain an estimation frequency, and generating a data query result according to the estimation frequency, wherein the data query result meets the requirement of localized differential privacy. According to the embodiment of the invention, the data query speed is accelerated while differential privacy protection is carried out on the data.

The terms "first," "second," "third," "fourth," and the like in the description and in the claims, as well as in the drawings, if any, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It will be appreciated that the data so used may be interchanged under appropriate circumstances such that the embodiments described herein may be practiced otherwise than as specifically illustrated or described herein. Furthermore, the terms "comprises," "comprising," or "having," and any variations thereof, are intended to cover non-exclusive inclusions, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.

For convenience of understanding, a specific flow of the embodiment of the present invention is described below, and referring to fig. 1, an embodiment of the differential privacy-based data query method in the embodiment of the present invention includes:

101. receiving a data query request and extracting a data type corresponding to the data query request;

It is to be understood that the execution subject of the present invention may be a data query device based on differential privacy, or a terminal or a server, which is not limited herein. The embodiment of the present invention is described with a server as the execution subject.

Differential privacy is a technique from cryptography that aims to maximize the accuracy of queries against a statistical database while minimizing the chance of identifying individual records. In this embodiment, localized differential privacy is used to protect local data during a data query. Specifically, localized differential privacy is defined as follows: given n users, each corresponding to one record, and a privacy algorithm F, if for any two records t and t′ and any output y the algorithm F satisfies:

$$\Pr[F(t) = y] \le e^{\varepsilon} \times \Pr[F(t') = y];$$

then F satisfies ε-localized differential privacy.
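As a concrete illustration of the definition above, the following minimal Python sketch computes the smallest ε for which a one-bit randomizing mechanism satisfies localized differential privacy. The function name `per_bit_epsilon` and the parameters `p` and `q` are assumptions introduced for illustration; they do not appear in the original text.

```python
import math


def per_bit_epsilon(p: float, q: float) -> float:
    """Smallest epsilon for which a one-bit mechanism is epsilon-LDP.

    q = Pr[output = 1 | input = 1], p = Pr[output = 1 | input = 0].
    Both names are illustrative assumptions.
    """
    if not (0.0 < p < 1.0 and 0.0 < q < 1.0):
        raise ValueError("p and q must lie strictly between 0 and 1")
    # Worst-case likelihood ratio over both outputs (y = 1 and y = 0)
    # and both orderings of the two possible inputs.
    ratios = [q / p, p / q, (1 - p) / (1 - q), (1 - q) / (1 - p)]
    return math.log(max(ratios))


# With q = 0.75 and p = 0.25 the worst-case ratio is 3, so epsilon = ln 3.
epsilon = per_bit_epsilon(p=0.25, q=0.75)
```

The closer p and q are to each other, the smaller ε becomes and the less any single output reveals about the input bit.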

In this embodiment, specific information of a client is stored in a client database, and when a server needs to perform data statistics on specific information of a certain data type, a data statistics request needs to be issued to the client within a data statistics range to obtain corresponding information content in the client database; the data statistics request includes the requested data type.

102. Acquiring a second encoding data set in the client database;

The second encoded data set prestored in the client database is acquired. The set contains at least one item of second encoded data, each of which is a data string obtained by encoding original data on the client according to a preset encoding rule.

Specifically, the second encoded data may be generated on the client, according to a preset encoding rule, before the data statistics request is received. Alternatively, when issuing the data statistics request, the statistics server may also send an encoding rule chosen according to the type of data to be collected, and the client then encodes the data to be collected according to the received rule.

103. Screening second coded data corresponding to the data type in the second coded data set to obtain target coded data;

The second encoded data in the second encoded data set is screened according to the data statistics request to obtain the target encoded data. Specifically, in this embodiment, when the original data is encoded in advance, the data interval that each type of original data occupies in the encoding may be fixed in advance according to that type; when the data type corresponding to the data statistics request is extracted, the data interval corresponding to that data type is determined and the corresponding target encoded data is extracted.
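The interval-based screening described above can be sketched as follows. This is a minimal illustration only: the layout `TYPE_INTERVALS`, the type names "disease" and "income", and the function `select_target` are hypothetical and not taken from the original text.

```python
# Hypothetical layout: each data type occupies a fixed bit interval
# [start, end) inside every second-encoded bit string.
TYPE_INTERVALS = {"disease": (0, 2), "income": (2, 5)}


def select_target(second_encoded_set, data_type, intervals=TYPE_INTERVALS):
    """Return the slice of each encoded string that belongs to data_type."""
    start, end = intervals[data_type]
    return [record[start:end] for record in second_encoded_set]


# Only the bits in the "income" interval are kept for this request.
target = select_target(["01101", "10011"], "income")
```

Because the interval is fixed when the data is first encoded, screening reduces to a slice lookup and does not need to decode anything.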

104. Carrying out instant random response processing on the target coded data according to a preset instant random response rule to obtain third coded data;

After the target encoded data has been determined in the previous step, the preset instant random response rule is invoked to perform instant random response processing on it. This processing is performed anew on the retrieved target encoded data each time a data statistics request is received, so that a fresh random response is applied even when different statistics requests retrieve the same target encoded data.

Specifically, in this embodiment, a preset instant random response rule is obtained, and random mapping processing is performed on each bit in the target encoded data according to the probability parameter of the instant random response rule to obtain third encoded data. The probability parameter is chosen so that the third encoded data obtained after mapping satisfies ε-localized differential privacy.
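A minimal sketch of the per-bit instant random response follows. It assumes that each output bit is set to 1 with probability `q` when the input bit is 1 and with probability `p` when the input bit is 0, which is one common reading of the third and fourth mapping probabilities; the names `p`, `q`, and `instant_random_response` are illustrative assumptions.

```python
import random


def instant_random_response(bits, p, q, rng=None):
    """Randomize each bit of a bit string independently.

    Assumed rule: Pr[out = 1 | in = 1] = q, Pr[out = 1 | in = 0] = p.
    """
    rng = rng if rng is not None else random.Random()
    out = []
    for b in bits:
        prob_one = q if b == "1" else p
        out.append("1" if rng.random() < prob_one else "0")
    return "".join(out)


# A fresh draw is made on every call, so two requests for the same
# target encoded data generally produce different third encoded data.
third_encoded = instant_random_response("0110", p=0.25, q=0.75)
```

Setting q = 1 and p = 0 disables the perturbation entirely, which is useful as a sanity check but provides no privacy.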

105. And performing statistical processing on the third encoded data to obtain an estimation frequency, and generating a data query result according to the estimation frequency.

And after the third coded data are obtained, sending the third coded data to a statistical server sending a data statistical request, after the statistical server receives the third coded data, performing statistics on the obtained third coded data according to the probability parameters in the coding rule and the response rule to obtain an estimated frequency count of the third coded data, and calculating an expected value of the original data corresponding to the third coded data based on the estimated frequency count.

When data statistics is carried out, the statistics server sends data requests to the plurality of clients to obtain a plurality of third coded data; and performing statistical correction processing according to the plurality of expected values corresponding to the plurality of third coded data to finally obtain a statistical result.
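The statistical correction can be sketched as a per-bit debiasing step. The sketch below is an assumption-laden illustration, not the formula of the invention: it supposes that the permanent step keeps a bit with probability 1 - f and otherwise replaces it with a uniformly random bit, and that the instant step outputs 1 with probability `q` for a 1 bit and `p` for a 0 bit. The names `f`, `p`, `q`, and `estimate_bit_counts` are hypothetical.

```python
def estimate_bit_counts(reports, f, p, q):
    """Estimate, per bit position, how many clients hold a true 1 bit."""
    n = len(reports)
    if n == 0:
        raise ValueError("at least one report is required")
    # Probability that a reported bit is 1 given the true bit is 1 / is 0,
    # after both the permanent and the instant randomization steps.
    q_star = (1 - f / 2) * q + (f / 2) * p
    p_star = (f / 2) * q + (1 - f / 2) * p
    if q_star == p_star:
        raise ValueError("parameters carry no signal (q* equals p*)")
    estimates = []
    for i in range(len(reports[0])):
        observed = sum(1 for r in reports if r[i] == "1")
        # Invert E[observed] = t * q_star + (n - t) * p_star for t.
        estimates.append((observed - n * p_star) / (q_star - p_star))
    return estimates


counts = estimate_bit_counts(["10", "11", "00", "01"], f=0.5, p=0.25, q=0.75)
```

With f = 0, p = 0 and q = 1 the estimate reduces to the raw count of 1 bits. Mapping per-bit estimates back to the frequency of each coded item depends on the coding table and is not shown here.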

According to the embodiment of the invention, the data query speed is accelerated while differential privacy protection is carried out on the data.

Referring to fig. 2, a second embodiment of the data query method based on differential privacy according to the embodiment of the present invention includes:

In this embodiment, the project characteristics in the original data in the client database are obtained in advance, and the original data are classified according to the project characteristics to obtain a plurality of data types; each item of original data is coded according to its data type to obtain a plurality of coding tables; and a coding table set is composed from the plurality of coding tables. Replacing the Bloom filter with code tables avoids hash collisions and thereby improves statistical precision; and because the codes are generated according to the actual number of items in the data, the subsequent randomization processing is more efficient.

In a specific example, an encoding table may be as shown in table 1:

TABLE 1

Word                                          Encoding
Diabetes mellitus                             00
AIDS (acquired immune deficiency syndrome)    01
Lung cancer                                   10

201. Extracting original data and a preset coding table set in a client database;

202. acquiring the data type of original data, and screening out a corresponding coding table in a coding table set according to the data type;

In this embodiment, before a data statistics request is received, the data may be classified according to the characteristics of the data to be counted and encoded according to the classification result, so as to generate a plurality of coding tables that form a coding table set. The coding table set is preset in the client database when the client is installed or updated. The encoding is binary, and the number of bits must be large enough that different data contents never receive the same code.
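One way to generate such a collision-free binary coding table is sketched below. The function name `build_coding_table` and the choice to assign codes in first-seen order are assumptions made for illustration.

```python
def build_coding_table(items):
    """Assign each distinct item a fixed-width binary code with no collisions."""
    unique = list(dict.fromkeys(items))  # drop duplicates, keep first-seen order
    if not unique:
        raise ValueError("at least one item is required")
    # Smallest width that can distinguish len(unique) items (at least 1 bit).
    width = max(1, (len(unique) - 1).bit_length())
    return {item: format(i, f"0{width}b") for i, item in enumerate(unique)}


# Three disease names need 2 bits, giving the codes 00, 01 and 10.
table = build_coding_table(["Diabetes mellitus", "AIDS", "Lung cancer"])
```

Because the width grows only with the number of distinct items, the resulting bit strings stay short, which keeps the later per-bit randomization cheap.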

203. Encoding the original data based on the corresponding encoding table to obtain first encoded data;

after the corresponding coding table is found according to the data type of the original data, the original data is coded according to the content of the coding table to obtain first coded data corresponding to the original data, wherein the first coded data is binary data.

And when the data to be counted comprises the disease type and the income interval, generating a specific corresponding code according to the specific content of the disease type or the specific content of the income interval.
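As a minimal sketch of this encoding step, the example below encodes a record that holds both a disease type and an income interval. The income categories, the `CODING_TABLES` dictionary, and the decision to concatenate the per-type codes in a fixed order are hypothetical choices made for illustration only.

```python
# Hypothetical coding table set: one table per data type.
CODING_TABLES = {
    "disease": {"Diabetes mellitus": "00", "AIDS": "01", "Lung cancer": "10"},
    "income": {"low": "00", "middle": "01", "high": "10"},  # assumed categories
}


def encode_record(record, tables=CODING_TABLES):
    """Encode a record by looking up each value in the table for its data type.

    Codes are concatenated in the order of the table set so that every
    data type always occupies the same bit interval.
    """
    return "".join(table[record[data_type]] for data_type, table in tables.items())


first_encoded = encode_record({"income": "high", "disease": "AIDS"})
```

Iterating over the table set rather than the record keeps the bit layout stable regardless of the order in which a record lists its fields.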

204. Performing permanent random response mapping on the first coded data according to a preset permanent random response rule to obtain second coded data;

205. composing a second encoded data set based on the second encoded data;

Specifically, the number of bits is chosen so that the encoding results do not conflict. For example, when the statistics concern only the three cases in Table 1 above, a 2-bit binary encoding is sufficient.

After the first encoded data is obtained, a permanent random response mapping is applied to each bit of the first encoded data according to the preset permanent random response rule; that is, each bit is randomly perturbed, with the perturbation probability preset according to the degree of data protection required. The perturbed second encoded data then forms the second encoded data set.
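The permanent random response mapping can be sketched as below. The sketch assumes that each bit is kept with probability 1 - f and otherwise replaced by a uniformly random bit, and that the result is cached so the same first encoded data always maps to the same second encoded data. The class name `PermanentRandomResponse`, the parameter `f`, and the in-memory cache are illustrative assumptions.

```python
import random


class PermanentRandomResponse:
    """Perturb each bit once and reuse the result for later requests."""

    def __init__(self, f, rng=None):
        if not 0.0 <= f <= 1.0:
            raise ValueError("f must lie in [0, 1]")
        self.f = f  # assumed probability of replacing a bit with a random bit
        self.rng = rng if rng is not None else random.Random()
        self._memo = {}  # stands in for the stored second encoded data set

    def map(self, first_encoded):
        if first_encoded not in self._memo:
            out = []
            for b in first_encoded:
                if self.rng.random() < self.f:
                    out.append(self.rng.choice("01"))  # random value
                else:
                    out.append(b)  # true value
            self._memo[first_encoded] = "".join(out)
        return self._memo[first_encoded]


prr = PermanentRandomResponse(f=0.5)
second_encoded = prr.map("0110")
```

Caching the perturbed value is what makes the response "permanent": repeated queries cannot average the noise away, because the noise is drawn only once.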

206. Receiving a data query request and extracting a data type corresponding to the data query request;

207. acquiring a second encoding data set in the client database;

208. screening second coded data corresponding to the data type in the second coded data set to obtain target coded data;

when a server needs to perform data statistics on specific information of a certain data type, it issues a data statistics request to the clients within the data statistics range, and the data type corresponding to the request is extracted; the second encoded data set pre-stored in the client database is then acquired, and the second coded data in that set is screened according to the request to obtain the target coded data.

209. Carrying out instant random response processing on the target coded data according to a preset instant random response rule to obtain third coded data;

and after the corresponding target coding data is determined, calling a preset instant random response rule to perform instant random response processing on the target coding data, wherein the instant random response processing needs to be performed on the obtained target coding data once when a data statistics request is received each time.

Specifically, a preset instant random response rule is obtained, and random mapping processing is performed on each bit in the target coded data according to the probability parameter of the instant random response rule to obtain third coded data. The probability parameter is chosen so that the third coded data obtained after mapping satisfies ε-localized differential privacy.

210. And performing statistical processing on the third encoded data to obtain an estimation frequency, and generating a data query result according to the estimation frequency.

The third encoded data is sent to the statistical server that issued the data statistics request. After receiving it, the statistical server performs statistics on the third encoded data according to the probability parameters in the encoding rule and the response rule to obtain an estimated frequency count of the third encoded data, and calculates an expected value of the original data corresponding to the third encoded data based on the estimated frequency count; statistical correction processing is then performed according to the plurality of expected values corresponding to the plurality of third coded data to finally obtain a statistical result.

According to the embodiment of the invention, the data query speed is accelerated while differential privacy protection is carried out on the data.

Referring to fig. 3, a third embodiment of the data query method based on differential privacy according to the embodiment of the present invention includes:

in the embodiment, the project characteristics in the original data in the client database are obtained in advance, and the original data are classified according to the project characteristics to obtain a plurality of data types; coding each original data according to the data type of each original data to obtain a plurality of coding tables; a coding table set is composed based on a plurality of coding tables. The efficiency of subsequent randomization is improved by generating codes according to the number of specific items of data. In a specific example, an encoding table may be as shown in table 1 in the foregoing embodiment.

301. Extracting original data and a preset coding table set in a client database;

302. acquiring the data type of original data, and screening out a corresponding coding table in a coding table set according to the data type;

in this embodiment, before a data statistics request is received, the data may be classified according to the characteristics of the data to be counted and encoded according to the classification result, so as to generate a plurality of encoding tables that form an encoding table set; the encoding table set is preset in the client database when the client is installed or updated; the encoding is binary, and the number of bits is chosen so that different data contents never map to the same code.

303. Encoding the original data based on the corresponding encoding table to obtain first encoded data;

after the corresponding coding table is found according to the data type of the original data, the original data is coded according to the content of the coding table to obtain first coded data corresponding to the original data, wherein the first coded data is binary.

For example, when the data to be counted includes a disease type and an income interval, the corresponding code is generated from the specific disease type or the specific income interval.

304. Extracting each digit of the first coded data to obtain a first digit sequence;

305. identifying each first digit in the first sequence of digits;

306. outputting a real value according to the value of the first number by a first mapping probability, and outputting a random value according to a second mapping probability to obtain an output result;

307. obtaining a second digital sequence according to the output result, and generating second coded data according to the second digital sequence;

in this embodiment, each digit of the first encoded data is extracted to obtain a first digit sequence, each first digit in the first digit sequence is identified, and a second digit sequence is generated according to a preset response rule and a value of the first digit to obtain second encoded data, specifically, B represents the first encoded data, B' represents the second encoded data, and f is a disturbance probability; in a specific example, the permanent random response rule in the present embodiment may be:

that is, the client applies a permanent randomization to the data once: each bit of the bit string B of the first encoded data is answered randomly with a probability of f and answered truthfully with a probability of 1-f. For example, when a certain digit in the first encoded data is 1, the true value 1 is output with a probability of 1-f, and a value randomly selected from 0 and 1 is output with a probability of f.
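Written out per bit, the rule described in the preceding paragraph takes the following form. This is a reconstruction from the prose, not the formula of the embodiment; the subscript notation B_i, B'_i for the i-th bit is assumed, and the random answer is assumed to pick 0 or 1 with equal probability.

$$
B'_i =
\begin{cases}
1, & \text{with probability } \tfrac{f}{2} \\
0, & \text{with probability } \tfrac{f}{2} \\
B_i, & \text{with probability } 1 - f
\end{cases}
$$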

For example, when a first coded data is 10001101, the second coded data may be 10010010 after passing through the rules of the permanent random response.
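A minimal sketch of the permanent random response on a bit string is given below. It assumes that the random answer chooses 0 or 1 uniformly, which the description does not state explicitly; the function name and the `rng` parameter are hypothetical.

```python
import random


def permanent_random_response(bits, f, rng=None):
    """Perturb each bit once: random answer with probability f, true answer otherwise.

    Assumption: the random answer is a uniform choice between "0" and "1".
    """
    if rng is None:
        rng = random.Random()
    out = []
    for bit in bits:
        if rng.random() < f:
            out.append(rng.choice("01"))  # random answer
        else:
            out.append(bit)               # truthful answer
    return "".join(out)


# The result is stored and reused for every later query, so it is computed only once.
second_coded = permanent_random_response("10001101", f=0.5, rng=random.Random(2021))
```

Because the output is random, the stored second coded data differs between clients and between runs with different seeds; only its length and alphabet are fixed.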

In addition, to ensure that the outputs of any two inputs are indistinguishable, the resulting second encoded data must satisfy ε-localized differential privacy. Assuming that a bit of B' is 0, the probabilities that the corresponding input bit of B is 0 or 1, respectively, are:

wherein, the above formula satisfies:

then, it can be known that:

that is, the above equation satisfies f ≦ 0.5 in the present embodiment.
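As a hedged worked example for a single bit, and not a reproduction of the formulas of the embodiment: if the random answer is assumed to pick 0 or 1 with equal probability, the conditional probabilities of an output bit of 0 and their ratio are

$$
\Pr[B'_i = 0 \mid B_i = 0] = 1 - \tfrac{f}{2}, \qquad
\Pr[B'_i = 0 \mid B_i = 1] = \tfrac{f}{2}, \qquad
\frac{\Pr[B'_i = 0 \mid B_i = 0]}{\Pr[B'_i = 0 \mid B_i = 1]} = \frac{2 - f}{f}.
$$

By symmetry the same ratio holds for an output bit of 1, so under this assumption the per-bit privacy parameter is \(\varepsilon = \ln\frac{2-f}{f}\). For instance, f = 0.5 gives a ratio of 3 and \(\varepsilon = \ln 3 \approx 1.10\).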

308. Composing a second encoded data set based on the second encoded data;

after the first coded data are obtained, performing permanent random response mapping operation on each bit in the first coded data according to a preset permanent random response rule, namely performing random disturbance processing on each bit in the first coded data, wherein the probability of performing random disturbance is preset according to the degree of data protection required to be achieved; and the second coding data after data disturbance is formed into a second coding data set.

309. Receiving a data query request and extracting a data type corresponding to the data query request;

310. acquiring a second encoding data set in the client database;

311. screening second coded data corresponding to the data type in the second coded data set to obtain target coded data;

when a server needs to perform data statistics on specific information of a certain data type, a data statistics request needs to be issued to a client within a data statistics range, and a data type corresponding to a data query request is extracted; and then acquiring a second encoding data set prestored in the client database. And screening the second coded data in the second coded data set according to the data statistical request to obtain target data.

312. Extracting each digit of the target coded data to obtain a third digit sequence;

313. identifying each third digit in the third sequence of digits;

314. judging whether the value of the third number is 1;

315. if so, outputting a true value according to a third mapping probability to obtain third encoded data;

316. if not, outputting a true value according to a fourth mapping probability to obtain third encoded data;

after the corresponding target coded data is determined in the previous step, a preset instant random response rule is called to perform instant random response processing on the target coded data. The instant random response processing is performed once on the obtained target coded data each time a data statistics request is received, so that a fresh random response is applied even when different statistics requests retrieve the same target coded data.

The target coded data is subjected to instant random response processing according to a preset instant random response rule, and a specifically used expression when third coded data is obtained is as follows:

wherein S represents the third coded data, and p and q are the disturbance probabilities of the instant random response. The random response rule in this step is: for each bit of the target coded data B' obtained as described above, if the bit is 1, it is output as 1 with a probability of q and as 0 with a probability of 1-q; if the bit is 0, it is output as 1 with a probability of p and as 0 with a probability of 1-p, thereby obtaining the third coded data S.
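In per-bit form, the rule in the preceding paragraph can be written as follows (a reconstruction from the prose; the subscript notation S_i, B'_i for the i-th bit is assumed):

$$
\Pr[S_i = 1] =
\begin{cases}
q, & \text{if } B'_i = 1 \\
p, & \text{if } B'_i = 0
\end{cases}
\qquad
\Pr[S_i = 0] = 1 - \Pr[S_i = 1].
$$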

In addition, in this embodiment, the third encoded data obtained by the instant random response processing satisfies ε-localized differential privacy. In the path notation below, the three digits are the input bit of B, the intermediate bit of B', and the output bit of S. If the input is 0 and the output is 1, there are two possibilities: 0-1-1 and 0-0-1. The sum of their probabilities is as follows:

if the input is 1 and the output is 1, there are two possibilities: 1-0-1 and 1-1-1. The sum of their probabilities is as follows:

if the input is 0 and the output is 0, there are two possibilities: 0-0-0 and 0-1-0. The sum of their probabilities is as follows:

if the input is 1 and the output is 0, there are two possibilities: 1-1-0 and 1-0-0. The sum of their probabilities is as follows:
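The four sums can be sketched as follows. This is a reconstruction under the assumption that the permanent random response flips to a uniformly chosen bit with probability f, so that an input bit survives unchanged with probability 1 - f/2; it is not a reproduction of the formulas of the embodiment.

$$
\begin{aligned}
\Pr[S_i = 1 \mid B_i = 0] &= \tfrac{f}{2}\,q + \left(1 - \tfrac{f}{2}\right) p, \\
\Pr[S_i = 1 \mid B_i = 1] &= \tfrac{f}{2}\,p + \left(1 - \tfrac{f}{2}\right) q, \\
\Pr[S_i = 0 \mid B_i = 0] &= \left(1 - \tfrac{f}{2}\right)(1 - p) + \tfrac{f}{2}\,(1 - q), \\
\Pr[S_i = 0 \mid B_i = 1] &= \left(1 - \tfrac{f}{2}\right)(1 - q) + \tfrac{f}{2}\,(1 - p).
\end{aligned}
$$

As a consistency check, the first and third lines sum to 1, as do the second and fourth.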

as a specific example, take q = 0.75 and p = 0.5: a bit of 1 is kept as 1 with a probability of 0.75 and set to 0 with a probability of 0.25; a bit of 0 is set to 1 with a probability of 0.5 and kept as 0 with a probability of 0.5. When the bit string B' is 10101110, the output S after the instant random response may be 11010101.
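The instant random response can be sketched in a few lines. The function name and the `rng` parameter are hypothetical; q is taken as the probability of reporting 1 when the stored bit is 1, and p as the probability of reporting 1 when the stored bit is 0.

```python
import random


def instant_random_response(bits, p, q, rng=None):
    """Report each bit as "1" with probability q (stored bit 1) or p (stored bit 0)."""
    if rng is None:
        rng = random.Random()
    out = []
    for bit in bits:
        prob_one = q if bit == "1" else p
        out.append("1" if rng.random() < prob_one else "0")
    return "".join(out)


# A fresh response is drawn for every statistics request, even for the same stored data.
stored = "10101110"
report_a = instant_random_response(stored, p=0.5, q=0.75)
report_b = instant_random_response(stored, p=0.5, q=0.75)
```

Two reports built from the same stored bit string will usually differ, which is the purpose of repeating this step for each request.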

317. And performing statistical processing on the third encoded data to obtain an estimation frequency, and generating a data query result according to the estimation frequency.

After the third coded data are obtained, calling a lossless compression tool to compress the third coded data to obtain compressed third coded data; transmitting the compressed third coded data to a data statistics server; and decoding the compressed third encoded data according to the decoding dictionary.

Lossless data compression is compression from which the original data can be reconstructed (restored or decompressed) exactly, so that the reconstructed data is completely identical to the original data. Its compression ratio is generally lower than that of lossy data compression, and it is used where the reconstructed signal must be completely identical to the original signal. That is, no information is lost when the data is compressed; the data can be compressed, for example, by a sliding window algorithm.
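As an illustration of lossless compression of a report, the sketch below uses Python's standard `zlib` module, whose DEFLATE format is built on an LZ77 sliding window. The choice of `zlib` and the helper names are assumptions; the description does not name a specific compression tool, and the decoding dictionary it mentions is not modeled here.

```python
import zlib


def pack_report(bits: str) -> bytes:
    """Compress a bit string losslessly before sending it to the statistics server."""
    return zlib.compress(bits.encode("ascii"), 9)


def unpack_report(blob: bytes) -> str:
    """Restore exactly the bit string that was packed."""
    return zlib.decompress(blob).decode("ascii")


third_coded = "11010101" * 64
blob = pack_report(third_coded)
restored = unpack_report(blob)
```

For very short bit strings the compressed form can be larger than the input because of the format's fixed header, so compression pays off mainly when reports are long or batched.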

After the third encoded data is obtained, it is sent to the statistical server that issued the data statistics request. After receiving it, the statistical server performs statistics on the third encoded data according to the probability parameters in the encoding rule and the response rule to obtain an estimated frequency count of the third encoded data, and calculates an expected value of the original data corresponding to the third encoded data based on the estimated frequency count. When data statistics is carried out, the statistical server sends data requests to a plurality of clients to obtain a plurality of third coded data, and statistical correction processing is performed according to the plurality of expected values corresponding to the plurality of third coded data to finally obtain a statistical result.
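One way the server-side correction could look is sketched below. It is a per-bit unbiased estimator under stated assumptions, not necessarily the statistical correction of the embodiment: every client is assumed to use the same f, p and q, the permanent random answer is assumed uniform, and the function name is hypothetical. It estimates, for each bit position, how many clients hold a true bit of 1.

```python
def estimate_bit_counts(reports, f, p, q):
    """Estimate, per bit position, how many clients have a true bit of 1."""
    n = len(reports)
    width = len(reports[0])
    # Probability that a report bit is 1 given the true bit (both response steps combined).
    p_star = 0.5 * f * q + (1 - 0.5 * f) * p  # true bit is 0
    q_star = 0.5 * f * p + (1 - 0.5 * f) * q  # true bit is 1
    if q_star == p_star:
        raise ValueError("reports carry no information when f == 1 or p == q")
    estimates = []
    for i in range(width):
        observed = sum(report[i] == "1" for report in reports)
        # Solve observed = t * q_star + (n - t) * p_star for the true count t.
        estimates.append((observed - p_star * n) / (q_star - p_star))
    return estimates


# With no perturbation at all the estimate equals the exact count.
exact = estimate_bit_counts(["10", "11", "00"], f=0.0, p=0.0, q=1.0)
```

Individual estimates can fall below 0 or above the number of clients because of sampling noise; clipping them to that range is a common follow-up step.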

According to the embodiment of the invention, the data query speed is accelerated while differential privacy protection is carried out on the data.

With reference to fig. 4, the differential privacy-based data query method in the embodiment of the present invention is described above, and a differential privacy-based data query device in the embodiment of the present invention is described below, where an embodiment of the differential privacy-based data query device in the embodiment of the present invention includes:

a receiving module 401, configured to receive a data query request and extract a data type corresponding to the data query request;

an obtaining module 402, configured to obtain a second encoding data set in the client database;

a first response module 403, configured to filter out second encoded data corresponding to the data type in the second encoded data set, so as to obtain target encoded data;

a second response module 404, configured to perform an instant random response process on the target encoded data according to a preset instant random response rule, so as to obtain third encoded data;

a result generating module 405, configured to perform statistical processing on the third encoded data to obtain an estimation frequency, and generate a data query result according to the estimation frequency, where the data query result meets the requirement of localized differential privacy.

According to the embodiment of the invention, the data query speed is accelerated while differential privacy protection is carried out on the data.

Referring to fig. 5, another embodiment of the data query apparatus based on differential privacy according to the embodiment of the present invention includes:

a receiving module 401, configured to receive a data query request and extract a data type corresponding to the data query request;

an obtaining module 402, configured to obtain a second encoding data set in the client database;

a first response module 403, configured to filter out second encoded data corresponding to the data type in the second encoded data set, so as to obtain target encoded data;

a second response module 404, configured to perform an instant random response process on the target encoded data according to a preset instant random response rule, so as to obtain third encoded data;

a result generating module 405, configured to perform statistical processing on the third encoded data to obtain an estimation frequency, and generate a data query result according to the estimation frequency.

In another embodiment of the present application, the differential privacy-based data query apparatus further includes an encoded data set generating module 406, and the encoded data set generating module 406 includes:

the extracting unit 4061 is configured to extract original data and a preset encoding table set in a client database, where the encoding table set includes at least one encoding table;

the screening unit 4062 is configured to obtain a data type of the original data, and screen out a corresponding encoding table in the encoding table set according to the data type;

a first encoding unit 4063, configured to encode the original data based on the corresponding encoding table to obtain first encoded data, where the first encoded data is binary data;

a second encoding unit 4064, configured to perform persistent random response mapping on the first encoded data according to a preset persistent random response rule to obtain second encoded data, where the second encoded data is binary data;

a generating unit 4065, configured to compose a second encoded data set based on the second encoded data, where the data query result satisfies localized differential privacy.

In another embodiment of the present application, the data query apparatus based on differential privacy further includes a code table set generating module, where the code table set generating module is specifically configured to:

acquiring project characteristics in all data types, and classifying the data types according to the project characteristics to obtain a plurality of data types; coding each data according to the data type in each data type to obtain a plurality of coding tables; and forming a coding table set based on the plurality of coding tables.

In another embodiment of the present application, the second encoding unit 4064 is specifically configured to:

extracting each digit of the first coded data to obtain a first digit sequence; identifying each first digit in the first sequence of digits; according to the value of the first number, outputting a real value according to a first mapping probability, and outputting a random value according to a second mapping probability to obtain an output result, wherein the real value and the random value are binary values, and the sum of the first mapping probability and the second mapping probability is 1; obtaining a second digital sequence according to the output result; second encoded data is generated from the second digital sequence.

In another embodiment of the present application, the second response module 404 is specifically configured to:

extracting each digit of the target coding data to obtain a third digit sequence; identifying each third digit in the third sequence of digits; judging whether the value of the third number is 1, if so, outputting a true value according to a third mapping probability to obtain third coded data; if not, outputting the true value according to the fourth mapping probability to obtain third coded data.

In another embodiment of the present application, the data query apparatus based on differential privacy further includes a data compression module, where the data compression module is specifically configured to:

calling a lossless compression tool to compress the third coded data to obtain compressed third coded data; transmitting the compressed third encoding data to the data statistics server; and decoding the compressed third encoded data according to a decoding dictionary.

According to the embodiment of the invention, the data query speed is accelerated while differential privacy protection is carried out on the data.

Fig. 4 and fig. 5 describe the data query device based on differential privacy in the embodiment of the present invention in detail from the perspective of the modular functional entity; the data query device based on differential privacy in the embodiment of the present invention is described in detail below from the perspective of hardware processing.

Fig. 6 is a schematic structural diagram of a differential privacy-based data query device 600 according to an embodiment of the present invention. The device may vary considerably with configuration or performance, and may include one or more central processing units (CPUs) 610, a memory 620, and one or more storage media 630 (e.g., one or more mass storage devices) for storing applications 633 or data 632. The memory 620 and the storage medium 630 may be transient or persistent storage. The program stored on the storage medium 630 may include one or more modules (not shown), each of which may include a sequence of instructions operating on the differential privacy based data query device 600. Still further, the processor 610 may be configured to communicate with the storage medium 630 to execute a series of instruction operations in the storage medium 630 on the differential privacy based data query device 600.

The differential privacy-based data query device 600 may also include one or more power supplies 640, one or more wired or wireless network interfaces 650, one or more input-output interfaces 660, and/or one or more operating systems 631, such as Windows Server, Mac OS X, Unix, Linux, FreeBSD, and the like. Those skilled in the art will appreciate that the differential privacy based data query device architecture illustrated in fig. 6 does not constitute a limitation of differential privacy based data query devices and may include more or fewer components than those illustrated, or some components may be combined, or a different arrangement of components.

The present invention also provides a computer device, which may be any device capable of executing the differential privacy-based data query method described in the above embodiments, and the computer device includes a memory and a processor, where the memory stores computer-readable instructions, and the computer-readable instructions, when executed by the processor, cause the processor to execute the steps of the differential privacy-based data query method in the above embodiments.

The block chain is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, a consensus mechanism, an encryption algorithm and the like. A block chain (Blockchain), which is essentially a decentralized database, is a series of data blocks associated by using a cryptographic method, and each data block contains information of a batch of network transactions, so as to verify the validity (anti-counterfeiting) of the information and generate a next block. The blockchain may include a blockchain underlying platform, a platform product service layer, an application service layer, and the like.

The present invention also provides a computer-readable storage medium, which may be a non-volatile computer-readable storage medium, and may also be a volatile computer-readable storage medium, having stored therein instructions, which, when executed on a computer, cause the computer to perform the steps of the differential privacy-based data query method.

It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.

The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: various media capable of storing program codes, such as a usb disk, a removable hard disk, a read-only memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.

The above-mentioned embodiments are only used for illustrating the technical solutions of the present invention, and not for limiting the same; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.
