Differential privacy using count mean sketch
Note: the technology described herein, Differential privacy using count mean sketch, was devised by A. Bhowmick, A. H. Vyrros, U. S. Vaishampayan, K. W. Decker, C. A. Schultz, and S. J. Fa, and was filed on 2018-03-28. Abstract: Embodiments described herein provide a privacy mechanism for protecting user data when transmitting such data to a server that estimates the frequency of such data in a set of client devices. In one embodiment, a differential privacy mechanism is implemented using count-mean sketch techniques that may reduce the resource requirements needed to enable privacy while providing provable guarantees about privacy and utility. For example, the mechanism may provide the ability to tailor the utility (e.g., estimation accuracy) according to resource requirements (e.g., transmission bandwidth and computational complexity).
1. A non-transitory machine-readable medium storing instructions that, when executed by one or more processors of a computing device, cause the computing device to perform operations comprising:
selecting a user data value to be transmitted to a server from a set of possible user data values collected on a client device;
creating a hash value of a term representing the user data value using a random hash function, wherein the random hash function is generated from a randomly selected variant of the term, and wherein a set of possible variants of the term is indexed;
encoding at least a portion of the created hash value as a vector, wherein the encoding comprises updating a vector value at a location corresponding to the created hash value;
privatizing the vector by changing at least some of the vector values with a predefined probability; and
transmitting the privatized vector and an index value of the randomly selected variant to the server to enable the server to estimate a frequency of the user data value on a set of client devices.
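Claims 1-4 together describe a concrete client-side procedure: pick an indexed variant, hash, encode a sign vector, and randomize. A minimal sketch of those steps follows (the parameters `m`, `k`, and `epsilon`, the use of SHA-256, and the function names are illustrative assumptions; the claims do not prescribe a particular hash function or vector length):

```python
import hashlib
import math
import random

m = 16         # vector length (illustrative choice)
k = 8          # number of indexed variants / hash functions (illustrative)
epsilon = 4.0  # privacy parameter
flip_p = 1.0 / (1.0 + math.exp(epsilon))  # the flip probability of claim 10

def privatize(value: str):
    # Randomly select one of the k indexed variants of the term; the
    # variant determines the hash function used (claims 1 and 2).
    j = random.randrange(k)
    variant = f"{value}:{j}"  # index appended to the string of characters
    h = int(hashlib.sha256(variant.encode()).hexdigest(), 16) % m
    # Initialize the vector to a uniform constant value and sign, then
    # flip the sign at the location given by the hash (claims 3 and 4).
    c = 1.0
    vec = [c] * m
    vec[h] = -c
    # Privatize: independently flip each value's sign with probability
    # 1 / (1 + e^epsilon).
    priv = [-v if random.random() < flip_p else v for v in vec]
    # Only the privatized vector and the variant index leave the device.
    return priv, j
```

Because every transmitted vector has been randomized, the server can learn aggregate frequencies but cannot attribute a value to any one device.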
2. The non-transitory machine readable medium of claim 1, wherein:
the term is a string of characters and the randomly selected variant of the term includes one or more characters representing the index value appended to the string of characters;
the server estimates the frequency of the user data value by updating a frequency table indexed by the set of possible variants, wherein a row or column of the frequency table corresponding to the index value of the randomly selected variant is updated with the privatized vector; and
the server estimates a frequency for each of the set of possible user data values from data accumulated from the set of client devices.
3. The non-transitory machine readable medium of claim 1, wherein the encoding comprises initializing the vector using a uniform value and sign, and updating the vector value comprises flipping the sign of the value at the location corresponding to the created hash value.
4. The non-transitory machine-readable medium of claim 3, wherein initializing the vector comprises setting each vector value to a value representing a constant, and updating the vector values comprises setting the values at the locations corresponding to the created hash values to values representing sign flips of the constant.
5. The non-transitory machine-readable medium of claim 1, wherein the randomly selected variant is to prevent hash collisions when only the portion of the created hash value is used and to reduce the number of computations required to create a frequency table while maintaining privacy of the user data values.
6. The non-transitory machine-readable medium of claim 1, wherein the vector is encoded using a Hadamard matrix and the user data value represents a website visited by a user of the client device.
7. The non-transitory machine-readable medium of claim 1, wherein only the privatized vector and the index value of the randomly selected variant are transmitted to the server as the information representing the user data value.
8. The non-transitory machine readable medium of claim 1, wherein privatizing the vector comprises changing at least some of the vector values with a predefined probability, the predefined probability based on a privacy parameter.
9. The non-transitory machine readable medium of claim 8, wherein the privacy parameter represents a configuration tradeoff between privacy and accuracy.
10. The non-transitory machine-readable medium of claim 8, wherein the predefined probability is defined as 1/(1 + e^ε), and ε is the privacy parameter.
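To make claim 10's tradeoff concrete, the sketch below evaluates the flip probability 1/(1 + e^ε) for a couple of assumed ε values: at ε = 0 every vector value is flipped with probability 1/2 (maximal privacy, no utility), and the probability falls toward 0 as ε grows, favoring accuracy over privacy.

```python
import math

def flip_probability(epsilon: float) -> float:
    # Probability of flipping each vector value: 1 / (1 + e^epsilon)
    return 1.0 / (1.0 + math.exp(epsilon))

# epsilon = 0 makes every value a fair coin; larger epsilon biases the
# transmitted vector toward the true encoding.
print(flip_probability(0.0))  # 0.5
print(flip_probability(4.0))  # roughly 0.018
```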
11. An electronic device, comprising:
one or more processors; and
a memory coupled to the one or more processors, the memory storing instructions that, when executed by the one or more processors, cause the one or more processors to:
selecting a user data value to be transmitted to a server from a set of possible user data values collected on a client device;
creating a hash value of a term representing the user data value using a random hash function, wherein the random hash function is generated from a randomly selected variant of the term, and wherein a set of possible variants of the term is indexed;
encoding at least a portion of the created hash value as a vector, wherein the encoding comprises updating a vector value at a location corresponding to the created hash value;
privatizing the vector by changing at least some of the vector values with a predefined probability; and
transmitting the privatized vector and an index value of the randomly selected variant to the server, wherein the server estimates a frequency of the user data value in a set of different client devices.
12. The electronic device of claim 11, wherein the server estimates the frequency through an update to a frequency table indexed by the set of possible variants, wherein a row or column of the frequency table corresponding to the index value of the randomly selected variant is updated with the privatized vector.
13. The electronic device of claim 12, wherein:
the randomly selected variant prevents hash collisions when only the portion of the created hash value is used and reduces the number of computations required by the server to create the frequency table while maintaining privacy of the user data values;
the term is a string of characters and the randomly selected variant of the term includes one or more characters representing the index value appended to the string of characters; and
the encoding comprises initializing the vector using a uniform value and sign, and updating the vector value comprises flipping the sign of the value at the location corresponding to the created hash value.
14. A data processing system comprising:
one or more processors; and
a memory coupled to the one or more processors, the memory storing instructions that, when executed by the processors, cause the processors to perform operations comprising:
selecting a user data value to be transmitted to a server from a set of possible user data values collected on a client device;
creating a hash value of a term representing the user data value using a random hash function, wherein the random hash function is generated from a randomly selected variant of the term, and wherein a set of possible variants of the term is indexed;
encoding at least a portion of the created hash value into a vector, wherein the encoding comprises updating a vector value at a location corresponding to the created hash value;
privatizing the vector by changing at least some of the vector values with a predefined probability; and
transmitting the privatized vector and the index value of the randomly selected variant to the server, wherein the server estimates the frequency of the user data value by updating a frequency table indexed by the set of possible variants.
15. The data processing system of claim 14, wherein:
a row or column of the frequency table corresponding to the index value of the randomly selected variant is updated with the privatized vector;
the term is a string of characters and the randomly selected variant of the term includes one or more characters representing the index value appended to the string of characters, the randomly selected variant preventing hash collisions when only the portion of the created hash value is used and reducing the number of computations required by the server to create the frequency table while maintaining privacy of the user data values;
the encoding comprises initializing the vector using a uniform value and sign; and
updating the vector value comprises flipping the sign of the value at the location corresponding to the created hash value.
16. An electronic device, comprising:
a non-transitory machine readable medium to store instructions;
one or more processors to execute the instructions; and
a memory coupled to the one or more processors, the memory to store the instructions, which when executed by the one or more processors, cause the one or more processors to:
detecting an application-related interaction that is performed in response to an instruction within a web page presented by the application, wherein the web page presented by the application is for presentation of a media item;
associating the web page presented by the application with a category based on the interaction, the category selected from a set of categories related to inferred preferences of the web page presented by the application;
creating a privatized encoding comprising a representation of the web page presented by the application and a representation of the category; and
transmitting the privatized encoding to at least one server that accumulates privatized encodings from a plurality of devices to estimate a frequency with which web pages presented by the application are associated with the category across the plurality of devices.
17. The electronic device of claim 16, wherein the interaction is performed in response to initiation, attempted initiation, or permission of automatic playback of a media item presented by the web page or by a web application associated with the web page, and the set of categories includes a first category related to an inferred preference to enable automatic playback of the media item.
18. The electronic device of claim 17, wherein the interaction includes allowing automatic playback of the media item beyond a predetermined time value after the web page presents the media item, or expanding or maximizing the media item after automatic playback of the media item is initiated.
19. The electronic device of claim 17, wherein automatic playback for web pages accessed by a user is disabled based on a setting, and the interaction includes selecting, within a predetermined time value of presentation of the media item, a media item for which the web page attempted to initiate automatic playback.
20. The electronic device of claim 16, wherein the interaction is performed in response to initiation, attempted initiation, or permission of automatic playback of a media item within the web page or a web application associated with the web page, and the set of categories includes a second category related to an inferred user preference to disable automatic playback of the media content.
21. The electronic device of claim 20, wherein the interaction comprises closing or minimizing the application or a tab associated with the web page.
22. The electronic device of claim 20, wherein the interaction comprises muting or lowering system volume.
23. The electronic device of claim 20, wherein automatic playback for the web page is disabled based on a setting, and the interaction includes navigating away from the media item without playing back the media item.
24. The electronic device of claim 20, wherein the interaction comprises interrupting automatic playback of the media item within a predetermined time value of the web page presenting the media item, and wherein interrupting the automatic playback comprises pausing, stopping, or muting the media item, or reducing a volume of the media item to a predetermined level.
25. The electronic device of claim 16, the instructions further causing the one or more processors to:
determining that content on a web page to be displayed by the application includes logic indicating that the media content is set for automatic playback; and
adjusting one or more presentation settings of the media content prior to presentation of the media content by the application, wherein adjusting the one or more presentation settings comprises disabling auto-play, delaying the auto-play, pausing the auto-play, or muting the media content.
26. The electronic device of claim 16, wherein to create the privatized code, the instructions are to cause the one or more processors to:
creating a hash value of the representation of the web page and the representation of the category using a randomly selected hash function,
encoding an initialized vector by sign-flipping vector values at positions corresponding to the hash value, and
changing at least some of the vector values with a predefined probability determined based on a privacy parameter.
27. A computing system, comprising:
one or more processors; and
a memory coupled to the one or more processors, the memory storing instructions that, when executed by the one or more processors, cause the computing system to perform operations comprising:
receiving, from each of a set of client devices, a privatized encoding of a web page and a category associated with the web page, wherein the category relates to an inferred preference for presenting media content on the web page;
accumulating the privatized encodings from the set of client devices; and
estimating a frequency of selected web pages associated with the category based on the accumulated privatized encodings from the set of client devices.
28. The computing system of claim 27, the operations further comprising:
accumulating a sketch of the privatized encodings received from the set of client devices, the sketch including frequency estimates of the privatized encodings; and
estimating the frequency of the selected web pages associated with the category among the frequencies included in the sketch of the privatized encodings.
29. The computing system of claim 27, wherein estimating the frequency of the selected web pages comprises determining a count using a count mean sketch operation.
30. The computing system of claim 27, wherein the category relates to inferring a preference with respect to displaying content on the web page using a reader mode, wherein the reader mode is a mode that does not display menu items.
31. The computing system of claim 27, wherein the category relates to an inferred preference for content blocking settings for the web page.
32. The computing system of claim 27, wherein the category relates to an inferred user preference to enable automatic playback of media content on the web page.
33. The computing system of claim 32, the operations further comprising: adding the selected web page to a whitelist based on an estimated frequency of users providing the inferred preference to enable automatic playback for the web page, wherein the whitelist comprises a list of web pages for which automatic playback is allowed.
34. The computing system of claim 33, the operations further comprising: removing the selected web page from the whitelist or adding the selected web page to a blacklist based on an estimated frequency of users providing an inferred preference to disable automatic playback for the web page, wherein the whitelist comprises a list of web pages for which automatic playback is allowed and the blacklist comprises a list of web pages for which automatic playback is disabled.
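The list management of claims 33-34 can be sketched as a thresholding step over the server's frequency estimates (the frequencies and thresholds below are made-up illustrative values, not outputs of a real deployment):

```python
# Estimated per-page frequencies of the "enable auto-play" and
# "disable auto-play" inferences, as a count-mean-sketch pipeline
# might produce them (illustrative numbers).
enable_freq = {"video.example": 0.72, "news.example": 0.10}
disable_freq = {"video.example": 0.05, "news.example": 0.81}

ENABLE_THRESHOLD = 0.5   # assumed policy thresholds
DISABLE_THRESHOLD = 0.5

whitelist, blacklist = set(), set()
for page in enable_freq:
    if enable_freq[page] >= ENABLE_THRESHOLD:
        whitelist.add(page)        # auto-play allowed (claim 33)
    if disable_freq[page] >= DISABLE_THRESHOLD:
        whitelist.discard(page)    # demote on an opposing signal (claim 34)
        blacklist.add(page)
```

Because the inputs are privatized aggregates, the policy reflects population-level preferences without revealing any individual user's browsing.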
35. A non-transitory machine-readable medium storing instructions that, when executed by one or more processors of a computing device, cause the computing device to perform operations comprising:
receiving, from each of a set of client devices, a privatized encoding of a web page and a category associated with the web page, wherein the category relates to an inferred user preference to enable automatic playback of media content on the web page;
accumulating the privatized encodings from the set of client devices; and
estimating a frequency of selected web pages associated with the category based on the accumulated privatized encodings from the set of client devices.
36. The non-transitory machine-readable medium of claim 35, the operations further comprising:
accumulating a sketch of the privatized encodings received from the set of client devices, the sketch including frequency estimates of the privatized encodings; and
estimating the frequency of the selected web pages associated with the category among the frequencies included in the sketch of the privatized encodings.
37. The non-transitory machine-readable medium of claim 35, the operations further comprising: adding the selected web page to a whitelist based on an estimated frequency of users providing the inferred preference to enable automatic playback for the web page, wherein the whitelist comprises a list of web pages for which automatic playback is allowed.
38. The non-transitory machine-readable medium of claim 37, the operations further comprising: removing the selected web page from the whitelist or adding the selected web page to a blacklist based on an estimated frequency of users providing an inferred preference to disable automatic playback for the web page, wherein the whitelist comprises a list of web pages for which automatic playback is allowed and the blacklist comprises a list of web pages for which automatic playback is disabled.
39. The non-transitory machine readable medium of claim 38, wherein estimating the frequency of the selected web pages comprises determining a count using a count mean sketch operation.
40. A data processing system comprising:
a non-transitory machine readable medium to store instructions;
one or more processors to execute the instructions; and
a memory coupled to the one or more processors, the memory to store the instructions, which when executed by the one or more processors, cause the one or more processors to:
detecting an interaction with the data processing system related to a content item of presented content, the interaction determined in response to execution of an instruction to present the content item;
associating the presented content with a category based on the interaction, the category selected from a set of categories related to inferred preferences for the presented content;
creating a privatized encoding comprising a representation of the presented content and a representation of the category; and
transmitting the privatized encoding to at least one server that accumulates privatized encodings from a plurality of devices to estimate a frequency with which the presented content is associated with the category across the plurality of devices.
41. The data processing system of claim 40, wherein the presented content comprises a web page, a web application, or a user interface of an application, and the content item comprises a user interface element of the web page, a user interface element of the web application, or a user interface element of the application.
42. The data processing system of claim 40, wherein the content item is exposed by an application and the category is a setting selection provided by the application.
43. The data processing system of claim 42, wherein the application is a web browser or another application for viewing the presented content.
44. A non-transitory machine-readable medium storing instructions that, when executed by one or more processors of a computing device, cause the computing device to perform operations comprising:
receiving privatized encodings of representations of web pages from a set of client devices, each web page selected to be transmitted in response to exceeding a resource consumption threshold;
accumulating the privatized encodings received from the set of client devices;
generating a frequency estimate based on the accumulated privatized encodings; and
estimating a frequency of selected ones of the web pages using the frequency estimate.
45. The non-transitory machine-readable medium of claim 44, wherein the representation of the web page comprises a website associated with the web page, and the operations further comprise estimating a frequency of a website based on the frequency of selected web pages associated with the website.
46. The non-transitory machine-readable medium of claim 44, the operations further comprising: adjusting the resource consumption threshold based on an analysis of an estimated frequency of one or more web pages included in the frequency estimate, wherein the frequency estimate is a sketch of the privatized encodings received from the set of client devices.
47. The non-transitory machine-readable medium of claim 46, wherein the privatized encoding is a differentially private encoding.
48. The non-transitory machine readable medium of claim 47, wherein estimating the frequency of the selected web pages comprises determining a count using a count mean sketch operation.
49. The non-transitory machine-readable medium of claim 44, wherein the resource consumption corresponds to usage of a processor or memory of the computing device.
50. The non-transitory machine readable medium of claim 44, wherein the resource consumption corresponds to power usage or data transmission bandwidth.
51. An electronic device, comprising:
one or more processors; and
a memory coupled to the one or more processors, the memory storing instructions that, when executed by the one or more processors, cause the electronic device to:
monitoring resource consumption of an application while the application presents a web page from a website;
determining that the resource consumption exceeds a resource consumption threshold;
generating a privatized encoding of a representation of the web page; and
transmitting the privatized encoding of the representation of the web page to a server, wherein the server is to accumulate a sketch of privatized encodings from different devices to estimate a frequency with which the web page exceeds the resource consumption threshold on the different devices.
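The client-side trigger of claim 51 can be sketched as follows; the CPU-usage metric, the threshold value, and the `privatize`/`transmit` callables are illustrative assumptions standing in for the monitoring and encoding machinery described elsewhere in the disclosure:

```python
CPU_THRESHOLD = 0.8  # assumed: fraction of CPU attributable to the page's tab

def maybe_report(page_url: str, cpu_usage: float, privatize, transmit):
    """Report a privatized representation of the page only when its
    resource consumption exceeds the threshold."""
    if cpu_usage <= CPU_THRESHOLD:
        return False  # below threshold: nothing leaves the device
    encoding = privatize(page_url)  # the raw URL is never transmitted
    transmit(encoding)
    return True

# Demonstration with stand-in callables for the encoder and the uplink.
sent = []
reported = maybe_report("https://heavy.example/page",
                        0.93,
                        privatize=lambda u: hash(u) % 997,  # stand-in encoder
                        transmit=sent.append)
```

Gating the report on the threshold means the server's sketch only ever accumulates pages that were actually resource-heavy on some device.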
52. The electronic device of claim 51, wherein the server is to estimate a frequency with which websites have web pages that exceed the resource consumption threshold.
53. The electronic device of claim 51, wherein the resource consumption corresponds to usage of the one or more processors or the memory coupled to the one or more processors.
54. The electronic device of claim 51, wherein the resource consumption corresponds to power usage of the electronic device or data transmission bandwidth consumption of the electronic device.
55. The electronic device of claim 51, wherein the application that accesses the web page comprises a browser.
56. The electronic device of claim 55, wherein monitoring the resource consumption of the application accessing the web page comprises monitoring resource consumption of a process id of a tab of the application corresponding to the web page.
57. The electronic device of claim 51, wherein generating the privatized encoding of the representation of the web page comprises:
creating a hash value for the web page using a randomly selected hash function,
encoding an initialized vector by sign-flipping vector values at positions corresponding to the created hash value, and
changing at least some of the vector values with a predefined probability.
58. The electronic device of claim 57, wherein the predefined probability is based on a predetermined privacy parameter.
59. A data processing system comprising:
one or more processors; and
a memory coupled to the one or more processors, the memory storing instructions that, when executed by the processors, cause the processors to perform operations comprising:
receiving a privatized encoding of a representation of a web page from each of a set of client devices, each web page selected to be transmitted in response to a resource consumption threshold being exceeded;
accumulating a sketch of the privatized encodings received from the set of client devices; and
estimating a frequency of selected ones of the web pages included in the sketch.
60. The data processing system of claim 59, wherein the representation of the web page comprises a website associated with the web page, and the operations further comprise: estimating a frequency of a website based on the frequency of web pages associated with the website included in the sketch.
61. The data processing system of claim 59, the operations further comprising: adjusting the resource consumption threshold based on an analysis of an estimated frequency of one or more web pages included in the sketch.
62. The data processing system of claim 59, wherein the resource consumption corresponds to usage of one or more processors or memory coupled to the one or more processors.
63. The data processing system of claim 59, wherein the resource consumption corresponds to power usage or data transfer bandwidth.
Technical Field
The present disclosure relates generally to the field of differential privacy. More particularly, the present disclosure relates to a system that implements an efficient differential privacy mechanism while maintaining guarantees on privacy and utility.
Background
As the amount of information collected in online environments grows, individuals have become increasingly protective of the various forms of information they provide. Thus, differential privacy has become an important consideration for providers that aggregate online information. In a crowdsourced client/server environment, local differential privacy introduces randomness into user data before the client shares the user data with the server. The server can learn from the aggregate of the crowdsourced data of all clients, but cannot learn the data provided by any particular client. As more user information is collected, general patterns begin to emerge, which can inform and enhance the user experience. Thus, differential privacy provides insights from large datasets while offering a mathematical proof that the information of any single individual remains private.
When local differential privacy is employed, the client device must perform various operations to create the privatized data. These client-side operations can include encoding the data, which in some cases (e.g., when the universe of possible values reaches into the thousands or millions) may be resource intensive in terms of computational cost and transmission bandwidth. In addition, the server must perform correspondingly intensive operations to process the privatized data. Accordingly, there is a continuing need for efficient mechanisms for implementing local differential privacy of user data.
Disclosure of Invention
Embodiments described herein provide a privacy mechanism for protecting user data when transmitting such data to a server that estimates the frequency of such data in a set of client devices. In one embodiment, a differential privacy mechanism is implemented using count-mean sketch techniques that may reduce the resource requirements needed to enable privacy while providing provable guarantees about privacy and utility. For example, the mechanism may provide the ability to tailor the utility (e.g., estimated accuracy) according to resource requirements (e.g., transmission bandwidth and computational complexity).
One embodiment provides a non-transitory machine-readable medium storing instructions that, when executed by one or more processors of a computing device, cause the computing device to perform operations comprising selecting a user data value to transmit to a server from a set of possible user data values collected on a client device; creating a hash value of a term representing the user data value using a random hash function, wherein the random hash function is generated from a randomly selected variant of the term, and wherein a set of possible variants of the term is indexed; encoding at least a portion of the created hash value as a vector, wherein the encoding comprises updating a vector value at a location corresponding to the created hash value; privatizing the vector by changing at least some of the vector values with a predefined probability; and transmitting the privatized vector and an index value of the randomly selected variant to the server to enable the server to estimate a frequency of the user data value on a set of client devices.
One embodiment provides an electronic device comprising one or more processors and memory coupled to the one or more processors, the memory storing instructions that, when executed by the one or more processors, cause the one or more processors to select a user data value to transmit to a server from a set of possible user data values collected on a client device; create a hash value of a term representing the user data value using a random hash function, wherein the random hash function is generated from a randomly selected variant of the term, and wherein a set of possible variants of the term is indexed; encode at least a portion of the created hash value as a vector, wherein the encoding comprises updating a vector value at a location corresponding to the created hash value; privatize the vector by changing at least some of the vector values with a predefined probability; and transmit the privatized vector and an index value of the randomly selected variant to a server that estimates a frequency of the user data value in a set of different client devices.
One embodiment provides a data processing system comprising one or more processors and memory coupled to the one or more processors, the memory storing instructions that, when executed by the processors, cause the processors to perform operations comprising selecting a user data value to be transmitted to a server from a set of possible user data values collected on a client device; creating a hash value of a term representing the user data value using a random hash function, wherein the random hash function is generated from a randomly selected variant of the term, and wherein a set of possible variants of the term is indexed; encoding at least a portion of the created hash value into a vector, wherein the encoding comprises updating a vector value at a location corresponding to the created hash value; privatizing the vector by changing at least some of the vector values with a predefined probability; and transmitting the privatized vector and the index value of the randomly selected variant to the server, wherein the server estimates the frequency of the user data value by updating a frequency table indexed by the set of possible variants.
Other features of the present embodiments will be apparent from the accompanying drawings and from the detailed description that follows.
Drawings
Embodiments of the present disclosure are illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings in which like reference numerals refer to similar elements.
Fig. 1 is a block diagram illustrating an exemplary overview of a system environment according to an embodiment of the present disclosure.
Fig. 2 is a block diagram of a system for differentially privatizing shared user data according to an embodiment of the present disclosure.
Fig. 3 is an exemplary process flow for differentially privatizing encoding of user data according to an embodiment of the disclosure.
Fig. 4 illustrates an exemplary data flow for transmitting a privatized encoding of user data to a server for frequency estimation, according to an embodiment.
Fig. 5-6 depict mathematical algorithms of a differential privacy mechanism according to embodiments described herein.
Fig. 7 is an exemplary flow diagram illustrating a method of differentially privatizing encoding of user data to be transmitted to a server, according to an embodiment.
Fig. 8A-8F are exemplary flowcharts and illustrations regarding crowdsourcing user interaction and device resource consumption data, according to an embodiment.
Fig. 9 is a block diagram illustrating an exemplary API architecture that may be used in some embodiments.
Fig. 10A-10B are block diagrams of an exemplary API software stack, according to an embodiment.
Fig. 11 is a block diagram of a mobile device architecture according to an embodiment.
Fig. 12 is a block diagram illustrating an exemplary computing system that may be used in conjunction with one or more of the embodiments of the present disclosure.
Detailed Description
In various instances, the user experience of computing devices may be improved by attempting to understand the current trends in the use of these devices. For example, the suggestion of a predictive keyboard may be improved by determining which new words are popular or which emoticons are selected most frequently. The behavior of the web browser in browsing certain websites may be adjusted based on the detected user behavior. Additionally, battery life may be extended by determining which websites are currently presenting issues that may affect the battery life of the device. However, such data may be considered personal to the user and should be privatized or otherwise encoded to mask the identity of the user providing such data. Embodiments described herein provide differential privacy encoding for user data that is used to estimate the frequency of such data in a set of client devices. Such embodiments provide differential privacy techniques that can be used to reduce resource requirements or enhance user experience while providing provable guarantees about privacy and utility.
Various embodiments and aspects will be described with reference to details discussed below, and the accompanying drawings will illustrate the various embodiments. The following description and drawings are illustrative and are not to be construed as limiting. Numerous specific details are described to provide a thorough understanding of various embodiments. However, in certain instances, well-known or conventional details are not described in order to provide a concise discussion of the implementations.
Reference in the specification to "one embodiment" or "an embodiment" or "some embodiments" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment. The appearances of the phrase "an embodiment" in various places in the specification are not necessarily all referring to the same embodiment.
It should be noted that there may be variations to the flowcharts or steps (or operations) described herein without departing from the embodiments described herein. For example, the steps may be performed in parallel, concurrently, or in a different order, or steps may be added, deleted or modified.
The present disclosure recognizes that the use of personal information data in the techniques herein may be useful to benefit a user. For example, the personal information data may be used to deliver target content that is of greater interest to the user. Thus, the use of such personal information data enables planned control of delivered content. In addition, the present disclosure also contemplates other uses for which personal information data is beneficial to a user.
The present disclosure further contemplates that entities responsible for the collection, analysis, disclosure, transmission, storage, or other use of such personal information data will comply with established privacy policies and/or privacy practices. In particular, such entities should implement and consistently adhere to privacy policies and practices that are recognized as meeting or exceeding industry or governmental requirements for maintaining the privacy and security of personal information data. For example, personal information from a user should be collected for lawful and legitimate uses by an entity and not shared or sold outside of those legitimate uses. In addition, such collection should occur only after receiving the informed consent of the user. In addition, such entities should take any required steps to secure and protect access to such personal information data, and to ensure that others who are able to access the personal information data comply with their privacy policies and procedures. In addition, such entities may subject themselves to third-party evaluations to certify compliance with widely accepted privacy policies and practices.
Regardless of the foregoing, the present disclosure also contemplates embodiments in which a user selectively prevents use or access to personal information data. That is, the present disclosure contemplates that hardware elements and/or software elements may be provided to prevent or block access to such personal information data. For example, in the case of an ad delivery service, the disclosed technology may be configured to allow a user to opt-in or opt-out of participating in the collection of personal information data during registration with the service. As another example, the user may choose not to provide location information for the targeted content delivery service. As another example, the user may choose not to provide accurate location information, but to permit transmission of location area information.
Differential privacy mechanism
Embodiments described herein provide a differential privacy mechanism that can be used to privatize user data collected for crowdsourcing. As a general overview, local differential privacy introduces randomness into client user data before the user data is shared. As opposed to having a centralized data source D = {d_1, ..., d_n}, each data element d_i belongs to a separate client i. Given the transcript T_i of the interaction with client i, an adversary should not be able to distinguish T_i from the transcript that would have been generated had the data element been replaced with null. The degree of indistinguishability (e.g., degree of privacy) is parameterized by ε, a privacy parameter that represents a tradeoff between the strength of the privacy guarantee and the accuracy of the published results. In general, ε is considered to be a small constant. In some embodiments, the ε value may vary based on the type of data to be privatized, with more sensitive data being privatized to a higher degree. The following is a formal definition of local differential privacy.
Let n be the number of clients in a client-server system, let Γ be the set of all possible transcripts generated from any single client-server interaction, and let T_i be the transcript generated by a differential privacy algorithm A when interacting with client i. Let d_i ∈ S be the data element of client i. Algorithm A is ε-locally differentially private if, for all subsets E ⊆ Γ, the following holds:
Pr[T_i ∈ E | d_i = d] ≤ e^ε · Pr[T_i ∈ E | d_i = null]
Here, d_i = null refers to the case where the data element of client i is removed. In other words, an adversary having n − 1 data points of a data set cannot reliably test whether the nth data point was a particular value. Thus, the differentially privatized data set cannot be queried in a manner that enables any particular user's data to be determined.
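As a concrete illustration of this definition, the sign-flipping primitive used by the mechanisms described below (keeping a ±1 value with probability e^ε/(1 + e^ε) and flipping it otherwise) can be checked numerically against the e^ε bound. The following sketch is illustrative only; the function names are not part of the specification:

```python
import math

def flip_distribution(bit: int, epsilon: float):
    """Output distribution when a +1/-1 bit is kept with probability
    e^eps / (1 + e^eps) and sign-flipped otherwise."""
    p_keep = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    return {bit: p_keep, -bit: 1.0 - p_keep}

def satisfies_ldp(epsilon: float, tol: float = 1e-9) -> bool:
    """Check Pr[T = t | d] <= e^eps * Pr[T = t | d'] for every output t and
    every pair of inputs d, d' in {-1, +1} (tol absorbs float rounding)."""
    dists = [flip_distribution(+1, epsilon), flip_distribution(-1, epsilon)]
    bound = math.exp(epsilon)
    return all(
        p[t] <= bound * q[t] + tol
        for p in dists for q in dists for t in (-1, +1)
    )
```

Smaller ε tightens the bound (stronger privacy, noisier output); larger ε loosens it.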
The systems (and methods) disclosed herein include an ε-local differentially private count-mean-sketch mechanism that may provide improvements in accuracy, bandwidth, and computational cost for both clients and servers while preserving user privacy. The private count-mean-sketch mechanism may be provided within a system environment as described herein. The use of averages in the count-mean-sketch mechanism enables a trade-off between processing time and accuracy. With other mechanisms, such as a median-based mechanism, increased processing time may not result in increased accuracy. In contrast, the count-mean-sketch mechanism can achieve higher accuracy by taking more processing time.
Fig. 1 is a block diagram of an overview of such a system environment, according to an embodiment.
Server 130 may accumulate the privatized user data 112 and determine statistical attributes, such as user data frequency estimates 131, across a set of client devices.
Fig. 2 is a block diagram of a system for differentially privatizing shared user data, according to an embodiment.
In one embodiment,
In one embodiment, the
In one embodiment, the whitelist stores a list of websites (or web pages) in which particular features are enabled. For example, the whitelist may include a list of websites in which auto-play is enabled as further described herein.
Server 130 may include a receiving module 250 and a
Fig. 3 is an exemplary process flow of differentially privatizing encoding of user data to be transmitted to a server according to an embodiment of the disclosure. As shown in diagram 300, a system, which may be included within
As shown, the term may be the domain name of a visited website. However, other types of representations, such as, but not limited to, a URI (uniform resource identifier) or a URL, may also be used. The term may also be a visited web page of a website, in which case the representation may identify the domain name of the visited website and the particular web page of that website. As described herein, a web page is a single page or document that is presented from or hosted by a website, although a single presented web page may include content from multiple documents. A website is a collection of related web pages that are presented under the same name, grouping, or organization. Web pages from a website are typically (but not necessarily) hosted by the same domain. A single domain may host multiple websites, where each website includes multiple web pages. As described herein, when referring to a website, the reference may also apply to the collection of web pages associated with the website or the domain associated with the website.
The term 302 may be converted to a numerical value using a hash function. As shown, in one embodiment, a SHA-256 hash function is used. However, any other hash function may be used. For example, variations of SHA or other algorithms, such as SHA-1, SHA-2, SHA-3, MD5, BLAKE2, and the like, with various bit sizes, may be used. Thus, any hash function (or block cipher) may be used in implementations, as long as it is well known to both the client and the server.
As described above, embodiments of the present disclosure may reduce the computational resources and the bandwidth required for differential privacy algorithms. In one embodiment, the computational logic may use a portion of the created hash value along with a variant 304 of term 302 to resolve potential hash collisions when frequency counting is performed by the server, which increases computational efficiency while maintaining a provable level of privacy. The variants 304 may correspond to a set of k values (or k index values) that are well known to the server. In one embodiment, to create a variant 304, the system may append a representation of the index value to the term 302. As shown in this example, an integer corresponding to the index value (e.g., "1,") may be combined with the URL to create a variant (e.g., "1,apple.com").
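The variant-and-hash step can be sketched as follows. The separator character, byte order, and bit-selection details below are illustrative assumptions; the text requires only that the scheme be known to both client and server:

```python
import hashlib

def variant(term: str, index: int) -> str:
    """Indexed variant of a term, e.g. variant("apple.com", 1) -> "1,apple.com"."""
    return f"{index},{term}"

def hash_portion(term: str, index: int, m_bits: int = 16) -> int:
    """SHA-256 hash of the selected variant, truncated to an m-bit portion.
    Client and server derive the same position for the same (term, index)."""
    digest = hashlib.sha256(variant(term, index).encode("utf-8")).digest()
    return int.from_bytes(digest, "big") & ((1 << m_bits) - 1)  # keep low m bits
```

Because the server knows the set of k index values, it can recompute every variant's hash position during frequency estimation.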
Once hash value 303 is generated, the system may select a portion 308 of the hash value. In this example, a 16-bit portion may be selected, but other sizes (e.g., 8, 16, 32, 64, etc. bits) are also contemplated based on the desired level of precision or the computational cost of the differential privacy algorithm. For example, increasing the number of bits (or m) increases the computation (and transmission) cost, but an improvement in accuracy is obtained. For example, using 16 bits provides 2^16 (e.g., about 65k) potential unique values (or a range of m values). Similarly, increasing the number of variants k increases the computational cost (e.g., the cost of computing a sketch), but in turn increases the accuracy of the estimation. As noted, the system may encode this value, and as shown, the encoding may be in the form of a vector 306. For example, vector 306 may have a size of 2^16, and each position of the vector may correspond to a potential value of the created hash 303. It should be noted that vectors are described herein for convenience and mathematical purposes, but any suitable data structure may be implemented, such as a bit string, an object, and so on.
As shown in diagram 350 of fig. 3, the created hash value 303 (as a decimal number) may correspond to a vector/bit position 305. Thus, vector 306 may be encoded by updating a value (e.g., setting a bit to 1) at position 305. To account for any potential bias toward 0 or null values, the system may use an initialization vector 317. In one embodiment, the initialization vector 317 may be the vector v ← [−c_ε]^m, whereby c_ε allows noise with a mean of 0 to be added to the initialization vector. The noise should be large enough to mask individual items of user data, but small enough to allow any patterns in the data set to appear. It should be noted that these values are used as mathematical terms, but may be encoded using bits (e.g., 0 = −c_ε, 1 = +c_ε). Thus, vector 306 may create an encoding 308 using initialization vector 317, where the value (or bit) at position 305 is changed (or updated). For example, the sign of the value at position 305 may be flipped, making that value +c_ε while all other values remain −c_ε, as shown (or vice versa).
The system may then create a privatized encoding 312 by changing at least some of the values with a predetermined probability 313. In one embodiment, the system may flip the sign of a value (e.g., (−) to (+), or vice versa) with the predetermined probability 313. As further described herein, in one embodiment, the predetermined probability is 1/(1 + e^ε).
Thus, the user data 301 value is now represented as a privatized encoding 312, which individually maintains the privacy of the user. The privatized encoding 312 may be stored on the
It should be noted that additional bits may be added to the above-described encoding to transmit additional information. For example, additional bits may be added based on categorizing the user data values, as further described herein. For example, adding 2 bits (e.g., 2^2) provides the ability to encode 4 categories. As described above, the differential privacy mechanism allows for a large number of data elements (e.g., p), and thus may provide an efficient mechanism for transmitting data as described herein, which may not have been practical given the resource requirements of previous mechanisms.
Fig. 4 illustrates an exemplary data flow 400 for transmitting a privatized encoding of user data to a server for frequency estimation, according to an embodiment. As shown, the server 130 may accumulate user data in the form of privatized data from different client devices 110A-B. When transmitting information, each client device may transmit the privatized encoding 312 of the user data along with the index value (or a reference to the index value) of the random variant. For example, as shown, client device 110A transmits a privatized encoding 312 for a visited website. The random variant used for such encoding corresponds to the random variant at
The accumulated user data may then be processed by the server (either in batches or as a data stream) to generate frequency estimates 131. In one embodiment, the server may maintain a
Fig. 5-6 depict a more formal (mathematical) algorithm for the differential privacy mechanism, according to an embodiment. As described, the system may create a sketch (e.g., privatized encoding 312) to provide a compact data structure for maintaining the frequency of elements S = {s_1, ..., s_p} present in a data stream D = {d_1, ...}. Thus, an ε-local differentially private version of the count mean sketch can be used on the server to generate a frequency oracle that maintains user privacy. The frequency oracle is a function that, based on the data D = {d_1, ...} received from n clients, returns an estimate of the count of a data item s ∈ S. The differential privacy mechanism may be one of two types: (1) the ε-local differentially private implementation A_CLIENT, or (2) the Hadamard ε-local differentially private implementation A_CLIENT-Hadamard.
FIG. 5 illustrates a process 500 of the ε-local differentially private implementation A_CLIENT, according to an embodiment of the present disclosure. Process 500 may be implemented by an algorithm executed by processing logic described herein, which may include software, hardware, or a combination thereof. For example, process 500 may be performed by a system (such as
The inputs to the client-side ε-local differentially private algorithm A_CLIENT can include: (1) a privacy parameter, ε; (2) a hash range, m; (3) an index value, r; and (4) a data element, d ∈ S. Thus, the system (e.g., client device 110) may implement algorithm A_CLIENT to generate an ε-local differentially private sketch based on the following operations.
In operation 501, the system may calculate a constant c_ε = (e^ε + 1)/(e^ε − 1) and initialize a vector v ← [−c_ε]^m. The constant c_ε allows the noise added to maintain privacy to have a mean of zero, thus preserving unbiasedness. In operation 502, the system may select a random variant r of the data element d.
In operation 503, the system may set n ← a portion of hash(variant r of d).
In operation 504, the system may set v[n] ← c_ε.
In operation 505, the system may sample a vector b ∈ {−1, +1}^m, wherein each b_j is independent and identically distributed with probability e^ε/(1 + e^ε) of being +1. In operation 506, the system may generate the privatized vector v_priv = [v_1·b_1, ..., v_m·b_m].
In operation 507, the system may return the vector v_priv and the index value r.
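Operations 501-507 can be sketched end to end as follows. The variant-string format and the modulo reduction of the hash are illustrative assumptions; the specification requires only that client and server agree on them:

```python
import hashlib
import math
import random

def a_client(d: str, epsilon: float, m: int, k: int, seed=None):
    """Illustrative epsilon-local differentially private sketch of data
    element d (operations 501-507). m is the hash range (vector length),
    k the number of indexed variants. Returns (v_priv, r)."""
    rng = random.Random(seed)
    c_eps = (math.exp(epsilon) + 1) / (math.exp(epsilon) - 1)
    v = [-c_eps] * m                       # operation 501: initialize v
    r = rng.randrange(k)                   # operation 502: random variant index
    digest = hashlib.sha256(f"{r},{d}".encode()).digest()
    n = int.from_bytes(digest, "big") % m  # operation 503: portion of hash(variant r of d)
    v[n] = c_eps                           # operation 504: flip sign at position n
    p_keep = math.exp(epsilon) / (1 + math.exp(epsilon))
    b = [1 if rng.random() < p_keep else -1 for _ in range(m)]  # operation 505
    v_priv = [v_j * b_j for v_j, b_j in zip(v, b)]              # operation 506
    return v_priv, r                       # operation 507
```

Every transmitted entry has magnitude c_ε; only the sign pattern (randomized with probability 1/(1 + e^ε)) carries information about d.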
FIG. 6 illustrates a method (or algorithm) 600 of the Hadamard version, A_CLIENT-Hadamard, of the ε-local differentially private implementation, according to an embodiment of the present disclosure. The inputs to the client-side A_CLIENT-Hadamard version of the ε-local differentially private implementation may include: (1) a privacy parameter, ε; (2) a hash range, m; (3) an index value, r; and (4) a data element, d ∈ S. Thus, a system (e.g.,
In
In
In
In
In
In
In
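A minimal sketch of the A_CLIENT-Hadamard idea follows. It assumes the commonly used construction, in which the one-hot style vector is transformed by a Hadamard matrix and a single sign-privatized coefficient is transmitted (reducing per-client bandwidth from m values to one); the function names, variant-string format, and sampling details are illustrative assumptions, not the specification's exact steps:

```python
import hashlib
import math
import random

def hadamard_entry(i: int, j: int) -> int:
    """Entry H_m[i][j] of the (unnormalized) Sylvester Hadamard matrix:
    (-1) raised to popcount(i AND j)."""
    return -1 if bin(i & j).count("1") % 2 else 1

def a_client_hadamard(d: str, epsilon: float, m: int, k: int, seed=None):
    """Illustrative Hadamard client sketch: transmit one privatized
    Hadamard coefficient. Assumes m is a power of two.
    Returns (coefficient, sampled position j, variant index r)."""
    rng = random.Random(seed)
    r = rng.randrange(k)                        # random variant index
    digest = hashlib.sha256(f"{r},{d}".encode()).digest()
    n = int.from_bytes(digest, "big") % m       # hash position of the variant
    j = rng.randrange(m)                        # sample one coefficient position
    # The Hadamard transform of the one-hot vector e_n at coordinate j
    # is simply column n of H_m, normalized by sqrt(m).
    w = hadamard_entry(j, n) / math.sqrt(m)
    p_keep = math.exp(epsilon) / (1 + math.exp(epsilon))
    b = 1 if rng.random() < p_keep else -1      # privatize the sign
    return b * w, j, r
```

The server, knowing j and r, can later accumulate these single coefficients into the matching sketch row and convert back to the standard basis.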
Based on the particular algorithm (or method) used by the client device, the server (e.g., server 130) may generate a frequency table or other data structure to perform frequency estimation on user data values across different client devices. As described above, this estimation may be based on a count mean sketch (e.g., a variation of a count-min sketch). The values of the frequency table are incremented based on whether the client used the ε-local differentially private sketch algorithm A_CLIENT or the Hadamard ε-local differentially private sketch algorithm A_CLIENT-Hadamard. The operation for each is as follows.
If the client used the A_CLIENT algorithm to generate the sketch, the vector v_priv is added to the matching sketch data W_{k,m} as follows: for the row W_h corresponding to the variant selected to generate v_priv, W_h is set to W_h + v_priv.
If the client used the A_CLIENT-Hadamard algorithm to generate the sketch, the vector v_Hadamard is added to the matching sketch data W_{k,m} as follows:
1. For the row W_h corresponding to the variant selected to generate v_Hadamard, set W_h = W_h + v_Hadamard.
2. Before determining the count mean sketch W, convert the rows from the Hadamard basis back to the standard basis: W_h ← H_m^T · W_h, wherein H_m is a Hadamard matrix of dimension m.
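The server-side accumulation for A_CLIENT sketches can be illustrated as follows. The class name and the variant hashing format are illustrative assumptions, and the exact unbiasing constants (functions of c_ε, k, m, and n given by the full analysis) are deliberately omitted; the sketch shows only the row-wise accumulation and the raw per-item statistic:

```python
import hashlib

class CountMeanSketch:
    """Illustrative server-side sketch data W_{k,m}: one row per indexed variant."""

    def __init__(self, k: int, m: int):
        self.k, self.m, self.n = k, m, 0
        self.W = [[0.0] * m for _ in range(k)]

    def positions(self, d: str):
        """Hash position of item d for each variant row (the "r,term" format
        is an assumed client/server convention)."""
        return [
            int.from_bytes(hashlib.sha256(f"{r},{d}".encode()).digest(), "big") % self.m
            for r in range(self.k)
        ]

    def accumulate(self, v_priv, r: int):
        """Add a received privatized vector to the row matching its variant index r."""
        self.n += 1
        for j, x in enumerate(v_priv):
            self.W[r][j] += x

    def raw_score(self, d: str) -> float:
        """Sum of the row entries at d's hash positions; grows with the true
        frequency of d. Debiasing into an unbiased count is omitted here."""
        return sum(self.W[r][p] for r, p in enumerate(self.positions(d)))
```

Processing can occur in batches or as a data stream: each (v_priv, r) pair updates one row in constant time per vector entry.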
Fig. 7 is an exemplary flow diagram illustrating a method of differentially privatizing an encoding of user data to be transmitted to a server, according to an embodiment.
In 701, the system may select a user data value to transmit to a server from a set of possible user data values collected on a client device.
At 702, the system may create a hash of the user data value using a random hash function. To generate the random hash function, the system may hash using a randomly selected variant of the term. In one embodiment, a set of possible variants of the value may be indexed. In one embodiment, the user data value may be a character string, and the set of possible variants of the user data value includes variants of the character string with one or more characters representing the corresponding index value appended.
In 703, the system may encode at least a portion of the created hash value into a vector. In one embodiment, the encoding includes updating the vector value at a location corresponding to the created hash value. For example, the sign of the value at the corresponding location may be flipped. In one embodiment, encoding may include initializing the vector with uniform values and signs. In one embodiment, initializing the vector may further comprise multiplying each vector value by a constant c_ε = (e^ε + 1)/(e^ε − 1). Furthermore, as mentioned above, the encoded vector may also be expressed in a Hadamard basis.
In 704, the system may privatize the vector by changing at least some of the vector values with a predefined probability. In one embodiment, the predefined probability may be 1/(1 + e^ε), where ε is a privacy parameter.
In 705, the system may transmit the privatized vector and the index value of the randomly selected variant to the server. As described above, the server may estimate the frequency of the user data value across a set of different client devices. The server may estimate the frequency of the user data value by updating a frequency table indexed by the set of possible variants. For example, the row or column of the frequency table corresponding to the index value of the randomly selected variant may be updated with the privatized vector. Furthermore, only the privatized vector and the index value of the randomly selected variant may be transmitted to the server as the information representing the user data value. In one embodiment, the randomly selected variant may mitigate hash collisions when only a portion of the created hash value is used, and may also reduce the number of computations required by the server to create the frequency table, while still maintaining ε-local differential privacy of the user data value.
Improving user experience using privatized crowd-sourced data
In another aspect of the present disclosure, systems (and methods) are described that collect crowdsourced data to enhance the user experience using the differential privacy mechanisms described herein. For example, the user experience may be enhanced by inferring potential user preferences from analyzing crowd-sourced user interaction data. In some implementations, crowd-sourced data related to a particular website, application, or service may be collected. For example, in one embodiment, user interactions related to presentation of content, such as content from an online source, may be analyzed. Further, in one embodiment, websites exhibiting particular characteristics may be determined while masking the identities of the users whose data helped determine those characteristics. For example, a website that consumes a certain level of client resources may be identified using privatized crowdsourced data. Data is collected into a privatized crowdsourced data set in which the identities of individual contributors are masked. The contributor data may be masked on a contributor's user device prior to transmitting the contribution for inclusion in the data set. Differential privacy can be maintained for crowd-sourced data sets, making it impossible to determine the identity of an individual contributor to a data set through multiple structured queries of the data set. For example, the data set may be privatized such that an adversary with any background knowledge of the data set cannot infer that a particular record in the input data set was significantly more responsible for the observed output than any other set of input records.
The number of contributions a given user may contribute to a crowd-sourced data set over a given time period may be limited. In one embodiment, a privacy budget is established for a particular type of crowdsourced data. The amount of contribution from each user is then limited based on the privacy budget. The privacy budget for a particular type of data may also be associated with the epsilon value (privacy parameter) used when privatizing the particular type of data. For example, based on the sensitivity of the data, the privacy budget and privacy parameters for different types of data may vary.
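The contribution-limiting idea can be sketched as follows. The class and method names, and the budget values in the usage, are illustrative assumptions; the text specifies only that per-type contributions are capped over a period, with the cap tied to the data type's sensitivity and privacy parameter:

```python
from collections import defaultdict

class PrivacyBudget:
    """Illustrative per-data-type contribution limiter for crowdsourcing."""

    def __init__(self, budgets):
        self.budgets = dict(budgets)       # data type -> max contributions per period
        self.spent = defaultdict(int)

    def try_contribute(self, data_type: str) -> bool:
        """Record a contribution if budget remains for this data type;
        unknown data types have zero budget and are always refused."""
        if self.spent[data_type] >= self.budgets.get(data_type, 0):
            return False
        self.spent[data_type] += 1
        return True

    def reset_period(self):
        """Start a new budget period (e.g., a new day)."""
        self.spent.clear()
```

More sensitive data types would be configured with smaller budgets (and smaller ε values during privatization).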
Although differential privacy techniques are described herein, other methods of privatizing user data may be employed instead of, or in addition to, the differential privacy techniques described herein. Some embodiments may implement other privatization techniques, including secure multiparty computation and/or homomorphic encryption. Secure multiparty computation enables multiple parties to jointly compute a function of their inputs while keeping those inputs private. Homomorphic encryption is a form of encryption that allows computations to be performed on ciphertext (encrypted data) to produce an encrypted result that, when decrypted, matches the result of the same operations performed on the plaintext. In all implementations, user data to be used for crowdsourcing is sanitized prior to transmission. In addition, the user data to be transmitted may be stored locally in a privatized, encoded form.
Fig. 8A-8F are exemplary flowcharts and illustrations regarding crowdsourcing user interaction and device resource consumption data, according to an embodiment. Fig. 8A is an exemplary flow diagram illustrating crowdsourcing user interaction data according to an embodiment. Fig. 8B is an exemplary flow diagram illustrating a method of processing crowd-sourced user interaction data, according to an embodiment. Fig. 8C is an exemplary flow diagram illustrating a resource consumption based crowd-sourced data method according to an embodiment. Fig. 8D is an exemplary flow diagram illustrating a method of processing crowdsourced data based on resource consumption, according to an embodiment. Fig. 8E shows an exemplary diagram illustrating various usage and frequency information that may be derived from crowd-sourced resource consumption data, according to an embodiment. Fig. 8F illustrates a presentation of content for which crowd-sourced data regarding user interaction and resource consumption may be collected according to embodiments described herein.
Fig. 8A illustrates an exemplary method of crowdsourcing user interaction data, according to an embodiment.
In
As described above, in one embodiment, the system may monitor user actions related to a website that initiates or attempts to initiate automatic playback of media content. As referred to herein, auto-play relates to media content that is arranged to automatically begin playing (e.g., by code, script, or other logic associated with a website) without explicit input from a user (e.g., selection of a play button). In addition, automatic playback may also occur in various situations. Typically, auto-play is initiated when a user visits a website, but it should be noted that auto-play may also be initiated in other circumstances, such as when a user scrolls to a particular portion of a web page or when media content is ready to be played (e.g., buffered), etc. Further, as referred to herein, media content may include various multimedia, such as video (with or without audio) or audio, which may be in various formats. In one embodiment, the media content may also include various additional components or plug-ins, such as
The website may initiate automatic playback in response to a particular event. As described above, websites typically initiate auto-play when the website is accessed (e.g., upon submission of a URL). It should be noted, however, that in some cases it may take a certain amount of time to load media content. Thus, in some cases, a user may be able to browse a website (e.g., a text portion) before the website initiates playback of media content that is set to auto-play. In such examples, the system may monitor user interactions after (or in response to) the media content being ready (e.g., buffered) to begin playback. In
Further, the user action may include selecting to play a video with automatic play disabled (e.g., via default system settings or preferences). In one embodiment, this may include selecting to play a video, with automatic play disabled, within a predetermined amount of time of accessing the web page. It should be noted that the user interactions herein may be collected in an environment in which the system may automatically adjust the parameters of auto-play. For example, the system may implement user preferences for enabling or disabling auto-play. As another example, the system may adjust one or more parameters, such as volume, mute, size, or delay, for presentation of media content that is set to automatically play. Further non-limiting examples of setting auto-play preferences based on aggregated data can be found in commonly assigned U.S. patent application No. 62/506,685, entitled "Device, method, and graphical user interface for managing web presentation", filed May 16, 2017, which is incorporated herein by reference in its entirety.
The system may also monitor various user interactions that provide an indication of a user preference for disabling auto-play. For example, the user action may include interrupting the automatic playback of the media content within a predetermined amount of time of the media content being presented to the user. For example, interrupting automatic play may include stopping or pausing the media content, and muting or reducing the volume to a certain level (e.g., below a threshold in the form of a percentage or value). As another example, the user interaction may include closing or minimizing an application or a tab associated with the web page. Further, the user interaction may include performing a system function, such as enabling system mute or lowering the system volume. As another example, where automatic play is disabled, the user interaction may include navigating (e.g., scrolling) away from a content item set to auto-play without selecting to play the media content.
Further, the user interactions may include various interactions that may be monitored using one or more sensors of the device. For example, various sensors on the device may monitor user engagement or disengagement. For example, a user being in proximity to a device and actively interacting with the device may infer a degree of engagement. Conversely, a user looking away or away from the device may infer a degree of disengagement. Thus, the behavior of the user may be determined or inferred using one or more sensors of the device.
Further, it should be noted that other categories may be used in addition to those discussed above. For example, categories that provide a degree of preference (e.g., very strong, weak, very weak) or any other classification technique may also be used.
Once the user interaction data is collected, it may be sent to a server for analysis. As described herein, the system may ensure user privacy by implementing a differential privacy mechanism. Furthermore, to further protect the privacy of the user, only a sample of the data may be sent to the server.
Thus, in 803, the system may privatize an encoding of the entity associated with the user interaction (e.g., a web page or a website associated with the web page) and the category of the user interaction (e.g., a preference to enable auto-play or a preference to disable auto-play). The encoding may utilize any of the encoding techniques described herein, and may mask the various contributors to the data using one of various privatization techniques. In one embodiment, the encoding is differentially privatized using the techniques described herein.
In 804, the system may transmit the differentially privatized encoding to a server for estimating the frequency of categories for entities in the crowd-sourced data. As described above, the server may perform various operations (e.g., a count mean sketch) to determine the frequency estimate. The server may determine frequency estimates based on the classification scheme. For example, the server may determine a frequency estimate of users preferring to enable automatic playback for a particular web page or website, and a frequency estimate of users preferring to disable automatic playback for that web page or website.
Fig. 8B is an exemplary flow diagram illustrating a
In
In
In
In some embodiments, the system may also add or remove particular entities (e.g., web pages, websites) from the whitelist based on the determined frequency, as shown in
As described above, in another aspect of the present disclosure, the user experience may be enhanced by identifying particular websites that exhibit particular characteristics. In one implementation, websites associated with high resource consumption may be identified. For example, high resource consumption may be identified based on thresholds for particular resources (such as CPU, memory, and power usage). By identifying such sites, a developer may determine which sites may be problematic or which sites may be candidates for development work.
Fig. 8C is an exemplary flow diagram illustrating a
In
In
In
In
Fig. 8D is an exemplary flow diagram illustrating a
In 841, the system may receive a privatized encoding of an identifier of an application or website from each of a set of client devices. Each application or website is selected for transmission in response to the resource consumption threshold being exceeded. The identifier may be an application name, a website, or an application name and a website.
In 842, the system may accumulate frequency estimates from the privatized encodings received from the set of client devices. The frequency estimates may be accumulated in a sketch, such as the frequency table shown in fig. 4.
In 843, the system may estimate the frequency with which selected applications, websites, or web pages exceed a particular threshold. For example, as described above, the system may determine the frequency using a count-mean sketch operation.
In some embodiments, the system in 844 may also adjust the resource consumption threshold based on an analysis of resource consumption usage patterns. For example, the system may determine that the percentage of websites that exceed a predetermined threshold has increased over a period of time (e.g., months). Accordingly, the predetermined threshold that triggers an indication of high resource consumption may be dynamically adjusted (e.g., increased) based on continuous analysis of the crowd-sourced data.
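The dynamic adjustment in 844 might look like the following sketch; the target fraction, the step factor, and the windowing are illustrative assumptions:

```python
def adjust_threshold(threshold: float, exceed_fractions: list,
                     target: float = 0.05, step: float = 1.1) -> float:
    """If the fraction of websites flagged as exceeding the resource
    threshold has stayed above the target across the whole observation
    window (e.g., several months of crowd-sourced estimates), raise the
    threshold; otherwise leave it unchanged."""
    if exceed_fractions and all(f > target for f in exceed_fractions):
        return threshold * step
    return threshold
```

Usage: `adjust_threshold(80.0, [0.08, 0.09, 0.12])` raises an 80% CPU-usage threshold to 88%, while a window containing a month below the target leaves it at 80%.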
Fig. 8E shows an exemplary diagram illustrating various usage and frequency information that may be derived from crowd-sourced resource consumption data, according to an embodiment. For example, as shown at 852, the number of websites within a particular level of CPU usage and memory usage may be determined. Further, the number of websites exceeding a particular threshold may also be determined, as shown. As another example, the frequency 857 with which websites 855 exhibit particular characteristics may also be determined. For example, the frequency with which a particular website causes an application to crash may be monitored, and the frequency of visits to the most popular websites may be tracked. It should be noted that these examples are merely illustrative, and that myriad other metrics may be used depending on the particular application.
Fig. 8F illustrates a presentation of content for which crowd-sourced data regarding user interaction and resource consumption may be collected according to embodiments described herein. As shown, application 860 may execute on a data processing system or an electronic device described herein, such as
Where the application 860 is a web browser, the application may be used to navigate to a website hosted on a network, such as the internet. The application 860 may display a plurality of tabs 861, each of which may present the same or different content 870. Content 870 may be presented from a web page (e.g., example.html 862) hosted on a website (e.g., www.example.com 863). Web page 862 may be one of several web pages on the website, and website 863 may be one of several websites hosted on a domain (e.g., example. Web page 862, website 863, or the domain name can be included in the representation of user data that is privatized and transmitted to the crowdsourcing server. Content 870 may include various types of content items, including text content 871, 872 and media content 880. Media content 880 may present media item 882 as a content item, as well as media control 884 for starting, pausing, or stopping playback of media item 882. For example, if automatic playback is enabled for the media item 882, the user may pause playback of the media item using the media control 884. The user may also use media control 884 to initiate playback of a media item that is set to auto-play but whose auto-play is prevented due to the display settings of content 870. The application 860 may include settings that configure the display of the content 870. For example, the application 860 may include a user interface element 864 to enable a reader mode setting as described herein. The reader mode setting may be used to limit presentation of content 870 to only
In one embodiment, the privatized crowd-sourced data is collected based on detected interactions related to content items (e.g., media item 882) of the presented content 870. Based on the interaction, the presented content 870 may be associated with a category. The category may be selected from a set of categories relating to inferred preferences for presentation of the presented content. For example, application 860 may include various settings that control how content 870 is presented. The interactions may be used to place the content 870 into a category associated with an inferred setting. For example, based on the interaction, the content 870 may be associated with a category related to an inferred preference to allow automatic playback of the media item 882. The content 870 may also be associated with a category related to an inferred preference to prevent automatic playback of the media item 882. In one embodiment, the content may be associated with an inferred preference to enter a reader mode. In one embodiment, content may be associated with an inferred preference to prevent display of certain content items. A privatized encoding may be created on the client device that includes a representation of the presented content 870 and a representation of a category with which the presented content is associated. The privatized encodings can then be transmitted to a server, which can accumulate the privatized encodings from multiple devices to estimate the frequency with which the presented content is associated with a particular category. Privatized data regarding user interaction with content can be used to crowd-source various display or presentation settings associated with the content. The categories may be determined relative to user preferences for zoom settings, brightness settings, or any other application or device settings.
In one embodiment, a set of desired layouts for user interface elements or content items may be determined based on crowd-sourced data regarding user interactions with the user interface elements and content items.
It should be noted that the data sampling described herein is an example, and thus it is contemplated that any type of data may be sampled (e.g., collected) in a differentially private manner to determine various frequencies or statistics. For example, the above-described methods may be equally applicable to a variety of user information or user interactions that may occur with various components of a system, application, or service.
Thus, as described above, the mechanisms of the present disclosure leverage the potential of crowd-sourced data while maintaining user privacy (e.g., via a local differential privacy mechanism) to potentially gain valuable insight for development work.
Exemplary application Programming interface diagrams
Embodiments described herein include one or more Application Programming Interfaces (APIs) in an environment in which calling program code interacts with other program code that is called through the one or more APIs. Various function calls, messages, or other types of invocations may further include various parameters, and these calls may be transmitted via an API between the calling program and the called code. In addition, the API may provide the calling program code with the ability to use data types or classes defined in the API and implemented in the called program code.
The API allows a developer of the API-calling component (which may be a third party developer) to take advantage of the specified features provided by the API-implementing component. There may be one API-calling component or there may be more than one such component. An API may be a source code interface provided by a computer system or library to support service requests from applications. An Operating System (OS) may have multiple APIs to allow applications running on the OS to call one or more of those APIs, and a service (e.g., a library) may have multiple APIs to allow applications using the service to call one or more of those APIs. The API may be specified in a programming language that can be interpreted or compiled when the application is built.
In some embodiments, the API-implementing component may provide more than one API, each providing a different view of, or with different aspects of access to, the functionality implemented by the API-implementing component. For example, one API of an API-implementing component may provide a first set of functions and may be exposed to third party developers, and another API of the API-implementing component may be hidden (not exposed) and provide a subset of the first set of functions, while also providing another set of functions, such as testing or debugging functions that are not in the first set of functions. In other embodiments, the API-implementing component may itself invoke one or more other components via an underlying API, and thus be both an API-calling component and an API-implementing component.
The API defines the language and parameters used by the API-calling component when accessing and using the specified features of the API-implementing component. For example, the API-calling component accesses specified features of the API-implementing component through one or more API calls or invocations (e.g., implemented by function or method calls) exposed by the API, and passes data and control information using parameters via the API calls or invocations. The API-implementing component may return a value through the API in response to an API call from the API-calling component. Although the API defines the syntax and results of the API call (e.g., how to invoke the API call and what the API call does), the API may not reveal how the API call accomplishes the function specified by the API call. The various API calls are transferred via the one or more application programming interfaces between the API-calling component and the API-implementing component. Transferring the API calls may include issuing, initiating, invoking, calling, receiving, returning, or responding to function calls or messages; in other words, transferring can describe actions of either the API-calling component or the API-implementing component. A function call or other invocation of the API may send or receive one or more parameters through a parameter list or other structure. A parameter may be a constant, a key, a data structure, an object, an object class, a variable, a data type, a pointer, an array, a list, or a pointer to a function or method, or another way of referencing data or other item to be passed via the API.
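The relationship described above — an API-implementing component exposing calls whose internals stay hidden, and an API-calling component passing parameters and receiving return values — can be sketched in Python; the class and method names here are purely illustrative:

```python
# API-implementing component: defines and implements the interface.
class FrequencyService:
    def __init__(self):
        self._counts = {}          # internal state not revealed by the API

    def record(self, key: str, amount: int = 1) -> int:
        """API call: parameters are passed in and a result is returned;
        how the count is stored internally is not part of the API."""
        self._counts[key] = self._counts.get(key, 0) + amount
        return self._counts[key]

# API-calling component: uses only the names and parameters the API defines.
service = FrequencyService()
total = service.record("example.com", amount=2)
```

The caller never touches `_counts` directly; if the implementation later switched to a sketch or a database, the API call's syntax and result would be unchanged.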
Further, a data type or class can be provided by the API and implemented by the API-implementing component. Thus, the API-calling component may declare variables, use pointers to such types or classes, and use or instantiate constant values of such types or classes, using definitions provided in the API.
Generally, an API can be used to access services or data provided by an API-implementing component or to initiate execution of operations or computations provided by the API-implementing component. By way of example, the API-implementing component and the API-calling component may each be any of an operating system, a library, a device driver, an API, an application program, or other module (it being understood that the API-implementing component and the API-calling component may be the same or different types of modules as each other). In some cases, the API-implementing component may be implemented at least in part in firmware, microcode, or other hardware logic components. In some embodiments, the API may allow a client program to use services provided by a Software Development Kit (SDK) library. In other embodiments, an application or other client program may use an API provided by an application framework. In these embodiments, the application or client may make calls to functions or methods provided by the SDK and exposed by the API, or use data types or objects defined in the SDK and provided by the API. In these embodiments, the application framework may provide a primary event loop for the program that responds to various events defined by the framework. The API allows the application to specify the events and the responses to the events using the application framework. In some implementations, the API calls can report to the application the capabilities or state of a hardware device, including capabilities or states related to aspects such as input capabilities and states, output capabilities and states, processing capabilities, power states, storage capacity and states, communication capabilities, and the like, and the API may be implemented in part by firmware, microcode, or other low-level logic components that execute in part on the hardware components.
The API-calling component may be a local component (i.e., on the same data processing system as the API-implementing component) or a remote component (i.e., on a different data processing system than the API-implementing component) that communicates with the API-implementing component through the API over a network. It should be understood that an API-implementing component may also act as an API-calling component (i.e., it may make API calls to APIs exposed by different API-implementing components), and an API-calling component may also act as an API-implementing component by implementing APIs exposed by different API-calling components.
The API may allow multiple API-calling components written in different programming languages to communicate with the API-implementing component (so that the API may include features for translating calls and returns between the API-implementing component and the API-calling component); however, the API may be implemented in a particular programming language. In one embodiment, the API-calling component may invoke APIs from different providers, such as one set of APIs from an OS provider and another set of APIs from a plug-in provider, and another set of APIs from another provider (e.g., a provider of a software library) or a creator of another set of APIs.
Fig. 9 is a block diagram illustrating an exemplary API architecture, which may be used in some embodiments described herein. The
It is to be appreciated that the API-implementing
The API-implementing
Fig. 10A-10B are block diagrams of exemplary
FIG. 10B illustrates an exemplary software stack 1010 including
Additional exemplary computing devices
Fig. 11 is a block diagram of a
The
Sensors, devices, and subsystems can be coupled to
Communication functions can be facilitated by one or more
An
The I/
In one embodiment, a
In one embodiment, the I/
In one embodiment,
Further, the
Each of the instructions and applications identified above may correspond to a set of instructions for performing one or more functions described above. The instructions need not be implemented as separate software programs, procedures or modules.
Fig. 12 is a block diagram illustrating a computing system 1200 that may be used in conjunction with one or more of the embodiments described herein. The illustrated computing system 1200 may represent any device or system (e.g.,
As shown, computing system 1200 may include a bus 1205 that may be coupled to a processor 1210, a ROM (read only memory) 1220, a RAM (or volatile memory) 1225, and a storage device (or non-volatile memory) 1230. Processor 1210 may retrieve stored instructions from one or more of memories 1220, 1225, and 1230 and execute the instructions to perform the processes, operations, or methods described herein. These memories represent non-transitory machine-readable media (or computer-readable media) or storage devices containing instructions that, when executed by a computing system (or processor), cause the computing system (or processor) to perform the operations, processes, or methods described herein. The RAM 1225 may be implemented, for example, as dynamic RAM (DRAM) or other types of memory that require constant power to refresh or maintain data within the memory. Storage 1230 may include, for example, magnetic storage, semiconductor storage, tape storage, optical storage, removable storage, non-removable storage, and other types of storage that retain data even after power is removed from the system. It should be appreciated that the storage 1230 can be at a remote location (e.g., accessible via a network) relative to the system.
A display controller 1250 may be coupled to bus 1205 to receive display data to be displayed on a display device 1255, which may display user interface features or any of the embodiments described herein, and may be a local or remote display device. Computing system 1200 may also include one or more input/output (I/O) components 1265, including a mouse, keyboard, touch screen, network interface, printer, speakers, and other devices. Typically, the input/output components 1265 are coupled to the system through an input/output controller 1260.
Module 1270 (or component, unit, function, or logic) may represent any of the functions or engines described above, such as differential privacy engine 228. Module 1270 may reside, completely or at least partially, within the memory described above, or within the processor during execution thereof by the computing system. Further, module 1270 may be implemented as software, firmware, or functional circuitry within a computing system, or a combination thereof.
In some embodiments, the hash function described herein (e.g., SHA-256) may utilize dedicated hardware circuitry (or firmware) of a system (client device or server). For example, the function may be a hardware-accelerated function. Further, in some embodiments, the system may use functions that are part of a dedicated instruction set. For example, an instruction set may be used that is an extension of the instruction set architecture for a particular type of microprocessor. Thus, in one embodiment, the system may provide a hardware acceleration mechanism for performing SHA operations. The system may use these instruction sets to increase the speed at which the functions described herein are performed.
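A minimal illustration of the hashing discussed here: in Python, `hashlib` dispatches to the platform's crypto library, which on many systems can itself use hardware-accelerated SHA instructions, so the calling code is unchanged whether or not acceleration is available. Keeping only the first 8 bytes of the digest mirrors the "portion of the hash value" idea from this disclosure; the variant-prefix scheme is an assumption for illustration:

```python
import hashlib

def bucket_index(value: str, variant: int, num_buckets: int) -> int:
    """Derive a sketch bucket index from only the first 8 bytes of the
    SHA-256 digest; prefixing the variant yields a different mapping for
    each hash variant."""
    digest = hashlib.sha256(f"{variant}:{value}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_buckets
```

Because the mapping is deterministic for a given variant, a client and the server computing `bucket_index` with the same inputs always agree on the bucket.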
Further, the hardware acceleration engine/functionality is contemplated to include any implementation in hardware, firmware, or a combination thereof, including various configurations that may include hardware/firmware integrated into the SoC as a separate processor, or included as a dedicated CPU (or core), or integrated into a co-processor on a circuit board, or included on a chip that extends the circuit board, and so forth.
Thus, while such acceleration functionality is not necessarily required to achieve differential privacy, some embodiments herein may potentially improve the overall efficiency of the implementation given the general availability of specialized support for such functionality (e.g., cryptographic functionality).
It should be noted that the terms "about" or "substantially" may be used herein and may be interpreted as "as nearly as possible," "under technical limitations," or the like. In addition, unless otherwise indicated, use of the term "or" indicates an inclusive or (e.g., and/or).
In the foregoing specification, exemplary embodiments of the present disclosure have been described. It will be evident that various modifications may be made thereto without departing from the broader spirit and scope of the disclosure. The specification and drawings are, accordingly, to be regarded in an illustrative sense rather than a restrictive sense. The specific details in the description and examples provided may be used anywhere in one or more embodiments. Various features of different embodiments or examples may be combined differently with some features included and others excluded to accommodate a variety of different applications. Examples may include subject matter, such as a method, an apparatus to perform the acts of the method, at least one machine readable medium comprising instructions that, when executed by a machine, cause the machine to perform the acts of the method, or perform the acts of an apparatus or system in accordance with the embodiments and examples described herein. In addition, various components described herein may be means for performing the operations or functions described herein.
In one aspect of the disclosure, a system (and method) is described that ensures differential privacy when transmitting data to a server that estimates the frequency of such data in a set of client devices. The differential privacy mechanism can reduce resource requirements while still providing provable guarantees about privacy and utility. For example, the mechanism may provide the ability to tailor the utility (e.g., estimation accuracy) according to resource requirements (e.g., transmission bandwidth and computational complexity). To enable reduced resource requirements (e.g., reduced encoded bit lengths), the mechanism may estimate the frequency of the data using a count-mean sketch, as described further herein.
With respect to resource requirements, the mechanism may implement a hash function that provides the ability to reduce computational requirements by using only a portion of the generated hash value. To avoid hash collisions using only a portion of the hash value, the mechanism may use a variant in hashing user data. The use of variants allows the mechanism to implement shared hashing to reduce the amount of required computation that the client and server must perform. With respect to utility, the mechanism provides frequency estimation within a predictable deviation that includes a lower bound and an upper bound.
In another aspect of the disclosure, systems and methods are described for collecting crowdsourcing data to enhance user experience using the privacy mechanisms described herein. For example, the user experience may be enhanced by inferring potential user preferences from analyzing crowd-sourced user interaction data. Development efforts may be refined or enhanced with respect to application behavior based on statistical analysis of user interactions related to various features or events. The privatization technique for privatizing crowd-sourced user data is not limited to differential privacy techniques. For example, privatization of crowdsourced data may be achieved using secure multiparty computing and/or homomorphic encryption.
In one embodiment, user interactions related to presentation of content, such as content from an online source, may be analyzed. For example, presentation settings or preferences may be defined based on crowd-sourced user interaction data.
In one embodiment, the presentation settings may include automatic playback settings for media content, enabling analysis of crowd-sourced data related to automatic playback of the media content. For example, the system may determine or infer crowd-sourced preferences for enabling or disabling automatic play of media content on a particular website. User interactions that include immediately stopping or muting the automatic play of a media item upon accessing a website may be viewed as inferring a preference to disable automatic play. Conversely, when a web page is accessed on which automatic playback of media content is disabled (e.g., by default system settings or preferences), and the user selects to play media content for which automatic playback was disabled, it may be inferred that the user prefers to enable automatic playback on such a website. Thus, collecting such user interaction data from various devices and analyzing the data on a server (e.g., via a local differential privacy mechanism) allows developers to potentially gain valuable insight into a particular website. For example, websites with a high estimated frequency of users whose interactions infer a preference to enable auto-play may be added to a "white list" (e.g., a list of websites for which auto-play functionality is allowed or enabled).
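The inference rules described above can be sketched as a simple mapping from one observed interaction to a category; the category names and the exact set of actions are illustrative assumptions:

```python
from typing import Optional

def infer_autoplay_preference(autoplay_was_enabled: bool,
                              user_action: str) -> Optional[str]:
    """Map an observed interaction with a media item to an inferred
    preference category, following the heuristics described above."""
    # Auto-play was on, and the user immediately shut it down.
    if autoplay_was_enabled and user_action in ("stop", "mute", "pause"):
        return "prefers-autoplay-disabled"
    # Auto-play was off, and the user manually started playback.
    if not autoplay_was_enabled and user_action == "play":
        return "prefers-autoplay-enabled"
    return None  # interaction is not informative about the preference
```

The resulting (website, category) pair is what would then be encoded and privatized before transmission, as described in the preceding sections.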
In addition, other settings related to the presentation of the content may also be analyzed. Additional presentation settings, such as content display settings (e.g., reader mode settings) or content blocking settings, for example, may also be analyzed using the differential privacy mechanism described herein.
In another aspect, the user experience may also be enhanced by identifying particular websites that exhibit particular characteristics. In one implementation, websites associated with high resource consumption may be identified. For example, high resource consumption may be identified based on thresholds for resource usage, such as CPU, memory, and power. By identifying such websites, developers can determine which websites may be problematic or potential candidates for analysis to identify the cause of high resource consumption.
In the above description, privacy techniques have been described. It will, however, be evident that various modifications and changes may be made thereto without departing from the broader spirit and scope of the invention. The specification and drawings are, accordingly, to be regarded in an illustrative rather than a restrictive sense. The specific details in the description and examples provided may be used anywhere in one or more embodiments. Various features of different embodiments or examples may be combined differently with some features included and others excluded to accommodate a variety of different applications. Examples may include subject matter, such as a method, an apparatus to perform the acts of the method, at least one machine readable medium comprising instructions that, when executed by a machine, cause the machine to perform the acts of the method, or perform the acts of an apparatus or system in accordance with the embodiments and examples described herein. Further, various components described herein may be means for performing operations or functions described in accordance with the implementations. Therefore, the true scope of the embodiments will be apparent to the skilled practitioner upon a study of the drawings, specification, and following claims.