Differential privacy using count mean sketch
Note: the technology described herein, Differential privacy using count mean sketch, was devised by A. Bhowmick, A. H. Vyrros, U. S. Vaishampayan, K. W. Decker, C. A. Schultz, and S. J. Fa, and was filed on 2018-03-28. Abstract: Embodiments described herein provide a privacy mechanism for protecting user data when transmitting such data to a server that estimates the frequency of such data in a set of client devices. In one embodiment, a differential privacy mechanism is implemented using count-mean sketch techniques that may reduce the resource requirements needed to enable privacy while providing provable guarantees about privacy and utility. For example, the mechanism may provide the ability to tailor the utility (e.g., estimation accuracy) according to resource requirements (e.g., transmission bandwidth and computational complexity).
1. A non-transitory machine-readable medium storing instructions that, when executed by one or more processors of a computing device, cause the computing device to perform operations comprising:
selecting a user data value to be transmitted to a server from a set of possible user data values collected on a client device;
creating a hash value of a term representing the user data value using a random hash function, wherein the random hash function is generated from a randomly selected variant of the term, and wherein a set of possible variants of the term is indexed;
encoding at least a portion of the created hash value as a vector, wherein the encoding comprises updating a vector value at a location corresponding to the created hash value;
privatizing the vector by changing at least some of the vector values with a predefined probability; and
transmitting the privatized vector and an index value of the randomly selected variant to the server to enable the server to estimate a frequency of the user data value on a set of client devices.
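Claims 1-4 together describe a concrete client-side procedure: pick an indexed variant, hash, encode a sign vector, and randomize. A minimal sketch of those steps follows (the parameters `m`, `k`, and `epsilon`, the use of SHA-256, and the function names are illustrative assumptions; the claims do not prescribe a particular hash function or vector length):

```python
import hashlib
import math
import random

m = 16         # vector length (illustrative choice)
k = 8          # number of indexed variants / hash functions (illustrative)
epsilon = 4.0  # privacy parameter
flip_p = 1.0 / (1.0 + math.exp(epsilon))  # the flip probability of claim 10

def privatize(value: str):
    # Randomly select one of the k indexed variants of the term; the
    # variant determines the hash function used (claims 1 and 2).
    j = random.randrange(k)
    variant = f"{value}:{j}"  # index appended to the string of characters
    h = int(hashlib.sha256(variant.encode()).hexdigest(), 16) % m
    # Initialize the vector to a uniform constant value and sign, then
    # flip the sign at the location given by the hash (claims 3 and 4).
    c = 1.0
    vec = [c] * m
    vec[h] = -c
    # Privatize: independently flip each value's sign with probability
    # 1 / (1 + e^epsilon).
    priv = [-v if random.random() < flip_p else v for v in vec]
    # Only the privatized vector and the variant index leave the device.
    return priv, j
```

Because every transmitted vector has been randomized, the server can learn aggregate frequencies but cannot attribute a value to any one device.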
2. The non-transitory machine readable medium of claim 1, wherein:
the term is a string of characters and the randomly selected variant of the term includes one or more characters representing the index value appended to the string of characters;
the server estimates the frequency of the user data value by updating a frequency table indexed by the set of possible variants, wherein a row or column of the frequency table corresponding to the index value of the randomly selected variant is updated with the privatized vector; and
the server estimates a frequency for each of the set of possible user data values from data accumulated from the set of client devices.
3. The non-transitory machine readable medium of claim 1, wherein the encoding comprises initializing the vector using a uniform value and sign, and updating the vector value comprises flipping the sign of the value at the location corresponding to the created hash value.
4. The non-transitory machine-readable medium of claim 3, wherein initializing the vector comprises setting each vector value to a value representing a constant, and updating the vector values comprises setting the values at the locations corresponding to the created hash values to values representing sign flips of the constant.
5. The non-transitory machine-readable medium of claim 1, wherein the randomly selected variant is to prevent hash collisions when only the portion of the created hash value is used and to reduce the number of computations required to create a frequency table while maintaining privacy of the user data values.
6. The non-transitory machine-readable medium of claim 1, wherein the vector is encoded using a Hadamard matrix and the user data value represents a website visited by a user of the client device.
7. The non-transitory machine-readable medium of claim 1, wherein only the privatized vector and the index value of the randomly selected variant are transmitted to the server as the information representing the user data value.
8. The non-transitory machine readable medium of claim 1, wherein privatizing the vector comprises changing at least some of the vector values with a predefined probability, the predefined probability based on a privacy parameter.
9. The non-transitory machine readable medium of claim 8, wherein the privacy parameter represents a configuration tradeoff between privacy and accuracy.
10. The non-transitory machine-readable medium of claim 8, wherein the predefined probability is defined as 1/(1 + e^ε), and ε is the privacy parameter.
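To make claim 10's tradeoff concrete, the sketch below evaluates the flip probability 1/(1 + e^ε) for a couple of assumed ε values: at ε = 0 every vector value is flipped with probability 1/2 (maximal privacy, no utility), and the probability falls toward 0 as ε grows, favoring accuracy over privacy.

```python
import math

def flip_probability(epsilon: float) -> float:
    # Probability of flipping each vector value: 1 / (1 + e^epsilon)
    return 1.0 / (1.0 + math.exp(epsilon))

# epsilon = 0 makes every value a fair coin; larger epsilon biases the
# transmitted vector toward the true encoding.
print(flip_probability(0.0))  # 0.5
print(flip_probability(4.0))  # roughly 0.018
```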
11. An electronic device, comprising:
one or more processors; and
a memory coupled to the one or more processors, the memory storing instructions that, when executed by the one or more processors, cause the one or more processors to:
selecting a user data value to be transmitted to a server from a set of possible user data values collected on a client device;
creating a hash value of a term representing the user data value using a random hash function, wherein the random hash function is generated from a randomly selected variant of the term, and wherein a set of possible variants of the term is indexed;
encoding at least a portion of the created hash value as a vector, wherein the encoding comprises updating a vector value at a location corresponding to the created hash value;
privatizing the vector by changing at least some of the vector values with a predefined probability; and
transmitting the privatized vector and an index value of the randomly selected variant to the server, wherein the server estimates a frequency of the user data value in a set of different client devices.
12. The electronic device of claim 11, wherein the server estimates the frequency through an update to a frequency table indexed by the set of possible variants, wherein a row or column of the frequency table corresponding to the index value of the randomly selected variant is updated with the privatized vector.
13. The electronic device of claim 12, wherein:
the randomly selected variant prevents hash collisions when only the portion of the created hash value is used and reduces the number of computations required by the server to create the frequency table while maintaining privacy of the user data values;
the term is a string of characters and the randomly selected variant of the term includes one or more characters representing the index value appended to the string of characters; and
the encoding comprises initializing the vector using a uniform value and sign, and updating the vector value comprises flipping the sign of the value at the location corresponding to the created hash value.
14. A data processing system comprising:
one or more processors; and
a memory coupled to the one or more processors, the memory storing instructions that, when executed by the processors, cause the processors to perform operations comprising:
selecting a user data value to be transmitted to a server from a set of possible user data values collected on a client device;
creating a hash value of a term representing the user data value using a random hash function, wherein the random hash function is generated from a randomly selected variant of the term, and wherein a set of possible variants of the term is indexed;
encoding at least a portion of the created hash value into a vector, wherein the encoding comprises updating a vector value at a location corresponding to the created hash value;
privatizing the vector by changing at least some of the vector values with a predefined probability; and
transmitting the privatized vector and the index value of the randomly selected variant to the server, wherein the server estimates the frequency of the user data value by updating a frequency table indexed by the set of possible variants.
15. The data processing system of claim 14, wherein:
a row or column of the frequency table corresponding to the index value of the randomly selected variant is updated with the privatized vector;
the term is a string of characters and the randomly selected variant of the term includes one or more characters representing the index value appended to the string of characters, the randomly selected variant preventing hash collisions when only the portion of the created hash value is used and reducing the number of computations required by the server to create the frequency table while maintaining privacy of the user data values;
the encoding comprises initializing the vector using a uniform value and sign; and
updating the vector value comprises flipping the sign of the value at the location corresponding to the created hash value.
16. An electronic device, comprising:
a non-transitory machine readable medium to store instructions;
one or more processors to execute the instructions; and
a memory coupled to the one or more processors, the memory to store the instructions, which when executed by the one or more processors, cause the one or more processors to:
detecting an application-related interaction that is performed in response to an instruction within a web page presented by the application, wherein the web page presented by the application is for presentation of a media item;
associating the web page presented by the application with a category based on the interaction, the category selected from a set of categories related to inferred preferences of the web page presented by the application;
creating a privatized encoding comprising a representation of the web page presented by the application and a representation of the category; and
transmitting the privatized encoding to at least one server that accumulates privatized encodings from a plurality of devices to estimate a frequency with which web pages presented by the application are associated with the category across the plurality of devices.
17. The electronic device of claim 16, wherein the interaction is performed in response to initiation, attempted initiation, or permission of automatic playback of a media item presented by the web page or by a web application associated with the web page, and the set of categories includes a first category related to an inferred preference to enable automatic playback of the media item.
18. The electronic device of claim 17, wherein the interaction includes allowing automatic playback of the media item beyond a predetermined time value after the web page presents the media item, or expanding or maximizing the media item after automatic playback of the media item is initiated.
19. The electronic device of claim 17, wherein automatic playback for web pages accessed by a user is disabled based on a setting, and the interaction includes selecting, within a predetermined time value of presentation of the media item, a media item for which the web page attempted to initiate automatic playback.
20. The electronic device of claim 16, wherein the interaction is performed in response to initiation, attempted initiation, or permission of automatic playback of a media item within the web page or a web application associated with the web page, and the set of categories includes a second category related to an inferred user preference to disable automatic playback of the media content.
21. The electronic device of claim 20, wherein the interaction comprises closing or minimizing the application or a tab associated with the web page.
22. The electronic device of claim 20, wherein the interaction comprises muting or lowering system volume.
23. The electronic device of claim 20, wherein automatic playback for the web page is disabled based on a setting, and the interaction includes navigating away from the media item without playing back the media item.
24. The electronic device of claim 20, wherein the interaction comprises interrupting automatic playback of the media item within a predetermined time value of the web page presenting the media item, and wherein interrupting the automatic playback comprises pausing, stopping, or muting the media item, or reducing a volume of the media item to a predetermined level.
25. The electronic device of claim 16, the instructions further causing the one or more processors to:
determining that content on a web page to be displayed by the application includes logic indicating that the media content is set for automatic playback; and
adjusting one or more presentation settings of the media content prior to presentation of the media content by the application, wherein adjusting the one or more presentation settings comprises disabling auto-play, delaying the auto-play, pausing the auto-play, or muting the media content.
26. The electronic device of claim 16, wherein to create the privatized code, the instructions are to cause the one or more processors to:
creating a hash value of the representation of the web page and the representation of the category using a randomly selected hash function,
encoding an initialized vector by sign-flipping vector values at positions corresponding to the hash value, and
changing at least some of the vector values with a predefined probability determined based on a privacy parameter.
27. A computing system, comprising:
one or more processors; and
a memory coupled to the one or more processors, the memory storing instructions that, when executed by the one or more processors, cause the computing system to perform operations comprising:
receiving, from each of a set of client devices, a privatized encoding of a web page and a category associated with the web page, wherein the category relates to an inferred preference for presenting media content on the web page;
accumulating the privatized encodings from the set of client devices; and
estimating a frequency of selected web pages associated with the category based on the accumulated privatized encodings from the set of client devices.
28. The computing system of claim 27, the operations further comprising:
accumulating a sketch of the privatized encodings received from the set of client devices, the sketch including frequency estimates of the privatized encodings; and
estimating the frequency of the selected web pages associated with the category among the frequencies included in the sketch of the privatized encodings.
29. The computing system of claim 27, wherein estimating the frequency of the selected web pages comprises determining a count using a count mean sketch operation.
30. The computing system of claim 27, wherein the category relates to inferring a preference with respect to displaying content on the web page using a reader mode, wherein the reader mode is a mode that does not display menu items.
31. The computing system of claim 27, wherein the category relates to an inferred preference for content blocking settings for the web page.
32. The computing system of claim 27, wherein the category relates to an inferred user preference to enable automatic playback of media content on the web page.
33. The computing system of claim 32, the operations further comprising: adding the selected web page to a whitelist based on an estimated frequency of users providing the inferred preference to enable automatic playback for the web page, wherein the whitelist comprises a list of web pages for which automatic playback is allowed.
34. The computing system of claim 33, the operations further comprising: removing the selected web page from the whitelist or adding the selected web page to a blacklist based on an estimated frequency of users providing an inferred preference to disable automatic playback for the web page, wherein the whitelist comprises a list of web pages for which automatic playback is allowed and the blacklist comprises a list of web pages for which automatic playback is disabled.
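The list management of claims 33-34 can be sketched as a thresholding step over the server's frequency estimates (the frequencies and thresholds below are made-up illustrative values, not outputs of a real deployment):

```python
# Estimated per-page frequencies of the "enable auto-play" and
# "disable auto-play" inferences, as a count-mean-sketch pipeline
# might produce them (illustrative numbers).
enable_freq = {"video.example": 0.72, "news.example": 0.10}
disable_freq = {"video.example": 0.05, "news.example": 0.81}

ENABLE_THRESHOLD = 0.5   # assumed policy thresholds
DISABLE_THRESHOLD = 0.5

whitelist, blacklist = set(), set()
for page in enable_freq:
    if enable_freq[page] >= ENABLE_THRESHOLD:
        whitelist.add(page)        # auto-play allowed (claim 33)
    if disable_freq[page] >= DISABLE_THRESHOLD:
        whitelist.discard(page)    # demote on an opposing signal (claim 34)
        blacklist.add(page)
```

Because the inputs are privatized aggregates, the policy reflects population-level preferences without revealing any individual user's browsing.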
35. A non-transitory machine-readable medium storing instructions that, when executed by one or more processors of a computing device, cause the computing device to perform operations comprising:
receiving, from each of a set of client devices, a privatized encoding of a web page and a category associated with the web page, wherein the category relates to an inferred user preference to enable automatic playback of media content on the web page;
accumulating the privatized encodings from the set of client devices; and
estimating a frequency of selected web pages associated with the category based on the accumulated privatized encodings from the set of client devices.
36. The non-transitory machine-readable medium of claim 35, the operations further comprising:
accumulating a sketch of the privatized encodings received from the set of client devices, the sketch including frequency estimates of the privatized encodings; and
estimating the frequency of the selected web pages associated with the category among the frequencies included in the sketch of the privatized encodings.
37. The non-transitory machine-readable medium of claim 35, the operations further comprising: adding the selected web page to a whitelist based on an estimated frequency of users providing the inferred preference to enable automatic playback for the web page, wherein the whitelist comprises a list of web pages for which automatic playback is allowed.
38. The non-transitory machine-readable medium of claim 37, the operations further comprising: removing the selected web page from the whitelist or adding the selected web page to a blacklist based on an estimated frequency of users providing an inferred preference to disable automatic playback for the web page, wherein the whitelist comprises a list of web pages for which automatic playback is allowed and the blacklist comprises a list of web pages for which automatic playback is disabled.
39. The non-transitory machine readable medium of claim 38, wherein estimating the frequency of the selected web pages comprises determining a count using a count mean sketch operation.
40. A data processing system comprising:
a non-transitory machine readable medium to store instructions;
one or more processors to execute the instructions; and
a memory coupled to the one or more processors, the memory to store the instructions, which when executed by the one or more processors, cause the one or more processors to:
detecting an interaction with the data processing system related to a content item of presented content, the interaction determined in response to execution of an instruction to present the content item;
associating the presented content with a category based on the interaction, the category selected from a set of categories related to inferred preferences for the presented content;
creating a privatized encoding comprising a representation of the presented content and a representation of the category; and
transmitting the privatized encoding to at least one server that accumulates privatized encodings from a plurality of devices to estimate a frequency with which the presented content is associated with the category across the plurality of devices.
41. The data processing system of claim 40, wherein the presented content comprises a web page, a web application, or a user interface of an application, and the content item comprises a user interface element of the web page, a user interface element of the web application, or a user interface element of the application.
42. The data processing system of claim 40, wherein the content item is exposed by an application and the category is a setting selection provided by the application.
43. The data processing system of claim 42, wherein the application is a web browser or another application for viewing the presented content.
44. A non-transitory machine-readable medium storing instructions that, when executed by one or more processors of a computing device, cause the computing device to perform operations comprising:
receiving privatized encodings of representations of web pages from a set of client devices, each web page selected to be transmitted in response to exceeding a resource consumption threshold;
accumulating the privatized encodings received from the set of client devices;
generating a frequency estimate based on the accumulated privatized encodings; and
estimating a frequency of selected ones of the web pages using the frequency estimate.
45. The non-transitory machine-readable medium of claim 44, wherein the representation of the web page comprises a website associated with the web page, and the operations further comprise estimating a frequency of a website based on the frequency of selected web pages associated with the website.
46. The non-transitory machine-readable medium of claim 44, the operations further comprising: adjusting the resource consumption threshold based on an analysis of an estimated frequency of one or more web pages included in the frequency estimate, wherein the frequency estimate is a sketch of the privatized encodings received from the set of client devices.
47. The non-transitory machine-readable medium of claim 46, wherein the privatized encoding is a differentially private encoding.
48. The non-transitory machine readable medium of claim 47, wherein estimating the frequency of the selected web pages comprises determining a count using a count mean sketch operation.
49. The non-transitory machine-readable medium of claim 44, wherein the resource consumption corresponds to usage of a processor or memory of the computing device.
50. The non-transitory machine readable medium of claim 44, wherein the resource consumption corresponds to power usage or data transmission bandwidth.
51. An electronic device, comprising:
one or more processors; and
a memory coupled to the one or more processors, the memory storing instructions that, when executed by the one or more processors, cause the electronic device to:
monitoring resource consumption of an application while the application presents a web page from a website;
determining that the resource consumption exceeds a resource consumption threshold;
generating a privatized encoding of a representation of the web page; and
transmitting the privatized encoding of the representation of the web page to a server, wherein the server is to accumulate a sketch of privatized encodings from different devices to estimate a frequency with which the web page exceeds the resource consumption threshold on the different devices.
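The client-side trigger of claim 51 can be sketched as follows; the CPU-usage metric, the threshold value, and the `privatize`/`transmit` callables are illustrative assumptions standing in for the monitoring and encoding machinery described elsewhere in the disclosure:

```python
CPU_THRESHOLD = 0.8  # assumed: fraction of CPU attributable to the page's tab

def maybe_report(page_url: str, cpu_usage: float, privatize, transmit):
    """Report a privatized representation of the page only when its
    resource consumption exceeds the threshold."""
    if cpu_usage <= CPU_THRESHOLD:
        return False  # below threshold: nothing leaves the device
    encoding = privatize(page_url)  # the raw URL is never transmitted
    transmit(encoding)
    return True

# Demonstration with stand-in callables for the encoder and the uplink.
sent = []
reported = maybe_report("https://heavy.example/page",
                        0.93,
                        privatize=lambda u: hash(u) % 997,  # stand-in encoder
                        transmit=sent.append)
```

Gating the report on the threshold means the server's sketch only ever accumulates pages that were actually resource-heavy on some device.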
52. The electronic device of claim 51, wherein the server is to estimate a frequency with which websites have web pages that exceed the resource consumption threshold.
53. The electronic device of claim 51, wherein the resource consumption corresponds to usage of the one or more processors or the memory coupled to the one or more processors.
54. The electronic device of claim 51, wherein the resource consumption corresponds to power usage of the electronic device or data transmission bandwidth consumption of the electronic device.
55. The electronic device of claim 51, wherein the application that accesses the web page comprises a browser.
56. The electronic device of claim 55, wherein monitoring the resource consumption of the application accessing the web page comprises monitoring resource consumption of a process id of a tab of the application corresponding to the web page.
57. The electronic device of claim 51, wherein generating the privatized encoding of the representation of the web page comprises:
creating a hash value for the web page using a randomly selected hash function,
encoding an initialized vector by sign-flipping vector values at positions corresponding to the created hash value, and
changing at least some of the vector values with a predefined probability.
58. The electronic device of claim 57, wherein the predefined probability is based on a predetermined privacy parameter.
59. A data processing system comprising:
one or more processors; and
a memory coupled to the one or more processors, the memory storing instructions that, when executed by the processors, cause the processors to perform operations comprising:
receiving a privatized encoding of a representation of a web page from each of a set of client devices, each web page selected to be transmitted in response to a resource consumption threshold being exceeded;
accumulating a sketch of the privatized encodings received from the set of client devices; and
estimating a frequency of selected ones of the web pages included in the sketch.
60. The data processing system of claim 59, wherein the representation of the web page comprises a website associated with the web page, and the operations further comprise: estimating a frequency of a website based on the frequency of web pages associated with the website included in the sketch.
61. The data processing system of claim 59, the operations further comprising: adjusting the resource consumption threshold based on an analysis of an estimated frequency of one or more web pages included in the sketch.
62. The data processing system of claim 59, wherein the resource consumption corresponds to usage of one or more processors or memory coupled to the one or more processors.
63. The data processing system of claim 59, wherein the resource consumption corresponds to power usage or data transfer bandwidth.
Technical Field
The present disclosure relates generally to the field of differential privacy. More particularly, the present disclosure relates to a system that implements an efficient differential privacy mechanism while maintaining guarantees on privacy and utility.
Background
As the amount of information collected in online environments grows, individuals have become increasingly protective of the various forms of information they provide. Thus, differential privacy has become an important consideration for providers that aggregate online information. In a crowdsourced client/server environment, local differential privacy introduces randomness into user data before the client shares the user data with the server. The server can learn from the aggregate of the crowdsourced data of all clients, but cannot learn the data provided by any particular client. As more user information is collected, general patterns begin to emerge, which can inform and enhance the user experience. Thus, differential privacy provides insights from large datasets while offering a mathematical proof that the information of any single individual remains private.
When local differential privacy is employed, the client device must perform various operations to create the privatized data. These client-side operations can include encoding the data, which in some cases (e.g., when the universe of possible values reaches into the thousands or millions) may be resource intensive in terms of computational cost and transmission bandwidth. In addition, the server must perform correspondingly intensive operations to process the privatized data. Accordingly, there is a continuing need for efficient mechanisms for implementing local differential privacy of user data.
Disclosure of Invention
Embodiments described herein provide a privacy mechanism for protecting user data when transmitting such data to a server that estimates the frequency of such data in a set of client devices. In one embodiment, a differential privacy mechanism is implemented using count-mean sketch techniques that may reduce the resource requirements needed to enable privacy while providing provable guarantees about privacy and utility. For example, the mechanism may provide the ability to tailor the utility (e.g., estimated accuracy) according to resource requirements (e.g., transmission bandwidth and computational complexity).
One embodiment provides a non-transitory machine-readable medium storing instructions that, when executed by one or more processors of a computing device, cause the computing device to perform operations comprising selecting a user data value to transmit to a server from a set of possible user data values collected on a client device; creating a hash value of a term representing the user data value using a random hash function, wherein the random hash function is generated from a randomly selected variant of the term, and wherein a set of possible variants of the term is indexed; encoding at least a portion of the created hash value as a vector, wherein the encoding comprises updating a vector value at a location corresponding to the created hash value; privatizing the vector by changing at least some of the vector values with a predefined probability; and transmitting the privatized vector and an index value of the randomly selected variant to the server to enable the server to estimate a frequency of the user data value on a set of client devices.
One embodiment provides an electronic device comprising one or more processors and memory coupled to the one or more processors, the memory storing instructions that, when executed by the one or more processors, cause the one or more processors to select a user data value to transmit to a server from a set of possible user data values collected on a client device; create a hash value of a term representing the user data value using a random hash function, wherein the random hash function is generated from a randomly selected variant of the term, and wherein a set of possible variants of the term is indexed; encode at least a portion of the created hash value as a vector, wherein the encoding comprises updating a vector value at a location corresponding to the created hash value; privatize the vector by changing at least some of the vector values with a predefined probability; and transmit the privatized vector and an index value of the randomly selected variant to a server that estimates a frequency of the user data value in a set of different client devices.
One embodiment provides a data processing system comprising one or more processors and memory coupled to the one or more processors, the memory storing instructions that, when executed by the processors, cause the processors to perform operations comprising selecting a user data value to be transmitted to a server from a set of possible user data values collected on a client device; creating a hash value of a term representing the user data value using a random hash function, wherein the random hash function is generated from a randomly selected variant of the term, and wherein a set of possible variants of the term is indexed; encoding at least a portion of the created hash value into a vector, wherein the encoding comprises updating a vector value at a location corresponding to the created hash value; privatizing the vector by changing at least some of the vector values with a predefined probability; and transmitting the privatized vector and the index value of the randomly selected variant to the server, wherein the server estimates the frequency of the user data value by updating a frequency table indexed by the set of possible variants.
Other features of the present embodiments will be apparent from the accompanying drawings and from the detailed description that follows.
Drawings
Embodiments of the present disclosure are illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings in which like reference numerals refer to similar elements.
Fig. 1 is a block diagram illustrating an exemplary overview of a system environment according to an embodiment of the present disclosure.
Fig. 2 is a block diagram of a system for differentially privatizing shared user data according to an embodiment of the present disclosure.
Fig. 3 is an exemplary process flow for differentially privatizing encoding of user data according to an embodiment of the disclosure.
Fig. 4 illustrates an exemplary data flow for transmitting a privatized encoding of user data to a server for frequency estimation, according to an embodiment.
Fig. 5-6 depict mathematical algorithms of a differential privacy mechanism according to embodiments described herein.
Fig. 7 is an exemplary flow diagram illustrating a method of differentially privatizing encoding of user data to be transmitted to a server, according to an embodiment.
Fig. 8A-8F are exemplary flowcharts and illustrations regarding crowdsourcing user interaction and device resource consumption data, according to an embodiment.
Fig. 9 is a block diagram illustrating an exemplary API architecture that may be used in some embodiments.
Fig. 10A-10B are block diagrams of an exemplary API software stack, according to an embodiment.
Fig. 11 is a block diagram of a mobile device architecture according to an embodiment.
Fig. 12 is a block diagram illustrating an exemplary computing system that may be used in conjunction with one or more of the embodiments of the present disclosure.
Detailed Description
In various instances, the user experience of computing devices may be improved by attempting to understand the current trends in the use of these devices. For example, the suggestion of a predictive keyboard may be improved by determining which new words are popular or which emoticons are selected most frequently. The behavior of the web browser in browsing certain websites may be adjusted based on the detected user behavior. Additionally, battery life may be extended by determining which websites are currently presenting issues that may affect the battery life of the device. However, such data may be considered personal to the user and should be privatized or otherwise encoded to mask the identity of the user providing such data. Embodiments described herein provide differential privacy encoding for user data that is used to estimate the frequency of such data in a set of client devices. Such embodiments provide differential privacy techniques that can be used to reduce resource requirements or enhance user experience while providing provable guarantees about privacy and utility.
Various embodiments and aspects will be described with reference to details discussed below, and the accompanying drawings will illustrate the various embodiments. The following description and drawings are illustrative and are not to be construed as limiting. Numerous specific details are described to provide a thorough understanding of various embodiments. However, in certain instances, well-known or conventional details are not described in order to provide a concise discussion of the implementations.
Reference in the specification to "one embodiment" or "an embodiment" or "some embodiments" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment. The appearances of the phrase "an embodiment" in various places in the specification are not necessarily all referring to the same embodiment.
It should be noted that there may be variations to the flowcharts or steps (or operations) described herein without departing from the embodiments described herein. For example, the steps may be performed in parallel, concurrently, or in a different order, or steps may be added, deleted or modified.
The present disclosure recognizes that the use of personal information data in the techniques herein may be useful to benefit a user. For example, the personal information data may be used to deliver target content that is of greater interest to the user. Thus, the use of such personal information data enables planned control of delivered content. In addition, the present disclosure also contemplates other uses for which personal information data is beneficial to a user.
The present disclosure further contemplates that entities responsible for the collection, analysis, disclosure, transmission, storage, or other use of such personal information data will comply with established privacy policies and/or privacy practices. In particular, such entities should implement and consistently adhere to privacy policies and practices that are recognized as meeting or exceeding industry or governmental requirements for maintaining the privacy and security of personal information data. For example, personal information from a user should be collected for lawful and legitimate uses by an entity and not shared or sold outside of those legitimate uses. In addition, such collection should occur only after receiving the informed consent of the user. In addition, such entities should take any required steps to secure and protect access to such personal information data, and to ensure that others who are able to access the personal information data comply with their privacy policies and procedures. In addition, such entities may subject themselves to third-party evaluations to certify compliance with widely accepted privacy policies and practices.
Regardless of the foregoing, the present disclosure also contemplates embodiments in which a user selectively prevents use or access to personal information data. That is, the present disclosure contemplates that hardware elements and/or software elements may be provided to prevent or block access to such personal information data. For example, in the case of an ad delivery service, the disclosed technology may be configured to allow a user to opt-in or opt-out of participating in the collection of personal information data during registration with the service. As another example, the user may choose not to provide location information for the targeted content delivery service. As another example, the user may choose not to provide accurate location information, but to permit transmission of location area information.
Differential privacy mechanism
Embodiments described herein provide a differential privacy mechanism that can be used to privatize user data collected for crowdsourcing. As a general overview, local differential privacy introduces randomness into client user data before the user data is shared. As opposed to having a centralized data source D = {d_1, ..., d_n}, each data element d_i belongs to a separate client i. Given the transcript T_i of the interaction with client i, an adversary should not be able to distinguish T_i from the transcript that would have been generated had the data element been replaced with null. The degree of indistinguishability (e.g., degree of privacy) is parameterized by ε, a privacy parameter that represents a tradeoff between the strength of the privacy guarantee and the accuracy of the published results. In general, ε is considered to be a small constant. In some embodiments, the ε value may vary based on the type of data to be privatized, with more sensitive data being privatized to a higher degree. The following is a formal definition of local differential privacy.
Let n be the number of clients in a client-server system, let Γ be the set of all possible transcripts generated from any single client-server interaction, and let T_i be the transcript generated by a differential privacy algorithm A when interacting with client i. Let d_i ∈ S be the data element of client i. Algorithm A is ε-locally differentially private if, for all subsets E ⊆ Γ, the following holds:
Pr[T_i ∈ E | d_i = d] ≤ e^ε · Pr[T_i ∈ E | d_i = null]
Here, d_i = null refers to the case where the data element of client i is removed. In other words, an adversary having n − 1 data points of a data set cannot reliably test whether the nth data point was a particular value. Thus, the differentially privatized data set cannot be queried in a manner that enables any particular user's data to be determined.
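As a concrete illustration of this definition, the sign-flipping primitive used by the mechanisms described below (keeping a ±1 value with probability e^ε/(1 + e^ε) and flipping it otherwise) can be checked numerically against the e^ε bound. The following sketch is illustrative only; the function names are not part of the specification:

```python
import math

def flip_distribution(bit: int, epsilon: float):
    """Output distribution when a +1/-1 bit is kept with probability
    e^eps / (1 + e^eps) and sign-flipped otherwise."""
    p_keep = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    return {bit: p_keep, -bit: 1.0 - p_keep}

def satisfies_ldp(epsilon: float, tol: float = 1e-9) -> bool:
    """Check Pr[T = t | d] <= e^eps * Pr[T = t | d'] for every output t and
    every pair of inputs d, d' in {-1, +1} (tol absorbs float rounding)."""
    dists = [flip_distribution(+1, epsilon), flip_distribution(-1, epsilon)]
    bound = math.exp(epsilon)
    return all(
        p[t] <= bound * q[t] + tol
        for p in dists for q in dists for t in (-1, +1)
    )
```

Smaller ε tightens the bound (stronger privacy, noisier output); larger ε loosens it.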
The systems (and methods) disclosed herein include an ε-local differentially private count-mean-sketch mechanism that may provide improvements in accuracy, bandwidth, and computational cost for both clients and servers while preserving user privacy. The private count-mean-sketch mechanism may be provided within a system environment as described herein. The use of averages in the count-mean-sketch mechanism enables a trade-off between processing time and accuracy. With other mechanisms, such as a median-based mechanism, increased processing time may not result in increased accuracy. In contrast, the count-mean-sketch mechanism can achieve higher accuracy by taking more processing time.
Fig. 1 is a block diagram of an overview of such a system environment, according to an embodiment.
Server 130 may accumulate the privatized user data 112 and determine statistical attributes, such as user data frequency estimates 131, across a set of client devices.
Fig. 2 is a block diagram of a system for differentially privatizing shared user data, according to an embodiment.
In one embodiment,
In one embodiment, the
In one embodiment, the whitelist stores a list of websites (or web pages) in which particular features are enabled. For example, the whitelist may include a list of websites in which auto-play is enabled as further described herein.
Server 130 may include a receiving module 250 and a
Fig. 3 is an exemplary process flow of differentially privatizing encoding of user data to be transmitted to a server according to an embodiment of the disclosure. As shown in diagram 300, a system, which may be included within
As shown, the term may be the domain name of a visited website. However, other types of representations, such as, but not limited to, a URI (uniform resource identifier) or a URL, may also be used. The term may also be a visited web page of a website, in which case the representation may identify the domain name of the visited website and the particular web page of that website. As described herein, a web page is a single page or document that is presented from or hosted by a website, although a single presented web page may include content from multiple documents. A website is a collection of related web pages that are presented under the same name, grouping, or organization. Web pages from a website are typically (but not necessarily) hosted by the same domain. A single domain may host multiple websites, where each website includes multiple web pages. As described herein, when referring to a website, the reference may also apply to the collection of web pages associated with the website or the domain associated with the website.
The term 302 may be converted to a numerical value using a hash function. As shown, in one embodiment, a SHA-256 hash function is used. However, any other hash function may be used. For example, variations of SHA or other algorithms, such as SHA-1, SHA-2, SHA-3, MD5, BLAKE2, and the like, with various bit sizes, may be used. Thus, any hash function (or block cipher) may be used in implementations, as long as it is well known to both the client and the server.
As described above, embodiments of the present disclosure may reduce the computational resources and the bandwidth required for differential privacy algorithms. In one embodiment, the computational logic may use a portion of the created hash value along with a variant 304 of term 302 to resolve potential hash collisions when frequency counting is performed by the server, which increases computational efficiency while maintaining a provable level of privacy. The variants 304 may correspond to a set of k values (or k index values) that are well known to the server. In one embodiment, to create a variant 304, the system may append a representation of the index value to the term 302. As shown in this example, an integer corresponding to the index value (e.g., "1,") may be combined with the URL to create a variant (e.g., "1,apple.com").
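The variant-and-hash step can be sketched as follows. The separator character, byte order, and bit-selection details below are illustrative assumptions; the text requires only that the scheme be known to both client and server:

```python
import hashlib

def variant(term: str, index: int) -> str:
    """Indexed variant of a term, e.g. variant("apple.com", 1) -> "1,apple.com"."""
    return f"{index},{term}"

def hash_portion(term: str, index: int, m_bits: int = 16) -> int:
    """SHA-256 hash of the selected variant, truncated to an m-bit portion.
    Client and server derive the same position for the same (term, index)."""
    digest = hashlib.sha256(variant(term, index).encode("utf-8")).digest()
    return int.from_bytes(digest, "big") & ((1 << m_bits) - 1)  # keep low m bits
```

Because the server knows the set of k index values, it can recompute every variant's hash position during frequency estimation.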
Once hash value 303 is generated, the system may select a portion 308 of the hash value. In this example, a 16-bit portion may be selected, but other sizes (e.g., 8, 16, 32, 64, etc. bits) are also contemplated based on the desired level of precision or the computational cost of the differential privacy algorithm. For example, increasing the number of bits (or m) increases the computation (and transmission) cost, but an improvement in accuracy is obtained. For example, using 16 bits provides 2^16 (e.g., about 65k) potential unique values (or a range of m values). Similarly, increasing the number of variants k increases the computational cost (e.g., the cost of computing a sketch), but in turn increases the accuracy of the estimation. As noted, the system may encode this value, and as shown, the encoding may be in the form of a vector 306. For example, vector 306 may have a size of 2^16, and each position of the vector may correspond to a potential value of the created hash 303. It should be noted that vectors are described herein for convenience and mathematical purposes, but any suitable data structure may be implemented, such as a bit string, an object, and so on.
As shown in diagram 350 of fig. 3, the created hash value 303 (as a decimal number) may correspond to a vector/bit position 305. Thus, vector 306 may be encoded by updating a value (e.g., setting a bit to 1) at position 305. To account for any potential bias toward 0 or null values, the system may use an initialization vector 317. In one embodiment, the initialization vector 317 may be the vector v ← [−c_ε]^m, whereby c_ε allows noise with a mean of 0 to be added to the initialization vector. The noise should be large enough to mask individual items of user data, but small enough to allow any patterns in the data set to appear. It should be noted that these values are used as mathematical terms, but may be encoded using bits (e.g., 0 = −c_ε, 1 = +c_ε). Thus, vector 306 may create an encoding 308 using initialization vector 317, where the value (or bit) at position 305 is changed (or updated). For example, the sign of the value at position 305 may be flipped, making that value +c_ε while all other values remain −c_ε, as shown (or vice versa).
The system may then create a privatized encoding 312 by changing at least some of the values with a predetermined probability 313. In one embodiment, the system may flip the sign of a value (e.g., (−) to (+), or vice versa) with the predetermined probability 313. As further described herein, in one embodiment, the predetermined probability is 1/(1 + e^ε).
Thus, the user data 301 value is now represented as a privatized encoding 312, which individually maintains the privacy of the user. The privatized encoding 312 may be stored on the
It should be noted that additional bits may be added to the above-described encoding to transmit additional information. For example, additional bits may be added based on categorizing the user data values, as further described herein. For example, adding 2 bits (e.g., 2^2) provides the ability to encode 4 categories. As described above, the differential privacy mechanism allows for a large number of data elements (e.g., p), and thus may provide an efficient mechanism for transmitting data as described herein, which may not have been practical given the resource requirements of previous mechanisms.
Fig. 4 illustrates an exemplary data flow 400 for transmitting a privatized encoding of user data to a server for frequency estimation, according to an embodiment. As shown, the server 130 may accumulate user data in the form of privatized data from different client devices 110A-B. When transmitting information, each client device may transmit the privatized encoding 312 of the user data along with the index value (or a reference to the index value) of the random variant. For example, as shown, client device 110A transmits a privatized encoding 312 for a visited website. The random variant used for such encoding corresponds to the random variant at
The accumulated user data may then be processed by the server (either in batches or as a data stream) to generate frequency estimates 131. In one embodiment, the server may maintain a
Fig. 5-6 depict a more formal (mathematical) algorithm for the differential privacy mechanism, according to an embodiment. As described, the system may create a sketch (e.g., privatized encoding 312) to provide a compact data structure for maintaining the frequency of elements S = {s_1, ..., s_p} present in a data stream D = {d_1, ...}. Thus, an ε-local differentially private version of the count mean sketch can be used on the server to generate a frequency oracle that maintains user privacy. The frequency oracle is a function that, based on the data D = {d_1, ...} received from n clients, returns an estimate of the count of a data item s ∈ S. The differential privacy mechanism may be one of two types: (1) the ε-local differentially private implementation A_CLIENT, or (2) the Hadamard ε-local differentially private implementation A_CLIENT-Hadamard.
FIG. 5 illustrates a process 500 of the ε-local differentially private implementation A_CLIENT, according to an embodiment of the present disclosure. Process 500 may be implemented by an algorithm executed by processing logic described herein, which may include software, hardware, or a combination thereof. For example, process 500 may be performed by a system (such as
The inputs to the client-side ε-local differentially private algorithm A_CLIENT can include: (1) a privacy parameter, ε; (2) a hash range, m; (3) an index value, r; and (4) a data element, d ∈ S. Thus, the system (e.g., client device 110) may implement algorithm A_CLIENT to generate an ε-local differentially private sketch based on the following operations.
In operation 501, the system may calculate a constant c_ε = (e^ε + 1)/(e^ε − 1) and initialize a vector v ← [−c_ε]^m. The constant c_ε allows the noise added to maintain privacy to have a mean of zero, thus preserving unbiasedness. In operation 502, the system may select a random variant r of the data element d.
In operation 503, the system may set n ← a portion of hash(variant r of d).
In operation 504, the system may set v[n] ← c_ε.
In operation 505, the system may sample a vector b ∈ {−1, +1}^m, wherein each b_j is independent and identically distributed with probability e^ε/(1 + e^ε) of being +1. In operation 506, the system may generate the privatized vector v_priv = [v_1·b_1, ..., v_m·b_m].
In operation 507, the system may return the vector v_priv and the index value r.
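Operations 501-507 can be sketched end to end as follows. The variant-string format and the modulo reduction of the hash are illustrative assumptions; the specification requires only that client and server agree on them:

```python
import hashlib
import math
import random

def a_client(d: str, epsilon: float, m: int, k: int, seed=None):
    """Illustrative epsilon-local differentially private sketch of data
    element d (operations 501-507). m is the hash range (vector length),
    k the number of indexed variants. Returns (v_priv, r)."""
    rng = random.Random(seed)
    c_eps = (math.exp(epsilon) + 1) / (math.exp(epsilon) - 1)
    v = [-c_eps] * m                       # operation 501: initialize v
    r = rng.randrange(k)                   # operation 502: random variant index
    digest = hashlib.sha256(f"{r},{d}".encode()).digest()
    n = int.from_bytes(digest, "big") % m  # operation 503: portion of hash(variant r of d)
    v[n] = c_eps                           # operation 504: flip sign at position n
    p_keep = math.exp(epsilon) / (1 + math.exp(epsilon))
    b = [1 if rng.random() < p_keep else -1 for _ in range(m)]  # operation 505
    v_priv = [v_j * b_j for v_j, b_j in zip(v, b)]              # operation 506
    return v_priv, r                       # operation 507
```

Every transmitted entry has magnitude c_ε; only the sign pattern (randomized with probability 1/(1 + e^ε)) carries information about d.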
FIG. 6 illustrates a method (or algorithm) 600 of the Hadamard version, A_CLIENT-Hadamard, of the ε-local differentially private implementation, according to an embodiment of the present disclosure. The inputs to the client-side A_CLIENT-Hadamard version of the ε-local differentially private implementation may include: (1) a privacy parameter, ε; (2) a hash range, m; (3) an index value, r; and (4) a data element, d ∈ S. Thus, a system (e.g.,
In
In
In
In
In
In
In
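A minimal sketch of the A_CLIENT-Hadamard idea follows. It assumes the commonly used construction, in which the one-hot style vector is transformed by a Hadamard matrix and a single sign-privatized coefficient is transmitted (reducing per-client bandwidth from m values to one); the function names, variant-string format, and sampling details are illustrative assumptions, not the specification's exact steps:

```python
import hashlib
import math
import random

def hadamard_entry(i: int, j: int) -> int:
    """Entry H_m[i][j] of the (unnormalized) Sylvester Hadamard matrix:
    (-1) raised to popcount(i AND j)."""
    return -1 if bin(i & j).count("1") % 2 else 1

def a_client_hadamard(d: str, epsilon: float, m: int, k: int, seed=None):
    """Illustrative Hadamard client sketch: transmit one privatized
    Hadamard coefficient. Assumes m is a power of two.
    Returns (coefficient, sampled position j, variant index r)."""
    rng = random.Random(seed)
    r = rng.randrange(k)                        # random variant index
    digest = hashlib.sha256(f"{r},{d}".encode()).digest()
    n = int.from_bytes(digest, "big") % m       # hash position of the variant
    j = rng.randrange(m)                        # sample one coefficient position
    # The Hadamard transform of the one-hot vector e_n at coordinate j
    # is simply column n of H_m, normalized by sqrt(m).
    w = hadamard_entry(j, n) / math.sqrt(m)
    p_keep = math.exp(epsilon) / (1 + math.exp(epsilon))
    b = 1 if rng.random() < p_keep else -1      # privatize the sign
    return b * w, j, r
```

The server, knowing j and r, can later accumulate these single coefficients into the matching sketch row and convert back to the standard basis.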
Based on the particular algorithm (or method) used by the client device, the server (e.g., server 130) may generate a frequency table or other data structure to perform frequency estimation on user data values across different client devices. As described above, this estimation may be based on a count mean sketch (e.g., a variation of a count-min sketch). The values of the frequency table are incremented based on whether the client used the ε-local differentially private sketch algorithm A_CLIENT or the Hadamard ε-local differentially private sketch algorithm A_CLIENT-Hadamard. The operation for each is as follows.
If the client used the A_CLIENT algorithm to generate the sketch, the vector v_priv is added to the matching sketch data W_{k,m} as follows: for the row W_h corresponding to the variant selected to generate v_priv, W_h is set to W_h + v_priv.
If the client used the A_CLIENT-Hadamard algorithm to generate the sketch, the vector v_Hadamard is added to the matching sketch data W_{k,m} as follows:
1. For the row W_h corresponding to the variant selected to generate v_Hadamard, set W_h = W_h + v_Hadamard.
2. Before determining the count mean sketch W, convert the rows from the Hadamard basis back to the standard basis: W_h ← H_m^T · W_h, wherein H_m is a Hadamard matrix of dimension m.
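The server-side accumulation for A_CLIENT sketches can be illustrated as follows. The class name and the variant hashing format are illustrative assumptions, and the exact unbiasing constants (functions of c_ε, k, m, and n given by the full analysis) are deliberately omitted; the sketch shows only the row-wise accumulation and the raw per-item statistic:

```python
import hashlib

class CountMeanSketch:
    """Illustrative server-side sketch data W_{k,m}: one row per indexed variant."""

    def __init__(self, k: int, m: int):
        self.k, self.m, self.n = k, m, 0
        self.W = [[0.0] * m for _ in range(k)]

    def positions(self, d: str):
        """Hash position of item d for each variant row (the "r,term" format
        is an assumed client/server convention)."""
        return [
            int.from_bytes(hashlib.sha256(f"{r},{d}".encode()).digest(), "big") % self.m
            for r in range(self.k)
        ]

    def accumulate(self, v_priv, r: int):
        """Add a received privatized vector to the row matching its variant index r."""
        self.n += 1
        for j, x in enumerate(v_priv):
            self.W[r][j] += x

    def raw_score(self, d: str) -> float:
        """Sum of the row entries at d's hash positions; grows with the true
        frequency of d. Debiasing into an unbiased count is omitted here."""
        return sum(self.W[r][p] for r, p in enumerate(self.positions(d)))
```

Processing can occur in batches or as a data stream: each (v_priv, r) pair updates one row in constant time per vector entry.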
Fig. 7 is an exemplary flow diagram illustrating a method of differentially privatizing an encoding of user data to be transmitted to a server, according to an embodiment.
In 701, the system may select a user data value to transmit to a server from a set of possible user data values collected on a client device.
At 702, the system may create a hash of the user data value using a random hash function. To generate the random hash function, the system may hash using a randomly selected variant of the term. In one embodiment, a set of possible variants of the value may be indexed. In one embodiment, the user data value may be a character string, and the set of possible variants of the user data value includes variants of the character string with one or more characters representing the corresponding index value appended.
In 703, the system may encode at least a portion of the created hash value into a vector. In one embodiment, the encoding includes updating the vector value at a location corresponding to the created hash value. For example, the sign of the value at the corresponding location may be flipped. In one embodiment, encoding may include initializing the vector with uniform values and signs. In one embodiment, initializing the vector may further comprise multiplying each vector value by a constant c_ε = (e^ε + 1)/(e^ε − 1). Furthermore, as mentioned above, the encoded vector may also be expressed in a Hadamard basis.
In 704, the system may privatize the vector by changing at least some of the vector values with a predefined probability. In one embodiment, the predefined probability may be 1/(1 + e^ε), where ε is a privacy parameter.
In 705, the system may transmit the privatized vector and the index value of the randomly selected variant to the server. As described above, the server may estimate the frequency of the user data value across a set of different client devices. The server may estimate the frequency of the user data value by updating a frequency table indexed by the set of possible variants. For example, the row or column of the frequency table corresponding to the index value of the randomly selected variant may be updated with the privatized vector. Furthermore, only the privatized vector and the index value of the randomly selected variant may be transmitted to the server as the information representing the user data value. In one embodiment, the randomly selected variant may mitigate hash collisions when only a portion of the created hash value is used, and may also reduce the number of computations required by the server to create the frequency table, while still maintaining ε-local differential privacy of the user data value.
Improving user experience using privatized crowd-sourced data
In another aspect of the present disclosure, systems (and methods) are described that collect crowdsourced data to enhance the user experience using the differential privacy mechanisms described herein. For example, the user experience may be enhanced by inferring potential user preferences from analyzing crowd-sourced user interaction data. In some implementations, crowd-sourced data related to a particular website, application, or service may be collected. For example, in one embodiment, user interactions related to presentation of content, such as content from an online source, may be analyzed. Further, in one embodiment, websites exhibiting particular characteristics may be determined while masking the identities of the users whose data helped determine those characteristics. For example, a website that consumes a certain level of client resources may be identified using privatized crowdsourced data. Data is collected into a privatized crowdsourced data set in which the identities of individual contributors are masked. The contributor data may be masked on a contributor's user device prior to transmitting the contribution for inclusion in the data set. Differential privacy can be maintained for crowd-sourced data sets, making it impossible to determine the identity of an individual contributor to a data set through multiple structured queries of the data set. For example, the data set may be privatized such that an adversary with any background knowledge of the data set cannot infer that a particular record in the input data set was significantly more responsible for the observed output than any other set of input records.
The number of contributions a given user may contribute to a crowd-sourced data set over a given time period may be limited. In one embodiment, a privacy budget is established for a particular type of crowdsourced data. The amount of contribution from each user is then limited based on the privacy budget. The privacy budget for a particular type of data may also be associated with the epsilon value (privacy parameter) used when privatizing the particular type of data. For example, based on the sensitivity of the data, the privacy budget and privacy parameters for different types of data may vary.
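The contribution-limiting idea can be sketched as follows. The class and method names, and the budget values in the usage, are illustrative assumptions; the text specifies only that per-type contributions are capped over a period, with the cap tied to the data type's sensitivity and privacy parameter:

```python
from collections import defaultdict

class PrivacyBudget:
    """Illustrative per-data-type contribution limiter for crowdsourcing."""

    def __init__(self, budgets):
        self.budgets = dict(budgets)       # data type -> max contributions per period
        self.spent = defaultdict(int)

    def try_contribute(self, data_type: str) -> bool:
        """Record a contribution if budget remains for this data type;
        unknown data types have zero budget and are always refused."""
        if self.spent[data_type] >= self.budgets.get(data_type, 0):
            return False
        self.spent[data_type] += 1
        return True

    def reset_period(self):
        """Start a new budget period (e.g., a new day)."""
        self.spent.clear()
```

More sensitive data types would be configured with smaller budgets (and smaller ε values during privatization).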
Although differential privacy techniques are described herein, other methods of privatizing user data may be employed instead of, or in addition to, the differential privacy techniques described herein. Some embodiments may implement other privatization techniques, including secure multiparty computation and/or homomorphic encryption. Secure multiparty computation enables multiple parties to jointly compute a function of their inputs while keeping those inputs private. Homomorphic encryption is a form of encryption that allows computations to be performed on ciphertext (encrypted data) to produce an encrypted result that, when decrypted, matches the result of the same operations performed on the plaintext. In all implementations, user data to be used for crowdsourcing is sanitized prior to transmission. In addition, the user data to be transmitted may be stored locally in a privatized, encoded form.
Fig. 8A-8F are exemplary flowcharts and illustrations regarding crowdsourcing user interaction and device resource consumption data, according to an embodiment. Fig. 8A is an exemplary flow diagram illustrating crowdsourcing user interaction data according to an embodiment. Fig. 8B is an exemplary flow diagram illustrating a method of processing crowd-sourced user interaction data, according to an embodiment. Fig. 8C is an exemplary flow diagram illustrating a resource consumption based crowd-sourced data method according to an embodiment. Fig. 8D is an exemplary flow diagram illustrating a method of processing crowdsourced data based on resource consumption, according to an embodiment. Fig. 8E shows an exemplary diagram illustrating various usage and frequency information that may be derived from crowd-sourced resource consumption data, according to an embodiment. Fig. 8F illustrates a presentation of content for which crowd-sourced data regarding user interaction and resource consumption may be collected according to embodiments described herein.
Fig. 8A illustrates an exemplary method of crowdsourcing user interaction data, according to an embodiment.
In
As described above, in one embodiment, the system may monitor user actions related to a website that initiates or attempts to initiate automatic playback of media content. As referred to herein, auto-play relates to media content that is arranged to automatically begin playing (e.g., by code, script, or other logic associated with a website) without explicit input from a user (e.g., selection of a play button). In addition, automatic playback may also occur in various situations. Typically, auto-play is initiated when a user visits a website, but it should be noted that auto-play may also be initiated in other circumstances, such as when a user scrolls to a particular portion of a web page or when media content is ready to be played (e.g., buffered), etc. Further, as referred to herein, media content may include various multimedia, such as video (with or without audio) or audio, which may be in various formats. In one embodiment, the media content may also include various additional components or plug-ins, such as
The website may initiate automatic playback in response to a particular event. As described above, websites typically initiate auto-play when the website is accessed (e.g., upon submission of a URL). It should be noted, however, that in some cases it may take a certain amount of time to load media content. Thus, in some cases, a user may be able to browse a website (e.g., a text portion) before the website initiates playback of media content that is set to auto-play. In such examples, the system may monitor user interactions after (or in response to) the media content being ready (e.g., buffered) to begin playback. In
Further, the user action may include selecting to play a video with automatic play disabled (e.g., via default system settings or preferences). In one embodiment, this may include selecting to play a video, with automatic play disabled, within a predetermined amount of time of accessing the web page. It should be noted that the user interactions herein may be collected in an environment in which the system may automatically adjust the parameters of auto-play. For example, the system may implement user preferences for enabling or disabling auto-play. As another example, the system may adjust one or more parameters, such as volume, mute, size, or delay, for presentation of media content that is set to automatically play. Further non-limiting examples of setting auto-play preferences based on aggregated data can be found in commonly assigned U.S. patent application No. 62/506,685, entitled "Device, method, and graphical user interface for managing web presentation", filed May 16, 2017, which is incorporated herein by reference in its entirety.
The system may also monitor various user interactions that provide an indication of a user preference for disabling auto-play. For example, the user action may include interrupting the automatic playback of the media content within a predetermined amount of time of the media content being presented to the user. For example, interrupting automatic play may include stopping or pausing the media content, and muting or reducing the volume to a certain level (e.g., below a threshold in the form of a percentage or value). As another example, the user interaction may include closing or minimizing an application or a tab associated with the web page. Further, the user interaction may include performing a system function, such as enabling system mute or lowering the system volume. As another example, where automatic play is disabled, the user interaction may include navigating (e.g., scrolling) away from a content item set to auto-play without selecting to play the media content.
Further, the user interactions may include various interactions that may be monitored using one or more sensors of the device. For example, various sensors on the device may monitor user engagement or disengagement. For example, a user being in proximity to a device and actively interacting with the device may infer a degree of engagement. Conversely, a user looking away or away from the device may infer a degree of disengagement. Thus, the behavior of the user may be determined or inferred using one or more sensors of the device.
Further, it should be noted that other categories may be used in addition to those discussed above. For example, categories that provide a degree of preference (e.g., very strong, weak, very weak) or any other classification technique may also be used.
Once the user interaction data is collected, it may be sent to a server for analysis. As described herein, the system may ensure user privacy by implementing a differential privacy mechanism. Furthermore, to further protect the privacy of the user, only a sample of the data may be sent to the server.
Thus, in 803, the system may privatize an encoding of the entity associated with the user interaction (e.g., a web page or a website associated with the web page) and the category of the user interaction (e.g., a preference to enable auto-play or a preference to disable auto-play). The encoding may utilize any of the encoding techniques described herein, and may mask the various contributors to the data using one of various privatization techniques. In one embodiment, the encoding is differentially privatized using the techniques described herein.
In 804, the system may transmit the differentially privatized encoding to a server for estimating the frequency of categories for entities in the crowd-sourced data. As described above, the server may perform various operations (e.g., a count mean sketch) to determine the frequency estimate. The server may determine frequency estimates based on the classification scheme. For example, the server may determine a frequency estimate of users preferring to enable automatic playback for a particular web page or website, and a frequency estimate of users preferring to disable automatic playback for that web page or website.
Fig. 8B is an exemplary flow diagram illustrating a
In
In
In
In some embodiments, the system may also add or remove particular entities (e.g., web pages, websites) from the whitelist based on the determined frequency, as shown in
As described above, in another aspect of the present disclosure, the user experience may be enhanced by identifying particular websites that exhibit particular characteristics. In one implementation, websites associated with high resource consumption may be identified. For example, high resource consumption may be identified based on thresholds for particular resources (such as CPU, memory, and power usage). By identifying such sites, a developer may determine which sites may be problematic or which sites may be candidates for development work.
Fig. 8C is an exemplary flow diagram illustrating a
In
In
In
In
Fig. 8D is an exemplary flow diagram illustrating a
In 841, the system may receive a privatized encoding of an identifier of an application or website from each of a set of client devices. Each application or website is selected for transmission in response to the resource consumption threshold being exceeded. The identifier may be an application name, a website, or an application name and a website.
In 842, the system may accumulate frequency estimates from the privatized encodings received from the set of client devices. The frequency estimates may be accumulated in a sketch, such as the frequency table shown in fig. 4.
In 843, the system may estimate the frequency with which selected applications, websites, or web pages exceed a particular threshold. For example, as described above, the system may determine the frequency using a count-mean sketch operation.
In some embodiments, the system in 844 may also adjust the resource consumption threshold based on an analysis of resource consumption usage patterns. For example, the system may determine that the percentage of websites that exceed a predetermined threshold has increased over a period of time (e.g., months). Accordingly, the predetermined threshold that triggers an indication of high resource consumption may be dynamically adjusted (e.g., increased) based on continuous analysis of the crowd-sourced data.
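The dynamic adjustment in 844 might look like the following sketch; the target fraction, the step factor, and the windowing are illustrative assumptions:

```python
def adjust_threshold(threshold: float, exceed_fractions: list,
                     target: float = 0.05, step: float = 1.1) -> float:
    """If the fraction of websites flagged as exceeding the resource
    threshold has stayed above the target across the whole observation
    window (e.g., several months of crowd-sourced estimates), raise the
    threshold; otherwise leave it unchanged."""
    if exceed_fractions and all(f > target for f in exceed_fractions):
        return threshold * step
    return threshold
```

Usage: `adjust_threshold(80.0, [0.08, 0.09, 0.12])` raises an 80% CPU-usage threshold to 88%, while a window containing a month below the target leaves it at 80%.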
Fig. 8E shows an exemplary diagram illustrating various usage and frequency information that may be derived from crowd-sourced resource consumption data, according to an embodiment. For example, as shown at 852, the number of websites within a particular level of CPU usage and memory usage may be determined. Further, the number of websites exceeding a particular threshold may also be determined, as shown. As another example, the frequency 857 with which websites 855 exhibit particular characteristics may also be determined. For example, the frequency with which a particular website causes an application to crash may be monitored, and the frequency of visits to the most popular websites may be tracked. It should be noted that these examples are merely illustrative, and that myriad other metrics may be used depending on the particular application.
Fig. 8F illustrates a presentation of content for which crowd-sourced data regarding user interaction and resource consumption may be collected according to embodiments described herein. As shown, application 860 may execute on a data processing system or an electronic device described herein, such as
Where the application 860 is a web browser, the application may be used to navigate to a website hosted on a network, such as the internet. The application 860 may display a plurality of tabs 861, each of which may present the same or different content 870. Content 870 may be presented from a web page (e.g., example.html 862) hosted on a website (e.g., www.example.com 863). Web page 862 may be one of several web pages on the website, and website 863 may be one of several websites hosted on a domain (e.g., example. Web page 862, website 863, or the domain name can be included in the representation of user data that is privatized and transmitted to the crowdsourcing server. Content 870 may include various types of content items, including text content 871, 872 and media content 880. Media content 880 may present media item 882 as a content item, as well as media control 884 for starting, pausing, or stopping playback of media item 882. For example, if automatic playback is enabled for the media item 882, the user may pause playback of the media item using the media control 884. The user may also use media control 884 to initiate playback of a media item that is set to auto-play but whose auto-play is prevented due to the display settings of content 870. The application 860 may include settings that configure the display of the content 870. For example, the application 860 may include a user interface element 864 to enable a reader mode setting as described herein. The reader mode setting may be used to limit presentation of content 870 to only
In one embodiment, the privatized crowd-sourced data is collected based on detected interactions related to content items (e.g., media item 882) of the presented content 870. Based on the interaction, the presented content 870 may be associated with a category. The category may be selected from a set of categories relating to inferred preferences for presentation of the presented content. For example, application 860 may include various settings that control how content 870 is presented. The interactions may be used to place the content 870 into a category associated with an inferred setting. For example, based on the interaction, the content 870 may be associated with a category related to an inferred preference to allow automatic playback of the media item 882. The content 870 may also be associated with a category related to an inferred preference to prevent automatic playback of the media item 882. In one embodiment, the content may be associated with an inferred preference to enter a reader mode. In one embodiment, content may be associated with an inferred preference to prevent display of certain content items. A privatized encoding may be created on the client device that includes a representation of the presented content 870 and a representation of a category with which the presented content is associated. The privatized encodings can then be transmitted to a server, which can accumulate the privatized encodings from multiple devices to estimate the frequency with which the presented content is associated with a particular category. Privatized data regarding user interaction with content can be used to crowd-source various display or presentation settings associated with the content. The categories may be determined relative to user preferences for zoom settings, brightness settings, or any other application or device settings.
In one embodiment, a set of desired layouts for user interface elements or content items may be determined based on crowd-sourced data regarding user interactions with the user interface elements and content items.
It should be noted that the data sampling described herein is an example, and thus it is contemplated that any type of data may be sampled (e.g., collected) in a differentially private manner to determine various frequencies or statistics. For example, the above-described methods may be equally applicable to a variety of user information or user interactions that may occur with various components of a system, application, or service.
Thus, as described above, the mechanisms of the present disclosure leverage the potential of crowd-sourced data while maintaining user privacy (e.g., via a local differential privacy mechanism) to potentially gain valuable insight for development work.
Exemplary application Programming interface diagrams
Embodiments described herein include one or more Application Programming Interfaces (APIs) in an environment in which calling program code interacts with other program code that is called through the one or more APIs. Various function calls, messages, or other types of invocations may further include various parameters, and these calls may be transmitted via an API between the calling program and the called code. In addition, the API may provide the calling program code with the ability to use data types or classes defined in the API and implemented in the called program code.
The API allows a developer of the API-calling component (which may be a third party developer) to take advantage of the specified features provided by the API-implementing component. There may be one API-calling component or there may be more than one such component. An API may be a source code interface provided by a computer system or library to support service requests from applications. An Operating System (OS) may have multiple APIs to allow applications running on the OS to call one or more of those APIs, and a service (e.g., a library) may have multiple APIs to allow applications using the service to call one or more of those APIs. The API may be specified in a programming language that can be interpreted or compiled when the application is built.
In some embodiments, the API-implementing component may provide more than one API, each providing a different view of, or with different aspects of access to, the functionality implemented by the API-implementing component. For example, one API of an API-implementing component may provide a first set of functions and may be exposed to third party developers, and another API of the API-implementing component may be hidden (not exposed) and provide a subset of the first set of functions, while also providing another set of functions, such as testing or debugging functions that are not in the first set of functions. In other embodiments, the API-implementing component may itself invoke one or more other components via an underlying API, and thus be both an API-calling component and an API-implementing component.
The API defines the language and parameters used by the API-calling component when accessing and using the specified features of the API-implementing component. For example, the API-calling component accesses specified features of the API-implementing component through one or more API calls or invocations (e.g., implemented by function or method calls) exposed by the API, and passes data and control information using parameters via the API calls or invocations. The API-implementing component may return a value through the API in response to an API call from the API-calling component. Although the API defines the syntax and results of the API call (e.g., how to invoke the API call and what the API call does), the API may not reveal how the API call accomplishes the function specified by the API call. The various API calls are transferred via the one or more application programming interfaces between the API-calling component and the API-implementing component. Transferring the API calls may include issuing, initiating, invoking, calling, receiving, returning, or responding to function calls or messages; in other words, transferring can describe actions of either the API-calling component or the API-implementing component. A function call or other invocation of the API may send or receive one or more parameters through a parameter list or other structure. A parameter may be a constant, a key, a data structure, an object, an object class, a variable, a data type, a pointer, an array, a list, or a pointer to a function or method, or another way of referencing data or other item to be passed via the API.
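The relationship described above — an API-implementing component exposing calls whose internals stay hidden, and an API-calling component passing parameters and receiving return values — can be sketched in Python; the class and method names here are purely illustrative:

```python
# API-implementing component: defines and implements the interface.
class FrequencyService:
    def __init__(self):
        self._counts = {}          # internal state not revealed by the API

    def record(self, key: str, amount: int = 1) -> int:
        """API call: parameters are passed in and a result is returned;
        how the count is stored internally is not part of the API."""
        self._counts[key] = self._counts.get(key, 0) + amount
        return self._counts[key]

# API-calling component: uses only the names and parameters the API defines.
service = FrequencyService()
total = service.record("example.com", amount=2)
```

The caller never touches `_counts` directly; if the implementation later switched to a sketch or a database, the API call's syntax and result would be unchanged.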
Further, a data type or class can be provided by the API and implemented by the API-implementing component. Thus, the API-calling component may declare variables, use pointers to such types or classes, and use or instantiate constant values of such types or classes, using definitions provided in the API.
Generally, an API can be used to access services or data provided by an API-implementing component or to initiate execution of operations or computations provided by the API-implementing component. By way of example, the API-implementing component and the API-calling component may each be any of an operating system, a library, a device driver, an API, an application program, or other module (it being understood that the API-implementing component and the API-calling component may be the same or different types of modules as each other). In some cases, the API-implementing component may be implemented at least in part in firmware, microcode, or other hardware logic components. In some embodiments, the API may allow a client program to use services provided by a Software Development Kit (SDK) library. In other embodiments, an application or other client program may use an API provided by an application framework. In these embodiments, the application or client may make calls to functions or methods provided by the SDK and exposed by the API, or use data types or objects defined in the SDK and provided by the API. In these embodiments, the application framework may provide a primary event loop for the program that responds to various events defined by the framework. The API allows the application to specify the events and the responses to the events using the application framework. In some implementations, the API calls can report to the application the capabilities or state of a hardware device, including capabilities or states related to aspects such as input capabilities and states, output capabilities and states, processing capabilities, power states, storage capacity and states, communication capabilities, and the like, and the API may be implemented in part by firmware, microcode, or other low-level logic components that execute in part on the hardware components.
The API-calling component may be a local component (i.e., on the same data processing system as the API-implementing component) or a remote component (i.e., on a different data processing system than the API-implementing component) that communicates with the API-implementing component through the API over a network. It should be understood that an API-implementing component may also act as an API-calling component (i.e., it may make API calls to APIs exposed by different API-implementing components), and an API-calling component may also act as an API-implementing component by implementing APIs exposed by different API-calling components.
The API may allow multiple API-calling components written in different programming languages to communicate with the API-implementing component (so that the API may include features for translating calls and returns between the API-implementing component and the API-calling component); however, the API may be implemented in a particular programming language. In one embodiment, the API-calling component may invoke APIs from different providers, such as one set of APIs from an OS provider and another set of APIs from a plug-in provider, and another set of APIs from another provider (e.g., a provider of a software library) or a creator of another set of APIs.
Fig. 9 is a block diagram illustrating an exemplary API architecture, which may be used in some embodiments described herein. The
It is to be appreciated that the API-implementing
The API-implementing
Fig. 10A-10B are block diagrams of exemplary
FIG. 10B illustrates an exemplary software stack 1010 including
Additional exemplary computing devices
Fig. 11 is a block diagram of a
The
Sensors, devices, and subsystems can be coupled to
Communication functions can be facilitated by one or more
An
The I/
In one embodiment, a
In one embodiment, the I/
In one embodiment,
Further, the
Each of the instructions and applications identified above may correspond to a set of instructions for performing one or more functions described above. The instructions need not be implemented as separate software programs, procedures or modules.
Fig. 12 is a block diagram illustrating a computing system 1200 that may be used in conjunction with one or more of the embodiments described herein. The illustrated computing system 1200 may represent any device or system (e.g.,
As shown, computing system 1200 may include a bus 1205 that may be coupled to a processor 1210, a ROM (read only memory) 1220, a RAM (or volatile memory) 1225, and a storage device (or non-volatile memory) 1230. Processor 1210 may retrieve stored instructions from one or more of memories 1220, 1225, and 1230 and execute the instructions to perform the processes, operations, or methods described herein. These memories represent non-transitory machine-readable media (or computer-readable media) or storage devices containing instructions that, when executed by a computing system (or processor), cause the computing system (or processor) to perform the operations, processes, or methods described herein. The RAM 1225 may be implemented, for example, as dynamic RAM (DRAM) or other types of memory that require constant power to refresh or maintain data within the memory. Storage 1230 may include, for example, magnetic storage, semiconductor storage, tape storage, optical storage, removable storage, non-removable storage, and other types of storage that retain data even after power is removed from the system. It should be appreciated that the storage 1230 can be at a remote location (e.g., accessible via a network) relative to the system.
A display controller 1250 may be coupled to bus 1205 to receive display data to be displayed on a display device 1255, which may display user interface features or any of the embodiments described herein, and may be a local or remote display device. Computing system 1200 may also include one or more input/output (I/O) components 1265, including a mouse, keyboard, touch screen, network interface, printer, speakers, and other devices. Typically, the input/output components 1265 are coupled to the system through an input/output controller 1260.
Module 1270 (or component, unit, function, or logic) may represent any of the functions or engines described above, such as differential privacy engine 228. Module 1270 may reside, completely or at least partially, within the memory described above, or within the processor during execution thereof by the computing system. Further, module 1270 may be implemented as software, firmware, or functional circuitry within a computing system, or a combination thereof.
In some embodiments, the hash function described herein (e.g., SHA-256) may utilize dedicated hardware circuitry (or firmware) of a system (client device or server). For example, the function may be a hardware-accelerated function. Further, in some embodiments, the system may use functions that are part of a dedicated instruction set. For example, an instruction set may be used that is an extension of the instruction set architecture for a particular type of microprocessor. Thus, in one embodiment, the system may provide a hardware acceleration mechanism for performing SHA operations. The system may use these instruction sets to increase the speed at which the functions described herein are performed.
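A minimal illustration of the hashing discussed here: in Python, `hashlib` dispatches to the platform's crypto library, which on many systems can itself use hardware-accelerated SHA instructions, so the calling code is unchanged whether or not acceleration is available. Keeping only the first 8 bytes of the digest mirrors the "portion of the hash value" idea from this disclosure; the variant-prefix scheme is an assumption for illustration:

```python
import hashlib

def bucket_index(value: str, variant: int, num_buckets: int) -> int:
    """Derive a sketch bucket index from only the first 8 bytes of the
    SHA-256 digest; prefixing the variant yields a different mapping for
    each hash variant."""
    digest = hashlib.sha256(f"{variant}:{value}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_buckets
```

Because the mapping is deterministic for a given variant, a client and the server computing `bucket_index` with the same inputs always agree on the bucket.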
Further, the hardware acceleration engine/functionality is contemplated to include any implementation in hardware, firmware, or a combination thereof, including various configurations that may include hardware/firmware integrated into the SoC as a separate processor, or included as a dedicated CPU (or core), or integrated into a co-processor on a circuit board, or included on a chip that extends the circuit board, and so forth.
Thus, while such acceleration functionality is not necessarily required to achieve differential privacy, some embodiments herein may potentially improve the overall efficiency of the implementation given the general availability of specialized support for such functionality (e.g., cryptographic functionality).
It should be noted that the terms "about" or "substantially" may be used herein and may be interpreted as "as nearly as possible," "under technical limitations," or the like. In addition, unless otherwise indicated, use of the term "or" indicates an inclusive or (e.g., and/or).
In the foregoing specification, exemplary embodiments of the present disclosure have been described. It will be evident that various modifications may be made thereto without departing from the broader spirit and scope of the disclosure. The specification and drawings are, accordingly, to be regarded in an illustrative sense rather than a restrictive sense. The specific details in the description and examples provided may be used anywhere in one or more embodiments. Various features of different embodiments or examples may be combined differently with some features included and others excluded to accommodate a variety of different applications. Examples may include subject matter, such as a method, an apparatus to perform the acts of the method, at least one machine readable medium comprising instructions that, when executed by a machine, cause the machine to perform the acts of the method, or perform the acts of an apparatus or system in accordance with the embodiments and examples described herein. In addition, various components described herein may be means for performing the operations or functions described herein.
In one aspect of the disclosure, a system (and method) is described that ensures differential privacy when transmitting data to a server that estimates the frequency of such data in a set of client devices. The differential privacy mechanism can reduce resource requirements while still providing provable guarantees about privacy and utility. For example, the mechanism may provide the ability to tailor the utility (e.g., estimation accuracy) according to resource requirements (e.g., transmission bandwidth and computational complexity). To enable reduced resource requirements (e.g., reduced encoded bit lengths), the mechanism may estimate the frequency of the data using a count-mean sketch, as described further herein.
With respect to resource requirements, the mechanism may implement a hash function that provides the ability to reduce computational requirements by using only a portion of the generated hash value. To avoid hash collisions using only a portion of the hash value, the mechanism may use a variant in hashing user data. The use of variants allows the mechanism to implement shared hashing to reduce the amount of required computation that the client and server must perform. With respect to utility, the mechanism provides frequency estimation within a predictable deviation that includes a lower bound and an upper bound.
In another aspect of the disclosure, systems and methods are described for collecting crowdsourcing data to enhance user experience using the privacy mechanisms described herein. For example, the user experience may be enhanced by inferring potential user preferences from analyzing crowd-sourced user interaction data. Development efforts may be refined or enhanced with respect to application behavior based on statistical analysis of user interactions related to various features or events. The privatization technique for privatizing crowd-sourced user data is not limited to differential privacy techniques. For example, privatization of crowdsourced data may be achieved using secure multiparty computing and/or homomorphic encryption.
In one embodiment, user interactions related to presentation of content, such as content from an online source, may be analyzed. For example, presentation settings or preferences may be defined based on crowd-sourced user interaction data.
In one embodiment, the presentation settings may include automatic playback settings for media content, enabling analysis of crowd-sourced data related to automatic playback of the media content. For example, the system may determine or infer crowd-sourced preferences for enabling or disabling automatic play of media content on a particular website. User interactions that include immediately stopping or muting the automatic play of a media item upon accessing a website may be viewed as inferring a preference to disable automatic play. Conversely, when a web page is accessed on which automatic playback of media content is disabled (e.g., by default system settings or preferences), and the user selects to play media content for which automatic playback was disabled, it may be inferred that the user prefers to enable automatic playback on such a website. Thus, collecting such user interaction data from various devices and analyzing the data on a server (e.g., via a local differential privacy mechanism) allows developers to potentially gain valuable insight into a particular website. For example, websites with a high estimated frequency of users whose interactions infer a preference to enable auto-play may be added to a "white list" (e.g., a list of websites for which auto-play functionality is allowed or enabled).
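The inference rules described above can be sketched as a simple mapping from one observed interaction to a category; the category names and the exact set of actions are illustrative assumptions:

```python
from typing import Optional

def infer_autoplay_preference(autoplay_was_enabled: bool,
                              user_action: str) -> Optional[str]:
    """Map an observed interaction with a media item to an inferred
    preference category, following the heuristics described above."""
    # Auto-play was on, and the user immediately shut it down.
    if autoplay_was_enabled and user_action in ("stop", "mute", "pause"):
        return "prefers-autoplay-disabled"
    # Auto-play was off, and the user manually started playback.
    if not autoplay_was_enabled and user_action == "play":
        return "prefers-autoplay-enabled"
    return None  # interaction is not informative about the preference
```

The resulting (website, category) pair is what would then be encoded and privatized before transmission, as described in the preceding sections.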
In addition, other settings related to the presentation of the content may also be analyzed. Additional presentation settings, such as content display settings (e.g., reader mode settings) or content blocking settings, for example, may also be analyzed using the differential privacy mechanism described herein.
In another aspect, the user experience may also be enhanced by identifying particular websites that exhibit particular characteristics. In one implementation, websites associated with high resource consumption may be identified. For example, high resource consumption may be identified based on thresholds for resource usage, such as CPU, memory, and power. By identifying such websites, developers can determine which websites may be problematic or potential candidates for analysis to identify the cause of high resource consumption.
In the above description, privacy techniques have been described. It will, however, be evident that various modifications and changes may be made thereto without departing from the broader spirit and scope of the invention. The specification and drawings are, accordingly, to be regarded in an illustrative rather than a restrictive sense. The specific details in the description and examples provided may be used anywhere in one or more embodiments. Various features of different embodiments or examples may be combined differently with some features included and others excluded to accommodate a variety of different applications. Examples may include subject matter, such as a method, an apparatus to perform the acts of the method, at least one machine readable medium comprising instructions that, when executed by a machine, cause the machine to perform the acts of the method, or perform the acts of an apparatus or system in accordance with the embodiments and examples described herein. Further, various components described herein may be means for performing operations or functions described in accordance with the implementations. Therefore, the true scope of the embodiments will be apparent to the skilled practitioner upon a study of the drawings, specification, and following claims.