Privacy-preserving user portrait generation method, device, and storage medium

Document No.: 1156239; Publication date: 2020-09-15

Reading notes: this technology, "Privacy-preserving user portrait generation method, device, and storage medium", was designed by Xu Jie (徐杰) on 2020-05-27. Main content: The invention relates to artificial intelligence and provides a privacy-preserving user portrait generation method, device, and storage medium. Combined feature user data of a target user is obtained through a vertical federated learning model; the combined feature user data is grouped into classification data according to set dimensions, and a data file is generated from the classification data; a big data platform performs data mining on the data file to obtain individual analysis parameters and global parameters of the target user, and the user portrait of the target user is described according to the individual analysis parameters and the global parameters. The invention also relates to blockchain technology: the combined feature user data may be stored in a blockchain. By screening the providers of the data sets, the invention increases the efficiency of federated learning and improves the fit of the user portrait data.

1. A privacy-preserving user portrait generation method, applied to an electronic device, characterized by comprising the following steps:

S110, obtaining combined feature user data of a target user through a vertical federated learning model;

S120, forming classification data from the obtained combined feature user data according to set dimensions, and generating a data file from the classification data;

S130, performing data mining on the data file by using a big data platform to obtain individual analysis parameters and global parameters of the target user, and describing the user portrait of the target user according to the individual analysis parameters and the global parameters.

2. The privacy-preserving user portrait generation method of claim 1, wherein the combined feature user data is stored in a blockchain, and obtaining the combined feature user data of the target user through the vertical federated learning model in step S110 comprises:

S210, using a pre-selected third party to screen first-party sample data of the target user and second-party sample data matched with the first-party sample data, and performing feature combination on the first-party sample data and the second-party sample data;

S220, combining the first-party sample data, the second-party sample data, and the third party into a federation;

S230, performing federated learning training on the federation to obtain the combined feature user data of the target user.

3. The privacy-preserving user portrait generation method of claim 2, wherein the third party screening the second-party sample data matched with the first-party sample data in step S210 comprises:

S310, the third party applying standard labels to the feature combination by means of a feature category dictionary, and obtaining category saturation information of the first-party sample data and of candidate second-party sample data;

S320, the third party encrypting the first-party sample data and the candidate second-party sample data and comparing the encrypted data to obtain a feature saturation and a customer overlap ratio;

S330, the third party screening out the candidate second-party sample data whose feature saturation is greater than a feature saturation threshold X1 and whose customer overlap ratio is greater than a customer overlap ratio threshold X2, and taking it as the second-party sample data matched with the first-party sample data.

4. The privacy-preserving user portrait generation method of claim 2, wherein performing federated learning training on the federation in step S230 to obtain the combined feature user data of the target user comprises:

obtaining an AUC value of the federation, and comparing the AUC value with a preset threshold Y;

if the AUC value is smaller than the preset threshold Y, repeating the steps of forming a federation and performing federated learning;

if the AUC value is greater than the threshold Y (the federated model evaluation threshold), taking the obtained data as the combined feature user data.

5. The privacy-preserving user portrait generation method of claim 4, wherein the AUC value of the federation is obtained by the ROC AUCH method.

6. A user portrait generation system, comprising a combined feature user data generation unit and a user portrait generation unit, wherein:

the combined feature user data generation unit is used for obtaining combined feature user data of a target user through a vertical federated learning model;

the user portrait generation unit is used for forming classification data from the obtained combined feature user data according to set dimensions, generating a data file from the classification data, performing data mining on the data file by using a big data platform to obtain individual analysis parameters and global parameters of the target user, and describing the user portrait of the target user according to the individual analysis parameters and the global parameters.

7. The user portrait generation system of claim 6, wherein:

the combined feature user data is stored in a blockchain;

the combined feature user data generation unit comprises a feature combination generation module, a federation generation module, and a combined feature user data generation module, wherein:

the feature combination generation module is used for using a pre-selected third party to screen first-party sample data of the target user and second-party sample data matched with the first-party sample data, and performing feature combination on the first-party sample data and the second-party sample data;

the federation generation module is used for combining the first-party sample data of the target user, the second-party sample data, and the third party into a federation;

the combined feature user data generation module is used for performing federated learning training on the federation to obtain the combined feature user data of the target user.

8. An electronic device, comprising a memory and a processor, the memory storing a user portrait generation program which, when executed by the processor, implements the following steps:

S110, obtaining combined feature user data of a target user through a vertical federated learning model;

S120, forming classification data from the obtained combined feature user data according to set dimensions, and generating a data file from the classification data;

S130, performing data mining on the data file by using a big data platform to obtain individual analysis parameters and global parameters of the target user, and describing the user portrait of the target user according to the individual analysis parameters and the global parameters.

9. The electronic device according to claim 8, wherein obtaining the combined feature user data through the vertical federated learning model in step S110 comprises:

S210, using a pre-selected third party to screen first-party sample data of the target user and second-party sample data matched with the first-party sample data, and performing feature combination on the first-party sample data and the second-party sample data;

S220, combining the first-party sample data of the target user, the second-party sample data, and the third party into a federation;

S230, performing federated learning training on the federation to obtain the combined feature user data of the target user.

10. A computer-readable storage medium, comprising a data storage area and a program storage area, the data storage area storing data created through the use of blockchain nodes and the program storage area storing a computer program, the computer program comprising a user portrait generation program which, when executed by a processor, implements the steps of the privacy-preserving user portrait generation method of any one of claims 1 to 5.

Technical Field

The present invention relates to artificial intelligence, and more particularly, to a method, system, apparatus, and storage medium for generating a user portrait based on privacy protection.

Background

A user portrait is an outline of a user's characteristics that an enterprise derives, through machine learning or deep learning models, from data such as its business systems, event systems, and relationship information. However, because the types of information available are limited, the data covers only the enterprise's main business characteristics, and the resulting user portrait model is neither comprehensive nor accurate enough.

To build a more accurate user portrait model, enterprises tend to increase the dimensionality of their data by exchanging data with other enterprises. However, in the wake of the uproar over the Facebook data leak, countries have successively enacted data privacy protection regulations, such as the GDPR (General Data Protection Regulation) in the European Union and the related network security regulations promulgated in China. Going forward, user privacy protection is a factor that enterprises must take into account when building user portraits.

Disclosure of Invention

The invention provides a privacy-preserving user portrait generation method, system, electronic device, and computer-readable storage medium. It mainly exploits the property of federated learning that the dimensionality of data information can be increased without exchanging data, and adds a screening process for the user data used in federated learning, thereby addressing the insufficient accuracy and comprehensiveness of user portraits.

To achieve the above object, the present invention provides a privacy-preserving user portrait generation method, applied to an electronic device, the method comprising:

S110, obtaining combined feature user data of a target user through a vertical federated learning model;

S120, forming classification data from the obtained combined feature user data according to set dimensions, and generating a data file from the classification data;

S130, performing data mining on the data file by using a big data platform to obtain individual analysis parameters and global parameters of the target user, and describing the user portrait of the target user according to the individual analysis parameters and the global parameters.

Further, preferably, the combined feature user data is stored in a blockchain, and obtaining the combined feature user data of the target user through the vertical federated learning model in step S110 comprises:

S210, using a pre-selected third party to screen first-party sample data of the target user and second-party sample data matched with the first-party sample data, and performing feature combination on the first-party sample data and the second-party sample data;

S220, combining the first-party sample data, the second-party sample data, and the third party into a federation;

S230, performing federated learning training on the federation to obtain the combined feature user data of the target user.

Further, preferably, the third party screening the second-party sample data matched with the first-party sample data in step S210 comprises:

S310, the third party applying standard labels to the feature combination by means of a feature category dictionary, and obtaining category saturation information of the first-party sample data and of candidate second-party sample data;

S320, the third party encrypting the first-party sample data and the candidate second-party sample data and comparing the encrypted data to obtain a feature saturation and a customer overlap ratio;

S330, the third party screening out the candidate second-party sample data whose feature saturation is greater than a feature saturation threshold X1 and whose customer overlap ratio is greater than a customer overlap ratio threshold X2, and taking it as the second-party sample data matched with the first-party sample data.

Further, preferably, performing federated learning training on the federation in step S230 to obtain the combined feature user data comprises:

obtaining an AUC value of the federation, and comparing the AUC value with a preset threshold Y;

if the AUC value is smaller than the preset threshold Y, repeating the steps of forming a federation and performing federated learning;

if the AUC value is greater than the threshold Y (the federated model evaluation threshold), taking the obtained data as the combined feature user data.

Further, preferably, the AUC value of the federation is obtained by the ROC AUCH method.

To achieve the above object, the present invention further provides a user portrait generation system, comprising a combined feature user data generation unit and a user portrait generation unit, wherein:

the combined feature user data generation unit is used for obtaining combined feature user data of a target user through a vertical federated learning model;

the user portrait generation unit is used for forming classification data from the obtained combined feature user data according to set dimensions, generating a data file from the classification data, performing data mining on the data file by using a big data platform to obtain individual analysis parameters and global parameters of the target user, and describing the user portrait of the target user according to the individual analysis parameters and the global parameters.

Further, preferably, the combined feature user data is stored in a blockchain, and the combined feature user data generation unit comprises a feature combination generation module, a federation generation module, and a combined feature user data generation module, wherein:

the feature combination generation module is used for using a pre-selected third party to screen first-party sample data of the target user and second-party sample data matched with the first-party sample data, and performing feature combination on the first-party sample data and the second-party sample data;

the federation generation module is used for combining the first-party sample data of the target user, the second-party sample data, and the third party into a federation;

the combined feature user data generation module is used for performing federated learning training on the federation to obtain the combined feature user data of the target user.

To achieve the above object, the present invention also provides an electronic device, comprising a memory and a processor, the memory storing a user portrait generation program which, when executed by the processor, implements the following steps:

S110, obtaining combined feature user data of a target user through a vertical federated learning model;

S120, forming classification data from the obtained combined feature user data according to set dimensions, and generating a data file from the classification data;

S130, performing data mining on the data file by using a big data platform to obtain individual analysis parameters and global parameters of the target user, and describing the user portrait of the target user according to the individual analysis parameters and the global parameters.

Further, preferably, obtaining the combined feature user data through the vertical federated learning model in step S110 comprises:

S210, using a pre-selected third party to screen first-party sample data of the target user and second-party sample data matched with the first-party sample data, and performing feature combination on the first-party sample data and the second-party sample data;

S220, combining the first-party sample data of the target user, the second-party sample data, and the third party into a federation;

S230, performing federated learning training on the federation to obtain the combined feature user data of the target user.

In addition, to achieve the above object, the present invention provides a computer-readable storage medium comprising a data storage area and a program storage area, the data storage area storing data created through the use of blockchain nodes and the program storage area storing a computer program, the computer program comprising a user portrait generation program which, when executed by a processor, implements the steps of the privacy-preserving user portrait generation method described above.

According to the privacy-preserving user portrait generation method, system, electronic device, and computer-readable storage medium described above, adding a screening process for user portrait data improves how well the second party's data set matches the first party's data set in federated learning; after federated learning, the first party, the second party, and the third party obtain combined feature user data that describes the user portrait from more angles and more comprehensively; and this combined feature user data ultimately yields a more accurate user portrait. The beneficial effects are as follows:

1) the vertical federated learning method enriches the data sources of the user portrait and improves its accuracy;

2) screening the data set providers increases the efficiency of federated learning and improves the fit of the user portrait data;

3) federated learning increases the dimensionality of the data information while ensuring that data is not exchanged and that personal data privacy is protected.

Drawings

FIG. 1 is a flow chart of a preferred embodiment of the user portrait generation method of the present invention;

FIG. 2 is a flow chart of a preferred embodiment of the present invention for obtaining combined feature data;

FIG. 3 is a flow chart of a method for screening second-party sample data that matches the first-party sample data according to a preferred embodiment of the present invention;

FIG. 4 is a schematic diagram of the vertical federated learning model of the present invention;

FIG. 5 is a schematic flow chart of the federated learning training of the present invention for obtaining combined feature data;

FIG. 6 is a schematic diagram of a user portrait generation system according to a preferred embodiment of the present invention;

FIG. 7 is a schematic structural diagram of an electronic device according to a preferred embodiment of the present invention.

The implementation, functional features, and advantages of the present invention are further explained with reference to the accompanying drawings.

Detailed Description

It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.

When a user portrait is built, the data has a single information dimension and covers only the main business characteristics of a single enterprise, so the generated user portrait is neither comprehensive nor accurate enough. Increasing the dimensionality of the data, however, runs into limits on data sharing imposed by national data protection rules and by each enterprise's strict data governance.

The privacy-preserving user portrait generation method adds a screening process for user portrait data, which improves how well the second party's data set matches the first party's data set in federated learning. After the first party, the second party, and the third party carry out federated learning, combined feature user data that describes the user portrait from more angles and more comprehensively is obtained, and this data is finally used to obtain a more accurate user portrait.

To improve the comprehensiveness and accuracy of user portraits, the invention provides a privacy-preserving user portrait generation method. FIG. 1 shows a flow chart of a preferred embodiment of the user portrait generation method according to the present invention. Referring to FIG. 1, the method may be performed by an apparatus, which may be implemented in software and/or hardware.

It should be noted that more dimensions of user portrait information are not automatically better; what matters is finding strongly correlated information, such as information strongly correlated with a corruption-related scenario or with the target customers of a product. Only strongly correlated information helps an enterprise connect effectively with its business needs, that is, position customers accurately and understand their potential needs, so that the required products can be developed and business value created.

Specifically, the federated-learning-based user portrait generation method includes steps S110 to S130.

S110, obtaining combined feature user data of the target user through a vertical federated learning model.

External data is introduced by relying on the data isolation property of federated learning, which improves the completeness and effectiveness of the data set.

It should be noted that federated learning refers to a method of performing machine learning jointly across different participants (parties, also called data owners) or clients. In federated learning, a participant does not need to expose its own data to other participants, to a coordinator, or to an aggregation server, so federated learning protects user privacy well and safeguards data security. Vertical federated learning applies when the users of two data sets overlap heavily while the user features overlap little: the data sets are split vertically (that is, along the feature dimension), and the portion of the data in which the users are the same but the user features are not entirely the same is taken out for training. In other words, vertical federated learning suits the scenario in which the users (U1, U2, …) of the two data sets overlap heavily and the user features (X1, X2, …) overlap little, and it aggregates these different features in an encrypted state to strengthen the model.
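As a minimal illustration of the "many shared users, few shared features" condition, the following Python sketch computes the two overlap ratios for a pair of toy data sets. The user IDs, feature names, and the choice of the Jaccard ratio as the overlap measure are assumptions made for this example only; the source does not define how overlap is measured.

```python
# Hypothetical toy data: each party holds different features for partly shared users.
party_a = {
    "u1": {"age": 34, "gender": "F"},
    "u2": {"age": 51, "gender": "M"},
    "u3": {"age": 27, "gender": "F"},
}
party_b = {
    "u2": {"monthly_spend": 820.0, "ticket_purchases": 6},
    "u3": {"monthly_spend": 310.0, "ticket_purchases": 1},
    "u4": {"monthly_spend": 95.0, "ticket_purchases": 0},
}


def overlap_ratio(x, y):
    """Jaccard overlap |x ∩ y| / |x ∪ y| (an assumed measure, not from the source)."""
    x, y = set(x), set(y)
    union = x | y
    return len(x & y) / len(union) if union else 0.0


def feature_names(party):
    """Collect every feature name that appears in a party's records."""
    return {name for record in party.values() for name in record}


user_overlap = overlap_ratio(party_a, party_b)
feature_overlap = overlap_ratio(feature_names(party_a), feature_names(party_b))
# A high user_overlap combined with a low feature_overlap is the setting
# in which the vertical (feature-wise) split described above applies.
```

In this toy case the parties share two of four distinct users and no features, which is the shape of data the vertical setting targets.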

Referring to FIG. 4, enterprise A (the master enterprise) divides its features into the index sets $\alpha_A$ and $\beta_A$ according to the saturation of its own customer features. $\alpha_A$ denotes the feature set with high saturation; $\beta_A$ denotes the feature set with low saturation, that is, the set whose saturation needs to be improved by supplementation. Features with low saturation and high dimensionality are generally selected as $\beta_A$ so that they can be obtained by machine learning.

Index set: [formula image BDA0002511326450000071, not reproduced in this text]

The missing values of $\beta_A$, used as the label, can be predicted and completed through federated learning (here, supervised machine learning) from part of enterprise A's indexes $\alpha_A$ together with part of enterprise B's indexes $\alpha_B$, where $\alpha_B$ is the index set owned by enterprise B.
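The completion idea can be sketched as ordinary supervised learning: customers whose low-saturation value is known serve as labeled samples, and a model trained on features from both enterprises predicts the value for the rest. The sketch below is a plaintext, single-machine stand-in written for illustration; it is not federated, and in the setting described here the two feature columns would stay with their owners and only encrypted intermediate results would be exchanged. The feature meanings, the data, and the hyperparameters are all hypothetical.

```python
import math


def train_logreg(rows, labels, lr=0.5, epochs=2000):
    """Batch gradient descent for logistic regression (plain Python, no libraries)."""
    n_features = len(rows[0])
    w = [0.0] * n_features
    b = 0.0
    m = len(rows)
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(rows, labels):
            z = b + sum(wi * xi for wi, xi in zip(w, x))
            err = 1.0 / (1.0 + math.exp(-z)) - y  # predicted probability minus label
            for j, xj in enumerate(x):
                grad_w[j] += err * xj
            grad_b += err
        w = [wi - lr * g / m for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / m
    return w, b


def predict(w, b, x):
    """Probability that the missing low-saturation value is 1."""
    z = b + sum(wi * xi for wi, xi in zip(w, x))
    return 1.0 / (1.0 + math.exp(-z))


# Each row: [a feature held by enterprise A, a feature held by enterprise B];
# both columns and all values are hypothetical.
labeled_rows = [[0, 0.1], [1, 0.2], [0, 0.9], [1, 0.8]]
known_labels = [0, 0, 1, 1]  # known values of the low-saturation feature

w, b = train_logreg(labeled_rows, known_labels)
filled_high = predict(w, b, [0, 0.95])  # customer whose value is missing
filled_low = predict(w, b, [1, 0.05])   # another customer whose value is missing
```

Thresholding the predicted probability (for example at 0.5) then yields the completed value for customers whose entry was missing.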

The following description takes as an example an insurance company within a group, working from its customers' purchases of travel insurance. The insurance company generates part of the sample label data for "travel enthusiast or not" according to whether customers have purchased travel insurance. Its own data features alone (age, gender, car ownership, indicators related to labor insurance, and other weakly correlated features) are not sufficient to train a model that generates the label. With this method, the insurance company, while protecting user privacy, combines the data features of the bank company (number of paid train/air ticket purchases, arrears records, credit default records, and average annual/monthly spending), selects logistic regression, and trains a joint model, thereby obtaining a more comprehensive and accurate "travel enthusiast or not" label.

It is emphasized that the combined feature user data may also be stored in a node of a blockchain in order to further ensure privacy and security of the combined feature user data.

FIG. 2 shows the flow of a preferred embodiment of the present invention for obtaining combined feature data. As shown in FIG. 2, obtaining the combined feature user data of the target user through the vertical federated learning model in step S110 includes steps S210 to S230.

S210, using a pre-selected third party to screen first-party sample data of the target user and second-party sample data matched with the first-party sample data, and performing feature combination on the first-party sample data and the second-party sample data.

It should be noted that the participating enterprises A and B that make up the federated learning must have a high overlap of customer IDs (that is, the user portions of their data sets must overlap heavily). If enterprise A is the master enterprise (that is, after federated training enterprise A obtains a certain label_1 for all of its own customers), enterprise A must already hold label_1 for some of its customers, and these samples serve as the supervised samples for federated learning. Enterprise B is the second company, that is, the company that trains together with the master company, contributing its second-party sample data to training alongside the first-party sample data.

In a specific implementation, the first-party sample data (the first company) is selected under the following conditions. First, the first company must have some customers together with the related data label_1 for those customers, and label_1 must be a high-dimensional feature (such as personal income level). Second, the first company must hold a partially highly saturated set of basic features α that can be combined with features from a second company to train a model that generates label_1, while the basic feature set α alone is not sufficient to generate label_1. After the first company A is selected and identified, it prepares its sample feature data and label data, such as feature a1, feature a2, ..., feature an, and the label.

The second-party sample data (the second company) is selected under the following conditions. First, its customer IDs overlap heavily with those of the first company. Second, its business scope differs from that of the first company, or its customer feature categories differ from those of the first company, so that the data of the two can complement each other. As a result, federated training on the first-party sample data and a plurality of second-party sample data yields comprehensive, multi-angle user portrait labels. The second company B prepares its sample feature data, such as feature b1, feature b2, ..., feature bm.

FIG. 3 shows the flow of the method for screening the second-party sample data matched with the first-party sample data according to a preferred embodiment of the present invention. As shown in FIG. 3, the screening performed by the third party in step S210 includes steps S310 to S330.

S310, the third party applies standard labels to the feature combination by means of a feature category dictionary, and obtains category saturation information of the first-party sample data and of the candidate second-party sample data.

Specifically, the third company (the third party) applies standard category labels to the features of the first company (the first-party sample data) and of the candidate second companies (the candidate second-party sample data) according to a feature category dictionary (for example, name and education belong to the basic information category; car ownership and home ownership belong to the property category; consumption records and air ticket information belong to the consumption category). The first company and the candidate second companies each provide their category variable saturation information to the third company. The first company also provides the feature saturation threshold X1 and the customer overlap ratio threshold X2 to the third company.
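The category labeling and per-category saturation can be sketched as follows. The dictionary entries mirror the examples in the text, but the field names, the sample records, and in particular the definition of saturation as the share of non-missing values are assumptions for illustration; the source does not define how saturation is computed.

```python
# Hypothetical feature category dictionary (feature name -> standard category).
FEATURE_CATEGORIES = {
    "name": "basic_info",
    "education": "basic_info",
    "has_car": "property",
    "has_house": "property",
    "consumption_record": "consumption",
    "air_ticket_info": "consumption",
}


def category_saturation(records, dictionary):
    """Per category, the fraction of non-missing values (assumed definition of saturation)."""
    filled, total = {}, {}
    for record in records:
        for feature, value in record.items():
            category = dictionary.get(feature)
            if category is None:
                continue  # feature not covered by the dictionary
            total[category] = total.get(category, 0) + 1
            if value is not None:
                filled[category] = filled.get(category, 0) + 1
    return {cat: filled.get(cat, 0) / count for cat, count in total.items()}


# Hypothetical records of one party; None marks a missing value.
records = [
    {"has_car": True, "has_house": None, "education": "BSc"},
    {"has_car": False, "has_house": True, "education": None},
]
saturation = category_saturation(records, FEATURE_CATEGORIES)
```

Each party could compute such a summary locally and report only the per-category figures to the third company, which matches the text's statement that the companies provide saturation information rather than raw data.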

S320, the third party encrypts the first-party sample data and the candidate second-party sample data and compares the encrypted data to obtain the feature saturation and the customer overlap ratio. That is, the third company encrypts the customer IDs of the first company and of each candidate second company and compares them to obtain the overlap ratio.

S330, the third party screens out the candidate second-party sample data whose feature saturation is greater than the feature saturation threshold X1 and whose customer overlap ratio is greater than the customer overlap ratio threshold X2, and takes it as the second-party sample data matched with the first-party sample data.

That is, the third company selects as the second company a candidate second company that has feature categories different from those of the first company, whose feature saturation in those different categories is greater than X1, and whose customer ID overlap ratio is greater than X2.
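The selection rule in S330 amounts to a filter over the candidates. A minimal sketch follows, assuming each candidate has already been summarized by a feature saturation, a customer overlap ratio, and a set of feature categories; the class, the field names, and the sample values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """Summary of one candidate second company (hypothetical structure)."""
    name: str
    feature_saturation: float  # saturation of the categories the first company lacks
    customer_overlap: float    # overlap ratio of encrypted customer IDs
    categories: frozenset      # standard feature categories the candidate holds


def screen_candidates(candidates, first_party_categories, x1, x2):
    """Keep candidates with new categories, saturation > X1, and overlap > X2."""
    return [
        c for c in candidates
        if c.categories - first_party_categories  # offers at least one different category
        and c.feature_saturation > x1
        and c.customer_overlap > x2
    ]


first_categories = frozenset({"basic_info", "property"})
candidates = [
    Candidate("bank", 0.92, 0.81, frozenset({"consumption"})),
    Candidate("retailer", 0.55, 0.90, frozenset({"consumption"})),       # saturation too low
    Candidate("broker", 0.95, 0.88, frozenset({"basic_info"})),          # no new category
    Candidate("airline", 0.97, 0.30, frozenset({"consumption"})),        # overlap too low
]
selected = screen_candidates(candidates, first_categories, x1=0.8, x2=0.6)
```

Only the candidate that clears both thresholds and contributes a category the first company lacks survives the filter.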

It should be noted that the differences between the two parties' modeling sample IDs must not be revealed to each other. At the start of the collaboration the users must be matched and the intersection of the two user sets found, but the non-overlapping part must not be disclosed, because it is among an enterprise's most important assets. Likewise, no underlying (X, Y) data may be leaked to the other party during modeling. An RSA and hash mechanism ensures that the two parties end up using only the intersection and that the difference is not leaked to the other party. With homomorphic encryption, neither the raw data of any party nor an encrypted form of that raw data is transmitted during the process. In the interaction, the two parties exchange the intermediate results of the loss computation under a homomorphic encryption mechanism. After training, each party obtains its own model and deploys it on its own side; neither party's model can be applied on its own, and a decision can be made only when the models are applied together.

This is implemented through the RSA and hash mechanism. Party B acts as the generator of the key pair and sends the public key to Party A. Party A hashes its IDs, introduces a random number, and sends the result to Party B; Party B likewise hashes its own IDs and sends the result to Party A; Party A finally computes the intersection of the results. No plaintext data is transmitted at any point in the process, and the original IDs cannot be recovered even by brute-force or collision attacks. This mechanism protects the difference sets of both parties.
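One common way to realize such an exchange is private set intersection based on RSA blind signatures. The source does not spell out its exact message flow, so the sketch below is an assumption about the intended scheme rather than a transcription of it. The key is built from two small Mersenne primes purely so the example runs quickly and is far too small for real use; all function names are hypothetical.

```python
import hashlib
import math
import secrets


def make_toy_key():
    """Party B: toy RSA key pair (NOT secure, illustration only)."""
    p, q = 2**31 - 1, 2**61 - 1          # two known Mersenne primes
    n, phi = p * q, (p - 1) * (q - 1)
    e = 65537
    return (n, e), pow(e, -1, phi)       # public key, private exponent


def h_int(uid, n):
    """Hash an ID to an integer modulo n."""
    return int.from_bytes(hashlib.sha256(uid.encode()).digest(), "big") % n


def h_tag(x):
    """Second hash, applied to signatures before they are compared."""
    return hashlib.sha256(str(x).encode()).hexdigest()


def blind(ids, pub):
    """Party A: hash each ID and blind it with a fresh random factor r."""
    n, e = pub
    blinded, factors = [], []
    for uid in ids:
        r = secrets.randbelow(n - 2) + 2
        while math.gcd(r, n) != 1:
            r = secrets.randbelow(n - 2) + 2
        factors.append(r)
        blinded.append(h_int(uid, n) * pow(r, e, n) % n)
    return blinded, factors


def sign_blinded(blinded, pub, d):
    """Party B: sign A's blinded values without learning the underlying IDs."""
    n, _ = pub
    return [pow(x, d, n) for x in blinded]


def tags_for_own_ids(ids, pub, d):
    """Party B: sign its own hashed IDs and hash the signatures again."""
    n, _ = pub
    return {h_tag(pow(h_int(uid, n), d, n)) for uid in ids}


def intersect(ids, signed, factors, pub, b_tags):
    """Party A: remove the blinding factor and compare tags with B's set."""
    n, _ = pub
    return [uid for uid, s, r in zip(ids, signed, factors)
            if h_tag(s * pow(r, -1, n) % n) in b_tags]


pub, d = make_toy_key()
a_ids, b_ids = ["u1", "u2", "u3"], ["u2", "u3", "u4"]
blinded, factors = blind(a_ids, pub)
common = intersect(a_ids, sign_blinded(blinded, pub, d), factors, pub,
                   tags_for_own_ids(b_ids, pub, d))
```

In this flow Party B sees only blinded values, and Party A sees only hashed signatures of Party B's IDs, so Party A can recognize the shared IDs without either side receiving the other's full ID list in plaintext.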

Homomorphic encryption allows mathematical operations to be performed directly on ciphertexts. For example, if two numbers are encrypted, an operation such as addition can be applied to the two ciphertexts; the result is still a ciphertext, and decrypting it yields the same value as adding the two plaintexts.
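The additive property described above can be illustrated with a toy Paillier cryptosystem. Paillier is chosen here as an assumption, since the text does not name a specific scheme, and the primes used in the usage note are far too small for real use; the function names are hypothetical.

```python
import math
import secrets


def keygen(p, q):
    """Return the public modulus n and the private pair (lambda, mu)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # valid because the generator is fixed to g = n + 1
    return n, (lam, mu)


def encrypt(n, m):
    """Encrypt integer m (0 <= m < n) with fresh randomness r."""
    n2 = n * n
    r = secrets.randbelow(n - 1) + 1
    while math.gcd(r, n) != 1:
        r = secrets.randbelow(n - 1) + 1
    # With g = n + 1, g^m mod n^2 simplifies to 1 + m*n.
    return (1 + m * n) % n2 * pow(r, n, n2) % n2


def decrypt(n, priv, c):
    """Recover the plaintext from ciphertext c."""
    lam, mu = priv
    x = pow(c, lam, n * n)
    return (x - 1) // n * mu % n


def add_encrypted(n, c1, c2):
    """Multiplying ciphertexts corresponds to adding the plaintexts."""
    return c1 * c2 % (n * n)
```

For example, with `n, priv = keygen(2**61 - 1, 2**89 - 1)`, decrypting `add_encrypted(n, encrypt(n, 15), encrypt(n, 27))` recovers the sum of the two plaintexts without either plaintext being exposed to the party that performs the addition.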

In general, the first company can obtain a comprehensive, multi-faceted user portrait with the help of the data sources of one or more second companies, which lays a foundation for the enterprise's later in-depth study of its users.

S220, combining the first party data sample, the second party data sample and a third party into a federation;

Specifically, after the first party data sample and the second party data sample are selected by the above method, the first company A selects a second company B to form a federation of "first company A + second company B + third company C".

The third party C distributes the public key to the first company A and the second company B to encrypt the user IDs and the data that must be exchanged during training. The samples are then aligned by encrypted ID; that is, the features a and b required for a label are aligned by customer ID.

That is, each subsidiary builds its own model (A or B) locally using its own features and label data, and the intermediate results obtained during training are encrypted before being exchanged.

S230, carrying out federated learning training on the federation to obtain combined feature user data of the target user.

FIG. 5 is a flow chart of the federated learning training of the present invention for obtaining combined feature data.

As shown in FIG. 5, after the first company A selects a second company B to form a federation, the federation is trained through federated learning (different algorithms, such as logistic regression, random forest or neural networks, can be selected for different scenarios). If AUC < Y, the processes of forming the federation and carrying out federated learning are repeated until AUC > Y, at which point the first company obtains label_1, the label it needs to predict for its customers.
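The repeat-until-threshold flow of FIG. 5 can be sketched as a simple loop. This is a minimal sketch under stated assumptions: `form_federation` and the `train_and_score` callable are hypothetical names, the callable stands in for the whole federated training step and is assumed to return a model together with its AUC, and candidates are simply tried in the order given.

```python
def form_federation(first_company, candidates, train_and_score, y_threshold):
    """Try candidate second companies until one federation reaches AUC > Y.

    train_and_score(first, second) is a caller-supplied stand-in for
    federated training; it must return a (model, auc) pair.
    """
    for second_company in candidates:
        model, auc = train_and_score(first_company, second_company)
        if auc > y_threshold:
            # Threshold met: this federation is kept.
            return second_company, model, auc
    # No candidate produced a model that satisfies the threshold.
    return None
```

Returning `None` when every candidate falls short is a design choice of the sketch; the text itself only states that the process is repeated until AUC > Y.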

Y is a preset threshold provided by the first company and serves as the criterion for judging the result of the federated model. AUC (Area Under the Curve) is defined as the area under the ROC curve, so its value cannot exceed 1. Since the ROC curve generally lies above the line y = x, the AUC generally ranges between 0.5 and 1; accordingly, the preset threshold Y lies in [0.5, 1]. There are two ways to calculate AUC, the trapezoidal method and the ROC AUCH method, both of which are approximate methods. Both are well known to those skilled in the art and will not be described herein.

Through this training, the first company and a plurality of second companies finally generate comprehensive, multi-faceted user portrait labels for the first company.

In one embodiment, collaborator C meanwhile continues to aggregate the model gradients and losses and passes back the updated parameters of models A and B. The process iterates until convergence, at which point model training is complete. During training, only intermediate results and model parameters are exchanged, in encrypted form, and no user data is shared. Finally, subsidiary A predicts the labels of the other aligned customers using the trained model together with the data of subsidiary B. Subsidiary B is then replaced with another subsidiary, which similarly assists subsidiary A in obtaining the corresponding labels of the aligned customers. The replacement is repeated until subsidiary A has used the data of all subsidiaries of the group to generate its multi-faceted user portrait labels.
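For reference only, the trapezoidal calculation of AUC can be sketched as follows. The function names are hypothetical; the sketch assumes binary labels (1 for positive, 0 for negative), at least one sample of each class, and that tied scores are treated as a single ROC point.

```python
def roc_points(labels, scores):
    """Return ROC points (false positive rate, true positive rate)."""
    pairs = sorted(zip(scores, labels), key=lambda p: -p[0])
    positives = sum(labels)
    negatives = len(labels) - positives
    tp = fp = 0
    points = [(0.0, 0.0)]
    i = 0
    while i < len(pairs):
        j = i
        # Consume all samples sharing the same score before emitting a point.
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            if pairs[j][1] == 1:
                tp += 1
            else:
                fp += 1
            j += 1
        points.append((fp / negatives, tp / positives))
        i = j
    return points


def auc_trapezoid(labels, scores):
    """Sum the trapezoid areas between consecutive ROC points."""
    pts = roc_points(labels, scores)
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(pts, pts[1:]))
```

The resulting value can then be compared with the preset threshold Y to decide whether the federated model is accepted.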

In general, after the common user group is determined, the machine learning model can be trained on this data. To keep the data confidential during training, encrypted training is carried out with the help of a third-party collaborator C. Taking a linear regression model as an example, the training process is as follows. First, collaborator C distributes the public key to model A and model B for encrypting the data to be exchanged during training. Second, alignment data A and alignment data B exchange, in encrypted form, the intermediate results used to calculate the gradient. Third, A and B each perform their calculation based on the encrypted gradient values, while B also calculates the loss from its label data, and the results are sent to collaborator C, which aggregates them, calculates the total gradient and decrypts it. Finally, collaborator C returns the decrypted gradients to model A and model B, which update their respective parameters accordingly. These steps are iterated until the loss function converges, which completes the training process. Throughout sample alignment and model training, the data of enterprise A and enterprise B remain local, so the data interaction during training does not leak private data. Both parties are thus able to train the model collaboratively with the help of federated learning.

In addition, a variety of model algorithms are available in the federated learning process, including but not limited to neural networks and random forests, so that different business scenarios can be served.
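The data flow of the linear regression example can be sketched in plaintext as follows. This is a simplified simulation, not the encrypted protocol itself: all encryption, and the decryption performed by collaborator C, are omitted and only marked in comments. The function names, the learning rate and the epoch count are assumptions made for illustration.

```python
def dot(row, weights):
    return sum(x * w for x, w in zip(row, weights))


def train_vertical_linear(xa, xb, y, lr=0.1, epochs=1000):
    """Plaintext simulation of two-party vertical linear regression.

    xa: feature rows held by A; xb: feature rows held by B (same sample
    order after alignment); y: labels held by B.
    """
    n = len(y)
    wa = [0.0] * len(xa[0])
    wb = [0.0] * len(xb[0])
    losses = []
    for _ in range(epochs):
        # Each party computes its partial prediction on local features.
        ua = [dot(row, wa) for row in xa]  # computed by A
        ub = [dot(row, wb) for row in xb]  # computed by B
        # Intermediate result; in the real protocol it is exchanged encrypted.
        residual = [a + b - t for a, b, t in zip(ua, ub, y)]
        # B holds the labels, so the loss is computed on B's side.
        losses.append(sum(d * d for d in residual) / (2 * n))
        # Gradients; in the real protocol C aggregates and decrypts them.
        ga = [sum(residual[i] * xa[i][j] for i in range(n)) / n
              for j in range(len(wa))]
        gb = [sum(residual[i] * xb[i][j] for i in range(n)) / n
              for j in range(len(wb))]
        # Each party updates only its own parameters.
        wa = [w - lr * g for w, g in zip(wa, ga)]
        wb = [w - lr * g for w, g in zip(wb, gb)]
    return wa, wb, losses
```

Neither party's feature rows appear in the other party's update step; only the residual and the gradients cross the boundary, which is where the encryption described above would be applied.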

In summary, enterprise A builds model A locally using data features α_a, and enterprise B builds model B locally using data features α_b. A third-party collaborator C carries out the federated learning training process, including user data alignment, encryption, parameter transmission and iterative updating, and enterprise A finally obtains the supplemented features β_a′. As mentioned above, the security and privacy of each party's enterprise data are ensured in the federated learning process, and only encrypted model parameters are transmitted.

Enterprise A thus obtains the additional features β_a′ that can be learned through federated learning with enterprise B.

S120, forming classification data for the obtained combined feature user data according to set dimensions, and generating data files from the classification data;

It should be noted that the set dimensions are mainly used to describe the user's usage behavior with respect to an item, including the time of use, the location, the mode of use, the type of item used, and the like.

The combined feature user data obtained through federated learning is consolidated in a data warehouse, strongly relevant information is screened out, and the quantitative information is analyzed qualitatively to generate the data files required by the big data platform.
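As a non-limiting illustration of step S120, the sketch below groups records by the set dimensions and serializes each group as one line of a data file. The record layout, the field names in `DIMENSIONS`, and the JSON Lines output format are all assumptions, since the text does not fix a file format.

```python
import json
from collections import defaultdict

# Assumed field names for the dimensions named in the text.
DIMENSIONS = ("time", "location", "mode", "item_type")


def classify(records, dimensions=DIMENSIONS):
    """Group user IDs by the value they take in each dimension."""
    grouped = {dim: defaultdict(list) for dim in dimensions}
    for record in records:
        for dim in dimensions:
            if dim in record:
                grouped[dim][record[dim]].append(record["user_id"])
    return {dim: dict(buckets) for dim, buckets in grouped.items()}


def to_lines(classified):
    """Serialize the classification data as one JSON object per line."""
    lines = []
    for dim, buckets in classified.items():
        for value, users in sorted(buckets.items()):
            lines.append(json.dumps(
                {"dimension": dim, "value": value, "users": users}))
    return lines
```

Writing `"\n".join(to_lines(classified))` to a file would give a data file that a downstream platform could load line by line.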

S130, data mining is carried out on the data file by using a big data platform to obtain individual analysis parameters and global parameters of the target user, and the user portrait of the target user is described according to the individual analysis parameters and the global parameters.

To address the challenges of data sparseness and privacy protection in user portraits, the invention provides a privacy-protecting user portrait generation method aimed at large enterprise groups. Taking advantage of the subsidiaries' highly overlapping user bases and their complementary, closely matching information, the subsidiaries outline the various aspects of a user's portrait by means of a longitudinal federated learning strategy. First, on the premise that data is not exchanged and personal data privacy is protected, all parties build a model jointly without disclosing their underlying data, so that the data of all subsidiaries of the group is fully utilized. Second, a variety of model algorithms are available in the federated learning process, including but not limited to neural networks and random forests, so that different business scenarios can be served. Finally, the enterprise obtains a comprehensive user portrait describing user interests, characteristics, behaviors, preferences and the like, which lays a foundation for the enterprise's later in-depth study of its users.

FIG. 6 is a block diagram of a preferred embodiment of the user portrait generation system of the present invention. Referring to FIG. 6, the user portrait generation system 600 includes a combined feature user data generation unit 610 and a user portrait generation unit 620, wherein:

the combined feature user data generating unit 610 is configured to obtain combined feature user data of a target user through a longitudinal federated learning model;

the user portrait generation unit 620 forms classification data according to the set dimensions for the obtained combined feature user data, generates data files from the classification data, performs data mining on the data files by using a big data platform to obtain individual analysis parameters and global parameters of a target user, and describes the user portrait of the target user according to the individual analysis parameters and the global parameters.

Further, it is emphasized that the combined feature user data may also be stored in a node of a blockchain in order to further ensure privacy and security of the combined feature user data.

The combined feature user data generation unit 610 includes a feature combination generation module 611, a federation generation module 612, and a combined feature user data generation module 613, wherein:

the feature combination generating module 611 is configured to screen, by using a pre-selected third party, a first party data sample of the target user and a second party data sample matched with the first party data sample, and to perform feature combination on the first party data sample and the second party data sample;

the federation generation module 612 is configured to combine the first party data sample, the second party data sample, and the third party of the target user into a federation;

the combined feature user data generating module 613 is configured to perform federated learning training on the federation to obtain combined feature user data of the target user.

The invention provides a user portrait generation method based on privacy protection, which is applied to an electronic device 7.

FIG. 7 illustrates an application environment for a preferred embodiment of the privacy-protection-based user portrait generation method in accordance with the present invention.

Referring to fig. 7, in the present embodiment, the electronic device 7 may be a terminal device having an arithmetic function, such as a server, a smart phone, a tablet computer, a portable computer, or a desktop computer.

The electronic device 7 includes: a processor 72, a memory 71, a communication bus 73, and a network interface 74.

The memory 71 includes at least one type of readable storage medium. The at least one type of readable storage medium may be a non-volatile storage medium such as a flash memory, a hard disk, a multimedia card, a card-type memory, and the like. In some embodiments, the readable storage medium may be an internal storage unit of the electronic device 7, such as a hard disk of the electronic device 7. In other embodiments, the readable storage medium may also be an external memory of the electronic device 7, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, a flash card, and the like, provided on the electronic device 7.

In the present embodiment, the readable storage medium of the memory 71 is generally used for storing the user portrait generation program 70 and the like installed in the electronic device 7. The memory 71 may also be used to temporarily store data that has been output or is to be output.

The processor 72, which in some embodiments may be a central processing unit (CPU), a microprocessor or another data processing chip, runs program code or processes data stored in the memory 71, for example by executing the user portrait generation program 70.

The communication bus 73 is used to enable connection and communication between these components.

The network interface 74 may optionally include a standard wired interface and a wireless interface (e.g., a Wi-Fi interface), and is typically used to establish a communication link between the electronic device 7 and other electronic devices.

Fig. 7 only shows the electronic device 7 with components 71-74, but it is to be understood that not all of the shown components are required to be implemented, and that more or fewer components may alternatively be implemented.

Optionally, the electronic device 7 may further comprise a user interface, which may include an input unit such as a keyboard, a voice input device such as a microphone or other equipment with a voice recognition function, and a voice output device such as a speaker or a headset; optionally, the user interface may also include a standard wired interface and a wireless interface.

Optionally, the electronic device 7 may further comprise a display, which may also be referred to as a display screen or a display unit. In some embodiments, the display device may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an Organic Light-Emitting Diode (OLED) touch device, or the like. The display is used for displaying information processed in the electronic device 7 and for displaying a visual user interface.

Optionally, the electronic device 7 may further include a Radio Frequency (RF) circuit, a sensor, an audio circuit, and the like, which are not described in detail herein.

In the embodiment of the device shown in FIG. 7, the memory 71, which is a computer storage medium, may include an operating system and the user portrait generation program 70; the processor 72, when executing the user portrait generation program 70 stored in the memory 71, implements the following steps: S110, obtaining combined feature user data of a target user through a longitudinal federated learning model; S120, forming classification data for the obtained combined feature user data according to set dimensions, and generating data files from the classification data; S130, performing data mining on the data files by using a big data platform to obtain individual analysis parameters and global parameters of the target user, and describing the user portrait of the target user according to the individual analysis parameters and the global parameters.

In other embodiments, the user portrait generation program 70 may also be divided into one or more modules, which are stored in the memory 71 and executed by the processor 72 to implement the present invention. A module, as referred to herein, is a series of computer program instruction segments capable of performing a specified function. The user portrait generation program 70 may be divided into a combined feature user data generation unit 610 and a user portrait generation unit 620.

Furthermore, the present invention also provides a computer-readable storage medium, which mainly includes a storage data area and a storage program area. The storage data area can store data created according to the use of a blockchain node. The storage program area can store an operating system and an application program required for at least one function, and includes a user portrait generation program which, when executed by a processor, implements the operations of the privacy-protection-based user portrait generation method.

The embodiments of the computer-readable storage medium of the present invention are substantially the same as the embodiments of the privacy-protection-based user portrait generation method, system and electronic device described above, and are therefore not described in detail here.

In summary, the privacy-protection-based user portrait generation method, system, electronic device and computer-readable storage medium of the present invention add a screening process for user portrait data, which improves the degree of matching between the second party data set and the first party data set in federated learning. After the first party, the second party and the third party carry out federated learning, multi-faceted and more comprehensive combined feature user data is obtained, and a more accurate user portrait is finally obtained from that data. Screening the providers of the data sets increases the efficiency of federated learning and achieves the technical effect of improving the degree of fit of the user portrait data.

A blockchain is a new application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms and encryption algorithms. It is essentially a decentralized database: a chain of data blocks linked to one another by cryptographic methods, each data block containing information about a batch of network transactions, which is used to verify the validity (anti-counterfeiting) of that information and to generate the next block. The blockchain may include a blockchain underlying platform, a platform product service layer, an application service layer, and the like.
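The linking of data blocks by cryptographic methods can be illustrated with a minimal hash chain. This is a sketch only: it has no network, consensus mechanism or signatures, and the names `make_block` and `verify_chain` as well as the block layout are assumptions. The payload would typically hold a digest of the combined feature user data rather than the data itself.

```python
import hashlib
import json


def block_digest(prev_hash, payload):
    """Hash the previous block's hash together with this block's payload."""
    body = {"prev_hash": prev_hash, "payload": payload}
    return hashlib.sha256(
        json.dumps(body, sort_keys=True).encode()).hexdigest()


def make_block(prev_hash, payload):
    """Create a block whose hash commits to its payload and predecessor."""
    return {"prev_hash": prev_hash, "payload": payload,
            "hash": block_digest(prev_hash, payload)}


def verify_chain(chain):
    """Check every block's hash and its link to the preceding block."""
    for i, block in enumerate(chain):
        if block_digest(block["prev_hash"], block["payload"]) != block["hash"]:
            return False
        if i > 0 and block["prev_hash"] != chain[i - 1]["hash"]:
            return False
    return True
```

Because each block's hash covers the previous block's hash, altering any stored payload invalidates that block and every later link, which is the anti-counterfeiting property referred to above.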

It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, apparatus, article, or method that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, apparatus, article, or method. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, apparatus, article, or method that includes the element.

The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments. Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) as described above and includes instructions for enabling a terminal device (e.g., a mobile phone, a computer, a server, or a network device) to execute the method according to the embodiments of the present invention.

The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.
