Cross-domain recommendation method based on subject label

Document No.: 1937700 | Publication date: 2021-12-07

Abstract: The invention "Cross-domain recommendation method based on subject label" was designed by Zhu Quanyin, Ma Siwei, Li Xiang, Ma Jialin, Wang Yuanyuan, Zhou Hong, Ma Tianlong and Zhang Baixuan on 2021-08-18. It discloses a cross-domain recommendation method based on topic labels, suitable for user recommendation in general cross-domain scenarios. The method comprises the following steps: inputting a user label data set and extracting the required sets; filtering the low-relevance labels out of the label set T to obtain Ts; fusing the low-dimensional feature vectors of the label information to obtain Ts_model; clustering Ts with an LDA topic model to obtain TFs_model; introducing a dependency mapping between multi-domain labels to generate the cross-domain model CTFs_model; and loading CTFs_model, opening the interface CTFs API to process terminal requests, storing the resulting cross-domain behavior feature result set on a Web server, and returning the relevant recommendation predictions to the calling terminal. Combined with an improved technique for identifying the attribute features of users and items in cross-domain scenarios, the invention effectively obtains a highly accurate result set of the target user's predicted item scores, makes score prediction for the target user in cross-domain scenarios more accurate, and increases the practical value of cross-domain recommendation for the target user.

1. A cross-domain recommendation method based on topic labels, characterized by comprising the following steps:

(1) inputting a data sequence set carrying topic label information, and extracting from it a user set U, an item set V, an item label set T and an item score set R;

(2) reducing the noise introduced by low-relevance labels by filtering the low-relevance labels out of the item label set T, to obtain a noise-reduced item label set Ts;

(3) performing adaptive feature extraction on the item label set Ts using an improved item-label collaborative filtering model, and fusing the low-dimensional feature vectors of the extracted item label information with a prediction model, to obtain an adaptive item score matrix factorization model Ts_model;

(4) removing redundancy from the item label set Ts and clustering the label topics, the semantics of the label topics being expanded during clustering by an LDA topic model, to obtain an improved model TFs_model that predicts based on topic distribution;

(5) introducing a mapping of the dependency relationships between the associated topic labels of the target domain and the auxiliary domain of the cross-domain recommendation, extending the target user's recommendation scenario to multiple domains, and generating a cross-domain item score prediction and recommendation model CTFs_model;

(6) loading the cross-domain item score prediction and recommendation model CTFs_model; inputting the basic attribute set of characteristic behaviors of multi-target cross-domain users into the trained CTFs_model; opening a user cross-domain recommendation adaptive identification interface, CTFs API; performing, through CTFs API, interest association and discovery based on user network mining on the user's behavior parameters; refining and updating the user's interest feature model; and returning result information to the calling program, the user obtaining the cross-domain recommendation result through a Web platform.

2. The cross-domain recommendation method based on topic labels according to claim 1, wherein step (1) specifically comprises the following steps:

(1.1) inputting a data set S carrying topic label information, defining a function len(S) to represent the length of the data set S, and letting S = {S₁, S₂, S₃, …, S_len(S)}, where S_i represents the i-th data item in S, i ∈ [1, len(S)];

(1.2) defining a loop variable i1 for traversing S, i1 ∈ [1, len(S)], with i1 given an initial value of 1;

(1.3) if i1 ≤ len(S), go to step (1.4); otherwise go to step (1.10);

(1.4) extracting the user u in the data item S_i (with i = i1) and merging it into the user set U, U = {u₁, u₂, u₃, …, u_I}, where I represents the number of extracted users;

(1.5) extracting the item v in the data item S_i and merging it into the item set V, V = {v₁, v₂, v₃, …, v_J}, where J represents the number of extracted items;

(1.6) extracting the label t in the data item S_i and merging it into the item label set T, T = {t₁, t₂, t₃, …, t_N}, where N represents the number of extracted item labels;

(1.7) extracting the score r in the data item S_i and merging it into the item score set R, R = {R₁, R₂, R₃, …, R_O}, where O represents the number of extracted item scores;

(1.8) establishing a combined set UVTR, in which a user u in the user set U marks an item v in the item set V with labels from the item label set T and a score from the item score set R; each "user-item-label-score" quadruple is recorded as (u, v, T_ij, R_ij), where u ∈ U, v ∈ V, R_ij ∈ R, T_ij represents the set of labels that user u_i attached to item v_j, and R_ij represents the score given by u_i to v_j;

(1.9) i1 = i1 + 1; go to step (1.3);

(1.10) finishing the preprocessing flow of the training data set.
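
The extraction loop of steps (1.1) to (1.10) can be sketched roughly as follows. This is only an illustration under assumptions: the `Record` layout (one user, one item, a list of labels and one score per data item) and the function name `preprocess` are hypothetical, since the claim does not fix a concrete input format.

```python
from collections import namedtuple

# Hypothetical layout of one data item S_i: user, item, labels, score.
Record = namedtuple("Record", ["user", "item", "labels", "score"])

def preprocess(S):
    """Walk S once and build U, V, T, R and the combined set UVTR."""
    U, V, T, R, UVTR = [], [], [], [], []
    for rec in S:  # stands in for the explicit i1 counter of steps (1.2)-(1.9)
        if rec.user not in U:          # (1.4) merge the user into U
            U.append(rec.user)
        if rec.item not in V:          # (1.5) merge the item into V
            V.append(rec.item)
        for t in rec.labels:           # (1.6) merge the labels into T
            if t not in T:
                T.append(t)
        R.append(rec.score)            # (1.7) merge the score into R
        # (1.8) one "user-item-label-score" quadruple per data item
        UVTR.append((rec.user, rec.item, frozenset(rec.labels), rec.score))
    return U, V, T, R, UVTR

S = [
    Record("u1", "v1", ["sci-fi", "space"], 4.0),
    Record("u2", "v1", ["space"], 5.0),
    Record("u1", "v2", ["drama"], 3.0),
]
U, V, T, R, UVTR = preprocess(S)
```

Lists are used instead of sets so that the extraction order is preserved, which keeps the indices u₁, v₁, t₁ meaningful.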

3. The cross-domain recommendation method based on topic labels according to claim 2, wherein step (2) specifically comprises the following steps:

(2.1) inputting the item label set T and the combined set UVTR;

(2.2) on the premise that label data follow a power-law distribution, analyzing the label popularity k and the total number n(k) of labels whose popularity is k;

(2.3) defining a loop variable i2 for traversing T and a function len(T) to represent the length of the item label set T, i2 ∈ [1, len(T)], with i2 given an initial value of 1, where t_i2 is the i2-th item label in T;

(2.4) if i2 ≤ len(T), go to step (2.5); otherwise go to step (2.11);

(2.5) determining, by parameter tuning with a rank-sum test, a confidence xCI within the confidence interval CI, and using xCI to evaluate the correlation between item labels and scores;

(2.6) analyzing the long-tail distribution of label popularity;

(2.7) analyzing the frequency with which labels appear, filtering out the labels below a set threshold, and defining the labels below the threshold as low-relevance labels;

(2.8) analyzing and reducing the noise of the low-relevance labels;

(2.9) evaluating the distribution ranks of the retained labels and of the scored samples after removal, together with the test results, to verify the result set obtained after the low-relevance labels are removed;

(2.10) i2 = i2 + 1; go to step (2.4);

(2.11) obtaining the noise-reduced item label set Ts.
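
A minimal sketch of the frequency-based part of this filtering, steps (2.2) and (2.7), might look like the code below. The threshold `min_count` and the function name are assumptions made for illustration; the rank-sum test and the confidence xCI of step (2.5) are not modelled here.

```python
from collections import Counter

def filter_low_relevance_labels(UVTR, min_count=2):
    """Return (Ts, n_of_k): the labels kept after filtering and the popularity histogram."""
    # Popularity k of each label: how many quadruples carry it.
    popularity = Counter(t for _, _, labels, _ in UVTR for t in labels)
    # n(k): number of labels whose popularity equals k (long-tail profile).
    n_of_k = Counter(popularity.values())
    # Labels below the (assumed) threshold are treated as low-relevance and dropped.
    Ts = sorted(t for t, k in popularity.items() if k >= min_count)
    return Ts, dict(n_of_k)

UVTR = [
    ("u1", "v1", {"sci-fi", "space"}, 4.0),
    ("u2", "v1", {"space"}, 5.0),
    ("u1", "v2", {"drama", "space"}, 3.0),
    ("u3", "v2", {"drama"}, 2.0),
]
Ts, n_of_k = filter_low_relevance_labels(UVTR, min_count=2)
```

In this toy data "sci-fi" appears only once and is filtered out, while "space" and "drama" are retained.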

4. The cross-domain recommendation method based on topic labels according to claim 3, wherein step (3) specifically comprises the following steps:

(3.1) inputting the item label set Ts and the combined set UVTR, defining a function len(Ts) to represent the length of the item label set Ts, and defining a loop variable i3 for traversing Ts, i3 ∈ [1, len(Ts)], with i3 given an initial value of 1;

(3.2) if i3 ≤ len(Ts), go to step (3.3); otherwise go to step (3.17);

(3.3) analyzing the complete set of labels associated with item i and counting the labels t attached to item i, to obtain the key item label information;

(3.4) fusing the low-dimensional feature vectors of the extracted key item label information with a prediction model, analyzing the relevance between a given user u and item i, and performing score prediction;

(3.5) defining len(UVTR) as the length of the data set UVTR, and defining a loop variable j3 for traversing UVTR, j3 ∈ [1, len(UVTR)], with j3 given an initial value of 1;

(3.6) if j3 ≤ len(UVTR), go to step (3.7);

(3.7) analyzing and extracting the latent feature vectors of user u, item i and label t;

(3.8) analyzing and evaluating the biases of user u and item i;

(3.9) minimizing the error by stochastic gradient descent;

(3.10) traversing all labels of the item in the score record and updating the label-domain information;

(3.11) j3 = j3 + 1; go to step (3.6);

(3.12) mapping the feature vectors of the label information into a low-dimensional space;

(3.13) fusing the low-dimensional feature vectors of the extracted key item label information with the prediction model;

(3.14) analyzing the relevance between the given user u and item i and performing score prediction;

(3.15) obtaining the parameters after the traversal iterations, and tuning the model;

(3.16) i3 = i3 + 1; go to step (3.2);

(3.17) obtaining the adaptive item score matrix factorization model Ts_model once model training is finished.
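
To make steps (3.7) to (3.10) more concrete, the following sketch trains a label-aware matrix factorization by stochastic gradient descent. It is not the patented Ts_model: the prediction rule (global mean plus user and item biases plus the item vector dotted with the user vector shifted by the mean label vector), the hyperparameters, and all names are assumptions chosen to show one plausible way of fusing label vectors with a latent-factor predictor.

```python
import random

def train_tag_mf(UVTR, k=4, lr=0.05, reg=0.02, epochs=200, seed=0):
    """Fit biases and latent vectors for users (P), items (Q) and labels (Y) by SGD."""
    rng = random.Random(seed)
    vec = lambda: [rng.gauss(0, 0.1) for _ in range(k)]
    mu = sum(r for *_, r in UVTR) / len(UVTR)           # global mean score
    bu, bi, P, Q, Y = {}, {}, {}, {}, {}
    for u, i, tags, _ in UVTR:
        bu.setdefault(u, 0.0)
        bi.setdefault(i, 0.0)
        P.setdefault(u, vec())
        Q.setdefault(i, vec())
        for t in tags:
            Y.setdefault(t, vec())

    def predict(u, i, tags):
        known = [t for t in tags if t in Y]
        z = list(P[u])                                   # user vector ...
        for t in known:                                  # ... shifted by the mean label vector
            for f in range(k):
                z[f] += Y[t][f] / len(known)
        score = mu + bu[u] + bi[i] + sum(q * x for q, x in zip(Q[i], z))
        return score, z, known

    for _ in range(epochs):
        for u, i, tags, r in UVTR:
            pred, z, known = predict(u, i, tags)
            e = r - pred                                 # prediction error
            bu[u] += lr * (e - reg * bu[u])              # bias updates, cf. (3.8)
            bi[i] += lr * (e - reg * bi[i])
            q_old = list(Q[i])
            for f in range(k):                           # latent-vector updates, cf. (3.9)
                Q[i][f] += lr * (e * z[f] - reg * Q[i][f])
                P[u][f] += lr * (e * q_old[f] - reg * P[u][f])
                for t in known:                          # label updates, cf. (3.10)
                    Y[t][f] += lr * (e * q_old[f] / len(known) - reg * Y[t][f])
    return lambda u, i, tags: predict(u, i, tags)[0]

data = [
    ("u1", "v1", {"space"}, 5.0),
    ("u1", "v2", {"drama"}, 2.0),
    ("u2", "v1", {"space"}, 4.0),
    ("u2", "v2", {"drama"}, 1.0),
]
score = train_tag_mf(data)
```

The returned function only handles users and items seen during training; cold-start handling is outside the scope of this sketch.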

5. The cross-domain recommendation method based on topic labels according to claim 1, wherein step (4) specifically comprises the following steps:

(4.1) inputting the item label set Ts;

(4.2) clustering the label topics of the item label set Ts;

(4.3) defining len(Ts) as the length of Ts, and defining a loop variable i4 for traversing Ts, i4 ∈ [1, len(Ts)], with i4 given an initial value of 1;

(4.4) traversing Ts: if i4 ≤ len(Ts), go to step (4.5); otherwise end the traversal of Ts and go to step (4.19);

(4.5) removing redundancy from the elements of the item label set Ts;

(4.6) expanding the semantics of the topic labels while clustering them with the LDA topic model;

(4.7) defining a user state flag, where flag = 1 indicates that a new user has entered;

(4.8) if flag = 1, go to step (4.9); otherwise go to step (4.17);

(4.9) analyzing and extracting the behavior features of the user's interest preferences;

(4.10) if the entering new user is already recorded in the auxiliary domain, go to step (4.11); otherwise go to step (4.18);

(4.11) extracting the behavior features of each user through the model, updating the user interest model parameters, and refining the user model;

(4.12) calculating the interest preference values of the user feature vectors in multiple dimensions;

(4.13) updating the user information according to the aggregated features and the user model;

(4.14) if flag = 1, go to step (4.17); otherwise go to step (4.15);

(4.15) creating a behavior analysis model for the new user;

(4.16) if the user's interest features have already been recorded, go to step (4.18); otherwise go to step (4.19);

(4.17) calculating the weights of the item topic features and the probability distribution of the item topics over the topic interval;

(4.18) iteratively updating the model parameters and adaptively learning them by minimizing the regularized mean squared error with stochastic gradient descent;

(4.19) i4 = i4 + 1; go to step (4.4);

(4.20) obtaining the improved model TFs_model, which predicts based on topic distribution.
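
Steps (4.5) and (4.6) can be illustrated with a small collapsed Gibbs sampler for LDA in which each item's label list plays the role of a document. This is a generic textbook LDA sketch, not the TFs_model itself; the hyperparameters `alpha` and `beta`, the number of topics, and the helper names are assumptions.

```python
import random

def dedupe(docs):
    """Step (4.5): drop repeated labels within each item, keeping first-seen order."""
    return [list(dict.fromkeys(d)) for d in docs]

def lda_gibbs(docs, n_topics=2, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling; returns (theta, phi) = item-topic and topic-label distributions."""
    rng = random.Random(seed)
    vocab = sorted({w for d in docs for w in d})
    wid = {w: j for j, w in enumerate(vocab)}
    V = len(vocab)
    ndk = [[0] * n_topics for _ in docs]       # item-topic counts
    nkw = [[0] * V for _ in range(n_topics)]   # topic-label counts
    nk = [0] * n_topics                        # labels assigned to each topic
    z = []
    for d, doc in enumerate(docs):             # random initial topic assignment
        zd = []
        for w in doc:
            k = rng.randrange(n_topics)
            zd.append(k)
            ndk[d][k] += 1
            nkw[k][wid[w]] += 1
            nk[k] += 1
        z.append(zd)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for n, w in enumerate(doc):
                k, j = z[d][n], wid[w]
                ndk[d][k] -= 1
                nkw[k][j] -= 1
                nk[k] -= 1
                # Conditional probability of each topic for this label occurrence.
                weights = [(ndk[d][t] + alpha) * (nkw[t][j] + beta) / (nk[t] + V * beta)
                           for t in range(n_topics)]
                k = rng.choices(range(n_topics), weights=weights)[0]
                z[d][n] = k
                ndk[d][k] += 1
                nkw[k][j] += 1
                nk[k] += 1
    theta = [[(ndk[d][t] + alpha) / (len(doc) + n_topics * alpha) for t in range(n_topics)]
             for d, doc in enumerate(docs)]
    phi = [{w: (nkw[t][wid[w]] + beta) / (nk[t] + V * beta) for w in vocab}
           for t in range(n_topics)]
    return theta, phi

docs = dedupe([["space", "sci-fi", "space"], ["sci-fi", "alien"],
               ["romance", "drama"], ["drama", "romance", "love"]])
theta, phi = lda_gibbs(docs, n_topics=2)
```

Each row of `theta` is an item's distribution over topics, which is the kind of quantity step (4.17) refers to when it speaks of the probability of item topics over the topic interval.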

6. The cross-domain recommendation method based on topic labels according to claim 1, wherein step (5) specifically comprises the following steps:

(5.1) introducing the associated topic labels of the target domain and the auxiliary domain of the cross-domain recommendation, and defining a cross-domain topic label set CTs = {t₁, t₂, t₃, …, t_ctN}, where ctN denotes the number of extracted topic labels that overlap across the different domains;

(5.2) mapping the dependency relationships between the associated topic labels of the target domain and the auxiliary domain, extending the target user's recommendation scenario to multiple domains, analyzing the relationships between users and items across different domains, and using the feature attribute information of the auxiliary domain to address the data sparsity and cold-start problems encountered in cross-domain settings;

(5.3) loading the adaptive item score matrix factorization model Ts_model, defining len(CTs) as the length of the cross-domain topic label set CTs, and defining a loop variable i5 for traversing CTs, i5 ∈ [1, len(CTs)], with i5 given an initial value of 1;

(5.4) if i5 ≤ len(CTs), go to step (5.5); otherwise go to step (5.12);

(5.5) introducing the cross-domain setting by using the influence weights of the labels;

(5.6) learning, through the auxiliary domain, the degree of association between the scores of the target domain and the labels;

(5.7) processing the target domain by inter-domain transfer of the semantic information of the associated labels;

(5.8) analyzing the latent feature vectors of the users and the items in the recommendation domain;

(5.9) analyzing the topic distribution of the labels in the recommendation domain to obtain the topic feature matrix;

(5.10) iteratively updating the model and adaptively learning its parameters by minimizing the regularized error with stochastic gradient descent;

(5.11) i5 = i5 + 1; go to step (5.4);

(5.12) extending the adaptive item score matrix factorization model Ts_model to the multi-domain scenario, to predict user network interests at the cross-domain level and recommend cross-domain items;

(5.13) obtaining the cross-domain item score prediction and recommendation model CTFs_model.
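
The idea behind steps (5.1), (5.6) and (5.7) can be shown with a deliberately simplified example: find the labels shared by the two domains, learn from the auxiliary domain how each shared label shifts scores, and apply that shift to a target-domain item. This is a stand-in for illustration only, not the CTFs_model; the offset-averaging rule and all function names are assumptions.

```python
def cross_domain_labels(target_labels, auxiliary_labels):
    """Step (5.1): CTs is the set of labels present in both domains."""
    return sorted(set(target_labels) & set(auxiliary_labels))

def label_score_offsets(aux_UVTR, CTs):
    """Step (5.6), simplified: mean deviation from the auxiliary-domain mean score per shared label."""
    mu = sum(r for *_, r in aux_UVTR) / len(aux_UVTR)
    sums, counts = {}, {}
    for _, _, labels, r in aux_UVTR:
        for t in labels:
            if t in CTs:
                sums[t] = sums.get(t, 0.0) + (r - mu)
                counts[t] = counts.get(t, 0) + 1
    return {t: sums[t] / counts[t] for t in sums}

def predict_target(target_mean, item_labels, offsets):
    """Step (5.7), simplified: shift the target-domain mean by the transferred label offsets."""
    shared = [t for t in item_labels if t in offsets]
    if not shared:
        return target_mean          # nothing to transfer for this item
    return target_mean + sum(offsets[t] for t in shared) / len(shared)

# Auxiliary domain (for example books) with scores; target domain (for example films) with labels only.
aux_UVTR = [
    ("u1", "b1", {"space", "classic"}, 5.0),
    ("u2", "b2", {"romance"}, 2.0),
    ("u3", "b3", {"space"}, 4.0),
]
CTs = cross_domain_labels({"space", "romance", "thriller"}, {"space", "classic", "romance"})
offsets = label_score_offsets(aux_UVTR, CTs)
prediction = predict_target(3.0, {"space", "thriller"}, offsets)
```

Because the prediction for the target item relies only on labels, it does not require any user to appear in both domains, which is the situation the description identifies as the motivation for label-based transfer.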

7. The cross-domain recommendation method based on topic labels according to claim 1, wherein step (6) specifically comprises the following steps:

(6.1) opening the cross-domain recommendation adaptive identification interface CTFs API;

(6.2) creating a thread pool, CTFs Thread Pool;

(6.3) judging whether all tasks of CTFs Thread Pool have been executed; if so, go to step (6.9); otherwise go to step (6.4);

(6.4) receiving a data processing request from the terminal;

(6.5) a child thread, CTFs Child Thread, acquires the task and processes it;

(6.6) performing, through the cross-domain recommendation adaptive identification interface CTFs API, interest association and discovery based on user network mining on the user's behavior parameters, and refining and updating the user's interest feature model;

(6.7) returning result information to the calling program, the user obtaining the cross-domain recommendation result through the Web platform;

(6.8) ending the child thread CTFs Child Thread and going to step (6.3);

(6.9) closing CTFs Thread Pool;

(6.10) finishing the adaptive multi-target user network cross-domain interest association and recommendation.
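
The request-handling loop of steps (6.2) to (6.9) maps naturally onto a standard thread pool. The sketch below uses Python's `concurrent.futures.ThreadPoolExecutor`; the request format, the `handle_request` function and the toy scoring model are hypothetical and merely stand in for CTFs API and CTFs_model.

```python
from concurrent.futures import ThreadPoolExecutor

def handle_request(request, model):
    """Hypothetical handler: rank the candidate items for the requesting user."""
    user, candidates = request["user"], request["candidates"]
    ranked = sorted(candidates, key=lambda item: model(user, item), reverse=True)
    return {"user": user, "recommendations": ranked}

def serve(requests, model, workers=4):
    # The executor plays the role of CTFs Thread Pool: created in (6.2), closed in (6.9)
    # when the with-block exits after all submitted tasks have finished.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # (6.4)-(6.5): each request becomes a task picked up by a worker thread.
        futures = [pool.submit(handle_request, req, model) for req in requests]
        # (6.7): collect the results in request order for return to the caller.
        return [f.result() for f in futures]

# Toy stand-in for the trained model: a fixed score table.
scores = {("u1", "a"): 1.0, ("u1", "b"): 3.0, ("u2", "a"): 2.0, ("u2", "b"): 0.5}
toy_model = lambda user, item: scores[(user, item)]

results = serve(
    [{"user": "u1", "candidates": ["a", "b"]},
     {"user": "u2", "candidates": ["a", "b"]}],
    toy_model,
)
```

Using the executor as a context manager means the pool is shut down even if a handler raises, which corresponds to the explicit close in step (6.9).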

Technical Field

The invention relates to the technical field of information processing, in particular to a cross-domain recommendation method based on a subject label.

Background

In recent years, the rapid development of computer technology and the large volume of online social media data have drawn growing attention to using computational means to track and identify the interest features of cross-domain users. Detecting and tracking the interest feature information of multiple targets in cross-domain scenarios has important practical significance for online social media and similar settings: it can provide a social platform with an effective result set of feature information and provide users with reliable item recommendations across multiple domains. The invention provides a cross-domain recommendation method and system based on topic labels, which extracts the semantic information contained in item labels through topic modeling and implements an interest recommendation model for cross-domain scenarios. By introducing the semantic information of labels, the model can still complete cross-domain user interest association and recommendation using the label and score attribute information of the auxiliary domain when the domains lack shared user information. This improves score prediction for users and items in the target domain and increases the practical value of identifying user interest features in cross-domain scenarios.

The existing research foundation of Zhu Quanyin et al. includes: Quanyin Zhu, Sunqun Cao. A Novel Feature Selection Algorithm for augmented data sets. 2009, p: 77-82; Li Xiang, Zhu Quanyin. Collaborative filtering recommendation with collaborative clustering and scoring matrix sharing [J]. Computer Science and Exploration, 2014, 8(6): 751-; Quanyin Zhu, Yunyang Yan, Jin Ding, Jin Qian. The Case Study for Price extraction of Mobile Phone Sell Online. 2011, p: 282-285; Quanyin Zhu, Suqun Cao, Pei Zhou, Yunyang Yan, Hong Zhou. Integrated Price Forecast based on Dichotomy Backfilling and Disturbance Factor Algorithm. International Review on Computers and Software, 2011, Vol. 6(6): 1089-1093; Ma S, Cao M, Li J, et al. A Face Sequence registration Method Based on Deep relational Network [C] // 2019 18th International Symposium on Distributed Computing and Applications for Business Engineering and Science (DCABES). IEEE, 2019: 104-. Related patents applied for, published and granted by Zhu Quanyin et al.: an OpenCV-based method for detecting label information in construction drawings, Chinese patent publication No. CN109002824A, 2018.12.14; a building component extraction method based on a Faster-RCNN model, Chinese patent publication No. CN109002841A, 2018.12.14; Feng Wanli, Yangyun, Yanlun, Zhu Quanyin, et al. An intelligent terminal IC card authorization and management method for an identity authentication system, Chinese patent publication No. CN107016310B, 2019.12.10; Zhu Quanyin, Shirenmin, Huronglin, Feng Wanli, et al. A knowledge graph-based expert combination recommendation method, Chinese patent publication No. CN109062961A, 2018.12.21; a multi-target tracking and facial feature information identification method, Chinese patent publication No. CN111914613A, 2020.11.10; Shaoxing chapter, Nijinxun, Zhu Quanyin, Chenxiaoyi, Ma Siwei, et al. A voucher-type accounting method based on blockchain mutual authentication and a convolutional neural network, Chinese patent publication No. CN110188787B, 2020.11.03.

Neighbor-based collaborative filtering:

The neighbor-based strategy is one of the most common collaborative filtering methods. It relies on the opinions of like-minded users and models how people choose items in everyday life: if friends whose tastes are close to the user's show interest in an item, the user is also likely to be interested in that item and to select it when the system recommends it.
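
As a rough illustration of the neighbor-based idea, the sketch below predicts a user's score for an unseen item as a similarity-weighted average of the scores given by the most similar users. The similarity measure used here (one over one plus the mean absolute difference on co-rated items) and the neighborhood size `k` are arbitrary assumptions for the example; practical systems often use cosine or Pearson similarity instead.

```python
def similarity(a, b):
    """Similarity of two users' score dictionaries, based on co-rated items only."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    mad = sum(abs(a[i] - b[i]) for i in common) / len(common)
    return 1.0 / (1.0 + mad)

def predict(ratings, user, item, k=2):
    """Similarity-weighted average over the k most similar users who scored the item."""
    neighbours = [(similarity(ratings[user], r), r[item])
                  for other, r in ratings.items()
                  if other != user and item in r]
    top = sorted(neighbours, reverse=True)[:k]
    total = sum(s for s, _ in top)
    if total == 0:
        return None                 # no usable neighbour
    return sum(s * r for s, r in top) / total

ratings = {
    "alice": {"a": 5, "b": 4},
    "bob":   {"a": 5, "b": 4, "c": 5},
    "carol": {"a": 1, "b": 2, "c": 1},
}
estimate = predict(ratings, "alice", "c")
```

Here bob agrees with alice on every co-rated item, so his score for "c" dominates the estimate, while carol's dissimilar profile contributes only a small weight.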

Collaborative filtering based on latent features:

Unlike neighbor-based methods, latent-feature strategies start from the preference matrix. Rather than inferring the score of a new user-item pair directly from the existing scores of associated objects, they infer it from a low-dimensional feature-vector mapping of the user-item preference matrix in the system.
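
In the commonly used matrix factorization formulation of this idea, which is given here as general background rather than as the patent's own model, each user u and item i is represented by a low-dimensional vector and the score is predicted by their inner product. The symbols below (p_u, q_i, the set K of observed scores and the regularization weight λ) are assumed notation.

```latex
\hat{r}_{ui} = \mathbf{p}_u^{\top}\mathbf{q}_i,
\qquad
\min_{P,\,Q} \sum_{(u,i)\in\mathcal{K}}
  \Bigl[ \bigl(r_{ui} - \mathbf{p}_u^{\top}\mathbf{q}_i\bigr)^2
  + \lambda\bigl(\lVert\mathbf{p}_u\rVert^2 + \lVert\mathbf{q}_i\rVert^2\bigr) \Bigr]
```

The first expression is the prediction rule; the second is the regularized squared-error objective that is typically minimized by stochastic gradient descent or alternating least squares.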

Model based on transfer learning:

Transfer learning reuses an existing problem-solving model in a different but related setting: across several related tasks, the parameters of an already trained and optimized model are transferred to assist the training of the model for a new task. The key to transfer learning is to find the bridge between the tasks, that is, to extract the knowledge common to the different domains and transfer it, so that the knowledge is actually migrated and reused. Under this framework, training and iteration of the model can be divided into two stages: first, updating the parameters within a single domain; second, adjusting the parameters of the mapping function.

Similarity weight calculation:

The similarity weight is a key factor in recommendation and affects the two most important indicators by which recommendation results are evaluated: the recommendation performance of the system and the accuracy of the recommendation results.

Similarity calculation rests on two assumptions: similar user groups have similar item interests, and similar items tend to attract the interest of the same limited set of user groups in the system. One common method is to compute the cosine of the angle between the feature vectors of users or items in the system and to use this cosine value to evaluate the similarity between different objects.
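
The cosine measure mentioned above can be computed as in the following short sketch. The function name and the convention of returning 0.0 for a zero vector are choices made for this example.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0                  # convention: a zero vector is similar to nothing
    return dot / (norm_a * norm_b)

same_direction = cosine_similarity([1, 2, 3], [2, 4, 6])
orthogonal = cosine_similarity([1, 0], [0, 1])
```

Vectors pointing in the same direction give a value of 1 regardless of their length, and orthogonal vectors give 0, so the measure compares preference patterns rather than score magnitudes.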

In multi-target interest feature tracking and detection, most existing research addresses problems in single-domain scenarios only. Research on adaptive classification of multi-target interest features in cross-domain scenarios with label attributes is lacking, information fusion is limited to a single source, and the efficiency of tracking and analyzing multi-target interest features on data with cross-domain attributes is limited.

For example: Wangxihua et al. proposed a social interest recommendation method and system based on graph convolution matrix factorization, which recommends potential items to the user according to a user latent feature matrix and an item latent feature matrix (Chinese patent publication No. CN111523051A, 2020.02.24); Liu Fang ai et al. proposed an interest recommendation method and system based on users' sequential click behavior, which effectively addresses shortcomings of existing sequence recommendation methods, such as ignoring the internal structure of user sequence behavior and the transition relationships between items (Chinese patent publication No. CN110807156A, 2019.10.23); Wei and Wei et al. proposed a collaborative filtering recommendation method based on single-source SimRank, which can compute single-source SimRank results on a large graph within a practical time and meets the requirements of real-time recommendation and interactive queries (Chinese patent publication No. CN110287424A, 2019.06.28); Zhangruiro et al. proposed a topic label recommendation method based on deep learning, which uses a support vector machine (SVM) model to classify the extracted features into topic labels and uses the word embedding model word2vec together with a K-nearest-neighbor algorithm to expand the predicted topic labels, making the labeling results more reliable (Chinese patent publication No. CN110297933A, 2019.07.01).

Disclosure of Invention

Purpose of the invention: to address the problems in the prior art, the invention provides a cross-domain recommendation method based on topic labels that solves the user recommendation problem in cross-domain scenarios.

Technical solution: to solve the above technical problem, the invention provides a cross-domain recommendation method based on topic labels, which comprises the following steps: