Cross-domain recommendation method based on subject label

Document No.: 1937700 Publication date: 2021-12-07

Reading note: this technology, a cross-domain recommendation method based on subject label (一种基于主题标签的跨域推荐方法), was designed by 朱全银, 马思伟, 李翔, 马甲林, 王媛媛, 周泓, 马天龙 and 张柏萱 on 2021-08-18. The invention discloses a cross-domain recommendation method based on topic labels, applicable to user recommendation in general cross-domain scenarios. The method comprises the following steps: inputting a user label data set and extracting the required sets; filtering low-correlation labels out of the label set T to obtain Ts; fusing the low-dimensional feature vectors of the label information to obtain Ts_model; clustering Ts with an LDA topic model to obtain TFs_model; introducing a dependency mapping of multi-domain labels to generate the cross-domain model CTFs_model; and loading CTFs_model and opening the interface CTFs API to process terminal requests, storing the resulting cross-domain behavior feature result set on a Web server and returning the relevant recommendation predictions to the calling terminal. By combining an improved technique for identifying the attribute features of users and items in cross-domain scenarios, the invention obtains a more accurate result set of the target user's predicted item scores, which increases the practical value of cross-domain recommendation for the target user.

1. A cross-domain recommendation method based on topic labels, characterized by comprising the following steps:

(1) inputting a data sequence set carrying topic label information, and extracting from it a user set U, an item set V, an item label set T and an item scoring set R;

(2) reducing the noise introduced by low-correlation labels by filtering them out of the item label set T, to obtain a noise-reduced item label set Ts;

(3) performing adaptive feature extraction on the item label set Ts with an improved item-label collaborative filtering model, and fusing the low-dimensional feature vectors of the extracted item label information with a prediction model, to obtain an adaptive item scoring matrix decomposition model Ts_model;

(4) removing redundancy from the item label set Ts and clustering the label topics with an LDA topic model, which expands the semantics of the label topics while clustering, to obtain an improved model TFs_model that predicts on the basis of topic distribution;

(5) introducing a mapping of the dependency relationships between the associated topic labels of the cross-domain recommendation target domain and the auxiliary domain, extending the target user's recommendation scenario to multiple domains, and generating a cross-domain item score prediction and recommendation model CTFs_model;

(6) loading the cross-domain item score prediction and recommendation model CTFs_model; inputting the basic attribute set of characteristic behaviors of multi-target cross-domain users into the trained CTFs_model; opening the user cross-domain recommendation adaptive identification interface CTFs API; performing, through the CTFs API, interest association and discovery based on user network mining on the user's behavior parameters; refining and updating the user's interest feature model; and returning the result information to the calling program, the user obtaining the cross-domain recommendation result through a Web platform.

2. The method for cross-domain recommendation based on the topic label according to claim 1, wherein the step (1) specifically comprises the following steps:

(1.1) inputting a data set S with topic label information, defining a function len(S) as the length of the data set S, and letting $S = \{S_1, S_2, S_3, \dots, S_{\mathrm{len}(S)}\}$, where $S_i$ denotes the i-th data item in S, $i \in [1, \mathrm{len}(S)]$;

(1.2) defining a loop variable i1 for traversing S, $i1 \in [1, \mathrm{len}(S)]$, with i1 initialized to 1;

(1.3) if $i1 \le \mathrm{len}(S)$, proceeding to step (1.4), otherwise proceeding to step (1.10);

(1.4) extracting the user u in data item $S_i$ and merging it into the user set U, $U = \{u_1, u_2, u_3, \dots, u_I\}$, where I denotes the number of extracted users;

(1.5) extracting the item v in data item $S_i$ and merging it into the item set V, $V = \{v_1, v_2, v_3, \dots, v_J\}$, where J denotes the number of extracted items;

(1.6) extracting the label t in data item $S_i$ and merging it into the item label set T, $T = \{t_1, t_2, t_3, \dots, t_N\}$, where N denotes the number of extracted item labels;

(1.7) extracting the score in data item $S_i$ and merging it into the item scoring set R, $R = \{R_1, R_2, R_3, \dots, R_O\}$, where O denotes the number of extracted item scores;

(1.8) establishing a combined set UVTR, in which a user u in the user set U marks an item v in the item set V with a set of labels from the item label set T and a score from the item scoring set R; the "user-item-label-score" quadruple is written $(u, v, T_{ij}, R_{ij})$, where $u \in U$, $v \in V$, $R_{ij} \in R$, $T_{ij}$ denotes the set of labels that user $u_i$ attached to item $v_j$, and $R_{ij}$ denotes the score that $u_i$ gave to $v_j$;

(1.9) i1 = i1 + 1, and returning to step (1.3);

(1.10) finishing the preprocessing of the training data set.

3. The method for cross-domain recommendation based on the topic tag according to claim 2, wherein the step (2) specifically comprises the following steps:

(2.1) inputting the item label set T and the combined set UVTR;

(2.2) analyzing, on the principle that label data follow a power-law distribution, the label popularity k and the total number n(k) of labels whose popularity is k;

(2.3) defining a loop variable i2 for traversing T and a function len(T) as the length of the item label set T, $i2 \in [1, \mathrm{len}(T)]$, with i2 initialized to 1, where $t_{i2}$ is the i2-th item label in T;

(2.4) if $i2 \le \mathrm{len}(T)$, proceeding to step (2.5), otherwise proceeding to step (2.11);

(2.5) determining, by parameter tuning with a rank-sum test, a confidence level xCI within the confidence interval CI, and using xCI to evaluate the correlation between item labels and scores;

(2.6) analyzing the long-tail distribution of label popularity;

(2.7) analyzing how frequently each label occurs, defining labels whose frequency is below a set threshold as low-correlation labels, and filtering them out;

(2.8) analyzing and optimizing the noise introduced by the low-correlation labels;

(2.9) evaluating the rank distribution of the retained labels and scored samples after removal, together with the test results, to verify the result set obtained after removing the low-correlation labels;

(2.10) i2 = i2 + 1, and returning to step (2.4);

(2.11) obtaining a noise-reduced item label set Ts.

4. The method for cross-domain recommendation based on subject label according to claim 3, wherein the step (3) specifically comprises the following steps:

(3.1) inputting the item label set Ts and the combined set UVTR, defining a function len(Ts) as the length of the item label set Ts, and defining a loop variable i3 for traversing Ts, $i3 \in [1, \mathrm{len}(Ts)]$, with i3 initialized to 1;

(3.2) if $i3 \le \mathrm{len}(Ts)$, proceeding to step (3.3), otherwise proceeding to step (3.17);

(3.3) analyzing the complete set of labels related to item i, and counting the labels t associated with item i, to obtain the key item label information;

(3.4) fusing the low-dimensional feature vectors of the extracted key item label information with a prediction model, analyzing the relevance between a given user u and item i, and performing score prediction;

(3.5) defining len(UVTR) as the length of the data set UVTR, and defining a loop variable j3 for traversing UVTR, $j3 \in [1, \mathrm{len}(UVTR)]$, with j3 initialized to 1;

(3.6) if $j3 \le \mathrm{len}(UVTR)$, proceeding to step (3.7);

(3.7) analyzing and extracting the latent feature vectors of user u, item i and label t;

(3.8) analyzing and evaluating the biases of user u and item i;

(3.9) minimizing the error by stochastic gradient descent;

(3.10) traversing all labels of the item under evaluation, and updating the label-domain information;

(3.11) j3 = j3 + 1, and returning to step (3.6);

(3.12) mapping the feature vectors of the label information into a low-dimensional space;

(3.13) fusing the low-dimensional feature vectors of the extracted key item label information with the prediction model;

(3.14) analyzing the relevance between the given user u and item i, and performing score prediction;

(3.15) obtaining the parameters after the traversal iterations, and tuning the model;

(3.16) i3 = i3 + 1, and returning to step (3.2);

(3.17) finishing model training, and obtaining the adaptive item scoring matrix decomposition model Ts_model.

5. The method for cross-domain recommendation based on subject label as claimed in claim 1, wherein the step (4) specifically comprises the following steps:

(4.1) inputting the item label set Ts;

(4.2) clustering the label topics of the item label set Ts;

(4.3) defining len(Ts) as the length of Ts, and defining a loop variable i4 for traversing Ts, $i4 \in [1, \mathrm{len}(Ts)]$, with i4 initialized to 1;

(4.4) traversing Ts: if $i4 \le \mathrm{len}(Ts)$, jumping to step (4.5); otherwise ending the traversal of Ts and jumping to step (4.19);

(4.5) removing redundancy from the elements of the item label set Ts;

(4.6) expanding the semantics of the topic labels while clustering with the LDA topic model;

(4.7) defining a user state flag, where flag = 1 indicates that a new user has entered;

(4.8) if flag is 1, jumping to step (4.9), otherwise jumping to step (4.17);

(4.9) analyzing and extracting the behavior features of the user's interest preferences;

(4.10) if the entering new user is recorded in the auxiliary domain, jumping to step (4.11), otherwise jumping to step (4.18);

(4.11) extracting the behavior features of each user through the model, updating the user interest model parameters, and refining the user model;

(4.12) calculating the interest preference values of the user feature vector in multiple dimensions;

(4.13) updating the user information according to the aggregated features and the user model;

(4.14) if flag is 1, jumping to step (4.17), otherwise jumping to step (4.15);

(4.15) creating a behavior analysis model for the new user;

(4.16) if the user's interest features have been recorded, jumping to step (4.18), otherwise jumping to step (4.19);

(4.17) calculating the weights of the item topic features and the probability distribution of the item's topics over the topic interval;

(4.18) iteratively updating the model parameters and adaptively learning them by using stochastic gradient descent to minimize the regularized mean squared error;

(4.19) i4 = i4 + 1, and returning to step (4.4);

(4.20) obtaining the improved model TFs_model, which predicts on the basis of topic distribution.

6. The method for cross-domain recommendation based on subject label as claimed in claim 1, wherein the step (5) specifically comprises the following steps:

(5.1) introducing the associated topic labels of the cross-domain recommendation target domain and the auxiliary domain, and defining a cross-domain topic label set CTs, $CTs = \{t_1, t_2, t_3, \dots, t_{ctN}\}$, where ctN denotes the number of extracted topic labels that overlap across the different domains;

(5.2) mapping the dependency relationships between the associated topic labels of the target domain and the auxiliary domain, extending the target user's recommendation scenario to multiple domains, analyzing the relationships between users and items across domains, and using the feature attribute information of the auxiliary domain to address the data sparsity and cold-start problems encountered in cross-domain recommendation;

(5.3) loading the adaptive item scoring matrix decomposition model Ts_model, defining len(CTs) as the length of the cross-domain topic label set CTs, and defining a loop variable i5 for traversing CTs, $i5 \in [1, \mathrm{len}(CTs)]$, with i5 initialized to 1;

(5.4) if $i5 \le \mathrm{len}(CTs)$, proceeding to step (5.5), otherwise proceeding to step (5.12);

(5.5) introducing cross-domain information by using the influence weights of the labels;

(5.6) learning, through the auxiliary domain, the degree of association between scores and labels in the target domain;

(5.7) processing the target domain by transferring the semantic information of the associated labels between domains;

(5.8) analyzing the latent feature vectors of the users and items in the recommendation domain;

(5.9) analyzing the topic distribution of the labels in the recommendation domain, to obtain the topic feature matrix;

(5.10) iteratively updating the model and adaptively learning its parameters by using stochastic gradient descent to minimize the regularized error;

(5.11) i5 = i5 + 1, and returning to step (5.4);

(5.12) extending the adaptive item scoring matrix decomposition model Ts_model to the multi-domain scenario, to predict user network interests at the cross-domain level and recommend cross-domain items;

(5.13) obtaining the cross-domain item score prediction and recommendation model CTFs_model.

7. The method for cross-domain recommendation based on subject label as claimed in claim 1, wherein the step (6) specifically comprises the following steps:

(6.1) opening the cross-domain recommendation adaptive identification interface CTFs API;

(6.2) creating a thread pool CTFs Thread Pool;

(6.3) judging whether all tasks of the CTFs Thread Pool have been executed; if so, proceeding to step (6.9), otherwise proceeding to step (6.4);

(6.4) receiving a data processing request from a terminal;

(6.5) a child thread CTFs Child Thread acquiring the task and processing it;

(6.6) performing, through the CTFs API, interest association and discovery based on user network mining on the user's behavior parameters, and refining and updating the user's interest feature model;

(6.7) returning the result information to the calling program, the user obtaining the cross-domain recommendation result through the Web platform;

(6.8) ending the child thread CTFs Child Thread, and returning to step (6.3);

(6.9) closing the CTFs Thread Pool;

(6.10) finishing the adaptive multi-target user-network cross-domain interest association and recommendation.

Technical Field

The invention relates to the technical field of information processing, in particular to a cross-domain recommendation method based on a subject label.

Background

In recent years, the rapid development of computer technology and the sheer volume of online social media data have drawn growing interest in using computational means to track and identify the interest features of cross-domain users. Detecting and tracking the interest features of multiple targets in a cross-domain scenario has important practical significance for online social media and similar settings: it can supply a social platform with an effective set of feature information and give users reliable item recommendations across multiple domains. The invention provides a cross-domain recommendation method and system based on topic labels, which extracts the semantic information contained in item labels by topic modeling and implements an interest recommendation model for cross-domain scenarios. By introducing the semantic information of labels, the model can still complete cross-domain user interest association and recommendation when the domains share no user information, using the labels and score attributes of the auxiliary domain. This improves score prediction for users and items in the target domain and increases the practical value of identifying user interest features in cross-domain scenarios.

The existing research foundation of Zhu Quanyin et al. includes: Quanyin Zhu, Suqun Cao. A Novel Feature Selection Algorithm for augmented data sets. 2009, p: 77-82; Li Xiang, Zhu Quanyin. Collaborative clustering and scoring matrix shared collaborative filtering recommendations [J]. Computer Science and Exploration, 2014, 8(6): 751-; Quanyin Zhu, Yunyang Yan, Jin Ding, Jin Qian. The Case Study for Price extraction of Mobile Phone Sell Online. 2011, p: 282-285; Quanyin Zhu, Suqun Cao, Pei Zhou, Yunyang Yan, Hong Zhou. Integrated print based on Dichotomy Back filling and Disturbance factory Algorithm. International Review on Computers and Software, 2011, Vol. 6(6): 1089-1093; Ma S, Cao M, Li J, et al. A Face Sequence registration Method Based on Deep relational Network [C]// 2019 18th International Symposium on Distributed Computing and Applications for Business Engineering and Science (DCABES). IEEE, 2019: 104-. Related patents applied for, published and granted by Zhu Quanyin et al. include: a method for detecting label information in construction drawings based on OpenCV, Chinese patent publication No. CN109002824A, 2018.12.14; a building component extraction method based on a Faster-RCNN model, Chinese patent publication No. CN109002841A, 2018.12.14; Von Wanli, Yangyun, Yanlun, Zhu Quanyin, et al. An intelligent terminal IC card authorization and management method for an identity authentication system, Chinese patent publication No. CN107016310B, 2019.12.10; Zhu Quanyin, Shirenmin, Huronglin, Feng Wanli, et al. A knowledge-graph-based expert combination recommendation method, Chinese patent publication No. CN109062961A, 2018.12.21; a multi-target tracking and facial feature information identification method, Chinese patent publication No. CN111914613A, 2020.11.10; Shaoxing chapter, Nijinxun, Zhu Quanyin, Chenxiaoyi, MaSi Wei, et al. A voucher-type accounting method based on blockchain mutual authentication and a convolutional neural network, Chinese patent publication No. CN110188787B, 2020.11.03.

Neighbor-based collaborative filtering:

The neighbor-based strategy is the most common collaborative filtering method. It relies on the views of like-minded users and models how people choose items in everyday life: if friends with similar tastes show interest in an item, the user is also likely to be interested in it and to select it when the system recommends it.

Collaborative filtering based on latent features:

Unlike the neighbor-based method, the latent-feature strategy starts from the preference matrix but does not infer scores for new user-item combinations directly from the existing scores of associated objects. Instead, it maps the user-item preference matrix of the system to low-dimensional feature vectors and performs inference with them.
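For orientation, the textbook latent-factor formulation can be written as follows. This is the standard form from the collaborative filtering literature, not a formula taken from this patent; the symbols $p_u$ (user vector), $q_i$ (item vector), $\mathcal{K}$ (set of observed user-item pairs) and $\lambda$ (regularization weight) are notation introduced here for illustration.

$$\hat{r}_{ui} = p_u^{\top} q_i, \qquad \min_{p_*,\, q_*} \sum_{(u,i) \in \mathcal{K}} \left( r_{ui} - p_u^{\top} q_i \right)^2 + \lambda \left( \lVert p_u \rVert^2 + \lVert q_i \rVert^2 \right)$$

Each observed score $r_{ui}$ is approximated by the inner product of two low-dimensional vectors, and unobserved scores are inferred from the same vectors.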

Model based on transfer learning:

Transfer learning mainly reuses an existing problem-solving model in a different but related setting: across several related tasks, the parameters of a trained and optimized model are transferred to assist the training of a new task model. The key to transfer learning is to identify the bridge between tasks, that is, to extract the knowledge common to different fields by some method and then transfer and reuse it. Under this framework, model training and iteration can be divided into two stages: first, the parameters within a single domain are updated; second, the parameters of the mapping function are adjusted.

Similarity weight calculation:

The similarity weight is a key factor in recommendation evaluation and affects the two most important indicators of a recommendation result: the recommendation performance of the system and the accuracy of the result.

Similarity calculation also rests on an assumption: similar user groups have similar item interests and, conversely, similar items tend to attract the same limited set of similar user groups in the system. One common method is to compute the cosine of the angle between the feature vectors of users or items, and to use that cosine value to evaluate the similarity between different objects in the system.
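The cosine measure described above can be sketched in a few lines of Python. The function name `cosine_similarity`, the zero-vector convention and the three example rating vectors are illustrative assumptions, not part of the patent.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention assumed here: a zero vector matches nothing
    return dot / (norm_a * norm_b)


# Hypothetical score vectors of three users over the same four items.
alice = [5, 3, 0, 1]
bob = [4, 3, 0, 1]
carol = [0, 0, 5, 4]

print(cosine_similarity(alice, bob) > cosine_similarity(alice, carol))
```

Users whose scores point in nearly the same direction get a value close to 1, while users with disjoint tastes get a value close to 0.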

In multi-target interest feature tracking and detection, most existing research handles problems in a single-domain scenario only. Research on adaptive classification of multi-target interest features in cross-domain scenarios with label attributes is lacking, information fusion is limited to a single source, and the efficiency of tracking and analyzing multi-target interest features on data with cross-domain attributes is limited.

For example: Wangxihua et al. proposed a social interest recommendation method and system based on graph convolution matrix decomposition, which recommends potential items to a user according to a user latent feature matrix and an item latent feature matrix (Chinese patent publication No. CN111523051A, 2020.02.24); Liu Fang ai et al. proposed an interest recommendation method and system based on users' sequential click behavior, which addresses the tendency of existing sequence recommendation methods to ignore the internal structure of user sequence behavior and the transition relationships between items (Chinese patent publication No. CN110807156A, 2019.10.23); Wei and Wei et al. proposed a collaborative filtering recommendation method based on single-source SimRank, which can compute single-source SimRank results on a large graph within an acceptable time and meets the requirements of real-time recommendation and interactive query (Chinese patent publication No. CN110287424A, 2019.06.28); Zhangruiro et al. proposed a topic label recommendation method based on deep learning, which classifies the extracted features into topic labels with a Support Vector Machine (SVM) model and expands the predicted topic labels by combining the word embedding model word2vec with the K-nearest-neighbor algorithm, making the labeling results more reliable (Chinese patent publication No. CN110297933A, 2019.07.01).

Disclosure of Invention

Purpose of the invention: in view of the problems in the prior art, the invention provides a cross-domain recommendation method based on topic labels, to solve the problem of user recommendation in cross-domain scenarios.

The technical scheme is as follows: in order to solve the technical problem, the invention provides a cross-domain recommendation method based on a subject label, which comprises the following steps:

(1) inputting a data sequence set carrying topic label information, and extracting from it a user set U, an item set V, an item label set T and an item scoring set R;

(2) reducing the noise introduced by low-correlation labels by filtering them out of the item label set T, to obtain a noise-reduced item label set Ts;

(3) performing adaptive feature extraction on the item label set Ts with an improved item-label collaborative filtering model, and fusing the low-dimensional feature vectors of the extracted item label information with a prediction model, to obtain an adaptive item scoring matrix decomposition model Ts_model;

(4) removing redundancy from the item label set Ts and clustering the label topics with an LDA topic model, which expands the semantics of the label topics while clustering, to obtain an improved model TFs_model that predicts on the basis of topic distribution;

(5) introducing a mapping of the dependency relationships between the associated topic labels of the cross-domain recommendation target domain and the auxiliary domain, extending the target user's recommendation scenario to multiple domains, and generating a cross-domain item score prediction and recommendation model CTFs_model;

(6) loading the cross-domain item score prediction and recommendation model CTFs_model; inputting the basic attribute set of characteristic behaviors of multi-target cross-domain users into the trained CTFs_model; opening the user cross-domain recommendation adaptive identification interface CTFs API; performing, through the CTFs API, interest association and discovery based on user network mining on the user's behavior parameters; refining and updating the user's interest feature model; and returning the result information to the calling program, the user obtaining the cross-domain recommendation result through a Web platform.

Further, the step (1) specifically includes the following steps:

(1.1) inputting a data set S with topic label information, defining a function len(S) as the length of the data set S, and letting $S = \{S_1, S_2, S_3, \dots, S_{\mathrm{len}(S)}\}$, where $S_i$ denotes the i-th data item in S, $i \in [1, \mathrm{len}(S)]$;

(1.2) defining a loop variable i1 for traversing S, $i1 \in [1, \mathrm{len}(S)]$, with i1 initialized to 1;

(1.3) if $i1 \le \mathrm{len}(S)$, proceeding to step (1.4), otherwise proceeding to step (1.10);

(1.4) extracting the user u in data item $S_i$ and merging it into the user set U, $U = \{u_1, u_2, u_3, \dots, u_I\}$, where I denotes the number of extracted users;

(1.5) extracting the item v in data item $S_i$ and merging it into the item set V, $V = \{v_1, v_2, v_3, \dots, v_J\}$, where J denotes the number of extracted items;

(1.6) extracting the label t in data item $S_i$ and merging it into the item label set T, $T = \{t_1, t_2, t_3, \dots, t_N\}$, where N denotes the number of extracted item labels;

(1.7) extracting the score in data item $S_i$ and merging it into the item scoring set R, $R = \{R_1, R_2, R_3, \dots, R_O\}$, where O denotes the number of extracted item scores;

(1.8) establishing a combined set UVTR, in which a user u in the user set U marks an item v in the item set V with a set of labels from the item label set T and a score from the item scoring set R; the "user-item-label-score" quadruple is written $(u, v, T_{ij}, R_{ij})$, where $u \in U$, $v \in V$, $R_{ij} \in R$, $T_{ij}$ denotes the set of labels that user $u_i$ attached to item $v_j$, and $R_{ij}$ denotes the score that $u_i$ gave to $v_j$;

(1.9) i1 = i1 + 1, and returning to step (1.3);

(1.10) finishing the preprocessing of the training data set.
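Steps (1.1) to (1.10) amount to a single pass over the input records. The following is a minimal sketch under the assumption that each data item is a tuple of user, item, list of labels and score; the patent does not fix a record layout, and the function name `extract_sets` and the sample data are hypothetical.

```python
def extract_sets(S):
    """Split records (user, item, labels, score) into U, V, T, R and UVTR.

    The record layout is an assumption; the source only states that each
    data item carries a user, an item, labels and a score.
    """
    U, V, T = {}, {}, {}   # dicts used as insertion-ordered sets
    R, UVTR = [], []
    for user, item, labels, score in S:   # steps (1.3)-(1.9): one pass over S
        U.setdefault(user, None)          # step (1.4)
        V.setdefault(item, None)          # step (1.5)
        for label in labels:              # step (1.6)
            T.setdefault(label, None)
        R.append(score)                   # step (1.7)
        UVTR.append((user, item, tuple(labels), score))  # step (1.8)
    return list(U), list(V), list(T), R, UVTR


# Hypothetical sample data.
S = [
    ("u1", "v1", ["sci-fi", "space"], 4.0),
    ("u2", "v1", ["space"], 5.0),
    ("u1", "v2", ["drama"], 3.0),
]
U, V, T, R, UVTR = extract_sets(S)
print(U, V, T, R)
```

Dictionaries stand in for ordered sets so that users, items and labels are de-duplicated while keeping first-seen order, which keeps the indices $u_i$, $v_j$ and $t_n$ stable.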

Further, the step (2) specifically includes the following steps:

(2.1) inputting the item label set T and the combined set UVTR;

(2.2) analyzing, on the principle that label data follow a power-law distribution, the label popularity k and the total number n(k) of labels whose popularity is k;

(2.3) defining a loop variable i2 for traversing T and a function len(T) as the length of the item label set T, $i2 \in [1, \mathrm{len}(T)]$, with i2 initialized to 1, where $t_{i2}$ is the i2-th item label in T;

(2.4) if $i2 \le \mathrm{len}(T)$, proceeding to step (2.5), otherwise proceeding to step (2.11);

(2.5) determining, by parameter tuning with a rank-sum test, a confidence level xCI within the confidence interval CI, and using xCI to evaluate the correlation between item labels and scores;

(2.6) analyzing the long-tail distribution of label popularity;

(2.7) analyzing how frequently each label occurs, defining labels whose frequency is below a set threshold as low-correlation labels, and filtering them out;

(2.8) analyzing and optimizing the noise introduced by the low-correlation labels;

(2.9) evaluating the rank distribution of the retained labels and scored samples after removal, together with the test results, to verify the result set obtained after removing the low-correlation labels;

(2.10) i2 = i2 + 1, and returning to step (2.4);

(2.11) obtaining a noise-reduced item label set Ts.
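The frequency-based part of this step, (2.2) and (2.7), can be sketched as below. The sketch assumes that popularity k is the number of times a label occurs in UVTR and that the threshold is a fixed parameter; the rank-sum-test tuning of step (2.5) and the verification of step (2.9) are not reproduced. The names `filter_low_correlation_labels` and `min_count` are hypothetical.

```python
from collections import Counter


def filter_low_correlation_labels(uvtr, min_count=2):
    """Return (Ts, popularity, n_k) for quadruples (user, item, labels, score).

    popularity[t] is how often label t occurs; n_k[k] is the number of
    labels whose popularity is k. Labels below min_count are dropped.
    """
    popularity = Counter(t for _, _, labels, _ in uvtr for t in labels)
    n_k = Counter(popularity.values())
    Ts = [t for t in popularity if popularity[t] >= min_count]
    return Ts, popularity, n_k


# Hypothetical quadruples; "asdf" stands for a noisy one-off label.
uvtr = [
    ("u1", "v1", ("sci-fi", "space"), 4.0),
    ("u2", "v1", ("space", "asdf"), 5.0),
    ("u3", "v1", ("space", "sci-fi"), 4.0),
    ("u1", "v2", ("drama",), 3.0),
]
Ts, popularity, n_k = filter_low_correlation_labels(uvtr)
print(Ts, dict(n_k))
```

On long-tailed label data most labels sit at small k, so n(k) gives a quick view of how much of the vocabulary a given threshold would remove.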

Further, the step (3) specifically includes the following steps:

(3.1) inputting the item label set Ts and the combined set UVTR, defining a function len(Ts) as the length of the item label set Ts, and defining a loop variable i3 for traversing Ts, $i3 \in [1, \mathrm{len}(Ts)]$, with i3 initialized to 1;

(3.2) if $i3 \le \mathrm{len}(Ts)$, proceeding to step (3.3), otherwise proceeding to step (3.17);

(3.3) analyzing the complete set of labels related to item i, and counting the labels t associated with item i, to obtain the key item label information;

(3.4) fusing the low-dimensional feature vectors of the extracted key item label information with a prediction model, analyzing the relevance between a given user u and item i, and performing score prediction;

(3.5) defining len(UVTR) as the length of the data set UVTR, and defining a loop variable j3 for traversing UVTR, $j3 \in [1, \mathrm{len}(UVTR)]$, with j3 initialized to 1;

(3.6) if $j3 \le \mathrm{len}(UVTR)$, proceeding to step (3.7);

(3.7) analyzing and extracting the latent feature vectors of user u, item i and label t;

(3.8) analyzing and evaluating the biases of user u and item i;

(3.9) minimizing the error by stochastic gradient descent;

(3.10) traversing all labels of the item under evaluation, and updating the label-domain information;

(3.11) j3 = j3 + 1, and returning to step (3.6);

(3.12) mapping the feature vectors of the label information into a low-dimensional space;

(3.13) fusing the low-dimensional feature vectors of the extracted key item label information with the prediction model;

(3.14) analyzing the relevance between the given user u and item i, and performing score prediction;

(3.15) obtaining the parameters after the traversal iterations, and tuning the model;

(3.16) i3 = i3 + 1, and returning to step (3.2);

(3.17) finishing model training, and obtaining the adaptive item scoring matrix decomposition model Ts_model.
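One way to picture a label-aware matrix decomposition of this kind is the sketch below. The source does not give the prediction formula, so the rule used here (global mean, user and item biases, and an item vector augmented by the average of its label vectors) is an assumption in the spirit of biased matrix factorization, and all names and hyperparameters are hypothetical.

```python
import random


def train_label_mf(ratings, item_labels, n_factors=8, lr=0.02, reg=0.05,
                   epochs=200, seed=0):
    """Train on (user, item, score) triples; returns predict(user, item).

    Assumed rule (not from the source):
        r_hat = mu + b_u + b_i + p_u . (q_i + mean of y_t over labels of i)
    """
    rng = random.Random(seed)

    def vec():
        return [rng.gauss(0, 0.1) for _ in range(n_factors)]

    mu = sum(r for _, _, r in ratings) / len(ratings)
    b_u = {u: 0.0 for u, _, _ in ratings}          # user biases, step (3.8)
    b_i = {i: 0.0 for _, i, _ in ratings}          # item biases, step (3.8)
    p = {u: vec() for u in b_u}                    # user latent vectors
    q = {i: vec() for i in b_i}                    # item latent vectors
    y = {t: vec() for i in b_i for t in item_labels.get(i, [])}  # label vectors

    def item_vector(i):
        labels = item_labels.get(i, [])
        z = list(q[i])
        for t in labels:
            for f in range(n_factors):
                z[f] += y[t][f] / len(labels)
        return z

    def predict(u, i):
        z = item_vector(i)
        return mu + b_u[u] + b_i[i] + sum(a * b for a, b in zip(p[u], z))

    for _ in range(epochs):
        order = ratings[:]
        rng.shuffle(order)
        for u, i, r in order:                      # SGD pass, step (3.9)
            z = item_vector(i)
            err = r - predict(u, i)
            b_u[u] += lr * (err - reg * b_u[u])
            b_i[i] += lr * (err - reg * b_i[i])
            labels = item_labels.get(i, [])
            for f in range(n_factors):
                pu, qi = p[u][f], q[i][f]
                p[u][f] += lr * (err * z[f] - reg * pu)
                q[i][f] += lr * (err * pu - reg * qi)
                for t in labels:                   # label update, step (3.10)
                    y[t][f] += lr * (err * pu / len(labels) - reg * y[t][f])
    return predict


# Hypothetical training data.
ratings = [("u1", "v1", 5), ("u1", "v2", 1), ("u2", "v1", 4),
           ("u2", "v3", 2), ("u3", "v2", 2), ("u3", "v3", 1)]
item_labels = {"v1": ["space", "sci-fi"], "v2": ["drama"],
               "v3": ["drama", "romance"]}
predict = train_label_mf(ratings, item_labels)
print(round(predict("u1", "v1"), 2), round(predict("u1", "v2"), 2))
```

The returned `predict` only covers users and items seen during training; handling unseen users is the cold-start case that the later cross-domain steps are meant to address.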

Further, the step (4) specifically includes the following steps:

(4.1) inputting an item label set Ts;

(4.2) clustering the label theme of the item label set Ts;

(4.3) defining len (Ts) as the length of Ts, defining a loop variable i4 for traversing Ts, i4 e [1, len (Ts) ], i4 assigning an initial value of 1;

(4.4) traversing Ts, if i4 is less than or equal to len (Ts), jumping to the step (4.5), and if not, ending the traversing Ts, and jumping to the step (4.19);

(4.5) performing redundancy removal on elements in the item label set Ts;

(4.6) expanding the semantics of the topic label while clustering through the LDA topic model;

(4.7) defining a user state flag, where flag = 1 indicates that a new user has entered;

(4.8) if the flag value is 1, skipping to the step (4.9), otherwise skipping to the step (4.17);

(4.9) analyzing and extracting the behavior characteristics of interest bias of the user;

(4.10) jumping to step (4.11) if the entering new user is already recorded in the auxiliary domain, otherwise jumping to step (4.18);

(4.11) extracting the behavior characteristics of each user through the model, updating the user interest model parameters, and perfecting the user model;

(4.12) calculating interest preference values of the user feature vectors in a plurality of dimensions;

(4.13) updating the user information according to the aggregation characteristics and the user model;

(4.14) skipping to the step (4.17) if the flag value is 1, otherwise skipping to the step (4.15);

(4.15) creating a behavior analysis model of the new user;

(4.16) jumping to step (4.18) if the user interest feature has been recorded, otherwise jumping to step (4.19);

(4.17) calculating the weight of the item theme characteristics and the probability of the item theme distributed on the theme interval;

(4.18) iteratively updating the model parameters and adaptively learning them by minimizing the regularized mean squared error with stochastic gradient descent;

(4.19) i4 = i4 + 1, go to step (4.4);

(4.20) obtaining the improved topic-distribution-based prediction model TFs_model.

Further, the step (5) specifically includes the following steps:

(5.1) introducing the associated topic labels of the cross-domain recommendation target domain and the auxiliary domain, and defining a cross-domain topic label set CTs, where CTs = {t_1, t_2, t_3, …, t_ctN} and ctN denotes the number of overlapping topic labels extracted from the different domains;

(5.2) mapping the dependency relationship of the associated subject labels of the target domain and the auxiliary domain, expanding the recommendation scene of the target user to multiple domains, analyzing the relation between users and articles among different domains, and solving the problems of data sparsity and system cold start encountered in cross-domain by using the characteristic attribute information of the auxiliary domain;

(5.3) loading the adaptive item scoring matrix decomposition model Ts_model, defining len(CTs) as the length of the cross-domain topic label set CTs, and defining a loop variable i5 for traversing CTs, i5 ∈ [1, len(CTs)], with i5 initialized to 1;

(5.4) if i5 ≤ len(CTs), then go to step (5.5), otherwise go to step (5.12);

(5.5) introducing cross-domain information by using the influence weight of the label;

(5.6) learning a degree of association of the score of the target domain with the tag through the auxiliary domain;

(5.7) processing the target domain by utilizing inter-domain transfer of the associated label semantic information;

(5.8) analyzing potential feature vectors of the users and the items in the recommendation domain;

(5.9) analyzing the theme distribution condition of the recommended domain label to obtain a characteristic matrix of the theme;

(5.10) iteratively updating the model and adaptively learning its parameters by minimizing the regularized error with stochastic gradient descent;

(5.11) i5 = i5 + 1, go to step (5.4);

(5.12) extending the adaptive item scoring matrix decomposition model Ts_model to the multi-domain scenario, to predict the interests of the user network at the cross-domain level and recommend cross-domain items;

(5.13) obtaining the cross-domain item score prediction and recommendation model CTFs_model.

Further, the step (6) specifically includes the following steps:

(6.1) opening the cross-domain recommendation adaptive identification interface CTFs API;

(6.2) creating the thread pool CTFs Thread Pool;

(6.3) judging whether all tasks of the CTFs Thread Pool are executed completely, if all tasks are executed completely, entering a step (6.9), otherwise, entering a step (6.4);

(6.4) receiving a data processing request from the terminal;

(6.5) a child thread, CTFs Child Thread, acquiring a task for processing;

(6.6) performing interest association and discovery on the behavior parameters of the user based on user network mining by using a cross-domain recommendation adaptive recognition interface (CTFs API), and perfecting and updating an interest characteristic model of the user;

(6.7) returning result information to the calling program, and obtaining a cross-domain recommendation result by the user through the Web platform;

(6.8) ending the child thread CTFs Child Thread, and entering step (6.3);

(6.9) closing the CTFs Thread Pool;

(6.10) the adaptive multi-target user network cross-domain interest association and recommendation ends.

Compared with the prior art, the invention has the following remarkable progress:

The method overcomes the limitations of the traditional collaborative filtering model. By combining an improved technique for identifying the attribute features of users and items in cross-domain scenarios, it effectively obtains the most accurate result set of predicted item scores for the target user. Introducing the topic-label algorithm model makes item score prediction and recommendation in cross-domain scenarios more accurate, which increases the practical value of cross-domain recommendation for the target user.

Drawings

FIG. 1 is a general flowchart of the cross-domain recommendation method based on topic labels;

FIG. 2 is a flowchart of the training data preprocessing in FIG. 1;

FIG. 3 is a flowchart of the relevance-based noise reduction of the label set in FIG. 1;

FIG. 4 is a flowchart of training the item scoring matrix decomposition model in FIG. 1;

FIG. 5 is a flowchart of training the improved topic-distribution-based prediction model in FIG. 1;

FIG. 6 is a flowchart of training the cross-domain item score prediction model in FIG. 1;

FIG. 7 is a flowchart of cross-domain user recommendation through the model's adaptive identification interface in FIG. 1.

Detailed Description

The technical scheme of the invention is further clarified by the following specific embodiments in combination with the attached drawings.

As shown in FIGS. 1 to 7, the cross-domain recommendation method based on topic labels according to the present invention includes the following steps:

Step 1: inputting a data sequence set with topic label information, and extracting from it a user set U, an item set V, an item label set T and an item score set R; the specific steps are as follows:

step 1.1: inputting a data set S with topic label information, defining the function len(S) as the length of the set S, and letting S = {S_1, S_2, S_3, …, S_i}, where S_i denotes the i-th data item in S, i ∈ [1, len(S)];

step 1.2: defining a loop variable i1 for traversing S, i1 ∈ [1, len(S)], with i1 initialized to 1;

step 1.3: if i1 ≤ len(S), then go to step 1.4, otherwise go to step 1.10;

step 1.4: extracting the user u from the data item S_i and merging it into the user set U, U = {u_1, u_2, u_3, …, u_I}, where I denotes the number of extracted users;

step 1.5: extracting the item v from the data item S_i and merging it into the item set V, V = {v_1, v_2, v_3, …, v_J}, where J denotes the number of extracted items;

step 1.6: extracting the label t from the data item S_i and merging it into the item label set T, T = {t_1, t_2, t_3, …, t_N}, where N denotes the number of extracted item labels;

step 1.7: extracting the score r from the data item S_i and merging it into the item score set R, R = {R_1, R_2, R_3, …, R_O}, where O denotes the number of extracted item scores;

step 1.8: establishing a combined set UVTR, in which a user u in the user set U annotates an item v in the item set V with a group of labels from T and a score from R; each user-item-label-score quadruple is (u, v, T_ij, R_ij), where u ∈ U, v ∈ V, R_ij ∈ R, T_ij denotes the set of labels that user u_I assigned to item v_J, and R_ij denotes the score that u_I gave to v_J in the system;

step 1.9: i1 = i1 + 1, go to step 1.3;

step 1.10: the preprocessing of the training data set ends.
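
Steps 1.1 to 1.10 can be sketched in a few lines of Python. This is only an illustrative reading of the text: the `Record` layout, the field names and the sample data are assumptions and do not come from the original disclosure.

```python
from typing import NamedTuple

class Record(NamedTuple):
    """One input item S_i (assumed layout: user, item, labels, score)."""
    user: str
    item: str
    labels: frozenset
    score: float

def preprocess(records):
    """Traverse S once and build U, V, T, R and the combined set UVTR."""
    users, items, labels, scores = set(), set(), set(), []
    uvtr = []
    for rec in records:                      # steps 1.3 to 1.9: loop over S
        users.add(rec.user)                  # step 1.4: merge u into U
        items.add(rec.item)                  # step 1.5: merge v into V
        labels.update(rec.labels)            # step 1.6: merge labels into T
        scores.append(rec.score)             # step 1.7: merge score into R
        uvtr.append((rec.user, rec.item, rec.labels, rec.score))  # step 1.8
    return users, items, labels, scores, uvtr

# Hypothetical sample data, for illustration only.
S = [
    Record("u1", "v1", frozenset({"sci-fi", "space"}), 4.5),
    Record("u1", "v2", frozenset({"romance"}), 2.0),
    Record("u2", "v1", frozenset({"sci-fi"}), 5.0),
]
U, V, T, R, UVTR = preprocess(S)
print(len(U), len(V), len(T), len(R), len(UVTR))  # prints: 2 2 3 3 3
```

Keeping R as a list rather than a set preserves one score per record, which matches the quadruple view of UVTR in step 1.8.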

Step 2: reducing the noise introduced by low-relevance labels by filtering them out of the item label set T, to obtain a noise-reduced label set Ts; the specific steps are as follows:

step 2.1: inputting the item label set T and the combined set UVTR;

step 2.2: analyzing the occurrence frequency and the popularity of the labels in the combined set UVTR, and evaluating the association between labels and item scores by comparing the distribution of the system's score samples with the label kept and with the label removed, using the Wilcoxon rank-sum test; the details are given in steps 2.3 to 2.12:

step 2.3: analyzing the label popularity k and the total number n(k) of label samples with popularity k, on the principle that label data follows a power-law distribution;

step 2.4: defining a loop variable i2 for traversing T, and defining the function len(T) as the length of the item label set T, i2 ∈ [1, len(T)], with i2 initialized to 1, where t_i2 is the i2-th item label in T;

step 2.5: if i2 ≤ len(T), then go to step 2.6, otherwise go to step 2.12;

step 2.6: using the rank-sum test, with the confidence level xCI of the confidence interval CI set to 95% by parameter tuning, and evaluating the correlation between the item label and the scores at confidence level xCI;

step 2.7: analyzing the long-tail distribution of label popularity;

step 2.8: analyzing the occurrence frequency of the labels, defining labels whose frequency is below a set threshold as low-relevance labels, and filtering them out;

step 2.9: analyzing and reducing the noise introduced by the low-relevance labels;

step 2.10: evaluating the rank distribution and the test results of the score samples with the label kept and with the label removed, and verifying the result set obtained after the low-relevance labels are removed;

step 2.11: i2 = i2 + 1, go to step 2.5;

step 2.12: obtaining the noise-reduced label set Ts.
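
The filtering in steps 2.6 to 2.10 could look roughly like the sketch below. It is written under several assumptions that are not in the original text: the "kept versus removed" comparison is read as comparing the scores of records that carry a label with the scores of records that do not, the rank-sum test uses the normal approximation without a tie correction, and `min_count`, `alpha` and the sample data are hypothetical.

```python
import math
from collections import Counter

def rank_sum_p_value(xs, ys):
    """Two-sided Wilcoxon rank-sum test (normal approximation).

    Tied values receive average ranks; the variance has no tie correction.
    """
    pooled = sorted((v, g) for g, sample in enumerate((xs, ys)) for v in sample)
    rank_sum_x, i = 0.0, 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2           # mean of the ranks i+1 .. j
        rank_sum_x += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    n, m = len(xs), len(ys)
    mean = n * (n + m + 1) / 2
    std = math.sqrt(n * m * (n + m + 1) / 12)
    z = (rank_sum_x - mean) / std
    return math.erfc(abs(z) / math.sqrt(2))

def filter_labels(uvtr, min_count=2, alpha=0.05):
    """Return Ts: frequent labels whose scores differ significantly."""
    counts = Counter(t for _, _, labels, _ in uvtr for t in labels)
    kept = set()
    for label, count in counts.items():
        if count < min_count:                # step 2.8: frequency threshold
            continue
        with_label = [r for _, _, ls, r in uvtr if label in ls]
        without = [r for _, _, ls, r in uvtr if label not in ls]
        # step 2.6: alpha = 0.05 corresponds to the 95% confidence level
        if without and rank_sum_p_value(with_label, without) < alpha:
            kept.add(label)
    return kept

# Hypothetical records (user, item, labels, score), for illustration only.
demo = [
    ("u1", "v1", {"great", "common"}, 5.0), ("u2", "v1", {"great"}, 4.5),
    ("u3", "v2", {"great", "common"}, 4.0), ("u4", "v2", {"great"}, 5.0),
    ("u5", "v3", {"great", "common"}, 4.5), ("u6", "v3", {"great", "rare"}, 4.0),
    ("u1", "v4", {"common"}, 1.0), ("u2", "v4", set(), 2.0),
    ("u3", "v5", {"common"}, 1.5), ("u4", "v5", set(), 2.5),
    ("u5", "v6", {"common"}, 2.0), ("u6", "v6", set(), 3.0),
]
Ts = filter_labels(demo)
print(Ts)  # prints: {'great'}
```

In this toy data "rare" is dropped by the frequency threshold, while "common" is dropped because its scores are statistically indistinguishable from the rest.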

Step 3: performing adaptive feature extraction on Ts by using an improved item label collaborative filtering model, and fusing the low-dimensional feature vectors of the extracted item label information with a prediction model, to obtain the adaptive item scoring matrix decomposition model Ts_model; the specific steps are as follows:

step 3.1: inputting the label set Ts and the combined set UVTR, defining the function len(Ts) as the length of the item label set Ts, and defining a loop variable i3 for traversing Ts, i3 ∈ [1, len(Ts)], with i3 initialized to 1;

step 3.2: if i3 is less than or equal to len (Ts), then step 3.3 is entered, otherwise step 3.17 is entered;

step 3.3: analyzing a label complete set related to the article i, and calculating the number of the labels t associated on the article i to obtain the label information of the key article;

step 3.4: fusing the extracted feature vectors of the key item label information under the low dimension with a prediction model, analyzing the relevance of a given user u and an item i, and carrying out grading prediction;

step 3.5: defining len(UVTR) as the length of the data set UVTR, and defining a loop variable j3 for traversing UVTR, j3 ∈ [1, len(UVTR)], with j3 initialized to 1;

step 3.6: if j3 is less than or equal to len (UVTR), then go to step 3.7;

step 3.7: analyzing and extracting potential feature vectors of the user u, the article i and the label t;

step 3.8: analyzing and evaluating the offset of the user u and the item i;

step 3.9: analyzing and minimizing the error using stochastic gradient descent;

step 3.10: traversing all the labels of the article in the evaluation item, and updating label domain information;

step 3.11: j3 = j3 + 1, go to step 3.6;

step 3.12: mapping the feature vector of the label information in a low dimension;

step 3.13: fusing the extracted feature vector of the key article label information under the low dimension with a prediction model;

step 3.14: analyzing the relevance of a given user u and an item i and carrying out grading prediction;

step 3.15: obtaining parameters after traversal iteration and adjusting and optimizing model information;

step 3.16: i3 = i3 + 1, go to step 3.2;

step 3.17: after the model training is finished, obtaining the adaptive item scoring matrix decomposition model Ts_model.
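
The text does not give the prediction formula, so the following sketch assumes an SVD++-style form in which the label vectors are averaged into the user vector: r_hat(u, i) = mu + b_u + b_i + q_i · (p_u + mean of y_t over the labels on the record). The function name, the hyperparameters and the sample data are hypothetical; the sketch only illustrates how latent vectors (step 3.7), offsets (step 3.8), stochastic gradient descent (step 3.9) and per-label updates (step 3.10) fit together.

```python
import math
import random

def train_label_mf(uvtr, k=4, lr=0.02, reg=0.05, epochs=200, seed=0):
    """Fit r_hat = mu + b_u + b_i + q_i . (p_u + mean of y_t) by SGD."""
    rng = random.Random(seed)

    def new_vec():
        return [rng.gauss(0.0, 0.1) for _ in range(k)]

    mu = sum(r for _, _, _, r in uvtr) / len(uvtr)
    bu, bi, p, q, y = {}, {}, {}, {}, {}
    for u, i, labels, _ in uvtr:
        bu[u], bi[i] = 0.0, 0.0              # step 3.8: user and item offsets
        if u not in p:
            p[u] = new_vec()                 # step 3.7: latent vectors
        if i not in q:
            q[i] = new_vec()
        for t in labels:
            if t not in y:
                y[t] = new_vec()

    def combined(u, labels):
        """User vector plus the mean of the label vectors on this record."""
        n = max(len(labels), 1)
        return [p[u][f] + sum(y[t][f] for t in labels) / n for f in range(k)]

    def predict(u, i, labels):
        z = combined(u, labels)
        return mu + bu[u] + bi[i] + sum(q[i][f] * z[f] for f in range(k))

    def rmse():
        sq = sum((r - predict(u, i, ls)) ** 2 for u, i, ls, r in uvtr)
        return math.sqrt(sq / len(uvtr))

    before = rmse()
    for _ in range(epochs):
        for u, i, labels, r in uvtr:         # steps 3.6 to 3.11: loop over UVTR
            z = combined(u, labels)
            e = r - predict(u, i, labels)
            n = max(len(labels), 1)
            bu[u] += lr * (e - reg * bu[u])
            bi[i] += lr * (e - reg * bi[i])
            for f in range(k):
                qf, pf = q[i][f], p[u][f]
                q[i][f] += lr * (e * z[f] - reg * qf)   # step 3.9: SGD update
                p[u][f] += lr * (e * qf - reg * pf)
                for t in labels:             # step 3.10: every label on the record
                    y[t][f] += lr * (e * qf / n - reg * y[t][f])
    return predict, before, rmse()

# Hypothetical quadruples (user, item, labels, score), for illustration only.
UVTR_demo = [
    ("u1", "v1", ("sci-fi", "space"), 5.0),
    ("u1", "v2", ("romance",), 2.0),
    ("u2", "v1", ("sci-fi",), 4.0),
    ("u2", "v3", ("romance", "drama"), 1.5),
    ("u3", "v2", ("romance",), 4.5),
    ("u3", "v3", ("drama",), 4.0),
]
predict, rmse_before, rmse_after = train_label_mf(UVTR_demo)
print(rmse_after < rmse_before)
```

The training error is reported before and after the SGD epochs so that the effect of steps 3.9 and 3.10 can be checked on whatever data is supplied.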

Step 4: removing redundancy from the item label set Ts, clustering the label topics, and expanding the semantics of the topic labels while clustering with the LDA topic model, to obtain the improved topic-distribution-based prediction model TFs_model; the specific steps are as follows:

step 4.1: inputting an item label set Ts;

step 4.2: clustering the label theme of the article label set Ts;

step 4.3: defining len(Ts) as the length of Ts, and defining a loop variable i4 for traversing Ts, i4 ∈ [1, len(Ts)], with i4 initialized to 1;

step 4.4: traversing Ts, if i4 is less than or equal to len (Ts), jumping to step 4.5, otherwise, ending traversing Ts, and jumping to step 4.19;

step 4.5: redundancy removal is carried out on elements in the item label set Ts;

step 4.6: expanding the semantics of the subject label while clustering through the LDA subject model;

step 4.7: defining a user state flag, where flag = 1 indicates that a new user has entered;

step 4.8: if the flag value is 1, skipping to the step 4.9, otherwise skipping to the step 4.17;

step 4.9: analyzing and extracting the behavior characteristics of interest bias of the user;

step 4.10: if the entering new user is already recorded in the auxiliary domain, jumping to step 4.11, otherwise jumping to step 4.18;

step 4.11: behavior characteristics of each user are extracted through the model, user interest model parameters are updated, and a user model is perfected;

step 4.12: calculating interest preference values of the user feature vectors on multiple dimensions;

step 4.13: updating the user information according to the aggregation characteristics and the user model;

step 4.14: if the flag value is 1, skipping to the step 4.17, otherwise skipping to the step 4.15;

step 4.15: creating a behavior analysis model of a new user;

step 4.16: if the user interest characteristic has been recorded, jumping to step 4.18, otherwise jumping to step 4.19;

step 4.17: calculating the weight of the item theme characteristics and the probability of the item theme distributed on the theme interval;

step 4.18: iteratively updating the model parameters and adaptively learning them by minimizing the regularized mean squared error with stochastic gradient descent;

step 4.19: i4 = i4 + 1, go to step 4.4;

step 4.20: obtaining the improved topic-distribution-based prediction model TFs_model.
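
To make the LDA clustering of steps 4.2, 4.6 and 4.17 more concrete, the sketch below treats the label list of each item as a "document" and runs a small collapsed Gibbs sampler. The original text does not state which inference method or hyperparameters are used, so the sampler, `alpha`, `beta`, the number of topics and the sample label lists are all assumptions.

```python
import random

def lda_gibbs(docs, n_topics=2, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA over per-item label lists."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    wid = {w: j for j, w in enumerate(vocab)}
    n_words = len(vocab)
    topics = list(range(n_topics))
    ndk = [[0] * n_topics for _ in docs]           # item-topic counts
    nkw = [[0] * n_words for _ in topics]          # topic-label counts
    nk = [0] * n_topics                            # labels assigned per topic
    z = []
    for d, doc in enumerate(docs):                 # random initial assignment
        z.append([rng.randrange(n_topics) for _ in doc])
        for w, k in zip(doc, z[d]):
            ndk[d][k] += 1
            nkw[k][wid[w]] += 1
            nk[k] += 1
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for n, w in enumerate(doc):
                k, j = z[d][n], wid[w]
                ndk[d][k] -= 1
                nkw[k][j] -= 1
                nk[k] -= 1
                weights = [(ndk[d][t] + alpha) * (nkw[t][j] + beta)
                           / (nk[t] + n_words * beta) for t in topics]
                k = rng.choices(topics, weights)[0]
                z[d][n] = k
                ndk[d][k] += 1
                nkw[k][j] += 1
                nk[k] += 1
    # theta: probability of each topic per item; phi: label weights per topic
    theta = [[(c + alpha) / (len(doc) + n_topics * alpha) for c in ndk[d]]
             for d, doc in enumerate(docs)]
    phi = [[(c + beta) / (nk[t] + n_words * beta) for c in nkw[t]]
           for t in topics]
    return theta, phi, vocab

# Hypothetical label lists, one per item, for illustration only.
item_labels = [
    ["sci-fi", "space", "robot"],
    ["space", "sci-fi", "alien"],
    ["romance", "drama", "love"],
    ["love", "romance", "wedding"],
]
theta, phi, vocab = lda_gibbs(item_labels)
for row in theta:
    print([round(x, 3) for x in row])
```

Each row of `theta` is one item's distribution over topics and each row of `phi` is one topic's weighting over labels, which is one way to read the "weight of the item theme characteristics" and the "probability of the item theme" in step 4.17.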

Step 5: introducing a mapping of the dependency relationship between the associated topic labels of the cross-domain recommendation target domain and the auxiliary domain, expanding the target user's recommendation scenario to multiple domains, and generating the cross-domain item score prediction and recommendation model CTFs_model;

step 5.1: introducing the associated topic labels of the cross-domain recommendation target domain and the auxiliary domain, and defining a cross-domain topic label set CTs, where CTs = {t_1, t_2, t_3, …, t_ctN} and ctN denotes the number of overlapping topic labels extracted from the different domains;

step 5.2: mapping the dependency relationship of the associated subject labels of the target domain and the auxiliary domain, expanding the recommendation scene of the target user to multiple domains, analyzing the relation between users and articles among different domains, and solving the problems of data sparsity and system cold start in cross-domain by using the characteristic attribute information of the auxiliary domain;

step 5.3: loading the single-domain model (namely the adaptive item scoring matrix decomposition model Ts_model), defining len(CTs) as the length of the cross-domain topic label set CTs, and defining a loop variable i5 for traversing CTs, i5 ∈ [1, len(CTs)], with i5 initialized to 1;

step 5.4: if i5 ≤ len(CTs), then go to step 5.5, otherwise go to step 5.12;

step 5.5: introducing cross-domain information by using the influence weight of the label;

step 5.6: learning, through the auxiliary domain, the degree of association between the scores of the target domain and the labels;

step 5.7: processing the target domain by utilizing inter-domain transfer of associated label semantic information;

step 5.8: analyzing potential feature vectors of users and articles in the recommendation domain;

step 5.9: analyzing the theme distribution condition of the recommended domain label to obtain a characteristic matrix of the theme;

step 5.10: iteratively updating the model and adaptively learning its parameters by minimizing the regularized error with stochastic gradient descent;

step 5.11: i5 = i5 + 1, go to step 5.4;

step 5.12: extending the adaptive item scoring matrix decomposition model Ts_model to the multi-domain scenario, to predict the interests of the user network at the cross-domain level and recommend cross-domain items;

step 5.13: obtaining the cross-domain item score prediction and recommendation model CTFs_model.
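
The role of the overlapping label set CTs in easing cold start can be illustrated with a deliberately simplified sketch. It is not the CTFs_model training procedure: instead of matrix factorization, it transfers a per-label mean score from the auxiliary domain to score target-domain items for a user who has no target-domain history. All function names and the movie/book sample data are hypothetical.

```python
from collections import defaultdict

def domain_labels(uvtr):
    """All labels that occur in one domain's quadruples."""
    return {t for _, _, labels, _ in uvtr for t in labels}

def overlapping_labels(target_uvtr, aux_uvtr):
    """CTs: labels that occur in both the target and the auxiliary domain."""
    return domain_labels(target_uvtr) & domain_labels(aux_uvtr)

def label_profile(aux_uvtr, user, cts):
    """Mean auxiliary-domain score the user gave to each overlapping label."""
    sums, counts = defaultdict(float), defaultdict(int)
    for u, _, labels, r in aux_uvtr:
        if u != user:
            continue
        for t in labels:
            if t in cts:
                sums[t] += r
                counts[t] += 1
    return {t: sums[t] / counts[t] for t in sums}

def cold_start_score(profile, item_labels, fallback):
    """Score a target-domain item from the transferred label profile."""
    hits = [profile[t] for t in item_labels if t in profile]
    return sum(hits) / len(hits) if hits else fallback

# Hypothetical data: movies as the auxiliary domain, books as the target domain.
movies = [
    ("u1", "m1", ("sci-fi", "space"), 5.0),
    ("u1", "m2", ("romance",), 2.0),
    ("u2", "m1", ("sci-fi",), 4.0),
]
books = [
    ("u2", "b1", ("sci-fi", "classic"), 4.5),
    ("u2", "b2", ("romance",), 3.0),
]
CTs = overlapping_labels(books, movies)
profile = label_profile(movies, "u1", CTs)        # u1 has no book ratings
book_mean = sum(r for *_, r in books) / len(books)
print(sorted(CTs))                                              # ['romance', 'sci-fi']
print(cold_start_score(profile, ("sci-fi", "classic"), book_mean))  # 5.0
print(cold_start_score(profile, ("classic",), book_mean))           # 3.75
```

Labels outside CTs ("space", "classic") carry no information across domains in this sketch, so an item described only by such labels falls back to the target-domain mean.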

Step 6: loading the cross-domain item score prediction and recommendation model CTFs_model, inputting the basic attribute set of the characteristic behaviors of multi-target cross-domain users into the trained adaptive recommendation model CTFs_model, and opening the user cross-domain recommendation adaptive identification interface CTFs API; through user network mining, the CTFs API performs interest association and discovery on the user's behavior parameters, refines and updates the user's interest feature model, and returns the result information to the calling program, so that the user can obtain the cross-domain recommendation result through the Web platform;

step 6.1: opening the cross-domain recommendation adaptive identification interface CTFs API;

step 6.2: creating the thread pool CTFs Thread Pool;

step 6.3: judging whether all tasks of the CTFs Thread Pool are executed completely, if all the tasks are executed completely, entering a step 6.9, otherwise, entering a step 6.4;

step 6.4: receiving a data processing request from a terminal;

step 6.5: a child thread, CTFs Child Thread, acquiring a task for processing;

step 6.6: the adaptive identification interface CTFs API carries out interest association and discovery on the behavior parameters of the user based on user network mining, and perfects and updates the interest characteristic model of the user;

step 6.7: returning result information to the calling program, and enabling a user to obtain a cross-domain recommendation result through a Web platform;

step 6.8: ending the child thread CTFs Child Thread, and going to step 6.3;

step 6.9: closing the CTFs Thread Pool;

step 6.10: the adaptive multi-target user network cross-domain interest association and recommendation ends.
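
The request-handling loop of steps 6.2 to 6.9 maps naturally onto a standard thread pool. The sketch below uses Python's `concurrent.futures.ThreadPoolExecutor`; the request format, the `handle_request` and `serve` names, and the stub scoring function that stands in for the loaded CTFs_model are all assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

def handle_request(request, model):
    """Child-thread task: rank the candidate items for one terminal request."""
    user, candidates = request["user"], request["candidates"]
    ranked = sorted(candidates, key=lambda item: model(user, item), reverse=True)
    return {"user": user, "recommendations": ranked[: request.get("top_n", 2)]}

def serve(requests, model, workers=4):
    """Create the pool, hand each request to a child thread, then close the pool."""
    with ThreadPoolExecutor(max_workers=workers) as pool:    # steps 6.2 and 6.9
        futures = [pool.submit(handle_request, req, model)   # steps 6.4 and 6.5
                   for req in requests]
        return [f.result() for f in futures]                 # step 6.7

# Stub scores standing in for the trained model (hypothetical values).
demo_scores = {
    ("u1", "b1"): 4.8, ("u1", "b2"): 2.1, ("u1", "b3"): 3.9,
    ("u2", "b1"): 1.5, ("u2", "b2"): 4.2, ("u2", "b3"): 3.0,
}

def stub_model(user, item):
    return demo_scores[(user, item)]

results = serve(
    [
        {"user": "u1", "candidates": ["b1", "b2", "b3"]},
        {"user": "u2", "candidates": ["b1", "b2", "b3"], "top_n": 1},
    ],
    stub_model,
)
print(results)
```

Collecting the futures in submission order keeps each response aligned with its request, and leaving the `with` block shuts the pool down once all tasks have finished, which corresponds to the check in step 6.3.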

To better illustrate the effectiveness of the method, several groups of comparative experiments were carried out on labeled score data taken from multiple public data sets and a self-built data set. The results show that, measured by the prediction error RMSE, the model's error is on average 4.27% lower than that of the traditional SVD model, which improves user interest recommendation and prediction in cross-domain scenarios.
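
For reference, RMSE and a relative reduction against a baseline can be computed as follows. The text does not say whether the 4.27% figure is a relative or an absolute difference, so the relative reading used here is an assumption, and the numbers are invented purely to show the arithmetic; they are not the experimental data.

```python
import math

def rmse(predicted, actual):
    """Root mean squared error between predicted and actual scores."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))

def relative_reduction(baseline_rmse, model_rmse):
    """Percentage by which model_rmse is lower than baseline_rmse."""
    return 100.0 * (baseline_rmse - model_rmse) / baseline_rmse

# Invented scores, for illustration only.
actual = [4.0, 3.0, 5.0, 2.0]
baseline_pred = [3.0, 3.0, 4.0, 3.0]
model_pred = [3.5, 3.0, 4.5, 2.5]

baseline = rmse(baseline_pred, actual)
model = rmse(model_pred, actual)
print(round(baseline, 4), round(model, 4), round(relative_reduction(baseline, model), 2))
# prints: 0.866 0.433 50.0
```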

The following table is a detailed description of all variables involved in the above procedure.

The invention can be combined with a computer system so as to predict and recommend the interest of the multi-target user.

The invention creatively provides a cross-domain recommendation method based on a subject label, and the optimal prediction result of the user item score is obtained through multiple experiments.

The cross-domain recommendation method based on topic labels can be used to track and classify user interest features in scored label sequences during score prediction for user item ratings in cross-domain scenarios such as movie ratings and book ratings in the multi-target social field, and can also be used to track and classify heterogeneous data from three or more domains.

An embodiment of the present invention discloses a computer program product, which includes a computer program stored on a non-transitory computer readable storage medium, the computer program including program instructions, when the program instructions are executed by a computer, the computer can execute the methods provided by the above method embodiments, for example, the method includes:

inputting a user label data set and extracting the required sets; filtering low-relevance labels out of the label set T to obtain Ts; fusing the low-dimensional feature vectors of the label information to obtain Ts_model; clustering Ts with the LDA topic model to obtain TFs_model; introducing the dependency mapping of multi-domain labels to generate the cross-domain model CTFs_model; and loading CTFs_model and opening the interface CTFs API to process terminal requests, obtaining a result set of processed cross-domain behavior features, storing it on a Web server, and returning the relevant recommendation and prediction information to the calling terminal.

Embodiments of the present invention provide a non-transitory computer-readable storage medium, which stores computer instructions, where the computer instructions cause the computer to perform the methods provided by the above method embodiments, for example, the methods include:

inputting a user label data set and extracting the required sets; filtering low-relevance labels out of the label set T to obtain Ts; fusing the low-dimensional feature vectors of the label information to obtain Ts_model; clustering Ts with the LDA topic model to obtain TFs_model; introducing the dependency mapping of multi-domain labels to generate the cross-domain model CTFs_model; and loading CTFs_model and opening the interface CTFs API to process terminal requests, obtaining a result set of processed cross-domain behavior features, storing it on a Web server, and returning the relevant recommendation and prediction information to the calling terminal.

Furthermore, the logic instructions in the memory may be implemented as software functional units and, when sold or used as a stand-alone product, stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the methods according to the embodiments of the present invention. The aforementioned storage media include a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, and other media capable of storing program code.

Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.

Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.
