User label updating method, device, equipment and medium based on artificial intelligence

Document No.: 1889428    Publication date: 2021-11-26

Reading note: the technology "User label updating method, device, equipment and medium based on artificial intelligence" was designed and created by 纪曾文 (Ji Zengwen) on 2021-08-31. Its main content is as follows. The invention discloses an artificial-intelligence-based user tag updating method, apparatus, device and storage medium, relating to artificial intelligence technology. A subscription tag subset is first obtained, and the embedded vector derived from the user's history data is clustered together with the embedded vector set of other users to obtain the target user sub-cluster to which the embedded vector belongs and the target user unique identification code set corresponding to that sub-cluster. Next, the popular user portrait tag set corresponding to the target user unique identification code set is obtained. Finally, the subscription tag subset is combined with the popular user portrait tag set to obtain the user's current optimal tag set corresponding to the user unique identification code. A user's tags thus include both fixed subscription tags and dynamic tags that follow the user's click behavior, so that content recommended based on user tags is both diverse and accurate.

1. An artificial-intelligence-based user tag updating method, characterized by comprising the following steps:

if a subscription tag set distribution instruction is detected, receiving a subscription tag subset uploaded by a user side, and acquiring user history data according to the user unique identification code of the user side;

calling a pre-trained deep semantic matching model and inputting the user history data into the deep semantic matching model for computation to obtain an embedded vector corresponding to the user history data;

acquiring the stored embedded vector set corresponding to the set of other users, clustering the embedded vector together with the embedded vector set to obtain user clusters, and acquiring from the user clusters the target user sub-cluster to which the embedded vector belongs and the target user unique identification code set corresponding to the target user sub-cluster;

acquiring the user portrait tag set corresponding to each user unique identification code in the target user unique identification code set, and counting the occurrences of each user portrait tag to obtain a user portrait tag statistical result;

sorting the user portrait tag statistical result in descending order of the count of each user portrait tag to obtain a user portrait tag ranking result, and taking the user portrait tags whose rank does not exceed a preset ranking threshold in the ranking result to form a popular user portrait tag set; and

combining the subscription tag subset with the popular user portrait tag set to obtain the user's current optimal tag set corresponding to the user unique identification code.

2. The artificial-intelligence-based user tag updating method according to claim 1, wherein receiving the subscription tag subset uploaded by the user side if the subscription tag set distribution instruction is detected, and acquiring the user history data according to the user unique identification code of the user side, comprises:

if a subscription tag set distribution instruction is detected, acquiring a stored subscription tag set and sending the subscription tag set to the user side;

receiving the subscription tag subset sent by the user side according to the subscription tag set, binding the subscription tag subset to the user unique identification code corresponding to the user side, and storing the binding locally; and

retrieving the corresponding user history data from a local user database according to the user unique identification code.

3. The artificial-intelligence-based user tag updating method according to claim 1, wherein acquiring the stored embedded vector set corresponding to the set of other users, clustering the embedded vector together with the embedded vector set to obtain user clusters, and acquiring from the user clusters the target user sub-cluster to which the embedded vector belongs and the target user unique identification code set corresponding to the target user sub-cluster comprises:

acquiring the stored embedded vector set corresponding to the set of other users, and performing K-means clustering on the embedded vector and the embedded vector set to obtain a number of user clusters equal to a preset number of cluster groups; and

taking the user cluster that contains the embedded vector as the target user sub-cluster, and acquiring the user unique identification codes corresponding to all embedded vectors in the target user sub-cluster to form the target user unique identification code set.

4. The artificial-intelligence-based user tag updating method according to claim 1, wherein calling the pre-trained deep semantic matching model and inputting the user history data into the deep semantic matching model for computation to obtain the embedded vector corresponding to the user history data comprises:

acquiring user dense features and user sparse features from the user history data;

inputting the user dense features into the input layer of the deep semantic matching model for one-hot encoding to obtain a first user encoding vector;

inputting the user sparse features into the input layer of the deep semantic matching model for word embedding processing to obtain a second user encoding vector;

concatenating (feature splicing) the first user encoding vector and the second user encoding vector to obtain a current encoding vector; and

inputting the current encoding vector into the representation layer of the deep semantic matching model for fully connected processing to obtain the embedded vector corresponding to the user history data.

5. The artificial-intelligence-based user tag updating method according to claim 3, wherein performing K-means clustering on the embedded vector and the embedded vector set to obtain a number of user clusters equal to the preset number of cluster groups comprises:

selecting, from the embedded vector set, as many embedded vectors as the preset number of cluster groups, and taking the selected embedded vectors as the initial cluster centers of the clusters;

partitioning the embedded vector set according to the cosine similarity between each embedded vector in the set and each initial cluster center to obtain an initial clustering result;

obtaining an adjusted cluster center for each cluster from the initial clustering result; and

repartitioning the embedded vectors of the embedded vector set according to their cosine similarity to the adjusted cluster centers, repeating until the clustering result has remained unchanged more than a preset number of times, thereby obtaining the user clusters.

6. The artificial-intelligence-based user tag updating method according to claim 1, further comprising, after combining the subscription tag subset with the popular user portrait tag set to obtain the user's current optimal tag set corresponding to the user unique identification code:

if a popular user portrait tag update instruction is detected, acquiring the user unique identification code, retrieving the corresponding current user data from the local user database, updating the user history data to the current user data, and returning to the step of calling the pre-trained deep semantic matching model and inputting the user history data into the deep semantic matching model for computation to obtain the embedded vector corresponding to the user history data.

7. The artificial-intelligence-based user tag updating method according to claim 1, wherein each subscription tag in the subscription tag subset corresponds to a tag fixed weight value and the sum of these values is recorded as a total tag fixed weight value; each popular user portrait tag in the popular user portrait tag set corresponds to a tag weight value and the sum of these values is recorded as a total tag change weight value; and the sum of the total tag change weight value and the total tag fixed weight value is 1.

8. An artificial-intelligence-based user tag updating apparatus, comprising:

a user history data acquisition unit, configured to, if a subscription tag set distribution instruction is detected, receive a subscription tag subset uploaded by a user side and acquire user history data according to the user unique identification code of the user side;

an embedded vector acquisition unit, configured to call a pre-trained deep semantic matching model and input the user history data into the deep semantic matching model for computation to obtain an embedded vector corresponding to the user history data;

a target identification code set acquisition unit, configured to acquire the stored embedded vector set corresponding to the set of other users, cluster the embedded vector together with the embedded vector set to obtain user clusters, and acquire from the user clusters the target user sub-cluster to which the embedded vector belongs and the target user unique identification code set corresponding to the target user sub-cluster;

a tag counting unit, configured to acquire the user portrait tag set corresponding to each user unique identification code in the target user unique identification code set and count the occurrences of each user portrait tag to obtain a user portrait tag statistical result;

a popular tag set acquisition unit, configured to sort the user portrait tag statistical result in descending order of the count of each user portrait tag to obtain a user portrait tag ranking result, and take the user portrait tags whose rank does not exceed a preset ranking threshold in the ranking result to form a popular user portrait tag set; and

an optimal tag set acquisition unit, configured to combine the subscription tag subset with the popular user portrait tag set to obtain the user's current optimal tag set corresponding to the user unique identification code.

9. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the artificial intelligence based user tag update method according to any one of claims 1 to 7 when executing the computer program.

10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program which, when executed by a processor, causes the processor to carry out the artificial intelligence based user tag updating method according to any one of claims 1 to 7.

Technical Field

The invention relates to the field of artificial intelligence, specifically intelligent decision-making, and in particular to an artificial-intelligence-based user tag updating method, apparatus, device and storage medium.

Background

At present, recommendation systems suffer from the Matthew effect: in an information-feed recommendation scenario, the content recommended to a user becomes narrower and narrower and the user's interest tags become more and more concentrated, which narrows the content recommended to the user still further, and the cycle repeats.

A common solution is to expand the user's interests to similar interests, or to explore the user's interests based on user EE (EE stands for Exploration and Exploitation, i.e. exploring and exploiting the user's interests). However, these methods still use the user's click behavior as feedback to up-weight the content the user clicked and then recommend the up-weighted content preferentially, so they fall into another Matthew effect and the recommended content again becomes narrower.

Disclosure of Invention

The embodiments of the present invention provide an artificial-intelligence-based user tag updating method, apparatus, device and storage medium, aiming to solve the following problem in the prior art: an information recommendation system uses the user's click behavior as feedback to up-weight the content the user clicked and recommends the up-weighted content preferentially, so that the recommended content concentrates more and more on a few key tags and the user cannot receive a broader range of recommended information.

In a first aspect, an embodiment of the present invention provides an artificial intelligence based user tag updating method, which includes:

if a subscription tag set distribution instruction is detected, receiving a subscription tag subset uploaded by a user side, and acquiring user history data according to the user unique identification code of the user side;

calling a pre-trained deep semantic matching model and inputting the user history data into the deep semantic matching model for computation to obtain an embedded vector corresponding to the user history data;

acquiring the stored embedded vector set corresponding to the set of other users, clustering the embedded vector together with the embedded vector set to obtain user clusters, and acquiring from the user clusters the target user sub-cluster to which the embedded vector belongs and the target user unique identification code set corresponding to the target user sub-cluster;

acquiring the user portrait tag set corresponding to each user unique identification code in the target user unique identification code set, and counting the occurrences of each user portrait tag to obtain a user portrait tag statistical result;

sorting the user portrait tag statistical result in descending order of the count of each user portrait tag to obtain a user portrait tag ranking result, and taking the user portrait tags whose rank does not exceed a preset ranking threshold in the ranking result to form a popular user portrait tag set; and

combining the subscription tag subset with the popular user portrait tag set to obtain the user's current optimal tag set corresponding to the user unique identification code.
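The counting, ranking and combining steps of the first aspect can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function names `popular_tags` and `current_optimal_tags` are hypothetical, each user's portrait tags are counted once per user, ties in the ranking are broken alphabetically, and "combining" is taken to mean a de-duplicated union with the subscription tags listed first. The text does not specify any of these details.

```python
from collections import Counter


def popular_tags(portrait_tags_by_uid, target_uids, rank_threshold):
    """Return the portrait tags whose rank does not exceed rank_threshold (hypothetical helper)."""
    counts = Counter()
    for uid in target_uids:
        # Count each tag at most once per user (assumption).
        counts.update(set(portrait_tags_by_uid.get(uid, [])))
    # Descending by count; the alphabetical tie-break is an assumption.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [tag for tag, _ in ranked[:rank_threshold]]


def current_optimal_tags(subscription_subset, popular):
    """Combine fixed subscription tags with popular tags, dropping duplicates (assumption)."""
    combined = list(subscription_subset)
    for tag in popular:
        if tag not in combined:
            combined.append(tag)
    return combined


# Hypothetical portrait tags of the users in one target sub-cluster.
portrait = {"u1": ["sports", "music"], "u2": ["music", "travel"], "u3": ["music", "sports"]}
hot = popular_tags(portrait, ["u1", "u2", "u3"], rank_threshold=2)
tags = current_optimal_tags(["finance", "music"], hot)
```

Here "music" (3 users) and "sports" (2 users) fall within the threshold, and the subscription tag "music" is not repeated in the combined set.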

In a second aspect, an embodiment of the present invention provides an artificial intelligence-based user tag updating apparatus, which includes:

a user history data acquisition unit, configured to, if a subscription tag set distribution instruction is detected, receive a subscription tag subset uploaded by a user side and acquire user history data according to the user unique identification code of the user side;

an embedded vector acquisition unit, configured to call a pre-trained deep semantic matching model and input the user history data into the deep semantic matching model for computation to obtain an embedded vector corresponding to the user history data;

a target identification code set acquisition unit, configured to acquire the stored embedded vector set corresponding to the set of other users, cluster the embedded vector together with the embedded vector set to obtain user clusters, and acquire from the user clusters the target user sub-cluster to which the embedded vector belongs and the target user unique identification code set corresponding to the target user sub-cluster;

a tag counting unit, configured to acquire the user portrait tag set corresponding to each user unique identification code in the target user unique identification code set and count the occurrences of each user portrait tag to obtain a user portrait tag statistical result;

a popular tag set acquisition unit, configured to sort the user portrait tag statistical result in descending order of the count of each user portrait tag to obtain a user portrait tag ranking result, and take the user portrait tags whose rank does not exceed a preset ranking threshold in the ranking result to form a popular user portrait tag set; and

an optimal tag set acquisition unit, configured to combine the subscription tag subset with the popular user portrait tag set to obtain the user's current optimal tag set corresponding to the user unique identification code.

In a third aspect, an embodiment of the present invention further provides a computer device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor, when executing the computer program, implements the artificial intelligence based user tag updating method according to the first aspect.

In a fourth aspect, the present invention further provides a computer-readable storage medium, where the computer-readable storage medium stores a computer program, and the computer program, when executed by a processor, causes the processor to execute the artificial intelligence based user tag updating method according to the first aspect.

The embodiments of the present invention provide an artificial-intelligence-based user tag updating method, apparatus, device and storage medium, with which a user's tags include both fixed subscription tags and dynamic tags that follow the user's click behavior, so that content recommended based on the user's tags is both diverse and accurate.

Drawings

To illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention; those skilled in the art can obtain other drawings from them without creative effort.

FIG. 1 is a schematic view of an application scenario of the artificial-intelligence-based user tag updating method according to an embodiment of the present invention;

FIG. 2 is a schematic flowchart of a method for updating a user tag based on artificial intelligence according to an embodiment of the present invention;

FIG. 3 is a schematic block diagram of an artificial intelligence-based user tag updating apparatus according to an embodiment of the present invention;

FIG. 4 is a schematic block diagram of a computer device provided by an embodiment of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

It will be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.

It is also to be understood that the terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in the specification of the present invention and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.

It should be further understood that the term "and/or" as used in this specification and the appended claims refers to and includes any and all possible combinations of one or more of the associated listed items.

Referring to fig. 1 and fig. 2, fig. 1 is a schematic view of an application scenario of a user tag updating method based on artificial intelligence according to an embodiment of the present invention; fig. 2 is a schematic flowchart of an artificial intelligence based user tag updating method according to an embodiment of the present invention, where the artificial intelligence based user tag updating method is applied to a server, and the method is executed by application software installed in the server.

As shown in fig. 2, the method includes steps S101 to S106.

S101, if a subscription tag set distribution instruction is detected, receiving a subscription tag subset uploaded by a user side, and acquiring user history data according to a user unique identification code of the user side.

In this embodiment, to make the technical solution easier to follow, the execution subject is described first. The technical solution is described with the server as the execution subject.

The server stores content data of multiple tag types (such as video data, text data, voice data and shopping product data). It also stores multiple user history data tables, each of which holds the history data of the user corresponding to one user unique identification code; analyzing the history data in a table yields that user's features, such as user portrait tags. The server further stores a DSSM (Deep Structured Semantic Model, i.e. a deep semantic matching model) that converts user history data into vectors. Such a model produces a low-dimensional semantic vector representation of a sentence (sentence embedding) and can predict the semantic similarity of two sentences. After the user history data are converted into embedded vectors, cluster analysis can be performed on the users' embedded vectors, and content can then be recommended according to the user tags.

The user side is a smart terminal used by the user (such as a smartphone or a tablet computer). After the user operates the user side to start a specified application (such as a video app, music app, reader app or online shopping app) and the application establishes communication with the server, the application collects the user behavior data generated as the user watches videos, listens to music, reads literary works, buys products and so on (such as the categories of products browsed or purchased and the types of content watched), uploads the data to the server, and the server stores them in the corresponding user history data table.

A subscription tag set distribution instruction indicates that the server intends to distribute a fixed subscription tag set to certain users. When the server detects the instruction, it first acquires the locally stored subscription tag set and then sends it to the user side. When the instruction is triggered, the target user sides that are to receive the subscription tag set may be selected, and the set is sent directly to those target user sides to extend their tags.

In one embodiment, step S101 includes:

if a subscription tag set distribution instruction is detected, acquiring a stored subscription tag set and sending the subscription tag set to the user side;

receiving the subscription tag subset sent by the user side according to the subscription tag set, binding the subscription tag subset to the user unique identification code corresponding to the user side, and storing the binding locally; and

retrieving the corresponding user history data from a local user database according to the user unique identification code.

In this embodiment, after the user side receives the subscription tag set sent by the server, the subscription tags selected for the user may be determined in either of the following ways:

First, the user side checks its locally stored user data for user tags identical to the subscription tags in the subscription tag set. Each matching subscription tag is selected automatically, and this is repeated over the whole set, so that the subscription tags determined from the locally stored user data form the subscription tag subset. For example, if the subscription tag set sent by the server contains 10 subscription tags and 3 of them are selected after this comparison, those 3 form the subscription tag subset, and the tag fixed weight value of each of them is determined from the local user data of the user side (for example, from the user's historical click frequency for each of the 3 subscription tags).

Second, the subscription tag set is displayed directly as tags on the user side interface for the user to select. After the user finishes selecting tags on the interface and sets a tag fixed weight value for each selected subscription tag, the selected subscription tags form the subscription tag subset.
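The first, automatic selection method can be sketched as below. The helper `select_subscription_subset` and its `total_fixed` budget are hypothetical: the sketch assumes a subscription tag counts as "stored locally" when the user has at least one recorded click on it, and that the tag fixed weight values are split in proportion to historical click counts so that they sum to a total below 1, as this section requires further on.

```python
def select_subscription_subset(subscription_tags, local_click_counts, total_fixed=0.6):
    """Hypothetical client-side selection of the subscription tag subset.

    Returns {subscription tag: tag fixed weight value}; total_fixed is an assumed budget.
    """
    if not 0 < total_fixed < 1:
        raise ValueError("total_fixed must lie strictly between 0 and 1")
    # Keep the subscription tags that the locally stored user data already contains.
    chosen = [tag for tag in subscription_tags if local_click_counts.get(tag, 0) > 0]
    total_clicks = sum(local_click_counts[tag] for tag in chosen)
    if total_clicks == 0:
        return {}
    # Split the fixed-weight budget in proportion to historical click frequency.
    return {tag: total_fixed * local_click_counts[tag] / total_clicks for tag in chosen}


offered = ["news", "sports", "music", "travel"]        # subscription tag set from the server
clicks = {"sports": 30, "music": 10, "cooking": 5}     # hypothetical local click history
subset = select_subscription_subset(offered, clicks)
```

"cooking" is ignored because the server did not offer it, and "news" and "travel" are dropped because the user has never clicked them.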

Once the subscription tag subset has been obtained on the user side, it is sent to the server. On receiving it, the server binds the subscription tag subset to the user unique identification code corresponding to the user side and stores the binding locally. The subscription tag subset contains at least one subscription tag, each subscription tag corresponds to one tag fixed weight value, and the sum of the tag fixed weight values of the subscription tags in the subset is recorded as the total tag fixed weight value. Once the subscription tag subset has been set and added to the corresponding user's tags, the weights of the subscription tags do not change as the user continues to use the user side to watch videos, listen to music, read literary works, buy products and so on. The server can therefore keep pushing content for the user's fixed tags, which avoids the situation in which the user clicks more and more content of one tag and the server subsequently recommends only content of that tag.

After the user's fixed interest tags have been set from the subscription tag subset, the total tag fixed weight value (the sum of the tag fixed weight values of the subscription tags in the subset) is kept below 1, that is, part of the weight space is reserved for the user's dynamic tags so that dynamic adjustment of the user's non-fixed tags is also taken into account. Then, to adjust the user's tags dynamically based on user history data, the server retrieves the corresponding user history data from the local user database according to the user unique identification code.
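A hypothetical server-side sketch of the binding and the weight budget follows. The class `SubscriptionStore` is an illustrative stand-in for the server's local storage and does not come from the source. It rejects subsets whose total tag fixed weight value is not below 1 and hands the remaining weight (the total tag change weight value of claim 7, so that fixed plus change equals 1) to popular tags. Splitting that remainder evenly across the popular tags is an assumption, since the text only fixes the total.

```python
class SubscriptionStore:
    """Hypothetical in-memory stand-in for the server's local binding storage."""

    def __init__(self):
        self._subsets = {}  # user unique identification code -> {tag: tag fixed weight value}

    def bind(self, uid, subset):
        """Bind a subscription tag subset to a user; return the total tag fixed weight value."""
        if not subset:
            raise ValueError("subset must contain at least one subscription tag")
        total_fixed = sum(subset.values())
        if not 0 < total_fixed < 1:
            raise ValueError("total tag fixed weight value must lie in (0, 1)")
        self._subsets[uid] = dict(subset)
        return total_fixed

    def change_budget(self, uid):
        """Weight reserved for dynamic tags: 1 - total tag fixed weight value."""
        return 1.0 - sum(self._subsets[uid].values())

    def weighted_tags(self, uid, popular):
        """Keep fixed weights unchanged; split the change budget evenly over new popular tags."""
        weights = dict(self._subsets[uid])
        extra = [tag for tag in popular if tag not in weights]
        if extra:
            share = self.change_budget(uid) / len(extra)
            for tag in extra:
                weights[tag] = share
        return weights


store = SubscriptionStore()
store.bind("u1", {"sports": 0.45, "music": 0.15})
weights = store.weighted_tags("u1", ["music", "travel", "food"])
```

If every popular tag is already a subscription tag, this sketch leaves the change budget unassigned; a real system would need a policy for that case.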

S102, calling a pre-trained deep semantic matching model, inputting the user historical data into the deep semantic matching model for operation, and obtaining an embedded vector corresponding to the user historical data.

In this embodiment, the server locally stores a pre-trained deep semantic matching model (i.e. a DSSM model). A typical DSSM model can be divided into three layers: an input layer, a representation layer and a matching layer.

User feature training data are fed to the input layer. The user features comprise user dense features (such as user gender; their dimensionality is low and they are present in every sample) and user sparse features (such as user preferences; their dimensionality is high but each value occurs in few samples). The user dense features are one-hot encoded, the user sparse features are reduced by embedding to a low-dimensional space (64 or 32 dimensions), and the two parts are then concatenated (feature splicing). The advertisement side (which may also be understood as the course side) is processed in the same way as the user side.
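The input layer described above can be sketched with NumPy. Assumptions: each dense feature is a categorical value given as an integer index, each sparse feature is an integer ID into its own embedding table, and the tables are randomly initialised here rather than trained. `input_layer` is a hypothetical name.

```python
import numpy as np


def one_hot(index, size):
    """One-hot encode a categorical (dense) feature value."""
    vec = np.zeros(size)
    vec[index] = 1.0
    return vec


def input_layer(dense_ids, dense_sizes, sparse_ids, embedding_tables):
    """Hypothetical DSSM input layer for one user: one-hot + embedding + concatenation."""
    first = np.concatenate([one_hot(i, n) for i, n in zip(dense_ids, dense_sizes)])
    # Embedding lookup maps each high-cardinality sparse ID to a low-dimensional row.
    second = np.concatenate([table[i] for i, table in zip(sparse_ids, embedding_tables)])
    return np.concatenate([first, second])  # feature splicing


rng = np.random.default_rng(0)
# One sparse field with 10,000 possible IDs embedded in 32 dimensions (random, untrained).
tables = [rng.normal(size=(10_000, 32))]
x = input_layer(dense_ids=[1], dense_sizes=[2], sparse_ids=[4242], embedding_tables=tables)
```

The resulting vector has 2 one-hot entries for the dense field followed by 32 embedding entries for the sparse field.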

The concatenated features are then fed to their respective deep neural networks. After passing through their own fully connected layers, the user features and the advertisement features are each converted into fixed-length vectors, yielding a user embedding and an ad embedding of the same dimension. The number and width of the layers inside each tower may differ, but the output dimensions must be identical so that the two embeddings can be compared in the matching layer.
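A two-tower sketch with NumPy, assuming ReLU hidden layers, a linear output layer and random untrained weights (the helper names `make_tower`, `run_tower` and `cosine` are hypothetical). It shows the constraint stated above: the towers may differ in depth and width, but both must end in the same output dimension for the matching layer's cosine similarity to be defined.

```python
import numpy as np


def make_tower(rng, sizes):
    """Random (untrained) weights for a stack of fully connected layers."""
    return [(rng.normal(scale=0.1, size=(m, n)), np.zeros(n)) for m, n in zip(sizes[:-1], sizes[1:])]


def run_tower(layers, x):
    """Forward pass: ReLU on hidden layers, linear output layer."""
    for k, (w, b) in enumerate(layers):
        x = x @ w + b
        if k < len(layers) - 1:
            x = np.maximum(x, 0.0)
    return x


def cosine(a, b):
    """Matching layer: cosine similarity of two equal-length embeddings."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


rng = np.random.default_rng(0)
OUT = 16  # both towers must end in this dimension
user_tower = make_tower(rng, [34, 128, 64, OUT])  # three fully connected layers
ad_tower = make_tower(rng, [20, 32, OUT])         # two layers with different widths
user_emb = run_tower(user_tower, rng.normal(size=34))
ad_emb = run_tower(ad_tower, rng.normal(size=20))
score = cosine(user_emb, ad_emb)
```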

After the model is trained, the user embeddings and ad embeddings are obtained. To find an audience for a specific advertisement, the cosine similarity between that advertisement's ad embedding and the user embedding of every user in the candidate population is computed, and the N users with the highest similarity (i.e. the closest) are selected as the advertisement's target audience, which completes the advertisement recommendation task.
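The audience selection step can be sketched as a top-N cosine-similarity search. `top_n_audience` is a hypothetical helper; it assumes the embeddings are already computed and that no embedding is the zero vector.

```python
import numpy as np


def top_n_audience(ad_emb, user_embs, user_ids, n):
    """Return the n users whose embeddings are most cosine-similar to the ad embedding."""
    a = ad_emb / np.linalg.norm(ad_emb)
    u = user_embs / np.linalg.norm(user_embs, axis=1, keepdims=True)
    sims = u @ a  # cosine similarity of every user to the ad
    order = np.argsort(-sims, kind="stable")[:n]
    return [user_ids[i] for i in order]


ad = np.array([1.0, 0.0])
users = np.array([[0.0, 1.0], [1.0, 0.1], [1.0, 1.0], [-1.0, 0.0]])
top = top_n_audience(ad, users, ["a", "b", "c", "d"], n=2)
```

Normalising once and taking a matrix-vector product scores the whole population in one pass.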

In this embodiment, only the input layer and the representation layer of the DSSM model are used: the user dense features and user sparse features in the user history data are fed in and processed separately to obtain the embedded vector corresponding to the user history data. Word embedding converts a word into a fixed-length vector representation, which makes it convenient to process mathematically. The embedded vectors of other users can be computed in the same way from their own history data.

In one embodiment, step S102 includes:

acquiring user dense features and user sparse features from the user history data;

inputting the user dense features into the input layer of the deep semantic matching model for one-hot encoding to obtain a first user encoding vector;

inputting the user sparse features into the input layer of the deep semantic matching model for word embedding processing to obtain a second user encoding vector;

concatenating (feature splicing) the first user encoding vector and the second user encoding vector to obtain a current encoding vector; and

inputting the current encoding vector into the representation layer of the deep semantic matching model for fully connected processing to obtain the embedded vector corresponding to the user history data.

In this embodiment, the matching layer of the DSSM model is not needed to obtain the user's embedded vector. The user history data are fed to the input layer of the DSSM model for one-hot encoding, word embedding processing and feature splicing to obtain the current encoding vector, which is then fed to the representation layer of the deep semantic matching model for fully connected processing to obtain the embedded vector corresponding to the user history data. The user's embedded vector can therefore be obtained quickly. Word embedding processing reduces the dimensionality of sparse high-dimensional feature vectors and maps them to intermediate features.
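A sketch of the user-side path used in this embodiment (input layer plus representation layer, no matching layer) is shown below. The class `UserEmbedder`, its layer sizes, the single ReLU hidden layer without bias terms, and the random weights standing in for a trained model are all assumptions for illustration.

```python
import numpy as np


class UserEmbedder:
    """Hypothetical user tower of a DSSM-style model: input + representation layers only."""

    def __init__(self, dense_sizes, sparse_vocab_sizes, emb_dim=32, hidden=64, out_dim=16, seed=0):
        rng = np.random.default_rng(seed)  # random placeholders for trained weights
        self.dense_sizes = dense_sizes
        self.tables = [rng.normal(scale=0.1, size=(v, emb_dim)) for v in sparse_vocab_sizes]
        in_dim = sum(dense_sizes) + emb_dim * len(sparse_vocab_sizes)
        self.w1 = rng.normal(scale=0.1, size=(in_dim, hidden))
        self.w2 = rng.normal(scale=0.1, size=(hidden, out_dim))

    def embed(self, dense_ids, sparse_ids):
        # First user encoding vector: one-hot encoding of the dense features.
        first = np.concatenate([np.eye(n)[i] for i, n in zip(dense_ids, self.dense_sizes)])
        # Second user encoding vector: embedding lookup for the sparse features.
        second = np.concatenate([t[i] for i, t in zip(sparse_ids, self.tables)])
        current = np.concatenate([first, second])    # feature splicing
        hidden = np.maximum(current @ self.w1, 0.0)  # fully connected layer + ReLU
        return hidden @ self.w2                      # embedded vector of the user


embedder = UserEmbedder(dense_sizes=[2, 5], sparse_vocab_sizes=[1000])
vec = embedder.embed(dense_ids=[1, 3], sparse_ids=[42])
```

The same `embed` call would be run for every other user to build the embedded vector set used for clustering.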

S103, acquiring the stored embedded vector set corresponding to the other users, clustering the embedded vector together with the embedded vector set to obtain user clusters, and acquiring, from the user clusters, the target user clustering sub-cluster to which the embedded vector belongs and the target user unique identification code set corresponding to that sub-cluster.

In this embodiment, once the embedded vectors of the users have been obtained and collected into the embedded vector set, the server can cluster them to obtain a plurality of user clusters, identify the target user clustering sub-cluster to which the user's embedded vector belongs, and obtain the target user unique identification code set corresponding to that sub-cluster.

In one embodiment, step S103 includes:

acquiring the stored embedded vector set corresponding to the other users, and performing K-means clustering on the embedded vector and the embedded vector set to obtain as many user clusters as the preset number of cluster groups;

and taking the user clustering sub-cluster to which the embedded vector belongs as the target user clustering sub-cluster, and acquiring the user unique identification codes corresponding to all embedded vectors in the target user clustering sub-cluster to form the target user unique identification code set.

In this embodiment, K-means clustering is performed on the embedded vector and the embedded vector set to obtain the user clusters. The user clustering sub-cluster to which the embedded vector belongs is then taken directly, which determines the sub-cluster the user falls into, and the user unique identification codes corresponding to the embedded vectors in that target user clustering sub-cluster form the target user unique identification code set.
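Once every embedded vector has a cluster label, building the target user unique identification code set is a simple filter. The sketch below assumes, hypothetically, that the ids and the cluster labels are held in two parallel lists; since the user's own vector is clustered along with the others, the user's id is part of the result.

```python
def target_uid_set(user_uid, uids, assignments):
    """Return every unique id in the same cluster as user_uid (the user included)."""
    position = {uid: i for i, uid in enumerate(uids)}
    target_cluster = assignments[position[user_uid]]
    return {uid for uid, c in zip(uids, assignments) if c == target_cluster}

# Hypothetical data: uids[i] owns the i-th embedded vector, assignments[i] is its cluster
uids = ["u1", "u2", "u3", "u4", "u5"]
assignments = [0, 1, 0, 2, 0]
print(sorted(target_uid_set("u3", uids, assignments)))  # ['u1', 'u3', 'u5']
```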

In an embodiment, performing K-means clustering on the embedded vector and the embedded vector set to obtain as many user clusters as the preset number of cluster groups includes:

selecting, from the embedded vector set, as many embedded vectors as the preset number of cluster groups, and taking the selected embedded vectors as the initial clustering centers of the clusters;

partitioning the embedded vector set according to the cosine similarity between each embedded vector in the set and each initial clustering center to obtain an initial clustering result;

obtaining the adjusted clustering center of each cluster according to the initial clustering result;

and repartitioning the embedded vectors of the embedded vector set according to their cosine similarity to the adjusted clustering centers, repeating until the clustering result has remained unchanged for more than a preset number of iterations, thereby obtaining the user clusters.

In this embodiment, the embedded vector set can be clustered with the K-means method as follows:

a) Randomly select N2 embedded vectors from the embedded vector set containing N1 embedded vectors and use them as the initial clustering centers of N2 clusters. Here N1 is the total number of embedded vectors in the set, and N2 (N2 < N1) is the preset number of cluster groups.

b) Calculate the cosine similarity between each remaining embedded vector and each of the N2 initial clustering centers, and assign each remaining embedded vector to the cluster whose center has the highest cosine similarity to it, which gives the initial clustering result. In other words, each embedded vector is grouped with its closest initial clustering center, so the embedded vectors are divided into N2 clusters, each built around one of the initially selected centers.

c) Recalculate the clustering centers of the N2 clusters according to the initial clustering result.

d) Reassign all N1 embedded vectors according to the new clustering centers.

e) Repeat steps c) and d) until the clustering result no longer changes, which yields a clustering result with the preset number of clusters.

After clustering is completed, the embedded vector set has been grouped into a plurality of clusters, which form the user clusters.
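Steps a) to e) can be sketched as a small cosine-similarity K-means in plain Python. This is an illustrative sketch, not the patent's implementation: the function names, the `patience` parameter (the "preset number of iterations" for which the result must stay unchanged), `max_iter`, and the optional `init` indices used to make the example deterministic are all assumptions. The centre update uses the plain mean of the member vectors, which is one reasonable reading of "recalculate the clustering centers"; normalising the vectors first (spherical k-means) is a common alternative.

```python
import math
import random

def cosine(a, b):
    """Cosine similarity; defined as 0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def centroid(members):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(members) for col in zip(*members)]

def cosine_kmeans(vectors, n_clusters, patience=1, max_iter=100, seed=0, init=None):
    """K-means where each vector joins the centre with the HIGHEST cosine similarity.

    Stops once the assignment has stayed unchanged for `patience` consecutive
    rounds (patience=1 means "no longer changes") or after max_iter rounds.
    """
    rng = random.Random(seed)
    # a) choose N2 of the N1 vectors (randomly unless init indices are given)
    chosen = init if init is not None else rng.sample(range(len(vectors)), n_clusters)
    centres = [list(vectors[i]) for i in chosen]
    assignments, stable = None, 0
    for _ in range(max_iter):
        # b) / d) assign every vector to its most similar centre
        new = [max(range(n_clusters), key=lambda c: cosine(v, centres[c])) for v in vectors]
        stable = stable + 1 if new == assignments else 0
        assignments = new
        if stable >= patience:
            break
        # c) recompute each centre from its members; keep the old centre if empty
        for c in range(n_clusters):
            members = [v for v, a in zip(vectors, assignments) if a == c]
            if members:
                centres[c] = centroid(members)
    return assignments, centres

vectors = [[1.0, 0.1], [0.9, 0.2], [1.0, 0.0], [0.1, 1.0], [0.0, 0.9], [0.2, 1.0]]
labels, _ = cosine_kmeans(vectors, n_clusters=2, init=[0, 3])
print(labels)  # [0, 0, 0, 1, 1, 1]
```

Note that assignment picks the maximum cosine similarity: a higher cosine similarity means two vectors are closer in direction.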

S104, acquiring the user portrait tag set corresponding to each user unique identification code in the target user unique identification code set, and counting the occurrences of each user portrait tag to obtain a user portrait tag statistical result.

In this embodiment, the server counts every user portrait tag contained in the user portrait tag sets of the users in the target user unique identification code set, together with the number of times each tag occurs, which completes the statistics.
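The counting step maps naturally onto `collections.Counter`. In this sketch the uid-to-tags dictionary is hypothetical sample data, and each tag is counted at most once per user because a user portrait tag set holds each tag only once.

```python
from collections import Counter

def count_portrait_tags(target_uids, portrait_tags):
    """Count, over the target users, how many users carry each portrait tag."""
    counts = Counter()
    for uid in target_uids:
        counts.update(set(portrait_tags.get(uid, ())))  # each tag once per user
    return counts

portrait_tags = {  # hypothetical uid -> portrait tag set
    "u1": {"sports", "music"},
    "u2": {"finance"},
    "u3": {"music", "travel"},
    "u5": {"music", "sports"},
}
counts = count_portrait_tags({"u1", "u3", "u5"}, portrait_tags)
print(counts.most_common(1))  # [('music', 3)]
```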

S105, sorting the user portrait tag statistical result in descending order of tag count to obtain a user portrait tag sorting result, and taking the user portrait tags whose rank does not exceed a preset ranking threshold in the sorting result to form a popular user portrait tag set.

In this embodiment, the user portrait tags whose rank does not exceed the preset ranking threshold in the sorting result are taken as popular user portrait tags and form the popular user portrait tag set. In effect, the user is assigned to a user group based on the user history data, and the user's dynamic tags are then continually adjusted according to the popular tags within that group.
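Sorting in descending order and keeping the tags within the ranking threshold can be sketched as follows. Breaking ties alphabetically is an assumption made here only so the output is deterministic; the source does not specify a tie rule. The counts are kept in the result because they can be reused when weighting the dynamic tags.

```python
def popular_tags(tag_counts, rank_threshold):
    """Sort tags by count, highest first, and keep those ranked within the threshold."""
    # Ties are broken alphabetically so the result is deterministic (an assumption).
    ranked = sorted(tag_counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return {tag: count for tag, count in ranked[:rank_threshold]}

print(popular_tags({"music": 3, "sports": 2, "travel": 1, "food": 2}, 2))
# {'music': 3, 'food': 2}
```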

S106, combining the subscription tag subset with the popular user portrait tag set to obtain the current optimal tag set of the user corresponding to the user unique identification code.

In this embodiment, once the fixed user tags corresponding to the subscription tag subset and the dynamic tags corresponding to the popular user portrait tag set have been obtained, the two sets are combined to obtain the user's current optimal tag set corresponding to the user unique identification code.

The sum of the tag weight values of all popular user portrait tags in the popular user portrait tag set is recorded as the total tag change weight value, and the total tag change weight value and the total tag fixed weight value sum to 1. Because all the weights in the current optimal tag set sum to 1, the server can allocate the content it provides to the user according to the weight of each tag in that set.
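The weight constraint can be illustrated with a small merge function. The source fixes only the totals (fixed total below 1, change total equal to 1 minus the fixed total); splitting the change total among popular tags in proportion to their counts, and adding the weights when a popular tag is also subscribed, are assumptions of this sketch.

```python
def current_optimal_tags(subscription_weights, popular_counts):
    """Merge fixed subscription tags with popular tags so all weights sum to 1.

    Assumption: the total change weight (1 - total fixed weight) is split among
    popular tags in proportion to their counts; the source does not say how.
    """
    fixed_total = sum(subscription_weights.values())
    if not 0 <= fixed_total < 1:
        raise ValueError("total fixed weight must be in [0, 1)")
    if not popular_counts:
        raise ValueError("popular tag set must not be empty")
    change_total = 1.0 - fixed_total
    count_sum = sum(popular_counts.values())
    merged = dict(subscription_weights)
    for tag, count in popular_counts.items():
        # If a popular tag is also subscribed, its two weights are added (assumption).
        merged[tag] = merged.get(tag, 0.0) + change_total * count / count_sum
    return merged

tags = current_optimal_tags({"news": 0.3, "tech": 0.2}, {"music": 3, "sports": 1})
print(tags)  # {'news': 0.3, 'tech': 0.2, 'music': 0.375, 'sports': 0.125}
```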

In an embodiment, step S106 is followed by:

if a popular user portrait tag update instruction is detected, acquiring the user unique identification code, retrieving the corresponding current user data from the local user database, updating the user history data with the current user data, and returning to the step of calling the pre-trained deep semantic matching model and inputting the user history data into the deep semantic matching model to obtain the embedded vector corresponding to the user history data.

In this embodiment, to adjust the user's popular user portrait tag set dynamically, the server may also trigger the generation of a popular user portrait tag update instruction periodically (for example, the instruction is generated automatically at 1:00 a.m. once every natural month). The server then obtains the current user data accumulated for the user over the most recent natural month, updates the user history data with it, and returns to the step of calling the pre-trained deep semantic matching model and inputting the user history data into it to obtain the embedded vector corresponding to the user history data. In this way, the popular user portrait tags are adjusted regularly, which prevents the tags from exhibiting a Matthew effect.
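The monthly trigger and the "most recent natural month" data window can be computed with the standard `datetime` module. Running the trigger on the 1st day of each month is an assumption (the source only says 1:00 a.m. once per natural month), and both function names are hypothetical.

```python
from datetime import datetime

def next_trigger(now):
    """Next update time: 01:00 on the 1st of a month (the day is an assumption)."""
    candidate = now.replace(day=1, hour=1, minute=0, second=0, microsecond=0)
    if candidate <= now:
        if candidate.month == 12:
            candidate = candidate.replace(year=candidate.year + 1, month=1)
        else:
            candidate = candidate.replace(month=candidate.month + 1)
    return candidate

def previous_month_window(trigger):
    """[start, end) of the natural month before the trigger; its data becomes the new history."""
    end = trigger.replace(day=1, hour=0, minute=0, second=0, microsecond=0)
    if end.month == 1:
        start = end.replace(year=end.year - 1, month=12)
    else:
        start = end.replace(month=end.month - 1)
    return start, end

t = next_trigger(datetime(2021, 12, 15, 9, 30))
print(t)  # 2022-01-01 01:00:00
start, end = previous_month_window(t)
```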

The embodiments of the present application can acquire and process the relevant data based on artificial intelligence technology. Artificial Intelligence (AI) is the theory, methods, techniques, and application systems that use digital computers, or machines controlled by digital computers, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to obtain the best results.

Basic artificial intelligence technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, operation/interaction systems, and mechatronics. Artificial intelligence software technologies mainly include computer vision, robotics, biometric recognition, speech processing, natural language processing, and machine learning/deep learning.

With this method, a user's tags include both fixed subscription tags and dynamic tags that follow the user's click behavior, which improves both the diversity and the accuracy of content recommended based on user tags.

The embodiment of the invention also provides a user label updating device based on artificial intelligence, which is used for executing any embodiment of the user label updating method based on artificial intelligence. Specifically, referring to fig. 3, fig. 3 is a schematic block diagram of an artificial intelligence-based user tag updating apparatus according to an embodiment of the present invention. The artificial intelligence based user tag updating apparatus 100 may be configured in a server.

The server may be an independent server, or may be a cloud server that provides basic cloud computing services such as a cloud service, a cloud database, cloud computing, a cloud function, cloud storage, a network service, cloud communication, a middleware service, a domain name service, a security service, a Content Delivery Network (CDN), and a big data and artificial intelligence platform.

As shown in fig. 3, the artificial intelligence based user tag updating apparatus 100 includes: a user history data acquisition unit 101, an embedded vector acquisition unit 102, a target identification code set acquisition unit 103, a tag statistics unit 104, a hot tag set acquisition unit 105, and an optimal tag set acquisition unit 106.
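To make the division of labour among units 101 to 106 concrete, the apparatus can be sketched as a small class that runs the units in order. This is a hypothetical wiring for illustration only: the class name `UserTagUpdater`, its field names, and the choice to inject each unit as a callable are assumptions, not the patent's implementation. In practice each callable would be backed by logic such as the DSSM user tower and the clustering described for the method.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set

@dataclass
class UserTagUpdater:
    """Hypothetical wiring of units 101-106; each unit is injected as a callable."""
    get_history: Callable[[str], dict]                                       # unit 101
    embed: Callable[[dict], List[float]]                                     # unit 102
    target_uids: Callable[[List[float]], Set[str]]                           # unit 103
    count_tags: Callable[[Set[str]], Dict[str, int]]                         # unit 104
    select_popular: Callable[[Dict[str, int]], Dict[str, int]]               # unit 105
    combine: Callable[[Dict[str, float], Dict[str, int]], Dict[str, float]]  # unit 106

    def run(self, uid: str, subscription_subset: Dict[str, float]) -> Dict[str, float]:
        history = self.get_history(uid)          # 101: history for the user's unique id
        vector = self.embed(history)             # 102: DSSM user-tower embedding
        uids = self.target_uids(vector)          # 103: ids in the same cluster
        counts = self.count_tags(uids)           # 104: portrait-tag statistics
        popular = self.select_popular(counts)    # 105: top-ranked tags
        return self.combine(subscription_subset, popular)  # 106: optimal tag set

# Usage with stub units standing in for the real ones
updater = UserTagUpdater(
    get_history=lambda uid: {"uid": uid},
    embed=lambda history: [1.0, 0.0],
    target_uids=lambda vec: {"u1", "u3"},
    count_tags=lambda uids: {"music": len(uids)},
    select_popular=lambda counts: counts,
    combine=lambda subs, pop: {**subs, **{t: 0.5 for t in pop}},
)
print(updater.run("u1", {"news": 0.5}))  # {'news': 0.5, 'music': 0.5}
```

Injecting the units keeps each one independently replaceable and testable, mirroring the apparatus's unit-by-unit structure.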

The user history data obtaining unit 101 is configured to receive the subscription tag subset uploaded by the user side if the subscription tag set distribution instruction is detected, and obtain the user history data according to the user unique identification code of the user side.

In this embodiment, when the server detects a subscription tag set distribution instruction, this indicates that a fixed subscription tag set is to be distributed to certain users; the server first obtains the locally stored subscription tag set and then sends it to the user side. When the distribution instruction is triggered, the target user sides that are to receive the subscription tag set can be selected, and the set is sent directly to those target user sides to extend their tags.

In one embodiment, the user history data obtaining unit 101 includes:

the subscription tag set sending unit is used for acquiring a stored subscription tag set and sending the subscription tag set to a user side if a subscription tag set distribution instruction is detected;

the subscription tag subset storage unit is used for receiving the subscription tag subset sent by the user side according to the subscription tag set, mapping and binding the subscription tag subset to the user unique identification code corresponding to the user side, and storing the binding locally;

and the historical data retrieval unit is used for retrieving and acquiring corresponding user historical data in the local user database according to the unique user identification code.

In this embodiment, when the user side receives the subscription tag set sent by the server, the subscription tags selected for the user can be determined in either of the following ways:

First, the user side directly checks whether its locally stored user data contains user tags identical to the subscription tags in the subscription tag set. If it does, those subscription tags are selected automatically, and this is repeated for every tag in the set so that the subscription tags determined from the locally stored user data form the subscription tag subset. For example, if the subscription tag set originally sent by the server contains 10 subscription tags and 3 of them are selected by this comparison to form the subscription tag subset, the tag fixed weight values of these 3 subscription tags are determined from the user side's local user data (for example, how often the user has historically clicked on each of the 3 tag types).
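The first, automatic way can be sketched as follows, under stated assumptions: the local user data is reduced to a hypothetical tag-to-click-count dictionary, and the hypothetical `fixed_budget` parameter is the total tag fixed weight, which the source requires to be below 1. Each selected tag's fixed weight is proportional to its historical click count.

```python
def auto_select_subscription(offered_tags, local_click_counts, fixed_budget):
    """Select offered tags the user already has locally, weighted by click frequency.

    fixed_budget (hypothetical) is the total tag fixed weight; it must be < 1.
    """
    if not 0 < fixed_budget < 1:
        raise ValueError("fixed_budget must lie in (0, 1)")
    chosen = {t: local_click_counts[t] for t in offered_tags if local_click_counts.get(t, 0) > 0}
    total_clicks = sum(chosen.values())
    return {t: fixed_budget * c / total_clicks for t, c in chosen.items()}

offered = ["news", "tech", "travel", "cars", "games", "food", "film", "books", "health", "pets"]
clicks = {"news": 6, "tech": 3, "travel": 1, "cooking": 5}  # "cooking" was not offered
subset = auto_select_subscription(offered, clicks, fixed_budget=0.6)
print(sorted(subset))  # ['news', 'tech', 'travel']
```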

Second, the subscription tag set is displayed directly as tags on the user side interface for the user to click and select. Once the user has finished selecting tags on the interface and has set a tag fixed weight value for each selected subscription tag, the selected subscription tags form the subscription tag subset.

Once the subscription tag subset has been obtained on the user side, it is sent to the server. After receiving it, the server maps and binds the subscription tag subset to the user unique identification code corresponding to the user side and stores the binding locally. The subscription tag subset contains at least one subscription tag, each subscription tag corresponds to one tag fixed weight value, and the sum of the tag fixed weight values of the subscription tags in the subset is recorded as the total tag fixed weight value. Because the subscription tag subset is set in advance, once these tags are added to the user's tags their weights do not change as the user continues to use the user side to watch videos, listen to music, read literary works, buy goods, and so on. The server can therefore consistently push content for the user's fixed tags, which avoids the situation in which the user clicks more and more on content for one tag and the server ends up recommending only that tag's content.

After the user's fixed interest tags are set from the subscription tag subset, the sum of the tag fixed weight values of the subscription tags in the subset is recorded as the total tag fixed weight value, and this total is less than 1. That is, part of the weight space is reserved for the user's dynamic tags so that adjustments to non-fixed tags are also taken into account. To adjust the user tags dynamically based on the user history data, the server then retrieves the corresponding user history data from the local user database according to the user unique identification code.
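On the server side, binding a received subset to the user's unique identification code and enforcing the "at least one tag" and "total fixed weight below 1" rules can be sketched with an in-memory store. The class `SubscriptionStore` and its methods are hypothetical stand-ins for the server's local storage.

```python
class SubscriptionStore:
    """Hypothetical in-memory stand-in for the server's local uid -> subset binding."""

    def __init__(self):
        self._by_uid = {}

    def bind(self, uid, subset):
        """Validate a received subscription tag subset and bind it to the user's id."""
        if not subset:
            raise ValueError("the subset must contain at least one subscription tag")
        fixed_total = sum(subset.values())
        if fixed_total >= 1:
            raise ValueError("total tag fixed weight must be < 1 to leave room for dynamic tags")
        self._by_uid[uid] = dict(subset)
        return fixed_total

    def subset_for(self, uid):
        """Return a copy of the bound subset, or an empty dict if none is stored."""
        return dict(self._by_uid.get(uid, {}))

store = SubscriptionStore()
print(store.bind("u1", {"news": 0.3, "tech": 0.2}))  # 0.5
```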

The embedded vector acquisition unit 102 is configured to call a pre-trained deep semantic matching model and input the user history data into the deep semantic matching model for operation to obtain an embedded vector corresponding to the user history data.

In this embodiment, the server locally stores a pre-trained deep semantic matching model (i.e., a DSSM model). A typical DSSM model can be divided into three layers: an input layer, a representation layer, and a matching layer.

User feature training data are fed into the input layer. The user features include user dense features and user sparse features: the user dense features are one-hot encoded, the user sparse features are embedded into a low-dimensional space (64 or 32 dimensions), and the results are then concatenated. The advertisement side (which may also be understood as the lesson side) is processed in the same way as the user side.

The concatenated features are fed into each side's own deep network. After passing through their respective two fully connected layers, the user features and the advertisement features are converted into fixed-length vectors, giving a user embedding and an ad embedding of the same dimension. The number and dimensions of the network layers inside each tower may differ, but the output dimensions must be the same so that the two vectors can be operated on at the matching layer.

After the model is trained, the user embeddings and ad embeddings are obtained. To recommend an audience for a specific advertisement, the cosine similarity between that advertisement's ad embedding and the user embedding of every user in the candidate population is calculated, and the N users with the highest similarity are selected as the advertising audience, which completes the advertisement recommendation task.
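The audience-selection step at the matching stage can be sketched as a cosine-similarity ranking. The function name and the sample embeddings are hypothetical; the dimension check reflects the requirement that both towers produce vectors of the same dimension.

```python
import math

def cosine(a, b):
    """Cosine similarity; defined as 0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def select_audience(ad_embedding, user_embeddings, n):
    """Return the n user ids whose embeddings are most cosine-similar to the ad."""
    for uid, vec in user_embeddings.items():
        if len(vec) != len(ad_embedding):
            raise ValueError(f"{uid}: tower output dimensions differ")
    ranked = sorted(user_embeddings,
                    key=lambda uid: cosine(ad_embedding, user_embeddings[uid]),
                    reverse=True)
    return ranked[:n]

users = {"a": [0.9, 0.1], "b": [0.0, 1.0], "c": [1.0, 0.5]}
print(select_audience([1.0, 0.0], users, n=2))  # ['a', 'c']
```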

In this embodiment, the embedded vector corresponding to the user history data can be obtained using only the input layer and the representation layer of the DSSM model, into which the user dense features and the user sparse features in the user history data are fed and processed separately. The embedded vectors of other users can likewise be calculated from their own historical data.

In one embodiment, the embedded vector obtaining unit 102 includes:

the user feature acquisition unit is used for acquiring the user dense features and the user sparse features in the user history data;

the first coding unit is used for inputting the user dense features into the input layer of the deep semantic matching model for one-hot encoding to obtain a first user coding vector;

the second coding unit is used for inputting the user sparse features into the input layer of the deep semantic matching model for word embedding processing to obtain a second user coding vector;

the feature concatenation unit is used for performing feature concatenation on the first user coding vector and the second user coding vector to obtain a current coding vector;

and the full-connection unit is used for inputting the current coding vector into the representation layer of the deep semantic matching model for fully connected processing to obtain the embedded vector corresponding to the user history data.

In this embodiment, obtaining the user's embedded vector does not require the matching layer of the DSSM model. The user history data is therefore fed into the input layer of the DSSM model for one-hot encoding, word embedding processing, and feature concatenation to obtain the current coding vector, which is then fed into the representation layer of the deep semantic matching model for fully connected processing to obtain the embedded vector corresponding to the user history data. In this way, the user's embedded vector can be obtained quickly.

The target identification code set acquisition unit 103 is configured to acquire the stored embedded vector set corresponding to the other users, cluster the embedded vector together with the embedded vector set to obtain user clusters, and acquire, from the user clusters, the target user clustering sub-cluster to which the embedded vector belongs and the target user unique identification code set corresponding to that sub-cluster.

In this embodiment, once the embedded vectors of the users have been obtained and collected into the embedded vector set, the server can cluster them to obtain a plurality of user clusters, identify the target user clustering sub-cluster to which the user's embedded vector belongs, and obtain the target user unique identification code set corresponding to that sub-cluster.

In an embodiment, the target identification code set obtaining unit 103 includes:

the K-means clustering unit is used for acquiring the stored embedded vector set corresponding to the other users, and performing K-means clustering on the embedded vector and the embedded vector set to obtain as many user clusters as the preset number of cluster groups;

and the target user unique identification code set acquisition unit is used for taking the user clustering sub-cluster to which the embedded vector belongs as the target user clustering sub-cluster, and acquiring the user unique identification codes corresponding to all embedded vectors in the target user clustering sub-cluster to form the target user unique identification code set.

In this embodiment, K-means clustering is performed on the embedded vector and the embedded vector set to obtain the user clusters. The user clustering sub-cluster to which the embedded vector belongs is then taken directly, which determines the sub-cluster the user falls into, and the user unique identification codes corresponding to the embedded vectors in that target user clustering sub-cluster form the target user unique identification code set.

In an embodiment, the K-means clustering unit includes:

the initial clustering center obtaining unit is used for selecting, from the embedded vector set, as many embedded vectors as the preset number of cluster groups, and taking the selected embedded vectors as the initial clustering centers of the clusters;

the initial clustering unit is used for partitioning the embedded vector set according to the cosine similarity between each embedded vector in the set and each initial clustering center to obtain an initial clustering result;

the cluster center adjusting unit is used for obtaining the adjusted clustering center of each cluster according to the initial clustering result;

and the final clustering result obtaining unit is used for repartitioning the embedded vectors of the embedded vector set according to their cosine similarity to the adjusted clustering centers, repeating until the clustering result has remained unchanged for more than a preset number of iterations, thereby obtaining the user clusters.

In this embodiment, the embedded vector set can be clustered with the K-means method as follows:

a) Randomly select N2 embedded vectors from the embedded vector set containing N1 embedded vectors and use them as the initial clustering centers of N2 clusters. Here N1 is the total number of embedded vectors in the set, and N2 (N2 < N1) is the preset number of cluster groups.

b) Calculate the cosine similarity between each remaining embedded vector and each of the N2 initial clustering centers, and assign each remaining embedded vector to the cluster whose center has the highest cosine similarity to it, which gives the initial clustering result. In other words, each embedded vector is grouped with its closest initial clustering center, so the embedded vectors are divided into N2 clusters, each built around one of the initially selected centers.

c) Recalculate the clustering centers of the N2 clusters according to the initial clustering result.

d) Reassign all N1 embedded vectors according to the new clustering centers.

e) Repeat steps c) and d) until the clustering result no longer changes, which yields a clustering result with the preset number of clusters.

After clustering is completed, the embedded vector set has been grouped into a plurality of clusters, which form the user clusters.

The tag statistics unit 104 is used for acquiring the user portrait tag set corresponding to each user unique identification code in the target user unique identification code set, and counting the occurrences of each user portrait tag to obtain a user portrait tag statistical result.

In this embodiment, the server counts every user portrait tag contained in the user portrait tag sets of the users in the target user unique identification code set, together with the number of times each tag occurs, which completes the statistics.

The hot tag set acquisition unit 105 is used for sorting the user portrait tag statistical result in descending order of tag count to obtain a user portrait tag sorting result, and taking the user portrait tags whose rank does not exceed a preset ranking threshold in the sorting result to form a popular user portrait tag set.

In this embodiment, the user portrait tags whose rank does not exceed the preset ranking threshold in the sorting result are taken as popular user portrait tags and form the popular user portrait tag set. In effect, the user is assigned to a user group based on the user history data, and the user's dynamic tags are then continually adjusted according to the popular tags within that group.

An optimal tag set obtaining unit 106, configured to combine the subscription tag subset with the popular user portrait tag set to obtain a current optimal tag set of the user corresponding to the user unique identifier.

In this embodiment, once the fixed user tags corresponding to the subscription tag subset and the dynamic tags corresponding to the popular user portrait tag set have been obtained, the two sets are combined to obtain the user's current optimal tag set corresponding to the user unique identification code.

The sum of the tag weight values of all popular user portrait tags in the popular user portrait tag set is recorded as the total tag change weight value, and the total tag change weight value and the total tag fixed weight value sum to 1. Because all the weights in the current optimal tag set sum to 1, the server can allocate the content it provides to the user according to the weight of each tag in that set.

In one embodiment, the artificial intelligence based user tag updating apparatus 100 further comprises:

the tag updating unit is used for, if a popular user portrait tag update instruction is detected, acquiring the user unique identification code, retrieving the corresponding current user data from the local user database, updating the user history data with the current user data, and returning to the step of calling the pre-trained deep semantic matching model and inputting the user history data into the deep semantic matching model to obtain the embedded vector corresponding to the user history data.

In this embodiment, to adjust the user's popular user portrait tag set dynamically, the server may also trigger the generation of a popular user portrait tag update instruction periodically (for example, the instruction is generated automatically at 1:00 a.m. once every natural month). The server then obtains the current user data accumulated for the user over the most recent natural month, updates the user history data with it, and returns to the step of calling the pre-trained deep semantic matching model and inputting the user history data into it to obtain the embedded vector corresponding to the user history data. In this way, the popular user portrait tags are adjusted regularly, which prevents the tags from exhibiting a Matthew effect.

With this apparatus, a user's tags include both fixed subscription tags and dynamic tags that follow the user's click behavior, which improves both the diversity and the accuracy of content recommended based on user tags.

The artificial intelligence based user tag updating apparatus may be implemented in the form of a computer program that can be run on a computer device as shown in fig. 4.

Referring to fig. 4, fig. 4 is a schematic block diagram of a computer device according to an embodiment of the present invention. The computer device 500 is a server, and the server may be an independent server or a server cluster composed of a plurality of servers.

Referring to fig. 4, the computer device 500 includes a processor 502, memory, and a network interface 505 connected by a system bus 501, where the memory may include a storage medium 503 and an internal memory 504.

The storage medium 503 may store an operating system 5031 and a computer program 5032. The computer program 5032, when executed, may cause the processor 502 to perform an artificial intelligence based user tag update method.

The processor 502 is used to provide computing and control capabilities that support the operation of the overall computer device 500.

The internal memory 504 provides an environment for the execution of the computer program 5032 in the storage medium 503, and when the computer program 5032 is executed by the processor 502, the processor 502 can be enabled to execute the artificial intelligence based user tag updating method.

The network interface 505 is used for network communication, such as transmitting data information. Those skilled in the art will appreciate that the structure shown in fig. 4 is only a block diagram of part of the structure related to the solution of the present invention and does not limit the computer device 500 to which the solution is applied; a specific computer device 500 may include more or fewer components than shown, combine certain components, or have a different arrangement of components.

The processor 502 is configured to run the computer program 5032 stored in the memory to implement the artificial intelligence based user tag updating method disclosed in the embodiment of the present invention.

Those skilled in the art will appreciate that the embodiment of a computer device illustrated in fig. 4 does not constitute a limitation on the specific construction of the computer device, and that in other embodiments a computer device may include more or fewer components than those illustrated, or some components may be combined, or a different arrangement of components. For example, in some embodiments, the computer device may only include a memory and a processor, and in such embodiments, the structures and functions of the memory and the processor are consistent with those of the embodiment shown in fig. 4, and are not described herein again.

It should be understood that, in the embodiment of the present invention, the processor 502 may be a Central Processing Unit (CPU), or it may be another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor, or it may be any conventional processor.

In another embodiment of the invention, a computer-readable storage medium is provided. The computer-readable storage medium may be a nonvolatile or a volatile computer-readable storage medium. It stores a computer program which, when executed by a processor, implements the artificial intelligence based user tag updating method disclosed by the embodiments of the present invention.

It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described apparatuses, devices and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again. Those of ordinary skill in the art will appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein may be implemented in electronic hardware, computer software, or a combination of both. To clearly illustrate this interchangeability of hardware and software, the components and steps of the examples have been described above generally in terms of their functions. Whether such functionality is implemented as hardware or software depends on the particular application and the design constraints imposed on the implementation. Skilled artisans may implement the described functionality in different ways for each particular application, but such implementation decisions should not be interpreted as departing from the scope of the present invention.

In the embodiments provided by the present invention, it should be understood that the disclosed apparatus, device and method can be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative: the division into units is only a logical division, and other divisions are possible in an actual implementation. Units having the same function may be grouped into one unit; a plurality of units or components may be combined or integrated into another system; and some features may be omitted or not executed. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices or units, and may be an electrical, mechanical or other form of connection.

The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment of the present invention.

In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist alone physically, or two or more units may be integrated into one unit. The integrated unit can be implemented in the form of hardware or in the form of a software functional unit.

The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a storage medium. Based on this understanding, the technical solution of the present invention in essence, or the part of it that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The software product is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a magnetic disk, or an optical disk.

While the invention has been described with reference to specific embodiments, the invention is not limited thereto, and various equivalent modifications and substitutions can be easily made by those skilled in the art within the technical scope of the invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.
