Multimedia resource classification model training method and multimedia resource recommendation method

Document No.: 190632 Publication date: 2021-11-02

Reading note: This technology, "Multimedia resource classification model training method and multimedia resource recommendation method", was designed by 朱灵子 and 马连洋 on 2021-01-27. Main content: The application relates to a multimedia resource classification model training method, a multimedia resource recommendation method, an apparatus, a computer device, and a storage medium. A target attribute information set and a training label set of a training multimedia resource are obtained, and the target attribute information set is input into a multimedia resource classification model to be trained. Each feature sub-network in the model vectorizes the target attribute information associated with it to obtain a corresponding attribute feature vector. The attribute feature vectors are input into each task sub-network in the model to obtain a corresponding prediction label. The model parameters are adjusted based on the prediction label and the training label corresponding to each task until a convergence condition is met, yielding a multimedia resource classification model for classifying the quality of multimedia resources to be recommended. The method can improve the effectiveness of resource recommendation.

1. A multimedia resource classification model training method, characterized by comprising the following steps:

acquiring a target attribute information set and a training label set of a training multimedia resource; the target attribute information set comprises target attribute information of multiple dimensions, and the training label set comprises training labels corresponding to multiple tasks;

inputting the target attribute information set of the training multimedia resource into a multimedia resource classification model to be trained; the multimedia resource classification model comprises a plurality of feature sub-networks and task sub-networks corresponding to the tasks respectively;

vectorizing, through each feature sub-network in the multimedia resource classification model, the target attribute information associated with that feature sub-network, to obtain the attribute feature vector output by each feature sub-network;

inputting each attribute feature vector into each task sub-network to obtain a prediction label corresponding to each task;

adjusting parameters of the corresponding task sub-network based on the training label and the prediction label corresponding to the same task, and adjusting model parameters of each feature sub-network based on the training labels and the prediction labels corresponding to each task, until a convergence condition is met, so as to obtain a trained multimedia resource classification model; the multimedia resource classification model is used for classifying the quality of the multimedia resources to be recommended.

2. The method of claim 1, wherein obtaining the training label set of the training multimedia resource comprises:

acquiring recommended interaction information sets respectively corresponding to a plurality of historical multimedia resources; each recommended interaction information set comprises recommended interaction information corresponding to each task;

performing statistics on the pieces of recommended interaction information corresponding to the same task to obtain reference interaction information corresponding to each task;

classifying the quality of the historical multimedia resources based on the recommended interaction information and the corresponding reference interaction information to obtain a quality label set corresponding to each historical multimedia resource;

and obtaining the training multimedia resources and the corresponding training label sets based on the historical multimedia resources and the corresponding quality label sets.

3. The method of claim 2, wherein the classifying the quality of the historical multimedia resources based on the recommended interaction information and the corresponding reference interaction information to obtain a set of quality labels corresponding to each historical multimedia resource comprises:

comparing, within the recommended interaction information set corresponding to the same historical multimedia resource, the recommended interaction degree corresponding to the recommended interaction information of each task with the reference interaction degree corresponding to the reference interaction information of the same task;

determining a quality label of a task corresponding to recommended interaction information with the recommended interaction degree greater than the reference interaction degree as a positive label;

and determining the quality label of the task corresponding to the recommended interaction information with the recommended interaction degree smaller than the reference interaction degree as a negative label.

4. The method of claim 1, wherein the feature subnetwork comprises a text feature subnetwork, the target attribute information associated with the text feature subnetwork includes a plurality of pieces of text attribute information, and the text feature subnetwork includes a data processing channel corresponding to each piece of text attribute information;

the vectorizing, through each feature subnetwork in the multimedia resource classification model, of the target attribute information associated with that feature subnetwork to obtain the attribute feature vector output by each feature subnetwork comprises:

vectorizing the corresponding text attribute information through each data processing channel in the text feature sub-network to obtain a text feature vector output by each data processing channel;

and obtaining the attribute feature vector output by the text feature sub-network based on each text feature vector.

5. The method of claim 1, wherein the feature subnetwork comprises an atomic feature subnetwork;

performing feature cross processing on target attribute information associated with the atomic feature sub-network through the atomic feature sub-network to obtain at least one cross feature vector;

and obtaining the attribute feature vector output by the atomic feature sub-network based on each cross feature vector.

6. The method of claim 5, wherein the target attribute information associated with the atomic feature subnetwork includes at least two of user attribute information, image attribute information, language attribute information, and text statistics attribute information.

7. The method of claim 1, wherein the feature subnetwork comprises an image-text fusion feature subnetwork, the target attribute information associated with the image-text fusion feature subnetwork comprises text attribute information and image attribute information, and the image-text fusion feature subnetwork comprises a text data processing channel corresponding to the text attribute information and an image data processing channel corresponding to the image attribute information;

encoding the text attribute information through the text data processing channel to obtain an intermediate feature vector;

encoding the image attribute information through the image data processing channel to obtain an image feature vector;

performing attention allocation processing on the intermediate feature vector based on the image feature vector to obtain a first image-text fusion feature vector;

performing attention allocation processing on the image feature vector based on the intermediate feature vector to obtain a second image-text fusion feature vector;

and obtaining the attribute feature vector output by the image-text fusion feature sub-network based on the first image-text fusion feature vector and the second image-text fusion feature vector.

8. The method according to claim 7, wherein said encoding the text attribute information through the text data processing channel to obtain an intermediate feature vector comprises:

performing word coding processing on the text attribute information to obtain a word feature vector;

and carrying out sentence coding processing on the word feature vector to obtain the intermediate feature vector.

9. The method of claim 1, wherein the feature subnetwork comprises a style feature subnetwork, and wherein the target attribute information associated with the style feature subnetwork comprises style attribute information; the style feature subnetwork comprises a first data processing channel and a second data processing channel;

encoding the style attribute information through the first data processing channel to obtain an initial feature vector, and performing attention allocation processing on the initial feature vector to obtain a first feature vector;

performing convolution processing on the style attribute information through the second data processing channel to obtain a second feature vector;

and obtaining the attribute feature vector output by the style feature sub-network based on the first feature vector and the second feature vector.

10. The method of claim 1, wherein each of the task sub-networks comprises an expert layer, a gating layer, and a fusion layer; the task sub-networks share an expert layer;

the step of inputting the attribute feature vectors into the task sub-networks to obtain the prediction labels corresponding to the tasks includes:

in the current task sub-network, the expert layer performs feature processing on the attribute feature vectors to obtain feature processing results, the gating layer performs weighting processing on the feature processing results to obtain intermediate processing results, and the fusion layer performs fusion processing on the intermediate processing results to obtain a prediction label of a task corresponding to the current task sub-network.

11. The method according to any one of claims 1 to 10, further comprising:

acquiring a target attribute information set and a verification label set of a verification multimedia resource; the verification multimedia resource is a newly recommended multimedia resource;

inputting the target attribute information set of the verification multimedia resource into the trained multimedia resource classification model to obtain a prediction label set corresponding to the verification multimedia resource;

calculating classification accuracy based on the prediction label set and the verification label set corresponding to the verification multimedia resource;

and when the classification accuracy is smaller than an accuracy threshold, updating the trained multimedia resource classification model based on the prediction label set and the training label set corresponding to the verification multimedia resource to obtain an updated multimedia resource classification model.

12. A method for recommending multimedia resources, the method comprising:

acquiring a target attribute information set of a multimedia resource to be recommended; the target attribute information set comprises target attribute information of a plurality of dimensions;

inputting the target attribute information set into a trained multimedia resource classification model; the multimedia resource classification model comprises a plurality of feature sub-networks and a plurality of task sub-networks;

inputting each attribute feature vector into each task sub-network to obtain a prediction label output by each task sub-network;

obtaining a quality classification result corresponding to the multimedia resource to be recommended based on each prediction label;

and recommending the multimedia resources to be recommended based on the quality classification result.

13. An apparatus for training a multimedia resource classification model, the apparatus comprising:

the information acquisition module is used for acquiring a target attribute information set and a training label set of a training multimedia resource; the target attribute information set comprises target attribute information of multiple dimensions, and the training label set comprises training labels corresponding to multiple tasks;

the attribute information input module is used for inputting the target attribute information set of the training multimedia resources into a multimedia resource classification model to be trained; the multimedia resource classification model comprises a plurality of feature sub-networks and task sub-networks corresponding to the tasks respectively;

the attribute information processing module is used for vectorizing, through each feature sub-network in the multimedia resource classification model, the target attribute information associated with that feature sub-network, to obtain the attribute feature vector output by each feature sub-network;

the label prediction module is used for inputting each attribute feature vector into each task sub-network to obtain a prediction label corresponding to each task;

the model adjusting module is used for adjusting parameters of the corresponding task sub-network based on the training label and the prediction label corresponding to the same task, and adjusting model parameters of each feature sub-network based on the training labels and the prediction labels corresponding to each task, until a convergence condition is met, so as to obtain a trained multimedia resource classification model; the multimedia resource classification model is used for classifying the quality of the multimedia resources to be recommended.

14. An apparatus for recommending multimedia resources, the apparatus comprising:

the attribute information acquisition module is used for acquiring a target attribute information set of the multimedia resource to be recommended; the target attribute information set comprises target attribute information of a plurality of dimensions;

the attribute information input module is used for inputting the target attribute information set into a trained multimedia resource classification model; the multimedia resource classification model comprises a plurality of feature sub-networks and a plurality of task sub-networks;

the label prediction module is used for inputting each attribute feature vector into each task sub-network to obtain a prediction label output by each task sub-network;

the quality classification module is used for obtaining a quality classification result corresponding to the multimedia resource to be recommended based on each prediction label;

and the resource recommending module is used for recommending the multimedia resources to be recommended based on the quality classification result.

15. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor, when executing the computer program, implements the steps of the method of any of claims 1 to 12.

16. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 12.

Technical Field

The present application relates to the field of computer technologies, and in particular, to a multimedia resource classification model training method, a multimedia resource recommendation method, an apparatus, a computer device, and a storage medium.

Background

With the development of computer technology, a variety of network applications have emerged. People can both publish and browse multimedia resources in these applications.

In the conventional technology, multimedia resources are generally recommended to users at random, so low-quality multimedia resources are easily recommended, and users neither pay attention to nor take an interest in them. Low-quality multimedia resources not only occupy storage resources but also cause users to search repeatedly and refresh the interface repeatedly, which consumes a large amount of computer resources and ultimately makes multimedia resource recommendation ineffective.

Disclosure of Invention

In view of the foregoing, it is desirable to provide a multimedia resource classification model training method, a multimedia resource recommendation method, an apparatus, a computer device, and a storage medium, which can improve the effectiveness of multimedia resource recommendation.

A method of multimedia resource classification model training, the method comprising:

inputting a target attribute information set of a training multimedia resource into a multimedia resource classification model to be trained; the multimedia resource classification model comprises a plurality of feature sub-networks and task sub-networks corresponding to the tasks respectively;

inputting each attribute feature vector into each task sub-network to obtain a prediction label corresponding to each task;

An apparatus for training a multimedia resource classification model, the apparatus comprising:

the attribute information input module is used for inputting a target attribute information set of the training multimedia resources into a multimedia resource classification model to be trained; the multimedia resource classification model comprises a plurality of feature sub-networks and task sub-networks corresponding to the tasks respectively;

the label prediction module is used for inputting each attribute feature vector into each task sub-network to obtain a prediction label corresponding to each task;

the model adjusting module is used for adjusting parameters of the corresponding task sub-network based on the training label and the prediction label corresponding to the same task, and adjusting model parameters of each feature sub-network based on the training labels and the prediction labels corresponding to each task, until a convergence condition is met, to obtain a trained multimedia resource classification model; the multimedia resource classification model is used for classifying the quality of the multimedia resource to be recommended.

In one embodiment, the information obtaining module is further configured to obtain recommended interaction information sets respectively corresponding to a plurality of historical multimedia resources, each recommended interaction information set comprising recommended interaction information corresponding to each task; perform statistics on the pieces of recommended interaction information corresponding to the same task to obtain reference interaction information corresponding to each task; classify the quality of the historical multimedia resources based on the recommended interaction information and the corresponding reference interaction information to obtain a quality label set corresponding to each historical multimedia resource; and obtain the training multimedia resources and the corresponding training label sets based on the historical multimedia resources and the corresponding quality label sets.

In one embodiment, the information obtaining module is further configured to compare, within the recommended interaction information set corresponding to the same historical multimedia resource, the recommended interaction degree corresponding to each piece of recommended interaction information with the reference interaction degree corresponding to the reference interaction information of the same category; determine, as a positive label, the quality label of the task corresponding to recommended interaction information whose recommended interaction degree is greater than the reference interaction degree; and determine, as a negative label, the quality label of the task corresponding to recommended interaction information whose recommended interaction degree is smaller than the reference interaction degree.
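The labelling procedure in the two embodiments above can be sketched as follows. This is only an illustration under stated assumptions: the text does not say which statistic produces the reference interaction information, so the mean is used here, and the resource ids, task names, and interaction values are hypothetical.

```python
from statistics import mean

# Hypothetical interaction rates per historical resource, one value per task.
history = {
    "video_a": {"click_rate": 0.12, "share_rate": 0.010},
    "video_b": {"click_rate": 0.04, "share_rate": 0.030},
    "video_c": {"click_rate": 0.05, "share_rate": 0.002},
}

def reference_interaction(history):
    """Aggregate each task's values into one reference value (mean is an assumption)."""
    tasks = next(iter(history.values())).keys()
    return {t: mean(r[t] for r in history.values()) for t in tasks}

def quality_labels(history):
    """Compare each resource's interaction degree with the task's reference degree."""
    ref = reference_interaction(history)
    labels = {}
    for rid, interactions in history.items():
        labels[rid] = {}
        for task, value in interactions.items():
            if value > ref[task]:
                labels[rid][task] = 1      # positive label
            elif value < ref[task]:
                labels[rid][task] = 0      # negative label
            # value == reference: the text does not cover this case; left unlabeled
    return labels

labels = quality_labels(history)
```

Each historical resource ends up with one quality label per task, which together form its training label set.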

In one embodiment, the feature subnetwork includes a text feature subnetwork, the target attribute information associated with the text feature subnetwork includes a plurality of pieces of text attribute information, and the text feature subnetwork includes a data processing channel corresponding to each piece of text attribute information. The attribute information processing module is further configured to vectorize the corresponding text attribute information through each data processing channel in the text feature subnetwork to obtain the text feature vector output by each data processing channel; and obtain the attribute feature vector output by the text feature subnetwork based on the text feature vectors.
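A minimal sketch of the one-channel-per-text-attribute idea follows. The text does not specify how each channel vectorizes its input or how the channel outputs are combined, so a hashed bag-of-words encoder and plain concatenation are assumed here; the field names `title` and `summary` and the vector sizes are hypothetical.

```python
import zlib

def hashed_bag_of_words(text, dim):
    """Map a text field to a fixed-size count vector using a stable hash."""
    vec = [0.0] * dim
    for token in text.lower().split():
        vec[zlib.crc32(token.encode("utf-8")) % dim] += 1.0
    return vec

# One data processing channel per text attribute (names and sizes are assumptions).
CHANNELS = {"title": 4, "summary": 8}

def text_feature_subnetwork(text_attributes):
    channel_vectors = {
        name: hashed_bag_of_words(text_attributes[name], dim)
        for name, dim in CHANNELS.items()
    }
    # Combine the channel outputs by concatenation (one possible choice).
    fused = [v for name in CHANNELS for v in channel_vectors[name]]
    return channel_vectors, fused

channel_vectors, text_vector = text_feature_subnetwork(
    {"title": "cooking at home", "summary": "a short clip about easy weeknight meals"}
)
```

In a trained model each channel would be a learned encoder; the hashing encoder only stands in for it so the data flow is visible.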

In one embodiment, the feature subnetwork comprises an atomic feature subnetwork. The attribute information processing module is further configured to perform feature cross processing on the target attribute information associated with the atomic feature subnetwork through the atomic feature subnetwork to obtain at least one cross feature vector; and obtain the attribute feature vector output by the atomic feature subnetwork based on each cross feature vector.

In one embodiment, the target attribute information associated with the atomic feature subnetwork includes at least two of user attribute information, image attribute information, language attribute information, and text statistical attribute information.
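Feature crossing can take many forms; the sketch below assumes a simple pairwise element-wise product between attribute embeddings, which is one common choice and is not stated in the text. The embedding values and attribute names are hypothetical.

```python
from itertools import combinations

# Hypothetical embeddings of atomic attributes, all of the same length.
atomic_features = {
    "user":       [0.5, 1.0],
    "image":      [2.0, 0.0],
    "text_stats": [1.0, 3.0],
}

def cross_features(features):
    """Element-wise product for every pair of atomic feature vectors."""
    crossed = {}
    for (name_a, vec_a), (name_b, vec_b) in combinations(features.items(), 2):
        crossed[(name_a, name_b)] = [a * b for a, b in zip(vec_a, vec_b)]
    return crossed

def atomic_feature_subnetwork(features):
    """Concatenate all cross feature vectors into the sub-network output."""
    crossed = cross_features(features)
    return [v for vec in crossed.values() for v in vec]

crossed = cross_features(atomic_features)
atomic_vector = atomic_feature_subnetwork(atomic_features)
```

With three attributes there are three pairs, so the output concatenates three cross feature vectors.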

In one embodiment, the feature subnetwork includes an image-text fusion feature subnetwork, the target attribute information associated with the image-text fusion feature subnetwork includes text attribute information and image attribute information, and the image-text fusion feature subnetwork includes a text data processing channel corresponding to the text attribute information and an image data processing channel corresponding to the image attribute information. The attribute information processing module is further configured to encode the text attribute information through the text data processing channel to obtain an intermediate feature vector; encode the image attribute information through the image data processing channel to obtain an image feature vector; perform attention allocation processing on the intermediate feature vector based on the image feature vector to obtain a first image-text fusion feature vector; perform attention allocation processing on the image feature vector based on the intermediate feature vector to obtain a second image-text fusion feature vector; and obtain the attribute feature vector output by the image-text fusion feature subnetwork based on the first image-text fusion feature vector and the second image-text fusion feature vector.
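The two-way attention between the text and image channels described above might look like the following sketch. It assumes dot-product attention with a mean-pooled query and concatenation of the two fused vectors; none of these choices is stated in the text, and the encoder outputs are made-up numbers.

```python
import math

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def mean_pool(vectors):
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def attend(query, keys):
    """Weight each key vector by softmax(query . key) and sum the results."""
    weights = softmax([dot(query, k) for k in keys])
    dim = len(keys[0])
    pooled = [sum(w * k[j] for w, k in zip(weights, keys)) for j in range(dim)]
    return pooled, weights

# Hypothetical encoder outputs: 3 sentence vectors and 2 image-region vectors.
text_vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
image_vectors = [[2.0, 0.0], [0.0, 0.5]]

# First fusion vector: attention over the text vectors, guided by the image.
first_fusion, text_weights = attend(mean_pool(image_vectors), text_vectors)
# Second fusion vector: attention over the image vectors, guided by the text.
second_fusion, image_weights = attend(mean_pool(text_vectors), image_vectors)

# Sub-network output: concatenation of the two fusion vectors (an assumption).
image_text_vector = first_fusion + second_fusion
```

Attending in both directions lets each modality decide which parts of the other matter most before the two summaries are merged.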

In one embodiment, the attribute information processing module is further configured to perform word encoding processing on the text attribute information to obtain a word feature vector; and carrying out sentence coding processing on the word feature vector to obtain an intermediate feature vector.

In one embodiment, the feature subnetwork comprises a style feature subnetwork, and the target attribute information associated with the style feature subnetwork comprises style attribute information; the style feature subnetwork includes a first data processing channel and a second data processing channel. The attribute information processing module is further configured to encode the style attribute information through the first data processing channel to obtain an initial feature vector, and perform attention allocation processing on the initial feature vector to obtain a first feature vector; perform convolution processing on the style attribute information through the second data processing channel to obtain a second feature vector; and obtain the attribute feature vector output by the style feature subnetwork based on the first feature vector and the second feature vector.
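The two channels above can be illustrated with a toy example. It assumes the style attribute information is a short numeric sequence, uses a hand-written encoding with softmax attention pooling for the first channel, and a width-2 one-dimensional convolution with max and min pooling for the second. The encoding, kernel, and pooling choices are all assumptions made for illustration.

```python
import math

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical style sequence, e.g. one layout measurement per paragraph.
style_sequence = [1.0, 3.0, 2.0, 5.0]

def attention_channel(seq, query_weight=0.5):
    """Encode each position, then pool the encodings with attention weights."""
    encoded = [[x, x * x] for x in seq]              # toy 2-d encoding per position
    weights = softmax([query_weight * e[0] for e in encoded])
    pooled = [sum(w * e[j] for w, e in zip(weights, encoded)) for j in range(2)]
    return pooled, weights

def convolution_channel(seq, kernel=(1.0, -1.0)):
    """Valid 1-D convolution (cross-correlation form), then max and min pooling."""
    k = len(kernel)
    feature_map = [
        sum(kernel[j] * seq[i + j] for j in range(k))
        for i in range(len(seq) - k + 1)
    ]
    return [max(feature_map), min(feature_map)], feature_map

first_vector, attention_weights = attention_channel(style_sequence)
second_vector, feature_map = convolution_channel(style_sequence)

# Sub-network output: concatenation of both channel outputs (an assumption).
style_vector = first_vector + second_vector
```

The attention channel summarizes the whole sequence, while the convolution channel picks up local changes between neighbouring positions.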

In one embodiment, each task subnetwork includes an expert layer, a gating layer, and a fusion layer; the task subnetworks share the expert layer. The label prediction module is further configured to, in the current task subnetwork, perform feature processing on each attribute feature vector through the expert layer to obtain feature processing results, perform weighting processing on the feature processing results through the gating layer to obtain an intermediate processing result, and perform fusion processing on the intermediate processing result through the fusion layer to obtain the prediction label of the task corresponding to the current task subnetwork.
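The expert, gating, and fusion layers resemble a mixture-of-experts arrangement with per-task gates. The sketch below assumes linear experts, a softmax gate, and a sigmoid output in the fusion layer; the weights, the task names `click` and `share`, and the input vector are hypothetical.

```python
import math

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Shared expert layer: each expert is a hypothetical linear map (list of rows).
EXPERTS = [
    [[1.0, 0.0], [0.0, 1.0]],   # expert 0: identity
    [[0.0, 1.0], [1.0, 0.0]],   # expert 1: swaps the two inputs
]
# Per-task gating weights (one row per expert) and fusion weights.
TASKS = {
    "click": {"gate": [[2.0, 0.0], [0.0, 2.0]], "fusion": [1.0, -1.0]},
    "share": {"gate": [[0.0, 0.0], [0.0, 0.0]], "fusion": [0.5, 0.5]},
}

def predict(task, x):
    # Expert layer (shared by all tasks): one output vector per expert.
    expert_outputs = [[dot(row, x) for row in expert] for expert in EXPERTS]
    # Gating layer (task specific): weights over the experts.
    gate = softmax([dot(row, x) for row in TASKS[task]["gate"]])
    mixed = [
        sum(g * out[j] for g, out in zip(gate, expert_outputs))
        for j in range(len(expert_outputs[0]))
    ]
    # Fusion layer (task specific): map the weighted result to a probability.
    return sigmoid(dot(TASKS[task]["fusion"], mixed)), gate

x = [1.0, 0.0]   # stands in for the concatenated attribute feature vectors
click_score, click_gate = predict("click", x)
share_score, share_gate = predict("share", x)
```

Because the experts are shared, both tasks reuse the same feature processing, while each task's gate decides how much weight each expert receives.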

In one embodiment, the apparatus further comprises:

the model updating module is used for acquiring a target attribute information set and a verification label set of a verification multimedia resource, the verification multimedia resource being a newly recommended multimedia resource; inputting the target attribute information set of the verification multimedia resource into the trained multimedia resource classification model to obtain a prediction label set corresponding to the verification multimedia resource; calculating classification accuracy based on the prediction label set and the verification label set corresponding to the verification multimedia resource; and when the classification accuracy is smaller than the accuracy threshold, updating the trained multimedia resource classification model based on the prediction label set and the training label set corresponding to the verification multimedia resource to obtain the updated multimedia resource classification model.
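The accuracy check that triggers a model update can be sketched as below. The text does not define how accuracy is computed over multiple tasks, so the fraction of correct (resource, task) labels is assumed; the threshold of 0.8 and the sample labels are hypothetical.

```python
def classification_accuracy(predicted, expected):
    """Fraction of (resource, task) labels that were predicted correctly."""
    total = correct = 0
    for rid, task_labels in expected.items():
        for task, label in task_labels.items():
            total += 1
            correct += int(predicted[rid][task] == label)
    return correct / total

def needs_update(predicted, expected, threshold=0.8):
    """True when accuracy falls below the threshold, so retraining is warranted."""
    return classification_accuracy(predicted, expected) < threshold

# Hypothetical verification labels and model predictions for two resources.
verification = {"r1": {"click": 1, "share": 0}, "r2": {"click": 0, "share": 1}}
predictions = {"r1": {"click": 1, "share": 1}, "r2": {"click": 0, "share": 1}}

accuracy = classification_accuracy(predictions, verification)
update_required = needs_update(predictions, verification)
```

Here three of the four labels match, so the accuracy is 0.75 and the assumed threshold of 0.8 would trigger an update.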

A computer device comprising a memory and a processor, the memory storing a computer program, the processor implementing the following steps when executing the computer program:

inputting each attribute feature vector into each task sub-network to obtain a prediction label corresponding to each task;

A computer-readable storage medium, on which a computer program is stored which, when executed by a processor, carries out the steps of:

inputting each attribute feature vector into each task sub-network to obtain a prediction label corresponding to each task;

In the above multimedia resource classification model training method, apparatus, computer device, and storage medium, a target attribute information set and a training label set of a training multimedia resource are obtained. The target attribute information set comprises target attribute information of multiple dimensions, and the training label set comprises training labels corresponding to multiple tasks. The target attribute information set of the training multimedia resource is input into a multimedia resource classification model to be trained, which comprises a plurality of feature sub-networks and task sub-networks corresponding to the respective tasks. Each feature sub-network vectorizes the target attribute information associated with it to obtain the attribute feature vector it outputs, and the attribute feature vectors are input into each task sub-network to obtain the prediction label corresponding to each task. The parameters of each task sub-network are adjusted based on the training label and the prediction label corresponding to the same task, and the model parameters of each feature sub-network are adjusted based on the training labels and the prediction labels corresponding to each of the tasks, until a convergence condition is met and a trained multimedia resource classification model is obtained. The model is used for classifying the quality of the multimedia resources to be recommended. The multimedia resource classification model can therefore be trained in a supervised manner on the target attribute information set and the training label set of the training multimedia resources, yielding a model that can accurately classify the quality of the multimedia resources to be recommended.

The target attribute information set of a multimedia resource comprises target attribute information of multiple dimensions, and target attribute information of different dimensions reflects the content quality of the multimedia resource from different angles. When the target attribute information set is input into the multimedia resource classification model, the quality of the multimedia resource can be classified accurately by considering the target attribute information of every dimension together, producing prediction labels that accurately reflect that quality. In addition, the multimedia resource classification model comprises a plurality of task sub-networks and is therefore a multi-task model that can predict the performance of a multimedia resource on each task. During training, multiple related tasks are learned in parallel and their gradients are back-propagated simultaneously, so the model learns both the connections and the differences between tasks, which improves the learning efficiency and quality of each task.

Finally, the trained multimedia resource classification model can be used for classifying the quality of the multimedia resources to be recommended, so that multimedia resources of better quality can be recommended to the user and the effectiveness of multimedia resource recommendation is improved. Effective recommendation avoids the repeated searching and repeated interface refreshing caused by low-quality, ineffective recommendations. Because repeated searching and refreshing occupy a large amount of computer device resources, resource waste on the terminal or the server is reduced while the effectiveness of resource recommendation is improved.
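The parameter update rule summarized above, where each task sub-network learns from its own task's loss while the shared feature sub-networks learn from every task's loss, can be illustrated with a deliberately tiny model. Everything here is an assumption made for illustration: a single shared linear layer stands in for the feature sub-networks, each task head is one scale and one bias, the loss is binary cross-entropy, a fixed number of steps stands in for the convergence condition, and the data and task names are invented.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Toy data: a 2-d attribute vector per sample and one label per task.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
Y = {"click": [1, 0, 1, 0], "skip": [0, 1, 0, 1]}   # hypothetical task names

w = [0.1, -0.1]                                      # shared feature weights
heads = {"click": [0.3, 0.0], "skip": [-0.2, 0.0]}   # per-task [scale, bias]

def forward(x):
    h = sum(wi * xi for wi, xi in zip(w, x))         # shared feature
    return h, {t: sigmoid(a * h + b) for t, (a, b) in heads.items()}

def total_loss():
    loss = 0.0
    for i, x in enumerate(X):
        _, preds = forward(x)
        for t, p in preds.items():
            y = Y[t][i]
            loss -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return loss / len(X)

def train_step(lr=0.2):
    grad_w = [0.0, 0.0]
    grad_heads = {t: [0.0, 0.0] for t in heads}
    for i, x in enumerate(X):
        h, preds = forward(x)
        dh = 0.0
        for t, p in preds.items():
            err = p - Y[t][i]              # d(loss)/d(logit) for task t
            grad_heads[t][0] += err * h    # head gradient: own task only
            grad_heads[t][1] += err
            dh += err * heads[t][0]        # shared gradient: summed over tasks
        for j, xj in enumerate(x):
            grad_w[j] += dh * xj
    n = len(X)
    for j in range(len(w)):
        w[j] -= lr * grad_w[j] / n
    for t in heads:
        heads[t][0] -= lr * grad_heads[t][0] / n
        heads[t][1] -= lr * grad_heads[t][1] / n

initial_loss = total_loss()
for _ in range(300):                       # fixed step count as a stand-in
    train_step()
final_loss = total_loss()
```

The key line is the accumulation into `dh`: the shared weights receive a gradient that sums contributions from both tasks, whereas each head's gradient depends only on its own task's error.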

A method of multimedia resource recommendation, the method comprising:

acquiring a target attribute information set of a multimedia resource to be recommended; the target attribute information set comprises target attribute information of a plurality of dimensions;

inputting the target attribute information set into the trained multimedia resource classification model; the multimedia resource classification model comprises a plurality of feature sub-networks and a plurality of task sub-networks;

inputting each attribute feature vector into each task sub-network to obtain a prediction label output by each task sub-network;

obtaining a quality classification result corresponding to the multimedia resource to be recommended based on each prediction label;

and recommending the multimedia resources to be recommended based on the quality classification result.
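The last two steps, turning per-task prediction labels into a quality classification result and recommending on that basis, could be sketched as follows. The text does not specify the combination rule, so this example assumes a resource counts as high quality when at least `min_positive` task labels are positive; the rule, the candidate ids, and the task names are hypothetical.

```python
def quality_class(predicted_labels, min_positive=2):
    """Classify a resource as 'high' when enough task labels are positive."""
    positives = sum(predicted_labels.values())
    return "high" if positives >= min_positive else "low"

def recommend(candidates, min_positive=2):
    """Return ids of high-quality resources, most positive labels first."""
    high = [
        rid for rid, labels in candidates.items()
        if quality_class(labels, min_positive) == "high"
    ]
    return sorted(high, key=lambda rid: (-sum(candidates[rid].values()), rid))

# Hypothetical prediction labels output by the task sub-networks.
candidates = {
    "a": {"click": 1, "share": 1, "like": 0},
    "b": {"click": 0, "share": 0, "like": 1},
    "c": {"click": 1, "share": 1, "like": 1},
}
recommended = recommend(candidates)
```

Resources classified as low quality are simply left out of the recommendation list in this sketch; a real system might instead down-rank them.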

A multimedia resource recommendation apparatus, the apparatus comprising:

the attribute information input module is used for inputting the target attribute information set into the trained multimedia resource classification model; the multimedia resource classification model comprises a plurality of feature sub-networks and a plurality of task sub-networks;