Multimedia resource classification model training method and multimedia resource recommendation method

Document No.: 190632. Published: 2021-11-02.

Abstract: Multimedia resource classification model training method and multimedia resource recommendation method, created by 朱灵子 and 马连洋 on 2021-01-27. The application relates to a multimedia resource classification model training method, a multimedia resource recommendation method, an apparatus, a computer device, and a storage medium. A target attribute information set and a training label set of a training multimedia resource are obtained, and the target attribute information set is input into a multimedia resource classification model to be trained. Each feature sub-network in the model vectorizes the target attribute information associated with it to obtain a corresponding attribute feature vector. The attribute feature vectors are input into each task sub-network in the model to obtain corresponding prediction labels. Model parameters are adjusted based on the prediction label and training label of each task until a convergence condition is met, yielding a multimedia resource classification model for classifying the quality of multimedia resources to be recommended. The method can improve the effectiveness of resource recommendation.

1. A multimedia resource classification model training method is characterized by comprising the following steps:

acquiring a target attribute information set and a training label set of a training multimedia resource; the target attribute information set comprises target attribute information of multiple dimensions, and the training label set comprises training labels corresponding to multiple tasks;

inputting the target attribute information set of the training multimedia resource into a multimedia resource classification model to be trained; the multimedia resource classification model comprises a plurality of feature sub-networks and task sub-networks corresponding to the tasks respectively;

respectively carrying out vectorization processing on target attribute information associated with the feature sub-networks through each feature sub-network in the multimedia resource classification model to obtain attribute feature vectors output by each feature sub-network;

inputting each attribute feature vector into each task sub-network to obtain a prediction label corresponding to each task;

adjusting parameters of the corresponding task sub-network based on the training label and the prediction label corresponding to the same task, and adjusting model parameters of each feature sub-network based on the training labels and the prediction labels corresponding to each task until a convergence condition is met, so as to obtain a trained multimedia resource classification model; the multimedia resource classification model is used for classifying the quality of the multimedia resources to be recommended.
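The parameter adjustment in claim 1 can be illustrated with a minimal sketch. It assumes a toy model that the source does not specify: one shared "feature" layer that scales each input element, one logistic head per task, a binary cross-entropy loss, and plain gradient descent. All names (`forward`, `train_step`, `train`, the task names) are hypothetical. Each head is updated only from its own task's error, while the shared layer accumulates the errors of every task.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, shared, heads):
    """Shared feature layer (element-wise scaling), then one logistic head per task."""
    h = [w * xi for w, xi in zip(shared, x)]
    preds = {t: sigmoid(sum(v * hi for v, hi in zip(head, h)))
             for t, head in heads.items()}
    return h, preds

def total_loss(data, shared, heads):
    """Sum of binary cross-entropy losses over all samples and all tasks."""
    loss = 0.0
    for x, labels in data:
        _, preds = forward(x, shared, heads)
        for t, y in labels.items():
            p = min(max(preds[t], 1e-12), 1 - 1e-12)
            loss -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return loss

def train_step(x, labels, shared, heads, lr=0.5):
    h, preds = forward(x, shared, heads)
    grad_h = [0.0] * len(h)
    for t, y in labels.items():
        err = preds[t] - y                      # gradient of the loss w.r.t. the logit
        for i in range(len(h)):
            grad_h[i] += err * heads[t][i]      # every task contributes to the shared layer
            heads[t][i] -= lr * err * h[i]      # each head sees only its own task's error
    for i in range(len(shared)):
        shared[i] -= lr * grad_h[i] * x[i]

def train(data, shared, heads, tol=1e-4, max_epochs=500):
    """Repeat until the loss change falls below tol (an assumed convergence condition)."""
    prev = cur = total_loss(data, shared, heads)
    for _ in range(max_epochs):
        for x, labels in data:
            train_step(x, labels, shared, heads)
        cur = total_loss(data, shared, heads)
        if abs(prev - cur) < tol:
            break
        prev = cur
    return cur
```

In a real system the feature sub-networks and task sub-networks would be neural networks trained with automatic differentiation; the sketch only shows which errors flow into which parameters.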

2. The method of claim 1, wherein obtaining a training label set for training a multimedia asset comprises:

acquiring a recommended interaction information set corresponding to a plurality of historical multimedia resources respectively; the recommendation interaction information set comprises recommendation interaction information corresponding to each task;

counting each piece of recommended interaction information corresponding to the same task to obtain reference interaction information corresponding to each task;

classifying the quality of the historical multimedia resources based on the recommended interaction information and the corresponding reference interaction information to obtain a quality label set corresponding to each historical multimedia resource;

and obtaining the training multimedia resources and the corresponding training label sets based on the historical multimedia resources and the corresponding quality label sets.

3. The method of claim 2, wherein the classifying the quality of the historical multimedia resources based on the recommended interaction information and the corresponding reference interaction information to obtain a set of quality labels corresponding to each historical multimedia resource comprises:

comparing the recommended interaction degree corresponding to the recommended interaction information of the same task with the reference interaction degree corresponding to the reference interaction information in the recommended interaction information set corresponding to the same historical multimedia resource;

determining a quality label of a task corresponding to recommended interaction information with the recommended interaction degree greater than the reference interaction degree as a positive label;

and determining the quality label of the task corresponding to the recommended interaction information with the recommended interaction degree smaller than the reference interaction degree as a negative label.
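The labeling procedure in claims 2 and 3 can be sketched as follows. The source does not say which statistic produces the reference interaction information, so the per-task mean is used here as an assumption. The function names and the handling of a degree exactly equal to the reference (left unlabeled) are also hypothetical.

```python
from statistics import mean

def reference_levels(history):
    """history: {resource_id: {task: interaction_degree}}.
    Returns the per-task mean as the reference interaction degree (assumed statistic)."""
    tasks = {task for stats in history.values() for task in stats}
    return {task: mean(stats[task] for stats in history.values() if task in stats)
            for task in tasks}

def quality_labels(stats, reference):
    """Compare one resource's interaction degrees with the per-task references."""
    labels = {}
    for task, degree in stats.items():
        if degree > reference[task]:
            labels[task] = 1        # positive label
        elif degree < reference[task]:
            labels[task] = 0        # negative label
        else:
            labels[task] = None     # equal case is not covered by the claims
    return labels
```

The resulting label dictionaries, paired with each resource's attribute information, would form the training label sets.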

4. The method of claim 1, wherein the feature subnetwork comprises a text feature subnetwork, the target attribute information associated with the text feature subnetwork includes a plurality of text attribute information, and the text feature subnetwork includes data processing channels to which respective text attribute information respectively correspond;

the obtaining of the attribute feature vector output by each feature subnetwork by vectorizing the target attribute information associated with the feature subnetwork through each feature subnetwork in the multimedia resource classification model comprises:

vectorizing the corresponding text attribute information through each data processing channel in the text feature sub-network to obtain a text feature vector output by each data processing channel;

and obtaining the attribute feature vector output by the text feature sub-network based on each text feature vector.

5. The method of claim 1, wherein the feature subnetwork comprises an atomic feature subnetwork;

the obtaining of the attribute feature vector output by each feature subnetwork by vectorizing the target attribute information associated with the feature subnetwork through each feature subnetwork in the multimedia resource classification model comprises:

performing feature cross processing on target attribute information associated with the atomic feature sub-network through the atomic feature sub-network to obtain at least one cross feature vector;

and obtaining the attribute feature vector output by the atomic feature sub-network based on each cross feature vector.

6. The method of claim 5, wherein the target attribute information associated with the atomic feature subnetwork includes at least two of user attribute information, image attribute information, language attribute information, and text statistics attribute information.
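One possible reading of the feature cross processing in claims 5 and 6 is a pairwise element-wise product of per-attribute embedding vectors. The source does not fix the crossing operator, so this choice, the requirement that all embeddings share a length, and the function names are assumptions made for illustration.

```python
from itertools import combinations

def cross_features(embeddings):
    """embeddings: {attribute_name: vector}; all vectors are assumed to share one length.
    Returns one cross feature vector (element-wise product) per attribute pair."""
    return {
        (a, b): [x * y for x, y in zip(embeddings[a], embeddings[b])]
        for a, b in combinations(sorted(embeddings), 2)
    }

def atomic_feature_vector(embeddings):
    """Concatenate all cross feature vectors in a fixed (sorted) order."""
    crossed = cross_features(embeddings)
    output = []
    for pair in sorted(crossed):
        output.extend(crossed[pair])
    return output
```

With at least two attributes (as claim 6 requires), the function yields at least one cross feature vector.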

7. The method of claim 1, wherein the feature subnetwork comprises an image-text feature subnetwork, the target attribute information associated with the image-text feature subnetwork comprises text attribute information and image attribute information, and the image-text feature subnetwork comprises a text data processing channel corresponding to the text attribute information and an image data processing channel corresponding to the image attribute information;

the obtaining of the attribute feature vector output by each feature subnetwork by vectorizing the target attribute information associated with the feature subnetwork through each feature subnetwork in the multimedia resource classification model comprises:

encoding the text attribute information through the text data processing channel to obtain an intermediate feature vector;

encoding the image attribute information through the image data processing channel to obtain an image feature vector;

performing attention distribution processing on the intermediate feature vector based on the image feature vector to obtain a first image-text fusion feature vector;

performing attention distribution processing on the image feature vector based on the intermediate feature vector to obtain a second image-text fusion feature vector;

and obtaining the attribute feature vector output by the image-text feature sub-network based on the first image-text fusion feature vector and the second image-text fusion feature vector.
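The two attention steps in claim 7 can be sketched with dot-product attention pooling. This is an illustrative assumption, not the patented formulation: each modality is summarized by its mean vector, that mean guides a softmax over the other modality's vectors, and the two fused vectors are concatenated. Text and image vectors are assumed to share one dimension, and all function names are hypothetical.

```python
import math

def softmax(scores):
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def mean_vector(vectors):
    return [sum(column) / len(vectors) for column in zip(*vectors)]

def guided_attention(items, guide):
    """Weight each item by softmax(dot(item, guide)) and return the weighted sum."""
    scores = [sum(a * b for a, b in zip(item, guide)) for item in items]
    weights = softmax(scores)
    return [sum(w * item[d] for w, item in zip(weights, items))
            for d in range(len(items[0]))]

def image_text_feature(text_vectors, image_vectors):
    # First fusion vector: attention over text vectors, guided by the image side.
    first = guided_attention(text_vectors, mean_vector(image_vectors))
    # Second fusion vector: attention over image vectors, guided by the text side.
    second = guided_attention(image_vectors, mean_vector(text_vectors))
    return first + second   # concatenation as the sub-network output
```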

8. The method according to claim 7, wherein said encoding the text attribute information through the text data processing channel to obtain an intermediate feature vector comprises:

performing word coding processing on the text attribute information to obtain a word feature vector;

and carrying out sentence coding processing on the word feature vector to obtain the intermediate feature vector.

9. The method of claim 1, wherein the feature subnetwork comprises a style feature subnetwork, and wherein the target attribute information associated with the style feature subnetwork comprises style attribute information; the style feature subnetwork comprises a first data processing channel and a second data processing channel;

the obtaining of the attribute feature vector output by each feature subnetwork by vectorizing the target attribute information associated with the feature subnetwork through each feature subnetwork in the multimedia resource classification model comprises:

encoding the style attribute information through the first data processing channel to obtain an initial feature vector, and performing attention allocation processing on the initial feature vector to obtain a first feature vector;

performing convolution processing on the style attribute information through the second data processing channel to obtain a second feature vector;

and obtaining the attribute feature vector output by the style feature sub-network based on the first feature vector and the second feature vector.
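The two channels of claim 9 can be sketched under several assumptions the source does not state: the style attribute information is a token sequence (for example, layout markers), both channels start from an embedding lookup, the first channel uses dot-product attention pooling with a query vector, and the second uses a 1D convolution followed by max pooling. The names `table`, `query`, and `kernels` are hypothetical parameters.

```python
import math

def softmax(scores):
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_channel(embedded, query):
    """First channel: attention-weighted sum of the encoded tokens."""
    weights = softmax([sum(q * e for q, e in zip(query, vec)) for vec in embedded])
    return [sum(w * vec[d] for w, vec in zip(weights, embedded))
            for d in range(len(embedded[0]))]

def conv_channel(embedded, kernels):
    """Second channel: 1D convolution over the sequence, max-pooled per kernel.
    Each kernel is a list of `width` weight vectors; the sequence must be at
    least `width` tokens long."""
    pooled = []
    for kernel in kernels:
        width = len(kernel)
        responses = [
            sum(sum(k * e for k, e in zip(kernel[j], embedded[i + j]))
                for j in range(width))
            for i in range(len(embedded) - width + 1)
        ]
        pooled.append(max(responses))
    return pooled

def style_feature_vector(tokens, table, query, kernels):
    embedded = [table[token] for token in tokens]
    return attention_channel(embedded, query) + conv_channel(embedded, kernels)
```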

10. The method of claim 1, wherein each of the task sub-networks comprises an expert layer, a gating layer, and a fusion layer; the task sub-networks share an expert layer;

the step of inputting the attribute feature vectors into the task sub-networks to obtain the prediction labels corresponding to the tasks includes:

in the current task sub-network, the expert layer performs feature processing on the attribute feature vectors to obtain feature processing results, the gating layer performs weighting processing on the feature processing results to obtain intermediate processing results, and the fusion layer performs fusion processing on the intermediate processing results to obtain a prediction label of a task corresponding to the current task sub-network.
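The expert, gating, and fusion layers of claim 10 resemble a multi-gate mixture-of-experts layout, although the source does not use that name. The sketch below assumes single-layer ReLU experts shared by all tasks, a per-task softmax gate, and a fusion step that sums the gated expert outputs and applies a logistic head. Layer shapes, random initialization, and all names are hypothetical.

```python
import math
import random

def matvec(matrix, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in matrix]

def softmax(scores):
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

class TaskSubNetwork:
    """Per-task gating layer and fusion layer; the experts are passed in and shared."""

    def __init__(self, n_experts, in_dim, hidden_dim, rng):
        self.gate = random_matrix(n_experts, in_dim, rng)
        self.head = [rng.uniform(-0.5, 0.5) for _ in range(hidden_dim)]

    def predict(self, x, expert_outputs):
        gate_weights = softmax(matvec(self.gate, x))
        # Gating layer: weight each expert's feature processing result.
        weighted = [[g * h for h in out] for g, out in zip(gate_weights, expert_outputs)]
        # Fusion layer: sum the weighted results, then map to a probability.
        fused = [sum(column) for column in zip(*weighted)]
        logit = sum(w * f for w, f in zip(self.head, fused))
        return 1.0 / (1.0 + math.exp(-logit))

def predict_all(attribute_vectors, experts, task_networks):
    x = [v for vec in attribute_vectors for v in vec]   # concatenate sub-network outputs
    expert_outputs = [[max(0.0, h) for h in matvec(expert, x)] for expert in experts]
    return {task: net.predict(x, expert_outputs) for task, net in task_networks.items()}
```

Because every task network reads the same `expert_outputs`, only the gate and head differ between tasks.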

11. The method according to any one of claims 1 to 10, further comprising:

acquiring a target attribute information set and a verification label set of a verification multimedia resource; the verification multimedia resource is a newly recommended multimedia resource;

inputting the target attribute information set of the verification multimedia resource into the trained multimedia resource classification model to obtain a prediction label set corresponding to the verification multimedia resource;

calculating classification accuracy based on the prediction label set and the verification label set corresponding to the verification multimedia resource;

and when the classification accuracy is smaller than an accuracy threshold, updating the trained multimedia resource classification model based on the prediction label set and the training label set corresponding to the verification multimedia resource to obtain an updated multimedia resource classification model.
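The accuracy check in claim 11 can be sketched as below. The accuracy is assumed to be the fraction of (resource, task) labels predicted correctly, and the threshold value 0.9 is an arbitrary placeholder; the source specifies neither. Function names are hypothetical.

```python
def classification_accuracy(predicted, expected):
    """predicted / expected: {resource_id: {task: label}}.
    Returns the fraction of (resource, task) pairs whose labels match."""
    total = 0
    correct = 0
    for resource_id, labels in expected.items():
        for task, label in labels.items():
            total += 1
            correct += int(predicted[resource_id][task] == label)
    return correct / total if total else 0.0

def needs_update(predicted, expected, threshold=0.9):
    """True when accuracy falls below the threshold, i.e. the model should be updated."""
    return classification_accuracy(predicted, expected) < threshold
```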

12. A method for recommending multimedia resources, the method comprising:

acquiring a target attribute information set of a multimedia resource to be recommended; the target attribute information set comprises target attribute information of a plurality of dimensions;

inputting the target attribute information set into a trained multimedia resource classification model; the multimedia resource classification model comprises a plurality of feature sub-networks and a plurality of task sub-networks;

respectively carrying out vectorization processing on target attribute information associated with the feature sub-networks through each feature sub-network in the multimedia resource classification model to obtain attribute feature vectors output by each feature sub-network;

inputting each attribute feature vector into each task sub-network to obtain a prediction label output by each task sub-network;

obtaining a quality classification result corresponding to the multimedia resource to be recommended based on each prediction label;

and recommending the multimedia resources to be recommended based on the quality classification result.
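The last two steps of claim 12 can be sketched as follows. How the prediction labels are combined into a quality classification result is not specified in the source; this sketch assumes each task sub-network outputs a probability, treats a resource as high quality only when every task is predicted positive, and ranks the retained resources by the sum of their probabilities. Names and the 0.5 threshold are hypothetical.

```python
def quality_class(predictions, threshold=0.5):
    """predictions: {task: probability of a positive label}."""
    all_positive = bool(predictions) and all(p >= threshold for p in predictions.values())
    return "high" if all_positive else "low"

def recommend(candidates, threshold=0.5):
    """candidates: {resource_id: {task: probability}}.
    Returns the ids of high-quality resources, strongest predictions first."""
    kept = [rid for rid, preds in candidates.items()
            if quality_class(preds, threshold) == "high"]
    return sorted(kept, key=lambda rid: sum(candidates[rid].values()), reverse=True)
```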

13. An apparatus for training a multimedia resource classification model, the apparatus comprising:

the information acquisition module is used for acquiring a target attribute information set and a training label set of a training multimedia resource; the target attribute information set comprises target attribute information of multiple dimensions, and the training label set comprises training labels corresponding to multiple tasks;

the attribute information input module is used for inputting the target attribute information set of the training multimedia resources into a multimedia resource classification model to be trained; the multimedia resource classification model comprises a plurality of feature sub-networks and task sub-networks corresponding to the tasks respectively;

the attribute information processing module is used for respectively carrying out vectorization processing on the target attribute information associated with the feature sub-networks through each feature sub-network in the multimedia resource classification model to obtain attribute feature vectors output by each feature sub-network;

the label prediction module is used for inputting each attribute feature vector into each task sub-network to obtain a prediction label corresponding to each task;

the model adjusting module is used for adjusting parameters of the corresponding task sub-network based on the training label and the prediction label corresponding to the same task, and adjusting model parameters of each feature sub-network based on the training labels and the prediction labels corresponding to each task until a convergence condition is met, so as to obtain a trained multimedia resource classification model; the multimedia resource classification model is used for classifying the quality of the multimedia resources to be recommended.

14. An apparatus for recommending multimedia resources, the apparatus comprising:

the attribute information acquisition module is used for acquiring a target attribute information set of the multimedia resource to be recommended; the target attribute information set comprises target attribute information of a plurality of dimensions;

the attribute information input module is used for inputting the target attribute information set into a trained multimedia resource classification model; the multimedia resource classification model comprises a plurality of feature sub-networks and a plurality of task sub-networks;

the attribute information processing module is used for respectively carrying out vectorization processing on the target attribute information associated with the feature sub-networks through each feature sub-network in the multimedia resource classification model to obtain attribute feature vectors output by each feature sub-network;

the label prediction module is used for inputting each attribute feature vector into each task sub-network to obtain a prediction label output by each task sub-network;

the quality classification module is used for obtaining a quality classification result corresponding to the multimedia resource to be recommended based on each prediction label;

and the resource recommending module is used for recommending the multimedia resources to be recommended based on the quality classification result.

15. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor, when executing the computer program, implements the steps of the method of any of claims 1 to 12.

16. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 12.

Technical Field

The present application relates to the field of computer technologies, and in particular, to a multimedia resource classification model training method, a multimedia resource recommendation method, an apparatus, a computer device, and a storage medium.

Background

With the development of computer technology, a variety of network applications have emerged. People can publish multimedia resources on web applications and also browse multimedia resources on web applications.

In the conventional technology, multimedia resources are generally recommended to users at random, so low-quality multimedia resources are easily recommended, and the recommended resources often fail to attract the user's attention or interest. Low-quality multimedia resources not only occupy storage resources but also cause users to search repeatedly and refresh the interface repeatedly, which consumes a large amount of computer resources. As a result, the effectiveness of multimedia resource recommendation is low.

Disclosure of Invention

In view of the foregoing, it is desirable to provide a multimedia resource classification model training method, a multimedia resource recommendation method, an apparatus, a computer device, and a storage medium, which can improve the effectiveness of multimedia resource recommendation.

A method of multimedia resource classification model training, the method comprising:

acquiring a target attribute information set and a training label set of a training multimedia resource; the target attribute information set comprises target attribute information of multiple dimensions, and the training label set comprises training labels corresponding to multiple tasks;

inputting a target attribute information set of a training multimedia resource into a multimedia resource classification model to be trained; the multimedia resource classification model comprises a plurality of feature sub-networks and task sub-networks corresponding to the tasks respectively;

respectively carrying out vectorization processing on target attribute information associated with the feature sub-networks through each feature sub-network in the multimedia resource classification model to obtain attribute feature vectors output by each feature sub-network;

inputting each attribute feature vector into each task sub-network to obtain a prediction label corresponding to each task;

adjusting parameters of the corresponding task sub-network based on the training label and the prediction label corresponding to the same task, and adjusting model parameters of each feature sub-network based on the training labels and the prediction labels corresponding to each task until a convergence condition is met, so as to obtain a trained multimedia resource classification model; the multimedia resource classification model is used for classifying the quality of the multimedia resource to be recommended.

An apparatus for training a multimedia resource classification model, the apparatus comprising:

the information acquisition module is used for acquiring a target attribute information set and a training label set of a training multimedia resource; the target attribute information set comprises target attribute information of multiple dimensions, and the training label set comprises training labels corresponding to multiple tasks;

the attribute information input module is used for inputting a target attribute information set of the training multimedia resources into a multimedia resource classification model to be trained; the multimedia resource classification model comprises a plurality of feature sub-networks and task sub-networks corresponding to the tasks respectively;

the attribute information processing module is used for respectively carrying out vectorization processing on the target attribute information associated with the feature sub-networks through each feature sub-network in the multimedia resource classification model to obtain attribute feature vectors output by each feature sub-network;

the label prediction module is used for inputting each attribute feature vector into each task sub-network to obtain a prediction label corresponding to each task;

the model adjusting module is used for adjusting parameters of the corresponding task sub-network based on the training label and the prediction label corresponding to the same task, and adjusting model parameters of each feature sub-network based on the training labels and the prediction labels corresponding to each task until a convergence condition is met, so as to obtain a trained multimedia resource classification model; the multimedia resource classification model is used for classifying the quality of the multimedia resource to be recommended.

In one embodiment, the information acquisition module is further configured to: obtain recommended interaction information sets respectively corresponding to a plurality of historical multimedia resources, where each recommended interaction information set comprises recommended interaction information corresponding to each task; compute statistics over the recommended interaction information corresponding to the same task to obtain reference interaction information corresponding to each task; classify the quality of the historical multimedia resources based on the recommended interaction information and the corresponding reference interaction information to obtain a quality label set corresponding to each historical multimedia resource; and obtain the training multimedia resources and the corresponding training label sets based on the historical multimedia resources and the corresponding quality label sets.

In one embodiment, the information acquisition module is further configured to: compare, within the recommended interaction information set corresponding to the same historical multimedia resource, the recommended interaction degree corresponding to the recommended interaction information of the same task with the reference interaction degree corresponding to the reference interaction information; determine the quality label of a task whose recommended interaction degree is greater than the reference interaction degree as a positive label; and determine the quality label of a task whose recommended interaction degree is smaller than the reference interaction degree as a negative label.

In one embodiment, the feature subnetwork includes a text feature subnetwork, the target attribute information associated with the text feature subnetwork includes a plurality of text attribute information, and the text feature subnetwork includes data processing channels to which respective text attribute information respectively correspond. The attribute information processing module is also used for respectively carrying out vectorization processing on the corresponding text attribute information through each data processing channel in the text feature sub-network to obtain a text feature vector output by each data processing channel; and obtaining attribute feature vectors output by the text feature sub-network based on the text feature vectors.

In one embodiment, the feature subnetwork comprises an atomic feature subnetwork. The attribute information processing module is also used for performing feature cross processing on target attribute information associated with the atomic feature sub-network through the atomic feature sub-network to obtain at least one cross feature vector; and obtaining the attribute feature vector output by the atomic feature subnetwork based on each cross feature vector.

In one embodiment, the target attribute information associated with the atomic feature subnetwork includes at least two of user attribute information, image attribute information, language attribute information, and text statistical attribute information.

In one embodiment, the feature subnetwork includes an image-text feature subnetwork, the target attribute information associated with the image-text feature subnetwork includes text attribute information and image attribute information, and the image-text feature subnetwork includes a text data processing channel corresponding to the text attribute information and an image data processing channel corresponding to the image attribute information. The attribute information processing module is also used for encoding the text attribute information through the text data processing channel to obtain an intermediate feature vector; encoding the image attribute information through the image data processing channel to obtain an image feature vector; performing attention distribution processing on the intermediate feature vector based on the image feature vector to obtain a first image-text fusion feature vector; performing attention distribution processing on the image feature vector based on the intermediate feature vector to obtain a second image-text fusion feature vector; and obtaining the attribute feature vector output by the image-text feature subnetwork based on the first image-text fusion feature vector and the second image-text fusion feature vector.

In one embodiment, the attribute information processing module is further configured to perform word encoding processing on the text attribute information to obtain a word feature vector; and carrying out sentence coding processing on the word feature vector to obtain an intermediate feature vector.

In one embodiment, the feature subnetwork comprises a style feature subnetwork, and the target attribute information associated with the style feature subnetwork comprises style attribute information; the style feature subnetwork includes a first data processing channel and a second data processing channel. The attribute information processing module is also used for encoding the style attribute information through the first data processing channel to obtain an initial feature vector, and performing attention allocation processing on the initial feature vector to obtain a first feature vector; performing convolution processing on the style attribute information through the second data processing channel to obtain a second feature vector; and obtaining the attribute feature vector output by the style feature sub-network based on the first feature vector and the second feature vector.

In one embodiment, each task subnetwork includes an expert layer, a gating layer, and a fusion layer; the task sub-networks share an expert layer. The label prediction module is further used for performing feature processing on each attribute feature vector through the expert layer in a current task sub-network to obtain feature processing results, performing weighting processing on the feature processing results through the gating layer to obtain intermediate processing results, and performing fusion processing on the intermediate processing results through the fusion layer to obtain a prediction label of the task corresponding to the current task sub-network.

In one embodiment, the apparatus further comprises:

the model updating module is used for acquiring a target attribute information set and a verification label set of a verification multimedia resource, where the verification multimedia resource is a newly recommended multimedia resource; inputting the target attribute information set of the verification multimedia resource into the trained multimedia resource classification model to obtain a prediction label set corresponding to the verification multimedia resource; calculating classification accuracy based on the prediction label set and the verification label set corresponding to the verification multimedia resource; and when the classification accuracy is smaller than the accuracy threshold, updating the trained multimedia resource classification model based on the prediction label set and the training label set corresponding to the verification multimedia resource to obtain the updated multimedia resource classification model.

A computer device comprising a memory and a processor, the memory storing a computer program, the processor implementing the following steps when executing the computer program:

acquiring a target attribute information set and a training label set of a training multimedia resource; the target attribute information set comprises target attribute information of multiple dimensions, and the training label set comprises training labels corresponding to multiple tasks;

inputting a target attribute information set of a training multimedia resource into a multimedia resource classification model to be trained; the multimedia resource classification model comprises a plurality of feature sub-networks and task sub-networks corresponding to the tasks respectively;

respectively carrying out vectorization processing on target attribute information associated with the feature sub-networks through each feature sub-network in the multimedia resource classification model to obtain attribute feature vectors output by each feature sub-network;

inputting each attribute feature vector into each task sub-network to obtain a prediction label corresponding to each task;

adjusting parameters of the corresponding task sub-network based on the training label and the prediction label corresponding to the same task, and adjusting model parameters of each feature sub-network based on the training labels and the prediction labels corresponding to each task until a convergence condition is met, so as to obtain a trained multimedia resource classification model; the multimedia resource classification model is used for classifying the quality of the multimedia resource to be recommended.

A computer-readable storage medium, on which a computer program is stored which, when executed by a processor, carries out the steps of:

acquiring a target attribute information set and a training label set of a training multimedia resource; the target attribute information set comprises target attribute information of multiple dimensions, and the training label set comprises training labels corresponding to multiple tasks;

inputting a target attribute information set of a training multimedia resource into a multimedia resource classification model to be trained; the multimedia resource classification model comprises a plurality of feature sub-networks and task sub-networks corresponding to the tasks respectively;

respectively carrying out vectorization processing on target attribute information associated with the feature sub-networks through each feature sub-network in the multimedia resource classification model to obtain attribute feature vectors output by each feature sub-network;

inputting each attribute feature vector into each task sub-network to obtain a prediction label corresponding to each task;

adjusting parameters of corresponding task sub-networks based on training labels and prediction labels corresponding to the same task, and adjusting model parameters of each feature sub-network based on the training labels and the prediction labels corresponding to each task until convergence conditions are met, so as to obtain a trained multimedia resource classification model; the multimedia resource classification model is used for classifying the quality of the multimedia resource to be recommended.

In the above multimedia resource classification model training method, apparatus, computer device and storage medium, a target attribute information set and a training label set of a training multimedia resource are acquired. The target attribute information set comprises target attribute information of multiple dimensions, and the training label set comprises training labels corresponding to multiple tasks. The target attribute information set of the training multimedia resource is input into a multimedia resource classification model to be trained, which comprises a plurality of feature sub-networks and task sub-networks respectively corresponding to the tasks. Each feature sub-network in the multimedia resource classification model vectorizes the target attribute information associated with it to obtain the attribute feature vector output by that feature sub-network, and the attribute feature vectors are input into the task sub-networks to obtain the prediction label corresponding to each task. The parameters of each task sub-network are adjusted based on the training label and the prediction label corresponding to the same task, and the model parameters of each feature sub-network are adjusted based on the training labels and the prediction labels corresponding to all tasks, until a convergence condition is met, yielding a trained multimedia resource classification model that is used for classifying the quality of the multimedia resource to be recommended. In this way, the multimedia resource classification model can be trained in a supervised manner based on the target attribute information set and the training label set of the training multimedia resource, and a multimedia resource classification model capable of accurately classifying the quality of the multimedia resource to be recommended is obtained.
The target attribute information set of a multimedia resource comprises target attribute information of multiple dimensions, and target attribute information of different dimensions reflects the content quality of the multimedia resource from different angles. When the target attribute information set is input into the multimedia resource classification model, the quality of the multimedia resource can be classified accurately by jointly considering the target attribute information of every dimension, and a prediction label that accurately reflects the quality of the multimedia resource is obtained. In addition, the multimedia resource classification model comprises a plurality of task sub-networks and is therefore a multi-task model that can predict the performance of a multimedia resource on each task. During training, the related tasks are learned in parallel and their gradients are back-propagated simultaneously, so the model learns both the connections and the differences between tasks, which improves the learning efficiency and quality of each task. Finally, the trained multimedia resource classification model can be used for classifying the quality of multimedia resources to be recommended, so that multimedia resources of better quality can be recommended to the user and the effectiveness of multimedia resource recommendation is improved. Effective recommendation avoids the repeated searching or repeated interface refreshing caused by low-quality, ineffective recommendations. Because repeated searching or refreshing occupies a large amount of computer device resources, the resource waste of the terminal or the server is reduced while the effectiveness of resource recommendation is improved.

A method of multimedia resource recommendation, the method comprising:

acquiring a target attribute information set of a multimedia resource to be recommended; the target attribute information set comprises target attribute information of a plurality of dimensions;

inputting the target attribute information set into the trained multimedia resource classification model; the multimedia resource classification model comprises a plurality of feature sub-networks and a plurality of task sub-networks;

respectively carrying out vectorization processing on target attribute information associated with the feature sub-networks through each feature sub-network in the multimedia resource classification model to obtain attribute feature vectors output by each feature sub-network;

inputting each attribute feature vector into each task sub-network to obtain a prediction label output by each task sub-network;

obtaining a quality classification result corresponding to the multimedia resource to be recommended based on each prediction label;

and recommending the multimedia resources to be recommended based on the quality classification result.

A multimedia asset recommendation device, the device comprising:

the attribute information acquisition module is used for acquiring a target attribute information set of the multimedia resource to be recommended; the target attribute information set comprises target attribute information of a plurality of dimensions;

the attribute information input module is used for inputting the target attribute information set into the trained multimedia resource classification model; the multimedia resource classification model comprises a plurality of feature sub-networks and a plurality of task sub-networks;

the attribute information processing module is used for respectively carrying out vectorization processing on the target attribute information associated with the feature sub-networks through each feature sub-network in the multimedia resource classification model to obtain attribute feature vectors output by each feature sub-network;

the label prediction module is used for inputting each attribute feature vector into each task sub-network to obtain a prediction label output by each task sub-network;

the quality classification module is used for obtaining a quality classification result corresponding to the multimedia resource to be recommended based on each prediction label;

and the resource recommending module is used for recommending the multimedia resources to be recommended based on the quality classification result.

A computer device comprising a memory and a processor, the memory storing a computer program, the processor implementing the following steps when executing the computer program:

acquiring a target attribute information set of a multimedia resource to be recommended; the target attribute information set comprises target attribute information of a plurality of dimensions;

inputting the target attribute information set into the trained multimedia resource classification model; the multimedia resource classification model comprises a plurality of feature sub-networks and a plurality of task sub-networks;

respectively carrying out vectorization processing on target attribute information associated with the feature sub-networks through each feature sub-network in the multimedia resource classification model to obtain attribute feature vectors output by each feature sub-network;

inputting each attribute feature vector into each task sub-network to obtain a prediction label output by each task sub-network;

obtaining a quality classification result corresponding to the multimedia resource to be recommended based on each prediction label;

and recommending the multimedia resources to be recommended based on the quality classification result.

A computer-readable storage medium, on which a computer program is stored which, when executed by a processor, carries out the steps of:

acquiring a target attribute information set of a multimedia resource to be recommended; the target attribute information set comprises target attribute information of a plurality of dimensions;

inputting the target attribute information set into the trained multimedia resource classification model; the multimedia resource classification model comprises a plurality of feature sub-networks and a plurality of task sub-networks;

respectively carrying out vectorization processing on target attribute information associated with the feature sub-networks through each feature sub-network in the multimedia resource classification model to obtain attribute feature vectors output by each feature sub-network;

inputting each attribute feature vector into each task sub-network to obtain a prediction label output by each task sub-network;

obtaining a quality classification result corresponding to the multimedia resource to be recommended based on each prediction label;

and recommending the multimedia resources to be recommended based on the quality classification result.

In the above multimedia resource recommendation method, apparatus, computer device and storage medium, a target attribute information set of the multimedia resource to be recommended is acquired, the target attribute information set comprising target attribute information of multiple dimensions. The target attribute information set is input into the trained multimedia resource classification model, which comprises a plurality of feature sub-networks and a plurality of task sub-networks. Each feature sub-network in the multimedia resource classification model vectorizes the target attribute information associated with it to obtain the attribute feature vector output by that feature sub-network; the attribute feature vectors are input into the task sub-networks to obtain the prediction label output by each task sub-network; a quality classification result corresponding to the multimedia resource to be recommended is obtained based on the prediction labels; and the multimedia resource to be recommended is recommended based on the quality classification result. Because the target attribute information set comprises target attribute information of multiple dimensions, and target attribute information of different dimensions reflects the content quality of the multimedia resource from different angles, inputting the target attribute information set into the multimedia resource classification model allows the quality of the multimedia resource to be classified accurately by jointly considering the target attribute information of every dimension, and an accurate quality classification result is obtained.
In addition, the multimedia resource classification model comprises a plurality of task sub-networks, is a multi-task model, can predict the performance of multimedia resources on each task, synthesizes the performance of the multimedia resources on each task to obtain the quality classification result of the multimedia resources, and can further improve the accuracy of the quality classification of the multimedia resources. Finally, the multimedia resources with better quality can be identified through the multimedia resource classification model, so that the multimedia resources with better quality can be recommended to the user, and the effectiveness of multimedia resource recommendation is improved. The effective multimedia resource recommendation can avoid repeated searching or repeated refreshing of the interface caused by low-quality and ineffective multimedia resource recommendation, and the repeated searching or repeated refreshing of the interface can occupy a large amount of computer equipment resources, so that the resource waste of the terminal or the server can be reduced on the basis of improving the effectiveness of the resource recommendation.

Drawings

FIG. 1 is a diagram of an application environment of the multimedia resource classification model training method and the multimedia resource recommendation method in one embodiment;

FIG. 2 is a flowchart illustrating a method for training a multimedia resource classification model according to an embodiment;

FIG. 3 is a schematic diagram of a process for obtaining training labels for training multimedia resources according to an embodiment;

FIG. 4 is a flow diagram illustrating the determination of a quality label for a historical multimedia resource based on recommended interaction information and corresponding reference interaction information in one embodiment;

FIG. 5 is a schematic diagram of the structure of a text feature sub-network in one embodiment;

FIG. 6 is a schematic diagram of the structure of an atomic feature sub-network in one embodiment;

FIG. 7 is a schematic diagram of the structure of an image-text fusion feature sub-network in one embodiment;

FIG. 8 is a schematic diagram of a style feature subnetwork in one embodiment;

FIG. 9A is a diagram illustrating the structure of a task sub-network in one embodiment;

FIG. 9B is a diagram illustrating the structure of a task sub-network in accordance with another embodiment;

FIG. 9C is a diagram showing the structure of a task sub-network in yet another embodiment;

FIG. 10 is a flowchart illustrating an embodiment of updating a multimedia asset classification model;

FIG. 11 is a schematic flow chart illustrating training and updating of a multimedia asset classification model according to an embodiment;

FIG. 12 is a flowchart illustrating a method for recommending multimedia resources according to an embodiment;

FIG. 13A is a flow diagram illustrating the recommendation of premium content in one embodiment;

FIG. 13B is a schematic diagram of the structure of an image-text content classification model in one embodiment;

FIG. 13C is a schematic diagram of an image-text content recommendation interface in one embodiment;

FIG. 14 is a block diagram showing the structure of an apparatus for training a classification model of multimedia resources according to an embodiment;

FIG. 15 is a block diagram showing the structure of an apparatus for training a multimedia resource classification model according to an embodiment;

FIG. 16 is a block diagram of an apparatus for recommending multimedia resources according to an embodiment;

FIG. 17 is a diagram showing an internal structure of a computer device in one embodiment;

FIG. 18 is a diagram illustrating an internal structure of a computer device according to an embodiment.

Detailed Description

In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.

Artificial Intelligence (AI) is a theory, method, technology and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge and use that knowledge to obtain the best results. In other words, artificial intelligence is a comprehensive branch of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning and decision making.

Artificial intelligence is a comprehensive discipline that covers a wide range of fields, including both hardware-level and software-level technologies. Basic artificial intelligence technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, operation/interaction systems, mechatronics, and the like. Artificial intelligence software technologies mainly include computer vision, speech processing, natural language processing, and machine learning/deep learning.

Computer Vision (CV) is a science that studies how to make machines "see". More specifically, it uses cameras and computers in place of human eyes to identify, track and measure targets, and further performs image processing so that the processed image is more suitable for human observation or for transmission to an instrument for detection. As a scientific discipline, computer vision studies related theories and techniques in an attempt to build artificial intelligence systems that can obtain information from images or multidimensional data. Computer vision technologies generally include image processing, image recognition, image semantic understanding, image retrieval, OCR, video processing, video semantic understanding, video content/behavior recognition, three-dimensional object reconstruction, 3D technologies, virtual reality, augmented reality, simultaneous localization and mapping, and other technologies, and also include common biometric technologies such as face recognition and fingerprint recognition.

Natural Language Processing (NLP) is an important direction in the fields of computer science and artificial intelligence. It studies theories and methods that enable effective communication between humans and computers using natural language. Natural language processing is a science integrating linguistics, computer science and mathematics. Research in this field therefore involves natural language, that is, the language people use every day, and is closely related to the study of linguistics. Natural language processing technologies typically include text processing, semantic understanding, machine translation, robot question answering, knowledge graphs, and the like.

Machine Learning (ML) is a multidisciplinary field involving probability theory, statistics, approximation theory, convex analysis, algorithmic complexity theory and other disciplines. It specializes in studying how a computer simulates or implements human learning behavior in order to acquire new knowledge or skills and to reorganize existing knowledge structures so as to continuously improve its own performance. Machine learning is the core of artificial intelligence and the fundamental way to make computers intelligent, and it is applied throughout all fields of artificial intelligence. Machine learning and deep learning generally include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and learning from demonstration.

The scheme provided by the embodiment of the application relates to the technologies of artificial intelligence, such as computer vision, natural language processing, machine learning and the like, and is specifically explained by the following embodiments:

The multimedia resource classification model training method and the multimedia resource recommendation method provided by the application can be applied to the application environment shown in FIG. 1, in which the terminal 102 communicates with the server 104 via a network. The terminal 102 may be, but is not limited to, a personal computer, notebook computer, smart phone, tablet computer, or portable wearable device, and the server 104 may be implemented by a stand-alone server or by a server cluster composed of a plurality of servers.

The terminal 102 and the server 104 can be used separately to execute the multimedia resource classification model training method and the multimedia resource recommendation method provided in the embodiments of the present application.

For example, the server 104 obtains a target attribute information set and a training label set of a training multimedia resource, where the target attribute information set includes target attribute information of multiple dimensions and the training label set includes training labels corresponding to multiple tasks. The server 104 inputs the target attribute information set of the training multimedia resource into a multimedia resource classification model to be trained, where the multimedia resource classification model includes multiple feature sub-networks and task sub-networks respectively corresponding to the tasks. Each feature sub-network in the multimedia resource classification model vectorizes the target attribute information associated with it to obtain the attribute feature vector output by that feature sub-network, and the attribute feature vectors are input into the task sub-networks to obtain a prediction label corresponding to each task. The server 104 may adjust the parameters of each task sub-network based on the training label and the prediction label corresponding to the same task, and adjust the model parameters of each feature sub-network based on the training labels and the prediction labels corresponding to all tasks, until a convergence condition is satisfied, to obtain a trained multimedia resource classification model, where the multimedia resource classification model is used to classify the quality of the multimedia resource to be recommended.
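The parameter adjustment described above can be illustrated with a minimal sketch. It is not the model of the application: it assumes a single scalar shared parameter `w` standing in for a feature sub-network, one scalar parameter per task standing in for each task sub-network, a binary cross-entropy loss, plain gradient descent, and a convergence test on the change in average loss. All names, the learning rate and the toy data are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def bce(p, y):
    # Binary cross-entropy for one prediction, clipped to avoid log(0).
    eps = 1e-12
    return -(y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps))

def train(samples, num_tasks, lr=0.5, tol=1e-6, max_epochs=5000):
    """samples: list of (x, [label_task0, label_task1, ...]) with scalar x."""
    w = 0.5                    # shared parameter (stand-in for a feature sub-network)
    heads = [0.5] * num_tasks  # one parameter per task (stand-in for task sub-networks)
    history = []
    n = len(samples)
    for _ in range(max_epochs):
        grad_w = 0.0
        grad_heads = [0.0] * num_tasks
        total = 0.0
        for x, labels in samples:
            h = w * x  # "attribute feature vector" (a scalar in this toy case)
            for t in range(num_tasks):
                p = sigmoid(heads[t] * h)
                err = p - labels[t]           # d(loss)/d(logit) for sigmoid + BCE
                grad_heads[t] += err * h      # task head: only its own task's loss
                grad_w += err * heads[t] * x  # shared part: every task contributes
                total += bce(p, labels[t])
        w -= lr * grad_w / n
        heads = [a - lr * g / n for a, g in zip(heads, grad_heads)]
        history.append(total / n)
        # Assumed convergence condition: average loss has stopped changing.
        if len(history) > 1 and abs(history[-2] - history[-1]) < tol:
            break
    return w, heads, history

# Toy data: task 0 is positive when x > 0, task 1 is positive when x < 0.
samples = [(1.0, [1, 0]), (-1.0, [0, 1]), (2.0, [1, 0]), (-2.0, [0, 1])]
w, heads, history = train(samples, num_tasks=2)
```

In this sketch each task parameter is updated only from the error of its own task, while the shared parameter accumulates the errors of all tasks, which mirrors the division of updates described in the paragraph above.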

The terminal 102 obtains a target attribute information set of the multimedia resource to be recommended, where the target attribute information set includes target attribute information of multiple dimensions, and inputs the target attribute information set into a trained multimedia resource classification model that includes a plurality of feature sub-networks and a plurality of task sub-networks. Each feature sub-network in the multimedia resource classification model vectorizes the target attribute information associated with it to obtain the attribute feature vector output by that feature sub-network. The attribute feature vectors are input into the task sub-networks to obtain the prediction label output by each task sub-network, and a quality classification result corresponding to the multimedia resource to be recommended is obtained based on the prediction labels. The terminal 102 may recommend the multimedia resource to be recommended based on the quality classification result.
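How the prediction labels are combined into a quality classification result is not specified in this passage. The sketch below assumes one simple possibility, namely that a resource counts as high quality when it receives a positive label on at least a given number of tasks. The function names, the `min_positive_tasks` parameter and the task names are hypothetical.

```python
def classify_quality(prediction_labels, min_positive_tasks=2):
    """prediction_labels: dict mapping task name -> 1 (positive) or 0 (negative)."""
    positives = sum(1 for label in prediction_labels.values() if label == 1)
    # Assumed aggregation rule: enough positive tasks means high quality.
    return "high_quality" if positives >= min_positive_tasks else "ordinary"

def recommend(candidates, min_positive_tasks=2):
    """candidates: dict mapping resource id -> prediction label dict.
    Returns the ids classified as high quality, in input order."""
    return [
        resource_id
        for resource_id, labels in candidates.items()
        if classify_quality(labels, min_positive_tasks) == "high_quality"
    ]

candidates = {
    "res_a": {"click": 1, "browse": 1, "comment": 0},
    "res_b": {"click": 0, "browse": 1, "comment": 0},
    "res_c": {"click": 1, "browse": 1, "comment": 1},
}
recommended = recommend(candidates)
```

Other aggregation rules, such as weighting tasks differently, would fit the same interface.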

The terminal 102 and the server 104 may also be cooperatively used to execute the multimedia resource classification model training method and the multimedia resource recommendation method provided in the embodiments of the present application.

For example, the server 104 obtains a target attribute information set and a training label set of a training multimedia resource from the terminal 102, and the server 104 performs model training on a multimedia resource classification model based on the target attribute information set and the training label set of the training multimedia resource to obtain a trained multimedia resource classification model.

The terminal 102 obtains the trained multimedia asset classification model from the server 104. The terminal 102 classifies the quality of the multimedia resource to be recommended through the trained multimedia resource classification model to obtain a quality classification result of the multimedia resource to be recommended, and therefore the multimedia resource to be recommended is recommended based on the quality classification result.

In one embodiment, as shown in fig. 2, a multimedia resource classification model training method is provided, which is described by taking the method as an example applied to the computer device in fig. 1, where the computer device may be the terminal 102 or the server 104 in fig. 1. Referring to fig. 2, the multimedia resource classification model training method includes the following steps:

step S202, acquiring a target attribute information set and a training label set of a training multimedia resource; the target attribute information set comprises target attribute information of multiple dimensions, and the training label set comprises training labels corresponding to multiple tasks.

A multimedia resource is a resource that includes at least two kinds of media, for example, an article including pictures, a picture including text, a video including subtitles, a video including audio, and the like. Users can publish multimedia resources on various resource service platforms, for example, short videos on social applications and news on information applications. A training multimedia resource is a multimedia resource used for model training. The target attribute information is information describing the attributes, characteristics and functions of the multimedia resource. Specifically, it may be text-related information, image-related information or style-related information in the multimedia resource, user-related information of the user who published the multimedia resource, and the like.

The training labels are the quality labels of the training multimedia resource, and a quality label measures the quality of a multimedia resource. For example, the quality labels may include positive labels and negative labels: a positive label indicates that the multimedia resource performs well on the corresponding task and is a high-quality multimedia resource on that task, and a negative label indicates that the multimedia resource performs averagely or poorly on the corresponding task and is not a high-quality multimedia resource on that task. The quality labels may also include a first label, a second label, a third label, and the like, where different labels represent different quality levels; for example, the better the multimedia resource performs on a task, the higher its quality level on that task. Each task corresponds to one kind of interaction between users and the multimedia resource, such as a task related to click behavior, a task related to browsing behavior, and a task related to comment behavior. The training label set may then include a quality label corresponding to the click rate task, a quality label corresponding to the browsing duration task, and a quality label corresponding to the comment rate task.
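As an illustration of the label set described above, the following sketch builds one binary label per task by comparing interaction statistics with per-task reference values, and also shows a multi-level variant. The task names, the reference values and the cut points are assumptions made for this example only.

```python
from bisect import bisect_right

TASKS = ("click_rate", "browse_duration", "comment_rate")

def build_training_label_set(stats, reference):
    """Return {task: 1 or 0}: positive (1) when the resource's statistic
    exceeds the reference value for that task, negative (0) otherwise."""
    return {task: 1 if stats[task] > reference[task] else 0 for task in TASKS}

def quality_level(value, cut_points):
    """Multi-level variant: 0 is the lowest level and len(cut_points) the
    highest. cut_points must be sorted in ascending order."""
    return bisect_right(cut_points, value)

# Hypothetical statistics of one resource and hypothetical reference values.
stats = {"click_rate": 0.08, "browse_duration": 35.0, "comment_rate": 0.001}
reference = {"click_rate": 0.05, "browse_duration": 60.0, "comment_rate": 0.002}
label_set = build_training_label_set(stats, reference)
```

Here `label_set` holds one label per task, so a single resource can be positive on the click rate task while being negative on the others.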

Specifically, the computer device may obtain a training multimedia resource from a multimedia resource database, and determine a training label set corresponding to the training multimedia resource. Further, the computer device may perform content analysis on the training multimedia resource to obtain a target attribute information set of the training multimedia resource. For example, a text title, a text body, image quality, and content layout of the multimedia resource are used as target attribute information to form a target attribute information set.

In one embodiment, the training multimedia resource may be a multimedia resource that has already been published and recommended. In that case, a reasonable evaluation system for multimedia resource quality can be constructed from the perspective of user feedback, and the training label set of the training multimedia resource is determined based on the evaluation system. For example, the training labels of the training multimedia resource may be determined based on the feedback information and evaluation information of a large number of users on the multimedia resource, such as determining the quality label set of the multimedia resource based on its click rate, browsing duration and comment rate. Furthermore, a reasonable evaluation system for multimedia resource quality can be constructed by combining the content with user feedback. Of course, the training multimedia resource may also be a multimedia resource that has not yet been published or recommended. In that case, a reasonable evaluation system for multimedia resource quality can be constructed from the perspective of the content itself, and the training label set of the training multimedia resource is determined based on the evaluation system. For example, the training labels of the training multimedia resource may be quality labels determined manually based on expert knowledge. The training labels of the training multimedia resource may also be determined from the quality labels of published multimedia resources with similar content.
For example, in the click rate task, it is known that the quality label of the multimedia resource 1 is a positive label, and when the content similarity between the multimedia resource 1 and the multimedia resource 2 is greater than the preset threshold, that is, the content of the multimedia resource 1 is similar to the content of the multimedia resource 2, it is determined that the quality label of the multimedia resource 2 in the click rate task is also the positive label.
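The similarity-based labelling in this example can be sketched as follows. The sketch assumes that each resource's content is already represented as a numeric vector and that similarity is measured by cosine similarity; neither the similarity measure nor the threshold value of 0.9 is specified in the text, and the function names are hypothetical.

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def transfer_label(new_vec, labelled, threshold=0.9):
    """labelled: list of (content_vector, quality_label) for published resources.
    Returns the label of the most similar resource whose similarity exceeds
    the threshold, or None when no published resource is similar enough."""
    best_label, best_sim = None, threshold
    for vec, label in labelled:
        sim = cosine_similarity(new_vec, vec)
        if sim > best_sim:
            best_label, best_sim = label, sim
    return best_label

# Published resources with known click rate task labels (toy vectors).
labelled = [([1.0, 0.0, 1.0], 1), ([0.0, 1.0, 0.0], 0)]
similar = transfer_label([0.9, 0.1, 1.0], labelled)    # close to the first resource
dissimilar = transfer_label([1.0, 1.0, 0.0], labelled)  # close to neither
```

Returning `None` for resources without a sufficiently similar neighbour leaves them to be labelled by another route, such as manual annotation.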

Step S204, inputting a target attribute information set of the training multimedia resource into a multimedia resource classification model to be trained; the multimedia resource classification model comprises a plurality of feature sub-networks and task sub-networks corresponding to the tasks respectively.

The multimedia resource classification model is a machine learning model for classifying the quality of multimedia resources, that is, for identifying high-quality multimedia resources. The multimedia resource classification model includes a plurality of feature subnetworks and a plurality of task subnetworks. The feature sub-networks are used for converting the target attribute information into attribute feature vectors, and different feature sub-networks are used for processing different target attribute information. The task sub-networks are used for predicting the performance of the multimedia resource on a specific task based on the attribute feature vector, and different task sub-networks correspond to different tasks.

Step S206, respectively carrying out vectorization processing on the target attribute information associated with the feature sub-networks through each feature sub-network in the multimedia resource classification model to obtain the attribute feature vectors output by each feature sub-network.

Specifically, the computer device may input the target attribute information set of the training multimedia resource into the multimedia resource classification model to be trained. Each feature sub-network in the model receives its corresponding target attribute information and vectorizes the target attribute information associated with it, thereby outputting a corresponding attribute feature vector. Vectorization represents the target attribute information as dense vectors, so that the model can learn the related information and knowledge more fully. The vectorization performed by the respective feature sub-networks may be the same or different.

In one embodiment, the computer device may convert the target attribute information into an original feature vector, input the original feature vector into a feature sub-network, and obtain the attribute feature vector by having the feature sub-network produce a dense representation of the original feature vector. The target attribute information may, for example, be converted into the original feature vector through one-hot encoding.
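The conversion described above can be illustrated with a minimal sketch. The vocabulary, the embedding size, and the random matrix standing in for learned feature sub-network weights are all assumptions made for illustration; the source does not specify them.

```python
import numpy as np

def one_hot(index: int, vocab_size: int) -> np.ndarray:
    """Return the original (sparse) feature vector for a categorical value."""
    vec = np.zeros(vocab_size, dtype=np.float32)
    vec[index] = 1.0
    return vec

# Hypothetical vocabulary for one categorical attribute (e.g. account level).
vocab = {"level_1": 0, "level_2": 1, "level_3": 2}
embedding_dim = 4  # assumed size of the dense representation

rng = np.random.default_rng(0)
# Stand-in for the learnable weights of a feature sub-network.
embedding_matrix = rng.normal(size=(len(vocab), embedding_dim)).astype(np.float32)

original = one_hot(vocab["level_2"], len(vocab))
# Multiplying a one-hot vector by the matrix selects one row: the dense vector.
attribute_vector = original @ embedding_matrix
```

In this sketch the dense representation is simply the row of the matrix selected by the one-hot vector; in a trained model those rows would be learned together with the rest of the parameters.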

In one embodiment, the multimedia resource classification model includes at least two of a text feature sub-network, an atomic feature sub-network, an image-text fusion feature sub-network, and a style feature sub-network. The text feature sub-network processes the text-related target attribute information of the multimedia resource. The atomic feature sub-network processes the atomic features of the multimedia resource, where an atomic feature is a smallest, indivisible feature. The image-text fusion feature sub-network processes both the text-related and the image-related target attribute information of the multimedia resource and fuses the two. The style feature sub-network processes the target attribute information related to the content style of the multimedia resource. Each feature sub-network processes target attribute information of a different dimension, so that the resulting attribute feature vectors represent highly refined content information of the multimedia resource from different angles.

Step S208, inputting each attribute feature vector into each task sub-network to obtain a prediction label corresponding to each task.

Specifically, after obtaining the attribute feature vectors output by the feature sub-networks, the computer device may input all of the attribute feature vectors into each task sub-network, and each task sub-network processes its input data to obtain the corresponding prediction label. For example, the multimedia resource classification model includes three feature sub-networks and two task sub-networks; feature sub-network 1 outputs attribute feature vector 1, feature sub-network 2 outputs attribute feature vector 2, and feature sub-network 3 outputs attribute feature vector 3. Attribute feature vectors 1 to 3 can be input into task sub-network 1 to obtain prediction label 1, and attribute feature vectors 1 to 3 can be input into task sub-network 2 to obtain prediction label 2. Alternatively, to improve learning efficiency, the computer device may input each attribute feature vector only into its corresponding task sub-networks, and each task sub-network processes its input data to obtain the corresponding prediction label. For example, attribute feature vectors 1 and 2 may be input into task sub-network 1 to obtain prediction label 1, and attribute feature vectors 2 and 3 may be input into task sub-network 2 to obtain prediction label 2.
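The two input arrangements in the example above can be sketched as follows. The vector sizes, the single logistic unit used as a task sub-network, and the concatenation used to combine the inputs are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Attribute feature vectors output by three feature sub-networks (sizes assumed).
feature_vectors = {
    1: rng.normal(size=4),
    2: rng.normal(size=3),
    3: rng.normal(size=5),
}

def make_task_subnetwork(input_dim: int, seed: int):
    """Hypothetical task sub-network: a single logistic unit."""
    weights = np.random.default_rng(seed).normal(size=input_dim)

    def predict(x: np.ndarray) -> float:
        return float(1.0 / (1.0 + np.exp(-(x @ weights))))

    return predict

def gather(ids):
    """Concatenate the attribute feature vectors selected for a task."""
    return np.concatenate([feature_vectors[i] for i in ids])

# Option A: every task sub-network receives all attribute feature vectors.
routing_all = {"task_1": [1, 2, 3], "task_2": [1, 2, 3]}
# Option B: each task sub-network receives only its associated vectors.
routing_subset = {"task_1": [1, 2], "task_2": [2, 3]}

def run(routing):
    """Build one task sub-network per task and return its predicted score."""
    outputs = {}
    for seed, (task, ids) in enumerate(routing.items()):
        x = gather(ids)
        outputs[task] = make_task_subnetwork(x.size, seed)(x)
    return outputs

predictions_all = run(routing_all)
predictions_subset = run(routing_subset)
```

Option B gives each task sub-network a smaller input, which is one plausible reading of the learning-efficiency remark above.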

In one embodiment, each task sub-network is used to predict a different category of user feedback information for the multimedia resource. For example, the task sub-networks may include a task sub-network for predicting the click-through rate of the multimedia resource, a task sub-network for predicting the browsing duration of the multimedia resource, a task sub-network for predicting the criticizing rate of the multimedia resource, a task sub-network for predicting the forwarding rate of the multimedia resource, and the like.

Step S210, adjusting the parameters of each task sub-network based on the training label and the prediction label corresponding to the same task, and adjusting the model parameters of each feature sub-network based on the training labels and the prediction labels corresponding to all of the tasks, until a convergence condition is met, to obtain a trained multimedia resource classification model, wherein the multimedia resource classification model is used for classifying the quality of multimedia resources to be recommended.

Specifically, the computer device may calculate a training loss value for each task based on the training label and the prediction label corresponding to that task, perform back propagation on all of the training loss values simultaneously, and adjust the model parameters of the multimedia resource classification model until a convergence condition is satisfied, to obtain the trained multimedia resource classification model. When the model parameters are adjusted, the parameters of each task sub-network are adjusted based on the training label and the prediction label of its own task, while the model parameters of the feature sub-networks are adjusted based on the training labels and the prediction labels of all of the tasks. For example, the multimedia resource classification model includes a task sub-network corresponding to the click-through rate task and a task sub-network corresponding to the browsing duration task. The model parameters of the click-through rate task sub-network are adjusted based on the training label and the prediction label of the click-through rate task, the model parameters of the browsing duration task sub-network are adjusted based on the training label and the prediction label of the browsing duration task, and the model parameters of each feature sub-network are adjusted based on the training labels and the prediction labels of both tasks. The convergence condition may be that the number of iterations of the model reaches an iteration threshold, and that the training loss value corresponding to each task sub-network is smaller than a preset threshold.
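The parameter update described above can be sketched with a deliberately small model: one shared linear layer standing in for the feature sub-networks and one logistic head per task. The synthetic data, layer sizes, learning rate, and thresholds are assumptions; the stopping rule shown (stop at the iteration threshold, or earlier once every task loss is below the preset threshold) is only one possible reading of the convergence condition.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, h = 200, 6, 4                      # samples, input size, shared size (assumed)
X = rng.normal(size=(n, d))              # stand-in for encoded attribute information
# Synthetic binary training labels for two tasks (e.g. click rate, browsing duration).
Y = np.stack([(X[:, 0] + X[:, 1] > 0), (X[:, 1] - X[:, 2] > 0)], axis=1).astype(float)

W = rng.normal(scale=0.1, size=(d, h))   # shared "feature sub-network" parameters
V = rng.normal(scale=0.1, size=(h, 2))   # one column per task sub-network

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def task_losses(W, V):
    """Binary cross-entropy per task, averaged over the samples."""
    P = np.clip(sigmoid(X @ W @ V), 1e-9, 1 - 1e-9)
    return -(Y * np.log(P) + (1 - Y) * np.log(1 - P)).mean(axis=0)

initial_losses = task_losses(W, V)
lr, max_iters, loss_threshold = 0.2, 3000, 0.2   # assumed hyperparameters

for step in range(max_iters):
    H = X @ W                            # shared representation
    dZ = (sigmoid(H @ V) - Y) / n        # gradient of each task loss w.r.t. its logits
    grad_V = H.T @ dZ                    # column t depends only on task t's labels
    grad_W = X.T @ (dZ @ V.T)            # sums the gradients coming from all tasks
    V -= lr * grad_V
    W -= lr * grad_W
    if np.all(task_losses(W, V) < loss_threshold):
        break                            # every task loss is below the threshold

final_losses = task_losses(W, V)
```

Each column of `grad_V` uses only its own task's labels, while `grad_W` accumulates contributions from both tasks, which mirrors the split between task sub-network and feature sub-network updates.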

In one embodiment, the multimedia resource classification model is a multi-task model, that is, a machine learning model based on multi-task learning. Multi-task learning is a machine learning method in which multiple related tasks are learned together on the basis of a shared representation. Multi-task learning is also an inductive transfer method: the main task uses the domain-specific information carried by the training signals of the related tasks as an inductive bias to improve the generalization performance of the main task. Multi-task learning involves learning a plurality of related tasks in parallel with their gradients back-propagated simultaneously, and the tasks help one another learn through the shared representation in the lower layers, which improves generalization. In essence, multi-task learning is an inductive transfer mechanism that uses additional information sources to improve learning on the current task, including the generalization accuracy, the learning speed, and the intelligibility of the learned model. The multi-task model may be a hard parameter sharing model, an MoE (Mixture-of-Experts) model, an MMoE (Multi-gate Mixture-of-Experts) model, or the like. The main task of the multimedia resource classification model is quality classification of multimedia resources, that is, obtaining a quality classification result for the multimedia resource. The related tasks of the main task are the tasks corresponding to the task sub-networks, for example, predicting the click-through rate of the multimedia resource, predicting the browsing duration of the multimedia resource, and the like.

The trained multimedia resource classification model can be used for classifying the quality of the multimedia resources to be recommended and identifying high-quality multimedia resources. When the multimedia resource recommendation is carried out, obvious low-quality multimedia resources can be filtered out firstly, then high-quality multimedia resource identification is carried out through a multimedia resource classification model, and the identified high-quality multimedia resources are preferentially recommended to the user, so that the multimedia resource recommendation effectiveness is improved. When high-quality multimedia resources are identified through the multimedia resource classification model, target attribute information sets of the multimedia resources to be recommended are input into the multimedia resource classification model, each task sub-network in the multimedia resource classification model can output corresponding prediction labels, and the multimedia resource classification model outputs a final quality classification result based on each prediction label. For example, when all the prediction labels are positive labels, the quality classification result may be output as a positive label. Or, when most of the prediction labels are positive labels, the output quality classification result is the positive label.
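The two aggregation rules mentioned in the example above (all prediction labels positive, or most of them positive) can be sketched as follows. The function name, the rule names, and the sample task names are hypothetical.

```python
def classify_quality(prediction_labels, rule="all"):
    """Combine per-task prediction labels (True = positive) into one result."""
    labels = list(prediction_labels.values())
    if rule == "all":
        # Positive only when every task sub-network predicts a positive label.
        return all(labels)
    if rule == "majority":
        # Positive when more than half of the prediction labels are positive.
        return sum(labels) > len(labels) / 2
    raise ValueError(f"unknown rule: {rule}")

# Hypothetical prediction labels output by three task sub-networks.
labels = {"click_rate": True, "browsing_duration": True, "forwarding_rate": False}
strict_result = classify_quality(labels, rule="all")
majority_result = classify_quality(labels, rule="majority")
```

The stricter rule trades recall for precision: a resource with one negative prediction label is rejected under "all" but still accepted under "majority".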

In the multimedia resource classification model training method, the multimedia resource classification model can be trained in a supervised manner based on the target attribute information set and the training label set of the training multimedia resources, so as to obtain a multimedia resource classification model capable of accurately classifying the quality of the multimedia resources to be recommended. The target attribute information set of a multimedia resource includes target attribute information of multiple dimensions, and target attribute information of different dimensions reflects the content quality of the multimedia resource from different angles. When the target attribute information set is input into the multimedia resource classification model, the quality of the multimedia resource can be accurately classified by comprehensively considering the target attribute information of every dimension, yielding prediction labels that accurately reflect the quality of the multimedia resource. In addition, the multimedia resource classification model is a multi-task model comprising a plurality of task sub-networks and can predict the performance of a multimedia resource on each task. During training, the related tasks are learned in parallel and their gradients are back-propagated simultaneously, so that the connections and differences between the tasks are learned and the learning efficiency and quality of each task are improved. Finally, the trained multimedia resource classification model can be used to classify the quality of the multimedia resources to be recommended, so that multimedia resources of better quality are recommended to the user and the effectiveness of multimedia resource recommendation is improved. Effective multimedia resource recommendation avoids the repeated searching or repeated refreshing of the interface that low-quality, ineffective recommendations would cause. Because repeated searching or refreshing occupies a large amount of computer device resources, the method also reduces resource waste on the terminal or the server while improving the effectiveness of resource recommendation.

In one embodiment, as shown in fig. 3, obtaining a training label set for training a multimedia asset comprises:

Step S302, acquiring a recommendation interaction information set corresponding to each of a plurality of historical multimedia resources; the recommendation interaction information set comprises recommendation interaction information corresponding to each task.

The historical multimedia resources refer to multimedia resources that have been published and recommended. The recommended interaction information refers to information generated through interaction behaviors between ordinary users and the publishing user of the multimedia resource after the multimedia resource is published, that is, the feedback information of ordinary users on the multimedia resource. The interaction behavior may specifically be an ordinary user browsing, liking, commenting on, or forwarding the multimedia resource published by the publishing user. The recommended interaction information specifically includes the click-through rate, browsing duration, comment rate, like rate, dislike (thumbs-down) rate, forwarding rate, and the like of the multimedia resource. It can be understood that each kind of recommended interaction information can be regarded as recommended interaction information corresponding to a different task.

Specifically, the computer device may obtain a plurality of historical multimedia resources, and obtain a set of recommended interaction information corresponding to each historical multimedia resource in the same time period. Each recommendation interaction information set comprises recommendation interaction information corresponding to a plurality of tasks.

Step S304, counting each recommended interaction information corresponding to the same task to obtain reference interaction information corresponding to each task.

The reference interaction information refers to a statistical result of the pieces of recommended interaction information corresponding to the same task, and reflects the average level of that recommended interaction information. For example, the reference interaction information may be the average value or the median value of the pieces of recommended interaction information corresponding to the same task.

Specifically, the computer device may obtain recommended interaction information corresponding to the same task from each recommended interaction information set, and perform statistics on the recommended interaction information corresponding to the same task to obtain reference interaction information corresponding to each task. For example, the computer device obtains the click rate of each historical multimedia resource, calculates the average value of the click rate, obtains the browsing duration of each multimedia resource, and calculates the average value of the browsing duration.
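The statistics step can be sketched as follows, using the average and the median mentioned above. The resource names, task names, and values are made up for illustration.

```python
from statistics import mean, median

# Hypothetical recommendation interaction information sets, one per
# historical multimedia resource, collected over the same time period.
history = {
    "resource_1": {"click_rate": 0.12, "browsing_duration": 40.0},
    "resource_2": {"click_rate": 0.08, "browsing_duration": 25.0},
    "resource_3": {"click_rate": 0.10, "browsing_duration": 70.0},
}

def reference_interaction(history, statistic=mean):
    """Aggregate the recommended interaction information of each task."""
    tasks = next(iter(history.values())).keys()
    return {
        task: statistic(info[task] for info in history.values())
        for task in tasks
    }

reference_mean = reference_interaction(history)            # average per task
reference_median = reference_interaction(history, median)  # median per task
```

The median is less sensitive than the average to a few resources with extreme values, which may matter for heavy-tailed metrics such as browsing duration.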

Step S306, classifying the quality of the historical multimedia resources based on the recommended interaction information and the corresponding reference interaction information to obtain a quality label set corresponding to each historical multimedia resource.

Specifically, within the recommended interaction information set corresponding to one historical multimedia resource, the computer device may classify the quality of the historical multimedia resource under each task based on the result of comparing the recommended interaction information of that task with the corresponding reference interaction information, thereby obtaining the quality label set of the current historical multimedia resource and, in turn, the quality label set of every historical multimedia resource. For example, the recommended interaction information set includes a click-through rate and a browsing duration. When the click-through rate of historical multimedia resource 1 is greater than the average click-through rate, the quality label of historical multimedia resource 1 for the click-through rate task is determined to be a positive label. When the browsing duration of historical multimedia resource 1 is greater than the average browsing duration, the quality label of historical multimedia resource 1 for the browsing duration task is determined to be a positive label. The quality label set of historical multimedia resource 1 then includes the positive label for the click-through rate task and the positive label for the browsing duration task.

Step S308, training multimedia resources and corresponding training label sets are obtained based on the historical multimedia resources and the corresponding quality label sets.

Specifically, the computer device may determine, based on the recommended interaction information sets respectively corresponding to a large number of historical multimedia resources, quality label sets respectively corresponding to a large number of historical multimedia resources. The computer equipment can select a part of historical multimedia resources from a large number of historical multimedia resources as training multimedia resources, and train a multimedia resource classification model by using the training multimedia resources and the corresponding training label set. Furthermore, the computer equipment can also select another part of historical multimedia resources as verification multimedia resources, and the verification multimedia resources are used for verifying the classification accuracy of the trained multimedia resource classification model. If the classification accuracy is low, the computer device may update the multimedia resource classification model based on the latest relevant information of the multimedia resource.

In this embodiment, a recommended interaction information set of a large number of historical multimedia resources is statistically analyzed, and a quality label set of the historical multimedia resources is determined according to a statistical analysis result, so as to obtain training data of a multimedia resource classification model. Therefore, the quality of the multimedia resources is classified based on the feedback information of the multimedia resources by the user, and the multimedia resources which are interested and concerned by the user are determined as high-quality multimedia resources, so that the multimedia resources which are interested and concerned by the user can be predicted by the finally trained multimedia resource classification model, the multimedia resources are recommended to the user, and the effectiveness of resource recommendation can be improved.

In an embodiment, as shown in fig. 4, classifying the quality of the historical multimedia resources based on the recommended interaction information and the corresponding reference interaction information to obtain a quality label set corresponding to each historical multimedia resource includes:

Step S402, comparing, in the recommendation interaction information set corresponding to the same historical multimedia resource, the recommendation interaction degree corresponding to the recommendation interaction information of a task with the reference interaction degree corresponding to the reference interaction information of the same task.

The interaction degree refers to normalized data obtained by converting the relevant interaction information into a positively oriented measure, so that the data can be compared. The recommended interaction degree refers to the interaction degree obtained by converting recommended interaction information. The reference interaction degree refers to the interaction degree obtained by converting the reference interaction information.

Specifically, the recommended interaction information corresponding to each task is converted into a recommended interaction degree through a custom formula, and the reference interaction information corresponding to each task is converted into a reference interaction degree through the same custom formula. For example, when the recommended interaction information is the click-through rate, it can be understood that the higher the click-through rate of a multimedia resource, the better its quality and the more interested users are in it, so the click-through rate can be converted directly into a click score on a 100-point scale. Thus, the higher the click score of a multimedia resource, the better its quality and the more interested users are in it. Similarly, the average click-through rate may be converted into an average click score on a 100-point scale. When the recommended interaction information is the dislike rate, it can be understood that the higher the dislike rate of a multimedia resource, the lower its quality and the less interested users are in it. In that case the dislike rate may first be converted into an initial score on a 100-point scale, and the difference between 100 and the initial score is taken as the dislike score. Thus, the higher the dislike score of a multimedia resource, the better its quality and the more interested users are in it. Then, in the recommended interaction information set corresponding to the same historical multimedia resource, the computer device can directly compare the recommended interaction degree of each task with the reference interaction degree of the same task to obtain the quality label corresponding to each task.
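The score conversion in the example above can be sketched as follows. The source only refers to a custom formula, so the linear mapping of a rate in [0, 1] onto a 0 to 100 scale is an assumption, as are the function and parameter names.

```python
def interaction_degree(rate: float, higher_is_better: bool = True) -> float:
    """Convert a rate in [0, 1] into a 0-100 score where higher means better."""
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must lie in [0, 1]")
    score = rate * 100.0  # initial score on the 100-point scale
    # For negatively oriented feedback, take 100 minus the initial score so
    # that a higher score still indicates better quality.
    return score if higher_is_better else 100.0 - score

click_score = interaction_degree(0.25)                             # positive feedback
negative_score = interaction_degree(0.25, higher_is_better=False)  # negative feedback
```

Quantities that are not rates, such as browsing duration, would need a different normalization (for example, scaling by a maximum duration) before they fit this scheme; the source does not say which one is used.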

Step S404, determining the quality label of the task corresponding to the recommended interaction information with the recommended interaction degree larger than the reference interaction degree as a positive label.

Step S406, determining the quality label of the task corresponding to the recommended interaction information with the recommended interaction degree smaller than the reference interaction degree as a negative label.

Specifically, the computer device may determine, as a positive tag, a quality tag of a task corresponding to recommended interaction information whose recommended interaction degree is greater than the reference interaction degree, and determine, as a negative tag, a quality tag of a task corresponding to recommended interaction information whose recommended interaction degree is less than the reference interaction degree. For example, the recommended interaction information set of the historical multimedia resources comprises click through rate, browsing duration and forwarding rate. And when the click interaction degree of the historical multimedia resource 1 is greater than the reference click interaction degree, determining that the quality label corresponding to the click rate task of the historical multimedia resource 1 is a positive label. And when the browsing duration interaction degree of the historical multimedia resource 1 is greater than the reference browsing duration interaction degree, determining that the quality label corresponding to the browsing duration task of the historical multimedia resource 1 is a positive label. And when the forwarding rate interaction degree of the historical multimedia resource 1 is smaller than the reference forwarding rate interaction degree, determining that the quality label corresponding to the forwarding rate task of the historical multimedia resource 1 is a negative label. The quality label set of the historical multimedia resource 1 comprises a positive label corresponding to the click rate task, a positive label corresponding to the browsing duration task and a negative label corresponding to the forwarding rate task.
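The comparison in steps S404 and S406 can be sketched as follows. The scores are invented for illustration. The source does not say how a recommended interaction degree equal to the reference interaction degree is handled, so this sketch leaves that case unlabeled.

```python
def quality_labels(recommended, reference):
    """Label each task by comparing its interaction degree with the reference."""
    labels = {}
    for task, degree in recommended.items():
        if degree > reference[task]:
            labels[task] = "positive"
        elif degree < reference[task]:
            labels[task] = "negative"
        # Equal degrees are left unlabeled: the source does not cover this case.
    return labels

# Hypothetical interaction degrees for historical multimedia resource 1.
recommended = {"click_rate": 62.0, "browsing_duration": 71.0, "forwarding_rate": 18.0}
reference = {"click_rate": 50.0, "browsing_duration": 55.0, "forwarding_rate": 30.0}

label_set = quality_labels(recommended, reference)
```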

It is understood that the statistical intervals corresponding to the labels of the quality levels may also be set based on the reference interactivity. In one task, a label corresponding to a target statistical interval in which the recommendation interaction degree of the historical multimedia resource falls is used as a quality label corresponding to the historical multimedia resource in the task.
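A minimal sketch of the interval-based variant follows. The three level names and the offsets of 10 points on either side of the reference interaction degree are assumptions chosen only to make the example concrete.

```python
from bisect import bisect_right

def level_label(degree, reference, offsets=(-10.0, 10.0),
                names=("low", "medium", "high")):
    """Return the label of the statistical interval that `degree` falls into.

    The interval boundaries are placed relative to the reference interaction
    degree; `offsets` and `names` are illustrative assumptions.
    """
    boundaries = [reference + offset for offset in offsets]
    return names[bisect_right(boundaries, degree)]

low = level_label(30.0, reference=50.0)
medium = level_label(50.0, reference=50.0)
high = level_label(75.0, reference=50.0)
```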

In this embodiment, by comparing the recommended interaction degree corresponding to the recommended interaction information of the same task with the reference interaction degree corresponding to the reference interaction information, the quality label corresponding to the historical multimedia resource under each task can be quickly determined.

In one embodiment, the feature sub-networks include a text feature sub-network, the target attribute information associated with the text feature sub-network includes a plurality of pieces of text attribute information, and the text feature sub-network includes a data processing channel corresponding to each piece of text attribute information. Performing vectorization processing, through each feature sub-network in the multimedia resource classification model, on the target attribute information associated with that feature sub-network to obtain the attribute feature vector output by each feature sub-network includes: performing vectorization processing on the corresponding text attribute information through each data processing channel in the text feature sub-network to obtain the text feature vector output by each data processing channel; and obtaining the attribute feature vector output by the text feature sub-network based on the text feature vectors.

The text attribute information refers to attribute information related to text in the multimedia resource, such as a text title, a text label, a text body, and the like. Text labels may be the subject, category, keyword, etc. of the text.

Specifically, the multimedia resource classification model includes a text feature sub-network for processing text attribute information. After the target attribute information set is input into the multimedia resource classification model, the text attribute information in the set is input into the text feature sub-network. The text feature sub-network includes a data processing channel for each piece of text attribute information, and performs vectorization processing on each piece of text attribute information through its corresponding data processing channel to obtain a text feature vector. The attribute feature vector output by the text feature sub-network is then obtained based on the text feature vectors; specifically, the text feature vectors can be concatenated to obtain the attribute feature vector.

Referring to FIG. 5, the target attribute information associated with the text feature sub-network includes a text title, a text label, and a text body. The text feature sub-network can perform vectorization processing on the text title through data processing channel one to obtain a first text feature vector, on the text label through data processing channel two to obtain a second text feature vector, and on the text body through data processing channel three to obtain a third text feature vector, and then concatenate the first, second, and third text feature vectors to obtain the attribute feature vector.
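The three-channel arrangement of FIG. 5 can be sketched as follows. The source does not describe the encoder inside each data processing channel, so the mean of hashed token embeddings used here is a stand-in, and the vector sizes and sample text are assumptions.

```python
import numpy as np

def make_channel(dim: int, seed: int, buckets: int = 1000):
    """Hypothetical data processing channel: mean of hashed token embeddings."""
    table = np.random.default_rng(seed).normal(size=(buckets, dim))

    def encode(text: str) -> np.ndarray:
        tokens = text.lower().split()
        if not tokens:
            return np.zeros(dim)
        # Toy deterministic hash that maps each token to an embedding row.
        ids = [sum(map(ord, tok)) % buckets for tok in tokens]
        return table[ids].mean(axis=0)

    return encode

# One channel per piece of text attribute information (sizes assumed).
channels = {
    "title": make_channel(dim=8, seed=1),
    "label": make_channel(dim=4, seed=2),
    "body": make_channel(dim=16, seed=3),
}

resource = {
    "title": "A weekend hiking guide",
    "label": "travel outdoors",
    "body": "Pack light, start early, and check the weather before you leave.",
}

text_vectors = [channels[key](resource[key]) for key in ("title", "label", "body")]
attribute_vector = np.concatenate(text_vectors)  # output of the text feature sub-network
```

Keeping a separate channel per field lets each field use its own encoder and vector size; here the body gets a larger vector than the label.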

In this embodiment, different text attribute information is processed through different data processing channels, so that specificity and accuracy of data processing can be improved, and the attribute feature vectors obtained based on the accurate text feature vectors can comprehensively and accurately represent text information of multimedia resources.

In one embodiment, the feature sub-networks include an atomic feature sub-network. Performing vectorization processing, through each feature sub-network in the multimedia resource classification model, on the target attribute information associated with that feature sub-network to obtain the attribute feature vector output by each feature sub-network includes: performing feature cross processing on the target attribute information associated with the atomic feature sub-network through the atomic feature sub-network to obtain at least one cross feature vector; and obtaining the attribute feature vector output by the atomic feature sub-network based on the cross feature vectors.

Specifically, the multimedia resource classification model includes an atomic feature sub-network for processing target attribute information of the atomic feature class. After the target attribute information set is input into the multimedia resource classification model, the target attribute information of the atomic feature class in the set is input into the atomic feature sub-network. The atomic feature sub-network can perform feature cross processing on the target attribute information to obtain at least one cross feature vector; for example, feature cross processing may be performed on each pair of target attribute information to obtain the cross feature vectors. A cross feature vector reflects the correlation between the pieces of target attribute information that were crossed.

In one embodiment, the target attribute information associated with the sub-network of atomic features includes at least two of user attribute information, image attribute information, language attribute information, and text statistical attribute information.

Specifically, the target attribute information associated with the atomic feature sub-network includes various atomic features of the multimedia resource, specifically at least two of user attribute information, image attribute information, language attribute information, and text statistical attribute information. The user attribute information refers to the attribute information of the publishing user of the multimedia resource, and may specifically be account-related information of the publishing user, such as the account level, account verticality, and account authority. The image attribute information refers to image-related information in the multimedia resource, such as the number of images and the sharpness and aesthetic quality of the images. The language attribute information refers to information about the language and wording of the text in the multimedia resource, for example, the rhetorical devices used in the article (such as the number of comparison sentences and metaphor sentences), the citation of classical poetry, the overall lexical diversity, and the syntactic diversity. The text statistical attribute information refers to information obtained by performing statistics on the text content of the multimedia resource, such as the text length, the quality of the article title, the degree of matching between the article title and the body, and the text composition. Some atomic features can be obtained directly, and others can be obtained through statistics produced by corresponding software tools. It can be appreciated that multimedia resources published by users with high-level accounts tend to be more attractive and of better quality. Multimedia resources with rich, well-made illustrations tend to be attractive and of high quality, as do multimedia resources with ornate wording. In addition to characterizing the quality of a multimedia resource individually, the various atomic features can also act together to characterize the quality of the multimedia resource as fully as possible.

Referring to FIG. 6, the target attribute information associated with the sub-network of atomic features includes user attribute information, image attribute information, language attribute information, and text statistical attribute information. The atomic feature sub-network can perform feature cross processing on each target attribute information pairwise to obtain a plurality of cross feature vectors, and the cross feature vectors are spliced to obtain the attribute feature vectors.
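The pairwise crossing of FIG. 6 can be sketched as follows. The source does not define the crossing operation, so the element-wise product used here is an assumption, as are the field names, the common vector size, and the random values standing in for encoded attributes.

```python
from itertools import combinations

import numpy as np

rng = np.random.default_rng(0)
dim = 4  # assumed common size of the encoded atomic features

# Hypothetical dense encodings of the four kinds of target attribute information.
fields = {
    name: rng.normal(size=dim)
    for name in ("user", "image", "language", "text_stats")
}

# Cross every pair of fields; the element-wise product is an assumed choice.
cross_vectors = {
    (a, b): fields[a] * fields[b]
    for a, b in combinations(fields, 2)
}

# Concatenate the cross feature vectors into the attribute feature vector.
attribute_vector = np.concatenate(list(cross_vectors.values()))
```

With four fields there are six pairs, so the concatenated attribute feature vector in this sketch has 6 x 4 = 24 components.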

In this embodiment, feature cross processing is performed on the pieces of target attribute information in pairs, so that their combined features can be learned effectively, and the attribute feature vector obtained from the cross feature vectors can represent the atomic features of the multimedia resource comprehensively and accurately.
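The pairwise crossing and splicing described above can be illustrated with a minimal sketch. The description does not state which crossing operator is used, so the element-wise product below is an assumption, and the function name `cross_pairwise` and the two-dimensional embeddings are hypothetical.

```python
from itertools import combinations


def cross_pairwise(fields):
    """Cross every pair of equally sized field vectors and splice the results.

    The element-wise product is an assumed crossing operator; the
    description does not specify which operator is used.
    """
    crossed = []
    for (_, a), (_, b) in combinations(fields.items(), 2):
        crossed.extend(x * y for x, y in zip(a, b))
    return crossed


# Hypothetical 2-dimensional embeddings of the four attribute groups.
fields = {
    "user": [1.0, 2.0],
    "image": [0.5, 0.5],
    "language": [2.0, 0.0],
    "text_stats": [1.0, -1.0],
}

# Four fields give six pairs, so the spliced vector has 6 * 2 = 12 entries.
attribute_vector = cross_pairwise(fields)
```

With four attribute groups there are six pairs, so the spliced vector grows quadratically with the number of groups; this is one reason a real model might project each cross feature to a small dimension first.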

In one embodiment, the feature sub-networks include an image-text fusion feature sub-network, the target attribute information associated with the image-text fusion feature sub-network includes text attribute information and image attribute information, and the image-text fusion feature sub-network includes a text data processing channel corresponding to the text attribute information and an image data processing channel corresponding to the image attribute information. Performing vectorization processing, through each feature sub-network in the multimedia resource classification model, on the target attribute information associated with that feature sub-network to obtain the attribute feature vector output by each feature sub-network includes: encoding the text attribute information through the text data processing channel to obtain an intermediate feature vector; encoding the image attribute information through the image data processing channel to obtain an image feature vector; performing attention allocation processing on the intermediate feature vector based on the image feature vector to obtain a first image-text fusion feature vector; performing attention allocation processing on the image feature vector based on the intermediate feature vector to obtain a second image-text fusion feature vector; and obtaining the attribute feature vector output by the image-text fusion feature sub-network based on the first image-text fusion feature vector and the second image-text fusion feature vector.

Specifically, the multimedia resource classification model includes an image-text fusion feature sub-network for processing text attribute information and image attribute information. After the target attribute information set is input into the multimedia resource classification model, the text attribute information and the image attribute information in the target attribute information set are input into the image-text fusion feature sub-network. The image-text fusion feature sub-network comprises a text data processing channel corresponding to the text attribute information and an image data processing channel corresponding to the image attribute information. The text data processing channel encodes the text attribute information to obtain an intermediate feature vector. The image data processing channel encodes the image attribute information to obtain an image feature vector.

Further, the images may affect how the text is understood. Therefore, the computer device can perform attention allocation processing on the intermediate feature vector based on the image feature vector to obtain a first image-text fusion feature vector. Attention allocation processing on the intermediate feature vector assigns different attention weights to the sentences, and the attention weight of a sentence reflects the importance of that sentence in the multimedia resource. The first image-text fusion feature vector is obtained by weighting and summing the feature vectors corresponding to the sentences.

The text may likewise affect how the images are understood. The computer device can perform attention allocation processing on the image feature vector based on the intermediate feature vector to obtain a second image-text fusion feature vector. Attention allocation processing on the image feature vector assigns different attention weights to the images, and the attention weight of an image reflects the importance of that image in the multimedia resource. The second image-text fusion feature vector is obtained by weighting and summing the feature vectors corresponding to the images.

Finally, the computer device may obtain the attribute feature vector output by the image-text fusion feature sub-network based on the first image-text fusion feature vector and the second image-text fusion feature vector; specifically, it may splice the two fusion feature vectors to obtain the corresponding attribute feature vector.
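The two attention steps and the final splicing can be sketched as follows. This is only an illustration under stated assumptions: the attention score is taken to be a dot product, the query for each direction is taken to be the mean vector of the other modality, and the names `attend` and `mean_vector` as well as all numeric values are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mean_vector(vectors):
    """Average a list of equally sized vectors (assumed query construction)."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def attend(query, items):
    """Weight each item by softmax(query . item); return (weighted sum, weights)."""
    scores = [sum(q * x for q, x in zip(query, item)) for item in items]
    weights = softmax(scores)
    pooled = [
        sum(w * item[d] for w, item in zip(weights, items))
        for d in range(len(query))
    ]
    return pooled, weights


# Hypothetical sentence vectors (the intermediate feature vector) and image vectors.
sentence_vecs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
image_vecs = [[2.0, 0.0], [0.0, 0.0]]

# Images guide attention over sentences: first image-text fusion feature vector.
first_fusion, sentence_weights = attend(mean_vector(image_vecs), sentence_vecs)
# Text guides attention over images: second image-text fusion feature vector.
second_fusion, image_weights = attend(mean_vector(sentence_vecs), image_vecs)

# Splice the two fusion vectors into the attribute feature vector.
attribute_vector = first_fusion + second_fusion
```

In this toy input the first sentence aligns with the image query and therefore receives more weight than the second, which is the behavior the attention weights are meant to express.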

The image-text fusion feature sub-network is a network based on image-text multi-modal machine learning. Multi-modal machine learning refers to processing and understanding information from multiple modalities by means of machine learning. Single-modal representation learning represents information as numerical vectors that a computer can process, or further abstracts it into higher-level feature vectors, whereas multi-modal representation learning exploits the complementarity among modalities and removes the redundancy between them so as to learn a better feature representation. Therefore, the image-text fusion feature sub-network can exploit the complementarity between images and text and eliminate the redundancy between them, so as to learn a better feature representation.

In one embodiment, encoding the text attribute information through the text data processing channel to obtain an intermediate feature vector includes: performing word encoding processing on the text attribute information to obtain a word feature vector; and performing sentence encoding processing on the word feature vector to obtain the intermediate feature vector.

Specifically, after receiving the text attribute information, the text data processing channel first performs word encoding processing on the text attribute information to obtain word feature vectors; that is, it encodes each word in the text of the multimedia resource in turn, taking the word as the unit. The word feature vectors include a feature vector for each word. The channel then performs sentence encoding processing on the word feature vectors to obtain the intermediate feature vector; that is, it encodes the feature vectors corresponding to the words in turn, taking the sentence as the unit. Thus, sentence representations are obtained by word encoding processing, and a text representation is obtained by sentence encoding processing.

In one embodiment, performing sentence encoding processing on the word feature vector to obtain an intermediate feature vector includes: performing attention allocation processing on the word feature vector to obtain a sentence feature vector, and performing sentence encoding processing on the sentence feature vector to obtain the intermediate feature vector. Specifically, each word plays a different role and has a different importance within a sentence. Therefore, attention allocation processing can be performed on the word feature vectors to obtain sentence feature vectors. Attention allocation processing on the word feature vectors assigns different attention weights to the words in a sentence, and the attention weight of a word reflects the importance of that word in the sentence. Specifically, attention allocation processing may be performed on the word feature vectors based on a target vector, whose parameters are adjusted continuously during model training until the most suitable target vector is obtained. The sentence feature vectors include a feature vector for each sentence. Sentence encoding processing is then performed on the sentence feature vectors to obtain the intermediate feature vector; that is, the feature vectors corresponding to the sentences are encoded in turn, taking the sentence as the unit.
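The word-level attention with a target vector can be sketched as below. The dot-product scoring is an assumption, the fixed `target` stands in for a vector that would be learned during training, and the subsequent sentence encoder is omitted; all names and values are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sentence_vector(word_vecs, target):
    """Pool the word vectors of one sentence using attention against `target`."""
    scores = [sum(w * t for w, t in zip(vec, target)) for vec in word_vecs]
    weights = softmax(scores)
    return [
        sum(a * vec[d] for a, vec in zip(weights, word_vecs))
        for d in range(len(target))
    ]


# A hypothetical document: two sentences, each a list of 2-dimensional word vectors.
document = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.5, 0.5], [1.0, 1.0], [0.0, 2.0]],
]

# Stand-in for the learned target vector (a trainable parameter in a real model).
target = [0.0, 1.0]

# One sentence feature vector per sentence; these would then go to the sentence encoder.
sentence_vecs = [sentence_vector(words, target) for words in document]
```

Because the target vector is shared by all sentences, it acts as a learned notion of "which words matter", and the same pooling can be applied to sentences of any length.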

In one embodiment, the text attribute information may be encoded by a Transformer model to obtain the intermediate feature vector. The image attribute information may be encoded by a CNN (convolutional neural network) model to obtain the image feature vector.

Referring to FIG. 7, the target attribute information associated with the image-text fusion feature sub-network includes text attribute information and image attribute information. The image-text fusion feature sub-network performs word encoding processing and sentence encoding processing on the text attribute information through a Transformer to obtain an intermediate feature vector, and encodes the image attribute information through a CNN to obtain an image feature vector. Then, attention allocation processing is performed on the intermediate feature vector based on the image feature vector to obtain a first image-text fusion feature vector, attention allocation processing is performed on the image feature vector based on the intermediate feature vector to obtain a second image-text fusion feature vector, and the first and second image-text fusion feature vectors are spliced to obtain the attribute feature vector.

In this embodiment, the image-text fusion feature sub-network fuses the text attribute information and the image attribute information organically, exploiting the complementarity between text and images and eliminating the redundancy between them, so as to learn a feature representation that better represents the image-text information of the multimedia resource.

In one embodiment, the feature sub-networks include a style feature sub-network, and the target attribute information associated with the style feature sub-network includes style attribute information; the style feature sub-network includes a first data processing channel and a second data processing channel. Performing vectorization processing, through each feature sub-network in the multimedia resource classification model, on the target attribute information associated with that feature sub-network to obtain the attribute feature vector output by each feature sub-network includes: encoding the style attribute information through the first data processing channel to obtain an initial feature vector, and performing attention allocation processing on the initial feature vector to obtain a first feature vector; performing convolution processing on the style attribute information through the second data processing channel to obtain a second feature vector; and obtaining the attribute feature vector output by the style feature sub-network based on the first feature vector and the second feature vector.

The style attribute information refers to attribute information related to information typesetting in the multimedia resource, that is, typesetting information of pictures and texts of the multimedia resource. When the multimedia resource is an article containing a picture, the style attribute information may be style attribute information formed by arranging paragraphs and pictures in sequence according to the appearance order.

Specifically, the multimedia resource classification model comprises a style feature sub-network for processing style attribute information. After the target attribute information set is input into the multimedia resource classification model, the style attribute information in the target attribute information set is input into the style feature sub-network. The style feature sub-network includes a first data processing channel and a second data processing channel, which process the style attribute information in different ways.

The style feature sub-network can encode the style attribute information through the first data processing channel to obtain an initial feature vector, and perform attention allocation processing on the initial feature vector to obtain a first feature vector. Specifically, each piece of style sub-information in the style attribute information may be taken as a unit and encoded in turn to obtain the initial feature vector. The initial feature vector comprises a feature vector for each piece of style sub-information. Attention allocation processing is then performed on the initial feature vector to obtain the first feature vector. Attention allocation processing on the initial feature vector assigns different attention weights to the pieces of style sub-information, and the attention weight of a piece of style sub-information reflects its importance in the multimedia resource. For example, suppose the style attribute information includes paragraph 1, paragraph 2, picture 1, and paragraph 3. Encoding the style attribute information yields an initial feature vector consisting of the feature vectors corresponding to paragraph 1, paragraph 2, picture 1, and paragraph 3. Attention allocation processing is then performed on these feature vectors, an attention weight is assigned to each of them, and the feature vectors are weighted by their attention weights and summed to obtain the first feature vector. The first data processing channel is mainly used for learning the relationships among the pieces of style sub-information in the style attribute information.

The style feature sub-network may perform convolution processing on the style attribute information through the second data processing channel to obtain a second feature vector. The second data processing channel is mainly used for learning the overall characteristics of the style attribute information.

Finally, the attribute feature vector output by the style feature sub-network is obtained based on the first feature vector and the second feature vector. Specifically, the first feature vector and the second feature vector may be spliced to obtain the attribute feature vector.

Referring to FIG. 8, the target attribute information associated with the style feature sub-network includes style attribute information. The style feature sub-network encodes the style attribute information through an LSTM (Long Short-Term Memory network) to obtain an initial feature vector, performs attention allocation processing on the initial feature vector through an attention mechanism (Attention) to obtain a first feature vector, performs convolution processing on the style attribute information through a CNN (convolutional neural network) to obtain a second feature vector, and splices the first feature vector and the second feature vector to obtain the attribute feature vector.
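The two channels of the style feature sub-network can be illustrated with a small sketch. For brevity, a fixed embedding lookup stands in for the LSTM encoder, which is a deliberate simplification and not the encoder named in the description; the attention query, the convolution kernels, and every name and number below are likewise hypothetical.

```python
import math

# Hypothetical embeddings for the two kinds of style sub-information.
# A real first channel would produce these vectors with an LSTM encoder.
EMBED = {"paragraph": [1.0, 0.0], "picture": [0.0, 1.0]}


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_channel(vectors, query):
    """First channel: attention-weighted sum over the encoded layout sequence."""
    weights = softmax([dot(v, query) for v in vectors])
    return [sum(w * v[d] for w, v in zip(weights, vectors)) for d in range(len(query))]


def conv_channel(vectors, kernels):
    """Second channel: 1-D convolution over the sequence, then max pooling per kernel."""
    features = []
    for kernel in kernels:
        width = len(kernel)
        responses = [
            sum(dot(k, v) for k, v in zip(kernel, vectors[start:start + width]))
            for start in range(len(vectors) - width + 1)
        ]
        features.append(max(responses))
    return features


# Layout in order of appearance: paragraph 1, paragraph 2, picture 1, paragraph 3.
layout = ["paragraph", "paragraph", "picture", "paragraph"]
vectors = [EMBED[item] for item in layout]

first = attention_channel(vectors, query=[0.0, 1.0])
second = conv_channel(
    vectors,
    kernels=[
        [[1.0, 0.0], [0.0, 1.0]],  # responds to a paragraph followed by a picture
        [[0.0, 1.0], [1.0, 0.0]],  # responds to a picture followed by a paragraph
    ],
)

# Splice both channel outputs into the attribute feature vector.
attribute_vector = first + second
```

The attention channel weighs individual layout elements, while the convolution channel responds to local arrangements such as a picture placed between paragraphs, which matches the division of labor described for the two channels.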

In this embodiment, the style attribute information is processed differently through the different data processing channels, so that feature vectors representing the style attribute information from different angles are obtained, which helps the model learn style-related knowledge.

In one embodiment, each task sub-network includes an expert layer, a gating layer, and a fusion layer, and the task sub-networks share the expert layer. Inputting the attribute feature vectors into each task sub-network to obtain the prediction label corresponding to each task includes: in the current task sub-network, performing feature processing on the attribute feature vectors through the expert layer to obtain a feature processing result, performing weighting processing on the feature processing result through the gating layer to obtain an intermediate processing result, and performing fusion processing on the intermediate processing result through the fusion layer to obtain the prediction label of the task corresponding to the current task sub-network.

Specifically, the multimedia resource classification model comprises a plurality of task sub-networks, and each task sub-network comprises an expert layer, a gating layer, and a fusion layer. After the feature sub-networks output their attribute feature vectors, the attribute feature vectors are input together into each task sub-network. In the current task sub-network, the expert layer performs feature processing on the attribute feature vectors to obtain a feature processing result, the gating layer weights the feature processing result to obtain an intermediate processing result, and the fusion layer fuses the intermediate processing result to obtain the prediction label of the task corresponding to the current task sub-network. Each task sub-network outputs one prediction label.

In one embodiment, the task sub-networks share not only the expert layer but also the gating layer. Referring to FIG. 9A, the expert layer may be further divided into a plurality of expert sublayers. This design draws on the idea of ensemble learning: at the same scale, a single network cannot effectively learn a representation common to all tasks, but once the network is divided into several sub-networks, each sub-network can learn representations that are relevant and specific to certain tasks. The output of each expert sublayer (Expert) is then weighted by the output of the gating layer, and the multi-layer fully connected network of each task sub-network can better learn its specific task. That is, the expert sublayers at the bottom learn different knowledge and serve different tasks: some experts learn shared patterns and some learn task-specific patterns.

In one embodiment, the task sub-networks share only the expert layer. Referring to FIG. 9B, a separate gating layer is provided for each task. The advantage is that task-specific functions can be learned to balance the shared representations without adding a large number of new parameters, so that the relationships between tasks are modeled more explicitly and differences between tasks are captured without significantly increasing the number of model parameters. On the one hand, the gating layers are lightweight and the expert layer is common to all tasks, which is advantageous in terms of both computation and parameter count. On the other hand, compared with all tasks sharing one gating layer, each task uses its own gating layer, and the gating layer of each task uses the expert sublayers selectively through its own output weights. The gating layers of different tasks can learn different ways of combining the feature processing results, so that the model can capture both the correlations and the differences among tasks.

The task prediction result can be expressed as: $y_k = h_k(f_k(x))$, $f_k(x) = \sum_{i=1}^{n} g_k(x)_i \, f_i(x)$, $g_k(x) = \mathrm{softmax}(W_{gk} x)$, where $y_k$ denotes the task prediction result of the $k$-th task; $h_k$ denotes the data processing of the fusion layer of the $k$-th task; $f_k(x)$ denotes the data processing of the expert layer and the gating layer for the $k$-th task; $g_k(x)$ denotes the data processing of the gating layer of the $k$-th task; $i$ denotes the $i$-th expert sublayer; $n$ denotes that there are $n$ expert sublayers; $f_i(x)$ denotes the data processing of the $i$-th expert sublayer; and $g_k(x)_i$ denotes, for the $k$-th task, the weight corresponding to the data processing result of the $i$-th expert sublayer.
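The gated mixture of expert sublayers can be sketched numerically as follows. The sketch assumes that each expert sublayer is a single linear map and that each fusion layer is a weighted sum followed by a sigmoid; neither choice is stated in the description, and the function name `mmoe_forward` and all weights are hypothetical.

```python
import math


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def mmoe_forward(x, experts, gates, towers):
    """Return one prediction per task from shared experts and per-task gates.

    experts: list of n weight matrices, one per expert sublayer, giving f_i(x)
    gates:   list of per-task matrices W_gk with n rows, giving g_k(x)
    towers:  list of per-task weight vectors, a stand-in for the fusion layer h_k
    """
    expert_out = [matvec(w, x) for w in experts]
    dim = len(expert_out[0])
    predictions = []
    for w_gate, tower in zip(gates, towers):
        g = softmax(matvec(w_gate, x))
        # f_k(x) = sum_i g_k(x)_i * f_i(x)
        mixed = [sum(g[i] * expert_out[i][d] for i in range(len(experts))) for d in range(dim)]
        predictions.append(sigmoid(sum(t * m for t, m in zip(tower, mixed))))
    return predictions


x = [1.0, 2.0]
experts = [
    [[1.0, 0.0], [0.0, 1.0]],  # expert 1: identity
    [[2.0, 0.0], [0.0, 2.0]],  # expert 2: doubles the input
]
gates = [
    [[0.0, 0.0], [0.0, 0.0]],  # task 1: uniform weights over both experts
    [[0.0, 0.0], [1.0, 1.0]],  # task 2: favors expert 2
]
towers = [[1.0, 0.0], [0.0, 1.0]]

click_pred, duration_pred = mmoe_forward(x, experts, gates, towers)
```

Only the gate matrices and the fusion layers differ per task, so adding a task adds few parameters while still letting that task choose its own mixture of the shared experts.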

Referring to FIG. 9C, the multimedia resource classification model may specifically include a task sub-network corresponding to the click-through rate task and a task sub-network corresponding to the browsing duration task. The two task sub-networks share the expert layer, and each task sub-network has its own gating layer and its own fusion layer. After the two task sub-networks process the input data, they output the prediction labels corresponding to the click-through rate task and the browsing duration task, respectively.

In one embodiment, as shown in FIG. 10, the method further comprises:

Step S1002, acquiring a target attribute information set and a verification label set of a verification multimedia resource; the verification multimedia resource is a newly recommended multimedia resource.

Step S1004, inputting the target attribute information set of the verification multimedia resource into the trained multimedia resource classification model to obtain a prediction label set corresponding to the verification multimedia resource.

Step S1006, calculating the classification accuracy based on the prediction label set and the verification label set corresponding to the verification multimedia resource.

Step S1008, when the classification accuracy is smaller than the accuracy threshold, updating the trained multimedia resource classification model based on the prediction label set and the training label set corresponding to the verification multimedia resource to obtain the updated multimedia resource classification model.

The verification multimedia resource is a multimedia resource used to verify the accuracy of the multimedia resource classification model. The verification label is the quality label of the verification multimedia resource. The verification multimedia resource and the training multimedia resource are different multimedia resources; the verification multimedia resource is a newly published and newly recommended multimedia resource.

Specifically, users' feedback on multimedia resources is subjective, and their subjective preferences change in real time with the user population and with the popularity of multimedia resources, so the multimedia resource classification model needs to be updated to maintain its accuracy and adaptability. The computer device can obtain recently published historical multimedia resources as verification multimedia resources, determine the verification label set of each verification multimedia resource based on the latest recommendation interaction information set, and then test the multimedia resource classification model on the verification multimedia resources. If the test is passed, the model does not need to be updated; if the test is not passed, the model is updated based on the verification multimedia resources. During the model test, the computer device obtains the target attribute information set of a verification multimedia resource and inputs it into the trained multimedia resource classification model, and each task sub-network in the model outputs the prediction label of its corresponding task. The computer device then calculates the classification accuracy of the model based on the prediction label set and the verification label set corresponding to the verification multimedia resource. When the classification accuracy is greater than the accuracy threshold, the model still maintains high accuracy and the model test is passed.
When the classification accuracy is smaller than the accuracy threshold, the accuracy of the multimedia resource classification model is low and the model needs to be updated. The computer device can calculate a training loss value based on the prediction label set and the training label set corresponding to the verification multimedia resource, perform back propagation based on the training loss value, and update the model parameters of the trained multimedia resource classification model until the convergence condition is met, obtaining the updated multimedia resource classification model. The updated model is adapted to the current recommendation environment and can identify the multimedia resources that currently interest users.

After the multimedia resources are released, the posterior consumption data of the multimedia resources are stored in the database of each resource service platform. The posterior consumption data is the recommended interaction information. Referring to fig. 11, the computer device may obtain posterior consumption data of the historical multimedia resources from each database, and screen out positive and negative samples based on the posterior consumption data to construct a training set and a validation set. The positive sample indicates that the quality label corresponding to the historical multimedia resource in a certain task is a positive label, and the negative sample indicates that the quality label corresponding to the historical multimedia resource in a certain task is a negative label. The computer equipment can select a part of historical multimedia resources as a training set, and train the multimedia resource classification model based on the training set to obtain the trained multimedia resource classification model. The computer device may obtain another portion of the historical multimedia assets as a validation set, validate the trained multimedia asset classification model based on the validation set, and calculate a classification accuracy of the trained multimedia asset classification model based on the validation set. When the classification accuracy is less than the accuracy threshold, the trained multimedia resource classification model may be updated based on the validation set. 
In addition, when determining the quality label of the multimedia resource, the computer device may determine the quality label of each historical multimedia resource in the training set based on the posterior consumption data of the training set, and determine the quality label of each historical multimedia resource in the verification set based on the posterior consumption data of the verification set, so as to avoid mutual interference between the training set and the verification set. It can be understood that the computer device can update the latest multimedia resource classification model at regular time to ensure that the multimedia resource classification model is always adapted to the current recommendation environment.

In one embodiment, when the coincidence degree between the prediction label set and the verification label set of a multimedia resource is greater than a coincidence degree threshold, the multimedia resource classification model is determined to have classified the quality of that multimedia resource accurately. The proportion of accurately predicted multimedia resources in the verification set can then be counted to obtain the classification accuracy of the multimedia resource classification model.
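The accuracy check can be sketched as follows. The description does not define the coincidence degree, so the sketch assumes it is the fraction of tasks on which the predicted and verification labels agree; the function names and both threshold values are hypothetical.

```python
def coincidence(predicted, verified):
    """Fraction of tasks on which the two label sets agree (assumed definition)."""
    return sum(p == v for p, v in zip(predicted, verified)) / len(verified)


def classification_accuracy(predicted_sets, verified_sets, coincidence_threshold):
    """Share of resources whose label sets overlap by more than the threshold."""
    hits = sum(
        coincidence(p, v) > coincidence_threshold
        for p, v in zip(predicted_sets, verified_sets)
    )
    return hits / len(verified_sets)


def needs_update(accuracy, accuracy_threshold):
    """The model is updated when accuracy falls below the threshold."""
    return accuracy < accuracy_threshold


# Four verification resources, two tasks each (1 = positive label, 0 = negative label).
predicted_sets = [[1, 1], [1, 0], [0, 0], [0, 1]]
verified_sets = [[1, 1], [1, 1], [0, 0], [1, 0]]

accuracy = classification_accuracy(predicted_sets, verified_sets, coincidence_threshold=0.5)
update_required = needs_update(accuracy, accuracy_threshold=0.8)
```

Here two of the four resources exceed the coincidence threshold, so the accuracy is 0.5 and, with the assumed accuracy threshold of 0.8, an update would be triggered.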

In one embodiment, as shown in FIG. 12, a multimedia resource recommendation method is provided. The method is described as applied to the computer device in FIG. 1, where the computer device may be the terminal 102 or the server 104 in FIG. 1. Referring to FIG. 12, the multimedia resource recommendation method includes the following steps:

Step S1202, acquiring a target attribute information set of a multimedia resource to be recommended; the target attribute information set includes target attribute information of a plurality of dimensions.

Step S1204, inputting the target attribute information set into the trained multimedia resource classification model; the multimedia resource classification model includes a plurality of feature sub-networks and a plurality of task sub-networks.

Step S1206, performing vectorization processing, through each feature sub-network in the multimedia resource classification model, on the target attribute information associated with that feature sub-network to obtain the attribute feature vector output by each feature sub-network.

Step S1208, inputting the attribute feature vectors into each task sub-network to obtain the prediction label output by each task sub-network.

Step S1210, obtaining a quality classification result corresponding to the multimedia resource to be recommended based on the prediction labels.

Step S1212, recommending the multimedia resource to be recommended based on the quality classification result.

Specifically, in order to recommend a better quality multimedia resource to the user and improve the effectiveness of resource recommendation, before the multimedia resource recommendation, the computer device may identify a good quality multimedia resource from the massive multimedia resources to be recommended by using the trained multimedia resource classification model, and then recommend the good quality multimedia resource to the user.

The computer device can analyze the content of the multimedia resource to be recommended to obtain its target attribute information set, and input that target attribute information set into the trained multimedia resource classification model to obtain the quality classification result corresponding to the multimedia resource to be recommended.

The trained multimedia asset classification model comprises a plurality of feature subnetworks and a plurality of task subnetworks. The feature sub-networks are used for converting the target attribute information into attribute feature vectors, and different feature sub-networks are used for processing different target attribute information. And respectively carrying out vectorization processing on the target attribute information associated with the feature sub-networks through each feature sub-network in the multimedia resource classification model to obtain the attribute feature vectors output by each feature sub-network.

The task sub-networks are used for predicting the performance of the multimedia resource on a specific task based on the attribute feature vector, and different task sub-networks correspond to different tasks. After obtaining the attribute feature vectors respectively output by each feature sub-network, the computer device may input the attribute feature vectors into each task sub-network together, and each task sub-network performs data processing on the input data to obtain the corresponding prediction tag.

The computer device can determine the quality classification result corresponding to the multimedia resource to be recommended based on the task prediction results. Specifically, when the number of task prediction results that are positive labels is greater than a preset threshold, the quality classification result is determined to be a positive label. For example, the quality classification result may be determined to be a positive label when all the task prediction results are positive labels, or when more than half of the task prediction results are positive labels. The multimedia resource classification model finally outputs the quality classification result corresponding to the multimedia resource to be recommended. It can be understood that the multimedia resource classification model may instead output the prediction labels, with the quality classification result determined outside the model based on those prediction labels, or the model may output both the prediction labels and the quality classification result.
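The threshold rule above can be written as a short sketch. The function name `quality_label` is hypothetical, and the two thresholds below simply encode the "all tasks positive" and "more than half positive" examples from the text.

```python
def quality_label(task_labels, threshold):
    """Return 1 (positive) when the number of positive task labels exceeds `threshold`."""
    return 1 if sum(task_labels) > threshold else 0


# Three task prediction results: two positive labels and one negative label.
task_labels = [1, 1, 0]

# "More than half positive": the threshold is half the number of tasks.
majority_result = quality_label(task_labels, threshold=len(task_labels) / 2)

# "All positive": the threshold is one less than the number of tasks.
unanimous_result = quality_label(task_labels, threshold=len(task_labels) - 1)
```

With two of three tasks positive, the majority rule yields a positive label while the unanimous rule yields a negative label, which shows how the preset threshold controls how strict the quality classification is.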

After the quality classification result of the multimedia resource to be recommended is obtained, the computer device can increase the recommendation weight of multimedia resources identified as high quality and reduce the recommendation weight of low-quality multimedia resources. In this way, high-quality multimedia resources that are high in tonality and attractive are preferentially recommended to the user, which gives the user a good reading experience and improves the effectiveness of resource recommendation.
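One possible way to apply such weighting is to scale a base recommendation score before ranking. The sketch below is only an illustration: the application does not specify how the weight is applied, and the function name `rerank` and the factors `boost=1.2` and `penalty=0.8` are hypothetical values chosen for the example.

```python
from typing import List, Tuple


def rerank(
    candidates: List[Tuple[str, float, int]],
    boost: float = 1.2,    # hypothetical up-weighting factor for positive labels
    penalty: float = 0.8,  # hypothetical down-weighting factor for negative labels
) -> List[Tuple[str, float]]:
    """Scale each base score by its quality label and sort by adjusted score.

    Each candidate is (resource_id, base_score, quality_label), where
    quality_label is 1 for high quality and 0 for low quality.
    """
    adjusted = []
    for resource_id, base_score, quality_label in candidates:
        factor = boost if quality_label == 1 else penalty
        adjusted.append((resource_id, base_score * factor))
    return sorted(adjusted, key=lambda pair: pair[1], reverse=True)


ranking = rerank([("article_a", 0.50, 0), ("article_b", 0.45, 1)])
print([resource_id for resource_id, _ in ranking])
```

In this toy input the high-quality resource overtakes a low-quality resource that had a slightly higher base score.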

It can be understood that the specific process of training and updating the multimedia resource classification model may refer to the method described in each relevant embodiment of the aforementioned multimedia resource classification model training method, and the model structure and the data processing process of the multimedia resource classification model may also refer to the method described in each relevant embodiment of the aforementioned multimedia resource classification model training method, which is not described herein again.

In the above multimedia resource recommendation method, a target attribute information set of the multimedia resource to be recommended is obtained and input into the trained multimedia resource classification model, where the target attribute information set comprises target attribute information of multiple dimensions and the multimedia resource classification model comprises a plurality of feature sub-networks and a plurality of task sub-networks. Each feature sub-network in the model vectorizes the target attribute information associated with it to obtain the attribute feature vector output by that feature sub-network; the attribute feature vectors are input into each task sub-network to obtain the prediction label output by each task sub-network; a quality classification result for the multimedia resource to be recommended is obtained based on the prediction labels; and the multimedia resource to be recommended is recommended based on the quality classification result. Because the target attribute information set comprises target attribute information of multiple dimensions, and target attribute information of different dimensions reflects the content quality of the multimedia resource from different angles, inputting the set into the multimedia resource classification model allows the quality of the multimedia resource to be classified accurately by jointly considering all dimensions, yielding an accurate quality classification result.
In addition, the multimedia resource classification model comprises a plurality of task sub-networks, is a multi-task model, can predict the performance of multimedia resources on each task, synthesizes the performance of the multimedia resources on each task to obtain the quality classification result of the multimedia resources, and can further improve the accuracy of the quality classification of the multimedia resources. Finally, the multimedia resources with better quality can be identified through the multimedia resource classification model, so that the multimedia resources with better quality can be recommended to the user, and the effectiveness of multimedia resource recommendation is improved. The effective multimedia resource recommendation can avoid repeated searching or repeated refreshing of the interface caused by low-quality and ineffective multimedia resource recommendation, and the repeated searching or repeated refreshing of the interface can occupy a large amount of computer equipment resources, so that the resource waste of the terminal or the server can be reduced on the basis of improving the effectiveness of the resource recommendation.

The present application further provides an application scenario to which the above multimedia resource classification model training method and multimedia resource recommendation method are applied. Specifically, the methods are applied in an image-text content recommendation scenario as follows:

Referring to fig. 13A, when recommending image-text content, the resource service platform may filter out low-quality content, identify high-quality content, and apply weighted recommendation to the high-quality content after it is released from the content repository. High-quality content that is high in tonality and attractive is thus preferentially recommended to the user, which gives the user a good reading experience and improves recommendation effectiveness.

1. Filtering low-quality content

Low-quality content specifically includes image-text content containing vulgar material, rumors, clickbait headlines, or advertising and marketing. Inferior content specifically includes image-text content with no informational value, such as formulaic articles, gossip, promotional articles, articles stitched together from other sources, articles with negative influence, filler articles, and advertorials. The server may filter out the low-quality content first.

2. Identifying premium content

High-quality content identification covers prior high quality and posterior high quality. Prior high quality means identifying high-quality content from the image-text content itself, from objective angles such as text quality and image layout. Posterior high quality means identifying high-quality content comprehensively from both objective and subjective angles, starting from the image-text content and additionally taking into account users' evaluation of it.

The server may train a model for posterior high-quality identification of image-text content. First, features are constructed from multiple content dimensions, such as image-text multimodality, article layout, account, and linguistics, to complete deep network modeling and build a multi-task image-text content classification model. The model structure of the image-text content classification model is shown in fig. 13B. The model comprises a text feature sub-network, an atomic feature sub-network, an image-text fusion feature sub-network, and a style feature sub-network. The data output by each feature sub-network is passed to each task sub-network, each task sub-network outputs a task prediction result, and the quality classification result for the input data is obtained based on the task prediction results. The feature sub-networks and the task sub-networks may be connected through an MLP layer. The model in fig. 13B includes a task sub-network for predicting click-through rate and a task sub-network for predicting browsing duration, that is, one task sub-network for the click-through rate task and one for the browsing duration task.

Then, users' posterior consumption data is used to drive the understanding of high-quality content; that is, positive and negative high-quality image-text samples are screened using the posterior consumption data, and a training set and a verification set are built. Referring to fig. 13C, when browsing in a news-reading mini program, a user may like, comment on, or click through to read the body text of a news article, and the background can compile statistics on these operations to obtain posterior consumption data such as click-through rate, like rate, and browsing duration.

The model is then trained on the training set to obtain the trained image-text content classification model. The trained image-text content classification model can be used to classify the quality of image-text content to be recommended: inputting the target attribute information set of the image-text content to be recommended into the model yields the corresponding quality classification result.

3. Recommending premium content

When the quality classification result shows that the image-text content to be recommended is high-quality content, weighted recommendation is applied to that content.

After going online, the model achieves an accuracy of 95%. In a recommendation-weighting experiment on the browser side, the identified high-quality image-text content, which offers a good reading experience and is attractive, was preferentially recommended to users, and good results were obtained on the business side. On the browser side, overall image-text clicks increased by 0.946%, total image-text browsing duration increased by 1.007%, image-text CTR increased by 0.729%, and user comments among the interaction metrics increased by 0.416%.

Furthermore, after the model is trained, it can be tested on the verification set from time to time, and if the test shows that the accuracy of the model has dropped, the model is updated to maintain its accuracy. In addition, the effect of the model update interval was tested and analyzed, and the results are shown in Table 1. Based on the test results, the automatic update period of the model can be set to 5 days to keep the accuracy of the model high.

Decay time | High-quality proportion decay | AUC decay
5 days     | 0.452%                        | 0.031%
7 days     | 0.831%                        | 0.126%
10 days    | 1.856%                        | 0.705%

TABLE 1

In this embodiment, the automatic model update scheme for posterior high-quality identification of image-text content, based on multi-task learning and on image-text multimodality, article layout, account, linguistics, and other features, is an innovation in both algorithm and model structure for a specific business scenario. Users' posterior consumption data is used to drive the understanding of high-quality content, that is, to screen positive and negative high-quality image-text samples; features are constructed from content dimensions such as image-text multimodality, article layout, account, and linguistics to complete deep network modeling; and the automatic model update scheme continuously captures the latest patterns in consumed content. This mitigates, to a certain extent, the problem that the subjective consumption preferences of the user group change over time, and improves the effectiveness of resource recommendation.

It should be understood that although the steps in the flowcharts of figs. 2 to 13B are shown in the order indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated otherwise herein, the steps are not strictly limited to the order shown and may be performed in other orders. Moreover, at least some of the steps in figs. 2 to 13B may include multiple sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times, and which are not necessarily performed sequentially but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.

In one embodiment, as shown in fig. 14, a multimedia resource classification model training apparatus is provided. The apparatus may be implemented as part of a computer device using software modules, hardware modules, or a combination of the two, and specifically includes: an information obtaining module 1402, an attribute information input module 1404, an attribute information processing module 1406, a label prediction module 1408, and a model adjusting module 1410, wherein:

an information obtaining module 1402, configured to obtain a target attribute information set and a training label set of a training multimedia resource; the target attribute information set comprises target attribute information of multiple dimensions, and the training label set comprises training labels corresponding to multiple tasks;

an attribute information input module 1404, configured to input a target attribute information set of a training multimedia resource into a multimedia resource classification model to be trained; the multimedia resource classification model comprises a plurality of feature sub-networks and task sub-networks corresponding to the tasks respectively;

the attribute information processing module 1406 is configured to perform vectorization processing on the target attribute information associated with the feature sub-networks through each feature sub-network in the multimedia resource classification model to obtain attribute feature vectors output by each feature sub-network;

the label prediction module 1408 is configured to input each attribute feature vector into each task sub-network to obtain a prediction label corresponding to each task;

the model adjusting module 1410 is configured to adjust the model parameters of the corresponding task sub-network based on the training label and the prediction label of the same task, and to adjust the model parameters of each feature sub-network based on the training labels and the prediction labels of all tasks, until a convergence condition is met, so as to obtain a trained multimedia resource classification model; the multimedia resource classification model is used for classifying the quality of the multimedia resource to be recommended.
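The parameter adjustment performed by the model adjusting module can be illustrated with a small loss computation. The sketch below assumes binary cross-entropy as the per-task loss and an unweighted sum as the joint loss; neither choice is stated in this passage, and the names `binary_cross_entropy` and `multitask_loss` are hypothetical. Under these assumptions, each per-task loss would drive the parameters of its own task sub-network, while the joint loss would drive the feature sub-networks that all tasks share.

```python
import math
from typing import Dict, Tuple


def binary_cross_entropy(predicted: float, label: int, eps: float = 1e-12) -> float:
    """Cross-entropy between a predicted probability and a 0/1 training label."""
    p = min(max(predicted, eps), 1.0 - eps)  # clamp to keep log() finite
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))


def multitask_loss(
    predictions: Dict[str, float], labels: Dict[str, int]
) -> Tuple[Dict[str, float], float]:
    """Return the per-task losses and their sum (the assumed joint loss)."""
    if predictions.keys() != labels.keys():
        raise ValueError("predictions and labels must cover the same tasks")
    per_task = {
        task: binary_cross_entropy(predictions[task], labels[task])
        for task in labels
    }
    return per_task, sum(per_task.values())


per_task, joint = multitask_loss(
    {"click_rate": 0.9, "browse_duration": 0.4},
    {"click_rate": 1, "browse_duration": 0},
)
print(per_task, joint)
```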

In one embodiment, the information obtaining module is further configured to obtain recommended interaction information sets respectively corresponding to a plurality of historical multimedia resources, where each recommended interaction information set comprises recommended interaction information corresponding to each task; compute statistics over the recommended interaction information of the same task to obtain reference interaction information corresponding to each task; classify the quality of the historical multimedia resources based on the recommended interaction information and the corresponding reference interaction information to obtain a quality label set corresponding to each historical multimedia resource; and obtain the training multimedia resources and the corresponding training label sets based on the historical multimedia resources and the corresponding quality label sets.

In one embodiment, the information obtaining module is further configured to compare, in a recommended interaction information set corresponding to the same historical multimedia resource, a recommended interaction degree corresponding to recommended interaction information of the same task with a reference interaction degree corresponding to reference interaction information; determining a quality label of a task corresponding to recommended interaction information with the recommended interaction degree greater than the reference interaction degree as a positive label; and determining the quality label of the task corresponding to the recommended interaction information with the recommended interaction degree smaller than the reference interaction degree as a negative label.
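The labelling rule in the two embodiments above can be sketched as follows. The sketch assumes that the reference interaction degree of a task is the arithmetic mean over all historical resources, which is only one possible statistic; the application does not fix it here. Values exactly equal to the reference are left unlabelled because the text does not say how to treat them. The function name `build_quality_labels` and the task names are hypothetical.

```python
from statistics import mean
from typing import Dict, Tuple

Interactions = Dict[str, Dict[str, float]]  # resource_id -> task -> interaction degree


def build_quality_labels(
    interactions: Interactions,
) -> Tuple[Dict[str, float], Dict[str, Dict[str, int]]]:
    """Compute per-task reference values and per-resource 0/1 quality labels."""
    tasks = sorted({task for per_task in interactions.values() for task in per_task})
    # Assumed statistic: the mean interaction degree of each task.
    reference = {
        task: mean(per_task[task] for per_task in interactions.values() if task in per_task)
        for task in tasks
    }
    labels: Dict[str, Dict[str, int]] = {}
    for resource_id, per_task in interactions.items():
        labels[resource_id] = {}
        for task, degree in per_task.items():
            if degree > reference[task]:
                labels[resource_id][task] = 1  # positive label
            elif degree < reference[task]:
                labels[resource_id][task] = 0  # negative label
            # degree == reference: not specified in the text, so no label is assigned
    return reference, labels


history = {
    "a": {"click_rate": 0.25, "dwell_seconds": 30.0},
    "b": {"click_rate": 0.75, "dwell_seconds": 10.0},
    "c": {"click_rate": 0.5, "dwell_seconds": 20.0},
}
reference, labels = build_quality_labels(history)
print(reference)
print(labels)
```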

In one embodiment, the feature subnetwork includes a text feature subnetwork, the target attribute information associated with the text feature subnetwork includes a plurality of text attribute information, and the text feature subnetwork includes data processing channels to which respective text attribute information respectively correspond. The attribute information processing module is also used for respectively carrying out vectorization processing on the corresponding text attribute information through each data processing channel in the text feature sub-network to obtain a text feature vector output by each data processing channel; and obtaining attribute feature vectors output by the text feature sub-network based on the text feature vectors.

In one embodiment, the feature subnetwork comprises an atomic feature subnetwork. The attribute information processing module is also used for performing characteristic cross processing on target attribute information associated with the atomic feature sub-network through the atomic feature sub-network to obtain at least one cross feature vector; and obtaining the attribute feature vector output by the atomic feature subnetwork based on each cross feature vector.

In one embodiment, the target attribute information associated with the sub-network of atomic features includes at least two of user attribute information, image attribute information, language attribute information, and text statistical attribute information.
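As a rough illustration of feature crossing, the sketch below forms one cross feature vector per pair of attribute embeddings by taking their element-wise product. The element-wise product is an assumed crossing operation and the text does not specify which one the atomic feature sub-network uses; the function name `cross_features` and the toy embeddings are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Tuple


def cross_features(
    embedded: Dict[str, List[float]],
) -> Dict[Tuple[str, str], List[float]]:
    """Return one element-wise-product vector per unordered pair of attributes."""
    crossed = {}
    for (name_a, vec_a), (name_b, vec_b) in combinations(sorted(embedded.items()), 2):
        if len(vec_a) != len(vec_b):
            raise ValueError("embeddings must share one dimension to be crossed")
        crossed[(name_a, name_b)] = [x * y for x, y in zip(vec_a, vec_b)]
    return crossed


# Toy two-dimensional embeddings for three kinds of attribute information.
embedded = {
    "user": [3.0, 4.0],
    "image": [1.0, 2.0],
    "language": [0.5, 0.0],
}
for pair, vector in cross_features(embedded).items():
    print(pair, vector)
```

The attribute feature vector output by the sub-network could then be formed, for example, by concatenating the cross feature vectors.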

In one embodiment, the feature subnetwork includes an image-text fusion feature subnetwork, the target attribute information associated with the image-text fusion feature subnetwork includes text attribute information and image attribute information, and the image-text fusion feature subnetwork includes a text data processing channel corresponding to the text attribute information and an image data processing channel corresponding to the image attribute information. The attribute information processing module is further configured to encode the text attribute information through the text data processing channel to obtain an intermediate feature vector; encode the image attribute information through the image data processing channel to obtain an image feature vector; perform attention allocation processing on the intermediate feature vector based on the image feature vector to obtain a first image-text fusion feature vector; perform attention allocation processing on the image feature vector based on the intermediate feature vector to obtain a second image-text fusion feature vector; and obtain the attribute feature vector output by the image-text fusion feature subnetwork based on the first image-text fusion feature vector and the second image-text fusion feature vector.
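The two-way attention described above can be sketched with plain scaled dot-product attention. This is an assumption made for illustration: the text does not state which attention form is used, and the sketch further assumes that the text side is a list of sentence vectors, the image side is a list of image vectors, the query is the mean vector of the other modality, and the two fused vectors are concatenated. All function names are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: Vector) -> Vector:
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mean_vector(vectors: List[Vector]) -> Vector:
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def attend(query: Vector, values: List[Vector]) -> Vector:
    """Weight each value vector by its scaled dot product with the query."""
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, value) / scale for value in values])
    return [sum(w * value[j] for w, value in zip(weights, values))
            for j in range(len(values[0]))]


def fuse_text_image(text_vectors: List[Vector], image_vectors: List[Vector]) -> Vector:
    # First fusion vector: attention over the text vectors, guided by the image side.
    first = attend(mean_vector(image_vectors), text_vectors)
    # Second fusion vector: attention over the image vectors, guided by the text side.
    second = attend(mean_vector(text_vectors), image_vectors)
    return first + second  # concatenation of the two fused vectors


fused = fuse_text_image([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0]])
print(fused)
```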

In one embodiment, the attribute information processing module is further configured to perform word encoding processing on the text attribute information to obtain a word feature vector; and carrying out sentence coding processing on the word feature vector to obtain an intermediate feature vector.

In one embodiment, the feature subnetwork comprises a style feature subnetwork, and the target attribute information associated with the style feature subnetwork comprises style attribute information; the style feature subnetwork includes a first data processing channel and a second data processing channel. The attribute information processing module is further configured to encode the style attribute information through the first data processing channel to obtain an initial feature vector, and perform attention allocation processing on the initial feature vector to obtain a first feature vector; perform convolution processing on the style attribute information through the second data processing channel to obtain a second feature vector; and obtain the attribute feature vector output by the style feature subnetwork based on the first feature vector and the second feature vector.

In one embodiment, each task subnetwork includes an expert layer, a gating layer, and a fusion layer, and the task subnetworks share the expert layer. The label prediction module is further configured to perform feature processing on the attribute feature vectors through the expert layer in the current task subnetwork to obtain a feature processing result, weight the feature processing result through the gating layer to obtain an intermediate processing result, and fuse the intermediate processing result through the fusion layer to obtain the prediction label of the task corresponding to the current task subnetwork.
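A forward pass through one task subnetwork of this shape can be sketched as below. The sketch makes several assumptions that the text does not confirm: each shared expert is a single ReLU layer, the gating layer produces softmax weights over the experts from the same input, and the fusion layer is a single logistic unit. The function names and the toy weights are hypothetical.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[Vector]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: Vector) -> Vector:
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def expert_layer(experts: List[Matrix], features: Vector) -> List[Vector]:
    # Shared by all tasks; each expert is assumed to be one ReLU layer.
    return [[max(0.0, dot(row, features)) for row in expert] for expert in experts]


def gating_layer(gate: Matrix, features: Vector, expert_outputs: List[Vector]) -> Vector:
    # Task-specific softmax weights, one per expert, applied to the expert outputs.
    weights = softmax([dot(row, features) for row in gate])
    width = len(expert_outputs[0])
    return [sum(w * out[j] for w, out in zip(weights, expert_outputs))
            for j in range(width)]


def fusion_layer(tower: Vector, mixed: Vector) -> float:
    # A logistic unit maps the weighted result to a probability in (0, 1).
    return 1.0 / (1.0 + math.exp(-dot(tower, mixed)))


def predict_task(features: Vector, experts: List[Matrix], gate: Matrix, tower: Vector) -> float:
    outputs = expert_layer(experts, features)
    return fusion_layer(tower, gating_layer(gate, features, outputs))


features = [1.0, 2.0]  # stands in for the concatenated attribute feature vectors
experts = [[[1.0, 0.0], [0.0, 1.0]], [[0.5, 0.5], [1.0, -1.0]]]
click_prob = predict_task(features, experts, gate=[[0.0, 0.0], [0.0, 0.0]], tower=[1.0, -1.0])
dwell_prob = predict_task(features, experts, gate=[[1.0, 0.0], [0.0, 1.0]], tower=[0.5, 0.5])
print(click_prob, dwell_prob)
```

Both tasks reuse the same `experts` while keeping their own `gate` and `tower` parameters, which mirrors the shared expert layer described in the embodiment. A prediction label could then be obtained by thresholding each probability.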

In one embodiment, as shown in fig. 15, the apparatus further comprises:

the model updating module 1412 is configured to obtain a target attribute information set and a verification label set of a verification multimedia resource, the verification multimedia resource being a recommended multimedia resource used for updating; input the target attribute information set of the verification multimedia resource into the trained multimedia resource classification model to obtain a prediction label set corresponding to the verification multimedia resource; calculate a classification accuracy based on the prediction label set and the verification label set corresponding to the verification multimedia resource; and when the classification accuracy is smaller than an accuracy threshold, update the trained multimedia resource classification model based on the prediction label set and the training label set corresponding to the verification multimedia resource to obtain an updated multimedia resource classification model.
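The accuracy check that triggers an update can be sketched as follows. The sketch assumes that the prediction label set and the verification label set have been flattened into two equal-length lists of 0/1 labels, and that accuracy is the fraction of matching labels; the function names and the threshold value in the example are hypothetical.

```python
from typing import Sequence


def classification_accuracy(predicted: Sequence[int], expected: Sequence[int]) -> float:
    """Fraction of prediction labels that match the verification labels."""
    if len(predicted) != len(expected):
        raise ValueError("label sequences must have the same length")
    if not predicted:
        raise ValueError("at least one label is required")
    matches = sum(1 for p, e in zip(predicted, expected) if p == e)
    return matches / len(predicted)


def needs_update(predicted: Sequence[int], expected: Sequence[int],
                 accuracy_threshold: float) -> bool:
    # The model is updated only when accuracy falls below the threshold.
    return classification_accuracy(predicted, expected) < accuracy_threshold


predicted_labels = [1, 0, 1, 1]
verification_labels = [1, 1, 1, 1]
print(classification_accuracy(predicted_labels, verification_labels))
print(needs_update(predicted_labels, verification_labels, accuracy_threshold=0.9))
```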

In one embodiment, as shown in fig. 16, a multimedia resource recommendation apparatus is provided. The apparatus may be implemented as part of a computer device using software modules, hardware modules, or a combination of the two, and specifically includes: an attribute information obtaining module 1602, an attribute information input module 1604, an attribute information processing module 1606, a label prediction module 1608, a quality classification module 1610, and a resource recommending module 1612, wherein:

an attribute information obtaining module 1602, configured to obtain a target attribute information set of a multimedia resource to be recommended; the target attribute information set comprises target attribute information of a plurality of dimensions;

an attribute information input module 1604, configured to input the target attribute information set into the trained multimedia resource classification model; the multimedia resource classification model comprises a plurality of feature sub-networks and a plurality of task sub-networks;

an attribute information processing module 1606, configured to perform vectorization processing on the target attribute information associated with the feature sub-networks through each feature sub-network in the multimedia resource classification model, to obtain attribute feature vectors output by each feature sub-network;

a label prediction module 1608, configured to input each attribute feature vector into each task subnetwork, to obtain a prediction label output by each task subnetwork;

a quality classification module 1610, configured to obtain a quality classification result corresponding to the multimedia resource to be recommended based on each prediction tag;

and the resource recommending module 1612 is configured to recommend the multimedia resource to be recommended based on the quality classification result.

For the specific limitations of the multimedia resource classification model training apparatus and the multimedia resource recommendation apparatus, reference may be made to the limitations of the multimedia resource classification model training method and the multimedia resource recommendation method above, which are not repeated here. All or part of the modules in the two apparatuses may be implemented by software, hardware, or a combination thereof. The modules may be embedded in, or independent of, a processor of the computer device in hardware form, or may be stored in a memory of the computer device in software form, so that the processor can call them and execute the operations corresponding to each module.

In one embodiment, a computer device is provided, which may be a server, and its internal structure diagram may be as shown in fig. 17. The computer device includes a processor, a memory, and a network interface connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device comprises a nonvolatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for the operation of an operating system and computer programs in the non-volatile storage medium. The database of the computer equipment is used for storing data such as a recommendation interaction information set, a target attribute information set, a quality label, a multimedia resource classification model and the like of multimedia resources. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program is executed by a processor to implement a multimedia resource classification model training method and a multimedia resource recommendation method.

In one embodiment, a computer device is provided, which may be a terminal, and its internal structure diagram may be as shown in fig. 18. The computer device includes a processor, a memory, a communication interface, a display screen, and an input device connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device comprises a nonvolatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operation of an operating system and computer programs in the non-volatile storage medium. The communication interface of the computer device is used for carrying out wired or wireless communication with an external terminal, and the wireless communication can be realized through WIFI, an operator network, NFC (near field communication) or other technologies. The computer program is executed by a processor to implement a multimedia resource classification model training method and a multimedia resource recommendation method. The display screen of the computer equipment can be a liquid crystal display screen or an electronic ink display screen, and the input device of the computer equipment can be a touch layer covered on the display screen, a key, a track ball or a touch pad arranged on the shell of the computer equipment, an external keyboard, a touch pad or a mouse and the like.

It will be appreciated by those skilled in the art that the configurations shown in fig. 17 and 18 are block diagrams of only some of the configurations relevant to the present disclosure, and do not constitute a limitation on the computing devices to which the present disclosure may be applied, and that a particular computing device may include more or less components than those shown, or may combine certain components, or have a different arrangement of components.

In one embodiment, a computer device is further provided, which includes a memory and a processor, the memory stores a computer program, and the processor implements the steps of the above method embodiments when executing the computer program.

In an embodiment, a computer-readable storage medium is provided, in which a computer program is stored which, when being executed by a processor, carries out the steps of the above-mentioned method embodiments.

In one embodiment, a computer program product or computer program is provided that includes computer instructions stored in a computer-readable storage medium. The computer instructions are read by a processor of a computer device from a computer-readable storage medium, and the computer instructions are executed by the processor to cause the computer device to perform the steps in the above-mentioned method embodiments.

It will be understood by those skilled in the art that all or part of the processes of the methods in the above embodiments can be implemented by a computer program instructing the relevant hardware. The computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the method embodiments described above. Any reference to memory, storage, database, or other medium used in the embodiments provided herein can include at least one of non-volatile and volatile memory. Non-volatile memory may include Read-Only Memory (ROM), magnetic tape, floppy disk, flash memory, optical storage, or the like. Volatile memory can include Random Access Memory (RAM) or external cache memory. By way of illustration and not limitation, RAM can take many forms, such as Static Random Access Memory (SRAM) or Dynamic Random Access Memory (DRAM), among others.

The technical features of the above embodiments can be combined arbitrarily. For the sake of brevity, not all possible combinations of these technical features are described; however, any such combination should be considered within the scope of this specification as long as it involves no contradiction.

The above embodiments express only several implementations of the present application, and although they are described specifically and in detail, they should not be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and modifications without departing from the concept of the present application, all of which fall within the scope of protection of the present application. Therefore, the scope of protection of this patent shall be subject to the appended claims.
