Song menu generation method, apparatus, medium and computing device

Document No.: 272537  Publication date: 2021-11-19

Reading note: this technology, Song menu generation method, apparatus, medium and computing device (歌单生成方法、装置、介质和计算设备), was designed and created by 龚淑琴, 任印涛, 肖强 and 李勇 on 2021-08-20. The embodiments of the present disclosure provide a song menu generation method, apparatus, medium, and computing device. The song menu generation method comprises: acquiring a set of preferred songs of a target user, and determining a seed song from the set of preferred songs; acquiring a picture set associated with the seed song, calculating a preference prediction score of the target user for each picture in the picture set, and determining a seed song picture from the picture set according to the preference prediction scores; recalling similar songs based on each preferred song in the set of preferred songs, and determining the similar songs, together with the preferred songs in the set other than the seed song, as candidate songs; calculating preference prediction scores of the target user for the candidate songs, and generating a song list according to the preference prediction scores; and generating a song menu based on the seed song picture and the song list. The present disclosure can improve the accuracy of song recommendation for users and improve the user experience.

1. A method of generating a song menu, the method comprising:

acquiring a set of preferred songs of a target user, and determining a seed song from the set of preferred songs;

acquiring a picture set associated with the seed song, calculating a preference prediction score of the target user for each picture in the picture set, and determining a seed song picture from the picture set according to the preference prediction scores;

recalling similar songs based on each preferred song in the set of preferred songs, and determining the similar songs, together with the preferred songs in the set other than the seed song, as candidate songs;

calculating preference prediction scores of the target user for the candidate songs, and generating a song list according to the preference prediction scores;

and generating a song menu based on the seed song picture and the song list.

2. The method of claim 1, wherein the generating a song list according to the preference prediction scores comprises:

displaying the candidate songs to the target user according to the preference prediction scores, and generating the song list from the candidate songs selected by the target user.

3. The method of claim 1, wherein the calculating preference prediction scores of the target user for the candidate songs comprises:

calculating the preference prediction scores of the target user for the candidate songs according to multimodal data respectively corresponding to the seed song, the candidate songs, and the historically played songs of the target user;

wherein the multimodal data characterizes a combination of two or more of the audio data, the picture data, and the attribute data corresponding to a song.

4. The method of claim 3, wherein calculating the preference prediction score of the target user for the candidate song according to the multimodal data respectively corresponding to the seed song, the candidate song, and the historically played songs of the target user comprises:

inputting first multimodal data corresponding to the seed song, second multimodal data corresponding to the candidate song, and third multimodal data corresponding to the historically played songs of the target user into a song prediction model for prediction, to obtain the preference prediction score of the target user for the candidate song;

wherein the song prediction model comprises a vector conversion layer, a cross processing layer, and a fully connected layer;

the vector conversion layer is configured to look up a first multimodal feature vector corresponding to the first multimodal data, a second multimodal feature vector corresponding to the second multimodal data, and a third multimodal feature vector corresponding to the third multimodal data, and to transmit the lookup results to the cross processing layer;

the cross processing layer is configured to cross the first multimodal feature vector with the second multimodal feature vector to obtain a first cross feature vector, cross the third multimodal feature vector with the second multimodal feature vector to obtain a second cross feature vector, and transmit to the fully connected layer the concatenated feature vector obtained by concatenating the first, second, and third multimodal feature vectors with the first and second cross feature vectors;

the fully connected layer is configured to map the concatenated feature vector to a preference prediction score and to output the preference prediction score.

5. The method of claim 1, wherein the recalling similar songs based on each preferred song in the set of preferred songs comprises:

for any target song in a song database, calculating the similarity between the target song and each preferred song in the set of preferred songs, and determining whether the similarity reaches a preset first threshold;

and if the similarity reaches the first threshold, determining the target song to be a similar song.

6. The method of claim 5, wherein the determining the similar songs, together with the preferred songs in the set other than the seed song, as candidate songs comprises:

determining a preference degree of the target user for each preferred song in the set of preferred songs based on historical playing behavior data of the target user;

calculating an interest degree of the target user in the target song based on the similarity between the target song and each preferred song in the set and the preference degree of the target user for each preferred song in the set, and determining whether the interest degree reaches a preset second threshold;

and if the interest degree reaches the second threshold, determining the target song whose interest degree reaches the second threshold, together with the preferred songs in the set other than the seed song, as candidate songs.

7. The method of claim 1, wherein the calculating a preference prediction score of the target user for each picture in the picture set comprises:

for any target picture in the picture set, inputting picture attribute data of the target picture and user portrait data of the target user into a picture prediction model for prediction, to obtain the preference prediction score of the target user for the target picture; wherein the picture prediction model is a machine learning model trained on preference scores given by sample users to sample pictures.

8. An apparatus for generating a song menu, the apparatus comprising:

a first determination module, configured to acquire a set of preferred songs of a target user and determine a seed song from the set of preferred songs;

a second determination module, configured to acquire a picture set associated with the seed song, calculate a preference prediction score of the target user for each picture in the picture set, and determine a seed song picture from the picture set according to the preference prediction scores;

a third determination module, configured to recall similar songs based on each preferred song in the set of preferred songs, and determine the similar songs, together with the preferred songs in the set other than the seed song, as candidate songs;

a calculation module, configured to calculate preference prediction scores of the target user for the candidate songs and generate a song list according to the preference prediction scores;

and a generation module, configured to generate a song menu based on the seed song picture and the song list.

9. A medium having stored thereon a computer program which, when executed by a processor, carries out the method of any one of claims 1-7.

10. A computing device, comprising:

a processor;

a memory for storing a program executable by the processor;

wherein the processor implements the method of any one of claims 1-7 by running the executable program.

Technical Field

Embodiments of the present disclosure relate to the field of computer applications, and more particularly, to a song menu generation method, apparatus, medium, and computing device.

Background

This section is intended to provide a background or context to the embodiments of the disclosure recited in the claims. The description herein is not admitted to be prior art by inclusion in this section.

With the rapid development of digital music and the rapid growth of music resources, mainstream music applications can now provide users with a massive number of songs to choose from and listen to. However, the resulting problem of song overload is increasingly serious, so finding the songs that a user is likely to prefer, in order to make personalized song recommendations, has become an urgent problem to solve.

Disclosure of Invention

In this context, embodiments of the present disclosure are intended to provide a song menu generation method, apparatus, medium, and computing device.

In a first aspect of embodiments of the present disclosure, there is provided a song menu generation method, the method comprising:

acquiring a set of preferred songs of a target user, and determining a seed song from the set of preferred songs;

acquiring a picture set associated with the seed song, calculating a preference prediction score of the target user for each picture in the picture set, and determining a seed song picture from the picture set according to the preference prediction scores;

recalling similar songs based on each preferred song in the set of preferred songs, and determining the similar songs, together with the preferred songs in the set other than the seed song, as candidate songs;

calculating preference prediction scores of the target user for the candidate songs, and generating a song list according to the preference prediction scores;

and generating a song menu based on the seed song picture and the song list.
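The steps above can be sketched as a minimal pipeline. Every callable here is a hypothetical placeholder: in the described system, the picture-selection and song-scoring steps are backed by trained prediction models, not simple functions.

```python
def generate_song_menu(preferred_songs, pick_seed, pick_seed_picture,
                       recall_similar, score_song):
    """Minimal sketch of the four steps above (all callables are
    hypothetical stand-ins for the components of the method)."""
    seed = pick_seed(preferred_songs)          # determine the seed song
    picture = pick_seed_picture(seed)          # determine the seed song picture
    similar = recall_similar(preferred_songs)  # recall similar songs
    # Candidates: preferred songs other than the seed, plus the similar songs.
    candidates = [s for s in preferred_songs if s != seed] + similar
    # Rank the candidates by the user's predicted preference score.
    song_list = sorted(candidates, key=score_song, reverse=True)
    # The song menu combines the seed song picture with the song list.
    return {"picture": picture, "songs": [seed] + song_list}
```

A usage example would pass real model-backed callables; here any scoring function with the right shape works.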

Optionally, the generating a song list according to the preference prediction scores includes:

displaying the candidate songs to the target user according to the preference prediction scores, and generating the song list from the candidate songs selected by the target user.

Optionally, the calculating preference prediction scores of the target user for the candidate songs includes:

calculating the preference prediction scores of the target user for the candidate songs according to multimodal data respectively corresponding to the seed song, the candidate songs, and the historically played songs of the target user;

wherein the multimodal data characterizes a combination of two or more of the audio data, the picture data, and the attribute data corresponding to a song.

Optionally, the calculating, according to the multimodal data respectively corresponding to the seed song, the candidate song, and the historically played songs of the target user, the preference prediction score of the target user for the candidate song includes:

inputting first multimodal data corresponding to the seed song, second multimodal data corresponding to the candidate song, and third multimodal data corresponding to the historically played songs of the target user into a song prediction model for prediction, to obtain the preference prediction score of the target user for the candidate song;

wherein the song prediction model comprises a vector conversion layer, a cross processing layer, and a fully connected layer;

the vector conversion layer is configured to look up a first multimodal feature vector corresponding to the first multimodal data, a second multimodal feature vector corresponding to the second multimodal data, and a third multimodal feature vector corresponding to the third multimodal data, and to transmit the lookup results to the cross processing layer;

the cross processing layer is configured to cross the first multimodal feature vector with the second multimodal feature vector to obtain a first cross feature vector, cross the third multimodal feature vector with the second multimodal feature vector to obtain a second cross feature vector, and transmit to the fully connected layer the concatenated feature vector obtained by concatenating the first, second, and third multimodal feature vectors with the first and second cross feature vectors;

the fully connected layer is configured to map the concatenated feature vector to a preference prediction score and to output the preference prediction score.
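A minimal sketch of the three-layer prediction model described above. Two assumptions are made here: "cross processing" is modelled as an element-wise product, which is only one plausible reading (the text does not fix the exact operation), and `weights`/`bias` stand in for a trained fully connected layer with a sigmoid output.

```python
import math

def predict_song_preference(v_seed, v_cand, v_hist, weights, bias):
    """Sketch: cross, concatenate, then a fully connected mapping.

    v_seed, v_cand, v_hist are the first, second, and third multimodal
    feature vectors (equal length); weights has 5x that length.
    """
    cross1 = [a * b for a, b in zip(v_seed, v_cand)]   # first cross feature vector
    cross2 = [a * b for a, b in zip(v_hist, v_cand)]   # second cross feature vector
    # Concatenated feature vector passed to the fully connected layer.
    concat = v_seed + v_cand + v_hist + cross1 + cross2
    logit = sum(x * w for x, w in zip(concat, weights)) + bias
    return 1.0 / (1.0 + math.exp(-logit))              # preference prediction score
```

With zero weights the score is 0.5, i.e. the sigmoid midpoint, which is a convenient sanity check for the wiring.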

Optionally, the method further comprises:

preprocessing the multimodal data respectively corresponding to the songs in a song database to extract, in advance, the multimodal feature vectors corresponding to the multimodal data.

Optionally, the method further comprises:

acquiring user portrait data and historical playing behavior data of the target user;

the calculating the preference prediction scores of the target user for the candidate songs according to the multimodal data respectively corresponding to the seed song, the candidate songs, and the historically played songs of the target user includes:

calculating the preference prediction scores of the target user for the candidate songs according to the multimodal data respectively corresponding to the seed song, the candidate songs, and the historically played songs of the target user, together with the user portrait data and the historical playing behavior data of the target user.

Optionally, the determining a seed song from the set of preferred songs includes:

determining a preference degree of the target user for each preferred song in the set of preferred songs based on historical playing behavior data of the target user;

determining the preferred song with the highest preference degree in the set as the seed song.
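Under this optional embodiment, seed song selection reduces to an argmax over preference degrees. The mapping below is a hypothetical input shape; how the degrees are derived from playing behavior (play counts, completion rates, and so on) is not fixed by the text.

```python
def pick_seed_song(preference_degrees):
    """Pick the preferred song with the highest preference degree.

    preference_degrees: dict mapping song id -> preference degree,
    assumed to be precomputed from historical playing behavior.
    """
    return max(preference_degrees, key=preference_degrees.get)
```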

Optionally, the recalling similar songs based on each preferred song in the set of preferred songs includes:

for any target song in a song database, calculating the similarity between the target song and each preferred song in the set of preferred songs, and determining whether the similarity reaches a preset first threshold;

and if the similarity reaches the first threshold, determining the target song to be a similar song.
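The recall step can be sketched as below. One reading is assumed: a target song is recalled if its similarity to any preferred song reaches the first threshold (the text does not say whether the threshold must be met for one or for every preferred song), and `similarity` is a hypothetical pairwise function, e.g. cosine similarity over song feature vectors.

```python
def recall_similar_songs(song_database, preferred_songs, similarity,
                         first_threshold):
    """Recall database songs similar to the user's preferred songs.

    similarity(target, preferred) -> float is a hypothetical pairwise
    function; the any-match reading of the threshold is used here.
    """
    recalled = []
    for target in song_database:
        if target in preferred_songs:
            continue  # preferred songs themselves are handled separately
        if any(similarity(target, p) >= first_threshold
               for p in preferred_songs):
            recalled.append(target)
    return recalled
```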

Optionally, the determining the similar songs, together with the preferred songs in the set other than the seed song, as candidate songs includes:

determining a preference degree of the target user for each preferred song in the set of preferred songs based on historical playing behavior data of the target user;

calculating an interest degree of the target user in the target song based on the similarity between the target song and each preferred song in the set and the preference degree of the target user for each preferred song in the set, and determining whether the interest degree reaches a preset second threshold;

and if the interest degree reaches the second threshold, determining the target song whose interest degree reaches the second threshold, together with the preferred songs in the set other than the seed song, as candidate songs.
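The interest degree can be sketched as a preference-weighted combination of similarities. The text above names the inputs but gives no explicit formula, so the weighted sum below is only one plausible interpretation; `similarity` and the preference degrees are hypothetical inputs.

```python
def interest_degree(target, preferred_songs, similarity, preference_degree):
    """Interest degree in a target song, read here as the sum of
    similarity(target, p) weighted by the user's preference for p."""
    return sum(similarity(target, p) * preference_degree[p]
               for p in preferred_songs)

def filter_by_interest(targets, preferred_songs, similarity,
                       preference_degree, second_threshold):
    # Keep the target songs whose interest degree reaches the second threshold.
    return [t for t in targets
            if interest_degree(t, preferred_songs, similarity,
                               preference_degree) >= second_threshold]
```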

Optionally, the calculating a preference prediction score of the target user for each picture in the picture set includes:

for any target picture in the picture set, inputting picture attribute data of the target picture and user portrait data of the target user into a picture prediction model for prediction, to obtain the preference prediction score of the target user for the target picture; wherein the picture prediction model is a machine learning model trained on preference scores given by sample users to sample pictures.

Optionally, the determining a seed song picture from the picture set according to the preference prediction scores includes:

determining the picture with the highest preference prediction score in the picture set as the seed song picture.
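Picture scoring and seed-picture selection can be sketched together. The linear-plus-sigmoid scorer is only a stand-in for the trained picture prediction model, and `weights`/`bias` are hypothetical learned parameters; the selection step is simply an argmax over the scores.

```python
import math

def score_picture(picture_attrs, user_portrait, weights, bias):
    """Stand-in for the picture prediction model: score the
    concatenated picture-attribute and user-portrait features."""
    x = picture_attrs + user_portrait
    logit = sum(v * w for v, w in zip(x, weights)) + bias
    return 1.0 / (1.0 + math.exp(-logit))

def pick_seed_picture(picture_set, user_portrait, weights, bias):
    """picture_set maps picture id -> attribute feature list; the seed
    song picture is the one with the highest predicted score."""
    return max(picture_set,
               key=lambda pid: score_picture(picture_set[pid],
                                             user_portrait, weights, bias))
```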

Optionally, the generating a song list according to the preference prediction scores includes:

sorting the candidate songs according to the preference prediction scores, and generating the song list from the seed song and the sorted candidate songs.

In a second aspect of embodiments of the present disclosure, there is provided a song menu generation apparatus, the apparatus comprising:

a first determination module, configured to acquire a set of preferred songs of a target user and determine a seed song from the set of preferred songs;

a second determination module, configured to acquire a picture set associated with the seed song, calculate a preference prediction score of the target user for each picture in the picture set, and determine a seed song picture from the picture set according to the preference prediction scores;

a third determination module, configured to recall similar songs based on each preferred song in the set of preferred songs, and determine the similar songs, together with the preferred songs in the set other than the seed song, as candidate songs;

a calculation module, configured to calculate preference prediction scores of the target user for the candidate songs and generate a song list according to the preference prediction scores;

and a generation module, configured to generate a song menu based on the seed song picture and the song list.

Optionally, the calculation module is specifically configured to:

displaying the candidate songs to the target user according to the preference prediction scores, and generating the song list from the candidate songs selected by the target user.

Optionally, the calculation module is specifically configured to:

calculating the preference prediction scores of the target user for the candidate songs according to multimodal data respectively corresponding to the seed song, the candidate songs, and the historically played songs of the target user;

wherein the multimodal data characterizes a combination of two or more of the audio data, the picture data, and the attribute data corresponding to a song.

Optionally, the calculation module is specifically configured to:

inputting first multimodal data corresponding to the seed song, second multimodal data corresponding to the candidate song, and third multimodal data corresponding to the historically played songs of the target user into a song prediction model for prediction, to obtain the preference prediction score of the target user for the candidate song;

wherein the song prediction model comprises a vector conversion layer, a cross processing layer, and a fully connected layer;

the vector conversion layer is configured to look up a first multimodal feature vector corresponding to the first multimodal data, a second multimodal feature vector corresponding to the second multimodal data, and a third multimodal feature vector corresponding to the third multimodal data, and to transmit the lookup results to the cross processing layer;

the cross processing layer is configured to cross the first multimodal feature vector with the second multimodal feature vector to obtain a first cross feature vector, cross the third multimodal feature vector with the second multimodal feature vector to obtain a second cross feature vector, and transmit to the fully connected layer the concatenated feature vector obtained by concatenating the first, second, and third multimodal feature vectors with the first and second cross feature vectors;

the fully connected layer is configured to map the concatenated feature vector to a preference prediction score and to output the preference prediction score.

Optionally, the apparatus further comprises:

a preprocessing module, configured to preprocess the multimodal data respectively corresponding to each song in a song database to extract, in advance, the multimodal feature vectors corresponding to the multimodal data.

Optionally, the apparatus further comprises:

an acquisition module, configured to acquire user portrait data and historical playing behavior data of the target user;

the calculation module is specifically configured to:

calculating the preference prediction scores of the target user for the candidate songs according to the multimodal data respectively corresponding to the seed song, the candidate songs, and the historically played songs of the target user, together with the user portrait data and the historical playing behavior data of the target user.

Optionally, the first determining module is specifically configured to:

determining a preference degree of the target user for each preferred song in the set of preferred songs based on historical playing behavior data of the target user;

determining the preferred song with the highest preference degree in the set as the seed song.

Optionally, the third determining module is specifically configured to:

for any target song in a song database, calculating the similarity between the target song and each preferred song in the set of preferred songs, and determining whether the similarity reaches a preset first threshold;

and if the similarity reaches the first threshold, determining the target song to be a similar song.

Optionally, the third determining module is specifically configured to:

determining a preference degree of the target user for each preferred song in the set of preferred songs based on historical playing behavior data of the target user;

calculating an interest degree of the target user in the target song based on the similarity between the target song and each preferred song in the set and the preference degree of the target user for each preferred song in the set, and determining whether the interest degree reaches a preset second threshold;

and if the interest degree reaches the second threshold, determining the target song whose interest degree reaches the second threshold, together with the preferred songs in the set other than the seed song, as candidate songs.

Optionally, the second determining module is specifically configured to:

for any target picture in the picture set, inputting picture attribute data of the target picture and user portrait data of the target user into a picture prediction model for prediction, to obtain the preference prediction score of the target user for the target picture; wherein the picture prediction model is a machine learning model trained on preference scores given by sample users to sample pictures.

Optionally, the second determining module is specifically configured to:

determining the picture with the highest preference prediction score in the picture set as the seed song picture.

Optionally, the calculation module is specifically configured to:

sorting the candidate songs according to the preference prediction scores, and generating the song list from the seed song and the sorted candidate songs.

In a third aspect of embodiments of the present disclosure, there is provided a medium having stored thereon a computer program which, when executed by a processor, implements any of the song menu generation methods described above.

In a fourth aspect of embodiments of the present disclosure, there is provided a computing device comprising:

a processor;

a memory for storing a program executable by the processor;

wherein the processor implements any of the song menu generation methods described above by running the executable program.

According to the embodiments of the present disclosure, a seed song may be determined from a user's set of preferred songs, and a seed song picture may be determined according to the user's preference prediction scores for the pictures associated with the seed song. Similar songs may be recalled based on the preferred songs in the set, and the similar songs, together with the preferred songs other than the seed song, may be determined as candidate songs. A song list may then be generated according to the user's preference prediction scores for the candidate songs, and a song menu may be generated from the seed song picture and the song list. This approach accounts for the user's degree of preference both for the picture of the generated song menu and for the songs in its song list, thereby improving the accuracy of song recommendation for the user and improving the user experience.

Drawings

The above and other objects, features and advantages of exemplary embodiments of the present disclosure will become readily apparent from the following detailed description read in conjunction with the accompanying drawings. Several embodiments of the present disclosure are illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings and in which:

FIG. 1 schematically illustrates an application scenario of song menu generation according to an embodiment of the present disclosure;

FIG. 2A schematically illustrates a schematic view of a user interface according to an embodiment of the present disclosure;

FIG. 2B schematically illustrates a schematic view of another user interface according to an embodiment of the present disclosure;

FIG. 3 schematically illustrates a flow chart of a song list generation method according to an embodiment of the present disclosure;

FIG. 4 schematically illustrates a schematic diagram of a song prediction model according to an embodiment of the present disclosure;

FIG. 5 schematically illustrates a schematic diagram of another song prediction model according to an embodiment of the present disclosure;

FIG. 6 schematically illustrates a schematic diagram of a song menu according to an embodiment of the present disclosure;

FIG. 7A schematically illustrates a schematic view of yet another user interface according to an embodiment of the present disclosure;

FIG. 7B schematically illustrates a schematic view of another song menu according to an embodiment of the present disclosure;

FIG. 8 schematically illustrates a schematic view of a medium according to an embodiment of the disclosure;

FIG. 9 schematically shows a block diagram of a song menu generation apparatus according to an embodiment of the present disclosure;

FIG. 10 schematically shows a schematic diagram of a computing device in accordance with an embodiment of the present disclosure.

In the drawings, the same or corresponding reference numerals indicate the same or corresponding parts.

Detailed Description

The principles and spirit of the present disclosure will be described with reference to a number of exemplary embodiments. It is understood that these embodiments are given solely for the purpose of enabling those skilled in the art to better understand and to practice the present disclosure, and are not intended to limit the scope of the present disclosure in any way. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.

As will be appreciated by one skilled in the art, embodiments of the present disclosure may be embodied as a system, apparatus, device, method, or computer program product. Accordingly, the present disclosure may be embodied in the form of: entirely hardware, entirely software (including firmware, resident software, micro-code, etc.), or a combination of hardware and software.

According to embodiments of the present disclosure, a song menu generation method, apparatus, medium, and computing device are provided.

In this document, it is to be understood that any number of elements in the figures are provided by way of illustration and not limitation, and any nomenclature is used for differentiation only and not in any limiting sense.

The principles and spirit of the present disclosure are explained in detail below with reference to several representative embodiments of the present disclosure.

Summary of the Invention

Generally, when recommending songs to a user, a song list containing multiple songs the user may prefer is pushed to the user, so that the user can browse the list and select songs from it to listen to. The song list may be carried in the form of a song menu. In addition to the song list, the song menu may also include a song menu picture, which may be a picture associated with one of the songs in the list. When the song menu is pushed to the user, the song menu picture is displayed first, and the song list is displayed after the user enters the song menu detail page.

In the related art, song recommendation is usually performed only according to the styles of the songs the user has played historically. This single recommendation dimension results in low accuracy of song recommendation for the user.

To solve the above problem, the present disclosure provides a technical solution for generating a song menu. A seed song may be determined from a user's set of preferred songs, and a seed song picture may be determined according to the user's preference prediction scores for the pictures associated with the seed song. Similar songs may be recalled based on the preferred songs in the set, and the similar songs, together with the preferred songs other than the seed song, may be determined as candidate songs. A song list may then be generated according to the user's preference prediction scores for the candidate songs, and a song menu may be generated from the seed song picture and the song list. This approach accounts for the user's degree of preference both for the picture of the generated song menu and for the songs in its song list, thereby improving the accuracy of song recommendation for the user and improving the user experience.

Having described the general principles of the present disclosure, various non-limiting embodiments of the present disclosure are described in detail below.

Application Scenario Overview

Generally, a user may install, in an electronic device, a client corresponding to a certain business; the electronic device may be a terminal device such as a smartphone, tablet computer, notebook computer, palmtop computer, smart wearable device, smart in-vehicle device, or game console.

In this case, the client may output to the user an interface related to the business, in which the user may perform interactive operations; based on the user's interactions in the interface, the client may execute the corresponding business process.

Referring first to fig. 1, fig. 1 schematically shows an application scenario of song list generation according to an embodiment of the present disclosure.

As shown in fig. 1, the application scenario of song list generation may include a server corresponding to a music application and at least one client (e.g., clients 1-N) accessing the server; the client may be the music application itself, and the server may be the server corresponding to the music application.

In practical applications, the server may interface with a song database, so that songs stored in the song database may be provided to the user through the client.

For example, the server may recommend a personalized song for the user, and push the generated recommended song list to the user through the client.

Specifically, the client may output a user interface to the user as shown in fig. 2A. The user interface may display the song list picture and song list name of the recommended song list pushed to the user by the server; the recommended song list may include a plurality of songs that the user may prefer. The user may click the position of the song list picture or song list name of the recommended song list in the user interface, so that the client jumps to the user interface shown in fig. 2B for displaying the details of the recommended song list, and the user can view the songs in the recommended song list and select songs from it to listen to.

Exemplary method

A method of song list generation according to an exemplary embodiment of the present disclosure is described below with reference to fig. 3-5 in conjunction with the application scenario of fig. 1. It should be noted that the above application scenarios are merely illustrated for the convenience of understanding the spirit and principles of the present disclosure, and the embodiments of the present disclosure are not limited in this respect. Rather, embodiments of the present disclosure may be applied to any scenario where applicable.

Referring to fig. 3, fig. 3 schematically shows a flow chart of a song list generation method according to an embodiment of the present disclosure.

The song list generation method can be applied to the server; the song list generating method may include the steps of:

step 301: acquiring a preference song set of the target user, and determining a seed song from the preference song set.

In practical application, when a user starts the client, the client sends a request for obtaining a recommended song list to the server, so that the server responds to the request and performs personalized song recommendation for the user. Alternatively, the user may trigger such a request through the client; for example, when the user initiates creation of a new song list on the client, the client sends a request for obtaining a recommended song list to the server, and the server responds to the request and performs personalized song recommendation for the user.

In this embodiment, in order to perform personalized song recommendation for a certain user (referred to as a target user), a preference song set of the target user may be obtained first, and a seed song may be determined from the obtained preference song set.

The seed song may be a song with the highest preference degree of the target user in the preference song set.

In practical applications, for a user, the preferred songs of the user may be songs that the user has played in a recent period of time, songs that the user has shared in a recent period of time, songs that the user has added to a playlist, songs that the user has marked as "liked", songs that the user has collected, and so on. Therefore, the songs played by the target user within a preset time period can be obtained from the target user's historical playing records; the songs shared by the target user within a preset time period can be obtained from the target user's historical sharing records; the songs added to the playlist can be obtained from the target user's playlist; the songs marked as "liked" can be obtained from the song list created by the target user for recording such songs; and the songs collected by the target user can be obtained from the song list created by the target user for collecting songs. Subsequently, the acquired songs may be determined as the preferred songs of the target user and added to the preference song set.
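
As an illustrative sketch (the record structure below, plain sets of song IDs per behavior source, is an assumption rather than the system's actual storage format), the preference song set can be assembled as the union of the songs gathered from each behavior source:

```python
# Assemble a target user's preference song set from several behavior
# sources; each argument is a collection of song IDs from one source.
def build_preference_set(played, shared, playlisted, liked, collected):
    preference_set = set()
    for source in (played, shared, playlisted, liked, collected):
        preference_set.update(source)
    return preference_set

prefs = build_preference_set(
    played={"song_a", "song_b"},
    shared={"song_b"},
    playlisted={"song_c"},
    liked={"song_a"},
    collected={"song_d"},
)
print(sorted(prefs))  # ['song_a', 'song_b', 'song_c', 'song_d']
```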

Step 302: acquiring a picture set associated with the seed song, calculating preference prediction scores of the target user for each picture in the picture set, and determining a seed song picture from the picture set according to the preference prediction scores.

In this embodiment, in the case that the seed song is determined, a picture set associated with the seed song may be acquired, and a preference prediction score of the target user for each picture in the picture set (i.e., a predicted preference score of the target user for the picture) is calculated, so as to determine a seed song picture from the picture set according to the calculated preference prediction score.

In practical applications, for a certain song, the picture associated with the song may be a cover picture of a song album to which the song belongs, a portrait picture of an artist singing the song, a genre picture corresponding to a song genre of the song, and the like. Therefore, the cover pictures of the song album to which the seed song belongs can be acquired, the portrait pictures of the artist performing the seed song can be acquired, the genre pictures corresponding to the song genre of the seed song can be acquired, and the acquired pictures can be added to the picture set associated with the seed song.

Note that the genre picture corresponding to the song genre of a song may be a picture set in advance by a technician. For example, a technician may pick some pictures in advance and label each picture with a style label such as cheerful, sad, or exciting; subsequently, a picture labeled with the cheerful style label may be determined as the genre picture corresponding to songs whose style is cheerful, a picture labeled with the sad style label may be determined as the genre picture corresponding to songs whose style is sad, a picture labeled with the exciting style label may be determined as the genre picture corresponding to songs whose style is exciting, and so on.

Step 303: recalling similar songs based on each preferred song in the preference song set, and determining the similar songs and the preferred songs other than the seed song as candidate songs.

In this embodiment, when the seed song is determined, similar songs similar to the respective preference songs may be recalled based on the respective preference songs in the preference song set, and the recalled similar songs and other preference songs in the preference song set except the seed song may be determined as candidate songs.

Step 304: calculating preference prediction scores of the target user for the candidate songs, and generating a song list according to the preference prediction scores.

In this embodiment, in the case that the candidate songs are determined, preference prediction scores of the target user for the respective candidate songs (i.e., predicted preference scores of the target user for the songs) may be calculated, and a song list may be generated according to the calculated preference prediction scores. For example, a list of songs may be generated based on the candidate songs with the highest preference prediction scores and the seed song described above.

In one embodiment, the target user's preference prediction score for a candidate song may be calculated based on the multimodal data corresponding to the seed song, the candidate song, and the target user's historically played songs, respectively.

Wherein each source or form of data can be considered a modality; the multimodal data is the combination of the audio data, picture data and attribute data corresponding to a song. The attribute data may include data associated with the attributes of the song itself or with user behavior, such as the genre of the song, the language of the lyrics, the click-through rate (e.g., the proportion of users who clicked the song for playing), the complete-play rate (e.g., the proportion of users who played the song in its entirety), the collection rate (e.g., the proportion of users who added the song to a song list for collecting songs), and so forth.

In practical applications, the historical songs played by the target user may include songs that have been played by the target user in the last period of time.

It should be noted that the history playing songs of the target user may only include one song that has been played by the target user in history, or may include a song sequence that is composed of a plurality of songs that have been played by the target user in history.

Step 305: generating a song list based on the seed song picture and the song list.

In this embodiment, in the case where the above-described song list is generated, a song list may be generated based on the above-described seed song picture and the generated song list.

Specifically, the seed song picture may be used as a song list picture in the generated song list, and the song list may be used as a song list in the generated song list.

In practical application, in the case that the song list is generated, the song list may be pushed to the target user through the client as a recommended song list of the target user, so that the target user can view the recommended song list and select a song from the recommended song list for listening.

The present embodiment is described in detail below in terms of extracting feature vectors, determining seed songs, determining seed song pictures, determining candidate songs, calculating user preference prediction scores for the candidate songs, and generating a song list.

(1) Extracting feature vectors

In one embodiment, since the songs in the song database interfaced with the server do not change unless they are deleted, in order to save the time for extracting the feature vectors corresponding to the songs, the songs in the song database may be preprocessed to extract the feature vectors corresponding to each song in advance.

Specifically, each song may be preprocessed according to the multimodal data respectively corresponding to each song in the song database, so as to extract multimodal feature vectors respectively corresponding to the multimodal data of each song.

In this case, when the preference prediction score of the target user for the candidate song is calculated, the feature vectors corresponding to the seed song, the candidate song and the history playing song of the target user obtained through the preprocessing can be directly found, and feature vector extraction for the songs is not needed, so that time for extracting the feature vectors corresponding to the songs can be saved, the speed of calculating the preference prediction score of the target user for the candidate song is accelerated, and personalized song recommendation can be performed for the user quickly.

In practical applications, in the first aspect, a machine learning model may be used to extract feature vectors corresponding to audio data of a song.

Specifically, a plurality of sample songs may be obtained first, and each sample song may be labeled with its corresponding song style; subsequently, the audio data of the labeled sample songs can be input into a machine learning model (e.g., a YAMNet model) for supervised training. After the training is completed, the audio data of any song in the song database can be input into the machine learning model for prediction calculation. In this case, instead of taking the prediction result output by the last layer (i.e., the output layer) of the model, the feature vector output by an intermediate layer of the model is taken; this feature vector is the feature vector corresponding to the audio data of the song.

In a second aspect, a machine learning model may be employed to extract feature vectors corresponding to picture data of a song.

Specifically, a plurality of sample songs may be obtained first, and each sample song may be labeled with its corresponding picture type; subsequently, the picture data of the labeled sample songs can be input into a machine learning model (for example, an ImageNet-based picture classification model) for supervised training. After the training is completed, the picture data of any song in the song database can be input into the machine learning model for prediction calculation. In this case, instead of taking the prediction result output by the last layer (i.e., the output layer) of the model, the feature vector output by an intermediate layer of the model is taken; this feature vector is the feature vector corresponding to the picture data of the song.

In a third aspect, an embedding process may be performed on attribute data of a song to extract a feature vector corresponding to the attribute data of the song. For example, one-hot encoding may be performed on attribute data of a song, and a vector obtained by the encoding may be determined as a feature vector corresponding to the attribute data of the song.
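
A minimal sketch of the one-hot encoding step described above; the genre vocabulary is a made-up example, since the real attribute vocabularies come from the song database:

```python
# One-hot encode a categorical song attribute (e.g. genre) against a
# fixed vocabulary: a vector of zeros with a 1 at the value's position.
def one_hot(value, vocabulary):
    vec = [0] * len(vocabulary)
    vec[vocabulary.index(value)] = 1
    return vec

GENRES = ["pop", "rock", "folk", "electronic"]  # illustrative vocabulary
print(one_hot("folk", GENRES))  # [0, 0, 1, 0]
```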

(2) Determining seed songs

In this embodiment, the seed song may be determined from the acquired preference song set.

In an embodiment, the preference degree of the target user for each preferred song in the preferred song set may be determined based on the historical play behavior data of the target user, and the preferred song with the highest preference degree determined in the preferred song set is determined as the seed song.

In practical applications, different preference degrees can be set for different playing behaviors of the song by the user. In this case, for any preferred song in the preferred song set, the historical playing behavior of the target user for the preferred song may be determined according to the historical playing behavior data of the target user, and the sum of the preference degrees corresponding to the determined historical playing behavior is calculated as the preference degree of the target user for the preferred song.

For example, assume that the user's preference for the playing behavior of a song is set as shown in table 1 below:

TABLE 1

Playing behavior | Preference degree
Clicked the song to play, but did not play it completely | 0.4
Played the song completely | 0.6
Marked the song as "liked" | 1
Collected the song | 0.8
Added the song to a playlist | 0.8

Further assume that, according to the historical playing behavior data of the target user, the target user's historical playing behaviors for a certain preferred song include: playing the song completely and collecting the song. The target user's preference degree for the preferred song may then be determined as 0.6 + 0.8 = 1.4.
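
The preference-degree computation of this worked example can be sketched as follows, with the behavior weights taken from Table 1 (the behavior-name keys are illustrative):

```python
# Preset weight for each playing behavior, per Table 1.
BEHAVIOR_WEIGHTS = {
    "partial_play": 0.4,        # clicked, but not played completely
    "complete_play": 0.6,
    "liked": 1.0,
    "collected": 0.8,
    "added_to_playlist": 0.8,
}

def preference_degree(behaviors):
    # The user's preference for a song is the sum of the weights of the
    # historical playing behaviors observed for that song.
    return sum(BEHAVIOR_WEIGHTS[b] for b in behaviors)

# Complete play + collection, as in the worked example: 0.6 + 0.8 = 1.4
print(round(preference_degree(["complete_play", "collected"]), 2))  # 1.4
```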

(3) Determining seed song pictures

In this embodiment, a picture set associated with the seed song may be obtained, and a preference prediction score of the target user for each picture in the picture set may be calculated, so as to determine the seed song picture from the picture set according to the calculated preference prediction score.

In one embodiment, a machine learning model may be used to calculate the preference score of the target user for each picture in the picture set.

Specifically, a plurality of users may be obtained as sample users and a plurality of pictures as sample pictures; the user portrait data of the sample users and the picture attribute data of the sample pictures may be used as feature data, the preference scores of the sample users for the sample pictures may be used as labels, and supervised training may be performed on a machine learning model serving as a picture prediction model. After the training of the picture prediction model is completed, for any picture (called a target picture) in the picture set, the user portrait data of the target user and the picture attribute data of the target picture may be input into the picture prediction model for prediction calculation, so as to obtain the preference prediction score of the target user for the target picture.

The picture attribute data may include data such as the picture type (e.g., cover picture, portrait picture, genre picture, etc.), the picture style (e.g., cheerful, sad, exciting, etc.), and the picture conversion rate (e.g., the proportion of users who, after viewing the picture, clicked into the song list corresponding to the picture); the user portrait data may include data such as the user's preferred picture type and preferred picture style.

In practical applications, the picture prediction model may be a Logistic Regression (LR) model.
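
A minimal sketch of logistic-regression scoring for a single picture; the feature names, weights, and bias below are hypothetical placeholders rather than values from the trained picture prediction model:

```python
import math

def lr_score(features, weights, bias=0.0):
    # Logistic regression inference: weighted sum of features plus bias,
    # squashed by a sigmoid into a score in (0, 1).
    z = bias + sum(weights[k] * v for k, v in features.items())
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical trained weights over user-portrait / picture-attribute features.
weights = {"user_likes_portraits": 1.2, "picture_is_portrait": 0.9,
           "picture_ctr": 2.0}
features = {"user_likes_portraits": 1.0, "picture_is_portrait": 1.0,
            "picture_ctr": 0.3}
score = lr_score(features, weights, bias=-1.5)
print(round(score, 3))  # 0.769
```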

Further, in an embodiment, in order to improve the possibility that the user enters the recommended song list to listen to the song, after the preference prediction score of the target user for each picture in the picture set is calculated, the picture with the highest calculated preference prediction score in the picture set may be determined as the seed song picture.

(4) Determining candidate songs

In this embodiment, similar songs similar to the respective preferred songs may be recalled based on the respective preferred songs in the preferred song set, and the recalled similar songs and other preferred songs in the preferred song set except the seed song may be determined as the candidate songs.

In one embodiment, when similar songs similar to the respective preferred songs in the preferred song set are recalled, for any song in the song database (referred to as a target song), the similarity between the target song and the respective preferred songs may be calculated, and it may be determined whether the calculated similarity reaches a preset threshold (referred to as a first threshold). If the calculated similarity reaches the first threshold, the target song may be determined to be the similar song.

Specifically, when the similarity between the target song and any one of the preferred songs in the preference song set is calculated, the number M of users who have played both the target song and the preferred song may be counted, together with the number N of users who have played at least one of the target song and the preferred song; the similarity S between the target song and the preferred song may then be calculated with the following formula:

S = M / N

if the similarity between the target song and any one of the preference songs in the preference song set reaches the first threshold value, the target song can be determined as the similar song.
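
The co-play similarity just described (the count M of users who played both songs over the count N of users who played either) can be sketched directly from the user sets; the user IDs are illustrative:

```python
def song_similarity(users_a, users_b):
    # S = M / N: intersection over union of the two songs' player sets.
    m = len(users_a & users_b)   # users who played both songs
    n = len(users_a | users_b)   # users who played at least one of them
    return m / n if n else 0.0

played_target = {"u1", "u2", "u3", "u4"}   # users who played the target song
played_pref = {"u2", "u3", "u5"}           # users who played the preferred song
print(song_similarity(played_target, played_pref))  # 0.4
```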

Further, in an embodiment, in order to increase the speed of generating the recommended song list, the similar songs may be filtered to determine a part of the similar songs and other preferred songs in the preferred song set except the seed song as the candidate songs.

When screening the similar songs, the preference degree of the target user for each preferred song in the preference song set can be determined based on the historical playing behavior data of the target user; for the specific implementation of this step, reference may be made to the description above on determining the seed song, which is not repeated here.

Subsequently, based on the preference of the target user for each preferred song in the preferred song set and the similarity between the target song and each preferred song, the interest level of the target user for the target song is calculated, and whether the calculated interest level reaches a preset threshold (referred to as a second threshold) is determined. If the calculated interest level reaches the second threshold, the target song may be determined as the candidate song. That is, the songs with the interest degree reaching the second threshold value in the similar songs may be screened out, and the part of songs and the other preferred songs in the preferred song set except the seed song may be determined as the candidate songs.

Specifically, when calculating the interest level of the target user in the target song, the following formula may be adopted:

I(U, B) = Σ_{A ∈ N(U) ∩ M(B, k)} S(B, A) · P(U, A)

wherein I(U, B) represents the interest degree of user U in song B, N(U) represents the preference song set of user U, M(B, k) represents the k songs with the highest similarity to song B, S(B, A) represents the similarity between song B and song A, and P(U, A) represents the preference degree of user U for song A; the value of k can be preset by the technician.

For example, assume that the preference song set of user U includes song A1, song A2, song A3 and song A4, and that the 3 songs with the highest similarity to song B are song A1, song A2 and song A3. The interest degree of user U in song B is then:

I(U, B) = S(B, A1)·P(U, A1) + S(B, A2)·P(U, A2) + S(B, A3)·P(U, A3)

wherein S(B, A1) represents the similarity between song B and song A1, and P(U, A1) represents the preference degree of user U for song A1; S(B, A2) represents the similarity between song B and song A2, and P(U, A2) represents the preference degree of user U for song A2; S(B, A3) represents the similarity between song B and song A3, and P(U, A3) represents the preference degree of user U for song A3.
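
The interest-degree computation can be sketched as follows; the similarity and preference values are illustrative numbers, not outputs of the real system:

```python
def interest_degree(similarities, preferences, top_k_songs):
    # I(U, B): sum of S(B, A) * P(U, A) over preferred songs A that are
    # among the k songs most similar to B.
    return sum(similarities[a] * preferences[a]
               for a in top_k_songs if a in preferences)

similarities = {"A1": 0.5, "A2": 0.4, "A3": 0.2}            # S(B, A_i)
preferences = {"A1": 1.4, "A2": 0.6, "A3": 0.8, "A4": 1.0}  # P(U, A_i)
top_k = ["A1", "A2", "A3"]                                  # M(B, k), k = 3

score = interest_degree(similarities, preferences, top_k)
print(round(score, 2))  # 1.1
```

A candidate filter then keeps only the songs whose interest degree reaches the second threshold.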

(5) Computing a user preference prediction score for a candidate song

In this embodiment, the preference prediction scores of the target user for each of the candidate songs may be calculated.

Specifically, the preference prediction score of the target user for the candidate song may be calculated according to the multimodal data corresponding to the seed song, the candidate song, and the history playing song of the target user, respectively.

In one embodiment, when calculating the preference prediction score of the target user for the candidate song, for any one of the candidate songs, multimodal data corresponding to the seed song (referred to as first multimodal data), multimodal data corresponding to the candidate song (referred to as second multimodal data), and multimodal data corresponding to the target user's history song (referred to as third multimodal data) may be input to a song prediction model and subjected to prediction calculation, so that the preference prediction score of the target user for the candidate song is obtained.

Referring to fig. 4, fig. 4 schematically illustrates a schematic diagram of a song prediction model according to an embodiment of the present disclosure.

As shown in fig. 4, the song prediction model may include a vector conversion layer, a cross processing layer, and a fully connected layer.

In the first aspect, the vector conversion layer may be configured to look up a first multimodal feature vector corresponding to the first multimodal data, a second multimodal feature vector corresponding to the second multimodal data, and a third multimodal feature vector corresponding to the third multimodal data, and to transmit the first, second and third multimodal feature vectors found to the cross processing layer.

The first multimodal feature vector may be obtained by preprocessing for extracting feature vectors from the seed song, the second multimodal feature vector may be obtained by preprocessing for extracting feature vectors from the candidate song, and the third multimodal feature vector may be obtained by preprocessing for extracting feature vectors from the history-played song of the target user.

In a case where the history-played song of the target user includes a song sequence including a plurality of songs historically played by the target user, the third multimodal feature vector may include a multimodal feature vector sequence including multimodal feature vectors respectively corresponding to multimodal data of the plurality of songs.

For example, assume that the song sequence in the target user's historically played songs is as shown in table 2 below:

TABLE 2

Song 1 | Song 2 | …… | Song N

The third multimodal feature vector may include a feature vector sequence corresponding to audio data as shown in table 3 below, a feature vector sequence corresponding to picture data as shown in table 4 below, and a feature vector sequence corresponding to attribute data as shown in table 5 below:

TABLE 3

Audio feature vector 1 | Audio feature vector 2 | …… | Audio feature vector N

The audio feature vector 1 is a feature vector corresponding to the audio data of song 1, the audio feature vector 2 is a feature vector corresponding to the audio data of song 2, and so on.

TABLE 4

Picture feature vector 1 | Picture feature vector 2 | …… | Picture feature vector N

The picture feature vector 1 is a feature vector corresponding to the picture data of the song 1, the picture feature vector 2 is a feature vector corresponding to the picture data of the song 2, and so on.

TABLE 5

Attribute feature vector 1 | Attribute feature vector 2 | …… | Attribute feature vector N

The attribute feature vector 1 is a feature vector corresponding to the attribute data of song 1, the attribute feature vector 2 is a feature vector corresponding to the attribute data of song 2, and so on.

In the second aspect, the cross processing layer may be configured to cross-process the first multimodal feature vector and the second multimodal feature vector to obtain a first cross feature vector, cross-process the third multimodal feature vector and the second multimodal feature vector to obtain a second cross feature vector, and transmit the first, second and third multimodal feature vectors together with the first and second cross feature vectors obtained by the cross processing to the fully connected layer.

Specifically, the first multimodal feature vector and the second multimodal feature vector may be cross-processed based on a cosine (cos) similarity algorithm.
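
A sketch of the cosine-similarity cross feature between two same-modality vectors; the vectors shown are illustrative, not real extracted features:

```python
import math

def cosine_similarity(u, v):
    # cos(u, v) = (u . v) / (|u| * |v|)
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

seed_audio = [0.2, 0.8, 0.1]       # e.g. audio feature vector of the seed song
candidate_audio = [0.1, 0.9, 0.2]  # e.g. audio feature vector of a candidate
print(round(cosine_similarity(seed_audio, candidate_audio), 3))  # 0.987
```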

Furthermore, the third multimodal feature vector and the second multimodal feature vector may be cross-processed based on an attention mechanism. Taking the feature vector sequence corresponding to the audio data in the third multimodal feature vector, as shown in table 3 above, as an example, the third multimodal feature vector and the second multimodal feature vector may be cross-processed using the following formula:

V_U = Σ_{i=1}^{N} g(V_i, V_B) · V_i

wherein V_U represents the vector resulting from the cross processing, V_B represents the audio feature vector in the multimodal feature vector corresponding to candidate song B, V_i represents the i-th audio feature vector in the feature vector sequence corresponding to the audio data, and g represents a preset function.
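
The attention-style cross processing can be sketched as follows; since the preset function g is not specified, a softmax over dot products is assumed here purely for illustration:

```python
import math

def attention_pool(history_vecs, candidate_vec):
    # Weight each historical vector V_i by g(V_i, V_B) and sum; here g is
    # assumed to be a softmax over dot products with the candidate vector.
    scores = [sum(h * c for h, c in zip(v, candidate_vec))
              for v in history_vecs]
    exp = [math.exp(s) for s in scores]
    total = sum(exp)
    weights = [e / total for e in exp]   # attention weights, summing to 1
    dim = len(candidate_vec)
    return [sum(w * v[d] for w, v in zip(weights, history_vecs))
            for d in range(dim)]

history = [[1.0, 0.0], [0.0, 1.0]]   # feature vectors of played songs
candidate = [1.0, 0.0]               # feature vector of candidate song B
pooled = attention_pool(history, candidate)
print([round(x, 3) for x in pooled])  # [0.731, 0.269]
```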

In the cross processing of the multimodal feature vectors, feature vectors of the same modality may be cross-processed with each other. Taking the first and second multimodal feature vectors as an example, the audio feature vector in the first multimodal feature vector may be cross-processed with the audio feature vector in the second multimodal feature vector, the picture feature vector in the first multimodal feature vector with the picture feature vector in the second multimodal feature vector, and the attribute feature vector in the first multimodal feature vector with the attribute feature vector in the second multimodal feature vector; and so on.

In the third aspect, the fully connected layer may be configured to concatenate the feature vectors received from the cross processing layer, map the concatenated feature vector to the preference prediction score of the target user for the candidate song, and output the preference prediction score.

Specifically, the fully connected layer may use a Sigmoid function to map the concatenated feature vector into a probability value in the range [0, 1], and output the probability value as the preference prediction score of the target user for the candidate song.
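
A minimal sketch of this final mapping, with placeholder (untrained) weights: the incoming vectors are concatenated, passed through one linear layer, and squashed by a sigmoid into a [0, 1] score:

```python
import math

def predict_score(vectors, weights, bias):
    # Concatenate the incoming feature vectors, apply one linear layer,
    # and map the result into [0, 1] with a sigmoid.
    concat = [x for v in vectors for x in v]
    z = bias + sum(w * x for w, x in zip(weights, concat))
    return 1.0 / (1.0 + math.exp(-z))

# Two toy "cross feature" vectors and placeholder layer parameters.
score = predict_score([[0.5, 0.1], [0.3]], weights=[0.8, -0.4, 1.2], bias=0.1)
print(0.0 <= score <= 1.0)  # True
```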

Further, in one embodiment, in order to enrich the feature dimensions used for predicting the preference score, the user portrait data and the historical playing behavior data of the target user may be acquired. In this case, the preference prediction score of the target user for the candidate song may be calculated based on the multimodal data corresponding to each of the seed song, the candidate song, and the historically played songs of the target user, together with the user portrait data and historical playing behavior data of the target user.

In practical applications, the user portrait data may include age, gender, city, device operating system, device operator, and other data; the historical playing behavior data may include language preference, genre preference, release age preference, and the like of the user analyzed according to the historical playing behavior of the user, and may further include data such as the number of songs that the user has completely played in a period of time, the number of songs marked as "favorite", and the number of songs collected.

Referring to fig. 5, fig. 5 schematically illustrates a schematic diagram of another song prediction model according to an embodiment of the present disclosure.

Similar to the song prediction model shown in fig. 4, this song prediction model may include a vector conversion layer, a cross processing layer, and a fully connected layer.

In a first aspect, the vector conversion layer may be configured to look up a first multimodal feature vector corresponding to the first multimodal data, a second multimodal feature vector corresponding to the second multimodal data, and a third multimodal feature vector corresponding to the third multimodal data, to perform feature vector extraction on the user portrait data and the historical playing behavior data, and to transmit the first, second and third multimodal feature vectors together with the extracted user portrait feature vector and historical playing behavior feature vector to the cross processing layer.

In the second aspect, the cross processing layer may be configured to cross-process the first multimodal feature vector and the second multimodal feature vector to obtain a first cross feature vector, cross-process the third multimodal feature vector and the second multimodal feature vector to obtain a second cross feature vector, and transmit the first, second and third multimodal feature vectors, the first and second cross feature vectors obtained through the cross processing, and the user portrait feature vector and historical playing behavior feature vector to the fully connected layer.

In a third aspect, the fully connected layer may be configured to concatenate the feature vectors received from the cross processing layer, map the concatenated feature vector to the preference prediction score of the target user for the candidate song, and output the preference prediction score.

(6) Generating a song list

In this embodiment, a song list may be generated according to the calculated preference prediction scores of the target user for each candidate song in the candidate songs, and a song list may be generated based on the seed song picture and the generated song list.

In one embodiment, when generating the song list, the candidate songs may be ranked according to the calculated preference prediction scores of the target user for the candidate songs, for example in descending order of those scores.

Subsequently, the song list may be generated according to the ranked candidate songs and the seed song.

Specifically, the seed song may be placed before the ranked candidate songs to generate the song list.
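A minimal sketch of this ranking-and-prepend step, with illustrative song names and scores:

```python
# Sort candidates by predicted preference score (descending) and place
# the seed song first. The names and scores here are illustrative.
def build_song_list(seed_song, candidate_scores):
    ranked = sorted(candidate_scores, key=candidate_scores.get, reverse=True)
    return [seed_song] + ranked

scores = {"candidate 1": 0.72, "candidate 2": 0.55,
          "candidate 3": 0.91, "candidate 4": 0.30}
print(build_song_list("song X", scores))
# → ['song X', 'candidate 3', 'candidate 1', 'candidate 2', 'candidate 4']
```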

Assume the seed song is song X and the candidate songs are candidate song 1, candidate song 2, candidate song 3, and candidate song 4, where the target user's prediction scores are ordered candidate song 3 > candidate song 1 > candidate song 2 > candidate song 4. A list of songs may then be generated as shown in Table 6 below:

TABLE 6

  Serial number    Song
  1                Song X
  2                Candidate song 3
  3                Candidate song 1
  4                Candidate song 2
  5                Candidate song 4

At this time, a song list as shown in fig. 6 may be generated based on picture X, the picture determined as the seed song picture from the picture set associated with song X, and the list of songs shown in Table 6 above.

In another embodiment, when generating the song list, the candidate songs may be presented to the target user according to the calculated preference prediction scores. The target user may then select some of the candidate songs, and the song list may be generated from the songs the target user selected.

Specifically, the server may send the candidate songs to the client according to the preference prediction scores, and the client may display them to the target user through a user interface. The target user can select some of the candidate songs in the user interface; the client then sends the selected songs to the server, so that the server can generate the song list from them.

In practical applications, the candidate songs may first be ranked according to the calculated preference prediction scores and then displayed to the target user in ranked order, so that the target user can select songs from the ranked candidates.

It should be noted that the seed song may also be presented to the target user together with the candidate songs; the target user then selects songs from both, and the song list is generated from the selected songs.

Continuing with the above example of song X and candidate songs 1 through 4, the server may send the candidate songs to the client according to the target user's predicted preference scores, and the client may display them through the user interface shown in fig. 7A. The target user can select songs through the "add to song list" option in the user interface and click the "generate song list" button when the selection is complete. In response to the click, the client sends the selected candidate song 3 and candidate song 4 to the server, so that the server can generate the song list from song X, candidate song 3, and candidate song 4. At this time, a song list as shown in fig. 7B may be generated based on picture X and this list of songs.

According to the embodiments of the present disclosure, a seed song may be determined from a user's preference song set, and a seed song picture may be determined according to the user's preference prediction scores for the pictures associated with the seed song. Similar songs may be recalled based on the preference songs in the set, and those similar songs, together with the other preference songs except the seed song, may be determined as candidate songs. A list of songs may then be generated according to the user's preference prediction scores for the candidate songs, and a song list may be generated from the seed song picture and that list. In this way, both the song list picture and the songs in the song list reflect the user's preferences, improving the accuracy of song recommendation for the user and the user experience.

Exemplary Medium

Having described the method of the exemplary embodiment of the present disclosure, the medium of the exemplary embodiment of the present disclosure is explained next with reference to fig. 8.

In the present exemplary embodiment, the above-described method may be implemented by a program product including program code; the program product may be stored on a medium such as a portable compact disc read-only memory (CD-ROM) and executed on a device such as a personal computer. However, the program product of the present disclosure is not limited thereto; in this document, a readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.

The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

A computer readable signal medium may include a propagated data signal with readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.

Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.

Program code for carrying out operations of the present disclosure may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java and C++, as well as conventional procedural programming languages such as the C language. The program code may execute entirely on the user computing device, partly on the user computing device and partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the user computing device through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computing device (for example, through the internet using an internet service provider).

Exemplary devices

Having described the media of the exemplary embodiments of the present disclosure, the apparatus of the exemplary embodiments of the present disclosure is described next with reference to fig. 9.

The implementation process of the functions and actions of each module in the following device is specifically described in the implementation process of the corresponding step in the above method, and is not described herein again. For the device embodiments, since they substantially correspond to the method embodiments, reference may be made to the partial description of the method embodiments for relevant points.

Fig. 9 schematically illustrates a song list generation apparatus according to an embodiment of the present disclosure. The apparatus comprises:

a first determining module 901, configured to obtain a preference song set of a target user, and determine a seed song from the preference song set;

a second determining module 902, configured to obtain a picture set associated with the seed song, calculate a preference prediction score of the target user for each picture in the picture set, and determine a seed song picture from the picture set according to the preference prediction score;

a third determining module 903, configured to recall a similar song based on each preferred song in the preferred song set, and determine other preferred songs in the preferred song set except the seed song and the similar song as candidate songs;

a calculating module 904, configured to calculate a preference prediction score of the target user for the candidate song, and generate a song list according to the preference prediction score;

a generating module 905, configured to generate a song list based on the seed song picture and the song list.

Optionally, the calculating module 904 is specifically configured to:

and displaying the candidate songs to the target user according to the preference prediction scores, and generating a song list according to the candidate songs selected by the target user.

Optionally, the calculating module 904 is specifically configured to:

calculating preference prediction scores of the target user for the candidate songs according to multi-mode data respectively corresponding to the seed songs, the candidate songs and the historical playing songs of the target user;

wherein the multi-modal data characterizes a combination of two or more of the audio data, the picture data, and the attribute data corresponding to a song.
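Purely as an illustration of such combined data, a container might look like the following; the field names and types are hypothetical, since the disclosure does not fix a concrete structure for the multi-modal data:

```python
from dataclasses import dataclass

# Hypothetical container combining a song's audio, picture, and
# attribute data into one multi-modal record.
@dataclass
class MultiModalData:
    audio_features: list    # e.g. an extracted audio embedding
    picture_features: list  # e.g. a cover-picture embedding
    attributes: dict        # e.g. genre, artist, release year

song_data = MultiModalData(audio_features=[0.1, 0.2],
                           picture_features=[0.3, 0.4],
                           attributes={"genre": "pop"})
print(song_data.attributes["genre"])  # → pop
```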

Optionally, the calculating module 904 is specifically configured to:

inputting first multi-modal data corresponding to the seed song, second multi-modal data corresponding to the candidate song and third multi-modal data corresponding to the historical playing song of the target user into a song prediction model for prediction calculation to obtain a preference prediction score of the target user for the candidate song;

wherein the song prediction model comprises: the device comprises a vector conversion layer, a cross processing layer and a full connection layer;

the vector conversion layer is used for looking up a first multi-modal feature vector corresponding to the first multi-modal data, a second multi-modal feature vector corresponding to the second multi-modal data, and a third multi-modal feature vector corresponding to the third multi-modal data, and for transmitting the search results to the cross processing layer;

the cross processing layer is used for performing cross processing on the first multi-modal feature vector and the second multi-modal feature vector to obtain a first cross feature vector, performing cross processing on the third multi-modal feature vector and the second multi-modal feature vector to obtain a second cross feature vector, and transmitting spliced feature vectors obtained after splicing processing is performed on the first multi-modal feature vector, the second multi-modal feature vector, the third multi-modal feature vector, the first cross feature vector and the second cross feature vector to the full connection layer;

the full-connection layer is used for mapping the splicing feature vector into a preference prediction score and outputting the preference prediction score.

Optionally, the apparatus further comprises:

the preprocessing module 906 is configured to perform preprocessing for extracting feature vectors on the multimodal data according to the multimodal data corresponding to each song in the song database, so as to extract multimodal feature vectors corresponding to the multimodal data.

Optionally, the apparatus further comprises:

an obtaining module 907, configured to obtain user portrait data and historical playing behavior data of the target user;

the calculation module 904 is specifically configured to:

and calculating the preference prediction score of the target user for the candidate song according to the multi-modal data respectively corresponding to the seed song, the candidate song and the historical playing song of the target user, and the user portrait data and the historical playing behavior data of the target user.

Optionally, the first determining module 901 is specifically configured to:

determining the preference degree of the target user for each preference song in the preference song set based on the historical playing behavior data of the target user;

determining the preference song with the highest preference degree in the preference song set as the seed song.
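This seed-song selection reduces to taking the argmax over preference degrees; a minimal sketch with made-up degrees (which in practice would be derived from the user's historical playing behavior data):

```python
# Pick the preferred song with the highest preference degree as the seed.
def pick_seed_song(preference_degrees):
    return max(preference_degrees, key=preference_degrees.get)

degrees = {"song X": 0.95, "song Y": 0.60, "song Z": 0.75}
print(pick_seed_song(degrees))  # → song X
```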

Optionally, the third determining module 903 is specifically configured to:

for any target song in a song database, calculating the similarity between the target song and each preference song in the preference song set, and determining whether the similarity reaches a preset first threshold;

and if the similarity reaches the first threshold value, determining the target song as a similar song.

Optionally, the third determining module 903 is specifically configured to:

determining the preference degree of the target user for each preference song in the preference song set based on the historical playing behavior data of the target user;

calculating the interest degree of the target user for the target song based on the similarity between the target song and each preference song in the preference song set and the preference degree of the target user for each preference song in the preference song set, and determining whether the interest degree reaches a preset second threshold value;

and if the interest degree reaches the second threshold value, determining the target song with the interest degree reaching the second threshold value and other preference songs in the preference song set except the seed song as candidate songs.
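The two-threshold recall described above might be sketched as follows; the similarity values, preference degrees, thresholds, and the sum-of-products form of the interest degree are all illustrative assumptions:

```python
SIM_THRESHOLD = 0.6   # first threshold: per-pair similarity
INT_THRESHOLD = 0.5   # second threshold: aggregate interest degree

def interest_degree(similarities, preferences):
    # One plausible aggregate: sum over preference songs of
    # similarity x preference degree.
    return sum(s * p for s, p in zip(similarities, preferences))

def recall_candidates(target_songs, preferences):
    candidates = []
    for song, sims in target_songs.items():
        # Keep only songs similar enough to at least one preference song...
        if max(sims) < SIM_THRESHOLD:
            continue
        # ...and whose interest degree reaches the second threshold.
        if interest_degree(sims, preferences) >= INT_THRESHOLD:
            candidates.append(song)
    return candidates

prefs = [0.9, 0.4]  # preference degrees for two preference songs
targets = {"song A": [0.8, 0.1], "song B": [0.3, 0.2],
           "song C": [0.7, 0.9]}
print(recall_candidates(targets, prefs))  # → ['song A', 'song C']
```

The recalled songs would then be merged with the remaining preference songs (excluding the seed song) to form the candidate set.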

Optionally, the second determining module 902 is specifically configured to:

for any target picture in the picture set, inputting the picture attribute data of the target picture and the user portrait data of the target user into a picture prediction model for prediction calculation, to obtain a preference prediction score of the target user for the target picture; the picture prediction model is a machine learning model trained on sample users' preference scores for sample pictures.

Optionally, the second determining module 902 is specifically configured to:

determining the picture with the highest preference prediction score in the picture set as a seed song picture.

Optionally, the calculating module 904 is specifically configured to:

and sorting the candidate songs according to the preference prediction scores, and generating the song list according to the seed songs and the sorted candidate songs.

Exemplary computing device

Having described the methods, media, and apparatus of the exemplary embodiments of the present disclosure, a computing device of the exemplary embodiments of the present disclosure is described next with reference to fig. 10.

The computing device 1000 shown in fig. 10 is only one example and should not impose any limitations on the functionality or scope of use of embodiments of the disclosure.

As shown in fig. 10, computing device 1000 is embodied in the form of a general purpose computing device. Components of computing device 1000 may include, but are not limited to: at least one processing unit 1001, at least one storage unit 1002, and a bus 1003 connecting different system components (including the processing unit 1001 and the storage unit 1002).

The bus 1003 includes a data bus, a control bus, and an address bus.

The storage unit 1002 can include readable media in the form of volatile memory, such as random access memory (RAM) 10021 and/or cache memory 10022, and can further include readable media in the form of non-volatile memory, such as read-only memory (ROM) 10023.

The storage unit 1002 may also include a program/utility 10025 having a set (at least one) of program modules 10024, such program modules 10024 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each of which, or some combination thereof, may comprise an implementation of a network environment.

Computing device 1000 may also communicate with one or more external devices 1004 (e.g., keyboard, pointing device, etc.).

Such communication may occur via input/output (I/O) interface 1005. Moreover, computing device 1000 may also communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network such as the internet) through network adapter 1006. As shown in fig. 10, network adapter 1006 communicates with the other modules of computing device 1000 via bus 1003. It should be understood that although not shown in the figures, other hardware and/or software modules may be used in conjunction with computing device 1000, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, among others.

It should be noted that although several units/modules or sub-units/modules of the song list generation apparatus are mentioned in the above detailed description, such division is merely exemplary and not mandatory. Indeed, according to embodiments of the present disclosure, the features and functionality of two or more of the units/modules described above may be embodied in one unit/module. Conversely, the features and functions of one unit/module described above may be further divided and embodied by a plurality of units/modules.

Further, while the operations of the disclosed methods are depicted in the drawings in a particular order, this does not require or imply that these operations must be performed in this particular order, or that all of the illustrated operations must be performed, to achieve desirable results. Additionally or alternatively, certain steps may be omitted, multiple steps combined into one step execution, and/or one step broken down into multiple step executions.

While the spirit and principles of the present disclosure have been described with reference to several particular embodiments, it is to be understood that the present disclosure is not limited to the particular embodiments disclosed; the division into aspects is for convenience of description only and does not mean that features in these aspects cannot be combined to benefit. The disclosure is intended to cover various modifications and equivalent arrangements included within the spirit and scope of the appended claims.
