Intelligent recommendation method and system for music online teaching videos

Document No.: 1861312  Publication date: 2021-11-19  Views: 31  Original language: Chinese

Reading note: this technique, "Intelligent recommendation method and system for music online teaching videos", was designed and created by Zhang Boyang on 2021-07-31. Its main content is as follows: The invention relates to an intelligent recommendation method and system for music online teaching videos. The method first trains three neural networks that can grade a user's breath, intonation, and rhythm respectively; it then acquires a piece of audio sung by the user on trial and extracts from it a narrow-band spectrogram, which characterizes the user's breath level and intonation level, and a wide-band spectrogram, which characterizes the user's rhythm level. The narrow-band spectrogram is substituted into the two neural networks that grade breath and intonation respectively, yielding the user's breath and intonation grades; the wide-band spectrogram is substituted into the neural network that grades rhythm, yielding the user's rhythm grade. Finally, the best recommended music teaching video is matched: the music teaching video adapted to the user's breath, intonation, and rhythm grades is selected as the best recommendation and recommended to the user, realizing intelligent recommendation of music teaching videos.

1. An intelligent recommendation method for online teaching videos of music is characterized by comprising the following steps:

step S1: obtaining audio information of a user according to a test singing result of a piece of music selected by the user, and extracting breath information, intonation information and rhythm information of the user according to the audio information;

step S2: inputting the breath information of the user into the trained first neural network, and outputting a grade level corresponding to the breath information of the user, namely a breath grade; inputting the intonation information of the user into the trained second neural network, and outputting the level corresponding to the intonation information of the user, namely the intonation level; inputting the rhythm information of the user into the trained third neural network, and outputting the grade level corresponding to the rhythm information of the user, namely the rhythm grade;

step S3: matching the best recommended music teaching video, wherein the matching process comprises: selecting the music teaching video adapted to the user's breath level, intonation level, and rhythm level as the best recommended music teaching video, and recommending it to the user.

2. The method for intelligently recommending music online teaching videos according to claim 1, further comprising: determining the error tolerance of the music according to the difficulty level of the music the user sings on trial; adjusting the user's breath level, intonation level, and rhythm level from step S2 according to the error tolerance of the trial-sung music, to obtain the user's corrected breath level, corrected intonation level, and corrected rhythm level; and comparing the set breath level, intonation level, and rhythm level of each music teaching video with the user's corrected breath level, corrected intonation level, and corrected rhythm level, and recommending the music teaching video with the smallest comparison difference to the user as the best recommended teaching video.

3. The method for intelligently recommending music online instructional videos according to claim 2, wherein said error tolerance is calculated as follows:

where R is the error tolerance and n is the difficulty level of the song selected by the user, n being a set value.

4. The method for intelligently recommending music online teaching videos according to claim 1, wherein in step S1, obtaining the audio information of the user and extracting the user's breath information, intonation information, and rhythm information therefrom comprises: converting the user's trial-singing audio into spectrograms comprising a narrow-band spectrogram and a wide-band spectrogram, wherein the wide-band spectrogram is obtained by framing with a short time window of a first framing duration, the narrow-band spectrogram is obtained by framing with a time window of a second framing duration, and the first framing duration is less than the second framing duration; the narrow-band spectrogram serves as the user's breath information and intonation information, and the wide-band spectrogram serves as the user's rhythm information.

5. The method according to claim 1, wherein the first neural network W1, the second neural network W2, and the third neural network W3 adopt the same structure, namely a CNN structure comprising an encoder and a classifier; the audio information of the user is input into the encoder, the encoder outputs a feature vector, and finally the classifier classifies the feature vector.

6. The method for intelligently recommending music online teaching videos according to claim 5, wherein the three neural networks are trained in the same way, and the training process of any one neural network comprises the following steps:

A) obtaining training samples, namely the data set of the neural network: for the first neural network or the second neural network, the data set comprises collected narrow-band spectrograms Iz and corresponding base level labels; for the third neural network, the data set comprises wide-band spectrograms Ik and corresponding base level labels;

B) converting each spectrogram in the data set into a one-dimensional feature vector, namely processing the spectrogram's grey scale to obtain grey values of 1 row and n columns as the one-dimensional feature vector; forming a matrix of size [M, w × h] from the feature vectors of all spectrograms, where M is the total number of spectrograms in the current data set and w × h is the size of each spectrogram; and representing each spectrogram by a two-dimensional vector using a data dimension reduction technique;

C) mapping each spectrogram onto a constructed two-dimensional plane according to its two-dimensional vector, where each point on the two-dimensional plane corresponds to one class of spectrogram in the data set, denoted Iz(x, y), with x, y the coordinates corresponding to the spectrogram; setting the pixel value G(x, y) of the corresponding point to the base level label corresponding to the spectrogram, thereby constructing a breath base level distribution map P1;

D) according to the distribution of points in the breath base level distribution map P1, classifying points with the same pixel value into the same cluster and determining L clusters, where L is an integer greater than 1; thereby determining the connected domain information S and the cluster center within each cluster;

E) determining Thiessen polygons according to the cluster centers: dividing the plane into L Thiessen polygons using the perpendicular bisectors of the lines connecting the cluster centers, each Thiessen polygon corresponding to one cluster;

F) according to the cluster center point Z, the Thiessen polygon T, and the regional connected domain information S corresponding to each cluster in the breath base level distribution map P1, calculating, for every two adjacent Thiessen polygons in P1, the ratio of the sum of the translation distances of their perpendicular bisector to the distance between their center points, as the discrimination between the two base level labels;

G) constructing a triplet loss function Loss according to the discrimination between the base level labels and the triplet sample [A, P, N] formed by the three training samples input each time during training;

H) inputting the training samples of the current batch into the neural network for training, calculating the triplet loss function of step G), and continuously updating the network parameters by gradient descent until training of the neural network is completed.

7. The method for intelligently recommending music online teaching videos according to claim 6, wherein in step C), setting the pixel value G(x, y) of the corresponding point as the base level label corresponding to the spectrogram comprises:

the pixel value G (x, y) of the corresponding point is calculated as follows:

G(x, y) = f( (d1 + d2 + … + dC(x,y)) / C(x, y) )

where du is the base level label corresponding to spectrogram u; f(x) is a rounding function, so that the final pixel value is an integer, namely a base level label corresponding to the spectrogram; and C(x, y) is the number of spectrograms corresponding to position (x, y).

8. The method for intelligently recommending music online teaching videos according to claim 6, wherein in step G), the formula for calculating the triplet loss function is as follows:

where Loss is the loss function, n is the number of triplet samples, and F(x) is the feature vector obtained from the network encoder for an input training sample; δ is a hyperparameter with a fixed value; Q is the discrimination between the two base levels to which the base level labels of the positive and negative samples belong. In the triplet sample [A, P, N], A denotes the reference sample, P is a positive sample with the same label as the reference sample, and N is a negative sample with a label different from that of the reference sample; [·]+ denotes comparing the value inside the brackets with zero: if it is greater than zero, the value is unchanged; if it is less than zero, it is set to zero.

9. The method of claim 8, wherein the degree of discrimination is calculated as follows:

Qi,m = (Δli + Δlm) / ||ZiZm||2

where ||ZiZm||2 is the Euclidean distance between the cluster center points Zi and Zm, i ≠ m, 1 ≤ i ≤ 5, 1 ≤ m ≤ 5; Qi,m is the discrimination between category i and category m, with value range [0, 1]; and Δli + Δlm is the sum of the translation distances of the perpendicular bisector between the two adjacent Thiessen polygons.

10. An intelligent music online teaching video recommendation system, comprising a memory, a processor, and a computer program stored in the memory and running on the processor, the processor being coupled to the memory; when executing the computer program, the processor implements the intelligent teaching video recommendation method of any one of claims 1-9.

Technical Field

The invention relates to the technical field of artificial intelligence and music education, in particular to a music online teaching video intelligent recommendation method and system.

Background

Music is an art that expresses people's thoughts and emotions and reflects real life, and musical activities are part of arts education. With the improvement of living standards, people pay increasing attention to developing artistic ability, and music education has gradually attracted attention. In the field of online music education, after registering a music teaching account, a student generally selects music teaching videos according to courses of interest or courses the student believes are suitable. This way of selecting courses has the drawback that the student evaluates his or her own musical skills too subjectively, so the selected course may not suit the student: either the course is too difficult to improve learning, or it is too simple and musical skills improve too slowly. In either case, the learning effect of the course is poor.

Disclosure of Invention

The invention aims to provide an intelligent music online teaching video recommendation method and system, to solve the problem of poor learning effect caused by students selecting teaching videos through subjective feeling or interest.

Therefore, the adopted technical scheme is as follows:

in a first aspect, the invention provides an intelligent recommendation method for a music online teaching video, comprising the following steps:

step S1: obtaining audio information of a user according to a test singing result of a piece of music selected by the user, and extracting breath information, intonation information and rhythm information of the user according to the audio information;

step S2: inputting the breath information of the user into the trained first neural network, and outputting a grade level corresponding to the breath information of the user, namely a breath grade; inputting the intonation information of the user into the trained second neural network, and outputting the level corresponding to the intonation information of the user, namely the intonation level; inputting the rhythm information of the user into the trained third neural network, and outputting the grade level corresponding to the rhythm information of the user, namely the rhythm grade;

step S3: matching the best recommended music teaching video: selecting the music teaching video adapted to the user's breath level, intonation level, and rhythm level as the best recommended music teaching video, and recommending it to the user.

Preferably, the method further comprises: determining the error tolerance of the music according to the difficulty level of the music the user sings on trial; adjusting the user's breath level, intonation level, and rhythm level from step S2 according to the error tolerance of the trial-sung music, to obtain the user's corrected breath level, corrected intonation level, and corrected rhythm level; and comparing the set breath level, intonation level, and rhythm level of each music teaching video with the user's corrected breath level, corrected intonation level, and corrected rhythm level, and recommending the music teaching video with the smallest comparison difference to the user as the best recommended teaching video.
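The matching step above can be sketched as a nearest-level search: each teaching video carries preset breath/intonation/rhythm levels, and the video whose levels differ least in total from the user's corrected levels is recommended. This is a minimal illustration, not the patent's exact implementation; the function names and the total-absolute-difference metric are assumptions.

```python
# Hypothetical sketch of the level-matching step described above.
def best_video(user_levels, videos):
    """user_levels: (breath, intonation, rhythm) grades; videos: list of
    (video_id, (breath, intonation, rhythm)) with each video's set levels.
    Returns the id of the video with the smallest total level difference."""
    def diff(video_levels):
        # total absolute difference across the three graded aspects
        return sum(abs(u - v) for u, v in zip(user_levels, video_levels))
    return min(videos, key=lambda item: diff(item[1]))[0]

# Example: corrected user levels breath=3, intonation=2, rhythm=4
videos = [("v1", (1, 1, 1)), ("v2", (3, 2, 5)), ("v3", (5, 5, 5))]
print(best_video((3, 2, 4), videos))  # "v2": differs by only 1 level in total
```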

Preferably, the error tolerance is calculated as follows:

where R is the error tolerance and n is the difficulty level of the song selected by the user, n being a set value.

Preferably, in step S1, obtaining the audio information of the user and extracting the breath information, the intonation information, and the rhythm information of the user therefrom comprises: converting the user's trial-singing audio into spectrograms comprising a narrow-band spectrogram and a wide-band spectrogram, wherein the narrow-band spectrogram serves as the user's breath information and intonation information, and the wide-band spectrogram serves as the user's rhythm information.


Preferably, the first neural network W1, the second neural network W2, and the third neural network W3 adopt the same structure, namely a CNN structure comprising an encoder and a classifier; the audio information of the user is input into the encoder, the encoder outputs a feature vector, and finally the classifier classifies the feature vector.

Preferably, the three neural networks are trained in the same manner, wherein the training process of any one neural network comprises the following steps:

A) obtaining training samples, namely the data set of the neural network: for the first neural network or the second neural network, the data set comprises collected narrow-band spectrograms Iz and corresponding base level labels; for the third neural network, the data set comprises wide-band spectrograms Ik and corresponding base level labels;

B) converting each spectrogram in the data set into a one-dimensional feature vector, namely processing the spectrogram's grey scale to obtain grey values of 1 row and n columns as the one-dimensional feature vector; forming a matrix of size [M, w × h] from the feature vectors of all spectrograms, where M is the total number of spectrograms in the current data set and w × h is the size of each spectrogram; and representing each spectrogram by a two-dimensional vector using a data dimension reduction technique;

C) mapping each spectrogram onto a constructed two-dimensional plane according to its two-dimensional vector, where each point on the two-dimensional plane corresponds to one class of spectrogram in the data set, denoted Iz(x, y), with x, y the coordinates corresponding to the spectrogram; setting the pixel value G(x, y) of the corresponding point to the base level label corresponding to the spectrogram, thereby constructing a breath base level distribution map P1;

D) according to the distribution of points in the breath base level distribution map P1, classifying points with the same pixel value into the same cluster and determining L clusters, where L is an integer greater than 1; further determining the connected domain information S and the cluster center within each cluster;

E) determining Thiessen polygons according to the cluster centers: dividing the plane into L Thiessen polygons using the perpendicular bisectors of the lines connecting the cluster centers, each Thiessen polygon corresponding to one cluster;
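A Thiessen (Voronoi) polygon of a cluster center is exactly the set of plane points closer to that center than to any other, so step E)'s partition can be expressed by nearest-center assignment without computing polygon edges explicitly. The sketch below illustrates this equivalence; the center coordinates are made-up values, not data from the patent.

```python
# Minimal sketch: which Thiessen polygon (Voronoi cell) contains a point?
def thiessen_cell(point, centers):
    """Return the index of the cluster center whose Thiessen polygon
    contains `point` (ties broken by the first center found)."""
    def sq_dist(a, b):
        return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2
    return min(range(len(centers)), key=lambda i: sq_dist(point, centers[i]))

centers = [(0.0, 0.0), (10.0, 0.0), (5.0, 8.0)]  # L = 3 cluster centers
print(thiessen_cell((1.0, 1.0), centers))   # 0: inside the first cell
print(thiessen_cell((9.0, 1.0), centers))   # 1: inside the second cell
```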

F) according to the cluster center point Z, the Thiessen polygon T, and the regional connected domain information S corresponding to each cluster in the breath base level distribution map P1, calculating, for every two adjacent Thiessen polygons in P1, the ratio of the sum of the translation distances of their perpendicular bisector to the distance between their center points, as the discrimination between the two base level labels;

G) constructing a triplet loss function Loss according to the discrimination between the base level labels and the triplet sample [A, P, N] formed by the three training samples input each time during training;

H) inputting the training samples of the current batch into the neural network for training, calculating the triplet loss function of step G), and continuously updating the network parameters by gradient descent until training of the neural network is completed.

Preferably, in step C), setting the pixel value G (x, y) of the corresponding point as the base level label corresponding to the spectrogram includes:

the pixel value G (x, y) of the corresponding point is calculated as follows:

G(x, y) = f( (d1 + d2 + … + dC(x,y)) / C(x, y) )

where du is the base level label corresponding to spectrogram u; f(x) is a rounding function, so that the final pixel value is an integer, namely a base level label corresponding to the spectrogram; and C(x, y) is the number of spectrograms corresponding to position (x, y).

Preferably, in step G), the formula for calculating the triplet loss function is as follows:

where Loss is the loss function, n is the number of triplet samples, and F(x) is the feature vector obtained from the network encoder for an input training sample; δ is a hyperparameter with a fixed value; Q is the discrimination between the two base levels to which the base level labels of the positive and negative samples belong. In the triplet sample [A, P, N], A denotes the reference sample, P is a positive sample with the same label as the reference sample, and N is a negative sample with a label different from that of the reference sample; [·]+ denotes comparing the value inside the brackets with zero: if it is greater than zero, the value is unchanged; if it is less than zero, it is set to zero.
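The exact loss formula is not reproduced in the available text, so the sketch below is a standard triplet loss in which, purely as an assumption, the margin is the fixed hyperparameter δ scaled by a discrimination value Q for the positive/negative label pair; the hinge [·]+ behaves as described in the paragraph above. F() here is any encoder output already computed as a plain feature vector.

```python
# Hedged sketch of a triplet loss with a discrimination-scaled margin.
def triplet_loss(triplets, delta, q):
    """triplets: list of (F(A), F(P), F(N)) feature-vector triples;
    delta: fixed margin hyperparameter; q: discrimination between the
    positive and negative base levels (a float here for simplicity).
    Applies the [.]_+ hinge: negative values are clipped to zero."""
    def sq_norm(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    total = 0.0
    for fa, fp, fn in triplets:
        total += max(0.0, sq_norm(fa, fp) - sq_norm(fa, fn) + delta * q)
    return total / len(triplets)

# Anchor close to positive, far from negative -> hinge clips the loss to 0
print(triplet_loss([([0.0, 0.0], [0.1, 0.0], [3.0, 4.0])], delta=1.0, q=0.5))
```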

Preferably, the calculation formula of the discrimination is as follows:

Qi,m = (Δli + Δlm) / ||ZiZm||2

where ||ZiZm||2 is the Euclidean distance between the cluster center points Zi and Zm, i ≠ m, 1 ≤ i ≤ 5, 1 ≤ m ≤ 5; Qi,m is the discrimination between category i and category m, with value range [0, 1]; and Δli + Δlm is the sum of the translation distances of the perpendicular bisector between the two adjacent Thiessen polygons.
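The discrimination is described in the text as the ratio of the summed perpendicular-bisector translation distances to the Euclidean distance between the two cluster centers. A minimal numeric sketch of that ratio, with made-up center coordinates and translation distances:

```python
import math

def discrimination(z_i, z_m, dl_i, dl_m):
    """Q_{i,m} = (dl_i + dl_m) / ||Z_i - Z_m||_2, expected to lie in [0, 1]."""
    center_dist = math.hypot(z_i[0] - z_m[0], z_i[1] - z_m[1])
    return (dl_i + dl_m) / center_dist

# Centers 5 apart, bisector translated by 1.0 + 1.5 -> Q = 2.5 / 5 = 0.5
print(discrimination((0.0, 0.0), (3.0, 4.0), 1.0, 1.5))  # 0.5
```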

In a second aspect, the present invention provides an intelligent recommendation system for online teaching videos, comprising:

a memory, a processor, and a computer program stored in the memory and running on the processor, wherein the processor is coupled to the memory; when executing the computer program, the processor implements the above intelligent teaching video recommendation method.

The invention has the following beneficial effects:

The method takes into account that different people have different singing levels and that every user's singing has its own weak points. It therefore uses the user's trial-singing audio to extract breath information, intonation information, and rhythm information that characterize the user's singing level, and uses them respectively as the input of a first neural network for breath grade classification, a second neural network for intonation grade classification, and a third neural network for rhythm grade classification, which output the user's grades in these three aspects. Finally, a music teaching video matched in grade is recommended to the user according to these three grades, realizing intelligent recommendation of music teaching videos, replacing the traditional way in which a student selects teaching videos by subjective feeling or interest, and achieving the purpose of improving the student's learning effect.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions of the prior art, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. It is obvious that the drawings in the following description are only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.

Fig. 1 is a flowchart of a method for intelligently recommending a video for online music teaching in embodiment 1 of the present invention;

fig. 2 is a schematic view of a thiessen polygon in embodiment 1 of the present invention.

Detailed Description

The embodiments provided by the invention are specifically described below with reference to the accompanying drawings.

Example 1:

The invention discloses an intelligent music online teaching video recommendation method, whose main purpose is to recommend music teaching videos intelligently according to the user's singing level. The inventive concept is as follows: first train three neural networks that can grade the user's breath, intonation, and rhythm respectively; then obtain a piece of audio sung by the user on trial and extract from it a narrow-band spectrogram, which characterizes the user's breath level and intonation level, and a wide-band spectrogram, which characterizes the user's rhythm level; substitute the narrow-band spectrogram into the two neural networks that grade breath and intonation respectively, obtaining the user's breath and intonation grades; substitute the wide-band spectrogram into the neural network that grades rhythm, obtaining the user's rhythm grade; and finally match the best recommended music teaching video, namely select the music teaching video adapted to the user's breath, intonation, and rhythm grades as the best recommended music teaching video and recommend it to the user.

Specifically, as shown in fig. 1, the method includes the following specific steps:

step S1: obtaining audio information of a user according to a test singing result of a piece of music selected by the user, and extracting breath information, intonation information and rhythm information of the user according to the audio information; and determining the error tolerance of the music according to the music difficulty level of the user for the test singing.

In this step, the user's breath information and intonation information are both the narrow-band spectrogram of the user's trial-singing audio, and the user's rhythm information is the wide-band spectrogram of the trial-singing audio.

The user's trial-singing audio can be converted into spectrograms, which are divided into narrow-band and wide-band spectrograms. The narrow-band spectrogram is obtained by processing the audio information with a long time window, so its frequency resolution is high; the wide-band spectrogram is obtained by processing the audio information with a short time window, so its time resolution is high. The narrow-band and wide-band spectrograms obtained from the user's audio information are denoted Iz and Ik respectively. Since the method for obtaining a spectrogram is a known technique, it is not described further here.

In this step, the wide-band spectrogram is obtained by framing with a short time window of a first framing duration (about 3 ms); the narrow-band spectrogram is obtained by framing with a time window of a second framing duration (about 20 ms).
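The window-length trade-off above can be illustrated with a magnitude STFT computed twice on the same signal: a short (~3 ms) window gives the wide-band spectrogram with fine time resolution, a long (~20 ms) window gives the narrow-band one with fine frequency resolution. This is an illustrative sketch, not the patent's exact pipeline; the 8 kHz sample rate, hop sizes, and Hann window are assumptions.

```python
import numpy as np

def spectrogram(signal, frame_len, hop):
    """Magnitude STFT: Hann-windowed frames of length frame_len, hop apart."""
    window = np.hanning(frame_len)
    frames = [signal[i:i + frame_len] * window
              for i in range(0, len(signal) - frame_len + 1, hop)]
    return np.abs(np.fft.rfft(np.array(frames), axis=1))

fs = 8000
t = np.arange(fs) / fs
audio = np.sin(2 * np.pi * 440 * t)                 # 1 s of a 440 Hz tone
wide = spectrogram(audio, frame_len=24, hop=12)     # ~3 ms window
narrow = spectrogram(audio, frame_len=160, hop=80)  # ~20 ms window
# The longer window yields more frequency bins, i.e. finer frequency resolution
print(wide.shape[1], narrow.shape[1])  # 13 81
```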

In this step, the user selects a song from the system's database to sing on trial. The trial songs in the database have different difficulty coefficients, and the greater the difficulty coefficient, the harder the song's intonation, breath, and rhythm are to control. Therefore, different error tolerances R are first set for songs with different difficulty coefficients; the greater the difficulty coefficient, the greater the tolerance. The formula for the tolerance is:

where n is the difficulty level of the song selected by the user and is a set value; the tolerance R is used in the subsequent music teaching video recommendation process.

Step S2: inputting the breath information of the user into the trained first neural network, and outputting a grade level (breath grade) corresponding to the breath information of the user; inputting the intonation information of the user into the trained second neural network, and outputting the level (intonation level) corresponding to the intonation information of the user; similarly, the rhythm information of the user is input into the trained third neural network, and the grade level (rhythm grade) corresponding to the rhythm information of the user is output.

In this step, the first neural network W1, the second neural network W2, and the third neural network W3 adopt the same structure, namely a CNN structure comprising an encoder and a classifier; the audio information of the user is input into the encoder, the encoder outputs a feature vector, and finally the classifier classifies the feature vector.

In this step, a person's intonation and breath are mainly reflected in frequency characteristics, so the narrow-band spectrogram Iz is used as the feature for judging the breath level and intonation level. Rhythm is mainly reflected in the temporal variation characteristics of the audio, so the wide-band spectrogram Ik is used as the feature for judging the rhythm level.

Because the three neural networks are trained in the same way, the invention takes the breath base level judging neural network W1 as an example to describe the training process in detail:

A) obtaining training samples, namely the data set of neural network W1, which comprises the collected narrow-band spectrograms Iz and corresponding base level labels.

In this step, the base level labels are set manually: the corresponding intonation and breath base level labels are determined from the narrow-band spectrogram Iz, and the corresponding rhythm base level label is determined from the wide-band spectrogram Ik. The base level of each aspect is divided into 5 grades, with a higher grade indicating a better foundation. In this way, data sets for breath, intonation, and rhythm are obtained: the breath data set is the data set of neural network W1, the intonation data set is the data set of neural network W2, and the rhythm data set is the data set of neural network W3.

The acquisition of the latter two data sets is described here to show that the data sets of all three neural networks can be acquired at one time for training; the latter two data sets do not participate in the training of neural network W1.

B) Let the total number of narrow-band spectrograms Iz in the data set be M. Each spectrogram is converted into a one-dimensional feature vector: the spectrogram is processed as a grey-scale image, the grey values of all its pixels are taken in sequence, and grey values of 1 row and n columns are obtained as the one-dimensional feature vector.

The feature vectors of all spectrograms form a matrix of size [M, w × h], where M is the number of spectrograms, w × h is the size of each spectrogram, and n = w × h. Then the [M, w × h] matrix is reduced to [M, 2] using a data dimension reduction technique, so that each spectrogram is represented by a 2-dimensional vector, namely a two-dimensional vector. The data dimension reduction technique is a known technique and can be implemented with an existing PCA algorithm or a self-encoding network, so it is not described further here.
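The flatten-then-reduce step can be sketched with SVD-based PCA: each w × h grey-scale spectrogram becomes one row of an [M, w*h] matrix, which is centered and projected onto the top two principal components. A minimal sketch assuming random arrays stand in for real spectrogram data:

```python
import numpy as np

def reduce_to_2d(spectrograms):
    """spectrograms: array of shape [M, w, h]. Returns [M, 2] PCA scores."""
    m = spectrograms.shape[0]
    flat = spectrograms.reshape(m, -1).astype(float)  # [M, w*h] matrix
    centered = flat - flat.mean(axis=0)               # center each column
    # SVD-based PCA: rows of vt are principal directions
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T                        # project to [M, 2]

rng = np.random.default_rng(0)
specs = rng.random((6, 4, 5))        # M=6 placeholder spectrograms, w=4, h=5
print(reduce_to_2d(specs).shape)     # (6, 2)
```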

In this step, the distribution of the two-dimensional vectors after dimension reduction follows certain rules; for example, two-dimensional vectors belonging to the same manually calibrated base level label cluster together. Moreover, if the two-dimensional vector of each spectrogram is plotted in an xy coordinate system, the x and y coordinates correspond to the two elements of the vector. When the manually calibrated base level labels contain some error, two-dimensional vectors with different manually calibrated labels may fall on the same point of the xy coordinate system; for example, a point may correspond both to a base level label of 3 and to a base level label of 4.

C) Construct a two-dimensional plane and map each spectrogram onto it according to its two-dimensional vector, so that each point on the plane corresponds to one class of spectrogram in the data set, denoted Iz(x, y). Set the value of the corresponding point (the pixel value G(x, y)) to the basic level label of that spectrogram, thereby constructing the breath basic level distribution map P1.

As noted in the steps above, because the basic level labels of the spectrograms are manually annotated, labeling errors caused by overly subjective judgment are unavoidable. Therefore, when constructing the breath basic level distribution map P1, each position in P1 may correspond to several spectrograms; let the number of spectrograms at position (x, y) be C(x, y). The pixel value G(x, y) of P1 at (x, y) is then computed as:

G(x, y) = f( (1 / C(x, y)) · Σu du )

where du is the basic level label of spectrogram u; f(X) is a rounding function, so that the final pixel value is an integer; and C(x, y) is the number of spectrograms at position (x, y). Treating the spectrograms at every position in the same way yields the breath basic level distribution map P1, in which each position represents one class of spectrogram Iz(x, y) and the corresponding pixel value is the basic level of that class.

For example, if the number of spectrograms at some position (x, y) is C(x, y) = 2, with u = 1, 2 and d1 = 3, d2 = 4, then G(x, y) = f((3 + 4)/2) = 4, and the basic level label of these two spectrograms is recorded as 4.
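The rounded-mean rule above can be sketched in a few lines (the function name is illustrative; "round half up" is assumed so that 3.5 maps to 4 as in the example):

```python
def pixel_level(labels):
    """Pixel value G(x, y) of the distribution map P1: the rounded
    mean of the basic level labels falling on one position (x, y)."""
    mean = sum(labels) / len(labels)
    return int(mean + 0.5)  # round half up, so 3.5 -> 4

# the example from the text: two spectrograms labeled 3 and 4
g = pixel_level([3, 4])  # -> 4
```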

D) Analyze the obtained breath basic level distribution map P1: group points with the same pixel value into the same category, yielding 5 clusters, each corresponding to one basic level, and obtain the connected domain information S of each cluster (i.e., S in Fig. 2, which represents the set of points within a cluster). The cluster center of each cluster is then obtained from the coordinates of the sample points of all spectrograms in the same cluster and the count of each spectrogram class. For example, the cluster center Z1 of category 1 is computed as:

Z1 = Σ(xv, yv)∈J1 C(xv, yv) · (xv, yv) / Σ(xv, yv)∈J1 C(xv, yv)

where J1 denotes the cluster whose label category is 1; (xv, yv) is the position of a pixel within cluster J1; C(xv, yv) is the number of spectrograms Iz(xv, yv) in the data set at that position; and Z1 is the center of cluster J1. The cluster center of each category is obtained in the same way and recorded as Z1, Z2, Z3, Z4, Z5; these five cluster centers are shown as the five dots in Fig. 2, where the numbers 1-5 correspond to the five centers Z1, Z2, Z3, Z4, Z5 respectively.
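The weighted centroid above (reconstructed from the surrounding definitions; the function name and toy cluster are illustrative) can be sketched as:

```python
import numpy as np

def cluster_center(points, counts):
    """Weighted centroid of one cluster: each pixel position (xv, yv)
    is weighted by C(xv, yv), the number of spectrograms mapped there."""
    pts = np.asarray(points, dtype=float)    # [K, 2] positions in the cluster
    w = np.asarray(counts, dtype=float)      # [K] spectrogram counts per position
    return (pts * w[:, None]).sum(axis=0) / w.sum()

# toy cluster: three positions holding 2, 1 and 1 spectrograms
center = cluster_center([(0, 0), (2, 0), (0, 2)], [2, 1, 1])  # -> (0.5, 0.5)
```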

E) Determine the Thiessen polygons from the 5 cluster centers, dividing map P1 into 5 regions, with each cluster center corresponding to one Thiessen polygon. In this step, the Thiessen polygons are obtained from the perpendicular bisectors of the lines connecting the centers: every point inside a Thiessen polygon region is closest to that region's cluster center, and a point on a Thiessen polygon edge is equidistant from the two adjacent cluster centers. As shown in Fig. 2, T denotes the Thiessen polygon of the category-1 cluster center Z1.
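Since every point of a Thiessen (Voronoi) cell is by definition nearest to that cell's center, the partition of the discrete map can be sketched by nearest-center labeling (the function name and grid size are illustrative):

```python
import numpy as np

def thiessen_partition(centers, width, height):
    """Label every pixel of a width x height plane with the index of
    its nearest cluster center; the equal-label regions are exactly
    the Thiessen (Voronoi) cells of those centers."""
    ys, xs = np.mgrid[0:height, 0:width]
    grid = np.stack([xs.ravel(), ys.ravel()], axis=1).astype(float)  # [(w*h), 2]
    c = np.asarray(centers, dtype=float)                             # [k, 2]
    d2 = ((grid[:, None, :] - c[None, :, :]) ** 2).sum(axis=2)       # squared distances
    return d2.argmin(axis=1).reshape(height, width)

# two centers on a 10x10 plane; labels[y, x] gives the cell index
labels = thiessen_partition([(1, 1), (8, 8)], 10, 10)
```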

F) Using the cluster center Z, the Thiessen polygon T, and the regional connected domain information S of each cluster in the breath basic level distribution map P1, compute for every two adjacent Thiessen polygons in P1 the ratio of the sum of the translation distances of the perpendicular bisector between them to the distance between their center points, and take this ratio as the discrimination between the two categories (i.e., basic level labels).

Taking basic level 1 as an example, the following process describes how to compute the discrimination between basic level 1 and its adjacent basic levels (i.e., basic levels 2, 3 and 5) in Fig. 2:

1) Analyze the cluster J1 of basic level 1 to obtain all edges of its Thiessen polygon T1 and the basic level label of the Thiessen polygon adjacent to each edge. For ease of notation, denote the m-th edge of T1 as Bm, and the level label of the Thiessen polygon in the region adjacent to that edge as dm.

2) Translate edge Bm along its perpendicular (normal) direction toward each of the two adjacent clusters; record the translation distances Δl1 and Δlm at which the translated line becomes tangent to the connected domain information S1 and Sm, respectively. The discrimination Q1,m between category 1 and category m is then computed as:

Q1,m = (Δl1 + Δlm) / ||Z1Zm||2

where ||Z1Zm||2 is the Euclidean distance between the cluster centers Z1 and Zm. Q1,m, the discrimination between category 1 and category m, takes values in [0, 1]; a larger value indicates that the two categories are easier to distinguish. According to Fig. 2, m = 2, 3, 5.

Note that if the Thiessen polygons of two categories are not adjacent, the discrimination between those two categories is set to 1. The discrimination between all pairs of categories (basic level labels) is obtained in the same way.
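Given the two tangent translation distances and the two cluster centers, the discrimination ratio can be sketched as follows (the function name and toy geometry are illustrative; the clip to 1.0 reflects the stated [0, 1] range):

```python
import numpy as np

def discrimination(dl_1, dl_m, center_1, center_m):
    """Q between two adjacent categories: summed tangent translation
    distances of their shared bisector edge, divided by the Euclidean
    distance between the two cluster centers, clipped to [0, 1]."""
    dist = float(np.linalg.norm(np.asarray(center_1, float) - np.asarray(center_m, float)))
    return min((dl_1 + dl_m) / dist, 1.0)

# toy values: the two connected domains sit 2 and 3 units from the
# bisector, centers are 10 apart -> Q = (2 + 3) / 10 = 0.5
q = discrimination(2.0, 3.0, (0.0, 0.0), (10.0, 0.0))
```

Non-adjacent category pairs are simply assigned Q = 1 outside this function, as stated in the text.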

G) Construct a triplet loss function Loss from the discrimination between basic level labels and the triplet sample [A, P, N] formed by the three training samples input at each step of training; when training on the current triplet sample, the discrimination between basic level labels is used to adjust the hyper-parameter in the triplet loss function.

In the triplet sample [A, P, N], A is the anchor (reference) sample, a spectrogram of some basic level label (1-5) randomly selected from the full sample set; P is a positive sample with the same label as the anchor, and N is a negative sample with a label different from the anchor's. The method for constructing triplet samples is a known technique and is not described further here.

In this step, the triplet loss function is computed as:

Loss = Σi=1..n [ ||f(Ai) − f(Pi)||2² − ||f(Ai) − f(Ni)||2² + δ / Q(dPi, dNi) ]+

where n is the number of triplet samples in one batch (for example, 90 samples in total, divided into three batches, each batch containing 10 triplet samples, each triplet containing 3 samples); f(x) is the feature vector produced by the network encoder for an input training sample; δ is a hyper-parameter with an initial value of 1, used to control the difference between the anchor-positive distance ||f(Ai) − f(Pi)||2² and the anchor-negative distance ||f(Ai) − f(Ni)||2²; dPi and dNi are the basic level labels of the positive and negative samples; Q(dPi, dNi) is the discrimination between these two basic levels, used to adjust the hyper-parameter δ — one discrimination value is found for each triplet i, different triplets having different values, as computed in step F). The subscript + means that the bracketed value is compared with zero: if it is greater than zero it is kept unchanged; if it is less than zero, it is set to zero.

The significance of adding the discrimination to the above formula is that it amounts to adjusting the size of the hyper-parameter with a variable: δ / Q(dPi, dNi) acts as a modified hyper-parameter. The higher the discrimination, the better the classification effect that can be obtained between the two classes without extensive training, so δ / Q becomes small overall, which lowers the training difficulty of the network and speeds up training. The smaller the discrimination, the poorer the classification effect between the two categories; to make the neural network pay more attention to the classification between them, δ / Q becomes large overall, ensuring classification accuracy between basic levels with lower discrimination.
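Under the reading reconstructed above — the margin δ scaled by 1/Q per triplet — a minimal numpy sketch of the batch loss is (function name and toy embeddings are illustrative):

```python
import numpy as np

def triplet_loss(f_a, f_p, f_n, q, delta=1.0):
    """Hinge triplet loss over a batch, with the margin delta divided
    by Q (the discrimination between the positive and negative
    samples' basic levels) for every triplet."""
    d_ap = ((f_a - f_p) ** 2).sum(axis=1)   # anchor-positive squared distance
    d_an = ((f_a - f_n) ** 2).sum(axis=1)   # anchor-negative squared distance
    margin = delta / np.asarray(q, float)   # per-triplet adjusted margin
    return float(np.maximum(d_ap - d_an + margin, 0.0).sum())

# toy batch of 2 triplets with 2-d encoder outputs
f_a = np.array([[0.0, 0.0], [1.0, 1.0]])
f_p = np.array([[0.1, 0.0], [1.0, 1.2]])
f_n = np.array([[2.0, 0.0], [1.0, 1.1]])
loss = triplet_loss(f_a, f_p, f_n, q=[0.5, 1.0], delta=1.0)
```

High-discrimination pairs (Q near 1) contribute the plain margin δ, while low-discrimination pairs get a larger effective margin, matching the behavior described in the text.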

H) Input the current batch of training samples into the neural network, train it, compute the triplet loss function of step G), and continuously update the network parameters with gradient descent, completing the training of neural network W1.

Step S3: according to the error tolerance of the user to sing the music in the step S1, the breath level, the intonation level and the rhythm level of the user in the step S2 are adjusted, and the corrected breath level, the corrected intonation level and the corrected rhythm level of the user are obtained through adjustment; and comparing the set breath level, intonation level and rhythm level of each music teaching video with the actual breath level, actual intonation level and actual rhythm level of the user, and recommending the music teaching video with the minimum comparison difference to the user as the best recommended teaching video.

The specific process is as follows:

(1) Convert the user audio information collected in step S1 into a narrow-band spectrogram and a wide-band spectrogram, and input them into the neural networks W1, W2, W3 to obtain the user's basic levels in breath, intonation and rhythm, i.e., the user's breath level, intonation level and rhythm level, which together form the user's feature vector H, a feature vector of 1 row and 3 columns.

(2) Obtain the level labels of the music teaching videos, including each video's breath level, intonation level and rhythm level. In this step, the level labels of a music teaching video are determined as follows:

the method comprises the steps that artificial voting is conducted on the most suitable basic levels of the existing music teaching videos on the internet from the aspects of breath, intonation and rhythm, voting personnel are music professionals, each video obtains the mode of the suitable level in different aspects as a final level label, and the feature vector SP of the video on each music line is obtained and is also the feature vector of 1 line and 3 columns.

(3) Considering that the songs sung by users differ in difficulty, the optimal matching result is corrected according to the error tolerance R obtained in step S1, completing the intelligent recommendation of online music teaching videos. The correction formula is:

min{|∑i,j((1+R)H(i,j)-SP(i,j))|}

where H(i,j) denotes the element in row i, column j of the user feature matrix H, with i = 1 and j = 1, 2, 3; SP(i,j) denotes the feature vector of a music teaching video; and Σi,j((1+R)H(i,j) − SP(i,j)) is the total difference between the corrected user feature vector and a video's feature vector. If there are k candidate music teaching videos, there are k such total differences, and min selects the minimum among them: the music teaching video corresponding to the smallest total difference is the best recommended teaching video. An optimal matching result is thus obtained from this objective function, completing the recommendation of online music teaching videos.
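The matching step can be sketched in a few lines (the function name and toy level vectors are illustrative):

```python
def best_video(H, videos, R):
    """Index of the best recommended video: the candidate whose level
    vector SP minimizes |sum_j((1 + R) * H[j] - SP[j])|."""
    def total_diff(sp):
        return abs(sum((1.0 + R) * h - s for h, s in zip(H, sp)))
    return min(range(len(videos)), key=lambda k: total_diff(videos[k]))

# user levels [breath, intonation, rhythm] and three candidate videos
H = [3, 2, 4]
videos = [[3, 3, 3], [4, 2, 5], [2, 2, 2]]
idx = best_video(H, videos, R=0.1)  # corrected sum 9.9 is closest to video 0's sum 9
```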

Example 2:

This embodiment provides an intelligent music online teaching video recommendation method that differs from embodiment 1 in that the error tolerance of the user's test singing need not be calculated and the user's breath level, intonation level and rhythm level from step S2 need not be corrected: those levels are compared directly with the levels of the candidate music teaching videos, which still yields a reasonable video recommendation result.

Example 3:

This embodiment provides an intelligent recommendation system for online music teaching videos, comprising a memory, a processor, and a computer program stored on the memory and running on the processor, the processor being coupled to the memory; when the processor executes the computer program, it implements the intelligent teaching video recommendation method of embodiment 1.

It should be noted that the above description covers only preferred embodiments of the present invention and is not intended to limit it; any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included within its scope.
