Artificial intelligence composition method and system

Document No.: 211044 · Published: 2021-11-05

Reading note: this technology, "An artificial intelligence composition method and system" (一种人工智能作曲方法和系统), was created by Zhu Chunlin, Tian Xuhang, Liao Yong, and Xia Xiongjun on 2021-07-07. The invention discloses an artificial intelligence composition method and system in the technical field of computer applications. The method comprises: acquiring music information from a training set and saving it in piano roll format, the music information comprising notes, pitches, start times, and note durations; cleaning the music information, segmenting it into four-bar phrases, and deleting notes outside the pitch range; jointly encoding the current music information and the music information input at the previous time step into corresponding latent information, stored in a latent space; extracting low-dimensional music feature information from the latent space to generate new music information, screening it, and outputting it by track to realize intelligent composition; comparing the generated music with real music to guide the next round of generation; and, after multiple rounds of training, once the output stabilizes, taking the music generated by the music generation module as the output of the instance.

1. An artificial intelligence composition method, characterized in that the method comprises the following steps:

step one, a preprocessing module is constructed, a training set containing a plurality of music files is input into the preprocessing module to obtain the music information in the music files of the training set, and the music information is stored in piano roll format; the music information comprises notes, pitches, start times, and durations of the notes;

step two, the music information is cleaned by the preprocessing module, segmented into music segments within a preset length range, and notes exceeding a preset pitch range are deleted;

step three, a data conversion module is constructed; the current music information and the music information input at the previous time step are jointly encoded into corresponding latent information, and the latent information is stored in a latent space;

step four, a music generation module is constructed; low-dimensional information in the latent space is extracted, new music information is generated, and the music information is screened and stored as music of different tracks and different instruments, realizing automatic composition; the data conversion module and the music generation module form a variational autoencoder, with the data conversion module as the encoder network of the variational autoencoder and the music generation module as its decoder network;

step five, a music evaluation module is constructed and a reward function is set; the music output by the music generation module is compared with real music, guiding the music generation module in the next round of generation; the music generation module and the music evaluation module form a GAN neural network, with the music generation module as the generator of the GAN neural network and the music evaluation module as its discriminator;

step six, after the music generation module has been trained for multiple rounds and the output effect is stable, the music generated by the music generation module constitutes the intelligent composition output.

2. The artificial intelligence composition method of claim 1, wherein: the data conversion module consists of a plurality of single-layer bidirectional GRU networks, and the single-layer bidirectional GRU network of each time-series segment simultaneously transmits parameters to, and receives parameters from, the previous and next time series; after all music feature information has been extracted, the single-layer bidirectional GRU networks of the first and last time series encode it into corresponding latent information and store it in the latent space; the latent information is the music feature information extracted by the data conversion module, and the latent space is the set of all variables storing the latent information, being one or more one-dimensional arrays.

3. The artificial intelligence composition method of claim 1, wherein: the objective function of the GAN neural network is:

min_G max_D V(D, G) = E_{x~pdata(x)}[log D(x)] + E_{z~pz(z)}[log(1 − D(G(z)))] + λ·Ω

wherein D denotes the music evaluation function and G the music generation function; x denotes the real data input, and E_{x~pdata(x)} denotes sampling x from the distribution pdata; data denotes the real data, and pdata(x) denotes the distribution of the real data; z denotes the noise data, pz is the distribution obeyed by the noise data, and pg is the distribution obeyed by the generated data; D(x) denotes the judgment for x when x obeys the pdata distribution, output as a value with maximum 1 and minimum 0; λ is the parameter of the penalty term Ω.

4. The artificial intelligence composition method of claim 1, wherein: the music generation module is composed of a hierarchical GRU network, structured as one layer of U GRU networks and one layer of U×n GRU networks; the GRU neural network is provided with two gate control units, an update gate and a reset gate;

the constituent functions are as follows:

z_t = σ(W_z·[h_{t−1}, x_t])

r_t = σ(W_r·[h_{t−1}, x_t])

h̃_t = tanh(W_h·[r_t ⊙ h_{t−1}, x_t])

h_t = (1 − z_t) ⊙ h_{t−1} + z_t ⊙ h̃_t

y_t = σ(W_o·h_t)

wherein z_t denotes the update gate at time t, r_t the reset gate at time t, h̃_t the candidate activation state at time t, h_t the activation state at time t, h_{t−1} the hidden-layer state at time t−1, and x_t the input at time t; σ is an activation function; W_r, W_z, W_h, W_o are all weight parameters to be learned. The update gate z is determined by how much historical information the current state must forget and how much new information it accepts; the reset gate r is determined by how much information the candidate state draws from the historical information. The update gate controls how much information from the previous moment is carried into the current state: the larger its value, the more information is carried over. The reset gate controls how much information of the previous state is written into the current state: the larger the reset gate, the more information is written in.

5. The artificial intelligence composition method of claim 1, wherein: the music evaluation module comprises x GRU networks, where x depends on the length of the music information sequence input by the music generation module: x is the length of the input music information sequence divided by a, where a is the set time-resolution width of each bar; a reward function is specified before training and is set as follows: if the interval difference between two adjacent notes is greater than a preset scale the pair is marked 0, otherwise 1, and g(x) is the average value over each piece of music; namely, if n pairs of adjacent notes in a piece of music have an interval difference greater than the preset scale and the remaining m pairs are smaller than or equal to the preset scale, this is expressed by the following formula:

g(x) = m / (n + m)

the music evaluation module takes the music sequence output by the music generation module as input, judges whether it is real or generated music by comparison with the real music sequence, and determines whether to impose a penalty on the music generation module according to the judgment result.

6. The artificial intelligence composition method of claim 5, wherein a is 96.

7. The artificial intelligence composition method of claim 5, wherein the preset scale is set to 84.

8. An artificial intelligence composition system, characterized by comprising a preprocessing module, a data conversion module, a music generation module, and a music evaluation module;

the preprocessing module is used for extracting the music information in the music file, storing the music information in piano roll format, cleaning the music information, segmenting it into music segments within a preset length range, and deleting notes exceeding a preset pitch range;

the data conversion module is used for jointly encoding the current music information and the music information input at the previous time step into corresponding latent information and storing the latent information in a latent space;

the music generation module is used for extracting low-dimensional information from the latent space, generating new music information, screening the music information, and storing it as music of different tracks and different instruments, realizing automatic composition;

the music evaluation module is used for setting a reward function, comparing the music output by the music generation module with the real music and guiding the music generation module to generate the music in the next round;

the data conversion module and the music generation module form a variational autoencoder; the data conversion module serves as the encoder network of the variational autoencoder, and the music generation module serves as its decoder network;

the music generation module and the music evaluation module form a GAN neural network, the music generation module is used as a generator of the GAN neural network, and the music evaluation module is used as a discriminator of the GAN neural network.

Technical Field

The invention relates to the technical field of computer applications, and in particular to an artificial intelligence composition method and system.

Background

Music undoubtedly holds a special place in today's society and is an essential part of everyday life. With the development and popularization of computer-related technologies, the field of computer music has emerged. The use of neural networks for intelligent composition, as a new direction in the field of computer music, has also received great attention from researchers and commercial companies.

The existing neural network composition methods are mainly implemented on the basis of the Recurrent Neural Network (RNN), the Variational Autoencoder (VAE), or the Generative Adversarial Network (GAN). However, computer composition methods based on a single type of neural network are only suitable for short pieces; each type of network has typical defects, and composition efficiency decreases exponentially as the length of the music increases. The recurrent neural network suffers from vanishing and exploding gradients, and the music it generates lacks regularity and coherence. The music generated by the generative adversarial network has poor audibility and unstable quality. When the variational autoencoder generates multi-scale or long-sequence music, generation efficiency is low and the audibility of the generated music is poor.

Glossary:

Freeze mechanism: a training mechanism used in a GAN network. When the generator or discriminator is trained to be abnormally strong, so that the other side's gradient vanishes and training cannot proceed normally, the overly strong side is frozen.

KL divergence: also known as relative entropy. If P(x) and Q(x) are two separate probability distributions of the same random variable x, the difference between the two distributions can be measured by the KL divergence (relative entropy).

tanh activation: in a neural network, all inputs are weighted and summed, and a function is then applied to the result; this function is called the activation function. It diversifies the intermediate outputs so that more complex problems can be handled. tanh is one such activation function: the hyperbolic tangent curve, which passes through the point (0, 0). Its formula is:

tanh(x) = (e^x − e^(−x)) / (e^x + e^(−x))

Softmax output layer: softmax is a classifier used in the classification process to implement multi-class classification. It maps the output neurons to real numbers in (0, 1) and normalizes them so that they sum to 1, so the probabilities of the classes also sum to exactly 1. Its formula is:

S_i = e^(V_i) / Σ_{j=1}^{C} e^(V_j)

wherein V_i is the output of the classifier's preceding output unit, i is the class index, and C is the total number of classes; S_i is the ratio of the exponential of the current element to the sum of the exponentials of all elements. The Softmax function converts the output values of multiple classes into relative probabilities.

Disclosure of Invention

Aiming at the defects of the existing neural network composition methods, the invention provides an artificial intelligence composition method and system.

An artificial intelligence composition method comprises the following steps:

step one, a preprocessing module is constructed, a training set containing a plurality of music files is input into the preprocessing module to obtain the music information in the music files of the training set, and the music information is stored in piano roll format; the music information comprises notes, pitches, start times, and durations of the notes;

step two, the music information is cleaned by the preprocessing module, segmented into music segments within a preset length range, and notes exceeding a preset pitch range are deleted;

step three, a data conversion module is constructed; the current music information and the music information input at the previous time step are jointly encoded into corresponding latent information, and the latent information is stored in a latent space;

step four, a music generation module is constructed; low-dimensional information in the latent space is extracted, new music information is generated, and the music information is screened and stored as music of different tracks and different instruments, realizing automatic composition; the data conversion module and the music generation module form a variational autoencoder, with the data conversion module as the encoder network of the variational autoencoder and the music generation module as its decoder network;

step five, a music evaluation module is constructed and a reward function is set; the music output by the music generation module is compared with real music, guiding the music generation module in the next round of generation; the music generation module and the music evaluation module form a GAN neural network, with the music generation module as the generator of the GAN neural network and the music evaluation module as its discriminator;

step six, after the music generation module has been trained for multiple rounds and the output effect is stable, the music generated by the music generation module constitutes the intelligent composition output.

In a further improvement, the data conversion module is composed of a plurality of single-layer bidirectional GRU networks, and the single-layer bidirectional GRU network of each time-series segment simultaneously transmits parameters to, and receives parameters from, the previous and next time series; after all music feature information has been extracted, the single-layer bidirectional GRU networks of the first and last time series encode it into corresponding latent information and store it in the latent space; the latent information is the music feature information extracted by the data conversion module, and the latent space is the set of all variables storing the latent information, being one or more one-dimensional arrays.

In a further improvement, the objective function of the GAN neural network is:

min_G max_D V(D, G) = E_{x~pdata(x)}[log D(x)] + E_{z~pz(z)}[log(1 − D(G(z)))] + λ·Ω

wherein D denotes the music evaluation function and G the music generation function; x denotes the real data input, and E_{x~pdata(x)} denotes sampling x from the distribution pdata; data denotes the real data, and pdata(x) denotes the distribution of the real data; z denotes the noise data, pz is the distribution obeyed by the noise data, and pg is the distribution obeyed by the generated data; D(x) denotes the judgment for x when x obeys the pdata distribution, output as a value with maximum 1 and minimum 0; λ is the parameter of the penalty term Ω.

In a further improvement, the music generation module is composed of a hierarchical GRU network, structured as one layer of U GRU networks and one layer of U×n GRU networks; the GRU neural network is provided with two gate control units, an update gate and a reset gate;

the constituent functions are as follows:

z_t = σ(W_z·[h_{t−1}, x_t])

r_t = σ(W_r·[h_{t−1}, x_t])

h̃_t = tanh(W_h·[r_t ⊙ h_{t−1}, x_t])

h_t = (1 − z_t) ⊙ h_{t−1} + z_t ⊙ h̃_t

y_t = σ(W_o·h_t)

wherein z_t denotes the update gate at time t, r_t the reset gate at time t, h̃_t the candidate activation state at time t, h_t the activation state at time t, h_{t−1} the hidden-layer state at time t−1, and x_t the input at time t; σ is an activation function; W_r, W_z, W_h, W_o are all weight parameters to be learned. The update gate z is determined by how much historical information the current state must forget and how much new information it accepts; the reset gate r is determined by how much information the candidate state draws from the historical information. The update gate controls how much information from the previous moment is carried into the current state: the larger its value, the more information is carried over. The reset gate controls how much information of the previous state is written into the current state: the larger the reset gate, the more information is written in.

In a further improvement, the music evaluation module comprises x GRU networks, where x depends on the length of the music information sequence input by the music generation module: x is the length of the input music information sequence divided by a, where a is the set time-resolution width of each bar. A reward function is specified before training and is set as follows: if the interval difference between two adjacent notes is greater than a preset scale the pair is marked 0, otherwise 1, and g(x) is the average value over each piece of music; namely, if n pairs of adjacent notes in a piece of music have an interval difference greater than the preset scale and the remaining m pairs are smaller than or equal to the preset scale, this is expressed by the following formula:

g(x) = m / (n + m)

the music evaluation module takes the music sequence output by the music generation module as input, judges whether the music sequence is real or generated music by comparing the real music sequence, and determines whether to apply punishment to the music evaluation module according to the judgment result.

In a further improvement, a is 96.

In a further refinement, the preset scale is set to 84.

An artificial intelligence composition system comprises a preprocessing module, a data conversion module, a music generation module and a music evaluation module;

the preprocessing module is used for extracting the music information in the music file, storing it in piano roll format, cleaning it, segmenting it into music segments within a preset length range, and deleting notes exceeding a preset pitch range;

the data conversion module is used for jointly encoding the current music information and the music information input at the previous time step into corresponding latent information and storing the latent information in a latent space;

the music generation module is used for extracting low-dimensional information from the latent space, generating new music information, screening the music information, and storing it as music of different tracks and different instruments, realizing automatic composition;

the music evaluation module is used for setting a reward function, comparing the music output by the music generation module with real music and guiding the music generation module to generate the next round of music;

the data conversion module and the music generation module form a variational autoencoder; the data conversion module serves as the encoder network of the variational autoencoder, and the music generation module serves as its decoder network;

the music generation module and the music evaluation module form a GAN neural network, the music generation module is used as a generator of the GAN neural network, and the music evaluation module is used as a discriminator of the GAN neural network.

Compared with the prior art, the invention has the technical effects that the generated music has strong audibility, long pieces can be generated, and multi-track music can be generated. In this intelligent composition method based on a hybrid neural network, the single-layer bidirectional GRU in the data conversion module comprises GRU blocks, each containing an update gate that judges whether music information is important enough to participate in the next round of training. The user can take part in the creation process by assigning parameters and reward functions to the music generation module and the music evaluation module, generating different types of music from different inputs; control is flexible and the generation effect is rich.

Drawings

The advantages of the invention will be better understood from the following detailed description of the embodiments of the invention. The drawings are only for purposes of further illustrating the invention and are not to be construed as limiting the invention in any way.

Fig. 1 is a flow diagram of an example intelligent composition method based on a hybrid neural network.

FIG. 2 is a block diagram of a hybrid neural network in one example.

FIG. 3 is a flow diagram of the pre-processing module processing MIDI music in one example.

Fig. 4 is a diagram of the internal structure of a single GRU network in one example.

Fig. 5 is a combined structure diagram of a single-layer bidirectional GRU network of the data conversion module in one example.

Fig. 6 is a combined structure diagram of the two-layer unidirectional GRU network of the music generation module in one example.

Fig. 7 is a combined structure diagram of a single-layer bidirectional GRU network of the music evaluation module in one example.

Fig. 8 is a piano roll diagram of music generated by a single RNN network in one example.

Fig. 9 is a piano roll diagram of music generated by the GAN-RNN network in one example.

Fig. 10 is a piano roll diagram of a solo piece generated by the method of the present invention in one example.

Fig. 11 is a piano roll diagram of a three-instrument ensemble piece generated by the method of the present invention in one example.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail below with reference to the accompanying drawings and examples.

The invention relates to an artificial intelligence composition method and system which, as shown in Fig. 1, comprises the following steps:

Step S1: the music information in the MIDI music files of the training set is acquired by the preprocessing module; the music information includes the notes, pitches, start times, and durations of the notes, and is stored in piano roll format. The training set consists of more than three hundred MIDI pieces, which may be of one specific genre or of any mixture of genres.

The most important information in a MIDI file used for the training set covers three aspects: the pitch, start time, and duration of each note. Note-related information is extracted and filtered from the file fields to obtain the training data for the neural network. The converted data is represented as a two-dimensional matrix whose two dimensions represent pitch and time respectively. The time dimension is quantized with the tick (the minimum time unit in a MIDI file) as the time unit, with the sixteenth note as the minimum note unit. The pitch dimension is in units of semitones; its length is determined by the maximum range of the notes in the MIDI data, and notes out of range are ignored. The preprocessing module parses the MIDI data, extracts the note-on and note-off control information from the corresponding fields, subtracts the Delta-Time (a time-difference parameter giving the interval, in ticks, from the previous event to the current event) of the note-on event from that of the note-off event to obtain the Delta-Time of the note's own duration, and converts this note-length Delta-Time into the note's actual duration using the conversion relation between the resolution and Delta-Time. Notes use a representation that contains a note-end mark, and the result can be used directly as the input data set for training the neural network.

Step S2: the music information is cleaned by the preprocessing module, segmented into music segments of suitable length, and notes beyond the pitch range are deleted.

The overall flow of the preprocessing module's handling of MIDI music is shown in Fig. 3. In one example, songs that are not in the key of C or that do not use four-four time are removed by the preprocessing module. For each bar, the width (time resolution) is set to 96 to model common temporal patterns such as triplets and sixteenth notes. The height is set to 84 to cover the pitches from C1 to C8. Thus each data tensor has size 96 × 84 × 5 and is stored as two-dimensional matrices. The value of each element of the matrix is the velocity (volume) of the note at a certain time step. n music sequences are denoted X = {x_0, ..., x_{t−1}, x_t, ..., x_n}, wherein x_{t−1} and x_t are two consecutive pieces of information; n depends on the length of the music information sequence input to the music generation module: n is the length of the input music information sequence divided by 96, where 96 is the set time-resolution width of each bar.
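A minimal preprocessing sketch along these lines (not the patent's code; it assumes the third-party pretty_midi library, an illustrative file name, and approximates the per-bar quantization by slicing a fixed-rate piano roll):

```python
import numpy as np
import pretty_midi

def midi_to_bars(path, fs=96, low=24, high=108, bar_len=96):
    """Convert a MIDI file into a stack of (bar_len x 84) piano-roll bars.

    Pitches low..high-1 cover roughly C1 to C8 (84 semitones);
    matrix values are note velocities, as in the patent's representation.
    """
    pm = pretty_midi.PrettyMIDI(path)
    roll = pm.get_piano_roll(fs=fs)        # shape (128, T), values = velocity
    roll = roll[low:high, :].T             # keep 84 pitches -> (T, 84)
    n_bars = roll.shape[0] // bar_len
    return roll[: n_bars * bar_len].reshape(n_bars, bar_len, 84)

bars = midi_to_bars("example.mid")         # hypothetical file name
print(bars.shape)                          # (number of bars, 96, 84)
```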

Step S3: the data conversion module jointly encodes the current music information and the music information input at the previous time step into corresponding latent information and stores it in the latent space. The latent information is the music feature information extracted by the data conversion module, a special one-dimensional representation of the notes, pitches, start times, durations, and so on; the latent space is the set of all variables storing the latent information, being one or more one-dimensional arrays.

A Recurrent Neural Network (RNN) is a popular neural network model, commonly used for processing sequence data, with remarkable results in the field of natural language processing. Let X_t denote the input at time t, Y_t the output at time t, and M_t the memory at time t; K and L each denote a weight matrix. Then, per formula (1):

M_t = f(K·X_t + L·M_{t−1})    formula (1)

wherein f(·) is an activation function, providing a non-linear mapping that can be used to filter information. In RNN prediction, the memory M_t of the current moment is used to predict, and softmax then gives the probability of each output, as in formula (2):

Y_t = softmax(H·M_t)    formula (2)

wherein H denotes a weight matrix and Y_t the output at time t; softmax(·) yields the probability of the output at time t.
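As a minimal sketch of formulas (1) and (2) (illustrative only; the dimensions and the choice of tanh for f are assumptions, not taken from the patent):

```python
import numpy as np

def softmax(v):
    e = np.exp(v - np.max(v))
    return e / e.sum()

def rnn_step(x_t, m_prev, K, L, H):
    """One RNN step: M_t = f(K.X_t + L.M_{t-1}), Y_t = softmax(H.M_t)."""
    m_t = np.tanh(K @ x_t + L @ m_prev)  # f chosen here as tanh
    y_t = softmax(H @ m_t)
    return m_t, y_t

rng = np.random.default_rng(0)
d_in, d_hid, d_out = 84, 32, 84          # e.g. one piano-roll column in and out
K = rng.normal(size=(d_hid, d_in))
L = rng.normal(size=(d_hid, d_hid))
H = rng.normal(size=(d_out, d_hid))
m, y = rnn_step(rng.normal(size=d_in), np.zeros(d_hid), K, L, H)
```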

A Gated Recurrent Unit recurrent neural network (GRU) is a special RNN structure. It is updated in the same way as an RNN, but by design the GRU can solve the long-term dependency problem of RNN networks.

As shown in Fig. 4, compared with the RNN the GRU adds two gate control units: an update gate and a reset gate. In a further improvement, the music generation module is composed of a hierarchical GRU network, structured as one layer of U GRU networks and one layer of U×n GRU networks;

the constituent functions are as follows:

z_t = σ(W_z·[h_{t−1}, x_t])    formula (3)

r_t = σ(W_r·[h_{t−1}, x_t])    formula (4)

h̃_t = tanh(W_h·[r_t ⊙ h_{t−1}, x_t])

h_t = (1 − z_t) ⊙ h_{t−1} + z_t ⊙ h̃_t

y_t = σ(W_o·h_t)    formula (5)

wherein z_t denotes the update gate at time t, r_t the reset gate at time t, h̃_t the candidate activation state at time t, h_t the activation state at time t, h_{t−1} the hidden-layer state at time t−1, and x_t the input at time t; σ is an activation function; W_r, W_z, W_h, W_o are all weight parameters to be learned. The update gate z is determined by how much historical information the current state must forget and how much new information it accepts; the reset gate r is determined by how much information the candidate state draws from the historical information. The update gate controls how much information from the previous moment is carried into the current state: the larger its value, the more information is carried over. The reset gate controls how much information of the previous state is written into the current state: the larger the reset gate, the more information is written in.
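A minimal NumPy sketch of one GRU step following the constituent functions above (illustrative; the dimensions and random weights are assumptions):

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gru_step(x_t, h_prev, Wz, Wr, Wh, Wo):
    """One GRU step with an update gate and a reset gate."""
    xh = np.concatenate([h_prev, x_t])            # [h_{t-1}, x_t]
    z = sigmoid(Wz @ xh)                          # update gate
    r = sigmoid(Wr @ xh)                          # reset gate
    h_cand = np.tanh(Wh @ np.concatenate([r * h_prev, x_t]))  # candidate state
    h_t = (1 - z) * h_prev + z * h_cand           # new hidden state
    y_t = sigmoid(Wo @ h_t)                       # output
    return h_t, y_t

rng = np.random.default_rng(1)
d_x, d_h = 84, 64
Wz = rng.normal(size=(d_h, d_h + d_x))
Wr = rng.normal(size=(d_h, d_h + d_x))
Wh = rng.normal(size=(d_h, d_h + d_x))
Wo = rng.normal(size=(d_h, d_h))
h, y = gru_step(rng.normal(size=d_x), np.zeros(d_h), Wz, Wr, Wh, Wo)
```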

Music is a time series, and the method uses a GRU network to realize the generation of the music sequence. The gated recurrent unit network has a special sequential memory function: through the control of the two gates, previous information can be linked to the current task, i.e. the next element of the sequence is predicted from the previously generated sequence. As shown in Fig. 5, the data conversion module is composed of multiple single-layer bidirectional GRU networks, and the GRU network encoding each time slice simultaneously transmits parameters to, and receives parameters from, the previous and next time slices. After all music feature information has been extracted, the GRU networks of the first and last time steps encode it into corresponding latent information and store it in the latent space. The latent information is the music feature information extracted by the data conversion module; the latent space is the set of all variables storing the latent information, being one or more one-dimensional arrays.

Step S4: low-dimensional music feature information is extracted from the latent space by the music generation module, new music information is generated and screened, and music is output by track and by instrument, realizing intelligent composition.

A Generative Adversarial Network (GAN) is a deep learning model and one of the most promising methods of recent years for unsupervised learning on complex distributions. The model produces output through the adversarial game between two modules in its framework: the generator (generative model) and the discriminator (discriminative model).

In one example, the music generation module serves as the generator and the music evaluation module serves as the discriminator. The music generation module network extracts low-dimensional information from the latent space and generates high-dimensional music information. The music evaluation module network attempts to distinguish samples drawn from the training data from samples generated by the music generation module. For each sample given by the music generation module, the music evaluation module outputs a probability value indicating the probability that the sample is a true training sample rather than a fake sample drawn from the model.

The training goal of the music evaluation module is to maximize its discrimination accuracy: when a datum is judged to come from the real data it is labeled 1, and when it comes from the generated data it is labeled 0. Opposed to this goal, the training goal of the music generation module is to minimize the discrimination accuracy of the music evaluation module. During training, the capacities of the music generation module and the music evaluation module are not always balanced; it often happens that one of them is trained to be abnormally strong, so that the other's gradient vanishes, which is one of the main reasons for the unstable quality of music composed by GAN networks. In one embodiment, a freeze mechanism is added to mitigate this problem: when one side is too strong for training to proceed normally, the overly strong side is "frozen". The objective function is:

min_G max_D V(D, G) = E_{x~pdata(x)}[log D(x)] + E_{z~pz(z)}[log(1 − D(G(z)))] + λ·Ω

wherein D denotes the music evaluation function and G the music generation function; x denotes the real data input, and E_{x~pdata(x)} denotes sampling x from the distribution pdata; data denotes the real data, and pdata(x) denotes the distribution of the real data; z denotes the noise data, pz is the distribution obeyed by the noise data, and pg is the distribution obeyed by the generated data; D(x) denotes the judgment for x when x obeys the pdata distribution, output as a value with maximum 1 and minimum 0; λ is the parameter of the penalty term Ω.

The music evaluation module learns to classify samples correctly as real or fake; meanwhile, at convergence the samples of the music generation module become indistinguishable from the real data, the probability that the music evaluation module judges correctly is about 50%, the two reach a Nash equilibrium, and the network is considered to have achieved the learning effect.
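A schematic training step with such a freeze mechanism might look as follows (a sketch under assumed PyTorch generator/discriminator models, where the discriminator ends in a sigmoid, and a hypothetical accuracy threshold freeze_at; this is not the patent's code):

```python
import torch

def train_step(gen, disc, real, opt_g, opt_d, z_dim=128, freeze_at=0.95):
    """One adversarial step; freeze the discriminator when it gets too strong."""
    bce = torch.nn.BCELoss()
    b = real.size(0)
    ones, zeros = torch.ones(b, 1), torch.zeros(b, 1)

    # --- discriminator (music evaluation module) update ---
    fake = gen(torch.randn(b, z_dim)).detach()
    d_real, d_fake = disc(real), disc(fake)
    acc = ((d_real > 0.5).float().mean() + (d_fake < 0.5).float().mean()) / 2
    if acc < freeze_at:                      # freeze mechanism: skip update if too strong
        opt_d.zero_grad()
        (bce(d_real, ones) + bce(d_fake, zeros)).backward()
        opt_d.step()

    # --- generator (music generation module) update ---
    opt_g.zero_grad()
    bce(disc(gen(torch.randn(b, z_dim))), ones).backward()
    opt_g.step()
```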

A variational autoencoder (VAE) is a directed model that uses approximate inference and can be trained purely with gradient-based methods. The VAE first samples a latent vector z from the code distribution p(z) and stores it in the latent space. The sample is then passed through the differentiable generator network g(z). Finally, x is sampled from the distribution p(x; g(z)) = p(x | z), where x denotes an original data sample, p(x) is the distribution of the original data samples, z is the latent variable in the latent space, and g(z) denotes the generator output for the latent variable. The VAE consists of an encoder qλ(z | x), which approximates the posterior p(z | x), and a decoder pθ(x | z), which parameterizes the likelihood p(x | z). In practice, the encoder and decoder are parameterized by neural networks with parameters λ and θ respectively. Following the framework of variational inference, the model is trained by maximizing the evidence lower bound, which minimizes the KL divergence between the encoder's approximate posterior qλ(z | x) and the true posterior p(z | x).

As described above, the encoder and decoder networks in the VAE are single-layer bidirectional GRU networks. The encoder uses the sampled latent space z to set the initial state of the decoder GRU (which also serves as the generator in the generative adversarial network), and the decoder automatically generates the output sequence. The model is trained to reconstruct the input sequence and to learn an approximate posterior qλ(z | x) close to the prior p(z).
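As a compact sketch of this objective (reconstruction term plus the KL divergence to the prior, assuming a Gaussian encoder output and inputs normalized to [0, 1]; illustrative only):

```python
import torch

def reparameterize(mu, logvar):
    """Sample z = mu + sigma * eps so gradients flow through the encoder."""
    return mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)

def vae_loss(x, x_recon, mu, logvar, beta=1.0):
    """ELBO-style loss: reconstruction error plus KL(q(z|x) || N(0, I))."""
    recon = torch.nn.functional.binary_cross_entropy(x_recon, x, reduction="sum")
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + beta * kl
```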

As shown in Fig. 6, the music generation module consists of a hierarchical GRU network, composed of one layer of U GRU networks and one layer of U×n GRU networks. The module also serves as the decoder in the variational autoencoder network; it divides the input sequence and the target output sequence S into c non-overlapping subsequences d_c with endpoints i_c, whose relationship is:

S = {d_1, d_2, ..., d_c}    equation (10)

In one example, defining i_{c+1} = T as a special case, an information vector in the latent space is passed through a fully connected layer and then activated to obtain the initial state of the first-layer GRU network. The first layer of the GRU network generates c embedding vectors u = {u_1, u_2, ..., u_c}, where u_c denotes the c-th embedding vector, one for each subsequence. After the first-layer GRU networks have generated the sequence of embedding vectors u, each passes through a shared fully connected layer followed by tanh activation to produce the initial state of the final bottom-layer GRU. The bottom-layer GRU then, through the softmax output layer, recursively generates a sequence of distributions for each subsequence d_c. At each step of the bottom-layer GRU network, the embedding vector u_c of the current first-layer GRU network is concatenated with the previous output to serve as the input.
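A rough sketch of this two-level decoder (a sketch under assumed dimensions; the module and parameter names are hypothetical, not the patent's implementation):

```python
import torch
import torch.nn as nn

class HierarchicalDecoder(nn.Module):
    """First-layer ('conductor') GRU emits one embedding per subsequence;
    a bottom-layer GRU then decodes each subsequence from that embedding."""
    def __init__(self, z_dim=128, emb=64, hid=128, vocab=84, c=4, steps=24):
        super().__init__()
        self.c, self.steps, self.vocab = c, steps, vocab
        self.init_fc = nn.Linear(z_dim, hid)             # latent vector -> initial state
        self.conductor = nn.GRU(1, hid, batch_first=True)
        self.to_emb = nn.Sequential(nn.Linear(hid, emb), nn.Tanh())
        self.emb_to_h = nn.Sequential(nn.Linear(emb, hid), nn.Tanh())  # shared FC + tanh
        self.bottom = nn.GRU(emb + vocab, hid, batch_first=True)
        self.out = nn.Linear(hid, vocab)                 # feeds the softmax output layer

    def forward(self, z):
        b = z.size(0)
        h0 = torch.tanh(self.init_fc(z)).unsqueeze(0)    # first-layer initial state
        conduct, _ = self.conductor(torch.zeros(b, self.c, 1), h0)
        outputs, prev = [], torch.zeros(b, 1, self.vocab)
        for i in range(self.c):                          # one embedding u_c per subsequence
            u = self.to_emb(conduct[:, i])
            h = self.emb_to_h(u).unsqueeze(0)            # bottom-layer initial state
            for _ in range(self.steps):
                x = torch.cat([u.unsqueeze(1), prev], dim=-1)  # u_c joined with previous output
                o, h = self.bottom(x, h)
                prev = torch.softmax(self.out(o), dim=-1)
                outputs.append(prev)
        return torch.cat(outputs, dim=1)                 # (batch, c*steps, vocab)

dec = HierarchicalDecoder()
print(dec(torch.randn(2, 128)).shape)                    # torch.Size([2, 96, 84])
```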

Step S5: the music output by the music generation module is compared with real music by the music evaluation module, guiding the next round of music generation.

The reward function is set according to music-theory rules. Since music generation is stochastic, large intervals may appear between successive notes. A maximum interval between notes can be specified, and the reward is reduced when an interval exceeds the specified maximum. A positive reward is given when notes are consonant or are rests; when they are not chord tones, or are not scale notes, no reward is given. Following the interval constraints of music theory, the reward function suppresses interval jumps of more than a fifth, reduces consecutive large intervals in the same direction, and avoids the same note being repeated consecutively in the generated music.

The reward function is set so that if the interval between two adjacent notes is greater than an octave the pair is marked 0, and otherwise 1; g(x) is the average value over each piece of music. Namely, if n pairs of adjacent notes in a piece have an interval greater than an octave and the remaining m pairs have intervals less than or equal to an octave, this can be expressed by the following formula:

g(x) = m / (n + m)

In one embodiment, as shown in Fig. 7, the music evaluation module consists of multiple single-layer bidirectional GRU networks, with the reward function specified before training begins. The music evaluation module network takes the two-dimensional matrices x_t and x'_t as input, predicts whether they are real or generated music, and influences the next round of training according to the decision, where x_t and x'_t denote the original music sequence and the generated music sequence respectively.

the value of each element of the matrix is the velocity of the note at a certain time step. n music sequences are composed of X ═ X0,...,xt-1,xt,...,xnDenotes wherein xt-1And xtIs two consecutive pieces of information.

Step S6: after multiple rounds of training, when the output effect tends to be stable, the music generated by the music generation module is the output of the instance.

In summary, the invention discloses an artificial intelligence composition method and system. The output of the model is obtained after steps S1 to S6 have been cycled multiple times. During training, the user can adjust the output of the model by specifying the reward function and the input MIDI music.

Specific experiments and result analysis:

To illustrate the effectiveness of the invention, the performance of the method of the invention is compared with a pure-RNN composition method and a GAN-RNN composition method on the Lakh MIDI dataset. The generated piano rolls are shown in Figs. 8-10, for pure RNN, GAN-RNN, and the present method respectively. The music generated by the method and system conforms better to musical rules and, after manual evaluation by questionnaire survey, shows better musicality and audibility than the other two methods. The method of the invention can also generate ensemble music with multiple instruments, as shown in Fig. 11.

Although the present invention has been described in detail with reference to the preferred embodiments, it will be understood by those skilled in the art that various changes, modifications, and equivalents may be made therein without departing from the spirit and scope of the invention.
