Automatic MIDI performance style conversion system based on a recurrent neural network

Document No.: 1244035 | Publication date: 2020-08-18

Note: This technology, "Automatic MIDI performance style conversion system based on a recurrent neural network", was designed and created by Ding Quanlong (丁泉龙), Dai Andong (戴安东), Cao Yan (曹燕) and Wei Gang (韦岗) on 2020-04-21. Abstract: The invention discloses an automatic MIDI performance style conversion system based on a recurrent neural network, comprising a MIDI analysis module, a data preprocessing module, an autoencoder module, a style network module and a music generation module. The MIDI analysis module reads the user's input and merges multi-track MIDI into single-track MIDI to serve as the score; the data preprocessing module extracts note features from the score; the autoencoder module encodes and decodes the note features; the style network module learns the performance style of the score and predicts a velocity vector; the music generation module applies the velocity vector predicted by the style network module to the score and converts it into an expressive piece of music. The automatic performance style conversion system of the invention meets the personalized needs of music enthusiasts, generating music in the performance style the user likes; at the same time, it can serve as a tool for professionals, allowing them to easily incorporate new styles and creative ideas into their composing work.

1. An automatic MIDI performance style conversion system based on a recurrent neural network, characterized in that the system comprises a MIDI analysis module, a data preprocessing module, an autoencoder module, a style network module and a music generation module connected in sequence; wherein:

the MIDI analysis module reads the MIDI input by the user and merges multi-track MIDI into single-track MIDI to serve as the score; the data preprocessing module extracts note features from the score; the autoencoder module encodes and decodes the extracted note features, compressing the surface features of the score and extracting its high-level features; the style network module learns the performance style of the score, predicts velocity vectors and assists composition; the music generation module applies the velocity vector predicted by the style network module to the score and converts the score into an expressive piece of music;

the process by which the system automatically converts the performance style comprises the following steps:

the MIDI input by the user is passed to the MIDI analysis module, which analyzes the type of the input MIDI: if the MIDI contains multiple note tracks, all note tracks except the percussion track are merged; otherwise no processing is performed; the merged MIDI is saved as the score;

a note vector is acquired: the data preprocessing module extracts a note matrix from the score, the note matrix is input to the trained autoencoder module, and the encoder in the note autoencoder outputs the encoded note vector;

the note vector is input to the trained style network modules of the different styles, which predict and generate velocity vectors of different styles;

and the music generation module applies, to the score, the velocity vector generated by the style network module corresponding to the style selected by the user, and converts the score into MIDI in the selected style.

2. The automatic MIDI performance style conversion system according to claim 1, wherein the MIDI analysis module operates as follows:

first, all tracks of the MIDI are traversed and divided into a global track and note tracks, where each note track defines the timbre of one voice; leaving the percussion track aside, all remaining note tracks are merged to serve as the score; if the input MIDI contains only one note track, no processing is needed.

3. The system of claim 1, wherein the data preprocessing module extracts note features from the score, the note features comprising pitch, note onset time, note duration and velocity, and a note matrix and a velocity matrix are built from these features; the total number of time steps of the score is T, measured in ticks, the smallest time unit in MIDI; first, the smallest note value represented by a single time step must be chosen: if 1 time step represents a sixteenth note, then a quarter note spans 4 time steps; the rows of the note matrix represent time steps t in the range 0 to T, the columns represent pitches p in the range 0 to 127, and each entry encodes the state of the note at that time step as a two-dimensional vector of 0s and 1s with three possible states, off, onset and sustained, represented as [0,0], [1,1] and [1,0] respectively; the rows of the velocity matrix likewise represent time steps t in the range 0 to T, its columns represent pitches p in the range 0 to 127, and each entry gives the velocity, between 0 and 127, of the note of the corresponding pitch at that time step.

4. The automatic MIDI performance style conversion system based on a recurrent neural network as claimed in claim 1, wherein the autoencoder module comprises two independent units, a note autoencoder and a velocity autoencoder, which compress the dimensionality of the note matrix and the velocity matrix and extract the high-level features of the score;

each unit is formed by an encoder and a decoder connected in sequence; the note autoencoder consists of an input layer, a hidden layer and an output layer, the hidden layer being a bidirectional recurrent neural network; the note autoencoder takes a note matrix as input and outputs a reconstructed note matrix; the input layer connected to the hidden layer forms the encoder, whose mapping performs the encoding that converts the note matrix into a note vector; the hidden layer connected to the output layer forms the decoder, whose mapping performs the decoding that reconstructs the note vector into a note matrix;

the velocity autoencoder consists of an input layer, a hidden layer and an output layer, the hidden layer being a bidirectional recurrent neural network; the velocity autoencoder takes a velocity matrix as input and outputs a reconstructed velocity matrix; the input layer connected to the hidden layer forms the encoder, whose mapping performs the encoding that converts the velocity matrix into a velocity vector; the hidden layer connected to the output layer forms the decoder, whose mapping performs the decoding that reconstructs the velocity vector into a velocity matrix.

5. The system of claim 4, wherein the autoencoder module is trained by training the note autoencoder and the velocity autoencoder separately, the whole training process being:

step S1, obtain downloaded MIDI data sets of different styles, where each MIDI file carries its own style label: classical, pop, jazz, metal, country, dance and folk; the style labels are encoded as style 0, style 1, ..., style N-1, where N is the number of styles; traverse all note tracks except the percussion tracks of every MIDI file and count its distinct velocity values; delete the MIDI files with fewer than 45 distinct velocity values; transpose all remaining MIDI files to expand the data set so that all possible keys are covered, the style label encoding staying unchanged before and after transposition; this yields the MIDI data sets of different styles used for training;

step S2, for the MIDI data set of each style, input all its MIDI files in turn to the MIDI analysis module and the data preprocessing module, and append the generated note matrices and velocity matrices to a note matrix list and a velocity matrix list respectively; after all the style data sets have been processed, sample feature sets of the different styles are obtained, each comprising a style label code, a note matrix list and a velocity matrix list;

step S3, train the note autoencoder and the velocity autoencoder of a specific style, taking the sample feature set of style 0 as input; for the note autoencoder, take the corresponding note matrix list, divide it into sublists with a batch size of 32, feed all note matrices of a batch to the input layer of the note autoencoder, and train for more than 200 epochs with mean squared error as the loss function, comparing the input note matrix with the generated one, until the loss converges, giving the trained note autoencoder; for the velocity autoencoder, take the corresponding velocity matrix list, divide it into sublists with a batch size of 32, feed all velocity matrices of a batch to the input layer of the velocity autoencoder, and train for more than 200 epochs with mean squared error as the loss function, comparing the input velocity matrix with the generated one, until the loss converges, giving the trained velocity autoencoder; combining the trained note autoencoder and velocity autoencoder gives the trained style-0 autoencoder module;

and step S4, perform step S3 on the sample feature set of every style, thereby obtaining trained autoencoder modules of the N different styles.

6. The system of claim 1, wherein the style network module learns the performance style of the score and consists of an input layer, a hidden layer and an output layer connected in sequence, the hidden layer being formed by several layers of bidirectional recurrent neural networks; the input layer of the style network module receives the note vector output by the encoder in the note autoencoder of the autoencoder module, with the velocity vector output by the encoder in the velocity autoencoder as the learning target; the note vector passes through the input layer, is learned in the hidden layer, and finally reaches the output layer, which outputs the predicted velocity vector.

7. The system of claim 6, wherein the training process of the style network module is:

step T1, obtain the note vectors and velocity vectors output by the encoders in the note autoencoders and velocity autoencoders of the autoencoder modules of the different styles, and append them to a note vector list and a velocity vector list respectively, forming the data set required to train each style network module, each data set comprising a style label code, a note vector list and a velocity vector list;

step T2, train a specific style network module, taking the data set of style 0 as input; take the note vector list and velocity vector list of style 0, divide each into sublists with a batch size of 8, feed all note vectors of a batch to the input layer of the style network module, and use all velocity vectors of the corresponding batch as the learning targets; exploiting the many-to-many learning capability of the hidden layer formed by the multilayer recurrent neural network, train for more than 200 epochs with mean squared error as the loss function, comparing the target velocity vector with the generated one, until the loss converges, giving the trained style-0 network module;

and step T3, perform step T2 in turn on each style's training data set obtained in step T1, obtaining the trained style network modules of the different styles.

8. The system of claim 1, wherein the music generation module applies, to the score obtained by the MIDI analysis module, the velocity vector output by the style network module corresponding to the style selected by the user, and converts the score into an expressive MIDI piece, the process being as follows:

step R1, decode the velocity vector output by the style network module of the user-selected style through the decoder of the velocity autoencoder in the autoencoder module of the same style, obtaining a velocity matrix;

step R2, take the score obtained by the MIDI analysis module, traverse all note-on events on its single note track, compute the quantized time-step position of each note-on event from the note's onset time, look up the velocity value at that position and at the note's pitch in the velocity matrix decoded in step R1, and replace the original velocity value in the note track;

step R3, save the note track with the replaced velocities and combine it with the unmodified global track already in the score to form the converted MIDI piece.

Technical Field

The invention relates to the technical field of computer-aided composition, and in particular to an automatic MIDI performance style conversion system based on a recurrent neural network.

Background

In modern society, people actively explore a rich spiritual and emotional world while pursuing a material standard of living. Music is an effective way for people to express emotion; it constitutes and supports human mental life and plays an indispensable role in entertainment, education, medical treatment and other fields. Different genres of music convey different musical styles, and grasping the style of a piece is essential to understanding it. Whether a performer truly understands a musical work can be judged by how correctly the musical style is grasped during the performance. A performance style can be described in terms of rhythm, dynamics, timbre and the like.

The Musical Instrument Digital Interface (MIDI) is an industry-standard electronic communication protocol. A MIDI file is the electronic score as seen by an electronic musical instrument: it records a series of performance instructions rather than sound, so a file of usually only a few dozen KB lets the instrument play a complete piece. In the MIDI specification, every note has its own velocity attribute, and different performance styles can be described through velocity.
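
As an illustration of this velocity attribute, the following minimal sketch (using the third-party mido library; the file name is illustrative) prints the velocity carried by every sounding note in a MIDI file:

```python
import mido

mid = mido.MidiFile('song.mid')  # illustrative file name
for track in mid.tracks:
    for msg in track:
        # every sounding note carries its own velocity in the range 0-127
        if msg.type == 'note_on' and msg.velocity > 0:
            print(msg.note, msg.velocity)
```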

Composing by hand requires genuine knowledge of music theory, a threshold too high for ordinary people, and the rise of artificial intelligence has spurred intelligent composition, which is usually applied to generating melodies. However, intelligent composition mainly focuses on producing the constituent notes of music, i.e. pitch and duration, and lacks dynamics. Given the same piece, a novice and an experienced player will produce different ranges of velocity. If velocities could be configured automatically for a generated score to assist intelligent composition, the expressiveness of the generated music would increase.

With the rapid development of the Internet and multimedia technology, digital music that can be downloaded and transmitted over the Internet has come to dominate the music market. Numerous companies at home and abroad have launched music platforms that gather large numbers of users; as long as a work is outstanding enough, it can reach a wide audience. For young people of the new era, it is attractive to automatically create performances in different styles from an existing score.

For composers, recording usually takes place while the music is performed; automatic conversion of performance styles can thus serve as a tool for creators, letting them easily incorporate new styles and creative ideas into their composing work.

Disclosure of Invention

The invention aims to overcome the defects in the prior art by providing an automatic MIDI performance style conversion system based on a recurrent neural network, which automatically configures velocities belonging to different performance styles for an existing score, solving the lack of expressiveness and dynamics in machine-generated music.

The purpose of the invention can be achieved by adopting the following technical scheme:

An automatic MIDI performance style conversion system based on a recurrent neural network comprises a MIDI analysis module, a data preprocessing module, an autoencoder module, a style network module and a music generation module connected in sequence; wherein:

the MIDI analysis module reads the MIDI input by the user and merges multi-track MIDI into single-track MIDI to serve as the score; the data preprocessing module extracts note features from the score; the autoencoder module encodes and decodes the extracted note features, compressing the surface features of the score and extracting its high-level features; the style network module learns the performance style of the score, predicts velocity vectors and assists composition; the music generation module applies the velocity vector predicted by the style network module to the score and converts the score into an expressive piece of music;

the process by which the system automatically converts the performance style comprises the following steps:

the MIDI input by the user is passed to the MIDI analysis module, which analyzes the type of the input MIDI: if the MIDI contains multiple note tracks, all note tracks except the percussion track are merged; otherwise no processing is performed; the merged MIDI is saved as the score;

a note vector is acquired: the data preprocessing module extracts a note matrix from the score, the note matrix is input to the trained autoencoder module, and the encoder in the note autoencoder outputs the encoded note vector;

the note vector is input to the trained style network modules of the different styles, which predict and generate velocity vectors of different styles;

and the music generation module applies, to the score, the velocity vector generated by the style network module corresponding to the style selected by the user, and converts the score into MIDI in the selected style.

Further, the MIDI analysis module operates as follows:

first, all tracks of the MIDI are traversed and divided into a global track and note tracks, where each note track defines the timbre of one voice; leaving the percussion track aside, all remaining note tracks are merged to serve as the score; if the input MIDI contains only one note track, no processing is needed.
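
A minimal sketch of this merging step, assuming the mido library and treating any track that only uses MIDI channel 10 (index 9, the General MIDI percussion channel) as a percussion track; the patent does not specify an implementation, so names and heuristics here are illustrative:

```python
import mido

def merge_note_tracks(in_path, out_path):
    """Merge all non-percussion note tracks into a single note track."""
    mid = mido.MidiFile(in_path)
    global_tracks, note_tracks = [], []
    for track in mid.tracks:
        channels = {msg.channel for msg in track if hasattr(msg, 'channel')}
        if not channels:              # meta-only track: tempo, time signature, ...
            global_tracks.append(track)
        elif channels != {9}:         # drop tracks that are purely percussion
            note_tracks.append(track)
    if len(note_tracks) > 1:
        note_tracks = [mido.merge_tracks(note_tracks)]  # re-times the deltas
    out = mido.MidiFile(ticks_per_beat=mid.ticks_per_beat)
    out.tracks.extend(global_tracks + note_tracks)
    out.save(out_path)
```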

Further, the data preprocessing module extracts note features from the score, the note features comprising pitch, note onset time, note duration and velocity, and a note matrix and a velocity matrix are built from these features; the total number of time steps of the score is T, measured in ticks, the smallest time unit in MIDI; first, the smallest note value represented by a single time step must be chosen: if 1 time step represents a sixteenth note, then a quarter note spans 4 time steps; the rows of the note matrix represent time steps t in the range 0 to T, the columns represent pitches p in the range 0 to 127, and each entry encodes the state of the note at that time step as a two-dimensional vector of 0s and 1s with three possible states, off, onset and sustained, represented as [0,0], [1,1] and [1,0] respectively; the rows of the velocity matrix likewise represent time steps t in the range 0 to T, its columns represent pitches p in the range 0 to 127, and each entry gives the velocity, between 0 and 127, of the note of the corresponding pitch at that time step.
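
A sketch of the two matrices under the conventions above, assuming the pretty_midi library and one time step per sixteenth note; all names are illustrative. Since each cell of the note matrix holds a two-dimensional state vector, the matrix has shape (T, 128, 2):

```python
import numpy as np
import pretty_midi

def score_to_matrices(path):
    """Build the note matrix (T, 128, 2) and the velocity matrix (T, 128)."""
    pm = pretty_midi.PrettyMIDI(path)
    step = pm.resolution / 4                      # ticks per sixteenth note
    T = int(pm.time_to_tick(pm.get_end_time()) // step) + 1
    notes = np.zeros((T, 128, 2), dtype=np.int8)
    vels = np.zeros((T, 128), dtype=np.int16)
    for inst in pm.instruments:
        if inst.is_drum:
            continue                              # percussion is ignored
        for n in inst.notes:
            t0 = int(pm.time_to_tick(n.start) // step)
            t1 = max(t0 + 1, int(pm.time_to_tick(n.end) // step))
            notes[t0, n.pitch] = (1, 1)           # onset: separates repeated pitches
            notes[t0 + 1:t1, n.pitch] = (1, 0)    # sustained
            vels[t0:t1, n.pitch] = n.velocity
    return notes, vels
```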

Furthermore, the autoencoder module encodes and decodes the note matrix and the velocity matrix; compared with feeding the raw note matrix and velocity matrix directly to the style network module as its input and target, the autoencoder module takes over part of the feature extraction, compressing the dimensionality of the two matrices and extracting the high-level features of the score, which accelerates the training of the style network module.

Furthermore, the autoencoder module comprises two independent units, a note autoencoder and a velocity autoencoder, each formed by an encoder and a decoder connected in sequence; the note autoencoder consists of an input layer, a hidden layer and an output layer, the hidden layer being a bidirectional recurrent neural network; the note autoencoder takes a note matrix as input and outputs a reconstructed note matrix; the input layer connected to the hidden layer forms the encoder, whose mapping performs the encoding that converts the note matrix into a note vector; the hidden layer connected to the output layer forms the decoder, whose mapping performs the decoding that reconstructs the note vector into a note matrix;

the velocity autoencoder consists of an input layer, a hidden layer and an output layer, the hidden layer being a bidirectional recurrent neural network; the velocity autoencoder takes a velocity matrix as input and outputs a reconstructed velocity matrix; the input layer connected to the hidden layer forms the encoder, whose mapping performs the encoding that converts the velocity matrix into a velocity vector; the hidden layer connected to the output layer forms the decoder, whose mapping performs the decoding that reconstructs the velocity vector into a velocity matrix.
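
A sketch of one autoencoder unit in PyTorch, assuming a bidirectional GRU as the recurrent hidden layer; the choice of GRU and the layer sizes are assumptions, since the patent only specifies a bidirectional recurrent network. The note autoencoder would flatten the two-channel note states into 256 features per step; the velocity autoencoder uses 128:

```python
import torch
import torch.nn as nn

class BiRNNAutoencoder(nn.Module):
    """Input layer -> bidirectional recurrent hidden layer -> output layer."""

    def __init__(self, n_features=128, hidden=64):
        super().__init__()
        self.inp = nn.Linear(n_features, hidden)
        self.rnn = nn.GRU(hidden, hidden, batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * hidden, n_features)

    def encode(self, x):          # x: (batch, T, n_features), one row per time step
        z, _ = self.rnn(self.inp(x))
        return z                  # (batch, T, 2*hidden): the note/velocity vectors

    def forward(self, x):
        return self.out(self.encode(x))   # decoder reconstructs the input matrix
```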

Further, the autoencoder module is trained by training the note autoencoder and the velocity autoencoder separately; the whole training process is as follows (code sketches follow step S4):

step S1, obtain downloaded MIDI data sets of different styles, where each MIDI file carries its own style label: classical, pop, jazz, metal, country, dance and folk; the style labels are encoded as style 0, style 1, ..., style N-1, where N is the number of styles; traverse all note tracks except the percussion tracks of every MIDI file and count its distinct velocity values; delete the MIDI files with fewer than 45 distinct velocity values; transpose all remaining MIDI files to expand the data set so that all possible keys are covered, the style label encoding staying unchanged before and after transposition; this yields the MIDI data sets of different styles used for training;

step S2, for the MIDI data set of each style, input all its MIDI files in turn to the MIDI analysis module and the data preprocessing module, and append the generated note matrices and velocity matrices to a note matrix list and a velocity matrix list respectively; after all the style data sets have been processed, sample feature sets of the different styles are obtained, each comprising a style label code, a note matrix list and a velocity matrix list;

step S3, train the note autoencoder and the velocity autoencoder of a specific style, taking the sample feature set of style 0 as input; for the note autoencoder, take the corresponding note matrix list, divide it into sublists with a batch size of 32, feed all note matrices of a batch to the input layer of the note autoencoder, and train for more than 200 epochs with mean squared error as the loss function, comparing the input note matrix with the generated one, until the loss converges, giving the trained note autoencoder; for the velocity autoencoder, take the corresponding velocity matrix list, divide it into sublists with a batch size of 32, feed all velocity matrices of a batch to the input layer of the velocity autoencoder, and train for more than 200 epochs with mean squared error as the loss function, comparing the input velocity matrix with the generated one, until the loss converges, giving the trained velocity autoencoder; combining the trained note autoencoder and velocity autoencoder gives the trained style-0 autoencoder module;

and step S4, perform step S3 on the sample feature set of every style, thereby obtaining trained autoencoder modules of the N different styles.
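
A sketch of the data-set preparation in step S1, assuming pretty_midi; the ±6-semitone transposition range covers all twelve keys but is an assumption, as the patent only says the remaining files are transposed to cover all possible keys:

```python
import copy
import pretty_midi

def prepare_style_dataset(paths, min_velocities=45, shifts=range(-6, 6)):
    """Drop files with too few distinct velocities, then transpose the rest."""
    kept = []
    for path in paths:
        pm = pretty_midi.PrettyMIDI(path)
        vels = {n.velocity for inst in pm.instruments
                if not inst.is_drum for n in inst.notes}
        if len(vels) < min_velocities:
            continue                      # too few dynamics to carry a style
        for shift in shifts:              # includes 0, the untransposed copy
            t = copy.deepcopy(pm)
            for inst in t.instruments:
                if inst.is_drum:
                    continue              # percussion tracks are not transposed
                for n in inst.notes:
                    n.pitch = min(127, max(0, n.pitch + shift))
            kept.append(t)
    return kept
```

And a sketch of the per-style autoencoder training in step S3; the batch size of 32, the MSE loss and the 200-plus epochs come from the patent, while the Adam optimizer and learning rate are assumptions. `matrices` is assumed to be a float tensor stacking all matrices of one style:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

def train_autoencoder(model, matrices, epochs=200, batch_size=32, lr=1e-3):
    """Reconstruction training: compare input matrices with generated ones."""
    loader = DataLoader(TensorDataset(matrices), batch_size=batch_size,
                        shuffle=True)
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = torch.nn.MSELoss()
    for epoch in range(epochs):
        for (batch,) in loader:
            optimizer.zero_grad()
            loss = loss_fn(model(batch), batch)   # input vs. reconstruction
            loss.backward()
            optimizer.step()
    return model
```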

Further, the style network module learns the performance style of the score and consists of an input layer, a hidden layer and an output layer connected in sequence, the hidden layer being formed by several layers of bidirectional recurrent neural networks; the input layer of the style network module receives the note vector output by the encoder in the note autoencoder of the autoencoder module, with the velocity vector output by the encoder in the velocity autoencoder as the learning target; the note vector passes through the input layer, is learned in the hidden layer, and finally reaches the output layer, which outputs the predicted velocity vector.
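
A sketch of the style network in PyTorch, assuming stacked bidirectional GRU layers as the hidden layer; the number of layers and the sizes are illustrative, not taken from the patent:

```python
import torch.nn as nn

class StyleNetwork(nn.Module):
    """Maps a note-vector sequence to a velocity-vector sequence (many-to-many)."""

    def __init__(self, note_dim=128, velocity_dim=128, hidden=256, layers=3):
        super().__init__()
        self.rnn = nn.GRU(note_dim, hidden, num_layers=layers,
                          batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * hidden, velocity_dim)

    def forward(self, note_vectors):       # (batch, T, note_dim)
        h, _ = self.rnn(note_vectors)      # hidden layer: stacked bidirectional GRUs
        return self.out(h)                 # predicted velocity vectors
```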

Further, the training process of the style network module is as follows (a code sketch follows step T3):

step T1, obtain the note vectors and velocity vectors output by the encoders in the note autoencoders and velocity autoencoders of the autoencoder modules of the different styles, and append them to a note vector list and a velocity vector list respectively, forming the data set required to train each style network module, each data set comprising a style label code, a note vector list and a velocity vector list;

step T2, train a specific style network module, taking the data set of style 0 as input; take the note vector list and velocity vector list of style 0, divide each into sublists with a batch size of 8, feed all note vectors of a batch to the input layer of the style network module, and use all velocity vectors of the corresponding batch as the learning targets; exploiting the many-to-many learning capability of the hidden layer formed by the multilayer recurrent neural network, train for more than 200 epochs with mean squared error as the loss function, comparing the target velocity vector with the generated one, until the loss converges, giving the trained style-0 network module;

and step T3, perform step T2 in turn on each style's training data set obtained in step T1, obtaining the trained style network modules of the different styles.
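
A sketch of step T2, reusing the StyleNetwork sketch above; the batch size of 8 and the MSE loss come from the patent, while the optimizer is an assumption:

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

def train_style_network(model, note_vecs, vel_vecs, epochs=200, batch_size=8):
    """Note vectors are the input, velocity vectors the learning target."""
    loader = DataLoader(TensorDataset(note_vecs, vel_vecs),
                        batch_size=batch_size, shuffle=True)
    optimizer = torch.optim.Adam(model.parameters())
    loss_fn = nn.MSELoss()
    for epoch in range(epochs):
        for notes, target_vels in loader:
            optimizer.zero_grad()
            loss = loss_fn(model(notes), target_vels)  # target vs. predicted
            loss.backward()
            optimizer.step()
    return model
```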

Further, the music generation module applies, to the score obtained by the MIDI analysis module, the velocity vector output by the style network module corresponding to the style selected by the user, and converts the score into an expressive MIDI piece; the process is as follows (a code sketch follows step R3):

step R1, decode the velocity vector output by the style network module of the user-selected style through the decoder of the velocity autoencoder in the autoencoder module of the same style, obtaining a velocity matrix;

step R2, take the score obtained by the MIDI analysis module, traverse all note-on events on its single note track, compute the quantized time-step position of each note-on event from the note's onset time, look up the velocity value at that position and at the note's pitch in the velocity matrix decoded in step R1, and replace the original velocity value in the note track;

step R3, save the note track with the replaced velocities and combine it with the unmodified global track already in the score to form the converted MIDI piece.
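
A sketch of steps R2 and R3 with mido, assuming the decoded velocity matrix from step R1 and one time step per sixteenth note; function and variable names are illustrative:

```python
import mido

def apply_velocities(score_path, velocity_matrix, out_path):
    """Replace each note-on velocity with the predicted one and save."""
    mid = mido.MidiFile(score_path)
    ticks_per_step = mid.ticks_per_beat // 4      # one step per sixteenth note
    for track in mid.tracks:
        tick = 0
        for msg in track:
            tick += msg.time                      # delta times -> absolute ticks
            if msg.type == 'note_on' and msg.velocity > 0:
                step = min(tick // ticks_per_step,
                           len(velocity_matrix) - 1)   # quantized onset position
                msg.velocity = int(velocity_matrix[step, msg.note])
    mid.save(out_path)                            # the converted MIDI piece
```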

Compared with the prior art, the invention has the following advantages and effects:

1. The invention improves on the common piano-roll matrix when extracting the note matrix and the velocity matrix, effectively distinguishing consecutive notes of the same pitch, so that more musical information can be expressed in the note matrix.

2. The invention uses the note autoencoder and the velocity autoencoder to extract the high-level features of the score, which not only compresses the dimensionality but also learns an abstract representation of a specific style, accelerating the training of the style network module.

3. The invention uses recurrent neural networks to train style network modules of different styles on MIDI data sets of different styles, meeting users' personalized needs and helping music enthusiasts improvise music in different performance styles.

4. The invention can automatically configure velocities of different performance styles for the MIDI input by the user and can serve as a tool for professionals, letting them easily incorporate new styles and creative ideas into their work.

Drawings

FIG. 1 is a schematic structural diagram of an automatic MIDI performance style conversion system based on a recurrent neural network according to an embodiment of the present invention;

FIG. 2 is a flowchart illustrating the operation of the automatic MIDI performance style conversion system based on a recurrent neural network in an embodiment of the present invention;

FIG. 3 is a flowchart of extracting the note matrix and the velocity matrix from the score in an embodiment of the present invention;

FIG. 4 is a schematic diagram of an embodiment of an autoencoder of the present invention;

FIG. 5 is a block diagram of an embodiment of the autoencoder module;

FIG. 6 is a schematic diagram of the training of the autoencoder module according to an embodiment of the present invention;

FIG. 7 is a schematic diagram of the training of a style network module in an embodiment of the present invention;

FIG. 8 is a flowchart of the operation of the music generation module in an embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
