MIDI multi-track sequence representation method and application

Document No.: 1217190 | Publication date: 2020-09-04

Note: This technique, "A MIDI multi-track sequence representation method and application", was created by 任意 (Ren Yi), 李晨啸 (Li Chenxiao), and 张克俊 (Zhang Kejun) on 2020-05-12. The invention discloses a MIDI multi-track sequence representation method and its application, comprising: parsing a MIDI file and splitting the MIDI messages it contains into note-on information, note-off information, time-shift information, and timbre-change information; and converting this information into bar sequences, where each bar sequence comprises a bar-start marker and a number of in-bar time-step sequences, and each time-step sequence comprises a time-step marker, a track marker, and a note marker arranged in order. Each bar is divided into 32 time steps; the note markers are determined from the note-on, note-off, and time-shift information, and the track markers are determined from the timbre-change information. The method addresses the problem that existing encodings ignore bar information, which hinders a model from learning the associations between different bars.

1. A MIDI multi-track sequence representation method, comprising the steps of:

parsing a MIDI file, and splitting the MIDI messages contained in the MIDI file into note-on information, note-off information, time-shift information, and timbre-change information;

converting the note-on information, note-off information, time-shift information, and timbre-change information into bar sequences, wherein each bar sequence comprises a bar-start marker and a plurality of in-bar time-step sequences, and each time-step sequence comprises a time-step marker, a track marker, and a note marker arranged in order;

wherein each bar is divided into 32 time steps, the note markers are determined from the note-on, note-off, and time-shift information, and the track markers are determined from the timbre-change information.

2. The method of claim 1, wherein each note marker comprises at least three note attributes: pitch information, duration information, and velocity information.

3. The method of claim 2, wherein there are 128 pitches, corresponding to the 128 pitches in General MIDI.

4. The method of claim 2, wherein there are 32 durations, corresponding to note spans from 1 time step to 32 time steps, and any note span greater than 32 time steps is assigned the 32nd duration.

5. The method of claim 1, wherein the track markers comprise melody, drum, piano, string, guitar, and bass tracks.

6. A music generation method based on a deep learning model, comprising the following steps:

representing original MIDI music as a music sequence, in units of bar sequences, using the MIDI multi-track sequence representation method of any one of claims 1 to 5;

encoding the music sequence into vectors, and then inputting the vector corresponding to each marker into a trained music generation model one marker at a time, following the order in which the bar-start markers, time-step markers, track markers, and note markers are arranged in the music sequence, and computing and outputting a marker probability distribution, wherein the music generation model comprises a long short-term memory (LSTM) network and a classifier connected in sequence;

sampling each output marker probability distribution to determine each newly generated marker;

and arranging the newly generated markers in order of generation to form the generated music sequence.

7. The deep learning model-based music generation method of claim 6, wherein the classifier employs a softmax classifier.

8. The music generation method based on a deep learning model of claim 6, wherein the music generation model is trained as follows:

constructing training samples: representing MIDI music as a music sequence according to the MIDI multi-track sequence representation method of any one of claims 1 to 5, shifting each marker in the music sequence backward by one position, and taking the resulting new marker sequence as the training sample;

and inputting the vectors corresponding to the bar-start markers, time-step markers, track markers, and note markers in the training sample into the music generation model one marker at a time, following their order of arrangement in the new marker sequence, to train the music generation model.

Technical Field

The invention relates to the field of music generation, and in particular to a MIDI multi-track sequence representation method and a method for generating music with a deep learning model based on that representation.

Background

Music is an audio modality that organizes sounds according to certain purposes and rules to express emotions and ideas. With the rapid development of artificial intelligence and deep learning, many automatic composition techniques have been proposed. For automatic composition, music must first be converted into a form a machine can process. To serialize a score into a machine-readable sequence, prior work has tried different encodings, each with its own applicable scenarios and advantages.

Mainstream MIDI (Musical Instrument Digital Interface) encodings can generally be divided into two types. The first is image-based encoding, as in Document I: Dong H W, Hsiao W Y, Yang L C, et al. MuseGAN: Multi-track sequential generative adversarial networks for symbolic music generation and accompaniment [C]// Thirty-Second AAAI Conference on Artificial Intelligence, 2018, which treats MIDI as a piano roll: the horizontal axis is time, the vertical axis is pitch, and 0/1 values indicate whether a pitch is triggered at the current position. This encoding directly shows the temporal relationships between notes and is easy for humans to understand, but the piano roll is very sparse, and prior experiments have shown that machines cannot understand and learn it well.

The second is sequence-based encoding, as in Document II: Huang C Z A, Vaswani A, Uszkoreit J, et al. Music Transformer: Generating music with long-term structure [J]. arXiv preprint arXiv:1809.04281, 2018, which proposes an event-sequence note encoding: each note is split into events such as Note On, Note Off, Time Shift, and Program Change, which are then concatenated into an event sequence. This encoding is widely adopted, but it has an important problem: there is no explicit bar identifier, making it difficult to delineate bar boundaries, which is not conducive to the model learning the associations between different bars.

Compared with single-track automatic composition, multi-track automatic composition is more difficult and places higher demands on the MIDI encoding. Document III: Roberts A, Engel J, Raffel C, et al. A hierarchical latent vector model for learning long-term structure in music [J]. arXiv preprint arXiv:1803.05428, 2018, follows the same encoding scheme as Document I; it does not encode track information into the sequence but instead handles multi-track MIDI at the model level, and therefore exhibits problems similar to the original encoding. Document IV: Donahue C, Mao H H, Li Y E, et al. LakhNES: Improving multi-instrumental music generation with cross-domain pre-training [J]. arXiv preprint arXiv:1907.04868, 2019, designs a multi-track encoding in which the notes of different tracks are encoded as events and merged into a single event sequence; however, it does not explicitly encode bar information and therefore cannot help the model learn the connections between bars well.

In summary, the encodings used by current mainstream automatic composition techniques cannot adequately meet the requirements of multi-track encoding, and this has become a bottleneck for automatic composition.

Disclosure of Invention

The invention aims to provide a MIDI multi-track sequence representation method that solves the problem that existing encodings ignore bar information, which hinders a model from learning the associations between different bars.

Another object of the invention is to provide a music generation method based on a deep learning model, in which the training samples of the model are encoded with the MIDI multi-track sequence representation method, so that the model can simultaneously learn the note, bar, and track information of the training samples, improving the quality of the generated music.

To achieve the above objects, the technical solution provided by the invention is as follows:

a MIDI multi-track sequence representation method, comprising the steps of:

parsing a MIDI file, and splitting the MIDI messages contained in the MIDI file into note-on (Note On) information, note-off (Note Off) information, time-shift (Time Shift) information, and timbre-change (Program Change) information;

converting the note-on information, note-off information, time-shift information, and timbre-change information into bar sequences, wherein each bar sequence comprises a bar-start marker and a plurality of in-bar time-step sequences, and each time-step sequence comprises a time-step marker, a track marker, and a note marker arranged in order;

wherein each bar is divided into 32 time steps, the note markers are determined from the note-on, note-off, and time-shift information, and the track markers are determined from the timbre-change information.
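By way of illustration only, the following Python sketch shows how one bar might be serialized under this scheme. The token names (BAR, POS_*, TRACK_*, NOTE_*) and the input tuple layout are illustrative assumptions; the patent specifies only the marker ordering, the 32-step quantization, and the clipping of long note spans.

    STEPS_PER_BAR = 32  # one bar is quantized into 32 time steps

    def encode_bar(bar_notes):
        """bar_notes: list of (step, track, pitch, span_in_steps, velocity).
        Returns the token sequence for one bar: a bar-start token followed by,
        for each occupied time step, a time-step token and then a track token
        and note token for each note starting at that step."""
        tokens = ['BAR']
        for step in range(STEPS_PER_BAR):
            events = [n for n in bar_notes if n[0] == step]
            if not events:
                continue  # empty time steps emit no tokens
            tokens.append(f'POS_{step}')
            for _, track, pitch, span, velocity in events:
                duration = min(span, 32)  # spans over 32 steps clip to the 32nd duration
                tokens.append(f'TRACK_{track}')
                tokens.append(f'NOTE_{pitch}_{duration}_{velocity}')
        return tokens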

Preferably, each note marker comprises at least three note attributes: pitch (Pitch) information, duration (Duration) information, and velocity (Velocity) information.

Preferably, there are 128 pitches, corresponding to the 128 pitch representations in General MIDI.

Preferably, there are 32 durations, corresponding to note spans from 1 time step to 32 time steps; any note span greater than 32 time steps is assigned the 32nd duration.

Preferably, the track markers comprise melody, drum, piano, string, guitar, and bass tracks.
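The patent does not specify how Program Change numbers map to these six track labels. The sketch below assumes a mapping based on the General MIDI program groups (pianos 0-7, guitars 24-31, basses 32-39, strings and ensembles 40-55), with drums identified by MIDI channel 10 as GM reserves it; treating all remaining programs as the melody track is likewise an assumption.

    def track_label(program, channel):
        """Map a (program, channel) pair to one of the six track labels;
        the ranges here are assumed from the General MIDI program groups."""
        if channel == 9:            # GM reserves channel 10 (index 9) for drums
            return 'drum'
        if 0 <= program <= 7:
            return 'piano'
        if 24 <= program <= 31:
            return 'guitar'
        if 32 <= program <= 39:
            return 'bass'
        if 40 <= program <= 55:
            return 'string'
        return 'melody'             # remaining programs treated as melody here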

A music generation method based on a deep learning model comprises the following steps:

representing original MIDI music as a music sequence, in units of bar sequences, using the MIDI multi-track sequence representation method described above;

encoding the music sequence into vectors, and then inputting the vector corresponding to each marker into a trained music generation model one marker at a time, following the order in which the bar-start markers, time-step markers, track markers, and note markers are arranged in the music sequence, and computing and outputting a marker probability distribution, wherein the music generation model comprises a long short-term memory (LSTM) network and a classifier connected in sequence;
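A minimal PyTorch sketch of a model with this structure follows: an LSTM followed by a linear classifier whose outputs become a marker probability distribution under softmax. The embedding layer and all hyperparameters are illustrative assumptions; the patent specifies only the LSTM-plus-classifier structure.

    import torch
    import torch.nn as nn

    class MusicGenerationModel(nn.Module):
        def __init__(self, vocab_size, embed_dim=256, hidden_dim=512):
            super().__init__()
            self.embedding = nn.Embedding(vocab_size, embed_dim)
            self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
            # softmax is applied to these logits at sampling / loss time
            self.classifier = nn.Linear(hidden_dim, vocab_size)

        def forward(self, token_ids, state=None):
            x = self.embedding(token_ids)      # (batch, seq_len, embed_dim)
            out, state = self.lstm(x, state)   # carrying state allows one-token-at-a-time input
            logits = self.classifier(out)      # (batch, seq_len, vocab_size)
            return logits, state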

sampling each output marker probability distribution to determine each newly generated marker;

and arranging the newly generated markers in order of generation to form the generated music sequence.
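A sketch of this sampling loop under the same assumptions, feeding one marker at a time and sampling each new marker from the softmax distribution; the temperature parameter is an added convenience, not part of the patent.

    def generate(model, start_token, n_tokens, temperature=1.0):
        """Autoregressive generation: feed one marker per step, sample the next
        marker from the output distribution, and append it to the sequence."""
        model.eval()
        tokens = [start_token]
        state = None
        inp = torch.tensor([[start_token]])
        with torch.no_grad():
            for _ in range(n_tokens):
                logits, state = model(inp, state)
                probs = torch.softmax(logits[0, -1] / temperature, dim=-1)
                next_token = torch.multinomial(probs, num_samples=1).item()
                tokens.append(next_token)
                inp = torch.tensor([[next_token]])
        return tokens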

Preferably, the classifier employs a softmax classifier.

Preferably, the training process of the music generation model is as follows:

constructing training samples: representing MIDI music as a music sequence according to the MIDI multi-track sequence representation method described above, shifting each marker in the music sequence backward by one position, and taking the resulting new marker sequence as the training sample;

and inputting the vectors corresponding to the bar-start markers, time-step markers, track markers, and note markers in the training sample into the music generation model one marker at a time, following their order of arrangement in the new marker sequence, to train the music generation model.
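Read as standard next-token training, the shift-by-one construction pairs each marker with the marker that follows it. The sketch below shows one training step under that reading; the use of cross-entropy loss is an assumption, as the patent does not name a loss function.

    def train_step(model, optimizer, sequence):
        """One next-token training step: the input is the sequence without its
        last marker, and the target is the same sequence shifted one position."""
        model.train()
        seq = torch.tensor(sequence).unsqueeze(0)    # (1, seq_len) of token ids
        inputs, targets = seq[:, :-1], seq[:, 1:]
        logits, _ = model(inputs)
        loss = nn.functional.cross_entropy(
            logits.reshape(-1, logits.size(-1)), targets.reshape(-1))
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()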

Compared with the prior art, the invention has the following beneficial effects:

the MIDI multi-track sequence representation method provided by the invention fully accounts for bar information, note information, and track information, so that when a music sequence in this representation is input to a deep learning model as a sample, the model can learn the associated information between bars, and music generated with the model is of higher quality.

Drawings

To more clearly illustrate the embodiments of the invention and the technical solutions of the prior art, the drawings required for describing them are briefly introduced below. The drawings described below are clearly only some embodiments of the invention; those skilled in the art can obtain other drawings from them without creative effort.

FIG. 1 is a schematic diagram of the MIDI multi-track sequence representation provided by the invention, in which (a) is a two-track MIDI file and (b) is its MIDI multi-track sequence representation;

FIG. 2 is a schematic structural diagram of the music generation model provided by the invention.

Detailed Description

To make the objects, technical solutions, and advantages of the invention clearer, the invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are intended only to illustrate the invention and not to limit its scope.

To address the problem that existing encodings ignore bar information and thus hinder a model from learning the associations between different bars, this embodiment provides a MIDI multi-track sequence representation method that is suitable for multi-track encoding and is broadly applicable to deep-learning-based composition models.

Specifically, the MIDI multi-track sequence representation method provided by this embodiment comprises the following steps:

s101, parsing the MIDI file, and dividing the MIDI message included in the MIDI file into note playing information, note stopping information, time shifting information, and timbre conversion information.
