Piano note recognition algorithm based on convolutional neural network

Document No.: 1739460 | Published: 2019-12-20

Abstract: This technology, "Piano note recognition algorithm based on convolutional neural network" (基于卷积神经网络的钢琴音符识别算法), was designed by 董瓒, 马学健 and 郭玲 on 2019-08-25. The invention discloses a piano note recognition algorithm based on a convolutional neural network, which mainly comprises the following steps: finding the starting point and end point of each note in a continuous piano recording with an end point detection algorithm; dividing the complete recording into a set of single-note audio files; drawing a spectrogram of each note; and inputting each spectrogram into a trained neural network to complete recognition. The invention proposes an algorithm that combines peak detection on the short-time energy difference with double thresholds, which remedies the traditional double-threshold algorithm's excessive dependence on threshold settings. By drawing spectrograms, it converts audio signal processing into digital image recognition, which avoids the frequency-doubling errors that traditional time-domain methods make when extracting the fundamental frequency and, compared with traditional frequency-domain methods, greatly improves computation speed and accuracy.

1. A piano note identification algorithm based on a convolutional neural network is characterized by comprising the following steps:

step 1, finding out a starting point and an end point of each note from a continuous piano audio through an end point detection algorithm;

step 2, dividing the complete piano audio into a set of single note audio files according to the starting point and the ending point of each note;

step 3, drawing a spectrogram of each note;

step 4, inputting the spectrogram into the trained neural network to complete recognition.

2. The convolutional neural network-based piano note identification algorithm as claimed in claim 1, wherein step 1 finds the starting point of each note from abrupt energy changes in the time domain, and calculates the end point of each note using the double-threshold algorithm in combination with the starting point information;

the short-time energy formula is:

$$E_i = \sum_{m=1}^{L} S_i^2(m)$$

where $S_i(m)$ is the amplitude of the m-th sample of the i-th frame of the audio signal and $L$ is the frame length;

the short-time energy difference $\Delta E_i$ is the energy difference between two adjacent frames, namely:

$$\Delta E_i = E_i - E_{i-1}$$

3. The convolutional neural network-based piano note identification algorithm as claimed in claim 2, wherein the end point detection algorithm based on the short-time energy difference comprises the following steps:

A) calculating and drawing a short-time energy difference curve of two adjacent frames;

B) searching and marking all maximum value points in the curve as candidate note starting points;

C) setting a minimum peak height according to background environment sounds, and setting a shortest distance between adjacent peak points according to playing speed;

D) screening the peak points found in step B according to the minimum peak height and the shortest peak distance set in step C, wherein the frames corresponding to the retained points are the starting points of the notes;

E) calculating the short-time zero-crossing rate of each frame, wherein the formula is as follows:

$$Z_n = \sum_{m=-\infty}^{\infty} \left| \operatorname{sgn}[x(m)] - \operatorname{sgn}[x(m-1)] \right| \, w(n-m)$$

where $x(m)$ is the audio signal, $w(n)$ is a window function, and sgn is the sign function, which is defined as follows:

$$\operatorname{sgn}[x] = \begin{cases} 1, & x \ge 0 \\ -1, & x < 0 \end{cases}$$

F) setting two thresholds, one for the short-time energy and one for the short-time zero-crossing rate, and calculating the end point corresponding to each starting point obtained in step D;

G) checking the position of the end point corresponding to each starting point, and, if the end point lies after the next starting point, taking the frame 10 frames before the next starting point as the corresponding end point.

4. The convolutional neural network-based piano note identification algorithm as claimed in claim 3, wherein the read-in audio signal is subjected to frame windowing and normalization before endpoint detection.

5. The convolutional neural network-based piano note identification algorithm as claimed in claim 3, wherein the difference between each pair of start and end points is calculated; if the difference is smaller than the set shortest note length, the pair is judged to be noise and deleted from the set, which finally yields the start and end points of all notes.

6. The convolutional neural network-based piano note identification algorithm as claimed in claim 1, wherein step 4 inputs the spectrogram into the trained neural network to obtain the pitch; all convolution kernels in the neural network are 3 × 3, the pooling layers use max pooling, fully connected layer 1 has 1024 neurons, and fully connected layer 2 has 88 neurons, corresponding to the 88 pitches of the piano.

Technical Field

The invention belongs to the field of audio signal processing, and particularly relates to a piano note identification algorithm based on a convolutional neural network.

Background

With economic development and rising cultural standards, the number of music enthusiasts keeps growing. Constrained by energy and time, however, a considerable share of them choose to teach themselves and practice in their spare time. Without professional guidance, they often play wrong notes and cannot tell that they have done so. Software that automatically recognizes the notes played on a piano can help them considerably. Piano note recognition can also reduce the workload of music professionals and supports more intelligent music processing and creation.

The piano note identification algorithm mainly comprises an end point detection part, a note segmentation part and a pitch identification part.

End point detection and note segmentation are the key steps before note identification, and accurate end point detection is a precondition for accurate note identification. The double-threshold algorithm is the most classical end point detection algorithm. It sets high and low thresholds for the short-time energy (denoted δ1 and δ2) and for the short-time zero-crossing rate (denoted Z1 and Z2), and divides a complete audio file into four stages:

1. Silent segment: short-time energy below δ2;

2. Transition segment: short-time energy above δ2 but below δ1, and short-time zero-crossing rate above Z2;

3. Music segment: short-time energy above δ1 and short-time zero-crossing rate above Z1;

4. End segment: short-time energy below δ2 or short-time zero-crossing rate below Z2.

In practice, noise must also be taken into account, so in addition to the four thresholds a shortest music segment length and a longest transition segment length are set, to distinguish noise and to prevent a note from being cut off prematurely. The accuracy of the algorithm therefore depends mainly on the threshold settings. The thresholds are usually derived from the background sound in the first several frames of the recording, which places requirements on the recording itself: if even a small popping sound occurs at the beginning of the recording, accuracy drops sharply, so the method lacks practicality.

Conventional pitch identification research has concentrated on the time domain and the frequency domain. The short-time autocorrelation function measures the similarity of two signals in the time domain and is commonly used to detect the synchronism and periodicity of signals. Because the autocorrelation necessarily reaches a maximum at integer multiples of the period, the short-time autocorrelation function provides an important basis for extracting the piano pitch, that is, the fundamental frequency. The traditional autocorrelation method extracts the fundamental frequency by drawing the short-time autocorrelation curve: the function shows a peak at each pitch period, so the interval between two adjacent peaks is the pitch period. In general, however, the fundamental is not the strongest component, and the rich harmonic content makes the waveform of the audio signal very complex. Frequency-doubling errors therefore occur often, that is, the estimated fundamental frequency is twice or half the actual fundamental frequency. Wavelet analysis, a method from applied mathematics, performs a local transform of the signal in time and frequency and can effectively extract the fundamental frequency information in a music signal. Specifically, the wavelet component curve at one decomposition level is drawn, and the number n of sampling points between two maxima of the curve reflects the pitch period; the level is then changed repeatedly and the number of sampling points between adjacent maxima is calculated at each level, and if that number no longer changes, the fundamental frequency is determined. Thus, although wavelet analysis can effectively extract the fundamental frequency, its computational cost is very high because wavelet components must be computed at many levels.
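The traditional autocorrelation approach described above can be illustrated with a minimal sketch. It is not part of the invention; the function name `autocorr_pitch` and the search range parameters `f_min` and `f_max` are assumptions introduced here for illustration only. The sketch picks the lag with the largest autocorrelation inside the search range, which is exactly the step that becomes unreliable when harmonics dominate the fundamental.

```python
import math

def autocorr_pitch(frame, sample_rate, f_min, f_max):
    """Estimate the fundamental frequency of one frame by autocorrelation.

    f_min and f_max (hypothetical parameters) bound the searched pitch range.
    """
    min_lag = max(1, int(sample_rate / f_max))
    max_lag = min(int(sample_rate / f_min), len(frame) - 1)
    best_lag, best_value = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        # Unnormalized short-time autocorrelation at this lag.
        value = sum(frame[n] * frame[n + lag] for n in range(len(frame) - lag))
        if value > best_value:
            best_lag, best_value = lag, value
    # The strongest lag is taken as the pitch period in samples.
    return sample_rate / best_lag

# A pure 100 Hz tone sampled at 8000 Hz has a pitch period of 80 samples.
tone = [math.sin(2 * math.pi * 100 * n / 8000) for n in range(800)]
estimate = autocorr_pitch(tone, 8000, 60.0, 400.0)
```

For a pure tone the estimate is reliable. For a real piano note with strong harmonics, or with a search range that is too wide, a peak at half or twice the true period can win instead, which is the frequency-doubling error discussed in this section.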

In summary, for end point detection, the traditional double-threshold algorithm depends too heavily on the threshold settings. For fundamental frequency extraction in pitch identification, the traditional time-domain method is prone to frequency-doubling errors and has low accuracy, while the traditional frequency-domain method has high algorithmic complexity, a large computational load and low efficiency. Both the frequency-domain and the time-domain methods also place high demands on the signal-to-noise ratio and cannot extract the fundamental frequency accurately from audio signals with a low signal-to-noise ratio.

Disclosure of Invention

The invention aims to provide a piano note identification algorithm based on a convolutional neural network.

The technical solution for realizing the purpose of the invention is as follows: a piano note identification algorithm based on a convolutional neural network comprises the following steps:

step 1, finding out a starting point and an end point of each note from a continuous piano audio through an end point detection algorithm;

step 2, dividing the complete piano audio into a set of single note audio files according to the starting point and the ending point of each note;

step 3, drawing a spectrogram of each note;

step 4, inputting the spectrogram into the trained neural network to complete recognition.

Compared with the prior art, the invention has the following remarkable advantages: (1) compared with the traditional double-threshold algorithm, the end point detection algorithm provided by the invention, which is based on the short-time energy difference and double thresholds, does not depend excessively on the threshold settings and has high accuracy; (2) compared with the traditional time-domain and frequency-domain methods, the algorithm provided by the invention, which identifies piano pitch with a convolutional neural network, has no frequency-doubling errors, strong noise resistance, a simple algorithm, high operation speed and high accuracy.

Drawings

FIG. 1 is a flow chart of the piano note identification algorithm based on the convolutional neural network of the present invention.

FIG. 2 is a diagram of a neural network used in the present invention.

Fig. 3 is a short time energy plot.

Fig. 4 is a graph illustrating a short-time energy difference curve.

Fig. 5 is a diagram illustrating a short-time energy difference peak point.

FIG. 6 is a schematic diagram of short-term energy difference peak screening.

Detailed Description

As shown in FIG. 1, the piano note identification algorithm based on the convolutional neural network of the present invention comprises the following steps:

Step 1, reading a section of audio signal, framing and windowing it, and normalizing it as preprocessing.

Framing and windowing represent the non-stationary music signal as a sequence of frames, each of which can be treated as stationary and time-invariant; this is the basis for the subsequent steps that calculate features of the music signal.
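The preprocessing in step 1 can be sketched as follows. This is a minimal illustration, not the patented implementation: the source does not name the window type, the frame length or the hop size, so the Hamming window and the helper names `normalize`, `hamming` and `frame_signal` are assumptions.

```python
import math

def normalize(signal):
    """Scale samples so that the largest absolute amplitude is 1.0."""
    peak = max((abs(s) for s in signal), default=0.0)
    if peak == 0.0:
        return [0.0 for _ in signal]
    return [s / peak for s in signal]

def hamming(length):
    """Hamming window coefficients (assumed window type)."""
    if length == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * n / (length - 1))
            for n in range(length)]

def frame_signal(signal, frame_len, hop):
    """Split a signal into overlapping windowed frames.

    Trailing samples that do not fill a whole frame are dropped.
    """
    window = hamming(frame_len)
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frames.append([signal[start + n] * window[n] for n in range(frame_len)])
    return frames

samples = normalize([0.5, -2.0, 1.0, 0.25, 0.0, 1.5, -1.0, 0.5, 0.0, 0.1])
frames = frame_signal(samples, frame_len=4, hop=2)
```

With 10 samples, a frame length of 4 and a hop of 2, the sketch yields 4 frames; real recordings would use much longer frames.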

Step 2, calculating and drawing the short-time energy difference curve of adjacent frames, wherein the short-time energy and the short-time energy difference are given by:

$$E_i = \sum_{m=1}^{L} S_i^2(m)$$

$$\Delta E_i = E_i - E_{i-1}$$

where $S_i(m)$ is the amplitude of the m-th sample of the i-th frame of the audio signal and $L$ is the frame length.

Because $\Delta E_i$ is computed from the short-time energy difference between frames, it filters out part of the small energy fluctuations in the original signal and smooths the energy variation of the whole recording. With the difference operation, the note starting points are easier to determine from the difference $\Delta E_i$ of two adjacent frames than from the short-time energy of each frame.
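Step 2 can be sketched in a few lines, assuming the usual sum-of-squares definition of short-time energy. The function names are hypothetical, and setting the difference of the first frame to zero is an assumption, since the first frame has no predecessor.

```python
def short_time_energy(frames):
    """E_i: sum of squared sample amplitudes of each frame."""
    return [sum(s * s for s in frame) for frame in frames]

def energy_difference(energies):
    """Delta E_i = E_i - E_(i-1); the first frame is assigned 0.0 (assumption)."""
    return [0.0] + [energies[i] - energies[i - 1]
                    for i in range(1, len(energies))]

demo_frames = [[0.0, 0.0], [1.0, 1.0], [2.0, 0.0], [0.0, 0.0]]
energies = short_time_energy(demo_frames)
differences = energy_difference(energies)
```

A note onset shows up as a large positive value in `differences`, while the decay of a note produces negative values that the later peak search ignores.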

Step 3, searching for and marking all maximum points (peak points) in the curve as candidate note starting points.

At this stage the peak points include a large number of extreme points caused by background noise as well as those caused by the note signal, so they need to be filtered.

Step 4, setting the minimum peak height according to the background environment sound, and setting the shortest distance between adjacent peak points according to the playing speed.

The minimum peak height is mainly used to filter out background noise. The shortest distance between adjacent peak points is mainly used to filter out false end points inside a note, so that one note is not cut into several pieces; it needs to be adjusted according to the tempo at which the piano is played.

Step 5, screening the peak points found in step 3 according to the minimum peak height and the shortest peak distance set in step 4; the frames corresponding to the retained points are the starting points of the notes.
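Steps 3 to 5 can be sketched as a peak search followed by two filters. The source does not say how conflicts between peaks that are too close are resolved, so keeping the taller peak is an assumption, as are the function names `local_maxima` and `select_onsets`.

```python
def local_maxima(curve):
    """Indices of points higher than the left neighbour and not lower than the right."""
    return [i for i in range(1, len(curve) - 1)
            if curve[i - 1] < curve[i] >= curve[i + 1]]

def select_onsets(curve, min_height, min_distance):
    """Keep peaks that reach min_height and lie at least min_distance frames apart."""
    candidates = [i for i in local_maxima(curve) if curve[i] >= min_height]
    kept = []
    # Visit the tallest peaks first so that a weaker neighbour never
    # displaces a stronger one (assumed tie-breaking rule).
    for i in sorted(candidates, key=lambda idx: curve[idx], reverse=True):
        if all(abs(i - j) >= min_distance for j in kept):
            kept.append(i)
    return sorted(kept)

diff_curve = [0, 5, 0, 1, 0, 4, 3.5, 0, 0.2, 0, 6, 0]
onsets = select_onsets(diff_curve, min_height=0.5, min_distance=3)
```

In this toy curve the peak of height 0.2 is rejected as background noise and the peak of height 1 is rejected because it lies only two frames after a taller peak.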

Step 6, calculating the short-time zero-crossing rate of each frame, wherein the formula is as follows:

$$Z_n = \sum_{m=-\infty}^{\infty} \left| \operatorname{sgn}[x(m)] - \operatorname{sgn}[x(m-1)] \right| \, w(n-m)$$

where $x(m)$ is the audio signal, $w(n)$ is a window function, and sgn is the sign function, which is defined as follows:

$$\operatorname{sgn}[x] = \begin{cases} 1, & x \ge 0 \\ -1, & x < 0 \end{cases}$$

The short-time zero-crossing rate is meaningful because it reflects, to a certain extent, the periodic variation of the signal. For a sampled sinusoidal signal, the average zero-crossing rate per sample equals twice the signal frequency multiplied by the sampling period, so when the sampling period is fixed, the zero-crossing rate reflects the frequency of the signal. For regular musical tones in particular, the zero-crossing rate lies within a certain range, and because the zero-crossing rate of noise is larger, this property can be used to distinguish musical tones from noise.
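The relation between zero-crossing rate and frequency stated above can be checked with a minimal sketch. It assumes a rectangular window and expresses the rate as crossings per sample; the function names are hypothetical.

```python
import math

def sign(x):
    """Sign function as commonly defined for zero-crossing counts: 1 for x >= 0, else -1."""
    return 1 if x >= 0 else -1

def zero_crossing_rate(frame):
    """Zero crossings per sample within one frame (rectangular window assumed)."""
    if len(frame) < 2:
        return 0.0
    crossings = sum(abs(sign(frame[m]) - sign(frame[m - 1]))
                    for m in range(1, len(frame))) / 2
    return crossings / (len(frame) - 1)

# A 440 Hz sine sampled at 8000 Hz should cross zero about 2 * 440 / 8000 = 0.11 times per sample.
sample_rate = 8000
sine = [math.sin(2 * math.pi * 440 * n / sample_rate + 0.1) for n in range(sample_rate)]
sine_rate = zero_crossing_rate(sine)

# A signal that flips sign at every sample has the highest possible rate.
alternating_rate = zero_crossing_rate([1.0, -1.0, 1.0, -1.0])
```

The measured rate for the sine is close to 0.11, while the rapidly alternating signal reaches 1.0, which illustrates why a noise-like signal has a larger zero-crossing rate than a musical tone.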

Step 7, setting two thresholds, one for the short-time energy and one for the short-time zero-crossing rate, and calculating the end point corresponding to each starting point obtained in step 5.

Step 8, checking the position of the end point corresponding to each starting point; if the end point lies after the next starting point, the frame 10 frames before the next starting point is taken as the end point instead.

Step 9, calculating the difference between each pair of start and end points; if the difference is smaller than the set shortest note length, the pair is judged to be noise and deleted from the set, which finally yields the start and end points of all notes.

Because steps 8 and 9 re-check and re-screen every start and end point, the dependence of the algorithm on the threshold settings is greatly reduced and the accuracy is improved.
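Steps 7 to 9 can be sketched as follows. The source does not spell out the exact end condition, so this sketch assumes that a note ends at the first frame whose energy or zero-crossing rate falls below its threshold, following the end condition of the classical double-threshold algorithm described in the Background. The function names and the default `min_note_frames` value are hypothetical; the 10-frame guard comes from step 8.

```python
def find_end(start, energies, zcrs, energy_thr, zcr_thr):
    """First frame after `start` where energy or ZCR drops below its threshold (assumed rule)."""
    for i in range(start + 1, len(energies)):
        if energies[i] < energy_thr or zcrs[i] < zcr_thr:
            return i
    return len(energies) - 1

def note_boundaries(onsets, energies, zcrs, energy_thr, zcr_thr,
                    guard_frames=10, min_note_frames=5):
    """Return (start_frame, end_frame) pairs after the checks of steps 8 and 9."""
    notes = []
    for k, start in enumerate(onsets):
        end = find_end(start, energies, zcrs, energy_thr, zcr_thr)
        # Step 8: an end point beyond the next onset is pulled back to
        # guard_frames before that onset.
        if k + 1 < len(onsets) and end > onsets[k + 1]:
            end = onsets[k + 1] - guard_frames
        # Step 9: pairs shorter than the shortest note length are treated as noise.
        if end - start >= min_note_frames:
            notes.append((start, end))
    return notes

# 60 frames: a long loud region (frames 5-44) holding two notes, then a 2-frame click.
demo_energy = [0.0] * 5 + [10.0] * 40 + [0.0] * 5 + [10.0] * 2 + [0.0] * 8
demo_zcr = [0.1] * 60
boundaries = note_boundaries([5, 30, 50], demo_energy, demo_zcr,
                             energy_thr=1.0, zcr_thr=0.05)
```

In the demo, the first note would otherwise run past the second onset, so its end is moved to frame 20, and the short click at frame 50 is discarded as noise.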

Step 10, dividing the continuous notes in the audio into single notes according to the start and end point information obtained in step 9.
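Step 10 amounts to converting frame indices back to sample positions and slicing the signal. The sketch below assumes frames were taken every `hop` samples with length `frame_len`; both parameters and the function name `split_notes` are hypothetical, and writing each segment to an audio file is left out.

```python
def split_notes(signal, boundaries, frame_len, hop):
    """Cut one list of samples per (start_frame, end_frame) pair."""
    segments = []
    for start_frame, end_frame in boundaries:
        first = start_frame * hop
        # Include the whole final frame, but never run past the signal.
        last = min(end_frame * hop + frame_len, len(signal))
        segments.append(signal[first:last])
    return segments

recording = list(range(100))
segments = split_notes(recording, [(1, 3), (8, 9)], frame_len=10, hop=5)
```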

Step 11, drawing a spectrogram of each note.
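The source does not state how the spectrogram is computed or rendered. One common choice, shown here as an assumption, is the log-magnitude short-time Fourier transform with a Hann window. The sketch uses a direct DFT from the standard library to stay self-contained; a real implementation would normally use an FFT routine and then render the matrix as an image.

```python
import cmath
import math

def dft_magnitudes(frame):
    """Magnitudes of the non-negative-frequency DFT bins of one frame."""
    n = len(frame)
    return [abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n)))
            for k in range(n // 2 + 1)]

def spectrogram(signal, frame_len, hop):
    """Log-magnitude spectrogram in dB; rows are frequency bins, columns are frames."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * t / (frame_len - 1))
              for t in range(frame_len)]  # Hann window (assumed)
    columns = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + t] * window[t] for t in range(frame_len)]
        # A small offset avoids log10(0) for empty bins.
        columns.append([20 * math.log10(m + 1e-10) for m in dft_magnitudes(frame)])
    return [list(row) for row in zip(*columns)]

# A sine with exactly 8 cycles per 64-sample frame concentrates its energy in bin 8.
tone = [math.sin(2 * math.pi * 8 * t / 64) for t in range(256)]
spec = spectrogram(tone, frame_len=64, hop=32)
peak_bins = [max(range(len(spec)), key=lambda k: spec[k][col])
             for col in range(len(spec[0]))]
```

For a piano note the image shows the fundamental and its harmonics as horizontal bands, which is the pattern the network in step 12 is trained to recognize.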

Step 12, inputting the spectrogram into the trained neural network to obtain the pitch. The neural network structure is shown in FIG. 2. All convolution kernels in the network are 3 × 3, the pooling layers use max pooling, fully connected layer 1 has 1024 neurons, and fully connected layer 2 has 88 neurons, corresponding to the 88 pitches of the piano.
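The 88 outputs of fully connected layer 2 have to be mapped back to pitches. The sketch below shows one plausible mapping and is an assumption, since the source does not state the class order: class 0 is taken to be the lowest key (A0, MIDI note 21) and class 87 the highest (C8, MIDI note 108), with frequencies from equal temperament at A4 = 440 Hz. The function names are hypothetical.

```python
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def class_to_pitch(index):
    """Map a class index 0..87 to (note name, frequency in Hz).

    Assumes classes are ordered from the lowest to the highest piano key.
    """
    if not 0 <= index < 88:
        raise ValueError("piano class index must be in 0..87")
    midi = index + 21  # A0 is MIDI note 21
    name = NOTE_NAMES[midi % 12] + str(midi // 12 - 1)
    frequency = 440.0 * 2 ** ((midi - 69) / 12)
    return name, frequency

def predict_pitch(scores):
    """Pick the class with the highest network output and name its pitch."""
    best = max(range(len(scores)), key=scores.__getitem__)
    return class_to_pitch(best)

# Fake network output in which class 48 has the highest score.
fake_scores = [0.0] * 88
fake_scores[48] = 1.0
predicted = predict_pitch(fake_scores)
```

With the assumed ordering, class 48 corresponds to A4 at 440 Hz.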

The present invention will be described in detail below with reference to the accompanying drawings and examples.
