Piano sight-reading ability evaluation system based on strong and weak standards

Document No.: 1244036    Publication date: 2020-08-18

Reading note: This invention, a piano sight-reading ability evaluation system based on strong and weak standards (一种基于强弱标准的钢琴视奏能力评价系统), was designed and created by 曹燕, 吴梦杰 and 韦岗 on 2020-04-21. The strong standard refers to the rhythm and main-melody notes of the piano piece; the weak standard refers to its musical expressiveness. The system comprises five modules: human-computer interaction, rhythm detection and evaluation, main-melody note estimation and evaluation, expressiveness detection and evaluation, and comprehensive scoring. The human-computer interaction module displays the music score and records the performance audio; the rhythm detection and evaluation module extracts note onsets and cuts the audio into note segments to obtain a rhythm evaluation score; the main-melody note estimation and evaluation module extracts the main-melody notes of each note segment to obtain a melody evaluation score; the expressiveness detection and evaluation module computes the emotional distance between the performance audio and the standard audio and normalizes it into an evaluation score; and the comprehensive score evaluation module takes a weighted sum of these scores to evaluate sight-reading ability. By combining the characteristics of music appreciation and setting strong and weak evaluation standards, the invention makes the machine evaluation result closer to human subjective perception.

1. A piano sight-reading ability evaluation system based on strong and weak standards, characterized by comprising a human-computer interaction module, a rhythm detection and evaluation module, a main-melody note estimation and evaluation module, an expressiveness detection and evaluation module and a comprehensive score evaluation module which are connected in sequence, wherein the comprehensive score evaluation module is further connected to the human-computer interaction module, the rhythm detection and evaluation module, the main-melody note estimation and evaluation module and the expressiveness detection and evaluation module respectively, wherein

the human-computer interaction module is used for selecting a music score from a pre-established database for display, recording the performance audio and preprocessing it;

the rhythm detection and evaluation module is used for extracting rhythm information from the performance audio, comparing it with the rhythm information of the standard audio and normalizing the result to obtain a rhythm evaluation score; it extracts note onset times from the audio file, aligns them with the standard-audio note onsets by dynamic time warping, and divides the performance audio into a plurality of note segments;

the main-melody note estimation and evaluation module is used for extracting the main-melody notes in the audio of each note segment, comparing them with the corresponding standard-audio main-melody notes and normalizing the result to obtain a melody evaluation score;

the expressiveness detection and evaluation module is used for training a two-dimensional emotion value prediction model for audio, computing the valence-arousal musical expression value of the performance audio, comparing it with the musical expression value of the standard audio of the piece stored in the database, computing the Euclidean distance between the two values and normalizing it to obtain an expressiveness evaluation score;

and the comprehensive score evaluation module is used for weighting the obtained melody, rhythm and expressiveness evaluation scores to obtain the final sight-reading evaluation.

2. The piano sight-reading ability evaluation system based on strong and weak standards according to claim 1, wherein the database is established as follows:

the piano pieces required for sight-reading evaluation are collected and, after digital processing, the wav-format audio, difficulty labels, music-score information, beat and rhythm information, note information, two-dimensional musical-feeling values of the audio, and the harmonic amplitudes of the 88 single-key tone samples are stored in the database.

3. The piano sight-reading ability evaluation system based on strong and weak standards according to claim 1, wherein the rhythm detection and evaluation module detects and extracts note onsets by means of a time-frequency analyzer with adaptive parameters and a high-pass filter with adaptive parameters, the adaptive parameters being adjusted according to the notes contained in the music-score segment corresponding to the audio being processed; the variable parameters of the time-frequency analyzer comprise the Fourier transform length, the overlapping frame length and the frame shift length; the variable parameter of the high-pass filter is its cut-off frequency; and the rhythm detection and evaluation module realizes dynamic time warping alignment of the note onsets by means of a Mel filter bank, extracting the Mel coefficients of each note audio segment for the subsequent note warping alignment.

4. The piano sight-reading ability evaluation system based on strong and weak standards according to claim 3, wherein the preprocessing in the human-computer interaction module is as follows: the signal input from the microphone is filtered, denoised and normalized into wav-format performance audio, and the audio is cut into measure audio frames according to the beat and measure information of the music score;

the rhythm detection and evaluation module estimates note onsets by a high-frequency energy difference method, as follows: for each measure audio frame, the cut-off frequency of the high-pass filter and the time-frequency analysis parameters are set according to the music-score note information corresponding to that frame, the time-frequency representation is obtained, and the high-frequency energy spectrum of each frame is obtained through the high-pass filter; a first-order difference of the high-frequency energy spectrum is taken and peak detection is performed; a time threshold is set, adjacent peaks within the threshold are merged, and the earliest time is selected as the note onset;

the dynamic time warping alignment of note onsets in the rhythm detection and evaluation module is as follows:

the audio is segmented according to the detected note onsets, and each audio segment is passed through a Mel filter bank to obtain its Mel coefficients, yielding a performance-audio Mel coefficient matrix; the performance-audio Mel coefficient matrix is then aligned with the known standard-audio Mel coefficient matrix by dynamic time warping according to their similarity;

the rhythm detection and evaluation module compares the rhythm difference between the performance audio and the standard audio according to the aligned note-onset information and normalizes it into a rhythm evaluation score; it also cuts the measure audio frames into a plurality of note segments, each note segment containing one or more notes.

5. The piano sight-reading ability evaluation system based on strong and weak standards according to claim 1, wherein the main-melody note estimation and evaluation module comprises a low-pass filter with an adaptively adjusted cut-off frequency and a detuning-regulation filter, the cut-off frequency of the low-pass filter being adaptively adjusted according to the fundamental frequency of the lowest right-hand note in the music score corresponding to the audio being processed; the detuning-regulation filter comprises a plurality of pass bands whose number is determined by the fundamental frequency, each pass band being triangular or cosine shaped with its centre frequency at a theoretical harmonic frequency, and the pass bands of the detuning-regulation filter being given different cut-off frequencies according to the harmonic order corresponding to their centre frequencies.

6. The piano sight-reading ability evaluation system according to claim 5, wherein the main-melody note estimation and evaluation module divides the audio transform domain into left-hand and right-hand parts according to the known music-score information and the playing characteristics of the left and right hands, and processes them separately: it first estimates the one or more notes played by the left hand using the "spectral subtraction maximum cross-correlation" method, then subtracts the higher harmonics generated by the left-hand notes from the right-hand part of the spectrum, and then estimates the right-hand notes, i.e. the main-melody notes, using the "spectral subtraction maximum cross-correlation" method; the specific process is:

low-pass filtering to obtain the left half of the spectrum: a short-time Fourier transform is applied to the audio segment whose melody is to be estimated, and after normalization the left half of the spectrum is obtained through the low-pass filter;

notes are estimated using the "spectral subtraction maximum cross-correlation" method: spectral peak detection is performed on the left half of the spectrum and the "peak frequency-peak value" pairs are recorded; the difference between each peak frequency and every following peak frequency is computed in turn to form a peak-frequency difference matrix; the first N columns of the matrix are taken as values to be processed, and these N×M values, M being the number of rows of the matrix, are compared with the piano fundamental frequencies; values that cannot be fundamental frequencies are eliminated and the remainder are the candidate fundamental frequencies; the maximum cross-correlation between the "harmonic-amplitude" template of the note corresponding to each candidate fundamental frequency and the "peak frequency-peak value" pairs is then computed, and the notes whose maximum cross-correlation exceeds a set threshold are taken as the estimated notes;

the right half of the spectrum is separated by detuning regulation to estimate the main melody: a detuning-regulation filter is set for the left-hand notes estimated above; the Fourier spectrum of the audio is passed through the detuning-regulation filter and the filtered spectrum is divided into two parts at the highest left-hand note in the music score; peak detection is performed on the right half to obtain its "peak frequency-peak value" pairs, and the estimated higher-harmonic peaks generated by the left-hand notes are subtracted from them; the right-hand notes, i.e. the main-melody notes, are then estimated using the "spectral subtraction maximum cross-correlation" method;

and the estimated main-melody notes are compared with the main-melody notes of the music score and normalized into an evaluation score, giving the melody evaluation score.

7. The piano sight-reading ability evaluation system based on strong and weak standards according to claim 1, wherein the expressiveness detection and evaluation module uses the continuous valence-arousal space from psychology for musical-feeling evaluation, mapping the musical expression of a performance to a point in that space; a two-dimensional emotion value prediction model is built by support vector regression, the valence-arousal values of the standard audio and of the corresponding performance audio are computed respectively, the Euclidean distance between them is computed, and it is normalized to obtain the expressiveness evaluation score.

Technical Field

The invention relates to the technical fields of main-melody note estimation for music signals, musical-feeling recognition and signal filter analysis, and in particular to a piano sight-reading ability evaluation system based on strong and weak standards.

Background

In piano playing, "sight-reading" means that the player takes an unfamiliar piece and plays it directly from the score, as opposed to playing from memory. Sight-reading reflects a player's piano level and is an important link in piano learning, but it is a link that has been neglected in traditional piano teaching in China. The main reasons are that traditional teaching requires one-to-one guidance from music teachers, yet such teachers are expensive, their quality is uneven, and they cannot always judge a student's playing accurately. By using computer technology to judge a player's ability scientifically and objectively, an electronic teacher can replace the traditional teacher in assisting sight-reading practice, improving learning efficiency and reducing cost; it can also be used in piano grading examinations to provide an objective evaluation of playing ability and avoid subjective human influence.

Existing research on performance evaluation falls roughly into two categories. The first treats performance evaluation as a multiple-fundamental-frequency (multi-pitch) detection problem: the played notes are extracted by analysing the performance waveform and compared with the score to reach a conclusion. The second treats it as a waveform similarity matching problem: a standard waveform of the piece is given, features are extracted from it and matched against the corresponding features of the performance waveform, and a conclusion is drawn.

The drawbacks of these methods are that, with current multiple-fundamental-frequency detection techniques, the fundamental frequencies cannot be extracted completely and correctly, which reduces the evaluation accuracy; and that taking the standard waveform entirely as the reference divorces the evaluation from musical knowledge, so the computer evaluation result lacks musical artistry.

Disclosure of Invention

The invention aims to solve problems in piano sight-reading learning and ability testing, such as the need for a professional teacher to evaluate on site and the subjectivity of evaluation by different judges, and provides a piano sight-reading ability evaluation system based on strong and weak standards. From a recording of the user's sight-reading audio, the system produces a comprehensive evaluation of sight-reading ability together with separate evaluations of rhythm, notes and expressiveness. First, the piano performance evaluation problem is divided into a strong-standard part and a weak-standard part: the strong standard refers to the main-melody notes and the rhythm of the piece, while the weak standard refers to the overall musical expressiveness of the performance. After the two parts are evaluated separately, the comprehensive sight-reading evaluation is obtained by weighting them according to the difficulty of the score. The known score information is fully used and the artistic characteristics of music evaluation are taken into account, so an evaluation closer to human music appreciation is obtained. Second, considering the inaccuracy of note-onset detection, the detected note onsets are warped into alignment with the standard notes through the similarity of their Mel coefficients, improving the accuracy of the subsequent modules. Third, to overcome the complexity of polyphonic note estimation, the score information is used to split the note estimation into left-hand and right-hand parts that are processed separately, and the harmonics generated by the left-hand spectrum are then subtracted from the right-hand spectrum; this reduces the number of simultaneous notes handled at each step and improves note-extraction accuracy. Fourth, the single-label or multi-label classification used in traditional musical-feeling evaluation is abandoned in favour of the continuous valence-arousal space from psychology: the expressiveness evaluation is obtained by comparing the emotional-space distance between the performance audio and the standard audio, which overcomes the ambiguity of emotion words and brings the evaluation closer to human emotional judgement.

The purpose of the invention can be achieved by adopting the following technical scheme:

a piano sight-reading ability evaluation system based on strong and weak standards comprises a human-computer interaction module, a rhythm detection and evaluation module, a main-melody note estimation and evaluation module, an expressiveness detection and evaluation module and a comprehensive score evaluation module which are connected in sequence, the comprehensive score evaluation module being further connected to the human-computer interaction module, the rhythm detection and evaluation module, the main-melody note estimation and evaluation module and the expressiveness detection and evaluation module respectively, wherein

the human-computer interaction module is used for selecting a music score from a pre-established database for display, recording the performance audio and preprocessing it;

the rhythm detection and evaluation module is used for extracting rhythm information from the performance audio, comparing it with the rhythm information of the standard audio and normalizing the result to obtain a rhythm evaluation score; it detects and extracts note onset times in the audio file, aligns them with the standard-audio note onsets by dynamic time warping, and divides the performance audio into a plurality of note segments;

the main-melody note estimation and evaluation module is used for extracting the main-melody notes in the audio of each note segment, comparing them with the corresponding standard-audio main-melody notes and normalizing the result to obtain a melody evaluation score;

the expressiveness detection and evaluation module is used for training a two-dimensional emotion value prediction model for audio, computing the valence-arousal musical expression value of the performance audio, comparing it with the musical expression value of the standard audio of the piece stored in the database, computing the Euclidean distance between the two values and normalizing the result to obtain an expressiveness evaluation score;

and the comprehensive score evaluation module is used for weighting the obtained melody, rhythm and expressiveness evaluation scores to obtain the final sight-reading evaluation.
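As a minimal sketch of this comprehensive scoring step, the snippet below takes a weighted sum of the three module scores, assuming each score has already been normalized to [0, 100]; the weight values and the difficulty-based scaling factor are illustrative placeholders, since the text does not specify them.

```python
# Minimal sketch of the comprehensive scoring step. The weights and the
# difficulty factor are illustrative assumptions, not values from the patent.
def comprehensive_score(rhythm, melody, expressiveness,
                        weights=(0.4, 0.4, 0.2), difficulty_factor=1.0):
    """Weighted sum of the strong-standard scores (rhythm, melody)
    and the weak-standard score (expressiveness), all in [0, 100]."""
    w_r, w_m, w_e = weights
    raw = w_r * rhythm + w_m * melody + w_e * expressiveness
    # A difficulty label from the score database could scale the result;
    # the exact weighting scheme is left open here.
    return min(100.0, raw * difficulty_factor)

print(comprehensive_score(85, 90, 70))  # -> 84.0
```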

Further, the database establishment process is as follows:

the piano pieces required for sight-reading evaluation are collected and, after digital processing, the wav-format audio, difficulty labels, music-score information, beat and rhythm information, note information, two-dimensional musical-feeling values of the audio, and the harmonic amplitudes of the 88 single-key tone samples are stored in the database.

Further, the rhythm detection and evaluation module detects and extracts note onsets by means of a time-frequency analyzer with adaptive parameters and a high-pass filter with adaptive parameters, the adaptive parameters being adjusted according to the notes contained in the music-score segment corresponding to the audio being processed; the variable parameters of the time-frequency analyzer comprise the Fourier transform length, the overlapping frame length and the frame shift length; the variable parameter of the high-pass filter is its cut-off frequency; and the rhythm detection and evaluation module realizes dynamic time warping alignment of the note onsets by means of a Mel filter bank, extracting the Mel coefficients of each note audio segment for the subsequent note warping alignment.

Further, the preprocessing in the human-computer interaction module is as follows: the signal input from the microphone is filtered, denoised and normalized into wav-format performance audio, and the audio is cut into measure audio frames according to the beat and measure information of the music score.
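A rough sketch of this preprocessing is given below, assuming a constant tempo so that the measure length can be derived from the score's BPM and time signature; the filter order, cut-off frequency and function names are illustrative assumptions rather than the system's exact choices.

```python
import numpy as np
import librosa
from scipy.signal import butter, filtfilt

def preprocess(path, bpm, beats_per_measure, sr=44100):
    """Load, denoise, normalize and cut the performance audio into measure frames."""
    y, sr = librosa.load(path, sr=sr, mono=True)
    # Simple denoising: high-pass away rumble below the lowest piano note (~27.5 Hz)
    b, a = butter(4, 25.0, btype="highpass", fs=sr)
    y = filtfilt(b, a, y)
    y = y / (np.max(np.abs(y)) + 1e-12)                     # amplitude normalization
    samples_per_measure = int(sr * 60.0 / bpm * beats_per_measure)
    n_measures = int(np.ceil(len(y) / samples_per_measure))
    y = np.pad(y, (0, n_measures * samples_per_measure - len(y)))
    return y.reshape(n_measures, samples_per_measure), sr   # measure audio frames
```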

The rhythm detection and evaluation module estimates note onsets by a high-frequency energy difference method, as follows: for each measure audio frame, the cut-off frequency of the high-pass filter and the time-frequency analysis parameters are set according to the music-score note information corresponding to that frame, the time-frequency representation is obtained, and the high-frequency energy spectrum of each frame is obtained through the high-pass filter; a first-order difference of the high-frequency energy spectrum is taken and peak detection is performed; a time threshold is set, adjacent peaks within the threshold are merged, and the earliest time is selected as the note onset.
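The following is a simplified sketch of such a high-frequency energy difference onset estimator; in the full system the STFT length, hop size and high-pass cut-off would be set adaptively from the score, whereas fixed illustrative values and a simple adaptive peak threshold are used here as assumptions.

```python
import numpy as np
import librosa

def detect_onsets(frame, sr, cutoff_hz=2000.0, n_fft=2048, hop=256,
                  merge_window=0.05):
    """Estimate note onsets (in seconds) inside one measure audio frame."""
    S = np.abs(librosa.stft(frame, n_fft=n_fft, hop_length=hop)) ** 2
    freqs = librosa.fft_frequencies(sr=sr, n_fft=n_fft)
    hf_energy = S[freqs >= cutoff_hz, :].sum(axis=0)   # high-frequency energy per frame
    diff = np.diff(hf_energy)                          # first-order difference
    diff[diff < 0] = 0.0
    # Peak picking: local maxima above a simple adaptive threshold
    thresh = diff.mean() + diff.std()
    peaks = [i for i in range(1, len(diff) - 1)
             if diff[i] > thresh and diff[i] >= diff[i - 1] and diff[i] >= diff[i + 1]]
    # Merge peaks closer than merge_window seconds and keep the earliest time
    onsets, last_t = [], -np.inf
    for i in peaks:
        t = i * hop / sr
        if t - last_t > merge_window:
            onsets.append(t)
        last_t = t
    return onsets
```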

Considering the inaccuracy of note-onset detection, the extracted notes need to be warped into alignment with the standard notes. The dynamic time warping alignment of note onsets in the rhythm detection and evaluation module is as follows:

the audio is segmented according to the detected note onsets, and each audio segment is passed through a Mel filter bank to obtain its Mel coefficients, yielding a performance-audio Mel coefficient matrix; the performance-audio Mel coefficient matrix is then aligned with the known standard-audio Mel coefficient matrix by dynamic time warping according to their similarity.
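A minimal sketch of this alignment follows: each detected note segment is summarized by its mean MFCC vector, and the performance sequence is aligned to the standard sequence with dynamic time warping. librosa's generic DTW is used here as a stand-in, and the segment-level MFCC summary is an assumption rather than the system's exact feature.

```python
import numpy as np
import librosa

def mel_matrix(y, sr, onsets, n_mfcc=13):
    """One mean Mel-coefficient (MFCC) vector per detected note segment."""
    bounds = [int(t * sr) for t in onsets] + [len(y)]
    vecs = []
    for a, b in zip(bounds[:-1], bounds[1:]):
        mfcc = librosa.feature.mfcc(y=y[a:b], sr=sr, n_mfcc=n_mfcc)
        vecs.append(mfcc.mean(axis=1))
    return np.array(vecs).T                     # shape (n_mfcc, n_segments)

def align(perf_mat, std_mat):
    # Cost = Euclidean distance between Mel-coefficient vectors;
    # wp is the warping path pairing performance segments with standard segments.
    D, wp = librosa.sequence.dtw(X=perf_mat, Y=std_mat, metric="euclidean")
    return wp[::-1]                             # (performance index, standard index) pairs
```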

The rhythm detection and evaluation module compares the rhythm difference between the performance audio and the standard audio according to the aligned note-onset information and normalizes it into a rhythm evaluation score; it also cuts the measure audio frames into a plurality of note segments, each note segment containing one or more notes.

Further, the main-melody note estimation and evaluation module comprises a low-pass filter with an adaptively adjusted cut-off frequency and a detuning-regulation filter, the cut-off frequency of the low-pass filter being adaptively adjusted according to the fundamental frequency of the lowest right-hand note in the music score corresponding to the audio being processed. Because piano notes are slightly inharmonic, the higher harmonics lie above the theoretical harmonic frequencies, so detuning regulation is required. The detuning-regulation filter comprises a plurality of pass bands whose number is determined by the fundamental frequency: low-frequency notes are rich in harmonics and their higher harmonics are strongly detuned, so they need more pass bands. Each pass band may be triangular or cosine shaped, with its centre frequency at a theoretical harmonic frequency. Since the invention mainly uses the lower harmonics, whose detuning is mild, while detuning grows with harmonic order, each pass band of the detuning-regulation filter is given a different cut-off frequency according to the harmonic order of its centre frequency; for example, for harmonics up to the 5th the filter cut-off is the centre frequency plus or minus 2 Hz, and above the 5th it is plus or minus 8 Hz.
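As an illustration, the sketch below builds a triangular detuning-regulation weighting for one estimated left-hand note, using the ±2 Hz / ±8 Hz cut-offs mentioned above; the fixed number of harmonics and the triangular shape are one of the options the text allows, not the only implementation.

```python
import numpy as np

def detuning_filter(f0, freqs, n_harmonics=10):
    """Weighting curve over FFT bin frequencies `freqs` for one note with fundamental f0."""
    w = np.zeros_like(freqs)
    for k in range(1, n_harmonics + 1):
        centre = k * f0                            # theoretical harmonic frequency
        half_width = 2.0 if k <= 5 else 8.0        # harmonic-order dependent cut-off
        tri = 1.0 - np.abs(freqs - centre) / half_width
        w = np.maximum(w, np.clip(tri, 0.0, 1.0))  # triangular pass band
    return w

# Usage: weight = detuning_filter(f0=110.0, freqs=np.linspace(0, 4000, 8193))
```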

Furthermore, the main-melody note estimation and evaluation module divides the audio transform domain into left-hand and right-hand parts according to the known music-score information and the playing characteristics of the left and right hands, and processes them separately: it first estimates the one or more notes played by the left hand using the "spectral subtraction maximum cross-correlation" method, then subtracts the higher harmonics generated by the left-hand notes from the right-hand part of the spectrum, and then estimates the right-hand notes, i.e. the main-melody notes, using the "spectral subtraction maximum cross-correlation" method. The specific process is as follows:

low-pass filtering to obtain the left half of the spectrum: a short-time Fourier transform is applied to the audio segment whose melody is to be estimated, and after normalization the left half of the spectrum is obtained through the low-pass filter;

notes are estimated using the "spectral subtraction maximum cross-correlation" method: spectral peak detection is performed on the left half of the spectrum obtained above and the "peak frequency-peak value" pairs are recorded; the difference between each peak frequency and every following peak frequency is computed in turn to form a peak-frequency difference matrix; the first N columns of the matrix are taken as values to be processed, and these N×M values (M being the number of rows of the matrix) are compared with the piano fundamental frequencies; values that cannot be fundamental frequencies are rejected and the remainder are the candidate fundamental frequencies; the maximum cross-correlation between the "harmonic-amplitude" template of the note corresponding to each candidate fundamental frequency and the "peak frequency-peak value" pairs is then computed, and the notes whose maximum cross-correlation exceeds a set threshold are taken as the estimated notes.
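A condensed sketch of this step is shown below: candidate fundamentals are generated from differences between spectral peak frequencies and then scored against stored harmonic-amplitude templates of the 88 single-key samples. The template lookup, tolerance and threshold values are assumptions, and a normalized dot product stands in for the maximum cross-correlation.

```python
import numpy as np

PIANO_F0 = 440.0 * 2.0 ** ((np.arange(88) - 48) / 12.0)   # A0..C8 fundamental frequencies

def estimate_notes(peak_freqs, peak_vals, templates, n_cols=3,
                   tol=0.03, corr_threshold=0.6):
    """Estimate note indices from spectral peaks; templates[i] = harmonic amplitudes of key i."""
    peak_freqs = np.asarray(peak_freqs); peak_vals = np.asarray(peak_vals)
    # Peak-frequency difference matrix: first n_cols differences to the following peaks
    diffs = []
    for i, f in enumerate(peak_freqs):
        diffs.extend((peak_freqs[i + 1:i + 1 + n_cols] - f).tolist())
    # Keep only differences close to a piano fundamental -> candidate fundamentals
    candidates = set()
    for d in diffs:
        idx = int(np.argmin(np.abs(PIANO_F0 - d)))
        if abs(PIANO_F0[idx] - d) / PIANO_F0[idx] < tol:
            candidates.add(idx)
    notes = []
    for idx in candidates:
        f0, tmpl = PIANO_F0[idx], np.asarray(templates[idx], dtype=float)
        obs = np.zeros_like(tmpl)
        for k in range(len(tmpl)):                 # observed peak amplitude near each harmonic
            near = np.abs(peak_freqs - (k + 1) * f0) / ((k + 1) * f0) < tol
            if near.any():
                obs[k] = peak_vals[near].max()
        corr = np.dot(tmpl, obs) / (np.linalg.norm(tmpl) * np.linalg.norm(obs) + 1e-12)
        if corr > corr_threshold:
            notes.append(idx)                      # index of the estimated piano key
    return notes
```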

The right half of the spectrum is separated by detuning regulation to estimate the main melody: a detuning-regulation filter is set for the left-hand notes estimated above; the Fourier spectrum of the audio is passed through the detuning-regulation filter and the filtered spectrum is divided into two parts at the highest left-hand note in the music score; peak detection is performed on the right half to obtain its "peak frequency-peak value" pairs, and the estimated higher-harmonic peaks generated by the left-hand notes are subtracted from them; the right-hand notes, i.e. the main-melody notes, are then estimated using the "spectral subtraction maximum cross-correlation" method.
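The separation and harmonic-subtraction step can be sketched as follows, reusing the note-estimation helper from the previous sketch; the peak-picking threshold and the frequency tolerance are illustrative assumptions.

```python
import numpy as np
from scipy.signal import find_peaks

def right_half_peaks(spectrum, freqs, left_highest_f0, left_notes_f0, tol=0.03):
    """Peaks of the right-hand spectral part, with left-hand higher harmonics removed."""
    mask = freqs > left_highest_f0                     # split at the highest left-hand note
    pk, props = find_peaks(spectrum[mask], height=0.01 * spectrum.max())
    pf, pv = freqs[mask][pk], props["peak_heights"]
    keep = np.ones(len(pf), dtype=bool)
    for f0 in left_notes_f0:                           # subtract left-hand higher harmonics
        k = np.maximum(np.round(pf / f0), 1)
        keep &= ~((k >= 2) & (np.abs(pf - k * f0) / (k * f0) < tol))
    return pf[keep], pv[keep]                          # feed these into estimate_notes(...)
```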

Finally, the estimated main-melody notes are compared with the main-melody notes of the music score and normalized into an evaluation score, giving the melody evaluation score.

Furthermore, because evaluating musical feeling with words is ambiguous, the expressiveness detection and evaluation module uses the continuous valence-arousal space from psychology for musical-feeling evaluation, mapping the musical expression of a performance to a point in that space; a two-dimensional emotion value prediction model is built by support vector regression, the valence-arousal values of the standard audio and of the corresponding performance audio are computed respectively, the Euclidean distance between them is computed, and it is normalized to obtain the expressiveness evaluation score.
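A minimal sketch of this expressiveness evaluation is given below, assuming a training set of audio feature vectors with human-annotated valence and arousal labels. scikit-learn's SVR stands in for the support vector regression model, and the hand-picked features and the normalization constant d_max are assumptions.

```python
import numpy as np
import librosa
from sklearn.svm import SVR

def emotion_features(y, sr):
    # Simple illustrative feature set: timbre (MFCC), energy and brightness
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=20).mean(axis=1)
    rms = librosa.feature.rms(y=y).mean()
    centroid = librosa.feature.spectral_centroid(y=y, sr=sr).mean()
    return np.concatenate([mfcc, [rms, centroid]])

def train_emotion_models(X, valence, arousal):
    # Two SVR models: one per emotion dimension (valence, arousal)
    return SVR().fit(X, valence), SVR().fit(X, arousal)

def expressiveness_score(models, perf_feat, std_feat, d_max=2.0):
    v_model, a_model = models
    perf = np.array([v_model.predict([perf_feat])[0], a_model.predict([perf_feat])[0]])
    std = np.array([v_model.predict([std_feat])[0], a_model.predict([std_feat])[0]])
    d = np.linalg.norm(perf - std)              # Euclidean emotional distance
    return 100.0 * max(0.0, 1.0 - d / d_max)    # normalized expressiveness score
```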

Compared with the prior art, the invention has the following advantages and effects:

(1) When note onsets are extracted by detecting the audio rhythm, the fact that high-frequency energy decays faster than low-frequency energy is exploited: when a new note is struck, the high-frequency energy suddenly changes from decaying to rising steeply, and this moment of steep increase is taken as the onset of the new note, which improves the accuracy of onset detection.

(2) To improve the precision of note-onset detection, the detected notes are warped into alignment with the standard notes using the similarity of the auditory Mel coefficients; this solves the misalignment between the performance-audio segmentation and the standard-audio segmentation caused by erroneous onset detection, and improves the accuracy of the rhythm and melody-note evaluation.

(3) When the main-melody notes are estimated, the notes in the audio are split into left-hand and right-hand parts according to the statistics of the numbers of notes played by each hand in piano music and the harmonic characteristics of the notes. A low-pass filter is first set according to the known left-hand/right-hand spectral information and the low-frequency notes played by the left hand are estimated with the "spectral subtraction maximum cross-correlation" method; the harmonics of the left-hand low-frequency notes are then subtracted from the right-hand high-frequency spectrum, and the right-hand notes, i.e. the main-melody notes, are estimated. Processing the two parts separately reduces the number of simultaneous notes handled at a time, avoids the drop in polyphonic-estimation accuracy as the number of notes increases, and improves the accuracy of main-melody note detection.

(4) When the expressiveness of the performance audio is evaluated, the distance in a continuous emotion space is used as the evaluation standard, overcoming the ambiguity and discontinuity that arise when emotion words are used as labels and musical feeling is treated as a classification problem; this increases the granularity of the musical evaluation and brings it closer to human subjective judgement.

(5) The evaluation strategy fully combines the artistic characteristics of music: the problem is divided into a strong standard and a weak standard, with the main-melody notes and rhythm, the skeleton of the music, serving as the strong standard and subjective qualities such as musical expressiveness serving as the weak standard. This makes the evaluation of playing ability closer to human subjective perception, overcomes the disconnect between conventional computer performance evaluation and music appreciation, and improves the intelligence of computer performance evaluation.

(6) In practice, a completely wrong performance by a subject is a low-probability event, so the method makes full use of the known statistical information about the notes of the score: for different scores it sets appropriate parameters for time-frequency analysis, the filters and the left/right-hand spectrum division, extracting the features more accurately. This overcomes the drawback of conventional computer performance evaluation methods, which discard the known score information, and ensures the accuracy of the evaluation result.

Drawings

FIG. 1 is a block diagram of the piano sight-reading evaluation system disclosed in the present invention;

FIG. 2 is a diagram of the preprocessing stage of the piano sight-reading evaluation system disclosed in the present invention;

FIG. 3 is a block diagram of the structure of the rhythm detection and evaluation module in the piano sight-reading evaluation system disclosed in the present invention;

FIG. 4 is a flow chart of the note-onset warping alignment method of the present invention;

FIG. 5 is a schematic diagram of the detuning-regulation filter of the present invention;

FIG. 6 is a flow chart of the "spectral subtraction maximum cross-correlation" note estimation method of the present invention;

FIG. 7 is a block diagram of the structure of the main-melody note estimation and evaluation module in the piano sight-reading evaluation system disclosed in the present invention;

FIG. 8 is a flow chart of the piano sight-reading evaluation method based on strong and weak standards disclosed in the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
