Music beat extraction method for non-linear editing projects

Document No. 11665 · Published 2021-09-17

Note: this technique, "Music beat extraction method for non-linear editing projects" (一种用于非编工程中的音乐节拍提取方法), was designed and created by 马萧萧, 张博文, 黄平, 雷锴 and 赵越 on 2021-04-30. Abstract: The invention discloses a music beat extraction method for use in non-linear editing projects, comprising the following steps. Step 1: audio data sample preprocessing, in which the original audio data sample is preprocessed and packed into an audio data sample in an audio frame format. Step 2: audio data sample rhythm estimation, in which the rhythm of the framed audio data sample is estimated to obtain its beats per minute. Step 3: audio data sample beat position estimation, in which the beats per minute of the audio data sample are used to estimate beat positions and locate the specific time position at which each beat occurs in the audio data sample. By preprocessing the audio data, estimating the beats per minute of the preprocessed data, and using the beat count to estimate the specific time position of each beat in the audio sample, the invention realizes music beat extraction and improves the positioning accuracy of key points.

1. A music beat extraction method for use in non-linear editing projects, characterized by comprising the following steps:

Step 1: audio data sample preprocessing, namely preprocessing the original audio data sample and packing it into an audio data sample in an audio frame format;

Step 2: audio data sample rhythm estimation, namely estimating the rhythm of the framed audio data sample to obtain its beats per minute;

Step 3: audio data sample beat position estimation, namely using the beats per minute of the audio data sample to estimate beat positions and locate the specific time position at which each beat occurs in the audio data sample.

2. The method according to claim 1, wherein the first step specifically comprises:

S101: audio sample channel merging, taking either the average of multiple channels or a single selected channel as the sample f(t) for subsequent processing;

S102: audio sample down-sampling, down-sampling the sample f(t) at a sampling frequency fq of 8 kHz; the down-sampled sample is denoted S(n), where n is the index number of the sample data;

S103: audio sample packing, packing the down-sampled samples S(n) into audio frames according to a preset audio frame format.

3. The method for extracting music beats for non-linear editing according to claim 1, wherein the second step specifically comprises:

S201: spectrum analysis, namely analyzing, by a spectrum analysis method, the spectral intensity adapted to human hearing at different moments in different audio frames;

S202: rhythm sample construction, namely extracting the number of beats in the audio frames from the spectral intensities at different moments by a music beat extraction method;

S203: BPM estimation, namely performing an autocorrelation operation on the audio intensity to obtain the BPM of the audio frames.

4. The method for extracting music beats for non-linear editing according to claim 1, wherein the third step specifically comprises: generating a Gaussian window with the BPM of the audio frames as a parameter; filtering the audio intensity to obtain a beat score Ga(fn); searching the beat score for local extreme points to obtain a beat time sequence T(n) of audio frame index numbers fn meeting preset search conditions; taking the maximum of the beat score Ga(fn) over the last dn instants of the beat time sequence T(n) as the end time t_end of the audio beats; starting from the end time t_end and tracing back dn instants at a time, finding in the beat time sequence T(n) the sample index fn corresponding to the maximum of the beat score Ga(fn), and calculating the beat time according to the formula t = fn/fm, thereby locating the specific time position at which each beat occurs in the audio data sample.

5. The method for extracting music beats for non-linear editing according to claim 3, wherein step S201 specifically comprises the following sub-steps:

S2011: applying a Hamming window to the audio frame using a Hamming window function;

S2012: performing a fast Fourier transform on the Hamming-windowed audio frame to obtain the Fourier spectral intensity of each frame;

S2013: converting the Fourier spectral intensity into a Mel spectrum by a matrix transformation;

S2014: converting the Mel spectrum into Mel spectral intensity in decibel amplitude using a Mel spectrum conversion formula.

6. The method according to claim 3, wherein the step S202 specifically comprises the following sub-steps:

S2021: truncating the lowest 20% of the Mel spectral intensities by a threshold operation, removing low-decibel sounds from the Mel spectrum;

S2022: computing the spectral intensity increment and summing the audio intensity increments of each audio frame;

S2023: applying IIR filtering to the summed audio intensity increments with an IIR filter to remove the direct-current component.

Technical Field

The invention relates to the technical field of video editing, in particular to a music beat extraction method for use in non-linear editing projects.

Background

In recent years, with increasing network speeds and the rise of short videos, beat-synchronized ("stuck point") editing of short videos has become popular; short-video producers seek ways to quickly and accurately align video clips to audio drumbeats so that the output footage better matches the beat of the music.

At present, in the commonly used beat-synchronized editing workflow, editors obtain key points by manually marking the audio clip while auditioning it, and then import other material clips to match those points.

Patent application No. CN201910619907.9 discloses a method and apparatus for generating multimedia, an electronic device, and a storage medium, wherein the method comprises: obtaining the spectrum of each audio frame in the audio selected for the multimedia; performing a differential calculation on the spectrum of each audio frame to obtain the spectral flux of the audio frames; performing peak detection on the spectral flux to locate the audio frames where the drum points in the audio lie; generating video clips aligned to the drum points from the video material selected for the multimedia; and synthesizing the video clips and the audio according to the drum points aligned with the clips, obtaining multimedia in which the corresponding video clip switches at the drum points of the audio. Although this scheme speeds up the generation of multimedia, its positioning accuracy for audio drum points is low.

Disclosure of Invention

The invention aims to overcome the defects of the prior art and provides a music beat extraction method for non-linear editing projects.

The purpose of the invention is realized by the following technical scheme:

a music beat extraction method for non-woven engineering comprises the following steps:

Step 1: audio data sample preprocessing, namely preprocessing the original audio data sample and packing it into an audio data sample in an audio frame format;

Step 2: audio data sample rhythm estimation, namely estimating the rhythm of the framed audio data sample to obtain its beats per minute;

Step 3: audio data sample beat position estimation, namely using the beats per minute of the audio data sample to estimate beat positions and locate the specific time position at which each beat occurs in the audio data sample.

Specifically, the first step comprises:

S101: audio sample channel merging, taking either the average of multiple channels or a single selected channel as the sample f(t) for subsequent processing;

S102: audio sample down-sampling, down-sampling the sample f(t) at a sampling frequency fq of 8 kHz; the down-sampled sample is denoted S(n), where n is the index number of the sample data;

S103: audio sample packing, packing the down-sampled samples S(n) into audio frames according to a preset audio frame format.

Specifically, the second step comprises:

S201: spectrum analysis, namely analyzing, by a spectrum analysis method, the spectral intensity adapted to human hearing at different moments in different audio frames;

S202: rhythm sample construction, namely extracting the number of beats in the audio frames from the spectral intensities at different moments by a music beat extraction method;

S203: BPM estimation, namely performing an autocorrelation operation on the audio intensity to obtain the BPM of the audio frames.

Specifically, the third step comprises: generating a Gaussian window with the BPM of the audio frames as a parameter; filtering the audio intensity to obtain a beat score Ga(fn); searching the beat score for local extreme points to obtain a beat time sequence T(n) of audio frame index numbers fn meeting preset search conditions; taking the maximum of the beat score Ga(fn) over the last dn instants of the beat time sequence T(n) as the end time t_end of the audio beats; searching backward from the end time t_end in the beat score Ga(fn), tracing back dn instants at a time, i.e. finding the sample index number fn corresponding to the maximum of the beat score Ga(fn) over the time period [t_end - dn, t_end] of the beat time sequence T(n); and calculating the beat time according to the formula t = fn/fm, thereby locating the specific time position at which each beat occurs in the audio data sample.

Specifically, step S201 includes the following sub-steps:

S2011: applying a Hamming window to the audio frame using a Hamming window function;

S2012: performing a fast Fourier transform on the Hamming-windowed audio frame to obtain the Fourier spectral intensity of each frame;

S2013: converting the Fourier spectral intensity into a Mel spectrum by a matrix transformation;

S2014: converting the Mel spectrum into Mel spectral intensity in decibel amplitude using a Mel spectrum conversion formula.

Specifically, step S202 includes the following sub-steps:

S2021: truncating the low-decibel spectral intensities in the Mel spectrum by a threshold operation, removing low-decibel sounds from the Mel spectrum;

S2022: computing the spectral intensity increment and summing the audio intensity increments of each audio frame;

S2023: applying IIR filtering to the summed audio intensity increments with an IIR filter to remove the direct-current component.

The invention has the following beneficial effects: by preprocessing the audio data, estimating the beats per minute of the preprocessed audio data, and using the beat count to estimate the specific time position at which each beat occurs in the audio sample, the invention realizes music beat extraction, improves the positioning accuracy of key points, and improves the beat-synchronized editing experience.

Drawings

FIG. 1 is a flow chart of the method of the present invention.

FIG. 2 is a flow chart of the audio data sample preprocessing of the present invention.

FIG. 3 is a flow chart of the spectral analysis of the present invention.

FIG. 4 is a flow chart of the rhythm sample construction of the present invention.

FIG. 5 is a flow chart of the BPM evaluation of the present invention.

FIG. 6 is a flow chart of the beat position estimation of the present invention.

Detailed Description

In order to more clearly understand the technical features, objects, and effects of the present invention, embodiments of the present invention will now be described with reference to the accompanying drawings.

In this embodiment, as shown in FIG. 1, a music beat extraction method for non-linear editing projects includes the following steps:

(1) the audio data sample preprocessing comprises the following specific steps:

(1.1) Audio sample channel merging: either average the multi-channel samples or select a single channel as the sample f(t) for subsequent processing.

(1.2) Down-sample the sample to a lower sampling frequency fq, optionally 8 kHz; the down-sampling method may be nearest-neighbor or linear. The down-sampled sample is denoted S(n), where n is the index number of the sample data.

(1.3) Pack the samples into an audio frame format. Each audio frame is calculated as

SF(fn,n)=S(fn·(fz-hz)+n)

where fz represents the number of samples contained in one audio frame; hz represents the number of overlapping samples between adjacent audio frames; fn represents the audio frame index number, ranging from 0 up to the total number of audio frames; and n represents the index number of a sample within the audio frame, with 0 ≤ n < fz. The corresponding frame rate fm of the audio frames is

fm=fq/(fz-hz)
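The framing formula above can be sketched in Python (a minimal illustration assuming numpy; the values fz=1024 and hz=512 are hypothetical, not fixed by the method):

```python
import numpy as np

def pack_frames(S, fz=1024, hz=512):
    """Pack down-sampled samples S(n) into overlapping audio frames.

    fz: samples per frame; hz: overlapping samples between adjacent frames
    (both values are illustrative assumptions).
    """
    hop = fz - hz                                  # frame advance in samples
    n_frames = 1 + (len(S) - fz) // hop            # number of complete frames
    # SF[fn, n] = S[fn*(fz - hz) + n]
    return np.stack([S[fn * hop: fn * hop + fz] for fn in range(n_frames)])

S = np.arange(16000, dtype=float)                  # two seconds of dummy 8 kHz samples
SF = pack_frames(S)
print(SF.shape)                                    # (30, 1024)
```

The frame rate of the resulting frames is fm = fq/(fz − hz), i.e. 8000/512 ≈ 15.6 frames per second with these illustrative parameters.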

(2) Rhythm estimation

Rhythm estimation is used to estimate the beats per minute (BPM) of the music. It comprises three sub-steps: spectrum analysis, rhythm sample construction, and BPM estimation.

(2.1) Spectrum analysis analyzes the spectral intensity across audio frames in a way adapted to human hearing. It comprises the following parts:

a. Apply a Hamming window to each audio frame. The Hamming window function takes the standard form

HW(n) = 0.54 - 0.46·cos(2πn/(fz-1)), 0 ≤ n < fz

and the windowing process is

FW(fn,n)=SF(fn,n)·HW(n)

b. and carrying out fast Fourier transform on the windowed sample frame to obtain the spectrum intensity of each frame.

FI(fn,n)=|FFT(FW(fn,n))|

c. Generate a spectrogram. The frequencies obtained by the Fourier transform are in hertz (Hz); the range the human ear can typically hear is 20 Hz to 20 kHz, and the ear's perception of frequency in hertz is non-linear: it is usually more sensitive to low-frequency signals and relatively insensitive to high-frequency signals. The invention therefore uses a Mel spectrum, which is closer to linear for human hearing. The Mel spectrum may be quantized into mb bins, where mb is typically 40. The Fourier spectrum is converted into the Mel spectrum by a matrix transformation, shown in the following formula:

MI=FI·W

where MI (a matrix of size fn × mb) is the Mel spectral intensity, FI (a matrix of size fn × n) is the Fourier spectral intensity, and W is the n × mb spectral energy conversion matrix.

d. Converting the Mel frequency spectrum into the Mel sound spectrum intensity of decibel amplitude, wherein the conversion process is shown as the following formula:

MIdb=20.0*log10(MI)。
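The four parts of the spectrum analysis above can be sketched as follows (a non-authoritative Python illustration assuming numpy; the triangular Mel filterbank here stands in for the unspecified conversion matrix W, with mb = 40 as in the text):

```python
import numpy as np

def hamming(n):
    # standard Hamming window, assumed form of HW(n)
    return 0.54 - 0.46 * np.cos(2 * np.pi * np.arange(n) / (n - 1))

def mel_filterbank(n_fft, fq, mb=40):
    """Simplified triangular Mel filterbank W of shape (n_fft//2+1, mb)."""
    def hz2mel(f): return 2595.0 * np.log10(1.0 + f / 700.0)
    def mel2hz(m): return 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    pts = mel2hz(np.linspace(hz2mel(0.0), hz2mel(fq / 2.0), mb + 2))
    bins = np.floor((n_fft + 1) * pts / fq).astype(int)
    W = np.zeros((n_fft // 2 + 1, mb))
    for m in range(1, mb + 1):
        l, c, r = bins[m - 1], bins[m], bins[m + 1]
        for k in range(l, c):
            W[k, m - 1] = (k - l) / max(c - l, 1)   # rising slope
        for k in range(c, r):
            W[k, m - 1] = (r - k) / max(r - c, 1)   # falling slope
    return W

def mel_db_spectrogram(SF, fq):
    fz = SF.shape[1]
    FW = SF * hamming(fz)                       # a. Hamming windowing
    FI = np.abs(np.fft.rfft(FW, axis=1))        # b. Fourier spectral intensity
    MI = FI @ mel_filterbank(fz, fq)            # c. MI = FI · W
    return 20.0 * np.log10(MI + 1e-10)          # d. MIdb = 20·log10(MI)

fq = 8000
SF = np.ones((5, 1024))                         # dummy framed samples
MIdb = mel_db_spectrogram(SF, fq)
print(MIdb.shape)                               # (5, 40)
```

The small 1e-10 floor inside the logarithm is an added guard against empty Mel bins, not part of the original formula.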

(2.2) Rhythm sample construction. After obtaining the Mel spectral intensity in decibel amplitude at different moments, the variation of sound intensity can be obtained, from which the number of beats in the music is extracted. The rhythm sample construction comprises the following procedures:

a. Remove low-decibel sounds from the spectrum. The low-decibel spectral intensities can be truncated by a threshold operation, shown in the following formula:

MIdb=max(MIdb,Thdb);

b. Compute the spectral intensity increment, using the following formula:

D(fn,n)=max(MIdb(fn,n)-MIdb(fn-1,n),0);

The audio intensity increments of each audio frame are then summed:

E(fn) = Σ_n D(fn,n)

c. and performing IIR filtering on the summed audio intensity increment by using an IIR filter to remove a direct current component, wherein the filtering process is shown as the following formula:

Ed(fn)=a0E(fn)+a1E(fn-1)+a2Ed(fn-1);

where the IIR filter coefficients may be chosen as a0=1, a1=-1, a2=0.99.
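Procedures a–c above can be sketched as follows (assuming numpy; the threshold value of −60 dB for Thdb is an illustrative assumption, while the IIR coefficients a0=1, a1=−1, a2=0.99 are taken from the text):

```python
import numpy as np

def onset_envelope(MIdb, th_db=-60.0, a=(1.0, -1.0, 0.99)):
    """Rhythm sample construction: threshold, increment, sum, IIR filter."""
    M = np.maximum(MIdb, th_db)              # a. MIdb = max(MIdb, Thdb)
    D = np.maximum(M[1:] - M[:-1], 0.0)      # b. D(fn,n) = max(MIdb(fn,n)-MIdb(fn-1,n), 0)
    E = D.sum(axis=1)                        #    E(fn) = sum over n of D(fn,n)
    a0, a1, a2 = a
    Ed = np.zeros_like(E)                    # c. Ed(fn) = a0*E(fn)+a1*E(fn-1)+a2*Ed(fn-1)
    for fn in range(1, len(E)):
        Ed[fn] = a0 * E[fn] + a1 * E[fn - 1] + a2 * Ed[fn - 1]
    return Ed

MIdb = np.full((10, 40), -80.0)
MIdb[5:] = 0.0                               # a sudden onset at frame 5
Ed = onset_envelope(MIdb)
print(Ed)                                    # peak at the onset frame
```

The differencing plus IIR high-pass stage removes the direct-current component, so Ed responds mainly to abrupt intensity increases (onsets).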

(2.3) evaluation of BPM. This step obtains the BPM of the music by performing an autocorrelation operation on the audio intensity. The specific evaluation flow is as follows:

the audio intensity is truncated into a segment with a length of s seconds, the number of samples corresponding to the truncated audio intensity is sn ═ s · fm, and the truncated segment is et (n), where n < sn. And Et (n) is subjected to autocorrelation operation to obtain a correlation coefficient:

(for related operator)

To suppress tempos that are too low or too high, a reference beat range may be selected, typically [60, 150] BPM; W(dn) may be windowed around the reference beat value, and the lag dn of the maximum is selected as the sample interval between beats. The corresponding time interval is Δt = dn/fm, and the corresponding BPM is 60/Δt.
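The BPM estimation can be sketched as follows (assuming numpy; restricting the lag search to the [60, 150] BPM reference range with a rectangular window is a simplifying assumption about the unspecified windowing of W(dn)):

```python
import numpy as np

def estimate_bpm(Ed, fm, bpm_lo=60, bpm_hi=150):
    """Estimate BPM by autocorrelation of the onset strength Ed at frame rate fm."""
    Et = Ed - Ed.mean()
    # full autocorrelation, keep non-negative lags: W(dn) = sum Et(n)*Et(n+dn)
    W = np.correlate(Et, Et, mode='full')[len(Et) - 1:]
    lag_lo = int(fm * 60.0 / bpm_hi)          # shortest allowed beat interval
    lag_hi = int(fm * 60.0 / bpm_lo)          # longest allowed beat interval
    dn = lag_lo + int(np.argmax(W[lag_lo:lag_hi + 1]))
    dt = dn / fm                              # beat interval in seconds
    return 60.0 / dt, dn

Ed = np.zeros(1000)
Ed[::50] = 1.0                                # impulses every 50 frames
bpm, dn = estimate_bpm(Ed, fm=100.0)          # at 100 frames/s -> 0.5 s interval
print(bpm, dn)                                # 120.0 50
```

With an impulse train every 0.5 seconds the autocorrelation peaks at a lag of 50 frames, giving 120 BPM.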

(3) Beat position estimation

Beat position estimation locates the specific time at which each beat occurs in the audio data sample. It comprises the following steps:

A. Generate a Gaussian filter kernel of size dn. The Gaussian kernel function may take the standard form

G(n) = exp(-(n - dn/2)² / (2·(dn/sc)²)), 0 ≤ n < dn

where sc is a scaling factor, usually 8 (the original formula was not legible; this is a standard Gaussian of width dn with standard deviation dn/sc).

B. Correlate Ed(fn) with the Gaussian kernel to obtain the beat score Ga(fn).

C. Find the local extreme points of the beat score, i.e. the potential beat time sequence T(n). A local extreme point is a moment whose audio frame index fn satisfies (Ga(fn+1) - Ga(fn))·(Ga(fn) - Ga(fn-1)) < 0.

D. Take the maximum of Ga(fn) over the last dn samples of T(n) as the end time t_end of the music beats.

E. Starting from t_end and tracing back dn samples at a time, find in Ga(fn), restricted to the moments in T(n), the sample index fn at which Ga(fn) is maximal; the beat time of the music is then t = fn/fm. The next backward search of dn samples starts from this fn, and so on.
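Steps C–E above can be sketched as follows (assuming numpy; the Gaussian filtering of step B is omitted here and a precomputed beat score Ga is taken as input):

```python
import numpy as np

def track_beats(Ga, dn, fm):
    """Locate beat times from the beat score Ga by backward search in dn-sample windows."""
    # C. local extreme points: (Ga[fn+1]-Ga[fn]) * (Ga[fn]-Ga[fn-1]) < 0
    d = np.diff(Ga)
    T = [fn for fn in range(1, len(Ga) - 1) if d[fn] * d[fn - 1] < 0]
    # D. end time: best-scoring extremum among the last dn samples
    tail = [fn for fn in T if fn >= len(Ga) - dn]
    if not tail:
        return []
    beats = [max(tail, key=lambda fn: Ga[fn])]
    # E. trace back dn samples at a time, keeping the best-scoring extremum
    while True:
        lo = beats[-1] - dn
        win = [fn for fn in T if lo <= fn < beats[-1]]
        if not win:
            break
        beats.append(max(win, key=lambda fn: Ga[fn]))
    return [fn / fm for fn in reversed(beats)]   # beat times t = fn/fm

Ga = np.zeros(100)
Ga[[10, 30, 50, 70, 90]] = 1.0                   # score peaks every 20 frames
times = track_beats(Ga, dn=20, fm=10.0)
print(times)                                     # [1.0, 3.0, 5.0, 7.0, 9.0]
```

The backward search anchors each beat on the previous one, which keeps the located beats close to one inter-beat interval (dn samples) apart even when the score has spurious smaller peaks.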

The foregoing shows and describes the general principles, essential features, and advantages of the invention. It will be understood by those skilled in the art that the invention is not limited to the embodiments described above; the embodiments and the description merely illustrate the principle of the invention, and various changes and modifications may be made without departing from its spirit and scope, all of which fall within the scope of the claimed invention. The scope of the invention is defined by the appended claims and their equivalents.
