Tone characteristic extraction method based on short-time discrete harmonic transformation

Document No.: 1710351  Published: 2019-12-13

Reading note: This technology, "Tone characteristic extraction method based on short-time discrete harmonic transformation", was designed and created by *** 孙聪珊, 杨婧, 马琳, 李洪伟, 陈婧, 薄洪健, 丰上, 熊文静 on 2019-09-18. Abstract: The invention discloses a timbre feature extraction method based on the short-time discrete harmonic transform. The method extracts a pitch-period estimate with a three-level center-clipping cross-correlation method. Based on the pitch period, it derives a frequency-domain transform that extracts the harmonic-structure information of the audio signal, then uses this information to construct timbre harmonic spectrum features for extracting the timbre of audio. Advantages of the invention: it addresses two problems of the prior art, namely that audio harmonic-structure information is insufficiently extracted and that the timbre-related feature sets in use are large, redundant, and inefficient. The result is a sparse method for extracting audio harmonic-structure information.

1. A timbre feature extraction method based on the short-time discrete harmonic transform, characterized by comprising: a short-time discrete harmonic transform method and a timbre harmonic spectrum feature extraction method;

a. Short-time discrete harmonic transform method

Based on harmonic structure theory and the physical characteristics of the sound source: the sound emitted by a vibrating object is a complex tone, i.e., it has a harmonic structure. Each complex tone has a fundamental frequency f₀, the lowest frequency of its harmonic spectrum, and complex tones of different pitches have different f₀. The harmonic structure of a complex tone is represented by a frequency sequence in which the harmonics are numbered from low to high frequency; the vector HS = (f₀, f₁, …, f_m, …, f_M) stores the frequency of each harmonic in the harmonic spectrum, where M is the highest harmonic order. The analog frequency (center frequency) of the m-th spectral line in the harmonic spectrum is:

$$ f_m = m\,f_0 \tag{1} $$

The bandwidth between adjacent harmonic spectral lines is:

$$ B_m = f_{m+1} - f_m = f_0 \tag{2} $$

Thus the spacing (bandwidth) between any two adjacent spectral lines is constant, and the ratio p_m of center frequency to bandwidth equals the order of the spectral line:

$$ p_m = \frac{f_m}{B_m} = m \tag{3} $$

Let the sampling frequency of the signal be f_s. The pitch period T₀ (in samples) corresponding to the fundamental frequency f₀ of the harmonic spectrum satisfies:

$$ T_0 = \frac{f_s}{f_0} \tag{4} $$

If the highest frequency component of the signal is f_max, the following relationship must hold for no aliasing distortion to occur:

$$ f_s \ge 2 f_{\max} \tag{5} $$

The folding frequency f_s/2 is the highest analog frequency that can be analyzed. Moreover, in the harmonic spectrum:

$$ f_{\max} \ge M f_0 \tag{6} $$

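Combining (5) and (6) gives M ≤ f_s/(2f₀); since the highest harmonic order M is an integer, rounding down yields:

$$ M = \left\lfloor \frac{f_s}{2 f_0} \right\rfloor = \left\lfloor \frac{T_0}{2} \right\rfloor \tag{7} $$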

For a signal sequence of window length N, the frequency-domain sampling theorem for discrete signals requires the spacing of the digital frequencies of the discrete spectrum to satisfy:

$$ B_m N \le f_s \tag{8} $$

Taking equality in (8) and using B_m = f₀ from (2) gives N = f_s/f₀; since N is an integer, rounding down yields:

$$ N = \left\lfloor \frac{f_s}{f_0} \right\rfloor = \lfloor T_0 \rfloor \tag{9} $$

Thus, once the fundamental frequency or pitch period is known, the window length and the highest harmonic order of the harmonic-structure-based time-frequency transform follow from (7) and (9). Taking equality in (8) and combining it with (3), the digital frequency of the m-th line in the harmonic spectrum is:

$$ \omega_m = \frac{2\pi f_m}{f_s} = \frac{2\pi m}{N} \tag{10} $$

Taking the corresponding components of the discrete short-time Fourier transform (DSTFT) at these frequencies and normalizing them yields a sparse transform based on harmonic frequencies, the discrete harmonic transform (DHT). For a finite-length audio sequence x(n), the m-th spectral component of the DHT is:

where N is the window length of the harmonic-spectrum time-frequency transform; w_N(n) is a window function of length N; m = 1, …, M is the frequency index of the harmonic spectrum, denoting the m-th harmonic; and M is the highest harmonic order. Let x_i(n) be the i-th frame of the original signal x(n) after framing; applying the discrete harmonic transform to x_i(n) gives the short-time discrete harmonic transform of the signal:

b. Timbre harmonic spectrum feature extraction method

The spectral energy distribution of the harmonic structure is an important factor affecting timbre. Therefore, based on the short-time discrete harmonic transform of equation (12), let E_m denote the m-th harmonic energy; the m-th harmonic energy of the i-th frame is:

The M harmonic energies are normalized, and the short-time discrete harmonic energy EDHTⁱ of the i-th frame can be expressed as:

A discrete cosine transform (DCT) is applied to the normalized harmonic energies in EDHTⁱ:

where p = 1, 2, …, M. The resulting M coefficients form the short-time discrete harmonic transform coefficients SDHTCⁱ of the i-th frame.

The first-order difference of SDHTCⁱ gives the amplitude of harmonic energy variation and is denoted by the first-order difference short-time discrete harmonic transform coefficients ΔSDHTCⁱ; the second-order difference of SDHTCⁱ gives the rate of harmonic energy variation and is denoted by the second-order difference coefficients Δ²SDHTCⁱ. The short-time discrete harmonic transform coefficients, together with their first-order and second-order difference coefficients, constitute the local timbre harmonic spectrum features.

2. The method of claim 1, wherein statistics of the m-th timbre harmonic spectrum feature computed over all frames of the signal, such as the mean, standard deviation, 20% quantile, 50% quantile, 80% quantile, and kurtosis, can be used as global timbre harmonic spectrum features.

Technical Field

The invention relates to the technical field of signal processing, and in particular to a method for extracting timbre features of audio signals.

Background

The extraction of timbre-related features is a key part of sound source identification and strongly influences its results. In recent years, with the development of signal processing technology, the number of methods for extracting timbre-related features has grown, covering the time domain, frequency domain, cepstral domain, and others. Current approaches mainly combine a large number of time-domain, frequency-domain, and cepstral-domain acoustic features; the resulting large feature count increases the computational burden and introduces information redundancy. A vibrating object produces a series of harmonics ordered by pitch; the human ear analyzes and integrates the different harmonic series it receives through the basilar membrane of the cochlea, and the brain's perceptual judgment then yields different timbres and pitches. Variation in the harmonic series has the greatest influence on timbre, yet many current methods do not mine the essential features of timbre from the perspective of the physical role that the harmonics of an audio signal play in timbre. Therefore, to further improve the accuracy and efficiency of sound source identification, a timbre feature extraction method with a simple representation and an accurate description is needed.

Disclosure of Invention

To address the above shortcomings of the prior art, the invention provides a timbre feature extraction method based on the short-time discrete harmonic transform.

To achieve this objective, the invention adopts the following technical solution:

a. Short-time discrete harmonic transform method

Based on harmonic structure theory and the physical characteristics of the sound source: the sound emitted by a vibrating object is a complex tone, i.e., it has a harmonic structure. Each complex tone has a fundamental frequency f₀, the lowest frequency of its harmonic spectrum, and complex tones of different pitches have different f₀. The harmonic structure of a complex tone is represented by a frequency sequence in which the harmonics are numbered from low to high frequency; the vector HS = (f₀, f₁, …, f_m, …, f_M) stores the frequency of each harmonic in the harmonic spectrum, where M is the highest harmonic order. The analog frequency (center frequency) of the m-th spectral line in the harmonic spectrum is:

$$ f_m = m\,f_0 \tag{1} $$

The bandwidth between adjacent harmonic spectral lines is:

$$ B_m = f_{m+1} - f_m = f_0 \tag{2} $$

Thus the spacing (bandwidth) between any two adjacent spectral lines is constant, and the ratio p_m of center frequency to bandwidth equals the order of the spectral line:

$$ p_m = \frac{f_m}{B_m} = m \tag{3} $$

Let the sampling frequency of the signal be f_s. The pitch period T₀ (in samples) corresponding to the fundamental frequency f₀ of the harmonic spectrum satisfies:

$$ T_0 = \frac{f_s}{f_0} \tag{4} $$

If the highest frequency component of the signal is f_max, the following relationship must hold for no aliasing distortion to occur:

$$ f_s \ge 2 f_{\max} \tag{5} $$

The folding frequency f_s/2 is the highest analog frequency that can be analyzed. Moreover, in the harmonic spectrum:

$$ f_{\max} \ge M f_0 \tag{6} $$

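Combining (5) and (6) gives M ≤ f_s/(2f₀); since the highest harmonic order M is an integer, rounding down yields:

$$ M = \left\lfloor \frac{f_s}{2 f_0} \right\rfloor = \left\lfloor \frac{T_0}{2} \right\rfloor \tag{7} $$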

For a signal sequence of window length N, the frequency-domain sampling theorem for discrete signals requires the spacing of the digital frequencies of the discrete spectrum to satisfy:

$$ B_m N \le f_s \tag{8} $$

Taking equality in (8) and using B_m = f₀ from (2) gives N = f_s/f₀; since N is an integer, rounding down yields:

$$ N = \left\lfloor \frac{f_s}{f_0} \right\rfloor = \lfloor T_0 \rfloor \tag{9} $$

Thus, once the fundamental frequency or pitch period of the audio signal is known, the window length and the highest harmonic order of the harmonic-structure-based time-frequency transform follow from (7) and (9). Taking equality in (8) and combining it with (3), the digital frequency of the m-th line in the harmonic spectrum is:

$$ \omega_m = \frac{2\pi f_m}{f_s} = \frac{2\pi m}{N} \tag{10} $$
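As a worked check of equations (4), (7), (9), and (10), the minimal sketch below computes N, M, and ω_m from a known f₀ and f_s. The example values (a C4 fundamental of about 261.63 Hz sampled at 44.1 kHz) are illustrative assumptions, not values taken from the embodiment, and the function names are hypothetical.

```python
import math


def harmonic_params(f0: float, fs: float) -> tuple[int, int]:
    """Return (N, M): window length, eq. (9), and highest harmonic order, eq. (7)."""
    t0 = fs / f0                        # pitch period in samples, eq. (4)
    n = math.floor(t0)                  # N = floor(fs / f0)
    m_max = math.floor(fs / (2 * f0))   # M = floor(fs / (2 f0)), keeps M * f0 <= fs / 2
    return n, m_max


def harmonic_digital_freqs(n: int, m_max: int) -> list[float]:
    """Digital frequency of each harmonic line, omega_m = 2*pi*m/N, eq. (10)."""
    return [2 * math.pi * m / n for m in range(1, m_max + 1)]


n, m_max = harmonic_params(261.63, 44100.0)
print(n, m_max)  # 168 84
omegas = harmonic_digital_freqs(n, m_max)
```

Here M = N/2, so the highest harmonic line sits at ω_M = π, i.e., at the folding frequency.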

Taking the corresponding components of the discrete short-time Fourier transform (DSTFT) at these frequencies and normalizing them yields a sparse transform based on harmonic frequencies, the discrete harmonic transform (DHT). For a finite-length audio sequence x(n), the m-th spectral component of the DHT is:

where N is the window length of the harmonic-spectrum time-frequency transform; w_N(n) is a window function of length N; m = 1, …, M is the frequency index of the harmonic spectrum, denoting the m-th harmonic; and M is the highest harmonic order. Let x_i(n) be the i-th frame of the original signal x(n) after framing; applying the discrete harmonic transform to x_i(n) gives the short-time discrete harmonic transform (SDHT) of the signal:

b. Timbre harmonic spectrum feature extraction method

The spectral energy distribution of the harmonic structure is an important factor affecting timbre. Therefore, based on the short-time discrete harmonic transform of equation (12), let E_m denote the m-th harmonic energy; the m-th harmonic energy of the i-th frame is:

The M harmonic energies are normalized, and the short-time discrete harmonic energy EDHTⁱ of the i-th frame can be expressed as:

A discrete cosine transform (DCT) is applied to the normalized harmonic energies in EDHTⁱ:

where p = 1, 2, …, M. The resulting M coefficients form the short-time discrete harmonic transform coefficients SDHTCⁱ of the i-th frame.

The first-order difference of SDHTCⁱ gives the amplitude of harmonic energy variation and is denoted by the first-order difference short-time discrete harmonic transform coefficients ΔSDHTCⁱ. The second-order difference of SDHTCⁱ gives the rate of harmonic energy variation and is denoted by the second-order difference coefficients Δ²SDHTCⁱ. The short-time discrete harmonic transform coefficients, together with their first-order and second-order difference coefficients, constitute the local timbre harmonic spectrum features.

Further, statistics of the m-th timbre harmonic spectrum feature computed over all frames of the signal, such as the mean, standard deviation, 20% quantile, 50% quantile, 80% quantile, and kurtosis (the standardized fourth-order central moment), may be used as global timbre harmonic spectrum features.

Compared with the prior art, the invention has the advantages that:

The short-time discrete harmonic transform captures the harmonic-structure information of audio, and the timbre harmonic spectrum is used to extract the timbre features of audio. The method has low feature dimensionality, a small computational load, and high efficiency, and the extracted timbre features are consistent with the physical role of harmonics in timbre. The feature extraction method can be widely applied in music recommendation, music emotion computing, speaker recognition, and other applications.

Drawings

FIG. 1 is a flow chart of a method of an embodiment of the present invention;

FIG. 2 is a 1.5-second time-domain waveform of a single piano C4 note according to the embodiment of the present invention;

FIG. 3 is the short-time discrete harmonic transform spectrum of the single piano C4 note according to the embodiment of the present invention.

Detailed Description

To make the objects, technical solutions, and advantages of the present invention clearer, the present invention is described in further detail below by way of examples with reference to the accompanying drawings.

As shown in FIG. 1, the timbre feature extraction method based on the short-time discrete harmonic transform comprises the following steps:

Step 1: preprocessing.

Step 1 comprises the following substeps:

Step 101, detection of silent segments: silent segments in the audio affect pitch period estimation and harmonic feature extraction, so they must be detected and removed. First, the signal is divided into frames and the short-time average energy of each frame is calculated. Given a higher threshold β₁, the start and end points of the audio must lie outside the two points x₁ and x₂ (x₁ < x₂) where the threshold intersects the energy envelope. Given a lower threshold β₂, search leftward from x₁ and rightward from x₂ for the two points x₃ and x₄ (x₃ < x₁ < x₂ < x₄) where the short-time energy intersects β₂; these two points are the start and end points of the audio.
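The double-threshold endpoint search of step 101 can be sketched as follows. This is a minimal sketch under stated assumptions: the short-time energy is computed on non-overlapping frames, and the audio is assumed to contain one main non-silent segment, so x₁ and x₂ are taken as the first and last frames whose energy exceeds β₁. The function names and thresholds are hypothetical.

```python
from typing import Optional

import numpy as np


def short_time_energy(x: np.ndarray, frame_len: int) -> np.ndarray:
    """Short-time average energy of consecutive non-overlapping frames."""
    n_frames = len(x) // frame_len
    frames = x[: n_frames * frame_len].reshape(n_frames, frame_len)
    return np.mean(frames ** 2, axis=1)


def double_threshold_endpoints(energy: np.ndarray, beta1: float,
                               beta2: float) -> Optional[tuple[int, int]]:
    """Return (start, end) frame indices of the sound, or None if all silent."""
    above = np.flatnonzero(energy > beta1)
    if above.size == 0:
        return None
    x1, x2 = int(above[0]), int(above[-1])    # crossings with the high threshold beta1
    x3 = x1
    while x3 > 0 and energy[x3 - 1] > beta2:   # walk left until energy drops to beta2
        x3 -= 1
    x4 = x2
    while x4 < len(energy) - 1 and energy[x4 + 1] > beta2:   # walk right likewise
        x4 += 1
    return x3, x4
```

The returned frame indices can be multiplied by `frame_len` to obtain sample positions for trimming the signal.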

Step 102, data framing and windowing: because audio is short-time stationary, the audio signal from which silent segments were removed in step 101 must be divided into frames before its features are extracted, i.e., segmented into short segments with stable statistical properties, each segment forming one frame. To improve continuity between frames and reduce spectral leakage, each frame is also windowed; a Hanning window is chosen here because it reduces spectral leakage.
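The framing and Hanning windowing of step 102 can be sketched as below. The text specifies neither the frame length nor the hop size, so both are left as parameters; `frame_and_window` is a hypothetical helper, and `np.hanning` is NumPy's symmetric Hann window.

```python
import numpy as np


def frame_and_window(x: np.ndarray, frame_len: int, hop: int) -> np.ndarray:
    """Split x into (possibly overlapping) frames, one per row, each Hann-windowed."""
    if len(x) < frame_len:
        raise ValueError("signal shorter than one frame")
    n_frames = 1 + (len(x) - frame_len) // hop
    # Index matrix: row i holds sample indices i*hop ... i*hop + frame_len - 1.
    idx = hop * np.arange(n_frames)[:, None] + np.arange(frame_len)[None, :]
    return x[idx] * np.hanning(frame_len)[None, :]
```

Choosing `hop < frame_len` gives overlapping frames; trailing samples that do not fill a whole frame are dropped.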

Step 2: extracting harmonic information with the short-time discrete harmonic transform.

Step 2 comprises the following substeps:

Step 201, pitch period estimation of the audio signal: for the framed and windowed non-silent audio obtained in step 102, take one frame of data s_i at a time and apply the three-level center clipping input-output function:

which gives the clipper output y'_i, where C_L is the three-level center clipping coefficient. Next, the center clipping input-output function is applied:

which gives the center-clipped output y_i. Then the cross-correlation function of y'_i and y_i is computed:

where i is the frame index and k is the time lag. The positive-lag part of R_i(k) is taken and denoted R'_i(k). Let the sampling frequency of the signal be f_s and the fundamental frequency range of the audio be f'_min to f'_max; the pitch period then lies between f_s/f'_max and f_s/f'_min samples. The maximum of R'_i(k) is searched within this lag range, and the lag at which it occurs is the pitch period T₀.
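Step 201 can be sketched for one frame as follows. The patent's own clipping equations are not reproduced in this copy, so this minimal sketch assumes the textbook forms of three-level clipping (+1, 0, −1) and center clipping (shift toward zero by C_L), and assumes that the clipping level C_L is a fraction `alpha` of the frame's peak amplitude. The function name and the default `alpha = 0.68` are hypothetical.

```python
import numpy as np


def estimate_pitch_period(s: np.ndarray, fs: float, f_min: float,
                          f_max: float, alpha: float = 0.68) -> int:
    """Pitch period (in samples) of one frame by clipped cross-correlation."""
    c_l = alpha * np.max(np.abs(s))   # assumed clipping level: alpha * peak amplitude
    # Three-level center clipping: +1 above C_L, -1 below -C_L, 0 in between.
    y3 = np.where(s > c_l, 1.0, np.where(s < -c_l, -1.0, 0.0))
    # Center clipping: samples outside [-C_L, C_L] are moved toward zero by C_L.
    yc = np.where(s > c_l, s - c_l, np.where(s < -c_l, s + c_l, 0.0))
    # Search only positive lags inside the admissible range fs/f_max .. fs/f_min.
    k_lo = max(1, int(np.floor(fs / f_max)))
    k_hi = min(int(np.ceil(fs / f_min)), len(s) - 1)
    lags = np.arange(k_lo, k_hi + 1)
    # R(k) = sum_n y3[n] * yc[n + k] over the overlapping samples.
    r = np.array([np.dot(y3[: len(s) - k], yc[k:]) for k in lags])
    return int(lags[np.argmax(r)])
```

For example, a 200 Hz sinusoid sampled at 8 kHz with a search range of 80 to 400 Hz yields a pitch period of 40 samples. The three-level signal reduces the correlation to additions and subtractions, which is the usual motivation for this clipper.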

Step 202, computing the harmonic spectrum: let x_i(n) be the i-th frame of the signal x(n), N the window length of the harmonic-spectrum time-frequency transform, and w_N(n) a window function of length N; m = 1, …, M is the frequency index of the harmonic spectrum, denoting the m-th harmonic, and M is the highest harmonic order. Applying the short-time discrete harmonic transform to each frame yields its first M spectral components.
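Because ω_m = 2πm/N coincides with DFT bin m when N is the window length of equation (9), the first M DHT components of a frame can be sketched as a direct evaluation of the windowed spectrum at those frequencies. This minimal sketch assumes a 1/N normalization and a Hann window by default; the patent's exact normalization is not shown in this copy, and the name `dht` is hypothetical.

```python
from typing import Optional

import numpy as np


def dht(frame: np.ndarray, m_max: int,
        window: Optional[np.ndarray] = None) -> np.ndarray:
    """First m_max DHT components of one length-N frame (N = len(frame))."""
    n_len = len(frame)
    if window is None:
        window = np.hanning(n_len)    # assumed w_N(n); pass np.ones(n_len) for rectangular
    n = np.arange(n_len)
    m = np.arange(1, m_max + 1)[:, None]           # harmonic index m = 1..M
    kernel = np.exp(-2j * np.pi * m * n / n_len)   # evaluates the spectrum at omega_m = 2*pi*m/N
    return kernel @ (frame * window) / n_len        # assumed 1/N normalization
```

The result equals `np.fft.fft(frame * window)[1:m_max + 1] / N`. With a rectangular window and a frame exactly one pitch period long, each harmonic falls on a single line (a cosine at harmonic 3 gives magnitude 0.5 at m = 3 and zero elsewhere), which illustrates the sparsity described in the text; a Hann window spreads each harmonic over neighbouring lines.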

Step 3: extracting the timbre harmonic spectrum features.

Step 3 comprises the following substeps:

Step 301, short-time discrete harmonic transform coefficients: let E_m denote the m-th harmonic energy; the m-th harmonic energy of the i-th frame is:

The M harmonic energies are normalized, and the short-time discrete harmonic energy EDHTⁱ of the i-th frame can be expressed as:

A discrete cosine transform (DCT) is applied to the normalized harmonic energies in EDHTⁱ:

where p = 1, 2, …, M. The resulting M coefficients form the short-time discrete harmonic transform coefficients SDHTCⁱ of the i-th frame.
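Step 301 can be sketched for one frame as below. The equations for the harmonic energy, its normalization, and the DCT are not reproduced in this copy, so this minimal sketch assumes E_m = |X_i(m)|², normalization by the frame's total harmonic energy, and an unnormalized DCT-II whose index p = 1, …, M maps to DCT index p − 1. The function name `sdhtc` is hypothetical.

```python
import numpy as np


def sdhtc(dht_frame: np.ndarray) -> np.ndarray:
    """SDHTC vector of one frame from its M DHT components."""
    energy = np.abs(dht_frame) ** 2          # assumed E_m = |X_i(m)|^2
    total = energy.sum()
    # Assumed sum-to-one normalization, giving the EDHT vector of the frame.
    e_norm = energy / total if total > 0 else energy
    m_len = len(e_norm)
    m = np.arange(m_len)[None, :]            # 0-based harmonic index (m - 1)
    k = np.arange(m_len)[:, None]            # DCT index k = p - 1, p = 1..M
    basis = np.cos(np.pi * k * (2 * m + 1) / (2 * m_len))   # DCT-II basis
    return basis @ e_norm
```

Under the sum-to-one assumption the first coefficient is always 1, so in practice it carries no information and could be dropped; a flat harmonic energy distribution gives zeros in all other coefficients.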

Step 302, first-order difference short-time discrete harmonic transform coefficients: the first-order difference of SDHTCⁱ gives the amplitude of harmonic energy variation and is denoted by the first-order difference coefficients ΔSDHTCⁱ.

Step 303, second-order difference short-time discrete harmonic transform coefficients: the second-order difference of SDHTCⁱ gives the rate of harmonic energy variation and is denoted by the second-order difference coefficients Δ²SDHTCⁱ.
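Steps 302 and 303, and the assembly of the local features, can be sketched as frame-to-frame differences over the matrix whose rows are the SDHTCⁱ vectors. The text does not state the difference scheme, so this sketch assumes a simple backward difference with the first frame repeated (making the first row of each difference zero), rather than the regression-style deltas common in MFCC pipelines. The function names are hypothetical.

```python
import numpy as np


def deltas(coeffs: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """First- and second-order frame differences of an (n_frames, M) SDHTC matrix."""
    # Backward difference; the first frame is repeated so row 0 of each result is zero.
    d1 = np.diff(coeffs, axis=0, prepend=coeffs[:1])
    d2 = np.diff(d1, axis=0, prepend=d1[:1])
    return d1, d2


def local_features(coeffs: np.ndarray) -> np.ndarray:
    """Local timbre harmonic spectrum features: [SDHTC, delta, delta-delta] per frame."""
    d1, d2 = deltas(coeffs)
    return np.hstack([coeffs, d1, d2])
```

For M coefficients per frame, each frame's local feature vector has 3M entries.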

Step 304, timbre harmonic spectrum features: the short-time discrete harmonic transform coefficients and their first-order and second-order difference coefficients obtained for each frame of the signal form the local timbre harmonic spectrum features. Further, statistics of the m-th timbre harmonic spectrum feature computed over all frames of the signal, such as the mean, standard deviation, 20% quantile, 50% quantile, 80% quantile, and kurtosis (the standardized fourth-order central moment), may be used as global timbre harmonic spectrum features.
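The global features of step 304 can be sketched as per-dimension statistics over all frames of the local feature matrix. In this minimal sketch, the kurtosis is computed in its non-excess (Pearson) form m₄/m₂², and the standard deviation is the population form; both choices are assumptions. Quantiles use NumPy's default linear interpolation, and `global_features` is a hypothetical name.

```python
import numpy as np


def global_features(local: np.ndarray) -> np.ndarray:
    """Per-dimension statistics over frames of an (n_frames, D) local feature matrix."""
    mean = local.mean(axis=0)
    std = local.std(axis=0)                               # population standard deviation
    q20, q50, q80 = np.percentile(local, [20, 50, 80], axis=0)
    centered = local - mean
    m2 = np.mean(centered ** 2, axis=0)
    m4 = np.mean(centered ** 4, axis=0)                   # fourth-order central moment
    # Pearson (non-excess) kurtosis m4 / m2^2; set to 0 for constant dimensions.
    kurt = np.divide(m4, m2 ** 2, out=np.zeros_like(m4), where=m2 > 0)
    return np.concatenate([mean, std, q20, q50, q80, kurt])
```

For a local feature matrix with D columns, the global vector has 6D entries, grouped statistic by statistic.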

FIG. 2 shows the 1.5-second time-domain waveform of a single piano C4 note.

FIG. 3 shows the spectrogram obtained by applying the short-time discrete harmonic transform to the piano C4 note of FIG. 2. The fundamental frequency is located accurately, and the frequency components of every harmonic are captured without spurious components. The harmonic content of the audio is thus obtained accurately with a sparse representation.

It will be appreciated by those of ordinary skill in the art that the examples described herein are intended to assist the reader in understanding the manner in which the invention is practiced, and it is to be understood that the scope of the invention is not limited to such specifically recited statements and examples. Those skilled in the art can make various other specific changes and combinations based on the teachings of the present invention without departing from the spirit of the invention, and these changes and combinations are within the scope of the invention.
