Method and device for acquiring characteristics of voice signal

Document No.: 36644  Published: 2021-09-24

Reading note: This technology, Method and device for acquiring characteristics of voice signal (语音信号的特征获取方法及装置), was designed and created by Zhang Tao, Lin Liqin and Sun Hao on 2021-06-24. Its main content is as follows. The embodiments of the present disclosure disclose a method and a device for acquiring characteristics of a voice signal. The method comprises: performing time-frequency processing on a voice signal to be analyzed to obtain a spectrogram of the voice signal; statistically acquiring direction information of each energy point in the spectrogram; establishing a formal context based on a kernel density estimation algorithm according to the direction information of each energy point in the spectrogram, wherein the formal context takes the energy points as objects and the direction intervals contained in the energy points as attributes, and describes the correspondence between the energy points in the spectrogram and the direction intervals they contain; and establishing a directional symbiotic attribute topological graph of the voice signal according to the formal context, wherein the directional symbiotic attribute topological graph describes the symbiotic (co-occurrence) relationship between attribute pairs in the formal context. The technical solution can display more detailed direction information of the voice signal in graph form, has high detection precision and strong interpretability, and can effectively analyze cases in which the direction values of the energy points in the spectrogram of the voice signal are complex and changeable.

1. A method for obtaining characteristics of a speech signal, comprising:

performing time-frequency processing on a voice signal to be analyzed to obtain a spectrogram of the voice signal;

statistically acquiring direction information of each energy point in the spectrogram;

establishing a formal context based on a kernel density estimation algorithm according to the direction information of each energy point in the spectrogram, wherein the formal context takes the energy points as objects and the direction intervals contained in the energy points as attributes, and is used for describing the correspondence between the energy points in the spectrogram and the direction intervals contained in the energy points;

and establishing a directional symbiotic attribute topological graph of the voice signal according to the formal context, wherein the directional symbiotic attribute topological graph is used for describing the symbiotic relationships between attribute pairs in the formal context.

2. The method of claim 1, wherein the performing time-frequency processing on the speech signal to be analyzed to obtain a spectrogram of the speech signal comprises:

performing a short-time Fourier transform on the speech signal according to the following formula:

$STFT(t,f)=\sum_{u=t}^{t+L-1} x(u)\,w(u-t)\,e^{-j2\pi f u}$

wherein x(u) is the speech signal, w(u-t) is a window function, t represents time, f represents frequency, u-t ∈ [0, L-1], and L is the length of the window function;

taking P(t,f) as the expression of the spectrogram of the voice signal, and calculating P(t,f) according to the following formula:

$P(t,f)=|STFT(t,f)|^2$

wherein P(t,f) represents the energy value at time t and frequency f.

3. The method of claim 2, wherein w(u-t) is a Hamming window function.

4. The method according to claim 2 or 3, wherein the statistically obtaining the direction information of each energy point in the spectrogram comprises:

performing sliding window processing on the spectrogram according to the following formula:

P(t,f)＝[P₁(t,f),P₂(t,f),…,P_n(t,f)]；

wherein n is the number of sub-region windows in the spectrogram, and P_i(t,f) represents the i-th sub-region window of the spectrogram;

calculating, according to the following formula, the direction change rate of the energy point at (t₀, f₀) in the time-frequency mixed domain of the sub-region window P_i(t,f):

$\left.\frac{\partial P_i}{\partial l}\right|_{(t_0,f_0)}=\left.\frac{\partial P_i}{\partial t}\right|_{(t_0,f_0)}\cos\theta+\left.\frac{\partial P_i}{\partial f}\right|_{(t_0,f_0)}\sin\theta$

wherein l represents the direction of the energy point at (t₀, f₀) in the time-frequency mixed domain of the sub-region window P_i(t,f); $\partial P_i/\partial t$, evaluated at (t₀, f₀), represents the rate of change with time at the energy point (t₀, f₀); $\partial P_i/\partial f$, evaluated at (t₀, f₀), represents the rate of change with frequency at the energy point (t₀, f₀); and θ is the angle between the time axis of the time-frequency plane of the spectrogram and the direction l of the energy point at (t₀, f₀).

5. The method of claim 4, wherein the establishing a formal context based on a kernel density estimation algorithm according to the direction information of each energy point in the spectrogram comprises:

performing, according to the following formula, kernel probability density estimation on the distribution of the direction change rate values of the energy point at (t₀, f₀) in the time-frequency mixed domain of the sub-region window P_i(t,f), to obtain the approximate distribution function f_h of the direction change rate of the energy point at (t₀, f₀):

$f_h(x)=\frac{1}{a_r h}\sum_{j=1}^{a_r} k_{el}\left(\frac{x-x_j}{h}\right)$

wherein x_1, ..., x_{a_r} are the a_r direction change rate values of the energy point at (t₀, f₀), h > 0 is the bandwidth, and k_el(·) is a kernel function;

performing, according to the following formula, kernel probability density estimation on the distribution of the direction values of the energy point at (t₀, f₀) in the time-frequency mixed domain of the sub-region window P_i(t,f), to obtain the approximate distribution function $\hat{f}_h$ of the direction values of the energy point at (t₀, f₀):

$\hat{f}_h(\theta)=\frac{1}{a_f h}\sum_{j=1}^{a_f} k_{el}\left(\frac{\theta-\theta_j}{h}\right)$

wherein θ_1, ..., θ_{a_f} are the a_f direction values of the energy point at (t₀, f₀);

calculating the correspondence between the energy points and the direction intervals contained therein according to the following formula:

$g_p I \psi_q=\begin{cases}1, & \psi_q\subseteq\Omega_p\\ 0, & \text{otherwise}\end{cases}$

wherein μ_p is the expectation of the direction value derived from the approximate distribution functions f_h and $\hat{f}_h$, σ_p² is the variance of the direction value, and Ω_p is the typical direction interval determined from μ_p and σ_p²; g_p represents the p-th energy point under the sub-region window P_i(t,f), namely the energy point at (t₀, f₀), where p = 1, 2, ..., N_i and N_i is the number of energy points within P_i(t,f); ψ_q is one of the direction intervals obtained by quantizing, at equal intervals, the value range of all direction values of all energy points in P_i(t,f), i.e., an attribute that the energy point g_p may contain, where q = 1, 2, ..., b and b is the number of equally spaced direction intervals; I is the relation between objects and attributes, and g_p I ψ_q expresses the correspondence between the energy point g_p and the attribute ψ_q;

establishing the formal context K = (G, M, I) by taking the energy points under the sub-region window P_i(t,f) as objects and the direction intervals obtained by quantizing, at equal intervals, the value range of all direction values of all energy points in P_i(t,f) as attributes, wherein G represents the set of all energy points in P_i(t,f), and M is the set of those direction intervals.

6. The method according to claim 5, wherein the establishing a directional symbiotic attribute topological graph of the voice signal according to the formal context comprises:

computing, according to the following formula, the co-occurrence strength matrix Edge_i(ψ_u, ψ_v) on the edges between the attribute pairs in the formal context:

$Edge_i(\psi_u,\psi_v)=\frac{\#\left(g(\psi_u)\cap g(\psi_v)\right)}{\#\left(g(\psi_u)\right)}$

wherein g(ψ_u) is the set of energy points whose correspondence with the direction interval ψ_u in the formal context is 1, and g(ψ_v) is the set of energy points whose correspondence with the direction interval ψ_v in the formal context is 1; #(g(ψ_u)) denotes the number of energy points in g(ψ_u); #(g(ψ_u)∩g(ψ_v)) denotes the number of energy points in g(ψ_u)∩g(ψ_v); and u, v = 1, 2, ..., b;

constructing the directional symbiotic attribute topological graph of the voice signal according to Edge_i(ψ_u, ψ_v).

7. An apparatus for obtaining characteristics of a speech signal, comprising:

the acquisition module is configured to perform time-frequency processing on a voice signal to be analyzed to acquire a spectrogram of the voice signal;

the statistical module is configured to statistically acquire direction information of each energy point in the spectrogram;

the first establishing module is configured to establish a formal context based on a kernel density estimation algorithm according to the direction information of each energy point in the spectrogram, wherein the formal context takes the energy points as objects and the direction intervals contained in the energy points as attributes, and is used for describing the correspondence between the energy points in the spectrogram and the direction intervals contained in the energy points;

and the second establishing module is configured to establish a directional symbiotic attribute topological graph of the voice signal according to the formal context, wherein the directional symbiotic attribute topological graph is used for describing the symbiotic relationships between attribute pairs in the formal context.

Technical Field

The present disclosure relates to the field of data processing technologies, and in particular, to a method and an apparatus for obtaining characteristics of a voice signal.

Background

Speech, the most common means of communication between people, carries a great deal of useful and important information, such as the speaker's gender, age, emotion, and whether the speaker's condition is stable. Because speech is rich in information and speech data are easy to acquire, extracting different features from speech to represent the required information has considerable application prospects in many fields, such as artificial intelligence and medical diagnosis. In the medical field in particular, speech is convenient to acquire, non-contact and non-invasive, which gives it clear advantages over other signals in operation and acquisition; as a result, diagnosing related diseases through speech has attracted great attention.

Among existing features that can represent deep information in a speech signal, traditional acoustic features have clear physical meaning and strong interpretability, but they are extracted only in the time domain or only in the frequency domain, which ignores the direct influence of other factors in speech and leads to low detection accuracy. Speech features obtained through deep learning achieve high detection accuracy, but deep learning models have poor interpretability and behave as black boxes; moreover, data sets in the medical field are generally small, so speech analysis based on deep learning carries a risk of overfitting.

Disclosure of Invention

The embodiment of the disclosure provides a method and a device for acquiring characteristics of a voice signal.

In a first aspect, an embodiment of the present disclosure provides a method for obtaining features of a speech signal, including:

performing time-frequency processing on a voice signal to be analyzed to obtain a spectrogram of the voice signal;

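statistically acquiring direction information of each energy point in the spectrogram;

establishing a formal context based on a kernel density estimation algorithm according to the direction information of each energy point in the spectrogram, wherein the formal context takes the energy points as objects and the direction intervals contained in the energy points as attributes, and is used for describing the correspondence between the energy points in the spectrogram and the direction intervals contained in the energy points;

and establishing a directional symbiotic attribute topological graph of the voice signal according to the formal context, wherein the directional symbiotic attribute topological graph is used for describing the symbiotic relationships between attribute pairs in the formal context.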

Further, the performing time-frequency processing on the voice signal to be analyzed to obtain a spectrogram of the voice signal includes:

performing a short-time Fourier transform on the speech signal according to the following formula:

$STFT(t,f)=\sum_{u=t}^{t+L-1} x(u)\,w(u-t)\,e^{-j2\pi f u}$

wherein x(u) is the speech signal, w(u-t) is a window function, t represents time, f represents frequency, u-t ∈ [0, L-1], and L is the length of the window function;

taking P(t,f) as the expression of the spectrogram of the voice signal, and calculating P(t,f) according to the following formula:

$P(t,f)=|STFT(t,f)|^2$

wherein P(t,f) represents the energy value at time t and frequency f.

Further, w(u-t) is a Hamming window function.

Further, the obtaining direction information of each energy point in the spectrogram through statistics includes:

performing sliding window processing on the spectrogram according to the following formula:

P(t,f)＝[P₁(t,f),P₂(t,f),…,P_n(t,f)]；

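wherein n is the number of sub-region windows in the spectrogram, and P_i(t,f) represents the i-th sub-region window of the spectrogram;

calculating, according to the following formula, the direction change rate of the energy point at (t₀, f₀) in the time-frequency mixed domain of the sub-region window P_i(t,f):

$\left.\frac{\partial P_i}{\partial l}\right|_{(t_0,f_0)}=\left.\frac{\partial P_i}{\partial t}\right|_{(t_0,f_0)}\cos\theta+\left.\frac{\partial P_i}{\partial f}\right|_{(t_0,f_0)}\sin\theta$

wherein l is the direction of the energy point at (t₀, f₀), and θ is the angle between the time axis of the time-frequency plane of the spectrogram and the direction l.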

Further, the establishing a formal context based on a kernel density estimation algorithm according to the direction information of each energy point in the spectrogram includes:

performing, according to the following formula, kernel probability density estimation on the distribution of the direction change rate values of the energy point at (t₀, f₀) in the time-frequency mixed domain of the sub-region window P_i(t,f), to obtain the approximate distribution function f_h of the direction change rate of the energy point at (t₀, f₀):

$f_h(x)=\frac{1}{a_r h}\sum_{j=1}^{a_r} k_{el}\left(\frac{x-x_j}{h}\right)$

performing, according to the following formula, kernel probability density estimation on the distribution of the direction values of the energy point at (t₀, f₀), to obtain the approximate distribution function $\hat{f}_h$ of the direction values of the energy point at (t₀, f₀):

$\hat{f}_h(\theta)=\frac{1}{a_f h}\sum_{j=1}^{a_f} k_{el}\left(\frac{\theta-\theta_j}{h}\right)$

calculating the correspondence between the energy points and the direction intervals contained therein according to the following formula:

$g_p I \psi_q=\begin{cases}1, & \psi_q\subseteq\Omega_p\\ 0, & \text{otherwise}\end{cases}$

wherein μ_p is the expectation of the direction value derived from the approximate distribution functions f_h and $\hat{f}_h$, σ_p² is the variance of the direction value, and Ω_p is the typical direction interval determined from μ_p and σ_p²; g_p represents the p-th energy point under the sub-region window P_i(t,f), namely the energy point at (t₀, f₀), where p = 1, 2, ..., N_i and N_i is the number of energy points within P_i(t,f); ψ_q is one of the direction intervals obtained by quantizing, at equal intervals, the value range of all direction values of all energy points in P_i(t,f), i.e., an attribute that the energy point g_p may contain, where q = 1, 2, ..., b and b is the number of equally spaced direction intervals; I is the relation between objects and attributes, and g_p I ψ_q expresses the correspondence between the energy point g_p and the attribute ψ_q;

establishing the formal context K = (G, M, I) by taking the energy points under the sub-region window P_i(t,f) as objects and the direction intervals obtained by quantizing, at equal intervals, the value range of all direction values of all energy points in P_i(t,f) as attributes, wherein G represents the set of all energy points in P_i(t,f), and M is the set of those direction intervals.

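Further, the establishing a directional symbiotic attribute topological graph of the voice signal according to the formal context includes:

computing, according to the following formula, the co-occurrence strength matrix Edge_i(ψ_u, ψ_v) on the edges between the attribute pairs in the formal context:

$Edge_i(\psi_u,\psi_v)=\frac{\#\left(g(\psi_u)\cap g(\psi_v)\right)}{\#\left(g(\psi_u)\right)}$

wherein g(ψ_u) and g(ψ_v) are the sets of energy points whose correspondence with the direction intervals ψ_u and ψ_v, respectively, is 1, # denotes the number of energy points in a set, and u, v = 1, 2, ..., b;

constructing the directional symbiotic attribute topological graph of the voice signal according to Edge_i(ψ_u, ψ_v).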

In a second aspect, an embodiment of the present disclosure provides a feature obtaining apparatus for a speech signal, including:

the acquisition module is configured to perform time-frequency processing on a voice signal to be analyzed to acquire a spectrogram of the voice signal;

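the statistical module is configured to statistically acquire direction information of each energy point in the spectrogram;

the first establishing module is configured to establish a formal context based on a kernel density estimation algorithm according to the direction information of each energy point in the spectrogram, wherein the formal context takes the energy points as objects and the direction intervals contained in the energy points as attributes, and is used for describing the correspondence between the energy points in the spectrogram and the direction intervals contained in the energy points;

and the second establishing module is configured to establish a directional symbiotic attribute topological graph of the voice signal according to the formal context, wherein the directional symbiotic attribute topological graph is used for describing the symbiotic relationships between attribute pairs in the formal context.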

The technical scheme provided by the embodiment of the disclosure can have the following beneficial effects:

According to the technical solution, the directions of energy points in different time-frequency mixed domains are counted, so that the rich direction information in the voice signal to be analyzed and the symbiotic relationships among the directions are extracted, which ensures the completeness of the information; a formal context construction method based on kernel density estimation then converts the voice signal into a direction attribute topology and visually represents the symbiotic relationships between pairs of direction attributes of the voice signal. Representing the voice signal as a graph conveys more detailed direction information with high detection precision and strong interpretability; this overcomes the technical shortcomings of traditional acoustic features and deep-learning features, and makes it possible to effectively analyze cases in which the direction values of the energy points in the spectrogram of the voice signal are complex and changeable.

It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.

Drawings

Other features, objects, and advantages of the present disclosure will become more apparent from the following detailed description of non-limiting embodiments when taken in conjunction with the accompanying drawings. In the drawings:

Fig. 1 illustrates a flowchart of a feature acquisition method of a voice signal according to an embodiment of the present disclosure;

Fig. 2 shows a schematic diagram of a formal context according to an embodiment of the present disclosure;

Fig. 3 is a schematic diagram illustrating a directional symbiotic attribute topological graph according to an embodiment of the present disclosure;

Fig. 4 is a block diagram illustrating the structure of a feature acquisition apparatus for a speech signal according to an embodiment of the present disclosure.

Detailed Description

Hereinafter, exemplary embodiments of the present disclosure will be described in detail with reference to the accompanying drawings so that those skilled in the art can easily implement them. Also, for the sake of clarity, parts not relevant to the description of the exemplary embodiments are omitted in the drawings.

In the present disclosure, it is to be understood that terms such as "including" or "having," etc., are intended to indicate the presence of the disclosed features, numbers, steps, behaviors, components, parts, or combinations thereof, and are not intended to preclude the possibility that one or more other features, numbers, steps, behaviors, components, parts, or combinations thereof may be present or added.

It should be further noted that the embodiments and features of the embodiments in the present disclosure may be combined with each other without conflict. The present disclosure will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.

Fig. 1 shows a flowchart of a feature acquisition method of a voice signal according to an embodiment of the present disclosure. As shown in Fig. 1, the feature acquisition method of the voice signal may include the following steps S101 to S104:

in step S101, performing time-frequency processing on a speech signal to be analyzed to obtain a spectrogram of the speech signal;

in step S102, direction information of each energy point in the spectrogram is statistically acquired;

in step S103, establishing a formal context based on a kernel density estimation algorithm according to the direction information of each energy point in the spectrogram, where the formal context takes the energy points as objects and the direction intervals contained in the energy points as attributes, and is used to describe the correspondence between the energy points in the spectrogram and the direction intervals contained in the energy points;

in step S104, according to the formal context, establishing a directional symbiotic attribute topological graph of the voice signal, where the directional symbiotic attribute topological graph is used to describe the symbiotic relationships between attribute pairs in the formal context.

In an embodiment of the present disclosure, the spectrogram is a graph representing how the speech energy value varies with frequency and time, and it conveys three-dimensional information: in the time-frequency coordinate plane, the horizontal axis is time and the vertical axis is frequency, and the Z axis perpendicular to this plane is the energy value. The spectrogram thus gives the energy value of the energy point at a given time and a given frequency in the time-frequency mixed domain.

In an embodiment of the present disclosure, the directions of energy points in different time-frequency mixed domains may be counted to extract the rich direction information in the speech signal to be analyzed and the symbiotic relationships among the directions, which ensures the completeness of the information; a formal context construction method based on kernel density estimation then converts the speech signal into its direction attribute topology and visually represents the symbiotic relationships between pairs of direction attributes of the speech signal. Representing the voice signal as a graph conveys more detailed direction information with high detection precision and strong interpretability; this overcomes the technical shortcomings of traditional acoustic features and deep-learning features, and makes it possible to effectively analyze cases in which the direction values of the energy points in the spectrogram of the voice signal are complex and changeable.

In an embodiment of the present disclosure, step S101 of the method, namely performing time-frequency processing on the voice signal to be analyzed to obtain a spectrogram of the voice signal, may include the following steps A1 and A2.

In step A1, the speech signal is subjected to a short-time Fourier transform according to the following formula:

$STFT(t,f)=\sum_{u=t}^{t+L-1} x(u)\,w(u-t)\,e^{-j2\pi f u}$

wherein x(u) is the speech signal, w(u-t) is a window function, t represents time, f represents frequency, u-t ∈ [0, L-1], and L is the length of the window function. For example, x(u) may be a speech signal to be analyzed with a duration of 1 second, and L may take the value 256, in which case u-t ∈ [0, 255].

In step A2, with P(t,f) as the expression of the spectrogram of the speech signal, P(t,f) is calculated according to the following formula:

$P(t,f)=|STFT(t,f)|^2$

wherein P(t,f) represents the energy value at time t and frequency f.
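As an illustration of steps A1 and A2, the following Python sketch computes a discrete power spectrogram. It is a minimal sketch under stated assumptions rather than the disclosed implementation: the signal is a list of samples, the frame advance `hop` is a hypothetical parameter that the text does not specify, only the non-negative frequency bins k = 0, ..., L/2 are kept, and the phase is measured from the start of each frame instead of from absolute time u, which changes the phase of STFT(t, f) but not P(t, f) = |STFT(t, f)|².

```python
import cmath
import math


def power_spectrogram(x, window, hop):
    """Return P[t][k] = |STFT(t, k)|^2 for each frame t and frequency bin k.

    x      -- list of real samples (the speech signal x(u))
    window -- list of L window coefficients w(0), ..., w(L-1)
    hop    -- frame advance in samples (hypothetical parameter)
    """
    L = len(window)
    frames = []
    for start in range(0, len(x) - L + 1, hop):
        row = []
        for k in range(L // 2 + 1):  # non-negative frequency bins only
            acc = 0j
            for n in range(L):  # n plays the role of u - t in [0, L-1]
                acc += x[start + n] * window[n] * cmath.exp(-2j * math.pi * k * n / L)
            row.append(abs(acc) ** 2)  # energy value P(t, f)
        frames.append(row)
    return frames


# A cosine completing exactly 4 cycles per 32-sample frame, rectangular window.
L = 32
x = [math.cos(2 * math.pi * 4 * n / L) for n in range(64)]
P = power_spectrogram(x, [1.0] * L, hop=16)
print(len(P), max(range(len(P[0])), key=lambda k: P[0][k]))  # prints: 3 4
```

The triple loop is a direct transcription of the sum for readability; a practical implementation would use an FFT.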

In this embodiment, the window function may be a Hamming window, a Hanning window, a rectangular window, a triangular window, a Blackman window, an exponential window, and so on.

In one embodiment of the present disclosure, to reduce spectral leakage, w(u-t) is a Hamming window function, whose expression is as follows:

$w(u-t)=0.54-0.46\cos\left(\frac{2\pi(u-t)}{L-1}\right),\quad u-t\in[0,L-1]$

L may take the typical value 256, although other values may also be used; this is not limited herein.
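A direct Python rendering of the Hamming window w(n) = 0.54 - 0.46·cos(2πn/(L-1)), with n = u - t = 0, ..., L-1, is sketched below; the function name `hamming` is illustrative. The resulting coefficient list can serve as the window of any frame-based STFT routine.

```python
import math


def hamming(L):
    """Hamming window w(n) = 0.54 - 0.46*cos(2*pi*n/(L-1)), n = 0, ..., L-1."""
    if L < 2:
        raise ValueError("window length L must be at least 2")
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (L - 1)) for n in range(L)]


w = hamming(256)  # L = 256, the typical value mentioned above
# Both endpoints taper to 0.08 and the window is symmetric about its centre.
print(round(w[0], 2), round(w[-1], 2))  # prints: 0.08 0.08
```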

In an embodiment of the present disclosure, step S102 of the method, namely statistically acquiring the direction information of each energy point in the spectrogram, may include the following steps B1 and B2.

In step B1, the spectrogram is subjected to sliding window processing, and the formula is as follows:

P(t,f)＝[P₁(t,f),P₂(t,f),…,P_n(t,f)]

wherein P_i(t,f) represents the i-th sub-region window of the spectrogram, i = 1, 2, ..., n, and n, the number of sub-region windows in the spectrogram, is an integer greater than 1 determined by the product of the horizontal and vertical sliding-window parameters of the spectrogram. For example, if the horizontal sliding-window parameter of the spectrogram is 12 and the vertical sliding-window parameter is 16, then n = 12 × 16 = 192; the sliding-window processing then divides the spectrogram of the speech signal into 192 sub-region windows, and the spectrogram may be represented as follows:

$P(t,f)=[P_1(t,f),P_2(t,f),\ldots,P_{192}(t,f)]$
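A minimal sketch of the sliding-window division, assuming non-overlapping tiles of equal size (the text does not say whether the windows overlap) and a spectrogram stored as a list of time frames P[t][f]. The function name `split_into_subwindows` is hypothetical.

```python
def split_into_subwindows(P, n_time, n_freq):
    """Divide spectrogram P[t][f] into n_time * n_freq equal, non-overlapping tiles.

    Tiles are returned in row-major order (time index outer, frequency inner),
    so the list plays the role of [P_1(t, f), ..., P_n(t, f)] with n = n_time * n_freq.
    """
    T, F = len(P), len(P[0])
    if T % n_time or F % n_freq:
        raise ValueError("spectrogram size must be divisible by the window grid")
    dt, df = T // n_time, F // n_freq
    tiles = []
    for a in range(n_time):
        for b in range(n_freq):
            tiles.append([row[b * df:(b + 1) * df] for row in P[a * dt:(a + 1) * dt]])
    return tiles


# Toy 24 x 32 spectrogram whose entries encode their own (t, f) position.
P = [[100 * t + f for f in range(32)] for t in range(24)]
tiles = split_into_subwindows(P, 12, 16)
print(len(tiles))  # 12 x 16 = 192 sub-region windows
```

Overlapping windows would only change the slice bounds; the row-major ordering is likewise a choice made for illustration.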

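In step B2, the direction change rate of the energy point at (t₀, f₀) in the time-frequency mixed domain of the sub-region window P_i(t,f) is calculated according to the following formula:

$\left.\frac{\partial P_i}{\partial l}\right|_{(t_0,f_0)}=\left.\frac{\partial P_i}{\partial t}\right|_{(t_0,f_0)}\cos\theta+\left.\frac{\partial P_i}{\partial f}\right|_{(t_0,f_0)}\sin\theta$

wherein l represents the direction of the energy point at (t₀, f₀), $\partial P_i/\partial t$ and $\partial P_i/\partial f$ are the rates of change with time and with frequency at that energy point, and θ is the angle between the time axis of the time-frequency plane of the spectrogram and the direction l.

In this embodiment, the energy point at (t₀, f₀) in the time-frequency mixed domain is the energy point at time t₀ and frequency f₀. Using the above formula, the direction change rates of all energy points in the time-frequency mixed domain of the sub-region window P_i(t,f) can be obtained. For example, let D_i denote the set of the direction change rates of all energy points in the time-frequency mixed domain of P_i(t,f); if that domain spans t₀ ∈ (0, 5.21) ms and f₀ ∈ (0, 500) Hz, then D_i can be expressed as:

$D_i=\left\{\left.\frac{\partial P_i}{\partial l}\right|_{(t_0,f_0)} : t_0\in(0,5.21)\,\text{ms},\ f_0\in(0,500)\,\text{Hz}\right\}$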

in this embodiment, for each sub-region window in the spectrogram, the direction change rate of all energy points in the time-frequency mixing domain under the sub-region window may be calculated, so that the direction change rate of all energy points in the spectrogram may be calculated.
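The direction change rate can be approximated numerically. The sketch below assumes the directional-derivative form ∂P/∂l = (∂P/∂t)·cos θ + (∂P/∂f)·sin θ with θ measured from the time axis, estimates the partial derivatives by central differences on the sampled grid, and sweeps θ over 0° to 179° in 1° steps. The finite-difference scheme, the angular step and the function names are illustrative assumptions.

```python
import math


def direction_change_rate(P, t0, f0, theta_deg, dt=1.0, df=1.0):
    """Directional derivative of P at interior grid point (t0, f0) along angle theta.

    P[t][f] is a sampled spectrogram tile; dt and df are the grid spacings.
    """
    dP_dt = (P[t0 + 1][f0] - P[t0 - 1][f0]) / (2 * dt)  # rate of change with time
    dP_df = (P[t0][f0 + 1] - P[t0][f0 - 1]) / (2 * df)  # rate of change with frequency
    theta = math.radians(theta_deg)
    return dP_dt * math.cos(theta) + dP_df * math.sin(theta)


def rates_over_directions(P, t0, f0, step_deg=1):
    """Return (theta, rate) pairs for theta = 0, step_deg, ... below 180 degrees."""
    return [(th, direction_change_rate(P, t0, f0, th)) for th in range(0, 180, step_deg)]


# A plane P = 2t + 3f has gradient (2, 3) everywhere.
P = [[2 * t + 3 * f for f in range(5)] for t in range(5)]
print(direction_change_rate(P, 2, 2, 0))  # rate along the time axis: 2.0
```

Repeating this for every interior point of every sub-region window yields the direction change rates of all energy points in the spectrogram.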

In an embodiment of the present disclosure, step S103 of the above method, namely establishing a formal context based on a kernel density estimation algorithm according to the direction information of each energy point in the spectrogram, includes the following steps C1 to C4.

In step C1, kernel probability density estimation is performed, according to the following formula, on the distribution of the direction change rate values of the energy point at (t₀, f₀) in the time-frequency mixed domain of the sub-region window P_i(t,f), to obtain the approximate distribution function f_h of the direction change rate of that energy point:

$f_h(x)=\frac{1}{a_r h}\sum_{j=1}^{a_r} k_{el}\left(\frac{x-x_j}{h}\right)$

wherein x_1, ..., x_{a_r} are a_r independent and identically distributed sample points, a_r is the number of all direction change rates of the energy point at (t₀, f₀) in the time-frequency mixed domain of P_i(t,f), h > 0 is a smoothing coefficient called the bandwidth, which is determined adaptively from the data, and k_el(·) is a kernel function. The energy point at (t₀, f₀) in the time-frequency mixed domain of P_i(t,f) has multiple direction change rate values, some equal and some unequal, a_r values in total.

In step C2, kernel probability density estimation is performed, according to the following formula, on the distribution of the direction values of the energy point at (t₀, f₀) in the time-frequency mixed domain of the sub-region window P_i(t,f), to obtain the approximate distribution function $\hat{f}_h$ of the direction values of that energy point:

$\hat{f}_h(\theta)=\frac{1}{a_f h}\sum_{j=1}^{a_f} k_{el}\left(\frac{\theta-\theta_j}{h}\right)$

wherein θ_1, ..., θ_{a_f} are a_f independent and identically distributed sample points, and a_f is the number of all direction values of the energy point at (t₀, f₀) in the time-frequency mixed domain of P_i(t,f). The angle between the coordinate axis of the time-frequency plane of the spectrogram and the direction l of the energy point at (t₀, f₀) is a direction value of that energy point; the energy point at (t₀, f₀) has multiple direction values, some equal and some unequal, a_f values in total.

In this embodiment, in the functional expression of the kernel function k_el, x is the estimated data: in step C1, x is a direction change rate value of the energy point, and in step C2, x is a direction value of the energy point.
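Steps C1 and C2 can be sketched with a standard kernel density estimator. The expression of k_el is not given in this text, so the sketch assumes a Gaussian kernel and, when no bandwidth is supplied, the normal-reference rule of thumb h = 1.06·s·n^(-1/5) as one possible data-adaptive choice; both are assumptions rather than the disclosed choices. The same function serves for C1 (samples are direction change rates) and C2 (samples are direction values).

```python
import math


def gaussian_kernel(u):
    """Standard normal density, used here as a stand-in for k_el."""
    return math.exp(-0.5 * u * u) / math.sqrt(2 * math.pi)


def reference_bandwidth(samples):
    """Normal-reference rule of thumb: h = 1.06 * s * n**(-1/5)."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples to estimate the spread")
    mean = sum(samples) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in samples) / (n - 1))
    if s == 0:
        raise ValueError("samples have zero spread; choose h explicitly")
    return 1.06 * s * n ** (-0.2)


def kde(samples, h=None, kernel=gaussian_kernel):
    """Return the estimate f_h(x) = 1/(n*h) * sum_j kernel((x - x_j) / h)."""
    if h is None:
        h = reference_bandwidth(samples)
    n = len(samples)

    def density(x):
        return sum(kernel((x - xj) / h) for xj in samples) / (n * h)

    return density


# Direction values (degrees) of one hypothetical energy point.
f_dir = kde([10.0, 20.0, 20.0, 30.0, 45.0], h=5.0)
print(f_dir(20.0) > f_dir(45.0))  # True: more samples lie near 20 degrees
```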

In step C3, the correspondence between the energy points and the direction intervals contained therein is calculated according to the following formula:

$g_p I \psi_q=\begin{cases}1, & \psi_q\subseteq\Omega_p\\ 0, & \text{otherwise}\end{cases}$

wherein μ_p is the expectation of the direction value derived from the approximate distribution functions obtained in steps C1 and C2, and σ_p² is the variance of the direction value; the specific calculation of μ_p and σ_p² is well known to those skilled in the art and is not described in detail herein. Ω_p is the typical direction interval determined from the expectation and the variance. g_p represents the p-th energy point under the sub-region window P_i(t,f), namely the energy point at (t₀, f₀), where p = 1, 2, ..., N_i and N_i is the number of energy points within P_i(t,f); ψ_q is one of the direction intervals obtained by quantizing, at equal intervals, the value range of all direction values of all energy points in P_i(t,f), i.e., an attribute that the energy point g_p may contain, where q = 1, 2, ..., b and b is the number of equally spaced direction intervals; I is the relation between objects and attributes, and g_p I ψ_q expresses the correspondence between the energy point g_p and the attribute ψ_q;
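The text leaves the construction of the typical direction interval to the skilled reader. As one possible reading, offered only as an assumption, the sketch below computes the expectation and variance of the direction value from a density over 0° to 180° by midpoint-rule integration and takes μ ± k·σ as the typical interval. The choice k = 1, the integration step and the function name are all hypothetical.

```python
import math


def typical_interval(density, lo=0.0, hi=180.0, step=0.1, k=1.0):
    """Return (mu - k*sigma, mu + k*sigma) for a direction-value density on [lo, hi].

    The density is renormalised over [lo, hi], so it need not integrate to 1 there.
    """
    n = int(round((hi - lo) / step))
    xs = [lo + step * (i + 0.5) for i in range(n)]  # midpoints of the grid cells
    ws = [density(x) * step for x in xs]           # probability mass per cell
    mass = sum(ws)
    mu = sum(w * x for w, x in zip(ws, xs)) / mass               # expectation
    var = sum(w * (x - mu) ** 2 for w, x in zip(ws, xs)) / mass  # variance
    sigma = math.sqrt(var)
    return mu - k * sigma, mu + k * sigma


# A bump centred at 40 degrees with spread 10 gives roughly (30, 50).
print(typical_interval(lambda x: math.exp(-0.5 * ((x - 40.0) / 10.0) ** 2)))
```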

In step C4, the formal context K = (G, M, I) is established by taking the energy points under the sub-region window P_i(t,f) as objects and the direction intervals obtained by quantizing, at equal intervals, the value range of all direction values of all energy points in P_i(t,f) as attributes, wherein G represents the set of all energy points in P_i(t,f), and M is the set of those direction intervals.

In this embodiment, G in the formal context K = (G, M, I) represents the set of all energy points in the sub-region window P_i(t,f); assuming a total of 64 energy points, G = {1, 2, 3, ..., 63, 64}. M is the set of direction interval attributes obtained by quantizing the value range of all energy point direction values at equal intervals; assuming that the direction values range over 0°-180° and that this range is quantized at equal intervals into 9 direction intervals, M = {ψ₁, ψ₂, ..., ψ₉} = {0°-20°, 20°-40°, 40°-60°, 60°-80°, 80°-100°, 100°-120°, 120°-140°, 140°-160°, 160°-180°}. I is the relation between objects and attributes.
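The equal-interval quantization that produces the attribute set M can be sketched as follows; the function name is hypothetical, intervals are (lower, upper) pairs in degrees, and the objects G are simply numbered 1 to 64 as in the example.

```python
def equal_intervals(lo, hi, b):
    """Quantise [lo, hi] into b equal direction intervals psi_1, ..., psi_b."""
    width = (hi - lo) / b
    return [(lo + q * width, lo + (q + 1) * width) for q in range(b)]


M = equal_intervals(0.0, 180.0, 9)
G = list(range(1, 65))  # 64 energy points, numbered as in the example
print(M[0], M[-1])  # prints: (0.0, 20.0) (160.0, 180.0)
```

The incidence relation I can then be stored, for example, as a mapping from each object in G to the set of attribute indices it contains.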

By way of example, Fig. 2 shows a schematic diagram of a formal context according to an embodiment of the present disclosure, in which G = {1, 2, 3, ..., 7, 8} and M = {ψ₁, ψ₂, ..., ψ₈} = {0°-20°, 20°-40°, 40°-60°, 60°-80°, 80°-100°, 100°-120°, 120°-140°, 140°-160°}. Suppose the typical direction interval of the p-th energy point g_p, calculated according to steps C1 to C3, is 20°-60°. The candidate attributes of g_p are the 8 direction intervals ψ₁ to ψ₈ in M. Since ψ₁ = 0°-20° does not lie within the typical direction interval 20°-60°, g_p I ψ₁ = 0; since ψ₂ = 20°-40° and ψ₃ = 40°-60° both lie within 20°-60°, g_p I ψ₂ = 1 and g_p I ψ₃ = 1; and since ψ₄ to ψ₈ do not lie within 20°-60°, g_p I ψ₄ = g_p I ψ₅ = g_p I ψ₆ = g_p I ψ₇ = g_p I ψ₈ = 0. By repeating this process, the correspondence between each of the 8 energy points in G = {1, 2, 3, ..., 8} and the direction intervals in M can be obtained.
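The containment test used in this example, in which g_p I ψ_q = 1 exactly when the interval ψ_q lies inside the typical direction interval of g_p, can be sketched as follows. Intervals are (lower, upper) pairs in degrees, and the function name is illustrative.

```python
def incidence_row(typical, intervals):
    """g_p I psi_q = 1 if interval psi_q lies inside the typical interval, else 0."""
    t_lo, t_hi = typical
    return [1 if t_lo <= a and b <= t_hi else 0 for a, b in intervals]


M = [(20.0 * q, 20.0 * (q + 1)) for q in range(8)]  # psi_1 .. psi_8 from the example
print(incidence_row((20.0, 60.0), M))  # prints: [0, 1, 1, 0, 0, 0, 0, 0]
```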

In this embodiment, a formal context K = (G, M, I) may be established for each sub-region window according to the above steps; assuming there are 192 sub-region windows, 192 formal contexts are established.

In an embodiment of the present disclosure, the step S104 in the above method, namely, the step of establishing the directional symbiotic attribute topological graph of the voice signals according to the form context, includes the following steps D1 to D2.

In step D1, the co-occurrence strength matrix Edge on the Edge between the attribute pairs in the formal background is calculated according to the following formula_i(ψ_u,ψ_v)：

Wherein g (psi)_u) For the interval psi with the direction in the form background_uThe corresponding relation of (c) is a set of energy points, g (ψ), of 1_v) For the interval psi with the direction in the form background_vThe corresponding relation of (1) is a set of energy points; # (g (psi)_u) Denotes g (ψ)_u) The number of energy points in; # (g (psi)_u)∩g(ψ_v) Denotes g (ψ)_u)∩g(ψ_v) Number of energy points in, u, v are takenA value of 1,2,. said.. said., b, said b being the number of equally spaced quantized directional intervals;

Taking the formal context shown in fig. 2 as an example, with u = 1 and v = 2: the energy points in direction interval ψ₁ are 1, 2, 3, 5 and 6, and the energy points in direction interval ψ₂ are 3, 4, 6, 7 and 8, so g(ψ₁)∩g(ψ₂) = {3, 6} and #(g(ψ₁)∩g(ψ₂)) = 2. Proceeding in the same way for the other attribute pairs, the Edge_i(ψ_u, ψ_v) corresponding to the sub-region window P_i(t, f) can be calculated as:
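The set arithmetic in this worked example can be checked in a few lines of Python. The extents are taken from the fig. 2 example above; the final line additionally evaluates the ratio #(g(ψ₁)∩g(ψ₂)) / #(g(ψ₁)), which is only an assumed form of the unreproduced formula and not a value given in the text.

```python
# Extents of psi_1 and psi_2 as listed in the fig. 2 example.
g_psi1 = {1, 2, 3, 5, 6}
g_psi2 = {3, 4, 6, 7, 8}

common = g_psi1 & g_psi2
print(sorted(common), len(common))  # [3, 6] 2

# ASSUMPTION: Edge(psi_1, psi_2) = #(common) / #(g(psi_1)); not stated in the source.
print(len(common) / len(g_psi1))  # 0.4
```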

In step D2, the directional symbiotic attribute topological graph of the voice signal is constructed according to Edge_i(ψ_u, ψ_v).
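Step D2 does not spell out how the graph is drawn from Edge_i(ψ_u, ψ_v). One straightforward reading, sketched below in Python as an assumption, treats each direction interval as a node and each attribute pair with positive co-occurrence strength as a weighted edge. Edges are kept directed because the Edge matrix may be asymmetric; `build_topology_graph`, its `threshold` parameter, and the example matrix are hypothetical.

```python
def build_topology_graph(edge, labels, threshold=0.0):
    """Weighted adjacency dict for a direction co-occurrence attribute graph.

    Nodes are direction intervals (labels[u]). A directed edge psi_u -> psi_v with
    weight edge[u][v] is kept when the weight exceeds the hypothetical `threshold`.
    Self-pairs (u == v) are skipped.
    """
    graph = {label: {} for label in labels}
    for u, src in enumerate(labels):
        for v, dst in enumerate(labels):
            if u != v and edge[u][v] > threshold:
                graph[src][dst] = edge[u][v]
    return graph


# Hypothetical co-occurrence matrix for three direction intervals.
edge = [[1.0, 0.5, 0.0], [0.25, 1.0, 0.0], [0.0, 0.0, 0.0]]
graph = build_topology_graph(edge, ["psi_1", "psi_2", "psi_3"])
print(graph)  # {'psi_1': {'psi_2': 0.5}, 'psi_2': {'psi_1': 0.25}, 'psi_3': {}}
```

A plain adjacency dict keeps the sketch dependency-free; the same structure could be handed to a graph library for drawing.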

Fig. 3 is a schematic diagram of a directional symbiotic attribute topological graph constructed according to an embodiment of the present disclosure. The directional symbiotic attribute topological graph shown in fig. 3 visually represents the symbiotic relationships between pairs of direction attributes of the voice signal.

In this embodiment, one directional symbiotic attribute topological graph can be established from the formal context of each sub-region window; assuming there are 192 sub-region windows, 192 directional symbiotic attribute topological graphs are established.

The following are embodiments of the disclosed apparatus that may be used to perform embodiments of the disclosed methods.

Fig. 4 is a block diagram of an apparatus for acquiring characteristics of a voice signal according to an embodiment of the present disclosure; the apparatus may be implemented as part or all of an electronic device through software, hardware, or a combination of the two. As shown in fig. 4, the acquisition apparatus includes:

an obtaining module 401, configured to perform time-frequency processing on a voice signal to be analyzed, and obtain a spectrogram of the voice signal;

a statistics module 402, configured to obtain, by statistics, the direction information of each energy point in the spectrogram;

a first establishing module 403, configured to establish a formal context based on a kernel density estimation algorithm according to the direction information of each energy point in the spectrogram, where the formal context takes the energy points as objects and the direction intervals to which the energy points belong as attributes, and is used to describe the correspondence between the energy points in the spectrogram and the direction intervals to which they belong;

a second establishing module 404, configured to establish a directional symbiotic attribute topological graph of the voice signal according to the formal context, where the directional symbiotic attribute topological graph is used to describe the symbiotic relationship between attribute pairs in the formal context.
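The four modules form a pipeline in which each module's output feeds the next. The Python sketch below shows only that wiring, as one possible software realization; the class name, field names, and the placeholder callables in the usage lines are hypothetical stand-ins and do not perform the actual time-frequency, direction, kernel density, or graph computations.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class FeatureAcquisitionApparatus:
    """Hypothetical wiring of modules 401-404; each module is an injected callable."""
    obtaining_module: Callable[[Any], Any]            # 401: voice signal -> spectrogram
    statistics_module: Callable[[Any], Any]           # 402: spectrogram -> direction information
    first_establishing_module: Callable[[Any], Any]   # 403: direction information -> formal context
    second_establishing_module: Callable[[Any], Any]  # 404: formal context -> topological graph

    def acquire(self, signal):
        """Run the four modules in order and return the topological graph."""
        spectrogram = self.obtaining_module(signal)
        directions = self.statistics_module(spectrogram)
        context = self.first_establishing_module(directions)
        return self.second_establishing_module(context)


# Placeholder stand-ins only, to show the data flow between modules.
apparatus = FeatureAcquisitionApparatus(
    obtaining_module=lambda s: [2 * x for x in s],
    statistics_module=sum,
    first_establishing_module=lambda d: d + 1,
    second_establishing_module=lambda k: {"graph": k},
)
print(apparatus.acquire([1, 2, 3]))  # {'graph': 13}
```

Injecting the modules keeps the wiring independent of any particular spectrogram or kernel density implementation, which fits the statement that the apparatus may be realized in software, hardware, or a combination of the two.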

For the specific implementation of each module in the apparatus for acquiring characteristics of a voice signal, refer to the description of the method for acquiring characteristics of a voice signal; details are not repeated here.

The foregoing description is only of the preferred embodiments of the present disclosure and an illustration of the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the invention in the present disclosure is not limited to technical solutions formed by the specific combination of the above features, but also encompasses other technical solutions formed by any combination of the above features or their equivalents without departing from the inventive concept, for example, technical solutions formed by replacing the above features with (but not limited to) technical features having similar functions disclosed in the present disclosure.
