Method, apparatus, device and storage medium for determining a voiceprint model

Document No.: 1757172  Publication date: 2019-11-29

Reading note: this technique, "Method, apparatus, device and storage medium for determining a voiceprint model", was designed and created by 殷兵, 李晋, 方昕, 方四安, 徐承 and 柳林 on 2019-09-05. Its main content is as follows. This application provides a method, apparatus, device and storage medium for determining a voiceprint model, wherein the method comprises: obtaining at least one spectrogram segment of a target speech; determining, by a pre-established voiceprint extraction model, at least one first feature map of each spectrogram segment, wherein the feature points in a first feature map are mutually independent; determining, by the voiceprint extraction model, for each first feature map a corresponding second feature map containing its global information, to obtain at least one second feature map of each spectrogram segment, wherein the second feature map corresponding to a first feature map is a feature map in which the feature regions of the first feature map that can distinguish voiceprints have been enhanced; and determining the voiceprint model of the target speech using at least the at least one second feature map of each spectrogram segment and the voiceprint extraction model. The method provided by this application can determine a stable and accurate voiceprint model for the target speech.

1. A method for determining a voiceprint model, characterized in that it comprises:

Obtaining at least one spectrogram segment of a target speech;

Determining, by a pre-established voiceprint extraction model, at least one first feature map of each spectrogram segment, wherein the feature points in each first feature map are mutually independent;

Determining, by the voiceprint extraction model, for each first feature map a corresponding second feature map containing its global information, to obtain at least one second feature map of each spectrogram segment, wherein the second feature map corresponding to a first feature map is a feature map obtained by enhancing the feature regions of the first feature map that can distinguish voiceprints;

Determining the voiceprint model of the target speech using at least the at least one second feature map of each spectrogram segment and the voiceprint extraction model.

2. The method for determining a voiceprint model according to claim 1, characterized in that the determining the voiceprint model of the target speech using at least the at least one second feature map of each spectrogram segment and the voiceprint extraction model comprises:

Determining the voiceprint model of the target speech using the voiceprint extraction model, the at least one first feature map of each spectrogram segment, and the at least one second feature map of each spectrogram segment.

3. The method for determining a voiceprint model according to claim 2, characterized in that the determining the voiceprint model of the target speech using the voiceprint extraction model, the at least one first feature map of each spectrogram segment, and the at least one second feature map of each spectrogram segment comprises:

For any spectrogram segment of the target speech, fusing, by the voiceprint extraction model, the at least one first feature map of the spectrogram segment with the at least one second feature map of the spectrogram segment, to obtain a voiceprint sub-model of the spectrogram segment, so as to obtain the voiceprint sub-model of each spectrogram segment of the target speech;

Averaging the voiceprint sub-models of the spectrogram segments of the target speech to obtain the voiceprint model of the target speech.

4. The method for determining a voiceprint model according to claim 3, characterized in that the fusing, by the voiceprint extraction model, the at least one first feature map of the spectrogram segment with the at least one second feature map of the spectrogram segment, to obtain the voiceprint sub-model of the spectrogram segment, comprises:

Splicing, by the voiceprint extraction model, the first feature maps of the spectrogram segment into a high-dimensional vector, as the first high-dimensional vector of the spectrogram segment;

Splicing, by the voiceprint extraction model, the second feature maps of the spectrogram segment into a high-dimensional vector, as the second high-dimensional vector of the spectrogram segment;

Splicing, by the voiceprint extraction model, the first high-dimensional vector of the spectrogram segment with the second high-dimensional vector of the spectrogram segment, to obtain a spliced high-dimensional vector;

Reducing the dimension of the spliced high-dimensional vector by the voiceprint extraction model, and determining the dimension-reduced vector as the voiceprint sub-model of the spectrogram segment.

5. The method for determining a voiceprint model according to claim 1, characterized in that the determining, for each first feature map, a corresponding second feature map containing global information comprises:

For any first feature map, dividing the first feature map into a plurality of first feature sub-maps of different frequency bands, to obtain the plurality of first feature sub-maps included in each first feature map;

For any first feature sub-map, determining a corresponding second feature sub-map containing global information, to obtain the second feature sub-map corresponding to each first feature sub-map;

For any first feature map, composing the second feature sub-maps corresponding to the first feature sub-maps included in the first feature map into the second feature map, containing global information, corresponding to the first feature map, to obtain the second feature map containing global information corresponding to each first feature map.

6. The method for determining a voiceprint model according to claim 5, characterized in that the determining, for a first feature sub-map, a corresponding second feature sub-map containing global information comprises:

Performing dimensionality reduction on the first feature sub-map separately with three convolution kernels of the same size but different parameters, to obtain three dimension-reduced feature sub-maps;

Determining an attention weight from two of the three dimension-reduced feature sub-maps;

Determining, from the attention weight and the remaining dimension-reduced feature sub-map, the second feature sub-map, containing global information, corresponding to the first feature sub-map.

7. The method for determining a voiceprint model according to claim 1, characterized in that the obtaining at least one spectrogram segment of a target speech comprises:

Determining the speech feature of each speech frame of the target speech, to obtain the speech feature sequence of the target speech;

Slicing the speech feature sequence of the target speech according to a preset slicing rule, to obtain the at least one spectrogram segment of the target speech.

8. The method for determining a voiceprint model according to any one of claims 1 to 7, characterized in that the process of pre-establishing the voiceprint extraction model comprises:

Obtaining a training speech, and obtaining at least one spectrogram segment of the training speech;

Determining, by the current voiceprint extraction model, at least one first feature map of each spectrogram segment of the training speech, the feature points in each first feature map being mutually independent, wherein in the first training iteration the current voiceprint extraction model is the initial voiceprint extraction model, and in each subsequent iteration it is the voiceprint extraction model resulting from the previous training iteration;

Determining, by the current voiceprint extraction model, for each first feature map of each spectrogram segment of the training speech a corresponding second feature map containing global information, to obtain at least one second feature map of each spectrogram segment of the training speech, wherein the second feature map corresponding to a first feature map is a feature map obtained by enhancing the feature regions of the first feature map that can distinguish voiceprints;

Determining the voiceprint sub-model of each spectrogram segment of the training speech using at least the at least one second feature map of each spectrogram segment of the training speech and the current voiceprint extraction model;

Predicting, from the voiceprint sub-model of each spectrogram segment of the training speech, the voiceprint identity label corresponding to each spectrogram segment of the training speech, and updating the parameters of the current voiceprint extraction model according to the prediction result.

9. The method for determining a voiceprint model according to claim 8, characterized in that the determining the voiceprint sub-model of each spectrogram segment of the training speech using at least the at least one second feature map of each spectrogram segment of the training speech and the current voiceprint extraction model comprises:

For any spectrogram segment of the training speech, fusing, by the current voiceprint extraction model, the at least one first feature map of the spectrogram segment with the at least one second feature map of the spectrogram segment, to obtain the voiceprint sub-model of the spectrogram segment, so as to obtain the voiceprint sub-model of each spectrogram segment of the training speech.

10. An apparatus for determining a voiceprint model, characterized in that it comprises: a spectrogram segment obtaining module, a first feature obtaining module, a second feature obtaining module, and a voiceprint model determining module;

The spectrogram segment obtaining module is configured to obtain at least one spectrogram segment of a target speech;

The first feature obtaining module is configured to determine, by a pre-established voiceprint extraction model, at least one first feature map of each spectrogram segment, wherein the feature points in each first feature map are mutually independent;

The second feature obtaining module is configured to determine, by the voiceprint extraction model, for each first feature map a corresponding second feature map containing its global information, to obtain at least one second feature map of each spectrogram segment, wherein the second feature map corresponding to a first feature map is a feature map obtained by enhancing the feature regions of the first feature map that can distinguish voiceprints;

The voiceprint model determining module is configured to determine the voiceprint model of the target speech using at least the at least one second feature map of each spectrogram segment and the voiceprint extraction model.

11. A device for determining a voiceprint model, characterized in that it comprises: a memory and a processor;

The memory is configured to store a program;

The processor is configured to execute the program to implement the steps of the method for determining a voiceprint model according to any one of claims 1 to 9.

12. A readable storage medium on which a computer program is stored, characterized in that, when the computer program is executed by a processor, the steps of the method for determining a voiceprint model according to any one of claims 1 to 9 are implemented.

Technical field

This application relates to the field of voiceprint recognition technology, and in particular to a method, apparatus, device and storage medium for determining a voiceprint model.

Background art

Voiceprint recognition is one of the key technologies in the field of biometric authentication. It performs identity authentication directly from the speech signal. It requires no memorization, is simple to adjudicate, and can authenticate without the user's awareness, so it enjoys high user acceptance and is widely used in fields such as national security, finance, and smart homes.

It should be noted that the key to voiceprint recognition is the determination of the voiceprint model. At present, the voiceprint model is determined mainly by total variability factor analysis: a large corpus is used to train a total variability space covering various environments and channels, and through this space a segment of speech is mapped to a voiceprint model vector of fixed, unified dimension (an i-vector).

In certain application fields, the accuracy requirement on voiceprint recognition is high, which demands a stable and accurate voiceprint model. However, the voiceprint model determined by current schemes is not sufficiently stable and accurate, so voiceprint recognition performs poorly and cannot satisfy the accuracy requirements of those fields.

Summary of the invention

In view of this, this application provides a method, apparatus, device and storage medium for determining a voiceprint model, so as to solve the problem that the voiceprint model determined by prior-art schemes is not sufficiently stable and accurate. The technical solution is as follows:

A method for determining a voiceprint model comprises:

Obtaining at least one spectrogram segment of a target speech;

Determining, by a pre-established voiceprint extraction model, at least one first feature map of each spectrogram segment, wherein the feature points in each first feature map are mutually independent;

Determining, by the voiceprint extraction model, for each first feature map a corresponding second feature map containing its global information, to obtain at least one second feature map of each spectrogram segment, wherein the second feature map corresponding to a first feature map is a feature map obtained by enhancing the feature regions of the first feature map that can distinguish voiceprints;

Determining the voiceprint model of the target speech using at least the at least one second feature map of each spectrogram segment and the voiceprint extraction model.

Optionally, the determining the voiceprint model of the target speech using at least the at least one second feature map of each spectrogram segment and the voiceprint extraction model comprises:

Determining the voiceprint model of the target speech using the voiceprint extraction model, the at least one first feature map of each spectrogram segment, and the at least one second feature map of each spectrogram segment.

Optionally, the determining the voiceprint model of the target speech using the voiceprint extraction model, the at least one first feature map of each spectrogram segment, and the at least one second feature map of each spectrogram segment comprises:

For any spectrogram segment of the target speech, fusing, by the voiceprint extraction model, the at least one first feature map of the spectrogram segment with the at least one second feature map of the spectrogram segment, to obtain a voiceprint sub-model of the spectrogram segment, so as to obtain the voiceprint sub-model of each spectrogram segment of the target speech;

Averaging the voiceprint sub-models of the spectrogram segments of the target speech to obtain the voiceprint model of the target speech.
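The averaging in the last step can be sketched as an element-wise mean over the per-segment sub-model vectors. This is a minimal illustration, not the patented implementation; the function name and the toy 4-dimensional sub-models are hypothetical:

```python
import numpy as np

def voiceprint_model_from_submodels(submodels):
    """Average the per-segment voiceprint sub-models (one vector per
    spectrogram segment) into a single voiceprint model vector."""
    stacked = np.stack(submodels, axis=0)   # shape: (num_segments, dim)
    return stacked.mean(axis=0)             # shape: (dim,)

# Three hypothetical 4-dimensional sub-models for three segments
subs = [np.array([1.0, 2.0, 3.0, 4.0]),
        np.array([3.0, 2.0, 1.0, 0.0]),
        np.array([2.0, 2.0, 2.0, 2.0])]
model = voiceprint_model_from_submodels(subs)
print(model)  # [2. 2. 2. 2.]
```

Averaging makes the final model independent of how many segments the speech was sliced into, which is one way short and long utterances can be mapped to vectors of the same dimension.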

Optionally, the fusing, by the voiceprint extraction model, the at least one first feature map of the spectrogram segment with the at least one second feature map of the spectrogram segment, to obtain the voiceprint sub-model of the spectrogram segment, comprises:

Splicing, by the voiceprint extraction model, the first feature maps of the spectrogram segment into a high-dimensional vector, as the first high-dimensional vector of the spectrogram segment;

Splicing, by the voiceprint extraction model, the second feature maps of the spectrogram segment into a high-dimensional vector, as the second high-dimensional vector of the spectrogram segment;

Splicing, by the voiceprint extraction model, the first high-dimensional vector of the spectrogram segment with the second high-dimensional vector of the spectrogram segment, to obtain a spliced high-dimensional vector;

Reducing the dimension of the spliced high-dimensional vector by the voiceprint extraction model, and determining the dimension-reduced vector as the voiceprint sub-model of the spectrogram segment.
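The splice-and-reduce fusion above can be sketched as follows, assuming the feature maps are NumPy arrays and using a fixed random matrix as a stand-in for the learned dimensionality-reduction layer (all shapes and names here are illustrative, not the patent's):

```python
import numpy as np

rng = np.random.default_rng(0)

def fuse_feature_maps(first_maps, second_maps, projection):
    """Flatten and splice the first feature maps into one high-dimensional
    vector, do the same for the second feature maps, concatenate the two,
    and reduce the dimension with a linear projection (in a trained model
    this would be a learned layer)."""
    v1 = np.concatenate([m.ravel() for m in first_maps])    # first high-dimensional vector
    v2 = np.concatenate([m.ravel() for m in second_maps])   # second high-dimensional vector
    spliced = np.concatenate([v1, v2])                      # spliced high-dimensional vector
    return projection @ spliced                             # dimension-reduced sub-model

# Two hypothetical 2x3 first maps and two 2x3 second maps -> 24-dim spliced vector
firsts = [rng.standard_normal((2, 3)) for _ in range(2)]
seconds = [rng.standard_normal((2, 3)) for _ in range(2)]
W = rng.standard_normal((8, 24))     # stand-in for the learned reduction
submodel = fuse_feature_maps(firsts, seconds, W)
print(submodel.shape)  # (8,)
```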

Optionally, the determining, for each first feature map, a corresponding second feature map containing global information comprises:

For any first feature map, dividing the first feature map into a plurality of first feature sub-maps of different frequency bands, to obtain the plurality of first feature sub-maps included in each first feature map;

For any first feature sub-map, determining a corresponding second feature sub-map containing global information, to obtain the second feature sub-map corresponding to each first feature sub-map;

For any first feature map, composing the second feature sub-maps corresponding to the first feature sub-maps included in the first feature map into the second feature map, containing global information, corresponding to the first feature map, to obtain the second feature map containing global information corresponding to each first feature map.
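The split-and-recompose structure above can be sketched as follows; the sketch shows only the frequency-band split and the reassembly, with an identity stand-in where the model would compute each global-information sub-map (all names and shapes are illustrative):

```python
import numpy as np

def split_into_bands(feature_map, num_bands):
    """Divide a (freq, time) first feature map into sub-maps of different
    frequency bands along the frequency axis."""
    return np.array_split(feature_map, num_bands, axis=0)

def reassemble(sub_maps):
    """Compose the processed (second) feature sub-maps back into one map."""
    return np.concatenate(sub_maps, axis=0)

fmap = np.arange(24.0).reshape(6, 4)      # 6 frequency bins x 4 time steps
bands = split_into_bands(fmap, 3)         # three 2x4 frequency-band sub-maps
# A real model would turn each band into a global-information sub-map here;
# the identity stand-in just shows the split/compose round trip.
second_map = reassemble([b.copy() for b in bands])
print(np.array_equal(second_map, fmap))   # True
```

Splitting by frequency band lets the attention step that follows model global dependencies within each band separately, instead of over the whole map at once.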

Optionally, the determining, for a first feature sub-map, a corresponding second feature sub-map containing global information comprises:

Performing dimensionality reduction on the first feature sub-map separately with three convolution kernels of the same size but different parameters, to obtain three dimension-reduced feature sub-maps;

Determining an attention weight from two of the three dimension-reduced feature sub-maps;

Determining, from the attention weight and the remaining dimension-reduced feature sub-map, the second feature sub-map, containing global information, corresponding to the first feature sub-map.
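This three-kernel construction follows the self-attention pattern: two reduced maps play the roles of query and key and yield the attention weights, which are applied to the third (the value). The sketch below uses plain matrix projections as stand-ins for the three equally sized, differently parameterized convolution kernels; all shapes and names are illustrative, not the patent's:

```python
import numpy as np

rng = np.random.default_rng(1)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def global_sub_map(sub_map, Wq, Wk, Wv):
    """Self-attention sketch: three same-shaped projections (stand-ins for
    the three convolution kernels) reduce the sub-map; two reductions form
    the attention weights, which are applied to the third."""
    x = sub_map                       # (channels, positions)
    q, k, v = Wq @ x, Wk @ x, Wv @ x  # three reduced sub-maps, (c', t)
    attn = softmax(q.T @ k / np.sqrt(q.shape[0]))  # (t, t) attention weights
    return v @ attn.T                 # (c', t) second feature sub-map

x = rng.standard_normal((8, 5))           # 8-channel sub-map over 5 positions
Wq, Wk, Wv = (rng.standard_normal((4, 8)) for _ in range(3))
out = global_sub_map(x, Wq, Wk, Wv)
print(out.shape)  # (4, 5)
```

Because every output position is a weighted combination of all input positions, each point of the result depends on the whole sub-map, which is exactly the "global information" property the claim describes.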

Optionally, the obtaining at least one spectrogram segment of a target speech comprises:

Determining the speech feature of each speech frame of the target speech, to obtain the speech feature sequence of the target speech;

Slicing the speech feature sequence of the target speech according to a preset slicing rule, to obtain the at least one spectrogram segment of the target speech.

Optionally, the process of pre-establishing the voiceprint extraction model comprises:

Obtaining a training speech, and obtaining at least one spectrogram segment of the training speech;

Determining, by the current voiceprint extraction model, at least one first feature map of each spectrogram segment of the training speech, the feature points in each first feature map being mutually independent, wherein in the first training iteration the current voiceprint extraction model is the initial voiceprint extraction model, and in each subsequent iteration it is the voiceprint extraction model resulting from the previous training iteration;

Determining, by the current voiceprint extraction model, for each first feature map of each spectrogram segment of the training speech a corresponding second feature map containing global information, to obtain at least one second feature map of each spectrogram segment of the training speech, wherein the second feature map corresponding to a first feature map is a feature map obtained by enhancing the feature regions of the first feature map that can distinguish voiceprints;

Determining the voiceprint sub-model of each spectrogram segment of the training speech using at least the at least one second feature map of each spectrogram segment of the training speech and the current voiceprint extraction model;

Predicting, from the voiceprint sub-model of each spectrogram segment of the training speech, the voiceprint identity label corresponding to each spectrogram segment of the training speech, and updating the parameters of the current voiceprint extraction model according to the prediction result.
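The final training step, predicting identity labels from the sub-models and updating parameters from the prediction result, can be sketched with a linear softmax classifier and one cross-entropy gradient step standing in for backpropagation through the full extraction model (all shapes and names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)

def train_step(W, submodels, labels, lr=0.1):
    """One update sketch: a linear softmax classifier predicts the voiceprint
    identity label of each segment sub-model, and the parameters are updated
    from the cross-entropy gradient (a stand-in for updating the full
    extraction model by backpropagation)."""
    X = np.stack(submodels)                        # (n_segments, dim)
    logits = X @ W                                 # (n_segments, n_speakers)
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    probs = e / e.sum(axis=1, keepdims=True)       # predicted label distribution
    onehot = np.eye(W.shape[1])[labels]
    grad = X.T @ (probs - onehot) / len(labels)    # cross-entropy gradient
    return W - lr * grad

W = rng.standard_normal((6, 3))                    # 6-dim sub-models, 3 speakers
subs = [rng.standard_normal(6) for _ in range(4)]
labels = np.array([0, 1, 2, 0])
W2 = train_step(W, subs, labels)
print(W2.shape)  # (6, 3)
```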

Optionally, the determining the voiceprint sub-model of each spectrogram segment of the training speech using at least the at least one second feature map of each spectrogram segment of the training speech and the current voiceprint extraction model comprises:

For any spectrogram segment of the training speech, fusing, by the current voiceprint extraction model, the at least one first feature map of the spectrogram segment with the at least one second feature map of the spectrogram segment, to obtain the voiceprint sub-model of the spectrogram segment, so as to obtain the voiceprint sub-model of each spectrogram segment of the training speech.

An apparatus for determining a voiceprint model comprises: a spectrogram segment obtaining module, a first feature obtaining module, a second feature obtaining module, and a voiceprint model determining module;

The spectrogram segment obtaining module is configured to obtain at least one spectrogram segment of a target speech;

The first feature obtaining module is configured to determine, by a pre-established voiceprint extraction model, at least one first feature map of each spectrogram segment, wherein the feature points in each first feature map are mutually independent;

The second feature obtaining module is configured to determine, by the voiceprint extraction model, for each first feature map a corresponding second feature map containing its global information, to obtain at least one second feature map of each spectrogram segment, wherein the second feature map corresponding to a first feature map is a feature map obtained by enhancing the feature regions of the first feature map that can distinguish voiceprints;

The voiceprint model determining module is configured to determine the voiceprint model of the target speech using at least the at least one second feature map of each spectrogram segment and the voiceprint extraction model.

A device for determining a voiceprint model comprises: a memory and a processor;

The memory is configured to store a program;

The processor is configured to execute the program to implement the steps of the method for determining a voiceprint model described in any of the above embodiments.

A readable storage medium stores a computer program which, when executed by a processor, implements the steps of the method for determining a voiceprint model described in any of the above embodiments.

From the above scheme it can be seen that the method, apparatus, device and storage medium for determining a voiceprint model provided by this application first obtain at least one spectrogram segment of the target speech, then determine, by the pre-established voiceprint extraction model, at least one first feature map of each spectrogram segment containing local information, then determine, by the voiceprint extraction model, for each first feature map a corresponding second feature map containing global information, to obtain at least one second feature map of each spectrogram segment, and finally determine the voiceprint model of the target speech using at least the at least one second feature map of each spectrogram segment and the voiceprint extraction model. Compared with the prior art, the first feature maps obtained by the pre-established voiceprint extraction model already allow a more accurate and stable voiceprint model to be determined. Considering that the feature points of a first feature map are mutually independent, i.e. a first feature map contains only local information, this application further uses the voiceprint extraction model to fully mine the global information of the first feature maps, and determines the voiceprint model using at least the second feature maps containing that global information. Since the second feature maps contain global information and enhance the feature regions of the first feature maps that can distinguish voiceprints, a more stable and accurate voiceprint model can be determined based at least on the second feature maps.

Description of the drawings

In order to explain the embodiments of the present invention or the prior-art technical solutions more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are only embodiments of the present invention; for those of ordinary skill in the art, other drawings can be obtained from the provided drawings without creative effort.

Fig. 1 is a flow diagram of the method for determining a voiceprint model provided by an embodiment of this application;

Fig. 2 is a flow diagram of pre-establishing the voiceprint extraction model provided by an embodiment of this application;

Fig. 3 is a schematic diagram, provided by an embodiment of this application, of determining, for a first feature sub-map, the corresponding second feature sub-map containing global information;

Fig. 4 is a structural diagram of the apparatus for determining a voiceprint model provided by an embodiment of this application;

Fig. 5 is a structural diagram of the device for determining a voiceprint model provided by an embodiment of this application.

Detailed description of the embodiments

The technical solutions in the embodiments of the present invention will be described below clearly and completely with reference to the drawings in the embodiments. Obviously, the described embodiments are only some, rather than all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative effort shall fall within the protection scope of the present invention.

In voiceprint recognition, the similarity of voiceprint models is used to decide whether two speech segments come from the same speaker; if the obtained voiceprint models are not sufficiently stable and accurate, recognition performance is directly degraded.

For the voiceprint model determination scheme based on total variability factor analysis, when the speech duration is short the statistics are insufficient, so the determined voiceprint model is not sufficiently stable and accurate.

In order to determine a stable and accurate voiceprint model and thereby improve voiceprint recognition performance, the inventors conducted research. The initial line of thought was:

Adopt a voiceprint model determination scheme based on deep convolutional neural networks (CNNs). In recent years, deep learning has achieved remarkable results in many research fields: by jointly analyzing low-level features it forms abstract high-level attribute descriptions, thereby discovering structured feature representations of the data. The deep convolutional neural network is an efficient learning method that has developed recently and attracted wide attention.

Compared with the plain total variability factor-analysis method, a convolutional neural network can jointly analyze the time and frequency domains and deeply mine the voiceprint information in the speech spectrum, obtaining a finer voiceprint feature representation and thus establishing an accurate voiceprint model.

When determining a voiceprint model with a deep convolutional neural network, features reflecting the voiceprint information, for example Fast Fourier Transform (FFT) features, are first extracted from a speech segment; a CNN is then trained by stacking structures such as convolution, pooling and activation; this CNN projects the speech features nonlinearly to obtain the voiceprint model of the speech segment, a c-vector. The CNN-based voiceprint determination scheme is simple and efficient.
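A toy forward pass in this spirit, with a naive convolution, ReLU activation, mean pooling and a tanh projection standing in for a trained CNN (all shapes, kernels and names are illustrative), can be sketched as:

```python
import numpy as np

rng = np.random.default_rng(2)

def conv2d_valid(x, k):
    """Naive valid 2D convolution (cross-correlation), enough for a sketch."""
    H, W = x.shape
    kh, kw = k.shape
    out = np.empty((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out

def c_vector(spectrogram, kernel, proj):
    """Convolution -> ReLU activation -> 2x2 mean pooling -> nonlinear
    projection to the voiceprint embedding (a stand-in for a trained CNN)."""
    fmap = np.maximum(conv2d_valid(spectrogram, kernel), 0.0)  # conv + activation
    h, w = fmap.shape
    pooled = fmap[:h - h % 2, :w - w % 2].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))
    return np.tanh(proj @ pooled.ravel())                      # c-vector

spec = rng.standard_normal((10, 8))          # L x d spectrogram segment
kern = rng.standard_normal((3, 3))
proj = rng.standard_normal((16, (8 // 2) * (6 // 2)))  # conv output is 8x6
emb = c_vector(spec, kern, proj)
print(emb.shape)  # (16,)
```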

However, through further research the inventors found that in the above CNN-based voiceprint model determination scheme, the feature points on each feature map are mutually independent during feature-map analysis. Limited by the receptive field of the convolution kernels, the global information of a feature map cannot be fully obtained, so the voiceprint model determined by the CNN is still not sufficiently stable and accurate.

In order to obtain a more stable and accurate voiceprint model, the inventors conducted further in-depth research and finally proposed an effective method for determining a voiceprint model. The method applies to voiceprint recognition scenarios and can run on a terminal with data-processing capability or on a server. The method for determining a voiceprint model provided by this application is introduced through the following embodiments.

Referring to Fig. 1, which shows a flow diagram of the voiceprint model determination method provided by the embodiments of the present application, the method may include:

Step S101: obtain at least one spectrogram segment of the target speech.

Specifically, the process of obtaining at least one spectrogram segment of the target speech may include:

Step S1011: determine the speech feature of each frame of the target speech, obtaining the speech feature sequence of the target speech.

Specifically, framing, windowing and Fourier transformation may be applied to the target speech to obtain an FFT feature sequence, which serves as the speech feature sequence of the target speech.

Step S1012: segment the speech feature sequence of the target speech according to a preset segmentation rule, obtaining at least one spectrogram segment.

Specifically, a window length L may be preset, and the speech feature sequence of the target speech is cut with window length L. Assuming the feature dimension is d, each spectrogram segment then has size L × d.
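The segmentation just described can be sketched in a few lines. This is a minimal illustration, not the patented implementation: the function name `split_segments` and the toy dimensions are assumptions, and the feature sequence is modeled as a plain list of d-dimensional frame vectors.

```python
def split_segments(features, L):
    """Cut a sequence of frame features into spectrogram segments of L frames.

    features: list of per-frame feature vectors (each of dimension d); the
              sequence is assumed to be an integer multiple of L frames long
              (the patent pads/trims speech to guarantee this).
    Returns a list of segments, each an L x d block.
    """
    return [features[i:i + L] for i in range(0, len(features), L)]

# Example: 6 frames of 3-dimensional FFT features, window length L = 3
frames = [[float(t + k) for k in range(3)] for t in range(6)]
segments = split_segments(frames, 3)
# Yields two segments, each of size 3 x 3 (L x d)
```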

Step S102: determine at least one first feature map of each spectrogram segment through a pre-established voiceprint extraction model.

Here, the points in a first feature map are mutually independent; that is, a first feature map contains local information.

The pre-established voiceprint extraction model may be a model based on a convolutional neural network, trained with spectrogram segments of training speech, where each spectrogram segment of the training speech carries a voiceprint identity label.

Step S103: determine, through the voiceprint extraction model, the second feature map corresponding to each first feature map and containing its global information, thereby obtaining at least one second feature map of each spectrogram segment.

Here, the second feature map corresponding to a first feature map is the feature map obtained by reinforcing the regions of that first feature map that can distinguish voiceprints; it is effectively an optimized version of the first feature map. This embodiment uses the voiceprint extraction model to fully mine global information from the first feature map, determines the regions that deserve attention (salient regions able to distinguish voiceprints), and then reinforces those regions.

Step S104: determine the voiceprint model of the target speech, at least using the at least one second feature map of each spectrogram segment and the voiceprint extraction model.

Since a second feature map contains global information and reinforces the voiceprint-distinguishing regions of the first feature map, a more accurate and stable voiceprint model can be determined from it. On this basis, in one possible implementation, the voiceprint model of the target speech is determined using the voiceprint extraction model and the at least one second feature map of each spectrogram segment of the target speech.

Considering that the first feature map contains local information while the second feature map contains global information, in another possible implementation the voiceprint model of the target speech is determined using the voiceprint extraction model, the at least one first feature map of each spectrogram segment of the target speech, and the at least one second feature map of each spectrogram segment; that is, global and local information are used jointly to determine the voiceprint model of the target speech.

In the voiceprint model determination method provided by the embodiments of the present application, the pre-established voiceprint extraction model obtains first feature maps of the spectrogram segments of the target speech. Compared with the prior-art scheme based on total variability factor analysis, a first feature map contains voiceprint information interleaved across the time and frequency domains, so joint time-frequency analysis can deeply mine the voiceprint information in the speech spectrum and yield a more stable and accurate voiceprint model. Considering that the points of a first feature map are mutually independent, i.e. it contains only local information, the present application further mines the global information of the first feature map with the voiceprint extraction model and determines the voiceprint model from second feature maps containing that global information. Since a second feature map contains global information and reinforces the voiceprint-distinguishing regions of the first feature map, a more stable and accurate voiceprint model can be determined on the basis of the second feature maps.

As can be seen from the above embodiment, the voiceprint model of the target speech is determined through a pre-established voiceprint extraction model. The process of pre-establishing the voiceprint extraction model is introduced below.

Referring to Fig. 2, which shows a flow diagram of pre-establishing the voiceprint extraction model, the process may include:

Step S201: obtain training speech, and obtain at least one spectrogram segment of the training speech.

The process of obtaining spectrogram segments of the training speech is similar to the above process of obtaining spectrogram segments of the target speech: determine the speech feature of each frame of the training speech to obtain its speech feature sequence, then segment that sequence according to the preset segmentation rule to obtain at least one spectrogram segment of the training speech.

Likewise, framing, windowing and Fourier transformation may be applied to the training speech to obtain an FFT feature sequence as its speech feature sequence, which is then cut with the preset window length L.

It should be noted that if the training speech is shorter than L, it is supplemented with a copy of itself so that the final length is greater than or equal to L; if the final length is not an integer multiple of L, the excess speech is deleted so that the final length is an integer multiple of L. In addition, if the training speech is longer than L but not an integer multiple of it, a copy of the training speech is likewise appended and the excess then removed. If the target speech is shorter than L, or longer than L but not an integer multiple of it, it is handled in the same way as the training speech.
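The pad-by-copy-then-trim rule above can be sketched as follows; this is a hedged reading of the patent's wording (supplement with a copy first, then delete the excess), and the function name `fit_to_window` is an assumption:

```python
def fit_to_window(frames, L):
    """Pad the speech with copies of itself until its length reaches L;
    then, if the length is not an integer multiple of L, append one more
    copy and delete the extra frames, so the result is a multiple of L."""
    assert frames, "empty speech"
    out = list(frames)
    while len(out) < L:
        out.extend(frames)                  # supplement with a copy of the speech
    if len(out) % L != 0:
        out.extend(frames)                  # supplement before trimming
        out = out[:(len(out) // L) * L]     # delete the extra frames
    return out

# A 5-frame utterance with L = 4: one copy appended (10 frames), trimmed to 8
padded = fit_to_window(list(range(5)), 4)
```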

It will be appreciated that if L is set too small, the original spectrogram is fragmented: continuous spectral information is split into many small segments, too much inter-segment information is lost, and long-range dependencies in the speech cannot be modeled. If L is set too large, the training efficiency of the voiceprint extraction model suffers while GPU resource usage rises markedly. In one possible implementation, the window length L may be set to half the average duration of the training speech in the training data set.

Step S202: determine at least one first feature map of each spectrogram segment of the training speech through the current voiceprint extraction model.

It should be noted that on the first training iteration, the current voiceprint extraction model is the initial voiceprint extraction model.

Here, the points in a first feature map are mutually independent; that is, the first feature map contains local information.

Specifically, any spectrogram segment may be processed with convolution, pooling and activation to map it into at least one first feature map.

Step S203: determine, through the current voiceprint extraction model, the second feature map corresponding to each first feature map of each spectrogram segment of the training speech and containing global information, obtaining at least one second feature map of each spectrogram segment of the training speech.

Here, the second feature map corresponding to a first feature map is the feature map obtained by reinforcing the voiceprint-distinguishing regions of that first feature map.

Specifically, the process of determining the second feature map, containing global information, that corresponds to each first feature map of each spectrogram segment of the training speech may include:

Step S2031: for any first feature map, divide it into multiple first feature sub-maps of different frequency bands, obtaining the multiple first feature sub-maps contained in each first feature map.

This embodiment divides the first feature map along the frequency axis, obtaining multiple first feature sub-maps in different frequency bands.

Referring to Fig. 3, 301 in Fig. 3 is an example of a first feature map; first feature map 301 is divided into two first feature sub-maps of different frequency bands.
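The frequency-band division of step S2031 amounts to splitting the feature map along its frequency axis. A minimal sketch, assuming the feature map is a numpy array of shape (frequency bins, time steps) and that the bins divide evenly between the bands:

```python
import numpy as np

def split_frequency_bands(feature_map, n_bands):
    """Divide a first feature map into sub-maps covering different
    frequency bands (splitting along the frequency axis, axis 0)."""
    return np.split(feature_map, n_bands, axis=0)

fmap = np.arange(24, dtype=float).reshape(8, 3)   # 8 freq bins x 3 time steps
low_band, high_band = split_frequency_bands(fmap, 2)
# Each sub-map covers 4 frequency bins over all 3 time steps
```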

Step S2032: for any first feature sub-map, determine the corresponding second feature sub-map containing global information, obtaining the second feature sub-map corresponding to each first feature sub-map.

Specifically, for any first feature sub-map, the process of determining its corresponding second feature sub-map containing global information includes: reduce the dimensionality of the first feature sub-map with three convolution kernels of identical size but different parameters, obtaining three dimensionality-reduced feature sub-maps; determine attention weights from two of the three dimensionality-reduced feature sub-maps; and determine, from the attention weights and the remaining dimensionality-reduced feature sub-map, the second feature sub-map, containing global information, that corresponds to the first feature sub-map.

For any first feature sub-map, suppose three convolution kernels of identical size (for example, three 1 × 1 kernels) but different parameters are used to reduce its dimensionality, yielding p1, p2 and p3. First the transpose of p1 is multiplied with p2; the resulting matrix characterizes the correlation between the points of p1 and p2. This matrix is then passed through a softmax layer to obtain the attention weights, and the attention weights are multiplied with p3. Finally a convolution kernel (for example 1 × 1) raises the dimensionality of that product, producing the second feature sub-map, containing global information, that corresponds to the first feature sub-map; the second feature sub-map has the same size as the first feature sub-map.

As shown in Fig. 3, first feature map 301 is divided into first feature sub-maps 3011 and 3012 of two different frequency bands. For first feature sub-map 3011, three 1 × 1 convolution kernels reduce its dimensionality, yielding three feature sub-maps 3011a, 3011b and 3011c. The transpose of 3011a is multiplied with 3011b, the product is passed through a softmax layer to obtain the attention weights, the attention weights are multiplied with 3011c, and the result is raised back in dimensionality through a 1 × 1 convolution, producing the second feature sub-map 3011', containing global information, that corresponds to first feature sub-map 3011; 3011' is the optimized version of 3011. Applying the same processing to first feature sub-map 3012 yields its corresponding second feature sub-map 3012' containing global information.
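The attention computation walked through above can be sketched numerically. This is a hedged illustration, not the patented implementation: the channel counts, weights and the exact multiplication order are assumptions, and the 1 × 1 convolutions are modeled as channel-mixing matrix multiplications over the flattened sub-map.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_submap(f, w1, w2, w3, w_up):
    """Compute a second feature sub-map for one first feature sub-map.

    f:          first feature sub-map, shape (c, n), n flattened positions.
    w1, w2, w3: (c_red, c) matrices standing in for the three 1x1
                reduction convolutions (identical size, different params).
    w_up:       (c, c_red) matrix standing in for the 1x1 raising conv.
    """
    p1, p2, p3 = w1 @ f, w2 @ f, w3 @ f       # three reduced sub-maps
    attn = softmax(p1.T @ p2, axis=-1)        # (n, n) attention weights
    return w_up @ (p3 @ attn.T)               # re-weight p3, raise dims back

rng = np.random.default_rng(0)
c, c_red, n = 6, 2, 5
f = rng.standard_normal((c, n))
w1, w2, w3 = [rng.standard_normal((c_red, c)) for _ in range(3)]
w_up = rng.standard_normal((c, c_red))
g = attention_submap(f, w1, w2, w3, w_up)
# g has the same size as f, matching the description above
```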

Step S2033: for any first feature map, compose the second feature sub-maps corresponding to its multiple first feature sub-maps into the second feature map, containing global information, that corresponds to that first feature map, obtaining the second feature map corresponding to each first feature map.

As shown in Fig. 3, second feature sub-map 3011', containing global information and corresponding to first feature sub-map 3011, is spliced with second feature sub-map 3012', containing global information and corresponding to first feature sub-map 3012, to obtain second feature map 301', containing global information, that corresponds to first feature map 301.

Step S204: determine the voiceprint sub-model of each spectrogram segment of the training speech, at least using the at least one second feature map of each spectrogram segment of the training speech and the current voiceprint extraction model.

In one possible implementation, for any spectrogram segment of the training speech, its voiceprint sub-model is determined using the current voiceprint extraction model and the at least one second feature map of that segment. To obtain a more stable and accurate voiceprint model, in another possible implementation, for any spectrogram segment of the training speech, its voiceprint sub-model is determined using the current voiceprint extraction model, the at least one first feature map of the segment, and the at least one second feature map of the segment.

In the former case, for any spectrogram segment, the process of determining its voiceprint sub-model using the current voiceprint extraction model and the at least one second feature map of the segment may include: through the current voiceprint extraction model, splice the second feature maps of the segment into a high-dimensional vector and reduce its dimensionality by a linear transformation; the reduced vector serves as the voiceprint sub-model of the segment.

In the latter case, for any spectrogram segment, determining its voiceprint sub-model using the current voiceprint extraction model, the at least one first feature map of the segment and the at least one second feature map of the segment includes: fusing the at least one first feature map with the at least one second feature map through the current voiceprint extraction model to obtain the voiceprint sub-model of the segment. Specifically, the first feature maps of the segment are spliced into a high-dimensional vector, the first high-dimensional vector of the segment; the second feature maps are spliced into a high-dimensional vector, the second high-dimensional vector of the segment; the first and second high-dimensional vectors of the segment are spliced into a combined high-dimensional vector; and a linear transformation reduces the dimensionality of the combined vector, the resulting low-dimensional vector serving as the voiceprint sub-model of the segment.
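The splice-and-reduce fusion just described can be sketched as follows; the map sizes, the projection matrix `w_proj` and the function name are illustrative assumptions (in the patent the projection is a learned linear transform inside the voiceprint extraction model):

```python
import numpy as np

def segment_submodel(first_maps, second_maps, w_proj):
    """Fuse a segment's first and second feature maps into its voiceprint
    sub-model: flatten and splice each set into a high-dimensional vector,
    splice the two vectors, then reduce dimensionality linearly."""
    v1 = np.concatenate([m.ravel() for m in first_maps])   # first high-dim vector
    v2 = np.concatenate([m.ravel() for m in second_maps])  # second high-dim vector
    spliced = np.concatenate([v1, v2])                     # spliced high-dim vector
    return w_proj @ spliced                                # low-dim sub-model

rng = np.random.default_rng(1)
firsts = [rng.standard_normal((4, 3)) for _ in range(2)]   # two 4x3 first maps
seconds = [rng.standard_normal((4, 3)) for _ in range(2)]  # matching second maps
w_proj = rng.standard_normal((8, 48))                      # 48-dim -> 8-dim reduction
sub = segment_submodel(firsts, seconds, w_proj)
```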

Step S205: predict the voiceprint identity label corresponding to each spectrogram segment of the training speech from its voiceprint sub-model, and update the parameters of the current voiceprint extraction model according to the prediction result.

Here, the voiceprint identity label corresponding to each spectrogram segment of the training speech identifies the speaker of the training speech.

The above training process is performed multiple times, until a preset number of training iterations is reached or the performance of the trained voiceprint extraction model meets the requirements.

Through the above training process, the voiceprint extraction model for determining the voiceprint model of the target speech is obtained. On the basis of the above embodiments, the process of determining the voiceprint model of the target speech with the trained voiceprint extraction model is further described below.

As mentioned in the above embodiments, after at least one spectrogram segment of the target speech is obtained, the pre-established voiceprint extraction model first determines, for each segment, at least one first feature map containing local information, and then determines the second feature map, containing global information, corresponding to each first feature map, thereby obtaining at least one second feature map of each spectrogram segment of the target speech. The process of determining, through the pre-established voiceprint extraction model, the second feature map containing global information that corresponds to each first feature map is given below:

Step a1: for any first feature map, divide it into multiple first feature sub-maps of different frequency bands, obtaining the multiple first feature sub-maps contained in each first feature map.

Step a2: for any first feature sub-map, determine the corresponding second feature sub-map containing global information, obtaining the second feature sub-map corresponding to each first feature sub-map.

Specifically, reduce the dimensionality of the first feature sub-map with three convolution kernels of identical size but different parameters, obtaining three dimensionality-reduced feature sub-maps; determine attention weights from two of the three dimensionality-reduced feature sub-maps; and determine, from the attention weights and the remaining dimensionality-reduced feature sub-map, the second feature sub-map, containing global information, that corresponds to the first feature sub-map.

Step a3: for any first feature map, compose the second feature sub-maps corresponding to its multiple first feature sub-maps into the second feature map, containing global information, that corresponds to that first feature map, obtaining the second feature map corresponding to each first feature map, i.e. at least one second feature map of each spectrogram segment of the target speech.

It should be noted that the process of determining the second feature maps of the spectrogram segments of the target speech is essentially identical to the above process of determining the second feature maps of the spectrogram segments of the training speech; for details of steps a1 to a3, refer to that process.

After at least one second feature map of each spectrogram segment of the target speech is obtained, the voiceprint model of the target speech is determined at least using the at least one second feature map of each segment and the voiceprint extraction model; specifically:

Step b1: determine the voiceprint sub-model of each spectrogram segment of the target speech, at least using the at least one second feature map of each spectrogram segment of the target speech and the pre-established voiceprint extraction model.

If, in the training stage, the voiceprint sub-model of each spectrogram segment of the training speech was determined only from its at least one second feature map, then here the voiceprint sub-model of each spectrogram segment of the target speech is likewise determined only from its at least one second feature map. If, in the training stage, the voiceprint sub-model of each segment was determined from both its at least one first feature map and its at least one second feature map, then here the voiceprint sub-model of each spectrogram segment of the target speech is likewise determined from both.

Specifically, the process of determining the voiceprint sub-model of each spectrogram segment of the target speech using the pre-established voiceprint extraction model, the at least one first feature map and the at least one second feature map of each segment includes: for any spectrogram segment, fuse its at least one first feature map with its at least one second feature map through the pre-established voiceprint extraction model to obtain the voiceprint sub-model of the segment, thereby obtaining the voiceprint sub-model of each segment. Further, the fusion process includes: through the pre-established voiceprint extraction model, splice the first feature maps of the segment into a high-dimensional vector, the first high-dimensional vector of the segment; splice the second feature maps of the segment into a high-dimensional vector, the second high-dimensional vector of the segment; splice the first and second high-dimensional vectors of the segment to obtain a combined high-dimensional vector; reduce the dimensionality of the combined vector through the voiceprint extraction model; and take the reduced vector as the voiceprint sub-model of the segment.

Step b2: average the voiceprint sub-models of the spectrogram segments of the target speech to obtain the voiceprint model of the target speech.

It should be noted that if the target speech has only one spectrogram segment, the voiceprint sub-model of that segment is directly taken as the voiceprint model of the target speech; if the target speech has multiple spectrogram segments, the mean of their voiceprint sub-models is taken as the voiceprint model of the target speech.
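The averaging in step b2 is a plain element-wise mean over the segment sub-models; with a single segment it reduces to that segment's sub-model, matching the note above. A minimal sketch, with the function name assumed:

```python
import numpy as np

def utterance_voiceprint(submodels):
    """Average per-segment voiceprint sub-models into the voiceprint model
    of the whole utterance; a single-segment list returns that sub-model."""
    return np.stack(submodels).mean(axis=0)

one = utterance_voiceprint([np.array([2.0, 4.0])])
many = utterance_voiceprint([np.array([2.0, 4.0]), np.array([4.0, 8.0])])
# one -> [2., 4.] ; many -> [3., 6.]
```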

In the voiceprint model determination method provided by the embodiments of the present application, an attention mechanism optimizes the first feature maps of the spectrogram segments of the target speech, and an accurate and stable voiceprint model can be determined from the optimized features. Since the first features contain local information and the second features contain global information, fusing the two yields an even more accurate and stable voiceprint model. Furthermore, considering that voiceprint information manifests differently in different frequency bands, the embodiments of the present application divide the first feature map into sub-maps on different frequency bands when determining the attention weights; this reduces mutual interference between frequency bands and allows the attention weights to be computed accurately.

The embodiments of the present application also provide a voiceprint model determination apparatus, described below; the apparatus described below and the method described above may be referenced against each other.

Referring to Fig. 4, which shows a structural diagram of the voiceprint model determination apparatus provided by the embodiments of the present application, as shown in Fig. 4 the apparatus may include: a spectrogram segment obtaining module 401, a first feature obtaining module 402, a second feature obtaining module 403, and a voiceprint model determining module 404.

The spectrogram segment obtaining module 401 is configured to obtain at least one spectrogram segment of the target speech.

The first feature obtaining module 402 is configured to determine at least one first feature map of each spectrogram segment through the pre-established voiceprint extraction model.

Here, the points in a first feature map are mutually independent.

The second feature obtaining module 403 is configured to determine, through the voiceprint extraction model, the second feature map corresponding to each first feature map and containing its global information, obtaining at least one second feature map of each spectrogram segment.

Here, the second feature map corresponding to a first feature map is the feature map obtained by reinforcing the voiceprint-distinguishing regions of that first feature map.

The voiceprint model determining module 404 is configured to determine the voiceprint model of the target speech, at least using the at least one second feature map of each spectrogram segment and the voiceprint extraction model.

In the voiceprint model determination apparatus provided by the embodiments of the present application, the pre-established voiceprint extraction model obtains first feature maps of the spectrogram segments of the target speech, from which a more accurate and stable voiceprint model can be determined. Considering that the points of a first feature map are mutually independent, i.e. it contains only local information, the present application further uses the voiceprint extraction model to fully mine the global information of the first feature map, so that the voiceprint model is determined at least from second feature maps containing global information. Since a second feature map contains global information and reinforces the voiceprint-distinguishing regions of the first feature map, a more stable and accurate voiceprint model can be determined at least on the basis of the second feature maps.

In one possible implementation, in the voiceprint model determination apparatus provided by the above embodiments, the spectrogram segment obtaining module 401 includes a feature determining sub-module and a segmentation sub-module.

The feature determining sub-module is configured to determine the speech feature of each frame of the target speech, obtaining the speech feature sequence of the target speech.

The segmentation sub-module is configured to segment the speech feature sequence of the target speech according to a preset segmentation rule, obtaining at least one spectrogram segment of the target speech.

In one possible implementation, in the voiceprint model determination apparatus provided by the above embodiments, the second feature obtaining module 403 includes a first-feature-map dividing sub-module, a second-feature-sub-map determining sub-module, and a second-feature-map determining sub-module.

The first-feature-map dividing sub-module is configured to divide any first feature map into multiple first feature sub-maps of different frequency bands, obtaining the multiple first feature sub-maps contained in each first feature map.

The second-feature-sub-map determining sub-module is configured to determine, for any first feature sub-map, the corresponding second feature sub-map containing global information, obtaining the second feature sub-map corresponding to each first feature sub-map.

The second-feature-map determining sub-module is configured to compose, for any first feature map, the second feature sub-maps corresponding to its multiple first feature sub-maps into the second feature map, containing global information, that corresponds to that first feature map, obtaining the second feature map containing global information that corresponds to each first feature map.

In one possible implementation, the second-feature-sub-map determining sub-module is specifically configured to: reduce the dimensionality of the first feature sub-map with three convolution kernels of identical size but different parameters, obtaining three dimensionality-reduced feature sub-maps; determine attention weights from two of the three dimensionality-reduced feature sub-maps; and determine, from the attention weights and the remaining dimensionality-reduced feature sub-map, the second feature sub-map, containing global information, that corresponds to the first feature sub-map.

In one possible implementation, in the voiceprint model determination apparatus provided by the above embodiments, the voiceprint model determining module 404 is specifically configured to determine the voiceprint model of the target speech using the voiceprint extraction model, the at least one first feature map of each spectrogram segment, and the at least one second feature map of each spectrogram segment.

In one possible implementation, the voiceprint-model determination module 404 includes a voiceprint-sub-model determination submodule and a voiceprint-model determination submodule.

Voiceprint-sub-model determination submodule, configured to, for any spectrogram segment of the target speech, fuse the at least one first feature map of the segment with the at least one second feature map of the segment through the voiceprint extraction model to obtain the voiceprint sub-model of the segment, so as to obtain the voiceprint sub-model of each spectrogram segment of the target speech.

Voiceprint-model determination submodule, configured to average the voiceprint sub-models of the spectrogram segments of the target speech to obtain the voiceprint model of the target speech.

In one possible implementation, when fusing the at least one first feature map of a spectrogram segment with the at least one second feature map of the segment through the voiceprint extraction model to obtain the voiceprint sub-model of the segment, the voiceprint-sub-model determination submodule is specifically configured to: splice each first feature map of the segment into a high-dimensional vector through the voiceprint extraction model, as the first high-dimensional vector of the segment; splice each second feature map of the segment into a high-dimensional vector through the voiceprint extraction model, as the second high-dimensional vector of the segment; splice the first high-dimensional vector and the second high-dimensional vector of the segment through the voiceprint extraction model to obtain a spliced high-dimensional vector; and reduce the dimensionality of the spliced high-dimensional vector through the voiceprint extraction model, determining the dimension-reduced vector as the voiceprint sub-model of the segment.
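The splice-and-reduce fusion, followed by the averaging performed by the voiceprint-model determination submodule, can be sketched as follows. The map shapes, segment count, sub-model dimension, and the random linear projection used for dimensionality reduction are all illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def segment_submodel(first_maps, second_maps, w_reduce):
    """Fuse one segment's first and second feature maps into its voiceprint sub-model."""
    v1 = np.concatenate([m.ravel() for m in first_maps])   # first high-dimensional vector
    v2 = np.concatenate([m.ravel() for m in second_maps])  # second high-dimensional vector
    joint = np.concatenate([v1, v2])                       # spliced high-dimensional vector
    return w_reduce @ joint                                # dimensionality reduction

def random_maps(n=2, shape=(16, 25)):
    return [rng.standard_normal(shape) for _ in range(n)]

dim_in = 2 * 2 * 16 * 25                        # two maps of each kind, 16x25 each
w_reduce = rng.standard_normal((128, dim_in)) * 0.01
submodels = [segment_submodel(random_maps(), random_maps(), w_reduce)
             for _ in range(2)]                 # one sub-model per spectrogram segment

voiceprint_model = np.mean(submodels, axis=0)   # average the segment sub-models
```

In the embodiment the projection is a trained layer of the voiceprint extraction model rather than a fixed random matrix; the sketch only shows the data flow from feature maps to the final averaged voiceprint model.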

The voiceprint-model determination apparatus provided by the above embodiment may further include a model construction module.

The model construction module includes: a training-speech acquisition module, a spectrogram-segment acquisition module, a first-feature-map determination module, a second-feature-map determination module, a voiceprint-sub-model determination module, an identity-label prediction module, and a parameter update module.

Training-speech acquisition module, configured to acquire training speech.

Spectrogram-segment acquisition module, configured to acquire at least one spectrogram segment of the training speech.

First-feature-map determination module, configured to determine, through the current voiceprint extraction model, at least one first feature map of each spectrogram segment of the training speech.

The feature points in a first feature map are mutually independent.

If this is the first round of training, the current voiceprint extraction model is the initial voiceprint extraction model; otherwise, it is the voiceprint extraction model resulting from the previous round of training.

Second-feature-map determination module, configured to determine, through the current voiceprint extraction model, the second feature map that corresponds to each first feature map of each spectrogram segment of the training speech and contains global information, so as to obtain at least one second feature map of each spectrogram segment of the training speech.

The second feature map corresponding to a first feature map is the feature map obtained by strengthening the feature regions of the first feature map that can distinguish voiceprints.

Voiceprint-sub-model determination module, configured to determine the voiceprint sub-model of each spectrogram segment of the training speech using at least the at least one second feature map of each spectrogram segment of the training speech and the current voiceprint extraction model.

Identity-label prediction module, configured to predict, from the voiceprint sub-model of each spectrogram segment of the training speech, the voiceprint identity label corresponding to each spectrogram segment of the training speech.

Parameter update module, configured to update the parameters of the current voiceprint extraction model according to the prediction results of the identity-label prediction module.

In one possible implementation, the voiceprint-sub-model determination module is specifically configured to, for any spectrogram segment of the training speech, fuse the at least one first feature map of the segment with the at least one second feature map of the segment through the current voiceprint extraction model to obtain the voiceprint sub-model of the segment, so as to obtain the voiceprint sub-model of each spectrogram segment of the training speech.
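The training flow above (predict an identity label from each segment's sub-model, then update the model parameters from the prediction error) can be illustrated with a deliberately simplified stand-in: a linear softmax identity classifier trained by gradient descent on synthetic sub-model vectors. The classifier form, learning rate, and synthetic data are all assumptions; the embodiment does not fix a particular loss or optimizer:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

dim, n_speakers, lr = 32, 5, 0.1
prototypes = rng.standard_normal((n_speakers, dim))  # one fixed "voice" per speaker
w_cls = np.zeros((n_speakers, dim))                  # identity-classifier parameters
losses = []

for step in range(500):
    label = step % n_speakers
    # stand-in for a spectrogram segment's voiceprint sub-model
    sub = prototypes[label] + 0.1 * rng.standard_normal(dim)
    probs = softmax(w_cls @ sub)                     # predict the identity label
    losses.append(-np.log(probs[label]))             # cross-entropy on the true label
    grad = np.outer(probs - np.eye(n_speakers)[label], sub)
    w_cls -= lr * grad                               # update current model parameters
```

In the embodiment the gradient would flow through the whole voiceprint extraction model rather than only a classifier head; the sketch shows just the predict-then-update loop that the identity-label prediction module and parameter update module implement.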

An embodiment of the present application further provides a voiceprint-model determination device. Referring to Fig. 5, which shows a structural schematic diagram of the voiceprint-model determination device, the device may include: at least one processor 501, at least one communication interface 502, at least one memory 503, and at least one communication bus 504.

In this embodiment of the application, there is at least one each of the processor 501, the communication interface 502, the memory 503, and the communication bus 504, and the processor 501, the communication interface 502, and the memory 503 communicate with one another through the communication bus 504.

The processor 501 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement embodiments of the present invention.

The memory 503 may include high-speed RAM memory, and may further include non-volatile memory, for example at least one magnetic disk memory.

The memory stores a program that the processor can call, the program being used for:

obtaining at least one spectrogram segment of a target speech;

determining, through a pre-established voiceprint extraction model, at least one first feature map of each spectrogram segment, where the feature points in a first feature map are mutually independent;

determining, through the voiceprint extraction model, the second feature map that corresponds to each first feature map and contains its global information, obtaining at least one second feature map of each spectrogram segment, where the second feature map corresponding to a first feature map is the feature map obtained by strengthening the feature regions of the first feature map that can distinguish voiceprints;

determining the voiceprint model of the target speech using at least the at least one second feature map of each spectrogram segment and the voiceprint extraction model.

Optionally, for the refined and extended functions of the program, reference may be made to the description above.

An embodiment of the present application further provides a readable storage medium storing a program executable by a processor, the program being used for:

obtaining at least one spectrogram segment of a target speech;

determining, through a pre-established voiceprint extraction model, at least one first feature map of each spectrogram segment, where the feature points in a first feature map are mutually independent;

determining, through the voiceprint extraction model, the second feature map that corresponds to each first feature map and contains its global information, obtaining at least one second feature map of each spectrogram segment, where the second feature map corresponding to a first feature map is the feature map obtained by strengthening the feature regions of the first feature map that can distinguish voiceprints;

determining the voiceprint model of the target speech using at least the at least one second feature map of each spectrogram segment and the voiceprint extraction model.

Finally, it should be noted that relational terms such as "first" and "second" herein are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between these entities or operations. Moreover, the terms "include", "comprise", or any other variant thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article, or device. Absent further restrictions, an element defined by the phrase "including a ..." does not exclude the existence of other identical elements in the process, method, article, or device that includes that element.

The embodiments in this specification are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and the same or similar parts of the embodiments may be referred to one another.

The foregoing description of the disclosed embodiments enables those skilled in the art to implement or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the general principles defined herein can be realized in other embodiments without departing from the spirit or scope of the present invention. Therefore, the present invention is not intended to be limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
