Method and apparatus for speech detection

Document No.: 1776697 Publication date: 2019-12-03

Reading note: This technology, "Method and apparatus for speech detection" (语音检测的方法和装置), was designed by 郭红敬, 李国梁, 王鑫山, 杨柯 and 朱虎 on 2018-03-26. Main content: A method and apparatus for speech detection, the method comprising: determining the energy of each of N groups of a first data block in data to be processed, where N is a positive integer (S110); determining an initial candidate noise set and an initial candidate voice set according to the energies of the N groups, where the maximum energy of the groups in the initial candidate noise set is less than the minimum energy of the groups in the initial candidate voice set (S120); determining an initial noise threshold according to the energy of each group in the initial candidate noise set (S130); and determining a first-iteration candidate noise set and a first-iteration candidate voice set according to the initial candidate voice set and the initial noise threshold, where the energy of every group in the first-iteration candidate noise set is less than or equal to the initial noise threshold, and the energy of every group in the first-iteration candidate voice set is greater than the initial noise threshold (S140).

A method of speech detection, comprising:

determining an energy of each of N groups of a first data block in data to be processed, wherein N is a positive integer;

determining an initial candidate noise set and an initial candidate voice set according to the energies of the N groups, wherein the maximum energy of the groups in the initial candidate noise set is less than the minimum energy of the groups in the initial candidate voice set;

determining an initial noise threshold according to the energy of each group in the initial candidate noise set;

and determining a candidate noise set of the first iteration and a candidate voice set of the first iteration according to the initial candidate voice set and the initial noise threshold, wherein the energy of each group in the candidate noise set of the first iteration is less than or equal to the initial noise threshold, and the energy of each group in the candidate voice set of the first iteration is greater than the initial noise threshold.
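The four steps of this claim can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function names, the use of the sum of squared samples as the group energy, the fixed count `n_noise` for the initial split, and the rule "threshold = factor × mean noise energy" are all chosen here for illustration. The sketch also assumes that `factor` is large enough for the initial noise groups to lie at or below the threshold, and it does not handle ties at the split boundary.

```python
from typing import List, Sequence, Tuple


def group_energies(block: Sequence[float], n_groups: int) -> List[float]:
    """Split a block into n_groups equal-length groups; energy = sum of squared samples."""
    size = len(block) // n_groups  # assumption: leftover trailing samples are ignored
    return [sum(x * x for x in block[g * size:(g + 1) * size]) for g in range(n_groups)]


def first_iteration(
    energies: Sequence[float], n_noise: int, factor: float
) -> Tuple[float, List[int], List[int]]:
    """Return (initial threshold, first-iteration noise indices, first-iteration voice indices)."""
    # Initial split: the n_noise lowest-energy groups are the initial candidate noise set.
    order = sorted(range(len(energies)), key=lambda i: energies[i])
    noise, voice = order[:n_noise], order[n_noise:]
    # Initial noise threshold derived from the energies in the initial noise set
    # (assumed form: factor times their mean).
    threshold = factor * sum(energies[i] for i in noise) / len(noise)
    # Voice candidates at or below the threshold are reclassified as noise.
    new_noise = noise + [i for i in voice if energies[i] <= threshold]
    new_voice = [i for i in voice if energies[i] > threshold]
    return threshold, new_noise, new_voice
```

The sets are kept as lists of group indices rather than energies so that later steps can still reason about the time order of the groups.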

The method of claim 1, further comprising:

determining a noise threshold of the kth iteration according to the energy of each group in the candidate noise set of the kth iteration, wherein k = 1, 2, ...;

and determining a candidate noise set of the (k+1)th iteration and a candidate voice set of the (k+1)th iteration according to the candidate voice set of the kth iteration and the noise threshold of the kth iteration.

The method of claim 2, further comprising:

if the energy of each group in the candidate voice set of the kth iteration is greater than the noise threshold of the kth iteration, determining the candidate voice set of the kth iteration as a target voice set, and determining the candidate noise set of the kth iteration as a target noise set.

The method of claim 2, further comprising:

when the iteration number k reaches an iteration upper limit, determining the candidate voice set of the kth iteration as a target voice set, and determining the candidate noise set of the kth iteration as a target noise set.
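The iteration and its two stopping rules (no voice candidate at or below the current threshold, or an upper limit on the number of iterations) can be sketched as a single loop. This is only an illustration: `refine`, `factor` and `max_iter` are hypothetical names, and the threshold is again assumed to be `factor` times the mean energy of the current candidate noise set.

```python
from typing import List, Sequence, Tuple


def refine(
    energies: Sequence[float],
    noise_idx: Sequence[int],
    voice_idx: Sequence[int],
    factor: float = 3.0,
    max_iter: int = 10,
) -> Tuple[List[int], List[int]]:
    """Iteratively move low-energy voice candidates into the noise set."""
    noise, voice = list(noise_idx), list(voice_idx)
    for _ in range(max_iter):  # stopping rule: iteration upper limit
        # Threshold of this iteration from the current candidate noise set.
        threshold = factor * sum(energies[i] for i in noise) / len(noise)
        moved = [i for i in voice if energies[i] <= threshold]
        if not moved:  # stopping rule: every voice candidate exceeds the threshold
            break
        noise += moved
        voice = [i for i in voice if energies[i] > threshold]
    return noise, voice  # target noise set, target voice set
```

Because each pass can only move groups from the voice set to the noise set, the loop terminates after at most as many passes as there are voice candidates, even without the upper limit.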

The method of claim 3 or 4, further comprising:

arranging the groups in the target voice set in time order;

and determining an updated target voice set according to the time interval between adjacent groups in the target voice set.

The method of claim 5, wherein determining the updated target voice set according to the time interval between adjacent groups in the target voice set comprises:

if the time interval between two adjacent groups in the target voice set is smaller than a preset threshold, determining that the other groups lying between the two adjacent groups are also voice signals, and adding those groups to the target voice set to obtain the updated target voice set.
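The gap-filling step can be sketched as below. The sketch assumes that all groups have equal duration, so that the group index stands in for time and the time interval between two groups is the difference of their indices; `fill_gaps` and `max_gap` are hypothetical names.

```python
from typing import Iterable, List


def fill_gaps(voice_idx: Iterable[int], max_gap: int) -> List[int]:
    """Add the groups lying between two voice groups that are closer than max_gap."""
    ordered = sorted(voice_idx)  # arrange the target voice set in time order
    if not ordered:
        return []
    filled = [ordered[0]]
    for prev, cur in zip(ordered, ordered[1:]):
        if cur - prev < max_gap:
            # The in-between groups are treated as voice as well.
            filled.extend(range(prev + 1, cur))
        filled.append(cur)
    return filled
```

For example, with `max_gap=3`, the set `[7, 2, 3, 5, 12]` becomes `[2, 3, 4, 5, 6, 7, 12]`: the short dips at groups 4 and 6 are absorbed into the speech segment, while the longer pause before group 12 is left as noise.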

The method of any one of claims 1 to 6, wherein determining the initial noise threshold according to the energy of each group in the initial candidate noise set comprises:

determining an initial noise power according to the energy of each group in the initial candidate noise set;

and determining the initial noise threshold as the product of the initial noise power and a threshold factor, wherein the threshold factor is determined according to a target false alarm probability.
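The claim does not state how the threshold factor follows from the target false alarm probability. As one possible illustration, assume that the energy of a noise-only group is exponentially distributed with mean equal to the noise power $P_n$. This is a modeling assumption made here, not something taken from the source. Then $P_{fa} = \Pr(E > T) = e^{-T/P_n}$, so $T = -\ln(P_{fa})\,P_n$ and the threshold factor is $-\ln(P_{fa})$. The function names below are hypothetical.

```python
import math


def threshold_factor(p_false_alarm: float) -> float:
    """Threshold factor under the assumed exponential noise-energy model."""
    if not 0.0 < p_false_alarm < 1.0:
        raise ValueError("false alarm probability must lie strictly between 0 and 1")
    return -math.log(p_false_alarm)


def noise_threshold(noise_power: float, p_false_alarm: float) -> float:
    """Noise threshold = noise power multiplied by the threshold factor."""
    return noise_power * threshold_factor(p_false_alarm)
```

A lower target false alarm probability yields a larger factor and therefore a higher threshold, so fewer noise groups are mistaken for speech at the cost of missing more weak speech.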

The method of claim 7, wherein the first data block is the initial data block in the data to be processed, and wherein determining the initial noise power according to the energy of each group in the initial candidate noise set comprises:

determining the average of the energies of the groups in the initial candidate noise set as the initial noise power.

The method of claim 7, wherein the first data block is not the initial data block in the data to be processed, the data block preceding the first data block is a second data block, and wherein determining the initial noise power according to the energy of each group in the initial candidate noise set comprises:

determining the initial noise power of the first data block according to the target noise power of the second data block and the estimated noise power of the first data block, wherein the estimated noise power of the first data block is the average of the energies of the groups in the initial candidate noise set of the first data block, and the target noise power of the second data block is the average of the energies of the groups in the target noise set of the second data block.
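The two cases for the initial noise power can be sketched together. The claims do not say how the previous block's target noise power and the current block's estimate are combined; the weighted average with weight `alpha` used below is an assumption made purely for illustration, and `initial_noise_power` is a hypothetical name.

```python
from typing import Optional, Sequence


def initial_noise_power(
    noise_energies: Sequence[float],
    prev_target_power: Optional[float] = None,
    alpha: float = 0.5,
) -> float:
    """Initial noise power of a block from its initial candidate noise energies."""
    # Estimated noise power: mean energy of the initial candidate noise set.
    estimate = sum(noise_energies) / len(noise_energies)
    if prev_target_power is None:
        # Initial data block: no history, use the estimate directly.
        return estimate
    # Later data blocks: blend with the previous block's target noise power
    # (assumed combination rule; alpha is a hypothetical smoothing weight).
    return alpha * prev_target_power + (1.0 - alpha) * estimate
```

Carrying the previous block's noise power forward keeps the threshold from jumping when a block happens to contain little or no silence.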

The method of any one of claims 1 to 9, wherein determining the initial candidate noise set and the initial candidate voice set according to the energies of the N groups comprises:

determining a given proportion of the N groups having the lowest energies as the initial candidate noise set, and determining the other groups of the N groups as the initial candidate voice set; or

determining a given number of the N groups having the lowest energies as the initial candidate noise set, and determining the other groups of the N groups as the initial candidate voice set.
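Both alternatives amount to sorting the groups by energy and cutting the sorted list at some position. A minimal sketch, with the hypothetical function `split_lowest` accepting either a proportion or a count, and with the cut clamped so that both sets are non-empty (an added safeguard, not taken from the claims):

```python
from typing import List, Optional, Sequence, Tuple


def split_lowest(
    energies: Sequence[float],
    proportion: Optional[float] = None,
    count: Optional[int] = None,
) -> Tuple[List[int], List[int]]:
    """Return (noise indices, voice indices) using the lowest-energy groups as noise."""
    if (proportion is None) == (count is None):
        raise ValueError("give exactly one of proportion or count")
    n = len(energies)
    n_noise = count if count is not None else int(n * proportion)
    n_noise = min(max(n_noise, 1), n - 1)  # keep both sets non-empty (needs n >= 2)
    order = sorted(range(n), key=lambda i: energies[i])
    # Ties at the cut are not resolved here; with distinct energies the maximum
    # noise energy is strictly below the minimum voice energy.
    return order[:n_noise], order[n_noise:]
```

A proportion adapts automatically to the number of groups per block, whereas a fixed count gives the same amount of data for the noise estimate regardless of block size.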

An apparatus for speech detection, comprising a determining module configured to:

determine an energy of each of N groups of a first data block in data to be processed, wherein N is a positive integer;

determine an initial candidate noise set and an initial candidate voice set according to the energies of the N groups, wherein the maximum energy of the groups in the initial candidate noise set is less than the minimum energy of the groups in the initial candidate voice set;

determine an initial noise threshold according to the energy of each group in the initial candidate noise set;

and determine a candidate noise set of the first iteration and a candidate voice set of the first iteration according to the initial candidate voice set and the initial noise threshold, wherein the energy of each group in the candidate noise set of the first iteration is less than or equal to the initial noise threshold, and the energy of each group in the candidate voice set of the first iteration is greater than the initial noise threshold.

The apparatus of claim 11, wherein the determining module is further configured to:

determine a noise threshold of the kth iteration according to the energy of each group in the candidate noise set of the kth iteration, wherein k = 1, 2, ...;

and determine a candidate noise set of the (k+1)th iteration and a candidate voice set of the (k+1)th iteration according to the candidate voice set of the kth iteration and the noise threshold of the kth iteration.

The apparatus of claim 12, wherein the determining module is further configured to:

when the iteration number k reaches an iteration upper limit, determine the candidate voice set of the kth iteration as a target voice set, and determine the candidate noise set of the kth iteration as a target noise set.

The apparatus of claim 12, wherein the determining module is further configured to:

if the energy of each group in the candidate voice set of the kth iteration is greater than the noise threshold of the kth iteration, determine the candidate voice set of the kth iteration as a target voice set, and determine the candidate noise set of the kth iteration as a target noise set.

The apparatus of claim 13 or 14, wherein the determining module is further configured to:

arrange the groups in the target voice set in time order;

and determine an updated target voice set according to the time interval between adjacent groups in the target voice set.

The apparatus of claim 15, wherein the determining module is specifically configured to:

if the time interval between two adjacent groups in the target voice set is smaller than a preset threshold, determine that the other groups lying between the two adjacent groups are also voice signals, and add those groups to the target voice set to obtain the updated target voice set.

The apparatus of any one of claims 11 to 16, wherein the determining module is specifically configured to:

determine an initial noise power according to the energy of each group in the initial candidate noise set;

and determine the initial noise threshold as the product of the initial noise power and a threshold factor, wherein the threshold factor is determined according to a target false alarm probability.

The apparatus of claim 17, wherein the first data block is the initial data block in the data to be processed, and the determining module is specifically configured to:

determine the average of the energies of the groups in the initial candidate noise set as the initial noise power.

The apparatus of claim 17, wherein the first data block is not the initial data block in the data to be processed, the data block preceding the first data block is a second data block, and the determining module is specifically configured to:

determine the initial noise power of the first data block according to the target noise power of the second data block and the estimated noise power of the first data block, wherein the estimated noise power of the first data block is the average of the energies of the groups in the initial candidate noise set of the first data block, and the target noise power of the second data block is the average of the energies of the groups in the target noise set of the second data block.

The apparatus of any one of claims 11 to 19, wherein the determining module is further configured to:

determine a given proportion of the N groups having the lowest energies as the initial candidate noise set, and determine the other groups of the N groups as the initial candidate voice set; or

determine a given number of the N groups having the lowest energies as the initial candidate noise set, and determine the other groups of the N groups as the initial candidate voice set.
