Quality evaluation method for automatic voice data annotation

Document No. 925430 · Published 2021-03-02

Reading note: This technology, "Quality evaluation method for automatic voice data annotation" (一种语音数据自动标注的质量评估方法), was designed and created on 2020-11-20 by He Jun, Zhang Caiqing, Zhou Yifang, Shen Shikai, and Yue Weihao. Its main content is as follows: the invention provides a quality evaluation method for automatic voice data annotation, comprising: constructing in advance, based on key quality indicators, a quality rule base for automatically annotated voice data; reading the automatically annotated voice data to be inspected and performing quality inspection on it according to the key quality indicators, thereby completing a quality measurement; updating the automatically annotated voice data set according to the result of the quality measurement; and converting the updated data set into new rules imported into the quality rule base. The method remedies the shortcomings of applying traditional annotation-quality evaluation methods to machine-annotated data, and strongly supports the advancement of intelligent speech processing for minority languages.

1. A quality evaluation method for automatic voice data annotation, characterized by comprising the following steps:

Step 1: construct in advance, based on key quality indicators, a quality rule base for automatically annotated voice data;

the key quality indicators include: word error rate (WER), sentence error rate (SER), bias-error feature miss rate (PAR), and user-feedback error rate (CER);

Step 2: read the automatically annotated voice data to be inspected, and perform quality inspection on it according to the key quality indicators, thereby completing the quality measurement;

Step 3: update the automatically annotated voice data set according to the result of the quality measurement;

Step 4: convert the updated automatically annotated voice data set into new rules and import them into the quality rule base.

2. The quality evaluation method for automatic voice data annotation according to claim 1, wherein constructing the quality rule base of automatically annotated voice data in step 1 comprises the following steps:

Step 11: generate the basic rule layer; generate basic rules from the key quality indicators and use them as the baseline standard of the rule base; the basic rule layer contains only rules constructed in advance, and no rule-import operations are performed on it during quality evaluation;

Step 12: generate the custom rule layer; define rules according to business requirements and generate the corresponding data annotation rules, which include automatic voice data annotation rules and Chinese data annotation rules; new rules generated during quality evaluation are imported into the custom rule layer for storage;

Step 13: generate the user rule layer; collect the quality results fed back by test users using a uniform text template, and, after manual review, store the accepted feedback to generate new rules;

Step 14: rule inspection; check whether any rules conflict logically, modify the conflicting rules, and re-check until all logical conflicts are eliminated;

Step 15: use the inspected rule base as the quality rule base.

3. The quality evaluation method for automatic voice data annotation according to claim 2, wherein step 2 comprises the following steps:

Step 21: acquire the automatically annotated voice data to be inspected, separate the labels annotating words, sentences, and bias-error features, and store them as a word label set, a sentence label set, and a bias-error label set;

Step 22: compare the word label set one by one against the word error rate rules in the quality rule base, record the number of incorrectly labeled words, then calculate the word error rate and record each error's position and type;

Step 23: compare the sentence label set one by one against the sentence error rate rules in the quality rule base, record the number of incorrectly labeled sentences, then calculate the sentence error rate and record each error's position and type;

Step 24: compare the bias-error label set one by one against the bias-error feature rules in the quality rule base, record the number of sentences whose bias-error features are not labeled, then calculate the bias-error feature miss rate and record each error's position and type;

Step 25: compare the word, sentence, and bias-error label sets one by one against the user-feedback error rules, record the errors found through comparison, then calculate the user-feedback error rate and record each error's position and type;

Step 26: combine the word error rate, sentence error rate, bias-error feature miss rate, and user-feedback error rate by formula to obtain the quality score of each automatically annotated voice data set;

wherein the quality scoring formula is as follows:

AQM = (q1·WER + q2·SER + q3·PAR + q4·CER) × 100%, where q1, q2, q3, q4 denote the weights of the key quality indicators, satisfying q1 + q2 + q3 + q4 = 1, and AQM denotes the quality score.

4. The quality evaluation method for automatic voice data annotation according to claim 3, wherein the word error rate is calculated as WER = (S + D + I)/N, where S is the number of incorrectly labeled words requiring substitution, D the number requiring deletion, I the number requiring insertion, and N the total number of labeled words; this indicator corresponds to the basic and custom rule layers;

the sentence error rate is calculated as SER = EN/TN, where EN is the number of incorrectly labeled sentences (a sentence is judged incorrect if any word in it is labeled incorrectly) and TN is the total number of labeled sentences; this indicator corresponds to the basic and custom rule layers;

the bias-error feature miss rate is calculated as PAR = AN/TN, where AN is the number of sentences lacking bias-error feature labels and TN is the total number of bias-error features in the rule base; this indicator corresponds to the basic and custom rule layers;

the user-feedback error rate is calculated as CER = (w1·B1 + w2·B2 + w3·B3)/N, where B1, B2, B3 are the numbers of word labeling errors, sentence errors, and bias-error feature errors found in the sample data by the user-feedback rules, w1, w2, w3 are the corresponding weights, and N is the total count of the three error types in the sample data; this indicator corresponds to the user rule layer.

5. The quality evaluation method for automatic voice data annotation according to claim 2, wherein step 13 comprises the following steps:

Step 131: preset a rule template, so that when a test user finds a quality problem in the automatically annotated voice data set, the user fills in the template and submits it as feedback; feedback generated from the rule template can be read directly and imported into the quality rule base;

Step 132: manually review the feedback for conformity and reasonableness; if the review passes, import the feedback into the quality rule base; otherwise, discard it.

6. The quality evaluation method for automatic voice data annotation according to claim 3, wherein step 3 comprises:

Step 31: set a quality threshold; if the quality score exceeds the threshold, do not update;

Step 32: if the quality score is below the threshold, update: according to the error positions and types recorded during quality scoring, delete, replace, or insert labels in the automatically annotated voice data;

Step 33: after the update, perform quality evaluation again, repeating until the quality score exceeds the threshold.

7. The quality evaluation method for automatic voice data annotation according to claim 2, wherein step 4 comprises:

Step 41: classify and store the labels separated and recorded during evaluation of the automatically annotated voice data set;

Step 42: convert the classified labels into the format of a preset rule template to generate new rules;

Step 43: import the new rules into the custom rule layer of the quality rule base for storage.

Technical Field

The present invention relates to the technical field of language information processing, and in particular to a quality evaluation method for automatic voice data annotation.

Background

In recent years, automatic data annotation has become a key foundational technology in the field of artificial intelligence: the goal is for machines to annotate data in place of human labor, and great progress has already been made in domains such as images. Meanwhile, the extreme scarcity of annotated speech data has become a key factor restricting speech recognition performance for minority languages. Owing to factors such as raw data quality, human error, and model limitations, annotation errors are difficult to avoid, so introducing an effective quality evaluation method is essential; at present, annotation standards are inconsistent and annotation quality uneven, which greatly hinders the application and development of data annotation.

Existing quality evaluation methods for manual data annotation fall into two main categories: first, methods based on human participation, which mainly reach an evaluation conclusion through sampling analysis by quality inspectors; second, methods based on probabilistic models, which mainly achieve quality evaluation and error correction through sampling statistics on annotation quality. These methods, however, are designed for evaluating manual annotation and are not suited to evaluating automatic annotation, chiefly because machine annotation and manual annotation differ greatly in the causes, types, and patterns of their errors.

Disclosure of Invention

The invention aims to remedy the above shortcomings of the prior art by providing a quality evaluation method for automatic voice data annotation that solves two problems. First, it evaluates the quality of machine-produced automatic speech annotations, finding quality problems such as incorrect and missing labels in the annotated data so that automatic annotation quality can be improved. Second, addressing the essential difference between automatic and manual annotation, it introduces a rule-base logical reasoning mechanism on top of existing probabilistic quality evaluation methods: common quality problems in automatically annotated data are formed into rules, and quality evaluation and measurement are realized through rule comparison.

A quality evaluation method for automatic voice data annotation comprises the following steps:

Step 1: construct in advance, based on key quality indicators, a quality rule base for automatically annotated voice data;

the key quality indicators include: word error rate (WER), sentence error rate (SER), bias-error feature miss rate (PAR), and user-feedback error rate (CER);

Step 2: read the automatically annotated voice data to be inspected, and perform quality inspection on it according to the key quality indicators, thereby completing the quality measurement;

Step 3: update the automatically annotated voice data set according to the result of the quality measurement;

Step 4: convert the updated automatically annotated voice data set into new rules and import them into the quality rule base.

Further, in the above method, constructing the quality rule base of automatically annotated voice data in step 1 comprises the following steps:

Step 11: generate the basic rule layer; generate basic rules from the key quality indicators and use them as the baseline standard of the rule base; the basic rule layer contains only rules constructed in advance, and no rule-import operations are performed on it during quality evaluation;

Step 12: generate the custom rule layer; define rules according to business requirements and generate the corresponding data annotation rules, which include automatic voice data annotation rules and Chinese data annotation rules; new rules generated during quality evaluation are imported into the custom rule layer for storage;

Step 13: generate the user rule layer; collect the quality results fed back by test users using a uniform text template, and, after manual review, store the accepted feedback to generate new rules;

Step 14: rule inspection; check whether any rules conflict logically, modify the conflicting rules, and re-check until all logical conflicts are eliminated;

Step 15: use the inspected rule base as the quality rule base.

Further, in the above method, step 2 comprises the following steps:

Step 21: acquire the automatically annotated voice data to be inspected, separate the labels annotating words, sentences, and bias-error features, and store them as a word label set, a sentence label set, and a bias-error label set;

Step 22: compare the word label set one by one against the word error rate rules in the quality rule base, record the number of incorrectly labeled words, then calculate the word error rate and record each error's position and type;

Step 23: compare the sentence label set one by one against the sentence error rate rules in the quality rule base, record the number of incorrectly labeled sentences, then calculate the sentence error rate and record each error's position and type;

Step 24: compare the bias-error label set one by one against the bias-error feature rules in the quality rule base, record the number of sentences whose bias-error features are not labeled, then calculate the bias-error feature miss rate and record each error's position and type;

Step 25: compare the word, sentence, and bias-error label sets one by one against the user-feedback error rules, record the errors found through comparison, then calculate the user-feedback error rate and record each error's position and type;

Step 26: combine the word error rate, sentence error rate, bias-error feature miss rate, and user-feedback error rate by formula to obtain the quality score of each automatically annotated voice data set;

wherein the quality scoring formula is as follows:

AQM = (q1·WER + q2·SER + q3·PAR + q4·CER) × 100%, where q1, q2, q3, q4 denote the weights of the key quality indicators, satisfying q1 + q2 + q3 + q4 = 1, and AQM denotes the quality score.

Further, in the above method, the word error rate is calculated as WER = (S + D + I)/N, where S is the number of incorrectly labeled words requiring substitution, D the number requiring deletion, I the number requiring insertion, and N the total number of labeled words; this indicator corresponds to the basic and custom rule layers;

the sentence error rate is calculated as SER = EN/TN, where EN is the number of incorrectly labeled sentences (a sentence is judged incorrect if any word in it is labeled incorrectly) and TN is the total number of labeled sentences; this indicator corresponds to the basic and custom rule layers;

the bias-error feature miss rate is calculated as PAR = AN/TN, where AN is the number of sentences lacking bias-error feature labels and TN is the total number of bias-error features in the rule base; this indicator corresponds to the basic and custom rule layers;

the user-feedback error rate is calculated as CER = (w1·B1 + w2·B2 + w3·B3)/N, where B1, B2, B3 are the numbers of word labeling errors, sentence errors, and bias-error feature errors found in the sample data by the user-feedback rules, w1, w2, w3 are the corresponding weights, and N is the total count of the three error types in the sample data; this indicator corresponds to the user rule layer.

Further, in the above method, step 13 comprises the following steps:

Step 131: preset a rule template, so that when a test user finds a quality problem in the automatically annotated voice data set, the user fills in the template and submits it as feedback; feedback generated from the rule template can be read directly and imported into the quality rule base;

Step 132: manually review the feedback for conformity and reasonableness; if the review passes, import the feedback into the quality rule base; otherwise, discard it.

Further, in the above method, step 3 comprises:

Step 31: set a quality threshold; if the quality score exceeds the threshold, do not update;

Step 32: if the quality score is below the threshold, update: according to the error positions and types recorded during quality scoring, delete, replace, or insert labels in the automatically annotated voice data;

Step 33: after the update, perform quality evaluation again, repeating until the quality score exceeds the threshold.

Further, in the above method, step 4 comprises:

Step 41: classify and store the labels separated and recorded during evaluation of the automatically annotated voice data set;

Step 42: convert the classified labels into the format of a preset rule template to generate new rules;

Step 43: import the new rules into the custom rule layer of the quality rule base for storage.

The advantages of the invention are as follows:

First, the invention is a quality evaluation method designed specifically for automatic voice data annotation, which differs fundamentally from existing methods for manual or semi-automatic data annotation.

Second, the invention realizes quality evaluation through rule-base "logical reasoning", unlike existing manual and probabilistic-model evaluation methods. The rule base is layered and handles multi-level evaluation indicators such as conventional errors, dialect bias errors, and user feedback, ensuring that the evaluation is comprehensive and effective.

Specifically, most existing deep learning methods are probabilistic, and automatically annotated speech data is itself produced by probabilistic models such as neural networks; the same theoretical framework therefore cannot also serve as its quality check. The rule-base method has the following advantages:

1. Human-summarized quality evaluation experience can be expressed in the form of rules (knowledge) and reused;

2. Rule-based quality inspection compensates for the shortcomings of automatic annotation results obtained by machine-learning model training (such as insufficient data samples, overfitting, and model defects), genuinely combining human logical knowledge with the outputs of data-driven probabilistic models, and greatly improving annotation quality;

3. Errors in machine-produced automatic annotation recur regularly and in large numbers, so the rule-base method can identify them more easily (a single error type sharply lowers the score) and handle them.

Third, the invention introduces a user feedback mechanism, an error-correction mechanism that guards against missing and incorrect labels caused by automatic machine annotation.

The positive effects of the invention are:

(1) It remedies the shortcomings of applying traditional annotation-quality evaluation methods to machine-annotated data;

(2) It provides a data annotation quality evaluation method designed specifically for speech, particularly for minority languages strongly influenced by dialects and Chinese loanwords, and strongly supports the advancement of intelligent speech processing for minority languages.

Drawings

FIG. 1 is a flowchart of the quality evaluation method for automatic voice data annotation according to the present invention.

Detailed Description

To make the objects, technical solutions, and advantages of the present invention clearer, the technical solutions are described below clearly and completely. The described embodiments are obviously only some, not all, of the embodiments of the invention; all other embodiments obtained by a person of ordinary skill in the art without creative effort fall within the protection scope of the invention.

Quality inspection mechanism for automatically annotated data. Because humans and machines understand "annotation errors" differently, and quality inspection of the large volumes of data annotated automatically by computers is inherently difficult, the original manual approach needs suitable assistance. The basic idea of the inspection mechanism is as follows: establish a system of key quality indicators, extract the error-prone points of the annotation process, and build them into a rule base. Such points include: sentence-initial and sentence-final words, Chinese loanwords, and vocabulary that differs across dialect branches. In addition, test users are introduced to enrich the rule base step by step through a feedback mechanism.

Data annotation quality measurement method. Because automatic data annotation uses probability-based learning models, quality measurement requires a more precise method, so the invention adopts a measurement method based on key indicators. An indicator system covering word error rate, sentence error rate, feature error rate, and user feedback rate is established; the indicator weights are continuously optimized through quality evaluation of large amounts of automatically annotated data, and a quality feedback mechanism tunes the parameters to continuously improve model performance.

Automatic data annotation models are mainly probabilistic, while methods based on rule-driven logical reasoning are more effective at inspecting annotation quality. Rule-base data cleaning is already a mature technique in the big data field, and the invention builds on such methods to construct a "rule-base quality inspection model for automatically annotated speech data". The rule base and key indicators are central to the model: a user feedback mechanism is introduced on top of the self-built indicators to find error-prone points and common annotation problems in time, continuously enrich the key indicator base, and step by step improve the accuracy of annotation-quality inspection.

FIG. 1 is a flowchart of the quality evaluation method for automatic voice data annotation according to the present invention. As shown in FIG. 1, the method comprises the following steps:

Step 1: construct in advance, based on key quality indicators, a quality rule base for automatically annotated voice data;

the key quality indicators include: word error rate (WER), sentence error rate (SER), bias-error feature miss rate (PAR), and user-feedback error rate (CER);

Step 2: read the automatically annotated voice data to be inspected, and perform quality inspection on it according to the key quality indicators, thereby completing the quality measurement;

Step 3: update the automatically annotated voice data set according to the result of the quality measurement;

Step 4: convert the updated automatically annotated voice data set into new rules and import them into the quality rule base.

Preferably, constructing the quality rule base of automatically annotated voice data in step 1 comprises the following steps:

Step 11: generate the basic rule layer; generate basic rules from the key quality indicators and use them as the baseline standard of the rule base; the basic rule layer contains only rules constructed in advance, and no rule-import operations are performed on it during quality evaluation;

Step 12: generate the custom rule layer; define rules according to business requirements and generate the corresponding data annotation rules, which include automatic voice data annotation rules and Chinese data annotation rules; new rules generated during quality evaluation are imported into the custom rule layer for storage;

Step 13: generate the user rule layer; collect the quality results fed back by test users using a uniform text template, and, after manual review, store the accepted feedback to generate new rules;

Step 14: rule inspection; check whether any rules conflict logically, modify the conflicting rules, and re-check until all logical conflicts are eliminated;

Step 15: use the inspected rule base as the quality rule base.
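The layered rule base of steps 11–15 can be sketched as follows. This is a minimal illustrative implementation, not part of the patent: the class and field names (`Rule`, `QualityRuleBase`, `pattern`, `correction`) and the simplified conflict test in step 14 (two rules conflict if they match the same pattern but demand different corrections) are all assumptions.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Rule:
    rule_id: str
    pattern: str      # e.g. an error-prone word or feature to match
    correction: str   # the annotation the rule expects

@dataclass
class QualityRuleBase:
    basic: list = field(default_factory=list)   # built in advance; frozen (step 11)
    custom: list = field(default_factory=list)  # business-defined; receives new rules (step 12)
    user: list = field(default_factory=list)    # from reviewed user feedback (step 13)

    def import_rule(self, rule: Rule, layer: str) -> None:
        # The basic layer accepts no imports during quality evaluation.
        if layer == "basic":
            raise ValueError("basic layer does not accept rule imports")
        getattr(self, layer).append(rule)

    def detect_conflicts(self) -> list:
        # Step 14 (simplified): flag patterns mapped to contradictory corrections.
        seen, conflicts = {}, []
        for rule in self.basic + self.custom + self.user:
            if rule.pattern in seen and seen[rule.pattern] != rule.correction:
                conflicts.append(rule.pattern)
            seen[rule.pattern] = rule.correction
        return conflicts
```

In this sketch, `detect_conflicts` would be re-run after each modification until it returns an empty list, mirroring the loop described in step 14.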

Wherein, step 13 comprises the following steps:

Step 131: preset a rule template, so that when a test user finds a quality problem in the automatically annotated voice data set, the user fills in the template and submits it as feedback; feedback generated from the rule template can be read directly and imported into the quality rule base;

Step 132: manually review the feedback for conformity and reasonableness; if the review passes, import the feedback into the quality rule base; otherwise, discard it.
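A uniform text template of the kind described in step 131 might look like the following sketch. The field names and the precheck in `validate_feedback` are illustrative assumptions; the patent specifies only that feedback uses a uniform template and is manually reviewed before import.

```python
# Hypothetical uniform feedback template (step 131); field names are assumptions.
FEEDBACK_TEMPLATE = {
    "reporter": "",       # test user identifier
    "utterance_id": "",   # which annotated utterance the problem is in
    "error_type": "",     # "word" | "sentence" | "bias_feature"
    "position": None,     # index of the erroneous label
    "expected": "",       # correct annotation proposed by the user
    "observed": "",       # annotation actually produced by the machine
}

def validate_feedback(entry: dict) -> bool:
    """Automated precheck before manual review (step 132): reject entries
    with missing fields, an unknown error type, or empty key values."""
    if set(entry) != set(FEEDBACK_TEMPLATE):
        return False
    if entry["error_type"] not in {"word", "sentence", "bias_feature"}:
        return False
    return all(entry[k] not in ("", None) for k in ("utterance_id", "expected"))
```

Because every submission shares one schema, accepted entries can be converted mechanically into rules, which is what lets the template output be "read directly and imported" as step 131 requires.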

Preferably, step 2 comprises the following steps:

Step 21: acquire the automatically annotated voice data to be inspected, separate the labels annotating words, sentences, and bias-error features, and store them as a word label set, a sentence label set, and a bias-error label set;

Step 22: compare the word label set one by one against the word error rate rules in the quality rule base, record the number of incorrectly labeled words, then calculate the word error rate and record each error's position and type;

Step 23: compare the sentence label set one by one against the sentence error rate rules in the quality rule base, record the number of incorrectly labeled sentences, then calculate the sentence error rate and record each error's position and type;

Step 24: compare the bias-error label set one by one against the bias-error feature rules in the quality rule base, record the number of sentences whose bias-error features are not labeled, then calculate the bias-error feature miss rate and record each error's position and type;

Step 25: compare the word, sentence, and bias-error label sets one by one against the user-feedback error rules, record the errors found through comparison, then calculate the user-feedback error rate and record each error's position and type;

Step 26: combine the word error rate, sentence error rate, bias-error feature miss rate, and user-feedback error rate by formula to obtain the quality score of each automatically annotated voice data set;

wherein the quality scoring formula is as follows:

AQM = (q1·WER + q2·SER + q3·PAR + q4·CER) × 100%, where q1, q2, q3, q4 denote the weights of the key quality indicators, satisfying q1 + q2 + q3 + q4 = 1, and AQM denotes the quality score.

Preferably, the word error rate is calculated as WER = (S + D + I)/N, where S is the number of incorrectly labeled words requiring substitution, D the number requiring deletion, I the number requiring insertion, and N the total number of labeled words; this indicator corresponds to the basic and custom rule layers;

the sentence error rate is calculated as SER = EN/TN, where EN is the number of incorrectly labeled sentences (a sentence is judged incorrect if any word in it is labeled incorrectly) and TN is the total number of labeled sentences; this indicator corresponds to the basic and custom rule layers;

the bias-error feature miss rate is calculated as PAR = AN/TN, where AN is the number of sentences lacking bias-error feature labels and TN is the total number of bias-error features in the rule base; this indicator corresponds to the basic and custom rule layers;

the user-feedback error rate is calculated as CER = (w1·B1 + w2·B2 + w3·B3)/N, where B1, B2, B3 are the numbers of word labeling errors, sentence errors, and bias-error feature errors found in the sample data by the user-feedback rules, w1, w2, w3 are the corresponding weights, and N is the total count of the three error types in the sample data; this indicator corresponds to the user rule layer.
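The four indicator formulas above, together with the AQM score of step 26, translate directly into code. The following is a minimal sketch: the function names are assumptions, the error/weight counts are taken as precomputed inputs, and the weight constraint q1 + q2 + q3 + q4 = 1 is checked as stated in the scoring formula.

```python
def wer(s: int, d: int, i: int, n: int) -> float:
    """Word error rate: WER = (S + D + I) / N."""
    return (s + d + i) / n

def ser(en: int, tn: int) -> float:
    """Sentence error rate: SER = EN / TN."""
    return en / tn

def par(an: int, tn: int) -> float:
    """Bias-error feature miss rate: PAR = AN / TN."""
    return an / tn

def cer(b: tuple, w: tuple) -> float:
    """User-feedback error rate: CER = (w1*B1 + w2*B2 + w3*B3) / N,
    where N is the total count of the three error types."""
    n = sum(b)
    return sum(wi * bi for wi, bi in zip(w, b)) / n if n else 0.0

def aqm(wer_v: float, ser_v: float, par_v: float, cer_v: float, q: tuple) -> float:
    """Quality score AQM = (q1*WER + q2*SER + q3*PAR + q4*CER) * 100%,
    with the indicator weights q summing to 1."""
    assert abs(sum(q) - 1.0) < 1e-9, "weights must sum to 1"
    return 100.0 * (q[0] * wer_v + q[1] * ser_v + q[2] * par_v + q[3] * cer_v)
```

For example, with 2 substitutions, 1 deletion, and 1 insertion over 10 labeled words, WER is 0.4; equal weights of 0.25 over indicator values (0.4, 0.2, 0.1, 0.1) give an AQM of 20.0.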

Preferably, step 3 comprises:

Step 31: set a quality threshold; if the quality score exceeds the threshold, do not update;

Step 32: if the quality score is below the threshold, update: according to the error positions and types recorded during quality scoring, delete, replace, or insert labels in the automatically annotated voice data;

Step 33: after the update, perform quality evaluation again, repeating until the quality score exceeds the threshold.
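The score-repair-rescore loop of steps 31–33 can be sketched as a small driver function. This is illustrative only: `score_fn` and `repair_fn` stand in for the metric computation and the label delete/replace/insert operations, which the patent does not specify at the code level, and the bound on iterations is an added safeguard not mentioned in the text.

```python
def evaluate_until_pass(dataset, score_fn, repair_fn, threshold, max_rounds=10):
    """Steps 31-33 (sketch): score the data set; if the score does not
    exceed the threshold, repair the labels using the recorded error
    positions and types, then re-evaluate, until the score passes."""
    for _ in range(max_rounds):
        score = score_fn(dataset)
        if score > threshold:          # step 31: above threshold, no update
            return dataset, score
        dataset = repair_fn(dataset)   # step 32: delete/replace/insert labels
    return dataset, score_fn(dataset)  # give up after max_rounds (safeguard)
```

A toy run with a counter standing in for the data set shows the control flow: repairs accumulate until the stand-in score crosses the threshold.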

Preferably, step 4 comprises:

Step 41: classify and store the labels separated and recorded during evaluation of the automatically annotated voice data set;

Step 42: convert the classified labels into the format of a preset rule template to generate new rules;

Step 43: import the new rules into the custom rule layer of the quality rule base for storage.

Finally, it should be noted that the above examples are intended only to illustrate, not to limit, the technical solution of the present invention. Although the invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will understand that the technical solutions described in the foregoing embodiments may still be modified, and some technical features equivalently replaced, without departing from the spirit and scope of the corresponding technical solutions of the embodiments of the invention.
