Punctuation mark adding method, system, mobile terminal and storage medium

Document No.: 1465980  Publication date: 2020-02-21

Reading note: This technology, Punctuation mark adding method, system, mobile terminal and storage medium, was designed and created by 张广学, 肖龙源, 李稀敏, ***, 刘晓葳 and 王静 on 2019-09-19. Its main content is as follows. The invention belongs to the technical field of speech recognition and provides a punctuation mark adding method, a punctuation mark adding system, a mobile terminal and a storage medium. The method comprises: acquiring speech to be recognized and performing speech recognition on it to obtain a target text; extracting and labeling feature words in the target text, and matching the labeling results of the feature words against language expression habits; when the labeling result successfully matches the language expression habits, determining the correction conditions for the extracted target text, correcting the punctuation of the target text according to the determination result, and outputting the target text; and when the labeling result does not match the language expression habits, adding punctuation marks to the target text according to the language expression habits, and outputting the target text. Based on pause detection, the labeling results of the feature words and the language expression habits, the invention performs auxiliary correction of the punctuation in the text, thereby avoiding the low accuracy that results from adding punctuation solely according to a 3-gram model.

1. A punctuation mark adding method, the method comprising:

acquiring speech to be recognized, and performing speech recognition on the speech to be recognized to obtain a target text;

extracting and labeling feature words in the target text, and matching the labeling results of the feature words against locally pre-stored language expression habits, wherein the feature words comprise nouns, verbs, status words, degree words and auxiliary words;

when the labeling result successfully matches the language expression habits, determining the correction conditions for the extracted target text, correcting the punctuation of the target text according to the determination result, and outputting the target text;

and when the labeling result does not match the language expression habits, correcting the punctuation of the target text according to the language expression habits, and outputting the target text.

2. The punctuation mark adding method of claim 1, wherein before the step of matching the labeling results of the feature words against the locally pre-stored language expression habits, the method further comprises:

performing pause detection on the target text, and judging, according to the detection result and the language expression habits, whether the sentence breaks of the target text are correct;

when the sentence breaks of the target text are judged to be correct, triggering the matching between the labeling results and the language expression habits;

and when the sentence breaks of the target text are judged to be incorrect, directly determining the correction conditions for the target text, and adding punctuation marks to the target text according to the determination result.

3. The punctuation mark adding method according to claim 1, wherein the step of determining the correction conditions for the extracted target text comprises:

judging whether a pause exists within a text sentence of the target text;

when the text sentence is judged to contain a pause, adding a comma at the position corresponding to the pause;

judging whether adjacent feature words in the target text are in a parallel relation;

when the adjacent feature words are judged to be in a parallel relation, adding an enumeration comma (、) between them;

judging whether adjacent text sentences have parallel sentence patterns;

when the adjacent text sentences are judged to have parallel sentence patterns, adding a semicolon between them;

judging whether a special word exists in the text sentence;

and when a special word exists in the text sentence, adding quotation marks around the special word.

4. The punctuation mark adding method according to claim 3, wherein the step of determining the correction conditions for the extracted target text further comprises:

judging whether the text sentence is an explanatory sentence;

when the text sentence is judged to be an explanatory sentence, adding a colon at the end of the text sentence;

judging whether the text sentence is an interrogative sentence;

when the text sentence is judged to be an interrogative sentence, adding a question mark at the end of the text sentence;

judging whether the text sentence is an exclamatory sentence;

when the text sentence is judged to be an exclamatory sentence, adding an exclamation mark at the end of the text sentence;

judging whether the text sentence is a declarative sentence;

and when the text sentence is judged to be a declarative sentence, adding a period at the end of the text sentence.

5. The punctuation mark adding method of claim 4, wherein the step of judging whether the text sentence is an explanatory sentence comprises:

judging whether the latter part of the text sentence has a noun + subject + predicate structure;

if so, judging the text sentence to be an explanatory sentence.

6. The punctuation mark adding method of claim 1, wherein the step of performing speech recognition on the speech to be recognized comprises:

performing phoneme recognition on the speech to be recognized to obtain phoneme data;

and decoding the phoneme data to obtain the target text.

7. A punctuation mark addition system, the system comprising:

a speech recognition module, configured to acquire speech to be recognized and perform speech recognition on it to obtain a target text;

a feature labeling module, configured to extract and label feature words in the target text and to match the labeling results of the feature words against locally pre-stored language expression habits, wherein the feature words comprise nouns, verbs, status words, degree words and auxiliary words;

a first punctuation adding module, configured to, when the labeling result successfully matches the language expression habits, determine the correction conditions for the extracted target text, correct the punctuation of the target text according to the determination result, and output the target text;

and a second punctuation adding module, configured to, when the labeling result does not match the language expression habits, correct the punctuation of the target text according to the language expression habits and output the target text.

8. The punctuation mark adding system of claim 7, further comprising:

a pause detection module, configured to perform pause detection on the target text and to judge, according to the detection result and the language expression habits, whether the sentence breaks of the target text are correct; when the sentence breaks are judged to be correct, to trigger the matching between the labeling results and the language expression habits; and when the sentence breaks are judged to be incorrect, to directly determine the correction conditions for the target text and add punctuation marks to the target text according to the determination result.

9. A mobile terminal, characterized by comprising a storage device for storing a computer program and a processor for executing the computer program to cause the mobile terminal to perform the punctuation mark adding method according to any one of claims 1 to 6.

10. A storage medium storing the computer program used in the mobile terminal according to claim 9, wherein the computer program, when executed by a processor, implements the steps of the punctuation mark adding method according to any one of claims 1 to 6.

Technical Field

The invention belongs to the technical field of speech recognition, and in particular relates to a punctuation mark adding method, a punctuation mark adding system, a mobile terminal and a storage medium.

Background

In recent years, with the rapid development of speech technology, speech recognition has been widely applied across production and daily life thanks to its intelligence, efficiency and user-friendliness, and it has become increasingly popular with the public. The quality of punctuation in the recognized text, however, seriously affects the use of speech recognition in meeting transcription, speech-to-text conversion, operation and application records, and similar scenarios; consequently, adding punctuation marks during speech recognition has received growing attention.

In the existing punctuation adding process, punctuation is added solely according to a 3-gram model, which often produces punctuation errors; moreover, only delimiters such as commas can be inserted between clauses. As a result, punctuation accuracy is low, and the converted text often requires extensive manual correction, which makes punctuation addition inefficient.
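To make the criticized baseline concrete, the following is a minimal sketch of 3-gram comma insertion. The source does not describe the baseline's internals, so everything here is an assumption: the model counts, in punctuated training text, how often a comma follows each word pair (w_{i-1}, w_i), i.e. the trigram (w_{i-1}, w_i, comma), and inserts a comma wherever that relative frequency reaches a hypothetical threshold. The names `train_comma_trigrams` and `add_commas` are illustrative only. Such a model can only ever emit commas, which is the limitation described above.

```python
from collections import Counter
from typing import List, Set, Tuple

COMMA = "，"  # Chinese full-width comma
BOS = "<s>"   # sentence-start padding token


def split_words(tokens: List[str]) -> Tuple[List[str], Set[int]]:
    """Separate words from commas; return the words and the indices of words followed by a comma."""
    words: List[str] = []
    breaks: Set[int] = set()
    for tok in tokens:
        if tok == COMMA:
            if words:
                breaks.add(len(words) - 1)
        else:
            words.append(tok)
    return words, breaks


def train_comma_trigrams(corpus: List[List[str]]) -> Tuple[Counter, Counter]:
    """Count each (previous word, current word) context and how often a comma follows it."""
    seen: Counter = Counter()
    followed: Counter = Counter()
    for tokens in corpus:
        words, breaks = split_words(tokens)
        padded = [BOS] + words
        for i in range(len(words)):
            context = (padded[i], padded[i + 1])  # (w_{i-1}, w_i)
            seen[context] += 1
            if i in breaks:
                followed[context] += 1
    return seen, followed


def add_commas(words: List[str], model: Tuple[Counter, Counter], threshold: float = 0.5) -> str:
    """Insert a comma after w_i when P(comma | w_{i-1}, w_i) >= threshold; no other mark is possible."""
    seen, followed = model
    padded = [BOS] + words
    out: List[str] = []
    for i, word in enumerate(words):
        out.append(word)
        context = (padded[i], padded[i + 1])
        is_last = i == len(words) - 1
        # Unseen contexts have count 0 and never receive a comma.
        if not is_last and seen[context] and followed[context] / seen[context] >= threshold:
            out.append(COMMA)
    return "".join(out)
```

For example, after training on the single sentence `今天天气很好，我们去公园` (tokenized word by word), `add_commas` restores the comma after `很好` in the unpunctuated word sequence and adds nothing else.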

Disclosure of Invention

The embodiments of the invention aim to solve the technical problem that punctuation accuracy is low because existing methods add punctuation marks solely according to a 3-gram model.

The embodiments of the invention are implemented as follows. A punctuation mark adding method comprises the following steps:

acquiring speech to be recognized, and performing speech recognition on the speech to be recognized to obtain a target text;

extracting and labeling feature words in the target text, and matching the labeling results of the feature words against locally pre-stored language expression habits, wherein the feature words comprise nouns, verbs, status words, degree words and auxiliary words;

when the labeling result successfully matches the language expression habits, determining the correction conditions for the extracted target text, correcting the punctuation of the target text according to the determination result, and outputting the target text;

and when the labeling result does not match the language expression habits, correcting the punctuation of the target text according to the language expression habits, and outputting the target text.
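The overall control flow of the method can be sketched as follows. This is a hedged illustration, not the patented implementation: the tag names in `FEATURE_TAGS`, the representation of a language expression habit as a tuple of feature-word tags, and the matching rule (a stored habit occurring as a contiguous run of the labels) are all assumptions, and the two punctuation branches are passed in as hypothetical callbacks.

```python
from typing import Callable, FrozenSet, Sequence, Tuple

# Hypothetical tag names for the five feature-word classes listed in the method.
FEATURE_TAGS = frozenset({"noun", "verb", "status", "degree", "auxiliary"})

Token = Tuple[str, str]    # (word, part-of-speech tag) from an assumed tagger
Pattern = Tuple[str, ...]  # a stored "language expression habit" (assumed: a tag sequence)


def label_feature_words(tokens: Sequence[Token]) -> Pattern:
    """Keep only the tags of feature words, in sentence order."""
    return tuple(tag for _, tag in tokens if tag in FEATURE_TAGS)


def matches_habit(labels: Pattern, habits: FrozenSet[Pattern]) -> bool:
    """Assumed matching rule: some stored habit occurs as a contiguous run of labels."""
    for habit in habits:
        n = len(habit)
        if any(labels[i:i + n] == habit for i in range(len(labels) - n + 1)):
            return True
    return False


def add_punctuation(
    tokens: Sequence[Token],
    habits: FrozenSet[Pattern],
    correct_by_conditions: Callable[[Sequence[Token]], str],
    punctuate_by_habit: Callable[[Sequence[Token]], str],
) -> Tuple[str, str]:
    """Route the recognized text to one of the two punctuation branches; return (branch, text)."""
    labels = label_feature_words(tokens)
    if matches_habit(labels, habits):
        return "matched", correct_by_conditions(tokens)
    return "unmatched", punctuate_by_habit(tokens)
```

With the habit `("noun", "verb", "noun")`, the tokens `小明/noun 喜欢/verb 苹果/noun` take the matched branch, while a text whose feature-word labels contain no stored habit takes the unmatched branch.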

Further, before the step of matching the labeling results of the feature words against the locally pre-stored language expression habits, the method further comprises:

performing pause detection on the target text, and judging, according to the detection result and the language expression habits, whether the sentence breaks of the target text are correct;

when the sentence breaks of the target text are judged to be correct, triggering the matching between the labeling results and the language expression habits;

and when the sentence breaks of the target text are judged to be incorrect, directly determining the correction conditions for the target text, and adding punctuation marks to the target text according to the determination result.
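The pause-detection gate can be sketched as below, under explicit assumptions: pauses are taken from a hypothetical per-word silence duration (for example, derived from recognizer word timestamps), `PAUSE_THRESHOLD_S` is an illustrative value, and the sentence breaks are considered correct only when they coincide exactly with the detected pauses. The additional check against the language expression habits that the method mentions is omitted here for brevity.

```python
from typing import Iterable, List, Set

# Hypothetical minimum silence (seconds) that counts as a pause.
PAUSE_THRESHOLD_S = 0.3


def detect_pauses(gaps_s: List[float], threshold_s: float = PAUSE_THRESHOLD_S) -> Set[int]:
    """Indices of words followed by a silence of at least threshold_s seconds.

    gaps_s[i] is the assumed silence after word i, e.g. from recognizer word timestamps.
    """
    return {i for i, gap in enumerate(gaps_s) if gap >= threshold_s}


def sentence_breaks_correct(text_breaks: Iterable[int], gaps_s: List[float]) -> bool:
    """Assumed rule: the text's breaks are correct when they coincide exactly with detected pauses."""
    return set(text_breaks) == detect_pauses(gaps_s)


def next_step(text_breaks: Iterable[int], gaps_s: List[float]) -> str:
    """Pick the next stage as in the method: habit matching, or the correction conditions directly."""
    if sentence_breaks_correct(text_breaks, gaps_s):
        return "match_habits"
    return "apply_correction_conditions"
```

With silences of `[0.05, 0.5, 0.1, 0.02]` seconds after four words, only word 1 is followed by a pause, so a text that breaks after word 1 proceeds to habit matching, while one that breaks after word 2 goes straight to the correction conditions.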

Further, the step of determining the correction conditions for the extracted target text comprises:

judging whether a pause exists within a text sentence of the target text;

when the text sentence is judged to contain a pause, adding a comma at the position corresponding to the pause;

judging whether adjacent feature words in the target text are in a parallel relation;

when the adjacent feature words are judged to be in a parallel relation, adding an enumeration comma (、) between them;

judging whether adjacent text sentences have parallel sentence patterns;

when the adjacent text sentences are judged to have parallel sentence patterns, adding a semicolon between them;

judging whether a special word exists in the text sentence;

and when a special word exists in the text sentence, adding quotation marks around the special word.
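The four correction conditions above can be sketched as follows. The method does not say how a parallel relation or a special word is recognized, so this sketch assumes that adjacent feature words are parallel when they carry the same tag from the hypothetical set `PARALLEL_TAGS`, that two sentence patterns are parallel when their tag sequences are identical, and that special words are supplied as a set. The priority between the checks and the comma used between non-parallel clauses are likewise assumptions. Same-tag adjacency is a crude heuristic: it would, for instance, wrongly split a compound noun written as two noun tokens.

```python
from typing import List, NamedTuple, Sequence, Set, Tuple


class Word(NamedTuple):
    text: str
    tag: str           # assumed part-of-speech label
    pause_after: bool  # assumed output of pause detection


# Hypothetical: adjacent feature words with the same tag from this set count as parallel.
PARALLEL_TAGS = {"noun", "verb"}


def correct_sentence(words: Sequence[Word], special_words: Set[str]) -> str:
    """Apply the in-sentence conditions: comma at a pause, enumeration comma (、)
    between parallel feature words, quotation marks around special words."""
    out: List[str] = []
    for i, w in enumerate(words):
        out.append(f"“{w.text}”" if w.text in special_words else w.text)
        if i == len(words) - 1:
            break
        nxt = words[i + 1]
        if w.pause_after:  # assumed priority: a detected pause wins over the parallel check
            out.append("，")
        elif w.tag == nxt.tag and w.tag in PARALLEL_TAGS:
            out.append("、")
    return "".join(out)


def join_clauses(clauses: Sequence[str], patterns: Sequence[Tuple[str, ...]]) -> str:
    """Join adjacent clauses with a semicolon when their sentence patterns are parallel
    (assumed: identical tag sequences); otherwise use a comma as a hypothetical default."""
    out = clauses[0]
    for prev, cur, clause in zip(patterns, patterns[1:], clauses[1:]):
        out += ("；" if prev == cur else "，") + clause
    return out
```

For instance, the words 我 买了 苹果 香蕉 还有 牛奶, with a pause detected after 香蕉, come out as `我买了苹果、香蕉，还有牛奶`.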

Further, the step of determining the correction conditions for the extracted target text further comprises:

judging whether the text sentence is an explanatory sentence;

when the text sentence is judged to be an explanatory sentence, adding a colon at the end of the text sentence;

judging whether the text sentence is an interrogative sentence;

when the text sentence is judged to be an interrogative sentence, adding a question mark at the end of the text sentence;

judging whether the text sentence is an exclamatory sentence;

when the text sentence is judged to be an exclamatory sentence, adding an exclamation mark at the end of the text sentence;

judging whether the text sentence is a declarative sentence;

and when the text sentence is judged to be a declarative sentence, adding a period at the end of the text sentence.
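A hedged sketch of the end-of-sentence checks, applied in the order the method lists them (explanatory, interrogative, exclamatory, declarative), is given below. The cue lists are hypothetical stand-ins, since the source does not say how each sentence type is recognized; real Chinese text would need more than sentence-final particles, for example because `呢` also appears in declarative sentences. Whether a sentence is explanatory is passed in as a flag.

```python
from typing import Tuple

# Hypothetical cue lists; the method does not say how sentence types are recognized.
QUESTION_ENDINGS: Tuple[str, ...] = ("吗", "呢")
QUESTION_WORDS: Tuple[str, ...] = ("什么", "怎么", "为什么")
EXCLAMATION_ENDINGS: Tuple[str, ...] = ("啊", "呀")


def end_mark(sentence: str, is_explanatory: bool = False) -> str:
    """Choose the end mark, checking sentence types in the order the method lists them."""
    if is_explanatory:
        return "："
    if sentence.endswith(QUESTION_ENDINGS) or any(w in sentence for w in QUESTION_WORDS):
        return "？"
    if sentence.endswith(EXCLAMATION_ENDINGS):
        return "！"
    return "。"  # default: declarative sentence


def close_sentence(sentence: str, is_explanatory: bool = False) -> str:
    """Append the chosen end mark to the sentence."""
    return sentence + end_mark(sentence, is_explanatory)
```

Under these cues, `你吃饭了吗` receives a question mark, `今天天气真好啊` an exclamation mark, and `我们明天出发` a period.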

Further, the step of judging whether the text sentence is an explanatory sentence comprises:

judging whether the latter part of the text sentence has a noun + subject + predicate structure;

if so, judging the text sentence to be an explanatory sentence.
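The explanatory-sentence test reduces to a pattern check on the sentence's tail. In this sketch the syntactic roles are assumed to come from an external parser, and the "latter part" of the sentence is assumed to mean its last three chunks; both choices are illustrative rather than taken from the source.

```python
from typing import Sequence, Tuple

Chunk = Tuple[str, str]  # (text chunk, syntactic role) from an assumed parser

# The claimed structure of the latter part of the sentence: noun + subject + predicate.
EXPLANATORY_TAIL = ("noun", "subject", "predicate")


def is_explanatory(chunks: Sequence[Chunk], tail: Tuple[str, ...] = EXPLANATORY_TAIL) -> bool:
    """Assumed reading of "latter part": the last len(tail) chunks of the sentence."""
    roles = tuple(role for _, role in chunks)
    return len(roles) >= len(tail) and roles[-len(tail):] == tail
```

For instance, a sentence chunked as 关于/preposition 会议议程/noun 我/subject 说明如下/predicate ends in noun + subject + predicate and would therefore receive a colon.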

Further, the step of performing speech recognition on the speech to be recognized includes:

performing phoneme recognition on the speech to be recognized to obtain phoneme data;

and decoding the phoneme data to obtain the target text.
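The decoding step can be illustrated with a deliberately simplified decoder. Practical recognizers combine acoustic and language-model scores in a Viterbi or WFST search; this sketch swaps that for a dynamic program that segments the phoneme sequence into entries of a small hypothetical lexicon (pinyin initials and finals with tone digits) and prefers the segmentation with the fewest words.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Lexicon = Dict[Tuple[str, ...], str]  # hypothetical: phoneme sequence -> word


def decode_phonemes(phonemes: Sequence[str], lexicon: Lexicon) -> Optional[str]:
    """Segment a phoneme sequence into lexicon words, preferring the fewest words.

    Returns None if no segmentation covers the whole input.
    """
    n = len(phonemes)
    best: List[Optional[List[str]]] = [None] * (n + 1)  # best[i]: words covering phonemes[:i]
    best[0] = []
    for end in range(1, n + 1):
        for start in range(end):
            prefix = best[start]
            if prefix is None:
                continue  # phonemes[:start] cannot be decoded
            word = lexicon.get(tuple(phonemes[start:end]))
            if word is None:
                continue  # phonemes[start:end] is not a lexicon entry
            candidate = prefix + [word]
            current = best[end]
            if current is None or len(candidate) < len(current):
                best[end] = candidate
    result = best[n]
    return None if result is None else "".join(result)
```

With a lexicon containing `n i3 → 你`, `h ao3 → 好` and `n i3 h ao3 → 你好`, the phonemes `n i3 h ao3` decode to the single word `你好`.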

Another object of an embodiment of the present invention is to provide a punctuation mark adding system, including:

a speech recognition module, configured to acquire speech to be recognized and perform speech recognition on it to obtain a target text;

a feature labeling module, configured to extract and label feature words in the target text and to match the labeling results of the feature words against locally pre-stored language expression habits, wherein the feature words comprise nouns, verbs, status words, degree words and auxiliary words;

a first punctuation adding module, configured to, when the labeling result successfully matches the language expression habits, determine the correction conditions for the extracted target text, correct the punctuation of the target text according to the determination result, and output the target text;

and a second punctuation adding module, configured to, when the labeling result does not match the language expression habits, correct the punctuation of the target text according to the language expression habits and output the target text.

Further, the punctuation mark adding system comprises:

a pause detection module, configured to perform pause detection on the target text and to judge, according to the detection result and the language expression habits, whether the sentence breaks of the target text are correct; when the sentence breaks are judged to be correct, to trigger the matching between the labeling results and the language expression habits; and when the sentence breaks are judged to be incorrect, to directly determine the correction conditions for the target text and add punctuation marks to the target text according to the determination result.

Another object of an embodiment of the present invention is to provide a mobile terminal, including a storage device and a processor, where the storage device is used to store a computer program, and the processor runs the computer program to make the mobile terminal execute the punctuation mark adding method described above.

Another object of an embodiment of the present invention is to provide a storage medium, which stores a computer program used in the mobile terminal, wherein the computer program, when executed by a processor, implements the steps of the punctuation mark adding method.

In the embodiments of the invention, the punctuation of the text is corrected on the basis of pause detection, the labeling results of the feature words and the language expression habits, which avoids the low accuracy caused by adding punctuation solely according to a 3-gram model. By determining the correction conditions for the extracted target text, punctuation is corrected at the positions that satisfy those conditions; and when the labeling result does not match the language expression habits, punctuation is added directly according to the language expression habits. Together, these measures effectively assist and correct the punctuation of the target text and improve punctuation accuracy.

Drawings

FIG. 1 is a flowchart of a punctuation mark adding method according to a first embodiment of the present invention;

FIG. 2 is a flowchart of a punctuation mark adding method according to a second embodiment of the present invention;

FIG. 3 is a schematic structural diagram of a punctuation mark adding system according to a third embodiment of the present invention;

FIG. 4 is a schematic structural diagram of a mobile terminal according to a fourth embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.

In order to explain the technical means of the present invention, the following description will be given by way of specific examples.

In the existing punctuation adding process, punctuation is added solely according to a 3-gram model, so punctuation accuracy is low and the converted text usually requires extensive manual correction of its punctuation.
