Text error correction method, device, terminal and storage medium

Document No.: 1378979    Publication date: 2020-08-14

Note: This technology, "Text error correction method, device, terminal and storage medium" (文本的纠错方法、装置、终端、及存储介质), was designed and created by 郭晗暄, 单彦会, 李娜, 郑文彬 and 罗红 on 2020-04-17. Abstract: The embodiment of the invention relates to the field of artificial intelligence and discloses a text error correction method, apparatus, terminal, and computer-readable storage medium. In the invention, the text error correction method includes: obtaining a sentence to be corrected; converting the sentence to be corrected into a sentence vector; inputting the sentence vector of the sentence to be corrected into a trained neural machine translation (NMT) model to obtain the sentence vector of the corrected sentence output by the NMT model; and converting the sentence vector of the corrected sentence into the corrected sentence. The embodiment of the invention can reduce the workload of manual maintenance, thereby reducing processing cost.

1. A method for correcting a text, comprising:

obtaining a sentence to be corrected;

converting the sentence to be corrected into a sentence vector;

inputting the sentence vector of the sentence to be corrected into a trained neural machine translation model NMT to obtain the sentence vector of the corrected sentence output by the neural machine translation model NMT;

and converting the sentence vector of the error-corrected sentence into the error-corrected sentence.

2. The method according to claim 1, wherein before the step of obtaining the sentence to be corrected, the method further comprises: generating the neural machine translation model NMT;

the step of generating the neural machine translation model NMT comprises:

carrying out new word discovery processing on the original corpus to generate original corpus participles;

adding the original corpus participles and pre-collected hot words into a word segmentation table;

segmenting the original sentences in the original corpus according to the segmentation table to generate original sentence segments;

converting the original sentence participles into sentence vectors;

converting a correct sentence corresponding to the original sentence into a sentence vector;

inputting a sentence vector pair consisting of the sentence vector of the original sentence and the sentence vector of the correct sentence into an NMT model for training;

and generating the neural machine translation model NMT through training of the sentence vector pair.

3. The method according to claim 2, wherein said step of performing new word discovery processing on the original corpus to generate original corpus participles comprises:

dividing original sentences in the original corpus into word fragments;

determining the word segments which can possibly form words according to the solidification degree of the word segments;

and when the value of the information entropy of the word segment which can be formed into a word is larger than a preset threshold value, determining the word segment as an original corpus participle.

4. The method according to claim 2, wherein the step of generating the neural machine translation model, NMT, comprises:

carrying out new word discovery processing on original corpora corresponding to different speaking roles to generate original corpus participles corresponding to different speaking roles;

adding the original corpus participles and pre-collected hot words into a participle table; the word segmentation table corresponds to the same speaking role as the original corpus;

segmenting the original sentences in the original corpus according to the segmentation table to generate original sentence segments; the word segmentation table corresponds to the same speaking role as the original corpus;

converting the original sentence participles into sentence vectors;

converting a correct sentence corresponding to the original sentence into a sentence vector;

inputting a vector pair consisting of the sentence vector of the original sentence and the sentence vector of the correct sentence and a corresponding speaking role into an NMT model for training;

and generating the neural machine translation model NMT through the training of the vector pair.

5. The method of claim 4,

the step of obtaining the sentence to be corrected comprises the following steps: acquiring a sentence to be corrected and a speaking role corresponding to the sentence to be corrected;

the step of inputting the sentence vector of the sentence to be corrected into the trained neural machine translation model NMT comprises: and inputting the sentence vector of the sentence to be corrected and the speaking role corresponding to the sentence to be corrected into a trained neural machine translation model NMT.

6. The method according to claim 1, wherein the step of converting the sentence to be error-corrected into a sentence vector comprises:

segmenting the sentence to be corrected to generate word segments of the sentence to be corrected;

converting each word segment of the sentence to be corrected into a word vector;

and combining the word vectors of the word segments of the sentence to be corrected to generate a sentence vector of the sentence to be corrected.

7. The method of claim 6,

while the step of converting each word segment of the sentence to be corrected into a word vector is performed, the method further comprises: generating a corresponding relation between the word segments and the word vectors;

the step of converting the sentence vector of the error-corrected sentence into an error-corrected sentence includes:

generating error-corrected word vectors from the sentence vector of the error-corrected sentence through a decoder;

acquiring error-corrected participles corresponding to the error-corrected word vectors according to the corresponding relation between the participles and the word vectors;

and combining the error-corrected word segments to generate an error-corrected sentence.

8. An apparatus for correcting a text, comprising:

an acquisition unit configured to acquire a sentence to be error-corrected;

a first conversion unit for converting the sentence to be corrected into a sentence vector;

the input unit is used for inputting the sentence vector of the sentence to be corrected into a trained neural machine translation model NMT so as to obtain the sentence vector of the corrected sentence output by the neural machine translation model NMT;

and the second conversion unit is used for converting the sentence vector of the error-corrected sentence into the error-corrected sentence.

9. A terminal, comprising:

at least one processor; and

a memory communicatively coupled to the at least one processor; wherein

the memory stores instructions executable by the at least one processor to enable the at least one processor to perform a method of correcting text according to any one of claims 1 to 7.

10. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements a method of correcting a text according to any one of claims 1 to 7.

Technical Field

The embodiment of the invention relates to the field of artificial intelligence, in particular to a text error correction method, a text error correction device, a text error correction terminal and a computer readable storage medium.

Background

Voice conversation mainly refers to the rounds of dialogue between two parties in a telephone call; it is usually colloquial and repetitive and contains much useless information. Speech cannot be processed directly and needs to be transcribed into text by automatic speech recognition (ASR). During transcription, due to the influence of noise, the speaker's tone and other factors, the transcribed text is of poor quality and cannot be used directly for actual downstream tasks. To address this problem, the text converted from speech needs to be corrected, and errors of grammar, syntax and the like in the text need to be fixed, so as to meet the requirements of subsequent tasks.

Text error correction at the current stage mainly relies on rule-based error correction algorithms. A rule-based error correction algorithm mainly comprises two steps, error detection and error correction: the suspected erroneous sentence is analyzed, and candidate sentences or words are generated at the error positions for replacement.

The inventors found that at least the following problem exists in the related art: the rules used by a rule-based error correction algorithm need to be built and maintained manually, so the workload of manual maintenance is large and the processing cost is high.

Disclosure of Invention

An object of embodiments of the present invention is to provide a method, an apparatus, a terminal, and a computer-readable storage medium for correcting a text, which can reduce the workload of manual maintenance, thereby reducing the cost of text correction.

In order to solve the above technical problem, an embodiment of the present invention provides a text error correction method, including:

obtaining a sentence to be corrected;

converting the sentence to be corrected into a sentence vector;

inputting the sentence vector of the sentence to be corrected into a trained neural machine translation model NMT to obtain the sentence vector of the corrected sentence output by the neural machine translation model NMT;

and converting the sentence vector of the error-corrected sentence into the error-corrected sentence.

The embodiment of the present invention further provides a text error correction apparatus, including:

an acquisition unit configured to acquire a sentence to be error-corrected;

a first conversion unit for converting the sentence to be corrected into a sentence vector;

the input unit is used for inputting the sentence vector of the sentence to be corrected into a trained neural machine translation model NMT so as to obtain the sentence vector of the corrected sentence output by the neural machine translation model NMT;

and the second conversion unit is used for converting the sentence vector of the error-corrected sentence into the error-corrected sentence.

An embodiment of the present invention further provides a terminal, including:

at least one processor; and

a memory communicatively coupled to the at least one processor; wherein

the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method for correcting text.

The embodiment of the invention also provides a computer readable storage medium, which stores a computer program, and the computer program is executed by a processor to realize the text error correction method.

Compared with the prior art, the embodiment of the invention obtains the sentence to be corrected; converting the sentence to be corrected into a sentence vector; inputting the sentence vector of the sentence to be corrected into a trained neural machine translation model NMT to obtain the sentence vector of the corrected sentence output by the neural machine translation model NMT; and converting the sentence vector of the error-corrected sentence into the error-corrected sentence. Therefore, automatic sentence correction is performed by using the neural machine translation model NMT, and the workload of manual maintenance can be reduced, thereby reducing the processing cost.

In addition, before the step of obtaining the sentence to be corrected, the method further includes: generating the neural machine translation model NMT;

the step of generating the neural machine translation model NMT comprises:

carrying out new word discovery processing on the original corpus to generate original corpus participles;

adding the original corpus participles and pre-collected hot words into a word segmentation table;

segmenting the original sentences in the original corpus according to the segmentation table to generate original sentence segments;

converting the original sentence participles into sentence vectors;

converting a correct sentence corresponding to the original sentence into a sentence vector;

inputting a sentence vector pair consisting of the sentence vector of the original sentence and the sentence vector of the correct sentence into an NMT model for training;

and generating the neural machine translation model NMT through training of the sentence vector pair.

In the embodiment of the invention, a new word discovery processing step is added when generating the neural machine translation model NMT to produce the entries of the word segmentation table, and some common hot words are also added to the table; this improves word segmentation accuracy and provides a certain improvement for the subsequent error correction task. Because the word segments are generated from the original corpus, the semantics of the context and the frequency of occurrence in context are taken into account, which further improves segmentation accuracy and therefore error correction accuracy.

Further, the step of generating the neural machine translation model NMT comprises: carrying out new word discovery processing on original corpora corresponding to different speaking roles to generate original corpus participles corresponding to different speaking roles; adding the original corpus participles and pre-collected hot words into a participle table; the word segmentation table corresponds to the same speaking role as the original corpus; segmenting the original sentences in the original corpus according to the segmentation table to generate original sentence segments; the word segmentation table corresponds to the same speaking role as the original corpus; converting the original sentence participles into sentence vectors; converting a correct sentence corresponding to the original sentence into a sentence vector; inputting a vector pair consisting of the sentence vector of the original sentence and the sentence vector of the correct sentence and a corresponding speaking role into an NMT model for training; and generating the neural machine translation model NMT through the training of the vector pair. In the above embodiment, different processing is performed on the texts of different roles in the error correction process according to different dialog texts corresponding to different roles. During word segmentation, different roles correspond to different word lists, and complexity of model operation can be reduced.

In addition, the step of converting the sentence to be corrected into a sentence vector includes: segmenting the sentence to be corrected to generate word segments of the sentence to be corrected; converting each word segment of the sentence to be corrected into a word vector; and combining the word vectors of the word segments of the sentence to be corrected to generate a sentence vector of the sentence to be corrected. In the above embodiment, the sentence to be corrected is converted into sentence-vector form before being input into the model, which improves the compatibility with different formats of the sentence to be corrected.

The foregoing description is only an overview of the technical solutions of the present invention, and the embodiments of the present invention are described below in order to make the technical means of the present invention more clearly understood and to make the above and other objects, features, and advantages of the present invention more clearly understandable.

Drawings

One or more embodiments are illustrated by way of example in the accompanying drawings, in which elements with the same reference numerals denote similar elements; the drawings are not to scale unless otherwise specified.

Fig. 1 is a flowchart of a text error correction method according to a first embodiment of the present invention;

fig. 2 is a flowchart of a text error correction method according to a second embodiment of the present invention;

FIG. 3 is a flow chart of a text error correction method according to another embodiment of the present invention;

FIG. 4 is a diagram of a neural machine translation model according to another embodiment of the present invention;

FIG. 5 is a schematic structural diagram of an apparatus for correcting text errors according to an embodiment of the present invention;

fig. 6 is a schematic structural diagram of a terminal according to an embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the embodiments of the present invention will be described in detail below with reference to the accompanying drawings. Those of ordinary skill in the art will appreciate that numerous technical details are set forth in the embodiments in order to provide a better understanding of the present application; however, the technical solutions claimed in the present application can be implemented without these technical details, and with various changes and modifications based on the following embodiments. The following division into embodiments is for convenience of description and does not limit the specific implementation of the present invention; the embodiments may be combined with and refer to each other where no contradiction arises.

The first embodiment of the present invention relates to a text error correction method, the flow of which is shown in fig. 1 and specifically includes the following steps:

step 11, generating a neural machine translation model NMT;

step 12, obtaining a sentence to be corrected;

step 13, converting the sentence to be corrected into a sentence vector;

step 14, inputting the sentence vector of the sentence to be corrected into a trained neural machine translation model NMT to obtain the sentence vector of the sentence after error correction output by the neural machine translation model NMT;

and step 15, converting the sentence vector of the error-corrected sentence into an error-corrected sentence.

Compared with the prior art, the embodiment of the invention obtains the sentence to be corrected; converts the sentence to be corrected into a sentence vector; inputs the sentence vector of the sentence to be corrected into a trained neural machine translation model NMT to obtain the sentence vector of the corrected sentence output by the neural machine translation model NMT; and converts the sentence vector of the corrected sentence into the corrected sentence. Because the embodiment of the invention adopts the neural machine translation model NMT to correct sentences automatically, the workload of manual maintenance can be reduced, thereby reducing the processing cost.

In addition, the embodiment of the invention converts the sentence to be corrected into vector form, so the neural machine translation model NMT has no special requirement on the text format of the sentence to be corrected, and the compatibility with sentences to be corrected is higher.

Wherein, step 11 comprises:

step 111, carrying out new word discovery processing on the original corpus to generate original corpus participles;

step 112, adding the original corpus participles and pre-collected hot words into a word segmentation table; the embodiment of the invention adds a new word discovery processing step and also adds some common hot words into the word segmentation table, which improves the word segmentation effect and provides a certain improvement for the subsequent error correction task.

step 113, segmenting the original sentences in the original corpus according to the word segmentation table to generate original sentence word segments; this step can use a word segmentation tool such as jieba.

Step 114, converting the original sentence participles into sentence vectors;

step 115, converting the correct sentence corresponding to the original sentence into a sentence vector;

step 116, inputting a sentence vector pair consisting of the sentence vector of the original sentence and the sentence vector of the correct sentence into an NMT model for training;

step 117, generating the neural machine translation model NMT through training on the sentence vector pairs.

The neural machine translation model NMT can be trained aiming at the original linguistic data of a specific scene, and the accuracy of the neural machine translation model NMT aiming at the specific scene can be improved. The embodiment of the invention can be used for correcting the error of the text which is converted from the voice, and certainly can also be used for correcting the error of the written text. The invention can train the model according to different scenes, thereby reducing the labor cost and improving the automation degree.

Wherein step 111 comprises:

step 1111, dividing the original sentences in the original corpus into word fragments;

step 1112, determining the word segments which can be possibly formed into words according to the solidification degree of the word segments;

and step 1113, when the information entropy of a word fragment that may form a word is larger than a preset threshold value, determining the word fragment as an original corpus participle.

In the invention, the main process of new word discovery is to calculate the internal cohesion (solidification degree) and the degree of free use (measured by information entropy) of a candidate word, thereby improving the accuracy of word segmentation.

Wherein step 13 comprises:

step 131, performing word segmentation on the sentence to be corrected to generate word segments of the sentence to be corrected; a word segmentation tool such as jieba can be used, and the word segmentation table generated by the method described above can also be used here.

Step 132, converting each word segment of the sentence to be corrected into a word vector; at the same time, the one-to-one corresponding relation between the word segments and the word vectors is recorded for later use.

Step 133, combining the word vectors of the word segments of the sentence to be corrected to generate the sentence vector of the sentence to be corrected. The combination can be done in various ways, including: (1) simply taking a weighted average of the word vectors; or (2) using the embedding layer of the model to represent the sentence vector from the word vectors.
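As an illustration of method (1) above, the following is a minimal sketch (not the patent's implementation) of combining the word vectors of a segmented sentence into a sentence vector by weighted averaging; the vectors and weights are made-up toy values.

```python
# A minimal sketch of combining word vectors into a sentence vector by
# (weighted) averaging. Method (2), an embedding layer inside the model,
# is not shown here.
import numpy as np

def sentence_vector(word_vectors, weights=None):
    """Combine word vectors (1-D arrays of equal dimension) into one sentence vector."""
    vectors = np.stack(word_vectors)            # shape: (num_words, dim)
    if weights is None:
        weights = np.ones(len(word_vectors))    # plain average
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()           # normalize the weights
    return weights @ vectors                    # shape: (dim,)

# Usage: three 4-dimensional word vectors with equal weights.
vecs = [np.array([0.1, 0.2, 0.3, 0.4]),
        np.array([0.0, 0.1, 0.0, 0.1]),
        np.array([0.4, 0.3, 0.2, 0.1])]
print(sentence_vector(vecs))
```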

Accordingly, step 15 comprises:

Step 151, generating error-corrected word vectors from the sentence vector of the error-corrected sentence through a decoder. The decoding here corresponds to the word-vector-to-sentence-vector processing in step 133 and is its inverse process.

Step 152, obtaining the error-corrected word segments corresponding to the error-corrected word vectors according to the corresponding relationship between the word segments and the word vectors;

and step 153, combining the error-corrected word segments to generate an error-corrected sentence.
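To illustrate steps 151 to 153, here is a minimal sketch, under assumed data structures, of mapping the corrected word vectors produced by the decoder back to word segments via the recorded word-to-vector correspondence and joining them into the corrected sentence; the nearest-neighbour lookup and the toy vocabulary are illustrative assumptions, not the patent's exact procedure.

```python
# Map each corrected word vector back to a known word segment using the
# recorded word <-> word-vector correspondence, then join the segments.
import numpy as np

def decode_sentence(corrected_word_vectors, word_to_vec):
    """Look up the nearest known word for each corrected word vector."""
    words = list(word_to_vec.keys())
    matrix = np.stack([word_to_vec[w] for w in words])    # (vocab_size, dim)
    segments = []
    for vec in corrected_word_vectors:
        distances = np.linalg.norm(matrix - vec, axis=1)  # Euclidean distance
        segments.append(words[int(np.argmin(distances))])
    return "".join(segments)   # Chinese text: concatenate without spaces

# Usage with a toy two-word vocabulary.
word_to_vec = {"你好": np.array([1.0, 0.0]), "世界": np.array([0.0, 1.0])}
print(decode_sentence([np.array([0.9, 0.1]), np.array([0.2, 0.8])], word_to_vec))
```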

Another embodiment of the present invention relates to a text error correction method, a flow of which is shown in fig. 2 and specifically includes the following steps:

step 21, generating a neural machine translation model NMT;

step 22, obtaining a sentence to be corrected and a speaking role corresponding to the sentence to be corrected;

step 23, converting the sentence to be corrected into a sentence vector; this step may be: converting the sentence to be corrected into a fixed-length sentence vector through a word-to-vector (word2vec) model.

Step 24, inputting the sentence vector of the sentence to be corrected and the speaking role corresponding to the sentence to be corrected into the trained neural machine translation model NMT to obtain the sentence vector of the corrected sentence output by the neural machine translation model NMT;

and step 25, converting the sentence vector of the error-corrected sentence into an error-corrected sentence.

In the above embodiment, in order to reduce the complexity of the model in the error correction process, different roles correspond to different word lists during word segmentation, and the model can perform different processing on sentences with different roles.

Accordingly, step 21 comprises:

step 211, performing new word discovery processing on the original corpus corresponding to different speaking roles, and generating original corpus participles corresponding to different speaking roles;

step 212, adding the original corpus participles and pre-collected hot words into a word segmentation table; the word segmentation table corresponds to the same speaking role as the original corpus;

step 213, segmenting the original sentence in the original corpus according to the segmentation table to generate original sentence segmentation; the word segmentation table corresponds to the same speaking role as the original corpus;

step 214, converting the original sentence participles into sentence vectors so as to represent the original sentences by the vectors;

step 215, representing the correct sentence corresponding to the original sentence by a sentence vector;

step 216, inputting a vector pair composed of the sentence vector of the original sentence and the sentence vector of the correct sentence and a corresponding speaking role into an NMT model for training;

and step 217, generating the neural machine translation model NMT through the training of the vector pairs.

In the embodiment, in the training process of the model, different processing aiming at different conversation roles is considered, and the simplicity and the accuracy of the model are improved. Moreover, the automatic model training is carried out on different roles each time, so that the automatic processing level can be improved.

The steps of the above methods are divided only for clarity of description; in implementation, several steps may be combined into one step, or one step may be split into multiple steps, and such variants fall within the protection scope of this patent as long as they contain the same logical relationship. Adding insignificant modifications to the algorithm or process, or introducing insignificant design changes, without changing the core design also falls within the protection scope of this patent.

The following describes an application scenario of the present invention. The embodiment of the invention provides a text error correction method, which is a text error correction model based on neural machine translation and can correct text obtained from speech recognition or other texts. The idea of the invention is as follows: first, new word discovery is performed on the original spoken dialogue text to obtain a list of fragments in the original corpus that are likely to form words, and this list, together with pre-collected hot words, is added to the jieba word segmentation list, which is then used to segment each sentence in the original corpus; next, the word vector of each word is obtained from the segmentation result with a word-to-vector (word2vec) model; after the word vectors are obtained, each sentence in the dialogue is represented as a vector. The sentence-vector pairs of sentences to be corrected and their correct sentences are input into the NMT model for training, and through training on a large number of such pairs the NMT model can learn the error patterns of a specific scene. In subsequent use, it is only necessary to input the vector representation of a sentence to be corrected into the trained NMT model, which automatically outputs the sentence vector of the corrected sentence; the sentence vector is then converted back into the corrected sentence.

The following is a detailed description. The embodiment of the invention relates to a text error correction method, as shown in fig. 3, comprising the following steps:

First, a text error correction model is trained, based on the idea of neural machine translation, on data consisting mainly of texts to be corrected after speech recognition and their manually labeled correct sentences. Before the model is trained, the original text files need to be processed. Dialogue texts transcribed by automatic speech recognition (ASR) are acquired; these texts contain transcription errors as well as the role information of each sentence, and different roles correspond to different dialogue texts. In order to reduce the complexity of the model in the error correction process, the texts of different roles are processed differently: during word segmentation, different roles correspond to different word lists.

Then, new word discovery is performed on the original spoken dialogue text to obtain a list of fragments in the original corpus that are likely to form words; this list and the pre-collected hot words are added to the jieba word segmentation list, and each sentence in the original corpus is segmented. That is, a new word discovery algorithm extracts new words from the original text, and the extracted new words together with the pre-collected hot words form the segmentation word list used to segment the original corpus.

Specifically, in order to obtain better word segmentation, hot words that may be used by the different roles are collected first, and new words in the original corpus are then collected with a new word discovery algorithm. New word discovery is an unsupervised way of building a lexicon: by discovering certain language features (mainly statistical features) without supervision, it determines which character fragments in the corpus are likely to be new words. The main process of new word discovery is to calculate the internal cohesion (solidification degree) and the degree of free use of a fragment. Fragments that may form words are first determined according to their cohesion; the degree of free use of each such fragment is then calculated with information entropy, and the fragment is determined to be a new word when the computed information entropy is larger than a preset threshold.

In order to calculate the cohesion of a text fragment, all the ways of splitting it into two parts are enumerated. Let p(x) be the probability that the text fragment x appears in the whole corpus. Then the cohesion of "电影院" (movie theater) is defined as:

min{ p(电影院) / (p(电) · p(影院)),  p(电影院) / (p(电影) · p(院)) }

and the cohesion of "的电影" ("the movie") is:

min{ p(的电影) / (p(的) · p(电影)),  p(的电影) / (p(的电) · p(影)) }

The calculation shows that the cohesion of "电影院" is clearly greater than that of "的电影", so "电影院" is much more likely to form a word.

After the cohesion has been calculated, the degree of free use of the fragment is also needed. For example, consider the two fragments "被子" (quilt) and "辈子" (lifetime). Many different characters can precede "被子": one can buy a quilt (买被子), cover oneself with a quilt (盖被子), get into a quilt (进被子), or fetch a quilt (拿被子). The usage of "辈子", however, is very fixed: apart from "一辈子", "这辈子", "上辈子" and "下辈子", essentially no other character can be placed in front of it. Because the characters that can appear to the left of "辈子" are so limited, intuitively "辈子" does not form a word on its own; rather, it forms whole words such as "一辈子" and "这辈子" together with the preceding character. Therefore, the degree of free use of a text fragment is also an important criterion for judging whether it forms a word: if a fragment can count as a word, it should appear flexibly in many different environments and have very rich sets of left and right neighbouring characters. Cohesion and freedom of use are both indispensable criteria: looking only at the former finds fragments that are actually half of a word, while looking only at the latter finds "junk phrases" such as the fragment "的电影" discussed above. The embodiment of the invention uses information entropy to calculate the degree of free use of a fragment. Information entropy reflects how much information the outcome of an event brings on average; here it is used to measure the randomness of the sets of characters adjacent to the left and right of a text fragment.

The information entropy is calculated as H = -Σ_{i=1}^{n} p_i · log(p_i), where p_i is the probability of the i-th left (or right) neighbouring character appearing next to the fragment in the text, n is the size of the set of left (or right) neighbouring characters, and i is the index within that set.

Take the sentence "吃葡萄不吐葡萄皮，不吃葡萄倒吐葡萄皮" ("eat grapes without spitting out the grape skins; don't eat grapes yet spit out the grape skins") as an example. The word "葡萄" (grape) appears four times; its left neighbouring characters are {吃, 吐, 吃, 吐} (eat, spit, eat, spit) and its right neighbouring characters are {不, 皮, 倒, 皮} (not, skin, instead, skin). According to the definition of information entropy, the entropy of the left neighbours of "葡萄" is -(1/2)·log(1/2) - (1/2)·log(1/2) = log 2 ≈ 0.693, and the entropy of its right neighbours is -(1/2)·log(1/2) - (1/4)·log(1/4) - (1/4)·log(1/4) = (3/2)·log 2 ≈ 1.04 (natural logarithm). It can be seen that the right neighbourhood of "葡萄" is richer in this sentence. The information entropy is compared with a preset threshold: if it is larger than the threshold, the fragment can form a word; otherwise it cannot.
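The entropy values above can be checked with a few lines of code; the helper below simply applies the information entropy formula to the left- and right-neighbour character sets of "葡萄" in the example sentence (natural logarithm assumed).

```python
# Quick numeric check of the neighbour-entropy example above.
from collections import Counter
from math import log

def neighbour_entropy(chars):
    counts = Counter(chars)
    total = sum(counts.values())
    return -sum(c / total * log(c / total) for c in counts.values())

left = ["吃", "吐", "吃", "吐"]    # characters to the left of "葡萄"
right = ["不", "皮", "倒", "皮"]   # characters to the right of "葡萄"
print(neighbour_entropy(left))    # ~0.693  (= ln 2)
print(neighbour_entropy(right))   # ~1.040  (= 1.5 * ln 2), richer right neighbourhood
```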

Then, in the word segmentation step, the jieba word segmentation tool can be used. jieba supports loading a user word list; the word list obtained in the previous step is added to jieba, and the original text is then segmented with jieba using this word list. Because the hot words and newly discovered words have been added, the segmentation result is better than it would be without the word list.
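A minimal sketch of this segmentation step using jieba's user-dictionary interface; the file name user_dict.txt and the example words are assumptions for illustration only.

```python
# Segment text with jieba after loading the custom word list built from
# hot words and newly discovered words.
import jieba

# "user_dict.txt" (assumed to exist) lists one word per line, optionally with
# a frequency and part-of-speech tag, following jieba's user-dictionary format.
jieba.load_userdict("user_dict.txt")

# Individual words can also be added programmatically.
jieba.add_word("电影院")

sentence = "吃葡萄不吐葡萄皮"
print(jieba.lcut(sentence))   # list of word segments for this sentence
```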

Next, a fixed-length word vector is obtained for each word by training a word-to-vector (word2vec) model on the segmentation result; that is, the segmented words are used to train a word2vec model, and each word is then represented in the form of a word vector. These word vectors are subsequently used to build the inputs for training the neural machine translation model.

In natural language processing tasks, a word has two kinds of vector representation: a discrete representation and a distributed representation. The discrete (one-hot) representation uses a long vector whose dimension equals the size of the vocabulary; exactly one element of the vector is 1 and all the others are 0, and the position of the 1 corresponds to the position of the word in the dictionary. Vectors represented this way can be separated in a high-dimensional space and are simple to obtain, but their dimensionality becomes very large, which is unfavourable for representation. The distributed representation expresses a word as a dense, continuous vector of fixed length; it can express similarity relations between words and carry more information in the vector. The preferred way of generating word vectors in the embodiment of the present invention is to train a word2vec model.
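As a sketch of how the word vectors could be trained (the patent does not prescribe a specific toolkit), the following uses gensim's Word2Vec implementation, assuming gensim 4.x; the toy segmented corpus stands in for the jieba output.

```python
# Train a word2vec model on the segmented corpus and read out word vectors.
from gensim.models import Word2Vec

segmented_corpus = [
    ["吃", "葡萄", "不", "吐", "葡萄", "皮"],
    ["不", "吃", "葡萄", "倒", "吐", "葡萄", "皮"],
]

model = Word2Vec(
    sentences=segmented_corpus,
    vector_size=100,   # dimension of each word vector ("size" in gensim < 4.0)
    window=5,          # context window
    min_count=1,       # keep every word in this tiny corpus
    sg=1,              # skip-gram; 0 would select CBOW
)

vector = model.wv["葡萄"]   # fixed-length (100-dimensional) vector for one word
print(vector.shape)
```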

Then, after the word vectors are obtained, each sentence in the dialogue is represented as a vector, and the vector pairs of sentences to be corrected and their correct sentences are input into the NMT model for training; through training on a large number of such text pairs, the NMT model can learn the error patterns of a specific scene. Specifically, after every word in a sentence has been represented as a word vector, the word vectors need to be combined to convert the whole sentence into a vector representation. The combination can be done in various ways, for example: taking a weighted average of the word vectors; or using an embedding layer to represent the sentence vector from the individual word vectors.

A neural machine translation (NMT) model translates one language into another; its idea is to use neural-network-based techniques to obtain translations that are more accurate in context, rather than translating isolated words or fragments one at a time. Traditional machine translation generally used large statistical models developed with highly complex linguistic knowledge, but much recent research models the translation process directly with a deep model and automatically learns the necessary linguistic knowledge when only the original data and the translated text data are provided. Translation models based on deep neural networks currently give the best results. For such a neural machine translation model, only the vectors of the source-language sentences and the target-language sentences need to be input for training.

NMT computes the probabilities of word sequences with a large artificial neural network and handles the complete sentence within one integrated model. Its neurons, designed to mimic the neurons of the human brain, learn and accumulate information, form connections, and evaluate the input as a whole. NMT is mainly divided into two stages, encoding and decoding: the encoding stage processes the source-language text, takes it as model input and represents it in vector form; the decoding stage then maps these vectors into the target language. Throughout the translation process, the technique translates not only words and phrases but also context and information. The embodiment of the invention applies the idea of neural machine translation to text error correction by treating the sentence to be corrected as the source language and the correct sentence as the target language. Handling text error correction in this way simplifies a great deal of work: the model can be trained with only the sentences to be corrected and their corresponding correct sentences, which greatly simplifies the labeling work.

The neural machine translation model generally consists of four parts: a simple recurrent neural network (RNN), an RNN with word embeddings, a bidirectional RNN, and an encoder-decoder model. Following this idea, the sentence to be corrected is fed in as the source input of the NMT network, and the correct sentence corresponding to the erroneous sentence is produced as the target output, yielding an end-to-end error correction model. The structure of the model is shown in FIG. 4: the source input is a sentence containing a recognition error and the target output is the corresponding correct sentence, and the error correction model for a specific scene is trained on a large number of such text pairs.
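The patent does not give code for the model, so the following is only a rough sketch of an encoder-decoder network of the kind described (bidirectional GRU encoder, GRU decoder) operating directly on word-vector sequences in PyTorch; all dimensions, layer choices and the MSE training objective are illustrative assumptions rather than the patent's exact architecture.

```python
# Sketch of an encoder-decoder error-correction model over word-vector sequences.
import torch
import torch.nn as nn

class Seq2SeqCorrector(nn.Module):
    def __init__(self, vec_dim=100, hidden_dim=128):
        super().__init__()
        self.encoder = nn.GRU(vec_dim, hidden_dim, batch_first=True,
                              bidirectional=True)
        # Project the concatenated forward/backward final states to the
        # decoder's hidden size.
        self.bridge = nn.Linear(2 * hidden_dim, hidden_dim)
        self.decoder = nn.GRU(vec_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vec_dim)   # predict corrected word vectors

    def forward(self, source_vectors, target_vectors):
        # source_vectors, target_vectors: (batch, seq_len, vec_dim)
        _, h = self.encoder(source_vectors)                 # h: (2, batch, hidden)
        h = torch.tanh(self.bridge(torch.cat([h[0], h[1]], dim=-1))).unsqueeze(0)
        decoded, _ = self.decoder(target_vectors, h)        # teacher forcing
        return self.out(decoded)                            # (batch, seq_len, vec_dim)

# Usage with random stand-in data: 2 sentences of 7 word vectors each.
model = Seq2SeqCorrector()
src = torch.randn(2, 7, 100)
tgt = torch.randn(2, 7, 100)
pred = model(src, tgt)
loss = nn.functional.mse_loss(pred, tgt)   # train toward the correct-sentence vectors
loss.backward()
print(pred.shape)   # torch.Size([2, 7, 100])
```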

Then, in the subsequent use process, the sentence vector of the sentence after error correction can be automatically output only by inputting the vector representation form of the sentence to be corrected into the trained neural machine translation model NMT, and an end-to-end error correction model is realized.

An embodiment of the present invention further provides a text error correction apparatus, as shown in fig. 5, including:

an acquisition unit configured to acquire a sentence to be error-corrected;

a first conversion unit for converting the sentence to be corrected into a sentence vector;

the input unit is used for inputting the sentence vector of the sentence to be corrected into a trained neural machine translation model NMT so as to obtain the sentence vector of the corrected sentence output by the neural machine translation model NMT;

and the second conversion unit is used for converting the sentence vector of the error-corrected sentence into the error-corrected sentence.

The error correction apparatus further includes: a generating unit for generating the neural machine translation model NMT, specifically by: carrying out new word discovery processing on the original corpus to generate original corpus participles; adding the original corpus participles and pre-collected hot words into a word segmentation table; segmenting the original sentences in the original corpus according to the segmentation table to generate original sentence segments; converting the original sentence participles into sentence vectors; converting a correct sentence corresponding to the original sentence into a sentence vector; inputting a sentence vector pair consisting of the sentence vector of the original sentence and the sentence vector of the correct sentence into an NMT model for training; and generating the neural machine translation model NMT through training of the sentence vector pairs.

The new word discovery processing is performed on the original corpus, and the generation of the original corpus participles specifically comprises the following steps: dividing original sentences in the original corpus into word fragments; determining the word segments which can possibly form words according to the solidification degree of the word segments; and when the value of the information entropy of the word segment which can be formed into a word is larger than a preset threshold value, determining the word segment as an original corpus participle.

Wherein the generating of the neural machine translation model NMT specifically includes: carrying out new word discovery processing on original corpora corresponding to different speaking roles to generate original corpus participles corresponding to different speaking roles; adding the original corpus participles and pre-collected hot words into a participle table; the word segmentation table corresponds to the same speaking role as the original corpus; segmenting the original sentences in the original corpus according to the segmentation table to generate original sentence segments; the word segmentation table corresponds to the same speaking role as the original corpus; converting the original sentence participles into sentence vectors; converting a correct sentence corresponding to the original sentence into a sentence vector; inputting a vector pair consisting of the sentence vector of the original sentence and the sentence vector of the correct sentence and a corresponding speaking role into an NMT model for training; and generating the neural machine translation model NMT through the training of the vector pair.

The obtaining of the sentence to be corrected specifically includes: acquiring a sentence to be corrected and a speaking role corresponding to the sentence to be corrected;

the specific step of inputting the sentence vector of the sentence to be corrected into the trained neural machine translation model NMT is as follows: and inputting the sentence vector of the sentence to be corrected and the speaking role corresponding to the sentence to be corrected into a trained neural machine translation model NMT.

The converting of the sentence to be corrected into a sentence vector specifically includes: segmenting the sentence to be corrected to generate word segments of the sentence to be corrected; converting each word segment of the sentence to be corrected into a word vector; and combining the word vectors of the word segments of the sentence to be corrected to generate the sentence vector of the sentence to be corrected.

When the word segments of the sentence to be corrected are converted into word vectors, a corresponding relation between the word segments and the word vectors is also generated;

the converting the sentence vector of the error-corrected sentence into the error-corrected sentence specifically comprises: generating an error-corrected word vector by the sentence vector of the sentence to be error-corrected through a decoder; acquiring error-corrected participles corresponding to the error-corrected word vectors according to the corresponding relation between the participles and the word vectors; and combining the error-corrected word segments to generate an error-corrected sentence.

It should be noted that each module referred to in this embodiment is a logical module, and in practical applications, one logical unit may be one physical unit, may be a part of one physical unit, and may be implemented by a combination of multiple physical units. In addition, in order to highlight the innovative part of the present invention, elements that are not so closely related to solving the technical problems proposed by the present invention are not introduced in the present embodiment, but this does not indicate that other elements are not present in the present embodiment.

An embodiment of the present invention further provides a terminal, as shown in fig. 6, including:

at least one processor; and

a memory communicatively coupled to the at least one processor; wherein

the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method for correcting text.

Where the memory and processor are connected by a bus, the bus may comprise any number of interconnected buses and bridges, the buses connecting together one or more of the various circuits of the processor and the memory. The bus may also connect various other circuits such as peripherals, voltage regulators, power management circuits, and the like, which are well known in the art, and therefore, will not be described any further herein. A bus interface provides an interface between the bus and the transceiver. The transceiver may be one element or a plurality of elements, such as a plurality of receivers and transmitters, providing a means for communicating with various other apparatus over a transmission medium. The data processed by the processor is transmitted over a wireless medium via an antenna, which further receives the data and transmits the data to the processor.

The processor is responsible for managing the bus and general processing and may also provide various functions including timing, peripheral interfaces, voltage regulation, power management, and other control functions. And the memory may be used to store data used by the processor in performing operations.

Another embodiment of the present invention relates to a computer-readable storage medium storing a computer program. The computer program realizes the above-described method embodiments when executed by a processor.

The embodiment of the invention has the following beneficial effects:

1. the embodiment of the invention provides a speech text error correction model based on neural machine translation, which is characterized in that a vector pair of a sentence to be corrected and a correct sentence is input into an NMT model for training, so that the NMT error correction model suitable for a specific dialogue scene can be obtained.

2. The embodiment of the invention uses the NMT model to correct sentences, which saves a great deal of extra work on part-of-speech tagging and word-frequency statistics and greatly reduces labor cost; and since manual tagging may itself contain a certain amount of error, removing the dependence on such tagging can also improve error correction performance.

3. The embodiment of the invention adds the function of new word discovery in the word segmentation process, and simultaneously adds some common hot words into the word segmentation word list, thereby greatly improving the word segmentation effect and having certain improvement effect on the subsequent error correction task.

As can be understood by those skilled in the art, all or part of the steps of the methods in the above embodiments may be implemented by a program instructing related hardware; the program is stored in a storage medium and includes several instructions that cause a device (which may be a single-chip microcomputer, a chip, or the like) or a processor to execute all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.

It will be understood by those of ordinary skill in the art that the foregoing embodiments are specific examples for carrying out the invention, and that various changes in form and details may be made therein without departing from the spirit and scope of the invention in practice.
