Spoken language translation method integrating machine translation and manual translation

Document No.: 115961    Publication date: 2021-10-19

Reading note: This technology, a spoken language translation method integrating machine translation and manual translation (一种机器与人工翻译相融合的口语翻译方法), was designed and created by 杜金林 on 2021-03-11. Its main content is as follows. The invention provides a spoken language translation method integrating machine and manual translation, comprising the following steps. S1: acquire the voice data to be recognized, perform language identification on the voice data, and determine the language type. S2: recognize the voice data and segment it into sentences to obtain an input text in sentence units. S3: establish a machine translation model, feed the input text into the machine translation model for translation, and obtain a machine translation result. S4: score the confidence of the machine translation result. S5: translate the input text manually to obtain a manual translation result and evaluate its quality. S6: select the output according to the translation confidence evaluation of the machine translation result and the quality evaluation of the manual translation result. The spoken language translation method integrating machine and manual translation provided by the invention has the advantages of wide language coverage, consideration of external world scene information factors during translation, and high translation accuracy.

1. A spoken language translation method integrating machine and manual translation, characterized by comprising the following steps:

S1: acquiring voice data to be recognized, performing language identification on the voice data, and determining the language type;

S2: recognizing the voice data and segmenting it into sentences to obtain an input text in sentence units;

S3: establishing a machine translation model, feeding the input text into the machine translation model for translation, and obtaining a machine translation result;

S4: scoring the confidence of the machine translation result;

S5: translating the input text manually to obtain a manual translation result and evaluating its quality;

S6: according to the translation confidence evaluation of the machine translation result and the quality evaluation of the manual translation result, generating a prosody-adjustable speech output from the better-rated translation result by speech synthesis.

2. The spoken language translation method integrating machine and manual translation according to claim 1, wherein the specific steps of step S1 are as follows:

S101: extracting audio frequency-domain features from the audio data to be recognized;

S102: based on the audio frequency-domain features, performing voice/accompaniment separation on the audio data to be recognized to obtain the voice data to be recognized, wherein voice/accompaniment separation means separating the audio data into voice data and accompaniment data;

S103: performing language identification on the voice data to be recognized to obtain a language identification result for the audio data to be recognized.

3. The spoken language translation method integrating machine and manual translation according to claim 2, wherein step S102 specifically comprises: obtaining audio depth features through a voice/accompaniment separation model based on the audio frequency-domain features; obtaining first voice data features and first accompaniment data features through the voice/accompaniment separation model based on the audio depth features; obtaining second voice data features through the voice/accompaniment separation model based on the first voice data features; and obtaining the voice data to be recognized from the second voice data features, wherein the voice data to be recognized is an audio time-domain signal.

4. The spoken language translation method integrating machine and manual translation according to claim 1, wherein step S4 is implemented by a machine translation system using a statistical method based on hierarchical phrases, the machine translation system outputting the translation result together with a confidence measure, and the method comprises:

extracting a large number of aligned phrase fragments from a bilingual corpus and storing them on a storage medium as a knowledge source; matching phrase fragments against the speech-recognized input text using a search algorithm and combining the matched phrase fragments into a target sentence as the translation; and generating a confidence score for each target sentence with a confidence calculation method based on forced-alignment model parameter training.

5. The spoken language translation method integrating machine and manual translation according to claim 1, wherein in step S2 the continuous speech input, taken as voice data, is segmented into sentences with prosody as the main feature; the segmentation is performed in combination with automatic speech recognition and automatic addition of punctuation marks, and without loss of recognition rate.

6. The spoken language translation method integrating machine and manual translation according to claim 1, wherein the following determination is further made before performing step S4: the complexity of the input text is calculated and combined with the user's category to determine whether manual translation is required.

7. The spoken language translation method integrating machine and manual translation according to claim 1, wherein the translation model in step S3 is established according to external world scene information.

8. The spoken language translation method integrating machine and manual translation according to claim 7, wherein establishing the machine translation model comprises the following steps:

S301: acquiring external world scene information;

S302: establishing a language model and a phrase translation model, wherein both are established according to the external world scene information;

and establishing the machine translation model according to the language model and the phrase translation model.

9. The spoken language translation method integrating machine and manual translation according to claim 1, wherein the specific steps of establishing the language model comprise:

establishing a language model based on the external world scene information, acquiring a traditional language model, and determining the language model used to establish the machine translation model from the language model based on the external world scene information and the traditional language model.

10. The spoken language translation method integrating machine and manual translation according to claim 9, wherein determining the language model used to establish the machine translation model from the language model based on the external world scene information and the traditional language model comprises: performing a log-linear combination of the language model based on the external world scene information and the traditional language model, and taking the log-linearly combined model as the language model for establishing the machine translation model.

Technical Field

The invention relates to the technical field of translation, in particular to a spoken language translation method integrating machine translation and manual translation.

Background

Translation is the act of converting information in one language into information in another language on the basis of faithfulness, expressiveness, and elegance; it is the process of converting a relatively unfamiliar expression into a relatively familiar one. Its content covers the translation of language, text, graphics, symbols, and video. Between two languages A and B, "turning" refers to the conversion back and forth between them, that is, first converting a sentence in language A into a sentence in language B and then converting that sentence in language B back into language A; "translating" refers to the process of converting one language into another and making its meaning clear in the target language. Together the two constitute translation in the general sense, which allows more people to understand the meaning of other languages.

With scientific and technological progress, automatic speech translation technology is becoming increasingly common in people's lives, but the main difficulties of current spoken language translation are:

1. In real scenarios such as spoken conversation and online chat, input sentences often lack standardization and their inherent grammatical structure is difficult to capture, which makes statistical translation results stiff and unstable;

2. Statistical machine translation is data-driven, and its lifeblood is bilingual data resources; in the data accumulated so far, spoken bilingual corpora (Chinese-English) are rather scarce, so spoken language translation systems that rely entirely on statistical methods cannot yet fully meet the broad needs of people's daily lives;

3. Spoken language translation differs from text translation in that it mainly solves the communication problem between people who speak different languages, so the requirement for real-time performance is high; in particular, how to optimize the translation process in a network environment is the key to improving user experience.

Therefore, there is a need for a new spoken language translation method integrating machine and manual translation to solve the above technical problems.

Disclosure of Invention

The technical problem to be solved by the invention is to provide a spoken language translation method integrating machine and manual translation that has wide language coverage, takes external world scene information factors into account during translation, and achieves high translation accuracy.

In order to solve the technical problem, the spoken language translation method integrating machine and manual translation comprises the following steps:

S1: acquiring voice data to be recognized, performing language identification on the voice data, and determining the language type;

S2: recognizing the voice data and segmenting it into sentences to obtain an input text in sentence units;

S3: establishing a machine translation model, feeding the input text into the machine translation model for translation, and obtaining a machine translation result;

S4: scoring the confidence of the machine translation result;

S5: translating the input text manually to obtain a manual translation result and evaluating its quality;

S6: according to the translation confidence evaluation of the machine translation result and the quality evaluation of the manual translation result, generating a prosody-adjustable speech output from the better-rated translation result by speech synthesis.

Preferably, the specific steps of step S1 are as follows:

S101: extracting audio frequency-domain features from the audio data to be recognized;

S102: based on the audio frequency-domain features, performing voice/accompaniment separation on the audio data to be recognized to obtain the voice data to be recognized, wherein voice/accompaniment separation means separating the audio data into voice data and accompaniment data;

S103: performing language identification on the voice data to be recognized to obtain a language identification result for the audio data to be recognized.

Preferably, step S102 specifically comprises: obtaining audio depth features through a voice/accompaniment separation model based on the audio frequency-domain features; obtaining first voice data features and first accompaniment data features through the voice/accompaniment separation model based on the audio depth features; obtaining second voice data features through the voice/accompaniment separation model based on the first voice data features; and obtaining the voice data to be recognized from the second voice data features, wherein the voice data to be recognized is an audio time-domain signal.

Preferably, step S4 is implemented by a machine translation system using a statistical method based on hierarchical phrases, the machine translation system outputting the translation result together with a confidence measure; the specific process comprises:

extracting a large number of aligned phrase fragments from a bilingual corpus and storing them on a storage medium as a knowledge source; matching phrase fragments against the speech-recognized input text using a search algorithm and combining the matched phrase fragments into a target sentence as the translation; and generating a confidence score for each target sentence with a confidence calculation method based on forced-alignment model parameter training.

Preferably, in step S2, the continuous speech input, taken as voice data, is segmented into sentences with prosody as the main feature; the segmentation is performed in combination with automatic speech recognition and automatic addition of punctuation marks, and on the premise of not losing recognition rate.

Preferably, before step S4 is executed, the following judgment is also made: the complexity of the input text is calculated and combined with the user's category to determine whether manual translation is required.

Preferably, the translation model in step S3 is established according to external world scene information.

Preferably, the establishing of the machine translation model comprises the following steps:

S301: acquiring external world scene information;

S302: establishing a language model and a phrase translation model, wherein both are established according to the external world scene information;

and establishing the machine translation model according to the language model and the phrase translation model.

Preferably, the specific steps of establishing the language model include:

establishing a language model based on the external world scene information, acquiring a traditional language model, and determining the language model used to establish the machine translation model from the language model based on the external world scene information and the traditional language model.

Preferably, determining the language model used to establish the machine translation model from the language model based on the external world scene information and the traditional language model comprises: performing a log-linear combination of the language model based on the external world scene information and the traditional language model, and taking the log-linearly combined model as the language model for establishing the machine translation model.

Compared with the related art, the spoken language translation method integrating machine and manual translation provided by the invention has the following beneficial effects:

The invention provides a spoken language translation method integrating machine and manual translation that can identify multiple languages and therefore has wider translation coverage; speech synthesis turns the results of the different translation methods into stable speech output containing emotion, which well solves the real-time communication problem in speech translation; and translation is performed with a machine translation model determined from external world scene information, so that external world scene factors are taken into account during translation, a translation combination that better fits the external scene is obtained, and the accuracy of the translation result is improved.

Detailed Description

The present invention will be further described with reference to the following embodiments.

A spoken language translation method integrating machine and manual translation comprises the following steps:

S1: acquiring voice data to be recognized, performing language identification on the voice data, and determining the language type;

S2: recognizing the voice data and segmenting it into sentences to obtain an input text in sentence units;

S3: establishing a machine translation model, feeding the input text into the machine translation model for translation, and obtaining a machine translation result;

S4: scoring the confidence of the machine translation result;

S5: translating the input text manually to obtain a manual translation result and evaluating its quality;

S6: according to the translation confidence evaluation of the machine translation result and the quality evaluation of the manual translation result, generating a prosody-adjustable speech output from the better-rated translation result by speech synthesis. A minimal end-to-end sketch of these steps is given below.
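For illustration only, the following Python sketch ties steps S1 to S6 together. It is a minimal outline under stated assumptions: the component objects (language_identifier, segmenter, machine_translator, manual_desk, synthesizer), their method names, and the shared 0-1 scoring scale used to compare the two results in S6 are hypothetical placeholders, not interfaces defined in this disclosure.

from dataclasses import dataclass


@dataclass
class TranslationCandidate:
    text: str
    score: float  # confidence (machine) or quality score (manual) on an assumed shared 0-1 scale


def translate_spoken_utterance(audio_bytes: bytes, language_identifier, segmenter,
                               machine_translator, manual_desk, synthesizer) -> bytes:
    # S1: identify the language of the incoming voice data
    language = language_identifier.identify(audio_bytes)

    # S2: recognize the speech and segment it into sentence-level input text
    sentences = segmenter.recognize_and_segment(audio_bytes, language)

    output_audio = b""
    for sentence in sentences:
        # S3 + S4: machine translation together with a confidence score
        mt_text, mt_confidence = machine_translator.translate(sentence, language)
        machine = TranslationCandidate(mt_text, mt_confidence)

        # S5: manual translation together with a quality evaluation
        ht_text, ht_quality = manual_desk.translate(sentence, language)
        manual = TranslationCandidate(ht_text, ht_quality)

        # S6: keep the better-rated result and synthesize prosody-adjustable speech
        best = machine if machine.score >= manual.score else manual
        output_audio += synthesizer.synthesize(best.text, prosody="neutral")

    return output_audio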

The specific steps of step S1 are as follows:

S101: extracting audio frequency-domain features from the audio data to be recognized;

S102: based on the audio frequency-domain features, performing voice/accompaniment separation on the audio data to be recognized to obtain the voice data to be recognized, wherein voice/accompaniment separation means separating the audio data into voice data and accompaniment data;

S103: performing language identification on the voice data to be recognized to obtain a language identification result for the audio data to be recognized (see the sketch below).
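As one possible illustration of steps S101 to S103, the sketch below extracts frequency-domain features with librosa and then hands the result to a separation model and a language classifier. The separation_model and language_classifier objects and their methods are hypothetical placeholders standing in for the models described in this disclosure; only the feature extraction uses real library calls.

import librosa


def identify_language(audio_path: str, separation_model, language_classifier) -> str:
    # S101: extract frequency-domain features (a log-mel spectrogram) from the audio data
    waveform, sr = librosa.load(audio_path, sr=16000, mono=True)
    mel = librosa.feature.melspectrogram(y=waveform, sr=sr, n_mels=80)
    log_mel = librosa.power_to_db(mel)

    # S102: separate the voice data from the accompaniment (hypothetical model call)
    voice_waveform = separation_model.separate_voice(log_mel, waveform)

    # S103: run language identification on the separated voice data
    return language_classifier.predict(voice_waveform, sample_rate=sr)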

Step S102 specifically comprises: obtaining audio depth features through a voice/accompaniment separation model based on the audio frequency-domain features; obtaining first voice data features and first accompaniment data features through the voice/accompaniment separation model based on the audio depth features; obtaining second voice data features through the voice/accompaniment separation model based on the first voice data features; and obtaining the voice data to be recognized from the second voice data features, wherein the voice data to be recognized is an audio time-domain signal.
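A structural sketch of the staged separation just described follows. The encoder, separator, refiner, and decoder sub-networks are assumptions introduced only to mirror the text (depth features, first voice/accompaniment features, second voice features, time-domain output); the disclosure does not specify their internals.

import numpy as np


class VoiceAccompanimentSeparator:
    """Mirrors the staged separation of step S102; sub-networks are placeholders."""

    def __init__(self, encoder, separator, refiner, decoder):
        self.encoder = encoder      # frequency-domain features -> audio depth features
        self.separator = separator  # depth features -> (first voice, first accompaniment) features
        self.refiner = refiner      # first voice features -> second voice features
        self.decoder = decoder      # second voice features -> time-domain voice waveform

    def separate_voice(self, freq_features: np.ndarray, mixture: np.ndarray) -> np.ndarray:
        depth_features = self.encoder(freq_features)
        first_voice, first_accompaniment = self.separator(depth_features)
        second_voice = self.refiner(first_voice)
        # The returned voice data is an audio time-domain signal, as stated above.
        return self.decoder(second_voice, mixture)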

In step S4, a machine translation system using a statistical method based on hierarchical phrases outputs the translation result together with a confidence measure; the specific process is as follows:

extracting a large number of aligned phrase fragments from a bilingual corpus and storing them on a storage medium as a knowledge source; matching phrase fragments against the speech-recognized input text using a search algorithm and combining the matched phrase fragments into a target sentence as the translation; and generating a confidence score for each target sentence with a confidence calculation method based on forced-alignment model parameter training.
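The toy sketch below illustrates the phrase-matching and confidence-scoring flow. The phrase table is a plain dictionary of aligned fragments, the search is a simple greedy longest-match, and the confidence score is a geometric mean of phrase probabilities standing in for the forced-alignment-based confidence training described above; all of these simplifications are assumptions made for illustration.

import math
from typing import Dict, List, Tuple

PhraseTable = Dict[Tuple[str, ...], Tuple[str, float]]  # source phrase -> (target phrase, prob)


def translate_with_confidence(tokens: List[str], phrase_table: PhraseTable,
                              max_phrase_len: int = 4) -> Tuple[str, float]:
    target_parts: List[str] = []
    log_prob_sum, phrase_count = 0.0, 0
    i = 0
    while i < len(tokens):
        # Greedily match the longest source phrase fragment known to the table.
        for length in range(min(max_phrase_len, len(tokens) - i), 0, -1):
            key = tuple(tokens[i:i + length])
            if key in phrase_table:
                target, prob = phrase_table[key]
                target_parts.append(target)
                log_prob_sum += math.log(max(prob, 1e-12))
                phrase_count += 1
                i += length
                break
        else:
            # Unknown word: pass it through with a low-probability penalty.
            target_parts.append(tokens[i])
            log_prob_sum += math.log(1e-3)
            phrase_count += 1
            i += 1
    confidence = math.exp(log_prob_sum / max(phrase_count, 1))
    return " ".join(target_parts), confidence


# Example with a tiny illustrative Chinese-English phrase table:
table: PhraseTable = {("你好",): ("hello", 0.9), ("世界",): ("world", 0.8)}
print(translate_with_confidence(["你好", "世界"], table))  # ('hello world', ~0.85)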

In step S2, the continuous speech input, taken as voice data, is segmented into sentences with prosody as the main feature; the segmentation is performed in combination with automatic speech recognition and automatic addition of punctuation marks, and on the premise of not losing recognition rate.
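The sketch below shows one simple way such prosody-driven segmentation could work: pauses between recognized words serve as the main prosodic cue, and a full stop is appended at each detected boundary. The word-timing interface and the 0.5-second threshold are assumptions for illustration, not values taken from the disclosure.

def segment_by_pauses(words_with_times, min_pause_s: float = 0.5) -> list:
    """words_with_times: list of (word, start_s, end_s) tuples from the speech recognizer."""
    sentences, current = [], []
    for idx, (word, start, end) in enumerate(words_with_times):
        current.append(word)
        next_start = words_with_times[idx + 1][1] if idx + 1 < len(words_with_times) else None
        # A pause longer than min_pause_s is treated as a sentence boundary.
        if next_start is None or next_start - end >= min_pause_s:
            sentences.append(" ".join(current) + ".")
            current = []
    return sentences


# Example with hypothetical recognizer timings (word, start_s, end_s):
print(segment_by_pauses([("hello", 0.0, 0.4), ("there", 0.5, 0.9), ("how", 1.8, 2.0),
                         ("are", 2.1, 2.3), ("you", 2.4, 2.7)]))
# ['hello there.', 'how are you.']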

The following determination is also made before step S4 is executed: the complexity of the input text is calculated and combined with the user's category to determine whether manual translation is required.
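An illustrative sketch of this pre-S4 check follows. The complexity measure (sentence length plus average word length) and the per-category thresholds are assumptions introduced for the example; the disclosure does not define a specific complexity formula.

def needs_manual_translation(input_text: str, user_category: str) -> bool:
    words = input_text.split()
    # A crude complexity proxy: sentence length plus average word length.
    complexity = len(words) + (sum(len(w) for w in words) / max(len(words), 1))

    # Assumed per-category thresholds: professional users tolerate machine output less.
    thresholds = {"casual": 40.0, "business": 25.0, "professional": 15.0}
    return complexity > thresholds.get(user_category, 30.0)


print(needs_manual_translation("Please confirm the delivery date", "business"))  # False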

The translation model in step S3 is established according to external world scene information.

Establishing the machine translation model comprises the following steps:

S301: acquiring external world scene information;

S302: establishing a language model and a phrase translation model, wherein both are established according to the external world scene information;

and establishing the machine translation model according to the language model and the phrase translation model, as sketched below.
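As an illustration of how a machine translation model can be assembled from a language model and a phrase translation model, the sketch below ranks candidate translations by a log-linear combination of the two model scores. The callable model interfaces and the weights lambda_lm and lambda_tm are assumptions, not values specified in the disclosure.

import math
from typing import Callable, List, Tuple


def rank_candidates(candidates: List[str],
                    language_model: Callable[[str], float],  # P(target)
                    phrase_model: Callable[[str], float],    # P(source | target)
                    lambda_lm: float = 0.5,
                    lambda_tm: float = 0.5) -> List[Tuple[str, float]]:
    scored = []
    for target in candidates:
        score = (lambda_lm * math.log(max(language_model(target), 1e-12))
                 + lambda_tm * math.log(max(phrase_model(target), 1e-12)))
        scored.append((target, score))
    # Best-scoring candidate first; step S3 would return scored[0][0] as the machine result.
    return sorted(scored, key=lambda item: item[1], reverse=True)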

The specific steps of establishing the language model comprise:

establishing a language model based on the external world scene information, acquiring a traditional language model, and determining the language model used to establish the machine translation model from the language model based on the external world scene information and the traditional language model.

Determining the language model used to establish the machine translation model from the language model based on the external world scene information and the traditional language model comprises: performing a log-linear combination of the two models and taking the log-linearly combined model as the language model for establishing the machine translation model.
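A minimal sketch of this log-linear combination follows. Here the two language models are represented as callables that return a sentence probability, and the interpolation weight of 0.6 is an assumed value rather than one specified in the disclosure.

import math


def combined_lm_log_prob(sentence: str, scene_lm, traditional_lm,
                         scene_weight: float = 0.6) -> float:
    # log P_combined = w * log P_scene + (1 - w) * log P_traditional
    scene_lp = math.log(max(scene_lm(sentence), 1e-12))
    trad_lp = math.log(max(traditional_lm(sentence), 1e-12))
    return scene_weight * scene_lp + (1.0 - scene_weight) * trad_lp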

Compared with the related art, the spoken language translation method integrating machine and manual translation provided by the invention has the following beneficial effects:

The invention provides a spoken language translation method integrating machine and manual translation that can identify multiple languages and therefore has wider translation coverage; speech synthesis turns the results of the different translation methods into stable speech output containing emotion, which well solves the real-time communication problem in speech translation; and translation is performed with a machine translation model determined from external world scene information, so that external world scene factors are taken into account during translation, a translation combination that better fits the external scene is obtained, and the accuracy of the translation result is improved.

The above description is only an embodiment of the present invention and is not intended to limit the scope of the invention; any equivalent structural or process modification made using the contents of the specification, or any direct or indirect application in other related technical fields, is likewise included within the scope of protection of the invention.
