Analysis method, device and equipment for determining inter-translation text and similarity between texts

Document No.: 1406187  Publication date: 2020-03-06

Reading note: This technology, "Analysis method, device and equipment for determining inter-translation text and similarity between texts" (确定互译文本及文本间相似度分析方法、装置及设备), was created by 陆军施杨斌龙旺钦 on 2018-08-17. Main content: The disclosure provides a method, apparatus and device for determining inter-translated texts and for analyzing similarity between texts. A first text set and a second text set are obtained, where the first text set is written in a first language and translated into a second language, the second text set is written in the second language, both sets comprise a plurality of texts, and each text comprises a plurality of characters or words. A first index is constructed by taking each character or word in the first text set as a key and the text identifier of the text containing it as a value; a second index is constructed in the same way from the second text set. Text identifiers that correspond to the same key in the first index and the second index are combined into text pairs, and the number of occurrences of each text pair is counted. Texts in an inter-translation relationship are then determined based on the number of occurrences of the text pairs. This reduces the amount of computation and improves the efficiency of identifying inter-translated texts.

1. A method of determining inter-translated text, comprising:

acquiring a first text set and a second text set, wherein the first text set is written in a first language and translated into a second language, the second text set is written in the second language, the first text set and the second text set each comprise a plurality of texts, and each text comprises a plurality of characters or words;

respectively taking each character or word in the first text set as a key and taking a text identifier of a text where the character or word is located as a value to construct a first index;

respectively taking each character or word in the second text set as a key and taking a text identifier of a text where the character or word is located as a value to construct a second index;

mutually forming text identifiers corresponding to the same key in the first index and the second index into text pairs, and counting the occurrence times of each text pair; and

determining the texts belonging to the inter-translation relationship based on the number of occurrences of the text pairs.

2. The method of claim 1, wherein,

the texts corresponding to the two text identifiers in a text pair belong to different text sets.

3. The method of claim 1, wherein,

the number of occurrences is the number of identical keys that two texts in a text pair have.

4. The method of claim 2, wherein the step of determining text belonging to an inter-translation relationship comprises:

for a first text, determining the second text in the first text pair with the largest number of occurrences as the inter-translated text of the first-language text corresponding to the first text, wherein the first text pair is a text pair containing the first text, the first text is a text in the first text set, and the second text is a text in the second text set; and/or

for a second text, determining the first-language text corresponding to the first text in the second text pair with the largest number of occurrences as the inter-translated text of the second text, wherein the second text pair is a text pair containing the second text.

5. The method of claim 2, wherein the step of determining text belonging to an inter-translation relationship comprises:

regarding a first text, taking a second text in a first preset number of first text pairs with the largest occurrence number as a candidate text set of the first text, wherein the first text pairs are text pairs containing the first text, the first text is a text in the first text set, and the second text is a text in the second text set;

calculating the similarity between each second text in the candidate text set and the first text; and

selecting the second text with the maximum similarity as the inter-translated text of the first-language text corresponding to the first text.

6. The method of claim 2, wherein the step of determining text belonging to an inter-translation relationship comprises:

regarding a second text, taking a first text in a second predetermined number of second text pairs with the largest occurrence number as a candidate text set of the second text, wherein the second text pairs are text pairs containing the second text, the first text is a text in the first text set, and the second text is a text in the second text set;

calculating the similarity between each first text and the second text in the candidate text set; and

selecting the first text with the maximum similarity, and taking the first-language text corresponding to that first text as the inter-translated text of the second text.

7. The method of claim 1, further comprising:

obtaining web page text in different languages from a multi-language web site, wherein,

the texts in the first text set are second-language translations of first-language web page texts acquired from the multi-language website, and the texts in the second text set are second-language web page texts acquired from the multi-language website.

8. The method of claim 1, further comprising:

removing stop words and/or high-frequency words in the first text set; and/or

removing stop words and/or high-frequency words in the second text set.

9. The method of claim 1, further comprising:

setting weights for the respective characters or words,

wherein the step of determining text belonging to an inter-translation relationship comprises: determining the texts belonging to the inter-translation relationship based on the number of occurrences of the text pairs and the weight of the corresponding character or word in each occurrence.

10. The method of claim 9, wherein the step of determining text belonging to an inter-translation relationship comprises:

calculating the sum of the weights of the corresponding characters or words of each text pair when the text pairs appear each time to obtain the weight value of each text pair; and

determining the texts belonging to the inter-translation relationship based on the weight values of the text pairs.

11. A method of determining inter-translated text, comprising:

acquiring a first text set and a second text set, wherein the first text set is written in a first language and translated into a third language, the second text set is written in a second language and translated into the third language, the first text set and the second text set each comprise a plurality of texts, and each text comprises a plurality of characters or words;

respectively taking each character or word in the first text set as a key and taking a text identifier of a text where the character or word is located as a value to construct a first index;

respectively taking each character or word in the second text set as a key and taking a text identifier of a text where the character or word is located as a value to construct a second index;

mutually forming text identifiers corresponding to the same key in the first index and the second index into text pairs, and counting the occurrence times of each text pair; and

determining the texts belonging to the inter-translation relationship based on the number of occurrences of the text pairs.

12. A method for analyzing similarity between texts, comprising:

acquiring a text set, wherein the text set comprises a plurality of texts, and the plurality of texts comprise a plurality of characters or words;

respectively taking each character or word in the text set as a key and taking a text identifier of the text where the character or word is located as a value to construct an index;

mutually forming text identifiers corresponding to the same key in the index into text pairs, and counting the occurrence times of each text pair; and

determining the similarity between the two texts in a text pair based on the number of occurrences of the text pair, wherein the similarity is positively correlated with the number of occurrences.

13. An apparatus for determining an inter-translated text, comprising:

the acquisition module is used for acquiring a first text set and a second text set, wherein the first text set is written in a first language and translated into a second language, the second text set is written in the second language, the first text set and the second text set each comprise a plurality of texts, and each text comprises a plurality of characters or words;

the first construction module is used for constructing a first index by taking each character or word in the first text set as a key and taking a text identifier of a text where the character or word is located as a value;

the second construction module is used for constructing a second index by taking each character or word in the second text set as a key and the text identifier of the text in which the character or word is located as a value;

the statistical module is used for mutually forming text identifiers corresponding to the same key in the first index and the second index into text pairs and counting the occurrence times of each text pair; and

the determining module is used for determining the texts belonging to the inter-translation relationship based on the number of occurrences of the text pairs.

14. An apparatus for determining an inter-translated text, comprising:

the acquisition module is used for acquiring a first text set and a second text set, wherein the first text set is written in a first language and translated into a third language, the second text set is written in a second language and translated into the third language, the first text set and the second text set each comprise a plurality of texts, and each text comprises a plurality of characters or words;

the determining module is used for determining the texts belonging to the inter-translation relationship based on the number of occurrences of the text pairs.

15. An apparatus for analyzing similarity between texts, comprising:

the acquisition module is used for acquiring a text set, wherein the text set comprises a plurality of texts, and each text comprises a plurality of characters or words;

the building module is used for building indexes by taking each character or word in the text set as a key and taking a text identifier of a text where the character or word is located as a value;

the statistical module is used for mutually forming text pairs by the text identifiers corresponding to the same key in the index and counting the occurrence times of each text pair; and

the similarity determining module is used for determining the similarity between the two texts in a text pair based on the number of occurrences of the text pair, wherein the similarity is positively correlated with the number of occurrences.

16. A computing device, comprising:

a processor; and

a memory having executable code stored thereon, which when executed by the processor, causes the processor to perform the method of any of claims 1-12.

17. A non-transitory machine-readable storage medium having stored thereon executable code, which when executed by a processor of an electronic device, causes the processor to perform the method of any of claims 1-12.

Technical Field

The present disclosure relates to the field of internet technologies, and in particular to a method, an apparatus, and a device for determining inter-translated texts and for analyzing similarity between texts.

Background

Machine translation technology refers to a technology for translating an original text in one natural language (generally referred to as a source language) into a translated text in another natural language (generally referred to as a target language) using a computer or other computing device. Automatic translation is mainly realized by a trained machine translation model, so that a large amount of translation work can be processed in a relatively short time compared with manual translation.

The corpus is the training data of a machine translation model, and both Statistical Machine Translation (SMT) and Neural Machine Translation (NMT) depend heavily on corpus data. In machine translation, the number of supported languages and the translation quality in each language direction are related to the scale and quality of the corpus data. The corpus data referred to herein generally means large-scale sets of bilingual sentence pairs.

Currently, such corpus data is mainly acquired in the following three ways.

1. Direct purchase. The cost is high, the purchased corpora differ to some extent from the data needed for machine translation, and data is not available for every language direction, especially for low-resource languages.

2. Production by human translators. Higher-quality corpora can be obtained, but the cost is very high, and the output can hardly reach the scale required by machine translation.

3. Downloading/mining from the internet. There are many multilingual websites on the network, such as Apple's official website (https://www.apple.com/choose-your-country/), which offer versions of their web pages in many languages, many of which are translations of each other, so this data can be crawled to form a bilingual corpus.

In practice, all three approaches are usually combined to obtain a richer corpus. The first and second are straightforward, whereas the third is more complex because it raises the problem of automatic web page alignment.

Specifically, for a multilingual website on the internet, a crawler can fetch all the web pages of its sub-sites in the various languages. Once the pages in the different languages have been obtained, the pages that are translations of each other must be mined (identified); this step is called "automatic web page alignment". Bilingual sentence pairs can then be extracted from each pair of mutually translated web pages to form a bilingual sentence pair library that serves as the corpus for machine translation.

Therefore, when the corpus is obtained in the third way, quickly determining the inter-translated texts, and thereby aligning the web pages automatically, is key to the scheme.

Disclosure of Invention

It is an object of the present disclosure to provide a scheme that enables fast determination of inter-translated text.

According to a first aspect of the present disclosure, a method of determining an inter-translated text is proposed, comprising: obtaining a first text set and a second text set, wherein the first text set is written in a first language and translated into a second language, the second text set is written in the second language, the first text set and the second text set each comprise a plurality of texts, and each text comprises a plurality of characters or words; respectively taking each character or word in the first text set as a key and taking a text identifier of the text where the character or word is located as a value to construct a first index; respectively taking each character or word in the second text set as a key and taking a text identifier of the text where the character or word is located as a value to construct a second index; combining text identifiers corresponding to the same key in the first index and the second index into text pairs, and counting the number of occurrences of each text pair; and determining the texts belonging to the inter-translation relationship based on the number of occurrences of the text pairs.
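The indexing and counting steps of the first aspect can be sketched roughly as follows. This is only an illustrative sketch: whitespace tokenization, the function names build_index and count_pairs, and the toy texts and identifiers are all assumptions made for the example, not details taken from the disclosure.

```python
from collections import Counter, defaultdict
from itertools import product


def build_index(texts):
    """Inverted index: token (key) -> set of identifiers of texts containing it."""
    index = defaultdict(set)
    for text_id, text in texts.items():
        for token in text.lower().split():  # assumed tokenizer: whitespace split
            index[token].add(text_id)
    return index


def count_pairs(first_index, second_index):
    """For every key present in both indexes, pair each first-set identifier
    with each second-set identifier and count how often each pair occurs."""
    counts = Counter()
    for key in first_index.keys() & second_index.keys():
        for pair in product(first_index[key], second_index[key]):
            counts[pair] += 1
    return counts


# Hypothetical data: the first set holds language-1 pages already translated
# into language 2; the second set holds native language-2 pages.
first_set = {"zh1": "the weather is nice today", "zh2": "buy a new phone online"}
second_set = {"en1": "nice weather today", "en2": "order your new phone online"}

pair_counts = count_pairs(build_index(first_set), build_index(second_set))
```

Only text pairs that share at least one key ever appear in pair_counts, so pairs with nothing in common are never compared.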

Optionally, the texts corresponding to the two text identifiers in the text pair belong to different text sets.

Optionally, the number of occurrences is the number of identical keys that the two texts in a text pair have.

Optionally, the step of determining the text belonging to the inter-translation relationship comprises: for a first text, determining a second text in a first text pair with the largest occurrence number as an inter-translated text of a text written in a first language corresponding to the first text, wherein the first text pair is a text pair containing the first text, the first text is a text in a first text set, and the second text is a text in a second text set; and/or for a second text, determining the text written in the first language corresponding to the first text in the second text pair with the largest occurrence number as the inter-translated text of the second text, wherein the second text pair is the text pair containing the second text.
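As a rough illustration of this selection rule, the sketch below picks, for each text, the partner from the pair with the largest count, in either direction. The pair counts, identifiers, and the helper name best_matches are hypothetical and chosen only for the example.

```python
from collections import Counter

# Hypothetical counts: (first_text_id, second_text_id) -> number of shared keys
pair_counts = Counter({
    ("a1", "b1"): 5,
    ("a1", "b2"): 2,
    ("a2", "b2"): 4,
    ("a2", "b1"): 1,
})


def best_matches(pair_counts, side):
    """For each text on the given side (0 = first set, 1 = second set),
    return the partner from the pair with the largest count."""
    best = {}
    for pair, count in pair_counts.items():
        text_id, partner = pair[side], pair[1 - side]
        if text_id not in best or count > best[text_id][1]:
            best[text_id] = (partner, count)
    return {text_id: partner for text_id, (partner, _) in best.items()}


forward = best_matches(pair_counts, side=0)   # first text -> second text
backward = best_matches(pair_counts, side=1)  # second text -> first text
```

Running both directions corresponds to the "and/or" in the paragraph above; a caller could, for instance, keep only the matches on which the two directions agree.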

Optionally, the step of determining the text pairs belonging to the inter-translation relationship comprises: regarding a first text, taking a second text in a first preset number of first text pairs with the largest occurrence frequency as a candidate text set of the first text, wherein the first text pairs are text pairs containing the first text, the first text is a text in the first text set, and the second text is a text in the second text set; calculating the similarity between each second text and the first text in the candidate text set; and selecting the second text with the maximum similarity as the inter-translation text of the text written by the first language corresponding to the first text.
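The two-stage variant (shortlist by count, then re-rank by similarity) might look like the sketch below. The passage does not fix a similarity measure, so Jaccard overlap of token sets is used purely as a stand-in; the function names, the value n=2, and the sample texts are likewise assumptions.

```python
import heapq


def top_candidates(pair_counts, first_id, n):
    """The n second-set identifiers whose pairs with first_id have the largest counts."""
    scored = [(count, second)
              for (first, second), count in pair_counts.items()
              if first == first_id]
    return [second for _, second in heapq.nlargest(n, scored)]


def jaccard(text_a, text_b):
    """Assumed similarity measure: overlap of token sets."""
    a, b = set(text_a.split()), set(text_b.split())
    return len(a & b) / len(a | b) if a | b else 0.0


def pick_translation(first_id, first_texts, second_texts, pair_counts, n=2):
    """Shortlist n candidates by count, then keep the most similar one."""
    candidates = top_candidates(pair_counts, first_id, n)
    return max(candidates,
               key=lambda sid: jaccard(first_texts[first_id], second_texts[sid]),
               default=None)


first_texts = {"a1": "new phone on sale today"}
second_texts = {
    "b1": "new phone on sale today only",
    "b2": "new phone case and cable on sale today and tomorrow",
    "b3": "phone support today",
}
# Shared-key counts consistent with the texts above
pair_counts = {("a1", "b1"): 5, ("a1", "b2"): 5, ("a1", "b3"): 2}

best = pick_translation("a1", first_texts, second_texts, pair_counts)
```

Here b1 and b2 tie on the raw count, and the similarity step separates them; the more expensive similarity computation is run only on the shortlist rather than on every text in the second set.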

Optionally, the step of determining the text belonging to the inter-translation relationship comprises: for a second text, ranking according to the occurrence times, and taking a first text in a second preset number of second text pairs with the maximum occurrence times as a candidate text set of the second text, wherein the second text pair is a text pair containing the second text, the first text is a text in the first text set, and the second text is a text in the second text set; calculating the similarity between each first text and the second text in the candidate text set; and selecting a first text with the maximum similarity, and taking a text which is written by using a first language and corresponds to the first text as an inter-translated text of a second text.

Optionally, the method further comprises: acquiring web page texts in different languages from a multi-language website, wherein the texts in the first text set are second-language translations of first-language web page texts acquired from the multi-language website, and the texts in the second text set are second-language web page texts acquired from the multi-language website.

Optionally, the method further comprises: removing stop words and/or high-frequency words in the first text set; and/or removing stop words and/or high frequency words from the second set of text.
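A minimal sketch of such filtering is given below. The stop-word list, the document-frequency threshold max_doc_ratio used to decide what counts as "high-frequency", and the sample texts are assumptions for illustration; the passage itself does not specify them.

```python
from collections import Counter

STOP_WORDS = {"the", "a", "is", "of"}  # hypothetical stop-word list


def filter_tokens(texts, max_doc_ratio=0.5):
    """Drop stop words and tokens that occur in more than max_doc_ratio of the texts."""
    tokenized = {tid: text.lower().split() for tid, text in texts.items()}
    # Document frequency: in how many texts does each token appear?
    doc_freq = Counter(tok for toks in tokenized.values() for tok in set(toks))
    limit = max_doc_ratio * len(texts)
    return {
        tid: [t for t in toks if t not in STOP_WORDS and doc_freq[t] <= limit]
        for tid, toks in tokenized.items()
    }


texts = {
    "p1": "the apple store is open",
    "p2": "apple support of the phone",
    "p3": "apple news",
    "p4": "store hours",
}
filtered = filter_tokens(texts)
```

Tokens that occur in most texts would otherwise pair almost every text with almost every other text, inflating the number of text pairs without helping to tell them apart.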

Optionally, the method further comprises: and respectively setting weights for all characters or words, wherein the step of determining the texts which belong to the inter-translation relationship comprises the following steps: and determining the texts belonging to the inter-translation relationship based on the occurrence times of the text pairs and the weight of the corresponding characters or words in each occurrence.

Optionally, the step of determining the text belonging to the inter-translation relationship comprises: calculating the sum of the weights of the corresponding characters or words over all occurrences of each text pair to obtain the weight value of each text pair; and determining the texts belonging to the inter-translation relationship based on the weight values of the text pairs.
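The weighted variant can be sketched as follows: instead of adding 1 for each shared key, the weight of that key is added. How weights are assigned is not stated in this passage, so an IDF-style weight is assumed here only for illustration; the function names and toy texts are also hypothetical.

```python
import math
from collections import Counter, defaultdict
from itertools import product


def build_index(texts):
    """Inverted index: token -> set of text identifiers."""
    index = defaultdict(set)
    for text_id, text in texts.items():
        for token in text.lower().split():
            index[token].add(text_id)
    return index


def idf_weights(*text_sets):
    """Assumed weighting: log(N / document frequency), so rarer tokens weigh more."""
    docs = [set(t.lower().split()) for texts in text_sets for t in texts.values()]
    doc_freq = Counter(tok for doc in docs for tok in doc)
    return {tok: math.log(len(docs) / n) for tok, n in doc_freq.items()}


def weighted_pair_scores(first_index, second_index, weights):
    """Weight value of a pair = sum of the weights of the keys the two texts share."""
    scores = defaultdict(float)
    for key in first_index.keys() & second_index.keys():
        for pair in product(first_index[key], second_index[key]):
            scores[pair] += weights.get(key, 1.0)
    return scores


first_set = {"a1": "iphone price today", "a2": "price list today"}
second_set = {"b1": "iphone price", "b2": "today price list"}

weights = idf_weights(first_set, second_set)
scores = weighted_pair_scores(build_index(first_set), build_index(second_set), weights)
```

In this toy data, a1 shares two keys with b1 and two keys with b2, so plain counts tie; with the assumed weights the rarer shared token ("iphone") makes (a1, b1) score higher than (a1, b2).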

According to a second aspect of the present disclosure, there is also provided a method of determining an inter-translated text, comprising: acquiring a first text set and a second text set, wherein the first text set is written in a first language and translated into a third language, the second text set is written in a second language and translated into the third language, the first text set and the second text set each comprise a plurality of texts, and each text comprises a plurality of characters or words; respectively taking each character or word in the first text set as a key and taking a text identifier of the text where the character or word is located as a value to construct a first index; respectively taking each character or word in the second text set as a key and taking a text identifier of the text where the character or word is located as a value to construct a second index; combining text identifiers corresponding to the same key in the first index and the second index into text pairs, and counting the number of occurrences of each text pair; and determining the texts belonging to the inter-translation relationship based on the number of occurrences of the text pairs.

According to a third aspect of the present disclosure, there is also provided a method for analyzing similarity between texts, including: acquiring a text set, wherein the text set comprises a plurality of texts, and the plurality of texts comprise a plurality of characters or words; respectively taking each character or word in the text set as a key and taking a text identifier of the text where the character or word is located as a value to construct an index; mutually forming text identifiers corresponding to the same key in the index into text pairs, and counting the occurrence times of each text pair; and determining the similarity between the two texts in the text pair based on the occurrence times of the text pair, wherein the similarity is positively correlated with the occurrence times.
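For the single-set case of the third aspect, one index suffices and pairs are formed among the identifiers listed under each key. The sketch below uses the raw shared-key count directly as the similarity score, which is one simple way to satisfy "positively correlated"; the function name and sample texts are assumptions.

```python
from collections import Counter, defaultdict
from itertools import combinations


def pairwise_shared_counts(texts):
    """Build one inverted index, then count shared keys for every pair of texts
    that co-occur under at least one key."""
    index = defaultdict(set)
    for text_id, text in texts.items():
        for token in set(text.lower().split()):
            index[token].add(text_id)
    counts = Counter()
    for ids in index.values():
        # sorted() gives each unordered pair a single canonical form
        for pair in combinations(sorted(ids), 2):
            counts[pair] += 1
    return counts


texts = {"d1": "red apple pie", "d2": "green apple pie", "d3": "red car"}
similarity = pairwise_shared_counts(texts)
```

Pairs such as (d2, d3) that share no key never appear in the result, which is where the saving over comparing every pair of texts comes from.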

According to a fourth aspect of the present disclosure, there is also provided an apparatus for determining an inter-translated text, including: the acquisition module is used for acquiring a first text set and a second text set, wherein the first text set is written in a first language and translated into a second language, the second text set is written in the second language, the first text set and the second text set each comprise a plurality of texts, and each text comprises a plurality of characters or words; the first construction module is used for constructing a first index by taking each character or word in the first text set as a key and taking a text identifier of the text where the character or word is located as a value; the second construction module is used for constructing a second index by taking each character or word in the second text set as a key and the text identifier of the text in which the character or word is located as a value; the statistical module is used for combining text identifiers corresponding to the same key in the first index and the second index into text pairs and counting the number of occurrences of each text pair; and the determining module is used for determining the texts belonging to the inter-translation relationship based on the number of occurrences of the text pairs.

Optionally, the texts corresponding to the two text identifiers in the text pair belong to different text sets.

Optionally, the number of occurrences is the number of identical keys that two texts in a text pair have.

Optionally, for a first text, the determining module determines a second text in a first text pair with the largest number of occurrences as an inter-translated text of a text written in the first language corresponding to the first text, wherein the first text pair is a text pair including the first text, the first text is a text in a first text set, and the second text is a text in a second text set, and/or for a second text, the determining module determines a text written in the first language corresponding to the first text in the second text pair with the largest number of occurrences as an inter-translated text of the second text, wherein the second text pair is a text pair including the second text.

Optionally, the determining module includes: the candidate text set determining module is used for regarding a first text, taking a second text in a first preset number of first text pairs with the largest occurrence frequency as a candidate text set of the first text, wherein the first text pairs are text pairs containing the first text, the first text is a text in the first text set, and the second text is a text in a second text set; the first calculation module is used for calculating the similarity between each second text and the first text in the candidate text set; and the selection module is used for selecting the second text with the maximum similarity as the inter-translation text of the text written by the first language corresponding to the first text.

Optionally, the determining module includes: the candidate text set determining module is used for, regarding a second text, taking the first texts in a second preset number of second text pairs with the largest number of occurrences as a candidate text set of the second text, wherein the second text pair is a text pair containing the second text, the first text is a text in the first text set, and the second text is a text in the second text set; the first calculation module is used for calculating the similarity between each first text in the candidate text set and the second text; and the selection module is used for selecting the first text with the maximum similarity and taking the first-language text corresponding to that first text as the inter-translated text of the second text.

Optionally, the apparatus further comprises: the text acquisition module is used for acquiring web page texts in different languages from a multi-language website, wherein the texts in the first text set are second-language translations of first-language web page texts acquired from the multi-language website, and the texts in the second text set are second-language web page texts acquired from the multi-language website.

Optionally, the apparatus further comprises: the first removal module is used for removing stop words and/or high-frequency words in the first text set; and/or a second removal module for removing stop words and/or high-frequency words in the second text set.

Optionally, the apparatus further comprises: and the weight setting module is used for respectively setting weights for all the characters or words, wherein the determining module determines the texts which belong to the inter-translation relationship based on the occurrence times of the text pairs and the weights of the corresponding characters or words in each occurrence.

Optionally, the determining module includes: the second calculation module is used for calculating the sum of the weights of the corresponding characters or words of each text pair in each occurrence so as to obtain the weight value of each text pair; and the determining submodule is used for determining the texts which belong to the inter-translation relation based on the weight value of the text pair.

According to a fifth aspect of the present disclosure, there is also provided an apparatus for determining an inter-translated text, comprising: the acquisition module is used for acquiring a first text set and a second text set, wherein the first text set is written in a first language and translated into a third language, the second text set is written in a second language and translated into the third language, the first text set and the second text set each comprise a plurality of texts, and each text comprises a plurality of characters or words; the first construction module is used for constructing a first index by taking each character or word in the first text set as a key and the text identifier of the text where the character or word is located as a value; the second construction module is used for constructing a second index by taking each character or word in the second text set as a key and the text identifier of the text in which the character or word is located as a value; the statistical module is used for combining text identifiers corresponding to the same key in the first index and the second index into text pairs and counting the number of occurrences of each text pair; and the determining module is used for determining the texts belonging to the inter-translation relationship based on the number of occurrences of the text pairs.

According to a sixth aspect of the present disclosure, there is also provided an apparatus for analyzing similarity between texts, including: the acquisition module is used for acquiring a text set, wherein the text set comprises a plurality of texts, and the plurality of texts comprise a plurality of characters or words; the building module is used for building indexes by taking each character or word in the text set as a key and taking a text identifier of a text where the character or word is located as a value; the statistical module is used for mutually forming text pairs by the text identifiers corresponding to the same key in the index and counting the occurrence times of each text pair; and the similarity determining module is used for determining the similarity between two texts in the text pair based on the occurrence frequency of the text pair, wherein the similarity is positively correlated with the occurrence frequency.

According to a seventh aspect of the present disclosure, there is also provided a computing device comprising: a processor; and a memory having executable code stored thereon, which when executed by the processor, causes the processor to perform the method as set forth in any one of the first to third aspects of the disclosure.

According to an eighth aspect of the present disclosure, there is also provided a non-transitory machine-readable storage medium having stored thereon executable code, which when executed by a processor of an electronic device, causes the processor to perform the method as set forth in any one of the first to third aspects of the present disclosure.

By introducing an inverted index, the web page text pairs that are translations of each other can be identified with less computation, which improves the efficiency of identifying inter-translated texts.

Drawings

The above and other objects, features and advantages of the present disclosure will become more apparent by describing in greater detail exemplary embodiments thereof with reference to the attached drawings, in which like reference numerals generally represent like parts throughout.

FIG. 1 is a schematic flow chart diagram illustrating a method of determining inter-translated text in accordance with one embodiment of the present disclosure.

FIG. 2 is a schematic flow chart diagram illustrating a web page text alignment method according to an embodiment of the present disclosure.

Fig. 3 is a schematic flow chart diagram illustrating an analysis method of inter-text similarity according to an embodiment of the present disclosure.

Fig. 4 is a schematic block diagram illustrating a structure of an apparatus for determining an inter-translated text according to an embodiment of the present disclosure.

Fig. 5 is a schematic structural diagram illustrating functional modules that the determining module may have according to an embodiment of the present disclosure.

Fig. 6 is a schematic structural diagram illustrating functional modules that the determining module may have according to another embodiment of the present disclosure.

Fig. 7 is a schematic block diagram showing the structure of an inter-text similarity analysis apparatus according to an embodiment of the present disclosure.

FIG. 8 is a schematic block diagram illustrating the structure of a computing device in accordance with an embodiment of the present disclosure.

Detailed Description

Preferred embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While the preferred embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.

[ term interpretation ]

Sentence pair: two sentences that are translations of each other, also called a bilingual sentence pair. For example, the Chinese sentence meaning "the weather is good today" and the English sentence "It's a nice day today" form a bilingual sentence pair.

Web page text pair: two web pages that are in different languages and are translations of each other.

Inverted index: an indexing method used in full-text search to store a mapping from a word to its storage locations in a document or a set of documents.

Machine translation: text is translated from one natural language to another by a computer program.

Crawler: a system tool for capturing web pages on the Internet.

[ scheme overview ]

As described in the background section, the corpora used to train a machine translation model are sentence pairs that are in different languages and are translations of each other. After web page texts in different languages are obtained from one or more multilingual websites on the Internet, the web page text pairs that are inter-translated texts must be determined (i.e., web page alignment), so that corpus data usable for training a machine translation model can then be extracted from those pairs.

In view of the above, the present disclosure proposes a scheme for determining inter-translated texts (i.e., a web page text alignment scheme) that can quickly determine the web page text pairs that are inter-translated texts. Specifically, the web page texts of each language sub-site of a multilingual website on the Internet can be acquired using a crawler or the like. For the two acquired sets of web page texts, which correspond to different languages, either one set can be translated into the language of the other set, or both sets can be translated into a further language different from the languages of both sets. An inverted index may then be constructed for each of the two sets, with the characters or words in a text as the keys and the text ID as the value. For the two inverted indexes, the text IDs from the two sets that correspond to the same key may be merged to obtain multiple text pairs. Finally, the number of occurrences of each text pair can be counted. Because this number reflects how many identical or similar characters or words the two texts share (their co-occurrence count), the inter-translated texts can be determined from it.

For example, for a text A, among all text pairs that include text A, the other text in the pair with the highest number of occurrences may be directly determined as the inter-translated text of text A. As another example, the texts in the N pairs with the highest numbers of occurrences among all text pairs that include text A may be taken as texts similar to text A, i.e., as a candidate text set, and the inter-translated text of text A may then be selected from the candidate text set in a predetermined manner (for example, using an existing method of calculating inter-translation).

Therefore, by introducing the inverted index, the web page text pairs that are inter-translated texts can be identified with less computation, so that the efficiency of identifying inter-translated texts can be improved. The following further describes aspects of the present disclosure.

[ METHOD FOR DETERMINING INTER-TRANSLATED TEXT ]

FIG. 1 is a schematic flow chart diagram illustrating a method of determining inter-translated text in accordance with one embodiment of the present disclosure.

Referring to fig. 1, in step S110, a first text set and a second text set are acquired.

The first and second sets of text each include a plurality of texts, and the plurality of texts may include a plurality of words or phrases. For ease of distinction, text in the first set of text may be referred to as first text and text in the second set of text may be referred to as second text.

As one example of the present disclosure, the first text set may be written in a first language and translated into a second language; that is, each text in the first text set (i.e., each first text) may be the second-language translation of a text written in the first language. The second text set is written in the second language. The first language and the second language are languages used for human communication. They may include languages that evolved naturally with culture (i.e., natural languages such as Chinese, English, and French) and constructed languages (e.g., Esperanto), but do not include computer programming languages. Thus, in this example, given a text set written in the first language and a text set written in the second language, one of the sets may be translated into the language of the other set to yield the first text set and the second text set.

As another example of the present disclosure, the first text set may be written in a first language and translated into a third language, and the second text set may be written in a second language and translated into the third language. That is, each text in the first text set (i.e., each first text) may be the third-language translation of a text written in the first language, and each text in the second text set (i.e., each second text) may be the third-language translation of a text written in the second language. The third language is different from the first language and the second language. The first, second, and third languages are languages used for human communication. They may include languages that evolved naturally with culture (i.e., natural languages such as Chinese, English, and French) and constructed languages (e.g., Esperanto), but do not include computer programming languages. Thus, in this example, a text set written in the first language and a text set written in the second language may each be translated into the third language to yield the first text set and the second text set.

The first text and the second text may be obtained from multilingual websites. For example, web page texts in different languages may be obtained from one or more multilingual websites; the first text may then be the second-language translation of a web page text in the first language, and the second text may be a web page text in the second language. As another example, the first text may be the third-language translation of a web page text in the first language, and the second text may be the third-language translation of a web page text in the second language.

In step S120, a first index is constructed by using each character or word in the first text set as a key and using a text identifier of a text in which the character or word is located as a value.

Specifically, the inverted index may be built at character granularity, at word granularity, or at both; that is, the keys of the first index may include only characters, only words, or both characters and words. As an example, the translated text in the second language may be segmented, and the first index may be constructed with each segmentation result (which may be a character or a word) as a key and the text identifier of the first text as the value. A word in the present disclosure may refer to a combination of two or more characters, such as a word or phrase composed of several Chinese characters in Chinese, or a phrase composed of several words in English.

It should be noted that the text identifier mentioned in the present disclosure may be a coded value configured for the text, or may be in other data forms capable of uniquely characterizing the text. The text identifier may also be a Uniform Resource Locator (URL) of the web page text, as in the case where the first text is a translated text of the web page text and the second text is the web page text. In addition, it should be noted that the first text in the first text set is a translated text of the original text written in the first language and the translation process only changes the writing language of the text, and does not change the identifier of the text, that is, the first text is the same as the text identifier of the original text and only the writing language is different. Since the text identifiers of the first text and the original text are the same, in the method for determining the inter-translated text of the present invention, for any first text, the original text written in the first language corresponding to the first text can be determined according to the text identifier of the first text.

As an example of the present disclosure, stop words and/or high-frequency words in the first text set may also be removed before the inverted index is built. Stop words are characters or words without practical meaning, such as the indefinite articles "a" and "an" in English and "的" in Chinese. High-frequency words are characters or words that statistics show to be used frequently. The stop words and/or high-frequency words of the second language may be known in advance and may be determined by table lookup (e.g., in a stop word table and/or a high-frequency word table).
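The index construction of step S120, including stop word removal, can be illustrated with a minimal sketch. The function name `build_inverted_index`, the stop word table `STOP_WORDS`, the sample texts, and the use of whitespace splitting in place of a real word segmenter are all assumptions made for illustration; they are not specified by the disclosure.

```python
from collections import defaultdict

# Hypothetical stop word table; a real table would be language specific.
STOP_WORDS = {"a", "an", "the"}

def build_inverted_index(texts, stop_words=STOP_WORDS):
    """Map each token (key) to the set of text identifiers (values) containing it.

    `texts` maps a text identifier to the text content. Whitespace splitting
    stands in for a real word segmenter.
    """
    index = defaultdict(set)
    for text_id, content in texts.items():
        for token in content.lower().split():
            if token not in stop_words:
                index[token].add(text_id)
    return dict(index)

# Illustrative translated first texts, keyed by their text identifiers.
first_index = build_inverted_index({
    "zd1": "the weather is good today",
    "zd2": "a crawler captures web pages",
})
```

The same function can be applied unchanged to the second text set, which keeps the key granularity of the two indexes identical.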

In step S130, a second index is constructed by using each character or word in the second text set as a key and using the text identifier of the text in which the character or word is located as a value.

Specifically, the inverted index may be built at character granularity, at word granularity, or at both; that is, the keys of the second index may include only characters, only words, or both characters and words. Preferably, the granularity of the keys of the second index is the same as that of the keys of the first index described previously. As an example, the second text may be segmented, and the second index may be constructed with each segmentation result (which may be a character or a word) as a key and the text identifier of the second text as the value.

Likewise, stop words and/or high frequency words in the second set of text may also be removed when establishing the inverted index. For the specific construction process and the related details, reference may be made to the description of step S120, which is not described herein again.

In step S140, the text identifiers corresponding to the same key in the first index and the second index are combined into text pairs, and the number of occurrences of each text pair is counted. The two text identifiers in a text pair belong to different text sets, and the number of occurrences of a pair is the number of keys that the two texts in the pair have in common.

The first index and the second index are constructed with characters or words as keys and the text identifiers of the texts as values. Thus, based on the two indexes, the text identifiers of the texts corresponding to the same key (character or word) can be quickly merged to obtain a plurality of text groups. Each key corresponds to one text group, and each combination of a first text and a second text within a text group can be regarded as a text pair. The number of occurrences of each text pair across the text groups may then be counted. This number characterizes how many keys (characters or words) the first text and the second text in the pair have in common, i.e., their co-occurrence count.
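The merging and counting of step S140 can be sketched as follows. This is a minimal illustration under assumptions: the function name `count_text_pairs` and the sample indexes are hypothetical, and each index is assumed to map a key to a set of text identifiers.

```python
from collections import Counter
from itertools import product

def count_text_pairs(first_index, second_index):
    """Count, for each (first text, second text) pair, the keys they share."""
    pair_counts = Counter()
    # Only keys present in both indexes can produce a text group.
    for key in first_index.keys() & second_index.keys():
        # Every first text combined with every second text under this key is a pair.
        for pair in product(first_index[key], second_index[key]):
            pair_counts[pair] += 1
    return pair_counts

# Illustrative indexes (key -> set of text identifiers).
first_index = {"weather": {"a1"}, "good": {"a1", "a2"}, "crawler": {"a2"}}
second_index = {"weather": {"b1"}, "good": {"b1"}, "crawler": {"b2"}, "page": {"b2"}}

pair_counts = count_text_pairs(first_index, second_index)
```

In this sample, the pair ("a1", "b1") shares the keys "weather" and "good" and is therefore counted twice, while pairs that share no key never appear in the counter at all.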

In step S150, the texts belonging to the inter-translation relationship are determined based on the number of occurrences of the text pairs.

The number of occurrences of each text pair represents the number of keys, i.e., characters or words, that the first text and the second text have in common, and thus the texts belonging to the inter-translation relationship may be determined based on the number of occurrences of the text pairs.

The text belonging to the inter-translation relationship can be specifically determined in the following two ways.

1. First way

The text belonging to the inter-translation relationship may be determined directly based on the number of occurrences of the text pair.

For example, in a case where the first text set is written in a first language and translated into a second language, and the second text set is written in the second language, for the first text, the second text in the first text pair that occurs the most times may be determined as the inter-translated text of the text written in the first language corresponding to the first text, where the first text pair is the text pair including the first text. In addition, for a second text, a text written in the first language corresponding to a first text in a second text pair with the largest occurrence number may be determined as an inter-translated text of the second text, where the second text pair is a text pair including the second text.

As another example, where the first text set is written in the first language and translated into the third language, and the second text set is written in the second language and translated into the third language, for a first text, the text written in the second language corresponding to the second text in the first text pair with the most occurrences may be determined as the inter-translated text of the text written in the first language corresponding to the first text, where a first text pair is a text pair including the first text. In addition, for a second text, the text written in the first language corresponding to the first text in the second text pair with the most occurrences may be determined as the inter-translated text of the text written in the second language corresponding to the second text, where a second text pair is a text pair including the second text.
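The first way, selecting for each first text the partner in its most frequent pair, can be sketched as below. The function name `best_matches` and the sample counts are hypothetical; ties are resolved here in favor of the pair encountered first, which is a choice made for this sketch and is not prescribed by the disclosure.

```python
def best_matches(pair_counts):
    """For each first text, return the second text of its most frequent pair."""
    best = {}
    for (first_id, second_id), count in pair_counts.items():
        # Keep the partner with the strictly highest count seen so far.
        if first_id not in best or count > best[first_id][1]:
            best[first_id] = (second_id, count)
    return {first_id: second_id for first_id, (second_id, _) in best.items()}

# Illustrative occurrence counts of (first text, second text) pairs.
pair_counts = {
    ("a1", "b1"): 5, ("a1", "b2"): 1,
    ("a2", "b1"): 2, ("a2", "b2"): 4,
}
matches = best_matches(pair_counts)
```

Because a first text keeps the text identifier of its original, each result maps directly back to the original text written in the first language.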

2. Second way

Text pairs that are highly likely to be inter-translated texts can first be screened based on the number of occurrences of the text pairs, and the screened text pairs can then be further processed in another manner to find the texts belonging to the inter-translation relationship. The other manner may be an existing method of determining inter-translated texts.

As an example, where the first text set is written in a first language and translated into a second language, and the second text set is written in the second language, for a first text, the first text pairs may be ranked by number of occurrences, and the second texts in the first predetermined number of first text pairs with the most occurrences may be taken as a candidate text set for the first text, where a first text pair is a text pair including the first text. The second text that is the inter-translated text of the text written in the first language corresponding to the first text may then be selected from the candidate text set in any of various ways. For example, the similarity between each second text in the candidate text set and the first text may be calculated, and the second text with the highest similarity may be selected as the inter-translated text of the text written in the first language corresponding to the first text. The specific value of the first predetermined number may be set according to the actual situation, and is not described herein again.

Similarly, for a second text, the second text pairs may be ranked by number of occurrences, and the first texts in the second predetermined number of second text pairs with the most occurrences may be taken as a candidate text set for the second text, where a second text pair is a text pair including the second text. The first text most similar to the second text may then be selected from the candidate text set in any of various ways, and the text written in the first language corresponding to the selected first text is the inter-translated text of the second text. For example, the similarity between each first text in the candidate text set and the second text may be calculated, and the first text with the highest similarity may be selected. The specific value of the second predetermined number may be set according to the actual situation, and is not described herein again.

In addition, under the condition that the first text set is written in the first language and translated into the third language, and the second text set is written in the second language and translated into the third language, the text of the inter-translation relationship can be determined according to the method, and the specific implementation process is not repeated here.
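The second way, coarse screening by occurrence count followed by a finer comparison, can be sketched as below. The disclosure leaves the finer similarity calculation open; the Jaccard similarity over token sets used here is only a stand-in chosen for this sketch. All function names and sample data are hypothetical.

```python
import heapq

def top_candidates(pair_counts, first_id, n):
    """Return the second texts of the n most frequent pairs containing first_id."""
    scored = [(count, second_id)
              for (fid, second_id), count in pair_counts.items() if fid == first_id]
    return [second_id for _, second_id in heapq.nlargest(n, scored)]

def jaccard(tokens_a, tokens_b):
    """Stand-in similarity measure: shared tokens divided by all distinct tokens."""
    a, b = set(tokens_a), set(tokens_b)
    return len(a & b) / len(a | b) if a | b else 0.0

def refine(first_id, candidates, first_tokens, second_tokens):
    """Pick the candidate most similar to the first text under the stand-in measure."""
    return max(candidates,
               key=lambda sid: jaccard(first_tokens[first_id], second_tokens[sid]))

# Illustrative tokens; the counts below equal the number of shared tokens.
first_tokens = {"a1": ["weather", "good", "today", "sunny"]}
second_tokens = {
    "b1": ["weather", "good", "today"],
    "b2": ["weather", "good", "today", "sunny", "crawler", "web", "page", "index"],
    "b3": ["good"],
}
pair_counts = {("a1", "b1"): 3, ("a1", "b2"): 4, ("a1", "b3"): 1}

candidates = top_candidates(pair_counts, "a1", 2)
chosen = refine("a1", candidates, first_tokens, second_tokens)
```

In this sample the longer text "b2" shares the most keys with "a1" and ranks first in the coarse screening, but the refinement step selects "b1", whose content overlaps "a1" more closely in proportion to its length.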

It should be noted that, when constructing the first index and the second index, a weight may be set for each key (i.e. word or phrase), for example, the weight may be set according to the number of occurrences of the word or phrase in the text, or the weight may be set according to the importance of the semantic content of the word or phrase. Thus, in executing step S150, the text belonging to the inter-translation relationship may be determined based on the number of occurrences of the text pair and the weight of the corresponding word or phrase at each occurrence. For example, the sum of the weights of the corresponding words or phrases at each occurrence of each text pair may be calculated to obtain the weight value of each text pair, and the text belonging to the inter-translation relationship may be determined based on the weight values of the text pairs. When the inter-translation text is determined based on the weight values of the text pairs, similar to the two determination methods mentioned above, the text in the inter-translation relationship can be determined directly based on the weight values of the text pairs, the text pairs with high possibility of belonging to the inter-translation text can also be screened based on the weight values of the text pairs, and then the screened text pairs are further processed in other manners to further determine the text belonging to the inter-translation relationship, and the specific determination process is not repeated.
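The weighted variant can be sketched as follows: instead of adding 1 for each shared key, the weight of that key is added, so that the weight value of a pair is the sum of the weights of its shared keys. The function name `weighted_pair_scores`, the default weight, and the sample weights are assumptions made for illustration; the disclosure does not fix how the weights are derived.

```python
from collections import defaultdict
from itertools import product

def weighted_pair_scores(first_index, second_index, key_weights, default_weight=1.0):
    """Sum, for each text pair, the weights of the keys the two texts share."""
    scores = defaultdict(float)
    for key in first_index.keys() & second_index.keys():
        weight = key_weights.get(key, default_weight)
        for pair in product(first_index[key], second_index[key]):
            scores[pair] += weight
    return dict(scores)

# Illustrative indexes and weights: a rarer, more informative key weighs more.
first_index = {"inverted": {"a1"}, "today": {"a1", "a2"}}
second_index = {"inverted": {"b1"}, "today": {"b1", "b2"}}
key_weights = {"inverted": 3.0, "today": 0.5}

scores = weighted_pair_scores(first_index, second_index, key_weights)
```

With these weights, the pair ("a1", "b1") stands out clearly even though every pair shares the common key "today".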

So far, the basic implementation flow of the method for determining the inter-translated text of the present disclosure is described in detail with reference to fig. 1.

[ application example ]

FIG. 2 is a schematic flow chart diagram illustrating a web page text alignment method according to an embodiment of the present disclosure. In this embodiment, the inter-translated bilingual web pages are mainly identified by constructing a double inverted index, and the main flow is as follows:

1. First, two sets of web pages in two languages are input. As shown in FIG. 2, these are the web page texts in language A and the web page texts in language B, which can be obtained from one or more multilingual websites. ed1, ed2, zd1 and zd2 denote the IDs of web page texts (each text has a unique ID); the ID is equivalent to the text identifier in steps S120 and S130. ew1, ew2, ew3, ew4, zw1, zw2, zw3 and zw4 denote characters or words in the texts: those beginning with ew are characters or words in the language A web page texts, and those beginning with zw are characters or words in the language B web page texts.

2. After the web page texts in the two languages are obtained, the texts in one of the languages are selected for translation. For example, the web page texts in language B can be translated into language A (the reverse is of course also possible). The translation may take a number of forms, such as word-by-word translation using a dictionary or translation using a machine translation engine. Thus, all words in the language B web page texts become words of language A.

3. Then, an inverted index is constructed for each of the two groups of texts. The inverted index is an index with a character or word as the key and a text ID as the value. Such an index makes it possible to quickly find text IDs by character or word. Stop words and/or particularly high-frequency characters or words may be removed when constructing the inverted index.

4. The two inverted indexes (one built from the language A texts, the other from the language B texts translated into language A) are merged by key, so that the text IDs corresponding to the same key are combined. A plurality of text groups is thereby obtained, and the pairs of texts from different languages within the same group can be called candidate text pairs. As shown in FIG. 2, the text group obtained by merging the text IDs corresponding to ew1 is {ed1, ed3, zd1, zd3}, in which (ed1, zd1), (ed1, zd3), (ed3, zd1) and (ed3, zd3) are all candidate text pairs.

5. The number of occurrences of each candidate text pair is counted. For example, the web page texts from the two different languages in each text group form candidate text pairs, and all the text groups obtained in step 4 may be traversed. The number of text groups in which the two web page texts of a candidate text pair appear together is the number of occurrences of that pair, which equals the number of keys the two web page texts have in common. After the number of occurrences of each candidate text pair is obtained, the candidate text pairs may be ranked from high to low. Because the number of occurrences represents the number of co-occurring characters or words, the text pairs that are inter-translated texts may be determined from it. For example, for a given text, the pair with the most occurrences among the text pairs containing that text may be regarded as the pair of inter-translated texts.
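The five steps above can be traced end to end in a small sketch. FIG. 2 is not reproduced here, so the contents of the texts and the word-by-word dictionary `DICTIONARY` are assumptions, chosen only so that the text group for ew1 comes out as {ed1, ed3, zd1, zd3} as in the description; the helper names are likewise hypothetical.

```python
from collections import Counter, defaultdict
from itertools import product

# Hypothetical word-by-word dictionary from language B tokens to language A tokens.
DICTIONARY = {"zw1": "ew1", "zw2": "ew2", "zw3": "ew3", "zw4": "ew4"}

def translate(tokens):
    """Step 2: dictionary translation; unknown tokens are kept unchanged."""
    return [DICTIONARY.get(token, token) for token in tokens]

def build_index(texts):
    """Step 3: inverted index with the token as key and the text IDs as value."""
    index = defaultdict(set)
    for text_id, tokens in texts.items():
        for token in tokens:
            index[token].add(text_id)
    return index

# Step 1: assumed input texts, already segmented into tokens.
texts_a = {"ed1": ["ew1", "ew2"], "ed2": ["ew3", "ew4"], "ed3": ["ew1", "ew4"]}
texts_b = {"zd1": ["zw1", "zw2"], "zd2": ["zw3", "zw4"], "zd3": ["zw1", "zw3"]}

index_a = build_index(texts_a)
index_b = build_index({tid: translate(toks) for tid, toks in texts_b.items()})

# Step 4: merge by key; each shared key yields one text group.
groups = {key: (index_a[key], index_b[key])
          for key in index_a.keys() & index_b.keys()}

# Step 5: count candidate pairs over all groups and rank them from high to low.
pair_counts = Counter()
for a_ids, b_ids in groups.values():
    pair_counts.update(product(a_ids, b_ids))
ranked = pair_counts.most_common()
```

With this assumed input, (ed1, zd1) and (ed2, zd2) each occur twice and all other candidate pairs occur once, so the two true translation pairs rank at the top.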

The traditional bilingual web page alignment method is generally completed in two steps:

1. First, a method for calculating the degree of inter-translation of bilingual parallel web pages is designed. The degree of inter-translation of two web pages is generally calculated from three perspectives: 1) the URL similarity of the web pages, since the URLs of some inter-translated web pages are similar to a certain degree; 2) the structural similarity of the web pages, since the page structures of two inter-translated web pages are often similar; 3) the degree of inter-translation of the web page contents, since inter-translated web pages contain many mutually translated words and sentences.

2. The similarity of the two groups of web pages in different languages is calculated, and the inter-translated web page pairs are finally obtained. A fatal disadvantage of this type of method is that the amount of calculation is very large. In step 2, the computational complexity is O(n^2). For example, with n Chinese web pages and m English web pages, obtaining the inter-translated web page pairs requires calculating the degree of inter-translation between every Chinese web page and every English web page (using the method of step 1), for a total of n × m calculations. In practical applications this is very time-consuming, and when the number of web pages is large, results can hardly be obtained in a reasonable time.

The present disclosure greatly reduces the amount of computation required to identify inter-translated web pages by introducing an inverted index. Verification on data of 1 million web pages showed that the amount of computation of the present method can be reduced by a factor of more than 1000 compared with the traditional method.
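The source of the saving can be illustrated by counting operations on synthetic data. This is only a toy comparison under strong assumptions (100 texts per language, each sharing exactly three tokens with its translation and none with any other text); it does not reproduce the verification result stated above, and the two quantities are different kinds of operations (full pairwise comparisons versus simple counter updates).

```python
from collections import defaultdict

def build_index(texts):
    """Inverted index with the token as key and the set of text IDs as value."""
    index = defaultdict(set)
    for text_id, tokens in texts.items():
        for token in tokens:
            index[token].add(text_id)
    return index

n = m = 100
# Synthetic data: text a{i} and text b{i} share three tokens unique to the pair.
first = {f"a{i}": [f"t{i}_{j}" for j in range(3)] for i in range(n)}
second = {f"b{i}": [f"t{i}_{j}" for j in range(3)] for i in range(m)}
first_index, second_index = build_index(first), build_index(second)

# Traditional approach: one comparison for every cross-language pair.
brute_force = n * m

# Inverted-index approach: one counter update per pair within each text group.
index_updates = sum(len(first_index[key]) * len(second_index[key])
                    for key in first_index.keys() & second_index.keys())
```

Here the traditional approach needs 10000 pairwise comparisons, while the index-based approach performs 300 counter updates. On real data the saving depends on how many texts share each key, which is one reason for removing stop words and high-frequency words before indexing.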

As an example of the present disclosure, for each web page text, e.g., ed1, the N text pairs with the most occurrences may also be selected from the text pairs containing ed1, and the language B web page text that is the inter-translated text of ed1 may then be selected from the N text pairs using a conventional method of calculating inter-translated web page texts. That is, the present disclosure may also be combined with conventional methods: the present scheme serves as a coarse screening of candidate inter-translated document pairs, and a conventional method serves as a fine selection among the candidates. Therefore, the accuracy of the result can be ensured while the amount of calculation is reduced.

[ analysis method of similarity between texts ]

The present disclosure can also be implemented as an analysis scheme of similarity between texts. Fig. 3 is a schematic flow chart diagram illustrating an analysis method of inter-text similarity according to an embodiment of the present disclosure.

Referring to fig. 3, in step S310, a text set is acquired.

The text collection includes a plurality of texts including a plurality of words or phrases.

In step S320, an index is constructed by using each character or word in the text set as a key and using the text identifier of the text in which the character or word is located as a value.

The texts in the text set may be texts in the same language. A character in the present disclosure may refer to a basic unit of expression, such as a single Chinese character or a single word in English, and a word may refer to a combination of two or more characters, such as a word or phrase composed of several Chinese characters, or a phrase composed of several English words.

In step S330, text identifiers corresponding to the same key in the index are grouped into text pairs with each other, and the number of occurrences of each text pair is counted.

The index constructed in step S320 uses a character or word in a text as a key and the text identifier of that text as a value, so that two values corresponding to the same key can be quickly found and taken as a text pair. The number of occurrences of a text pair is the number of keys the two texts have in common, i.e., the number of identical characters or words they share, so it may characterize the degree of similarity between the two texts to a certain extent.

In step S340, based on the number of occurrences of the text pair, the similarity between the two texts in the text pair is determined.

The occurrence frequency of the text pair can represent the similarity between the two texts to a certain extent, so that the similarity between the two texts in the text pair can be determined based on the occurrence frequency of the text pair, wherein the similarity is positively correlated with the occurrence frequency, namely the texts are more similar as the occurrence frequency is larger.

Furthermore, a weight may be set for each key (i.e., character or word) in the index. For example, the weight may be set according to the number of occurrences of the character or word in the text, or according to the importance of its semantic content. Thus, in performing step S340, the similarity between the two texts in a text pair can be determined based on the number of occurrences of the text pair and the weight of the corresponding character or word at each occurrence. For example, the sum of the weights of the corresponding characters or words over all occurrences of a text pair may be calculated to obtain the weight value of the text pair, and the similarity between the two texts in the pair may be determined based on this weight value, with the similarity positively correlated with the weight value of the text pair.
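The single-set analysis of steps S310 to S340 can be sketched in its unweighted form as follows. Unlike the two-index case, pairs are formed among the text identifiers listed under the same key of one index. The function name `pair_similarity_counts` and the sample texts are hypothetical, and the raw occurrence count is used directly as the similarity score, which is just one monotonic choice.

```python
from collections import Counter, defaultdict
from itertools import combinations

def pair_similarity_counts(texts):
    """Count, for each pair of texts in one set, the keys they have in common."""
    index = defaultdict(set)
    for text_id, tokens in texts.items():
        for token in set(tokens):
            index[token].add(text_id)
    counts = Counter()
    for ids in index.values():
        # Sorting gives each unordered pair a single canonical form.
        counts.update(combinations(sorted(ids), 2))
    return counts

# Illustrative texts in the same language, already segmented into tokens.
texts = {
    "d1": ["inverted", "index", "text"],
    "d2": ["inverted", "index", "pair"],
    "d3": ["crawler", "text"],
}
counts = pair_similarity_counts(texts)
```

In this sample, d1 and d2 share two keys and d1 and d3 share one, so d1 is judged more similar to d2 than to d3; d2 and d3 share no key and never form a pair.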

[ MEANS FOR DETERMINING INTER-TRANSLATED TEXT ]

Fig. 4 is a schematic block diagram illustrating a structure of an apparatus for determining an inter-translated text according to an embodiment of the present disclosure. Wherein the functional blocks of the device can be implemented by hardware, software, or a combination of hardware and software implementing the principles of the present invention. It will be appreciated by those skilled in the art that the functional blocks described in fig. 4 may be combined or divided into sub-blocks to implement the principles of the invention described above. Thus, the description herein may support any possible combination, or division, or further definition of the functional modules described herein.

In the following, functional modules that the device can have and operations that each functional module can perform are briefly described, and for the details related thereto, reference may be made to the above description, and details are not described here again.

Referring to fig. 4, the apparatus 400 for determining an inter-translated text includes an obtaining module 410, a first constructing module 420, a second constructing module 430, a counting module 440, and a determining module 450.

As one example of the present disclosure, the obtaining module 410 is configured to obtain a first text set and a second text set, the first text set being written in a first language and translated into a second language, the second text set being written in the second language, the first text set and the second text set each including a plurality of texts, and the plurality of texts including a plurality of characters or words. The first constructing module 420 is configured to construct a first index by using each character or word in the first text set as a key and using the text identifier of the text in which the character or word is located as a value. The second constructing module 430 is configured to construct a second index by using each character or word in the second text set as a key and using the text identifier of the text in which the character or word is located as a value. The counting module 440 is configured to combine the text identifiers corresponding to the same key in the first index and the second index into text pairs, and to count the number of occurrences of each text pair. The texts corresponding to the two text identifiers in a text pair belong to different text sets, and the number of occurrences is the number of keys that the two texts in the pair have in common. The determining module 450 is configured to determine the texts belonging to the inter-translation relationship based on the number of occurrences of the text pairs.

Optionally, for a first text, the determining module 450 may take the second text in the most frequently occurring first text pair and determine it to be the inter-translated text of the first-language text corresponding to that first text. Here, a first text pair is a text pair that includes the first text, the first text is a text in the first text set, and the second text is a text in the second text set. Additionally or alternatively, for a second text, the determining module 450 may determine the first-language text corresponding to the first text in the most frequently occurring second text pair to be the inter-translated text of that second text, where a second text pair is a text pair that includes the second text.
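As a rough sketch of this selection step, the function below picks, for every text on one side, the partner from the pair with the highest occurrence count. The function name `best_partner`, the `side` parameter, and the sample counts are assumptions made for illustration; ties are resolved in favor of the pair seen first, which the disclosure does not specify.

```python
def best_partner(pair_counts, side=0):
    """Map each text on the given side to its partner in the most frequent pair.

    side=0 anchors on the first text of each pair, side=1 on the second text.
    """
    best = {}
    for pair, count in pair_counts.items():
        anchor, partner = pair[side], pair[1 - side]
        # Keep the partner with the strictly largest count seen so far.
        if anchor not in best or count > best[anchor][1]:
            best[anchor] = (partner, count)
    return {anchor: partner for anchor, (partner, _) in best.items()}


# Hypothetical occurrence counts keyed by (first_id, second_id).
pair_counts = {("a1", "b1"): 1, ("a1", "b2"): 3, ("a2", "b1"): 2}

by_first = best_partner(pair_counts, side=0)
by_second = best_partner(pair_counts, side=1)
print(by_first)   # {'a1': 'b2', 'a2': 'b1'}
print(by_second)  # {'b1': 'a2', 'b2': 'a1'}
```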

As shown in fig. 5, the determination module 450 may optionally include a candidate text set determination module 451, a first calculation module 453, and a selection module 455.

As an example, the candidate text set determining module 451 is configured to, for a first text, use the second texts in the first predetermined number of first text pairs with the highest occurrence counts as the candidate text set of the first text, where a first text pair is a text pair including the first text, the first text is a text in the first text set, and the second text is a text in the second text set. The first calculation module 453 is configured to calculate the similarity between each second text in the candidate text set and the first text. The selecting module 455 is configured to select the second text with the greatest similarity as the inter-translated text of the first-language text corresponding to the first text.

As another example, the candidate text set determining module 451 may also be configured to, for a second text, use the first texts in the second predetermined number of second text pairs with the highest occurrence counts as the candidate text set of the second text, where a second text pair is a text pair including the second text, the first text is a text in the first text set, and the second text is a text in the second text set. The first calculation module 453 may be configured to calculate the similarity between each first text in the candidate text set and the second text. The selecting module 455 may be configured to select the first text with the greatest similarity; the first-language text corresponding to that first text is then the inter-translated text of the second text.
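The two-stage selection for a first text (shortlist by occurrence count, then rank by similarity) might look like the sketch below. The similarity measure is not fixed in this passage, so Jaccard similarity over token sets is used purely as a stand-in assumption; `pick_translation`, the parameter `k` (the "first predetermined number"), and the sample data are likewise hypothetical.

```python
import heapq


def jaccard(tokens_a, tokens_b):
    """Jaccard similarity of two token lists (an assumed stand-in measure)."""
    a, b = set(tokens_a), set(tokens_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pick_translation(first_id, pair_counts, first_set, second_set, k=2):
    """Shortlist the k most frequent partners, then return the most similar one."""
    candidates = heapq.nlargest(
        k,
        ((count, sid) for (fid, sid), count in pair_counts.items() if fid == first_id),
    )
    if not candidates:
        return None
    return max(
        (sid for _, sid in candidates),
        key=lambda sid: jaccard(first_set[first_id], second_set[sid]),
    )


first_set = {"a1": ["open", "account", "today"]}
second_set = {
    "b1": ["open", "account", "today", "and", "get", "a", "free", "gift", "card"],
    "b2": ["open", "account"],
    "b3": ["today"],
}
# Occurrence counts as the counting step would produce them for this data.
pair_counts = {("a1", "b1"): 3, ("a1", "b2"): 2, ("a1", "b3"): 1}

chosen = pick_translation("a1", pair_counts, first_set, second_set, k=2)
print(chosen)  # b2: fewer shared keys than b1, but a higher Jaccard similarity
```

The similarity computation runs only on the `k` shortlisted candidates, so a more expensive measure can be used here without comparing every pair of texts.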

As shown in fig. 4, the apparatus 400 may optionally further include a text acquisition module 460, shown by a dashed box. The text acquisition module 460 is configured to acquire web page texts in different languages from a multi-language website. The texts in the first text set are the second-language translations of the first-language web page texts acquired from the multi-language website, and the texts in the second text set are the second-language web page texts acquired from the multi-language website.

As shown in fig. 4, the apparatus 400 may also optionally include a first removal module 470 and/or a second removal module 480 shown in dashed boxes. The first removal module 470 is used to remove stop words and/or high frequency words in the first text set, and the second removal module 480 is used to remove stop words and/or high frequency words in the second text set.
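One way the removal modules 470 and 480 could filter a text set is sketched below. The passage does not define "high frequency", so the sketch assumes a document-frequency threshold: a token is dropped if it appears in more than `max_doc_ratio` of the texts. The function name, the threshold parameter, and the sample stop-word list are all illustrative assumptions.

```python
from collections import Counter


def remove_common_tokens(texts, stop_words=frozenset(), max_doc_ratio=0.5):
    """Drop stop words and tokens that occur in too large a share of the texts."""
    # Document frequency: number of texts in which each token appears.
    doc_freq = Counter(tok for tokens in texts.values() for tok in set(tokens))
    limit = max_doc_ratio * len(texts)
    return {
        text_id: [
            tok for tok in tokens
            if tok not in stop_words and doc_freq[tok] <= limit
        ]
        for text_id, tokens in texts.items()
    }


texts = {
    "b1": ["the", "price", "list"],
    "b2": ["the", "contact", "page"],
    "b3": ["price", "of", "the", "page"],
}
cleaned = remove_common_tokens(texts, stop_words={"of"}, max_doc_ratio=0.7)
print(cleaned["b3"])  # ['price', 'page']: "of" is a stop word, "the" is in all 3 texts
```

Filtering before index construction keeps very common tokens from generating a large number of uninformative text pairs.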

As shown in fig. 4, the apparatus 400 may optionally further include a weight setting module 490, shown by a dashed box. The weight setting module 490 is configured to set a weight for each character or word. In this case, the determining module 450 may determine the texts belonging to the inter-translation relationship based on the number of occurrences of each text pair and the weight of the character or word that produced each occurrence.

As shown in fig. 6, as an example, the determination module 450 may include a second calculation module 457 and a determination submodule 459. The second calculation module 457 is configured to sum, for each text pair, the weights of the characters or words that produced each of its occurrences, thereby obtaining a weight value for the text pair. The determining submodule 459 is configured to determine the texts belonging to the inter-translation relationship based on the weight values of the text pairs.
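The weighted variant can be sketched by adding the weight of the shared key, rather than 1, each time a text pair occurs. How weights are assigned is not stated in this passage, so the `weights` dictionary, the `default` weight, and the function name `weighted_pair_scores` are assumptions for illustration only.

```python
from collections import defaultdict


def weighted_pair_scores(first_index, second_index, weights, default=1.0):
    """Sum, per text pair, the weights of the keys shared by the two texts."""
    scores = defaultdict(float)
    for key in first_index.keys() & second_index.keys():
        weight = weights.get(key, default)
        for first_id in first_index[key]:
            for second_id in second_index[key]:
                scores[(first_id, second_id)] += weight
    return dict(scores)


# Hypothetical indexes (token -> set of text identifiers) and weights.
first_index = {"account": {"a1"}, "page": {"a1"}}
second_index = {"account": {"b1"}, "page": {"b1", "b2"}}
weights = {"account": 2.0, "page": 0.5}

scores = weighted_pair_scores(first_index, second_index, weights)
print(scores[("a1", "b1")])  # 2.5 = 2.0 (account) + 0.5 (page)
print(scores[("a1", "b2")])  # 0.5 (page only)
```

With all weights equal to 1, the weight value of a pair reduces to its plain occurrence count.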

As another example of the present disclosure, for the first text set and the second text set acquired by the obtaining module 410, the first text set may be written in a first language and translated into a third language, and the second text set may be written in a second language and translated into the third language. The first text set includes a plurality of first texts, the second text set includes a plurality of second texts, and each of the first texts and second texts includes a plurality of characters or words. For the operations that the first constructing module 420, the second constructing module 430, the counting module 440, and the determining module 450 can perform, refer to the description above; details are not repeated here.

[ Analysis apparatus ]

Fig. 7 is a schematic block diagram showing the structure of an inter-text similarity analysis apparatus according to an embodiment of the present disclosure. The functional modules of the apparatus may be implemented in hardware, software, or a combination of hardware and software that carries out the principles of the present invention. Those skilled in the art will appreciate that the functional modules shown in fig. 7 may be combined or divided into sub-modules to implement those principles. The description herein therefore supports any possible combination, division, or further definition of the functional modules described.

Referring to fig. 7, the analysis apparatus 700 may include an acquisition module 710, a construction module 720, a statistics module 730, and a similarity determination module 740.

The acquisition module 710 is configured to obtain a text set, where the text set includes a plurality of texts and each text includes a plurality of characters or words. The construction module 720 is configured to construct an index by using each character or word in the text set as a key and the text identifier of the text in which that character or word is located as a value. The statistics module 730 is configured to combine the text identifiers that correspond to the same key in the index into text pairs, and to count the number of occurrences of each text pair. The similarity determination module 740 is configured to determine the similarity between the two texts in a text pair based on the number of occurrences of that text pair, where the similarity is positively correlated with the number of occurrences.
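For a single text set, the same idea can be sketched with one index, pairing the identifiers that share a key. The raw shared-key count is used here as the similarity score, which is only one possible choice that satisfies the positive correlation described; `pairwise_shared_keys` and the sample texts are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def pairwise_shared_keys(texts):
    """Count shared keys for every pair of texts that has at least one in common."""
    index = defaultdict(set)
    for text_id, tokens in texts.items():
        for token in tokens:
            index[token].add(text_id)

    counts = Counter()
    for text_ids in index.values():
        # Sorting gives each unordered pair a single canonical form.
        for pair in combinations(sorted(text_ids), 2):
            counts[pair] += 1
    return counts


texts = {
    "t1": ["red", "apple", "pie"],
    "t2": ["green", "apple", "pie"],
    "t3": ["red", "car"],
}
similarity = pairwise_shared_keys(texts)
print(similarity[("t1", "t2")])  # 2 shared keys: apple, pie
print(similarity[("t1", "t3")])  # 1 shared key: red
print(similarity[("t2", "t3")])  # 0, this pair is never generated
```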

[ Computing device ]

Fig. 8 is a schematic structural diagram of a computing device that can be used to implement the method for determining inter-translated text or the method for analyzing similarity between texts according to an embodiment of the present invention.

Referring to fig. 8, computing device 800 includes memory 810 and processor 820.

The processor 820 may be a multi-core processor or may include multiple processors. In some embodiments, the processor 820 may include a general-purpose host processor and one or more special-purpose coprocessors, such as a graphics processing unit (GPU) or a digital signal processor (DSP). In some embodiments, the processor 820 may be implemented using custom circuitry, such as an application-specific integrated circuit (ASIC) or a field-programmable gate array (FPGA).

The memory 810 may include various types of storage units, such as system memory, read-only memory (ROM), and persistent storage. The ROM may store static data or instructions required by the processor 820 or other modules of the computer. The persistent storage may be a readable and writable storage device, and may be non-volatile, so that stored instructions and data are not lost even after the computer is powered off. In some embodiments, a mass storage device (e.g., a magnetic disk, an optical disk, or flash memory) is used as the persistent storage. In other embodiments, the persistent storage may be a removable storage device (e.g., a floppy disk or an optical drive). The system memory may be a readable and writable storage device, such as a volatile dynamic random access memory, and may store some or all of the instructions and data that the processor requires at runtime. In addition, the memory 810 may include any combination of computer-readable storage media, including various types of semiconductor memory chips (DRAM, SRAM, SDRAM, flash memory, programmable read-only memory); magnetic and/or optical disks may also be employed. In some embodiments, the memory 810 may include a readable and/or writable removable storage device, such as a compact disc (CD), a read-only digital versatile disc (e.g., DVD-ROM, dual-layer DVD-ROM), a read-only Blu-ray disc, an ultra-density optical disc, a flash memory card (e.g., SD card, mini SD card, Micro-SD card), or a magnetic floppy disk. Computer-readable storage media do not include carrier waves or transitory electronic signals transmitted wirelessly or by wire.

The memory 810 has executable code stored thereon which, when executed by the processor 820, causes the processor 820 to perform the above-mentioned method for determining inter-translated text or the above-mentioned method for analyzing similarity between texts.

The method, apparatus and device for determining inter-translated text and similarity analysis between texts according to the present disclosure have been described in detail above with reference to the accompanying drawings.

Furthermore, the method according to the invention may also be implemented as a computer program or computer program product comprising computer program code instructions for carrying out the steps defined in the above-mentioned method of the invention.

Alternatively, the invention may also be embodied as a non-transitory machine-readable storage medium (or computer-readable storage medium, or machine-readable storage medium) having stored thereon executable code (or a computer program, or computer instruction code) which, when executed by a processor of an electronic device (or computing device, server, etc.), causes the processor to perform the steps of the above-described method according to the invention.

Those skilled in the art will further appreciate that the various illustrative logical blocks, modules, circuits, and algorithm steps described in connection with the disclosure herein may be implemented as electronic hardware, computer software, or combinations of both.

The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems and methods according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

Embodiments of the present invention have been described above. The foregoing description is exemplary, not exhaustive, and is not limited to the embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein was chosen to best explain the principles of the embodiments, their practical application, or their improvement over technologies found in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.
