Text information generation method and device, electronic equipment and computer readable medium

Document number: 1953443 | Publication date: 2021-12-10 | Views: 16 | Original language: Chinese

Reading note: This invention, Text information generation method and device, electronic equipment and computer readable medium, was designed and created by 赵靖 (Zhao Jing) on 2021-01-14. Abstract: The embodiments of the present disclosure disclose a text information generation method, apparatus, electronic device, and computer readable medium. One embodiment of the method comprises: generating a word vector sequence based on received text; encoding each word vector in the word vector sequence to generate a first word vector and a second word vector, obtaining a first word vector sequence and a second word vector sequence; concatenating each first word vector in the first word vector sequence with the second word vector corresponding to that first word vector, obtaining a concatenated word vector sequence; generating a first decoded text of the text based on the concatenated word vector sequence and the word vector sequence; and generating a second decoded text of the text based on a preset word list, the concatenated word vector sequence, and the word vector sequence. This embodiment improves the accuracy of the generated text keywords, enables users to interpret the text quickly, and improves the user's experience when browsing the text.

1. A text information generating method includes:

generating a word vector sequence based on the received text;

encoding each word vector in the word vector sequence to generate a first word vector and a second word vector, so as to obtain a first word vector sequence and a second word vector sequence;

concatenating each first word vector in the first word vector sequence with the second word vector corresponding to that first word vector, to obtain a concatenated word vector sequence;

generating a first decoded text of the text based on the concatenated word vector sequence and the word vector sequence;

and generating a second decoded text of the text based on a preset word list, the concatenated word vector sequence, and the word vector sequence.

2. The method of claim 1, wherein the generating a first decoded text of the text based on the concatenated word vector sequence and the word vector sequence comprises:

selecting a preset number of concatenated word vectors from the concatenated word vector sequence as a candidate concatenated word vector group;

for each candidate concatenated word vector in the candidate concatenated word vector group, performing the following processing steps:

determining the relevance between the candidate concatenated word vector and each word vector in the word vector sequence, to obtain a relevance group;

determining word vectors in the word vector sequence corresponding to relevances in the relevance group that meet a first preset condition as candidate word vectors, to obtain a candidate word vector group;

determining words in the text corresponding to each candidate word vector in the candidate word vector group as candidate words, to obtain a candidate word group;

and generating the first decoded text of the text based on the candidate word group sequence corresponding to the candidate concatenated word vector group.

3. The method according to claim 2, wherein the generating the first decoded text of the text based on the candidate word group sequence corresponding to the candidate concatenated word vector group comprises:

concatenating the candidate words in the candidate word group sequence that meet a second preset condition, to generate the first decoded text of the text.

4. The method of claim 2, wherein the generating a second decoded text of the text based on a preset word list, the concatenated word vector sequence, and the word vector sequence comprises:

determining the relevances corresponding to each word vector in the word vector sequence as a relevance sequence, to obtain a relevance sequence set;

determining the sum of the relevances included in each relevance sequence in the relevance sequence set as an association probability, to obtain an association probability group;

and generating the second decoded text of the text based on the association probability group, the preset word list, the concatenated word vector sequence, and the word vector sequence.

5. The method of claim 4, wherein the generating the second decoded text of the text based on the association probability group, the preset word list, the concatenated word vector sequence, and the word vector sequence comprises:

selecting concatenated word vectors whose association probabilities meet a third preset condition from the concatenated word vector sequence as a target concatenated word vector group;

for each target concatenated word vector in the target concatenated word vector group, performing the following steps:

determining a probability value between the target concatenated word vector and each preset word in the preset word list, to obtain a probability value group;

selecting words whose probability values meet a fourth preset condition from the preset word list as candidate words, to obtain a candidate word group;

determining the association probability of the word in the text corresponding to the target concatenated word vector as a candidate probability;

and determining a target word based on the candidate word group, the probability value group, and the candidate probability.

6. The method of claim 5, wherein the generating the second decoded text of the text based on the association probability group, the preset word list, the concatenated word vector sequence, and the word vector sequence further comprises:

concatenating the generated target words to generate the second decoded text of the text.

7. The method of claim 1, wherein the method further comprises:

displaying the first decoded text and the second decoded text.

8. A text information generating apparatus comprising:

a first generating unit configured to generate a word vector sequence based on the received text;

an encoding unit configured to encode each word vector in the word vector sequence to generate a first word vector and a second word vector, so as to obtain a first word vector sequence and a second word vector sequence;

a concatenation unit configured to concatenate each first word vector in the first word vector sequence with the second word vector corresponding to that first word vector, to obtain a concatenated word vector sequence;

a second generating unit configured to generate a first decoded text of the text based on the concatenated word vector sequence and the word vector sequence;

a third generating unit configured to generate a second decoded text of the text based on a preset word list, the concatenated word vector sequence, and the word vector sequence.

9. An electronic device, comprising:

one or more processors;

a storage device having one or more programs stored thereon;

wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-7.

10. A computer-readable medium, on which a computer program is stored, wherein the program, when executed by a processor, implements the method of any one of claims 1-7.

Technical Field

The embodiments of the present disclosure relate to the field of computer technology, and in particular to a text information generation method, a text information generation apparatus, an electronic device, and a computer readable medium.

Background

Text keywords help readers quickly grasp the core content of a text and are widely used in fields such as information retrieval, document management, and text compression. Currently, common text keyword extraction methods generally select the words that occur most frequently in the text as keywords.

However, this approach typically suffers from the following technical problem: keywords that do not appear in the text cannot be generated from it, so text keywords cannot be generated accurately; as a result, users cannot quickly interpret the text, and the user's experience when browsing the text is degraded.

Disclosure of Invention

This summary is provided to introduce a selection of concepts in a simplified form that are further described below in the detailed description. This summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.

Some embodiments of the present disclosure propose a text information generation method, apparatus, electronic device, and computer readable medium to solve one or more of the technical problems mentioned in the background section above.

In a first aspect, some embodiments of the present disclosure provide a text information generating method, including: generating a word vector sequence based on the received text; encoding each word vector in the word vector sequence to generate a first word vector and a second word vector, obtaining a first word vector sequence and a second word vector sequence; concatenating each first word vector in the first word vector sequence with the second word vector corresponding to that first word vector, obtaining a concatenated word vector sequence; generating a first decoded text of the text based on the concatenated word vector sequence and the word vector sequence; and generating a second decoded text of the text based on a preset word list, the concatenated word vector sequence, and the word vector sequence.

Optionally, generating a first decoded text of the text based on the concatenated word vector sequence and the word vector sequence includes: selecting a preset number of concatenated word vectors from the concatenated word vector sequence as a candidate concatenated word vector group; for each candidate concatenated word vector in the candidate concatenated word vector group, performing the following processing steps: determining the relevance between the candidate concatenated word vector and each word vector in the word vector sequence, to obtain a relevance group; determining word vectors in the word vector sequence corresponding to relevances in the relevance group that meet a first preset condition as candidate word vectors, to obtain a candidate word vector group; determining words in the text corresponding to each candidate word vector in the candidate word vector group as candidate words, to obtain a candidate word group; and generating the first decoded text of the text based on the candidate word group set corresponding to the candidate concatenated word vector group.

Optionally, the generating the first decoded text of the text based on the candidate word group set corresponding to the candidate concatenated word vector group includes: concatenating the candidate words in the candidate word group set that meet a second preset condition, to generate the first decoded text of the text.

Optionally, the generating a second decoded text of the text based on the preset word list, the concatenated word vector sequence, and the word vector sequence includes: determining the relevances corresponding to each word vector in the word vector sequence as a relevance sequence, to obtain a relevance sequence set; determining the sum of the relevances included in each relevance sequence in the relevance sequence set as an association probability, to obtain an association probability group; and generating the second decoded text of the text based on the association probability group, the preset word list, the concatenated word vector sequence, and the word vector sequence.

Optionally, the generating the second decoded text of the text based on the association probability group, the preset word list, the concatenated word vector sequence, and the word vector sequence includes: selecting concatenated word vectors whose association probabilities meet a third preset condition from the concatenated word vector sequence as a target concatenated word vector group; for each target concatenated word vector in the target concatenated word vector group, performing the following steps: determining a probability value between the target concatenated word vector and each preset word in the preset word list, to obtain a probability value group; selecting words whose probability values meet a fourth preset condition from the preset word list as candidate words, to obtain a candidate word group; determining the association probability of the word in the text corresponding to the target concatenated word vector as a candidate probability; and determining a target word based on the candidate word group, the probability value group, and the candidate probability.

Optionally, the generating the second decoded text of the text based on the association probability group, the preset word list, the concatenated word vector sequence, and the word vector sequence further includes: concatenating the generated target words to generate the second decoded text of the text.

Optionally, the method further includes: displaying the first decoded text and the second decoded text.

In a second aspect, some embodiments of the present disclosure provide a text information generating apparatus, the apparatus including: a first generating unit configured to generate a word vector sequence based on the received text; an encoding unit configured to encode each word vector in the word vector sequence to generate a first word vector and a second word vector, obtaining a first word vector sequence and a second word vector sequence; a concatenation unit configured to concatenate each first word vector in the first word vector sequence with the second word vector corresponding to that first word vector, obtaining a concatenated word vector sequence; a second generating unit configured to generate a first decoded text of the text based on the concatenated word vector sequence and the word vector sequence; and a third generating unit configured to generate a second decoded text of the text based on a preset word list, the concatenated word vector sequence, and the word vector sequence.

Optionally, the second generating unit is further configured to: select a preset number of concatenated word vectors from the concatenated word vector sequence as a candidate concatenated word vector group; and, for each candidate concatenated word vector in the candidate concatenated word vector group, perform the following processing steps: determining the relevance between the candidate concatenated word vector and each word vector in the word vector sequence, to obtain a relevance group; determining word vectors in the word vector sequence corresponding to relevances in the relevance group that meet a first preset condition as candidate word vectors, to obtain a candidate word vector group; determining words in the text corresponding to each candidate word vector in the candidate word vector group as candidate words, to obtain a candidate word group; and generating the first decoded text of the text based on the candidate word group sequence corresponding to the candidate concatenated word vector group.

Optionally, the second generating unit is further configured to: concatenate the candidate words in the candidate word group sequence that meet a second preset condition, to generate the first decoded text of the text.

Optionally, the third generating unit is further configured to: determine the relevances corresponding to each word vector in the word vector sequence as a relevance sequence, to obtain a relevance sequence set; determine the sum of the relevances included in each relevance sequence in the relevance sequence set as an association probability, to obtain an association probability group; and generate the second decoded text of the text based on the association probability group, the preset word list, the concatenated word vector sequence, and the word vector sequence.

Optionally, the third generating unit is further configured to: select concatenated word vectors whose association probabilities meet a third preset condition from the concatenated word vector sequence as a target concatenated word vector group; and, for each target concatenated word vector in the target concatenated word vector group, perform the following steps: determining a probability value between the target concatenated word vector and each preset word in the preset word list, to obtain a probability value group; selecting words whose probability values meet a fourth preset condition from the preset word list as candidate words, to obtain a candidate word group; determining the association probability of the word in the text corresponding to the target concatenated word vector as a candidate probability; and determining a target word based on the candidate word group, the probability value group, and the candidate probability.

Optionally, the third generating unit is further configured to: concatenate the generated target words to generate the second decoded text of the text.

Optionally, the apparatus further comprises: a display unit configured to display the first decoded text and the second decoded text.

In a third aspect, some embodiments of the present disclosure provide an electronic device, comprising: one or more processors; and a storage device having one or more programs stored thereon which, when executed by the one or more processors, cause the one or more processors to implement the method described in any of the implementations of the first aspect.

In a fourth aspect, some embodiments of the present disclosure provide a computer readable medium on which a computer program is stored, wherein the program, when executed by a processor, implements the method described in any of the implementations of the first aspect.

The above embodiments of the present disclosure have the following advantages: the text information generation method of some embodiments of the present disclosure improves the accuracy of the generated text keywords and thereby their practical usefulness. Specifically, the reason existing methods cannot accurately generate text keywords is that keywords which do not appear in the text cannot be generated from it, which limits the practical usefulness of the generated keywords. Based on this, the text information generation method of some embodiments of the present disclosure first generates a word vector sequence based on the received text, providing data support for the subsequent encoding process. Next, each word vector in the word vector sequence is encoded to generate a first word vector and a second word vector, yielding a first word vector sequence and a second word vector sequence, which in turn provide data support for generating the concatenated word vectors. Each first word vector in the first word vector sequence is then concatenated with the second word vector corresponding to that first word vector, yielding a concatenated word vector sequence and providing data support for generating the first and second decoded texts. A first decoded text of the text is then generated based on the concatenated word vector sequence and the word vector sequence. Finally, a second decoded text of the text is generated based on a preset word list, the concatenated word vector sequence, and the word vector sequence. Because the preset word list is used, keywords that do not appear in the text can still be generated. As a result, the accuracy of the generated text keywords is improved, users can interpret the text quickly, and the user's experience when browsing the text is improved.

Drawings

The above and other features, advantages, and aspects of various embodiments of the present disclosure will become more apparent from the following detailed description taken in conjunction with the accompanying drawings. Throughout the drawings, the same or similar reference numbers refer to the same or similar elements. It should be understood that the drawings are schematic and that elements are not necessarily drawn to scale.

Fig. 1 is a schematic diagram of an application scenario of a text information generation method of some embodiments of the present disclosure;

FIG. 2 is a flow diagram of some embodiments of a text information generation method according to the present disclosure;

FIG. 3 is a flow diagram of further embodiments of a text information generation method according to the present disclosure;

FIG. 4 is a flow diagram of still further embodiments of text information generation methods according to the present disclosure;

FIG. 5 is a schematic block diagram of some embodiments of a text information generating apparatus according to the present disclosure;

FIG. 6 is a schematic structural diagram of an electronic device suitable for use in implementing some embodiments of the present disclosure.

Detailed Description

Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the present disclosure are shown in the drawings, it is to be understood that the disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided for a more thorough and complete understanding of the present disclosure. It should be understood that the drawings and embodiments of the disclosure are for illustration purposes only and are not intended to limit the scope of the disclosure.

It should be noted that, for convenience of description, only the portions relevant to the invention are shown in the drawings. The embodiments in the present disclosure, and the features of those embodiments, may be combined with each other in the absence of conflict.

It should be noted that the terms "first", "second", and the like in the present disclosure are only used for distinguishing different devices, modules or units, and are not used for limiting the order or interdependence relationship of the functions performed by the devices, modules or units.

It should be noted that the modifiers "a", "an", and "the" in this disclosure are intended to be illustrative rather than limiting; those skilled in the art will understand that they should be read as "one or more" unless the context clearly indicates otherwise.

The names of messages or information exchanged between devices in the embodiments of the present disclosure are for illustrative purposes only, and are not intended to limit the scope of the messages or information.

The present disclosure will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.

Fig. 1 is a schematic diagram of an application scenario of a text information generation method according to some embodiments of the present disclosure.

In the application scenario of fig. 1, the computing device 101 may first generate a word vector sequence 103 based on the received text 102. For example, the text 102 may be a piece of news text, such as: "XX city chest hospital surveillance video was reviewed, verifying that two infected persons in the XX city outbreak left a closed ward for testing in a testing room while under isolation observation at the city chest hospital; because prevention-and-control disinfection in the testing room was not performed properly, the room was contaminated with the virus, infecting a hospitalized patient surnamed Li and an accompanying caregiver surnamed Niu who were examined in the same testing room the next morning; they then carried the virus into the tuberculosis ward, causing the outbreak to spread within the hospital". Next, the computing device 101 may encode each word vector in the word vector sequence 103 to generate a first word vector and a second word vector, obtaining a first word vector sequence 104 and a second word vector sequence 105. Here, the encoding process may consist of inputting each word vector of the word vector sequence into a bidirectional LSTM (Long Short-Term Memory network), that is, a forward LSTM and a backward LSTM: a first word vector in the first word vector sequence 104 is a word vector generated by the forward LSTM, and a second word vector in the second word vector sequence 105 is a word vector generated by the backward LSTM. Next, the computing device 101 may concatenate each first word vector in the first word vector sequence 104 with the second word vector corresponding to that first word vector, obtaining a concatenated word vector sequence 106.
The computing device 101 may then generate a first decoded text 107 of the text 102 based on the concatenated word vector sequence 106 and the word vector sequence 103. For example, the concatenated word vector sequence 106 and the word vector sequence 103 may be input into a unidirectional LSTM (Long Short-Term Memory network) to generate the first decoded text 107; the first decoded text 107 may be "XX city epidemic". Finally, the computing device 101 may generate a second decoded text 109 of the text 102 based on a preset word list 108, the concatenated word vector sequence 106, and the word vector sequence 103. Here, the concatenated word vector sequence 106 and the word vector sequence 103 may be input into a unidirectional LSTM storing the preset word list 108 to generate the second decoded text 109; the second decoded text 109 may be "management confusion". The preset word list 108 may be a word list preconfigured in the unidirectional LSTM.

The computing device 101 may be hardware or software. When it is hardware, it may be implemented as a distributed cluster of multiple servers or terminal devices, or as a single server or terminal device. When it is software, it may be installed in the hardware devices listed above and implemented as multiple pieces of software or software modules (for example, to provide distributed services) or as a single piece of software or software module. No specific limitation is made here.

It should be understood that the number of computing devices in FIG. 1 is merely illustrative. There may be any number of computing devices, as implementation needs dictate.

With continued reference to fig. 2, a flow 200 of some embodiments of a textual information generation method according to the present disclosure is shown. The text information generation method comprises the following steps:

Step 201, generating a word vector sequence based on the received text.

In some embodiments, the execution body of the text information generating method (for example, the computing device 101 shown in fig. 1) may first perform word segmentation on the received text to generate a word sequence, and then perform word embedding on each word in the word sequence to generate a word vector sequence. Here, the word segmentation may be Chinese word segmentation.
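The segmentation-then-embedding process above can be sketched as follows. This is a minimal illustration, not the patent's implementation: whitespace splitting stands in for Chinese word segmentation (which the source leaves unspecified), and random vectors stand in for trained embeddings.

```python
import numpy as np

# Hypothetical setup: embedding dimension and table are illustrative.
EMB_DIM = 4
rng = np.random.default_rng(0)
embedding_table = {}  # word -> vector, filled lazily for this sketch


def segment(text):
    # Placeholder segmentation: whitespace split stands in for a real
    # Chinese word segmenter.
    return text.split()


def embed(words):
    # Look up (or lazily create) a vector per word, yielding the
    # word vector sequence of step 201.
    vecs = []
    for w in words:
        if w not in embedding_table:
            embedding_table[w] = rng.standard_normal(EMB_DIM)
        vecs.append(embedding_table[w])
    return np.stack(vecs)


word_vector_sequence = embed(segment("epidemic spread in the hospital"))
print(word_vector_sequence.shape)  # (5, 4)
```

In a real system, the embedding table would be learned jointly with the encoder rather than sampled at random.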

Step 202, performing encoding processing on each word vector in the word vector sequence to generate a first word vector and a second word vector, and obtaining a first word vector sequence and a second word vector sequence.

In some embodiments, the execution body may encode each word vector in the word vector sequence to generate a first word vector and a second word vector, obtaining a first word vector sequence and a second word vector sequence. Here, the encoding process may consist of inputting each word vector of the word vector sequence into a bidirectional LSTM (Long Short-Term Memory network), that is, a forward LSTM and a backward LSTM: a first word vector in the first word vector sequence is a word vector generated by the forward LSTM, and a second word vector in the second word vector sequence is a word vector generated by the backward LSTM.
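The forward and backward passes of step 202 can be sketched as below. As a simplification, a plain tanh recurrence stands in for the LSTM cell named in the patent; the weights and sizes are illustrative, not trained values.

```python
import numpy as np

# Illustrative dimensions and random (untrained) weights.
rng = np.random.default_rng(1)
EMB, HID = 4, 3
W_in = rng.standard_normal((HID, EMB)) * 0.1
W_rec = rng.standard_normal((HID, HID)) * 0.1


def run_rnn(seq):
    # One recurrent pass over the sequence, returning a state per step.
    h = np.zeros(HID)
    outs = []
    for x in seq:
        h = np.tanh(W_in @ x + W_rec @ h)
        outs.append(h)
    return np.stack(outs)


word_vecs = rng.standard_normal((5, EMB))      # word vector sequence
first_seq = run_rnn(word_vecs)                 # forward pass -> first word vectors
second_seq = run_rnn(word_vecs[::-1])[::-1]    # backward pass -> second word vectors
print(first_seq.shape, second_seq.shape)  # (5, 3) (5, 3)
```

Running the same recurrence over the reversed input and re-reversing the output is exactly how the backward half of a bidirectional encoder is aligned position-by-position with the forward half.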

Step 203, concatenating each first word vector in the first word vector sequence with the second word vector corresponding to that first word vector, to obtain a concatenated word vector sequence.

In some embodiments, the execution body may concatenate each first word vector in the first word vector sequence with the second word vector corresponding to that first word vector, to obtain a concatenated word vector sequence.
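The concatenation of step 203 is a position-wise join of the forward and backward encoder states. A minimal sketch, with illustrative shapes:

```python
import numpy as np

# Stand-in encoder outputs; in the method these come from the
# forward and backward LSTM passes of step 202.
rng = np.random.default_rng(1)
first_word_vectors = rng.standard_normal((5, 3))   # forward states
second_word_vectors = rng.standard_normal((5, 3))  # backward states

# Position-wise concatenation -> concatenated word vector sequence.
concatenated = np.concatenate([first_word_vectors, second_word_vectors], axis=-1)
print(concatenated.shape)  # (5, 6)
```

Each concatenated vector thus carries both left-to-right and right-to-left context for its position.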

Step 204, generating a first decoded text of the text based on the concatenated word vector sequence and the word vector sequence.

In some embodiments, the execution body may first determine, using the Euclidean distance formula, the relevance between each concatenated word vector in the concatenated word vector sequence and each word vector in the word vector sequence, obtaining a relevance sequence. Relevances greater than or equal to a predetermined threshold may then be selected from the relevance sequence as candidate relevances, yielding a candidate relevance set. Words in the text corresponding to each candidate relevance in the candidate relevance set are then determined as candidate words, yielding a candidate word set. Finally, the candidate words in the candidate word set may be concatenated to generate the first decoded text of the text. The value of the predetermined threshold is not limited here.
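Step 204 can be sketched as follows, under stated assumptions: the patent names the Euclidean distance formula but not how distance maps to a relevance score, so relevance is taken here as 1 / (1 + distance); the projection matrix and threshold are illustrative, not from the source.

```python
import numpy as np

rng = np.random.default_rng(2)
words = ["epidemic", "spread", "in", "the", "hospital"]
word_vecs = rng.standard_normal((5, 4))    # word vector sequence
concat_vecs = rng.standard_normal((5, 8))  # concatenated word vector sequence
proj = rng.standard_normal((4, 8)) * 0.1   # maps concat space -> word space (assumption)
THRESHOLD = 0.25                           # illustrative "predetermined threshold"

candidate_words = []
relevance_groups = []
for cv in concat_vecs:
    q = proj @ cv
    dists = np.linalg.norm(word_vecs - q, axis=1)  # Euclidean distances
    relevance = 1.0 / (1.0 + dists)                # relevance group (assumed mapping)
    relevance_groups.append(relevance)
    for w, r in zip(words, relevance):
        if r >= THRESHOLD and w not in candidate_words:
            candidate_words.append(w)              # candidate words

# Concatenate candidate words into the first decoded text.
first_decoded_text = " ".join(candidate_words)
print(first_decoded_text)
```

With real encoder states and a tuned threshold, the surviving words would be those most strongly attended by the decoder states.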

Step 205, generating a second decoded text of the text based on a preset word list, the concatenated word vector sequence and the word vector sequence.

In some embodiments, the execution body may first input the concatenated word vector sequence into a unidirectional LSTM in which a preset word list is stored in advance, to generate a preset word score group. Here, a preset word score may refer to the degree of association between the word vector corresponding to a preset word and the concatenated word vector. Preset word scores greater than or equal to a predetermined score may then be selected from the preset word score group as candidate preset word scores, yielding a candidate preset word score group; the value of the predetermined score is not limited. Next, the execution body may determine the preset word corresponding to each candidate preset word score in the candidate preset word score group as a candidate preset word, obtaining a candidate preset word group. The relevances corresponding to each word vector in the word vector sequence may then be summed to generate a relevance sum, yielding a relevance sum group, and the words corresponding to relevance sums greater than or equal to a target threshold may be determined as candidate words, yielding a candidate word group. Finally, the execution body may concatenate each candidate preset word in the candidate preset word group with each candidate word in the candidate word group to generate the second decoded text of the text. The value of the target threshold is not limited here.
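The vocabulary-scoring half of step 205, which lets the method emit keywords absent from the input text, can be sketched as below. Assumptions, not from the source: the "preset word score" is modeled as a softmax over a linear projection onto a toy preset word list, and the vocabulary, weights, and cutoff are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)
preset_words = ["outbreak", "management", "confusion", "hospital"]  # toy preset word list
concat_vecs = rng.standard_normal((3, 6))    # target concatenated word vectors
W_vocab = rng.standard_normal((4, 6)) * 0.1  # projection onto preset words (assumption)


def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()


SCORE_CUTOFF = 0.25  # illustrative "predetermined score"
score_groups = [softmax(W_vocab @ cv) for cv in concat_vecs]  # preset word score groups

target_words = []
for scores in score_groups:
    for w, s in zip(preset_words, scores):
        if s >= SCORE_CUTOFF and w not in target_words:
            target_words.append(w)  # words the input text need not contain

# Concatenate the selected words into the second decoded text.
second_decoded_text = " ".join(target_words)
print(second_decoded_text)
```

Because each score group is a softmax over four words, its maximum is always at least 0.25, so at least one preset word survives the cutoff in this sketch.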

The above embodiments of the present disclosure have the following advantages: with the text information generation method of some embodiments of the present disclosure, the accuracy of generating text keywords is improved, and the practical applicability of the generated keywords is improved. Specifically, the reason that text keywords cannot be generated accurately is that keywords which do not appear in the text cannot be generated from the text, which reduces the practical applicability of the generated keywords. Based on this, the text information generation method of some embodiments of the present disclosure first generates a word vector sequence based on the received text. This provides data support for the subsequent encoding process. Then, each word vector in the word vector sequence is encoded to generate a first word vector and a second word vector, so as to obtain a first word vector sequence and a second word vector sequence. This provides data support for the subsequent generation of the spliced word vectors. Next, each first word vector in the first word vector sequence is spliced with the second word vector corresponding to the first word vector, so as to obtain a spliced word vector sequence. This provides data support for the subsequent generation of the first decoded text and the second decoded text. Then, the first decoded text of the text is generated based on the spliced word vector sequence and the word vector sequence. Finally, the second decoded text of the text is generated based on a preset word list, the spliced word vector sequence and the word vector sequence. In this way, keywords that do not appear in the text can be generated from the preset word list. Therefore, the accuracy of generating text keywords is improved, users can quickly interpret the text, and the user experience when browsing the text is improved.

With further reference to fig. 3, a flow diagram of further embodiments of a textual information generation method according to the present disclosure is shown. The text information generation method comprises the following steps:

step 301, generating a word vector sequence based on the received text.

Step 302, each word vector in the word vector sequence is encoded to generate a first word vector and a second word vector, and a first word vector sequence and a second word vector sequence are obtained.

Step 303, performing a splicing process on each first word vector in the first word vector sequence and the second word vector corresponding to the first word vector to obtain a spliced word vector sequence.

In some embodiments, for the specific implementation and technical effects of steps 301 to 303, reference may be made to steps 201 to 203 in the embodiments corresponding to fig. 2, which are not described herein again.
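Steps 302 and 303 can be sketched as follows, assuming (hypothetically) that the first and second word vector sequences are, for example, the forward and backward hidden states of a bidirectional encoder; the embodiment does not name a specific encoder:

```python
import numpy as np

def splice_sequences(first_seq, second_seq):
    """Step 303: pair each first word vector with its corresponding
    second word vector and concatenate them into one spliced vector."""
    return [np.concatenate([f, s]) for f, s in zip(first_seq, second_seq)]
```

Each spliced word vector thus carries both encodings of the same source position, which the later decoding steps score against the original word vector sequence.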

Step 304, selecting a predetermined number of spliced word vectors from the spliced word vector sequence as an alternative spliced word vector group.

In some embodiments, the execution body of the text information generating method (e.g., the computing device 101 shown in fig. 1) may select a predetermined number of spliced word vectors from the spliced word vector sequence as the alternative spliced word vector group. Here, the predetermined number may be a preset value, which is not limited.

Step 305, executing a processing step for each alternative spliced word vector in the alternative spliced word vector group.

In some embodiments, the executing body may execute, for each candidate concatenation word vector in the candidate concatenation word vector group, a processing step of:

First, determining the relevance degree between the alternative spliced word vector and each word vector in the word vector sequence, so as to obtain a relevance degree group. In practice, the relevance degree between the alternative spliced word vector and each word vector in the word vector sequence may be determined through the Euclidean distance, so as to obtain the relevance degree group.

Second, determining the word vectors in the word vector sequence whose relevance degrees in the relevance degree group satisfy a first predetermined condition as alternative word vectors, so as to obtain an alternative word vector group. Here, the first predetermined condition may be that "the relevance degree is greater than or equal to a predetermined threshold". Here, the setting of the predetermined threshold is not limited.

Third, determining the words in the text corresponding to each alternative word vector in the alternative word vector group as alternative words, so as to obtain an alternative word group.
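The processing step above can be sketched as follows; the `1 / (1 + distance)` relevance score is a hypothetical stand-in for the Euclidean-distance relevance degree, and the `(word, relevance)` pair representation is an assumption for illustration:

```python
import numpy as np

def candidate_phrases(cand_spliced_vecs, word_vecs, words, threshold):
    """Step 305: for each alternative spliced word vector, collect the
    source words whose relevance meets the first predetermined condition."""
    groups = []
    for cv in cand_spliced_vecs:
        # Relevance of this alternative spliced vector to every word vector.
        rels = [1.0 / (1.0 + np.linalg.norm(cv - wv)) for wv in word_vecs]
        # Keep (word, relevance) pairs satisfying the threshold condition.
        groups.append([(words[i], r) for i, r in enumerate(rels) if r >= threshold])
    return groups
```

Each inner list is one alternative word group; the sequence of groups feeds step 306.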

Step 306, generating a first decoded text of the text based on the alternative phrase sequence corresponding to the alternative spliced word vector group.

In some embodiments, the execution body may generate the first decoded text of the text based on the candidate phrase sequence corresponding to the candidate concatenated word vector group through various methods.

In some optional implementations of some embodiments, the execution body may splice the alternative words satisfying a second predetermined condition in the alternative phrase sequence to generate the first decoded text of the text. First, the execution body may determine the alternative word with the maximum relevance degree in each alternative phrase of the alternative phrase sequence as a spliced word, so as to obtain a spliced word sequence. Then, each spliced word in the spliced word sequence may be spliced to generate the first decoded text of the text. Here, the second predetermined condition may be that "the relevance degree of the alternative word is greater than or equal to that of any other alternative word in its alternative phrase".
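The optional implementation above, keeping only the highest-relevance word of each alternative phrase, can be sketched as follows (the `(word, relevance)` pair representation and space joining are assumptions for illustration):

```python
def first_text_from_groups(candidate_phrase_seq):
    """Step 306 (optional implementation): pick the alternative word with
    the maximum relevance in each phrase and splice the picks in order."""
    picked = [max(group, key=lambda pair: pair[1])[0]
              for group in candidate_phrase_seq if group]
    return " ".join(picked)
```

Empty phrases (where no word met the first predetermined condition) contribute nothing to the first decoded text.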

Step 307, determining each relevance corresponding to each word vector in the word vector sequence as a relevance sequence, and obtaining a relevance sequence set.

In some embodiments, the execution body may determine the relevance degrees corresponding to each word vector in the word vector sequence as a relevance degree sequence, so as to obtain a relevance degree sequence set.

Step 308, determining the sum of the relevance degrees included in each relevance degree sequence in the relevance degree sequence set as an association probability, so as to obtain an association probability group.

In some embodiments, the execution body may determine the sum of the relevance degrees included in each relevance degree sequence in the relevance degree sequence set as an association probability, so as to obtain an association probability group.
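Steps 307 and 308 reduce to a simple sum over each relevance degree sequence:

```python
def association_probabilities(relevance_sequences):
    """Step 308: the association probability of a word vector is the sum
    of the relevance degrees in its relevance degree sequence."""
    return [sum(seq) for seq in relevance_sequences]
```

A low association probability means the source word was rarely attended to, which is exactly the case the third predetermined condition in step 309 singles out for generation from the preset word list.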

Step 309, generating a second decoded text of the text based on the association probability group, the preset word list, the concatenated word vector sequence and the word vector sequence.

In some embodiments, the execution body may generate the second decoded text of the text based on the association probability set, the preset word list, the concatenated word vector sequence, and the word vector sequence by various methods.

In some optional implementations of some embodiments, the executing entity may generate the second decoded text of the text by:

First, selecting spliced word vectors whose association probabilities satisfy a third predetermined condition from the spliced word vector sequence as a target spliced word vector group. Here, the third predetermined condition may be that "the association probability is less than or equal to the target probability". Here, the setting of the target probability is not limited.

Second, for each target spliced word vector in the target spliced word vector group, executing the following substeps:

In a first substep, the probability value between the target spliced word vector and each preset word in the preset word list is determined, so as to obtain a probability value group. Here, a probability value may refer to the degree of association between the target spliced word vector and the word vector corresponding to a preset word in the preset word list. In practice, the probability values may be determined through the Euclidean distance formula.

In a second substep, words whose probability values satisfy a fourth predetermined condition are selected from the preset word list as alternative words, so as to obtain an alternative phrase. Here, the fourth predetermined condition may be that "the probability value corresponding to the word is greater than or equal to the probability value corresponding to any other word in the preset word list".

In a third substep, the association probability of the word in the text corresponding to the target spliced word vector is determined as a candidate probability.

In a fourth substep, the target word is determined based on the alternative phrase, the probability value group and the candidate probability. In practice, the probability with the largest value may be selected from the probability value group and the candidate probability, and the word corresponding to that probability may be determined as the target word.

In a fifth substep, the generated target words are spliced to generate the second decoded text of the text.
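The first four substeps amount to comparing the best preset-word probability value against the copy (candidate) probability of the source word. A hedged sketch follows, where `1 / (1 + distance)` is a hypothetical stand-in for the Euclidean-distance probability value:

```python
import numpy as np

def target_word(tgt_spliced_vec, vocab, vocab_vecs, source_word, candidate_prob):
    """Substeps 1-4: pick either the best preset word or the source word,
    whichever has the larger probability."""
    # Probability value of each preset word (hypothetical distance score).
    probs = [1.0 / (1.0 + np.linalg.norm(tgt_spliced_vec - v)) for v in vocab_vecs]
    best = int(np.argmax(probs))
    # Copy the source word only if its candidate probability wins.
    if candidate_prob > probs[best]:
        return source_word
    return vocab[best]
```

This is the pointer-style choice that lets the second decoded text contain keywords absent from the source text: whenever the preset-word probability dominates, a word list entry is generated instead of a copied word.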

As can be seen from fig. 3, compared with the description of some embodiments corresponding to fig. 2, the process 300 in some embodiments corresponding to fig. 3 embodies the selection of keywords that do not appear in the above text from the preset vocabulary by the decoding probability. Therefore, the accuracy of generating the text keywords is improved, the user can quickly read the text, and the experience of the user when browsing the text is improved.

With further reference to fig. 4, still further embodiments of textual information generation methods according to the present disclosure are illustrated. The text information generation method comprises the following steps:

Step 401, generating a word vector sequence based on the received text.

Step 402, each word vector in the word vector sequence is encoded to generate a first word vector and a second word vector, and a first word vector sequence and a second word vector sequence are obtained.

Step 403, performing a splicing process on each first word vector in the first word vector sequence and a second word vector corresponding to the first word vector to obtain a spliced word vector sequence.

Step 404, generating a first decoded text of the text based on the concatenated word vector sequence and the word vector sequence.

Step 405, generating a second decoded text of the text based on a preset word list, the concatenated word vector sequence and the word vector sequence.

In some embodiments, for the specific implementation and technical effects of steps 401 to 405, reference may be made to steps 201 to 205 in the embodiments corresponding to fig. 2, which are not described herein again.

Step 406, displaying the first decoded text and the second decoded text.

In some embodiments, the execution body may display the first decoded text and the second decoded text.

As can be seen from fig. 4, compared with the description of some embodiments corresponding to fig. 2, the process 400 in some embodiments corresponding to fig. 4 embodies displaying the first decoded text and the second decoded text for subsequent browsing.

With further reference to fig. 5, as an implementation of the methods shown in the above figures, the present disclosure provides some embodiments of a text information generating apparatus, which correspond to those shown in fig. 2, and which may be applied in various electronic devices in particular.

As shown in fig. 5, the text information generating apparatus 500 of some embodiments includes: a first generation unit 501, an encoding unit 502, a concatenation unit 503, a second generation unit 504, and a third generation unit 505. Wherein the first generating unit 501 is configured to generate a sequence of word vectors based on the received text; the encoding unit 502 is configured to perform encoding processing on each word vector in the word vector sequence to generate a first word vector and a second word vector, resulting in a first word vector sequence and a second word vector sequence; the splicing unit 503 is configured to splice each first word vector in the first word vector sequence with a second word vector corresponding to the first word vector to obtain a spliced word vector sequence; the second generating unit 504 is configured to generate a first decoded text of the text based on the concatenated word vector sequence and the word vector sequence; the third generating unit 505 is configured to generate a second decoded text of the text based on the preset word list, the concatenated word vector sequence and the word vector sequence.

Optionally, the second generating unit 504 is further configured to: selecting a predetermined number of spliced word vectors from the spliced word vector sequence as an alternative spliced word vector group; for each alternative concatenation word vector in the alternative concatenation word vector group, executing the following processing steps: determining the relevance of the alternative spliced word vector and each word vector in the word vector sequence to obtain a relevance group; taking the relevance meeting a first preset condition in the relevance group and the corresponding word vector in the word vector sequence as candidate word vectors to obtain a candidate word vector group; determining words in the text corresponding to each candidate word vector in the candidate word vector group as candidate words to obtain candidate word groups; and generating a first decoding text of the text based on the alternative phrase sequence corresponding to the alternative spliced word vector group.

Optionally, the second generating unit 504 is further configured to: and splicing all the alternative words meeting a second preset condition in the alternative phrase sequence to generate a first decoded text of the text.

Optionally, the third generating unit 505 is further configured to: determining each relevance corresponding to each word vector in the word vector sequence as a relevance sequence to obtain a relevance sequence set; determining the sum of each relevance degree included in each relevance degree sequence in the relevance degree sequence set as a relevance probability to obtain a relevance probability group; and generating a second decoded text of the text based on the association probability group, the preset word list, the spliced word vector sequence and the word vector sequence.

Optionally, the third generating unit 505 is further configured to: selecting spliced word vectors with the association probability meeting a third preset condition from the spliced word vector sequence as a target spliced word vector group; for each target concatenation word vector in the target concatenation word vector group, executing the following steps: determining the probability value of the target spliced word vector and each preset word in the preset word list to obtain a probability value group; selecting words with probability values meeting fourth preset conditions from the preset word list as alternative words to obtain alternative phrases; determining the association probability of the words in the text corresponding to the target spliced word vector as an alternative probability; and determining the target word based on the alternative phrases, the probability value group and the alternative probability.

Optionally, the third generating unit 505 is further configured to: and carrying out splicing processing on the generated target words to generate a second decoding text of the text.

Optionally, the apparatus 500 further comprises: a display unit configured to display the first decoded text and the second decoded text.

It will be understood that the elements described in the apparatus 500 correspond to various steps in the method described with reference to fig. 2. Thus, the operations, features and resulting advantages described above with respect to the method are also applicable to the apparatus 500 and the units included therein, and are not described herein again.

Referring now to fig. 6, a block diagram of an electronic device 600 (e.g., the computing device 101 of fig. 1) suitable for implementing some embodiments of the present disclosure is shown. The electronic device in some embodiments of the present disclosure may include, but is not limited to, mobile terminals such as a mobile phone, a notebook computer, a digital broadcast receiver, a PDA (personal digital assistant), a PAD (tablet computer), a PMP (portable multimedia player), and a vehicle-mounted terminal (e.g., a car navigation terminal), and stationary terminals such as a digital TV and a desktop computer. The electronic device shown in fig. 6 is only an example, and should not impose any limitation on the functions and the scope of use of the embodiments of the present disclosure.

As shown in fig. 6, the electronic device 600 may include a processing device (e.g., a central processing unit, a graphics processor, etc.) 601, which may perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 602 or a program loaded from a storage device 608 into a random access memory (RAM) 603. The RAM 603 also stores various programs and data necessary for the operation of the electronic device 600. The processing device 601, the ROM 602 and the RAM 603 are connected to each other via a bus 604. An input/output (I/O) interface 605 is also connected to the bus 604.

Generally, the following devices may be connected to the I/O interface 605: input devices 606 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; output devices 607 including, for example, a Liquid Crystal Display (LCD), a speaker, a vibrator, and the like; storage 608 including, for example, tape, hard disk, etc.; and a communication device 609. The communication means 609 may allow the electronic device 600 to communicate with other devices wirelessly or by wire to exchange data. While fig. 6 illustrates an electronic device 600 having various means, it is to be understood that not all illustrated means are required to be implemented or provided. More or fewer devices may alternatively be implemented or provided. Each block shown in fig. 6 may represent one device or may represent multiple devices as desired.

In particular, according to some embodiments of the present disclosure, the processes described above with reference to the flow diagrams may be implemented as computer software programs. For example, some embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In some such embodiments, the computer program may be downloaded and installed from a network through the communication device 609, or installed from the storage device 608, or installed from the ROM 602. The computer program, when executed by the processing device 601, performs the above-described functions defined in the methods of some embodiments of the present disclosure.

It should be noted that the computer readable medium described in some embodiments of the present disclosure may be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In some embodiments of the disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In some embodiments of the present disclosure, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. 
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, optical cables, RF (radio frequency), etc., or any suitable combination of the foregoing.

In some embodiments, the clients and servers may communicate using any currently known or future developed network protocol, such as HTTP (HyperText Transfer Protocol), and may interconnect with any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include a local area network ("LAN"), a wide area network ("WAN"), an internetwork (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks), as well as any currently known or future developed network.

The computer readable medium may be embodied in the electronic device; or may exist separately without being assembled into the electronic device. The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: generating a word vector sequence based on the received text; coding each word vector in the word vector sequence to generate a first word vector and a second word vector, and obtaining a first word vector sequence and a second word vector sequence; splicing each first word vector in the first word vector sequence and a second word vector corresponding to the first word vector to obtain a spliced word vector sequence; generating a first decoded text of the text based on the concatenated word vector sequence and the word vector sequence; and generating a second decoding text of the text based on a preset word list, the spliced word vector sequence and the word vector sequence.

Computer program code for carrying out operations for embodiments of the present disclosure may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java, Smalltalk and C++, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).

The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

The units described in some embodiments of the present disclosure may be implemented by software, and may also be implemented by hardware. The described units may also be provided in a processor, and may be described as: a processor includes a first generation unit, an encoding unit, a splicing unit, a second generation unit, and a third generation unit. For example, the concatenation unit may be further described as "a unit that concatenates each first word vector in the first word vector sequence with a second word vector corresponding to the first word vector to obtain a concatenated word vector sequence".

The functions described herein above may be performed, at least in part, by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include: field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), systems on a chip (SOCs), Complex Programmable Logic Devices (CPLDs), and the like.

The foregoing description is only of the preferred embodiments of the present disclosure and is illustrative of the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the invention in the embodiments of the present disclosure is not limited to technical solutions formed by the specific combination of the above technical features, and also covers other technical solutions formed by any combination of the above technical features or their equivalents without departing from the above inventive concept, for example, technical solutions formed by mutually replacing the above features with (but not limited to) technical features having similar functions disclosed in the embodiments of the present disclosure.
