Answer generation method and device, electronic equipment and storage medium

Document No.: 1846549  Publication date: 2021-11-16

Reading note: This technique, "Answer generation method and device, electronic equipment and storage medium", was designed and created by Peng Dejia and Wang Weikang on 2021-10-18. Its main content is as follows: This application relates to the technical field of natural language processing and discloses an answer generation method and apparatus, an electronic device, and a storage medium. The method includes: encoding a first text sequence to obtain an encoding vector of each word in the first text sequence; determining attention weights corresponding to the words in the first text sequence according to the encoding vectors of the words and the output hidden vector of a decoder network at the current time step; determining a first probability distribution according to the attention weights corresponding to the words in the first text sequence, the output hidden vector of the decoder network at the current time step, and the input vector of the decoder network at the current time step; and determining, according to the first probability and the second probability, the answer word corresponding to the current time step from a preset vocabulary and the first text sequence. The scheme can improve the accuracy of the generated answer text. The embodiments can be applied to various scenarios such as cloud technology, artificial intelligence, and intelligent transportation.

1. An answer generation method, comprising:

encoding a first text sequence to obtain an encoding vector of each word in the first text sequence, wherein the first text sequence indicates a question text;

determining an attention weight corresponding to each word in the first text sequence at a current time step according to the encoding vector of each word in the first text sequence and an output hidden vector of a decoder network at the current time step;

determining a first probability distribution according to the attention weight corresponding to each word in the first text sequence at the current time step, the output hidden vector of the decoder network at the current time step, and an input vector of the decoder network at the current time step, wherein the first probability distribution indicates a first probability that the answer word corresponding to the current time step comes from a preset vocabulary and a second probability that it comes from the first text sequence;

determining, according to the first probability and the second probability, the answer word corresponding to the current time step from the preset vocabulary and the first text sequence, wherein the answer word is used for determining an answer text corresponding to the question text.

2. The method of claim 1, wherein determining the first probability distribution according to the attention weight corresponding to each word in the first text sequence at the current time step, the output hidden vector of the decoder network at the current time step, and the input vector of the decoder network at the current time step comprises:

determining a context vector corresponding to the current time step according to the encoding vector of each word in the first text sequence and the attention weight corresponding to each word in the first text sequence at the current time step;

concatenating the context vector corresponding to the current time step, the output hidden vector of the decoder network at the current time step, and the input vector of the decoder network at the current time step to obtain a first concatenated vector;

transforming the first concatenated vector to obtain the first probability;

determining the second probability from the first probability, wherein the sum of the first probability and the second probability is 1.

3. The method of claim 2, wherein determining the context vector corresponding to the current time step according to the encoding vector of each word in the first text sequence and the attention weight corresponding to each word in the first text sequence at the current time step comprises:

taking the attention weight corresponding to each word in the first text sequence at the current time step as the weighting coefficient of the corresponding word, and weighting the encoding vectors of all words in the first text sequence to obtain the context vector corresponding to the current time step.

4. The method of claim 1, wherein before determining the attention weight corresponding to each word in the first text sequence at the current time step, the method further comprises:

acquiring an input vector of the decoder network at the current time step, wherein the input vector of the decoder network at the current time step comprises the output hidden vector of the decoder network at a previous time step, and the input vector of the decoder network at the first time step comprises a hidden vector corresponding to a start tag;

processing, by the decoder network, the input vector of the current time step, and outputting the output hidden vector of the current time step.

5. The method of claim 1, wherein determining the answer word corresponding to the current time step from the preset vocabulary and the first text sequence according to the first probability and the second probability comprises:

acquiring a second probability distribution, wherein the second probability distribution indicates, for each word in the preset vocabulary, a reference probability that the word is the answer word corresponding to the current time step;

weighting the second probability distribution and the attention weights corresponding to the words in the first text sequence at the current time step according to the first probability and the second probability to determine a target probability distribution, wherein the target probability distribution indicates, for each word in the preset vocabulary and the first text sequence, a target probability that the word is the answer word corresponding to the current time step;

screening the preset vocabulary and the first text sequence according to the target probabilities, and determining the answer word corresponding to the current time step.

6. The method of claim 5, wherein acquiring the second probability distribution comprises:

concatenating the context vector corresponding to the current time step with the output hidden vector of the decoder network at the current time step to obtain a second concatenated vector;

performing a linear transformation on the second concatenated vector, and performing probability prediction based on the result of the linear transformation to obtain the second probability distribution.

7. The method of claim 1, wherein determining the attention weight corresponding to each word in the first text sequence at the current time step according to the encoding vector of each word in the first text sequence and the output hidden vector of the decoder network at the current time step comprises:

concatenating the encoding vector of each word in the first text sequence with the output hidden vector of the decoder network at the current time step to obtain a third concatenated vector corresponding to each word in the first text sequence;

performing a linear transformation on each third concatenated vector to obtain an intermediate transformation vector corresponding to each word in the first text sequence;

applying an activation function to each intermediate transformation vector to obtain an initial attention weight of each word in the first text sequence with respect to the answer word corresponding to the current time step;

normalizing the initial attention weights to obtain the attention weight corresponding to each word in the first text sequence at the current time step.

8. The method of claim 1, wherein encoding the first text sequence to obtain the encoding vector of each word in the first text sequence comprises:

bidirectionally encoding the first text sequence through an encoder network to obtain the encoding vector of each word in the first text sequence.

9. The method of claim 1, wherein before encoding the first text sequence to obtain the encoding vector of each word in the first text sequence, the method further comprises:

concatenating the question text with the answer polarity corresponding to the question text to obtain the first text sequence.

10. The method of claim 1, wherein before encoding the first text sequence to obtain the encoding vector of each word in the first text sequence, the method further comprises:

concatenating the question text, the answer polarity corresponding to the question text, and the answer-basis text corresponding to the question text to obtain the first text sequence.

11. The method of claim 1, wherein after determining the answer word corresponding to the current time step from the preset vocabulary and the first text sequence according to the first probability and the second probability, the method further comprises:

associating the question text, the answer polarity corresponding to the question text, and the answer text corresponding to the question text to obtain a question-answer pair;

storing the question-answer pair in a question-answer database.

12. The method of claim 11, further comprising:

receiving a search request, the search request indicating a target question;

performing question matching in the question-answer database according to the target question, and determining a target question-answer pair whose question text matches the target question;

returning the answer text in the target question-answer pair to the initiator of the search request.

13. An answer generation apparatus, comprising:

an encoding processing module, configured to encode a first text sequence to obtain an encoding vector of each word in the first text sequence, wherein the first text sequence indicates a question text;

an attention weight determination module, configured to determine an attention weight corresponding to each word in the first text sequence at a current time step according to the encoding vector of each word in the first text sequence and an output hidden vector of a decoder network at the current time step;

a first probability distribution determination module, configured to determine a first probability distribution according to the attention weight corresponding to each word in the first text sequence at the current time step, the output hidden vector of the decoder network at the current time step, and an input vector of the decoder network at the current time step, wherein the first probability distribution indicates a first probability that the answer word corresponding to the current time step comes from a preset vocabulary and a second probability that it comes from the first text sequence;

an answer word determination module, configured to determine, according to the first probability and the second probability, the answer word corresponding to the current time step from the preset vocabulary and the first text sequence, wherein the answer word is used for determining an answer text corresponding to the question text.

14. An electronic device, comprising:

a processor;

a memory having computer-readable instructions stored thereon which, when executed by the processor, implement the method of any of claims 1-12.

15. A computer readable storage medium having computer readable instructions stored thereon which, when executed by a processor, implement the method of any one of claims 1-12.

16. A computer program product comprising computer instructions, characterized in that the computer instructions, when executed by a processor, implement the method of any of claims 1-12.

Technical Field

The present application relates to the field of natural language processing technologies, and in particular, to an answer generation method and apparatus, an electronic device, and a storage medium.

Background

With the development of artificial intelligence technology, the research content of natural language processing technology, which is one of the important research fields of artificial intelligence technology, is becoming more and more abundant, such as machine translation, automatic generation of summaries, automatic generation of answers, and so on.

In the related art, answer text for a question text is mainly generated based on rules: the question text input by a user is first segmented into words, a word is then selected according to an existing vocabulary, and the selected word is modified according to a set rule; for example, when the answer polarity is the reverse polarity, "can" is changed into "cannot" as the answer. In practice, this rule-based method of automatically generating answers suffers from low accuracy.

Disclosure of Invention

In view of the foregoing problems, embodiments of the present application provide an answer generation method, an apparatus, an electronic device, and a storage medium to address these problems.

According to an aspect of the embodiments of the present application, there is provided an answer generation method, including: encoding a first text sequence to obtain an encoding vector of each word in the first text sequence, where the first text sequence indicates a question text; determining an attention weight corresponding to each word in the first text sequence at the current time step according to the encoding vector of each word in the first text sequence and an output hidden vector of a decoder network at the current time step; determining a first probability distribution according to the attention weight corresponding to each word in the first text sequence at the current time step, the output hidden vector of the decoder network at the current time step, and the input vector of the decoder network at the current time step, where the first probability distribution indicates a first probability that the answer word corresponding to the current time step comes from a preset vocabulary and a second probability that it comes from the first text sequence; and determining, according to the first probability and the second probability, the answer word corresponding to the current time step from the preset vocabulary and the first text sequence, where the answer word is used for determining the answer text corresponding to the question text.

According to an aspect of the embodiments of the present application, there is provided an answer generation apparatus, including: an encoding processing module, configured to encode a first text sequence to obtain an encoding vector of each word in the first text sequence, where the first text sequence indicates a question text; an attention weight determination module, configured to determine an attention weight corresponding to each word in the first text sequence at the current time step according to the encoding vector of each word in the first text sequence and an output hidden vector of a decoder network at the current time step; a first probability distribution determination module, configured to determine a first probability distribution according to the attention weight corresponding to each word in the first text sequence at the current time step, the output hidden vector of the decoder network at the current time step, and the input vector of the decoder network at the current time step, where the first probability distribution indicates a first probability that the answer word corresponding to the current time step comes from a preset vocabulary and a second probability that it comes from the first text sequence; and an answer word determination module, configured to determine, according to the first probability and the second probability, the answer word corresponding to the current time step from the preset vocabulary and the first text sequence, where the answer word is used for determining the answer text corresponding to the question text.

In some embodiments of the present application, the first probability distribution determination module includes: a context vector determination unit, configured to determine a context vector corresponding to the current time step according to the encoding vector of each word in the first text sequence and the attention weight corresponding to each word in the first text sequence at the current time step; a first concatenation unit, configured to concatenate the context vector corresponding to the current time step, the output hidden vector of the decoder network at the current time step, and the input vector of the decoder network at the current time step to obtain a first concatenated vector; a first probability determination unit, configured to transform the first concatenated vector to obtain the first probability; and a second probability determination unit, configured to determine the second probability according to the first probability, where the sum of the first probability and the second probability is 1.

In some embodiments of the present application, the context vector determination unit is further configured to: take the attention weight corresponding to each word in the first text sequence at the current time step as the weighting coefficient of the corresponding word, and weight the encoding vectors of all words in the first text sequence to obtain the context vector corresponding to the current time step.

In some embodiments of the present application, the answer generation apparatus further includes: an input vector acquisition unit, configured to acquire the input vector of the decoder network at the current time step, where the input vector of the decoder network at the current time step includes the output hidden vector of the decoder network at the previous time step, and the input vector of the decoder network at the first time step includes the hidden vector corresponding to a start tag; and an output hidden vector output unit, configured to process, by the decoder network, the input vector of the current time step and output the output hidden vector of the current time step.

In some embodiments of the present application, the answer word determination module includes: a second probability distribution acquisition unit, configured to acquire a second probability distribution, where the second probability distribution indicates, for each word in the preset vocabulary, a reference probability that the word is the answer word corresponding to the current time step; a weighting processing unit, configured to weight the second probability distribution and the attention weights corresponding to the words in the first text sequence at the current time step according to the first probability and the second probability to determine a target probability distribution, where the target probability distribution indicates, for each word in the preset vocabulary and the first text sequence, a target probability that the word is the answer word corresponding to the current time step; and a screening unit, configured to screen the preset vocabulary and the first text sequence according to the target probabilities and determine the answer word corresponding to the current time step.

In some embodiments of the present application, the second probability distribution acquisition unit includes: a second concatenation unit, configured to concatenate the context vector corresponding to the current time step with the output hidden vector of the decoder network at the current time step to obtain a second concatenated vector; and a first linear transformation unit, configured to perform a linear transformation on the second concatenated vector and perform probability prediction based on the result of the linear transformation to obtain the second probability distribution.

In some embodiments of the present application, the attention weight determination module includes: a third concatenation unit, configured to concatenate the encoding vector of each word in the first text sequence with the output hidden vector of the decoder network at the current time step to obtain a third concatenated vector corresponding to each word in the first text sequence; a second linear transformation unit, configured to perform a linear transformation on each third concatenated vector to obtain an intermediate transformation vector corresponding to each word in the first text sequence; an activation processing unit, configured to apply an activation function to each intermediate transformation vector to obtain an initial attention weight of each word in the first text sequence with respect to the answer word corresponding to the current time step; and a normalization processing unit, configured to normalize the initial attention weights to obtain the attention weight corresponding to each word in the first text sequence at the current time step.

In some embodiments of the present application, the encoding processing module is further configured to: bidirectionally encode the first text sequence through an encoder network to obtain the encoding vector of each word in the first text sequence.

In some embodiments of the present application, the answer generation apparatus further includes: a first text concatenation module, configured to concatenate the question text with the answer polarity corresponding to the question text to obtain the first text sequence.

In other embodiments of the present application, the answer generation apparatus further includes: a second text concatenation module, configured to concatenate the question text, the answer polarity corresponding to the question text, and the answer-basis text corresponding to the question text to obtain the first text sequence.

In some embodiments of the present application, the answer generation apparatus further includes: an association module, configured to associate the question text, the answer polarity corresponding to the question text, and the answer text corresponding to the question text to obtain a question-answer pair; and a storage module, configured to store the question-answer pair in a question-answer database.

In some embodiments of the present application, the answer generation apparatus further includes: a search request receiving module, configured to receive a search request, the search request indicating a target question; a question matching module, configured to perform question matching in the question-answer database according to the target question and determine a target question-answer pair whose question text matches the target question; and a returning module, configured to return the answer text in the target question-answer pair to the initiator of the search request.

According to an aspect of an embodiment of the present application, there is provided an electronic device including: a processor; a memory having computer readable instructions stored thereon which, when executed by the processor, implement the answer generation method as described above.

According to an aspect of embodiments of the present application, there is provided a computer-readable storage medium having stored thereon computer-readable instructions which, when executed by a processor, implement the answer generation method as described above.

According to an aspect of embodiments of the present application, there is provided a computer program product comprising computer instructions which, when executed by a processor, implement the answer generation method as described above.

In this scheme, a first probability that the answer word corresponding to the current time step comes from the preset vocabulary and a second probability that it comes from the first text sequence are determined according to the attention weights of the words in the first text sequence with respect to that answer word, the output hidden vector of the decoder network at the current time step, and the input vector of the decoder network at the current time step. The answer word corresponding to the current time step is determined from the preset vocabulary and the first text sequence based on the first probability and the second probability, and the answer text corresponding to the question text indicated by the first text sequence is then determined based on the answer words. The inventors recognized that the answer to a question text is highly likely to contain words from the question text itself; generating the answer text from answer words determined by the first and second probabilities therefore strengthens the association between the answer text and the question text and effectively improves the accuracy of the generated answer text. In addition, determining answer words from both the preset vocabulary and the first text sequence expands the range of candidate answer words; in particular, when the question text contains out-of-vocabulary words, words can be copied from the question text as answer words, which effectively alleviates the out-of-vocabulary problem and the resulting loss of answer-text accuracy.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present application and together with the description, serve to explain the principles of the application. It is obvious that the drawings in the following description are only some embodiments of the application, and that for a person skilled in the art, other drawings can be derived from them without inventive effort.

Fig. 1 is a schematic diagram illustrating an application scenario of the present solution according to an embodiment of the present application.

Fig. 2 is a schematic diagram illustrating an application scenario according to further embodiments of the present application.

Fig. 3 is a flow diagram illustrating an answer generation method according to one embodiment of the present application.

FIG. 4 is a flowchart illustrating step 330 according to an embodiment of the present application.

FIG. 5 is a flowchart illustrating step 340 according to an embodiment of the present application.

FIG. 6 is a flowchart illustrating step 320 according to an embodiment of the present application.

Fig. 7 is a flowchart illustrating an answer generation method according to an embodiment of the application.

Fig. 8 is a schematic diagram illustrating the construction of a question-and-answer database according to an embodiment of the present application.

Fig. 9 is a flowchart illustrating an answer generation method according to another embodiment of the present application.

Fig. 10 is a block diagram illustrating an answer generating apparatus according to an embodiment of the present application.

FIG. 11 illustrates a schematic structural diagram of a computer system suitable for use in implementing the electronic device of an embodiment of the present application.

Detailed Description

Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments may, however, be embodied in many different forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of example embodiments to those skilled in the art.

Furthermore, the described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a thorough understanding of embodiments of the application. One skilled in the relevant art will recognize, however, that the subject matter of the present application can be practiced without one or more of the specific details, or with other methods, components, devices, steps, and so forth. In other instances, well-known methods, devices, implementations, or operations have not been shown or described in detail to avoid obscuring aspects of the application.

The block diagrams shown in the figures are functional entities only and do not necessarily correspond to physically separate entities. I.e. these functional entities may be implemented in the form of software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor means and/or microcontroller means.

The flow charts shown in the drawings are merely illustrative and do not necessarily include all of the contents and operations/steps, nor do they necessarily have to be performed in the order described. For example, some operations/steps may be decomposed, and some operations/steps may be combined or partially combined, so that the actual execution sequence may be changed according to the actual situation.

It should be noted that "a plurality" herein means two or more. "And/or" describes an association relationship between associated objects and means that three relationships may exist; for example, A and/or B may mean: A exists alone, A and B exist simultaneously, or B exists alone. The character "/" generally indicates that the former and latter associated objects are in an "or" relationship.

Before the specific explanation, terms referred to in the present application are explained as follows:

Sequence-to-sequence network (Seq2Seq), also called an end-to-end neural network: the encoder network encodes an input sequence into a vector representation, and the decoder network decodes that vector representation to obtain an output sequence. In natural language processing (NLP), the input sequence to the encoder network may be a text sequence, and the output sequence of the decoder network may also be a text sequence. The decoder network determines the words of the output sequence one by one based on the output of the encoder network; the process of determining one word by the decoder network is called a time step.

Long short-term memory network (LSTM): a variant of the recurrent neural network (RNN). It solves the traditional RNN's inability to handle long-distance dependencies (an RNN has only a hidden state and therefore becomes very sensitive to short-term input). An LSTM captures long-distance dependencies better because, through training, it learns which information to remember and which to forget. The hidden representation (hidden vector) of each time step can be captured by the LSTM.

Bidirectional long short-term memory network (BiLSTM): a combination of a forward LSTM and a backward LSTM. Using a BiLSTM, a text sequence can be encoded bidirectionally (i.e., encoding information from front to back and from back to front) to better capture bidirectional semantic dependencies in the text sequence.
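
For concreteness, such a bidirectional network can be instantiated in a few lines; the following is a sketch only, using PyTorch with arbitrary assumed dimensions, not the application's implementation:

```python
import torch
import torch.nn as nn

bilstm = nn.LSTM(input_size=128, hidden_size=256,
                 batch_first=True, bidirectional=True)
x = torch.randn(1, 10, 128)   # a sequence of 10 word embeddings
out, _ = bilstm(x)            # out: (1, 10, 512)
# Each position concatenates the forward and backward hidden states,
# so both left and right context contribute to every word's representation.
```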

Yes/no question: also called a judgment-type question; generally a factual question that can be answered with a short answer such as "yes" or "no". For example, the question "does the earth revolve around the sun" is a yes/no question.

Answer polarity: indicates whether the answer is positive (forward) or negative (reverse). The answer polarity therefore includes a forward polarity indicating that the answer is affirmative and a reverse polarity indicating that the answer is negative; further, the answer polarity may also include an uncertain polarity indicating that the polarity of the answer is uncertain. For example, the answer to the question text "does the earth revolve around the sun" is "the earth revolves around the sun", which is affirmative, i.e., the answer is positive; the answer polarity is therefore the forward polarity.

Out-of-vocabulary (OOV) words: words that are not included in the vocabulary but must still be segmented out, including various proper nouns (names of people, places, enterprises, and the like), abbreviations, newly coined words, and so on.

Copy mechanism: in this scheme, the OOV problem is alleviated by assigning probability to the words in the question text and copying words from the question text into the answer text as answer words.
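
This is essentially the pointer-generator formulation. Writing $p_{\mathrm{gen}}$ for the first probability (generate from the preset vocabulary), $P_{\mathrm{vocab}}$ for the second probability distribution over the preset vocabulary, and $a_i^t$ for the attention weight of the $i$-th source word $x_i$ at time step $t$ (symbols chosen here for exposition, not taken from the source), the target probability of a word $w$ is

$$P(w) = p_{\mathrm{gen}}\, P_{\mathrm{vocab}}(w) + \left(1 - p_{\mathrm{gen}}\right) \sum_{i\,:\,x_i = w} a_i^t,$$

so a word absent from the preset vocabulary can still receive probability mass through the attention term, which is what alleviates the OOV problem.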

Fig. 1 is a schematic diagram of an application scenario of the present solution according to an embodiment of the present application. As shown in Fig. 1, the application scenario includes a terminal 110 and a server 120, and the terminal 110 may be communicatively connected to the server 120 through a wired or wireless network. The terminal 110 may be a smart phone, a tablet computer, a notebook computer, a desktop computer, a smart speaker, a vehicle-mounted terminal, or another electronic device that can interact with a user, and is not specifically limited here.

The server 120 may be an independent physical server, a server cluster or a distributed system formed by a plurality of physical servers, or a cloud server providing basic cloud computing services such as a cloud service, a cloud database, cloud computing, a cloud function, cloud storage, a Network service, cloud communication, a middleware service, a domain name service, a security service, a CDN (Content Delivery Network), a big data and artificial intelligence platform, and the like.

The terminal 110 may run a first application, which may be a knowledge question-answering program, a search application, an application integrating a search engine, or another application that can send question text to a server. The user may send question text to the server 120 through the first application; in this embodiment, the terminal 110 also sends the answer polarity corresponding to the question text to the server. For example, in Fig. 1, the terminal 110 may send the question text A1 and its corresponding answer polarity B1, and the question text A2 and its corresponding answer polarity B2, displayed on the first user interface 111, to the server 120, and the server 120 automatically generates the answer text corresponding to each question text according to the method of the present application. In other embodiments, the terminal 110 may send only the question text to the server 120, and the server automatically generates the answer text from the question text.

The server 120 processes the question text and the answer polarity corresponding to the question text using natural language processing technology to generate the answer text corresponding to the question text. Natural language processing is an important direction in the fields of computer science and artificial intelligence. It studies theories and methods that enable effective communication between humans and computers in natural language. It is a science integrating linguistics, computer science, and mathematics; research in this field involves natural language, i.e., the language people use every day, and is therefore closely related to linguistics. Natural language processing techniques typically include text processing, semantic understanding, machine translation, robot question answering, knowledge graphs, and the like.

In some embodiments, the answer polarity corresponding to the question text may be user-labeled, for example, the user selects the answer polarity corresponding to the question text after inputting the question text. In some embodiments, the first application provides answer polarity selection options, which may include a first selection option indicating that the answer polarity is positive and a second selection option indicating that the answer polarity is negative. In other embodiments, the selection options may include a third selection option indicating an indeterminate polarity in addition to the two listed above.

In some embodiments, the answer polarity corresponding to the question text may also be pre-labeled, that is, the question text and the labeled answer polarity are stored in the question text library in advance, so that the user may select the question text from the question text library and send the selected question text and the answer polarity corresponding to the selected question text to the server 120.

In some embodiments, the question text library may also be provided in the server 120, so that the server 120 may extract the question text from the question text library, and generate the answer text corresponding to the question text according to the method of the present application according to the answer polarity corresponding to the question text and the question text.

In some embodiments, the server 120 may further construct a question-answer database 121 based on the question text, its corresponding answer polarity, and the generated answer text. Specifically, the question text, the answer polarity corresponding to the question text, and the corresponding answer text are associated and stored as a question-answer pair, and the question-answer pair is added to the question-answer database 121. On this basis, the server 120 may provide the terminal 110 with a question-answering service based on the constructed question-answer database 121.

In some embodiments, in order to ensure the accuracy of the answer texts in the question-answer pairs in the database 121, after the server 120 generates the answer text corresponding to a question text, the question-answer pair consisting of the question text, the corresponding answer polarity, and the corresponding answer text is sent to the terminal 110 and displayed there, so that the user can check whether the answer text is accurate; if it is not, the user can modify the answer text in the question-answer pair to obtain a modified question-answer pair. The terminal 110 then sends the modified question-answer pair to the server 120, and the server 120 stores it in the question-answer database.

In some embodiments, the terminal 110 may send a search request to the server 120, the search request indicating a target question. For example, in Fig. 1, the user may input a target question I in the second user interface 112 of the terminal 110 and then send a search request to the server 120 based on the input target question I. After receiving the search request, the server 120 semantically matches the target question indicated by the search request against the question texts of the question-answer pairs in the question-answer database, determines a target question-answer pair whose question text semantically matches the target question, and then returns the answer text in the target question-answer pair to the terminal 110 as the search result of the search request. For example, in Fig. 1, when the target question I is input in the second user interface 112, the server 120, after completing the question-text matching, returns the answer text for the target question I to the terminal 110.

Fig. 2 is a schematic diagram illustrating an application scenario according to further embodiments of the present application. As shown in fig. 2, the application scenario includes an answering robot 130 and a server 120, the answering robot 130 is in communication connection with the server 120 through a wired or wireless network, and the answering robot 130 may be located in a shopping mall, an amusement park, a hotel, a park, a bank, a hospital, and the like to provide a timely answering service.

In some embodiments, the user may interact with the question-answering robot 130 by voice; for example, the user asks a question by voice, and upon receiving the voice signal, the robot may perform speech-to-text recognition to obtain the question text. Of course, the user may also input text on the user interaction interface of the question-answering robot 130 to provide the question as text.

Based on the question-answer database constructed in the server 120, after obtaining the user's question, the question-answering robot 130 sends the question to the server 120; the server 120 performs question matching in the question-answer database, determines a target question-answer pair whose question text semantically matches the question, extracts the target answer text in the target question-answer pair, and returns the target answer text to the question-answering robot 130, which replies to the user with the target answer text as the answer.

In some embodiments, the question-answering robot 130 may display the target answer text on the display interface and may also broadcast the target answer text by voice.

In some embodiments, when the processing capability of the question-answering robot 130 is sufficient, the question-answer database may be synchronized to the question-answering robot 130, so that after acquiring the user's question, the question-answering robot 130 itself performs semantic matching in the question-answer database, determines a target question-answer pair whose question text semantically matches the user's question, extracts the target answer text in the target question-answer pair, and replies to the user with the target answer text as the answer.

The scheme provided by the embodiments of the present application relates to artificial intelligence natural language processing technology and is elaborated in the following embodiments:

Fig. 3 is a flowchart illustrating an answer generation method according to an embodiment of the present application. The method may be executed by a computer device with processing capability, such as a server or a terminal device, which is not specifically limited here. Referring to Fig. 3, the method includes at least steps 310 to 340, described in detail as follows:

Step 310: encode a first text sequence to obtain an encoding vector of each word in the first text sequence, where the first text sequence indicates a question text.

In some embodiments, the first text sequence may be the sequence of the words in the question text, arranged in their order in the question text.

In a reading-comprehension scenario, a question represented by a question text is posed based on the content of a passage, and the answer to the question also comes from that passage; the text in the passage that reflects the answer to the question text can be regarded as the answer-basis text corresponding to the question text. That is, the answer-basis text corresponding to a question text refers to the text content on which the answer to the question text is based.

In some embodiments, in order to provide more information for answer generation, the answer-basis text corresponding to the question text can also be used as a data basis for generating the answer. In this case, the first text sequence may be obtained by concatenating the question text with the answer-basis text corresponding to the question text. In an embodiment, to facilitate distinguishing the question text from the answer-basis text, the first text sequence may further include a separator between them.

In some embodiments, in order to make the generated answer to the question text interpretable, the answer polarity corresponding to the question text may also be used as a data basis for generating the answer. In some embodiments, before step 310, the method further comprises: concatenating the question text with the answer polarity corresponding to the question text to obtain the first text sequence. Further, to distinguish the words of the question text from the word indicating the answer polarity, a separator may be added between the question text and the answer polarity, so that the first text sequence also includes this separator.

For example, let the question text be Q = (q_1, q_2, …, q_M), where q_i denotes the i-th word in the question text and M is the total number of words in the question text, and let the answer polarity corresponding to the question text be p. The first text sequence may then be X = (q_1, q_2, …, q_M, [SEP], p), where [SEP] is a separator.

The answer polarity corresponding to the question text indicates the polarity of the answer corresponding to the question text. In some embodiments, the polarity of the answer may include a forward polarity indicating that the answer is affirmative and a reverse polarity indicating that the answer is negative; in other embodiments, in addition to the forward and reverse polarities, the polarity of the answer may include an uncertain polarity indicating that the polarity of the answer to the question text is uncertain.

For example, if the question text is "the earth revolves around the sun", the answer corresponding to the question text is affirmative, i.e., "yes", so the answer polarity corresponding to the question text is the forward polarity. If the question text is "June has 31 days in total", the answer corresponding to the question text is negative, i.e., "no", so the answer polarity is the reverse polarity. For "February has a 29th day", the answer differs across years (in some cases affirmative, in others negative), so the answer polarity is the uncertain polarity.

In some embodiments, the answer polarity corresponding to the question text may be obtained by user labeling of the question text, that is, a user who knows the answer to the question text labels the answer polarity according to that answer, thereby obtaining the answer polarity corresponding to the question text.

In some embodiments, the answer generation task in this scheme may be a downstream task of machine reading comprehension (MRC): an MRC model may extract the answer-basis text corresponding to the question text from the provided reading text and generate, for the question text, a polarity label indicating the polarity of the corresponding answer. It is to be understood that the extracted answer-basis text indicates the answer to the question text; the machine reading model can therefore generate the polarity label according to the extracted answer-basis text and the question text. The answer-basis text and polarity label produced by the machine reading model for the question text can then be used in this scheme as a data basis for generating the answer text. In some embodiments, the machine reading model may be a BERT-for-MRC model.

In some embodiments, the question text, the answer polarity corresponding to the question text, and the answer-basis text corresponding to the question text may together be used as the data basis for generating the answer. In some embodiments, before step 310, the method further comprises: concatenating the question text, the answer polarity corresponding to the question text, and the answer-basis text corresponding to the question text to obtain the first text sequence. Further, to distinguish the question text, the answer polarity, and the answer-basis text, separators may be added between adjacent texts (e.g., between the question text and the answer polarity, and between the answer polarity and the answer-basis text), so that the first text sequence also includes the added separators.

Continuing the above example, let the answer-basis text corresponding to the question text Q = (q_1, q_2, …, q_M) be A = (a_1, a_2, …, a_N), where a_j denotes the j-th word in the answer-basis text and N is the total number of words in the answer-basis text. The first text sequence obtained from the question text, the answer polarity p corresponding to the question text, and the answer-basis text corresponding to the question text may then be X = (q_1, …, q_M, [SEP1], p, [SEP2], a_1, …, a_N), where [SEP1] and [SEP2] are separators.
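
As an illustration only, the concatenation above can be sketched in a few lines of Python; the function name, the word lists, and the polarity string are hypothetical, while the separators follow the notation above:

```python
def build_first_text_sequence(question_words, polarity, basis_words=None):
    """Concatenate question text, answer polarity, and (optionally) the
    answer-basis text into the first text sequence, with separators."""
    if basis_words is None:
        # Question text [SEP] answer polarity
        return question_words + ["[SEP]", polarity]
    # Question text [SEP1] answer polarity [SEP2] answer-basis text
    return question_words + ["[SEP1]", polarity, "[SEP2]"] + basis_words

question = ["does", "the", "earth", "revolve", "around", "the", "sun"]
basis = ["the", "earth", "revolves", "around", "the", "sun"]
sequence = build_first_text_sequence(question, "forward", basis)
```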

In the solution of the present application, the question represented by the question text may belong to the yes/no (judgment) class. As described above, a yes/no question can generally be answered with a short answer such as "yes" or "no"; in practice, however, replying with only such short answers makes the answer text monotonous. The solution of the present application can therefore generate richer and more diverse answer texts for yes/no questions.

In some embodiments of the present application, step 310 comprises: encoding the first text sequence through an encoder network to obtain the encoding vector of each word in the first text sequence.

In some embodiments, the encoder network may be a long short-term memory (LSTM) network. Because an LSTM learns through training which information to remember and which to forget, it can better capture word-to-word dependencies over longer distances in the text.

In some embodiments, the encoder network may be a bidirectional long short-term memory (BiLSTM) network. A BiLSTM combines a forward LSTM and a backward LSTM and encodes the first text sequence bidirectionally (i.e., encoding information from front to back and from back to front), thereby better capturing bidirectional semantic dependencies in the text.

When the encoder network is an LSTM or BiLSTM network, the hidden vector output by the network for a word in the first text sequence can be regarded as that word's encoding vector; the hidden vector indicates the hidden representation of the corresponding word.

In some embodiments, the encoder network may also be the encoder of a Transformer network (Transformer model). In a Transformer, the encoder has an attention layer through which the input text sequence passes first; this layer helps the encoder attend to other words in the input text when encoding each word.

It should be noted that the encoder networks listed above are only exemplary and should not be considered as limiting the scope of application; in other embodiments, the encoder network may be another neural network for encoding text sequences, such as a gated recurrent neural network.
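
For illustration, a minimal BiLSTM encoder along the lines of step 310 might look as follows; this is a PyTorch sketch, not the application's implementation, and the class name and dimensions are assumptions:

```python
import torch
import torch.nn as nn

class BiLSTMEncoder(nn.Module):
    def __init__(self, vocab_size, emb_dim=128, hidden_dim=256):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_dim)
        # Bidirectional LSTM: the encoding vector of each word is the
        # concatenation of its forward and backward hidden states.
        self.lstm = nn.LSTM(emb_dim, hidden_dim,
                            batch_first=True, bidirectional=True)

    def forward(self, token_ids):
        # token_ids: (batch, seq_len)
        emb = self.embedding(token_ids)
        enc, (h_n, c_n) = self.lstm(emb)
        # enc: (batch, seq_len, 2 * hidden_dim) — per-word encoding vectors;
        # c_n holds the final cell states, usable to initialize the decoder.
        return enc, (h_n, c_n)
```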

Step 320: determine an attention weight corresponding to each word in the first text sequence at the current time step according to the encoding vector of each word in the first text sequence and the output hidden vector of the decoder network at the current time step.

The decoder network decodes according to the encoding vectors of the words in the first text sequence to obtain output hidden vectors, and each output hidden vector is used for determining the answer word of the corresponding time step. An answer word is a word in the answer text. In the process of generating the answer text, in order to preserve the dependencies between words in the answer text, the answer words are generated time step by time step.

Correspondingly, the decoder network outputs one output hidden vector per time step, ensuring that the output hidden vector of the current time step is used for determining the answer word corresponding to the current time step. That is, the decoder network determines the output hidden vectors corresponding to the answer words in the answer text one by one, and the process of determining one answer word is called a time step. Because the decoder network determines the output hidden vectors in the order in which the answer words appear in the answer text, the output hidden vector output by the decoder network at time step t (the t-th time step, where t ≥ 1 and t is a positive integer) corresponds to the t-th word of the answer text.

In some embodiments, the decoder network may be a long short-term memory network; specifically, it may be one or more layers of LSTM, which is not specifically limited here. In some embodiments, the decoder network may also be the decoder of a Transformer network (Transformer model). Of course, in other embodiments, the decoder network may be another neural network for text decoding, which is likewise not specifically limited here.

In some embodiments of the present application, before step 320, the method further comprises: acquiring the input vector of the decoder network at the current time step, where the input vector of the decoder network at the current time step comprises the output hidden vector of the decoder network at the previous time step, and the input vector of the decoder network at the first time step comprises the hidden vector corresponding to a start tag; and processing, by the decoder network, the input vector of the current time step to output the output hidden vector of the current time step.

In some embodiments, the hidden vector corresponding to the start tag may be obtained by inputting the embedding vector corresponding to the start tag into the decoder network (e.g., a single layer of LSTM), which outputs the hidden vector corresponding to the start tag based on that embedding vector. The embedding vector corresponding to the start tag may be generated by a shared embedding layer.

In some embodiments, if the decoder network is constructed based on an LSTM, its input at each time step includes not only the output hidden vector of the decoder network at the previous time step but also the cell state vector output at the previous time step, so that the decoder network processes the previous output hidden vector and the previous cell state vector to obtain the output hidden vector and the cell state vector of the current time step. By repeating this process, the decoder network outputs the output hidden vector corresponding to each time step.

In some embodiments, the cell state vector input to the decoder network at the first time step may be the cell state vector output by the encoder network at the last time step of encoding, which represents the encoded information of the first text sequence.

In some embodiments, the output hidden vector output by the decoder network at the current time step may be regarded as a low-dimensional vector representation of the answer word corresponding to the current time step; in other words, it indicates the semantics of the answer word corresponding to the current time step, and the answer word finally output at the current time step is related to this output hidden vector.

The attention weight corresponding to each word in the first text sequence at the current time step represents the attention allocated to that word when determining the answer word at the current time step. It can be understood that the answer words in the answer text are determined based on the words in the first text sequence, and each word in the first text sequence contributes differently to different answer words in the answer text, so the attention weights corresponding to each word in the first text sequence differ across time steps. Therefore, the scheme determines the attention weight of each word in the first text sequence at the corresponding time step according to the time step at which the answer word is being determined.

It can be understood that, because the words in the first text sequence contribute differently to the answer word at each time step, step 320 determines, at each time step for generating an answer word, the attention weight of each word in the first text sequence specifically for that time step, combining the coding vector of each word in the first text sequence with the output hidden vector of the decoder network at that time step, so as to ensure that the determined attention weights accurately reflect each word's contribution to the answer word at that time step.

Step 330, determining a first probability distribution according to the attention weight value corresponding to each word in the first text sequence at the current time step, the output hidden vector of the decoder network at the current time step, and the input vector of the decoder network at the current time step, where the first probability distribution is used to indicate a first probability that the answer word corresponding to the current time step is from a preset word list and a second probability from the first text sequence.

The inventors of the present solution have realized that the probability that the answer to a question text originates from words in the question text itself is high, especially for question texts that are yes-no questions. A yes-no question is generally a factual question; if the answer text for such a question text states the corresponding fact in an affirmative manner, then the probability is high that the words in the answer text (hereinafter, answer words) originate from the question text. Based on this consideration, in order to improve the accuracy, diversity and richness of the answer text generated for the question text, instead of using a single short answer such as "yes" or "no" as the answer, the scheme calculates the second probability that the answer word comes from the first text sequence and the first probability that it comes from the preset word list, and then combines the first probability and the second probability to decide whether to copy a word directly from the first text sequence as an answer word in the answer text.

Similarly, since the answer corresponding to the question text is derived from the answer-basis text, the probability that words in the answer text corresponding to the question text originate from the answer-basis text is also high; in the case that the answer-basis text corresponding to the question text is also used as a data basis for generating the answer text, words can likewise be copied from the answer-basis text as words in the answer text.

The preset vocabulary may be set according to actual needs, and is not specifically limited herein. In some embodiments, in order to ensure the correlation between the words in the preset vocabulary and the question-answer scenario, the preset vocabulary may be constructed based on technical text data in the technical field corresponding to the scenario, for example, if the question-answer scenario is a question-answer in the medical technical field, the words may be segmented according to the technical text data in the medical technical field, and the preset vocabulary may be constructed.

In some embodiments, since there are common words even in different technical fields, the words in the common word list may be added to the preset word list on the basis of the common word list, and then the words in the related technical fields may be further added to the preset word list in combination with the technical text data in the technical field corresponding to the question and answer scenario. The preset word list is established in a pertinence mode according to the technical field corresponding to the question and answer scene, so that the relevance of answer words extracted from the preset word list and used as answer texts in the answer generating process can be guaranteed, and the accuracy of the generated answer texts is further guaranteed.

In some embodiments of the present application, as shown in fig. 4, step 330, comprises:

step 410, determining a context vector corresponding to the current time step according to the coding vector corresponding to each word in the first text sequence and the attention weight value corresponding to each word in the first text sequence at the current time step.

In some embodiments of the present application, step 410 comprises: and taking the attention weight value corresponding to each word in the first text sequence at the current time step as a weighting coefficient of the corresponding word, and performing weighting processing on the coding vectors of all words in the first text sequence to obtain the context vector corresponding to the current time step.

In other words, if the coding vector corresponding to the $i$-th word in the first text sequence is $h_i$, and the attention weight of the $i$-th word in the first text sequence at the current time step (assume the current time step is time step $t$) is $a_i^t$, then the context vector $c_t$ corresponding to time step $t$ can be:

$$c_t = \sum_{i} a_i^t \, h_i \qquad \text{(formula 1)}$$
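As a quick sanity check of formula 1, the sketch below computes the context vector as the attention-weighted sum of the coding vectors; the tensor shapes are illustrative assumptions.

```python
import torch

n, enc_dim = 12, 512                     # 12 source words, coding dimension 512
H = torch.randn(n, enc_dim)              # coding vectors h_i of the first text sequence
a_t = torch.softmax(torch.randn(n), 0)   # attention weights a_i^t at time step t (sum to 1)

c_t = (a_t.unsqueeze(1) * H).sum(dim=0)  # c_t = sum_i a_i^t * h_i
assert c_t.shape == (enc_dim,)
```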

Step 420, the context vector corresponding to the current time step, the output implicit vector of the decoder network at the current time step, and the input vector of the decoder network at the current time step are spliced to obtain a first spliced vector.

Step 430, transform the first stitching vector to obtain the first probability.

In some embodiments, the first stitching vector may be linearly transformed by a linear network layer (also referred to as a fully-connected layer), outputting the first probability.

Step 440, determining the second probability according to the first probability, wherein the sum of the first probability and the second probability is 1.

In the scheme, the answer words in the answer text are derived from either the first text sequence or the preset word list, so that the sum of the first probability and the second probability is set to be 1, and after the first probability is determined, the difference between 1 and the first probability is the second probability.
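A hedged sketch of steps 420 to 440 might look as follows; the sigmoid squashing to [0, 1] is an assumption consistent with the copy network described later in the fig. 7 embodiment, and the layer and dimension names are illustrative.

```python
import torch
import torch.nn as nn

enc_dim, dec_dim = 512, 256
c_t = torch.randn(1, enc_dim)    # context vector at the current time step
s_t = torch.randn(1, dec_dim)    # output hidden vector of the decoder network
x_t = torch.randn(1, dec_dim)    # input vector of the decoder network

to_scalar = nn.Linear(enc_dim + 2 * dec_dim, 1)    # linear (fully-connected) layer
first_cat = torch.cat([c_t, s_t, x_t], dim=-1)     # first splicing vector
p_first = torch.sigmoid(to_scalar(first_cat))      # first probability (from the word list)
p_second = 1.0 - p_first                           # second probability (from the sequence)
```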

Continuing to refer to fig. 3, step 340, determining an answer word corresponding to the current time step in the preset word list and the first text sequence according to the first probability and the second probability; the answer words are used for determining answer texts corresponding to the question texts.

The first probability indicates the probability that the answer word corresponding to the current time step comes from the preset word list; the second probability indicates the probability that the answer word corresponding to the current time step is from the first text sequence, so that the target probability that the word in the first text sequence and each word in the preset word list are the answer word corresponding to the current time step can be further calculated based on the first probability and the second probability, and then the answer word corresponding to the current time step is determined in the preset word list and the first text sequence based on the target probability.

It is understood that the same word may exist in the first text sequence and the preset word list, for example, a word (assumed as the word P) exists in both the first text sequence and the preset word list, and then the first probability and the second probability need to be simultaneously combined to calculate the target probability that the word P corresponds to the current time step.

In some embodiments of the present application, as shown in fig. 5, step 340, comprises:

step 510, obtaining a second probability distribution, where the second probability distribution is used to indicate a reference probability that each word in a preset word list is an answer word corresponding to the current time step.

In some embodiments, step 510 comprises: splicing the context vector corresponding to the current time step with an output implicit vector of the decoder network at the current time step to obtain a second spliced vector; and performing linear transformation on the second splicing vector, and performing probability prediction based on a result of the linear transformation to obtain the second probability distribution.

In some embodiments, the second stitching vector may be linearly transformed by one or more linear network layers, thereby outputting the second probability distribution. Each neuron in the linear network layer is connected with all neurons in the previous neural network layer, so that the linear transformation of the output of the previous neural network layer is realized.

In some embodiments, in the case that the second stitching vector is linearly transformed by multiple linear network layers, in order to avoid the vanishing-gradient problem across the multiple linear network layers, an activation layer may further be set between them, with the activation processing performed by the activation function set in the activation layer. For example, the second stitching vector may be linearly transformed by one linear network layer, the output of that linear network layer is then activated by the activation layer, and the output of the activation layer is linearly transformed by another linear network layer to output the second probability distribution.

In some embodiments, the output dimension of the linear network layer directly outputting the second probability distribution is the same as the number of words in the preset word list, so that one output dimension of the linear network layer uniquely represents the probability that a word is the answer word corresponding to the current time step.
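A sketch of the vocabulary-side prediction described in the preceding paragraphs: two linear network layers with an activation in between, the final output dimension equal to the size of the preset word list, followed by softmax. The tanh activation and the intermediate width are assumptions.

```python
import torch
import torch.nn as nn

enc_dim, dec_dim, vocab_size = 512, 256, 30000
c_t = torch.randn(1, enc_dim)   # context vector at the current time step
s_t = torch.randn(1, dec_dim)   # output hidden vector of the decoder network

second_cat = torch.cat([c_t, s_t], dim=-1)               # second splicing vector
hidden = torch.tanh(nn.Linear(enc_dim + dec_dim, 512)(second_cat))
logits = nn.Linear(512, vocab_size)(hidden)              # one dimension per word in the list
p_vocab = torch.softmax(logits, dim=-1)                  # second probability distribution
```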

Step 520, according to the first probability and the second probability, performing weighting processing on the second probability distribution and the attention weight corresponding to each word in the first text sequence at the current time step to determine a target probability distribution, where the target probability distribution is used to indicate target probabilities that each word in the preset word list and the first text sequence is an answer word corresponding to the current time step.

Specifically, in step 520, the first probability is used as the weighting coefficient of the second probability distribution term, the second probability is used as the weighting coefficient of the term formed by the attention weights corresponding to the words in the first text sequence at the current time step, and the two terms are weighted and combined to obtain the target probability distribution.

In some embodiments, the attention weight corresponding to a word in the first text sequence at the current time step may be regarded as the reference probability that the word is the answer word corresponding to the current time step. Therefore, if a word P1 exists only in the first text sequence and the preset word list does not include it, the target probability that the word P1 is the answer word corresponding to the current time step equals the product of the second probability and the reference probability of the word P1 (i.e., its attention weight). Similarly, if a word P2 exists only in the preset word list and is not included in the first text sequence, the target probability of the word P2 equals the product of the first probability and the reference probability corresponding to the word P2 indicated by the second probability distribution. If a word P3 exists both in the first text sequence and in the preset word list, the target probability of the word P3 is the sum of a first target probability and a second target probability, where the first target probability equals the product of the second probability and the attention weight of the word P3 at the current time step, and the second target probability equals the product of the first probability and the reference probability corresponding to the word P3 indicated by the second probability distribution.

And 530, screening in the preset word list and the first text sequence according to the target probability, and determining an answer word corresponding to the current time step.

Through step 520, the target probability that each word in the preset word list and the first text sequence is the answer word corresponding to the current time step can be calculated, and on the basis, the word with the highest target probability in the first text sequence and the preset word list can be determined as the answer word corresponding to the current time step.
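Steps 520 and 530 can be sketched as a weighted merge over an extended vocabulary followed by an argmax; the scatter-add realizes the P3 case above, where a word occurring in both sources accumulates both terms. Indices and sizes here are illustrative assumptions.

```python
import torch

vocab_size, n = 30000, 12
p_vocab = torch.softmax(torch.randn(vocab_size), 0)  # reference probs over the word list
a_t = torch.softmax(torch.randn(n), 0)               # attention over the source words
p_first = torch.tensor(0.7)                          # answer word from the word list
p_second = 1.0 - p_first                             # answer word copied from the sequence
# id of each source word in an extended vocabulary (OOV words get ids >= vocab_size)
src_ids = torch.randint(0, vocab_size + n, (n,))

extended = torch.zeros(vocab_size + n)
extended[:vocab_size] = p_first * p_vocab            # generation term (step 520)
extended.scatter_add_(0, src_ids, p_second * a_t)    # copy term: shared words accumulate
answer_id = int(extended.argmax())                   # highest target probability (step 530)
```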

After generating the answer word corresponding to the current time step, the process of steps 320 to 340 may be repeated to obtain the answer word corresponding to each time step, until an end mark is generated (where the end mark may be [END]) or the number of generated answer words reaches a preset text length threshold, at which point the generation of answer words stops. The answer words corresponding to all time steps are then combined in time-step order to obtain the answer text corresponding to the question text.
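The generation loop in the paragraph above might be wrapped as follows; `decode_step` is a hypothetical function performing steps 320 to 340 for one time step.

```python
END, MAX_LEN = "[END]", 64   # end mark and preset text length threshold

def generate_answer(decode_step, state):
    words = []
    for _ in range(MAX_LEN):                 # stop at the length threshold
        word, state = decode_step(state)     # steps 320-340 for one time step
        if word == END:                      # stop when the end mark is generated
            break
        words.append(word)
    return "".join(words)                    # combine in time-step order
```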

In this scheme, the first probability that the answer word corresponding to the current time step comes from the preset word list and the second probability that it comes from the first text sequence are determined according to the attention weights of the words in the first text sequence for the answer word at the current time step, the output hidden vector of the decoder network at the current time step, and the input vector of the decoder network at the current time step; the answer word corresponding to the current time step is determined from the preset word list and the first text sequence based on the first probability and the second probability, and the answer text corresponding to the question text indicated by the first text sequence is then determined based on the answer words. The inventors recognized that the probability that the answer to a question text comes from the question text itself is high, so generating the answer text from answer words determined by the first and second probabilities strengthens the association between the answer text and the question text and effectively improves the accuracy of the generated answer text. In addition, determining answer words in both the preset word list and the first text sequence expands the source of answer words; in particular, when the question text contains out-of-vocabulary words, words can be copied from the question text as answer words, which effectively alleviates the out-of-vocabulary problem and its influence on the accuracy of the answer text.

Based on the copy mechanism introduced into the answer generation process in this scheme (i.e., copying words from the first text sequence as answer words), the generated answer text is more tightly related to the question text, which improves the stability of the answers.

Furthermore, the scheme is suitable for generating answer texts for question texts that are yes-no questions. In the related art, the answers generated for such question texts are generally short texts such as "yes" or "no", so the generated answers take a single form. If answer texts corresponding to yes-no questions are generated according to this scheme, the characteristic that a yes-no question is generally a factual question can be effectively exploited, and the words in the question text can be used to generate the answer text, which enriches the form of the generated answer texts and guarantees their diversity. In this scheme, if the first text sequence includes the answer polarity corresponding to the question text, the interpretability of the generated answer text is also taken into account.

In some embodiments of the present application, as shown in fig. 6, step 320, comprises:

and step 610, splicing the coding vector of each word in the first text sequence with the output implicit vector of the decoder network at the current time step respectively to obtain a third spliced vector corresponding to each word in the first text sequence.

And step 620, performing linear transformation on each third splicing vector to obtain a middle transformation vector corresponding to each word in the first text sequence.

In some embodiments, each third stitching vector may be linearly transformed by a linear network layer. For the sake of distinction, the linear network layer for linearly transforming the third splicing vector is referred to as a third linear network layer. Similarly, in step 620, the third splicing vector may be linearly transformed by one or more third linear network layers.

Step 630, performing activation processing on each intermediate transformation vector to obtain an initial attention weight of each word in the first text sequence to the answer word corresponding to the current time step.

In some embodiments, each intermediate transformation vector may be activated by an activation layer; specifically, the activation processing is performed by the activation function set in the activation layer. In some embodiments, the activation function may be a hyperbolic tangent function (i.e., the tanh function).

Step 640, performing normalization processing on each initial attention weight value to obtain an attention weight value of each word in the first text sequence to the answer word corresponding to the current time step.

In some embodiments, each initial attention weight value may be normalized by a softmax function, and the result after the normalization is used as the attention weight value of the corresponding word to the answer word corresponding to the current time step.

Through steps 610 to 640 above, the attention weight of each word in the first text sequence for the answer word corresponding to the current time step is calculated from the coding vector of each word in the first text sequence and the output hidden vector of the decoder network at the current time step.
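A minimal sketch of steps 610 to 640, following the linear-tanh-linear-softmax cascade that the fig. 7 embodiment below makes explicit; the layer dimensions are assumptions.

```python
import torch
import torch.nn as nn

n, enc_dim, dec_dim, attn_dim = 12, 512, 256, 256
H = torch.randn(n, enc_dim)                   # coding vectors of the source words
s_t = torch.randn(dec_dim)                    # decoder output hidden vector

linear3 = nn.Linear(enc_dim + dec_dim, attn_dim)  # "third linear network layer" (step 620)
v = nn.Linear(attn_dim, 1, bias=False)            # maps each vector to a scalar

third_cat = torch.cat([H, s_t.expand(n, dec_dim)], dim=-1)  # third splicing vectors (610)
e_t = v(torch.tanh(linear3(third_cat))).squeeze(-1)         # initial attention weights (630)
a_t = torch.softmax(e_t, dim=0)                             # normalized attention weights (640)
```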

Fig. 7 is a flowchart illustrating an answer generation method according to an embodiment of the application. In the embodiment shown in fig. 7, the answer generation model includes an encoder network, a decoder network, a generation network, a copy network, a target probability output layer, and an answer text output layer. The answer generation model is an end-to-end neural network that determines the answer text for the question text.

Wherein, the encoder network may be a BiLSTM. If the first text sequence is obtained by splicing the question text $q$ with the answer polarity $p$ corresponding to the question text, the first text sequence is $x = [q; p]$. Each word $x_i$ in the first text sequence can then be encoded by the BiLSTM as:

$$h_i = \mathrm{BiLSTM}(x_i) \qquad \text{(formula 2)}$$

$H$ is the coding matrix formed by the coding vectors of all words in the first text sequence, i.e., the hidden vectors of the encoder network in fig. 7, where $H = [h_1, h_2, \dots, h_n] \in \mathbb{R}^{n \times 2d}$, and $d$ is the dimension of the hidden layer of a single-layer LSTM.

In the embodiment shown in fig. 7, the decoder network may be a single-layer LSTM network; assume that the output hidden vector of the decoder network at time step $t$ is $s_t$.

The generating network is used for predicting the second probability distribution, and specifically, the generating network may include a first linear network layer, a first active layer, a second linear network layer, a second active layer, a weighting processing layer, a third linear network layer, a fourth linear network layer, and a third active layer, which are cascaded.

Specifically, after splicing a coding vector of each word in a first text sequence with an output hidden vector of a decoder network at a current time step, inputting the spliced coding vector into a first linear network layer for linear transformation, and then activating the output of the first linear network layer through a first activation layer, wherein an activation function set in the first activation layer is a hyperbolic tangent function; the output of the first activation layer is subjected to linear transformation by a second linear network layer, and the second linear network layer outputs the initial attention weight of each word in the first text sequence to the answer word corresponding to the current time step; and then, the output of the second linear network layer is normalized by the second activation layer to obtain the attention weight value of each word in the first text sequence to the answer word corresponding to the current time step, wherein the activation function set in the second activation layer is a softmax function. This process can be described by the following equations (3) and (4):

$$e_i^t = v^\top \tanh\!\left(W_h h_i + W_s s_t + b_{attn}\right) \qquad \text{(formula 3)}$$

$$a_i^t = \mathrm{softmax}\!\left(e_i^t\right) \qquad \text{(formula 4)}$$

where $h_i$ denotes the coding vector of the $i$-th word in the first text sequence; $W_h$ and $W_s$ are weight parameters of the first linear network layer and $b_{attn}$ is its bias parameter, all of which can be determined by training; $v$ is the weight parameter of the second linear network layer and can also be determined by training; $e_i^t$ is the initial attention weight of the $i$-th word in the first text sequence at time step $t$; and $a_i^t$ is the attention weight of the $i$-th word in the first text sequence at time step $t$ (i.e., the normalized initial attention weight).

After $a_i^t$ is obtained, the weighting processing layer weights the coding vectors of the words in the first text sequence by the attention weights $a_i^t$ of those words for the answer word at time step $t$, yielding the context vector $c_t$ for time step $t$; this process is described by formula 1 above:

$$c_t = \sum_{i} a_i^t \, h_i \qquad \text{(formula 1)}$$

Then, the output hidden vector $s_t$ of the decoder network at time step $t$ is spliced with the context vector $c_t$ of time step $t$ to obtain the second splicing vector $[s_t; c_t]$. The second splicing vector is input to the third linear network layer for linear transformation, the output of the third linear network layer is linearly transformed by the fourth linear network layer, and the third activation layer performs probability prediction on the output of the fourth linear network layer and outputs the second probability distribution $P_{vocab}$, where the activation function set in the third activation layer is a softmax function. This process can be described by the following formula 5:

$$P_{vocab} = \mathrm{softmax}\!\left(V'\left(V\,[s_t; c_t] + b\right) + b'\right) \qquad \text{(formula 5)}$$

where $V$ is the weight parameter of the third linear network layer and $b$ is its bias parameter; $V'$ is the weight parameter of the fourth linear network layer and $b'$ is its bias parameter; all of $V$, $b$, $V'$ and $b'$ can be determined by training.

The copy network comprises a fifth linear network layer and a fourth activation layer in cascade, where the activation function set in the fourth activation layer is a sigmoid function whose value range is [0, 1]. First, the context vector $c_t$ of time step $t$, the output hidden vector $s_t$ of the decoder network at time step $t$, and the input vector $x_t$ of the decoder network at time step $t$ are spliced and input to the fifth linear network layer for linear transformation; the fourth activation layer then processes the output of the fifth linear network layer and outputs the first probability $p_{gen}$ that the answer word corresponding to time step $t$ comes from the preset word list. This process can be described by the following formula 6:

$$p_{gen} = \sigma\!\left(w_c^\top c_t + w_s^\top s_t + w_x^\top x_t + b_{ptr}\right) \qquad \text{(formula 6)}$$

where $\sigma$ represents the sigmoid function; $w_c$, $w_s$ and $w_x$ are weight parameters of the fifth linear network layer and $b_{ptr}$ is its bias parameter, all of which can be determined by training. It will be appreciated that once the first probability $p_{gen}$ is determined, the second probability is correspondingly determined as $1 - p_{gen}$.

The target probability output layer then outputs, according to the first probability $p_{gen}$, the second probability $1 - p_{gen}$, the second probability distribution $P_{vocab}$, and the attention weights $a_i^t$ of the words in the first text sequence for the answer word at time step $t$, the target probability that each word in the first text sequence and the preset word list is the answer word corresponding to time step $t$, which can be expressed as:

$$P(w) = p_{gen}\, P_{vocab}(w) + \left(1 - p_{gen}\right) \sum_{i:\, x_i = w} a_i^t \qquad \text{(formula 7)}$$

where $P(w)$ indicates the target probability that word $w$ in the first text sequence or the preset word list is the $t$-th answer word in the answer text.

After the target probability is obtained, the word with the maximum target probability in the first text sequence and the preset word list can be determined as the answer word corresponding to time step $t$. Then, the answer text output layer combines the answer words corresponding to each time step to obtain the answer text corresponding to the question text.

In short, in each prediction of the answer generation model, the generation network predicts the second probability distribution over the preset word list, and the copy network predicts the second probability $1 - p_{gen}$ that the answer word comes from the first text sequence and the first probability $p_{gen}$ that it comes from the preset word list; a target probability distribution is then determined based on the first probability, the second probability and the second probability distribution (in fig. 7, the word "2-0" in the final histogram is not in the preset word list and comes from the first text sequence), so that the answer generation model can copy words from the first text sequence directly into the answer text.
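Pulling formulas 3 to 7 together, one decoding step of such a pointer-generator-style model could be organized as below; this is a sketch under the same assumptions as the smaller sketches above (PyTorch, illustrative dimensions), not the authors' reference implementation.

```python
import torch
import torch.nn as nn

class PointerGeneratorStep(nn.Module):
    def __init__(self, enc_dim, dec_dim, attn_dim, vocab_size):
        super().__init__()
        self.W_h = nn.Linear(enc_dim, attn_dim, bias=False)  # formula 3: W_h
        self.W_s = nn.Linear(dec_dim, attn_dim)              # formula 3: W_s, b_attn
        self.v = nn.Linear(attn_dim, 1, bias=False)          # formula 3: v
        self.vocab = nn.Sequential(                          # formula 5: V/b then V'/b'
            nn.Linear(dec_dim + enc_dim, attn_dim),
            nn.Linear(attn_dim, vocab_size),
        )
        self.gen = nn.Linear(enc_dim + 2 * dec_dim, 1)       # formula 6: w_c, w_s, w_x, b_ptr

    def forward(self, H, s_t, x_t, src_ids, n_oov):
        # formulas 3-4: attention over the words of the first text sequence
        e_t = self.v(torch.tanh(self.W_h(H) + self.W_s(s_t))).squeeze(-1)
        a_t = torch.softmax(e_t, dim=0)
        c_t = a_t @ H                                        # formula 1: context vector
        # formula 5: second probability distribution over the preset word list
        p_vocab = torch.softmax(self.vocab(torch.cat([s_t, c_t], -1)), dim=-1)
        # formula 6: first probability (generate) vs. second probability (copy)
        p_gen = torch.sigmoid(self.gen(torch.cat([c_t, s_t, x_t], -1)))
        # formula 7: target probability distribution over the extended vocabulary,
        # where src_ids maps each source word into the extended vocabulary
        P = torch.zeros(p_vocab.size(-1) + n_oov)
        P[: p_vocab.size(-1)] = p_gen * p_vocab
        P.scatter_add_(0, src_ids, (1 - p_gen) * a_t)
        return P, a_t
```

The answer word for the time step would then be `P.argmax()`, exactly as described above for the target probability output layer.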

With the answer generation model provided by this scheme, high-quality answer texts can be generated for yes-no questions. Experimental tests show that the accuracy of the answer texts generated by this scheme reaches 98%, far higher than the 90% accuracy of answer texts generated based on question rules; meanwhile, in manual evaluation, human satisfaction with the answer texts obtained by this scheme is as high as 96%, far higher than for answer texts generated by question rules. The scheme can thereby effectively avoid displaying wrong answers in the question-and-answer service of a search engine.

In some embodiments of the present application, after step 340, the method further comprises: associating the question text, the answer polarity corresponding to the question text and the answer text corresponding to the question text to obtain a question-answer pair; and storing the question-answer pairs into a question-answer database. Thereafter, an automatic question-and-answer service may be provided based on the question-and-answer pairs stored in the question-and-answer database.

Fig. 8 is a schematic diagram illustrating the construction of a question-and-answer database according to an embodiment of the present application. As shown in fig. 8, the question text and the answer polarity corresponding to the question text are input into the answer generation model, and the answer generation model processes the question text and the answer polarity corresponding to the question text according to the method of the present application to obtain the answer text corresponding to the question text. Then, the answer generation model may output a question-answer pair triple composed of the question text, the answer polarity, and the answer text, and then store the question-answer pair triple in a question-answer database.

In other embodiments, the question text and the answer text corresponding to the question text may be stored in association as a question-answer pair binary group, and then the question-answer pair binary group is stored in a question-answer database, and in the process of providing question-answer service, service is provided based on the question-answer binary group in the question-answer database.
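For illustration, a stored question-answer pair triple might look like the following; the field names and the medical example are hypothetical, and the binary-group variant simply drops the polarity field.

```python
# An illustrative question-answer pair triple as it might be stored in the
# question-answer database (field names and content are assumptions).
qa_triple = {
    "question": "Is aspirin an antipyretic?",        # question text
    "polarity": "yes",                               # answer polarity
    "answer": "Aspirin is an antipyretic drug ...",  # generated answer text
}
```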

The answer generation model may generate an answer text corresponding to the question text according to the process shown in fig. 7, and a specific generation process of the answer text refers to the description of the embodiment corresponding to fig. 7, which is not described herein again.

In some embodiments of the present application, as shown in fig. 9, the method further comprises:

at step 910, a search request is received, the search request indicating a target problem.

And 920, performing question matching in the question-answer database according to the target question, and determining a target question-answer pair of which the question text is matched with the target question.

In some embodiments, the question matching may be performed by calculating a similarity between the question text and the target question in the question-answer pair, and then determining the target question-answer pair based on the calculated similarity. The calculated similarity may be a semantic similarity or a text similarity, and is not specifically limited herein.

In some embodiments, the question-answer pair in which the question text with the highest similarity to the target question is located may be determined as the target question-answer pair. In other embodiments, the question-answer pair in which the question text with the similarity to the target question exceeding the similarity threshold is located may also be determined as the target question-answer pair. In other embodiments, the question-answer pair sorting may also be performed according to the similarity from high to low, and the question-answer pairs located in the top set number in the sorting are determined as the target question-answer pairs.
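The three matching strategies above (best match, similarity threshold, top-k by similarity) can be combined roughly as follows, assuming precomputed question embeddings; the threshold and k values are illustrative assumptions.

```python
import numpy as np

def cosine(u, v):
    # semantic similarity via cosine of two question embeddings
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def match(target_vec, qa_pairs, threshold=0.8, top_k=3):
    # qa_pairs: non-empty list of (question_vector, qa_record) tuples
    scored = sorted(((cosine(target_vec, q), rec) for q, rec in qa_pairs),
                    key=lambda t: t[0], reverse=True)
    above = [rec for score, rec in scored if score >= threshold]
    return above[:top_k] if above else [scored[0][1]]  # fall back to the best match
```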

Step 930, returning the answer text in the target question-answer pair to the initiator of the search request, as the search result for the target question.

Providing a search question-and-answer service based on the constructed question-answer database is realized through steps 910 to 930 above. In the process of generating answers for question texts, the scheme takes into account the high probability that answer words come from the question text (or from the question text and its corresponding answer-basis text), calculates the second probability that the answer word comes from the first text sequence, determines the answer words on this basis, and then generates answer texts and question-answer pairs. This guarantees the accuracy and diversity of the answer texts in the question-answer pairs, so that the answers are not single short-text answers, and effectively guarantees the quality of the search question-and-answer service based on the question-answer database.

Embodiments of the apparatus of the present application are described below, which may be used to perform the methods of the above-described embodiments of the present application. For details which are not disclosed in the embodiments of the apparatus of the present application, reference is made to the above-described embodiments of the method of the present application.

Fig. 10 is a block diagram illustrating an answer generating apparatus according to an embodiment, as shown in fig. 10, the answer generating apparatus including: the encoding processing module 1010 is configured to perform encoding processing on a first text sequence to obtain an encoding vector of each word in the first text sequence, where the first text sequence indicates a problem text; an attention weight determining module 1020, configured to determine an attention weight corresponding to each word in the first text sequence at the current time step according to the coded vector of each word in the first text sequence and an output hidden vector of the decoder network at the current time step; a first probability distribution determining module 1030, configured to determine a first probability distribution according to an attention weight corresponding to each word in the first text sequence at a current time step, an output hidden vector of the decoder network at the current time step, and an input vector of the decoder network at the current time step, where the first probability distribution is used to indicate a first probability that an answer word corresponding to the current time step is from a preset vocabulary and a second probability from the first text sequence; an answer word determining module 1040, configured to determine, according to the first probability and the second probability, an answer word corresponding to the current time step in the preset word list and the first text sequence; the answer words are used for determining answer texts corresponding to the question texts.

In some embodiments of the present application, the first probability distribution determining module 1030 comprises: a context vector determining unit, configured to determine a context vector corresponding to the current time step according to a coding vector corresponding to each word in the first text sequence and an attention weight corresponding to each word in the first text sequence at the current time step; the first splicing unit is used for splicing the context vector corresponding to the current time step, the output implicit vector of the decoder network at the current time step and the input vector of the decoder network at the current time step to obtain a first splicing vector; a first probability determining unit, configured to perform transformation processing on the first stitching vector to obtain the first probability; a second probability determination unit configured to determine the second probability according to the first probability, wherein a sum of the first probability and the second probability is 1.

In some embodiments of the present application, the context vector determination unit is further configured to: and taking the attention weight value corresponding to each word in the first text sequence at the current time step as a weighting coefficient of the corresponding word, and performing weighting processing on the coding vectors of all words in the first text sequence to obtain the context vector corresponding to the current time step.

In some embodiments of the present application, the answer generation apparatus further includes: an input vector acquisition unit, configured to acquire an input vector of the decoder network at a current time step, where the input vector of the decoder network at the current time step includes an output hidden vector of the decoder network at a previous time step, and the input vector of the decoder network at a first time step includes a hidden vector corresponding to a start tag; and the output implicit vector output unit is used for processing by the decoder network according to the input vector of the current time step and outputting the output implicit vector of the current time step.

In some embodiments of the present application, the answer term determination module 1040 includes: a second probability distribution obtaining unit, configured to obtain a second probability distribution, where the second probability distribution is used to indicate a reference probability that each word in a preset word list is an answer word corresponding to the current time step; a weighting processing unit, configured to perform weighting processing on the second probability distribution and an attention weight corresponding to each word in the first text sequence at the current time step according to the first probability and the second probability, and determine a target probability distribution, where the target probability distribution is used to indicate target probabilities that each word in the preset word list and the first text sequence is an answer word corresponding to the current time step; and the screening unit is used for screening in the preset word list and the first text sequence according to the target probability and determining the answer word corresponding to the current time step.

In some embodiments of the present application, the second probability distribution obtaining unit includes: the second splicing unit is used for splicing the context vector corresponding to the current time step with the output implicit vector of the decoder network at the current time step to obtain a second spliced vector; and the first linear transformation unit is used for performing linear transformation on the second splicing vector and performing probability prediction based on the result of the linear transformation to obtain the second probability distribution.

In some embodiments of the present application, the attention weight determination module 1020 includes: a third splicing unit, configured to splice the coding vector of each word in the first text sequence with the output hidden vector of the decoder network at the current time step, respectively, to obtain a third splicing vector corresponding to each word in the first text sequence; the second linear transformation unit is used for performing linear transformation on each third splicing vector to obtain a middle transformation vector corresponding to each word in the first text sequence; the activation processing unit is used for performing activation processing on each intermediate transformation vector to obtain an initial attention weight value of each word in the first text sequence to the answer word corresponding to the current time step; and the normalization processing unit is used for performing normalization processing on each initial attention weight value to obtain an attention weight value corresponding to each word in the first text sequence at the current time step.

In some embodiments of the present application, the encoding processing module is further configured to: and carrying out bidirectional coding on the first text sequence through a coder network to obtain a coding vector of each word in the first text sequence.

In some embodiments of the present application, the answer generating device further comprises: and the first text splicing module is used for performing text splicing on the question text and the answer polarity corresponding to the question text to obtain the first text sequence.

In some embodiments of the present application, the answer generating device further comprises: and the second text splicing module is used for performing text splicing on the question text, the answer polarity corresponding to the question text and the answer-basis text corresponding to the question text to obtain the first text sequence.

In some embodiments of the present application, the answer generating device further comprises: the association module is used for associating the question text, the answer polarity corresponding to the question text and the answer text corresponding to the question text to obtain a question-answer pair; and the storage module is used for storing the question-answer pairs to a question-answer database.

In some embodiments of the present application, the answer generating device further comprises: a search request receiving module for receiving a search request, the search request indicating a target problem; the question matching module is used for performing question matching in the question-answer database according to the target question and determining a target question-answer pair of which the question text is matched with the target question; and the returning module is used for returning the answer text in the target question-answer pair to the initiator of the search request.

FIG. 11 illustrates a schematic structural diagram of a computer system suitable for use in implementing the electronic device of an embodiment of the present application. It should be noted that the computer system 1100 of the electronic device shown in fig. 11 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present application.

As shown in fig. 11, the computer system 1100 includes a Central Processing Unit (CPU) 1101, which can perform various appropriate actions and processes, such as executing the methods in the above-described embodiments, according to a program stored in a Read-Only Memory (ROM) 1102 or a program loaded from a storage section 1108 into a Random Access Memory (RAM) 1103. In the RAM 1103, various programs and data necessary for system operation are also stored. The CPU1101, ROM1102, and RAM 1103 are connected to each other by a bus 1104. An Input/Output (I/O) interface 1105 is also connected to bus 1104.

The following components are connected to the I/O interface 1105: an input portion 1106 including a keyboard, mouse, and the like; an output section 1107 including a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), and the like, a speaker, and the like; a storage section 1108 including a hard disk and the like; and a communication section 1109 including a Network interface card such as a LAN (Local Area Network) card, a modem, or the like. The communication section 1109 performs communication processing via a network such as the internet. A driver 1110 is also connected to the I/O interface 1105 as necessary. A removable medium 1111 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 1110 as necessary, so that a computer program read out therefrom is mounted into the storage section 1108 as necessary.

In particular, according to embodiments of the application, the processes described above with reference to the flow diagrams may be implemented as computer software programs. For example, embodiments of the present application include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated by the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication portion 1109 and/or installed from the removable medium 1111. When the computer program is executed by a Central Processing Unit (CPU) 1101, various functions defined in the system of the present application are executed.

It should be noted that the computer readable medium shown in the embodiments of the present application may be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a Read-Only Memory (ROM), an Erasable Programmable Read-Only Memory (EPROM), a flash Memory, an optical fiber, a portable Compact Disc Read-Only Memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present application, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In this application, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wired, etc., or any suitable combination of the foregoing.

The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. Each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

The units described in the embodiments of the present application may be implemented by software, or may be implemented by hardware, and the described units may also be disposed in a processor. Wherein the names of the elements do not in some way constitute a limitation on the elements themselves.

As another aspect, the present application also provides a computer-readable storage medium, which may be contained in the electronic device described in the above embodiments; or may exist separately without being assembled into the electronic device. The computer readable storage medium carries computer readable instructions which, when executed by a processor, implement the method of any of the embodiments described above.

According to an aspect of the present application, there is also provided an electronic device, including: a processor; a memory having computer readable instructions stored thereon which, when executed by the processor, implement the method of any of the above embodiments.

According to an aspect of an embodiment of the present application, there is provided a computer program product or a computer program comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions to cause the computer device to perform the method of any of the above embodiments.

It should be noted that although in the above detailed description several modules or units of the device for action execution are mentioned, such a division is not mandatory. Indeed, the features and functionality of two or more modules or units described above may be embodied in one module or unit, according to embodiments of the application. Conversely, the features and functions of one module or unit described above may be further divided into embodiments by a plurality of modules or units.

Through the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments described herein may be implemented by software, or by software in combination with necessary hardware. Therefore, the technical solution according to the embodiments of the present application can be embodied in the form of a software product, which can be stored in a non-volatile storage medium (which can be a CD-ROM, a usb disk, a removable hard disk, etc.) or on a network, and includes several instructions to enable a computing device (which can be a personal computer, a server, a touch terminal, or a network device, etc.) to execute the method according to the embodiments of the present application.

Other embodiments of the present application will be apparent to those skilled in the art from consideration of the specification and practice of the embodiments disclosed herein. This application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the application and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains.

It will be understood that the present application is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the application is limited only by the appended claims.
