Conversation method, conversation device, electronic equipment and computer-readable storage medium

Document No.: 1921678    Publication date: 2021-12-03

Note: This technology, "Conversation method, conversation device, electronic equipment and computer-readable storage medium" (对话方法、装置、电子设备及计算机可读存储介质), was created by Zheng Yinhe (郑银河) on 2020-06-01. Abstract: The embodiment of the application provides a conversation method, a conversation device, an electronic device and a computer-readable storage medium, relating to the technical field of artificial intelligence. The method comprises: acquiring a target question sentence; and predicting, based on an answer sentence prediction network, a target predicted answer sentence which corresponds to the target question sentence and has a preset dialogue style, the answer sentence prediction network being obtained by performing question prediction on answer sentences having the preset dialogue style and then training with the predicted question sentences and the answer sentences. In this way, enough answer sentences with the preset dialogue style, together with predicted question sentences, can be obtained effectively to train the initial reply prediction model, thereby improving the accuracy with which the reply prediction model predicts target predicted answer sentences having the preset dialogue style.

1. A dialogue method, comprising:

acquiring a target question;

predicting, based on an answer sentence prediction network, a target predicted answer sentence which corresponds to the target question sentence and has a preset dialogue style; the answer sentence prediction network is obtained by performing question prediction on answer sentences having the preset dialogue style and then training with the predicted question sentences and the answer sentences.

2. The method according to claim 1, wherein before predicting, based on the answer sentence prediction network, the target predicted answer sentence which corresponds to the target question sentence and has the preset dialogue style, the method further comprises:

determining a preset style label vector corresponding to the preset dialogue style;

and acquiring the answer sentence prediction network based on the preset style label vector and the answer sentence.

3. The method of claim 2, wherein obtaining the answer sentence prediction network based on the preset style label vector and the answer sentence comprises:

predicting a predicted question sentence corresponding to the answer sentence;

and training an initial answer sentence prediction network based on the answer sentence, the preset style label vector and the predicted question sentence to obtain the answer sentence prediction network.

4. The method of claim 3, wherein predicting the predicted question corresponding to the answer sentence comprises:

and inputting the answer sentence into a question sentence prediction network to obtain the corresponding predicted question sentence.

5. The method of claim 4, wherein before inputting the answer sentence into the question sentence prediction network to obtain the corresponding predicted question sentence, the method further comprises:

obtaining a sample question sentence and a corresponding sample answer sentence;

and training an initial question prediction network based on the sample question and the sample answer to obtain the question prediction network.

6. The method of claim 5, wherein before training the initial answer sentence prediction network based on the answer sentence, the preset style label vector and the predicted question sentence to obtain the answer sentence prediction network, the method further comprises:

and training an initial prediction network based on the sample question sentences and the sample answer sentences to obtain the initial answer sentence prediction network.

7. The method of claim 6, wherein training an initial prediction network based on the sample question sentences and the sample answer sentences to obtain the initial answer sentence prediction network comprises:

determining sample style label vectors corresponding to the sample answer sentences and the sample question sentences;

training the initial prediction network based on the sample question sentence, the sample answer sentence and the sample style label vector to obtain the initial answer sentence prediction network.

8. A dialogue apparatus, comprising:

the acquisition module is used for acquiring a target question;

the prediction module is used for predicting, based on an answer sentence prediction network, a target predicted answer sentence which corresponds to the target question sentence and has a preset dialogue style; the answer sentence prediction network is obtained by performing question prediction on answer sentences having the preset dialogue style and then training with the predicted question sentences and the answer sentences.

9. An electronic device, comprising:

one or more processors;

a memory;

one or more application programs, wherein the one or more application programs are stored in the memory and configured to be executed by the one or more processors to perform the dialogue method according to any one of claims 1 to 7.

10. A computer readable storage medium, characterized in that the storage medium stores at least one instruction, at least one program, a set of codes, or a set of instructions which is loaded and executed by a processor to implement the dialogue method according to any one of claims 1 to 7.

Technical Field

The present application relates to the field of language processing technologies, and in particular, to a dialog method, an apparatus, an electronic device, and a computer-readable storage medium.

Background

With the development of information and internet technologies, users often need to query various information through the internet to obtain corresponding answers; in order to make such conversations more fluent and natural, answers with various conversation styles are generally generated.

At present, a reply model is usually trained on question-and-answer data having a specific dialogue style so as to generate answers in that dialogue style, and the manner in which replies with a specific dialogue style are generated still needs to be optimized.

Disclosure of Invention

The application provides a conversation method, a conversation device, an electronic device and a computer-readable storage medium, which are intended to output a target predicted answer sentence with a preset style more accurately when an intelligent chat robot interacts with a user. The technical solution is as follows:

in a first aspect, a dialog method is provided, which includes:

acquiring a target question;

predicting, based on an answer sentence prediction network, a target predicted answer sentence which corresponds to the target question sentence and has a preset dialogue style; the answer sentence prediction network is obtained by performing question prediction on answer sentences having the preset dialogue style and then training with the predicted question sentences and the answer sentences.

In a second aspect, a dialog device is provided, the device comprising:

the acquisition module is used for acquiring a target question;

the prediction module is used for predicting, based on an answer sentence prediction network, a target predicted answer sentence which corresponds to the target question sentence and has a preset dialogue style; the answer sentence prediction network is obtained by performing question prediction on answer sentences having the preset dialogue style and then training with the predicted question sentences and the answer sentences.

In a third aspect, an electronic device is provided, which includes:

one or more processors;

a memory;

one or more application programs, wherein the one or more application programs are stored in the memory and configured to be executed by the one or more processors to perform the operations corresponding to the dialogue method according to the first aspect.

In a fourth aspect, there is provided a computer readable storage medium having stored thereon at least one instruction, at least one program, set of codes, or set of instructions, which is loaded and executed by a processor to implement the dialog method according to the first aspect.

The beneficial effect that technical scheme that this application provided brought is:

compared with the prior art, in the dialogue method and device, the electronic device and the computer-readable storage medium provided herein, question prediction is performed on answer sentences having the preset dialogue style to obtain corresponding predicted question sentences, and the trained answer prediction model is obtained based on the predicted question sentences and the answer sentences. The answer prediction model can thus predict, from a question sentence, a target predicted answer sentence having the preset dialogue style, and enough answer sentences with the preset dialogue style, together with predicted question sentences, can be obtained effectively to train the initial answer prediction model, thereby improving the accuracy with which the answer prediction model predicts target predicted answer sentences having the preset dialogue style.

Drawings

In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings used in the description of the embodiments of the present application will be briefly described below.

Fig. 1 is a schematic flowchart of a dialog method according to an embodiment of the present application;

fig. 2 is a schematic flowchart of a dialog method according to an embodiment of the present application;

fig. 3 is a schematic flowchart of a dialog method according to an embodiment of the present application;

fig. 4 is a schematic flowchart of a dialog method according to an embodiment of the present application;

fig. 5 is a schematic flowchart of a dialog method according to an embodiment of the present application;

fig. 6 is a schematic flowchart of a dialog method according to an embodiment of the present application;

fig. 7 is a schematic flowchart of a dialog method according to an embodiment of the present application;

FIG. 8 is a schematic diagram of a structure of a response prediction model in an example of the present application;

FIG. 9 is a schematic flow chart of model training in an example of the present application;

fig. 10 is a schematic structural diagram of a dialog device according to an embodiment of the present application;

fig. 11 is a schematic structural diagram of an electronic device according to an embodiment of the present application.

Detailed Description

Reference will now be made in detail to embodiments of the present application, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the drawings are exemplary only for the purpose of explaining the present application and are not to be construed as limiting the present invention.

As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. It will be understood that when an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element, or intervening elements may also be present. Further, "connected" or "coupled" as used herein may include wirelessly connected or wirelessly coupled. As used herein, the term "and/or" includes all or any element and all combinations of one or more of the associated listed items.

To make the objects, technical solutions and advantages of the present application more clear, embodiments of the present application will be described in further detail below with reference to the accompanying drawings.

In both academia and industry, building dialogue systems that can generate stylized and coherent responses has become an important topic. Such systems can produce more vivid conversations and, by exploiting the language style matching phenomenon, carry out more engaging conversations: people tend to imitate the language style of others during communication in pursuit of higher engagement. For example, the form of a customer-service reply may be tailored to the user, or may mimic the speech or writing style of certain celebrities or writers, which can facilitate the development of entertainment applications.

The dialogue generating module is an important component in the dialogue system and mainly plays a role in generating fluent dialogue replies. Generating a dialog reply with a specific style is a difficult problem in the field of current dialog system research.

It is currently common to focus on the construction of stylized dialog generation models using stylized dialog pairs, i.e., dialog pairs consisting of question sentences and stylized answer sentences.

The manner in which stylized dialogue responses are generated has been investigated in a variety of studies, where the definition of style covers rather ambiguous and broad concepts such as emotions or personas. Previous research has typically been conducted in a fully supervised setting that requires stylized conversational data. However, in most cases the language style to be captured is embedded in non-conversational text that cannot be used directly by supervised methods.

Given stylized dialogue pairs, existing work focuses on building a dialogue model with a pre-training technique, i.e., a Transformer-based encoder-decoder architecture, and adding style-dependent tags in the decoder so that the decoder can generate replies with a specific style.

In practical applications, stylized dialogue pairs are not readily available. In fact, the stylized corpora that can usually be obtained are not presented in the form of dialogue pairs; they are typically derived from non-conversational text such as novels and blogs.

Also related is unsupervised text style transfer, which aims to change the style of an input text while preserving its content. With such a model, a simple form of stylized dialogue can be achieved as follows: first an answer sentence y is generated for a question x using a conventional dialogue model, and then y is converted into a text with the target dialogue style while preserving its content. However, this conversion may impair the coherence between the question x and the converted answer, and inappropriate content may thus be introduced.

In view of at least one of the above technical problems in the prior art, or of aspects where improvement is needed, embodiments of the present application provide a dialogue method, apparatus, electronic device, and computer-readable storage medium.

The technical solutions of the present application, and how they solve the above technical problems, are described below with specific embodiments. The following specific embodiments may be combined with each other, and the same or similar concepts or processes may not be repeated in some embodiments. Embodiments of the present application will be described below with reference to the accompanying drawings.

A possible implementation manner is provided in the embodiment of the present application, and as shown in fig. 1, a dialog method is provided, which may include the following steps:

step S101, obtaining a target question sentence;

step S102, predicting, based on an answer sentence prediction network, a target predicted answer sentence which corresponds to the target question sentence and has a preset dialogue style; the answer sentence prediction network is obtained by performing question prediction on answer sentences having the preset dialogue style and then training with the predicted question sentences and the answer sentences.

A question sentence (post) and an answer sentence (response) form a dialogue pair in a dialogue; the question sentence is not necessarily an interrogative sentence, nor is the answer sentence necessarily a literal reply to a question.

The answer prediction network may be a network for inputting a target question and predicting a target predicted answer corresponding to the target question.

A conversation style refers to the different language material and manner of expression adopted according to the occasion, purpose and task of communication, and to the background and character of the communicator.

Specifically, according to the occasion of communication and the identity of the other party, conversation styles may include a respectful style toward leaders or elders, a casual style toward friends or colleagues, an intimate style toward family members, and so on.

The conversation style of the question sentences and the target predicted answer sentences produced during a dialogue may also cover everyday speech styles, practical styles, artistic styles and individual language styles; an individual language style varies with a person's writing or speaking habits. For example, the narrative language of Jin Yong's martial arts novels is steady and unfolds layer upon layer, like the tide of the Qiantang River surging forward, whereas the martial arts novels of Gu Long often feature unusual phrasing with a modern sensibility, prominent personality, clear rhythm and suspense, and short, terse passages.

In a specific implementation, for the same target question sentence, different preset conversation styles correspond to different target predicted answer sentences; "different" here means that the answer sentences may carry the same meaning but are expressed in different ways.

Specifically, as shown in fig. 2, an answer sentence with a preset dialogue style may be obtained; the answer sentence need not come from a dialogue, for example a narrative segment may be taken from a novel. Question prediction is then performed on the obtained text segment, that is, the segment is treated as the answer sentence and a predicted question sentence corresponding to it is predicted, and the answer sentence prediction network is then obtained by training on the predicted question sentences and the answer sentences.

In particular implementations, the answer sentence prediction network may include a Transformer network.

In this embodiment, question prediction is performed on answer sentences having the preset dialogue style to obtain corresponding predicted question sentences, and the trained answer prediction model is obtained based on the predicted question sentences and the answer sentences. The answer prediction model can therefore predict, from a question sentence, a target predicted answer sentence having the preset dialogue style, and enough answer sentences with the preset dialogue style, together with predicted question sentences, can be obtained effectively to train the initial answer prediction model, thereby improving the accuracy with which the answer prediction model predicts target predicted answer sentences having the preset dialogue style.
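For illustration only, the inference flow of steps S101-S102 may be sketched in Python as follows. The sketch assumes an already trained answer sentence prediction network exposed through a HuggingFace-style tokenizer/model interface, and it marks the preset dialogue style with a special token prepended to the input; the names used (predict_styled_answer, style_token, etc.) are hypothetical placeholders rather than part of the embodiment, which instead injects the style through a decoder-side style embedding described later.

import torch

def predict_styled_answer(model, tokenizer, target_question: str,
                          style_token: str = "<style_1>",
                          max_new_tokens: int = 64) -> str:
    """Step S101: acquire a target question; step S102: decode a target
    predicted answer sentence that carries the preset dialogue style."""
    # Simplification: the preset style is marked with a special token here;
    # the embodiment adds a style embedding inside the decoder instead.
    inputs = tokenizer(style_token + " " + target_question,
                       return_tensors="pt")
    with torch.no_grad():
        output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)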

As shown in fig. 3, before predicting a target predicted answer sentence with a preset dialog style corresponding to the target question sentence based on the answer sentence prediction network in step S102, a possible implementation manner of the embodiment of the present application may further include:

step S101a, determining a preset style label vector corresponding to the preset dialog style;

step S101b, acquiring the answer sentence prediction network based on the preset style label vector and the answer sentence.

Specifically, a corresponding sample style label vector can be extracted from the sample answer sentences and the sample question sentences, and the parameters of the initial prediction network are adjusted based on the sample question sentences, the sample answer sentences and the sample style label vector, so that the trained initial answer sentence prediction network can output target predicted answer sentences with a conversation style.

In the above embodiment, the answer sentence prediction network is obtained through the preset style label vector and the answer sentences, so that the preset dialogue style in the target predicted answer sentence output by the answer sentence prediction network is more pronounced and the dialogue is more natural.

In a possible implementation manner of the embodiment of the present application, the obtaining the answer prediction network based on the preset style label vector and the answer in step S101b may include:

(1) and predicting a predicted question sentence corresponding to the answer sentence.

The question prediction network may be a network that can predict a predicted question corresponding to a target predicted answer when the corresponding target predicted answer is input.

Specifically, predicting the predicted question sentence corresponding to the answer sentence may include: inputting the answer sentence into a question sentence prediction network to obtain the corresponding predicted question sentence.

In a specific implementation process, the question sentence prediction network may include a Transformer network, which is trained to obtain the question sentence prediction network.

(2) And training an initial answer prediction network based on the answer, the preset style label vector and the predicted question to obtain an answer prediction network.

Specifically, the initial answer sentence prediction network is trained based on the answer sentences, the predicted question sentences and the preset style label vector, and its parameters are adjusted to obtain the answer sentence prediction network; the obtained answer sentence prediction network can output target predicted answer sentences with the preset dialogue style.

As shown in fig. 4, in the above embodiment, the dialog method may include:

1) obtaining an answer sentence with a preset dialogue style;

2) inputting the answer sentence into the trained question sentence prediction network to obtain a corresponding predicted question sentence;

3) determining a preset style label vector corresponding to a preset dialogue style;

4) training an initial answer prediction network based on the answer, the predicted question and the preset style label vector to obtain an answer prediction network;

5) and inputting the target question sentence into an answer sentence prediction network to obtain a target prediction answer sentence with a preset dialogue style.

In the above embodiment, the answer sentence with the preset dialogue style is input into the trained question sentence prediction network to obtain the corresponding predicted question sentence, so as to construct a dialogue pair with the preset dialogue style; the initial answer sentence prediction network is then trained based on the dialogue pairs with the preset dialogue style to obtain the answer sentence prediction network, so that the answer sentence prediction network can output the target predicted answer sentence which corresponds to the target question sentence and has the preset dialogue style.
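A minimal sketch of this pseudo-dialogue-pair construction is given below, assuming the question sentence prediction (reverse) network is available through a generate-style seq2seq interface; the function name and the interface are illustrative assumptions, not requirements of the embodiment.

from typing import Iterable, List, Tuple

import torch

def build_pseudo_pairs(inverse_model, inverse_tokenizer,
                       styled_answers: Iterable[str],
                       style_label: int) -> List[Tuple[str, str, int]]:
    """For each answer sentence with the preset dialogue style, predict a
    question sentence and emit a (predicted question, answer, style) triple."""
    pairs = []
    for answer in styled_answers:
        inputs = inverse_tokenizer(answer, return_tensors="pt")
        with torch.no_grad():
            question_ids = inverse_model.generate(**inputs, max_new_tokens=48)
        predicted_question = inverse_tokenizer.decode(
            question_ids[0], skip_special_tokens=True)
        pairs.append((predicted_question, answer, style_label))
    return pairs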

In a possible implementation manner of the embodiment of the present application, before the answer sentence is input into the question sentence prediction network to obtain the corresponding predicted question sentence, the method may further include:

(1) obtaining a sample question sentence and a corresponding sample answer sentence;

(2) and training the initial question prediction network based on the sample question and the sample answer sentences to obtain the question prediction network.

Specifically, the sample question sentences and the sample answer sentences may be common dialog pairs, that is, may not have any dialog style, or may have a common dialog style, and the sample question sentences and the corresponding sample answer sentences may be obtained from dialog records of a plurality of sample users or dialog texts of literary works.

Specifically, the initial question sentence prediction network may be a preset Transformer network; the initial question sentence prediction network is trained based on the sample question sentences and the sample answer sentences, so that the trained question sentence prediction network has the capability of predicting the predicted question sentence corresponding to an answer sentence.
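As a rough illustration of this training step, the loop below fine-tunes a generic seq2seq model to map a sample answer sentence to its sample question sentence. It assumes a HuggingFace-style encoder-decoder model that returns a cross-entropy loss when labels are provided; batching, padding and evaluation are omitted, and all names are placeholders.

import torch

def train_question_predictor(model, tokenizer, samples, epochs=1, lr=5e-5):
    """samples: iterable of (sample_question, sample_answer) string pairs;
    the answer sentence is the input, the question sentence is the target."""
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
    model.train()
    for _ in range(epochs):
        for question, answer in samples:
            enc = tokenizer(answer, return_tensors="pt", truncation=True)
            labels = tokenizer(question, return_tensors="pt",
                               truncation=True).input_ids
            loss = model(**enc, labels=labels).loss  # seq2seq cross entropy
            loss.backward()
            optimizer.step()
            optimizer.zero_grad()
    return model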

In a possible implementation manner of the embodiment of the present application, before training the initial answer sentence prediction network based on the answer sentence, the preset style label vector and the predicted question sentence in step S101b, the method may further include:

and training the initial prediction network based on the sample question sentences and the sample answer sentences to obtain the initial answer sentence prediction network.

Specifically, training the initial prediction network based on the sample question sentences and the sample answer sentences to obtain an initial answer sentence prediction network, which may further include:

a. determining sample style label vectors corresponding to the sample answer sentences and the sample question sentences;

b. and training the initial prediction network based on the sample question sentences, the sample answer sentences and the sample style label vectors to obtain the initial answer sentence prediction network.

Specifically, corresponding sample style label vectors can be extracted from the sample answer sentences and the sample question sentences, and the parameters of the initial prediction network are adjusted based on the sample question sentences, the sample answer sentences and the sample style label vectors, so that the trained initial answer sentence prediction network can output answer sentences with a conversation style.

As shown in fig. 5, in the above embodiment, the dialog method may include:

1) obtaining a sample question sentence and a sample answer sentence;

2) determining a sample style label vector S_0 corresponding to both the sample answer sentences and the sample question sentences;

3) training an initial question sentence prediction network based on the sample answer sentences and the sample question sentences to obtain the question sentence prediction network;

4) training the initial prediction network based on the sample answer sentences, the sample question sentences and the sample style label vector S_0, to obtain the initial answer sentence prediction network;

5) obtaining an answer sentence with a preset dialogue style;

6) determining a preset style label vector S_1 corresponding to the preset dialogue style;

7) inputting the answer sentence into the question sentence prediction network to obtain a predicted question sentence;

8) training the initial answer sentence prediction network based on the preset style label vector S_1, the answer sentence and the predicted question sentence, to obtain the answer sentence prediction network;

9) acquiring a target question sentence, and inputting it into the answer sentence prediction network to obtain a target predicted answer sentence with the preset dialogue style.

In the above embodiment, the same sample question and sample answer are adopted, and the initial prediction network and the initial question prediction network are trained at the same time, so that the trained question prediction network can have the capability of predicting the predicted question corresponding to the target predicted answer, and the trained initial answer prediction network has the capability of predicting the target predicted answer corresponding to the target question, thereby reducing the amount of training data information and improving the training efficiency.

In a possible implementation manner of the embodiment of the application, answer sentences in a plurality of conversation styles can be obtained in advance, and the initial answer sentence prediction network can be trained to obtain answer sentence prediction networks corresponding to the different conversation styles; the answer sentence prediction network corresponding to the target dialogue style desired by the user can then be determined, and the target question sentence is input into that network to obtain a target predicted answer sentence with the target dialogue style, as sketched after the steps below.

As shown in fig. 6, in the present embodiment, the dialog method may include:

S1, obtaining sample question sentences and sample answer sentences;

S2, determining a sample style label vector S_0 corresponding to both the sample answer sentences and the sample question sentences;

S3, training the initial question sentence prediction network based on the sample answer sentences and the sample question sentences to obtain the question sentence prediction network;

S4, training the initial prediction network based on the sample answer sentences, the sample question sentences and the sample style label vector S_0, to obtain the initial answer sentence prediction network;

S5, obtaining answer sentences with a plurality of conversation styles;

S6, determining style label vectors S_1, S_2, ..., S_n corresponding to the plurality of dialogue styles;

S7, inputting the answer sentences into the question sentence prediction network to obtain the corresponding predicted question sentences;

S8, training the initial answer sentence prediction network based on each group of style label vector, answer sentences and predicted question sentences, to obtain an answer sentence prediction network corresponding to each conversation style;

S9, acquiring the target dialogue style selected by the user, and determining the answer sentence prediction network corresponding to the target dialogue style;

S10, acquiring the target question sentence, inputting it into that answer sentence prediction network, and obtaining the target predicted answer sentence with the target dialogue style.
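The style selection in steps S9-S10 can be pictured as a simple registry that maps each preset conversation style to its own trained answer sentence prediction network; the registry layout and helper function below are illustrative assumptions only.

from typing import Callable, Dict

# style name -> callable mapping a target question sentence to a styled answer
StyleRegistry = Dict[str, Callable[[str], str]]

def answer_with_style(registry: StyleRegistry, target_style: str,
                      target_question: str) -> str:
    if target_style not in registry:
        raise KeyError(f"no answer prediction network for style {target_style!r}")
    predict = registry[target_style]      # S9: pick the network for the style
    return predict(target_question)       # S10: predict the styled answer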

The dialogue method of the present application will be described below by taking as an example the case in which the answer sentence prediction network and the question sentence prediction network are Transformer networks.

In the following example, the answer sentence prediction network includes a stylized encoder and a stylized decoder, and the question sentence prediction network includes a reverse encoder and a reverse decoder. An answer sentence (the answer y shown in the figure) is input to the reverse encoder and reverse decoder to obtain a corresponding predicted question sentence (the question x shown in the figure); the stylized encoder and stylized decoder are then trained on the answer sentence, the predicted question sentence and a preset style label vector (the style s shown in the figure) to obtain the answer sentence prediction network. When a target question sentence is later acquired, it is input into the answer sentence prediction network to obtain the target predicted answer sentence in style s.

In the following examples, the answer prediction network and the initial answer prediction network may be referred to as stylized dialogue models, and the question prediction network and the initial question prediction network may be referred to as inverse dialogue models.

As shown in fig. 7, which is a schematic structural diagram of the answer sentence prediction network and the question sentence prediction network provided in this embodiment, the model mainly includes two pairs of encoder-decoder structures. Specifically, our model includes a stylized encoder e_p and a stylized decoder: the stylized encoder maps a user input sentence (post) x into a hidden variable space, and the stylized decoder decodes the hidden representation in that space into a stylized reply (response) y.

To address the lack of stylized dialogue data, we design a reverse dialogue model to help generate pseudo dialogue pairs. As shown in fig. 7, our model structure consists of two mirrored sub-modules. The first sub-module is a stylized dialogue model which, given a question x and a style label S_i, produces a stylized answer y; it includes an encoder e and a decoder d, where the stylized encoder maps a user input sentence x into the hidden variable space and the stylized decoder decodes the hidden representation into a stylized reply y. The second sub-module is a reverse dialogue model intended to generate a question x for a given answer y. This sub-module mirrors the encoder and decoder of the first sub-module, i.e., the answer y is encoded with the reverse encoder and the question x is decoded with the reverse decoder. Note that the reverse dialogue model aims to maintain semantic consistency between question and answer sentences; we therefore omit the style label in the reverse decoder to encourage it to focus more on the semantic aspect of the dialogue. The encoders and decoders may be parameterized with a Transformer architecture and their weights initialized from a pre-trained GPT model. Furthermore, we share the weights of the encoder and decoder belonging to the same sub-module to save memory, i.e., the weights of the stylized encoder and stylized decoder are shared, and the weights of the reverse encoder and reverse decoder are shared. In this way the conversion from question sentence to answer sentence is completed. Formally, the stylized dialogue generation task is to solve the following optimization problem:

y = arg max_{y'} p(y' | x, S_i)    (1)

In the above formula, y and y' denote answer sentences decoded by the model, x denotes the question sentence, and S_i denotes a specified style.
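The two mirrored sub-modules can be sketched structurally as follows. This is only a schematic PyTorch outline under stated assumptions: layer sizes are arbitrary, causal masks and GPT initialisation are omitted, and the weight sharing between the encoder and decoder of the same sub-module is not actually implemented (each sub-module is simply represented by one seq2seq core object).

import torch
import torch.nn as nn

class StylizedDialogueSystem(nn.Module):
    def __init__(self, vocab_size: int, n_styles: int, d_model: int = 256):
        super().__init__()
        self.token_emb = nn.Embedding(vocab_size, d_model)
        self.style_emb = nn.Embedding(n_styles, d_model)  # e_s(S_i)
        # one seq2seq core per sub-module (attention masks omitted here)
        self.stylized_core = nn.Transformer(d_model=d_model, batch_first=True)
        self.inverse_core = nn.Transformer(d_model=d_model, batch_first=True)
        self.lm_head = nn.Linear(d_model, vocab_size)

    def stylized_forward(self, x_ids, y_ids, style_ids):
        """Encode question x, decode stylized answer y conditioned on S_i."""
        memory_in = self.token_emb(x_ids)
        target_in = self.token_emb(y_ids) + self.style_emb(style_ids)[:, None, :]
        hidden = self.stylized_core(memory_in, target_in)
        return self.lm_head(hidden)

    def inverse_forward(self, y_ids, x_ids):
        """Encode answer y, decode question x (no style label on this side)."""
        memory_in = self.token_emb(y_ids)
        target_in = self.token_emb(x_ids)
        hidden = self.inverse_core(memory_in, target_in)
        return self.lm_head(hidden)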

In this work, we add a style tag s to the stylized decoder, so that the style carried by the reply y can be specified when it is generated.

Another main component of our model is the reverse dialogue model (Inverse dialog model), which contains a reverse encoder (Inverse encoder) and a reverse decoder (Inverse decoder) p_r2p. The reverse encoder receives a dialogue reply as input, and the hidden representation generated by the reverse encoder is fed to the reverse decoder to generate a dialogue question; that is, the conversion from an answer sentence to a question sentence is completed.

The Style embedding shown in fig. 7 is the style label we add to the stylized decoder. As shown in fig. 8, the present application proposes a Style Routing mechanism, which adds an additional style embedding on top of the original multi-head attention output in each decoder block.

The output of the original multi-head attention mechanism is shown as follows:

R_post = MHA[e_w(y_p), e(x), e(x)]    (3)

In the above formula, MMHA and MHA respectively denote the masked multi-head attention mechanism and the multi-head attention mechanism, and R_prev and R_post respectively denote their outputs; in particular, R_prev and R_post are sequences of hidden representations. y_p denotes the sequence that has already been decoded at the current time step, and e_w(y_p) denotes the word vector representation corresponding to the sequence y_p. e(x) denotes the hidden variable sequence corresponding to the question x, obtained by inputting the question x into the stylized encoder.

In the original solution, these two outputs would be averaged:

R_context = (R_prev + R_post) / 2    (4)

where R_context denotes the sequence of hidden representations after averaging.

In our model, however, an additional style-embedding vector is added to this averaged result:

R_merge = R_context + e_s(S_i)    (5)

where e_s(S_i) denotes the style embedding vector corresponding to style S_i, and R_merge denotes the sequence of hidden representations after the style embedding vector has been merged in.

It is noted that the proposed style routing mechanism may also be combined with other methods to further enhance the stylized decoder. In particular, in our experiments the embedding vector corresponding to the style label S_i is also used as the initial vector representation in the decoding process and is added, together with the position embedding, to the word vector at each position. Unlike previous studies focused on modeling continuous context, the styles modeled in our work are discrete, and the style embedding is assigned a higher priority in our style routing mechanism. Furthermore, we are the first to use this style routing mechanism in the stylized dialogue generation task.
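A minimal sketch of the style routing computation of Eqs. (3)-(5) is given below: the masked self-attention output R_prev and the cross-attention output R_post are averaged and the style embedding e_s(S_i) is added at every position. The module wrapper and tensor shapes are illustrative assumptions.

import torch
import torch.nn as nn

class StyleRouting(nn.Module):
    def __init__(self, n_styles: int, d_model: int):
        super().__init__()
        self.style_emb = nn.Embedding(n_styles, d_model)  # e_s

    def forward(self, r_prev: torch.Tensor, r_post: torch.Tensor,
                style_ids: torch.Tensor) -> torch.Tensor:
        # r_prev, r_post: (batch, seq_len, d_model); style_ids: (batch,)
        r_context = (r_prev + r_post) / 2                         # Eq. (4)
        return r_context + self.style_emb(style_ids)[:, None, :]  # Eq. (5)

# usage sketch
routing = StyleRouting(n_styles=4, d_model=16)
r_prev, r_post = torch.randn(2, 5, 16), torch.randn(2, 5, 16)
r_merge = routing(r_prev, r_post, torch.tensor([1, 3]))  # shape (2, 5, 16)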

The model training process uses a loss function consisting of three parts:

1) The question-to-answer cross entropy loss (i.e., the loss from sample question sentences to sample answer sentences):

- E_{<x,y>} [ log p_d(y | e(x), S_0) ]    (6)

In the above formula, the expectation is taken over dialogue pairs <x, y> sampled from the dialogue input data of the model, S_0 denotes the style label corresponding to that dialogue input data, x denotes a question sentence and y the corresponding answer sentence, e(x) denotes the hidden variable sequence corresponding to the question x, and p_d denotes the stylized decoder.

2) The answer-to-question cross entropy loss (i.e., the loss from sample answer sentences to sample question sentences):

- E_{<x,y>} [ log p_r(x | e_r(y)) ]    (7)

In the above formula, e_r(y) denotes the hidden variable sequence obtained by inputting the answer sentence y into the reverse encoder, and p_r denotes the reverse decoder model.

3) The reconstruction loss using the reverse dialogue model (i.e., the loss from predicted question sentences to answer sentences):

- E_{t} E_{x' ~ p_r(· | e_r(t))} [ log p_d(t | e(x'), S_1) ]    (8)

In the above formula, t denotes a stylized sentence sampled from the set of stylized text, S_1 denotes the style label corresponding to t, e_r(t) denotes the hidden variable sequence obtained after t is input into the reverse encoder, and x' denotes the pseudo question sentence decoded by the reverse decoder.

It should be noted that the loss in Eq. (8) cannot be optimized with the traditional gradient back-propagation approach, because computing it involves discrete sampling operations. Therefore, in our work, when optimizing this loss we first fix the weights of the reverse dialogue model and use pseudo question sentences x' sampled from the reverse dialogue model to train the stylized dialogue model. Unlike approaches based on beam search, we adopt top-k sampling to generate the pseudo question sentences x', because the mapping between question and answer sentences is not unique. The application introduces an iterative training process to train the proposed model. Specifically, in each training iteration we first use a batch of sampled dialogue pairs to update the stylized dialogue model and the reverse dialogue model by optimizing the losses in Eqs. (6) and (7) respectively. We then sample n_s stylized texts and, for each of them, use the reverse dialogue model to sample m pseudo dialogue question sentences, so that n_s × m pseudo dialogues can be constructed and used to train the stylized dialogue model with the loss in Eq. (8). Note that the reverse dialogue model is first trained with the loss in Eq. (7); it is then used to decode answer sentences into pseudo question sentences.
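For illustration, a single top-k sampling step over next-token logits might look as follows; wiring it into the full autoregressive decoding of pseudo question sentences is omitted, and the function is an assumed helper rather than the embodiment's exact procedure.

import torch

def sample_top_k(logits: torch.Tensor, k: int = 10) -> torch.Tensor:
    """logits: (batch, vocab_size) next-token scores; returns (batch,) ids
    drawn from the renormalised top-k candidates."""
    top_values, top_indices = torch.topk(logits, k, dim=-1)
    probs = torch.softmax(top_values, dim=-1)
    choice = torch.multinomial(probs, num_samples=1)   # (batch, 1)
    return top_indices.gather(-1, choice).squeeze(-1)

# usage sketch: one decoding step for a batch of three pseudo questions
next_ids = sample_top_k(torch.randn(3, 1000), k=10)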

As shown in fig. 9, our model training procedure can be summarized as follows:

S1, initializing the stylized dialogue model and the reverse dialogue model;

S2, sampling n_d dialogue pairs, each having the conversation style S_0;

S3, training the stylized dialogue model based on the loss function of equation (6);

S4, training the reverse dialogue model based on the loss function of equation (7);

S5, detecting whether the number of training steps is larger than a predefined value N_f; if yes, going to step S6; if not, going to step S2;

S6, sampling n_s stylized texts (i.e., answer sentences);

S7, for each stylized text, sampling with the reverse dialogue model to generate m pseudo dialogue question sentences (i.e., predicted question sentences);

S8, using the n_s × m pseudo dialogues formed by the stylized texts and the pseudo dialogue question sentences to optimize the loss function of equation (8), until a preset condition is met.

The preset condition may be that the loss function is less than a predetermined value, or that the number of training steps is greater than a predetermined value.
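The alternating procedure S1-S8 can be summarised as the control-flow skeleton below. It is written against abstract callables because the patent does not fix their signatures; every callable, default value and stopping threshold is a placeholder supplied by the reader.

from typing import Callable, List, Tuple

def iterative_training(
    sample_dialogue_pairs: Callable[[int], List[Tuple[str, str]]],
    train_stylized_step: Callable[[List[Tuple[str, str]]], float],    # Eq. (6)
    train_inverse_step: Callable[[List[Tuple[str, str]]], float],     # Eq. (7)
    sample_styled_answers: Callable[[int], List[str]],
    sample_pseudo_questions: Callable[[str, int], List[str]],         # top-k
    train_on_pseudo_pairs: Callable[[List[Tuple[str, str]]], float],  # Eq. (8)
    n_d: int = 32, n_s: int = 8, m: int = 4,
    n_f: int = 1000, max_pseudo_steps: int = 1000, tol: float = 0.05,
) -> None:
    # S2-S5: warm up both models on ordinary dialogue pairs with style S_0
    for _ in range(n_f):
        pairs = sample_dialogue_pairs(n_d)       # S2
        train_stylized_step(pairs)               # S3, loss of Eq. (6)
        train_inverse_step(pairs)                # S4, loss of Eq. (7)
    # S6-S8: build n_s * m pseudo dialogues and train the stylized model
    for _ in range(max_pseudo_steps):
        answers = sample_styled_answers(n_s)     # S6
        pseudo_pairs = [(q, a) for a in answers
                        for q in sample_pseudo_questions(a, m)]   # S7
        loss = train_on_pseudo_pairs(pseudo_pairs)                # S8, Eq. (8)
        if loss < tol:                           # preset condition
            break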

The data sets used by the above training method are given in Table 1.

according to the dialogue method, the question prediction is carried out on the answer sentence with the preset dialogue style to obtain the corresponding predicted question, the trained answer prediction model is obtained based on the predicted question and the answer sentence, so that the answer prediction model can predict the target predicted answer sentence with the preset dialogue style based on the question, enough answer sentences with the preset dialogue style and the predicted question sentences can be effectively obtained to train the initial answer prediction model, and therefore the accuracy of the answer prediction model for predicting the target predicted answer sentence with the preset dialogue style is improved.

Furthermore, the answer sentence prediction network is obtained through the preset style label vector and the answer sentences, so that the preset conversation style in the target predicted answer sentence output by the answer sentence prediction network is more pronounced and the conversation is more natural.

Furthermore, the same sample question sentences and sample answer sentences are adopted, and the initial prediction network and the initial question sentence prediction network are trained at the same time, so that the trained question sentence prediction network has the capability of predicting the predicted question sentences corresponding to the target predicted answer sentences, the trained initial answer sentence prediction network has the capability of predicting the target predicted answer sentences corresponding to the target question sentences, the training data information amount is reduced, and the training efficiency can be improved.

The above embodiments describe the dialogue method from the perspective of the method flow; the following describes it from the perspective of virtual modules, specifically as follows:

the embodiment of the present application provides a dialog apparatus 100, as shown in fig. 10, the apparatus 100 may include an obtaining module 1001 and a predicting module 1002, where:

an obtaining module 1001 configured to obtain a target question;

the predicting module 1002 is configured to predict, based on an answer sentence prediction network, a target predicted answer sentence which corresponds to the target question sentence and has a preset dialogue style; the answer sentence prediction network is obtained by performing question prediction on answer sentences having the preset dialogue style and then training with the predicted question sentences and the answer sentences.

In a possible implementation manner of the embodiment of the present application, the dialog apparatus 100 further includes a determining module, configured to:

determining a preset style label vector corresponding to the preset dialogue style;

and acquiring the answer sentence prediction network based on the preset style label vector and the answer sentence.

In a possible implementation manner of the embodiment of the present application, the determining module, when obtaining the answer prediction network based on the preset style label vector and the answer, is configured to:

predicting a predicted question sentence corresponding to the answer sentence;

and training an initial answer prediction network based on the answer, the preset style label vector and the predicted question to obtain the answer prediction network.

In a possible implementation manner of the embodiment of the present application, when predicting the question sentence corresponding to the answer sentence, the determining module is configured to:

and inputting the answer sentence into a question sentence prediction network to obtain the corresponding predicted question sentence.

In a possible implementation manner of the embodiment of the present application, the dialog device 100 further includes a first training module, configured to:

obtaining a sample question sentence and a corresponding sample answer sentence;

and training an initial question prediction network based on the sample question and the sample answer to obtain the question prediction network.

In a possible implementation manner of the embodiment of the present application, the dialog device 100 further includes a second training module, configured to:

and training an initial prediction network based on the sample question sentences and the sample answer sentences to obtain the initial answer sentence prediction network.

In a possible implementation manner of the embodiment of the present application, the second training module is configured to, when training the initial prediction network based on the sample question sentences and the sample answer sentences to obtain the initial answer sentence prediction network,:

determining sample style label vectors corresponding to the sample answer sentences and the sample question sentences;

training the initial prediction network based on the sample question sentence, the sample answer sentence and the sample style label vector to obtain the initial answer sentence prediction network.

According to the above dialogue device, question prediction is performed on answer sentences having the preset dialogue style to obtain corresponding predicted question sentences, and the trained answer prediction model is obtained based on the predicted question sentences and the answer sentences; the answer prediction model can therefore predict, from a question sentence, a target predicted answer sentence having the preset dialogue style, and enough answer sentences with the preset dialogue style, together with predicted question sentences, can be obtained effectively to train the initial answer prediction model, thereby improving the accuracy with which the answer prediction model predicts target predicted answer sentences having the preset dialogue style.

Furthermore, the answer sentence prediction network is obtained through the preset style label vector and the answer sentences, so that the preset conversation style in the target predicted answer sentence output by the answer sentence prediction network is more pronounced and the conversation is more natural.

Furthermore, the same sample question sentences and sample answer sentences are adopted, and the initial prediction network and the initial question sentence prediction network are trained at the same time, so that the trained question sentence prediction network has the capability of predicting the predicted question sentences corresponding to the target predicted answer sentences, the trained initial answer sentence prediction network has the capability of predicting the target predicted answer sentences corresponding to the target question sentences, the training data information amount is reduced, and the training efficiency can be improved.

The dialogue device of the embodiments of the present disclosure may execute the dialogue method provided by the embodiments of the present disclosure, and the implementation principles are similar; the actions executed by each module of the dialogue device in the embodiments of the present disclosure correspond to the steps of the dialogue method in the embodiments of the present disclosure. For a detailed functional description of each module of the dialogue device, reference may be made to the description of the corresponding dialogue method shown above, and details are not repeated here.

The apparatus provided in the embodiment of the present application may implement at least one of the modules through an AI model. The functions associated with the AI may be performed by the non-volatile memory, the volatile memory, and the processor.

The processor may include one or more processors. At this time, the one or more processors may be general-purpose processors, such as a Central Processing Unit (CPU), an Application Processor (AP), or the like, or pure graphics processing units, such as a Graphics Processing Unit (GPU), a Vision Processing Unit (VPU), and/or AI-specific processors, such as a Neural Processing Unit (NPU).

The one or more processors control the processing of the input data according to predefined operating rules or Artificial Intelligence (AI) models stored in the non-volatile memory and the volatile memory. Predefined operating rules or artificial intelligence models are provided through training or learning.

Here, the provision by learning means that a predefined operation rule or an AI model having a desired characteristic is obtained by applying a learning algorithm to a plurality of learning data. This learning may be performed in the device itself in which the AI according to the embodiment is performed, and/or may be implemented by a separate server/system.

The AI model may include a plurality of neural network layers. Each layer has a plurality of weight values, and the computation of one layer is performed using the computation result of the previous layer and the plurality of weights of the current layer. Examples of neural networks include, but are not limited to, Convolutional Neural Networks (CNNs), Deep Neural Networks (DNNs), Recurrent Neural Networks (RNNs), Restricted Boltzmann Machines (RBMs), Deep Belief Networks (DBNs), Bidirectional Recurrent Deep Neural Networks (BRDNNs), Generative Adversarial Networks (GANs), and deep Q networks.

A learning algorithm is a method of training a predetermined target device (e.g., a robot) using a plurality of learning data to make, allow, or control the target device to make a determination or prediction. Examples of the learning algorithm include, but are not limited to, supervised learning, unsupervised learning, semi-supervised learning, or reinforcement learning.

The dialog apparatus provided in the embodiment of the present application is described above from the perspective of function modularization, and then, the electronic device provided in the embodiment of the present application is described from the perspective of hardware implementation, and a computing system of the electronic device is also described.

Based on the same principle as the method shown in the embodiments of the present disclosure, embodiments of the present disclosure also provide an electronic device, which may include but is not limited to: a processor and a memory, the memory being used for storing computer operating instructions and the processor being used for executing the dialogue method shown in the embodiments by calling the computer operating instructions. Compared with the prior art, the dialogue method can effectively acquire enough answer sentences and predicted question sentences with the preset dialogue style to train the initial answer prediction model, so that the accuracy with which the answer prediction model predicts the target predicted answer sentence with the preset dialogue style is improved.

In an alternative embodiment, an electronic device is provided, as shown in FIG. 11, the electronic device 1100 shown in FIG. 11 comprising: a processor 1101 and a memory 1103. The processor 1101 is coupled to the memory 1103, such as by a bus 1102. Optionally, the electronic device 1100 may also include a transceiver 1104. It should be noted that the transceiver 1104 is not limited to one in practical applications, and the structure of the electronic device 1100 is not limited to the embodiment of the present application.

The Processor 1101 may be a CPU (Central Processing Unit), a general purpose processor, a DSP (Digital Signal Processor), an ASIC (Application Specific Integrated Circuit), an FPGA (Field Programmable Gate Array) or other programmable logic device, a transistor logic device, a hardware component, or any combination thereof. It may implement or perform the various illustrative logical blocks, modules, and circuits described in connection with the disclosure. The processor 1101 may also be a combination of computing functions, e.g., a combination of one or more microprocessors, or a combination of a DSP and a microprocessor, and the like.

Bus 1102 may include a path that transfers information between the above components. The bus 1102 may be a PCI (Peripheral Component Interconnect) bus, an EISA (Extended Industry Standard Architecture) bus, or the like. The bus 1102 may be divided into an address bus, a data bus, a control bus, and the like. For ease of illustration, only one thick line is shown in FIG. 11, but this is not intended to represent only one bus or type of bus.

The Memory 1103 may be a ROM (Read Only Memory) or other type of static storage device that can store static information and instructions, a RAM (Random Access Memory) or other type of dynamic storage device that can store information and instructions, an EEPROM (Electrically Erasable Programmable Read Only Memory), a CD-ROM (Compact Disc Read Only Memory) or other optical Disc storage, optical Disc storage (including Compact Disc, laser Disc, optical Disc, digital versatile Disc, blu-ray Disc, etc.), a magnetic disk storage medium or other magnetic storage device, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited to these.

The memory 1103 is used for storing application program codes for executing the present application, and the execution is controlled by the processor 1101. The processor 1101 is configured to execute application program code stored in the memory 1103 to implement the content shown in the foregoing method embodiments.

Among them, electronic devices include but are not limited to: mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable multimedia players), in-vehicle terminals (e.g., in-vehicle navigation terminals), and the like, and fixed terminals such as digital TVs, desktop computers, and the like. The electronic device shown in fig. 11 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present disclosure.

According to the present disclosure, in the dialogue method performed in the electronic device, the reasoning about or prediction of a dialogue may be carried out from a question sentence by using an artificial intelligence model. A processor of the electronic device may perform pre-processing operations on the data to convert it into a form suitable for use as input to the artificial intelligence model. The artificial intelligence model may be obtained through training. Here, "obtained by training" means that a basic artificial intelligence model is trained with a plurality of pieces of training data by a training algorithm to obtain a predefined operation rule or artificial intelligence model configured to perform a desired feature (or purpose). The artificial intelligence model can include a plurality of neural network layers. Each of the plurality of neural network layers includes a plurality of weight values, and the neural network calculation is performed through calculation between the calculation result of the previous layer and the plurality of weight values.

Inferential prediction is a technique for making logical inferences and predictions based on the determined information, and includes, for example, knowledge-based reasoning, optimized prediction, and preference-based planning or recommendation.

The present application provides a computer-readable storage medium, on which a computer program is stored, which, when running on a computer, enables the computer to execute the corresponding content in the foregoing method embodiments. Compared with the prior art, the dialogue method can effectively acquire enough answer sentences and predicted question sentences with the preset dialogue style to train the initial reply prediction model, so that the accuracy of predicting the target predicted answer sentences with the preset dialogue style by the reply prediction model is improved.

It should be understood that, although the steps in the flowcharts of the figures are shown in an order indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated herein, the execution order of the steps is not strictly limited and they may be performed in other orders. Moreover, at least a portion of the steps in the flowcharts may include multiple sub-steps or stages, which are not necessarily performed at the same moment but may be performed at different moments, and their execution order is not necessarily sequential; they may be performed in turn or alternately with other steps or with at least a portion of the sub-steps or stages of other steps.

It should be noted that the computer readable medium in the present disclosure can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In contrast, in the present disclosure, a computer readable signal medium may comprise a propagated data signal with computer readable program code embodied therein, either in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, optical cables, RF (radio frequency), etc., or any suitable combination of the foregoing.

The computer readable medium may be embodied in the electronic device; or may exist separately without being assembled into the electronic device.

The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to perform the methods shown in the above embodiments.

Computer program code for carrying out operations for aspects of the present disclosure may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk or C++, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).

The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

The modules described in the embodiments of the present disclosure may be implemented by software or hardware. The name of a module does not in some cases constitute a limitation of the module itself, and for example, the acquiring module may also be described as a "module that acquires a target question".

The foregoing description is only exemplary of the preferred embodiments of the disclosure and is illustrative of the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the disclosure is not limited to the particular combination of features described above, but also encompasses other technical solutions formed by any combination of the above features or their equivalents without departing from the spirit of the disclosure, for example technical solutions formed by replacing the above features with (but not limited to) features having similar functions disclosed in this disclosure.
