Information processing method and device based on neural network, medium and electronic equipment

Document No.: 1544893  Publication date: 2020-01-17  Views: 8  Original language: Chinese

Reading note: This invention, "Information processing method and device based on neural network, medium and electronic equipment", was designed and created by Wang Xing, Hao Jie, Tu Zhaopeng and Shi Shuming on 2019-09-02. Its main content is as follows: the invention discloses a neural-network-based information processing method, a neural-network-based information processing apparatus, a computer-readable medium and an electronic device, in the technical field of artificial intelligence. The information processing method includes: acquiring a target information sequence corresponding to input information, and determining a phrase-level representation sequence based on the target information sequence, wherein the phrase-level representation sequence includes a plurality of phrase elements; performing a linear transformation on the phrase-level representation sequence to obtain a request (query) vector sequence, a key vector sequence and a value vector sequence corresponding to the phrase-level representation sequence; calculating the logit similarity between the request vector sequence and the key vector sequence, and applying a nonlinear transformation to the logit similarity to obtain the attention weight distribution corresponding to each phrase element; and determining, based on the attention weight distribution and the value vector sequence, a first network representation sequence corresponding to the input information. The invention can improve the performance of self-attention neural networks.

1. An information processing method based on a neural network, comprising:

acquiring a target information sequence corresponding to input information, and determining a phrase level representation sequence based on the target information sequence; wherein the phrase-level representation sequence includes a plurality of phrase elements;

performing linear transformation on the phrase-level representation sequence to obtain a request vector sequence, a key vector sequence and a value vector sequence corresponding to the phrase-level representation sequence;

calculating the logit similarity between the request vector sequence and the key vector sequence, and applying a nonlinear transformation to the logit similarity to obtain the attention weight distribution corresponding to each phrase element;

determining a first network representation sequence corresponding to the input information based on the attention weight distribution and the value vector sequence.

2. The information processing method of claim 1, wherein determining a phrase-level representation sequence based on the target information sequence comprises:

performing phrase segmentation on the target information sequence to form a plurality of phrase groups;

performing feature fusion on the features in each phrase group to generate a feature vector corresponding to each phrase group;

combining the feature vectors corresponding to the phrase groups to generate the phrase-level representation sequence.

3. The information processing method of claim 2, wherein phrase-segmenting the target information sequence into a plurality of phrase groups comprises:

performing phrase segmentation on the target information sequence according to a preset phrase length to form a plurality of phrase groups.

4. The information processing method of claim 2, wherein phrase-segmenting the target information sequence into a plurality of phrase groups comprises:

performing phrase segmentation on the target information sequence according to the syntactic structure of the input information to form a plurality of phrase groups.

5. The information processing method of claim 2, wherein combining the feature vectors corresponding to each phrase group to generate the phrase-level representation sequence comprises:

combining the feature vectors corresponding to the phrase groups to generate an intermediate representation sequence;

performing dependency relationship reinforcement on the feature vectors corresponding to the phrase groups in the intermediate representation sequence to generate the phrase-level representation sequence.

6. The information processing method according to any one of claims 1 to 5, wherein performing linear transformation on the phrase-level representation sequence to obtain a request vector sequence, a key vector sequence, and a value vector sequence corresponding to the phrase-level representation sequence includes:

performing linear transformations on the phrase-level representation sequence using three trainable parameter matrices, to obtain a request vector sequence, a key vector sequence and a value vector sequence corresponding to the phrase-level representation sequence.

7. The information processing method according to any one of claims 1 to 5, characterized by further comprising:

determining a word level representation sequence corresponding to the input information; wherein the word-level representation sequence comprises a plurality of word elements;

generating a request vector sequence, a key vector sequence and a value vector sequence corresponding to the word-level representation sequence, and determining an attention weight distribution corresponding to each of the word elements;

determining a second network representation sequence corresponding to the input information based on the attention weight distribution corresponding to each word element and the value vector sequence corresponding to the word-level representation sequence;

determining a final network representation sequence corresponding to the input information using the first network representation sequence and the second network representation sequence.

8. An information processing apparatus based on a neural network, comprising:

the phrase level sequence determining module is used for acquiring a target information sequence corresponding to the input information and determining a phrase level representation sequence based on the target information sequence; wherein the phrase-level representation sequence includes a plurality of phrase elements;

the linear transformation module is used for carrying out linear transformation on the phrase level representation sequence to obtain a request vector sequence, a key vector sequence and a value vector sequence corresponding to the phrase level representation sequence;

an attention weight determining module, configured to calculate the logit similarity between the request vector sequence and the key vector sequence, and to apply a nonlinear transformation to the logit similarity to obtain an attention weight distribution corresponding to each phrase element;

and the network representation sequence determining module is used for determining a first network representation sequence corresponding to the input information based on the attention weight distribution and the value vector sequence.

9. A computer-readable medium on which a computer program is stored, the program implementing the neural network-based information processing method according to any one of claims 1 to 7 when executed by a processor.

10. An electronic device, comprising:

one or more processors;

a storage device for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the neural network-based information processing method as claimed in any one of claims 1 to 7.

Technical Field

The present disclosure relates to the field of artificial intelligence technologies, and in particular, to an information processing method based on a neural network, an information processing apparatus based on a neural network, a computer-readable medium, and an electronic device.

Background

The Attention Mechanism has become a basic module in most deep learning models. It dynamically selects relevant representations in a network as required, and is particularly effective in generation tasks such as machine translation and image captioning.

A Self-Attention Network (SAN) is a neural network model built on the self-attention mechanism, one of the family of attention models. The SAN model computes an attention weight for each element of an input sequence, so it can capture long-distance dependencies, and the network representation of each element is not affected by the distance between elements.
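The self-attention computation described above can be sketched as follows. This is a minimal NumPy illustration, not the patent's implementation: the matrices Wq, Wk and Wv stand in for the three trainable parameter matrices that produce the request (query), key and value vector sequences, and the dimensions are arbitrary.

```python
import numpy as np

def softmax(x, axis=-1):
    # nonlinear transformation of logit similarities into attention weights
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over a sequence X of shape (n, d)."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv          # request (query), key, value sequences
    logits = Q @ K.T / np.sqrt(K.shape[-1])   # logit similarity between queries and keys
    weights = softmax(logits, axis=-1)        # attention weight distribution per element
    return weights @ V, weights               # network representation sequence

rng = np.random.default_rng(0)
n, d = 6, 8                                   # 6 elements, 8-dim features (illustrative)
X = rng.normal(size=(n, d))
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
out, weights = self_attention(X, Wq, Wk, Wv)
```

Note that every element attends to every other element with a weight that depends only on content, not position, which is why distance between elements does not dilute the representation.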

However, the SAN model operates only at the word level, which makes it less effective when processing some kinds of information.

It is to be noted that the information disclosed in the above background section is only for enhancement of understanding of the background of the present disclosure, and thus may include information that does not constitute prior art known to those of ordinary skill in the art.

Disclosure of Invention

An object of the present disclosure is to provide a neural-network-based information processing method, a neural-network-based information processing apparatus, a computer-readable medium and an electronic device, thereby overcoming, at least to some extent, the suboptimal information processing of self-attention neural networks that results from the limitations and disadvantages of the related art.

According to a first aspect of the present disclosure, there is provided a neural-network-based information processing method, including: acquiring a target information sequence corresponding to input information, and determining a phrase-level representation sequence based on the target information sequence, wherein the phrase-level representation sequence includes a plurality of phrase elements; performing a linear transformation on the phrase-level representation sequence to obtain a request vector sequence, a key vector sequence and a value vector sequence corresponding to the phrase-level representation sequence; calculating the logit similarity between the request vector sequence and the key vector sequence, and applying a nonlinear transformation to the logit similarity to obtain the attention weight distribution corresponding to each phrase element; and determining a first network representation sequence corresponding to the input information based on the attention weight distribution and the value vector sequence.

According to a second aspect of the present disclosure, there is provided a neural network-based information processing apparatus including a phrase-level sequence determination module, a linear transformation module, an attention weight determination module, and a network representation sequence determination module.

Specifically, the phrase-level sequence determining module is configured to obtain a target information sequence corresponding to the input information and to determine a phrase-level representation sequence based on the target information sequence, wherein the phrase-level representation sequence includes a plurality of phrase elements; the linear transformation module is configured to perform a linear transformation on the phrase-level representation sequence to obtain a request vector sequence, a key vector sequence and a value vector sequence corresponding to the phrase-level representation sequence; the attention weight determining module is configured to calculate the logit similarity between the request vector sequence and the key vector sequence, and to apply a nonlinear transformation to the logit similarity to obtain the attention weight distribution corresponding to each phrase element; and the network representation sequence determining module is configured to determine a first network representation sequence corresponding to the input information based on the attention weight distribution and the value vector sequence.

Optionally, the phrase level sequence determination module includes a phrase segmentation unit, a feature fusion unit, and a feature combination unit.

Specifically, the phrase segmentation unit is configured to perform phrase segmentation on the target information sequence to form a plurality of phrase groups; the feature fusion unit is used for performing feature fusion on features in each phrase group to generate feature vectors corresponding to the phrase groups; the feature combination unit is used for combining the feature vectors corresponding to the phrase groups to generate the phrase-level representation sequence.
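Under the fixed-length segmentation of claim 3, the segmentation, fusion and combination steps above can be sketched as follows. Mean pooling is used here as the feature-fusion step; the patent only requires some fusion, so this is one plausible choice rather than the mandated one, and all dimensions are illustrative.

```python
import numpy as np

def phrase_level_sequence(word_vectors, phrase_len):
    """Segment an (n, d) word-level sequence into fixed-length phrase groups
    and fuse each group into one feature vector (mean pooling here)."""
    n, _ = word_vectors.shape
    # phrase segmentation: consecutive, non-overlapping groups of phrase_len words
    groups = [word_vectors[i:i + phrase_len] for i in range(0, n, phrase_len)]
    fused = [g.mean(axis=0) for g in groups]   # feature fusion per phrase group
    return np.stack(fused)                     # combined phrase-level representation sequence

words = np.arange(12, dtype=float).reshape(6, 2)   # 6 words, 2-dim features
phrases = phrase_level_sequence(words, phrase_len=2)
```

With a phrase length of 2, six word vectors collapse into three phrase elements; a syntax-driven segmentation (claim 4) would instead derive variable-length groups from a parse of the input.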

Optionally, the phrase segmentation unit comprises a first segmentation subunit.

Specifically, the first segmentation subunit is configured to perform phrase segmentation on the target information sequence according to a predetermined phrase length to form a plurality of phrase groups.

Optionally, the phrase segmentation unit comprises a second segmentation subunit.

Specifically, the second segmentation subunit is configured to perform phrase segmentation on the target information sequence according to a syntax structure of the input information to form a plurality of phrase groups.

Optionally, the feature combining unit is configured to perform: combining the feature vectors corresponding to the phrase groups to generate an intermediate representation sequence; and performing dependency relationship reinforcement on the feature vectors corresponding to the phrase groups in the intermediate representation sequence to generate the phrase-level representation sequence.
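The patent does not pin down the mechanism of "dependency relationship reinforcement"; one hypothetical reading is a simple recurrent pass over the intermediate sequence, so that each phrase vector absorbs context from the phrases before it. Wx and Wh below are assumed trainable matrices, not parameters named by the patent.

```python
import numpy as np

def reinforce_dependencies(phrases, Wx, Wh):
    """Hypothetical dependency reinforcement: a plain recurrent pass over
    the intermediate phrase-level sequence (shape (m, d))."""
    h = np.zeros(Wh.shape[0])
    out = []
    for p in phrases:
        h = np.tanh(p @ Wx + h @ Wh)   # mix current phrase with running context
        out.append(h)
    return np.stack(out)               # reinforced phrase-level representation sequence

rng = np.random.default_rng(1)
phrases = rng.normal(size=(3, 4))      # 3 phrase elements, 4-dim features (illustrative)
Wx = rng.normal(size=(4, 4))
Wh = rng.normal(size=(4, 4))
reinforced = reinforce_dependencies(phrases, Wx, Wh)
```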

Optionally, the linear transformation module is configured to: perform linear transformations on the phrase-level representation sequence using three trainable parameter matrices, to obtain a request vector sequence, a key vector sequence and a value vector sequence corresponding to the phrase-level representation sequence.

Optionally, the information processing apparatus based on a neural network further includes a network representation sequence combination module.

In particular, the network representation sequence combination module is configured to perform: determining a word level representation sequence corresponding to the input information; wherein the word-level representation sequence comprises a plurality of word elements; generating a request vector sequence, a key vector sequence and a value vector sequence corresponding to the word-level representation sequence, and determining an attention weight distribution corresponding to each of the word elements; determining a second network representation sequence corresponding to the input information based on the attention weight distribution corresponding to each word element and the value vector sequence corresponding to the word-level representation sequence; determining a final network representation sequence corresponding to the input information using the first network representation sequence and the second network representation sequence.
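The patent states only that the first (phrase-level) and second (word-level) network representation sequences are used to determine the final sequence, without fixing how. One hedged way, sketched below, is to broadcast each phrase vector back to its word positions, concatenate, and project with a hypothetical matrix Wc; a learned gate or a weighted sum would be equally consistent with the claim.

```python
import numpy as np

def combine_representations(word_seq, phrase_seq, Wc):
    """Fuse a word-level sequence (n, d) with a phrase-level sequence (m, d)
    into a final network representation sequence (n, d_out)."""
    n, m = word_seq.shape[0], phrase_seq.shape[0]
    # map each word position to its phrase index (fixed-length grouping assumed)
    idx = np.minimum(np.arange(n) * m // n, m - 1)
    expanded = phrase_seq[idx]                       # phrase vector per word position
    return np.concatenate([word_seq, expanded], axis=-1) @ Wc

rng = np.random.default_rng(2)
word_seq = rng.normal(size=(6, 4))     # second network representation sequence
phrase_seq = rng.normal(size=(3, 4))   # first network representation sequence
Wc = rng.normal(size=(8, 4))           # hypothetical trainable projection
final_seq = combine_representations(word_seq, phrase_seq, Wc)
```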

According to a third aspect of the present disclosure, there is provided a computer-readable medium having stored thereon a computer program which, when executed by a processor, implements the neural network-based information processing method as described above.

According to a fourth aspect of the present disclosure, there is provided an electronic device comprising: one or more processors; a storage device for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the neural network-based information processing method as described above.

In some embodiments of the present disclosure, a phrase-level representation sequence corresponding to input information is determined; a request vector sequence, a key vector sequence and a value vector sequence corresponding to the phrase-level representation sequence are obtained; the attention weight distribution of each phrase element in the phrase-level representation sequence is calculated from the request vector sequence and the key vector sequence; and a network representation sequence corresponding to the input information is determined using the attention weight distribution and the value vector sequence. First, compared with techniques that process only at the word level, the method uses phrase-level representation sequences throughout, which improves the information processing performance of the self-attention neural network. Second, the disclosed scheme can be applied to machine translation, where it can substantially improve translation quality. Third, the configuration of phrases within the phrase-level representation sequence is flexible: multiple configuration schemes can be processed in parallel, and combining or comparing their results further ensures the accuracy of information processing.

It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and together with the description, serve to explain the principles of the disclosure. It is to be understood that the drawings in the following description are merely exemplary of the disclosure, and that other drawings may be derived from those drawings by one of ordinary skill in the art without the exercise of inventive faculty. In the drawings:

fig. 1 is a schematic diagram showing an exemplary system architecture of a neural network-based information processing method or a neural network-based information processing apparatus to which an embodiment of the present invention can be applied;

FIG. 2 illustrates a schematic structural diagram of a computer system suitable for use with the electronic device to implement an embodiment of the invention;

FIG. 3 schematically illustrates a flow chart of a neural network-based information processing method according to an exemplary embodiment of the present disclosure;

FIG. 4 shows a schematic diagram of a process of determining a network representation sequence according to an example embodiment of the present disclosure;

FIG. 5 schematically illustrates a block diagram of a stacked multi-head self-attention neural network, according to an exemplary embodiment of the present disclosure;

FIG. 6 is a diagram illustrating comparison of results using different methods in a machine translation scenario;

fig. 7 schematically shows a block diagram of a neural network-based information processing apparatus according to an exemplary embodiment of the present disclosure;

FIG. 8 schematically illustrates a block diagram of a phrase-level sequence determination module in accordance with an exemplary embodiment of the present disclosure;

FIG. 9 schematically illustrates a block diagram of a phrase segmentation unit in accordance with an exemplary embodiment of the present disclosure;

FIG. 10 schematically illustrates a block diagram of a phrase segmentation unit, according to another exemplary embodiment of the present disclosure;

fig. 11 schematically illustrates a block diagram of a phrase-level sequence determination module according to another exemplary embodiment of the present disclosure.

Detailed Description

Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments may, however, be embodied in many different forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of example embodiments to those skilled in the art. The described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a thorough understanding of embodiments of the disclosure. One skilled in the relevant art will recognize, however, that the subject matter of the present disclosure can be practiced without one or more of the specific details, or with other methods, components, devices, steps, and the like. In other instances, well-known technical solutions have not been shown or described in detail to avoid obscuring aspects of the present disclosure.

Furthermore, the drawings are merely schematic illustrations of the present disclosure and are not necessarily drawn to scale. The same reference numerals in the drawings denote the same or similar parts, and thus their repetitive description will be omitted. Some of the block diagrams shown in the figures are functional entities and do not necessarily correspond to physically or logically separate entities. These functional entities may be implemented in the form of software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.

The flow charts shown in the drawings are merely illustrative and do not necessarily include all of the steps. For example, some steps may be decomposed, and some steps may be combined or partially combined, so that the actual execution sequence may be changed according to the actual situation. In addition, all of the following terms "first" and "second" are used for distinguishing purposes only and should not be construed as limiting the present disclosure.

Artificial Intelligence (AI) is a theory, method, technique and application system that uses a digital computer or a machine controlled by a digital computer to simulate, extend and expand human Intelligence, perceive the environment, acquire knowledge and use the knowledge to obtain the best results. In other words, artificial intelligence is a comprehensive technique of computer science that attempts to understand the essence of intelligence and produce a new intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence is the research of the design principle and the realization method of various intelligent machines, so that the machines have the functions of perception, reasoning and decision making.

Artificial intelligence is a comprehensive discipline that covers a wide range of fields, involving both hardware-level and software-level technologies. The artificial intelligence infrastructure generally includes technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, operation/interaction systems, and mechatronics. Artificial intelligence software technology mainly comprises computer vision, speech processing, natural language processing, and machine learning/deep learning.

Natural Language Processing (NLP) is an important direction in the fields of computer science and artificial intelligence. It studies theories and methods that enable effective communication between humans and computers in natural language. Natural language processing is a science integrating linguistics, computer science and mathematics. Research in this field involves natural language, i.e., the language people use every day, so it is closely related to linguistics. Natural language processing techniques typically include text processing, semantic understanding, machine translation, question answering, knowledge graphs, and the like.

The scheme provided by the embodiments of the present application mainly relates to natural language processing within the field of artificial intelligence, and is explained in detail below.

Fig. 1 is a schematic diagram showing an exemplary system architecture of a neural network-based information processing method or a neural network-based information processing apparatus to which an embodiment of the present invention can be applied.

As shown in fig. 1, the system architecture 1000 may include one or more of terminal devices 1001, 1002, 1003, a network 1004, and a server 1005. The network 1004 is used to provide a medium for communication links between the terminal devices 1001, 1002, 1003 and the server 1005. Network 1004 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.

It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation. For example, the server 1005 may be a server cluster composed of a plurality of servers.

A user may use the terminal devices 1001, 1002, 1003 to interact with a server 1005 via a network 1004 to receive or transmit messages or the like. The terminal devices 1001, 1002, 1003 may be various electronic devices having a display screen, including but not limited to smart phones, tablet computers, portable computers, desktop computers, and the like.

The server 1005 may be a server that provides various services. For example, the server 1005 may acquire input information transmitted by the terminal devices 1001, 1002, 1003, determine a target information sequence corresponding to the input information, and convert the target information sequence into a phrase-level representation sequence, where the phrase-level representation sequence includes a plurality of phrase elements. Next, the server 1005 may perform a linear transformation on the phrase-level representation sequence to obtain a request vector sequence, a key vector sequence and a value vector sequence corresponding to the phrase-level representation sequence, calculate the logit similarity between the request vector sequence and the key vector sequence, and apply a nonlinear transformation to the logit similarity to obtain an attention weight distribution corresponding to each phrase element. Subsequently, the server 1005 may determine a network representation sequence corresponding to the input information based on the attention weight distribution and the value vector sequence.

By varying the phrase division granularity, that is, the length of the divided phrases, multiple groups of network representation sequences corresponding to the input information can be obtained; splicing these yields a final network representation sequence that accurately represents the input information. The server 1005 may then apply this network representation sequence to, for example, a machine translation scenario, producing more accurate translation results than the prior art.

It should be noted that the neural network-based information processing method according to the exemplary embodiment of the present disclosure is generally executed by the server 1005, and accordingly, the neural network-based information processing apparatus described below is generally configured in the server 1005.

FIG. 2 illustrates a schematic structural diagram of a computer system suitable for use with the electronic device used to implement the exemplary embodiments of this disclosure.

It should be noted that the computer system 200 of the electronic device shown in fig. 2 is only an example, and should not bring any limitation to the functions and the scope of the application of the embodiments of the present disclosure.

As shown in fig. 2, the computer system 200 includes a Central Processing Unit (CPU)201 that can perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM)202 or a program loaded from a storage section 208 into a Random Access Memory (RAM) 203. In the RAM 203, various programs and data necessary for system operation are also stored. The CPU201, ROM 202, and RAM 203 are connected to each other via a bus 204. An input/output (I/O) interface 205 is also connected to bus 204.

The following components are connected to the I/O interface 205: an input portion 206 including a keyboard, a mouse, and the like; an output section 207 including a display such as a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), and the like, and a speaker; a storage section 208 including a hard disk and the like; and a communication section 209 including a network interface card such as a LAN card, a modem, or the like. The communication section 209 performs communication processing via a network such as the internet. A drive 210 is also connected to the I/O interface 205 as needed. A removable medium 211 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 210 as necessary, so that a computer program read out therefrom is mounted into the storage section 208 as necessary.

In particular, the processes described below with reference to the flowcharts may be implemented as computer software programs, according to embodiments of the present disclosure. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 209 and/or installed from the removable medium 211. The computer program executes various functions defined in the system of the present application when executed by a Central Processing Unit (CPU) 201.

It should be noted that the computer readable media shown in the present disclosure may be computer readable signal media or computer readable storage media or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In contrast, in the present disclosure, a computer-readable signal medium may include a propagated data signal with computer-readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.

The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

The units described in the embodiments of the present disclosure may be implemented by software or by hardware, and the described units may also be disposed in a processor. The names of these units do not, in some cases, constitute a limitation on the units themselves.

As another aspect, the present application also provides a computer-readable medium, which may be contained in the electronic device described in the above embodiments; or may exist separately without being assembled into the electronic device. The computer readable medium carries one or more programs which, when executed by an electronic device, cause the electronic device to implement the method as described in the embodiments below.

Fig. 3 schematically shows a flowchart of a neural network-based information processing method of an exemplary embodiment of the present disclosure. Referring to fig. 3, the neural network-based information processing method may include the steps of:

s32, acquiring a target information sequence corresponding to input information, and determining a phrase level representation sequence based on the target information sequence; wherein the phrase level representation sequence includes a plurality of phrase elements.

In an exemplary embodiment of the present disclosure, the input information is the information to be transformed into a corresponding output network representation sequence. The input information may comprise a set of ordered elements; for input information comprising I elements, it may be represented as a sequence H = {h_1, h_2, …, h_I}.

For example, in the context of machine translation, the input information may be the text information to be translated, and the respective elements of the input information are the respective words in that text. The language of the text to be translated is not particularly limited in the present disclosure; for example, it may be Chinese, English, German, Japanese, or the like. If the text to be translated is "Bush held a talk with Sharon", the sequence is H = {Bush, held, a, talk, with, Sharon}.

The target information sequence may be the hidden vectors corresponding to the input information and may be represented as Z = {z_1, z_2, …, z_I}. Specifically, an embedding layer of the neural network may be used to convert the discrete elements in the input information into a continuous spatial representation, so as to obtain the target information sequence.

For example, the input information may first be encoded by index, with one index assigned to each distinct element; next, an embedding matrix may be created and the vector length required for each index determined. The input information can thus be represented using the embedding matrix rather than huge sparse code vectors.
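As an illustrative sketch of this index-plus-embedding-matrix lookup (the vocabulary, dimension d, and random initialization below are assumptions for the example, not parameters from the disclosure):

```python
import numpy as np

# Hypothetical vocabulary for the example sentence; one index per distinct word.
vocab = {"Bush": 0, "held": 1, "a": 2, "talk": 3, "with": 4, "Sharon": 5}
d = 4  # assumed embedding dimension (a tunable hyperparameter)

rng = np.random.default_rng(0)
embedding_matrix = rng.normal(size=(len(vocab), d))  # one row per index

def embed(words):
    """Look up the continuous-space vector for each discrete word index."""
    indices = [vocab[w] for w in words]
    return embedding_matrix[indices]  # target information sequence Z, shape (I, d)

Z = embed(["Bush", "held", "a", "talk", "with", "Sharon"])
```

In a trained model the embedding matrix would be learned rather than random; only the lookup mechanism is the point here.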

In an exemplary embodiment of the present disclosure, the server may determine, based on the target information sequence, a phrase-level representation sequence corresponding to the input information, where the phrase-level representation sequence includes a plurality of phrase elements.

First, the server may perform phrase-level segmentation on the target information sequence to form a plurality of phrase groups.

According to some embodiments of the present disclosure, the target information sequence may be phrase-segmented by a predetermined phrase length to form a plurality of phrase groups. The predetermined phrase length may refer to the number of words included in the segmented phrase group, which is a super-parameter that may be adjusted. For example, for a sentence containing 10 words, if the predetermined phrase length is 2, it can be divided into 5 phrase groups.
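A minimal sketch of this fixed-length segmentation (the function name and example values are illustrative only):

```python
def segment_by_length(seq, phrase_len):
    """Split a sequence into consecutive phrase groups of phrase_len elements."""
    return [seq[i:i + phrase_len] for i in range(0, len(seq), phrase_len)]

# A 10-word sentence with a predetermined phrase length of 2 yields 5 groups.
groups = segment_by_length(list(range(10)), 2)
```

When the sequence length is not divisible by the phrase length, the final group is simply shorter; how such remainders are handled is not specified in the text.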

According to other embodiments of the present disclosure, the target information sequence may be phrase-segmented according to the syntactic structure of the input information to form a plurality of phrase groups. For example, with the help of a syntax tree, the target information sequence may be segmented by the Noun Phrases (NP), Verb Phrases (VP), Prepositional Phrases (PP), and the like in the sentence. In addition, the lengths of the segmented phrases may differ under different syntactic-structure rules; that is, the phrases may have different granularities.

After the plurality of phrase groups are formed, feature fusion can be performed on the features within each phrase group to generate a feature vector corresponding to that phrase group, yielding phrase-level vector representations, with one phrase group corresponding to one vector representation. Specifically, the feature fusion may be implemented by a convolutional neural network, a recurrent neural network, or the like, which is not particularly limited in this exemplary embodiment.
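The text leaves the fusion operator open (CNN, RNN, etc.); the sketch below substitutes simple mean pooling purely to illustrate collapsing each phrase group into one feature vector. All names and shapes are assumptions.

```python
import numpy as np

def fuse_mean(group):
    """Fuse the word vectors in one phrase group into a single feature vector.
    Mean pooling stands in for the CNN/RNN fusion mentioned in the text."""
    return np.mean(group, axis=0)

Z = np.arange(12, dtype=float).reshape(6, 2)   # 6 word vectors, d = 2
groups = [Z[i:i + 2] for i in range(0, 6, 2)]  # phrase length 2 -> 3 groups
Hg = np.stack([fuse_mean(g) for g in groups])  # phrase-level sequence, shape (3, 2)
```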

Next, the feature vectors corresponding to each phrase group may be combined to generate a phrase-level representation sequence corresponding to the input information, which may be denoted Hg in this disclosure.

In addition, in some embodiments of the present disclosure, after a plurality of phrase groups are determined, feature vectors corresponding to the phrase groups may be combined to generate an intermediate representation sequence. Next, the dependency relationship of the feature vectors corresponding to the phrase groups in the intermediate representation sequence is strengthened to generate a phrase-level representation sequence corresponding to the input information.

Specifically, the intermediate representation sequence can be input into an Ordered Neurons LSTM (ON-LSTM) model to obtain the phrase-level representation sequence Hg.

By strengthening the dependency relationship among the phrases, the performance of the neural network is further improved.

And S34, performing linear transformation on the phrase-level representation sequence to obtain a request vector sequence, a key vector sequence and a value vector sequence corresponding to the phrase-level representation sequence.

In an exemplary embodiment of the present disclosure, a linear transformation may map vectors belonging to one vector space to another vector space. Specifically, the server may perform linear transformations on the phrase-level representation sequence through three trainable parameter matrices, mapping it into three different vector spaces to obtain the request vector sequence, key vector sequence, and value vector sequence corresponding to the phrase-level representation sequence.

In one embodiment of the present disclosure, the neural-network-based information processing method is applied to a Self-Attention Network (SAN) model, in which case the request vector sequence, the key vector sequence, and the value vector sequence are each obtained by performing a linear transformation on the phrase-level representation sequence.

In another embodiment of the present disclosure, the neural-network-based information processing method may also be applied to a neural network model with an encoder-decoder structure. In that case, the key vector sequence and the value vector sequence are produced by the encoder as its output, while the request vector sequence is an input of the decoder; it may be, for example, a target-side vector representation sequence, that is, the vector representations corresponding to the elements of the output sequence produced by the decoder.

The server can determine a request vector sequence Q, a key vector sequence K, and a value vector sequence V corresponding to the phrase-level representation sequence by the following equations 1, 2, and 3:

Q = Hg · W_Q (formula 1)

K = Hg · W_K (formula 2)

V = Hg · W_V (formula 3)

Each phrase element in the phrase-level representation sequence is a d-dimensional column vector; that is, the phrase-level representation sequence can be viewed as a vector sequence formed by I d-dimensional column vectors, denoted as an I × d matrix. The three trainable parameter matrices W_Q, W_K, and W_V are each d × d matrices, so the request vector sequence Q, the key vector sequence K, and the value vector sequence V are all I × d matrices.
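Formulas 1-3 can be sketched as plain matrix products. The random matrices below stand in for trained parameters, and the sequence length and dimension are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
I, d = 3, 4                    # assumed sequence length and hidden dimension
Hg = rng.normal(size=(I, d))   # phrase-level representation sequence

W_Q = rng.normal(size=(d, d))  # trainable parameter matrices; random
W_K = rng.normal(size=(d, d))  # stand-ins here, learned in practice
W_V = rng.normal(size=(d, d))

Q = Hg @ W_Q                   # request vector sequence (formula 1)
K = Hg @ W_K                   # key vector sequence (formula 2)
V = Hg @ W_V                   # value vector sequence (formula 3)
```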

And S36, calculating the logic similarity between the request vector sequence and the key vector sequence, and carrying out nonlinear transformation on the logic similarity to obtain the attention weight distribution corresponding to each phrase element.

The logic similarity is used for measuring the similarity between phrase elements in the input information, so that the output network representation sequence considers the relation between the phrase elements, and the generated network representation sequence can more accurately express the characteristics of each phrase element and cover richer information.

In an embodiment of the present disclosure, the information processing method based on the neural network may be applied to a neural network model of an encoder-decoder structure, and then the request vector sequence is a target-side vector representation sequence, and the calculated logical similarity is used to represent the similarity between the target-side vector representation sequence and a key vector sequence corresponding to the input information. Assigning an attention weight to the corresponding sequence of value vectors based on the similarity may enable the network representation of each phrase element to account for the effects of the sequence of target-side vector representations input at the target side.

The server may calculate the logic similarity matrix e between the request vector sequence Q and the key vector sequence K by a scaled dot product, referring specifically to formula 4:

e = Q · K^T / √d (formula 4)

where K^T is the transpose of the key vector sequence K and d is the dimension of each phrase element in the input information, which is also the dimension of the network's hidden state vectors. Dividing by √d in formula 4 scales down the inner products and reduces the computational complexity.

After the logic similarity is calculated, a nonlinear transformation can be applied to it to obtain the attention weight distribution corresponding to each phrase element. The weight value α for each key-value pair is given by formula 5:

α = softmax(e) (formula 5)

S38, determining a first network representation sequence corresponding to the input information based on the attention weight distribution and the value vector sequence.

From the determined attention weight distribution and the corresponding value vector sequence, the first network representation sequence O corresponding to the input information can be determined using formula 6:

O = α · V (formula 6)
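Formulas 4-6 together form scaled dot-product attention. A self-contained sketch (shapes and the random seed are illustrative assumptions):

```python
import numpy as np

def softmax(x):
    x = x - x.max(axis=-1, keepdims=True)  # subtract row max for stability
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)

def attention(Q, K, V):
    d = Q.shape[-1]
    e = Q @ K.T / np.sqrt(d)  # formula 4: scaled logic similarity
    alpha = softmax(e)        # formula 5: attention weight distribution
    return alpha @ V          # formula 6: network representation sequence O

rng = np.random.default_rng(0)
Q, K, V = [rng.normal(size=(3, 4)) for _ in range(3)]
O = attention(Q, K, V)
```

Each row of α sums to 1, so every output vector is a convex combination of the value vectors.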

A process of determining a network representation sequence according to an exemplary embodiment of the present disclosure is explained below with reference to fig. 4.

In step S402, embedding is performed on the input information H to obtain the target information sequence Z; in step S404, phrase-level conversion is performed on the target information sequence Z to obtain the phrase-level representation sequence Hg; in step S406, the phrase-level representation sequence Hg is linearly transformed into the corresponding request vector sequence Q, key vector sequence K, and value vector sequence V; in step S408, the logic similarity e between the request vector sequence Q and the key vector sequence K is calculated; in step S410, a nonlinear transformation is applied to the logic similarity e to obtain the weight values α; in step S412, a dot-product operation is performed on the weight values α and the value vector sequence V to obtain the network representation sequence O corresponding to the input information.

It should be noted that the above processing of the input information in the phrase manner determines the first network representation sequence. However, the first network representation sequence may also be combined with other forms of network representation sequences to determine a final network representation sequence corresponding to the input information. These other forms may include, for example, forms in which individual words are divided, forms in which other phrases are divided, and the like.

According to some embodiments of the present disclosure, the server may first determine a word-level representation sequence corresponding to the input information, where the word-level representation sequence includes a plurality of word elements; subsequently, a request vector sequence, a key vector sequence, and a value vector sequence corresponding to the word-level representation sequence are generated, and the attention weight distribution corresponding to each word element is determined; next, a second network representation sequence corresponding to the input information is determined based on the attention weight distribution corresponding to each word element and the value vector sequence corresponding to the word-level representation sequence. The manner of determining the second network representation sequence is similar to that of determining the first network representation sequence and is not described again here.

The first network representation sequence may then be combined with the second network representation sequence to determine a final network representation sequence corresponding to the input information.

The disclosed exemplary embodiments also provide a network utilizing Stacked Multi-Head Self-Attention (Stacked Multi-Head Self-Attention).

Referring to fig. 5, the input information may first be divided into m input subsequences; that is, m is the number of attention heads, where m is a positive integer greater than 1 (in the figure, m is 4, for example). Each input subsequence corresponds to a different phrase granularity; as is readily understood, processing a single word corresponds to a phrase granularity of 1.

And then, respectively generating a corresponding request vector sequence, a key vector sequence and a value vector sequence for each input subsequence by using the self-attention neural network, wherein the parameter matrix in each self-attention neural network is different.

Then, for each input subsequence, the corresponding request vector sequence and key vector sequence are used to compute m sub-logic similarities. A sub-weight value corresponding to each input subsequence is determined from its sub-logic similarity, a sub-output vector is determined from each sub-weight value and the corresponding value vector sequence, and the sub-output vectors are spliced to generate the network representation sequence corresponding to the input information. This is repeated until encoding is finished and the network representation is complete.
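The per-head computation and final splicing described above can be sketched as follows; the head count, dimensions, and random parameter matrices are illustrative assumptions, not values from the disclosure:

```python
import numpy as np

def head_attention(H, W_Q, W_K, W_V):
    """One self-attention head with its own parameter matrices."""
    Q, K, V = H @ W_Q, H @ W_K, H @ W_V
    e = Q @ K.T / np.sqrt(Q.shape[-1])          # sub-logic similarity
    alpha = np.exp(e - e.max(axis=-1, keepdims=True))
    alpha /= alpha.sum(axis=-1, keepdims=True)  # sub-weight values (softmax)
    return alpha @ V                            # sub-output vector sequence

rng = np.random.default_rng(0)
m, I, d = 4, 3, 2  # 4 heads, 3 sequence elements, dimension 2 per head
H_subs = [rng.normal(size=(I, d)) for _ in range(m)]
params = [tuple(rng.normal(size=(d, d)) for _ in range(3)) for _ in range(m)]

# Splice the m sub-outputs into one network representation sequence.
O = np.concatenate(
    [head_attention(H, *p) for H, p in zip(H_subs, params)], axis=-1)
```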

Regarding the effect of applying the neural-network-based information processing method of the exemplary embodiments of the present disclosure, machine translation is taken as an example and sentences are evaluated using the Bilingual Evaluation Understudy (BLEU) score.

Referring to fig. 6, a first embodiment of the present disclosure is an embodiment of performing phrase-level processing on an input sentence and strengthening the dependency relationship between phrases, and a second embodiment of the present disclosure is an embodiment of performing phrase-level processing on an input sentence without strengthening the dependency relationship between phrases. In addition, the abscissa of the graph represents the phrase length and the ordinate represents the BLEU difference between the disclosed scheme and the reference model. It can be seen that the solutions of the first and second embodiments of the present disclosure are significantly better in translation quality than the reference model at different phrase lengths.

In addition, for illustration, the present disclosure also provides the processing effect of the above method on a machine translation system, specifically referring to table 1.

TABLE 1

[Table 1 is provided as an image in the original document; it lists the BLEU scores, absolute improvements (Δ), parameter counts, and training speeds of the compared translation models.]

An increase of more than 0.5 BLEU points is generally considered a significant improvement. In table 1, Δ denotes the absolute improvement, the parameter count is given in millions (M), and the training speed is given in iterations per second. As can be seen from table 1, the disclosed scheme significantly improves translation quality, and the effect is especially pronounced after the dependency relationships between phrases are strengthened.

It should be noted that, besides the application scenario of machine translation, the network representation determined by the neural network-based information processing method according to the exemplary embodiment of the present disclosure may also be applied to other scenarios, and a better effect is obtained.

It should be noted that although the various steps of the methods of the present disclosure are depicted in the drawings in a particular order, this does not require or imply that these steps must be performed in this particular order, or that all of the depicted steps must be performed, to achieve desirable results. Additionally or alternatively, certain steps may be omitted, multiple steps combined into one step execution, and/or one step broken down into multiple step executions, etc.

Further, an information processing apparatus based on a neural network is also provided in the present exemplary embodiment.

Fig. 7 schematically shows a block diagram of a neural network-based information processing apparatus according to an exemplary embodiment of the present disclosure. Referring to fig. 7, the neural network-based information processing apparatus 7 according to an exemplary embodiment of the present disclosure may include a phrase level sequence determination module 71, a linear transformation module 73, an attention weight determination module 75, and a network representation sequence determination module 77.

Specifically, the phrase level sequence determining module 71 may be configured to obtain a target information sequence corresponding to the input information, and determine a phrase level representation sequence based on the target information sequence; wherein the phrase-level representation sequence includes a plurality of phrase elements; the linear transformation module 73 may be configured to perform linear transformation on the phrase-level representation sequence to obtain a request vector sequence, a key vector sequence, and a value vector sequence corresponding to the phrase-level representation sequence; the attention weight determining module 75 may be configured to calculate a logic similarity between the request vector sequence and the key vector sequence, and perform a nonlinear transformation on the logic similarity to obtain an attention weight distribution corresponding to each phrase element; the network representation sequence determination module 77 may be configured to determine a first network representation sequence corresponding to the input information based on the attention weight distribution and the value vector sequence.

With the information processing apparatus based on a neural network according to the exemplary embodiments of the present disclosure, on one hand, compared with techniques that operate only at the word level, a phrase-level representation sequence is used throughout the process, which can improve the information processing performance of the self-attention neural network; on another hand, the disclosed scheme can be applied to the field of machine translation, where it can greatly improve translation quality; on yet another hand, the configuration of phrases in the phrase-level representation sequence is highly flexible, multiple configuration schemes can be processed in parallel, and after the results are combined or compared, the accuracy of information processing is further ensured.

According to an exemplary embodiment of the present disclosure, referring to fig. 8, the phrase level sequence determination module 71 may include a phrase segmentation unit 801, a feature fusion unit 803, and a feature combination unit 805.

Specifically, the phrase segmentation unit 801 may be configured to perform phrase segmentation on the target information sequence to form a plurality of phrase groups; the feature fusion unit 803 may be configured to perform feature fusion on features in each phrase group to generate a feature vector corresponding to each phrase group; feature combination unit 805 may be configured to combine feature vectors corresponding to the phrase groups to generate the phrase-level representation sequence.

Referring to fig. 9, a phrase segmentation unit 801 may include a first segmentation subunit 901 according to an example embodiment of the present disclosure.

Specifically, the first segmentation subunit 901 may be configured to perform phrase segmentation on the target information sequence according to a predetermined phrase length to form a plurality of phrase groups.

Referring to fig. 10, the phrase segmentation unit 801 may include a second segmentation sub-unit 101 according to an exemplary embodiment of the present disclosure.

Specifically, the second segmentation subunit 101 may be configured to perform phrase segmentation on the target information sequence according to a syntactic structure of the input information to form a plurality of phrase groups.

According to an exemplary embodiment of the present disclosure, the feature combining unit 805 may be configured to perform: combining the feature vectors corresponding to the phrase groups to generate an intermediate representation sequence; and performing dependency relationship reinforcement on the feature vectors corresponding to the phrase groups in the intermediate representation sequence to generate the phrase-level representation sequence.

According to an exemplary embodiment of the present disclosure, the linear transformation module 73 may be configured to perform: and utilizing three parameter matrixes capable of being trained to respectively carry out linear transformation on the phrase-level representation sequences to obtain a request vector sequence, a key vector sequence and a value vector sequence corresponding to the phrase-level representation sequences.

According to an exemplary embodiment of the present disclosure, referring to fig. 11, the neural network-based information processing apparatus 11 may further include a network representation sequence combining module 111, as compared to the neural network-based information processing apparatus 7.

In particular, the network representation sequence combination module 111 may be configured to perform: determining a word level representation sequence corresponding to the input information; wherein the word-level representation sequence comprises a plurality of word elements; generating a request vector sequence, a key vector sequence and a value vector sequence corresponding to the word-level representation sequence, and determining an attention weight distribution corresponding to each of the word elements; determining a second network representation sequence corresponding to the input information based on the attention weight distribution corresponding to each word element and the value vector sequence corresponding to the word-level representation sequence; determining a final network representation sequence corresponding to the input information using the first network representation sequence and the second network representation sequence.

Since each functional module of the neural-network-based information processing apparatus corresponds to the steps of the method embodiments of the present invention described above, details are not repeated here.

Through the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments described herein may be implemented by software, or by software in combination with necessary hardware. Therefore, the technical solution according to the embodiments of the present disclosure may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (which may be a CD-ROM, a usb disk, a removable hard disk, etc.) or on a network, and includes several instructions to enable a computing device (which may be a personal computer, a server, a terminal device, or a network device, etc.) to execute the method according to the embodiments of the present disclosure.

Furthermore, the above-described figures are merely schematic illustrations of processes involved in methods according to exemplary embodiments of the invention, and are not intended to be limiting. It will be readily understood that the processes shown in the above figures are not intended to indicate or limit the chronological order of the processes. In addition, it is also readily understood that these processes may be performed synchronously or asynchronously, e.g., in multiple modules.

It should be noted that although in the above detailed description several modules or units of the device for action execution are mentioned, such a division is not mandatory. Indeed, the features and functionality of two or more modules or units described above may be embodied in one module or unit, according to embodiments of the present disclosure. Conversely, the features and functions of one module or unit described above may be further divided into embodiments by a plurality of modules or units.

Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.

It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is to be limited only by the terms of the appended claims.
