Information processing method and device

Document No.: 1338296. Publication date: 2020-07-17.

Abstract: This technology, "Information processing method and device", was designed by Tu Zhaopeng (涂兆鹏), Yang Baosong (杨宝嵩), and Wang Xing (王星) on 2018-11-19. An embodiment of the invention discloses an information processing method, comprising: acquiring a target text sequence corresponding to text information to be processed, wherein the target text sequence comprises a plurality of elements; obtaining a context vector according to the target text sequence, wherein the context vector reflects the dependency relationships among the elements; and encoding the target text sequence corresponding to the target text information according to the context vector to obtain a text encoding result. The embodiment encodes a discrete sequence using a context vector related to that sequence, thereby strengthening the dependency relationships among the elements of the discrete sequence, enhancing the performance of the neural network model, and improving the learning capability of the model.

1. A method of information processing, the method comprising:

acquiring a target text sequence corresponding to text information to be processed, wherein the target text sequence comprises a plurality of elements;

obtaining a context vector according to the target text sequence, wherein the context vector is used for reflecting the dependency relationship among the elements;

and encoding, according to the context vector, the target text sequence corresponding to the target text information to obtain a text encoding result.

2. The method according to claim 1, wherein acquiring the target text sequence corresponding to the text information to be processed comprises:

acquiring a target text sequence corresponding to the text information to be processed through a neural network;

accordingly, the context vector is learned from internal representations in the neural network.

3. The method of claim 1, wherein obtaining a context vector from the target text sequence comprises:

obtaining a vector of each element in the target text sequence;

and determining overall information of the target text sequence according to the vector of each element in the target text sequence, wherein the overall information is used for representing the context vector.

4. The method of claim 3, wherein the determining the overall information of the target text sequence according to the vector of each element in the target text sequence comprises:

and calculating the average value of the target text sequence according to the vector of each element in the target text sequence, and taking the average value as the overall information.

5. The method of claim 1, wherein obtaining a context vector from the target text sequence comprises:

obtaining L layers of text sequences corresponding to the target text sequence, wherein the L layers of text sequences are network layers generated before the target text sequence, and L is an integer greater than or equal to 1;

generating the context vector from the L-layer text sequence.

6. The method of claim 1, wherein obtaining a context vector from the target text sequence comprises:

obtaining L layers of text sequences corresponding to the target text sequence, wherein the L layers of text sequences are network layers generated before the target text sequence, and L is an integer greater than or equal to 1;

obtaining L layers of first context vectors according to the L layers of text sequences, wherein each layer of first context vector is an average value of elements in each layer of text sequence;

obtaining a second context vector according to the target text sequence, wherein the second context vector is an average value of elements in the target text sequence;

and calculating the context vector according to the L-layer first context vector and the second context vector.

7. The method according to any one of claims 1 to 6, wherein encoding the target text sequence corresponding to the target text information according to the context vector to obtain a text encoding result comprises:

determining a target request vector and a target key vector according to the context vector and the target text sequence, wherein the target request vector has a corresponding relation with elements in the target text sequence, and the target key vector has a corresponding relation with elements in the target text sequence;

determining the logical similarity corresponding to the target text sequence according to the target request vector and the target key vector;

and encoding the target text sequence corresponding to the target text information using the logical similarity to obtain a text encoding result.

8. The method of claim 7, wherein determining a target request vector and a target key vector from the context vector and the target text sequence comprises:

calculating an original request vector, an original key vector and an original value vector according to the target text sequence, wherein the original value vector is used for determining a target output vector corresponding to the target text sequence;

calculating a request vector scalar and a key vector scalar according to the context vector, the original request vector and the original key vector;

calculating the target request vector and the target key vector based on the context vector, the request vector scalar, and the key vector scalar.

9. An information processing apparatus characterized by comprising:

an acquisition module, configured to acquire a target text sequence corresponding to text information to be processed, wherein the target text sequence comprises a plurality of elements;

the acquisition module being further configured to obtain a context vector according to the target text sequence, wherein the context vector is used for reflecting the dependency relationship among the elements;

and an encoding module, configured to encode the target text sequence corresponding to the target text information according to the context vector obtained by the acquisition module, to obtain a text encoding result.

10. An information processing apparatus characterized by comprising: a memory, a transceiver, a processor, and a bus system;

wherein the memory is used for storing programs;

the processor is configured to execute the program in the memory, including performing the following steps:

acquiring a target text sequence corresponding to text information to be processed, wherein the target text sequence comprises a plurality of elements;

obtaining a context vector according to the target text sequence, wherein the context vector is used for reflecting the dependency relationship among the elements;

encoding the target text sequence corresponding to the target text information according to the context vector to obtain a text encoding result;

the bus system is used for connecting the memory and the processor so as to enable the memory and the processor to communicate.

11. A computer-readable storage medium comprising instructions that, when executed on a computer, cause the computer to perform the method of any of claims 1 to 8.
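Claims 3 to 6 name three ways of building the context vector but leave the exact arithmetic open. The following Python sketch illustrates one possible reading and is not the claimed implementation. The function names (`mean_vector`, `deep_context`, `deep_global_context`) are hypothetical, and the use of concatenation to combine layers in claims 5 and 6 is an assumption, since the claims only say the context vector is generated "from" or "according to" those layers.

```python
def mean_vector(seq):
    """Claims 3-4: average the element vectors of one sequence."""
    n = len(seq)
    return [sum(col) / n for col in zip(*seq)]


def deep_context(lower_layers):
    """Claim 5: build a per-element context from the L earlier layers.

    Assumption: the L representations of each element are concatenated.
    """
    length = len(lower_layers[0])
    return [[x for layer in lower_layers for x in layer[i]] for i in range(length)]


def deep_global_context(lower_layers, target_seq):
    """Claim 6: combine L first context vectors with the second context vector.

    Each first context vector is the mean of one earlier layer; the second
    is the mean of the target sequence. Assumption: they are concatenated.
    """
    first_vectors = [mean_vector(layer) for layer in lower_layers]
    second_vector = mean_vector(target_seq)
    return [x for vec in first_vectors + [second_vector] for x in vec]


# Two earlier layers (L = 2) and the target sequence, two elements each.
layer_0 = [[0.0, 0.0], [1.0, 1.0]]
layer_1 = [[1.0, 2.0], [3.0, 4.0]]
target = [[2.0, 0.0], [4.0, 2.0]]

c_global = mean_vector(target)
c_deep = deep_context([layer_0, layer_1])
c_deep_global = deep_global_context([layer_0, layer_1], target)
```

With these toy inputs, `c_global` is a single vector shared by all elements, `c_deep` holds one longer vector per element, and `c_deep_global` is a single vector whose length grows with the number of layers.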

Technical Field

The invention relates to the field of artificial intelligence, in particular to an information processing method and device.

Background

The attention mechanism has become a fundamental module in most deep learning models; it can dynamically select relevant representations in a network as needed. Research shows that attention plays a significant role in tasks such as machine translation and image annotation.

Currently, a self-attention neural network (SAN) model has been proposed based on the attention mechanism. The SAN model can calculate an attention weight for each element in a discrete sequence. For ease of understanding, refer to FIG. 1, which is a schematic diagram of the basic architecture of a SAN model modeling a discrete sequence in a prior-art scheme. As shown in the figure, the SAN can directly calculate the dependency relationships between hidden states in a neural network, and each upper-layer network representation is directly connected to the lower-layer network representations.

Refer to FIG. 2, which is a schematic diagram of a SAN model representing the relationship between two words in a conventional scheme. As shown in the figure, when a SAN model using the attention mechanism calculates the dependency between two words (for example, "talk" and "Sharon" in FIG. 2), it considers only the relationship between those two words. Consequently, for a discrete sequence, the network representation of the elements across the whole sequence is weak, which reduces the performance of the neural network model.
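The pairwise limitation described above can be made concrete with a small sketch of a conventional scaled dot-product self-attention layer. This is a generic illustration, not code from the patent: the helper names, the identity projection matrices, and the toy vectors are all assumptions chosen for readability.

```python
import math


def matmul(a, b):
    """Multiply two matrices stored as nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one row of logits."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(h, w_q, w_k, w_v):
    """Conventional SAN layer: logit [i][j] depends only on elements i and j."""
    q, k, v = matmul(h, w_q), matmul(h, w_k), matmul(h, w_v)
    scale = math.sqrt(len(q[0]))
    logits = [[sum(x * y for x, y in zip(qi, kj)) / scale for kj in k] for qi in q]
    weights = [softmax(row) for row in logits]
    return logits, matmul(weights, v)


identity = [[1.0, 0.0], [0.0, 1.0]]  # toy projections (assumption)
h = [[1.0, 2.0], [3.0, 1.0], [0.0, 1.0]]
h_changed = [h[0], h[1], [5.0, -3.0]]  # only the third element differs

logits_a, out_a = self_attention(h, identity, identity, identity)
logits_b, out_b = self_attention(h_changed, identity, identity, identity)

# The logit between elements 0 and 1 ignores the rest of the sequence.
pair_unchanged = logits_a[0][1] == logits_b[0][1]
```

Replacing the third element leaves the logit between the first two elements untouched, which is the sense in which the conventional layer "only considers the relationship between two words".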

Disclosure of Invention

Embodiments of the invention provide a text translation method, an information processing method, and corresponding devices. A discrete sequence is encoded using a context vector related to that sequence, which strengthens the dependency relationships among the elements of the discrete sequence, enhances the performance of the neural network model, and improves the learning capability of the model.

In view of the above, a first aspect of the present invention provides a method for text translation, including:

acquiring a target text sequence corresponding to target text information, wherein the target text sequence comprises a plurality of elements;

obtaining a context vector according to the target text sequence;

determining a target request vector and a target key vector according to the context vector and the target text sequence, wherein the target request vector has a corresponding relation with elements in the target text sequence, and the target key vector has a corresponding relation with elements in the target text sequence;

determining the logical similarity corresponding to the target text sequence according to the target request vector and the target key vector;

encoding the target text sequence corresponding to the target text information using the logical similarity to obtain a text encoding result;

and decoding the text coding result to obtain a text translation result corresponding to the target text information.

A second aspect of the present invention provides an information processing method, including:

acquiring a target text sequence corresponding to text information to be processed, wherein the target text sequence comprises a plurality of elements;

obtaining a context vector according to the target text sequence;

determining a target request vector and a target key vector according to the context vector and the target text sequence, wherein the target request vector has a corresponding relation with elements in the target text sequence, and the target key vector has a corresponding relation with elements in the target text sequence;

determining the logical similarity corresponding to the target text sequence according to the target request vector and the target key vector;

and encoding the target text sequence corresponding to the target text information using the logical similarity to obtain a text encoding result.
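The steps of the second aspect can be sketched end to end as follows. This is a minimal illustration under stated assumptions, not the patented implementation. The "request vector" is treated as what self-attention literature usually calls the query. The gating form (a sigmoid scalar per element that interpolates between the original vector and the projected context) is assumed, as are all parameter names (`w_q`, `u_q`, `v_q_h`, `v_q_c`, and so on) and the choice of the sequence mean as the context vector.

```python
import math


def matmul(a, b):
    """Multiply two matrices stored as nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def softmax(row):
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def gate_and_mix(orig, ctx_proj, v_h, v_c):
    """Compute one scalar per element, then blend each vector with the context."""
    scalars = [1.0 / (1.0 + math.exp(-(dot(row, v_h) + dot(ctx_proj, v_c)))) for row in orig]
    mixed = [[(1.0 - lam) * x + lam * c for x, c in zip(row, ctx_proj)]
             for row, lam in zip(orig, scalars)]
    return mixed, scalars


def context_aware_attention(h, c, p):
    # Original request (query), key and value vectors from the target sequence.
    q, k, v = matmul(h, p["w_q"]), matmul(h, p["w_k"]), matmul(h, p["w_v"])
    # Project the context vector into the request and key spaces.
    c_q, c_k = matmul([c], p["u_q"])[0], matmul([c], p["u_k"])[0]
    # Target request and key vectors, one per element of the sequence.
    q_hat, _ = gate_and_mix(q, c_q, p["v_q_h"], p["v_q_c"])
    k_hat, _ = gate_and_mix(k, c_k, p["v_k_h"], p["v_k_c"])
    # Logical similarity between every pair of elements, then the encoding.
    scale = math.sqrt(len(q_hat[0]))
    logits = [[dot(qi, kj) / scale for kj in k_hat] for qi in q_hat]
    weights = [softmax(row) for row in logits]
    return logits, matmul(weights, v)


eye = [[1.0, 0.0], [0.0, 1.0]]
zero = [0.0, 0.0]
params = {"w_q": eye, "w_k": eye, "w_v": eye, "u_q": eye, "u_k": eye,
          "v_q_h": zero, "v_q_c": zero, "v_k_h": zero, "v_k_c": zero}


def encode(h):
    c = [sum(col) / len(h) for col in zip(*h)]  # context: mean of the sequence
    return context_aware_attention(h, c, params)


logits_a, out_a = encode([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
logits_b, out_b = encode([[1.0, 0.0], [0.0, 1.0], [4.0, 4.0]])
```

In this toy setup, changing only the third element also changes the similarity between the first two elements, because the context vector carries information about the whole sequence into every request and key vector.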

A third aspect of the present invention provides a text translation apparatus including:

an acquisition module, configured to acquire a target text sequence corresponding to target text information, wherein the target text sequence comprises a plurality of elements;

the acquisition module being further configured to obtain a context vector according to the target text sequence;

a determining module, configured to determine a target request vector and a target key vector according to the context vector and the target text sequence acquired by the acquiring module, where the target request vector has a correspondence with an element in the target text sequence, and the target key vector has a correspondence with an element in the target text sequence;

the determining module is further configured to determine a logical similarity corresponding to the target text sequence according to the target request vector and the target key vector;

an encoding module, configured to encode the target text sequence corresponding to the target text information using the logical similarity determined by the determining module, to obtain a text encoding result;

and a decoding module, configured to decode the text encoding result produced by the encoding module, to obtain a text translation result corresponding to the target text information.

A fourth aspect of the present invention provides an information processing apparatus comprising:

an acquisition module, configured to acquire a target text sequence corresponding to text information to be processed, wherein the target text sequence comprises a plurality of elements;

the acquisition module being further configured to obtain a context vector according to the target text sequence;

a determining module, configured to determine a target request vector and a target key vector according to the context vector and the target text sequence acquired by the acquiring module, where the target request vector has a correspondence with an element in the target text sequence, and the target key vector has a correspondence with an element in the target text sequence;

the determining module is further configured to determine a logical similarity corresponding to the target text sequence according to the target request vector and the target key vector;

and an encoding module, configured to encode the target text sequence corresponding to the target text information using the logical similarity determined by the determining module, to obtain a text encoding result.

A fifth aspect of the present invention provides a text translation apparatus including: a memory, a transceiver, a processor, and a bus system;

wherein the memory is used for storing programs;

the processor is configured to execute the program in the memory, including performing the following steps:

acquiring a target text sequence corresponding to target text information, wherein the target text sequence comprises a plurality of elements;

obtaining a context vector according to the target text sequence;

determining a target request vector and a target key vector according to the context vector and the target text sequence, wherein the target request vector has a corresponding relation with elements in the target text sequence, and the target key vector has a corresponding relation with elements in the target text sequence;

determining the logical similarity corresponding to the target text sequence according to the target request vector and the target key vector;

encoding the target text sequence corresponding to the target text information using the logical similarity to obtain a text encoding result;

decoding the text encoding result to obtain a text translation result corresponding to the target text information;

the bus system is used for connecting the memory and the processor so as to enable the memory and the processor to communicate.

A sixth aspect of the present invention provides an information processing apparatus comprising: a memory, a transceiver, a processor, and a bus system;

wherein the memory is used for storing programs;

the processor is configured to execute the program in the memory, including performing the following steps:

acquiring a target text sequence corresponding to text information to be processed, wherein the target text sequence comprises a plurality of elements;

obtaining a context vector according to the target text sequence;

determining a target request vector and a target key vector according to the context vector and the target text sequence, wherein the target request vector has a corresponding relation with elements in the target text sequence, and the target key vector has a corresponding relation with elements in the target text sequence;

determining the logical similarity corresponding to the target text sequence according to the target request vector and the target key vector;

encoding the target text sequence corresponding to the target text information using the logical similarity to obtain a text encoding result;

the bus system is used for connecting the memory and the processor so as to enable the memory and the processor to communicate.

A seventh aspect of the present invention provides a computer-readable storage medium having stored therein instructions, which, when run on a computer, cause the computer to perform the method of the above-described aspects.

According to the technical scheme, the embodiment of the invention has the following advantages:

An embodiment of the invention provides an information processing method. First, a target text sequence corresponding to text information to be processed is acquired, wherein the target text sequence comprises a plurality of elements. A context vector is then obtained according to the target text sequence. Next, a target request vector and a target key vector are determined using the context vector and the target text sequence, wherein the target request vector has a corresponding relation with the elements in the target text sequence, and the target key vector has a corresponding relation with the elements in the target text sequence. Finally, the logical similarity corresponding to the target text sequence is determined according to the target request vector and the target key vector, and the target text sequence corresponding to the target text information is encoded using the logical similarity to obtain a text encoding result. In this way, the discrete sequence is encoded using a context vector related to that sequence, so that the dependency relationships among the elements of the discrete sequence are strengthened, the performance of the neural network model is enhanced, and the learning capability of the model is improved.

Drawings

FIG. 1 is a schematic diagram of a basic architecture of a SAN model modeling a discrete sequence in a prior art scheme;

FIG. 2 is a diagram of a SAN model representing the relationship between two words in a prior art scheme;

FIG. 3 is a block diagram of a text translation system according to an embodiment of the present invention;

FIG. 4 is a schematic diagram of a computing process of the SAN model according to an embodiment of the present invention;

FIG. 5 is a diagram of an embodiment of a method for text translation according to an embodiment of the present invention;

FIG. 6 is a diagram of an embodiment of a method for processing information according to an embodiment of the present invention;

FIG. 7 is a diagram of an embodiment of a global context vector in an embodiment of the present invention;

FIG. 8 is a diagram of an embodiment of a depth context vector in an embodiment of the present invention;

FIG. 9 is a diagram of an embodiment of a deep global context vector in an embodiment of the present invention;

FIG. 10 is a schematic structural diagram of a stacked multi-head self-attention network according to an embodiment of the present invention;

FIG. 11 is a schematic diagram illustrating a comparison of a SAN model for translation in an application scenario of the present invention;

FIG. 12 is a diagram of an embodiment of a text translation apparatus according to an embodiment of the present invention;

FIG. 13 is a diagram showing an embodiment of an information processing apparatus according to the present invention;

FIG. 14 is a schematic diagram of another embodiment of an information processing apparatus according to an embodiment of the present invention;

FIG. 15 is a schematic structural diagram of a terminal device in the embodiment of the present invention;

FIG. 16 is a schematic structural diagram of a server in an embodiment of the present invention.

Detailed Description

Embodiments of the invention provide a text translation method, an information processing method, and corresponding devices. A discrete sequence is encoded using a context vector related to that sequence, which strengthens the dependency relationships among the elements of the discrete sequence, enhances the performance of the neural network model, and improves the learning capability of the model.
