Text processing method and device, computer equipment and computer readable storage medium

Document No.: 699310 | Publication date: 2021-05-04

Description: This technical solution, "Text processing method and device, computer equipment and computer readable storage medium" (文本处理方法、装置、计算机设备及计算机可读存储介质), was designed and created by Chen Xiaoliang, Zhao Ang, Ye Sen and Feng Dahang on 2021-01-13. Main content: The application provides a text processing method, apparatus, computer equipment and computer-readable storage medium, belonging to the technical field of natural language processing. The application first determines, from the text data to be normalized, the text segment to be subjected to text normalization, and performs text normalization processing on that segment only, reducing the amount of calculation; in addition, the embodiments of the application perform text normalization processing on the text segment based on its semantic features, and then replace the original text segment with the processed text segment to obtain new text data. This text normalization approach can effectively normalize text obtained through voice recognition, improving the readability and coherence of the text data, thereby ensuring the user's reading experience and improving text processing efficiency.

1. A method of text processing, the method comprising:

acquiring first text data; determining a first text segment to be subjected to text normalization from the first text data;

extracting semantic features of the first text segment, wherein the semantic features comprise at least one of a first type of semantic feature, a second type of semantic feature and a third type of semantic feature; the first type of semantic feature is used for indicating semantic information of the first text segment, the second type of semantic feature is used for indicating semantic information of the context text segment of the first text segment, and the third type of semantic feature is used for indicating pinyin information of the context text segment of the first text segment;

based on the semantic features of the first text segment, performing text normalization processing on the first text segment to obtain a second text segment;

and replacing the first text segment with the second text segment to obtain second text data.

2. The method according to claim 1, wherein the performing text normalization processing on the first text segment based on the semantic features of the first text segment to obtain a second text segment comprises:

inputting the semantic features of the first text segment into an encoder network of a text normalization model, performing feature mapping processing on the semantic features through the encoder network, and outputting encoding features of the semantic features;

inputting the encoding features into a decoder network of the text normalization model, and determining first probability distribution features corresponding to the encoding features through the decoder network;

inputting the first probability distribution features into a linear fully connected layer of the text normalization model, and concatenating the first probability distribution features through the linear fully connected layer to obtain concatenated first probability distribution features;

and inputting the concatenated first probability distribution features into a normalization layer of the text normalization model, and outputting the second text segment through the normalization layer.

3. The method according to claim 2, wherein the performing feature mapping processing on the semantic features through the encoder network, and outputting the encoding features of the semantic features comprises:

inputting the semantic features into a multi-head attention layer of the encoder network, and acquiring multi-level features of the semantic features through the multi-head attention layer;

inputting the multi-level features of the semantic features into a first normalization layer of the encoder network, and normalizing a first superposition feature through the first normalization layer to obtain first-type normalized features, wherein the first superposition feature is the superposition of the semantic features and the multi-level features of the semantic features;

inputting the first-type normalized features into a forward fully connected layer of the encoder network, and concatenating the first-type normalized features through the forward fully connected layer to obtain concatenated first-type normalized features;

and inputting the concatenated first-type normalized features into a second normalization layer of the encoder network, and normalizing the first-type normalized features and the concatenated first-type normalized features through the second normalization layer to obtain the encoding features.

4. The method of claim 2, wherein the determining, through the decoder network, the first probability distribution features corresponding to the encoding features comprises:

inputting the encoding features into a multi-head attention layer of the decoder network, and acquiring multi-level features of the encoding features through the multi-head attention layer;

inputting the multi-level features into a first normalization layer of the decoder network, and normalizing a second superposition feature through the first normalization layer to obtain second-type normalized features, wherein the second superposition feature is the superposition of the encoding features and the multi-level features of the encoding features;

inputting the second-type normalized features into a forward fully connected layer of the decoder network, and concatenating the second-type normalized features through the forward fully connected layer to obtain concatenated second-type normalized features;

and inputting the concatenated second-type normalized features into a second normalization layer of the decoder network, and normalizing the second-type normalized features and the concatenated second-type normalized features through the second normalization layer to obtain the first probability distribution features.

5. The method of claim 1, wherein the determining, from the first text data, a first text segment to be subjected to text normalization comprises:

performing word segmentation processing on the first text data;

inputting the characters obtained after the word segmentation processing into a text detection model, and determining the first text segment from the first text data through the text detection model.

6. The method of claim 5, wherein the inputting the characters obtained after the word segmentation processing into the text detection model, and determining the first text segment from the first text data through the text detection model comprises:

inputting the characters into an input layer of the text detection model, and acquiring text representation features of the characters through the input layer, wherein the text representation features are used for indicating word indexes of the characters in a dictionary;

inputting the text representation features into a word embedding layer of the text detection model, performing feature mapping processing on the text representation features through the word embedding layer, and outputting embedding features;

inputting the embedded features into a bidirectional recurrent neural network layer of the text detection model, and determining probability distribution features of the characters marked as various labels through the bidirectional recurrent neural network layer, wherein the labels are used for indicating the types of the characters;

inputting the probability distribution features into a fully connected layer of the text detection model, and concatenating the probability distribution features through the fully connected layer to obtain concatenated probability distribution features;

inputting the concatenated probability distribution features into a conditional random field output layer of the text detection model, and determining the label of each character through the conditional random field output layer;

determining the first text segment from the first text data based on the labels of the characters.

7. The method of claim 1, wherein extracting semantic features of the first text segment comprises:

inputting the first text segment into an input embedding layer of a text normalization model, and acquiring an encoding vector of the first text segment through the input embedding layer as the semantic features of the first text segment.

8. A method of text processing, the method comprising:

acquiring voice data; performing voice recognition based on the voice data to obtain first text data;

determining a first text segment to be subjected to text normalization from the first text data;

extracting semantic features of the first text segment, wherein the semantic features comprise at least one of a first type of semantic feature, a second type of semantic feature and a third type of semantic feature; the first type of semantic feature is used for indicating semantic information of the first text segment, the second type of semantic feature is used for indicating semantic information of the context text segment of the first text segment, and the third type of semantic feature is used for indicating pinyin information of the context text segment of the first text segment;

based on the semantic features of the first text segment, performing text normalization processing on the first text segment to obtain a second text segment;

and replacing the first text segment with the second text segment to obtain second text data.

9. A text processing apparatus, characterized in that the apparatus comprises:

the acquisition module is used for acquiring first text data;

the determining module is used for determining a first text segment to be subjected to text normalization from the first text data;

the extraction module is used for extracting semantic features of the first text segment, wherein the semantic features comprise at least one of a first type of semantic feature, a second type of semantic feature and a third type of semantic feature; the first type of semantic feature is used for indicating semantic information of the first text segment, the second type of semantic feature is used for indicating semantic information of the context text segment of the first text segment, and the third type of semantic feature is used for indicating pinyin information of the context text segment of the first text segment;

the normalization processing module is used for performing text normalization processing on the first text segment based on the semantic features of the first text segment to obtain a second text segment;

and the replacing module is used for replacing the first text segment with the second text segment to obtain second text data.

10. A text processing apparatus, characterized in that the apparatus comprises:

the acquisition module is used for acquiring voice data;

the voice recognition module is used for carrying out voice recognition based on the voice data to obtain first text data;

the determining module is used for determining a first text segment to be subjected to text normalization from the first text data;

the extraction module is used for extracting semantic features of the first text segment, wherein the semantic features comprise at least one of a first type of semantic feature, a second type of semantic feature and a third type of semantic feature; the first type of semantic feature is used for indicating semantic information of the first text segment, the second type of semantic feature is used for indicating semantic information of the context text segment of the first text segment, and the third type of semantic feature is used for indicating pinyin information of the context text segment of the first text segment;

the normalization processing module is used for performing text normalization processing on the first text segment based on the semantic features of the first text segment to obtain a second text segment;

and the replacing module is used for replacing the first text segment with the second text segment to obtain second text data.

11. A computer device comprising one or more processors and one or more memories having at least one program code stored therein, the program code loaded into and executed by the one or more processors to implement the operations performed by the text processing method of any one of claims 1 to 7; or, the operations performed by the text processing method of claim 8.

12. A computer-readable storage medium having at least one program code stored therein, the program code being loaded and executed by a processor to implement the operations performed by the text processing method according to any one of claims 1 to 7; or, the operations performed by the text processing method of claim 8.

Technical Field

The present application relates to the field of natural language processing technologies, and in particular, to a text processing method and apparatus, a computer device, and a computer-readable storage medium.

Background

Due to the particular nature of voice data, text obtained by performing voice recognition on voice data is generally poor in readability and coherence, and needs to be normalized. For example, owing to the language habits of a speaker or the environment the speaker is in, the speaker's voice data may contain redundant or repeated words, or words in reversed order; when voice recognition is performed on such voice data, the resulting text also contains these redundant, repeated, or order-reversed words, which seriously affects the reading experience of the user. Therefore, a text processing method is needed to normalize text obtained through voice recognition.

Disclosure of Invention

The embodiments of the present application provide a text processing method and apparatus, computer device, and computer-readable storage medium, which can effectively normalize text obtained through voice recognition, improve the readability and coherence of text data, ensure the user's reading experience, improve text processing efficiency, and reduce the amount of calculation required for text normalization. The technical scheme provided by the application is as follows:

in one aspect, a text processing method is provided, and the method includes:

acquiring first text data; determining a first text segment to be subjected to text normalization from the first text data;

extracting semantic features of the first text segment, wherein the semantic features comprise at least one of a first type of semantic feature, a second type of semantic feature and a third type of semantic feature; the first type of semantic feature is used for indicating semantic information of the first text segment, the second type of semantic feature is used for indicating semantic information of the context text segment of the first text segment, and the third type of semantic feature is used for indicating pinyin information of the context text segment of the first text segment;

based on the semantic features of the first text segment, carrying out text normalization processing on the first text segment to obtain a second text segment;

and replacing the first text segment with the second text segment to obtain second text data.

In a possible implementation manner, the performing text normalization processing on the first text segment based on the semantic features of the first text segment to obtain a second text segment includes:

inputting the semantic features of the first text segment into an encoder network of a text normalization model, performing feature mapping processing on the semantic features through the encoder network, and outputting encoding features of the semantic features;

inputting the encoding features into a decoder network of the text normalization model, and determining first probability distribution features corresponding to the encoding features through the decoder network;

inputting the first probability distribution features into a linear fully connected layer of the text normalization model, and concatenating the first probability distribution features through the linear fully connected layer to obtain concatenated first probability distribution features;

and inputting the concatenated first probability distribution features into a normalization layer of the text normalization model, and outputting the second text segment through the normalization layer.

In a possible implementation manner, the performing, by the encoder network, feature mapping processing on the semantic features and outputting the encoding features of the semantic features includes:

inputting the semantic features into a multi-head attention layer of the encoder network, and acquiring multi-level features of the semantic features through the multi-head attention layer;

inputting the multi-level features of the semantic features into a first normalization layer of the encoder network, and normalizing a first superposition feature through the first normalization layer to obtain first-type normalized features, wherein the first superposition feature is the superposition of the semantic features and the multi-level features of the semantic features;

inputting the first-type normalized features into a forward fully connected layer of the encoder network, and concatenating the first-type normalized features through the forward fully connected layer to obtain concatenated first-type normalized features;

and inputting the concatenated first-type normalized features into a second normalization layer of the encoder network, and normalizing the first-type normalized features and the concatenated first-type normalized features through the second normalization layer to obtain the encoding features.

In one possible implementation, the determining, through the decoder network, the first probability distribution features corresponding to the encoding features includes:

inputting the encoding features into a multi-head attention layer of the decoder network, and acquiring multi-level features of the encoding features through the multi-head attention layer;

inputting the multi-level features into a first normalization layer of the decoder network, and normalizing a second superposition feature through the first normalization layer to obtain second-type normalized features, wherein the second superposition feature is the superposition of the encoding features and the multi-level features of the encoding features;

inputting the second-type normalized features into a forward fully connected layer of the decoder network, and concatenating the second-type normalized features through the forward fully connected layer to obtain concatenated second-type normalized features;

and inputting the concatenated second-type normalized features into a second normalization layer of the decoder network, and normalizing the second-type normalized features and the concatenated second-type normalized features through the second normalization layer to obtain the first probability distribution features.
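For concreteness, the following is a minimal sketch, in PyTorch, of one encoder layer following the flow described above: multi-head attention, normalization of the superposition (residual) feature, a forward fully connected block, and a second normalization. The dimensions, module choices and the standard feed-forward block are illustrative assumptions rather than the patent's exact design; the decoder layer follows the same pattern with the encoding features as input.

```python
import torch
import torch.nn as nn

class EncoderLayer(nn.Module):
    def __init__(self, d_model=512, n_heads=8, d_ff=2048):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(d_model)    # first normalization layer
        self.ff = nn.Sequential(              # forward fully connected layer
            nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
        self.norm2 = nn.LayerNorm(d_model)    # second normalization layer

    def forward(self, x):
        attn_out, _ = self.attn(x, x, x)      # multi-level features of the input
        h = self.norm1(x + attn_out)          # normalize the first superposition feature
        return self.norm2(h + self.ff(h))     # second normalization over both feature sets

encoded = EncoderLayer()(torch.randn(1, 10, 512))  # (batch, seq_len, d_model)
```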

In one possible implementation manner, the determining, from the first text data, a first text segment to be subjected to text normalization includes:

performing word segmentation processing on the first text data;

inputting the characters obtained after the word segmentation processing into a text detection model, and determining the first text segment from the first text data through the text detection model.

In a possible implementation manner, the inputting the characters obtained after the word segmentation processing into the text detection model, and determining the first text segment from the first text data through the text detection model includes:

inputting the characters into an input layer of the text detection model, and acquiring text representation features of the characters through the input layer, wherein the text representation features are used for indicating the word indexes of the characters in a dictionary;

inputting the text representation features into a word embedding layer of the text detection model, performing feature mapping processing on the text representation features through the word embedding layer, and outputting embedding features;

inputting the embedding features into a bidirectional recurrent neural network layer of the text detection model, and determining probability distribution features of the characters marked as various labels through the bidirectional recurrent neural network layer, wherein the labels are used for indicating the types of the characters;

inputting the probability distribution features into a fully connected layer of the text detection model, and concatenating the probability distribution features through the fully connected layer to obtain concatenated probability distribution features;

inputting the concatenated probability distribution features into a conditional random field output layer of the text detection model, and determining the label of each character through the conditional random field output layer;

determining the first text segment from the first text data based on the labels of the characters.

In one possible implementation, the extracting semantic features of the first text segment includes:

inputting the first text segment into an input embedding layer of a text normalization model, and acquiring the encoding vector of the first text segment through the input embedding layer as the semantic features of the first text segment.

In one possible implementation, the training process of the text detection model includes:

obtaining first sample text data and sample labels of the first sample text data, wherein a sample label is used for indicating the type of a sample text segment to be normalized in the first sample text data;

the text detection model is trained based on the first sample text data and sample labels of the first sample text data.

In one possible implementation, the training process of the text normalization model includes:

acquiring second sample text data and normalized text data corresponding to the second sample text data;

training the text normalization model based on the second sample text data and the normalized text data.

In one aspect, a text processing method is provided, and the method includes:

acquiring voice data; performing voice recognition based on the voice data to obtain first text data;

determining a first text segment to be subjected to text normalization from the first text data;

extracting semantic features of the first text segment, wherein the semantic features comprise at least one of a first type of semantic feature, a second type of semantic feature and a third type of semantic feature; the first type of semantic feature is used for indicating semantic information of the first text segment, the second type of semantic feature is used for indicating semantic information of the context text segment of the first text segment, and the third type of semantic feature is used for indicating pinyin information of the context text segment of the first text segment;

based on the semantic features of the first text segment, carrying out text normalization processing on the first text segment to obtain a second text segment;

and replacing the first text segment with the second text segment to obtain second text data.

In one aspect, a text processing apparatus is provided, the apparatus including:

the acquisition module is used for acquiring first text data;

the determining module is used for determining a first text segment to be subjected to text normalization from the first text data;

the extraction module is used for extracting semantic features of the first text segment, wherein the semantic features comprise at least one of a first type of semantic feature, a second type of semantic feature and a third type of semantic feature; the first type of semantic feature is used for indicating semantic information of the first text segment, the second type of semantic feature is used for indicating semantic information of the context text segment of the first text segment, and the third type of semantic feature is used for indicating pinyin information of the context text segment of the first text segment;

the normalization processing module is used for performing text normalization processing on the first text segment based on the semantic features of the first text segment to obtain a second text segment;

and the replacing module is used for replacing the first text segment with the second text segment to obtain second text data.

In a possible implementation manner, the normalization processing module includes a first processing unit, a second processing unit, a third processing unit and a fourth processing unit;

the first processing unit is used for inputting the semantic features of the first text segment into an encoder network of a text normalization model, performing feature mapping processing on the semantic features through the encoder network, and outputting encoding features of the semantic features;

the second processing unit is used for inputting the encoding features into a decoder network of the text normalization model, and determining first probability distribution features corresponding to the encoding features through the decoder network;

the third processing unit is used for inputting the first probability distribution features into a linear fully connected layer of the text normalization model, and concatenating the first probability distribution features through the linear fully connected layer to obtain concatenated first probability distribution features;

the fourth processing unit is used for inputting the concatenated first probability distribution features into a normalization layer of the text normalization model, and outputting the second text segment through the normalization layer.

In a possible implementation manner, the first processing unit is used for inputting the semantic features into a multi-head attention layer of the encoder network, and acquiring multi-level features of the semantic features through the multi-head attention layer; inputting the multi-level features of the semantic features into a first normalization layer of the encoder network, and normalizing a first superposition feature through the first normalization layer to obtain first-type normalized features, wherein the first superposition feature is the superposition of the semantic features and the multi-level features of the semantic features; inputting the first-type normalized features into a forward fully connected layer of the encoder network, and concatenating the first-type normalized features through the forward fully connected layer to obtain concatenated first-type normalized features; and inputting the concatenated first-type normalized features into a second normalization layer of the encoder network, and normalizing the first-type normalized features and the concatenated first-type normalized features through the second normalization layer to obtain the encoding features.

In a possible implementation manner, the second processing unit is used for inputting the encoding features into a multi-head attention layer of the decoder network, and acquiring multi-level features of the encoding features through the multi-head attention layer; inputting the multi-level features into a first normalization layer of the decoder network, and normalizing a second superposition feature through the first normalization layer to obtain second-type normalized features, wherein the second superposition feature is the superposition of the encoding features and the multi-level features of the encoding features; inputting the second-type normalized features into a forward fully connected layer of the decoder network, and concatenating the second-type normalized features through the forward fully connected layer to obtain concatenated second-type normalized features; and inputting the concatenated second-type normalized features into a second normalization layer of the decoder network, and normalizing the second-type normalized features and the concatenated second-type normalized features through the second normalization layer to obtain the first probability distribution features.

In a possible implementation manner, the determining module comprises a word segmentation unit and a determining unit;

the word segmentation unit is used for performing word segmentation processing on the first text data;

the determining unit is used for inputting the characters obtained after the word segmentation processing into a text detection model, and determining the first text segment from the first text data through the text detection model.

In a possible implementation manner, the determining unit is used for inputting the characters into an input layer of the text detection model, and acquiring text representation features of the characters through the input layer, wherein the text representation features are used for indicating the word indexes of the characters in a dictionary; inputting the text representation features into a word embedding layer of the text detection model, performing feature mapping processing on the text representation features through the word embedding layer, and outputting embedding features; inputting the embedding features into a bidirectional recurrent neural network layer of the text detection model, and determining probability distribution features of the characters marked as various labels through the bidirectional recurrent neural network layer, wherein the labels are used for indicating the types of the characters; inputting the probability distribution features into a fully connected layer of the text detection model, and concatenating the probability distribution features through the fully connected layer to obtain concatenated probability distribution features; inputting the concatenated probability distribution features into a conditional random field output layer of the text detection model, and determining the label of each character through the conditional random field output layer; and determining the first text segment from the first text data based on the labels of the characters.

In a possible implementation manner, the extraction module is used for inputting the first text segment into an input embedding layer of a text normalization model, and acquiring, through the input embedding layer, an encoding vector of the first text segment as the semantic features of the first text segment.

In one possible implementation, the training process of the text detection model includes:

obtaining first sample text data and sample labels of the first sample text data, wherein a sample label is used for indicating the type of a sample text segment to be normalized in the first sample text data;

the text detection model is trained based on the first sample text data and sample labels of the first sample text data.

In one possible implementation, the training process of the text normalization model includes:

acquiring second sample text data and normalized text data corresponding to the second sample text data;

training the text normalization model based on the second sample text data and the normalized text data.

In one aspect, a text processing apparatus is provided, the apparatus including:

the acquisition module is used for acquiring voice data;

the voice recognition module is used for carrying out voice recognition based on the voice data to obtain first text data;

the determining module is used for determining a first text segment to be subjected to text normalization from the first text data;

the extraction module is used for extracting semantic features of the first text segment, wherein the semantic features comprise at least one of a first type of semantic feature, a second type of semantic feature and a third type of semantic feature; the first type of semantic feature is used for indicating semantic information of the first text segment, the second type of semantic feature is used for indicating semantic information of the context text segment of the first text segment, and the third type of semantic feature is used for indicating pinyin information of the context text segment of the first text segment;

the normalization processing module is used for performing text normalization processing on the first text segment based on the semantic features of the first text segment to obtain a second text segment;

and the replacing module is used for replacing the first text segment with the second text segment to obtain second text data.

In one aspect, a computer device is provided that includes one or more processors and one or more memories having at least one program code stored therein, the program code being loaded and executed by the one or more processors to perform the operations performed by the text processing method.

In one aspect, a computer-readable storage medium having at least one program code stored therein is provided, the program code being loaded and executed by a processor to implement the operations performed by the text processing method.

In one aspect, a computer program product is provided that includes computer program code that is loaded and executed by a processor to perform the operations performed by the text processing method.

According to the scheme provided by the application, the text segment to be subjected to text normalization is first determined from the text data to be normalized, and only that text segment is subjected to text normalization, which reduces the amount of calculation; in addition, the text segment is normalized based on its semantic features, and the processed text segment then replaces the original text segment to obtain new text data. This normalization approach effectively normalizes text obtained through speech recognition, improving the readability and coherence of the text data, thereby ensuring the user's reading experience and improving text processing efficiency.

Drawings

In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed in the description of the embodiments are briefly introduced below. It is apparent that the drawings in the following description show only some embodiments of the present application, and that those skilled in the art can obtain other drawings based on these drawings without creative effort.

Fig. 1 is a schematic diagram of an implementation environment of a text processing method according to an embodiment of the present application;

fig. 2 is a flowchart of a text processing method provided in an embodiment of the present application;

fig. 3 is a flowchart of a text processing method provided in an embodiment of the present application;

fig. 4 is a flowchart of a text processing method provided in an embodiment of the present application;

FIG. 5 is a schematic processing diagram of a text detection model according to an embodiment of the present application;

FIG. 6 is a schematic diagram of a process of a text normalization model according to an embodiment of the present application;

fig. 7 is a flowchart of a text processing method provided in an embodiment of the present application;

FIG. 8 is a schematic structural diagram of a text processing apparatus according to an embodiment of the present application;

fig. 9 is a schematic structural diagram of a text processing apparatus according to an embodiment of the present application;

fig. 10 is a schematic structural diagram of a terminal according to an embodiment of the present application;

fig. 11 is a schematic structural diagram of a server according to an embodiment of the present application.

Detailed Description

To make the objects, technical solutions and advantages of the present application more clear, embodiments of the present application will be described in further detail below with reference to the accompanying drawings.

Fig. 1 is a schematic diagram of an implementation environment of a text processing method provided in an embodiment of the present application, and referring to fig. 1, the implementation environment includes: a terminal 101 and a server 102.

The terminal 101 may be at least one of a smartphone, a game console, a desktop computer, a tablet computer, an e-book reader, an MP3 (Moving Picture Experts Group Audio Layer III) player, an MP4 (Moving Picture Experts Group Audio Layer IV) player, and a laptop computer. The terminal 101 and the server 102 are connected by wired or wireless communication, which is not limited in the embodiment of the present application. The terminal 101 collects voice data of a user through a microphone assembly and then sends the collected voice data to the server 102, so that the server 102 forwards the voice data to the corresponding other terminals. The microphone assembly may be embedded in or externally connected to the terminal 101, which is not limited in the embodiment of the present application. After obtaining the voice data, the terminal 101 may also perform voice recognition on the voice data, perform text normalization processing on the text data obtained by the voice recognition to obtain normalized text data, and then send the normalized text data to the server 102, so that the server 102 forwards it to the corresponding other terminals.

The terminal 101 may also receive voice data sent by other terminals through the server 102, send a text conversion request to the server 102 in response to a user triggering the text conversion control corresponding to the voice data, and receive the text data corresponding to the voice data returned by the server 102 based on the text conversion request. Alternatively, in response to a user triggering the text conversion control corresponding to the received voice data, the terminal 101 performs voice recognition on the voice data, performs text normalization processing on the text data obtained by the voice recognition to obtain normalized text data, and displays the normalized text data.

The terminal 101 generally refers to one of a plurality of terminals, and the embodiment of the present application is illustrated with the terminal 101. Those skilled in the art will appreciate that the number of terminals may be greater or fewer; for example, there may be only a few terminals, or several tens, hundreds, or more. The embodiment of the present application does not limit the number of terminals 101 or the device type.

The server 102 may be at least one of a single server, a plurality of servers, a cloud computing platform, and a virtualization center. The server 102 and the terminal 101 are connected by wired or wireless communication, which is not limited in the embodiment of the present application. The server 102 receives the voice data sent by the terminal 101 and forwards it to the corresponding terminal; alternatively, the server 102 receives the normalized text data sent by the terminal 101 and forwards it to the corresponding terminal. The server 102 may also receive a text conversion request sent by the terminal 101, perform voice recognition on the voice data corresponding to the text conversion request, perform text normalization processing on the text data obtained by the voice recognition to obtain normalized text data, and send the normalized text data to the terminal 101. Optionally, the number of servers may be greater or fewer, which is not limited in the embodiment of the present application. Of course, the server 102 may also include other functional servers to provide more comprehensive and diverse services.

Fig. 2 is a flowchart of a text processing method provided in an embodiment of the present application, and referring to fig. 2, the method includes:

201. The computer device acquires first text data, and determines, from the first text data, a first text segment to be subjected to text normalization.

202. The computer device extracts semantic features of the first text segment, wherein the semantic features comprise at least one of a first type of semantic feature, a second type of semantic feature and a third type of semantic feature; the first type of semantic feature is used for indicating semantic information of the first text segment, the second type of semantic feature is used for indicating semantic information of the context text segment of the first text segment, and the third type of semantic feature is used for indicating pinyin information of the context text segment of the first text segment.

203. The computer device performs text normalization processing on the first text segment based on the semantic features of the first text segment to obtain a second text segment.

204. The computer device replaces the first text segment with the second text segment to obtain second text data.

According to the scheme provided by the embodiment of the application, the text segment to be subjected to text normalization is first determined from the text data to be normalized, and only that text segment is subjected to text normalization, which reduces the amount of calculation; in addition, the text segment is normalized based on its semantic features, and the processed text segment then replaces the original text segment to obtain new text data. This normalization approach effectively normalizes text obtained through speech recognition, improving the readability and coherence of the text data, thereby ensuring the user's reading experience and improving text processing efficiency.
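Taken together, steps 201 to 204 form a detect-and-rewrite flow over the input text. The sketch below illustrates that flow only; the two helpers are hypothetical placeholders for the text detection model and the text normalization model described in the implementations that follow.

```python
def detect_segment(text: str) -> tuple[int, int]:
    # Placeholder for step 201: a real implementation runs the text detection
    # model and returns the span of the first text segment to be normalized.
    return 0, len(text)

def normalize_segment(segment: str, context: str) -> str:
    # Placeholder for steps 202-203: a real implementation extracts semantic
    # features and runs the encoder-decoder text normalization model.
    return segment

def process_text(first_text_data: str) -> str:
    start, end = detect_segment(first_text_data)
    second_segment = normalize_segment(first_text_data[start:end], first_text_data)
    # Step 204: replace the first text segment with the second text segment.
    return first_text_data[:start] + second_segment + first_text_data[end:]
```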

In a possible implementation manner, the performing text normalization processing on the first text segment based on the semantic features of the first text segment to obtain a second text segment includes:

inputting the semantic features of the first text segment into an encoder network of a text normalization model, performing feature mapping processing on the semantic features through the encoder network, and outputting encoding features of the semantic features;

inputting the encoding features into a decoder network of the text normalization model, and determining first probability distribution features corresponding to the encoding features through the decoder network;

inputting the first probability distribution features into a linear fully connected layer of the text normalization model, and concatenating the first probability distribution features through the linear fully connected layer to obtain concatenated first probability distribution features;

and inputting the concatenated first probability distribution features into a normalization layer of the text normalization model, and outputting the second text segment through the normalization layer.

In a possible implementation manner, the performing, by the encoder network, feature mapping processing on the semantic features and outputting the encoding features of the semantic features includes:

inputting the semantic features into a multi-head attention layer of the encoder network, and acquiring multi-level features of the semantic features through the multi-head attention layer;

inputting the multi-level features of the semantic features into a first normalization layer of the encoder network, and normalizing a first superposition feature through the first normalization layer to obtain first-type normalized features, wherein the first superposition feature is the superposition of the semantic features and the multi-level features of the semantic features;

inputting the first-type normalized features into a forward fully connected layer of the encoder network, and concatenating the first-type normalized features through the forward fully connected layer to obtain concatenated first-type normalized features;

and inputting the concatenated first-type normalized features into a second normalization layer of the encoder network, and normalizing the first-type normalized features and the concatenated first-type normalized features through the second normalization layer to obtain the encoding features.

In one possible implementation, the determining, through the decoder network, the first probability distribution features corresponding to the encoding features includes:

inputting the encoding features into a multi-head attention layer of the decoder network, and acquiring multi-level features of the encoding features through the multi-head attention layer;

inputting the multi-level features into a first normalization layer of the decoder network, and normalizing a second superposition feature through the first normalization layer to obtain second-type normalized features, wherein the second superposition feature is the superposition of the encoding features and the multi-level features of the encoding features;

inputting the second-type normalized features into a forward fully connected layer of the decoder network, and concatenating the second-type normalized features through the forward fully connected layer to obtain concatenated second-type normalized features;

and inputting the concatenated second-type normalized features into a second normalization layer of the decoder network, and normalizing the second-type normalized features and the concatenated second-type normalized features through the second normalization layer to obtain the first probability distribution features.

In one possible implementation manner, the determining, from the first text data, a first text segment to be subjected to text normalization includes:

performing word segmentation processing on the first text data;

inputting the characters obtained after the word segmentation processing into a text detection model, and determining the first text segment from the first text data through the text detection model.

In a possible implementation manner, the inputting the characters obtained after the word segmentation processing into the text detection model, and determining the first text segment from the first text data through the text detection model includes:

inputting the characters into an input layer of the text detection model, and acquiring text representation features of the characters through the input layer, wherein the text representation features are used for indicating the word indexes of the characters in a dictionary;

inputting the text representation features into a word embedding layer of the text detection model, performing feature mapping processing on the text representation features through the word embedding layer, and outputting embedding features;

inputting the embedding features into a bidirectional recurrent neural network layer of the text detection model, and determining probability distribution features of the characters marked as various labels through the bidirectional recurrent neural network layer, wherein the labels are used for indicating the types of the characters;

inputting the probability distribution features into a fully connected layer of the text detection model, and concatenating the probability distribution features through the fully connected layer to obtain concatenated probability distribution features;

inputting the concatenated probability distribution features into a conditional random field output layer of the text detection model, and determining the label of each character through the conditional random field output layer;

the first text segment is determined from the first text data based on the labels of the characters.

In one possible implementation, the extracting semantic features of the first text segment includes:

inputting the first text segment into an input embedding layer of a text normalization model, and acquiring the encoding vector of the first text segment through the input embedding layer as the semantic features of the first text segment.

In one possible implementation, the training process of the text detection model includes:

obtaining first sample text data and sample labels of the first sample text data, wherein a sample label is used for indicating the type of a sample text segment to be normalized in the first sample text data;

the text detection model is trained based on the first sample text data and sample labels of the first sample text data.

In one possible implementation, the training process of the text normalization model includes:

acquiring second sample text data and normalized text data corresponding to the second sample text data;

training the text normalization model based on the second sample text data and the normalized text data.

Fig. 3 is a flowchart of a text processing method provided in an embodiment of the present application, and referring to fig. 3, the method includes:

301. The computer device acquires voice data, and performs voice recognition based on the voice data to obtain first text data.

302. The computer device determines, from the first text data, a first text segment to be subjected to text normalization.

303. The computer device extracts semantic features of the first text segment, wherein the semantic features comprise at least one of a first type of semantic feature, a second type of semantic feature and a third type of semantic feature; the first type of semantic feature is used for indicating semantic information of the first text segment, the second type of semantic feature is used for indicating semantic information of the context text segment of the first text segment, and the third type of semantic feature is used for indicating pinyin information of the context text segment of the first text segment.

304. The computer device performs text normalization processing on the first text segment based on the semantic features of the first text segment to obtain a second text segment.

305. The computer device replaces the first text segment with the second text segment to obtain second text data.

According to the scheme provided by the embodiment of the application, voice recognition is first performed on the acquired voice data to obtain the text data to be normalized; the text segment to be subjected to text normalization is then determined from that text data, and only that text segment is normalized, which reduces the amount of calculation; in addition, the text segment is normalized based on its semantic features, and the processed text segment then replaces the original text segment to obtain new text data. This normalization approach effectively normalizes text obtained through speech recognition, improving the readability and coherence of the text data, thereby ensuring the user's reading experience and improving text processing efficiency.

Fig. 4 is a flowchart of a text processing method provided in an embodiment of the present application, and referring to fig. 4, the method includes:

401. The server acquires first text data.

In a possible implementation manner, the terminal acquires text data input by a user, and further sends the acquired text data as first text data to the server, so that the server acquires the first text data.

402. The server performs word segmentation processing on the first text data.

In one possible implementation manner, the server separates the characters in the first text data with spaces to perform word segmentation processing on the first text data, so that in subsequent processing the characters in the first text data are handled based on the word segmentation result.

Taking the first text data as "issuing wish service indexes (two zero to three zero to four) developed by small and medium-sized enterprises in S city", for example, "issuing three thousand copies" of the wish service indexes (two zero to three zero to four) developed by small and medium-sized enterprises in shanghai city "after word separation processing.

It should be noted that the above process is only an exemplary way of performing word segmentation processing on text data; in other possible implementations, other ways may be used to perform word segmentation processing on text data, which is not limited in this embodiment of the present application.

403. The server inputs the characters obtained after the word segmentation processing into a text detection model, and determines the first text segment from the first text data through the text detection model.

It should be noted that the text detection model includes an input layer, a word embedding layer, a bidirectional recurrent neural network (Bi-RNN) layer, a linear fully connected layer, and a conditional random field (CRF) output layer. Optionally, a convolutional layer or a Transformer model is used in place of the bidirectional recurrent neural network layer, which is not limited in this embodiment.
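A minimal PyTorch sketch of this layer stack is given below. The vocabulary size, embedding width, hidden size and label count are illustrative assumptions; the conditional random field output layer is omitted here and treated separately with formula (2) below.

```python
import torch
import torch.nn as nn

class TextDetectionModel(nn.Module):
    def __init__(self, vocab_size=5000, emb_dim=128, hidden=256, num_labels=5):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_dim)      # word embedding layer
        self.birnn = nn.LSTM(emb_dim, hidden, bidirectional=True,
                             batch_first=True)                  # bidirectional recurrent layer
        self.fc = nn.Linear(2 * hidden, num_labels)             # linear fully connected layer

    def forward(self, char_indices):                            # (batch, seq_len) word indexes
        h, _ = self.birnn(self.embedding(char_indices))
        return self.fc(h)                                       # per-character label scores

scores = TextDetectionModel()(torch.randint(0, 5000, (1, 12)))  # shape (1, 12, 5)
```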

In a possible implementation manner, the server first inputs the characters into the input layer of the text detection model, and obtains the text representation features of the characters through the input layer, where the text representation features are used to indicate the word indexes of the characters in a dictionary; the dictionary stores a plurality of characters, each character corresponds to a word index, and the word index indicates the storage location of the character in the dictionary, so that the corresponding character can be determined from the dictionary according to the word index. The server then inputs the text representation features into the word embedding layer of the text detection model, performs feature mapping processing on the text representation features through the word embedding layer, and outputs the embedding features; inputs the embedding features into the bidirectional recurrent neural network layer of the text detection model, and determines, through the bidirectional recurrent neural network layer, the probability distribution features of the characters marked as various labels, where the labels are used to indicate the types of the characters; inputs the probability distribution features into the fully connected layer of the text detection model, and concatenates the probability distribution features through the fully connected layer to obtain the concatenated probability distribution features; inputs the concatenated probability distribution features into the conditional random field output layer of the text detection model, and determines the label of each character through the conditional random field output layer; and finally determines the first text segment from the first text data based on the labels of the characters.

Referring to fig. 5, fig. 5 is a schematic diagram of the processing procedure of a text detection model provided in the embodiment of the present application. Taking the first text data "the volunteer service guidelines (two zero two zero to two zero two four) developed for small and medium-sized enterprises in S city were issued in three thousand copies" as an example, the server inputs each character of the first text data obtained by word segmentation into the text detection model. After each character is input into the input layer of the text detection model, the input characters are converted, through a lookup table of the input layer, into a vector X = (x1, x2, x3, …, xn) serving as the text representation features of the characters; the vector X is input into the word embedding layer of the text detection model and mapped by the word embedding layer into fixed-length vectors E = (e1, e2, e3, …, en) serving as the embedding features; the vectors E are input into the bidirectional recurrent neural network layer of the text detection model, which determines, for each character, the probability distribution over the candidate labels; the fully connected layer of the text detection model then concatenates the probability distributions of the characters and outputs the result to the conditional random field layer of the text detection model, which determines, for each character, the label with the highest probability as the label of that character; the first text segment is then determined based on the label of each character in the first text data. Optionally, after determining the first text segment to be subjected to text normalization, the server marks the position of the first text segment in the first text data, so that the subsequent replacement of the first text segment is performed according to that position.

When the bidirectional recurrent neural network layer determines the probability distribution of any character being marked as the various labels, its inputs are the embedding feature of that character, the probability distribution of the previous character in the first text data being marked as the various labels, and the probability distribution of the next character in the first text data being marked as the various labels. That is, the context of the character in the text data is taken into account, which improves the accuracy of the determined probability distribution result.

For the conditional random field layer, its input is the spliced probability distribution feature, that is, an n × k matrix, where n denotes the input length and k is the number of labels. The conditional random field takes an observation sequence, denoted D = (d1, d2, d3, …, dn), and outputs a marker sequence, denoted Y = (y1, y2, y3, …, yn). The goal of the conditional random field layer is to construct a conditional probability model P(Y | x), see the following formula (1):

$$P(y \mid x) = \frac{1}{Z}\exp\Bigl(\sum_{i,j}\lambda_j\,t_j(y_{i-1},y_i,x,i) + \sum_{i,k}\mu_k\,s_k(y_i,x,i)\Bigr) \tag{1}$$

where $t_j(y_{i-1},y_i,x,i)$ is a transfer function defined at two adjacent marker positions of the observation sequence, representing the correlation between adjacent marker variables and the effect of the observation sequence on them; $s_k(y_i,x,i)$ is a state feature function defined at marker position $i$ of the observation sequence, representing the effect of the observation sequence on the marker variable; $\lambda_j$ and $\mu_k$ are preset parameters; and $Z$ is a normalization factor.

For the predicted tag sequence, that is, the marker sequence Y = (y1, y2, y3, …, yn), the score calculation formula is seen in the following formula (2):

$$\mathrm{score}(Y)=\sum_{i=1}^{n}P_{i,y_i}+\sum_{i=2}^{n}A_{y_{i-1},y_i} \tag{2}$$

where $P_{i,y_i}$ is the probability that the label input and output at position $i$ is $y_i$, and $A_{y_{i-1},y_i}$ is the transition probability from $y_{i-1}$ to $y_i$.
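
To make formula (2) concrete, the following sketch computes the score of one candidate marker sequence from an emission matrix P and a transition matrix A; the random matrices, their sizes, and the example sequence are illustrative assumptions, not values from the embodiment.

```python
import numpy as np

n, k = 5, 4                  # input length, number of labels
P = np.random.rand(n, k)     # P[i, y]: probability that the label at position i is y
A = np.random.rand(k, k)     # A[y_prev, y]: transition probability from y_prev to y

def sequence_score(y):
    # score(Y) = sum_i P[i, y_i] + sum_i A[y_{i-1}, y_i], as in formula (2)
    emission = sum(P[i, y[i]] for i in range(n))
    transition = sum(A[y[i - 1], y[i]] for i in range(1, n))
    return emission + transition

y = [0, 1, 1, 2, 3]          # one candidate marker sequence
print(sequence_score(y))     # the layer ultimately selects the highest-scoring sequence
```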

It should be noted that the text detection model is trained by the server in advance. In one possible implementation manner, a server acquires first sample text data and a sample label of the first sample text data, wherein the sample label is used for indicating the type of a sample text fragment to be normalized in the sample text data; the text detection model is trained based on the first sample text data and sample labels of the first sample text data.

When training the text detection model based on the first sample text data and its sample labels, the server first inputs the first piece of the first sample text data into an initial text detection model and determines its label through the initial text detection model; it then determines the loss function value of the initial text detection model based on that label and the corresponding sample label, and adjusts the parameters of the initial text detection model by utilizing a gradient correction network according to the loss function value, obtaining the text detection model after the first parameter adjustment.

The server then inputs the second piece of the first sample text data into the text detection model after the first parameter adjustment, determines its label through that model, determines the loss function value of the model based on that label and the corresponding sample label, and adjusts the model parameters by utilizing the gradient correction network according to the loss function value, obtaining the text detection model after the second parameter adjustment. By analogy, the parameters of the text detection model are continuously adjusted based on each piece of sample text data in the first sample text data until a text detection model meeting the target condition is obtained. The target condition is that the similarity between the label output by the model and the sample label meets an iteration cutoff condition, or that the loss function value of the model meets an iteration cutoff condition, or that the number of iterations reaches a preset number; which condition is adopted is not limited in the embodiment of the present application.
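
The piece-by-piece parameter adjustment described above corresponds to an ordinary supervised training loop. The sketch below is a hypothetical illustration: the tiny tagger, the random stand-in samples, and the fixed epoch count are assumptions, with cross-entropy standing in for the unspecified loss function.

```python
import torch
import torch.nn as nn

class TinyTagger(nn.Module):
    # Minimal stand-in for the text detection model; sizes are illustrative.
    def __init__(self, vocab=5000, dim=64, labels=4):
        super().__init__()
        self.emb = nn.Embedding(vocab, dim)
        self.fc = nn.Linear(dim, labels)

    def forward(self, x):
        return self.fc(self.emb(x))

model = TinyTagger()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# Stand-in for the first sample text data: (character indexes, per-character sample labels).
samples = [(torch.randint(0, 5000, (1, 20)), torch.randint(0, 4, (1, 20)))
           for _ in range(8)]

for epoch in range(3):                   # repeat until the target condition is met
    for chars, gold in samples:
        loss = loss_fn(model(chars).view(-1, 4), gold.view(-1))
        optimizer.zero_grad()
        loss.backward()                  # gradients for the parameter adjustment
        optimizer.step()                 # adjust parameters from the loss function value
```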

The sample labels of the first sample text data are labeled in advance by relevant technicians, who mark the text segments belonging to preset normalization types according to preset marking rules to obtain the sample labels of the first sample text data. The preset normalization types include a number type, a date type, a spoken-word type, and a formula type, and optionally other types, which are not limited in the embodiment of the present application. Different normalization types can be defined as corresponding labels; for example, the number type is defined as a "num" label, the date type as a "date" label, the time type as a "time" label, the spoken-word type (including repeated tone words, filler words, and the like) as a "reduce" label, the formula type as a "formula" label, and so on, so that a relevant technician marks, according to the preset normalization types, the position of each text segment to be normalized in the first sample text data and the sample label corresponding to it. Taking the first sample text data as "issuing three thousand volunteer service indexes (two zero one three to two zero one four) for small and medium-sized enterprises in S city" as an example, where "two zero one three to two zero one four" and "three thousand" are sample text fragments to be subjected to text normalization and both belong to the number type, the sample labels of these two sample text fragments in the first sample text data are labeled as "num" labels.

In addition, if a new preset normalization type is introduced, a corresponding label is added. For the newly added normalization type, the text detection model can be iteratively retrained by supplementing a quantity of labeled data of that type, so that the model adapts to the corresponding normalization task, which improves the robustness of this scheme in solving normalization problems.

In more possible implementation manners, default labels are set, and all the preset normalization types are marked with the same default labels, so that segments of all preset normalization types are identified in a unified manner, reducing the cost and workload of labeling. For example, let the label marking the start character of a text fragment to be normalized be B, the label marking a middle character be I, the label marking the end character be E, and the label marking a character outside any such fragment be O. Then, for the first text data "issuing three thousand volunteer service indexes (two zero one three to two zero one four) for small and medium-sized enterprises in S city", each character outside the two target fragments is tagged O, the characters of "two zero one three to two zero one four" are tagged B I I I I I I I E, and the characters of "three thousand" are tagged B E.
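
Producing such a default-label sequence from the marked positions of the segments to be normalized is mechanical; the helper below is a hypothetical illustration of the B/I/E/O scheme, with an invented example string and offsets.

```python
def spans_to_tags(text, spans):
    # spans: list of (start, end) character offsets of segments to be normalized; end is exclusive.
    tags = ["O"] * len(text)
    for start, end in spans:
        tags[start] = "B"
        for i in range(start + 1, end - 1):
            tags[i] = "I"
        if end - start > 1:                 # single-character segments keep only the B tag here
            tags[end - 1] = "E"
    return tags

text = "ABCDEFGH"
print(spans_to_tags(text, [(2, 6)]))        # ['O', 'O', 'B', 'I', 'I', 'E', 'O', 'O']
```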

By labeling only the text segments belonging to the preset normalization types, there is no need to label every text segment in the text data, which reduces the amount of label data required during model training and the associated labeling cost.

It should be noted that the above is only an exemplary manner of determining the first text segment to be subjected to text normalization; in more possible implementation manners, the first text segment is determined in other manners, which is not limited in the embodiment of the present application.

404. The server extracts semantic features of the first text segment, wherein the semantic features comprise at least one of a first type of semantic features, a second type of semantic features and a third type of semantic features; the first semantic feature is used for indicating semantic information of the first text segment, the second semantic feature is used for indicating semantic information of a context text segment of the first text segment, and the third semantic feature is used for indicating pinyin information of the context text segment of the first text segment.

It should be noted that, in this step 404, the process of extracting the semantic features of the first text segment is implemented by a text normalization model, which adopts a Transformer encoder-decoder (Encoder-Decoder) framework and includes an input embedding layer, an encoder network, a decoder network, a linear fully connected layer, and a normalization layer. Optionally, the text normalization model adopts a Recurrent Neural Network (RNN) framework instead, which is not limited in this embodiment of the present application.

In a possible implementation manner, the server inputs the first text segment into the input embedding layer of the text normalization model and obtains, through the input embedding layer, the encoding vector of the first text segment as the semantic feature of the first text segment.

The second type of semantic feature is the semantic feature of the context window text containing the first text segment; the window length is the sum of the number of characters of the first text segment and the left and right boundary step sizes, and is generally greater than or equal to the number of characters of the first text segment. The third type of semantic feature is the result of converting the context window text containing the first text segment into pinyin, with non-Chinese characters kept the same as the original characters.

Taking the first text data as "issuing three thousand volunteer service indexes (two zero one three to two zero one four) for small and medium-sized enterprises in S city" as an example, the first text fragments are "two zero one three to two zero one four" and "three thousand". Taking the first text fragment "two zero one three to two zero one four" as an example, the first type of semantic feature of this fragment is its own text information, namely "two zero one three to two zero one four". Taking the left and right boundary step size of the window as 4 as an example, the window text of the window where the first text fragment is located is "service indexes (two zero one three to two zero one four) three thousand"; if the left and right boundary step size is less than 4, the second type of semantic feature does not need to be obtained. Optionally, the left and right boundary step size of the window takes other values, which is not limited in the embodiment of the present application. The third type of semantic feature of the first text fragment is the pinyin information obtained by converting the window text into pinyin, namely ['wu', 'suo', 'yin', '(', 'er', 'qian', 'ling', 'yi', 'shi', 'san', 'zhi', 'er', 'qian', 'ling', 'yi', 'shi', 'si', ')', 'san', 'qian'], where non-Chinese characters are represented by the source characters, such as the bracket characters in the above example.

In a possible implementation manner, after the server obtains the three types of semantic features, it splices them together to obtain the spliced semantic feature, which is subsequently input as the semantic feature into the encoder network of the text normalization model. When splicing the features, the three types of semantic features are joined through a custom symbol. For example, the normalization-target text and the window text are split into characters, the window pinyin is split on spaces, and the three semantic features are joined by the custom symbol "&": the character sequence of "two zero one three to two zero one four", then "&", then the character sequence of the window text "service indexes (two zero one three to two zero one four) three thousand", then "&", then the pinyin sequence ['wu', 'suo', 'yin', '(', 'er', 'qian', 'ling', 'yi', 'shi', 'san', 'zhi', 'er', 'qian', 'ling', 'yi', 'shi', 'si', ')', 'san', 'qian']. Optionally, the custom symbol is of another type, which is not limited in this embodiment of the application.
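
A hypothetical sketch of this splicing step follows; the pinyin lookup is a stub (a real system might rely on a library such as pypinyin), and the example strings are deliberately short.

```python
# Illustrative splice of the three semantic features with the custom symbol "&".
PINYIN = {"三": "san", "千": "qian"}    # stub lookup; real pinyin conversion is assumed

def splice_features(segment, window):
    seg_chars = list(segment)
    win_chars = list(window)
    win_pinyin = [PINYIN.get(c, c) for c in window]   # non-Chinese characters kept as-is
    return seg_chars + ["&"] + win_chars + ["&"] + win_pinyin

print(splice_features("三千", "(三千)"))
# ['三', '千', '&', '(', '三', '千', ')', '&', '(', 'san', 'qian', ')']
```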

405. The server inputs the semantic features of the first text segment into an encoder network of a text normalization model, performs feature mapping processing on the semantic features through the encoder network, and outputs the encoding features of the semantic features.

It should be noted that the Encoder network is composed of N identical Layers, and each Layer is composed of two Sub-Layers: a Multi-Head Attention layer and a feed-forward fully connected (Feed Forward) layer. An Add & Norm step, consisting of a Residual Connection and Layer Normalization, is applied after each Sub-Layer; that is, the encoder network includes the multi-head attention layer, the first normalization layer, the feed-forward fully connected layer, and the second normalization layer, and the output of each Sub-Layer is: Sub_layer_output = LayerNorm(x + SubLayer(x)).

In a possible implementation manner, the server inputs the semantic features into a multi-head attention layer of the encoder network, and obtains the features of the semantic features at multiple layers through the multi-head attention layer; inputting the features of the semantic features in multiple layers into a first normalization layer of the encoder network, and performing normalization processing on the first superposition features through the first normalization layer to obtain first-class normalized features; the first superposition characteristic is a superposition result of the semantic characteristic and the characteristics of the semantic characteristic at a plurality of layers; inputting the first type of standardized features into a forward full-connection layer of the encoder network, and splicing the first type of standardized features through the forward full-connection layer to obtain spliced first type of standardized features; inputting the spliced first-class standardized features into a second standardized layer of the encoder network, and carrying out standardized processing on the first-class standardized features and the spliced first-class standardized features through the second standardized layer to obtain the encoding features.
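
The per-sub-layer computation Sub_layer_output = LayerNorm(x + SubLayer(x)) maps directly onto standard building blocks. The following single-encoder-layer sketch is illustrative, with assumed dimensions; it is not the exact configuration of the embodiment.

```python
import torch
import torch.nn as nn

d_model, n_heads, seq_len = 64, 4, 10          # assumed sizes

attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
ffn = nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.ReLU(),
                    nn.Linear(4 * d_model, d_model))
norm1, norm2 = nn.LayerNorm(d_model), nn.LayerNorm(d_model)

x = torch.randn(1, seq_len, d_model)           # embedded semantic features

# First sub-layer: multi-head attention, then Add & Norm (residual plus layer normalization).
attn_out, _ = attn(x, x, x)
h = norm1(x + attn_out)                        # LayerNorm(x + SubLayer(x))

# Second sub-layer: feed-forward fully connected layer, then Add & Norm again.
encoded = norm2(h + ffn(h))                    # the encoding features
print(encoded.shape)                           # torch.Size([1, 10, 64])
```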

406. The server inputs the coding features into a decoder network of the text normalization model, and determines first probability distribution features corresponding to the coding features through the decoder network.

It should be noted that the structure of the Decoder (Decoder) network is similar to that of the encoder network, and includes a multi-head attention layer, a first normalization layer, a forward full-link layer, and a second normalization layer, where an input of the Decoder network is an output of the encoder network, and an output is a probability distribution corresponding to each possible result after text normalization processing is performed on a text segment to be normalized in text data.

In a possible implementation manner, the server inputs the coding feature into a multi-head attention layer of the decoder network, and obtains the feature of the coding feature at multiple levels through the multi-head attention layer; inputting the features of the multiple layers into a first normalization layer of the decoder network, and normalizing the second superposition feature through the first normalization layer to obtain a second type of normalized feature; the second superposition characteristic is a result of superposition of the coding characteristic and the characteristics of the coding characteristic at a plurality of levels; inputting the second type of standardized features into a forward full-connection layer of the decoder network, and splicing the second type of standardized features through the forward full-connection layer to obtain spliced second type of standardized features; inputting the spliced second type of standardized features into a second standardized layer of the decoder network, and carrying out standardized processing on the second type of standardized features and the spliced second type of standardized features through the second standardized layer to obtain the first probability distribution features.

407. The server inputs the first probability distribution feature into the linear fully connected layer of the text normalization model, and splices the first probability distribution feature through the linear fully connected layer to obtain a spliced first probability distribution feature.

408. The server inputs the spliced first probability distribution feature into the normalization layer of the text normalization model, and outputs a second text segment through the normalization layer.

It should be noted that the text normalization model used in the above steps 404 to 408 is trained by the server in advance. Unlike the text normalization model in actual use, the model used for training is provided with a Masked Multi-Head Attention layer and a third normalization layer before the multi-head attention layer in the decoder network. By arranging the masked multi-head attention layer before the multi-head attention layer in the decoder network, a word at any position cannot see the information of words at subsequent positions when it is predicted during training, which improves the accuracy of model training.
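
The mask used by the masked multi-head attention layer is just an upper-triangular matrix that hides subsequent positions; a minimal sketch, assuming the PyTorch convention that True marks positions that may not be attended to:

```python
import torch

seq_len = 5
# Row i may see columns 0..i only, so predicting position i never touches later words.
causal_mask = torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool), diagonal=1)
print(causal_mask)
# Such a mask can be passed as attn_mask to torch.nn.MultiheadAttention during training.
```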

After the encoder finishes its computation, the decoder uses the encoder's result as the input of its multi-head attention layer and decodes cyclically, outputting the word probability of the current position at each step.

In a possible implementation manner, the server obtains second sample text data and structured text data corresponding to the second sample text data; training the text normalization model based on the second sample text data and the normalized text data.

When training the text normalization model based on the second sample text data and the normalized text data, the server first inputs the first piece of the second sample text data into an initial text normalization model and determines, through the initial model, the text data obtained by normalizing that piece; it then determines the loss function value of the initial text normalization model based on that output and the normalized text data corresponding to the first piece, and adjusts the parameters of the initial text normalization model by utilizing a gradient correction network according to the loss function value, obtaining the text normalization model after the first parameter adjustment.

The server then inputs the second piece of the second sample text data into the text normalization model after the first parameter adjustment, determines through that model the text data obtained by normalizing that piece, determines the loss function value of the model based on that output and the normalized text data corresponding to the second piece, and adjusts the model parameters by utilizing the gradient correction network according to the loss function value, obtaining the text normalization model after the second parameter adjustment.

By analogy, the parameters of the text normalization model are continuously adjusted based on each piece of sample text data in the second sample text data until a text normalization model meeting the target condition is obtained. The target condition is that the similarity between the text data output by the model and the normalized text data corresponding to the sample text data meets an iteration cutoff condition, or that the loss function value of the model meets the iteration cutoff condition, or that the number of iterations reaches a preset number.

Taking the second sample text data as "issuing three thousand volunteer service indexes (two zero one three to two zero one four) for small and medium-sized enterprises in S city" as an example, where "two zero one three to two zero one four" and "three thousand" are the sample text fragments to be subjected to text normalization, the result obtained after labeling the second sample text data is "issuing volunteer service indexes (two zero one three to two zero one four/<2013-2014>) three thousand/<3000> for small and medium-sized enterprises in S city", where "/<2013-2014>" and "/<3000>" are the corresponding sample labels: the marker "/<XX>" represents the result obtained after normalizing the sample text fragment it follows, and "XX" is the content of the normalization result. Optionally, the sample text fragment is separated from the preceding and following characters by a space symbol as a boundary, so that the server can quickly locate the sample text fragment.
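
Reading the normalization results back out of a sample labeled with the "/<XX>" convention can be done with a small pattern match; the helper below is hypothetical and the sample string is abbreviated.

```python
import re

labeled = "indexes (two zero one three to two zero one four/<2013-2014>) three thousand/<3000>"
# Each "/<XX>" marks the normalized result of the segment that precedes it.
for match in re.finditer(r"/<([^>]*)>", labeled):
    print(match.group(1))
# 2013-2014
# 3000
```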

In more possible implementations, a pre-trained model is used to replace the encoder network in the text normalization model, and the decoder network is then trained by fine-tuning the training parameters.

The processes shown in the above steps 404 to 408 are illustrated in fig. 6, which is a schematic diagram of the processing process of the text normalization model provided in this application. Fig. 6 includes the two structures of the text normalization model during training and in actual use; the complete structure in fig. 6 is the structure used for training. When the text normalization model is trained, the second sample text data is fed as input into the input embedding layer, which produces a vector representation of the second sample text data as its semantic feature. This semantic feature is position-encoded, and the position-encoded semantic feature is input into the encoder network, where it is processed by the multi-head attention layer, the first normalization layer, the feed-forward fully connected layer, and the second normalization layer; the encoder network then outputs the encoding feature of the second sample text data.

The normalized text data corresponding to the second sample text data is input, as the calibration standard, into the output embedding layer, which produces a vector representation of the normalized text data as its semantic feature. This semantic feature is position-encoded and input into the decoder network, where the supervision feature of the normalized text data is obtained through the masked multi-head attention layer and the third normalization layer. The encoding feature of the second sample text data and the supervision feature of the normalized text data are then fed into the multi-head attention layer, the first normalization layer, the feed-forward fully connected layer, and the second normalization layer of the decoder network, which output the first probability distribution feature corresponding to the second sample text data; finally, the linear fully connected layer and the normalization layer output the text data obtained from the second sample text data through text normalization processing.

It should be noted that, in actual use, the processing procedure of the text normalization model is the same as the above process and is not described here again; the difference is that, in actual use, the text normalization model does not include the output embedding layer, nor the masked multi-head attention layer and the third normalization layer in the decoder network.

409. And the server replaces the first text segment with the second text segment to obtain second text data.

In a possible implementation manner, the server replaces the first text segment at the corresponding position with the second text segment according to the position of the first text segment to obtain second text data subjected to text normalization processing.
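
Because the position of the first text segment was marked at detection time, the replacement amounts to simple string slicing; a minimal sketch with invented offsets:

```python
def replace_segment(text, start, end, normalized):
    # Replace text[start:end] (the first text segment) with the normalized second text segment.
    return text[:start] + normalized + text[end:]

first = "index (two zero one three to two zero one four) three thousand"
print(replace_segment(first, 7, 46, "2013-2014"))
# index (2013-2014) three thousand
```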

It should be noted that, in the foregoing steps 401 to 409, a process of performing text normalization processing on the acquired first text data by the server is taken as an example for explanation, in more possible implementation manners, when the terminal acquires the first text data, the terminal processes the first text data through steps similar to the foregoing steps 402 to 409 to obtain second text data subjected to text normalization processing, where specific processes refer to the foregoing steps 402 to 409, and details are not described here.

According to the scheme provided by the embodiment of the application, the text segment to be subjected to text normalization is first determined from the text data, and only that text segment is subjected to text normalization, which reduces the amount of calculation. In addition, the text segment is normalized based on its semantic features, and the processed text segment replaces the original text segment to obtain new text data. The text normalization task is thus divided into a text detection task for the text segment and a normalization task for the text segment, realizing a task decomposition of text normalization: the final effect is improved through the optimization of each subtask, the overall task difficulty is reduced, and the text normalization efficiency is improved. Moreover, because the detection and the normalization of the text segment are performed step by step, the amount of calculation in the text normalization process can be reduced and the text normalization speed improved.

The scheme provided by the embodiment of the present application can be applied to various scenarios such as voice recognition and voice transcription. Taking the application of the embodiment of the present application in a voice transcription scenario as an example, refer to fig. 7, which is a flowchart of a text processing method provided by the embodiment of the present application and applied to a voice transcription scenario; the method includes:

701. the server obtains voice data.

In a possible implementation manner, the terminal acquires voice data input by a user, sends the acquired voice data to the server, and the server performs voice recognition on the received voice data to obtain the first text data.

When performing speech recognition on the voice data, the server extracts the speech features of the acquired voice data, inputs the extracted speech features into a speech recognition model, extracts hidden layer features of the speech features through a hidden layer of the speech recognition model, classifies the extracted hidden layer features through a feature classification layer of the speech recognition model to obtain the probability values of the hidden layer features corresponding to each phoneme, and determines the text data corresponding to the voice data based on these probability values, a pronunciation dictionary, and a language model. The pronunciation dictionary is used to indicate the mapping relation between phonemes and pronunciations, and the language model is used to determine the probability values corresponding to the various words composing the text data.

The speech feature is a spectral feature, which indicates the variation of the voice data in the frequency domain. Optionally, the speech feature is another feature, which is not limited in this embodiment of the present application. Taking the speech feature as a spectral feature as an example, Fourier transform is performed on the voice data to obtain its amplitude information in the frequency domain, that is, the amplitude corresponding to each frequency in the voice data, yielding the spectral feature of the voice data. Optionally, before the Fourier transform, the voice data is preprocessed, for example by pre-emphasis, framing, and windowing, so as to reduce the influence of problems such as aliasing and higher harmonic distortion caused by the vocal organs and the device acquiring the voice data, improve the quality of the voice data, and thereby ensure the accuracy of speech feature extraction.
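
A numpy sketch of this spectral-feature step (pre-emphasis, framing, windowing, then a Fourier transform per frame) follows; the frame length, hop size, and pre-emphasis coefficient are conventional assumptions, not values from the embodiment.

```python
import numpy as np

def spectral_features(signal, frame_len=400, hop=160, alpha=0.97):
    # Pre-emphasis flattens the spectrum before analysis.
    emphasized = np.append(signal[0], signal[1:] - alpha * signal[:-1])
    window = np.hamming(frame_len)
    frames = []
    for start in range(0, len(emphasized) - frame_len + 1, hop):
        frame = emphasized[start:start + frame_len] * window   # framing + windowing
        frames.append(np.abs(np.fft.rfft(frame)))              # amplitude per frequency
    return np.array(frames)                                    # (num_frames, frame_len // 2 + 1)

voice = np.random.randn(16000)            # stand-in for one second of 16 kHz voice data
print(spectral_features(voice).shape)     # (98, 201)
```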

It should be noted that, the above is only an exemplary manner of performing speech recognition on speech data to obtain first text data, and in a more possible implementation manner, speech recognition is performed on speech data in another manner, which is not limited in the embodiment of the present application.

702. The server performs voice recognition based on the voice data to obtain first text data.

It should be noted that the above steps 701 to 702 are described by taking as an example the case where the terminal acquires the voice data and then sends it to the server, and the server performs voice recognition on the voice data to obtain the first text data. The voice recognition performed on the voice data is as described in the above step 701 and is not repeated here.

Through the above step 702, voice transcription of the acquired voice data can be realized to obtain the first text data; text normalization is then performed on the first text data obtained through voice transcription through the following steps 703 to 710, and the specific process refers to the following steps 703 to 710. In addition to the above voice transcription scenario, the scheme provided by the embodiment of the present application can also be applied to other scenarios, which is not limited by the embodiment of the present application.

703. The server performs word segmentation processing on the first text data.

704. And the server inputs the characters obtained after the word segmentation into a text detection model, and determines the first text segment from the first text data through the text detection model.

705. The server extracts semantic features of the first text segment, wherein the semantic features comprise at least one of a first type of semantic features, a second type of semantic features and a third type of semantic features; the first semantic feature is used for indicating semantic information of the first text segment, the second semantic feature is used for indicating semantic information of a context text segment of the first text segment, and the third semantic feature is used for indicating pinyin information of the context text segment of the first text segment.

706. The server inputs the semantic features of the first text segment into an encoder network of a text normalization model, performs feature mapping processing on the semantic features through the encoder network, and outputs the encoding features of the semantic features.

707. The server inputs the coding features into a decoder network of the text normalization model, and determines first probability distribution features corresponding to the coding features through the decoder network.

708. The server inputs the first probability distribution feature into the linear fully connected layer of the text normalization model, and splices the first probability distribution feature through the linear fully connected layer to obtain a spliced first probability distribution feature.

709. The server inputs the spliced first probability distribution feature into the normalization layer of the text normalization model, and outputs a second text segment through the normalization layer.

710. And the server replaces the first text segment with the second text segment to obtain second text data.

The process from the step 703 to the step 710 is the same as the process from the step 402 to the step 409, and is not described herein again.

It should be noted that, in the above steps 701 to 710, the process in which the server performs text normalization processing on text data obtained by recognizing voice data is taken as an example for explanation. In more possible implementation manners, after acquiring voice data, the terminal performs voice recognition on the voice data to obtain the first text data corresponding to the voice data, and then processes the first text data through steps similar to the above steps 703 to 710 to obtain the second text data subjected to text normalization processing; for the specific process, refer to the above steps 703 to 710, which is not described here again.

According to the scheme provided by the embodiment of the application, voice recognition is first performed on the acquired voice data to obtain the text data to be normalized; the text segment to be subjected to text normalization is then determined from that text data, and only that text segment is subjected to text normalization, which reduces the amount of calculation. In addition, the text segment is normalized based on its semantic features, and the processed text segment replaces the original text segment to obtain new text data.

All the above optional technical solutions may be combined arbitrarily to form optional embodiments of the present application, and are not described herein again.

Fig. 8 is a schematic structural diagram of a text processing apparatus according to an embodiment of the present application, and referring to fig. 8, the apparatus includes:

an obtaining module 801, configured to obtain first text data;

a determining module 802, configured to determine, from the first text data, a first text segment to be subjected to text normalization;

an extracting module 803, configured to extract semantic features of the first text segment, where the semantic features include at least one of a first type of semantic features, a second type of semantic features, and a third type of semantic features; the first semantic feature is used for indicating semantic information of the first text segment, the second semantic feature is used for indicating semantic information of a context text segment of the first text segment, and the third semantic feature is used for indicating pinyin information of the context text segment of the first text segment;

a normalization processing module 804, configured to perform text normalization processing on the first text segment based on the semantic features of the first text segment to obtain a second text segment;

a replacing module 805, configured to replace the first text segment with the second text segment to obtain second text data.

According to the device provided by the embodiment of the application, the text segment to be subjected to text normalization is first determined from the text data, and only that text segment is subjected to text normalization, which reduces the amount of calculation; in addition, the text segment is normalized based on its semantic features, and the processed text segment replaces the original text segment to obtain new text data.

In a possible implementation manner, the normalization processing module 804 includes a first processing unit, a second processing unit, a third processing unit, and a fourth processing unit;

the first processing unit is used for inputting the semantic features of the first text segment into an encoder network of a text normalization model, performing feature mapping processing on the semantic features through the encoder network, and outputting the encoding features of the semantic features;

the second processing unit is used for inputting the coding characteristics into a decoder network of the text normalization model, and determining first probability distribution characteristics corresponding to the coding characteristics through the decoder network;

the third processing unit is configured to input the first probability distribution feature into a linear fully connected layer of the text normalization model, and splice the first probability distribution feature through the linear fully connected layer to obtain a spliced first probability distribution feature;

the fourth processing unit is configured to input the spliced first probability distribution characteristic into a normalization layer of the text normalization model, and output the second text segment through the normalization layer.

In a possible implementation manner, the first processing unit is configured to input the semantic feature into a multi-head attention layer of the encoder network, and obtain features of the semantic feature at multiple levels through the multi-head attention layer; inputting the features of the semantic features in multiple layers into a first normalization layer of the encoder network, and performing normalization processing on the first superposition features through the first normalization layer to obtain first-class normalized features; the first superposition characteristic is a superposition result of the semantic characteristic and the characteristics of the semantic characteristic at a plurality of layers; inputting the first type of standardized features into a forward full-connection layer of the encoder network, and splicing the first type of standardized features through the forward full-connection layer to obtain spliced first type of standardized features; inputting the spliced first-class standardized features into a second standardized layer of the encoder network, and carrying out standardized processing on the first-class standardized features and the spliced first-class standardized features through the second standardized layer to obtain the encoding features.

In a possible implementation manner, the second processing unit is configured to input the encoded feature into a multi-head attention layer of the decoder network, and obtain features of the encoded feature at multiple levels through the multi-head attention layer;

inputting the features of the multiple layers into a first normalization layer of the decoder network, and normalizing the second superposition feature through the first normalization layer to obtain a second type of normalized feature; the second superposition characteristic is a result of superposition of the coding characteristic and the characteristics of the coding characteristic at a plurality of levels; inputting the second type of standardized features into a forward full-connection layer of the decoder network, and splicing the second type of standardized features through the forward full-connection layer to obtain spliced second type of standardized features; inputting the spliced second type of standardized features into a second standardized layer of the decoder network, and carrying out standardized processing on the second type of standardized features and the spliced second type of standardized features through the second standardized layer to obtain the first probability distribution features.

In one possible implementation, the determining module 802 includes a word segmentation processing unit and a determining unit;

the word segmentation processing unit is used for carrying out word segmentation processing on the first text data;

the determining unit is used for inputting the characters obtained after the word segmentation processing into a text detection model, and determining the first text segment from the first text data through the text detection model.

In a possible implementation manner, the determining unit is configured to input the character into an input layer of the text detection model, and obtain, through the input layer, a text representation feature of the character, where the text representation feature is used to indicate a word index of the character in a dictionary; inputting the text representation characteristics into a word embedding layer of the text detection model, performing characteristic mapping processing on the text representation characteristics through the word embedding layer, and outputting the embedding characteristics; inputting the embedded features into a bidirectional recurrent neural network layer of the text detection model, and determining probability distribution features of the characters marked as various labels through the bidirectional recurrent neural network layer, wherein the labels are used for indicating the types of the characters; inputting the probability distribution characteristics into a full connection layer of the text detection model, and splicing the probability distribution characteristics through the full connection layer to obtain spliced probability distribution characteristics; inputting the spliced probability distribution characteristics into a conditional random field output layer of the text detection model, and determining the label of the character through the conditional random field output layer; the first text segment is determined from the first text data based on the label of the character.

In a possible implementation manner, the extracting module 803 is configured to input the first text segment into the input embedding layer of the text normalization model, and obtain, through the input embedding layer, the encoding vector of the first text segment as the semantic feature of the first text segment.

In one possible implementation, the training process of the text detection model includes:

obtaining first sample text data and a sample label of the first sample text data, wherein the sample label is used for indicating the type of a sample text fragment to be normalized in the sample text data;

the text detection model is trained based on the first sample text data and sample labels of the first sample text data.

In one possible implementation, the training process of the text normalization model includes:

acquiring second sample text data and normalized text data corresponding to the second sample text data;

training the text normalization model based on the second sample text data and the normalized text data.

Fig. 9 is a schematic structural diagram of a text processing apparatus according to an embodiment of the present application, and referring to fig. 9, the apparatus includes:

an obtaining module 901, configured to obtain voice data;

a voice recognition module 902, configured to perform voice recognition based on the voice data to obtain first text data;

a determining module 903, configured to determine, from the first text data, a first text segment to be subjected to text normalization;

an extracting module 904, configured to extract semantic features of the first text segment, where the semantic features include at least one of a first type of semantic feature, a second type of semantic feature, and a third type of semantic feature; the first semantic feature is used for indicating semantic information of the first text segment, the second semantic feature is used for indicating semantic information of a context text segment of the first text segment, and the third semantic feature is used for indicating pinyin information of the context text segment of the first text segment;

a normalization processing module 905, configured to perform text normalization processing on the first text segment based on the semantic features of the first text segment to obtain a second text segment;

a replacing module 906, configured to replace the first text segment with the second text segment to obtain second text data.

According to the device provided by the embodiment of the application, voice recognition is first performed on the acquired voice data to obtain the text data to be normalized; the text segment to be subjected to text normalization is then determined from that text data, and only that text segment is subjected to text normalization, which reduces the amount of calculation; in addition, the text segment is normalized based on its semantic features, and the processed text segment replaces the original text segment to obtain new text data.

It should be noted that: in the text processing apparatus provided in the above embodiment, when performing text normalization processing on a text obtained through speech recognition, only the division of the above function modules is illustrated, and in practical applications, the function distribution may be completed by different function modules according to needs, that is, the internal structure of the computer device is divided into different function modules to complete all or part of the functions described above. In addition, the text processing apparatus and the text processing method provided by the above embodiments belong to the same concept, and specific implementation processes thereof are described in the method embodiments and are not described herein again.

In an exemplary embodiment, a computer device is also provided. Optionally, the computer device is provided as a terminal, or the computer device is provided as a server, which is not limited in this embodiment of the present application. The structures of the terminal and the server will be described below, respectively.

Fig. 10 is a schematic structural diagram of a terminal according to an embodiment of the present application. In general, terminal 1000 can include: one or more processors 1001 and one or more memories 1002.

Processor 1001 may include one or more processing cores, such as a 4-core processor, an 8-core processor, and so forth. The processor 1001 may be implemented in at least one hardware form of a DSP (Digital Signal Processing), an FPGA (Field-Programmable Gate Array), and a PLA (Programmable Logic Array). The processor 1001 may also include a main processor and a coprocessor, where the main processor is a processor for Processing data in an awake state, and is also referred to as a Central Processing Unit (CPU); a coprocessor is a low power processor for processing data in a standby state. In some embodiments, the processor 1001 may be integrated with a GPU (Graphics Processing Unit), which is responsible for rendering and drawing the content that the display screen needs to display. In some embodiments, the processor 1001 may further include an AI (Artificial Intelligence) processor for processing a computing operation related to machine learning.

Memory 1002 may include one or more computer-readable storage media, which may be non-transitory. The memory 1002 may also include high-speed random access memory, as well as non-volatile memory, such as one or more magnetic disk storage devices, flash memory storage devices. In some embodiments, a non-transitory computer readable storage medium in the memory 1002 is used to store at least one program code for execution by the processor 1001 to implement the text processing methods provided by the method embodiments herein.

In some embodiments, terminal 1000 can also optionally include: a peripheral interface 1003 and at least one peripheral. The processor 1001, memory 1002 and peripheral interface 1003 may be connected by a bus or signal line. Various peripheral devices may be connected to peripheral interface 1003 via a bus, signal line, or circuit board. Specifically, the peripheral device includes: at least one of radio frequency circuitry 1004, display screen 1005, camera assembly 1006, audio circuitry 1007, positioning assembly 1008, and power supply 1009.

The peripheral interface 1003 may be used to connect at least one peripheral related to I/O (Input/Output) to the processor 1001 and the memory 1002. In some embodiments, processor 1001, memory 1002, and peripheral interface 1003 are integrated on the same chip or circuit board; in some other embodiments, any one or two of the processor 1001, the memory 1002, and the peripheral interface 1003 may be implemented on separate chips or circuit boards, which are not limited in this application.

The Radio Frequency circuit 1004 is used for receiving and transmitting RF (Radio Frequency) signals, also called electromagnetic signals. The radio frequency circuitry 1004 communicates with communication networks and other communication devices via electromagnetic signals. The radio frequency circuit 1004 converts an electrical signal into an electromagnetic signal to transmit, or converts a received electromagnetic signal into an electrical signal. Optionally, the radio frequency circuit 1004 comprises: an antenna system, an RF transceiver, one or more amplifiers, a tuner, an oscillator, a digital signal processor, a codec chipset, a subscriber identity module card, and so forth. The radio frequency circuit 1004 may communicate with other terminals via at least one wireless communication protocol. The wireless communication protocols include, but are not limited to: metropolitan area networks, various generation mobile communication networks (2G, 3G, 4G, and 5G), Wireless local area networks, and/or WiFi (Wireless Fidelity) networks. In some embodiments, the rf circuit 1004 may further include NFC (Near Field Communication) related circuits, which are not limited in this application.

The display screen 1005 is used to display a UI (User Interface). The UI may include graphics, text, icons, video, and any combination thereof. When the display screen 1005 is a touch display screen, the display screen 1005 also has the ability to capture touch signals on or over its surface. The touch signal may be input to the processor 1001 as a control signal for processing. At this point, the display screen 1005 may also be used to provide virtual buttons and/or a virtual keyboard, also referred to as soft buttons and/or a soft keyboard. In some embodiments, there may be one display screen 1005, disposed on the front panel of terminal 1000; in other embodiments, there may be at least two display screens 1005, respectively disposed on different surfaces of terminal 1000 or in a folded design; in still other embodiments, display screen 1005 may be a flexible display disposed on a curved surface or a folded surface of terminal 1000. The display screen 1005 may even be arranged as a non-rectangular irregular figure, that is, a shaped screen. The display screen 1005 may be made of materials such as an LCD (Liquid Crystal Display) or an OLED (Organic Light-Emitting Diode).

The camera assembly 1006 is used to capture images or video. Optionally, the camera assembly 1006 includes a front camera and a rear camera. Generally, a front camera is disposed at a front panel of the terminal, and a rear camera is disposed at a rear surface of the terminal. In some embodiments, the number of the rear cameras is at least two, and each rear camera is any one of a main camera, a depth-of-field camera, a wide-angle camera and a telephoto camera, so that the main camera and the depth-of-field camera are fused to realize a background blurring function, and the main camera and the wide-angle camera are fused to realize panoramic shooting and VR (Virtual Reality) shooting functions or other fusion shooting functions. In some embodiments, camera assembly 1006 may also include a flash. The flash lamp can be a monochrome temperature flash lamp or a bicolor temperature flash lamp. The double-color-temperature flash lamp is a combination of a warm-light flash lamp and a cold-light flash lamp, and can be used for light compensation at different color temperatures.

The audio circuit 1007 may include a microphone and a speaker. The microphone is used for collecting sound waves of a user and the environment, converting the sound waves into electric signals, and inputting the electric signals to the processor 1001 for processing or inputting the electric signals to the radio frequency circuit 1004 for realizing voice communication. For stereo sound collection or noise reduction purposes, multiple microphones can be provided, each at a different location of terminal 1000. The microphone may also be an array microphone or an omni-directional pick-up microphone. The speaker is used to convert electrical signals from the processor 1001 or the radio frequency circuit 1004 into sound waves. The loudspeaker can be a traditional film loudspeaker or a piezoelectric ceramic loudspeaker. When the speaker is a piezoelectric ceramic speaker, the speaker can be used for purposes such as converting an electric signal into a sound wave audible to a human being, or converting an electric signal into a sound wave inaudible to a human being to measure a distance. In some embodiments, the audio circuit 1007 may also include a headphone jack.

A Location component 1008 is employed to locate the current geographic Location of terminal 1000, for purposes of navigation or LBS (Location Based Service). The Positioning component 1008 may be a positioning component based on the GPS (Global Positioning System) of the United States, the BeiDou system of China, the GLONASS system of Russia, or the Galileo system of the European Union.

Power supply 1009 is used to supply power to various components in terminal 1000. The power source 1009 may be alternating current, direct current, disposable batteries, or rechargeable batteries. When the power source 1009 includes a rechargeable battery, the rechargeable battery may support wired charging or wireless charging. The rechargeable battery may also be used to support fast charge technology.

In some embodiments, terminal 1000 can also include one or more sensors 1010. The one or more sensors 1010 include, but are not limited to: acceleration sensor 1011, gyro sensor 1012, pressure sensor 1013, fingerprint sensor 1014, optical sensor 1015, and proximity sensor 1016.

Acceleration sensor 1011 can detect acceleration magnitudes on three coordinate axes of a coordinate system established with terminal 1000. For example, the acceleration sensor 1011 may be used to detect components of the gravitational acceleration in three coordinate axes. The processor 1001 may control the display screen 1005 to display the user interface in a landscape view or a portrait view according to the gravitational acceleration signal collected by the acceleration sensor 1011. The acceleration sensor 1011 may also be used for acquisition of motion data of a game or a user.

The gyro sensor 1012 may detect a body direction and a rotation angle of the terminal 1000, and the gyro sensor 1012 and the acceleration sensor 1011 may cooperate to acquire a 3D motion of the user on the terminal 1000. From the data collected by the gyro sensor 1012, the processor 1001 may implement the following functions: motion sensing (such as changing the UI according to a user's tilting operation), image stabilization at the time of photographing, game control, and inertial navigation.

Pressure sensor 1013 can be disposed on a side frame of terminal 1000 and/or underneath display screen 1005. When pressure sensor 1013 is disposed on a side frame of terminal 1000, a user's grip signal on terminal 1000 can be detected, and processor 1001 performs left-right hand recognition or shortcut operation according to the grip signal collected by pressure sensor 1013. When the pressure sensor 1013 is disposed at a lower layer of the display screen 1005, the processor 1001 controls the operability control on the UI interface according to the pressure operation of the user on the display screen 1005. The operability control comprises at least one of a button control, a scroll bar control, an icon control and a menu control.

The fingerprint sensor 1014 is used to collect the user's fingerprint, and the processor 1001 identifies the user according to the fingerprint collected by the fingerprint sensor 1014, or the fingerprint sensor 1014 identifies the user according to the collected fingerprint. When the user's identity is identified as a trusted identity, the processor 1001 authorizes the user to perform relevant sensitive operations, including unlocking the screen, viewing encrypted information, downloading software, making payments, changing settings, and the like. The fingerprint sensor 1014 may be disposed on the front, back, or side of terminal 1000. When a physical key or a manufacturer's logo is provided on terminal 1000, the fingerprint sensor 1014 may be integrated with the physical key or the manufacturer's logo.
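The authorization step above amounts to a simple gate on the verified identity; a minimal sketch follows, in which the operation names are illustrative placeholders rather than identifiers from this application.

```python
# A minimal sketch of gating sensitive operations on a trusted identity
# (operation names are illustrative placeholders, not from this application).
SENSITIVE_OPERATIONS = {
    "unlock_screen", "view_encrypted_information",
    "download_software", "pay", "change_settings",
}

def authorize(operation: str, identity_is_trusted: bool) -> bool:
    """Permit a sensitive operation only when the identity is trusted."""
    if operation not in SENSITIVE_OPERATIONS:
        return True  # non-sensitive operations need no fingerprint check
    return identity_is_trusted
```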

The optical sensor 1015 is used to collect the ambient light intensity. In one embodiment, the processor 1001 may control the display brightness of the display screen 1005 according to the ambient light intensity collected by the optical sensor 1015. Specifically, when the ambient light intensity is high, the display brightness of the display screen 1005 is increased; when the ambient light intensity is low, the display brightness of the display screen 1005 is decreased. In another embodiment, the processor 1001 may also dynamically adjust the shooting parameters of the camera assembly 1006 according to the ambient light intensity collected by the optical sensor 1015.
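A minimal sketch of such brightness adjustment, assuming a logarithmic mapping from ambient lux to a normalized brightness level (the lux range and the mapping itself are illustrative assumptions):

```python
import math

# A minimal sketch (illustrative assumptions) of mapping ambient light (lux)
# to a normalized display brightness in [0.0, 1.0].
def display_brightness(lux: float, min_lux: float = 1.0,
                       max_lux: float = 10_000.0) -> float:
    """Return a brightness level in [0.0, 1.0] for the given ambient lux."""
    lux = min(max(lux, min_lux), max_lux)
    # Perceived brightness is roughly logarithmic in luminance, so a log
    # mapping keeps the perceived change even across the lux range.
    return math.log(lux / min_lux) / math.log(max_lux / min_lux)
```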

The proximity sensor 1016, also known as a distance sensor, is typically disposed on the front panel of terminal 1000. The proximity sensor 1016 is used to collect the distance between the user and the front face of terminal 1000. In one embodiment, when the proximity sensor 1016 detects that the distance between the user and the front face of terminal 1000 gradually decreases, the processor 1001 controls the display screen 1005 to switch from the screen-on state to the screen-off state; when the proximity sensor 1016 detects that the distance between the user and the front face of terminal 1000 gradually increases, the processor 1001 controls the display screen 1005 to switch from the screen-off state to the screen-on state.
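This screen-state switching can be written as a small state update with hysteresis so the screen does not flicker near a single threshold; the two thresholds below are illustrative assumptions.

```python
# A minimal sketch of proximity-driven screen switching with hysteresis
# (the near/far thresholds are illustrative assumptions).
def next_screen_state(distance_cm: float, screen_on: bool,
                      near_cm: float = 3.0, far_cm: float = 5.0) -> bool:
    """Return True if the screen should be lit, False if it should be dark."""
    if screen_on and distance_cm < near_cm:
        return False   # user has brought the device close: darken the screen
    if not screen_on and distance_cm > far_cm:
        return True    # user has moved the device away: light the screen
    # Between the two thresholds, keep the current state (hysteresis).
    return screen_on
```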

Those skilled in the art will appreciate that the configuration shown in FIG. 10 is not intended to be limiting and that terminal 1000 can include more or fewer components than shown, or some components can be combined, or a different arrangement of components can be employed.

Fig. 11 is a schematic structural diagram of a server according to an embodiment of the present application. The server 1100 may vary considerably in configuration or performance, and may include one or more processors (CPUs) 1101 and one or more memories 1102, where the one or more memories 1102 store at least one piece of program code, and the at least one piece of program code is loaded and executed by the one or more processors 1101 to implement the text processing method provided by each of the above method embodiments. Of course, the server 1100 may also have components such as a wired or wireless network interface, a keyboard, and an input/output interface for performing input and output, and the server 1100 may also include other components for implementing device functions, which are not described herein again.

In an exemplary embodiment, there is also provided a computer-readable storage medium, such as a memory, including program code executable by a processor to perform the text processing method in the above-described embodiments. For example, the computer-readable storage medium may be a Read-Only Memory (ROM), a Random Access Memory (RAM), a Compact Disc Read-Only Memory (CD-ROM), a magnetic tape, a floppy disk, an optical data storage device, and the like.

In an exemplary embodiment, a computer program product is also provided, which comprises computer program code stored in a computer readable storage medium, which is loaded and executed by a processor of a computer device to perform the method steps of the text processing method provided in the above embodiments.

It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or by program code instructing relevant hardware, where the program code may be stored in a computer-readable storage medium, and the storage medium mentioned above may be a read-only memory, a magnetic disk, an optical disc, or the like.

The above description is only exemplary of the present application and should not be taken as limiting, as any modification, equivalent replacement, or improvement made within the spirit and principle of the present application should be included in the protection scope of the present application.
