Text processing method and device, computer equipment and computer readable storage medium

Document No.: 699310 | Publication date: 2021-05-04

Description: This technical solution, "Text processing method and device, computer equipment and computer readable storage medium" (文本处理方法、装置、计算机设备及计算机可读存储介质), was designed and created by Chen Xiaoliang, Zhao Ang, Ye Sen and Feng Dahang on 2021-01-13. Main content: The application provides a text processing method, apparatus, computer equipment and computer-readable storage medium, belonging to the technical field of natural language processing. The application first determines, from the text data to be normalized, the text segment to be subjected to text normalization, and performs text normalization processing on that segment only, reducing the amount of calculation; in addition, the embodiments of the application perform text normalization processing on the text segment based on its semantic features, and then replace the original text segment with the processed text segment to obtain new text data. This text normalization approach can effectively normalize text obtained through voice recognition, improving the readability and coherence of the text data, thereby ensuring the user's reading experience and improving text processing efficiency.

1. A method of text processing, the method comprising:

acquiring first text data; determining a first text segment to be subjected to text normalization from the first text data;

extracting semantic features of the first text segment, wherein the semantic features comprise at least one of a first type of semantic feature, a second type of semantic feature and a third type of semantic feature; the first type of semantic feature is used for indicating semantic information of the first text segment, the second type of semantic feature is used for indicating semantic information of the context text segment of the first text segment, and the third type of semantic feature is used for indicating pinyin information of the context text segment of the first text segment;

based on the semantic features of the first text segment, performing text normalization processing on the first text segment to obtain a second text segment;

and replacing the first text segment with the second text segment to obtain second text data.

2. The method according to claim 1, wherein the performing text normalization processing on the first text segment based on the semantic features of the first text segment to obtain a second text segment comprises:

inputting the semantic features of the first text segment into an encoder network of a text normalization model, performing feature mapping processing on the semantic features through the encoder network, and outputting encoding features of the semantic features;

inputting the encoding features into a decoder network of the text normalization model, and determining first probability distribution features corresponding to the encoding features through the decoder network;

inputting the first probability distribution features into a linear fully connected layer of the text normalization model, and concatenating the first probability distribution features through the linear fully connected layer to obtain concatenated first probability distribution features;

and inputting the concatenated first probability distribution features into a normalization layer of the text normalization model, and outputting the second text segment through the normalization layer.

3. The method according to claim 2, wherein the performing feature mapping processing on the semantic features through the encoder network, and outputting the encoding features of the semantic features comprises:

inputting the semantic features into a multi-head attention layer of the encoder network, and acquiring multi-level features of the semantic features through the multi-head attention layer;

inputting the multi-level features of the semantic features into a first normalization layer of the encoder network, and normalizing a first superposition feature through the first normalization layer to obtain first-type normalized features, wherein the first superposition feature is the superposition of the semantic features and the multi-level features of the semantic features;

inputting the first-type normalized features into a forward fully connected layer of the encoder network, and concatenating the first-type normalized features through the forward fully connected layer to obtain concatenated first-type normalized features;

and inputting the concatenated first-type normalized features into a second normalization layer of the encoder network, and normalizing the first-type normalized features and the concatenated first-type normalized features through the second normalization layer to obtain the encoding features.

4. The method of claim 2, wherein the determining, through the decoder network, the first probability distribution features corresponding to the encoding features comprises:

inputting the encoding features into a multi-head attention layer of the decoder network, and acquiring multi-level features of the encoding features through the multi-head attention layer;

inputting the multi-level features into a first normalization layer of the decoder network, and normalizing a second superposition feature through the first normalization layer to obtain second-type normalized features, wherein the second superposition feature is the superposition of the encoding features and the multi-level features of the encoding features;

inputting the second-type normalized features into a forward fully connected layer of the decoder network, and concatenating the second-type normalized features through the forward fully connected layer to obtain concatenated second-type normalized features;

and inputting the concatenated second-type normalized features into a second normalization layer of the decoder network, and normalizing the second-type normalized features and the concatenated second-type normalized features through the second normalization layer to obtain the first probability distribution features.

5. The method of claim 1, wherein the determining, from the first text data, a first text segment to be subjected to text normalization comprises:

performing word segmentation processing on the first text data;

inputting the characters obtained after the word segmentation processing into a text detection model, and determining the first text segment from the first text data through the text detection model.

6. The method of claim 5, wherein the inputting the characters obtained after the word segmentation processing into the text detection model, and determining the first text segment from the first text data through the text detection model comprises:

inputting the characters into an input layer of the text detection model, and acquiring text representation features of the characters through the input layer, wherein the text representation features are used for indicating word indexes of the characters in a dictionary;

inputting the text representation features into a word embedding layer of the text detection model, performing feature mapping processing on the text representation features through the word embedding layer, and outputting embedding features;

inputting the embedded features into a bidirectional recurrent neural network layer of the text detection model, and determining probability distribution features of the characters marked as various labels through the bidirectional recurrent neural network layer, wherein the labels are used for indicating the types of the characters;

inputting the probability distribution features into a fully connected layer of the text detection model, and concatenating the probability distribution features through the fully connected layer to obtain concatenated probability distribution features;

inputting the concatenated probability distribution features into a conditional random field output layer of the text detection model, and determining the label of each character through the conditional random field output layer;

determining the first text segment from the first text data based on the labels of the characters.

7. The method of claim 1, wherein extracting semantic features of the first text segment comprises:

inputting the first text segment into an input embedding layer of a text normalization model, and acquiring an encoding vector of the first text segment through the input embedding layer as the semantic features of the first text segment.

8. A method of text processing, the method comprising:

acquiring voice data; performing voice recognition based on the voice data to obtain first text data;

determining a first text segment to be subjected to text normalization from the first text data;

extracting semantic features of the first text segment, wherein the semantic features comprise at least one of a first type of semantic feature, a second type of semantic feature and a third type of semantic feature; the first type of semantic feature is used for indicating semantic information of the first text segment, the second type of semantic feature is used for indicating semantic information of the context text segment of the first text segment, and the third type of semantic feature is used for indicating pinyin information of the context text segment of the first text segment;

based on the semantic features of the first text segment, performing text normalization processing on the first text segment to obtain a second text segment;

and replacing the first text segment with the second text segment to obtain second text data.

9. A text processing apparatus, characterized in that the apparatus comprises:

the acquisition module is used for acquiring first text data;

the determining module is used for determining a first text segment to be subjected to text normalization from the first text data;

the extraction module is used for extracting semantic features of the first text segment, wherein the semantic features comprise at least one of a first type of semantic feature, a second type of semantic feature and a third type of semantic feature; the first type of semantic feature is used for indicating semantic information of the first text segment, the second type of semantic feature is used for indicating semantic information of the context text segment of the first text segment, and the third type of semantic feature is used for indicating pinyin information of the context text segment of the first text segment;

the normalization processing module is used for performing text normalization processing on the first text segment based on the semantic features of the first text segment to obtain a second text segment;

and the replacing module is used for replacing the first text segment with the second text segment to obtain second text data.

10. A text processing apparatus, characterized in that the apparatus comprises:

the acquisition module is used for acquiring voice data;

the voice recognition module is used for carrying out voice recognition based on the voice data to obtain first text data;

the determining module is used for determining a first text segment to be subjected to text normalization from the first text data;

the extraction module is used for extracting semantic features of the first text segment, wherein the semantic features comprise at least one of a first type of semantic feature, a second type of semantic feature and a third type of semantic feature; the first type of semantic feature is used for indicating semantic information of the first text segment, the second type of semantic feature is used for indicating semantic information of the context text segment of the first text segment, and the third type of semantic feature is used for indicating pinyin information of the context text segment of the first text segment;

the normalization processing module is used for performing text normalization processing on the first text segment based on the semantic features of the first text segment to obtain a second text segment;

and the replacing module is used for replacing the first text segment with the second text segment to obtain second text data.

11. A computer device comprising one or more processors and one or more memories having at least one program code stored therein, the program code loaded into and executed by the one or more processors to implement the operations performed by the text processing method of any one of claims 1 to 7; or, the operations performed by the text processing method of claim 8.

12. A computer-readable storage medium having at least one program code stored therein, the program code being loaded and executed by a processor to implement the operations performed by the text processing method according to any one of claims 1 to 7; or, the operations performed by the text processing method of claim 8.

Technical Field

The present application relates to the field of natural language processing technologies, and in particular, to a text processing method and apparatus, a computer device, and a computer-readable storage medium.

Background

Due to the particular nature of voice data, text obtained by performing voice recognition on voice data is generally poor in readability and coherence, and needs to be normalized. For example, owing to the language habits of a speaker or the environment the speaker is in, the speaker's voice data may contain redundant or repeated words, or words in reversed order; when voice recognition is performed on such voice data, the resulting text also contains these redundant, repeated, or order-reversed words, which seriously affects the reading experience of the user. Therefore, a text processing method is needed to normalize text obtained through voice recognition.

Disclosure of Invention

The embodiments of the present application provide a text processing method and apparatus, computer device, and computer-readable storage medium, which can effectively normalize text obtained through voice recognition, improve the readability and coherence of text data, ensure the user's reading experience, improve text processing efficiency, and reduce the amount of calculation required for text normalization. The technical scheme provided by the application is as follows:

in one aspect, a text processing method is provided, and the method includes:

acquiring first text data; determining a first text segment to be subjected to text normalization from the first text data;

extracting semantic features of the first text segment, wherein the semantic features comprise at least one of a first type of semantic feature, a second type of semantic feature and a third type of semantic feature; the first type of semantic feature is used for indicating semantic information of the first text segment, the second type of semantic feature is used for indicating semantic information of the context text segment of the first text segment, and the third type of semantic feature is used for indicating pinyin information of the context text segment of the first text segment;

based on the semantic features of the first text segment, carrying out text normalization processing on the first text segment to obtain a second text segment;

and replacing the first text segment with the second text segment to obtain second text data.

In a possible implementation manner, the performing text normalization processing on the first text segment based on the semantic features of the first text segment to obtain a second text segment includes:

inputting the semantic features of the first text segment into an encoder network of a text normalization model, performing feature mapping processing on the semantic features through the encoder network, and outputting encoding features of the semantic features;

inputting the encoding features into a decoder network of the text normalization model, and determining first probability distribution features corresponding to the encoding features through the decoder network;

inputting the first probability distribution features into a linear fully connected layer of the text normalization model, and concatenating the first probability distribution features through the linear fully connected layer to obtain concatenated first probability distribution features;

and inputting the concatenated first probability distribution features into a normalization layer of the text normalization model, and outputting the second text segment through the normalization layer.

In a possible implementation manner, the performing, by the encoder network, feature mapping processing on the semantic features and outputting the encoding features of the semantic features includes:

inputting the semantic features into a multi-head attention layer of the encoder network, and acquiring multi-level features of the semantic features through the multi-head attention layer;

inputting the multi-level features of the semantic features into a first normalization layer of the encoder network, and normalizing a first superposition feature through the first normalization layer to obtain first-type normalized features, wherein the first superposition feature is the superposition of the semantic features and the multi-level features of the semantic features;

inputting the first-type normalized features into a forward fully connected layer of the encoder network, and concatenating the first-type normalized features through the forward fully connected layer to obtain concatenated first-type normalized features;

and inputting the concatenated first-type normalized features into a second normalization layer of the encoder network, and normalizing the first-type normalized features and the concatenated first-type normalized features through the second normalization layer to obtain the encoding features.

In one possible implementation, the determining, through the decoder network, the first probability distribution features corresponding to the encoding features includes:

inputting the encoding features into a multi-head attention layer of the decoder network, and acquiring multi-level features of the encoding features through the multi-head attention layer;

inputting the multi-level features into a first normalization layer of the decoder network, and normalizing a second superposition feature through the first normalization layer to obtain second-type normalized features, wherein the second superposition feature is the superposition of the encoding features and the multi-level features of the encoding features;

inputting the second-type normalized features into a forward fully connected layer of the decoder network, and concatenating the second-type normalized features through the forward fully connected layer to obtain concatenated second-type normalized features;

and inputting the concatenated second-type normalized features into a second normalization layer of the decoder network, and normalizing the second-type normalized features and the concatenated second-type normalized features through the second normalization layer to obtain the first probability distribution features.
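For concreteness, the following is a minimal sketch, in PyTorch, of one encoder layer following the flow described above: multi-head attention, normalization of the superposition (residual) feature, a forward fully connected block, and a second normalization. The dimensions, module choices and the standard feed-forward block are illustrative assumptions rather than the patent's exact design; the decoder layer follows the same pattern with the encoding features as input.

```python
import torch
import torch.nn as nn

class EncoderLayer(nn.Module):
    def __init__(self, d_model=512, n_heads=8, d_ff=2048):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(d_model)    # first normalization layer
        self.ff = nn.Sequential(              # forward fully connected layer
            nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
        self.norm2 = nn.LayerNorm(d_model)    # second normalization layer

    def forward(self, x):
        attn_out, _ = self.attn(x, x, x)      # multi-level features of the input
        h = self.norm1(x + attn_out)          # normalize the first superposition feature
        return self.norm2(h + self.ff(h))     # second normalization over both feature sets

encoded = EncoderLayer()(torch.randn(1, 10, 512))  # (batch, seq_len, d_model)
```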

In one possible implementation manner, the determining, from the first text data, a first text segment to be subjected to text normalization includes:

performing word segmentation processing on the first text data;

inputting the characters obtained after the word segmentation processing into a text detection model, and determining the first text segment from the first text data through the text detection model.

In a possible implementation manner, the inputting the characters obtained after the word segmentation processing into the text detection model, and determining the first text segment from the first text data through the text detection model includes:

inputting the characters into an input layer of the text detection model, and acquiring text representation features of the characters through the input layer, wherein the text representation features are used for indicating the word indexes of the characters in a dictionary;

inputting the text representation features into a word embedding layer of the text detection model, performing feature mapping processing on the text representation features through the word embedding layer, and outputting embedding features;

inputting the embedding features into a bidirectional recurrent neural network layer of the text detection model, and determining probability distribution features of the characters marked as various labels through the bidirectional recurrent neural network layer, wherein the labels are used for indicating the types of the characters;

inputting the probability distribution features into a fully connected layer of the text detection model, and concatenating the probability distribution features through the fully connected layer to obtain concatenated probability distribution features;

inputting the concatenated probability distribution features into a conditional random field output layer of the text detection model, and determining the label of each character through the conditional random field output layer;

determining the first text segment from the first text data based on the labels of the characters.

In one possible implementation, the extracting semantic features of the first text segment includes:

inputting the first text segment into an input embedding layer of a text normalization model, and acquiring the encoding vector of the first text segment through the input embedding layer as the semantic features of the first text segment.

In one possible implementation, the training process of the text detection model includes:

obtaining first sample text data and sample labels of the first sample text data, wherein a sample label is used for indicating the type of a sample text segment to be normalized in the first sample text data;

the text detection model is trained based on the first sample text data and sample labels of the first sample text data.

In one possible implementation, the training process of the text normalization model includes:

acquiring second sample text data and normalized text data corresponding to the second sample text data;

training the text normalization model based on the second sample text data and the normalized text data.

In one aspect, a text processing method is provided, and the method includes:

acquiring voice data; performing voice recognition based on the voice data to obtain first text data;

determining a first text segment to be subjected to text normalization from the first text data;

extracting semantic features of the first text segment, wherein the semantic features comprise at least one of a first type of semantic feature, a second type of semantic feature and a third type of semantic feature; the first type of semantic feature is used for indicating semantic information of the first text segment, the second type of semantic feature is used for indicating semantic information of the context text segment of the first text segment, and the third type of semantic feature is used for indicating pinyin information of the context text segment of the first text segment;

based on the semantic features of the first text segment, carrying out text normalization processing on the first text segment to obtain a second text segment;

and replacing the first text segment with the second text segment to obtain second text data.

In one aspect, a text processing apparatus is provided, the apparatus including:

the acquisition module is used for acquiring first text data;

the determining module is used for determining a first text segment to be subjected to text normalization from the first text data;

the extraction module is used for extracting semantic features of the first text segment, wherein the semantic features comprise at least one of a first type of semantic feature, a second type of semantic feature and a third type of semantic feature; the first type of semantic feature is used for indicating semantic information of the first text segment, the second type of semantic feature is used for indicating semantic information of the context text segment of the first text segment, and the third type of semantic feature is used for indicating pinyin information of the context text segment of the first text segment;

the normalization processing module is used for performing text normalization processing on the first text segment based on the semantic features of the first text segment to obtain a second text segment;

and the replacing module is used for replacing the first text segment with the second text segment to obtain second text data.

In a possible implementation manner, the normalization processing module includes a first processing unit, a second processing unit, a third processing unit and a fourth processing unit;

the first processing unit is used for inputting the semantic features of the first text segment into an encoder network of a text normalization model, performing feature mapping processing on the semantic features through the encoder network, and outputting encoding features of the semantic features;

the second processing unit is used for inputting the encoding features into a decoder network of the text normalization model, and determining first probability distribution features corresponding to the encoding features through the decoder network;

the third processing unit is used for inputting the first probability distribution features into a linear fully connected layer of the text normalization model, and concatenating the first probability distribution features through the linear fully connected layer to obtain concatenated first probability distribution features;

the fourth processing unit is used for inputting the concatenated first probability distribution features into a normalization layer of the text normalization model, and outputting the second text segment through the normalization layer.

In a possible implementation manner, the first processing unit is used for inputting the semantic features into a multi-head attention layer of the encoder network, and acquiring multi-level features of the semantic features through the multi-head attention layer; inputting the multi-level features of the semantic features into a first normalization layer of the encoder network, and normalizing a first superposition feature through the first normalization layer to obtain first-type normalized features, wherein the first superposition feature is the superposition of the semantic features and the multi-level features of the semantic features; inputting the first-type normalized features into a forward fully connected layer of the encoder network, and concatenating the first-type normalized features through the forward fully connected layer to obtain concatenated first-type normalized features; and inputting the concatenated first-type normalized features into a second normalization layer of the encoder network, and normalizing the first-type normalized features and the concatenated first-type normalized features through the second normalization layer to obtain the encoding features.

In a possible implementation manner, the second processing unit is used for inputting the encoding features into a multi-head attention layer of the decoder network, and acquiring multi-level features of the encoding features through the multi-head attention layer; inputting the multi-level features into a first normalization layer of the decoder network, and normalizing a second superposition feature through the first normalization layer to obtain second-type normalized features, wherein the second superposition feature is the superposition of the encoding features and the multi-level features of the encoding features; inputting the second-type normalized features into a forward fully connected layer of the decoder network, and concatenating the second-type normalized features through the forward fully connected layer to obtain concatenated second-type normalized features; and inputting the concatenated second-type normalized features into a second normalization layer of the decoder network, and normalizing the second-type normalized features and the concatenated second-type normalized features through the second normalization layer to obtain the first probability distribution features.

In a possible implementation manner, the determining module comprises a word segmentation unit and a determining unit;

the word segmentation unit is used for performing word segmentation processing on the first text data;

the determining unit is used for inputting the characters obtained after the word segmentation processing into a text detection model, and determining the first text segment from the first text data through the text detection model.

In a possible implementation manner, the determining unit is used for inputting the characters into an input layer of the text detection model, and acquiring text representation features of the characters through the input layer, wherein the text representation features are used for indicating the word indexes of the characters in a dictionary; inputting the text representation features into a word embedding layer of the text detection model, performing feature mapping processing on the text representation features through the word embedding layer, and outputting embedding features; inputting the embedding features into a bidirectional recurrent neural network layer of the text detection model, and determining probability distribution features of the characters marked as various labels through the bidirectional recurrent neural network layer, wherein the labels are used for indicating the types of the characters; inputting the probability distribution features into a fully connected layer of the text detection model, and concatenating the probability distribution features through the fully connected layer to obtain concatenated probability distribution features; inputting the concatenated probability distribution features into a conditional random field output layer of the text detection model, and determining the label of each character through the conditional random field output layer; and determining the first text segment from the first text data based on the labels of the characters.

In a possible implementation manner, the extraction module is used for inputting the first text segment into an input embedding layer of a text normalization model, and acquiring, through the input embedding layer, an encoding vector of the first text segment as the semantic features of the first text segment.

In one possible implementation, the training process of the text detection model includes:

obtaining first sample text data and sample labels of the first sample text data, wherein a sample label is used for indicating the type of a sample text segment to be normalized in the first sample text data;

the text detection model is trained based on the first sample text data and sample labels of the first sample text data.

In one possible implementation, the training process of the text normalization model includes:

acquiring second sample text data and normalized text data corresponding to the second sample text data;

training the text normalization model based on the second sample text data and the normalized text data.

In one aspect, a text processing apparatus is provided, the apparatus including:

the acquisition module is used for acquiring voice data;

the voice recognition module is used for carrying out voice recognition based on the voice data to obtain first text data;

the determining module is used for determining a first text segment to be subjected to text normalization from the first text data;

the extraction module is used for extracting semantic features of the first text segment, wherein the semantic features comprise at least one of a first type of semantic feature, a second type of semantic feature and a third type of semantic feature; the first type of semantic feature is used for indicating semantic information of the first text segment, the second type of semantic feature is used for indicating semantic information of the context text segment of the first text segment, and the third type of semantic feature is used for indicating pinyin information of the context text segment of the first text segment;

the normalization processing module is used for performing text normalization processing on the first text segment based on the semantic features of the first text segment to obtain a second text segment;

and the replacing module is used for replacing the first text segment with the second text segment to obtain second text data.

In one aspect, a computer device is provided that includes one or more processors and one or more memories having at least one program code stored therein, the program code being loaded and executed by the one or more processors to perform the operations performed by the text processing method.

In one aspect, a computer-readable storage medium having at least one program code stored therein is provided, the program code being loaded and executed by a processor to implement the operations performed by the text processing method.

In one aspect, a computer program product is provided that includes computer program code that is loaded and executed by a processor to perform the operations performed by the text processing method.

According to the scheme provided by the application, the text segment to be subjected to text normalization is first determined from the text data to be normalized, and only that text segment is subjected to text normalization, which reduces the amount of calculation; in addition, the text segment is normalized based on its semantic features, and the processed text segment then replaces the original text segment to obtain new text data. This normalization approach effectively normalizes text obtained through speech recognition, improving the readability and coherence of the text data, thereby ensuring the user's reading experience and improving text processing efficiency.

Drawings

In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed in the description of the embodiments are briefly introduced below. It is apparent that the drawings in the following description show only some embodiments of the present application, and that those skilled in the art can obtain other drawings based on these drawings without creative effort.

Fig. 1 is a schematic diagram of an implementation environment of a text processing method according to an embodiment of the present application;

fig. 2 is a flowchart of a text processing method provided in an embodiment of the present application;

fig. 3 is a flowchart of a text processing method provided in an embodiment of the present application;

fig. 4 is a flowchart of a text processing method provided in an embodiment of the present application;

FIG. 5 is a schematic processing diagram of a text detection model according to an embodiment of the present application;

FIG. 6 is a schematic diagram of a process of a text normalization model according to an embodiment of the present application;

fig. 7 is a flowchart of a text processing method provided in an embodiment of the present application;

FIG. 8 is a schematic structural diagram of a text processing apparatus according to an embodiment of the present application;

fig. 9 is a schematic structural diagram of a text processing apparatus according to an embodiment of the present application;

fig. 10 is a schematic structural diagram of a terminal according to an embodiment of the present application;

fig. 11 is a schematic structural diagram of a server according to an embodiment of the present application.

Detailed Description

To make the objects, technical solutions and advantages of the present application more clear, embodiments of the present application will be described in further detail below with reference to the accompanying drawings.

Fig. 1 is a schematic diagram of an implementation environment of a text processing method provided in an embodiment of the present application, and referring to fig. 1, the implementation environment includes: a terminal 101 and a server 102.

The terminal 101 may be at least one of a smartphone, a game console, a desktop computer, a tablet computer, an e-book reader, an MP3 (Moving Picture Experts Group Audio Layer III) player, an MP4 (Moving Picture Experts Group Audio Layer IV) player, and a laptop computer. The terminal 101 and the server 102 are connected by wired or wireless communication, which is not limited in the embodiment of the present application. The terminal 101 collects voice data of a user through a microphone assembly and then sends the collected voice data to the server 102, so that the server 102 forwards the voice data to the corresponding other terminals. The microphone assembly may be embedded in or externally connected to the terminal 101, which is not limited in the embodiment of the present application. After obtaining the voice data, the terminal 101 may also perform voice recognition on the voice data, perform text normalization processing on the text data obtained by the voice recognition to obtain normalized text data, and then send the normalized text data to the server 102, so that the server 102 forwards it to the corresponding other terminals.

The terminal 101 may also receive voice data sent by other terminals through the server 102, send a text conversion request to the server 102 in response to a user triggering the text conversion control corresponding to the voice data, and receive the text data corresponding to the voice data returned by the server 102 based on the text conversion request. Alternatively, in response to a user triggering the text conversion control corresponding to the received voice data, the terminal 101 performs voice recognition on the voice data, performs text normalization processing on the text data obtained by the voice recognition to obtain normalized text data, and displays the normalized text data.

The terminal 101 generally refers to one of a plurality of terminals, and the embodiment of the present application is illustrated with the terminal 101. Those skilled in the art will appreciate that the number of terminals may be greater or fewer; for example, there may be only a few terminals, or several tens, hundreds, or more. The embodiment of the present application does not limit the number of terminals 101 or the device type.

The server 102 may be at least one of a single server, a plurality of servers, a cloud computing platform, and a virtualization center. The server 102 and the terminal 101 are connected by wired or wireless communication, which is not limited in the embodiment of the present application. The server 102 receives the voice data sent by the terminal 101 and forwards it to the corresponding terminal; alternatively, the server 102 receives the normalized text data sent by the terminal 101 and forwards it to the corresponding terminal. The server 102 may also receive a text conversion request sent by the terminal 101, perform voice recognition on the voice data corresponding to the text conversion request, perform text normalization processing on the text data obtained by the voice recognition to obtain normalized text data, and send the normalized text data to the terminal 101. Optionally, the number of servers may be greater or fewer, which is not limited in the embodiment of the present application. Of course, the server 102 may also include other functional servers to provide more comprehensive and diverse services.

Fig. 2 is a flowchart of a text processing method provided in an embodiment of the present application, and referring to fig. 2, the method includes:

201. The computer device acquires first text data, and determines, from the first text data, a first text segment to be subjected to text normalization.

202. The computer device extracts semantic features of the first text segment, wherein the semantic features comprise at least one of a first type of semantic feature, a second type of semantic feature and a third type of semantic feature; the first type of semantic feature is used for indicating semantic information of the first text segment, the second type of semantic feature is used for indicating semantic information of the context text segment of the first text segment, and the third type of semantic feature is used for indicating pinyin information of the context text segment of the first text segment.

203. The computer device performs text normalization processing on the first text segment based on the semantic features of the first text segment to obtain a second text segment.

204. The computer device replaces the first text segment with the second text segment to obtain second text data.

According to the scheme provided by the embodiment of the application, the text segment to be subjected to text normalization is first determined from the text data to be normalized, and only that text segment is subjected to text normalization, which reduces the amount of calculation; in addition, the text segment is normalized based on its semantic features, and the processed text segment then replaces the original text segment to obtain new text data. This normalization approach effectively normalizes text obtained through speech recognition, improving the readability and coherence of the text data, thereby ensuring the user's reading experience and improving text processing efficiency.
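Taken together, steps 201 to 204 form a detect-and-rewrite flow over the input text. The sketch below illustrates that flow only; the two helpers are hypothetical placeholders for the text detection model and the text normalization model described in the implementations that follow.

```python
def detect_segment(text: str) -> tuple[int, int]:
    # Placeholder for step 201: a real implementation runs the text detection
    # model and returns the span of the first text segment to be normalized.
    return 0, len(text)

def normalize_segment(segment: str, context: str) -> str:
    # Placeholder for steps 202-203: a real implementation extracts semantic
    # features and runs the encoder-decoder text normalization model.
    return segment

def process_text(first_text_data: str) -> str:
    start, end = detect_segment(first_text_data)
    second_segment = normalize_segment(first_text_data[start:end], first_text_data)
    # Step 204: replace the first text segment with the second text segment.
    return first_text_data[:start] + second_segment + first_text_data[end:]
```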

In a possible implementation manner, the performing text normalization processing on the first text segment based on the semantic features of the first text segment to obtain a second text segment includes:

inputting the semantic features of the first text segment into an encoder network of a text normalization model, performing feature mapping processing on the semantic features through the encoder network, and outputting encoding features of the semantic features;

inputting the encoding features into a decoder network of the text normalization model, and determining first probability distribution features corresponding to the encoding features through the decoder network;

inputting the first probability distribution features into a linear fully connected layer of the text normalization model, and concatenating the first probability distribution features through the linear fully connected layer to obtain concatenated first probability distribution features;

and inputting the concatenated first probability distribution features into a normalization layer of the text normalization model, and outputting the second text segment through the normalization layer.

In a possible implementation manner, the performing, by the encoder network, feature mapping processing on the semantic features and outputting the encoding features of the semantic features includes:

inputting the semantic features into a multi-head attention layer of the encoder network, and acquiring multi-level features of the semantic features through the multi-head attention layer;

inputting the multi-level features of the semantic features into a first normalization layer of the encoder network, and normalizing a first superposition feature through the first normalization layer to obtain first-type normalized features, wherein the first superposition feature is the superposition of the semantic features and the multi-level features of the semantic features;

inputting the first-type normalized features into a forward fully connected layer of the encoder network, and concatenating the first-type normalized features through the forward fully connected layer to obtain concatenated first-type normalized features;

and inputting the concatenated first-type normalized features into a second normalization layer of the encoder network, and normalizing the first-type normalized features and the concatenated first-type normalized features through the second normalization layer to obtain the encoding features.

In one possible implementation, the determining, through the decoder network, the first probability distribution features corresponding to the encoding features includes:

inputting the encoding features into a multi-head attention layer of the decoder network, and acquiring multi-level features of the encoding features through the multi-head attention layer;

inputting the multi-level features into a first normalization layer of the decoder network, and normalizing a second superposition feature through the first normalization layer to obtain second-type normalized features, wherein the second superposition feature is the superposition of the encoding features and the multi-level features of the encoding features;

inputting the second-type normalized features into a forward fully connected layer of the decoder network, and concatenating the second-type normalized features through the forward fully connected layer to obtain concatenated second-type normalized features;

and inputting the concatenated second-type normalized features into a second normalization layer of the decoder network, and normalizing the second-type normalized features and the concatenated second-type normalized features through the second normalization layer to obtain the first probability distribution features.

In one possible implementation manner, the determining, from the first text data, a first text segment to be subjected to text normalization includes:

performing word segmentation processing on the first text data;

inputting the characters obtained after the word segmentation processing into a text detection model, and determining the first text segment from the first text data through the text detection model.

In a possible implementation manner, the inputting the characters obtained after the word segmentation processing into the text detection model, and determining the first text segment from the first text data through the text detection model includes:

inputting the characters into an input layer of the text detection model, and acquiring text representation features of the characters through the input layer, wherein the text representation features are used for indicating the word indexes of the characters in a dictionary;

inputting the text representation features into a word embedding layer of the text detection model, performing feature mapping processing on the text representation features through the word embedding layer, and outputting embedding features;

inputting the embedding features into a bidirectional recurrent neural network layer of the text detection model, and determining probability distribution features of the characters marked as various labels through the bidirectional recurrent neural network layer, wherein the labels are used for indicating the types of the characters;

inputting the probability distribution features into a fully connected layer of the text detection model, and concatenating the probability distribution features through the fully connected layer to obtain concatenated probability distribution features;

inputting the concatenated probability distribution features into a conditional random field output layer of the text detection model, and determining the label of each character through the conditional random field output layer;

the first text segment is determined from the first text data based on the labels of the characters.

In one possible implementation, the extracting semantic features of the first text segment includes:

inputting the first text segment into an input embedding layer of a text normalization model, and acquiring the encoding vector of the first text segment through the input embedding layer as the semantic features of the first text segment.

In one possible implementation, the training process of the text detection model includes:

obtaining first sample text data and sample labels of the first sample text data, wherein a sample label is used for indicating the type of a sample text segment to be normalized in the first sample text data;

the text detection model is trained based on the first sample text data and sample labels of the first sample text data.

In one possible implementation, the training process of the text normalization model includes:

acquiring second sample text data and normalized text data corresponding to the second sample text data;

training the text normalization model based on the second sample text data and the normalized text data.

Fig. 3 is a flowchart of a text processing method provided in an embodiment of the present application, and referring to fig. 3, the method includes:

301. The computer device acquires voice data, and performs voice recognition based on the voice data to obtain first text data.

302. The computer device determines, from the first text data, a first text segment to be subjected to text normalization.

303. The computer device extracts semantic features of the first text segment, wherein the semantic features comprise at least one of a first type of semantic feature, a second type of semantic feature and a third type of semantic feature; the first type of semantic feature is used for indicating semantic information of the first text segment, the second type of semantic feature is used for indicating semantic information of the context text segment of the first text segment, and the third type of semantic feature is used for indicating pinyin information of the context text segment of the first text segment.

304. The computer device performs text normalization processing on the first text segment based on the semantic features of the first text segment to obtain a second text segment.

305. The computer device replaces the first text segment with the second text segment to obtain second text data.

According to the scheme provided by the embodiment of the application, voice recognition is first performed on the acquired voice data to obtain the text data to be normalized; the text segment to be subjected to text normalization is then determined from that text data, and only that text segment is normalized, which reduces the amount of calculation; in addition, the text segment is normalized based on its semantic features, and the processed text segment then replaces the original text segment to obtain new text data. This normalization approach effectively normalizes text obtained through speech recognition, improving the readability and coherence of the text data, thereby ensuring the user's reading experience and improving text processing efficiency.

Fig. 4 is a flowchart of a text processing method provided in an embodiment of the present application, and referring to fig. 4, the method includes:

401. The server acquires first text data.

In a possible implementation manner, the terminal acquires text data input by a user, and further sends the acquired text data as first text data to the server, so that the server acquires the first text data.

402. The server performs word segmentation processing on the first text data.

In one possible implementation manner, the server separates the characters in the first text data with spaces to perform word segmentation processing on the first text data, so that in subsequent processing the characters in the first text data are handled based on the word segmentation result.

Taking the first text data as "issuing wish service indexes (two zero to three zero to four) developed by small and medium-sized enterprises in S city", for example, "issuing three thousand copies" of the wish service indexes (two zero to three zero to four) developed by small and medium-sized enterprises in shanghai city "after word separation processing.

It should be noted that the above process is only an exemplary way of performing word segmentation processing on text data; in other possible implementations, other ways may be used to perform word segmentation processing on text data, which is not limited in this embodiment of the present application.

403. The server inputs the characters obtained after the word segmentation processing into a text detection model, and determines the first text segment from the first text data through the text detection model.

It should be noted that the text detection model includes an input layer, a word embedding layer, a bidirectional recurrent neural network (Bi-RNN) layer, a linear fully connected layer, and a conditional random field (CRF) output layer. Optionally, a convolutional layer or a Transformer model is used in place of the bidirectional recurrent neural network layer, which is not limited in this embodiment.
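A minimal PyTorch sketch of this layer stack is given below. The vocabulary size, embedding width, hidden size and label count are illustrative assumptions; the conditional random field output layer is omitted here and treated separately with formula (2) below.

```python
import torch
import torch.nn as nn

class TextDetectionModel(nn.Module):
    def __init__(self, vocab_size=5000, emb_dim=128, hidden=256, num_labels=5):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_dim)      # word embedding layer
        self.birnn = nn.LSTM(emb_dim, hidden, bidirectional=True,
                             batch_first=True)                  # bidirectional recurrent layer
        self.fc = nn.Linear(2 * hidden, num_labels)             # linear fully connected layer

    def forward(self, char_indices):                            # (batch, seq_len) word indexes
        h, _ = self.birnn(self.embedding(char_indices))
        return self.fc(h)                                       # per-character label scores

scores = TextDetectionModel()(torch.randint(0, 5000, (1, 12)))  # shape (1, 12, 5)
```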

In a possible implementation manner, the server first inputs the characters into the input layer of the text detection model, and obtains the text representation features of the characters through the input layer, where the text representation features are used to indicate the word indexes of the characters in a dictionary; the dictionary stores a plurality of characters, each character corresponds to a word index, and the word index indicates the storage location of the character in the dictionary, so that the corresponding character can be determined from the dictionary according to the word index. The server then inputs the text representation features into the word embedding layer of the text detection model, performs feature mapping processing on the text representation features through the word embedding layer, and outputs the embedding features; inputs the embedding features into the bidirectional recurrent neural network layer of the text detection model, and determines, through the bidirectional recurrent neural network layer, the probability distribution features of the characters marked as various labels, where the labels are used to indicate the types of the characters; inputs the probability distribution features into the fully connected layer of the text detection model, and concatenates the probability distribution features through the fully connected layer to obtain the concatenated probability distribution features; inputs the concatenated probability distribution features into the conditional random field output layer of the text detection model, and determines the label of each character through the conditional random field output layer; and finally determines the first text segment from the first text data based on the labels of the characters.

Referring to fig. 5, fig. 5 is a schematic diagram of the processing procedure of a text detection model provided in the embodiment of the present application. Taking the first text data "the volunteer service guidelines (two zero two zero to two zero two four) developed for small and medium-sized enterprises in S city were issued in three thousand copies" as an example, the server inputs each character of the first text data obtained by word segmentation into the text detection model. After each character is input into the input layer of the text detection model, the input characters are converted, through a lookup table of the input layer, into a vector X = (x1, x2, x3, …, xn) serving as the text representation features of the characters; the vector X is input into the word embedding layer of the text detection model and mapped by the word embedding layer into fixed-length vectors E = (e1, e2, e3, …, en) serving as the embedding features; the vectors E are input into the bidirectional recurrent neural network layer of the text detection model, which determines, for each character, the probability distribution over the candidate labels; the fully connected layer of the text detection model then concatenates the probability distributions of the characters and outputs the result to the conditional random field layer of the text detection model, which determines, for each character, the label with the highest probability as the label of that character; the first text segment is then determined based on the label of each character in the first text data. Optionally, after determining the first text segment to be subjected to text normalization, the server marks the position of the first text segment in the first text data, so that the subsequent replacement of the first text segment is performed according to that position.

When the bidirectional recurrent neural network layer determines the probability distribution of any character being marked as the various labels, its inputs are the embedding feature of that character, the probability distribution of the previous character in the first text data being marked as the various labels, and the probability distribution of the next character in the first text data being marked as the various labels. That is, the context of the character in the text data is taken into account, which improves the accuracy of the determined probability distribution result.

For the conditional random field layer, its input is the spliced probability distribution feature, that is, an n × k matrix, where n denotes the input length and k is the number of labels. The conditional random field takes an observation sequence, denoted D = (d1, d2, d3, …, dn), and outputs a marker sequence, denoted Y = (y1, y2, y3, …, yn). The goal of the conditional random field layer is to construct a conditional probability model P(Y | x), see the following formula (1):

$$P(y \mid x) = \frac{1}{Z}\exp\Bigl(\sum_{i,j}\lambda_j\,t_j(y_{i-1},y_i,x,i) + \sum_{i,k}\mu_k\,s_k(y_i,x,i)\Bigr) \tag{1}$$

where $t_j(y_{i-1},y_i,x,i)$ is a transfer function defined at two adjacent marker positions of the observation sequence, representing the correlation between adjacent marker variables and the effect of the observation sequence on them; $s_k(y_i,x,i)$ is a state feature function defined at marker position $i$ of the observation sequence, representing the effect of the observation sequence on the marker variable; $\lambda_j$ and $\mu_k$ are preset parameters; and $Z$ is a normalization factor.

For the predicted tag sequence, that is, the marker sequence Y = (y1, y2, y3, …, yn), the score calculation formula is seen in the following formula (2):

$$\mathrm{score}(Y)=\sum_{i=1}^{n}P_{i,y_i}+\sum_{i=2}^{n}A_{y_{i-1},y_i} \tag{2}$$

where $P_{i,y_i}$ is the probability that the label input and output at position $i$ is $y_i$, and $A_{y_{i-1},y_i}$ is the transition probability from $y_{i-1}$ to $y_i$.
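
To make formula (2) concrete, the following sketch computes the score of one candidate marker sequence from an emission matrix P and a transition matrix A; the random matrices, their sizes, and the example sequence are illustrative assumptions, not values from the embodiment.

```python
import numpy as np

n, k = 5, 4                  # input length, number of labels
P = np.random.rand(n, k)     # P[i, y]: probability that the label at position i is y
A = np.random.rand(k, k)     # A[y_prev, y]: transition probability from y_prev to y

def sequence_score(y):
    # score(Y) = sum_i P[i, y_i] + sum_i A[y_{i-1}, y_i], as in formula (2)
    emission = sum(P[i, y[i]] for i in range(n))
    transition = sum(A[y[i - 1], y[i]] for i in range(1, n))
    return emission + transition

y = [0, 1, 1, 2, 3]          # one candidate marker sequence
print(sequence_score(y))     # the layer ultimately selects the highest-scoring sequence
```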

It should be noted that the text detection model is trained by the server in advance. In one possible implementation manner, a server acquires first sample text data and a sample label of the first sample text data, wherein the sample label is used for indicating the type of a sample text fragment to be normalized in the sample text data; the text detection model is trained based on the first sample text data and sample labels of the first sample text data.

When training the text detection model based on the first sample text data and its sample labels, the server first inputs the first piece of the first sample text data into an initial text detection model and determines its label through the initial text detection model; it then determines the loss function value of the initial text detection model based on that label and the corresponding sample label, and adjusts the parameters of the initial text detection model by utilizing a gradient correction network according to the loss function value, obtaining the text detection model after the first parameter adjustment.

The server then inputs the second piece of the first sample text data into the text detection model after the first parameter adjustment, determines its label through that model, determines the loss function value of the model based on that label and the corresponding sample label, and adjusts the model parameters by utilizing the gradient correction network according to the loss function value, obtaining the text detection model after the second parameter adjustment. By analogy, the parameters of the text detection model are continuously adjusted based on each piece of sample text data in the first sample text data until a text detection model meeting the target condition is obtained. The target condition is that the similarity between the label output by the model and the sample label meets an iteration cutoff condition, or that the loss function value of the model meets an iteration cutoff condition, or that the number of iterations reaches a preset number; which condition is adopted is not limited in the embodiment of the present application.
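
The piece-by-piece parameter adjustment described above corresponds to an ordinary supervised training loop. The sketch below is a hypothetical illustration: the tiny tagger, the random stand-in samples, and the fixed epoch count are assumptions, with cross-entropy standing in for the unspecified loss function.

```python
import torch
import torch.nn as nn

class TinyTagger(nn.Module):
    # Minimal stand-in for the text detection model; sizes are illustrative.
    def __init__(self, vocab=5000, dim=64, labels=4):
        super().__init__()
        self.emb = nn.Embedding(vocab, dim)
        self.fc = nn.Linear(dim, labels)

    def forward(self, x):
        return self.fc(self.emb(x))

model = TinyTagger()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# Stand-in for the first sample text data: (character indexes, per-character sample labels).
samples = [(torch.randint(0, 5000, (1, 20)), torch.randint(0, 4, (1, 20)))
           for _ in range(8)]

for epoch in range(3):                   # repeat until the target condition is met
    for chars, gold in samples:
        loss = loss_fn(model(chars).view(-1, 4), gold.view(-1))
        optimizer.zero_grad()
        loss.backward()                  # gradients for the parameter adjustment
        optimizer.step()                 # adjust parameters from the loss function value
```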

The sample labels of the first sample text data are labeled in advance by relevant technicians, who mark the text segments belonging to preset normalization types according to preset marking rules to obtain the sample labels of the first sample text data. The preset normalization types include a number type, a date type, a spoken-word type, and a formula type, and optionally other types, which are not limited in the embodiment of the present application. Different normalization types can be defined as corresponding labels; for example, the number type is defined as a "num" label, the date type as a "date" label, the time type as a "time" label, the spoken-word type (including repeated tone words, filler words, and the like) as a "reduce" label, the formula type as a "formula" label, and so on, so that a relevant technician marks, according to the preset normalization types, the position of each text segment to be normalized in the first sample text data and the sample label corresponding to it. Taking the first sample text data as "issuing three thousand volunteer service indexes (two zero one three to two zero one four) for small and medium-sized enterprises in S city" as an example, where "two zero one three to two zero one four" and "three thousand" are sample text fragments to be subjected to text normalization and both belong to the number type, the sample labels of these two sample text fragments in the first sample text data are labeled as "num" labels.

In addition, if a new preset normalization type is introduced, a corresponding label is added. For the newly added normalization type, the text detection model can be iteratively retrained by supplementing a quantity of labeled data of that type, so that the model adapts to the corresponding normalization task, which improves the robustness of this scheme in solving normalization problems.

In more possible implementation manners, default labels are set, and all the preset normalization types are marked with the same default labels, so that segments of all preset normalization types are identified in a unified manner, reducing the cost and workload of labeling. For example, let the label marking the start character of a text fragment to be normalized be B, the label marking a middle character be I, the label marking the end character be E, and the label marking a character outside any such fragment be O. Then, for the first text data "issuing three thousand volunteer service indexes (two zero one three to two zero one four) for small and medium-sized enterprises in S city", each character outside the two target fragments is tagged O, the characters of "two zero one three to two zero one four" are tagged B I I I I I I I E, and the characters of "three thousand" are tagged B E.
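
Producing such a default-label sequence from the marked positions of the segments to be normalized is mechanical; the helper below is a hypothetical illustration of the B/I/E/O scheme, with an invented example string and offsets.

```python
def spans_to_tags(text, spans):
    # spans: list of (start, end) character offsets of segments to be normalized; end is exclusive.
    tags = ["O"] * len(text)
    for start, end in spans:
        tags[start] = "B"
        for i in range(start + 1, end - 1):
            tags[i] = "I"
        if end - start > 1:                 # single-character segments keep only the B tag here
            tags[end - 1] = "E"
    return tags

text = "ABCDEFGH"
print(spans_to_tags(text, [(2, 6)]))        # ['O', 'O', 'B', 'I', 'I', 'E', 'O', 'O']
```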

By labeling only the text segments belonging to the preset normalization types, there is no need to label every text segment in the text data, which reduces the amount of label data required during model training and the associated labeling cost.

It should be noted that the above is only an exemplary manner of determining the first text segment to be subjected to text normalization; in more possible implementation manners, the first text segment is determined in other manners, which is not limited in the embodiment of the present application.

404. The server extracts semantic features of the first text segment, wherein the semantic features comprise at least one of a first type of semantic features, a second type of semantic features and a third type of semantic features; the first semantic feature is used for indicating semantic information of the first text segment, the second semantic feature is used for indicating semantic information of a context text segment of the first text segment, and the third semantic feature is used for indicating pinyin information of the context text segment of the first text segment.

It should be noted that, in this step 404, the process of extracting the semantic features of the first text segment is implemented by a text normalization model, which adopts a Transformer encoder-decoder (Encoder-Decoder) framework and includes an input embedding layer, an encoder network, a decoder network, a linear fully connected layer, and a normalization layer. Optionally, the text normalization model adopts a Recurrent Neural Network (RNN) framework instead, which is not limited in this embodiment of the present application.

In a possible implementation manner, the server inputs the first text segment into the input embedding layer of the text normalization model and obtains, through the input embedding layer, the encoding vector of the first text segment as the semantic feature of the first text segment.

The second type of semantic feature is the semantic feature of the context window text containing the first text segment; the window length is the sum of the number of characters of the first text segment and the left and right boundary step sizes, and is generally greater than or equal to the number of characters of the first text segment. The third type of semantic feature is the result of converting the context window text containing the first text segment into pinyin, with non-Chinese characters kept the same as the original characters.

Taking the first text data as "issuing three thousand volunteer service indexes (two zero one three to two zero one four) for small and medium-sized enterprises in S city" as an example, the first text fragments are "two zero one three to two zero one four" and "three thousand". Taking the first text fragment "two zero one three to two zero one four" as an example, the first type of semantic feature of this fragment is its own text information, namely "two zero one three to two zero one four". Taking the left and right boundary step size of the window as 4 as an example, the window text of the window where the first text fragment is located is "service indexes (two zero one three to two zero one four) three thousand"; if the left and right boundary step size is less than 4, the second type of semantic feature does not need to be obtained. Optionally, the left and right boundary step size of the window takes other values, which is not limited in the embodiment of the present application. The third type of semantic feature of the first text fragment is the pinyin information obtained by converting the window text into pinyin, namely ['wu', 'suo', 'yin', '(', 'er', 'qian', 'ling', 'yi', 'shi', 'san', 'zhi', 'er', 'qian', 'ling', 'yi', 'shi', 'si', ')', 'san', 'qian'], where non-Chinese characters are represented by the source characters, such as the bracket characters in the above example.

In a possible implementation manner, after the server obtains the three types of semantic features, it splices them together to obtain the spliced semantic feature, which is subsequently input as the semantic feature into the encoder network of the text normalization model. When splicing the features, the three types of semantic features are joined through a custom symbol. For example, the normalization-target text and the window text are split into characters, the window pinyin is split on spaces, and the three semantic features are joined by the custom symbol "&": the character sequence of "two zero one three to two zero one four", then "&", then the character sequence of the window text "service indexes (two zero one three to two zero one four) three thousand", then "&", then the pinyin sequence ['wu', 'suo', 'yin', '(', 'er', 'qian', 'ling', 'yi', 'shi', 'san', 'zhi', 'er', 'qian', 'ling', 'yi', 'shi', 'si', ')', 'san', 'qian']. Optionally, the custom symbol is of another type, which is not limited in this embodiment of the application.
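
A hypothetical sketch of this splicing step follows; the pinyin lookup is a stub (a real system might rely on a library such as pypinyin), and the example strings are deliberately short.

```python
# Illustrative splice of the three semantic features with the custom symbol "&".
PINYIN = {"三": "san", "千": "qian"}    # stub lookup; real pinyin conversion is assumed

def splice_features(segment, window):
    seg_chars = list(segment)
    win_chars = list(window)
    win_pinyin = [PINYIN.get(c, c) for c in window]   # non-Chinese characters kept as-is
    return seg_chars + ["&"] + win_chars + ["&"] + win_pinyin

print(splice_features("三千", "(三千)"))
# ['三', '千', '&', '(', '三', '千', ')', '&', '(', 'san', 'qian', ')']
```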

405. The server inputs the semantic features of the first text segment into an encoder network of a text normalization model, performs feature mapping processing on the semantic features through the encoder network, and outputs the encoding features of the semantic features.

It should be noted that the Encoder network is composed of N identical Layers, and each Layer is composed of two Sub-Layers: a Multi-Head Attention layer and a feed-forward fully connected (Feed Forward) layer. An Add & Norm step, consisting of a Residual Connection and Layer Normalization, is applied after each Sub-Layer; that is, the encoder network includes the multi-head attention layer, the first normalization layer, the feed-forward fully connected layer, and the second normalization layer, and the output of each Sub-Layer is: Sub_layer_output = LayerNorm(x + SubLayer(x)).

In a possible implementation manner, the server inputs the semantic features into a multi-head attention layer of the encoder network, and obtains the features of the semantic features at multiple layers through the multi-head attention layer; inputting the features of the semantic features in multiple layers into a first normalization layer of the encoder network, and performing normalization processing on the first superposition features through the first normalization layer to obtain first-class normalized features; the first superposition characteristic is a superposition result of the semantic characteristic and the characteristics of the semantic characteristic at a plurality of layers; inputting the first type of standardized features into a forward full-connection layer of the encoder network, and splicing the first type of standardized features through the forward full-connection layer to obtain spliced first type of standardized features; inputting the spliced first-class standardized features into a second standardized layer of the encoder network, and carrying out standardized processing on the first-class standardized features and the spliced first-class standardized features through the second standardized layer to obtain the encoding features.
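
The per-sub-layer computation Sub_layer_output = LayerNorm(x + SubLayer(x)) maps directly onto standard building blocks. The following single-encoder-layer sketch is illustrative, with assumed dimensions; it is not the exact configuration of the embodiment.

```python
import torch
import torch.nn as nn

d_model, n_heads, seq_len = 64, 4, 10          # assumed sizes

attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
ffn = nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.ReLU(),
                    nn.Linear(4 * d_model, d_model))
norm1, norm2 = nn.LayerNorm(d_model), nn.LayerNorm(d_model)

x = torch.randn(1, seq_len, d_model)           # embedded semantic features

# First sub-layer: multi-head attention, then Add & Norm (residual plus layer normalization).
attn_out, _ = attn(x, x, x)
h = norm1(x + attn_out)                        # LayerNorm(x + SubLayer(x))

# Second sub-layer: feed-forward fully connected layer, then Add & Norm again.
encoded = norm2(h + ffn(h))                    # the encoding features
print(encoded.shape)                           # torch.Size([1, 10, 64])
```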

406. The server inputs the coding features into a decoder network of the text normalization model, and determines first probability distribution features corresponding to the coding features through the decoder network.

It should be noted that the structure of the Decoder (Decoder) network is similar to that of the encoder network, and includes a multi-head attention layer, a first normalization layer, a forward full-link layer, and a second normalization layer, where an input of the Decoder network is an output of the encoder network, and an output is a probability distribution corresponding to each possible result after text normalization processing is performed on a text segment to be normalized in text data.

In a possible implementation manner, the server inputs the coding feature into a multi-head attention layer of the decoder network, and obtains the feature of the coding feature at multiple levels through the multi-head attention layer; inputting the features of the multiple layers into a first normalization layer of the decoder network, and normalizing the second superposition feature through the first normalization layer to obtain a second type of normalized feature; the second superposition characteristic is a result of superposition of the coding characteristic and the characteristics of the coding characteristic at a plurality of levels; inputting the second type of standardized features into a forward full-connection layer of the decoder network, and splicing the second type of standardized features through the forward full-connection layer to obtain spliced second type of standardized features; inputting the spliced second type of standardized features into a second standardized layer of the decoder network, and carrying out standardized processing on the second type of standardized features and the spliced second type of standardized features through the second standardized layer to obtain the first probability distribution features.

407. The server inputs the first probability distribution feature into the linear fully connected layer of the text normalization model, and splices the first probability distribution feature through the linear fully connected layer to obtain a spliced first probability distribution feature.

408. The server inputs the spliced first probability distribution feature into the normalization layer of the text normalization model, and outputs a second text segment through the normalization layer.

It should be noted that the text normalization model used in the above steps 404 to 408 is trained by the server in advance. Unlike the text normalization model in actual use, the model used for training is provided with a Masked Multi-Head Attention layer and a third normalization layer before the multi-head attention layer in the decoder network. By arranging the masked multi-head attention layer before the multi-head attention layer in the decoder network, a word at any position cannot see the information of words at subsequent positions when it is predicted during training, which improves the accuracy of model training.
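
The mask used by the masked multi-head attention layer is just an upper-triangular matrix that hides subsequent positions; a minimal sketch, assuming the PyTorch convention that True marks positions that may not be attended to:

```python
import torch

seq_len = 5
# Row i may see columns 0..i only, so predicting position i never touches later words.
causal_mask = torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool), diagonal=1)
print(causal_mask)
# Such a mask can be passed as attn_mask to torch.nn.MultiheadAttention during training.
```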

After the encoder finishes its computation, the decoder uses the encoder's result as the input of its multi-head attention layer and decodes cyclically, outputting the word probability of the current position at each step.

In a possible implementation manner, the server obtains second sample text data and structured text data corresponding to the second sample text data; training the text normalization model based on the second sample text data and the normalized text data.

When training the text normalization model based on the second sample text data and the normalized text data, the server first inputs the first piece of the second sample text data into an initial text normalization model and determines, through the initial model, the text data obtained by normalizing that piece; it then determines the loss function value of the initial text normalization model based on that output and the normalized text data corresponding to the first piece, and adjusts the parameters of the initial text normalization model by utilizing a gradient correction network according to the loss function value, obtaining the text normalization model after the first parameter adjustment.

The server then inputs the second piece of the second sample text data into the text normalization model after the first parameter adjustment, determines through that model the text data obtained by normalizing that piece, determines the loss function value of the model based on that output and the normalized text data corresponding to the second piece, and adjusts the model parameters by utilizing the gradient correction network according to the loss function value, obtaining the text normalization model after the second parameter adjustment.

By analogy, the parameters of the text normalization model are continuously adjusted based on each piece of sample text data in the second sample text data until a text normalization model meeting the target condition is obtained. The target condition is that the similarity between the text data output by the model and the normalized text data corresponding to the sample text data meets an iteration cutoff condition, or that the loss function value of the model meets the iteration cutoff condition, or that the number of iterations reaches a preset number.

Taking the second sample text data as "issuing three thousand volunteer service indexes (two zero one three to two zero one four) for small and medium-sized enterprises in S city" as an example, where "two zero one three to two zero one four" and "three thousand" are the sample text fragments to be subjected to text normalization, the result obtained after labeling the second sample text data is "issuing volunteer service indexes (two zero one three to two zero one four/<2013-2014>) three thousand/<3000> for small and medium-sized enterprises in S city", where "/<2013-2014>" and "/<3000>" are the corresponding sample labels: the marker "/<XX>" represents the result obtained after normalizing the sample text fragment it follows, and "XX" is the content of the normalization result. Optionally, the sample text fragment is separated from the preceding and following characters by a space symbol as a boundary, so that the server can quickly locate the sample text fragment.
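
Reading the normalization results back out of a sample labeled with the "/<XX>" convention can be done with a small pattern match; the helper below is hypothetical and the sample string is abbreviated.

```python
import re

labeled = "indexes (two zero one three to two zero one four/<2013-2014>) three thousand/<3000>"
# Each "/<XX>" marks the normalized result of the segment that precedes it.
for match in re.finditer(r"/<([^>]*)>", labeled):
    print(match.group(1))
# 2013-2014
# 3000
```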

In more possible implementations, a pre-trained model is used to replace the encoder network in the text normalization model, and the decoder network is then trained by fine-tuning the training parameters.

The processes shown in the above steps 404 to 408 are illustrated in fig. 6, which is a schematic diagram of the processing process of the text normalization model provided in this application. Fig. 6 includes the two structures of the text normalization model during training and in actual use; the complete structure in fig. 6 is the structure used for training. When the text normalization model is trained, the second sample text data is fed as input into the input embedding layer, which produces a vector representation of the second sample text data as its semantic feature. This semantic feature is position-encoded, and the position-encoded semantic feature is input into the encoder network, where it is processed by the multi-head attention layer, the first normalization layer, the feed-forward fully connected layer, and the second normalization layer; the encoder network then outputs the encoding feature of the second sample text data.

The normalized text data corresponding to the second sample text data is input, as the calibration standard, into the output embedding layer, which produces a vector representation of the normalized text data as its semantic feature. This semantic feature is position-encoded and input into the decoder network, where the supervision feature of the normalized text data is obtained through the masked multi-head attention layer and the third normalization layer. The encoding feature of the second sample text data and the supervision feature of the normalized text data are then fed into the multi-head attention layer, the first normalization layer, the feed-forward fully connected layer, and the second normalization layer of the decoder network, which output the first probability distribution feature corresponding to the second sample text data; finally, the linear fully connected layer and the normalization layer output the text data obtained from the second sample text data through text normalization processing.

It should be noted that, in actual use, the processing procedure of the text normalization model is the same as the above process and is not described here again; the difference is that, in actual use, the text normalization model does not include the output embedding layer, nor the masked multi-head attention layer and the third normalization layer in the decoder network.

409. And the server replaces the first text segment with the second text segment to obtain second text data.

In a possible implementation manner, the server replaces the first text segment at the corresponding position with the second text segment according to the position of the first text segment to obtain second text data subjected to text normalization processing.
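
Because the position of the first text segment was marked at detection time, the replacement amounts to simple string slicing; a minimal sketch with invented offsets:

```python
def replace_segment(text, start, end, normalized):
    # Replace text[start:end] (the first text segment) with the normalized second text segment.
    return text[:start] + normalized + text[end:]

first = "index (two zero one three to two zero one four) three thousand"
print(replace_segment(first, 7, 46, "2013-2014"))
# index (2013-2014) three thousand
```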

It should be noted that, in the foregoing steps 401 to 409, a process of performing text normalization processing on the acquired first text data by the server is taken as an example for explanation, in more possible implementation manners, when the terminal acquires the first text data, the terminal processes the first text data through steps similar to the foregoing steps 402 to 409 to obtain second text data subjected to text normalization processing, where specific processes refer to the foregoing steps 402 to 409, and details are not described here.

According to the scheme provided by the embodiment of the application, the text segment to be subjected to text normalization is first determined from the text data, and only that text segment is subjected to text normalization, which reduces the amount of calculation. In addition, the text segment is normalized based on its semantic features, and the processed text segment replaces the original text segment to obtain new text data. The text normalization task is thus divided into a text detection task for the text segment and a normalization task for the text segment, realizing a task decomposition of text normalization: the final effect is improved through the optimization of each subtask, the overall task difficulty is reduced, and the text normalization efficiency is improved. Moreover, because the detection and the normalization of the text segment are performed step by step, the amount of calculation in the text normalization process can be reduced and the text normalization speed improved.

The scheme provided by the embodiment of the present application can be applied to various scenarios such as voice recognition and voice transcription. Taking the application of the embodiment of the present application in a voice transcription scenario as an example, refer to fig. 7, which is a flowchart of a text processing method provided by the embodiment of the present application and applied to a voice transcription scenario; the method includes:

701. the server obtains voice data.

In a possible implementation manner, the terminal acquires voice data input by a user, sends the acquired voice data to the server, and the server performs voice recognition on the received voice data to obtain the first text data.

When performing speech recognition on the voice data, the server extracts the speech features of the acquired voice data, inputs the extracted speech features into a speech recognition model, extracts hidden layer features of the speech features through a hidden layer of the speech recognition model, classifies the extracted hidden layer features through a feature classification layer of the speech recognition model to obtain the probability values of the hidden layer features corresponding to each phoneme, and determines the text data corresponding to the voice data based on these probability values, a pronunciation dictionary, and a language model. The pronunciation dictionary is used to indicate the mapping relation between phonemes and pronunciations, and the language model is used to determine the probability values corresponding to the various words composing the text data.

The speech feature is a spectral feature, which indicates the variation of the voice data in the frequency domain. Optionally, the speech feature is another feature, which is not limited in this embodiment of the present application. Taking the speech feature as a spectral feature as an example, Fourier transform is performed on the voice data to obtain its amplitude information in the frequency domain, that is, the amplitude corresponding to each frequency in the voice data, yielding the spectral feature of the voice data. Optionally, before the Fourier transform, the voice data is preprocessed, for example by pre-emphasis, framing, and windowing, so as to reduce the influence of problems such as aliasing and higher harmonic distortion caused by the vocal organs and the device acquiring the voice data, improve the quality of the voice data, and thereby ensure the accuracy of speech feature extraction.
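
A numpy sketch of this spectral-feature step (pre-emphasis, framing, windowing, then a Fourier transform per frame) follows; the frame length, hop size, and pre-emphasis coefficient are conventional assumptions, not values from the embodiment.

```python
import numpy as np

def spectral_features(signal, frame_len=400, hop=160, alpha=0.97):
    # Pre-emphasis flattens the spectrum before analysis.
    emphasized = np.append(signal[0], signal[1:] - alpha * signal[:-1])
    window = np.hamming(frame_len)
    frames = []
    for start in range(0, len(emphasized) - frame_len + 1, hop):
        frame = emphasized[start:start + frame_len] * window   # framing + windowing
        frames.append(np.abs(np.fft.rfft(frame)))              # amplitude per frequency
    return np.array(frames)                                    # (num_frames, frame_len // 2 + 1)

voice = np.random.randn(16000)            # stand-in for one second of 16 kHz voice data
print(spectral_features(voice).shape)     # (98, 201)
```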

It should be noted that, the above is only an exemplary manner of performing speech recognition on speech data to obtain first text data, and in a more possible implementation manner, speech recognition is performed on speech data in another manner, which is not limited in the embodiment of the present application.

702. The server performs voice recognition based on the voice data to obtain first text data.

It should be noted that the above steps 701 to 702 are described by taking as an example the case where the terminal acquires the voice data and then sends it to the server, and the server performs voice recognition on the voice data to obtain the first text data. The voice recognition performed on the voice data is as described in the above step 701 and is not repeated here.

Through the above step 702, voice transcription of the acquired voice data can be realized to obtain the first text data; text normalization is then performed on the first text data obtained through voice transcription through the following steps 703 to 710, and the specific process refers to the following steps 703 to 710. In addition to the above voice transcription scenario, the scheme provided by the embodiment of the present application can also be applied to other scenarios, which is not limited by the embodiment of the present application.

703. The server performs word segmentation processing on the first text data.

704. And the server inputs the characters obtained after the word segmentation into a text detection model, and determines the first text segment from the first text data through the text detection model.

705. The server extracts semantic features of the first text segment, wherein the semantic features comprise at least one of a first type of semantic features, a second type of semantic features and a third type of semantic features; the first semantic feature is used for indicating semantic information of the first text segment, the second semantic feature is used for indicating semantic information of a context text segment of the first text segment, and the third semantic feature is used for indicating pinyin information of the context text segment of the first text segment.

706. The server inputs the semantic features of the first text segment into an encoder network of a text normalization model, performs feature mapping processing on the semantic features through the encoder network, and outputs the encoding features of the semantic features.

707. The server inputs the coding features into a decoder network of the text normalization model, and determines first probability distribution features corresponding to the coding features through the decoder network.

708. The server inputs the first probability distribution feature into the linear fully connected layer of the text normalization model, and splices the first probability distribution feature through the linear fully connected layer to obtain a spliced first probability distribution feature.

709. The server inputs the spliced first probability distribution feature into the normalization layer of the text normalization model, and outputs a second text segment through the normalization layer.

710. And the server replaces the first text segment with the second text segment to obtain second text data.

The process from the step 703 to the step 710 is the same as the process from the step 402 to the step 409, and is not described herein again.

It should be noted that, in the above steps 701 to 710, the process in which the server performs text normalization processing on text data obtained by recognizing voice data is taken as an example for explanation. In more possible implementation manners, after acquiring voice data, the terminal performs voice recognition on the voice data to obtain the first text data corresponding to the voice data, and then processes the first text data through steps similar to the above steps 703 to 710 to obtain the second text data subjected to text normalization processing; for the specific process, refer to the above steps 703 to 710, which is not described here again.

According to the scheme provided by the embodiment of the application, voice recognition is first performed on the acquired voice data to obtain the text data to be normalized; the text segment to be subjected to text normalization is then determined from that text data, and only that text segment is subjected to text normalization, which reduces the amount of calculation. In addition, the text segment is normalized based on its semantic features, and the processed text segment replaces the original text segment to obtain new text data.

All the above optional technical solutions may be combined arbitrarily to form optional embodiments of the present application, and are not described herein again.

Fig. 8 is a schematic structural diagram of a text processing apparatus according to an embodiment of the present application, and referring to fig. 8, the apparatus includes:

an obtaining module 801, configured to obtain first text data;

a determining module 802, configured to determine, from the first text data, a first text segment to be subjected to text normalization;

an extracting module 803, configured to extract semantic features of the first text segment, where the semantic features include at least one of a first type of semantic features, a second type of semantic features, and a third type of semantic features; the first semantic feature is used for indicating semantic information of the first text segment, the second semantic feature is used for indicating semantic information of a context text segment of the first text segment, and the third semantic feature is used for indicating pinyin information of the context text segment of the first text segment;

a normalization processing module 804, configured to perform text normalization processing on the first text segment based on the semantic features of the first text segment to obtain a second text segment;

a replacing module 805, configured to replace the first text segment with the second text segment to obtain second text data.

According to the device provided by the embodiment of the application, the text segment to be subjected to text normalization is first determined from the text data, and only that text segment is subjected to text normalization, which reduces the amount of calculation; in addition, the text segment is normalized based on its semantic features, and the processed text segment replaces the original text segment to obtain new text data.

In a possible implementation manner, the normalization processing module 804 includes a first processing unit, a second processing unit, a third processing unit, and a fourth processing unit;

the first processing unit is used for inputting the semantic features of the first text segment into an encoder network of a text normalization model, performing feature mapping processing on the semantic features through the encoder network, and outputting the encoding features of the semantic features;

the second processing unit is used for inputting the coding characteristics into a decoder network of the text normalization model, and determining first probability distribution characteristics corresponding to the coding characteristics through the decoder network;

the third processing unit is configured to input the first probability distribution feature into a linear fully connected layer of the text normalization model, and splice the first probability distribution feature through the linear fully connected layer to obtain a spliced first probability distribution feature;

the fourth processing unit is configured to input the spliced first probability distribution characteristic into a normalization layer of the text normalization model, and output the second text segment through the normalization layer.

In a possible implementation manner, the first processing unit is configured to input the semantic feature into a multi-head attention layer of the encoder network, and obtain features of the semantic feature at multiple levels through the multi-head attention layer; inputting the features of the semantic features in multiple layers into a first normalization layer of the encoder network, and performing normalization processing on the first superposition features through the first normalization layer to obtain first-class normalized features; the first superposition characteristic is a superposition result of the semantic characteristic and the characteristics of the semantic characteristic at a plurality of layers; inputting the first type of standardized features into a forward full-connection layer of the encoder network, and splicing the first type of standardized features through the forward full-connection layer to obtain spliced first type of standardized features; inputting the spliced first-class standardized features into a second standardized layer of the encoder network, and carrying out standardized processing on the first-class standardized features and the spliced first-class standardized features through the second standardized layer to obtain the encoding features.

In a possible implementation manner, the second processing unit is configured to input the encoded feature into a multi-head attention layer of the decoder network, and obtain features of the encoded feature at multiple levels through the multi-head attention layer;

inputting the features of the multiple layers into a first normalization layer of the decoder network, and normalizing the second superposition feature through the first normalization layer to obtain a second type of normalized feature; the second superposition characteristic is a result of superposition of the coding characteristic and the characteristics of the coding characteristic at a plurality of levels; inputting the second type of standardized features into a forward full-connection layer of the decoder network, and splicing the second type of standardized features through the forward full-connection layer to obtain spliced second type of standardized features; inputting the spliced second type of standardized features into a second standardized layer of the decoder network, and carrying out standardized processing on the second type of standardized features and the spliced second type of standardized features through the second standardized layer to obtain the first probability distribution features.

In one possible implementation, the determining module 802 includes a word segmentation processing unit and a determining unit;

the word segmentation processing unit is used for carrying out word segmentation processing on the first text data;

the determining unit is used for inputting the characters obtained after the word segmentation processing into a text detection model, and determining the first text segment from the first text data through the text detection model.

In a possible implementation manner, the determining unit is configured to input the character into an input layer of the text detection model, and obtain, through the input layer, a text representation feature of the character, where the text representation feature is used to indicate a word index of the character in a dictionary; inputting the text representation characteristics into a word embedding layer of the text detection model, performing characteristic mapping processing on the text representation characteristics through the word embedding layer, and outputting the embedding characteristics; inputting the embedded features into a bidirectional recurrent neural network layer of the text detection model, and determining probability distribution features of the characters marked as various labels through the bidirectional recurrent neural network layer, wherein the labels are used for indicating the types of the characters; inputting the probability distribution characteristics into a full connection layer of the text detection model, and splicing the probability distribution characteristics through the full connection layer to obtain spliced probability distribution characteristics; inputting the spliced probability distribution characteristics into a conditional random field output layer of the text detection model, and determining the label of the character through the conditional random field output layer; the first text segment is determined from the first text data based on the label of the character.

In a possible implementation manner, the extracting module 803 is configured to input the first text segment into the input embedding layer of the text normalization model, and obtain, through the input embedding layer, the encoding vector of the first text segment as the semantic feature of the first text segment.

In one possible implementation, the training process of the text detection model includes:

obtaining first sample text data and a sample label of the first sample text data, wherein the sample label is used for indicating the type of a sample text fragment to be normalized in the sample text data;

the text detection model is trained based on the first sample text data and sample labels of the first sample text data.

In one possible implementation, the training process of the text normalization model includes:

acquiring second sample text data and normalized text data corresponding to the second sample text data;

training the text normalization model based on the second sample text data and the normalized text data.

Fig. 9 is a schematic structural diagram of a text processing apparatus according to an embodiment of the present application, and referring to fig. 9, the apparatus includes:

an obtaining module 901, configured to obtain voice data;

a voice recognition module 902, configured to perform voice recognition based on the voice data to obtain first text data;

a determining module 903, configured to determine, from the first text data, a first text segment to be subjected to text normalization;

an extracting module 904, configured to extract semantic features of the first text segment, where the semantic features include at least one of a first type of semantic feature, a second type of semantic feature, and a third type of semantic feature; the first semantic feature is used for indicating semantic information of the first text segment, the second semantic feature is used for indicating semantic information of a context text segment of the first text segment, and the third semantic feature is used for indicating pinyin information of the context text segment of the first text segment;

a normalization processing module 905, configured to perform text normalization processing on the first text segment based on the semantic features of the first text segment to obtain a second text segment;

a replacing module 906, configured to replace the first text segment with the second text segment to obtain second text data.

According to the device provided by the embodiment of the application, voice recognition is first performed on the acquired voice data to obtain the text data to be normalized; the text segment to be subjected to text normalization is then determined from that text data, and only that text segment is subjected to text normalization, which reduces the amount of calculation; in addition, the text segment is normalized based on its semantic features, and the processed text segment replaces the original text segment to obtain new text data.

It should be noted that: in the text processing apparatus provided in the above embodiment, when performing text normalization processing on a text obtained through speech recognition, only the division of the above function modules is illustrated, and in practical applications, the function distribution may be completed by different function modules according to needs, that is, the internal structure of the computer device is divided into different function modules to complete all or part of the functions described above. In addition, the text processing apparatus and the text processing method provided by the above embodiments belong to the same concept, and specific implementation processes thereof are described in the method embodiments and are not described herein again.

In an exemplary embodiment, a computer device is also provided. Optionally, the computer device is provided as a terminal, or the computer device is provided as a server, which is not limited in this embodiment of the present application. The structures of the terminal and the server will be described below, respectively.

Fig. 10 is a schematic structural diagram of a terminal according to an embodiment of the present application. In general, terminal 1000 can include: one or more processors 1001 and one or more memories 1002.

Processor 1001 may include one or more processing cores, such as a 4-core processor, an 8-core processor, and so forth. The processor 1001 may be implemented in at least one hardware form of a DSP (Digital Signal Processing), an FPGA (Field-Programmable Gate Array), and a PLA (Programmable Logic Array). The processor 1001 may also include a main processor and a coprocessor, where the main processor is a processor for Processing data in an awake state, and is also referred to as a Central Processing Unit (CPU); a coprocessor is a low power processor for processing data in a standby state. In some embodiments, the processor 1001 may be integrated with a GPU (Graphics Processing Unit), which is responsible for rendering and drawing the content that the display screen needs to display. In some embodiments, the processor 1001 may further include an AI (Artificial Intelligence) processor for processing a computing operation related to machine learning.

Memory 1002 may include one or more computer-readable storage media, which may be non-transitory. The memory 1002 may also include high-speed random access memory, as well as non-volatile memory, such as one or more magnetic disk storage devices, flash memory storage devices. In some embodiments, a non-transitory computer readable storage medium in the memory 1002 is used to store at least one program code for execution by the processor 1001 to implement the text processing methods provided by the method embodiments herein.

In some embodiments, terminal 1000 can also optionally include: a peripheral interface 1003 and at least one peripheral. The processor 1001, memory 1002 and peripheral interface 1003 may be connected by a bus or signal line. Various peripheral devices may be connected to peripheral interface 1003 via a bus, signal line, or circuit board. Specifically, the peripheral device includes: at least one of radio frequency circuitry 1004, display screen 1005, camera assembly 1006, audio circuitry 1007, positioning assembly 1008, and power supply 1009.

The peripheral interface 1003 may be used to connect at least one peripheral related to I/O (Input/Output) to the processor 1001 and the memory 1002. In some embodiments, processor 1001, memory 1002, and peripheral interface 1003 are integrated on the same chip or circuit board; in some other embodiments, any one or two of the processor 1001, the memory 1002, and the peripheral interface 1003 may be implemented on separate chips or circuit boards, which are not limited in this application.

The Radio Frequency circuit 1004 is used for receiving and transmitting RF (Radio Frequency) signals, also called electromagnetic signals. The radio frequency circuitry 1004 communicates with communication networks and other communication devices via electromagnetic signals. The radio frequency circuit 1004 converts an electrical signal into an electromagnetic signal to transmit, or converts a received electromagnetic signal into an electrical signal. Optionally, the radio frequency circuit 1004 comprises: an antenna system, an RF transceiver, one or more amplifiers, a tuner, an oscillator, a digital signal processor, a codec chipset, a subscriber identity module card, and so forth. The radio frequency circuit 1004 may communicate with other terminals via at least one wireless communication protocol. The wireless communication protocols include, but are not limited to: metropolitan area networks, various generation mobile communication networks (2G, 3G, 4G, and 5G), Wireless local area networks, and/or WiFi (Wireless Fidelity) networks. In some embodiments, the rf circuit 1004 may further include NFC (Near Field Communication) related circuits, which are not limited in this application.

The display screen 1005 is used to display a UI (User Interface). The UI may include graphics, text, icons, video, and any combination thereof. When the display screen 1005 is a touch display screen, the display screen 1005 also has the ability to capture touch signals on or over its surface. The touch signal may be input to the processor 1001 as a control signal for processing. At this point, the display screen 1005 may also be used to provide virtual buttons and/or a virtual keyboard, also referred to as soft buttons and/or a soft keyboard. In some embodiments, there may be one display screen 1005, disposed on the front panel of terminal 1000; in other embodiments, there may be at least two display screens 1005, respectively disposed on different surfaces of terminal 1000 or in a folded design; in still other embodiments, display screen 1005 may be a flexible display disposed on a curved surface or a folded surface of terminal 1000. The display screen 1005 may even be arranged as a non-rectangular irregular figure, that is, a shaped screen. The display screen 1005 may be made of materials such as an LCD (Liquid Crystal Display) or an OLED (Organic Light-Emitting Diode).

The camera assembly 1006 is used to capture images or video. Optionally, the camera assembly 1006 includes a front camera and a rear camera. Generally, a front camera is disposed at a front panel of the terminal, and a rear camera is disposed at a rear surface of the terminal. In some embodiments, the number of the rear cameras is at least two, and each rear camera is any one of a main camera, a depth-of-field camera, a wide-angle camera and a telephoto camera, so that the main camera and the depth-of-field camera are fused to realize a background blurring function, and the main camera and the wide-angle camera are fused to realize panoramic shooting and VR (Virtual Reality) shooting functions or other fusion shooting functions. In some embodiments, camera assembly 1006 may also include a flash. The flash lamp can be a monochrome temperature flash lamp or a bicolor temperature flash lamp. The double-color-temperature flash lamp is a combination of a warm-light flash lamp and a cold-light flash lamp, and can be used for light compensation at different color temperatures.

The audio circuit 1007 may include a microphone and a speaker. The microphone is used for collecting sound waves of a user and the environment, converting the sound waves into electric signals, and inputting the electric signals to the processor 1001 for processing or inputting the electric signals to the radio frequency circuit 1004 for realizing voice communication. For stereo sound collection or noise reduction purposes, multiple microphones can be provided, each at a different location of terminal 1000. The microphone may also be an array microphone or an omni-directional pick-up microphone. The speaker is used to convert electrical signals from the processor 1001 or the radio frequency circuit 1004 into sound waves. The loudspeaker can be a traditional film loudspeaker or a piezoelectric ceramic loudspeaker. When the speaker is a piezoelectric ceramic speaker, the speaker can be used for purposes such as converting an electric signal into a sound wave audible to a human being, or converting an electric signal into a sound wave inaudible to a human being to measure a distance. In some embodiments, the audio circuit 1007 may also include a headphone jack.

A Location component 1008 is employed to locate the current geographic Location of terminal 1000, for purposes of navigation or LBS (Location Based Service). The Positioning component 1008 may be a positioning component based on the GPS (Global Positioning System) of the United States, the BeiDou system of China, the GLONASS system of Russia, or the Galileo system of the European Union.

Power supply 1009 is used to supply power to various components in terminal 1000. The power source 1009 may be alternating current, direct current, disposable batteries, or rechargeable batteries. When the power source 1009 includes a rechargeable battery, the rechargeable battery may support wired charging or wireless charging. The rechargeable battery may also be used to support fast charge technology.

In some embodiments, terminal 1000 can also include one or more sensors 1010. The one or more sensors 1010 include, but are not limited to: acceleration sensor 1011, gyro sensor 1012, pressure sensor 1013, fingerprint sensor 1014, optical sensor 1015, and proximity sensor 1016.

Acceleration sensor 1011 can detect acceleration magnitudes on three coordinate axes of a coordinate system established with terminal 1000. For example, the acceleration sensor 1011 may be used to detect components of the gravitational acceleration in three coordinate axes. The processor 1001 may control the display screen 1005 to display the user interface in a landscape view or a portrait view according to the gravitational acceleration signal collected by the acceleration sensor 1011. The acceleration sensor 1011 may also be used for acquisition of motion data of a game or a user.

The gyro sensor 1012 may detect a body direction and a rotation angle of the terminal 1000, and the gyro sensor 1012 and the acceleration sensor 1011 may cooperate to acquire a 3D motion of the user on the terminal 1000. From the data collected by the gyro sensor 1012, the processor 1001 may implement the following functions: motion sensing (such as changing the UI according to a user's tilting operation), image stabilization at the time of photographing, game control, and inertial navigation.

Pressure sensor 1013 can be disposed on a side frame of terminal 1000 and/or underneath display screen 1005. When pressure sensor 1013 is disposed on a side frame of terminal 1000, a user's grip signal on terminal 1000 can be detected, and processor 1001 performs left-right hand recognition or shortcut operation according to the grip signal collected by pressure sensor 1013. When the pressure sensor 1013 is disposed at a lower layer of the display screen 1005, the processor 1001 controls the operability control on the UI interface according to the pressure operation of the user on the display screen 1005. The operability control comprises at least one of a button control, a scroll bar control, an icon control and a menu control.

The fingerprint sensor 1014 is used to collect the user's fingerprint, and the processor 1001 identifies the user according to the fingerprint collected by the fingerprint sensor 1014, or the fingerprint sensor 1014 identifies the user according to the collected fingerprint. When the user's identity is identified as a trusted identity, the processor 1001 authorizes the user to perform relevant sensitive operations, including unlocking the screen, viewing encrypted information, downloading software, making payments, changing settings, and the like. The fingerprint sensor 1014 may be disposed on the front, back, or side of terminal 1000. When a physical key or a manufacturer's logo is provided on terminal 1000, the fingerprint sensor 1014 may be integrated with the physical key or the manufacturer's logo.
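The authorization step above amounts to a simple gate on the verified identity; a minimal sketch follows, in which the operation names are illustrative placeholders rather than identifiers from this application.

```python
# A minimal sketch of gating sensitive operations on a trusted identity
# (operation names are illustrative placeholders, not from this application).
SENSITIVE_OPERATIONS = {
    "unlock_screen", "view_encrypted_information",
    "download_software", "pay", "change_settings",
}

def authorize(operation: str, identity_is_trusted: bool) -> bool:
    """Permit a sensitive operation only when the identity is trusted."""
    if operation not in SENSITIVE_OPERATIONS:
        return True  # non-sensitive operations need no fingerprint check
    return identity_is_trusted
```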

The optical sensor 1015 is used to collect the ambient light intensity. In one embodiment, the processor 1001 may control the display brightness of the display screen 1005 according to the ambient light intensity collected by the optical sensor 1015. Specifically, when the ambient light intensity is high, the display brightness of the display screen 1005 is increased; when the ambient light intensity is low, the display brightness of the display screen 1005 is decreased. In another embodiment, the processor 1001 may also dynamically adjust the shooting parameters of the camera assembly 1006 according to the ambient light intensity collected by the optical sensor 1015.
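A minimal sketch of such brightness adjustment, assuming a logarithmic mapping from ambient lux to a normalized brightness level (the lux range and the mapping itself are illustrative assumptions):

```python
import math

# A minimal sketch (illustrative assumptions) of mapping ambient light (lux)
# to a normalized display brightness in [0.0, 1.0].
def display_brightness(lux: float, min_lux: float = 1.0,
                       max_lux: float = 10_000.0) -> float:
    """Return a brightness level in [0.0, 1.0] for the given ambient lux."""
    lux = min(max(lux, min_lux), max_lux)
    # Perceived brightness is roughly logarithmic in luminance, so a log
    # mapping keeps the perceived change even across the lux range.
    return math.log(lux / min_lux) / math.log(max_lux / min_lux)
```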

The proximity sensor 1016, also known as a distance sensor, is typically disposed on the front panel of terminal 1000. The proximity sensor 1016 is used to collect the distance between the user and the front face of terminal 1000. In one embodiment, when the proximity sensor 1016 detects that the distance between the user and the front face of terminal 1000 gradually decreases, the processor 1001 controls the display screen 1005 to switch from the screen-on state to the screen-off state; when the proximity sensor 1016 detects that the distance between the user and the front face of terminal 1000 gradually increases, the processor 1001 controls the display screen 1005 to switch from the screen-off state to the screen-on state.
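This screen-state switching can be written as a small state update with hysteresis so the screen does not flicker near a single threshold; the two thresholds below are illustrative assumptions.

```python
# A minimal sketch of proximity-driven screen switching with hysteresis
# (the near/far thresholds are illustrative assumptions).
def next_screen_state(distance_cm: float, screen_on: bool,
                      near_cm: float = 3.0, far_cm: float = 5.0) -> bool:
    """Return True if the screen should be lit, False if it should be dark."""
    if screen_on and distance_cm < near_cm:
        return False   # user has brought the device close: darken the screen
    if not screen_on and distance_cm > far_cm:
        return True    # user has moved the device away: light the screen
    # Between the two thresholds, keep the current state (hysteresis).
    return screen_on
```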

Those skilled in the art will appreciate that the configuration shown in FIG. 10 is not intended to be limiting and that terminal 1000 can include more or fewer components than shown, or some components can be combined, or a different arrangement of components can be employed.

Fig. 11 is a schematic structural diagram of a server according to an embodiment of the present application. The server 1100 may vary considerably in configuration or performance, and may include one or more processors (CPUs) 1101 and one or more memories 1102, where the one or more memories 1102 store at least one piece of program code, and the at least one piece of program code is loaded and executed by the one or more processors 1101 to implement the text processing method provided by each of the above method embodiments. Of course, the server 1100 may also have components such as a wired or wireless network interface, a keyboard, and an input/output interface for performing input and output, and the server 1100 may also include other components for implementing device functions, which are not described herein again.

In an exemplary embodiment, there is also provided a computer-readable storage medium, such as a memory, including program code executable by a processor to perform the text processing method in the above-described embodiments. For example, the computer-readable storage medium may be a Read-Only Memory (ROM), a Random Access Memory (RAM), a Compact Disc Read-Only Memory (CD-ROM), a magnetic tape, a floppy disk, an optical data storage device, and the like.

In an exemplary embodiment, a computer program product is also provided, which comprises computer program code stored in a computer readable storage medium, which is loaded and executed by a processor of a computer device to perform the method steps of the text processing method provided in the above embodiments.

It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or by program code instructing relevant hardware, where the program code may be stored in a computer-readable storage medium, and the storage medium mentioned above may be a read-only memory, a magnetic disk, an optical disc, or the like.

The above description is only exemplary of the present application and should not be taken as limiting, as any modification, equivalent replacement, or improvement made within the spirit and principle of the present application should be included in the protection scope of the present application.
