Text information processing method and device, storage medium and electronic equipment

Document No.: 1378992. Published: 2020-08-14.

Summary: this technology, "Text information processing method and device, storage medium and electronic equipment", was designed by 刘澍, 刘智静, 周宇超 and 康斌 on 2020-04-13. Its main content is as follows. The embodiments of the present application disclose a text information processing method and device, a storage medium and electronic equipment. The text information processing method comprises: when the text information contains emotion words, quantizing the emotion words in the text information according to a preset rule, and determining the target emotion category of the text information according to the quantization result; when the text information contains no emotion words, detecting the text length of the text information; if the text length is less than or equal to a preset value, determining, according to the sentence vector of the text information, the probabilities that the text information belongs to a plurality of different sample emotion categories, and determining the target emotion category of the text information from the plurality of different sample emotion categories according to the probabilities; and if the text length is greater than the preset value, determining the target emotion category of the text information according to an embedded vector of a specified type of the text information. The scheme processes text information with a logical architecture ordered from low to high computational complexity, which improves both the processing speed and the processing effect.

1. A text information processing method, comprising:

acquiring text information to be processed;

when the text information contains emotion words, quantizing the emotion words in the text information according to a preset rule, and determining the target emotion category of the text information according to the quantization result;

when the text information contains no emotion words, detecting the text length of the text information;

if the text length is smaller than or equal to a preset value, determining the probability of the text information corresponding to a plurality of different sample emotion categories according to the sentence vector of the text information, and determining the target emotion category of the text information from the plurality of different sample emotion categories according to the probability;

and if the text length is greater than the preset value, determining the target emotion category of the text information according to an embedded vector of a specified type of the text information, wherein embedded vectors of different types are obtained based on features of the text information in different dimensions and the correlations among the features.

2. The text information processing method according to claim 1, wherein the specified type of the embedded vector includes: word embedding vectors;

the determining the target emotion category of the text information according to the embedded vector of the specified type of the text information comprises the following steps:

performing convolution processing on the word embedding vector;

converting the word embedding vector after convolution processing into a sentence embedding vector;

and determining the target emotion category of the text information according to the sentence embedding vector.

3. The text information processing method according to claim 1, wherein the specified type of the embedded vector includes: word embedding vectors, position embedding vectors and segment embedding vectors;

the determining the target emotion category of the text information according to the embedded vector of the specified type of the text information comprises the following steps:

encoding the word embedding vector based on the position embedding vector and the segment embedding vector;

converting the encoded word embedding vector into a sentence embedding vector;

and determining the target emotion category of the text information according to the sentence embedding vector.

4. The text information processing method according to claim 1, wherein the specified type of the embedded vector includes: word embedding vectors, position embedding vectors and segment embedding vectors;

the determining the target emotion category of the text information according to the embedded vector of the specified type of the text information comprises the following steps:

performing convolution processing on the word embedding vector, and converting the word embedding vector after the convolution processing into a first sentence embedding vector;

encoding the word embedding vector based on the position embedding vector and the segment embedding vector, and converting the encoded word embedding vector into a second sentence embedding vector;

determining a first probability that the text information corresponds to a plurality of different sample emotion classes according to the first sentence embedding vector, and determining a second probability that the text information corresponds to the plurality of different sample emotion classes according to the second sentence embedding vector;

and determining the target emotion category of the text information according to the first probability and the second probability.

5. The method of claim 4, wherein determining the target emotion category of the text information according to the first probability and the second probability comprises:

respectively determining a first probability and a second probability corresponding to each sample emotion category;

carrying out mean processing on the first probability and the second probability to obtain a target probability corresponding to each sample emotion category;

and selecting, from the plurality of sample emotion categories, the sample emotion category with the largest target probability as the target emotion category of the text information.

6. The text information processing method according to claim 4, further comprising:

performing word segmentation processing on the text information to obtain a corresponding word set, generating vector representation of the word set, and obtaining a word embedding vector;

and determining a text sequence of the text information, and generating a vector representation of the position of each word according to the position of each word in the text sequence to obtain a position embedded vector.

7. The method of claim 5, further comprising, prior to generating the vector representation of the set of words:

detecting an emoticon in the text information;

extracting the characters in the emoticon;

performing word segmentation processing on the characters, and updating the word set based on the obtained words;

and deleting the words with the word frequency smaller than the preset word frequency in the updated word set.

8. The method of claim 1, wherein determining, according to the sentence vector of the text information, the probabilities that the text information belongs to the plurality of different sample emotion categories, and determining the target emotion category of the text information from the plurality of different sample emotion categories according to the probabilities comprises:

calculating the similarity between the sentence vector and each sample sentence vector in the sample sentence vector set;

determining a preset number of target sample sentence vectors from the sample sentence vector set according to the sequence of similarity from large to small;

determining weighting information of sample emotion categories corresponding to each target sample sentence vector according to the similarity value;

based on the weighting information, carrying out weighting processing on the sample emotion types corresponding to the target sample sentence vectors;

and determining the target emotion type of the text information according to the weighting processing result.

9. The information processing method according to any one of claims 1 to 8, further comprising, before quantizing emotion words in the text information according to a preset rule:

detecting the text information based on a sample word set, wherein the sample word set comprises sample words belonging to the same emotion category;

when content matching the sample word set is detected in the text information, determining the emotion category corresponding to the sample word set as the target emotion category of the text information.

10. The information processing method according to claim 9, prior to detecting the text information based on the sample word set, further comprising:

detecting the text information based on a preset text library;

and when a preset text which is the same as the text information exists in the preset text library, determining the emotion category associated with the preset text as the target emotion category of the text information.

11. The information processing method of claim 10, wherein after determining probabilities of the textual information corresponding to the different sample emotion categories from the sentence vector of the textual information, and determining the target emotion category of the textual information from the different sample emotion categories based on the probabilities, further comprising:

establishing a first association relation between the determined target emotion category and the text information;

and updating the preset text library based on the first association relation.

12. The information processing method of claim 10, further comprising, after determining the target emotion category of the text information according to the embedded vector of the specified type of the text information:

establishing a second association relation between the determined target emotion category and the text information;

and updating the preset text library based on the second association relation.

13. A text information processing apparatus characterized by comprising:

the acquisition unit is used for acquiring text information to be processed;

a first determining unit, configured to determine, when a target candidate text matching the text information exists in a candidate text set, an emotion category corresponding to the target candidate text as a target emotion category of the text information;

the length detection unit is used for detecting the text length of the text information when a target candidate text matched with the text information does not exist in the candidate text set;

the second determining unit is used for determining the probability of the text information corresponding to a plurality of different sample emotion categories according to the sentence vector of the text information if the text length is smaller than or equal to a preset value, and determining the target emotion category of the text information from the plurality of different sample emotion categories according to the probability;

and the third determining unit is used for determining, if the text length is greater than the preset value, the target emotion category of the text information according to an embedded vector of a specified type of the text information, wherein embedded vectors of different types are obtained based on features of the text information in different dimensions and the correlations among the features.

14. A computer-readable storage medium storing a plurality of instructions adapted to be loaded by a processor to perform the information processing method according to any one of claims 1 to 12.

15. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the information processing method according to any one of claims 1 to 12 when executing the program.

Technical Field

The present application relates to the field of information processing technologies, and in particular, to a text information processing method and apparatus, a storage medium, and an electronic device.

Background

With the development of the internet and mobile communication networks, and with the rapid growth of the processing and storage capabilities of electronic devices, a large number of application programs have spread rapidly and come into wide use, especially applications that let users publish media information such as text, pictures, sound or video.

Text sentiment analysis, also called opinion mining or tendency analysis, is the process of analyzing, processing, summarizing and reasoning about subjective texts that carry emotional color. The internet generates a large amount of valuable user-contributed comment information about people, events, products and the like. This comment information expresses people's various emotional colors and tendencies, such as happiness, anger, sorrow and joy, or criticism and praise. Therefore, emotion analysis of such media information can provide a valuable reference for decision making in application scenarios such as information auditing, user profiling and content recommendation.

Disclosure of Invention

The embodiment of the application provides a text information processing method and device, a storage medium and electronic equipment, which can improve the processing speed and the processing effect of text information.

The embodiment of the application provides a text information processing method, which comprises the following steps:

acquiring text information to be processed;

when the text information contains emotion words, quantizing the emotion words in the text information according to a preset rule, and determining the target emotion category of the text information according to the quantization result;

when the text information contains no emotion words, detecting the text length of the text information;

if the text length is smaller than or equal to a preset value, determining the probability of the text information corresponding to a plurality of different sample emotion categories according to the sentence vector of the text information, and determining the target emotion category of the text information from the plurality of different sample emotion categories according to the probability;

and if the text length is greater than the preset value, determining the target emotion category of the text information according to an embedded vector of a specified type of the text information, wherein embedded vectors of different types are obtained based on features of the text information in different dimensions and the correlations among the features.

Correspondingly, an embodiment of the present application further provides a text information processing apparatus, including:

the acquisition unit is used for acquiring text information to be processed;

a first determining unit, configured to determine, when a target candidate text matching the text information exists in a candidate text set, an emotion category corresponding to the target candidate text as a target emotion category of the text information;

the length detection unit is used for detecting the text length of the text information when a target candidate text matched with the text information does not exist in the candidate text set;

the second determining unit is used for determining the probability of the text information corresponding to a plurality of different sample emotion categories according to the sentence vector of the text information if the text length is smaller than or equal to a preset value, and determining the target emotion category of the text information from the plurality of different sample emotion categories according to the probability;

and the third determining unit is used for determining, if the text length is greater than the preset value, the target emotion category of the text information according to an embedded vector of a specified type of the text information, wherein embedded vectors of different types are obtained based on features of the text information in different dimensions and the correlations among the features.

Correspondingly, the embodiment of the application also provides a computer-readable storage medium, and the storage medium stores a plurality of instructions, and the instructions are suitable for being loaded by a processor to execute the text information processing method.

Correspondingly, the embodiment of the present application further provides an electronic device, which includes a memory, a processor and a computer program stored in the memory and executable on the processor, and the processor implements the text information processing method as described above when executing the program.

In the embodiment of the application, when the text information contains emotion words, the emotion words in the text information are quantized according to a preset rule, and the target emotion category of the text information is determined according to the quantization result; when the text information contains no emotion words, the text length of the text information is detected; if the text length is less than or equal to a preset value, the probabilities that the text information belongs to a plurality of different sample emotion categories are determined according to the sentence vector of the text information, and the target emotion category of the text information is determined from the plurality of different sample emotion categories according to the probabilities; and if the text length is greater than the preset value, the target emotion category of the text information is determined according to an embedded vector of a specified type of the text information. The scheme processes text information with a logical architecture ordered from low to high computational complexity, which improves both the processing speed and the processing effect.

Drawings

In order to illustrate the technical solutions in the embodiments of the present application more clearly, the drawings needed in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present application, and those skilled in the art can obtain other drawings from them without creative effort.

Fig. 1 is a schematic flowchart of a text information processing method according to an embodiment of the present application.

Fig. 2 is a schematic structural diagram of a TF-IDF model provided in an embodiment of the present application.

Fig. 3 is a schematic structural diagram of a Text CNN model provided in an embodiment of the present application.

Fig. 4 is a schematic structural diagram of the BERT model provided in the embodiment of the present application.

Fig. 5 is a schematic diagram of a model online deployment process provided in an embodiment of the present application.

Fig. 6 is a schematic diagram of a model architecture of text emotion analysis provided in an embodiment of the present application.

Fig. 7 is a schematic structural diagram of a text information processing apparatus according to an embodiment of the present application.

Fig. 8 is a schematic structural diagram of an electronic device provided in an embodiment of the present application.

Fig. 9 is a schematic structural diagram of a server according to an embodiment of the present application.

Detailed Description

The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.

The embodiment of the application provides a text information processing method and device, a storage medium and electronic equipment. The text information processing device can be integrated into an electronic device that has a storage unit, is equipped with a microprocessor and has computing capability, such as a tablet PC (Personal Computer) or a mobile phone.

Artificial Intelligence (AI) is a theory, method, technique and application system that uses a digital computer or a machine controlled by a digital computer to simulate, extend and expand human Intelligence, to obtain knowledge and to use the knowledge to obtain the best results, so that the machine has the functions of perception, reasoning and decision making.

Natural Language Processing (NLP) is an important direction in the fields of computer science and artificial intelligence. It studies various theories and methods that enable efficient communication between humans and computers using natural language. Natural language processing is a science integrating linguistics, computer science and mathematics. Therefore, the research in this field will involve natural language, i.e. the language that people use everyday, so it is closely related to the research of linguistics. Natural language processing techniques typically include text processing, semantic understanding, and the like.

In the scheme, a large number of natural language processing technologies are adopted to understand and classify the text information and analyze the emotion expressed in the text information. According to the characteristics of different text processing algorithms or models, different text information is intelligently matched with a proper algorithm or model, and the text information is analyzed and processed in a targeted manner, so that the purpose of intelligently processing the text is achieved.

Details are described below. The numbering of the following embodiments is not intended to limit their order of preference. Referring to Fig. 1, Fig. 1 is a schematic flowchart of a text information processing method according to an embodiment of the present application. The specific flow of the text information processing method may be as follows:

101. Acquire the text information to be processed.

The text information may include different levels of text such as words, sentences, paragraphs, chapters, and the like. The text information may be obtained in real time or from a predetermined database.

Taking real-time acquisition as an example, when the content input by the user through the electronic device is text content, the text content can be directly acquired as text information to be processed.

When the input content is image content, the characters in the image can be recognized and extracted by using image recognition technology, and the extracted characters are used as the text information; alternatively, the meaning expressed by the image is recognized, and a corresponding textual expression is generated according to the recognized meaning to obtain the text information.

When the input content is video content, the video content can be divided into image frames, the characters in the image frames are recognized by image recognition technology, and the extracted characters are used as the text information; alternatively, the meaning expressed by the image frames is recognized, and a corresponding textual expression is generated according to the recognized meaning to obtain the text information.

When the input content is voice, the voice can be converted into text content by using a voice recognition technology, so that text information is obtained.

102. When the text information contains emotion words, quantize the emotion words in the text information according to a preset rule, and determine the target emotion category of the text information according to the quantization result.

In this embodiment, the target emotion categories may include positive, negative, and neutral. Specifically, the simplest and most basic way to analyze whether a sentence is positive or negative is to find the emotion words in the sentence. Emotion words are words that express emotion, and can be roughly classified into the following types: "happy", "angry", "sad", "joyful", "tired", "terrified", "scared", "hateful", "frightened", "pensive", "quiet", "calm", "disappointed" and "excited". Examples of positive emotion words are "praise", "good" and "magnificent"; examples of negative emotion words are "bad", "rotten" and "poor".

In the embodiment of the application, quantizing the emotion words means converting the degree of emotion expressed by an emotion word into a quantity that can be measured numerically. Specifically, the modifiers or specific punctuation marks attached to the emotion word in the text may be converted into numbers, and the emotion value of the emotion word in the text is determined from those numbers, thereby quantizing the emotion word. Modifiers may include degree words (e.g., "very", "extremely", "normal"), negation words (e.g., "no"), punctuation marks that indicate degree (e.g., an exclamation mark), and the like. Different values are assigned to different modifiers in advance and corresponding calculation rules are formulated, so that the modifiers or specific punctuation marks of an emotion word can be converted into numbers, and the resulting emotion value is used as the quantization result of the emotion word. In this embodiment, the target emotion category of the text information can be determined by quantizing the emotion words in the text information with an emotion dictionary. The specific algorithm design of the emotion dictionary can be as follows:

the first step: reading the text and splitting it into clauses;

the second step: searching each clause for emotion words, and recording whether each emotion word is positive or negative, together with its position;

the third step: searching for a degree word before the emotion word, and stopping the search once one is found; setting a weight for the degree word, and multiplying the emotion value by that weight;

the fourth step: searching for negation words before the emotion word until all of them are found; if the number of negation words is odd, multiplying the emotion value by -1, and if it is even, multiplying it by 1;

the fifth step: checking whether the clause ends with an exclamation mark; if so, searching from the end of the clause toward the front for an emotion word, and if one is found, adding 2 to the corresponding emotion value;

the sixth step: calculating the emotion values of all the clauses of one comment, and recording them in an array;

the seventh step: calculating and recording the emotion values of all comments;

the eighth step: calculating, clause by clause, the positive emotion mean, the negative emotion mean, the positive emotion variance and the negative emotion variance of each comment.

For example, in the sentence "Your approach leaves me very unsatisfied!", the positive emotion word is "satisfied", the degree word is "very", the negation word is "not" (rendered as "un-" in this translation), the specific punctuation mark is "!", and there is no negative emotion word. Assuming that the weight of "very" is set to 80%, the positive emotion value of the sentence is calculated as 0 and the negative emotion value as -2.8. The value -2.8 is the quantization result of the emotion words in the sentence, so the emotion category of the text information can be determined to be negative.
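The dictionary-based scoring steps and the worked example above can be illustrated with a minimal sketch. It is not the applicant's implementation: the lexicons, the base emotion value of 1, and the function names `score_clause` and `classify` are assumptions made for illustration. The source only states that an exclamation mark contributes 2; to reproduce the -2.8 of the example, the sketch assumes that the 2 is added to the magnitude of the last emotion word in the clause, in the direction of its final polarity.

```python
# Toy lexicons, invented for illustration; a real emotion dictionary is far larger.
POSITIVE_WORDS = {"satisfied", "good", "magnificent"}
NEGATIVE_WORDS = {"bad", "rotten"}
DEGREE_WEIGHTS = {"very": 0.8, "extremely": 1.0}  # "very" = 80%, as in the example
NEGATION_WORDS = {"not", "no"}
EXCLAMATION_BONUS = 2.0  # assumed to strengthen the last emotion word of a clause


def is_emotion_word(word):
    return word in POSITIVE_WORDS or word in NEGATIVE_WORDS


def score_clause(tokens, ends_with_exclamation=False):
    """Return (positive_value, negative_value) for one tokenized clause."""
    positions = [i for i, word in enumerate(tokens) if is_emotion_word(word)]
    positive, negative = 0.0, 0.0
    for i in positions:
        polarity = 1 if tokens[i] in POSITIVE_WORDS else -1
        magnitude = 1.0  # assumed base emotion value
        negations = 0
        degree_applied = False
        # Scan backwards from the emotion word for its modifiers.
        for j in range(i - 1, -1, -1):
            word = tokens[j]
            if is_emotion_word(word):
                break  # earlier modifiers belong to the previous emotion word
            if word in NEGATION_WORDS:
                negations += 1
            elif word in DEGREE_WEIGHTS and not degree_applied:
                magnitude *= DEGREE_WEIGHTS[word]  # only the nearest degree word
                degree_applied = True
        if negations % 2 == 1:  # an odd number of negations flips the polarity
            polarity = -polarity
        if ends_with_exclamation and i == positions[-1]:
            magnitude += EXCLAMATION_BONUS
        if polarity > 0:
            positive += magnitude
        else:
            negative -= magnitude
    return positive, negative


def classify(positive, negative):
    """Map the two emotion values to a target emotion category."""
    total = positive + negative
    if total > 0:
        return "positive"
    if total < 0:
        return "negative"
    return "neutral"


# The worked example, with the negation written out as a separate token.
tokens = ["your", "approach", "leaves", "me", "very", "not", "satisfied"]
pos, neg = score_clause(tokens, ends_with_exclamation=True)
print(pos, neg, classify(pos, neg))
```

With these assumed weights the example clause yields a positive value of 0 and a negative value of about -2.8, so it is classified as negative. Tokenization is left to the caller, because word segmentation of the original Chinese text is a separate step.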

In some embodiments, before performing quantization processing on emotion words in the text information according to a preset rule, the following process may be further included:

(11) detecting the text information based on a sample word set, wherein the sample word set comprises sample words belonging to the same emotion category;

(12) when the text information is detected to have the content matched with the sample word set, determining the emotion category corresponding to the sample word set as the target category of the text information.

Specifically, the text may be matched, according to a preset matching rule, against sample word sets such as a blacklist word bank or meaningless-text rules, in order to find the corresponding sample words.

In some embodiments, before detecting the text information based on the sample word set, further comprising:

detecting the text information based on a preset text library;

and when a preset text which is the same as the text information exists in the preset text library, determining the emotion category associated with the preset text as the target emotion category of the text information.

Specifically, taking media comments as an example, since users of media websites often post the same comment repeatedly, recently analyzed comments can be stored in the preset text library, and repeated comments can be filtered out by using the comments in the preset text library as a reference, so as to reduce the prediction load on the subsequent models.
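The two pre-filters described above (the preset text library and the sample word sets) can be sketched as follows. The class `PresetTextLibrary`, the function names, the capacity limit and the least-recently-used eviction policy are all assumptions for illustration; the source only says that recently analyzed comments are stored and used to filter repeats.

```python
from collections import OrderedDict


class PresetTextLibrary:
    """Bounded store of recently analyzed texts and their emotion categories."""

    def __init__(self, capacity=10000):
        self._items = OrderedDict()
        self._capacity = capacity  # assumed limit; not specified in the source

    def lookup(self, text):
        """Return the stored category for an identical text, or None."""
        category = self._items.get(text)
        if category is not None:
            self._items.move_to_end(text)  # mark as recently seen
        return category

    def update(self, text, category):
        """Record the association between a text and its determined category."""
        self._items[text] = category
        self._items.move_to_end(text)
        while len(self._items) > self._capacity:
            self._items.popitem(last=False)  # evict the least recently seen text


def match_sample_word_sets(text, sample_word_sets):
    """Return the category of the first sample word set with a word in the text."""
    for category, words in sample_word_sets.items():
        if any(word in text for word in words):
            return category
    return None


def prefilter(text, library, sample_word_sets):
    """Try the exact-match library first, then the sample word sets."""
    category = library.lookup(text)
    if category is not None:
        return category
    return match_sample_word_sets(text, sample_word_sets)


library = PresetTextLibrary(capacity=2)
library.update("great match today", "positive")
blacklist = {"negative": {"scam"}}
print(prefilter("great match today", library, blacklist))
print(prefilter("this is a scam", library, blacklist))
print(prefilter("hello there", library, blacklist))
```

A `None` result means that neither pre-filter applies and the text proceeds to the emotion dictionary and the later models. Substring matching is used here because it needs no word segmentation; a production system would more likely match on segmented words.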

103. When the text information contains no emotion words, detect the text length of the text information.

When the text information contains no emotion words, the simple emotion dictionary approach can no longer meet the requirements of processing the text information, and the emotion of the text information needs to be analyzed in a more intelligent way. Text information varies in length: analyzing short text information with a complicated algorithm wastes resources, while performing emotion analysis on long text with a simple algorithm may lose a lot of useful information. Therefore, different algorithm models can be used for different text lengths, which saves resources while improving accuracy.
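The routing logic of steps 102 and 103 can be summarized in a short sketch. The function `route` and its parameters are hypothetical names, the three models are passed in as plain callables, and both the threshold of 20 and the use of character count as the text length are assumptions, since the source refers only to a "preset value".

```python
from typing import Callable

Classifier = Callable[[str], str]


def route(
    text: str,
    contains_emotion_word: Callable[[str], bool],
    lexicon_model: Classifier,
    sentence_vector_model: Classifier,
    embedding_model: Classifier,
    preset_length: int = 20,  # assumed threshold, measured in characters
) -> str:
    """Pick the cheapest model that is adequate for the given text."""
    if contains_emotion_word(text):
        return lexicon_model(text)  # step 102: emotion dictionary
    if len(text) <= preset_length:
        return sentence_vector_model(text)  # short text: sentence-vector model
    return embedding_model(text)  # long text: embedded-vector model


# Stub models that only report which branch handled the text.
def has_good(text: str) -> bool:
    return "good" in text


def lexicon(text: str) -> str:
    return "lexicon"


def short_model(text: str) -> str:
    return "short"


def long_model(text: str) -> str:
    return "long"


print(route("good job", has_good, lexicon, short_model, long_model))
print(route("see you at noon", has_good, lexicon, short_model, long_model))
print(route("the delivery arrived on tuesday afternoon", has_good, lexicon, short_model, long_model))
```

Passing the models in as callables keeps the routing independent of how each model is implemented, which matches the ordering of the scheme from low to high computational cost.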
