Microblog emotion analysis method based on expression dictionary and emotion common sense

Document No.: 1215830 Publication date: 2020-09-04

Reading note: This technology, a microblog emotion analysis method based on an expression dictionary and emotion common sense (基于表情词典与情感常识的微博情感分析方法), was designed by Xu Xinyan (徐新燕), Zhang Shunxiang (张顺香) and Zhu Guangli (朱广丽) on 2020-05-25. The invention provides a microblog emotion analysis method based on an expression dictionary and emotion common sense. The method comprises: collecting, preprocessing and segmenting microblog text data under a given topic; selecting frequently used emoticons to build a microblog emoticon dictionary; extracting binary collocations from the ConceptNet semantic library, annotating them with emotion labels, and expanding them with a synonym dictionary to form an emotion common sense library; and analyzing the emotion of each microblog by computing weights from its emoticons and the emotion common sense. By combining the emoticon dictionary with emotion common sense to determine the emotion class of a microblog, the method fuses explicit features such as internet slang and emoticons with implicit features such as emotion common sense. This allows the implicit emotion expressed by a microblog text to be mined to a large extent, and thereby improves the accuracy of emotion analysis.

1. A microblog emotion analysis method based on an expression dictionary and emotion common sense, characterized in that the method comprises the following steps:

(1) acquiring a microblog text under a specified topic, preprocessing and segmenting text data, and selecting high-frequency emoticons to construct a microblog emoticon dictionary;

(2) extracting binary entities with obvious emotional tendency from ConceptNet as a common sense candidate set, and filtering out binary emotion common sense collocations that contain explicit emotion;

(3) calculating the emotion polarity of the binary entity candidate set;

(4) expanding the coverage of the emotion common sense by using the Harbin Institute of Technology (HIT) synonym thesaurus (Tongyici Cilin) to form an emotion common sense library;

(5) classifying the emotion of the microblog text by calculation based on the emoticon emotion weights from step (1) and the emotion common sense weights from step (3).

2. The microblog emotion analysis method based on the expression dictionary and the emotion common sense of claim 1, wherein in step (1) the microblog texts are collected, preprocessed and segmented, and the emoticon dictionary is constructed, as follows: microblog texts under a given topic are collected and preprocessed, the preprocessing mainly consisting of removing noise such as "#topic#" tags, "@username" mentions, pictures, videos and web links; word segmentation is then performed with the ICTCLAS word segmentation tool of the Chinese Academy of Sciences; finally, frequently used emoticons are extracted to build the emoticon dictionary, and their emotional intensity is labeled manually.

3. The microblog emotion analysis method based on the expression dictionary and the emotion common sense of claim 1, wherein in step (2) the binary entities with obvious emotional tendency are extracted as a common sense candidate set mainly because most of the common sense knowledge in ConceptNet carries no emotional tendency.

4. The microblog emotion analysis method based on the expression dictionary and the emotion common sense of claim 1, wherein in step (3) the emotion polarity of the binary entity candidate set is calculated mainly from the difference between the mean sememe similarities.

5. The microblog emotion analysis method based on the expression dictionary and the emotion common sense of claim 1, wherein in step (4) the coverage of the emotion common sense is expanded with the Harbin Institute of Technology (HIT) synonym thesaurus (Tongyici Cilin) as follows: for each emotion common sense entry already labeled with polarity, each of its two entities is replaced in turn with synonyms from the thesaurus, and the resulting tuples are added to the existing emotion common sense library.

6. The microblog emotion analysis method based on the expression dictionary and the emotion common sense of claim 1, wherein in step (5) the emotion value of the whole microblog message takes into account the influence of both the emoticons and the microblog text on the tendency value: the microblog text is searched for matches against the binary emotion common sense; if a match exists, the matched binary emotion common sense, with its annotated emotion weight, replaces the corresponding word collocation in the microblog text when calculating the emotional tendency of the text; and the emotional tendency values of the two parts are weighted to obtain the emotional tendency of the whole microblog.

Technical Field

The invention belongs to the technical field of text emotion analysis in natural language processing, and particularly relates to a microblog emotion analysis method based on an expression dictionary and emotion common sense.

Background

Microblogs have become an important social network platform in the internet era: through web pages or client apps, users can share their personal lives, publish their views and interact with friends. The number of microblog users now exceeds 300 million, and the massive volume of microblog data contains content carrying the subjective emotional tendencies of many users. Research on how to efficiently mine the topics and emotions hidden in these numerous and complex microblog messages therefore supports public opinion analysis and network supervision by governments, as well as public opinion guidance by enterprises and public institutions on the topics they follow.

However, because microblog posts are original and unpredictable, among other characteristics, existing microblog emotion analysis methods calculate emotion values mostly from explicit features such as emoticons and internet slang. Implicit emotion in a microblog text often has an important influence on judging emotional tendency, yet its expression largely contains no emotion words, so a reader needs some background knowledge to uncover, through reasoning, the emotions that are conveyed implicitly.

Disclosure of Invention

In order to solve the problems, the invention aims to provide a microblog emotion analysis method based on an expression dictionary and emotion common sense.

To achieve this aim, the microblog emotion analysis method based on the expression dictionary and the emotion common sense comprises the following steps in sequence:

(1) acquiring a microblog text under a specified topic, preprocessing and segmenting text data, and selecting high-frequency emoticons to construct a microblog emoticon dictionary;

(2) extracting binary entities with obvious emotional tendency from ConceptNet as a common sense candidate set, and filtering out binary emotion common sense collocations that contain explicit emotion;

(3) calculating the emotion polarity of the binary entity candidate set;

(4) expanding the coverage of the emotion common sense by using the Harbin Institute of Technology (HIT) synonym thesaurus (Tongyici Cilin) to form an emotion common sense library;

(5) classifying the emotion of the microblog text by calculation based on the emoticon emotion weights from step (1) and the emotion common sense weights from step (3).

In step (1), the microblog texts are collected, preprocessed and segmented, and the emoticon dictionary is constructed, as follows: microblog texts under a given topic are collected and preprocessed, the preprocessing mainly consisting of removing noise such as "#topic#" tags, "@username" mentions, pictures, videos and web links; word segmentation is then performed with the ICTCLAS word segmentation tool of the Chinese Academy of Sciences; finally, frequently used emoticons are extracted to build the emoticon dictionary, and their emotional intensity is labeled manually.

In step (2), the binary entities with obvious emotional tendency are extracted as a common sense candidate set mainly because most of the common sense knowledge in ConceptNet carries no emotional tendency.

In step (3), the emotion polarity of the binary entity candidate set is calculated mainly from the difference between the mean sememe similarities. The emotional tendency of an emotion common sense entry is obtained as follows:

1) The maximum similarity between two words is calculated:

Word similarity is obtained from semantic similarity, which is computed from distances in the sememe tree. For two Chinese words $w_1$ and $w_2$, if $w_1$ has $n$ concepts $x_1, x_2, \ldots, x_n$ and $w_2$ has $m$ concepts $y_1, y_2, \ldots, y_m$, the similarity of $w_1$ and $w_2$ is defined as the maximum similarity over all pairs of their concepts, namely:

$$S(w_1, w_2) = \max_{1 \le i \le n,\ 1 \le j \le m} S(x_i, y_j) \tag{1}$$

[Formula image not reproduced in the source text.]

where $\lambda$ is a positive adjustable parameter and $d(x_1, y_2)$ denotes the distance between the sememes $x_1$ and $y_2$ in the hierarchy tree;

2) The emotional tendency of a word is obtained from the difference between the mean sememe similarities:

For any word, an emotional tendency value can be obtained from its distance to the seed words in the emotion dictionary. The word $W$ is compared with each seed word in the emotion dictionary to obtain a positive and a negative emotional tendency value, and the emotional tendency value of $W$ is then the difference between the means of the two. In the formula for the emotional tendency of $W$ (given as an image in the original publication and not reproduced here), $P_i$ denotes a positive seed word and $N_j$ denotes a negative seed word;

In step (4), the coverage of the emotion common sense is expanded with the Harbin Institute of Technology (HIT) synonym thesaurus (Tongyici Cilin) as follows: for each emotion common sense entry already labeled with polarity, each of its two entities is replaced in turn with synonyms from the thesaurus, and the resulting tuples are added to the existing emotion common sense library. For example, starting from (school, vacation), replacing "school" with its synonym "college" on the left yields the new entry (college, vacation), and replacing "vacation" with a synonym such as "holiday" on the right yields (school, holiday).

In step (5), the emotion value of the whole microblog message must take into account the influence of both the emoticons and the microblog text on the tendency value. The microblog text is searched for matches against the binary emotion common sense; if a match exists, the matched binary emotion common sense, with its annotated emotion weight, replaces the corresponding word collocation in the microblog text when calculating the emotional tendency of the text. The emotional tendency values of the two parts are then weighted to obtain the emotional tendency of the whole microblog. The emotional tendency of the whole microblog text is obtained as follows:

1) The emotional tendency of the emoticons is obtained by calculating the emoticon weights with a formula (given as an image in the original publication and not reproduced here), where $E_i$ is the emotional intensity of the $i$-th emoticon in a given microblog message.

2) The emotional tendency value $Q$ of the whole microblog message is calculated with a formula (given as an image in the original publication and not reproduced here).

If the resulting $Q$ is greater than 0, the emotional tendency of the microblog is positive; if $Q$ is less than 0, it is negative; and if $Q$ equals 0, it is neutral.

The microblog emotion analysis method based on the expression dictionary and the emotion common sense provided by the invention has the following advantages: (1) The method judges the emotional tendency of a microblog message by combining explicit and implicit features; applying common sense knowledge allows the implicit emotion expressed by the microblog text to be mined to a large extent, which improves the accuracy of emotion analysis. (2) Unlike machine learning methods, the method requires no training on large-scale data and is therefore better suited to real-time data processing.

Drawings

FIG. 1 is a flow diagram of the present invention.

Detailed Description

The microblog emotion analysis method based on the expression dictionary and the emotion common sense provided by the invention is explained in detail below with reference to the accompanying drawings.

As shown in fig. 1, the microblog emotion analysis method based on the expression dictionary and the emotion common sense provided by the invention comprises the following steps in sequence:

(1) acquiring a microblog text under a specified topic, preprocessing and segmenting text data, and selecting high-frequency emoticons to construct a microblog emoticon dictionary;

The microblog texts under the specified topic are collected as the object of analysis, and the text data is preprocessed to remove noise that has little influence on the subsequent emotion analysis, mainly "#topic#" tags, "@username" mentions, pictures, videos and web links.
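The noise-removal step can be sketched as follows. This is a minimal illustration rather than the patent's implementation: the function name `clean_microblog` and the regular expressions are assumptions, and pictures and videos are presumed to have been dropped already when only the text was collected.

```python
import re

# Hypothetical patterns; the patent does not specify exact matching rules.
URL = re.compile(r"https?://\S+")
TOPIC = re.compile(r"#[^#]+#")      # "#topic#" tags
MENTION = re.compile(r"@[\w\-]+")   # "@username" mentions (\w also covers CJK)

def clean_microblog(text: str) -> str:
    """Strip links, topic tags and mentions, then normalize whitespace."""
    # Links are removed first so that a "#" inside a URL is not read as a topic tag.
    for pattern in (URL, TOPIC, MENTION):
        text = pattern.sub(" ", text)
    return " ".join(text.split())

cleaned = clean_microblog("#放假# @xiaoming 学校放假了[哈哈] http://t.cn/abc")
```

Emoticon tokens such as `[哈哈]` are deliberately left in place here, because the later steps need them.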

Word segmentation is then performed with the ICTCLAS word segmentation tool of the Chinese Academy of Sciences.

Frequently used emoticons are then extracted to build the emoticon emotion dictionary, and their emotional intensity is labeled manually, as shown in Table 1.

TABLE 1 Emotion dictionary example
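Selecting the high-frequency emoticons can be sketched as a simple frequency count. The sketch assumes that emoticons appear in the collected text as bracketed names such as `[哈哈]`; the function name `top_emoticons`, the cutoff `k`, and the intensity values are hypothetical, since the patent assigns intensities by manual labeling and does not state a frequency threshold.

```python
import re
from collections import Counter

# Assumption: emoticons are serialized as bracketed names, e.g. "[哈哈]".
EMOTICON = re.compile(r"\[([^\[\]]+)\]")

def top_emoticons(posts, k):
    """Return the k most frequent emoticon names across all posts."""
    counts = Counter(name for post in posts for name in EMOTICON.findall(post))
    return [name for name, _ in counts.most_common(k)]

posts = ["好开心[哈哈][哈哈]", "[泪]太难了", "[哈哈]放假啦"]
frequent = top_emoticons(posts, k=2)

# Hypothetical manual annotation of emotional intensity for the selected emoticons.
emoticon_dictionary = {"哈哈": 1.0, "泪": -1.0}
```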

(2) Extracting binary entities with obvious emotional tendency from ConceptNet as a common sense candidate set, and filtering out binary emotion common sense collocations that contain explicit emotion;
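The filtering part of this step might look like the sketch below. It reads "filtering" as discarding candidate pairs in which either entity is itself an explicit emotion word, which is an interpretation of the text rather than a stated rule. The function name, the candidate pairs and the small emotion lexicon are all hypothetical, and no ConceptNet access is shown.

```python
def filter_candidates(pairs, explicit_emotion_words):
    """Drop candidate pairs in which either entity is an explicit emotion word."""
    return [
        (left, right)
        for left, right in pairs
        if left not in explicit_emotion_words
        and right not in explicit_emotion_words
    ]

# Hypothetical candidate pairs and explicit emotion lexicon, for illustration only.
candidates = [("school", "vacation"), ("gift", "happy"), ("exam", "fail")]
explicit = {"happy", "sad", "angry"}
kept = filter_candidates(candidates, explicit)
```

Pairs such as ("gift", "happy") are removed because the emotion is already explicit; the remaining pairs carry emotion only through common sense.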

(3) The emotion polarity of the binary entity candidate set is calculated mainly from the difference between the mean sememe similarities. The emotional tendency of an emotion common sense entry is obtained as follows:

1) The maximum similarity between two words is calculated:

Word similarity is obtained from semantic similarity, which is computed from distances in the sememe tree. For two Chinese words $w_1$ and $w_2$, if $w_1$ has $n$ concepts $x_1, x_2, \ldots, x_n$ and $w_2$ has $m$ concepts $y_1, y_2, \ldots, y_m$, the similarity of $w_1$ and $w_2$ is defined as the maximum similarity over all pairs of their concepts, namely:

$$S(w_1, w_2) = \max_{1 \le i \le n,\ 1 \le j \le m} S(x_i, y_j) \tag{1}$$

[Formula image not reproduced in the source text.]

where $\lambda$ is a positive adjustable parameter and $d(x_1, y_2)$ denotes the distance between the sememes $x_1$ and $y_2$ in the hierarchy tree.
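Because the sememe-level formula is not reproduced in the text, the sketch below assumes a form commonly used with HowNet-style sememe trees, S(x, y) = λ / (d(x, y) + λ). This form fits the description of λ as a positive adjustable parameter and d as a tree distance, but the patent's actual formula may differ. The function names, the default `lam=1.6` and the toy distance table are hypothetical.

```python
from itertools import product

def sememe_similarity(distance, lam=1.6):
    """Assumed form: similarity is 1 at distance 0 and decays as distance grows."""
    return lam / (distance + lam)

def word_similarity(concepts1, concepts2, dist, lam=1.6):
    """Formula (1): maximum similarity over all concept pairs of the two words."""
    return max(
        sememe_similarity(dist(x, y), lam)
        for x, y in product(concepts1, concepts2)
    )

# Toy tree distances between concepts (hypothetical values).
DIST = {("x1", "y1"): 4.0, ("x1", "y2"): 1.6, ("x2", "y1"): 6.0, ("x2", "y2"): 3.0}
score = word_similarity(["x1", "x2"], ["y1", "y2"], lambda x, y: DIST[(x, y)])
```

In this toy example the closest concept pair is (x1, y2), so it alone determines the word-level similarity.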

2) The emotional tendency of a word is obtained from the difference between the mean sememe similarities:

For any word, an emotional tendency value can be obtained from its distance to the seed words in the emotion dictionary. The word $W$ is compared with each seed word in the emotion dictionary to obtain a positive and a negative emotional tendency value, and the emotional tendency value of $W$ is then the difference between the means of the two. In the formula for the emotional tendency of $W$ (given as an image in the original publication and not reproduced here), $P_i$ denotes a positive seed word and $N_j$ denotes a negative seed word.
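The description of this step can be read as a difference of mean similarities. The following is one hedged reconstruction, not the patent's verbatim formula: the symbol $O(W)$ and the seed counts $k$ and $l$ are introduced here purely for illustration, and $S$ is the word similarity of formula (1).

```latex
O(W) = \frac{1}{k}\sum_{i=1}^{k} S(W, P_i) \;-\; \frac{1}{l}\sum_{j=1}^{l} S(W, N_j)
```

Under this reading, $O(W) > 0$ would indicate that $W$ is on average closer to the positive seed words, and $O(W) < 0$ that it is closer to the negative ones.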

(4) The coverage of the emotion common sense is expanded with the Harbin Institute of Technology (HIT) synonym thesaurus (Tongyici Cilin) to form the emotion common sense library: for each emotion common sense entry already labeled with polarity, each of its two entities is replaced in turn with synonyms from the thesaurus, and the resulting tuples are added to the existing library. For example, starting from (school, vacation), replacing "school" with its synonym "college" on the left yields the new entry (college, vacation), and replacing "vacation" with a synonym such as "holiday" on the right yields (school, holiday);
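The expansion can be sketched as one-sided synonym replacement. The sketch assumes that an expanded tuple inherits the polarity of the entry it came from, which the text implies but does not state. The function name `expand_entry`, the two-entry synonym table and the polarity value are hypothetical stand-ins for the real thesaurus and library.

```python
def expand_entry(entry, polarity, synonyms):
    """Replace one entity at a time with each of its synonyms.

    Assumption: expanded tuples inherit the polarity of the original entry.
    """
    left, right = entry
    expanded = {}
    for alt in synonyms.get(left, []):
        expanded[(alt, right)] = polarity   # left-side replacement
    for alt in synonyms.get(right, []):
        expanded[(left, alt)] = polarity    # right-side replacement
    return expanded

# Hypothetical thesaurus excerpt and polarity value.
synonyms = {"school": ["college"], "vacation": ["holiday"]}
library = {("school", "vacation"): 1.0}
library.update(expand_entry(("school", "vacation"), 1.0, synonyms))
```

Only one side is replaced at a time, following the example in the text; replacing both sides at once would be a further design choice that the patent does not describe.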

(5) The emotion of the microblog text is classified by calculation based on the emoticon emotion weights from step (1) and the emotion common sense weights from step (3), giving the emotional tendency of the whole microblog text, as follows:

1) The emotional tendency of the emoticons is obtained by calculating the emoticon weights, namely:

[Formula image not reproduced in the source text.]

where $E_i$ is the emotional intensity of the $i$-th emoticon in a given microblog message.

2) The emotional tendency value $Q$ of the whole microblog message is calculated as follows:

[Formula image not reproduced in the source text.]

If the resulting $Q$ is greater than 0, the emotional tendency of the microblog is positive; if $Q$ is less than 0, it is negative; and if $Q$ equals 0, it is neutral.
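Since neither the emoticon formula nor the formula for Q is reproduced in the text, the sketch below fills them with explicit assumptions: the emoticon part is taken as the mean intensity of the emoticons in the message, the text part as the sum of the weights of matched common sense pairs, and Q as a weighted sum with a hypothetical coefficient `alpha`. Only the sign rule in `classify` comes directly from the description.

```python
def emoticon_score(intensities):
    """Assumed aggregation: mean intensity of the emoticons in one message."""
    return sum(intensities) / len(intensities) if intensities else 0.0

def text_score(tokens, library):
    """Sum the weights of library pairs whose two entities both occur in the text."""
    present = set(tokens)
    return sum(w for (a, b), w in library.items() if a in present and b in present)

def microblog_tendency(tokens, intensities, library, alpha=0.5):
    """Assumed weighting: Q = alpha * emoticon part + (1 - alpha) * text part."""
    return alpha * emoticon_score(intensities) + (1 - alpha) * text_score(tokens, library)

def classify(q):
    """Sign rule from the description: Q > 0 positive, Q < 0 negative, else neutral."""
    if q > 0:
        return "positive"
    if q < 0:
        return "negative"
    return "neutral"

# Hypothetical common sense library, segmented tokens and emoticon intensities.
library = {("school", "vacation"): 1.0, ("exam", "fail"): -1.0}
q = microblog_tendency(["school", "vacation", "tomorrow"], [0.5], library)
label = classify(q)
```

Here the message contains no emotion word at all, yet the matched pair (school, vacation) contributes a positive weight, which is the kind of implicit emotion the method is meant to capture.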

It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential attributes thereof. The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference sign in a claim should not be construed as limiting the claim concerned.
