Emotional score analysis processing method based on emotional dictionary entity

Document No.: 661720 | Published: 2021-04-27

Abstract: This technology, "Emotional score analysis processing method based on emotional dictionary entity", was designed by 张娴, 王盼盼 and 周庆勇 on 2021-01-08. The invention provides a method for analyzing and processing emotion scores based on emotion dictionary entities, belonging to the field of natural language processing. The method comprises 6 steps: 1) preparing the dictionaries; 2) establishing a structure that defines an entity; 3) establishing an entity comparator; 4) traversing the text to be analyzed according to the established entities to generate all candidate entities; 5) screening the candidate entities; 6) calculating the emotion score. The method creates entities from four dictionaries, including an emotion dictionary, and handles the traversal of entities at a fine granularity, which reduces errors.

1. A method for analyzing and processing emotion scores based on emotion dictionary entities, characterized in that

it comprises 6 steps:

1) preparing a dictionary;

2) establishing a structure for defining an entity;

3) establishing an entity comparator;

4) traversing the text to be analyzed according to the established entity to generate all candidate entities;

5) screening candidate entities;

6) calculating the emotion score.

2. The method of claim 1,

four dictionaries are prepared first: emotion words, degree adverbs, negation words and punctuation marks.

3. The method of claim 2,

the four dictionaries come from general-purpose dictionaries or from custom dictionaries for a specific industry, according to specific requirements; wherein

each emotion word in the emotion word dictionary is assigned a score expressing the strength of its emotion, where positive emotion words generally receive positive scores that increase with the strength of the emotion, and negative emotion words receive negative scores that decrease with the strength of the emotion; each degree adverb in the degree adverb dictionary is assigned a score according to the strength it expresses, and in general the higher the degree a word expresses, the larger its score; the negation word dictionary is a simple list of negation words; and the punctuation dictionary contains the punctuation marks commonly used to break sentences or segments.

4. The method of claim 1,

the entity structure comprises an entity name, an entity starting index, an entity ending index, an entity type and an entity length, wherein the entity type is one of emotion word, degree adverb, negation word and punctuation mark.

5. The method of claim 1,

an entity comparator is established for two entities, entity one and entity two: if the starting position of entity one is greater than that of entity two, 1 is returned; if the starting position of entity one is less than that of entity two, -1 is returned; if the two starting positions are equal, the lengths of the two entities are compared, and 1 is returned if entity one is longer than entity two, otherwise -1 is returned.

6. The method of claim 1,

candidate entities are generated as follows: given a text to be analyzed, the four dictionaries are traversed in turn; if a word in a dictionary appears in the text, a corresponding entity is constructed from the word and put into a candidate entity list; after the four dictionaries have been traversed, all candidate entities have been generated, and they are sorted with the defined comparator, so that the candidate entity list is ordered by starting position.

7. The method of claim 6,

when screening entities, the candidate entity list is searched iteratively; if subsequent entities have the same starting index as the current entity, the longest of them is found and taken as the entity for the current index; the starting index of the next retained entity must be greater than the ending index of that longest entity, so any entity whose starting index is smaller than the ending index of the previously retained entity is skipped directly and the next entity is examined; the required entity list is finally obtained.

8. The method of claim 7,

the generated final entity list is traversed; if the type of the current entity is not an emotion entity, it is skipped directly; if the type of the current entity is an emotion entity, the text before it is searched for the nearest preceding emotion entity or punctuation mark entity, whose position is taken as the position index, and the number of all emotion entities is recorded at the same time.

9. The method of claim 8,

the emotion score of the current emotion entity is calculated: the initial weight of the emotion entity is the score of the emotion word; the negation word entities and degree adverb entities that appear between the emotion entity and the position index are found, excluding the "degree adverb + degree adverb + emotion word" case; and the emotion score of the current emotion entity is obtained as: score = (degree adverb entity score)^(number of degree adverb entities) × (-1)^(number of negation word entities) × initial weight.

10. The method of claim 9,

all emotion entities are traversed, and all their emotion scores are summed to obtain the emotion score of the text to be analyzed; if normalization is required, the sum can be divided by the number of emotion entities.

Technical Field

The invention relates to the field of natural language processing, in particular to an emotion score analysis processing method based on an emotion dictionary entity.

Background

What is sentiment analysis? Briefly, it is the process of analyzing, processing, summarizing and reasoning about subjective text that carries emotional coloring. The internet (for example blogs, forums and social service networks such as Dianping and Meituan) generates a great deal of valuable review information about people, events, products and so on. These comments express people's emotional coloring and tendencies, such as joy, anger, sorrow and happiness, or criticism and praise. Potential users can therefore learn public opinion about an event or product by browsing such subjective comments. The rapid rise of this field has benefited from the rapid development of social media, such as product reviews, forum discussions and microblogs. Since the early 2000s, sentiment analysis has grown into one of the most active research areas in natural language processing (NLP), and it is also widely studied in data mining, Web mining, text mining and information retrieval. At present, emotional tendency is analyzed mainly by text classification methods or dictionary-based methods. Classification methods have the drawback that training samples must be labeled manually, which consumes manpower and material resources; dictionary-based methods either consider only the emotion dictionary or have some error in locating emotion words.

Disclosure of Invention

To solve the above technical problems, the invention provides an emotion score analysis processing method based on emotion dictionary entities. It handles the traversal of entities at a fine granularity to reduce errors, and aims to analyze the emotion scores of unstructured emotional text data by means of text processing and statistical methods.

The technical scheme of the invention is as follows:

A method for analyzing and processing emotion scores based on emotion dictionary entities

comprises 6 steps:

1) preparing the dictionaries;

2) establishing a structure that defines an entity;

3) establishing an entity comparator;

4) traversing the text to be analyzed according to the established entity to generate all candidate entities;

5) screening candidate entities;

6) calculating the emotion score.

Further,

four dictionaries are prepared first: emotion words, degree adverbs, negation words and punctuation marks.

The four dictionaries come from general-purpose dictionaries or from custom dictionaries for a specific industry, according to specific requirements.

Each emotion word in the emotion word dictionary is assigned a score expressing the strength of its emotion: positive emotion words generally receive positive scores that increase with the strength of the emotion, and negative emotion words receive negative scores that decrease with the strength of the emotion. Each degree adverb in the degree adverb dictionary is assigned a score according to the strength it expresses; in general, the higher the degree a word expresses, the larger its score. The negation word dictionary is a simple list of negation words. The punctuation dictionary contains the punctuation marks commonly used to break sentences or segments.

Further,

the entity structure comprises an entity name, an entity starting index, an entity ending index, an entity type and an entity length, wherein the entity type is one of emotion word, degree adverb, negation word and punctuation mark.

Further,

an entity comparator is established for two entities, entity one and entity two: if the starting position of entity one is greater than that of entity two, 1 is returned; if the starting position of entity one is less than that of entity two, -1 is returned; if the two starting positions are equal, the lengths of the two entities are compared, and 1 is returned if entity one is longer than entity two, otherwise -1 is returned.

Further,

candidate entities are generated as follows: given a text to be analyzed, the four dictionaries are traversed in turn; if a word in a dictionary appears in the text, a corresponding entity is constructed from the word and put into a candidate entity list; after the four dictionaries have been traversed, all candidate entities have been generated, and they are sorted with the defined comparator, so that the candidate entity list is ordered by starting position.

Further,

when screening entities, the candidate entity list is searched iteratively; if subsequent entities have the same starting index as the current entity, the longest of them is found and taken as the entity for the current index; the starting index of the next retained entity must be greater than the ending index of that longest entity, so any entity whose starting index is smaller than the ending index of the previously retained entity is skipped directly and the next entity is examined; the required entity list is finally obtained.

Further,

the generated final entity list is traversed; if the type of the current entity is not an emotion entity, it is skipped directly; if the type of the current entity is an emotion entity, the text before it is searched for the nearest preceding emotion entity or punctuation mark entity, whose position is taken as the position index, and the number of all emotion entities is recorded at the same time.

The emotion score of the current emotion entity is calculated: the initial weight of the emotion entity is the score of the emotion word; the negation word entities and degree adverb entities that appear between the emotion entity and the position index are found, excluding the "degree adverb + degree adverb + emotion word" case; and the emotion score of the current emotion entity is obtained as: score = (degree adverb entity score)^(number of degree adverb entities) × (-1)^(number of negation word entities) × initial weight.

All emotion entities are traversed, and all their emotion scores are summed to obtain the emotion score of the text to be analyzed; if normalization is required, the sum can be divided by the number of emotion entities.

The invention has the following advantages:

1. The invention is not limited to a specific field or scenario; the emotional text to be analyzed can come from fields such as news, product reviews and public opinion analysis.

2. Text analysis usually starts with word segmentation, which introduces a certain segmentation error. The method performs no basic word segmentation or similar operations on the text to be analyzed, which improves accuracy to some extent.

3. The method uses four custom dictionaries and adds punctuation mark entities for sentences or paragraphs, which improves the accuracy of locating entities; the entities that modify each emotion entity are then found and its weight is changed accordingly.

Drawings

FIG. 1 is a schematic workflow diagram of the present invention.

Detailed Description

To make the objects, technical solutions and advantages of the embodiments of the present invention clearer and more complete, the technical solutions in the embodiments are described below with reference to the drawings. The described embodiments are some, but not all, embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art from these embodiments without creative effort fall within the scope of the present invention.

The invention provides an analysis processing method of emotion scores based on an emotion dictionary entity, which is mainly realized by the following technical scheme and specifically comprises the following steps:

1. Dictionary preparation

First, four dictionaries are prepared: emotion words, degree adverbs, negation words and punctuation marks. The four dictionaries can come from general-purpose dictionaries or from custom dictionaries for a specific industry, according to specific requirements. Each emotion word in the emotion word dictionary is assigned a score expressing the strength of its emotion: positive emotion words generally receive positive scores that increase with the strength of the emotion, and negative emotion words receive negative scores that decrease with the strength of the emotion. Each degree adverb in the degree adverb dictionary is assigned a score according to the strength it expresses; in general, the higher the degree a word expresses, the larger its score. The negation word dictionary is a simple list of negation words. The punctuation dictionary contains the punctuation marks commonly used to break sentences or segments.
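As a minimal sketch of what the four dictionaries might look like in code, the block below uses a handful of Chinese words. Every entry, score and variable name here is a hypothetical illustration and is not taken from the patent; a real deployment would load general-purpose or industry-specific dictionaries instead.

```python
# Toy dictionaries (hypothetical entries and scores, for illustration only).

# Emotion words: positive words get positive scores, negative words get
# negative scores; a larger magnitude means a stronger emotion.
EMOTION_WORDS = {"好": 1.0, "喜欢": 1.5, "差": -1.0, "失望": -1.5}

# Degree adverbs: the higher the degree expressed, the larger the score.
DEGREE_ADVERBS = {"有点": 0.8, "很": 1.5, "非常": 2.0}

# Negation words: a plain word list without scores.
NEGATION_WORDS = {"不", "不是", "没有"}

# Punctuation marks that break sentences or segments.
PUNCTUATION = {",", "。", "!", "?", ";"}

# Group the four dictionaries by entity type so later steps can iterate them.
DICTIONARIES = {
    "emotion": EMOTION_WORDS,
    "degree": DEGREE_ADVERBS,
    "negation": NEGATION_WORDS,
    "punctuation": PUNCTUATION,
}
```

Keeping scores only on the emotion and degree dictionaries mirrors the description above, where negation words and punctuation marks are plain lists.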

2. Defining the structure of an entity

The entity structure comprises an entity name, an entity starting index, an entity ending index, an entity type and an entity length, wherein the entity type is one of emotion word, degree adverb, negation word and punctuation mark. Subsequent calculation steps use these attributes of the entity.
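The entity structure could be sketched as a small Python dataclass. The class and field names are assumptions chosen for this illustration, and so is the convention that the ending index is inclusive (the index of the last character of the word); the patent does not state which convention it uses.

```python
from dataclasses import dataclass
from enum import Enum


class EntityType(Enum):
    EMOTION = "emotion"
    DEGREE = "degree"
    NEGATION = "negation"
    PUNCTUATION = "punctuation"


@dataclass(frozen=True)
class Entity:
    name: str          # the matched dictionary word
    start: int         # index of the first character in the text
    end: int           # index of the last character (inclusive, an assumption)
    type: EntityType   # which of the four dictionaries the word came from
    length: int        # number of characters in the word


def make_entity(name: str, start: int, entity_type: EntityType) -> Entity:
    """Build an entity, deriving the end index and length from the word."""
    return Entity(name, start, start + len(name) - 1, entity_type, len(name))


example = make_entity("非常", 9, EntityType.DEGREE)
print(example.end, example.length)  # prints: 10 2
```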

3. Building an entity comparator

Given two entities, entity one and entity two: if the starting position of entity one is greater than that of entity two, 1 is returned; if the starting position of entity one is less than that of entity two, -1 is returned; if the two starting positions are equal, the lengths of the two entities are compared, and 1 is returned if entity one is longer than entity two, otherwise -1 is returned.
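The comparator rule can be written directly as a Python comparison function and plugged into `sorted` through `functools.cmp_to_key`. The `Entity` namedtuple below is a lightweight stand-in assumed for this sketch, with entity types written as plain strings.

```python
from collections import namedtuple
from functools import cmp_to_key

# Stand-in entity record (field names are assumptions of this sketch).
Entity = namedtuple("Entity", "name start end type length")


def compare_entities(one, two):
    """Order by starting position, then by length, as described above."""
    if one.start > two.start:
        return 1
    if one.start < two.start:
        return -1
    # Equal starting positions: the longer entity sorts later.
    return 1 if one.length > two.length else -1


entities = [
    Entity("很", 4, 4, "degree", 1),
    Entity("不是", 2, 3, "negation", 2),
    Entity("不", 2, 2, "negation", 1),
]
ordered = sorted(entities, key=cmp_to_key(compare_entities))
print([e.name for e in ordered])  # prints: ['不', '不是', '很']
```

Note that, following the text, the function returns -1 rather than 0 when both the starting position and the length are equal; a stricter comparator might return 0 in that case.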

4. Generating candidate entities

Given a text to be analyzed, the four dictionaries are traversed in turn. If a word in a dictionary appears in the text, a corresponding entity is constructed from the word and put into a candidate entity list. After the four dictionaries have been traversed, all candidate entities have been generated. They are then sorted with the defined comparator, so that the candidate entity list is ordered by starting position.
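A possible sketch of candidate generation is shown below. It scans the raw text for every dictionary word with `str.find`, with no word segmentation, and sorts the matches with the comparator from step 3. The toy dictionaries, the sample sentence, the function names and the inclusive ending index are all assumptions made for this illustration.

```python
from collections import namedtuple
from functools import cmp_to_key

Entity = namedtuple("Entity", "name start end type length")


def compare_entities(one, two):
    """Comparator from step 3: starting position first, then length."""
    if one.start != two.start:
        return 1 if one.start > two.start else -1
    return 1 if one.length > two.length else -1


def find_all(text, word):
    """Yield every index at which `word` occurs in `text`."""
    start = text.find(word)
    while start != -1:
        yield start
        start = text.find(word, start + 1)


def generate_candidates(text, dictionaries):
    """Build one entity per dictionary word occurrence, sorted by position."""
    candidates = []
    for entity_type, words in dictionaries.items():
        for word in words:
            for start in find_all(text, word):
                end = start + len(word) - 1  # inclusive end (assumption)
                candidates.append(Entity(word, start, end, entity_type, len(word)))
    return sorted(candidates, key=cmp_to_key(compare_entities))


# Hypothetical toy dictionaries and sample text.
dictionaries = {
    "emotion": {"好": 1.0, "差": -1.0},
    "degree": {"很": 1.5, "非常": 2.0},
    "negation": {"不", "不是"},
    "punctuation": {",", "。"},
}
text = "质量不是很好,服务非常差。"
candidates = generate_candidates(text, dictionaries)
print([e.name for e in candidates])
# prints: ['不', '不是', '很', '好', ',', '非常', '差', '。']
```

Both "不" and "不是" are reported at index 2 here; resolving such overlaps is left to the screening step.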

5. Screening candidate entities

The candidate entity list is searched iteratively. If subsequent entities have the same starting index as the current entity, the longest of them is found and taken as the entity for the current index. The starting index of the next retained entity must be greater than the ending index of that longest entity, so any entity whose starting index is smaller than the ending index of the previously retained entity is skipped directly and the next entity is examined. The required entity list is finally obtained.
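The screening rule might be sketched as follows. The sketch assumes the candidate list is already sorted by starting position and that ending indexes are inclusive, so an entity whose starting index equals the previous ending index is also treated as overlapping; that last point is an interpretation, since the text only mentions "greater than" and "smaller than". The sample entities are hypothetical.

```python
from collections import namedtuple

Entity = namedtuple("Entity", "name start end type length")


def screen_candidates(candidates):
    """Keep the longest entity per start index and drop overlapping ones."""
    kept = []
    last_end = -1  # inclusive end index of the last retained entity
    i = 0
    while i < len(candidates):
        current = candidates[i]
        # Among entities sharing this start index, pick the longest.
        j = i + 1
        while j < len(candidates) and candidates[j].start == current.start:
            if candidates[j].length > current.length:
                current = candidates[j]
            j += 1
        # Retain it only if it begins after the last retained entity ends.
        if current.start > last_end:
            kept.append(current)
            last_end = current.end
        i = j
    return kept


# Hypothetical sorted candidates: "没有" (0-1) overlaps "有点" (1-2),
# and "不" / "不是" share start index 4.
candidates = [
    Entity("没有", 0, 1, "negation", 2),
    Entity("有点", 1, 2, "degree", 2),
    Entity("不", 4, 4, "negation", 1),
    Entity("不是", 4, 5, "negation", 2),
    Entity("好", 6, 6, "emotion", 1),
]
final_entities = screen_candidates(candidates)
print([e.name for e in final_entities])  # prints: ['没有', '不是', '好']
```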

6. Calculating an emotion score

The final entity list generated in the previous step is traversed. If the type of the current entity is not an emotion entity, it is skipped directly. If the type of the current entity is an emotion entity, the text before it is searched for the nearest preceding emotion entity or punctuation mark entity, whose position is taken as the position index, and the number of all emotion entities is recorded at the same time. The emotion score of the current emotion entity is then calculated. The initial weight of the emotion entity is the score of the emotion word. The negation word entities and degree adverb entities that appear between the emotion entity and the position index are found, excluding the "degree adverb + degree adverb + emotion word" case, and the emotion score of the current emotion entity is obtained as: score = (degree adverb entity score)^(number of degree adverb entities) × (-1)^(number of negation word entities) × initial weight. All emotion entities are traversed, and all their emotion scores are summed to obtain the emotion score of the text to be analyzed. If normalization is required, the sum can be divided by the number of emotion entities.
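The scoring step could be sketched as below, under several stated assumptions. When more than one degree adverb modifies an emotion word, the sketch multiplies their scores, which reduces to score^count when the adverbs are identical; the text does not spell out the mixed case. The special handling of the "degree adverb + degree adverb + emotion word" combination is left out. The dictionaries, scores, entity list and function name are hypothetical.

```python
from collections import namedtuple

Entity = namedtuple("Entity", "name start end type length")


def score_text(entities, emotion_scores, degree_scores, normalize=False):
    """Sum the modified scores of all emotion entities in a screened list."""
    total = 0.0
    emotion_count = 0
    for i, entity in enumerate(entities):
        if entity.type != "emotion":
            continue  # only emotion entities contribute a score
        emotion_count += 1
        degree_factor = 1.0
        negation_count = 0
        # Walk back to the nearest preceding emotion or punctuation entity.
        j = i - 1
        while j >= 0 and entities[j].type not in ("emotion", "punctuation"):
            if entities[j].type == "degree":
                degree_factor *= degree_scores[entities[j].name]
            elif entities[j].type == "negation":
                negation_count += 1
            j -= 1
        initial_weight = emotion_scores[entity.name]
        total += degree_factor * (-1) ** negation_count * initial_weight
    if normalize and emotion_count:
        total /= emotion_count
    return total


# Hypothetical scores and the screened entities of "质量不是很好,服务非常差。"
emotion_scores = {"好": 1.0, "差": -1.0}
degree_scores = {"很": 1.5, "非常": 2.0}
entities = [
    Entity("不是", 2, 3, "negation", 2),
    Entity("很", 4, 4, "degree", 1),
    Entity("好", 5, 5, "emotion", 1),
    Entity(",", 6, 6, "punctuation", 1),
    Entity("非常", 9, 10, "degree", 2),
    Entity("差", 11, 11, "emotion", 1),
    Entity("。", 12, 12, "punctuation", 1),
]
print(score_text(entities, emotion_scores, degree_scores))                  # prints: -3.5
print(score_text(entities, emotion_scores, degree_scores, normalize=True))  # prints: -1.75
```

In this example "好" is negated once and intensified by "很", giving 1.5 × (-1) × 1.0 = -1.5, while "差" is intensified by "非常", giving 2.0 × 1 × (-1.0) = -2.0. The comma stops the backward search, so the modifiers of the first clause do not leak into the second.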

The invention can be adjusted to actual requirements. For example, the contents of the four dictionaries can be customized and specific details can be personalized; since the definition of emotion words may differ between industries, the method can be optimized by modifying the emotion dictionary. The method considers combinations of the four dictionaries, and weights can be assigned to different combination forms: for example, if a user wants to emphasize the "degree adverb + degree adverb + emotion word" combination, a corresponding weight can be assigned to it. The method is therefore widely applicable and extensible.

The method performs no word segmentation, filtering or similar operations on the text to be analyzed, which reduces the errors that such operations introduce through inaccurate processing. Candidate entities are generated by traversing the entities and are then screened according to the designed rules so that only the final entities are retained, which improves accuracy. Finally, the emotion score of the text to be analyzed is calculated and can be standardized or normalized, and the user can divide the scores into emotion levels as needed.

The above description is only a preferred embodiment of the present invention, and is only used to illustrate the technical solutions of the present invention, and not to limit the protection scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.
