Semantic analysis method based on emoji

Document No.: 1504920  Publication date: 2020-02-07

Reading note: This technology, "Semantic analysis method based on emoji" (一种基于emoji的语义解析方法), was designed by 梁敏 and 唐军 on 2019-10-14. Main content: The invention discloses a semantic analysis method based on emoji, comprising the following steps. Step 1: establish a platform for collecting and storing text data containing emoji, collect text containing emoji, and perform word segmentation on it, treating each emoji as a single word. Step 2: divide the collected text data into a training set and a test set. Step 3: establish an algorithm model, train it on the data divided in step 2 to obtain a semantic analysis model, and apply the semantic analysis model to newly collected text to obtain the semantic analysis result for the emoji in that text. This solves two problems of the prior art: Unicode code table identification can only distinguish emoji and cannot identify their specific meanings, and an emoji paraphrase comparison table cannot accurately assemble an understandable phrase.

1. A semantic analysis method based on emoji, characterized by comprising the following steps:

step 1: establishing a platform for collecting and storing text data containing emoji, collecting text containing emoji, and performing word segmentation on it, with each emoji treated as a single word;

step 2: dividing the collected text data into a training set and a test set;

step 3: establishing an algorithm model, training it on the data divided in step 2 to obtain a semantic analysis model, and applying the semantic analysis model to newly collected text to obtain the analysis result for the emoji in the text.

2. The emoji-based semantic analysis method according to claim 1, wherein in step 1 the post text data of internet forums is crawled with a web crawler, so as to establish the platform for collecting and storing text data containing emoji.

3. The emoji-based semantic analysis method according to claim 1, wherein the test set consists of text segments containing emoji that have already been paraphrased, the training set consists of text segments whose emoji are still to be paraphrased, and the test set is a random sample of 20% of the collected data.

4. The emoji-based semantic analysis method according to claim 1, wherein the training in step 3 comprises the following steps:

step 3.1: extracting the emoji in the training set and marking their positions in the original text, wherein emoji have a specific coding range and format in the Unicode character set, so the emoji appearing in a text can be screened out by constructing a regular expression;

step 3.2: calculating, with a correlation coefficient formula, the correlation coefficient between each pair of adjacent words in the word segmentation result obtained in step 1, wherein a larger correlation coefficient indicates that the word combination occurs more frequently;

step 3.3: comparing the results that contain emoji with the results that do not, and using texts that share the same preceding and following words to find a set A of possible paraphrases for each emoji; comparing set A with the result set B from the emoji paraphrase comparison table; training a BP neural network on dimensions such as word frequency [w1], part of speech [w2], pronunciation [w3], and word meaning [w4]; taking the best-matching elements of the two result sets A and B as the paraphrase of the emoji in the text segment; and obtaining an optimal weight combination [w1, w2, w3, w4];

step 4: applying the model obtained from the training set to the test set, comparing the model output with the manually provided results of the test set, and adjusting the weight combination [w1, w2, w3, w4] to obtain the final trained model.

5. The emoji-based semantic analysis method according to claim 1, wherein in step 3.1 the emoji are extracted with a regular expression.

Technical Field

The invention relates to the field of emoji analysis and semantic analysis, in particular to a semantic analysis method based on emoji.

Background

Emoji are visual emotion symbols that originated in wireless communication in Japan; in the Japanese word, "e" (絵) means picture and "moji" (文字) means character. They can represent various expressions, for example a smiling face for a smile or a cake for food, and have gradually become popular among internet and mobile phone users. Unicode is a character encoding scheme established by an international organization to accommodate all the characters and symbols in the world; emoji range from E63E to E757 in Unicode encoding and have fixed characteristic values that identify each emoji character.

As networks have become more widespread, more and more users use emoji in forums and communication software, and some write text using emoji alone. If the receiver is not familiar with the emoji, the information cannot be understood correctly, or only part of it can be guessed.

Emoji are now used ever more widely in internet forums and communication software. A receiver who is not particularly familiar with them cannot interpret the received text correctly, or can only guess the specific meaning from a personal understanding of the emoji. This is especially true of content composed of several emoji: because the sender may combine emoji through homophones or association, looking them up in a code table or an emoji paraphrase comparison table does not yield an accurate paraphrase.

To let users understand more accurately what an emoji means in its current context, this method translates the emoji in a text into intelligible text content.

The existing emoji identification methods mainly comprise two types:

Unicode code table identification. Its advantage is that the emoji in a text and their corresponding Unicode codes can be identified; its drawback is that it can only distinguish emoji and cannot identify their specific meanings.

Emoji paraphrase comparison tables. Their advantage is that the meaning of a single emoji can be looked up; their drawback is that, for a phrase formed from several emoji, the looked-up results cannot be accurately assembled into an understandable phrase.
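The limitation of the first approach can be illustrated with a minimal Python sketch. It uses the standard `unicodedata` module; the helper name `describe` is hypothetical and not part of the source. A lookup of this kind returns a code point and a character name, but nothing about what the emoji means in a given sentence.

```python
import unicodedata

def describe(ch: str) -> tuple[str, str]:
    """Return the code point and the Unicode character name of one character."""
    # The second argument is a fallback for characters without a name.
    return f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN")

# The lookup identifies the symbol, not its meaning in context.
print(describe("😀"))
```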

Disclosure of Invention

The invention aims to provide a semantic analysis method based on emoji that solves two problems of the prior art: Unicode code table identification can only distinguish emoji and cannot identify their specific meanings, and an emoji paraphrase comparison table cannot accurately assemble an understandable phrase.

The invention solves the problems through the following technical scheme:

A semantic analysis method based on emoji, comprising the following steps:

step 1: establishing a platform for collecting and storing text data containing emoji, collecting text containing emoji, and performing word segmentation on it, with each emoji treated as a single word;

step 2: dividing the collected text data into a training set and a test set;

step 3: establishing an algorithm model, training it on the data divided in step 2 to obtain a semantic analysis model, and applying the semantic analysis model to newly collected text to obtain the analysis result for the emoji in the text.
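The word segmentation in step 1 could look roughly like the following Python sketch, which emits every emoji as its own token. The code point ranges, the name `tokenize`, and the use of `\w+` for ordinary words are assumptions made for illustration; the source names no segmenter, and Chinese text would need a real word segmenter in place of the `\w+` branch. Multi-code-point emoji sequences are not handled.

```python
import re

# Assumed emoji ranges (not from the source): a few common Unicode blocks.
EMOJI = r"[\U0001F300-\U0001FAFF\u2600-\u27BF]"
# A token is either exactly one emoji or a run of word characters.
TOKEN_RE = re.compile(rf"{EMOJI}|\w+")

def tokenize(text: str) -> list[str]:
    """Split text into tokens, emitting every emoji as a separate token."""
    return TOKEN_RE.findall(text)

print(tokenize("I love 🍰🍰 so much"))
```

Placing the emoji alternative first guarantees that two adjacent emoji are never merged into one token.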

Preferably, in step 1 the post text data of internet forums is crawled with a web crawler, so as to establish the platform for collecting and storing text data containing emoji.

Preferably, the test set consists of text segments containing emoji that have already been paraphrased, the training set consists of text segments whose emoji are still to be paraphrased, and the test set is a random sample of 20% of the collected data.
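A random 20% hold-out of this kind can be sketched in a few lines of Python. The function name `split_dataset`, the fixed seed, and the assumption that the remaining 80% forms the training set are illustrative choices, not details given in the source.

```python
import random

def split_dataset(samples, test_fraction=0.2, seed=0):
    """Randomly hold out `test_fraction` of the samples as the test set."""
    rng = random.Random(seed)      # fixed seed so the split is reproducible
    shuffled = list(samples)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    # Returns (training set, test set).
    return shuffled[n_test:], shuffled[:n_test]

train, test = split_dataset(range(10))
print(len(train), len(test))
```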

Preferably, the training in step 3 includes the following steps:

step 3.1: extracting the emoji in the training set and marking their positions in the original text, wherein emoji have a specific coding range and format in the Unicode character set, so the emoji appearing in a text can be screened out by constructing a regular expression;

step 3.2: calculating, with a correlation coefficient formula, the correlation coefficient between each pair of adjacent words in the word segmentation result obtained in step 1, wherein a larger correlation coefficient indicates that the word combination occurs more frequently;

step 3.3: comparing the results that contain emoji with the results that do not, and using texts that share the same preceding and following words to find a set A of possible paraphrases for each emoji; comparing set A with the result set B from the emoji paraphrase comparison table; training a BP neural network on dimensions such as word frequency [w1], part of speech [w2], pronunciation [w3], and word meaning [w4]; taking the best-matching elements of the two result sets A and B as the paraphrase of the emoji in the text segment; and obtaining an optimal weight combination [w1, w2, w3, w4];

step 4: applying the model obtained from the training set to the test set, comparing the model output with the manually provided results of the test set, and adjusting the weight combination [w1, w2, w3, w4] to obtain the final trained model.
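Steps 3.2 and 3.3 might be sketched as follows. The source does not state which correlation coefficient formula it uses, so the Dice coefficient below is a stand-in chosen for illustration, and the function names `dice_scores` and `candidate_paraphrases` are hypothetical. The second function builds the candidate set A by collecting the ordinary words that occur between the same preceding and following words as an emoji. The comparison with table B and the BP neural network are not covered here.

```python
from collections import Counter, defaultdict

def dice_scores(sentences):
    """Dice coefficient 2*c(a,b) / (c(a) + c(b)) for each adjacent token pair."""
    unigrams, bigrams = Counter(), Counter()
    for tokens in sentences:
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return {(a, b): 2 * c / (unigrams[a] + unigrams[b])
            for (a, b), c in bigrams.items()}

def candidate_paraphrases(sentences, is_emoji):
    """Map each emoji to the words seen between the same neighbouring tokens."""
    fillers = defaultdict(Counter)     # (prev, next) -> non-emoji middle tokens
    emoji_contexts = defaultdict(set)  # emoji -> set of (prev, next) contexts
    for tokens in sentences:
        for prev, mid, nxt in zip(tokens, tokens[1:], tokens[2:]):
            if is_emoji(mid):
                emoji_contexts[mid].add((prev, nxt))
            else:
                fillers[(prev, nxt)][mid] += 1
    candidates = {}
    for emoji, contexts in emoji_contexts.items():
        counts = Counter()
        for ctx in contexts:
            counts.update(fillers.get(ctx, Counter()))
        candidates[emoji] = counts     # candidate set A with occurrence counts
    return candidates

corpus = [
    ["I", "ate", "🍰", "today"],
    ["I", "ate", "cake", "today"],
    ["I", "ate", "cake", "today"],
    ["we", "ate", "bread", "today"],
]
print(candidate_paraphrases(corpus, lambda t: t == "🍰"))
```

An emoji at the very start or end of a sentence has no two-sided context in this sketch and therefore receives no candidates.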

Preferably, in step 3.1 the emoji are extracted with a regular expression.
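A regular expression of this kind, together with the position marking from step 3.1, can be sketched in Python as below. The code point ranges are an assumption covering a few common emoji blocks and are not the ranges stated in the source; the name `find_emoji` is hypothetical. Multi-code-point sequences such as skin tone or joined emoji would need a more elaborate pattern.

```python
import re

# Assumed ranges (not from the source): common pictograph and symbol blocks.
EMOJI_RE = re.compile(r"[\U0001F300-\U0001FAFF\u2600-\u27BF]")

def find_emoji(text: str) -> list[tuple[str, int]]:
    """Return (emoji, start index) pairs in order of appearance."""
    return [(m.group(), m.start()) for m in EMOJI_RE.finditer(text)]

print(find_emoji("hi 😀 and 🍰!"))
```

The start indices are counted in code points, so they can be used directly to index the original Python string.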

Compared with the prior art, the invention has the following advantages and beneficial effects:

The method and system use historical data and the text data of a large number of emoji users to give the most likely paraphrase of the emoji in a text, which makes emoji easier for information receivers to understand. In addition, because emoji are a standardized symbolic language that is popular worldwide and managed by a dedicated organization, users in different countries and regions can use emoji more easily and conveniently with this method and system. This solves two problems of the prior art: Unicode code table identification can only distinguish emoji and cannot identify their specific meanings, and an emoji paraphrase comparison table cannot accurately assemble an understandable phrase.

Drawings

FIG. 1 is a schematic flow chart of the semantic analysis method based on emoji of the present invention.

Detailed Description

The present invention will be described in further detail with reference to examples, but the embodiments of the present invention are not limited thereto.
