Article application analysis method and system based on natural language processing

Document No.: 1170278 · Publication date: 2020-09-18

Reading note: The technology "Article application analysis method and system based on natural language processing" was designed by 崔亿萍 on 2020-06-09. Summary: The invention relates to the technical field of data processing and provides an article application analysis method and system based on natural language processing. It aims to solve the technical problem of how to analyze massive, heterogeneous user communication data accurately and efficiently in order to determine the application state of a target article. According to the method of one embodiment, a dialogue serial number and a unique identifier are first set for each sentence of dialogue information in the communication data. Each sentence of dialogue information is then cleaned, and the cleaned dialogue information is recombined and restored into communication data according to the dialogue serial numbers and unique identifiers. Finally, a neural network classification model identifies the recommendation category of the target article in the restored communication data, and the application state of the target article is output according to the identification result. These steps allow the recommendation category of an article to be identified accurately and quickly from communication data about that article, from which the application state of the article is derived.

1. An article application analysis method based on natural language processing, the method comprising:

acquiring communication data of communication objects related to a target article, wherein the communication data comprises a plurality of communication object identifiers and the dialogue information corresponding to each communication object identifier;

generating a dialogue serial number for each sentence of dialogue information in each piece of communication data according to the order in which that sentence occurs in the exchange; acquiring characteristic information of each piece of communication data and, according to the characteristic information, setting a unique identifier for each sentence of dialogue information in that piece of communication data;

performing data cleaning on the dialogue information in each piece of communication data, then acquiring, from the cleaned dialogue information, the dialogue information having the same unique identifier, sorting the dialogue information having the same unique identifier by dialogue serial number, and generating to-be-processed communication data according to the sorting result; wherein the to-be-processed communication data comprises the cleaned dialogue information arranged by dialogue serial number, together with the communication object identifier and the dialogue serial number corresponding to each sentence of dialogue information;

and performing target article recommendation category identification on the to-be-processed communication data according to a preset neural network classification model, and outputting the application state of the target article according to the identification result.
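
The tagging, cleaning and regrouping steps of claim 1 can be sketched as follows. This is a minimal illustration, not the patented implementation: the `Sentence` record, the function names, the sample conversations and the whitespace-only cleaning step are all assumptions made for the example, and the characteristic information is assumed to be already unique per conversation.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Sentence:
    uid: str      # unique identifier shared by all sentences of one conversation
    serial: int   # dialogue serial number (position in the original exchange)
    speaker: str  # communication object identifier, e.g. "doctor" or "patient"
    text: str


def tag_sentences(conversations):
    """Flatten conversations into sentences tagged with uid and serial number.

    `conversations` maps a characteristic string (assumed unique per
    conversation) to an ordered list of (speaker, text) pairs.
    """
    tagged = []
    for uid, turns in conversations.items():
        for serial, (speaker, text) in enumerate(turns, start=1):
            tagged.append(Sentence(uid, serial, speaker, text))
    return tagged


def clean(sentence):
    """Placeholder cleaning step: collapse whitespace only (an assumption)."""
    return Sentence(sentence.uid, sentence.serial, sentence.speaker,
                    " ".join(sentence.text.split()))


def regroup(sentences):
    """Group cleaned sentences by uid and restore order by serial number."""
    groups = defaultdict(list)
    for s in sentences:
        groups[s.uid].append(s)
    return {uid: sorted(group, key=lambda s: s.serial)
            for uid, group in groups.items()}


conversations = {
    "forum-A": [("patient", "Is  drug X safe?"),
                ("doctor", "Yes, I recommend it.")],
    "forum-B": [("doctor", "Avoid   drug X.")],
}
# Reverse the list to mimic sentences coming back from cleaning out of order.
cleaned = [clean(s) for s in reversed(tag_sentences(conversations))]
restored = regroup(cleaned)
```

Because every sentence carries its own identifier and serial number, the cleaning step can process sentences independently and in any order, and the original conversations can still be rebuilt afterwards.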

2. The natural language processing based article application analysis method of claim 1, wherein

the acquiring of the characteristic information of each piece of communication data specifically comprises:

acquiring source information of each piece of communication data;

judging whether other communication data with the same source information exists;

if no communication data with the same source information exists, directly taking the source information as the characteristic information of the communication data;

if communication data with the same source information exists, acquiring object characteristics of the communication objects in each piece of communication data, combining the source information with the object characteristics corresponding to each piece of communication data, and taking the combination result as the characteristic information corresponding to that piece of communication data;

and/or,

the data cleaning of the dialogue information in each piece of communication data specifically comprises:

removing from the dialogue information any clutter that is irrelevant to the content exchanged by the communication objects, and performing de-duplication on the dialogue information from which the clutter has been removed;

acquiring the punctuation marks in the dialogue information, setting every punctuation mark other than question marks and periods to a comma, and then converting each punctuation mark into corresponding character information according to the semantic information of that punctuation mark.
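
The two optional steps of claim 2 can be sketched as below. The record keys `source` and `object_feature`, the `|` separator and the `<QUESTION>`, `<PERIOD>`, `<COMMA>` tokens are hypothetical choices for illustration; the claim does not specify how the source information and object characteristics are combined or what character information each punctuation mark maps to. Clutter removal and de-duplication are left out.

```python
import re
from collections import Counter


def feature_info(records):
    """Return one characteristic string per record.

    Each record is a dict with the hypothetical keys "source" and
    "object_feature". A source that occurs once is used directly; a
    repeated source is combined with the object feature to disambiguate.
    """
    counts = Counter(r["source"] for r in records)
    return [
        r["source"] if counts[r["source"]] == 1
        else f'{r["source"]}|{r["object_feature"]}'
        for r in records
    ]


# Hypothetical tokens standing in for the "character information" of each mark.
PUNCT_TOKENS = {"?": " <QUESTION> ", ".": " <PERIOD> ", ",": " <COMMA> "}


def normalize_punctuation(text):
    """Map every mark other than '?' and '.' to ',', then to a text token."""
    # Unify full-width question marks and periods first (an assumption).
    text = text.replace("？", "?").replace("。", ".")
    # Any other character that is neither a word character nor whitespace
    # is treated as punctuation and becomes a comma.
    text = re.sub(r"[^\w\s?.]", ",", text)
    for mark, token in PUNCT_TOKENS.items():
        text = text.replace(mark, token)
    return " ".join(text.split())
```

Note that this simple rule also turns apostrophes and hyphens into commas, so a production cleaner would need a more careful definition of "punctuation mark".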

3. The natural language processing based article application analysis method of claim 1, wherein

when the article is a medicine, performing target article recommendation category identification on the to-be-processed communication data according to the preset neural network classification model specifically comprises:

acquiring the dialogue information in the to-be-processed communication data that contains a target medicine and taking it as first target dialogue information;

acquiring the communication object identifier of the first target dialogue information;

if the communication object identifier is a doctor, performing target medicine recommendation category identification on the first target dialogue information according to the preset neural network classification model;

if the communication object identifier is a patient, selecting the first target dialogue information that contains a question mark as second target dialogue information, acquiring from the to-be-processed communication data the dialogue information whose dialogue serial number is after that of the second target dialogue information and whose communication object identifier is a doctor, and performing target medicine recommendation category identification on the acquired dialogue information according to the preset neural network classification model;

and/or,

performing target article recommendation category identification on the to-be-processed communication data according to the preset neural network classification model specifically comprises:

acquiring communication data samples from a preset training set, wherein each communication data sample comprises article recommendation category information, cleaned dialogue information arranged by dialogue serial number, and the communication object identifier and dialogue serial number corresponding to each sentence of dialogue information;

training a pre-constructed neural network classification model on the communication data samples by means of a machine learning algorithm;

and performing recommendation category identification on the target article in the to-be-processed communication data according to the trained neural network classification model to obtain the recommendation category corresponding to the target article.
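
The doctor/patient selection rule in the first part of claim 3 can be sketched as follows. The function name, the tuple layout and the literal speaker labels `"doctor"` and `"patient"` are assumptions for the example; matching the target medicine by case-insensitive substring is also an assumption, since the claim does not say how a mention is detected.

```python
def select_for_classification(dialogue, target):
    """Pick the sentences that would be passed to the classifier.

    `dialogue` is a list of (serial, speaker, text) tuples.
    Returns the selected tuples ordered by serial number.
    """
    selected = {}  # keyed by serial number to avoid duplicates
    needle = target.lower()
    for serial, speaker, text in dialogue:
        if needle not in text.lower():
            continue  # not "first target dialogue information"
        if speaker == "doctor":
            # A doctor mentioning the medicine is classified directly.
            selected[serial] = (serial, speaker, text)
        elif speaker == "patient" and ("?" in text or "？" in text):
            # A patient question: take the doctor sentences that follow it.
            for later in dialogue:
                if later[0] > serial and later[1] == "doctor":
                    selected[later[0]] = later
    return [selected[k] for k in sorted(selected)]
```

For example, given a patient question about the medicine at serial number 1 followed by two doctor sentences, both doctor sentences are selected even if only one of them names the medicine.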

4. The natural language processing based article application analysis method according to claim 2, wherein the preset neural network classification model comprises a BRNN model layer, a BIGRU model layer, a classification function layer, and a recommendation category output layer;

the BRNN model layer is configured to obtain a word segmentation result of the dialogue information, obtain a word vector corresponding to each word in the dialogue information according to the word segmentation result, obtain a symbol vector of each punctuation mark according to the character information of the punctuation marks in the dialogue information, and obtain an object vector of the communication object identifier according to the communication object identifier of the dialogue information;

the BIGRU model layer is configured to obtain a feature vector of the dialogue information according to the word vectors, symbol vectors and object vector of the dialogue information output by the BRNN model layer;

the classification function layer is configured to predict the probability of each recommendation category for the dialogue information according to the feature vector of the dialogue information;

the recommendation category output layer is configured to obtain and output the recommendation category corresponding to the maximum probability.
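
The last two layers of claim 4 can be illustrated with a small standard-library sketch. It assumes the classification function is a linear scoring step followed by softmax, which the claim does not state; the weights, biases and category names are made up, and the feature vector stands in for the output of the BIGRU model layer, which is not modelled here.

```python
import math


def softmax(scores):
    """Turn raw class scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify(feature_vector, weights, biases, categories):
    """Score each category, apply softmax, return the most probable one."""
    scores = [
        sum(w * x for w, x in zip(row, feature_vector)) + b
        for row, b in zip(weights, biases)
    ]
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return categories[best], probs


# Hypothetical categories and parameters, for illustration only.
categories = ["recommended", "not recommended", "neutral"]
weights = [[2.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
biases = [0.0, 0.0, 0.0]
label, probs = classify([1.0, 0.0], weights, biases, categories)
```

The output layer reduces to an arg-max over the predicted probabilities, so the returned label is always the category with the largest probability.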

5. The natural language processing based article application analysis method of claim 4, further comprising:

the BRNN model layer is configured to obtain a feature vector corresponding to each word according to the semantic information of that word, obtain a weight corresponding to each word in the dialogue information according to the formula below, and perform a weighted calculation on the feature vector and the weight corresponding to each word to obtain the word vector corresponding to that word:

Tfidf(w) = tf(d, w) × idf(w)

wherein Tfidf(w) is the weight of the w-th word, tf(d, w) is the term frequency of the w-th word in the d-th piece of communication data, and idf(w) is the inverse document frequency of the w-th word;

if the w-th word is a related word of the target article, then

6. An article application analysis system based on natural language processing, the system comprising:

a communication data acquisition device configured to acquire communication data of communication objects related to a target article, wherein the communication data comprises a plurality of communication object identifiers and the dialogue information corresponding to each communication object identifier;

a first data processing device configured to generate a dialogue serial number for each sentence of dialogue information in each piece of communication data according to the order in which that sentence occurs in the exchange, to acquire characteristic information of each piece of communication data and, according to the characteristic information, to set a unique identifier for each sentence of dialogue information in that piece of communication data;

a second data processing device configured to perform data cleaning on the dialogue information in each piece of communication data, then acquire, from the cleaned dialogue information, the dialogue information having the same unique identifier, sort the dialogue information having the same unique identifier by dialogue serial number, and generate to-be-processed communication data according to the sorting result; wherein the to-be-processed communication data comprises the cleaned dialogue information arranged by dialogue serial number, together with the communication object identifier and the dialogue serial number corresponding to each sentence of dialogue information;

and an article application analysis device configured to perform target article recommendation category identification on the to-be-processed communication data according to a preset neural network classification model, and to output the application state of the target article according to the identification result.

7. The natural language processing based article application analysis system of claim 6, wherein the first data processing device comprises a characteristic information acquisition module and/or the second data processing device comprises a data cleaning module;

the characteristic information acquisition module is configured to perform the following operations:

acquiring source information of each piece of communication data;

judging whether other communication data with the same source information exists;

if no communication data with the same source information exists, directly taking the source information as the characteristic information of the communication data;

the data cleaning module is configured to perform the following operations:

8. The natural language processing based article application analysis system according to claim 6, wherein the article application analysis device comprises a first article application analysis module and/or a second article application analysis module;

the first article application analysis module is configured to perform the following operations when the article is a medicine:

acquiring the dialogue information in the to-be-processed communication data that contains a target medicine and taking it as first target dialogue information;

acquiring the communication object identifier of the first target dialogue information;

the second article application analysis module is configured to perform the following operations:

training a pre-constructed neural network classification model on the communication data samples by means of a machine learning algorithm;

9. The natural language processing based article application analysis system of claim 7, wherein the preset neural network classification model comprises a BRNN model layer, a BIGRU model layer, a classification function layer and a recommendation category output layer;

the recommendation category output layer is configured to obtain and output a recommendation category corresponding to the maximum probability.

10. The natural language processing based article application analysis system of claim 9, further comprising:

Tfidf(w) = tf(d, w) × idf(w)

wherein Tfidf(w) is the weight of the w-th word, tf(d, w) is the term frequency of the w-th word in the d-th piece of communication data, and idf(w) is the inverse document frequency of the w-th word;

if the w-th word is a related word of the target article, then [formula not reproduced in source]; if the w-th word is not a related word of the target article, then [formula not reproduced in source]; where N is the total number of pieces of communication data, N(w) is the number of pieces of communication data containing the w-th word, and k is a preset weighting coefficient.
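
The weighting Tfidf(w) = tf(d, w) × idf(w) can be illustrated with a short sketch. The two idf expressions for related and unrelated words are not reproduced in this text, so the code below substitutes the textbook idf, log(N / N(w)), and, purely as an assumption, multiplies it by k for words related to the target article. The function names, the normalized term frequency and the default k = 2.0 are likewise illustrative and are not taken from the patent.

```python
import math


def tf(doc_tokens, word):
    """Term frequency of `word` in one tokenized piece of communication data."""
    return doc_tokens.count(word) / len(doc_tokens) if doc_tokens else 0.0


def idf(docs, word, related_words, k=2.0):
    """Inverse document frequency with an ASSUMED boost for related words."""
    n_total = len(docs)                         # N
    n_with = sum(1 for d in docs if word in d)  # N(w)
    if n_with == 0:
        return 0.0
    base = math.log(n_total / n_with)           # textbook idf (assumption)
    return k * base if word in related_words else base


def tfidf(docs, d_index, word, related_words, k=2.0):
    """Weight of `word` in document `d_index`: tf(d, w) * idf(w)."""
    return tf(docs[d_index], word) * idf(docs, word, related_words, k)


docs = [["drug", "x", "helps"], ["take", "rest"], ["drug", "y"], ["x", "works"]]
plain = tfidf(docs, 0, "helps", related_words=set())
boosted = tfidf(docs, 0, "helps", related_words={"helps"}, k=2.0)
```

Under these assumptions a related word simply receives k times the weight it would otherwise get, which pushes the word vectors of article-related terms to contribute more to the sentence representation.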
