Named entity identification method for automobile engine fault diagnosis

Document No.: 1544892  Published: 2020-01-17

Reading note: The technology 一种面向汽车发动机故障诊断的命名实体识别方法 (Named entity identification method for automobile engine fault diagnosis) was designed by Chen Zhicheng (陈志成), Liu Xiaobao (刘孝保), Yin Yanchao (阴艳超) and Lu Hongbiao (陆宏彪) on 2019-08-28. Abstract: The invention discloses a named entity recognition method for automobile engine fault diagnosis, comprising the following steps. Step 1: build a text corpus for named entity recognition in automobile engine fault diagnosis. Step 2: preprocess the corpus. Step 3: compute a distributed representation of the preprocessed text, using a BERT language model to obtain pre-trained "word + part-of-speech" vectors. Step 4: feed these vectors into a BiLSTM neural network to extract text features. Step 5: add an attention mechanism after the BiLSTM network to capture the more important parts of the text features. Step 6: combine the network with a CRF model to obtain the optimal label sequence for named entity recognition in automobile engine fault diagnosis. The invention addresses the low accuracy of named entity recognition in this domain, which stems from the lack of a data set related to automobile engine fault diagnosis, the pronounced syntactic features of knowledge entities, the interference of large numbers of modifiers inside knowledge entities, and inconsistent labels for the same entity across the full text.

1. A named entity recognition method for automobile engine fault diagnosis, characterized in that the method comprises the following steps:

step 1: building a text corpus for named entity recognition in automobile engine fault diagnosis;

step 2: preprocessing the corpus;

step 3: performing distributed representation of the preprocessed text data, obtaining pre-trained "word + part-of-speech" vectors of the text through a BERT language model;

step 4: inputting the pre-trained "word + part-of-speech" vectors into a BiLSTM neural network for text feature extraction;

step 5: introducing an attention mechanism after the BiLSTM neural network to capture the more important parts of the text features, wherein the more important parts are those parts of the text features that can form knowledge entities;

step 6: combining a CRF model to obtain the optimal label sequence for named entity recognition in automobile engine fault diagnosis.

2. The named entity recognition method for automobile engine fault diagnosis according to claim 1, characterized in that the named entity recognition corpus for automobile engine fault diagnosis consists of texts containing automobile engine fault diagnosis data, including automobile engine maintenance records, academic papers on automobile engine fault diagnosis, and patent applications on automobile engine fault diagnosis.

3. The named entity recognition method for automobile engine fault diagnosis according to claim 1, characterized in that the corpus preprocessing specifically comprises: first, segmenting the corpus into words with word segmentation software; then tagging the parts of speech of the segmented text; filtering stop words from the segmented and part-of-speech-tagged text; then annotating entities in the processed text; and finally dividing the corpus into a training set and a test set.

4. The named entity recognition method for automobile engine fault diagnosis according to claim 3, wherein the entity annotation adopts the BIESO tagging scheme, in which B, I, E and S are entity tags denoting the beginning of an entity, the inside of an entity, the end of an entity and a single-word entity, respectively, and O is the non-entity tag; entity category labels are appended after the BIES tags and comprise ENGP, FAU, CAU and SOL, where ENGP denotes engine parts, FAU denotes fault expressions, CAU denotes fault causes and SOL denotes fault handling methods.

5. The named entity recognition method for automobile engine fault diagnosis according to claim 1, characterized in that the distributed representation of the text data means that each preprocessed word and its corresponding part of speech are taken as a whole, and a BERT language model is used to obtain the feature vector of the word and its part of speech.

6. The named entity recognition method for automobile engine fault diagnosis according to claim 1, characterized in that the BiLSTM is a bidirectional LSTM network consisting of a forward LSTM and a backward LSTM.

Technical Field

The invention relates to a named entity recognition method for automobile engine fault diagnosis and belongs to the field of information technology.

Background

With the rapid development of information technology, automobile engine fault diagnosis has entered a big-data environment: massive amounts of fault diagnosis text, such as automobile engine maintenance records, academic papers and patents, are continuously generated and accumulated. Mining fault knowledge from these data efficiently and accurately would greatly benefit automobile engine fault diagnosis. Named entity recognition is an essential technique for mining automobile engine fault diagnosis text: it can efficiently and accurately extract information such as engine parts, fault expressions, fault causes and handling methods from the text data, providing a basis for subsequent data analysis and use.

At present, named entity recognition methods based on deep learning models have gradually become mainstream. Compared with traditional models based on rule templates or conventional machine learning methods, deep learning models can capture more text features and thereby greatly improve the accuracy of named entity recognition. However, existing deep-learning-based methods mainly target the general domain, and no data set related to automobile engine fault diagnosis is available. Moreover, analysis of automobile engine fault diagnosis texts shows that the syntactic components of different knowledge entities differ greatly and that knowledge entities contain a large amount of modifier interference. Further, the same word may receive different labels in different knowledge entities. These factors make named entity recognition for automobile engine fault diagnosis difficult and lead to unsatisfactory recognition results.

Disclosure of Invention

The invention provides a named entity recognition method for automobile engine fault diagnosis, which addresses the low accuracy of named entity recognition caused by the lack of a data set related to automobile engine fault diagnosis, the pronounced syntactic features of knowledge entities, the interference of large numbers of modifiers inside knowledge entities, and inconsistent labels for the same entity across the full text.

The technical solution of the invention is as follows: a named entity recognition method for automobile engine fault diagnosis comprises the following steps:

step 1: building a text corpus for named entity recognition in automobile engine fault diagnosis;

step 2: preprocessing the corpus;

step 3: performing distributed representation of the preprocessed text data, obtaining pre-trained "word + part-of-speech" vectors of the text through a BERT language model;

step 4: inputting the pre-trained "word + part-of-speech" vectors into a BiLSTM neural network for text feature extraction;

step 5: introducing an attention mechanism after the BiLSTM neural network to capture the more important parts of the text features, namely those parts that can form knowledge entities;

step 6: combining a CRF model to obtain the optimal label sequence for named entity recognition in automobile engine fault diagnosis.
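Step 6 leaves the CRF decoding implicit. In a linear-chain CRF the optimal label sequence is usually found with the Viterbi algorithm, and the Python sketch below shows only that decoding step under stated assumptions: per-token emission scores (which in this method would come from the BiLSTM and attention layers) and a tag-transition score table are already given, and any transition missing from the table is treated as forbidden. The function name `viterbi_decode`, the tag subset and all scores are hypothetical, and CRF training (learning the transition scores) is not shown.

```python
from typing import Dict, List, Tuple


def viterbi_decode(
    emissions: List[Dict[str, float]],
    transitions: Dict[Tuple[str, str], float],
    tags: List[str],
) -> List[str]:
    """Return the highest-scoring tag sequence of a linear-chain CRF.

    emissions[t][tag]   : score of `tag` at token position t
    transitions[(a, b)] : score of moving from tag a to tag b;
                          pairs not in the table are forbidden (-inf)
    """
    neg_inf = float("-inf")
    # best[t][tag] = score of the best path that ends in `tag` at position t
    best = [{tag: emissions[0][tag] for tag in tags}]
    back: List[Dict[str, str]] = []  # back-pointers for positions 1..T-1
    for t in range(1, len(emissions)):
        scores, pointers = {}, {}
        for cur in tags:
            prev_best, prev_tag = neg_inf, tags[0]
            for prev in tags:
                s = best[-1][prev] + transitions.get((prev, cur), neg_inf)
                if s > prev_best:
                    prev_best, prev_tag = s, prev
            scores[cur] = prev_best + emissions[t][cur]
            pointers[cur] = prev_tag
        best.append(scores)
        back.append(pointers)
    # Start from the best final tag and follow the back-pointers
    path = [max(tags, key=lambda tag: best[-1][tag])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


# Hypothetical example: B-FAU -> O is not allowed, so the greedy
# per-token choice (B-FAU, O, O) is rejected in favour of a valid path.
tags = ["O", "B-FAU", "E-FAU"]
transitions = {
    ("O", "O"): 0.0, ("O", "B-FAU"): 0.0, ("B-FAU", "E-FAU"): 0.0,
    ("E-FAU", "O"): 0.0, ("E-FAU", "B-FAU"): 0.0,
}
emissions = [
    {"O": 0.1, "B-FAU": 1.0, "E-FAU": 0.0},
    {"O": 0.9, "B-FAU": 0.0, "E-FAU": 0.6},
    {"O": 1.0, "B-FAU": 0.2, "E-FAU": 0.1},
]
print(viterbi_decode(emissions, transitions, tags))  # ['B-FAU', 'E-FAU', 'O']
```

Encoding BIESO constraints as forbidden transitions is what lets the CRF layer repair label sequences that per-token classification alone would get wrong.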

The named entity recognition corpus for automobile engine fault diagnosis consists of texts containing automobile engine fault diagnosis data, including automobile engine maintenance records, academic papers on automobile engine fault diagnosis, and patent applications on automobile engine fault diagnosis.

The corpus preprocessing specifically comprises: first, segmenting the corpus into words with word segmentation software; then tagging the parts of speech of the segmented text; filtering stop words from the segmented and part-of-speech-tagged text; then annotating entities in the processed text; and finally dividing the corpus into a training set and a test set.
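The last two mechanical parts of this preprocessing, stop-word filtering and the train/test split, can be sketched in plain Python. The sketch assumes that word segmentation and part-of-speech tagging have already been done by some segmentation tool (none is called here), so each sentence arrives as a list of (word, POS) pairs. The POS tag names, the stop-word list, the 20% test ratio and the fixed seed are hypothetical choices, not values from the patent.

```python
import random
from typing import List, Sequence, Set, Tuple

Token = Tuple[str, str]  # (word, part-of-speech tag)


def filter_stop_words(sentence: Sequence[Token], stop_words: Set[str]) -> List[Token]:
    """Drop stop words from a segmented, POS-tagged sentence, keeping the tags."""
    return [(word, pos) for word, pos in sentence if word not in stop_words]


def train_test_split(items: Sequence, test_ratio: float = 0.2, seed: int = 42):
    """Shuffle a copy of `items` reproducibly and split it into (train, test)."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_ratio))
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical pre-segmented sentence: "发动机的怠速不稳" (unstable engine idle)
sentence = [("发动机", "n"), ("的", "uj"), ("怠速", "n"), ("不稳", "a")]
print(filter_stop_words(sentence, {"的", "了"}))
# [('发动机', 'n'), ('怠速', 'n'), ('不稳', 'a')]
```

Keeping the POS tag attached to each surviving word matters here, because the later "word + part-of-speech" representation needs both.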

The entity annotation adopts the BIESO tagging scheme, in which B, I, E and S are entity tags denoting the beginning of an entity, the inside of an entity, the end of an entity and a single-word entity, respectively, and O is the non-entity tag; entity category labels are appended after the BIES tags and comprise ENGP, FAU, CAU and SOL, where ENGP denotes engine parts, FAU denotes fault expressions, CAU denotes fault causes and SOL denotes fault handling methods.
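To make the BIESO scheme concrete, the sketch below converts annotated entity spans over a segmented sentence into per-word tags with the category appended. The span format (start index, exclusive end index, category), the function name `to_bieso` and the example annotation are hypothetical; nested entities are not handled, and overlapping spans are rejected.

```python
from typing import List, Tuple

Span = Tuple[int, int, str]  # (start word index, end index exclusive, category)


def to_bieso(n_tokens: int, spans: List[Span]) -> List[str]:
    """Turn non-overlapping entity spans into BIESO tags such as B-FAU or S-ENGP."""
    tags = ["O"] * n_tokens
    for start, end, cat in spans:
        if not 0 <= start < end <= n_tokens:
            raise ValueError(f"span out of range: {(start, end, cat)}")
        if any(tag != "O" for tag in tags[start:end]):
            raise ValueError(f"overlapping span: {(start, end, cat)}")
        if end - start == 1:
            tags[start] = f"S-{cat}"  # single-word entity
        else:
            tags[start] = f"B-{cat}"  # beginning
            for i in range(start + 1, end - 1):
                tags[i] = f"I-{cat}"  # inside
            tags[end - 1] = f"E-{cat}"  # end
    return tags


# Hypothetical annotation: "火花塞 积碳" as a fault cause (CAU) and
# "发动机 怠速 不稳" as a fault expression (FAU); "导致" is not an entity.
tokens = ["火花塞", "积碳", "导致", "发动机", "怠速", "不稳"]
print(to_bieso(len(tokens), [(0, 2, "CAU"), (3, 6, "FAU")]))
# ['B-CAU', 'E-CAU', 'O', 'B-FAU', 'I-FAU', 'E-FAU']
```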

The distributed representation of the text data means that each preprocessed word and its corresponding part of speech are taken as a whole, and a BERT language model is used to obtain the feature vector of the word and its part of speech.
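The paragraph above treats each word and its part of speech as a single unit, but does not say how that unit is presented to BERT. One plausible reading, assumed in the sketch below, is to fuse the pair into one token string such as 启动/v and give it its own vocabulary id, so that the same word used with different parts of speech is represented differently. The separator, the `[UNK]` id and all function names are hypothetical, and the BERT encoder itself is not shown; with an off-the-shelf pre-trained BERT, such units would have to be added to its vocabulary or handled in some other way.

```python
from typing import Dict, List, Sequence, Tuple

Token = Tuple[str, str]  # (word, part-of-speech tag)


def joint_units(sentence: Sequence[Token], sep: str = "/") -> List[str]:
    """Fuse each word with its POS tag into one unit, e.g. ('启动', 'v') -> '启动/v'."""
    return [f"{word}{sep}{pos}" for word, pos in sentence]


def build_vocab(corpus: Sequence[Sequence[Token]]) -> Dict[str, int]:
    """Give every distinct word/POS unit an integer id; id 0 is reserved for [UNK]."""
    vocab = {"[UNK]": 0}
    for sentence in corpus:
        for unit in joint_units(sentence):
            vocab.setdefault(unit, len(vocab))
    return vocab


def encode(sentence: Sequence[Token], vocab: Dict[str, int]) -> List[int]:
    """Map a sentence to unit ids, falling back to [UNK] for unseen units."""
    return [vocab.get(unit, vocab["[UNK]"]) for unit in joint_units(sentence)]


# Hypothetical corpus: 启动 used as a verb ("start") and as a verbal noun ("startup")
corpus = [[("发动机", "n"), ("启动", "v")], [("启动", "vn"), ("困难", "a")]]
vocab = build_vocab(corpus)
print(vocab)  # {'[UNK]': 0, '发动机/n': 1, '启动/v': 2, '启动/vn': 3, '困难/a': 4}
```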

The BiLSTM is a bidirectional LSTM network consisting of a forward LSTM and a backward LSTM.
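A BiLSTM runs one LSTM over the sequence from left to right and another from right to left, then joins the two hidden states for each token. The pure-Python sketch below makes that structure explicit with a textbook LSTM cell. The weights are random and untrained, the hidden size and seeds are arbitrary, and the short input vectors stand in for the BERT "word + part-of-speech" vectors; a real implementation would use a deep learning framework and learn the weights during training.

```python
import math
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def _sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def _matvec(m: Matrix, v: Vector) -> Vector:
    return [sum(w * x for w, x in zip(row, v)) for row in m]


class LSTMCell:
    """Single-layer LSTM with random, untrained weights (illustration only)."""

    def __init__(self, input_size: int, hidden_size: int, seed: int) -> None:
        rng = random.Random(seed)
        self.hidden_size = hidden_size
        # One weight matrix per gate (input, forget, candidate, output) acting on [x; h]
        self.w = [
            [[rng.uniform(-0.1, 0.1) for _ in range(input_size + hidden_size)]
             for _ in range(hidden_size)]
            for _ in range(4)
        ]
        self.b = [[0.0] * hidden_size for _ in range(4)]

    def run(self, inputs: List[Vector]) -> List[Vector]:
        """Process the sequence in the given order and return every hidden state."""
        h = [0.0] * self.hidden_size
        c = [0.0] * self.hidden_size
        outputs = []
        for x in inputs:
            xh = x + h  # concatenate input and previous hidden state
            pre = [[a + b for a, b in zip(_matvec(w, xh), bias)]
                   for w, bias in zip(self.w, self.b)]
            i = [_sigmoid(v) for v in pre[0]]   # input gate
            f = [_sigmoid(v) for v in pre[1]]   # forget gate
            g = [math.tanh(v) for v in pre[2]]  # candidate cell state
            o = [_sigmoid(v) for v in pre[3]]   # output gate
            c = [fi * ci + ii * gi for fi, ci, ii, gi in zip(f, c, i, g)]
            h = [oi * math.tanh(ci) for oi, ci in zip(o, c)]
            outputs.append(h)
        return outputs


def bilstm(inputs: List[Vector], hidden_size: int = 4) -> List[Vector]:
    """Concatenate forward and backward LSTM states for each token."""
    input_size = len(inputs[0])
    forward = LSTMCell(input_size, hidden_size, seed=1).run(inputs)
    # Run over the reversed sequence, then reverse back to align with tokens
    backward = LSTMCell(input_size, hidden_size, seed=2).run(inputs[::-1])[::-1]
    return [fw + bw for fw, bw in zip(forward, backward)]


token_vectors = [[0.1, 0.2, 0.3], [0.0, -0.1, 0.4], [0.5, 0.5, -0.2]]
states = bilstm(token_vectors, hidden_size=4)
print(len(states), len(states[0]))  # 3 8
```

Each token's output therefore carries context from both its left and its right, which is why the BiLSTM is placed before the attention and CRF layers.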

The beneficial effects of the invention are as follows:

1. The invention constructs a data set for named entity recognition in automobile engine fault diagnosis and obtains "word + part-of-speech" vectors through a BERT language model, which takes into account the pronounced syntactic features of fault diagnosis knowledge entities and addresses the loss of recognition accuracy caused by inconsistent labels for the same entity across the full text.

2. The invention introduces an attention mechanism after the BiLSTM neural network to capture the more important parts of the text features, which effectively reduces the interference of large numbers of modifiers inside fault diagnosis knowledge entities.

3. The invention can effectively extract engine parts, fault expressions, fault causes and fault handling methods from automobile engine fault diagnosis text data, providing a basis for further data analysis and use.
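Benefit 2 rests on the attention layer of step 5, whose exact form is not specified in this text. As an assumption, the sketch below uses scaled dot-product self-attention over the BiLSTM output states: each token's state is scored against every token's state, the scores are normalised with softmax, and the weighted sum yields a context vector in which tokens with higher weights (ideally those forming knowledge entities) contribute more. Learned query, key and value projections are omitted, so this illustrates the mechanism rather than a trained layer.

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(states: List[Vector]) -> Tuple[List[Vector], List[Vector]]:
    """Scaled dot-product self-attention over per-token states.

    Queries, keys and values are the states themselves (no learned projections).
    Returns (context vector per token, attention weight row per token).
    """
    d = len(states[0])
    contexts, weights = [], []
    for q in states:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in states]
        alpha = softmax(scores)
        weights.append(alpha)
        contexts.append([sum(a * v[j] for a, v in zip(alpha, states)) for j in range(d)])
    return contexts, weights


# Toy states: tokens 0 and 2 are similar, so they attend to each other more
contexts, weights = self_attention([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]])
print([round(w, 3) for w in weights[0]])  # [0.401, 0.198, 0.401]
```

In a full model, the context vectors would typically be concatenated with the BiLSTM states and projected to per-tag emission scores for the CRF layer.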

Drawings

FIG. 1 is a flow chart of the invention;

FIG. 2 is a block diagram of the model of the invention;

FIG. 3 is a diagram of the BiLSTM model architecture.

Detailed Description
