Chemical process fault identification method and system

Document No.: 1686604. Published: 2020-01-03.

Abstract: "Chemical process fault identification method and system" (一种化工过程故障识别方法及系统) was designed and created by 田文德, 贾旭清, 刘子健 and 张士发 on 2019-09-06. The disclosure proposes a chemical process fault identification method and system for chemical fault identification settings where labels are expensive. A dynamic active safety semi-supervised support vector machine model (PCA-DAS4VM model for short) identifies the operating state of the chemical process. Combining principal component analysis with the dynamic active safety semi-supervised support vector machine reduces the dependence of traditional supervised learning on large amounts of labeled data and improves the identification accuracy of semi-supervised learning. Principal component analysis eliminates noise and redundant data from the chemical process; abnormal-condition faults are identified by combining historical and future information; unlabeled data with high entropy are selected and labeled; and the unlabeled data are fully used to improve the performance of the identification model. The result is efficient and complete chemical process fault identification with higher identification accuracy and faster identification, which benefits the development of chemical safety.

1. A chemical process fault identification method is characterized by comprising the following steps:

acquiring operation data in a chemical production process in real time;

preprocessing the acquired operation data;

selecting key characteristic data in the operating data by adopting a principal component analysis method;

and establishing a dynamic active safety semi-supervised support vector machine model based on a semi-supervised learning method, inputting key characteristic data into the trained dynamic active safety semi-supervised support vector machine model, and outputting the running state of the chemical process.

2. A chemical process fault identification method as claimed in claim 1, wherein: the key characteristic data comprises labeled data and unlabeled data, and the processing of the key characteristic data by the dynamic active safety semi-supervised support vector machine model comprises the step of adding labels to the unlabeled data by adopting an active learning method.

3. A chemical process fault identification method as claimed in claim 1, wherein: the method for selecting the key characteristic data in the operating data by adopting the principal component analysis method comprises the following steps:

calculating the feature covariance matrix of the preprocessed data matrix, together with the eigenvalues and eigenvectors of the covariance matrix; sorting the components by variance contribution rate from largest to smallest, and taking as principal component variables those whose cumulative variance contribution rate exceeds a set proportion threshold;

establishing principal component linear expressions according to the principal component variables, and calculating the coefficients of the principal component variables in each principal component linear expression according to the eigenvalues;

obtaining a comprehensive scoring model according to the coefficient of the principal component variable in the principal component linear expression, and calculating the variable coefficient in the comprehensive scoring model through the variance of the principal component variable;

normalizing the variable coefficient in the obtained comprehensive score model, and re-determining the variable weight;

and sorting the re-determined variable weights according to the weight values, wherein the operation data corresponding to the variables with the weight sum higher than the set threshold value is the key characteristic data.

4. A chemical process fault identification method as claimed in claim 3, wherein:

calculating coefficients of principal component variables in each principal component linear expression according to the characteristic values, wherein the calculation formula is as follows:

Figure FDA0002194626040000021

wherein coe is the coefficient of variable q in the d-th principal component linear expression; v is the entry of variable q in the d-th principal component; e is the eigenvalue (characteristic root) of the d-th principal component.

Or

Calculating the variable coefficient in the comprehensive score model according to the coefficient of the principal component variable in the principal component linear expression, wherein the calculation formula is as follows:

Figure FDA0002194626040000022

wherein w is the coefficient of variable q in the comprehensive scoring model; o is the number of principal components; s is the variance of the d-th principal component.

5. A chemical process fault identification method as claimed in claim 1, wherein: the training process of the dynamic active safety semi-supervised support vector machine model comprises the following steps:

acquiring historical data of a chemical production process, wherein the historical data comprises fault data and non-fault data;

preprocessing the acquired historical data;

selecting key characteristic data in the historical data by adopting a principal component analysis method, wherein the key characteristic data comprises labeled data and unlabeled data;

adding labels to the unlabeled data by adopting an active learning method, inputting the newly labeled data together with the originally labeled data into the dynamic active safety semi-supervised support vector machine model for training, with the fault type or normal operation as output, and thereby obtaining the parameters of the dynamic active safety semi-supervised support vector machine model.

6. A chemical process fault identification method as claimed in claim 2 or 5, wherein: the method for adding the label to the label-free data by adopting the active learning method comprises the following steps:

optimizing the pseudo-label confidence of the identification model by combining historical information and future information of the chemical process data;

and calculating the entropy value of the key characteristic data according to the pseudo-label confidence, selecting the key characteristic data with high entropy values by active learning, and adding data labels to the selected key characteristic data based on an ontology.

7. A chemical process fault identification method as claimed in claim 6, wherein: the method for optimizing the confidence coefficient of the pseudo tag of the identification model by combining the historical information and the future information of the chemical process data specifically comprises the following steps:

classifying the historical data according to faults to obtain k classes corresponding to the k faults;

and calculating the confidence that each data sample belongs to each of the k classes, and calculating the pseudo-label confidence of each key characteristic data sample by averaging the calculated confidences.

8. A chemical process fault identification system is characterized by comprising:

a data acquisition module, configured to acquire operation data in the chemical production process in real time;

a preprocessing module, configured to preprocess the acquired operation data;

a key characteristic data extraction module, configured to select key characteristic data in the operation data by adopting a principal component analysis method;

an identification module, configured to input the key characteristic data into the trained dynamic active safety semi-supervised support vector machine model and output the operating state of the chemical process.

9. An electronic device comprising a memory and a processor and computer instructions stored on the memory and executable on the processor, the computer instructions when executed by the processor performing the steps of the method of any of claims 1 to 7.

10. A computer-readable storage medium storing computer instructions which, when executed by a processor, perform the steps of the method of any one of claims 1 to 7.

Technical Field

The disclosure relates to the technical field related to chemical process fault identification, in particular to a chemical process fault identification method and system.

Background

The statements in this section merely provide background information related to the present disclosure and may not necessarily constitute prior art.

Statistical analysis of accidents in chemical enterprises shows that any major accident is invariably preceded by a number of minor anomalies. Research on fault identification for chemical processes, which detects potential abnormal conditions in time, is therefore of both theoretical and practical importance for keeping chemical plants operating safely and stably.

The inventors find that existing process fault identification methods fall mainly into three groups: qualitative models, quantitative models, and data-driven methods. Among the data-driven methods, supervised learning gives good results for chemical process fault identification, with identification accuracy above 92%. However, the amount of labeled data available in an actual chemical process often falls short of what supervised learning requires. Labels are usually added to unlabeled data manually, based on experience, and labeling the large volumes of easily collected unlabeled chemical data is expensive.

Semi-supervised learning has already been applied in a number of fields, such as digit recognition, sentiment classification, and medical image classification. Traditional supervised learning demands a large amount of labeled data, yet in some studies existing semi-supervised learning methods perform worse than supervised learning given the same amount of labeled data. As a result, the application of semi-supervised learning to chemical process fault identification has received little study.

Disclosure of Invention

To solve the above problems, the present disclosure provides a chemical process fault identification method and system for chemical fault identification settings where labels are expensive. The method combines principal component analysis with a dynamic active safety semi-supervised support vector machine, and uses the resulting model (PCA-DAS4VM model for short) to identify the operating state of the chemical process. This reduces the dependence of traditional supervised learning on large amounts of labeled data and improves the identification accuracy of semi-supervised learning.

In order to achieve the purpose, the following technical scheme is adopted in the disclosure:

one or more embodiments provide a chemical process fault identification method, including the steps of:

acquiring operation data in a chemical production process in real time;

preprocessing the acquired operation data;

selecting key characteristic data in the operating data by adopting a principal component analysis method;

and establishing a dynamic active safety semi-supervised support vector machine model based on a semi-supervised learning method, inputting key characteristic data into the trained dynamic active safety semi-supervised support vector machine model, and outputting the running state of the chemical process.

Further, the key characteristic data comprises labeled data and unlabeled data, and the processing of the key characteristic data by the dynamic active safety semi-supervised support vector machine model comprises the step of adding labels to the unlabeled data by adopting an active learning method.

Further, the method for selecting key feature data in the operating data by adopting the principal component analysis method comprises the following steps:

calculating the feature covariance matrix of the preprocessed data matrix, together with the eigenvalues and eigenvectors of the covariance matrix; sorting the components by variance contribution rate from largest to smallest, and taking as principal component variables those whose cumulative variance contribution rate exceeds a set proportion threshold;

establishing principal component linear expressions according to the principal component variables, and calculating the coefficients of the principal component variables in each principal component linear expression according to the eigenvalues;

obtaining a comprehensive scoring model according to the coefficient of the principal component variable in the principal component linear expression, and calculating the variable coefficient in the comprehensive scoring model through the variance of the principal component variable;

normalizing the variable coefficient in the obtained comprehensive score model, and re-determining the variable weight;

and sorting the re-determined variable weights according to the weight values, wherein the operation data corresponding to the variables with the weight sum higher than the set threshold value is the key characteristic data.
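
The selection steps above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the patented implementation: the function name `select_key_variables`, both threshold defaults, the coefficient rule (component entry divided by the square root of the eigenvalue), and the use of absolute values before normalization are all assumptions, since the formula images are not reproduced in this text.

```python
import numpy as np

def select_key_variables(X, pc_threshold=0.85, weight_threshold=0.8):
    """Rank process variables by a PCA-based comprehensive score.

    X is an (n_samples, n_variables) array of preprocessed data.
    Returns the indices of the selected variables and all normalized weights.
    """
    # Covariance matrix of the variables and its eigen-decomposition.
    cov = np.cov(X, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)          # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]               # re-sort from largest to smallest
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]

    # Variance contribution rate of each component; keep the smallest number of
    # components whose cumulative contribution reaches the threshold.
    contrib = eigvals / eigvals.sum()
    o = int(np.searchsorted(np.cumsum(contrib), pc_threshold)) + 1
    o = min(o, len(eigvals))

    # Assumed coefficient rule: coe = v / sqrt(e), with v the component loading.
    loadings = eigvecs[:, :o] * np.sqrt(eigvals[:o])
    coe = loadings / np.sqrt(eigvals[:o])

    # Comprehensive score: coefficients weighted by component variance share.
    s = contrib[:o]
    w = np.abs(coe @ s) / s.sum()                   # absolute value is an assumption

    # Normalize so the weights sum to one, then keep the top-ranked variables
    # until their cumulative weight reaches the second threshold.
    weights = w / w.sum()
    ranked = np.argsort(weights)[::-1]
    n_keep = int(np.searchsorted(np.cumsum(weights[ranked]), weight_threshold)) + 1
    n_keep = min(n_keep, len(ranked))
    return ranked[:n_keep], weights
```

Calling `select_key_variables(X)` on a standardized data matrix returns the column indices to keep, in descending order of weight, along with the full weight vector for inspection.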

Further, calculating coefficients of principal component variables in each principal component linear expression according to the characteristic values, wherein the calculation formula is as follows:

Figure BDA0002194626050000031

wherein coe is the coefficient of variable q in the d-th principal component linear expression; v is the entry of variable q in the d-th principal component; e is the eigenvalue (characteristic root) of the d-th principal component.

Or

Calculating the variable coefficient in the comprehensive score model according to the coefficient of the principal component variable in the principal component linear expression, wherein the calculation formula is as follows:

Figure BDA0002194626050000032

wherein w is the coefficient of variable q in the comprehensive scoring model; o is the number of principal components; s is the variance of the d-th principal component.
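
The two formula images referenced above are not reproduced in this text. A form consistent with the symbol definitions might read as follows. It assumes the common convention of dividing each component entry by the square root of its eigenvalue and then averaging the coefficients weighted by component variance. The subscripts d and q and the exact form are assumptions, not taken from the original figures.

```latex
\[
  \mathrm{coe}_{dq} = \frac{v_{dq}}{\sqrt{e_d}},
  \qquad
  w_q = \frac{\sum_{d=1}^{o} \mathrm{coe}_{dq}\, s_d}{\sum_{d=1}^{o} s_d}
\]
```

Under this reading, a variable that enters strongly into high-variance components receives a large coefficient in the comprehensive scoring model.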

Further, the training process of the dynamic active safety semi-supervised support vector machine model comprises the following steps:

acquiring historical data of a chemical production process, wherein the historical data comprises fault data and non-fault data;

preprocessing the acquired historical data;

selecting key characteristic data in the historical data by adopting a principal component analysis method, wherein the key characteristic data comprises labeled data and unlabeled data;

adding labels to the unlabeled data by adopting an active learning method, inputting the newly labeled data together with the originally labeled data into the dynamic active safety semi-supervised support vector machine model for training, with the fault type or normal operation as output, and thereby obtaining the parameters of the dynamic active safety semi-supervised support vector machine model.

Further, the step of adding a label to the non-label data by adopting an active learning method comprises the following steps:

optimizing the pseudo-label confidence of the identification model by combining historical information and future information of the chemical process data;

and calculating the entropy value of the key characteristic data according to the pseudo-label confidence, selecting the key characteristic data with high entropy values by active learning, and adding data labels to the selected key characteristic data based on an ontology.
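
The entropy-based selection step can be illustrated with a short Python sketch. It assumes that each sample carries a vector of pseudo-label confidences summing to one and that Shannon entropy is the entropy measure; the function names and the fixed query budget `n_queries` are hypothetical and do not come from the source.

```python
import math

def entropy(confidences):
    """Shannon entropy of one sample's pseudo-label confidence vector."""
    return -sum(p * math.log(p) for p in confidences if p > 0)

def select_for_labeling(conf_by_sample, n_queries):
    """Return the indices of the n_queries samples with the highest entropy.

    High entropy means the model is unsure which class the sample belongs to,
    so these samples are the ones passed on for labeling.
    """
    ranked = sorted(
        range(len(conf_by_sample)),
        key=lambda i: entropy(conf_by_sample[i]),
        reverse=True,
    )
    return ranked[:n_queries]

# Three samples with confidences over three classes.
confs = [
    [0.98, 0.01, 0.01],   # confident prediction, low entropy
    [0.34, 0.33, 0.33],   # nearly uniform, highest entropy
    [0.60, 0.30, 0.10],   # moderately uncertain
]
queried = select_for_labeling(confs, n_queries=2)
```

In this example the nearly uniform sample is queried first and the moderately uncertain one second, while the confident sample is left with its pseudo-label.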

Further, the method for optimizing the confidence of the pseudo tag of the identification model by combining the historical information and the future information of the chemical process data specifically comprises the following steps:

classifying the historical data according to faults to obtain k classes corresponding to the k faults;

and calculating the confidence that each data sample belongs to each of the k classes, and calculating the pseudo-label confidence of each key characteristic data sample by averaging the calculated confidences.
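
One possible reading of this averaging step is a centered moving average over time-ordered samples, so that each sample's confidence draws on both earlier (historical) and later (future) samples. The sketch below follows that reading. The window form, the name `smooth_confidences`, and the `half_window` parameter are assumptions; the source does not specify how the average is taken.

```python
def smooth_confidences(conf, half_window=1):
    """Average per-class confidences over neighboring time steps.

    conf is a time-ordered list; each entry holds k class confidences.
    Each output entry is the mean over the samples from t - half_window
    to t + half_window, truncated at the ends of the sequence.
    """
    n = len(conf)
    smoothed = []
    for t in range(n):
        lo = max(0, t - half_window)
        hi = min(n, t + half_window + 1)
        window = conf[lo:hi]
        k = len(conf[t])
        smoothed.append(
            [sum(row[c] for row in window) / len(window) for c in range(k)]
        )
    return smoothed

# A single outlying prediction at t = 1 is pulled toward its neighbors.
raw = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
smoothed = smooth_confidences(raw, half_window=1)
```

Because each averaged vector is a mean of vectors that sum to one, the smoothed confidences still sum to one and can be used directly in an entropy calculation.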

A chemical process fault identification system, comprising:

a data acquisition module, configured to acquire operation data in the chemical production process in real time;

a preprocessing module, configured to preprocess the acquired operation data;

a key characteristic data extraction module, configured to select key characteristic data in the operation data by adopting a principal component analysis method;

an identification module, configured to input the key characteristic data into the trained dynamic active safety semi-supervised support vector machine model and output the operating state of the chemical process.
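
The four modules can be pictured as a simple pipeline. The Python sketch below is structural only: the class name, the field names, and the stand-in callables are hypothetical, and the threshold rule in `identify` merely takes the place of the trained model so that the example runs.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

Sample = Sequence[float]

@dataclass
class FaultIdentificationSystem:
    """Chains the four modules: acquire, preprocess, extract, identify."""
    acquire: Callable[[], Sample]
    preprocess: Callable[[Sample], Sample]
    extract_key_features: Callable[[Sample], Sample]
    identify: Callable[[Sample], str]

    def run_once(self) -> str:
        raw = self.acquire()                    # data acquisition module
        clean = self.preprocess(raw)            # preprocessing module
        key = self.extract_key_features(clean)  # key characteristic data extraction
        return self.identify(key)               # identification module

# Stand-in callables for illustration; a real system would plug in sensor
# reads, a PCA-based selector, and the trained classifier here.
system = FaultIdentificationSystem(
    acquire=lambda: [1.0, 250.0, 3.0],
    preprocess=lambda x: [v / 100.0 for v in x],
    extract_key_features=lambda x: x[:2],
    identify=lambda x: "fault" if x[1] > 2.0 else "normal",
)
state = system.run_once()
```

Keeping each module behind a plain callable makes it possible to replace any one stage, for example the feature selector, without touching the others.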

An electronic device comprising a memory and a processor and computer instructions stored on the memory and executed on the processor, the computer instructions, when executed by the processor, performing the steps of the above method.

A computer readable storage medium storing computer instructions which, when executed by a processor, perform the steps of the above method.

Compared with the prior art, the beneficial effect of this disclosure is:

The method combines principal component analysis with the dynamic active safety semi-supervised support vector machine, which reduces the dependence of traditional supervised learning on large amounts of labeled data and improves the identification accuracy of semi-supervised learning. The method eliminates noise and redundant data in the chemical process, combines historical information and future information to identify abnormal-condition faults, selects and labels unlabeled data with high entropy, establishes an ontology-based graphical scenario object model, and determines the labels of the unlabeled data according to that model and expert knowledge. By making full use of unlabeled data to improve the performance of the identification model, the method carries out chemical process fault identification efficiently and completely, with higher identification accuracy and faster identification, which benefits the development of chemical safety.

Drawings

The accompanying drawings, which are included to provide a further understanding of the disclosure, illustrate embodiments of the disclosure and together with the description serve to explain the disclosure and not to limit the disclosure.

FIG. 1 is a flow diagram of a chemical process fault identification method in accordance with one or more embodiments;

FIG. 2 shows the principal component variance percentages for TE process fault 4 during training in embodiment 1 of the present disclosure;

FIG. 3 shows the principal component eigenvalues for TE process fault 4 during training in embodiment 1 of the present disclosure;

FIG. 4 shows the key measured variable weights for TE process fault 4 during training in embodiment 1 of the present disclosure;

FIG. 5 shows the key measured variables determined for the 20 TE faults during training in embodiment 1 of the present disclosure;

FIG. 6 is a comparison of PCA-DAS4VM accuracy at different unlabeled data volumes;

FIG. 7 is a TE process graphical scenario object model based on ontology;

FIG. 8 is a graphical scenario object model of TE process fault 4;

FIG. 9 is a comparison of F1 scores for the PCA-S4VM, DAS4VM, and PCA-DAS4VM models;

FIG. 10 is a comparison of the FPRs of the PCA-S4VM, DAS4VM, and PCA-DAS4VM models;

FIG. 11 is a FDR comparison of PCA-S4VM, DAS4VM, and PCA-DAS4VM models;

FIG. 12 is a G-mean comparison of PCA-S4VM, DAS4VM, and PCA-DAS4VM models;

FIG. 13 is a comparison of the accuracy of the DSSAE, ALSemiFDA and PCA-DAS4VM models.

Detailed Description

The present disclosure is further described below with reference to the drawings and embodiments.

It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the disclosure. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure belongs.

It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to limit the example embodiments according to the present disclosure. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should further be understood that the terms "comprises" and/or "comprising", when used in this specification, specify the presence of the stated features, steps, operations, devices, components, and/or combinations thereof. It should also be noted that, where no conflict arises, the embodiments of the present disclosure and the features of those embodiments may be combined with each other. The embodiments are described in detail below with reference to the accompanying drawings.
