Cross-domain subspace learning electronic nose drift compensation method based on manifold learning

Document No.: 340160. Published: 2021-12-03.

Reading note: This technology, a cross-domain subspace learning electronic nose drift compensation method based on manifold learning (一种基于流形学习的跨域子空间学习电子鼻漂移补偿方法), was created by Yan Jia (闫嘉), Tian Yutong (田雨桐), Wang Lidan (王丽丹) and Duan Shukai (段书凯) on 2021-08-12. The invention relates to a cross-domain subspace learning electronic nose drift compensation method based on manifold learning, characterized in that a gas digital signal is sent to a processor, the processor analyzes the gas samples using the LME-CDSL mathematical model to obtain gas sample data, and the processor sends the gas sample data to a classifier for gas classification to obtain a gas classification result. The beneficial effects are as follows. The method preserves information better: it not only suppresses data drift but also maintains the manifold and label information of the source domain and the knowledge of both domains. Locally linear manifold learning produces a compact representation of the high-dimensional data, retains the local characteristics of non-drifted samples, and improves the discriminative ability of the model. The robustness and recognition performance of the LME-CDSL model are ensured. The feature subspace is learned by eigenvalue decomposition, so the computational cost is very low and the method is easy to apply in practice.

1. A cross-domain subspace learning electronic nose drift compensation method based on manifold learning, comprising the following steps:

step 1: after the electronic nose sensor system detects the gas to be measured in the environment, the sensor system generates current or voltage signals, which are uniformly converted into resistance response curves; each signal sample is filtered and amplified by a signal conditioning circuit and then converted by A/D conversion into a valid gas digital signal;

step 2: the gas digital signals are sent to a processor, which extracts features from the response curves to obtain drift-free source-domain gas sample data and drifted target-domain gas sample data;

step 3: the processor trains an LME-CDSL model using the source-domain and target-domain gas sample data, applies the trained LME-CDSL model to perform drift compensation on the target-domain gas sample data, and sends the compensated data to a classifier for gas classification to obtain a gas classification result;

step 4: the processor outputs the gas classification result through a human-computer interaction mechanism.

2. The method as claimed in claim 1, wherein the primary expression of the LME-CDSL mathematical model in step 3 is as follows:

subject to

where $X_S$ is the source-domain data set, $N_S$ is the number of source-domain samples, $X_T$ is the target-domain data, $N_T$ is the number of target-domain samples, $P$ is the projection transformation matrix, $\mathrm{Tr}(\cdot)$ and $(\cdot)^{T}$ are the trace and transpose operations respectively, and $E_{d \times d}$ is the d-dimensional identity matrix; matrices are denoted by capital letters and vectors by lowercase letters;

applying the Lagrange multiplier method to the LME-CDSL mathematical model in step 3 gives the following optimized expression:

setting the derivative of this expression with respect to $P$ to zero gives:

$$B^{-1} A P = \rho_2 P \tag{3}$$

Thus $P$, the projection transformation matrix, is the set of eigenvectors of $B^{-1}A$ and can be obtained by eigenvalue decomposition of $B^{-1}A$. Because the objective seeks the solution corresponding to the maximum, the optimal projection $P^*$ consists of the first d eigenvectors of $P$, $[p_1, p_2, \ldots, p_d]$, arranged in descending order of their eigenvalues. $P^*$ is expressed as:

$$P^* = [p_1, p_2, \ldots, p_d] \tag{4}$$

where $\lambda$ is a regularization parameter; the two scatter matrices are the inter-class scatter matrix and the intra-class scatter matrix of the source-domain data, respectively; $x$ is a single source-domain sample; $C$ is the number of classes; $\mu_c$ is the data center of class $c$ in the original source domain; $\mu_S$ is the data center of the original source domain; $\mu_T$ is the data center of the original target domain; $E$ is the identity matrix; $W$ is the local weight matrix; and $\rho_2$ is a Lagrange multiplier;

drift compensation is performed on the target-domain gas sample data $X_T$ according to the expression:

$$Y_T = (P^*)^{T} X_T$$

where $Y_T$ is the compensated target-domain gas sample data.

3. The cross-domain subspace learning electronic nose drift compensation method based on manifold learning according to claim 1, wherein the drift compensation in step 3 uses a static analysis method:

obtaining a first batch of gas sample data as drift-free source-domain gas sample data;

training the original LME-CDSL mathematical model with the drift-free source-domain gas sample data and optimizing its parameters to obtain a first-parameter LME-CDSL mathematical model;

and using the first-parameter LME-CDSL mathematical model to compensate the gas sample data of all subsequent time periods, which serve as the drifted target-domain gas sample data.

4. The cross-domain subspace learning electronic nose drift compensation method based on manifold learning according to claim 1, wherein the drift compensation in step 3 uses a dynamic analysis method:

obtaining a first batch of gas sample data as drift-free source-domain gas sample data;

using the first-parameter LME-CDSL mathematical model to compensate the sample gas of the second time period, which serves as drifted target-domain gas sample data, to obtain the second batch of compensated target-domain gas sample data;

training the original LME-CDSL mathematical model with the sample gas of the second time period and optimizing its parameters to obtain a second-parameter LME-CDSL mathematical model;

using the second-parameter LME-CDSL mathematical model to compensate the sample gas of the third time period, which serves as drifted target-domain gas sample data, to obtain the third batch of compensated target-domain gas sample data;

and so on, until the last time period;

using the LME-CDSL mathematical model with the parameters of the second-to-last time period to compensate the sample gas of the last time period, obtaining the last batch of compensated target-domain gas sample data.

5. The cross-domain subspace learning electronic nose drift compensation method based on manifold learning according to claim 1, wherein the classifier expression in step 3 is:

$$f(Y_T) = \mathrm{softmax}(H\beta + b) \tag{5}$$

where $(H\beta + b)_i$ is the i-th row vector of the matrix $(H\beta + b)$, $N$ is the number of samples in the sample matrix, the hidden layer uses an activation function, the hidden-layer neuron bias is $b = [b_1, b_2, \ldots, b_{hid}]$, $hid$ is the number of hidden neurons, and the weight matrix $\beta$ is expressed as follows:

where $E_D$ is the D-dimensional identity matrix and $T$ is the label vector;

in the obtained gas classification result $f(Y_T)$, the classification result for a given sample is expressed as a c-dimensional row vector whose entries are the probabilities that the sample belongs to each gas class: the first element is the probability of the first gas class, the second element is the probability of the second gas class, and so on. The elements of the vector sum to 1, and the sample is assigned to the class with the largest probability.

Technical Field

The invention belongs to the field of electronic nose signal processing, and particularly relates to a cross-domain subspace learning electronic nose drift compensation method based on manifold learning.

Background

The electronic nose is an intelligent device that simulates the biological olfactory system. It consists of an array of cross-sensitive chemical sensors and a pattern recognition program, and can distinguish simple or complex odors. In recent decades, electronic nose systems have become increasingly popular and are used in many fields such as medicine, biology, the food industry and indoor environmental monitoring. Ideally, an electronic nose system responds identically under the same gas environment, and when no gas to be measured is present its response stays at the baseline. In practice, however, current electronic nose technology has many problems, one of which is gas sensor drift.

Gas sensor drift is caused by sensor aging, poisoning, or fluctuations in humidity and temperature, and shows nonlinear dynamics across the gas sensor array. As a result, the response of a gas sensor to the same kind of gas is not always the same. In the field of artificial olfaction, the drift of electronic nose sensors is regarded as an ill-posed problem, and its common characteristics are difficult to identify.

Drift compensation methods can be divided into three categories: sensor signal preprocessing, periodic correction and adaptive correction.

1. Sensor signal preprocessing, such as baseline processing and frequency-domain filtering. Although sensor signal preprocessing is simple to implement, it treats the drift signal as an ideal linear signal, which limits its range of application.

2. Periodic correction, such as partial least squares. It uses a reference gas obtained from experiments to simulate the actual drift level of the electronic nose system, then estimates the drift signal and suppresses drift through an operation between the estimated drift signal and the actual drift signal. Although this method is easy to implement, it is very time-consuming and labor-intensive.

3. Adaptive correction, such as domain regularized component analysis and cross-domain discriminant subspace learning. Unlike the methods above, it performs drift compensation at the feature level: the model is trained adaptively on knowledge of the signal changes associated with long-term drift and then produces a correspondingly drift-corrected sensor output.

Disclosure of Invention

In order to solve the technical problems, the invention provides a cross-domain subspace learning electronic nose drift compensation method based on manifold learning, and aims to reduce the influences of distribution distortion, data uncertainty enhancement and electronic nose performance deterioration caused by sensor drift.

Based on the purpose, the invention designs a supervised local manifold embedding cross-domain subspace learning (LME-CDSL) model for electronic nose drift compensation.

The technical scheme of the invention is as follows:

A cross-domain subspace learning electronic nose drift compensation method based on manifold learning comprises the following steps:

step 4: the processor outputs the gas classification result through a human-computer interaction mechanism.

The invention reduces the dimensionality of the data samples by enhancing manifold and distribution consistency at the feature level, thereby compensating for data drift and improving gas recognition performance on drifted gas sensor data sets. Following machine learning practice, the source-domain data are treated as non-drifted data and the target-domain data as drifted data.

The primary expression of the LME-CDSL mathematical model in step 3 is as follows:

where $X_S$ is the source-domain data set, $N_S$ is the number of source-domain samples, $X_T$ is the target-domain data, $N_T$ is the number of target-domain samples, $P$ is the projection transformation matrix, and $\mathrm{Tr}(\cdot)$ and $(\cdot)^{T}$ are the trace and transpose operations respectively; matrices are denoted by capital letters and vectors by lowercase letters;

applying the Lagrange multiplier method to the LME-CDSL mathematical model in step 3 gives the following optimized expression:

setting the derivative of this expression with respect to $P$ to zero gives:

$$B^{-1} A P = \rho_2 P \tag{3}$$

Thus $P$, the projection transformation matrix, is the set of eigenvectors of $B^{-1}A$ and can be obtained by eigenvalue decomposition of $B^{-1}A$. Because the objective seeks the solution corresponding to the maximum, the optimal projection $P^*$ consists of the first d eigenvectors of $P$, $[p_1, p_2, \ldots, p_d]$, arranged in descending order of their eigenvalues. $P^*$ is expressed as:

$$P^* = [p_1, p_2, \ldots, p_d] \tag{4}$$

where $\lambda$ is a regularization parameter; the two scatter matrices are the inter-class scatter matrix and the intra-class scatter matrix of the source-domain data, respectively; $x$ is a single source-domain sample; $C$ is the number of classes; $\mu_c$ is the data center of class $c$ in the original source domain; $\mu_S$ is the data center of the original source domain; $\mu_T$ is the data center of the original target domain; $E$ is the identity matrix; $W$ is the local weight matrix; and $\rho_2$ is a Lagrange multiplier.

Drift compensation is performed on the target-domain gas sample data $X_T$ according to the expression:

$$Y_T = (P^*)^{T} X_T$$

where $Y_T$ is the compensated target-domain gas sample data.
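The eigenvalue decomposition and projection described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the patented implementation: the matrices `A` and `B` here are random symmetric stand-ins (the formulas that define them are not reproduced in this text), and the function name `top_d_projection` is hypothetical.

```python
import numpy as np

def top_d_projection(A, B, d):
    """Return the d eigenvectors of B^{-1} A with the largest eigenvalues,
    together with those eigenvalues (descending order)."""
    M = np.linalg.solve(B, A)              # equals B^{-1} A without forming the inverse
    eigvals, eigvecs = np.linalg.eig(M)
    # B^{-1} A is generally not symmetric, so eig may return complex arrays
    # with negligible imaginary parts; keep the real parts.
    eigvals, eigvecs = eigvals.real, eigvecs.real
    order = np.argsort(eigvals)[::-1]      # indices sorted by descending eigenvalue
    keep = order[:d]
    return eigvecs[:, keep], eigvals[keep]

rng = np.random.default_rng(0)
D, d = 6, 2                                # original and reduced dimension (toy sizes)
G = rng.standard_normal((D, D))
A = G @ G.T                                # stand-in for A (assumption)
H = rng.standard_normal((D, D))
B = H @ H.T + D * np.eye(D)                # stand-in for B, positive definite (assumption)

P_star, top_vals = top_d_projection(A, B, d)
X_T = rng.standard_normal((D, 10))         # target-domain samples stored as columns
Y_T = P_star.T @ X_T                       # compensated data, shape (d, 10)
```

Solving the linear system instead of inverting `B` is a numerical convenience chosen here; it yields the same matrix $B^{-1}A$ that the text decomposes.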

The drift compensation in step 3 may use a static analysis method:

obtaining a first batch of gas sample data as drift-free source-domain gas sample data;

and using the first-parameter LME-CDSL mathematical model to compensate the gas sample data of all subsequent time periods, which serve as the drifted target-domain gas sample data.

The drift compensation in step 3 may use a dynamic analysis method:

obtaining a first batch of gas sample data as drift-free source-domain gas sample data;

training the original LME-CDSL mathematical model with the sample gas of the second time period as drift-free source-domain gas sample data and optimizing its parameters to obtain a second-parameter LME-CDSL mathematical model;

and so on, until the last time period;

using the LME-CDSL mathematical model with the parameters of the second-to-last time period to compensate the sample gas of the last time period, obtaining the last batch of compensated target-domain gas sample data.

The LME-CDSL model considers the consistency of the statistical distribution and of the geometric distribution at the same time, so that combining domain adaptation with manifold learning improves the compensation of sensor drift. It readily produces a compact representation of the original data and thus preserves much of the intrinsic information. To the inventors' knowledge, this approach has not previously been applied to sensor drift compensation.

The LME-CDSL model is designed first of all to perform feature extraction and dimensionality reduction and thereby drift compensation; the final task is to classify the data in the learned subspace with the highest possible accuracy. This patent uses an Extreme Learning Machine (ELM) as the classifier. The classifier is learned on the projected source-domain data $Y_S$ and tested on the projected target-domain data $Y_T$. The ELM classifier and its learning steps are briefly described below.
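The manifold-learning part relies on local reconstruction weights: each source sample is expressed through its nearest neighbors. A minimal sketch of such weights, in the style of locally linear embedding, is given below. It assumes samples are stored as columns, and the function name `lle_weights` and the small regularization term `reg` are assumptions added here for numerical stability rather than details taken from the patent.

```python
import numpy as np

def lle_weights(X, k, reg=1e-3):
    """X: (D, N) array with samples as columns.
    Returns an (N, N) matrix W whose i-th row holds the weights that
    reconstruct sample i from its k nearest neighbors."""
    D, N = X.shape
    W = np.zeros((N, N))
    for i in range(N):
        dist = np.sum((X - X[:, [i]]) ** 2, axis=0)   # squared distances to sample i
        dist[i] = np.inf                              # a sample is not its own neighbor
        nbrs = np.argsort(dist)[:k]                   # indices of the k nearest neighbors
        Z = X[:, nbrs] - X[:, [i]]                    # neighbors centered on sample i
        S = Z.T @ Z                                   # local covariance matrix (k x k)
        S += reg * (np.trace(S) + 1e-12) * np.eye(k)  # regularization (assumption)
        w = np.linalg.solve(S, np.ones(k))
        W[i, nbrs] = w / w.sum()                      # weights of each sample sum to one
    return W

rng = np.random.default_rng(1)
X = rng.standard_normal((5, 20))    # 20 toy samples with 5 features
W = lle_weights(X, k=4)
```

Each row of `W` has at most `k` non-zero entries and sums to one, which is the property the local weight matrix needs in order to encode neighborhood geometry.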

The classifier expression in step 3 is:

$$f(Y_T) = \mathrm{softmax}(H\beta + b) \tag{5}$$

where $(H\beta + b)_i$ is the i-th row vector of the matrix $(H\beta + b)$, $N$ is the number of samples in the sample matrix, the hidden layer uses an activation function, the hidden-layer neuron bias is $b = [b_1, b_2, \ldots, b_{hid}]$, $hid$ is the number of hidden neurons, and the weight matrix $\beta$ is expressed as follows:

where $E_D$ is the D-dimensional identity matrix and $T$ is the label vector.

In the obtained gas classification result $f(Y_T)$, the classification result for a given sample is expressed as a c-dimensional row vector whose entries are the probabilities that the sample belongs to each gas class: the first element is the probability of the first gas class, the second element is the probability of the second gas class, and so on. The elements of the vector sum to 1, and the sample is assigned to the class with the largest probability.
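As a small illustration of how a row of scores becomes a probability vector and a class decision, the sketch below applies a row-wise softmax and then picks the most probable class. The score values are invented for the example, and `softmax_rows` is a hypothetical helper name.

```python
import numpy as np

def softmax_rows(Z):
    """Row-wise softmax: each row of the result is non-negative and sums to 1."""
    Z = Z - Z.max(axis=1, keepdims=True)   # subtract the row maximum for stability
    expZ = np.exp(Z)
    return expZ / expZ.sum(axis=1, keepdims=True)

# Two samples, three gas classes (made-up scores)
scores = np.array([[2.0, 1.0, 0.1],
                   [0.5, 0.5, 3.0]])
probs = softmax_rows(scores)
predicted = probs.argmax(axis=1)           # index of the most probable class per sample
```

Here the first sample is assigned to class index 0 and the second to class index 2, since those columns hold the largest scores in their rows.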

Beneficial effects: the invention reduces the dimensionality of the data; it not only suppresses data drift but also maintains the manifold and label information of the source domain and the knowledge of both domains. Locally linear manifold learning produces a compact representation of the high-dimensional data and transfers the geometric knowledge of the source domain to the target domain, so the local characteristics of non-drifted samples are retained and the discriminative ability of the model is improved. The domain adaptation part uses the maximum mean discrepancy (MMD) and maximum variance to make the sample distributions of the two domains more similar while keeping their intrinsic properties, which ensures the robustness and recognition performance of the LME-CDSL model. The feature subspace is learned by eigenvalue decomposition, so the computational cost is very low and the method is easy to apply in practice.
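To make the MMD idea concrete, the sketch below measures the squared distance between the projected centers of the two domains, which is the linear-kernel form of the maximum mean discrepancy. Whether the patent uses exactly this form is an assumption; the function name `projected_mean_gap` and the toy data are hypothetical.

```python
import numpy as np

def projected_mean_gap(X_S, X_T, P):
    """Squared distance between source and target centers after projecting with P.
    X_S, X_T: (D, N) arrays with samples as columns; P: (D, d) projection."""
    mu_S = X_S.mean(axis=1, keepdims=True)
    mu_T = X_T.mean(axis=1, keepdims=True)
    diff = P.T @ (mu_S - mu_T)
    return float(np.sum(diff ** 2))

rng = np.random.default_rng(0)
X_S = rng.standard_normal((3, 50))
# Simulated drift: the target domain is shifted by 2 along the first feature only
X_T = X_S + np.array([[2.0], [0.0], [0.0]])

P_drift = np.eye(3)[:, :1]   # keeps only the drifting direction
P_clean = np.eye(3)[:, 1:]   # keeps only the two non-drifting directions
gap_drift = projected_mean_gap(X_S, X_T, P_drift)
gap_clean = projected_mean_gap(X_S, X_T, P_clean)
```

In this toy case a projection that avoids the drifting direction brings the two domain centers together, while one aligned with the drift leaves a gap of about 4. A subspace method trades this gap off against keeping variance and class structure.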

Drawings

FIG. 1 is a flow chart of the present invention.

Detailed Description

The invention is further described below with reference to the drawing and examples.

As shown in FIG. 1, a cross-domain subspace learning electronic nose drift compensation method based on manifold learning includes:

A gas digital signal is acquired each time, and a time period of variable length is taken as one sampling period: in the early stage, when aging is not yet obvious, the period can be long; in the later stage, when drift is obvious, the period can be short. The data set of each time period consists of one or more groups of gas digital signals. Multiple groups of data are used for training, and detection is performed on single samples;

step 4: the processor outputs the gas classification result through a human-computer interaction mechanism.

The primary expression of the LME-CDSL mathematical model in step 3 is as follows:

where $X_S$ is the source-domain data set, $N_S$ is the number of source-domain samples, $X_T$ is the target-domain data, $N_T$ is the number of target-domain samples, $P$ is the projection transformation matrix, $\mathrm{Tr}(\cdot)$ and $(\cdot)^{T}$ are the trace and transpose operations respectively, and $E_{d \times d}$ is the d-dimensional identity matrix; matrices are denoted by capital letters and vectors by lowercase letters;

applying the Lagrange multiplier method to the LME-CDSL mathematical model in step 3 gives the following optimized expression:

setting the derivative of this expression with respect to $P$ to zero gives:

$$B^{-1} A P = \rho_2 P \tag{3}$$

Thus $P$ is the set of eigenvectors of $B^{-1}A$ and can be obtained by eigenvalue decomposition of $B^{-1}A$. Because the objective seeks the solution corresponding to the maximum, the optimal projection $P^*$ consists of the first d eigenvectors of $P$, $[p_1, p_2, \ldots, p_d]$, arranged in descending order of their eigenvalues. $P^*$ is expressed as:

$$P^* = [p_1, p_2, \ldots, p_d] \tag{4}$$

where $\lambda$ is a regularization parameter; the two scatter matrices are the inter-class scatter matrix and the intra-class scatter matrix of the source-domain data, respectively; $C$ is the number of classes; $\mu_c$ is the data center of class $c$ in the original source domain; $\mu_S$ is the data center of the original source domain; $\mu_T$ is the data center of the original target domain; $E$ is the identity matrix; $W$ is the local weight matrix; and $\rho_2$ is a Lagrange multiplier.

Drift compensation is performed on the target-domain gas sample data $X_T$ according to the expression:

$$Y_T = (P^*)^{T} X_T$$

where $Y_T$ is the compensated target-domain gas sample data.

The drift compensation in step 3 may use a static analysis method:

obtaining a first batch of gas sample data as drift-free source-domain gas sample data;

and using the first-parameter LME-CDSL mathematical model to compensate the gas sample data of all subsequent time periods, which serve as the drifted target-domain gas sample data.

The drift compensation in step 3 may use a dynamic analysis method:

obtaining a first batch of gas sample data as drift-free source-domain gas sample data;

and so on, until the last time period;

using the LME-CDSL mathematical model with the parameters of the second-to-last time period to compensate the sample gas of the last time period, obtaining the last batch of compensated target-domain gas sample data.

The classifier expression in step 3 is:

$$f(Y_T) = \mathrm{softmax}(H\beta + b) \tag{5}$$

where $E_D$ is the D-dimensional identity matrix and $T$ is the label vector.
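For readers unfamiliar with the Extreme Learning Machine, the following is a minimal sketch of a conventional regularized ELM: random input weights, a sigmoid hidden layer, and output weights obtained from a ridge-regression solve. The patent's own formula for $\beta$ is not reproduced in this text, so the closed form used here, the `reg` parameter, the function names and the toy data are all assumptions. The sketch stores samples as rows, uses the bias `b` only in the hidden layer, and applies softmax directly to $H\beta$.

```python
import numpy as np

def _hidden(Y, W_in, b):
    """Sigmoid hidden-layer output H for samples stored as rows of Y."""
    return 1.0 / (1.0 + np.exp(-(Y @ W_in + b)))

def train_elm(Y_S, T, n_hidden=50, reg=1e-2, seed=0):
    """Y_S: (N, d) projected source samples; T: (N, C) one-hot labels."""
    rng = np.random.default_rng(seed)
    W_in = rng.standard_normal((Y_S.shape[1], n_hidden))  # random, never trained
    b = rng.standard_normal(n_hidden)                     # random hidden bias
    H = _hidden(Y_S, W_in, b)
    # Ridge solution for the output weights (standard ELM form, an assumption)
    beta = np.linalg.solve(H.T @ H + reg * np.eye(n_hidden), H.T @ T)
    return W_in, b, beta

def predict_proba(Y, W_in, b, beta):
    scores = _hidden(Y, W_in, b) @ beta
    scores = scores - scores.max(axis=1, keepdims=True)   # stable softmax
    e = np.exp(scores)
    return e / e.sum(axis=1, keepdims=True)

# Toy data: two well-separated classes in a 2-D subspace
rng = np.random.default_rng(1)
Y_S = np.vstack([rng.normal(-3, 0.5, (40, 2)), rng.normal(3, 0.5, (40, 2))])
labels = np.repeat([0, 1], 40)
T = np.eye(2)[labels]                  # one-hot label matrix
Y_T = Y_S + 0.3                        # mildly shifted copy standing in for target data

model = train_elm(Y_S, T)
probs_T = predict_proba(Y_T, *model)
accuracy = np.mean(probs_T.argmax(axis=1) == labels)
```

Only `beta` is learned, and it comes from a single linear solve, which is why ELM training is fast compared with iterative back-propagation.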

The invention is verified under the following two settings:

1) static verification: batch 1 is fixed as the source domain and batch K is the target domain (K = 2, 3, …, 10);

2) dynamic verification: training is performed on batch K-1 and testing on batch K (K = 2, 3, …, 10). The number of hidden-layer nodes is set to 50.
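The two verification settings differ only in which batch serves as the source domain for each target batch. A short sketch of the resulting (source, target) schedules follows; the function names are hypothetical.

```python
def static_pairs(n_batches):
    """Static setting: batch 1 is always the source, batch K the target."""
    return [(1, k) for k in range(2, n_batches + 1)]

def dynamic_pairs(n_batches):
    """Dynamic setting: batch K-1 is the source, batch K the target."""
    return [(k - 1, k) for k in range(2, n_batches + 1)]

static_schedule = static_pairs(10)    # [(1, 2), (1, 3), ..., (1, 10)]
dynamic_schedule = dynamic_pairs(10)  # [(1, 2), (2, 3), ..., (9, 10)]
```

With 10 batches each setting gives nine train/test pairs. The dynamic schedule keeps the source close in time to the target, so the drift between the two domains is smaller than in the static schedule.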

The sensor drift data set collected by Vergara et al. over three years was used; it was recorded on a gas delivery platform over 36 months, from January 2008 to February 2011. The recordings were sampled by an electronic nose system with an array of 16 MOS gas sensors exposed to six gases (ammonia, acetaldehyde, acetone, ethylene, ethanol and toluene) at various concentration levels. Eight features are extracted for each sensor, so each sample is a 128-dimensional feature vector. The data were divided into 10 batches by time, as detailed in the following table:

The static verification results are as follows; the last row of the table gives the accuracy of the LME-CDSL model.

The dynamic verification results are as follows; the last row of the table gives the accuracy of the LME-CDSL model.

The steps of the verification method are summarized as follows:

Input: source-domain data $X_S$, target-domain data $X_T$, the number of neighbors k, the regularization factor $\lambda$, the reduced dimension d, and the number of hidden nodes L of the ELM classifier.

Output: the classification result of the target-domain data, $V = f(Y_T)$.

1. Compute the source-domain data center $\mu_S$.

2. Compute the target-domain data center $\mu_T$.

3. Compute the intra-class scatter matrix and the inter-class scatter matrix of the source-domain data.

4. Repeat the following $N_S$ times:

find the k nearest neighbors of the i-th source-domain sample;

compute the local covariance matrix $S_i$ from these k samples;

compute the weight vector $w_i$ of the i-th source-domain sample;

return to step 4 or end.

5. Combine the weight vectors of all source-domain samples into the weight matrix $W$.

6. Compute $B^{-1}A$.

7. Perform eigenvalue decomposition of $B^{-1}A$ to obtain the eigenvalues and the eigenvector set $P$.

8. Sort the eigenvectors in $P$ by eigenvalue in descending order and select the first d eigenvectors as the projection directions of the subspace, giving $P^* = [p_1, p_2, \ldots, p_d]$.

9. Compute the projected subspaces of the source and target domains, $Y_S = (P^*)^{T} X_S$ and $Y_T = (P^*)^{T} X_T$.

10. Initialize the ELM with L hidden nodes, an input weight matrix W and a hidden-layer bias b.

11. Compute H from the source-domain subspace data $Y_S$.

12. Compute the matrix $\beta$.

13. Repeat the following $N_T$ times:

compute the classifier output for the sample, obtaining the probability that the sample belongs to each of the c classes;

return to step 13 or end.

Combine the classification results of all samples in the target-domain subspace into a matrix, take the class with the highest probability for each sample as the predicted class, compare it with the actual label to decide whether the classification is correct, and from this compute the accuracy of the model.
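Steps 1 to 3 of the summary above can be sketched as follows. The scatter-matrix formulas are not reproduced in this text, so the sketch uses the conventional definitions from linear discriminant analysis, with the inter-class term weighted by class size. That normalization, the function name `centers_and_scatter` and the toy data are assumptions.

```python
import numpy as np

def centers_and_scatter(X_S, labels, X_T):
    """X_S: (D, N_S) source samples as columns; labels: (N_S,) class ids;
    X_T: (D, N_T) target samples as columns."""
    mu_S = X_S.mean(axis=1, keepdims=True)       # step 1: source-domain center
    mu_T = X_T.mean(axis=1, keepdims=True)       # step 2: target-domain center
    D = X_S.shape[0]
    S_w = np.zeros((D, D))                       # intra-class scatter
    S_b = np.zeros((D, D))                       # inter-class scatter
    for c in np.unique(labels):                  # step 3: accumulate per class
        X_c = X_S[:, labels == c]
        mu_c = X_c.mean(axis=1, keepdims=True)   # center of class c
        S_w += (X_c - mu_c) @ (X_c - mu_c).T
        S_b += X_c.shape[1] * (mu_c - mu_S) @ (mu_c - mu_S).T
    return mu_S, mu_T, S_w, S_b

rng = np.random.default_rng(0)
X_S = rng.standard_normal((4, 30))               # 30 toy source samples, 4 features
labels = np.repeat([0, 1, 2], 10)                # three classes of 10 samples each
X_T = rng.standard_normal((4, 12)) + 0.5         # shifted toy target samples
mu_S, mu_T, S_w, S_b = centers_and_scatter(X_S, labels, X_T)
```

With this weighting the two matrices add up to the total scatter of the source data around $\mu_S$, which gives a quick sanity check on an implementation.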
