Confidence estimation method for near-infrared spectroscopy detection results of vehicle fuel

Document No.: 1612172. Published: 2020-01-10.

Abstract: This technology, "Confidence estimation method for near-infrared spectroscopy detection results of vehicle fuel" (一种车用燃油近红外光谱检测结果的置信度估计方法), was designed by 熊智新, 张肖雪, 杨冲 and 赵静远 on 2019-09-25. The invention provides a confidence estimation method for the near-infrared spectroscopy detection result of vehicle fuel. The method calculates the Mahalanobis distance on the basis of principal component analysis, derives a significance level from the F distribution, and from it estimates the confidence of the detection result for each sample to be tested. Because the Mahalanobis distance and the statistic derived from it follow the F distribution, the method gives near-infrared spectral analysis a confidence estimate of the reliability of its detection results in practical use, and provides a quantitative basis for subsequent qualitative diagnosis of the analyzed object or for evaluating the validity of quantitative analysis results.

1. A confidence estimation method for the near-infrared spectroscopy detection result of vehicle fuel, characterized by comprising the following steps:

S1, standardizing the spectral data to obtain the calibration set spectral data X;

S2, detecting and removing abnormal samples from the calibration set spectral data X by the PCA-MD $T^2$ test, so that all samples remaining in X are normal;

S3, performing PCA decomposition on the calibration set spectral data X and, together with the test set spectral data $X_{test}$, calculating the squared Mahalanobis distance $MD_{test}^2$ between each sample to be tested and the calibration set;

S4, on the basis that $MD_{test}^2$ and the statistic derived from it follow the F distribution, calculating the significance level $\alpha_{test}$ and then obtaining the confidence $c_{test}$ of the near-infrared spectroscopy detection result.

2. The method for estimating the confidence of the near-infrared spectroscopy detection result of vehicle fuel according to claim 1, wherein step S2 includes:

S21: the PCA decomposition of the calibration set spectral data is expressed by formula (1):

[Formula (1), image not available]

where $T \in R^{n \times p}$ is the score matrix, n is the number of samples, p is the number of principal components, $P \in R^{m \times p}$ is the loading matrix, and m is the number of variables;

S22: the squared PCA-MD value of the i-th sample of the calibration set spectral data is expressed as:

$MD_i^2 = t_i \Sigma^{-1} t_i^{T}$ (2)

where $t_i$ is the i-th row vector of the score matrix T and $\Sigma$ is the covariance matrix of T;

S23: the $T^2$ control limit is expressed by formula (3):

[Formula (3), image not available]

where $\alpha$ is the significance level and the confidence of the control limit is $1-\alpha$; if $MD_i^2$ is less than the control limit, the sample is judged to be normal; if $MD_i^2$ is greater than the control limit, the sample is judged to be abnormal.

3. The method for estimating the confidence of the near-infrared spectroscopy detection result of vehicle fuel according to claim 2, wherein step S3 includes:

S31: after the abnormal samples are removed, PCA decomposition is performed again on the calibration set spectral data X according to formula (1), and the loading matrix P and the covariance matrix $\Sigma$ of the score matrix T are updated;

S32: the score matrix of the spectral data of the sample set to be tested is calculated as:

$T_{test} = X_{test} P$ (4)

where $X_{test}$ is the sample set to be tested and P is the calibration set loading matrix;

S33: the squared Mahalanobis distance $MD_{test-i}^2$ between the i-th sample to be tested and the calibration set is expressed as:

$MD_{test-i}^2 = t_{test-i} \Sigma^{-1} t_{test-i}^{T}$ (5)

where $t_{test-i}$ is the i-th row vector of the score matrix $T_{test}$ of the sample set to be tested.

4. The method for estimating the confidence of the near-infrared spectroscopy detection result of vehicle fuel according to claim 3, wherein step S4 includes:

S41: the significance level $\alpha_{test}$ is obtained from the F distribution with degrees of freedom p and n-p; for the i-th test set sample, the significance level $\alpha_{test-i}$ is obtained according to formula (6):

[Formula (6), image not available]

S42: the confidence $c_{test-i}$ of the i-th test set sample is expressed as:

$c_{test-i} = 1 - \alpha_{test-i}$ (7).

5. The method for estimating the confidence of the near-infrared spectroscopy detection result of vehicle fuel according to any one of claims 2 to 4, characterized in that the significance level $\alpha$ is set to 0.01 or 0.05.

Technical Field

The invention relates to abnormality detection in the near-infrared spectra of vehicle fuel, and in particular to a Mahalanobis-distance-based method for estimating the confidence of near-infrared detection results.

Background

Near-infrared spectroscopy relies on the combination-band and overtone absorptions of the stretching vibrations of hydrogen-containing groups in organic molecules. Using chemometric methods, it establishes a linear or nonlinear relationship between the spectrum and a quality index, so that qualitative and quantitative analysis of a sample can be completed quickly and efficiently. It thereby avoids the complicated procedures, high cost and low efficiency of traditional oil analysis techniques.

In recent years, near-infrared spectroscopy has been applied ever more widely and maturely to measuring the content of various components of oil products, in order to improve production management and quality supervision. During acquisition of near-infrared spectra, abnormal spectral data can arise from changes in sample properties, changes in experimental conditions, instrument measurement errors, operator errors and similar factors. Abnormal spectra distort the characteristics of the data and so reduce the reliability of the spectral detection result. Identifying and rejecting abnormal samples is therefore a prerequisite for building a reliable near-infrared analysis model. Common methods for rejecting abnormal sample points include the Mahalanobis distance (MD), the leverage method and Monte Carlo cross-validation. In practical rapid detection of oil products, however, differences among the spectra of the same oil product are often unavoidable, given the different production processes of different refineries and the possibility of adulteration. Simply rejecting abnormal samples is therefore often undesirable. Enterprises instead need a suitable decision criterion (for example, a confidence of not less than 80%) for judging and screening samples, which matters both for the rapid detection of oil quality indexes and for further diagnostic analysis.

However, in the field of near-infrared spectroscopy there is as yet no mature method for estimating the confidence of detection results obtained from spectral data. In the field of process control, because the squared Mahalanobis distance computed on the basis of principal component analysis (PCA-MD) is equivalent to Hotelling's $T^2$ statistic, PCA-MD is commonly used for the $T^2$ test: comparing the squared PCA-MD value with the $T^2$ control limit shows whether the sample under test is in a normal state. Since the $T^2$ statistic in turn follows an F distribution, the significance level of a sample can be calculated. Borrowing this distributional idea, the significance level of a sample can be obtained from the squared PCA-MD value of its near-infrared spectral data, and the confidence of the detection result can then be estimated.
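The equivalence invoked above can be made explicit. The following is a short derivation sketch rather than a formula taken from the source. It assumes mean-centered scores, writes $t_i$ for the i-th row of the score matrix T and $\Sigma$ for the covariance matrix of T, and introduces the symbol $\lambda_a$ (the variance of the a-th score column), which does not appear in the original text. Because PCA scores are mutually uncorrelated, $\Sigma$ is diagonal, and the squared Mahalanobis distance in score space reduces to the familiar sum-of-scaled-scores form of Hotelling's $T^2$:

```latex
\Sigma = \frac{1}{n-1}\, T^{T} T = \operatorname{diag}(\lambda_1, \dots, \lambda_p)
\quad\Longrightarrow\quad
MD_i^2 = t_i \, \Sigma^{-1} t_i^{T} = \sum_{a=1}^{p} \frac{t_{ia}^{2}}{\lambda_a} = T_i^{2}
```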

Disclosure of Invention

For the rapid detection of oil products, the invention provides a Mahalanobis-distance-based method for estimating the confidence of near-infrared detection results, so that in practical applications a sample can be judged qualified or not conveniently and effectively from its confidence, and the reliability of the analysis result is ensured.

The method calculates the Mahalanobis distance on the basis of principal component analysis, derives the significance level from the F distribution, and from it estimates the confidence of the detection result for the sample to be tested.

The method is implemented in the following steps:

S1, standardizing the spectral data to obtain the calibration set spectral data X;

S2, detecting and removing abnormal samples from the calibration set by the PCA-MD $T^2$ test, so that all samples remaining in the calibration set spectral data X are normal;

S3, performing PCA decomposition on the calibration set spectral data X and, together with the test set spectral data $X_{test}$, calculating the squared Mahalanobis distance $MD_{test}^2$ between each sample to be tested and the calibration set;

S4, on the basis that $MD_{test}^2$ and the statistic derived from it follow the F distribution, calculating the significance level $\alpha_{test}$ and then obtaining the confidence $c_{test}$ of the near-infrared spectroscopy detection result.

Step S2 includes:

S21: the PCA decomposition of the calibration set spectral data is expressed by formula (1):

[Formula (1), image not available]

where $T \in R^{n \times p}$ is the score matrix, n is the number of samples, p is the number of principal components, $P \in R^{m \times p}$ is the loading matrix, and m is the number of variables;

S22: the squared PCA-MD value of the i-th sample of the calibration set spectral data is expressed as:

$MD_i^2 = t_i \Sigma^{-1} t_i^{T}$ (2)

where $t_i$ is the i-th row vector of the score matrix T and $\Sigma$ is the covariance matrix of T;

S23: the $T^2$ control limit is expressed by formula (3):

[Formula (3), image not available]

where $\alpha$ is the significance level and the confidence of the control limit is $1-\alpha$. If $MD_i^2$ is less than the control limit, the sample is judged to be normal; if $MD_i^2$ is greater than the control limit, the sample is judged to be abnormal.

Step S3 includes:

S31: after the abnormal samples are removed, PCA decomposition is performed again on the calibration set spectral data X according to formula (1), and the loading matrix P and the covariance matrix $\Sigma$ of the score matrix T are updated;

S32: the score matrix of the spectral data of the sample set to be tested is calculated as:

$T_{test} = X_{test} P$ (4)

where $X_{test}$ is the sample set to be tested and P is the calibration set loading matrix;

S33: the squared Mahalanobis distance $MD_{test-i}^2$ between the i-th sample to be tested and the calibration set is expressed as:

$MD_{test-i}^2 = t_{test-i} \Sigma^{-1} t_{test-i}^{T}$ (5)

where $t_{test-i}$ is the i-th row vector of the score matrix $T_{test}$ of the sample set to be tested.

Step S4 includes:

S41: the significance level $\alpha_{test}$ is obtained from the F distribution with degrees of freedom p and n-p; for the i-th test set sample, the significance level $\alpha_{test-i}$ is obtained according to formula (6):

[Formula (6), image not available]

S42: the confidence $c_{test-i}$ of the i-th test set sample is expressed as:

$c_{test-i} = 1 - \alpha_{test-i}$ (7)

The advantage of the method is that, because the Mahalanobis distance and the statistic derived from it follow the F distribution, it gives near-infrared spectral analysis a confidence estimate of the reliability of its detection results in practical use, and provides a quantitative basis for subsequent qualitative diagnosis of the analyzed object or for evaluating the validity of quantitative analysis results.

Drawings

FIG. 1 is a flow chart of the PCA-MD-based method for quantifying the confidence of abnormal near-infrared spectra;

FIG. 2 is a line graph of detecting and removing abnormal samples in the calibration set by the PCA-MD $T^2$ test;

FIG. 3 is a line graph of the sample confidence estimates for the near-infrared spectra of diesel-blended gasoline;

FIG. 4 is a line graph of the sample confidence estimates for simulation case 1;

FIG. 5 is a line graph of the sample confidence estimates for simulation case 2.

Detailed Description

The technical scheme adopted by the method for estimating the confidence of near-infrared spectroscopy detection results for oil products is as follows:

S1, standardizing the spectral data to obtain the calibration set spectral data X;

S2, detecting and removing abnormal samples from the calibration set by the PCA-MD $T^2$ test, so that all samples remaining in the calibration set spectral data X are normal;

S3, performing PCA decomposition on the calibration set spectral data X and, together with the test set spectral data $X_{test}$, calculating the squared Mahalanobis distance $MD_{test}^2$ between each sample to be tested and the calibration set;

S4, on the basis that $MD_{test}^2$ and the statistic derived from it follow the F distribution, calculating the significance level $\alpha_{test}$ and then obtaining the confidence $c_{test}$ of the near-infrared spectroscopy detection result.
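Step S1 can be sketched as follows. The source does not say which standardization is used, so this minimal sketch assumes column-wise autoscaling (mean-centering each wavelength and dividing by its standard deviation). The function names `fit_scaler` and `apply_scaler` and the synthetic data are illustrative and are not part of the original method.

```python
import numpy as np

def fit_scaler(X_cal):
    """Return the per-wavelength mean and standard deviation of the calibration spectra."""
    mean = X_cal.mean(axis=0)
    std = X_cal.std(axis=0, ddof=1)
    std = np.where(std == 0, 1.0, std)  # avoid dividing by zero on constant columns
    return mean, std

def apply_scaler(X, mean, std):
    """Standardize spectra with statistics taken from the calibration set."""
    return (X - mean) / std

# Synthetic stand-in for real spectra: 40 samples x 100 wavelengths.
rng = np.random.default_rng(0)
X_raw = rng.normal(loc=1.0, scale=0.2, size=(40, 100))
mean, std = fit_scaler(X_raw)
X = apply_scaler(X_raw, mean, std)
```

Reusing the calibration-set mean and standard deviation when scaling $X_{test}$ keeps test samples in the same coordinate system as the calibration set; this too is an assumption of the sketch.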

Step S2 includes:

S21: the PCA decomposition of the calibration set spectral data is expressed by formula (1):

[Formula (1), image not available]

where $T \in R^{n \times p}$ is the score matrix, n is the number of samples, p is the number of principal components, $P \in R^{m \times p}$ is the loading matrix, and m is the number of variables;

S22: the squared PCA-MD value of the i-th sample of the calibration set spectral data is expressed as:

$MD_i^2 = t_i \Sigma^{-1} t_i^{T}$ (2)

where $t_i$ is the i-th row vector of the score matrix T and $\Sigma$ is the covariance matrix of T;

S23: the $T^2$ control limit is expressed by formula (3):

[Formula (3), image not available]

where $\alpha$ is the significance level (typically set to 0.01 or 0.05) and the confidence of the control limit is $1-\alpha$. If $MD_i^2$ is less than the control limit, the sample is judged to be normal; if $MD_i^2$ is greater than the control limit, the sample is judged to be abnormal.
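Step S2 can be sketched as follows, under several assumptions that go beyond the text. PCA is computed by SVD of the already standardized calibration data. The control limit uses one common Hotelling $T^2$ convention, $p(n^2-1)/(n(n-p)) \cdot F_\alpha(p, n-p)$, because formula (3) is not reproduced in the source and may use a different factor. Screening is done in a single pass. All function names are illustrative.

```python
import numpy as np
from scipy.stats import f as f_dist

def pca_scores(X, p):
    """PCA via SVD on standardized data X (n x m); returns scores T and loadings P."""
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    P = Vt[:p].T          # m x p loading matrix
    T = X @ P             # n x p score matrix
    return T, P

def md_squared(T, Sigma):
    """Row-wise t_i Sigma^{-1} t_i^T, as in formula (2)."""
    Sigma_inv = np.linalg.inv(Sigma)
    return np.einsum("ij,jk,ik->i", T, Sigma_inv, T)

def t2_control_limit(n, p, alpha=0.05):
    """ASSUMED form of the T^2 control limit; formula (3) is missing in the source."""
    return p * (n**2 - 1) / (n * (n - p)) * f_dist.ppf(1.0 - alpha, p, n - p)

def remove_abnormal(X, p, alpha=0.05):
    """Single screening pass: keep rows whose squared PCA-MD is below the limit."""
    n = X.shape[0]
    T, _ = pca_scores(X, p)
    Sigma = np.atleast_2d(np.cov(T, rowvar=False))  # covariance matrix of the scores
    md2 = md_squared(T, Sigma)
    keep = md2 < t2_control_limit(n, p, alpha)
    return X[keep], keep, md2
```

A call such as `X_clean, keep, md2 = remove_abnormal(X, p=3, alpha=0.05)` returns the retained spectra, a Boolean mask, and the squared distances. The choice of p is left to the user, since the source does not state how the number of principal components is selected.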

Step S3 includes:

S31: after the abnormal samples are removed, PCA decomposition is performed again on the calibration set spectral data X according to formula (1), and the loading matrix P and the covariance matrix $\Sigma$ of the score matrix T are updated;

S32: the score matrix of the spectral data of the sample set to be tested is calculated as:

$T_{test} = X_{test} P$ (4)

where $X_{test}$ is the sample set to be tested and P is the calibration set loading matrix;

S33: the squared Mahalanobis distance $MD_{test-i}^2$ between the i-th sample to be tested and the calibration set is expressed as:

$MD_{test-i}^2 = t_{test-i} \Sigma^{-1} t_{test-i}^{T}$ (5)

where $t_{test-i}$ is the i-th row vector of the score matrix $T_{test}$ of the sample set to be tested.
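Step S3 can be sketched as follows. The sketch assumes that both the cleaned calibration set and the test set have already been standardized with calibration-set statistics, and that PCA is computed by SVD. The function names `fit_calibration` and `md2_test` are illustrative.

```python
import numpy as np

def fit_calibration(X_cal, p):
    """PCA on the cleaned, standardized calibration set.

    Returns the loading matrix P (m x p) and the covariance matrix Sigma (p x p)
    of the calibration scores, i.e. the quantities updated in S31.
    """
    _, _, Vt = np.linalg.svd(X_cal, full_matrices=False)
    P = Vt[:p].T
    T = X_cal @ P
    Sigma = np.atleast_2d(np.cov(T, rowvar=False))
    return P, Sigma

def md2_test(X_test, P, Sigma):
    """Project test spectra with formula (4), then evaluate t Sigma^{-1} t^T per row."""
    T_test = X_test @ P
    return np.einsum("ij,jk,ik->i", T_test, np.linalg.inv(Sigma), T_test)
```

The test samples are only projected onto the calibration loadings; P and $\Sigma$ are never refitted on $X_{test}$, so the distance always measures departure from the calibration set.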

Step S4 includes:

S41: the significance level $\alpha_{test}$ is obtained from the F distribution with degrees of freedom p and n-p; for the i-th test set sample, the significance level $\alpha_{test-i}$ is obtained according to formula (6):

[Formula (6), image not available]

S42: the confidence $c_{test-i}$ of the i-th test set sample is expressed as:

$c_{test-i} = 1 - \alpha_{test-i}$ (7)
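Step S4 can be sketched as follows, with two assumptions that must be flagged because formula (6) is not reproduced in the source. First, the factor that converts the squared distance into an F(p, n-p) variate is taken to be $n(n-p)/(p(n^2-1))$, one common Hotelling $T^2$ convention. Second, $\alpha_{test}$ is read as the upper-tail probability of that variate, so that a sample lying exactly on the control limit for level $\alpha$ gets $\alpha_{test} = \alpha$. If the original formula uses a different factor or the complementary tail, the numbers change accordingly. The function names are illustrative.

```python
import numpy as np
from scipy.stats import f as f_dist

def f_statistic(md2, n, p):
    """Scale MD^2 to an F(p, n-p) variate. ASSUMED factor; formula (6) is missing."""
    return np.asarray(md2, dtype=float) * n * (n - p) / (p * (n**2 - 1))

def significance_level(md2, n, p):
    """ASSUMED reading: alpha_test is the upper-tail F probability of the statistic."""
    return f_dist.sf(f_statistic(md2, n, p), p, n - p)

def confidence(md2, n, p):
    """Formula (7): c_test = 1 - alpha_test."""
    return 1.0 - significance_level(md2, n, p)
```

Here n is the number of calibration samples left after screening and p is the number of principal components. Under the tail convention assumed above, $c_{test}$ grows with distance from the calibration set; under the complementary convention it would shrink. The source text alone does not settle which is intended.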
