Comparison method for multi-data correction smoothness based on NIR high-dimensional data

Document No.: 1904797. Publication date: 2021-11-30.

Reading note: This technology, "Comparison method for multi-data correction smoothness based on NIR high-dimensional data", was designed and created by 潘晓光, 潘哲, 焦璐璐, 令狐彬 and 宋晓晨 on 2021-07-13. Main content: The invention belongs to the technical field of correction and smoothing of NIR high-dimensional data, and specifically relates to a method for comparing multiple data correction and smoothing treatments on NIR high-dimensional data. The method comprises the following steps: outlier detection and handling, data correction, data smoothing, data feature inspection, model fitting and training, and model error calculation and comparison. Outlier detection and handling removes outliers using the LOF (local outlier factor) detection method. Data correction applies three correction methods (standardization, standardization plus detrending, and multiplicative scatter correction) to the NIR high-dimensional spectral data and outputs the results. Data smoothing smooths the data with an SG filter (Savitzky-Golay filter). Data feature inspection examines the features of the data before and after correction. Model fitting and training runs the same model on the differently processed data, with partial least squares chosen as the fitting model. Model error calculation and comparison computes the error and compares the results obtained under the different data processing methods.

1. A method for comparing multiple data correction and smoothing treatments on NIR high-dimensional data, characterized by comprising the following steps:

S100, outlier detection and handling: removing outliers using the LOF outlier detection method;

S200, data correction: correcting the NIR high-dimensional spectral data using three data correction methods, namely standardization, standardization plus detrending, and multiplicative scatter correction, and outputting the results;

S300, data smoothing: smoothing the data using an SG filter;

S400, data feature inspection: examining the features of the data before and after correction;

S500, model fitting and training: running the same model on the differently processed data, with partial least squares chosen as the fitting model;

S600, model error calculation and comparison: calculating the error and comparing the results obtained under the different data processing methods.

2. The method for comparing multiple data correction and smoothing treatments on NIR high-dimensional data as claimed in claim 1, wherein: in the S100 outlier detection and handling, an unsupervised outlier detection method capable of detecting outliers in a sub-high-dimensional space is used to detect local outliers, after which the possible outliers so detected are removed and data correction is carried out.

3. The method for comparing multiple data correction and smoothing treatments on NIR high-dimensional data as claimed in claim 2, wherein: in the S200 data correction, following Barnes, the logarithmic data at each wavelength are standardized, so that the data of each variable are standardized; a trend is then fitted to the standardized data and removed by linear detrending, which eliminates any sustained upward or downward trend in the data; and then, following Martens, the spectral data are rotated so that they lie close to the mean.

4. The method for comparing multiple data correction and smoothing treatments on NIR high-dimensional data as claimed in claim 3, wherein: in the S300 data smoothing, an SG filter is used as a simple means of removing noise and smoothing the data. Let $x_j$ be the central value of the smoothing window, let the length of the smoothing window be $2m+1$, let $i \in [-m, m]$, and let $C_i$ denote the weight derived for each $x_{j+i}$; then the smoothed value of $x_j$ is calculated by the formula $\hat{x}_j = \sum_{i=-m}^{m} C_i \, x_{j+i}$.

5. The method for comparing multiple data correction and smoothing treatments on NIR high-dimensional data as claimed in claim 4, wherein: in the S500 model fitting and training, the data are divided into a training set and a test set, models are fitted on the training set to find a suitable model, and the suitable model is then applied to the test set.

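6. The method for comparing multiple data correction and smoothing treatments on NIR high-dimensional data as claimed in claim 5, wherein: in the S600 model error calculation and comparison, the same model is applied to the test sets that have undergone the different treatments, and finally the mean squared prediction error is calculated as $\mathrm{MSE} = \frac{1}{n}\sum_{k=1}^{n}(y_k - \hat{y}_k)^2$, where $y_k$ and $\hat{y}_k$ are the observed and predicted values of the $k$-th test sample and $n$ is the number of test samples.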

Technical Field

The invention belongs to the technical field of NIR high-dimensional data correction smoothing, and particularly relates to a comparison method for multi-data correction smoothing based on NIR high-dimensional data.

Background

NIR spectral data are typically high-dimensional and large in volume. The variables are highly correlated internally, variation in particle size causes differing reflectance values, and the data show strongly non-smooth trends.

Problems or defects in the prior art: to address this, some studies take the logarithm of the data at each wavelength and then standardize it, while others rotate the spectrum so that it approximates the mean and variance. However, existing work neither compares these correction methods systematically nor examines their combinations, so the data correction effect still needs to be improved.

Disclosure of Invention

To address the problems of the existing methods, namely that the effects of the various correction methods have not been compared and that the data correction effect is poor, the invention provides a method that reduces the loss of useful information in the data and improves the accuracy of model fitting.

In order to solve the technical problems, the invention adopts the technical scheme that:

A method for comparing multiple data correction and smoothing treatments on NIR high-dimensional data comprises the following steps:

S100, outlier detection and handling: removing outliers using the LOF outlier detection method;

S200, data correction: correcting the NIR high-dimensional spectral data using three data correction methods, namely standardization, standardization plus detrending, and multiplicative scatter correction, and outputting the results;

S300, data smoothing: smoothing the data using an SG filter;

S400, data feature inspection: examining the features of the data before and after correction;

S500, model fitting and training: running the same model on the differently processed data, with partial least squares chosen as the fitting model;

S600, model error calculation and comparison: calculating the error and comparing the results obtained under the different data processing methods.

In the outlier detection and handling, an unsupervised outlier detection method capable of detecting outliers in a sub-high-dimensional space is used to detect local outliers, after which the possible outliers so detected are removed and data correction is carried out.

In the data correction, following Barnes, the logarithmic data at each wavelength are standardized, so that the data of each variable are standardized; a trend is then fitted to the standardized data and removed by linear detrending, which eliminates any sustained upward or downward trend in the data; and then, following Martens, the spectral data are rotated so that they lie close to the mean.

In the data smoothing, an SG filter is used as a simple means of removing noise and smoothing the data. Let $x_j$ be the central value of the smoothing window, let the length of the smoothing window be $2m+1$, let $i \in [-m, m]$, and let $C_i$ denote the weight derived for each $x_{j+i}$; then the smoothed value of $x_j$ is calculated by the formula $\hat{x}_j = \sum_{i=-m}^{m} C_i \, x_{j+i}$.

In the model fitting and training, the data are divided into a training set and a test set, models are fitted on the training set to find a suitable model, and the suitable model is then applied to the test set.

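In the model error calculation and comparison, the same model is applied to the test sets that have undergone the different treatments, and finally the mean squared prediction error is calculated as $\mathrm{MSE} = \frac{1}{n}\sum_{k=1}^{n}(y_k - \hat{y}_k)^2$, where $y_k$ and $\hat{y}_k$ are the observed and predicted values of the $k$-th test sample and $n$ is the number of test samples.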

Compared with the prior art, the invention has the following beneficial effects:

The invention combines four correction modes (multiplicative scatter correction, standardization, detrending, and the raw data) with the SG filter. For different derivative orders and smoothing window lengths, it makes it possible to select the data correction mode that removes as much irrelevant noise as possible while losing little useful information, to compare the effect of the different centering approaches on data correction, and to provide a reference for later model building or further analysis.

Drawings

FIG. 1 is a system flow diagram of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

A method for comparing multiple data correction and smoothing treatments on NIR high-dimensional data, as shown in FIG. 1, comprises the following steps:

S100, outlier detection and handling: removing outliers using the LOF outlier detection method;

S200, data correction: correcting the NIR high-dimensional spectral data using three data correction methods, namely standardization, standardization plus detrending, and multiplicative scatter correction, and outputting the results;

S300, data smoothing: smoothing the data using an SG filter;

S400, data feature inspection: examining the features of the data before and after correction;

S500, model fitting and training: running the same model on the differently processed data, with partial least squares chosen as the fitting model;

S600, model error calculation and comparison: calculating the error and comparing the results obtained under the different data processing methods.

Further, regarding the outlier detection and handling step: in high-dimensional data, and especially in high-dimensional spectral data, the data are highly correlated and usually involve many variables, and the noise and fluctuation of the data differ greatly between spectra, so different kinds of data preprocessing can lead to completely different final models.

Further, in the outlier detection and handling step, an unsupervised outlier detection method capable of detecting outliers in a sub-high-dimensional space is used to detect local outliers, after which the possible outliers so detected are removed and data correction is carried out.
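The outlier removal step can be sketched as follows. This is a minimal illustration rather than the patent's implementation: it assumes scikit-learn's `LocalOutlierFactor` as the LOF detector, and the function name `remove_lof_outliers`, the neighbour count and the synthetic spectra are all hypothetical choices made for the example.

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor


def remove_lof_outliers(spectra, n_neighbors=20):
    """Drop the samples (rows) that LOF flags as local outliers.

    spectra has shape (n_samples, n_wavelengths). The neighbour count is an
    illustrative default, not a value taken from the source.
    """
    lof = LocalOutlierFactor(n_neighbors=n_neighbors)
    labels = lof.fit_predict(spectra)  # 1 = inlier, -1 = outlier
    keep = labels == 1
    return spectra[keep], keep


rng = np.random.default_rng(0)
# Synthetic stand-in for NIR spectra: 60 samples x 50 wavelengths.
spectra = rng.normal(loc=1.0, scale=0.05, size=(60, 50))
spectra[0] += 5.0  # one grossly shifted spectrum that should be flagged

filtered, keep_mask = remove_lof_outliers(spectra)
print(f"kept {filtered.shape[0]} of {spectra.shape[0]} spectra")
```

LOF is unsupervised, so no reference values are needed at this stage; only the rows that survive the mask would be passed on to data correction.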

Further, in the data correction step, different reflectance values are generally highly correlated internally. To address this, following Barnes, the logarithmic data at each wavelength are standardized, so that after standardization the data of each variable have mean 0 and variance 1, according to the formula $z_{im} = \frac{x_{im} - \bar{x}_m}{\sigma_m}$, where $x_{im}$ is the $i$-th data value of the $m$-th variable (spectrum), $\bar{x}_m$ is the mean of the $m$-th spectrum, and $\sigma_m$ is the standard deviation of the $m$-th spectrum. A trend is then fitted to the standardized data and removed. There are usually two detrending methods, constant and linear; removing the linear trend eliminates any sustained upward or downward trend in the data. To better address the high internal correlation, following Martens, the spectral data are rotated so that they lie close to the mean. The reference formula is [formula not reproduced in the source text], where $x_{im}$ is the $i$-th data value of the $m$-th variable (spectrum), $\bar{x}_m$ is the mean of the $m$-th spectrum, and the result is the data processed by the multiplicative scatter correction method.
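The three corrections can be sketched as follows. This is a hedged illustration under stated assumptions: standardization is taken per wavelength (column-wise), as "each variable" suggests, although Barnes-style standard normal variate scaling is often applied per spectrum instead; detrending is assumed to run along the wavelength axis of each spectrum; and multiplicative scatter correction is written in its common textbook form, regressing each spectrum on the mean spectrum. The function names and the synthetic data are hypothetical.

```python
import numpy as np
from scipy.signal import detrend


def standardize(X):
    """Column-wise z-score: every wavelength gets mean 0 and variance 1."""
    return (X - X.mean(axis=0)) / X.std(axis=0)


def standardize_detrend(X):
    """Standardize, then remove a linear trend along each spectrum (row)."""
    return detrend(standardize(X), axis=1, type="linear")


def msc(X):
    """Multiplicative scatter correction against the mean spectrum."""
    ref = X.mean(axis=0)
    corrected = np.empty_like(X, dtype=float)
    for i, row in enumerate(X):
        # Least-squares fit: row ~ intercept + slope * ref
        slope, intercept = np.polyfit(ref, row, deg=1)
        corrected[i] = (row - intercept) / slope
    return corrected


rng = np.random.default_rng(1)
wl = np.linspace(0.0, 1.0, 40)
base = np.sin(2 * np.pi * wl) + 2.0
offsets = rng.uniform(0.0, 1.0, 30)
scales = rng.uniform(0.5, 1.5, 30)
# Each synthetic spectrum = offset + scale * base shape + small noise.
X = offsets[:, None] + scales[:, None] * base + rng.normal(scale=0.01, size=(30, 40))

Z = standardize(X)
D = standardize_detrend(X)
M = msc(X)
```

After MSC, the additive offset and multiplicative scale of each synthetic spectrum are largely removed, so the rows of `M` sit close to the mean spectrum.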

Further, in the data smoothing step, the data processed in the previous step still contain considerable noise. A simple way to remove data noise is derivative smoothing, and the SG filter is a relatively mature method for smoothing data. Let $x_j$ be the central value of the smoothing window, let the length of the smoothing window be $2m+1$, let $i \in [-m, m]$, and let $C_i$ denote the weight derived for each $x_{j+i}$; then the smoothed value of $x_j$ is calculated by the formula $\hat{x}_j = \sum_{i=-m}^{m} C_i \, x_{j+i}$. The smoothing method assigns different weights to the points in a smoothing window and fits the window with a least-squares curve: for each smoothing window, the least-squares curve that minimizes the error is found, and substituting the data into it gives an estimate of the middle point $x_j$. Only the common cases of derivative orders 1 and 2 and window lengths 3, 5, 7 and 9 are discussed in this invention.
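The weighted-sum form of the filter can be checked numerically. The sketch below assumes SciPy's `savgol_filter` and `savgol_coeffs` as the SG implementation; the quadratic polynomial order and the synthetic signal are illustrative assumptions, since the source specifies only the derivative orders and window lengths.

```python
import numpy as np
from scipy.signal import savgol_coeffs, savgol_filter

rng = np.random.default_rng(2)
t = np.linspace(0.0, 1.0, 200)
signal = np.sin(2 * np.pi * t)
noisy = signal + rng.normal(scale=0.05, size=t.size)

# Window length 2m + 1 = 7 (m = 3), quadratic local fit, no derivative.
m = 3
smoothed = savgol_filter(noisy, window_length=2 * m + 1, polyorder=2, deriv=0)

# The same estimate written as the weighted sum of the window values.
# For deriv=0 the weights are symmetric, so their order does not matter.
C = savgol_coeffs(2 * m + 1, 2, deriv=0)
j = 100
manual = np.dot(C, noisy[j - m : j + m + 1])

# The cases named in the text: derivative orders 1 and 2, windows 3, 5, 7, 9.
derivatives = {
    (deriv, window): savgol_filter(noisy, window_length=window, polyorder=2, deriv=deriv)
    for deriv in (1, 2)
    for window in (3, 5, 7, 9)
}
```

For this 7-point quadratic case the weights are the classic values (-2, 3, 6, 7, 6, 3, -2)/21, which sum to 1.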

Further, in the data feature inspection step, standardized data should fluctuate around 0, detrended data should show no sustained upward or downward trend, MSC-corrected data should fluctuate around the mean of each variable, and data smoothed by the SG filter should fluctuate gently.

Further, in the model fitting and training step, NIR high-dimensional data are multivariate, and on this basis partial least squares is chosen as the fitting model. The data are divided into a training set and a test set; models are fitted on the training set to find a suitable model, and the suitable model is then applied to the test set.

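Further, in the model error calculation and comparison step, the same model is applied to the test sets that have undergone the different treatments, and finally the mean squared prediction error is calculated as $\mathrm{MSE} = \frac{1}{n}\sum_{k=1}^{n}(y_k - \hat{y}_k)^2$, where $y_k$ and $\hat{y}_k$ are the observed and predicted values of the $k$-th test sample and $n$ is the number of test samples.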

Although only the preferred embodiments of the present invention have been described in detail, the present invention is not limited to the above embodiments, and various changes can be made without departing from the spirit of the present invention within the knowledge of those skilled in the art, and all changes are encompassed in the scope of the present invention.
