Method and device for constructing spectrum quantitative analysis model

Document No.: 1863211  Publication date: 2021-11-19

Reading note: This technology, "Method and device for constructing spectrum quantitative analysis model", was designed by 魏康丽, 杨平 and 李贤信 on 2021-08-24. Abstract: The invention discloses a method and a device for constructing a spectral quantitative analysis model. The method comprises: acquiring chemical values and spectral characteristics of training samples; calculating a regularization parameter initial value and an insensitive loss function parameter initial value from the chemical values and the spectral characteristics, respectively; obtaining an optimal regularization parameter and an optimal insensitive loss function parameter through parameter optimization; and constructing the spectral quantitative analysis model according to the optimal regularization parameter and the optimal insensitive loss function parameter. Because both initial values are calculated from the chemical values and spectral characteristics of the training samples, and parameter optimization and model construction start from these two initial values, the embodiment of the invention can effectively increase the operation speed of parameter optimization and modeling and can effectively improve the prediction accuracy and generalization ability of the constructed spectral quantitative analysis model.

1. A method for constructing a spectral quantitative analysis model is characterized by comprising the following steps:

acquiring a chemical value and spectral characteristics of a training sample;

calculating to obtain a regularization parameter initial value of the support vector machine according to the distribution data and the precision of the chemical value;

calculating to obtain noise data of the training sample according to the spectral characteristics, and calculating to obtain an initial value of an insensitive loss function parameter of a support vector machine according to the number of samples of the training sample and the noise data;

performing parameter optimization according to the regularization parameter initial value and the insensitive loss function parameter initial value respectively to obtain an optimal regularization parameter and an optimal insensitive loss function parameter;

and constructing a spectrum quantitative analysis model according to the optimal regularization parameter and the optimal insensitive loss function parameter.

2. The method for constructing a quantitative spectral analysis model according to claim 1, wherein the distribution data includes a mean value and a standard deviation, and the initial value of the regularization parameter of the support vector machine is calculated according to the distribution data and precision of the chemical value, specifically including:

selecting a kernel function of a support vector machine;

and calculating to obtain a regularization parameter initial value of the support vector machine according to the average value, the standard deviation and the precision of the chemical value based on the kernel function.

3. The method for constructing the quantitative spectral analysis model according to claim 1, wherein the initial value of the insensitive loss function parameter of the support vector machine is calculated according to the number of samples of the training sample and the noise data, and the expression of the initial value of the insensitive loss function parameter is:

4. The method for constructing a quantitative spectral analysis model according to claim 1, wherein the parameter optimization is performed according to the regularization parameter initial value and the insensitive loss function parameter initial value, respectively, to obtain an optimal regularization parameter and an optimal insensitive loss function parameter, specifically comprising:

dividing samples of the training set into K folds, and calculating an evaluation parameter of each fold under the regularization parameter initial value and the insensitive loss function parameter initial value;

respectively performing parameter search according to the regularization parameter initial value and the insensitive loss function parameter initial value by adopting a parameter optimization method to obtain candidate regularization parameters and candidate insensitive loss function parameters;

and respectively taking the candidate regularization parameter and the candidate insensitive loss function parameter when the evaluation parameter is optimal as an optimal regularization parameter and an optimal insensitive loss function parameter.

5. The method for constructing a quantitative spectral analysis model according to claim 4, wherein the parameter optimization method comprises a grid search method, a gradient descent method, and a single-target optimization method.

6. The method for constructing a quantitative spectral analysis model according to claim 2, wherein the kernel function is a radial basis kernel function, and the expression of the radial basis kernel function is as follows:

K(x_i, x) = exp(-γ||x - x_i||²)

wherein x_i is a center point of the feature space, x is any point of the feature space, and γ is the width parameter.

7. The method for constructing a quantitative analysis model for spectrum according to claim 2, wherein the expression of the initial value of the regularization parameter is:

wherein p is the precision of the chemical value, ȳ is the mean value of the chemical values, and σ_y is the standard deviation of the chemical values.

8. The method for constructing a quantitative analysis model for spectrum according to claim 3, wherein the expression of the noise standard deviation is:

wherein d takes a value from 0 to 1, k is the number of neighboring points, y_i is a chemical value, and ŷ_i is the k-nearest-neighbor prediction.

9. An apparatus for constructing a model for quantitative analysis of spectra, comprising:

the data acquisition module is used for acquiring chemical values and spectral characteristics of the training samples;

the first calculation module is used for calculating to obtain a regularization parameter initial value of the support vector machine according to the distribution data and the precision of the chemical value;

the second calculation module is used for calculating the noise data of the training sample according to the spectral characteristics and calculating the initial value of the insensitive loss function parameter of the support vector machine according to the sample number of the training sample and the noise data;

the parameter optimizing module is used for optimizing parameters according to the regularization parameter initial value and the insensitive loss function parameter initial value respectively to obtain an optimal regularization parameter and an optimal insensitive loss function parameter;

and the model construction module is used for constructing a spectrum quantitative analysis model according to the optimal regularization parameter and the optimal insensitive loss function parameter.

10. The apparatus for constructing a quantitative spectral analysis model according to claim 9, wherein the distribution data includes a mean and a standard deviation, and the first calculating module is specifically configured to:

selecting a kernel function of a support vector machine;

Technical Field

The invention relates to the technical field of quantitative analysis, in particular to a method and a device for constructing a spectrum quantitative analysis model.

Background

Infrared spectroscopy is simple to operate, fast, efficient, and requires no sample pretreatment, and it is widely used in industries such as food, medicine, cosmetics, and petrochemicals. Machine learning and deep learning algorithms such as linear regression, support vector machines, and neural networks are common methods for building quantitative prediction models. Among linear regression methods, Partial Least Squares (PLS) regression is the most classical and widely used, but its variable screening is time-consuming and its applicability is difficult to guarantee. Neural networks involve many parameters, are relatively complex to construct, and have a higher barrier to use. The support vector machine, which is based on the principle of structural risk minimization, handles small-sample, high-dimensional, and nonlinear problems well, involves fewer parameters, is relatively simple to construct, and is widely used in regression analysis. Reasonable selection of the parameters C and ε of the support vector machine gives the model higher prediction accuracy and better generalization ability. The parameter C determines the trade-off between model prediction accuracy and model complexity. For example, if C is too large, the objective reduces to minimizing the empirical risk: the model fits the training data accurately, but it is too complex and generalizes poorly. The parameter ε controls the width of the ε-insensitive zone and affects the number of support vectors (SVs) used to construct the regression function, and thus the model complexity. For example, if ε is large, fewer SVs are selected and the model is too simple.

Existing methods construct the spectral quantitative analysis model from optimized support vector machine parameters, and these parameters are mainly optimized as follows: initial values of the parameters are set according to prior knowledge, and parameter optimization is then carried out with methods such as grid search and gradient descent. However, the parameter optimization of the existing methods performs poorly, so the prediction accuracy and generalization ability of the resulting spectral quantitative analysis model are low.

Disclosure of Invention

The invention provides a method and a device for constructing a spectral quantitative analysis model, which aim to solve the problem that the prediction precision and generalization capability of the spectral quantitative analysis model are low due to poor parameter optimization effect of the conventional method for constructing the spectral quantitative analysis model.

The first embodiment of the present invention provides a method for constructing a spectral quantitative analysis model, including:

acquiring a chemical value and spectral characteristics of a training sample;

calculating to obtain a regularization parameter initial value of the support vector machine according to the distribution data and the precision of the chemical value;

and constructing a spectrum quantitative analysis model according to the optimal regularization parameter and the optimal insensitive loss function parameter.

Further, the distribution data includes an average value and a standard deviation, and the calculating according to the distribution data and precision of the chemical value obtains an initial value of a regularization parameter of the support vector machine, which specifically includes:

selecting a kernel function of a support vector machine;

Further, the initial value of the insensitive loss function parameter of the support vector machine is obtained by calculation according to the number of samples of the training sample and the noise data, and the expression of the initial value of the insensitive loss function parameter is as follows:

Further, the performing parameter optimization according to the regularization parameter initial value and the insensitive loss function parameter initial value respectively to obtain an optimal regularization parameter and an optimal insensitive loss function parameter specifically includes:

Further, the parameter optimization method comprises a grid search method, a gradient descent method and a single-target optimization method.

Further, the kernel function is a radial basis kernel function, and the expression of the radial basis kernel function is:

K(x_i, x) = exp(-γ||x - x_i||²)

wherein x_i is a center point of the feature space, x is any point of the feature space, and γ is the width parameter.

Further, the expression of the regularization parameter initial value is as follows:

wherein p is the precision of the chemical value, ȳ is the mean value of the chemical values, and σ_y is the standard deviation of the chemical values.

Further, the expression of the noise standard deviation is as follows:

wherein n is the number of samples in the training set, d takes a value from 0 to 1, k is the number of neighboring points, y_i is a chemical value, and ŷ_i is the k-nearest-neighbor prediction.

A second embodiment of the present invention provides a device for constructing a spectral quantitative analysis model, including:

the data acquisition module is used for acquiring chemical values and spectral characteristics of the training samples;

Further, the distribution data includes a mean and a standard deviation, and the first calculating module is specifically configured to:

selecting a kernel function of a support vector machine;

Drawings

FIG. 1 is a schematic flow chart of a method for constructing a quantitative spectral analysis model according to an embodiment of the present invention;

FIG. 2 is another schematic flow chart of a method for constructing a quantitative spectral analysis model according to an embodiment of the present invention;

FIG. 3 is a cross-validation residual error diagram for a support vector machine according to an embodiment of the present invention;

FIG. 4 is a diagram illustrating a support vector machine prediction residual according to an embodiment of the present invention;

FIG. 5 is a schematic diagram of a PLS cross validation residual provided by an embodiment of the present invention;

FIG. 6 is a diagram illustrating PLS prediction residuals provided by an embodiment of the present invention;

FIG. 7 is a schematic structural diagram of a device for constructing a quantitative spectral analysis model according to an embodiment of the present invention.

Detailed Description

The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.

In the description of the present application, it is to be understood that the terms "first", "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implying any number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more of that feature. In the description of the present application, "a plurality" means two or more unless otherwise specified.

In the description of the present application, it is to be noted that, unless otherwise explicitly specified or limited, the terms "mounted," "coupled," and "connected" are to be construed broadly, e.g., as meaning a fixed connection, a removable connection, or an integral connection; a mechanical connection or an electrical connection; a direct connection or an indirect connection through an intervening medium; or internal communication between two elements. The specific meaning of the above terms in the present application can be understood in a specific case by those of ordinary skill in the art.

Referring to FIGS. 1-6, in a first embodiment of the present invention, a method for constructing a spectral quantitative analysis model, shown in FIG. 1, is provided, which includes:

S1, acquiring chemical values and spectral characteristics of the training samples;

S2, calculating to obtain a regularization parameter initial value of the support vector machine according to the distribution data and the precision of the chemical value; wherein the distribution data comprises a mean and a standard deviation;

S3, calculating to obtain noise data of the training sample according to the spectral characteristics, and calculating to obtain an initial value of an insensitive loss function parameter of the support vector machine according to the sample number of the training sample and the noise data;

S4, respectively carrying out parameter optimization according to the regularization parameter initial value and the insensitive loss function parameter initial value to obtain an optimal regularization parameter and an optimal insensitive loss function parameter;

S5, constructing a spectrum quantitative analysis model according to the optimal regularization parameters and the optimal insensitive loss function parameters.

According to the embodiment of the invention, the regularization parameter initial value and the insensitive loss function parameter initial value are calculated from the chemical values and spectral characteristics of the training samples, and parameter optimization is then carried out to determine the optimal regularization parameter and the optimal insensitive loss function parameter. A spectral quantitative analysis model for complex component analysis can then be constructed quickly from these optimal parameters. This effectively increases the operation speed of parameter optimization and modeling and effectively improves the prediction accuracy and generalization ability of the constructed spectral quantitative analysis model.

Fig. 2 is a schematic flow chart of a method for constructing a quantitative spectral analysis model according to an embodiment of the present invention.

As a specific implementation manner of the embodiment of the present invention, the calculating, according to the distribution data and the precision of the chemical value, to obtain the initial value of the regularization parameter of the support vector machine specifically includes:

selecting a kernel function of a support vector machine;

Optionally, the kernel function of the support vector machine is a radial basis kernel function, and the expression of the radial basis kernel function is:

K(x_i, x) = exp(-γ||x - x_i||²)

wherein x_i is a center point of the feature space, x is any point of the feature space, and γ is the width parameter.
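
As an illustration only, the radial basis kernel above can be evaluated in a few lines of Python, the language named in the implementation example of this description. The function name `rbf_kernel` and the representation of a spectrum as a plain list of numbers are assumptions made for this sketch and are not part of the embodiment.

```python
import math


def rbf_kernel(x_i, x, gamma):
    """Return K(x_i, x) = exp(-gamma * ||x - x_i||^2) for two equal-length vectors."""
    if len(x_i) != len(x):
        raise ValueError("x_i and x must have the same length")
    # Squared Euclidean distance between the two feature vectors.
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, x_i))
    return math.exp(-gamma * sq_dist)


# Identical points give a kernel value of 1; the value decays with distance.
print(rbf_kernel([0.0, 0.0], [0.0, 0.0], gamma=0.5))
print(rbf_kernel([0.0, 0.0], [1.0, 1.0], gamma=0.5))
```

A larger γ makes the kernel value fall off faster with distance, so each training sample influences a narrower region of the feature space.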

Based on the radial basis kernel function, the regularization parameter can be related to the range of the response values of the training set, so a regularization parameter equal to that range can be selected. Specifically, combining the chemical values of the training samples with the precision, the expression of the initial value C₀ of the regularization parameter is:

wherein p is the precision of the chemical value, ȳ is the mean value of the chemical values, and σ_y is the standard deviation of the chemical values.

According to the embodiment of the invention, the regularization parameter initial value of the support vector machine can be reasonably determined according to the distribution data and the precision of the chemical value, the parameter initial value does not need to be set according to the prior knowledge and resampling is not needed, so that the regularization parameter initial value of the support vector machine can be quickly and accurately obtained, and the parameter optimization effect of the support vector machine can be effectively improved.

As a specific implementation manner of the embodiment of the present invention, an initial value of an insensitive loss function parameter of a support vector machine is obtained by calculation according to the number of samples of a training sample and noise data, and an expression of the initial value of the insensitive loss function parameter is as follows:

wherein ε₀ is the initial value of the insensitive loss function parameter, n is the number of samples, t is a preset multiple of the number of samples, f(n) is a logarithmic function of the number of samples, and σ is the noise standard deviation. Illustratively, t is 1 to 10 times the number of samples, f(n) can be m·ln n, m·log n, and the like, where m is a positive real number, and the noise standard deviation σ is expressed as:

wherein n is the number of samples in the training set, d takes a value from 0 to 1, k is the number of neighboring points, y_i is a chemical value, and ŷ_i is the k-nearest-neighbor prediction.

In the embodiment of the invention, the insensitive loss function parameter is directly proportional to the input noise level and inversely proportional to the number of training set samples; that is, a larger number of samples should yield a smaller insensitive loss function parameter. Introducing a function of the number of training samples effectively prevents the initial value of the insensitive loss function parameter from approaching 0 when the number of samples is too large.
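
The noise estimate described above relies on k-nearest-neighbor predictions of the chemical values. The following Python sketch illustrates only that ingredient: it predicts each chemical value from its k nearest neighbors and reports the root-mean-square residual. It assumes Euclidean distance between spectral feature vectors, excludes each sample from its own neighborhood, and averages the neighbors' chemical values; none of these choices is specified in this description. The correction involving n, d and k in the expression of the noise standard deviation is not applied here, and the function names are illustrative.

```python
import math


def knn_predictions(X, y, k):
    """Predict each y[i] as the mean y of the k nearest OTHER samples (assumption)."""
    n = len(X)
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < number of samples")
    preds = []
    for i in range(n):
        # Squared Euclidean distance from sample i to every other sample.
        dists = sorted(
            (sum((a - b) ** 2 for a, b in zip(X[i], X[j])), j)
            for j in range(n)
            if j != i
        )
        neighbors = [j for _, j in dists[:k]]
        preds.append(sum(y[j] for j in neighbors) / k)
    return preds


def residual_rms(y, y_hat):
    """Root-mean-square of the residuals y_i - y_hat_i (uncorrected)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y))


# Toy data: four one-dimensional "spectra" with a linear chemical value.
X = [[0.0], [1.0], [2.0], [3.0]]
y = [0.0, 1.0, 2.0, 3.0]
y_hat = knn_predictions(X, y, k=2)
print(y_hat, residual_rms(y, y_hat))
```

The residual grows when neighboring spectra carry dissimilar chemical values, which is why it can serve as a proxy for the noise level of the training set.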

According to the embodiment of the invention, the initial value ε₀ of the insensitive loss function parameter is calculated from the number of samples and the noise data, so the influence of both on the construction of the spectral quantitative analysis model is taken into account. This way of determining the initial value not only effectively reduces the amount of computation in parameter optimization, but also effectively improves the prediction accuracy and generalization ability of the constructed model when it performs quantitative analysis on samples with complex components. In the embodiment of the present invention, the samples with complex components include soy sauce and its fermentation broth. Soy sauce is made by fermenting cooked soybeans, roasted cereals, brine, and Aspergillus oryzae or Aspergillus sojae; it contains dozens or even hundreds of compounds such as esters, alcohols, carbonyl compounds, acetals, and phenols, and the kinds and contents of these compounds are affected by the various process steps.

As a specific implementation manner of the embodiment of the present invention, parameter optimization is performed according to a regularization parameter initial value and an insensitive loss function parameter initial value, respectively, to obtain an optimal regularization parameter and an optimal insensitive loss function parameter, which specifically includes:

dividing samples of a training set into K folds, and calculating an evaluation parameter of each fold under a regularization parameter initial value and an insensitive loss function parameter initial value;

In the embodiment of the invention, the samples of the training set are divided into K folds for cross-validation, and evaluation parameters such as the cross-validation error SECV, R², and RPD are calculated. A parameter optimization method then searches near the regularization parameter initial value and the insensitive loss function parameter initial value to obtain a candidate regularization parameter C_i and a candidate insensitive loss function parameter ε_i. The cross-validation and parameter optimization steps are repeated until the support vector machine parameters with the optimal evaluation parameters are obtained.
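
The cross-validation and search loop described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation of the embodiment: SECV is taken to be the root-mean-square of the pooled cross-validation residuals, the folds are interleaved, the search is a small additive grid centered on the initial values, and `fit_predict` is a hypothetical callable standing in for training a support vector regression model and predicting the held-out fold. The stand-in `mean_model` ignores C and ε and exists only so that the sketch runs.

```python
import math


def k_fold_indices(n, k):
    """Split indices 0..n-1 into k interleaved folds."""
    return [list(range(f, n, k)) for f in range(k)]


def secv(X, y, k, fit_predict, C, eps):
    """Pooled root-mean-square cross-validation error for one (C, eps) pair."""
    n = len(y)
    sq_err = 0.0
    for test_idx in k_fold_indices(n, k):
        held_out = set(test_idx)
        train_idx = [i for i in range(n) if i not in held_out]
        preds = fit_predict(
            [X[i] for i in train_idx],
            [y[i] for i in train_idx],
            [X[i] for i in test_idx],
            C,
            eps,
        )
        sq_err += sum((y[i] - p) ** 2 for i, p in zip(test_idx, preds))
    return math.sqrt(sq_err / n)


def search_near(X, y, k, fit_predict, C0, eps0, step_C, step_eps, half_width=2):
    """Evaluate an additive grid centered on (C0, eps0); return (SECV, C, eps)."""
    best = None
    for i in range(-half_width, half_width + 1):
        for j in range(-half_width, half_width + 1):
            C, eps = C0 + i * step_C, eps0 + j * step_eps
            if C <= 0 or eps < 0:
                continue  # keep C positive and eps non-negative
            score = secv(X, y, k, fit_predict, C, eps)
            if best is None or score < best[0]:
                best = (score, C, eps)
    return best


def mean_model(train_X, train_y, test_X, C, eps):
    """Stand-in model (assumption): predicts the training mean, ignores C and eps."""
    mean = sum(train_y) / len(train_y)
    return [mean for _ in test_X]


X = [[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]]
y = [1.0, 1.2, 0.9, 1.1, 1.0, 0.8]
print(search_near(X, y, 3, mean_model, C0=2.0, eps0=0.25, step_C=0.5, step_eps=0.125))
```

Centering the grid on calculated initial values keeps the number of evaluated (C, ε) pairs small, which is the source of the speed-up claimed for the method; in practice `fit_predict` would wrap an actual support vector regression implementation.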

As a specific implementation manner of the embodiment of the present invention, the parameter optimization method includes a grid search method, a gradient descent method, and a single-target optimization method.

Referring to fig. 3-6, an embodiment of the present invention provides a specific implementation example:

Experimental soy sauce produced by a company's natural sun-drying process was taken as the sample, with glucose (g/100g) as the detection index. The instrument was a mid-infrared spectrometer, the analyzed spectral range was 968-2947 cm^-1, and 2 parallel measurements were performed. The parameter screening and model building process was implemented in Python.

In this example, the number of cross-validation folds K is 3 and the precision p of the chemical value is 5%. In the calculation formula for ε₀, f(n) = m·ln n with m = 3, the number of training set samples n is 284, and the number of neighboring points k is 3.

C₀ and ε₀, calculated from the formulas for the regularization parameter initial value and the insensitive loss function parameter initial value, are 2.296 and 0.0074, respectively.

The grid search step size was set to 0.001, the resulting minimum SECV was 0.0719, and the optimal parameters C and ε were 3.356 and 0.0046, respectively.

FIG. 3 shows the cross-validation residuals of the optimal SVM model obtained with the optimal parameters C and ε, and FIG. 4 shows its prediction results on the validation set (37 samples). FIG. 5 shows the cross-validation residuals of the optimal PLS model, whose SECV is 0.2549, and FIG. 6 shows its prediction results on the validation set.

As can be seen from FIGS. 3-6, compared with the classical PLS model, the spectral quantitative analysis model constructed by the embodiment of the invention has a smaller SECV, copes better with samples of abnormal concentration, and is more robust; in addition, its validation-set SEP is smaller, and its prediction accuracy and generalization performance are better.
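
To put the comparison in numbers, the following Python sketch computes the relative SECV reduction from the two values reported in this example (0.0719 for the support vector machine model and 0.2549 for the PLS model). The function name is illustrative and is not part of the embodiment.

```python
def relative_reduction(baseline, improved):
    """Fraction by which `improved` is lower than `baseline`."""
    return (baseline - improved) / baseline


secv_pls = 0.2549  # SECV of the optimal PLS model, as reported in this example
secv_svm = 0.0719  # minimum SECV of the SVM model, as reported in this example
reduction = relative_reduction(secv_pls, secv_svm)
print(f"SECV reduction relative to PLS: {reduction:.1%}")
```

On these reported values the SECV of the support vector machine model is roughly 72% lower than that of the PLS model.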

The embodiment of the invention has the following beneficial effects:

according to the embodiment of the invention, the regularization parameter initial value of the support vector machine can be reasonably determined according to the distribution data and the precision of the chemical value, the parameter initial value does not need to be set according to the prior knowledge and resampling is not needed, so that the regularization parameter initial value of the support vector machine can be quickly and accurately obtained, and the parameter optimization effect of the support vector machine can be effectively improved. According to the method and the device, the initial value of the insensitive loss function parameter is calculated according to the number of samples and noise data, and the influence of the number of samples and the noise data on the construction of the quantitative spectrum analysis model is comprehensively considered.

Referring to fig. 7, a second embodiment of the present invention provides an apparatus for constructing a quantitative spectral analysis model, including:

a data acquisition module 10, configured to acquire chemical values and spectral features of the training sample;

the first calculation module 20 is configured to calculate a regularization parameter initial value of the support vector machine according to the distribution data and precision of the chemical value; wherein the distribution data comprises a mean and a standard deviation;

the second calculation module 30 is configured to calculate noise data of the training sample according to the spectral feature, and calculate an initial value of an insensitive loss function parameter of the support vector machine according to the number of samples of the training sample and the noise data;

the parameter optimizing module 40 is configured to perform parameter optimizing according to the regularization parameter initial value and the insensitive loss function parameter initial value, respectively, to obtain an optimal regularization parameter and an optimal insensitive loss function parameter;

and the model building module 50 is used for building a spectrum quantitative analysis model according to the optimal regularization parameter and the optimal insensitive loss function parameter.

As a specific implementation manner of the embodiment of the present invention, the first calculating module 20 is specifically configured to:

selecting a kernel function of a support vector machine;

Optionally, the kernel function of the support vector machine is a radial basis kernel function, and the expression of the radial basis kernel function is:

K(x_i, x) = exp(-γ||x - x_i||²)

wherein x_i is a center point of the feature space, x is any point of the feature space, and γ is the width parameter.

wherein p is the precision of the chemical value, ȳ is the mean value of the chemical values, and σ_y is the standard deviation of the chemical values.

As a specific implementation manner of the embodiment of the present invention, an expression of an initial value of an insensitive loss function parameter is as follows:

wherein ε is the initial value of the insensitive loss function parameter, n is the number of samples, t is a preset multiple of the number of samples, f(n) is a logarithmic function of the number of samples, and σ is the noise standard deviation. Illustratively, t is 1 to 10 times the number of samples, f(n) can be m·ln n, m·log n, and the like, where m is a positive real number, and the noise standard deviation σ is expressed as:

wherein n is the number of samples in the training set, d takes a value from 0 to 1, k is the number of neighboring points, y_i is a chemical value, and ŷ_i is the k-nearest-neighbor prediction.

As a specific implementation manner of the embodiment of the present invention, the parameter optimizing module 40 is specifically configured to:

The foregoing is a preferred embodiment of the present invention, and it should be noted that it would be apparent to those skilled in the art that various modifications and enhancements can be made without departing from the principles of the invention, and such modifications and enhancements are also considered to be within the scope of the invention.
