Spectrogram chemometrics analysis method based on segmented intelligent optimization

Document No.: 1965081 | Publication date: 2021-12-14

Abstract: A spectrogram chemometrics analysis method based on segmented intelligent optimization (一种基于分段智能优选的谱图化学计量学解析方法), created by 刘阳, 詹辉, 何恺源, 邓晓旭 and 胡松松 on 2021-09-16. The invention first establishes a main model from all sample spectrograms and assay analysis data under intelligently optimized conditions. The samples used for modeling are then sorted by physical property data and divided into several segments according to the distribution of the property. Within each segment, an intelligent algorithm selects the best spectrogram preprocessing method, the spectrogram region used for modeling and other conditions, and a sub-model of the index is established for that segment, which improves the accuracy of the analysis results.

In addition, after the samples are sorted by physical property data, samples of similar type or structure and similar properties can be grouped into one class according to the property distribution, and a corresponding segmented model is established after the preprocessing method, modeling region and other conditions have been intelligently optimized for the spectrograms and data within that segment. Because a separate sub-model is established for each group of similar samples, the method can effectively address the low accuracy of existing quick evaluation analysis models when the physical property varies over a wide range.

1. A spectrogram chemometrics analysis method based on segmented intelligent optimization, characterized in that it comprises seven processes: data collection, spectrogram classification, spectrogram preprocessing, interval sequencing, main model intelligent optimization, sub-model intelligent optimization and spectrogram analysis, and specifically comprises the following steps:

(1) collecting data of the accumulated samples to obtain a sample spectrogram data set;

(2) performing spectrogram classification based on the sample spectrogram dataset, and dividing the spectrogram into a correction set and a verification set;

(3) respectively preprocessing the correction set and the verification set after the spectrogram classification;

(4) respectively carrying out interval sequencing on the preprocessed correction sets and verification sets to obtain sequenced correction sets and verification sets;

(5) intelligently optimizing a main model based on the correction set and the verification set data set;

(6) performing intelligent optimization of a sub-model based on the correction set and the verification set data set;

(7) carrying out spectrogram analysis on a new sample spectrogram data set using the main model and the sub-models obtained through intelligent optimization, and outputting the analysis result.

2. The spectrogram chemometrics analysis method based on segmented intelligent optimization of claim 1, wherein step (1) is specifically as follows: scanning the accumulated samples with a quick evaluation analyzer and collecting the spectrum data set X (m × n) of the samples, where m is the number of samples and n is the number of data points in a spectrogram; while collecting the spectrogram data, analyzing the macroscopic properties of the samples with standard methods and conventional analytical instruments and collecting a physical property assay analysis data set Y (m × n'), where n' is the number of physical properties analyzed for the samples; when a given physical property is modeled, the column vector of that property is extracted from data set Y and appended as the last column of the spectrogram data set X, forming a new data set Z (m × (n+1)) for the subsequent model building.

3. The spectrogram chemometrics analysis method based on segmented intelligent optimization of claim 1, wherein step (2) is specifically as follows: dividing the spectrograms in the data set into a correction set Z_c (m₁ × (n+1)) and a verification set Z_v (m₂ × (n+1)) according to a Kolmogorov-Smirnov algorithm, where m₁ + m₂ = m; the sample proportion of the two data sets can be set manually, and the correction set is preferably set in the range of 60-80%.

4. The spectrogram chemometrics analysis method based on segmented intelligent optimization of claim 1, wherein step (3) is specifically as follows: preprocessing the data of the correction set Z_c, except for the last column, to form a new data set Z_c-pre; the preprocessing method is any one or more of smoothing, derivative, vector normalization, mean centering, mean-variance scaling, standard normal variate transformation and multiplicative scatter correction; preprocessing may also be omitted, and the types and number of preprocessing methods that may be selected are limited manually during modeling; likewise, the verification set Z_v is preprocessed with the same method as the correction set to form a data set Z_v-pre.

5. The spectrogram chemometrics analysis method based on segmented intelligent optimization of claim 1, wherein step (4) is specifically as follows: for the preprocessed correction set Z_c-pre, performing correlation analysis between each of columns 1 to n and column n+1, and rearranging columns 1 to n in descending order of correlation coefficient to form a new data set Z_c-red; the column number that each column of Z_c-red had in the original data set Z_c-pre is recorded as data set D (1 × n); similarly, columns 1 to n of the preprocessed verification set Z_v-pre are rearranged according to the column numbers in data set D to form a new data set Z_v-red.

6. The spectrogram chemometrics analysis method based on segmented intelligent optimization of claim 1, wherein step (5) is specifically as follows: under a given preprocessing method, the following operations are performed: the first 1 column, the first 2 columns, the first 3 columns, ..., the first n columns of Z_c-red are selected in turn and each is associated with the assay analysis data in column n+1 by the partial least squares method, giving regression coefficient data sets B₁, B₂, ..., B_n, which are the main models of the physical property at different data dimensions; data sets B₁, B₂, ..., B_n are then applied to the first 1 column, the first 2 columns, the first 3 columns, ..., the first n columns of the verification set Z_v-red, respectively, to calculate the physical property analysis data of the m₂ samples; the analysis result of each sample is compared with the actual assay analysis data, and the individual errors of the m₂ samples and the overall average errors E₁, E₂, E₃, ..., E_n are calculated for each case; if the minimum average error is E_k, then data set B_k is the optimal main model under that preprocessing method; the other preprocessing methods permitted by the limiting conditions are then applied to the correction set spectrograms in turn, the minimum average error E_k and the corresponding coefficient data set B_k are calculated for each case, the E_k values of all cases are compared, and the smallest average error E_k-best and the corresponding coefficient data set B_k-best are selected, B_k-best being the intelligently optimized best main model.

7. The spectrogram chemometrics analysis method based on segmented intelligent optimization of claim 1, wherein step (6) is specifically as follows: selecting p segments of data from the correction set according to the distribution of the physical property data of the samples, recorded as Z_c1 (m₁₁ × (n+1)), Z_c2 (m₁₂ × (n+1)), ..., Z_cp (m_1p × (n+1)), where m₁₁ + m₁₂ + ... + m_1p ≤ m₁; then repeating steps (3) to (5) for the samples of each segment and, following the intelligent optimization method of the main model, finding the minimum average errors E_k1-best, E_k2-best, ..., E_kp-best of the segments and the corresponding coefficient data sets B_k1-best, B_k2-best, ..., B_kp-best, which are the intelligently optimized best sub-models of the segments.

8. The spectrogram chemometrics analysis method based on segmented intelligent optimization of claim 1, wherein step (7) is specifically as follows:

the new sample spectrogram data set Z_n (h × n) is processed with the preprocessing method and the interval sequencing selected when the main model was established, giving a data set Z_n-red; the regression coefficient data set B_k-best of the main model is applied to the first k columns of Z_n-red to calculate the main model analysis result R_t of the new sample;

if the main model analysis result does not fall within the physical property value range of any data set in step (6), the final analysis result is R_n; if it falls within the j-th physical property value range in step (6), the sub-model coefficient data set B_kj-best of that range is applied to the first k columns of Z_n-red, and the sub-model prediction result R_nj is the final prediction result.

Technical Field

The invention relates to the technical field of quick evaluation analysis of various raw materials and intermediate products in the petrochemical industry, in particular to a spectrogram chemometrics analysis method based on segmented intelligent optimization.

Background

For the petrochemical industry, the analysis and evaluation of raw materials and intermediate products play an important role in the production process. Enterprises need to adjust the process conditions and parameters of each device in the production process in time through real-time analysis data of key indexes of raw materials and intermediate products, so that the economic benefits of the whole plant are improved as much as possible while the product quality is ensured to meet the requirements. Therefore, the accuracy and timeliness of the analysis data of each index are very important.

Generally, many kinds of raw materials or products must be analyzed, and the analysis indexes differ from one material to another. In traditional assay analysis, each index usually has to be analyzed separately by trained personnel using a specific instrument, so the whole analysis requires several people and several instruments working at the same time. The analysis cost is therefore high, timeliness is poor, efficiency is low, and it is difficult to provide timely and accurate analysis data for production.

In recent years, many large petrochemical enterprises at home and abroad have gradually adopted modern instruments for the quick evaluation analysis of raw materials and products, including near-infrared spectroscopy (NIR), mid-infrared spectroscopy (MIR) and nuclear magnetic resonance spectroscopy (NMR). These quick evaluation analysis methods first scan a sample with an instrument to obtain the corresponding spectrum, and then determine several indexes of the sample at once, in a short time, by analyzing that spectrum. Quick evaluation analysis greatly improves the efficiency of sample analysis, reduces the workload of laboratory personnel and the analysis cost, and provides timely analysis data for the key indexes of each material to guide production.

For the quick evaluation analysis method, the core process is the analysis of the sample spectrogram, which comprises two stages: (1) Establishing an analysis model. Before modeling, a certain number of samples with assay analysis data obtained by standard analysis methods must be accumulated and scanned with a quick evaluation analyzer to obtain the corresponding spectrograms. During modeling, the spectrograms of the accumulated samples are processed by a specific method and then associated with the assay analysis data to establish an analysis model for each index. (2) Analyzing the spectrogram. The established analysis model is used to rapidly analyze the spectrogram of a new sample scanned by the quick evaluation analyzer, giving the analysis data for each index of the sample. The quality of each index analysis model therefore directly determines the accuracy of the quick evaluation analysis data.

At present, an analysis model for each index is established by first preprocessing the sample spectrograms with methods such as smoothing, derivative, normalization and multiplicative scatter correction, then selecting a suitable region according to the characteristics of the index and its relationship to the structures represented by the characteristic peaks in the spectrogram, and finally associating the selected region with the assay analysis data by a chemometric method such as partial least squares to establish a physical property analysis model for the index. This approach is widely used, but overall the model accuracy still needs to be improved.

Disclosure of Invention

The invention aims to overcome the above shortcomings by providing a spectrogram chemometrics analysis method based on segmented intelligent optimization. In addition to the main model, the invention sorts the samples by physical property data, groups samples of similar type or structure and similar properties into one class according to the property distribution, and establishes a corresponding segmented model after intelligently optimizing the preprocessing method, modeling region and other conditions for the spectrograms and data within that segment. Because segmentation by property distribution groups samples of similar type or structure together, and a separate sub-model is established for each group under intelligently optimized conditions, the accuracy of the model is greatly improved, which effectively addresses the low accuracy of existing quick evaluation analysis models when the physical property varies over a wide range.

The invention achieves this aim through the following technical scheme: a spectrogram chemometrics analysis method based on segmented intelligent optimization.

After a certain number of sample spectrograms and the corresponding assay analysis data have been collected, the spectrograms are divided into a correction set and a verification set according to a given algorithm. Establishing the main model consists of intelligently optimizing the preprocessing method and the spectrogram interval range for the spectrograms of all samples in the correction set, and then associating the spectrograms with the conventional assay analysis data of the index by a chemometric method such as partial least squares. The sample spectrograms in the correction set can be preprocessed with one or more mathematical methods such as smoothing, derivative, vector normalization, mean centering, mean-variance scaling, standard normal variate transformation and multiplicative scatter correction. After preprocessing, correlation analysis is performed between each data point of the correction set spectrograms and the assay analysis value of the index, and the data points of the spectrograms are reordered by the magnitude of the correlation coefficient.

When the main model is established, the spectrograms are processed with a specific preprocessing method and their data points are arranged in descending order of correlation coefficient with the modeling index. The first 1, the first 2, the first 3, ..., the first n data points (n being the total number of data points in a sample spectrogram) are then selected in turn in that order and associated with the assay analysis data of the index to establish a main model of the index. After the main model has been established with the first data point, each spectrogram in the verification set is predicted with it, the predictions are compared with the actual assay analysis data, and the average error is calculated. In the second round the first 2 data points are used: after the main model is established, each spectrogram in the verification set is predicted, the average error is calculated and compared with that of the first round, and the main model with the smaller average error is kept. This is repeated, each time calculating the average error of the newly established main model on the verification set and comparing it with that of the main model kept so far. After the n-th main model has been established, the main model with the minimum average error is retained as the optimal main model for that preprocessing method.

After the optimal main model under one spectrogram preprocessing method has been selected, the preprocessing method applied to the correction set can be changed as required. Spectrogram preprocessing may use one method, a combination of several methods, or none, and the types and number of preprocessing methods that may be selected can be limited manually when each index is modeled. Within these limits, the optimal main model is found for each preprocessing method, the average errors obtained when each of these main models analyzes the verification set spectrograms are compared, and the main model with the minimum average error is selected as the intelligently optimized main model.

When the physical property value varies over a wide range, segmented models can be established for samples in different property ranges in addition to the main model. To establish a segmented model, the samples in the correction set are first sorted by physical property data, and samples of similar type or structure and similar properties are then grouped into one class according to the property distribution. Usually one or more segments are taken from the regions where the property values of the correction set samples are densely distributed, but each segment must contain enough samples to ensure the accuracy of the sub-model.

After segmentation, the spectrogram preprocessing method and interval range are intelligently optimized for the samples in each segment. The procedure is similar to that of the main model: the spectrograms in the segment are processed with a specific preprocessing method, their data points are arranged in descending order of correlation coefficient with the modeling index, sub-models of the index for that segment are established with successively more data points taken in that order, and the optimal sub-model is selected from them. Within the limiting conditions, the optimal sub-model is then found for each preprocessing method, the results are compared, and the sub-model with the minimum average error is selected as the intelligently optimized sub-model.

When a new sample spectrogram is analyzed, it is first analyzed with the main model, and the predicted index value is checked against the physical property value ranges used when the sub-models were established. If the prediction does not fall within any of these ranges, the main model prediction is the final result; if it falls within one of them, the spectrogram is analyzed again with the sub-model of that range, and the sub-model prediction is the final result.
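The routing between main model and sub-models can be sketched in a few lines of Python. This is a minimal illustration rather than the patented implementation: the names `analyze`, `main_model` and `sub_models` are hypothetical, the models are represented as plain callables, and treating each physical property range as a half-open interval [low, high) is an assumption that the text does not specify.

```python
def analyze(spectrum, main_model, sub_models):
    """Predict one property value for a spectrum.

    main_model: callable mapping a spectrum to a predicted value.
    sub_models: list of ((low, high), callable) pairs, one per segment.
    """
    main_result = main_model(spectrum)
    for (low, high), sub_model in sub_models:
        # Assumed convention: a range covers low <= value < high.
        if low <= main_result < high:
            return sub_model(spectrum)  # sub-model result is final
    return main_result  # no range matched: main model result is final


# Toy models (hypothetical): the main model sums the data points,
# the single sub-model doubles that sum.
main = lambda s: sum(s)
subs = [((1.0, 2.0), lambda s: 2 * sum(s))]

in_range = analyze([0.5, 1.0], main, subs)      # main result 1.5 lies in [1.0, 2.0)
out_of_range = analyze([2.0, 3.0], main, subs)  # main result 5.0 matches no range
```

The main model acts only as a selector when its prediction lands inside a segment; the value reported in that case comes entirely from the sub-model.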

The method comprises seven processes: data collection, spectrogram classification, spectrogram preprocessing, interval sequencing, main model intelligent optimization, sub-model intelligent optimization and spectrogram analysis.

(1) Data collection

Before modeling, a certain number of samples must be accumulated. First, the accumulated samples are scanned with a quick evaluation analyzer, and the spectrum data set X (m × n) of the samples is collected, where m is the number of samples and n is the number of data points in a spectrogram. In general, the larger the number of samples m, the higher the accuracy of the data and the model. While the spectrogram data are collected, the macroscopic properties of the samples are analyzed with standard methods and conventional analytical instruments, and a physical property assay analysis data set Y (m × n') is collected, where n' is the number of physical properties analyzed for the samples. When a given physical property is modeled, the column vector of that property is extracted from data set Y and appended as the last column of the spectrogram data set X, forming a new data set Z (m × (n+1)) for the subsequent model building.
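As a small illustration of how Z is assembled from X and Y, the sketch below appends one property column of Y to each spectrum in X. The function name `build_modeling_set`, the use of nested lists and the toy numbers are assumptions made for the example only.

```python
def build_modeling_set(X, Y, property_index):
    """Append one property column of Y to X, giving Z with n + 1 columns."""
    if len(X) != len(Y):
        raise ValueError("X and Y must describe the same m samples")
    return [list(x_row) + [y_row[property_index]] for x_row, y_row in zip(X, Y)]


# Toy data: m = 2 samples, n = 3 spectral points, n' = 2 measured properties.
X = [[0.10, 0.20, 0.30],
     [0.40, 0.50, 0.60]]
Y = [[0.85, 1.2],
     [0.91, 2.4]]

Z = build_modeling_set(X, Y, property_index=1)  # model the second property
```

Each row of Z then holds the n spectral points of one sample followed by its assay value, which is the layout the later steps rely on.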

(2) Spectrogram classification

After the sample spectrograms and the corresponding assay analysis data have been collected, the spectrograms are divided into a correction set Z_c (m₁ × (n+1)) and a verification set Z_v (m₂ × (n+1)) according to a Kolmogorov-Smirnov algorithm, where m₁ + m₂ = m. The sample proportion of the two data sets can be set manually, and the correction set is generally set in the range of 60-80%.
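To make the proportions and the constraint m₁ + m₂ = m concrete, the sketch below partitions sample indices with a seeded random shuffle. The shuffle is only a stand-in chosen for brevity; it is not the Kolmogorov-Smirnov-based selection named in the text, and `split_indices` and its parameters are hypothetical.

```python
import random


def split_indices(m, correction_fraction=0.8, seed=0):
    """Split sample indices 0..m-1 into correction and verification sets."""
    if not 0.0 < correction_fraction < 1.0:
        raise ValueError("correction_fraction must lie strictly between 0 and 1")
    m1 = round(m * correction_fraction)      # size of the correction set
    order = list(range(m))
    random.Random(seed).shuffle(order)       # stand-in for the real selection rule
    return sorted(order[:m1]), sorted(order[m1:])


# 412 samples with an 80% correction set, the sizes used in Example 1.
correction, verification = split_indices(412, 0.8)
```

Whatever selection rule is actually used, every sample should end up in exactly one of the two sets.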

(3) Spectrogram preprocessing

The data of the correction set Z_c, except for the last column, can be preprocessed with mathematical methods such as smoothing, derivative, vector normalization, mean centering, mean-variance scaling, standard normal variate transformation and multiplicative scatter correction to form a new data set Z_c-pre. One or more preprocessing methods may be selected, or none, and the types and number of preprocessing methods that may be selected can be limited manually during modeling. Likewise, the verification set Z_v is preprocessed with the same method as the correction set to form a data set Z_v-pre.
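The sketch below shows the shape of this step for two of the listed methods, standard normal variate transformation and vector normalization, plus the "no preprocessing" option. It is a simplified illustration: the function names and the `PREPROCESSORS` table are hypothetical, the sample standard deviation (divisor n - 1) in `snv` is an assumed convention, and constant or all-zero spectra are not handled.

```python
import math


def snv(row):
    """Standard normal variate: centre and scale one spectrum by its own mean and std."""
    mean = sum(row) / len(row)
    std = math.sqrt(sum((v - mean) ** 2 for v in row) / (len(row) - 1))
    return [(v - mean) / std for v in row]


def vector_normalize(row):
    """Scale one spectrum to unit Euclidean length."""
    norm = math.sqrt(sum(v * v for v in row))
    return [v / norm for v in row]


PREPROCESSORS = {
    "none": lambda row: list(row),
    "snv": snv,
    "vector_normalization": vector_normalize,
}


def preprocess(Z, method):
    """Apply a row-wise method to the spectral columns; keep the last (property) column."""
    transform = PREPROCESSORS[method]
    return [transform(row[:-1]) + [row[-1]] for row in Z]


# The same method is applied to the correction set and the verification set.
Z_c = [[1.0, 2.0, 3.0, 5.0]]
Z_v = [[2.0, 4.0, 6.0, 7.0]]
Z_c_pre = preprocess(Z_c, "snv")
Z_v_pre = preprocess(Z_v, "snv")
```

Both methods shown here work on one spectrum at a time, so applying them to the verification set needs no statistics from the correction set; column-wise methods such as mean centering would need the correction set means to be reused.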

(4) Interval sequencing

For data set Z_c-pre, correlation analysis is performed between each of columns 1 to n and column n+1, and columns 1 to n are rearranged in descending order of correlation coefficient to form a new data set Z_c-red. The column number that each column of Z_c-red had in the original data set Z_c-pre is recorded as data set D (1 × n). Likewise, columns 1 to n of Z_v-pre are rearranged according to the column numbers in data set D to form a new data set Z_v-red.
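A possible reading of this step is sketched below with Pearson's correlation coefficient. Ranking by the absolute value of the coefficient is an assumption (the text only says "magnitude"), the function names are hypothetical, and the order D is stored with 0-based column indices rather than the 1-based column numbers used in the text.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient of two equally long sequences."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a constant column carries no correlation information
    return cov / math.sqrt(var_a * var_b)


def correlation_order(Z_c_pre):
    """Return D: spectral column indices sorted by descending |correlation| with the property."""
    n = len(Z_c_pre[0]) - 1
    target = [row[-1] for row in Z_c_pre]
    scores = [abs(pearson([row[j] for row in Z_c_pre], target)) for j in range(n)]
    return sorted(range(n), key=lambda j: scores[j], reverse=True)


def reorder(Z, D):
    """Rearrange the spectral columns of Z in the order D; keep the property column last."""
    return [[row[j] for j in D] + [row[-1]] for row in Z]


# Toy correction set: column 0 is constant, column 1 tracks the property exactly,
# column 2 correlates with it at r = 0.8.
Z_c_pre = [[1.0, 1.0, 1.0, 1.0],
           [1.0, 2.0, 3.0, 2.0],
           [1.0, 3.0, 2.0, 3.0],
           [1.0, 4.0, 4.0, 4.0]]
D = correlation_order(Z_c_pre)
Z_c_red = reorder(Z_c_pre, D)
```

The same D is passed to `reorder` for the verification set and, later, for new samples, so that every data set shares one column order.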

(5) Main model intelligent optimization

Under a given preprocessing method, the following operations are performed. The first 1 column, the first 2 columns, the first 3 columns, ..., the first n columns of Z_c-red are selected in turn and each is associated with the assay analysis data in column n+1 by the partial least squares method, giving regression coefficient data sets B₁, B₂, ..., B_n, which are the main models of the physical property at different data dimensions. Data sets B₁, B₂, ..., B_n are then applied to the first 1 column, the first 2 columns, the first 3 columns, ..., the first n columns of the verification set Z_v-red, respectively, to calculate the physical property analysis data of the m₂ samples. The analysis result of each sample is compared with the actual assay analysis data, and the individual errors of the m₂ samples and the overall average errors E₁, E₂, E₃, ..., E_n are calculated for each case. If the minimum average error is E_k, then data set B_k is the optimal main model under that preprocessing method.

The other preprocessing methods permitted by the limiting conditions are then applied to the correction set spectrograms in turn, and the minimum average error E_k and the corresponding coefficient data set B_k are calculated for each case. The E_k values of all cases are compared, and the smallest average error E_k-best and the corresponding coefficient data set B_k-best are selected; B_k-best is the intelligently optimized best main model.
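The inner search over k for one preprocessing method can be sketched as follows. Several points are assumptions rather than details given in the text: the partial least squares fit is reduced to a single latent variable (PLS1), the "average error" is taken to be the mean absolute error on the verification set, and all function names are hypothetical. The outer comparison across preprocessing methods would simply repeat `select_main_model` per method and keep the smallest error.

```python
import math


def fit_pls1(X, y):
    """One-latent-variable PLS1 fit; returns (coefficients, intercept)."""
    m, k = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / m for j in range(k)]
    y_mean = sum(y) / m
    Xc = [[row[j] - x_mean[j] for j in range(k)] for row in X]
    yc = [v - y_mean for v in y]
    # The weight vector is proportional to X^T y on the centred data.
    w = [sum(Xc[i][j] * yc[i] for i in range(m)) for j in range(k)]
    norm = math.sqrt(sum(v * v for v in w))
    if norm == 0.0:
        return [0.0] * k, y_mean  # no covariance with y: predict the mean
    w = [v / norm for v in w]
    t = [sum(Xc[i][j] * w[j] for j in range(k)) for i in range(m)]  # scores
    q = sum(ti * yi for ti, yi in zip(t, yc)) / sum(ti * ti for ti in t)
    b = [wj * q for wj in w]
    return b, y_mean - sum(bj * xm for bj, xm in zip(b, x_mean))


def predict(X, b, intercept):
    return [intercept + sum(bj * xj for bj, xj in zip(b, row)) for row in X]


def select_main_model(Z_c_red, Z_v_red):
    """Try the first 1..n columns and return (E_k, k, B_k, intercept) with the smallest E_k."""
    y_c = [row[-1] for row in Z_c_red]
    y_v = [row[-1] for row in Z_v_red]
    best = None
    for k in range(1, len(Z_c_red[0])):
        b, intercept = fit_pls1([row[:k] for row in Z_c_red], y_c)
        predicted = predict([row[:k] for row in Z_v_red], b, intercept)
        error = sum(abs(p - a) for p, a in zip(predicted, y_v)) / len(y_v)
        if best is None or error < best[0]:
            best = (error, k, b, intercept)
    return best


# Toy data, columns already sorted by correlation: the property equals 2 * column 0 + 1.
Z_c_red = [[1.0, 0.5, 3.0], [2.0, -0.2, 5.0], [3.0, 0.3, 7.0], [4.0, -0.4, 9.0]]
Z_v_red = [[5.0, 0.1, 11.0], [6.0, -0.3, 13.0]]
E_k, k, B_k, intercept = select_main_model(Z_c_red, Z_v_red)
```

In this toy case the search settles on k = 1, because adding the weakly related second column only increases the verification error.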

(6) Sub-model intelligent optimization

According to the distribution of the physical property data of the samples, p segments of data are selected from the correction set and recorded as Z_c1 (m₁₁ × (n+1)), Z_c2 (m₁₂ × (n+1)), ..., Z_cp (m_1p × (n+1)), where m₁₁ + m₁₂ + ... + m_1p ≤ m₁. Steps (3) to (5) are then repeated for the samples of each segment and, following the intelligent optimization method of the main model, the minimum average errors E_k1-best, E_k2-best, ..., E_kp-best of the segments and the corresponding coefficient data sets B_k1-best, B_k2-best, ..., B_kp-best are found; these are the intelligently optimized best sub-models of the segments.
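Selecting the segment data sets can be illustrated with a short filter over the property column. The function name `segment_by_property`, the half-open range convention and the toy values are assumptions; how the range boundaries are chosen from the property distribution is left to the modeler, as in the text.

```python
def segment_by_property(Z_c, ranges):
    """Return one sub-dataset per (low, high) range of the property in the last column.

    Rows whose property value lies outside every range are left out,
    so the segment sizes add up to at most the size of Z_c.
    """
    return [[row for row in Z_c if low <= row[-1] < high] for low, high in ranges]


# Toy correction set with one spectral point per sample and the property last.
Z_c = [[0.11, 0.2], [0.12, 0.9], [0.13, 1.5], [0.14, 3.0]]
segments = segment_by_property(Z_c, [(0.0, 1.0), (1.0, 2.0)])
sizes = [len(segment) for segment in segments]
```

Each segment is then treated like a small correction set of its own when steps (3) to (5) are repeated.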

(7) Spectrogram analysis

The new sample spectrogram data set Z_n (h × n) is processed with the preprocessing method and the interval sequencing selected when the main model was established, giving a data set Z_n-red. The regression coefficient data set B_k-best of the main model is applied to the first k columns of Z_n-red to calculate the main model analysis result R_t of the new sample.

The invention has the following beneficial effects: (1) The quick evaluation spectrogram of a sample is predicted by a combined main model and sub-model analysis. When the physical property varies over a wide range, the samples may include several different types or structures, and modeling all the data together greatly affects the model precision; in that case, establishing an additional sub-model for a specific region can greatly reduce the prediction error for samples whose property lies in that region and improve the accuracy of the prediction. (2) The spectrogram interval range is selected intelligently. During modeling, correlation analysis is performed between each data point of the spectrogram and the physical property data, and the spectrogram region used for modeling is selected automatically according to the correlation, which improves the accuracy of the model while reducing manual operation and judgment and making modeling more convenient. (3) Spectrogram preprocessing is richer and more intelligent. During modeling, preprocessing can use a variety of mathematical algorithms such as smoothing, derivative, vector normalization, mean centering, mean-variance scaling, standard normal variate transformation and multiplicative scatter correction, and the best preprocessing method is selected during modeling according to the model performance of each method.

Drawings

FIG. 1 is a flow chart of the detection of the present invention;

FIG. 2 is a property profile of sulfur content of crude oil according to an embodiment of the present invention;

FIG. 3 is a property profile of the research octane number of gasoline according to an embodiment of the present invention;

FIG. 4 is a graph comparing the average error for the sulfur content of crude oil under different pretreatment conditions according to an embodiment of the present invention;

FIG. 5 is a graph comparing the average error of gasoline octane under different pretreatment conditions in accordance with an embodiment of the present invention.

Detailed Description

The invention will be further described with reference to specific examples, but the scope of the invention is not limited thereto:

Example 1: The spectrogram chemometrics analysis method based on segmented intelligent optimization is used for quick evaluation analysis of the infrared spectra of crude oil at a refinery. The operation comprises seven processes: data collection, spectrogram classification, spectrogram preprocessing, interval sequencing, main model intelligent optimization, sub-model intelligent optimization and spectrogram analysis.

(1) Data collection

Before modeling, a certain number of samples must be accumulated. First, 412 accumulated crude oil samples are scanned with a mid-infrared analyzer, and a mid-infrared spectrum data set X (412 × 1804) of the samples is collected; the mid-infrared spectrum of each crude oil covers a wavenumber range of about 650-4000 cm^-1, which is resolved into 1804 data points. While the spectrogram data are collected, 28 macroscopic properties of the crude oil samples, such as density, sulfur content, nitrogen content, carbon residue, acid value, viscosity, water content, salt content and distillation range, are analyzed with standard methods and conventional analytical instruments, and a physical property assay analysis data set Y (412 × 28) is collected. When a given physical property is modeled, the column vector of that property is extracted from data set Y and appended as the last column of the spectrogram data set X, forming a new data set Z (412 × 1805) for the subsequent model building. The sulfur content of crude oil is used as the example below.

(2) Spectrogram classification

The mid-infrared spectrograms of the crude oil are divided into a correction set Z_c (330 × 1805) and a verification set Z_v (82 × 1805) according to a Kolmogorov-Smirnov algorithm, with the correction set accounting for 80% of the total number of samples.

(3) Spectrogram preprocessing

For the sample spectrogram data set Z_c in the correction set, the preprocessing methods for the first 1804 columns are limited to 7 types (smoothing, derivative, vector normalization, mean centering, mean-variance scaling, standard normal variate transformation and multiplicative scatter correction), and the number of preprocessing methods applied is set to 0 or 1, so there are 8 possible preprocessing schemes for Z_c. The preprocessed correction set is denoted Z_c-pre (330 × 1805). Likewise, once a preprocessing method has been selected for the correction set, the verification set Z_v is preprocessed with the same method to form the data set Z_v-pre (82 × 1805).
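The count of 8 schemes follows from choosing either no method or exactly one of the 7 methods. The sketch below enumerates the candidate schemes for a given limit; `candidate_schemes` is a hypothetical helper, and treating a combination as an unordered set (ignoring the order in which methods would be applied) is an assumption.

```python
from itertools import combinations

METHODS = [
    "smoothing",
    "derivative",
    "vector normalization",
    "mean centering",
    "mean-variance scaling",
    "standard normal variate transformation",
    "multiplicative scatter correction",
]


def candidate_schemes(methods, max_methods):
    """List every scheme that uses between 0 and max_methods of the given methods."""
    schemes = []
    for count in range(max_methods + 1):
        schemes.extend(combinations(methods, count))
    return schemes


# With at most one method per scheme: the empty scheme plus 7 single-method schemes.
schemes = candidate_schemes(METHODS, max_methods=1)
```

Raising the limit to two methods would already give 1 + 7 + 21 = 29 unordered schemes, which is why the text lets the modeler cap the types and number of methods.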

(4) Section ordering

With no preprocessing method selected, each of columns 1 to 1804 of the data set Z_c-pre is subjected to correlation analysis against the sulfur content values in column 1805, and columns 1 to 1804 are rearranged in descending order of correlation coefficient to form a new data set Z_c-red (330 × 1805). The sequence number that each column of Z_c-red had in the original data set Z_c-pre is recorded as data set D (1 × 1805). At the same time, columns 1 to 1804 of Z_v-pre are rearranged according to the sequence numbers in D to form a new data set Z_v-red (82 × 1805).
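The column ordering can be sketched as below. Two points are assumptions rather than statements from the source: the ranking uses the absolute value of the Pearson correlation coefficient (switchable via use_abs), and D here holds only the spectral column indices, without an entry for the property column. The function names are hypothetical.

```python
import numpy as np

def order_by_correlation(Zc, use_abs=True):
    """Return D, the spectral column indices sorted by correlation with the last column."""
    X, y = Zc[:, :-1], Zc[:, -1]
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    # Pearson correlation of every spectral column with the property values.
    # A constant column would give a zero denominator and is not handled here.
    r = (Xc.T @ yc) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc))
    score = np.abs(r) if use_abs else r
    D = np.argsort(-score, kind="stable")         # descending order
    return D, r

def reorder(Z, D):
    """Rearrange the spectral columns of Z by D; the property column stays last."""
    return np.hstack([Z[:, :-1][:, D], Z[:, -1:]])

# Tiny deterministic demo with three "wavenumber" columns.
y = np.arange(10.0)
alt = np.tile([0.0, 1.0], 5)
Zc_demo = np.column_stack([alt, 2 * y + 1, -y + 3 * alt, y])
D, r = order_by_correlation(Zc_demo)
Zc_red = reorder(Zc_demo, D)
print(D, np.round(r, 3))
```

The same D is then applied to the validation set and to any new sample, so that "the first k columns" always refers to the same wavenumbers.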

(5) Intelligent optimization of master model

The first 1 column, the first 2 columns, the first 3 columns, ..., and the first 1804 columns of Z_c-red are selected in turn and each is regressed against the sulfur content values in column 1805 by partial least squares, giving regression coefficient data sets B₁, B₂, ..., B₁₈₀₄, that is, the main models of the physical property at different data dimensions. The data sets B₁, B₂, ..., B₁₈₀₄ are then applied to the first 1 column, the first 2 columns, the first 3 columns, ..., and the first 1804 columns of the validation set Z_v-red, respectively, to obtain physical property analysis data for the 82 validation samples. The analysis result for each sample is compared with the actual assay data, and the individual errors and the overall average errors E₁, E₂, E₃, ..., E₁₈₀₄ of the 82 samples are calculated for each case. The minimum average error is E₁₇₀₉; that is, the average error is smallest when the top 1709 columns with the largest correlation coefficients are used to model sulfur content. The data set B₁₇₀₉ is then the optimal main model.
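The sweep over the number of leading columns can be sketched as follows. The source does not state the number of latent variables or the exact error measure, so n_components and the mean absolute error used here are assumptions; the PLS1 routine is a plain NIPALS implementation written for this illustration, and all names are hypothetical.

```python
import numpy as np

def pls1_fit(X, y, n_components):
    """PLS1 regression by NIPALS; returns (coefficients, intercept)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xr, yr = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xr.T @ yr
        norm = np.linalg.norm(w)
        if norm < 1e-12:                 # nothing left to explain
            break
        w = w / norm
        t = Xr @ w                       # scores
        tt = t @ t
        p = Xr.T @ t / tt                # X loadings
        c = (yr @ t) / tt                # y loading
        Xr = Xr - np.outer(t, p)         # deflate X and y
        yr = yr - c * t
        W.append(w); P.append(p); q.append(c)
    if not W:
        return np.zeros(X.shape[1]), y_mean
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    coef = W @ np.linalg.solve(P.T @ W, q)
    return coef, y_mean - x_mean @ coef

def sweep_leading_columns(Zc_red, Zv_red, n_components=3):
    """Fit one model per k (first k columns) and keep the k with the lowest error."""
    Xc, yc = Zc_red[:, :-1], Zc_red[:, -1]
    Xv, yv = Zv_red[:, :-1], Zv_red[:, -1]
    errors, models = [], []
    for k in range(1, Xc.shape[1] + 1):
        coef, b0 = pls1_fit(Xc[:, :k], yc, min(n_components, k))
        pred = Xv[:, :k] @ coef + b0
        errors.append(float(np.mean(np.abs(pred - yv))))   # average error E_k
        models.append((coef, b0))                          # model B_k
    best_k = int(np.argmin(errors)) + 1
    return best_k, errors[best_k - 1], models[best_k - 1], errors

# Synthetic demo: only the first two columns carry information.
rng = np.random.default_rng(1)
X_all = rng.normal(size=(55, 6))
y_all = 2.0 * X_all[:, 0] - X_all[:, 1] + rng.normal(0, 0.01, 55)
Z_all = np.column_stack([X_all, y_all])
best_k, best_err, best_model, errors = sweep_leading_columns(Z_all[:40], Z_all[40:])
print(best_k, round(best_err, 4))
```

For the data sizes in Example 1 this loop would fit 1804 models per preprocessing scheme, which is why the columns are ranked first: the sweep only has to consider nested prefixes rather than arbitrary column subsets.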

The correction set is then preprocessed with each of the 7 preprocessing methods (smoothing, derivative, vector normalization, mean centering, mean variance, standard normal variable transformation and multivariate scatter correction), and the minimum average error E_k and the corresponding coefficient data set B_k are obtained for each case. Table 1 shows the minimum average error for each of the 8 preprocessing schemes. As the data in Table 1 show, the validation set is predicted best when the 8th preprocessing scheme (multivariate scatter correction) is used to process the correction set. The coefficient data set B_best obtained under this method is the intelligently optimized best main model.

TABLE 1

(6) Sub-model intelligent optimization

The distribution of the sulfur content assay data within the sample set is shown in Figure 2. According to the distribution of the physical property data, two segments, 0-1.2 and 2.6-4.2, are selected, and the corresponding data sets are denoted Z_c1 (169 × 1805) and Z_c2 (146 × 1805). Steps (3) to (5) are repeated on the samples of the two segments, and the minimum average errors E_1-best and E_2-best in each segment, together with the corresponding coefficient data sets B_1-best and B_2-best, are obtained by the same intelligent optimization method used for the main model. These are the intelligently optimized best sub-models for each segment.
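Extracting the segment data sets from the correction set can be sketched as below. The inclusive treatment of both segment bounds is an assumption, the property values are made up for illustration, and select_segment is a hypothetical name.

```python
import numpy as np

def select_segment(Zc, low, high):
    """Rows of Zc whose property value (last column) lies in [low, high]."""
    y = Zc[:, -1]
    return Zc[(y >= low) & (y <= high)]

segments = [(0.0, 1.2), (2.6, 4.2)]      # sulfur content ranges from Example 1

# Placeholder correction set: 6 samples, 3 spectral columns, made-up sulfur values.
sulfur = np.array([0.3, 1.2, 1.9, 2.6, 3.5, 4.5])
Zc_demo = np.column_stack([np.zeros((6, 3)), sulfur])

subsets = [select_segment(Zc_demo, lo, hi) for lo, hi in segments]
print([s.shape for s in subsets])        # [(2, 4), (2, 4)]
```

Samples that fall outside every segment are simply not used for any sub-model, which is consistent with the segment sizes in Example 1 (169 + 146 = 315 of the 330 correction samples).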

(7) Spectrogram analysis

After the main model and the sub-models have been established, the spectra of new crude oil samples are analyzed and predicted. Take q new samples with unknown properties, collect the sample spectrum data set Z_n (q × 1804), and preprocess it by the multivariate scatter correction method. When q is small, the new spectrum data set Z_n can be merged with the correction set Z_c and the validation set Z_v used for modeling, preprocessed together, and then separated again to obtain the prediction sample set Z_n-pre. The 1804 columns of data are then rearranged according to the order in data set D to obtain the data set Z_n-red. The regression coefficient data set B_best of the main model is applied to the first k columns of Z_n-red, where k is the number of columns used by that model, to obtain the main model analysis result R_n for the new samples.

If the main model analysis result does not fall within the ranges 0-1.2 or 2.6-4.2, the final analysis result is R_n. If the prediction result falls within the j-th physical property range, the sub-model coefficient data set B_j-best of that range is applied again to the first k columns of Z_n-red, and the sub-model prediction result R_nj is the final prediction result.
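The decision rule between the main model and the sub-models can be sketched as follows. For brevity the sketch treats every model as a plain linear model on one already-prepared feature vector; in the method described above each sub-model would have its own preprocessing, column order and k. The coefficients are made up, and the names predict and analyze are hypothetical.

```python
def predict(model, x):
    """Linear prediction from a (coefficients, intercept) pair."""
    coef, intercept = model
    return sum(c * v for c, v in zip(coef, x)) + intercept

def analyze(x, main_model, sub_models):
    """Return (final result, segment used or None).

    sub_models is a list of ((low, high), model) pairs; the segment is
    chosen from the main-model result, not from the true value.
    """
    r_main = predict(main_model, x)
    for (low, high), model in sub_models:
        if low <= r_main <= high:
            return predict(model, x), (low, high)
    return r_main, None

# Made-up one-feature models, for illustration only.
main_model = ([1.0], 0.0)
sub_models = [
    ((0.0, 1.2), ([1.0], 0.1)),
    ((2.6, 4.2), ([1.0], -0.2)),
]

for x in ([0.5], [2.0], [3.0]):
    print(x, analyze(x, main_model, sub_models))
```

A sample whose main-model result lands between the segments (here 2.0) keeps the main-model value, matching the fallback described in the text.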

Example 2: the spectrogram chemometrics analysis method based on segmented intelligent optimization provided by the invention is used for quick evaluation analysis of the near-infrared spectra of No. 92 gasoline in the tank area of a refinery. The procedure comprises seven steps: data collection, spectrogram classification, spectrogram pretreatment, section ordering, intelligent optimization of the main model, intelligent optimization of the sub-model, and spectrogram analysis.

(1) Data collection

Before modeling, a sufficient number of samples must be accumulated. First, 357 accumulated gasoline samples are scanned with a near-infrared analyzer, and a near-infrared spectrum data set X (357 × 1298) is collected; the wave number range of each gasoline spectrum is about 4000-14000 cm^-1, resolved into 1298 data points. While the spectra are collected, 12 macroscopic properties of the gasoline samples, such as density, research octane number, motor octane number, distillation range, vapor pressure, benzene content, aromatic hydrocarbon content, olefin content and oxygen content, are measured by standard methods on conventional analytical instruments, giving a physical property assay data set Y (357 × 12). To model a given physical property, its column vector is extracted from Y and appended as the last column of the spectrum data set X, forming a new data set Z (357 × 1299) that is used to build the model. The research octane number is taken as the example here.

(2) Spectrogram classification

The gasoline near-infrared spectra are divided by the Kolmogorov-Smirnov algorithm into a correction set Z_c (286 × 1299) and a validation set Z_v (71 × 1299); the correction set accounts for 80% of the total number of samples.

(3) Spectrogram pretreatment

For the first 1298 columns of the correction set Z_c, seven types of preprocessing method are available: smoothing, derivative, vector normalization, mean centering, mean variance, standard normal variable transformation and multivariate scatter correction. The number of preprocessing methods applied is set to 0 or 1, so there are 8 possible preprocessing schemes for Z_c; the preprocessed correction set is denoted Z_c-pre (286 × 1299). At the same time, once a preprocessing method has been selected for the correction set, the validation set Z_v is preprocessed with the same method to form the data set Z_v-pre (71 × 1299).

(4) Section ordering

With no preprocessing method selected, each of columns 1 to 1298 of the data set Z_c-pre is subjected to correlation analysis against the research octane number values in column 1299, and columns 1 to 1298 are rearranged in descending order of correlation coefficient to form a new data set Z_c-red (286 × 1299). The sequence number that each column of Z_c-red had in the original data set Z_c-pre is recorded as data set D (1 × 1299). At the same time, columns 1 to 1298 of Z_v-pre are rearranged according to the sequence numbers in D to form a new data set Z_v-red (71 × 1299).

(5) Intelligent optimization of master model

The first 1 column, the first 2 columns, the first 3 columns, ..., and the first 1298 columns of Z_c-red are selected in turn and each is regressed against the research octane number values in column 1299 by partial least squares, giving regression coefficient data sets B₁, B₂, ..., B₁₂₉₈, that is, the main models of the physical property at different data dimensions. The data sets B₁, B₂, ..., B₁₂₉₈ are then applied to the first 1 column, the first 2 columns, the first 3 columns, ..., and the first 1298 columns of the validation set Z_v-red, respectively, to obtain octane number analysis data for each of the 71 validation samples. The analysis result for each sample is compared with the actual assay data, and the individual errors and the overall average errors E₁, E₂, E₃, ..., E₁₂₉₈ of the 71 samples are calculated for each case. The minimum average error is E₁₂₁₁; that is, the average error is smallest when the top 1211 columns with the largest correlation coefficients are used to model the research octane number. The data set B₁₂₁₁ is then the optimal main model.

The correction set is then preprocessed with each of the 7 preprocessing methods (smoothing, derivative, vector normalization, mean centering, mean variance, standard normal variable transformation and multivariate scatter correction), and the minimum average error E_k and the corresponding coefficient data set B_k are obtained for each case. Table 2 shows the minimum average error for each of the 8 preprocessing schemes. As the data in Table 2 show, the validation set is predicted best when the 1st preprocessing scheme (no preprocessing) is selected. The coefficient data set B_best obtained under this scheme is the intelligently optimized best main model.

Scheme number	Pretreatment method	Minimum mean error
Scheme 1	No pretreatment	0.228145697
Scheme 2	Smoothing	0.248177205
Scheme 3	Derivative	0.247426024
Scheme 4	Vector normalization	0.268616612
Scheme 5	Mean centering	0.251932265
Scheme 6	Mean variance	0.267560076
Scheme 7	Standard normal variable transformation	0.251588032
Scheme 8	Multivariate scatter correction	0.273317586

TABLE 2
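The selection of the best scheme from Table 2 amounts to taking the minimum over the eight minimum average errors. The values below are copied from Table 2; the variable names are hypothetical.

```python
# Minimum average error per preprocessing scheme, as listed in Table 2.
table2 = {
    "Scheme 1": ("No pretreatment", 0.228145697),
    "Scheme 2": ("Smoothing", 0.248177205),
    "Scheme 3": ("Derivative", 0.247426024),
    "Scheme 4": ("Vector normalization", 0.268616612),
    "Scheme 5": ("Mean centering", 0.251932265),
    "Scheme 6": ("Mean variance", 0.267560076),
    "Scheme 7": ("Standard normal variable transformation", 0.251588032),
    "Scheme 8": ("Multivariate scatter correction", 0.273317586),
}

# The best scheme is the one whose minimum average error is smallest.
best_scheme = min(table2, key=lambda s: table2[s][1])
best_method, best_error = table2[best_scheme]
print(best_scheme, best_method, best_error)   # Scheme 1 No pretreatment 0.228145697
```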

(6) Sub-model intelligent optimization

The distribution of the research octane number assay data within the sample set is shown in Figure 3. According to the distribution of the physical property data, the data set in the range 92.1-94.1 is selected to establish a sub-model and is denoted Z_c1 (279 × 1299). Steps (3) to (5) are repeated on the samples in this range, and the minimum average error E_1-best in the range, together with the corresponding coefficient data set B_1-best, is obtained by the same intelligent optimization method used for the main model. This is the intelligently optimized best sub-model for the segment.

(7) Spectrogram analysis

After the main model and the sub-model have been established, the spectra of new samples are analyzed and predicted. Take s new No. 92 gasoline samples with unknown properties and collect the sample spectrum data set Z_n (s × 1298). Without any preprocessing, the 1298 columns of data are rearranged according to the order in data set D to obtain the data set Z_n-red. The regression coefficient data set B_best of the main model is applied to the first k columns of Z_n-red, where k is the number of columns used by that model, to obtain the main model analysis result R_n for the new samples.

If the main model analysis result does not fall within the range 92.1-94.1, the final analysis result is R_n. If the prediction result falls within the range 92.1-94.1, the sub-model coefficient data set B_1-best of that range is applied again to the first k columns of Z_n-red, and the sub-model prediction result R_n1 is the final prediction result.

By using the spectrogram chemometrics analysis method based on segmented intelligent optimization, the embodiments improve both the accuracy of the crude oil mid-infrared and gasoline near-infrared spectrum analysis results and the convenience of the modeling process. As FIGS. 4 and 5 show, when the method of the invention is used to analyze new sample spectra, the average error of the validation set predictions is significantly smaller than that obtained by the conventional partial least squares method, which indicates that the method can improve the accuracy of quick evaluation analysis for various samples in the petrochemical field. In addition, the method automatically selects the optimal preprocessing method and interval range for the modeling process, which gives it higher application value.

While the invention has been described in connection with specific embodiments thereof, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.
