Method for identifying and comparing three-dimensional fluorescence spectrogram of soluble organic matter

Document No.: 1336437. Publication date: 2020-07-17.

Reading note: this technology, 一种溶解性有机物三维荧光谱图的识别比对方法 (Method for identifying and comparing three-dimensional fluorescence spectrogram of soluble organic matter), was designed by 何鹰魏峨尊王南达高贝贝李京都王欣刘璐宋宗东 on 2019-01-10. Abstract: The invention provides a method for identifying three-dimensional fluorescence spectrograms of soluble organic matter. When a sample is compared against a fluorescence spectrum database of soluble organic matter, database construction and comparison proceed simultaneously, so that the database is automatically refined, updated and extended, which improves identification of the tested sample. After the necessary data processing, the fluorescence spectrum data of the tested sample can be matched against the samples in the reference comparison database by similarity calculation, giving rapid and accurate identification and discrimination. From the peak position coordinates and peak intensities of the samples, a probabilistic neural network is established and combined with Bayesian theory to cluster, classify and discriminate the samples. Cosine coefficients are calculated from the fluorescence data matrix of the tested sample to obtain two indices, the maximum matching degree and the comprehensive similarity coefficient, which form the spectrogram identification information. Cosine similarity coefficients are also calculated separately for the fluorescence spectrum data obtained by parallel factor analysis and for the fluorescence spectrum data of each component, providing further classification and identification of the sample.

1. A method for identifying and comparing three-dimensional fluorescence spectrograms of soluble organic matter, characterized by comprising the following steps: according to the fluorescence data matrix of the tested sample, establishing a matrix of the same form as the fluorescence data of each sample in the reference comparison database; calculating cosine similarity coefficients, using modified cosine coefficients when necessary, to obtain two discrimination indices, the maximum matching degree and the comprehensive similarity coefficient; and forming the identification and comparison information of the spectrogram according to the principle of matching by maximum similarity coefficient.

2. The method as claimed in claim 1, wherein the cosine coefficient matrices are calculated as follows: a cosine value cos θ is calculated between each column of the data matrix X of the tested sample and the corresponding column of each sample data matrix Y of the reference comparison database, giving a matrix of cosine values of size OriginalData.NEx × Nsamples (number of excitation wavelengths by number of library samples), designated R1; and a cosine value cos θ is calculated between each row (each column after transposition) of the data matrix X of the tested sample and the corresponding row of each sample data matrix of the reference comparison database, giving a matrix of cosine values of size OriginalData.NEm × Nsamples (number of emission wavelengths by number of library samples), designated R2.

3. The method for identifying and comparing three-dimensional fluorescence spectrograms of soluble organic matter according to claim 1, wherein the maximum matching degree is calculated as follows: a cosine value greater than 0.9 is taken as the comparison limit between the tested sample and a sample of the reference comparison database; the number of cosine values greater than 0.9 for each sample in R1 is counted and recorded under the name NPCorrA; the number of cosine values greater than 0.9 for each sample in R2 is counted and recorded under the name NPCorrB; the sum of NPCorrA and NPCorrB is named the maximum matching degree and is one of the criteria for judging whether the tested sample is similar to a library sample.

4. The method for identifying and comparing three-dimensional fluorescence spectrograms of soluble organic matter according to claim 1, wherein the comprehensive similarity coefficient is calculated as follows: sometimes the maximum matching degrees of several library samples are equal, so that their similarity to the tested sample cannot be judged from the number of cosine values above a threshold such as 0.9; the threshold of 0.9 can be raised to reduce the number of samples sharing the maximum matching degree, but this sometimes causes similar samples to be missed; in that case the comprehensive similarity coefficient is used in combination. The comprehensive similarity coefficient is constructed as follows: for the samples with equal maximum matching degree, the cosine coefficient matrices are retrieved from R1 and R2 and recorded as MaxpCorrA and MaxpCorrB, and the cosine coefficients of each sample in MaxpCorrA and MaxpCorrB are summed separately, namely

$$\sum_{i} \mathrm{MaxpCorrA}_i \quad \text{and} \quad \sum_{j} \mathrm{MaxpCorrB}_j$$

Let A, obtained from the first sum, be the total coefficient of the matched sample data matrix before transposition, and let B, obtained from the second sum, be the total coefficient of the matched sample data matrix after transposition.

Let the comprehensive similarity coefficient be

$$R = A \times B$$

The R value of each matching file is compared (R ≤ 1); the larger the R value, the higher the similarity to the tested sample, and the sample with the largest R value is identified as the best match for the tested sample.

5. The method for identifying and comparing three-dimensional fluorescence spectrograms of soluble organic matter according to claim 1, wherein, when a three-dimensional fluorescence spectrum database of water-environment soluble organic matter is constructed and samples are compared, a tested sample that is found on comparison not to be a member of the reference comparison database is automatically added to it, so that database construction and comparison proceed simultaneously, the reference comparison database is automatically refined, updated and extended, and the range over which tested samples can be identified and compared is widened.

6. The method for identifying and comparing three-dimensional fluorescence spectrograms of soluble organic matter according to claim 1, wherein the fluorescence spectrum data of the tested sample undergo only basic processing such as blank subtraction and interference removal, after which similarity coefficients are calculated and matched against the samples of the reference comparison database without extensive mathematical analysis, thereby obtaining rapid and accurate identification and discrimination information.

7. The method for identifying and comparing three-dimensional fluorescence spectrograms of soluble organic matter according to claim 1, wherein the peak position coordinates and peak intensities of each sample are identified, these characteristic values are processed (for example, normalized), a generalized radial basis function probabilistic neural network is established, and, in combination with Bayesian decision theory, clustering, classification and discrimination of large sample sets are realized.

8. The method for identifying and comparing three-dimensional fluorescence spectrograms of soluble organic matter according to claim 1, wherein the fluorescence spectrum of water-environment soluble organic matter is resolved by algorithms such as parallel factor analysis, and, for the resulting three-dimensional fluorescence spectrum data and the three-dimensional fluorescence spectrum data of each constituent component, cosine similarity coefficients are calculated over the rows and the columns of the data matrix to obtain the comprehensive similarity coefficient, thereby providing further classification and identification information for the sample.

9. The method for identifying and comparing three-dimensional fluorescence spectrograms of soluble organic matter according to claim 1, wherein, according to purpose, cosine-coefficient similarity matching and identification of the tested sample is carried out in three stages: in the first stage, after the necessary tidying of the fluorescence spectrum data, identification and comparison information is obtained by calculating cosine coefficients of the relevant data matrices before and after transposition; in the second stage, after the peak position coordinates and peak intensities of the samples have been extracted, a probabilistic neural network clustering and classification method is established, enabling effective classification and identification of large sample sets; in the third stage, after the sample fluorescence spectrum data have been resolved, that is, after the resolved fluorescence spectrum data and the fluorescence spectrum data of each component have been obtained, the cosine coefficients before and after transposition of the tested sample matrix can still be compared with those of the reference comparison database samples, that is, identification and comparison are carried out through the constructed comprehensive similarity coefficient.

[ technical field ]

The invention relates to the fields of environmental science, food and beverage, traditional Chinese medicine materials, spectroscopy and the like, in particular to a method for identifying and comparing a three-dimensional fluorescence spectrogram of soluble organic matters in a water environment.

[ background of the invention ]

In recent years, with the development of fluorescence spectrophotometers, instruments capable of scanning both excitation and emission wavelengths have become increasingly common, and the three-dimensional fluorescence spectra they produce are applied in many fields, such as petroleum product development, oil-well drilling classification, drug synthesis and impurity identification, authentication of genuine Chinese medicinal materials, testing of beverages such as white spirit and tea, medical testing, and pesticide residue analysis. To make up for the shortcomings of existing monitoring techniques and detection parameters for the water environment, the invention provides a method for identifying and comparing fluorescence spectrograms of soluble organic matter in the water environment.

Each fluorescent substance has its own characteristic fluorescence spectrum, and fluorescence detection offers high sensitivity and selectivity, so it is widely used. However, it is extremely difficult to obtain comprehensive and accurate identification of dissolved organic matter from a contour fluorescence spectrum alone, that is, from the projection of excitation wavelength, emission wavelength and fluorescence intensity. This is because, for a complex mixture, the scanned fluorescence spectrum is the combined result of interactions among the components. A method is therefore needed that can both identify the three-dimensional fluorescence spectrograms of soluble organic matter in single-component and multi-component systems and compare the spectrogram data with existing standard or reference information, so as to accomplish classification, identification, comparison and source tracing. At present, the usual approach at home and abroad is to resolve the fluorescence spectrum by parallel factor analysis, partial least squares, alternating trilinear decomposition, non-negative matrix factorization and similar methods to obtain multi-component fluorescence information, and then to build an identification method on that information. However, no reliable scientific method is yet available for automatically and effectively identifying and comparing fluorescence spectra obtained at different times or in different ways.

Some identification methods discriminate and compare using only the relationship between peak position and peak intensity. Others identify the components of a mixed three-dimensional fluorescence spectrum through a comprehensive similarity index built from the characteristic peaks and waveform parameters of the two-dimensional spectra produced by a multi-dimensional resolution algorithm. For complex multi-component systems whose fluorescence peak centres lie close together, the former approach, which relies only on so-called characteristic parameters such as peak position and fluorescence intensity, is highly subjective and gives low identification or comparison accuracy. The latter approach is tied to the parallel factor method (PARAFAC), whose components are output in an indeterminate order, which can also lead to misidentification or mismatching.

In practice, people need to identify and compare samples from their own field of work or water-environment samples of interest. For example, different brands of white spirit have different three-dimensional fluorescence spectra, and spectra of the same brand differ with production process and storage time; similarly, water extracts of the same Chinese medicinal material grown in different regions have different three-dimensional fluorescence spectra; and sewage treatment plants or chemical industry parks sometimes urgently need to know the type, concentration and source of sewage.

Therefore, using three-dimensional fluorescence spectrograms and data to rapidly identify and authenticate white spirits and traditional Chinese medicinal materials, and to monitor, detect and trace environmental water quality, is of great significance.

[ summary of the invention ]

The invention aims to provide a method for identifying or discriminating soluble organic matter in the water environment by three-dimensional fluorescence spectroscopy, and to solve the problems of component misidentification and low identification rate that arise when a three-dimensional fluorescence spectrum is resolved by a multi-dimensional algorithm and identification relies solely on a similarity index constructed from the resolved two-dimensional spectra. On the basis of a three-dimensional fluorescence spectrum database of water-environment soluble organic matter, a matrix of the same form as the fluorescence data of each standard sample in the database is established from the fluorescence data matrix of the tested sample; cosine similarities are calculated to obtain two indices, the maximum matching degree and the comprehensive similarity coefficient; and the identification and comparison information of the spectrogram is formed according to the maximum similarity matching coefficient.

To achieve this aim, the invention provides a method for identifying and comparing three-dimensional fluorescence spectra of water-environment soluble organic matter, comprising: constructing a reference comparison database that stores three-dimensional fluorescence spectrum information for various alcoholic beverages, traditional Chinese medicines and water-environment soluble organic matter, and for soluble organic matter with known physicochemical properties such as chemical formula and structural formula; constructing a three-dimensional fluorescence spectrum data matrix and an interference-peak subtraction algorithm to obtain standard three-dimensional fluorescence spectrum data from which interference such as Rayleigh scattering and Raman scattering has been removed; constructing a parallel factor algorithm to resolve the fluorescence spectrum and obtain the fluorescence spectrum of the tested sample and the fluorescence spectrum data of each component; constructing a three-dimensional fluorescence spectrum query method that extracts query and retrieval information from each spectrum in the reference comparison database; constructing a cosine similarity or modified cosine similarity calculation method for calculating, identifying and discriminating the similarity coefficients between the tested sample and each library sample; constructing an algorithm that extracts query and retrieval information from the fluorescence spectrum data for identifying, comparing, classifying and tracing the tested sample; and constructing a document and data output unit for outputting detection, identification and matching results and analysis reports.

Compared with the prior art, the invention has the following advantages:

(1) Existing identification and comparison of three-dimensional fluorescence spectra relies on resolution methods such as parallel factor analysis, partial least squares, alternating trilinear decomposition and non-negative matrix factorization, and builds the identification and comparison method on the multi-component fluorescence information so obtained. The present method can identify and compare samples at three different levels according to the user's needs. If only comparison information is needed, no extensive mathematical analysis is required: after basic processing such as blank subtraction and interference removal, the fluorescence spectrum data of the tested sample can be matched against the samples of the reference comparison database by similarity coefficient calculation, giving accurate identification and discrimination information with high accuracy and high speed.

(2) By building the reference comparison database and measuring the fluorescence spectra of tested samples in parallel, the method continuously and automatically extends the reference comparison database: when a tested sample is found on comparison not to be a member of the reference comparison database, it is automatically added to it, so that the database is automatically refined, updated and extended, and the range over which tested samples can be identified and compared is widened. The method can monitor, detect and trace water quality on site in real time, and can detect, identify and compare soluble fluorescent organic matter arising in Chinese medicinal water extracts, white spirit and other beverages, pesticide residues on vegetables, organic drug synthesis and similar processes.

(3) For each processed fluorescence spectrum, the peak position coordinates, such as peak 1 (280, 330), peak 2 (350, 450) and peak 3 (410, 520), together with the corresponding fluorescence intensities Q1, Q2, Q3 and so on, are assembled into an m × n matrix, where m is the number of fluorescence spectra collected and n is the length of the vector of peak positions and fluorescence intensities. A database characterized by peak position coordinates and peak intensities is constructed in this way. A probabilistic neural network is then formed from a generalized radial basis function neural network, with supervised learning of the centre positions and weights, combined with Bayesian decision theory. Because the decision surface formed by this network is very close to the surface given by the Bayes optimal criterion, the tested samples can be clustered, classified and discriminated.

(4) For the classified tested sample, cosine coefficient (including modified cosine coefficient) similarities are calculated on the fluorescence spectrum data matrix before and after transposition, and the product of the two similarity coefficients is taken as the comprehensive similarity coefficient, which serves as the basis for identification and discrimination. According to the established indices, the database is queried and matched to obtain the result for the tested sample, which is output as text. Because cosine coefficients are calculated for the sample data matrix both before and after transposition, the accuracy of sample identification and comparison is improved.

(5) The method resolves the fluorescence spectrum of water-environment soluble organic matter by algorithms such as parallel factor analysis to obtain more refined three-dimensional fluorescence spectrum data and the three-dimensional fluorescence spectrum data of each constituent component. Cosine coefficients are calculated over the rows and columns of the data matrix to obtain the maximum matching degree and the comprehensive similarity coefficient, thereby further classifying and identifying the samples.

[ description of the drawings ]

FIG. 1 is a flow chart of a three-dimensional fluorescence spectrogram identification and comparison method;

FIG. 2 is a three-dimensional contour plot of the measured sample S after interference subtraction;

FIG. 3 is a three-dimensional contour plot of the measured sample S component 1;

FIG. 4 is a three-dimensional contour plot of the measured sample S component 2;

FIG. 5 is a three-dimensional contour plot of the measured sample S component 3;

FIG. 6 is a three-dimensional contour plot of the measured sample S component 4.

[ detailed description ]

1. Creation of the three-dimensional fluorescence spectrum data matrix: to simplify calculation, the instrument's original output file (for example, an xlsx file) must be tidied before the data matrix is constructed. The rows and columns to be read by the software are specified, the non-data parts of the file are removed, and the result is stored in the software workspace under a variable name, for example a structure named OriginalData. The structure mainly comprises the fields OriginalData.Ex, OriginalData.Em and OriginalData.X, which hold the excitation wavelength data, the emission wavelength data and the fluorescence intensity data. Further fields, such as the number of excitation wavelengths OriginalData.NEx, the number of emission wavelengths OriginalData.NEm and the number of samples measured, may be added as necessary.
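The structure described in step 1 can be sketched as follows. This is a minimal illustration in Python rather than the workspace structure used by the authors; the helper `build_original_data` and the row/column convention (rows indexed by excitation wavelength, columns by emission wavelength) are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class OriginalData:
    """Container mirroring the fields named in step 1."""
    Ex: List[float]        # excitation wavelengths
    Em: List[float]        # emission wavelengths
    X: List[List[float]]   # intensities; X[i][j] belongs to Ex[i], Em[j] (assumed layout)

    @property
    def NEx(self) -> int:
        return len(self.Ex)

    @property
    def NEm(self) -> int:
        return len(self.Em)


def build_original_data(ex, em, x) -> OriginalData:
    """Hypothetical helper: check the matrix shape and wrap the three fields."""
    if len(x) != len(ex) or any(len(row) != len(em) for row in x):
        raise ValueError("X must have NEx rows and NEm columns")
    return OriginalData(list(ex), list(em), [[float(v) for v in row] for row in x])
```

Deriving NEx and NEm from the stored wavelength lists, instead of storing them separately, keeps the counts consistent with the data.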

2. Subtraction of first- and second-order Rayleigh scattering and Raman scattering interference: because Rayleigh and Raman scattering interfere with the analysis of the three-dimensional fluorescence spectrum, the affected data must be removed and replaced by 0 or a background value before the correlation calculations are carried out.
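One possible way to carry out the replacement in step 2 is sketched below. The source does not say how the scatter regions are located, so everything specific here is an assumption: first-order Rayleigh scatter is taken at emission = excitation, second-order at emission = 2 × excitation, the water Raman line at a shift of about 3400 cm⁻¹, and the band half-width of 10 nm is an arbitrary illustrative value. The function name `mask_scatter` is hypothetical.

```python
def mask_scatter(ex, em, x, half_width=10.0, fill=0.0, raman_shift_cm=3400.0):
    """Return a copy of x with assumed scatter bands replaced by `fill`.

    x[i][j] is the intensity at excitation ex[i] and emission em[j] (nm).
    """
    out = [row[:] for row in x]
    for i, lam_ex in enumerate(ex):
        # Assumed band centres: first- and second-order Rayleigh lines.
        centres = [lam_ex, 2.0 * lam_ex]
        # Assumed water Raman line: 1/lam_em = 1/lam_ex - shift,
        # with the shift converted from cm^-1 to nm^-1 (factor 1e-7).
        inv = 1.0 / lam_ex - raman_shift_cm * 1e-7
        if inv > 0:
            centres.append(1.0 / inv)
        for j, lam_em in enumerate(em):
            if any(abs(lam_em - c) <= half_width for c in centres):
                out[i][j] = fill
    return out
```

Passing a background estimate as `fill` instead of 0 corresponds to the alternative mentioned in the text.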

3. Construction of the reference comparison database: collected samples of known origin are scanned and processed according to steps 1 and 2 and stored in a designated directory by sample name or serial number. In this way, the reference comparison database is continuously extended and refined. Depending on the purpose for which it is built, the library may consist of a known-compound information library, an industry information library, an enterprise information library, or the like.

All sample information in the reference comparison database, such as file names, creation dates, serial numbers and byte counts, can be obtained with the dir command and stored in a structure array named list. The number of samples in the database is obtained from this array and recorded as Nsamples. For indices 1 to Nsamples, the data file of each sample is read into the workspace and recorded in Y, a cell array that holds the data of every sample in the reference comparison database; the data of each sample form an OriginalData.NEx × OriginalData.NEm matrix.

4. Calculation of the cosine similarity coefficient: if a and b are two vectors and the angle between them is θ, the cosine formula is:

$$\cos\theta = \frac{a \cdot b}{\|a\|\,\|b\|}$$

If, further, the vector a is [x₁, y₁] and the vector b is [x₂, y₂], the cosine formula can be rewritten as:

$$\cos\theta = \frac{x_1 x_2 + y_1 y_2}{\sqrt{x_1^2 + y_1^2}\,\sqrt{x_2^2 + y_2^2}}$$

It has been shown that this method of calculating the cosine also holds for n-dimensional vectors. That is, if A and B are two n-dimensional vectors, A = [A₁, A₂, ..., A_n] and B = [B₁, B₂, ..., B_n], the cosine of the angle θ between A and B is:

$$\cos\theta = \frac{\sum_{i=1}^{n} A_i B_i}{\sqrt{\sum_{i=1}^{n} A_i^2}\,\sqrt{\sum_{i=1}^{n} B_i^2}}$$

The closer the cosine value cos θ is to 1, the closer the angle is to 0 degrees, that is, the more similar the two vectors are. To improve the accuracy of identification and comparison, the modified cosine similarity may be used if necessary.
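The n-dimensional cosine of step 4 can be sketched in a few lines. The source does not define the modified cosine similarity; the `adjusted_cosine` function below assumes the common variant in which each vector is centred on its own mean before the cosine is taken, and returning 0 for a zero vector is likewise a choice made for this example.

```python
import math


def cosine(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(p * q for p, q in zip(a, b))
    norm_a = math.sqrt(sum(p * p for p in a))
    norm_b = math.sqrt(sum(q * q for q in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # assumed convention for an all-zero vector
    return dot / (norm_a * norm_b)


def adjusted_cosine(a, b):
    """Assumed form of the modified cosine: mean-centre each vector first."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return cosine([p - mean_a for p in a], [q - mean_b for q in b])
```

Plain cosine treats two spectra that differ only by a scale factor as identical, while the mean-centred variant also discounts a constant offset such as a residual background.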

5. Determination of the comprehensive similarity coefficient and of similarity: for the data matrix X of the tested sample and each sample data matrix in Y from the reference comparison database, a cosine value cos θ is calculated column by column, giving a matrix of cosine values of size OriginalData.NEx × Nsamples (number of excitation wavelengths by number of library samples), stored in a field named R1; a cosine value cos θ is likewise calculated row by row, giving a matrix of cosine values of size OriginalData.NEm × Nsamples (number of emission wavelengths by number of library samples), stored in a field named R2.
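The column-wise and row-wise comparison can be sketched as below. The function name `cosine_profiles` is hypothetical, and for convenience the results are stored per library sample (one list of cosines per sample), which is the transpose of the wavelength-by-sample layout described for R1 and R2.

```python
import math


def cosine(a, b):
    """Cosine of the angle between two equal-length vectors (0 for a zero vector)."""
    dot = sum(p * q for p, q in zip(a, b))
    na = math.sqrt(sum(p * p for p in a))
    nb = math.sqrt(sum(q * q for q in b))
    return dot / (na * nb) if na and nb else 0.0


def columns(m):
    """Columns of a matrix given as a list of rows."""
    return [list(c) for c in zip(*m)]


def cosine_profiles(x, library):
    """Return (r1, r2) for tested matrix x against each library matrix.

    r1[s] holds the column-by-column cosines for library sample s,
    r2[s] the row-by-row cosines (columns of the transposed matrices).
    """
    x_cols = columns(x)
    r1 = [[cosine(cx, cy) for cx, cy in zip(x_cols, columns(y))] for y in library]
    r2 = [[cosine(rx, ry) for rx, ry in zip(x, y)] for y in library]
    return r1, r2
```

All matrices are assumed to share the same excitation and emission grids, as required by the same-form condition in the claims.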

A cosine value greater than 0.9 (adjustable as required) is taken as the comparison limit between the tested sample and a sample of the reference comparison database. The number of cosine values greater than 0.9 for each sample in R1 is counted and recorded under the name NPCorrA; the number of cosine values greater than 0.9 for each sample in R2 is counted and recorded under the name NPCorrB. NPCorrA and NPCorrB thus record the number and the coordinates of the cosine values greater than 0.9, calculated by row and by column, between each sample of the reference comparison library and the tested sample. In general, the greater the sum of NPCorrA and NPCorrB for a library sample, the greater its similarity to the tested sample. The library sample with the largest sum is therefore the most similar to the tested sample, and the sum of NPCorrA and NPCorrB is named the maximum matching degree. This sum cannot exceed the sum of OriginalData.NEx and OriginalData.NEm, namely:

NPCorrA + NPCorrB ≤ OriginalData.NEx + OriginalData.NEm

sometimes, the sum of NPCorrA and NPCorrB of some samples in the library is equal, i.e. the maximum matching degree is equal, and the similarity degree with the tested sample cannot be judged according to the number of cosine values larger than 0.9. In this case, it is necessary to perform discrimination by combining the overall similarity coefficient.
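The counting rule and the tie situation described above can be illustrated with a short sketch. The function names are hypothetical, and the per-sample cosine lists are assumed to have been computed already for R1 and R2.

```python
def matching_degree(r1_sample, r2_sample, threshold=0.9):
    """NPCorrA + NPCorrB for one library sample."""
    np_corr_a = sum(1 for v in r1_sample if v > threshold)
    np_corr_b = sum(1 for v in r2_sample if v > threshold)
    return np_corr_a + np_corr_b


def best_matches(r1, r2, threshold=0.9):
    """Return the maximum matching degree and the indices of all samples reaching it."""
    degrees = [matching_degree(a, b, threshold) for a, b in zip(r1, r2)]
    top = max(degrees)
    return top, [s for s, d in enumerate(degrees) if d == top]
```

When `best_matches` returns more than one index, the candidates are tied on maximum matching degree and a second criterion is needed to separate them.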

The construction method of the comprehensive similarity coefficient comprises the following steps:

The samples that share the maximum value of NPCorrA + NPCorrB are selected from list and recorded under the name MatchingFile. MatchingFile is a structure array, similar to the structure array list, that records the information of the matched files. That is, list records the information of all files, while MatchingFile records the information of those files that match the tested sample, such as the name of the matched file, the data acquisition time, the byte size and the data volume.

From the information recorded in MatchingFile, the cosine coefficient matrices of the matched samples are retrieved from R1 and R2 and recorded as MaxpCorrA and MaxpCorrB, and the cosine coefficients of each sample in MaxpCorrA and MaxpCorrB are summed separately, namely

$$\sum_{i} \mathrm{MaxpCorrA}_i \quad \text{and} \quad \sum_{j} \mathrm{MaxpCorrB}_j$$

Let A, obtained from the first sum, be the total coefficient of the matched sample data matrix before transposition, and let B, obtained from the second sum, be the total coefficient of the matched sample data matrix after transposition.

Let the comprehensive similarity coefficient be

$$R = A \times B$$

The R value of each matched file in MatchingFile is compared (R ≤ 1); the larger the R value, the higher the similarity to the tested sample, and the sample with the largest R value is recorded as the best match for the tested sample.
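A minimal sketch of the tie-breaking step follows. The exact expressions for A and B are not recoverable from the text, so the sketch assumes that each is the sum of cosine values divided by the number of values summed, which is one normalization consistent with the stated bound R ≤ 1. The function names and the dictionary layout are hypothetical.

```python
def comprehensive_coefficient(max_p_corr_a, max_p_corr_b):
    """R = A x B, with A and B assumed to be mean cosine values."""
    a = sum(max_p_corr_a) / len(max_p_corr_a)   # assumed normalization
    b = sum(max_p_corr_b) / len(max_p_corr_b)   # assumed normalization
    return a * b


def best_by_r(candidates):
    """candidates maps a file name to its (MaxpCorrA, MaxpCorrB) cosine lists."""
    scores = {name: comprehensive_coefficient(a, b)
              for name, (a, b) in candidates.items()}
    return max(scores, key=scores.get), scores
```

With this normalization, a library sample identical to the tested sample gives A = B = 1 and therefore R = 1.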

6. Output and reporting of results: the information on the tested sample and on the matched samples is output as text, together with related information such as the matched samples and the best-matching sample, and a report of the identification, classification and comparison results is generated.

7. In some cases the above method does not give satisfactory identification and comparison results, and the sample data in the reference comparison library and the tested sample data must be processed further.

8. Determination of the fluorescence peak coordinates of the tested sample: the fluorescence spectrum data of the tested sample are loaded into the software workspace and, following step 1, a structure named OriginalData is obtained, containing fields such as OriginalData.Ex, OriginalData.Em and OriginalData.X. With a suitable peak interval and a peak-search routine, the position coordinates and fluorescence intensities of the fluorescence peaks are obtained and recorded in the workspace under the variable name PksEmExA. If necessary, for example for complex fluorescence spectra or spectra with overlapping peaks, the fields of OriginalData can be converted and the peak position coordinates and fluorescence intensities obtained in the same way, recorded in the workspace under the variable name PksEmExB.
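The source does not specify the peak-search routine. As one illustrative possibility, the sketch below reports every grid point that is strictly higher than all of its neighbours and above a minimum intensity; the function name `find_peaks_2d`, the 8-neighbour rule and the `min_intensity` parameter are assumptions.

```python
def find_peaks_2d(ex, em, x, min_intensity=0.0):
    """Return (excitation, emission, intensity) for strict local maxima of x.

    x[i][j] is the intensity at ex[i], em[j]; results are sorted by
    decreasing intensity.
    """
    peaks = []
    n_ex, n_em = len(ex), len(em)
    for i in range(n_ex):
        for j in range(n_em):
            v = x[i][j]
            if v <= min_intensity:
                continue
            # Collect the up-to-eight surrounding grid points.
            neighbours = [
                x[p][q]
                for p in range(max(0, i - 1), min(n_ex, i + 2))
                for q in range(max(0, j - 1), min(n_em, j + 2))
                if (p, q) != (i, j)
            ]
            if all(v > n for n in neighbours):
                peaks.append((ex[i], em[j], v))
    return sorted(peaks, key=lambda t: -t[2])
```

Flat-topped peaks are not reported by this strict rule, and noisy spectra would normally be smoothed before the search.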

9. Constructing the peak position coordinate information base: the peak position coordinates and peak intensities of each sample are taken from PksEmExA and PksEmExB and combined with list to form a new cell array named Klist. This cell array contains both the basic information of each original data file and its peak position coordinates and peak intensities.

10. Clustering the extended reference comparison database: using a K-means clustering function (K-MEANS), a clustering scheme is built from the peak position coordinates and peak intensities of each sample taken from Klist, with the data standardized if necessary. The clustering produces a group of sets: objects within the same set are highly similar to one another, while objects in different sets are less similar. The samples in the extended reference comparison database are thus classified, and each class (set) stores the information of several data files with similar peak position coordinates and peak values. The number of clusters is determined by calculating statistics such as the R-square statistic (R-SQUARE) or the adjusted R-square (Adjusted R-SQUARE).
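
The clustering step and the choice of the number of clusters can be illustrated with a small pure-Python sketch. It assumes R-square is defined as 1 minus the ratio of within-cluster to total sum of squares, uses a deterministic farthest-point initialisation instead of random seeding, omits standardization, and picks the smallest k whose R-square reaches a hypothetical threshold of 0.95. None of these choices is stated in the source.

```python
def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean_point(points):
    n = len(points)
    return [sum(coords) / n for coords in zip(*points)]

def kmeans(points, k, n_iter=100):
    """Lloyd's algorithm with deterministic farthest-point initialisation."""
    centers = [list(points[0])]
    while len(centers) < k:
        # next centre: the point farthest from its nearest existing centre
        centers.append(list(max(points, key=lambda p: min(sq_dist(p, c) for c in centers))))
    labels = None
    for _ in range(n_iter):
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]
        if new_labels == labels:
            break  # assignments stable: converged
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centre if a cluster becomes empty
                centers[c] = mean_point(members)
    return labels, centers

def r_square(points, labels, centers):
    """R-square = 1 - within-cluster sum of squares / total sum of squares."""
    grand = mean_point(points)
    total = sum(sq_dist(p, grand) for p in points)
    within = sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))
    return 1.0 - within / total

# Toy peak features: (emission nm, excitation nm, intensity)
points = [
    (310, 260, 9.0), (312, 262, 8.5), (308, 258, 9.2),
    (430, 340, 4.0), (432, 338, 4.3), (428, 342, 3.9),
]

chosen_k = None
for k in range(1, 5):
    labels, centers = kmeans(points, k)
    if r_square(points, labels, centers) >= 0.95:  # hypothetical threshold
        chosen_k = k
        break
print(chosen_k, labels)
```

In practice the wavelength and intensity columns have different scales, so standardizing them first, as the text allows, would keep the intensity from being swamped by the wavelength coordinates.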

11. Clustering is necessary for large reference comparison databases. Once every sample in the database has been assigned to a cluster, the data of the tested sample are passed to a Bayes classifier or a probabilistic neural network for discrimination and classification. After the tested sample has been assigned to a class, only the samples of that class are matched against it using identification criteria such as the comprehensive similarity coefficient, which yields the comparison information for the tested sample. Clustering and classification of this kind greatly reduce the number of comprehensive similarity coefficients that must be calculated.
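
As an illustration of the classification step, the sketch below implements a basic probabilistic neural network: a Gaussian kernel is evaluated against every stored pattern, the kernels are averaged per class, and the class with the largest prior-weighted density is chosen according to the Bayes decision rule. The smoothing width `sigma`, the equal default priors, the class labels and the feature layout are assumptions made for this example.

```python
import math

def pnn_classify(x, training, sigma=5.0, priors=None):
    """Classify x with a simple probabilistic neural network.

    training: {class_label: [feature_vector, ...]}
    Returns (best_label, posteriors).
    """
    scores = {}
    for label, patterns in training.items():
        # summation layer: average Gaussian kernel over the patterns of this class
        density = sum(
            math.exp(-sum((a - b) ** 2 for a, b in zip(x, p)) / (2.0 * sigma ** 2))
            for p in patterns
        ) / len(patterns)
        prior = priors[label] if priors else 1.0 / len(training)
        scores[label] = prior * density
    total = sum(scores.values())
    # if every kernel underflows to zero, the raw (zero) scores are returned
    posteriors = {lab: s / total for lab, s in scores.items()} if total > 0 else scores
    return max(scores, key=scores.get), posteriors

# Clustered reference samples described by (emission nm, excitation nm, intensity)
training = {
    "class_1": [(310, 260, 9.0), (312, 262, 8.5), (308, 258, 9.2)],
    "class_2": [(430, 340, 4.0), (432, 338, 4.3), (428, 342, 3.9)],
}
label, posteriors = pnn_classify((311, 261, 8.8), training)
print(label)
```

Once the class is known, the comprehensive similarity coefficient only has to be evaluated against the members of that class rather than the whole database.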

12. By repeating steps 8 to 11 of this embodiment, the reference comparison database can be expanded automatically. For example, after the tested sample has been clustered by K-means or a similar method, a new class is created if the sample is judged to belong to a new class. If no new class is created, the sample is assigned to an existing class by the Bayes classifier or probabilistic neural network and is then judged to be similar or equal to the samples of that class according to the magnitude of the comprehensive similarity coefficient. If it is similar, the sample information is stored in that class; if it is equal, the sample need not be stored in the reference comparison database.
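
The storage rule described in this step can be written as a small decision function. The sketch assumes that the new-class decision has already been made upstream and is passed in as a flag, and that "equal" means the best comprehensive similarity coefficient reaches a hypothetical threshold `equal_threshold`; the source gives no numeric criterion.

```python
def update_database(database, sample_id, class_id, best_R, is_new_class,
                    equal_threshold=0.999):
    """Apply the storage rule and return the action taken.

    database: {class_id: [sample_id, ...]}
    best_R:   largest comprehensive similarity coefficient within the class
    """
    if is_new_class:
        database[class_id] = [sample_id]      # open a new class for the sample
        return "new_class"
    if best_R >= equal_threshold:
        return "skipped_equal"                # already represented, nothing stored
    database.setdefault(class_id, []).append(sample_id)
    return "stored_similar"

database = {"class_1": ["ref_001", "ref_002"]}
actions = [
    update_database(database, "s_101", "class_1", best_R=0.93, is_new_class=False),
    update_database(database, "s_102", "class_1", best_R=0.9995, is_new_class=False),
    update_database(database, "s_103", "class_3", best_R=0.0, is_new_class=True),
]
print(actions, database)
```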

13. If the user needs further analytical information on the tested sample, additional multi-component fluorescence information can be obtained with methods such as parallel factor analysis, partial least squares, alternating trilinear decomposition and non-negative matrix factorization, and an identification and comparison method can then be built on it.

Taking parallel factor analysis as an example, a structure named OriginalData, which contains fields such as originaldata.ex, originaldata.em, originaldata.x, originaldata.nex, originaldata.nem and originaldata.nx, is loaded into the workspace and analyzed by the parallel factor method. For models with different numbers of components, this yields the three-dimensional fluorescence spectrum data (including those of each component), the two-dimensional excitation wavelength loading data, the two-dimensional emission wavelength loading data and the concentration score data.
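
The relation between these outputs follows from the trilinear form of the parallel factor model, in which each component contributes the outer product of its emission and excitation loadings scaled by its concentration score. The sketch below only rebuilds component spectra from given loadings; it does not fit the model, and the function names and toy loadings are hypothetical.

```python
def component_eem(em_loading, ex_loading, score=1.0):
    """Spectrum matrix of one component: score * outer(em_loading, ex_loading)."""
    return [[score * e_m * e_x for e_x in ex_loading] for e_m in em_loading]

def reconstruct_eem(em_loadings, ex_loadings, scores):
    """Sum the component matrices; each argument holds one entry per component."""
    n_em, n_ex = len(em_loadings[0]), len(ex_loadings[0])
    total = [[0.0] * n_ex for _ in range(n_em)]
    for em_l, ex_l, s in zip(em_loadings, ex_loadings, scores):
        comp = component_eem(em_l, ex_l, s)
        for i in range(n_em):
            for j in range(n_ex):
                total[i][j] += comp[i][j]
    return total

# Two-component toy model: 3 emission points, 2 excitation points
em_loadings = [[0.0, 1.0, 0.5], [0.5, 0.0, 1.0]]
ex_loadings = [[1.0, 0.5], [0.0, 2.0]]
scores = [2.0, 1.0]   # concentration scores of one sample

comp1 = component_eem(em_loadings[0], ex_loadings[0], scores[0])
model = reconstruct_eem(em_loadings, ex_loadings, scores)
print(comp1)  # [[0.0, 0.0], [2.0, 1.0], [1.0, 0.5]]
print(model)  # [[0.0, 1.0], [2.0, 1.0], [1.0, 2.5]]
```

Each per-component matrix produced this way has the same shape as the measured spectrum, so it can be compared with a reference component using the same similarity measure as the full spectra.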

The three-dimensional fluorescence spectrum data of each component, calculated for models with different numbers of components, are stored as CompX1.csv (component 1), CompX2.csv (component 2), CompX3.csv (component 3), …, CompXn.csv (component n), and the two-dimensional excitation wavelength loading data, the two-dimensional emission wavelength loading data and the concentration score data are stored as EmLoadingsX.csv, ExLoadingsX.csv and ConcLoadingsX.csv.

The three-dimensional fluorescence spectrum data of the tested sample S are processed in the same way to obtain CompS1.csv (component 1), CompS2.csv (component 2), CompS3.csv (component 3), …, CompSn.csv (component n), and the two-dimensional excitation wavelength loading data, the two-dimensional emission wavelength loading data and the concentration score data are stored as EmLoadingS.csv, ExLoadingS.csv and ConcLoadingS.csv.

Following the method of step 5 of this embodiment, the comprehensive similarity coefficient R is calculated before and after matrix transposition, which gives the comparison information between each component of the tested sample and the corresponding component of a given sample in the reference comparison database.
