Prediction method, equipment and medium of toxicity effect of phthalate on zebra fish

Document No.: 1478002 | Published: 2020-02-25

Reading note: This technology, "Prediction method, equipment and medium of toxicity effect of phthalate on zebra fish", was designed by 杨彦 (Yang Yan), 陈瑞琰 (Chen Ruiyan) and 陈浩佳 (Chen Haojia) on 2019-09-23. Its main content is as follows. The invention discloses a method, a device and a medium for predicting the toxic effect of phthalates on zebrafish. The method comprises: selecting toxicity effect endpoint values from toxicity data of multiple phthalates on zebrafish and constructing a toxicity data set; obtaining the structural parameters of each phthalate and constructing a structure descriptor data set; taking the toxicity effect endpoint value as the dependent variable and the structural parameters of each phthalate as independent variables, and calculating the correlation coefficient between the two factors; screening out, according to the correlation coefficient, the structural parameters significantly correlated with the toxicity effect endpoint value to obtain an optimal structure descriptor; establishing a multiple linear regression equation and constructing a quantitative structure-activity relationship model; and using the quantitative structure-activity relationship model to predict the toxicity effect endpoint value of an unknown phthalate on zebrafish. The invention provides a reference for toxicity research on phthalate compounds and is significant for the health risk assessment of such compounds.

1. A method for predicting toxic effects of phthalates on zebrafish, the method comprising:

selecting a toxicity effect endpoint value according to toxicity data of various phthalates on zebrafish, and constructing a toxicity data set;

obtaining structural parameters corresponding to each phthalate, and constructing a structure descriptor data set;

taking the toxicity effect endpoint value in the toxicity data set as a dependent variable, taking the structural parameters corresponding to each phthalate in the structure descriptor data set as independent variables, and calculating a correlation coefficient between the two factors;

screening out structural parameters significantly correlated with the toxicity effect endpoint value according to the correlation coefficient, and determining an optimal structure descriptor;

establishing a multiple linear regression equation according to the optimal structure descriptor and the toxicity data set, and constructing a quantitative structure-activity relationship model;

and predicting the toxicity effect endpoint value of an unknown phthalate on zebrafish by using the quantitative structure-activity relationship model.

2. The prediction method according to claim 1, wherein the selecting a toxicity effect end point value according to toxicity effects of a plurality of phthalates on zebra fish and constructing the toxicity data set specifically comprises:

querying a plurality of relevant databases with the phthalate compound name, "zebrafish" and "toxicity" as keywords to obtain toxicity data of the phthalates on zebrafish;

and screening out toxicity effect endpoint values obtained under the same experimental conditions from the toxicity data of the various phthalates on zebrafish, and constructing the toxicity data set.

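3. The prediction method of claim 1, wherein the correlation coefficient between the two factors is calculated as follows:

$$r = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^2\,\sum_{i=1}^{n}(y_i-\bar{y})^2}}$$

in the formula, $\bar{x}$ and $\bar{y}$ denote the mean values of the structural parameter and of the toxicity effect endpoint value, respectively; $x_i$ and $y_i$ denote the structural parameter and the toxicity effect endpoint value corresponding to the i-th phthalate; and n denotes the number of phthalate compounds.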

4. The prediction method according to claim 1, wherein the screening out the structural parameters significantly correlated to the toxicity effect endpoint value according to the correlation coefficient, and determining the optimal structural descriptor specifically comprises:

taking the correlation coefficient larger than 0.9 as a significant correlation coefficient, and screening out structural parameters significantly related to the toxicity effect end point value;

determining an optimal structural descriptor by principal component analysis based on structural parameters significantly correlated to the toxic effect endpoint value.

5. The prediction method according to any one of claims 1 to 4, wherein a multiple linear regression equation is established based on the optimal structure descriptor and the toxicity data set, specifically:

taking the optimal structure descriptor as the independent variable and the toxicity effect endpoint value as the dependent variable Y, and establishing a multiple linear regression equation Y = AX + B by multiple linear regression analysis, wherein:

Figure FDA0002211157510000021

in the formula, n is the number of observed values; m is the number of structural parameters in the optimal structure descriptor; A represents the unknown parameters, which are estimated by the least squares method; B represents the random error, reflecting the influence on Y of random factors other than the linear relationship between $x_1, x_2, \ldots, x_m$ and Y;

the estimation is performed by the least squares method, as follows:

Figure FDA0002211157510000022

in the formula, $X^T$ is the transpose of X.

6. The prediction method of claim 5, wherein the goodness-of-fit indicators of the multiple linear regression equation include the coefficient of determination, the degree-of-freedom-adjusted coefficient of determination and the root mean square error, and the F-test indicators include the F value and the associated probability p calculated by one-way analysis of variance, as follows:

Figure FDA0002211157510000023

Figure FDA0002211157510000024

Figure FDA0002211157510000025

in the formula, $y_i$ denotes the observed value, and y denotes the predicted toxicity effect endpoint value of the i-th phthalate,

Figure FDA0002211157510000031

the test uses the p value corresponding to the F statistic: if $R^2 \ge 0.8$ and, at significance level $\gamma$, $p < \gamma$, the multiple linear regression equation is significant.

7. The prediction method according to any one of claims 1 to 4, wherein after the constructing the quantitative structure-activity relationship model, the method further comprises:

verifying the quantitative structure-activity relationship model, specifically comprising the following steps:

for each phthalate, randomly extracting one sample in the toxicity data set as a prediction set, and taking the rest samples as training sets;

establishing a multiple linear regression equation according to the training set and the optimal structure descriptor, and calculating a predicted toxicity effect end point value of each phthalate;

calculating the cross-validation correlation coefficient $Q^2_{cv}$ and the cross-validation root mean square error RMSECV, as follows:

Figure FDA0002211157510000035

in the formula, $y_i^{obs}$ denotes the measured toxicity effect endpoint value of the i-th phthalate, $y_i^{predcv}$ denotes the toxicity effect endpoint value predicted for the i-th phthalate by the quantitative structure-activity relationship model,

Figure FDA0002211157510000036

if $Q^2_{cv} > 0.6$, RMSECV $\le 0.4$ and $R^2 - Q^2_{cv} \le 0.3$, the quantitative structure-activity relationship model passes verification, wherein $R^2$ denotes the coefficient of determination.

8. The prediction method according to claim 7, wherein after the quantitative structure-activity relationship model is verified, the method further comprises:

calculating the application range (applicability domain) of the quantitative structure-activity relationship model by the leverage method, specifically comprising:

calculating the leverage value $h_i$ as follows:

$h_i = x_i^T (X^T X)^{-1} x_i$

in the formula, $x_i$ denotes the column vector composed of the structural parameters corresponding to the i-th phthalate;

calculating a critical value h as follows:

in the formula, p represents the variable number in the quantitative structure-activity relationship model, and n represents the number of phthalate compounds in the training set;

plotting a scatter diagram with the leverage value of each structural parameter in the optimal structure descriptor as the abscissa and the prediction residual as the ordinate, wherein the region of the plot where $h_i$ is less than h is the application range of the quantitative structure-activity relationship model.

9. A computer device comprising a processor and a memory for storing a program executable by the processor, wherein the processor, when executing the program stored in the memory, implements the prediction method of any one of claims 1 to 8.

10. A storage medium storing a program, wherein the program, when executed by a processor, implements the prediction method according to any one of claims 1 to 8.

Technical Field

The invention relates to a method, equipment and a medium for predicting toxic effect of phthalic acid ester on zebra fish, and belongs to the fields of ecotoxicology, environmental pollution and human health.

Background

Phthalic acid esters (PAEs) are formed by esterification of phthalic anhydride with an alcohol in the presence of an acid catalyst (such as sulfuric acid). They are a large class of lipid-soluble compounds, are classified as Class 4 toxic chemical substances, and are widely used across industries.

At normal temperature, phthalates are colorless or yellowish, tasteless, oily viscous liquids that are slightly soluble in water and readily soluble in organic solvents. As common plasticizers, they are used to improve the mechanical properties of plastics and to increase the plasticity and strength of products. They are used in manufacturing plastic containers, infant toys and the like, and are also added to products such as cosmetics, building materials and medical components.

Worldwide, more than 500 million tons of phthalates are used each year. Because of their widespread use in industry and daily life, large amounts of phthalates enter environmental media such as water, soil, organisms and the atmosphere, and they are now among the major environmental pollutants worldwide.

In recent years, researchers at home and abroad have found that phthalates have toxicological effects including reproductive toxicity, neurotoxicity, carcinogenicity, teratogenicity, mutagenicity and endocrine disruption, and phthalates have attracted international attention.

Quantitative structure-activity relationship (QSAR) modeling establishes quantitative relationships involving the physicochemical property parameters and structural parameters of molecules by mathematical and statistical means, in order to find the relationship between the microstructure of a compound and its biological activity. QSAR belongs to computational toxicology and is now widely used to predict biological toxicity data in health risk assessment. Katie Chan et al. used a QSAR model to predict the cytotoxicity of halobenzene compounds to rat and human hepatocytes; Mounir Ghamali et al. used a QSAR model to predict the toxicity of phenols and thiophenols to luminescent bacteria. Many domestic scholars have also used QSAR models to predict the biotoxicity of compounds: Lianmian et al. predicted the toxicity of chlorophenol compounds to the marine alga Dunaliella salina, and QSAR models have likewise been used to predict the acute toxicity of nitroaromatic compounds to Tetrahymena pyriformis.

Zebrafish are a model animal for ecotoxicology research. They are easy to keep, produce many offspring and are small, and they are widely used in studying the ecotoxicological effects of pollutants, in water quality monitoring and in related applications.

At present, the toxicity of phthalates to zebrafish is generally determined by experimental testing. Experimental studies by Muiches et al. show that di(2-ethylhexyl) phthalate (DEHP) and dibutyl phthalate (DBP) can induce a series of developmental abnormalities in zebrafish embryos, including abnormal spontaneous movement, reduced heart rate, spinal curvature and pericardial edema. Neixianping et al. also studied experimentally the toxic effects of four phthalates, namely dimethyl phthalate (DMP), diethyl phthalate (DEP), di-n-butyl phthalate (DBP) and di(2-ethylhexyl) phthalate (DEHP), on zebrafish embryonic development; these compounds markedly inhibit embryonic development and can cause malformation and even death. However, experimental testing involves many sources of uncertainty, such as long experimental periods, high cost, strong background interference and the limits of detection conditions and techniques, so the resulting data can lack reliability. Moreover, because there are many such compounds, testing each one experimentally cannot provide sufficient ecotoxicity data for health risk assessment efficiently and quickly. Although computational toxicology has been applied in toxicology, environmental chemistry, bioinformatics and other fields, there has been no report of research on the toxic effect of phthalates on zebrafish based on quantitative structure-activity relationships.

Disclosure of Invention

In view of the above, the invention provides a method, a system, a computer device and a storage medium for predicting the toxic effect of phthalates on zebrafish. A model is built from toxicity data of multiple phthalates on zebrafish to study the ecotoxicological effects of phthalates. This reduces experimental cost and shortens the time required, provides a reference for toxicity research on phthalate compounds, and is significant for the health risk assessment of such compounds.

The first purpose of the invention is to provide a prediction method of toxicity effect of phthalate on zebra fish.

The second purpose of the invention is to provide a prediction system of toxicity effect of phthalate on zebra fish.

It is a third object of the invention to provide a computer apparatus.

It is a fourth object of the present invention to provide a storage medium.

The first purpose of the invention can be achieved by adopting the following technical scheme:

a method of predicting the toxic effect of phthalates on zebrafish, the method comprising:

selecting a toxicity effect endpoint value according to toxicity data of various phthalates on zebrafish, and constructing a toxicity data set;

obtaining structural parameters corresponding to each phthalate, and constructing a structure descriptor data set;

taking the toxicity effect endpoint value in the toxicity data set as a dependent variable, taking the structural parameters corresponding to each phthalate in the structure descriptor data set as independent variables, and calculating a correlation coefficient between the two factors;

screening out structural parameters significantly correlated with the toxicity effect endpoint value according to the correlation coefficient, and determining an optimal structure descriptor;

establishing a multiple linear regression equation according to the optimal structure descriptor and the toxicity data set, and constructing a quantitative structure-activity relationship model;

and predicting the toxicity effect endpoint value of an unknown phthalate on zebrafish by using the quantitative structure-activity relationship model.

Further, selecting a toxicity effect endpoint value according to the toxicity effects of the plurality of phthalates on zebrafish and constructing the toxicity data set specifically comprises:

querying a plurality of relevant databases with the phthalate compound name, "zebrafish" and "toxicity" as keywords to obtain toxicity data of the phthalates on zebrafish;

and screening out toxicity effect endpoint values obtained under the same experimental conditions from the toxicity data of the various phthalates on zebrafish, and constructing the toxicity data set.
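The screening step above can be sketched in Python as follows. This is only an illustration under stated assumptions: the record layout (compound, endpoint type, exposure hours, value), the function name `build_toxicity_dataset`, and the choice to average repeated measurements are hypothetical and do not come from the patent, and the numbers are placeholders rather than measured toxicity data.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (compound, endpoint type, exposure hours, value).
# The values are made-up placeholders, not measured toxicity data.
records = [
    ("DMP", "LC50", 96, 1.90),
    ("DMP", "LC50", 72, 2.10),
    ("DEP", "LC50", 96, 1.40),
    ("DBP", "LC50", 96, 0.20),
    ("DBP", "EC50", 96, 0.10),
]


def build_toxicity_dataset(records, endpoint, hours):
    """Keep only records that share one endpoint type and exposure time,
    then average repeated measurements per compound (an assumption)."""
    grouped = defaultdict(list)
    for compound, kind, duration, value in records:
        if kind == endpoint and duration == hours:
            grouped[compound].append(value)
    return {compound: mean(values) for compound, values in grouped.items()}


dataset = build_toxicity_dataset(records, endpoint="LC50", hours=96)
```

Records measured under other conditions (here the 72-hour and EC50 entries) are dropped, so every retained endpoint value is comparable across compounds.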

Further, the correlation coefficient between the two factors is calculated as follows:

$$r = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^2\,\sum_{i=1}^{n}(y_i-\bar{y})^2}}$$

in the formula, $\bar{x}$ and $\bar{y}$ denote the mean values of the structural parameter and of the toxicity effect endpoint value, respectively; $x_i$ and $y_i$ denote the structural parameter and the toxicity effect endpoint value corresponding to the i-th phthalate; and n denotes the number of phthalate compounds.

Further, screening out a structural parameter significantly related to the toxic effect endpoint value according to the correlation coefficient, and determining an optimal structure descriptor, specifically comprising:

taking the correlation coefficient larger than 0.9 as a significant correlation coefficient, and screening out structural parameters significantly related to the toxicity effect end point value;

determining an optimal structural descriptor by principal component analysis based on structural parameters significantly correlated to the toxic effect endpoint value.
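As a minimal sketch of the correlation screening described above, the following Python computes a Pearson correlation coefficient for each candidate structural parameter and keeps those above the 0.9 threshold. The function names, the descriptor names and the numbers are hypothetical placeholders, and applying the threshold to the absolute value of the coefficient is an assumption. The subsequent principal component analysis step is not covered here.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def screen_descriptors(descriptors, endpoint, threshold=0.9):
    """Return {name: r} for descriptors whose |r| exceeds the threshold.
    Using the absolute value is an assumption made for this sketch."""
    selected = {}
    for name, values in descriptors.items():
        r = pearson_r(values, endpoint)
        if abs(r) > threshold:
            selected[name] = r
    return selected


# Placeholder data: five compounds, two hypothetical structural parameters.
endpoint = [1.0, 2.0, 3.0, 4.0, 5.0]
descriptors = {
    "descriptor_a": [2.0, 4.1, 5.9, 8.2, 10.0],  # nearly linear in the endpoint
    "descriptor_b": [5.0, 1.0, 4.0, 2.0, 3.0],   # only weakly related
}
selected = screen_descriptors(descriptors, endpoint)
```

With these placeholder values only `descriptor_a` survives the screen; `descriptor_b` has a coefficient of -0.3 and is discarded.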

Further, the establishing a multiple linear regression equation according to the optimal structure descriptor and the toxicity data set specifically comprises:

taking the optimal structure descriptor as the independent variable and the toxicity effect endpoint value as the dependent variable Y, and establishing a multiple linear regression equation Y = AX + B by multiple linear regression analysis, wherein:

Figure BDA0002211157520000033

in the formula, n is the number of observed values; m is the number of structural parameters in the optimal structure descriptor; A represents the unknown parameters, which are estimated by the least squares method; B represents the random error, reflecting the influence on Y of random factors other than the linear relationship between $x_1, x_2, \ldots, x_m$ and Y;

the estimation is performed by the least squares method, as follows:

Figure BDA0002211157520000041

in the formula, $X^T$ is the transpose of X.
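The least squares estimation can be illustrated with a short Python sketch that uses only the standard library. It assumes the usual convention in which each row of the design matrix is one compound with a leading 1 for the intercept, so that the coefficients solve the normal equations (X^T X) a = X^T Y. The function names and the data are hypothetical; the data are synthetic values generated from y = 1 + 2·x1 - 3·x2 so the result is easy to check.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    aug = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [u - factor * v for u, v in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def fit_mlr(descriptor_rows, y):
    """Return least-squares coefficients [intercept, a1, ..., am]."""
    # Design matrix with an intercept column (an assumption of this sketch).
    X = [[1.0] + list(row) for row in descriptor_rows]
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict(coefficients, descriptor_row):
    """Evaluate the fitted regression equation for one compound."""
    return coefficients[0] + sum(c * x for c, x in zip(coefficients[1:], descriptor_row))


# Synthetic placeholder data generated from y = 1 + 2*x1 - 3*x2.
rows = [(1.0, 2.0), (2.0, 1.0), (3.0, 4.0), (4.0, 3.0), (5.0, 5.0)]
y = [-3.0, 2.0, -5.0, 0.0, -4.0]
coefficients = fit_mlr(rows, y)
```

Solving the normal equations directly is adequate for a handful of descriptors; for ill-conditioned descriptor sets a QR- or SVD-based solver would be the more robust choice.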

Further, the goodness-of-fit indicators of the multiple linear regression equation include the coefficient of determination, the degree-of-freedom-adjusted coefficient of determination and the root mean square error, and the F-test indicators include the F value and the associated probability p calculated by one-way analysis of variance, as follows:

Figure BDA0002211157520000042

Figure BDA0002211157520000043

Figure BDA0002211157520000045

in the formula, $y_i$ denotes the observed toxicity effect endpoint value of the i-th phthalate; $\hat{y}_i$ denotes the corresponding value predicted by the multiple linear regression equation; $\bar{y}$ denotes the mean of the toxicity effect endpoint values; n denotes the number of phthalate compounds; $R^2$ denotes the coefficient of determination; $R^2_{adj}$ denotes the degree-of-freedom-adjusted coefficient of determination; RMSE denotes the root mean square error; SS(total) denotes the total sum of squares; and SS(residual) denotes the residual sum of squares;

the test uses the p value corresponding to the F statistic: if $R^2 \ge 0.8$ and, at significance level $\gamma$, $p < \gamma$, the multiple linear regression equation is significant.
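The goodness-of-fit indicators can be sketched in Python as below. Because the patent's formula images are not reproduced in this text, the exact forms used here are assumptions based on the common textbook definitions: R² = 1 - SS(residual)/SS(total), adjusted R² = 1 - (1 - R²)(n - 1)/(n - m - 1), RMSE = sqrt(SS(residual)/n), and F = [(SS(total) - SS(residual))/m] / [SS(residual)/(n - m - 1)]. The function name and the numbers are hypothetical placeholders.

```python
from math import sqrt


def goodness_of_fit(observed, predicted, m):
    """Fit statistics for a regression with m descriptors plus an intercept.
    The formulas are the common textbook forms, assumed for this sketch."""
    n = len(observed)
    mean_obs = sum(observed) / n
    ss_total = sum((y - mean_obs) ** 2 for y in observed)
    ss_residual = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    r2 = 1.0 - ss_residual / ss_total
    r2_adj = 1.0 - (1.0 - r2) * (n - 1) / (n - m - 1)
    rmse = sqrt(ss_residual / n)
    # F statistic; SS(regression) = SS(total) - SS(residual) holds for
    # least-squares fits that include an intercept.
    f_value = ((ss_total - ss_residual) / m) / (ss_residual / (n - m - 1))
    return {"r2": r2, "r2_adj": r2_adj, "rmse": rmse, "f": f_value}


# Placeholder observed and predicted values for a one-descriptor model.
observed = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = [1.1, 1.9, 3.2, 3.8, 5.0]
stats = goodness_of_fit(observed, predicted, m=1)
```

The p value associated with F comes from the F distribution with (m, n - m - 1) degrees of freedom; if SciPy is available, `scipy.stats.f.sf(f_value, m, n - m - 1)` is one way to obtain it.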

Further, after the construction of the quantitative structure-activity relationship model, the method further includes:

verifying the quantitative structure-activity relationship model, specifically comprising the following steps:

for each phthalate, randomly extracting one sample in the toxicity data set as a prediction set, and taking the rest samples as training sets;

establishing a multiple linear regression equation according to the training set and the optimal structure descriptor, and calculating a predicted toxicity effect end point value of each phthalate;

calculating the cross-validation correlation coefficient $Q^2_{cv}$ and the cross-validation root mean square error RMSECV, as follows:

Figure BDA0002211157520000051

Figure BDA0002211157520000052

in the formula, $y_i^{obs}$ denotes the measured toxicity effect endpoint value of the i-th phthalate; $y_i^{predcv}$ denotes the toxicity effect endpoint value predicted for the i-th phthalate by the quantitative structure-activity relationship model; $\bar{y}$ denotes the mean toxicity effect endpoint value of the phthalates in the training set; and n denotes the number of phthalate compounds in the training set;

if $Q^2_{cv} > 0.6$, RMSECV $\le 0.4$ and $R^2 - Q^2_{cv} \le 0.3$, the quantitative structure-activity relationship model passes verification, wherein $R^2$ denotes the coefficient of determination.
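The verification procedure reads as leave-one-out cross-validation, and a minimal Python sketch of it follows. Several details are assumptions, since the formula images are not reproduced here: Q²cv is taken as 1 - Σ(y_obs - y_predcv)² / Σ(y_obs - ȳ_train)², with ȳ_train the mean of each fold's training set, and RMSECV as the square root of the mean squared cross-validated error over all held-out compounds. The function names and the data are hypothetical placeholders.

```python
from math import sqrt


def solve_linear_system(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    aug = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [u - factor * v for u, v in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def fit_mlr(descriptor_rows, y):
    """Least-squares coefficients [intercept, a1, ..., am]."""
    X = [[1.0] + list(row) for row in descriptor_rows]
    k = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict(coefficients, descriptor_row):
    return coefficients[0] + sum(c * x for c, x in zip(coefficients[1:], descriptor_row))


def leave_one_out(descriptor_rows, y):
    """Return (Q2cv, RMSECV), holding out each compound once."""
    n = len(y)
    press = 0.0     # sum of squared cross-validated prediction errors
    ss_total = 0.0  # squared deviations from each fold's training-set mean
    for i in range(n):
        train_rows = descriptor_rows[:i] + descriptor_rows[i + 1:]
        train_y = y[:i] + y[i + 1:]
        coefficients = fit_mlr(train_rows, train_y)
        press += (y[i] - predict(coefficients, descriptor_rows[i])) ** 2
        ss_total += (y[i] - sum(train_y) / len(train_y)) ** 2
    return 1.0 - press / ss_total, sqrt(press / n)


def passes_validation(r2, q2cv, rmsecv):
    """Acceptance criteria stated in the text."""
    return q2cv > 0.6 and rmsecv <= 0.4 and (r2 - q2cv) <= 0.3


# Placeholder data: one hypothetical descriptor, six compounds.
rows = [[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]]
y = [3.1, 4.9, 7.2, 8.8, 11.1, 12.9]
q2cv, rmsecv = leave_one_out(rows, y)
```

`passes_validation` then combines the cross-validation results with the R² of the model fitted on the full data set.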

Further, after the quantitative structure-activity relationship model is verified, the method further includes:

calculating the application range (applicability domain) of the quantitative structure-activity relationship model by the leverage method, specifically comprising:

calculating the leverage value $h_i$ as follows:

$h_i = x_i^T (X^T X)^{-1} x_i$

in the formula, $x_i$ denotes the column vector composed of the structural parameters corresponding to the i-th phthalate;

calculating a critical value h as follows:

wherein, p represents the variable number in the quantitative structure-activity relationship model, p is 2 in the multi-parameter model, and n represents the number of phthalate compounds in the training set;

plotting a scatter diagram with the leverage value of each structural parameter in the optimal structure descriptor as the abscissa and the prediction residual as the ordinate, wherein the region of the plot where $h_i$ is less than h is the application range of the quantitative structure-activity relationship model.
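The leverage calculation can be sketched in Python as follows. Two points are assumptions rather than statements from the patent: each vector x_i is extended with a leading 1 for the intercept, and the critical value is taken as h = 3(p + 1)/n, a convention often used for this kind of plot, because the patent's own formula for h is not reproduced in this text. The function names and the data are hypothetical placeholders.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    aug = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [u - factor * v for u, v in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def leverages(train_rows, query_rows):
    """h = x^T (X^T X)^{-1} x for each query row, with X built from the training set."""
    X = [[1.0] + list(row) for row in train_rows]  # intercept column is an assumption
    k = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    values = []
    for row in query_rows:
        x = [1.0] + list(row)
        z = solve_linear_system(xtx, x)  # z = (X^T X)^{-1} x
        values.append(sum(a * b for a, b in zip(x, z)))
    return values


def critical_leverage(p, n):
    """Assumed warning threshold 3(p + 1)/n; not taken from the patent text."""
    return 3.0 * (p + 1) / n


# Placeholder training set: one hypothetical descriptor, six compounds.
train = [[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]]
h_train = leverages(train, train)
h_critical = critical_leverage(p=1, n=len(train))
h_new = leverages(train, [[20.0]])[0]   # a compound far outside the training range
inside_domain = h_new < h_critical
```

Here every training compound falls below the critical value, while the query compound with descriptor value 20 has a leverage far above it, so its prediction would be treated as an extrapolation outside the application range.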

The second purpose of the invention can be achieved by adopting the following technical scheme:

a system for predicting toxic effects of phthalates on zebrafish, the system comprising:

the first construction module is used for selecting a toxicity effect end point value according to toxicity data of various phthalates on the zebra fish and constructing a toxicity data set;

the second construction module is used for acquiring the structural parameters corresponding to each phthalate and constructing a structural descriptor data set;

the calculation module is used for calculating a correlation coefficient between two factors by taking a toxicity effect end point value in a toxicity data set as a dependent variable and taking a structure parameter corresponding to each phthalate in a structure descriptor data set as an independent variable;

the determining module is used for screening out structural parameters significantly correlated with the toxicity effect endpoint value according to the correlation coefficient and determining an optimal structure descriptor;

the third construction module is used for establishing a multiple linear regression equation according to the optimal structure descriptor and the toxicity data set and constructing a quantitative structure-activity relationship model;

and the prediction module is used for predicting the toxicity effect endpoint value of an unknown phthalate on zebrafish by using the quantitative structure-activity relationship model.

The third purpose of the invention can be achieved by adopting the following technical scheme:

a computer device comprising a processor and a memory for storing a program executable by the processor, wherein the processor implements the prediction method when executing the program stored in the memory.

The fourth purpose of the invention can be achieved by adopting the following technical scheme:

a storage medium stores a program which, when executed by a processor, implements the prediction method described above.

Compared with the prior art, the invention has the following beneficial effects:

1. The invention selects toxicity effect endpoint values to construct a toxicity data set and obtains the structural parameters of each phthalate to construct a structure descriptor data set. Taking the toxicity effect endpoint values as the dependent variable and the structural parameters as independent variables, it calculates the correlation coefficient between the two factors, determines the optimal structure descriptor, establishes a multiple linear regression equation and constructs a quantitative structure-activity relationship model. The model can predict the toxicity effect endpoint value of an unknown phthalate on zebrafish, which facilitates in-depth research on the toxic effects of such compounds. The model is simple and easy to understand, has good robustness, reliability and predictive ability, and is convenient to use in practice. The method is low-cost, simple and efficient, and can greatly reduce the cost of experimental testing and shorten the experimental period.

2. After the quantitative structure-activity relationship model is constructed, it can be verified. For each phthalate, one sample in the toxicity data set is randomly extracted as the prediction set and the remaining samples serve as the training set; a multiple linear regression equation is established from the training set and the optimal structure descriptor; and the predicted toxicity effect endpoint value of each phthalate is calculated and compared with the measured value to check whether the constructed model is reliable.

3. After the quantitative structure-activity relationship model passes verification, its application range can be calculated by the leverage method, which helps ensure that the model is used where its predictions are most reliable.

4. The method can provide basic data for the risk assessment and monitoring of phthalate pollutants. It makes full use of the relatively complete toxicity databases and research literature available at home and abroad to establish a toxicity prediction model suited to zebrafish, and supplements the zebrafish toxicity database.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the structures shown in the drawings without creative efforts.

FIG. 1 is a flow chart of the method for predicting toxicity effect of phthalate ester on zebra fish in example 1 of the present invention.

FIG. 2 is a flow chart of constructing toxicity data sets according to example 1 of the present invention.

Fig. 3 is a flowchart of determining an optimal structure descriptor according to embodiment 1 of the present invention.

Fig. 4 is a scatter diagram of the application range of the quantitative structure-activity relationship model calculated by using one of the structural parameters in embodiment 1 of the present invention.

FIG. 5 is a scattergram of the application range of the quantitative structure-activity relationship model calculated by using another structure parameter in embodiment 1 of the present invention.

Fig. 6 is a block diagram showing the structure of a system for predicting toxicity effect of phthalate ester on zebra fish in example 3 of the present invention.

Fig. 7 is a block diagram showing the structure of a first building block according to embodiment 3 of the present invention.

Fig. 8 is a block diagram of a determining module according to embodiment 3 of the present invention.

Fig. 9 is a block diagram of a computer device according to embodiment 4 of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer and more complete, the technical solutions in the embodiments of the present invention will be described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some embodiments of the present invention, but not all embodiments, and all other embodiments obtained by a person of ordinary skill in the art without creative efforts based on the embodiments of the present invention belong to the protection scope of the present invention.
