Pancreatic cancer diagnosis marker based on metabonomics and screening method and application thereof

Document No.: 1686042 Publication date: 2020-01-03

Reading note: "Pancreatic cancer diagnosis marker based on metabonomics and screening method and application thereof" (基于代谢组学的胰腺癌诊断标志物及其筛选方法和应用) was created by 尹玉新 (Yin Yuxin), 王光熙 (Wang Guangxi) and 庞瑞芳 (Pang Ruifang) on 2019-09-12. The invention discloses metabonomics-based pancreatic cancer diagnostic markers and a method for screening them; the diagnostic markers comprise any one, or a combination of more than one, of 31 plasma metabolic markers. The invention also provides a method for constructing a diagnosis model using the markers, and their application in a diagnostic kit. Patient plasma is subjected to non-targeted metabonomics analysis by high performance liquid chromatography-mass spectrometry; differential metabolites between pancreatic cancer patients and the normal population are discovered by artificial intelligence data analysis; and the diagnostic ability of these specific differential metabolites, i.e. the pancreatic cancer diagnostic markers, is then verified by targeted metabonomics analysis and machine learning modeling.

1. A diagnostic marker for the diagnosis of pancreatic cancer, characterized in that the marker is any one or more of the following 31 plasma metabolic markers: lysophosphatidylcholine LPC 14:0, lysophosphatidylcholine LPC 16:0, lysophosphatidylcholine LPC 16:2, lysophosphatidylcholine LPC 18:1, lysophosphatidylcholine LPC 20:4, phosphatidylcholine PC 16:0-16:0, phosphatidylcholine PC 16:0-18:1, phosphatidylcholine PC 18:0-18:2, phosphatidylcholine PC 18:0-20:3, phosphatidylcholine PC 16:0-22:5, phosphatidylcholine PC 18:0-22:5, phosphatidylcholine PC O-16:0-18:2, phosphatidylcholine PC 16:0e/18:2, phosphatidylcholine PC 38:3e, phosphatidylcholine PC 46:1e, lysophosphatidylethanolamine LPE 22:4, phosphatidylethanolamine PE 16:0-18:2, phosphatidylethanolamine PE 16:3e/2:0, phosphatidylethanolamine PE 22:4e/4:0, phosphatidylethanolamine PE 22:6e/4:0, phosphatidylethanolamine PE 26:0e/8:0, phosphatidylethanolamine PE 22:5e/20:3, phosphatidylserine PS 18:0-18:1, phosphatidylinositol PI 18:0-18:2, sphingomyelin SM d18:1/18:0, sphingomyelin SM d18:2/24:1, sphingomyelin SM d18:2/24:2, diglyceride DG 18:1-18:1, triglyceride TG 8:0-8:0-8:0, triglyceride TG 8:0-8:0-10:0, and branched fatty acid ester of hydroxy fatty acid FAHFA 4:0/20:4.

2. The diagnostic marker of claim 1, characterized in that the marker comprises any one or more of the following 19 plasma metabolic markers: lysophosphatidylcholine LPC 14:0, lysophosphatidylcholine LPC 16:0, lysophosphatidylcholine LPC 18:1, lysophosphatidylcholine LPC 20:4, phosphatidylcholine PC 16:0-16:0, phosphatidylcholine PC 16:0-18:1, phosphatidylcholine PC 18:0-18:2, phosphatidylcholine PC 18:0-20:3, phosphatidylcholine PC 16:0-22:5, phosphatidylcholine PC 18:0-22:5, phosphatidylcholine PC O-16:0-18:2, lysophosphatidylethanolamine LPE 22:4, phosphatidylethanolamine PE 16:0-18:2, phosphatidylserine PS 18:0-18:1, phosphatidylinositol PI 18:0-18:2, sphingomyelin SM d18:1/18:0, sphingomyelin SM d18:2/24:1, sphingomyelin SM d18:2/24:2, and diglyceride DG 18:1-18:1.

3. The diagnostic marker according to claim 1 or claim 2, characterized in that the marker comprises any one or more of the following 17 plasma metabolic markers: lysophosphatidylcholine LPC 14:0, lysophosphatidylcholine LPC 16:0, lysophosphatidylcholine LPC 18:1, lysophosphatidylcholine LPC 20:4, phosphatidylcholine PC 16:0-16:0, phosphatidylcholine PC 16:0-18:1, phosphatidylcholine PC 18:0-18:2, phosphatidylcholine PC 18:0-20:3, phosphatidylcholine PC 16:0-22:5, phosphatidylcholine PC 18:0-22:5, phosphatidylcholine PC O-16:0-18:2, lysophosphatidylethanolamine LPE 22:4, phosphatidylethanolamine PE 16:0-18:2, sphingomyelin SM d18:1/18:0, sphingomyelin SM d18:2/24:1, sphingomyelin SM d18:2/24:2, and diglyceride DG 18:1-18:1.

4. The diagnostic marker of claim 1, claim 2 or claim 3, characterized in that the marker comprises any one or more of the following 14 plasma metabolic markers: lysophosphatidylcholine LPC 16:0, lysophosphatidylcholine LPC 18:1, lysophosphatidylcholine LPC 20:4, phosphatidylcholine PC 16:0-18:1, phosphatidylcholine PC 18:0-18:2, phosphatidylcholine PC 18:0-20:3, phosphatidylcholine PC 16:0-22:5, phosphatidylcholine PC 18:0-22:5, phosphatidylcholine PC O-16:0-18:2, lysophosphatidylethanolamine LPE 22:4, sphingomyelin SM d18:1/18:0, sphingomyelin SM d18:2/24:1, sphingomyelin SM d18:2/24:2, and diglyceride DG 18:1-18:1.

5. A method for screening a pancreatic cancer diagnostic marker, characterized by comprising the following steps:

(1) collecting plasma samples of pancreatic cancer patients and healthy people as analysis samples;

(2) performing non-targeted metabonomics analysis on each analysis sample by adopting a liquid chromatography-mass spectrometry combined technology to obtain an original metabolic fingerprint of each plasma sample;

(3) processing the original metabolic fingerprints of the pancreatic cancer plasma samples and healthy plasma samples with MS-Dial software to obtain a two-dimensional matrix in which each row holds the information for one metabolite and each column corresponds to one analysis sample; metabolite peak identification (including isotope peaks, adducts and fragment ions) and peak area integration are performed to produce the two-dimensional matrix for subsequent machine learning;

(4) learning the two-dimensional matrix data from step (3) with a machine learning support vector machine (SVM) algorithm, using 3/4 of the pancreatic cancer and healthy control plasma sample data as a training set and the remaining 1/4 as a test set; applying random four-fold learning to the training set, i.e. randomly selecting 3/4 of its samples for training and using the remaining 1/4 as a cross-validation set, and repeating this random selection for 5000 iterations to obtain the model that classifies best on the cross-validation set; and finally validating and analyzing the model on the test set, the mean accuracy of the final model being computed to show that the SVM model can effectively classify the metabolome data of pancreatic cancer patients and healthy people;

(5) based on the obtained SVM model, performing machine-learning-based sequential feature screening: using the feature importance scores from SVM modeling, successively accumulating important features to form candidate models, and evaluating the classification accuracy of each candidate model to compare their classification performance, so as to determine the relatively optimal number and combination of features, the criterion being that model accuracy no longer rises when further features are added;

(6) performing mass-spectrometry-based refinement of the optimal features, i.e. the target differential metabolites obtained by screening, by using MS-Dial software to filter them according to chromatographic peak shape quality and secondary mass spectrum data, to obtain potential metabolic markers;

(7) inferring the molecular mass and molecular formula of each potential metabolic marker from its primary and secondary mass spectrum information, and comparing them with the spectral information in a metabolite spectral database, thereby identifying the metabolites and obtaining the plasma metabolic markers suitable for diagnosing pancreatic cancer.

6. A method for constructing a pancreatic cancer diagnosis model, characterized by comprising the following steps:

(1) collecting plasma samples of pancreatic cancer patients and healthy people as analysis samples;

(2) performing targeted metabonomics analysis of the diagnosis marker on each analysis sample by adopting a liquid chromatography-mass spectrometry combined technology to obtain a targeted metabonomic map of each plasma sample;

(3) processing the targeted metabolome maps of the pancreatic cancer plasma samples and healthy plasma samples with MS-Dial software to obtain a two-dimensional matrix in which each row is a marker and each column is an analysis sample, for subsequent machine learning;

(4) constructing a classification model with a machine learning SVM from the two-dimensional matrix of the diagnostic markers to obtain the pancreatic cancer diagnosis model.

7. The method for constructing a pancreatic cancer diagnostic model according to claim 6, characterized in that the diagnostic marker in step (2) is any one or more of the following 31 plasma metabolic markers: lysophosphatidylcholine LPC 14:0, lysophosphatidylcholine LPC 16:0, lysophosphatidylcholine LPC 16:2, lysophosphatidylcholine LPC 18:1, lysophosphatidylcholine LPC 20:4, phosphatidylcholine PC 16:0-16:0, phosphatidylcholine PC 16:0-18:1, phosphatidylcholine PC 18:0-18:2, phosphatidylcholine PC 18:0-20:3, phosphatidylcholine PC 16:0-22:5, phosphatidylcholine PC 18:0-22:5, phosphatidylcholine PC O-16:0-18:2, phosphatidylcholine PC 16:0e/18:2, phosphatidylcholine PC 38:3e, phosphatidylcholine PC 46:1e, lysophosphatidylethanolamine LPE 22:4, phosphatidylethanolamine PE 16:0-18:2, phosphatidylethanolamine PE 16:3e/2:0, phosphatidylethanolamine PE 22:4e/4:0, phosphatidylethanolamine PE 22:6e/4:0, phosphatidylethanolamine PE 26:0e/8:0, phosphatidylethanolamine PE 22:5e/20:3, phosphatidylserine PS 18:0-18:1, phosphatidylinositol PI 18:0-18:2, sphingomyelin SM d18:1/18:0, sphingomyelin SM d18:2/24:1, sphingomyelin SM d18:2/24:2, diglyceride DG 18:1-18:1, triglyceride TG 8:0-8:0-8:0, triglyceride TG 8:0-8:0-10:0, and branched fatty acid ester of hydroxy fatty acid FAHFA 4:0/20:4.

8. The method for constructing a pancreatic cancer diagnostic model according to claim 6, characterized in that the diagnostic marker in step (2) is any one or more of the following 19 plasma metabolic markers: lysophosphatidylcholine LPC 14:0, lysophosphatidylcholine LPC 16:0, lysophosphatidylcholine LPC 18:1, lysophosphatidylcholine LPC 20:4, phosphatidylcholine PC 16:0-16:0, phosphatidylcholine PC 16:0-18:1, phosphatidylcholine PC 18:0-18:2, phosphatidylcholine PC 18:0-20:3, phosphatidylcholine PC 16:0-22:5, phosphatidylcholine PC 18:0-22:5, phosphatidylcholine PC O-16:0-18:2, lysophosphatidylethanolamine LPE 22:4, phosphatidylethanolamine PE 16:0-18:2, phosphatidylserine PS 18:0-18:1, phosphatidylinositol PI 18:0-18:2, sphingomyelin SM d18:1/18:0, sphingomyelin SM d18:2/24:1, sphingomyelin SM d18:2/24:2, and diglyceride DG 18:1-18:1.

9. The method for constructing a pancreatic cancer diagnostic model according to claim 6, characterized in that the diagnostic marker in step (2) is any one or more of the following 17 plasma metabolic markers: lysophosphatidylcholine LPC 14:0, lysophosphatidylcholine LPC 16:0, lysophosphatidylcholine LPC 18:1, lysophosphatidylcholine LPC 20:4, phosphatidylcholine PC 16:0-16:0, phosphatidylcholine PC 16:0-18:1, phosphatidylcholine PC 18:0-18:2, phosphatidylcholine PC 18:0-20:3, phosphatidylcholine PC 16:0-22:5, phosphatidylcholine PC 18:0-22:5, phosphatidylcholine PC O-16:0-18:2, lysophosphatidylethanolamine LPE 22:4, phosphatidylethanolamine PE 16:0-18:2, sphingomyelin SM d18:1/18:0, sphingomyelin SM d18:2/24:1, sphingomyelin SM d18:2/24:2, and diglyceride DG 18:1-18:1.

10. The method for constructing a pancreatic cancer diagnostic model according to claim 6, characterized in that the diagnostic marker in step (2) is any one or more of the following 14 plasma metabolic markers: lysophosphatidylcholine LPC 16:0, lysophosphatidylcholine LPC 18:1, lysophosphatidylcholine LPC 20:4, phosphatidylcholine PC 16:0-18:1, phosphatidylcholine PC 18:0-18:2, phosphatidylcholine PC 18:0-20:3, phosphatidylcholine PC 16:0-22:5, phosphatidylcholine PC 18:0-22:5, phosphatidylcholine PC O-16:0-18:2, lysophosphatidylethanolamine LPE 22:4, sphingomyelin SM d18:1/18:0, sphingomyelin SM d18:2/24:1, sphingomyelin SM d18:2/24:2, and diglyceride DG 18:1-18:1.

11. A pancreatic cancer diagnostic kit, characterized by comprising a diagnostic marker according to any one of claims 1 to 4.

Technical Field

The invention belongs to the field of clinical examination and diagnosis, and particularly relates to a pancreatic cancer diagnosis marker based on metabonomics and machine learning analysis technology, a screening method of the diagnosis marker, a method for constructing a diagnosis model by using the diagnosis marker, and application of the diagnosis marker in pancreatic cancer diagnosis.

Background

Pancreatic cancer is a highly malignant tumor of the digestive tract that is difficult to diagnose and treat, and its incidence has risen rapidly in recent years. According to the national cancer statistics published by the National Cancer Center in January 2019, pancreatic cancer ranks tenth in incidence and seventh in mortality among malignant tumors in China, and the state of its diagnosis and treatment is not encouraging. Long-term smoking, a high-fat diet, an excessive body mass index, excessive drinking, diabetes and chronic pancreatitis are contributing factors for pancreatic cancer. In recent years, new concepts in oncology have markedly improved the clinical diagnosis and treatment of pancreatic cancer; however, as a tumor of an exocrine gland, pancreatic cancer is highly malignant, has a short disease course, progresses and deteriorates quickly, and has a poor prognosis and extremely high mortality. The five-year survival rate of patients is extremely low (less than 5%), and the disease is called the "king of cancers". Therefore, if pancreatic cancer can be found at an early stage, when symptoms are absent or not obvious, and targeted treatment can be given in time, the survival rate and cure rate of pancreatic cancer patients can be greatly improved.

The symptoms of early pancreatic cancer are atypical; the most common are loss of appetite, nausea, vomiting, weight loss and fatigue, so patients are often treated for other diseases and the diagnosis is delayed. Most pancreatic cancer patients are already at an advanced stage when diagnosed. On the one hand, the pancreas lies deep in the abdomen, and an early diagnosis is difficult to confirm unless dedicated imaging examinations are performed; on the other hand, the early symptoms of pancreatic cancer are atypical, sometimes only mild, nonspecific upper abdominal symptoms resembling stomach pain, and are easily misdiagnosed as chronic stomach disease. Therefore, developing a simple and convenient new method for the early diagnosis of pancreatic cancer has great clinical, social and economic significance.

According to the Comprehensive Guidelines for the Diagnosis and Treatment of Pancreatic Cancer (2018 edition), published in 2018 by the Pancreatic Cancer Professional Committee, carbohydrate antigen CA19-9 is currently the most commonly used marker for pancreatic cancer diagnosis. Its clinical characteristics are as follows: with serum CA19-9 > 37 U/ml as the positive criterion, the sensitivity and specificity for diagnosing pancreatic cancer reach 78.2% and 82.8%, respectively (Poruk KE, Gay DZ, Brown K, et al. The clinical utility of CA19-9 in pancreatic cancer: diagnostic and prognostic updates. Curr Mol Med, 2013, 13(3): 340-). About 10% of pancreatic cancer patients are Lewis antigen negative and show no rise in CA19-9, so other tumor markers such as CA125 and/or carcinoembryonic antigen (CEA) must be combined for auxiliary diagnosis (Luo G, Liu C, Guo M, et al. CA19-9-Low & Lewis (+) pancreatic cancer: A unique subtype. Cancer Lett, 2017, 385: 46-50). Although this marker is in clinical use, its specificity is not high: it is elevated in patients with digestive tract malignancies such as pancreatic cancer, gallbladder cancer, colon cancer, gastric cancer and liver cancer, its value for early diagnosis is limited, and it is mainly used as an index for disease monitoring and recurrence prediction. In addition, microRNA, ctDNA and exosomal Glypican-1 in peripheral blood also have potential clinical applications, but they remain largely at the laboratory research stage, still suffer from disadvantages such as high false positive rates and high cost, and have yet to be confirmed by high-level evidence-based medical evidence (Xu J, Cao Z, Liu W, et al. Plasma miRNAs effectively distinguish patients with pancreatic cancer from controls: a multicenter study. Ann Surg, 2016, 263(6): 1173-1179; Xu L, Li Q, Xu D, et al. hsa-miR-141 downregulates TM4SF1 to inhibit pancreatic cancer cell invasion and migration. Int J Oncol, 2014, 44(2): 459-; [...], 2017, 114(38): 10202-10207; Ma L, Tian X, Guo H, et al. Long noncoding RNA H19 derived miR-675 regulates cell proliferation by down-regulating E2F-1 in human pancreatic ductal adenocarcinoma. J Cancer, 2018, 9(2): 389-399; Li W, Zhang GX, Lu X, et al. 5-Hydroxymethylcytosine signatures in circulating cell-free DNA as diagnostic biomarkers for human cancers. Cell Res, 2017, 27(10): 1243-1257).

Metabolomics is the science of qualitatively and quantitatively analyzing all small-molecule metabolites (such as amino acids, fatty acids and lipids) in a biological sample (such as plasma, serum, urine, feces or saliva) or in cells, and of finding the relationships between these metabolites and pathophysiological changes. Because information in living organisms is transmitted stepwise from DNA to mRNA, protein, metabolite, cell, tissue, organ and individual, metabolomics can be regarded as an extension and embodiment of genomics and proteomics. The intrinsic differences revealed by genomics and proteomics do not necessarily lead to phenotypic differences, owing to the powerful compensatory mechanisms of organisms. The production and metabolism of small molecules, by contrast, reflect both the intrinsic differences of organisms and the interference and influence of external factors. At present the pathogenesis of pancreatic cancer is not fully understood, but external factors such as smoking and drinking and internal factors such as endocrine disorders are related to its occurrence and development. It is generally accepted that pancreatic cancer does not arise from a single factor but probably results from the synergistic action of multiple factors. Therefore, using metabolomics to find the metabolite changes characteristic of early pancreatic cancer is consistent with its pathogenesis.

Researchers have studied pancreatic cancer using metabonomic techniques. For example, Fest et al. (Fest J, Vijfhuizen LS, Goeman JJ, et al. Search for early pancreatic cancer blood biomarkers in five European prospective population biobanks using metabolomics. Endocrinology, 2019, 160: 1731-1742), Dutta et al. (Dutta P, Perez MR, Lee J, et al. Combining hyperpolarized real-time metabolic imaging and NMR spectroscopy to identify metabolic biomarkers in pancreatic cancer. J Proteome Res, 2019, 18(7): 2826-2834) and Gaiser et al. (2019) used nuclear magnetic resonance (NMR) and liquid chromatography-mass spectrometry (LC-MS) to analyze serum, pancreatic cyst fluid and exosome samples, and analyzed the resulting data with traditional statistical methods such as principal component analysis (PCA) to search for pancreatic cancer related biomarkers. However, most of these studies used only a small number of samples and generic chromatographic methods, and did not report the sensitivity and specificity of the screened metabolites for screening or diagnosing pancreatic cancer, so their practical clinical significance is very limited. Therefore, carrying out plasma metabonomics research on large-scale clinical samples, searching for plasma metabolic markers for pancreatic cancer diagnosis that are highly sensitive, specific, safe and economical, and establishing a reliable and effective model for the early molecular diagnosis of pancreatic cancer still has important clinical application value.

Machine learning is an important branch of artificial intelligence that analyzes data and builds effective models from it. In the past few years, artificial intelligence and machine learning have developed rapidly. Artificial intelligence has performed well in some biomedical applications, especially disease diagnosis, has become a popular research direction in that field, and is therefore regarded as an important direction and aid for future medical development. At present, a bottleneck in discovering biomarkers with metabonomics is that the technique is highly sensitive and produces many data features and a huge data volume, while traditional principal component analysis, in order to reduce the number of features, ignores many features that contribute to distinguishing the two classes of samples. Combining metabonomics with machine learning methods from artificial intelligence can therefore find more effective and reliable diagnostic markers more quickly and accurately.

Disclosure of Invention

In view of the insidious onset of pancreatic cancer, the difficulty of early diagnosis, and the lack of a simple, convenient and practical screening method, the invention provides a diagnostic marker suitable for pancreatic cancer diagnosis. The marker has good sensitivity and specificity for pancreatic cancer, can be used for diagnosing pancreatic cancer, and is of great significance for improving the prognosis of pancreatic cancer and increasing the survival rate of pancreatic cancer patients.

The invention also provides a screening method of the diagnosis marker suitable for pancreatic cancer diagnosis, and the marker obtained by the method has good sensitivity and specificity for pancreatic cancer, is particularly suitable for early diagnosis of pancreatic cancer, and has important significance for treatment of pancreatic cancer.

The invention also provides a pancreatic cancer diagnosis model and a construction method of the diagnosis model, the model construction method is simple, has higher sensitivity and specificity for pancreatic cancer, and provides effective technical support for early diagnosis and early treatment of pancreatic cancer.

The invention also provides a method for diagnosing pancreatic cancer with the diagnosis model. Diagnosis requires only a blood sample, is convenient, fast and non-invasive, has high sensitivity and good specificity for pancreatic cancer, and has very good clinical application value.

The invention also provides a kit containing the diagnostic marker suitable for pancreatic cancer diagnosis; the kit can be used for pancreatic cancer diagnosis.

The invention analyzes plasma samples from 333 pancreatic cancer patients and 262 healthy controls. Using a high performance liquid chromatography-mass spectrometry (LC-MS) instrument, fingerprints of 1416 and 669 small-molecule metabolites are obtained in positive and negative ion modes, respectively. The small-molecule metabolite fingerprints of the pancreatic cancer patients and healthy controls are analyzed and subjected to feature screening based on a machine learning support vector machine, combined with mass-spectrometry-based refinement, to obtain diagnostic markers suitable for pancreatic cancer diagnosis. A targeted metabolome method is then established for these markers, and a model is built from the detection data by machine learning to obtain a pancreatic cancer diagnosis model. With this model it can be determined rapidly whether a subject has pancreatic cancer, and early pancreatic cancer in particular can be diagnosed; the model is accurate, highly sensitive and broadly applicable, and has value for clinical use and wider adoption.

In the invention, pancreatic cancer patient plasma refers to preoperative plasma, collected in 2016-2018, from patients with pancreatic ductal adenocarcinoma confirmed by postoperative pathology. Patients with other systemic malignant tumors, and patients who received anticancer treatment or other neoadjuvant therapy before surgery, were excluded.

The diagnostic marker and the diagnosis model can diagnose pancreatic cancer whose symptoms are not obvious. The method is simple, convenient, fast and non-invasive, and is therefore of great significance for the early diagnosis and early treatment of pancreatic cancer, for improving patient prognosis and for raising patient survival. The specific technical scheme for realizing the invention is as follows:

A diagnostic marker suitable for the diagnosis of pancreatic cancer, which is any one or more of the following 31 plasma metabolic markers: lysophosphatidylcholine LPC 14:0, lysophosphatidylcholine LPC 16:0, lysophosphatidylcholine LPC 16:2, lysophosphatidylcholine LPC 18:1, lysophosphatidylcholine LPC 20:4, phosphatidylcholine PC 16:0-16:0, phosphatidylcholine PC 16:0-18:1, phosphatidylcholine PC 18:0-18:2, phosphatidylcholine PC 18:0-20:3, phosphatidylcholine PC 16:0-22:5, phosphatidylcholine PC 18:0-22:5, phosphatidylcholine PC O-16:0-18:2, phosphatidylcholine PC 16:0e/18:2, phosphatidylcholine PC 38:3e, phosphatidylcholine PC 46:1e, lysophosphatidylethanolamine LPE 22:4, phosphatidylethanolamine PE 16:0-18:2, phosphatidylethanolamine PE 16:3e/2:0, phosphatidylethanolamine PE 22:4e/4:0, phosphatidylethanolamine PE 22:6e/4:0, phosphatidylethanolamine PE 26:0e/8:0, phosphatidylethanolamine PE 22:5e/20:3, phosphatidylserine PS 18:0-18:1, phosphatidylinositol PI 18:0-18:2, sphingomyelin SM d18:1/18:0, sphingomyelin SM d18:2/24:1, sphingomyelin SM d18:2/24:2, diglyceride DG 18:1-18:1, triglyceride TG 8:0-8:0-8:0, triglyceride TG 8:0-8:0-10:0, and branched fatty acid ester of hydroxy fatty acid FAHFA 4:0/20:4.

Further, the diagnostic marker may be any one or more of the following 19 plasma metabolic markers: lysophosphatidylcholine LPC 14:0, lysophosphatidylcholine LPC 16:0, lysophosphatidylcholine LPC 18:1, lysophosphatidylcholine LPC 20:4, phosphatidylcholine PC 16:0-16:0, phosphatidylcholine PC 16:0-18:1, phosphatidylcholine PC 18:0-18:2, phosphatidylcholine PC 18:0-20:3, phosphatidylcholine PC 16:0-22:5, phosphatidylcholine PC 18:0-22:5, phosphatidylcholine PC O-16:0-18:2, lysophosphatidylethanolamine LPE 22:4, phosphatidylethanolamine PE 16:0-18:2, phosphatidylserine PS 18:0-18:1, phosphatidylinositol PI 18:0-18:2, sphingomyelin SM d18:1/18:0, sphingomyelin SM d18:2/24:1, sphingomyelin SM d18:2/24:2, and diglyceride DG 18:1-18:1.

Further, the diagnostic marker may be any one or more of the following 17 plasma metabolic markers: lysophosphatidylcholine LPC 14:0, lysophosphatidylcholine LPC 16:0, lysophosphatidylcholine LPC 18:1, lysophosphatidylcholine LPC 20:4, phosphatidylcholine PC 16:0-16:0, phosphatidylcholine PC 16:0-18:1, phosphatidylcholine PC 18:0-18:2, phosphatidylcholine PC 18:0-20:3, phosphatidylcholine PC 16:0-22:5, phosphatidylcholine PC 18:0-22:5, phosphatidylcholine PC O-16:0-18:2, lysophosphatidylethanolamine LPE 22:4, phosphatidylethanolamine PE 16:0-18:2, sphingomyelin SM d18:1/18:0, sphingomyelin SM d18:2/24:1, sphingomyelin SM d18:2/24:2, and diglyceride DG 18:1-18:1.

Further, the diagnostic marker may be any one or more of the following 14 plasma metabolic markers: lysophosphatidylcholine LPC 16:0, lysophosphatidylcholine LPC 18:1, lysophosphatidylcholine LPC 20:4, phosphatidylcholine PC 16:0-18:1, phosphatidylcholine PC 18:0-18:2, phosphatidylcholine PC 18:0-20:3, phosphatidylcholine PC 16:0-22:5, phosphatidylcholine PC 18:0-22:5, phosphatidylcholine PC O-16:0-18:2, lysophosphatidylethanolamine LPE 22:4, sphingomyelin SM d18:1/18:0, sphingomyelin SM d18:2/24:1, sphingomyelin SM d18:2/24:2, and diglyceride DG 18:1-18:1.
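The four marker panels listed above appear to be nested (14 within 17 within 19 within 31). As an illustrative aid only, the following sketch encodes them as Python sets using the abbreviated lipid names, checks the counts and the nesting, and shows a hypothetical helper, `restrict_to_panel`, that keeps only the measurements belonging to a chosen panel. The variable and function names are assumptions introduced here and are not part of the invention.

```python
# Marker panels written with the abbreviated lipid names from the lists above.
PANEL_14 = {
    "LPC 16:0", "LPC 18:1", "LPC 20:4",
    "PC 16:0-18:1", "PC 18:0-18:2", "PC 18:0-20:3", "PC 16:0-22:5",
    "PC 18:0-22:5", "PC O-16:0-18:2", "LPE 22:4",
    "SM d18:1/18:0", "SM d18:2/24:1", "SM d18:2/24:2", "DG 18:1-18:1",
}
PANEL_17 = PANEL_14 | {"LPC 14:0", "PC 16:0-16:0", "PE 16:0-18:2"}
PANEL_19 = PANEL_17 | {"PS 18:0-18:1", "PI 18:0-18:2"}
PANEL_31 = PANEL_19 | {
    "LPC 16:2", "PC 16:0e/18:2", "PC 38:3e", "PC 46:1e",
    "PE 16:3e/2:0", "PE 22:4e/4:0", "PE 22:6e/4:0", "PE 26:0e/8:0",
    "PE 22:5e/20:3", "TG 8:0-8:0-8:0", "TG 8:0-8:0-10:0", "FAHFA 4:0/20:4",
}


def restrict_to_panel(measurements, panel):
    """Keep only the entries of a {marker name: peak area} dict that are in the panel."""
    return {name: area for name, area in measurements.items() if name in panel}


if __name__ == "__main__":
    print(len(PANEL_14), len(PANEL_17), len(PANEL_19), len(PANEL_31))
    # Strict-subset chain: each smaller panel is contained in the next larger one.
    print(PANEL_14 < PANEL_17 < PANEL_19 < PANEL_31)
```

Encoding the panels as sets makes it straightforward to select the corresponding rows of a marker-by-sample matrix before a model is built.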

The invention also provides a screening method of the diagnosis markers suitable for pancreatic cancer diagnosis, which comprises the following steps:

(1) collecting plasma samples of pancreatic cancer patients and healthy people as analysis samples;

(2) performing non-targeted metabonomics analysis on each analysis sample by adopting a liquid chromatography-mass spectrometry combined technology to obtain an original metabolic fingerprint of each plasma sample;

(3) processing the original metabolic fingerprints of the pancreatic cancer plasma samples and healthy plasma samples with MS-Dial software to obtain a two-dimensional matrix in which each row holds the information for one metabolite and each column corresponds to one analysis sample; metabolite peak identification (including isotope peaks, adducts and fragment ions) and peak area integration are performed to produce the two-dimensional matrix for subsequent machine learning;

(4) learning the two-dimensional matrix data from step (3) with a machine learning support vector machine (SVM) algorithm, using 495 of the pancreatic cancer and healthy control plasma samples as a training set and 100 as a test set; applying random four-fold learning to the training set, i.e. randomly selecting 3/4 of the samples of both plasma types for training and using the remaining 1/4 as a cross-validation set, and repeating this random selection for 5000 iterations to obtain the model that classifies best on the cross-validation set; and finally validating and analyzing the model on the 100 test samples, the mean accuracy of the final model being computed to show that the SVM model can effectively classify the metabolome data of early pancreatic cancer patients and healthy people;

(5) based on the obtained SVM model, performing feature screening with a machine-learning greedy algorithm: using the feature importance scores from SVM modeling, successively accumulating new features that improve classification performance to form candidate models, and evaluating the classification accuracy of each candidate model to compare their classification performance, so as to determine the relatively optimal number and combination of features, the criterion being that model accuracy no longer rises when further features are added;

(6) performing mass-spectrometry-based refinement of the optimal features, i.e. the target differential metabolites obtained by screening, by using MS-Dial software to filter them according to chromatographic peak shape quality and secondary mass spectrum data, to obtain potential metabolic markers;

(7) inferring the molecular mass and molecular formula of each potential metabolic marker from its primary and secondary mass spectrum information, and comparing them with the spectral information in a metabolite spectral database (LipidBlast), thereby identifying the metabolites and obtaining the plasma metabolic markers suitable for diagnosing pancreatic cancer. A combination of different plasma metabolic markers can serve as the diagnostic marker suitable for the diagnosis of pancreatic cancer.
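Steps (4) and (5) of the screening method can be illustrated with a minimal, self-contained sketch. It repeats a random 3/4 training, 1/4 validation split and averages the validation accuracy, then walks through features in importance order and keeps a feature only if the accuracy rises. Several points are assumptions made for illustration: a nearest-centroid classifier stands in for the SVM so that the example needs no external library (in practice an SVM implementation, for example scikit-learn's `sklearn.svm.SVC`, would take its place), the importance ranking is supplied by the caller, 50 iterations are used instead of 5000, and the data are synthetic. All function names are hypothetical.

```python
import random


def nearest_centroid_predict(train_X, train_y, test_X):
    """Stand-in classifier (NOT an SVM): assign each sample to the nearest class centroid."""
    centroids = {}
    for label in set(train_y):
        rows = [x for x, lab in zip(train_X, train_y) if lab == label]
        centroids[label] = [sum(col) / len(col) for col in zip(*rows)]

    def dist2(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))

    return [min(centroids, key=lambda c: dist2(x, centroids[c])) for x in test_X]


def random_split_accuracy(X, y, features, n_iter=50, seed=0):
    """Mean validation accuracy over repeated random 3/4 train, 1/4 validation splits."""
    rng = random.Random(seed)
    idx = list(range(len(y)))
    cut = (3 * len(idx)) // 4
    scores = []
    for _ in range(n_iter):
        rng.shuffle(idx)
        train, valid = idx[:cut], idx[cut:]
        train_X = [[X[i][f] for f in features] for i in train]
        valid_X = [[X[i][f] for f in features] for i in valid]
        pred = nearest_centroid_predict(train_X, [y[i] for i in train], valid_X)
        hits = sum(p == y[i] for p, i in zip(pred, valid))
        scores.append(hits / len(valid))
    return sum(scores) / len(scores)


def greedy_forward_selection(X, y, ranked_features, **kwargs):
    """Walk features in importance order; keep a feature only if accuracy rises."""
    selected, best = [], 0.0
    for f in ranked_features:
        score = random_split_accuracy(X, y, selected + [f], **kwargs)
        if score > best:
            selected, best = selected + [f], score
    return selected, best


# Synthetic data, one row per sample (the transpose of the metabolite-by-sample matrix).
# Feature 0 separates the two classes cleanly; features 1 and 2 carry no class signal.
n = 40
y = [i % 2 for i in range(n)]
X = [[10.0 * y[i] + 0.1 * (i % 5), (i * 7 % 11) / 11.0, (i * 3 % 13) / 13.0]
     for i in range(n)]
selected, best = greedy_forward_selection(X, y, ranked_features=[0, 1, 2])
print(selected, best)  # [0] 1.0
```

On this toy data the first feature alone already classifies every validation sample correctly, so adding further features cannot raise the accuracy and the selection stops growing, which mirrors the stated criterion.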

In the screening method, the pancreatic cancer patient is a patient whose pancreatic ductal adenocarcinoma was confirmed by postoperative pathology, who has not received neoadjuvant therapy, and who has no tumors of other systems.

In the screening method, the healthy population consists of individuals in whom physical examination found no pancreatic lesions.

In the screening method, when the LC-MS plasma non-targeted metabonomics analysis is carried out, one quality control sample is inserted for every 20 analysis samples to monitor, in real time, quality control from pre-injection sample pretreatment through the analysis itself, wherein the quality control sample is a pooled mixture of the 333 early pancreatic cancer plasma samples and the 262 healthy plasma samples.
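As an illustration of the quality control scheme, the sketch below builds a run order with one pooled QC injection per 20 analysis samples. The function name, the sample identifiers and the choice to place the QC injection after each block of 20 are assumptions; the text only states the 1-in-20 ratio.

```python
from typing import List


def build_injection_sequence(sample_ids: List[str], qc_every: int = 20,
                             qc_label: str = "QC") -> List[str]:
    """Return the run order with one QC injection after every `qc_every` samples."""
    sequence: List[str] = []
    for index, sample_id in enumerate(sample_ids, start=1):
        sequence.append(sample_id)
        if index % qc_every == 0:
            # Assumed placement: the pooled QC follows each block of samples.
            sequence.append(qc_label)
    return sequence


# 333 pancreatic cancer + 262 healthy = 595 analysis samples (hypothetical IDs).
samples = [f"S{i:03d}" for i in range(1, 596)]
run_order = build_injection_sequence(samples)
print(len(run_order), run_order.count("QC"))
```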

In the screening method, the analysis samples and the quality control samples are pretreated before injection as follows:

(1) pipetting 50 μl of the analysis sample or quality control sample into a 2.0 ml EP (Eppendorf) tube;

(2) adding 150 μl of methanol for extraction, and shaking for 5 minutes to precipitate the protein;

(3) centrifuging at 12000 rpm for 10 minutes in a high-speed centrifuge at 4 ℃;

(4) transferring the supernatant obtained in step (3) into an LC-MS sample vial, and storing it at -80 ℃ until LC-MS detection.

In the screening method, processing the original metabolic fingerprint spectra comprises: reading the original metabolic fingerprint spectra with MS-Dial software and performing processing operations including retention time correction, peak identification, peak matching and peak alignment to obtain a two-dimensional matrix.

In the screening method, when each analysis sample is analyzed by the LC-MS plasma non-targeted metabonomics technology, the liquid chromatography column is a Waters XSelect CSH C18 column (100 × 4.6 mm, 3.5 μm); the injection temperature is 4 ℃ and the injection volume is 10 μL; the chromatographic mobile phase consists of two solvents, A and B, wherein A is a 60% acetonitrile / 40% water solution containing 0.1% formic acid and B is a 10% acetonitrile / 90% isopropanol solution containing 0.1% formic acid; the gradient elution conditions are: 40% B at 0 min, 43% B at 2 min, 50% B at 2.1 min, a gradual increase from 50% B to 60% B over 2.1-12 min, 75% B at 12.1 min, an increase from 75% B to 99% B over 12.1-18 min, 99% B over 18-19 min, 40% B over 19-20 min, and 40% B held for a further 5 min; the flow rate is 0.5 ml/min.
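The non-targeted gradient program can be written down as a breakpoint table, which makes it easy to check the mobile phase composition at any time point. The sketch below assumes that %B changes linearly between the stated breakpoints and that the return to 40% B is complete at 20 min; the names `GRADIENT` and `percent_b` are illustrative only.

```python
from bisect import bisect_right
from typing import List, Tuple

# (time in min, %B) breakpoints transcribed from the gradient described above.
# Assumption: linear change between breakpoints; 40% B is reached at 20 min
# and held for 5 min, giving a 25 min run.
GRADIENT: List[Tuple[float, float]] = [
    (0.0, 40.0), (2.0, 43.0), (2.1, 50.0), (12.0, 60.0), (12.1, 75.0),
    (18.0, 99.0), (19.0, 99.0), (20.0, 40.0), (25.0, 40.0),
]


def percent_b(time_min: float, gradient: List[Tuple[float, float]] = GRADIENT) -> float:
    """Linearly interpolate the percentage of mobile phase B at a given time."""
    times = [t for t, _ in gradient]
    if not times[0] <= time_min <= times[-1]:
        raise ValueError("time outside the programmed gradient")
    i = bisect_right(times, time_min)
    if i == len(times):
        # Exactly at the final breakpoint.
        return gradient[-1][1]
    (t0, b0), (t1, b1) = gradient[i - 1], gradient[i]
    return b0 + (b1 - b0) * (time_min - t0) / (t1 - t0)


for t in (0.0, 1.0, 7.05, 18.5, 25.0):
    print(f"{t:5.2f} min -> {percent_b(t):.1f}% B")
```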

In the screening method, when each analysis sample is analyzed by the LC-MS plasma non-targeted metabonomics technology, mass spectrometry is performed on a Q-Exactive quadrupole-electrostatic field Orbitrap mass spectrometer, using both the positive ion mode (ESI+) and the negative ion mode (ESI-) of the electrospray ion source; the ion source temperature is 320 ℃, the sweep (back-flush) gas is set to 2, the desolvation gas temperature is 300 ℃, and the sheath gas and auxiliary gas are set to 40 and 10, respectively; the capillary voltage is +3 kV in the positive ion mode and -3 kV in the negative ion mode, and the cone voltage is 0 V; spectra are acquired over a mass-to-charge range of 200-1200 m/z in data-dependent acquisition (DDA) mode.

In a preferred embodiment of the present invention, 333 pancreatic cancer patients and 262 healthy individuals are selected.

In a preferred embodiment of the invention, when the SVM classification model is constructed, the random four-fold learning scheme is applied to the training set and the modeling parameter C is 5.

In a preferred embodiment of the invention, the random iteration is repeated 5000 times during SVM modeling in the screening process, and the mean accuracy of the final model is greater than 0.9.
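The random four-fold scheme with C = 5 might look like the following scikit-learn sketch. Several points are assumptions rather than details from the text: the linear kernel, stratified splitting, the synthetic data standing in for the 495-sample training matrix, and the reduced iteration count (50 instead of 5000, to keep the example fast). The function name `random_fourfold_svm` is hypothetical.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC


def random_fourfold_svm(X, y, n_iter=50, C=5.0, seed=0):
    """Repeat a random 3/4 : 1/4 split and keep the model with the best validation accuracy."""
    rng = np.random.RandomState(seed)
    best_model, best_acc, accuracies = None, -1.0, []
    for _ in range(n_iter):
        # 3/4 of each class for training, 1/4 for cross-validation.
        X_tr, X_val, y_tr, y_val = train_test_split(
            X, y, test_size=0.25, stratify=y, random_state=rng.randint(2**31 - 1)
        )
        model = SVC(C=C, kernel="linear").fit(X_tr, y_tr)  # kernel is an assumption
        acc = model.score(X_val, y_val)
        accuracies.append(acc)
        if acc > best_acc:
            best_model, best_acc = model, acc
    return best_model, best_acc, float(np.mean(accuracies))


# Synthetic stand-in for the training matrix (not real plasma data).
data_rng = np.random.RandomState(42)
y = np.array([1] * 250 + [0] * 245)
X = data_rng.normal(size=(495, 10)) + y[:, None] * 1.5
model, best_acc, mean_acc = random_fourfold_svm(X, y)
print(f"best validation accuracy {best_acc:.3f}, mean {mean_acc:.3f}")
```

The selected model would then be evaluated once on the held-out 100-sample test set, which is kept out of every split above.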

The invention also provides a construction method of the pancreatic cancer diagnosis model, which comprises the following steps:

(1) collecting plasma samples of pancreatic cancer patients and healthy people as analysis samples;

(2) performing targeted metabonomics analysis on each analysis sample by liquid chromatography-mass spectrometry to obtain a targeted metabolome profile of each plasma sample;

(3) processing the targeted metabolome profiles of the pancreatic cancer plasma samples and the healthy plasma samples with MS-Dial software to obtain a two-dimensional marker matrix relating the metabolite information to the analysis samples, for use in subsequent machine learning;

(4) constructing a classification model from the two-dimensional matrix of diagnostic markers with a machine learning SVM to obtain the pancreatic cancer diagnosis model.

In the above construction method, the pancreatic cancer patient is a patient whose pancreatic ductal adenocarcinoma was confirmed by postoperative pathology, who has not received neoadjuvant therapy, and who has no tumors of other systems.

In the above construction method, the targeted metabonomics analysis of step (2) refers to the targeted detection of the metabolites that were screened out by the diagnostic marker screening method of the present invention and can be used as diagnostic markers.

In the construction method, when each analysis sample is analyzed by the LC-MS plasma targeted metabonomics technology, the liquid chromatography column is a Waters XSelect CSH C18 column (100 × 4.6 mm, 3.5 μm); the injection temperature is 4 ℃ and the injection volume is 10 μL; the chromatographic mobile phase consists of two solvents, A and B, wherein A is a 60% acetonitrile / 40% water solution containing 0.1 wt% formic acid and B is a 10% acetonitrile / 90% isopropanol solution containing 0.1 wt% formic acid; the gradient elution conditions are: 40% B over 0-1 min, a gradual increase from 40% B to 50% B over 1-5 min, a gradual increase from 50% B to 100% B over 5-15 min, 100% B held over 15-18 min, a rapid decrease to 40% B over 18-19 min, and then 40% B held for 5 min; the flow rate is 0.5 ml/min.

In the construction method, when each analysis sample is analyzed by the LC-MS plasma targeted metabonomics technology, mass spectrometry is performed on a Q-Exactive quadrupole-electrostatic field Orbitrap mass spectrometer, using the positive ion mode (ESI+) of the electrospray ion source; the ion source temperature is 320 ℃, the sweep (back-flush) gas is set to 2, the desolvation temperature is 300 ℃, and the sheath gas and auxiliary gas are set to 40 and 10, respectively; the capillary voltage is +3 kV and the cone voltage is 0 V; the acquisition mode is parallel reaction monitoring (PRM).

In a preferred embodiment of the present invention, the model is constructed from the samples of 333 pancreatic cancer patients and 262 healthy individuals.

In a preferred embodiment of the present invention, when constructing the SVM classification model, the modeling parameter C is 15.

In a preferred embodiment of the invention, when the diagnostic marker suitable for the diagnosis of pancreatic cancer is a combination of 19 plasma metabolic markers (lysophosphatidylcholine LPC 14:0, lysophosphatidylcholine LPC 16:0, lysophosphatidylcholine LPC 18:1, lysophosphatidylcholine LPC 20:4, phosphatidylcholine PC 16:0-16:0, phosphatidylcholine PC 16:0-18:1, phosphatidylcholine PC 18:0-18:2, phosphatidylcholine PC 18:0-20:3, phosphatidylcholine PC 16:0-22:5, phosphatidylcholine PC 18:0-22:5, phosphatidylcholine PC O-16:0-18:2, lysophosphatidylethanolamine LPE 22:4, phosphatidylethanolamine PE 16:0-18:2, phosphatidylserine PS 18:0-18:1, phosphatidylinositol PI 18:0-18:2, sphingomyelin SM d18:1/18:0, sphingomyelin SM d18:2/24:1, sphingomyelin SM d18:2/24:2 and diglyceride DG 18:1-18:1), the area under the ROC curve (AUC) of the resulting diagnostic model can reach 0.9657.

The invention also provides a pancreatic cancer diagnosis model constructed according to the above construction method. As above, in a preferred embodiment of the invention, when the diagnostic markers used in the diagnostic model are a combination of 19 plasma metabolic markers (lysophosphatidylcholine LPC 14:0, lysophosphatidylcholine LPC 16:0, lysophosphatidylcholine LPC 18:1, lysophosphatidylcholine LPC 20:4, phosphatidylcholine PC 16:0-16:0, phosphatidylcholine PC 16:0-18:1, phosphatidylcholine PC 18:0-18:2, phosphatidylcholine PC 18:0-20:3, phosphatidylcholine PC 16:0-22:5, phosphatidylcholine PC 18:0-22:5, phosphatidylcholine PC O-16:0-18:2, lysophosphatidylethanolamine LPE 22:4, phosphatidylethanolamine PE 16:0-18:2, phosphatidylserine PS 18:0-18:1, phosphatidylinositol PI 18:0-18:2, sphingomyelin SM d18:1/18:0, sphingomyelin SM d18:2/24:1, sphingomyelin SM d18:2/24:2 and diglyceride DG 18:1-18:1), the area under the ROC curve (AUC) of the diagnostic model can reach 0.9657.
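For readers unfamiliar with the AUC figure quoted above, the sketch below computes the area under the ROC curve through its rank interpretation: the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one, with ties counted as one half. The function name `roc_auc` and the toy scores are illustrative assumptions and are not data from the invention.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of positive/negative pairs ranked correctly (ties count 1/2)."""
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    if not positives or not negatives:
        raise ValueError("both classes are required")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy decision scores (illustrative only): 1 = pancreatic cancer, 0 = healthy.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]
print(round(roc_auc(labels, scores), 4))
```

An AUC of 1.0 corresponds to perfect separation of the two groups and 0.5 to chance-level ranking.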

The advantage of the invention is that plasma metabonomics and artificial intelligence data analysis are combined to obtain diagnostic markers and a pancreatic cancer diagnosis model suitable for pancreatic cancer diagnosis. The diagnostic marker screening method is readily operable and the model construction method is simple; the resulting diagnostic model performs well, with high sensitivity and good specificity, and is suitable for diagnosing pancreatic cancer, in particular early pancreatic cancer without obvious symptoms. The invention achieves diagnosis from a blood sample alone, without additional tissue sampling, can well replace the existing blood test and imaging diagnosis modes, is simple and quick, facilitates early diagnosis and early treatment of pancreatic cancer, and has good value for clinical use and popularization.

Drawings

FIG. 1 shows total ion chromatograms (TIC) of the original metabolic fingerprints, where ESI+ is the positive ion mode and ESI- is the negative ion mode; the horizontal axis is retention time and the vertical axis is relative metabolite concentration.

FIG. 2 shows the machine learning support vector machine (SVM) classification model; the figure labels specificity, sensitivity and accuracy denote the corresponding performance metrics, training denotes the training set, validation the cross-validation set, and test the test set.

FIG. 3 shows the feature selection score chart of the SVM model; the horizontal axis is the number of features and the vertical axis is accuracy; training denotes the training set, validation the cross-validation set, and test the test set.

FIG. 4 shows typical extracted ion chromatograms (EIC) of the targeted metabolome profiles; the horizontal axis is retention time and the vertical axis is metabolite response intensity.

FIG. 5 shows the ROC curves of the early pancreatic cancer diagnosis model constructed with the 19 plasma metabolic markers, where Training set denotes the training-set result and Test set the test-set result.

Detailed Description

The present invention is further illustrated below by reference to specific examples, which are provided only for the purpose of illustration and are not meant to limit the scope of the present invention.
