Method for identifying specific molecules in colorectal cancer through large sample transcriptome sequencing

Document No.: 70755. Publication date: 2021-10-01.

Abstract: This technology, "Method for identifying specific molecules in colorectal cancer through large sample transcriptome sequencing", was designed by 曾凡新, 张红雨, 李洁, 高敏 and 李诗林 on 2021-06-19. The invention discloses a method for identifying specific molecules in colorectal cancer through large sample transcriptome sequencing. By studying clinical data from a large sample, the invention can identify molecules that change significantly in colorectal cancer, which is of great significance for research into treatments for colorectal cancer. In addition, designing targeted drugs against the significantly changed molecules can relieve drug resistance in patients and prolong their survival time.

1. A method for identifying specific molecules in colorectal cancer by sequencing of large sample transcriptomes, comprising the steps of:

S1. Data acquisition

Selecting a large sample of patients diagnosed with colorectal cancer, and collecting cancer tissue and paracancerous tissue from the patients during surgery;

S2. Transcriptome sequencing

Extracting total RNA from the tissue samples with TRIzol reagent, and checking the integrity and concentration of the total RNA with an Agilent 2100 RNA Nano 6000 assay kit; once the total RNA sample passes quality control, enriching and purifying with magnetic beads, isolating the target fragments by agarose gel electrophoresis, performing PCR amplification, and then sequencing the qualified library on an Illumina platform with a PE150 sequencing strategy;

S3. Data analysis

S3-1. Screening differential genes with the "DESeq2" package in the R language, the screening criteria being: log2 fold change & padj < 0.05; then clustering the patients by differential gene expression using principal component analysis;

S3-2. Identifying signaling pathways that differ significantly between cancer tissue and paracancerous tissue using GSEA software;

S3-3. Further screening the differential genes according to their expression;

S3-4. Verifying the expression of the screened differential genes with a validation set.

2. The method of claim 1, wherein 6 characteristic factors are screened out, namely: IGHG1, CA2, COL1A1, RNA5-8SN2, MIR3648-1 and AQP8.

3. The method of claim 1, wherein the patient selection criteria in the data acquisition of S1 are: patients with other serious diseases or mental illness are excluded, and patients with incomplete clinical information are excluded.

4. Use of the method of claim 1 for identifying specific molecules in colorectal cancer by large sample transcriptome sequencing, wherein the identified specific factors are used in the design of targeted drugs.

Technical Field

The invention relates to the field of medicine, in particular to a method for identifying specific molecules in colorectal cancer through large sample transcriptome sequencing.

Background

Colorectal cancer (CRC) is a cancer that begins in the colon (large intestine) or rectum, both of which are located in the lower part of the digestive system. The American Cancer Society (ACS) estimates that approximately 1 in 23 men and 1 in 25 women will develop colorectal cancer in their lifetime. A 2018 global assessment by the International Agency for Research on Cancer showed about 1.8 million new cases of colorectal cancer and about 900,000 deaths from it each year, making it the third most common malignancy and the second leading cause of cancer death. Certain fixed factors increase the risk of colorectal cancer, such as a prior history of colonic polyps, a history of bowel disease, a family history of colorectal cancer, certain genetic syndromes (such as familial adenomatous polyposis (FAP)), Eastern European Jewish or African descent, and age. Furthermore, industrial and economic growth has brought Western dietary patterns, sedentary lifestyles and rising obesity, all of which are risk factors for colorectal cancer. In recent years, the incidence and mortality of colorectal cancer have decreased owing to colonoscopic screening, but about 25% of patients still present with stage IV disease, and the 5-year survival rate of stage IV patients does not exceed 10%.

The stage of the colorectal cancer and the overall health of the patient directly determine the choice of treatment. In early-stage colorectal cancer, radical resection is the preferred treatment, followed by control of cancer cells with therapeutic drugs such as capecitabine (Xeloda), fluorouracil, oxaliplatin (Eloxatin) or irinotecan (Camptosar), or with radiation. In addition, targeted drugs and immunotherapy are effective means of treating colorectal cancer. They are generally used for metastatic or advanced colorectal cancer, with the drug chosen according to the expression of target genes in the patient. However, some patients are still not sensitive to targeted or immunotherapeutic drugs.

Therefore, research on target molecules and cellular activities related to colorectal cancer is of great significance for its treatment. Designing a drug or treatment regimen against an identified target gene may give patients more treatment options and extend their survival time.

Disclosure of Invention

The invention aims to use bioinformatics methods on tissue transcriptome data from a large sample of colorectal cancer patients to identify genes and signaling pathways that are significantly altered in colorectal cancer tissue, so that the significantly altered genes can provide a basis for individualized treatment and prognosis of colorectal cancer. To achieve this purpose, the invention adopts the following technical scheme:

A method for identifying specific molecules in colorectal cancer by large sample transcriptome sequencing, comprising the steps of:

S1. Data acquisition

Selecting a large sample of patients diagnosed with colorectal cancer, and collecting cancer tissue and paracancerous tissue from the patients during surgery;

S2. Transcriptome sequencing

Extracting total RNA from the tissue samples with TRIzol reagent, and checking the integrity and concentration of the total RNA with an Agilent 2100 RNA Nano 6000 assay kit; once the total RNA sample passes quality control, enriching and purifying with magnetic beads, isolating the target fragments by agarose gel electrophoresis, performing PCR amplification, and then sequencing the qualified library on an Illumina platform with a PE150 sequencing strategy;

S3. Data analysis

S3-1. Screening differential genes with the "DESeq2" package in the R language, the screening criteria being: log2 fold change & padj < 0.05; then clustering the patients by differential gene expression using principal component analysis;

S3-2. Identifying signaling pathways that differ significantly between cancer tissue and paracancerous tissue using GSEA software;

S3-3. Further screening the differential genes according to their expression;

S3-4. Verifying the expression of the screened differential genes with a validation set.

The method of the invention screens out 6 characteristic factors, namely: IGHG1, CA2, COL1A1, RNA5-8SN2, MIR3648-1 and AQP8.

As an improvement, the patient selection criteria in the S1 data acquisition are: patients with other serious diseases or mental illness are excluded, and patients with incomplete clinical information are excluded.

As an improvement, the identified specific factors are applied to the design of targeted drugs.

The invention has the advantages that:

By studying clinical data from a large sample, the invention can identify molecules that change significantly in colorectal cancer, which is of great significance for research into treatments for colorectal cancer. In addition, designing targeted drugs against the significantly changed molecules can relieve drug resistance in patients and prolong their survival time.

Drawings

FIG. 1 shows the volcano plot, heat map and principal component analysis plot of the differential genes;

FIG. 2 shows the GSEA plots of the MTORC1, OXIDATIVE_PHOSPHORYLATION and DNA_REPAIR signaling pathways;

FIG. 3 is a bar graph showing the expression of IGHG1, CA2, COL1A1, RNA5-8SN2, MIR3648-1 and AQP8;

FIG. 4 shows the ROC analysis of IGHG1, CA2, COL1A1, RNA5-8SN2, MIR3648-1 and AQP8 in the validation set;

FIG. 5 shows the relationship between the expression levels of AQP8, CA2 and COL1A1 and prognosis.

Detailed Description

The present invention will be described in detail and specifically with reference to the following examples so as to facilitate the understanding of the present invention, but the following examples do not limit the scope of the present invention.

Example 1

This example discloses a method for identifying specific molecules in colorectal cancer by sequencing large sample transcriptomes, comprising the steps of:

S1. Data acquisition

Selecting a large sample of patients diagnosed with colorectal cancer, and collecting cancer tissue and paracancerous tissue from the patients during surgery; the patient selection criteria in data acquisition are: patients with other serious diseases or mental illness are excluded, and patients with incomplete clinical information are excluded.

S2. Transcriptome sequencing

Extracting total RNA from the tissue samples with TRIzol reagent, and checking the integrity and concentration of the total RNA with an Agilent 2100 RNA Nano 6000 assay kit; once the total RNA sample passes quality control, enriching and purifying with magnetic beads, isolating the target fragments by agarose gel electrophoresis, performing PCR amplification, and then sequencing the qualified library on an Illumina platform with a PE150 sequencing strategy.

S3. Data analysis

S3-1. Screening differential genes with the "DESeq2" package in the R language, the screening criteria being: log2 fold change & padj < 0.05; then clustering the patients by differential gene expression using principal component analysis;

S3-2. Identifying signaling pathways that differ significantly between cancer tissue and paracancerous tissue using GSEA software;

S3-3. Further screening the differential genes according to their expression;

S3-4. Verifying the expression of the screened differential genes with a validation set.

The inventors applied the method of Example 1 to colorectal cancer patients at a hospital:

S1. Data acquisition

From February to October 2018, 144 patients with pathologically diagnosed colorectal cancer were selected at a hospital, excluding patients with other serious diseases or mental illness and patients with incomplete clinical information. Cancer tissue and paracancerous tissue were collected from the patients during surgery. The cohort from February to March 2018 served as the training cohort, and the patients from April to October 2019 served as the validation cohort.

S2. Transcriptome sequencing

Total RNA was extracted from the tissue samples with TRIzol reagent, and its integrity and concentration were checked with an Agilent 2100 RNA Nano 6000 assay kit. Once the total RNA sample passed quality control, it was enriched and purified with magnetic beads. The target fragments were isolated by agarose gel electrophoresis and amplified by PCR. Qualified libraries were then sequenced on an Illumina platform with a PE150 sequencing strategy.

S3. Data analysis

S3-1:

Differential genes were screened with the "DESeq2" package in the R language according to the screening criteria log2 fold change & padj < 0.05. In total, 839 significantly changed genes were identified, of which 648 were significantly up-regulated and 191 were significantly down-regulated. The heat map and volcano plot of the differential genes are shown in FIGS. 1A-B. The patients were then clustered by differential gene expression using principal component analysis, and the result shows that the differential genes clearly separate cancer tissue from paracancerous tissue (FIG. 1C).
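The screening step can be sketched in a few lines. The following Python example is only an illustration of filtering a differential expression table by adjusted p-value and splitting the survivors by the sign of the log2 fold change; it is not the inventors' R code. The record type `DEResult`, the function `screen_differential_genes`, and the toy rows are hypothetical. The source does not state a numeric cutoff for the log2 fold change, so the sketch exposes it as a parameter (`min_abs_log2fc`) with an assumed default of 0.

```python
import math
from typing import Iterable, List, NamedTuple, Tuple


class DEResult(NamedTuple):
    """One row of a differential expression table (hypothetical layout)."""
    gene: str
    log2_fold_change: float
    padj: float  # NaN marks a gene with no adjusted p-value


def screen_differential_genes(
    results: Iterable[DEResult],
    padj_cutoff: float = 0.05,
    min_abs_log2fc: float = 0.0,  # assumption: the source gives no cutoff value
) -> Tuple[List[str], List[str]]:
    """Return (up_regulated, down_regulated) gene names passing both filters."""
    up: List[str] = []
    down: List[str] = []
    for row in results:
        # Skip rows with missing statistics.
        if math.isnan(row.padj) or math.isnan(row.log2_fold_change):
            continue
        # Keep only genes below the adjusted p-value cutoff.
        if row.padj >= padj_cutoff:
            continue
        # Keep only genes whose absolute log2 fold change exceeds the cutoff.
        if abs(row.log2_fold_change) <= min_abs_log2fc:
            continue
        if row.log2_fold_change > 0:
            up.append(row.gene)
        else:
            down.append(row.gene)
    return up, down


# Hypothetical rows for illustration only, not data from the study.
toy_results = [
    DEResult("GENE_A", 2.1, 0.001),
    DEResult("GENE_B", -1.7, 0.01),
    DEResult("GENE_C", 0.4, 0.20),
    DEResult("GENE_D", -3.0, float("nan")),
]
up_genes, down_genes = screen_differential_genes(toy_results)
print(len(up_genes), "up-regulated;", len(down_genes), "down-regulated")
```

Applied to a full results table, the two returned lists would correspond to the up-regulated and down-regulated counts reported above.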

S3-2:

GSEA software was used to identify signaling pathways that differ significantly between cancer tissue and paracancerous tissue. Of the 50 classical signaling pathways in the HALLMARK database that were compared, 41 were significantly activated in cancer tissue and 9 were significantly activated in paracancerous tissue. FIG. 2 shows the enrichment plots of three of these pathways (MTORC1, OXIDATIVE_PHOSPHORYLATION, DNA_REPAIR).
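To make the idea behind a GSEA enrichment plot concrete, the sketch below computes the running-sum enrichment score for one gene set over a ranked gene list, using the weighted Kolmogorov-Smirnov-like statistic commonly described for GSEA. It is a simplified illustration under stated assumptions: the function name `enrichment_score`, the gene names and the ranking scores are hypothetical, and the permutation testing, normalization and multiple-testing correction that the GSEA software performs are not shown.

```python
from typing import List, Sequence, Set, Tuple


def enrichment_score(
    ranked: Sequence[Tuple[str, float]],
    gene_set: Set[str],
    weight: float = 1.0,
) -> float:
    """Running-sum enrichment score for one gene set.

    `ranked` holds (gene, ranking_score) pairs sorted from the most
    up-regulated to the most down-regulated gene.
    """
    hits: List[bool] = [gene in gene_set for gene, _ in ranked]
    n_hits = sum(hits)
    n_miss = len(ranked) - n_hits
    if n_hits == 0 or n_miss == 0:
        raise ValueError("gene set must overlap the ranked list only partially")

    # Each hit contributes in proportion to |score| ** weight.
    hit_weights = [
        abs(score) ** weight if is_hit else 0.0
        for (_, score), is_hit in zip(ranked, hits)
    ]
    total_hit_weight = sum(hit_weights)
    if total_hit_weight == 0.0:
        raise ValueError("all gene-set members have a ranking score of zero")

    running = 0.0
    best = 0.0
    for w, is_hit in zip(hit_weights, hits):
        # Step up on a hit, step down by a constant on a miss.
        running += w / total_hit_weight if is_hit else -1.0 / n_miss
        # The score is the largest deviation of the running sum from zero.
        if abs(running) > abs(best):
            best = running
    return best


# Hypothetical ranked list for illustration only.
toy_ranked = [("g1", 3.0), ("g2", 2.0), ("g3", 1.0),
              ("g4", -1.0), ("g5", -2.0), ("g6", -3.0)]
print(enrichment_score(toy_ranked, {"g1", "g2"}))  # positive: set sits at the top
print(enrichment_score(toy_ranked, {"g5", "g6"}))  # negative: set sits at the bottom
```

In this sketch, a positive score means the gene set is concentrated among genes higher in cancer tissue, and a negative score means it is concentrated among genes higher in paracancerous tissue, assuming the list is ranked cancer versus paracancerous.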

S3-3:

According to the expression of the differential genes, 6 differential genes were further screened out: IGHG1, CA2, COL1A1, RNA5-8SN2, MIR3648-1 and AQP8. As shown in FIG. 3, in cancer tissue CA2 and AQP8 were significantly decreased, while IGHG1, COL1A1, RNA5-8SN2 and MIR3648-1 were significantly increased.

S3-4:

The expression of the 6 screened differential genes was further verified in the validation set. ROC curve analysis showed that each of the 6 molecules could independently distinguish cancer tissue from paracancerous tissue, with good areas under the curve (FIG. 4): IGHG1 (0.752: 0.662-0.842), CA2 (0.832: 0.753-0.912), COL1A1 (0.769: 0.680-0.857), RNA5-8SN2 (0.902: 0.849-0.956), MIR3648-1 (0.865: 0.800-0.930), AQP8 (0.869: 0.799-0.938).
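As an illustration of what an area under the ROC curve measures for a single gene, the sketch below computes it through its rank interpretation: the fraction of (cancer, paracancerous) sample pairs in which the cancer sample has the higher expression value, with ties counted as one half. The source does not say which software or interval method was used, so the function `roc_auc` and the toy values are hypothetical, and no interval estimate is computed.

```python
from typing import Sequence


def roc_auc(cancer_values: Sequence[float], adjacent_values: Sequence[float]) -> float:
    """AUC as the probability that a cancer sample outranks an adjacent sample."""
    if not cancer_values or not adjacent_values:
        raise ValueError("both groups need at least one sample")
    wins = 0.0
    for c in cancer_values:
        for a in adjacent_values:
            if c > a:
                wins += 1.0   # cancer sample ranked higher
            elif c == a:
                wins += 0.5   # ties count as half a win
    return wins / (len(cancer_values) * len(adjacent_values))


# Hypothetical expression values for illustration only.
print(roc_auc([5.0, 6.0, 7.0], [1.0, 2.0, 3.0]))  # perfectly separated groups
print(roc_auc([1.0, 2.0], [1.0, 2.0]))            # identical groups
```

For a gene that is lower in cancer tissue, this orientation yields a value below 0.5; the discriminative ability is then `1 - roc_auc(...)`, which is the same as swapping the two groups.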

Further, the GEPIA online database was used to analyze the relationship between the expression levels of CA2, COL1A1 and AQP8 and prognosis. The results show that overall survival was significantly longer in patients with high AQP8 expression (P < 0.05), tended to be better in patients with high CA2 expression (P = 0.076), and tended to be better in patients with low COL1A1 expression (P = 0.067) (FIG. 5).
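Survival comparisons of this kind are typically drawn as Kaplan-Meier curves, one per expression group. The sketch below is a minimal Kaplan-Meier estimator, given only to show how an overall survival curve is built from follow-up times and censoring flags. It does not reproduce the GEPIA analysis: the function `kaplan_meier` and the toy follow-up data are hypothetical, and the significance test that yields the P values is not shown.

```python
from typing import List, Sequence, Tuple


def kaplan_meier(
    times: Sequence[float],
    events: Sequence[bool],
) -> List[Tuple[float, float]]:
    """Return (time, survival probability) at each time with an observed death.

    `events[i]` is True if patient i died at `times[i]`, False if censored.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve: List[Tuple[float, float]] = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all patients who share the same follow-up time.
        while i < len(data) and data[i][0] == t:
            if data[i][1]:
                deaths += 1
            removed += 1
            i += 1
        if deaths:
            # Multiply by the fraction of at-risk patients surviving this time.
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        # Deaths and censored patients both leave the risk set.
        at_risk -= removed
    return curve


# Hypothetical follow-up data (months, death observed) for illustration only.
toy_curve = kaplan_meier([1, 2, 3, 4], [True, False, True, True])
print(toy_curve)
```

Computing one curve for the high-expression group and one for the low-expression group of a gene gives the two lines compared in a plot like FIG. 5.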

This analysis identified 6 specific molecules that are significantly altered in colorectal cancer: IGHG1, CA2, COL1A1, RNA5-8SN2, MIR3648-1 and AQP8. This is of great significance for research into treatments for colorectal cancer. In addition, designing targeted drugs against the significantly changed molecules can relieve drug resistance in patients and prolong their survival time.

The embodiments of the present invention have been described in detail, but they are merely exemplary, and the present invention is not limited to the embodiments described above. Any equivalent modifications and substitutions made by those skilled in the art are also within the scope of the present invention. Accordingly, all equivalent alterations and modifications made without departing from the spirit and scope of the invention are intended to be included within the scope of the invention.
