Method for constructing model for classifying hand-foot-mouth samples and application of model

Document No.: 1244127; Publication date: 2020-08-18

Abstract: This technology, "Method for constructing model for classifying hand-foot-mouth samples and application of model", was created by 麻锦敏, 李琼芳 and 陈唯军 on 2020-04-20. The invention provides a method for distinguishing hand-foot-and-mouth samples. The method comprises: determining the expression level of each gene in a first marker gene combination of a sample to be tested; and inputting the expression levels of the first marker gene combination into a first classification model so as to distinguish the hand-foot-and-mouth sample between a light symptom group and a severe symptom group.

1. A method of constructing a model for hand-foot-and-mouth sample classification, comprising:

(1) performing nucleic acid sample sequencing on samples from a plurality of hand-foot-and-mouth patients and obtaining sequencing data of each patient, wherein the plurality of hand-foot-and-mouth patients comprise a light symptom group and a severe symptom group;

(2) determining the expression level of each gene in the initial gene set of each patient by comparing the sequencing data with a reference genome;

(3) determining an internal reference gene set based on the variation coefficient of the expression quantity of each gene in the initial gene set in each patient, wherein the variation coefficient of the internal reference gene is smaller than a preset threshold value;

(4) performing a first classification training using the expression level of each gene determined in step (2) as a training feature and using the light symptom group and the severe symptom group as a training set, so as to obtain a first marker gene combination and a first classification model for distinguishing light symptoms from severe symptoms;

(5) selecting one internal reference gene from the internal reference gene set, taking the ratio of the expression level of each remaining gene in the initial gene set to that of the internal reference gene as a training feature, and taking the light symptom group and the severe symptom group as a training set to perform auxiliary classification training, so as to obtain an auxiliary marker gene combination and an auxiliary classification model for distinguishing light symptoms from severe symptoms.

2. The method of claim 1, wherein the reference genes comprise GPI and GAPDH.

3. The method of claim 1, wherein the first marker gene combination comprises FGFR1OP2, IFNAR2, PAFAH1B1, PTPRC, HNRNPF, YEATS2 and UBP1.

4. The method of claim 1, wherein the first classification training and the auxiliary classification training are each independently random forest model classification training.

5. The method according to claim 1, wherein in step (5), the auxiliary classification training is performed separately for each internal reference gene in the internal reference gene set so as to obtain a plurality of auxiliary marker gene combinations and a corresponding plurality of auxiliary classification models.

6. The method of claim 5, wherein the first internal reference gene is GPI, and the first auxiliary marker gene combination comprises GAS6-AS2, UBR4, C9orf16, IFNAR2 and YEATS2; and

the second internal reference gene is GAPDH, and the second auxiliary marker gene combination comprises QSOX1, VIM, ZEB2 and C9orf16.

7. A method for differentiating hand-foot-and-mouth samples, comprising:

determining the expression quantity of each gene in a first marker gene combination of a sample to be detected;

inputting the expression level result of the first marker gene combination into a first classification model so as to distinguish the hand-foot-and-mouth sample between a light symptom group and a severe symptom group,

wherein the first marker gene combination and the first classification model are established in any one of claims 1 to 6.

8. The method of claim 7, wherein the expression level of the first marker gene combination is obtained by high-throughput sequencing.

9. The method of claim 7, further comprising:

determining, by a qPCR method, the expression level of each gene in the first auxiliary marker gene combination and the second auxiliary marker gene combination;

distinguishing the hand-foot-and-mouth sample between a light symptom group and a severe symptom group by using a first auxiliary classification model based on the gene expression amount of the first auxiliary marker gene combination so as to obtain a first distinguishing result;

distinguishing the hand-foot-and-mouth sample between a light symptom group and a severe symptom group by using a second auxiliary classification model based on the gene expression amount of the second auxiliary marker gene combination so as to obtain a second distinguishing result;

selecting, as the judgment result, the distinguishing result on which the first distinguishing result and the second distinguishing result agree, wherein the first auxiliary marker gene combination and the second auxiliary marker gene combination, and the first auxiliary classification model and the second auxiliary classification model, are established by the method of any one of claims 1 to 6.

10. An apparatus for differentiating hand-foot-and-mouth samples, comprising:

a first expression level determining module for determining the expression level of each gene in the first marker gene combination of the sample to be tested;

a first classification module for inputting the expression quantity result of the first marker gene combination into a first classification model so as to distinguish the hand-foot-mouth sample between a light symptom group and a severe symptom group,

wherein the first marker gene combination and the first classification model are established in any one of claims 1 to 6.

11. The apparatus of claim 10, wherein the expression level of the first marker gene combination is obtained by high-throughput sequencing.

12. The apparatus of claim 10, further comprising:

a second expression level determining module for determining, by a qPCR method, the expression level of each gene in the first auxiliary marker gene combination and the second auxiliary marker gene combination;

a first auxiliary classification module for distinguishing the hand-foot-and-mouth sample between the light symptom group and the severe symptom group by using a first auxiliary classification model based on the gene expression levels of the first auxiliary marker gene combination, so as to obtain a first distinguishing result;

a second auxiliary classification module for distinguishing the hand-foot-and-mouth sample between the light symptom group and the severe symptom group by using a second auxiliary classification model based on the gene expression levels of the second auxiliary marker gene combination, so as to obtain a second distinguishing result;

a judging module for selecting, as the judgment result, the distinguishing result on which the first distinguishing result and the second distinguishing result agree;

wherein the first auxiliary marker gene combination and the second auxiliary marker gene combination, and the first auxiliary classification model and the second auxiliary classification model, are established by the method of any one of claims 1 to 6.

Technical Field

The invention relates to the field of biological analysis, in particular to a method for constructing a model for classifying hand-foot-and-mouth samples, and to a method and an apparatus for distinguishing hand-foot-and-mouth samples.

Background

Hand-foot-and-mouth disease (HFMD) is a common infectious disease in children caused by a group of enteroviruses. Severe patients often develop neurological and systemic complications rapidly, and in the most serious cases death occurs within 3 to 5 days. In infants and children between 6 months and 5 years of age, the immune system is not fully developed and maternally transferred antibodies are no longer available, so these children lack the ability to resist the virus and rely entirely on the development of their own immunity. Finding marker immune genes that can distinguish mild from severe disease at an early stage, and thereby predict whether a hand-foot-and-mouth case will be mild or severe, is therefore of great significance for clinical treatment and may even reduce the mortality caused by severe disease.

When high-throughput sequencing and artificial intelligence are combined with medicine, high-throughput sequencing data are analyzed with artificial intelligence, and diagnostic bias is reduced by adjusting parameters. This adds objectivity to diagnoses that otherwise depend on the experience of doctors, and can also help compensate for the shortage of medical resources. In particular, traditional medical means alone offer no good solution for predicting at an early stage whether hand-foot-and-mouth disease will develop into a mild or a severe form, so combining high-throughput sequencing with artificial intelligence to distinguish mild from severe cases early is of considerable significance.

Disclosure of Invention

The invention combines high-throughput sequencing and artificial intelligence with medicine: a plurality of marker genes are selected, and a model is built by artificial intelligence, machine learning and similar means, so that the early prediction of whether a hand-foot-and-mouth case is mild or severe is displayed intuitively, and the result is more objective and more accurate.

In a first aspect of the invention, the invention proposes a method of constructing a model for hand-foot-and-mouth sample classification. According to an embodiment of the invention, the method comprises: (1) performing nucleic acid sample sequencing on samples from a plurality of hand-foot-and-mouth patients and obtaining sequencing data of each patient, wherein the plurality of hand-foot-and-mouth patients comprise a light symptom group and a severe symptom group; (2) determining the expression level of each gene in the initial gene set of each patient by comparing the sequencing data with a reference genome; (3) determining an internal reference gene set based on the coefficient of variation of the expression level of each gene in the initial gene set across the patients, wherein the coefficient of variation of each internal reference gene is smaller than a preset threshold value; (4) performing a first classification training using the expression level of each gene determined in step (2) as a training feature and using the light symptom group and the severe symptom group as a training set, so as to obtain a first marker gene combination and a first classification model for distinguishing light symptoms from severe symptoms; (5) selecting one internal reference gene from the internal reference gene set, taking the ratio of the expression level of each remaining gene in the initial gene set to that of the internal reference gene as a training feature, and taking the light symptom group and the severe symptom group as a training set to perform auxiliary classification training, so as to obtain an auxiliary marker gene combination and an auxiliary classification model for distinguishing light symptoms from severe symptoms.

According to an embodiment of the present invention, the method may further include at least one of the following additional technical features:

according to an embodiment of the present invention, the reference genes include GPI and GAPDH.

According to an embodiment of the invention, the first marker gene combination comprises FGFR1OP2, IFNAR2, PAFAH1B1, PTPRC, HNRNPF, YEATS2 and UBP1.

According to an embodiment of the present invention, the first classification training and the auxiliary classification training are each independently random forest model classification training.

According to the embodiment of the present invention, in step (5), the auxiliary classification training is performed separately for each reference gene in the reference gene set, so as to obtain a plurality of auxiliary marker gene combinations and a corresponding plurality of auxiliary classification models.

According to an embodiment of the present invention, the first internal reference gene is GPI, and the first auxiliary marker gene combination comprises GAS6-AS2, UBR4, C9orf16, IFNAR2 and YEATS2; the second internal reference gene is GAPDH, and the second auxiliary marker gene combination comprises QSOX1, VIM, ZEB2 and C9orf16.

In a second aspect of the invention, the invention proposes a method for distinguishing hand-foot-and-mouth samples. According to an embodiment of the invention, the method comprises: determining the expression level of each gene in a first marker gene combination of a sample to be tested; and inputting the expression levels of the first marker gene combination into a first classification model so as to distinguish the hand-foot-and-mouth sample between a light symptom group and a severe symptom group, wherein the first marker gene combination and the first classification model are established according to the method described above.

According to an embodiment of the present invention, the method may further include at least one of the following additional technical features:

according to an embodiment of the present invention, the expression level of the first marker gene combination is obtained by high-throughput sequencing.

According to an embodiment of the invention, the method further comprises: determining, by a qPCR method, the expression level of each gene in the first auxiliary marker gene combination and the second auxiliary marker gene combination; distinguishing the hand-foot-and-mouth sample between a light symptom group and a severe symptom group by using a first auxiliary classification model based on the gene expression levels of the first auxiliary marker gene combination, so as to obtain a first distinguishing result; distinguishing the hand-foot-and-mouth sample between a light symptom group and a severe symptom group by using a second auxiliary classification model based on the gene expression levels of the second auxiliary marker gene combination, so as to obtain a second distinguishing result; and selecting, as the judgment result, the distinguishing result on which the first distinguishing result and the second distinguishing result agree, wherein the first auxiliary marker gene combination and the second auxiliary marker gene combination, and the first auxiliary classification model and the second auxiliary classification model, are established by the method described above.

In a third aspect of the invention, the invention proposes an apparatus for distinguishing hand-foot-and-mouth samples. According to an embodiment of the invention, the apparatus comprises: a first expression level determining module for determining the expression level of each gene in the first marker gene combination of the sample to be tested; and a first classification module for inputting the expression levels of the first marker gene combination into a first classification model so as to distinguish the hand-foot-and-mouth sample between a light symptom group and a severe symptom group, wherein the first marker gene combination and the first classification model are established according to the method described above.

According to an embodiment of the present invention, the apparatus may further include at least one of the following additional features:

according to an embodiment of the present invention, the expression level of the first marker gene combination is obtained by high-throughput sequencing.

According to an embodiment of the invention, the apparatus further comprises: a second expression level determining module for determining, by a qPCR method, the expression level of each gene in the first auxiliary marker gene combination and the second auxiliary marker gene combination; a first auxiliary classification module for distinguishing the hand-foot-and-mouth sample between the light symptom group and the severe symptom group by using a first auxiliary classification model based on the gene expression levels of the first auxiliary marker gene combination, so as to obtain a first distinguishing result; a second auxiliary classification module for distinguishing the hand-foot-and-mouth sample between the light symptom group and the severe symptom group by using a second auxiliary classification model based on the gene expression levels of the second auxiliary marker gene combination, so as to obtain a second distinguishing result; and a judging module for selecting, as the judgment result, the distinguishing result on which the first distinguishing result and the second distinguishing result agree, wherein the first auxiliary marker gene combination and the second auxiliary marker gene combination, and the first auxiliary classification model and the second auxiliary classification model, are established according to the method described above.

The method and the apparatus for distinguishing hand-foot-and-mouth samples according to embodiments of the invention combine high-throughput sequencing and artificial intelligence with medicine, and thereby overcome the limitation that early diagnosis of mild versus severe hand-foot-and-mouth disease depends heavily on the experience of doctors. A plurality of marker genes are selected and a model is built by artificial intelligence, machine learning and similar means, so that the early prediction of whether a case is mild or severe is shown visually, and the result is more objective and more accurate.

Drawings

FIG. 1 is a flow chart for distinguishing mild and severe hand-foot-and-mouth samples according to an embodiment of the present invention;

FIG. 2 is a schematic structural diagram of an apparatus for distinguishing hand-foot-mouth samples according to an embodiment of the present invention;

FIG. 3 is a schematic diagram of a device for differentiating hand-foot-and-mouth samples according to another embodiment of the present invention;

FIG. 4 is a graph showing the accuracy of the combination of 7 marker genes (FGFR1OP2, IFNAR2, PAFAH1B1, PTPRC, HNRNPF, YEATS2, UBP1) selected by a random forest model using gene expression levels (FPKM) according to an embodiment of the present invention;

FIG. 5 shows the ROC curve of the training set (4/5 of the samples) for the model built on the 7 marker genes selected using gene expression levels (FPKM), according to an embodiment of the present invention;

FIG. 6 shows that, according to an embodiment of the present invention, with the expression level of the GPI gene (FPKM_GPI) as the reference, the ratio of the FPKM of each other gene to it (FPKM/FPKM_GPI) is calculated, and from these ratios a random forest model picks out 5 marker genes (GAS6-AS2, UBR4, C9orf16, IFNAR2 and YEATS2) whose combination gives the best accuracy;

FIG. 7 shows the ROC curve of the training set (4/5 of the samples) for the model built on the 5 marker genes selected with the expression level of the GPI gene (FPKM_GPI) as the reference, according to an embodiment of the present invention;

FIG. 8 shows that, according to an embodiment of the present invention, with the expression level of the GAPDH gene (FPKM_GAPDH) as the reference, the ratio of the FPKM of each other gene to it (FPKM/FPKM_GAPDH) is calculated, and from these ratios a random forest model picks out 4 marker genes (QSOX1, VIM, ZEB2 and C9orf16) whose combination gives the best accuracy;

FIG. 9 shows the ROC curve of the training set (4/5 of the samples) for the model built on the 4 marker genes selected with the expression level of the GAPDH gene (FPKM_GAPDH) as the reference, according to an embodiment of the present invention.

Detailed Description

In the following, embodiments of the method of the present invention for distinguishing mild and severe hand-foot-and-mouth samples are described in detail, examples of which are shown in the accompanying drawings. The embodiments described below with reference to the drawings are illustrative, are intended to explain the invention, and are not to be construed as limiting it.

First, nucleic acid extraction from the sample

After peripheral blood mononuclear cells (PBMCs) are isolated from blood, nucleic acid is extracted from the cells (RNA extraction), and gene quantification is performed on the nucleic acid by high-throughput sequencing or quantitative PCR (qPCR).

Second, bioinformatics analysis

Step one: analysis of sequencing results

1. Remove low-quality sequences from the raw sequencing data to obtain clean sequences for downstream analysis.

2. Align the clean sequences to the human reference gene sequences using the Bowtie 2 software.

3. Calculate the gene expression level, expressed as FPKM (Fragments Per Kilobase of exon per Million fragments mapped), using the RSEM (RNA-Seq by Expectation-Maximization) software package.
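The FPKM unit named in item 3 can be illustrated with a short sketch. This is only the textbook normalization (fragments per kilobase of gene length per million mapped fragments), not a reimplementation of RSEM, which estimates fragment counts by expectation-maximization and applies its own length handling. The function name `fpkm`, the counts and the gene lengths below are hypothetical values chosen for illustration.

```python
def fpkm(fragment_counts, gene_lengths_bp):
    """Return FPKM per gene.

    fragment_counts: gene -> number of fragments mapped to that gene
    gene_lengths_bp: gene -> gene (exon) length in base pairs
    FPKM = count * 1e9 / (length_bp * total_mapped_fragments)
    """
    total_fragments = sum(fragment_counts.values())
    if total_fragments == 0:
        raise ValueError("no mapped fragments")
    return {
        gene: count * 1e9 / (gene_lengths_bp[gene] * total_fragments)
        for gene, count in fragment_counts.items()
    }


# Hypothetical toy input: three genes from one sample.
counts = {"GPI": 500, "GAPDH": 1500, "VIM": 2000}
lengths = {"GPI": 2000, "GAPDH": 1000, "VIM": 4000}
values = fpkm(counts, lengths)
```

Because the count is divided by both gene length and sequencing depth, FPKM values are comparable between genes of different lengths and between samples sequenced to different depths.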

Step two: selection of internal reference genes

Calculate the coefficient of variation of the expression level of each gene across the samples, and select relatively stable genes as internal reference genes (GeneA and GeneB).
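A minimal sketch of this selection step follows, assuming the coefficient of variation is taken as the population standard deviation divided by the mean of a gene's FPKM values across patients. The threshold of 0.1, the function names and the toy expression values are assumptions for illustration; the source only states that the coefficient of variation must be below a preset threshold.

```python
from statistics import mean, pstdev


def coefficient_of_variation(values):
    """Standard deviation divided by mean; infinite when the mean is zero."""
    m = mean(values)
    if m == 0:
        return float("inf")
    return pstdev(values) / m


def select_reference_genes(expression, threshold):
    """Return genes whose CV across patients is below the threshold.

    expression: gene -> list of FPKM values, one per patient
    The result is ordered from most stable (lowest CV) to least stable.
    """
    cvs = {gene: coefficient_of_variation(v) for gene, v in expression.items()}
    stable = [gene for gene, cv in cvs.items() if cv < threshold]
    return sorted(stable, key=cvs.get)


# Hypothetical FPKM values for four patients.
expression = {
    "GeneA": [100, 102, 98, 101],
    "GeneB": [50, 51, 49, 50],
    "GeneC": [5, 40, 12, 90],
}
reference_genes = select_reference_genes(expression, threshold=0.1)
```

In this toy data GeneA and GeneB vary by only a few percent across patients and pass the threshold, while GeneC does not.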

Step three: selection of marker genes

1. Using the gene expression levels (FPKM) of the mild and severe samples, select a group of marker genes (Group1) with a random forest model; this group is used to distinguish the two groups of samples.

2. With the expression level of the GeneA gene (FPKM_GeneA) as the reference, calculate the ratio of the FPKM of each other gene to it (FPKM/FPKM_GeneA); using these ratios, select a group of marker genes (Group2) with a random forest model.

3. With the expression level of the GeneB gene (FPKM_GeneB) as the reference, calculate the ratio of the FPKM of each other gene to it (FPKM/FPKM_GeneB); using these ratios, select a group of marker genes (Group3) with a random forest model.

4. Repeat item 2 or 3 with each of a plurality of internal reference genes as the denominator of the ratio, and select the optimal combination.
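The ratio features used for the reference-based groups can be sketched as below. The helper name `ratio_features` and the FPKM values are hypothetical; the random forest training that consumes these features is not shown and would be done with whatever machine learning library is preferred.

```python
def ratio_features(sample_fpkm, reference_gene):
    """Divide every other gene's FPKM by the reference gene's FPKM.

    sample_fpkm: gene -> FPKM for one sample
    Returns gene -> FPKM / FPKM_reference, excluding the reference itself.
    """
    reference_value = sample_fpkm[reference_gene]
    if reference_value <= 0:
        raise ValueError("reference gene must have positive expression")
    return {
        gene: value / reference_value
        for gene, value in sample_fpkm.items()
        if gene != reference_gene
    }


# Hypothetical FPKM values for one sample, with GPI as the reference gene.
sample = {"GPI": 20.0, "IFNAR2": 10.0, "YEATS2": 5.0}
features = ratio_features(sample, "GPI")
```

Calling the same function once per candidate reference gene gives one feature table per reference, which corresponds to repeating the selection for several reference genes before choosing the best combination.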

Step four: prediction of mild or severe symptoms

1. Build a model from the selected marker genes (Group1), and predict whether a hand-foot-and-mouth sample measured by high-throughput sequencing is mild or severe.

2. Build a model from the selected marker genes (Group2) using the GeneA gene as the reference, and predict whether the tested hand-foot-and-mouth sample is mild or severe.

3. Build a model from the selected marker genes (Group3) using the GeneB gene as the reference, and predict whether the tested hand-foot-and-mouth sample is mild or severe. Combine this prediction with the result of item 2: when the two agree, adopt the agreed result; when they disagree, the sample is reported as not predicted.
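The agreement rule that combines the two reference-based predictions can be written as a small function. The labels "mild" and "severe" and the use of `None` for "not predicted" are illustrative assumptions, not part of the source.

```python
from typing import Optional


def consensus_call(first_result: str, second_result: str) -> Optional[str]:
    """Return the shared label when both models agree, otherwise None.

    None stands for "not predicted": the two auxiliary models disagree,
    so no judgment is reported for the sample.
    """
    return first_result if first_result == second_result else None


# Hypothetical model outputs for two samples.
agreed = consensus_call("severe", "severe")
disagreed = consensus_call("mild", "severe")
```

Requiring agreement trades coverage for reliability: some samples receive no call, but every reported call is backed by both reference-gene models.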

For ease of understanding, the applicant shows the flow of the present application for distinguishing mild and severe hand-foot-and-mouth samples in FIG. 1.

In another aspect, the invention features an apparatus for distinguishing hand-foot-and-mouth samples. According to an embodiment of the invention, with reference to fig. 2, the apparatus comprises: a first expression level determining module 100, configured to determine an expression level of each gene in a first marker gene combination of a sample to be detected; a first classification module 200, configured to input the expression level result of the first marker gene combination into a first classification model, so as to distinguish the hand-foot-and-mouth sample between a light symptom group and a severe symptom group, where the first marker gene combination and the first classification model are established according to the method described above.

Specifically, according to an embodiment of the present invention, referring to FIG. 3, the apparatus further includes: a second expression level determining module 300 for determining, by the qPCR method, the expression level of each gene in the first auxiliary marker gene combination and the second auxiliary marker gene combination; a first auxiliary classification module 400, configured to use a first auxiliary classification model, based on the gene expression levels of the first auxiliary marker gene combination, to distinguish the hand-foot-and-mouth sample between the light symptom group and the severe symptom group so as to obtain a first distinguishing result; a second auxiliary classification module 500, configured to use a second auxiliary classification model, based on the gene expression levels of the second auxiliary marker gene combination, to distinguish the hand-foot-and-mouth sample between the light symptom group and the severe symptom group so as to obtain a second distinguishing result; and a judging module 600, configured to select, as the judgment result, the distinguishing result on which the first distinguishing result and the second distinguishing result agree, where the first auxiliary marker gene combination and the second auxiliary marker gene combination, and the first auxiliary classification model and the second auxiliary classification model, are established according to the foregoing method.

The invention will be further explained with reference to specific examples. The experimental procedures used in the following examples are all conventional procedures unless otherwise specified. Materials, reagents and the like used in the following examples are commercially available unless otherwise specified.
