Biomarker identification method for individual cancer sample

Document No.: 1615537 | Published: 2020-01-10

Abstract: "Biomarker identification method for individual cancer sample" was designed by 李�杰, 王东 and 王亚东 on 2019-10-14. The invention is a biomarker identification method for an individual cancer sample. Based on sample data of two different phenotypes, the method first determines the differentially expressed components (molecular compounds such as genes and proteins) and selects q of them. An average sample is then obtained from the q selected components. A regression model is constructed from the average sample and the individual sample, and regression prediction is performed on the sample. The biomarkers of the single sample are determined from the regression prediction result and the differentially expressed components. The invention can therefore select different biomarkers for different individual samples.

1. A biomarker identification method for an individual cancer sample, characterized by comprising the following steps:

step 1: determining differential expression components based on sample data of two different phenotypes, wherein the components comprise proteins, genes or molecular compounds, and selecting q differential expression components;

step 2: obtaining an average sample based on the selected q differential expression components;

step 3: constructing a regression model based on the average sample and the individual sample, and performing regression prediction on the sample to obtain a regression prediction result of the sample;

step 4: determining the biomarker components of the single sample based on the regression prediction result of the sample and the differential expression components.

2. The method of claim 1, wherein step 1 specifically comprises:

selecting two groups of component expression data samples with different phenotypes, labelled "+" and "-" respectively, where $n_1$ and $n_2$ denote the numbers of samples in the "+" group and the "-" group;

letting $y_{ji}$ denote the expression value of the jth component of the ith sample labelled "+" and $x_{ji}$ denote the expression value of the jth component of the ith sample labelled "-", and selecting q differentially expressed components based on $y_{ji}$ and $x_{ji}$.

3. The method of claim 1, wherein step 2 specifically comprises:

step 2.1: determining the average samples of the "+" group and the "-" group by:

$$u_+ = (\bar{y}_1, \bar{y}_2, \ldots, \bar{y}_q)$$

$$u_- = (\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_q)$$

wherein $u_+$ and $u_-$ denote the average samples of the "+" group and the "-" group respectively, $\bar{y}_q$ is the average expression value of the qth component in the "+" group, and $\bar{x}_q$ is the average expression value of the qth component in the "-" group;

step 2.2: determining the average expression value of the jth component in each group from the expression values $y_{ji}$ of the jth component of the samples labelled "+" and the expression values $x_{ji}$ of the jth component of the samples labelled "-", by:

$$\bar{y}_j = \frac{1}{n_1}\sum_{i=1}^{n_1} y_{ji}$$

$$\bar{x}_j = \frac{1}{n_2}\sum_{i=1}^{n_2} x_{ji}$$

wherein $\bar{y}_j$ is the average expression value of the jth component of the "+" group, $\bar{x}_j$ is the average expression value of the jth component of the "-" group, and $n_1$ and $n_2$ denote the numbers of samples in the "+" group and the "-" group.

4. The method of claim 1, wherein step 3 specifically comprises:

step 3.1: constructing a regression model based on the average sample and the individual sample; letting $y'_{ji}$ denote the expression value of the jth differentially expressed component of the ith sample labelled "+", the ith sample labelled "+" is given by:

$$y'_i = (y'_{1i}, y'_{2i}, \ldots, y'_{qi})$$

wherein $y'_i$ is the ith sample labelled "+";

performing regression prediction on the ith sample labelled "+", the result of which is expressed by:

$$\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 u_+$$

wherein $\hat{y}_i$ is the result of regression prediction for the ith sample labelled "+", $u_+$ is the average sample of the "+" group, $\hat{\beta}_0$ is the intercept coefficient of the linear regression, and $\hat{\beta}_1$ is the independent variable coefficient of the linear regression;

step 3.2: letting $x'_{ji}$ denote the expression value of the jth differentially expressed component of the ith sample labelled "-", the ith sample labelled "-" is given by:

$$x'_i = (x'_{1i}, x'_{2i}, \ldots, x'_{qi})$$

wherein $x'_i$ is the ith sample labelled "-";

performing regression prediction on the ith sample labelled "-", the result of which is expressed by:

$$\hat{x}_i = \hat{\beta}_0 + \hat{\beta}_1 u_-$$

wherein $\hat{x}_i$ is the result of regression prediction for the ith sample labelled "-", $u_-$ is the average sample of the "-" group, and $\hat{\beta}_0$ and $\hat{\beta}_1$ are the intercept coefficient and the independent variable coefficient of the linear regression fitted for that sample.

5. The method of claim 1, wherein step 4 specifically comprises:

step 4.1: among the q differentially expressed components, the expression values of some components of a single sample differ significantly from the average value, and the degree of difference is quantified by the residual value; for the ith sample labelled "+", the residual value of the jth differentially expressed component is calculated by:

$$e^{+}_{ji} = y'_{ji} - \hat{y}_{ji}$$

for the ith sample labelled "-", the residual value of the jth differentially expressed component is calculated by:

$$e^{-}_{ji} = x'_{ji} - \hat{x}_{ji}$$

wherein $y'_{ji}$ and $x'_{ji}$ are the observed expression values of the jth differentially expressed component, and $\hat{y}_{ji}$ and $\hat{x}_{ji}$ are the corresponding regression-predicted values;

step 4.2: to obtain the biomarker components of the ith sample labelled "+", the distribution of its residual values is estimated by Gaussian kernel density estimation, represented by:

$$\hat{f}_h(e) = \frac{1}{qh}\sum_{j=1}^{q} K\left(\frac{e - e^{+}_{ji}}{h}\right)$$

$$K(t) = \frac{1}{\sqrt{2\pi}}\exp\left(-\frac{t^2}{2}\right)$$

wherein $\hat{f}_h$ is the result of kernel density estimation with a Gaussian kernel, h is the smoothing factor, and K is the Gaussian kernel function;

step 4.3: obtaining, through Φ, the confidence interval of the residual value distribution at confidence level α, wherein Φ is the cumulative distribution function of the estimated kernel density, the confidence interval being calculated by the following formula:

Figure FDA0002232870260000036

wherein $CI_\alpha$ is the confidence interval of the residual value distribution at confidence level α;

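step 4.4: after $CI_\alpha$ is obtained, for the jth component of the ith sample labelled "+", when its residual value $e^{+}_{ji}$ satisfies

$$e^{+}_{ji} \notin CI_\alpha$$

the jth component is a biomarker component of the ith sample labelled "+"; for the ith sample labelled "-", when the residual value $e^{-}_{ji}$ of the jth component satisfies

$$e^{-}_{ji} \notin CI_\alpha$$

the jth component is a biomarker component of the ith sample labelled "-".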

6. The method of claim 2, wherein "+" and "-" denote samples of two different phenotypes: "+" denotes a cancer, recurrence or response sample and "-" denotes a normal, non-recurrence or non-response sample; alternatively, "+" denotes a normal, non-recurrence or non-response sample and "-" denotes a cancer, recurrence or response sample.

Technical Field

The invention relates to the technical field of biomarker identification, and in particular to a biomarker identification method for an individual cancer sample.

Background

Existing biomarker identification methods identify biomarkers from the differences between two groups of samples with different phenotypes. However, cancer is a complex, heterogeneous disease: different patients have different pathogenesis and need different treatments. A method for determining the biomarkers of individual cancer samples is therefore needed.

Disclosure of Invention

To determine the biomarkers of an individual cancer sample, the invention provides a biomarker identification method for an individual cancer sample, with the following technical scheme:

a method of biomarker identification of an individual cancer sample comprising the steps of:

step 1: determining differential expression components based on sample data of two different phenotypes, wherein the components comprise proteins, genes or molecular compounds, and selecting q differential expression components;

step 2: obtaining an average sample based on the selected q differential expression components;

step 3: constructing a regression model based on the average sample and the individual sample, and performing regression prediction on the sample to obtain a regression prediction result of the sample;

step 4: determining the biomarker components of the single sample based on the regression prediction result of the sample and the differential expression components.

Preferably, the step 1 specifically comprises:

selecting two groups of component expression data samples with different phenotypes, labelled "+" and "-" respectively, where $n_1$ and $n_2$ denote the numbers of samples in the "+" group and the "-" group;

letting $y_{ji}$ denote the expression value of the jth component of the ith sample labelled "+" and $x_{ji}$ denote the expression value of the jth component of the ith sample labelled "-", and selecting q differentially expressed components based on $y_{ji}$ and $x_{ji}$.
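The text does not say which statistic is used to pick the q differentially expressed components. As a minimal sketch only, the following assumes components are ranked by the absolute Welch t-statistic between the two groups. The function name `select_differential_components` and the toy data are hypothetical and are not taken from the patent.

```python
from math import sqrt
from statistics import mean, variance


def select_differential_components(pos, neg, q):
    """Return the indices of the q components that differ most between groups.

    pos[i][j] / neg[i][j] is the expression value of component j in sample i
    of the "+" / "-" group. Ranking by |Welch t| is an assumption; the patent
    does not name the selection criterion.
    """
    n1, n2 = len(pos), len(neg)
    scores = []
    for j in range(len(pos[0])):
        y = [sample[j] for sample in pos]
        x = [sample[j] for sample in neg]
        se = sqrt(variance(y) / n1 + variance(x) / n2)
        t = (mean(y) - mean(x)) / se if se > 0 else 0.0
        scores.append((abs(t), j))
    scores.sort(reverse=True)
    return sorted(j for _, j in scores[:q])


# Toy data: 3 samples per group, 3 components; only component 0 differs clearly.
pos = [[5.0, 1.0, 3.0], [5.5, 1.2, 2.9], [6.0, 0.9, 3.1]]
neg = [[1.0, 1.1, 3.0], [1.2, 1.0, 3.2], [0.8, 1.05, 2.8]]
selected = select_differential_components(pos, neg, q=1)
```

Any other two-group test or fold-change filter could be substituted here without changing the later steps, which only need the list of selected component indices.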

Preferably, the step 2 specifically comprises:

step 2.1: determining the average samples of the "+" group and the "-" group by:

$$u_+ = (\bar{y}_1, \bar{y}_2, \ldots, \bar{y}_q)$$

$$u_- = (\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_q)$$

wherein $u_+$ and $u_-$ denote the average samples of the "+" group and the "-" group respectively, $\bar{y}_q$ is the average expression value of the qth component in the "+" group, and $\bar{x}_q$ is the average expression value of the qth component in the "-" group;

step 2.2: determining the average expression value of the jth component in each group from the expression values $y_{ji}$ of the jth component of the samples labelled "+" and the expression values $x_{ji}$ of the jth component of the samples labelled "-", by:

$$\bar{y}_j = \frac{1}{n_1}\sum_{i=1}^{n_1} y_{ji}$$

$$\bar{x}_j = \frac{1}{n_2}\sum_{i=1}^{n_2} x_{ji}$$

wherein $\bar{y}_j$ is the average expression value of the jth component of the "+" group, $\bar{x}_j$ is the average expression value of the jth component of the "-" group, and $n_1$ and $n_2$ denote the numbers of samples in the "+" group and the "-" group.
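Step 2 amounts to a component-wise mean over the samples of one group. The sketch below illustrates this for a toy "+" group; the function name `average_sample` and the data are hypothetical.

```python
def average_sample(samples):
    """Component-wise mean of a group.

    samples[i][j] is the expression value of selected component j in sample i;
    the result has one entry per component (q entries in total).
    """
    n = len(samples)
    q = len(samples[0])
    return [sum(sample[j] for sample in samples) / n for j in range(q)]


# Two "+" samples with q = 2 selected components.
pos = [[2.0, 4.0], [4.0, 8.0]]
u_plus = average_sample(pos)
```

The same function applied to the "-" group gives the average sample of that group.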

Preferably, the step 3 specifically comprises:

step 3.1: constructing a regression model based on the average sample and the individual sample; letting $y'_{ji}$ denote the expression value of the jth differentially expressed component of the ith sample labelled "+", the ith sample labelled "+" is given by:

$$y'_i = (y'_{1i}, y'_{2i}, \ldots, y'_{qi})$$

wherein $y'_i$ is the ith sample labelled "+";

performing regression prediction on the ith sample labelled "+", the result of which is expressed by:

$$\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 u_+$$

wherein $\hat{y}_i$ is the result of regression prediction for the ith sample labelled "+", $u_+$ is the average sample of the "+" group, $\hat{\beta}_0$ is the intercept coefficient of the linear regression, and $\hat{\beta}_1$ is the independent variable coefficient of the linear regression;

step 3.2: letting $x'_{ji}$ denote the expression value of the jth differentially expressed component of the ith sample labelled "-", the ith sample labelled "-" is given by:

$$x'_i = (x'_{1i}, x'_{2i}, \ldots, x'_{qi})$$

wherein $x'_i$ is the ith sample labelled "-";

performing regression prediction on the ith sample labelled "-", the result of which is expressed by:

$$\hat{x}_i = \hat{\beta}_0 + \hat{\beta}_1 u_-$$

wherein $\hat{x}_i$ is the result of regression prediction for the ith sample labelled "-", $u_-$ is the average sample of the "-" group, and $\hat{\beta}_0$ and $\hat{\beta}_1$ are the intercept coefficient and the independent variable coefficient of the linear regression fitted for that sample.
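The text mentions an intercept coefficient and an independent variable coefficient, which suggests a simple linear regression. As a sketch under that assumption, the following fits ordinary least squares with the group's average sample as the independent variable and one individual sample as the response, across the q selected components. The names `fit_line` and `predict` and the toy values are hypothetical.

```python
def fit_line(u, y):
    """Ordinary least squares fit of y = beta0 + beta1 * u over q components."""
    n = len(u)
    u_bar = sum(u) / n
    y_bar = sum(y) / n
    sxx = sum((a - u_bar) ** 2 for a in u)
    sxy = sum((a - u_bar) * (b - y_bar) for a, b in zip(u, y))
    beta1 = sxy / sxx
    beta0 = y_bar - beta1 * u_bar
    return beta0, beta1


def predict(u, beta0, beta1):
    """Regression prediction for every component of one sample."""
    return [beta0 + beta1 * a for a in u]


# Average sample of the group and one individual sample (here exactly 1 + 2u).
u_plus = [1.0, 2.0, 3.0, 4.0]
y_i = [3.0, 5.0, 7.0, 9.0]
beta0, beta1 = fit_line(u_plus, y_i)
y_hat = predict(u_plus, beta0, beta1)
```

One such fit is made per individual sample, so each sample gets its own intercept and slope.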

Preferably, the step 4 specifically includes:

step 4.1: among the q differentially expressed components, the expression values of some components of a single sample differ significantly from the average value, and the degree of difference is quantified by the residual value; for the ith sample labelled "+", the residual value of the jth differentially expressed component is calculated by:

$$e^{+}_{ji} = y'_{ji} - \hat{y}_{ji}$$

for the ith sample labelled "-", the residual value of the jth differentially expressed component is calculated by:

$$e^{-}_{ji} = x'_{ji} - \hat{x}_{ji}$$

wherein $y'_{ji}$ and $x'_{ji}$ are the observed expression values of the jth differentially expressed component, and $\hat{y}_{ji}$ and $\hat{x}_{ji}$ are the corresponding regression-predicted values;
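Assuming the usual convention of observed value minus predicted value (the sign convention is not recoverable from the text), the residuals of step 4.1 can be sketched as follows. The function name `residuals` and the numbers are hypothetical.

```python
def residuals(observed, predicted):
    """Residual of each component: observed expression minus regression prediction."""
    return [o - p for o, p in zip(observed, predicted)]


y_i = [3.0, 5.0, 9.0]      # observed expression values of one sample
y_hat = [3.5, 5.0, 7.0]    # regression predictions for the same components
e_i = residuals(y_i, y_hat)
```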

step 4.2: to obtain the biomarker components of the ith sample labelled "+", the distribution of its residual values $e^{+}_{ji}$ (j = 1, ..., q) is estimated by Gaussian kernel density estimation, represented by:

$$\hat{f}_h(e) = \frac{1}{qh}\sum_{j=1}^{q} K\left(\frac{e - e^{+}_{ji}}{h}\right)$$

$$K(t) = \frac{1}{\sqrt{2\pi}}\exp\left(-\frac{t^2}{2}\right)$$

wherein $\hat{f}_h$ is the result of kernel density estimation with a Gaussian kernel, h is the smoothing factor, and K is the Gaussian kernel function;
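The kernel density estimate of step 4.2 can be sketched with the standard Gaussian-kernel form. How the smoothing factor h is chosen is not stated in the text, so it is passed in explicitly here. The names `gaussian_kernel` and `kde` and the residual values are hypothetical.

```python
from math import exp, pi, sqrt


def gaussian_kernel(t):
    """Standard normal density, used as the kernel function K."""
    return exp(-0.5 * t * t) / sqrt(2.0 * pi)


def kde(points, h):
    """Return the estimated density function for the given residual values.

    h is the smoothing factor (bandwidth); its choice is an assumption left
    to the caller.
    """
    q = len(points)

    def density(e):
        return sum(gaussian_kernel((e - p) / h) for p in points) / (q * h)

    return density


f_hat = kde([-0.5, 0.0, 2.0], h=0.5)
```

Calling `f_hat(e)` evaluates the estimated density at a residual value `e`; the estimate integrates to 1 over the real line, as a density should.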

step 4.3: obtaining, through Φ, the confidence interval of the residual value distribution at confidence level α, wherein Φ is the cumulative distribution function of the estimated kernel density, the confidence interval being calculated by the following formula:

Figure BDA0002232870270000039

wherein $CI_\alpha$ is the confidence interval of the residual value distribution at confidence level α;
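The interval formula itself is not recoverable from the text. The sketch below assumes an equal-tailed interval, that is, the quantiles of Φ at (1 − α)/2 and (1 + α)/2, and uses the fact that the cumulative distribution function of a Gaussian kernel density estimate is the average of normal CDFs centred on the residual values. The names `kde_cdf`, `invert` and `confidence_interval` are hypothetical.

```python
from statistics import NormalDist


def kde_cdf(points, h):
    """CDF of a Gaussian KDE: the mean of normal CDFs centred on each point."""
    components = [NormalDist(mu=p, sigma=h) for p in points]
    return lambda e: sum(c.cdf(e) for c in components) / len(components)


def invert(cdf, p, lo, hi, tol=1e-9):
    """Find e with cdf(e) = p by bisection; cdf must be increasing on [lo, hi]."""
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if cdf(mid) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def confidence_interval(points, h, alpha):
    """Equal-tailed interval covering probability alpha (an assumed form)."""
    cdf = kde_cdf(points, h)
    lo = min(points) - 10.0 * h
    hi = max(points) + 10.0 * h
    return (invert(cdf, (1.0 - alpha) / 2.0, lo, hi),
            invert(cdf, (1.0 + alpha) / 2.0, lo, hi))


residual_values = [-0.2, -0.1, 0.0, 0.1, 0.2]
ci_low, ci_high = confidence_interval(residual_values, h=0.1, alpha=0.95)
```

Bisection is used because the CDF of the mixture has no closed-form inverse; the search bracket of ten bandwidths beyond the extreme residuals is wide enough to hold both quantiles for the confidence levels typically used.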

step 4.4: after $CI_\alpha$ is obtained, for the jth component of the ith sample labelled "+", when its residual value $e^{+}_{ji}$ satisfies

$$e^{+}_{ji} \notin CI_\alpha$$

the jth component is a biomarker component of the ith sample labelled "+";

for the ith sample labelled "-", when the residual value $e^{-}_{ji}$ of the jth component satisfies

$$e^{-}_{ji} \notin CI_\alpha$$

the jth component is a biomarker component of the ith sample labelled "-".
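The selection condition in step 4.4 is given only as a formula image. Reading it together with step 4.1, a plausible interpretation is that a component is flagged when its residual falls outside the confidence interval. The sketch below encodes that assumption; the function name `biomarker_indices` and the example values are hypothetical.

```python
def biomarker_indices(residual_values, ci):
    """Indices of components whose residual lies outside the interval ci.

    ci is a (low, high) pair. Treating "outside the interval" as the selection
    rule is an assumption about the unrecovered formula.
    """
    low, high = ci
    return [j for j, e in enumerate(residual_values) if e < low or e > high]


# Residuals of one sample over q = 5 components and an example interval.
markers = biomarker_indices([-0.1, 0.05, 2.4, -1.9, 0.0], (-1.0, 1.0))
```

Because the residuals and the interval are computed per sample, the returned index set can differ from one sample to the next, which is what makes the biomarkers individual-specific.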

Preferably, the "+" and "-" denote samples of two different phenotypes, respectively, the "+" denotes samples of cancer, recurrence, or response, and the "-" denotes samples of normal, non-recurrence, or non-response; alternatively, the "+" indicates a normal, non-relapsed, or non-responsive sample, and the "-" indicates a cancer, relapsed, or responsive sample.

The invention has the following beneficial effects:

the present invention allows for the selection of differential biomarkers for different individual samples.

The invention can effectively identify biomarkers, as reflected mainly in the following:

a) the expression values of the biomarkers identified for a given sample differ statistically significantly from those of other samples;

b) biomarkers that occur frequently across samples can effectively distinguish the survival outcomes of the samples;

c) the selected biomarkers are reported in the literature to have phenotype-related biological effects.

Drawings

FIG. 1 is a flow chart of a method for biomarker identification of a cancer sample from an individual;

Detailed Description

The present invention will be described in detail with reference to specific examples.
