System and Detection Method for Identifying Arabidopsis thaliana Cotyledon Cell Types

Document No. 685302, published 2021-04-30

Reading note: This technology, "System and detection method for identifying Arabidopsis thaliana cotyledon cell types", was designed and created by 孙旭武, 肖云平, 刘祉辛, 殷昊, 陆瑶 and 巴永兵 on 2020-12-01. Abstract: The invention relates to a system and a detection method for identifying Arabidopsis thaliana cotyledon cell types based on single-cell sequencing. Manual identification requires extensive curation. With this method, the user only inputs the data to be identified and quickly obtains the cell types that represent different stages of stomatal development in Arabidopsis cotyledons. About 10,000 cells can be identified in about 10 minutes, which greatly reduces labor cost while ensuring annotation accuracy.

1. A system for identifying Arabidopsis thaliana cotyledon cell types based on single-cell sequencing, the system comprising: a cell sequencing platform, a cell-type database platform, and a data analysis and processing platform.

2. The system of claim 1, wherein the cell sequencing platform is a single-cell transcriptome sequencing platform, and the genetic data of the cells are obtained by single-cell transcriptome sequencing (scRNA-seq).

3. The system of claim 1, wherein the cell-type database platform is based on the marker genes of mesophyll cells (MPC), meristemoid mother cells (MMC), early meristemoids (EM), late meristemoids (LM), guard mother cells (GMC), young guard cells (YGC), guard cells (GC), and pavement cells (PC), from which an Arabidopsis thaliana reference data platform is constructed, and wherein the marker genes of each cell type are as follows:

Mesophyll cells (MPC): RBCS, LHCB

Meristemoid mother cells (MMC): HDG2, POLAR, SPCH, TMM, MUTE, EPF2

Early meristemoids (EM): MUTE, BASL, SPCH, EPF2

Late meristemoids (LM): BASL, MUTE, EPF1

Guard mother cells (GMC): EPF1, HIC, FAMA, SCRM

Young guard cells (YGC): RBCS, FAMA, EPF1

Guard cells (GC): low expression of the RBCS, FAMA, SCRM, and TMM genes

Pavement cells (PC): IQD5, RBCS.

4. The system for identifying Arabidopsis thaliana cotyledon cell types based on single-cell sequencing according to claim 3, wherein the cell-type database platform is established by the following method:

using single-cell transcriptome sequencing (scRNA-seq), a plurality of marker genes are collected to identify the cell types representing different stages of stomatal development; the specific cell types and marker genes are as follows:

Mesophyll cells (MPC): RBCS, LHCB

Meristemoid mother cells (MMC): HDG2, POLAR, SPCH, TMM, MUTE, EPF2

Early meristemoids (EM): MUTE, BASL, SPCH, EPF2

Late meristemoids (LM): BASL, MUTE, EPF1

Guard mother cells (GMC): EPF1, HIC, FAMA, SCRM

Young guard cells (YGC): RBCS, FAMA, EPF1

Guard cells (GC): low expression of the RBCS, FAMA, SCRM, and TMM genes

Pavement cells (PC): IQD5, RBCS;

and a database platform (single-cell reference dataset) suitable for identifying Arabidopsis thaliana cotyledon cell types is constructed.

5. The system for identifying Arabidopsis thaliana cotyledon cell types based on single-cell sequencing according to claim 4, wherein the steps of constructing the database platform (single-cell reference dataset) suitable for identifying Arabidopsis thaliana cotyledon cell types are as follows:

plotting the expression levels of the relevant markers in single cells using the FeaturePlot() and VlnPlot() functions;

plotting clustering heatmaps of the relevant markers' expression in single cells using the pheatmap() function;

and determining the cell-type composition of the Arabidopsis thaliana cotyledon from the expression plots and the clustering heatmap, obtaining the single-cell expression profile of each cotyledon cell type, and constructing a cell-type identification reference dataset.

6. The system of claim 1, wherein the data analysis and processing platform is configured to identify cell types using the SingleR() function, generate a correlation heatmap of the cell-type identification, generate cell-type statistics, and output the results and plots.

7. The system for identifying Arabidopsis thaliana cotyledon cell types based on single-cell sequencing according to claim 6, wherein the data analysis and processing steps are as follows:

using the cell-type identification reference dataset obtained by the above construction, the SingleR package assigns each group of query data to the corresponding cell type by comparing the ranks of the genes that are significantly up-regulated in the reference dataset. This allows Arabidopsis thaliana cotyledon cell types to be determined quickly in subsequent high-throughput single-cell transcriptome sequencing. The specific operation steps are as follows:

importing the data to be tested;

loading the constructed database platform (single-cell reference dataset) suitable for identifying Arabidopsis thaliana cotyledon cell types;

identifying cell types using the SingleR() function;

plotting a correlation heatmap of the cell-type identification;

counting the most abundant cell types;

and outputting the results and plots.

8. A detection method for identifying Arabidopsis thaliana cotyledon cell types based on single-cell sequencing, characterized by comprising the following steps:

based on a system for identifying Arabidopsis thaliana cotyledon cell types based on single-cell sequencing, wherein the system comprises: a cell sequencing platform, a cell-type database platform, and a data analysis and processing platform;

the cell sequencing platform is a single-cell transcriptome sequencing platform, and the genetic data of the cells are obtained by single-cell transcriptome sequencing (scRNA-seq);

the cell-type database platform is based on the marker genes of mesophyll cells (MPC), meristemoid mother cells (MMC), early meristemoids (EM), late meristemoids (LM), guard mother cells (GMC), young guard cells (YGC), guard cells (GC), and pavement cells (PC), from which an Arabidopsis thaliana reference data platform is constructed, wherein the marker genes of each cell type are as follows:

Mesophyll cells (MPC): RBCS, LHCB

Meristemoid mother cells (MMC): HDG2, POLAR, SPCH, TMM, MUTE, EPF2

Early meristemoids (EM): MUTE, BASL, SPCH, EPF2

Late meristemoids (LM): BASL, MUTE, EPF1

Guard mother cells (GMC): EPF1, HIC, FAMA, SCRM

Young guard cells (YGC): RBCS, FAMA, EPF1

Guard cells (GC): low expression of the RBCS, FAMA, SCRM, and TMM genes

Pavement cells (PC): IQD5, RBCS;

the data analysis and processing platform identifies cell types using the SingleR() function, plots a correlation heatmap of the cell-type identification, counts the most abundant cell types, and outputs the results and plots.

9. The detection method for identifying Arabidopsis thaliana cotyledon cell types based on single-cell sequencing according to claim 8, wherein the cell-type database platform is established as follows:

using single-cell transcriptome sequencing (scRNA-seq), a plurality of marker genes are collected to identify the cell types representing different stages of stomatal development,

and a database platform (single-cell reference dataset) suitable for identifying Arabidopsis thaliana cotyledon cell types is constructed.

10. The detection method for identifying Arabidopsis thaliana cotyledon cell types based on single-cell sequencing according to claim 8, wherein the steps of constructing the database platform (single-cell reference dataset) suitable for identifying Arabidopsis thaliana cotyledon cell types are as follows:

plotting the expression levels of the relevant markers in single cells using the FeaturePlot() and VlnPlot() functions;

plotting clustering heatmaps of the relevant markers' expression in single cells using the pheatmap() function;

and determining the cell-type composition of the Arabidopsis thaliana cotyledon from the expression plots and the clustering heatmap, obtaining the single-cell expression profile of each cotyledon cell type, and constructing a cell-type identification reference dataset.

Technical Field

The invention belongs to the technical field of transcriptome sequencing, and particularly relates to a system and a detection method for identifying Arabidopsis thaliana cotyledon cell types based on single-cell transcriptome sequencing data.

Background

In high-throughput single-cell transcriptome sequencing analysis, cell-type identification is a crucial step. It reveals the heterogeneity of complex cell populations and allows a cell atlas to be constructed. There are currently two approaches to identifying cell types: manual identification based on specific marker genes (marker-based), and identification based on single-cell reference datasets (reference-based). The marker-based manual approach requires researchers to consult a large body of literature to collect markers, which is time-consuming and labor-intensive. In addition, many cell types or subtypes cannot be distinguished well by a few markers. For example, in "Reference-based analysis of lung single-cell sequencing reveals a transitional profibrotic macrophage", the CD27 gene alone could not accurately distinguish naive B cells from memory B cells. For T-cell subtypes, marker genes often differ only in their level of expression (high versus low), so the cell type cannot be determined from a small number of markers. By contrast, identification based on a SingleR reference dataset can distinguish cell subtypes well.

For Arabidopsis thaliana cotyledon samples, there is currently no readily available reference dataset for automatically and rapidly matching and identifying cell types. Manual identification using marker genes alone is time-consuming, labor-intensive, and poorly automated, and it has limited accuracy for similar cell types. It is therefore highly desirable to construct a single-cell reference dataset suitable for identifying Arabidopsis thaliana cotyledon cell types and to establish a computer program for automated cell-type identification.

Disclosure of Invention

In view of the above problems, the present invention aims to overcome these shortcomings of the prior art and to provide an analysis method for rapidly and objectively identifying Arabidopsis thaliana cotyledon cell types based on single-cell transcriptome sequencing data.

The invention provides a system for identifying Arabidopsis thaliana cotyledon cell types based on single-cell sequencing, characterized by comprising the following components: a cell sequencing platform, a cell-type database platform, and a data analysis and processing platform.

The cell sequencing platform is a single-cell transcriptome sequencing platform, and the genetic data of the cells are obtained by single-cell transcriptome sequencing (scRNA-seq).

The cell-type database platform described above is based on the marker genes of mesophyll cells (MPC), meristemoid mother cells (MMC), early meristemoids (EM), late meristemoids (LM), guard mother cells (GMC), young guard cells (YGC), guard cells (GC), and pavement cells (PC), from which an Arabidopsis thaliana reference data platform is constructed. The marker genes of each cell type are as follows:

Mesophyll cells (MPC): RBCS, LHCB

Meristemoid mother cells (MMC): HDG2, POLAR, SPCH, TMM, MUTE, EPF2

Early meristemoids (EM): MUTE, BASL, SPCH, EPF2

Late meristemoids (LM): BASL, MUTE, EPF1

Guard mother cells (GMC): EPF1, HIC, FAMA, SCRM

Young guard cells (YGC): RBCS, FAMA, EPF1

Guard cells (GC): low expression of the RBCS, FAMA, SCRM, and TMM genes

Pavement cells (PC): IQD5, RBCS.

The cell-type database platform described above is established as follows:

using single-cell transcriptome sequencing (scRNA-seq), a plurality of marker genes are collected to identify the cell types representing different stages of stomatal development; the specific cell types and marker genes are as follows:

Mesophyll cells (MPC): RBCS, LHCB

Meristemoid mother cells (MMC): HDG2, POLAR, SPCH, TMM, MUTE, EPF2

Early meristemoids (EM): MUTE, BASL, SPCH, EPF2

Late meristemoids (LM): BASL, MUTE, EPF1

Guard mother cells (GMC): EPF1, HIC, FAMA, SCRM

Young guard cells (YGC): RBCS, FAMA, EPF1

Guard cells (GC): low expression of the RBCS, FAMA, SCRM, and TMM genes

Pavement cells (PC): IQD5, RBCS;

and a database platform (single-cell reference dataset) suitable for identifying Arabidopsis thaliana cotyledon cell types is constructed.

The steps for constructing the database platform (single-cell reference dataset) suitable for identifying Arabidopsis thaliana cotyledon cell types are as follows:

plotting the expression levels of the relevant markers in single cells using the FeaturePlot() and VlnPlot() functions;

plotting clustering heatmaps of the relevant markers' expression in single cells using the pheatmap() function;

and determining the cell-type composition of the Arabidopsis thaliana cotyledon from the expression plots and the clustering heatmap, obtaining the single-cell expression profile of each cotyledon cell type, and constructing a cell-type identification reference dataset.

The data analysis and processing platform described above identifies cell types using the SingleR() function, plots a correlation heatmap of the cell-type identification, counts the most abundant cell types, and outputs the results and plots.

Preferably, the data analysis and processing steps are as follows:

using the cell-type identification reference dataset obtained by the above construction, the SingleR package assigns each group of query data to the corresponding cell type by comparing the ranks of the genes that are significantly up-regulated in the reference dataset. This allows Arabidopsis thaliana cotyledon cell types to be determined quickly in subsequent high-throughput single-cell transcriptome sequencing. The specific operation steps are as follows:

importing the data to be tested;

loading the constructed database platform (single-cell reference dataset) suitable for identifying Arabidopsis thaliana cotyledon cell types;

identifying cell types using the SingleR() function;

plotting a correlation heatmap of the cell-type identification;

counting the most abundant cell types;

and outputting the results and plots.

The invention also provides a detection method for identifying Arabidopsis thaliana cotyledon cell types based on single-cell sequencing, characterized by comprising the following steps:

based on a system for identifying Arabidopsis thaliana cotyledon cell types based on single-cell sequencing, wherein the system comprises: a cell sequencing platform, a cell-type database platform, and a data analysis and processing platform;

the cell sequencing platform is a single-cell transcriptome sequencing platform, and the genetic data of the cells are obtained by single-cell transcriptome sequencing (scRNA-seq);

the cell-type database platform is based on the marker genes of mesophyll cells (MPC), meristemoid mother cells (MMC), early meristemoids (EM), late meristemoids (LM), guard mother cells (GMC), young guard cells (YGC), guard cells (GC), and pavement cells (PC), from which an Arabidopsis thaliana reference data platform is constructed, wherein the marker genes of each cell type are as follows:

Mesophyll cells (MPC): RBCS, LHCB

Meristemoid mother cells (MMC): HDG2, POLAR, SPCH, TMM, MUTE, EPF2

Early meristemoids (EM): MUTE, BASL, SPCH, EPF2

Late meristemoids (LM): BASL, MUTE, EPF1

Guard mother cells (GMC): EPF1, HIC, FAMA, SCRM

Young guard cells (YGC): RBCS, FAMA, EPF1

Guard cells (GC): low expression of the RBCS, FAMA, SCRM, and TMM genes

Pavement cells (PC): IQD5, RBCS;

the data analysis and processing platform identifies cell types using the SingleR() function, plots a correlation heatmap of the cell-type identification, counts the most abundant cell types, and outputs the results and plots.

The cell-type database platform described above is established as follows:

using single-cell transcriptome sequencing (scRNA-seq), a plurality of marker genes are collected to identify the cell types representing different stages of stomatal development, and a database platform (single-cell reference dataset) suitable for identifying Arabidopsis thaliana cotyledon cell types is constructed.

The steps for constructing the database platform (single-cell reference dataset) suitable for identifying Arabidopsis thaliana cotyledon cell types are as follows:

plotting the expression levels of the relevant markers in single cells using the FeaturePlot() and VlnPlot() functions;

plotting clustering heatmaps of the relevant markers' expression in single cells using the pheatmap() function;

and determining the cell-type composition of the Arabidopsis thaliana cotyledon from the expression plots and the clustering heatmap, obtaining the single-cell expression profile of each cotyledon cell type, and constructing a cell-type identification reference dataset.

The technical solution of the invention is further elaborated below:

Using single-cell transcriptome sequencing (scRNA-seq), the method provided by the invention collects a plurality of marker genes reported in the existing literature to identify the cell types representing different stages of stomatal development. It then constructs a single-cell reference dataset suitable for identifying Arabidopsis thaliana cotyledon cell types and establishes a computer program for automated identification. The method specifically comprises the following steps:

1. The expression levels of the relevant markers in individual cells were plotted using the FeaturePlot() and VlnPlot() functions of the Seurat package (v3.0.0).

Mesophyll cells (MPC): RBCS, LHCB

Meristemoid mother cells (MMC): HDG2, POLAR, SPCH, TMM, MUTE, EPF2

Early meristemoids (EM): MUTE, BASL, SPCH, EPF2

Late meristemoids (LM): BASL, MUTE, EPF1

Guard mother cells (GMC): EPF1, HIC, FAMA, SCRM

Young guard cells (YGC): RBCS, FAMA, EPF1

Guard cells (GC): low expression of the RBCS, FAMA, SCRM, and TMM genes

Pavement cells (PC): IQD5, RBCS
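To illustrate the marker-based (manual) approach, the list above can be turned into a simple per-cell score. The following is a hypothetical base-R sketch, not the invention's procedure. It scores each cell by the mean expression of each type's positive markers and picks the highest-scoring type. GC is omitted because it is defined by low expression. Markers shared between types, such as RBCS (MPC, YGC, PC), show why a few markers alone can be ambiguous. The names `markers` and `marker_call` and the toy data are assumptions for illustration.

```r
# Positive marker genes per cell type, transcribed from the list above.
# GC is defined by low expression and is therefore left out of this sketch.
markers <- list(
  MPC = c("RBCS", "LHCB"),
  MMC = c("HDG2", "POLAR", "SPCH", "TMM", "MUTE", "EPF2"),
  EM  = c("MUTE", "BASL", "SPCH", "EPF2"),
  LM  = c("BASL", "MUTE", "EPF1"),
  GMC = c("EPF1", "HIC", "FAMA", "SCRM"),
  YGC = c("RBCS", "FAMA", "EPF1"),
  PC  = c("IQD5", "RBCS")
)

# expr: normalized expression, genes in rows (named), cells in columns.
# Returns, for each cell, the type whose markers have the highest mean expression.
marker_call <- function(expr, markers) {
  scores <- vapply(markers, function(genes) {
    genes <- intersect(genes, rownames(expr))
    if (length(genes) == 0) return(rep(-Inf, ncol(expr)))  # no marker measured
    colMeans(expr[genes, , drop = FALSE])
  }, numeric(ncol(expr)))
  scores <- matrix(scores, nrow = ncol(expr))  # cells x types
  names(markers)[max.col(scores, ties.method = "first")]
}

# Hypothetical toy data: three cells, every marker gene, values 0 or 5
genes <- unique(unlist(markers))
expr <- matrix(0, nrow = length(genes), ncol = 3,
               dimnames = list(genes, c("cell1", "cell2", "cell3")))
expr[c("RBCS", "LHCB"), "cell1"] <- 5
expr[markers$MMC, "cell2"] <- 5
expr[c("IQD5", "RBCS"), "cell3"] <- 5
marker_call(expr, markers)  # returns c("MPC", "MMC", "PC")
```

In cell3, RBCS alone would also support MPC and YGC. The call is decided only because IQD5 is present as well.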

2. A clustering heatmap of the relevant markers' expression in individual cells was plotted using the pheatmap() function of the pheatmap package.

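library(pheatmap)
# topn_markers2vis: numeric matrix of marker-gene expression prepared in an earlier step (not shown in this text)
pdf("heatmap.pdf")
pheatmap(topn_markers2vis,cluster_rows=TRUE,cluster_cols=TRUE,show_rownames=TRUE)
dev.off()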
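pheatmap() orders the rows and columns of the heatmap by hierarchical clustering, which by default uses Euclidean distance and complete linkage. The toy sketch below uses hypothetical values and gene and cell names. It reproduces that grouping with base R's dist() and hclust(), which is how co-expressed markers and similar cells end up as contiguous blocks in the heatmap.

```r
# Hypothetical marker x cell expression matrix
m <- rbind(
  FAMA = c(5, 6, 0, 1),
  SCRM = c(4, 5, 1, 0),
  RBCS = c(0, 1, 6, 5),
  LHCB = c(1, 0, 5, 6)
)
colnames(m) <- paste0("cell", 1:4)

# pheatmap's default clustering: Euclidean distance, complete linkage
row_tree <- hclust(dist(m, method = "euclidean"), method = "complete")     # genes
col_tree <- hclust(dist(t(m), method = "euclidean"), method = "complete")  # cells

# Cut each dendrogram into two groups
gene_groups <- cutree(row_tree, k = 2)
cell_groups <- cutree(col_tree, k = 2)
gene_groups
cell_groups
```

Calling pheatmap(m) should draw the same dendrograms. FAMA and SCRM fall in one gene group and RBCS and LHCB in the other, and cell1/cell2 are separated from cell3/cell4 in the same way.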

3. Based on the expression plots, the gene-expression clustering heatmap, and related evidence, the cell-type composition of the Arabidopsis thaliana cotyledon was determined. The single-cell expression profile of each cotyledon cell type was obtained, and a cell-type identification reference dataset was constructed.

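library(SingleR)
library(Seurat)
library(scater)  # provides logNormCounts() and attaches SingleCellExperiment
library(dplyr)
# Seurat object whose cells are already annotated in the metadata column "celltype"
ref_ob=readRDS("celltype.rds")
# raw count matrix (genes x cells)
ref.m=GetAssayData(ref_ob,assay="RNA",slot="counts")
# keep only the cell-type annotation as column metadata
cell_metadata=ref_ob@meta.data%>%select("celltype")
ref.sce=SingleCellExperiment(assays=list(counts=ref.m),colData=cell_metadata)
# add the log-normalized "logcounts" assay
ref.sce=logNormCounts(ref.sce)
saveRDS(ref.sce,"reference.rds")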
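logNormCounts() adds a "logcounts" assay, which SingleR uses by default. The base-R sketch below shows the default transformation as we understand it. Each cell's counts are divided by a library-size factor centred at 1, and then log2(x + 1) is applied. The function name `log_norm_counts` and the toy matrix are hypothetical.

```r
# Library-size normalization followed by log2(x + 1), mirroring (as we
# understand it) the defaults of scater/scuttle logNormCounts()
log_norm_counts <- function(counts) {
  lib_sizes <- colSums(counts)
  size_factors <- lib_sizes / mean(lib_sizes)     # centred so the mean factor is 1
  log2(sweep(counts, 2, size_factors, "/") + 1)   # scale each cell, add pseudocount 1
}

# Hypothetical counts: cell "b" was sequenced twice as deeply as cell "a"
counts <- matrix(c(10, 0, 30,
                   20, 0, 60),
                 nrow = 3,
                 dimnames = list(c("RBCS", "FAMA", "LHCB"), c("a", "b")))
lc <- log_norm_counts(counts)
lc  # both columns now carry the same values
```

Normalizing first means that a deeper-sequenced cell is not mistaken for a cell with higher expression when it is compared with the reference.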

4. Using the cell-type identification reference dataset obtained above, the SingleR package matches each group of query data to the corresponding cell type by comparing the ranks of the genes that are significantly up-regulated in the reference dataset. This allows Arabidopsis thaliana cotyledon cell types to be determined quickly in subsequent high-throughput single-cell transcriptome sequencing.
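The rank-based matching idea can be sketched in base R. This is a simplified stand-in, not SingleR itself. It averages the reference cells of each type into one profile and assigns each query cell to the type with the highest Spearman (rank) correlation. SingleR additionally restricts the comparison to marker genes selected from the reference and scores each label by a quantile of the correlations with individual reference cells. It then fine-tunes ambiguous calls. All function names and data below are hypothetical.

```r
# query: genes x query cells; ref: genes x reference cells (log-expression)
# ref_labels: cell type of each reference cell
annotate_by_rank_correlation <- function(query, ref, ref_labels) {
  genes <- intersect(rownames(query), rownames(ref))
  # one mean profile per reference cell type (genes x types)
  profiles <- sapply(split(seq_along(ref_labels), ref_labels),
                     function(idx) rowMeans(ref[genes, idx, drop = FALSE]))
  # Spearman correlation compares gene ranks, not absolute levels (cells x types)
  scores <- cor(query[genes, , drop = FALSE], profiles, method = "spearman")
  colnames(scores)[max.col(scores, ties.method = "first")]
}

# Hypothetical reference with two types, A and B
genes <- paste0("g", 1:6)
ref <- cbind(a1 = c(5, 4, 3, 0, 0, 1), a2 = c(6, 4, 2, 1, 0, 0),
             b1 = c(0, 1, 0, 5, 6, 4), b2 = c(1, 0, 0, 6, 5, 4))
rownames(ref) <- genes
ref_labels <- c("A", "A", "B", "B")

# Hypothetical query cells
query <- cbind(q1 = c(4, 5, 3, 0, 1, 0), q2 = c(0, 0, 1, 4, 5, 6))
rownames(query) <- genes
annotate_by_rank_correlation(query, ref, ref_labels)  # returns c("A", "B")
```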

In conclusion, the beneficial effects of the invention are as follows. Based on single-cell transcriptome sequencing data, the reference dataset and the automated identification workflow allow Arabidopsis thaliana cotyledon cell types to be annotated quickly: about 10,000 cells can be identified in about 10 minutes. The innovation lies not in the use of SingleR itself, but in using SingleR to construct, for the first time, a reference dataset of Arabidopsis thaliana cotyledon cell types. Subsequent researchers can then quickly identify Arabidopsis thaliana cotyledon cell types in their single-cell sequencing results.

Drawings

FIG. 1 is a violin plot of marker-gene expression, with the cell cluster number on the abscissa and the normalized gene expression value on the ordinate;

FIG. 2 is a FeaturePlot of marker-gene expression;

FIG. 3 is a schematic flow diagram of the automated identification of Arabidopsis thaliana cotyledon cell types in single-cell sequencing data;

FIG. 4 shows the cell-type identification results obtained with the automated procedure.

Detailed Description

The invention is described in further detail below with reference to specific examples and the accompanying drawings. Except for the contents specifically mentioned below, the procedures, conditions, and experimental methods used to carry out the invention are common general knowledge in the art, and the invention is not particularly limited in these respects.

Example 1: Manual identification

First, marker genes were collected from a large body of literature. A clustering heatmap of their expression and plots of their expression in single cells (FeaturePlot) were drawn, and from these the cell types representing different stages of stomatal development in Arabidopsis cotyledons were identified manually. The marker genes used were as follows:

Mesophyll cells (MPC): RBCS, LHCB

Meristemoid mother cells (MMC): HDG2, POLAR, SPCH, TMM, MUTE, EPF2

Early meristemoids (EM): MUTE, BASL, SPCH, EPF2

Late meristemoids (LM): BASL, MUTE, EPF1

Guard mother cells (GMC): EPF1, HIC, FAMA, SCRM

Young guard cells (YGC): RBCS, FAMA, EPF1

Guard cells (GC): low expression of the RBCS, FAMA, SCRM, and TMM genes

Pavement cells (PC): IQD5, RBCS

The expression level of the gene in a single cell was plotted using the following code:

Example 2: Identification based on the SingleR reference dataset

Based on the identified Arabidopsis thaliana cotyledon cell types, a reference dataset was constructed from the expression profile of each cell type. This dataset is used to determine Arabidopsis thaliana cotyledon cell types quickly in high-throughput single-cell transcriptome sequencing. The specific operation steps are as follows:

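Step 1: import the data to be tested;

library(Seurat)
library(scater)  # provides logNormCounts() and attaches SingleCellExperiment
# clustered Seurat object of the sample to be annotated
seurat_ob=readRDS("seurat_ob.rds")
query.m=GetAssayData(seurat_ob,assay="RNA",slot="counts")
query.sce=SingleCellExperiment(assays=list(counts=query.m))
# log-normalize the query in the same way as the reference
query.sce=logNormCounts(query.sce)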

Step 2: load the constructed Arabidopsis reference dataset;

ref.sce=readRDS("reference.rds")

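Step 3: identify cell types using the SingleR() function;

library(SingleR)
library(BiocParallel)  # provides MulticoreParam()
# score every query cell against the reference labels, using 10 worker processes
pred=SingleR(test=query.sce,ref=ref.sce,labels=factor(ref.sce$celltype),BPPARAM=MulticoreParam(workers=10))
saveRDS(pred,"singleR.rds")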

Step 4: plot a correlation heatmap of the cell-type identification;

Step 5: count the most abundant cell type in each cluster;

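library(dplyr)
seurat_ob=SetIdent(seurat_ob,value="clusters")
# main_celltyping_stat is not defined in the original text; this construction is an assumption:
# number of cells for each (cluster, SingleR label) pair
main_celltyping_stat=data.frame(clusters=Idents(seurat_ob),raw_celltype=pred$labels)%>%count(clusters,raw_celltype,name="cell_num")
# keep the label with the most cells in each cluster (exactly one row per cluster, even on ties)
top_celltype=main_celltyping_stat%>%group_by(clusters)%>%slice_max(cell_num,n=1,with_ties=FALSE)
write.table(top_celltype,quote=FALSE,"top_celltyping_statistics.xls",sep="\t",row.names=FALSE)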
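Step 5 amounts to cross-tabulating clusters against SingleR labels and taking the largest count in each cluster. The following is a dependency-free base-R sketch with hypothetical cluster IDs and labels. Here, ties go to the alphabetically first label. The function name `top_label_per_cluster` is an assumption.

```r
# Hypothetical per-cell cluster IDs and SingleR labels
clusters <- c("6", "6", "6", "11", "11", "11", "11")
labels   <- c("GMC", "GMC", "YGC", "YGC", "YGC", "GMC", "YGC")

top_label_per_cluster <- function(clusters, labels) {
  counts <- unclass(table(clusters, labels))        # cells per (cluster, label)
  best <- max.col(counts, ties.method = "first")    # column of the largest count
  data.frame(clusters     = rownames(counts),
             raw_celltype = colnames(counts)[best],
             cell_num     = counts[cbind(seq_len(nrow(counts)), best)],
             stringsAsFactors = FALSE)
}

top_celltype <- top_label_per_cluster(clusters, labels)
top_celltype
```

In this toy example, cluster 6 is assigned GMC (2 of 3 cells) and cluster 11 is assigned YGC (3 of 4 cells).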

Step 6: output the annotation result of each cluster and plot it.

library(ggplot2)  # provides theme() and element_text()
from.id=as.vector(top_celltype$clusters)
to.id=as.vector(top_celltype$raw_celltype)
# replace each cluster ID with the most abundant SingleR label of that cluster
seurat_ob=SetIdent(seurat_ob,value=plyr::mapvalues(x=Idents(seurat_ob),from=from.id,to=to.id))
# store the new identities in the metadata column "celltype"
seurat_ob$celltype=Idents(seurat_ob)
ggtsne2=DimPlot(object=seurat_ob,reduction="tsne",pt.size=1)+theme(plot.title=element_text(hjust=0.5))
ggsave("celltyping.pdf",plot=ggtsne2)
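The cluster-to-label replacement that plyr::mapvalues() performs in step 6 can also be written with base R's match(). In this hypothetical sketch, clusters that are missing from the lookup table keep their original ID, which matches how mapvalues() treats unmatched values. The names `map_clusters`, `idents` and `top` are assumptions.

```r
# Hypothetical cluster IDs for six cells and a cluster-to-label lookup table
idents <- factor(c("6", "6", "11", "11", "3", "3"))
top <- data.frame(clusters = c("3", "6", "11"),
                  raw_celltype = c("MPC", "GMC", "YGC"),
                  stringsAsFactors = FALSE)

# Look up each cell's cluster in the table; unmatched clusters keep their ID
map_clusters <- function(idents, from, to) {
  ids <- as.character(idents)
  hit <- match(ids, from)
  ids[!is.na(hit)] <- as.character(to)[hit[!is.na(hit)]]
  ids
}

celltype <- map_clusters(idents, top$clusters, top$raw_celltype)
celltype  # "GMC" "GMC" "YGC" "YGC" "MPC" "MPC"
```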

Results and analysis:

The SCRM gene is one of the marker genes of guard mother cells (GMC). The violin plot and FeaturePlot of its expression in single cells (FIG. 1 and FIG. 2) show that it is expressed in both cluster 6 and cluster 11, differing only in expression level. A small number of markers therefore cannot reliably distinguish similar cell populations. Manual identification would require consulting a large body of literature to find additional marker genes, which is time-consuming and labor-intensive.

With the reference dataset and automated program constructed in the invention, the cell types representing different stages of stomatal development in Arabidopsis cotyledons can be obtained quickly, simply by inputting the data to be identified (FIG. 3 shows the workflow and FIG. 4 the results). About 10,000 cells can be identified in about 10 minutes, which greatly reduces labor cost. Two similar cell types, guard mother cells (GMC) and young guard cells (YGC), corresponding to the cells of clusters 6 and 11, are also well distinguished, which ensures annotation accuracy.

The foregoing description is a general description of the invention. Although specific terms are used herein, they are used in a generic and descriptive sense only and not for purposes of limitation; changes in form and substitutions of equivalents may be made. Various changes or modifications may be made by those skilled in the art, and equivalents may be substituted, without departing from the scope of the invention as defined in the appended claims.
