Method for evaluating quality of pluripotent stem cells

Document No.: 1848110  Publication date: 2021-11-16

Abstract: Method for evaluating quality of pluripotent stem cells, designed by 王淋立, 卢宇鹏, 陈月花, 李强, 杨建国 and 李乾坤 on 2021-07-22. The invention discloses a method for evaluating the quality of pluripotent stem cells, comprising the following steps: sequencing the cells to be detected to obtain transcriptome expression lineage data and whole genome sequence data; performing gene expression statistics based on the transcriptome expression lineage data to obtain cell quality score 1; performing whole genome DNA variation analysis based on the whole genome sequencing data; and obtaining cell quality score 2 from cell quality score 1 and the result of the whole genome DNA variation analysis, so as to evaluate the quality of the pluripotent stem cells. The method evaluates cell quality by integrating multiple aspects. Because it relies on sequencing, it removes a large number of experimental procedures: the quality of the pluripotent stem cells is evaluated from transcriptome sequencing and whole genome sequencing alone, which shortens the evaluation time, and the resulting data are processed statistically to give reliable and accurate results.

1. A method of assessing the quality of pluripotent stem cells, comprising the steps of:

S1, sequencing the cells to be detected to obtain transcriptome expression lineage data and whole genome sequence data;

S2, performing gene expression statistics based on the transcriptome expression lineage data of the cells to be detected; performing pluripotency analysis, singularity analysis and cell characterization value analysis on the gene expression statistics; then integrating the results of the three analyses to obtain cell quality score 1; and performing whole genome DNA variation analysis based on the whole genome sequence data of the cells to be detected;

S3, obtaining cell quality score 2 from cell quality score 1 obtained in step S2 and the result of the whole genome DNA variation analysis, and evaluating the quality of the pluripotent stem cells accordingly.

2. The method of claim 1, wherein the gene expression statistics in step S2 are preceded by the steps of performing data analysis and quality control on the transcriptome expression lineage data and aligning the resulting data to a reference genome.

3. The method of claim 2, wherein the reference genome is a human reference genome.

4. The method according to claim 1, wherein in step S2, the gene expression levels are counted using either htseq-count or featureCounts software.

5. The method of claim 1, wherein the results of the cell pluripotency analysis, the singularity analysis and the cell characterization value analysis are integrated using the following formulas:

$C = \sum_{i} F_i$ (formula III);

wherein, in formula I, $F_i$ is the score for the pluripotency aspect of the cell quality evaluation, $X_i$ is the value calculated for a given aspect of the cell quality evaluation, $X_{\max}$ is the maximum threshold designed for that aspect, $X_{\min}$ is the minimum threshold designed for that aspect, and $q$ is the number of cell quality evaluation aspects; in formula II, $F_i$ is the score for either the singularity aspect or the cell characterization value aspect of the cell quality evaluation, and $X_i$, $X_{\max}$, $X_{\min}$ and $q$ have the same meanings as in formula I; in formula III, $C$ is cell quality score 1.

6. The method according to claim 1, wherein obtaining the cell characterization value in step S2 comprises obtaining the expression levels of the relevant cell genetic stability genes and cell functional genes from the transcriptome expression lineage data, and calculating the baseline deviation value of those expression levels from the expression levels of the same genes in an embryonic stem cell sample.

7. The method according to claim 6, wherein the cell functional genes include embryonic stem cell characterization genes and iPS cell characterization genes; the embryonic stem cell characterization genes comprise SOX2, OCT4, NANOG, SSEA-4, TRA-1-60, TRA-1-81 and SSEA-1; the iPS cell characterization genes comprise SSEA3, SSEA4, TRA-1-60, TRA-1-81, OCT4 and NANOG.

8. The method of claim 6, wherein the cell genetic stability genes comprise TERT, TET1, TET3, Sirt1, CHK1, Oct4-endo, OCT4, Nanog, and P53.

9. The method according to claim 1, wherein the processing of the whole genome data in step S3 is preceded by the steps of performing data analysis and quality control on the whole genome data and aligning the resulting data to a reference genome.

10. The method according to claim 1, wherein the whole genome DNA variation analysis in step S3 comprises point mutation analysis, indel analysis, copy number variation analysis, and large-fragment chromosomal variation analysis.

Technical Field

The invention belongs to the technical field of bioinformatics, and particularly relates to a method for evaluating the quality of pluripotent stem cells.

Background

With the rapid growth in the use of human pluripotent stem cells, particularly induced pluripotent stem cells (iPSCs), a reliable method for evaluating the quality of existing pluripotent stem cells is urgently needed. Currently, the mainstream approach evaluates cell quality one aspect at a time: cell morphology, pluripotency verification, karyotype detection, genetic stability marker detection, and so on. This approach requires several resources and techniques in combination, such as qualified cell culture personnel, animal experiments (mouse teratoma experiments), tissue section staining and qPCR (fluorescent quantitative PCR). Besides the long waiting time, a full round of evaluation consumes a large amount of manpower and materials. Moreover, verifying cell quality through these mainstream experiments is highly subjective and arbitrary, so it cannot form an accurate scientific system, which seriously hinders the research and clinical development of pluripotent stem cells.

Related research shows that the genetic stability of pluripotent stem cells can be verified by qPCR. Molecular markers on the surface of hESCs can define the function of the cells. The fluorescence signal of the relevant genes effectively demonstrates the genetic stability of pluripotent stem cells and can serve as one of the evaluation criteria for cell quality. In addition, gene expression levels obtained by transcriptome sequencing correlate highly with qPCR results, and the two rest on essentially the same principle.

Disclosure of Invention

The present invention is directed to solving at least one of the problems of the prior art described above. Therefore, the invention provides a method for evaluating the quality of the pluripotent stem cells, which can quickly evaluate the quality of clinical-grade cells.

According to one aspect of the present invention, there is provided a method of assessing the quality of pluripotent stem cells, the method comprising the steps of:

S1, sequencing the cells to be detected to obtain transcriptome expression lineage data and whole genome sequence data;

S2, performing gene expression statistics based on the transcriptome expression lineage data of the cells to be detected; performing pluripotency analysis, singularity analysis and cell characterization value analysis on the gene expression statistics; then integrating the results of the three analyses to obtain cell quality score 1; and performing whole genome DNA variation analysis based on the whole genome sequence data of the cells to be detected;

S3, obtaining cell quality score 2 from cell quality score 1 obtained in step S2 and the result of the whole genome DNA variation analysis, and evaluating the quality of the pluripotent stem cells accordingly.

In some embodiments of the invention, the pluripotent stem cells comprise embryonic stem cells and induced pluripotent stem cells.

In some embodiments of the invention, in step S1, the sequencing comprises one of transcriptome sequencing, whole genome sequencing, targeted sequencing, multiplex PCR sequencing, gene chips and qPCR.

In some embodiments of the invention, the transcriptome expression lineage data comprises transcriptome sequencing data.

In some embodiments of the present invention, the gene expression statistics in step S2 are preceded by the steps of performing data analysis and quality control on the transcriptome expression lineage data and aligning the resulting data to the reference genome.

In some embodiments of the invention, the data analysis and quality control comprises filtering low quality data using fastp software.

In some embodiments of the invention, the gene expression statistics further comprise filtering low-quality reads out of the original paired-end RNA-Seq reads using fastp software, assembling the paired-end reads and aligning them to the reference genome, and then counting gene expression; preferably, the low-quality data are filtered using fastp software, and the reads are assembled and aligned to the reference genome using HISAT2 software with default parameters.

In some embodiments of the invention, there is provided a method of assessing the quality of multiple groups of pluripotent stem cells, the method comprising the steps of:

(1) sequencing the multiple groups of cells to be detected to obtain transcriptome expression lineage data;

(2) performing gene expression statistics based on the transcriptome expression lineage data of the cells to be detected; performing pluripotency analysis, singularity analysis and cell characterization value analysis on the gene expression statistics; then integrating the results of the three analyses to obtain cell quality score 1, and ranking the cells;

(3) selecting the cells ranked in the top 10% for whole genome sequencing to obtain whole genome sequence data, and performing whole genome DNA variation analysis based on the whole genome sequence data;

(4) obtaining cell quality score 2 from cell quality score 1 obtained in step (2) and the whole genome DNA variation analysis result obtained in step (3), so as to evaluate the quality of the pluripotent stem cells; "multiple groups" means more than 10 groups of pluripotent stem cells.
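The ranking and selection in steps (2) and (3) can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `select_top_percent`, the rounding rule (round up, keep at least one group) and the example scores are assumptions made here for the sake of a runnable example.

```python
import math

def select_top_percent(quality_score_1, percent=10):
    """Rank groups by quality score 1 (highest first) and keep the top percent.

    Rounding up and keeping at least one group are assumptions; the source
    only says that the top 10% are selected.
    """
    ranked = sorted(quality_score_1, key=quality_score_1.get, reverse=True)
    keep = max(1, math.ceil(len(ranked) * percent / 100))
    return ranked[:keep]

# Hypothetical scores for 20 groups named C1..C20 (illustrative values only).
example_scores = {f"C{i}": round(0.30 + 0.03 * i, 2) for i in range(1, 21)}
selected_for_wgs = select_top_percent(example_scores)
print(selected_for_wgs)
```

Only the groups returned by the function would go on to whole genome sequencing, which is where the cost saving described in the method comes from.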

In some embodiments of the invention, the reference genome is a human reference genome.

In some embodiments of the invention, the reference genome is the human reference genome GRCh37d5.

In some embodiments of the present invention, the gene expression levels are counted using either htseq-count or featureCounts software.

In some embodiments of the invention, the gene expression levels (raw counts) are counted using htseq-count (v0.13.5) software with default parameters.

In some embodiments of the invention, the pluripotency analysis comprises the steps of: transforming the raw count data from the transcriptomes of pluripotent stem cells and non-pluripotent stem cells using TDM software in the R statistical language, so that sample data from different platforms can be integrated and the raw counts become consistent with microarray chip data; batch-correcting all data using lumi (v2.42.0) software; performing NMF (non-negative matrix factorization) on the resulting data matrix using NMF software; extracting the features that separate pluripotent from non-pluripotent cells in the data matrix with a logistic regression machine learning model; tuning the parameters according to the accuracy on the training set and the validation set; determining the formula of the classification model; and calculating the pluripotency score of unknown cells.

In some embodiments of the invention, the singularity analysis comprises the steps of: transforming the raw count data obtained from ESC (embryonic stem cell) transcriptomes using TDM software in the R statistical language, so that sample data from different platforms can be integrated and the raw counts become consistent with microarray chip data; batch-correcting all data using lumi (v2.42.0) software; performing NMF (non-negative matrix factorization) on the resulting data matrix using NMF software; and comparing the residuals of the decomposed V matrices of the unknown sample and the embryonic stem cell samples by root mean square error (RMSE), the resulting single value being used as the singularity score.

In some embodiments of the present invention, obtaining the cell characterization value comprises obtaining the expression levels of the relevant cell genetic stability genes and cell functional genes from the transcriptome expression lineage data, and calculating their baseline deviation value from an embryonic stem cell sample using the R statistical language.

In some embodiments of the invention, the cell functional genes include embryonic stem cell characterization genes and iPS cell characterization genes; the embryonic stem cell characterization genes comprise SOX2, OCT4, NANOG, SSEA-4, TRA-1-60, TRA-1-81 and SSEA-1; the iPS cell characterization genes comprise SSEA3, SSEA4, TRA-1-60, TRA-1-81, OCT4 and NANOG.

In some embodiments of the invention, the cell genetic stability genes include TERT, TET1, TET3, Sirt1, CHK1, Oct4-endo, OCT4, Nanog, and P53.

In some embodiments of the invention, the results of the cell pluripotency analysis, the singularity analysis and the cell characterization value analysis are integrated using the following formulas:

$C = \sum_{i} F_i$ (formula III);

wherein, in formula I, $F_i$ is the score for the pluripotency aspect of the cell quality evaluation, $X_i$ is the value calculated for a given aspect of the cell quality evaluation, $X_{\max}$ is the maximum threshold designed for that aspect, $X_{\min}$ is the minimum threshold designed for that aspect, and $q$ is the number of cell quality evaluation aspects; in formula II, $F_i$ is the score for either the singularity aspect or the cell characterization value aspect of the cell quality evaluation, and $X_i$, $X_{\max}$, $X_{\min}$ and $q$ have the same meanings as in formula I; in formula III, $C$ is cell quality score 1.

In some embodiments of the invention, the cell characterization value analysis comprises the steps of: obtaining the expression levels of the relevant cell genetic stability genes and cell functional genes from the transcriptome expression lineage data, and calculating the baseline deviation value of those expression levels from the expression levels of the same genes in an embryonic stem cell sample.

In some embodiments of the present invention, the baseline deviation value of the gene expression levels from the embryonic stem cell sample is calculated using the R statistical language.

In some embodiments of the present invention, the processing of the whole genome data in step S3 is preceded by the steps of performing data analysis and quality control on the whole genome data and aligning the resulting data to the reference genome.

In some embodiments of the invention, the data analysis and quality control comprises filtering low quality data using fastp software.

In some embodiments of the invention, the whole genome data processing further comprises processing and cleaning the raw paired-end reads (FASTQ) using fastp software, assembling the paired-end reads and aligning them to the reference genome, and analyzing whole genome DNA variation; preferably, the reads are processed and cleaned using fastp software and are assembled and aligned to the reference genome using BWA software with default parameters, after which whole genome DNA variation is analyzed.

In some embodiments of the invention, the reference genome is a human reference genome.

In some embodiments of the invention, the reference genome is the human reference genome GRCh37d5.

In some embodiments of the invention, in step S3, the whole genome DNA variation analysis includes point mutation analysis, indel analysis, copy number variation analysis, and large-fragment chromosomal variation analysis.

In some embodiments of the invention, in step S3, the whole genome DNA variation analysis employs GATK4 software and CNVnator software.

In some embodiments of the invention, the whole genome DNA variation analysis uses the HaplotypeCaller tool of GATK4 to call SNPs/indels on the processed BAM files, and the resulting VCF files are annotated at the disease level using a professional database.

In some embodiments of the invention, the whole genome DNA variation analysis uses CNVnator software to perform a sliding-window analysis on the sorted BAM files, and the resulting VCF files are likewise annotated at the disease level using a professional database.

In some embodiments of the invention, the whole genome DNA variation analysis comprises analyzing, on the whole genome sequence data, the total number of cellular DNA mutations, the number of genetic diseases, the number of possibly harmful variations and the number of benign variations, and then integrating the four results to obtain a genetic variation score for the cell.

In some embodiments of the invention, the results of genome-wide DNA variation analysis are calculated using the following formula:

$V = \sum_{i} F_i$ (formula V).

Wherein, in formula IV, $F_i$ is the score for one variation factor in the evaluation of cell variation, $X_i$ is the number of variations under that variation factor, $X_{\max}$ is the maximum for each variation factor and $X_{\min}$ is the minimum for each variation factor; in formula V, $V$ is the genetic variation score.

In some embodiments of the invention, the variation factors include the total number of mutations, the number of genetic diseases, the number of possibly harmful variations and the number of benign variations; $X_{\max}$ is the maximum of the variation numbers $X_i$ across the groups of pluripotent stem cells under each variation factor; $X_{\min}$ is the minimum of the variation numbers $X_i$ across the groups of pluripotent stem cells under each variation factor.

In some embodiments of the invention, the method further comprises a step of culturing the cells before sequencing, wherein the cells to be detected are cultured in Essential 8™ medium.

In some embodiments of the invention, the chromosomal variation comprises a deletion, duplication, translocation, inversion of a chromosome.

A method of constructing a pluripotent stem cell database, the method comprising the steps of: constructing a pluripotent stem cell database by integrating transcriptome data of different pluripotent and non-pluripotent stem cells.

A method of constructing a pluripotent stem cell database, the method comprising the steps of: transforming the raw count data from the transcriptomes of pluripotent and non-pluripotent stem cells in the R statistical language, so that sample data from different sequencing platforms are integrated and the raw counts become consistent with microarray chip data; and batch-correcting all data using lumi software to construct the pluripotent stem cell database.

According to the embodiments of the invention, at least the following beneficial effects are achieved: bioinformatics analysis is performed on the transcriptome data and the whole genome data, the results of the two analyses are integrated, and the cell quality score is calculated rapidly. The method is simple, and it is the first to evaluate cell quality comprehensively and numerically from both cell function and cell genome variation information, so that the quality of induced pluripotent stem cells (iPSCs) or embryonic stem cells (ESCs) can be screened effectively. In addition, sequencing removes a large number of experimental procedures: the quality of the pluripotent stem cells is evaluated from transcriptome sequencing and whole genome sequencing alone, which shortens the evaluation time, and the resulting data are processed statistically to give reliable and accurate results.

Drawings

The invention is further described with reference to the following figures and examples, in which:

FIG. 1 is a flow chart of the bioinformatic analysis of transcriptome expression lineage data in example 1 of the present invention;

FIG. 2 is a flowchart of the genome-wide data bioinformatics analysis in example 1 of the present invention;

FIG. 3 is a graph showing the analysis of the pluripotency of cells in example 1 of the present invention;

FIG. 4 is a flowchart of the method for evaluating the quality of pluripotent stem cells in example 1 of the present invention.

Detailed Description

The concept and technical effects of the present invention will be clearly and completely described below in conjunction with the embodiments to fully understand the objects, features and effects of the present invention. It is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all embodiments, and those skilled in the art can obtain other embodiments without inventive effort based on the embodiments of the present invention, and all embodiments are within the protection scope of the present invention.

Cell culture medium: Essential 8™ medium (Thermo Fisher Scientific). Sequencing platform: MGISEQ-2000 (purchased from Huada Gene (BGI)) with its matching whole genome and transcriptome library construction reagents. Total RNA and DNA were extracted with a total RNA extraction kit and the PureLink™ kit (Thermo Fisher Scientific), respectively.

Example 1

The present example provides a method of assessing the quality of pluripotent stem cells, the method comprising the steps of:

1. The cells to be detected are cultured in Essential 8™ medium. RNA and DNA are extracted with a total RNA extraction kit and a DNA extraction kit (PureLink™ kit), respectively, and the extracted DNA and RNA are sequenced on the MGISEQ-2000 platform using the matching Huada Gene whole genome and transcriptome library construction reagents, yielding transcriptome expression lineage data and whole genome sequence data. (If multiple groups of pluripotent stem cells are evaluated, RNA is extracted and sequenced first; the resulting transcriptome expression lineage data are analyzed, the groups are ranked, and the top 10% of the ranked cells are selected for whole genome sequencing. "Multiple groups" means more than 10 groups of pluripotent stem cells.)

2. Analysis of the transcriptome expression lineage data obtained in step 1: the original paired-end RNA-Seq reads (FASTQ) are processed and filtered using fastp software. The paired-end reads are assembled and aligned to the human reference genome (GRCh37d5) using HISAT2 software with default parameters. Gene expression levels (raw counts) are counted using htseq-count (v0.13.5) software with default parameters. The bioinformatics analysis flow for the transcriptome expression lineage data is shown in FIG. 1.
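The three tools named in step 2 can be chained as command lines. The sketch below only builds and prints the commands; it does not run them. The file names, the HISAT2 index prefix and the GTF annotation file are hypothetical placeholders, and the function name `rnaseq_steps` is introduced here for illustration. htseq-count writes its table to standard output, so the sketch pairs each command with an optional output file.

```python
import shlex

def rnaseq_steps(sample, r1, r2, hisat2_index, gtf):
    """Return (argv, stdout_file) pairs for fastp -> HISAT2 -> htseq-count.

    stdout_file is None when the tool writes its own output files.
    All paths are placeholders; tool options beyond input/output are left
    at their defaults, as in the text.
    """
    clean1 = f"{sample}_1.clean.fq.gz"
    clean2 = f"{sample}_2.clean.fq.gz"
    sam = f"{sample}.sam"
    return [
        # Filter low-quality paired-end reads.
        (["fastp", "-i", r1, "-I", r2, "-o", clean1, "-O", clean2], None),
        # Align the cleaned pairs to a prebuilt HISAT2 index of the reference.
        (["hisat2", "-x", hisat2_index, "-1", clean1, "-2", clean2, "-S", sam], None),
        # Count reads per gene; the raw counts go to standard output.
        (["htseq-count", sam, gtf], f"{sample}.counts.txt"),
    ]

for argv, stdout_file in rnaseq_steps(
    "C1", "C1_1.fq.gz", "C1_2.fq.gz", "GRCh37d5_index", "genes.gtf"
):
    line = shlex.join(argv)
    print(line if stdout_file is None else f"{line} > {stdout_file}")
```

Keeping the commands as argument lists makes it straightforward to hand them to `subprocess.run` later without shell quoting problems.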

3. Analysis of the whole genome data obtained in step 1: the raw paired-end reads (FASTQ) are processed and cleaned using fastp software, then assembled and aligned to the human reference genome (GRCh37d5) using BWA software (v0.7.1) with default parameters, producing SAM files. The SAM files are converted with samtools (v1.12) software into BAM files, which are more convenient to store, and the reads are sorted by genomic position. For whole genome DNA variation analysis, the HaplotypeCaller tool of GATK4 is used to call SNPs/indels on the processed BAM files, and the resulting VCF files are annotated at the disease level using a professional database. For CNV (copy number variation) analysis, CNVnator software performs a sliding-window analysis on the sorted BAM files, and the resulting VCF files are likewise annotated at the disease level using a professional database. The bioinformatics analysis flow for the whole genome data is shown in FIG. 2.
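The SNP/indel branch of step 3 can be sketched as a command chain in the same way. This is an illustration under stated assumptions: the file names are placeholders, the reference is assumed to be already indexed for BWA and GATK, and the read-group string (which HaplotypeCaller requires in the BAM header) uses placeholder ID and sample values. The CNVnator branch and the database annotation are omitted because the text does not give their settings.

```python
import shlex

def wgs_steps(sample, r1, r2, reference_fasta):
    """Return (argv, stdout_file) pairs for fastp -> BWA -> samtools -> GATK4.

    stdout_file is None when the tool writes its own output files.
    """
    clean1 = f"{sample}_1.clean.fq.gz"
    clean2 = f"{sample}_2.clean.fq.gz"
    sam = f"{sample}.sam"
    bam = f"{sample}.sorted.bam"
    vcf = f"{sample}.vcf.gz"
    # Placeholder read group; bwa expands the literal "\t" sequences itself.
    read_group = f"@RG\\tID:{sample}\\tSM:{sample}"
    return [
        (["fastp", "-i", r1, "-I", r2, "-o", clean1, "-O", clean2], None),
        # bwa mem writes SAM to standard output.
        (["bwa", "mem", "-R", read_group, reference_fasta, clean1, clean2], sam),
        # Convert to BAM and sort by genomic position, then index.
        (["samtools", "sort", "-o", bam, sam], None),
        (["samtools", "index", bam], None),
        # Call SNPs/indels on the sorted BAM.
        (["gatk", "HaplotypeCaller", "-R", reference_fasta, "-I", bam, "-O", vcf], None),
    ]

for argv, stdout_file in wgs_steps("C20", "C20_1.fq.gz", "C20_2.fq.gz", "GRCh37d5.fa"):
    line = shlex.join(argv)
    print(line if stdout_file is None else f"{line} > {stdout_file}")
```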

4. Quality analysis of cells

(1) Analysis of pluripotency

The raw count data from the transcriptomes of pluripotent stem cells and non-pluripotent stem cells are transformed in the R statistical language using TDM software (Thompson, 2017); the purpose is to integrate sample data from different sequencing platforms by making the raw counts consistent with microarray chip data. All data are batch-corrected using lumi (v2.42.0) software, and the resulting data matrix is factorized by NMF (non-negative matrix factorization) using NMF software. A logistic regression machine learning model extracts the features that separate pluripotent from non-pluripotent cells, and its parameters are tuned according to the accuracy on the training set and the validation set. On the current data set, the logistic regression model gives a highest pluripotency threshold of 94.21 and a lowest threshold of 23.344 (shown in FIG. 3): cells scoring above 23.344 are all pluripotent, while cells scoring below this threshold, down to negative values, are non-pluripotent. With the classification model determined, the pluripotency score of unknown cells is calculated.
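The classification step can be illustrated with a toy logistic regression. The sketch assumes that each sample has already been reduced to a short vector of NMF features; the feature values, learning rate and epoch count are invented for illustration, and the resulting scores are on an arbitrary scale that does not reproduce the 23.344 and 94.21 thresholds reported above. The source performs this step in R; Python is used here only to keep the example self-contained.

```python
import math

def train_logistic(features, labels, learning_rate=0.1, epochs=2000):
    """Fit weights and bias by stochastic gradient descent on the log loss."""
    weights = [0.0] * len(features[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            z = bias + sum(w * v for w, v in zip(weights, x))
            p = 1.0 / (1.0 + math.exp(-z))
            error = p - y  # gradient of the log loss with respect to z
            weights = [w - learning_rate * error * v for w, v in zip(weights, x)]
            bias -= learning_rate * error
    return weights, bias

def pluripotency_score(x, weights, bias):
    """Linear (log-odds) score; higher values favour the pluripotent class."""
    return bias + sum(w * v for w, v in zip(weights, x))

# Hypothetical two-component NMF features: label 1 = pluripotent, 0 = non-pluripotent.
train_x = [[0.90, 0.10], [0.80, 0.20], [0.85, 0.15],
           [0.10, 0.90], [0.20, 0.70], [0.15, 0.80]]
train_y = [1, 1, 1, 0, 0, 0]
weights, bias = train_logistic(train_x, train_y)
print(pluripotency_score([0.88, 0.12], weights, bias) > 0)
```

In practice the decision threshold would be chosen from the training and validation accuracy, as the text describes, rather than fixed at zero.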

(2) Analysis of singularities

The raw count data obtained from known ESC (embryonic stem cell) transcriptomes are transformed in the R statistical language using TDM software (Thompson, 2017), which makes the raw counts consistent with microarray chip data so that sample data from different platforms can be integrated. All data are batch-corrected using lumi (v2.42.0) software, and the resulting data matrix is factorized by NMF (non-negative matrix factorization) using NMF software. The residuals of the decomposed V matrices of the unknown sample and the embryonic stem cell samples (high-quality embryonic stem cell samples from the data set constructed in step (1)) are compared by root mean square error (RMSE), and the resulting single value is used as the singularity score.
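One way to read the residual comparison is sketched below: a sample's expression vector is projected onto a non-negative basis learned from ESC samples, and the RMSE of what the basis cannot explain is reported. This interpretation, the basis matrix `W`, the multiplicative-update projection and the example vectors are all assumptions made for illustration; the source does not specify how the residuals are formed.

```python
import math

def project_onto_basis(v, W, iterations=500):
    """Non-negative coefficients h with v ~ W.h, via multiplicative updates."""
    n_genes, k = len(W), len(W[0])
    h = [1.0] * k
    eps = 1e-12  # avoids division by zero
    for _ in range(iterations):
        approx = [sum(W[g][j] * h[j] for j in range(k)) for g in range(n_genes)]
        for j in range(k):
            numer = sum(W[g][j] * v[g] for g in range(n_genes))
            denom = sum(W[g][j] * approx[g] for g in range(n_genes)) + eps
            h[j] *= numer / denom
    return h

def residual_rmse(v, W):
    """RMSE between a sample and its non-negative reconstruction from W."""
    h = project_onto_basis(v, W)
    residuals = [v[g] - sum(W[g][j] * h[j] for j in range(len(h)))
                 for g in range(len(v))]
    return math.sqrt(sum(r * r for r in residuals) / len(residuals))

# Hypothetical ESC-derived basis (3 genes x 2 components) and two samples.
W = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
esc_like = [2.0, 3.0, 5.0]   # exactly representable by the basis
divergent = [5.0, 0.0, 1.0]  # not representable without negative weights
print(residual_rmse(esc_like, W) < residual_rmse(divergent, W))
```

Under this reading, a lower RMSE means the sample looks more like the ESC reference.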

(3) Analysis of cell characterization values

The expression levels of the relevant cell genetic stability genes and cell functional genes are obtained from the transcriptome expression lineage data. The relevant genes include the embryonic stem cell characterization genes (SOX2, OCT4, NANOG, SSEA-4, TRA-1-60, TRA-1-81 and the non-expressed SSEA-1), the iPS cell characterization genes (SSEA3, SSEA4, TRA-1-60, TRA-1-81, OCT4 and NANOG) and the cell genetic stability genes (TERT, TET1, TET3, Sirt1, CHK1, Oct4-endo, OCT4, Nanog and P53). The baseline deviation value of the expression levels of these genes from those of an embryonic stem cell sample (a high-quality embryonic stem cell sample from the data set constructed in step (1)) is calculated using the R statistical language.
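The source does not give the formula for the baseline deviation value. The sketch below assumes one simple choice, a bounded relative difference per gene averaged over the marker genes, which keeps the result between 0 and 1 in line with the thresholds quoted for this aspect. The formula, the function name and the expression values are all hypothetical.

```python
def baseline_deviation(sample, baseline):
    """Mean of |x - b| / (x + b) over the baseline genes (assumed formula).

    0 means identical to the ESC baseline; values approach 1 as expression
    diverges. Genes with zero expression in both samples contribute 0.
    """
    deviations = []
    for gene, reference in baseline.items():
        observed = sample[gene]
        total = observed + reference
        deviations.append(abs(observed - reference) / total if total > 0 else 0.0)
    return sum(deviations) / len(deviations)

# Hypothetical normalized expression values, for illustration only.
esc_baseline = {"OCT4": 120.0, "NANOG": 80.0, "SOX2": 60.0, "TERT": 20.0}
test_sample = {"OCT4": 100.0, "NANOG": 80.0, "SOX2": 20.0, "TERT": 20.0}
deviation = baseline_deviation(test_sample, esc_baseline)
print(round(deviation, 4))
```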

(4) Quantitative comprehensive evaluation

The pluripotency score, the singularity score and the cell characterization deviation value are combined into cell quality score 1, and the cells are ranked comprehensively. The ranking formulas are as follows:

$C = \sum_{i} F_i$ (formula III);

wherein, in formula I, $F_i$ is the score for the pluripotency aspect of the cell quality evaluation, $X_i$ is the value calculated for a given aspect of the cell quality evaluation, $X_{\max}$ is the maximum threshold designed for that aspect, $X_{\min}$ is the minimum threshold designed for that aspect, and $q$ is the number of cell quality evaluation aspects; in formula II, $F_i$ is the score for either the singularity aspect or the cell characterization aspect of the cell quality evaluation, and $X_i$, $X_{\max}$, $X_{\min}$ and $q$ have the same meanings as in formula I (here $q = 3$); in formula III, $C$ is cell quality score 1.

The highest threshold value in the pluripotency analysis is 94.21, and the lowest threshold value is 23.344; the maximum threshold value of the singularity analysis is 1.17, and the minimum threshold value is 0.46; the highest threshold value in the cell characterization value analysis is 1, and the lowest threshold value is 0.
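Formulas I and II are not reproduced in this text, so the following sketch has to assume their form. It assumes a min-max scaling of each aspect between its two thresholds with an equal weight of 1/q, with higher values treated as better for pluripotency and lower values treated as better for singularity and cell characterization (both being deviation-like measures). Only the threshold values and q = 3 come from the text; everything else is an assumption.

```python
# (minimum threshold, maximum threshold) per aspect, as quoted in the text.
THRESHOLDS = {
    "pluripotency": (23.344, 94.21),
    "singularity": (0.46, 1.17),
    "characterization": (0.0, 1.0),
}

def scaled(value, low, high, higher_is_better):
    """Min-max scale a value into [0, 1], clipping at the thresholds."""
    clipped = min(max(value, low), high)
    fraction = (clipped - low) / (high - low)
    return fraction if higher_is_better else 1.0 - fraction

def quality_score_1(values):
    """C = sum of F_i, with each F_i weighted by 1/q (assumed form)."""
    q = len(THRESHOLDS)
    total = 0.0
    for aspect, (low, high) in THRESHOLDS.items():
        higher_is_better = aspect == "pluripotency"  # assumption
        total += scaled(values[aspect], low, high, higher_is_better) / q
    return total

best = {"pluripotency": 94.21, "singularity": 0.46, "characterization": 0.0}
worst = {"pluripotency": 23.344, "singularity": 1.17, "characterization": 1.0}
print(quality_score_1(best), quality_score_1(worst))
```

With these assumptions C runs from 0 (all aspects at their worst threshold) to 1 (all aspects at their best threshold), which makes cells directly comparable in a ranking.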

5. Analysis of DNA variation of Whole genome (disease analysis at genetic level)

After ranking by cell quality score 1, the top 10% of the cells undergoing large-scale screening are selected for whole genome sequencing, in order to save sequencing cost. The sequencing depth is kept at 30X or higher, in line with the accepted sequencing standard in the market. The disease annotation results for CNVs and SNPs/indels are classified according to the ACMG standards and guidelines for genetic variation classification, cells with genetic risk are labeled, and the relevant literature is checked to confirm authenticity. The cells are graded and scored at the genetic level: newly generated variations are confirmed not to come from the donor, the number of variations is counted, and harmful and possibly benign variations are counted according to the ACMG standards and guidelines. The more variations a cell carries, the more dangerous it is.

$$V = \sum_{i} F_i$$ (Formula V)

wherein, in Formula IV, $F_i$ is the score for one variation factor in the evaluation of cellular variation, $X_i$ is the number of variations under that factor, $X_{max}$ is the maximum for that factor, and $X_{min}$ is the minimum for that factor; in Formula V, $V$ is the genetic variation score.

The variation factors are the total number of mutations, the number of inherited diseases, the number of potentially harmful variations, and the number of benign variations. For each variation factor, $X_{max}$ is the maximum and $X_{min}$ is the minimum of the variation counts $X_i$ across the groups of pluripotent stem cells.
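A minimal Python sketch of the genetic variation score follows. It implements the sum over variation factors stated in Formula V, but the per-factor score is an assumption: a min-max normalization across the groups in which fewer variations give a higher score. The factor keys, the tie-handling rule, and the function name `genetic_variation_scores` are hypothetical.

```python
# Hypothetical keys for the four variation factors named in the text.
FACTORS = ("total_mutations", "inherited_diseases", "potentially_harmful", "benign")


def genetic_variation_scores(counts):
    """Return {cell: V}, where V is the sum of the per-factor scores F_i.

    counts: dict mapping cell name -> {factor: number of variations}.
    ASSUMPTION: F_i = (X_max - X_i) / (X_max - X_min), so fewer variations
    score higher; X_max and X_min are taken across all groups per factor.
    """
    scores = {cell: 0.0 for cell in counts}
    for factor in FACTORS:
        column = [counts[cell][factor] for cell in counts]
        lo, hi = min(column), max(column)
        for cell in counts:
            if hi == lo:
                f = 1.0  # ASSUMPTION: all groups tie on this factor
            else:
                f = (hi - counts[cell][factor]) / (hi - lo)
            scores[cell] += f
    return scores


if __name__ == "__main__":
    # Hypothetical variation counts, for illustration only.
    demo = {
        "cellA": {"total_mutations": 10, "inherited_diseases": 0,
                  "potentially_harmful": 1, "benign": 5},
        "cellB": {"total_mutations": 20, "inherited_diseases": 2,
                  "potentially_harmful": 3, "benign": 5},
    }
    print(genetic_variation_scores(demo))
```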

6. Comprehensive evaluation

The value (C) obtained in step 4 and the value (V) obtained in step 5 are averaged to give the cell mass fraction 2, and the cells are ranked again.
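The averaging and re-ranking step can be sketched in Python as follows. The function name `cell_mass_fraction_2` is hypothetical, and the sketch assumes that only cells with both a C value and a V value (that is, the cells that went on to whole-genome sequencing) are ranked.

```python
def cell_mass_fraction_2(c_scores, v_scores):
    """Average C and V per cell and rank the cells from best to worst.

    c_scores: dict mapping cell name -> cell mass fraction 1 (C).
    v_scores: dict mapping cell name -> genetic variation score (V).
    ASSUMPTION: cells lacking either score are left out of the ranking.
    """
    shared = c_scores.keys() & v_scores.keys()
    combined = {cell: (c_scores[cell] + v_scores[cell]) / 2 for cell in shared}
    return sorted(combined.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical scores, for illustration only.
    c = {"cellA": 0.8, "cellB": 0.9, "cellC": 0.5}
    v = {"cellA": 4.0, "cellB": 1.0}
    for cell, score in cell_mass_fraction_2(c, v):
        print(cell, round(score, 3))
```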

The flow of the method for assessing pluripotent cell quality in this example is shown in FIG. 4.

Test example 1

The above method was used to evaluate the quality of 20 stem cell samples of unknown pluripotency, which are listed in Table 1 under short sample names. All 20 groups of pluripotent stem cell samples in Table 1 were cultured in Essential 8™ medium. The sequencing platform was the MGISEQ-2000 (MGI), used with its matching whole-genome and transcriptome library preparation reagents. Total RNA and DNA were extracted with the […] Kit and the PureLink™ Kit (Thermo Fisher Scientific), respectively. Transcriptome expression lineage data and whole genome sequence data were thereby obtained.

The transcriptome expression lineage data obtained for cells C1-C20 were used to evaluate stem cell quality according to the method for evaluating pluripotent stem cell quality described in Example 1. The data obtained from bioinformatic analysis of the transcriptome expression lineage data are shown in Table 2; as the table shows, gene expression levels differ between stem cells. Cell pluripotency, singularity, and the deviation of the cell characterization molecules were computed with the R statistical software, the cell mass fraction 1 was calculated, and the top 10% of the ranked cells were selected. The results are shown in Table 3.

The whole genome sequence data obtained for cells C1-C20 were likewise used to evaluate stem cell quality according to the method described in Example 1. The variation statistics produced by bioinformatic analysis of the whole-genome sequencing data after disease-level annotation, together with the genetic variation scores, are shown in Table 4.

As shown in Table 5, after the cell mass fraction 1 and the genetic variation score were averaged to form the cell mass fraction 2 and the cells were ranked, the best cell in this batch of stem cell samples of unknown pluripotency was C6.

TABLE 1

Sample name Cell type
C1-C60 iPS cell

TABLE 2

TABLE 3

TABLE 4

TABLE 5

Existing evaluations of cell quality are of mixed reliability, and a scientific and reasonable evaluation system is lacking. In addition, methods that verify cell quality through mainstream experiments are subjective and blind, consuming large amounts of time and money without necessarily yielding accurate results. The evaluation approaches provided by the related art, such as pluripotency and singularity verification, chromosomal aberration, cytogenetic stability, and cell function, are each single-aspect evaluations and have not been integrated into one process framework.

In evaluating the pluripotency and singularity of cells, the related art predicts the pluripotency of an unknown cell sample by machine learning on a microarray gene expression dataset. The drawbacks of this method are increasingly apparent: current platforms are phasing out microarray technology in favor of next-generation sequencing, so the original dataset can no longer evaluate current data correctly. Moreover, the approach was limited by the understanding of embryonic stem cells and iPS cells at the time, by shortcomings of cell culture technology, and by differences in ethnicity and sex, which the paper does not fully distinguish. Confounded experimental data easily leads to erroneous assessments, which are unacceptable in the clinical-grade evaluation of cell quality.

In evaluating genetic variation, the related art has developed a method that evaluates chromosomal aberration from transcriptome expression data. However, the transcriptome expression profile covers only part of the whole genome, so the genetic variation it reveals cannot be interpreted at the disease level. Technically, expression profile data also cannot accurately capture microdeletions and multiple deletions in parts of the genome. As a result, a chromosomal deletion that would not cause disease may be reported, while a small deletion that does cause disease may go undetected, leading to an incorrect assessment of the safety of the cells.

In evaluating cell characterization molecules (cytogenetic stability and cell function), the related art requires additional qPCR-based experiments, whereas equally effective data can be obtained by evaluating the transcriptome data.

A single-aspect assessment easily overlooks other, equally important aspects of cell quality, so an innovative assessment method that integrates the above factors is urgently needed.

At present, the mouse teratoma assay is still regarded as the gold standard in the field for evaluating cell pluripotency, and teratoma formation is a core defining characteristic of all pluripotent stem cells (PSCs); however, researchers have questioned this mode of evaluation because it lacks a unified experimental standard. The mouse teratoma assay consists mainly of injecting pluripotent stem cells into immunodeficient mice and observing the formation of well-differentiated teratomas. The approach is qualitative and subjective: only a pathologist skilled in teratoma histology can distinguish tumors composed primarily of poorly differentiated neuroectoderm from cystic masses composed of highly differentiated tissues of all three embryonic germ layers. The former are malignant tumors resembling teratocarcinomas, while the latter are benign encapsulated masses, that is, true teratomas derived from pluripotent stem cells. Because the assay is lengthy and cumbersome and requires animal use and expert pathological assessment, it has practical limitations as a routine screening tool. With the development of microarray (chip) technology, research has explored an economical, efficient, animal-free alternative to the teratoma assay for evaluating human cell pluripotency, which predicts the pluripotency of unknown cell samples by machine learning on microarray gene expression datasets.

Unfortunately, the drawbacks of this method are increasingly apparent. Current platforms are steadily phasing out microarray technology in favor of second-generation or third-generation sequencing, so the original dataset can no longer evaluate current data correctly. Moreover, because of the understanding of embryonic stem cells and iPS cells at the time, shortcomings of cell culture technology, and differences in ethnicity and sex, using this method to evaluate embryonic stem cells and iPS cells from Chinese donors would introduce substantial bias and often give inaccurate results.

For chromosome karyotyping, most experiments still use staining-based techniques, which allow the structure and number of chromosomes to be observed directly under a microscope. However, this approach is not the best option when large numbers of pluripotent stem cells are needed for clinical and research use. The related art has developed a method for evaluating chromosomal aberration from transcriptome expression data, using either the degree of variation of SNPs (single nucleotide polymorphisms) or gene expression levels relative to the average of the whole sample set as the criterion for whether a chromosomal variation has occurred. By comparison, copy number variation and mutation analysis based on whole-genome sequencing offers wider coverage and higher resolution, and the result for a single sample can be annotated down to the disease level, which transcriptome data cannot match.

The embodiments of the present invention have been described in detail with reference to the accompanying drawings, but the present invention is not limited to the above embodiments, and various changes can be made within the knowledge of those skilled in the art without departing from the gist of the present invention. Furthermore, the embodiments of the present invention and the features of the embodiments may be combined with each other without conflict.
