Method for quantitatively evaluating the degree of diploidization of a polyploid genome

Document No.: 1273775  Publication date: 2020-08-25

Reading note: This technology, "Method for quantitatively evaluating the degree of diploidization of a polyploid genome" (一种多倍体生物基因组二倍化程度量化评估的方法), was designed by 刘海平, 牟振波 and 肖世俊 on 2020-05-12. Abstract: The invention relates to a method for quantitatively evaluating the degree of diploidization of a polyploid organism's genome, comprising the following steps: 1) evaluating the heterozygosity of the genome from the K-mer analysis of the genome; 2) constructing a polyploid genome feature model from the heterozygosity features of the genome; 3) simulating, on the basis of this feature model, polyploid genomes with different degrees of diploidization; 4) evaluating the accuracy of the diploidization rate estimate and calculating the degree of diploidization of the polyploid genome. The method is the first to analyze polyploid genome diploidization using resequencing data; it is low in cost and suitable for polyploid genome research in animals and plants.

1. A method for quantitatively evaluating the degree of diploidization of a polyploid organism's genome, characterized by comprising the following steps:

S1: evaluating the heterozygosity of the genome from the K-mer analysis of the genome;

S2: constructing a polyploid genome feature model from the heterozygosity features of the genome, and calculating the degree of genome diploidization from the information obtained in step S1, the genome features satisfying the relation $\alpha M - k\alpha M + \beta(1-\alpha)M = N$, wherein:

M is the total number of K-mers in the genome;

N is the total number of K-mers in the repeat regions;

α is the diploidization rate;

β is the proportion of diploid repetitive sequences;

k is the genomic heterozygosity;

S3: simulating polyploid genomes with different degrees of diploidization: genome sequences with different degrees of diploidization are simulated on the basis of the polyploid genome feature model, and the diploidization rate of each simulated genome is calculated using the relation constructed in step S2;

S4: evaluating the accuracy of the diploidization rate estimate and calculating the degree of diploidization of the polyploid genome, specifically:

the diploidization rate is calculated for the genomes simulated in step S3, and the accuracy of this quantitative estimate is assessed in order to evaluate the accuracy of the model of step S2; on the basis of that accuracy, the diploidization rate of the polyploid genome is obtained from the real genome sequence according to step S2.

2. The method of claim 1, wherein step S1 counts the distinct K-mers, and the number of occurrences of each, in second-generation high-throughput resequencing data of the polyploid genome, so as to construct a whole-genome K-mer peak plot. Important genome characteristics, such as genome size and heterozygosity, are then determined from the features of the peak plot.

3. The method of claim 1, wherein the K-mer length used to analyze the genome resequencing data in step S1 is 17 bp.

4. The method of claim 1, wherein step S3 randomly introduces single nucleotide mutations and small insertion/deletion mutations into the genome, so that the simulated polyploid genomes have diploidization degrees from 0.1 to 0.9 at intervals of 0.1, thereby obtaining standard data for polyploid genomes with different degrees of diploidization.

5. The method of claim 1, wherein the evaluation in step S4 is: performing a linear regression of the diploidization rate calculated in step S2 against the simulated standard values of step S3 to evaluate the accuracy of the model.

[ technical field ]

The invention relates to biological genome analysis technology, in particular to a method for quantitatively evaluating the degree of diploidization of polyploid genomes.

[ background of the invention ]

Polyploidy is a mode of genome evolution: a state in which the chromosome number of an animal or plant genome has been multiplied. Genome polyploidization is found mainly in plants and in the genomes of some animals, particularly roundworms and amphibians. In a diploid plant or animal genome there are two copies (2N) of each chromosome, one from the male parent and one from the female parent. The nucleus of a gamete of a diploid species is haploid, that is, it carries half of the diploid karyotype. Polyploids are classified as triploid (3N), tetraploid (4N), and so on, according to the number of chromosome sets. In nature, tetraploids are the more common polyploids.

Polyploidy results from doubling of an organism's chromosomes, usually because cells are exposed to extreme conditions such as low temperature, particle irradiation or chemical agents, or to changes in their physical state. These factors can cause abnormalities during meiosis or mitosis that increase the chromosome number. Genomic studies have nevertheless shown that genome polyploidization can confer significant selective advantages. Polyploid plants and animals, for example, generally grow more vigorously and adapt better to their environment. Many scientists therefore regard genome polyploidization as a molecular mechanism by which organisms evolve and adapt in extreme environments.

When a polyploid first forms, the duplicated chromosome sequences are highly similar. This similarity disrupts chromosome association and pairing, so the offspring of polyploid individuals tend to have low fertility. After polyploidization, the genome therefore enters a diploidization process: the duplicated genome copies accumulate mutations independently, so that the polyploid genome gradually transitions toward a diploid state. Quantifying the degree of diploidization of a polyploid genome thus makes it possible to estimate when polyploidization occurred and to analyze the evolutionary characteristics of the polyploid genome. However, no method yet exists for quantitatively assessing polyploid genome diploidization from polyploid genome sequences.

[ summary of the invention ]

To evaluate the diploidization state of a polyploid genome, the invention provides a method that analyzes genome sequencing data to characterize that state and quantify the degree of diploidization. It is the first technique for quantitative analysis of the diploidization state of a polyploid genome.

The technical solution adopted by the invention is as follows:

A method for quantitatively evaluating the degree of diploidization of a polyploid organism's genome comprises the following steps:

1. evaluating the heterozygosity of the genome from the K-mer analysis of the genome;

2. constructing a polyploid genome feature model from the heterozygosity features of the genome;

3. simulating, on the basis of the polyploid genome feature model, polyploid genomes with different degrees of diploidization;

4. evaluating the accuracy of the diploidization rate estimate and calculating the degree of diploidization of the polyploid genome.

Further, step 1 evaluates the heterozygosity of the genome from the K-mer analysis of the genome. Using second-generation high-throughput resequencing data of the polyploid genome, the distinct K-mers and the number of occurrences of each are counted, and a whole-genome K-mer peak plot is constructed from these counts. Important genome characteristics, such as genome size and heterozygosity, are then determined from the features of the peak plot.

Further, the K-mer length used to analyze the genome resequencing data in step 1 is preferably 17 bp.
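As a minimal sketch of the K-mer counting that underlies the peak plot, the following Python code counts K-mers in a list of reads and tabulates how many distinct K-mers occur at each depth. The function names `count_kmers` and `kmer_spectrum` and the toy reads are hypothetical; the sketch ignores reverse complements and sequencing errors, which a real analysis would have to handle, and it does not estimate genome size or heterozygosity from the peaks.

```python
from collections import Counter

K = 17  # K-mer length given as preferred in the text


def count_kmers(reads, k=K):
    """Count the occurrences of every K-mer of length k in the reads."""
    counts = Counter()
    for read in reads:
        seq = read.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if "N" not in kmer:  # skip K-mers containing ambiguous bases
                counts[kmer] += 1
    return counts


def kmer_spectrum(counts):
    """Map each depth to the number of distinct K-mers seen at that depth."""
    return Counter(counts.values())


# Toy reads for illustration only; real input would be resequencing reads.
toy_reads = ["ACGTACGTACGTACGTACGTACGT", "ACGTACGTACGTACGTACGTACGT"]
spectrum = kmer_spectrum(count_kmers(toy_reads))
for depth in sorted(spectrum):
    print(depth, spectrum[depth])
```

Plotting the number of distinct K-mers against depth gives the peak plot described above.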

Further, step 2 uses the genome size, heterozygosity and repeat sequence proportion obtained in step 1 to construct a model relating the total number of genome K-mers, the total number of repeat-region K-mers, the proportion of homologous regions, the proportion of diploid repeat sequences and the heterozygosity of the genome. Specifically, M is the total number of K-mers in the genome, N is the total number of K-mers in the repeat regions, α is the diploidization rate, β is the proportion of diploid repeat sequences, and k is the heterozygosity of the genome.

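Further, step 2 proposes that the genome features satisfy the relation $\alpha M - k\alpha M + \beta(1-\alpha)M = N$. Collecting the terms in $\alpha$ gives $\alpha M(1 - k - \beta) = N - \beta M$, so the diploidization rate of the genome is estimated as:

$$\alpha = \frac{N - \beta M}{(1 - k - \beta)M}$$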
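The rearrangement of the step 2 relation for α can be checked numerically. The following minimal Python sketch solves the relation for α and verifies the result by a round trip; the function name `diploidization_rate` and all input values are made up for illustration and are not taken from the patent.

```python
def diploidization_rate(M, N, beta, k):
    """Solve alpha*M - k*alpha*M + beta*(1 - alpha)*M = N for alpha.

    M: total number of genome K-mers
    N: total number of repeat-region K-mers
    beta: proportion of diploid repeat sequences
    k: genomic heterozygosity
    """
    denom = (1.0 - k - beta) * M
    if denom == 0:
        raise ValueError("relation is degenerate when 1 - k - beta == 0")
    return (N - beta * M) / denom


# Round trip with made-up values: build N from a known alpha, then recover it.
M = 1_000_000
beta = 0.3
k = 0.01
alpha_true = 0.6
N = alpha_true * M - k * alpha_true * M + beta * (1 - alpha_true) * M

alpha = diploidization_rate(M, N, beta, k)
print(round(alpha, 6))
```

The explicit check on the denominator is a design choice of this sketch: the relation cannot be solved for α when 1 - k - β equals zero.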

Further, in step 3, single nucleotide variants (SNVs) and small insertion/deletion mutations (InDels) are randomly introduced into the genome so that the simulated polyploid genomes have diploidization degrees from 0.1 to 0.9 at intervals of 0.1, yielding standard data for polyploid genomes with different degrees of diploidization.
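One way such a simulation could be set up is sketched below in Python. The text does not say how a target degree of diploidization maps onto the introduced mutations, so this sketch makes a loud assumption: a duplicated copy is cut into fixed-size windows and a fraction of the windows equal to the target degree is mutated. The function names, window size and mutation rates are all hypothetical.

```python
import random

BASES = "ACGT"


def mutate(seq, snv_rate, indel_rate, max_indel, rng):
    """Return seq with random SNVs and small insertions/deletions."""
    out = []
    i = 0
    while i < len(seq):
        r = rng.random()
        if r < snv_rate:
            # SNV: replace the base with a different one
            out.append(rng.choice([b for b in BASES if b != seq[i]]))
            i += 1
        elif r < snv_rate + indel_rate:
            size = rng.randint(1, max_indel)
            if rng.random() < 0.5:
                # insertion: keep the base, then add `size` random bases
                out.append(seq[i])
                out.extend(rng.choice(BASES) for _ in range(size))
                i += 1
            else:
                # deletion: skip `size` bases
                i += size
        else:
            out.append(seq[i])
            i += 1
    return "".join(out)


def simulate_duplicate(seq, degree, window, rng,
                       snv_rate=0.05, indel_rate=0.01, max_indel=3):
    """Mutate a fraction `degree` of the windows of a duplicated copy.

    ASSUMPTION: diverged windows stand in for diploidized regions.
    Returns the mutated copy and the realized fraction of mutated windows.
    """
    windows = [seq[i:i + window] for i in range(0, len(seq), window)]
    n_mut = round(degree * len(windows))
    chosen = set(rng.sample(range(len(windows)), n_mut))
    pieces = [
        mutate(w, snv_rate, indel_rate, max_indel, rng) if j in chosen else w
        for j, w in enumerate(windows)
    ]
    return "".join(pieces), n_mut / len(windows)


rng = random.Random(0)  # fixed seed so the sketch is reproducible
ancestor = "".join(rng.choice(BASES) for _ in range(2000))
degrees = [round(0.1 * i, 1) for i in range(1, 10)]  # 0.1 to 0.9, step 0.1
simulated = {d: simulate_duplicate(ancestor, d, 100, rng) for d in degrees}
```

Each entry of `simulated` pairs a mutated copy with its realized mutated-window fraction, which can serve as the known standard value when the estimate is evaluated.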

Further, the evaluation in step 4 is specifically as follows: the diploidization rate is calculated for the polyploid genomes simulated in step 3 and compared with the simulated standard values to evaluate the accuracy of the model in step 2. Once the accuracy of the quantitative estimate has been established, the diploidization rate of the polyploid genome is calculated from the real genome sequence using the model in step 2.

Further, in the accuracy evaluation of step 4, the diploidization rate obtained in step 2 is preferably regressed linearly against the simulated standard values of step 3, and the resulting regression coefficient is used as the basis for judging accuracy.
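The regression check could look like the following minimal Python sketch, which fits an ordinary least-squares line of estimated rates against simulated rates and reports the slope, intercept and coefficient of determination. The function name `linear_regression` is hypothetical, and the `estimated` values are made-up placeholders, not results from the patent.

```python
def linear_regression(x, y):
    """Ordinary least-squares fit y = intercept + slope * x.

    Returns (slope, intercept, r_squared).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r_squared = 1.0 - ss_res / ss_tot
    return slope, intercept, r_squared


# Simulated standard values: diploidization degrees 0.1 to 0.9, step 0.1.
simulated_rates = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
# Placeholder estimates, invented purely to exercise the function.
estimated = [0.12, 0.19, 0.31, 0.42, 0.48, 0.61, 0.69, 0.82, 0.89]

slope, intercept, r_squared = linear_regression(simulated_rates, estimated)
print(f"slope={slope:.3f} intercept={intercept:.3f} R^2={r_squared:.3f}")
```

A slope near 1, an intercept near 0 and a coefficient of determination near 1 would indicate that the estimates track the simulated standard closely.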

Compared with the prior art, the invention has the beneficial effects that:

The method can quantitatively evaluate the degree of diploidization of a polyploid genome from whole-genome resequencing data, and provides basic parameter information for research on polyploid genome evolution.

The method is the first to analyze polyploid genome diploidization using resequencing data; it is therefore low in cost and suitable for polyploid genome research in animals and plants.

[ description of the drawings ]

To illustrate the technical solutions of the embodiments of the invention more clearly, the drawings used in the embodiments are briefly described below. The drawings illustrate only some embodiments of the invention and should not be regarded as limiting its scope; those skilled in the art can obtain other related drawings from them without inventive effort.

FIG. 1 is a schematic diagram of the construction of the polyploid genome feature model.

FIG. 2 is a schematic diagram of the linear regression used to verify the accuracy of the diploidization rate estimate in example 6 of the invention.

[ detailed description ]

The invention is further described with reference to the following examples, but is not limited to them. The examples apply the method for quantitatively evaluating the degree of diploidization of a polyploid genome provided by the invention.
