DNA fingerprint atlas database of tobacco variety and application thereof

Document No. 1320947, published 2020-07-14

This technology, 烟草品种的dna指纹图谱库及其应用 (DNA fingerprint atlas database of tobacco variety and application thereof), was designed and created on 2020-04-27 by 刘楠 (Liu Nan), 张威 (Zhang Wei), 何声宝 (He Shengbao), 王英元 (Wang Yingyuan), 罗安娜 (Luo Anna), 冯晓民 (Feng Xiaomin), 张晓慧 (Zhang Xiaohui) and 范月月 (Fan Yueyue). Abstract: the invention discloses a DNA fingerprint atlas database of tobacco varieties that comprises 101 specific SNP sites. It also provides the use of this database in tobacco variety identification. The database and the DNA fingerprints of specific tobacco varieties are obtained by simplified (reduced-representation) genome sequencing. When used to identify tobacco varieties, they give more objective, accurate and reliable results than the traditional methods based on appearance and smoking evaluation.

1. A DNA fingerprint atlas database of tobacco varieties, comprising the following 101 specific SNP sites:

2. A method of building the fingerprint atlas database as claimed in claim 1, comprising the following steps:

(1) collecting seeds of 131 tobacco varieties (see Table 1), extracting genomic DNA from the fresh leaves of each variety, and digesting the genomic DNA with the restriction enzymes SacI and MseI;

(2) ligating the digested fragments obtained in step (1) to sequencing adapters; purifying them; selecting fragments 320-420 bp in length; then enriching, amplifying and sequencing the selected fragments on an Illumina NovaSeq sequencing system, with paired-end 150 bp raw reads and sequencing data in FASTQ format;

(3) obtaining the genome of a representative tobacco variety from ftp://ftp.solgenomics.net/genes/Nicotiana_tabacum/edwards_et_al_2017/establishment/ and using its scaffold-level assembly as the reference genome; selecting scaffolds ≥ 1 kb in length (371,419 scaffolds, 4.2 Gb in total); performing in-silico digestion with the restriction enzymes SacI and MseI; and selecting fragments that have a different enzyme cut site at each end and are 200-600 bp long, giving 355,695 fragments in total;

(4) computing quality statistics for the raw reads obtained in step (2) with FastQC v0.11.5, and filtering them with Trimmomatic-0.38 using the following parameters: a. removing the first 5 bases of read 1 and the first 3 bases of read 2 of each read pair; b. removing leading bases with quality below 20 until a base with quality of at least 20 is reached, and likewise removing trailing bases with quality below 20; c. removing any run of 5 contiguous bases whose average quality is below 20; minimum read length 50 bp; mean read quality above 20;

(5) constructing a DNA fingerprint atlas database, which comprises the following steps:

(5-1) calling variants from the reads obtained in step (4) with a BWA + GATK pipeline, which detected 1,219,877 indels and 2,222,588 SNPs;

(5-2) selecting the SNPs and filtering out low-quality SNPs that match the expression QD < 2.0 || FS > 60.0 || MQ < 40.0 || MQRankSum < -12.5 || ReadPosRankSum < -8.0 || QUAL < 40, giving 978,426 high-quality SNPs;

(5-3) using vcftools to filter out SNPs whose allele frequency is below 0.05 or whose missing rate in the population is above 20%, giving 104,788 SNPs;

(5-4) calculating the polymorphism information content (PIC) of each SNP and selecting SNPs with PIC above 0.35, giving 65,582 SNPs;

(5-5) based on the number and length distribution of the reference-genome scaffolds, selecting SNPs located on scaffolds longer than 50 kb, giving 44,122 SNPs;

(5-6) annotating the SNPs and selecting those in gene coding regions (746 SNPs), then filtering out SNPs with heterozygosity above 5%, giving 101 SNPs.

3. A method for establishing a DNA fingerprint of a tobacco variety, comprising: detecting, in a standard sample of the tobacco variety, the specific nucleotides at the SNP sites of the fingerprint atlas database of claim 1, thereby obtaining the DNA fingerprint of the tobacco variety.

4. A DNA fingerprint obtained by the establishing method according to claim 3.

5. Use of the fingerprint atlas database of claim 1 or the DNA fingerprint of claim 4 for identifying tobacco varieties.

6. A DNA fingerprint of Ningxiang sun-cured tobacco, comprising 101 SNP loci, wherein the SNP loci and the specific nucleotides at them are as follows:

7. Use of the DNA fingerprint of claim 6 for identifying Ningxiang sun-cured tobacco.

8. A method for identifying Ningxiang sun-cured tobacco, comprising the following steps:

detecting, in the genomic DNA of the tobacco to be tested, the 101 SNP sites defined in claim 1; comparing the nucleotides at these sites with the specific nucleotides defined in claim 6; and identifying the tobacco to be tested as Ningxiang sun-cured tobacco when the coincidence rate is greater than or equal to 95%;

preferably, the genomic DNA is obtained from fresh leaves of the tobacco to be tested.

9. A DNA fingerprint of Zhongyan 103, comprising 101 SNP sites, wherein the SNP sites and the specific nucleotides at them are as follows:

10. Use of the DNA fingerprint of claim 9 for identifying Zhongyan 103.

11. A method of identifying Zhongyan 103, the method comprising the steps of:

detecting, in the genomic DNA of the tobacco to be tested, the 101 SNP sites defined in claim 1; comparing the nucleotides at these sites with the specific nucleotides defined in claim 9; and identifying the tobacco to be tested as Zhongyan 103 when the coincidence rate is greater than or equal to 95%;

preferably, the genomic DNA is obtained from fresh leaves of the tobacco to be tested.

12. A DNA fingerprint of Yunyan 87, comprising 101 SNP sites, wherein the SNP sites and the specific nucleotides at them are as follows:

13. Use of the DNA fingerprint of claim 12 for identifying Yunyan 87.

14. An identification method of Yunyan 87, comprising the steps of:

detecting, in the genomic DNA of the tobacco to be tested, the 101 SNP sites defined in claim 1; comparing the nucleotides at these sites with the specific nucleotides defined in claim 12; and identifying the tobacco to be tested as Yunyan 87 when the coincidence rate is greater than or equal to 95%;

preferably, the genomic DNA is obtained from fresh leaves of the tobacco to be tested.

Technical Field

The invention relates to the field of molecular biology, in particular to a DNA fingerprint atlas database of tobacco varieties and its application in tobacco variety identification.

Background

To deal lawfully with the smuggling and illegal sale of tobacco materials, seized tobacco materials that are subject to legal penalties must be identified and inspected. At present, identification of tobacco raw materials, in particular leaf samples, relies essentially on subjective means such as appearance inspection and smoking evaluation.

To improve identification accuracy, research on DNA-based identification of tobacco has been carried out. Differences at the DNA level distinguish well between species and even between individuals of the same species. DNA molecular techniques are therefore widely used to classify and identify genera, species and varieties and to study genetic relationships. They can classify and identify almost all biological species and are currently the fastest-developing and most widely used species identification techniques.

Dozens of DNA molecular identification techniques now exist. Examples include restriction fragment length polymorphism (RFLP) markers, random amplified polymorphic DNA (RAPD) markers, sequence-characterized amplified region (SCAR) markers, intron length polymorphism markers, single nucleotide polymorphisms (SNPs) and simple sequence repeat (SSR) markers. The DNA techniques currently used for tobacco identification are mainly SSR markers and SNP markers.

SSR markers are currently the most widely used DNA identification method. However, each SSR marker carries limited genetic information, so reliable identification often requires a dozen to more than twenty primer pairs. Designing and developing these primers is complex and needs extensive validation testing. As a result, the initial primer design period is long and the validation workload is large. Later identification work is laborious and low-throughput, and in even slightly complicated cases the results still have to be confirmed by sequencing. SNP marker techniques include mass spectrometry and chip-based methods. Mass spectrometry equipment is expensive and its detection throughput is low, while chip-based methods are costly, require custom gene chips and involve complex experimental procedures.

Advances in sequencing technology and large cost reductions have made it feasible to identify tobacco DNA by sequencing. The main sequencing approaches are simplified (reduced-representation) genome sequencing and whole-genome resequencing, and both can comprehensively capture the genetic information of a sample. They differ in scope. Simplified genome sequencing targets fewer sites and covers only 5%-10% of the genome. Whole-genome resequencing must capture variation across the entire genome, so it needs much more data and costs more. Given the practical conditions of tobacco DNA identification, simplified genome sequencing fully meets the identification requirements.

Developing a new DNA identification method for tobacco varieties is therefore valuable for improving the accuracy of variety identification and for advancing tobacco DNA identification and inspection.

Disclosure of Invention

In view of the above technical problems, the invention aims to provide a DNA fingerprint atlas database for identifying tobacco varieties that contains variety-specific SNP sites. China has two major tobacco types. Sun-cured tobacco is typified by Ningxiang sun-cured tobacco, and flue-cured tobacco is typified by varieties such as Zhongyan 103 and Yunyan 87. In practical identification work these tobaccos are frequently submitted for identification and are difficult to identify. The invention therefore also aims to provide DNA fingerprints of these specific tobacco varieties.

The technical scheme of the invention is as follows.

In one aspect, the invention provides a DNA fingerprint atlas database of tobacco varieties, which comprises the following 101 specific SNP sites:

In another aspect, the invention also provides a method for establishing the fingerprint atlas database, comprising the following steps:

(1) collecting germplasm of 131 tobacco varieties (see Table 1), extracting genomic DNA from the fresh leaves of each variety, and digesting the genomic DNA with the restriction enzymes SacI and MseI;

(2) ligating the digested fragments obtained in step (1) to sequencing adapters; purifying them; selecting fragments 320-420 bp in length; then enriching, amplifying and sequencing the selected fragments on an Illumina NovaSeq sequencing system, with paired-end 150 bp raw reads and sequencing data in FASTQ format;

(3) obtaining the genome of a representative tobacco variety from ftp://ftp.solgenomics.net/genes/Nicotiana_tabacum/edwards_et_al_2017 and using its scaffold-level assembly as the reference genome; selecting scaffolds ≥ 1 kb in length (371,419 scaffolds, 4.2 Gb in total); performing in-silico digestion with the restriction enzymes SacI and MseI; and selecting fragments that have a different enzyme cut site at each end and are 200-600 bp long, giving 355,695 fragments in total;

(4) computing quality statistics for the raw reads obtained in step (2) with FastQC v0.11.5, and filtering them with Trimmomatic-0.38 using the following parameters: a. removing the first 5 bases of read 1 and the first 3 bases of read 2 of each read pair; b. removing leading bases with quality below 20 until a base with quality of at least 20 is reached, and likewise removing trailing bases with quality below 20; c. removing any run of 5 contiguous bases whose average quality is below 20; minimum read length 50 bp; mean read quality above 20;

(5) constructing a DNA fingerprint atlas database, which comprises the following steps:

(5-1) calling variants from the reads obtained in step (4) with a BWA + GATK pipeline, which detected 1,219,877 indels and 2,222,588 SNPs;

(5-2) selecting the SNPs and filtering out low-quality SNPs that match the expression QD < 2.0 || FS > 60.0 || MQ < 40.0 || MQRankSum < -12.5 || ReadPosRankSum < -8.0 || QUAL < 40, giving 978,426 high-quality SNPs;

(5-3) using vcftools to filter out SNPs whose allele frequency is below 0.05 or whose missing rate in the population is above 20%, giving 104,788 SNPs;

(5-4) calculating the polymorphism information content (PIC) of each SNP and selecting SNPs with PIC above 0.35, giving 65,582 SNPs;

(5-5) based on the number and length distribution of the reference-genome scaffolds, selecting SNPs located on scaffolds longer than 50 kb, giving 44,122 SNPs;

(5-6) annotating the SNPs and selecting those in gene coding regions (746 SNPs), then filtering out SNPs with heterozygosity above 5%, giving 101 SNPs.
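The population-level filters in steps (5-3), (5-4) and the heterozygosity part of (5-6) can be sketched for a single SNP as below. This is a minimal illustration, not the authors' code, and it rests on several assumptions. Genotypes are VCF-style diploid strings ("0/1", "./." for missing). The "allele frequency" threshold is read as a minor allele frequency. PIC uses the commonly cited Botstein-style formula, which the text does not specify. Heterozygosity is measured per SNP as the share of called samples that are heterozygous. The helper names are hypothetical.

```python
from itertools import combinations


def parse(gt):
    """Split a diploid VCF-style genotype ('0/1' or '0|1'); None if missing."""
    if "." in gt:
        return None
    return gt.replace("|", "/").split("/")


def called(genotypes):
    """Parsed genotypes of samples that have a call at this SNP."""
    return [p for p in map(parse, genotypes) if p is not None]


def missing_rate(genotypes):
    return sum(parse(g) is None for g in genotypes) / len(genotypes)


def allele_freqs(genotypes):
    alleles = [a for p in called(genotypes) for a in p]
    return [alleles.count(a) / len(alleles) for a in sorted(set(alleles))]


def pic(freqs):
    """PIC = 1 - sum(p_i^2) - sum_{i<j} 2 p_i^2 p_j^2 (assumed formula)."""
    return (1 - sum(p * p for p in freqs)
            - sum(2 * p * p * q * q for p, q in combinations(freqs, 2)))


def heterozygosity(genotypes):
    """Fraction of called samples whose two alleles differ."""
    calls = called(genotypes)
    return sum(a != b for a, b in calls) / len(calls) if calls else 0.0


def passes(genotypes, min_maf=0.05, max_missing=0.20, min_pic=0.35, max_het=0.05):
    """True if one SNP survives the missing-rate, MAF, PIC and heterozygosity filters."""
    if missing_rate(genotypes) > max_missing:        # (5-3) missing rate > 20%
        return False
    freqs = allele_freqs(genotypes)
    if len(freqs) < 2 or min(freqs) < min_maf:       # (5-3) monomorphic or MAF < 0.05
        return False
    if pic(freqs) <= min_pic:                        # (5-4) keep PIC > 0.35 only
        return False
    return heterozygosity(genotypes) <= max_het      # (5-6) drop heterozygosity > 5%
```

Under this formula a biallelic SNP peaks at PIC = 0.375 (allele frequencies 0.5/0.5). The 0.35 cut-off therefore keeps only SNPs whose minor allele frequency is roughly 0.35 or higher. The scaffold-length filter (5-5) and the coding-region filter (5-6) need the reference annotation and are not shown.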

In another aspect, the invention also provides a method for establishing the DNA fingerprint of a tobacco variety. Using the fingerprint atlas database provided by the invention, the specific nucleotides at its SNP sites are detected in a standard sample of the tobacco variety, giving the DNA fingerprint of that variety.

Thus, in a further aspect, the invention also provides a DNA fingerprint obtained by this establishing method. In another aspect, the invention also provides the use of the DNA fingerprint atlas database or the DNA fingerprint in identifying tobacco varieties.

According to a specific embodiment, the invention provides a DNA fingerprint of Ningxiang sun-cured tobacco comprising 101 SNP sites. The SNP sites and the specific nucleotides at them are as follows:

This DNA fingerprint can be used to identify Ningxiang sun-cured tobacco, so the invention also provides the use of the DNA fingerprint in identifying Ningxiang sun-cured tobacco.

Therefore, the invention also provides an identification method of Ningxiang sun-cured tobacco, which comprises the following steps:

Detecting, in the genomic DNA of the tobacco to be tested, the 101 SNP loci of the DNA fingerprint atlas database provided by the invention; comparing them with the specific nucleotides at these loci in the DNA fingerprint of Ningxiang sun-cured tobacco; and identifying the tobacco to be tested as Ningxiang sun-cured tobacco when the coincidence rate is greater than or equal to 95%. Preferably, the genomic DNA is obtained from fresh leaves of the tobacco to be tested.
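The 95% coincidence rule used in this and the following identification methods can be sketched as below. This is a minimal illustration under stated assumptions. A variety fingerprint is a mapping from SNP site ID to its specific nucleotide. Site IDs such as "site0" are hypothetical placeholders for the patent's 101 sites. A site that cannot be called in the test sample counts as a mismatch, so the denominator is always the full fingerprint size.

```python
def coincidence_rate(sample_calls, fingerprint):
    """Fraction of fingerprint sites where the sample carries the variety's nucleotide.

    Sites absent from sample_calls (not detected) count as mismatches.
    """
    matches = sum(1 for site, base in fingerprint.items()
                  if sample_calls.get(site) == base)
    return matches / len(fingerprint)


def is_variety(sample_calls, fingerprint, threshold=0.95):
    """Identify the sample as the variety when the coincidence rate is >= threshold."""
    return coincidence_rate(sample_calls, fingerprint) >= threshold


# Usage with a hypothetical 101-site fingerprint:
fingerprint = {f"site{i}": "A" for i in range(101)}
sample = {f"site{i}": ("A" if i < 96 else "G") for i in range(101)}
print(is_variety(sample, fingerprint))  # 96/101 = 0.9505 -> True
```

With 101 sites, a rate of at least 95% means at least 96 matching sites, because 95 matches give only 94.1%.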

According to a specific embodiment, the invention provides a DNA fingerprint of Zhongyan 103 comprising 101 SNP sites. The SNP sites and the specific nucleotides at them are as follows:

This DNA fingerprint can be used to identify Zhongyan 103, so the invention also provides the use of the DNA fingerprint in identifying Zhongyan 103.

Accordingly, the present invention also provides a method for identifying Zhongyan 103, comprising the steps of:

Detecting, in the genomic DNA of the tobacco to be tested, the 101 SNP sites of the DNA fingerprint atlas database provided by the invention; comparing them with the specific nucleotides at these sites in the DNA fingerprint of Zhongyan 103; and identifying the tobacco to be tested as Zhongyan 103 when the coincidence rate is greater than or equal to 95%. Preferably, the genomic DNA is obtained from fresh leaves of the tobacco to be tested.

According to a specific embodiment, the invention provides a DNA fingerprint of Yunyan 87 comprising 101 SNP sites. The SNP sites and the specific nucleotides at them are as follows:

This DNA fingerprint can be used to identify Yunyan 87, so the invention also provides the use of the DNA fingerprint in identifying Yunyan 87.

Therefore, the invention also provides a method for identifying Yunyan 87, comprising the following steps:

Detecting, in the genomic DNA of the tobacco to be tested, the 101 SNP sites of the DNA fingerprint atlas database provided by the invention; comparing them with the specific nucleotides at these sites in the DNA fingerprint of Yunyan 87; and identifying the tobacco to be tested as Yunyan 87 when the coincidence rate is greater than or equal to 95%. Preferably, the genomic DNA is obtained from fresh leaves of the tobacco to be tested.

Specifically, the identification method provided by the present invention preferably comprises the following steps:

(1) extracting genomic DNA from a tissue sample of the tobacco to be tested and digesting it with the restriction endonucleases SacI and MseI;

(2) ligating the digested fragments obtained in step (1) to sequencing adapters; purifying them; selecting fragments 320-420 bp in length; then enriching, amplifying and sequencing the selected fragments on an Illumina NovaSeq sequencing system, with paired-end 150 bp raw reads and sequencing data in FASTQ format;

(3) comparing the nucleotides at the 101 SNP loci, determined from the sequencing data obtained in step (2), with the specific nucleotides at these loci in the DNA fingerprint of Ningxiang sun-cured tobacco, Zhongyan 103 or Yunyan 87, and identifying the tobacco to be tested as Ningxiang sun-cured tobacco, Zhongyan 103 or Yunyan 87, respectively, when the coincidence rate is greater than or equal to 95%.

In step (1) of the identification method, the tissue sample is fresh tobacco leaves, and the digestion system is as follows:

Digestion is carried out by incubation at 37 ℃ for more than 4 h.

The specific scheme of the invention is as follows.

The research process and principle of the invention are as follows. Germplasm of currently planted tobacco varieties is widely collected, a tobacco DNA database is built by simplified genome sequencing, variety-specific loci are screened and filtered from the database, and the specific loci are verified. The specific steps are: 1. widely collect germplasm of currently planted tobacco varieties; 2. extract DNA from each collected germplasm and analyse the quality of the extracted DNA; 3. digest each DNA sample with a suitable digestion method, ligate designed sequencing adapters to the digested fragments, purify the adapter-ligated fragments, select fragments by size, enrich and amplify the selected fragments, and sequence them; 4. select a published genome of a representative tobacco variety as the reference genome, digest it in silico as in step 3, select fragments with suitable positions and lengths, and analyse them with bioinformatics software; 5. align the reads of each sample to the reference genome with suitable software, set reasonable screening and filtering parameters, and build a database of specific sites for the planted tobacco varieties; 6. screen out the specific sites of particular tobacco varieties, and establish a fingerprint based on simplified genome sequencing and an identification method that uses it; 7. verify the reliability of the method with a blind test.

The method comprises the following specific steps:

1) Germplasm of 131 currently planted tobacco varieties was widely collected (see Table 1).

2) DNA was extracted from each of the 131 collected tobacco germplasms as follows: (1) grind fresh leaf tissue of each variety to a fine powder and place it in a 2 mL centrifuge tube; (2) add 600-800 μL of CTAB extraction buffer (preheated in a 65 ℃ water bath), mix, shake gently several times every 3-5 min, and after 20 min centrifuge at 12000 r/min for 10-15 min; (3) carefully draw off the supernatant, add an equal volume of phenol and chloroform (400 μL each), mix, and centrifuge at 4 ℃, 12000 r/min for 10-15 min; (4) carefully draw off the supernatant, add an equal volume of chloroform, mix, and centrifuge at 4 ℃, 12000 r/min for 10-15 min, repeating 2-3 times until no protein layer appears; (5) take the supernatant, precipitate at -20 ℃ for 1-2 h, and centrifuge at 4 ℃, 12000 r/min for 10-15 min; (6) discard the supernatant, wash the pellet twice with 50-70% ethanol, and dry it for later use. The quality of the extracted DNA was then checked by OD measurement and for degradation.

3) The genomic DNA was digested with the restriction enzymes SacI and MseI.

The digestion system was as follows:

Digestion conditions: incubation at 37 ℃ for more than 4 h.

A commercial sequencing company then ligated the digested fragments to designed sequencing adapters. The adapter-ligated fragments were purified and size-selected to 320-420 bp, and the selected DNA fragments were enriched, amplified and sequenced. Sequencing used an Illumina NovaSeq system with paired-end 150 bp raw reads in FASTQ format. The sequencing data statistics for all 131 tobacco germplasms are shown in Table 3.

4) A published genome of a representative tobacco variety was selected as the reference genome (ftp: net/genes/Nicotiana_tabacum/edwards_et_al_2017/association/). The chromosome-level assembly of this reference genome totals only 2.8 Gb, so the scaffold-level assembly was used instead. From it, 371,419 scaffolds of length ≥ 1 kb were selected, 4.2 Gb in total (see Table 4). The reference tobacco genome was digested in silico with the restriction enzymes SacI (GAGCTC) and MseI (TTAA). Fragments with a different enzyme cut site at each end and a length of 200-600 bp were selected, giving 355,695 fragments in total.
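The in-silico SacI + MseI double digest described above can be sketched for one scaffold as follows. This is a minimal illustration, not the pipeline actually used. It assumes the standard cut positions GAGCT^C (SacI) and T^TAA (MseI), measures fragment length on the top strand between cut points, and ignores the sticky-end overhangs. Both recognition sites are palindromic, so scanning one strand finds every site. The function names are hypothetical.

```python
import re

SACI = "GAGCTC"  # SacI recognition site; assumed top-strand cut GAGCT^C (offset 5)
MSEI = "TTAA"    # MseI recognition site; assumed top-strand cut T^TAA (offset 1)


def cut_positions(seq, site, offset):
    """Top-strand cut coordinates; the lookahead also catches overlapping sites."""
    return [m.start() + offset for m in re.finditer(f"(?={site})", seq)]


def sac_mse_fragments(seq, min_len=200, max_len=600):
    """Return (start, end) of fragments bounded by one SacI cut and one MseI cut
    whose top-strand length lies within [min_len, max_len]."""
    seq = seq.upper()
    cuts = sorted([(p, "SacI") for p in cut_positions(seq, SACI, 5)]
                  + [(p, "MseI") for p in cut_positions(seq, MSEI, 1)])
    fragments = []
    # Adjacent cut points delimit the fragments produced by the double digest.
    for (start, enz1), (end, enz2) in zip(cuts, cuts[1:]):
        if enz1 != enz2 and min_len <= end - start <= max_len:
            fragments.append((start, end))
    return fragments


# Usage on a toy sequence: one SacI site followed 300 bp later by one MseI site.
toy = "A" * 50 + "GAGCTC" + "C" * 300 + "TTAA" + "G" * 50
print(sac_mse_fragments(toy))  # [(55, 357)]
```

To reproduce the whole-genome count, this would be run over every retained scaffold (≥ 1 kb), with the fragments tallied. The FASTA parsing step is not shown.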

Quality statistics for the raw reads obtained in step 3) were computed with FastQC v0.11.5 (see Figure 1). Filtering used Trimmomatic-0.38 (data statistics in Table 4) with the following parameters: a. removing the first 5 bases of read 1 and the first 3 bases of read 2 of each read pair; b. removing leading bases with quality below 20 until a base with quality of at least 20 is reached, and likewise removing trailing bases with quality below 20; c. removing any run of 5 contiguous bases whose average quality is below 20; minimum read length 50 bp; mean read quality above 20.
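The trimming rules above can be sketched on a single read's Phred quality scores as below. This is a simplified re-implementation for illustration only, not Trimmomatic itself. The rules roughly correspond to Trimmomatic's HEADCROP, LEADING, TRAILING, SLIDINGWINDOW, MINLEN and AVGQUAL steps. For rule c it assumes Trimmomatic-style behaviour: the read is cut at the first 5-base window whose mean quality falls below 20. `trim_read` is a hypothetical name.

```python
from statistics import mean


def trim_read(quals, headcrop, q_min=20, window=5, min_len=50):
    """Apply the step-4 rules to one read's Phred scores.

    Returns the retained quality list, or None if the read is discarded.
    headcrop is 5 for read 1 and 3 for read 2, per the text.
    """
    q = list(quals[headcrop:])                     # a. drop the first bases
    start = 0
    while start < len(q) and q[start] < q_min:     # b. leading low-quality bases
        start += 1
    end = len(q)
    while end > start and q[end - 1] < q_min:      # b. trailing low-quality bases
        end -= 1
    q = q[start:end]
    for i in range(len(q) - window + 1):           # c. 5-base sliding window
        if mean(q[i:i + window]) < q_min:
            q = q[:i]                              # assumed: cut from the failing window on
            break
    if len(q) < min_len or mean(q) < q_min:        # minimum length 50, mean quality 20
        return None
    return q


# Usage: a 150 bp read 1 of uniform quality 30 keeps 145 bases after HEADCROP 5.
print(len(trim_read([30] * 150, headcrop=5)))  # 145
```

In paired-end data both mates would be trimmed this way, with headcrop 5 for read 1 and 3 for read 2. The pair is kept only if both mates survive; that pairing logic is not shown.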

5) Constructing a DNA fingerprint atlas database:

(1) Variants were detected from the reads obtained in step 4) with a BWA + GATK pipeline. (2) SNPs were selected for further fingerprint development. Low-quality SNPs matching QD < 2.0 || FS > 60.0 || MQ < 40.0 || MQRankSum < -12.5 || ReadPosRankSum < -8.0 || QUAL < 40 were filtered out, giving 978,426 high-quality SNPs. (3) vcftools was used to filter out SNPs with an allele frequency below 0.05 or a population missing rate above 20%, giving 104,788 SNPs. (4) The polymorphism information content (PIC) of each SNP was calculated. Based on the result (Figure 2), SNPs with PIC above 0.35 were kept for the next analysis, giving 65,582 SNPs. (5) SNPs on scaffolds longer than 50 kb were selected, giving 44,122 SNPs. (6) The SNPs were annotated and those in gene coding regions were selected (746 SNPs). SNPs with heterozygosity above 5% were then removed, giving the final 101 core SNPs.
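The hard-filter expression of step (5-2) is a logical OR of six tests: a SNP is discarded as soon as any one of them matches. In a GATK workflow the expression would normally be passed to VariantFiltration. The plain-Python sketch below only illustrates the logic, with each SNP's annotations held in a dict. It treats a missing annotation as not matching, which is an assumption. For example, MQRankSum is absent at sites with no heterozygous samples. `is_low_quality` and `keep_high_quality` are hypothetical names.

```python
import operator

# QD < 2.0 || FS > 60.0 || MQ < 40.0 || MQRankSum < -12.5 || ReadPosRankSum < -8.0 || QUAL < 40
HARD_FILTERS = (
    ("QD", operator.lt, 2.0),
    ("FS", operator.gt, 60.0),
    ("MQ", operator.lt, 40.0),
    ("MQRankSum", operator.lt, -12.5),
    ("ReadPosRankSum", operator.lt, -8.0),
    ("QUAL", operator.lt, 40.0),
)


def is_low_quality(annotations):
    """True if any test in the expression matches; missing annotations are skipped."""
    return any(
        annotations.get(key) is not None and compare(annotations[key], limit)
        for key, compare, limit in HARD_FILTERS
    )


def keep_high_quality(snps):
    """Keep only SNPs that match none of the tests."""
    return [snp for snp in snps if not is_low_quality(snp)]
```

Because the tests are joined by ||, relaxing one threshold can only let more SNPs through. This makes it easy to see which annotation drives the drop from 2,222,588 to 978,426 SNPs.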

6) Obtaining the DNA fingerprint of a specific tobacco variety:

6-1) Ningxiang sun-cured tobacco:

The DNA fingerprint of Ningxiang sun-cured tobacco is shown in Table 6. It was obtained by determining the specific nucleotides at the 101 sites from the sequencing results of sample T131 (Ningxiang sun-cured tobacco) in Table 1.

Next, a blind test of DNA identification was carried out with the Ningxiang sun-cured tobacco-specific loci obtained in step 6-1). Ten tobacco samples were drawn at random, of which sample No. 5 was Ningxiang sun-cured tobacco, to validate the simplified-genome-sequencing identification method for this variety. Sample No. 5 matched at 99 of the 101 core SNP sites, a coincidence rate above 95%, so it was judged to be Ningxiang sun-cured tobacco, which agreed with its true identity. The DNA identification method for Ningxiang sun-cured tobacco based on simplified genome sequencing is therefore reliable and can be used to identify Ningxiang sun-cured tobacco.

6-2) Zhongyan 103:

The DNA fingerprint of Zhongyan 103 is shown in Table 7. It was obtained by determining the specific nucleotides at the 101 sites from the sequencing results of sample T2 (Zhongyan 103) in Table 1.

Next, a blind test of DNA identification was carried out with the Zhongyan 103-specific loci obtained in step 6-2). Ten tobacco samples were drawn at random, of which sample No. 3 was Zhongyan 103, to validate the simplified-genome-sequencing identification method for Zhongyan 103. Sample No. 3 matched at 96 of the 101 core SNP sites, a coincidence rate above 95%, so it was judged to be Zhongyan 103, which agreed with its true identity. The DNA identification method for Zhongyan 103 based on simplified genome sequencing is therefore reliable and can be used to identify Zhongyan 103.

6-3) Yunyan 87:

The DNA fingerprint of Yunyan 87 is shown in Table 8. It was obtained by determining the specific nucleotides at the 101 sites from the sequencing results of sample T101 (Yunyan 87) in Table 1.

Next, a blind test of DNA identification was carried out with the Yunyan 87-specific loci obtained in step 6-3). Ten tobacco samples were drawn at random, of which sample No. 7 was Yunyan 87, to validate the simplified-genome-sequencing identification method for Yunyan 87. Sample No. 7 matched at 97 of the 101 core SNP sites, a coincidence rate above 95%, so it was judged to be Yunyan 87, which agreed with its true identity. The DNA identification method for Yunyan 87 based on simplified genome sequencing is therefore reliable and can be used to identify Yunyan 87.
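As a worked check of the 95% criterion in the three blind tests, the coincidence rate is taken as the number of matching sites divided by the 101 core SNP sites. The denominator of 101 is an assumption based on the size of the core SNP set.

```latex
\frac{n_{\mathrm{match}}}{101} \ge 0.95
\iff n_{\mathrm{match}} \ge 95.95
\iff n_{\mathrm{match}} \ge 96,
\qquad
\frac{99}{101} \approx 98.02\%, \quad
\frac{96}{101} \approx 95.05\%, \quad
\frac{97}{101} \approx 96.04\%
```

On this reading, the Zhongyan 103 sample (96 matching sites) clears the threshold by a single site, while the Ningxiang sun-cured tobacco and Yunyan 87 samples have a margin of three and one sites respectively.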

Unlike the prior art, the method uses the specificity of tobacco DNA as the basis for identification. It is therefore more objective, accurate and reliable than traditional identification by appearance and smoking evaluation.

Table 1 Collected tobacco variety germplasm

Table 2 Results of quality analysis of the extracted DNA

Table 3 Simplified sequencing data statistics for the tobacco germplasm

Table 4 Reference tobacco genome scaffold statistics

Scaffold length   No. of scaffolds   Length of scaffolds (bp)
0-1 kb            713,013            447,478,500
1-10 kb           354,214            701,677,344
10-100 kb         8,431              342,749,966
100-500 kb        7,092              1,645,198,586
500 kb-1 Mb       1,230              840,375,009
>1 Mb             452                717,469,393
Total             1,084,432          4,694,948,798
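A quick arithmetic check confirms that Table 4 is internally consistent with the figures quoted in the text (371,419 scaffolds ≥ 1 kb, about 4.2 Gb). The sketch assumes, as the table's bins suggest, that every bin from "1-10 kb" upward counts as "≥ 1 kb". The names `TABLE_4` and `summarize` are illustrative only.

```python
# Rows of Table 4: (length bin, number of scaffolds, total length in bp)
TABLE_4 = [
    ("0-1 kb", 713_013, 447_478_500),
    ("1-10 kb", 354_214, 701_677_344),
    ("10-100 kb", 8_431, 342_749_966),
    ("100-500 kb", 7_092, 1_645_198_586),
    ("500 kb-1 Mb", 1_230, 840_375_009),
    (">1 Mb", 452, 717_469_393),
]


def summarize(rows):
    """Return (all scaffolds, all bp, scaffolds >= 1 kb, bp in scaffolds >= 1 kb)."""
    total_n = sum(n for _, n, _ in rows)
    total_bp = sum(bp for _, _, bp in rows)
    kept = rows[1:]  # every bin from 1 kb upward
    kept_n = sum(n for _, n, _ in kept)
    kept_bp = sum(bp for _, _, bp in kept)
    return total_n, total_bp, kept_n, kept_bp


print(summarize(TABLE_4))  # (1084432, 4694948798, 371419, 4247470298)
```

The totals match the "Total" row, and the ≥ 1 kb subset gives 371,419 scaffolds and 4,247,470,298 bp, i.e. the 4.2 Gb used as the reference.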

Table 5 Heterozygosity of individual samples at the core SNP markers (samples with heterozygosity of 0 are omitted)

Drawings

Embodiments of the invention are described in detail below with reference to the attached drawing figures, wherein:

Fig. 1 shows quality statistics of the raw reads: Fig. 1A, per-base sequence content of read 1 of the paired-end reads; Fig. 1B, per-base sequence quality of read 1; Fig. 1C, per-base sequence content of read 2; Fig. 1D, per-base sequence quality of read 2.

FIG. 2 shows the polymorphism information content distribution of 104,788 SNP markers.

Detailed Description

The invention is illustrated below with reference to specific examples. The experimental procedures in the following examples are conventional unless otherwise specified. The raw materials and reagent materials used in the following examples are all commercially available products unless otherwise specified.
