Apostichopus japonicus breeding whole genome 50K SNP chip and application

Document No.: 1961430 Publication date: 2021-12-14

Reading note: this technology, "Apostichopus japonicus breeding whole genome 50K SNP chip and application", was designed and created by Wang Yangfan (王杨帆), Wang Mengqiu (王孟秋), Ni Ping (倪萍), Lv Jia (吕佳), Wang Shi (王师), Hu Jingjie (胡景杰) and Bao Zhenmin (包振民) on 2021-07-07. Main content: the invention discloses a whole-genome 50K SNP chip for Apostichopus japonicus breeding and its application, comprising: (1) development of a genome-wide 50K SNP chip for Apostichopus japonicus: a sample population is constructed, SNPs are genotyped genome-wide, the 50K SNP markers are screened, and the HD-Marker high-density chip and its probes are designed and developed, yielding a liquid-phase chip pool of 48K sites; (2) testing of the accuracy and genotyping performance of the chip: the quality of the DNA samples is checked, the HD-Marker chip is run, and the results are analyzed to confirm that the chip has high accuracy and good genotyping performance. The chip can be applied to genetic background analysis of different Apostichopus japonicus populations and to trait association analysis of Apostichopus japonicus.

1. An Apostichopus japonicus breeding whole-genome 50K SNP chip, characterized in that: the SNP chip comprises an SNP marker combination for Apostichopus japonicus breeding and a liquid-phase breeding chip for Apostichopus japonicus breeding; the SNP marker combination for breeding the high-temperature resistance trait of Apostichopus japonicus consists of 48755 SNP sites, the nucleotide sequences of which are shown as SEQ ID No.001 to SEQ ID No.48755, and the length of each SNP site sequence is 49 bp; the liquid-phase breeding chip comprises 48755 pairs of probe sequences, each SNP site corresponding to two probe sequences, namely a Forward probe and a Reverse probe.

2. The Apostichopus japonicus breeding whole-genome 50K SNP chip according to claim 1, characterized in that: the chip is applied to genetic background analysis of different populations of Apostichopus japonicus.

3. The Apostichopus japonicus breeding whole-genome 50K SNP chip according to claim 1, characterized in that: the chip is applied to trait association analysis of Apostichopus japonicus.

4. The Apostichopus japonicus high-temperature-resistance breeding low-density 12K SNP chip according to claim 1, characterized in that: the chip is applied to genome-wide selective breeding value analysis of the high-temperature resistance trait.

5. A method for preparing the Apostichopus japonicus breeding whole-genome 50K SNP chip according to claims 1 to 4, comprising the following steps:

S1, adding 500 µl STE lysis buffer, 50 µl 10% SDS, 3.5 µl proteinase K (20 mg/ml) and 16 µl RNase A (100 mg/ml) to a 1.5 ml tube, the lysis buffer comprising 100 mM NaCl, 10 mM Tris-Cl (pH 8.0) and 1 mM EDTA (pH 8.0); adding 0.1 g of scallop adductor muscle, cutting it up and grinding it with a grinding rod until flocculent; incubating at 56 ℃ for 2 h, mixing by inversion every 30 min, until the lysate is clear; adding 500 µl Tris-saturated phenol and 100 µl chloroform/isoamyl alcohol (volume ratio 24:1), shaking gently for 20 min, and centrifuging at 12000 rpm at room temperature for 10 min;

S2, transferring the supernatant to a new 1.5 ml EP tube, adding 300 µl Tris-saturated phenol and 300 µl chloroform/isoamyl alcohol (volume ratio 24:1), shaking gently for 20 min, centrifuging at 12000 rpm at room temperature for 10 min, and repeating step S2 two to three times until no protein layer remains;

S3, taking the supernatant, adding an equal volume (about 500 µl) of chloroform/isoamyl alcohol, shaking gently for 20 min, and centrifuging at 8000 rpm at room temperature for 10 min; adding 1 ml ice-cold absolute ethanol and 50 µl sodium acetate (3 M), leaving at 20 ℃ for 40 min, and centrifuging at 12000 rpm for 10 min to precipitate the nucleic acid; discarding the supernatant, washing the precipitate twice with 70% ethanol, centrifuging at 8000 rpm at low temperature for 5 min each time, and drying until the ethanol has completely evaporated; dissolving in 30 µl ddH2O, adding 0.75 µl RNase and digesting RNA at 37 ℃ for 1.5 h; quantifying the DNA with a Qubit kit, checking DNA quality by 1% agarose gel electrophoresis, and storing the extracted DNA at -20 ℃ for later use;

S4, fragmenting the extracted Apostichopus japonicus genomic DNA with a Covaris sonicator, with the fragment size set to about 350 bp; performing end repair and A-tailing of the DNA fragments with a genomic DNA library construction kit, ligating adapters to both ends, and amplifying the library with barcoded primers to complete library construction; quantifying the library with Qubit 2.0 and sequencing it on an Illumina HiSeq X Ten PE150 platform; building the index of the reference sequence with the index command of BWA, the index command of Samtools and CreateSequenceDictionary of Picard; aligning the paired-end sequencing reads with the bwa mem command to generate a bam file, and sorting it with the sort command of Samtools to generate a sorted bam file; discarding duplicate sequences with the Picard MarkDuplicates command with the parameter REMOVE_DUPLICATES=true; building an index for the bam file of each individual with Samtools index; and detecting variants with GATK, specifically with the HaplotypeCaller module of GATK, which is suitable for population-level variant detection: variants are called for all samples, a gvcf file is generated for each sample, joint genotyping of the population is then performed, and the variant and genotype data of each individual are corrected according to the variant information of the population;

S5, selecting biallelic SNP loci, and filtering out regions with more than 3 SNP loci within a 10 bp window; filtering out low-quality SNP sites with QD < 2.0, FS > 20.0, MQ < 40.0, DP < 6.0, DP > 1000.0, MQRankSum < -12.5 or ReadPosRankSum < -8.0; filtering out sites with a minor allele frequency below 0.05; finally obtaining 9.67 million high-quality SNPs for the subsequent low-density SNP selection;

S6, screening out the SNPs shared by the different populations, then thinning or deleting the shared SNPs that are in high linkage disequilibrium, using an LD r² > 0.35 as the criterion for deleting SNPs; constructing a constrained SNP selection optimization model based on Wright's FST, the average Euclidean distance of SNP allele frequencies and information entropy, and solving it with an optimization package in R to obtain a set of highly informative SNPs that are distributed more evenly across the genome; including SNP sites related to the functions of important cloned Apostichopus japonicus genes associated with the high-temperature resistance trait; obtaining, by GWAS association analysis and according to GWAS P values, SNP sites significantly associated with skin weight and the high-temperature resistance trait; and annotating the high-quality SNPs with SnpEff software to determine the gene element in which each SNP lies and its effect on amino acid changes;

S7, selecting the 22 bp sequence upstream and the 22 bp sequence downstream of each SNP site as the specific probes of that site, and designing and evaluating the flanking probes according to the following HD-Marker probe design principles:

The GC content of the flanking probe sequences must be between 40% and 60%

The Tm value must be between 55 ℃ and 65 ℃

The flanking probes must not contain a run of more than 5 consecutive identical bases

No more than 5 regions may match the flanking sequence at more than 80%

The number of variant sites within the flanking sequences of the probe must not exceed 3

The number of sites meeting the design criteria is 48755, and the information of the sites that pass is integrated to form an HD-Marker liquid-phase chip pool containing site information, probe sequences and annotation information. Of the sites on the chip, 22778 are located in genic regions, covering 6955 genes, and 25977 are located in intergenic regions. The 5' end of the upstream probe and the 3' end of the downstream probe are each joined to a 22 bp universal primer sequence for Illumina platform sequencing to form the Forward probe and the Reverse probe. The Forward probes of all sites are pooled into a Forward probe pool, the Reverse probes of all sites are pooled into a Reverse probe pool, and the two pools are synthesized to obtain a liquid-phase chip pool of 48K sites.

Technical field:

The invention relates to the fields of molecular biology, functional genomics, bioinformatics and molecular breeding, in particular to the screening of SNP sites related to growth traits, to an Apostichopus japonicus genome SNP chip and its preparation method, and to applications of the Apostichopus japonicus SNP chip.

Background art:

Molecular markers are genetic markers based on nucleotide sequence variation in the genetic material between individuals, and they directly reflect genetic polymorphism at the DNA level. Compared with other genetic markers (morphological, biochemical and cytological markers), DNA molecular markers have the following advantages: most molecular markers are co-dominant, which makes the selection of recessive traits convenient; genomic variation is extremely abundant, so the number of molecular markers is almost unlimited; and detection is simple and rapid. With the development of molecular biology, dozens of DNA molecular marker technologies have emerged, and they are widely applied in genetic breeding, genome mapping, gene localization, identification of genetic relationships among species, gene library construction, gene cloning and other areas. Genetic markers have gone through four stages of development: the morphological level; the cytological and chromosomal level; the protein and isozyme level; and the DNA molecular level (Zhu Yuxian, Li Yi, Zheng Xiaofeng, Guo Hongwei. Modern Molecular Biology (5th edition) [J]. Life World, 2019(07): 2.). DNA molecular markers have significant advantages over the earlier markers: they are expressed in the form of DNA, are not limited by developmental stage or external environmental factors, and can be detected in every tissue of an organism; they are distributed throughout the genome, are numerous and highly polymorphic, occur naturally without artificial modification, and do not affect the natural state of the species; their detection is quick and simple (Li Weimin, Li Shufeng, et al. Application progress of DNA molecular markers in genetic diversity research of wild soybeans [J]. Chinese Agricultural Science Bulletin, 2014, 30(21): 246-250.); and most DNA markers are selectively neutral and do not affect biological traits.

Single nucleotide polymorphism (SNP) markers are one representative DNA molecular marker technology. Compared with existing markers, SNP markers have significant advantages (Zhuibo, Zhanli. Single nucleotide polymorphism and its progress in the field of veterinary medicine [J]. Modern Veterinary Medicine, 2020(07): 48-51.): because SNP loci in diploid organisms are mostly biallelic, the allele frequency at each locus is easy to estimate; because they arise from single-nucleotide mutations, SNP markers are abundant across the whole genome, and compared with microsatellites, a widely used DNA molecular marker, they are more widely distributed and more stable in the genome; SNP mutations in coding regions are either synonymous or non-synonymous, and a non-synonymous SNP changes the protein sequence of the translated gene product and can therefore change protein structure or gene expression level, so SNPs in coding regions are of particular research importance; SNP markers can be screened quickly and on a large scale, and the subsequent data analysis is essentially automated, which greatly shortens the research cycle (Fang Xuanjun, Wu Weiren, Tang Jiliang. Crop DNA Marker-Assisted Breeding [M]. Science Press, 2001.); SNP markers are unevenly distributed at the genome-wide level, and most SNP mutations in nature occur in non-coding regions of the genome. With the development of high-throughput sequencing technology, sequencing-based SNP screening is attracting increasing attention from researchers.

Although whole-genome resequencing yields the most comprehensive information on genomic variation, its cost remains high when it is applied to large-scale analysis of hundreds or even thousands of individuals. Reduced-representation or low-depth resequencing technologies lower the sequencing cost, but because the sequenced sites are random, they can hardly achieve comprehensive coverage and genotyping of the many known genes or gene pathways related to important traits (Ruiqiang Li, Yingrui Li, Xiaodong Fang, Huanming Yang, Jian Wang, Karsten Kristiansen, Jun Wang. SNP detection for massively parallel whole-genome resequencing [J]. Genome Research (Cold Spring Harbor Laboratory Press), 2009, 19(6); Xiangyang Xu, Guihua Bai. Whole-genome resequencing: changing the paradigms of SNP detection, molecular mapping and gene discovery [J]. Molecular Breeding, 2015, 35). Gene chip technology is a highly accurate and reproducible technology for genotyping target sites (Jay Shendure, Robi D. Mitra, Chris Varma, George M. Church. Advanced sequencing technologies: methods and goals [J]. Nature Reviews Genetics, 2004, 5(5)) and is widely used in breeding research on model organisms, crops, livestock and poultry (Andreas Kranis, Almas A Gheyas, Clarissa Boschiero, Frances Turner, Le Yu, et al.). However, commercial gene chips are lacking for most cultured aquatic species (Shikai Liu, Luyang Sun, Yun Li, Fanyue Sun, Yanliang Jiang, Yu Zhang, Jiaren Zhang, Jianbin Feng, Ludmilla Kaltenboeck, Huseyin Kucuktas, Zhanjiang Liu. Development of the catfish 250K SNP array for genome-wide association studies [J]. BMC Research Notes (BioMed Central), 2014, 7(1)), customization is expensive, and solid-phase chips can hardly meet the need for flexible site selection. The HD-Marker technology is a genotyping technology based on liquid-phase molecular hybridization. Unlike a solid-phase chip platform, it achieves high-throughput screening and analysis of up to ten thousand known genetic variant sites through a highly integrated probe hybridization-extension-ligation reaction in a single PCR tube. The technology combines the site selection flexibility of liquid-phase hybridization with the high throughput and low cost of a high-throughput sequencing platform, overcomes the technical bottlenecks of existing solid-phase custom chip platforms (high cost, poor flexibility and difficulty of large-scale application), and provides non-model organisms with an efficient and flexible targeted genotyping technology compatible with different throughput levels and different marker types (Lv J, Jiao W, Guo H, et al. HD-Marker: a highly multiplexed and flexible approach for targeted genotyping of more than 10,000 genes in a single-tube assay. Genome Research, 2018, 28(12): 1919-1930; Zhu X, Wang J, Lv J, et al. Sequence-Based Transcriptome-Wide Targeted Genotyping for Evolutionary and Ecological Studies. Evolutionary Bioinformatics, 2019, 15: 1176934319836074.). Because HD-Marker has significant advantages in genotyping accuracy, flexibility and cost, it is a high-throughput marker genotyping technology with application potential in the molecular breeding of Apostichopus japonicus.

Apostichopus japonicus, also known as Stichopus japonicus, belongs to the phylum Echinodermata, class Holothuroidea, order Aspidochirotida, family Stichopodidae, genus Apostichopus (Liao Yulin. Fauna Sinica: Echinodermata, Holothuroidea [M]. Science Press, 1997.). Apostichopus japonicus culture is a very important component of the world aquaculture industry. Since 2003 the total culture yield and area of Apostichopus japonicus have grown continuously; in 2019 the culture area reached 246,700 hectares and the annual output reached 171,700 tons, with a direct economic output value of more than 30 billion yuan. Apostichopus japonicus has become one of the single mariculture species with the highest output value in China, the main part of the fifth wave of mariculture following algae, prawns, shellfish and fish, and a pillar industry of the coastal fishery economy. According to incomplete statistics, the Apostichopus japonicus culture industry currently employs more than 700,000 people and has attracted huge capital investment; it has driven the development of related industries such as processing, feed and health food, and has opened a new way to adjust the coastal economic structure and to increase the employment and income of fishermen (The present situation, the problems and the countermeasure discussion (on) of the sea cucumber breeding industry in China [J]. Scientific Fish Farming, 2021(02): 24-25.). However, the explosive expansion of culture scale has led to increasingly serious inbreeding depression (PA Hohenlohe et al, 2010), and the germplasm resources of Apostichopus japonicus have declined severely, which seriously affects the development of the culture industry. 
In addition, Apostichopus japonicus is extremely sensitive to temperature change, and its physiological activity changes with water temperature. In general, feeding and movement decrease when the water temperature is high, that is, when it reaches 18 ℃. When the water temperature rises above 20 ℃, the animals change their behavior: they move into the reefs in deeper water, their feeding and movement drop to a very low level or even stop, and they enter aestivation (F Li et al, 1996). Aestivation is a hallmark biological trait of Apostichopus japonicus and is a self-protective stress response by which individuals adapt to a high-temperature environment and maintain basic survival (F Li et al, 1996; Y Liu et al, 1996). In the summer of 2018, most of Liaoning experienced sustained high temperatures, with local maxima even exceeding 40 ℃, and the sea cucumber culture industry suffered an unprecedented blow. The sea cucumber culture area of Liaoning province is 1.844 million mu, of which 985,000 mu is pond culture. Preliminary statistics show that the affected sea cucumber culture area was 950,000 mu, the lost yield was 68,000 tons, and the direct economic loss was 6.87 billion yuan RMB.

In view of the above, applying molecular biological methods to improve germplasm resources in echinoderms is both important and urgent. However, a stable and efficient liquid-phase chip is still lacking in current molecular breeding research on Apostichopus japonicus. A liquid-phase chip designed on the basis of the HD-Marker technology can provide an important technical means for research on economic traits, variety identification and marker-assisted breeding of Apostichopus japonicus, and can meet the needs of its large-scale commercial breeding.

Summary of the invention:

SNP chip technology provides a convenient and efficient tool for genotyping target sites in research on economically cultured animals and crops, but no commercial SNP chip is yet available for non-model organisms such as Apostichopus japonicus. The invention aims to provide a reliable technical platform for trait selection and other genetic breeding work on Apostichopus japonicus, to promote the development of its molecular breeding in China, and to provide a preliminary theoretical reference and methodological guidance for the design and development of chips for other aquatic organisms. This work develops high-throughput SNP markers for Apostichopus japonicus populations from different regions and carries out fine mapping of growth and high-temperature resistance traits. An SNP chip containing 50K sites is designed and developed, so that the genetic parameters of Apostichopus japonicus traits can be evaluated quickly and accurately. This provides an efficient and reliable technical means to guide researchers in the molecular breeding of Apostichopus japonicus and meets the needs of breeding improved varieties.

The invention provides an Apostichopus japonicus breeding whole-genome 50K SNP chip, which comprises an SNP marker combination for Apostichopus japonicus breeding and a liquid-phase breeding chip for Apostichopus japonicus breeding. The SNP marker combination for breeding the high-temperature resistance trait of Apostichopus japonicus consists of 48755 SNP sites, the nucleotide sequences of which are shown as SEQ ID No.001 to SEQ ID No.48755, and the length of each SNP site sequence is 49 bp. The liquid-phase breeding chip comprises 48755 pairs of probe sequences, each SNP site corresponding to two probe sequences, namely a Forward probe and a Reverse probe.

The invention also provides a preparation method for the Apostichopus japonicus breeding whole-genome 50K SNP chip and related applications of the chip, in particular its use in genetic background analysis of different Apostichopus japonicus populations and in trait association analysis of Apostichopus japonicus.

A preparation method of a low-density 12K SNP chip for high-temperature resistant breeding of apostichopus japonicus comprises the following steps:

1. construction of Apostichopus japonicus sample population

A total of 500 Apostichopus japonicus individuals are randomly sampled from different sea areas of Liaoning and Shandong; tissue is taken, placed in 95% ethanol, and brought back to the laboratory for storage until use.

2. Genome-wide SNP genotyping of Apostichopus japonicus

2.1 DNA extraction

① Add 500 µl STE lysis buffer (100 mM NaCl; 10 mM Tris-Cl, pH 8.0; 1 mM EDTA, pH 8.0), 50 µl 10% SDS, 3.5 µl proteinase K (20 mg/ml) and 16 µl RNase A (100 mg/ml) to a 1.5 ml tube. Add about 0.1 g of scallop adductor muscle, cut it up and grind it with a grinding rod until flocculent. Incubate at 56 ℃ for about 2 h, mixing by inversion every 30 min, until the lysate is clear.

② Add 500 µl Tris-saturated phenol and 100 µl chloroform/isoamyl alcohol (24:1), shake gently for 20 min, and centrifuge at 12000 rpm at room temperature for 10 min.

③ Transfer the supernatant to a new 1.5 ml EP tube, add 300 µl Tris-saturated phenol and 300 µl chloroform/isoamyl alcohol (24:1), shake gently for 20 min, and centrifuge at 12000 rpm at room temperature for 10 min.

④ Repeat step ③ two to three times until no protein layer remains.

⑤ Take the supernatant, add an equal volume (about 500 µl) of chloroform/isoamyl alcohol, shake gently for 20 min, and centrifuge at 8000 rpm at room temperature for 10 min.

⑥ Add 1 ml ice-cold absolute ethanol and 50 µl sodium acetate (3 M) to the supernatant, leave at 20 ℃ for 40 min, and centrifuge at 12000 rpm for 10 min to precipitate the nucleic acid.

⑦ Discard the supernatant and wash the precipitate twice with 70% ethanol, centrifuging at 8000 rpm for 5 min each time.

⑧ Dry until the ethanol has completely evaporated, dissolve in 30 µl ddH2O, then add 0.75 µl RNase and digest RNA at 37 ℃ for 1.5 h.

⑨ Quantify the DNA with a Qubit kit and check DNA quality by 1% agarose gel electrophoresis. Store the extracted DNA at -20 ℃ for later use.

2.2 Library construction and sequencing

The extracted Apostichopus japonicus genomic DNA is fragmented with a Covaris sonicator, with the fragment size set to about 350 bp. The DNA fragments are end-repaired and A-tailed with a genomic DNA library construction kit, adapters are ligated to both ends, and the library is amplified with barcoded primers to complete library construction. The library is quantified with Qubit 2.0 and sent to a sequencing company, where the insert size and effective concentration of the library are determined; after passing quality control, the library is sequenced on an Illumina HiSeq X Ten PE150 platform.

2.3 Processing, alignment and genotyping of the resequencing data

① Building the reference genome index

The index of the reference sequence is built with the index command of BWA, the index command of Samtools, and CreateSequenceDictionary of Picard.

② Sequence alignment

The paired-end sequencing reads are aligned with the bwa mem command to generate a bam file, which is then sorted with the sort command of Samtools to generate a sorted bam file.

③ Removing PCR duplicates

Because of possible amplification bias during PCR, fragments at some positions are over-amplified, so that a large number of redundant sequences pile up at those positions and cause genotyping errors. PCR duplicates are therefore removed to eliminate the false positive sequences generated during the PCR experiment. The Picard MarkDuplicates command (run from the Picard jar) is used with the parameter REMOVE_DUPLICATES=true to discard the duplicate sequences.

④ Indexing the bam files

An index is built with Samtools index for the bam file of each individual, which prepares the files for the subsequent GATK workflow.

⑤ GATK genotyping

Variants are detected with the HaplotypeCaller module of GATK in the mode suited to population-level variant detection: a gvcf file is first generated for each sample, joint genotyping of the population is then performed, and the variant and genotype data of each individual are corrected according to the variant information of the population.
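The workflow in steps ① to ⑤ can be sketched as an ordered list of command lines. This is a minimal sketch under stated assumptions, not the inventors' actual script: the text does not give file names, tool versions or full parameter lists, so the file naming, the read-group string, the reading of "index command of Samtools" as `samtools faidx` for the reference, the use of `CombineGVCFs` before `GenotypeGVCFs`, and GATK4-style argument syntax are all assumptions. The function only builds the commands; nothing is executed.

```python
from pathlib import Path


def build_pipeline(ref, samples, picard_jar="picard.jar", out_prefix="cohort"):
    """Return the workflow as an ordered list of argv lists (nothing is run).

    ref      path to the reference FASTA
    samples  dict: sample name -> (read1.fq, read2.fq)
    All output file names below are hypothetical.
    """
    picard = ["java", "-jar", picard_jar]
    ref_dict = str(Path(ref).with_suffix(".dict"))
    cmds = [
        # Step 1: index the reference for BWA, Samtools and Picard/GATK
        ["bwa", "index", ref],
        ["samtools", "faidx", ref],
        picard + ["CreateSequenceDictionary", f"R={ref}", f"O={ref_dict}"],
    ]
    gvcfs = []
    for name, (fq1, fq2) in samples.items():
        # GATK needs a read group; this one is an assumption
        read_group = f"@RG\\tID:{name}\\tSM:{name}\\tPL:ILLUMINA"
        sam, sorted_bam = f"{name}.sam", f"{name}.sorted.bam"
        dedup_bam, gvcf = f"{name}.dedup.bam", f"{name}.g.vcf.gz"
        cmds += [
            # Step 2: align paired-end reads, then coordinate-sort
            ["bwa", "mem", "-R", read_group, "-o", sam, ref, fq1, fq2],
            ["samtools", "sort", "-o", sorted_bam, sam],
            # Step 3: discard PCR duplicates
            picard + ["MarkDuplicates", f"I={sorted_bam}", f"O={dedup_bam}",
                      f"M={name}.dup_metrics.txt", "REMOVE_DUPLICATES=true"],
            # Step 4: index the de-duplicated bam
            ["samtools", "index", dedup_bam],
            # Step 5a: per-sample gvcf
            ["gatk", "HaplotypeCaller", "-R", ref, "-I", dedup_bam,
             "-O", gvcf, "-ERC", "GVCF"],
        ]
        gvcfs.append(gvcf)
    # Step 5b: joint genotyping over the whole population
    combined = f"{out_prefix}.g.vcf.gz"
    combine = ["gatk", "CombineGVCFs", "-R", ref]
    for gvcf in gvcfs:
        combine += ["-V", gvcf]
    cmds.append(combine + ["-O", combined])
    cmds.append(["gatk", "GenotypeGVCFs", "-R", ref, "-V", combined,
                 "-O", f"{out_prefix}.vcf.gz"])
    return cmds
```

Each returned list can be passed to `subprocess.run(cmd, check=True)` once the tools are installed; keeping the commands as data makes the order of the five steps easy to inspect before anything is run.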

3. Apostichopus japonicus 50K SNP marker screening

3.1 Preliminary filtering of SNP markers

The raw variant sites are filtered through the following steps in turn to generate a high-quality SNP data set.

① Select the biallelic SNP loci.

② Filter out regions where SNPs are too dense, that is, more than 3 SNP sites within a 10 bp window (Bowen et al, 2011).

③ Filter out low-quality SNP sites according to the hard-filtering parameters recommended on the official GATK website, that is, QD < 2.0, FS > 20.0, MQ < 40.0, DP < 6.0, DP > 1000.0, MQRankSum < -12.5, ReadPosRankSum < -8.0.

④ Filter out sites with a minor allele frequency below 0.05.

⑤ Finally, 9.67 million high-quality SNPs are obtained for the subsequent low-density SNP selection.
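The filters in steps ① to ④ can be expressed as a few small predicates. The following is a minimal sketch, assuming each site's GATK annotations are available as a dictionary and that SNP positions are sorted within one chromosome; the function names and the input layout are hypothetical and not taken from the original pipeline. Annotations missing from a site are simply not tested.

```python
# A site fails the hard filter if any of these conditions holds
# (thresholds as listed in step 3).
FAIL_RULES = {
    "QD": lambda v: v < 2.0,
    "FS": lambda v: v > 20.0,
    "MQ": lambda v: v < 40.0,
    "DP": lambda v: v < 6.0 or v > 1000.0,
    "MQRankSum": lambda v: v < -12.5,
    "ReadPosRankSum": lambda v: v < -8.0,
}


def passes_hard_filter(info):
    """info: dict of annotation name -> value for one site."""
    return not any(name in info and rule(info[name])
                   for name, rule in FAIL_RULES.items())


def dense_positions(positions, window=10, max_snps=3):
    """Return the positions lying in any `window`-bp stretch that holds
    more than `max_snps` SNPs. `positions` must be sorted (one chromosome)."""
    flagged = set()
    left = 0
    for right, pos in enumerate(positions):
        # shrink the window until it spans fewer than `window` bases
        while pos - positions[left] >= window:
            left += 1
        if right - left + 1 > max_snps:
            flagged.update(positions[left:right + 1])
    return flagged


def minor_allele_frequency(alt_count, total_alleles):
    """Frequency of the rarer allele at a biallelic site."""
    p = alt_count / total_alleles
    return min(p, 1.0 - p)


def keep_site(n_alleles, info, alt_count, total_alleles, is_dense, min_maf=0.05):
    """Apply steps 1 to 4 to one site."""
    return (n_alleles == 2
            and not is_dense
            and passes_hard_filter(info)
            and minor_allele_frequency(alt_count, total_alleles) >= min_maf)
```

For example, `dense_positions([100, 102, 104, 106, 200])` flags the first four positions (four SNPs within 10 bp) and leaves position 200 alone.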

3.2 Optimized selection of SNP markers

To estimate reliable genetic parameters for the molecular breeding of individual animals, a suitable SNP set must be screened further from the preliminarily filtered SNPs. Two conditions generally have to be met: first, the selected SNPs are shared by the different geographical populations of the species; second, the selected SNPs are highly informative, so that individual SNP effects and breeding values can be evaluated accurately.

① The SNPs shared by the different populations are screened out, and the shared SNPs that are in high linkage disequilibrium are then thinned or deleted, using an LD r² > 0.35 as the criterion for deleting SNPs. The results show that screening with this criterion significantly reduces the number of SNP markers required while preserving the accuracy with which SNP breeding parameters are estimated.

② To screen highly informative SNPs, a constrained SNP selection optimization model is constructed from several statistical and composite indices, such as Wright's FST, the average Euclidean distance of SNP allele frequencies and information entropy, and is solved with an optimization package in R. This yields a set of highly informative SNPs that are distributed more evenly across the genome.

③ SNP sites related to the functions of important cloned growth and high-temperature resistance genes of Apostichopus japonicus reported in the literature are included, and SNP sites significantly associated with skin weight and high-temperature resistance are obtained by GWAS association analysis according to their GWAS P values.

④ The high-quality SNPs are annotated with SnpEff software to determine the gene element in which each SNP lies, its effect on amino acid changes, and so on.

Finally, following steps ① to ④, the total number of selected SNPs is controlled at about 50,000 markers.
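The LD thinning in step ① and the information-entropy index in step ② can be illustrated together with a small greedy procedure. This is only a sketch under stated assumptions: the text does not say how r² is computed, whether a sliding window is used, or how the R optimization model is formulated. Here r² is taken as the squared Pearson correlation of genotype dosages (0/1/2), all pairs are compared without a window, and SNPs are visited from highest to lowest allele-frequency entropy; all three choices, and the function names, are hypothetical.

```python
import math


def allele_entropy(genotypes):
    """Shannon entropy (bits) of the allele frequency at a biallelic SNP.
    genotypes: list of dosages 0, 1 or 2, one per individual."""
    p = sum(genotypes) / (2 * len(genotypes))
    return -sum(x * math.log2(x) for x in (p, 1.0 - p) if x > 0)


def r_squared(a, b):
    """Squared Pearson correlation between two genotype dosage vectors."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a monomorphic SNP carries no LD information
    return cov * cov / (var_a * var_b)


def ld_prune(snps, threshold=0.35):
    """Greedy LD thinning.

    snps: dict SNP name -> genotype dosage list (same individuals, same order).
    SNPs are visited from most to least informative; a SNP is kept only if
    its r^2 with every SNP already kept does not exceed `threshold`.
    """
    order = sorted(snps, key=lambda s: allele_entropy(snps[s]), reverse=True)
    kept = []
    for name in order:
        if all(r_squared(snps[name], snps[k]) <= threshold for k in kept):
            kept.append(name)
    return kept
```

With 9.67 million candidate SNPs an all-pairs comparison is not practical; a real run would restrict comparisons to neighboring SNPs on the same chromosome, which is a design choice the text leaves open.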

4. Design and development of the HD-Marker high-density chip and its probes

4.1 Design and screening of targeted probes

Following the design concept of the HD-Marker probe pool, the 22 bp sequence upstream and the 22 bp sequence downstream of each SNP site are selected as the specific probes of that site. The flanking probes are designed and evaluated according to the following HD-Marker probe design principles:

the flanking probe sequences need to meet a GC content of between 40% and 60%

A Tm value of 55 to 65 ℃,

regions within the flanking probes that cannot have more than 5 contiguous bases

The region with more than 80% flanking sequence match cannot be greater than 5

The number of variation sites within the flanking sequences of the probe does not exceed 3.
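Most of these design rules can be checked per flanking sequence, as in the following minimal Python sketch. Several points are assumptions rather than part of the source: the Tm method is not stated, so a basic GC-based estimate is used as a stand-in; the contiguous-base rule is read here as a limit on runs of identical bases; and the 80% match rule is omitted because it needs a genome alignment. All function names are hypothetical.

```python
import re
from typing import Callable


def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in a non-empty sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def basic_tm(seq: str) -> float:
    """Basic GC-based Tm estimate for oligos longer than 13 nt.
    Assumption: the source does not say how Tm was computed."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    return 64.9 + 41.0 * (gc - 16.4) / len(seq)


def longest_run(seq: str) -> int:
    """Length of the longest run of one repeated character."""
    return max((len(m.group(0)) for m in re.finditer(r"(.)\1*", seq.upper())),
               default=0)


def flank_passes(seq: str,
                 n_variants: int = 0,
                 tm_func: Callable[[str], float] = basic_tm) -> bool:
    """Apply the GC, Tm, run-length and variant-count rules to one flank."""
    return (0.40 <= gc_fraction(seq) <= 0.60
            and 55.0 <= tm_func(seq) <= 65.0
            and longest_run(seq) <= 5
            and n_variants <= 3)


print(flank_passes("ACGTACGTACGTACGTACGTGC"))          # balanced 22-mer
print(flank_passes("AAAAAAGCGCGCGCGCGCGCAT"))          # contains a run of six A
```

Which flanks pass the Tm window depends strongly on the Tm formula, so `tm_func` is left replaceable.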

In the end, 48755 sites met the design criteria. The information for these sites was integrated to form an HD-Marker liquid-phase chip pool containing the site information, probe sequences and annotation information. Of the sites on the chip, 22778 lie in genic regions, covering 6955 genes, and 25977 lie in intergenic regions.

Site distribution on apostichopus japonicus liquid phase chip

4.2 Sequence synthesis of the probe pools

So that target sites carrying primers can be amplified in the subsequent PCR, a 22 bp universal primer sequence for Illumina platform sequencing is attached to the 5' end of the upstream probe and to the 3' end of the downstream probe, forming the Forward probe and the Reverse probe, respectively.

Taking the Illumina sequencing platform as an example, the flanking hybridization probe Forward has the following structure:

CCTACACGCTCTTCCGATCTXXXXXXXXXXXXXXXXX; the flanking hybridization probe Reverse has the structure: XXXXXXXXXXXXXXXXXAGATCGGAAGAGCAACGCATCTGTGA. Here X and Y represent the site-specific flanking sequences.
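As a small illustration of the probe structure, the sketch below concatenates the universal sequences with the two flanks. The adapter strings are copied exactly as printed above and their lengths are not checked against the 22 bp stated in the text; the function name is hypothetical, and whether a flank must be reverse-complemented before synthesis is not stated in the source, so the flanks are used as given.

```python
from typing import Tuple

# Universal sequences copied from the Forward and Reverse structures above.
FORWARD_ADAPTER = "CCTACACGCTCTTCCGATCT"
REVERSE_ADAPTER = "AGATCGGAAGAGCAACGCATCTGTGA"


def build_probes(upstream_flank: str, downstream_flank: str) -> Tuple[str, str]:
    """Return (Forward, Reverse) probes for one SNP site.

    Forward: universal sequence on the 5' end of the upstream flank.
    Reverse: universal sequence on the 3' end of the downstream flank.
    """
    forward = FORWARD_ADAPTER + upstream_flank.upper()
    reverse = downstream_flank.upper() + REVERSE_ADAPTER
    return forward, reverse


# Hypothetical 22 bp flanks for one site.
fwd, rev = build_probes("acgtacgtacgtacgtacgtac", "tgcatgcatgcatgcatgcatg")
print(fwd)
print(rev)
```

Applying `build_probes` to every site and collecting the first and second elements separately would give the F and R probe pools.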

The F probes of all sites are collected to form the F probe pool, the R probes of all sites are collected to form the R probe pool, and the two pools are synthesized to obtain the liquid-phase chip pool with 48K sites.

4.3 Testing of the Apostichopus japonicus chip

Apostichopus japonicus DNA extraction: genomic DNA is extracted from muscle tissue with a Tiangen plant genome extraction kit (RT405-12).

DNA sample quality check: the integrity of the DNA band is checked by 1% agarose gel electrophoresis; the concentration is measured with a NanoDrop microanalyzer and adjusted to 100 ng/µl.

HD-Marker chip assay: HD-Marker libraries are prepared for 100 Apostichopus japonicus DNA samples following the standard HD-Marker protocol. Library concentrations, measured with a Qubit 4 fluorometer, are uniform, between 8.9 and 10.6 ng/µl, and library quality meets the sequencing requirements.

Chip performance testing and analysis:

Liquid-phase chip performance analysis: the chip is evaluated on site targeting, capture rate, accuracy and uniformity. The results show that the capture efficiency of sites exceeds 97% in all samples, more than 95% of sites are genotyped, the sequencing depth of the sites is highly consistent, and the Pearson coefficient of depth consistency between replicate samples exceeds 0.96. Compared with standard WGS library data, the site genotyping accuracy is between 90% and 94%. These results show that the Apostichopus japonicus 50K liquid-phase chip gives a good genotyping result.

An application of the Apostichopus japonicus breeding whole-genome 50K SNP chip comprises the following:

1. Genetic background analysis of different Apostichopus japonicus populations: sites with a population genotyping rate above 90% and a minor allele frequency above 0.05 are selected, giving genotype information for 46232 high-quality sites, and individual cluster analysis of Apostichopus japonicus is carried out with the genotyping data.

2. Trait association analysis of Apostichopus japonicus: the 50K SNP chip developed here is used for genome-wide analysis of sites associated with growth and stress-resistance traits of Apostichopus japonicus, which further confirms that the 50K SNP chip covers variant sites and markers associated with these traits.

The beneficial effects of the invention are as follows:

(1) The liquid-phase chip of the invention has good targeting, highly consistent site sequencing depth, a Pearson coefficient of depth consistency between replicate samples above 0.96, high site genotyping accuracy, and a good genotyping result.

(2) By selecting sites with a population genotyping rate above 90% and a minor allele frequency above 0.05, the invention obtains genotype information for 46232 high-quality sites and uses the genotyping data for individual cluster analysis of Apostichopus japonicus. The results clearly show that the SNP sites covered by the chip are highly polymorphic in Apostichopus japonicus populations, so the chip can be applied to genetic background analysis of Apostichopus japonicus material and is a SNP chip common to the populations.

(3) The invention can use the Apostichopus japonicus 50K SNP chip for genome-wide analysis of sites associated with growth and stress-resistance traits. Seven significant sites with P < 1E-07 are located in linkage groups 3 and 11, which shows that the 50K SNP chip covers variant sites and markers associated with growth and stress-resistance traits of Apostichopus japonicus.

(4) The invention provides a reliable technical platform for genetic breeding work such as selection for high-temperature resistance in Apostichopus japonicus, promotes the development of the aquaculture industry in China, and offers a preliminary theoretical reference and methodological guidance for the design and development of chips for other aquatic organisms.

Description of the drawings:

For ease of illustration, the invention is described in detail through the following embodiments and the accompanying drawings.

FIG. 1 is a phylogenetic tree of an apostichopus japonicus sample according to the present invention;

FIG. 2 is a whole genome analysis Manhattan diagram of the present invention;

Detailed description of the embodiments:

Example 1: Development of a genome-wide 50K SNP chip for Apostichopus japonicus

1. Construction of the Apostichopus japonicus sample population

In different sea areas of Liaoning and Shandong, 500 Apostichopus japonicus samples were randomly collected; tissue was taken, placed in 95% ethanol, and brought back to the laboratory for storage until use.

2. Genome-wide SNP genotyping of Apostichopus japonicus

2.1 DNA extraction

① Add 500 µl STE lysis buffer (100 mM NaCl; 10 mM Tris-Cl, pH 8.0; 1 mM EDTA, pH 8.0), 50 µl 10% SDS, 3.5 µl proteinase K (20 mg/ml) and 16 µl RNase A (100 mg/ml) to a 1.5 ml tube. Add about 0.1 g of scallop adductor muscle, cut it up and grind it with a grinding rod until flocculent, then incubate at 56 ℃ for about 2 hours, mixing by inversion every 30 min, until the lysate is clear.

② Add 500 µl of Tris-saturated phenol and 100 µl of chloroform/isoamyl alcohol (24:1), shake gently for 20 min, and centrifuge at 12000 rpm at room temperature for 10 min.

③ Transfer the supernatant to a new 1.5 ml EP tube, add 300 µl of Tris-saturated phenol and 300 µl of chloroform/isoamyl alcohol (24:1), shake gently for 20 min, and centrifuge at 12000 rpm at room temperature for 10 min.

④ Repeat step ③ up to two times, until no protein layer remains.

⑤ Take the supernatant, add an equal volume (about 500 µl) of chloroform/isoamyl alcohol, shake gently for 20 min, and centrifuge at 8000 rpm at room temperature for 10 min.

⑥ Add 1 ml of ice-cold absolute ethanol and 50 µl of sodium acetate (3 M) to the supernatant, hold at 20 ℃ for 40 min, and centrifuge at 12000 rpm for 10 min to precipitate the nucleic acid.

⑦ Discard the supernatant and wash the precipitate twice with 70% ethanol, centrifuging at 8000 rpm for 5 min each time.

⑧ Dry until the ethanol has completely evaporated, dissolve in 30 µl ddH2O, then add 0.75 µl RNase and digest RNA at 37 ℃ for 1.5 h.

⑨ Quantify the DNA with a Qubit kit and check its quality by 1% agarose gel electrophoresis. Store the extracted DNA at -20 ℃ until use.

2.2 Library construction and sequencing

The extracted Apostichopus japonicus genomic DNA is fragmented with a Covaris instrument, with the target fragment size set to about 350 bp. Using a genomic DNA library construction kit, the ends of the DNA fragments are repaired and A-tailed, adapters are ligated to both ends, and the library is amplified with barcoded primers to complete library construction. The library is quantified with a Qubit 2.0. The library is then sent to a sequencing company, where the insert size and the effective concentration are determined, and after passing quality inspection it is sequenced on an Illumina HiSeq X Ten PE150 platform.

2.3 Processing, alignment and genotyping of the resequencing data

① Indexing the reference genome

The reference sequence is indexed with the index command of BWA, the FASTA indexing command of samtools, and CreateSequenceDictionary.

② Sequence alignment

The paired-end sequencing reads are aligned with the bwa mem command to generate a BAM file, which is then sorted with the sort command of samtools to give a sorted BAM file.

③ Removing PCR duplicates

Because of possible bias during PCR, fragments at certain positions are over-amplified, leaving a large number of redundant sequences at those positions and causing genotyping errors. PCR duplicates are therefore removed to eliminate false-positive sequences generated during the PCR step. The duplicate sequences are discarded with the corresponding .jar command by setting the parameter REMOVE_DUPLICATES=true.

④ Indexing the BAM files

The BAM file generated for each individual is indexed with samtools index to prepare it for the subsequent GATK workflow.

⑤ GATK genotyping

A GVCF file is first generated for each sample, joint genotyping is then performed across the population, and individual variants and genotype data are corrected according to the variant information of the population.
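Steps ① to ⑤ can be summarized as a sequence of command lines. The Python sketch below only builds the argument lists and does not run them. The file names, the read-group string, the use of `picard.jar` for duplicate removal and the CombineGVCFs step before GenotypeGVCFs are assumptions; the source does not give tool versions or the exact options used.

```python
from typing import List

Command = List[str]


def reference_index_commands(ref: str) -> List[Command]:
    """Step 1: index the reference FASTA for BWA, samtools and GATK."""
    return [
        ["bwa", "index", ref],
        ["samtools", "faidx", ref],
        ["gatk", "CreateSequenceDictionary", "-R", ref],
    ]


def sample_commands(ref: str, sample: str, fq1: str, fq2: str) -> List[Command]:
    """Steps 2 to 5 for one sample: align, sort, drop duplicates, index, call GVCF."""
    sam = f"{sample}.sam"
    sorted_bam = f"{sample}.sorted.bam"
    dedup_bam = f"{sample}.dedup.bam"
    # GATK needs a read group; this ID/SM/PL layout is an assumed example.
    read_group = f"@RG\\tID:{sample}\\tSM:{sample}\\tPL:ILLUMINA"
    return [
        ["bwa", "mem", "-R", read_group, "-o", sam, ref, fq1, fq2],
        ["samtools", "sort", "-o", sorted_bam, sam],
        ["java", "-jar", "picard.jar", "MarkDuplicates",
         f"I={sorted_bam}", f"O={dedup_bam}", f"M={sample}.dup_metrics.txt",
         "REMOVE_DUPLICATES=true"],
        ["samtools", "index", dedup_bam],
        ["gatk", "HaplotypeCaller", "-R", ref, "-I", dedup_bam,
         "-O", f"{sample}.g.vcf.gz", "-ERC", "GVCF"],
    ]


def joint_genotyping_commands(ref: str, samples: List[str]) -> List[Command]:
    """Step 5, population level: merge per-sample GVCFs and genotype jointly."""
    combine: Command = ["gatk", "CombineGVCFs", "-R", ref]
    for s in samples:
        combine += ["-V", f"{s}.g.vcf.gz"]
    combine += ["-O", "cohort.g.vcf.gz"]
    genotype = ["gatk", "GenotypeGVCFs", "-R", ref,
                "-V", "cohort.g.vcf.gz", "-O", "cohort.vcf.gz"]
    return [combine, genotype]


for cmd in sample_commands("ref.fa", "s1", "s1_1.fq.gz", "s1_2.fq.gz"):
    print(" ".join(cmd))
```

Each list could be passed to `subprocess.run(cmd, check=True)` on a machine where the tools are installed.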

3. Screening of the Apostichopus japonicus 50K SNP markers

3.1 Preliminary filtering of SNP markers

The raw variant sites are filtered in the following order to generate a high-quality SNP data set.

① Select biallelic SNP sites.

② Filter out regions where SNPs are too dense, i.e. more than 3 SNP sites within a 10 bp window (Bowen et al, 2011).

③ Filter out low-quality SNP sites according to the hard-filtering parameters recommended on the official website, i.e. sites with QD < 2.0, FS > 20.0, MQ < 40.0, DP < 6.0, DP > 1000.0, MQRankSum < -12.5, or ReadPosRankSum < -8.0.

④ Filter out sites with a minor allele frequency below 0.05.

⑤ In the end, 9.67 million high-quality SNPs are obtained for the subsequent low-density SNP selection.
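The density filter and the hard filter in steps ② and ③ can be sketched in Python as follows. The thresholds are those listed above; the function names are hypothetical, missing annotations are treated as passing (an assumption), and the sketch works on already-parsed values rather than on a VCF file.

```python
from typing import Dict, List, Optional, Set


def fails_hard_filter(info: Dict[str, Optional[float]]) -> bool:
    """True if a site violates any hard-filter threshold from step 3.
    Missing or None annotations are skipped (assumption)."""
    def below(key: str, limit: float) -> bool:
        value = info.get(key)
        return value is not None and value < limit

    def above(key: str, limit: float) -> bool:
        value = info.get(key)
        return value is not None and value > limit

    return (below("QD", 2.0) or above("FS", 20.0) or below("MQ", 40.0)
            or below("DP", 6.0) or above("DP", 1000.0)
            or below("MQRankSum", -12.5) or below("ReadPosRankSum", -8.0))


def dense_positions(positions: List[int], window: int = 10,
                    max_snps: int = 3) -> Set[int]:
    """Positions on one chromosome that fall in a `window`-bp span
    holding more than `max_snps` SNPs (step 2)."""
    ordered = sorted(positions)
    flagged: Set[int] = set()
    start = 0
    for end in range(len(ordered)):
        # Shrink the span until it covers fewer than `window` bases.
        while ordered[end] - ordered[start] >= window:
            start += 1
        if end - start + 1 > max_snps:
            flagged.update(ordered[start:end + 1])
    return flagged


site = {"QD": 1.5, "FS": 3.0, "MQ": 60.0, "DP": 30.0}
print(fails_hard_filter(site))                        # QD is below 2.0
print(sorted(dense_positions([100, 102, 104, 106, 500])))
```

`dense_positions` is meant to be called once per chromosome or scaffold, since positions from different sequences are not comparable.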

3.2 Optimized selection of SNP markers

Estimating reliable genetic parameters for molecular breeding of individual animals requires selecting a suitable SNP set from the preliminarily filtered SNPs. Two conditions generally have to be met: first, the selected SNPs are common to the different geographical populations of the species; second, the selected SNPs carry high information content, so that individual SNP effects and breeding values can be evaluated accurately.

First, SNPs shared by the different populations are screened out, and SNPs in high linkage disequilibrium (LD) among the shared SNPs are then thinned or deleted, with an LD r² > 0.35 as the deletion threshold. The results show that screening with this threshold markedly reduces the number of SNP markers needed while maintaining the accuracy of the estimated SNP breeding parameters.

Second, to screen for SNPs with high information content, a constrained SNP-selection optimization model can be constructed from different statistical and composite indexes, such as Wright's FST, the mean Euclidean distance of SNP allele frequencies, and information entropy. The model is solved with an optimization package in R to obtain a set of highly informative SNPs that is also distributed more evenly across the genome.

Third, SNP sites associated with skin weight and high-temperature resistance are obtained from GWAS P values by GWAS association analysis, together with SNP sites in cloned genes that the literature reports as functionally related to growth and high-temperature resistance in Apostichopus japonicus.

Fourth, the high-quality SNPs are annotated with SnpEff to determine the gene element in which each SNP lies and its effect on amino acid changes.

Finally, following the first to fourth steps, the total number of selected SNPs is held to about 50,000 markers.
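The per-SNP statistics named in the second step can be computed as in the Python sketch below. It covers only the individual indexes, not the constrained optimization model, whose formulation is not given in the source. The definitions are common textbook forms chosen as assumptions: FST as (HT - HS)/HT with equal population weights, Shannon entropy in bits, and the Euclidean distance taken between the (p, 1 - p) allele-frequency vectors of two populations.

```python
import math
from itertools import combinations
from typing import Sequence


def entropy(p: float) -> float:
    """Shannon entropy (bits) of a biallelic SNP with allele frequency p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def fst(freqs: Sequence[float]) -> float:
    """FST = (HT - HS) / HT from per-population allele frequencies."""
    p_bar = sum(freqs) / len(freqs)
    h_t = 2 * p_bar * (1 - p_bar)            # expected heterozygosity, pooled
    if h_t == 0:
        return 0.0
    h_s = sum(2 * p * (1 - p) for p in freqs) / len(freqs)  # mean within-population
    return (h_t - h_s) / h_t


def mean_euclidean_distance(freqs: Sequence[float]) -> float:
    """Mean pairwise distance between populations' (p, 1 - p) vectors."""
    pairs = list(combinations(freqs, 2))
    if not pairs:
        return 0.0
    return sum(math.sqrt(2) * abs(a - b) for a, b in pairs) / len(pairs)


# Hypothetical allele frequencies of one SNP in three populations.
freqs = [0.10, 0.45, 0.80]
print(entropy(sum(freqs) / len(freqs)), fst(freqs), mean_euclidean_distance(freqs))
```

A composite index would combine such values per SNP before the optimization step; how the source weights them is not stated.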

4. Design of the HD-Marker high-density chip and its probes

4.1 Design and screening of targeted probes

Following the design concept of the HD-Marker probe pool, the 22 bp sequence upstream and the 22 bp sequence downstream of each SNP site are selected as the site-specific probes. The flanking probes are designed and evaluated according to the following HD-Marker probe design principles:

The GC content of the flanking probe sequences is between 40% and 60%;

The Tm value is 55 to 65 ℃;

The flanking probes must not contain a run of more than 5 contiguous bases;

The number of regions matching the flanking sequence at more than 80% must not exceed 5;

The number of variant sites within the probe flanking sequences does not exceed 3.

In the end, 48755 sites met the design criteria. The information for these sites was integrated to form an HD-Marker liquid-phase chip pool containing the site information, probe sequences and annotation information. Of the sites on the chip, 22778 lie in genic regions, covering 6955 genes, and 25977 lie in intergenic regions.

Site distribution on apostichopus japonicus liquid phase chip

4.2 Sequence synthesis of the probe pools

So that target sites carrying primers can be amplified in the subsequent PCR, a 22 bp universal primer sequence for Illumina platform sequencing is attached to the 5' end of the upstream probe and to the 3' end of the downstream probe, forming the Forward probe and the Reverse probe, respectively.

Taking the Illumina sequencing platform as an example, the flanking hybridization probe Forward has the following structure:

CCTACACGCTCTTCCGATCTXXXXXXXXXXXXXXXXX; the flanking hybridization probe Reverse has the structure: XXXXXXXXXXXXXXXXXAGATCGGAAGAGCAACGCATCTGTGA. Here X and Y represent the site-specific flanking sequences.

The F probes of all sites are collected to form the F probe pool, the R probes of all sites are collected to form the R probe pool, and the two pools are synthesized to obtain the liquid-phase chip pool with 48K sites.

4.3 Testing of the Apostichopus japonicus chip

Apostichopus japonicus DNA extraction: genomic DNA is extracted from muscle tissue with a Tiangen plant genome extraction kit (RT405-12).

DNA sample quality check: the integrity of the DNA band is checked by 1% agarose gel electrophoresis; the concentration is measured with a NanoDrop microanalyzer and adjusted to 100 ng/µl.

HD-Marker chip assay: HD-Marker libraries are prepared for 100 Apostichopus japonicus DNA samples following the standard HD-Marker protocol. Library concentrations, measured with a Qubit 4 fluorometer, are uniform, between 8.9 and 10.6 ng/µl, and library quality meets the sequencing requirements.

Chip performance testing and analysis:

Liquid-phase chip performance analysis: the chip is evaluated on site targeting, capture rate, accuracy and uniformity. The results show that the capture efficiency of sites exceeds 97% in all samples, more than 95% of sites are genotyped, the sequencing depth of the sites is highly consistent, and the Pearson coefficient of depth consistency between replicate samples exceeds 0.96. Compared with standard WGS library data, the site genotyping accuracy is between 90% and 94%. These results show that the Apostichopus japonicus 50K liquid-phase chip gives a good genotyping result.
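Two of the indexes above, depth consistency between replicate samples and genotyping accuracy against WGS data, can be computed as in this minimal Python sketch. The function names and the toy numbers are hypothetical; the source does not describe how missing calls were handled, so sites missing in either data set are skipped here.

```python
import math
from typing import Optional, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of per-site sequencing depth in two replicates."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("depth is constant in at least one replicate")
    return cov / (sd_x * sd_y)


def concordance(chip_calls: Sequence[Optional[int]],
                wgs_calls: Sequence[Optional[int]]) -> float:
    """Share of sites with identical genotypes, over sites called in both sets."""
    pairs = [(c, w) for c, w in zip(chip_calls, wgs_calls)
             if c is not None and w is not None]
    if not pairs:
        return 0.0
    return sum(c == w for c, w in pairs) / len(pairs)


# Toy data: depths at five sites in two replicates, and 0/1/2 genotype calls.
print(pearson([30, 42, 18, 55, 27], [28, 45, 20, 50, 30]))
print(concordance([0, 1, 2, None, 1], [0, 1, 1, 2, 1]))
```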

Example 2: application of Apostichopus japonicus 50K SNP chip in molecular breeding

In order to verify the application effect of the apostichopus japonicus chip in the apostichopus japonicus molecular breeding, a 50K chip is used for detecting samples from Russia (30), Dalian (100) and Shandong (100), and the molecular breeding is carried out:

(1) analyzing genetic backgrounds of different geographical groups; (2) whole genome association analysis (GWAS) is carried out on the number of the apostichopus japonicus meat thorn related to growth

1. The Apostichopus japonicus whole genome breeding chip has the following functions in genetic background analysis of Apostichopus japonicus of different populations:

screening the loci with the group typing rate of more than 90 percent and the minimum allele frequency of more than 0.05 to obtain 46232 genotype information of 46232 high-quality loci, and carrying out individual clustering analysis on the apostichopus japonicus by using typing data, wherein as shown in figure 1, the positions of 12 groups in a sample on the phylogenetic tree are clear and the classification is definite. The result shows that the SNP locus covered by the apostichopus japonicus chip has better polymorphism in the apostichopus japonicus population, can be applied to the genetic background analysis of the apostichopus japonicus material, and is a general SNP chip for the population.
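The site filter used here (genotyping rate above 90%, minor allele frequency above 0.05) can be written as a short Python sketch. It assumes genotypes coded as 0/1/2 counts of one allele, with `None` for a missing call; the function names are hypothetical.

```python
from typing import Optional, Sequence


def call_rate(genotypes: Sequence[Optional[int]]) -> float:
    """Fraction of individuals with a non-missing genotype at this site."""
    return sum(g is not None for g in genotypes) / len(genotypes)


def minor_allele_frequency(genotypes: Sequence[Optional[int]]) -> float:
    """Minor allele frequency computed from the called genotypes only."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return 0.0
    alt_freq = sum(called) / (2 * len(called))
    return min(alt_freq, 1 - alt_freq)


def keep_site(genotypes: Sequence[Optional[int]],
              min_call_rate: float = 0.90,
              min_maf: float = 0.05) -> bool:
    """Keep a site only if both thresholds from the text are exceeded."""
    return (call_rate(genotypes) > min_call_rate
            and minor_allele_frequency(genotypes) > min_maf)


# Toy site genotyped in 20 individuals, one of them missing.
site = [0] * 10 + [1] * 8 + [2] + [None]
print(call_rate(site), minor_allele_frequency(site), keep_site(site))
```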

2. Application of the Apostichopus japonicus whole-genome breeding chip in trait association analysis:

The Apostichopus japonicus 50K SNP chip of this technical solution was used for genome-wide analysis of sites associated with growth and stress-resistance traits. As shown in FIG. 2, 7 significant sites with P < 1E-07 are located in linkage groups 3 and 11, which shows that the 50K SNP chip covers variant sites and markers associated with growth and stress-resistance traits of Apostichopus japonicus.

While there have been shown and described what are at present considered to be the fundamental principles of the invention and its essential features and advantages, it will be understood by those skilled in the art that the invention is not limited by the embodiments described above, which are merely illustrative of the principles of the invention, but that various changes and modifications may be made therein without departing from the spirit and scope of the invention as defined by the appended claims and their equivalents.

Sequence listing
