Primer design method, device and application for selective whole genome amplification

Document No.: 170912  Publication date: 2021-10-29

Summary: The technology "Primer design method, device and application for selective whole genome amplification" (用于选择性全基因组扩增的引物设计方法、装置及应用) was designed and created by Li Yingzhen (李英镇), Lin Yufeng (林宇锋) and Li Qingqing (李晴晴) on 2020-04-29. Its main content is as follows: The application discloses a primer design method, device and application for selective whole genome amplification. The method comprises a primer design parameter setting step, which presets the primer length, the number of primer combinations, and the average distance, maximum distance and distribution uniformity of the primer combinations in the target genes or background genes; a primer design step, which designs primers by exploiting differences in k-mers between the background genes and the target genes; a primer combination screening step, which outputs the N best primer combinations; a primer combination evaluation step, which simulates the coverage of each target gene by each primer combination and predicts the regions likely to be lost; and a result analysis step, which outputs the target genome coverage and depth map of each experimental group so that primer combinations can be screened according to experimental conditions. The method can design a primer combination that enriches multiple target genes simultaneously, effectively improves target gene coverage, and under the same conditions takes about 80% less time than existing primer design workflows.

1. A primer design method for selective whole genome amplification, comprising the following steps:

a primer design parameter setting step, comprising presetting the primer length range, the range for the number of primer combinations, the average and maximum distances between binding sites of the primer combinations in the target genome, the distribution uniformity score of the primer combinations in the target genome, and the average distance between binding sites of the primer combinations in the background genome;

a primer design step, comprising designing primers according to the parameters set in the primer design parameter setting step by exploiting the differences in sliding windows (k-mers) of the genome sequence between the background genome and the target genome, without screening each designed primer for uniform distribution in each target genome;

a primer combination screening step, comprising screening combinations of the primers obtained in the primer design step by exhaustive search according to the parameters set in the primer design parameter setting step, and outputting the N best primer combinations for subsequent evaluation based on the ratio of the number of binding sites of the primer combination in the target genome to the number in the background genome, the maximum distance of the primer combination in the target genome, the average distance of the primer combination in the target genome, and the average distance of the primer combination in the background genome;

a primer combination evaluation step, comprising simulating, according to the parameters set in the primer design parameter setting step, the coverage of the different target genomes by each primer combination, and predicting the regions likely to be lost in the experiment to facilitate screening of the primer combinations;

and a result analysis step, comprising outputting, based on the parameters set in the primer design parameter setting step and the results of the primer combination evaluation step, the target genome coverage and depth map of each primer combination, so as to reflect the experimental performance of each primer combination in the different target genomes and facilitate screening of the primer combinations according to experimental conditions.

2. The primer design method according to claim 1, wherein the primer design step further comprises screening each designed primer for hairpin structures;

preferably, the primer combination screening step further comprises screening primer combinations according to whether primer dimers are formed and according to the Gini coefficient, counting the parameters of each primer combination in each target genome when there are multiple target genomes, and evaluating each primer combination by the ratio (alignment frequency in the target genome × Gini coefficient) ÷ alignment frequency in the background genome.

3. A primer design apparatus for selective whole genome amplification, characterized in that it comprises a primer design parameter setting module, a primer design module, a primer combination screening module, a primer combination evaluation module and a result analysis module, wherein

the primer design parameter setting module is used to set the primer length range, the range for the number of primer combinations, the average and maximum distances between binding sites of the primer combinations in the target genome, the distribution uniformity score of the primer combinations in the target genome, and the average distance between binding sites of the primer combinations in the background genome;

the primer design module is used to design primers according to the parameters set by the primer design parameter setting module, by exploiting the differences in sliding windows (k-mers) of the genome sequence between the background genome and the target genome, without screening each designed primer for uniform distribution in each target genome;

the primer combination screening module is used to screen combinations of the primers obtained by the primer design module by exhaustive search according to the parameters set by the primer design parameter setting module, and to output the N best primer combinations for subsequent evaluation based on the ratio of the number of binding sites of the primer combination on the target genome to the number on the background genome, the maximum distance of the primer combination on the target genome, the average distance of the primer combination on the target genome, and the average distance of the primer combination on the background genome;

the primer combination evaluation module simulates, according to the parameters set by the primer design parameter setting module, the coverage of the different target genomes by each primer combination, and predicts the regions likely to be lost in the experiment to facilitate screening of the primer combinations;

the result analysis module uses the parameters set by the primer design parameter setting module and the results of the primer combination evaluation module to output the target genome coverage and depth map of each primer combination, reflecting the experimental performance of each primer combination in the different target genomes so that the primer combinations can be screened according to experimental conditions.

4. The primer design apparatus according to claim 3, wherein the primer design module is further used to screen each designed primer for hairpin structures;

preferably, the primer combination screening module is further used to screen primer combinations according to whether primer dimers are formed and according to the Gini coefficient, to count the parameters of each primer combination in each target genome when there are multiple target genomes, and to evaluate each primer combination by the ratio (alignment frequency in the target genome × Gini coefficient) ÷ alignment frequency in the background genome.

5. A primer design apparatus for selective whole genome amplification, wherein the primer design apparatus comprises a memory and a processor;

the memory is used for storing programs;

the processor is configured to execute the program stored in the memory to implement the primer design method according to claim 1 or 2.

6. A computer-readable storage medium, characterized in that it comprises a program executable by a processor to implement the primer design method according to claim 1 or 2.

7. A primer combination obtained by the primer design method according to claim 1 or 2 or by the primer design apparatus according to any one of claims 3 to 5.

8. Use of a primer combination according to claim 7 for the enrichment of a pathogenic microorganism genome or a pathogenic microorganism specific gene in a human or other host genome.

9. A method for enriching a pathogenic microorganism genome or a pathogenic microorganism specific gene based on selective genome amplification, characterized in that it comprises: using the primer design method of claim 1 or 2 or the primer design apparatus of any one of claims 3 to 5 to obtain a primer combination with the pathogenic microorganism genome or pathogenic microorganism specific gene as the target genome, and amplifying and enriching a nucleic acid sample to be processed with the obtained primer combination, thereby enriching the pathogenic microorganism genome or pathogenic microorganism specific gene.

10. A primer combination for enriching pathogenic microorganism genomes from the human genome based on selective genome amplification, characterized in that the pathogenic microorganism genomes comprise the Escherichia coli genome, the Staphylococcus aureus genome and the Candida albicans genome;

the primer combination is at least one of a first primer combination, a second primer combination and a third primer combination; the first primer combination comprises primers with sequences shown as Seq ID No.1 to Seq ID No.5, the second primer combination comprises primers with sequences shown as Seq ID No.6 to Seq ID No.10, and the third primer combination comprises primers with sequences shown as Seq ID No.11 to Seq ID No.15;

Seq ID No.1:5’-CGTCGTAA-3’

Seq ID No.2:5’-ATCGTCGT-3’

Seq ID No.3:5’-ATTCGTCG-3’

Seq ID No.4:5’-ATCGTTCG-3’

Seq ID No.5:5’-CGTCGTAT-3’

Seq ID No.6:5’-CGACGAAT-3’

Seq ID No.7:5’-ACGACGAT-3’

Seq ID No.8:5’-TACGACGA-3’

Seq ID No.9:5’-ACCGATAAT-3’

Seq ID No.10:5’-CGAACGAT-3’

Seq ID No.11:5’-CGACGAAT-3’

Seq ID No.12:5’-ATACGACG-3’

Seq ID No.13:5’-ACGACGAT-3’

Seq ID No.14:5’-TTACGACG-3’

Seq ID No.15:5’-CGACGAAA-3’

the second and third bases at the 3' end of all primers have phosphorylation modifications.

Technical Field

The application relates to the technical field of selective amplification and enrichment of target sequences from a whole genome, in particular to a primer design method, a device and application for selective whole genome amplification.

Background

Selective whole genome amplification (sWGA) uses random primers designed to bind preferentially to a target nucleic acid sequence rather than to the rest of the whole genome or to the background genome; amplifying the whole genome with these primers enriches the target nucleic acid sequence.

Selective whole genome amplification was originally developed by Professor Dustin Brisson of the Department of Biology, University of Pennsylvania. A few published reports describe its use for enriching target nucleic acids from trace samples; for example, in 2016 Sesh A. Sundaraman designed primers biased toward Plasmodium to enrich blood samples, achieving more than 20-fold enrichment while also enabling qualitative identification. The effectiveness of selective whole genome amplification depends on the design and screening of the selective random primers, in particular on how strongly the designed primers are biased toward the specific target nucleic acid sequence.

Currently, there are two main technical schemes for primer design for selective whole genome amplification:

Leichty et al. proposed designing primers for cases where the background and target nucleic acid sequences differ greatly on alignment, for example mitochondrial genes. This approach mainly targets genes with large differences rather than whole genomes, and therefore cannot meet the need to enrich pathogenic microorganism sequences from the human genome.

Alternatively, Erik L. Clarke et al. used sliding windows (k-mers) of genome sequences to screen primers biased toward amplifying the target nucleic acid. The primer filtering parameters included a low binding frequency in the background nucleic acid sequence, a high binding frequency in the target nucleic acid sequence, uniform primer distribution and coverage across the target (i.e., a Gini coefficient threshold of 0.6), an appropriate Tm, and so on. The top 200 primers ranked by these parameters are retained for the subsequent selection of selective whole genome amplification primer sets. Specifically, primer set selection combines compatible primers into primer combinations, evaluates each combination against multiple criteria including binding frequency and uniformity, keeps the combinations whose evaluation scores exceed a threshold, and stores and outputs, on instruction, the combinations that meet the requirements as the final selective whole genome amplification primers.

In theory, the Erik L. Clarke primer design scheme for selective whole genome amplification, based on sliding windows of genome sequences, can meet the need to enrich pathogenic microorganism sequences from the human genome. However, it has the following disadvantages: 1) long running time: following the Erik L. Clarke workflow, a 3.5 Mb bacterial genome takes 2 weeks to produce a result; 2) single target only: the Erik L. Clarke workflow produces no output when given two or more targets, so it cannot enrich multiple targets, e.g., it cannot design a primer combination for several pathogenic microorganisms; 3) poor coverage: even for a single pathogenic microorganism, the primer combinations obtained with the Erik L. Clarke workflow cover the microorganism poorly, and the coverage reported in the current literature is only about 30%.

Therefore, shortening the running time, increasing the number of target sequences per run and improving primer coverage are the key challenges in enriching pathogenic microorganism sequences with selective whole genome amplification.

Disclosure of Invention

The purpose of the application is to provide an improved primer design method, a primer design device and application for selective whole genome amplification.

The following technical scheme is adopted in the application:

A first aspect of the present application discloses a primer design method for selective whole genome amplification, comprising the following steps:

a primer design parameter setting step, comprising presetting the primer length range, the range for the number of primer combinations, the average and maximum distances between binding sites of the primer combinations in the target genome, the distribution uniformity score of the primer combinations in the target genome, and the average distance between binding sites of the primer combinations in the background genome;

a primer design step, comprising designing primers according to the parameters set in the primer design parameter setting step by exploiting the differences in sliding windows (k-mers) of the genome sequence between the background genome and the target genome, without screening each designed primer for uniform distribution in each target genome;

a primer combination screening step, comprising screening combinations of the primers obtained in the primer design step by exhaustive search according to the parameters set in the primer design parameter setting step, and outputting the N best primer combinations for subsequent evaluation based on the ratio of the number of binding sites of the primer combination in the target genome to the number in the background genome, the maximum distance of the primer combination in the target genome, the average distance of the primer combination in the target genome, and the average distance of the primer combination in the background genome; N is an integer, and in one implementation of the present application, for example, the 10 best primer combinations are output for subsequent evaluation and screening;

a primer combination evaluation step, comprising simulating, according to the parameters set in the primer design parameter setting step, the coverage of the different target genomes by each primer combination, and predicting the regions likely to be lost in the experiment to facilitate screening of the primer combinations; each of the N output primer combinations is evaluated by simulation separately, and in one implementation of the present application, the primer coverage and the regions likely to be lost are displayed as images or tables for screening;

and a result analysis step, comprising outputting, based on the parameters set in the primer design parameter setting step and the results of the primer combination evaluation step, the target genome coverage and depth map of each primer combination, so as to reflect the experimental performance of each primer combination in the different target genomes and facilitate screening of the primer combinations according to experimental conditions.
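The exhaustive search in the primer combination screening step can be illustrated with a minimal Python sketch. It assumes exact, forward-strand binding only, a single linear target genome, and a simple ranking key (target-to-background site ratio first, then the maximum and average target distances, then the average background distance); the function names and the `max_gap`, `size` and `top_n` parameters are hypothetical and not taken from the document's implementation.

```python
from itertools import combinations
from statistics import mean
from typing import Dict, List, Sequence


def binding_sites(primer: str, genome: str) -> List[int]:
    """Start positions of exact forward-strand matches (overlaps allowed)."""
    sites, pos = [], genome.find(primer)
    while pos != -1:
        sites.append(pos)
        pos = genome.find(primer, pos + 1)
    return sites


def gaps(sites: Sequence[int], genome_len: int) -> List[int]:
    """Distances between consecutive sites, including both genome ends."""
    points = [0, *sorted(sites), genome_len]
    return [b - a for a, b in zip(points, points[1:])]


def screen_combinations(primers: Sequence[str], target: str, background: str,
                        size: int, max_gap: int, top_n: int) -> List[Dict]:
    """Hypothetical exhaustive screen: enumerate every `size`-primer combination."""
    results = []
    for combo in combinations(primers, size):
        # a position bound by several primers in the combo is counted once
        t_sites = sorted({s for p in combo for s in binding_sites(p, target)})
        if not t_sites:
            continue
        b_sites = sorted({s for p in combo for s in binding_sites(p, background)})
        t_gaps = gaps(t_sites, len(target))
        if max(t_gaps) > max_gap:
            continue  # reject: a gap in the target exceeds the preset maximum distance
        results.append({
            "primers": combo,
            "ratio": len(t_sites) / (len(b_sites) + 1),  # +1 pseudocount avoids /0
            "max_gap": max(t_gaps),
            "mean_gap_target": mean(t_gaps),
            "mean_gap_background": mean(gaps(b_sites, len(background))),
        })
    # higher ratio first; then tighter target spacing; then sparser background
    results.sort(key=lambda r: (-r["ratio"], r["max_gap"],
                                r["mean_gap_target"], -r["mean_gap_background"]))
    return results[:top_n]
```

For example, `screen_combinations(["ACGT", "GGCC", "TTAA"], "ACGTAAACGTAAACGT", "GGCCGGCCGGCCTTAA", size=2, max_gap=10, top_n=1)` keeps the pair `("ACGT", "TTAA")`, which binds the toy target three times and the toy background only once. Because the number of combinations grows combinatorially with the candidate pool, a coarse screen that shrinks the pool before this step matters a great deal for running time.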

Compared with existing primer design schemes, the primer design method of the present application has the following advantages.

First, it shortens the running time. In particular, compared with the Erik L. Clarke workflow, the screening logic of the present method performs a coarse screen first and a fine screen afterwards, whereas the Erik L. Clarke workflow has only a single screening step; the present method therefore uses fewer computing resources when automated, and in one implementation the time is shortened by 80% relative to the Erik L. Clarke workflow under the same conditions. In addition, in one implementation of the present application, primer design and screening are performed with the Jellyfish analysis software, whereas bedtools is used in the Erik L. Clarke workflow.

Second, it increases the number of targets per run. The present method pools the nucleic acid sequences of several target species A1, A2, A3 … into a single whole, which is then compared with species B for primer design; the Erik L. Clarke workflow only allows comparison and primer design for target A1 against species B, target A2 against species B, and target A3 against species B separately. The present method can therefore design one primer combination for multiple targets simultaneously, analogous to universal primers in conventional PCR amplification; in one implementation of the present application, more than 7 targets can be run simultaneously and primers well distributed on each target are obtained.

Third, through rational primer screening, the designed primers are more reasonably and more densely distributed, have wider applicability, and give higher coverage; in one implementation of the present application, the theoretical coverage is improved by more than 30%.

Fourth, the design process jointly considers a series of factors such as primer length, distance between primers, the primer Gini coefficient and primer secondary structure, which improves the performance of the designed primer combinations; applied to microorganisms in actual samples, the primer combinations obtained by the method give a 30- to 80-fold improvement.
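The pooling logic described above (targets A1, A2, A3 … merged into one whole and compared with background species B) can be sketched as a k-mer frequency comparison. This is a minimal pure-Python sketch, not the document's implementation, which reports using the Jellyfish k-mer counter. The per-base frequency ratio, the +1 pseudocount and the names `design_candidates` and `min_ratio` are hypothetical choices for illustration.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

COMP = str.maketrans("ACGT", "TGCA")
VALID = set("ACGT")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMP)[::-1]


def count_kmers(seqs: Iterable[str], k: int) -> Counter:
    """Count k-mers on both strands, skipping windows with non-ACGT bases.

    Palindromic k-mers (e.g. ACGT) are counted once per strand.
    """
    counts: Counter = Counter()
    for seq in seqs:
        seq = seq.upper()
        for strand in (seq, revcomp(seq)):
            for i in range(len(strand) - k + 1):
                kmer = strand[i:i + k]
                if set(kmer) <= VALID:
                    counts[kmer] += 1
    return counts


def design_candidates(targets: Dict[str, str], background: str,
                      k_min: int = 6, k_max: int = 8,
                      min_ratio: float = 5.0) -> List[Tuple[str, int, int, float]]:
    """Pool all target genomes, then keep k-mers enriched relative to background."""
    pooled = list(targets.values())  # A1, A2, A3 ... treated as one target
    pooled_len = sum(len(s) for s in pooled)
    bg_len = len(background)
    candidates = []
    for k in range(k_min, k_max + 1):
        t_counts = count_kmers(pooled, k)
        b_counts = count_kmers([background], k)
        for kmer, t in t_counts.items():
            b = b_counts.get(kmer, 0)
            # per-base frequency ratio; +1 pseudocount avoids division by zero
            ratio = (t / pooled_len) / ((b + 1) / bg_len)
            if ratio >= min_ratio:
                candidates.append((kmer, t, b, ratio))
    candidates.sort(key=lambda c: c[3], reverse=True)
    return candidates
```

Each candidate is returned as `(k-mer, target count, background count, ratio)`. Pooling the targets before counting is what lets one run yield a single candidate list for several species at once; per-target distribution is deliberately left to the later combination screening and evaluation steps.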

Preferably, in the primer design method of the present application, the primer design step further comprises screening the hairpin structure of each designed primer.

It should be noted that, in the primer design method of the present application, hairpin structure screening may be added as the primer design requires; for example, whether hairpin screening is needed may be decided according to the length of the designed primers and the specific experimental procedure, so as to reduce primer secondary structures and improve subsequent amplification efficiency.
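A minimal sketch of an optional hairpin screen, assuming a purely sequence-based rule: a primer is flagged when some stretch of `min_stem` bases has its exact reverse complement further downstream, separated by a loop of at least `min_loop` bases. The thresholds and the function name `has_hairpin` are hypothetical; thermodynamic tools such as Primer3 assess hairpins by free energy rather than by exact matching.

```python
COMP = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement; the input is upper-cased first."""
    return seq.upper().translate(COMP)[::-1]


def has_hairpin(primer: str, min_stem: int = 3, min_loop: int = 3) -> bool:
    """Hypothetical exact-match check for a 5'-stem-loop-stem'-3' hairpin."""
    p = primer.upper()
    for i in range(len(p) - min_stem + 1):
        partner = revcomp(p[i:i + min_stem])
        # the pairing arm must start after the stem plus a loop of >= min_loop bases
        if p.find(partner, i + min_stem + min_loop) != -1:
            return True
    return False
```

This also illustrates why the decision can depend on primer length: with these hypothetical thresholds a hairpin needs at least 3 + 3 + 3 = 9 bases, so 8-mer primers such as those in Seq ID No.1 are never flagged, while a longer sequence such as `GACTTTTGTC` is.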

Preferably, in the primer design method of the present application, the primer combination screening step further comprises screening the primer combinations according to whether primer dimers are formed and according to the Gini coefficient; when there are multiple target genomes, the parameters of each primer combination in each target genome are counted, and each primer combination is evaluated by the ratio (alignment frequency in the target genome × Gini coefficient) ÷ alignment frequency in the background genome.

It should be noted that the target genome refers to the target to be enriched from the background genome; in one implementation of the present application, it is specifically a pathogenic microorganism genome. Because pathogenic microorganism genomes differ greatly from one another, in a preferred technical solution of the present application each target microorganism genome is analyzed independently and then filtered: the parameters of each primer combination on each pathogenic microorganism are counted, and the optimal primer combination sequences under each parameter are obtained from the ratio (alignment frequency in the target genome × Gini coefficient) ÷ alignment frequency in the background genome, for use in screening.
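The dimer filter and the scoring ratio described above can be sketched as follows. Several pieces are assumptions: `gini` is the textbook Gini coefficient (0 when all values, for example the gaps between binding sites, are equal), the dimer rule is a hypothetical exact-complementarity test on the last `k` bases at each primer's 3' end, and all function names are illustrative. The document does not say whether the coefficient enters the score as G or in a uniformity-oriented form such as 1 − G, so `combination_score` simply takes the coefficient as an argument.

```python
from itertools import combinations_with_replacement
from typing import Sequence

COMP = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement; the input is upper-cased first."""
    return seq.upper().translate(COMP)[::-1]


def gini(values: Sequence[float]) -> float:
    """Textbook Gini coefficient: 0 = all values equal, toward 1 = very unequal."""
    xs = sorted(values)
    n, total = len(xs), sum(xs)
    if n == 0 or total == 0:
        return 0.0
    weighted = sum((i + 1) * x for i, x in enumerate(xs))
    return 2 * weighted / (n * total) - (n + 1) / n


def forms_dimer(p1: str, p2: str, k: int = 4) -> bool:
    """Hypothetical rule: flag the pair if the last k 3'-end bases of either
    primer are exactly complementary to some stretch of the other primer."""
    return any(revcomp(a[-k:]) in b.upper() for a, b in ((p1, p2), (p2, p1)))


def combination_has_dimer(primers: Sequence[str], k: int = 4) -> bool:
    # pairs include each primer with itself, so self-dimers are caught too
    return any(forms_dimer(a, b, k)
               for a, b in combinations_with_replacement(primers, 2))


def combination_score(target_freq: float, coefficient: float,
                      background_freq: float) -> float:
    """(target alignment frequency × coefficient) ÷ background alignment frequency."""
    return target_freq * coefficient / background_freq
```

With multiple targets, `combination_score` would be computed once per target genome with that genome's frequency and coefficient, matching the per-microorganism analysis above; how the per-target scores are then aggregated is not specified in the document.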

The second aspect of the present application discloses a primer design apparatus for selective whole genome amplification, comprising a primer design parameter setting module, a primer design module, a primer combination screening module, a primer combination evaluation module and a result analysis module. The primer design parameter setting module is used to set the primer length range, the range for the number of primer combinations, the average and maximum distances between binding sites of the primer combinations in the target genome, the distribution uniformity score of the primer combinations in the target genome, and the average distance between binding sites of the primer combinations in the background genome. The primer design module designs primers according to the parameters set by the primer design parameter setting module, by exploiting the differences in sliding windows (k-mers) of the genome sequence between the background genome and the target genome, without screening each designed primer for uniform distribution in each target genome. The primer combination screening module screens combinations of the primers obtained by the primer design module by exhaustive search according to the parameters set by the primer design parameter setting module, and outputs the N best primer combinations for subsequent evaluation based on the ratio of the number of binding sites of the primer combination in the target genome to the number in the background genome, the maximum distance of the primer combination in the target genome, the average distance of the primer combination in the target genome and the average distance of the primer combination in the background genome. The primer combination evaluation module uses the parameters set by the primer design parameter setting module to simulate the coverage of the different target genomes by each primer combination, predicts the regions likely to be lost in the experiment, and displays them as images or tables to facilitate primer combination screening. The result analysis module uses the parameters set by the primer design parameter setting module and the results of the primer combination evaluation module to output the target genome coverage and depth map of each primer combination, reflecting the experimental performance of each primer combination on the different target genomes so that the primer combinations can be screened according to experimental conditions.
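The simulation performed by the primer combination evaluation module and the coverage and depth output of the result analysis module can be sketched with a deliberately simple amplification model. The model is an assumption, not the document's: each binding site (0 ≤ site < genome length) is taken to amplify the next `extension` bases of a linear genome, per-base depth is the number of overlapping windows, zero-depth runs are reported as the regions likely to be lost, and binned mean depth is the series one would plot as a depth map. All names are hypothetical.

```python
from typing import List, Tuple


def depth_profile(sites: List[int], genome_len: int, extension: int) -> List[int]:
    """Per-base depth, assuming each site covers the next `extension` bases."""
    diff = [0] * (genome_len + 1)  # difference array: +1 at window start, -1 at end
    for s in sites:
        diff[s] += 1
        diff[min(s + extension, genome_len)] -= 1
    depth, running = [], 0
    for i in range(genome_len):
        running += diff[i]
        depth.append(running)
    return depth


def coverage_rate(depth: List[int]) -> float:
    """Fraction of bases with depth of at least 1."""
    return sum(1 for d in depth if d > 0) / len(depth)


def lost_regions(depth: List[int]) -> List[Tuple[int, int]]:
    """Half-open [start, end) runs of zero depth: regions predicted to be lost."""
    regions, start = [], None
    for i, d in enumerate(depth):
        if d == 0 and start is None:
            start = i
        elif d > 0 and start is not None:
            regions.append((start, i))
            start = None
    if start is not None:
        regions.append((start, len(depth)))
    return regions


def binned_depth(depth: List[int], bin_size: int) -> List[float]:
    """Mean depth per bin, i.e. the series plotted as a depth map."""
    return [sum(depth[i:i + bin_size]) / len(depth[i:i + bin_size])
            for i in range(0, len(depth), bin_size)]
```

For instance, two sites at positions 0 and 2 with `extension=4` on a 10-base genome give the depth profile `[1, 1, 2, 2, 1, 1, 0, 0, 0, 0]`, a coverage rate of 0.6 and one predicted lost region `(6, 10)`. In practice the binding sites would come from each of the N output combinations, evaluated per target genome.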

It should be noted that the primer design apparatus of the present application automates each step of the primer design method of the present application through the corresponding module; the preferred embodiments of each module can therefore be found in the description of the primer design method and are not repeated here.

A third aspect of the present application discloses another primer design apparatus for selective whole genome amplification, the primer design apparatus comprising a memory and a processor; the memory is used for storing programs; the processor is used for executing the program stored in the memory so as to realize the primer design method.

It should be noted that, besides automatic design with a dedicated primer design apparatus, the steps of the primer design method of the present application may be automated by programming. The program may be stored in a computer-readable storage medium, such as a read-only memory, random access memory, magnetic disk, optical disk or hard disk, and executed by a computer to implement the primer design method of the present application. The program may be stored in the memory of the device, or on a storage medium such as a server, another computer, a magnetic disk, an optical disk, a flash drive or a removable hard disk, from which it is saved to the memory of the local device by downloading or copying or is used to update the version of the local device's system; the program in the memory is then executed by the processor to implement the primer design method of the present application.

Accordingly, a fourth aspect of the present application discloses a computer-readable storage medium containing a program executable by a processor to implement the primer design method of the present application.

The fifth aspect of the present application discloses a primer combination obtained by the primer design method of the present application or by any one of the primer design apparatuses of the present application.

It can be understood that, compared with primer combinations obtained by existing design methods, the primer combination of the present application, first, has higher coverage of the target genome; second, can enrich multiple target genomes, since the primer design method or apparatus of the present application can design a primer combination for multiple target genomes; and third, for the same target genome, takes less time to design and is obtained more efficiently.

A sixth aspect of the present application discloses the use of the primer combination of the present application for enriching a pathogenic microorganism genome or a pathogenic microorganism specific gene in a human or other host genome. In the present application, the pathogenic microorganism specific genes include, for example, drug resistance genes, virulence genes, pathogenic microorganism detection identification specific genes, and the like.

It should be noted that, the primer design method or device of the present application is actually developed for enriching the genome of pathogenic microorganism based on the selective genome amplification technology; thus, the primer combinations obtained by the primer design methods or apparatus of the present application can be used to enrich the genome of a pathogenic microorganism or a pathogenic microorganism-specific gene in the genome of a human or other host.

The seventh aspect of the application discloses a method for enriching a pathogenic microorganism genome or a pathogenic microorganism specific gene based on a selective genome amplification technology, which comprises the steps of obtaining a primer combination taking the pathogenic microorganism genome or the pathogenic microorganism specific gene as a target genome by adopting the primer design method or the primer design device of the application, and carrying out amplification and enrichment on a nucleic acid sample to be processed by adopting the obtained primer combination, thereby realizing the enrichment of the pathogenic microorganism genome or the pathogenic microorganism specific gene.

The eighth aspect of the present application discloses a primer combination for enriching pathogenic microorganism genomes from the human genome based on selective genome amplification, wherein the pathogenic microorganism genomes comprise the Escherichia coli genome, the Staphylococcus aureus genome and the Candida albicans genome;

the primer combination is at least one of a first primer combination, a second primer combination and a third primer combination; the first primer combination comprises primers with sequences shown as Seq ID No.1 to Seq ID No.5, the second primer combination comprises primers with sequences shown as Seq ID No.6 to Seq ID No.10, and the third primer combination comprises primers with sequences shown as Seq ID No.11 to Seq ID No.15; and the second and third bases at the 3' end of all primers have phosphorylation modifications.

It should be noted that the first, second and third primer combinations of the present application are three preferred primer combinations for enriching the three target genomes of Escherichia coli, Staphylococcus aureus and Candida albicans, designed with the primer design method or apparatus of the present application in a specific implementation. On the one hand, these three combinations confirm that the primer design method and apparatus of the present application can indeed design primer combinations for multiple target genomes and achieve higher target genome coverage; on the other hand, they are only the primer combinations used in one implementation of the present application and do not exclude other primer combinations or sequences.

The beneficial effects of the present application are as follows:

The primer design method for selective whole genome amplification of the present application can design a primer combination that enriches multiple target genomes simultaneously and effectively improves target genome coverage, thereby meeting the needs of pathogenic microorganism genome enrichment. Compared with existing primer design workflows, it takes about 80% less time under the same conditions and improves primer design efficiency and quality.

Drawings

FIG. 1 is a flow chart of the primer design method for selective whole genome amplification in the examples of the present application;

FIG. 2 is a block diagram of the primer design apparatus for selective whole genome amplification in the examples of the present application;

FIG. 3 shows the coverage analysis of the three primer combinations on the whole genome of Staphylococcus aureus in the examples of the present application;

FIG. 4 shows the coverage analysis of the three primer combinations on the whole genome of Cryptococcus gattii in the examples of the present application;

FIG. 5 shows the coverage analysis of the three primer combinations on the whole genome of Klebsiella pneumoniae in the examples of the present application;

FIG. 6 shows the coverage analysis of the three primer combinations on the whole genome of Stenotrophomonas maltophilia in the examples of the present application;

FIG. 7 shows the coverage analysis of the three primer combinations on the whole genome of Enterococcus faecium in the examples of the present application;

FIG. 8 shows the coverage analysis of the three primer combinations on the whole genome of Pseudomonas aeruginosa in the examples of the present application.

Detailed Description

The existing primer design process for enriching pathogenic microorganisms by selective genome amplification is mainly the scheme developed by Erik L. Clarke et al., which has drawbacks including long running time, support for only a single target, and low microorganism coverage.

To improve enrichment quality and efficiency, the present application builds on the Erik L. Clarke primer design process, changes the parameters and conditions of primer design and screening, and adds evaluation and result analysis of primer combinations, thereby overcoming these drawbacks of the conventional process.

The primer design method for selective whole genome amplification of the present application, as shown in FIG. 1, includes a primer design parameter setting step 11, a primer design step 12, a primer combination screening step 13, a primer combination evaluation step 14, and a result analysis step 15.

The primer design parameter setting step 11 includes presetting the primer length range, the range of the number of primers in a primer combination, the average distance of the primer combination in the target genome, the maximum distance of the primer combination in the target genome, the distribution uniformity score of the primer combination in the target genome, and the average distance of the primer combination in the background genome. In one implementation of the present application, this step is effectively parameter initialization: a default value for each parameter is given based on the relevant literature and experimental experience, and each of the parameters listed above can be customized according to the use requirements.

The primer design step 12 includes designing primers, according to the parameters set in the primer design parameter setting step, from the differences in k-mer (sliding-window subsequence) occurrence between the background genome and the target genome, without screening each designed primer for uniform distribution in each target genome.
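The k-mer comparison in this step can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the application's implementation: the pure-Python counter stands in for Jellyfish (which the example uses), k-mers are counted on both strands, the mean distance of a k-mer is assumed to be the genome length divided by its number of occurrences, and a k-mer is kept when its mean distance is at most the target threshold and at least the background threshold (defaults taken from the example values of 100,000 bp and 1,000,000 bp). The function names are hypothetical. As described above, no uniformity (Gini) filter is applied at this stage.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

VALID_BASES = frozenset("ACGT")
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def count_kmers(seqs: Iterable[str], k: int) -> Tuple[Counter, int]:
    """Count k-mers on both strands (a stand-in for Jellyfish) and return
    (counts, total length of the forward sequences)."""
    counts: Counter = Counter()
    total_length = 0
    for seq in seqs:
        seq = seq.upper()
        total_length += len(seq)
        for strand in (seq, revcomp(seq)):
            for i in range(len(strand) - k + 1):
                kmer = strand[i:i + k]
                if VALID_BASES.issuperset(kmer):  # skip k-mers containing N etc.
                    counts[kmer] += 1
    return counts, total_length


def candidate_primers(
    target: Iterable[str],
    background: Iterable[str],
    k_min: int = 8,
    k_max: int = 12,
    max_target_mean_dist: float = 100_000,
    min_bg_mean_dist: float = 1_000_000,
) -> Dict[str, Tuple[float, float]]:
    """Keep k-mers that bind often in the target and rarely in the background.
    Assumption: mean distance = genome length / number of occurrences.
    No uniformity (Gini) filter is applied here."""
    target, background = list(target), list(background)
    selected: Dict[str, Tuple[float, float]] = {}
    for k in range(k_min, k_max + 1):
        t_counts, t_len = count_kmers(target, k)
        b_counts, b_len = count_kmers(background, k)
        for kmer, n_t in t_counts.items():
            t_dist = t_len / n_t
            n_b = b_counts.get(kmer, 0)
            b_dist = b_len / n_b if n_b else float("inf")
            if t_dist <= max_target_mean_dist and b_dist >= min_bg_mean_dist:
                selected[kmer] = (t_dist, b_dist)
    return selected
```

For real genomes, a dedicated k-mer counter such as Jellyfish is far faster than this loop; the sketch only shows the selection logic.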

The primer design step of the present application abandons the sWGA screening concept proposed by Erik L. Clarke et al. in 2017, in which a primer passes screening only if its binding sites are evenly distributed in the target microorganism (that is, its Gini coefficient does not exceed 0.6). Removing this requirement lets more primers pass and enter the next module, the primer combination screening step. In a database of multiple pathogens, or an integration of multiple microorganisms, no single primer can be evenly distributed in every microorganism; however, the same effect can be achieved by controlling the composition of the primer combination, and keeping more primers provides more possibilities for the next screening step. Hairpin screening can also be added to the primer design step as required: whether to screen hairpin structures can be decided according to the designed primer length and the specific experimental procedure, so as to reduce primer secondary structures and improve subsequent amplification.
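The optional hairpin screen can be approximated with a simple self-complementarity test. The sketch below is a hypothetical heuristic, not the application's criterion: it flags a primer if a stem of at least `min_stem` bases is reverse-complementary to a downstream stretch of the same primer separated by a loop of at least `min_loop` bases. The thresholds and function names are assumptions; a thermodynamic tool would be needed for a rigorous check.

```python
from typing import Iterable, List

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def has_hairpin(primer: str, min_stem: int = 3, min_loop: int = 3) -> bool:
    """Heuristic (assumed thresholds): flag a primer whose 5' stretch can pair
    antiparallel with a downstream stretch of itself."""
    p = primer.upper()
    n = len(p)
    for i in range(n - min_stem + 1):
        # The 3' arm must start after the 5' arm plus a loop of at least min_loop bases.
        for j in range(i + min_stem + min_loop, n - min_stem + 1):
            if p[i:i + min_stem] == revcomp(p[j:j + min_stem]):
                return True
    return False


def drop_hairpins(primers: Iterable[str], min_stem: int = 3, min_loop: int = 3) -> List[str]:
    """Keep only primers that the heuristic does not flag."""
    return [p for p in primers if not has_hairpin(p, min_stem, min_loop)]
```

With these defaults an 8-mer can never be flagged (a 3-base stem, 3-base loop and 3-base stem need at least 9 bases), which illustrates why the text ties the hairpin screen to primer length.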

The Gini coefficient is a common international index for measuring income inequality among residents of a country or region. It lies between 0 and 1, and a larger value means greater inequality. With the integration of mathematics, biology and information science, bioinformatics has borrowed this index, originally used to assess economic development and the distribution of wealth. Here, a Gini coefficient is calculated from the distribution of the primers in the target genome, and primer combinations whose binding sites are unevenly distributed or clustered are excluded. A Gini coefficient of 1, the maximum, indicates absolutely uneven distribution, i.e., all primer binding sites fall at a single position; the minimum of 0 indicates absolutely even distribution, i.e., the binding sites are spread completely uniformly across the target genome. In practice the Gini coefficient is rarely exactly 0 or 1: the more uneven the primer distribution, the closer the value is to 1, and the more even, the closer it is to 0.
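A minimal sketch of the Gini calculation follows. The text does not say which quantity the coefficient is computed over; this sketch assumes the gaps between consecutive binding sites in one target genome, and uses the standard sorted-values formula G = 2·Σ i·x_(i) / (n·Σx) − (n+1)/n with i = 1..n over the values in ascending order. Function names are hypothetical.

```python
from typing import Sequence


def gini(values: Sequence[float]) -> float:
    """Gini coefficient: 0 = perfectly equal values, values near 1 = highly unequal."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n, with i = 1..n over sorted x
    weighted = sum(i * x for i, x in enumerate(xs, start=1))
    return 2.0 * weighted / (n * total) - (n + 1) / n


def binding_site_gini(positions: Sequence[int]) -> float:
    """Assumption: Gini coefficient of the gaps between consecutive binding sites."""
    sites = sorted(positions)
    gaps = [b - a for a, b in zip(sites, sites[1:])]
    return gini(gaps)
```

Evenly spaced sites such as `[0, 100, 200, 300]` give 0, while three sites bunched together followed by a long gap, `[0, 1, 2, 1002]`, give about 0.66, above a 0.6 cutoff. For n values this formula cannot exceed (n−1)/n, consistent with the observation that a value of exactly 1 does not occur in practice.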

The primer combination screening step 13 includes exhaustively screening the primers obtained in the primer design step according to the parameters set in the primer design parameter setting step, and outputting the optimal N primer combinations for subsequent evaluation, ranked by the ratio of the number of binding sites of the combination in the target genome to that in the background genome, the maximum distance of the combination in the target genome, the average distance of the combination in the target genome, and the average distance of the combination in the background genome. N is an integer; in one implementation of the present application, the optimal 10 primer combinations are output for subsequent evaluation and screening.
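The exhaustive screen can be sketched as follows, as an illustration under assumptions rather than the application's code. Binding positions of each candidate primer are given per genome as hypothetical dictionaries; Mean distance is assumed to be genome length divided by the number of distinct sites; Max_target_dist is the largest gap including the genome ends; combinations exceeding a maximum target distance are discarded; and survivors are ranked lexicographically by Ratio (descending), Max_target_dist and Mean_target_dist (ascending) and Mean_bg_dist (descending). The text names these four criteria but not how they are combined, so that ordering is an assumption.

```python
from itertools import combinations
from typing import Dict, Iterable, List, Sequence, Tuple


def site_metrics(positions: Sequence[int], genome_length: int) -> Tuple[int, float, int]:
    """Return (number of distinct sites, mean distance, largest gap) for one genome."""
    sites = sorted(set(positions))
    if not sites:
        return 0, float("inf"), genome_length
    mean_dist = genome_length / len(sites)  # assumption: length / number of sites
    # Gaps between consecutive sites, including the stretches before the first
    # site and after the last one.
    edges = [0] + sites + [genome_length]
    max_gap = max(b - a for a, b in zip(edges, edges[1:]))
    return len(sites), mean_dist, max_gap


def screen_combinations(
    target_sites: Dict[str, List[int]],
    bg_sites: Dict[str, List[int]],
    target_len: int,
    bg_len: int,
    sizes: Iterable[int],
    max_target_dist: int,
    top_n: int = 10,
) -> List[dict]:
    """Exhaustively score every combination of the given sizes and return the top_n."""
    results = []
    primers = sorted(target_sites)
    for size in sizes:
        for combo in combinations(primers, size):
            t_pos = [p for primer in combo for p in target_sites[primer]]
            b_pos = [p for primer in combo for p in bg_sites.get(primer, [])]
            n_t, mean_t, max_t = site_metrics(t_pos, target_len)
            n_b, mean_b, _ = site_metrics(b_pos, bg_len)
            if max_t > max_target_dist:
                continue  # a large stretch of the target would be left unamplified
            results.append({
                "primers": combo,
                "Ratio": n_t / n_b if n_b else float("inf"),
                "Max_target_dist": max_t,
                "Mean_target_dist": mean_t,
                "Mean_bg_dist": mean_b,
            })
    # Assumed lexicographic ranking of the four criteria named in the text.
    results.sort(key=lambda r: (-r["Ratio"], r["Max_target_dist"],
                                r["Mean_target_dist"], -r["Mean_bg_dist"]))
    return results[:top_n]
```

The number of combinations grows combinatorially with the size of the candidate pool, so this naive loop only suits small pools.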

In one implementation of the present application, the primer combination screening step performs multi-parameter screening according to the parameters set in the primer design parameter setting step and thresholds chosen by the experimenter, including the total number of candidate primers used to build combinations, the range of the number of primers per combination, whether to filter primers capable of forming primer dimers, the Gini coefficient threshold, the average distance of the primer combination in the target genome, and the maximum distance of the primer combination in the target genome. The step outputs the 10 combinations with the best Ratio (number of binding sites of the combination in the target genome / number of binding sites in the background genome), Max_target_dist (maximum distance in the target genome), Mean_target_dist (average distance in the target genome) and Mean_bg_dist (average distance in the background genome). Because the target microorganisms differ greatly, the genome of each target microorganism is preferably analyzed and filtered separately: the parameters of each primer combination are tallied on the genome of each pathogenic microorganism, and the optimal combinations under each parameter are obtained for selection using the relation score = (frequency of the target pathogen in the alignment × Gini coefficient) / (frequency of the background library in the alignment).
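The per-pathogen score can be written directly from the relation above. In this sketch, "frequency in the alignment" is assumed to mean binding sites per base (sites divided by genome length), and the Gini coefficient is passed in rather than recomputed; both interpretations are assumptions, and the relation is implemented exactly as worded in the text. Function names are hypothetical.

```python
from typing import Dict, Tuple


def binding_frequency(n_sites: int, genome_length: int) -> float:
    """Binding sites per base (assumed meaning of 'frequency in the alignment')."""
    return n_sites / genome_length


def combination_score(target_freq: float, gini: float, bg_freq: float) -> float:
    """score = target frequency * Gini coefficient / background frequency, as worded in the text."""
    if bg_freq == 0:
        return float("inf")  # no background binding at all
    return target_freq * gini / bg_freq


def score_per_pathogen(
    stats: Dict[str, Tuple[int, int, float]],  # pathogen -> (sites, genome length, Gini)
    n_bg_sites: int,
    bg_length: int,
) -> Dict[str, float]:
    """Score one primer combination separately on each target pathogen genome."""
    bg_freq = binding_frequency(n_bg_sites, bg_length)
    return {
        name: combination_score(binding_frequency(n, length), g, bg_freq)
        for name, (n, length, g) in stats.items()
    }
```

For example, 50 sites on a 1 Mb pathogen genome with a Gini coefficient of 0.4, against 10 sites on a 10 Mb background, gives a score of 5×10⁻⁵ × 0.4 / 10⁻⁶ = 20.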

The primer combination evaluation step 14 includes simulating, according to the parameters set in the primer design parameter setting step, how well each primer combination covers the different target genomes, predicting the regions likely to be lost in the experiment, and displaying them as images or tables to facilitate screening. This step presents the coverage and the possibly lost regions intuitively, so that the user can choose primer combinations according to different conditions and requirements.
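A minimal coverage simulation in the spirit of this step is sketched below. Each binding site is assumed to amplify a fixed span downstream of its position (the example's default assumed span per site is 10 kb); overlapping spans are merged, and uncovered stretches are reported as regions likely to be lost. Strand orientation and amplification efficiency are ignored, and the function name is hypothetical.

```python
from typing import List, Sequence, Tuple


def simulate_coverage(
    sites: Sequence[int], genome_length: int, span: int = 10_000
) -> Tuple[float, List[Tuple[int, int]]]:
    """Return (covered fraction, uncovered regions), assuming each site amplifies [p, p + span)."""
    intervals = sorted(
        (p, min(genome_length, p + span)) for p in sites if 0 <= p < genome_length
    )
    merged: List[List[int]] = []
    for start, end in intervals:
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)  # overlapping or adjacent: extend
        else:
            merged.append([start, end])
    covered = sum(end - start for start, end in merged)
    # Gaps between merged intervals are the regions predicted to be lost.
    lost, prev_end = [], 0
    for start, end in merged:
        if start > prev_end:
            lost.append((prev_end, start))
        prev_end = end
    if prev_end < genome_length:
        lost.append((prev_end, genome_length))
    return covered / genome_length, lost
```

For sites at 0, 5,000 and 30,000 on a 50 kb genome, the sketch reports 50% coverage and two lost regions, 15-30 kb and 40-50 kb.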

The result analysis step 15 includes outputting, based on the parameters set in the primer design parameter setting step and the results of the primer combination evaluation step, the target genome coverage and a depth map for each primer combination. These reflect the expected performance of each combination on the different target genomes, so that combinations can be screened according to the experimental conditions. This step gives experimenters clear, visual results from which they can further refine and select combinations.
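A depth map of the kind described can be approximated by counting, for each genome bin, how many simulated amplicons overlap it. The sketch below uses a difference array over bins; the 10 kb span per site, the 1 kb bin size and the function name are illustrative assumptions.

```python
from typing import List, Sequence


def depth_profile(
    sites: Sequence[int], genome_length: int, span: int = 10_000, bin_size: int = 1_000
) -> List[int]:
    """Number of simulated amplicons [p, p + span) overlapping each genome bin."""
    n_bins = -(-genome_length // bin_size)  # ceiling division
    diff = [0] * (n_bins + 1)
    for p in sites:
        if not 0 <= p < genome_length:
            continue
        end = min(genome_length, p + span)
        diff[p // bin_size] += 1              # amplicon starts in this bin
        diff[(end - 1) // bin_size + 1] -= 1  # and stops after the bin holding its last base
    depth, running = [], 0
    for i in range(n_bins):
        running += diff[i]
        depth.append(running)
    return depth
```

The fraction of bins with non-zero depth, `sum(d > 0 for d in depth) / len(depth)`, gives a bin-level coverage estimate to plot alongside the depth profile.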

As those skilled in the art will understand, all or part of the functions of the primer design method of the present application can be implemented in hardware or by a computer program. Accordingly, building on the method, the present application further provides a primer design device for selective whole genome amplification which, as shown in FIG. 2, includes a primer design parameter setting module 21, a primer design module 22, a primer combination screening module 23, a primer combination evaluation module 24, and a result analysis module 25.

The primer design parameter setting module 21 sets the primer length range, the range of the number of primers in a primer combination, the average distance of the primer combination in the target genome, the maximum distance of the primer combination in the target genome, the distribution uniformity score of the primer combination in the target genome, and the average distance of the primer combination in the background genome. The primer design module 22 designs primers, according to the parameters set by module 21, from the differences in k-mer (sliding-window subsequence) occurrence between the background genome and the target genome, without screening each designed primer for uniform distribution in each target genome. The primer combination screening module 23 exhaustively screens the primers obtained by module 22 according to the parameters set by module 21 and outputs the optimal N primer combinations for subsequent evaluation, ranked by the ratio of binding sites in the target genome to binding sites in the background genome, the maximum distance in the target genome, the average distance in the target genome, and the average distance in the background genome. The primer combination evaluation module 24 simulates the coverage of each primer combination on the different target genomes according to the parameters set by module 21, predicts the regions likely to be lost in the experiment, and displays them as images or tables to facilitate screening. The result analysis module 25 outputs, based on the parameters set by module 21 and the results of module 24, the target genome coverage and depth map of each primer combination, reflecting its expected performance on the different target genomes so that combinations can be screened according to the experimental conditions.

Since the primer design method of the present application aims to obtain primer combinations that effectively enrich a plurality of target pathogenic microorganisms, the present application further provides a method for enriching a pathogenic microorganism genome or pathogenic microorganism-specific genes based on selective genome amplification, in which amplification is performed with a primer combination obtained by the primer design method or device of the present application.

In addition, in one specific implementation of the present application, three specific primer combinations were developed to enrich a database integrating the genomes of three pathogenic microorganisms: Escherichia coli, Staphylococcus aureus and Candida albicans. They are a first primer combination consisting of the primers shown in Seq ID No.1 to Seq ID No.5, a second primer combination consisting of the primers shown in Seq ID No.6 to Seq ID No.10, and a third primer combination consisting of the primers shown in Seq ID No.11 to Seq ID No.15.

The present application will be described in further detail with reference to specific examples. The following examples are intended to be illustrative of the present application only and should not be construed as limiting the present application.

Examples

The primer design method for selective whole genome amplification of the embodiment comprises the following steps:

A primer design parameter setting step, which includes presetting the primer length range, the range of the number of primers in a primer combination, the average distance of the primer combination in the target genome, the maximum distance of the primer combination in the target genome, the distribution uniformity score of the primer combination in the target genome and the average distance of the primer combination in the background genome. Default values can be given based on the relevant literature and experimental experience, and each parameter can be customized according to the use requirements. Specifically, in this example the primer length range is 8-12 bp, the average distance of the primer combination in the target genome is 100,000 bp, and the average distance of the primer combination in the background genome is 1,000,000 bp; the parameter settings are detailed in Table 1.
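The example's parameter values can be held in a small configuration object, as sketched below. Only the primer length range, the two average distances, the 10 kb assumed span per binding site, the top-10 output and the use of Jellyfish come from the text; the class name, the hairpin and dimer flags and their defaults are hypothetical. The helper methods express the average distances as binding sites per base, which is comparable in form to the fractional thresholds (1/100000 and 1/150000) of the conventional process.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DesignParams:
    """Example parameter values; override any field to customize a run."""
    primer_len_min: int = 8            # bp
    primer_len_max: int = 12           # bp
    target_mean_dist: int = 100_000    # bp, average distance in the target genome
    bg_mean_dist: int = 1_000_000      # bp, average distance in the background genome
    site_span: int = 10_000            # bp assumed covered by each binding site
    top_n: int = 10                    # combinations output for evaluation
    counting_tool: str = "jellyfish"
    filter_hairpins: bool = False      # hypothetical default: optional per the text
    filter_dimers: bool = False        # hypothetical default: optional per the text

    def target_site_frequency(self) -> float:
        """Target average distance expressed as binding sites per base."""
        return 1 / self.target_mean_dist

    def bg_site_frequency(self) -> float:
        """Background average distance expressed as binding sites per base."""
        return 1 / self.bg_mean_dist
```

A frozen dataclass keeps each run's settings immutable, while `DesignParams(primer_len_max=10)` shows how a single parameter can be customized.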

A primer design step, which includes designing primers, according to the parameters set in the primer design parameter setting step, from the differences in k-mer (sliding-window subsequence) occurrence between the background genome and the target genome, without screening each designed primer for uniform distribution in each target genome; in addition, the designed primers can optionally be screened for hairpin structures.

A primer combination screening step, which includes exhaustive multi-parameter screening according to the parameters set in the primer design parameter setting step and thresholds chosen by the experimenter, including the total number of candidate primers used to build combinations, the range of the number of primers per combination, whether to filter primers capable of forming primer dimers, the Gini coefficient threshold, the average distance of the primer combination in the target genome, and the maximum distance of the primer combination in the target genome. The optimal 10 primer combinations are then output for subsequent evaluation according to the ratio (Ratio) of the number of binding sites of the combination in the target genome to the number in the background genome, the maximum distance of the combination in the target genome (Max_target_dist), the average distance of the combination in the target genome (Mean_target_dist) and the average distance of the combination in the background genome (Mean_bg_dist). Because the target pathogenic microorganisms differ greatly, the genome of each target microorganism is analyzed and filtered separately, the parameters of each primer combination are tallied on the genome of each pathogenic microorganism, and the optimal combinations under each parameter are obtained for selection using the relation score = (frequency of the target pathogen in the alignment × Gini coefficient) / (frequency of the background library in the alignment).

A primer combination evaluation step, which includes simulating, according to the parameters set in the primer design parameter setting step, the coverage of each primer combination on the different target genomes, predicting the regions likely to be lost in the experiment, and displaying them as images or tables to facilitate screening of the primer combinations.

A result analysis step, which includes outputting, based on the parameters set in the primer design parameter setting step and the results of the primer combination evaluation step, the target genome coverage and depth map of each primer combination, reflecting its expected performance on the different target genomes so that primer combinations can be screened according to the experimental conditions.

The parameter settings for each step of this example are shown in Table 1. For comparison, this example also uses the existing Erik L. Clarke primer design process, whose parameter settings are likewise detailed in Table 1.

Table 1 Parameter settings of the primer design procedures

Because of constraints on reaction temperature and GC content, primers as short as 5-7 bp are not designed, since such short primers are prone to mismatched or unstable binding; this example therefore uses a default primer length of 8-12 bp, which can be modified according to the user's analysis requirements. The minimum average distance in the target genome and the maximum average distance in the background genome are set to fixed values: since amplification from a primer covers a limited range, generally 5-15 kb, fixed values are more suitable than the thresholds used in the conventional process, namely 1/100000 of the target whole genome size and 1/150000 of the background whole genome size. Hairpin screening is optional and can be enabled according to design requirements; because this example is intended for pathogen detection, where amplifying most of the target genome is the priority, the threshold can be relaxed when necessary. In addition, this example uses Jellyfish for primer screening, which was measured in practice to be faster than bedtools.

In Table 1, whether the influence of primer dimers needs to be considered can be decided according to experimental and design requirements, which makes the system more adjustable and flexible. This example sets a lower average distance in the target genome, which yields fewer primer combinations, although those combinations should in theory perform better. The average distance in the background genome is not considered again at the primer combination screening stage, because the primer screening module already constrains how often primers occur in the background genome. The maximum distance in the target genome is constrained to prevent large contiguous stretches of the target genome from being lost; this makes the results more rigorous, because selective whole genome amplification cannot amplify the target genome completely and covers only most of it. The primer design method of this example uses more diverse scoring criteria, including the Gini coefficient, the ratio, the maximum distance in the target genome, the average distance in the background genome and the score, so the influence of the parameters is considered from more angles. In addition, for the Table 1 parameter "length each primer site is assumed to cover", each binding site is assumed by default to cover 10 kb, since the range covered by each primer is approximately 5-15 kb.
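When dimer filtering is enabled, a simple 3'-end complementarity test can flag problem pairs. This is a hypothetical heuristic rather than the application's criterion: two primers are flagged if the reverse complement of the last `min_overlap` bases of either primer occurs anywhere in the other, and self-dimers are checked by pairing each primer with itself. The overlap threshold and function names are assumptions.

```python
from itertools import combinations_with_replacement
from typing import Sequence

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def forms_3prime_dimer(a: str, b: str, min_overlap: int = 4) -> bool:
    """Heuristic (assumed threshold): True if the 3' tail of either primer
    can base-pair anywhere on the other."""
    a, b = a.upper(), b.upper()
    return revcomp(a[-min_overlap:]) in b or revcomp(b[-min_overlap:]) in a


def combination_has_dimer(primers: Sequence[str], min_overlap: int = 4) -> bool:
    """Check every pair in a combination, including each primer with itself."""
    return any(
        forms_3prime_dimer(a, b, min_overlap)
        for a, b in combinations_with_replacement(primers, 2)
    )
```

Checking the 3' tail is the design choice here because a paired 3' end is what a polymerase can extend into a primer-dimer artifact.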

In this example, using the parameters in Table 1, the genomes of three pathogenic microorganisms, Escherichia coli, Staphylococcus aureus and Candida albicans, were compared with the hg19 human reference genome as background, and primer combinations that bind the three pathogen genomes frequently and the human genome rarely were designed.

In the primer design method of this example, the primer design parameter setting step, primer design step, primer combination screening step, primer combination evaluation step and result analysis step can all be automated in software. This example therefore develops a primer design device for selective whole genome amplification comprising a primer design parameter setting module, a primer design module, a primer combination screening module, a primer combination evaluation module and a result analysis module, each implementing the corresponding step of the primer design method of this example.

For ease of use, this example further provides a primer design apparatus for selective whole genome amplification comprising a memory for storing a program and a processor that implements the primer design method of this example by executing that program. In other words, the steps of the primer design method are implemented as a program that can run on existing processors. The primer design program of this example works as follows:

Input: the target genome sequence FASTA (.fa) file, the background genome sequence FASTA file, the run path, the path of the preinstalled Jellyfish software, and the table mapping target pathogen FASTA IDs to names for the three pathogenic microorganisms. Output: a PDPD-sWGA run parameter file, a new target pathogen FASTA ID-to-name table, and a statistical table of target pathogen whole genome lengths.

These outputs are as follows. The PDPD-sWGA run parameter file holds the parameters called by each subsequent module; users can also customize them according to their analysis requirements. The new target pathogen FASTA ID-to-name table is a copy of the original table placed at a designated location, so that subsequent modules can call it and so that accidental deletion of the original by the user does not cause system errors. The target pathogen whole genome length statistical table gives the whole genome length of each target pathogen, calculated from the target genome sequence FASTA file, for use by subsequent modules.
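The whole genome length table can be derived from the target FASTA file and the ID-to-name table, as sketched below. The sketch assumes the table is available as a dictionary from sequence ID (the first word of a FASTA header) to pathogen name, and that all sequences belonging to one pathogen (for example a chromosome and its plasmids) are summed; IDs missing from the table are reported under their own ID. The function name is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable


def genome_lengths(fasta_lines: Iterable[str], id_to_name: Dict[str, str]) -> Dict[str, int]:
    """Sum sequence lengths per pathogen from FASTA lines."""
    lengths: Dict[str, int] = defaultdict(int)
    current = None
    for line in fasta_lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            parts = line[1:].split()
            seq_id = parts[0] if parts else ""       # ID = first word of the header
            current = id_to_name.get(seq_id, seq_id)  # unmapped IDs keep their own ID
        elif current is not None:
            lengths[current] += len(line)
    return dict(lengths)
```

Because it accepts any iterable of lines, the function can read a file directly, e.g. `with open(path) as fh: table = genome_lengths(fh, id_to_name)`.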

Next, PDPD-sWGA is run with the primer screening parameter file, Jellyfish, the target genome sequence FASTA file and the background genome sequence FASTA file, producing the Jellyfish intermediate results and the filtered primer set. The parameters in the primer screening parameter file are shown in Table 1 and include "primer length", "average distance of the primer combination in the target genome", "average distance of the primer combination in the background genome", "hairpin structure", "primer selection tool" and "primer selection tool parameters".

Of these outputs, the Jellyfish intermediate result is a binary file, and the filtered primer set contains all primers that passed screening together with their parameters.

PDPD-sWGA is then run with the primer combination screening parameter file and the filtered primer set, producing a file of target-genome binding positions for the top 5,000 primers by score, the filtered primer combination set, and the top 30 primer combinations for each parameter. The parameters in the primer combination screening parameter file are shown in Table 1 and include "number of candidate primers", "range of the number of primers in a primer combination", "whether to filter primer dimers", "maximum Gini coefficient", "minimum average distance of the primer combination in the target genome", "maximum average distance of the primer combination in the background genome", "maximum distance of the primer combination in the target genome" and "scoring criterion".

Of these outputs, the position file for the top 5,000 primers is obtained by selecting the 5,000 highest-scoring primers from the filtered primer set; these primers serve as the building blocks of primer combinations. The filtered primer combination set is obtained by screening, filtering and scoring every primer combination one by one according to the parameters in the PDPD-sWGA run parameter file. The top-30 sets are obtained by sorting the filtered primer combination set by each parameter in turn, and the top 30 combinations for each parameter are stored in separate files.
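Sorting the filtered combinations by each parameter in turn and keeping the top 30 can be sketched as follows. The dictionary-based record format and the function name are assumptions, and whether each parameter is maximized or minimized is passed in by the caller rather than taken from the text.

```python
from typing import Dict, List


def top_per_parameter(
    combos: List[dict], directions: Dict[str, str], n: int = 30
) -> Dict[str, List[dict]]:
    """For each parameter, sort all combinations by it and keep the first n.
    directions maps a parameter name to "max" (higher is better) or "min"."""
    tops: Dict[str, List[dict]] = {}
    for param, direction in directions.items():
        ranked = sorted(combos, key=lambda c: c[param], reverse=(direction == "max"))
        tops[param] = ranked[:n]  # each list could then be written to its own file
    return tops
```

Keeping one ranked list per parameter, instead of a single combined ranking, matches the description of storing each parameter's top 30 in a separate file and lets the user compare combinations that excel on different criteria.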

PDPD-sWGA is then run with the primer combination evaluation parameter file, a specified file, and the corresponding target-genome position file for the top-30 ranked results, producing a preparation file for evaluation, an R script, and the primer combination simulation evaluation results. The parameters in the primer combination evaluation parameter file are shown in Table 1; the main one is "length each primer site is assumed to cover".

Of these, the specified file defaults to the top-30 primer combination set by score, and users can designate a different primer combination set file according to their analysis requirements. The preparation file for evaluation is compiled from the specified file and contains the data needed by the subsequent R script. In the primer combination simulation evaluation plot, the abscissa represents the position of each window along the pathogen genome, the ordinate represents the number of primers contained in each window, and the serial numbers on the right identify the primer combinations.

Using the primer design method and workflow described above, the three best primer combinations were finally selected for enriching the genomes of three pathogenic microorganisms, Escherichia coli, Staphylococcus aureus and Candida albicans; they are shown in Table 2. All three primer combinations were synthesized by Shanghai Yingjun Biotechnology Ltd.

Table 2 Three primer combinations

This example measured the time required to design primers and primer combinations with the primer design method of this example and compared it with the time required by the conventional Erik L. Clarke process; the results are shown in Table 3. The time required by the Erik L. Clarke process is taken from: Clarke E.L., Sundararaman S.A., et al. swga: a primer design toolkit for selective whole genome amplification. Bioinformatics, 2017, 33(14): 2071-2077.

Table 3 Time required for primer design

Step | Primer design method of this example | Existing process
Single-primer screening | 354 s | 9,614 s
Multi-primer screening | 72,585 s | More than 5 days

The results in Table 3 show that the primer design method of this example, whether it is single primer design and screening or primer combination design and screening, is much faster than the existing process. In addition, from the output results, the primer design method of the present example can output primer combinations for a plurality of target genomes, whereas the conventional flow can output only a primer combination for a single target genome.
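The speed-ups in Table 3 can be checked with simple arithmetic, as in the short sketch below (the function name is illustrative). Because the existing process took more than 5 days for multi-primer screening, using exactly 5 days (432,000 s) gives only a lower bound on the saving; 72,585 s is about 20.2 hours.

```python
def time_reduction(new_seconds: float, old_seconds: float) -> float:
    """Fraction of run time saved relative to the old process."""
    return 1 - new_seconds / old_seconds


single = time_reduction(354, 9_614)
# The existing process took "more than 5 days", so 5 days gives a lower bound.
multi = time_reduction(72_585, 5 * 24 * 3600)
print(f"single-primer: {single:.1%}, multi-primer: at least {multi:.1%}")
# prints "single-primer: 96.3%, multi-primer: at least 83.2%"
```

Both figures exceed the roughly 80% time saving stated for the method.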

In this example, the coverage of the three primer combinations on the genomes of the three pathogenic microorganisms Escherichia coli, Staphylococcus aureus and Candida albicans was analyzed. Partial results are shown in FIG. 3, which shows the coverage of the three primer combinations of this example on the whole genome of Staphylococcus aureus; from top to bottom, the panels show the coverage statistics of the first, second and third primer combinations on the Staphylococcus aureus genome. The abscissa is the genome position in kilobases (kb) and the ordinate is the coverage depth. The results show that the three primer combinations cover about 65% of the genomes of the three pathogenic microorganisms on average.

In addition, the coverage of the three primer combinations obtained in this example was analyzed on common microorganisms including Cryptococcus gattii, Klebsiella pneumoniae, Stenotrophomonas maltophilia, Enterococcus faecium and Pseudomonas aeruginosa. The results are shown in FIGS. 4 to 8, which show the coverage of the three primer combinations of this example on the genomes of Cryptococcus gattii (FIG. 4), Klebsiella pneumoniae (FIG. 5), Stenotrophomonas maltophilia (FIG. 6), Enterococcus faecium (FIG. 7) and Pseudomonas aeruginosa (FIG. 8). In FIGS. 4 to 8, the panels from top to bottom show the coverage statistics of the first, second and third primer combinations on the corresponding genome; the abscissa is the genome position in kilobases (kb) and the ordinate is the coverage depth.

The results in FIGS. 4 to 8 show that the three primer combinations of this example reach a coverage of about 60% on these common microorganisms.

The three primer combinations were each used to enrich the simulated sample and the clinical sample. The enriched products were sequenced, and the enrichment of the genomes of three pathogenic microorganisms, Escherichia coli, Staphylococcus aureus and Candida albicans, was analyzed. N6 random primers were used as a control for comparing the enrichment performance of the three primer combinations of this example. The details are as follows:

Simulated sample: genomic nucleic acids of Escherichia coli, Staphylococcus aureus and Candida albicans, 1000 copies each, were mixed and added to 50 ng of human genomic nucleic acid to prepare the simulated sample. The simulated sample was then divided into four equal portions, one each for amplification and enrichment with the three primer combinations and with the N6 random primers.
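The arithmetic below helps explain why selective enrichment is needed for this sample. It converts genome copies to mass with the standard relation mass = copies × genome length (bp) × ~650 g/mol per bp ÷ Avogadro's number. The genome sizes are approximate public values and are assumptions, not figures from this example. For Candida albicans, the haploid assembly size is used.

```python
AVOGADRO = 6.02214076e23  # molecules per mole
G_PER_MOL_PER_BP = 650.0  # approximate molar mass of one double-stranded base pair

# Approximate genome sizes in bp (assumed values, not taken from this example).
GENOME_BP = {
    "Escherichia coli": 4.64e6,
    "Staphylococcus aureus": 2.8e6,
    "Candida albicans": 14.3e6,  # haploid assembly size
}


def copies_to_pg(copies: float, genome_bp: float) -> float:
    """Mass in picograms of the given number of double-stranded genome copies."""
    grams = copies * genome_bp * G_PER_MOL_PER_BP / AVOGADRO
    return grams * 1e12


def target_mass_fraction(copies_each: float, background_ng: float) -> float:
    """Fraction of total input mass contributed by the target genomes."""
    target_pg = sum(copies_to_pg(copies_each, bp) for bp in GENOME_BP.values())
    return target_pg / (target_pg + background_ng * 1000.0)  # 1 ng = 1000 pg


if __name__ == "__main__":
    for name, bp in GENOME_BP.items():
        print(f"{name}: {copies_to_pg(1000, bp):.1f} pg per 1000 copies")
    print(f"target fraction of input mass: {target_mass_fraction(1000, 50):.5%}")
```

With these approximate sizes, the three target genomes together contribute roughly 23 pg against 50,000 pg of human DNA, which is under 0.05% of the input mass. Dividing the sample into four equal portions does not change this fraction.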

Clinical sample: nucleic acid from one Mycoplasma-positive clinical specimen was used as the clinical sample. It was sequenced to test the applicability of the three primer combinations of this example.

The DNA polymerase used for PCR amplification in this example is phi29 DNA polymerase. It binds templates tightly, has strong strand-displacement activity, and can generate DNA fragments of 50-100 kb under isothermal conditions. The 50 μL PCR amplification system of this example contains:

- 1 μL of nucleic acid sample;
- 5 μL of 10× Phi29 DNA Polymerase Reaction Buffer;
- 3 μL of phi29 DNA polymerase (1000 U/mL);
- 2 μL of dNTPs (25 mmol/L each);
- 0.5 μL of bovine serum albumin (10 mg/mL);
- 12.5 μL of the primer combination (10 μM);
- nuclease-free water to a final volume of 50 μL.

All primers in the primer combination are present at equal concentrations.
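The reaction recipe above can be checked with the simple dilution rule C_final = C_stock × V_added / V_total. The sketch below computes the water volume and the final concentrations. It makes one assumption: "10 μM" for the primer combination is read as the total concentration of the primer pool, not the per-primer concentration. The component labels are descriptive only.

```python
TOTAL_UL = 50.0

# (component, volume added in uL, stock concentration, unit); None = no stock concentration.
# Assumption: the primer combination's "10 uM" is taken as the total primer-pool concentration.
COMPONENTS = [
    ("nucleic acid sample", 1.0, None, None),
    ("10x Phi29 reaction buffer", 5.0, 10.0, "x"),
    ("phi29 DNA polymerase", 3.0, 1.0, "U/uL"),  # 1000 U/mL = 1 U/uL
    ("dNTPs (each)", 2.0, 25.0, "mM"),
    ("bovine serum albumin", 0.5, 10.0, "mg/mL"),
    ("primer combination", 12.5, 10.0, "uM"),
]


def water_to_add(components, total_ul=TOTAL_UL):
    """Nuclease-free water (uL) needed to bring the mix to total_ul."""
    used = sum(vol for _, vol, _, _ in components)
    if used > total_ul:
        raise ValueError(f"components ({used} uL) exceed the {total_ul} uL reaction")
    return total_ul - used


def final_concentrations(components, total_ul=TOTAL_UL):
    """Final concentration of each component: C_stock * V_added / V_total."""
    return {
        name: (stock * vol / total_ul, unit)
        for name, vol, stock, unit in components
        if stock is not None
    }


if __name__ == "__main__":
    print(f"water: {water_to_add(COMPONENTS):.1f} uL")
    for name, (conc, unit) in final_concentrations(COMPONENTS).items():
        print(f"{name}: {conc:g} {unit}")
```

Under these assumptions, the mix needs 26 μL of water. The final concentrations are:

- 1× buffer;
- 0.06 U/μL phi29 DNA polymerase (3 U per reaction);
- 1 mM of each dNTP;
- 0.1 mg/mL BSA;
- 2.5 μM total primer.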

The prepared PCR reaction mixture is placed in a thermal cycler and cooled stepwise from 35 °C to 30 °C at a rate of 10 min per °C. It is then held at 30 °C for 16 h. During this time, the strand-displacing polymerase gradually opens the double-stranded DNA in the sample, and the primers drive enrichment of the pathogenic microorganism sequences. The temperature is then raised to 65 °C for 10 min to inactivate the phi29 DNA polymerase, and the reaction is finally held at 10 °C.
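The temperature profile can be written out as an explicit step list, for example when programming a thermal cycler. The sketch below assumes one reading of "10 min/°C": the gradient runs as 1 °C steps of 10 min each at 35, 34, 33, 32 and 31 °C, followed by the 16 h hold at 30 °C. Other instruments may implement the gradient as a continuous ramp instead. The function names are hypothetical.

```python
def thermal_program(start_c=35, end_c=30, min_per_degree=10, hold_h=16,
                    inactivate_c=65, inactivate_min=10, final_hold_c=10):
    """Return (temperature_C, minutes) steps; minutes=None means an indefinite hold.

    Assumption: the start_c -> end_c gradient is run as 1 C steps of
    min_per_degree minutes each, followed by the long hold at end_c.
    """
    steps = [(t, min_per_degree) for t in range(start_c, end_c, -1)]
    steps.append((end_c, hold_h * 60))            # isothermal amplification
    steps.append((inactivate_c, inactivate_min))  # phi29 heat inactivation
    steps.append((final_hold_c, None))            # hold until the tubes are removed
    return steps


def timed_minutes(steps):
    """Total duration of all finite steps, in minutes."""
    return sum(m for _, m in steps if m is not None)


if __name__ == "__main__":
    for temp, minutes in thermal_program():
        label = "hold" if minutes is None else f"{minutes} min"
        print(f"{temp} C: {label}")
```

Under this reading, the timed steps total 1020 min (17 h) before the final 10 °C hold.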

After the PCR reaction, the PCR product is purified with Axygen magnetic beads as follows:

(1) bring the PCR product to 100 μL with water, add 50 μL of Axygen magnetic beads, mix thoroughly, and incubate at room temperature for 5 min;

(2) briefly centrifuge the bead mixture from the previous step, place it on a magnetic rack for 2 min, and carefully aspirate and discard the supernatant;

(3) carefully add 500 μL of 70% ethanol, rotate the centrifuge tube to wash the beads thoroughly, let it stand for 3 min, and then remove the ethanol;

(4) repeat step (3) once;

(5) carefully aspirate the remaining ethanol, then air-dry the beads at room temperature until their surface appears matte;

(6) add 20 μL of Elution Buffer, pipette up and down several times to resuspend the beads, and let stand for 5 min; after a brief centrifugation, place the tube on the magnetic rack for 3 min, then carefully transfer the eluate to a new 1.5 mL centrifuge tube.

The purified PCR product is diluted to a suitable concentration for library construction on the BGISEQ-500 platform. The constructed library is circularized using a split oligo and made into DNA nanoballs (DNBs), which are then sequenced on the BGISEQ-500/BGISEQ-50. Library construction and DNB preparation follow the BGISEQ sequencing platform protocols and are not described further here.

Result analysis: the PDPD-sWGA result analysis parameter file and the experimental result output file are loaded. The analysis produces three outputs:

- an experimental result analysis intermediate file;
- coverage depth maps for all pathogens and all samples;
- coverage maps for all pathogens and all samples.

The parameters of the analysis parameter file are listed in Table 1 and include "coverage of each primer" and "normalization of the number of sequences". Read length differs between sequencing instruments. In this example, each read is taken to cover 100 bp by default, and the user can change this value to match the sequencing instrument used. In addition, experimental handling means that the data volume always differs from sample to sample. The samples therefore need to be normalized before their performance can be compared side by side. The default total data volume in this example is 20M.
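The two defaults above (100 bp per read and a normalized total of 20M) can be applied as two simple scalings. The sketch below assumes that "20M" means 20 million reads, which the text does not state explicitly. It also assumes that mean depth is estimated as reads × read length ÷ region length. The function names are hypothetical and do not come from PDPD-sWGA.

```python
# Defaults from the analysis parameter file described above; reading "20M"
# as 20 million reads is an assumption.
DEFAULT_READ_LEN_BP = 100
DEFAULT_TOTAL_READS = 20_000_000


def normalize_count(raw_count: float, sample_total_reads: float,
                    target_total: float = DEFAULT_TOTAL_READS) -> float:
    """Scale a read count as if the sample had target_total reads in total."""
    if sample_total_reads <= 0:
        raise ValueError("sample_total_reads must be positive")
    return raw_count * target_total / sample_total_reads


def approx_depth(reads_in_region: float, region_len_bp: float,
                 read_len_bp: int = DEFAULT_READ_LEN_BP) -> float:
    """Mean depth over a region if each read covers read_len_bp bases."""
    return reads_in_region * read_len_bp / region_len_bp


if __name__ == "__main__":
    reads = normalize_count(500, 10_000_000)  # illustrative numbers only
    print(reads, approx_depth(reads, 10_000))
```

Take, for example, a sample with 10 million reads in total, 500 of which fall in a 10 kb region. That region is scaled to 1000 reads, which corresponds to a mean depth of about 10× at 100 bp per read.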

For the experimental result output file, the user must enter the corresponding file path in the PDPD-sWGA run parameter file. The analysis then proceeds as follows:

- The experimental result analysis intermediate file is a statistics table of the depth and coverage of each window. It is generated from the PDPD-sWGA run parameter file and the experimental result output file.
- The coverage depth of each window for each pathogen and each sample is plotted at the DPI set in the PDPD-sWGA run parameter file.
- The coverage of each window for each pathogen and each sample is likewise plotted at that DPI.
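A per-window statistics table of the kind described above can be built from a per-base depth vector. The sketch below reports, for each window, the mean depth and the fraction of bases with depth ≥ 1. The window size, the column names and the tab-separated layout are all hypothetical choices. The actual format of the PDPD-sWGA intermediate file is not given here.

```python
def window_stats(depth: list[int], window: int) -> list[tuple[int, float, float]]:
    """(window start, mean depth, fraction of bases with depth >= 1) for each window.

    The last window may be shorter than `window` if the genome length
    is not a multiple of it.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    rows = []
    for start in range(0, len(depth), window):
        chunk = depth[start:start + window]
        mean_depth = sum(chunk) / len(chunk)
        covered = sum(d > 0 for d in chunk) / len(chunk)
        rows.append((start, mean_depth, covered))
    return rows


def to_tsv(rows: list[tuple[int, float, float]]) -> str:
    """Render window statistics as a tab-separated table (hypothetical column names)."""
    lines = ["window_start\tmean_depth\tcoverage"]
    lines += [f"{s}\t{m:.2f}\t{c:.3f}" for s, m, c in rows]
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_tsv(window_stats([0, 1, 2, 3, 0, 0, 4, 4], window=4)))
```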

The sequencing output data were analyzed, and the results are shown in Table 4. In the simulated sample, Escherichia coli and Staphylococcus aureus reads together accounted for 0.29-1.28% of all reads with the three primer combinations, compared with only 0.01% bacterial reads in the N6 random primer control. Fungal (Candida albicans) reads in the simulated sample reached 0.18-3.02% with the three primer combinations, compared with only 0.02% in the N6 random primer control. In the clinical sample, Mycoplasma reads reached 9.04-17.6% with the three primer combinations, compared with only 1.9% in the N6 random primer control.
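A fold-enrichment ratio over the N6 control puts the percentages above on a common scale. The sketch below divides the reported target-read proportions by the corresponding N6 control proportion. The input values are the ranges stated in the text. The control values are small rounded percentages (0.01% and 0.02%), so the resulting ratios should be read as approximate.

```python
def fold_enrichment(treated_pct: float, control_pct: float) -> float:
    """Ratio of target-read proportion with a primer combination to that with N6 primers."""
    if control_pct <= 0:
        raise ValueError("control proportion must be positive")
    return treated_pct / control_pct


# (lowest, highest) proportion across the three primer combinations, and the N6 control, in %.
REPORTED = {
    "bacteria (E. coli + S. aureus), simulated sample": ((0.29, 1.28), 0.01),
    "fungus (C. albicans), simulated sample": ((0.18, 3.02), 0.02),
    "Mycoplasma, clinical sample": ((9.04, 17.6), 1.9),
}

if __name__ == "__main__":
    for label, ((low, high), n6) in REPORTED.items():
        print(f"{label}: {fold_enrichment(low, n6):.1f}x to {fold_enrichment(high, n6):.1f}x")
```

On these numbers, the primer combinations give roughly 29- to 128-fold enrichment for the bacteria and 9- to 151-fold for the fungus in the simulated sample. In the clinical sample, enrichment for Mycoplasma is roughly 4.8- to 9.3-fold.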

Table 4. Proportion of target reads in the sequencing results

These experiments show the following:

- The primer design method of this example can design primer combinations that target three genomes, with coverage of the three target genomes reaching about 65%.
- The designed primer combinations also achieve coverage of up to about 60% on several other common microorganisms.
- Taken together, the three primer combinations designed in this example provide high coverage of the genomes of eight microorganisms and can be used to enrich those genomes.
- Compared with the existing primer design workflow of Erik L. Clarke, the primer design method of this example runs faster, reducing running time by about 80% under the same conditions.

The foregoing is a further detailed description of the present application with reference to specific embodiments, and the present application should not be regarded as limited to these specific embodiments. Those skilled in the art may make a number of simple deductions or substitutions without departing from the concept of the present application.

SEQUENCE LISTING

<110> Guangzhou Huada Dageney medical laboratory Co., Ltd

Shenzhen Huada Yinyuan Pharmaceutical Technology Co.,Ltd.

Huada Biotechnology (Wuhan) Co., Ltd.

<120> Primer design method, device and application for selective whole genome amplification

<130> 20I23705

<160> 15

<170> PatentIn version 3.3

<210> 1

<211> 8

<212> DNA

<213> Artificial sequence

<400> 1

cgtcgtaa 8

<210> 2

<211> 8

<212> DNA

<213> Artificial sequence

<400> 2

atcgtcgt 8

<210> 3

<211> 8

<212> DNA

<213> Artificial sequence

<400> 3

attcgtcg 8

<210> 4

<211> 8

<212> DNA

<213> Artificial sequence

<400> 4

atcgttcg 8

<210> 5

<211> 8

<212> DNA

<213> Artificial sequence

<400> 5

cgtcgtat 8

<210> 6

<211> 8

<212> DNA

<213> Artificial sequence

<400> 6

cgacgaat 8

<210> 7

<211> 8

<212> DNA

<213> Artificial sequence

<400> 7

acgacgat 8

<210> 8

<211> 8

<212> DNA

<213> Artificial sequence

<400> 8

tacgacga 8

<210> 9

<211> 9

<212> DNA

<213> Artificial sequence

<400> 9

accgataat 9

<210> 10

<211> 8

<212> DNA

<213> Artificial sequence

<400> 10

cgaacgat 8

<210> 11

<211> 8

<212> DNA

<213> Artificial sequence

<400> 11

cgacgaat 8

<210> 12

<211> 8

<212> DNA

<213> Artificial sequence

<400> 12

atacgacg 8

<210> 13

<211> 8

<212> DNA

<213> Artificial sequence

<400> 13

acgacgat 8

<210> 14

<211> 8

<212> DNA

<213> Artificial sequence

<400> 14

ttacgacg 8

<210> 15

<211> 8

<212> DNA

<213> Artificial sequence

<400> 15

cgacgaaa 8
