Methods for targeted nucleic acid sequence enrichment with applications to error-corrected nucleic acid sequencing

Document No.: 1760016 Publication date: 2019-11-29

Reading notes: This technology, "Methods for targeted nucleic acid sequence enrichment with applications to error-corrected nucleic acid sequencing," was devised by Scott R. Kennedy, Jesse J. Salk, Michael Hipp, Elizabeth Schmidt, and Rosa Ana Risques on 2018-03-23. Its main content is as follows: The present technology relates generally to methods and compositions for targeted nucleic acid sequence enrichment and to uses of such enrichment for error-corrected nucleic acid sequencing applications. In some embodiments, highly accurate, error-corrected, and massively parallel sequencing of nucleic acid material is possible using a combination of uniquely labeled strands in a double-stranded nucleic acid complex, in such a way that each strand can be informatically related to its complementary strand but also distinguished from it after sequencing of each strand or of an amplified product derived from it. In various embodiments, this information can be used for error correction of the determined sequence.

1. A method comprising:

providing double-stranded nucleic acid material comprising one or more double-stranded nucleic acid molecules, wherein each double-stranded nucleic acid molecule includes a single molecule identifier sequence on each strand and an adapter on at least one of the 5' and/or 3' ends of the nucleic acid molecule, and wherein, for each nucleic acid molecule, a first adapter sequence is associated with a first strand of the nucleic acid molecule and a second adapter sequence is associated with a second strand of the nucleic acid molecule;

amplifying the nucleic acid material;

separating the amplified nucleic acid material into a first sample and a second sample;

amplifying the first strand in the first sample through use of a primer specific to the first adapter sequence to provide a first nucleic acid product;

amplifying the second strand in the second sample through use of a primer specific to the second adapter sequence to provide a second nucleic acid product;

sequencing each of the first nucleic acid product and the second nucleic acid product; and

comparing the sequence of the first nucleic acid product to the sequence of the second nucleic acid product.

2. The method of claim 1, wherein the nucleic acid material is or comprises at least one of double-stranded DNA and double-stranded RNA.

3. The method of claim 1 or claim 2, wherein the providing step comprises:

ligating the double-stranded nucleic acid material to at least one degenerate or semi-degenerate barcode sequence to form a double-stranded nucleic acid molecule barcode complex, wherein the barcode sequence comprises the single molecule identifier sequence.

4. The method of claim 1 or claim 2, wherein the single molecule identifier sequence is at least one of a degenerate or semi-degenerate barcode sequence, one or more nucleic acid fragment ends of the nucleic acid material, or a combination thereof, that uniquely labels the double-stranded nucleic acid molecule.

5. The method of claim 1 or claim 2, wherein the single molecule identifier sequence comprises an endogenous shear point or an endogenous sequence that can be positionally related to the shear point.

6. The method of any one of claims 1-5, wherein amplifying the nucleic acid material includes generating a plurality of amplicons derived from the first strand and a plurality of amplicons derived from the second strand.

7. The method of any one of claims 1-6, wherein amplifying the nucleic acid material in the first sample comprises:

amplifying nucleic acid material derived from a single nucleic acid strand of an original double-stranded nucleic acid molecule, using at least one single-stranded oligonucleotide at least partially complementary to a sequence present in the first adapter sequence and at least one single-stranded oligonucleotide at least partially complementary to a target sequence of interest, such that the single molecule identifier sequence is at least partially maintained.

8. The method of any one of claims 1-7, wherein amplifying the nucleic acid material in the second sample comprises:

amplifying nucleic acid material derived from a single nucleic acid strand of an original double-stranded nucleic acid molecule, using at least one single-stranded oligonucleotide at least partially complementary to a sequence present in the second adapter sequence and at least one single-stranded oligonucleotide at least partially complementary to a target sequence of interest, such that the single molecule identifier sequence is at least partially maintained.

9. The method of any one of the preceding claims, wherein at least some of the nucleic acid material is damaged.

10. The method of claim 9, wherein the damage is or comprises at least one of: oxidation, alkylation, deamination, methylation, hydrolysis, hydroxylation, nicking, intra-strand crosslinks, inter-strand crosslinks, blunt-end strand breakage, staggered-end double-strand breakage, phosphorylation, dephosphorylation, sumoylation, glycosylation, deglycosylation, putrescinylation, carboxylation, halogenation, formylation, single-stranded gaps, damage from heat, damage from desiccation, damage from UV exposure, damage from gamma radiation, damage from X-radiation, damage from ionizing radiation, damage from non-ionizing radiation, damage from heavy particle radiation, damage from nuclear decay, damage from beta radiation, damage from alpha radiation, damage from neutron radiation, damage from proton radiation, damage from cosmic radiation, damage from high pH, damage from low pH, damage from reactive oxidative species, damage from free radicals, damage from peroxide, damage from hypochlorite, damage from tissue fixation such as formalin or formaldehyde, damage from reactive iron, damage from low ionic conditions, damage from high ionic conditions, damage from unbuffered conditions, damage from nucleases, damage from environmental exposure, damage from fire, damage from mechanical stress, damage from enzymatic degradation, damage from microorganisms, damage from preparative mechanical shearing, damage from preparative enzymatic fragmentation, damage having naturally occurred in vivo, damage having occurred during nucleic acid extraction, damage having occurred during sequencing library preparation, damage having been introduced by a polymerase, damage having been introduced during nucleic acid repair, damage having occurred during nucleic acid end-tailing, damage having occurred during nucleic acid ligation, damage having occurred during sequencing, damage having occurred from mechanical handling of DNA, damage having occurred during passage through a nanopore, damage having occurred as part of aging in an organism, damage having occurred as a result of chemical exposure of an individual, damage having occurred by a mutagen, damage having occurred by a carcinogen, damage having occurred by a clastogen, damage having occurred from in vivo inflammation, damage due to oxygen exposure, damage due to one or more strand breaks, and any combination thereof.

11. The method of any one of the preceding claims, wherein the nucleic acid material is provided from a sample comprising one or more double-stranded nucleic acid molecules originating from a subject or an organism.

12. The method of claim 11, wherein the sample is or comprises a body tissue, a biopsy, a skin sample, blood, serum, plasma, sweat, saliva, cerebrospinal fluid, mucus, uterine lavage fluid, a vaginal swab, a Pap smear, a nasal swab, an oral swab, a tissue scraping, hair, a fingerprint, urine, stool, vitreous humor, peritoneal wash, sputum, bronchial lavage, oral lavage, pleural lavage, gastric lavage, gastric juice, bile, pancreatic duct lavage, bile duct lavage, common bile duct lavage, gallbladder fluid, synovial fluid, an infected wound, a non-infected wound, an archaeological sample, a forensic sample, a water sample, a tissue sample, a food sample, a bioreactor sample, a plant sample, a bacterial sample, a protozoan sample, a fungal sample, an animal sample, a viral sample, a multi-organism sample, a fingernail scraping, semen, prostatic fluid, vaginal fluid, a vaginal swab, a fallopian tube lavage, a cell-free nucleic acid, a nucleic acid within a cell, a metagenomics sample, a lavage or swab of an implanted foreign body, a nasal lavage, intestinal fluid, an epithelial brushing, an epithelial lavage, a tissue biopsy, an autopsy sample, a necropsy sample, an organ sample, a human identification sample, a non-human identification sample, an artificially produced nucleic acid sample, a synthetic gene sample, a banked or stored sample, tumor tissue, a fetal sample, an organ transplant sample, a microbial culture sample, a nuclear DNA sample, a mitochondrial DNA sample, a chloroplast DNA sample, an apicoplast DNA sample, an organelle sample, and any combination thereof.

13. The method of any one of the preceding claims, wherein the nucleic acid material comprises nucleic acid molecules of a substantially uniform or near-uniform length.

14. The method of claim 13, wherein the substantially uniform length is between about 1 and about 1,000,000 bases.

15. The method of claim 13 or claim 14, wherein the nucleic acid material is cut into nucleic acid molecules of a substantially uniform or near-uniform length via a targeted endonuclease.

16. The method of any one of claims 1-12, wherein the nucleic acid material comprises nucleic acid molecules having a length within one or more substantially known size ranges.

17. The method of claim 16, wherein the nucleic acid molecules are between 1 and about 1,000,000 bases, between about 10 and about 10,000 bases, between about 100 and about 1000 bases, between about 100 and about 600 bases, between about 100 and about 500 bases, or some combination thereof.

18. The method of any one of the preceding claims, wherein, prior to the providing step, the method comprises:

cutting the nucleic acid material with one or more targeted endonucleases such that a target nucleic acid fragment of a substantially known length is formed; and

isolating the target nucleic acid fragment based on the substantially known length.

19. The method of claim 18, wherein the one or more targeted endonucleases are selected from a ribonucleoprotein, a Cas enzyme, a Cas9-like enzyme, a meganuclease, a transcription activator-like effector-based nuclease (TALEN), a zinc-finger nuclease, an argonaute nuclease, or a combination thereof.

20. The method of claim 18 or claim 19, wherein the one or more targeted endonucleases comprise Cas9 or CPF1 or a derivative thereof.

21. The method of any one of claims 18-20, further comprising ligating the adapter to the target nucleic acid fragment prior to the providing step.

22. The method of any one of claims 18-21, wherein the target nucleic acid fragment is derived from a subject or an organism.

23. The method of any one of claims 18-21, wherein the target nucleic acid fragment is at least partially artificially synthesized.

24. The method of any one of claims 18-23, wherein cutting the nucleic acid material comprises cutting the nucleic acid material with one or more targeted endonucleases such that more than one target nucleic acid fragment of substantially known length is formed.

25. The method of claim 24, wherein the target nucleic acid fragments are of different substantially known lengths.

26. The method of claim 24, wherein the target nucleic acid fragments each comprise a genomic sequence of interest from one or more different locations in a genome.

27. The method of claim 24, wherein the target nucleic acid fragments each comprise a targeted sequence from a substantially known region within the nucleic acid material.

28. The method of any one of claims 18-27, wherein isolating the target nucleic acid fragment based on the substantially known length comprises enriching for the target nucleic acid fragment by gel electrophoresis, gel purification, liquid chromatography, size exclusion purification, filtration, or SPRI bead purification.

29. The method of any one of the preceding claims, wherein at least one amplifying step includes at least one primer that is or comprises at least one non-canonical nucleotide.

30. The method of any one of the preceding claims, wherein at least one adapter sequence is or comprises at least one non-canonical nucleotide.

31. The method of claim 29 or claim 30, wherein the non-canonical nucleotide is selected from a uracil, a methylated nucleotide, an RNA nucleotide, a ribose nucleotide, an 8-oxo-guanine, a biotinylated nucleotide, a desthiobiotin nucleotide, a thiol-modified nucleotide, an acrydite-modified nucleotide, an iso-dC, an iso-dG, a 2'-O-methyl nucleotide, an inosine nucleotide, a locked nucleic acid, a peptide nucleic acid, a 5-methyl dC, a 5-bromo deoxyuridine, a 2,6-diaminopurine, a 2-aminopurine nucleotide, an abasic nucleotide, a 5-nitroindole nucleotide, an adenylated nucleotide, an azide nucleotide, a digoxigenin nucleotide, an I-linker, a 5' hexynyl-modified nucleotide, a 5-octadiynyl dU, a photocleavable spacer, a non-photocleavable spacer, a click chemistry compatible modified nucleotide, a fluorescent dye, biotin, furan, BrdU, fluoro-dU, loto-dU, and any combination thereof.

32. The method of any one of the preceding claims, wherein sequencing each of the first nucleic acid product and the second nucleic acid product comprises:

comparing the sequences of a plurality of strands in the first nucleic acid product to determine a first strand consensus sequence; and

comparing the sequences of a plurality of strands in the second nucleic acid product to determine a second strand consensus sequence.

33. The method of claim 32, wherein comparing the sequence of the first nucleic acid product to the sequence of the second nucleic acid product comprises comparing the first strand consensus sequence and the second strand consensus sequence to provide an error-corrected consensus sequence.

34. The method of any one of the preceding claims, wherein sequencing each of the first nucleic acid product and the second nucleic acid product comprises:

sequencing at least one of the first strand to determine a first strand sequence read;

sequencing at least one of the second strand to determine a second strand sequence read; and

comparing the first strand sequence read and the second strand sequence read to generate an error-corrected sequence read.

35. The method of claim 34, wherein the error-corrected sequence read comprises nucleotide bases that agree between the first strand sequence read and the second strand sequence read.

36. The method of claim 34 or claim 35, wherein a variation occurring at a particular position in the error-corrected sequence read is identified as a true variant.

37. The method of any one of claims 34-36, wherein a variation that occurs at a particular position in only one of the first strand sequence read or the second strand sequence read is identified as a potential artifact.

38. The method of any one of claims 34-36, wherein the error-corrected sequence read is used to identify or characterize, in an organism or subject from which the double-stranded target nucleic acid molecule is derived, a cancer, a cancer risk, a cancer mutation, a cancer metabolic state, a mutator phenotype, a carcinogen exposure, a toxin exposure, a chronic inflammation exposure, an age, a neurodegenerative disease, a pathogen, a drug-resistant variant, a fetal molecule, a forensically relevant molecule, an immunologically relevant molecule, a mutated T-cell receptor, a mutated B-cell receptor, a mutated immunoglobulin locus, a kataegis site in a genome, a hypermutable site in a genome, a low-frequency variant, a subclonal variant, a minority population of molecules, a source of contamination, a nucleic acid synthesis error, an enzymatic modification error, a chemical modification error, a gene editing error, a gene therapy error, a piece of nucleic acid information storage, a microbial quasispecies, a viral quasispecies, an organ transplant, an organ transplant rejection, a cancer relapse, residual cancer after treatment, a preneoplastic state, a dysplastic state, a microchimerism state, a stem cell transplant state, a cellular therapy state, a nucleic acid label affixed to another molecule, or a combination thereof.

39. The method of any one of claims 34-36, wherein the error-corrected sequence read is used to identify a mutagenic compound or exposure.

40. The method of any one of claims 34-36, wherein the error-corrected sequence read is used to identify a carcinogenic compound or exposure.

41. The method of any one of claims 34-36, wherein the nucleic acid material is derived from a forensic sample, and wherein the error-corrected sequence read is used in a forensic analysis.

42. The method of any one of the preceding claims, wherein at least one amplifying step includes polymerase chain reaction (PCR).

43. The method of any one of the preceding claims, wherein at most 1000 ng of nucleic acid material is initially provided.

44. The method of any one of the preceding claims, wherein at most 10 ng of nucleic acid material is initially provided.

45. The method of any one of claims 1-44, wherein amplifying the nucleic acid material in the first sample further comprises, after the separating step and before the amplification of the first sample, destroying or disrupting the second adapter sequence found on the nucleic acid material.

46. The method of any one of claims 1-45, wherein amplifying the nucleic acid material in the second sample further comprises, after the separating step and before the amplification of the second sample, destroying or disrupting the first adapter sequence found on the nucleic acid material.

47. The method of claim 45 or claim 46, wherein the destroying comprises at least one of: enzymatic digestion, inclusion of at least one replication-inhibiting molecule, enzymatic cleavage, enzymatic cleavage of one strand, enzymatic cleavage of both strands, incorporation of a modified nucleic acid followed by enzymatic treatment that leads to cleavage of one or both strands, incorporation of a replication-blocking nucleotide, incorporation of a chain terminator, incorporation of a photocleavable linker, incorporation of a uracil, incorporation of a ribose base, incorporation of an 8-oxo-guanine adduct, use of a restriction endonuclease, use of a targeted nuclease, and any combination thereof.

48. The method of any one of the preceding claims, wherein amplifying includes rolling circle amplification, multiple displacement amplification, isothermal amplification, bridge amplification, or surface-bound amplification.

49. The method of any one of the preceding claims, wherein amplifying the nucleic acid material includes use of a single-stranded oligonucleotide at least partially complementary to a sequence of interest and a single-stranded oligonucleotide at least partially complementary to a region of the adapter.

50. The method of any one of the preceding claims, wherein the nucleic acid material comprises an adapter on each of the 5' and 3' ends of each strand of the nucleic acid material.

51. The method of claim 50, wherein amplifying the nucleic acid material includes use of single-stranded oligonucleotides at least partially complementary to regions of the first adapter sequence and the second adapter sequence.

52. The method of any one of the preceding claims, wherein the adapter comprises at least one nucleotide position that is at least partially non-complementary or comprises at least one non-canonical base.

53. The method of any one of the preceding claims, wherein the adapter comprises a single "U-shape" oligonucleotide sequence formed by about 5 or more self-complementary nucleotides.

54. A method comprising:

providing double-stranded nucleic acid material, wherein, as a result of targeted endonuclease cutting, the nucleic acid material is between about 1 and 1,000,000 bases in length, and wherein the nucleic acid material includes a single molecule identifier sequence on each strand of the nucleic acid material and an adapter sequence on at least one of the 5' and 3' ends of each strand of the nucleic acid material, wherein a first adapter sequence is located on one of the 5' end or the 3' end of a first strand of the nucleic acid material and a second adapter sequence is located on the opposing end of a second strand of the nucleic acid material, and wherein the first strand and the second strand originate from the same double-stranded nucleic acid molecule;

amplifying the nucleic acid material;

separating the amplified nucleic acid material into a first sample and a second sample;

amplifying the first strand in the first sample through use of a primer specific to the first adapter sequence to provide a first nucleic acid product;

amplifying the second strand in the second sample through use of a primer specific to the second adapter sequence to provide a second nucleic acid product;

sequencing each of the first nucleic acid product and the second nucleic acid product; and

comparing the sequence of the first nucleic acid product to the sequence of the second nucleic acid product.

55. The method of any one of the preceding claims, wherein the nucleic acid material comprises nucleic acid material from more than one source.

Background

Prior methods for certain types of genetic analysis, such as forensic DNA analysis, rely on capillary electrophoresis (CE) separation of PCR amplicons (PCR-CE) to identify length polymorphisms in short tandem repeat sequences. Since its introduction around 1991, this type of analysis has proven extremely valuable. Since that time, a number of publications have introduced standardized protocols, validated its use in laboratories around the world, detailed its use in many different populations, and introduced more efficient approaches, such as miniSTRs.

While this approach has proven extremely successful, the technology has a number of drawbacks that limit its utility. For example, current approaches to STR genotyping often give rise to background signal from PCR stutter, which is caused by slippage of the polymerase on the template DNA and produces a mixture of PCR amplicons of different lengths in the final reaction. This issue is especially important in samples with more than one contributor (e.g., a mixture of DNA derived from different individuals, each carrying a distinct set of STR length variants), because stutter alleles are difficult to distinguish from genuine alleles. Another issue arises when analyzing degraded DNA samples. Damaged DNA can worsen the extent of stutter and PCR errors. Variation in fragment length often results in significantly lower yields of longer PCR fragments, or their complete absence. As a consequence, capillary electropherograms from degraded DNA often have a lower power of discrimination.

The introduction of massively parallel sequencing (MPS, sometimes also referred to as next-generation DNA sequencing, NGS) systems has the potential to address several challenges in forensic analysis. For example, these platforms offer a previously unparalleled capacity for the simultaneous analysis of STRs and single nucleotide polymorphisms (SNPs) in nuclear and mitochondrial DNA (mtDNA), which will dramatically increase the power of discrimination between individuals and offers the possibility of determining ethnicity and even physical attributes (phenotype). Furthermore, unlike PCR-CE, which reports only the average genotype of an aggregate population of molecules, MPS technology digitally tabulates the full nucleotide sequence of many individual DNA molecules, thus offering the unique ability to detect minor allele frequencies (MAF) within a heterogeneous DNA mixture. Because forensic samples comprising two or more contributors remain one of the most problematic issues in forensics, the impact of MPS on the field could be enormous.

The publication of the human genome highlighted the immense power of MPS platforms. However, until fairly recently, the full power of these platforms was of limited use to forensics, because read lengths were significantly shorter than short tandem repeat (STR) loci, which precluded calling length-based genotypes. Initially, pyrosequencers, such as the Roche 454 MPS platform, were the only platforms with sufficient read length to sequence the core standard STR loci. However, read lengths in competing technologies have since increased, bringing their utility for forensic applications into play. In short, regardless of platform, the general outcome of all of these studies is that STRs can be successfully typed, producing genotypes comparable to CE analysis even from compromised forensic samples.

While many studies show concordance with traditional PCR-CE approaches, and even indicate additional benefits such as the detection of SNPs within STRs, they have also highlighted a number of current issues with the technology. For example, current MPS approaches to STR genotyping rely on multiplex PCR both to provide enough DNA to sequence and to introduce PCR primers. However, because multiplex PCR kits were designed for PCR-CE, they contain primers for amplicons of various sizes. This variation results in coverage imbalance, with a bias toward amplification of smaller fragments, which can result in allele drop-out. Indeed, recent studies have shown that differences in PCR efficiency can affect mixture components, especially at low MAFs.

Like PCR-CE, MPS is not immune to PCR stutter. The vast majority of MPS studies on STRs report the occurrence of artifactual drop-in alleles. Recently, a systematic MPS study reported that most stutter events appear as shorter-length polymorphisms that differ from the true allele in four base-pair units, with the most common being n-4, although n-8 and n-12 positions are also observed. Stutter typically occurs in ~1% of reads, but can be as high as 3% at some loci, indicating that MPS can exhibit stutter at higher rates than PCR-CE.
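As a rough illustration of the stutter positions described above, the following Python sketch tallies reads at a single STR locus by how many repeat units shorter they are than the expected allele. The function name `stutter_summary`, its parameters, and the example read lengths are hypothetical and chosen only for illustration; they do not come from the studies cited here.

```python
from collections import Counter

def stutter_summary(read_lengths, true_allele_length, repeat_unit=4):
    """Return the fraction of reads at the true allele length and at each
    shorter stutter position (n-4, n-8, ...). Hypothetical helper."""
    if not read_lengths:
        return {}
    counts = Counter()
    for length in read_lengths:
        diff = true_allele_length - length
        if diff == 0:
            counts["allele"] += 1
        elif diff > 0 and diff % repeat_unit == 0:
            # Shorter by a whole number of repeat units: candidate stutter.
            counts[f"n-{diff}"] += 1
        else:
            # Longer reads or non-unit differences are not classified here.
            counts["other"] += 1
    total = len(read_lengths)
    return {label: n / total for label, n in counts.items()}

# Illustrative input: 100 reads, 97 at the allele length of 40 bases.
example = stutter_summary([40] * 97 + [36] * 2 + [32], true_allele_length=40)
```

A per-locus summary of this kind only describes the length distribution; it cannot by itself tell a stutter product from a genuine minor allele in a mixture, which is the difficulty the passage above points to.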

A variety of approaches, spanning solution development, chemistry/biochemistry, and data processing, have been developed to mitigate the impact of PCR-based errors in MPS. In addition, techniques that resolve the PCR duplicates arising from individual DNA fragments, based either on unique random shear points or on exogenous tagging before or during amplification (using molecular barcodes, also known as molecular tags, unique molecular identifiers [UMIs], and single molecule identifiers [SMIs]), are in common use. This approach has been used to improve the counting accuracy of DNA and RNA templates. Because all amplicons derived from a single starting molecule can be explicitly identified, any variation among the sequencing reads that share the same tag can be used to correct base errors arising during PCR or sequencing. For example, Kinde et al. (Proc Natl Acad Sci USA 108, 9530-9535, 2011) introduced SafeSeqS, which uses single-stranded molecular barcoding to reduce the sequencing error rate by grouping PCR copies that share a barcode sequence and forming a consensus sequence. This approach leads to an average detection limit of about 0.5% for point mutations, but its effectiveness at STR loci has not yet been extensively evaluated.
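The barcode-grouping idea described above can be sketched in a few lines of Python. This is a minimal illustration of single-strand consensus calling in general, not the SafeSeqS implementation: the function name, the `min_family_size` and `min_agreement` thresholds, and the assumption that all reads in a family are already aligned and of equal length are simplifications introduced here.

```python
from collections import Counter, defaultdict

def single_strand_consensus(reads, min_family_size=3, min_agreement=0.9):
    """Group (barcode, sequence) reads by barcode and call a per-position
    consensus for each family. Positions without sufficient agreement
    are reported as 'N'. Reads in a family are assumed equal in length."""
    families = defaultdict(list)
    for barcode, sequence in reads:
        families[barcode].append(sequence)

    consensus = {}
    for barcode, sequences in families.items():
        if len(sequences) < min_family_size:
            continue  # too few PCR copies to correct errors
        bases = []
        for column in zip(*sequences):
            base, count = Counter(column).most_common(1)[0]
            bases.append(base if count / len(column) >= min_agreement else "N")
        consensus[barcode] = "".join(bases)
    return consensus

# Illustrative reads: one family of three copies, one singleton.
reads = [("AAC", "ACGT"), ("AAC", "ACGT"), ("AAC", "ACTT"), ("GGT", "TTTT")]
relaxed = single_strand_consensus(reads, min_agreement=0.6)
strict = single_strand_consensus(reads)
```

With the relaxed threshold the disagreeing base in the third copy is outvoted; with the stricter default the position is masked instead. Either way, an error introduced in the first amplification cycle is present in every copy of the family and survives this kind of consensus.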

Another recently described approach, MIPSTR, uses targeted capture of STR loci through sequence-specific annealing of single-molecule molecular inversion probes (smMIPs) to the regions flanking the STR locus. After polymerase extension from the 3' end of the smMIP, the ends are ligated and subjected to PCR amplification and sequencing. The use of MIPs specific to the flanking regions of STR loci dramatically increases target specificity and increases the accuracy of genotyping STR loci. However, much like Safe-SeqS, the incorporation of single-stranded molecular barcodes cannot completely eliminate PCR artifacts that arise in the first round of amplification, which are carried over to derivative copies as "jackpot" events.

Methods for higher-accuracy genotyping of STR loci, single nucleotide polymorphism (SNP) loci, and many other forms of mutations and genetic variants are desirable in a variety of applications in forensics, medicine, and the science industry. One challenge, however, is how to generate sequence information most efficiently from many related copies of sequenced genetic material, with the highest possible reliability yet at a reasonable cost. Various consensus sequencing approaches (both molecular barcode-based and non-molecular barcode-based) have been used successfully for error correction to help better identify variants within mixtures (for a detailed discussion, see J. Salk et al., Enhancing the accuracy of next-generation sequencing for detecting rare and subclonal mutations, Nature Reviews Genetics, 2018), but with various tradeoffs in performance. We have previously described Duplex Sequencing, an ultra-high accuracy sequencing method that relies on genotyping and comparing the sequences of both strands of double-stranded nucleic acid molecules for the purpose of error correction. The technology described herein relates to methods for improving the cost efficiency, recovery efficiency, and other performance metrics, as well as the overall process speed, of Duplex Sequencing and related MPS sequencing methods.
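The strand comparison that underlies this kind of error correction can be sketched as follows. This is a simplified illustration rather than the published Duplex Sequencing pipeline: the function name `compare_strands` is hypothetical, and the sketch assumes that both strand sequences have already been brought into the same orientation (for example, by reverse-complementing one of them) and aligned to a reference of equal length.

```python
def compare_strands(reference, first_strand, second_strand):
    """Compare two strand sequences position by position.

    Returns (corrected, true_variants, potential_artifacts):
      corrected           - bases on which both strands agree, 'N' elsewhere
      true_variants       - (position, ref_base, alt_base) seen on both strands
      potential_artifacts - (position, first_base, second_base) where the
                            strands disagree
    All inputs are assumed to be equal-length, same-orientation strings.
    """
    if not (len(reference) == len(first_strand) == len(second_strand)):
        raise ValueError("sequences must have equal length")

    corrected, true_variants, potential_artifacts = [], [], []
    for pos, (ref, a, b) in enumerate(zip(reference, first_strand, second_strand)):
        if a == b:
            corrected.append(a)
            if a != ref and a != "N":
                true_variants.append((pos, ref, a))
        else:
            corrected.append("N")
            potential_artifacts.append((pos, a, b))
    return "".join(corrected), true_variants, potential_artifacts

corrected, variants, artifacts = compare_strands(
    "ACGTACGT", "ACTTACGA", "ACTTACGT"
)
```

In this example the change at position 2 is present on both strands and is reported as a variant, while the change at position 7 appears on only one strand and is masked and flagged as a potential artifact.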

Summary

The present technology relates generally to methods for targeted nucleic acid sequence enrichment and to uses of such enrichment for error-corrected nucleic acid sequencing applications. In some embodiments, highly accurate, error-corrected, and massively parallel sequencing of nucleic acid material is possible using a combination of uniquely labeled strands in a double-stranded nucleic acid complex, in such a way that each strand can be informatically related to its complementary strand but also distinguished from it after sequencing of each strand or of an amplified product derived from it, and this information can be used for error correction of the determined sequence. Some aspects of the present technology provide methods and compositions for improving the cost, the conversion of molecules sequenced, and the time efficiency of generating labeled molecules for targeted ultra-high accuracy sequencing. In some embodiments, provided methods and compositions allow accurate analysis of very small amounts of nucleic acid material (e.g., from a sample taken from a crime scene, from a small clinical sample, or from DNA floating freely in blood). In some embodiments, provided methods and compositions allow detection of mutations in a sample of nucleic acid material that are present at a frequency of less than one in one hundred cells or molecules (e.g., less than one in one thousand cells or molecules, less than one in ten thousand cells or molecules, or less than one in one hundred thousand cells or molecules).

In some embodiments, the present disclosure provides methods including the steps of: providing a double-stranded nucleic acid material, wherein the nucleic acid material includes a single molecule identifier (SMI) sequence on each strand of the nucleic acid material and an adapter sequence on at least one of the 5' and 3' ends of each strand of the nucleic acid material, wherein a first adapter sequence is located on one of the 5' end or the 3' end of a first strand of the nucleic acid material and a second adapter sequence is located on the opposing end of a second strand of the nucleic acid material, and wherein the first strand and the second strand originate from the same double-stranded nucleic acid molecule; amplifying the nucleic acid material; separating the amplified nucleic acid material into a first sample and a second sample; amplifying the first strand in the first sample through use of a primer specific to the first adapter sequence to provide a first nucleic acid product; amplifying the second strand in the second sample through use of a primer specific to the second adapter sequence to provide a second nucleic acid product; sequencing each of the first nucleic acid product and the second nucleic acid product; and comparing the sequence of the first nucleic acid product to the sequence of the second nucleic acid product. In some embodiments, the nucleic acid material includes an adapter sequence on each of the 5' and 3' ends of each strand of the nucleic acid material.

In some embodiments, the present disclosure provides methods including the steps of: providing a double-stranded nucleic acid material including one or more double-stranded nucleic acid molecules, wherein each double-stranded nucleic acid molecule includes a single molecule identifier (SMI) sequence on each strand and an adapter on at least one of the 5' and/or 3' ends of the nucleic acid molecule, and wherein, for each nucleic acid molecule, a first adapter sequence is associated with a first strand of the nucleic acid molecule and a second adapter sequence is associated with a second strand of the nucleic acid molecule; amplifying the nucleic acid material; separating the amplified nucleic acid material into a first sample and a second sample; amplifying the first strand in the first sample through use of a primer specific to the first adapter sequence to provide a first nucleic acid product; amplifying the second strand in the second sample through use of a primer specific to the second adapter sequence to provide a second nucleic acid product; sequencing each of the first nucleic acid product and the second nucleic acid product; and comparing the sequence of the first nucleic acid product to the sequence of the second nucleic acid product. In some embodiments, the nucleic acid material includes an adapter sequence on each of the 5' and 3' ends of each strand of the nucleic acid material.

In some embodiments, the present disclosure also provides methods including the steps of: providing a double-stranded nucleic acid material, wherein the nucleic acid material has been cut with a targeted endonuclease (e.g., a CRISPR-associated (Cas) enzyme/guide RNA complex such as Cas9 or Cpf1, a meganuclease, a transcription activator-like effector-based nuclease (TALEN), a zinc-finger nuclease, an argonaute nuclease, etc.) to provide strands of nucleic acid material of substantially similar length (e.g., between about 1 and 1,000,000 bases, between 10 and 1,000 bases, or between about 100 and 500 bases), and wherein the nucleic acid material includes a single molecule identifier (SMI) sequence on each strand of the nucleic acid material and an adapter sequence on at least one of the 5' and 3' ends of each strand of the nucleic acid material, wherein a first adapter sequence is located on one of the 5' end or the 3' end of a first strand of the nucleic acid material and a second adapter sequence is located on the opposing end of a second strand of the nucleic acid material, and wherein the first strand and the second strand originate from the same double-stranded nucleic acid molecule; amplifying the nucleic acid material; separating the amplified nucleic acid material into a first sample and a second sample; amplifying the first strand in the first sample through use of a primer specific to the first adapter sequence to provide a first nucleic acid product; amplifying the second strand in the second sample through use of a primer specific to the second adapter sequence to provide a second nucleic acid product; sequencing each of the first nucleic acid product and the second nucleic acid product; and comparing the sequence of the first nucleic acid product to the sequence of the second nucleic acid product. In some embodiments, the nucleic acid material includes an adapter sequence on each of the 5' and 3' ends of each strand of the nucleic acid material.

In some embodiments, sequencing each of the first nucleic acid product and the second nucleic acid product includes the steps of: sequencing at least one of the first strands to determine a first strand sequence read; sequencing at least one of the second strands to determine a second strand sequence read; and comparing the first strand sequence read and the second strand sequence read to generate an error-corrected sequence read. In some embodiments, the error-corrected sequence read includes the nucleotide bases that agree between the first strand sequence read and the second strand sequence read. In some embodiments, a variation occurring at a particular position in the error-corrected sequence read is identified as a true variant. In some embodiments, a variation that occurs at a particular position in only one of the first strand sequence read or the second strand sequence read is identified as a potential artifact.
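The strand comparison described above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation of the disclosure: the function names are hypothetical, both reads are assumed to be already aligned to the same reference orientation and to have equal length, and disagreeing positions are masked with `N` as one possible convention.

```python
def compare_strand_reads(first_strand: str, second_strand: str, no_call: str = "N") -> str:
    """Build an error-corrected read from two aligned strand reads.

    A base is kept only where both strand reads agree; any position where
    they disagree is replaced by `no_call` (an assumed masking convention).
    """
    if len(first_strand) != len(second_strand):
        raise ValueError("strand reads must be aligned to the same length")
    return "".join(a if a == b else no_call
                   for a, b in zip(first_strand, second_strand))


def classify_variations(reference: str, first_strand: str, second_strand: str) -> dict:
    """Label each varying position as a true variant or a potential artifact."""
    calls = {}
    for pos, (ref, a, b) in enumerate(zip(reference, first_strand, second_strand)):
        if a == b and a != ref:
            # Both strands show the same non-reference base.
            calls[pos] = "true_variant"
        elif a != b:
            # Only one strand supports the variation.
            calls[pos] = "potential_artifact"
    return calls


reference = "ACGTACGT"
first = "ACGTTCGT"    # variation at position 4
second = "ACGTTCGA"   # variation at positions 4 and 7
corrected = compare_strand_reads(first, second)
calls = classify_variations(reference, first, second)
```

In this toy input, position 4 is supported by both strands and is retained, while position 7 appears on one strand only and is masked.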

In some embodiments, the error-corrected sequence read is used to identify or characterize, in the organism or subject from which the double-stranded target nucleic acid molecule originates, a cancer, a cancer risk, a cancer mutation, a cancer metabolic state, a mutator phenotype, a carcinogen exposure, a toxin exposure, a chronic inflammation exposure, an age, a neurodegenerative disease, a pathogen, a drug-resistant variant, a fetal molecule, a forensically relevant molecule, an immunologically relevant molecule, a mutated T-cell receptor, a mutated B-cell receptor, a mutated immunoglobulin locus, a kataegis site in a genome, a hypermutable site in a genome, a low-frequency variant, a subclonal variant, a minority population of molecules, a source of contamination, a nucleic acid synthesis error, an enzymatic modification error, a chemical modification error, a gene editing error, a gene therapy error, a piece of nucleic acid information storage, a microbial quasispecies, a viral quasispecies, an organ transplant, an organ transplant rejection, a cancer relapse, residual cancer after treatment, a preneoplastic state, a dysplastic state, a microchimerism state, a stem cell transplant state, a cellular therapy state, a nucleic acid label affixed to another molecule, or a combination thereof. In some embodiments, the error-corrected sequence read is used to identify a carcinogenic compound or exposure. In some embodiments, the error-corrected sequence read is used to identify a mutagenic compound or exposure. In some embodiments, the nucleic acid material is derived from a forensic sample, and the error-corrected sequence read is used in a forensic analysis.

In some embodiments, the single molecule identifier (SMI) sequence includes an endogenous shear point or an endogenous sequence that can be positionally related to the shear point. In some embodiments, the SMI sequence is at least one of a degenerate or semi-degenerate barcode sequence, one or more nucleic acid fragment ends of the nucleic acid material, or a combination thereof, that uniquely labels the double-stranded nucleic acid molecule. In some embodiments, the adapter and/or adapter sequence includes at least one nucleotide position that is at least partially non-complementary or that comprises at least one non-standard base. In some embodiments, the adapter includes a single "U-shaped" oligonucleotide sequence formed by about 5 or more self-complementary nucleotides.

In accordance with various embodiments, any of a variety of nucleic acid materials may be used. In some embodiments, the nucleic acid material may include at least one modification to a polynucleotide within the canonical sugar-phosphate backbone. In some embodiments, the nucleic acid material may include at least one modification within any base in the nucleic acid material. For example, by way of non-limiting example, in some embodiments the nucleic acid material is or comprises at least one of double-stranded DNA, double-stranded RNA, peptide nucleic acid (PNA), and locked nucleic acid (LNA).

In some embodiments, the providing step includes ligating the double-stranded nucleic acid material to at least one double-stranded degenerate barcode sequence to form a double-stranded nucleic acid molecule-barcode complex, wherein the double-stranded degenerate barcode sequence includes the single molecule identifier (SMI) sequence in each strand.

In some embodiments, amplifying the nucleic acid material in the first sample includes amplifying the first strand in the first sample through use of a primer specific to the first adapter sequence and a second primer specific to a non-adapter portion of the first strand, to provide the first nucleic acid product. In some embodiments, the second strand in the second sample is amplified through use of a primer specific to the second adapter sequence and a second primer specific to a non-adapter portion of the second strand, to provide the second nucleic acid product.

In some embodiments, amplifying the nucleic acid material in the first sample includes amplifying nucleic acid material derived from a single nucleic acid strand of an original double-stranded nucleic acid molecule, using at least one single-stranded oligonucleotide at least partially complementary to a sequence present in the first adapter sequence and at least one single-stranded oligonucleotide at least partially complementary to a target sequence of interest, such that the single molecule identifier (SMI) sequence is at least partially maintained.

In some embodiments, amplifying the nucleic acid material in the second sample includes amplifying nucleic acid material derived from a single nucleic acid strand of an original double-stranded nucleic acid molecule, using at least one single-stranded oligonucleotide at least partially complementary to a sequence present in the second adapter sequence and at least one single-stranded oligonucleotide at least partially complementary to a target sequence of interest, such that the single molecule identifier (SMI) sequence is at least partially maintained.

In some embodiments, amplifying the nucleic acid material includes generating a plurality of amplicons derived from the first strand and a plurality of amplicons derived from the second strand.

In some embodiments, provided methods further include, before the providing step, the steps of: cutting the nucleic acid material with one or more targeted endonucleases such that target nucleic acid fragments of substantially known length are formed; and isolating the target nucleic acid fragments based on the substantially known length. In some embodiments, provided methods further include ligating an adapter (e.g., an adapter sequence) to the target nucleic acid (e.g., a target nucleic acid fragment) before the providing step.

In some embodiments, the nucleic acid material may be or comprise one or more target nucleic acid fragments. In some embodiments, the one or more target nucleic acid fragments each comprise a genomic sequence of interest from one or more locations in a genome. In some embodiments, the one or more target nucleic acid fragments comprise a targeted sequence from a substantially known region within the nucleic acid material. In some embodiments, isolating the target nucleic acid fragments based on the substantially known length includes enriching the target nucleic acid fragments by gel electrophoresis, gel purification, liquid chromatography, size exclusion purification, filtration, or SPRI bead purification.
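As an in-silico analogy to the length-based isolation described above, the following sketch keeps only fragments whose length falls within a window around the substantially known length. The function name and the 10% tolerance are illustrative assumptions; physical size selection (for example with gels or SPRI beads) does not have a sharp cutoff of this kind.

```python
def select_by_known_length(fragments, expected_length, tolerance=0.10):
    """Keep fragments whose length is within +/- tolerance of expected_length.

    `tolerance` is a hypothetical fractional window, chosen for illustration.
    """
    low = expected_length * (1 - tolerance)
    high = expected_length * (1 + tolerance)
    return [frag for frag in fragments if low <= len(frag) <= high]


# Toy fragments: two near the expected 500-base length, two far from it.
fragments = ["A" * 500, "A" * 90, "A" * 20000, "A" * 520]
enriched = select_by_known_length(fragments, expected_length=500)
enriched_lengths = [len(frag) for frag in enriched]
```

Here the 90-base and 20,000-base fragments are discarded, which mirrors how off-target material of unexpected size can be depleted before library preparation.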

In accordance with various embodiments, some provided methods may be useful in sequencing any of a variety of suboptimal (e.g., damaged or degraded) samples of nucleic acid material. For example, in some embodiments, at least some of the nucleic acid material is damaged. In some embodiments, the damage is or comprises at least one of: oxidation, alkylation, deamination, methylation, hydrolysis, hydroxylation, nicking, intrastrand crosslinks, interstrand crosslinks, blunt-end strand breakage, staggered-end double-strand breakage, phosphorylation, dephosphorylation, sumoylation, glycosylation, deglycosylation, putrescinylation, carboxylation, halogenation, formylation, single-stranded gaps, damage from heat, damage from desiccation, damage from UV exposure, damage from gamma radiation, damage from X-radiation, damage from ionizing radiation, damage from non-ionizing radiation, damage from heavy particle radiation, damage from nuclear decay, damage from beta radiation, damage from alpha radiation, damage from neutron radiation, damage from proton radiation, damage from cosmic radiation, damage from high pH, damage from low pH, damage from reactive oxidative species, damage from free radicals, damage from peroxide, damage from hypochlorite, damage from tissue fixation such as with formalin or formaldehyde, damage from reactive iron, damage from low ionic conditions, damage from high ionic conditions, damage from unbuffered conditions, damage from nucleases, damage from environmental exposure, damage from fire, damage from mechanical stress, damage from enzymatic degradation, damage from microorganisms, damage from preparative mechanical shearing, damage from preparative enzymatic fragmentation, damage having naturally occurred in vivo, damage having occurred during nucleic acid extraction, damage having occurred during sequencing library preparation, damage having been introduced by a polymerase, damage having been introduced during nucleic acid repair, damage having occurred during nucleic acid end-tailing, damage having occurred during nucleic acid ligation, damage having occurred during sequencing, damage having occurred from mechanical handling of DNA, damage having occurred during passage through a nanopore, damage having occurred as part of aging in an organism, damage having occurred as a result of chemical exposure of an individual, damage having occurred by a mutagen, damage having occurred by a carcinogen, damage having occurred by a clastogen, damage having occurred from in vivo inflammation, damage due to oxygen exposure, damage due to one or more strand breaks, and any combination thereof.

It is contemplated that the nucleic acid material may come from any of a variety of sources. For example, in some embodiments, the nucleic acid material (e.g., comprising one or more double-stranded nucleic acid molecules) is provided from a sample from a human subject, an animal, a plant, a fungus, a virus, a bacterium, a protozoan, or any other life form. In other embodiments, the sample comprises nucleic acid material that has been at least partially artificially synthesized. In some embodiments, the sample is or comprises a body tissue, a biopsy, a skin sample, blood, serum, plasma, sweat, saliva, cerebrospinal fluid, mucus, uterine lavage fluid, a vaginal swab, a Pap smear, a nasal swab, an oral swab, a tissue scraping, hair, a fingerprint, urine, stool, vitreous humor, peritoneal wash, sputum, bronchial lavage, oral lavage, pleural lavage, gastric lavage, gastric juice, bile, pancreatic duct lavage, bile duct lavage, common bile duct lavage, gall bladder fluid, synovial fluid, an infected wound, a non-infected wound, an archaeological sample, a forensic sample, a water sample, a tissue sample, a food sample, a bioreactor sample, a plant sample, a bacterial sample, a protozoan sample, a fungal sample, an animal sample, a viral sample, a multi-organism sample, a fingernail scraping, semen, prostatic fluid, vaginal fluid, a fallopian tube lavage, a cell-free nucleic acid, a nucleic acid within a cell, a metagenomics sample, a lavage or a swab of an implanted foreign body, a nasal lavage, intestinal fluid, an epithelial brushing, an epithelial lavage, a tissue biopsy, an autopsy sample, a necropsy sample, an organ sample, a human identification sample, a non-human identification sample, an artificially produced nucleic acid sample, a synthetic gene sample, a banked or stored nucleic acid sample, tumor tissue, a fetal sample, an organ transplant sample, a microbial culture sample, a nuclear DNA sample, a mitochondrial DNA sample, a chloroplast DNA sample, an apicoplast DNA sample, an organelle sample, and any combination thereof. In some embodiments, the nucleic acid material is derived from more than one source.

As described herein, in some embodiments it is advantageous to process the nucleic acid material so as to improve the efficiency, accuracy, and/or speed of a sequencing process. In some embodiments, the nucleic acid material comprises nucleic acid molecules having a substantially uniform length and/or a substantially known length. In some embodiments, the substantially uniform length and/or substantially known length is between about 1 and about 1,000,000 bases. For example, in some embodiments, the substantially uniform length and/or substantially known length may be at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 30, 35, 40, 50, 60, 70, 80, 90, 100, 120, 150, 200, 300, 400, 500, 600, 700, 800, 900, 1000, 1200, 1500, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000, 10,000, 15,000, 20,000, 30,000, 40,000, or 50,000 bases in length. In some embodiments, the substantially uniform length and/or substantially known length may be at most 60,000, 70,000, 80,000, 90,000, 100,000, 120,000, 150,000, 200,000, 300,000, 400,000, 500,000, 600,000, 700,000, 800,000, 900,000, or 1,000,000 bases. By way of specific, non-limiting example, in some embodiments the substantially uniform length and/or substantially known length is between about 100 and about 500 bases. In some embodiments, the nucleic acid material is cut via one or more targeted endonucleases into nucleic acid molecules having a substantially uniform length and/or a substantially known length. In some embodiments, a targeted endonuclease comprises at least one modification.

In some embodiments, the nucleic acid material comprises nucleic acid molecules whose lengths fall within one or more substantially known size ranges. In some embodiments, the nucleic acid molecules may be between 1 and about 1,000,000 bases, between about 10 and about 10,000 bases, between about 100 and about 1000 bases, between about 100 and about 600 bases, between about 100 and about 500 bases, or some combination thereof.

In some embodiments, the targeted endonuclease is or comprises at least one restriction endonuclease (i.e., restriction enzyme) that cleaves DNA at or near a recognition site (e.g., EcoRI, BamHI, XbaI, HindIII, AluI, AvaII, BsaJI, BstNI, DsaV, Fnu4HI, HaeIII, MaeIII, NlaIV, NSiI, MspJI, FspEI, NaeI, Bsu36I, NotI, HinF1, Sau3AI, PvuII, SmaI, HgaI, AluI, EcoRV, etc.). Lists of several restriction endonucleases are available in both printed and computer-readable forms, and are provided by many commercial suppliers (e.g., New England Biolabs, Ipswich, MA). One of ordinary skill in the art will appreciate that any restriction endonuclease may be used in accordance with various embodiments of the present technology. In other embodiments, the targeted endonuclease is or comprises at least one ribonucleoprotein complex, such as, for example, a CRISPR-associated (Cas) enzyme/guide RNA complex (e.g., Cas9 or Cpf1) or a Cas9-like enzyme. In other embodiments, the targeted endonuclease is or comprises a homing endonuclease, a zinc-finger nuclease, a TALEN, and/or a meganuclease (e.g., a megaTAL nuclease, etc.), an argonaute nuclease, or a combination thereof. In some embodiments, the targeted endonuclease comprises Cas9 or CPF1 or a derivative thereof. In some embodiments, more than one targeted endonuclease may be used (e.g., 2, 3, 4, 5, 6, 7, 8, 9, 10, or more). In some embodiments, a targeted endonuclease may be used to cut more than one potential target region of the nucleic acid material (e.g., 2, 3, 4, 5, 6, 7, 8, 9, 10, or more). In some embodiments, where there is more than one target region of the nucleic acid material, each target region may be of the same (or substantially the same) length. In some embodiments, where there is more than one target region of the nucleic acid material, at least two of the target regions of known length differ in length (e.g., a first target region with a length of 100 bp and a second target region with a length of 1,000 bp).
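To illustrate how cutting at defined recognition sites yields fragments of predictable length, the following sketch performs a simplified in-silico digest. It models only the top strand, uses the EcoRI recognition sequence GAATTC with the cut placed after the first G, and ignores sticky ends, methylation sensitivity, and star activity. The function name and the toy sequence are hypothetical.

```python
def digest(sequence: str, site: str = "GAATTC", cut_offset: int = 1) -> list:
    """Cut `sequence` at every occurrence of `site`.

    The cut is placed `cut_offset` bases into the site (EcoRI: G^AATTC).
    Only the top strand is modelled in this simplified sketch.
    """
    fragments = []
    start = 0
    pos = sequence.find(site)
    while pos != -1:
        cut = pos + cut_offset
        fragments.append(sequence[start:cut])
        start = cut
        pos = sequence.find(site, pos + 1)
    fragments.append(sequence[start:])  # trailing fragment after the last cut
    return fragments


toy_sequence = "AAAGAATTCCCCCGAATTCTT"   # contains two EcoRI sites
pieces = digest(toy_sequence)
piece_lengths = [len(p) for p in pieces]
```

Because the site positions are known in advance, the middle fragment length (10 bases in this toy case) is known before any sequencing is done, which is the property that length-based enrichment relies on.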

In some embodiments, certain modifications are made to a portion of a sample of nucleic acid material (e.g., to an adapter sequence). By way of specific example, in some embodiments, amplifying the nucleic acid material in the first sample further includes destroying or disrupting part or all of the second adapter sequences found on the nucleic acid material, after the separating step and before amplification of the first sample. By way of further example, in some embodiments, amplifying the nucleic acid material in the second sample further includes destroying or disrupting the first adapter sequences found on the nucleic acid material, after the separating step and before amplification of the second sample. In some embodiments, the destroying or disrupting may be or include at least one of: enzymatic digestion, inclusion of at least one replication-inhibiting molecule, enzymatic cleavage, enzymatic cleavage of one strand, enzymatic cleavage of both strands, incorporation of a modified nucleic acid followed by an enzymatic treatment that leads to cleavage of one or both strands, incorporation of a replication-blocking nucleotide, incorporation of a chain terminator, incorporation of a photocleavable linker, incorporation of a uracil, incorporation of a ribose base, incorporation of an 8-oxo-guanine adduct, use of a restriction endonuclease, use of a ribonucleoprotein endonuclease (e.g., a Cas enzyme such as Cas9 or CPF1) or other programmable endonuclease (e.g., a homing endonuclease, a zinc-finger nuclease, a TALEN, a meganuclease (e.g., a megaTAL nuclease), an argonaute nuclease, etc.), and any combination thereof. In some embodiments, in addition to or as an alternative to primer site destruction or disruption, methods such as affinity pull-down, size selection, or any other known technique for removing undesired nucleic acid material from a sample and/or for not amplifying it are contemplated.

In some embodiments, at least one amplifying step includes at least one primer and/or adapter sequence that is or comprises at least one non-standard nucleotide. By way of additional example, in some embodiments, at least one adapter sequence is or comprises at least one non-standard nucleotide. In some embodiments, the non-standard nucleotide is selected from a uracil, a methylated nucleotide, an RNA nucleotide, a ribose nucleotide, an 8-oxo-guanine, a biotinylated nucleotide, a desthiobiotin nucleotide, a thiol-modified nucleotide, an acrydite-modified nucleotide, an iso-dC, an iso-dG, a 2'-O-methyl nucleotide, an inosine nucleotide, a locked nucleic acid, a peptide nucleic acid, a 5-methyl dC, a 5-bromo deoxyuridine, a 2,6-diaminopurine, a 2-aminopurine nucleotide, an abasic nucleotide, a 5-nitroindole nucleotide, an adenylated nucleotide, an azide nucleotide, a digoxigenin nucleotide, an I-linker, a 5' hexynyl-modified nucleotide, a 5-octadiynyl dU, a photocleavable spacer, a non-photocleavable spacer, a click chemistry-compatible modified nucleotide, a fluorescent dye, biotin, furan, BrdU, fluoro-dU, loto-dU, and any combination thereof.

In accordance with several embodiments, any of a variety of analytical steps may be used to increase one or more of the accuracy, speed, and efficiency of a provided process. For example, in some embodiments, sequencing each of the first nucleic acid product and the second nucleic acid product includes comparing the sequences of a plurality of strands in the first nucleic acid product to determine a first strand consensus sequence, and comparing the sequences of a plurality of strands in the second nucleic acid product to determine a second strand consensus sequence. In some embodiments, comparing the sequence of the first nucleic acid product to the sequence of the second nucleic acid product includes comparing the first strand consensus sequence and the second strand consensus sequence to provide an error-corrected consensus sequence.
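The two-level consensus described above can be sketched as follows. This is a minimal illustration under stated assumptions rather than the method of the disclosure: reads are assumed to be pre-grouped by originating strand, aligned, and of equal length; the per-position majority rule and the 0.7 agreement threshold are hypothetical choices; and `N` marks positions without sufficient support.

```python
from collections import Counter


def strand_consensus(reads, min_fraction=0.7, no_call="N"):
    """Per-position majority consensus over reads from one original strand.

    `min_fraction` is an assumed threshold: the most common base must reach
    this fraction of reads at a position, otherwise the position is masked.
    """
    consensus = []
    for column in zip(*reads):
        base, count = Counter(column).most_common(1)[0]
        consensus.append(base if count / len(column) >= min_fraction else no_call)
    return "".join(consensus)


def error_corrected_consensus(first_reads, second_reads, min_fraction=0.7):
    """Compare the two strand consensus sequences position by position."""
    first = strand_consensus(first_reads, min_fraction)
    second = strand_consensus(second_reads, min_fraction)
    return "".join(a if a == b and a != "N" else "N"
                   for a, b in zip(first, second))


first_reads = ["ACGT", "ACGT", "ACGT", "ACTT"]    # one read carries an amplification error
second_reads = ["ACGT", "ACGA", "ACGA", "ACGA"]   # strand-specific change at the last base
first_consensus = strand_consensus(first_reads)
second_consensus = strand_consensus(second_reads)
final = error_corrected_consensus(first_reads, second_reads)
```

In this toy input, the isolated error in the first family is outvoted within its own strand, while the change seen only in the second family survives its strand consensus but is removed when the two strand consensus sequences are compared.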

It is contemplated that any of a variety of methods for amplifying nucleic acid material may be used in accordance with various embodiments. For example, in some embodiments, at least one amplifying step includes polymerase chain reaction (PCR), rolling circle amplification (RCA), multiple displacement amplification (MDA), isothermal amplification, polony amplification within an emulsion, bridge amplification on a surface, on the surface of a bead, or within a hydrogel, and any combination thereof. In some embodiments, amplifying the nucleic acid material includes use of a single-stranded oligonucleotide at least partially complementary to a region of a genomic sequence of interest and a single-stranded oligonucleotide at least partially complementary to a region of the adapter sequence. In some embodiments, amplifying the nucleic acid material includes use of single-stranded oligonucleotides at least partially complementary to regions of the first adapter sequence and the second adapter sequence (e.g., at least partially complementary to the adapter sequences on the 5' and/or 3' ends of each strand of the nucleic acid material).

One aspect provided by some embodiments is the ability to generate high-quality sequencing information from very small amounts of nucleic acid material. In some embodiments, provided methods and compositions may be used with an amount of starting nucleic acid material of at most about 1 picogram (pg), 10 pg, 100 pg, 1 nanogram (ng), 10 ng, 100 ng, 200 ng, 300 ng, 400 ng, 500 ng, 600 ng, 700 ng, 800 ng, 900 ng, or 1000 ng. In some embodiments, provided methods and compositions may be used with an input amount of nucleic acid material of at most 1 molecular copy or genome equivalent, 10 molecular copies or the genome equivalent thereof, 100 molecular copies or the genome equivalent thereof, 1,000 molecular copies or the genome equivalent thereof, 10,000 molecular copies or the genome equivalent thereof, 100,000 molecular copies or the genome equivalent thereof, or 1,000,000 molecular copies or the genome equivalent thereof. For example, in some embodiments, at most 1,000 ng of nucleic acid material is initially provided for a particular sequencing process. For example, in some embodiments, at most 100 ng of nucleic acid material is initially provided for a particular sequencing process. For example, in some embodiments, at most 10 ng of nucleic acid material is initially provided for a particular sequencing process. For example, in some embodiments, at most 1 ng of nucleic acid material is initially provided for a particular sequencing process. For example, in some embodiments, at most 100 pg of nucleic acid material is initially provided for a particular sequencing process. For example, in some embodiments, at most 1 pg of nucleic acid material is initially provided for a particular sequencing process.
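The relationship between input mass and genome equivalents can be illustrated with a short worked conversion. The figure of roughly 3.3 pg per haploid human genome is a commonly cited approximation and is an assumption here, not a value stated in this disclosure; the function name is likewise hypothetical, and other organisms would require a different constant.

```python
# Assumed approximate mass of one haploid human genome, in picograms.
# This constant is an outside approximation, not taken from the disclosure.
PG_PER_HAPLOID_HUMAN_GENOME = 3.3


def genome_equivalents(mass_ng: float,
                       pg_per_genome: float = PG_PER_HAPLOID_HUMAN_GENOME) -> float:
    """Convert a DNA mass in nanograms to approximate haploid genome equivalents."""
    mass_pg = mass_ng * 1000.0  # 1 ng = 1000 pg
    return mass_pg / pg_per_genome


copies_in_10_ng = genome_equivalents(10)      # roughly 3,000 haploid genome copies
copies_in_100_pg = genome_equivalents(0.1)    # roughly 30 haploid genome copies
```

Under this assumption, a 100 pg input contains only a few tens of copies of any given locus, which is why the conversion efficiency of input molecules into sequenced molecules matters for low-input samples.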

As used in this application, the terms "about" and "approximately" are used as equivalents. Any citations herein to publications, patents, or patent applications are incorporated by reference in their entirety. Any numerals used in this application, with or without about/approximately, are meant to cover any normal fluctuations appreciated by one of ordinary skill in the relevant art.

In various embodiments, enrichment of nucleic acid material, including enrichment of nucleic acid material for region(s) of interest, is provided at a faster rate (e.g., with fewer steps) and at lower cost (e.g., using fewer reagents), and results in an increased yield of desirable data. Various aspects of the present technology have many applications in preclinical and clinical testing and diagnostics, as well as other applications.

Specific details of several embodiments of the technology are described below with reference to Figures 1A-24. Although many of the embodiments are described herein with respect to Duplex Sequencing, other sequencing modalities capable of generating error-corrected sequencing reads and/or other sequencing reads, in addition to those described herein, are within the scope of the present technology. Additionally, other nucleic acid interrogations are contemplated to benefit from the nucleic acid enrichment methods and reagents described herein. Further, other embodiments of the present technology can have configurations, components, or procedures different from those described herein. A person of ordinary skill in the art will accordingly understand that the technology can have other embodiments with additional elements, and that the technology can have other embodiments without several of the features shown and described below with reference to Figures 1A-24.

Brief description of the drawings

Many aspects of the present disclosure can be better understood with reference to the following drawings. The components in the drawings are not necessarily drawn to scale. Instead, emphasis is placed on clearly illustrating the principles of the present disclosure.

Figure 1A illustrates, in accordance with an embodiment of the present technology, a nucleic acid adapter molecule for use with some embodiments of the present technology and a double-stranded adapter-nucleic acid complex resulting from ligation of the adapter molecule to a double-stranded nucleic acid fragment.

Figures 1B and 1C are conceptual illustrations of various Duplex Sequencing method steps in accordance with an embodiment of the present technology.

Fig. 2 is a graph plotting positive predictive value as a function of variant allele frequency in a molecular population for next-generation sequencing (NGS), single-strand tag-based error correction, and Duplex Sequencing error correction, in accordance with certain aspects of the present disclosure.

Fig. 3A and 3B show a series of graphs of multiple sequencing reads plotted against CODIS genotype for three different loci, in the absence of error correction (Fig. 3A) and following analysis with standard DS (Fig. 3B), in accordance with aspects of the present disclosure.

Fig. 4 is a conceptual illustration of SPLiT-DS method steps in accordance with an embodiment of the present technology.

Fig. 5 is a conceptual illustration of SPLiT-DS method steps in accordance with an embodiment of the present technology, showing the steps for generating a duplex consensus sequence.

Fig. 6 is a conceptual illustration of various SPLiT-DS method steps in accordance with an embodiment of the present technology.

Fig. 7 is a conceptual illustration of further SPLiT-DS method steps in accordance with an embodiment of the present technology.

Figure 8A is a conceptual illustration of SPLiT-DS method steps incorporating a double-stranded primer site destruction scheme in accordance with additional embodiments of the present technology.

Figure 8B is a conceptual illustration of an example of the SPLiT-DS method steps shown in Figure 8A, in accordance with one embodiment of the present technology.

Figure 8C is a conceptual illustration of an embodiment of SPLiT-DS method steps following the method steps shown in Figure 8A, in accordance with another aspect of the present technology.

Figure 8D is a conceptual illustration of SPLiT-DS method steps incorporating a double-stranded primer site destruction scheme in accordance with another embodiment of the present technology.

Figures 9A and 9B are conceptual illustrations of various embodiments of SPLiT-DS method steps incorporating a single-stranded primer site destruction scheme in accordance with further aspects of the present technology.

Figure 10 is a conceptual illustration of SPLiT-DS method steps that use multiple targeting primers to generate duplex consensus sequences for longer nucleic acid molecules, in accordance with yet another embodiment of the present technology.

Figure 11A is a graph plotting the relationship between nucleic acid insert size and resulting family size following amplification, in accordance with one embodiment of the present technology.

Figure 11B is a schematic showing sequencing data generated for different nucleic acid insert sizes, in accordance with aspects of the present technology.

Figure 11C is a schematic showing method steps for generating size-defined targeted fragments with CRISPR/Cas9 and for generating sequencing information, in accordance with one embodiment of the present technology.

Figures 12A-12D are conceptual illustrations of CRISPR-DS method steps in accordance with one embodiment of the present technology. Figure 12A shows the result of CRISPR/Cas9 digestion of TP53, in which seven fragments containing all TP53 coding exons are excised by targeted cutting with gRNAs. Dark gray represents the reference strand and light gray represents the anti-reference strand. Figure 12B shows size selection using 0.5x SPRI beads: uncut genomic DNA binds to the beads, allowing the excised fragments to be recovered in solution. Figure 12C is a schematic of a fragmented double-stranded DNA molecule ligated to double-stranded DS adapters, which contain 10-bp random complementary nucleotides and a 3'-dT overhang. Figure 12D is a schematic of error correction by DS. Reads derived from the same strand of DNA are compared to form a single-strand consensus sequence (SSCS). The two strands of the same starting DNA molecule are then compared with each other to generate a double-strand consensus sequence (DSCS), and only mutations found in both SSCS reads are counted as true mutations in the DSCS read.

Figures 12E and 12F schematically compare CRISPR-DS and standard DS method steps in accordance with certain embodiments of the present technology. Figure 12E compares the library preparation steps of CRISPR-DS and standard DS. Each box represents one hour. Figure 12F is a schematic of fragments generated by sonication, which are shorter or longer than the optimal length (corresponding to lost or redundant information, respectively), compared with the fragments obtained by CRISPR-DS, which have an optimal, consistent length and are completely covered by sequencing reads.

Figures 13A-13C show data from a SPLiT-DS procedure in accordance with one embodiment of the present technology. Figure 13A is a representative gel showing insert fragment sizes prior to sequencing. Figures 13B and 13C are graphs of CODIS genotype versus number of sequencing reads in the absence of error correction (Figure 13B) and following analysis with SPLiT-DS (Figure 13C).

Figures 14A and 14B are graphs of CODIS genotype versus number of sequencing reads for highly damaged DNA in the absence of error correction (Figure 14A) and following analysis with SPLiT-DS (Figure 14B), in accordance with one embodiment of the present technology.

Figures 15A and 15B are visual representations of SPLiT-DS sequencing data for KRAS exon 2 generated from 10 ng (Figure 15A) and 20 ng (Figure 15B) of cfDNA, in accordance with one embodiment of the present technology.

Figure 16A is a schematic illustration of fragment lengths generated by sonication and by CRISPR/Cas9 fragmentation, in accordance with one embodiment of the present technology.

Figures 16B and 16C are histograms showing fragment insert sizes of samples prepared with the standard DS and CRISPR-DS protocols, in accordance with one embodiment of the present technology. The x-axis represents the percent difference from the optimal fragment size, that is, the fragment size that matches the sequencing read length after adjusting for molecular barcodes and clipping. The bar-shaped region shows the range of fragment sizes that are within 10% of the optimal size, which is marked with a vertical dashed line.
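
The x-axis quantity used in Figures 16B and 16C can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the function names, the example fragment sizes, and the optimal size of 100 are hypothetical and are not taken from the figures.

```python
def percent_difference(fragment_size, optimal_size):
    """Signed percent difference of a fragment size from the optimal size."""
    return 100.0 * (fragment_size - optimal_size) / optimal_size


def fraction_within(fragment_sizes, optimal_size, tolerance_pct=10.0):
    """Fraction of fragments whose size is within tolerance_pct of optimal."""
    if not fragment_sizes:
        return 0.0
    hits = sum(
        1
        for size in fragment_sizes
        if abs(percent_difference(size, optimal_size)) <= tolerance_pct
    )
    return hits / len(fragment_sizes)


# Hypothetical fragment sizes (bp) and a hypothetical optimal size of 100 bp.
example_sizes = [90, 100, 110, 150]
example_fraction = fraction_within(example_sizes, optimal_size=100)
```

A library whose fragments mostly fall inside the tolerance window would give a fraction close to 1 in this sketch.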

Figures 17A-17C show a CRISPR/Cas9 scheme for targeted enrichment of the coding region of human TP53, in accordance with one embodiment of the present technology. TP53 tumor protein; Homo sapiens; NC_000017.11 Chr. 17, Ref. GRCh38.p2. Gray letters represent coding regions; exon names are indicated in the right margin, with exons boxed together when they lie in the same fragment. Gray-highlighted text represents Cas9 cut sites, with the PAM sequence double-underlined. Single-underlined text represents biotinylated probes, with probe names indicated in the left margin.

Figures 18A-18C are bar charts showing the percentage of raw sequencing reads on target (covering TP53) (Figure 18A); the percent recovery, calculated as the percentage of genomes in the input DNA that yielded duplex consensus sequence reads (Figure 18B); and the median duplex consensus sequence depth across all target regions (Figure 18C) for various input amounts of DNA processed with standard DS and CRISPR-DS, in accordance with one embodiment of the present technology.

Figure 19 is a bar chart showing, for three different blood DNA samples, the target enrichment provided by CRISPR-DS with one capture step compared with two capture steps, in accordance with one embodiment of the present technology.

Figures 20A and 20B show the results of pre-enriching high-molecular-weight DNA with BluePippin on a pulsed-field gel (Figure 20A) and in a bar chart (Figure 20B), which compares the percentage of raw reads on target and the duplex consensus sequence depth for the same DNA sequenced before and after BluePippin pre-enrichment, in accordance with one embodiment of the present technology.

Figures 21A-21C are a schematic of a synthetic double-stranded DNA molecule (Figure 21A), a chart of the fragment lengths predicted after CRISPR/Cas9 digestion (Figure 21B), and a TapeStation gel image of the actual DNA fragment lengths obtained after CRISPR/Cas9 digestion of the synthetic double-stranded DNA molecule (Figure 21C), confirming successful cutting by CRISPR/Cas9 digestion, in accordance with one embodiment of the present technology.

Figure 22A is a graph plotting the relationship between nucleic acid insert size and resulting family size following amplification of TP53 with the CRISPR-DS and standard DS protocols, in accordance with one embodiment of the present technology. Dots represent original barcoded DNA molecules. In CRISPR-DS, all DNA molecules (lighter dots) have a predetermined size and generate similar numbers of PCR copies (as seen in the several "band-like" clusters of lighter dots). In standard DS (dark dots), sonication shears DNA into variable fragment lengths, so the dark dots are distributed more broadly across the graph than the lighter dots. The graph shows a greater number of shorter fragments than longer fragments.

Figures 22B-22E show data for TP53 from CRISPR-DS and standard DS method steps, in accordance with one embodiment of the present technology. Figure 22B is a representative gel showing insert fragment sizes after adapter ligation and before sequencing. Figures 22C and 22D are electropherograms showing the peaks of the resulting nucleic acid libraries generated by CRISPR-DS (Figure 22C) and standard DS (Figure 22D) prior to sequencing. Figure 22E shows duplex consensus sequence reads for TP53 generated by the CRISPR-DS and standard DS protocols, as viewed in the Integrative Genomics Viewer. Figure 22B shows a TapeStation gel with a ladder and samples from CRISPR-DS (A1) and standard DS (B1). The band sizes correspond to CRISPR/Cas9-cut fragments with adapters. Figure 22E shows distinct boundaries corresponding to the CRISPR/Cas9 cut points and an even distribution of depth across positions, both within fragments and between fragments. Standard DS shows the peak pattern produced by random shearing of fragments and hybridization capture, and uneven coverage.

Figure 23 is a schematic overview of CRISPR-DS data processing steps in accordance with one embodiment of the present technology.

Figures 24A and 24B are a chart (Figure 24A) and a table (Figure 24B) showing the results of quantifying the degree of target enrichment achieved by CRISPR/Cas9 digestion followed by size selection, in accordance with one embodiment of the present technology. Figure 24A shows the DNA samples and the enrichment achieved for each sample. Figure 24B shows the percentage of raw reads "on target" compared with the amount of input DNA.

Definitions

In order for the present disclosure to be more readily understood, certain terms are first defined below. Additional definitions for the following terms and other terms are set forth throughout the specification.

In this application, unless otherwise clear from context, the term "a" may be understood to mean "at least one." As used in this application, the term "or" may be understood to mean "and/or." In this application, the terms "comprising" and "including" may be understood to encompass itemized components or steps, whether presented by themselves or together with one or more additional components or steps. Where ranges are provided herein, the endpoints are included. As used in this application, the term "comprise" and variations of the term, such as "comprising" and "comprises," are not intended to exclude other additives, components, integers, or steps.

About: The term "about," when used herein in reference to a value, refers to a value that is similar, in context, to the referenced value. In general, those skilled in the art, familiar with the context, will appreciate the relevant degree of variance encompassed by "about" in that context. For example, in some embodiments, the term "about" may encompass a range of values that are within 25%, 20%, 19%, 18%, 17%, 16%, 15%, 14%, 13%, 12%, 11%, 10%, 9%, 8%, 7%, 6%, 5%, 4%, 3%, 2%, 1%, or less of the referenced value.

Analog: As used herein, the term "analog" refers to a substance that shares one or more particular structural features, elements, components, or moieties with a reference substance. Typically, an "analog" shows significant structural similarity with the reference substance, for example sharing a core or consensus structure, but also differs in certain discrete ways. In some embodiments, an analog is a substance that can be generated from the reference substance, e.g., by chemical manipulation of the reference substance. In some embodiments, an analog is a substance that can be generated through performance of a synthetic process substantially similar to (e.g., sharing a plurality of steps with) one that generates the reference substance. In some embodiments, an analog is or can be generated through performance of a synthetic process different from that used to generate the reference substance.

Biological sample: As used herein, the term "biological sample" or "sample" typically refers to a sample obtained or derived from a biological source of interest (e.g., a tissue, organism, or cell culture), as described herein. In some embodiments, a source of interest comprises an organism, such as an animal or human. In other embodiments, a source of interest comprises a microorganism, such as a bacterium, virus, protozoan, or fungus. In further embodiments, a source of interest may be a synthetic tissue, organism, cell culture, nucleic acid, or other material. In yet further embodiments, a source of interest may be a plant-based organism. In another embodiment, a sample may be an environmental sample, such as a water sample, soil sample, archaeological sample, or other sample collected from a non-living source. In other embodiments, a sample may be a multi-organism sample (e.g., a mixed-organism sample). In some embodiments, a biological sample is or comprises biological tissue or fluid. In some embodiments, a biological sample may be or comprise bone marrow; blood; blood cells; ascites; tissue or fine needle biopsy samples; cell-containing body fluids; free-floating nucleic acids; sputum; saliva; urine; cerebrospinal fluid; peritoneal fluid; pleural fluid; feces; lymph; gynecological fluids; skin swabs; vaginal swabs; Pap smears; oral swabs; nasal swabs; washings or lavages, such as ductal lavages or bronchoalveolar lavages; vaginal fluid; aspirates; scrapings; bone marrow specimens; tissue biopsy specimens; fetal tissue or fluids; surgical specimens; feces, other body fluids, secretions, and/or excretions; and/or cells therefrom. In some embodiments, a biological sample is or comprises cells obtained from an individual. In some embodiments, obtained cells are or include cells from the individual from whom the sample is obtained. In a particular embodiment, a biological sample is a liquid biopsy obtained from a subject. In some embodiments, a sample is a "primary sample" obtained directly from a source of interest by any appropriate means. For example, in some embodiments, a primary biological sample is obtained by a method selected from biopsy (e.g., fine needle aspiration or tissue biopsy), surgery, and collection of body fluid (e.g., blood, lymph, feces, etc.). In some embodiments, as will be clear from context, the term "sample" refers to a preparation obtained by processing a primary sample (e.g., by removing one or more of its components and/or by adding one or more agents to it), for example by filtering using a semi-permeable membrane. Such a "processed sample" may comprise, for example, nucleic acids or proteins extracted from a sample, or obtained by subjecting a primary sample to techniques such as amplification or reverse transcription of mRNA, or isolation and/or purification of certain components.

Determine: Many methodologies described herein include a step of "determining." Those of ordinary skill in the art, reading the present specification, will appreciate that such "determining" can utilize or be accomplished through use of any of a variety of techniques available to those skilled in the art, including, for example, the specific techniques explicitly referred to herein. In some embodiments, determining involves manipulation of a physical sample. In some embodiments, determining involves consideration and/or manipulation of data or information, for example utilizing a computer or other processing unit adapted to perform a relevant analysis. In some embodiments, determining involves receiving relevant information and/or materials from a source. In some embodiments, determining involves comparing one or more features of a sample or entity to a comparable reference.

Expression: As used herein, "expression" of a nucleic acid sequence refers to one or more of the following events: (1) production of an RNA template from a DNA sequence (e.g., by transcription); (2) processing of an RNA transcript (e.g., by splicing, editing, 5' cap formation, and/or 3' end formation); (3) translation of an RNA into a polypeptide or protein; and/or (4) post-translational modification of a polypeptide or protein.

gRNA: As used herein, "gRNA" or "guide RNA" refers to a short RNA molecule that includes a scaffold sequence suitable for binding by a targeted endonuclease (e.g., a Cas enzyme such as Cas9 or Cpf1, or another ribonucleoprotein with similar properties), together with a substantially target-specific sequence that facilitates cutting of a specific region of DNA or RNA.

Nucleic acid: As used herein, in its broadest sense, "nucleic acid" refers to any compound and/or substance that is or can be incorporated into an oligonucleotide chain. In some embodiments, a nucleic acid is a compound and/or substance that is or can be incorporated into an oligonucleotide chain via a phosphodiester linkage. As will be clear from context, in some embodiments "nucleic acid" refers to individual nucleic acid residues (e.g., nucleotides and/or nucleosides); in some embodiments, "nucleic acid" refers to an oligonucleotide chain comprising individual nucleic acid residues. In some embodiments, a "nucleic acid" is or comprises RNA; in some embodiments, a "nucleic acid" is or comprises DNA. In some embodiments, a nucleic acid is, comprises, or consists of one or more natural nucleic acid residues. In some embodiments, a nucleic acid is, comprises, or consists of one or more nucleic acid analogs. In some embodiments, a nucleic acid analog differs from a nucleic acid in that it does not utilize a phosphodiester backbone. For example, in some embodiments, a nucleic acid is, comprises, or consists of one or more "peptide nucleic acids," which are known in the art, have peptide bonds instead of phosphodiester bonds in the backbone, and are considered within the scope of the present technology. Alternatively or additionally, in some embodiments, a nucleic acid has one or more phosphorothioate and/or 5'-N-phosphoramidite linkages rather than phosphodiester bonds. In some embodiments, a nucleic acid is, comprises, or consists of one or more natural nucleosides (e.g., adenosine, thymidine, guanosine, cytidine, uridine, deoxyadenosine, deoxythymidine, deoxyguanosine, and deoxycytidine). In some embodiments, a nucleic acid is, comprises, or consists of one or more nucleoside analogs (e.g., 2-aminoadenosine, 2-thiothymidine, inosine, pyrrolo-pyrimidine, 3-methyladenosine, 5-methylcytidine, C-5 propynyl-cytidine, C-5 propynyl-uridine, 2-aminoadenosine, C5-bromouridine, C5-fluorouridine, C5-iodouridine, C5-propynyl-uridine, C5-propynyl-cytidine, C5-methylcytidine, 2-aminoadenosine, 7-deazaadenosine, 7-deazaguanosine, 8-oxoadenosine, 8-oxoguanosine, O(6)-methylguanine, 2-thiocytidine, methylated bases, intercalated bases, and combinations thereof). In some embodiments, a nucleic acid comprises one or more modified sugars (e.g., 2'-fluororibose, ribose, 2'-deoxyribose, arabinose, and hexose) as compared with those in natural nucleic acids. In some embodiments, a nucleic acid has a nucleotide sequence that encodes a functional gene product such as an RNA or protein. In some embodiments, a nucleic acid includes one or more introns. In some embodiments, nucleic acids are prepared by one or more of isolation from a natural source, enzymatic synthesis by polymerization based on a complementary template (in vivo or in vitro), reproduction in a recombinant cell or system, and chemical synthesis. In some embodiments, a nucleic acid is at least 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 225, 250, 275, 300, 325, 350, 375, 400, 425, 450, 475, 500, 600, 700, 800, 900, 1000, 1500, 2000, 2500, 3000, 3500, 4000, 4500, 5000, or more residues long. In some embodiments, a nucleic acid is partly or wholly single-stranded; in some embodiments, a nucleic acid is partly or wholly double-stranded. In some embodiments, a nucleic acid has a nucleotide sequence comprising at least one element that encodes, or is the complement of a sequence that encodes, a polypeptide. In some embodiments, a nucleic acid has enzymatic activity. In some embodiments, a nucleic acid serves a mechanical function, for example in a ribonucleoprotein complex or a transfer RNA.

Reference: As used herein, describes a standard or control relative to which a comparison is performed. For example, in some embodiments, an agent, animal, individual, population, sample, sequence, or value of interest is compared with a reference or control agent, animal, individual, population, sample, sequence, or value. In some embodiments, a reference or control is tested and/or determined substantially simultaneously with the testing or determination of interest. In some embodiments, a reference or control is a historical reference or control, optionally embodied in a tangible medium. Typically, as would be understood by those skilled in the art, a reference or control is determined or characterized under conditions or circumstances comparable to those under assessment. Those skilled in the art will appreciate when sufficient similarities are present to justify reliance on and/or comparison to a particular possible reference or control.

Single molecule identifier (SMI): As used herein, the term "single molecule identifier" or "SMI" (which may be referred to as a "tag," a "barcode," a "molecular barcode," a "unique molecular identifier," or "UMI," among other names) refers to any material (e.g., a nucleotide sequence, a nucleic acid molecule feature) that is capable of distinguishing an individual molecule in a large heterogeneous population of molecules. In some embodiments, an SMI can be or comprise an exogenously applied SMI. In some embodiments, an exogenously applied SMI may be or comprise a degenerate or semi-degenerate sequence. In some embodiments, substantially degenerate SMIs may be known as random unique molecular identifiers (R-UMIs). In some embodiments, an SMI may comprise a code (e.g., a nucleic acid sequence) from within a pool of known codes. In some embodiments, pre-defined SMI codes are known as defined unique molecular identifiers (D-UMIs). In some embodiments, an SMI can be or comprise an endogenous SMI. In some embodiments, an endogenous SMI may be or comprise information related to specific shear points of a target sequence, or features relating to the terminal ends of individual molecules comprising a target sequence. In some embodiments, an SMI may relate to a sequence variation in a nucleic acid molecule caused by random or semi-random damage, chemical modification, enzymatic modification, or other modification to the nucleic acid molecule. In some embodiments, the modification may be deamination of methylcytosine. In some embodiments, the modification may entail sites of nucleic acid nicks. In some embodiments, an SMI may comprise both exogenous and endogenous elements. In some embodiments, an SMI may comprise physically adjacent SMI elements. In some embodiments, SMI elements may be spatially distinct in a molecule. In some embodiments, an SMI may be a non-nucleic acid. In some embodiments, an SMI may comprise two or more different types of SMI information. Various embodiments of SMIs are further disclosed in International Patent Publication No. WO2017/100441, which is incorporated by reference herein in its entirety.

Strand defining element (SDE): As used herein, the term "strand defining element" or "SDE" refers to any material that allows for the identification of a specific strand of a double-stranded nucleic acid material and thus differentiation from the other/complementary strand (e.g., any material that renders the amplification products of each of the two single-stranded nucleic acids resulting from a target double-stranded nucleic acid substantially distinguishable from each other after sequencing or other nucleic acid interrogation). In some embodiments, an SDE may be or comprise one or more segments of substantially non-complementary sequence within an adapter sequence. In particular embodiments, a segment of substantially non-complementary sequence within an adapter sequence can be provided by an adapter molecule comprising a Y-shape or a "loop" shape. In other embodiments, a segment of substantially non-complementary sequence within an adapter sequence may form an unpaired "bubble" in the middle of adjacent complementary sequences within the adapter sequence. In other embodiments, an SDE may encompass a nucleic acid modification. In some embodiments, an SDE may comprise physical separation of paired strands into physically separated reaction compartments. In some embodiments, an SDE may comprise a chemical modification. In some embodiments, an SDE may comprise a modified nucleic acid. In some embodiments, an SDE may relate to a sequence variation in a nucleic acid molecule caused by random or semi-random damage, chemical modification, enzymatic modification, or other modification to the nucleic acid molecule. In some embodiments, the modification may be deamination of methylcytosine. In some embodiments, the modification may entail sites of nucleic acid nicks. Various embodiments of SDEs are further disclosed in International Patent Publication No. WO2017/100441, which is incorporated by reference herein in its entirety.

Subject: As used herein, the term "subject" refers to an organism, typically a mammal (e.g., a human, in some embodiments including prenatal human forms). In some embodiments, a subject is suffering from a relevant disease, disorder, or condition. In some embodiments, a subject is susceptible to a disease, disorder, or condition. In some embodiments, a subject displays one or more symptoms or characteristics of a disease, disorder, or condition. In some embodiments, a subject does not display any symptom or characteristic of a disease, disorder, or condition. In some embodiments, a subject is someone with one or more features characteristic of susceptibility to, or risk of, a disease, disorder, or condition. In some embodiments, a subject is a patient. In some embodiments, a subject is an individual to whom diagnosis and/or therapy is and/or has been administered.

Substantially: As used herein, the term "substantially" refers to the qualitative condition of exhibiting total or near-total extent or degree of a characteristic or property of interest. One of ordinary skill in the biological arts will understand that biological and chemical phenomena rarely, if ever, go to completion and/or proceed to completeness or achieve or avoid an absolute result. The term "substantially" is therefore used herein to capture the potential lack of completeness inherent in many biological and chemical phenomena.

Detailed description

Selected embodiments of Duplex Sequencing methods and associated adapters and reagents

Duplex Sequencing (DS) is a method for producing error-corrected DNA sequences from double-stranded nucleic acid molecules, and was originally described in International Patent Publication No. WO2013/142389 and U.S. Patent No. 9,752,188, both of which are incorporated by reference in their entireties. As shown in Figures 1A-1C, and in certain aspects of the technology, DS can be used to independently sequence both strands of individual DNA molecules in such a way that the derived sequence reads can be recognized as having originated from the same double-stranded nucleic acid parent molecule during massively parallel sequencing (MPS), but can also be differentiated from each other as distinguishable entities after sequencing. The resulting sequence reads from each strand are then compared for the purpose of obtaining an error-corrected sequence of the original double-stranded nucleic acid molecule, known as a duplex consensus sequence (DCS). The process of DS makes it possible to confirm whether one or both strands of an original double-stranded nucleic acid molecule are represented in the sequencing data used to form a DCS.
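
The strand comparison described above can be sketched in Python. This is a minimal illustration under several assumptions, not the pipeline used in the cited publications: reads are assumed to be pre-aligned, of equal length, and already oriented to the same reference strand; each read is assumed to carry an SMI tag and a strand label ("A" or "B") derived from an SDE; and the names `consensus`, `duplex_consensus`, `min_fraction`, and `min_family_size` are hypothetical.

```python
from collections import Counter, defaultdict


def consensus(reads, min_fraction=0.7):
    """Column-wise majority consensus; unclear positions become 'N'."""
    out = []
    for column in zip(*reads):  # assumes equal-length, pre-aligned reads
        base, count = Counter(column).most_common(1)[0]
        out.append(base if count / len(column) >= min_fraction else "N")
    return "".join(out)


def duplex_consensus(tagged_reads, min_family_size=2):
    """tagged_reads: iterable of (smi, strand, sequence), strand in {'A', 'B'}.

    Returns {smi: duplex consensus} for molecules seen on both strands.
    """
    # Group reads into families that share an SMI and a strand label.
    families = defaultdict(list)
    for smi, strand, seq in tagged_reads:
        families[(smi, strand)].append(seq)

    # Single-strand consensus for each sufficiently large family.
    sscs = {
        key: consensus(seqs)
        for key, seqs in families.items()
        if len(seqs) >= min_family_size
    }

    # Keep a base only where the two strand consensuses agree.
    dcs = {}
    for (smi, strand), seq_a in sscs.items():
        if strand != "A" or (smi, "B") not in sscs:
            continue
        seq_b = sscs[(smi, "B")]
        dcs[smi] = "".join(a if a == b else "N" for a, b in zip(seq_a, seq_b))
    return dcs
```

In this sketch, a position that disagrees between the two strand consensuses is masked with N rather than reported as a variant, reflecting the idea that a true mutation should be found on both strands.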

In certain embodiments, methods incorporating DS may include ligation of one or more sequencing adapters to a target double-stranded nucleic acid molecule comprising a first strand target nucleic acid sequence and a second strand target nucleic acid sequence, to produce a double-stranded target nucleic acid complex (e.g., Figure 1A).

In various embodiments, the resulting target nucleic acid complex can include at least one SMI sequence, which may entail an exogenously applied degenerate or semi-degenerate sequence, endogenous information related to the specific shear points of the target double-stranded nucleic acid molecule, or a combination thereof. The SMI can render the target nucleic acid molecule substantially distinguishable from the plurality of other molecules in a population being sequenced. The substantially distinguishable feature of the SMI element can be independently carried by each of the single strands that form the double-stranded nucleic acid molecule, such that the derivative amplification products of each strand can be recognized as having come from the same original substantially unique double-stranded nucleic acid molecule after sequencing. In other embodiments, the SMI may include additional information and/or may be used in other methods for which such molecule-distinguishing functionality is useful, such as those described in the above-referenced publications. In another embodiment, the SMI element may be incorporated after adapter ligation. In some embodiments, the SMI is double-stranded in nature. In other embodiments, it is single-stranded in nature. In other embodiments, it is a combination of single-stranded and double-stranded in nature.
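
As a rough illustration of why a degenerate SMI can make molecules substantially distinguishable, the size of the SMI space and the expected number of chance near-collisions can be estimated. The sketch below assumes SMIs are drawn independently and uniformly from the four canonical bases, which real adapter pools only approximate; the function names and the example values (15-base SMIs, 10,000 sampled molecules) are illustrative assumptions.

```python
from math import comb


def smi_space(length, alphabet=4):
    """Number of distinct fully degenerate SMIs of the given length."""
    return alphabet ** length


def expected_pairs_at_distance(n_molecules, length, distance, alphabet=4):
    """Expected number of SMI pairs differing at exactly `distance` positions.

    Assumes independent, uniformly random SMIs (an idealization).
    """
    # Sequences that differ from a given SMI at exactly `distance` positions.
    neighbours = comb(length, distance) * (alphabet - 1) ** distance
    p_pair = neighbours / smi_space(length, alphabet)
    return comb(n_molecules, 2) * p_pair


# Illustrative values: 15-base degenerate SMIs, 10,000 sampled molecules.
space = smi_space(15)
identical_pairs = expected_pairs_at_distance(10_000, 15, 0)
one_mismatch_pairs = expected_pairs_at_distance(10_000, 15, 1)
```

Under these assumptions, a result near or above 1 for a given number of mismatches suggests that chance alone can produce such pairs in a sample of that size, which is relevant when deciding whether two similar SMIs reflect an error or two distinct molecules.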

In some embodiments, each double-stranded target nucleic acid sequence complex can further include an element (e.g., an SDE) that renders the amplification products of the two single-stranded nucleic acids that form the target double-stranded nucleic acid molecule substantially distinguishable from each other after sequencing. In one embodiment, an SDE may comprise asymmetric primer sites contained within the sequencing adapters, or, in other arrangements, sequence asymmetries may be introduced into the adapter molecules outside the primer sequences, such that at least one position in the nucleotide sequences of the first strand target nucleic acid sequence complex and the second strand target nucleic acid sequence complex differs after amplification and sequencing. In other embodiments, the SDE may comprise another biochemical asymmetry between the two strands that differs from the canonical nucleotide sequences A, T, C, G, or U, but is converted into at least one canonical nucleotide sequence difference in the two amplified and sequenced molecules. In yet another embodiment, the SDE may be a means of physically separating the two strands before amplification, such that the derivative amplification products from the first strand target nucleic acid sequence and the second strand target nucleic acid sequence are maintained in substantial physical isolation from one another for the purpose of maintaining a distinction between the two. Other such arrangements or methodologies for providing an SDE function that allows the first and second strands to be distinguished may be utilized, such as those described in the above-referenced publications, or other methods that serve the functional purpose described.

After generating the double-stranded target nucleic acid complex comprising at least one SMI and at least one SDE, or where one or both of these elements will be subsequently introduced, the complex can be subjected to DNA amplification, such as PCR or any other biochemical method of DNA amplification (e.g., rolling circle amplification, multiple displacement amplification, isothermal amplification, bridge amplification, or surface-bound amplification), such that one or more copies of the first strand target nucleic acid sequence and one or more copies of the second strand target nucleic acid sequence are produced (e.g., Fig. 1B). The one or more amplification copies of the first strand target nucleic acid molecule and the one or more amplification copies of the second strand target nucleic acid molecule can then be subjected to DNA sequencing, preferably using a "next-generation" massively parallel DNA sequencing platform (e.g., Fig. 1B).

Sequence reads produced from the first strand target nucleic acid molecule and the second strand target nucleic acid molecule derived from the original double-stranded target nucleic acid molecule can be identified based on sharing a related, substantially unique SMI, and distinguished from the opposite strand target nucleic acid molecule by virtue of the SDE. In some embodiments, the SMI may be a sequence based on a mathematically based error-correcting code (e.g., a Hamming code), whereby certain amplification errors, sequencing errors, or SMI synthesis errors can be tolerated for the purpose of relating the sequences of the SMI sequences on complementary strands of an original duplex (e.g., a double-stranded nucleic acid molecule). For example, for a double-stranded exogenous SMI in which the SMI comprises a fully degenerate sequence of 15 base pairs of canonical DNA bases, an estimated 4^15 = 1,073,741,824 SMI variants will exist in the population of fully degenerate SMIs. If two SMIs are recovered from sequencing reads that differ by only one nucleotide within the SMI sequence out of a population of 10,000 sampled SMIs, the probability of this occurring by random chance can be calculated mathematically, a decision can be made as to whether it is more probable that the single base-pair difference reflects one of the aforementioned types of errors, and the SMI sequences can be determined to have in fact derived from the same original duplex molecule. In some embodiments in which the SMI is, at least in part, an exogenously applied sequence whose sequence variants are not fully degenerate relative to one another and are, at least in part, known sequences, the identity of the known sequences can be designed such that one or more errors of the aforementioned types will not convert the identity of one known SMI sequence to that of another SMI sequence, thereby reducing the probability of one SMI being misinterpreted as another. In some embodiments, this SMI design strategy comprises a Hamming code approach or a derivative thereof. Once identified, one or more sequence reads produced from the first strand target nucleic acid molecule are compared with one or more sequence reads produced from the second strand target nucleic acid molecule to produce an error-corrected target nucleic acid molecule sequence (e.g., Fig. 1C). For example, nucleotide positions where the bases from the first strand and second strand target nucleic acid sequences agree are deemed to be true sequence, whereas nucleotide positions that disagree between the two strands are recognized as potential sites of technical errors that may be discounted. An error-corrected sequence of the original double-stranded target nucleic acid molecule can thus be produced (shown in Fig. 1C).
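
The SMI arithmetic described above can be sketched numerically. The following is a minimal illustration only, assuming SMIs are drawn uniformly and independently from the fully degenerate pool; the function names are hypothetical and are not part of the described method.

```python
# Minimal sketch. Assumption: SMIs are sampled uniformly and independently
# from all 4**length possible sequences. All names here are hypothetical.

def smi_space(length: int) -> int:
    """Number of distinct fully degenerate SMIs of the given length."""
    return 4 ** length


def hamming_distance(a: str, b: str) -> int:
    """Number of positions at which two equal-length SMIs differ."""
    if len(a) != len(b):
        raise ValueError("SMIs must have equal length")
    return sum(x != y for x, y in zip(a, b))


def p_random_neighbor(length: int, n_sampled: int) -> float:
    """Probability that at least one of the other sampled SMIs lies exactly
    one mismatch away from a given SMI purely by chance."""
    neighbors = 3 * length                  # single-base substitutions
    p_pair = neighbors / smi_space(length)  # chance one other SMI is a neighbor
    return 1.0 - (1.0 - p_pair) ** (n_sampled - 1)


if __name__ == "__main__":
    print(smi_space(15))
    print(hamming_distance("ACGTACGTACGTACG", "ACGTACGTACGTACC"))
    print(p_random_neighbor(15, 10_000))
```

Under these assumptions the chance of a coincidental one-mismatch neighbor is small, which is why two SMIs differing at a single base are more plausibly explained by an amplification, sequencing, or synthesis error.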

Alternatively, in some embodiments, sites of sequence disagreement between the two strands can be recognized as potential sites of biologically derived mismatches in the original double-stranded target nucleic acid molecule. Alternatively, in some embodiments, sites of sequence disagreement between the two strands can be recognized as potential sites of DNA-synthesis-derived mismatches in the original double-stranded target nucleic acid molecule. Alternatively, in some embodiments, sites of sequence disagreement between the two strands can be recognized as potential sites where a damaged or modified nucleotide base was present on one or both strands and was converted to a mismatch by an enzymatic process (e.g., a DNA polymerase, a DNA glycosylase, or another nucleic acid modifying enzyme) or a chemical process. In some embodiments, this latter finding can be used to infer the presence of nucleic acid damage or nucleotide modification prior to the enzymatic process or chemical treatment.

Fig. 2 is a graph, in accordance with certain aspects of the present disclosure, plotting theoretical positive predictive value as a function of variant allele frequency in a population of molecules for next-generation sequencing (NGS), single-strand tag-based error correction, and Duplex Sequencing (DS) error correction. Referring to Fig. 2, positive predictive value (e.g., the expected number of correct positive calls divided by the total number of positive calls) is plotted as a function of variant allele frequency in a population of molecules for NGS, single-strand tag-based error correction, and DS error correction at the specified error rates. As can be seen from the overlapping curves, nearly all mutant calls are correct with any of the methods if the frequency of the variant being detected is greater than 1/10. However, the error rates of standard Illumina sequencing and single-strand tag-based error correction lead to critical losses in positive predictive value at variant frequencies of ~1/100 and ~1/1,000, respectively. The extremely low error rate conferred by DS allows confident identification of variants below 1/100,000 (dashed line).
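
The relationship plotted in Fig. 2 can be approximated with a simple model. The sketch below is illustrative only: it assumes a per-position false-call rate equal to each method's error rate, and the three error rates used (1e-2, 1e-3, 1e-7) are placeholder assumptions rather than values taken from the figure.

```python
# Minimal sketch. Assumption: at a given position, true variants occur at
# `variant_freq` and non-variant molecules are miscalled at `error_rate`.

def positive_predictive_value(variant_freq: float, error_rate: float) -> float:
    """Expected correct positive calls divided by total positive calls."""
    true_calls = variant_freq
    false_calls = (1.0 - variant_freq) * error_rate
    return true_calls / (true_calls + false_calls)


if __name__ == "__main__":
    # Placeholder error rates, assumed for illustration only.
    methods = {"standard NGS": 1e-2, "single-strand tags": 1e-3, "DS": 1e-7}
    for freq in (1e-1, 1e-2, 1e-3, 1e-5):
        row = {m: round(positive_predictive_value(freq, e), 4)
               for m, e in methods.items()}
        print(freq, row)
```

In this model the positive predictive value falls to roughly one half once the variant frequency drops to the method's error rate, which matches the qualitative behavior described for the curves.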

In some embodiments, and in accordance with aspects of the present technology, sequencing reads generated from the DS steps discussed herein can be further filtered to eliminate sequencing reads from DNA-damaged molecules (e.g., damaged during storage, shipping, during or after tissue or blood extraction, during or after library preparation, etc.). For example, DNA repair enzymes, such as uracil-DNA glycosylase (UDG), formamidopyrimidine DNA glycosylase (FPG), and 8-oxoguanine DNA glycosylase (OGG1), can be used to eliminate or correct DNA damage (e.g., in vitro DNA damage or in vivo damage). These DNA repair enzymes are, for example, glycosylases that remove damaged bases from DNA. For example, UDG removes uracil that results from cytosine deamination (caused by spontaneous hydrolysis of cytosine), and FPG removes 8-oxo-guanine (e.g., a common DNA lesion resulting from reactive oxygen species). FPG also has lyase activity that can generate a 1-base gap at abasic sites. Such abasic sites will generally then fail to amplify by PCR, for example, because the polymerase fails to copy the template. Accordingly, the use of such DNA damage repair/elimination enzymes can effectively remove damaged DNA that does not carry a true mutation but might otherwise go undetected as an error following sequencing and duplex sequence analysis. Although an error due to a damaged base can often be corrected by DS, in rare cases a complementary error could theoretically occur at the same position on both strands; thus, reducing error-increasing damage can reduce the probability of artifacts. Furthermore, during library preparation, certain fragments of the DNA to be sequenced may be single-stranded, either from their source or as a result of processing steps (e.g., mechanical DNA shearing). These regions are typically converted to double-stranded DNA during an "end repair" step known in the art, in which a DNA polymerase and nucleoside substrates are added to the DNA sample to extend 5' recessed ends. A mutagenic site of DNA damage in the single-stranded portion of the DNA being copied (i.e., a single-stranded 5' overhang at one or both ends of the DNA duplex, or an internal single-stranded nick or gap) can cause an error during the fill-in reaction that could render a single-stranded mutation, synthesis error, or site of nucleic acid damage into a double-stranded form, which could be misinterpreted in the final duplex consensus sequence as a true mutation present in the original double-stranded nucleic acid molecule when, in fact, it was not. This scenario, termed "pseudo-duplex", can be reduced or prevented by use of such damage-destroying/repair enzymes. In other embodiments, this occurrence can be reduced or eliminated through use of strategies that destroy or prevent the formation of single-stranded portions of the original duplex molecule (e.g., use of certain enzymes to fragment the original double-stranded nucleic acid material rather than mechanical shearing or certain other enzymes that may leave nicks or gaps). In other embodiments, use of processes that eliminate single-stranded portions of the original double-stranded nucleic acids (e.g., single-strand-specific nucleases such as S1 nuclease or mung bean nuclease) can be utilized for a similar purpose.

In a further embodiment, sequencing reads generated from the DS steps discussed herein can be further filtered to eliminate false mutations by trimming the ends of the reads, which are most prone to pseudo-duplex artifacts. For example, DNA fragmentation can generate single-stranded portions at the ends of double-stranded molecules. These single-stranded portions can be filled in (e.g., by Klenow or T4 polymerase) during end repair. In some cases, polymerases make copying mistakes in these end-repaired regions, leading to the generation of "pseudo-duplex molecules". These library preparation artifacts can incorrectly appear to be true mutations once sequenced. Errors arising from the end-repair mechanism can be eliminated or reduced in post-sequencing analysis by trimming the ends of the sequencing reads to exclude any mutations that may have occurred in these higher-risk regions, thereby reducing the number of false mutations. In one embodiment, such trimming of sequencing reads can be performed automatically (e.g., as a normal process step). In another embodiment, the mutation frequency in fragment end regions can be assessed and, if a threshold level of mutations is observed in the fragment end regions, sequencing read trimming can be performed before generating the double-stranded consensus sequence reads of the DNA fragments.
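
The end-trimming filter described above can be sketched as follows. This is a minimal illustration under stated assumptions: reads are plain strings, the trim length `n_trim` is a hypothetical parameter, and no particular trim length is specified by the text.

```python
# Minimal sketch. Assumptions: reads are plain strings of bases and the
# number of bases to trim from each end (`n_trim`) is chosen by the user.

def trim_read_ends(read: str, n_trim: int) -> str:
    """Drop `n_trim` bases from both ends of a read; the fragment ends are
    the regions most exposed to end-repair (pseudo-duplex) artifacts."""
    if n_trim <= 0:
        return read
    if len(read) <= 2 * n_trim:
        return ""  # nothing reliable remains after trimming
    return read[n_trim:len(read) - n_trim]


def trim_all(reads: list[str], n_trim: int) -> list[str]:
    """Trim every read and discard reads left empty by trimming."""
    trimmed = (trim_read_ends(r, n_trim) for r in reads)
    return [r for r in trimmed if r]


if __name__ == "__main__":
    print(trim_all(["ACGTACGTAC", "ACG"], n_trim=2))
```

Trimming would be applied before consensus building, so that mutations located in the end-repaired regions never contribute to the duplex consensus.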

The high degree of error correction provided by the strand-comparison technology of DS reduces sequencing errors of double-stranded nucleic acid molecules by multiple orders of magnitude compared with standard next-generation sequencing methods. This reduction in errors improves the accuracy of sequencing in nearly all types of sequences, but is particularly well suited to biochemically challenging sequences that are well known in the art to be particularly error-prone. One non-limiting example of such a type of sequence is homopolymers or other microsatellites/short tandem repeats. Another non-limiting example of error-prone sequences that benefit from DS error correction is molecules that have been damaged, for example, by heating, radiation, mechanical stress, or a variety of chemical exposures that create chemical adducts that are error-prone during copying by one or more nucleotide polymerases. In further embodiments, DS can also be used for the accurate detection of minority sequence variants among a population of double-stranded nucleic acid molecules. One non-limiting example of this application is the detection of a small number of cancer-derived DNA molecules among a larger number of unmutated molecules from non-cancerous tissues within a subject. Another non-limiting application of rare variant detection by DS is forensic detection of DNA from one individual intermixed at low abundance with the DNA of another individual of a different genotype.

DS has been shown to be extremely successful at removing both amplification-derived and sequencing/sequencer-derived artifacts in mitochondrial and nuclear DNA. However, certain previous research has focused on detecting somatic point mutations and small (e.g., <5 bp) insertions and deletions. DS holds great promise for the forensic community in addressing some of the challenges associated with forensic analysis (e.g., removing PCR stutter, low-level DNA, mixed samples, etc.). For example, and with reference to Figs. 3A and 3B, DS has demonstrated the ability to remove PCR stutter when compared with conventional MPS. In this example, three representative CODIS loci from 10 ng of Promega 2800M standard reference material DNA were sequenced using conventional MPS (Fig. 3A) and DS (Fig. 3B) on an Illumina MiSeq platform with 300 bp paired-end reads, and the data were processed with the STRait-Razor STR allele-calling tool. Fig. 3A shows three graphs displaying the CODIS genotype for each of the three CODIS loci versus the number of sequencing reads in the absence of error correction (e.g., conventional MPS), and shows several stutter events (black arrows). In contrast, and as shown in Fig. 3B, DS eliminated the stutter events at the same three CODIS loci. Similar results were seen at all of the original 13 CODIS loci. Accordingly, various aspects of DS technology can overcome some of the limitations experienced by conventional methods in forensic analysis. In addition to other applications of DS, other aspects of forensic analysis can also benefit from any improvement in aspects of conversion efficiency, that is, the percentage of input DNA that is converted to error-corrected sequence data. Forensic analysis can refer to applications relating to, among others, human crime, natural disasters, mass-casualty incidents, the poaching, trafficking, or abuse of animals or other living things, identification of human or animal remains, assault identification, missing-person identification, sexual assault identification, paleontological applications, and archaeological applications.

With regard to the efficiency of a DS process, two types of efficiency are further described herein: conversion efficiency and workflow efficiency. For the purposes of discussing DS efficiency, conversion efficiency can be defined as the fraction of unique nucleic acid molecules input into a sequencing library preparation reaction from which at least one duplex consensus sequence read is produced. Workflow efficiency may relate to relative inefficiencies in the amount of time, the relative number of steps, and/or the financial cost of reagents/materials needed to carry out these steps to produce a duplex sequencing library and/or to carry out targeted enrichment for sequences of interest.

In some cases, either or both of conversion efficiency and workflow efficiency limitations may limit the utility of high-accuracy DS for some applications to which it would otherwise be very well suited. For example, in situations where the number of copies of a target double-stranded nucleic acid is limited, a low conversion efficiency may result in less than the desired amount of sequence information being produced. Non-limiting examples of this concept include DNA from circulating tumor cells, or cell-free DNA derived from tumors or prenatal infants, that is shed into body fluids such as plasma and intermixed with an excess of DNA from other tissues. Although DS typically has the accuracy to resolve one mutant molecule among more than 100,000 unmutated molecules, if, for example, only 10,000 molecules are available in a sample, then even with an ideal efficiency of 100% in converting these molecules to duplex consensus sequence reads, the lowest mutation frequency that can be measured is 1/(10,000 × 100%) = 1/10,000. In clinical diagnostics, having maximum sensitivity to detect low-level signals of cancer or therapeutically relevant mutations can be important, and thus a relatively low conversion efficiency is undesirable in this context. Similarly, in forensic applications, very little DNA is often available for testing. When only nanogram or picogram quantities can be recovered from a crime scene or the site of a natural disaster, and when DNA from multiple individuals is mixed together, maximum conversion efficiency can be important for detecting the presence of the DNA of all individuals in the mixture.
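
The detection-limit arithmetic above can be written as a short calculation. This is a minimal sketch; the function name is hypothetical, and the 30% efficiency in the usage example is an arbitrary assumption chosen only to show the effect of a lower conversion efficiency.

```python
# Minimal sketch of the detection-limit arithmetic; names are hypothetical.

def min_detectable_frequency(n_molecules: int, conversion_efficiency: float) -> float:
    """Lowest measurable mutation frequency: one mutant among the molecules
    that actually yield a duplex consensus read."""
    if n_molecules <= 0 or not 0.0 < conversion_efficiency <= 1.0:
        raise ValueError("need n_molecules > 0 and 0 < efficiency <= 1")
    return 1.0 / (n_molecules * conversion_efficiency)


if __name__ == "__main__":
    # The case from the text: 10,000 molecules at ideal (100%) efficiency.
    print(min_detectable_frequency(10_000, 1.0))
    # An arbitrary lower efficiency (assumed for illustration only).
    print(min_detectable_frequency(10_000, 0.3))
```

Because the limit scales inversely with conversion efficiency, any loss of input molecules raises the lowest frequency that can be measured, regardless of the intrinsic accuracy of DS.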

In some cases, workflow inefficiency can be similarly challenging for certain nucleic acid interrogation applications. One non-limiting example of this is in clinical microbiology testing. It is sometimes desirable to rapidly detect the nature of one or more infectious organisms, for example, a microbial or polymicrobial bloodstream infection in which some organisms are resistant to particular antibiotics based on unique genetic variants they carry, but the time it takes to culture the infectious organisms and empirically determine their antibiotic sensitivity is much longer than the time within which a therapeutic decision about the antibiotics to be used for treatment must be made. DNA sequencing of DNA from blood (or other infected tissue or body fluid) has the potential to be more rapid, and DS, among other high-accuracy sequencing methods, can, for example, very accurately detect therapeutically important minority variants in the infecting population based on DNA markers. Because workflow turnaround time to data generation can be critical for determining treatment options (e.g., as in the example used herein), applications that increase the speed of arriving at data output are also desirable.

Further disclosed herein are methods and compositions for targeted nucleic acid sequence enrichment, and uses of such enrichment for error-corrected nucleic acid sequencing applications, that provide improvements in the cost, the conversion of molecules sequenced, and the time efficiency of generating labeled molecules for targeted ultra-high-accuracy sequencing.

SPLiT-DS

In some embodiments, provided methods offer a PCR-based targeted enrichment strategy that is compatible with the use of molecular barcodes for error correction. Fig. 4 is a conceptual illustration of a sequencing enrichment strategy utilizing Separated PCRs of Linked Templates for sequencing ("SPLiT-DS") method steps, in accordance with an embodiment of the present technology. Referring to Fig. 4, and in one embodiment, a SPLiT-DS method can begin with labeling (e.g., tagging) fragmented double-stranded nucleic acid material (e.g., from a DNA sample) with molecular barcodes, in a manner similar to that described above and with respect to a standard DS library construction protocol (e.g., as shown in Fig. 1B). In some embodiments, the double-stranded nucleic acid material may already be fragmented (e.g., such as cell-free DNA or damaged DNA); however, in other embodiments, various steps can include fragmentation of the nucleic acid material using mechanical shearing, such as sonication, or other DNA cutting methods, such as those further described herein. Aspects of labeling the fragmented double-stranded nucleic acid material can include end repair and 3'-dA-tailing, if required in a particular application, followed by ligation of the double-stranded nucleic acid fragments to DS adapters containing an SMI (Fig. 4, step 1). In other embodiments, the SMI can be an endogenous sequence or a combination of exogenous and endogenous sequences for uniquely relating information from both strands of the original nucleic acid molecule. After ligation of the adapter molecules to the double-stranded nucleic acid material, the method can continue with amplification (e.g., PCR amplification, rolling circle amplification, multiple displacement amplification, isothermal amplification, bridge amplification, surface-bound amplification, etc.) (Fig. 4, step 2).

In certain embodiments, primers specific to, for example, one or more adapter sequences can be used to amplify each strand of the nucleic acid material, resulting in multiple copies of nucleic acid amplicons derived from each strand of the original double-stranded nucleic acid molecule, with each amplicon retaining the originally associated SMI (Fig. 4, step 2). After amplification and associated steps to remove reaction byproducts, the sample can be split (preferably, but not necessarily, substantially evenly) into two or more separate samples (e.g., in tubes, in emulsion droplets, in microchambers, in isolated droplets on a surface, or in other known vessels, collectively referred to as "tubes") (Fig. 4, step 3). Alternatively, the amplified products can be split in a manner that does not require them to be in solution, for example, by binding to microbeads followed by dividing the population of microbeads into two chambers, or by affixing the separated amplified products to two or more distinct physical locations on a surface. Herein, any such separated populations are likewise considered functionally equivalent and are referred to as being in different "tubes". In the example shown in Fig. 4, this step results in, on average, half of the copies of any given strand/barcode amplicon being found in each tube. In other embodiments in which the original sample is split into more than two separate samples, such allocation of nucleic acid material will result in a comparably reduced number of amplicons in each. It should be noted that the random nature of how the amplicons are split results in variance about this mean. To account for this variance, the hypergeometric distribution (i.e., the probability of selecting k barcode copies without replacement) can be used as a model to determine the minimum number of amplicons (e.g., PCR copies) needed to maximize the chance that each tube contains at least one copy from both strands of each required SMI (e.g., barcode). Without wishing to be bound by a particular theory, it is contemplated that >=4 PCR cycles during step 2 (i.e., 2^4 = 16 copies/barcode) ensures a >99% probability that each barcode copy from each strand is represented at least once in each tube. In some embodiments, it may be preferable to split the amplified products unevenly. If the nucleic acid material is separated into more than two tubes, additional amplification cycles can be used to generate additional copies to accommodate the further splitting. After splitting the sample into two tubes, target nucleic acid regions (e.g., regions of interest, loci, etc.) can be enriched by multiplex PCR using primers specific to an adapter sequence and primers specific to the target nucleic acid region of interest (Fig. 4, step 3). In another embodiment, a linear amplification step can be added before the subsequent addition of the second primer, which allows exponential amplification of the target region of interest.
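
The hypergeometric reasoning above can be checked with a short calculation. The sketch below is illustrative only: it assumes an even two-way split of the whole amplicon pool, a hypothetical pool of 1,600,000 amplicons, and 16 copies per strand/barcode; the function names are not from the source.

```python
# Minimal sketch. Assumptions: the amplicon pool is split evenly into two
# tubes without replacement; pool size and function names are hypothetical.

def p_none_drawn(pool: int, copies: int, drawn: int) -> float:
    """Hypergeometric P(X = 0): none of `copies` amplicons of one
    strand/barcode end up among the `drawn` amplicons placed in a tube."""
    p = 1.0
    for i in range(copies):
        p *= max(pool - drawn - i, 0) / (pool - i)
    return p


def p_in_both_tubes(pool: int, copies: int) -> float:
    """Probability that a strand/barcode with `copies` amplicons is
    represented at least once in each of two evenly split tubes."""
    tube1 = pool // 2
    tube2 = pool - tube1
    missing_from_tube1 = p_none_drawn(pool, copies, tube1)
    missing_from_tube2 = p_none_drawn(pool, copies, tube2)
    return 1.0 - missing_from_tube1 - missing_from_tube2


if __name__ == "__main__":
    pool = 1_600_000  # hypothetical total number of amplicons in the sample
    for cycles in (2, 3, 4):
        one_strand = p_in_both_tubes(pool, 2 ** cycles)
        # Treat the two strands as approximately independent.
        print(cycles, one_strand, one_strand ** 2)
```

Under these assumptions, 16 copies per strand/barcode place both strands in both tubes with a probability well above 99%, while 4 copies do not, which is consistent with the >=4 cycle figure stated above.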

In certain embodiments, multiplex target-specific PCR is performed such that the resulting PCR products in each tube are derived from only one of the two strands (e.g., the "top strand" or the "bottom strand"). As shown in Fig. 4 (step 3), in some embodiments this is achieved as follows: in a first tube (shown on the left), a primer at least partially complementary to "read 1" of the adapter sequence (e.g., Illumina P5) (Fig. 4, step 3; grey arrow) and a primer at least partially complementary to the nucleic acid region of interest and containing the "read 2" adapter sequence (i.e., Illumina P7; black arrow with grey tail) are used to specifically amplify (e.g., enrich) the "top strand" of the original nucleic acid molecule (Fig. 4, steps 3 and 4). In this first sample, and because of the nature of the SDE (e.g., in this case, the unique orientation of the adapter sequences with respect to the target nucleic acid insert), the "bottom strand" fails to amplify properly. Likewise, in a second tube (shown on the right), a primer at least partially complementary to "read 2" of the adapter sequence (e.g., Illumina P5) (Fig. 4, step 3; grey arrow) and a primer at least partially complementary to the nucleic acid region of interest and containing the "read 1" adapter sequence (i.e., Illumina P7; black arrow with grey tail) are used to specifically amplify (e.g., enrich) the "bottom strand" of the original nucleic acid molecule (Fig. 4, steps 3 and 4). In this second sample, the "top strand" fails to amplify properly. Following PCR or another amplification method, multiple copies of the "top strand" are produced in the first tube, and multiple copies of the "bottom strand" are produced in the second tube. Because these resulting target-specific copies each have both adapter sequences (e.g., Illumina P5 and Illumina P7 adapter sequences) available on the respective ends of the nucleic acid amplicon, these target-enriched products can be sequenced using standard MPS methods.

Fig. 5 is a conceptual illustration of the SPLiT-DS method steps as shown and discussed with respect to Fig. 4, and further showing steps for sequencing the multiple copies of the target regions enriched in each PCR and generating a duplex consensus sequence, in accordance with an embodiment of the present technology. Following sequencing of the multiple copies of the "top strand" from the first tube and the multiple copies of the "bottom strand" from the second tube, the sequencing data can be analyzed in a manner similar to DS, whereby sequencing reads sharing the same molecular barcode are separately grouped according to whether the barcode derives from the 'top' or 'bottom' strand of the original double-stranded target nucleic acid molecule (found in the first tube and second tube, respectively). In some embodiments, the grouped sequencing reads from the "top strand" are used to form a top strand consensus sequence (e.g., a single-strand consensus sequence (SSCS)), and the grouped sequencing reads from the "bottom strand" are used to form a bottom strand consensus sequence (e.g., an SSCS). Referring to Fig. 5, the top and bottom SSCSs can then be compared to generate a duplex consensus sequence (DCS) having nucleotides that are in agreement between the two strands (e.g., a variant or mutation is considered true if it appears in sequencing reads derived from both strands; see, e.g., Fig. 1C).
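
The grouping and consensus steps described for Fig. 5 can be sketched as follows. This is a minimal illustration under stated assumptions: reads arrive as (barcode, strand, sequence) tuples that are already aligned and reported in the same orientation, the majority threshold `min_fraction` is a hypothetical parameter, and disagreeing positions are masked with "N".

```python
# Minimal sketch. Assumptions: reads are pre-aligned, equal-orientation
# strings tagged with their barcode (SMI) and strand of origin (tube).
from collections import Counter, defaultdict


def single_strand_consensus(reads: list[str], min_fraction: float = 0.6) -> str:
    """Per-position majority call across reads of one barcode and strand."""
    length = min(len(r) for r in reads)
    consensus = []
    for i in range(length):
        base, count = Counter(r[i] for r in reads).most_common(1)[0]
        consensus.append(base if count / len(reads) >= min_fraction else "N")
    return "".join(consensus)


def duplex_consensus(top_sscs: str, bottom_sscs: str) -> str:
    """Keep only positions where both strand consensuses agree."""
    return "".join(a if a == b and a != "N" else "N"
                   for a, b in zip(top_sscs, bottom_sscs))


def call_duplex(tagged_reads: list[tuple[str, str, str]]) -> dict[str, str]:
    """Group reads by barcode and strand, then build one DCS per barcode
    that has at least one read from each strand."""
    families = defaultdict(lambda: {"top": [], "bottom": []})
    for barcode, strand, seq in tagged_reads:
        families[barcode][strand].append(seq)
    return {
        barcode: duplex_consensus(single_strand_consensus(fam["top"]),
                                  single_strand_consensus(fam["bottom"]))
        for barcode, fam in families.items()
        if fam["top"] and fam["bottom"]
    }


if __name__ == "__main__":
    reads = [
        ("AAC", "top", "ACGT"), ("AAC", "top", "ACGT"), ("AAC", "top", "ACGA"),
        ("AAC", "bottom", "ACGT"), ("AAC", "bottom", "ACTT"), ("AAC", "bottom", "ACTT"),
        ("GGT", "top", "ACGT"),  # no bottom-strand partner, so no DCS
    ]
    print(call_duplex(reads))
```

Masking a disagreeing position corresponds to the option of discounting it; an alternative policy, also described in the text, would be to discard the whole read pair.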

As a specific example, in some embodiments, provided herein are methods of generating an error-corrected sequence read of a double-stranded target nucleic acid material, comprising the step of ligating the double-stranded target nucleic acid material to at least one adapter sequence to form an adapter-target nucleic acid material complex, wherein the at least one adapter sequence comprises (a) a degenerate or semi-degenerate single molecule identifier (SMI) sequence that uniquely labels each molecule of the double-stranded target nucleic acid material, and (b) a first nucleotide adapter sequence that tags a first strand of the adapter-target nucleic acid material complex, and a second nucleotide adapter sequence that is at least partially non-complementary to the first nucleotide sequence and that tags a second strand of the adapter-target nucleic acid material complex, such that each strand of the adapter-target nucleic acid material complex has a distinctly identifiable nucleotide sequence relative to its complementary strand. The method can next include the steps of amplifying each strand of the adapter-target nucleic acid material complex to produce a plurality of first strand adapter-target nucleic acid complex amplicons and a plurality of second strand adapter-target nucleic acid complex amplicons, and separating the adapter-target nucleic acid complex amplicons into a first sample and a second sample. The method can further include the steps of amplifying the first strand in the first sample, through use of a first primer at least partially complementary to the first nucleotide adapter sequence and a primer at least partially complementary to a target sequence of interest, to provide a first nucleic acid product, and amplifying the second strand in the second sample, through use of a second primer at least partially complementary to the second nucleotide adapter sequence and a primer at least partially complementary to the target sequence of interest, to provide a second nucleic acid product. The method can also include the steps of sequencing each of the first nucleic acid product and the second nucleic acid product to produce a plurality of first strand sequence reads and a plurality of second strand sequence reads, and confirming the presence of at least one first strand sequence read and at least one second strand sequence read. The method can further include comparing the at least one first strand sequence read with the at least one second strand sequence read, and generating an error-corrected sequence read of the double-stranded target nucleic acid material by discounting nucleotide positions that do not agree or, alternatively, by removing compared first strand and second strand sequence reads having one or more nucleotide positions at which the compared first strand sequence read and second strand sequence read are non-complementary.

As a further specific example, in some embodiments, provided herein are methods of identifying a DNA variant from a sample, comprising the steps of: ligating both strands of a nucleic acid material (e.g., a double-stranded target DNA molecule) to at least one asymmetric adapter molecule to form an adapter-target nucleic acid material complex having a first nucleotide sequence associated with the top strand of the double-stranded target DNA molecule and a second nucleotide sequence that is at least partially non-complementary to the first nucleotide sequence and is associated with the bottom strand of the double-stranded target DNA molecule; and amplifying each strand of the adapter-target nucleic acid material, such that each strand generates a distinct yet related set of amplified adapter-target DNA products. The method may further comprise the steps of: separating the adapter-target DNA products into a first sample and a second sample; amplifying the top strand of the adapter-target DNA products in the first sample using a first primer specific to (e.g., at least partially complementary to) the first nucleotide sequence and a primer at least partially complementary to a target sequence of interest, to provide top-strand adapter-target nucleic acid complex amplicons; and amplifying the bottom strand in the second sample using a second primer specific to (e.g., at least partially complementary to) the second nucleotide sequence and a primer at least partially complementary to the target sequence of interest, to provide bottom-strand adapter-target nucleic acid complex amplicons. The method may further comprise the steps of: sequencing each of the top-strand adapter-target nucleic acid complex amplicons and the bottom-strand adapter-target nucleic acid complex amplicons; confirming the presence of at least one amplified sequence read from each strand of the adapter-target DNA complex; and comparing the at least one amplified sequence read obtained from the top strand with the at least one amplified sequence read obtained from the bottom strand to form a consensus sequence read of the nucleic acid material (e.g., the double-stranded target DNA molecule) having only nucleotide bases at which the sequences of both strands of the nucleic acid material are in agreement, such that a variant occurring at a particular position in the consensus sequence read is identified as a true DNA variant.
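The final variant-identification step can be illustrated with a minimal sketch. It assumes that a consensus read has already been formed and aligned base-for-base to a reference of the same length, and that positions at which the two strands disagreed are masked as "N". The function name `call_variants` and the masking convention are assumptions made for illustration and are not taken from the present disclosure.

```python
def call_variants(reference, consensus):
    """Return (position, reference_base, variant_base) for each supported difference.

    Positions are 0-based. Masked positions ("N") are skipped because the two
    strands did not agree there, so no variant is supported at that position.
    """
    if len(reference) != len(consensus):
        raise ValueError("reference and consensus must be the same length")
    variants = []
    for position, (ref_base, cons_base) in enumerate(zip(reference, consensus)):
        if cons_base != "N" and cons_base != ref_base:
            variants.append((position, ref_base, cons_base))
    return variants
```

In this sketch a difference is reported only where the consensus carries an unmasked base, which mirrors the requirement that a true variant be supported by both strands.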

In some embodiments, provided herein are methods of generating an error-corrected double-stranded consensus sequence from double-stranded nucleic acid material, comprising the steps of: tagging individual duplex DNA molecules with an adapter molecule to form tagged DNA material, wherein each adapter molecule comprises (a) a degenerate or semi-degenerate single molecule identifier (SMI) that uniquely labels the duplex DNA molecule, and (b) first and second non-complementary nucleotide adapter sequences that, for each tagged DNA molecule, distinguish the original top strand from the original bottom strand of each individual DNA molecule within the tagged DNA material; and generating a set of duplicates of the original top strand of the tagged DNA molecules and a set of duplicates of the original bottom strand of the tagged DNA molecules to form amplified DNA material. The method may further comprise the steps of: separating the amplified DNA material into a first sample and a second sample; generating additional duplicates of the original top strand in the first sample using a primer specific to the first nucleotide adapter sequence and a primer at least partially complementary to a target sequence of interest, to provide a first nucleic acid product; and generating additional duplicates of the original bottom strand in the second sample using a primer specific to the second nucleotide adapter sequence and a (same or different) primer at least partially complementary to the target sequence of interest, to provide a second nucleic acid product. The method may further comprise the steps of: creating a first single-strand consensus sequence (SSCS) from the additional duplicates of the original top strand and a second single-strand consensus sequence (SSCS) from the additional duplicates of the original bottom strand; comparing the first SSCS of the original top strand with the second SSCS of the original bottom strand; and generating an error-corrected double-stranded consensus sequence having only nucleotide bases at which the sequences of both the first SSCS of the original top strand and the second SSCS of the original bottom strand are complementary.
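The two consensus steps described above can be sketched as follows. This is a minimal illustration only: it assumes equal-length, pre-aligned reads within each strand family, a simple majority vote with a hypothetical `min_fraction` threshold, and masking of disagreeing positions with "N". None of these choices is specified by the present disclosure.

```python
from collections import Counter

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def single_strand_consensus(reads, min_fraction=0.7):
    """Majority-vote SSCS over equal-length reads from one strand family."""
    consensus = []
    for column in zip(*reads):
        base, count = Counter(column).most_common(1)[0]
        # Keep the majority base only if it is sufficiently well supported.
        consensus.append(base if count / len(column) >= min_fraction else "N")
    return "".join(consensus)


def duplex_consensus(top_sscs, bottom_sscs):
    """Keep a base only where the top SSCS and bottom SSCS are complementary."""
    bottom_as_top = reverse_complement(bottom_sscs)
    return "".join(
        top if top == bottom else "N"
        for top, bottom in zip(top_sscs, bottom_as_top)
    )


# An artifact present in every top-strand copy (position 4, 0-based) survives
# the SSCS step but is masked when compared against the bottom strand.
top = single_strand_consensus(["ACGTTC", "ACGTTC", "ACGTTC"])
bottom = single_strand_consensus(["GTACGT", "GTACGT", "GTACGT"])
print(duplex_consensus(top, bottom))  # ACGTNC
```

The example shows why the strand comparison matters: an error copied into every duplicate of one strand cannot be removed by single-strand consensus alone.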

Single molecule identifier sequences (SMIs)

According to various embodiments, provided methods and compositions include one or more SMI sequences on each strand of a nucleic acid material. An SMI can be carried independently by each of the single strands that result from a double-stranded nucleic acid molecule, such that the amplification products derived from each strand can be recognized after sequencing as having come from the same original, substantially unique double-stranded nucleic acid molecule. In some embodiments, an SMI may include additional information and/or may be used in other methods for which such molecule-distinguishing functionality is useful, as will be recognized by one of skill in the art. In some embodiments, an SMI element may be incorporated before, substantially simultaneously with, or after ligation of an adapter sequence to the nucleic acid material.

In some embodiments, an SMI sequence may include at least one degenerate or semi-degenerate nucleic acid. In other embodiments, an SMI sequence may be non-degenerate. In some embodiments, the SMI can be the sequence associated with or near a fragment end of the nucleic acid molecule (e.g., a randomly or semi-randomly sheared end of the ligated nucleic acid material). In some embodiments, an exogenous sequence may be considered in conjunction with the sequence corresponding to randomly or semi-randomly sheared ends of the ligated nucleic acid material (e.g., DNA) to obtain an SMI sequence capable of distinguishing, for example, single DNA molecules from one another. In some embodiments, an SMI sequence is a portion of an adapter sequence that is ligated to a double-stranded nucleic acid molecule. In certain embodiments, the adapter sequence comprising an SMI sequence is double-stranded, such that each strand of the double-stranded nucleic acid molecule includes an SMI following ligation to the adapter sequence. In another embodiment, the SMI sequence is single-stranded before or after ligation to a double-stranded nucleic acid molecule, and a complementary SMI sequence can be generated by extending the opposite strand with a DNA polymerase to yield a complementary double-stranded SMI sequence. In some embodiments, each SMI sequence may include between about 1 and about 30 nucleic acids (e.g., 1, 2, 3, 4, 5, 8, 10, 12, 14, 16, 18, 20, or more degenerate or semi-degenerate nucleic acids).
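To give a sense of how SMI length and degeneracy relate to the number of distinguishable tags, the following sketch counts the sequences that a degenerate or semi-degenerate pattern can encode. It assumes, purely for illustration, that the SMI design is written with standard IUPAC nucleotide codes; the function name `smi_space` is hypothetical.

```python
from math import prod

# Number of bases each IUPAC nucleotide code can stand for.
IUPAC_OPTIONS = {
    "A": 1, "C": 1, "G": 1, "T": 1,
    "R": 2, "Y": 2, "S": 2, "W": 2, "K": 2, "M": 2,
    "B": 3, "D": 3, "H": 3, "V": 3,
    "N": 4,
}


def smi_space(pattern):
    """Count the distinct sequences encoded by an IUPAC-coded SMI pattern."""
    return prod(IUPAC_OPTIONS[code] for code in pattern.upper())


print(smi_space("NNNNNNNN"))  # fully degenerate 8-mer: 4**8 = 65536
print(smi_space("NNRY"))      # semi-degenerate 4-mer: 4 * 4 * 2 * 2 = 64
```

A fully degenerate position contributes a factor of four, so each added position multiplies the available tag space accordingly.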

In some embodiments, an SMI is capable of being ligated to one or both of the nucleic acid material and an adapter sequence. In some embodiments, an SMI may be ligated to at least one of a T-overhang, an A-overhang, a CG-overhang, a dehydroxylated base, and a blunt end of the nucleic acid material.

In some embodiments, the sequence of an SMI may be considered in conjunction with (or designed in accordance with) the sequence corresponding to, for example, randomly or semi-randomly sheared ends of a nucleic acid material (e.g., a ligated nucleic acid material), to obtain an SMI sequence capable of distinguishing single nucleic acid molecules from one another.

In some embodiments, at least one SMI may be an endogenous SMI (e.g., an SMI related to a shear point, for example, using the shear point itself or using a defined number of nucleotides in the nucleic acid material immediately adjacent to the shear point [e.g., 2, 3, 4, 5, 6, 7, 8, 9, or 10 nucleotides from the shear point]). In some embodiments, at least one SMI may be an exogenous SMI (e.g., an SMI comprising a sequence not found on the target nucleic acid material).
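One way to picture the combination of an exogenous SMI with an endogenous, shear-point-derived SMI is the sketch below. It assumes a hypothetical read layout in which the exogenous SMI occupies the first `smi_len` bases and the insert begins immediately afterwards, so that the next `endogenous_len` bases lie adjacent to the shear point. The layout, the parameter values, and the function names are illustrative assumptions.

```python
from collections import defaultdict


def composite_smi(read, smi_len=8, endogenous_len=5):
    """Build an identifier from the exogenous tag plus bases next to the shear point."""
    exogenous = read[:smi_len]
    endogenous = read[smi_len:smi_len + endogenous_len]
    return exogenous + "|" + endogenous


def group_by_smi(reads, smi_len=8, endogenous_len=5):
    """Group reads into families that share the same composite identifier."""
    families = defaultdict(list)
    for read in reads:
        families[composite_smi(read, smi_len, endogenous_len)].append(read)
    return dict(families)


reads = [
    "AAAACCCC" + "GATTACAGG",
    "AAAACCCC" + "GATTACAGG",
    "AAAACCCC" + "TTTTTCAGG",  # same exogenous tag, different shear point
]
for identifier, members in group_by_smi(reads).items():
    print(identifier, len(members))
```

Here two molecules that happen to carry the same exogenous tag are still separated because their shear-point sequences differ.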

In some embodiments, an SMI may be or comprise an imaging moiety (e.g., a fluorescent or otherwise optically detectable moiety). In some embodiments, such SMIs allow for detection and/or quantitation without the need for an amplification step.

In some embodiments, an SMI element may comprise two or more distinct SMI elements located at different positions on the adapter-target nucleic acid complex.

Various embodiments of SMIs are further disclosed in International Patent Publication No. WO2017/100441, which is incorporated by reference herein in its entirety.

Strand-defining element (SDE)

In some embodiments, each strand of a double-stranded nucleic acid material may further include an element that renders the amplification products of the two single-stranded nucleic acids forming the target double-stranded nucleic acid material substantially distinguishable from one another after sequencing. In some embodiments, an SDE may be or comprise asymmetric primer sites included in a sequencing adapter; or, in other arrangements, sequence asymmetries may be introduced into the adapter sequences rather than into the primer sequences, such that at least one position in the nucleotide sequences of the first strand of the target nucleic acid sequence complex and the second strand of the target nucleic acid sequence complex differs following amplification and sequencing. In other embodiments, the SDE may comprise another biochemical asymmetry between the two strands that differs from the canonical nucleotide sequences A, T, C, G, or U but is converted into at least one canonical nucleotide sequence difference in the two amplified and sequenced molecules. In yet another embodiment, the SDE may be or comprise a means of physically separating the two strands before amplification, such that derivative amplification products from the first-strand target nucleic acid sequence and the second-strand target nucleic acid sequence are maintained in substantial physical isolation from one another for the purpose of maintaining a distinction between the two derivative amplification products. Other such arrangements or methods for providing an SDE function that allows the first and second strands to be distinguished may be utilized.
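As a hedged illustration of how an SDE based on asymmetric primer sites might be used computationally, the sketch below assumes paired-end reads in which each read begins with one of the two SMI tags of a duplex, so that amplicons from one strand present the tags in one order and amplicons from the complementary strand present them in the opposite order. The tag-ordering convention, the "ab"/"ba" labels, and the function name are assumptions made for illustration.

```python
def strand_family(read1_tag, read2_tag):
    """Return (duplex_key, strand_label) for one read pair.

    The duplex key is the same for both strands of a molecule; the label
    records the order in which the two tags were observed.
    """
    if read1_tag <= read2_tag:
        return read1_tag + read2_tag, "ab"
    return read2_tag + read1_tag, "ba"


# Two read pairs derived from opposite strands of the same original molecule.
print(strand_family("AACC", "GGTT"))  # ('AACCGGTT', 'ab')
print(strand_family("GGTT", "AACC"))  # ('AACCGGTT', 'ba')
```

Under this convention, reads from the two strands share a duplex key yet remain distinguishable by label. If both tags are identical, the order carries no information and another source of asymmetry would be needed.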

In some embodiments, an SDE may be capable of forming a loop (e.g., a hairpin loop). In some embodiments, a loop may comprise at least one endonuclease recognition site. In some embodiments, the target nucleic acid complex may contain an endonuclease recognition site that facilitates a cleavage event within the loop. In some embodiments, a loop may comprise a non-standard nucleotide sequence. In some embodiments, the contained non-standard nucleotide may be recognizable by one or more enzymes that facilitate strand cleavage. In some embodiments, the contained non-standard nucleotide may be targeted by one or more chemical processes that facilitate strand cleavage in the loop. In some embodiments, a loop may contain a modified nucleic acid linker that may be targeted by one or more enzymatic, chemical, or physical processes that facilitate strand cleavage in the loop. In some embodiments, this modified linker is a photocleavable linker.

A variety of other molecular tools could serve as SMIs and SDEs. Other than shear points and DNA-based tags, single-molecule compartmentalization methods that keep paired strands in physical proximity, or other non-nucleic acid tagging methods, could serve the strand-relating function. Similarly, asymmetric chemical labeling of the adapter strands in a way that allows them to be physically separated can serve an SDE role. A recently described variation of DS uses bisulfite conversion to transform naturally occurring strand asymmetries, in the form of cytosine methylation, into sequence differences that distinguish the two strands. Although this implementation limits the types of mutations that can be detected, the concept of capitalizing on native asymmetry is noteworthy in the context of emerging sequencing technologies that can directly detect modified nucleotides. Various embodiments of SDEs are further disclosed in International Patent Publication No. WO2017/100441, which is incorporated by reference in its entirety.

Adapters and adapter sequences

In various arrangements, adapter molecules that comprise SMIs (e.g., molecular barcodes), SDEs, primer sites, flow cell sequences, and/or other features are contemplated for use with many of the embodiments disclosed herein. In some embodiments, provided adapters may be or comprise one or more sequences complementary or at least partially complementary to PCR primers (e.g., primer sites) that have at least one of the following properties: 1) high target specificity; 2) capable of being multiplexed; and 3) exhibit robust and minimally biased amplification.

In some embodiments, adapter molecules can be "Y"-shaped, "U"-shaped, or "hairpin"-shaped, or can have a bubble (e.g., a portion of sequence that is non-complementary) or other features. In other embodiments, adapter molecules can comprise a "Y" shape, a "U" shape, a "hairpin" shape, or a bubble. Certain adapters may comprise modified or non-standard nucleotides, restriction sites, or other features for manipulation of structure or function in vitro. Adapter molecules may ligate to a variety of nucleic acid materials having a terminal end. For example, adapter molecules can be suited to ligate to a T-overhang, an A-overhang, a CG-overhang, a multiple-nucleotide overhang, a dehydroxylated base, a blunt end of a nucleic acid material, and the end of a molecule in which the 5' end of the target is dephosphorylated or otherwise blocked from conventional ligation. In other embodiments, the adapter molecule can contain a dephosphorylated or otherwise ligation-preventing modification on the 5' strand at the ligation site. In the latter two embodiments, such strategies may be useful for preventing dimerization of library fragments or adapter molecules.

An adapter sequence can mean a single-stranded sequence, a double-stranded sequence, a complementary sequence, a non-complementary sequence, a partially complementary sequence, an asymmetric sequence, a primer binding sequence, a flow cell sequence, a ligation sequence, or another sequence provided by an adapter molecule. In particular embodiments, an adapter sequence can mean a sequence used for amplification by way of complementarity to an oligonucleotide.

In some embodiments, provided methods and compositions include at least one adapter sequence (e.g., two adapter sequences, one on each of the 5' and 3' ends of a nucleic acid material). In some embodiments, provided methods and compositions may comprise two or more adapter sequences (e.g., 3, 4, 5, 6, 7, 8, 9, 10, or more). In some embodiments, at least two of the adapter sequences differ from one another (e.g., by sequence). In some embodiments, each adapter sequence differs from every other adapter sequence (e.g., by sequence). In some embodiments, at least one adapter sequence is at least partially non-complementary to at least a portion of at least one other adapter sequence (e.g., is non-complementary by at least one nucleotide).

In some embodiments, an adapter sequence comprises at least one non-standard nucleotide. In some embodiments, a non-standard nucleotide is selected from an abasic site, a uracil, tetrahydrofuran, 8-oxo-7,8-dihydro-2'-deoxyadenosine (8-oxo-A), 8-oxo-7,8-dihydro-2'-deoxyguanosine (8-oxo-G), deoxyinosine, 5'-nitroindole, 5-hydroxymethyl-2'-deoxycytidine, iso-cytosine, 5'-methyl-isocytosine, or isoguanosine, a methylated nucleotide, an RNA nucleotide, a ribose nucleotide, an 8-oxo-guanine, a photocleavable linker, a biotinylated nucleotide, a desthiobiotin nucleotide, a thiol-modified nucleotide, an acrydite-modified nucleotide, an iso-dC, an iso-dG, a 2'-O-methyl nucleotide, an inosine nucleotide, a locked nucleic acid, a peptide nucleic acid, a 5-methyl dC, a 5-bromo deoxyuridine, a 2,6-diaminopurine, a 2-aminopurine nucleotide, an abasic nucleotide, a 5-nitroindole nucleotide, an adenylated nucleotide, an azide nucleotide, a digoxigenin nucleotide, an I-linker, a 5' hexynyl-modified nucleotide, a 5-octadiynyl dU, a photocleavable spacer, a non-photocleavable spacer, a click-chemistry-compatible modified nucleotide, and any combination thereof.

In some embodiments, an adapter sequence comprises a moiety having a magnetic property (i.e., a magnetic moiety). In some embodiments, this magnetic property is paramagnetic. In some embodiments in which an adapter sequence comprises a magnetic moiety (e.g., a nucleic acid material ligated to an adapter sequence comprising a magnetic moiety), when a magnetic field is applied, an adapter sequence comprising a magnetic moiety is substantially separated from adapter sequences that do not comprise a magnetic moiety (e.g., a nucleic acid material ligated to an adapter sequence that does not comprise a magnetic moiety).

In some embodiments, at least one adapter sequence is located 5' to an SMI. In some embodiments, at least one adapter sequence is located 3' to an SMI.

In some embodiments, an adapter sequence may be linked to at least one of an SMI and a nucleic acid material via one or more linker domains. In some embodiments, a linker domain may be composed of nucleotides. In some embodiments, a linker domain may include at least one modified nucleotide or non-nucleotide molecule (for example, as described elsewhere in this disclosure). In some embodiments, a linker domain may be or comprise a loop.

In some embodiments, an adapter sequence on either or both ends of each strand of a double-stranded nucleic acid material may further include one or more elements that provide an SDE. In some embodiments, an SDE may be or comprise asymmetric primer sites comprised within the adapter sequences.

In some embodiments, an adapter sequence may be or comprise at least one SDE and at least one ligation domain (i.e., a domain amenable to the activity of at least one ligase, for example, a domain suitable for ligating to a nucleic acid material through the activity of a ligase). In some embodiments, from 5' to 3', an adapter sequence may be or comprise a primer binding site, an SDE, and a ligation domain.

Various methods for synthesizing DS adapters have been previously described in, e.g., U.S. Patent No. 9,752,188 and International Patent Publication No. WO2017/100441, both of which are incorporated by reference herein in their entireties.

Primers

In some embodiments, one or more PCR primers that have at least one of the following properties: 1) high target specificity; 2) capable of being multiplexed; and 3) exhibit robust and minimally biased amplification, are contemplated for use in various embodiments in accordance with aspects of the present technology. A number of prior studies and commercial products have designed primer mixtures satisfying certain of these criteria for conventional PCR-CE. However, it has been noted that these primer mixtures are not always optimal for use with MPS. Indeed, developing highly multiplexed primer mixtures can be a challenging and time-consuming process. Conveniently, both Illumina and Promega have recently developed multiplex-compatible primer mixtures for the Illumina platform that show robust and efficient amplification of a variety of standard and non-standard STR and SNP loci. Because these kits use PCR to amplify their target regions prior to sequencing, the 5' end of each read in paired-end sequencing data corresponds to the 5' end of the PCR primers used to amplify the DNA. In some embodiments, provided methods and compositions include primers designed to ensure uniform amplification, which may entail varying reaction concentrations and melting temperatures and minimizing secondary structure and intra-/inter-primer interactions. Many techniques have been described for highly multiplexed primer optimization for MPS applications. In particular, these techniques are often referred to as amplification methods, as is well described in the art.
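Because the 5' end of each read corresponds to the 5' end of an amplification primer, a read can be assigned to its primer by inspecting its first bases. The following is a minimal sketch of that idea; it assumes exact prefix matching with no tolerance for sequencing errors, and the primer names and sequences shown are hypothetical placeholders rather than sequences from any actual kit.

```python
def strip_primer(read, primers):
    """Return (primer_name, insert_sequence), or (None, read) if no primer matches.

    `primers` maps a primer name to its sequence, written 5' to 3'.
    """
    for name, primer in primers.items():
        if read.startswith(primer):
            return name, read[len(primer):]
    return None, read


# Hypothetical primer panel for illustration only.
panel = {"locus_1_fwd": "ACGTACGT", "locus_2_fwd": "TTGGCCAA"}
print(strip_primer("TTGGCCAAGATTACA", panel))  # ('locus_2_fwd', 'GATTACA')
print(strip_primer("CCCCCCCCGATTACA", panel))  # (None, 'CCCCCCCCGATTACA')
```

A practical implementation would likely allow mismatches in the primer region; exact matching is used here only to keep the sketch short.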

Amplification

In various embodiments, provided methods and compositions make use of, or are of use in, at least one amplification step in which a nucleic acid material (or a portion thereof, for example, a specific target region or locus) is amplified to form an amplified nucleic acid material (e.g., some number of amplicon products). In some embodiments, provided methods include a step of separating the amplified nucleic acid material into, for example, a first sample and a second sample.

In some embodiments, amplifying the nucleic acid material in the first sample includes the step of amplifying nucleic acid material derived from a single nucleic acid strand of the original double-stranded nucleic acid material, using at least one single-stranded oligonucleotide at least partially complementary to a sequence present in the first adapter sequence and at least one single-stranded oligonucleotide at least partially complementary to a target sequence of interest, such that an SMI sequence is at least partially maintained.

In some embodiments, amplifying the nucleic acid material in the second sample includes the step of amplifying nucleic acid material derived from a single nucleic acid strand of the original double-stranded nucleic acid material, using at least one single-stranded oligonucleotide at least partially complementary to a sequence present in the second adapter sequence and at least one single-stranded oligonucleotide at least partially complementary to a target sequence of interest, such that an SMI sequence is at least partially maintained.

In some embodiments, prior to a second amplification step, an amplified nucleic acid material may be separated into 3 or more samples (e.g., 4, 5, 6, 7, 8, 9, 10, 20, 30, 40, 50, or more samples). In some embodiments, each sample includes substantially the same amount of amplified nucleic acid material as each other sample. In some embodiments, at least two samples include substantially different amounts of amplified nucleic acid material.

In some embodiments, amplifying the nucleic acid material in the first sample or the second sample may comprise amplifying the samples in "tubes" (e.g., PCR tubes), in emulsion droplets, in microchambers, in other examples described above, or in other known vessels.

In some embodiments, at least one amplification step includes at least one primer that is or comprises at least one non-standard nucleotide. In some embodiments, a non-standard nucleotide is selected from a uracil, a methylated nucleotide, an RNA nucleotide, a ribose nucleotide, an 8-oxo-guanine, a biotinylated nucleotide, a locked nucleic acid, a peptide nucleic acid, a high-Tm nucleic acid variant, an allele-discriminating nucleic acid variant, any other nucleotide or linker variant described elsewhere herein, and any combination thereof.

While any application-appropriate amplification reaction is contemplated as compatible with some embodiments, by way of specific example, in some embodiments, an amplification step may be or comprise a polymerase chain reaction (PCR), rolling circle amplification (RCA), multiple displacement amplification (MDA), isothermal amplification, polony amplification within an emulsion, bridge amplification on a surface, on the surface of a bead, or within a hydrogel, and any combination thereof.

In some embodiments, certain modifications may be made to a portion of a sample of nucleic acid material (e.g., an adapter sequence). By way of specific example, in some embodiments, amplifying the nucleic acid material in the first sample may further comprise, after the separating step and before amplification of the first sample, destroying or disrupting some or all of the second adapter sequence found on the nucleic acid material. By way of further specific example, in some embodiments, amplifying the nucleic acid material in the second sample may further comprise, after the separating step and before amplification of the second sample, destroying or disrupting at least a portion of the first adapter sequence found on the nucleic acid material. In some embodiments, destroying or disrupting may be or comprise at least one of: enzymatic digestion (e.g., via an endonuclease and/or an exonuclease); inclusion of at least one replication-inhibiting molecule; enzymatic cleavage; enzymatic cleavage of one strand; enzymatic cleavage of both strands; incorporation of a modified nucleic acid followed by enzymatic treatment that leads to cleavage of one or both strands; incorporation of a replication-blocking nucleotide; incorporation of a chain terminator; incorporation of a photocleavable linker; incorporation of a uracil; incorporation of a ribose base; incorporation of an 8-oxo-guanine adduct; use of a sequence-specific restriction endonuclease; use of a targeted endonuclease (e.g., a Cas enzyme such as Cas9 or CPF1); and any combination thereof. In some embodiments, in addition to or as an alternative to primer site destruction or disruption, methods such as affinity pulldown, size selection, or any other known technique for removing and/or not amplifying undesired nucleic acid material from a sample are contemplated.

In some embodiments, the at least partial destruction is targeted to unintended first amplification products which, after a second amplification with a targeting primer, would give rise to second amplification products that ultimately contain two similar primer binding sites at the ends of the molecule rather than two different primer binding sites. In some embodiments, such structures may be problematic for the performance or efficiency of MPS DNA sequencers.

In some embodiments, amplifying a nucleic acid material includes use of at least one single-stranded oligonucleotide at least partially complementary to a target region or target sequence of interest (e.g., a genomic sequence, a mitochondrial sequence, a plasmid sequence, a synthetically produced target nucleic acid, etc.) and a single-stranded oligonucleotide at least partially complementary to a region of the adapter sequence (e.g., a primer site). In some embodiments, amplifying a nucleic acid material includes use of single-stranded oligonucleotides at least partially complementary to regions of the adapter sequences on the 5' and 3' ends of each strand of the nucleic acid material.

In general, robust amplification, for example PCR amplification, can be highly dependent on reaction conditions. Multiplex PCR, for example, can be sensitive to buffer composition, monovalent or divalent cation concentration, detergent concentration, crowding agent (e.g., PEG, glycerol, etc.) concentration, primer concentrations, primer Tms, primer designs, primer GC content, primer modified nucleotide properties, and cycling conditions (i.e., temperature and extension times and rates of temperature change). Optimization of buffer conditions can be a difficult and time-consuming process. In some embodiments, an amplification reaction may use at least one of a buffer, primer pool concentration, and PCR conditions in accordance with a previously known amplification protocol. In some embodiments, a new amplification protocol may be created, and/or an amplification reaction optimization may be used. By way of specific example, in some embodiments, a PCR optimization kit may be used, such as a PCR Optimization Kit containing a number of pre-formulated buffers that are partially optimized for a variety of PCR applications, such as multiplex, real-time, GC-rich, and inhibitor-resistant amplifications. These pre-formulated buffers can be rapidly supplemented with different Mg2+ and primer concentrations, as well as primer pool ratios. In addition, in some embodiments, a variety of cycling conditions (e.g., thermal cycling) may be assessed and/or used. In assessing whether a particular embodiment is appropriate for a particular desired application, one or more of specificity, allele coverage ratio for heterozygous loci, interlocus balance, and depth, among other aspects, may be assessed. Measurements of amplification success may include DNA sequencing of the products; evaluation of products by gel electrophoresis, capillary electrophoresis, HPLC, or other size separation methods followed by fragment visualization; melt curve analysis using double-stranded nucleic acid binding dyes or fluorescent probes; mass spectrometry; or other methods known in the art.
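Two of the primer properties named above, GC content and Tm, can be estimated with a few lines of code when screening a candidate primer pool. The sketch below uses the Wallace rule (2 °C per A or T, 4 °C per G or C), a rough approximation generally applied only to short oligonucleotides, and is not a method prescribed by the present disclosure. The function names and example sequences are hypothetical.

```python
def gc_content(primer):
    """Fraction of bases in the primer that are G or C."""
    primer = primer.upper()
    return (primer.count("G") + primer.count("C")) / len(primer)


def wallace_tm(primer):
    """Approximate melting temperature in degrees Celsius (Wallace rule)."""
    primer = primer.upper()
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    return 2 * at + 4 * gc


def tm_spread(primers):
    """Difference between the highest and lowest estimated Tm in a pool."""
    temps = [wallace_tm(primer) for primer in primers]
    return max(temps) - min(temps)


pool = ["ACGTACGTACGTACGT", "AAAATTTTGGGGCCCC"]
print([wallace_tm(primer) for primer in pool])  # [48, 48]
print(tm_spread(pool))                          # 0
```

A small Tm spread across a pool is one simple indicator that the primers may anneal under a common set of cycling conditions. More accurate nearest-neighbor models would normally be preferred for actual design work.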

According to various embodiments, any of a variety of factors may influence the length of a particular amplification step (e.g., the number of cycles in a PCR reaction, etc.). For example, in some embodiments, a provided nucleic acid material may be compromised or otherwise suboptimal (e.g., degraded and/or contaminated). In such a case, a longer amplification step may help ensure that a desired product is amplified to an acceptable degree. In some embodiments, an amplification step may provide an average of 3 to 10 sequenced PCR copies from each starting DNA molecule, though in other embodiments only a single copy of each of the top and bottom strands is required. Without wishing to be bound by a particular theory, it is possible that too many or too few PCR copies could result in reduced assay efficiency and, ultimately, reduced depth. Generally, the number of nucleic acid (e.g., DNA) fragments used in an amplification (e.g., PCR) reaction is a primary adjustable variable that can dictate the number of reads that share the same SMI/barcode sequence. Because SPLiT-DS uses an additional PCR step and does not require hybridization-based targeted capture, as do some previously described methods, any double-stranded nucleic acid input requirements reported for prior methods are unlikely to translate directly to the presently provided methods, which may be more efficient.
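The relationship between input amount and the number of reads sharing an SMI can be checked on sequencing output with simple arithmetic. The sketch below assumes a mapping from each SMI to the number of reads observed for it and tests whether the mean falls in the 3 to 10 range mentioned above. The function name, the data layout, and the example counts are illustrative assumptions.

```python
from statistics import mean


def family_size_summary(family_sizes, low=3, high=10):
    """Return (mean reads per SMI family, whether the mean lies in [low, high])."""
    average = mean(family_sizes.values())
    return average, low <= average <= high


# Hypothetical read counts per SMI family.
observed = {"AACCGGTT": 4, "GGTTAACC": 6, "TTAACCGG": 5}
print(family_size_summary(observed))  # (5, True)
```

A mean well below the range could suggest that too many input molecules were sequenced too shallowly, while a mean well above it could suggest that reads are being spent on redundant copies.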

Primer site destruction

Figs. 6-9B are conceptual illustrations of various SPLiT-DS method steps in accordance with additional embodiments of the present technology. As discussed above, and with reference to Figs. 4-6, method steps associated with SPLiT-DS provide an amplified nucleic acid material having first-strand amplicons and second-strand amplicons that are tagged with SMIs (e.g., α, α', β, β'; Fig. 6) and additional adapter sequences that include asymmetric primer sites (e.g., for Illumina P5 and P7 primers; Fig. 6), which can be separated into multiple samples following the first round of amplification. Fig. 7 illustrates the subsequent step, in which nested PCR reactions can provide enriched amplification of the top and bottom strands of the original nucleic acid molecule in separate reaction samples (e.g., tubes). As shown in Fig. 7, in addition to enrichment of the desired amplification products, some unintended amplification products, and subsequent sequencing reads, may also be generated. Accordingly, and in some embodiments, efficiency may be reduced (e.g., the percentage of desired products in SPLiT-DS may be low relative to products that are not usable in the SPLiT-DS protocol).

According to another aspect of the present technology, various aspects of conversion efficiency and workflow efficiency may be increased by using one or more strategies for reducing and/or eliminating the amplification and sequencing of unintended amplification products. In some embodiments, following the first round of amplification of the nucleic acid material and separation into multiple samples, primer site destruction or disruption (e.g., destruction of a primer site in an adapter sequence) can be used as a means of enriching certain nucleic acid products (as in, e.g., Fig. 8A). In some embodiments, provided methods may include use of double-stranded primer site destruction. Several methods of primer site destruction are contemplated herein. Figs. 8A-8D are conceptual illustrations of SPLiT-DS method steps incorporating double-stranded primer site destruction schemes. Double-stranded primer site destruction can be accomplished by various means, including by introducing a primer site modification in the targeted strand via a modified primer used in the first amplification step (e.g., Fig. 6). In some embodiments, the primers in the first PCR can have modifications including uracil, methylation, RNA bases, 8-oxo-guanine, or other modifications that can be targeted in a later step. In some embodiments, primer site destruction may be or comprise, for example, digestion of a sequence present in the adapter sequence with a restriction enzyme or other targeted endonuclease (e.g., Cas9, CPF1, etc.), where it has been determined that the restriction site has a very low probability of occurring by chance in the sequence of interest. In some embodiments, an oligonucleotide complementary to the primer sequence to be destroyed may be added to a particular sample, followed by interrogation with a targeted endonuclease specific for double-stranded DNA. In another particular embodiment, a hybridizing oligonucleotide with methyl groups can be used to recruit a methylation-specific restriction endonuclease to the complementary primer site. As shown in Fig. 8A, double-stranded primer site destruction (e.g., destruction of the primer site on both copies of the non-targeted strand in a sample) can be used to destroy, attenuate, or remove the "P5" primer sequence from both the "top strand" and "bottom strand" copies in tube 1. Similarly, in tube 2, the "P7" primer sequence can be selectively destroyed, attenuated, or removed from both the "top strand" and "bottom strand" copies. Fig. 8B is a conceptual illustration of one example of selectively destroying primer sequences in a sample. As shown in Fig. 8B, a first sample can be treated with a first restriction endonuclease (e.g., MspJI) that selectively cleaves a site found in the first primer sequence (e.g., Illumina "P5"), thereby destroying the first primer site in all nucleic acid material in the first sample. Likewise, a second sample can be treated with a second restriction endonuclease (e.g., FspEI) that selectively cleaves a site found in the second primer sequence (e.g., Illumina "P7"), thereby destroying the second primer site in all nucleic acid material in the second sample.
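The requirement that a restriction site be unlikely to occur by chance in the sequence of interest can be checked in two simple ways: by searching the target sequence directly, and by estimating how often a site of a given length would be expected in random sequence. The sketch below illustrates both. It assumes a plain, unmodified recognition sequence written in A, C, G, and T and a uniform random base composition. The example site strings are arbitrary placeholders and are not the recognition sites of the enzymes named above.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def site_positions(site, sequence):
    """0-based start positions of the site, in either orientation, in the sequence."""
    hits = []
    # A set avoids searching twice when the site is its own reverse complement.
    for pattern in {site, reverse_complement(site)}:
        start = sequence.find(pattern)
        while start != -1:
            hits.append(start)
            start = sequence.find(pattern, start + 1)
    return sorted(hits)


def expected_chance_hits(site_length, sequence_length):
    """Expected occurrences of one fixed site, in one orientation, in random sequence."""
    windows = max(sequence_length - site_length + 1, 0)
    return windows / 4 ** site_length


print(site_positions("GAATTC", "AAGAATTCAA"))  # [2]
print(expected_chance_hits(6, 4101))           # 1.0
```

As the second function shows, a 6-base site is expected roughly once per 4,096 bases of random sequence, so longer or rarer sites are less likely to cut within a target region.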

Together with reference to Fig. 8 A and 8C, by using " P7 " primer and with the target sequence primer of " P5 " primer sites tail portion (for example, gene-specific primer), selectively the product in amplification (extending one or more Linear Circulations) pipe 1, only generates " bottom chain " type (see, for example, Fig. 8 C) of both incorporation " P7 " and " P5 " primer sites, and other nucleic acid species in pipe 1 It exponentially cannot expand or be sequenced (for example, lacking " P5 " primer sites).Similarly, by using " P5 " primer and have The target sequence primer (for example, gene-specific primer) of " P7 " primer sites tail portion, selectively amplification (extends one or more Linear Circulation) product in pipe 2, only generate both incorporation " P5 " and " P7 " primer sites " top chain " type (see, for example, Fig. 8 C), and other nucleic acid species in pipe 2 exponentially cannot expand or be sequenced (for example, lacking " P5 " primer sites).It answers Understand, although unwanted linear product is not sequenced or exponentially expands, they may consume primer and dNTP, this can There can be some influences to the efficiency of such reaction.

In some embodiments, methods that include primer site destruction can also use one or more biotinylated or otherwise targetable primers. Fig. 8D is a conceptual illustration of SPLiT-DS method steps incorporating a double-stranded primer site destruction scheme, according to another embodiment of the present technology. In the embodiment shown in Fig. 8D, the target sequence primer with a "P5" primer site tail or a "P7" primer site tail is biotinylated. Referring to Fig. 8D, after the extension step using the biotinylated targeting primer, streptavidin bead or hydrogel enrichment can be used to enrich for products having two primer sites, thereby eliminating most nucleic acid species that have only one primer site. It is contemplated that, in some such embodiments, such enrichment can improve PCR efficiency, and/or facilitate multiplexing, and/or improve cluster amplification efficiency on an MPS DNA sequencer, and/or yield more usable sequencing data from an MPS DNA sequencer.

To further limit off-target enrichment among the species captured by biotin/streptavidin enrichment, further amplification using nested primers (e.g., a "P5" or "P7" primer and a second, internally nested targeting primer tailed with the opposite flow cell sequence) can be used to further enrich on-target species and reduce unwanted amplification products. In a particular embodiment, selective linear amplification using, for example, a primer specific to the target sequence of interest, performed before the paired nested primers are added for exponential amplification, can further enrich the desired species.

In some embodiments, single-stranded primer site destruction can be used. Figs. 9A and 9B are conceptual illustrations of various embodiments of SPLiT-DS method steps incorporating single-stranded primer site destruction schemes, according to further aspects of the present technology. As a non-limiting example, and as illustrated in Fig. 9A, a primer site in one strand of a double-stranded molecule can be destroyed by using a modified primer (not shown) during the first amplification step of SPLiT-DS (see, e.g., Fig. 6). Modified primers can include chemical modifications (e.g., uracil, methylation, RNA bases, 8-oxo-guanine, etc.) that can subsequently be targeted to destroy or disable the primer site on the affected strand. Subsequent amplification (extension for one or more linear cycles) of the desired target in tube 1, using a "P7" primer and a target sequence primer (e.g., a gene-specific primer) bearing a specific label (e.g., biotin, a different flow cell adapter tail, etc.), yields only "bottom strand" species that incorporate both "P7" and the specific label (e.g., biotin, a different primer site, etc.) (see, e.g., Fig. 9A), and the other nucleic acid species in tube 1 will not amplify exponentially. In a next step, unwanted products are further selected against by streptavidin bead enrichment (not shown) or by further amplification with a "P7" primer and a modified primer that is complementary to the different primer site and carries a "P5" primer site flow cell adapter tail (Fig. 9B). A final amplification reaction with "P7" and "P5" primers enriches "bottom strand" products in the tube 1 sample (Fig. 9B). Complementary steps applied to the sample in tube 2 enrich for "top strand" products (Fig. 9B). Without wishing to be bound by any particular theory, it is contemplated that, where an option for double-stranded primer site digestion is available, such an option may be preferable to single-stranded digestion.

In further embodiments, one or more of the schemes described with respect to Figs. 6-9B can be combined, or certain steps can be eliminated, while still achieving some improvement in efficiency. For example, in one embodiment, a biotinylated targeting primer can be used during the extension step (e.g., following the method steps shown in Fig. 6), and a subsequent streptavidin pull-down can be used to recover the strands of interest. In this example (e.g., without primer site destruction), species having two identical primer sites (e.g., two "P5" primer sites or two "P7" primer sites) are also recovered.

Multiplex PCR/capture molecule

In some applications, a target region or sequence may be challenging to sequence because nucleic acid breakpoints may lie close to the target-specific primer, resulting in fragments that are short or that miss the region entirely. For example, samples of randomly sheared DNA or circulating cell-free DNA (cfDNA), such as circulating tumor DNA or circulating fetal DNA, may contain target sequences that cannot be retrieved (e.g., detected or covered in sequencing reads). In some embodiments, provided methods can overcome such challenges by targeting multiple regions within the target sequence, for example by using multiple target primers complementary to staggered portions of the target sequence (e.g., each primer targeting a different region of the target sequence). To avoid challenges associated with short fragments, in one embodiment the DNA can be sheared into pieces larger than would ordinarily be desired for optimal sequencing. Fig. 10 is a conceptual illustration of SPLiT-DS method steps that use multiple targeting primers to generate duplex consensus sequences of longer nucleic acid molecules, according to another embodiment of the present technology.

Referring to Fig. 10, provided methods can include the use of multiple amplification primers, such as multiple primers each targeting a region of the target sequence of interest (e.g., spaced ~100 bp apart). According to various embodiments, such methods can be performed in a single reaction (e.g., tube) or, in other embodiments, in multiple reactions (e.g., tubes), for example to prevent neighboring or adjacent primers from interacting with each other. In some embodiments, interactions among multiple staggered primers in the same tube can be mitigated by performing extension with a strand-displacing polymerase, so that a primer priming downstream does not block a primer priming further upstream. In some embodiments, extension can be performed for several linear cycles with a first primer, followed by a cleanup, then another set of extensions for a second primer, and so on. As shown in Fig. 10, each nested primer set generates amplification products of a different length, which can then be sequenced. Read 1 from all of the amplification products will yield the same sequence information, while the respective paired-end reads from amplification products A, B, and C will yield staggered sequence information that, together with the read 1 sequence information, provides an assembled sequence of greater length than was previously possible with MPS or standard DS schemes.
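The effect of staggering can be illustrated with a small calculation. The following is a minimal sketch, not part of the described method: it assumes each targeting primer contributes one read of fixed length starting at its priming coordinate, and merges those reads into contiguous covered intervals. The function name and the coordinates in the usage note are hypothetical.

```python
def staggered_coverage(primer_starts, read_len):
    """Merge the intervals covered by reads that start at each primer position.

    primer_starts: priming coordinates along the target (hypothetical values).
    read_len: number of bases read from each priming position.
    Returns a list of half-open (start, end) intervals.
    """
    intervals = sorted((p, p + read_len) for p in primer_starts)
    merged = []
    for start, end in intervals:
        if merged and start <= merged[-1][1]:
            # Overlaps or abuts the previous interval: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged
```

With primers spaced 100 bp apart and 150 bp reads, `staggered_coverage([0, 100, 200], 150)` yields a single contiguous interval, whereas spacing the primers further apart than the read length leaves gaps between intervals.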

In some embodiments, multi-primer data are analyzed using a method that departs from the standard DS method. As those skilled in the art will appreciate, duplex assembly of multi-primer sequence reads using SMI tags alone is not possible, because multiplexed samples may include products of different lengths that carry the same tag. To address this challenge, some embodiments include duplex assembly by a tag that is the combination of the SMI and the sequence (e.g., genomic) position of the targeting primer's priming site. In some embodiments, after duplex assembly, the data can be evaluated for duplex consensus sequences that share a common SMI but differ in length. In some embodiments, each duplex family can be assembled into an aggregated "multi-read duplex family." It is contemplated that some such embodiments can facilitate sub-assembly of DS target regions into longer single-molecule reads, which may be advantageous for certain applications and can increase the effective genotyping length of a target nucleic acid molecule on short-read sequencing platforms.
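The composite-tag grouping described above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the analysis pipeline of the present technology: the `Read` record, its field names, and the strand labels are hypothetical, and consensus calling itself is omitted.

```python
from collections import defaultdict
from typing import NamedTuple


class Read(NamedTuple):
    smi: str         # single-molecule identifier sequence (hypothetical field)
    strand: str      # "top" or "bottom" copy of the original duplex
    primer_pos: int  # coordinate of the targeting primer's priming site
    seq: str


def group_by_composite_tag(reads):
    """Group reads by the composite tag (SMI, priming position), split by strand."""
    families = defaultdict(lambda: {"top": [], "bottom": []})
    for r in reads:
        families[(r.smi, r.primer_pos)][r.strand].append(r.seq)
    return dict(families)


def duplex_tags(families):
    """Keep only the tags for which both strands of the duplex were observed."""
    return sorted(tag for tag, fam in families.items() if fam["top"] and fam["bottom"])


def multi_read_families(tags):
    """Aggregate duplex tags that share an SMI but differ in priming position."""
    by_smi = defaultdict(list)
    for smi, pos in tags:
        by_smi[smi].append(pos)
    return dict(by_smi)
```

Grouping on the SMI alone would merge products of different lengths from the same original molecule; keying on the pair keeps them apart for consensus building, and the final aggregation step then links them back together by their shared SMI.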

As is known to those skilled in the art, the longest contiguous read currently obtainable on an Illumina NextSeq is ~300 bp: paired-end 150 bp reads that meet in the middle, provided that enzymatic targeting and primers are carefully designed to generate fragments substantially close to this length. Accordingly, in some embodiments, embodiments incorporating the multi-primer methods described herein achieve longer whole-molecule DS sequences.

In some aspects, provided methods reflect the insight that, in some embodiments, multiple targeting primers combined with SPLiT-DS can achieve, among other things, (i) contiguous sequence of long single molecules, optionally with (ii) high specificity and/or (iii) DS accuracy. It is contemplated that methods provided herein may be useful in applications such as: those requiring long and accurate contiguous reads; de novo genome assembly; assays in repetitive regions (i.e., genomic regions with repeated sequences), where unique mapping is difficult; sequencing of regions considered particularly challenging (e.g., HLA loci, cancer pseudogenes, microsatellites); determining co-occurrence of variants, for example in cancer (e.g., drug-sensitizing mutations, resistance mutations); haplotype analysis (e.g., assessing the origin of a mutation in fetal DNA, such as maternal, paternal, or fetal origin); metagenomics (e.g., antibiotic resistance); overcoming limitations of certain enzymes (e.g., the location of Cas9 sites and limits, based on enzyme recognition sites, on how far apart specific regions need to be); large structural rearrangements; and/or insertions and deletions.

Additional embodiments for processing nucleic acid material

In some embodiments, it is advantageous to process nucleic acid material so as to improve the efficiency, accuracy, and/or speed of the sequencing process. According to further aspects of the present technology, the efficiency of, for example, DS and/or SPLiT-DS can be enhanced by targeted nucleic acid fragmentation. Traditionally, fragmentation of nucleic acid (e.g., genomic, mitochondrial, plasmid, etc.) has been achieved by physical shearing (e.g., sonication) or by certain non-sequence-specific enzymatic methods that use enzyme mixtures to cleave DNA phosphodiester bonds. The result of any of the above methods is a sample in which intact nucleic acid material (e.g., genomic DNA (gDNA)) is reduced to a mixture of nucleic acid fragments of random or semi-random size. Although effective, these methods produce nucleic acid fragments of variable size, which can lead to amplification bias (e.g., short fragments are more likely than long fragments to be PCR-amplified, and amplify more readily during cluster (polony) formation) and to uneven sequencing depth. For example, Fig. 11A is a graph plotting the relationship between nucleic acid insert size and the resulting family size after amplification. As illustrated in Fig. 11A, because shorter fragments tend to amplify preferentially, a greater number of copies of each of these shorter fragments is generated and sequenced, giving a disproportionate level of sequencing depth for these regions. In addition, for longer fragments, the portion of DNA between the limits of the sequencing reads (or between the ends of paired-end sequencing reads) cannot be interrogated and is "dark," even though the fragment was successfully ligated, amplified, and captured (Fig. 11B). Similarly, for short fragments, when paired-end sequencing is used, reading the same sequence in the middle of the molecule from both reads provides redundant information and is cost-inefficient (Fig. 11B). Random or semi-random nucleic acid fragmentation can also produce unpredictable breakpoints in the target molecule, yielding fragments that may have no or reduced complementarity to the bait strands used for hybrid capture, thereby reducing target capture efficiency. Random or semi-random fragmentation can also break the sequence of interest, and/or produce very small or very large fragments that are lost during other stages of library preparation, which can reduce data yield and efficiency.
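The "dark" and redundant portions described above follow from simple arithmetic on insert length and read length. The sketch below is illustrative only and assumes idealized paired-end reads of equal length with no adapter read-through or quality trimming; the function name is hypothetical.

```python
def paired_end_coverage(insert_len, read_len):
    """Return (covered, dark, redundant) base counts for one insert.

    covered:   bases read at least once
    dark:      bases in the middle of a long insert that neither read reaches
    redundant: bases read by both reads of the pair
    """
    per_read = min(read_len, insert_len)       # a read cannot exceed the insert
    covered = min(insert_len, 2 * per_read)
    dark = max(0, insert_len - 2 * read_len)
    redundant = max(0, 2 * per_read - insert_len)
    return covered, dark, redundant
```

Under these assumptions, a 500 bp insert sequenced with 2 x 150 bp reads leaves 200 bp unread, a 200 bp insert is read twice over its central 100 bp, and a 300 bp insert is read exactly once end to end, which is one way to see why fragments of a predetermined, uniform size are attractive.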

Another problem with many random fragmentation methods, particularly mechanical or acoustic methods, is that they introduce damage beyond double-strand breaks, which can render portions of double-stranded DNA no longer double-stranded. For example, mechanical shearing can generate 3' or 5' overhangs at the ends of molecules and single-stranded nicks in the middle of molecules. To make these single-stranded portions amenable to adapter ligation, a mixture of "end repair" enzymes is used to artificially render them double-stranded again, and this can be a source of artifactual errors (e.g., as described above with respect to "pseudo-duplex molecules"). In many embodiments, it is optimal to maximize the amount of double-stranded nucleic acid of interest that remains in its native double-stranded form during processing.

Accordingly, in some embodiments, provided methods and compositions use a targeted endonuclease (e.g., a ribonucleoprotein complex (a CRISPR-associated endonuclease such as Cas9 or Cpf1), a homing endonuclease, a zinc finger nuclease, a TALEN, an argonaute nuclease, and/or a meganuclease (e.g., a megaTAL nuclease, etc.), or a combination thereof) or another technology capable of cutting nucleic acid material (e.g., one or more restriction enzymes) to excise target sequences of interest at an optimal fragment size for sequencing. In some embodiments, targeted endonucleases have the ability to specifically and selectively excise precise sequence regions of interest. Fig. 11C is a schematic showing method steps for generating targeted fragments of defined size with CRISPR/Cas9 and for generating sequencing information, according to one embodiment of the present technology. By pre-selecting cut sites, for example with a programmable endonuclease (e.g., a CRISPR-associated (Cas) enzyme/guide RNA complex), so as to yield fragments of predetermined and substantially uniform size (Fig. 11C), bias and the presence of uninformative reads can be sharply reduced. Further, because of the size difference between the excised fragments and the remaining uncut DNA, a size selection step (as described further below) can be performed to remove large off-target regions, thereby pre-enriching the sample before any further processing step. The need for an end repair step can also be reduced or eliminated, saving time and avoiding the risk of pseudo-duplex challenges and, in some cases, reducing or eliminating the need for computational trimming of data near the ends of molecules, thereby improving efficiency.

Restriction endonuclease

It is specifically contemplated that any of a variety of restriction endonucleases (i.e., enzymes) can be used to provide nucleic acid material of substantially uniform length. Generally, restriction enzymes are produced by certain bacteria or other prokaryotes, and cut at, adjacent to, or between particular sequences in a given segment of DNA.

Those skilled in the art will appreciate that a restriction enzyme is selected to cut at a particular site or, alternatively, at a site that has been generated in order to create a restriction site for cutting. In some embodiments, the restriction enzyme is a synthetic enzyme. In some embodiments, the restriction enzyme is not a synthetic enzyme. In some embodiments, a restriction enzyme as used herein has been modified to introduce one or more changes into the genome of the enzyme itself. In some embodiments, the restriction enzyme generates a double-strand cut between defined sequences within a given portion of DNA.

Although any restriction enzyme can be used according to some embodiments (e.g., type I, type II, type III, and/or type IV), the following is a non-limiting list of restriction enzymes that can be used: AluI, ApoI, AspHI, BamHI, BfaI, BsaI, CfrI, DdeI, DpnI, DraI, EcoRI, EcoRII, EcoRV, HaeII, HaeIII, HgaI, HindII, HindIII, HinFI, KpnI, MamI, MseI, MstI, MstII, NcoI, NdeI, NotI, PacI, PstI, PvuI, PvuII, RcaI, RsaI, SacI, SacII, SalI, Sau3AI, ScaI, SmaI, SpeI, SphI, StuI, XbaI, XhoI, XhoII, XmaI, XmaII, and any combination thereof. An extensive but non-exhaustive list of suitable restriction enzymes can be found in publicly available catalogs and on the internet (e.g., available from New England Biolabs, Ipswich, MA, U.S.A.).

Targeted endonucleases

Targeted endonucleases (e.g., CRISPR-associated ribonucleoprotein complexes such as Cas9 or Cpf1, homing nucleases, zinc finger nucleases, TALENs, megaTAL nucleases, argonaute nucleases, and/or derivatives thereof) can be used to selectively cut and excise targeted portions of nucleic acid material for the purpose of enriching such targeted portions for sequencing applications. In some embodiments, a targeted endonuclease can be modified, such as with amino acid substitutions, to provide, for example, enhanced thermostability, salt tolerance, and/or pH tolerance. In other embodiments, a targeted endonuclease can be biotinylated, fused with streptavidin, and/or incorporate other affinity-based (e.g., bait/prey) technology. In certain embodiments, a targeted endonuclease can have altered recognition site specificity (e.g., an SpCas9 variant with altered PAM site specificity). CRISPR-based targeted endonucleases are described further herein to provide further detail on non-limiting examples of the use of targeted endonucleases. We note that the nomenclature surrounding such targeted nucleases is still in flux. For purposes herein, we use the term "CRISPR-based" to mean, generally, endonucleases comprising a nucleic acid sequence that can be modified to redefine the nucleic acid sequence to be cut. Cas9 and Cpf1 are examples of such targeted endonucleases in current use, but many more clearly exist in nature, and the availability of different varieties of such targeted and easily tunable nucleases is expected to grow rapidly in the coming years. Similarly, multiple engineered variants of these enzymes that enhance or modify their properties are becoming available. Herein, we explicitly contemplate the use of substantially functionally similar targeted endonucleases that are not explicitly described herein or not yet discovered, to achieve purposes similar to those described in this disclosure.

CRISPR-DS

Another aspect of the present technology relates to methods of enriching regions of interest using the programmable endonuclease CRISPR/Cas9. In particular, CRISPR/Cas9 (or another programmable endonuclease) can be used to selectively excise one or more sequence regions of interest, where the excised target regions are designed to have one or more predetermined lengths, thereby allowing size selection before library preparation for sequencing applications (e.g., DS and SPLiT-DS). These programmable endonucleases can be used alone or in combination with other forms of targeted nuclease, such as restriction endonucleases. This method, referred to as CRISPR-DS, allows very high on-target enrichment (which can reduce the need for subsequent hybrid capture steps), and can significantly reduce time and cost and increase conversion efficiency. Figs. 12A-12D are conceptual illustrations of CRISPR-DS method steps according to one embodiment of the present technology. For example, CRISPR/Cas9 can be used to cut at one or more specific sites (e.g., PAM sites) within a target sequence (Fig. 12A; a TP53 target region in this example). Fig. 12B shows a method of separating the excised target portions using SPRI/Ampure bead and magnet purification to remove high-molecular-weight DNA while leaving the predetermined shorter fragments. In other embodiments, various size selection methods, including but not limited to gel electrophoresis, gel purification, liquid chromatography, size exclusion purification, and filtration purification methods, can be used to separate excised portions of predetermined length from undesired DNA fragments and other high-molecular-weight genomic DNA (if applicable). After size selection, CRISPR-DS methods include steps consistent with DS method steps (see, e.g., Fig. 12E), including A-tailing (CRISPR/Cas9 excision leaves blunt ends), ligation of DS adapters (Fig. 12C), duplex amplification (Fig. 12D), a capture step, and index amplification (e.g., PCR) before sequencing of each strand and generation of duplex consensus sequences (Fig. 12D). In addition to the improvement in workflow efficiency evident in, e.g., Fig. 12E, CRISPR-DS provides optimal fragment lengths for efficient amplification and sequencing steps (Fig. 12F).

In certain embodiments, CRISPR-DS addresses multiple common problems associated with NGS, including, for example, inefficient target enrichment, which can be optimized by CRISPR-based size selection; sequencing errors, which can be removed using the DS method of generating error-corrected duplex consensus sequences; and uneven fragment size, which is mitigated by pre-designed CRISPR/Cas9 fragmentation (Table 1).

Table 1. crRNA sequences for TP53 CRISPR/Cas9 digestion

In vitro digestion of DNA material with Cas9 nuclease relies on formation of a ribonucleoprotein complex that recognizes and cuts predetermined sites (e.g., PAM sites, Fig. 11C). The complex is formed of a guide RNA ("gRNA", e.g., crRNA + tracrRNA) and Cas9. For multiplexed cutting, gRNAs can be complexed either by pooling all crRNAs and then complexing with tracrRNA, or by complexing each crRNA with tracrRNA separately and then pooling. In some embodiments, the second option may be preferred because it eliminates competition between crRNAs.

As those skilled in the art will appreciate, CRISPR-DS as described herein can be applied to the sensitive identification of mutations where sample DNA is limited, such as in forensics and early cancer detection applications.

In some embodiments, nucleic acid material comprises nucleic acid molecules of substantially uniform length. In some embodiments, the substantially uniform length is between about 1 and 1,000,000 bases. For example, in some embodiments, the substantially uniform length can be at least 1; 2; 3; 4; 5; 6; 7; 8; 9; 10; 15; 20; 25; 30; 35; 40; 50; 60; 70; 80; 90; 100; 120; 150; 200; 300; 400; 500; 600; 700; 800; 900; 1000; 1200; 1500; 2000; 3000; 4000; 5000; 6000; 7000; 8000; 9000; 10,000; 15,000; 20,000; 30,000; 40,000; or 50,000 bases in length. In some embodiments, the substantially uniform length can be at most 60,000; 70,000; 80,000; 90,000; 100,000; 120,000; 150,000; 200,000; 300,000; 400,000; 500,000; 600,000; 700,000; 800,000; 900,000; or 1,000,000 bases. As a specific non-limiting example, in some embodiments, the substantially uniform length is between about 100 and about 500 bases. In some embodiments, a size selection step, such as those described herein, can be performed before any particular amplification step. In some embodiments, a size selection step, such as those described herein, can be performed after any particular amplification step. In some embodiments, a size selection step (such as those described herein) can be followed by an additional step, such as a digestion step and/or another size selection step.

In addition to the use of targeted endonucleases, any other suitable method of obtaining nucleic acid molecules of substantially uniform length can be used. As non-limiting examples, such methods can be or include the use of one or more of the following: agarose or other gels, affinity columns, HPLC, PAGE, filtration, SPRI/Ampure-type beads, or any other suitable method recognized by those skilled in the art.

In some embodiments, processing nucleic acid material to generate nucleic acid molecules of substantially uniform length (or mass) can be used to recover one or more desired target regions from a sample (e.g., target sequences of interest). In some embodiments, processing nucleic acid material to generate nucleic acid molecules of substantially uniform length (or mass) can be used to exclude specific portions of a sample (e.g., nucleic acid material from an unwanted species or from an unwanted subject of the same species). In some embodiments, nucleic acid material can be present in a variety of sizes (e.g., not of substantially uniform length or mass).

In some embodiments, more than one targeted endonuclease, or more than one other method of providing nucleic acid molecules of substantially uniform length, can be used (e.g., 2, 3, 4, 5, 6, 7, 8, 9, 10, or more). In some embodiments, a targeted nuclease can be used to cut more than one potential target region of the nucleic acid material (e.g., 2, 3, 4, 5, 6, 7, 8, 9, 10, or more). In some embodiments, where there is more than one target region of the nucleic acid material, each target region can have the same (or substantially the same) length. In some embodiments, where there is more than one target region of the nucleic acid material, at least two target regions of known length differ in length (e.g., a first target region 100 bp long and a second target region 1,000 bp long).

In some embodiments, multiple targeted endonucleases (e.g., programmable endonucleases) can be used in combination to fragment multiple regions of a target nucleic acid of interest. In some embodiments, one or more programmable targeted endonucleases can be used in combination with other targeted nucleases. In some embodiments, one or more targeted endonucleases can be used in combination with random or semi-random nucleases. In some embodiments, one or more targeted endonucleases can be used in combination with other random or semi-random methods of nucleic acid fragmentation, such as mechanical or acoustic shearing. In some embodiments, it is advantageous to perform cutting in sequential steps with one or more intervening size selection steps. In some embodiments in which targeted fragmentation and random or semi-random fragmentation are used in combination, the random or semi-random nature of the latter can serve the purpose of an SMI. In some embodiments in which targeted fragmentation and random or semi-random fragmentation are used in combination, the random or semi-random nature of the latter can be used to facilitate sequencing of nucleic acid regions that are not easily cut in a targeted manner, such as long, highly repetitive regions.

Additional methods

In some embodiments, provided methods can include the steps of: providing nucleic acid material; cutting the nucleic acid material with a targeted endonuclease (e.g., a ribonucleoprotein complex) so that a target region of predetermined length is separated from the remainder of the nucleic acid material; and analyzing the cut target region. In some embodiments, provided methods can also include ligating at least one SMI and/or adapter sequence to at least one of the 5' or 3' ends of the cut target region of predetermined length. In some embodiments, analysis can be or include quantification and/or sequencing.

In some embodiments, quantification can be or include spectrophotometric analysis, real-time PCR, and/or fluorescence-based quantification (e.g., labeling with a fluorescent dye). In some embodiments, sequencing can be or include Sanger sequencing, shotgun sequencing, bridge PCR, nanopore sequencing, single-molecule real-time sequencing, ion torrent sequencing, pyrosequencing, digital sequencing (e.g., digital barcode-based sequencing), sequencing by ligation, polony-based sequencing, electrical current-based sequencing (e.g., tunneling currents), sequencing via mass spectrometry, microfluidics-based sequencing, and any combination thereof.

In some embodiments, the targeted endonuclease is or comprises at least one of a CRISPR-associated (Cas) enzyme (e.g., Cas9 or Cpf1) or other ribonucleoprotein complex, a homing endonuclease, a zinc finger nuclease, a transcription activator-like effector nuclease (TALEN), an argonaute nuclease, and/or a megaTAL nuclease. In some embodiments, more than one targeted endonuclease can be used (e.g., 2, 3, 4, 5, 6, 7, 8, 9, 10, or more). In some embodiments, a targeted nuclease can be used to cut more than one potential target region of predetermined length (e.g., 2, 3, 4, 5, 6, 7, 8, 9, 10, or more). In some embodiments, where there is more than one target region of predetermined length, each target region can have the same (or substantially the same) length. In some embodiments, where there is more than one target region of predetermined length, at least two target regions of predetermined length differ in length (e.g., a first target region 100 bp long and a second target region 1,000 bp long).

Additional aspects

According to one aspect of the present disclosure, some embodiments provide high-quality sequencing information from very small amounts of nucleic acid material. In some embodiments, provided methods and compositions can be used with a starting amount of nucleic acid material of at most about 1 picogram (pg); 10 pg; 100 pg; 1 nanogram (ng); 10 ng; 100 ng; 200 ng; 300 ng; 400 ng; 500 ng; 600 ng; 700 ng; 800 ng; 900 ng; or 1000 ng. In some embodiments, provided methods and compositions can be used with an input amount of nucleic acid material of at most 1 molecular copy or genome equivalent, 10 molecular copies or genome equivalents thereof, 100 molecular copies or genome equivalents thereof, 1,000 molecular copies or genome equivalents thereof, 10,000 molecular copies or genome equivalents thereof, 100,000 molecular copies or genome equivalents thereof, or 1,000,000 molecular copies or genome equivalents thereof. For example, in some embodiments, at most 1,000 ng of nucleic acid material is initially provided for a particular sequencing process. For example, in some embodiments, at most 100 ng of nucleic acid material is initially provided for a particular sequencing process. For example, in some embodiments, at most 10 ng of nucleic acid material is initially provided for a particular sequencing process. For example, in some embodiments, at most 1 ng of nucleic acid material is initially provided for a particular sequencing process. For example, in some embodiments, at most 100 pg of nucleic acid material is initially provided for a particular sequencing process. For example, in some embodiments, at most 1 pg of nucleic acid material is initially provided for a particular sequencing process.
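To relate the mass amounts above to genome equivalents, a rough conversion can be sketched as follows. This is an approximation under a stated assumption that does not come from the present disclosure: a haploid human genome is taken to weigh about 3.3 pg. The constant and function names are hypothetical, and other genomes would need a different mass per genome.

```python
# Assumed approximate mass of one haploid human genome, in picograms.
HAPLOID_HUMAN_GENOME_PG = 3.3


def genome_equivalents(mass_ng, pg_per_genome=HAPLOID_HUMAN_GENOME_PG):
    """Approximate number of haploid genome equivalents in mass_ng of DNA."""
    return mass_ng * 1000.0 / pg_per_genome


def mass_ng_for_equivalents(n_genomes, pg_per_genome=HAPLOID_HUMAN_GENOME_PG):
    """Approximate DNA mass in nanograms that contains n_genomes equivalents."""
    return n_genomes * pg_per_genome / 1000.0
```

Under this assumption, 10 ng of human genomic DNA corresponds to roughly 3,000 haploid genome equivalents, so each locus is represented by only a few thousand starting molecules at that input.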

According to other aspects of the present technology, some provided methods can be useful for sequencing any of a variety of suboptimal (e.g., damaged or degraded) nucleic acid samples. For example, in some embodiments, at least some of the nucleic acid material is damaged. In some embodiments, the damage is or comprises at least one of the following: oxidation, alkylation, deamination, methylation, hydrolysis, nicking, intra-strand crosslinks, inter-strand crosslinks, blunt-end strand breakage, staggered-end double-strand breakage, phosphorylation, dephosphorylation, sumoylation, glycosylation, single-stranded gaps, damage from heat, damage from desiccation, damage from UV exposure, damage from gamma radiation, damage from X-radiation, damage from ionizing radiation, damage from non-ionizing radiation, damage from heavy particle radiation, damage from nuclear decay, damage from beta radiation, damage from alpha radiation, damage from neutron radiation, damage from proton radiation, damage from cosmic radiation, damage from high pH, damage from low pH, damage from reactive oxidative species, damage from free radicals, damage from peroxide, damage from hypochlorite, damage from tissue fixation such as formalin or formaldehyde, damage from reactive iron, damage from low ionic conditions, damage from high ionic conditions, damage from unbuffered conditions, damage from nucleases, damage from environmental exposure, damage from fire, damage from mechanical stress, damage from enzymatic degradation, damage from microorganisms, damage from preparative mechanical shearing, damage from preparative enzymatic fragmentation, damage that occurred naturally in vivo, damage that occurred during nucleic acid extraction, damage that occurred during sequencing library preparation, damage introduced by a polymerase, damage introduced during nucleic acid repair, damage that occurred during nucleic acid end-tailing, damage that occurred during nucleic acid ligation, damage that occurred during sequencing, damage that occurred from mechanical handling of DNA, damage that occurred during passage through a nanopore, damage that occurred as part of aging in an organism, damage that occurred as a result of chemical exposure of an individual, damage caused by a mutagen, damage caused by a carcinogen, damage caused by a clastogen, damage that occurred from in vivo inflammation, damage due to oxygen exposure, damage due to one or more strand breaks, and any combination thereof.

Nucleic acid material

Type

According to various embodiments, any of a variety of nucleic acid materials can be used. In some embodiments, the nucleic acid material may include at least one modification to a polynucleotide within the canonical sugar-phosphate backbone. In some embodiments, the nucleic acid material may include at least one modification within any base in the nucleic acid material. For example, by way of non-limiting example, in some embodiments, the nucleic acid material is or comprises at least one of double-stranded DNA, single-stranded DNA, double-stranded RNA, single-stranded RNA, peptide nucleic acids (PNAs), and locked nucleic acids (LNAs).

Modification

According to various embodiments, the nucleic acid material may receive one or more modifications before, substantially simultaneously with, or after any particular step, depending on the application for which a particular provided method or composition is used.

In some embodiments, a modification may be or include repair of at least a portion of the nucleic acid material. While any application-appropriate manner of nucleic acid repair is considered compatible with some embodiments, certain exemplary methods and compositions are described below and in the Examples.

By way of non-limiting example, in some embodiments, DNA repair enzymes such as uracil-DNA glycosylase (UDG), formamidopyrimidine DNA glycosylase (FPG), and 8-oxoguanine DNA glycosylase (OGG1) can be used to correct DNA damage (e.g., in vitro DNA damage). For example, these DNA repair enzymes are glycosylases that remove damaged bases from DNA. For example, UDG removes uracil that results from cytosine deamination (caused by spontaneous hydrolysis of cytosine), and FPG removes 8-oxo-guanine (e.g., the most common DNA lesion resulting from reactive oxygen species). FPG also has lyase activity that can generate a 1-base gap at abasic sites. Such abasic sites will subsequently fail to amplify by PCR, for example, because the polymerase cannot copy the template. Accordingly, the use of such DNA damage repair enzymes can effectively remove damaged DNA that does not carry a true mutation but might otherwise go undetected as an error following sequencing and duplex sequence analysis.

As discussed above, in further embodiments, sequencing reads generated from the processing steps discussed herein can be further filtered to eliminate false mutations by trimming the ends of the reads, which are most prone to artifacts. For example, DNA fragmentation can generate single-stranded portions at the ends of double-stranded molecules. These single-stranded portions can be filled in (e.g., by Klenow) during end repair. In some instances, polymerases make copying mistakes in these end-repaired regions, leading to the generation of "pseudo-duplex molecules". Once sequenced, these artifacts can appear to be true mutations. These errors, which result from end-repair mechanisms, can be eliminated from post-sequencing analysis by trimming the ends of the sequencing reads to exclude any mutations that may have occurred there, thereby reducing the number of false mutations. In some embodiments, such trimming of sequencing reads can be performed automatically (e.g., as a normal process step). In some embodiments, the mutation frequency in the fragment end regions can be assessed, and if a threshold level of mutations is observed in the fragment end regions, sequencing read trimming can be performed before generating a double-strand consensus sequence read of the DNA fragments.
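The end-trimming logic described above can be sketched in a few lines. This is a minimal illustration, not the procedure used in the disclosure: it assumes reads are already aligned to the reference without indels, and the function names, the window size, and the threshold are all hypothetical.

```python
def trim_read_ends(sequence, qualities, n_trim):
    """Remove n_trim bases (and their quality values) from both ends of a read."""
    if n_trim < 0:
        raise ValueError("n_trim must be non-negative")
    if 2 * n_trim >= len(sequence):
        return "", []
    end = len(sequence) - n_trim
    return sequence[n_trim:end], list(qualities[n_trim:end])


def end_mutation_frequency(aligned_pairs, end_window):
    """Fraction of bases within end_window of either read end that differ
    from the reference. aligned_pairs holds (read, reference) strings of
    equal length (an assumed, indel-free alignment)."""
    mismatches = 0
    total = 0
    for read, reference in aligned_pairs:
        length = len(read)
        for i, (base, ref_base) in enumerate(zip(read, reference)):
            if i < end_window or i >= length - end_window:
                total += 1
                mismatches += base != ref_base
    return mismatches / total if total else 0.0


def should_trim(aligned_pairs, end_window, threshold):
    """Trim only if the end-region mutation frequency exceeds a chosen threshold."""
    return end_mutation_frequency(aligned_pairs, end_window) > threshold
```

Trimming unconditionally corresponds to the "normal process step" variant; calling `should_trim` first corresponds to the threshold-based variant, where the threshold is a value the user would have to choose.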

Source

It is contemplated that nucleic acid material may come from any of a variety of sources. For example, in some embodiments, nucleic acid material is provided from a sample from at least one subject (e.g., a human or animal subject) or other biological source. In some embodiments, nucleic acid material is provided from a banked/stored sample. In some embodiments, a sample is or comprises at least one of the following: blood, serum, sweat, saliva, cerebrospinal fluid, mucus, uterine lavage fluid, a vaginal swab, a nasal swab, an oral swab, a tissue scraping, hair, a fingerprint, urine, stool, vitreous humor, peritoneal wash, sputum, bronchial lavage, oral lavage, pleural lavage, gastric lavage, gastric juice, bile, pancreatic duct lavage, bile duct lavage, common bile duct lavage, gall bladder fluid, synovial fluid, an infected wound, a non-infected wound, an archaeological sample, a forensic sample, a water sample, a tissue sample, a food sample, a bioreactor sample, a plant sample, a fingernail scraping, semen, prostatic fluid, fallopian tube lavage, a cell-free nucleic acid, a nucleic acid within a cell, a metagenomics sample, a lavage of an implanted foreign body, a nasal lavage, intestinal fluid, an epithelial brushing, an epithelial lavage, a tissue biopsy, an autopsy sample, a necropsy sample, an organ sample, a human identification sample, an artificially produced nucleic acid sample, a synthetic gene sample, a nucleic acid data storage sample, tumor tissue, and any combination thereof. In other embodiments, a sample is or comprises at least one of a microorganism, a plant-based organism, or any collected environmental sample (e.g., water, soil, archaeological, etc.).

Selected example applications

As described herein, the provided methods and compositions can be used for any of a variety of purposes and/or in any of a variety of scenarios. The following are non-limiting examples of such applications and/or scenarios, given for purposes of illustration only.

Forensics

Previous approaches to forensic DNA analysis relied almost entirely on capillary electrophoretic separation of PCR amplicons to identify length polymorphisms in short tandem repeat sequences. Since its introduction in 1991, this type of analysis has proven extremely valuable. Since that time, several publications have introduced standardized protocols, validated their use in laboratories worldwide, detailed their use in many different population groups, and introduced more efficient approaches, such as miniSTRs.

While this approach has proven extremely successful, the technology has a number of drawbacks that limit its utility. For example, current approaches to STR genotyping often give rise to background signal from PCR stutter, which is caused by slippage of the polymerase on the template DNA. This issue is especially important in samples with more than one contributor, because it is difficult to distinguish stutter alleles from genuine alleles. Another issue arises when analyzing degraded DNA samples. Variation in fragment length often results in significantly reduced, or even absent, longer PCR fragments. As a consequence, profiles from degraded DNA often have lower power of discrimination.

The introduction of MPS systems has the potential to address several challenges in forensic analysis. For example, these platforms offer unparalleled capacity for simultaneous analysis of STRs and SNPs in nuclear and mtDNA, which will dramatically increase the power of discrimination between individuals and offers the possibility of determining ethnicity and even physical attributes. Furthermore, unlike PCR-CE, which reports only the average genotype of an aggregate population of molecules, MPS technology digitally tabulates the full nucleotide sequence of many individual DNA molecules, thus offering the unique ability to detect MAFs within a heterogeneous DNA mixture. Because forensic samples comprising two or more contributors remain one of the most problematic issues in forensics, the impact of MPS on forensic science could be enormous.

The publication of the human genome highlighted the immense power of MPS platforms. However, until fairly recently, the full power of these platforms was of limited use to forensics because read lengths were significantly shorter than STR loci, precluding the ability to call length-based genotypes. Initially, pyrosequencers, such as the Roche 454 platform, were the only platforms with sufficient read length to sequence the core STR loci. However, read lengths in competing technologies have increased, bringing their utility for forensic applications into play. A number of studies have revealed the potential of MPS genotyping of STR loci. Overall, the general outcome of all of these studies, regardless of platform, is that STRs can be successfully typed, producing genotypes comparable to CE analyses, even from compromised forensic samples.

While all of these studies show concordance with traditional PCR-CE approaches, and even indicate additional benefits such as the detection of intra-STR SNPs, they have also highlighted a number of current issues with the technology. For example, current MPS approaches to STR genotyping rely on multiplex PCR both to provide enough DNA to sequence and to introduce PCR primers. However, because multiplex PCR kits were designed for PCR-CE, they contain primers for amplicons of various sizes. This variation results in coverage imbalance, with a bias toward amplification of smaller fragments, which can result in allele drop-out. Indeed, recent studies have shown that differences in PCR efficiency can affect mixture components, especially at low MAFs. To address this issue, several sequencing kits specifically designed for forensics are now commercially available, and validation studies are beginning to be reported. However, owing to the high level of multiplexing, amplification bias is still apparent.

Like PCR-CE, MPS is not immune to PCR stutter. The vast majority of MPS studies on STRs report the occurrence of artifactual drop-in alleles. Recently, systematic MPS studies have reported that most stutter events appear as shorter-length polymorphisms that differ from the true allele in units of four base pairs, the most common being n-4, although n-8 and n-12 positions are also observed. Stutter typically occurs in ~1% of reads but can be as high as 3% at some loci, indicating that MPS can exhibit stutter at a higher rate than PCR-CE.

In contrast, in some embodiments, the provided methods and compositions allow high-quality and efficient sequencing of low-quality and/or low-amount samples, as described above and in the Examples below. Accordingly, in some embodiments, the provided methods and/or compositions may be useful for rare-variant detection of DNA from one individual intermixed at low abundance with the DNA of another individual of a different genotype.

Forensic DNA samples commonly contain non-human DNA. Potential sources of this extraneous DNA are the source of the DNA itself (e.g., microbes in saliva or buccal samples), the surface environment from which the sample was collected, and contaminants from the laboratory (e.g., reagents, work area, etc.). Another aspect provided by some embodiments is that certain provided methods and compositions allow contaminating nucleic acid material from other sources (e.g., different species) and/or from surface or environmental contaminants to be distinguished, so that these materials (and/or their effects) can be removed from the final analysis and do not bias the sequencing results.

In highly degraded DNA, locus-specific PCR may not work well because DNA fragments lack the necessary primer annealing sites, resulting in allele drop-out. This situation limits the uniqueness of genotype calls and makes the confidence of a match less certain, especially in mixture testing. In some embodiments, however, the provided methods and compositions allow single nucleotide polymorphisms (SNPs) to be used in addition to, or instead of, STR markers.

Indeed, as data on human genetic variation continue to grow, SNPs are becoming increasingly relevant to forensic work. As such, in some embodiments, the provided methods and compositions use a primer design strategy that allows multiplex primer panels to be created, for example based on currently available sequencing kits, which virtually ensures that reads traverse one or more SNP positions.

Patient stratification

Patient stratification, which generally refers to the partitioning of patients based on one or more non-treatment-related factors, is a topic of significant interest in the medical community. Much of this interest may be due to the fact that certain therapeutic candidates have failed to receive FDA approval, in part because of previously unrecognized differences among the patients in a trial. These differences may be or include one or more genetic differences that cause a therapeutic to be metabolized differently, or cause side effects to be present or exacerbated in one group of patients relative to one or more other groups of patients. In some cases, some or all of these differences may be detected as one or more distinct genetic profiles in a patient that lead to a response to the therapeutic different from that of other patients who do not exhibit the same genetic profile.

Accordingly, in some embodiments, the provided methods and compositions may be useful in determining which subjects in a particular patient population (e.g., patients with a common disease, disorder, or condition) may respond to a particular therapy. For example, in some embodiments, the provided methods and/or compositions may be used to assess whether a particular subject has a genotype associated with a poor response to the therapy. In some embodiments, the provided methods and/or compositions may be used to assess whether a particular subject has a genotype associated with a positive response to the therapy.

Monitoring response to therapy (tumor mutations, etc.)

The advent of next-generation sequencing (NGS) in genomic research has enabled the mutational landscape of tumors to be characterized in unprecedented detail and has resulted in the cataloguing of diagnostic, prognostic, and clinically actionable mutations. Collectively, these mutations hold significant promise for improved cancer outcomes through personalized medicine, as well as for potential early cancer detection and screening. Prior to the present disclosure, a critical limitation in the field was the inability to detect these mutations when they are present at low frequency. Clinical biopsies are often composed mostly of normal cells, and detecting cancer cells based on their DNA mutations is a technological challenge even for modern NGS. Identifying tumor mutations among thousands of normal genomes is analogous to finding a needle in a haystack and requires a level of sequencing accuracy beyond previously known methods.

Generally, this problem is exacerbated in the case of liquid biopsies, where the challenge is not only to provide the extreme sensitivity required to find tumor mutations, but also to do so with the minimal amounts of DNA typically present in these biopsies. The term 'liquid biopsy' typically refers to blood and its ability to inform about cancer based on the presence of circulating tumor DNA (ctDNA). ctDNA is shed by cancer cells into the bloodstream and has shown great promise for monitoring, detecting, and predicting cancer, as well as for enabling tumor genotyping and therapy selection. These applications could revolutionize the current management of cancer patients; however, progress has been slower than previously anticipated. A major issue is that ctDNA typically represents a very small fraction of all the cell-free DNA (cfDNA) present in plasma. In metastatic cancers its frequency can be >5%, but in localized cancers it is only 1%-0.001%. In theory, a DNA subpopulation of any size should be detectable by assaying a sufficient number of molecules. However, a fundamental limitation of previous methods is the high frequency with which bases are scored incorrectly. Errors often arise during cluster generation, during sequencing cycles, from poor cluster resolution, and from template degradation. The result is that approximately 0.1-1% of sequenced bases are called incorrectly. Further problems can arise from polymerase errors and amplification bias during PCR, which can lead to skewed populations or the introduction of false mutant allele frequencies (MAFs). Taken together, previously known techniques, including conventional NGS, cannot perform at the level required to detect low-frequency mutations.
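The sampling argument above (any subpopulation is detectable in principle, but sequencing errors swamp rare variants) can be made concrete with a small calculation. The sketch below is illustrative only: it assumes molecules are sampled independently, and it treats every erroneous base call at a position as a potential false mutant, which overstates the noise for any single specific substitution. The function names and the 95% target are hypothetical choices.

```python
import math


def molecules_needed(maf, detection_probability=0.95):
    """Smallest number of independently sampled molecules n such that
    1 - (1 - maf)**n >= detection_probability (at least one mutant seen)."""
    if not 0.0 < maf < 1.0 or not 0.0 < detection_probability < 1.0:
        raise ValueError("maf and detection_probability must lie in (0, 1)")
    return math.ceil(math.log(1.0 - detection_probability) / math.log(1.0 - maf))


def expected_counts(maf, per_base_error_rate, depth):
    """Expected true-mutant reads and expected erroneous reads at one position."""
    return maf * depth, per_base_error_rate * depth


if __name__ == "__main__":
    for maf in (0.05, 0.01, 1e-4, 1e-5):
        print(f"MAF {maf:g}: {molecules_needed(maf):,} molecules for a 95% chance")
    true_reads, error_reads = expected_counts(1e-4, 1e-3, 100_000)
    print(f"true mutant reads ~{true_reads:.0f}, erroneous reads ~{error_reads:.0f}")
```

With a per-base error rate of 0.1%, a variant at a MAF of 0.01% is expected to be outnumbered by erroneous calls at the same position, which is the limitation the paragraph describes.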

Several approaches have been used in an attempt to improve the accuracy of NGS. Removal of DNA damage with in vitro repair kits has been shown to reduce the number of false variant calls in NGS. However, not all mutagenic lesions are recognized by these enzymes, nor is the fidelity of repair perfect. Another approach that has gained significant traction is the use of PCR duplicates arising from individual DNA fragments to form a consensus sequence. In this approach, termed 'molecular barcoding', reads that share unique random shear points, or random DNA sequences introduced exogenously before or during PCR, are grouped, and the most common sequence is kept. Kinde et al. introduced this idea with SafeSeqS, which uses single-stranded molecular barcodes to reduce the sequencing error rate by grouping PCR copies that share a barcode sequence and forming a consensus sequence. This approach results in an average detection limit of 0.5% and was successfully used to detect ctDNA in metastatic cancers, but in only ~40% of early cancers. The detection limit can be substantially improved using digital droplet PCR (ddPCR), which can detect mutations with MAFs as low as ~0.01%. However, these mutations need to be known in advance, which severely limits many cancer applications. Moreover, only 1-4 mutations can be tested at a time, which precludes high-throughput screening (Table 2).

Table 2.

Prior to the present disclosure, the only technology with sensitivity comparable to ddPCR that did not require prior knowledge of tumor mutations was DS. DS extends the concept of molecular barcoding by using double-stranded molecular barcodes to exploit the fact that the two DNA strands contain complementary information. We have previously demonstrated that this approach results in an unprecedented sensitivity of <0.005% in human nuclear DNA.

Owing to their high accuracy, DS, SPLiT-DS, and CRISPR-DS, together with methods for increasing the conversion and workflow efficiency of these sequencing platforms, hold promise in oncology. As described herein, the provided methods and compositions allow innovative approaches to DS methodology that integrate the double-strand molecular tagging of DS with target sequence-specific amplification (e.g., PCR) to increase efficiency and scalability while maintaining error correction.

In addition to the need for highly accurate and efficient assays, the reality of the clinical laboratory also calls for assays that are fast, scalable, and reasonably cost-effective. Accordingly, various embodiments in accordance with aspects of the present technology that improve the workflow efficiency of DS (e.g., enrichment strategies for DS) are desirable. As described herein, amplification-based enrichment and digestion/size-selection enrichment of specific target sequences for DS applications provide high target specificity, good performance with low DNA input, scalability, and minimal cost (typically ~$2-3/sample).

Some embodiments of the provided methods and compositions are of particular importance to cancer research in general and to the ctDNA field in particular, because the technology developed herein has the potential to identify cancer mutations with unprecedented sensitivity while minimizing DNA input, preparation time, and cost. In other embodiments disclosed herein, SPLiT-DS and CRISPR-DS can be used for clinical applications that could significantly increase survival through improved patient management and early cancer detection.

Examples

Example 1: SPLiT-DS

SPLiT-DS is a PCR-based targeted enrichment strategy for duplex sequencing error correction that is compatible with the use of molecular barcodes on each strand (Fig. 4A). In this exemplary embodiment, to begin a SPLiT-DS analysis, one or more DNA samples are fragmented using one or more methods (similar to previously described duplex sequencing library construction known in the art). After fragmentation, conventional end repair and 3'-dA-tailing are performed, followed by ligation of each DNA fragment to T-tailed DS adapters containing degenerate or semi-degenerate double-stranded barcodes (Fig. 4, step 1). Alternatively, other types of ligation overhangs, blunt-end ligation, or adapter ligation chemistries previously described in International Patent Publication No. WO 2017/100441 and U.S. Patent No. 9,752,188 can be used. Using primers specific for the universal primer binding sites in the single-stranded adapter tails, substantially all doubly adapter-ligated DNA molecules are PCR amplified, providing multiple barcoded copies of the DNA fragments derived from each strand ("barcoded fragments") (Fig. 4, step 2). After removal of reaction byproducts, a given sample is split into two separate tubes (Fig. 4, step 3) (i.e., the sample is divided in half, with each tube containing substantially half of the sample contents). On average, half of the copies of any given barcoded fragment will be transferred into each tube; however, because of the randomness involved in splitting the sample, the distribution of any given barcoded fragment may vary. To account for any such variation, a hypergeometric distribution (i.e., the probability of selecting k barcode copies without replacement) is used as a model to determine the minimum number of PCR copies of a given barcode needed to achieve a suitably high probability that each tube contains at least one barcoded fragment derived from each of the two strands of the original duplex. According to the hypergeometric model, ≥4 PCR cycles during step 1 (i.e., 2^4 = 16 copies/barcode) are expected to provide a >99% probability that each barcoded fragment (from each strand) is represented at least once in each tube. This assumes uniform PCR amplification with close to 100% efficiency, which may not be realistic in all cases but is a reasonable assumption for relatively low-input, high-quality DNA samples (e.g., 10 ng of human genomic DNA per 50 µL PCR). After the sample is split into two tubes, the target loci are enriched by multiplex PCR using primers specific for the adapter sequence and for the loci of interest (Fig. 4, step 4).
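The hypergeometric reasoning in this paragraph can be checked numerically. The following is a minimal sketch under stated assumptions, not the calculation used in the disclosure: the pool is split into two exactly equal halves, "16 copies/barcode" is read as 16 copies of one strand's barcoded fragment, and the total pool size (1,600 molecules) is an arbitrary illustrative value. All function and parameter names are hypothetical.

```python
from math import comb


def prob_strand_in_both_tubes(copies, pool_size):
    """Probability that copies of one strand's barcoded fragment land in BOTH
    tubes when pool_size molecules are split into two equal halves.

    Tube 1 is modeled as a draw of pool_size // 2 molecules without
    replacement (hypergeometric); tube 2 receives the remainder.
    """
    if pool_size % 2 or not 0 < copies <= pool_size:
        raise ValueError("pool_size must be even and 0 < copies <= pool_size")
    half = pool_size // 2
    total = comb(pool_size, half)
    # All copies end up in tube 2 (none drawn into tube 1).
    p_none_in_tube1 = comb(pool_size - copies, half) / total
    # All copies end up in tube 1 (none left for tube 2).
    p_all_in_tube1 = comb(pool_size - copies, half - copies) / total if copies <= half else 0.0
    return 1.0 - p_none_in_tube1 - p_all_in_tube1


def prob_duplex_recoverable(copies_per_strand, pool_size):
    """Approximate probability that both strands are present in both tubes,
    treating the two strands as independent (an approximation)."""
    return prob_strand_in_both_tubes(copies_per_strand, pool_size) ** 2


if __name__ == "__main__":
    for cycles in (1, 2, 3, 4, 5):
        copies = 2 ** cycles
        p = prob_duplex_recoverable(copies, pool_size=1600)
        print(f"{cycles} cycles ({copies:>2} copies/strand): P ~ {p:.5f}")
```

Under these assumptions the per-strand probability exceeds 99% once each strand has 8 or more copies, and is well above 99.99% at 16 copies, which is consistent with the >99% figure quoted above.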

Multiplex locus-specific PCR is performed such that, in each tube, the resulting PCR products derive from only one of the two original strands of a given DNA molecule in the sample. This is achieved by the following procedure, using a sample split into two tubes (a first tube and a second tube, as described herein). In the first tube, PCR is performed using a primer specific for hybridizing to the "Read 1" (i.e., Illumina P5) adapter sequence (Fig. 4, step 3; grey arrow) and a primer specific for the locus of interest that is tailed with the Read 2 (i.e., Illumina P7) adapter sequence (Fig. 4, step 3; black arrow with grey tail). Alternatively, the tail can be shortened so that it does not contain the complete P7 sequence, which can instead be added by a later PCR before sequencing. It is proposed that this step provides amplification products that carry the P5 and P7 sequences at their respective ends and that derive from only one strand of the original parent DNA molecule (i.e., the initial sample DNA). Sequentially or concurrently, a similar reaction is performed in the second tube: in contrast to the amplification in the first tube, it amplifies products derived from the opposite strand at the same genomic location. This is achieved by using an adapter primer directed to the opposite universal primer sequence (i.e., P7 rather than P5) together with locus-specific primers that anneal in the opposite strand orientation relative to tube 1 (i.e., to the anti-reference sequence rather than the reference sequence) and are tailed with the opposite universal primer sequence (i.e., P5 rather than P7). Data are analyzed with methods similar to those used in conventional duplex sequencing analysis/library construction, whereby reads from the 'original first strand' or the 'original second strand' that share a particular barcode are grouped into single-strand consensus sequences.

These single-strand consensus sequences ("SSCSs") are then compared with the consensus sequence computed for the other original strand (e.g., the opposite strand, as described herein). The identity of a nucleotide position is retained only when the sequences obtained at that position from the two SSCSs, one from each original strand of the duplex, are complementary to each other. If the identity of a position does not match between the SSCSs, this is noted. For nucleotide positions at which the paired SSCSs agree, the identity of the position is recorded in the final duplex consensus sequence (DCS) (Fig. 1C). Positions at which the sequence identity does not match between the two SSCSs are flagged as potential error sites and are typically discounted by marking the position as unknown (i.e., "N"). Alternative strategies, as previously described in International Patent Publication No. WO 2017/100441 and U.S. Patent No. 9,752,188, include discarding the entire consensus read if a mismatch is found, or using statistical methods to assign confidence to one variant over the other, determining which is more likely to be the true variant based on the prior probability of a particular type of error and on how well a given SSCS is supported by the number of family members composing it and how well those members agree. Another approach is to preserve the uncertainty at the nucleotide position, for example with IUPAC nomenclature (e.g., "K" denotes a position that may be G or T). Additional information, such as the prior probability of an error on a certain type of sequencer or of an amplification error in a given sequence context, the relative number of reads supporting each variant at that position in each paired consensus family, or the quality scores of the raw reads making up the SSCS families, can be applied to the consensus sequence data file to reflect the relative likelihood that one nucleotide rather than another is the true identity at the undefined position.
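The SSCS and DCS steps described above can be sketched as follows. This is a simplified illustration, not the consensus caller of the disclosure or of the cited publications: it assumes reads are of equal length, already aligned and reported in reference orientation (so that complementary strands compare as identical strings), and that the tube number identifies the strand of origin. The names, the 0.6 majority threshold, and the input layout are hypothetical.

```python
from collections import Counter, defaultdict

# IUPAC two-base ambiguity codes, e.g. "K" for a position that may be G or T.
IUPAC = {
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
}


def single_strand_consensus(reads, min_fraction=0.6):
    """Most common base per position; 'N' where no base reaches min_fraction."""
    consensus = []
    for column in zip(*reads):
        base, count = Counter(column).most_common(1)[0]
        consensus.append(base if count / len(column) >= min_fraction else "N")
    return "".join(consensus)


def duplex_consensus(sscs_1, sscs_2, ambiguity="N"):
    """Keep a base only where both SSCSs agree; otherwise mark the position
    as 'N', or as an IUPAC code when ambiguity == 'iupac'."""
    out = []
    for a, b in zip(sscs_1, sscs_2):
        if a == b:
            out.append(a)
        elif ambiguity == "iupac" and "N" not in (a, b):
            out.append(IUPAC[frozenset((a, b))])
        else:
            out.append("N")
    return "".join(out)


def call_duplex_consensus(reads, ambiguity="N"):
    """reads: iterable of (barcode, tube, sequence), with tube in {1, 2}.
    Returns {barcode: DCS} for barcodes observed in both tubes."""
    families = defaultdict(lambda: defaultdict(list))
    for barcode, tube, sequence in reads:
        families[barcode][tube].append(sequence)
    result = {}
    for barcode, by_tube in families.items():
        if 1 in by_tube and 2 in by_tube:
            sscs_1 = single_strand_consensus(by_tube[1])
            sscs_2 = single_strand_consensus(by_tube[2])
            result[barcode] = duplex_consensus(sscs_1, sscs_2, ambiguity)
    return result
```

A barcode seen in only one tube yields no duplex consensus here; discarding the whole read on any mismatch, or weighting by family size and quality scores, would be alternative policies along the lines the paragraph mentions.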

It should be noted that, although the duplex consensus calling method is substantially similar to the method described in International Patent Publication No. WO 2017/100441 and U.S. Patent No. 9,752,188, in the case of SPLiT-DS a single-molecule identifier sequence at one end of the molecule is typically used to identify each molecule (as opposed to one on each end), and sequence reads derived from copies of one of the original strands are found in one tube, while the complementary original strand can be found in the other tube. However, this need not be the case: as described elsewhere herein, the PCR reaction of the duplex-amplified library can be split into more than two tubes (e.g., four tubes, each with a specific primer pair), and the above process can be carried out at both ends of the original molecule, so that two duplex consensus sequences are produced per molecule. The initial PCR reaction can similarly be split into multiple tubes (Fig. 10), and multiple measurements can be generated for duplex sequencing error correction and/or for subassembly of longer sequences from short-read sequences.

It is often convenient to differentially index the products of each tube so that they can be distinguished after multiplexed sequencing. However, this is not mandatory. One benefit of SPLiT-DS is that it enables targeted enrichment using PCR, which speeds up the workflow relative to previous versions of duplex sequencing that rely on hybrid capture or other methods to enrich regions of interest. At the same time, it allows duplex adapters and tags to be used for maximal accuracy, which cannot be achieved with conventional amplicon sequencing.

Example 2: Development of SPLiT-DS for CODIS STR loci

This example is based on the recognition that currently available methods for genotyping repetitive regions of DNA, such as short tandem repeats (STRs), would benefit from improvements in accuracy and sensitivity. The example extends and improves established DS protocols (which can themselves remove "stutter"; Fig. 3B) to produce a "SPLiT-DS" assay/protocol. This example will demonstrate (1) the design of primers and their subsequent selection for use in multiplex PCR; (2) improved methods of DNA library preparation; (3) the accuracy, precision, sensitivity, and specificity of the provided technology, for example using decreasing amounts of DNA; and (4) substantially reduced stutter in the final error-corrected data.

Primer design and selection for multiplex PCR

SPLiT-DS PCR primers are preferably designed to have the following features: 1) high target specificity; 2) the ability to be multiplexed; and 3) robust amplification with minimal bias. Although many existing primer mixes meet these criteria for conventional PCR capillary electrophoresis (PCR-CE), the same primer mixes are not reliable in MPS. To this end, available data (mapping coordinates of sequencing data obtained using commercially available kits that amplify the target loci before sequencing; i.e., the 5' end of each read in the paired-end sequencing data corresponds to the 5' end of the PCR primer used to amplify the DNA) are used to develop the primers in this example. The insights described herein, together with data obtained from previous examples, inform the design of an initial primer set for the expanded CODIS core loci (CODIS20) plus PentaD, PentaE, and SE33 (for simplicity, unless otherwise stated, these are referred to simply as the CODIS loci). The previously determined mapping coordinates provide no other information about the primers used in commercially (or otherwise) available kits, such as length, melting temperature, and concentration. The generation of primers in this example therefore focuses on designs that maximize the probability of achieving uniform, robust, and specific amplification before any multiplexed reaction.

In contrast to, for example, gel analysis, results can be analyzed by direct sequencing (e.g., on an Illumina MiSeq platform). Each sample can be evaluated against a number of metrics in order to design an optimal primer mix. The metrics include: 1) specificity (i.e., the number of on-target reads divided by the number of off-target reads); 2) allele coverage ratio at heterozygous loci (i.e., the lower-depth allele divided by the higher-depth allele; the ideal is 1.0); 3) inter-locus balance (i.e., the lowest-depth locus divided by the highest-depth locus; the ideal is 1.0); and 4) depth variation (i.e., the mean depth of each locus divided by the overall mean depth across all loci). At least one primer set can be selected on the basis of these metrics for further analysis and development. Alternatively and/or additionally, primer design can include using a web-based program, such as Primer3, for each STR marker.
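The four metrics listed above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names, the dictionary layout (locus name mapped to post-processing depth), and the example loci are assumptions made here, not part of the described software.

```python
from statistics import mean


def specificity(on_target_reads, off_target_reads):
    """Metric 1: on-target reads divided by off-target reads (as defined in the text)."""
    return on_target_reads / off_target_reads


def allele_coverage_ratio(depth_allele_1, depth_allele_2):
    """Metric 2: lower-depth allele divided by higher-depth allele (ideal 1.0)."""
    low, high = sorted((depth_allele_1, depth_allele_2))
    return low / high


def interlocus_balance(locus_depths):
    """Metric 3: lowest-depth locus divided by highest-depth locus (ideal 1.0)."""
    depths = list(locus_depths.values())
    return min(depths) / max(depths)


def depth_variation(locus_depths):
    """Metric 4: depth of each locus divided by the mean depth over all loci."""
    overall = mean(locus_depths.values())
    return {locus: depth / overall for locus, depth in locus_depths.items()}


# Hypothetical depths for two loci, for illustration only.
example_depths = {"D3S1358": 50, "TH01": 100}
balance = interlocus_balance(example_depths)
variation = depth_variation(example_depths)
```

A primer mix could then be ranked by how close metrics 2 to 4 are to 1.0 and by how high metric 1 is.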

Example 3: Improvements in library preparation methods

The library preparation method for SPLiT-DS follows known standard protocols, such as the Duplex Sequencing protocol, up to completion of the first PCR step. This example improves and extends that protocol by refining the steps that occur after the first Duplex Sequencing PCR step, in and especially at the locus-specific PCR, which is unique to the SPLiT-DS technology provided herein.

As a point of reference, reactions are first run using known buffers, primer pool concentrations, and PCR conditions (e.g., as in the standard DS protocol) but applied to the SPLiT-DS method, which serves the purpose of targeted enrichment after the initial Duplex Sequencing PCR and which, in some cases, can then be followed by other forms of targeted enrichment, such as hybrid capture. The effect of these conditions on the multiplex PCR is determined by directly sequencing the reactions on an Illumina MiSeq platform and monitoring specificity, allele coverage ratio at heterozygous loci, inter-locus balance, and depth. The assay will evaluate PCR efficacy (rather than, e.g., error correction), so approximately 100,000-500,000 reads per condition will be used, which allows at least 50 PCR conditions to be analyzed per sequencing run.

In this particular example, an average of 3 to 10 sequenced PCR copies should be obtained from each starting DNA molecule (i.e., barcode family) to achieve successful analysis. In other embodiments, successful analysis can be defined as recovery of one or more copies of each original DNA strand of a particular duplex molecule. It is contemplated that more than 3-10 copies may reduce assay efficiency in terms of sequencer resource use without yielding additional useful data. It is contemplated that too few average copies per strand will fail to meet the defined criteria for successful analysis and will ultimately reduce depth. It is contemplated that, in some embodiments, defining successful analysis as achieving a minimum number of sequenced copies per strand facilitates Duplex Sequencing of higher accuracy than Duplex Sequencing in which each original strand has a smaller minimum required copy number.

SPLiT-DS cannot rely on known conditions for DNA input (e.g., those known from other assays) because it is a unique method compared with other currently available technologies. The amount of DNA input to use in the PCR that occurs after the split will therefore be determined, because varying (e.g., reducing) the input amount for the first PCR step necessarily affects post-processing depth.

After the DNA input range has been determined, a qPCR-based assay is used to quantify the absolute amount of adapter-ligated target DNA (similar to, e.g., step 3 in Fig. 4).

Accuracy, precision, sensitivity, and specificity with decreasing DNA input

Accuracy, precision, sensitivity, and specificity measured on Standard Reference Material (SRM) DNA serve as a reference point for improving the technology as described herein. SPLiT-DS is then performed on decreasing amounts of input DNA (i.e., sensitivity) using serial dilutions (e.g., in the range of about 50 pg to about 10 ng) (e.g., to evaluate the accuracy and precision of the method). At least six different libraries are prepared independently for each DNA input. After sequencing and error correction (using in-house software developed and designed specifically for the SPLiT-DS variant of Duplex Sequencing), accuracy is evaluated using STRait Razor by: (i) genotyping the processed data; and/or (ii) determining the percentage of reads showing the "correct" genotype at each CODIS locus (i.e., as known for the standard sample). Precision is evaluated by determining: (i) allele coverage ratio at heterozygous loci; (ii) inter-locus balance; (iii) depth variation; and/or (iv) percent stutter (e.g., quantification of between-sample variation).

Detection of contaminating DNA

This example also focuses on improving currently available DNA evaluation methods in order to detect contamination of a given sample by exogenous DNA (e.g., human forensic DNA contaminated with non-human DNA). SPLiT-DS analysis is performed on human DNA samples in the presence of contaminating DNA (e.g., mouse, dog, cow, chicken, Candida albicans, Escherichia coli, Staphylococcus aureus, etc.). The analysis includes mixing, in triplicate, 10 ng of contaminating DNA and sample DNA at the following ratios (contaminant:sample DNA, by mass): 50:50, 10:1, and 100:1, together with 100:0 (i.e., no human DNA) and 0:100 (unspiked human DNA) controls. Each successfully generated library is sequenced and mapped to the reference genome corresponding to the given contaminant and to the human genome (GRCh38). The mapping is used to determine the percentage of reads showing the correct genotype (e.g., aligned to the reference genome) at each locus, which is compared with the values for the controls. The comparison provides information about the upper and lower limits of contaminating DNA that still permit successful SPLiT-DS (i.e., there may be levels of contaminating DNA that do not adversely affect the accuracy and/or robustness of SPLiT-DS).

Example 4: Validation of SPLiT-DS on single-source samples

To validate SPLiT-DS as a feasible, high-accuracy genotyping method for a representative population, DNA purified from cells from the Personal Genome Project (PGP) is used (see, e.g., the demographic summary of PGP details in Table 3).

Table 3:PGP Sample details

Evaluating the ability of SPLiT-DS to correctly genotype single-source DNA samples

SPLiT-DS is performed in duplicate on DNA purified from cell lines of independent individuals from the PGP. DNA from approximately 110 unique individuals is tested. SPLiT-DS is performed using the appropriate amount of DNA as determined in, e.g., the previous example (i.e., the minimum amount that reliably (e.g., >80%) generates sequencing libraries with >60X mean post-processing depth at each locus). After sequencing and error correction using the in-house SPLiT-DS software described herein, STRait Razor is used to genotype the samples.

As an interpretation guide for genotyping our SPLiT-DS data, a modified 'consensus' approach across the two replicates is used, as follows:

No result: when at least one replicate (e.g., one of two) yields low coverage (e.g., <60x).

Correct genotype: when all replicates (e.g., two of two) yield the expected genotype (i.e., matching the genotype in the WGS data for the given sample).

Undefined genotype: when the replicates (e.g., two of two) yield different genotypes at a given locus, or when only one genotype differs from the WGS data.

Incorrect genotype: when all replicates (two of two) show the same incorrect genotype.
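The interpretation guide above can be read as a small decision procedure. The following Python sketch is one possible encoding under stated assumptions: genotypes are represented as pairs of repeat counts, the depth threshold defaults to 60, and the function name and return labels are hypothetical rather than taken from the SPLiT-DS software.

```python
def classify_locus(replicate_genotypes, replicate_depths, expected_genotype, min_depth=60):
    """Classify one locus from replicate calls using the modified consensus rules.

    replicate_genotypes: list of (allele, allele) tuples, one per replicate
    replicate_depths:    list of post-processing depths, one per replicate
    expected_genotype:   (allele, allele) tuple from the reference (e.g., WGS) data
    """
    # Order within a genotype does not matter, so normalize each call.
    calls = [tuple(sorted(g)) for g in replicate_genotypes]
    expected = tuple(sorted(expected_genotype))

    # Rule 1: any low-coverage replicate gives no result.
    if any(depth < min_depth for depth in replicate_depths):
        return "no result"
    # Rule 2: every replicate matches the expected genotype.
    if all(call == expected for call in calls):
        return "correct"
    # Rule 4: replicates agree with each other but not with the reference.
    if len(set(calls)) == 1:
        return "incorrect"
    # Rule 3: replicates disagree with each other.
    return "undefined"
```

For example, two replicates calling (15, 17) and (17, 15) at depths 80 and 90 against an expected (15, 17) would be classified as correct, while the same calls with one replicate at depth 40 would give no result.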

The amount of stutter is quantified for all samples and loci by determining a stutter ratio for each sequenced locus. The stutter ratio is calculated by dividing the read count of a given stutter allele by the read count of the true sample allele. If more than one type of stutter event is observed, the calculation is carried out for each stutter length. To minimize bias in this analysis, stutter ratios are calculated only at loci with a mean post-processing depth of ≥60X (80% power to detect ≥1 read containing an alternative stutter allele occurring at 5%; one-sample binomial test). Where consistently greater depth is obtained for at least some loci, lower-frequency stutter events are examined and ratios are calculated accordingly (e.g., with adjusted power).
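As a minimal sketch of the stutter (slippage) ratio calculation described above, the following Python function divides the read count of each non-parental allele by the read count of the true allele and reports one ratio per stutter length. The simplifying assumptions are that a single true allele is considered at a time, that alleles are given as integer repeat counts, and that the function name and input layout are hypothetical.

```python
def stutter_ratios(read_counts, true_allele, mean_depth, min_depth=60):
    """Return {repeat-length offset: stutter ratio}, or None if depth is too low.

    read_counts: dict mapping allele (repeat count) to post-processing reads
    true_allele: the known sample allele at this locus
    mean_depth:  mean post-processing depth at this locus
    """
    # Ratios are only calculated at loci that meet the depth threshold.
    if mean_depth < min_depth:
        return None
    true_reads = read_counts[true_allele]
    return {
        allele - true_allele: reads / true_reads  # e.g., -1 is one repeat shorter
        for allele, reads in read_counts.items()
        if allele != true_allele
    }


# Hypothetical counts: true allele 15, with -1 and +1 repeat stutter products.
ratios = stutter_ratios({15: 200, 14: 10, 16: 2}, true_allele=15, mean_depth=212)
```

A heterozygous locus would need the calculation repeated for each true allele, with care taken where the stutter position of one allele coincides with the other allele.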

Another part of the analysis in this example covers the effect of STR length on various parameters, comparing the results and parameters against STR length at a given locus (e.g., specificity, allele coverage ratio at heterozygous loci, inter-locus balance, and/or depth). It is contemplated that evaluating these parameters improves the interpretation of STR-length-based polymorphisms (including, e.g., because the samples evaluated by SPLiT-DS are drawn from a generally outbred population and may, for example, carry a variety of STR length polymorphisms). In addition to evaluating the effect of STR length, stutter ratios are also determined. Finally, the discriminatory power for each sample is calculated (based on loci correctly genotyped according to the guidance described herein, e.g., using expected allele frequencies in the U.S. population).

The results of the analysis described in this example can determine the breadth of application of SPLiT-DS (and the extent of any bias in the method), for example across various sample types and/or for genotyping STRs.

Comparison and concordance of capillary electrophoresis and MPS methods

To demonstrate the superiority of SPLiT-DS as a sequencing method for forensic applications, a concordance study against, e.g., currently available methods is performed. Currently, the "gold standard" for forensic STR genotyping is PCR-CE. Following standard procedures, the SPLiT-DS results obtained according to the examples described herein are compared with the same DNA samples genotyped using PCR-CE analysis and 1 ng of input DNA. The two datasets (PCR-CE and SPLiT-DS, together with appropriate controls/references (e.g., WGS PGP sample data)) make it possible to determine the level of concordance between the two methods. A concordance study is also performed using a commercially available kit (e.g., the Illumina FORENSEQ DNA Signature Prep Kit), which uses targeted PCR amplification of 63 STRs, including the CODIS loci, and 95 identity-informative SNPs. The same samples used in the PCR-CE and SPLiT-DS concordance study are used, and genotyping is performed using STRait Razor. PCR stutter is also examined in each method (PCR-CE, commercial kit, SPLiT-DS), and stutter is calculated if the true allele peak height is at least 600 RFU (stochastic threshold) but no more than 15,000 RFU. To eliminate any additive effect of plus and minus stutter at repeat positions between heterozygous alleles, positions two repeat units apart are not included. As described herein, percent stutter is calculated by dividing the peak height of the stutter peak by the peak height of the true allele. For samples analyzed with the commercially available kit, all alleles with ≥60 observed reads are called, and percent stutter is calculated as described herein. Percent stutter is compared at each tested locus. Although it is contemplated that stutter results cannot be compared directly between platforms, the data will provide a reasonable estimate of the relative abundance of stutter in each method.

Example 5: Validation of SPLiT-DS on damaged DNA and DNA mixtures

Highly damaged/degraded DNA and mixtures confound currently available genotyping technologies. Accordingly, this example will demonstrate that the ability of SPLiT-DS to correctly genotype samples containing damaged DNA and DNA mixtures improves upon and extends currently available methods.

Validation of SPLiT-DS on damaged DNA from a single contributor

SPLiT-DS is performed on DNA exposed to three forensically relevant categories of damage: (i) chemical exposure; (ii) ultraviolet (UV) light; and (iii) high temperature (for a summary of exemplary exposure methods/conditions used in previous studies and known to affect conventional STR analysis, see Table 4). Because no SRM is available for damaged DNA samples, the level of induced damage is standardized between biological replicates. DNA is first exposed to the environmental conditions and time points as in Table 4, and an assessment using a commercially available kit (e.g., the KAPA Biosystems hgDNA Quantification and QC qPCR kit (Roche/KAPA Biosystems)) is used to measure DNA damage/degradation in a given sample. For a given environmental condition, only samples exhibiting comparable levels of damage (defined as within one standard deviation of the observed mean, as determined by the assay described herein) are used in the analysis for this example.

Experiments evaluating SPLiT-DS on damaged/degraded DNA are performed in triplicate on Promega 2800M SRM DNA, using the minimum amount of input DNA needed to consistently (>50%) form a library that can be sequenced with SPLiT-DS (such amounts being determined as described herein) and the harshest possible conditions in each category of Table 4. It is contemplated that conditions that do not yield consistent libraries define the sensitivity limit of SPLiT-DS for damaged/degraded DNA. Any such libraries are not evaluated.

Table 4:DNA damaging condition.

Samples are also sequenced on an Illumina MiSeq platform using 300 bp paired-end reads, and the data are processed using custom SPLiT-DS software and genotyped using STRait Razor as described herein. It is contemplated that experimental conditions that prevent correct genotyping (as described in the previous example) define the accuracy limit of SPLiT-DS for damaged/degraded DNA. Calculations are also performed to determine, for damaged/degraded DNA, the specificity, the allele coverage ratio at heterozygous loci, and/or the depth at each locus, and the results are compared with undamaged controls.

Because the relative performance of SPLiT-DS on high-quality DNA does not necessarily translate directly to damaged DNA, a comparison of SPLiT-DS with standard PCR-CE and MPS methods is also performed. These methods are performed using 10 of the PGP samples genotyped in the previous example, with the samples further subjected, for each damage type, to the most challenging conditions under which SPLiT-DS samples were successfully genotyped (as determined from the results above). As described in the previous example, samples are genotyped by PCR-CE and conventional MPS using appropriate commercially available kits. The relative performance of SPLiT-DS versus PCR-CE and MPS is determined as described herein, including determining and comparing between methods the relative amount of stutter, allele drop-out, intra-allelic balance, and genotyping success rate. Compared with what is achievable using other methods, SPLiT-DS can provide more sensitive and accurate results using smaller samples and/or more damaged/degraded DNA samples.

Validation of SPLiT-DS on mixtures

The improvement that SPLiT-DS analysis offers for DNA mixtures (e.g., improved accuracy and sensitivity compared with available methods) is demonstrated using DNA mixtures formed from two genetically independent individuals across a broad range of MAF ratios. For each mixture in Table 5, 10 two-person combinations are selected from the PGP samples genotyped in the previous example. The specific PGP samples used in this example depend on their specific genotypes, as determined in the previous example or from their whole-genome sequences (available as part of the PGP). If possible, contributor pairs are selected that differ by at least two repeat lengths at ≥8 loci. It is thought likely that more than 10 ng of DNA from each sample will be needed. The exact amount is determined by how efficiently SPLiT-DS works at each locus, as determined in the previous example.

Table 5:DNA mixture conditions

DNA input is adjusted so that any minor contributor is represented by at least ten reads. It is contemplated that representation by at least 10 reads confers a >95% chance of detecting both alleles at all CODIS loci. The specific amount required to achieve 10 MAF reads depends on the sensitivity limit of SPLiT-DS, as demonstrated in the previous example.

To minimize variability between replicates, mixtures are constructed on the basis of triplicate DNA quantification using the QUANTIFILER Duo DNA Quantification Kit (Thermo Fisher). As described herein, samples are sequenced on an Illumina MiSeq platform, the data are processed using the custom SPLiT-DS software described herein, and genotyping is performed using STRait Razor. Assessing the presence of stutter in these experiments contributes to the evaluation of SPLiT-DS performance on DNA mixtures. For each locus analyzed in each mixture sample, the Wilson score interval (a form of binomial proportion confidence interval) for the known MAF is calculated. The number of stutter events differing by one repeat length from the known MAF alleles in the mixture is also counted. If the stutter read count falls within the 95% Wilson score interval of one of the MAF alleles, the genotype is considered a partial match. If both MAF alleles fail this test, the locus is considered a failed genotype call (homozygous alleles fail automatically if MAF and stutter cannot be distinguished). As in the previous example, a comparative study evaluating SPLiT-DS against PCR-CE and MPS is also performed as described herein, comparing the relative amount of stutter, allele drop-out, intra-allelic balance, and/or genotyping success rate. The results of the two-person mixture experiments are subsequently used to carry out three-person mixture experiments (see, e.g., Table 5), using the same sample selection criteria and analysis as in the two-person mixture analysis.
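For reference, the Wilson score interval named above can be computed as in the following Python sketch. The interval formula is the standard one; the helper that compares a stutter read count against the interval is only one possible reading of the matching rule, and both function names are assumptions introduced here.

```python
from math import sqrt

Z_95 = 1.959963984540054  # two-sided 95% normal quantile


def wilson_interval(successes, n, z=Z_95):
    """Wilson score interval for a binomial proportion successes/n."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half_width = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half_width, center + half_width


def stutter_within_maf_interval(stutter_reads, maf_allele_reads, depth):
    """Assumed reading of the rule: is the stutter read fraction inside the
    95% Wilson interval around the minor-contributor allele fraction?"""
    low, high = wilson_interval(maf_allele_reads, depth)
    return low <= stutter_reads / depth <= high


# Hypothetical locus: 10 minor-allele reads out of 100 post-processing reads.
low, high = wilson_interval(10, 100)
```

Unlike the simple normal-approximation interval, the Wilson interval stays within [0, 1] and behaves sensibly at the low read counts expected for a minor contributor.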

SPLiT-DS is also performed on mock case samples of single-source and two-person mixtures using DNA provided by the Washington State Patrol Forensic Laboratory Services Bureau, the DNA originating from previously analyzed, commercially obtained forensic DNA proficiency tests. Genotyping with SPLiT-DS is compared with the consensus results published online for the samples.

Example 6: Improved performance of SPLiT-DS on damaged DNA samples

Formalin fixation causes extreme DNA damage in the form of cytidine deamination, oxidative damage, and cross-linking. To demonstrate the capability of SPLiT-DS compared with currently available methods, highly damaged DNA was analyzed by sequencing formalin-fixed nuclear DNA at the D3S1358 locus of the Promega 2800M SRM (Figures 13B and 14A). Figures 13A-13C show data from a SPLiT-DS procedure in accordance with one embodiment of the present technology. Figure 13A is a representative gel showing insert size before sequencing (lane 1 is the ladder; lanes 2 and 3 are samples of the PCR product from each tube; see, e.g., step 4 of Fig. 4). Figures 13B and 13C are plots of CODIS genotype versus number of sequencing reads in the absence of error correction (Figure 13B) and after analysis with SPLiT-DS (Figure 13C). Figure 13B shows a sample (D3S1358) with polymorphisms observed in the absence of error correction; stutter events are indicated by black arrows. Figure 13C shows a sample (D3S1358-DCS) with no detectable stutter events after SPLiT-DS analysis. The x-axis of each of Figures 13B and 13C indicates CODIS genotype, and the y-axis indicates number of reads.

Figures 14A and 14B are plots showing, for highly damaged DNA, CODIS genotype versus number of sequencing reads in the absence of error correction (Figure 14A) and after analysis with SPLiT-DS (Figure 14B), in accordance with one embodiment of the present technology. The x-axis of each plot indicates CODIS genotype, and the y-axis indicates number of reads. Figure 14A shows a damaged DNA sample (D3S1358) not analyzed by SPLiT-DS, and demonstrates stutter events (black arrows) and a significant number of apparent point mutations (not shown). Figure 14B shows the sample analyzed with SPLiT-DS error correction (D3S1358-DCS), and demonstrates the absence of detectable stutter events. No apparent point mutations were observed.

The SPLiT-DS results on formalin-exposed DNA demonstrate that SPLiT-DS eliminates all of the PCR- and sequencing-based artifacts present with standard sequencing methods (Figures 13C and 14B). A reduction in efficiency (approximately 3-fold) was noted for these samples (see, e.g., Figure 14B relative to Figure 13C); however, the presence of interstrand cross-links, which are common with formalin fixation, may have contributed to this reduction.

Example 7: Targeted genome fragmentation

This example demonstrates targeted genome fragmentation as a method for improving the efficiency of genomic DNA (gDNA) sequencing. Genome fragmentation for SPLiT-DS is typically achieved by methods such as physical shearing or enzymatic digestion of DNA phosphodiester bonds. Such methods yield samples in which intact gDNA is reduced to a mixture of randomly sized DNA fragments. Although highly robust, variably sized DNA fragments can cause PCR amplification bias (short fragments amplify more) and uneven sequencing depth (Figure 11A), as well as sequencing reads that do not overlap the region of interest within a DNA fragment. Accordingly, this example uses CRISPR/Cas9 to overcome these problems. Cleavage sites are designed to produce fragments of predetermined and uniform size. A more homogeneous set of fragments is considered likely to overcome the bias and/or the presence of uninformative reads that can affect efficiency in other technologies that do not use targeted fragmentation. It is also contemplated that targeted fragmentation may facilitate pre-enrichment of a given sample before library preparation because, owing to the consistency of/difference in fragment size, large off-target regions can be removed by separating the fragments from the gDNA.

Example 8: SPLiT-DS for monitoring and diagnosing cancer

The presence of circulating tumor DNA in blood has been recognized for decades, but ultrasensitive methods are needed to reliably develop cancer biomarkers (e.g., markers for diagnosing and/or tracking disease presence/progression). SPLiT-DS helps overcome common challenges, including small amounts of circulating tumor DNA in blood samples containing varying amounts of cell-free DNA. SPLiT-DS also improves upon and extends several high-sensitivity and high-specificity methods known in the art, such as BEAMing, SafeSeqS, TamSeq, and ddPCR, because it does not require prior knowledge of specific mutations. SPLiT-DS provides a method capable of detecting cancer-associated mutations with the highest level of accuracy currently available, with low DNA input, and without prior knowledge of tumor-specific mutations.

This example evaluates sequences associated with circulating tumor cell DNA using SPLiT-DS. Control samples with known mutations are used and are run together with samples from patients with diagnosed and/or suspected cancer.

SPLiT-DS with genomic or cell-free DNA

SPLiT-DS is used to develop assays for accurate sequencing of low-input gDNA (10-100 ng) and cfDNA (~10 ng). Genomic DNA generally occurs as large fragments (>1 kb), whereas cell-free DNA occurs almost exclusively as ~150 bp fragments at low abundance.

Rationale for low-input (10-100 ng) gDNA

This example demonstrates the feasibility of SPLiT-DS with low DNA input and its suitability for multiplexing. Although tissue can be obtained from biopsies of cancer patients, such samples are preferably used conservatively so that all necessary tests can be completed. Accordingly, sequencing of gDNA would benefit from an improved platform, such as that provided by SPLiT-DS, that requires less input material.

Each target in SPLiT-DS is designed and optimized separately. The genes TP53, KRAS, and BRAF are assayed as a proof of principle. In particular, each gene has known target regions in which cancer-associated mutations occur. TP53 has 10 coding exons (of relatively small size), all of which are targeted using SPLiT-DS. KRAS has known mutation hotspots at codons 12, 13, and 61 in exon 2, all of which are targeted. BRAF has the V600E mutation in exon 15, which will be targeted.

Materials and methods

SPLiT-DS assays are performed on gDNA, as outlined in Figures 4 and 5, using DNA from de-identified tumors with known clonal mutations in TP53, KRAS, and BRAF, and leukocyte gDNA from cancer-free individuals. Two different sets of experiments are performed, to carry out any optimization/validation steps and to test efficiency and sensitivity.

Efficiency

Efficiency is defined as the percentage of input DNA molecules that are converted into DCS reads. The efficiency targeted in this example is at least 30%, and ideally >50%. It is thought likely that 10 ng of input DNA will reach a mean DCS depth of 1000x across the target loci (10 ng ≈ 3200 genomes, so 3200 × 0.3 efficiency ≈ 1000 sequenced genomes). Efficiency depends in part on the performance of the multiplex PCR. Using in silico methods, PCR primers are designed to have: i) high target specificity; ii) the ability to be multiplexed; and iii) the ability to perform robust amplification with minimal bias.
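The depth arithmetic in the preceding paragraph can be checked with a short Python sketch. The conversion factor of 320 genomes per ng is taken from the text's own approximation (10 ng ≈ 3200 genomes); the function and constant names are assumptions introduced for illustration.

```python
GENOMES_PER_NG = 320  # from the text's approximation: 10 ng is about 3200 genomes


def input_genomes(input_ng):
    """Approximate number of genome copies in a given mass of human DNA."""
    return input_ng * GENOMES_PER_NG


def expected_dcs_depth(input_ng, efficiency):
    """Expected mean DCS depth: input genomes times conversion efficiency."""
    return input_genomes(input_ng) * efficiency


def observed_efficiency(mean_dcs_depth, input_ng):
    """Efficiency implied by an observed mean DCS depth."""
    return mean_dcs_depth / input_genomes(input_ng)


depth_10ng = expected_dcs_depth(10, 0.30)    # roughly 960, i.e. about 1000x
depth_100ng = expected_dcs_depth(100, 0.30)  # roughly ten times higher
```

The same relation can be inverted to estimate efficiency from a measured depth, which is how an observed mean depth can be reported as a percentage efficiency.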

The CRISPR/Cas9 system is used to specifically generate ~500-550 bp fragments that contain specific regions of interest (see Figure 11C). After the design of guide RNAs and PCR primers is complete, the combined method is used to achieve: (i) target specificity (i.e., percentage of on-target reads; acceptable >70%); and (ii) inter-locus depth balance (i.e., the lowest-depth locus divided by the highest-depth locus; acceptable >0.5). The optimized guide and primer pools are then applied identically to 10 ng and 100 ng of gDNA. These pools are used for all subsequent experiments involving gDNA.

Sensitivity

TP53-mutant tumor gDNA is spiked into control, unmutated leukocyte gDNA at ratios of 1:2, 1:10, 1:100, 1:1000, and 1:10,000. The same mixing experiments are performed with two other tumor DNAs containing known clonal mutations in KRAS and BRAF, respectively, for a total of 15 samples (5 dilutions for each of 3 genes). As described herein, these 15 samples are processed by SPLiT-DS using 10 ng and 100 ng of input DNA. "Expected" and "observed" MAFs are compared (using the guidance that the maximum MAF is determined by MAFmax = 1/(αN), where N is the number of genomes and α is the efficiency of SPLiT-DS; e.g., for 30% efficiency, MAFmax is 0.1% for 10 ng of DNA and 0.01% for 100 ng of DNA).

Based on the binomial distribution, it is thought likely that a given mutation present at MAFmax will be detected with a probability of 63%. Because there are 3 spiked mutations in the experiment, it is statistically more likely that at least one will be detected at 0.1% and 0.01%, and this probability will increase as efficiency rises above 30%.
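The 63% figure follows from the binomial distribution if the guidance is read as MAFmax = 1/(αN), with N genomes and efficiency α: the chance of sampling at least one mutant molecule among αN sequenced genomes is 1 − (1 − 1/(αN))^(αN), which approaches 1 − 1/e. The Python sketch below works this through; the function names are hypothetical, and the 3200-genome figure for 10 ng is the approximation used elsewhere in the text.

```python
def maf_max(efficiency, n_genomes):
    """Assumed reading of the guidance: MAFmax = 1 / (efficiency * N)."""
    return 1.0 / (efficiency * n_genomes)


def p_detect(maf, efficiency, n_genomes):
    """Probability of sampling at least one mutant molecule (binomial model)."""
    sequenced_genomes = efficiency * n_genomes
    return 1.0 - (1.0 - maf) ** sequenced_genomes


def p_detect_at_least_one(p_single, n_mutations):
    """Probability that at least one of several independent mutations is seen."""
    return 1.0 - (1.0 - p_single) ** n_mutations


alpha, genomes_10ng = 0.30, 3200
limit = maf_max(alpha, genomes_10ng)               # about 0.1%
p_one = p_detect(limit, alpha, genomes_10ng)       # about 0.63
p_any_of_three = p_detect_at_least_one(p_one, 3)   # about 0.95
```

Under these assumptions, raising the efficiency above 30% increases the number of sequenced genomes and therefore the detection probability at a fixed MAF.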

In addition to spiked mutations, SNPs will be used to confirm sensitivity, because the normal control DNA comes from a different individual than the tumor DNA. SNPs are examined at the same dilutions (homozygous SNPs) and at effective dilutions of 1:4, 1:20, 1:200, 1:2000, and 1:20,000 (heterozygous SNPs).

CRISPR/Cas9 can efficiently cut all TP53 exons, and size selection promotes enrichment and maximizes read usage. CRISPR/Cas9 guides are designed to cut the TP53 exons (see Figure 12A). As described in the previous example, 10 ng of gDNA is digested and processed using SPLiT-DS (see Figures 12B and 12C), with exons 5-6 and 7 amplified using appropriate PCR primers (Figures 12C and 12D). After complementary random tags are matched for each molecule, both DNA strands are properly sequenced with a high percentage of on-target reads, yielding DCS reads (Figure 12D). In addition, for a starting amount of 10 ng of DNA, a mean depth corresponding to 25% efficiency was obtained (i.e., from an original ~3000 genomes, a mean of ~800X was sequenced), representing a 50-fold improvement over standard DS and an unprecedented improvement over conventional solution hybridization methods.

Example 9: Development of SPLiT-DS for accurate sequencing of cfDNA

This example demonstrates the use of SPLiT-DS to detect mutations in exemplary cancer-associated genes in cfDNA: TP53, KRAS, and BRAF.

Materials and methods

Cell-free DNA is extracted from commercially available plasma (Conversant Bio) using the QIAamp Circulating Nucleic Acid kit. Three different synthetic 150 bp DNA molecules are used, each encoding a known mutation in one of the three genes of interest. These synthetic DNA molecules are each spiked into cfDNA at ratios of 1:2, 1:10, 1:100, 1:1000 and 1:10,000. Two different sets of experiments are performed to optimize and validate the SPLiT-DS strategy parameters for cfDNA.

Efficiency

Because cfDNA is already fragmented, no cutting (e.g., with CRISPR/Cas9) is needed. SPLiT-DS is therefore performed as described in the previous example, with the addition of nested PCR. The resulting fragments are sequenced using a MiSeq v3 150-cycle kit, with approximately 10 samples multiplexed per cartridge, for a total of 2.5 million reads per sample.

Sensitivity

The five mixed dilutions (1:2, 1:10, 1:100, 1:1000, 1:10,000) of each of the TP53, KRAS and BRAF mutations in cfDNA are analyzed by SPLiT-DS, using the optimized primers designed in this example and starting with 10 ng and 100 ng of DNA. The experiments are run in parallel with SafeSeqS to compare sensitivity between the technologies (SafeSeqS is a known technology for accurate sequencing of ctDNA that reduces NGS errors by using single-strand correction). It is believed likely that SPLiT-DS outperforms SafeSeqS for detection of mutations at MAF = 0.1% and 0.01%. It is believed likely that SPLiT-DS can detect the spiked mutations with an estimated average sensitivity of 0.5% (Table 2), whereas SafeSeqS cannot detect any spiked mutation at such low frequencies.

Primers (for the nested PCR approach) are designed to amplify codons 12 and 13 in KRAS exon 2. 10 ng and 20 ng of cfDNA extracted from normal plasma (Conversant Bio) are processed in parallel. Figures 15A and 15B graphically show SPLiT-DS sequencing data for KRAS exon 2, generated using nested PCR from 10 ng (Figure 15A) and 20 ng (Figure 15B) of cfDNA, according to one embodiment of the present technology. In this example, target enrichment is accomplished with SPLiT-DS, and sequencing is performed on an Illumina MiSeq with 75 bp paired-end reads. The SSCS for both the 'A' and 'B' strands before duplex formation are shown, together with the final DCS reads. Arrows indicate the two locus-specific PCR primers (grey primer = nested PCR primer).

As shown in Figures 15A and 15B, "side A" and "side B" correspond to the two different DNA strands, which are appropriately amplified and matched with their complementary strands to form highly accurate DCS reads. Although the depth obtained is modest (~50 reads), it corresponds to ~1% efficiency, which is the current efficiency of standard DS. Thus, at baseline (without any optimization), SPLiT-DS achieves results with the same efficiency as currently used methods while using as little as 10 ng of input DNA, demonstrating improved efficiency over other available methods for sequencing cfDNA, including with small quantities.
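The way the 'A' and 'B' strand families collapse first into SSCS and then into a DCS read can be illustrated with a toy sketch. It is not the pipeline used in this example. The tag representation (an (alpha, beta) pair whose partner family carries (beta, alpha)), the assumption that all reads are already oriented to the same reference strand and have equal length, and the use of 'N' for disagreeing positions are simplifications introduced here for illustration.

```python
from collections import Counter, defaultdict

def sscs(reads):
    """Single-strand consensus: most frequent base at each position of one tag family."""
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*reads))

def dcs(sscs_a, sscs_b):
    """Duplex consensus: keep a base only where both strand consensuses agree."""
    return "".join(a if a == b else "N" for a, b in zip(sscs_a, sscs_b))

def duplex_consensus(tagged_reads):
    """tagged_reads: iterable of ((alpha, beta), sequence) pairs (hypothetical input format)."""
    families = defaultdict(list)
    for tag, seq in tagged_reads:
        families[tag].append(seq)
    results = {}
    for (alpha, beta), reads in families.items():
        partner = (beta, alpha)
        # Visit each duplex once; families without a complementary-strand partner are skipped.
        if partner in families and (alpha, beta) < partner:
            results[(alpha, beta)] = dcs(sscs(reads), sscs(families[partner]))
    return results

reads = [
    (("AAC", "GGT"), "ACGTAC"),
    (("AAC", "GGT"), "ACGTAC"),
    (("AAC", "GGT"), "ACTTAC"),  # error in one copy, outvoted within the 'A' family
    (("GGT", "AAC"), "ACGTAC"),
    (("GGT", "AAC"), "ACGTAC"),
    (("GGT", "AAC"), "ACGTAA"),  # error in one copy, outvoted within the 'B' family
]
print(duplex_consensus(reads))
```

In this sketch an error that dominates only one strand family would survive into that family's SSCS but would be masked as 'N' in the DCS, which is the property that makes duplex reads more accurate than single-strand consensus alone.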

Example 10: SPLiT-DS for ctDNA-based detection and prognosis of pancreatic cancer

This example demonstrates the improvement (compared with currently available methods) obtained by using SPLiT-DS to detect mutations in the ctDNA of patients with pancreatic ductal adenocarcinoma (PDAC). SPLiT-DS provides the improved sensitivity of ddPCR in multiple target genes (including KRAS, TP53 and BRAF). It is believed likely that the results of these assays demonstrate improved sensitivity over current methods, with one mutation detected in 95% of PDAC patients and two mutations detected in > 50% of PDAC cases.

In addition, because most DNA in the circulation of a human subject (i.e., in the circulatory system, e.g., cell-free DNA) is of hematological origin, leukocyte DNA is sequenced and its mutations are compared with those found in cfDNA. These results indicate, with greater sensitivity and accuracy than other results, whether certain background mutations originate from leukocyte subclones.

Materials and methods

Fully de-identified cfDNA and matched leukocyte DNA samples are evaluated from 40 patients with PDAC, 20 patients with chronic pancreatitis and 20 age-matched normal controls. Blood samples are processed within two hours of collection and are provided as samples comprising 2-5 ml of plasma and 500 μl of buffy coat. In addition, for PDAC patients, a piece of frozen tumor is available to confirm tumor mutations. For all PDAC patients, blood is obtained before surgery. All patients are followed clinically, and detailed clinicopathological information is available, including time to recurrence and death. The patient samples include samples from 20 patients with localized cancer and 20 patients with metastatic cancer.

ctDNA is extracted with the QIAamp Circulating Nucleic Acid Kit, and gDNA is extracted with the QIAamp DNA Mini Kit. Using the SPLiT-DS procedure as described herein, 10 ng or more, as appropriate, of cfDNA (from the collected plasma), 100 ng of gDNA and all available ctDNA (up to 100 ng) are processed, targeting KRAS, BRAF and TP53. Sequencing is performed using the Illumina 150-cycle MiSeq v3 Reagent Kit for ctDNA and the 600-cycle kit for gDNA. In the 150-cycle kit, 10 ctDNA samples are multiplexed, and in the 600-cycle kit, 15 gDNA samples are multiplexed. Based on the experimental design, it is believed likely that an expected efficiency of at least 30% is obtained, with a sequencing depth of at least 1,000x for 10 ng of DNA and up to 10,000x for 100 ng of DNA. Data are analyzed after sequencing, DCS generation and mutation identification.

Pancreatic cancer detection

In this example, the sensitivity and specificity of SPLiT-DS for detecting KRAS, TP53 and BRAF mutations in cfDNA from PDAC patients are determined. To analyze sensitivity, the mutations found in cfDNA are compared with the tumor mutations (clonal and subclonal) identified by SPLiT-DS. Because the SPLiT-DS results provide coverage of nearly all PDAC cases with 1 mutation and > 50% of cases with 2 mutations, for a combined sensitivity of ~90% across all PDAC, it is believed likely that at least one tumor mutation is detected in the cfDNA from all metastatic cases and about 80% of localized cases.

The mutations found in cfDNA are compared with the mutations found in matched leukocytes purified from the same patient. Mutations found in both cfDNA and matched leukocytes are considered biological background and are omitted from the final mutation count in cfDNA. After the shared mutations are subtracted, cfDNA mutations are compared among PDAC, pancreatitis and controls. It is believed likely that cancer mutations have a higher frequency than biological background mutations, even if biological background mutations (e.g., age-related mutations) remain in the sample. The optimal threshold for mutation frequency is determined using ROC models with area under the curve and age adjustment, so as to distinguish cancer from controls with maximum sensitivity and specificity.
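The background-subtraction step described above amounts to a set difference between the cfDNA calls and the matched leukocyte calls. A minimal sketch, in which the variant labels are hypothetical placeholders and not data from this example:

```python
def subtract_background(cfdna_mutations, leukocyte_mutations):
    """Drop cfDNA mutations that are also found in the matched leukocyte DNA."""
    return set(cfdna_mutations) - set(leukocyte_mutations)

# Hypothetical calls written as (gene, protein change).
cfdna = {("KRAS", "G12D"), ("TP53", "R175H"), ("TP53", "R248W")}
leukocytes = {("TP53", "R248W")}  # shared call, treated as biological background

tumor_candidates = subtract_background(cfdna, leukocytes)
print(sorted(tumor_candidates))
```

Only the calls that survive this subtraction would enter the final cfDNA mutation count that is compared among PDAC, pancreatitis and control samples.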

Pancreatic cancer prognosis

Because of the increased sensitivity of SPLiT-DS demonstrated in, e.g., the previous example, it is believed likely that, compared with previously available methods, ctDNA can be detected in nearly all (90%) PDAC patients. Instead of a binary variable for the presence of ctDNA (i.e., yes/no), the ctDNA MAF is analyzed as a quantitative variable, and MAF scores are compared with clinical data (e.g., MAF score versus prognosis). It is also determined whether the mutated gene, codon and/or mutation type is associated with recurrence or mortality. Multivariable Cox models, adjusted for confounding factors (including age and stage), are used to test the ability of these variables and combinations thereof to predict disease-free survival and overall survival. Kaplan-Meier curves are used to present the predictive value of categorical variables.

Example 11: SPLiT-DS for identifying resistance mutations in metastatic CRC

Detecting early-stage cancer and predicting recurrence using ctDNA

In metastatic CRC (i.e., stage IV), which represents about 50% of presenting cases, tumor genotyping is necessary to guide treatment decisions: oncogenic mutations in KRAS, NRAS and BRAF occur in about 50% of CRC patients and predict a lack of response to the EGFR monoclonal antibodies cetuximab and panitumumab. These genes are therefore routinely evaluated in both fixed and loose tissue biopsies, but currently available methods often yield low-quality subclonal resolution and suffer from sampling bias. As a result, tumors with subclonal mutations may be missed, and a subset of patients may be given a treatment that is bound to fail. Accordingly, in this example, tumor genotyping of ctDNA using SPLiT-DS demonstrates an assay with improved sensitivity over currently available technologies. Because SPLiT-DS detects pre-existing resistance mutations, the assay also improves diagnosis and treatment by refining patient eligibility for EGFR blockade therapy.

Detection of CRC presence and/or prediction of recurrence

SPLiT-DS is applied to a panel of 5 commonly mutated CRC genes to demonstrate detection of mutations in ctDNA without prior knowledge of any specific tumor mutation. It is believed highly likely that the results from this assay can inform future CRC detection using a much simpler test (e.g., a blood test).

This example also demonstrates an improvement in methods for detecting and/or predicting recurrence. Currently available technologies are limited by insufficient sensitivity and/or specificity or, for technologies with sufficient sensitivity/specificity, by prohibitive cost. Thus, SPLiT-DS analysis of ctDNA demonstrates improvements in CRC detection and in prediction of recurrence, providing improvements in accuracy (e.g., more than 100-fold over, e.g., SafeSeqS) and in the ability to scale to and evaluate multiple genes.

Materials and methods

Samples from > 300 patients who underwent surgical resection of a tumor, with multiple biopsy types per patient, are used in this example. Available biological samples include tumor, plasma and buffy coat. Longitudinal follow-up samples are available from patients for whom blood samples were obtained at baseline and at 6, 12 and 24 months after resection. For all patients, detailed clinicopathological information is available, including recurrence. All samples and coded medical information are fully de-identified. Samples from patients with metastatic disease were previously evaluated for KRAS and NRAS mutations to determine the likelihood of response to cetuximab or panitumumab. If no mutation was found, targeted therapy was administered. Resistance was documented via progression on imaging studies.

Samples from 20 patients with metastatic cancer (stage IV) and 40 patients with localized cancer (stage I-III) are evaluated. DNA is purified from plasma (2-5 ml) and buffy coat obtained before surgery, and from frozen tumor samples. The patients with metastatic cancer are selected to be those who tested negative for KRAS and NRAS mutations but did not respond to EGFR inhibitor therapy. At least 10 patients with recurrence are also included. ctDNA is measured in blood collected at 6, 12 and 24 months after surgery. As in the previous example, leukocyte DNA mutations are used to identify potential biological background mutations that may be present in cfDNA.

In addition, because APC is the most frequently mutated gene in CRC, the SPLiT-DS panel used in this example includes the most frequently mutated regions of APC, such as the mutation cluster region, which extends from codon 1,286 to codon 1,585 (299bp) and covers about 60% of the CRC mutations in APC52, together with additional top hits found in COSMIC, for a total of ~1000 bp. NRAS codons 12, 13 and 61 are also included. Thus, for an overall size of ~2700 bp, the panel used in this example includes APC (~1000 bp), TP53 (1182 bp coding region), KRAS (codons 12, 13, 61), BRAF (V600E) and NRAS (codons 12, 13, 61). It is believed likely that the panel described in this example covers all CRC samples with one mutation and a subset of those with two mutations.

Identification of resistance mutations in metastatic CRC

SPLiT-DS is used to evaluate samples from metastatic CRC for clonal tumor mutations in cfDNA. All tumors are negative for KRAS and NRAS mutations but are likely to carry at least one clonal mutation (in APC or TP53) identified by the panel described in this example. SPLiT-DS is also used to determine whether the presence of very-low-frequency (< 0.1%) mutations in ctDNA that confer resistance to EGFR therapy can be detected. It is believed highly likely that samples from patients with metastatic disease are successfully sequenced at very high depth (~10,000x). SPLiT-DS analysis also improves the detection of low- and medium-frequency KRAS, BRAF and NRAS mutations in the ctDNA of patients with metastatic disease who tested negative for KRAS and NRAS by Sanger sequencing of tumor DNA but nevertheless failed EGFR therapy. Tumor DNA is sequenced with SPLiT-DS at similarly high depth to determine the presence or absence of the primary resistance mutations in ctDNA. The results are compared between ctDNA and DNA from intratumoral tissue.

Detection of localized CRC

SPLiT-DS is used on samples from localized (stage I-III) cancer to identify ctDNA, using the panel of 5 CRC genes described herein. Tumor DNA is also sequenced with SPLiT-DS. The presence of biological background mutations derived from leukocytes is also determined, as described in the previous example.

Among methods for detecting recurrence, certain currently available methods (e.g., CEA) provide a 'lead time' estimated at 1.5-6 months, but it is unclear whether such an amount of time affects survival. Other technologies can improve the lead time but require prior knowledge of the tumor genotype. SPLiT-DS is therefore used to sequence ctDNA, and demonstrates a superior ability to improve the lead time by several months while, as described herein, requiring no prior knowledge of the tumor genotype. This example demonstrates the ability of SPLiT-DS to detect ctDNA at 6, 12 and 24 months after initial surgery in patients with localized CRC who experience recurrence. Ten patients are selected on the basis of recurrence, where the tumor and the baseline ctDNA carry at least one (preferably 2) mutation in the genes of the previously described panel. For each sample (individual), the total ctDNA level for each mutation at baseline and at 6, 12 and 24 months is plotted over time against the clinical history (chemotherapy, CT scans and other indicators of recurrence). The lead time to recurrence for ctDNA compared with CEA levels is also evaluated.

Example 12: CRISPR-DS

This example describes the development of CRISPR-DS to perform highly accurate and sensitive sequencing. CRISPR-based technology is used to excise target regions designed to have a predetermined, homogeneous length (Figure 12A). In this example, the CRISPR-associated nuclease used is Cas9. Size control facilitates size selection prior to library preparation (Figure 12B), followed by double-strand barcoding (Figure 12C) to perform error removal (similar to, e.g., the previously described DS method) (Figure 12D). After barcoding, a single round of capture is performed (in contrast to other available methods), which results in very high on-target enrichment, with the ability to generate fragments that cover the full sequencing read (Figures 12F and 16A). Fragmentation for hybrid capture is usually performed by sonication, which often produces fragments that are too long, with sequencing reads that do not overlap the region of interest, and/or too short, with sequencing reads that overlap each other and re-read the same sequence (Figures 12F and 16A). Figures 16B and 16C are histograms showing the fragment insert size of samples prepared with the standard DS and CRISPR-DS protocols, according to one embodiment of the present technology. The x-axis represents the percent difference from the optimal fragment size, e.g., the fragment size that matches the sequencing read length after adjusting for molecular barcodes and clipping. The marked region shows the range of fragment sizes within a 10% difference of the optimal size, where the optimal size is indicated by a vertical dashed line. As shown in Figures 16B and 16C, sonication yields substantial variation in the deviation from the optimal fragment size (Figure 16B), whereas CRISPR/Cas9 digestion yields fragments for which most reads are within the optimal fragment size (Figure 16C).
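The quantity plotted in Figures 16B and 16C, the percent difference from the optimal fragment size, and the share of fragments that fall inside the 10% window can be computed as in the sketch below. The optimal size of 500 bp and both fragment-size lists are made-up values chosen only to contrast a broad distribution with a tight one; they are not measurements from this example.

```python
def percent_difference(fragment_size, optimal_size):
    """Signed percent difference of a fragment from the optimal size."""
    return 100.0 * (fragment_size - optimal_size) / optimal_size

def fraction_within(fragment_sizes, optimal_size, tolerance_pct=10.0):
    """Fraction of fragments whose size is within +/- tolerance_pct of the optimal size."""
    hits = [size for size in fragment_sizes
            if abs(percent_difference(size, optimal_size)) <= tolerance_pct]
    return len(hits) / len(fragment_sizes)

optimal = 500                                           # bp, hypothetical optimal size
broad = [180, 260, 340, 450, 520, 610, 750, 900]        # stand-in for sonicated DNA
tight = [470, 485, 495, 500, 505, 510, 530, 560]        # stand-in for Cas9-cut DNA

print(fraction_within(broad, optimal), fraction_within(tight, optimal))
```

With these illustrative inputs, a quarter of the broad distribution and most of the tight distribution fall inside the window, which mirrors the qualitative contrast between the two histograms.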

This example also demonstrates how false mutations are prevented by using CRISPR-based fragmentation, including, for example, because the enzyme used in this example, Cas9, produces blunt ends, so that no end repair is needed. Thus, the technology provided herein overcomes multiple common and pervasive problems of NGS, including inefficient target enrichment, sequencing errors and non-uniform fragment size.

Guide RNAs (gRNAs) are designed to excise the coding regions and flanking intronic regions of TP53 (Figure 12A). The fragment size is set to ~500 bp. gRNAs are selected based on specificity score and fragment length (Table 1, Figures 17A-17C). Samples with variable amounts of input DNA (10-250 ng) are subjected to CRISPR/Cas9 digestion, followed by size selection with solid-phase reversible immobilization (SPRI) beads to remove undigested high-molecular-weight DNA and to enrich for the excised fragments containing the target regions (Figure 12B). Subsequent library preparation is performed according to currently available standard protocols, but using only one round of capture and minor modifications, as described herein. DNA is A-tailed, ligated to DS adapters, amplified, purified by bead wash, and captured by hybridization with biotinylated 120 bp DNA probes targeting the TP53 exons (Table 6). The captured samples are amplified with index primers and sequenced with an Illumina MiSeq v3 600-cycle kit. Analysis is performed as in the standard protocol, but modified to include generation of consensus sequences prior to alignment (Figure 23).

Table 6. TP53 hybrid capture probes

A side-by-side comparison of standard DS with one or two rounds of hybrid capture versus CRISPR-DS with one round of hybrid capture is shown in Figures 18A-18C. Figure 18A is a bar chart showing the percentage of raw sequencing reads on target (covering TP53) for standard DS with two rounds of capture and for CRISPR-DS with one round of capture. Figure 18B shows the percent recovery, calculated as the percentage of genomes in the input DNA that generate double-strand consensus sequence (DCS) reads. Figure 18C shows the median DCS depth across all target regions, calculated for each input amount of DNA processed with standard DS and CRISPR-DS. Three input amounts (250 ng, 100 ng and 25 ng) of the same DNA, extracted from normal human bladder tissue, are sequenced with the standard protocol (i.e., standard-DS) and with CRISPR-DS. With one round of capture, CRISPR-DS achieves > 90% raw reads on target (e.g., covering TP53) (Table 8, shown below), which represents a significant improvement over standard DS (which achieves ~5% raw reads on target with one round of capture) (Table 8, shown below). A second round of capture minimally increases the raw reads on target in CRISPR-DS (Figure 19). Standard-DS yields a recovery of ~1% across the different inputs (i.e., the percentage of input genomes recovered as sequenced genomes; also referred to as fractional genome-equivalent recovery), whereas CRISPR-DS yields recoveries ranging from 6% to 12%. The recovery of CRISPR-DS translates into a DCS depth (the depth generated by DCS reads) from 25 ng of DNA that is comparable to the depth generated by standard-DS from 250 ng of DNA. The side-by-side comparison of the two methods also confirms that CRISPR-DS can provide the following improvements: no over-representation of short fragments due to PCR amplification bias (i.e., coverage of the region of interest is uniform); distinct bands/peaks that provide confirmation of correct library preparation before sequencing; and well-defined fragments, generated by targeted fragmentation, that completely span the desired target region with uniform coverage (Figure 22E).
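The relationship between recovery and DCS depth can be made concrete with a short calculation. The sketch below uses the conversion of about 330 haploid genomes per ng stated in the data-processing description of this document, the ~1% recovery quoted for standard-DS, and an assumed 10% recovery for CRISPR-DS (a value picked from inside the quoted 6-12% range purely for illustration). The function names are hypothetical.

```python
GENOMES_PER_NG = 330  # approximate haploid genomes per ng of DNA (figure quoted in this document)

def recovery_rate(mean_dcs_depth, input_ng):
    """Fractional genome-equivalent recovery: DCS depth divided by input genomes."""
    return mean_dcs_depth / (input_ng * GENOMES_PER_NG)

def expected_dcs_depth(input_ng, recovery):
    """DCS depth expected from a given input amount at a given recovery."""
    return input_ng * GENOMES_PER_NG * recovery

standard_depth = expected_dcs_depth(250, 0.01)  # standard-DS, 250 ng at ~1% recovery
crispr_depth = expected_dcs_depth(25, 0.10)     # CRISPR-DS, 25 ng at an assumed 10% recovery
print(round(standard_depth), round(crispr_depth))
```

Under these assumptions the two depths coincide, which is the sense in which 25 ng processed with CRISPR-DS is described as comparable to 250 ng processed with standard-DS.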

Materials and methods

Samples

The samples analyzed in this example include de-identified human genomic DNA from peripheral blood and from bladder with and without cancer, and peritoneal fluid DNA. Patient information is available for the peritoneal fluid samples and is used to confirm the presence of tumor mutations. The fluid samples are obtained from the University of Washington Gynecologic Oncology Tissue Bank, which collects samples and clinical information after informed consent under protocol number 27077, approved by the University of Washington Human Subjects Division institutional review board. De-identified frozen bladder samples are obtained from the University of Washington Genitourinary Cancer Specimen Biorepository and from previously fixed or frozen autopsy tissue. DNA was previously extracted using the QIAamp DNA Mini kit (Qiagen, Inc., Valencia, CA, USA) and was never denatured. DNA is quantified with the Qubit HS dsDNA kit (ThermoFisher Scientific). DNA quality is evaluated using the Genomic TapeStation (Agilent, Santa Clara, CA), and the DNA integrity number (DIN) is determined. DIN is a measure of genomic DNA quality, ranging from 1 (very degraded) to 10 (not degraded). Peripheral blood DNA and peritoneal fluid DNA have DIN > 7 (reflecting good-quality DNA without degradation). Figure 19 is a bar chart showing the target enrichment provided by CRISPR-DS with one capture step compared with two capture steps for three different blood DNA samples.

The bladder samples are purposely selected to include different levels of DNA degradation. Bladder DNA samples B1 to B13 have DIN between 6.8 and 8.9 and are successfully analyzed by CRISPR-DS (Table 10, shown below). Samples B14 and B16 have DIN of 6 and 4, respectively, and are used to demonstrate the improvement produced by pre-enriching high-molecular-weight DNA with the BluePippin system (Figures 20A and 20B).

CRISPR guide design.

The gRNAs for excising the TP53 exons are designed to have features including: (1) the ability to generate fragments of ~500 bp covering the TP53 coding region, and (2) the highest MIT website score ("MIT score"; CRISPR.mit.edu:8079/; Table 1 and Figures 17A-17C). For exon 7, the guides are designed to generate a smaller fragment in order to avoid a poly-A tract at the proximal end of the region of interest. A total of 12 gRNAs are designed, which cut TP53 into 7 different fragments (Figure 12A). All gRNAs have an "MIT" score > 60. Cutting quality is evaluated by inspecting the alignment of the final DCS reads with the Integrative Genomics Viewer. Successful guides produce a typical coverage pattern, with sharp edges at the region boundaries and appropriate DCS depth (Figure 22E). If a guide is "unsuccessful", a drop in DCS depth and the presence of long reads spanning beyond the expected cut point are observed; such guides are redesigned as needed. A synthetic GeneBlock DNA fragment (IDT, Coralville, IA) that includes all gRNA sequences separated by random DNA sequences (Table 7) is used to evaluate the guides (Figures 21A-21B). 3 ng of GeneBlock DNA is digested with each gRNA using the CRISPR/Cas9 in vitro digestion protocol described herein. After digestion, the reactions are analyzed on a TapeStation 4200 (Agilent Technologies, Santa Clara, CA, USA) (Figure 21C). The presence of the predefined fragment lengths confirms proper gRNA assembly and the ability of each gRNA to cut its target site.
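The two selection criteria named above (a specificity score above 60 and a fragment close to ~500 bp) can be expressed as a simple filter. This is only a sketch: the `GuidePair` structure, the 100 bp tolerance around the target length, and every candidate listed are hypothetical and introduced here for illustration. They do not come from Table 1.

```python
from dataclasses import dataclass

@dataclass
class GuidePair:
    name: str
    fragment_bp: int    # length of the fragment excised by the two guides
    mit_scores: tuple   # specificity score of each guide in the pair

def passes(pair, min_score=60, target_bp=500, tolerance_bp=100):
    """Keep pairs whose guides all score above min_score and whose fragment is near target_bp."""
    return (all(score > min_score for score in pair.mit_scores)
            and abs(pair.fragment_bp - target_bp) <= tolerance_bp)

candidates = [
    GuidePair("pair_a", 510, (82, 75)),
    GuidePair("pair_b", 495, (58, 90)),  # one guide does not exceed the score cut-off
    GuidePair("pair_c", 820, (88, 91)),  # fragment far from the target length
]
selected = [pair.name for pair in candidates if passes(pair)]
print(selected)
```

A deliberately shorter fragment, such as the one described for exon 7, would need its own `target_bp` rather than the default used here.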

Table 7. GeneBlock DNA fragment

GeneBlock fragment: 500 bp containing all gRNA target sequences.

Spacer sequence, 17 bp (includes subregion DS from TP53 exon 10)

Start spacer sequence (7 bp):

End spacer sequence (30 bp):

In vitro CRISPR/Cas9 digestion of genomic DNA.

crRNA and tracrRNA (IDT, Coralville, IA) are complexed into gRNA, and 30 nM gRNA is then incubated with Cas9 nuclease (NEB, Ipswich, MA) at ~30 nM in 1x NEB Cas9 reaction buffer and water, in a volume of 23-27 μl, at 25 °C for 10 minutes. Then 10-250 ng of DNA is added, for a final volume of 30 μl. The reaction is incubated overnight at 37 °C and then heat-shocked at 70 °C for 10 minutes to inactivate the enzyme.

Size selection.

Prior to library preparation, size selection is used to select the expected fragment length for target enrichment. AMPure XP Beads (Beckman Coulter, Brea, CA, USA) are used to remove off-target, undigested high-molecular-weight DNA. After heat inactivation, the reaction is combined with beads at a 0.5x ratio, mixed briefly and incubated for 3 minutes to allow the high-MW DNA to bind. The beads are then separated from the solution with a magnet, and the solution (containing the targeted DNA fragment lengths) is transferred to a new tube. A standard AMPure bead purification at a 1.8x ratio is then performed, with elution into 50 μl of TE Low.

Library preparation

A- tailing and connection

A-tailing and ligation are performed on the fragmented DNA using the NEBNext Ultra II DNA Library Prep Kit (NEB, Ipswich, MA), according to the manufacturer's protocol. The NEB end-repair and A-tailing (ERAT) reaction is incubated at 20 °C for 30 minutes and at 65 °C for 30 minutes. CRISPR-DS does not require end repair (Cas9 produces blunt ends), but the ERAT reaction is used for convenient A-tailing. The NEB ligation master mix and 2.5 μl of DS adapters at 15 μM are then added, and the reaction is incubated at 20 °C for 15 minutes. Commercial adapter prototypes are synthesized (Figure 12C), which differ from the adapters used in previous studies as follows: (1) a 10 bp random, double-stranded molecular tag is used instead of 12 bp; and (2) a simple 3'-dT overhang is used instead of the previous 3' 5 bp conserved sequence for ligation to the 5'-dA-tailed DNA molecules. After ligation, the DNA is cleaned by AMPure bead purification at a 0.8x ratio and eluted into 23 μl of nuclease-free water.

PCR

The ligated DNA is amplified using the KAPA Real-Time Amplification kit with fluorescent standards (KAPA Biosystems, Woburn, MA, USA). A 50 μl reaction is prepared, comprising KAPA HiFi HotStart Real-time PCR Master Mix, 23 μl of the previously ligated and purified DNA, and DS primers MWS13 and MWS20 at a final concentration of 2 μM. The reaction is denatured at 98 °C for 45 seconds and amplified with 6-8 cycles of 98 °C for 15 seconds, 65 °C for 30 seconds and 72 °C for 30 seconds, followed by a final extension at 72 °C for 1 minute. Samples are amplified until they reach fluorescent standard 3 (which produces a sufficient and standardized number of DNA copies going into capture across samples, prevents over-amplification, and indicates successful Cas9 cutting and ligation); this typically takes 6-8 cycles, depending on the amount of input DNA. A 0.8x-ratio AMPure bead wash is performed to purify the amplified fragments, which are eluted into 40 μl of nuclease-free water. Compared with standard DS at the PCR step, CRISPR-DS provides improvements including: (i) fragments of similar size, which reduces amplification bias toward small fragments (Figure 22A); (ii) more uniform coverage of the region of interest (Figure 22E); and (iii) accurate assessment of successful library preparation by TapeStation 4200 (Agilent Technologies, Santa Clara, CA, USA), using the predetermined fragment-size signature. In standard-DS, because sonication produces a broad range of sizes, the PCR product appears as a wide smear that is difficult to compare between samples (Figure 22A). In contrast to other methods such as standard-DS (which can produce results that are difficult to compare between samples), CRISPR-DS produces discrete peaks that clearly indicate successful cutting and ligation and that lend themselves to quality-control comparison across samples (Figures 22B-D).

Capture and post-capture PCR

TP53 xGen Lockdown Probes (IDT, Coralville, IA) are used to perform hybrid capture of the TP53 exons as in previous studies, with the following modification: probes are selected (from the IDT TP53 Lockdown probe set) to cover the entire TP53 coding region (exon 1 and part of exon 11 are not coding regions) (Table 6). Each CRISPR/Cas9-excised fragment is covered by a minimum of 2 and a maximum of 5 probes (Figures 17A-17C). To generate the capture probe pool, the probes for a given fragment are combined in equimolar amounts, which produces 7 different pools (one per fragment). The 7 fragment pools are then mixed, again in equimolar amounts (with the exception of the pools for exon 7 and exons 8-9, which are represented at 40% and 90%, respectively). Where over-representation of exons is observed in sequencing, the capture probes for those exons are reduced. The final capture pool is diluted to 0.75 pmol/μl. Hybrid capture is performed according to the standard IDT protocol, with the following modifications: blockers MWS60 and MWS61, which are specific to the DS adapters, are used; 75 μl (rather than 100 μl) of Dynabeads M-270 Streptavidin beads are used; and post-capture PCR is performed with the KAPA HiFi HotStart PCR kit (KAPA Biosystems, Woburn, MA, USA), using MWS13 and index primer MWS21 at a final concentration of 0.8 μM. The reaction is denatured at 98 °C for 45 seconds, then amplified with 20 cycles of 98 °C for 30 seconds, 60 °C for 45 seconds and 72 °C for 45 seconds, followed by an extension at 72 °C for 60 seconds. The PCR products are purified with a 0.8x AMPure bead wash.
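The composition of the final capture pool can be worked out from the relative amounts of the seven fragment pools. The sketch below assumes that "represented at 40% and 90%" means 40% and 90% of the equimolar amount used for the other pools; that reading, and the placeholder labels for the five pools the text does not name, are assumptions made here for illustration.

```python
def pool_fractions(relative_amounts):
    """Normalise relative molar amounts so that they sum to 1."""
    total = sum(relative_amounts.values())
    return {name: amount / total for name, amount in relative_amounts.items()}

# Seven fragment pools; only "exon 7" and "exons 8-9" are named in the text,
# the remaining labels are placeholders.
relative = {f"fragment {i}": 1.0 for i in range(1, 6)}
relative["exon 7"] = 0.4     # assumed: 40% of the equimolar amount
relative["exons 8-9"] = 0.9  # assumed: 90% of the equimolar amount

fractions = pool_fractions(relative)
for name, fraction in fractions.items():
    print(f"{name}: {fraction:.1%} of the final capture pool")
```

If further over-representation of an exon were seen in sequencing, the same calculation could be rerun with a lower relative amount for that pool.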

Sequencing

Samples are quantified using the Qubit dsDNA HS Assay Kit, diluted and pooled for sequencing. The sample pool is then visualized on an Agilent 4200 TapeStation to confirm library quality. The TapeStation electropherogram shows sharp, distinct peaks corresponding to the fragment lengths of the designed CRISPR/Cas9 cut fragments (Figures 22B-22D). (This step can also be performed for each sample individually before pooling, to verify the performance of each individual sample as needed/desired.) The final pool is quantified using the KAPA Library Quantification kit (KAPA Biosystems, Woburn, MA, USA). The library is sequenced on the MiSeq Illumina platform using a v3 600-cycle kit (Illumina, San Diego, CA, USA), according to the manufacturer's instructions. Each sample is allocated ~7-10% of a lane (corresponding to ~2 million reads); each sequencing run is spiked with approximately 1% PhiX control DNA.

Data processing

A custom bioinformatics pipeline was created to automate the analysis from raw FASTQ files to text files (Figure 23). The pipeline is similar to the method used for standard DS analysis, with the following modifications: (i) paired-read information is retained, and (ii) consensus making is performed before alignment. Paired-end reads are used in CRISPR-DS data analysis and also represent an improvement over standard DS analysis, because they provide quality control of fragment size and remove potential technical artifacts caused by the presence of short fragments. In addition, standard DS analysis performs consensus making after all reads have been mapped to the reference genome, whereas CRISPR-DS analysis performs consensus making as the initial step, relying only on the bases read by the sequencer. This change is thought likely to improve consensus making and to reduce the time required for data processing.

In CRISPR-DS, consensus making is executed by a custom Python script called UnifiedConsensusMaker.py, which takes all reads derived from the same tag, identifies the most frequently called base at each position, and generates a single-strand consensus sequence (SSCS) read. The SSCS reads of each complementary tag pair are then compared position by position to generate a double-strand consensus sequence (DCS) read (Figure 12D). Two FASTQ files are produced, containing the resulting SSCS reads and DCS reads (a DCS read corresponds to an original DNA molecule, so the average DCS depth is an estimate of the number of genomes sequenced). The recovery rate (also called fractional genome-equivalent recovery) is calculated as the average DCS depth (genomes sequenced) divided by the number of input genomes (1 ng of DNA corresponds to ~330 haploid genomes). The raw reads on target are calculated by counting the number of reads whose genomic coordinates fall between the upstream and downstream CRISPR/Cas9 cut sites, with a window of 100 bp added to either side.

The paired-end DCS FASTQ files are then aligned to the human reference genome v38 using bwa-mem v.0.7.4 with default parameters. Mapped reads are realigned with GATK Indel-Realigner, and low-quality bases are clipped from the ends with GATK Clip-Reads. A conservative clip of 30 bases from the 3' end and an additional clip of 7 bases from the 5' end is performed. In addition, the overlapping regions of read pairs, which span ~80 bp in the TP53 design, are trimmed using fgbio ClipOverlappingReads. This algorithm clips evenly from both ends of the paired reads until they meet, which maximizes the use of sequenced bases with high PHRED quality scores. A pileup file is created from the resulting file using SAMtools mpileup. The pileup file is then filtered with a custom Python script that uses a BED file of the targeted genomic positions. The BED file can be created easily from the coordinates of the CRISPR/Cas9 gRNAs. The filtered pileup file is then processed by the custom script mut-position.1.33.py, which creates a tab-delimited text file called 'mutpos' containing mutation information. The mutpos file includes a summary of the DCS depth and the mutations at each sequenced position (the software used for CRISPR-DS analysis is available at https://github.com/risqueslab/CRISPR-DS).
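The consensus-making step described above can be illustrated with a minimal sketch. This is not the UnifiedConsensusMaker.py script itself: the function names, the `min_fraction` agreement threshold, and the use of "N" for unresolved positions are assumptions introduced for illustration, and reads in a tag family are assumed to have equal length and a common orientation.

```python
from collections import Counter


def make_sscs(reads, min_fraction=0.7):
    """Collapse reads sharing one tag into a single-strand consensus (SSCS).

    At each position the most frequently called base is kept if it reaches
    min_fraction of the family (an assumed threshold); otherwise "N" is used.
    """
    consensus = []
    for column in zip(*reads):
        base, count = Counter(column).most_common(1)[0]
        consensus.append(base if count / len(column) >= min_fraction else "N")
    return "".join(consensus)


def make_dcs(sscs_first, sscs_second):
    """Compare two complementary-strand SSCS reads position by position.

    A base is kept only when both strands agree; disagreements become "N".
    Both inputs are assumed to be expressed in the same orientation.
    """
    return "".join(a if a == b else "N" for a, b in zip(sscs_first, sscs_second))
```

For instance, a four-read family in which one read carries a single mismatch still yields the majority base at that position, while a base present in only one of the two SSCS reads is masked in the DCS read.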

Standard-DS

Three DNA amounts (25 ng, 100 ng, and 250 ng) from normal human bladder sample B9 were sequenced with standard-DS, using one and two rounds of capture, and compared with the results from CRISPR-DS. Standard-DS analysis was performed, except that the KAPA Hyperprep kit (KAPA Biosystems, Woburn, MA, USA) was used for end repair and ligation, and the KAPA Hi-Fi HotStart PCR kit (KAPA Biosystems, Woburn, MA, USA) was used for PCR amplification. Hybrid capture was performed using xGen Lockdown probes covering TP53 exons 2-11 (the same probes were used for both standard DS and CRISPR-DS). Samples were sequenced on ~10% of a HiSeq 2500 Illumina platform, to accommodate the shorter fragment length.

CRISPR-DS target enrichment

To characterize CRISPR-DS target enrichment, two separate analyses were performed:

The first analysis compared one round of capture with two rounds of capture (and with the results of standard DS). Three DNA samples were processed for CRISPR-DS and split in two after one hybrid capture. The first portion was indexed and sequenced, and the second portion was subjected to an additional round of capture, as required in the original DS protocol. The percentage of raw reads "on target" (i.e., covering TP53 exons) was compared for one capture versus two captures. Details of the comparison between standard DS and CRISPR-DS can be found in Table 8.

Table 8. Comparison of standard-DS versus CRISPR-DS

The second analysis assessed the percentage of raw reads on target without performing hybrid capture, to determine the enrichment produced solely by size selection of the CRISPR-excised fragments. Three different samples with different DNA amounts (from 10 ng to 250 ng) were processed with the protocol described for the first analysis up to the first PCR (i.e., up to hybrid capture). Figures 24A and 24B are a chart (Figure 24A) and a table (Figure 24B) showing the results of quantifying the degree of target enrichment after CRISPR/Cas9 digestion followed by size selection, according to one embodiment of the present technology. Figure 24A shows the DNA samples and the enrichment achieved for each sample. Figure 24B shows the percentage of raw reads "on target" compared with the amount of input DNA. The PCR products were then indexed and sequenced. The percentage of raw reads on target was calculated, and the fold enrichment was estimated (taking into account the size of the target region, in this case 3,280 bp).
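One way to estimate fold enrichment from the on-target percentage is to divide the observed on-target fraction by the fraction expected if reads were spread evenly over the genome. The sketch below assumes a haploid human genome size of about 3.2 Gb; that value and the function names are illustrative assumptions, while the 3,280 bp target size comes from the text.

```python
HUMAN_GENOME_BP = 3.2e9  # assumption: approximate haploid human genome size
TARGET_BP = 3280         # size of the target region stated in the text


def percent_on_target(on_target_reads, total_reads):
    """Percentage of raw reads that fall within the targeted region."""
    return 100.0 * on_target_reads / total_reads


def fold_enrichment(pct_on_target, target_bp=TARGET_BP, genome_bp=HUMAN_GENOME_BP):
    """Observed on-target fraction divided by the fraction expected by chance."""
    expected_fraction = target_bp / genome_bp
    return (pct_on_target / 100.0) / expected_fraction
```

Under these assumptions, 0.2% of raw reads on target corresponds to roughly 2,000-fold enrichment and 5% to roughly 50,000-fold, which is in line with the range reported for Table 11.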

The preenrichment of high-molecular-weight DNA

Selection of high-molecular-weight DNA improves the performance of CRISPR-DS on degraded DNA. This selection was performed using the BluePippin system (Sage Science, Beverly, MA). Two bladder DNAs with DINs of 6 and 4 were run using a 0.75% gel cassette and the high-pass setting to obtain fragments >8 kb. Size selection was confirmed by TapeStation (Figure 20A). Then 250 ng of DNA from before BluePippin selection and 250 ng of DNA from after BluePippin selection were processed in parallel with CRISPR-DS. The percentage of raw reads on target and the average DCS depth were quantified and compared (Figure 20B).

Example 13: CRISPR-DS in ovarian cancer samples

To verify the ability of CRISPR-DS to detect low-frequency mutations, four peritoneal fluid samples collected during debulking surgery from women with ovarian cancer were analyzed. The presence of TP53 tumor mutations in these samples had previously been confirmed by standard-DS. 100 ng of DNA (30-100 times less than that used for standard-DS) was analyzed with CRISPR-DS, yielding a DCS depth comparable to that of standard DS and successfully identifying the TP53 tumor mutation in all cases (Table 9). Recovery rates ranged from 6% to 12%, representing a 15x-200x increase compared with standard DS using the same DNA.

Table 9. Comparison of standard-DS versus CRISPR-DS for 4 different samples with TP53 mutations.

* data processing is executed after final dual sequencing

Example 14: CRISPR-DS in bladder tissue samples

This example describes the use of CRISPR-DS on a set of 13 DNA samples extracted from the bladder tissue of different patients (Table 10). 250 ng of DNA from each sample was used for the assay, resulting in a median DCS depth of 6,143x, which corresponds to a median recovery rate of 7.4%. Technical replicates of two samples (B2 and B4) confirmed reproducible performance. All samples had >98% of DCS reads on target, but the percentage of raw reads on target ranged from 43% to 98%. Low target enrichment corresponded to samples with a DNA integrity number (DIN) < 7.
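The recovery rate quoted above follows from the definition given in the data processing section: DCS depth divided by the number of input genomes, with 1 ng of DNA corresponding to ~330 haploid genomes. A minimal sketch of that arithmetic, with hypothetical function and constant names, is:

```python
GENOMES_PER_NG = 330  # from the text: 1 ng DNA ~ 330 haploid genomes


def recovery_rate(dcs_depth, input_ng, genomes_per_ng=GENOMES_PER_NG):
    """Fraction of input genomes recovered as DCS reads."""
    input_genomes = input_ng * genomes_per_ng
    return dcs_depth / input_genomes
```

With a DCS depth of 6,143x and 250 ng of input DNA (82,500 genomes), this gives about 0.074, consistent with the reported 7.4%.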

Table 10. CRISPR-DS sequencing results for 13 samples processed with 250 ng of input DNA.

To test the effect of DIN on assay performance, low-molecular-weight DNA was removed before CRISPR/Cas9 digestion. The pulsed-field feature of the BluePippin system was used to select high-molecular-weight DNA from two samples with "degraded DNA" (DINs of 6 and 4). Pre-enrichment increased the raw reads on target by 2-fold, and DCS depth increased by 500 (Figure 20B). To directly quantify the degree of enrichment conferred simply by CRISPR/Cas9 digestion followed by size selection, 3 samples were sequenced without capture. 10-250 ng of DNA was digested, size-selected, ligated, amplified, and sequenced. The percentage of raw reads "on target" ranged from 0.2% to 5%, corresponding to ~2,000x to 50,000x enrichment (Table 11). Notably, lower DNA inputs showed the highest enrichment, which may reflect optimal removal of off-target high-molecular-weight DNA fragments when they are at lower abundance.

Table 11. Target enrichment due to size selection.

CRISPR/Cas9 fragmentation followed by size selection successfully achieved effective target enrichment and eliminated any need for a second round of capture for small target regions. In addition, it eliminated PCR bias and achieved uniform coverage of the regions of interest, representing a substantial improvement over currently available methods.

Equivalents and scope

The foregoing detailed description of embodiments of the technology is not intended to be exhaustive or to limit the technology to the precise forms disclosed above. Although specific embodiments of, and examples for, the technology are described above for illustrative purposes, various equivalent modifications are possible within the scope of the technology, as those skilled in the relevant art will recognize. For example, although steps are presented in a given order, alternative embodiments may perform the steps in a different order. The various embodiments described herein may also be combined to provide further embodiments. All references cited herein are incorporated by reference as if fully set forth herein.

From the foregoing, it will be appreciated that specific embodiments of the technology have been described herein for purposes of illustration, but well-known structures and functions have not been shown or described in detail, to avoid unnecessarily obscuring the description of the embodiments of the technology. Where the context permits, singular or plural terms may also include the plural or singular term, respectively. Further, while advantages associated with certain embodiments of the technology have been described in the context of those embodiments, other embodiments may also exhibit such advantages, and not all embodiments need necessarily exhibit such advantages to fall within the scope of the technology. Accordingly, the disclosure and associated technology can encompass other embodiments not expressly shown or described herein.

Those skilled in the art will recognize, or be able to ascertain using no more than routine experimentation, many equivalents to the specific embodiments of the technology described herein. The scope of the present technology is not intended to be limited to the foregoing description, but rather is as set forth in the following claims:
