Reagents and methods for analyzing linked nucleic acids

Document No.: 1776441. Publication date: 2019-12-03.

This technology, "Reagents and methods for analyzing linked nucleic acids", was designed by Lucas Brandon Edelman (卢卡斯·布朗东·埃德尔曼) on 2017-12-19. Abstract: Reagents and methods are provided for analyzing the nucleic acids (for example, genomic DNA) of circulating microparticles (that is, blood-derived microparticles). The methods comprise linking at least two target nucleic acid fragments of a circulating microparticle to produce a set of at least two linked target nucleic acid fragments. In the methods, the target nucleic acid fragments may be linked by techniques such as barcoding, partitioning, ligation and/or individual sequencing. Sequencing the set of linked fragments provides a set of informatically linked sequence reads corresponding to the sequences of fragments from a single microparticle.

1. A method of analyzing a sample comprising a circulating microparticle, wherein the circulating microparticle contains at least two genomic DNA fragments, and wherein the method comprises the steps of:

(a) preparing the sample for sequencing, comprising linking at least two of the at least two genomic DNA fragments to produce a set of at least two linked genomic DNA fragments; and

(b) sequencing each of the linked fragments in the set to produce at least two linked sequence reads.

2. The method of claim 1, wherein at least 3, at least 4, at least 5, at least 10, at least 50, at least 100, at least 500, at least 1000, at least 5000, at least 10,000, at least 100,000 or at least 1,000,000 genomic DNA fragments of the circulating microparticle are linked and then sequenced to produce at least 3, at least 4, at least 5, at least 10, at least 50, at least 100, at least 500, at least 1000, at least 5000, at least 10,000, at least 100,000 or at least 1,000,000 linked sequence reads.

3. The method of claim 1 or claim 2, wherein the circulating microparticle has a diameter of 100 to 5000 nm.

4. The method of any one of claims 1 to 3, wherein the linked genomic DNA fragments are from a single genomic DNA molecule.

5. The method of any one of claims 1 to 4, wherein the method further comprises estimating or determining the genomic sequence length of the linked genomic DNA fragments.

6. The method of any one of claims 1 to 5, wherein the method further comprises the step of isolating the circulating microparticle from blood, plasma or serum.

7. The method of claim 6, wherein the isolating step comprises centrifugation.

8. The method of claim 6 or claim 7, wherein the isolating step comprises size-exclusion chromatography.

9. The method of any one of claims 6 to 8, wherein the isolating step comprises filtration.

10. The method of any one of claims 1 to 9, wherein the sample comprises first and second circulating microparticles, wherein each circulating microparticle contains at least two genomic DNA fragments, and wherein the method comprises performing step (a) to produce a first set of linked genomic DNA fragments of the first circulating microparticle and a second set of linked genomic DNA fragments of the second circulating microparticle, and performing step (b) to produce a first set of linked sequence reads of the first circulating microparticle and a second set of linked sequence reads of the second circulating microparticle.

11. The method of any one of claims 1 to 9, wherein the sample comprises n circulating microparticles, wherein each circulating microparticle contains at least two genomic DNA fragments, and wherein the method comprises performing step (a) to produce n sets of linked genomic DNA fragments, one set for each of the n circulating microparticles, and performing step (b) to produce n sets of linked sequence reads, one set for each of the n circulating microparticles.

12. The method of claim 11, wherein n is at least 3, at least 5, at least 10, at least 50, at least 100, at least 1000, at least 10,000, at least 100,000, at least 1,000,000, at least 10,000,000 or at least 100,000,000 circulating microparticles.

13. The method of any one of claims 10 to 12, wherein, before step (a), the method further comprises the step of partitioning the sample into at least two different reaction volumes.

14. A method of preparing a sample for sequencing, wherein the sample comprises a circulating microparticle, wherein the circulating microparticle comprises at least two genomic DNA fragments, and wherein the method comprises attaching the at least two genomic DNA fragments of the circulating microparticle to a barcode sequence, or to different barcode sequences of a set of barcode sequences, to produce a set of linked genomic DNA fragments.

15. The method of claim 14, wherein, before the step of attaching the at least two genomic DNA fragments of the circulating microparticle to a barcode sequence or to different barcode sequences of a set of barcode sequences, the method comprises attaching a coupling sequence to each genomic DNA fragment of the circulating microparticle, wherein the coupling sequences are then attached to the barcode sequence or to the different barcode sequences of the set of barcode sequences to produce the set of linked genomic DNA fragments.

16. The method of claim 14 or claim 15, wherein the sample comprises first and second circulating microparticles, wherein each circulating microparticle contains at least two genomic DNA fragments, and wherein the method comprises attaching the at least two genomic DNA fragments of the first circulating microparticle to a first barcode sequence, or to different barcode sequences of a first set of barcode sequences, to produce a first set of linked genomic DNA fragments, and attaching the at least two genomic DNA fragments of the second circulating microparticle to a second barcode sequence, or to different barcode sequences of a second set of barcode sequences, to produce a second set of linked genomic DNA fragments.

17. The method of any one of claims 1 to 13, wherein the method comprises the steps of:

(a) preparing the sample for sequencing, comprising attaching the at least two genomic DNA fragments of the circulating microparticle to a barcode sequence to produce a set of linked genomic DNA fragments; and

(b) sequencing each of the linked fragments in the set to produce at least two linked sequence reads, wherein the at least two linked sequence reads are linked by the barcode sequence.

18. The method of claim 17, wherein, before the step of attaching the at least two genomic DNA fragments of the circulating microparticle to a barcode sequence, the method comprises attaching a coupling sequence to each genomic DNA fragment of the circulating microparticle, wherein the coupling sequences are then attached to the barcode sequence to produce the set of linked genomic DNA fragments.

19. The method of claim 17 or claim 18, wherein the sample comprises first and second circulating microparticles, wherein each circulating microparticle contains at least two genomic DNA fragments, and wherein the method comprises performing step (a) to produce a first set of linked genomic DNA fragments of the first circulating microparticle and a second set of linked genomic DNA fragments of the second circulating microparticle, and performing step (b) to produce a first set of linked sequence reads of the first circulating microparticle and a second set of linked sequence reads of the second circulating microparticle, wherein the at least two linked sequence reads of the first circulating microparticle are linked by a different barcode sequence from the at least two linked sequence reads of the second circulating microparticle.

20. The method of any one of claims 1 to 13, wherein the method comprises the steps of:

(a) preparing the sample for sequencing, comprising attaching each of the at least two genomic DNA fragments of the circulating microparticle to a different barcode sequence of a set of barcode sequences to produce a set of linked genomic DNA fragments; and

(b) sequencing each of the linked fragments in the set to produce at least two linked sequence reads, wherein the at least two linked sequence reads are linked by the set of barcode sequences.

21. The method of claim 20, wherein, before the step of attaching each of the at least two genomic DNA fragments of the circulating microparticle to a different barcode sequence, the method comprises attaching a coupling sequence to each genomic DNA fragment of the circulating microparticle, wherein each of the at least two genomic DNA fragments of the circulating microparticle is attached via its coupling sequence to a different barcode sequence of the set of barcode sequences.

22. The method of claim 20 or claim 21, wherein the sample comprises first and second circulating microparticles, wherein each circulating microparticle contains at least two genomic DNA fragments, and wherein the method comprises performing step (a) to produce a first set of linked genomic DNA fragments of the first circulating microparticle and a second set of linked genomic DNA fragments of the second circulating microparticle, and performing step (b) to produce a first set of linked sequence reads of the first circulating microparticle and a second set of linked sequence reads of the second circulating microparticle, wherein the first set of linked sequence reads is linked by a different set of barcode sequences from the second set of linked sequence reads.

23. The method of any one of claims 14 to 22, wherein the method comprises preparing first and second samples for sequencing, wherein each sample comprises at least one circulating microparticle, wherein each circulating microparticle contains at least two genomic DNA fragments, wherein the barcode sequences each contain a sample identifier region, and wherein the method comprises the steps of:

(i) performing step (a) on each sample, wherein the barcode sequences attached to the genomic DNA fragments from the first sample have a different sample identifier region from the barcode sequences attached to the genomic DNA fragments from the second sample;

(ii) performing step (b) on each sample, wherein each linked sequence read comprises the sequence of the sample identifier region; and

(iii) determining, from its sample identifier region, the sample from which each linked sequence read was obtained.

24. The method of any one of claims 14 to 23, wherein, before the attaching step, the method further comprises the step of partitioning the sample into at least two different reaction volumes.

25. A method of preparing a sample for sequencing, wherein the sample comprises first and second circulating microparticles, wherein each circulating microparticle contains at least two target nucleic acid fragments, and wherein the method comprises the steps of:

(a) contacting the sample with a library comprising at least two multimeric barcoding reagents, wherein each multimeric barcoding reagent comprises first and second barcode regions linked together, wherein each barcode region comprises a nucleic acid sequence, and wherein the first and second barcode regions of a first multimeric barcoding reagent of the library are different from the first and second barcode regions of a second multimeric barcoding reagent of the library; and

(b) attaching barcode sequences to each of first and second target nucleic acid fragments of the first circulating microparticle to produce first and second barcoded target nucleic acid molecules of the first circulating microparticle, wherein the first barcoded target nucleic acid molecule comprises the nucleic acid sequence of the first barcode region of the first multimeric barcoding reagent and the second barcoded target nucleic acid molecule comprises the nucleic acid sequence of the second barcode region of the first multimeric barcoding reagent; and attaching barcode sequences to each of first and second target nucleic acid fragments of the second circulating microparticle to produce first and second barcoded target nucleic acid molecules of the second circulating microparticle, wherein the first barcoded target nucleic acid molecule comprises the nucleic acid sequence of the first barcode region of the second multimeric barcoding reagent and the second barcoded target nucleic acid molecule comprises the nucleic acid sequence of the second barcode region of the second multimeric barcoding reagent.

26. The method of claim 25, wherein the method comprises the steps of:

(a) contacting the sample with a library comprising at least two multimeric barcoding reagents, wherein each multimeric barcoding reagent comprises first and second barcoded oligonucleotides linked together, wherein the barcoded oligonucleotides each comprise a barcode region, and wherein the barcode regions of the first and second barcoded oligonucleotides of a first multimeric barcoding reagent of the library are different from the barcode regions of the first and second barcoded oligonucleotides of a second multimeric barcoding reagent of the library; and

(b) annealing or ligating the first and second barcoded oligonucleotides of the first multimeric barcoding reagent to first and second target nucleic acid fragments of the first circulating microparticle to produce first and second barcoded target nucleic acid molecules, and annealing or ligating the first and second barcoded oligonucleotides of the second multimeric barcoding reagent to first and second target nucleic acid fragments of the second circulating microparticle to produce first and second barcoded target nucleic acid molecules.

27. The method of claim 26, wherein, before the step of annealing or ligating the first and second barcoded oligonucleotides to the first and second genomic DNA fragments, the method comprises attaching a coupling sequence to each genomic DNA fragment, wherein the first and second barcoded oligonucleotides are then annealed or ligated to the coupling sequences of the first and second genomic DNA fragments.

28. The method of claim 26 or claim 27, wherein step (b) comprises:

(i) annealing the first and second barcoded oligonucleotides of the first multimeric barcoding reagent to first and second genomic DNA fragments of the first circulating microparticle, and annealing the first and second barcoded oligonucleotides of the second multimeric barcoding reagent to first and second genomic DNA fragments of the second circulating microparticle; and

(ii) extending the first and second barcoded oligonucleotides of the first multimeric barcoding reagent to produce first and second different barcoded target nucleic acid molecules, and extending the first and second barcoded oligonucleotides of the second multimeric barcoding reagent to produce first and second different barcoded target nucleic acid molecules, wherein each barcoded target nucleic acid molecule comprises at least one nucleotide synthesized using the genomic DNA fragment as a template.

29. The method of any one of claims 25 to 28, wherein steps (a) and (b), and optionally (c) and (d), are performed on the at least two circulating microparticles in a single reaction volume.

30. The method of any one of claims 25 to 28, wherein, before step (b), the method further comprises the step of partitioning the sample into at least two different reaction volumes.

31. The method of any one of claims 1 to 24, wherein the method comprises the steps of:

(a) preparing the sample for sequencing, comprising:

(i) contacting the sample with a multimeric barcoding reagent comprising first and second barcode regions linked together, wherein each barcode region comprises a nucleic acid sequence; and

(ii) attaching barcode sequences to each of at least two genomic DNA fragments of the circulating microparticle to produce first and second different barcoded target nucleic acid molecules, wherein the first barcoded target nucleic acid molecule comprises the nucleic acid sequence of the first barcode region and the second barcoded target nucleic acid molecule comprises the nucleic acid sequence of the second barcode region; and

(b) sequencing each barcoded target nucleic acid molecule to produce at least two linked sequence reads.

32. The method of claim 31, wherein, before the step of attaching barcode sequences to each of the at least two genomic DNA fragments of the circulating microparticle, the method comprises attaching a coupling sequence to each genomic DNA fragment of the circulating microparticle, wherein the barcode sequences are then attached to the coupling sequence of each of the at least two genomic DNA fragments of the circulating microparticle to produce the first and second different barcoded target nucleic acid molecules.

33. The method of claim 31 or claim 32, wherein step (a) is performed by the method of any one of claims 25 to 30.

34. The method of any one of claims 31 to 33, wherein the method comprises preparing first and second samples for sequencing, wherein each sample comprises at least one circulating microparticle, wherein the circulating microparticle contains at least two genomic DNA fragments, wherein the barcode sequences each contain a sample identifier region, and wherein the method comprises the steps of:

(i) performing step (a) on each sample, wherein the barcode sequences attached to the genomic DNA fragments from the first sample have a different sample identifier region from the barcode sequences attached to the genomic DNA fragments from the second sample;

(ii) performing step (b) on each sample, wherein each sequence read comprises the sequence of the sample identifier region; and

(iii) determining, from its sample identifier region, the sample from which each sequence read was obtained.

35. The method of any one of claims 31 to 34, wherein the method comprises analyzing a sample containing at least two circulating microparticles, wherein each circulating microparticle contains at least two genomic DNA fragments, and wherein the method comprises the steps of:

(a) preparing the sample for sequencing, comprising:

(i) contacting the sample with a library of multimeric barcoding reagents comprising a multimeric barcoding reagent for each of the two or more circulating microparticles, wherein each multimeric barcoding reagent is as defined in any one of claims 31 to 33; and

(ii) attaching barcode sequences to each of at least two genomic DNA fragments of each circulating microparticle, wherein at least two barcoded target nucleic acid molecules are produced from each of the at least two circulating microparticles, and wherein the at least two barcoded target nucleic acid molecules produced from a single circulating microparticle each comprise the nucleic acid sequence of a barcode region from the same multimeric barcoding reagent; and

(b) sequencing each barcoded target nucleic acid molecule to produce at least two linked sequence reads for each circulating microparticle.

36. The method of claim 35, wherein the barcode sequences are attached to the genomic DNA fragments of the circulating microparticles in a single reaction volume.

37. The method of claim 35, wherein, before the attaching step, the method further comprises the step of partitioning the sample into at least two different reaction volumes.

38. The method of any one of claims 13, 24, 30 and 37, wherein a sample containing at least two circulating microparticles is partitioned into at least two different reaction volumes.

39. The method of claim 38, wherein the different reaction volumes are provided by different reaction vessels.

40. The method of claim 38, wherein the different reaction volumes are provided by different aqueous droplets.

41. The method of claim 40, wherein the different aqueous droplets are different aqueous droplets in an emulsion.
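Claims 23, 25, 34 and 35 describe an informatic step. Each sequence read is first assigned to a sample by its sample identifier region. Reads are then linked by barcode regions that came from the same multimeric barcoding reagent. The minimal Python sketch below shows that bookkeeping. It assumes a hypothetical read layout: a 4-nt sample identifier, then a 6-nt barcode region, then the genomic insert. It also assumes a lookup table from barcode region to reagent. The claims specify neither the layout nor the lengths, and all names here are illustrative.

```python
from collections import defaultdict

# Hypothetical read layout (an assumption, not specified by the claims):
# [sample identifier region][barcode region][genomic insert]
SAMPLE_ID_LEN = 4
BARCODE_LEN = 6


def parse_read(read: str) -> tuple[str, str, str]:
    """Split a read into (sample identifier, barcode region, insert)."""
    sid = read[:SAMPLE_ID_LEN]
    region = read[SAMPLE_ID_LEN:SAMPLE_ID_LEN + BARCODE_LEN]
    insert = read[SAMPLE_ID_LEN + BARCODE_LEN:]
    return sid, region, insert


def demultiplex_and_link(reads, sample_ids, region_to_reagent):
    """Group inserts as {sample: {reagent: [insert, ...]}}.

    sample_ids maps each sample identifier region to a sample name.
    region_to_reagent maps every barcode region (first, second, ...) of a
    multimeric barcoding reagent to that reagent's ID, so reads carrying
    different regions of the same reagent land in the same linked set.
    """
    linked = defaultdict(lambda: defaultdict(list))
    for read in reads:
        sid, region, insert = parse_read(read)
        sample = sample_ids.get(sid)
        reagent = region_to_reagent.get(region)
        if sample is None or reagent is None:
            continue  # unrecognised identifier or barcode region: discard
        linked[sample][reagent].append(insert)
    # Convert nested defaultdicts to plain dicts for easier inspection.
    return {s: dict(groups) for s, groups in linked.items()}


sample_ids = {"ACGT": "sample1", "TGCA": "sample2"}
# Reagent 1 carries two different barcode regions (first and second).
region_to_reagent = {"AAAAAA": "reagent1", "CCCCCC": "reagent1",
                     "GGGGGG": "reagent2"}
reads = [
    "ACGT" + "AAAAAA" + "TTTTT",
    "ACGT" + "CCCCCC" + "GGGAA",
    "TGCA" + "GGGGGG" + "ACACA",
    "NNNN" + "AAAAAA" + "CCCCC",  # unknown sample identifier: discarded
]
print(demultiplex_and_link(reads, sample_ids, region_to_reagent))
```

In this toy run, the two reads from sample 1 carry different barcode regions of reagent 1, so they end up in one linked set.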

Technical field

The present invention relates to the analysis of cell-free nucleic acids (such as cell-free DNA). In particular, it relates to the analysis of cell-free DNA contained in blood-derived microparticles. Reagents and methods are provided for linking the nucleic acids of a single microparticle. Methods are also provided for analyzing sets of linked nucleic acid fragments from single microparticles.

Background art

Cell-free DNA (cfDNA) in the circulation is usually fragmented (typically 100-200 base pairs long). Methods for cfDNA analysis have therefore traditionally focused on biological signals that can be found using these short DNA fragments. Examples include detecting single-nucleotide variants in individual molecules. Another example is performing "molecular counting" on large numbers of sequenced fragments to infer indirectly the presence of large-scale chromosomal abnormalities. One such application is testing fetal DNA in the maternal circulation for fetal chromosomal trisomies, a form of so-called non-invasive prenatal testing (NIPT).

Many methods for analyzing circulating cell-free DNA have been described previously. Depending on the application field, these assays may use different terms for essentially similar sample types and sets of technical methods. Such terms include circulating tumor DNA (ctDNA), cell-free fetal DNA (cffDNA), liquid biopsy and non-invasive prenatal testing. In general, these methods comprise three parts: a laboratory protocol for preparing circulating cell-free DNA samples for sequencing, the sequencing reaction itself, and an informatic framework for analyzing the resulting sequences to detect relevant biological signals. Such methods purify and isolate the DNA before sequencing, which means that subsequent analysis can rely only on the information contained in the DNA itself. After sequencing, such methods typically use one or more informatic or statistical frameworks to analyze many aspects of the sequence data. For example, they may detect specific mutations in the data. They may also detect selective enrichment or selective depletion of specific chromosomes or subchromosomal regions, which may, for example, indicate chromosomal aneuploidy in the developing fetus.

Many of these methods are directed to NIPT (for example, US 6258540 B1, US 8296076 B2, US 8318430 B2, US 8195415 B2, US 9447453 B2 and US 8442774 B2). NIPT is used to detect fetal chromosomal abnormalities, such as trisomies and/or subchromosomal abnormalities such as microdeletions. The most popular approach has several steps. First, a large number of cfDNA molecules are sequenced. The resulting sequences are then mapped to a genome, that is, each sequence is assigned to a chromosome and/or to a part of a given chromosome. Next, for one or more such chromosomes or subchromosomal regions, the amount of sequence mapped to the region is determined, for example as an absolute or relative number of reads. Finally, this amount is compared with one or more normal or abnormal thresholds or cutoffs, and/or a statistical test is performed. The comparison determines whether the region is likely over-represented in the sequence data, which may correspond, for example, to a trisomy. It also determines whether the region is likely under-represented, which may correspond, for example, to a microdeletion.
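The read-count comparison described above is often implemented as a z-score: a chromosome's read fraction in the test sample is compared against a panel of reference samples. The Python sketch below illustrates only that general idea. The reference panel, the choice of a z-score, and the cutoff of 3 are illustrative assumptions, not values taken from the cited patents.

```python
from statistics import mean, stdev


def chromosome_fraction(read_chroms, target):
    """Relative number of mapped reads assigned to the target chromosome."""
    if not read_chroms:
        raise ValueError("no mapped reads")
    return sum(1 for c in read_chroms if c == target) / len(read_chroms)


def z_score(test_fraction, reference_fractions):
    """Standardise a test fraction against (assumed euploid) reference samples."""
    mu = mean(reference_fractions)
    sigma = stdev(reference_fractions)  # needs at least 2 reference samples
    return (test_fraction - mu) / sigma


def classify(z, cutoff=3.0):
    """Compare a z-score with a symmetric cutoff (the cutoff is an assumption)."""
    if z > cutoff:
        return "over-represented (e.g. possible trisomy)"
    if z < -cutoff:
        return "under-represented (e.g. possible microdeletion)"
    return "within normal range"


# Toy data: chr21 read fractions from four hypothetical euploid samples.
reference = [0.0130, 0.0131, 0.0129, 0.0130]
print(classify(z_score(0.0150, reference)))
```

A relative fraction is used here rather than an absolute read count so that samples sequenced to different depths can be compared.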

A variety of other or modified methods for analyzing cell-free DNA using data from individual, unlinked molecules have also been described (for example, WO2016094853 A1, US2015344970 A1 and US20150105267 A1).

Despite this wide range of methods, new cfDNA analysis methods are still needed that can reliably detect long-range genetic information (such as phasing) and that offer higher sensitivity. For example, in NIPT, fetal cfDNA makes up only a small fraction of the total cfDNA in a pregnant individual, because most circulating DNA is normal maternal DNA. A substantial technical challenge in NIPT is therefore distinguishing fetal cfDNA from maternal DNA. Similarly, in patients with cancer, tumor-derived cfDNA makes up only a small fraction of total circulating DNA. Using cfDNA analysis to diagnose or monitor cancer therefore faces similar technical challenges.

Summary of the invention

The present invention provides methods for analyzing nucleic acid fragments in circulating microparticles (or blood-derived microparticles). The invention is based on a linked-fragment approach, in which nucleic acid fragments from a single microparticle are linked together. This linking makes it possible to produce sets of linked sequence reads corresponding to the sequences of fragments from single microparticles.

The linked-fragment approach provides highly sensitive cfDNA analysis and also enables detection of long-range genetic information. It is based on a combination of insights. First, an individual circulating microparticle (for example, an individual circulating apoptotic body) may contain many genomic DNA fragments. These fragments are generated by the same individual cell, somewhere in the body, as it undergoes apoptosis. Second, a portion of such genomic DNA fragments within an individual microparticle may preferentially contain sequences from one or more specific chromosomal regions. Taken together, such circulating microparticles can act as data-rich, multi-featured "molecular stethoscopes". They can be used to observe potentially highly complex genetic events occurring within a limited somatic tissue space somewhere in the body. Importantly, most such microparticles enter the circulation before they are cleared or metabolized, so they can be detected non-invasively. The present invention describes experimental and informatic methods that use these "stethoscopes" to perform analytical and diagnostic tasks. The "stethoscopes" are sets of linked fragments and linked sequence reads. They may come from single individual microparticles or, in many embodiments, from complex samples comprising large numbers of single circulating microparticles.

The present invention provides a method of analyzing a sample comprising a blood-derived microparticle, wherein the microparticle comprises at least two target nucleic acid fragments, and wherein the method comprises the steps of: (a) preparing the sample for sequencing, comprising linking at least two of the at least two target nucleic acid fragments to produce a set of at least two linked target nucleic acid fragments; and (b) sequencing each of the linked fragments in the set to produce at least two (informatically) linked sequence reads.

The present invention provides a method of analyzing a sample comprising a circulating microparticle, wherein the circulating microparticle contains at least two target nucleic acid fragments, and wherein the method comprises the steps of: (a) preparing the sample for sequencing, comprising linking at least two of the at least two target nucleic acid fragments to produce a set of at least two linked target nucleic acid fragments; and (b) sequencing each of the linked fragments in the set to produce at least two (informatically) linked sequence reads.

The present invention provides a method of analyzing a sample comprising a blood-derived microparticle, wherein the microparticle comprises at least two genomic DNA fragments, and wherein the method comprises the steps of: (a) preparing the sample for sequencing, comprising linking at least two of the at least two genomic DNA fragments to produce a set of at least two linked genomic DNA fragments; and (b) sequencing each of the linked fragments in the set to produce at least two linked sequence reads.

The present invention provides a method of analyzing a sample comprising a circulating microparticle, wherein the circulating microparticle contains at least two genomic DNA fragments, and wherein the method comprises the steps of: (a) preparing the sample for sequencing, comprising linking at least two of the at least two genomic DNA fragments to produce a set of at least two linked genomic DNA fragments; and (b) sequencing each of the linked fragments in the set to produce at least two linked sequence reads.
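Barcoding is one of the linking techniques named in the abstract. When step (a) uses barcoding, step (b) yields reads that are linked informatically because they share a barcode. A minimal Python sketch of grouping reads into linked sets is given below. It assumes each read has already been parsed into a (barcode, sequence) pair. The `Read` type and the `min_reads` parameter are hypothetical.

```python
from collections import defaultdict
from typing import NamedTuple


class Read(NamedTuple):
    barcode: str   # barcode sequence shared by fragments of one microparticle
    sequence: str  # sequence of the target nucleic acid fragment


def group_linked_reads(reads, min_reads=2):
    """Return {barcode: [sequence, ...]} for sets of at least min_reads reads."""
    groups = defaultdict(list)
    for read in reads:
        groups[read.barcode].append(read.sequence)
    # A set of linked reads needs at least two members to be informative.
    return {bc: seqs for bc, seqs in groups.items() if len(seqs) >= min_reads}


reads = [Read("AAAA", "ACGTAC"), Read("AAAA", "GGCCTA"), Read("CCCC", "TTTTGG")]
print(group_linked_reads(reads))  # {'AAAA': ['ACGTAC', 'GGCCTA']}
```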

In the methods, at least 3, at least 4, at least 5, at least 10, at least 50, at least 100, at least 500, at least 1000, at least 5000, at least 10,000, at least 100,000 or at least 1,000,000 target nucleic acid fragments of the microparticle may be linked into a set and then sequenced to produce at least 3, at least 4, at least 5, at least 10, at least 50, at least 100, at least 500, at least 1000, at least 5000, at least 10,000, at least 100,000 or at least 1,000,000 linked sequence reads.

Preferably, at least 5 target nucleic acid fragments of the microparticle are linked into a set and then sequenced to produce at least 5 linked sequence reads.

In the methods, each linked sequence read may provide the sequence of at least 1, at least 5, at least 10, at least 20, at least 30, at least 50, at least 100, at least 200, at least 500, at least 1000 or at least 10,000 nucleotides of the linked fragment. Preferably, each linked sequence read provides the sequence of at least 20 nucleotides of the linked fragment.
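The preferred minimums stated above can be applied as a simple filter on linked sets: at least 5 linked fragments per set, and at least 20 nucleotides of sequence per linked read. The sketch below is one hypothetical way to do this in Python. The function name, and the choice to drop short reads before counting, are assumptions.

```python
def filter_linked_sets(linked_sets, min_reads=5, min_read_len=20):
    """Keep linked sets that still have >= min_reads reads of >= min_read_len nt.

    linked_sets maps a set identifier (e.g. a barcode) to its read sequences.
    Reads shorter than min_read_len are dropped before the set is counted.
    """
    kept = {}
    for set_id, seqs in linked_sets.items():
        long_enough = [s for s in seqs if len(s) >= min_read_len]
        if len(long_enough) >= min_reads:
            kept[set_id] = long_enough
    return kept


sets = {
    "A": ["N" * 25] * 5,                   # 5 reads of 25 nt: kept
    "B": ["N" * 25] * 4,                   # only 4 reads: dropped
    "C": ["N" * 25] * 4 + ["N" * 10] * 3,  # only 4 reads of >= 20 nt: dropped
}
print(sorted(filter_linked_sets(sets)))  # ['A']
```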

In the methods, a total of at least 2, at least 10, at least 100, at least 1000, at least 10,000, at least 100,000, at least 1,000,000, at least 10,000,000, at least 100,000,000, at least 1,000,000,000, at least 10,000,000,000, at least 100,000,000,000 or at least 1,000,000,000,000 sequence reads may be produced. Preferably, a total of at least 500,000 sequence reads is produced.

A sequence read may comprise at least 5, at least 10, at least 25, at least 50, at least 100, at least 250, at least 500, at least 1000, at least 2000, at least 5000 or at least 10,000 nucleotides from the target nucleic acid (for example, genomic DNA). Preferably, each sequence read comprises at least 5 nucleotides from the target nucleic acid.

A sequence read may comprise a raw sequence read generated by a sequencer, or part of one. An example is a raw sequence read of 50 nucleotides generated by an Illumina sequencer. A sequence read may comprise the fused sequence of two reads from a paired-end sequencing run. An example is the concatenated or fused sequence of the first and second reads of a paired-end sequencing run on an Illumina sequencer. A sequence read may also comprise part of a raw sequence read generated by a sequencer. An example is 20 contiguous nucleotides within a 150-nucleotide raw sequence read generated by an Illumina sequencer. A single raw sequence read may comprise at least two linked sequence reads generated by the methods of the invention.
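The three ways of deriving sequence reads from raw sequencer output described above can be sketched in Python as follows. The sketch is illustrative only. The paired reads are concatenated as-is, as the text describes, with no reverse-complementing. The cut positions used to split one raw read into several linked sequence reads are assumed to be known in advance; how to locate them is not specified here.

```python
def fuse_pair(read1: str, read2: str) -> str:
    """Concatenate the first and second reads of a paired-end run."""
    return read1 + read2


def sub_read(raw: str, start: int, length: int = 20) -> str:
    """Take `length` contiguous nucleotides of a raw read, starting at `start`."""
    if start < 0 or start + length > len(raw):
        raise ValueError("requested window falls outside the raw read")
    return raw[start:start + length]


def split_raw_read(raw: str, cut_points: list[int]) -> list[str]:
    """Split one raw read at (assumed known) positions into several sequence reads."""
    edges = [0, *sorted(cut_points), len(raw)]
    return [raw[a:b] for a, b in zip(edges, edges[1:]) if b > a]


raw_150 = "ACGT" * 37 + "AC"  # a 150-nt toy raw read
print(len(sub_read(raw_150, 10)))              # 20
print(split_raw_read("AAAACCCCGGGG", [4, 8]))  # ['AAAA', 'CCCC', 'GGGG']
```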

Sequence reads may be generated by any method known in the art, for example chain-termination (Sanger) sequencing. Preferably, sequencing is performed by a next-generation sequencing method, for example sequencing by synthesis, sequencing by synthesis using reversible terminators (e.g. Illumina sequencing), pyrosequencing (e.g. 454 sequencing), sequencing by ligation (e.g. SOLiD sequencing), single-molecule sequencing (e.g. single-molecule real-time (SMRT) sequencing, Pacific Biosciences), or nanopore sequencing (e.g. on the MinION or PromethION platforms, Oxford Nanopore Technologies). Most preferably, sequence reads are generated by sequencing by synthesis using reversible terminators (e.g. Illumina sequencing).

The method may comprise the further step of mapping each linked sequence read to a reference genome sequence. The linked sequence reads may comprise sequences that map to the same chromosome of the reference genome sequence, or sequences that map to two or more different chromosomes of the reference genome sequence.
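
Once linked reads have been mapped, checking whether a set spans one chromosome or several reduces to tallying the chromosome assigned to each read. This minimal sketch assumes alignments have already been produced by some aligner and are supplied as hypothetical (read_id, chromosome, position) tuples; it does not perform the mapping itself.

```python
from collections import Counter


def chromosomes_spanned(mapped_reads: list[tuple[str, str, int]]) -> Counter:
    """Count how many linked reads in one set map to each reference chromosome."""
    return Counter(chrom for _read_id, chrom, _pos in mapped_reads)


def maps_to_single_chromosome(mapped_reads: list[tuple[str, str, int]]) -> bool:
    """True if every linked read in the set maps to the same reference chromosome."""
    return len(chromosomes_spanned(mapped_reads)) == 1


# Hypothetical mapped set of linked reads from one microparticle.
group = [("r1", "chr1", 1000), ("r2", "chr1", 5000), ("r3", "chr7", 200)]
```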

The diameter of the microparticle may be at least 100 nm, at least 110 nm, at least 125 nm, at least 150 nm, at least 175 nm, at least 200 nm, at least 250 nm, or at least 500 nm. Preferably, the diameter of the microparticle is at least 200 nm. The diameter of the microparticle may be 100-5000 nm. The diameter may be 10-10,000 nm (e.g. 100-10,000 nm or 110-10,000 nm), 50-5000 nm, 75-5000 nm, or 100-3000 nm. The diameter may be 10-90 nm, 50-100 nm, 90-200 nm, 100-200 nm, 100-500 nm, 100-1000 nm, 1000-2000 nm, 90-5000 nm, or 2000-10,000 nm. Preferably, the diameter of the microparticle is 100 to 5000 nm. Most preferably, the diameter of the microparticle is 200 to 5000 nm. The sample may comprise microparticles of at least two different sizes, of at least three different sizes, or of a range of different sizes.

The linked genomic DNA fragments may be derived from a single genomic DNA molecule.

The method may further comprise the step of estimating or determining the genomic sequence length of a linked genomic DNA fragment. Optionally, this step may be performed by sequencing substantially the entire sequence of the linked fragment (i.e. from approximately its 5' end to approximately its 3' end) and counting the number of nucleotides sequenced. Alternatively, it may be performed as follows: a sufficient number of nucleotides at the 5' end of the linked fragment are sequenced to map that end to a locus in a reference genome sequence (e.g. the human genome sequence); likewise, a sufficient number of nucleotides at the 3' end of the linked fragment are sequenced to map that end to a locus in the reference genome sequence; and the reference genome sequence is then used to determine the genomic sequence length of the linked fragment as the sum of (i) the number of nucleotides sequenced at the 3' end of the linked fragment, (ii) the number of nucleotides sequenced at the 5' end of the linked fragment, and (iii) the number of nucleotides lying between these two sequences in the reference genome (i.e. the unsequenced portion).
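
The mapped-ends option for estimating fragment length is simple coordinate arithmetic. The sketch below assumes both end reads map uniquely to the same chromosome and strand of the reference, with 1-based inclusive coordinates; the function names and parameters are hypothetical. The three terms in the return value correspond to terms (i) to (iii) of the sum described in the text.

```python
def length_from_full_sequence(fragment_sequence: str) -> int:
    """Option 1: the fragment was sequenced end to end, so count its nucleotides."""
    return len(fragment_sequence)


def length_from_mapped_ends(five_prime_start: int, five_prime_len: int,
                            three_prime_end: int, three_prime_len: int) -> int:
    """Option 2: estimate length from end reads mapped to a reference genome.

    five_prime_start: reference position of the first sequenced 5' nucleotide.
    three_prime_end: reference position of the last sequenced 3' nucleotide.
    Coordinates are 1-based and inclusive; same chromosome and strand assumed.
    """
    if three_prime_end < five_prime_start:
        raise ValueError("3' end maps upstream of the 5' end")
    after_five_prime = five_prime_start + five_prime_len  # first position after 5' read
    three_prime_start = three_prime_end - three_prime_len + 1  # first position of 3' read
    unsequenced = three_prime_start - after_five_prime  # reference bases between the reads
    # If the end reads overlap, `unsequenced` is negative and the sum still equals
    # the reference span (three_prime_end - five_prime_start + 1).
    return three_prime_len + five_prime_len + unsequenced
```

Because the unsequenced portion is taken from reference coordinates, the estimate will differ from the true fragment length if the sample genome carries insertions or deletions relative to the reference in that interval.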

Preferably, the sample is isolated from blood, plasma, or serum. The microparticles may be isolated from blood, plasma, or serum. The method may further comprise the step of isolating the microparticles from blood, plasma, or serum. This step may be performed before or during step (a).

The microparticles may be isolated by centrifugation, size-exclusion chromatography, and/or filtration.

The isolation step may comprise centrifugation. The microparticles may be isolated by pelleting, using a centrifugation step and/or an ultracentrifugation step, or a series of two or more centrifugation and/or ultracentrifugation steps at two or more different speeds, wherein the pellet and/or supernatant from one centrifugation/ultracentrifugation step is further processed in a second centrifugation/ultracentrifugation step and/or in a differential centrifugation process.

A centrifugation or ultracentrifugation step may be performed at a speed of 100-500,000 g, 100-1000 g, 1000-10,000 g, 10,000-100,000 g, 500-100,000 g, or 100,000-500,000 g. A centrifugation or ultracentrifugation step may be performed for a duration of at least 5 seconds, at least 10 seconds, at least 30 seconds, at least 60 seconds, at least 5 minutes, at least 10 minutes, at least 30 minutes, at least 60 minutes, or at least 3 hours.
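
The speeds above are given as relative centrifugal force (RCF, in multiples of g). Setting a centrifuge in rpm requires the rotor radius, which the text does not specify; the sketch below uses the standard relation RCF = r * omega^2 / g with a user-supplied radius in cm, and checks a step against the broadest stated ranges (100-500,000 g, at least 5 seconds). The function names are hypothetical.

```python
import math

STANDARD_GRAVITY_CM_S2 = 980.665  # standard gravity, cm/s^2


def rcf_from_rpm(rpm: float, radius_cm: float) -> float:
    """Relative centrifugal force (multiples of g) at radius_cm for a given rotor speed."""
    omega = 2 * math.pi * rpm / 60.0  # angular velocity, rad/s
    return radius_cm * omega ** 2 / STANDARD_GRAVITY_CM_S2


def rpm_from_rcf(rcf: float, radius_cm: float) -> float:
    """Rotor speed (rpm) needed to reach a given RCF at radius_cm."""
    omega = math.sqrt(rcf * STANDARD_GRAVITY_CM_S2 / radius_cm)
    return omega * 60.0 / (2 * math.pi)


def within_stated_range(rcf: float, duration_s: float) -> bool:
    """Check a step against the broadest ranges in the text: 100-500,000 g, >= 5 s."""
    return 100 <= rcf <= 500_000 and duration_s >= 5
```

The radius to use is the one specified for the particular rotor (for example its maximum or average radius), so the same RCF corresponds to different rpm settings on different rotors.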

The isolation step may comprise size-exclusion chromatography, for example column-based size-exclusion chromatography, such as size-exclusion chromatography using a column containing an agarose-based matrix or a Sephacryl-based matrix.

The size-exclusion chromatography may use a matrix or filter having a pore size or diameter of at least 50 nanometers, at least 100 nanometers, at least 200 nanometers, at least 500 nanometers, at least 1.0 microns, at least 2.0 microns, or at least 5.0 microns.

The isolation step may comprise filtering the sample. The filtrate may provide the microparticles analyzed in the method. Optionally, a filter is used to isolate microparticles below a certain size, wherein the filter preferentially or completely removes microparticles larger than 100 nanometers, larger than 200 nanometers, larger than 300 nanometers, larger than 500 nanometers, larger than 1.0 micron, larger than 2.0 microns, larger than 3.0 microns, larger than 5.0 microns, or larger than 10.0 microns. Optionally, two or more such filtration steps may be performed, using filters with the same size-exclusion parameters or with different size-exclusion parameters. Optionally, the filtrate from one or more filtration steps comprises the microparticles from which the linked sequence reads are generated.
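
A series of filtration steps can be modelled as successive size cutoffs, each applied to the filtrate of the previous step. The sketch treats every filter as ideal (it completely removes everything above its cutoff), whereas the text also allows filters that only preferentially remove larger microparticles; the diameters and cutoffs (in nm) and the function names are hypothetical.

```python
def filter_by_cutoff(diameters_nm: list[float], cutoff_nm: float) -> list[float]:
    """Idealized filter: microparticles larger than the cutoff are retained on the
    filter and removed; those at or below it pass into the filtrate."""
    return [d for d in diameters_nm if d <= cutoff_nm]


def sequential_filtration(diameters_nm: list[float], cutoffs_nm: list[float]) -> list[float]:
    """Apply filtration steps in order; each step filters the previous filtrate."""
    filtrate = list(diameters_nm)
    for cutoff in cutoffs_nm:
        filtrate = filter_by_cutoff(filtrate, cutoff)
    return filtrate


particles = [80, 150, 450, 900, 2500, 7000]  # hypothetical diameters, nm
final_filtrate = sequential_filtration(particles, [5000, 1000])  # [80, 150, 450, 900]
```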

In the method, the sample may comprise first and second microparticles from blood, wherein each microparticle comprises at least two target nucleic acid (e.g. genomic DNA) fragments, and wherein the method comprises performing step (a) to produce a first set of linked target nucleic acid fragments from the first microparticle and a second set of linked target nucleic acid fragments from the second microparticle, and performing step (b) to produce a first set of linked sequence reads from the first microparticle and a second set of linked sequence reads from the second microparticle.

In the method, the set of linked sequence reads produced for the first microparticle may be distinguishable from the set of linked sequence reads produced for the second microparticle.

In the method, the sample may comprise n microparticles from blood, wherein each microparticle contains at least two target nucleic acid (e.g. genomic DNA) fragments, and wherein the method comprises performing step (a) to produce n sets of linked target nucleic acid fragments, one set for each of the n microparticles, and performing step (b) to produce n sets of linked sequence reads, one set for each of the n microparticles.

In the method, n may be at least 3, at least 5, at least 10, at least 50, at least 100, at least 1000, at least 10,000, at least 100,000, at least 1,000,000, at least 10,000,000, at least 100,000,000, at least 1,000,000,000, at least 10,000,000,000, or at least 100,000,000,000. Preferably, n is at least 100,000 microparticles.

In the method, the nucleic acid sample may comprise at least 3, at least 5, at least 10, at least 50, at least 100, at least 1000, at least 10,000, at least 100,000, at least 1,000,000, at least 10,000,000, at least 100,000,000, at least 1,000,000,000, at least 10,000,000,000, or at least 100,000,000,000 microparticles, wherein the microparticles are contained within a single contiguous aqueous volume during any step of the method, for example during any step of contacting the sample with a library of multimeric barcoding reagents, and/or any step of appending barcode sequences to target nucleic acids, and/or any step of appending coupling sequences to target nucleic acids, and/or any step of crosslinking or permeabilization.

The set of linked sequence reads produced for each microparticle may be distinguishable from the sets of linked sequence reads produced for the other microparticles.

Before step (a), the method may further comprise the step of distributing the sample into at least two different reaction volumes.
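
How many microparticles end up in each reaction volume depends on loading. A common modelling assumption, not stated in the text, is that microparticles distribute randomly, so the number per volume follows a Poisson distribution with mean lambda = concentration x volume. Under that assumption the sketch reports the fractions of volumes that are empty, hold one microparticle, or hold more than one; the concentration, volume, and function names are hypothetical.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """Probability of exactly k microparticles in a volume with mean occupancy lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def loading_summary(particles_per_ul: float, volume_ul: float) -> dict[str, float]:
    """Fractions of reaction volumes holding 0, 1, or more than 1 microparticle."""
    lam = particles_per_ul * volume_ul  # mean microparticles per reaction volume
    p0 = poisson_pmf(0, lam)
    p1 = poisson_pmf(1, lam)
    return {"mean": lam, "empty": p0, "single": p1, "multiple": 1.0 - p0 - p1}


# Hypothetical: 100 microparticles per microliter split into 1-nanoliter volumes.
summary = loading_summary(particles_per_ul=100.0, volume_ul=0.001)
```

Under this model, volumes holding more than one microparticle would tend to yield reads that can only be attributed probabilistically to a single microparticle, so lower loading trades throughput for purity.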

In the present invention, two sequences or sequence reads (e.g. as determined by a sequencing reaction) may be informatically linked by any means that allows such sequences to be related or correlated with each other in any way, in a computer system, in an algorithm, or in a dataset. Such linkage may comprise, be determined by, or be represented by a discrete identity link, a shared attribute, or any indirect means by which two or more such sequences are linked, correlated, or related to one another.

Linkage may comprise, and/or be determined by, and/or be represented by, sequences within the sequencing reaction itself (e.g. a barcode sequence determined by the sequencing reaction, or two different portions or segments of a single determined sequence which together comprise the first and second linked sequences). Alternatively, linkage may be determined, comprised, or represented independently of such sequences (e.g. by inclusion in the same flow cell, or in the same lane, compartment, or region of a flow cell or sequencer; by inclusion in the same sequencing run of a sequencer; by inclusion in the biological sample with a degree of spatial proximity; and/or by inclusion in the sequencer or sequencing flow cell with a degree of spatial proximity). Linkage may comprise, and/or be determined by, and/or be represented by, a measurement or parameter corresponding to a physical location or partition within a sequencer, for example a pixel or pixel position in an image and/or in a multi-pixel camera or multi-pixel charge-coupled device, and/or, for example, the position of a nanopore in a nanopore sequencer or nanopore membrane.

Linkage may be absolute (i.e. two sequences are either linked or not linked, with no further quantitative, semi-quantitative, or qualitative/categorical relationship). Linkage may also be relative or probabilistic, i.e. determined, comprised, or represented in terms of a degree, probability, or extent of linkage, for example by one or more parameters that may take any of a range of quantitative, semi-quantitative, or qualitative/categorical values. For example, two (or more) sequences may be informatically linked by a quantitative, semi-quantitative, or qualitative/categorical parameter that represents, comprises, estimates, or embodies the proximity of the two (or more) sequences within a sequencer, or the proximity of the two (or more) sequences within a biological sample.
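
The distinction between absolute and relative linkage can be shown with a score that decays with physical distance, for example between two cluster positions on a flow cell. Both the exponential form and the `length_scale` parameter are assumptions chosen for illustration; the text does not specify any particular function. A threshold then collapses the relative score into an absolute linked / not-linked call.

```python
import math


def linkage_weight(pos_a: tuple[float, float], pos_b: tuple[float, float],
                   length_scale: float) -> float:
    """Hypothetical relative linkage score in (0, 1] that decays with distance.

    pos_a, pos_b: (x, y) positions, e.g. pixel coordinates of two clusters.
    length_scale: distance at which the score falls to 1/e (an assumed parameter).
    """
    distance = math.dist(pos_a, pos_b)
    return math.exp(-distance / length_scale)


def is_linked(weight: float, threshold: float = 0.5) -> bool:
    """Convert the relative score into an absolute linked / not-linked call."""
    return weight >= threshold
```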

In any analysis of two or more sequences that are informatically linked by any such means, the presence (or absence) of linkage may be used as a parameter in any analysis or evaluation step, or in any algorithm for performing it. Likewise, the degree, probability, or extent of linkage may be used as a parameter in any such analysis or evaluation step, or in any algorithm for performing it.

In one form of such linkage, a given set of two or more linked sequences may be associated with a unique identifier (e.g. an alphanumeric identifier) or with a barcode or barcode sequence. In another form, a given set of two or more linked sequences may be associated with a barcode or barcode sequence that is contained within the sequences determined by the sequencing reaction. For example, each sequence determined in a sequencing reaction may comprise a barcode sequence and a sequence corresponding to a genomic DNA sequence. Optionally, certain sequences or linked sequences may be represented by, or associated with, two or more barcodes or identifiers.
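
For the form of linkage in which each determined sequence carries its own barcode, sets of linked reads can be recovered by grouping on the barcode portion. This sketch assumes, purely for illustration, a fixed-length barcode at the start of each read and no sequencing errors; real read layouts and error-tolerant barcode matching are outside its scope, and the function name is hypothetical.

```python
from collections import defaultdict


def group_by_barcode(reads: list[str], barcode_length: int) -> dict[str, list[str]]:
    """Split each read into (barcode, genomic part) and group genomic parts by barcode.

    Assumes the barcode occupies the first `barcode_length` nucleotides of every read.
    """
    groups: dict[str, list[str]] = defaultdict(list)
    for read in reads:
        barcode, insert = read[:barcode_length], read[barcode_length:]
        groups[barcode].append(insert)
    return dict(groups)


reads = ["AAAAACGTTT", "AAAAAGGCCA", "CCCCCTTAGC"]
linked = group_by_barcode(reads, barcode_length=5)
# {'AAAAA': ['CGTTT', 'GGCCA'], 'CCCCC': ['TTAGC']}
```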

In another form of linkage, two or more linked sequences may be held within a discrete partition in a computer or computer network, a hard drive, any kind of storage medium, or any other device that stores sequence data. Optionally, certain sequences or linked sequences may be stored in two or more such partitions in a computer or data medium.

Informatically linked sequences may comprise one or more sets of informatically linked sequences. The sequences within a set of linked sequences may all share the same linkage feature or representation thereof; for example, all sequences in a linked set may be associated with the same barcode or the same identifier, or may be contained within the same partition in a computer or storage medium, or may share any other form of linkage, correlation, and/or relation. One or more sequences in a linked set may be exclusive members of that set, and therefore not members of any other set. Alternatively, one or more sequences in a linked set may be non-exclusive members of that set, in which case such a sequence may be represented by, and/or associated with, two or more different sets of linked sequences.
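
Exclusive and non-exclusive membership can be represented by inverting a mapping from set identifiers to sequence identifiers. The identifiers and function names below are hypothetical; a sequence listed under more than one set is a non-exclusive member.

```python
def membership(sets: dict[str, set[str]]) -> dict[str, set[str]]:
    """Map each sequence ID to the IDs of every linked set that contains it."""
    member_of: dict[str, set[str]] = {}
    for set_id, sequence_ids in sets.items():
        for seq_id in sequence_ids:
            member_of.setdefault(seq_id, set()).add(set_id)
    return member_of


def exclusive_members(sets: dict[str, set[str]]) -> set[str]:
    """Sequences that belong to exactly one linked set."""
    return {seq for seq, owners in membership(sets).items() if len(owners) == 1}


# "r3" is a non-exclusive member: it is linked to both sets.
linked_sets = {"barcode_A": {"r1", "r2", "r3"}, "barcode_B": {"r3", "r4"}}
```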

1. Samples comprising microparticles

The sample used in the method of the invention comprises at least one microparticle derived from blood (e.g. human blood). The microparticle may be derived from maternal blood. The microparticle may be derived from the blood of a patient with a disease (e.g. cancer). The sample may be, for example, a blood sample, a plasma sample, or a serum sample. The sample may be a mammalian sample. Preferably, the sample is a human sample.

A variety of cell-free microparticles have been found in blood, plasma, and/or serum from humans and other animals (Orozco et al, Cytometry Part A (2010) 77A:502-514). These microparticles are diverse in their tissue and cell of origin, in the biophysical processes underlying their formation, and in their respective size, molecular structure, and composition. Microparticles may comprise components derived from the cell membrane (e.g. incorporating phospholipid components) as well as some intracellular or nuclear components. Microparticles include exosomes, apoptotic bodies (also known as apoptotic vesicles), and extracellular microvesicles.

A microparticle may be defined as a membrane vesicle containing at least two target nucleic acid (e.g. genomic DNA) fragments. The diameter of the microparticle may be 100-5000 nm. Preferably, the diameter of the microparticle is 100-3000 nanometers.

Exosomes are among the smallest circulating microparticles, typically 50 to 100 nanometers in diameter. They are thought to derive from the cell membrane of living, intact cells, and they contain protein within their outer phospholipid component as well as RNA components (including mRNA molecules and/or degraded mRNA molecules, and small regulatory RNA molecules such as microRNA molecules). Exosomes are thought to be formed by exocytosis of cytoplasmic multivesicular bodies (Gyorgy et al, Cell. Mol. Life Sci. (2011) 68:2667-2688). Exosomes are thought to play various roles in intercellular signaling and extracellular functions (Kanada et al, PNAS (2015) 1418401112). Techniques for quantifying or sequencing the microRNA and/or mRNA molecules found in exosomes have been described previously (e.g. US patent application 13/456,121; European application EP2626433 A1).

Microparticles further include apoptotic bodies (also known as apoptotic vesicles) and extracellular microvesicles, which have overall diameters of up to 1 micron or even 2 to 5 microns and are generally regarded as having diameters greater than 100 nanometers (Lichtenstein et al, Ann N Y Acad Sci. (2001) 945:239-49). All types of circulating microparticles are thought to be generated by a large number and variety of cells in the body (Thierry et al, Cancer Metastasis Rev 35(3), 347-376 (2016), s10555-016-9629-x).

Preferably, the microparticle is not an exosome; for example, the microparticle is any microparticle with a diameter greater than that of an exosome.

Numerous methods for isolating circulating microparticles (and/or specific subsets, classes, or fractions of circulating microparticles) have been described previously. European patent ES2540255 (B1) and US patent 9005888 B2 describe methods for isolating specific circulating microparticles (e.g. apoptotic bodies) based on centrifugation. Numerous methods for isolating different types of cell-free microparticles by centrifugation, ultracentrifugation, and other techniques have previously been described and developed in detail (Gyorgy et al, Cell. Mol. Life Sci. (2011) 68:2667-2688).

The microparticle contains at least two target nucleic acid fragments (e.g. molecules of fragmented genomic DNA). These fragmented genomic DNA molecules, and/or the sequences they contain, may be linked by any method described herein.

The target nucleic acid fragments may be DNA fragments (e.g. molecules of fragmented genomic DNA) or RNA fragments (e.g. mRNA fragments). Preferably, the target nucleic acid fragments are genomic DNA fragments.

The DNA fragment may be a fragment of mitochondrial DNA. The DNA fragment may be a fragment of mitochondrial DNA from a maternal cell or tissue. The DNA fragment may be a fragment of mitochondrial DNA from fetal or placental tissue. The DNA fragment may be a fragment of mitochondrial DNA from diseased tissue and/or cancerous tissue.

The microparticle may comprise a platelet. The microparticle may comprise a tumour-educated platelet. The target nucleic acid may comprise platelet RNA (e.g. fragments of platelet RNA and/or fragments of tumour-educated platelet RNA). A sample comprising one or more platelets may comprise platelet-rich plasma (e.g. platelet-rich plasma comprising tumour-educated platelets).

The target nucleic acid fragments may comprise double-stranded or single-stranded nucleic acid. The genomic DNA fragments may comprise double-stranded DNA or single-stranded DNA. The target nucleic acid fragments may comprise partially double-stranded nucleic acid. The genomic DNA fragments may comprise double-stranded DNA fragments.

The target nucleic acid fragments may be fragments of a single nucleic acid molecule, or fragments of two or more nucleic acid molecules. For example, the genomic DNA fragments may be derived from a single genomic DNA molecule.

As will be understood by the skilled person, the term target nucleic acid fragment as used herein refers both to the original fragments present in a microparticle and to copies or amplicons thereof. For example, the term gDNA fragment refers to an original gDNA fragment present in a microparticle and also to, for example, a DNA molecule that may be prepared from the original genomic DNA fragment by a primer-extension reaction. As another example, the term mRNA fragment refers to an original mRNA fragment present in a microparticle and also to, for example, a cDNA molecule that may be prepared from the original mRNA fragment by reverse transcription.

A target nucleic acid (e.g. genomic DNA) fragment may be at least 10 nucleotides, at least 15 nucleotides, at least 20 nucleotides, at least 25 nucleotides, or at least 50 nucleotides in length. A target nucleic acid (e.g. genomic DNA) fragment may be 15 to 100,000 nucleotides, 20 to 50,000 nucleotides, 25 to 25,000 nucleotides, 30 to 10,000 nucleotides, 35 to 5,000 nucleotides, 40 to 1000 nucleotides, or 50 to 500 nucleotides in length. A target nucleic acid (e.g. genomic DNA) fragment may be 20 to 200, 100 to 200, 200 to 1000, 50 to 250, 1000 to 10,000, 10,000 to 100,000, or 50 to 100,000 nucleotides in length. Preferably, the fragmented genomic DNA molecules are 50 to 500 nucleotides in length.

In the sample, the concentration of microparticles may be less than 0.001, less than 0.01, less than 0.1, less than 1.0, less than 10, less than 100, less than 1000, less than 10,000, less than 100,000, less than 1,000,000, less than 10,000,000, or less than 100,000,000 microparticles per microliter.

In the sample, the concentration of nucleic acid (e.g. genomic DNA) fragments may be less than 1.0 picogram, less than 10 picograms, less than 100 picograms, less than 1.0 nanogram, less than 10 nanograms, less than 100 nanograms, or less than 1000 nanograms of DNA per microliter.

2. Linking by barcoding

The present invention provides a method of preparing a sample for sequencing, wherein the sample comprises microparticles from blood, wherein each microparticle contains at least two target nucleic acid (e.g. genomic DNA) fragments, and wherein the method comprises appending at least two target nucleic acid fragments of the microparticle to a barcode sequence, or to different barcode sequences of a set of barcode sequences, to produce a set of linked target nucleic acid fragments.

The present invention provides a method of preparing a sample for sequencing, wherein the sample comprises circulating microparticles, wherein each circulating microparticle contains at least two target nucleic acid (e.g. genomic DNA) fragments, and wherein the method comprises appending at least two target nucleic acid fragments of the circulating microparticle to a barcode sequence, or to different barcode sequences of a set of barcode sequences, to produce a set of linked target nucleic acid fragments.

Before the step of appending at least two target nucleic acid fragments of the microparticle to a barcode sequence, or to different barcode sequences of a set of barcode sequences, the method may comprise appending a coupling sequence to each target nucleic acid (e.g. genomic DNA) fragment of the microparticle, wherein the coupling sequence is then appended to the barcode sequence, or to different barcode sequences of the set of barcode sequences, to produce the set of linked target nucleic acid fragments.

In the method, the sample may comprise first and second microparticles from blood, wherein each microparticle comprises at least two target nucleic acid (e.g. genomic DNA) fragments, and wherein the method may comprise appending at least two target nucleic acid fragments of the first microparticle to a first barcode sequence, or to different barcode sequences of a first set of barcode sequences, to produce a first set of linked target nucleic acid fragments, and appending at least two target nucleic acid fragments of the second microparticle to a second barcode sequence, or to different barcode sequences of a second set of barcode sequences, to produce a second set of linked target nucleic acid fragments.

The first barcode sequence may be different from the second barcode sequence. The barcode sequences of the first set of barcode sequences may be different from the barcode sequences of the second set of barcode sequences.

In the method, the sample may comprise n microparticles from blood, wherein each microparticle contains at least two target nucleic acid (e.g. genomic DNA) fragments, and wherein the method comprises performing step (a) to produce n sets of linked target nucleic acid fragments, one set for each of the n circulating microparticles.

In the method, n may be at least 3, at least 5, at least 10, at least 50, at least 100, at least 1000, at least 10,000, at least 100,000, at least 1,000,000, at least 10,000,000, at least 100,000,000, at least 1,000,000,000, at least 10,000,000,000, or at least 100,000,000,000. Preferably, n is at least 100,000 microparticles.

Preferably, each set of linked sequence reads is linked by a different barcode sequence or a different set of barcode sequences. Each barcode sequence of a set of barcode sequences may be different from the barcode sequences of at least 1, at least 4, at least 9, at least 49, at least 99, at least 999, at least 9,999, at least 99,999, at least 999,999, at least 9,999,999, at least 99,999,999, at least 999,999,999, at least 9,999,999,999, at least 99,999,999,999, or at least 999,999,999,999 other sets of barcode sequences in the library. Each barcode sequence of a set of barcode sequences may be different from the barcode sequences of all other sets of barcode sequences in the library. Preferably, each barcode sequence of a set of barcode sequences is different from the barcode sequences of at least 9 other sets of barcode sequences in the library.
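
The requirement that each barcode sequence of a set differ from the barcode sequences of at least some number of other sets (preferably at least 9) can be checked directly for a small library. This brute-force sketch, with hypothetical set names, barcodes, and function names, computes for each set the worst case over its barcodes; it is quadratic in the number of sets and intended only to make the condition concrete.

```python
def min_distinct_sets(library: dict[str, set[str]]) -> dict[str, int]:
    """For each set, the minimum over its barcodes of how many other sets do not
    contain that barcode. A value >= k means every barcode of the set differs
    from the barcode sequences of at least k other sets."""
    result: dict[str, int] = {}
    for set_id, barcodes in library.items():
        others = [bcs for other_id, bcs in library.items() if other_id != set_id]
        result[set_id] = min(
            (sum(1 for other in others if bc not in other) for bc in barcodes),
            default=len(others),  # an empty set trivially differs from all others
        )
    return result


def meets_minimum(library: dict[str, set[str]], k: int) -> bool:
    """True if every barcode of every set differs from the barcodes of >= k other sets."""
    return all(n >= k for n in min_distinct_sets(library).values())


library = {
    "set1": {"ACGTAC", "TTGACA"},
    "set2": {"GGCATT", "CATGCA"},
    "set3": {"ACGTAC", "AATTCC"},  # reuses a barcode from set1
}
```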

The present invention provides a method of analyzing a sample comprising microparticles from blood, wherein each microparticle comprises at least two target nucleic acid fragments, and wherein the method comprises the steps of: (a) preparing the sample for sequencing, comprising appending at least two target nucleic acid (e.g. genomic DNA) fragments of the microparticle to a barcode sequence to produce a set of linked target nucleic acid fragments; and (b) sequencing each of the linked fragments in the set to produce at least two linked sequence reads, wherein the at least two linked sequence reads are linked by the barcode sequence.

The barcode sequence may comprise a unique sequence. Each barcode sequence may comprise at least 5, at least 10, at least 15, at least 20, at least 25, at least 50, or at least 100 nucleotides. Preferably, each barcode sequence comprises at least 5 nucleotides. Preferably, each barcode sequence comprises deoxyribonucleotides; optionally, all nucleotides in the barcode sequence are deoxyribonucleotides. One or more of the deoxyribonucleotides may be modified deoxyribonucleotides (e.g. deoxyribonucleotides modified with a biotin moiety, or deoxyuridine nucleotides). The barcode sequence may comprise one or more degenerate nucleotides or sequences. Alternatively, the barcode sequence may not comprise any degenerate nucleotides or sequences.

In the method, before the step of appending at least two target nucleic acid fragments of the microparticle to a barcode sequence, the method may comprise appending a coupling sequence to each nucleic acid fragment of the microparticle, wherein the coupling sequence is then appended to the barcode sequence to produce the set of linked fragments.

In the method, the sample may comprise first and second microparticles from blood, wherein each microparticle comprises at least two target nucleic acid (e.g. genomic DNA) fragments, and wherein the method comprises performing step (a) to produce a first set of linked target nucleic acid fragments from the first microparticle and a second set of linked target nucleic acid fragments from the second microparticle, and performing step (b) to produce a first set of linked sequence reads from the first microparticle and a second set of linked sequence reads from the second microparticle, wherein the at least two linked sequence reads of the first microparticle are linked by a different barcode sequence from the at least two linked sequence reads of the second microparticle.

The first set of linked fragments may be linked by a different barcode sequence from the second set of linked fragments.

In the method, the sample may comprise n microparticles from blood, wherein each microparticle contains at least two target nucleic acid (e.g. genomic DNA) fragments, and wherein the method comprises performing step (a) to produce n sets of linked target nucleic acid fragments, one set for each of the n microparticles, and performing step (b) to produce n sets of linked sequence reads, one set for each of the n microparticles.

In the method, n may be at least 3, at least 5, at least 10, at least 50, at least 100, at least 1000, at least 10,000, at least 100,000, at least 1,000,000, at least 10,000,000, at least 100,000,000, at least 1,000,000,000, at least 10,000,000,000, or at least 100,000,000,000. Preferably, n is at least 100,000 microparticles.

Preferably, each set of linked sequence reads is linked by a different barcode sequence.

In the method, the different barcode sequences may be provided as a library of barcode sequences. The library used in the method may comprise at least 2, at least 5, at least 10, at least 50, at least 100, at least 1000, at least 10,000, at least 100,000, at least 1,000,000, at least 10,000,000, at least 100,000,000, at least 1,000,000,000, at least 10,000,000,000, at least 100,000,000,000, or at least 1,000,000,000,000 different barcode sequences. Preferably, the library used in the method comprises at least 1,000,000 different barcode sequences.
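
Library size places a lower bound on barcode length: with four nucleotides, k positions give 4^k distinct sequences. For the preferred library of at least 1,000,000 different barcode sequences, 4^9 = 262,144 is too few and 4^10 = 1,048,576 suffices, so at least 10 nucleotides are needed. The sketch below (function name hypothetical) computes this bound; it ignores design constraints such as tolerance to sequencing errors or sequence composition, which would typically call for additional length.

```python
def min_barcode_length(n_barcodes: int, alphabet_size: int = 4) -> int:
    """Smallest length k such that alphabet_size**k >= n_barcodes.

    Uses integer arithmetic to avoid floating-point rounding in log().
    """
    if n_barcodes < 1:
        raise ValueError("need at least one barcode")
    k, capacity = 0, 1
    while capacity < n_barcodes:
        k += 1
        capacity *= alphabet_size
    return k


# 4**9 = 262,144 < 1,000,000 <= 4**10 = 1,048,576
length_for_preferred_library = min_barcode_length(1_000_000)  # 10
```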

In the method, each barcode sequence in the library may be appended only to fragments from a single microparticle.

The method may be deterministic, i.e. a barcode sequence may be used to identify sequence reads from a single microparticle, or probabilistic, i.e. a barcode sequence may be used to identify sequence reads that are likely to be from a single microparticle. In certain embodiments, a barcode sequence is appended to genomic DNA fragments from two or more microparticles.
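
The probabilistic case can be quantified under a simplifying assumption not taken from the text: each microparticle receives one barcode drawn uniformly at random, with replacement, from the library. For m microparticles and N barcodes, the chance that a given microparticle's barcode is shared with no other microparticle is then (1 - 1/N)^(m - 1). The function names are hypothetical.

```python
def prob_barcode_unique(n_particles: int, library_size: int) -> float:
    """Probability that a given microparticle's barcode is not shared with any other
    microparticle, assuming each draws one barcode uniformly at random (with
    replacement) from a library of `library_size` barcodes."""
    return (1.0 - 1.0 / library_size) ** (n_particles - 1)


def expected_unique_particles(n_particles: int, library_size: int) -> float:
    """Expected number of microparticles whose barcode identifies them unambiguously."""
    return n_particles * prob_barcode_unique(n_particles, library_size)


p_unique = prob_barcode_unique(100_000, 1_000_000)
```

For example, 100,000 microparticles drawing from 1,000,000 barcodes gives a probability of roughly 0.905 (close to exp(-0.1)) that a given barcode identifies a single microparticle; enlarging the library relative to the number of microparticles pushes this toward the deterministic case.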

The method may comprise: (a) preparing the sample for sequencing, comprising appending each of at least two target nucleic acid (e.g. genomic DNA) fragments of the microparticle to a different barcode sequence of a set of barcode sequences to produce a set of linked target nucleic acid fragments; and (b) sequencing each linked fragment in the set to produce at least two linked sequence reads, wherein the at least two linked sequence reads are linked by the set of barcode sequences.

In the method, before the step of appending each of at least two target nucleic acid fragments of the microparticle to a different barcode sequence, the method may comprise appending a coupling sequence to each target nucleic acid fragment of the microparticle, wherein each of the at least two target nucleic acid fragments of the microparticle is appended, via its coupling sequence, to a different barcode sequence of the set of barcode sequences.
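
The two-stage construction (coupling sequence first, then barcode) can be illustrated with strings. The layout barcode + coupling sequence + fragment, the function names, and all sequences shown are hypothetical; the text does not fix the order or length of these elements. Each fragment of one microparticle receives a different barcode from that microparticle's set, and the resulting reads can later be parsed back into barcode and fragment.

```python
def assemble_molecule(barcode: str, coupling: str, fragment: str) -> str:
    """Hypothetical layout: barcode, then coupling sequence, then target fragment."""
    return barcode + coupling + fragment


def parse_read(read: str, barcode_length: int, coupling: str) -> tuple[str, str]:
    """Split a read with the layout above back into (barcode, fragment)."""
    barcode, rest = read[:barcode_length], read[barcode_length:]
    if not rest.startswith(coupling):
        raise ValueError("coupling sequence not found where expected")
    return barcode, rest[len(coupling):]


COUPLING = "GATC"  # hypothetical coupling sequence
barcode_set = ["AACC", "GGTT"]  # one microparticle's set of barcode sequences
fragments = ["TTTACG", "CAGGA"]  # two target fragments from that microparticle

# Each fragment is appended, via its coupling sequence, to a different barcode of the set.
molecules = [assemble_molecule(bc, COUPLING, frag) for bc, frag in zip(barcode_set, fragments)]
parsed = [parse_read(m, barcode_length=4, coupling=COUPLING) for m in molecules]
# parsed == [("AACC", "TTTACG"), ("GGTT", "CAGGA")]
```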

In the method, the sample may comprise first and second microparticles from blood, wherein each microparticle comprises at least two target nucleic acid (e.g. genomic DNA) fragments, and wherein the method may comprise performing step (a) to produce a first set of linked target nucleic acid fragments from the first microparticle and a second set of linked target nucleic acid fragments from the second microparticle, and performing step (b) to produce a first set of linked sequence reads from the first microparticle and a second set of linked sequence reads from the second microparticle, wherein the first set of linked sequence reads is linked by a different set of barcode sequences from the second set of linked sequence reads.

In the method, the sample may comprise n blood-derived microparticles, each containing at least two target nucleic acid (e.g., genomic DNA) fragments, and the method may comprise performing step (a) to produce n sets of linked target nucleic acid fragments, one for each of the n microparticles, and performing step (b) to produce n sets of linked sequence reads, one for each of the n microparticles.

In the method, n may be at least 3, at least 5, at least 10, at least 50, at least 100, at least 1,000, at least 10,000, at least 100,000, at least 1,000,000, at least 10,000,000, at least 100,000,000, at least 1,000,000,000, at least 10,000,000,000 or at least 100,000,000,000. Preferably, n is at least 100,000 microparticles.

Preferably, each set of linked sequence reads is linked by a different set of barcode sequences.

In the method, the different sets of barcode sequences may be provided as a library of sets of barcode sequences. The library used in the method may comprise at least 2, at least 5, at least 10, at least 50, at least 100, at least 1,000, at least 10,000, at least 100,000, at least 1,000,000, at least 10,000,000, at least 100,000,000, at least 1,000,000,000, at least 10,000,000,000, at least 100,000,000,000 or at least 1,000,000,000,000 different sets of barcode sequences. Preferably, the library used in the method comprises at least 1,000,000 different sets of barcode sequences.
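Library size matters because, in the probabilistic case, two microparticles may receive the same set of barcode sequences. As a rough guide, the sketch below assumes (an assumption, not stated in the text) that each microparticle independently receives one set drawn uniformly at random from the library; the chance that a given microparticle shares its set with any of the other n - 1 is then 1 - (1 - 1/N)^(n-1) for a library of N sets. The function name `p_shared` is illustrative.

```python
import math


def p_shared(n_particles: int, n_sets: int) -> float:
    """Probability that one microparticle's barcode set is also drawn by at
    least one of the other n_particles - 1 microparticles.

    Assumes each microparticle independently receives exactly one set drawn
    uniformly at random from a library of n_sets sets.
    """
    # 1 - (1 - 1/N)^(n-1), computed with log1p/expm1 to stay accurate for large N
    return -math.expm1((n_particles - 1) * math.log1p(-1.0 / n_sets))


# Preferred values from the text: n >= 100,000 microparticles, >= 1,000,000 sets.
print(round(p_shared(100_000, 1_000_000), 4))  # 0.0952
```

Under this model, roughly 9.5% of microparticles would share their set with another microparticle at the preferred sizes, and enlarging the library reduces that fraction.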

Each barcode sequence of a set of barcode sequences may differ from the barcode sequences of at least 1, at least 4, at least 9, at least 49, at least 99, at least 999, at least 9,999, at least 99,999, at least 999,999, at least 9,999,999, at least 99,999,999, at least 999,999,999, at least 9,999,999,999, at least 99,999,999,999 or at least 999,999,999,999 other sets of barcode sequences in the library. Each barcode sequence of a set may differ from the barcode sequences of every other set in the library. Preferably, each barcode sequence in a set differs from the barcode sequences of at least 9 other sets in the library.
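The distinctness property above can be checked on a candidate library. This is a small sketch using a toy dictionary model of the library (set ID mapped to its barcode sequences); `min_other_sets_differing` is a hypothetical helper that returns the smallest number of other sets from which any single barcode differs. A value of (number of sets - 1) means every barcode is unique to its own set.

```python
from collections import Counter


def min_other_sets_differing(library):
    """Smallest number of other sets whose barcode sequences all differ from a
    given barcode, taken over every barcode in the library.

    library: dict mapping set ID -> list of barcode sequences (toy model).
    """
    occurrences = Counter()
    for barcodes in library.values():
        occurrences.update(set(barcodes))  # count each barcode once per set
    n_sets = len(library)
    # A barcode found in c sets (its own included) is absent from n_sets - c other sets.
    return min(n_sets - c for c in occurrences.values())


fully_distinct = {"s1": ["AAAA", "CCCC"], "s2": ["GGGG", "TTTT"], "s3": ["ACGT", "TGCA"]}
shared = {"s1": ["AAAA", "CCCC"], "s2": ["GGGG", "TTTT"], "s3": ["ACGT", "TTTT"]}
print(min_other_sets_differing(fully_distinct))  # 2, i.e. every other set
print(min_other_sets_differing(shared))          # 1, because TTTT is in s2 and s3
```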

In the method, the barcode sequences of a set of barcode sequences from the library may be attached only to fragments from a single microparticle.

The method may be deterministic, i.e., the set of barcode sequences can be used to identify sequence reads from a single microparticle, or probabilistic, i.e., the set of barcode sequences can be used to identify sequence reads that are likely to derive from a single microparticle.

The method may comprise preparing first and second samples for sequencing, wherein each sample comprises at least one blood-derived microparticle, each microparticle contains at least two target nucleic acid (e.g., genomic DNA) fragments, and the barcode sequences each comprise a sample identifier region, and wherein the method comprises: (i) performing step (a) on each sample, wherein the barcode sequences attached to target nucleic acid fragments from the first sample have a different sample identifier region from the barcode sequences attached to target nucleic acid fragments from the second sample; (ii) performing step (b) on each sample, wherein each linked sequence read includes the sequence of the sample identifier region; and (iii) determining, from its sample identifier region, the sample from which each linked sequence read was obtained.
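Step (iii) is a demultiplexing step. The following is a minimal sketch under a hypothetical read layout (a 4-nt sample identifier region, then a 6-nt barcode, then the insert); the identifier values and the name `demultiplex` are invented. It shows that reads carrying the same barcode are still kept apart when their sample identifier regions differ.

```python
from collections import defaultdict

# Hypothetical read layout: [sample identifier (4 nt)][barcode (6 nt)][insert]
SAMPLE_IDS = {"AAGG": "sample_1", "CCTT": "sample_2"}


def demultiplex(reads, sample_ids, sid_len=4, barcode_len=6):
    """Assign each read to its sample via the sample identifier region,
    then group reads within each sample by barcode."""
    by_sample = defaultdict(lambda: defaultdict(list))
    for read in reads:
        sample = sample_ids.get(read[:sid_len])
        if sample is None:
            continue  # unknown sample identifier: discarded in this sketch
        barcode = read[sid_len:sid_len + barcode_len]
        by_sample[sample][barcode].append(read[sid_len + barcode_len:])
    return {s: dict(groups) for s, groups in by_sample.items()}


reads = ["AAGGACGTACGGG", "CCTTACGTACTTT", "AAGGACGTACCCC", "GGGGACGTACAAA"]
print(demultiplex(reads, SAMPLE_IDS))
# {'sample_1': {'ACGTAC': ['GGG', 'CCC']}, 'sample_2': {'ACGTAC': ['TTT']}}
```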

In the method, the method may comprise a step of crosslinking the genomic DNA fragments within the microparticles before, during and/or after the step of attaching barcode sequences and/or coupling sequences.

In the method, the method may comprise a step of permeabilizing the microparticles before, during and/or after the step of attaching barcode sequences and/or coupling sequences, and/or optionally after the step of crosslinking the genomic DNA fragments within the microparticles. The method comprises permeabilizing the microparticles before the transfer step and, optionally, after the crosslinking step.

The barcode sequences may be contained in barcoded oligonucleotides in a solution of barcoded oligonucleotides; such barcoded oligonucleotides may be single-stranded, double-stranded, or single-stranded with one or more double-stranded regions. The barcoded oligonucleotides may be ligated to the target nucleic acid fragments in a single-stranded or double-stranded ligation reaction. A barcoded oligonucleotide may comprise a single-stranded 5' or 3' region capable of being ligated to a target nucleic acid fragment, in which case each barcoded oligonucleotide may be ligated to a target nucleic acid fragment in a single-stranded ligation reaction. Alternatively, a barcoded oligonucleotide may comprise a blunt, recessed or overhanging 5' or 3' region capable of being ligated to a target nucleic acid fragment, in which case each barcoded oligonucleotide may be ligated to a target nucleic acid fragment in a double-stranded ligation reaction.

In certain methods, the ends of the target nucleic acid fragments may be converted into blunt double-stranded ends in a blunting reaction, and the barcoded oligonucleotides may comprise blunt double-stranded ends; each barcoded oligonucleotide may then be ligated to a target nucleic acid fragment in a blunt-end ligation reaction. In certain methods, the ends of the target nucleic acid fragments may be converted into blunt double-stranded ends in a blunting reaction and then into a form with a single 3' adenosine overhang, and the barcoded oligonucleotides comprise double-stranded ends with a single 3' thymidine overhang capable of annealing to the single 3' adenosine overhang of the target nucleic acid fragments; each barcoded oligonucleotide may then be ligated to a target nucleic acid fragment in a double-stranded A/T ligation reaction.
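The end-compatibility logic behind blunt-end and A/T ligation can be illustrated with a simplified rule. This is a sketch, not a biochemical model. Each end is described by a hypothetical `(kind, overhang)` tuple. Blunt ends join blunt ends, and overhanging ends join only if their overhangs are of the same kind and base-pair with each other. Under this rule, A-tailed fragments cannot join one another, but each can join a T-tailed oligonucleotide.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def ends_compatible(end_a, end_b):
    """Decide whether two double-stranded DNA ends can be joined.

    Each end is a (kind, overhang) tuple with kind in {"blunt", "5'", "3'"}.
    Simplified rule: blunt pairs only with blunt; otherwise the overhangs must
    be of the same kind and one must be the reverse complement of the other.
    """
    kind_a, oh_a = end_a
    kind_b, oh_b = end_b
    if kind_a == "blunt" or kind_b == "blunt":
        return kind_a == kind_b
    return kind_a == kind_b and oh_b == revcomp(oh_a)


a_tailed_fragment = ("3'", "A")
t_tailed_oligo = ("3'", "T")
print(ends_compatible(a_tailed_fragment, t_tailed_oligo))     # True
print(ends_compatible(a_tailed_fragment, a_tailed_fragment))  # False
```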

In certain methods, the barcoded oligonucleotides comprise, at their 3' or 5' end, a target region capable of annealing to a target region in the target nucleic acid and/or coupling sequence, and the barcode sequences may be attached to the target nucleic acid by annealing the barcoded oligonucleotides to the target nucleic acid and/or coupling sequence and, optionally, extending the barcoded oligonucleotides and/or ligating them to the nucleic acid target and/or coupling sequence.

In certain methods, the coupling sequence is attached to the genomic DNA fragments before the barcoded oligonucleotides are attached.

Before the attaching step, the method may comprise a step of partitioning the nucleic acid sample into at least two different reaction volumes.

3. Linking by barcoding with multimeric barcoding reagents

The invention provides a method of preparing a sample for sequencing, wherein the sample comprises blood-derived microparticles and the microparticles contain at least two target nucleic acid (e.g., genomic DNA) fragments, and wherein the method comprises the steps of: (a) contacting the sample with a library comprising multimeric barcoding reagents, wherein each multimeric barcoding reagent comprises first and second barcode regions linked together and each barcode region comprises a nucleic acid sequence; and (b) attaching barcode sequences to each of first and second target nucleic acid fragments of a microparticle to produce first and second barcoded target nucleic acid molecules of the microparticle, wherein the first barcoded target nucleic acid molecule comprises the nucleic acid sequence of the first barcode region and the second barcoded target nucleic acid molecule comprises the nucleic acid sequence of the second barcode region.

The invention provides a method of preparing a sample for sequencing, wherein the sample comprises blood-derived microparticles and the microparticles contain at least two target nucleic acid (e.g., genomic DNA) fragments, and wherein the method comprises the steps of: (a) contacting the sample with a multimeric barcoding reagent, wherein the multimeric barcoding reagent comprises first and second barcoded oligonucleotides linked together and the barcoded oligonucleotides each comprise a barcode region; and (b) annealing or ligating the first and second barcoded oligonucleotides to first and second target nucleic acid fragments of a microparticle to produce first and second barcoded target nucleic acid molecules.

The invention provides a method of preparing a sample for sequencing, wherein the sample comprises first and second blood-derived microparticles, each containing at least two target nucleic acid (e.g., genomic DNA) fragments, and wherein the method comprises the steps of: (a) contacting the sample with a library comprising at least two multimeric barcoding reagents, wherein each multimeric barcoding reagent comprises first and second barcode regions linked together, each barcode region comprises a nucleic acid sequence, and the first and second barcode regions of a first multimeric barcoding reagent are different from the first and second barcode regions of a second multimeric barcoding reagent in the library; and (b) attaching barcode sequences to each of first and second target nucleic acid fragments of the first microparticle to produce first and second barcoded target nucleic acid molecules of the first microparticle, wherein the first barcoded target nucleic acid molecule comprises the nucleic acid sequence of the first barcode region of the first multimeric barcoding reagent and the second barcoded target nucleic acid molecule comprises the nucleic acid sequence of the second barcode region of the first multimeric barcoding reagent, and attaching barcode sequences to each of first and second target nucleic acid fragments of the second microparticle to produce first and second barcoded target nucleic acid molecules of the second microparticle, wherein the first barcoded target nucleic acid molecule comprises the nucleic acid sequence of the first barcode region of the second multimeric barcoding reagent and the second barcoded target nucleic acid molecule comprises the nucleic acid sequence of the second barcode region of the second multimeric barcoding reagent.

The invention provides a method of preparing a sample for sequencing, wherein the sample comprises first and second blood-derived microparticles, each containing at least two target nucleic acid (e.g., genomic DNA) fragments, and wherein the method comprises the steps of: (a) contacting the sample with a library comprising at least two multimeric barcoding reagents, wherein each multimeric barcoding reagent comprises first and second barcoded oligonucleotides linked together, the barcoded oligonucleotides each comprise a barcode region, and the barcode regions of the first and second barcoded oligonucleotides of a first multimeric barcoding reagent in the library are different from the barcode regions of the first and second barcoded oligonucleotides of a second multimeric barcoding reagent in the library; and (b) annealing or ligating the first and second barcoded oligonucleotides of the first multimeric barcoding reagent to first and second target nucleic acid fragments of the first microparticle to produce first and second barcoded target nucleic acid molecules, and annealing or ligating the first and second barcoded oligonucleotides of the second multimeric barcoding reagent to first and second target nucleic acid fragments of the second microparticle to produce first and second barcoded target nucleic acid molecules.

The barcoded oligonucleotides may be ligated to the target nucleic acid fragments in a single-stranded or double-stranded ligation reaction.

In the method, the barcoded oligonucleotides may comprise a single-stranded 5' or 3' region capable of being ligated to a target nucleic acid fragment. Each barcoded oligonucleotide may be ligated to a target nucleic acid fragment in a single-stranded ligation reaction.

In the method, the barcoded oligonucleotides may comprise a blunt, recessed or overhanging 5' or 3' region capable of being ligated to a target nucleic acid fragment. Each barcoded oligonucleotide may be ligated to a target nucleic acid fragment in a double-stranded ligation reaction.

In the method, the ends of the target nucleic acid fragments may be converted into blunt double-stranded ends in a blunting reaction, and the barcoded oligonucleotides may comprise blunt double-stranded ends. Each barcoded oligonucleotide may be ligated to a target nucleic acid fragment in a blunt-end ligation reaction.

In the method, the ends of the target nucleic acid fragments may be converted into blunt double-stranded ends in a blunting reaction and then into a form with a single 3' adenosine overhang, and the barcoded oligonucleotides comprise double-stranded ends with a single 3' thymidine overhang capable of annealing to the single 3' adenosine overhang of the target nucleic acid fragments. Each barcoded oligonucleotide may be ligated to a target nucleic acid fragment in a double-stranded A/T ligation reaction.

In the method, the target nucleic acid fragments may be contacted with a restriction enzyme, wherein the restriction enzyme digests each fragment at restriction sites to produce ligation junctions at those sites, and the barcoded oligonucleotides comprise ends compatible with these ligation junctions. Each barcoded oligonucleotide may be ligated to a target nucleic acid fragment at a ligation junction in a double-stranded ligation reaction. Optionally, the restriction enzyme may be EcoRI, HindIII or BglII.
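To show what ends the three named enzymes leave, the sketch below finds recognition sites and cuts the top strand. It uses the standard cut positions (EcoRI G^AATTC, HindIII A^AGCTT, BglII A^GATCT), each of which leaves a 4-nt 5' overhang. A barcoded oligonucleotide with a matching overhang would have a compatible end. The `ENZYMES` table and `digest` function are illustrative, and only the top strand is returned.

```python
# Recognition sites and top-strand cut offsets; each leaves a 4-nt 5' overhang.
ENZYMES = {
    "EcoRI": ("GAATTC", 1),
    "HindIII": ("AAGCTT", 1),
    "BglII": ("AGATCT", 1),
}


def digest(seq: str, enzyme: str):
    """Cut the top strand of seq at every recognition site.

    Returns the top-strand pieces and the 5' overhang left at each ligation
    junction (valid for these palindromic sites with symmetric staggered cuts).
    """
    site, cut = ENZYMES[enzyme]
    overhang = site[cut:len(site) - cut]
    cuts = []
    pos = seq.find(site)
    while pos != -1:
        cuts.append(pos + cut)
        pos = seq.find(site, pos + 1)
    pieces, prev = [], 0
    for c in cuts:
        pieces.append(seq[prev:c])
        prev = c
    pieces.append(seq[prev:])
    return pieces, overhang


print(digest("TTGAATTCAAAAAGCTTGG", "EcoRI"))    # (['TTG', 'AATTCAAAAAGCTTGG'], 'AATT')
print(digest("TTGAATTCAAAAAGCTTGG", "HindIII"))  # (['TTGAATTCAAAA', 'AGCTTGG'], 'AGCT')
```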

In the method, before the step of annealing or ligating the first and second barcoded oligonucleotides to the first and second target nucleic acid fragments, the method may comprise attaching a coupling sequence to each target nucleic acid fragment, wherein the first and second barcoded oligonucleotides are then annealed or ligated to the coupling sequences of the first and second target nucleic acid fragments.

In the method, step (b) may comprise: (i) annealing the first and second barcoded oligonucleotides of the first multimeric barcoding reagent to the first and second target nucleic acid fragments of the first microparticle, and annealing the first and second barcoded oligonucleotides of the second multimeric barcoding reagent to the first and second target nucleic acid fragments of the second microparticle; and

(ii) extending the first and second barcoded oligonucleotides of the first multimeric barcoding reagent to produce first and second different barcoded target nucleic acid molecules, and extending the first and second barcoded oligonucleotides of the second multimeric barcoding reagent to produce first and second different barcoded target nucleic acid molecules, wherein each barcoded target nucleic acid molecule comprises at least one nucleotide synthesized using the target nucleic acid fragment as a template.

The method may comprise: (a) contacting the sample with a library comprising at least two multimeric barcoding reagents, wherein each multimeric barcoding reagent comprises first and second barcoded oligonucleotides linked together, the barcoded oligonucleotides each comprise, in the 5' to 3' direction, a target region and a barcode region, and the barcode regions of the first and second barcoded oligonucleotides of a first multimeric barcoding reagent in the library are different from the barcode regions of the first and second barcoded oligonucleotides of a second multimeric barcoding reagent in the library, and wherein the sample is also contacted with first and second target primers for each multimeric barcoding reagent; and (b) performing the following steps for each microparticle: (i) annealing the target region of the first barcoded oligonucleotide to a first sub-sequence of a first target nucleic acid (e.g., genomic DNA) fragment of the microparticle, and annealing the target region of the second barcoded oligonucleotide to a first sub-sequence of a second target nucleic acid (e.g., genomic DNA) fragment of the microparticle; (ii) annealing the first target primer to a second sub-sequence of the first target nucleic acid fragment of the microparticle, wherein the second sub-sequence is 3' of the first sub-sequence, and annealing the second target primer to a second sub-sequence of the second target nucleic acid fragment of the microparticle, wherein the second sub-sequence is 3' of the first sub-sequence; (iii) extending the first target primer, using the first target nucleic acid fragment of the microparticle as a template, until it reaches the first sub-sequence, to produce a first extended target primer, and extending the second target primer, using the second target nucleic acid fragment of the microparticle as a template, until it reaches the first sub-sequence, to produce a second extended target primer; and (iv) ligating the 3' end of the first extended target primer to the 5' end of the first barcoded oligonucleotide to produce a first barcoded target nucleic acid molecule, and ligating the 3' end of the second extended target primer to the 5' end of the second barcoded oligonucleotide to produce a second barcoded target nucleic acid molecule, wherein the first and second barcoded target nucleic acid molecules are different and each comprises at least one nucleotide synthesized using the target nucleic acid as a template.
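Steps (i) to (iv) for a single fragment can be traced on strings. This is a sketch with idealized assumptions: perfect annealing to exact reverse-complement matches, a polymerase that stops exactly at the annealed target region, and no strand displacement. `extend_and_ligate` and the example sequences are hypothetical. The product is the reverse complement of the template span from the first through the second sub-sequence, followed by the barcode region.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def extend_and_ligate(template, oligo_target_region, barcode_region, target_primer):
    """Simulate steps (i)-(iv) for one target fragment (all strands 5'->3').

    oligo_target_region: 5' target region of the barcoded oligonucleotide,
        which anneals to the first sub-sequence of the template.
    barcode_region: 3' barcode region of the same oligonucleotide.
    target_primer: anneals to the second sub-sequence, 3' of the first.
    """
    first_sub = revcomp(oligo_target_region)
    second_sub = revcomp(target_primer)
    i1, i2 = template.find(first_sub), template.find(second_sub)
    if i1 == -1 or i2 == -1:
        raise ValueError("oligonucleotide or primer has no annealing site")
    gap_start = i1 + len(first_sub)
    if i2 <= gap_start:
        raise ValueError("second sub-sequence must lie 3' of the first, with a gap")
    # (iii) the primer is extended across the gap until it reaches the first sub-sequence
    extended_primer = target_primer + revcomp(template[gap_start:i2])
    # (iv) ligate the extended primer's 3' end to the barcoded oligonucleotide's 5' end
    return extended_primer + oligo_target_region + barcode_region


template = "GATTACAGGCCTTAAC"
product = extend_and_ligate(template, "AATC", "ACGTAC", "GTTAAGG")
print(product)  # GTTAAGGCCTGTAATCACGTAC
```

The five copied nucleotides (`CCTGT`) are the ones "synthesized using the target nucleic acid as a template"; the sketch rejects layouts that leave no gap to fill.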

The multimeric barcoding reagents may each comprise: (i) first and second hybridization molecules linked together, wherein each hybridization molecule comprises a nucleic acid sequence comprising a hybridization region; and (ii) first and second barcoded oligonucleotides, wherein the first barcoded oligonucleotide is annealed to the hybridization region of the first hybridization molecule and the second barcoded oligonucleotide is annealed to the hybridization region of the second hybridization molecule.

The multimeric barcoding reagents may each comprise: (i) first and second barcode molecules linked together, wherein each barcode molecule comprises a nucleic acid sequence comprising a barcode region; and (ii) first and second barcoded oligonucleotides, wherein the first barcoded oligonucleotide comprises a barcode region annealed to the barcode region of the first barcode molecule and the second barcoded oligonucleotide comprises a barcode region annealed to the barcode region of the second barcode molecule.
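As a data-structure view of this reagent, the sketch below models the linked barcode molecules as a list and records which barcoded oligonucleotide is annealed to each. The class and method names are hypothetical, and annealing is reduced to an exact reverse-complement match between barcode regions. This is a toy rule, not a hybridization model.

```python
from dataclasses import dataclass, field
from typing import List, Optional

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


@dataclass
class BarcodeMolecule:
    barcode_region: str
    annealed_oligo: Optional[str] = None  # barcoded oligonucleotide hybridized here


@dataclass
class MultimericBarcodingReagent:
    molecules: List[BarcodeMolecule] = field(default_factory=list)

    def anneal_oligos(self, oligo_barcode_regions: List[str]) -> None:
        """Anneal one barcoded oligonucleotide to each linked barcode molecule.

        Toy rule: an oligo anneals only if its barcode region is the exact
        reverse complement of the molecule's barcode region.
        """
        if len(oligo_barcode_regions) != len(self.molecules):
            raise ValueError("need one oligonucleotide per barcode molecule")
        pairs = list(zip(self.molecules, oligo_barcode_regions))
        for mol, oligo in pairs:  # validate all before changing any state
            if oligo != revcomp(mol.barcode_region):
                raise ValueError(f"{oligo} does not anneal to {mol.barcode_region}")
        for mol, oligo in pairs:
            mol.annealed_oligo = oligo


reagent = MultimericBarcodingReagent([BarcodeMolecule("ACGTAC"), BarcodeMolecule("TTGACC")])
reagent.anneal_oligos(["GTACGT", "GGTCAA"])
print([m.annealed_oligo for m in reagent.molecules])  # ['GTACGT', 'GGTCAA']
```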

In the method, before step (b), the method may comprise transferring the first and second barcoded oligonucleotides of the first multimeric barcoding reagent into the first microparticle of the sample and transferring the first and second barcoded oligonucleotides of the second multimeric barcoding reagent into the second microparticle of the sample. Optionally, before step (b), the method comprises transferring target primers into the first and second microparticles. Optionally, before step (b), the method further comprises transferring the first multimeric barcoding reagent into the first microparticle and transferring the second multimeric barcoding reagent into the second microparticle.

The invention provides a method of preparing a sample for sequencing, wherein the sample comprises at least two blood-derived microparticles, each containing at least two target nucleic acid fragments, and wherein the method comprises the steps of: (a) contacting the sample with a library comprising a first multimeric barcoding reagent and a second multimeric barcoding reagent, wherein each multimeric barcoding reagent comprises first and second barcode molecules linked together, and each barcode molecule comprises a nucleic acid sequence comprising, optionally in the 5' to 3' direction, a barcode region and an adapter region; (b) attaching coupling sequences to first and second target nucleic acid (e.g., genomic DNA) fragments of the first and second microparticles; (c) for each multimeric barcoding reagent, annealing the coupling sequence of the first fragment to the adapter region of the first barcode molecule, and annealing the coupling sequence of the second fragment to the adapter region of the second barcode molecule; and (d) for each multimeric barcoding reagent, attaching barcode sequences to each of the at least two target nucleic acid fragments of the microparticle to produce first and second different barcoded target nucleic acid molecules, wherein the first barcoded target nucleic acid molecule comprises the nucleic acid sequence of the barcode region of the first barcode molecule and the second barcoded target nucleic acid molecule comprises the nucleic acid sequence of the barcode region of the second barcode molecule.

In the method, each barcode molecule may comprise a nucleic acid sequence comprising, in the 5' to 3' direction, a barcode region and an adapter region, wherein step (d) comprises, for each multimeric barcoding reagent, extending the coupling sequence of the first fragment using the barcode region of the first barcode molecule as a template to produce the first barcoded target nucleic acid molecule, and extending the coupling sequence of the second fragment using the barcode region of the second barcode molecule as a template to produce the second barcoded target nucleic acid molecule, wherein the first barcoded target nucleic acid molecule comprises a sequence complementary to the barcode region of the first barcode molecule and the second barcoded target nucleic acid molecule comprises a sequence complementary to the barcode region of the second barcode molecule.
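The extension in step (d) can be traced on strings. This is a sketch under stated assumptions: the barcode molecule is written 5'-[barcode region][adapter region]-3'. The coupling sequence is assumed (not stated in the text) to sit at the fragment's free 3' end and to be the exact reverse complement of the adapter region. Extension is assumed to copy the entire barcode region. `extend_coupling_sequence` is a hypothetical name.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def extend_coupling_sequence(fragment_with_coupling: str, barcode_molecule: str,
                             adapter_len: int) -> str:
    """Model step (d) for one fragment (all strands written 5'->3').

    barcode_molecule: 5'-[barcode region][adapter region]-3'.
    fragment_with_coupling: target fragment whose free 3' end carries a
        coupling sequence complementary to the adapter region (assumption).
    The annealed 3' end is extended across the barcode region, so the
    product gains the complement of that region.
    """
    if not 0 < adapter_len < len(barcode_molecule):
        raise ValueError("adapter_len must leave a non-empty barcode region")
    barcode_region = barcode_molecule[:-adapter_len]
    adapter_region = barcode_molecule[-adapter_len:]
    if not fragment_with_coupling.endswith(revcomp(adapter_region)):
        raise ValueError("coupling sequence does not anneal to the adapter region")
    return fragment_with_coupling + revcomp(barcode_region)


molecule = "ACGTAC" + "GGGAAA"  # barcode region + adapter region
fragment = "CATCAT" + "TTTCCC"  # target sequence + coupling sequence
print(extend_coupling_sequence(fragment, molecule, adapter_len=6))  # CATCATTTTCCCGTACGT
```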

In the method, each barcode molecule may comprise a nucleic acid sequence comprising, in the 5' to 3' direction, an adapter region and a barcode region, wherein step (d) comprises, for each multimeric barcoding reagent: (i) annealing a first extension primer and extending it using the barcode region of the first barcode molecule as a template to produce a first barcoded oligonucleotide, and annealing a second extension primer and extending it using the barcode region of the second barcode molecule as a template to produce a second barcoded oligonucleotide, wherein the first barcoded oligonucleotide comprises a sequence complementary to the barcode region of the first barcode molecule and the second barcoded oligonucleotide comprises a sequence complementary to the barcode region of the second barcode molecule; and (ii) ligating the 3' end of the first barcoded oligonucleotide to the 5' end of the coupling sequence of the first fragment to produce the first barcoded target nucleic acid molecule, and ligating the 3' end of the second barcoded oligonucleotide to the 5' end of the coupling sequence of the second fragment to produce the second barcoded target nucleic acid molecule.

In the method, each barcode molecule may comprise a nucleic acid sequence comprising, in the 5' to 3' direction, an adapter region, a barcode region and a priming region, wherein step (d) comprises, for each multimeric barcoding reagent: (i) annealing a first extension primer to the priming region of the first barcode molecule and extending it using the barcode region of the first barcode molecule as a template to produce a first barcoded oligonucleotide, and annealing a second extension primer to the priming region of the second barcode molecule and extending it using the barcode region of the second barcode molecule as a template to produce a second barcoded oligonucleotide, wherein the first barcoded oligonucleotide comprises a sequence complementary to the barcode region of the first barcode molecule and the second barcoded oligonucleotide comprises a sequence complementary to the barcode region of the second barcode molecule; and (ii) ligating the 3' end of the first barcoded oligonucleotide to the 5' end of the coupling sequence of the first fragment to produce the first barcoded target nucleic acid molecule, and ligating the 3' end of the second barcoded oligonucleotide to the 5' end of the coupling sequence of the second fragment to produce the second barcoded target nucleic acid molecule.
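The primer-extension-then-ligation variant with a priming region can be traced in the same way. This sketch assumes a barcode molecule laid out as 5'-[adapter][barcode][priming]-3'. It also assumes the coupling sequence sits at the fragment's 5' end as the exact reverse complement of the adapter region, and that extension stops at the adapter region where the coupling sequence is annealed. `build_barcoded_molecule` and the sequences are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def build_barcoded_molecule(barcode_molecule, adapter_len, priming_len, coupled_fragment):
    """Model steps (i)-(ii) for one fragment (all strands written 5'->3').

    barcode_molecule: 5'-[adapter region][barcode region][priming region]-3'.
    coupled_fragment: 5'-[coupling sequence][target fragment]-3', where the
        coupling sequence is complementary to the adapter region (assumption).
    """
    n = len(barcode_molecule)
    adapter = barcode_molecule[:adapter_len]
    barcode = barcode_molecule[adapter_len:n - priming_len]
    priming = barcode_molecule[n - priming_len:]
    # (i) the extension primer anneals to the priming region and is extended
    #     across the barcode region, stopping at the adapter region
    barcoded_oligo = revcomp(priming) + revcomp(barcode)
    if not coupled_fragment.startswith(revcomp(adapter)):
        raise ValueError("coupling sequence does not anneal to the adapter region")
    # (ii) ligate the barcoded oligo's 3' end to the coupling sequence's 5' end
    return barcoded_oligo + coupled_fragment


molecule = "AAAC" + "ACGTAC" + "GGTT"  # adapter + barcode + priming region
coupled = "GTTT" + "CATCAT"            # coupling sequence + target fragment
print(build_barcoded_molecule(molecule, 4, 4, coupled))  # AACCGTACGTGTTTCATCAT
```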

Before step (b) or step (c), the method may comprise transferring the first multimeric barcoding reagent, coupling sequences and/or extension primers into the first microparticle, and transferring the second multimeric barcoding reagent, coupling sequences and/or extension primers into the second microparticle.

The method may comprise: (a) contacting the sample with a library comprising first and second multimeric barcoding reagents, wherein each multimeric barcoding reagent comprises first and second barcode molecules linked together, each barcode molecule comprises a nucleic acid sequence comprising, in the 5' to 3' direction, a barcode region and an adapter region, and wherein the sample is also contacted with first and second adapter oligonucleotides for each multimeric barcoding reagent, the first and second adapter oligonucleotides each comprising an adapter region; (b) ligating the first and second adapter oligonucleotides of the first multimeric barcoding reagent to first and second target nucleic acid fragments of the first microparticle, and ligating the first and second adapter oligonucleotides of the second multimeric barcoding reagent to first and second target nucleic acid fragments of the second microparticle; (c) for each multimeric barcoding reagent, annealing the adapter region of the first adapter oligonucleotide to the adapter region of the first barcode molecule, and annealing the adapter region of the second adapter oligonucleotide to the adapter region of the second barcode molecule; and (d) for each multimeric barcoding reagent, extending the first adapter oligonucleotide using the barcode region of the first barcode molecule as a template to produce a first barcoded target nucleic acid molecule, and extending the second adapter oligonucleotide using the barcode region of the second barcode molecule as a template to produce a second barcoded target nucleic acid molecule, wherein the first barcoded target nucleic acid molecule comprises a sequence complementary to the barcode region of the first barcode molecule and the second barcoded target nucleic acid molecule comprises a sequence complementary to the barcode region of the second barcode molecule.

The method may comprise the steps of: (a) contacting the sample with a library comprising first and second multimeric barcoding reagents, wherein each multimeric barcoding reagent comprises: (i) first and second barcode molecules linked together, each barcode molecule comprising a nucleic acid sequence comprising, optionally in the 5' to 3' direction, an adapter region and a barcode region; and (ii) first and second barcoded oligonucleotides, wherein the first barcoded oligonucleotide comprises a barcode region annealed to the barcode region of the first barcode molecule, the second barcoded oligonucleotide comprises a barcode region annealed to the barcode region of the second barcode molecule, and the barcode regions of the first and second barcoded oligonucleotides of the first multimeric barcoding reagent in the library are different from the barcode regions of the first and second barcoded oligonucleotides of the second multimeric barcoding reagent in the library; and wherein the sample is also contacted with first and second adapter oligonucleotides for each multimeric barcoding reagent, the first and second adapter oligonucleotides each comprising an adapter region; (b) annealing or ligating the first and second adapter oligonucleotides of the first multimeric barcoding reagent to first and second target nucleic acid (e.g., genomic DNA) fragments of the first microparticle, and annealing or ligating the first and second adapter oligonucleotides of the second multimeric barcoding reagent to first and second target nucleic acid (e.g., genomic DNA) fragments of the second microparticle; (c) for each multimeric barcoding reagent, annealing the adapter region of the first adapter oligonucleotide to the adapter region of the first barcode molecule, and annealing the adapter region of the second adapter oligonucleotide to the adapter region of the second barcode molecule; and (d) for each multimeric barcoding reagent, ligating the 3' end of the first barcoded oligonucleotide to the 5' end of the first adapter oligonucleotide to produce a first barcoded target nucleic acid molecule, and ligating the 3' end of the second barcoded oligonucleotide to the 5' end of the second adapter oligonucleotide to produce a second barcoded target nucleic acid molecule.

In the method, step (b) may comprise annealing the first and second adapter oligonucleotides of the first multimeric barcoding reagent to first and second target nucleic acid (e.g., genomic DNA) fragments of the first microparticle, and annealing the first and second adapter oligonucleotides of the second multimeric barcoding reagent to first and second target nucleic acid (e.g., genomic DNA) fragments of the second microparticle, and wherein either: (i) for each multimeric barcoding reagent, step (d) comprises ligating the 3' end of the first barcoded oligonucleotide to the 5' end of the first adapter oligonucleotide to produce a first barcoded adapter oligonucleotide, ligating the 3' end of the second barcoded oligonucleotide to the 5' end of the second adapter oligonucleotide to produce a second barcoded adapter oligonucleotide, and extending the first and second barcoded adapter oligonucleotides to produce first and second different barcoded target nucleic acid molecules, each comprising at least one nucleotide synthesized using the target nucleic acid fragment as a template; or (ii) for each multimeric barcoding reagent, before step (d), the method comprises extending the first and second adapter oligonucleotides to produce first and second different target nucleic acid molecules, each comprising at least one nucleotide synthesized using the target nucleic acid fragment as a template.

In the method, before the step of annealing or ligating the first and second adapter oligonucleotides to the first and second target nucleic acid fragments, the method may comprise attaching a coupling sequence to each target nucleic acid fragment, wherein the first and second adapter oligonucleotides are then annealed or ligated to the coupling sequences of the first and second target nucleic acid fragments.

In the method, before step (b) or step (c), the method may comprise transferring the first and second adapter oligonucleotides of the first multimeric barcoding reagent into the first microparticle and transferring the first and second adapter oligonucleotides of the second multimeric barcoding reagent into the second microparticle; optionally, this step further comprises transferring the first multimeric barcoding reagent into the first microparticle and transferring the second multimeric barcoding reagent into the second microparticle.

In any of the methods herein, the method may comprise a step of crosslinking the target nucleic acid (e.g., genomic DNA) fragments within the microparticles. This step may be performed with a chemical crosslinking agent such as formaldehyde, paraformaldehyde, glutaraldehyde, disuccinimidyl glutarate or ethylene glycol bis(succinimidyl succinate), or with a bifunctional or heterobifunctional crosslinking agent. The step may be performed before any permeabilization step, after any permeabilization step, before any partitioning step, before any step of attaching coupling sequences, after any step of attaching coupling sequences, before any step of attaching barcode sequences (e.g., before step (b)), after any step of attaching barcode sequences (e.g., after step (d)), simultaneously with attaching barcode sequences, or at any combination of these points. For example, the sample comprising microparticles may be crosslinked before it is contacted with the library of two or more multimeric barcoding reagents. Any such crosslinking step may further be terminated by a quenching step; for example, a formaldehyde crosslinking step may be quenched by mixing with a glycine solution. Any such crosslinking may be reversed before a particular subsequent step of the protocol, such as a primer extension, PCR or nucleic acid purification step.

In the method, during steps (b), (c) and/or (d) (i.e. the steps of attaching barcode sequences), the microparticles and/or target nucleic acid fragments may be embedded in a gel or hydrogel, such as an agarose gel, a polyacrylamide gel or any covalently crosslinked gel, for example a covalently crosslinked poly(ethylene glycol) gel, or a covalently crosslinked gel comprising a mixture of thiol-functionalized poly(ethylene glycol) and acrylate-functionalized poly(ethylene glycol).

In any method herein, optionally after a crosslinking step, the method may include permeabilizing the microparticles. The microparticles may be permeabilized by an incubation step, which may be performed in the presence of a chemical surfactant. Optionally, the permeabilization step may take place before attaching barcode sequences (e.g. before step (b)), after attaching barcode sequences (e.g. after step (d)), or both before and after attaching barcode sequences. The incubation step may be performed at a temperature of at least 20, at least 30, at least 37, at least 45, at least 50, at least 60, at least 65, at least 70 or at least 80 degrees Celsius. The incubation step may last at least 1 second, at least 5 seconds, at least 10 seconds, at least 30 seconds, at least 1 minute, at least 5 minutes, at least 10 minutes, at least 30 minutes, at least 60 minutes or at least 3 hours. The step may be performed after any crosslinking step, before or after any other permeabilization step, before any partitioning step, before any step of attaching a coupling sequence, after any step of attaching a coupling sequence, before any step of attaching barcode sequences (e.g. before step (b)), after any step of attaching barcode sequences (e.g. after step (d)), at the same time as attaching barcode sequences, or any combination thereof. For example, the sample comprising microparticles may be crosslinked and then permeabilized in the presence of a chemical surfactant before being contacted with a library of two or more multimeric barcoding reagents.

In any method herein, the sample of microparticles may be digested with a protease digestion step (e.g. digestion with Proteinase K). Optionally, the protease digestion step may last at least 10 seconds, at least 30 seconds, at least 60 seconds, at least 5 minutes, at least 10 minutes, at least 30 minutes, at least 60 minutes, at least 3 hours, at least 6 hours, at least 12 hours or at least 24 hours. The step may be performed after any crosslinking step, before any permeabilization step, after any permeabilization step, before any partitioning step, before any step of attaching a coupling sequence, after any step of attaching a coupling sequence, before any step of attaching barcode sequences (e.g. before step (b)), after any step of attaching barcode sequences (e.g. after step (d)), at the same time as attaching barcode sequences, or any combination thereof. For example, the sample comprising microparticles may be crosslinked and then partially digested with a Proteinase K digestion step before being contacted with a library of two or more multimeric barcoding reagents.

In the method, the barcode oligonucleotides, adapter oligonucleotides and/or multimeric barcoding reagents may be transferred into the microparticles by complexing them with a transfection reagent or a lipid carrier (e.g. a liposome or micelle).

The transfection reagent may be a lipid transfection reagent, such as a cationic lipid transfection reagent. Optionally, the cationic lipid transfection reagent comprises at least two alkyl chains. Optionally, the cationic lipid transfection reagent may be a commercially available cationic lipid transfection reagent, such as Lipofectamine.

In the method, the barcode oligonucleotides of the first multimeric barcoding reagent may be contained within a first lipid carrier, and the barcode oligonucleotides of the second multimeric barcoding reagent may be contained within a second lipid carrier. The lipid carrier may be a liposome or a micelle.

In the method, steps (a) and (b), and optionally steps (c) and (d), may be performed on at least two microparticles in a single reaction volume.

Before step (b), the method may also include the step of partitioning the nucleic acid sample into at least two different reaction volumes.

The invention provides a method of analysing a sample comprising a blood-derived microparticle, wherein the microparticle comprises at least two target nucleic acid (e.g. genomic DNA) fragments, and wherein the method comprises the steps of: (a) preparing a sample for sequencing, comprising (i) contacting the sample with a multimeric barcoding reagent comprising first and second barcode regions linked together, wherein each barcode region comprises a nucleic acid sequence, and (ii) attaching barcode sequences to each of at least two target nucleic acid fragments of the microparticle to produce first and second different barcoded target nucleic acid molecules, wherein the first barcoded target nucleic acid molecule comprises the nucleic acid sequence of the first barcode region and the second barcoded target nucleic acid molecule comprises the nucleic acid sequence of the second barcode region; and (b) sequencing each barcoded target nucleic acid molecule to produce at least two linked sequence reads.
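As a hedged illustration of how the linked sequence reads produced in step (b) might be grouped computationally, the following Python sketch assumes that each read has already been split into its barcode sequence and its target-fragment insert, and that a lookup table (hypothetical here, named `barcode_to_reagent`) maps each barcode-region sequence to the multimeric barcoding reagent that carries it. Reads whose barcodes derive from the same reagent are collected into one linked set.

```python
from collections import defaultdict


def group_linked_reads(reads, barcode_to_reagent):
    """Group read inserts by the multimeric barcoding reagent their barcode came from.

    reads: iterable of (barcode, insert) pairs, where `barcode` is the sequence
        read from the attached barcode region and `insert` is the target-fragment
        sequence.
    barcode_to_reagent: dict mapping each barcode-region sequence to an
        identifier of the (hypothetical) reagent carrying it.

    Reads whose barcode is not in the lookup table are skipped.
    """
    groups = defaultdict(list)
    for barcode, insert in reads:
        reagent = barcode_to_reagent.get(barcode)
        if reagent is None:
            continue  # unknown barcode: cannot be assigned to a linked set
        groups[reagent].append(insert)
    return dict(groups)
```

For example, with `{"ACGT": "R1", "TGCA": "R1", "GGCC": "R2"}` as the lookup table, reads carrying `ACGT` and `TGCA` fall into the same linked set because both barcode regions belong to reagent `R1`. Exact barcode matching is a simplifying assumption; practical pipelines usually tolerate sequencing errors in the barcode.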

In the method, before the step of attaching barcode sequences to each of at least two genomic DNA fragments of the microparticle, the method may include attaching a coupling sequence to each genomic DNA fragment of the microparticle, wherein the barcode sequences are then attached to the coupling sequences of each of the at least two genomic DNA fragments of the microparticle to produce first and second different barcoded target nucleic acid molecules.

The method may also include, optionally before step (a)(i) or (a)(ii), the step of transferring the first and second barcode regions of the multimeric barcoding reagent into the microparticle.

Before the transfer step, any method described herein may also include the step of crosslinking the genomic DNA fragments within the microparticle. The crosslinking step may be performed with a chemical crosslinking agent such as formaldehyde, paraformaldehyde, glutaraldehyde, disuccinimidyl glutarate or ethylene glycol bis(succinimidyl succinate), or with a homobifunctional or heterobifunctional crosslinker.

During step (a), the microparticle and/or target nucleic acid fragments may be embedded in a gel or hydrogel, such as an agarose gel, a polyacrylamide gel or any covalently crosslinked gel, for example a covalently crosslinked poly(ethylene glycol) gel, or a covalently crosslinked gel comprising a mixture of thiol-functionalized poly(ethylene glycol) and acrylate-functionalized poly(ethylene glycol).

Before the transfer step, and optionally after a crosslinking step, the method may also include the step of permeabilizing the microparticle. The microparticle may be permeabilized by an incubation step, which may be performed in the presence of a chemical surfactant. Optionally, the permeabilization step may take place before attaching barcode sequences (e.g. before step (a)(ii)), after attaching barcode sequences (e.g. after step (a)(ii)), or both before and after attaching barcode sequences. The incubation step may be performed at a temperature of at least 20, at least 30, at least 37, at least 45, at least 50, at least 60, at least 65, at least 70 or at least 80 degrees Celsius. The incubation step may last at least 1 second, at least 5 seconds, at least 10 seconds, at least 30 seconds, at least 1 minute, at least 5 minutes, at least 10 minutes, at least 30 minutes, at least 60 minutes or at least 3 hours.

The microparticle sample may be digested with a protease digestion step (e.g. digestion with Proteinase K). Optionally, the protease digestion step may last at least 10 seconds, at least 30 seconds, at least 60 seconds, at least 5 minutes, at least 10 minutes, at least 30 minutes, at least 60 minutes, at least 3 hours, at least 6 hours, at least 12 hours or at least 24 hours. The step may be performed before permeabilization, after permeabilization, before attaching barcode sequences (e.g. before step (a)(ii)), after attaching barcode sequences (e.g. after step (a)(ii)), at the same time as attaching barcode sequences, or any combination thereof.

The first and second barcode regions of the multimeric barcoding reagent may be transferred into the microparticle by complexing them with a transfection reagent or a lipid carrier (e.g. a liposome or micelle).

The transfection reagent may be a lipid transfection reagent, such as a cationic lipid transfection reagent. Optionally, the cationic lipid transfection reagent comprises at least two alkyl chains. Optionally, the cationic lipid transfection reagent may be a commercially available cationic lipid transfection reagent, such as Lipofectamine.

Step (a) of the method may be performed by any method described herein for preparing a sample (or nucleic acid sample) for sequencing.

The method may include preparing first and second samples for sequencing, wherein each sample comprises at least one blood-derived microparticle, wherein each microparticle comprises at least two target nucleic acid (e.g. genomic DNA) fragments, and wherein the barcode sequences each comprise a sample identifier region, and wherein the method comprises the steps of: (i) performing step (a) on each sample, wherein the barcode sequences attached to the target nucleic acid fragments from the first sample have a different sample identifier region from the barcode sequences attached to the target nucleic acid fragments from the second sample; (ii) performing step (b) on each sample, wherein each sequence read comprises the sequence of a sample identifier region; and (iii) determining, from its sample identifier region, the sample from which each sequence read was obtained.
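Step (iii) amounts to demultiplexing reads by their sample identifier region. The Python sketch below is one minimal way to do this, assuming (hypothetically) that the sample identifier occupies the first `id_length` bases of each barcode sequence and that identifiers must match exactly; neither the position of the identifier nor the matching rule is specified in the text above.

```python
def demultiplex(reads, sample_ids, id_length):
    """Assign each read to a sample using its sample identifier region.

    reads: iterable of (barcode, insert) pairs.
    sample_ids: dict mapping identifier sequence -> sample name.
    id_length: number of leading barcode bases assumed to form the identifier.

    Returns a dict mapping sample name -> list of read inserts.
    Reads with an unrecognised identifier are left unassigned.
    """
    by_sample = {}
    for barcode, insert in reads:
        sample = sample_ids.get(barcode[:id_length])
        if sample is None:
            continue  # identifier not recognised; read is not assigned
        by_sample.setdefault(sample, []).append(insert)
    return by_sample
```

Because the identifier is read from every sequence read, first- and second-sample material can be pooled for sequencing and separated afterwards.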

The method may include analysing a sample comprising at least two blood-derived microparticles, wherein each microparticle comprises at least two target nucleic acid (e.g. genomic DNA) fragments, and wherein the method comprises the steps of: (a) preparing a sample for sequencing, comprising: (i) contacting the sample with a library of multimeric barcoding reagents comprising a multimeric barcoding reagent for each of two or more microparticles, wherein each multimeric barcoding reagent is as defined herein; and (ii) attaching barcode sequences to each of at least two target nucleic acid fragments of each microparticle, wherein at least two barcoded target nucleic acid molecules are produced from each of the at least two microparticles, and wherein the at least two barcoded target nucleic acid molecules produced from a single microparticle each comprise the nucleic acid sequence of a barcode region from the same multimeric barcoding reagent; and (b) sequencing each barcoded target nucleic acid molecule to produce at least two linked sequence reads for each microparticle.

The step of attaching barcode sequences to the genomic DNA fragments of the microparticles, i.e. step (a) of the method, may be performed in a single reaction volume.

Before the attaching step (step (a)(ii)), the method may also include the step of partitioning the sample into at least two different reaction volumes.

In any method, before the step of attaching barcode sequences, the multimeric barcoding reagent may be separated, fractionated or dissociated into two or more component parts, for example to release the barcode oligonucleotides.

In any method, the concentration of multimeric barcoding reagents may be less than 1.0 femtomolar, less than 10 femtomolar, less than 100 femtomolar, less than 1.0 picomolar, less than 10 picomolar, less than 100 picomolar, less than 1 nanomolar, less than 10 nanomolar, less than 100 nanomolar or less than 1.0 micromolar.
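To give a sense of scale for these concentration limits, the Python sketch below converts a molar concentration into the expected number of reagent molecules per microlitre, using only Avogadro's constant and unit definitions. The unit labels in `UNITS` are illustrative names chosen for this sketch.

```python
AVOGADRO = 6.02214076e23  # molecules per mole (exact SI value)

# Multipliers from each unit to mol/L.
UNITS = {"fM": 1e-15, "pM": 1e-12, "nM": 1e-9, "uM": 1e-6}


def molecules_per_microlitre(value, unit):
    """Expected number of molecules in 1 uL at the given molar concentration."""
    molar = value * UNITS[unit]       # mol/L
    return molar * AVOGADRO * 1e-6    # 1 uL = 1e-6 L
```

At 1.0 femtomolar this gives roughly 602 molecules per microlitre, and at 1.0 picomolar roughly 602,000, which indicates how many multimeric barcoding reagents are available to a given reaction volume at each stated limit.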

4. Linking by joining fragments together

The invention provides a method of analysing a sample comprising a blood-derived microparticle, wherein the microparticle comprises at least two target nucleic acid (e.g. genomic DNA) fragments, and wherein the method comprises the steps of: (a) preparing a sample for sequencing, comprising joining at least two target nucleic acid fragments of the microparticle together to produce a single nucleic acid molecule comprising the sequences of at least two target nucleic acid fragments; and (b) sequencing each fragment in the single nucleic acid molecule to produce at least two linked sequence reads.

The at least two target nucleic acid (e.g. genomic DNA) fragments may be contiguous within the single nucleic acid molecule.

The at least two linked sequence reads may be provided within a single raw sequence read.

The method may include, before the joining step, attaching a coupling sequence to at least one target nucleic acid (e.g. genomic DNA) fragment, and then joining the at least two target nucleic acid fragments together via the coupling sequence.
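When fragments are joined via a known coupling sequence, a single raw read can in principle be split back into its constituent fragment sequences at each occurrence of that sequence. The Python sketch below shows this under simplifying assumptions: the coupling sequence (hypothetical here) is known, appears verbatim with no sequencing errors, and does not occur naturally within the fragments.

```python
def split_at_coupling(read, coupling, min_len=1):
    """Split a raw read into fragment sequences separated by a coupling sequence.

    read: the raw sequence read from a joined molecule.
    coupling: the (assumed known) coupling sequence used to join fragments.
    min_len: pieces shorter than this (e.g. empty ends) are discarded.
    """
    pieces = read.split(coupling)
    return [piece for piece in pieces if len(piece) >= min_len]
```

For example, a read `AAAATTGGCCCC` joined via a hypothetical coupling sequence `TTGG` yields the two linked fragment sequences `AAAA` and `CCCC`.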

The target nucleic acid (e.g. genomic DNA) fragments may be joined together via a solid support, wherein two or more fragments are attached (directly or indirectly, e.g. via a coupling sequence) to the same solid support. Optionally, the solid support is a bead, such as a polystyrene bead, a superparamagnetic bead or an agarose bead.

The target nucleic acid (e.g. genomic DNA) fragments may be joined together by a ligation reaction (e.g. a double-stranded ligation reaction or a single-stranded ligation reaction).

The ends of the target nucleic acid fragments may be converted into blunt, ligatable double-stranded ends in an end-blunting reaction, and the method may include ligating two or more fragments to each other in a blunt-end ligation reaction.

The ends of the target nucleic acid fragments may be contacted with a restriction enzyme, wherein the restriction enzyme digests the fragments at restriction sites to produce ligatable ends at those sites, and wherein the method may include ligating two or more fragments to each other by a ligation reaction at the ligatable ends. Likewise, any target nucleic acid may be contacted with a restriction enzyme that digests it at restriction sites to produce ligatable ends, and two or more of the resulting fragments may be ligated to each other at those ends. Optionally, the restriction enzyme may be EcoRI, HindIII or BglII.
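The following Python sketch illustrates where restriction digestion would cut a sequence and so which fragments, with ligatable ends, would result. It uses the standard recognition sites and top-strand cut positions of EcoRI (G^AATTC), HindIII (A^AGCTT) and BglII (A^GATCT), plus AluI (AG^CT), the blunt-cutting enzyme used in the embodiment later in this section. It is a simplified model: it reports top-strand cut positions only and ignores overhangs, star activity and methylation sensitivity. Because all four sites are palindromic, scanning the top strand finds every site.

```python
import re

# Recognition site and top-strand cut offset (bases after the site start).
ENZYMES = {
    "EcoRI": ("GAATTC", 1),    # G^AATTC, 4-nt 5' overhang
    "HindIII": ("AAGCTT", 1),  # A^AGCTT, 4-nt 5' overhang
    "BglII": ("AGATCT", 1),    # A^GATCT, 4-nt 5' overhang
    "AluI": ("AGCT", 2),       # AG^CT, blunt end
}


def digest(seq, enzyme):
    """Return the top-strand pieces of `seq` after cutting at every site."""
    site, offset = ENZYMES[enzyme]
    # A zero-width lookahead finds overlapping occurrences of the site too.
    cuts = [m.start() + offset for m in re.finditer(f"(?={site})", seq)]
    pieces, prev = [], 0
    for cut in cuts:
        pieces.append(seq[prev:cut])
        prev = cut
    pieces.append(seq[prev:])
    return pieces
```

For example, `digest("TTGAATTCAA", "EcoRI")` yields `["TTG", "AATTCAA"]`, and the two new ends are compatible for re-ligation to other EcoRI-cut ends.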

Before the fragments are joined together, a coupling sequence may be attached to two or more target nucleic acid fragments. Optionally, two or more different coupling sequences are attached to the population of target nucleic acid fragments.

The coupling sequences may comprise a ligatable end at at least one end, wherein a first coupling sequence is attached to a first target nucleic acid fragment and a second coupling sequence is attached to a second target nucleic acid fragment, and wherein the two coupling sequences are ligated to each other, thereby joining the two target nucleic acid fragments together.

The coupling sequences may comprise an annealing region at at least one 3' end, wherein a first coupling sequence is attached to a first target nucleic acid fragment and a second coupling sequence is attached to a second target nucleic acid fragment, wherein the two coupling sequences are complementary to each other along a segment at least one nucleotide in length and anneal to each other, and wherein a DNA polymerase is used to extend at least one 3' end of the first coupling sequence by at least one nucleotide into the sequence of the second target nucleic acid fragment, thereby joining the two target nucleic acid (e.g. genomic DNA) fragments together.

Before the at least two fragments are joined together, the method may also include the step of crosslinking the microparticle, for example with a chemical crosslinking agent such as formaldehyde, paraformaldehyde, glutaraldehyde, disuccinimidyl glutarate or ethylene glycol bis(succinimidyl succinate), or with a homobifunctional or heterobifunctional crosslinker.

Before the at least two fragments are joined together, the method may also include partitioning the microparticles into two or more partitions.

The method may also include permeabilizing the microparticles during an incubation step. This step may be performed before partitioning (if performed), after partitioning (if performed), before the fragments are joined together and/or after the fragments are joined together.

The incubation step may be performed in the presence of a chemical surfactant, such as Triton X-100 (C14H22O(C2H4O)n, n = 9-10), NP-40, Tween 20, Tween 80, saponin, digitonin or sodium dodecyl sulfate.

The incubation step may be performed at a temperature of at least 20, at least 30, at least 37, at least 45, at least 50, at least 60, at least 65, at least 70, at least 80, at least 90 or at least 95 degrees Celsius.

The incubation step may last at least 1 second, at least 5 seconds, at least 10 seconds, at least 30 seconds, at least 1 minute, at least 5 minutes, at least 10 minutes, at least 30 minutes, at least 60 minutes or at least 3 hours.

The method may include digesting the sample of microparticles with a protease digestion step (e.g. digestion with Proteinase K). Optionally, the protease digestion step may last at least 10 seconds, at least 30 seconds, at least 60 seconds, at least 5 minutes, at least 10 minutes, at least 30 minutes, at least 60 minutes, at least 3 hours, at least 6 hours, at least 12 hours or at least 24 hours. The step may be performed before partitioning (if performed), after partitioning (if performed), before the fragments are joined together and/or after the fragments are joined together.

The method may include amplifying the (original) target nucleic acid fragments and then joining two or more of the resulting nucleic acid molecules together.

The step of joining the fragments together may produce a concatemerised nucleic acid molecule comprising at least 3, at least 5, at least 10, at least 50, at least 100, at least 500 or at least 1000 nucleic acid molecules joined to each other within a single contiguous nucleic acid molecule.

The method may be used to produce linked sequence reads for at least 3, at least 5, at least 10, at least 50, at least 100, at least 1000, at least 10,000, at least 100,000, at least 1,000,000, at least 10,000,000, at least 100,000,000, at least 1,000,000,000, at least 10,000,000,000 or at least 100,000,000,000 microparticles.

The sample may comprise at least two blood-derived microparticles, wherein each microparticle comprises at least two target nucleic acid (e.g. genomic DNA) fragments, and wherein the method comprises performing step (a) to produce, for each microparticle, a single nucleic acid molecule comprising the sequences of at least two target nucleic acid fragments, and performing step (b) to produce linked sequence reads for each microparticle.

Before, during and/or after the step of joining the at least two target nucleic acid (e.g. genomic DNA) fragments together, the method may include the step of crosslinking the target nucleic acid fragments within the microparticle. The crosslinking step may be performed with a chemical crosslinking agent such as formaldehyde, paraformaldehyde, glutaraldehyde, disuccinimidyl glutarate or ethylene glycol bis(succinimidyl succinate), or with a homobifunctional or heterobifunctional crosslinker.

Before, during and/or after the step of joining the at least two target nucleic acid (e.g. genomic DNA) fragments together, and/or optionally after the step of crosslinking the target nucleic acid fragments within the microparticle, the method may include the step of permeabilizing the microparticle.

Before step (a), the method may also include the step of partitioning the nucleic acid sample into at least two different reaction volumes.

In one embodiment of the method of joining at least two target nucleic acid fragments of a circulating microparticle together to produce a single nucleic acid molecule comprising the sequences of at least two target nucleic acid fragments, a sample comprising at least one circulating microparticle (e.g. a sample obtained and/or purified by any method disclosed herein) is crosslinked in 1% formalin for 10 minutes at room temperature, and the formaldehyde crosslinking step is then quenched with glycine. The microparticles are pelleted by a centrifugation step (e.g. 5 minutes at 3000 × g), resuspended in 1× NEBuffer 2 (New England Biolabs) containing 1.0% sodium dodecyl sulfate (SDS), and incubated for 10 minutes at 45 degrees Celsius to permeabilize the microparticles. The SDS is quenched by adding Triton X-100, and the solution is incubated overnight at 37 degrees Celsius with AluI (New England Biolabs) to produce blunt, ligatable ends. The enzyme is inactivated by adding SDS to a final concentration of 1.0% and incubating for 15 minutes at 65 degrees Celsius. The SDS is quenched by adding Triton X-100, and the solution is diluted at least 10-fold in 1× T4 DNA ligase buffer, to a total DNA concentration of at most 1.0 nanogram per microlitre. The diluted solution is incubated overnight at 16 degrees Celsius with T4 DNA ligase to join together the fragments from the circulating microparticles. The crosslinks are then reversed and the protein components degraded by incubating overnight at 65 degrees Celsius in a solution containing Proteinase K. The ligated DNA is then purified (e.g. with a Qiagen spin-column PCR purification kit and/or Ampure XP beads). Illumina sequencing adapter sequences are then attached by Nextera in vitro transposition (Illumina; according to the manufacturer's protocol), and an appropriate number of PCR cycles is performed to amplify the ligated material. The amplified and purified DNA of suitable size is then sequenced on an Illumina sequencer (e.g. an Illumina NextSeq 500 or MiSeq), with paired-end reads of at least 50 bases each. Each end of each paired-end read is mapped independently to a reference human genome to identify linked sequence reads (e.g. read pairs in which the two ends comprise sequences of different genomic DNA fragments from a single circulating microparticle).
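The final mapping step can be illustrated with a simple classification rule. The Python sketch below is an assumption rather than the protocol's stated criterion: it treats a read pair as linked if its two independently mapped ends fall on different chromosomes or at least `min_distance` bases apart, on the reasoning that both ends of a single unjoined fragment would map close together. The 1000-base default is a hypothetical threshold.

```python
def is_linked_pair(end1, end2, min_distance=1000):
    """Classify a paired-end read as linked from its two mapped ends.

    Each end is a (chromosome, position) tuple, or None if unmapped.
    min_distance is a hypothetical threshold separating ends of one
    fragment from ends of two different joined fragments.
    """
    if end1 is None or end2 is None:
        return False  # an unmapped end cannot be assessed
    chrom1, pos1 = end1
    chrom2, pos2 = end2
    return chrom1 != chrom2 or abs(pos1 - pos2) >= min_distance
```

In practice the threshold would need to reflect the fragment size distribution of the library; the value here is illustrative only.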

Methods of joining at least two target nucleic acid fragments of a microparticle together to produce a single nucleic acid molecule comprising the sequences of at least two target nucleic acid fragments may have a number of distinctive properties and features that can make them desirable for linking sequences from one or more circulating microparticles. In one aspect, such methods make it possible to link sequences from circulating microparticles without complex instrumentation (e.g. the microfluidics used in partition-based methods). In addition, the method can (broadly) be performed in a single reaction that may include a large number of circulating microparticles (e.g. hundreds, thousands or more), and can therefore process a large number of circulating microparticles without the multiple reactions that may be necessary in other methods, such as combinatorial indexing methods. Furthermore, because the method does not necessarily require barcodes and/or multimeric barcoding reagents, it is not limited by the size of a barcode library (and/or multimeric barcoding reagent library) in obtaining useful molecular measurements of linked sequences from circulating microparticles.

5. Linking by partitioning

The method may be performed on a nucleic acid sample comprising at least two microparticles that has been partitioned into at least two different reaction volumes (or partitions).

In any method, the nucleic acid sample comprising at least two microparticles may be partitioned into at least two different reaction volumes (or partitions). The different reaction volumes (or partitions) may be provided by different reaction vessels (or different physical reaction containers). The different reaction volumes (or partitions) may be provided by different aqueous droplets, for example different aqueous droplets in an emulsion or different aqueous droplets on a solid support (such as a glass slide).

For example, the nucleic acid sample may be partitioned before the barcode sequences are attached to the target nucleic acid fragments of the microparticles. Alternatively, the nucleic acid sample may be partitioned before at least two target nucleic acid fragments of the microparticle are joined together.

For any method involving a partitioning step, any step of the method after the partitioning step (e.g. any step of attaching barcode sequences or coupling sequences, or any ligation, annealing, primer extension or PCR step) may be performed independently within each partition. Reagents (such as oligonucleotides, enzymes and buffers) may be added directly to each partition. In methods in which the partitions comprise aqueous droplets in an emulsion, such addition steps may be performed by fusing aqueous droplets in the emulsion, for example using a microfluidic droplet-merging circuit, optionally together with a mechanical or thermal mixing step.

The partitions may comprise different aqueous droplets in an emulsion, wherein the emulsion is a water-in-oil emulsion, and wherein the droplets are generated by a physical agitation or vortexing process, or by combining the aqueous solution and the oil solution in a microfluidic channel or junction.

For methods in which the partitions comprise aqueous droplets in an emulsion, the water-in-oil emulsion may be generated by any method or tool known in the art. Optionally, this may include commercially available microfluidic systems, such as the Chromium system from 10x Genomics Inc or other systems, the digital droplet generators from RainDance Technologies or Bio-Rad, and component-based systems for microfluidic droplet generation and manipulation, such as Drop-Seq (Macosko et al., 2015, Cell 161, 1202-1214) and inDrop (Klein et al., 2015, Cell 161, 1187-1201).

The partitions may comprise different, physically non-overlapping spatial volumes within a gel or hydrogel, such as an agarose gel, a polyacrylamide gel or any covalently crosslinked gel, for example a covalently crosslinked poly(ethylene glycol) gel, or a covalently crosslinked gel comprising a mixture of thiol-functionalized poly(ethylene glycol) and acrylate-functionalized poly(ethylene glycol).

The microparticle sample may be partitioned into a total of at least 10, at least 100, at least 1000, at least 10,000, at least 100,000, at least 1,000,000, at least 10,000,000, at least 100,000,000 or at least 1,000,000,000 partitions. Preferably, the microparticle solution is partitioned into a total of at least 1000 partitions.

The microparticle sample may be partitioned such that each partition contains on average fewer than 0.0001, fewer than 0.001, fewer than 0.01, fewer than 0.1, fewer than 1.0, fewer than 10, fewer than 100, fewer than 1000, fewer than 10,000, fewer than 100,000, fewer than 1,000,000, fewer than 10,000,000 or fewer than 100,000,000 microparticles. Preferably, each partition contains on average fewer than 1.0 microparticle.
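Why an average loading below 1.0 microparticle per partition is preferred can be seen from a simple model. The text above does not state a distribution; the Python sketch below assumes that microparticles are partitioned independently at random, so the number per partition follows a Poisson distribution with mean `lam`. It reports the fractions of empty, single-occupancy and multiple-occupancy partitions, and the fraction of occupied partitions that hold exactly one microparticle.

```python
import math


def poisson_pmf(k, lam):
    """Probability of exactly k microparticles in a partition (Poisson model)."""
    return lam ** k * math.exp(-lam) / math.factorial(k)


def loading_summary(lam):
    """Occupancy fractions for a mean of `lam` microparticles per partition."""
    empty = poisson_pmf(0, lam)
    single = poisson_pmf(1, lam)
    occupied = 1.0 - empty
    return {
        "empty": empty,
        "single": single,
        "multiple": occupied - single,
        # Among partitions with any microparticle, share holding exactly one.
        "single_among_occupied": single / occupied if occupied > 0 else 0.0,
    }
```

Under this model, a mean of 0.1 leaves about 90% of partitions empty, but about 95% of occupied partitions contain a single microparticle, so reads sharing a partition are likely to come from one microparticle. Raising the mean increases throughput at the cost of more partitions containing two or more microparticles.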

The microparticle solution may be partitioned such that each partition contains on average less than 1.0 attogram, less than 10 attograms, less than 100 attograms, less than 1.0 femtogram, less than 10 femtograms, less than 100 femtograms, less than 1.0 picogram, less than 10 picograms, less than 100 picograms or less than 1.0 nanogram of DNA. Preferably, each partition contains less than 10 picograms of DNA.
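As a rough point of reference for these DNA mass limits, the Python sketch below expresses a DNA mass as haploid human genome equivalents. It assumes a haploid genome size of about 3.1 × 10^9 base pairs and an average mass of about 650 g/mol per base pair; both are approximate values chosen for this sketch and can be overridden.

```python
AVOGADRO = 6.02214076e23  # per mole


def haploid_genome_mass_pg(genome_bp=3.1e9, g_per_mol_per_bp=650.0):
    """Approximate mass of one haploid genome in picograms."""
    grams = genome_bp * g_per_mol_per_bp / AVOGADRO
    return grams * 1e12  # g -> pg


def haploid_genome_equivalents(dna_pg, **kwargs):
    """Express a DNA mass (pg) as a number of haploid genome equivalents."""
    return dna_pg / haploid_genome_mass_pg(**kwargs)
```

With these assumptions one haploid genome weighs about 3.3 pg, so the preferred limit of less than 10 picograms per partition corresponds to fewer than about three haploid genome equivalents of DNA.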

The volume of each partition may be less than 100 femtolitres, less than 1.0 picolitre, less than 10 picolitres, less than 100 picolitres, less than 1.0 nanolitre, less than 10 nanolitres, less than 100 nanolitres, less than 1.0 microlitre, less than 10 microlitres, less than 100 microlitres or less than 1.0 millilitre.

Barcode sequences may be provided within each partition. For each of two or more partitions comprising barcode sequences, the barcode sequences contained therein may comprise multiple copies of the same barcode sequence, or different barcode sequences from the same set of barcode sequences.

After the microparticles are partitioned into two or more partitions, the microparticles may be permeabilized by an incubation step according to any method described herein.

The sample of microparticles may be digested with a protease digestion step (e.g. digestion with Proteinase K). Optionally, the protease digestion step may last at least 10 seconds, at least 30 seconds, at least 60 seconds, at least 5 minutes, at least 10 minutes, at least 30 minutes, at least 60 minutes, at least 3 hours, at least 6 hours, at least 12 hours or at least 24 hours. The step may be performed before partitioning, after partitioning, before attaching barcode sequences, after attaching barcode sequences and/or at the same time as attaching barcode sequences.

Attaching sequences by combinatorial barcoding

The method of attaching barcode sequences may comprise at least two steps of a combinatorial barcoding method, wherein a first barcoding step is performed in which the microparticle sample is partitioned into two or more partitions, each partition comprising a different barcode sequence or a different set of barcode sequences, which is then attached to sequences of the target nucleic acid (e.g. genomic DNA) fragments of the microparticles contained in that partition; wherein the barcoded nucleic acid molecules of at least two partitions are then combined into a second sample mixture; and wherein the second sample mixture is then partitioned into two or more new partitions, each new partition comprising a different barcode sequence or a different set of barcode sequences, which is then attached to sequences of the target nucleic acid (e.g. genomic DNA) fragments of the microparticles in those new partitions.
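The benefit of combining barcoding rounds is that the number of distinct barcode combinations is the product of the number of partitions in each round. The Python sketch below computes that product and, assuming each microparticle independently receives a uniformly random combination (a simplifying assumption not stated above), the expected fraction of microparticles whose combination is not shared with any other microparticle.

```python
import math


def combination_count(partitions_per_round):
    """Number of distinct barcode combinations across all rounds."""
    return math.prod(partitions_per_round)


def fraction_unique(n_particles, n_combinations):
    """Expected fraction of microparticles with a combination no other one shares.

    Assumes each microparticle's combination is drawn uniformly and
    independently from n_combinations possibilities.
    """
    return (1.0 - 1.0 / n_combinations) ** (n_particles - 1)
```

For example, two rounds of 96 partitions give 9216 combinations; with 1000 microparticles, roughly 90% of them would carry a unique combination under this model. Adding a third round multiplies the number of combinations again, which is how combinatorial barcoding scales to large numbers of microparticles without a correspondingly large number of partitions per round.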

Optionally, a combinatorial barcoding method may comprise: A) a first barcoding step, in which a first sample mixture comprising at least first and second circulating microparticles is distributed into at least first and second original partitions (e.g., wherein at least a first circulating microparticle from the sample is distributed into the first original partition and at least a second circulating microparticle from the sample is distributed into the second original partition), wherein the first original partition comprises a barcode sequence (or set of barcode sequences) different from the barcode sequence (or set of barcode sequences) comprised in the second original partition, wherein the barcode sequence comprised in the first original partition (or a barcode sequence from its set) is attached to at least first and second target nucleic acid fragments of the first circulating microparticle, and wherein the barcode sequence comprised in the second original partition (or a barcode sequence from its set) is attached to at least first and second target nucleic acid fragments of the second circulating microparticle; and wherein at least one circulating microparticle comprised in the first original partition is merged with at least one circulating microparticle comprised in the second original partition to generate a second sample mixture; and B) a second barcoding step, in which the microparticles comprised in the second sample mixture are distributed into at least first and second new partitions (e.g., wherein at least a first circulating microparticle from the second sample mixture is distributed into the first new partition and at least a second circulating microparticle from the second sample mixture is distributed into the second new partition), wherein the first new partition comprises a barcode sequence (or set of barcode sequences) different from the barcode sequence (or set of barcode sequences) comprised in the second new partition, wherein the barcode sequence comprised in the first new partition (or a barcode sequence from its set) is attached to at least first and second target nucleic acid fragments of the first circulating microparticle, and wherein the barcode sequence comprised in the second new partition (or a barcode sequence from its set) is attached to at least first and second target nucleic acid fragments of the second circulating microparticle.

Optionally, a combinatorial barcoding method may comprise: A) a first barcoding step, in which a first sample mixture comprising at least first and second circulating microparticles is distributed into at least first and second original partitions (e.g., wherein at least a first circulating microparticle from the sample is distributed into the first original partition and at least a second circulating microparticle from the sample is distributed into the second original partition), wherein the first original partition comprises a barcode sequence (or set of barcode sequences) contained in barcode oligonucleotides that is different from the barcode sequence (or set of barcode sequences) contained in the barcode oligonucleotides of the second original partition, wherein the barcode oligonucleotides comprised in the first original partition are attached to at least first and second target nucleic acid fragments of the first circulating microparticle, and wherein the barcode oligonucleotides comprised in the second original partition are attached to at least first and second target nucleic acid fragments of the second circulating microparticle; and wherein at least one circulating microparticle comprised in the first original partition is merged with at least one circulating microparticle comprised in the second original partition to generate a second sample mixture; and B) a second barcoding step, in which the microparticles comprised in the second sample mixture are distributed into at least first and second new partitions (e.g., wherein at least a first circulating microparticle from the second sample mixture is distributed into the first new partition and at least a second circulating microparticle from the second sample mixture is distributed into the second new partition), wherein the first new partition comprises a barcode sequence (or set of barcode sequences) contained in barcode oligonucleotides that is different from the barcode sequence (or set of barcode sequences) contained in the barcode oligonucleotides of the second new partition, wherein the barcode oligonucleotides comprised in the first new partition are attached to at least first and second target nucleic acid fragments of the first circulating microparticle, and wherein the barcode oligonucleotides comprised in the second new partition are attached to at least first and second target nucleic acid fragments of the second circulating microparticle.

Optionally, a combinatorial barcoding method may comprise: A) a first barcoding step, in which a first sample mixture comprising at least first and second circulating microparticles is distributed into at least first and second original partitions (e.g., wherein at least a first circulating microparticle from the sample is distributed into the first original partition and at least a second circulating microparticle from the sample is distributed into the second original partition), wherein the first original partition comprises a barcode sequence (or set of barcode sequences) contained in barcode oligonucleotides that is different from the barcode sequence (or set of barcode sequences) contained in the barcode oligonucleotides of the second original partition, wherein the barcode oligonucleotides comprised in the first original partition are ligated to at least first and second target nucleic acid fragments of the first circulating microparticle, and wherein the barcode oligonucleotides comprised in the second original partition are ligated to at least first and second target nucleic acid fragments of the second circulating microparticle; and wherein at least one circulating microparticle comprised in the first original partition is merged with at least one circulating microparticle comprised in the second original partition to generate a second sample mixture; and B) a second barcoding step, in which the microparticles comprised in the second sample mixture are distributed into at least first and second new partitions (e.g., wherein at least a first circulating microparticle from the second sample mixture is distributed into the first new partition and at least a second circulating microparticle from the second sample mixture is distributed into the second new partition), wherein the first new partition comprises a barcode sequence (or set of barcode sequences) contained in barcode oligonucleotides that is different from the barcode sequence (or set of barcode sequences) contained in the barcode oligonucleotides of the second new partition, wherein the barcode oligonucleotides comprised in the first new partition are ligated to at least first and second target nucleic acid fragments of the first circulating microparticle, and wherein the barcode oligonucleotides comprised in the second new partition are ligated to at least first and second target nucleic acid fragments of the second circulating microparticle.

Optionally, a combinatorial barcoding method may comprise: A) a chemical crosslinking step, in which a sample comprising at least first and second circulating microparticles is crosslinked with a chemical crosslinker (e.g., formaldehyde), optionally followed by termination of the crosslinking step by a quenching step (e.g., quenching a formaldehyde crosslinking step by mixing the sample with a glycine solution) and/or by permeabilization of the crosslinked microparticles (that is, making their genomic DNA (and/or other target nucleic acid) fragments physically accessible so that they can be further manipulated, e.g., barcoded in a barcoding step), where any such permeabilization is optionally performed by incubation with a chemical surfactant (e.g., a non-ionic detergent); B) a first barcoding step, in which a first sample mixture comprising at least first and second circulating microparticles is distributed into at least first and second original partitions (e.g., wherein at least a first circulating microparticle from the sample is distributed into the first original partition and at least a second circulating microparticle from the sample is distributed into the second original partition), wherein the first original partition comprises a barcode sequence (or set of barcode sequences) contained in barcode oligonucleotides that is different from the barcode sequence (or set of barcode sequences) contained in the barcode oligonucleotides of the second original partition, wherein the barcode oligonucleotides comprised in the first original partition are ligated to at least first and second target nucleic acid fragments of the first circulating microparticle, and wherein the barcode oligonucleotides comprised in the second original partition are ligated to at least first and second target nucleic acid fragments of the second circulating microparticle; and wherein at least one circulating microparticle comprised in the first original partition is merged with at least one circulating microparticle comprised in the second original partition to generate a second sample mixture; and C) a second barcoding step, in which the microparticles comprised in the second sample mixture are distributed into at least first and second new partitions (e.g., wherein at least a first circulating microparticle from the second sample mixture is distributed into the first new partition and at least a second circulating microparticle from the second sample mixture is distributed into the second new partition), wherein the first new partition comprises a barcode sequence (or set of barcode sequences) contained in barcode oligonucleotides that is different from the barcode sequence (or set of barcode sequences) contained in the barcode oligonucleotides of the second new partition, wherein the barcode oligonucleotides comprised in the first new partition are ligated to at least first and second target nucleic acid fragments of the first circulating microparticle, and wherein the barcode oligonucleotides comprised in the second new partition are ligated to at least first and second target nucleic acid fragments of the second circulating microparticle.

Optionally, in any combinatorial barcoding method, the method may comprise, before the first and/or second (and/or any other) barcoding step, a step of crosslinking the circulating microparticles and/or the target nucleic acid fragments (e.g., genomic DNA fragments) within one or more circulating microparticles. This step can be performed with a chemical crosslinker such as formaldehyde, paraformaldehyde, glutaraldehyde, disuccinimidyl glutarate, ethylene glycol bis(succinimidyl succinate), or another homobifunctional or heterobifunctional crosslinker. It can be performed before any permeabilization step, after any permeabilization step, before any partitioning step, before any step of attaching barcode sequences, after any step of attaching barcode sequences, at the same time as attaching barcode sequences, or at any combination of these points. Any such crosslinking step can further be terminated by a quenching step, e.g., quenching a formaldehyde crosslinking step by mixing with a glycine solution. Any such crosslinks can further be reversed before particular subsequent steps of the laboratory protocol, for example before primer extension, PCR, or nucleic acid purification steps. Crosslinking with a chemical crosslinker serves to keep the genomic DNA (and/or other target nucleic acid) fragments within each microparticle in physical proximity to one another, so that the sample can be manipulated and processed while the basic structural properties of the microparticles are preserved (that is, while genomic DNA fragments from the same microparticle remain physically close together).

Optionally, in any combinatorial barcoding method, in a step following the chemical crosslinking step, the crosslinked microparticles can be permeabilized (that is, their genomic DNA (and/or other target nucleic acid) fragments are made physically accessible so that they can be further manipulated, e.g., barcoded in a barcoding step). Such permeabilization can be performed, for example, by incubation with a chemical surfactant (e.g., a non-ionic detergent). Optionally, the chemical surfactant used for this permeabilization step may comprise Triton X-100 (C14H22O(C2H4O)n, n = 9-10), NP-40, Tween 20, Tween 80, saponin, digitonin, or sodium dodecyl sulfate.

Optionally, in any combinatorial barcoding method, in any one or more steps following the chemical crosslinking step, the crosslinks can be partially or completely reversed (e.g., to make the genomic DNA (and/or other target nucleic acid) fragments more physically accessible so that they can be further manipulated, for example barcoded in a barcoding step). Crosslink reversal can be performed, for example, by incubation at elevated temperature, e.g., at least 45 °C, at least 50 °C, at least 55 °C, at least 60 °C, at least 65 °C, at least 70 °C, at least 75 °C, at least 80 °C, at least 85 °C, or at least 90 °C. Crosslink reversal can also be carried out for a specified duration, e.g., at least 1 minute, at least 5 minutes, at least 10 minutes, at least 20 minutes, at least 30 minutes, at least 60 minutes, at least 2 hours, at least 3 hours, at least 5 hours, or at least 24 hours.

Optionally, in any combinatorial barcoding method, after any one or more steps of attaching barcode sequences (e.g., any step of attaching and/or ligating barcode oligonucleotides), any one or more steps of distributing one or more samples (e.g., circulating microparticles) into different partitions, any one or more steps of merging two or more circulating microparticles into a single partition, any one or more chemical crosslinking steps, and/or any one or more other steps, a purification process can be used in which the microparticles are preferentially purified and separated from the other components of the solution used in that step. Any such purification step may comprise size exclusion chromatography. Any such purification step may comprise size-based centrifugation (e.g., differential centrifugation).

Optionally, in any combinatorial barcoding method, barcode sequences can be attached by any one or more of the methods described herein (e.g., single-stranded ligation, double-stranded ligation, blunt-end ligation, A-tailed ligation, sticky-end-mediated ligation, hybridization, hybridization and extension, hybridization, extension and ligation, and/or transposition).

Optionally, during any step of any combinatorial barcoding method, at least 2, at least 3, at least 5, at least 10, at least 20, at least 50, at least 100, at least 200, at least 500, at least 1000, at least 2000, at least 5000, at least 10,000, at least 50,000, at least 100,000, at least 500,000, or at least 1,000,000 circulating microparticles may be comprised in a partition (and/or in each of at least first and second partitions; and/or in each of any larger number of partitions). Preferably, at least 50 circulating microparticles are comprised in a partition (and/or in each of at least first and second partitions; and/or in each of any larger number of partitions).

Optionally, during any step of any combinatorial barcoding method, at least 2, at least 3, at least 5, at least 10, at least 20, at least 50, at least 100, at least 200, at least 500, at least 1000, at least 2000, at least 5000, at least 10,000, at least 50,000, at least 100,000, at least 500,000, at least 1,000,000, at least 10,000,000, or at least 100,000,000 partitions can be used (e.g., circulating microparticles can be distributed into that number of partitions). Preferably, during any step of any combinatorial barcoding method, at least 24 partitions are used (e.g., circulating microparticles can be distributed into that number of partitions).

Optionally, in any step of any combinatorial barcoding method, the microparticle sample can be distributed into partitions such that each partition contains, on average, fewer than 0.0001, fewer than 0.001, fewer than 0.01, fewer than 0.1, fewer than 1.0, fewer than 10, fewer than 100, fewer than 1000, fewer than 10,000, fewer than 100,000, fewer than 1,000,000, fewer than 10,000,000, or fewer than 100,000,000 microparticles. Preferably, each partition contains fewer than 1.0 microparticle on average.

Optionally, in any step of any combinatorial barcoding method, the microparticle solution can be distributed into partitions such that each partition contains, on average, less than 1.0 attogram, less than 10 attograms, less than 100 attograms, less than 1.0 femtogram, less than 10 femtograms, less than 100 femtograms, less than 1.0 picogram, less than 10 picograms, less than 100 picograms, or less than 1.0 nanogram of DNA. Preferably, each partition contains less than 10 picograms of DNA.

Optionally, in any step of any combinatorial barcoding method, the volume of a partition can be less than 100 femtoliters, less than 1.0 picoliter, less than 10 picoliters, less than 100 picoliters, less than 1.0 nanoliter, less than 10 nanoliters, less than 100 nanoliters, less than 1.0 microliter, less than 10 microliters, less than 100 microliters, or less than 1.0 milliliter.

Optionally, any combinatorial barcoding method may comprise at least 2, at least 3, at least 4, at least 5, at least 10, at least 20, at least 30, at least 40, at least 50, at least 100, at least 500, or at least 1000 different barcoding steps. Each barcoding step can be as described herein for the first and second barcoding steps.

Optionally, in any combinatorial barcoding method, any one or more partitioning steps may be random in character; for example, an expected (rather than a definite or exact) number of circulating microparticles can be distributed into one or more partitions. That is, the number of circulating microparticles per partition can be subject to statistical or probabilistic uncertainty (e.g., as described by Poisson loading and/or distribution statistics).
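If random partitioning is modeled as Poisson loading (one of the statistics the text names), the expected partition occupancy follows directly from the mean number of microparticles per partition. The sketch below is an illustration under that modeling assumption only; the function names are hypothetical.

```python
import math


def poisson_occupancy(mean_per_partition, max_k=3):
    """Expected fraction of partitions holding exactly k microparticles (k = 0..max_k),
    assuming Poisson loading with the given mean."""
    lam = mean_per_partition
    return {k: math.exp(-lam) * lam**k / math.factorial(k) for k in range(max_k + 1)}


def multi_occupancy_fraction(mean_per_partition):
    """Among occupied partitions, the expected fraction holding two or more microparticles."""
    lam = mean_per_partition
    if lam <= 0:
        raise ValueError("mean_per_partition must be positive")
    p0 = math.exp(-lam)        # empty partitions
    p1 = lam * math.exp(-lam)  # partitions with exactly one microparticle
    return (1 - p0 - p1) / (1 - p0)


if __name__ == "__main__":
    for lam in (0.1, 0.5, 1.0):
        print(f"mean {lam}: P(0)={poisson_occupancy(lam)[0]:.3f}, "
              f"multi-occupied among occupied={multi_occupancy_fraction(lam):.3f}")
```

Under this model, loading at a mean of 0.1 microparticle per partition leaves roughly 90% of partitions empty, while only about 5% of occupied partitions hold more than one microparticle; this trade-off is one reason a mean below 1.0 per partition can be preferred.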

Optionally, in any combinatorial barcoding method, the combination of barcodes attached to a particular sequence (e.g., a sequence of a genomic DNA fragment; for example, the combination of a first barcode attached to that sequence during the first barcoding step and a second barcode attached to it during the second barcoding step) can be used to associate sequences from a single microparticle and/or sequences from a group of two or more microparticles. Optionally, in any combinatorial barcoding method, the same combination of two (or more) barcodes may be attached to particular sequences (e.g., sequences of genomic DNA fragments) from two or more circulating microparticles (e.g., where those two or more circulating microparticles were distributed into the same series of first and second partitions during the first and second barcoding steps). Optionally, in any combinatorial barcoding method, the same combination of two (or more) barcodes is attached to particular sequences (e.g., sequences of genomic DNA fragments) from only one circulating microparticle (e.g., where only one circulating microparticle was distributed into a given series of first and second partitions during the first and second barcoding steps).
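Associating reads by their barcode combination amounts to grouping on the tuple of per-step barcodes. A minimal sketch follows; the read layout (a pair of barcode tuple and sequence) and the function name are assumptions for illustration, not a format defined by the source.

```python
from collections import defaultdict


def group_reads_by_combination(reads, min_reads=1):
    """Group sequence reads that share the same barcode combination.

    Each read is assumed (hypothetically) to be a (barcode_tuple, sequence) pair,
    where barcode_tuple holds one barcode per barcoding step, in step order.
    Groups with fewer than min_reads reads are dropped.
    """
    groups = defaultdict(list)
    for barcodes, sequence in reads:
        groups[tuple(barcodes)].append(sequence)
    return {combo: seqs for combo, seqs in groups.items() if len(seqs) >= min_reads}


if __name__ == "__main__":
    reads = [(("A1", "B3"), "ACGT"), (("A1", "B3"), "GGCC"), (("A2", "B3"), "TTAA")]
    for combo, seqs in group_reads_by_combination(reads).items():
        print(combo, seqs)
```

Because the tuple is ordered by step, sharing only the second-step barcode (B3 above) is not enough to place two reads in the same group; both steps must match.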

Optionally, in any combinatorial barcoding method, the number of partitions used in any one or more barcoding steps and the number of distinct barcoding steps can be combined so that, on average, each combination of two (or more) barcodes is attached to sequences from only one circulating microparticle. For example, for a sample comprising 1000 circulating microparticles, 100 partitions (each with its own barcode) could be used in each of the first and second barcoding steps; the total number of distinct barcode combinations would then be 100 × 100 = 10,000, so that, compared with the 1000 circulating microparticles in the original sample, each barcode combination is on average attached to sequences from only one (or, nominally, fewer than one: 1000 / 10,000 = 0.1) circulating microparticle. In different embodiments of any combinatorial barcoding method, the number of partitions used in any one or more barcoding steps and/or the number of distinct barcoding steps can be increased and/or decreased to achieve a desired resolution and/or level of sensitivity (e.g., to accommodate samples comprising different numbers of circulating microparticles, and/or the specific barcoding requirements of different applications). Optionally, in some applications, an incomplete and/or inefficient barcoding process (e.g., one in which only a fraction of the sequences from a given microparticle is attached to a barcode in one or more barcoding steps, and/or one in which the same combination of barcode sequences is attached to sequences from two or more circulating microparticles) may still provide sufficient molecular and/or informational resolution to achieve the desired signal and/or sequencing readout.
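The worked example above (1000 microparticles, two steps of 100 partitions) can be checked numerically. The sketch below additionally estimates how often a microparticle shares its combination with another, assuming (as an idealization not stated in the source) that each microparticle lands in a uniformly random partition at every step; `combination_stats` is a hypothetical helper.

```python
def combination_stats(n_particles, partitions_per_step):
    """Summarize a combinatorial barcoding design.

    Returns (distinct combinations, mean microparticles per combination,
    expected fraction of microparticles sharing their combination with another).
    Assumes every microparticle lands in a uniformly random partition at each step.
    """
    n_combos = 1
    for n in partitions_per_step:
        n_combos *= n
    mean_per_combo = n_particles / n_combos
    # Probability that none of the other (n_particles - 1) microparticles draws the same combination
    p_unique = (1 - 1 / n_combos) ** (n_particles - 1)
    return n_combos, mean_per_combo, 1 - p_unique


if __name__ == "__main__":
    combos, mean, shared = combination_stats(1000, [100, 100])
    print(combos, mean, round(shared, 3))
```

With these numbers the design gives 10,000 combinations and 0.1 microparticle per combination, matching the text, yet roughly 9.5% of microparticles are still expected to share a combination with another; adding a third step of 100 partitions drops that to about 0.1%. This is consistent with the text's observation that an imperfect barcoding process can still be useful.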

Combinatorial barcoding methods can offer advantages over alternative barcoding methods by reducing the need for sophisticated and/or complex equipment while achieving a larger number of potentially distinguishable barcode combinations for attaching barcodes to sequences from circulating microparticles (e.g., from genomic DNA fragments). For example, a combinatorial barcoding method using 96 different partitions in each of two barcoding steps (readily implemented with the standard 96-well plates widely used in molecular biology) can achieve a net 96 × 96 = 9216 distinct barcode combinations; compared with alternative, non-combinatorial methods, this substantially reduces the number of partitions needed to achieve this degree of indexing. Substantially higher levels of combinatorial indexing resolution can be achieved by increasing the number of barcoding steps and/or the number of partitions used in one or more of those steps. In addition, combinatorial barcoding methods can eliminate the need for the complex instruments used by alternative barcoding methods (e.g., microfluidic instruments such as the 10x Genomics Chromium system).
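The scaling advantage can be made concrete with a few lines of arithmetic: with one 96-well plate per barcoding step, the number of wells handled grows linearly with the number of steps, while the number of barcode combinations grows exponentially. The helper below is a hypothetical illustration.

```python
def plate_design(wells_per_plate=96, max_steps=3):
    """For 1..max_steps barcoding steps with one plate per step, list
    (steps, total wells handled, distinct barcode combinations)."""
    return [(k, wells_per_plate * k, wells_per_plate ** k) for k in range(1, max_steps + 1)]


if __name__ == "__main__":
    for steps, wells, combos in plate_design():
        print(f"{steps} step(s): {wells} wells handled -> {combos:,} combinations")
```

Two steps handle 192 wells for 9216 combinations (the figure in the text); a third step handles 288 wells for 884,736 combinations, whereas a non-combinatorial scheme would need one partition per combination.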

6. Association by spatial sequencing, in situ sequencing, or in situ library construction

The present invention provides a method of preparing a sample for sequencing, wherein the sample comprises microparticles derived from blood, and wherein the microparticles contain at least two target nucleic acid (e.g., genomic DNA) fragments, the method comprising: (a) preparing the sample for sequencing, wherein at least two target nucleic acid fragments of a microparticle are associated with each other by their proximity on a sequencing device, to generate a set of at least two associated target nucleic acid fragments; and (b) sequencing each associated target nucleic acid fragment using the sequencing device to generate at least two associated sequence reads.

The nucleic acid sample may comprise at least two blood-derived microparticles, each containing at least two target nucleic acid (e.g., genomic DNA) fragments, wherein the method comprises performing step (a) to generate a set of associated target nucleic acid fragments for each microparticle, wherein the target nucleic acid fragments of each microparticle are spatially distinct on the sequencing device, and performing step (b) to generate associated sequence reads for each microparticle.

At least two fragments from a microparticle can be kept in physical proximity to each other in or on the sequencing device itself, wherein that physical proximity is known, or can be determined or observed by the sequencing device or during its operation, and wherein at least two sequences are associated on the basis of this measure of physical proximity.

The method may comprise sequencing using an in situ library construction method. In this method, intact or partially intact microparticles from the sample can be placed on the sequencer, and two or more target nucleic acid (e.g., genomic DNA) fragments are processed on the sequencer into sequencing-ready templates. In situ library construction is described in Schwartz et al. (2012) PNAS 109(46): 18749-54.

The method may comprise in situ sequencing. In this method, the sample can be kept intact (e.g., largely or partially intact), and the target nucleic acid (e.g., genomic DNA) fragments within the microparticles are sequenced directly, for example using the 'FISSEQ' fluorescent in situ sequencing technique described in Lee et al. (2014) Science 343(6177): 1360-1363.

Optionally, the microparticle sample can be crosslinked with a chemical crosslinker and then placed in or on a sequencing device, where its fragments remain in physical proximity to one another. Optionally, all or part of the sequence of two or more target nucleic acid (e.g., genomic DNA) fragments from a microparticle placed in or on the sequencing device can then be determined by a sequencing method. Optionally, such fragments can be sequenced by a fluorescent in situ sequencing technique, in which the sequence of each fragment is determined by an optical sequencing method. Optionally, one or more coupling sequences, adapter sequences, or amplification sequences can be attached to the target nucleic acid fragments. Optionally, the fragments can be amplified in an amplification process in which the amplification products remain in physical proximity to, or physical contact with, the fragment from which they were amplified. Optionally, these amplification products are then sequenced by an optical sequencing method. Optionally, the amplification products are attached to a planar surface, such as a sequencing flow cell. Optionally, the amplification products generated from each individual fragment form a single cluster within the flow cell. Optionally, in any of the methods above, the distance between any two or more sequenced molecules is known a priori from the configuration of the sequencing device, or can be determined or observed during the sequencing process. Optionally, each sequenced molecule is mapped within a field of clusters or a pixel array, and the distance between any two or more sequenced molecules is determined from the distance between their clusters or pixels. Optionally, any measure or estimate of distance or proximity can be used to associate any two or more determined sequences.
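One way to turn cluster positions into associated sets of sequences is single-linkage grouping: any two clusters within a chosen distance are linked, and linked clusters are merged transitively. The sketch below assumes (x, y) cluster coordinates in arbitrary units and a user-chosen `max_distance`; both the grouping rule and the function name are illustrative choices, not the method prescribed by the source, and the pairwise loop is quadratic, so it suits only small examples.

```python
import math


def group_by_proximity(coords, max_distance):
    """Single-linkage grouping of sequenced clusters by spatial distance.

    coords: list of (x, y) cluster positions (hypothetical units).
    Returns groups as sorted lists of cluster indices.
    """
    parent = list(range(len(coords)))  # union-find forest

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            if math.dist(coords[i], coords[j]) <= max_distance:
                parent[find(i)] = find(j)  # link the two clusters' groups

    groups = {}
    for i in range(len(coords)):
        groups.setdefault(find(i), []).append(i)
    return sorted(sorted(g) for g in groups.values())


if __name__ == "__main__":
    clusters = [(0, 0), (1, 0), (2, 0), (10, 10), (10.5, 10)]
    print(group_by_proximity(clusters, max_distance=1.5))
```

Note that clusters 0 and 2 end up together even though they are 2 units apart, because each lies within 1.5 units of cluster 1; whether such chaining is desirable depends on how tightly fragments from one microparticle are expected to stay together.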

Optionally, the sequences determined by any of the above methods can be further evaluated by comparing a measure of the distance or proximity between two or more sequenced molecules with one or more cutoff or threshold values, and designating as informatically associated only those molecules that fall within a specified range, or above or below a specified threshold or cutoff. Optionally, a set of two or more such cutoffs or thresholds, or ranges thereof, can be used, so that different degrees, classes, and/or categories of association between any two or more sequenced molecules can be determined.
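Using several cutoffs to assign graded degrees of association can be sketched as a lookup into sorted thresholds. The cutoff values and tier labels below are hypothetical placeholders; the sketch only shows the mechanics of mapping a measured distance onto one of several categories.

```python
import bisect


def classify_distance(distance, cutoffs, labels):
    """Map a pairwise distance to a degree-of-association label.

    cutoffs: ascending distance thresholds; labels has len(cutoffs) + 1 entries,
    ordered from most to least strongly associated. A distance equal to a
    cutoff falls into the tighter (more strongly associated) tier.
    """
    if len(labels) != len(cutoffs) + 1:
        raise ValueError("need exactly one more label than cutoffs")
    return labels[bisect.bisect_left(cutoffs, distance)]


if __name__ == "__main__":
    cutoffs = [1.0, 5.0]                         # hypothetical distance units
    labels = ["strong", "weak", "unassociated"]  # hypothetical tier names
    for d in (0.5, 3.0, 7.0):
        print(d, classify_distance(d, cutoffs, labels))
```

A single cutoff reproduces the simple associated/not-associated decision; adding cutoffs only adds tiers.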

7. Association by separate sequencing runs

The present invention provides a method of preparing a sample for sequencing, wherein the sample comprises microparticles derived from blood, and wherein the microparticles contain at least two target nucleic acid (e.g., genomic DNA) fragments, the method comprising: (a) preparing the sample for sequencing, wherein at least two target nucleic acid (e.g., genomic DNA) fragments of each microparticle are associated by being loaded into a separate sequencing run, to generate a set of at least two associated target nucleic acid fragments; and (b) sequencing each associated target nucleic acid fragment using a sequencing device to generate a set of at least two associated sequence reads.

The sample may comprise at least two blood-derived microparticles, each containing at least two target nucleic acid (e.g., genomic DNA) fragments, and the method may comprise performing step (a) to generate associated target nucleic acid fragments for each microparticle, wherein at least two target nucleic acid fragments of each microparticle are associated by being loaded into a separate sequencing run, and performing step (b) on each sequencing run to generate associated sequence reads for each microparticle.

In this method, the fragments of a first single microparticle (or group of microparticles) can be sequenced independently of the fragments of other microparticles, and the resulting sequence reads are informatically associated; the fragments contained in a second single microparticle (or group of microparticles) are sequenced independently of the first microparticle (or group), and the resulting sequence reads are likewise informatically associated.

Optionally, the first and second sequencing runs (or all sequencing runs) are performed on different sequencers, and/or on the same sequencer but at two different times or in two different sequencing runs. Optionally, the first and second sequencing runs are performed on the same sequencer but in two different regions, partitions, compartments, channels, flow cells, lanes, nanopores, microwells, microwell arrays, or integrated circuits of the sequencer. Optionally, 3 or more, 10 or more, 1000 or more, 1,000,000 or more, or 1,000,000,000 or more microparticles or groups of microparticles can be associated by the above methods.

8. Amplifying original fragments before association

As the skilled person will understand, the term "fragment" as used herein (e.g., "genomic DNA fragment", "target nucleic acid fragment", or "genomic DNA fragment from a microparticle") refers to the original fragment present in the microparticle as well as to parts, copies, or amplicons thereof, including copies of only part of an original fragment (e.g., amplicons thereof), and to modified fragments or copies (e.g., fragments to which a coupling sequence has been attached). For example, the term genomic DNA fragment refers to an original genomic DNA fragment present in a microparticle and also to, e.g., a DNA molecule prepared from that original genomic DNA fragment by a primer extension reaction. As another example, the term mRNA fragment refers to an original mRNA fragment present in a microparticle and also to, e.g., a cDNA molecule prepared from that original mRNA fragment by reverse transcription.

Before the step of attaching barcode sequences, the method may further comprise a step of amplifying the original target nucleic acid fragments of the microparticle, for example by a primer extension step or a polymerase chain reaction step. Barcode sequences can then be attached to the amplicons or copies of the original target nucleic acid fragments using any method described herein.

The primer extension step or PCR step may be carried out using one or more primers comprising a segment that contains one or more degenerate bases.

The primer extension step or PCR step may be carried out using one or more primers specific for a particular target nucleic acid sequence (such as a particular target genomic DNA sequence).

The amplification step may be carried out with a strand-displacing polymerase (such as Phi29 DNA polymerase, Bst polymerase or Bsm polymerase, or a modified derivative of Phi29, Bst or Bsm polymerase). Amplification may be carried out by a multiple displacement amplification reaction using a primer set comprising regions that contain one or more degenerate bases. Optionally, random hexamer, random heptamer, random octamer, random nonamer or random decamer primers are used.
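The random primers listed above are fully degenerate: every position can be any of A, C, G or T, so a random N-mer pool spans 4^N distinct sequences. The Python sketch below (all function names are hypothetical, not from the source) enumerates that space and estimates how often one specific N-mer is expected to find a perfectly matching site on one strand of a fragment. The site estimate assumes uniformly random sequence with equal base frequencies, which real genomes do not follow exactly.

```python
from itertools import product

BASES = "ACGT"


def primer_pool_size(length: int) -> int:
    """Number of distinct sequences in a fully degenerate (N at every position) primer pool."""
    return len(BASES) ** length


def enumerate_primers(length: int):
    """Yield every sequence that a fully degenerate primer of this length can take."""
    for combo in product(BASES, repeat=length):
        yield "".join(combo)


def expected_sites_per_strand(fragment_length: int, primer_length: int) -> float:
    """Expected exact-match sites for ONE specific primer sequence on one strand,
    assuming uniformly random sequence (a simplification)."""
    positions = max(fragment_length - primer_length + 1, 0)
    return positions / primer_pool_size(primer_length)


if __name__ == "__main__":
    # Hexamer to decamer, as listed in the text.
    for n in range(6, 11):
        print(f"{n}-mer pool: {primer_pool_size(n)} sequences")
```

Because the pool size grows as 4^N, at the same total primer concentration each individual decamer sequence is present at 1/256 the concentration of each individual hexamer sequence, which is one trade-off when choosing primer length.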

The amplification step may include extending, with a DNA polymerase, from single-stranded nicks in the original target nucleic acid fragments. The nicks may be generated by an enzyme with single-stranded DNA cleavage activity or by a sequence-specific nicking endonuclease.

The amplification step may include incorporating, by a DNA polymerase, one or more dUTP nucleotides into DNA strands synthesized by copying or amplifying at least part of one or more genomic DNA fragments, wherein nicks are then generated by a uracil excision enzyme (such as uracil DNA glycosylase).

The amplification step may include generating a priming sequence on a nucleic acid comprising a genomic DNA fragment, wherein the priming sequence is generated by a primase (such as Thermus thermophilus PrimPol polymerase, TthPrimPol), and wherein a DNA polymerase uses the priming sequence as a primer to copy at least one nucleotide of the genomic DNA fragment's sequence.

The amplification step may be carried out by a linear amplification reaction, such as an RNA amplification process carried out by in vitro transcription.

The amplification step may be carried out by a primer extension step or PCR step in which the primers used are one or more universal primers corresponding to a universal priming sequence. The universal priming sequence may be attached to the genomic DNA fragments by a ligation reaction, by primer extension or PCR, or by an in vitro transposition reaction.

9. Attaching coupling sequences to fragments before joining

In any method, barcode sequences may be attached directly or indirectly (for example, by annealing or ligation) to target nucleic acid (such as gDNA) fragments of the microparticle. The barcode sequences may be attached to a coupling sequence (such as a synthetic sequence) that has been attached to the fragments.

In methods that comprise joining at least two target nucleic acid fragments of a microparticle together to generate a single nucleic acid molecule, a coupling sequence may first be attached to each of the at least two fragments, and the fragments may then be joined together via the coupling sequences.

The coupling sequence may be attached to the original target nucleic acid fragments of the microparticle or to copies or amplicons of them.

The coupling sequence may be added to the 5' end or 3' end of two or more fragments of the nucleic acid sample. In this method, the target region (of the barcoded oligonucleotide) may comprise a sequence complementary to the coupling sequence.

The coupling sequence may be contained in a double-stranded coupling oligonucleotide or in a single-stranded coupling oligonucleotide. The coupling oligonucleotide may be attached to the target nucleic acid by a double-stranded ligation reaction or by a single-stranded ligation reaction. The coupling oligonucleotide may comprise a single-stranded 5' or 3' region that can be ligated to the target nucleic acid, and the coupling sequence may then be attached to the target nucleic acid by a single-stranded ligation reaction.

The coupling oligonucleotide may comprise a blunt, recessed or overhanging 5' or 3' end that can be ligated to the target nucleic acid, and the coupling sequence may be attached to the target nucleic acid by a double-stranded ligation reaction.

The ends of the target nucleic acid fragments may be converted to blunt double-stranded ends in an end-blunting reaction; the coupling oligonucleotide may comprise a blunt double-stranded end and may be ligated to the target nucleic acid fragments in a blunt-end ligation reaction.

The ends of the target nucleic acid fragments may be converted to blunt double-stranded ends in an end-blunting reaction and then converted to a form with a single 3' adenosine overhang. The coupling oligonucleotide may comprise a double-stranded end with a single 3' thymidine overhang that can anneal to the single 3' adenosine overhang of the target nucleic acid fragment, and the coupling oligonucleotide is then ligated to the target nucleic acid fragment in a double-stranded A/T ligation reaction.

The target nucleic acid may be contacted with a restriction enzyme that digests it at restriction sites to generate ligatable ends at those sites. The coupling oligonucleotide comprises ends compatible with these ligatable ends and is then ligated to the target nucleic acid in a double-stranded ligation reaction.
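The end-matching rules in the last three paragraphs (blunt to blunt, 3' A overhang to 3' T overhang, restriction sticky ends to compatible ends) can be captured in a small check. The Python sketch below is a simplified model under stated assumptions: it only compares end geometry and base pairing, and ignores phosphorylation, ligase choice and ligation efficiency. The function names and the (kind, overhang) representation are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def ends_compatible(end_a: tuple, end_b: tuple) -> bool:
    """Check whether two double-stranded ends could, in principle, be ligated.

    Each end is (kind, overhang): kind is "blunt", "5'" or "3'", and overhang is
    the single-stranded bases written 5'->3' ("" for blunt ends).
    """
    kind_a, overhang_a = end_a
    kind_b, overhang_b = end_b
    if kind_a == "blunt" or kind_b == "blunt":
        # Blunt ends only ligate to other blunt ends.
        return kind_a == "blunt" and kind_b == "blunt"
    # Sticky ends need the same polarity and antiparallel-complementary overhangs.
    return kind_a == kind_b and overhang_a == revcomp(overhang_b)


if __name__ == "__main__":
    a_tailed_fragment = ("3'", "A")
    t_tailed_adapter = ("3'", "T")
    print(ends_compatible(a_tailed_fragment, t_tailed_adapter))   # True
    print(ends_compatible(a_tailed_fragment, a_tailed_fragment))  # False
    print(ends_compatible(("5'", "AATT"), ("5'", "AATT")))        # True (EcoRI ends)
```

The second call illustrates a known rationale for A-tailing: two A-tailed fragments cannot ligate to each other, which limits fragment-to-fragment concatemers while still allowing ligation to T-tailed coupling oligonucleotides.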

The coupling oligonucleotide may be attached by a primer extension or PCR step.

The coupling oligonucleotide may be attached by a primer extension or PCR step using one or more oligonucleotides comprising a priming segment, where the priming segment contains one or more degenerate bases.

The coupling oligonucleotide may also be attached by a primer extension or PCR step using one or more oligonucleotides comprising a priming or hybridization segment specific for a particular target nucleic acid sequence.

The coupling sequence may be added by a polynucleotide tailing reaction, for example by a terminal transferase (such as terminal deoxynucleotidyl transferase). When the tailing reaction is carried out with terminal deoxynucleotidyl transferase, the coupling sequence may comprise at least two consecutive nucleotides of a homopolymeric sequence.

The coupling sequence may comprise a homopolymeric 3' tail (such as a poly(A) tail). Optionally, in such methods, the target region (of the barcode oligonucleotide) comprises a complementary homopolymeric 3' tail (such as a poly(T) tail).

The coupling sequence may be contained in a synthetic transposon and may be attached by an in vitro transposition reaction.

The coupling sequence may be attached to the target nucleic acid, and a barcode oligonucleotide then attached to the target nucleic acid by at least one primer extension or PCR step, wherein the barcode oligonucleotide comprises a region of at least one nucleotide that is complementary to the coupling sequence. Optionally, this complementary region is located at the 3' end of the barcode oligonucleotide. Optionally, the complementary region is at least 2, at least 5, at least 10, at least 20 or at least 50 nucleotides long.
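For a barcode oligonucleotide that primes from a coupling sequence, its 3' region must be the reverse complement of the coupling sequence so that the two strands pair antiparallel, with the oligonucleotide's 3'-terminal base pairing with the coupling-sequence base nearest the target (its 5'-most base when the coupling sequence is a 3' tail). The Python sketch below measures the length of that complementary 3' region, for example to check a design against the minimum lengths listed above. Function names and example sequences are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def complementary_3prime_length(barcode_oligo: str, coupling_seq: str) -> int:
    """Length of the longest 3'-terminal region of the barcode oligo that pairs,
    antiparallel, with the 5'-most bases of the coupling sequence."""
    best = 0
    for k in range(1, min(len(barcode_oligo), len(coupling_seq)) + 1):
        # Oligo 3' end (read 5'->3') must equal revcomp of the coupling's first k bases.
        if barcode_oligo[-k:] == revcomp(coupling_seq[:k]):
            best = k
    return best


def meets_minimum(barcode_oligo: str, coupling_seq: str, minimum: int) -> bool:
    """True if the complementary 3' region is at least `minimum` nucleotides long."""
    return complementary_3prime_length(barcode_oligo, coupling_seq) >= minimum


if __name__ == "__main__":
    poly_a_tail = "A" * 10                      # e.g. a TdT-added poly(A) coupling sequence
    oligo = "GATCGATC" + "T" * 10               # hypothetical barcode oligo ending in poly(T)
    print(complementary_3prime_length(oligo, poly_a_tail))  # 10
    print(meets_minimum(oligo, poly_a_tail, 5))             # True
```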

10. Optional additional steps of the method

The method may include determining the presence or absence of at least one modified nucleotide or nucleobase in one or more genomic DNA fragments from a sample comprising one or more circulating microparticles. The method may include measuring modified nucleotides or nucleobases (for example, determining their level) in the genomic DNA fragments of a circulating microparticle. The measured value may be a total value across the analysed genomic DNA fragments of the microparticle (i.e., its linked genomic DNA fragments) and/or a value for each analysed genomic DNA fragment. The modified nucleotide or nucleobase may be 5-methylcytosine or 5-hydroxymethylcytosine.
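The measured value described above can be pooled per microparticle or kept per fragment. The Python sketch below assumes a hypothetical input layout: each linked read is a tuple (group_id, modified_calls, total_calls), where group_id identifies the set of linked fragments attributed to one microparticle, and the two counts are the cytosines called modified and all cytosines assessed in that fragment. The values in the example are illustrative, not data.

```python
from collections import defaultdict


def per_fragment_levels(reads):
    """Modified fraction for each fragment (None when no cytosines were assessed)."""
    return [modified / total if total else None for _, modified, total in reads]


def per_particle_levels(reads):
    """Pooled modified fraction across all linked fragments of each microparticle."""
    modified_sum = defaultdict(int)
    total_sum = defaultdict(int)
    for group_id, modified, total in reads:
        modified_sum[group_id] += modified
        total_sum[group_id] += total
    return {g: modified_sum[g] / total_sum[g] for g in total_sum if total_sum[g] > 0}


if __name__ == "__main__":
    reads = [("p1", 3, 4), ("p1", 1, 4), ("p2", 0, 5)]  # illustrative values
    print(per_fragment_levels(reads))  # [0.75, 0.25, 0.0]
    print(per_particle_levels(reads))  # {'p1': 0.5, 'p2': 0.0}
```

Pooling counts weights each fragment by its coverage; averaging the per-fragment fractions instead would give a different answer whenever linked fragments contain different numbers of assessed cytosines.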

Measuring modified nucleotides or nucleobases in one or more genomic DNA fragments from circulating microparticles enables a range of molecular and informatic analyses that complement measurement of the fragments' sequences themselves. In one aspect, measuring so-called "epigenetic" marks in genomic DNA fragments from circulating microparticles (i.e., measuring the "epigenome") makes it possible to compare these fragments with (and/or map them against) a reference epigenetic sequence and/or a list of reference epigenetic sequences. Compared with a traditional "genetic" sequence that records only the four standard (unmodified) bases, this allows an "orthogonal" form of analysis of the sequences of genomic fragments from circulating microparticles. In addition, measuring modified nucleotides and/or nucleobases may allow the cell and/or tissue type of origin of one or more circulating microparticles to be determined and/or estimated more accurately. Because different cell types in the body display different epigenetic signatures, measuring the epigenome of genomic DNA fragments from a circulating microparticle may allow that microparticle to be mapped to its cell type more accurately. In the method, epigenetic measurements of genomic DNA fragments from circulating microparticles may be compared with (for example, mapped against) one or more lists of reference epigenetic sequences corresponding to the methylation and/or hydroxymethylation patterns of particular tissues. This may make it possible to identify and/or enrich for microparticles from a particular tissue type and/or from particular healthy and/or diseased tissue (such as cancerous tissue), for example the sets of linked sequences from particular microparticles. For example, measuring modified nucleotides or nucleobases in the genomic DNA fragments of circulating microparticles may make it possible to identify linked sequences (or linked sequence reads) of genomic DNA fragments derived from cancer cells. In another example, such measurement may make it possible to identify linked sequences (or linked sequence reads) of genomic DNA fragments derived from fetal cells. The absolute amount of a specific modified nucleotide or nucleobase may correlate with health status and/or disease in a specific tissue. For example, the level of 5-hydroxymethylcytosine is strongly altered in cancerous tissue compared with normal healthy tissue; measuring 5-hydroxymethylcytosine in genomic DNA fragments from circulating microparticles may therefore allow circulating microparticles derived from cancer cells to be detected and/or analysed more accurately.

The method may include measuring 5-methylcytosine in the genomic DNA fragments of a circulating microparticle. The method may include measuring 5-hydroxymethylcytosine in the genomic DNA fragments of a circulating microparticle.

The method may include measuring 5-methylcytosine in the genomic DNA fragments of a circulating microparticle, wherein the measurement is carried out using an enrichment probe that binds specifically or preferentially to 5-methylcytosine in genomic DNA fragments compared with other modified or unmodified bases. The method may include measuring 5-hydroxymethylcytosine in the genomic DNA fragments of a circulating microparticle, wherein the measurement is carried out using an enrichment probe that binds specifically or preferentially to 5-hydroxymethylcytosine in genomic DNA fragments compared with other modified or unmodified bases.

The method may include measuring 5-methylcytosine in the genomic DNA fragments of two or more circulating microparticles (for example, measuring 5-methylcytosine in the genomic DNA fragments of a first circulating microparticle and in those of a second circulating microparticle). The method may include measuring 5-hydroxymethylcytosine in the genomic DNA fragments of two or more circulating microparticles (for example, measuring 5-hydroxymethylcytosine in the genomic DNA fragments of a first circulating microparticle and in those of a second circulating microparticle).

The method may include measuring 5-methylcytosine in the genomic DNA fragments of two or more circulating microparticles (for example, in those of a first and of a second circulating microparticle), wherein the measurement is carried out using an enrichment probe that binds specifically or preferentially to 5-methylcytosine in genomic DNA fragments compared with other modified or unmodified bases. The method may include measuring 5-hydroxymethylcytosine in the genomic DNA fragments of two or more circulating microparticles (for example, in those of a first and of a second circulating microparticle), wherein the measurement is carried out using an enrichment probe that binds specifically or preferentially to 5-hydroxymethylcytosine in genomic DNA fragments compared with other modified or unmodified bases.

The method may include measuring 5-methylcytosine in the genomic DNA fragments of a circulating microparticle, wherein the measurement is carried out using a bisulfite conversion method or an oxidative bisulfite conversion method. The method may include measuring 5-hydroxymethylcytosine in the genomic DNA fragments of a circulating microparticle, wherein the measurement is carried out using a bisulfite conversion method or an oxidative bisulfite conversion method.
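In standard bisulfite conversion, unmodified cytosine is deaminated to uracil (read as T after PCR), while 5-methylcytosine and 5-hydroxymethylcytosine are both read as C. In oxidative bisulfite conversion, 5-hydroxymethylcytosine is first oxidized to 5-formylcytosine, which bisulfite does convert, so only 5-methylcytosine is still read as C. The Python sketch below models that read-out on a single strand. The one-letter codes M (5-methylcytosine) and H (5-hydroxymethylcytosine) are a hypothetical encoding for illustration, and the model assumes complete conversion and ignores strand effects and DNA damage.

```python
# Hypothetical template alphabet: A, C, G, T plus M (5mC) and H (5hmC).
BISULFITE_READOUT = {"A": "A", "G": "G", "T": "T", "C": "T", "M": "C", "H": "C"}
OX_BISULFITE_READOUT = {"A": "A", "G": "G", "T": "T", "C": "T", "M": "C", "H": "T"}


def sequence_after_conversion(template: str, oxidative: bool = False) -> str:
    """Sequence read from the converted strand after PCR, assuming complete conversion."""
    table = OX_BISULFITE_READOUT if oxidative else BISULFITE_READOUT
    try:
        return "".join(table[base] for base in template)
    except KeyError as err:
        raise ValueError(f"unexpected base {err.args[0]!r}") from None


if __name__ == "__main__":
    template = "ACMHG"  # A, unmodified C, 5mC, 5hmC, G
    print(sequence_after_conversion(template))                  # ATCCG
    print(sequence_after_conversion(template, oxidative=True))  # ATCTG
```

Comparing the two read-outs position by position is what lets the two methods be combined: a C in both indicates 5-methylcytosine, a C only in the bisulfite read indicates 5-hydroxymethylcytosine, and a T in both indicates unmodified cytosine.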

The method may include measuring 5-methylcytosine in the genomic DNA fragments of two or more circulating microparticles (for example, in those of a first and of a second circulating microparticle), wherein the measurement is carried out using a bisulfite conversion method or an oxidative bisulfite conversion method. The method may include measuring 5-hydroxymethylcytosine in the genomic DNA fragments of two or more circulating microparticles (for example, in those of a first and of a second circulating microparticle), wherein the measurement is carried out using a bisulfite conversion method or an oxidative bisulfite conversion method.
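When both methods are applied to the linked fragments of each microparticle, a common way to combine them is subtraction: the fraction of reads still showing C after bisulfite conversion reflects 5-methylcytosine plus 5-hydroxymethylcytosine, the oxidative bisulfite fraction reflects 5-methylcytosine alone, and their difference estimates 5-hydroxymethylcytosine. The Python sketch below applies that arithmetic per microparticle; the function name, the per-particle count layout and the example counts are hypothetical and illustrative only.

```python
def estimate_modification_levels(bs_c: int, bs_total: int,
                                 oxbs_c: int, oxbs_total: int) -> dict:
    """Estimate C, 5mC and 5hmC fractions from bisulfite (BS) and oxidative
    bisulfite (oxBS) counts, e.g. pooled over one microparticle's linked fragments.

    *_c: reads still showing C at the assessed positions; *_total: reads assessed.
    """
    if bs_total <= 0 or oxbs_total <= 0:
        raise ValueError("both libraries need coverage")
    bs_level = bs_c / bs_total        # 5mC + 5hmC
    oxbs_level = oxbs_c / oxbs_total  # 5mC only
    return {
        "5mC": oxbs_level,
        # Sampling noise can make oxBS exceed BS; clip the difference at zero.
        "5hmC": max(bs_level - oxbs_level, 0.0),
        "unmodified C": 1.0 - bs_level,
    }


if __name__ == "__main__":
    # Illustrative counts for two microparticles: (bs_c, bs_total, oxbs_c, oxbs_total).
    particles = {"particle_1": (30, 40, 20, 40), "particle_2": (10, 40, 12, 40)}
    for name, counts in particles.items():
        print(name, estimate_modification_levels(*counts))
```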

Optionally, the sequences of two or more fractions of a sample comprising one or more circulating microparticles may be determined in combination, in order to determine the presence or absence of at least one modified nucleotide or nucleobase in one or more genomic DNA fragments from the sample. For example, an enrichment step may be carried out to enrich the sample for genomic DNA fragments containing a modified base (such as 5-methylcytosine or 5-hydroxymethylcytosine); a first fraction of the sample, comprising genomic fragments enriched by the enrichment step, may be sequenced, and a second fraction of the sample, comprising genomic fragments not enriched by the enrichment step, may also be sequenced (for example, in a separate sequencing reaction). Optionally, the second fraction may comprise the non-enriched and/or supernatant fraction generated during the enrichment process (for example, the fraction not bound by the enrichment probe or affinity probe). Optionally, the original sample may be divided into first and second subsamples, wherein the first subsample is subjected to the enrichment step to generate the first fraction, and the second fraction comprises the second, non-enriched subsample. Any combination of two or more enriched, non-enriched, converted (such as bisulfite-converted and/or oxidative-bisulfite-converted) and/or unconverted fractions of the sample may be sequenced. For example, a sample comprising one or more circulating microparticles may be used to generate three fractions: a fraction enriched for 5-methylcytosine DNA (or, alternatively, a bisulfite-converted fraction), a fraction enriched for 5-hydroxymethylcytosine (or, alternatively, an oxidative-bisulfite-converted fraction), and a non-enriched (and/or unconverted) fraction. Optionally, any two or more such fractions of the sample may be sequenced separately in independent sequencing reactions (for example, in separate flow cells, or in separate lanes of a single flow cell). Optionally, any two or more such fractions of the sample may be attached to identifying barcode sequences (for example, a barcode identifying a given sequence as belonging to an enriched or non-enriched fraction of the sample) and then sequenced in the same sequencing run (for example, within the same flow cell or flow cell lane).
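When fractions carry identifying barcodes and are sequenced together, reads must be assigned back to their fraction. The Python sketch below assumes, hypothetically, that a 6-nt fraction barcode sits at the 5' start of each read; the barcode sequences and fraction names are invented for illustration. A read is assigned only when exactly one barcode matches within the allowed number of mismatches, so ambiguous reads are left unassigned rather than guessed.

```python
# Hypothetical fraction-identifying barcodes (not from the source).
FRACTION_BARCODES = {
    "ACGTAC": "5mC-enriched",
    "TGCATG": "5hmC-enriched",
    "GATCGA": "non-enriched",
}


def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def assign_fraction(read: str, barcodes: dict = FRACTION_BARCODES,
                    max_mismatches: int = 0):
    """Return (fraction_name, read_without_barcode) if exactly one barcode matches
    the read's 5' prefix within max_mismatches; otherwise (None, read)."""
    hits = [
        (name, len(bc))
        for bc, name in barcodes.items()
        if len(read) >= len(bc) and hamming(read[: len(bc)], bc) <= max_mismatches
    ]
    if len(hits) != 1:
        return None, read
    name, bc_len = hits[0]
    return name, read[bc_len:]


if __name__ == "__main__":
    for read in ["ACGTACTTTT", "GATCGAACGT", "CCCCCCAAAA"]:
        print(read, assign_fraction(read))
```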

Optionally, any method described herein for linking sequences (for example, by attaching barcode sequences, such as barcode sequences from a multimeric barcoding reagent or from a library of two or more multimeric barcoding reagents) may be carried out before any such enrichment and/or molecular conversion step (for example, where such a linking method is carried out on an original sample comprising at least one, or at least two, circulating microparticles, and the linked sequences are then used as input sequences for the enrichment or molecular conversion process).

For example, barcode sequences from a library of two or more multimeric barcoding reagents may be attached to a sample comprising two or more circulating microparticles, wherein first and second barcode sequences from a first multimeric barcoding reagent are attached to first and second genomic DNA fragments from a first circulating microparticle, and first and second barcode sequences from a second multimeric barcoding reagent are attached to first and second genomic DNA fragments from a second circulating microparticle. The resulting barcode-attached genomic DNA fragments are then enriched for 5-methylcytosine (and/or 5-hydroxymethylcytosine), and the enriched genomic DNA fragments are sequenced. The barcode sequences are then used to determine which enriched fragments were attached to barcodes from the same multimeric barcoding reagent, and thereby to predict (or determine) which enriched fragments were contained within the same circulating microparticle. In this example, a second sequencing reaction may also be carried out on the non-enriched genomic DNA fragments (for example, by sequencing the genomic fragments in the supernatant fraction of the enrichment step, i.e., the non-enriched fraction that was not captured), and the barcode sequences are then used to determine which non-enriched fragments were attached to barcodes from the same multimeric barcoding reagent, and thereby to predict (or determine) which non-enriched fragments were contained within the same circulating microparticle. In this example, if both the enriched and the non-enriched genomic DNA fragments are sequenced in this way, it is possible to predict (or determine) which enriched and which non-enriched fragments were attached to barcodes from the same multimeric barcoding reagent, and thereby which enriched and non-enriched fragments were contained within the same circulating microparticle. Methods similar to this example may also be used, for example with one or more molecular conversion processes, and/or by preparing, analysing or sequencing three or more fractions of the sample (for example, a fraction enriched for 5-methylcytosine, a fraction enriched for 5-hydroxymethylcytosine, and a non-enriched fraction).
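The grouping step in this example can be sketched in a few lines. The Python sketch below assumes, hypothetically, that each read has already been reduced to a tuple (reagent_id, fraction, fragment_sequence), i.e. that its barcode has been resolved to the multimeric barcoding reagent it came from and its fraction (enriched or non-enriched) is known. It groups fragments by reagent and lists the reagents for which both fractions were observed, i.e. the microparticles for which enriched and non-enriched fragments can be linked.

```python
from collections import defaultdict


def group_by_reagent(reads):
    """Map reagent_id -> {fraction: [fragment sequences]}."""
    groups = defaultdict(lambda: defaultdict(list))
    for reagent_id, fraction, fragment in reads:
        groups[reagent_id][fraction].append(fragment)
    # Convert nested defaultdicts to plain dicts for a clean, predictable result.
    return {rid: dict(by_fraction) for rid, by_fraction in groups.items()}


def reagents_with_all_fractions(groups, fractions=("enriched", "non-enriched")):
    """Reagent IDs (i.e. putative microparticles) observed in every listed fraction."""
    return sorted(rid for rid, by_fraction in groups.items()
                  if all(f in by_fraction for f in fractions))


if __name__ == "__main__":
    reads = [  # illustrative reads; IDs and sequences are hypothetical
        ("R1", "enriched", "AAAC"),
        ("R1", "non-enriched", "CCGT"),
        ("R2", "enriched", "GGTA"),
        ("R1", "enriched", "TTAG"),
    ]
    groups = group_by_reagent(reads)
    print(groups["R1"])
    print(reagents_with_all_fractions(groups))  # ['R1']
```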

Optionally, any method described herein for linking sequences (for example, by attaching barcode sequences, such as barcode sequences from a multimeric barcoding reagent or from a library of two or more multimeric barcoding reagents) may be carried out after any such enrichment and/or molecular conversion step (for example, where an enrichment step is carried out to enrich for genomic DNA fragments containing 5-methylcytosine or 5-hydroxymethylcytosine, and the genomic DNA fragments enriched by that process are then linked by any method described herein).

The method may include determining the presence or absence of at least one modified nucleotide or nucleobase in genomic DNA fragments, wherein an enrichment step is carried out to enrich for genomic DNA fragments containing the modified base. Such modified bases may include one or more of 5-methylcytosine, 5-hydroxymethylcytosine or any other modified base. Such an enrichment step may be carried out with an enrichment probe that binds specifically or preferentially to the modified base compared with other modified or unmodified bases, such as an antibody, an enzyme, an enzyme fragment or another protein, an aptamer, or any other probe. Such an enrichment step may be carried out with an enzyme that can enzymatically modify DNA molecules containing the modified base, such as a glucosyltransferase, for example 5-hydroxymethylcytosine glucosyltransferase. Optionally, 5-hydroxymethylcytosine glucosyltransferase may be used to determine the presence of 5-hydroxymethylcytosine in genomic DNA fragments, wherein the enzyme transfers a glucose moiety from uridine diphosphoglucose to the modified base in the genomic DNA fragment to generate glucosyl-5-hydroxymethylcytosine. Optionally, the glucosyl-5-hydroxymethylcytosine is then detected, for example with a glucosyl-5-hydroxymethylcytosine-sensitive restriction enzyme, wherein genomic DNA fragments that resist digestion by that restriction enzyme are considered to contain modified 5-hydroxymethylcytosine; optionally, the digestion-resistant genomic DNA fragments may be sequenced by any method described herein to determine their sequences. Optionally, if barcode sequences are attached, the enrichment step may be carried out before or after the step of attaching the barcode sequences. Optionally, if two or more sequences of genomic DNA fragments from a microparticle are attached to each other, the enrichment step may be carried out before or after the step of attaching these sequences to each other. Any method that uses an enrichment probe to measure at least one modified nucleotide or nucleobase in genomic DNA fragments may be carried out with commercially available enrichment probes or other products, such as commercial antibodies, for example anti-5-hydroxymethylcytosine antibody ab178771 (Abcam) or anti-5-methylcytosine antibody ab10805 (Abcam). Commercial products and/or kits may also be used for other steps of such methods, for example Protein A or Protein G Dynabeads (ThermoFisher) for binding, recovering and processing/washing antibodies and/or the fragments bound to them.

The method may include determining the presence or absence of at least one modified nucleotide or nucleobase in genomic DNA fragments, wherein a molecular conversion step is carried out to convert the modified base into a different modified or unmodified nucleobase that can be detected when the nucleic acid sequence is determined. The conversion step may comprise a bisulfite conversion step, an oxidative bisulfite conversion step or any other molecular conversion step. Optionally, if barcode sequences are attached, the conversion step may be carried out before or after the step of attaching the barcode sequences. Optionally, if two or more sequences of genomic DNA fragments from a microparticle are attached to each other, the conversion step may be carried out before or after the step of attaching these sequences to each other. Any method that uses a molecular conversion step to measure at least one modified nucleotide or nucleobase in genomic DNA fragments may be carried out with commercially available molecular conversion kits, such as the EpiMark Bisulfite Conversion Kit (New England Biolabs) or the TrueMethyl Seq Oxidative Bisulfite Sequencing Kit (Cambridge Epigenetix).

In any method in which a molecular conversion step is carried out, one or more adapter oligonucleotides may be attached, after the molecular conversion process, to one or both ends of a genomic DNA fragment (and/or of a set of genomic DNA fragments in the sample). For example, a single-stranded adapter oligonucleotide (for example, comprising a binding site for a primer used for amplification, such as by PCR) may be ligated with a single-stranded ligase to one or both ends of the converted genomic DNA fragment (and/or of the converted set of genomic DNA fragments in the sample). Optionally, a barcode sequence and/or adapter sequence (for example, within a barcode oligonucleotide) may be attached to one end of the genomic DNA fragment (and/or set of fragments) before the molecular conversion step, and an adapter oligonucleotide then attached to the second end of the genomic DNA fragment after the molecular conversion process. Optionally, the second end may be an end generated during the molecular conversion process (i.e., where the genomic DNA fragment has undergone fragmentation, generating one or more new ends relative to the original fragment). This way of attaching adapter oligonucleotides may have the benefit of allowing genomic DNA fragments that were fragmented and/or degraded during the molecular conversion process to be further amplified, analysed and/or sequenced.

In any method in which a molecular conversion step is carried out, any adapter oligonucleotide, barcode oligonucleotide, barcode sequence, coupling sequence and/or coupling oligonucleotide may comprise one or more synthetic 5-methylcytosine nucleotides. Optionally, any of these may be designed so that any or all of the cytosine nucleotides it contains are synthetic 5-methylcytosine nucleotides. Optionally, any such adapter oligonucleotide, barcode oligonucleotide, barcode sequence, coupling sequence and/or coupling oligonucleotide comprising one or more synthetic 5-methylcytosine nucleotides may be attached to genomic DNA fragments before the molecular conversion step; alternatively and/or additionally, it may be attached after the molecular conversion step. Such synthetic 5-methylcytosine nucleotides in adapters, oligonucleotides and/or sequences may have the benefit of reducing or minimizing their degradation and/or fragmentation during a molecular conversion process (such as bisulfite conversion), because they are resistant to degradation during such processes.
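Besides the degradation resistance stated above, a well-established reason for using fully methylated adapters in bisulfite workflows is that 5-methylcytosine is not deaminated by bisulfite, so adapter and barcode sequences read back unchanged and remain decodable. The Python sketch below illustrates only that sequence-preservation point, using a hypothetical encoding (M for a synthetic 5-methylcytosine) and an idealized, complete bisulfite read-out; the adapter sequence is an arbitrary example, not one from the source.

```python
# Idealized bisulfite read-out: unmodified C -> T; M (5mC) and H (5hmC) read as C.
BISULFITE_READOUT = {"A": "A", "G": "G", "T": "T", "C": "T", "M": "C", "H": "C"}


def methylate_all_cytosines(oligo: str) -> str:
    """Design variant in which every cytosine is a synthetic 5-methylcytosine (M)."""
    return oligo.replace("C", "M")


def bisulfite_read(template: str) -> str:
    """Sequence read after complete bisulfite conversion and PCR."""
    return "".join(BISULFITE_READOUT[base] for base in template)


if __name__ == "__main__":
    adapter = "ACGTCCAGTC"  # arbitrary example adapter/barcode
    print(bisulfite_read(adapter))                           # ATGTTTAGTT (scrambled)
    print(bisulfite_read(methylate_all_cytosines(adapter)))  # ACGTCCAGTC (preserved)
```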

The method may include determining the presence or absence of at least one modified nucleotide or nucleobase in genomic DNA fragments, wherein the modified nucleotide or nucleobase (such as 5-methylcytosine or 5-hydroxymethylcytosine) is determined or detected by a sequencing reaction. Optionally, the sequencing reaction may be carried out on a nanopore-based sequencer, such as the MinION, GridION X5, PromethION and/or SmidgION sequencers produced by Oxford Nanopore Technologies, wherein the presence of the modified nucleotide or nucleobase is determined while the genomic DNA fragment translocates through a nanopore in the sequencer, by analysing the current signal through the nanopore device during translocation. Optionally, the sequencing reaction may be carried out on a zero-mode-waveguide-based sequencing instrument, such as the Sequel or RS II sequencers produced by Pacific Biosciences, wherein the presence of the modified nucleotide or nucleobase is determined while a copy of at least part of the genomic DNA fragment is synthesized within a zero-mode waveguide in the sequencer, by analysing the optical signal from the zero-mode waveguide during that copying process.

In any method in which an enrichment step and/or molecular conversion step is carried out, the enrichment and/or conversion may be incomplete and/or less than 100% efficient. For example, a molecular conversion process (such as bisulfite conversion or oxidative bisulfite conversion) may be carried out so that less than 100% of a particular class of target nucleotides (such as 5-methylcytosine or 5-hydroxymethylcytosine) are converted. For example, about 99%, about 95%, about 90%, about 80%, about 70%, about 60%, about 50%, about 40%, about 25% or about 10% of such target nucleotides may be converted during the molecular conversion process. Such incomplete conversion may be achieved by limiting the duration of the molecular conversion process (for example, by making it shorter than the standard time used to achieve complete or near-complete conversion) so that, on average, the target conversion efficiency is reached. An incomplete molecular conversion process may have the benefit of reducing sample degradation/fragmentation and/or sample loss, which are characteristic of many molecular conversion processes (such as bisulfite conversion).
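Choosing a partial conversion efficiency has predictable per-molecule consequences under a simplifying assumption: that every target site in a molecule is converted independently with the same efficiency e. Then, out of n sites, n·e are expected to be converted, all n are converted with probability e^n, and none is converted with probability (1 - e)^n. The Python sketch below computes these quantities and cross-checks the expectation with a seeded simulation; the function names are hypothetical, and real conversion may not be independent across sites.

```python
import random


def _check_efficiency(efficiency: float) -> None:
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")


def expected_converted(n_sites: int, efficiency: float) -> float:
    """Expected number of converted sites per molecule."""
    _check_efficiency(efficiency)
    return n_sites * efficiency


def prob_all_converted(n_sites: int, efficiency: float) -> float:
    """Probability that every target site in a molecule is converted."""
    _check_efficiency(efficiency)
    return efficiency ** n_sites


def prob_none_converted(n_sites: int, efficiency: float) -> float:
    """Probability that no target site in a molecule is converted."""
    _check_efficiency(efficiency)
    return (1.0 - efficiency) ** n_sites


def simulate_mean_converted(n_molecules: int, n_sites: int,
                            efficiency: float, seed: int = 0) -> float:
    """Monte Carlo estimate of mean converted sites per molecule (independent sites)."""
    _check_efficiency(efficiency)
    rng = random.Random(seed)
    total = 0
    for _ in range(n_molecules):
        total += sum(rng.random() < efficiency for _ in range(n_sites))
    return total / n_molecules


if __name__ == "__main__":
    for e in (0.99, 0.9, 0.5):
        print(e, expected_converted(10, e), prob_all_converted(10, e))
```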

Similarly, in any method that includes an enrichment step, the enrichment may be incomplete and/or less than 100% efficient. For example, an enrichment step for 5-methylcytosine (and/or 5-hydroxymethylcytosine) may be performed in which about 99%, about 95%, about 90%, about 80%, about 70%, about 60%, about 50%, about 40%, about 25% or about 10% of the genomic DNA fragments containing such target modified nucleotides are captured and recovered (for example, in an enrichment step using an affinity probe, such as an antibody specific for the target modified nucleotide). Optionally, incomplete enrichment may be achieved by limiting and/or reducing the amount and/or concentration of affinity probe used in the enrichment process (for example, by empirically testing the capture efficiency with different amounts and/or concentrations of affinity probe, optionally using a DNA sequence with a known profile of modified nucleotides as the benchmark for the empirical test). Optionally, incomplete enrichment may be achieved by limiting and/or reducing the time during which the affinity probe binds and/or captures target genomic DNA fragments (i.e. by using different incubation times during which the affinity probe can interact with potential target genomic DNA fragments in the sample; for example, by empirically testing the capture efficiency with different incubation durations, optionally using a DNA sequence with a known profile of modified nucleotides as the benchmark for the empirical test). Such incomplete enrichment may have the benefit of reducing false-positive molecular signals (for example, where a genomic DNA fragment is captured during the enrichment process but does not contain the desired target modified nucleotide). In addition, incomplete enrichment may reduce the cost and complexity of the enrichment process itself.
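The empirical titration described above can be sketched as picking, from several tested probe amounts or incubation times, the condition whose measured recovery of a spike-in control comes closest to the desired efficiency. In this sketch the class, its fields and `pick_condition` are hypothetical, and the recovered counts are assumed to come from sequencing the control with the known modification profile.

```python
from dataclasses import dataclass


@dataclass
class TitrationPoint:
    """One tested condition of the affinity-capture step (hypothetical record)."""
    condition: str           # label, e.g. probe amount or incubation time
    spike_in_total: int      # modified control fragments put into the capture
    spike_in_recovered: int  # control fragments recovered after capture

    @property
    def efficiency(self) -> float:
        return self.spike_in_recovered / self.spike_in_total


def pick_condition(points: list[TitrationPoint], target: float) -> TitrationPoint:
    """Return the tested condition whose measured efficiency is closest to `target`."""
    if not points:
        raise ValueError("no titration points supplied")
    return min(points, key=lambda p: abs(p.efficiency - target))
```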

The method may include a sequence enrichment or sequence capture step, in which one or more specific genomic DNA sequences are enriched from the genomic DNA fragments. This step may be performed by any method of sequence enrichment, for example using DNA oligonucleotides complementary to the sequence, using RNA oligonucleotides complementary to the sequence, by a primer-extension target enrichment step, by a step using a set of molecular inversion probes, or by a step using a set of padlock probes. Optionally, if barcode sequences are attached, the enrichment step may be performed before or after the step of attaching the barcode sequences. Optionally, if two or more sequences of the genomic DNA fragments from a particle are attached to each other, the enrichment step may be performed before or after the step of attaching these sequences to each other.
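One common way to design DNA oligonucleotides complementary to a target sequence is to tile overlapping windows across it. The sketch below illustrates that approach; the tiling strategy, probe length and step size are illustrative assumptions rather than parameters from the text, and the function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def tile_capture_probes(target: str, probe_len: int, step: int) -> list[str]:
    """Design probes complementary to overlapping windows of `target`.

    Each probe is the reverse complement of one window, so it can hybridize to
    the target strand. The last window is anchored to the end of the target so
    that its 3' end is also covered.
    """
    if step < 1 or probe_len > len(target):
        raise ValueError("invalid probe length or step")
    starts = list(range(0, len(target) - probe_len + 1, step))
    if starts[-1] != len(target) - probe_len:
        starts.append(len(target) - probe_len)
    return [reverse_complement(target[s:s + probe_len]) for s in starts]
```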

The method may include enriching at least 1, at least 5, at least 10, at least 50, at least 100, at least 500, at least 1,000, at least 5,000, at least 10,000, at least 100,000, at least 1,000,000 or at least 10,000,000 different genomic DNA fragments.

In the method, each unique input molecule may be sequenced in the sequencing reaction an average of at least 1.0 times, at least 1.5 times, at least 2.0 times, at least 3.0 times, at least 5.0 times, at least 10.0 times, at least 20.0 times, at least 50.0 times or at least 100 times. Optionally, unique input molecules that are sequenced at least twice in the sequencing reaction (i.e. redundant sequencing, with at least two sequence reads per molecule) are used to detect and/or remove errors or inconsistencies between the at least two sequence reads generated by the sequencing reaction.
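A minimal sketch of how redundant reads of one input molecule can be used to flag inconsistencies: collapse the reads column by column and mask any position where the reads disagree. It assumes the reads are already grouped by molecule and aligned to equal length; `consensus` and its `min_agreement` parameter are hypothetical.

```python
from collections import Counter


def consensus(reads: list[str], min_agreement: float = 1.0) -> str:
    """Collapse reads from one unique input molecule into a consensus sequence.

    A base is reported only if at least `min_agreement` of the reads share it;
    otherwise the position is masked as 'N' to flag an inconsistency.
    """
    if len(reads) < 2:
        raise ValueError("redundant sequencing needs at least two reads")
    if len({len(r) for r in reads}) != 1:
        raise ValueError("reads must be aligned to equal length")
    out = []
    for column in zip(*reads):
        base, count = Counter(column).most_common(1)[0]
        out.append(base if count / len(reads) >= min_agreement else "N")
    return "".join(out)
```

With the default of 1.0, any disagreement is masked; a lower threshold turns the same function into a majority vote when three or more reads are available.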

Before the sequencing reaction and/or before an amplification reaction, a nucleotide repair reaction may be performed, in which damaged and/or excised bases or oligonucleotides are removed and/or repaired. Optionally, the repair reaction may be performed in the presence of one or more of: Thermus aquaticus DNA ligase, E. coli endonuclease IV, Bacillus stearothermophilus DNA polymerase, E. coli formamidopyrimidine [fapy]-DNA glycosylase, E. coli uracil-DNA glycosylase, T4 endonuclease V and E. coli endonuclease VIII.

In the method, adapter sequences (such as one or two universal adapter sequences) may be attached before the sequencing step and/or before an amplification step (such as a PCR amplification step). Optionally, one or more such universal adapter sequences may be added by a random-priming or gene-specific primer-extension step, by an in vitro transposition reaction (in which the one or more universal adapter sequences are included in a synthetic transposome), or by a double-stranded or single-stranded ligation reaction (with or without a preceding fragmentation step, such as a chemical, acoustic or mechanical, or enzymatic fragmentation step; and optionally with or without blunting and/or 3′ A-tailing steps).

Barcode sequences comprising enzymatically generated copies or enzymatically generated complementary sequences

One or more barcode sequences may be contained in an oligonucleotide (such as a barcoded oligonucleotide) that comprises an enzymatically generated copy or an enzymatically generated complementary sequence of a barcode sequence.

Optionally, one or more barcode sequences may be contained in a barcoded oligonucleotide whose barcode region comprises an enzymatically generated copy or an enzymatically generated complementary sequence of a barcode sequence. Optionally, one or more barcode sequences may be contained in a barcoded oligonucleotide whose barcode region comprises an enzymatically generated complementary sequence of a barcode sequence contained in a barcode molecule. Optionally, one or more barcode sequences may be contained in a barcoded oligonucleotide whose barcode region comprises an enzymatically generated copy of a barcode sequence contained in a barcode molecule.

Optionally, one or more barcode sequences may be contained in a barcoded oligonucleotide whose barcode region comprises an enzymatically generated complementary sequence of a barcode sequence contained in a multimeric barcode molecule. Optionally, one or more barcode sequences may be contained in a barcoded oligonucleotide whose barcode region comprises an enzymatically generated copy of a barcode sequence contained in a multimeric barcode molecule.

Optionally, one or more barcode sequences may be contained in a first barcoded oligonucleotide whose barcode region comprises an enzymatically generated complementary sequence of a barcode sequence contained in a second barcoded oligonucleotide. Optionally, one or more barcode sequences may be contained in a first barcoded oligonucleotide whose barcode region comprises an enzymatically generated copy of a barcode sequence contained in a second barcoded oligonucleotide.
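The difference between an "enzymatically generated complementary sequence" and an "enzymatically generated copy" can be shown with strings: one round of template-directed synthesis yields the reverse complement of the template (both written 5′ to 3′), and a second round on that product yields a copy of the original. The sketch assumes full-length, error-free synthesis; the function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def enzymatic_complement(template: str) -> str:
    """Sequence (5'->3') of the strand synthesized on `template` (5'->3')."""
    return template.translate(COMPLEMENT)[::-1]


def enzymatic_copy(template: str) -> str:
    """Two rounds of template-directed synthesis reproduce the original sequence."""
    return enzymatic_complement(enzymatic_complement(template))
```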

Any enzymatic method for copying, replicating and/or synthesizing a nucleic acid sequence may be used to generate the enzymatically generated copy or complementary sequence of a barcode sequence. Optionally, a primer-extension method may be used. Optionally, a primer-extension method may be used in which a barcode sequence contained in a barcode molecule (and/or in a multimeric barcode molecule, and/or in a barcoded oligonucleotide) is copied in a primer-extension step, and in which the resulting primer-extension product, comprising all or part of the barcode sequence (for example, all or part of a barcoded oligonucleotide), is then attached to a nucleic acid sequence from a circulating microparticle (for example, to a sequence of a genomic DNA fragment from a circulating microparticle).
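A primer-extension step on a barcode molecule can be modeled as run-off extension: the primer anneals where its reverse complement occurs in the template and is extended to the template's 5′ end, so the product carries the complement of the barcode. This is an idealized sketch (perfect-match binding, no errors); `primer_extend` and the example sequences are hypothetical.

```python
from typing import Optional

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def primer_extend(template: str, primer: str) -> Optional[str]:
    """Run-off primer extension on a linear template (both written 5'->3').

    Returns None if the primer has no perfectly matching binding site. If there
    are several sites, the one nearest the template's 3' end (longest product)
    is used.
    """
    site = template.rfind(revcomp(primer))
    if site == -1:
        return None
    # The product spans the template from its 5' end through the binding site.
    return revcomp(template[: site + len(primer)])
```

For a hypothetical template made of the barcode `ACCTGAGT` followed by the primer-binding site `GGTTCC`, extending primer `GGAACC` gives a product that starts with the primer and then carries the barcode complement `ACTCAGGT`.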

Optionally, a polymerase chain reaction (PCR) method may be used. Optionally, a PCR method may be used in which a barcode sequence contained in a barcode molecule (and/or in a multimeric barcode molecule, and/or in a barcoded oligonucleotide) is copied in a PCR extension step, and in which the resulting extension product, comprising all or part of the barcode sequence (for example, all or part of a barcoded oligonucleotide), is then attached to a nucleic acid sequence from a circulating microparticle (for example, to a sequence of a genomic DNA fragment from a circulating microparticle). Optionally, a PCR method may be used in which a barcode sequence contained in a barcode molecule (and/or in a multimeric barcode molecule, and/or in a barcoded oligonucleotide) is copied in at least two successive PCR extension steps (for example, in at least a first PCR cycle and a subsequent second PCR cycle), and in which at least two of the resulting PCR extension products each comprise all or part of the barcode sequence (for example, all or part of a barcoded oligonucleotide) and are then attached to a nucleic acid sequence from a circulating microparticle (for example, to a sequence of a genomic DNA fragment from a circulating microparticle).
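The successive PCR extension steps can be sketched as idealized single-strand bookkeeping: in each cycle every strand is copied once by whichever primer binds it, so the first cycle produces the complementary strand and the second regenerates a copy of the original. The sketch assumes 100% efficiency and perfect-match binding; `pcr`, `extend` and the primer sequences are hypothetical.

```python
from collections import Counter
from typing import Optional

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def extend(template: str, primer: str) -> Optional[str]:
    """Run-off extension of `primer` on `template` (None if it cannot bind)."""
    site = template.rfind(revcomp(primer))
    return None if site == -1 else revcomp(template[: site + len(primer)])


def pcr(start: list[str], fwd: str, rev: str, cycles: int) -> Counter:
    """Idealized PCR: each strand is copied once per cycle by any primer that binds it."""
    pool = Counter(start)
    for _ in range(cycles):
        new = Counter()
        for strand, n in pool.items():
            for primer in (fwd, rev):
                product = extend(strand, primer)
                if product is not None:
                    new[product] += n
        pool += new
    return pool
```

Starting from one strand made of forward-primer site, barcode and reverse-primer site, three cycles give 8 strands, split equally between copies of the original and its complement.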

Optionally, a rolling-circle amplification (RCA) method may be used. Optionally, an RCA method may be used in which a barcode sequence contained in a barcode molecule (and/or in a multimeric barcode molecule, and/or in a barcoded oligonucleotide) is copied in a rolling-circle amplification step, and in which the resulting extension product of the RCA step, comprising all or part of the barcode sequence (for example, all or part of a barcoded oligonucleotide, all or part of a barcode molecule, and/or all or part of a multimeric barcode molecule), is then attached to a nucleic acid sequence from a circulating microparticle (for example, to a sequence of a genomic DNA fragment from a circulating microparticle).
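In an idealized RCA step, a strand-displacing polymerase extending a primer on a circular template keeps travelling round the circle, so the product is a tandem repeat (concatemer) of the circle's complement, each unit carrying one copy of the barcode complement. In the sketch below `length` stands in for reaction time; the function name and sequences are hypothetical, and binding is assumed to be a perfect match.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def rolling_circle(circle: str, primer: str, length: int) -> str:
    """Idealized RCA product of a circular template written 5'->3'."""
    n, k = len(circle), len(primer)
    # Search the doubled sequence so binding sites spanning the junction are found.
    site = (circle + circle).find(revcomp(primer))
    if k > n or site == -1 or site >= n:
        raise ValueError("primer has no binding site on the circle")
    # Rotate the circle so the binding site sits at its 3' end.
    start = (site + k) % n
    unit = revcomp(circle[start:] + circle[:start])  # begins with the primer
    return (unit * (length // n + 1))[:length]
```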

Optionally, an RCA method may be used in which the barcode sequences contained in a multimeric barcode molecule are copied in a rolling-circle amplification step, in which the resulting extension product of the RCA step comprises a second multimeric barcode molecule, and in which the second multimeric barcode molecule is used as a template to synthesize at least one barcoded oligonucleotide. Such barcoded oligonucleotides may be generated by any method described herein (for example, by a primer-extension process using the second multimeric barcode molecule as a template, or by primer-extension and ligation steps using the second multimeric barcode molecule as a template). The barcoded oligonucleotide(s) are then attached to a nucleic acid sequence from a circulating microparticle (for example, to a sequence of a genomic DNA fragment from a circulating microparticle).

Optionally, any such method of generating an enzymatically generated copy or complementary sequence of a barcode sequence may be performed in a single reaction volume. Optionally, any such method may be performed in two or more different reaction volumes (i.e. in two or more different partitions). Optionally, any such method may be performed in at least 3, at least 5, at least 10, at least 50, at least 100, at least 500, at least 1,000, at least 10,000, at least 100,000, at least 1,000,000, at least 10,000,000 or at least 100,000,000 different reaction volumes (and/or partitions).

Optionally, any such method of generating an enzymatically generated copy or complementary sequence of a barcode sequence may be performed in a reaction volume comprising nucleic acid sequences from one or more circulating microparticles (for example, in a reaction volume comprising one or more circulating microparticles). Optionally, the method may be performed in a first reaction volume comprising nucleic acid sequences from a first circulating microparticle from the sample (for example, comprising genomic DNA fragments of the first circulating microparticle, and/or comprising the first circulating microparticle itself), and in a second reaction volume comprising nucleic acid sequences from a second circulating microparticle from the sample (for example, comprising genomic DNA fragments of the second circulating microparticle, and/or comprising the second circulating microparticle itself).

Optionally, the method of generating an enzymatically generated copy or complementary sequence of a barcode sequence may be performed in N different reaction volumes, wherein each such reaction volume comprises at least one barcode sequence and also comprises nucleic acid sequences from a circulating microparticle from the sample (for example, genomic DNA fragments of a circulating microparticle from the sample, and/or a circulating microparticle from the sample itself), and wherein N is at least 2, at least 3, at least 5, at least 10, at least 50, at least 100, at least 500, at least 1,000, at least 10,000, at least 100,000, at least 1,000,000, at least 10,000,000 or at least 100,000,000. Optionally, the barcode sequences contained in the N different reaction volumes may together comprise at least 2, at least 3, at least 5, at least 10, at least 50, at least 100, at least 500, at least 1,000, at least 10,000, at least 100,000, at least 1,000,000, at least 10,000,000 or at least 100,000,000 different barcode sequences.

Optionally, the method of generating an enzymatically generated copy or complementary sequence of a barcode sequence may be performed in a first reaction volume comprising a first barcode sequence and also comprising nucleic acid sequences from a first circulating microparticle from the sample (for example, genomic DNA fragments of the first circulating microparticle, and/or the first circulating microparticle itself), and in a second reaction volume comprising a second barcode sequence and also comprising nucleic acid sequences from a second circulating microparticle from the sample (for example, genomic DNA fragments of the second circulating microparticle, and/or the second circulating microparticle itself), wherein the first barcode sequence is different from the second barcode sequence.

Optionally, the method of generating an enzymatically generated copy or complementary sequence of a barcode sequence may be performed in a first reaction volume comprising nucleic acid sequences from a first circulating microparticle from the sample (such as genomic DNA fragments of the first circulating microparticle), wherein at least first and second enzymatically generated copies or complementary sequences of the barcode sequence in the first reaction volume are attached to nucleic acid sequences of the first circulating microparticle, and in a second reaction volume comprising nucleic acid sequences from a second circulating microparticle from the sample (such as genomic DNA fragments of the second circulating microparticle), wherein at least first and second enzymatically generated copies or complementary sequences of the barcode sequence in the second reaction volume are attached to nucleic acid sequences of the second circulating microparticle.
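How partition-specific barcodes link the fragments of one microparticle can be sketched in silico: every fragment in a reaction volume receives a copy of that volume's barcode, the molecules are pooled and sequenced, and grouping reads by barcode recovers the linked set per particle. The sketch assumes (for illustration only) one microparticle per reaction volume, a fixed-length barcode attached as a prefix, and error-free reads; `tag_partitions` and `group_by_barcode` are hypothetical names.

```python
from collections import defaultdict


def tag_partitions(partitions: dict[str, list[str]], barcode_len: int) -> list[str]:
    """Prefix every fragment in a partition with that partition's barcode.

    `partitions` maps each barcode sequence to the fragments in its reaction
    volume. Returns the pooled, tagged molecules.
    """
    pooled = []
    for barcode, fragments in partitions.items():
        if len(barcode) != barcode_len:
            raise ValueError("all barcodes must have the same length")
        pooled.extend(barcode + frag for frag in fragments)
    return pooled


def group_by_barcode(reads: list[str], barcode_len: int) -> dict[str, list[str]]:
    """Recover linked fragment sets by splitting off the leading barcode."""
    groups: dict[str, list[str]] = defaultdict(list)
    for read in reads:
        groups[read[:barcode_len]].append(read[barcode_len:])
    return dict(groups)
```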

Optionally, any method of generating an enzymatically generated copy or complementary sequence of a barcode sequence may be performed on (and/or using) a library comprising two or more barcode sequences. Optionally, any such method may be performed on (and/or using) a library comprising two or more barcode molecules, two or more multimeric barcode molecules, two or more multimeric barcoding reagents, or two or more barcoded oligonucleotides.

Optionally, any method of generating an enzymatically generated copy or complementary sequence of a barcode sequence may further include an attachment step in which one or more enzymatically generated copies or complementary sequences of the barcode sequence are attached to each of one or more nucleic acid sequences of a circulating microparticle (for example, to genomic DNA sequences of the circulating microparticle). Optionally, any one or more such attachment steps may include a hybridization step (for example, hybridizing a barcoded oligonucleotide to a nucleic acid sequence), a hybridization-and-extension step (for example, hybridizing a barcoded oligonucleotide to a nucleic acid sequence and then extending the hybridized barcoded oligonucleotide with a polymerase), and/or a ligation step (for example, ligating a barcoded oligonucleotide to a nucleic acid sequence). After any one or more such attachment steps, a sequencing step may be performed on the nucleic acid sequences comprising a barcode sequence and its attached nucleic acid sequence from the circulating microparticle.

Optionally, any method of generating an enzymatically generated copy or complementary sequence of a barcode sequence may further include attaching one or more enzymatically generated copies or complementary sequences of the barcode sequence to each of one or more nucleic acid sequences of a circulating microparticle (for example, to genomic DNA sequences of the circulating microparticle), wherein the nucleic acid sequences of the circulating microparticle also comprise a coupling sequence. Any coupling sequence and/or method of attaching a coupling sequence described herein may be used, as may any method of attaching a barcode sequence to a coupling sequence (and/or to an oligonucleotide comprising a coupling sequence).

Optionally, any method that generates an enzymatically generated copy or complementary sequence of a barcode sequence, and that further includes attaching one or more such copies or complementary sequences to nucleic acid sequences of a circulating microparticle, may also include a step of chemically crosslinking the circulating microparticle (and/or chemically crosslinking a sample comprising two or more circulating microparticles). Optionally, the chemical crosslinking step may be performed before or after the step of partitioning the circulating microparticles and/or barcode molecules into two or more different partitions. Optionally, the chemical crosslinking step may be followed by a step that reverses the crosslinking, for example a high-temperature incubation step. Optionally, any such method may also include a step of permeabilizing the circulating microparticles, for example by a high-temperature incubation step and/or with a chemical surfactant.

Optionally, any method of generating an enzymatically generated copy or complementary sequence of a barcode sequence may be performed with any number, type and/or volume of partitions described herein. Optionally, any such method performed in one or more partitions may include one or more partitions that contain any number (or average number) of circulating microparticles as described herein. Optionally, any such method performed in one or more partitions may include one or more partitions that contain any mass (or average mass) of nucleic acid from circulating microparticles as described herein (for example, any mass of genomic DNA fragments).
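When the average number of circulating microparticles per partition is chosen, a common modeling assumption (not stated in the text) is that particles are distributed randomly and independently, giving Poisson statistics. The sketch below uses that assumption to estimate the fractions of empty, single-particle and multi-particle partitions; the function names are hypothetical.

```python
import math


def poisson_pmf(k: int, mean: float) -> float:
    """P(exactly k particles in a partition) for Poisson loading with the given mean."""
    return mean ** k * math.exp(-mean) / math.factorial(k)


def loading_summary(mean: float) -> dict[str, float]:
    """Fractions of partitions holding zero, one, or more than one particle."""
    empty = poisson_pmf(0, mean)
    single = poisson_pmf(1, mean)
    occupied = 1.0 - empty
    return {
        "empty": empty,
        "single": single,
        "multiple": occupied - single,
        # share of occupied partitions that contain exactly one particle
        "single_among_occupied": single / occupied if occupied > 0 else 0.0,
    }
```

Lower mean loading raises the share of occupied partitions holding a single particle, at the cost of more empty partitions.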

Any method that generates enzymatically generated copies or complementary sequences of barcode sequences may have a number of desirable features for analyzing linked sequences from circulating microparticles. First, it makes it possible to produce a large absolute mass of barcode sequences (for example, a large absolute mass of barcode molecules or barcoded oligonucleotides) from only a small amount of starting barcode material (for example, PCR can amplify input material exponentially, and RCA can generate long concatemers containing many copies of each template, yielding large quantities for subsequent use and handling).
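The scale-up from a small amount of starting barcode material can be estimated with the standard idealized PCR yield formula, N = N0 × (1 + E)^n, where E is the per-cycle efficiency. The sketch below applies it; real reactions plateau, so it is an upper-bound model, and the function names are hypothetical.

```python
def copies_after_pcr(initial: float, cycles: int, efficiency: float = 1.0) -> float:
    """Ideal exponential PCR yield: N = N0 * (1 + E) ** n."""
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return initial * (1.0 + efficiency) ** cycles


def cycles_needed(initial: float, target: float, efficiency: float = 1.0) -> int:
    """Smallest number of cycles for which the ideal yield reaches `target`."""
    if efficiency <= 0.0:
        raise ValueError("efficiency must be positive to reach a larger target")
    n = 0
    while copies_after_pcr(initial, n, efficiency) < target:
        n += 1
    return n
```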

In addition, when the barcode sequences being copied are contained in a library (such as a library of barcode molecules, a library of multimeric barcode molecules, a library of multimeric barcoding reagents and/or a library of barcoded oligonucleotides), generating enzymatically generated copies or complementary sequences makes it possible to produce a large absolute mass of barcode sequences with defined sequence characteristics (for example, where the large mass of barcode sequences contains sequences from a previously established and/or previously characterized library).

Furthermore, many enzymatic replication and amplification methods (such as rolling-circle amplification with phi29 polymerase, and primer extension and/or PCR amplification with thermostable polymerases such as Phusion polymerase) show high molecular accuracy during replication (i.e. a low probability of introducing errors into the newly replicated sequence). They may therefore give the resulting barcode sequences (such as the resulting barcode molecules, multimeric barcode molecules and/or barcoded oligonucleotides) favorable accuracy compared with non-enzymatic methods (such as standard chemical oligonucleotide synthesis, e.g. phosphoramidite synthesis).
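How per-base accuracy translates into intact barcodes can be estimated by assuming (as a simplification) that errors occur independently at each base and accumulate over successive rounds of copying. The sketch below does not assume any particular error rate for phi29, Phusion or phosphoramidite synthesis; measured rates would be plugged in. The function names are hypothetical.

```python
def error_free_fraction(length: int, per_base_error: float, rounds: int = 1) -> float:
    """Expected fraction of barcodes with no errors after `rounds` of copying,
    assuming independent errors at each base in each round."""
    if not 0.0 <= per_base_error <= 1.0:
        raise ValueError("per-base error rate must be between 0 and 1")
    return (1.0 - per_base_error) ** (length * rounds)


def expected_errors(length: int, per_base_error: float, rounds: int = 1) -> float:
    """Mean number of errors per barcode under the same assumption."""
    return length * rounds * per_base_error
```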

In addition, enzymatic replication and amplification methods (such as primer extension and PCR) are well suited to subsequent modification, processing and functionalization of the resulting sequences, which can themselves be carried out relatively simply on a large absolute mass of substrate. For example, primer-extension products are easily configured for a subsequent ligation step (for example, in a primer-extension-and-ligation process, such as may be used to generate barcoded oligonucleotides and/or multimeric barcoding reagents). As a further example, the direct product of an enzymatic replication process (for example, where the complementary sequence or copy of a barcode sequence remains annealed to the barcode sequence itself) can have desirable functional and/or structural properties. For example, barcoded oligonucleotides generated by an enzymatic primer-extension process remain structurally tethered, through their annealed nucleotide sequences, to a single barcode molecule (such as a multimeric barcode molecule) within one macromolecular complex during their production; this complex can then be further processed and/or functionalized in solution as a single, integral reagent.

11. General aspects of multimeric barcoding reagents

Multimeric barcoding reagents offer a number of useful features and functions for linking sequences from circulating microparticles. First, such reagents (and/or libraries of them) may comprise clearly defined, well-characterized sets of barcodes, which can inform and enhance subsequent bioinformatic analysis (for example, where multimeric barcode molecules and/or multimeric barcoding reagents with known and/or empirically determined sequences are used). In addition, such reagents make it easy to distribute multiple barcode sequences at once, or to subject them to other molecular or biophysical processing at once, because the multiple barcode sequences contained in each reagent automatically "move together" in solution and during liquid-handling and/or processing steps. Furthermore, the proximity of the multiple barcode sequences within each reagent can enable new functional assay formats, such as crosslinking circulating microparticles and then attaching sequences from such multimeric reagents to the genomic DNA fragments contained within them (including, for example, in solution-phase reactions, i.e. with two or more microparticles in a single partition).
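One way a well-characterized reagent library can inform bioinformatic analysis: if the set of barcodes carried by each multimeric barcoding reagent is known, reads bearing different barcodes from the same reagent can be assigned to one linked group. The sketch assumes reads have already been split into a barcode and an insert; `group_reads_by_reagent` and the barcode-to-reagent map are hypothetical.

```python
from collections import defaultdict


def group_reads_by_reagent(
    reads: list[tuple[str, str]], barcode_to_reagent: dict[str, str]
) -> dict[str, list[str]]:
    """Group (barcode, insert) pairs by the multimeric reagent the barcode belongs to.

    Reads whose barcode is not in the known library are collected under
    "unassigned".
    """
    groups: dict[str, list[str]] = defaultdict(list)
    for barcode, insert in reads:
        groups[barcode_to_reagent.get(barcode, "unassigned")].append(insert)
    return dict(groups)
```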

The invention provides multimeric barcoding reagents for labeling one or more target nucleic acids. A multimeric barcoding reagent comprises two or more barcode regions linked together (directly or indirectly).

Each barcode region comprises a nucleic acid sequence. The nucleic acid sequence may be single-stranded DNA, double-stranded DNA, or single-stranded DNA with one or more double-stranded regions.

Each barcode region may comprise a sequence that identifies the multimeric barcoding reagent. For example, this may be a constant region shared by all barcode regions of a single multimeric barcoding reagent. Each barcode region may comprise a unique sequence that is not present in the other regions and can therefore be used to uniquely identify each barcode region. Each barcode region may comprise at least 5, at least 10, at least 15, at least 20, at least 25, at least 50 or at least 100 nucleotides. Preferably, each barcode region comprises at least 5 nucleotides. Preferably, each barcode region comprises deoxyribonucleotides; optionally, all of the nucleotides in a barcode region are deoxyribonucleotides. One or more of the deoxyribonucleotides may be modified (for example, a deoxyribonucleotide modified with a biotin moiety, or a deoxyuridine nucleotide). A barcode region may comprise one or more degenerate nucleotides or sequences, or may contain no degenerate nucleotides or sequences.

A multimeric barcoding reagent may comprise at least 5, at least 10, at least 20, at least 25, at least 50, at least 75, at least 100, at least 200, at least 500, at least 1,000, at least 5,000 or at least 10,000 barcode regions. Preferably, a multimeric barcoding reagent comprises at least 5 barcode regions.

A multimeric barcoding reagent may comprise at least 2, at least 3, at least 4, at least 5, at least 10, at least 20, at least 25, at least 50, at least 75, at least 100, at least 200, at least 500, at least 1,000, at least 5,000, at least 10⁴, at least 10⁵ or at least 10⁶ unique or different barcode regions. Preferably, a multimeric barcoding reagent comprises at least 5 unique or different barcode regions.
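The optional layout described above (a constant sequence shared by all regions of one reagent, plus a unique sequence per region) can be captured in a small data model with structural checks. This is a sketch under that assumed layout; the class names, fields and default thresholds are hypothetical and only mirror the "at least two regions" and "at least 5 nucleotides" options in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BarcodeRegion:
    constant: str  # shared by all regions of one reagent; identifies the reagent
    unique: str    # distinguishes this region from the reagent's other regions


@dataclass
class MultimericBarcodingReagent:
    regions: list[BarcodeRegion]

    def validate(self, min_regions: int = 2, min_len: int = 5) -> None:
        """Raise ValueError if the reagent violates the assumed layout."""
        if len(self.regions) < min_regions:
            raise ValueError(f"need at least {min_regions} barcode regions")
        if len({r.constant for r in self.regions}) != 1:
            raise ValueError("constant sequence must be shared by all regions")
        if any(len(r.constant + r.unique) < min_len for r in self.regions):
            raise ValueError("barcode region shorter than the minimum length")
        if len({r.unique for r in self.regions}) != len(self.regions):
            raise ValueError("unique sequences must differ between regions")

    def distinct_regions(self) -> int:
        return len({r.unique for r in self.regions})
```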

A multimeric barcoding reagent may comprise first and second barcode molecules linked together (i.e. a multimeric barcode molecule), wherein each barcode molecule comprises a nucleic acid sequence containing a barcode region.

The barcode molecules of a multimeric barcode molecule may be linked on a nucleic acid molecule, and may be contained in a (single) nucleic acid molecule. A multimeric barcode molecule may comprise a single contiguous nucleic acid sequence containing two or more barcode molecules. A multimeric barcode molecule may be a single-stranded nucleic acid molecule (such as single-stranded DNA), a double-stranded nucleic acid molecule, or a single-stranded molecule comprising one or more double-stranded regions. A multimeric barcode molecule may comprise one or more phosphorylated 5′ ends that can be ligated to the 3′ ends of other nucleic acid molecules. Optionally, a multimeric barcode molecule may contain one or more nicks or one or more gaps, within a double-stranded region or between two different double-stranded regions, at which the multimeric barcode molecule is interrupted or separated. Any gap may be at least 1, at least 2, at least 5, at least 10, at least 20, at least 50 or at least 100 nucleotides long. The nicks and/or gaps may serve to increase the molecular flexibility of the multimeric barcode molecule and/or multimeric barcoding reagent, for example to improve the accessibility of the molecule or reagent for interaction with target nucleic acid molecules. The nicks and/or gaps may also allow the molecule or reagent to be purified or removed more efficiently. Molecules and/or reagents comprising such nicks and/or gaps can remain linked across the different barcode molecules by complementary DNA strands that hybridize jointly to two or more separated portions of the multimeric barcode molecule.
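A multimeric barcode molecule made of a single contiguous sequence can be represented, for illustration, as barcodes joined by a fixed spacer, with the individual barcodes recovered by splitting on that spacer. The fixed-spacer layout is a hypothetical assumption (nicks, gaps and double-stranded regions are not modeled), and the builder rejects layouts that would not split back unambiguously.

```python
def split_multimer(sequence: str, spacer: str) -> list[str]:
    """Recover the individual barcode sequences from a contiguous multimer."""
    return [part for part in sequence.split(spacer) if part]


def build_multimer(barcodes: list[str], spacer: str) -> str:
    """Join barcodes into one contiguous strand separated by `spacer`."""
    if not spacer or any(not bc or spacer in bc for bc in barcodes):
        raise ValueError("spacer must be non-empty and absent from every barcode")
    sequence = spacer.join(barcodes)
    # A spacer could still arise across a barcode/spacer junction; reject that case.
    if split_multimer(sequence, spacer) != barcodes:
        raise ValueError("spacer is ambiguous at a junction")
    return sequence
```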

Barcode molecules may be linked by, for example, a support (e.g. a macromolecule, a solid support or a semi-solid support). The sequence of the barcode molecules linked to each support may be known. Barcode molecules may be linked to the support directly or indirectly (e.g. via a linker molecule). Barcode molecules may be linked by binding to the support and/or by binding or annealing to linker molecules bound to the support. Barcode molecules may be bound to the support (or to the linker molecules) by covalent linkage, non-covalent linkage (e.g. a protein-protein interaction or a streptavidin-biotin bond) or nucleic acid hybridization. A linker molecule may be a biopolymer (e.g. a nucleic acid molecule) or a synthetic polymer. A linker molecule may comprise one or more ethylene glycol and/or poly(ethylene glycol) units (e.g. hexaethylene glycol or pentaethylene glycol). A linker molecule may comprise one or more ethyl groups, such as a C3 (three-carbon) spacer, a C6 spacer, a C12 spacer or a C18 spacer.

Barcode molecules may be linked by a macromolecule, by binding to the macromolecule and/or by annealing to the macromolecule.

Barcode molecules may be linked to a macromolecule directly or indirectly (e.g. via a linker molecule). Barcode molecules may be linked by binding to the macromolecule and/or by binding or annealing to a linker molecule bound to the macromolecule. Barcode molecules may be bound to the macromolecule (or to the linker molecule) by covalent linkage, non-covalent linkage (e.g. a protein-protein interaction or a streptavidin-biotin bond) or nucleic acid hybridization. The linker molecule may be a biopolymer (e.g. a nucleic acid molecule) or a synthetic polymer. The linker molecule may comprise one or more ethylene glycol and/or poly(ethylene glycol) units (e.g. hexaethylene glycol or pentaethylene glycol). The linker molecule may comprise one or more ethyl groups, such as a C3 (three-carbon) spacer, a C6 spacer, a C12 spacer or a C18 spacer.

The macromolecule may be a synthetic polymer (e.g. a dendrimer) or a biopolymer, such as a nucleic acid (e.g. a single-stranded nucleic acid such as single-stranded DNA), a peptide, a polypeptide or a protein (e.g. a multimeric protein).

The dendrimer may comprise at least 2, at least 3, at least 5 or at least 10 generations.

The macromolecule may be a nucleic acid comprising two or more nucleotides, each of which is capable of binding to a barcode molecule. Additionally or alternatively, the nucleic acid may comprise two or more regions, each of which is capable of hybridizing to a barcode molecule.

The nucleic acid may comprise a first modified nucleotide and a second modified nucleotide, wherein each modified nucleotide comprises a binding moiety capable of binding to a barcode molecule (for example, a biotin moiety, or an alkyne moiety usable in a click chemistry reaction). Optionally, the first and second modified nucleotides may be separated by an intervening nucleic acid sequence of at least 1, at least 2, at least 5 or at least 10 nucleotides.

The nucleic acid may comprise a first hybridization region and a second hybridization region, wherein each hybridization region comprises a sequence that is complementary to, and capable of hybridizing to, at least one nucleotide sequence in a barcode molecule. The complementary sequence may be at least 5, at least 10, at least 15, at least 20, at least 25 or at least 50 contiguous nucleotides. Preferably, the complementary sequence is at least 10 contiguous nucleotides. Optionally, the first and second hybridization regions may be separated by an intervening nucleic acid sequence of at least 1, at least 2, at least 5 or at least 10 nucleotides.
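The complementarity requirement above (a run of contiguous nucleotides in a hybridization region pairing with a barcode molecule) can be checked at the sequence level. The following is a minimal Python sketch, assuming both sequences are plain uppercase A/C/G/T strings written 5' to 3' and that only perfect Watson-Crick pairing counts. The function names and the `min_run=10` default (taken from the preferred length above) are illustrative choices, not part of the source, and the check says nothing about melting temperature or hybridization conditions.

```python
# Sketch only: sequence-level complementarity check, assuming plain 5'->3'
# A/C/G/T strings and perfect Watson-Crick pairing (no mismatches, no Tm model).

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a 5'->3' DNA string."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def longest_complementary_run(region: str, partner: str) -> int:
    """Length of the longest contiguous stretch that can pair antiparallel
    between `region` and `partner`, computed as the longest common substring
    of `region` and the reverse complement of `partner`."""
    a, b = region.upper(), reverse_complement(partner)
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # Extend the matching run that ended at (i-1, j-1).
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


def can_hybridize(region: str, partner: str, min_run: int = 10) -> bool:
    """True if at least `min_run` contiguous nucleotides are complementary."""
    return longest_complementary_run(region, partner) >= min_run
```

Because the two strands pair antiparallel, the run is found against the reverse complement of the partner; the quadratic dynamic program is adequate for oligonucleotide-length inputs.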

The macromolecule may be a protein, such as a multimeric protein, e.g. a homomultimeric or heteromultimeric protein. For example, the protein may comprise streptavidin, such as tetrameric streptavidin.

The support may be a solid support or a semi-solid support. The support may comprise a planar surface. The support may be, for example, a slide, such as a glass slide. The slide may be a flow cell for sequencing. If the support is a slide, the first and second barcode molecules may be immobilized in a discrete region on the slide. Optionally, the barcode molecules of each multimeric barcoding reagent in a library are immobilized in a different discrete region on the slide from the barcode molecules of the other multimeric barcoding reagents in the library. The support may be a plate comprising wells, optionally wherein the first and second barcode molecules are immobilized in the same well. Optionally, the barcode molecules of each multimeric barcoding reagent in a library are immobilized in a different well of the plate from the barcode molecules of the other multimeric barcoding reagents in the library.

Preferably, the support is a bead (e.g. a gel bead). The bead may be an agarose bead, a silica bead, a polystyrene bead, a gel bead (such as those available from 10x Genomics), an antibody-conjugated bead, an oligo-dT-conjugated bead, a streptavidin bead or a magnetic bead (e.g. a superparamagnetic bead). The bead may be of any size and/or molecular structure. For example, the bead may be 10 nanometers to 100 microns, 100 nanometers to 10 microns, or 1 micron to 5 microns in diameter. Optionally, the bead is about 10 nanometers, about 100 nanometers, about 1 micron, about 10 microns or about 100 microns in diameter. The bead may be solid or, alternatively, hollow, partially hollow or porous. Beads of certain sizes may be most preferable for certain barcoding methods. For example, beads smaller than 5.0 microns or smaller than 1.0 micron may be most useful for barcoding nucleic acid targets within individual cells. Preferably, the barcode molecules of each multimeric barcoding reagent in a library are linked together on a different bead from the barcode molecules of the other multimeric barcoding reagents in the library.

The support may be functionalized to enable two or more barcode molecules to be attached to it. Such functionalization may be achieved by adding chemical moieties (e.g. carboxylate groups, alkynes, azides, acrylate groups, amino groups, sulfate groups or succinimide groups) and/or protein-based moieties (e.g. streptavidin, avidin or protein G) to the support. Barcode molecules may be attached to these moieties directly or indirectly (e.g. via a linker molecule).

A functionalized support (e.g. a bead) may be contacted with a solution of barcode molecules under conditions that promote the attachment of two or more barcode molecules to each bead in the solution (thereby generating multimeric barcoding reagents).

In a library of multimeric barcoding reagents, the barcode molecules of each multimeric barcoding reagent may be attached together to a different support from the barcode molecules of the other multimeric barcoding reagents in the library.

A multimeric barcoding reagent may comprise: at least 2, at least 3, at least 4, at least 5, at least 10, at least 20, at least 25, at least 50, at least 75, at least 100, at least 200, at least 500, at least 1000, at least 5000, at least 10^4, at least 10^5 or at least 10^6 barcode molecules linked together, wherein each barcode molecule is as defined herein; and a barcoded oligonucleotide annealed to each barcode molecule, wherein each barcoded oligonucleotide is as defined herein. Preferably, the multimeric barcoding reagent comprises at least 5 barcode molecules linked together, wherein each barcode molecule is as defined herein; and a barcoded oligonucleotide annealed to each barcode molecule, wherein each barcoded oligonucleotide is as defined herein.

A multimeric barcoding reagent may comprise: at least 2, at least 3, at least 4, at least 5, at least 10, at least 20, at least 25, at least 50, at least 75, at least 100, at least 200, at least 500, at least 1000, at least 5000, at least 10^4, at least 10^5 or at least 10^6 unique or different barcode molecules linked together, wherein each barcode molecule is as defined herein; and a barcoded oligonucleotide annealed to each barcode molecule, wherein each barcoded oligonucleotide is as defined herein. Preferably, the multimeric barcoding reagent comprises at least 5 unique or different barcode molecules linked together, wherein each barcode molecule is as defined herein; and a barcoded oligonucleotide annealed to each barcode molecule, wherein each barcoded oligonucleotide is as defined herein.

A multimeric barcoding reagent may comprise two or more barcoded oligonucleotides as defined herein, wherein each barcoded oligonucleotide comprises a barcode region. A multimeric barcoding reagent may comprise at least 2, at least 3, at least 4, at least 5, at least 10, at least 20, at least 25, at least 50, at least 75, at least 100, at least 200, at least 500, at least 1000, at least 5000, at least 10,000, at least 100,000 or at least 1,000,000 unique or different barcoded oligonucleotides. Preferably, the multimeric barcoding reagent comprises at least 5 unique or different barcoded oligonucleotides.

The barcoded oligonucleotides of a multimeric barcoding reagent are linked together (directly or indirectly). The barcoded oligonucleotides of a multimeric barcoding reagent may be linked together by a support (e.g. a macromolecule, a solid support or a semi-solid support) as described herein. A multimeric barcoding reagent may comprise one or more polymers to which the barcoded oligonucleotides are annealed or attached. For example, the barcoded oligonucleotides of a multimeric barcoding reagent may be annealed to a multimeric hybridization molecule (e.g. a multimeric barcode molecule). Alternatively, the barcoded oligonucleotides of a multimeric barcoding reagent may be linked together by a macromolecule (e.g. a synthetic polymer such as a dendrimer, or a biopolymer such as a protein) or by a support (e.g. a solid or semi-solid support, such as a gel bead). Additionally or alternatively, the barcoded oligonucleotides of a (single) multimeric barcoding reagent may be linked together by being contained within a (single) lipid carrier (e.g. a liposome or a micelle).

A multimeric barcoding reagent may comprise: first and second hybridization molecules linked together (i.e. a multimeric hybridization molecule), wherein each hybridization molecule comprises a nucleic acid sequence comprising a hybridization region; and first and second barcoded oligonucleotides, wherein the first barcoded oligonucleotide is annealed to the hybridization region of the first hybridization molecule, and wherein the second barcoded oligonucleotide is annealed to the hybridization region of the second hybridization molecule.

A hybridization molecule may comprise or consist of deoxyribonucleotides. One or more of the deoxyribonucleotides may be modified deoxyribonucleotides (e.g. deoxyribonucleotides modified with a biotin moiety, or deoxyuracil nucleotides). A hybridization molecule may comprise one or more degenerate nucleotides or sequences. Alternatively, a hybridization molecule may not comprise any degenerate nucleotides or sequences.

The hybridization molecules of a multimeric hybridization molecule may be linked on a nucleic acid molecule. Such a nucleic acid molecule may provide a backbone to which single-stranded barcoded oligonucleotides can anneal. The hybridization molecules of a multimeric hybridization molecule may be comprised within a (single) nucleic acid molecule. A multimeric hybridization molecule may comprise a single contiguous nucleic acid sequence comprising two or more hybridization molecules. A multimeric hybridization molecule may be a single-stranded nucleic acid molecule (e.g. single-stranded DNA) comprising two or more hybridization molecules. A multimeric hybridization molecule may comprise one or more double-stranded regions. Optionally, within a double-stranded region or between two different double-stranded regions, the multimeric hybridization molecule may comprise one or more nicks or one or more gaps, at which the multimeric hybridization molecule itself is divided or separated. Any such gap may be at least 1, at least 2, at least 5, at least 10, at least 20, at least 50 or at least 100 nucleotides in length. The nicks and/or gaps may serve to increase the molecular flexibility of the multimeric hybridization molecule and/or the multimeric barcoding reagent, for example to improve the accessibility of the molecule or reagent for interaction with target nucleic acid molecules. The nicks and/or gaps may also enable the molecule or reagent to be purified or removed more efficiently. A molecule and/or reagent comprising nicks and/or gaps may keep its different hybridization molecules linked via a complementary DNA strand that hybridizes jointly to two or more separated portions of the multimeric hybridization molecule.

Hybridization molecules may be linked by a macromolecule, by binding to the macromolecule and/or by annealing to the macromolecule.

Hybridization molecules may be linked to a macromolecule directly or indirectly (e.g. via a linker molecule). Hybridization molecules may be linked by binding to the macromolecule and/or by binding or annealing to a linker molecule bound to the macromolecule. Hybridization molecules may be bound to the macromolecule (or to the linker molecule) by covalent linkage, non-covalent linkage (e.g. a protein-protein interaction or a streptavidin-biotin bond) or nucleic acid hybridization. The linker molecule may be a biopolymer (e.g. a nucleic acid molecule) or a synthetic polymer. The linker molecule may comprise one or more ethylene glycol and/or poly(ethylene glycol) units (e.g. hexaethylene glycol or pentaethylene glycol). The linker molecule may comprise one or more ethyl groups, such as a C3 (three-carbon) spacer, a C6 spacer, a C12 spacer or a C18 spacer.

The macromolecule may be a synthetic polymer (e.g. a dendrimer) or a biopolymer, such as a nucleic acid (e.g. a single-stranded nucleic acid such as single-stranded DNA), a peptide, a polypeptide or a protein (e.g. a multimeric protein).

The dendrimer may comprise at least 2, at least 3, at least 5 or at least 10 generations.

The macromolecule may be a nucleic acid comprising two or more nucleotides, each of which is capable of binding to a hybridization molecule. Additionally or alternatively, the nucleic acid may comprise two or more regions, each of which is capable of hybridizing to a hybridization molecule.

The nucleic acid may comprise a first modified nucleotide and a second modified nucleotide, wherein each modified nucleotide comprises a binding moiety capable of binding to a hybridization molecule (for example, a biotin moiety, or an alkyne moiety usable in a click chemistry reaction). Optionally, the first and second modified nucleotides may be separated by an intervening nucleic acid sequence of at least 1, at least 2, at least 5 or at least 10 nucleotides.

The nucleic acid may comprise a first hybridization region and a second hybridization region, wherein each hybridization region comprises a sequence that is complementary to, and capable of hybridizing to, at least one nucleotide sequence in a hybridization molecule. The complementary sequence may be at least 5, at least 10, at least 15, at least 20, at least 25 or at least 50 contiguous nucleotides. Optionally, the first and second hybridization regions may be separated by an intervening nucleic acid sequence of at least 1, at least 2, at least 5 or at least 10 nucleotides.

The macromolecule may be a protein, such as a multimeric protein, e.g. a homomultimeric or heteromultimeric protein. For example, the protein may comprise streptavidin, such as tetrameric streptavidin.

Hybridization molecules may be linked by a support. Hybridization molecules may be linked to the support directly or indirectly (e.g. via a linker molecule). Hybridization molecules may be linked by binding to the support and/or by binding or annealing to linker molecules bound to the support. Hybridization molecules may be bound to the support (or to the linker molecules) by covalent linkage, non-covalent linkage (e.g. a protein-protein interaction or a streptavidin-biotin bond) or nucleic acid hybridization. A linker molecule may be a biopolymer (e.g. a nucleic acid molecule) or a synthetic polymer. A linker molecule may comprise one or more ethylene glycol and/or poly(ethylene glycol) units (e.g. hexaethylene glycol or pentaethylene glycol). A linker molecule may comprise one or more ethyl groups, such as a C3 (three-carbon) spacer, a C6 spacer, a C12 spacer or a C18 spacer.

The support may be a solid support or a semi-solid support. The support may comprise a planar surface. The support may be, for example, a slide, such as a glass slide. The slide may be a flow cell for sequencing. If the support is a slide, the first and second hybridization molecules may be immobilized in a discrete region on the slide. Optionally, the hybridization molecules of each multimeric barcoding reagent in a library are immobilized in a different discrete region on the slide from the hybridization molecules of the other multimeric barcoding reagents in the library. The support may be a plate comprising wells, optionally wherein the first and second hybridization molecules are immobilized in the same well. Optionally, the hybridization molecules of each multimeric barcoding reagent in a library are immobilized in a different well of the plate from the hybridization molecules of the other multimeric barcoding reagents in the library.

Preferably, the support is a bead (e.g. a gel bead). The bead may be an agarose bead, a silica bead, a polystyrene bead, a gel bead (such as those available from 10x Genomics), an antibody-conjugated bead, an oligo-dT-conjugated bead, a streptavidin bead or a magnetic bead (e.g. a superparamagnetic bead). The bead may be of any size and/or molecular structure. For example, the bead may be 10 nanometers to 100 microns, 100 nanometers to 10 microns, or 1 micron to 5 microns in diameter. Optionally, the bead is about 10 nanometers, about 100 nanometers, about 1 micron, about 10 microns or about 100 microns in diameter. The bead may be solid or, alternatively, hollow, partially hollow or porous. Beads of certain sizes may be most preferable for certain barcoding methods. For example, beads smaller than 5.0 microns or smaller than 1.0 micron may be most useful for barcoding nucleic acid targets within individual cells. Preferably, the hybridization molecules of each multimeric barcoding reagent in a library are linked together on a different bead from the hybridization molecules of the other multimeric barcoding reagents in the library.

The support may be functionalized to enable two or more hybridization molecules to be attached to it. Such functionalization may be achieved by adding chemical moieties (e.g. carboxylate groups, alkynes, azides, acrylate groups, amino groups, sulfate groups or succinimide groups) and/or protein-based moieties (e.g. streptavidin, avidin or protein G) to the support. Hybridization molecules may be attached to these moieties directly or indirectly (e.g. via a linker molecule).

A functionalized support (e.g. a bead) may be contacted with a solution of hybridization molecules under conditions that promote the attachment of two or more hybridization molecules to each bead in the solution (thereby generating multimeric barcoding reagents).

In a library of multimeric barcoding reagents, the hybridization molecules of each multimeric barcoding reagent may be attached together to a different support from the hybridization molecules of the other multimeric barcoding reagents in the library.

Optionally, the hybridization molecules are attached to the bead by covalent linkage, non-covalent linkage (e.g. a streptavidin-biotin bond) or nucleic acid hybridization.

A multimeric barcoding reagent may comprise: at least 2, at least 3, at least 4, at least 5, at least 10, at least 20, at least 25, at least 50, at least 75, at least 100, at least 200, at least 500, at least 1000, at least 5000 or at least 10,000 hybridization molecules linked together, wherein each hybridization molecule is as defined herein; and a barcoded oligonucleotide annealed to each hybridization molecule, wherein each barcoded oligonucleotide is as defined herein. Preferably, the multimeric barcoding reagent comprises at least 5 hybridization molecules linked together, wherein each hybridization molecule is as defined herein; and a barcoded oligonucleotide annealed to each hybridization molecule, wherein each barcoded oligonucleotide is as defined herein.

A multimeric barcoding reagent may comprise: at least 2, at least 3, at least 4, at least 5, at least 10, at least 20, at least 25, at least 50, at least 75, at least 100, at least 200, at least 500, at least 1000, at least 5000 or at least 10,000 unique or different hybridization molecules linked together, wherein each hybridization molecule is as defined herein; and a barcoded oligonucleotide annealed to each hybridization molecule, wherein each barcoded oligonucleotide is as defined herein. Preferably, the multimeric barcoding reagent comprises at least 5 unique or different hybridization molecules linked together, wherein each hybridization molecule is as defined herein; and a barcoded oligonucleotide annealed to each hybridization molecule, wherein each barcoded oligonucleotide is as defined herein.

The multimeric hybridization molecule may be a multimeric barcode molecule, wherein the first hybridization molecule is a first barcode molecule and the second hybridization molecule is a second barcode molecule. A multimeric barcoding reagent may comprise: first and second barcode molecules linked together (i.e. a multimeric barcode molecule), wherein each barcode molecule comprises a nucleic acid sequence comprising a barcode region; and first and second barcoded oligonucleotides, wherein the first barcoded oligonucleotide is annealed to the barcode region of the first barcode molecule, and wherein the second barcoded oligonucleotide is annealed to the barcode region of the second barcode molecule.

The barcoded oligonucleotides of a multimeric barcoding reagent may comprise: a first barcoded oligonucleotide comprising, optionally in the 5' to 3' direction, a barcode region and a target region capable of annealing or ligating to a first target nucleic acid fragment; and a second barcoded oligonucleotide comprising, optionally in the 5' to 3' direction, a barcode region and a target region capable of annealing or ligating to a second target nucleic acid fragment.

The barcoded oligonucleotides of a multimeric barcoding reagent may comprise: a first barcoded oligonucleotide comprising a barcode region and a target region capable of ligating to a first target nucleic acid fragment; and a second barcoded oligonucleotide comprising a barcode region and a target region capable of ligating to a second target nucleic acid fragment.

The barcoded oligonucleotides of a multimeric barcoding reagent may comprise: a first barcoded oligonucleotide comprising, in the 5' to 3' direction, a barcode region and a target region capable of annealing to a first target nucleic acid fragment; and a second barcoded oligonucleotide comprising, in the 5' to 3' direction, a barcode region and a target region capable of annealing to a second target nucleic acid fragment.

12. General aspects of barcoded oligonucleotides

A barcoded oligonucleotide comprises a barcode region. A barcoded oligonucleotide optionally comprises, in the 5' to 3' direction, a barcode region and a target region. The target region is capable of annealing or ligating to a target nucleic acid fragment. Alternatively, the barcoded oligonucleotide may consist essentially of, or consist of, a barcode region.

The 5' end of a barcoded oligonucleotide may be phosphorylated. This may enable the 5' end of the barcoded oligonucleotide to be ligated to the 3' end of a target nucleic acid. Alternatively, the 5' end of the barcoded oligonucleotide may not be phosphorylated.

A barcoded oligonucleotide may be a single-stranded nucleic acid molecule (e.g. single-stranded DNA). A barcoded oligonucleotide may comprise one or more double-stranded regions. A barcoded oligonucleotide may be a double-stranded nucleic acid molecule (e.g. double-stranded DNA).

A barcoded oligonucleotide may comprise or consist of deoxyribonucleotides. One or more of the deoxyribonucleotides may be modified deoxyribonucleotides (e.g. deoxyribonucleotides modified with a biotin moiety, or deoxyuracil nucleotides). A barcoded oligonucleotide may comprise one or more degenerate nucleotides or sequences. Alternatively, a barcoded oligonucleotide may not comprise any degenerate nucleotides or sequences.

The barcode region of each barcoded oligonucleotide may comprise a different sequence. Each barcode region may comprise a sequence that identifies the multimeric barcoding reagent. For example, this sequence may be a constant region shared by all barcode regions of a single multimeric barcoding reagent. The barcode region of each barcoded oligonucleotide may comprise a unique sequence that is not present in the other barcoded oligonucleotides and may therefore be used to uniquely identify each barcoded oligonucleotide. Each barcode region may comprise at least 5, at least 10, at least 15, at least 20, at least 25, at least 50 or at least 100 nucleotides. Preferably, each barcode region comprises at least 5 nucleotides. Preferably, each barcode region comprises deoxyribonucleotides; optionally, all of the nucleotides in the barcode region are deoxyribonucleotides. One or more of the deoxyribonucleotides may be modified deoxyribonucleotides (e.g. deoxyribonucleotides modified with a biotin moiety, or deoxyuracil nucleotides). A barcode region may comprise one or more degenerate nucleotides or sequences. Alternatively, a barcode region may not comprise any degenerate nucleotides or sequences.
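Because every barcoded oligonucleotide of one reagent can carry the same reagent-identifying sequence, reads derived from fragments barcoded by the same reagent can be grouped by that sequence. The following is a minimal Python sketch under a hypothetical read layout: the read begins with the barcode region, its first 8 nucleotides are the shared reagent identifier and the next 6 are unique to each oligonucleotide. Both lengths and the names `parse_barcode` and `group_by_reagent` are assumptions for illustration, and sequencing errors are ignored.

```python
from collections import defaultdict

# Hypothetical layout (assumption for illustration only): each read starts with
# the barcode region; the first REAGENT_ID_LEN nucleotides are shared by every
# barcoded oligonucleotide of one multimeric barcoding reagent, and the next
# UNIQUE_LEN nucleotides are unique to that oligonucleotide.
REAGENT_ID_LEN = 8
UNIQUE_LEN = 6


def parse_barcode(read: str) -> tuple[str, str, str]:
    """Split a read into (reagent_id, oligo_id, remainder)."""
    reagent_id = read[:REAGENT_ID_LEN]
    oligo_id = read[REAGENT_ID_LEN:REAGENT_ID_LEN + UNIQUE_LEN]
    remainder = read[REAGENT_ID_LEN + UNIQUE_LEN:]
    return reagent_id, oligo_id, remainder


def group_by_reagent(reads: list[str]) -> dict[str, list[tuple[str, str]]]:
    """Group reads sharing a reagent-identifying sequence; each group is a
    candidate set of linked sequence reads, kept in input order."""
    groups: dict[str, list[tuple[str, str]]] = defaultdict(list)
    for read in reads:
        reagent_id, oligo_id, remainder = parse_barcode(read)
        groups[reagent_id].append((oligo_id, remainder))
    return dict(groups)
```

Keeping the oligonucleotide-specific part alongside each read preserves the ability to tell apart reads that came from different barcoded oligonucleotides of the same reagent.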

The target region of each barcoded oligonucleotide may comprise a different sequence. Each target region may comprise a sequence capable of annealing to only a single target nucleic acid fragment in a nucleic acid sample (i.e. a target-specific sequence). Each target region may comprise one or more random sequences or one or more degenerate sequences, so that the target region is capable of annealing to more than one target nucleic acid fragment. Each target region may comprise at least 5, at least 10, at least 15, at least 20, at least 25, at least 50 or at least 100 nucleotides. Preferably, each target region comprises at least 5 nucleotides. Each target region may comprise 5 to 100, 5 to 10, 10 to 20, 20 to 30, 30 to 50, 50 to 100, 10 to 90, 20 to 80, 30 to 70 or 50 to 60 nucleotides. Preferably, each target region comprises 30 to 70 nucleotides. Preferably, each target region comprises deoxyribonucleotides; optionally, all of the nucleotides in the target region are deoxyribonucleotides. One or more of the deoxyribonucleotides may be modified deoxyribonucleotides (e.g. deoxyribonucleotides modified with a biotin moiety, or deoxyuracil nucleotides). Each target region may comprise one or more universal bases (e.g. inosine), one or more modified nucleotides and/or one or more nucleotide analogs.
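Degenerate target regions are commonly written with IUPAC ambiguity codes, in which one symbol stands for a set of bases (for example N for any base). The sketch below assumes IUPAC notation and does not model universal bases such as inosine. It counts how many plain sequences a degenerate pattern encodes and tests whether a given sequence is one of them. The candidate must be supplied in the same orientation as the target region, i.e. as the reverse complement of the fragment site it would anneal to. The function names are hypothetical.

```python
# IUPAC nucleotide ambiguity codes mapped to the bases each code stands for.
# Universal bases such as inosine are not modeled in this sketch.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def matches_degenerate(pattern: str, site: str) -> bool:
    """True if plain A/C/G/T `site` (same length and orientation as the
    degenerate `pattern`) is one of the sequences the pattern encodes."""
    if len(pattern) != len(site):
        return False
    return all(base in IUPAC[code] for code, base in zip(pattern.upper(), site.upper()))


def count_encoded(pattern: str) -> int:
    """Number of distinct plain sequences encoded by a degenerate pattern."""
    total = 1
    for code in pattern.upper():
        total *= len(IUPAC[code])
    return total
```

A fully random hexamer written as NNNNNN encodes 4^6 = 4096 sequences, which is why such a target region can anneal to many different fragments.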

The target region may be used to anneal the barcoded oligonucleotide to a target nucleic acid fragment and then serve as a primer for a primer extension reaction or an amplification reaction (e.g. a polymerase chain reaction). Alternatively, the target region may be used to ligate the barcoded oligonucleotide to a target nucleic acid fragment. The target region may be located at the 5' end of the barcoded oligonucleotide. Such a target region may be phosphorylated. This may enable the 5' end of the target region to be ligated to the 3' end of the target nucleic acid fragment.

A barcoded oligonucleotide may also comprise one or more adapter regions. An adapter region may be located between the barcode region and the target region. A barcoded oligonucleotide may, for example, comprise an adapter region 5' of the barcode region (a 5' adapter region) and/or an adapter region 3' of the barcode region (a 3' adapter region). Optionally, the barcoded oligonucleotide comprises, in the 5' to 3' direction, a barcode region, an adapter region and a target region.
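The optional 5' to 3' layout (barcode region, adapter region, target region) can be represented as a small data structure that concatenates the regions and reports their coordinates. This is an illustrative Python sketch only: the class name `BarcodedOligo`, its fields and the example sequences are assumptions, and the `phosphorylated_5p` flag merely records the 5' phosphorylation described above rather than modeling any chemistry.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BarcodedOligo:
    """Hypothetical model of one barcoded oligonucleotide laid out 5'->3' as
    barcode region + adapter region + target region."""
    barcode: str
    adapter: str
    target: str
    phosphorylated_5p: bool = False  # records whether the 5' end can be ligated

    def sequence(self) -> str:
        """Full oligonucleotide sequence written 5'->3'."""
        return self.barcode + self.adapter + self.target

    def region_spans(self) -> dict[str, tuple[int, int]]:
        """Half-open [start, end) coordinates of each region in sequence()."""
        b = len(self.barcode)
        a = b + len(self.adapter)
        return {"barcode": (0, b), "adapter": (b, a), "target": (a, a + len(self.target))}
```

For example, `BarcodedOligo("ACGTACGT", "TTAGGC", "NNNNNNNNNN")` yields a 24-nucleotide sequence whose degenerate target region occupies positions 14 to 24; freezing the dataclass keeps a designed oligonucleotide from being altered after creation.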

The adapter region of a barcoded oligonucleotide may comprise a sequence complementary to an adapter region of a multimeric barcode molecule, or a sequence complementary to a hybridization region of a multimeric hybridization molecule. The adapter region of a barcoded oligonucleotide may enable the barcoded oligonucleotide to be attached to a macromolecule or a support (e.g. a bead). The adapter region may be used to manipulate, purify, recover, amplify or detect the barcoded oligonucleotide and/or the target nucleic acid to which it is annealed or ligated.

The adapter region of each barcoded oligonucleotide may comprise a constant region. Optionally, all of the adapter regions of the barcoded oligonucleotides of each multimeric barcoding reagent are substantially identical. An adapter region may comprise at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 8, at least 10, at least 15, at least 20, at least 25, at least 50, at least 100 or at least 250 nucleotides. Preferably, the adapter region comprises at least 4 nucleotides. Preferably, each adapter region comprises deoxyribonucleotides; optionally, all of the nucleotides in the adapter region are deoxyribonucleotides. One or more of the deoxyribonucleotides may be modified deoxyribonucleotides (e.g. deoxyribonucleotides modified with a biotin moiety, or deoxyuracil nucleotides). Each adapter region may comprise one or more universal bases (e.g. inosine), one or more modified nucleotides and/or one or more nucleotide analogs.
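Since the adapter region may be a constant sequence shared by the barcoded oligonucleotides of a reagent, it can serve as a landmark for splitting a read into the barcode region and the sequence that follows it. A minimal Python sketch, assuming the barcode region precedes the adapter in the read (consistent with the optional barcode-adapter-target order described earlier) and tolerating up to one substitution in the adapter. Insertions and deletions are not handled, and `split_at_adapter` is a hypothetical helper.

```python
from typing import Optional


def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def split_at_adapter(read: str, adapter: str,
                     max_mismatches: int = 1) -> Optional[tuple[str, str]]:
    """Return (barcode, insert) split around the first window that matches the
    constant adapter with at most `max_mismatches` substitutions, or None if
    no such window exists."""
    k = len(adapter)
    for start in range(len(read) - k + 1):
        if hamming(read[start:start + k], adapter) <= max_mismatches:
            return read[:start], read[start + k:]
    return None
```

Scanning left to right and stopping at the first acceptable window keeps the sketch simple; a longer, more distinctive adapter lowers the chance that part of the barcode region is mistaken for it.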

Barcoded oligonucleotides may be synthesized by chemical oligonucleotide synthesis methods. Synthesis of barcoded oligonucleotides may comprise one or more of the following steps: an enzymatic production process, an enzymatic amplification process or an enzymatic modification procedure, such as an in vitro transcription process, a reverse transcription process, a primer extension process or a polymerase chain reaction process.

These general aspects of barcoded oligonucleotides apply to any of the multimeric barcoding reagents described herein.

13. General aspects of libraries of multimeric barcoding reagents

The invention provides a library of multimeric barcoding reagents comprising first and second multimeric barcoding reagents as defined herein, wherein the barcode regions of the first multimeric barcoding reagent are different from the barcode regions of the second multimeric barcoding reagent.

A library of multimeric barcoding reagents may comprise at least 5, at least 10, at least 20, at least 25, at least 50, at least 75, at least 100, at least 250, at least 500, at least 10^3, at least 10^4, at least 10^5, at least 10^6, at least 10^7, at least 10^8 or at least 10^9 multimeric barcoding reagents as defined herein. Preferably, the library comprises at least 10 multimeric barcoding reagents as defined herein. Preferably, the first and second barcode regions of each multimeric barcoding reagent are different from the barcode regions of at least 9 other multimeric barcoding reagents in the library.
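The library sizes above put a lower bound on barcode-region length. With four bases, a region of n nucleotides can take at most 4^n distinct sequences, so giving each of k reagents its own sequence requires 4^n ≥ k. If barcodes were instead drawn uniformly at random, the birthday approximation k(k - 1) / (2 · 4^n) estimates the expected number of reagent pairs that would share a barcode. The sketch below computes both; it is a back-of-the-envelope calculation under those assumptions (plain A/C/G/T barcodes, uniform random draws), not a design rule from the source, and the function names are hypothetical.

```python
def min_barcode_length(n_reagents: int) -> int:
    """Smallest length n (n >= 1) with 4**n >= n_reagents, i.e. enough
    distinct A/C/G/T sequences to give every reagent its own barcode."""
    n = 1
    while 4 ** n < n_reagents:
        n += 1
    return n


def expected_colliding_pairs(n_reagents: int, length: int) -> float:
    """Birthday-problem estimate of the number of reagent pairs sharing a
    barcode when each reagent draws one uniformly from 4**length sequences."""
    space = 4 ** length
    return n_reagents * (n_reagents - 1) / (2 * space)
```

For example, 4^15 = 1,073,741,824, so 15 nucleotides is the minimum for 10^9 reagents; with randomly drawn 15-mers, however, the estimate is roughly 4.7 × 10^8 colliding pairs, so randomly assigned barcodes would need to be considerably longer than this minimum.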

The first and second barcode regions of each multimeric barcoding reagent may be different from the barcode regions of at least 4, at least 9, at least 19, at least 24, at least 49, at least 74, at least 99, at least 249, at least 499, at least 999 (i.e. 10^3 - 1), at least 10^4 - 1, at least 10^5 - 1, at least 10^6 - 1, at least 10^7 - 1, at least 10^8 - 1 or at least 10^9 - 1 other multimeric barcoding reagents in the library. The first and second barcode regions of each multimeric barcoding reagent may be different from the barcode regions of all other multimeric barcoding reagents in the library. Preferably, the first and second barcode regions of each multimeric barcoding reagent are different from the barcode regions of at least 9 other multimeric barcoding reagents in the library.

The barcode regions of each multimeric barcoding reagent may be different from the barcode regions of at least 4, at least 9, at least 19, at least 24, at least 49, at least 74, at least 99, at least 249, at least 499, at least 999 (i.e. 10^3 - 1), at least 10^4 - 1, at least 10^5 - 1, at least 10^6 - 1, at least 10^7 - 1, at least 10^8 - 1 or at least 10^9 - 1 other multimeric barcoding reagents in the library. The barcode regions of each multimeric barcoding reagent may be different from the barcode regions of all other multimeric barcoding reagents in the library. Preferably, the barcode regions of each multimeric barcoding reagent are different from the barcode regions of at least 9 other multimeric barcoding reagents in the library.

The present invention provides a library of multimeric barcoding reagents comprising first and second multimeric barcoding reagents as defined herein, wherein the barcode regions of the barcoded oligonucleotides of the first multimeric barcoding reagent are different from the barcode regions of the barcoded oligonucleotides of the second multimeric barcoding reagent.

Different multimeric barcoding reagents in a library of multimeric barcoding reagents may comprise different numbers of barcoded oligonucleotides.

The library of multimeric barcoding reagents may comprise at least 5, at least 10, at least 20, at least 25, at least 50, at least 75, at least 100, at least 250, at least 500, at least 10^3, at least 10^4, at least 10^5, at least 10^6, at least 10^7, at least 10^8 or at least 10^9 multimeric barcoding reagents as defined herein. Preferably, the library comprises at least 10 multimeric barcoding reagents as defined herein. Preferably, the barcode regions of the first and second barcoded oligonucleotides of each multimeric barcoding reagent are different from the barcode regions of the barcoded oligonucleotides of at least 9 other multimeric barcoding reagents in the library.

The barcode regions of the first and second barcoded oligonucleotides of each multimeric barcoding reagent may be different from the barcode regions of the barcoded oligonucleotides of at least 4, at least 9, at least 19, at least 24, at least 49, at least 74, at least 99, at least 249, at least 499, at least 999 (i.e. 10^3 - 1), at least 10^4 - 1, at least 10^5 - 1, at least 10^6 - 1, at least 10^7 - 1, at least 10^8 - 1 or at least 10^9 - 1 other multimeric barcoding reagents in the library. The barcode regions of the first and second barcoded oligonucleotides of each multimeric barcoding reagent may be different from the barcode regions of the barcoded oligonucleotides of all other multimeric barcoding reagents in the library. Preferably, the barcode regions of the first and second barcoded oligonucleotides of each multimeric barcoding reagent are different from the barcode regions of the barcoded oligonucleotides of at least 9 other multimeric barcoding reagents in the library.

The barcode regions of the barcoded oligonucleotides of each multimeric barcoding reagent may be different from the barcode regions of the barcoded oligonucleotides of at least 4, at least 9, at least 19, at least 24, at least 49, at least 74, at least 99, at least 249, at least 499, at least 999 (i.e. 10^3 - 1), at least 10^4 - 1, at least 10^5 - 1, at least 10^6 - 1, at least 10^7 - 1, at least 10^8 - 1 or at least 10^9 - 1 other multimeric barcoding reagents in the library. The barcode regions of the barcoded oligonucleotides of each multimeric barcoding reagent may be different from the barcode regions of the barcoded oligonucleotides of all other multimeric barcoding reagents in the library. Preferably, the barcode regions of the barcoded oligonucleotides of each multimeric barcoding reagent are different from the barcode regions of the barcoded oligonucleotides of at least 9 other multimeric barcoding reagents in the library.

These general aspects of libraries of multimeric barcoding reagents apply to any of the multimeric barcoding reagents described herein.

14. Multimeric barcoding reagents comprising barcoded oligonucleotides annealed to a multimeric barcode molecule

The present invention provides a multimeric barcoding reagent for labeling a target nucleic acid, wherein the reagent comprises: first and second barcode molecules linked together (i.e. a multimeric barcode molecule), wherein each barcode molecule comprises a nucleic acid sequence comprising a barcode region; and first and second barcoded oligonucleotides, wherein the first barcoded oligonucleotide comprises, optionally in the 5' to 3' direction, a barcode region annealed to the barcode region of the first barcode molecule and a target region capable of annealing or ligating to a first target nucleic acid fragment, and wherein the second barcoded oligonucleotide comprises, optionally in the 5' to 3' direction, a barcode region annealed to the barcode region of the second barcode molecule and a target region capable of annealing or ligating to a second target nucleic acid fragment.
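The arrangement above can be summarized as a small data model. The following Python sketch is one possible representation, assuming DNA sequences written 5' to 3' over A/C/G/T and treating "annealed" as exact, full-length reverse complementarity between the oligonucleotide's barcode region and the paired barcode molecule's barcode region. The class names (`BarcodeMolecule`, `BarcodedOligo`, `MultimericBarcodingReagent`) and the example sequences are hypothetical.

```python
from dataclasses import dataclass
from typing import List

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(_COMPLEMENT)[::-1]


@dataclass
class BarcodeMolecule:
    barcode_region: str  # 5'->3'


@dataclass
class BarcodedOligo:
    barcode_region: str  # 5'->3'; anneals to the paired barcode molecule
    target_region: str  # 5'->3'; anneals or ligates to a target fragment

    @property
    def sequence(self) -> str:
        # 5' barcode region followed by the 3' target region
        return self.barcode_region + self.target_region


@dataclass
class MultimericBarcodingReagent:
    molecules: List[BarcodeMolecule]  # assumed linked, e.g. on one backbone
    oligos: List[BarcodedOligo]  # oligos[i] is annealed to molecules[i]

    def is_consistent(self) -> bool:
        """Simplifying assumption: annealing means the oligo's barcode region
        is the exact reverse complement of its molecule's barcode region."""
        return len(self.molecules) == len(self.oligos) and all(
            o.barcode_region == reverse_complement(m.barcode_region)
            for m, o in zip(self.molecules, self.oligos)
        )


m1, m2 = BarcodeMolecule("AACCGGTTAC"), BarcodeMolecule("GATTACAGGC")
target = "ACGTACGTAC"  # placeholder target region
reagent = MultimericBarcodingReagent(
    molecules=[m1, m2],
    oligos=[
        BarcodedOligo(reverse_complement(m1.barcode_region), target),
        BarcodedOligo(reverse_complement(m2.barcode_region), target),
    ],
)
print(reagent.is_consistent())  # True
```

Real annealing tolerates some mismatches and depends on conditions. Exact complementarity is used here only to keep the model simple.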

The present invention provides a multimeric barcoding reagent for labeling a target nucleic acid, wherein the reagent comprises: first and second barcode molecules linked together (i.e. a multimeric barcode molecule), wherein each barcode molecule comprises a nucleic acid sequence comprising a barcode region; and first and second barcoded oligonucleotides, wherein the first barcoded oligonucleotide comprises a barcode region annealed to the barcode region of the first barcode molecule and a target region capable of ligating to a first target nucleic acid fragment, and wherein the second barcoded oligonucleotide comprises a barcode region annealed to the barcode region of the second barcode molecule and a target region capable of ligating to a second target nucleic acid fragment.

The present invention provides a multimeric barcoding reagent for labeling a target nucleic acid, wherein the reagent comprises: first and second barcode molecules linked together (i.e. a multimeric barcode molecule), wherein each barcode molecule comprises a nucleic acid sequence comprising a barcode region; and first and second barcoded oligonucleotides, wherein the first barcoded oligonucleotide comprises, in the 5' to 3' direction, a barcode region annealed to the barcode region of the first barcode molecule and a target region capable of annealing to a first target nucleic acid fragment, and wherein the second barcoded oligonucleotide comprises, in the 5' to 3' direction, a barcode region annealed to the barcode region of the second barcode molecule and a target region capable of annealing to a second target nucleic acid fragment.

The present invention provides a multimeric barcoding reagent for labeling a target nucleic acid, wherein the reagent comprises: first and second barcode molecules linked together (i.e. a multimeric barcode molecule), wherein each barcode molecule comprises a nucleic acid sequence comprising a barcode region; and first and second barcoded oligonucleotides, wherein the first barcoded oligonucleotide comprises a barcode region that is annealed to the barcode region of the first barcode molecule and is capable of ligating to a first target nucleic acid fragment, and wherein the second barcoded oligonucleotide comprises a barcode region that is annealed to the barcode region of the second barcode molecule and is capable of ligating to a second target nucleic acid fragment.

Each barcoded oligonucleotide may consist essentially of, or consist of, a barcode region.

Preferably, the barcode molecules comprise or consist of deoxyribonucleotides. One or more of the deoxyribonucleotides may be modified deoxyribonucleotides (e.g. deoxyribonucleotides modified with a biotin moiety, or deoxyuridine nucleotides). The barcode molecules may comprise one or more degenerate nucleotides or sequences. Alternatively, the barcode molecules may not comprise any degenerate nucleotides or sequences.

The barcode region may uniquely identify each barcode molecule. Each barcode region may comprise a sequence that identifies the multimeric barcoding reagent; for example, this sequence may be a constant region shared by all barcode regions of a single multimeric barcoding reagent. Each barcode region may comprise at least 5, at least 10, at least 15, at least 20, at least 25, at least 50 or at least 100 nucleotides. Preferably, each barcode region comprises at least 5 nucleotides. Preferably, each barcode region comprises deoxyribonucleotides; optionally, all nucleotides in a barcode region are deoxyribonucleotides. One or more of the deoxyribonucleotides may be modified deoxyribonucleotides (e.g. deoxyribonucleotides modified with a biotin moiety, or deoxyuridine nucleotides). The barcode region may comprise one or more degenerate nucleotides or sequences. Alternatively, the barcode region may not comprise any degenerate nucleotides or sequences.
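The length and degeneracy options above can be checked mechanically. The following Python sketch assumes that barcode regions are written in standard IUPAC nucleotide codes (for example N = any base, R = A/G). It reports whether a region meets a minimum length, whether it contains degenerate positions, and how many fully specified sequences it can represent. The function name and the `min_length` default of 5 (taken from the preferred value above) are illustrative choices.

```python
from math import prod

# Number of concrete bases represented by each IUPAC nucleotide code.
IUPAC_CHOICES = {
    "A": 1, "C": 1, "G": 1, "T": 1,
    "R": 2, "Y": 2, "S": 2, "W": 2, "K": 2, "M": 2,
    "B": 3, "D": 3, "H": 3, "V": 3,
    "N": 4,
}


def describe_barcode_region(seq: str, min_length: int = 5) -> dict:
    """Summarize a barcode region written in IUPAC nucleotide codes."""
    seq = seq.upper()
    unknown = sorted(set(seq) - set(IUPAC_CHOICES))
    if unknown:
        raise ValueError(f"not IUPAC nucleotide codes: {unknown}")
    return {
        "length": len(seq),
        "meets_min_length": len(seq) >= min_length,
        "has_degenerate": any(IUPAC_CHOICES[b] > 1 for b in seq),
        # How many fully specified sequences the region can stand for.
        "n_concrete": prod(IUPAC_CHOICES[b] for b in seq),
    }


print(describe_barcode_region("ACGTA"))
# {'length': 5, 'meets_min_length': True, 'has_degenerate': False, 'n_concrete': 1}
print(describe_barcode_region("NNNNNNNN")["n_concrete"])  # 65536
```

A fully degenerate 8-nt region can represent 4^8 = 65,536 distinct sequences. This illustrates how even short degenerate barcode regions can distinguish many barcode molecules.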

Preferably, the barcode region of the first barcoded oligonucleotide comprises a sequence that is complementary to, and annealed to, the barcode region of the first barcode molecule, and the barcode region of the second barcoded oligonucleotide comprises a sequence that is complementary to, and annealed to, the barcode region of the second barcode molecule. The complementary sequence of each barcoded oligonucleotide may be at least 5, at least 10, at least 15, at least 20, at least 25, at least 50 or at least 100 contiguous nucleotides.
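Whether a barcoded oligonucleotide carries "at least N contiguous complementary nucleotides" can be computed directly. Because the two strands pair antiparallel, the longest perfectly complementary stretch equals the longest common substring of the oligonucleotide's barcode region and the reverse complement of the barcode molecule's barcode region. The following Python sketch counts Watson-Crick pairs only and ignores wobble pairs, modified bases and hybridization thermodynamics. The function name is hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(_COMPLEMENT)[::-1]


def longest_complementary_run(oligo_region: str, molecule_region: str) -> int:
    """Length of the longest contiguous stretch of `oligo_region` that can
    pair antiparallel, with no mismatches, against `molecule_region`.

    Both sequences are written 5'->3'. The answer equals the longest common
    substring of `oligo_region` and the reverse complement of
    `molecule_region`, found here by dynamic programming in O(n*m) time.
    """
    partner = reverse_complement(molecule_region)
    best = 0
    prev = [0] * (len(partner) + 1)
    for a in oligo_region:
        curr = [0] * (len(partner) + 1)
        for j, b in enumerate(partner, start=1):
            if a == b:
                curr[j] = prev[j - 1] + 1
                best = max(best, curr[j])
        prev = curr
    return best


molecule = "AACCGGTTAC"
print(longest_complementary_run("GTAACCGGTT", molecule))  # 10 (fully complementary)
print(longest_complementary_run("GTAACAGGTT", molecule))  # 5 (one mismatch)
```

A threshold test such as `longest_complementary_run(oligo, molecule) >= 5` would then correspond to the "at least 5 contiguous nucleotides" option.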

The target region of a barcoded oligonucleotide (which is not annealed to the multimeric barcode molecule) may be non-complementary to the multimeric barcode molecule.

A barcoded oligonucleotide may comprise a linker region between the barcode region and the target region. The linker region may comprise one or more contiguous nucleotides that are not annealed to the multimeric barcode molecule and are non-complementary to the target nucleic acid fragments. The linker may comprise 1 to 100, 5 to 75, 10 to 50, 15 to 30 or 20 to 25 non-complementary nucleotides. Preferably, the linker comprises 15 to 30 non-complementary nucleotides. Using such a linker region enhances the efficiency of barcoding reactions performed with the multimeric barcoding reagents.
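The following Python sketch assembles a barcoded oligonucleotide 5' to 3' as barcode region, linker region and target region, and rejects linkers outside the preferred 15 to 30 nt range. It also includes a crude screen for unwanted complementarity between the linker and a target fragment. The k-mer screen, its window size `k = 6`, and all names and sequences are assumptions made for illustration. The text itself does not define how "non-complementary" should be tested.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(_COMPLEMENT)[::-1]


def assemble_oligo(barcode_region: str, linker: str, target_region: str,
                   linker_range: tuple = (15, 30)) -> str:
    """Join the regions 5'->3' as barcode | linker | target, rejecting
    linkers outside the preferred length range."""
    lo, hi = linker_range
    if not lo <= len(linker) <= hi:
        raise ValueError(f"linker length {len(linker)} outside {lo}-{hi} nt")
    return barcode_region + linker + target_region


def linker_may_pair(linker: str, fragment: str, k: int = 6) -> bool:
    """Crude screen: True if any k-nt window of the linker is the exact
    reverse complement of some k-nt window of the target fragment."""
    partner = reverse_complement(fragment)
    return any(linker[i:i + k] in partner for i in range(len(linker) - k + 1))


linker = "T" * 20
oligo = assemble_oligo("GTAACCGGTT", linker, "ACGTACGTAC")
print(len(oligo))  # 40
print(linker_may_pair(linker, "GGGCCCGGGCCC"))  # False
print(linker_may_pair(linker, "CCAAAAAAAACC"))  # True
```

A homopolymer linker is used only because its behavior is easy to follow. Practical linker design would more likely screen against the whole expected target population.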

The barcode molecules may also comprise one or more nucleic acid sequences that are not complementary to the barcode regions of the barcoded oligonucleotides. For example, the barcode molecules may comprise one or more adapter regions. A barcode molecule may, for example, comprise an adapter region 5' of the barcode region (a 5' adapter region) and/or an adapter region 3' of the barcode region (a 3' adapter region). The adapter region (and/or one or more portions of the adapter region) may be complementary to, and annealed to, an oligonucleotide (e.g. an adapter region of a barcoded oligonucleotide). Alternatively, the adapter region of a barcode molecule (and/or one or more portions of the adapter region) may be non-complementary to the sequence of the barcoded oligonucleotides. Adapter regions may be used for manipulating, purifying, retrieving, amplifying and/or detecting barcode molecules.

The multimeric barcoding reagent may be arranged such that: each barcode molecule comprises a nucleic acid sequence comprising, in the 5' to 3' direction, an adapter region and a barcode region; the first barcoded oligonucleotide comprises, optionally in the 5' to 3' direction, a barcode region annealed to the barcode region of the first barcode molecule, an adapter region annealed to the adapter region of the first barcode molecule, and a target region capable of annealing to a first target nucleic acid region; and the second barcoded oligonucleotide comprises, optionally in the 5' to 3' direction, a barcode region annealed to the barcode region of the second barcode molecule, an adapter region annealed to the adapter region of the second barcode molecule, and a target region capable of annealing to a second target nucleic acid region.
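Antiparallel pairing explains the order of regions in this arrangement. If a barcode molecule reads 5'-adapter-barcode-3', the strand that anneals to it reads 5'-(complement of barcode)-(complement of adapter)-3'. Appending the target region then leaves it at the free 3' end, where it can act as a primer. The following Python sketch derives such an oligonucleotide, assuming exact complementarity. The function name and sequences are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(_COMPLEMENT)[::-1]


def oligo_for_molecule(adapter_region: str, barcode_region: str,
                       target_region: str) -> str:
    """Barcoded oligo (5'->3') for a barcode molecule laid out 5'-adapter-barcode-3'.

    Antiparallel pairing reverses the order, so the oligo reads
    rc(barcode) | rc(adapter) | target, leaving the target region at the
    free, extendable 3' end.
    """
    molecule = adapter_region + barcode_region  # 5'->3'
    return reverse_complement(molecule) + target_region


adapter, barcode, target = "ACGTTGCA", "GATTACAGGC", "TTTTTTTTTT"
oligo = oligo_for_molecule(adapter, barcode, target)
assert oligo == reverse_complement(barcode) + reverse_complement(adapter) + target
print(oligo)  # GCCTGTAATCTGCAACGTTTTTTTTTTT
```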

The adapter region of each barcode molecule may comprise a constant region. Optionally, all adapter regions of a multimeric barcoding reagent are substantially identical. The adapter region may comprise at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 8, at least 10, at least 15, at least 20, at least 25, at least 50, at least 100 or at least 250 nucleotides. Preferably, the adapter region comprises at least 4 nucleotides. Preferably, each adapter region comprises deoxyribonucleotides; optionally, all nucleotides in an adapter region are deoxyribonucleotides. One or more of the deoxyribonucleotides may be modified deoxyribonucleotides (e.g. deoxyribonucleotides modified with a biotin moiety, or deoxyuridine nucleotides). Each adapter region may comprise one or more universal bases (e.g. inosine), one or more modified nucleotides and/or one or more nucleotide analogues.

A barcoded oligonucleotide may comprise a linker region between the adapter region and the target region. The linker region may comprise one or more contiguous nucleotides that are not annealed to the multimeric barcode molecule and are non-complementary to the target nucleic acid fragments. The linker may comprise 1 to 100, 5 to 75, 10 to 50, 15 to 30 or 20 to 25 non-complementary nucleotides. Preferably, the linker comprises 15 to 30 non-complementary nucleotides. Using such a linker region enhances the efficiency of barcoding reactions performed with the multimeric barcoding reagents.

The barcode molecules of a multimeric barcode molecule may be linked together on a nucleic acid molecule. Such a nucleic acid molecule may provide a backbone to which single-stranded barcoded oligonucleotides can anneal. Alternatively, the barcode molecules of a multimeric barcode molecule may be linked together by any of the other means described herein.
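The following Python sketch models a multimeric barcode molecule whose barcode molecules are joined into a single nucleic acid backbone by a connecting spacer. It records where each barcode molecule lies on the backbone and locates the position at which a given oligonucleotide region would pair perfectly. The spacer sequence, the exact-match rule and the function names are illustrative assumptions.

```python
from typing import List, Tuple

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(_COMPLEMENT)[::-1]


def build_backbone(molecules: List[str], spacer: str) -> Tuple[str, List[Tuple[int, int]]]:
    """Concatenate barcode molecules (5'->3') into one nucleic acid,
    separated by a connecting spacer, and record each molecule's span."""
    parts: List[str] = []
    spans: List[Tuple[int, int]] = []
    pos = 0
    for i, mol in enumerate(molecules):
        if i:
            parts.append(spacer)
            pos += len(spacer)
        spans.append((pos, pos + len(mol)))
        parts.append(mol)
        pos += len(mol)
    return "".join(parts), spans


def annealing_sites(backbone: str, oligo_region: str) -> List[int]:
    """Start positions on the backbone where `oligo_region` pairs perfectly."""
    site = reverse_complement(oligo_region)
    return [i for i in range(len(backbone) - len(site) + 1)
            if backbone.startswith(site, i)]


molecules = ["AACCGGTTAC", "GATTACAGGC", "CTCTAGAGTG"]
backbone, spans = build_backbone(molecules, spacer="TTTTT")
print(spans)  # [(0, 10), (15, 25), (30, 40)]
print(annealing_sites(backbone, reverse_complement(molecules[1])))  # [15]
```

Because each barcode region appears once on the backbone, each single-stranded oligonucleotide has a single perfect pairing site. This is how one backbone can hold a set of distinct barcoded oligonucleotides together.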

The multimeric barcoding reagent may comprise: at least 2, at least 3, at least 4, at least 5, at least 10, at least 20, at least 25, at least 50, at least 75, at least 100, at least 200, at least 500, at least 1000, at least 5000 or at least 10,000 barcode molecules linked together, wherein each barcode molecule is as defined herein; and a barcoded oligonucleotide annealed to each barcode molecule, wherein each barcoded oligonucleotide is as defined herein. Preferably, the multimeric barcoding reagent comprises at least 5 barcode molecules linked together, wherein each barcode molecule is as defined herein; and a barcoded oligonucleotide annealed to each barcode molecule, wherein each barcoded oligonucleotide is as defined herein.

The multimeric barcoding reagent may comprise: at least 2, at least 3, at least 4, at least 5, at least 10, at least 20, at least 25, at least 50, at least 75, at least 100, at least 200, at least 500, at least 1000, at least 5000, at least 10^4, at least 10^5 or at least 10^6 unique or different barcode molecules linked together, wherein each barcode molecule is as defined herein; and a barcoded oligonucleotide annealed to each barcode molecule, wherein each barcoded oligonucleotide is as defined herein. Preferably, the multimeric barcoding reagent comprises at least 5 unique or different barcode molecules linked together, wherein each barcode molecule is as defined herein; and a barcoded oligonucleotide annealed to each barcode molecule, wherein each barcoded oligonucleotide is as defined herein.

The multimeric barcoding reagent may comprise: at least 5, at least 10, at least 20, at least 25, at least 50, at least 75, at least 100, at least 200, at least 500, at least 1000, at least 5000 or at least 10,000 barcode regions, wherein each barcode region is as defined herein; and a barcoded oligonucleotide annealed to each barcode region, wherein each barcoded oligonucleotide is as defined herein. Preferably, the multimeric barcoding reagent comprises at least 5 barcode regions, wherein each barcode region is as defined herein; and a barcoded oligonucleotide annealed to each barcode region, wherein each barcoded oligonucleotide is as defined herein.

The multimeric barcoding reagent may comprise: at least 2, at least 3, at least 4, at least 5, at least 10, at least 20, at least 25, at least 50, at least 75, at least 100, at least 200, at least 500, at least 1000, at least 5000, at least 10^4, at least 10^5 or at least 10^6 unique or different barcode regions, wherein each barcode region is as defined herein; and a barcoded oligonucleotide annealed to each barcode region, wherein each barcoded oligonucleotide is as defined herein. Preferably, the multimeric barcoding reagent comprises at least 5 unique or different barcode regions, wherein each barcode region is as defined herein; and a barcoded oligonucleotide annealed to each barcode region, wherein each barcoded oligonucleotide is as defined herein.

Fig. 1 shows a multimeric barcoding reagent comprising first (D1, E1 and F1) and second (D2, E2 and F2) barcode molecules, each of which comprises a nucleic acid sequence containing a barcode region (E1 and E2). The first and second barcode molecules are linked together, for example by a connecting nucleic acid sequence (S). The multimeric barcoding reagent also comprises first (A1, B1, C1 and G1) and second (A2, B2, C2 and G2) barcoded oligonucleotides. Each of these barcoded oligonucleotides comprises a barcode region (B1 and B2) and a target region (G1 and G2).

The barcode region of each barcoded oligonucleotide may comprise a unique sequence that is not present in the other barcoded oligonucleotides, and may therefore be used to uniquely identify each such barcode molecule. The target regions may be used to anneal the barcoded oligonucleotides to target nucleic acid fragments and then to serve as primers for a primer extension reaction or an amplification reaction (e.g. a polymerase chain reaction).
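The priming step can be pictured with a simplified model of one round of primer extension. In this model, the 3' target region of a barcoded oligonucleotide pairs with a complementary site on a single-stranded template. A polymerase then extends the oligonucleotide's 3' end, copying the template toward the template's 5' end, so the product carries the barcode together with a copy of the target sequence. The following Python sketch assumes a perfect binding site, uses only the first such site, and ignores polymerase processivity and strand displacement. The function name and sequences are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(_COMPLEMENT)[::-1]


def extend_primer(oligo: str, target_len: int, template: str) -> str:
    """One idealized round of primer extension on a single-stranded template.

    The last `target_len` nt of `oligo` (its target region) must pair perfectly
    with `template`; the polymerase then extends the oligo's 3' end, copying
    the template toward the template's 5' end.
    """
    target_region = oligo[-target_len:]
    site = reverse_complement(target_region)  # as it appears in the template
    start = template.find(site)  # first perfect site only
    if start < 0:
        raise ValueError("target region has no perfect binding site")
    # Bases upstream (5') of the site are copied, producing their complement.
    return oligo + reverse_complement(template[:start])


upstream, site, downstream = "ACGTTGCAAG", "GATTACA", "CCC"
template = upstream + site + downstream
barcode = "AAAAACCCCC"
oligo = barcode + reverse_complement(site)
product = extend_primer(oligo, len(site), template)
# The barcoded product is complementary to the template up to the site's 3' end.
assert reverse_complement(product[len(barcode):]) == upstream + site
print(product)
```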

Each barcode molecule may optionally also comprise a 5' adapter region (F1 and F2). The barcoded oligonucleotides may then also comprise a 3' adapter region (C1 and C2) that is complementary to the 5' adapter region of the barcode molecule.

Each barcode molecule may optionally also comprise a 3' region (D1 and D2), which may have the same sequence in every barcode molecule. The barcoded oligonucleotides may then also comprise a 5' region (A1 and A2) that is complementary to the 3' region of the barcode molecule. These 3' regions may be used for manipulating or amplifying nucleic acid sequences, for example sequences generated by labeling nucleic acid targets with the barcoded oligonucleotides. The 3' region may comprise at least 4, at least 5, at least 6, at least 8, at least 10, at least 15, at least 20, at least 25, at least 50, at least 100 or at least 250 nucleotides. Preferably, the 3' region comprises at least 4 nucleotides. Preferably, each 3' region comprises deoxyribonucleotides; optionally, all nucleotides in a 3' region are deoxyribonucleotides. One or more of the deoxyribonucleotides may be modified deoxyribonucleotides (e.g. deoxyribonucleotides modified with a biotin moiety, or deoxyuridine nucleotides). Each 3' region may comprise one or more universal bases (e.g. inosine), one or more modified nucleotides and/or one or more nucleotide analogues.

The present invention provides a library of multimeric barcoding reagents comprising at least 10 multimeric barcoding reagents for labeling target nucleic acids for sequencing, wherein each multimeric barcoding reagent comprises: first and second barcode molecules comprised within a (single) nucleic acid molecule, wherein each barcode molecule comprises a nucleic acid sequence comprising a barcode region; and first and second barcoded oligonucleotides, wherein the first barcoded oligonucleotide comprises, optionally in the 5' to 3' direction, a barcode region that is complementary to, and annealed to, the barcode region of the first barcode molecule and a target region capable of annealing or ligating to a first target nucleic acid fragment, and wherein the second barcoded oligonucleotide comprises, optionally in the 5' to 3' direction, a barcode region that is complementary to, and annealed to, the barcode region of the second barcode molecule and a target region capable of annealing or ligating to a second target nucleic acid fragment. Preferably, the barcode regions of the first and second barcoded oligonucleotides of each multimeric barcoding reagent are different from the barcode regions of the barcoded oligonucleotides of at least 9 other multimeric barcoding reagents in the library.

15. Multimeric barcoding reagents comprising barcoded oligonucleotides annealed to a multimeric hybridization molecule

The present invention provides a multimeric barcoding reagent for labeling a target nucleic acid, wherein the reagent comprises: first and second hybridization molecules linked together (i.e. a multimeric hybridization molecule), wherein each hybridization molecule comprises a nucleic acid sequence comprising a hybridization region; and first and second barcoded oligonucleotides, wherein the first barcoded oligonucleotide comprises, optionally in the 5' to 3' direction, an adapter region annealed to the hybridization region of the first hybridization molecule, a barcode region, and a target region capable of annealing or ligating to a first target nucleic acid fragment, and wherein the second barcoded oligonucleotide comprises, optionally in the 5' to 3' direction, an adapter region annealed to the hybridization region of the second hybridization molecule, a barcode region, and a target region capable of annealing or ligating to a second target nucleic acid fragment.
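In this design, the barcode region is carried by the oligonucleotide itself rather than being templated by the molecule it anneals to. The hybridization molecule only needs to present a hybridization region that the oligonucleotide's adapter region can pair with. The following Python sketch builds the oligonucleotides of one such reagent, assuming that every hybridization molecule in the reagent presents the same hybridization region, so every oligonucleotide shares one adapter region (its reverse complement). The function name, barcodes and placeholder target region are hypothetical.

```python
from typing import List

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(_COMPLEMENT)[::-1]


def build_hybridization_reagent_oligos(hybridization_region: str,
                                       barcodes: List[str],
                                       target_region: str) -> List[str]:
    """Barcoded oligos (5'->3': adapter | barcode | target) for one reagent.

    Assumes each hybridization molecule presents the same hybridization
    region, so every oligo carries the same adapter region (its reverse
    complement); only the barcode region, carried by the oligo, varies.
    """
    adapter = reverse_complement(hybridization_region)
    return [adapter + bc + target_region for bc in barcodes]


oligos = build_hybridization_reagent_oligos(
    hybridization_region="CCTTGGAACC",
    barcodes=["ACGTA", "TGCAT", "GGATC"],
    target_region="ACGTACGTAC",  # placeholder target region
)
for o in oligos:
    print(o)
```

Because the adapter region is shared, a single constant hybridization sequence can hold oligonucleotides that carry any barcodes. This contrasts with the barcode-molecule design, where each oligonucleotide needs its own complementary barcode molecule.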

Optionally, the first and second barcoded oligonucleotides each comprise the adapter region and the target region within a single contiguous sequence, which is complementary to, and annealed to, the hybridization region of the hybridization molecule, and which is also capable of annealing or ligating to a target nucleic acid fragment.

The present invention provides a multimeric barcoding reagent for labeling a target nucleic acid, wherein the reagent comprises: first and second hybridization molecules linked together (i.e. a multimeric hybridization molecule), wherein each hybridization molecule comprises a nucleic acid sequence comprising a hybridization region; and first and second barcoded oligonucleotides, wherein the first barcoded oligonucleotide comprises, optionally in the 5' to 3' direction, a barcode region, an adapter region annealed to the hybridization region of the first hybridization molecule, and a target region capable of annealing or ligating to a first target nucleic acid fragment, and wherein the second barcoded oligonucleotide comprises, optionally in the 5' to 3' direction, a barcode region, an adapter region annealed to the hybridization region of the second hybridization molecule, and a target region capable of annealing or ligating to a second target nucleic acid fragment.

Optionally, the first and second barcoded oligonucleotides each comprise the adapter region and the target region within a single contiguous sequence, which is complementary to, and annealed to, the hybridization region of the hybridization molecule, and which is also capable of annealing or ligating to a target nucleic acid fragment.

The present invention provides a multimeric barcoding reagent for labeling a target nucleic acid, wherein the reagent comprises: first and second hybridization molecules linked together (i.e. a multimeric hybridization molecule), wherein each hybridization molecule comprises a nucleic acid sequence comprising a hybridization region; and first and second barcoded oligonucleotides, wherein the first barcoded oligonucleotide comprises (in the 5'-3' or 3'-5' direction) an adapter region annealed to the hybridization region of the first hybridization molecule, a barcode region, and a target region capable of ligating to a first target nucleic acid fragment, and wherein the second barcoded oligonucleotide comprises (in the 5'-3' or 3'-5' direction) an adapter region annealed to the hybridization region of the second hybridization molecule, a barcode region, and a target region capable of ligating to a second target nucleic acid fragment.

Optionally, the first and second barcoded oligonucleotides each comprise the adapter region and the target region within a single contiguous sequence, which is complementary to, and annealed to, the hybridization region of the hybridization molecule, and which is also capable of ligating to a target nucleic acid fragment.

The present invention provides a multimeric barcoding reagent for labeling a target nucleic acid, wherein the reagent comprises: first and second hybridization molecules linked together (i.e. a multimeric hybridization molecule), wherein each hybridization molecule comprises a nucleic acid sequence comprising a hybridization region; and first and second barcoded oligonucleotides, wherein the first barcoded oligonucleotide comprises (in the 5'-3' or 3'-5' direction) a barcode region, an adapter region annealed to the hybridization region of the first hybridization molecule, and a target region capable of ligating to a first target nucleic acid fragment, and wherein the second barcoded oligonucleotide comprises (in the 5'-3' or 3'-5' direction) a barcode region, an adapter region annealed to the hybridization region of the second hybridization molecule, and a target region capable of ligating to a second target nucleic acid fragment.

Optionally, the first and second barcoded oligonucleotides each comprise the adapter region and the target region within a single contiguous sequence, which is complementary to, and annealed to, the hybridization region of the hybridization molecule, and which is also capable of ligating to a target nucleic acid fragment.

The present invention provides a multimeric barcoding reagent for labeling a target nucleic acid, wherein the reagent comprises: first and second hybridization molecules linked together (i.e. a multimeric hybridization molecule), wherein each hybridization molecule comprises a nucleic acid sequence comprising a hybridization region; and first and second barcoded oligonucleotides, wherein the first barcoded oligonucleotide comprises, in the 5' to 3' direction, an adapter region annealed to the hybridization region of the first hybridization molecule, a barcode region, and a target region capable of annealing to a first target nucleic acid fragment, and wherein the second barcoded oligonucleotide comprises, in the 5' to 3' direction, an adapter region annealed to the hybridization region of the second hybridization molecule, a barcode region, and a target region capable of annealing to a second target nucleic acid fragment.

The present invention provides a multimeric barcoding reagent for labeling a target nucleic acid, wherein the reagent comprises: first and second hybridization molecules linked together (i.e. a multimeric hybridization molecule), wherein each hybridization molecule comprises a nucleic acid sequence comprising a hybridization region; and first and second barcoded oligonucleotides, wherein the first barcoded oligonucleotide comprises, in the 5' to 3' direction, a barcode region, an adapter region annealed to the hybridization region of the first hybridization molecule, and a target region capable of annealing to a first target nucleic acid fragment, and wherein the second barcoded oligonucleotide comprises, in the 5' to 3' direction, a barcode region, an adapter region annealed to the hybridization region of the second hybridization molecule, and a target region capable of annealing to a second target nucleic acid fragment.

Optionally, the first and second barcoded oligonucleotides each comprise the adapter region and the target region within a single contiguous sequence, which is complementary to, and annealed to, the hybridization region of the hybridization molecule, and which is also capable of annealing to a target nucleic acid.

Preferably, the adapter region of the first barcoded oligonucleotide comprises a sequence that is complementary to, and annealed to, the hybridization region of the first hybridization molecule, and the adapter region of the second barcoded oligonucleotide comprises a sequence that is complementary to, and annealed to, the hybridization region of the second hybridization molecule. The complementary sequence of each barcoded oligonucleotide may be at least 5, at least 10, at least 15, at least 20, at least 25, at least 50 or at least 100 contiguous nucleotides.

The hybridization region of each hybridization molecule may comprise a constant region. Preferably, all hybridization regions of a multimeric barcoding reagent are substantially identical. Optionally, all hybridization regions of a library of multimeric barcoding reagents are substantially identical. The hybridization region may comprise at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 8, at least 10, at least 15, at least 20, at least 25, at least 50, at least 100 or at least 250 nucleotides. Preferably, the hybridization region comprises at least 4 nucleotides. Preferably, each hybridization region comprises deoxyribonucleotides; optionally, all nucleotides in a hybridization region are deoxyribonucleotides. One or more of the deoxyribonucleotides may be modified deoxyribonucleotides (e.g. deoxyribonucleotides modified with a biotin moiety, or deoxyuridine nucleotides). Each hybridization region may comprise one or more universal bases (e.g. inosine), one or more modified nucleotides and/or one or more nucleotide analogues.

The target region of a barcoded oligonucleotide may not be annealed to the multimeric hybridization molecule. The target region of a barcoded oligonucleotide may be non-complementary to the multimeric hybridization molecule.

A barcoded oligonucleotide may comprise a linker region between the adapter region and the target region. The linker region may comprise one or more contiguous nucleotides that are not annealed to the multimeric hybridization molecule and are non-complementary to the target nucleic acid fragments. The linker may comprise 1 to 100, 5 to 75, 10 to 50, 15 to 30 or 20 to 25 non-complementary nucleotides. Preferably, the linker comprises 15 to 30 non-complementary nucleotides. Using such a linker region enhances the efficiency of barcoding reactions performed with the multimeric barcoding reagents.

The hybridization molecules may also comprise one or more nucleic acid sequences that are not complementary to the barcoded oligonucleotides. For example, a hybridization molecule may comprise one or more adapter regions. A hybridization molecule may, for example, comprise an adapter region 5' of the hybridization region (a 5' adapter region) and/or an adapter region 3' of the hybridization region (a 3' adapter region). Adapter regions may be used for manipulating, purifying, retrieving, amplifying and/or detecting hybridization molecules.

The adapter region of each hybridization molecule may comprise a constant region. Optionally, all adapter regions of a multimeric hybridization molecule are substantially identical. The adapter region may comprise at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 8, at least 10, at least 15, at least 20, at least 25, at least 50, at least 100 or at least 250 nucleotides. Preferably, the adapter region comprises at least 4 nucleotides. Preferably, each adapter region comprises deoxyribonucleotides; optionally, all nucleotides in an adapter region are deoxyribonucleotides. One or more of the deoxyribonucleotides may be modified deoxyribonucleotides (e.g. deoxyribonucleotides modified with a biotin moiety, or deoxyuridine nucleotides). Each adapter region may comprise one or more universal bases (e.g. inosine), one or more modified nucleotides and/or one or more nucleotide analogues.

Barcoded oligonucleotides may comprise a linker region between the adapter region and the target region. The linker region may comprise one or more contiguous nucleotides that do not anneal to the multimeric hybridisation molecule and are not complementary to the target nucleic acid fragment. The linker may comprise 1 to 100, 5 to 75, 10 to 50, 15 to 30 or 20 to 25 non-complementary nucleotides. Preferably, the linker comprises 15 to 30 non-complementary nucleotides. Using such a linker region enhances the efficiency of barcoding reactions performed with multimeric barcoding reagents.

The invention provides a library of multimeric barcoding reagents for labelling target nucleic acids for sequencing, comprising at least 10 multimeric barcoding reagents, wherein each multimeric barcoding reagent comprises: first and second hybridisation molecules comprised within a (single) nucleic acid molecule, wherein each hybridisation molecule comprises a nucleic acid sequence comprising a hybridisation region; and first and second barcoded oligonucleotides, wherein the first barcoded oligonucleotide comprises, optionally in the 5' to 3' direction, an adapter region that is complementary to and annealed to the hybridisation region of the first hybridisation molecule, a barcode region, and a target region capable of annealing or ligating to a first target nucleic acid fragment, and wherein the second barcoded oligonucleotide comprises, optionally in the 5' to 3' direction, an adapter region that is complementary to and annealed to the hybridisation region of the second hybridisation molecule, a barcode region, and a target region capable of annealing or ligating to a second target nucleic acid fragment.

Preferably, the barcode regions of the first and second barcoded oligonucleotides of each multimeric barcoding reagent are different from the barcode regions of the barcoded oligonucleotides of at least 9 other multimeric barcoding reagents in the library.

The invention provides a library of multimeric barcoding reagents for labelling target nucleic acids for sequencing, comprising at least 10 multimeric barcoding reagents, wherein each multimeric barcoding reagent comprises: first and second hybridisation molecules comprised within a (single) nucleic acid molecule, wherein each hybridisation molecule comprises a nucleic acid sequence comprising a hybridisation region; and first and second barcoded oligonucleotides, wherein the first barcoded oligonucleotide comprises, optionally in the 5' to 3' direction, a barcode region, an adapter region that is complementary to and annealed to the hybridisation region of the first hybridisation molecule, and a target region capable of annealing or ligating to a first target nucleic acid fragment, and wherein the second barcoded oligonucleotide comprises, optionally in the 5' to 3' direction, a barcode region, an adapter region that is complementary to and annealed to the hybridisation region of the second hybridisation molecule, and a target region capable of annealing or ligating to a second target nucleic acid fragment. Preferably, the barcode regions of the first and second barcoded oligonucleotides of each multimeric barcoding reagent are different from the barcode regions of the barcoded oligonucleotides of at least 9 other multimeric barcoding reagents in the library.
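
The two alternative 5'-to-3' layouts and the preferred barcode-diversity condition can be expressed as a small data model. This Python sketch is illustrative only: the class and function names are hypothetical, and reading "different barcode regions" as "sharing no barcode sequence" is an interpretive assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BarcodedOligo:
    """Regions of one barcoded oligonucleotide (illustrative model)."""
    adapter: str  # complementary to a hybridisation region
    barcode: str
    target: str   # anneals or ligates to a target nucleic acid fragment

    def sequence(self, layout: str = "adapter-barcode-target") -> str:
        """Full 5'->3' sequence for either of the two described layouts."""
        layouts = {
            "adapter-barcode-target": (self.adapter, self.barcode, self.target),
            "barcode-adapter-target": (self.barcode, self.adapter, self.target),
        }
        return "".join(layouts[layout])


@dataclass(frozen=True)
class MultimericBarcodingReagent:
    first: BarcodedOligo
    second: BarcodedOligo

    def barcodes(self) -> frozenset:
        return frozenset({self.first.barcode, self.second.barcode})


def meets_diversity(library: list, min_others: int = 9) -> bool:
    """True if each reagent's barcode regions differ from (share none with)
    those of at least `min_others` other reagents in the library."""
    for i, reagent in enumerate(library):
        differing = sum(
            1 for j, other in enumerate(library)
            if j != i and reagent.barcodes().isdisjoint(other.barcodes())
        )
        if differing < min_others:
            return False
    return True
```

Under this reading, a library of ten reagents with all-distinct barcodes satisfies the preferred condition, whereas a library of nine cannot, since each reagent then has only eight others to differ from.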

16. Multimeric barcoding reagents comprising barcoded oligonucleotides linked by a macromolecule

The invention provides a multimeric barcoding reagent for labelling a target nucleic acid, wherein the reagent comprises first and second barcoded oligonucleotides linked together by a macromolecule, and wherein the barcoded oligonucleotides each comprise a barcode region.

The first barcoded oligonucleotide may further comprise a target region capable of annealing or ligating to a first target nucleic acid fragment, and the second barcoded oligonucleotide may further comprise a target region capable of annealing or ligating to a second target nucleic acid fragment.

The first barcoded oligonucleotide may comprise, in the 5' to 3' direction, a barcode region and a target region capable of annealing to a first target nucleic acid fragment, and the second barcoded oligonucleotide may comprise, in the 5' to 3' direction, a barcode region and a target region capable of annealing to a second target nucleic acid fragment.

The barcoded oligonucleotides may further comprise any of the features described herein.

The barcoded oligonucleotides may be linked by the macromolecule by being bound to the macromolecule and/or by being annealed to the macromolecule.

The barcoded oligonucleotides may be linked to the macromolecule directly or indirectly (e.g. via a linker molecule). The barcoded oligonucleotides may be linked by being bound to the macromolecule and/or by being bound or annealed to a linker molecule that is bound to the macromolecule. The barcoded oligonucleotides may be bound to the macromolecule (or linker molecule) by covalent linkage, by non-covalent linkage (e.g. a protein-protein interaction or a streptavidin-biotin bond) or by nucleic acid hybridisation. The linker molecule may be a biopolymer (e.g. a nucleic acid molecule) or a synthetic polymer. The linker molecule may comprise one or more ethylene glycol and/or poly(ethylene glycol) units (e.g. hexaethylene glycol or pentaethylene glycol). The linker molecule may comprise one or more ethyl groups, for example a C3 (three-carbon) spacer, a C6 spacer, a C12 spacer or a C18 spacer.

The macromolecule may be a synthetic polymer (e.g. a dendrimer) or a biopolymer, such as a nucleic acid (e.g. a single-stranded nucleic acid such as single-stranded DNA), a peptide, a polypeptide or a protein (e.g. a multimeric protein).

The dendrimer may comprise at least 2, at least 3, at least 5 or at least 10 generations.

The macromolecule may be a nucleic acid comprising two or more nucleotides, each of which can bind to a barcoded oligonucleotide. Additionally or alternatively, the nucleic acid may comprise two or more regions, each of which can hybridise to a barcoded oligonucleotide.

The nucleic acid may comprise first and second modified nucleotides, wherein each modified nucleotide comprises a binding moiety that can bind to a barcoded oligonucleotide (e.g. a biotin moiety, or an alkyne moiety usable in a click-chemistry reaction). Optionally, the first and second modified nucleotides may be separated by an intervening nucleic acid sequence of at least 1, at least 2, at least 5 or at least 10 nucleotides.

The nucleic acid may comprise a first hybridisation region and a second hybridisation region, wherein each hybridisation region comprises a sequence that is complementary to, and can hybridise to, a sequence of at least one nucleotide in a barcoded oligonucleotide. The complementary sequence may be at least 5, at least 10, at least 15, at least 20, at least 25 or at least 50 contiguous nucleotides. Optionally, the first and second hybridisation regions may be separated by an intervening nucleic acid sequence of at least 1, at least 2, at least 5 or at least 10 nucleotides.
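
Whether a hybridisation region meets an "at least N contiguous complementary nucleotides" threshold can be checked with a longest-common-substring search against the reverse complement of the barcoded oligonucleotide. This Python sketch assumes perfect Watson-Crick pairing and ignores mismatches, melting temperature and secondary structure; the function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def longest_complementary_run(region: str, oligo: str) -> int:
    """Length of the longest contiguous stretch of `region` that can pair,
    antiparallel and without mismatches, with a stretch of `oligo`.
    Computed as the longest common substring of `region` and the reverse
    complement of `oligo` by dynamic programming."""
    rc = reverse_complement(oligo)
    best = 0
    prev = [0] * (len(rc) + 1)
    for a in region:
        cur = [0] * (len(rc) + 1)
        for j, b in enumerate(rc, start=1):
            if a == b:
                cur[j] = prev[j - 1] + 1  # extend the diagonal match
                best = max(best, cur[j])
        prev = cur
    return best


def hybridises(region: str, oligo: str, min_run: int = 5) -> bool:
    """Apply the smallest threshold listed in the text (5 contiguous nt)."""
    return longest_complementary_run(region, oligo) >= min_run


print(longest_complementary_run("ACGTACGGA", "CCGTA"))  # 5
```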

The macromolecule may be a protein, such as a multimeric protein, e.g. a homomultimeric or heteromultimeric protein. For example, the protein may comprise streptavidin, such as tetrameric streptavidin.

Also provided is a library of multimeric barcoding reagents comprising barcoded oligonucleotides linked by a macromolecule. Such a library may be based on the general features of the libraries of multimeric barcoding reagents described herein. In the library, each multimeric barcoding reagent may comprise a different macromolecule.

17. Multimeric barcoding reagents comprising barcoded oligonucleotides linked by a solid support or semi-solid support

The invention provides a multimeric barcoding reagent for labelling a target nucleic acid, wherein the reagent comprises first and second barcoded oligonucleotides linked together by a solid support or semi-solid support, and wherein the barcoded oligonucleotides each comprise a barcode region.

The first barcoded oligonucleotide may further comprise a target region capable of annealing or ligating to a first target nucleic acid fragment, and the second barcoded oligonucleotide may further comprise a target region capable of annealing or ligating to a second target nucleic acid fragment.

The first barcoded oligonucleotide may comprise, in the 5' to 3' direction, a barcode region and a target region capable of annealing to a first target nucleic acid fragment, and the second barcoded oligonucleotide may comprise, in the 5' to 3' direction, a barcode region and a target region capable of annealing to a second target nucleic acid fragment.

The barcoded oligonucleotides may further comprise any of the features described herein.

The barcoded oligonucleotides may be linked by a solid support or semi-solid support. The barcoded oligonucleotides may be linked to the support directly or indirectly (e.g. via a linker molecule). The barcoded oligonucleotides may be linked by being bound to the support and/or by being bound or annealed to a linker molecule that is bound to the support. The barcoded oligonucleotides may be bound to the support (or linker molecule) by covalent linkage, by non-covalent linkage (e.g. a protein-protein interaction or a streptavidin-biotin bond) or by nucleic acid hybridisation. The linker molecule may be a biopolymer (e.g. a nucleic acid molecule) or a synthetic polymer. The linker molecule may comprise one or more ethylene glycol and/or poly(ethylene glycol) units (e.g. hexaethylene glycol or pentaethylene glycol). The linker molecule may comprise one or more ethyl groups, for example a C3 (three-carbon) spacer, a C6 spacer, a C12 spacer or a C18 spacer.

The support may comprise a planar surface. The support may be, for example, a slide, such as a glass slide. The slide may be a flow cell for sequencing. If the support is a slide, the first and second barcoded oligonucleotides may be immobilised in a discrete region of the slide. Optionally, the barcoded oligonucleotides of each multimeric barcoding reagent in the library are in a different discrete region of the slide from the barcoded oligonucleotides of the other multimeric barcoding reagents in the library. The support may be a plate comprising wells, optionally wherein the first and second barcoded oligonucleotides are in the same well. Optionally, the barcoded oligonucleotides of each multimeric barcoding reagent in the library are in a different well of the plate from the barcoded oligonucleotides of the other multimeric barcoding reagents in the library.

Preferably, the support is a bead (e.g. a gel bead). The bead may be an agarose bead, a silica bead, a polystyrene bead, a gel bead (such as those available from 10x Genomics), an antibody-conjugated bead, an oligo-dT-conjugated bead, a streptavidin bead or a magnetic bead (e.g. a superparamagnetic bead). The bead may be of any size and/or molecular structure. For example, the bead may be 10 nanometres to 100 microns in diameter, 100 nanometres to 10 microns in diameter, or 1 micron to 5 microns in diameter. Optionally, the bead is about 10 nanometres, about 100 nanometres, about 1 micron, about 10 microns or about 100 microns in diameter. The bead may be solid or, alternatively, hollow, partially hollow or porous. Beads of certain sizes may be optimal for certain barcoding methods. For example, beads smaller than 5.0 microns, or smaller than 1.0 micron, may be most useful for barcoding nucleic acid targets within individual cells. Preferably, the barcoded oligonucleotides of each multimeric barcoding reagent in the library are linked together on a different bead from the barcoded oligonucleotides of the other multimeric barcoding reagents in the library.

The support may be functionalised to enable attachment of two or more barcoded oligonucleotides. Such functionalisation may be achieved by adding chemical moieties (e.g. carboxylate groups, alkynes, azides, acrylate groups, amino groups, sulphate groups or succinimide groups) and/or protein-based moieties (e.g. streptavidin, avidin or protein G) to the support. The barcoded oligonucleotides may be attached to these moieties directly or indirectly (e.g. via a linker molecule).

A functionalised support (e.g. a bead) may be contacted with a solution of barcoded oligonucleotides under conditions that promote attachment of two or more barcoded oligonucleotides to each bead in the solution, thereby generating multimeric barcoding reagents.

Also provided is a library of multimeric barcoding reagents comprising barcoded oligonucleotides linked by a support. Such a library may be based on the general features of the libraries of multimeric barcoding reagents described herein. In the library, each multimeric barcoding reagent may comprise a different support (e.g. a differently labelled bead). In the library of multimeric barcoding reagents, the barcoded oligonucleotides of each multimeric barcoding reagent may be linked together on a different support from the barcoded oligonucleotides of the other multimeric barcoding reagents in the library.

18. Multimeric barcoding reagents comprising barcoded oligonucleotides linked together by inclusion in a lipid carrier

The invention provides a multimeric barcoding reagent for labelling a target nucleic acid, wherein the reagent comprises first and second barcoded oligonucleotides and a lipid carrier, wherein the first and second barcoded oligonucleotides are linked together by being contained within the lipid carrier, and wherein the barcoded oligonucleotides each comprise a barcode region.

The first barcoded oligonucleotide may further comprise a target region capable of annealing or ligating to a first target nucleic acid fragment, and the second barcoded oligonucleotide may further comprise a target region capable of annealing or ligating to a second target nucleic acid fragment.

The first barcoded oligonucleotide may comprise, in the 5' to 3' direction, a barcode region and a target region capable of annealing to a first target nucleic acid fragment, and the second barcoded oligonucleotide may comprise, in the 5' to 3' direction, a barcode region and a target region capable of annealing to a second target nucleic acid fragment.

The barcoded oligonucleotides may further comprise any of the features described herein.

The invention provides a library of multimeric barcoding reagents comprising first and second multimeric barcoding reagents as defined herein, wherein the barcoded oligonucleotides of the first multimeric barcoding reagent are contained within a first lipid carrier, wherein the barcoded oligonucleotides of the second multimeric barcoding reagent are contained within a second lipid carrier, and wherein the barcode regions of the barcoded oligonucleotides of the first multimeric barcoding reagent are different from the barcode regions of the barcoded oligonucleotides of the second multimeric barcoding reagent.

A library of multimeric barcoding reagents may comprise at least 5, at least 10, at least 20, at least 25, at least 50, at least 75, at least 100, at least 250, at least 500, at least 10³, at least 10⁴, at least 10⁵, at least 10⁶, at least 10⁷, at least 10⁸ or at least 10⁹ multimeric barcoding reagents as defined herein. Preferably, the barcode regions of the first and second barcoded oligonucleotides of each multimeric barcoding reagent are different from the barcode regions of the barcoded oligonucleotides of at least 9 other multimeric barcoding reagents in the library.
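
The upper end of these library sizes has a direct consequence for barcode design. Assuming barcode regions are drawn from A/C/G/T without further constraints (the source does not specify barcode lengths here, and practical designs usually add length for error tolerance), the shortest barcode region able to distinguish N barcodes is the smallest n with 4ⁿ ≥ N. A minimal Python sketch, with a hypothetical function name:

```python
def min_barcode_length(n_barcodes: int, alphabet_size: int = 4) -> int:
    """Smallest length n with alphabet_size**n >= n_barcodes, i.e. the
    shortest barcode region that could give n_barcodes distinct sequences."""
    if n_barcodes < 1:
        raise ValueError("n_barcodes must be at least 1")
    n = 1
    while alphabet_size ** n < n_barcodes:
        n += 1
    return n


for size in (10, 10**3, 10**6, 10**9):
    print(size, min_barcode_length(size))
```

For example, 10⁹ reagents need barcode regions of at least 15 nucleotides under this assumption, because 4¹⁴ ≈ 2.7 × 10⁸ falls short while 4¹⁵ ≈ 1.07 × 10⁹ suffices.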

The barcoded oligonucleotides of each multimeric barcoding reagent are contained within a different lipid carrier.

The lipid carrier may be a liposome or a micelle. The lipid carrier may be a phospholipid carrier. The lipid carrier may comprise one or more amphipathic molecules. The lipid carrier may comprise one or more phospholipids. The phospholipid may be phosphatidylcholine. The lipid carrier may comprise one or more of the following components: phosphatidylethanolamine, phosphatidylserine, cholesterol, cardiolipin, dicetyl phosphate, stearylamine, phosphatidylglycerol, dipalmitoylphosphatidylcholine, distearoylphosphatidylcholine, and/or any related and/or derived molecules. Optionally, the lipid carrier may comprise any combination of two or more of the above components, with or without other components.

The lipid carrier (e.g. a liposome or micelle) may be unilamellar or multilamellar. A library of multimeric barcoding reagents may comprise both unilamellar and multilamellar lipid carriers. The lipid carrier may comprise a copolymer, such as a block copolymer.

A lipid carrier may contain at least 2, at least 3, at least 5, at least 10, at least 50, at least 100, at least 500, at least 1000, at least 10,000 or at least 100,000 barcoded oligonucleotides, or any greater number of barcoded oligonucleotides.

Any lipid carrier (e.g. a liposome or micelle, and/or a liposomal or micellar reagent) may be complexed with, on average, one, fewer than one, or more than one multimeric barcoding reagent, to form a library of such multimeric barcoding reagents.
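
Average loading per carrier is commonly reasoned about with a Poisson model, as for random encapsulation in droplets. The source does not state that loading is Poisson-distributed; this Python sketch only shows what that assumption would imply for mean loadings below, at and above one reagent per carrier. The function names are hypothetical.

```python
import math


def poisson_pmf(k: int, mean: float) -> float:
    """P(a carrier holds exactly k reagents) under Poisson loading."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


def loading_summary(mean: float) -> dict:
    """Fractions of empty, singly loaded and multiply loaded carriers, and
    the fraction of occupied carriers that hold exactly one reagent."""
    p0 = poisson_pmf(0, mean)
    p1 = poisson_pmf(1, mean)
    return {
        "empty": p0,
        "single": p1,
        "multiple": 1.0 - p0 - p1,
        "single_among_occupied": p1 / (1.0 - p0) if p0 < 1.0 else 0.0,
    }


for mean in (0.1, 1.0, 3.0):
    s = loading_summary(mean)
    print(f"mean={mean}: empty={s['empty']:.3f} single={s['single']:.3f} "
          f"multiple={s['multiple']:.3f}")
```

Under this model, a lower mean loading makes it more likely that an occupied carrier holds a single reagent, at the cost of more empty carriers.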

The invention provides a library of multimeric barcoding reagents comprising at least 10 multimeric barcoding reagents as defined herein, wherein each multimeric barcoding reagent comprises first and second barcoded oligonucleotides contained within a different lipid carrier, and wherein the barcode regions of the first and second barcoded oligonucleotides of each multimeric barcoding reagent are different from the barcode regions of the barcoded oligonucleotides of at least 9 other multimeric barcoding reagents in the library.

A method of preparing a multimeric barcoding reagent comprises loading barcoded oligonucleotides and/or multimeric barcoding reagents into lipid carriers (e.g. liposomes or micelles). The method may comprise passive, active and/or remote loading steps. Pre-formed lipid carriers (e.g. liposomes and/or micelles) may be loaded by contacting them with a solution of barcoded oligonucleotides and/or multimeric barcoding reagents. Lipid carriers may also be loaded by contacting them with a solution of barcoded oligonucleotides and/or multimeric barcoding reagents before and/or during their formation or synthesis. The method may comprise passive encapsulation and/or entrapment of barcoded oligonucleotides and/or multimeric barcoding reagents within lipid carriers.

Lipid carriers (e.g. liposomes and/or micelles) may be prepared by sonication-based methods, French press-based methods, reverse-phase methods, solvent evaporation methods, extrusion-based methods, mechanical mixing-based methods, freeze/thaw-based methods, dehydration/rehydration-based methods, and/or any combination thereof.

Lipid carriers (e.g. liposomes and/or micelles) may be stabilised and/or stored before use by known methods.

Any multimeric barcoding reagent or kit described herein may comprise a lipid carrier.

19. Kits comprising multimeric barcoding reagents and adapter oligonucleotides

The invention also provides kits comprising one or more of the components defined herein. The invention further provides kits particularly suitable for performing any of the methods defined herein.

The invention also provides a kit for labelling a target nucleic acid, wherein the kit comprises: (a) a multimeric barcoding reagent comprising (i) first and second barcode molecules linked together (i.e. a multimeric barcode molecule), wherein each barcode molecule comprises a nucleic acid sequence comprising, optionally in the 5' to 3' direction, an adapter region and a barcode region, and (ii) first and second barcoded oligonucleotides, wherein the first barcoded oligonucleotide comprises a barcode region annealed to the barcode region of the first barcode molecule, and wherein the second barcoded oligonucleotide comprises a barcode region annealed to the barcode region of the second barcode molecule; and (b) first and second adapter oligonucleotides, wherein the first adapter oligonucleotide comprises, optionally in the 5' to 3' direction, an adapter region capable of annealing to the adapter region of the first barcode molecule and a target region capable of annealing or ligating to a first target nucleic acid fragment, and wherein the second adapter oligonucleotide comprises, optionally in the 5' to 3' direction, an adapter region capable of annealing to the adapter region of the second barcode molecule and a target region capable of annealing or ligating to a second target nucleic acid fragment.
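
The pairing relationships in this kit can be made explicit with a small model. In this Python sketch the names are hypothetical, and perfect complementarity is assumed as an idealisation of "anneals": each barcode molecule is checked against the barcoded oligonucleotide and the adapter oligonucleotide intended to anneal to it.

```python
from dataclasses import dataclass

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(COMPLEMENT)[::-1]


@dataclass(frozen=True)
class BarcodeMolecule:
    adapter_region: str  # 5' of the barcode region (optional orientation)
    barcode_region: str


@dataclass(frozen=True)
class KitUnit:
    """One barcode molecule with its barcoded and adapter oligonucleotides."""
    molecule: BarcodeMolecule
    oligo_barcode_region: str  # barcode region of the barcoded oligonucleotide
    adapter_oligo_region: str  # adapter region of the adapter oligonucleotide

    def is_compatible(self) -> bool:
        """Each oligonucleotide region is the exact reverse complement of the
        barcode-molecule region it is meant to anneal to."""
        return (
            self.oligo_barcode_region
            == reverse_complement(self.molecule.barcode_region)
            and self.adapter_oligo_region
            == reverse_complement(self.molecule.adapter_region)
        )


mol = BarcodeMolecule(adapter_region="ACGTTG", barcode_region="GATTACA")
unit = KitUnit(mol, reverse_complement("GATTACA"), reverse_complement("ACGTTG"))
print(unit.is_compatible())  # True
```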

The invention also provides a kit for labelling a target nucleic acid, wherein the kit comprises: (a) a multimeric barcoding reagent comprising (i) first and second barcode molecules linked together (i.e. a multimeric barcode molecule), wherein each barcode molecule comprises a nucleic acid sequence comprising an adapter region and a barcode region, and (ii) first and second barcoded oligonucleotides, wherein the first barcoded oligonucleotide comprises a barcode region annealed to the barcode region of the first barcode molecule, and wherein the second barcoded oligonucleotide comprises a barcode region annealed to the barcode region of the second barcode molecule; and (b) first and second adapter oligonucleotides, wherein the first adapter oligonucleotide comprises an adapter region capable of annealing to the adapter region of the first barcode molecule and a target region capable of ligating to a first target nucleic acid fragment, and wherein the second adapter oligonucleotide comprises an adapter region capable of annealing to the adapter region of the second barcode molecule and a target region capable of ligating to a second target nucleic acid fragment.

The invention also provides a kit for labelling a target nucleic acid, wherein the kit comprises: (a) a multimeric barcoding reagent comprising (i) first and second barcode molecules linked together (i.e. a multimeric barcode molecule), wherein each barcode molecule comprises a nucleic acid sequence comprising, in the 5' to 3' direction, an adapter region and a barcode region, and (ii) first and second barcoded oligonucleotides, wherein the first barcoded oligonucleotide comprises a barcode region annealed to the barcode region of the first barcode molecule, and wherein the second barcoded oligonucleotide comprises a barcode region annealed to the barcode region of the second barcode molecule; and (b) first and second adapter oligonucleotides, wherein the first adapter oligonucleotide comprises, in the 5' to 3' direction, an adapter region capable of annealing to the adapter region of the first barcode molecule and a target region capable of annealing to a first target nucleic acid fragment, and wherein the second adapter oligonucleotide comprises, in the 5' to 3' direction, an adapter region capable of annealing to the adapter region of the second barcode molecule and a target region capable of annealing to a second target nucleic acid fragment.

The invention also provides a kit for labelling a target nucleic acid, wherein the kit comprises: (a) a multimeric barcoding reagent comprising (i) first and second barcode molecules linked together (i.e. a multimeric barcode molecule), wherein each barcode molecule comprises a nucleic acid sequence comprising, optionally in the 5' to 3' direction, an adapter region and a barcode region, and (ii) first and second barcoded oligonucleotides, wherein the first barcoded oligonucleotide comprises a barcode region annealed to the barcode region of the first barcode molecule, and wherein the second barcoded oligonucleotide comprises a barcode region annealed to the barcode region of the second barcode molecule; and (b) first and second adapter oligonucleotides, wherein the first adapter oligonucleotide comprises an adapter region that is capable of annealing to the adapter region of the first barcode molecule and capable of ligating to a first target nucleic acid fragment, and wherein the second adapter oligonucleotide comprises an adapter region that is capable of annealing to the adapter region of the second barcode molecule and capable of ligating to a second target nucleic acid fragment.

Each adapter oligonucleotide may consist essentially of, or consist of, an adapter region. Each adapter oligonucleotide may lack a target region.

Preferably, the adapter region of the first adapter oligonucleotide comprises a sequence that is complementary to, and can anneal to, the adapter region of the first barcode molecule, and the adapter region of the second adapter oligonucleotide comprises a sequence that is complementary to, and can anneal to, the adapter region of the second barcode molecule. The complementary sequence of each adapter oligonucleotide may be at least 5, at least 10, at least 15, at least 20, at least 25, at least 50 or at least 100 contiguous nucleotides.

The target region of an adapter oligonucleotide may not anneal to the multimeric barcode molecule. The target region of an adapter oligonucleotide may not be complementary to the multimeric barcode molecule.

The target region of each adapter oligonucleotide may comprise a different sequence. Each target region may comprise a sequence capable of annealing to only a single target nucleic acid fragment in a nucleic acid sample. Each target region may comprise one or more random sequences or one or more degenerate sequences, enabling the target region to anneal to more than one target nucleic acid fragment. Each target region may comprise at least 5, at least 10, at least 15, at least 20, at least 25, at least 50 or at least 100 nucleotides. Preferably, each target region comprises at least 5 nucleotides. Each target region may comprise 5 to 100, 5 to 10, 10 to 20, 20 to 30, 30 to 50, 50 to 100, 10 to 90, 20 to 80, 30 to 70 or 50 to 60 nucleotides. Preferably, each target region comprises 30 to 70 nucleotides. Preferably, each target region comprises deoxyribonucleotides; optionally, all of the nucleotides in the target region are deoxyribonucleotides. One or more of the deoxyribonucleotides may be modified deoxyribonucleotides (e.g. deoxyribonucleotides modified with a biotin moiety, or deoxyuracil nucleotides). Each target region may comprise one or more universal bases (e.g. inosine), one or more modified nucleotides and/or one or more nucleotide analogues.
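
Random or degenerate target regions are conventionally written with IUPAC ambiguity codes (for example N for any base), so a single written target region can stand for many concrete sequences. The source does not specify a notation; this Python sketch assumes IUPAC codes and shows how many sequences such a target region represents. The function names are hypothetical.

```python
from itertools import product

# Standard IUPAC nucleotide codes for degenerate positions.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def degeneracy(pattern: str) -> int:
    """Number of distinct sequences encoded by a degenerate pattern."""
    count = 1
    for code in pattern.upper():
        count *= len(IUPAC[code])
    return count


def expand(pattern: str):
    """Yield every concrete sequence encoded by a degenerate pattern."""
    for bases in product(*(IUPAC[code] for code in pattern.upper())):
        yield "".join(bases)


print(degeneracy("N" * 6))   # 4096 possible hexamer target regions
print(sorted(expand("AR")))  # ['AA', 'AG']
```

A fully random hexamer (NNNNNN) therefore covers 4⁶ = 4,096 sequences; expanding long degenerate regions explicitly quickly becomes impractical, which is why counting is kept separate from enumeration.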

The target region may be used to anneal the adapter oligonucleotide to a fragment of the target nucleic acid, and then to act as a primer for a primer-extension reaction or an amplification reaction (e.g. a polymerase chain reaction). Alternatively, the target region may be used to ligate the adapter oligonucleotide to a fragment of the target nucleic acid. The target region may be located at the 5' end of the adapter oligonucleotide. Such a target region may be phosphorylated, which enables the 5' end of the target region to be ligated to the 3' end of a target nucleic acid fragment.
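
The two uses of the target region can be pictured as operations on 5'-to-3' strings. This Python sketch is a simplification under stated assumptions: ligation is modelled as concatenation gated on the 5' phosphate; for priming, the target region is assumed to sit at the 3' end of the oligonucleotide and to have a perfectly complementary site on the template (the first such site is used); and the function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def ligate(fragment: str, adapter_oligo: str, five_prime_phosphate: bool) -> str:
    """Join the 3' end of a target fragment to the 5' end of an adapter
    oligonucleotide whose target region sits at its 5' end (both 5'->3')."""
    if not five_prime_phosphate:
        raise ValueError("ligation needs a 5'-phosphorylated target region")
    return fragment + adapter_oligo


def extend_primer(primer: str, template: str, target_len: int) -> str:
    """Anneal the 3'-terminal target region of `primer` to the first
    complementary site in `template` and extend it, copying the template
    towards the template's 5' end."""
    target = primer[-target_len:]
    site = template.find(reverse_complement(target))
    if site < 0:
        raise ValueError("target region has no complementary site in template")
    return primer + reverse_complement(template[:site])


print(ligate("ACGT", "TTGG", five_prime_phosphate=True))  # ACGTTTGG
print(extend_primer("ACGTTT", "CCGAAATT", target_len=3))  # ACGTTTCGG
```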

Adapter oligonucleotides may comprise a linker region between the adapter region and the target region. The linker region may comprise one or more contiguous nucleotides that do not anneal to the first and second barcode molecules (i.e. the multimeric barcode molecule) and are not complementary to the target nucleic acid fragment. The linker may comprise 1 to 100, 5 to 75, 10 to 50, 15 to 30 or 20 to 25 non-complementary nucleotides. Preferably, the linker comprises 15 to 30 non-complementary nucleotides. Using such a linker region enhances the efficiency of barcoding reactions performed with the kits described herein.

Each component of the kit may take any of the forms defined herein.

The multimeric barcoding reagent and the adapter oligonucleotides may be provided in the kit as physically separate components.

The kit may comprise: (a) a multimeric barcoding reagent comprising at least 5, at least 10, at least 20, at least 25, at least 50, at least 75 or at least 100 barcode molecules linked together, wherein each barcode molecule is as defined herein; and (b) an adapter oligonucleotide capable of annealing to each barcode molecule, wherein each adapter oligonucleotide is as defined herein.

Fig. 2 shows a kit for labelling target nucleic acids comprising a multimeric barcoding reagent and adapter oligonucleotides. More specifically, the kit comprises first (D1, E1 and F1) and second (D2, E2 and F2) barcode molecules, each of which incorporates a barcode region (E1 and E2) and a 5' adapter region (F1 and F2). In this embodiment, the first and second barcode molecules are linked together by a connecting nucleic acid sequence (S).

The kit also comprises first (A1 and B1) and second (A2 and B2) barcoded oligonucleotides, each comprising a barcode region (B1 and B2) and a 5' region (A1 and A2). The 5' region of each barcoded oligonucleotide is complementary to, and can therefore anneal to, the 3' region (D1 and D2) of a barcode molecule. The barcode regions (B1 and B2) are complementary to, and can therefore anneal to, the barcode regions (E1 and E2) of the barcode molecules.

The kit also comprises first (C1 and G1) and second (C2 and G2) adapter oligonucleotides, wherein each adapter oligonucleotide comprises an adapter region (C1 and C2) that is complementary to, and can therefore anneal to, the 5' adapter region (F1 and F2) of a barcode molecule. These adapter oligonucleotides may be synthesised with a 5'-terminal phosphate group. Each adapter oligonucleotide also comprises a target region (G1 and G2), which can be used to anneal the barcoded-adapter oligonucleotides (A1, B1, C1 and G1; and A2, B2, C2 and G2) to a target nucleic acid, and then to act as a primer for a primer-extension reaction or a polymerase chain reaction.
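
Read together, the Fig. 2 regions fit as follows: A anneals to D, B to E and C to F, so the joined barcoded-adapter oligonucleotide A-B-C-G is the reverse complement of the barcode molecule followed by the target region G. This Python sketch assumes the barcode molecule is a contiguous 5'-F-E-D-3' strand and that the adjacent barcoded and adapter oligonucleotides are joined into one strand (the 5'-terminal phosphate is consistent with ligation); the sequences and function name are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def barcoded_adapter_oligo(f: str, e: str, d: str, g: str) -> str:
    """Given a barcode molecule 5'-F-E-D-3' (5' adapter region F, barcode
    region E, 3' region D) and a target region G, return the joined strand
    5'-A-B-C-G-3' whose A, B and C regions anneal to D, E and F."""
    a = reverse_complement(d)  # 5' region of the barcoded oligonucleotide
    b = reverse_complement(e)  # barcode region of the barcoded oligonucleotide
    c = reverse_complement(f)  # adapter region of the adapter oligonucleotide
    return a + b + c + g


F1, E1, D1, G1 = "ACGTTG", "GATTACA", "CCAGT", "TTTTTTTT"
strand = barcoded_adapter_oligo(F1, E1, D1, G1)
# The A-B-C part is the reverse complement of the whole barcode molecule.
assert strand == reverse_complement(F1 + E1 + D1) + G1
print(strand)
```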

The kit may comprise a library of two or more multimeric barcoding reagents, wherein each multimeric barcoding reagent is as defined herein, and adapter oligonucleotides for each multimeric barcoding reagent, wherein each adapter oligonucleotide is as defined herein. The barcode regions of the first and second barcoded oligonucleotides of the first multimeric barcoding reagent are different from the barcode regions of the first and second barcoded oligonucleotides of the second multimeric barcoding reagent.

The kit may comprise a library containing at least 5, at least 10, at least 20, at least 25, at least 50, at least 75, at least 100, at least 250, at least 500, at least 10³, at least 10⁴, at least 10⁵, at least 10⁶, at least 10⁷, at least 10⁸ or at least 10⁹ multimeric barcoding reagents as defined herein. Preferably, the kit comprises a library containing at least 10 multimeric barcoding reagents as defined herein. The kit may also comprise adapter oligonucleotides for each multimeric barcoding reagent, wherein each adapter oligonucleotide may take the form of any adapter oligonucleotide defined herein. Preferably, the barcode regions of the first and second barcoded oligonucleotides of each multimeric barcoding reagent are different from the barcode regions of the barcoded oligonucleotides of at least 9 other multimeric barcoding reagents in the library.

The barcode regions of the first and second barcoded oligonucleotides of each multimeric barcoding reagent may be different from the barcode regions of the barcoded oligonucleotides of at least 4, at least 9, at least 19, at least 24, at least 49, at least 74, at least 99, at least 249, at least 499, at least 999 (i.e. 10³ − 1), at least 10⁴ − 1, at least 10⁵ − 1, at least 10⁶ − 1, at least 10⁷ − 1, at least 10⁸ − 1 or at least 10⁹ − 1 other multimeric barcoding reagents in the library. The barcode regions of the first and second barcoded oligonucleotides of each multimeric barcoding reagent may be different from the barcode regions of the barcoded oligonucleotides of all other multimeric barcoding reagents in the library. Preferably, the barcode regions of the first and second barcoded oligonucleotides of each multimeric barcoding reagent are different from the barcode regions of the barcoded oligonucleotides of at least 9 other multimeric barcoding reagents in the library.

The barcode regions of the barcoded oligonucleotides of each multimeric barcoding reagent may be different from the barcode regions of the barcoded oligonucleotides of at least 4, at least 9, at least 19, at least 24, at least 49, at least 74, at least 99, at least 249, at least 499, at least 999 (i.e. 10^3 - 1), at least 10^4 - 1, at least 10^5 - 1, at least 10^6 - 1, at least 10^7 - 1, at least 10^8 - 1 or at least 10^9 - 1 other multimeric barcoding reagents in the library. The barcode regions of the barcoded oligonucleotides of each multimeric barcoding reagent may be different from the barcode regions of the barcoded oligonucleotides of all other multimeric barcoding reagents in the library. Preferably, the barcode regions of the barcoded oligonucleotides of each multimeric barcoding reagent are different from the barcode regions of the barcoded oligonucleotides of at least 9 other multimeric barcoding reagents in the library.
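The "different from at least 9 other reagents" condition can be checked computationally when designing a library. The following is a minimal sketch, assuming (hypothetically) that each multimeric barcoding reagent is represented simply as a list of the barcode-region sequences of its barcoded oligonucleotides; the reagent names and 4-nucleotide barcodes are illustrative only.

```python
import itertools
from typing import Dict, List


def count_distinct_partners(library: Dict[str, List[str]]) -> Dict[str, int]:
    """For each reagent, count the other reagents that share none of its barcode regions.

    `library` maps a hypothetical reagent name to the barcode-region sequences
    of that reagent's barcoded oligonucleotides.
    """
    counts = {}
    for name, regions in library.items():
        own = set(regions)
        counts[name] = sum(
            1
            for other, other_regions in library.items()
            if other != name and own.isdisjoint(other_regions)
        )
    return counts


def meets_minimum(library: Dict[str, List[str]], minimum: int = 9) -> bool:
    """True if every reagent differs from at least `minimum` other reagents."""
    return all(n >= minimum for n in count_distinct_partners(library).values())


# Illustrative library: 10 reagents, each carrying two unique 4-nt barcode regions.
kmers = ["".join(p) for p in itertools.product("ACGT", repeat=4)]
library = {f"reagent_{i}": kmers[2 * i : 2 * i + 2] for i in range(10)}
print(meets_minimum(library))  # True: each reagent differs from the other 9
```

Representing each reagent as a set of barcode regions keeps the check independent of how the barcode molecules are physically linked.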

The invention provides a kit for labelling target nucleic acids for sequencing, wherein the kit comprises: (a) a library of multimeric barcoding reagents comprising at least 10 multimeric barcoding reagents, wherein each multimeric barcoding reagent comprises: (i) first and second barcode molecules comprised within a (single) nucleic acid molecule, wherein each barcode molecule comprises a nucleic acid sequence comprising, optionally in the 5' to 3' direction, an adapter region and a barcode region, and (ii) first and second barcoded oligonucleotides, wherein the first barcoded oligonucleotide comprises a barcode region that is complementary to and annealed to the barcode region of the first barcode molecule, and wherein the second barcoded oligonucleotide comprises a barcode region that is complementary to and annealed to the barcode region of the second barcode molecule; and (b) first and second adapter oligonucleotides for each multimeric barcoding reagent, wherein the first adapter oligonucleotide comprises, optionally in the 5' to 3' direction, an adapter region capable of annealing to the adapter region of the first barcode molecule and a target region capable of annealing or ligating to a first target nucleic acid fragment, and wherein the second adapter oligonucleotide comprises, optionally in the 5' to 3' direction, an adapter region capable of annealing to the adapter region of the second barcode molecule and a target region capable of annealing or ligating to a second target nucleic acid fragment.

20. Kits comprising multimeric barcoding reagents, adapter oligonucleotides and extension primers

The invention also provides a kit for labelling target nucleic acids for sequencing, wherein the kit comprises: (a) a multimeric barcode molecule comprising first and second barcode molecules linked together, wherein each barcode molecule comprises a nucleic acid sequence comprising, optionally in the 5' to 3' direction, an adapter region, a barcode region and a priming region; (b) first and second extension primers for the multimeric barcode molecule, wherein the first extension primer comprises a sequence capable of annealing to the priming region of the first barcode molecule, and wherein the second extension primer comprises a sequence capable of annealing to the priming region of the second barcode molecule; and (c) first and second adapter oligonucleotides for the multimeric barcode molecule, wherein the first adapter oligonucleotide comprises, optionally in the 5' to 3' direction, an adapter region capable of annealing to the adapter region of the first barcode molecule and a target region capable of annealing or ligating to a first target nucleic acid fragment, and wherein the second adapter oligonucleotide comprises, optionally in the 5' to 3' direction, an adapter region capable of annealing to the adapter region of the second barcode molecule and a target region capable of annealing or ligating to a second target nucleic acid fragment.

The invention also provides a kit for labelling target nucleic acids for sequencing, wherein the kit comprises: (a) a multimeric barcode molecule comprising first and second barcode molecules linked together, wherein each barcode molecule comprises a nucleic acid sequence comprising, optionally in the 5' to 3' direction, an adapter region, a barcode region and a priming region; (b) first and second extension primers for the multimeric barcode molecule, wherein the first extension primer comprises a sequence capable of annealing to the priming region of the first barcode molecule, and wherein the second extension primer comprises a sequence capable of annealing to the priming region of the second barcode molecule; and (c) first and second adapter oligonucleotides for the multimeric barcode molecule, wherein the first adapter oligonucleotide comprises an adapter region that is capable of annealing to the adapter region of the first barcode molecule and capable of ligating to a first target nucleic acid fragment, and wherein the second adapter oligonucleotide comprises an adapter region that is capable of annealing to the adapter region of the second barcode molecule and capable of ligating to a second target nucleic acid fragment.

Each adapter oligonucleotide may consist essentially of an adapter region, or may consist of an adapter region.

The components of the kit may take any of the forms described herein.

Preferably, the first extension primer comprises a sequence that is complementary to, and capable of annealing to, the priming region of the first barcode molecule, and the second extension primer comprises a sequence that is complementary to, and capable of annealing to, the priming region of the second barcode molecule. The complementary sequence of each extension primer may be at least 5, at least 10, at least 15, at least 20, at least 25, at least 50 or at least 100 contiguous nucleotides.

The first and second extension primers may be capable of being extended using the barcode regions of the first and second barcode molecules as templates, to produce first and second barcoded oligonucleotides, wherein the first barcoded oligonucleotide comprises a sequence complementary to the barcode region of the first barcode molecule, and the second barcoded oligonucleotide comprises a sequence complementary to the barcode region of the second barcode molecule.
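The strand arithmetic of this extension step can be sketched in a few lines. This is a minimal illustration, assuming a barcode molecule that reads 5'-[adapter region][barcode region][priming region]-3' (as in the kits above) and, purely for simplicity, stopping extension at the 5' edge of the barcode region; the function name and sequences are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def extend_over_barcode(barcode_region: str, priming_region: str) -> str:
    """Sketch of extension-primer synthesis on one barcode molecule.

    The extension primer is the reverse complement of the priming region.
    The polymerase reads the template 3'->5', i.e. from the priming region
    into the barcode region, so it appends the barcode region's reverse
    complement. Extension is stopped at the 5' edge of the barcode region
    only to keep the sketch simple.
    """
    primer = reverse_complement(priming_region)
    return primer + reverse_complement(barcode_region)


oligo = extend_over_barcode(barcode_region="ACGTTT", priming_region="GGGAAA")
print(oligo)  # TTTCCCAAACGT
```

The resulting barcoded oligonucleotide carries the complement of the barcode region, which is what later allows sequence reads to be traced back to the barcode molecule.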

The sequences of the first and second extension primers may be identical. Alternatively, the sequences of the first and second extension primers may be different.

The first extension primer and/or the second extension primer may also comprise one or more regions having a nucleic acid sequence that is not complementary to the first and second barcode molecules, respectively. Optionally, such a non-complementary region may comprise one or more amplification primer binding sites. Optionally, such a non-complementary region may be located in the 5' region of the molecule. Optionally, the first and second extension primers may comprise a 5'-terminal phosphate group capable of being ligated to the 3' end of a nucleic acid molecule.

The first extension primer and/or the second extension primer may also comprise one or more second barcode regions. Optionally, the second barcode region may be comprised within a region of the extension primer that is not complementary to the barcode molecule. Optionally, the second barcode region may be located in a region of the extension primer lying between the 3' region of the extension primer that is complementary to the barcode molecule and the 5' region of the extension primer that comprises an amplification primer binding site.

The second barcode region may comprise a sequence of one or more nucleotides, wherein the sequences of the second barcode regions of the first and second extension primers are different. Optionally, the one or more nucleotides may comprise random or degenerate nucleotides. Optionally, the one or more nucleotides may comprise different but non-random nucleotides. Any second barcode region may comprise at least 2, at least 3, at least 5, at least 10, at least 15, at least 20 or at least 30 nucleotides. Any second barcode region may comprise a contiguous sequence of the barcoded oligonucleotide, or may comprise two or more distinct segments separated by at least one non-barcode or constant nucleotide. Optionally, any second barcode region may comprise a unique molecular identifier (UMI).
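The layout described above (amplification primer binding site at the 5' end, a second barcode region in the middle, and a 3' region complementary to the barcode molecule) can be sketched as follows. This assumes the second barcode region is a random UMI of fixed length; the function name, the 10-nucleotide default and all sequences are hypothetical choices for illustration.

```python
import random
from typing import Optional, Tuple

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def make_extension_primer(
    amp_site: str,
    priming_region: str,
    umi_length: int = 10,
    rng: Optional[random.Random] = None,
) -> Tuple[str, str]:
    """Assemble a hypothetical extension primer, 5'->3':
    [amplification primer binding site][random UMI][complement of priming region].

    Returns (primer, umi).
    """
    if rng is None:
        rng = random.Random()
    # Random nucleotides make the second barcode region differ between primers
    # with high probability (not guaranteed for short UMIs).
    umi = "".join(rng.choice("ACGT") for _ in range(umi_length))
    return amp_site + umi + reverse_complement(priming_region), umi


primer, umi = make_extension_primer("CCTACGATCG", "GGGAAA", rng=random.Random(1))
```

A seeded `random.Random` is passed only to make the example reproducible; degenerate positions in a real synthesis are random at the molecular level.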

The kit may comprise a library of two or more multimeric barcode molecules, wherein each multimeric barcode molecule is as defined herein, together with first and second extension primers and first and second adapter oligonucleotides for each multimeric barcode molecule. The extension primers and adapter oligonucleotides may take any of the forms described herein. The barcode regions of the first and second barcode molecules of the first multimeric barcode molecule are different from the barcode regions of the first and second barcode molecules of the second multimeric barcode molecule.

The kit may comprise a library containing at least 5, at least 10, at least 20, at least 25, at least 50, at least 75, at least 100, at least 250, at least 500, at least 10^3, at least 10^4, at least 10^5, at least 10^6, at least 10^7, at least 10^8 or at least 10^9 multimeric barcode molecules as defined herein. Preferably, the kit comprises a library containing at least 10 multimeric barcode molecules as defined herein. The kit may also comprise extension primers and/or adapter oligonucleotides for each multimeric barcode molecule. The extension primers and adapter oligonucleotides may take any of the forms described herein. Preferably, the barcode regions of the first and second barcode molecules of each multimeric barcode molecule are different from the barcode regions of the barcode molecules of at least 9 other multimeric barcode molecules in the library.

The barcode regions of the first and second barcode molecules of each multimeric barcode molecule may be different from the barcode regions of the barcode molecules of at least 4, at least 9, at least 19, at least 24, at least 49, at least 74, at least 99, at least 249, at least 499, at least 999 (i.e. 10^3 - 1), at least 10^4 - 1, at least 10^5 - 1, at least 10^6 - 1, at least 10^7 - 1, at least 10^8 - 1 or at least 10^9 - 1 other multimeric barcode molecules in the library. The barcode regions of the first and second barcode molecules of each multimeric barcode molecule may be different from the barcode regions of the barcode molecules of all other multimeric barcode molecules in the library. Preferably, the barcode regions of the first and second barcode molecules of each multimeric barcode molecule are different from the barcode regions of the barcode molecules of at least 9 other multimeric barcode molecules in the library.

The barcode regions of the barcode molecules of each multimeric barcode molecule may be different from the barcode regions of the barcode molecules of at least 4, at least 9, at least 19, at least 24, at least 49, at least 74, at least 99, at least 249, at least 499, at least 999 (i.e. 10^3 - 1), at least 10^4 - 1, at least 10^5 - 1, at least 10^6 - 1, at least 10^7 - 1, at least 10^8 - 1 or at least 10^9 - 1 other multimeric barcode molecules in the library. The barcode regions of the barcode molecules of each multimeric barcode molecule may be different from the barcode regions of the barcode molecules of all other multimeric barcode molecules in the library. Preferably, the barcode regions of the barcode molecules of each multimeric barcode molecule are different from the barcode regions of the barcode molecules of at least 9 other multimeric barcode molecules in the library.

The invention also provides a kit for labelling target nucleic acids for sequencing, wherein the kit comprises: (a) a library of multimeric barcode molecules comprising at least 10 multimeric barcode molecules, each multimeric barcode molecule comprising first and second barcode molecules comprised within a (single) nucleic acid molecule, wherein each barcode molecule comprises a nucleic acid sequence comprising, optionally in the 5' to 3' direction, an adapter region, a barcode region and a priming region, and wherein the barcode regions of the first and second barcode molecules of each multimeric barcode molecule are different from the barcode regions of at least 9 other multimeric barcode molecules in the library; (b) first and second extension primers for each multimeric barcode molecule, wherein the first extension primer comprises a sequence capable of annealing to the priming region of the first barcode molecule, and wherein the second extension primer comprises a sequence capable of annealing to the priming region of the second barcode molecule; and (c) first and second adapter oligonucleotides for each multimeric barcode molecule, wherein the first adapter oligonucleotide comprises, optionally in the 5' to 3' direction, an adapter region capable of annealing to the adapter region of the first barcode molecule and a target region capable of annealing or ligating to a first target nucleic acid fragment, and wherein the second adapter oligonucleotide comprises, optionally in the 5' to 3' direction, an adapter region capable of annealing to the adapter region of the second barcode molecule and a target region capable of annealing or ligating to a second target nucleic acid fragment.

21. Methods of preparing a nucleic acid sample for sequencing

A method of preparing a nucleic acid sample for sequencing may comprise (i) contacting the nucleic acid sample with a multimeric barcoding reagent comprising first and second barcode regions linked together, wherein each barcode region comprises a nucleic acid sequence, and (ii) attaching barcode sequences to first and second target nucleic acid fragments to produce first and second different barcoded target nucleic acid molecules, wherein the first barcoded target nucleic acid molecule comprises the nucleic acid sequence of the first barcode region and the second barcoded target nucleic acid molecule comprises the nucleic acid sequence of the second barcode region.

In methods in which the multimeric barcoding reagent comprises first and second barcoded oligonucleotides linked together, the barcode sequences may be attached to the first and second target nucleic acid fragments by any method described herein.

The first and second barcoded oligonucleotides may be ligated to the first and second target nucleic acid fragments to produce first and second different barcoded target nucleic acid molecules. Optionally, before the ligation step, the method comprises attaching first and second coupling sequences to the target nucleic acid, wherein the first and second coupling sequences are the portions of the first and second target nucleic acid fragments that are ligated to the first and second barcoded oligonucleotides.

The first and second barcoded oligonucleotides may be annealed to the first and second target nucleic acid fragments and extended, to produce first and second different barcoded target nucleic acid molecules. Optionally, before the annealing step, the method comprises attaching first and second coupling sequences to the target nucleic acid, wherein the first and second coupling sequences are the portions of the first and second target nucleic acid fragments that anneal to the first and second barcoded oligonucleotides.

The first and second barcoded oligonucleotides may anneal at their 5' ends to first and second subsequences of the target nucleic acid, and first and second target primers may anneal to third and fourth subsequences of the target nucleic acid, respectively, wherein the third subsequence is 3' of the first subsequence, and wherein the fourth subsequence is 3' of the second subsequence. The method further comprises extending the first target primer, using the target nucleic acid as a template, until it reaches the first subsequence to produce a first extended target primer, and extending the second target primer, using the target nucleic acid as a template, until it reaches the second subsequence to produce a second extended target primer; and ligating the 3' end of the first extended target primer to the 5' end of the first barcoded oligonucleotide to produce a first barcoded target nucleic acid molecule, and ligating the 3' end of the second extended target primer to the 5' end of the second barcoded oligonucleotide to produce a second barcoded target nucleic acid molecule, wherein the first and second barcoded target nucleic acid molecules are different and each comprises at least one nucleotide synthesized using the target nucleic acid as a template. Optionally, before one or both annealing steps, the method comprises attaching first and second coupling sequences and/or third and fourth coupling sequences to the target nucleic acid, wherein the first and second coupling sequences are the first and second subsequences of the target nucleic acid that anneal to the first and second barcoded oligonucleotides, and/or wherein the third and fourth coupling sequences are the third and fourth subsequences of the target nucleic acid that anneal to the first and second target primers.
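The extension-and-ligation scheme above can be traced on a toy sequence. The following is a minimal sketch for the first barcoded oligonucleotide only, assuming the target strand is written 5'->3', that the part of the barcoded oligonucleotide 3' of its annealing region (e.g. its barcode region) does not bind the target, and that at least one gap nucleotide must be synthesized; all names and sequences are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def gap_fill_ligation(target: str, first_sub: str, third_sub: str, barcode_tail: str) -> str:
    """Return a hypothetical first barcoded target nucleic acid molecule (5'->3').

    target: target strand, written 5'->3'.
    first_sub: subsequence bound by the 5' end of the barcoded oligonucleotide.
    third_sub: subsequence bound by the target primer; must lie 3' of first_sub.
    barcode_tail: 3' part of the barcoded oligonucleotide that does not bind
        the target in this sketch (e.g. its barcode region).
    """
    a = target.find(first_sub)
    c = target.find(third_sub)
    # At least one nucleotide must be synthesized between the two bound subsequences.
    if a < 0 or c < 0 or c <= a + len(first_sub):
        raise ValueError("third subsequence must lie 3' of the first, with a gap")
    # Target primer = revcomp(third_sub); extension fills the gap; ligation joins
    # its 3' end to the barcoded oligo, whose 5' end is revcomp(first_sub).
    return reverse_complement(target[a : c + len(third_sub)]) + barcode_tail


target = "GATTACAGGCCTTAAGCGT"
product = gap_fill_ligation(target, "GATT", "GCGT", barcode_tail="ACGTACGT")
```

Because the new strand is antiparallel to the target, the product begins with the target primer and ends with the barcoded oligonucleotide, with the gap-filled nucleotides (copied from the target) between them.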

As described herein, a coupling sequence may be attached to the target nucleic acid before the multimeric hybridization molecule, multimeric barcode molecule, barcoded oligonucleotide, adapter oligonucleotide or target primer is annealed or ligated to the target nucleic acid. The multimeric hybridization molecule, multimeric barcode molecule, barcoded oligonucleotide, adapter oligonucleotide or target primer may then be annealed or ligated to the coupling sequence.

Coupling sequences may be added to the 5' ends or 3' ends of two or more target nucleic acids of the nucleic acid sample. In this method, the target region (of the barcoded oligonucleotide) may comprise a sequence complementary to the coupling sequence.

The coupling sequence may be comprised within a double-stranded coupling oligonucleotide or within a single-stranded coupling oligonucleotide. The coupling oligonucleotide may be attached to the target nucleic acid by a double-stranded ligation reaction or a single-stranded ligation reaction. The coupling oligonucleotide may comprise a single-stranded 5' or 3' region capable of being ligated to the target nucleic acid, and the coupling sequence may be attached to the target nucleic acid by a single-stranded ligation reaction.

The coupling oligonucleotide may comprise a blunt, recessed or overhanging 5' or 3' region capable of being ligated to the target nucleic acid, and the coupling sequence may be attached to the target nucleic acid by a double-stranded ligation reaction.

The ends of the target nucleic acid may be converted into blunt double-stranded ends in a blunting reaction, the coupling oligonucleotide may comprise a blunt double-stranded end, and the coupling oligonucleotide may be ligated to the target nucleic acid in a blunt-end ligation reaction.

The ends of the target nucleic acid may be converted into blunt double-stranded ends in a blunting reaction and then converted into a form having a single 3' adenosine overhang, wherein the coupling oligonucleotide may comprise a double-stranded end having a single 3' thymidine overhang capable of annealing to the single 3' adenosine overhang of the target nucleic acid, and wherein the coupling oligonucleotide is ligated to the target nucleic acid in a double-stranded A/T ligation reaction.
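A practical consequence of A/T ligation is that A-tailed fragments cannot ligate to one another, and T-overhang coupling oligonucleotides cannot ligate to one another, because their single-base overhangs do not pair. The following toy sketch models each double-stranded end only by its single-nucleotide 3' overhang; this simplified representation and the function names are assumptions for illustration.

```python
from typing import Optional

# Watson-Crick partners for a single-nucleotide overhang.
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def a_tail(overhang: Optional[str]) -> str:
    """Model dA-tailing of a blunt end: a blunt end (None) gains a single 3' A."""
    if overhang is not None:
        raise ValueError("end must be blunted before A-tailing")
    return "A"


def can_ligate(end_1: str, end_2: str) -> bool:
    """True if two single-nucleotide 3' overhangs can base-pair for ligation."""
    return len(end_1) == 1 and PAIR.get(end_1) == end_2


fragment_end = a_tail(None)  # target nucleic acid after blunting and A-tailing
coupling_end = "T"           # coupling oligonucleotide with a single 3' T overhang
```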

The target nucleic acid may be contacted with a restriction enzyme, wherein the restriction enzyme digests the target nucleic acid at restriction sites to generate ligatable ends at those sites, wherein the coupling oligonucleotide comprises ends compatible with these ligatable ends, and wherein the coupling oligonucleotide is then ligated to the target nucleic acid in a double-stranded ligation reaction.
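The restriction-digestion route can be sketched by locating recognition sites and reporting the resulting sticky end. This minimal sketch uses EcoRI (G^AATTC, leaving a 5'-AATT overhang) purely as an example enzyme, which the text does not specify, and tracks only the top strand; a compatible coupling oligonucleotide would then carry a matching 5'-AATT overhang.

```python
from typing import List, Tuple


def digest(seq: str, site: str, cut: int) -> Tuple[List[str], str]:
    """Cut the top strand of `seq` at every occurrence of a palindromic site.

    `cut` is the top-strand cleavage offset within the site (1 for EcoRI,
    G^AATTC). For a palindromic site with cut < len(site) / 2, every cut end
    carries the same 5' overhang, site[cut:len(site) - cut].
    """
    fragments, start = [], 0
    pos = seq.find(site)
    while pos != -1:
        fragments.append(seq[start : pos + cut])
        start = pos + cut
        pos = seq.find(site, pos + 1)
    fragments.append(seq[start:])
    return fragments, site[cut : len(site) - cut]


fragments, overhang = digest("TTGAATTCGGGAATTCAA", "GAATTC", cut=1)
print(fragments, overhang)  # ['TTG', 'AATTCGGG', 'AATTCAA'] AATT
```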

The coupling oligonucleotide may be attached by a primer-extension or polymerase chain reaction (PCR) step.

The coupling oligonucleotide may be attached by a primer-extension or PCR step using one or more oligonucleotides comprising a priming segment, wherein the priming segment comprises one or more degenerate bases.

The coupling oligonucleotide may also be attached by a primer-extension or PCR step using one or more oligonucleotides comprising a priming or hybridization segment that is specific for a particular target nucleic acid sequence.

The coupling sequence may be added by a polynucleotide tailing reaction. The coupling sequence may be added by a terminal transferase (e.g. terminal deoxynucleotidyl transferase). The coupling sequence may be attached by a polynucleotide tailing reaction performed with terminal deoxynucleotidyl transferase, wherein the coupling sequence comprises at least two contiguous nucleotides of a homopolymeric sequence.

The coupling sequence may comprise a homopolymeric 3' tail (e.g. a poly(A) tail). Optionally, in such methods, the target region (of the barcoded oligonucleotide) comprises a complementary homopolymeric 3' tail (e.g. a poly(T) tail).
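The homopolymer pairing can be illustrated with a short sketch. It assumes a fixed tail length (real terminal transferase tails vary in length), places the poly(T) at the 3' end of a hypothetical barcoded oligonucleotide, and uses an arbitrary threshold of 10 paired bases; none of these values come from the text.

```python
def add_poly_a_tail(seq: str, length: int = 20) -> str:
    """Mimic tailing with terminal deoxynucleotidyl transferase and dATP.

    Real tails vary in length; a fixed length is used here for simplicity.
    """
    return seq + "A" * length


def terminal_run(seq: str, base: str) -> int:
    """Length of the run of `base` at the 3' end of `seq` (written 5'->3')."""
    return len(seq) - len(seq.rstrip(base))


def poly_t_can_anneal(tailed_target: str, target_region: str, min_pairs: int = 10) -> bool:
    """True if a 3' poly(T) target region can pair over >= min_pairs bases with a poly(A) tail."""
    return min(terminal_run(tailed_target, "A"), terminal_run(target_region, "T")) >= min_pairs


tailed = add_poly_a_tail("CCGTGC", length=15)
oligo = "ACGTACGC" + "T" * 12  # hypothetical barcoded oligonucleotide, poly(T) at its 3' end
print(poly_t_can_anneal(tailed, oligo))  # True
```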

The coupling sequence may be comprised within a synthetic transposon and may be attached by an in vitro transposition reaction.

The coupling sequence may be attached to the target nucleic acid, wherein the barcoded oligonucleotide is attached to the target nucleic acid by at least one primer-extension step or PCR step, and wherein the barcoded oligonucleotide comprises a region of at least one nucleotide in length that is complementary to the coupling sequence. Optionally, this complementary region is located at the 3' end of the barcoded oligonucleotide. Optionally, the complementary region is at least 2, at least 5, at least 10, at least 20 or at least 50 nucleotides in length.

In methods in which an adapter oligonucleotide is attached (e.g. ligated or annealed) to the target nucleic acid, the adapter region of the adapter oligonucleotide provides a coupling sequence capable of hybridizing to the adapter region of the multimeric hybridization molecule or multimeric barcode molecule.

The invention provides a method of preparing a nucleic acid sample for sequencing, comprising the steps of: (a) attaching coupling sequences to first and second target nucleic acid fragments; (b) contacting the nucleic acid sample with a multimeric barcoding reagent comprising first and second barcode molecules linked together, wherein each barcode molecule comprises a nucleic acid sequence comprising (in the 5' to 3' or 3' to 5' direction) a barcode region and an adapter region; (c) annealing the coupling sequence of the first fragment to the adapter region of the first barcode molecule, and annealing the coupling sequence of the second fragment to the adapter region of the second barcode molecule; and (d) attaching barcode sequences to each of the at least two target nucleic acid fragments to produce first and second different barcoded target nucleic acid molecules, wherein the first barcoded target nucleic acid molecule comprises the nucleic acid sequence of the barcode region of the first barcode molecule, and the second barcoded target nucleic acid molecule comprises the nucleic acid sequence of the barcode region of the second barcode molecule.

In this method, each barcode molecule may comprise a nucleic acid sequence comprising, in the 5' to 3' direction, a barcode region and an adapter region, and step (d) may comprise extending the coupling sequence of the first target nucleic acid fragment using the barcode region of the first barcode molecule as a template to produce the first barcoded target nucleic acid molecule, and extending the coupling sequence of the second target nucleic acid fragment using the barcode region of the second barcode molecule as a template to produce the second barcoded target nucleic acid molecule, wherein the first barcoded target nucleic acid molecule comprises a sequence complementary to the barcode region of the first barcode molecule, and the second barcoded target nucleic acid molecule comprises a sequence complementary to the barcode region of the second barcode molecule.
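The orientation logic of this variant can be sketched directly. This is a minimal illustration, assuming the barcode molecule reads 5'-[barcode region][adapter region]-3' as stated above and that the coupling sequence sits at the 3' end of the target fragment; the function name and sequences are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def extend_coupling_sequence(fragment: str, barcode_region: str, adapter_region: str) -> str:
    """Barcode molecule reads 5'-[barcode region][adapter region]-3'.

    The fragment's 3' coupling sequence anneals to the adapter region, and the
    polymerase extends it across the barcode region, appending the reverse
    complement of that region to the fragment.
    """
    if not fragment.endswith(reverse_complement(adapter_region)):
        raise ValueError("3' end of fragment is not complementary to the adapter region")
    return fragment + reverse_complement(barcode_region)


fragment = "ACGT" + reverse_complement("AAACCC")  # target sequence + coupling sequence
barcoded = extend_coupling_sequence(fragment, barcode_region="TTAGGC", adapter_region="AAACCC")
print(barcoded)  # ACGTGGGTTTGCCTAA
```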

In this method, each barcode molecule may comprise a nucleic acid sequence comprising, in the 5' to 3' direction, an adapter region and a barcode region, and step (d) may comprise (i) annealing a first extension primer and extending it using the barcode region of the first barcode molecule as a template to produce a first barcoded oligonucleotide, and annealing a second extension primer and extending it using the barcode region of the second barcode molecule as a template to produce a second barcoded oligonucleotide, wherein the first barcoded oligonucleotide comprises a sequence complementary to the barcode region of the first barcode molecule, and the second barcoded oligonucleotide comprises a sequence complementary to the barcode region of the second barcode molecule, and (ii) ligating the 3' end of the first barcoded oligonucleotide to the 5' end of the coupling sequence of the first target nucleic acid fragment to produce the first barcoded target nucleic acid molecule, and ligating the 3' end of the second barcoded oligonucleotide to the 5' end of the coupling sequence of the second target nucleic acid fragment to produce the second barcoded target nucleic acid molecule.

In this method, each barcode molecule may comprise a nucleic acid sequence comprising, in the 5' to 3' direction, an adapter region, a barcode region and a priming region, wherein step (d) comprises (i) annealing a first extension primer to the priming region of the first barcode molecule and extending it using the barcode region of the first barcode molecule as a template to produce a first barcoded oligonucleotide, and annealing a second extension primer to the priming region of the second barcode molecule and extending it using the barcode region of the second barcode molecule as a template to produce a second barcoded oligonucleotide, wherein the first barcoded oligonucleotide comprises a sequence complementary to the barcode region of the first barcode molecule, and the second barcoded oligonucleotide comprises a sequence complementary to the barcode region of the second barcode molecule, and (ii) ligating the 3' end of the first barcoded oligonucleotide to the 5' end of the coupling sequence of the first target nucleic acid fragment to produce the first barcoded target nucleic acid molecule, and ligating the 3' end of the second barcoded oligonucleotide to the 5' end of the coupling sequence of the second target nucleic acid fragment to produce the second barcoded target nucleic acid molecule.

The method of preparing a nucleic acid sample for sequencing may be used to prepare a range of different nucleic acid samples for sequencing. The target nucleic acid may be a DNA molecule (e.g. a genomic DNA molecule) or an RNA molecule (e.g. an mRNA molecule). The target nucleic acid may come from any sample, for example an individual cell (or multiple cells), a tissue, a body fluid (e.g. blood, plasma and/or serum), a biopsy, or a formalin-fixed paraffin-embedded (FFPE) sample.

The sample may comprise at least 10, at least 100, at least 10^3, at least 10^4, at least 10^5, at least 10^6, at least 10^7, at least 10^8 or at least 10^9 target nucleic acids.

The method may comprise producing at least 2, at least 5, at least 10, at least 20, at least 25, at least 50, at least 75, at least 100, at least 250, at least 500, at least 10^3, at least 10^4, at least 10^5, at least 10^6, at least 10^7, at least 10^8 or at least 10^9 different barcoded target nucleic acid molecules. Preferably, the method comprises producing at least 5 different barcoded target nucleic acid molecules.

Each barcoded target nucleic acid molecule may comprise at least 1, at least 5, at least 10, at least 25, at least 50, at least 100, at least 250, at least 500, at least 1000, at least 2000, at least 5000 or at least 10,000 nucleotides synthesized using the target nucleic acid as a template. Preferably, each barcoded target nucleic acid molecule comprises at least 20 nucleotides synthesized using the target nucleic acid as a template.

Alternatively, each barcoded target nucleic acid molecule may comprise at least 5, at least 10, at least 25, at least 50, at least 100, at least 250, at least 500, at least 1000, at least 2000, at least 5000 or at least 10,000 nucleotides of the target nucleic acid. Preferably, each barcoded target nucleic acid molecule comprises at least 5 nucleotides of the target nucleic acid.

Universal priming sequences may be added to the barcoded target nucleic acid molecules. These sequences may enable at least 5, at least 10, at least 20, at least 25, at least 50, at least 75, at least 100, at least 250, at least 500, at least 10^3, at least 10^4, at least 10^5, at least 10^6, at least 10^7, at least 10^8 or at least 10^9 different barcoded target nucleic acid molecules to be subsequently amplified using a single forward primer and a single reverse primer.

The method may comprise preparing two or more independent nucleic acid samples for sequencing, wherein each nucleic acid sample is prepared using a different library of multimeric barcoding reagents (or a different library of multimeric barcode molecules), and wherein the barcode regions of each library of multimeric barcoding reagents (or multimeric barcode molecules) comprise sequences that differ from the barcode regions of the other libraries of multimeric barcoding reagents (or multimeric barcode molecules). After each sample has been prepared for sequencing separately, the barcoded target nucleic acid molecules from the different samples may be pooled and sequenced together. The sequence read generated for each barcoded target nucleic acid molecule may be used to identify the library of multimeric barcoding reagents (or multimeric barcode molecules) used in its preparation, and thereby to identify the nucleic acid sample from which it was prepared.
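Assigning pooled reads back to their samples amounts to a lookup from barcode sequence to library. The following is a minimal sketch, assuming the barcode sequence has already been extracted from each read and is matched exactly (practical pipelines usually tolerate sequencing errors); the sample names, barcodes and read identifiers are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def build_lookup(libraries: Dict[str, Set[str]]) -> Dict[str, str]:
    """Map each barcode sequence to the sample whose library contains it."""
    lookup: Dict[str, str] = {}
    for sample, barcodes in libraries.items():
        for bc in barcodes:
            # Libraries must not share barcodes, otherwise reads are ambiguous.
            if lookup.get(bc, sample) != sample:
                raise ValueError(f"barcode {bc} occurs in more than one library")
            lookup[bc] = sample
    return lookup


def assign_reads(reads: Iterable[Tuple[str, str]], lookup: Dict[str, str]) -> Dict[str, List[str]]:
    """Group (read_id, observed_barcode) pairs by sample; unknown barcodes go to 'unassigned'."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for read_id, barcode in reads:
        groups[lookup.get(barcode, "unassigned")].append(read_id)
    return dict(groups)


libraries = {"sample_1": {"AAAA", "CCCC"}, "sample_2": {"GGGG", "TTTT"}}
reads = [("r1", "AAAA"), ("r2", "GGGG"), ("r3", "ACGT"), ("r4", "CCCC")]
print(assign_reads(reads, build_lookup(libraries)))
# {'sample_1': ['r1', 'r4'], 'sample_2': ['r2'], 'unassigned': ['r3']}
```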

In any method of preparing a nucleic acid sample for sequencing, the target nucleic acid molecules may be present in the nucleic acid sample at a concentration of, for example, at least 100 nM, at least 10 nM, at least 1 nM, at least 100 pM, at least 10 pM, at least 1 pM, at least 100 fM, at least 10 fM, or at least 1 fM. The concentration may be 1 pM to 100 nM, 10 pM to 10 nM, or 100 pM to 1 nM. Preferably, the concentration is 10 pM to 1 nM.

In any method of preparing a nucleic acid sample for sequencing, the multimeric barcoding reagents may be present in the nucleic acid sample at a concentration of, for example, at least 100 nM, at least 10 nM, at least 1 nM, at least 100 pM, at least 10 pM, at least 1 pM, at least 100 fM, at least 10 fM, or at least 1 fM. The concentration may be 1 pM to 100 nM, 10 pM to 10 nM, or 100 pM to 1 nM. Preferably, the concentration is 1 pM to 100 pM.

In any method of preparing a nucleic acid sample for sequencing, the multimeric barcode molecules may be present in the nucleic acid sample at a concentration of, for example, at least 100 nM, at least 10 nM, at least 1 nM, at least 100 pM, at least 10 pM, at least 1 pM, at least 100 fM, at least 10 fM, or at least 1 fM. The concentration may be 1 pM to 100 nM, 10 pM to 10 nM, or 100 pM to 1 nM. Preferably, the concentration is 1 pM to 100 pM.

In any method of preparing a nucleic acid sample for sequencing, the barcode oligonucleotides may be present in the nucleic acid sample at a concentration of, for example, at least 100 nM, at least 10 nM, at least 1 nM, at least 100 pM, at least 10 pM, at least 1 pM, at least 100 fM, at least 10 fM, or at least 1 fM. The concentration may be 1 pM to 100 nM, 10 pM to 10 nM, or 100 pM to 1 nM. Preferably, the concentration is 100 pM to 100 nM.

22. Methods of preparing a nucleic acid sample for sequencing using multimeric barcoding reagents

The invention provides a method of preparing a nucleic acid sample for sequencing, wherein the method comprises the steps of: contacting the nucleic acid sample with a multimeric barcoding reagent as defined herein; annealing the target region of a first barcode oligonucleotide to a first target nucleic acid fragment, and annealing the target region of a second barcode oligonucleotide to a second target nucleic acid fragment; and extending the first and second barcode oligonucleotides to produce first and second different barcoded target nucleic acid molecules, wherein each barcoded target nucleic acid molecule comprises at least one nucleotide synthesized using the target nucleic acid as a template.

In any method of preparing a nucleic acid sample for sequencing, the nucleic acid molecules in the nucleic acid sample and/or the multimeric barcoding reagents may be present in the solution volume at a concentration of, for example, at least 100 nM, at least 10 nM, at least 1 nM, at least 100 pM, at least 10 pM, or at least 1 pM. The concentration may be 1 pM to 100 nM, 10 pM to 10 nM, or 100 pM to 1 nM. Other, higher or lower, concentrations may also be used.

The method of preparing a nucleic acid sample for sequencing may comprise contacting the nucleic acid sample with a library of multimeric barcoding reagents as defined herein, wherein: the barcode oligonucleotides of a first multimeric barcoding reagent anneal to fragments of a first target nucleic acid and produce first and second different barcoded target nucleic acid molecules, each comprising at least one nucleotide synthesized using the first target nucleic acid as a template; and the barcode oligonucleotides of a second multimeric barcoding reagent anneal to fragments of a second target nucleic acid and produce first and second different barcoded target nucleic acid molecules, each comprising at least one nucleotide synthesized using the second target nucleic acid as a template.

In this method, the barcode oligonucleotides may be isolated from the nucleic acid sample after they have annealed to fragments of the target nucleic acid and before the barcoded target nucleic acid molecules are produced. Optionally, the barcode oligonucleotides are isolated by capture on a solid support through a streptavidin-biotin interaction.

Additionally or alternatively, the barcoded target nucleic acid molecules may be isolated from the nucleic acid sample. Optionally, they are isolated by capture on a solid support through a streptavidin-biotin interaction.

The step of extending the barcode oligonucleotides may be performed while the barcode oligonucleotides remain annealed to the barcode molecules.

Fig. 3 shows a method of preparing a nucleic acid sample for sequencing in which a multimeric barcoding reagent as defined herein (for example, as shown in Fig. 1) is used to label and extend two or more nucleic acid subsequences in the sample. In this method, a multimeric barcoding reagent is synthesized that incorporates at least a first (A1, B1, C1 and G1) and a second (A2, B2, C2 and G2) barcode oligonucleotide, each comprising a barcode region (B1 and B2) and a target region (G1 and G2, respectively).

A nucleic acid sample comprising a target nucleic acid is contacted or mixed with the multimeric barcoding reagent, and the target regions (G1 and G2) of two or more barcode oligonucleotides are allowed to anneal to two or more corresponding subsequences (H1 and H2) of the target nucleic acid. After the annealing step, the first and second barcode oligonucleotides are extended into the sequence of the target nucleic acid (for example, with the target regions serving as primers for a polymerase), so that at least one nucleotide of each subsequence is incorporated into the extended 3' end of each barcode oligonucleotide. This produces barcoded target nucleic acid molecules in which two or more subsequences of the target nucleic acid are labelled by barcode oligonucleotides.
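The annealing-and-extension logic described for Fig. 3 can be modelled as string operations on sequences written 5'→3'. In this toy sketch (an in-silico illustration, not a laboratory protocol), a barcode oligonucleotide is assumed to be a hypothetical 5' region, barcode region and 3' target region concatenated, with the other regions shown in the figure omitted; the target region anneals antiparallel to the template, and extension copies the template towards the template's 5' end. All sequences and function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def extend_on_template(template: str, oligo: str, target_len: int) -> str:
    """Anneal the 3'-terminal `target_len` bases of `oligo` to `template`
    (both written 5'->3') and extend the oligo to the template's 5' end."""
    site = template.find(revcomp(oligo[-target_len:]))
    if site < 0:
        raise ValueError("target region does not anneal to the template")
    # The oligo's 3' end pairs with template[site]; a polymerase then copies
    # template[site - 1], template[site - 2], ..., template[0].
    return oligo + revcomp(template[:site])


if __name__ == "__main__":
    target = "CCCCATGCAGGTTACCAA"       # hypothetical target nucleic acid
    oligo_1 = "AGT" + "ACG" + "TGCAT"   # 5' region + barcode B1 + target region G1
    oligo_2 = "AGT" + "TTC" + "TTGGT"   # 5' region + barcode B2 + target region G2
    print(extend_on_template(target, oligo_1, 5))  # AGTACGTGCATGGGG
    print(extend_on_template(target, oligo_2, 5))
```

The two products carry different barcodes and different template-derived sequence, mirroring the first and second barcoded target nucleic acid molecules; real annealing tolerates mismatches, which this exact-match model ignores.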

Alternatively, the method may further comprise a step of dissociating the barcode oligonucleotides from the barcode molecules before the target regions of the barcode oligonucleotides are annealed to the subsequences of the target nucleic acid.

Fig. 4 shows a method of preparing a nucleic acid sample for sequencing in which a multimeric barcoding reagent as described herein (for example, as shown in Fig. 1) is used to label and extend two or more nucleic acid subsequences in the sample, but in which the barcode oligonucleotides are dissociated from the barcode molecules of the multimeric barcoding reagent before they anneal to the target nucleic acid sequence (and before they are extended into it). In this method, a multimeric barcoding reagent is synthesized that incorporates at least a first (A1, B1, C1 and G1) and a second (A2, B2, C2 and G2) barcode oligonucleotide, each comprising a barcode region (B1 and B2) and a target region (G1 and G2, respectively).

The nucleic acid sample comprising the target nucleic acid is contacted with the multimeric barcoding reagent, and the barcode oligonucleotides are then dissociated from the barcode molecules. This step may be performed, for example, by exposing the reagent to an elevated temperature (for example, a temperature of at least 35 °C, at least 40 °C, at least 45 °C, at least 50 °C, at least 55 °C, at least 60 °C, at least 65 °C, at least 70 °C, at least 75 °C, at least 80 °C, at least 85 °C or at least 90 °C), by a chemical denaturant, or by a combination of these. This step may also denature double-stranded nucleic acids in the sample itself. The barcode oligonucleotides may then be allowed to diffuse for a period of time (for example, at least 5 seconds, at least 15 seconds, at least 30 seconds, at least 60 seconds, at least 2 minutes, at least 5 minutes, at least 15 minutes, at least 30 minutes or at least 60 minutes), and therefore to travel a corresponding physical distance within the sample.
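The link between diffusion time and physical distance can be illustrated with the standard three-dimensional relation for root-mean-square displacement, √(6Dt). The diffusion coefficient used here (about 100 µm²/s) is an order-of-magnitude assumption for a short free oligonucleotide in aqueous solution and is not taken from this document; diffusion through a cross-linked or embedded sample is likely to be slower.

```python
import math


def rms_displacement_um(diffusion_um2_per_s: float, seconds: float) -> float:
    """Root-mean-square 3D displacement sqrt(6 * D * t), in micrometres."""
    return math.sqrt(6.0 * diffusion_um2_per_s * seconds)


if __name__ == "__main__":
    D_ASSUMED = 100.0  # um^2/s, illustrative value for a short free oligonucleotide
    for t in (5, 60, 15 * 60, 60 * 60):  # durations drawn from the ranges above
        print(f"{t:>5} s: ~{rms_displacement_um(D_ASSUMED, t):.0f} um")
```

Because displacement grows with the square root of time, quadrupling the diffusion period only doubles the typical distance travelled.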

The conditions of the reagent-sample mixture may then be changed to allow the target regions (G1 and G2) of two or more barcode oligonucleotides to anneal to two or more corresponding subsequences (H1 and H2) of the target nucleic acid. This may include, for example, lowering the temperature of the solution to allow annealing (for example, to below 90 °C, below 85 °C, below 70 °C, below 65 °C, below 60 °C, below 55 °C, below 50 °C, below 45 °C, below 40 °C, below 35 °C, below 30 °C, below 25 °C or below 20 °C). After this annealing step (or, for example, after a purification or preparation step), the first and second barcode oligonucleotides are extended into the sequence of the target nucleic acid (for example, with the target regions serving as primers for a polymerase), so that at least one nucleotide of each subsequence is incorporated into the extended 3' end of each barcode oligonucleotide.

This method produces barcoded target nucleic acid molecules in which two or more subsequences from the nucleic acid sample are labelled by barcode oligonucleotides. In addition, the step of dissociating the barcode oligonucleotides and allowing them to diffuse through the sample may be advantageous for particular types of sample. For example, a cross-linked nucleic acid sample (such as a formalin-fixed, paraffin-embedded (FFPE) sample) may still permit the diffusion of relatively small, individual barcode oligonucleotides. The method therefore allows the labelling of nucleic acid samples with poor accessibility (such as FFPE samples) or with other challenging biophysical properties, for example samples in which the target nucleic acid subsequences are physically separated from one another.

A universal priming sequence may be added to the barcoded target nucleic acid molecules. This sequence may allow at least 5, at least 10, at least 20, at least 25, at least 50, at least 75, at least 100, at least 250, at least 500, at least 10³, at least 10⁴, at least 10⁵, at least 10⁶, at least 10⁷, at least 10⁸ or at least 10⁹ different barcoded target nucleic acid molecules to be subsequently amplified using a single forward primer and a single reverse primer.
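Why one primer pair suffices can be shown in a minimal string model: if every barcoded molecule is flanked by the same 5' and 3' sequences, a forward primer identical to the 5' flank and a reverse primer equal to the reverse complement of the 3' flank both find a binding site on every molecule. The flank and insert sequences below are hypothetical, not sequences from this document.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

UNIVERSAL_5 = "GCATGCTAGCAT"  # hypothetical 5' universal priming sequence
UNIVERSAL_3 = "TTCGAAGCTTGC"  # hypothetical 3' universal priming sequence


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def add_universal_flanks(inserts):
    """Flank each barcoded insert with the same 5' and 3' sequences."""
    return [UNIVERSAL_5 + s + UNIVERSAL_3 for s in inserts]


def primable(molecule: str, forward: str, reverse: str) -> bool:
    """True if `forward` matches the top strand's 5' end and `reverse`
    matches the 5' end of the complementary (bottom) strand."""
    return molecule.startswith(forward) and revcomp(molecule).startswith(reverse)


if __name__ == "__main__":
    inserts = ["TTAGGCACGTACGT", "CCGATAGGTTCA", "AATCCGTTTTGCA"]
    molecules = add_universal_flanks(inserts)
    fwd, rev = UNIVERSAL_5, revcomp(UNIVERSAL_3)
    print(all(primable(m, fwd, rev) for m in molecules))  # True
```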

Before the nucleic acid sample is contacted with a multimeric barcoding reagent or a library of multimeric barcoding reagents as defined herein, a coupling sequence may be added to the 5' or 3' ends of two or more target nucleic acids in the sample. In this method, the target region may comprise a sequence complementary to the coupling sequence. The coupling sequence may comprise a homopolymeric 3' tail (such as a poly(A) tail) and may be added by terminal transferase. In methods in which the coupling sequence comprises a poly(A) tail, the target region may comprise a poly(T) sequence. Such a coupling sequence may be added after a high-temperature incubation of the nucleic acid sample, which denatures the nucleic acids it contains before the coupling sequence is added.
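The poly(A)/poly(T) coupling scheme can be sketched as a toy string model (sequences 5'→3'): a poly(A) tail is appended to the 3' end of a denatured fragment, and a target region made of T residues anneals to that tail, so that extension from it copies the whole fragment. The fixed tail length, the sequences and the function names are assumptions for illustration; tails added by terminal transferase in practice vary in length.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def add_poly_a_tail(fragment: str, length: int) -> str:
    """Append a fixed-length poly(A) coupling sequence to the fragment's 3' end."""
    return fragment + "A" * length


def prime_and_extend(template: str, oligo: str, target_len: int) -> str:
    """Anneal the oligo's 3' target region to the template and extend to its 5' end."""
    site = template.find(revcomp(oligo[-target_len:]))
    if site < 0:
        raise ValueError("target region does not anneal to the template")
    return oligo + revcomp(template[:site])


if __name__ == "__main__":
    fragment = "GCTAGCCGTCTG"            # hypothetical denatured fragment
    tailed = add_poly_a_tail(fragment, 10)
    oligo = "GATC" + "CAGT" + "T" * 10   # 5' region + barcode + poly(T) target region
    product = prime_and_extend(tailed, oligo, 10)
    print(product.endswith(revcomp(fragment)))  # True: the whole fragment was copied
```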

Alternatively, the coupling sequence may be added by digesting the target nucleic acid sample with a restriction enzyme, in which case the coupling sequence may comprise one or more nucleotides of the restriction enzyme recognition sequence. The coupling sequence may then be at least partially double-stranded, and may comprise a blunt-ended double-stranded DNA sequence, a sequence with a 5' overhang of one or more nucleotides, or a sequence with a 3' overhang of one or more nucleotides. Correspondingly, the target region of the multimeric barcoding reagent may comprise a double-stranded blunt-ended sequence (and can therefore be ligated to blunt-ended restriction digestion products), or it may comprise a 5' or 3' overhang of one or more nucleotides that is complementary to the overhang of the restriction digestion products (and can therefore anneal and be ligated to them).
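Whether a restriction digestion product is blunt-ended or carries a 5' or 3' overhang follows from where the enzyme cuts each strand of its recognition site. The sketch below works this out for palindromic sites, for which the bottom-strand cut lies at position len(site) - top_cut in top-strand coordinates. EcoRI (G^AATTC), PstI (CTGCA^G) and EcoRV (GAT^ATC) are used purely as familiar examples; this document does not name specific enzymes.

```python
from typing import NamedTuple, Tuple


class Enzyme(NamedTuple):
    name: str
    site: str     # palindromic recognition sequence, 5'->3'
    top_cut: int  # top-strand cut position, in bases from the site's 5' end


def end_type(enzyme: Enzyme) -> Tuple[str, str]:
    """Classify the ends left by `enzyme`; return (end type, overhang sequence)."""
    bottom_cut = len(enzyme.site) - enzyme.top_cut  # mirror position for a palindrome
    if enzyme.top_cut == bottom_cut:
        return "blunt", ""
    if enzyme.top_cut < bottom_cut:
        # The top-strand cut lies 5' of the bottom-strand cut, so each end
        # carries a single-stranded 5' extension.
        return "5' overhang", enzyme.site[enzyme.top_cut:bottom_cut]
    return "3' overhang", enzyme.site[bottom_cut:enzyme.top_cut]


if __name__ == "__main__":
    for enz in (Enzyme("EcoRI", "GAATTC", 1),
                Enzyme("PstI", "CTGCAG", 5),
                Enzyme("EcoRV", "GATATC", 3)):
        print(enz.name, *end_type(enz))
```

A target region intended to ligate to EcoRI-digested fragments would therefore carry a complementary AATT 5' overhang, whereas one intended for EcoRV products would be blunt-ended.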

The method may comprise preparing two or more separate nucleic acid samples for sequencing, wherein each sample is prepared using a different library of multimeric barcoding reagents (or a different library of multimeric barcode molecules), and wherein the barcode regions of each library of multimeric barcoding reagents (or multimeric barcode molecules) comprise sequences different from the barcode regions of the other libraries. After each sample has been prepared separately for sequencing, the barcoded target nucleic acid molecules from the different samples may be pooled and sequenced together. The sequence read generated from each barcoded target nucleic acid molecule can then be used to identify the library of multimeric barcoding reagents (or multimeric barcode molecules) used in its preparation, and thereby the nucleic acid sample from which it was prepared.
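The pooling-and-demultiplexing step can be sketched as follows, assuming (hypothetically) that each read begins with a library-specific barcode of fixed length and that barcodes are matched exactly; practical pipelines usually tolerate some sequencing errors. The barcode-to-sample table and read layout are illustrative and not taken from this document.

```python
from collections import defaultdict


def demultiplex(reads, library_barcodes, barcode_len):
    """Group reads by sample using an exact match on each read's first
    `barcode_len` bases. `library_barcodes` maps a library barcode sequence
    to the sample prepared with that library; unmatched reads go under None."""
    by_sample = defaultdict(list)
    for read in reads:
        sample = library_barcodes.get(read[:barcode_len])
        by_sample[sample].append(read)
    return dict(by_sample)


if __name__ == "__main__":
    libraries = {"ACGT": "sample_1", "TGCA": "sample_2"}  # hypothetical
    pooled = ["ACGTTTAGGC", "TGCAGGCATT", "ACGTCCGATA", "GGGGAAAAAA"]
    for sample, reads in demultiplex(pooled, libraries, 4).items():
        print(sample, reads)
```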

The invention provides a method of preparing a nucleic acid sample for sequencing, wherein the method comprises the steps of: (a) contacting the nucleic acid sample with a multimeric barcoding reagent and with first and second target primers, wherein each barcode oligonucleotide comprises, in the 5' to 3' direction, a target region and a barcode region; (b) annealing the target region of the first barcode oligonucleotide to a first subsequence of the target nucleic acid, and annealing the target region of the second barcode oligonucleotide to a second subsequence of the target nucleic acid; (c) annealing the first target primer to a third subsequence of the target nucleic acid, wherein the third subsequence is 3' of the first subsequence, and annealing the second target primer to a fourth subsequence of the target nucleic acid, wherein the fourth subsequence is 3' of the second subsequence; (d) extending the first target primer, using the target nucleic acid as a template, until it reaches the first subsequence to produce a first extended target primer, and extending the second target primer, using the target nucleic acid as a template, until it reaches the second subsequence to produce a second extended target primer; and (e) ligating the 3' end of the first extended target primer to the 5' end of the first barcode oligonucleotide to produce a first barcoded target nucleic acid molecule, and ligating the 3' end of the second extended target primer to the 5' end of the second barcode oligonucleotide to produce a second barcoded target nucleic acid molecule, wherein the first and second barcoded target nucleic acid molecules are different, and wherein each barcoded target nucleic acid molecule comprises at least one nucleotide synthesized using the target nucleic acid as a template.

In this method, steps (b) and (c) may be performed simultaneously.
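Steps (b) to (e) of the target-primer method amount to a gap-fill-and-ligate reaction. In the toy string model sketched below (sequences written 5'→3', all hypothetical), the barcode oligonucleotide's 5' target region and the target primer anneal to the same template strand; the primer is extended towards the template's 5' end until it abuts the barcode oligonucleotide, and the two are then joined. The function name is illustrative only, and exact-match annealing is a simplifying assumption.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def gap_fill_and_ligate(template: str, barcode_oligo: str, target_len: int,
                        target_primer: str) -> str:
    """The barcode oligo's 5' target region anneals to the first subsequence,
    the target primer anneals 3' of it (third subsequence), the primer is
    extended up to the barcode oligo, and the two are ligated."""
    first = template.find(revcomp(barcode_oligo[:target_len]))
    third = template.find(revcomp(target_primer))
    if first < 0 or third < 0:
        raise ValueError("an oligo does not anneal to the template")
    first_end = first + target_len
    if third < first_end:
        raise ValueError("the third subsequence must lie 3' of the first")
    # Extension copies template[third - 1] down to template[first_end] and
    # stops where the annealed barcode oligonucleotide begins.
    extended_primer = target_primer + revcomp(template[first_end:third])
    # Ligation joins the extended primer's 3' end to the barcode oligo's 5' end.
    return extended_primer + barcode_oligo


if __name__ == "__main__":
    template = "CATGGA" + "TTCCAG" + "GCTAAC"  # first subsequence, gap, third subsequence
    barcode_oligo = "TCCATG" + "ACGTAC"        # 5' target region + 3' barcode region
    target_primer = "GTTAGC"                   # anneals to the third subsequence
    print(gap_fill_and_ligate(template, barcode_oligo, 6, target_primer))
    # GTTAGCCTGGAATCCATGACGTAC
```

The ligated product is the reverse complement of the whole template span from the first to the third subsequence, followed by the barcode region, so it carries both template-derived sequence and the barcode.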

23. Methods of preparing a nucleic acid sample for sequencing using multimeric barcoding reagents and adapter oligonucleotides

The methods presented below may be performed with any of the kits defined herein.

The invention also provides a method of preparing a nucleic acid sample for sequencing, wherein the method comprises the steps of: (a) contacting the nucleic acid sample with first and second adapter oligonucleotides as defined herein; (b) annealing or ligating the first adapter oligonucleotide to a first target nucleic acid fragment, and annealing or ligating the second adapter oligonucleotide to a second target nucleic acid fragment; (c) contacting the nucleic acid sample with a multimeric barcoding reagent as defined herein; (d) annealing the adapter region of the first adapter oligonucleotide to the adapter region of the first barcode molecule, and annealing the adapter region of the second adapter oligonucleotide to the adapter region of the second barcode molecule; and (e) ligating the 3' end of the first barcode oligonucleotide to the 5' end of the first adapter oligonucleotide to produce a first barcoded-adapter oligonucleotide, and ligating the 3' end of the second barcode oligonucleotide to the 5' end of the second adapter oligonucleotide to produce a second barcoded-adapter oligonucleotide.

The invention also provides a method of preparing a nucleic acid sample for sequencing, wherein the method comprises the steps of: (a) contacting the nucleic acid sample with first and second adapter oligonucleotides as defined herein; (b) ligating the first adapter oligonucleotide to a first target nucleic acid fragment, and ligating the second adapter oligonucleotide to a second target nucleic acid fragment; (c) contacting the nucleic acid sample with a multimeric barcoding reagent as defined herein; (d) annealing the adapter region of the first adapter oligonucleotide to the adapter region of the first barcode molecule, and annealing the adapter region of the second adapter oligonucleotide to the adapter region of the second barcode molecule; and (e) extending the first adapter oligonucleotide, using the barcode region of the first barcode molecule as a template, to produce a first barcoded target nucleic acid molecule, and extending the second adapter oligonucleotide, using the barcode region of the second barcode molecule as a template, to produce a second barcoded target nucleic acid molecule, wherein the first barcoded target nucleic acid molecule comprises a sequence complementary to the barcode region of the first barcode molecule, and the second barcoded target nucleic acid molecule comprises a sequence complementary to the barcode region of the second barcode molecule.

The invention also provides a method of preparing a nucleic acid sample for sequencing, wherein the method comprises the steps of: (a) contacting the nucleic acid sample with first and second adapter oligonucleotides as defined herein; (b) annealing the target region of the first adapter oligonucleotide to a first target nucleic acid fragment, and annealing the target region of the second adapter oligonucleotide to a second target nucleic acid fragment; (c) contacting the nucleic acid sample with a multimeric barcoding reagent as defined herein; (d) annealing the adapter region of the first adapter oligonucleotide to the adapter region of the first barcode molecule, and annealing the adapter region of the second adapter oligonucleotide to the adapter region of the second barcode molecule; and (e) ligating the 3' end of the first barcode oligonucleotide to the 5' end of the first adapter oligonucleotide to produce a first barcoded-adapter oligonucleotide, and ligating the 3' end of the second barcode oligonucleotide to the 5' end of the second adapter oligonucleotide to produce a second barcoded-adapter oligonucleotide.

In this method, the first and second barcoded-adapter oligonucleotides may be extended to produce first and second different barcoded target nucleic acid molecules, each comprising at least one nucleotide synthesized using the target nucleic acid as a template.

Alternatively, the first and second adapter oligonucleotides may be extended to produce first and second different target nucleic acid molecules, each comprising at least one nucleotide synthesized using the target nucleic acid as a template. In this method, step (e) produces a first barcoded target nucleic acid molecule (the first barcode oligonucleotide ligated to the extended first adapter oligonucleotide) and a second barcoded target nucleic acid molecule (the second barcode oligonucleotide ligated to the extended second adapter oligonucleotide).

The step of extending the adapter oligonucleotides may be performed before step (c), before step (d) and/or before step (e), and the first and second adapter oligonucleotides may remain annealed to the first and second barcode molecules until after step (e).

The method may be performed using a library of multimeric barcoding reagents as defined herein, together with adapter oligonucleotides as defined herein for each multimeric barcoding reagent. Preferably, the barcoded-adapter oligonucleotides of a first multimeric barcoding reagent anneal to fragments of a first target nucleic acid and produce first and second different barcoded target nucleic acid molecules, each comprising at least one nucleotide synthesized using the first target nucleic acid as a template; and the barcoded-adapter oligonucleotides of a second multimeric barcoding reagent anneal to fragments of a second target nucleic acid and produce first and second different barcoded target nucleic acid molecules, each comprising at least one nucleotide synthesized using the second target nucleic acid as a template.

The method may be performed using a library of multimeric barcoding reagents as defined herein, together with adapter oligonucleotides as defined herein for each multimeric barcoding reagent. Preferably, the adapter oligonucleotides of a first multimeric barcoding reagent anneal to fragments of a first target nucleic acid and produce first and second different target nucleic acid molecules, each comprising at least one nucleotide synthesized using the first target nucleic acid as a template; and the adapter oligonucleotides of a second multimeric barcoding reagent anneal to fragments of a second target nucleic acid and produce first and second different target nucleic acid molecules, each comprising at least one nucleotide synthesized using the second target nucleic acid as a template.

The barcoded-adapter oligonucleotides may be isolated from the nucleic acid sample after they have annealed to fragments of the target nucleic acid and before the barcoded target nucleic acid molecules are produced. Optionally, the barcoded-adapter oligonucleotides are isolated by capture on a solid support through a streptavidin-biotin interaction.

The barcoded target nucleic acid molecules may be isolated from the nucleic acid sample. Optionally, they are isolated by capture on a solid support through a streptavidin-biotin interaction.

Fig. 5 shows a method of preparing a nucleic acid sample for sequencing using a multimeric barcoding reagent. In this method, first (C1 and G1) and second (C2 and G2) adapter oligonucleotides are annealed to the target nucleic acid in the sample and then used in a primer extension reaction. Each adapter oligonucleotide comprises an adapter region (C1 and C2) that is complementary to, and can therefore anneal to, the 5' adapter region (F1 and F2) of a barcode molecule. Each adapter oligonucleotide also comprises a target region (G1 and G2), which can be used to anneal to the target nucleic acid and then serve as a primer for a primer extension reaction or a polymerase chain reaction. These adapter oligonucleotides may be synthesized with a 5'-terminal phosphate group.

The adapter oligonucleotides (each of which has now been extended to include sequence from the target nucleic acid) are then contacted with a multimeric barcoding reagent comprising first (D1, E1 and F1) and second (D2, E2 and F2) barcode molecules and first (A1 and B1) and second (A2 and B2) barcode oligonucleotides, each barcode oligonucleotide comprising a barcode region (B1 and B2) and a 5' region (A1 and A2). The first and second barcode molecules each comprise a barcode region (E1 and E2), an adapter region (F1 and F2) and a 3' region (D1 and D2), and in this embodiment they are linked together by a connecting nucleic acid sequence (S).

After primer extension, the nucleic acid sample is contacted with the multimeric barcoding reagent, and the 5' adapter region (C1 and C2) of each adapter oligonucleotide can anneal so that it abuts the 3' end of a barcode oligonucleotide at a ligation junction (J1 and J2). The 5' ends of the extended adapter oligonucleotides are then ligated to the 3' ends of the barcode oligonucleotides in the multimeric barcoding reagent, producing ligated base pairs (K1 and K2) where the ligation junctions were previously located. The solution may then be further processed or amplified and used in a sequencing reaction.
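The ligation step of Fig. 5 can be sketched in a toy string model (sequences written 5'→3'). Assuming, as the region labels suggest, that each barcode molecule reads 5'-F-E-D-3' (adapter region, barcode region, 3' region), the barcode oligonucleotide 5'-A-B-3' pairs with D and E, and the adapter region C of the adapter oligonucleotide pairs with F. The two oligonucleotides can be ligated only if they abut with no gap (a nick) on the barcode molecule. All sequences are hypothetical; the adapter's target region G is modelled as an unpaired 3' tail, and its prior extension on the target nucleic acid is omitted.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def nick_ligate(template: str, upstream: str, downstream: str, paired_5prime: int) -> str:
    """Ligate upstream's 3' end to downstream's 5' end on `template`.

    The whole upstream oligo pairs with the template; only the first
    `paired_5prime` bases of downstream do (the rest is a single-stranded tail).
    """
    up_site = template.find(revcomp(upstream))
    down_site = template.find(revcomp(downstream[:paired_5prime]))
    if up_site < 0 or down_site < 0:
        raise ValueError("oligo does not anneal to the template")
    # upstream's 3' end pairs with template[up_site]; downstream's 5' end pairs
    # with template[down_site + paired_5prime - 1]. A ligatable nick needs the
    # latter to be the template base immediately 5' of the former.
    if down_site + paired_5prime != up_site:
        raise ValueError("oligos are not adjacent; a gap would block ligation")
    return upstream + downstream


if __name__ == "__main__":
    F, E, D = "TTGCA", "GACCT", "AAGTC"     # hypothetical barcode molecule regions
    barcode_molecule = F + E + D            # 5'-F-E-D-3'
    barcode_oligo = revcomp(E + D)          # 5'-A-B-3': A pairs with D, B with E
    adapter = revcomp(F) + "GGATCCA"        # 5'-C-G-3': C pairs with F, G is a tail
    print(nick_ligate(barcode_molecule, barcode_oligo, adapter, len(F)))
    # GACTTAGGTCTGCAAGGATCCA
```

Inserting even two extra bases between F and E in the barcode molecule leaves a gap instead of a nick, and the function then refuses to ligate.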

Like the methods shown in Figs. 3 and 4, this method produces barcoded target nucleic acid molecules in which two or more fragments from the nucleic acid sample are labelled by barcode oligonucleotides. In this method, however, the multimeric barcoding reagent need not be present during the step of annealing the target regions to the target nucleic acid fragments, or during the step of extending the annealed target regions with a polymerase. This can be advantageous in some applications, for example when a large number of target sequences is of interest, because target regions that are not tethered to a multimeric barcoding reagent may hybridize to the target nucleic acid more quickly.

24. Methods of preparing a nucleic acid sample for sequencing using multimeric barcoding reagents, adapter oligonucleotides and amplification primers

The invention also provides a method of preparing a nucleic acid sample for sequencing, wherein the method comprises the steps of: (a) contacting the nucleic acid sample with first and second adapter oligonucleotides as defined herein; (b) annealing the target region of the first adapter oligonucleotide to a first target nucleic acid fragment, and annealing the target region of the second adapter oligonucleotide to a second target nucleic acid fragment; (c) contacting the nucleic acid sample with a library of multimeric barcode molecules as defined herein and with first and second extension primers as defined herein; (d) annealing the adapter region of the first adapter oligonucleotide to the adapter region of the first barcode molecule, and annealing the adapter region of the second adapter oligonucleotide to the adapter region of the second barcode molecule; (e) extending the first extension primer, using the barcode region of the first barcode molecule as a template, to produce a first barcode oligonucleotide, and extending the second extension primer, using the barcode region of the second barcode molecule as a template, to produce a second barcode oligonucleotide, wherein the first barcode oligonucleotide comprises a sequence complementary to the barcode region of the first barcode molecule, and the second barcode oligonucleotide comprises a sequence complementary to the barcode region of the second barcode molecule; and (f) ligating the 3' end of the first barcode oligonucleotide to the 5' end of the first adapter oligonucleotide to produce a first barcoded-adapter oligonucleotide, and ligating the 3' end of the second barcode oligonucleotide to the 5' end of the second adapter oligonucleotide to produce a second barcoded-adapter oligonucleotide.

In this method, the first and second barcoded-adapter oligonucleotides may be extended to produce first and second different barcoded target nucleic acid molecules, each comprising at least one nucleotide synthesized using the target nucleic acid as a template.

Alternatively, the first and second adapter oligonucleotides may be extended to produce first and second different target nucleic acid molecules, each comprising at least one nucleotide synthesized using the target nucleic acid as a template. In this method, step (f) produces a first barcoded target nucleic acid molecule (the first barcode oligonucleotide ligated to the extended first adapter oligonucleotide) and a second barcoded target nucleic acid molecule (the second barcode oligonucleotide ligated to the extended second adapter oligonucleotide).

The step of extending the adapter oligonucleotides may be performed before step (c), before step (d), before step (e) and/or before step (f), and the first and second adapter oligonucleotides may remain annealed to the first and second barcode molecules until after step (f).

Before step (c), the extension primers may be annealed to the multimeric barcode molecules. Alternatively, the nucleic acid sample may be contacted with a library of multimeric barcode molecules as defined herein and with separate extension primers as defined herein, and the extension primers may then anneal to the multimeric barcode molecules within the nucleic acid sample. The extension primers may anneal to the multimeric barcode molecules during step (d).

The method may use a library of first and second extension primers; for example, the library may comprise first and second extension primers for each multimeric barcode molecule. Optionally, each extension primer in the library may comprise a second barcode region that differs from the second barcode regions of the other extension primers in the library. Optionally, such a library may comprise at least 2, at least 3, at least 4, at least 5, at least 10, at least 20, at least 50, at least 100, at least 500, at least 1,000, at least 5,000 or at least 10,000 different extension primers.

25. Methods of preparing a nucleic acid sample for sequencing using multimeric barcoding reagents, adapter oligonucleotides and target primers

The invention also provides a method of preparing a nucleic acid sample for sequencing, wherein the method comprises the steps of: (a) contacting the nucleic acid sample with first and second adapter oligonucleotides and with first and second target primers, wherein each adapter oligonucleotide comprises, in the 5' to 3' direction, a target region and an adapter region; (b) annealing the target region of the first adapter oligonucleotide to a first subsequence of the target nucleic acid, and annealing the target region of the second adapter oligonucleotide to a second subsequence of the target nucleic acid; (c) annealing the first target primer to a third subsequence of the target nucleic acid, wherein the third subsequence is 3' of the first subsequence, and annealing the second target primer to a fourth subsequence of the target nucleic acid, wherein the fourth subsequence is 3' of the second subsequence; (d) extending the first target primer, using the target nucleic acid as a template, until it reaches the first subsequence to produce a first extended target primer, and extending the second target primer, using the target nucleic acid as a template, until it reaches the second subsequence to produce a second extended target primer; (e) ligating the 3' end of the first extended target primer to the 5' end of the first adapter oligonucleotide, and ligating the 3' end of the second extended target primer to the 5' end of the second adapter oligonucleotide; (f) contacting the nucleic acid sample with a library of multimeric barcode molecules as defined herein; (g) annealing the adapter region of the first adapter oligonucleotide to the adapter region of the first barcode molecule, and annealing the adapter region of the second adapter oligonucleotide to the adapter region of the second barcode molecule; and (h) extending the first adapter oligonucleotide, using the barcode region of the first barcode molecule as a template, to produce a first barcoded oligonucleotide, and extending the second adapter oligonucleotide, using the barcode region of the second barcode molecule as a template, to produce a second barcoded oligonucleotide, wherein the first barcoded oligonucleotide comprises a sequence complementary to the barcode region of the first barcode molecule, and the second barcoded oligonucleotide comprises a sequence complementary to the barcode region of the second barcode molecule.

In this method, steps (b) and (c) may be performed simultaneously.

In this method, steps (f) to (h) may be performed before steps (d) and (e). In that case, first and second different barcoded target nucleic acid molecules, each comprising at least one nucleotide synthesized using the target nucleic acid as a template, are produced on completion of step (e).

In this method, steps (f) to (h) may be performed after steps (d) and (e). In that case, first and second different barcoded target nucleic acid molecules, each comprising at least one nucleotide synthesized using the target nucleic acid as a template, are produced on completion of step (h).

Fig. 6 shows one way in which this method may be performed. In this example, the target nucleic acid is genomic DNA. It will be understood that the target nucleic acid may be another type of nucleic acid, for example an RNA molecule such as an mRNA molecule.

26. Method of preparing a nucleic acid sample for sequencing using a multimeric barcoding reagent and target primers

The present invention also provides a method of preparing a nucleic acid sample for sequencing, the method comprising the steps of: (a) contacting the nucleic acid sample with first and second barcoded oligonucleotides that are linked together, and with first and second target primers, wherein each barcoded oligonucleotide comprises, in the 5' to 3' direction, a target region and a barcode region; (b) annealing the target region of the first barcoded oligonucleotide to a first sub-sequence of a target nucleic acid, and annealing the target region of the second barcoded oligonucleotide to a second sub-sequence of the target nucleic acid; (c) annealing the first target primer to a third sub-sequence of the target nucleic acid, wherein the third sub-sequence is 3' of the first sub-sequence, and annealing the second target primer to a fourth sub-sequence of the target nucleic acid, wherein the fourth sub-sequence is 3' of the second sub-sequence; (d) using the target nucleic acid as a template, extending the first target primer until it reaches the first sub-sequence to produce a first extended target primer, and extending the second target primer until it reaches the second sub-sequence to produce a second extended target primer; and (e) ligating the 3' end of the first extended target primer to the 5' end of the first barcoded oligonucleotide to produce a first barcoded target nucleic acid molecule, and ligating the 3' end of the second extended target primer to the 5' end of the second barcoded oligonucleotide to produce a second barcoded target nucleic acid molecule, wherein the first and second barcoded target nucleic acid molecules are different and each comprises at least one nucleotide synthesized using the target nucleic acid as a template.
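A similar toy model can illustrate steps (b) to (e). The sketch assumes a single-stranded target written 5'->3', perfect pairing, and hypothetical coordinates and sequences; `extend_and_ligate` is an illustrative helper, not part of any described reagent.

```python
# Toy model of steps (b)-(e); coordinates and sequences are hypothetical.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def extend_and_ligate(target: str, s1: tuple, s3: tuple, barcoded_oligo: str) -> str:
    """Return the barcoded target molecule formed by one primer/oligo pair.

    target: template strand, 5'->3'.
    s1, s3: half-open (start, end) coordinates of the first and third
            sub-sequences; s3 must lie 3' of s1 (at higher coordinates).
    barcoded_oligo: 5'-[target region][barcode region]-3'.
    """
    (s1_start, s1_end), (s3_start, s3_end) = s1, s3
    if s3_start < s1_end:
        raise ValueError("third sub-sequence must lie 3' of the first")
    if not barcoded_oligo.startswith(revcomp(target[s1_start:s1_end])):
        raise ValueError("target region does not pair with the first sub-sequence")
    primer = revcomp(target[s3_start:s3_end])  # step (c): annealed target primer
    # Step (d): copy the template between the two annealed sites; extension
    # stops where the barcoded oligonucleotide is annealed.
    extended = primer + revcomp(target[s1_end:s3_start])
    # Step (e): ligate the extended primer's 3' end to the oligo's 5' end.
    return extended + barcoded_oligo


target = "AACCGGTTACGTTGCA"
first = extend_and_ligate(target, (0, 3), (8, 11), revcomp(target[0:3]) + "ACACAC")
second = extend_and_ligate(target, (4, 7), (12, 15), revcomp(target[4:7]) + "GTGTGT")
```

In this model the 5' portion of each product is the reverse complement of the target from the start of the first (or second) sub-sequence to the end of the third (or fourth), so each barcoded target molecule contains nucleotides copied from the target, while the different barcode regions make the two products distinct.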

27. Method of assembling multimeric barcode molecules by rolling circle amplification

The present invention also provides methods of assembling a library of multimeric barcode molecules from a library of barcode molecules, wherein the barcode molecules are amplified by one or more rolling circle amplification (RCA) processes. In these methods, the barcode molecules may each optionally comprise, in the 5' to 3' direction, a barcode region and an adapter region. Optionally, the barcode molecules may comprise a phosphorylated 5' end that can be ligated to the 3' end of a nucleic acid molecule.

In the method, the barcode molecules in the library are converted into circular form, such that the barcode region and the adapter region of each barcode molecule are contained in a continuous circular nucleic acid molecule. Optionally, this step of converting the barcode molecules into circular form may be performed by an intramolecular single-stranded ligation reaction. For example, barcode molecules comprising a phosphorylated 5' end may be circularized by incubation with a single-stranded nucleic acid ligase (e.g. T4 RNA ligase 1) or with a thermostable single-stranded nucleic acid ligase (e.g. CircLigase, a thermostable single-stranded nucleic acid ligase from Epicentre). Optionally, an exonuclease step may be performed to deplete or degrade molecules that have not been circularized and/or ligated; optionally, the exonuclease step is performed with E. coli exonuclease I or lambda exonuclease.
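The effect of circularization followed by an exonuclease clean-up can be sketched as a filter over a pool of molecules. This is an idealized model under stated assumptions: every 5'-phosphorylated molecule circularizes, no intermolecular concatemers form, and the exonuclease removes every linear molecule because it needs a free end to act on. The `Oligo` class and helper names are hypothetical.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Oligo:
    seq: str                 # written 5'->3'
    phosphorylated_5p: bool  # ligation needs a 5' phosphate to join the ends
    circular: bool = False


def circularize(pool):
    """Idealized intramolecular ligation: every 5'-phosphorylated molecule
    closes into a circle; the others stay linear."""
    return [replace(m, circular=True) if m.phosphorylated_5p else m for m in pool]


def exonuclease_cleanup(pool):
    """Exonucleases need a free end, so only circular molecules survive."""
    return [m for m in pool if m.circular]


def junction_read(m, k):
    """Bases read across the ligation junction of a circle: 3' end, then 5' end."""
    return m.seq[-k:] + m.seq[:k]


barcode, adapter = "ACGTAG", "TTGCA"  # hypothetical barcode and adapter regions
pool = [Oligo(barcode + adapter, True), Oligo("CCATGA" + adapter, False)]
survivors = exonuclease_cleanup(circularize(pool))
print(len(survivors), junction_read(survivors[0], 5))  # 1 TTGCAACGTA
```

The junction read shows that, once the molecule is circular, its adapter region and barcode region run contiguously across the former ends.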

Optionally, the step of converting the barcode molecules into circular form may be performed using a circularization primer. In this embodiment, the barcode molecules comprise a phosphorylated 5' end. The barcode molecules are annealed to a circularization primer comprising one region complementary to the 3' region of the barcode molecule and another region complementary to the 5' region of the barcode molecule, such that the 5' and 3' ends of the barcode molecule are brought into close proximity when annealed along the circularization primer. After the annealing step, the annealed barcode molecules are ligated with a ligase (e.g. T4 DNA ligase), which joins the 3' end of each barcode molecule to its 5' end. Optionally, an exonuclease step may be performed to deplete or degrade molecules that have not been circularized and/or ligated; optionally, the exonuclease step is performed with E. coli exonuclease I or lambda exonuclease.
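One way to express such a circularization primer as a sequence, assuming it pairs with k nucleotides at each terminus of the barcode molecule and with nothing else, is as the reverse complement of the sequence that will span the ligation junction (the 3'-terminal k nucleotides followed by the 5'-terminal k nucleotides). The helper name and toy sequence are hypothetical; practical designs would also consider melting temperature and secondary structure.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def circularization_primer(barcode_molecule: str, k: int) -> str:
    """Hypothetical splint-style primer pairing k nt with each terminus.

    After ligation the circle reads ...[3'-terminal k nt][5'-terminal k nt]...
    across the junction; a primer complementary to that stretch holds both
    ends side by side, with the nick between them at the primer's centre.
    """
    if 2 * k > len(barcode_molecule):
        raise ValueError("k is too large for this molecule")
    junction = barcode_molecule[-k:] + barcode_molecule[:k]
    return revcomp(junction)


molecule = "ACCGTTGACAGGTA"  # hypothetical barcode molecule, 5'->3'
primer = circularization_primer(molecule, 4)
print(primer)  # CGGTTACC
# Complementary to the junction of the circular form, not to the linear molecule:
print(revcomp(primer) in molecule + molecule, revcomp(primer) in molecule)  # True False
```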

After the circularization step, the circularized barcode molecules may be amplified by a rolling circle amplification step. In this process, a primer is annealed to the circularized nucleic acid strand comprising the barcode molecule, and the 3' end of the primer is extended by a polymerase exhibiting strand-displacement activity. For each original circularized barcode molecule, this process can generate a linear (non-circular) multimeric barcode molecule comprising copies of the original circularized barcode molecule, as shown in Fig. 7. In one embodiment, the circularization primer annealed to the barcode molecule may serve as the primer for the rolling circle amplification step. Optionally, after circularization, a separate amplification primer that is at least partially complementary to the circularized barcode molecule may be annealed to it to initiate the rolling circle amplification step.
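The structure of a rolling circle amplification product can be modelled by walking around the circular template. The sketch below is idealized (assumptions: perfect copying, unlimited processivity, no secondary priming), and the helper names are hypothetical. Because the polymerase reads the template toward its 5' end and keeps wrapping around the circle, the product is a tandem repeat of the reverse complement of a rotation of the circle, and the number of complete copies is the product length divided by the circle length.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def revcomp(seq: str) -> str:
    return "".join(COMPLEMENT[b] for b in reversed(seq))


def rolling_circle_product(circle: str, primer_start: int, primer_len: int,
                           product_len: int) -> str:
    """Idealized RCA product (5'->3') primed on circle[primer_start:primer_start+primer_len].

    The polymerase reads the circular template toward its 5' end and, thanks
    to strand displacement, keeps going past the primer's 5' end, so the
    product is the complement of the circle read backwards, repeated.
    """
    n = len(circle)
    first = primer_start + primer_len - 1  # template base paired with the primer's 5' end
    return "".join(COMPLEMENT[circle[(first - i) % n]] for i in range(product_len))


def complete_copies(product_len: int, circle_len: int) -> int:
    """Number of full circle-length units contained in a product."""
    return product_len // circle_len


circle = "ACGTAGTTGCA"  # hypothetical barcode region + adapter region, circularized
product = rolling_circle_product(circle, 2, 4, 3 * len(circle))
print(complete_copies(len(product), len(circle)))  # 3
```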

During the rolling circle amplification step, the primer may be extended by the polymerase, which proceeds along the circularized template until it encounters the 5' end of the amplification primer and/or circularization primer; it then continues to amplify along the circularized template, displacing the 5' end of the primer and, as rolling circle amplification proceeds, the previously amplified strand. After any such amplification step, a purification and/or clean-up step may be performed to isolate the products of the rolling circle amplification. Optionally, the purification and/or clean-up step may comprise a size-selection process, such as a gel-based size-selection process or a solid-phase reversible immobilization (SPRI) size-selection process, for example a magnetic-bead-based SPRI size-selection process. Optionally, amplification products of at least 100, at least 500, at least 1000, at least 2000, at least 5000, at least 10,000, at least 20,000, at least 50,000 or at least 100,000 nucleotides in length may be purified. Optionally, a single-stranded DNA binding protein (e.g. T4 Gene 32 Protein) may be included in the reaction mixture before and/or during any rolling circle amplification step, for example to prevent the circularized template and/or the amplification products from forming secondary structures. During or after any such rolling circle amplification step, the single-stranded DNA binding protein may be removed and/or inactivated, for example by a heat-inactivation step.

Optionally, such a rolling circle amplification process may be performed with phi29 DNA polymerase. Optionally, it may be performed with Bst or Bsm DNA polymerase. Optionally, the rolling circle amplification process may be performed such that the polymerase generates at least one complete copy of the circularized template. Optionally, it may be performed such that the polymerase generates at least 2, at least 3, at least 5, at least 10, at least 50, at least 100, at least 200, at least 500, at least 1000, at least 2000, at least 5000 or at least 10,000 complete copies of the circularized template.

An example of this method is shown in Fig. 7. In the figure, a barcode molecule comprising an adapter region and a barcode region is circularized (e.g. using a single-stranded ligation reaction). A primer is annealed to the resulting circularized product and then extended with a strand-displacing polymerase (e.g. phi29 DNA polymerase). While synthesizing the extension product, the polymerase proceeds once around the circumference of the circularized product and then displaces the original primer in a strand-displacement reaction. The reaction can then continue as a rolling circle amplification process to generate a long, continuous nucleic acid molecule comprising many tandem copies of the circularized sequence, i.e. many tandem copies of the barcode and adapter sequences of the barcode molecule (and/or of sequences complementary to the barcode and adapter sequences).

Multimeric barcode molecules may also be amplified by rolling circle amplification.

28. Method of amplifying multimeric barcode molecules by rolling circle amplification

A) Properties of multimeric barcode molecules

The present invention also provides methods of amplifying multimeric barcode molecules from a library of barcode molecules, wherein the multimeric barcode molecules are amplified by one or more rolling circle amplification (RCA) processes. In these methods, a multimeric barcode molecule comprises at least two barcode molecules linked together in a (single) nucleic acid molecule. Optionally, each barcode region of a barcode molecule may be adjacent to one or more adapter regions; optionally, such an adapter region may be located at the 5' end or at the 3' end of the associated barcode region. Optionally, each barcode region is associated with both a 3' adapter region and a 5' adapter region; optionally, the 3' and 5' adapter regions may comprise different adapter sequences. Optionally, one or more adapter regions may comprise a sequence complementary or identical to the adapter region of an adapter oligonucleotide. Optionally, one or more adapter regions may comprise a sequence complementary or identical to all or part of an extension primer. The multimeric barcode molecules may take any of the forms described herein.

Each multimeric barcode molecule may also comprise (optionally at its 5' end) a forward reagent amplification sequence, which may comprise a sequence complementary or identical to a forward reagent amplification primer. Each multimeric barcode molecule may also comprise (optionally at its 3' end) a reverse reagent amplification sequence, which may comprise a sequence complementary or identical to a reverse reagent amplification primer.
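The layout described above can be represented as a simple string template. The sketch assumes one particular arrangement, 5'-[forward reagent amplification sequence][barcode region][adapter region]...[reverse reagent amplification sequence]-3', with a single shared adapter sequence that never occurs inside a barcode; all sequences and helper names are hypothetical.

```python
def assemble_multimer(barcodes, adapter, fwd_amp, rev_amp):
    """5'-[fwd amp][barcode 1][adapter]...[barcode n][adapter][rev amp]-3'."""
    return fwd_amp + "".join(bc + adapter for bc in barcodes) + rev_amp


def parse_multimer(seq, adapter, fwd_amp, rev_amp):
    """Recover the barcode regions; assumes no barcode contains the adapter."""
    if not (seq.startswith(fwd_amp) and seq.endswith(rev_amp)):
        raise ValueError("missing reagent amplification sequences")
    core = seq[len(fwd_amp):len(seq) - len(rev_amp)]
    *barcodes, tail = core.split(adapter)
    if tail:
        raise ValueError("core must end with an adapter region")
    return barcodes


fwd, rev, adapter = "AATGCGGA", "TCCGTAAC", "CTGTCTCT"  # hypothetical sequences
barcodes = ["ACGTAC", "TTGACC", "GGCATA"]
multimer = assemble_multimer(barcodes, adapter, fwd, rev)
print(parse_multimer(multimer, adapter, fwd, rev))  # ['ACGTAC', 'TTGACC', 'GGCATA']
```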

A multimeric barcode molecule may comprise at least 2, at least 3, at least 4, at least 5, at least 10, at least 20, at least 25, at least 50, at least 75, at least 100, at least 200, at least 500, at least 1000, at least 5000, at least 10^4, at least 10^5 or at least 10^6 different barcode molecules. Any library of multimeric barcode molecules may comprise at least 5, at least 10, at least 20, at least 25, at least 50, at least 75, at least 100, at least 250, at least 500, at least 10^3, at least 10^4, at least 10^5, at least 10^6, at least 10^7, at least 10^8 or at least 10^9 different multimeric barcode molecules.

B) Methods of circularizing multimeric barcode molecules and/or libraries thereof

In methods of amplifying multimeric barcode molecules, the multimeric barcode molecules (and/or libraries thereof) are converted into circular form, such that two or more barcode regions (and, optionally, two or more adapter regions) from a multimeric barcode molecule are contained in a continuous circular nucleic acid molecule. Optionally, this step of converting multimeric barcode molecules into circular form may be performed by an intramolecular single-stranded ligation reaction. For example, multimeric barcode molecules comprising a phosphorylated 5' end may be circularized by incubation with a single-stranded nucleic acid ligase (e.g. T4 RNA ligase 1) or with a thermostable single-stranded nucleic acid ligase (e.g. CircLigase, a thermostable single-stranded nucleic acid ligase from Epicentre), wherein the ligase joins the phosphorylated 5' end of the multimeric barcode molecule to the 3' end of the same molecule. Optionally, an exonuclease step may be performed to deplete or degrade molecules that have not been circularized and/or ligated; optionally, the exonuclease step is performed with E. coli exonuclease I or lambda exonuclease.

Optionally, the step of converting multimeric barcode molecules into circular form may be performed by an intramolecular double-stranded ligation reaction. For example, multimeric barcode molecules comprising double-stranded sequence and a phosphorylated 5' end may have blunt ends, or their ends may optionally be converted into blunt-ended form by an end-blunting reaction. These multimeric barcode molecules may then be converted into circular form by an intramolecular double-stranded ligation reaction with T4 DNA ligase, such that one end of a multimeric barcode molecule is ligated, on one or both strands, to the other end of the same multimeric barcode molecule.

In an alternative embodiment, the step of converting multimeric barcode molecules into circular form may be performed by an intramolecular double-stranded ligation reaction, wherein the ends of the multimeric barcode molecule are generated by a restriction digestion step. In such an embodiment, a multimeric barcode molecule comprising double-stranded sequence comprises, in its 5' and 3' regions, recognition sites for one or more restriction endonucleases. In a digestion reaction, the multimeric barcode molecules are digested with the one or more restriction endonucleases to produce digested multimeric barcode molecules whose ends are restriction-digestion products. These digested multimeric barcode molecules are then optionally purified, for example with a gel-based or bead-based size-selection step. The digested multimeric barcode molecules may then be converted into circular form by an intramolecular double-stranded ligation reaction with T4 DNA ligase, such that the restriction-digested end at one end of a multimeric barcode molecule is ligated to the restriction-digested end at the other end of the same molecule. Optionally, the ends generated by the restriction enzyme may be blunt, may comprise a 3' overhang of one or more nucleotides, or may comprise a 5' overhang of one or more nucleotides.
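As an illustration of why restriction-digested ends can be rejoined, the sketch below models only the top strand of a multimeric barcode molecule flanked by EcoRI sites (GAATTC, cut between G and A). EcoRI is chosen purely as an example enzyme, the bottom strand and its complementary overhang are not modelled, and the helper names are hypothetical. Ligating the two compatible ends regenerates the recognition site across the junction of the circle.

```python
ECORI_SITE = "GAATTC"
ECORI_CUT = 1  # EcoRI cuts the top strand G^AATTC, leaving 5'-AATT overhangs


def digest_top_strand(seq: str, site: str = ECORI_SITE, cut: int = ECORI_CUT) -> str:
    """Top strand left after cutting at the first and last recognition sites."""
    first, last = seq.find(site), seq.rfind(site)
    if first == -1 or first == last:
        raise ValueError("need a recognition site in both the 5' and 3' regions")
    return seq[first + cut:last + cut]


def junction(circle: str, left: int, right: int) -> str:
    """Bases read across the ligation junction of a circular top strand."""
    return circle[-left:] + circle[:right]


multimer = "GAATTC" + "ACGTAGTTGCA" * 2 + "GAATTC"  # hypothetical top strand
fragment = digest_top_strand(multimer)
print(junction(fragment, 1, 5))  # GAATTC
```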

Optionally, the step of converting multimeric barcode molecules into circular form may be performed using a circularization primer. In this embodiment, the multimeric barcode molecules comprise a phosphorylated 5' end. The multimeric barcode molecules are annealed to a circularization primer comprising one region complementary to the 3' region of the multimeric barcode molecule and another region complementary to its 5' region, such that the 5' and 3' ends of the multimeric barcode molecule are brought into close proximity when annealed along the circularization primer. Optionally, the multimeric barcode molecules may comprise a forward reagent amplification sequence at their 5' end and a reverse reagent amplification sequence at their 3' end, and the circularization primer may comprise sequences at least partially complementary to these reagent amplification sequences. Optionally, after the step of annealing the circularization primer to the multimeric barcode molecules or library thereof, excess circularization primer that has not annealed to a multimeric barcode molecule may be depleted from the solution by a clean-up reaction (e.g. a gel-based or bead-based size-selection step, such as a solid-phase reversible immobilization step).

After the circularization-primer annealing step, the annealed multimeric barcode molecules are ligated with a ligase (e.g. T4 DNA ligase), which joins the 3' end of each multimeric barcode molecule to its 5' end, the two ends being held in close proximity along the circularization primer. Optionally, an exonuclease step may be performed to deplete or degrade molecules that have not been circularized and/or ligated; optionally, the exonuclease step is performed with E. coli exonuclease I or lambda exonuclease.

During any step of assembling, amplifying, ligating and/or circularizing barcode molecules and/or multimeric barcode molecules and/or libraries or components thereof, the concentration of such molecules in solution may be kept within a certain range. For example, the concentration of barcode molecules and/or multimeric barcode molecules may be less than 100 nM, less than 10 nM, less than 1 nM, less than 100 pM, less than 10 pM, less than 1 pM, less than 100 fM, less than 10 fM or less than 1 fM. Optionally, during any such step, the concentration of such molecules in solution may allow two or more different barcode molecules and/or multimeric barcode molecules to be attached, concatenated or ligated to each other in solution; optionally, the products of such attachment, concatenation or ligation are then further amplified in an amplification step.
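To give a sense of scale for these concentration ranges, the number of molecules in a given volume is concentration x volume x Avogadro's number. The sketch below applies that relation; the constant and function names are illustrative. Fewer molecules per unit volume means an end is less likely to meet another molecule than the other end of its own molecule, which is the usual reason dilution is used to favour intramolecular reactions such as circularization over intermolecular joining.

```python
AVOGADRO = 6.02214076e23  # molecules per mole (exact SI value)

NANOMOLAR, PICOMOLAR, FEMTOMOLAR = 1e-9, 1e-12, 1e-15  # mol/L
MICROLITRE = 1e-6  # litres


def molecules_in(volume_l: float, conc_molar: float) -> float:
    """Number of molecules = concentration (mol/L) x volume (L) x Avogadro."""
    return conc_molar * volume_l * AVOGADRO


for label, conc in [("1 nM", NANOMOLAR), ("1 pM", PICOMOLAR), ("1 fM", FEMTOMOLAR)]:
    print(label, f"{molecules_in(MICROLITRE, conc):.3g}", "molecules per microlitre")
```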

C the method for cyclic multimeric body molecular bar code) is expanded with rolling circle amplification

After the circularization step, the circularized multimeric barcode molecules are amplified by a rolling circle amplification step. In this method, a primer is annealed to the circularized nucleic acid strand comprising the multimeric barcode molecule, and the 3' end of the primer is extended with a polymerase exhibiting strand-displacement activity. In one embodiment, the circularization primer annealed to the multimeric barcode molecule may serve as the primer for the rolling circle amplification step. Optionally, after circularization, one or more separate amplification primers at least partially complementary to the circularized multimeric barcode molecules may be annealed to them to initiate the rolling circle amplification step. Optionally, oligonucleotides at least partially complementary to one or more adapter regions contained within the multimeric barcode molecule may be used as amplification primers. Optionally, after any step of annealing one or more amplification primers to the circularized multimeric barcode molecules, a clean-up step may be performed to deplete unannealed primers from the solution and/or to isolate the primer-annealed multimeric barcode molecules. Optionally, such a clean-up step may comprise a size-selection step, for example a gel-based or bead-based size-selection step, such as a solid-phase reversible immobilization step.
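If amplification primers complementary to an adapter region are used, each adapter region within a circularized multimeric barcode molecule is a potential priming site. The idealized sketch below (perfect-match pairing only; hypothetical sequences and helper names) finds those sites, including any that would span the ligation junction.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def priming_sites(circle: str, primer: str):
    """0-based positions on a circular template where `primer` pairs perfectly."""
    site = revcomp(primer)             # template sequence the primer pairs with
    n, k = len(circle), len(site)
    wrapped = circle + circle[:k - 1]  # allow sites spanning the ligation junction
    return [i for i in range(n) if wrapped[i:i + k] == site]


adapter = "CTGTCTCT"                       # hypothetical adapter region
barcodes = ["ACGTAC", "TTGACC", "GGCATA"]  # hypothetical barcode regions
circle = "".join(bc + adapter for bc in barcodes)
print(priming_sites(circle, revcomp(adapter)))  # [6, 20, 34]
```

In this model a multimer with n adapter regions offers n priming sites, so several polymerases could in principle start on the same circle.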

During the rolling circle amplification step, each primer may be extended by the polymerase, which proceeds along the circularized template until it encounters the 5' end of an amplification primer and/or circularization primer; it then continues to amplify along the circularized template, displacing the 5' end of the primer and, as rolling circle amplification proceeds, the previously amplified strand. After any such amplification step, a purification and/or clean-up step may be performed to isolate the products of the rolling circle amplification. Optionally, the purification and/or clean-up step may comprise a size-selection process, such as a gel-based size-selection process or a solid-phase reversible immobilization (SPRI) size-selection process, for example a magnetic-bead-based SPRI size-selection process. Optionally, amplification products of at least 100, at least 500, at least 1000, at least 2000, at least 5000, at least 10,000, at least 20,000, at least 50,000 or at least 100,000 nucleotides in length may be purified.

Optionally, such a rolling circle amplification process may be performed with phi29 DNA polymerase. Optionally, it may be performed with Bst or Bsm DNA polymerase. Optionally, the rolling circle amplification process may be performed such that the polymerase generates at least one complete copy of the circularized template. Optionally, it may be performed such that the polymerase generates at least 2, at least 3, at least 5, at least 10, at least 50, at least 100, at least 200, at least 500, at least 1000, at least 2000, at least 5000 or at least 10,000 complete copies of the circularized template.

D) Methods of amplifying multimeric barcode molecules with a second rolling circle amplification process

After any step of amplifying multimeric barcode molecules by rolling circle amplification, a second rolling circle amplification process may be performed. In this process, the products of the first rolling circle amplification step (or portions thereof) are themselves circularized and then used as template molecules for a second (or further) rolling circle amplification step.

For example, in one such embodiment, a library of multimeric barcode molecules is amplified in a first rolling circle amplification step. The resulting products are then converted into double-stranded or partially double-stranded form. For example, a primer may be annealed to the products; optionally, the primer may be complementary or identical to all or part of one or more "reagent amplification sequences" contained in the original multimeric barcoding reagent. Optionally, after such an annealing step, a primer extension step may be performed, in which the 3' end of the primer is extended by at least one nucleotide by a polymerase. Optionally, such primer extension may continue until a complete copy of the relevant multimeric barcode molecule is produced, i.e. until a fully double-stranded molecule is produced. Optionally, such primer extension may be performed with a polymerase lacking strand-displacement, 5'-3' exonuclease and flap endonuclease activity (e.g. Phusion polymerase or T4 DNA polymerase).

The double-stranded region comprising the primer and the reagent amplification sequence to which it is annealed (and, optionally, any primer extension product generated in the primer extension step) may comprise a restriction endonuclease recognition site. The resulting double-stranded or partially double-stranded products may then be digested with that restriction endonuclease, such that the ends of each molecule are ligatable restriction-digested ends. Optionally, the ends generated by the restriction enzyme may be blunt, may comprise a 3' overhang of one or more nucleotides, or may comprise a 5' overhang of one or more nucleotides.

The resulting digested molecules may then be converted into circular form by an intramolecular double-stranded ligation reaction with T4 DNA ligase, such that the restriction-digested end at one end of each molecule is ligated to the restriction-digested end at the other end of the same molecule. Optionally, the restriction-digested multimeric barcode molecules may be diluted in solution before such a ligation reaction. Optionally, the resulting concentration of multimeric barcode molecules may be less than 100 nM, less than 10 nM, less than 1 nM, less than 100 pM, less than 10 pM, less than 1 pM, less than 100 fM, less than 10 fM or less than 1 fM.
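The dilution itself follows C1 x V1 = C2 x V2. A minimal sketch, using a hypothetical 10 nM stock of digested molecules diluted to 100 pM in a 50 µL ligation volume; the function name is illustrative.

```python
def stock_volume_needed(stock_molar: float, final_molar: float, final_volume_l: float) -> float:
    """Solve C1*V1 = C2*V2 for V1, the volume of stock to dilute."""
    if final_molar > stock_molar:
        raise ValueError("dilution cannot raise the concentration")
    return final_molar * final_volume_l / stock_molar


# Hypothetical example: 10 nM stock -> 100 pM in a 50 uL reaction.
final_volume = 50e-6
v_stock = stock_volume_needed(10e-9, 100e-12, final_volume)
v_diluent = final_volume - v_stock
print(f"stock: {v_stock * 1e6:.2f} uL, diluent: {v_diluent * 1e6:.2f} uL")
```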

The resulting circularized molecules may be used in any rolling circle amplification process described in any method herein. Optionally, the whole process of performing a first rolling circle amplification, circularizing the resulting products and then performing a second rolling circle amplification may be repeated two, three, four, five or more times to increase the amount of product ultimately generated.
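The benefit of repeating the cycle can be sketched with a simple multiplicative model. All parameters are hypothetical; in particular, the model assumes that digestion releases one circularizable unit per copy and that a fixed fraction of products re-circularize before each further round. It only illustrates that yield compounds with each additional round.

```python
def units_after_rounds(templates: float, copies_per_template: float,
                       circularization_efficiency: float, rounds: int) -> float:
    """Idealized product yield after `rounds` rolling circle amplifications,
    with a re-circularization step between consecutive rounds."""
    if rounds < 1:
        raise ValueError("at least one round is required")
    units = templates * copies_per_template  # first rolling circle amplification
    for _ in range(rounds - 1):
        # circularize a fraction of the products, then amplify each circle again
        units *= circularization_efficiency * copies_per_template
    return units


# Hypothetical parameters: 100 copies per circle, 50% re-circularization.
for r in (1, 2, 3):
    print(r, units_after_rounds(1, 100, 0.5, r))
```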

E) Methods of processing rolling-circle-amplified multimeric barcode molecules with a primer extension process

After any process of rolling circle amplification of multimeric barcode molecules and/or libraries thereof, the resulting products may be subjected to one or more primer extension processes. The resulting primer extension products may comprise single-stranded nucleic acid molecules comprising all or part of a multimeric barcode molecule and/or all or part of two or more multimeric barcode molecules. In some embodiments, such primer extension products may comprise a library of single-stranded nucleic acid molecules, wherein each single strand comprises a multimeric barcode molecule. In other embodiments, such primer extension products may be annealed, or partially annealed, to the template molecules from which they were synthesized. Optionally, any multimeric barcode molecule generated by any such primer extension process may be used to produce multimeric barcoding reagents and/or libraries thereof. Optionally, any multimeric barcode molecule generated by any such primer extension process may be used to barcode nucleic acid molecules in a nucleic acid sample; optionally, barcode sequences comprised in the multimeric barcode molecule are appended to nucleic acid molecules of the nucleic acid sample.

In such embodiments of the primer extension process, primers complementary or identical to all or part of the forward reagent amplification sequence and/or all or part of the reverse reagent amplification sequence may be used. In one such embodiment, primers at least partially complementary to a reagent amplification sequence contained in the polymerase extension products of the rolling circle amplification reaction may be used to perform one or more primer extension reactions and/or cycles. In one embodiment of the primer extension process, a library of random primers is used, for example random hexamer, random octamer or random decamer primers. Optionally, any primer used in a primer extension process may comprise one or more modifications, such as phosphorothioate bonds, in particular phosphorothioate bonds at the linkages of the one or two 3'-most nucleotides of the primer. Such 3' phosphorothioate bonds can prevent the primer from being degraded by a polymerase exhibiting 3'-5' exonuclease activity.
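Two details above lend themselves to a short sketch: a random hexamer library contains 4^6 = 4096 distinct sequences, and 3'-terminal phosphorothioate linkages are often written with an asterisk after the preceding base (for example in Integrated DNA Technologies ordering syntax; treat this notation as an assumption for any other supplier). The helper name is hypothetical.

```python
from itertools import product

# Every possible hexamer: 4 bases at each of 6 positions.
hexamers = ["".join(p) for p in product("ACGT", repeat=6)]


def with_3prime_phosphorothioates(seq: str, n_bonds: int) -> str:
    """Mark the last n_bonds internucleotide linkages as phosphorothioate,
    writing '*' between the two bases each modified linkage joins."""
    if not 0 <= n_bonds < len(seq):
        raise ValueError("n_bonds must be smaller than the oligo length")
    cut = len(seq) - n_bonds
    return seq[:cut] + "".join("*" + base for base in seq[cut:])


print(len(hexamers))                              # 4096
print(with_3prime_phosphorothioates("ACGTAC", 2))  # ACGT*A*C
```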

Optionally, such primer extension processes may be performed with a polymerase exhibiting 5'-3' exonuclease activity (e.g. DNA polymerase I from E. coli) and/or flap endonuclease activity (e.g. Taq polymerase from Thermus aquaticus), such that nucleic acid sequences annealed immediately downstream of the processing polymerase are degraded or partially digested during the primer extension process.

Optionally, such primer extension processes may be performed with a polymerase exhibiting strand-displacement activity (e.g. phi29 DNA polymerase, Vent polymerase, Deep Vent polymerase or their exonuclease-deficient derivatives (e.g. from New England Biolabs), or Bst or Bsm DNA polymerase), such that nucleic acid sequences annealed immediately downstream of the processing polymerase are displaced during the primer extension process. Optionally, the displaced nucleic acid sequences may comprise other primer extension products generated during the primer extension process. Optionally, the primer extension process may be performed with phi29 DNA polymerase, wherein the primers for the primer extension process comprise random primers.

Any such primer extension process performed with a strand-displacing polymerase may displace regions of the multimeric barcode molecule that comprise one or more adapter regions and/or adapter sequences (and/or nucleic acid strands comprising sequences from the multimeric barcode molecule, such as those generated by such a primer extension process), thereby converting these adapter regions and/or adapter sequences into single-stranded form so that the resulting single-stranded adapter regions can hybridize to complementary sequences (e.g. complementary sequences contained in coupling oligonucleotides, adapter oligonucleotides and/or extension primers). Part of such a strand-displaced molecule may remain annealed to the template molecule from which it was synthesized. A portion of any given strand-displaced nucleic acid molecule synthesized by such a primer extension process may be used to synthesize multimeric barcoding reagents. A portion of any given strand-displaced nucleic acid molecule synthesized by such a primer extension process may be used to barcode nucleic acid molecules in a nucleic acid sample.

Optionally, such primer extension processes may be performed with a polymerase that exhibits neither 5'-3' exonuclease or flap endonuclease activity nor strand-displacement activity (e.g. Pfu and/or Phusion polymerase or derivatives thereof (New England Biolabs), or T4 DNA polymerase), such that the polymerase stops extending when it encounters nucleic acid sequences annealed immediately downstream of it.
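The three polymerase behaviours described above differ only in what happens to a strand annealed immediately downstream of the polymerase. The lookup table below simply restates the examples named in these passages; it is not an exhaustive or authoritative catalogue of enzyme properties, and the enum and helper names are illustrative.

```python
from enum import Enum


class DownstreamFate(Enum):
    DISPLACED = "displaced as a single strand (strand displacement)"
    DEGRADED = "degraded (5'-3' exonuclease or flap endonuclease activity)"
    BLOCKED = "extension stops at the annealed strand"


# Classification as stated in the passages above, not a complete catalogue.
POLYMERASE_BEHAVIOUR = {
    "phi29 DNA polymerase": DownstreamFate.DISPLACED,
    "Vent polymerase": DownstreamFate.DISPLACED,
    "Deep Vent polymerase": DownstreamFate.DISPLACED,
    "Bst DNA polymerase": DownstreamFate.DISPLACED,
    "Bsm DNA polymerase": DownstreamFate.DISPLACED,
    "E. coli DNA polymerase I": DownstreamFate.DEGRADED,
    "Taq polymerase": DownstreamFate.DEGRADED,
    "Pfu polymerase": DownstreamFate.BLOCKED,
    "Phusion polymerase": DownstreamFate.BLOCKED,
    "T4 DNA polymerase": DownstreamFate.BLOCKED,
}


def polymerases_with(fate: DownstreamFate):
    """Alphabetical list of the listed polymerases with the given behaviour."""
    return sorted(name for name, f in POLYMERASE_BEHAVIOUR.items() if f is fate)


for fate in DownstreamFate:
    print(fate.value, "->", ", ".join(polymerases_with(fate)))
```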

Optionally, any such primer extension process may comprise at least 1, at least 5, at least 10, at least 15, at least 20, at least 30, at least 50 or at least 100 primer extension cycles. Optionally, such primer extension cycles may be performed as repeated cycles of primer extension, template denaturation and primer annealing. Optionally, any such primer extension process may be performed in a buffer comprising one or more macromolecular crowding agents (e.g. a polyethylene glycol (PEG) reagent such as PEG 8000).

Optionally, any of the above primer extension processes may produce primer extension products of at least 100, at least 500, at least 1000, at least 2000, at least 5000, at least 10,000, at least 20,000, at least 50,000 or at least 100,000 nucleotides in length. Optionally, such a primer extension process may be performed such that the polymerase generates at least one complete copy of the circularized template. Optionally, such a rolling circle amplification process may be performed such that, during each primer extension process, the polymerase generates at least 2, at least 3, at least 5, at least 10, at least 50, at least 100, at least 200, at least 500, at least 1000, at least 2000, at least 5000 or at least 10,000 copies of the multimeric barcode template.

Optionally, the duration (e.g. in seconds or minutes) of the primer extension reaction may be set so that each primer extension product is about the same length as a single multimeric barcoding reagent in the library. For example, if the polymerase used for the primer extension process synthesizes 1000 nucleotides per minute, and the multimeric barcoding reagents in the library have an average length of 1000 nucleotides, each primer extension cycle may be set to 1 minute.
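The timing rule above amounts to dividing the target product length by the polymerase rate. The following is a minimal sketch that assumes a constant polymerase rate. Real rates vary with enzyme, buffer and temperature. The function name is hypothetical.

```python
def extension_time_minutes(mean_reagent_length_nt: float,
                           polymerase_rate_nt_per_min: float) -> float:
    """Duration of one primer-extension cycle so that each product spans
    roughly one multimeric barcoding reagent (constant-rate assumption)."""
    if polymerase_rate_nt_per_min <= 0:
        raise ValueError("polymerase rate must be positive")
    return mean_reagent_length_nt / polymerase_rate_nt_per_min

# Worked example from the text: 1000 nt reagents, 1000 nt/min polymerase
print(extension_time_minutes(1000, 1000))  # 1.0
```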

Optionally, after one or more primer extension procedures, the resulting primer extension products may be isolated or purified by a clean-up reaction. Optionally, such a clean-up reaction may include a size-selection step, such as gel-based size selection or bead-based size selection (e.g. solid-phase reversible immobilization). Optionally, primer extension products that are at least 100, at least 500, at least 1000, at least 2000, at least 5000, at least 10,000, at least 20,000, at least 50,000 or at least 100,000 nucleotides in length may be purified.

F the method for the polymer molecular bar code of rolling circle amplification and/or primer extend) is handled with degenerative process

Any rolling circle amplification product or primer extension product generated as above may be denatured in a denaturation step. This may be done before or after any purification and/or size-selection step, before use in synthesizing multimeric barcoding reagents, and/or before use in barcoding nucleic acids in a nucleic acid sample. Such a denaturation step may be a heat-denaturation step, in which the products are incubated at high temperature so that annealed sequences and/or secondary structures melt. Such a denaturation step may be carried out at a temperature of at least 60, at least 70, at least 80, at least 90 or at least 95 degrees Celsius. Such a denaturation step may render the regions of the multimeric barcode molecule that comprise one or more adapter regions and/or adapter sequences single-stranded. The resulting single-stranded adapter regions can then hybridize to complementary sequences, such as those comprised in coupling oligonucleotides, adapter oligonucleotides and/or extension primers.

In some alternative embodiments, no such denaturation step is performed before or after any purification and/or size-selection step, before use in synthesizing multimeric barcoding reagents, and/or before use in barcoding nucleic acids in a nucleic acid sample. For example, the nucleic acid strands of the primer extension products generated during the primer extension procedure may remain annealed, or partially annealed, to their template molecules. The resulting nucleic acid molecules may comprise in total at least 2, at least 3, at least 5, at least 10, at least 50, at least 100, at least 500, at least 1000, at least 5000 or at least 10,000 individual nucleic acid strands. Optionally, individual nucleic acid strands may comprise all or part of one or more multimeric barcode molecules. Such nucleic acid molecules and/or libraries thereof may be used to synthesize multimeric barcoding reagents and/or to barcode nucleic acids in a nucleic acid sample.

29. Methods of synthesizing multimeric barcoding reagents

The invention also provides a method of synthesizing a multimeric barcoding reagent for labelling a target nucleic acid, comprising: (a) contacting first and second barcode molecules with first and second extension primers, wherein each barcode molecule comprises a single-stranded nucleic acid comprising, in the 5' to 3' direction, an adapter region, a barcode region and a priming region; (b) annealing the first extension primer to the priming region of the first barcode molecule, and annealing the second extension primer to the priming region of the second barcode molecule; and (c) synthesizing a first barcoded extension product by extending the first extension primer, and synthesizing a second barcoded extension product by extending the second extension primer. The first barcoded extension product comprises a sequence complementary to the barcode region of the first barcode molecule, and the second barcoded extension product comprises a sequence complementary to the barcode region of the second barcode molecule. The first barcoded extension product does not comprise a sequence complementary to the adapter region of the first barcode molecule, and the second barcoded extension product does not comprise a sequence complementary to the adapter region of the second barcode molecule. The first and second barcode molecules are linked together.

Before the step of synthesizing the first and second barcoded extension products, the method may further comprise the steps of: (a) contacting the first and second barcode molecules with first and second blocking primers; and (b) annealing the first blocking primer to the adapter region of the first barcode molecule, and annealing the second blocking primer to the adapter region of the second barcode molecule. In this case, the method further comprises a step of dissociating the blocking primers from the barcode molecules after the barcoded extension products have been synthesized.

In the method, the extension step may be carried out with one or more of the four canonical deoxyribonucleotides excluded from the reaction. The same applies to a second extension step performed after the extension products have been synthesized. Extension then terminates at a position before the adapter region sequence, where that position comprises a nucleotide complementary to the excluded deoxyribonucleotide. The extension step may be carried out with a polymerase lacking 3' to 5' exonuclease activity.
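The effect of omitting a nucleotide can be illustrated with a toy model. This sketch assumes a hypothetical design in which dGTP is omitted, the template's barcode region contains no C, and the first template base of the adapter region is C. It ignores misincorporation and all other reaction chemistry. The template is written 3' to 5', starting just downstream of the annealed extension primer. Function and variable names are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def extend_with_dropout(template_3to5: str, excluded_dntps: set[str]) -> str:
    """Return the product synthesized 5'->3' along a template read 3'->5'
    from the primer. Extension halts at the first template base whose
    complementary incoming nucleotide has been excluded from the reaction."""
    product = []
    for base in template_3to5:
        incoming = COMPLEMENT[base]
        if incoming in excluded_dntps:
            break  # required dNTP is absent: the polymerase stalls here
        product.append(incoming)
    return "".join(product)

# Barcode region "TAGTAGTA" (no C), adapter region "CGGA" (starts with C)
print(extend_with_dropout("TAGTAGTA" + "CGGA", {"G"}))  # ATCATCAT
```

In this model the product copies the whole barcode region but none of the adapter region. This matches the requirement that the barcoded extension product must not contain a sequence complementary to the adapter region.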

The barcode molecules may be provided by a single-stranded multimeric barcode molecule as defined herein.

The barcode molecules may be synthesized by any method as defined herein. The barcode region may uniquely identify each barcode molecule. The barcode molecules may be linked on a nucleic acid molecule. The barcode molecules may be linked together in a ligation reaction. The barcode molecules may be linked together by a step comprising attaching them to a solid support.

Before step (a) as defined above (i.e. contacting the first and second barcode molecules with the first and second extension primers), the first and second barcode molecules may be assembled into a double-stranded multimeric barcode molecule by any method as defined herein. The double-stranded multimeric barcode molecule may then be dissociated to produce a single-stranded multimeric barcode molecule for use in step (a).

The method may further comprise the following steps.

(a) The adapter region of a first adapter oligonucleotide is annealed to the adapter region of the first barcode molecule, and the adapter region of a second adapter oligonucleotide is annealed to the adapter region of the second barcode molecule. The first adapter oligonucleotide further comprises a target region capable of annealing to a first subsequence of the target nucleic acid, and the second adapter oligonucleotide further comprises a target region capable of annealing to a second subsequence of the target nucleic acid.

(b) The 3' end of the first barcoded extension product is ligated to the 5' end of the first adapter oligonucleotide to produce a first barcoded oligonucleotide, and the 3' end of the second barcoded extension product is ligated to the 5' end of the second adapter oligonucleotide to produce a second barcoded oligonucleotide.

Optionally, annealing step (a) may be carried out before the step of synthesizing the first and second barcoded extension products. That synthesis step is then carried out in the presence of a ligase that performs ligation step (b). The ligase may be a thermostable ligase. The extension and ligation reactions may be carried out at more than 37, more than 45 or more than 50 degrees Celsius.

The target regions may comprise different sequences. Each target region may comprise a sequence capable of annealing to only a single subsequence of a target nucleic acid in the nucleic acid sample. Alternatively, each target region may comprise one or more random sequences or one or more degenerate sequences, so that it can anneal to more than one subsequence of the target nucleic acid. Each target region may comprise at least 5, at least 10, at least 15, at least 20, at least 25, at least 50 or at least 100 nucleotides. Preferably, each target region comprises at least 5 nucleotides. Each target region may comprise 5 to 100, 5 to 10, 10 to 20, 20 to 30, 30 to 50, 50 to 100, 10 to 90, 20 to 80, 30 to 70 or 50 to 60 nucleotides. Preferably, each target region comprises 30 to 70 nucleotides. Preferably, each target region comprises deoxyribonucleotides. Optionally, all nucleotides in the target region are deoxyribonucleotides. One or more of these deoxyribonucleotides may be modified deoxyribonucleotides, for example deoxyribonucleotides modified with a biotin moiety, or deoxyuracil nucleotides. Each target region may comprise one or more universal bases (e.g. inosine), one or more modified nucleotides and/or one or more nucleotide analogues.

The adapter region of each adapter oligonucleotide may comprise a constant region. Optionally, all adapter regions of the adapter oligonucleotides annealed to a single multimeric barcoding reagent are substantially identical. The adapter region may comprise at least 4, at least 5, at least 6, at least 8, at least 10, at least 15, at least 20, at least 25, at least 50, at least 100 or at least 250 nucleotides. Preferably, the adapter region comprises at least 4 nucleotides. Preferably, each adapter region comprises deoxyribonucleotides. Optionally, all nucleotides in the adapter region are deoxyribonucleotides. One or more of these deoxyribonucleotides may be modified deoxyribonucleotides, for example deoxyribonucleotides modified with a biotin moiety, or deoxyuracil nucleotides. Each adapter region may comprise one or more universal bases (e.g. inosine), one or more modified nucleotides and/or one or more nucleotide analogues.

For any method involving adapter oligonucleotides, the 3' end of the adapter oligonucleotides may comprise a reversible terminator moiety or a reversible terminator nucleotide (e.g. a 3'-O-blocked nucleotide), for example at the 3'-terminal nucleotide of the target region. During extension and/or extension-and-ligation reactions, this prevents the 3' ends of the adapter oligonucleotides from priming any extension events. Mispriming and other spurious extension events during generation of the barcoded oligonucleotides can thereby be minimized. Before the assembled multimeric barcoding reagents are used, the terminating moiety of the reversible terminator may be removed chemically or by other methods. This allows the target region to be extended along the target nucleic acid template to which it is annealed.

Similarly, for any method involving adapter oligonucleotides, one or more blocking oligonucleotides may be used during the extension and/or extension-and-ligation reactions. These blocking oligonucleotides are complementary to one or more sequences within the target region. They may comprise terminators and/or other moieties at their 3' and/or 5' ends that prevent them from being extended by a polymerase. The blocking oligonucleotides may be designed to be fully or partially complementary to one or more target regions, and to anneal to those target regions before the extension and/or extension-and-ligation reaction. Using blocking oligonucleotides prevents the target regions from annealing to sequences where such annealing is undesirable (e.g. sequence features within the barcode molecules themselves), and so prevents potential mispriming. The blocking oligonucleotides may be designed to achieve particular annealing and/or melting temperatures. Before the assembled multimeric barcoding reagents are used, the blocking oligonucleotides may be removed, for example by thermal denaturation followed by size selection, or by other methods. Removing the blocking oligonucleotides allows the target region to be extended along the target nucleic acid template to which it is annealed.

The method may comprise synthesizing a multimeric barcoding reagent comprising at least 5, at least 10, at least 20, at least 25, at least 50, at least 75 or at least 100 barcode molecules, wherein: (a) each barcode molecule is as defined herein; (b) a barcoded extension product is synthesized from each barcode molecule according to any method as defined herein; and optionally (c) an adapter oligonucleotide is ligated to each barcoded extension product according to any method as defined herein, to produce barcoded oligonucleotides.

The invention also provides a method of synthesizing a library of multimeric barcoding reagents. The method comprises repeating the steps of any method as defined herein to synthesize two or more multimeric barcoding reagents. Optionally, the method comprises synthesizing a library of at least 5, at least 10, at least 20, at least 25, at least 50, at least 75, at least 100, at least 250, at least 500, at least 10^3, at least 10^4, at least 10^5, at least 10^6, at least 10^7, at least 10^8, at least 10^9 or at least 10^10 multimeric barcoding reagents as defined herein. Preferably, the library comprises at least 5 multimeric barcoding reagents as defined herein. Preferably, the barcode regions of each multimeric barcoding reagent differ from the barcode regions of the other multimeric barcoding reagents.

Fig. 8 shows a method of synthesizing a multimeric barcoding reagent for labelling a target nucleic acid. In this method, first (D1, E1 and F1) and second (D2, E2 and F2) barcode molecules are denatured into single-stranded form. Each barcode molecule comprises a nucleic acid sequence containing a barcode region (E1 and E2), and the two barcode molecules are linked by a connecting nucleic acid sequence (S). First and second extension primers (A1 and A2) are annealed to the 3' regions (D1 and D2) of the first and second single-stranded barcode molecules. First and second blocking primers (R1 and R2) are annealed to their 5' adapter regions (F1 and F2). The blocking primers (R1 and R2) may be modified at their 3' ends so that they cannot serve as initiation sites for a polymerase.

A primer extension reaction is then carried out with a polymerase. The extension primers are extended to produce copies (B1 and B2) of the barcode regions (E1 and E2) of the barcode molecules. The primer extension reaction is performed so that the extension products terminate close to the start of the blocking primer sequences, for example by using a polymerase lacking strand-displacement or 5'-3' exonuclease activity. The blocking primers (R1 and R2) are then removed, for example by high-temperature denaturation.

This method therefore produces a multimeric barcoding reagent comprising first and second junctions (J1 and J2) adjacent to the single-stranded adapter regions (F1 and F2). This multimeric barcoding reagent may be used in the method shown in Fig. 5.

The method may further comprise a step of ligating the 3' ends of the first and second barcoded extension products (B1 and B2) generated by the primer extension procedure to first (C1 and G1) and second (C2 and G2) adapter oligonucleotides. Each adapter oligonucleotide comprises an adapter region (C1 and C2) that is complementary to, and can therefore anneal to, the adapter region (F1 and F2) of a barcode molecule. The adapter oligonucleotides may be synthesized to include 5'-terminal phosphate groups.

Each adapter oligonucleotide may also comprise a target region (G1 and G2). This region may be used to anneal the barcoded oligonucleotide to a target nucleic acid. It may also serve, alone or subsequently, as a primer for a primer extension reaction or a polymerase chain reaction. The step of ligating the first and second barcoded extension products to the adapter oligonucleotides produces a multimeric barcoding reagent as shown in Fig. 1, which may be used in the methods shown in Fig. 3 and/or Fig. 4.

Fig. 9 shows a method of synthesizing a multimeric barcoding reagent (as shown in Fig. 1) for labelling a target nucleic acid. In this method, first (D1, E1 and F1) and second (D2, E2 and F2) barcode molecules are denatured into single-stranded form. Each barcode molecule comprises a nucleic acid sequence containing a barcode region (E1 and E2), and the two barcode molecules are linked by a connecting nucleic acid sequence (S). First and second extension primers (A1 and A2) are annealed to the 3' regions (D1 and D2) of the first and second single-stranded barcode molecules. The adapter regions (C1 and C2) of first (C1 and G1) and second (C2 and G2) adapter oligonucleotides are annealed to the 5' adapter regions (F1 and F2) of the first and second barcode molecules. The adapter oligonucleotides may be synthesized to include 5'-terminal phosphate groups.

A primer extension reaction is then carried out with a polymerase. The extension primers are extended to produce copies (B1 and B2) of the barcode regions (E1 and E2) of the barcode molecules. The primer extension reaction is performed so that the extension products terminate close to the adapter region sequences (C1 and C2), for example by using a polymerase lacking strand-displacement or 5'-3' exonuclease activity.

A ligase is then used to ligate the 5' end of each adapter oligonucleotide to the adjacent 3' end of the corresponding extension product. In another embodiment, the ligase may be included together with the polymerase in a single reaction. In that reaction, primer extension and ligation of the resulting products to the adapter oligonucleotides take place simultaneously. The barcoded oligonucleotides obtained in this way may then serve as primers for primer extension or polymerase chain reactions, for example in the methods shown in Fig. 3 and/or Fig. 4.

30. Methods of sequencing and/or processing sequencing data

The invention provides a method of sequencing target nucleic acids of a circulating microparticle, wherein the circulating microparticle comprises at least two target nucleic acid fragments. The method comprises the steps of: (a) preparing a sample for sequencing, comprising linking at least two of the at least two target nucleic acid fragments to produce a set of at least two linked target nucleic acid fragments; and (b) sequencing each of the linked fragments in the set to produce at least two (informatically) linked sequence reads.

The invention provides a method of sequencing genomic DNA of a circulating microparticle, wherein the circulating microparticle comprises at least two genomic DNA fragments. The method comprises the steps of: (a) preparing a sample for sequencing, comprising linking at least two of the at least two genomic DNA fragments to produce a set of at least two linked genomic DNA fragments; and (b) sequencing each of the linked fragments in the set to produce at least two (informatically) linked sequence reads.

The invention provides a method of sequencing target nucleic acids of a circulating microparticle, comprising: (a) linking at least two target nucleic acid fragments from a (single) circulating microparticle to produce a set of at least two linked target nucleic acid fragments; and (b) sequencing each of the linked fragments in the set to produce at least two (informatically) linked sequence reads.

The invention provides a method of sequencing circulating-microparticle genomic DNA, comprising: (a) linking at least two genomic DNA fragments from a (single) circulating microparticle to produce a set of at least two linked circulating-microparticle genomic DNA fragments; and (b) sequencing each of the linked fragments in the set to produce at least two (informatically) linked sequence reads.

The invention also provides a method of sequencing a sample that has been prepared by any method of preparing a nucleic acid sample for sequencing as defined herein. The method comprises the following steps. The barcoded target nucleic acid molecules are isolated. A sequence read is then generated for each barcoded target nucleic acid molecule, comprising the barcode region, the target region and at least one additional nucleotide from the target nucleic acid. Each sequence read may comprise at least 5, at least 10, at least 25, at least 50, at least 100, at least 250, at least 500, at least 1000, at least 2000, at least 5000 or at least 10,000 nucleotides from the target nucleic acid. Preferably, each sequence read comprises at least 5 nucleotides from the target nucleic acid.

The method may produce sequence reads from one or more barcoded target nucleic acid molecules generated from at least 10, at least 100, at least 10^3, at least 10^4, at least 10^5, at least 10^6, at least 10^7, at least 10^8 or at least 10^9 different target nucleic acids.

Sequencing may be performed by any method known in the art, for example chain-termination (Sanger) sequencing. Preferably, sequencing is performed by a next-generation sequencing method. Examples include sequencing by synthesis, sequencing by synthesis with reversible terminators (e.g. Illumina sequencing), pyrosequencing (e.g. 454 sequencing) and sequencing by ligation (e.g. SOLiD sequencing). Single-molecule sequencing (e.g. single-molecule real-time (SMRT) sequencing, Pacific Biosciences) or nanopore sequencing (e.g. on the MinION or PromethION platforms, Oxford Nanopore Technologies) may also be used.

The invention also provides a method of processing sequencing data obtained by any method as defined herein. The method of processing sequence data comprises the steps of: (a) identifying, in each sequence read, the sequence of the barcode region and the sequence from the target nucleic acid; and (b) using the information from step (a) to determine a group of sequences from target nucleic acids that were labelled with barcode regions from the same multimeric barcoding reagent.
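Steps (a) and (b) can be sketched as follows. Two things here are hypothetical assumptions for illustration only. The first is the read layout: a fixed-length barcode, immediately followed by a known adapter sequence, then target sequence. The second is the lookup table from barcode sequence to multimeric barcoding reagent. In practice the layout depends on the reagent design and the sequencing chemistry, and the function names are likewise hypothetical.

```python
from collections import defaultdict

def parse_read(read, barcode_len, adapter):
    """Step (a): split a read into (barcode, target), assuming the
    hypothetical layout barcode + adapter + target. Returns None if the
    adapter does not start immediately after the barcode."""
    if not read.startswith(adapter, barcode_len):
        return None
    return read[:barcode_len], read[barcode_len + len(adapter):]

def group_by_reagent(reads, barcode_len, adapter, barcode_to_reagent):
    """Step (b): collect target sequences whose barcodes come from the
    same multimeric barcoding reagent."""
    groups = defaultdict(list)
    for read in reads:
        parsed = parse_read(read, barcode_len, adapter)
        if parsed is None:
            continue  # layout not recognised: skip the read
        barcode, target = parsed
        reagent = barcode_to_reagent.get(barcode)
        if reagent is not None:  # barcodes absent from the table are ignored
            groups[reagent].append(target)
    return dict(groups)

reads = ["AAAAGGCCTTTACG", "CCCCGGCCGATT", "AAAAGGCCGGGA", "TTTTGGCCAAAA"]
table = {"AAAA": "P1", "CCCC": "P1", "GGGG": "P2"}  # two barcodes on one reagent
print(group_by_reagent(reads, 4, "GGCC", table))
# {'P1': ['TTTACG', 'GATT', 'GGGA']}
```

Because the table maps several different barcodes to one reagent, reads carrying different barcodes (here `AAAA` and `CCCC`) end up in the same group. Reads are grouped by reagent, not by identical barcode.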

The method may further comprise determining a target nucleic acid sequence by analysing the group of sequences to determine a contiguous sequence, wherein the target nucleic acid sequence comprises nucleotides from at least two sequence reads.
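Determining a contiguous sequence from such a group is an assembly problem. The following is a deliberately simplified sketch. It assumes error-free reads from one strand that overlap exactly, and merges them greedily by suffix-prefix overlap. The function names are hypothetical, and practical assemblers also handle sequencing errors, repeats and reverse complements.

```python
def overlap(a, b, min_len):
    """Length of the longest suffix of `a` equal to a prefix of `b`,
    or 0 if that length is below `min_len`."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0

def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of sequences with the largest overlap
    until no pair overlaps by at least `min_len` nucleotides."""
    contigs = list(reads)
    while len(contigs) > 1:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # remaining sequences do not overlap
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for idx, c in enumerate(contigs)
                   if idx not in (best_i, best_j)] + [merged]
    return contigs

print(greedy_assemble(["ACGTAC", "TACGGA", "GGATTC"]))  # ['ACGTACGGATTC']
```

The resulting contig contains nucleotides from all three reads. This is the sense in which the text describes a target nucleic acid sequence comprising nucleotides from at least two sequence reads.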

The invention also provides an algorithm for processing (or analysing) sequencing data obtained by any method as defined herein. The algorithm may be configured to perform any method of processing sequencing data as defined herein. The algorithm may be used to detect the sequence of the barcode region in each sequence read and the sequence from the target nucleic acid in each sequence read, and to sort these sequences into two related data sets.

The invention also provides a method of generating a synthetic long read from a target nucleic acid, comprising the steps of: (a) preparing a nucleic acid sample for sequencing according to any method as defined herein; (b) sequencing the sample, optionally by any method as defined herein; and (c) processing the sequence data obtained in step (b), optionally according to any method as defined herein. Step (c) generates a synthetic long read comprising at least one nucleotide from each of at least two sequence reads.

The method may enable phasing of target sequences of a target nucleic acid molecule. That is, it may make it possible to determine on which copy of a chromosome (i.e. paternal or maternal) a sequence is located. The target sequences may comprise a specific target mutation, translocation, deletion or amplification, and the method may be used to assign the mutation, translocation, deletion or amplification to a specific chromosome. Phasing two or more target sequences may also enable detection of aneuploidy.

The synthetic long read may comprise at least 50, at least 100, at least 250, at least 500, at least 750, at least 1000, at least 2000, at least 10^4, at least 10^5, at least 10^6, at least 10^7 or at least 10^8 nucleotides. Preferably, the synthetic long read comprises at least 50 nucleotides.

The invention also provides a method of sequencing two or more co-localized target nucleic acids, comprising the steps of: (a) preparing a nucleic acid sample for sequencing according to any method as defined herein; (b) sequencing the sample, optionally by any method as defined herein; and (c) processing the sequence data obtained in step (b), optionally according to any method as defined herein. Step (c) identifies at least two sequence reads comprising nucleotides from at least two target nucleic acids that were co-localized in the sample.

Any method of analysing barcoded or linked nucleic acid molecules by sequencing may comprise redundant sequencing. In redundant sequencing, the target nucleic acid molecules (e.g. barcoded molecules from a barcoding reaction) are sequenced two or more times in the sequencing reaction. Optionally, each such molecule prepared from the sample may be sequenced on average at least 2, at least 3, at least 5, at least 10, at least 20, at least 50 or at least 100 times.

In any method of analysing barcoded nucleic acid molecules by sequencing, an error-correction process may be used. The process may comprise the steps of: (i) identifying, from a sequencing data set, two or more sequence reads comprising the same barcode sequence; and (ii) comparing the sequences of the two or more sequence reads with each other. Optionally, the process may further comprise the step of: (iii) determining the majority, most common and/or most probable nucleotide at each position in the sequence reads and/or in the sequence of the target nucleic acid molecule. This step optionally comprises establishing a consensus sequence for each target nucleic acid sequence by any process of error correction, error removal, error detection, error counting or statistical error removal. This step may also comprise collapsing multiple sequence reads comprising the same barcode sequence into a single error-corrected representative read. Optionally, any step of identifying two or more sequence reads comprising the same barcode sequence from a sequencing data set may comprise identifying sequence reads whose barcode sequences share at least a given degree of nucleotide identity and/or sequence similarity. For example, at least 70%, at least 80%, at least 90% or at least 95% sequence similarity may be required, allowing mismatches and/or insertions or deletions at any position between the barcode sequences.
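The error-correction process can be sketched in Python under several simplifying assumptions:
- Barcode similarity is the fraction of identical positions between equal-length barcodes. This counts mismatches only, so the insertions and deletions the text allows are not handled.
- Reads are given as (barcode, target) pairs.
- Target sequences within a group are already aligned and of equal length.

All names are hypothetical.

```python
from collections import Counter, defaultdict

def similarity(a, b):
    """Fraction of identical positions between equal-length barcodes
    (mismatches only; indels are not modelled in this sketch)."""
    if len(a) != len(b):
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)

def cluster_barcodes(barcodes, min_similarity=0.9):
    """Step (i): assign each barcode to the first seed it matches at or
    above `min_similarity`; unmatched barcodes become new seeds."""
    seeds, assignment = [], {}
    for bc in barcodes:
        for seed in seeds:
            if similarity(bc, seed) >= min_similarity:
                assignment[bc] = seed
                break
        else:
            seeds.append(bc)
            assignment[bc] = bc
    return assignment

def consensus(seqs):
    """Step (iii): majority nucleotide at each position of aligned,
    equal-length sequences (ties go to the first nucleotide seen)."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*seqs))

def error_correct(reads, min_similarity=0.9):
    """Collapse reads sharing a (near-)identical barcode into one
    representative read: {seed barcode: consensus target}."""
    assignment = cluster_barcodes([bc for bc, _ in reads], min_similarity)
    groups = defaultdict(list)
    for bc, target in reads:
        groups[assignment[bc]].append(target)
    return {seed: consensus(targets) for seed, targets in groups.items()}

reads = [("ACGTACGTAA", "GATTACA"),
         ("ACGTACGTAA", "GATTACA"),
         ("ACGTACGTAT", "GATCACA"),   # 1 barcode mismatch, 1 target error
         ("TTTTTTTTTT", "CCCC")]
print(error_correct(reads))
# {'ACGTACGTAA': 'GATTACA', 'TTTTTTTTTT': 'CCCC'}
```

With 10-nucleotide barcodes, the 90% threshold tolerates one mismatch. The third read is therefore pooled with the first two, and its target error at position 4 is outvoted.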

In any method of analysing barcoded nucleic acid molecules by sequencing, an alternative error-correction process may be used. It comprises the steps of: (i) identifying, from a sequencing data set, two or more sequence reads comprising the same target nucleic acid sequence, where those reads also comprise two or more different barcode sequences that derive from the same multimeric barcode molecule and/or multimeric barcoding reagent; and (ii) comparing the sequences of the two or more sequence reads with each other. Optionally, the error-correction process may further comprise the step of: (iii) determining the majority, most common and/or most probable nucleotide at each position in the sequence of the target nucleic acid molecule. This step optionally comprises establishing a consensus sequence of the target nucleic acid molecule by any process of error correction, error removal, error detection, error counting or statistical error removal. This step may also comprise collapsing multiple sequence reads comprising the same target nucleic acid molecule into a single error-corrected representative read. The target nucleic acid molecule may comprise, for example, a genomic DNA sequence. Optionally, any step of comparing two barcode sequences, and/or comparing a sequenced barcode sequence with a reference barcode sequence, may comprise identifying sequences that share at least a given degree of nucleotide identity and/or sequence similarity. For example, at least 70%, at least 80%, at least 90% or at least 95% sequence similarity may be required, allowing mismatches and/or insertions or deletions at any position between the barcode sequences.

31. Methods of determining and analysing sets of linked sequence reads from microparticles

The invention provides a method of determining sets of linked sequence reads of target nucleic acid (e.g. genomic DNA) fragments from a single microparticle, comprising the steps of: (a) analysing a sample according to any method described herein; and (b) determining a set of two or more linked sequence reads.

A set of two or more linked sequence reads may be determined by identifying sequence reads that comprise the same barcode sequence.

A set of two or more linked sequence reads may be determined by identifying sequence reads that comprise different barcode sequences from the same set of barcode sequences.

A set of two or more linked sequence reads may be determined by identifying sequence reads that comprise barcode sequences of barcode regions from the same multimeric barcoding reagent.

Two or more linked sequence reads may be determined by identifying sequence reads that comprise two or more non-overlapping segments of the same sequenced molecule.

A set of two or more linked sequence reads may be determined by identifying the spatial proximity of two or more sequence reads within the sequencer in which they were sequenced. Optionally, this spatial proximity is determined using a cut-off value or threshold, or by identifying non-random proximity or proximity greater than the average. Optionally, the spatial proximity is expressed as a quantitative, semi-quantitative or range value corresponding to different degrees of spatial proximity within the sequencer.
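One way to sketch proximity-based linking, assuming each read carries the (x, y) position of its cluster on the flow cell and that a fixed distance cut-off (the hypothetical `max_dist`) defines "proximal": any two reads within the cut-off are joined, and the linked sets are the resulting connected groups (single-linkage, implemented with union-find). The all-pairs loop is quadratic and is meant only as an illustration.

```python
import math


def group_by_proximity(coords, max_dist):
    """Return groups of read indices whose clusters are chained within max_dist.

    coords: list of (x, y) cluster positions, one per read.
    """
    parent = list(range(len(coords)))

    def find(i):
        # Follow parent pointers to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            if math.dist(coords[i], coords[j]) <= max_dist:
                parent[find(i)] = find(j)  # merge the two groups

    groups = {}
    for i in range(len(coords)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())
```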

The method may comprise determining at least 3, at least 5, at least 10, at least 50, at least 100, at least 1000, at least 10,000, at least 100,000 or at least 1,000,000 sets of linked sequence reads.

The invention provides a method of determining the total number of sets of linked sequence reads in a sequence data set, comprising: (a) analyzing a sample according to any method described herein; and (b) determining the number of sets of linked sequence reads.

The number of sets of linked sequence reads may be determined by counting the number of different barcode sequences represented in the sequence reads.

The number of sets of linked sequence reads may be determined by counting the number of sets of barcode sequences whose barcode sequences are present in the sequence reads.

The number of sets of linked sequence reads may be determined by counting the number of multimeric barcoding reagents whose barcode regions have barcode sequences present in the sequence reads.

Optionally, these counting methods include only barcode sequences that are represented in the sequence data set at least 2, at least 3, at least 5, at least 10, at least 20, at least 50 or at least 100 times. Optionally, the sequence reads and/or barcode sequences are processed by an error-correction method before counting. Optionally, before counting, technical duplicate reads that are represented more than once across the sequence data set are collapsed into a single read during a de-duplication process.
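A minimal sketch of the counting step, assuming reads are (barcode, insert sequence) pairs that have already been error-corrected, and treating reads with an identical barcode and an identical insert as technical duplicates. The helper name and the `min_reads` default are illustrative.

```python
from collections import Counter


def count_linked_sets(reads, min_reads=2):
    """Count barcodes (linked sets) supported by at least min_reads unique reads.

    reads: iterable of (barcode, insert_sequence) pairs.
    Exact (barcode, insert) duplicates are collapsed to a single read first.
    """
    unique_reads = set(reads)  # de-duplication of technical duplicates
    reads_per_barcode = Counter(barcode for barcode, _ in unique_reads)
    return sum(1 for n in reads_per_barcode.values() if n >= min_reads)
```

Collapsing duplicates before applying `min_reads` matters: a barcode seen twice only because the same molecule was read twice should not pass a two-read minimum.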

The method may comprise counting or estimating the total number of sets of linked sequence reads, wherein two or more nucleic acid sequences comprising target nucleic acid (e.g. genomic DNA) fragments from a microparticle are linked to each other within sequences comprised in the sequence data set, and the number of sequence reads in the data set that contain at least two different target nucleic acid segments is counted, thereby determining the number of sets of linked sequence reads in the sequence data set. Optionally, the total number of sequenced molecules in the sequence data set is counted, thereby determining the number of sets of linked sequence reads in the sequence data set. Optionally, only sequenced molecules comprising at least 3, at least 5, at least 10 or at least 50 different target nucleic acid segments are counted.

The method may comprise counting or estimating the total number of sets of linked sequence reads, wherein the sequences of a set are informatically linked by their spatial proximity within the sequencer, and wherein the total number of sequenced molecules in the sequence data set is counted, thereby determining the number of sets of linked sequence reads in the sequence data set. Optionally, the total number of sequenced molecules in the sequence data set is counted and then divided by a constant normalization factor, thereby determining the number of sets of linked sequence reads in the sequence data set.
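Both molecule-based estimates can be sketched as follows, assuming each sequenced molecule has already been split into the target segments it contains and each segment labelled with an identifier (for example its mapped locus). The normalization factor is a user-supplied constant; no particular value is implied.

```python
def count_sets_by_segments(molecules, min_segments=2):
    """Count sequenced molecules carrying at least min_segments distinct target segments.

    molecules: iterable of lists of segment identifiers, one list per molecule.
    """
    return sum(1 for segments in molecules if len(set(segments)) >= min_segments)


def estimate_sets_by_proximity(n_sequenced_molecules, normalization_factor):
    """Estimate the linked-set count as total sequenced molecules / a constant factor."""
    if normalization_factor <= 0:
        raise ValueError("normalization_factor must be positive")
    return n_sequenced_molecules / normalization_factor
```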

The invention provides a method of determining a parameter value from a set of linked sequence reads, the method comprising the steps of: (a) determining a set of linked sequence reads according to any method described herein; (b) mapping each sequence of the set of linked sequence reads (or at least part of it) to one or more reference nucleotide sequences; and (c) determining the parameter value by counting, or identifying the presence of, the one or more reference nucleotide sequences within the set of linked sequence reads.

Optionally, the reference sequence may comprise a whole genome, a whole chromosome, part of a chromosome, a gene, part of a gene, any other part of a genome, or any other synthetic or actual sequence. The reference sequence may comprise a transcript, part of a transcript, a transcript isoform or part of a transcript isoform; the reference sequence may comprise a splice junction of a transcript. The reference sequence may be from a human genome. The reference sequence may be from one or more different reference human genome sequences, for example from a library of two or more different reference human genome sequences, or from a library of reference human genome sequences phased into two or more different haplotypes (for example, different genome sequences from the International HapMap Project and/or the 100 Genomes Project).

Optionally, the one or more reference sequences may comprise pseudo-reference sequences, i.e. reference sequences comprising one or more nucleotides that differ from a normal or canonical reference sequence (such as a human genome reference sequence). For example, a pseudo-reference sequence may comprise one or more sequences generated by a molecular conversion process (such as a bisulfite conversion process or an oxidative bisulfite conversion process). A pseudo-reference sequence may comprise one or more nucleotides corresponding to cytosine nucleotide positions in the canonical reference genome sequence, wherein the pseudo-reference sequence comprises one or more modified and/or variant nucleotides at those positions. Optionally, the pseudo-reference sequence may comprise nucleotides at the cytosine positions that correspond to different molecular conversion profiles, i.e. to the different sequences produced during a molecular conversion process (such as bisulfite conversion or oxidative bisulfite conversion), for example where the different sequences arise as a function of whether the cytosine position comprises an unmethylated, methylated and/or hydroxymethylated cytosine. Optionally, the sequences obtained after the molecular conversion process map differentially to the reference sequences as a function of their methylation and/or hydroxymethylation state.
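A small sketch of generating a pseudo-reference in silico for one strand, assuming the standard conversion chemistry: bisulfite converts unmodified cytosine to uracil (read as T) while 5-methylcytosine and 5-hydroxymethylcytosine still read as C; oxidative bisulfite additionally converts 5-hydroxymethylcytosine, so only 5-methylcytosine still reads as C. The function name and the position-to-modification dictionary are illustrative.

```python
def convert_reference(seq, modifications, oxidative=False):
    """Return the expected converted sequence of one strand.

    seq: reference sequence (0-based positions).
    modifications: dict position -> "5mC" or "5hmC"; any C not listed is
        treated as unmodified.
    oxidative: if True, model oxidative bisulfite (5hmC is also converted).
    """
    out = []
    for i, base in enumerate(seq.upper()):
        if base != "C":
            out.append(base)
            continue
        mod = modifications.get(i)
        protected = mod == "5mC" or (mod == "5hmC" and not oxidative)
        out.append("C" if protected else "T")  # converted C is read as T
    return "".join(out)
```

Comparing the two outputs for the same locus shows why paired pseudo-references can separate the marks: a position that reads C after bisulfite but T after oxidative bisulfite carries 5-hydroxymethylcytosine in this model.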

Optionally, the one or more reference sequences may comprise sequences that are present exclusively, found preferentially, or found at high and/or above-average levels in a particular tissue (i.e. a particular cell type) and/or in a particular diseased tissue. Optionally, the one or more reference sequences may be present exclusively, found preferentially, or found at high and/or above-average levels in non-maternal and/or paternal tissue. Optionally, the one or more reference sequences may be so found in maternal tissue. Optionally, the one or more reference sequences may be so found in one or more particular tissue types (for example lung tissue, pancreatic tissue or lymphocytes). Optionally, the one or more reference sequences may be so found in a particular type of diseased tissue (such as cancerous tissue, for example cancerous lung tissue or colorectal cancer tissue, or non-cancerous diseased tissue, such as infarcted or damaged cardiac muscle tissue, damaged cerebrovascular tissue, or placental tissue undergoing eclampsia or pre-eclampsia). Optionally, the one or more reference sequences may be so found in a particular type of healthy tissue (for example healthy lung tissue, healthy pancreatic tissue or healthy lymphocytes).

Optionally, any one or more reference sequences comprising sequences that are present exclusively, found preferentially, or found at high and/or above-average levels in a particular tissue (i.e. a particular cell type) and/or in a particular diseased tissue may be established by empirical measurement and/or evaluation methods. Optionally, the expression (e.g. RNA level) of one or more transcripts may be measured in two or more different tissue types (for example diseased tissue and healthy tissue) to establish one or more transcripts that are present exclusively, found preferentially, or found at high and/or above-average levels in one of the different tissue types. Optionally, the level of 5-methylcytosine (or, similarly, 5-hydroxymethylcytosine) of one or more genes (or, for example, gene promoters) may be measured in two or more different tissue types (for example diseased tissue and healthy tissue) to establish one or more methylated (or hydroxymethylated) genes or gene promoters that are present exclusively, found preferentially, or found at high and/or above-average levels in one of the different tissue types. Optionally, the DNase accessibility and/or chromatin openness (measured, for example, by ATAC-seq) of one or more genes (or, for example, gene promoters) may be measured in two or more different tissue types (for example diseased tissue and healthy tissue) to establish one or more DNase-accessible (and/or open-chromatin) genes or gene promoters that are present exclusively, found preferentially, or found at high and/or above-average levels in one of the different tissue types.

The reference nucleotide sequence may comprise a sequence corresponding to a chromosome or to part of a chromosome. Optionally, the sequence is at least 1, at least 10, at least 100, at least 1000, at least 10,000, at least 100,000, at least 1,000,000, at least 10,000,000 or at least 100,000,000 nucleotides in length.

The reference nucleotide sequence may comprise two or more sequences corresponding to two or more chromosomes, or two or more sequences corresponding to two or more parts of one or more chromosomes. Optionally, these sequences are each at least 1, at least 10, at least 100, at least 1000, at least 10,000, at least 100,000, at least 1,000,000, at least 10,000,000 or at least 100,000,000 nucleotides in length. Optionally, the reference sequence may comprise a whole-genome sequence.

The reference nucleotide sequence may comprise one or more sliding windows, wherein each window comprises a span of a genomic region of finite length, and wherein two or more windows are offset from each other by a finite number of nucleotides along the genomic region. Optionally, these sliding windows may be partially overlapping, immediately adjacent to each other, or separated by a span of a certain number of nucleotides.
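A sketch of sliding-window generation over one genomic region, using 0-based, half-open coordinates (an assumption of this sketch). A step smaller than the window size gives partially overlapping windows, a step equal to it gives immediately adjacent windows, and a larger step leaves a gap of `step - size` nucleotides between windows.

```python
def sliding_windows(region_start, region_end, size, step):
    """Yield (start, end) half-open windows of fixed size inside [region_start, region_end).

    Windows that would run past region_end are not emitted.
    """
    if size <= 0 or step <= 0:
        raise ValueError("size and step must be positive")
    start = region_start
    while start + size <= region_end:
        yield (start, start + size)
        start += step
```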

The reference nucleotide sequence may comprise a repeat sequence. Optionally, the repeat sequence comprises a dinucleotide, trinucleotide, tetranucleotide or pentanucleotide repeat. Optionally, the reference nucleotide sequence comprises a series of two or more immediately adjacent copies of the same repeat unit, for example 2, 5, 8, 10, 15, 20, 30, 40, 50 or 100 immediately adjacent copies.
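A sketch for locating runs of immediately adjacent copies of one repeat unit with Python's `re` module. The repeat unit and the minimum copy number are parameters chosen by the user (for example a CA dinucleotide with at least 5 copies); matches are greedy and non-overlapping.

```python
import re


def find_tandem_repeats(seq, unit, min_copies):
    """Return (start, end, copies) for each run of >= min_copies adjacent copies of unit."""
    # Builds, e.g., "(?:CA){5,}" for unit "CA" and min_copies 5.
    pattern = re.compile(f"(?:{re.escape(unit)}){{{min_copies},}}")
    return [
        (m.start(), m.end(), (m.end() - m.start()) // len(unit))
        for m in pattern.finditer(seq)
    ]
```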

Optionally, any one or more reference sequences may be used to analyze sequences determined by any method described herein. Any one or more reference sequences may be used to analyze the sequences of genomic DNA fragments. Any one or more reference sequences may be used to analyze the sequences of RNA. Any one or more reference sequences may be used to analyze the sequences of genomic DNA fragments in which one or more of the genomic DNA fragments have been assayed for modified nucleotides or nucleobases. As one such example, any one or more reference sequences may be used to analyze the sequences of genomic DNA fragments enriched for modified nucleotides (such as 5-methylcytosine or 5-hydroxymethylcytosine) by an enrichment method. As another such example, any one or more reference sequences may be used to analyze the sequences of genomic DNA fragments in which at least one nucleotide has been converted by a molecular conversion process (such as a bisulfite conversion process or an oxidative bisulfite conversion process), wherein the conversion process is used to detect one or more modified nucleotides, such as 5-methylcytosine or 5-hydroxymethylcytosine.

Optionally, any one or more reference sequences may be used to analyze the sequences of genomic DNA fragments, wherein the 5'-most and/or 3'-most nucleotide of any such genomic DNA fragment (and/or nucleotides near the 5'-most and/or 3'-most nucleotide, such as nucleotides within the 2, 3, 4 or 5 nucleotides nearest to it) is mapped to the reference sequence. Optionally, the sequences of genomic DNA fragments may be mapped to determine their position and/or span within a reference human genomic DNA sequence, and it may then be determined whether their 5'-most and/or 3'-most nucleotides (and/or nucleotides within, for example, the nearest 2, 3, 4 or 5 nucleotides) fall within one or more reference sequences. Optionally, such methods of analyzing the 5' and/or 3' ends of genomic DNA fragment sequences may be used to analyze the fragmentation pattern of the fragments, for example to analyze the spacing, position and/or positioning of nucleosomes and/or other proteins along genomic DNA molecules. Optionally, such fragmentation patterns may be analyzed using two or more different reference sequences and/or reference maps, wherein the different reference maps may correspond to and/or be associated with particular tissue types and/or diseased tissue types. For example, a first reference map may correspond to and/or measure the fragmentation pattern present in a first tissue type (such as lung tissue), and a second reference map may correspond to and/or measure the fragmentation pattern present in a second tissue type (such as liver tissue). As further examples, a first reference map may correspond to and/or measure the fragmentation pattern present in a particular healthy tissue type (such as healthy lung tissue), and a second reference map may correspond to and/or measure the fragmentation pattern present in a particular diseased tissue type (such as diseased and/or cancerous lung tissue).
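A sketch of the end-mapping check, assuming fragments have already been mapped to 0-based, half-open, forward-strand coordinates on one chromosome, and that the reference sequences are given as sorted, non-overlapping intervals on the same chromosome. The `flank` parameter extends the check to the nearest nucleotides inward from each end. All names are illustrative; reverse-strand handling is omitted.

```python
import bisect


def in_regions(pos, starts, ends):
    """True if pos lies in one of the sorted, non-overlapping [start, end) intervals."""
    i = bisect.bisect_right(starts, pos) - 1
    return i >= 0 and pos < ends[i]


def fragment_end_hits(fragment, starts, ends, flank=0):
    """Report whether the 5'-most and 3'-most nucleotides of a fragment fall in a region.

    fragment: (start, end) half-open mapped span; 5' end = start, 3' end = end - 1.
    flank: also accept up to `flank` nucleotides inward from each end.
    """
    start, end = fragment
    five_prime = any(
        in_regions(p, starts, ends) for p in range(start, min(start + flank + 1, end))
    )
    three_prime = any(
        in_regions(p, starts, ends) for p in range(max(end - 1 - flank, start), end)
    )
    return five_prime, three_prime
```

Tallying these hits separately against a lung-derived and a liver-derived set of intervals is one simple way to compare a sample's end pattern with two reference maps.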

The parameter value may be a quantitative or semi-quantitative value, and may be determined by counting the number of sequence reads within the set of linked sequence reads that comprise a sequence derived from the reference nucleotide sequence. Optionally, the step of determining whether a determined sequence is derived from the reference nucleotide sequence may require a perfect match between the two sequences; optionally, the step permits an imperfect match between the two sequences. Optionally, when comparing two sequences, an imperfect match may include variant nucleotides as well as insertions or deletions of nucleotides. Optionally, a match may be determined from the fraction of nucleotides in one sequence that perfectly match the other sequence. Optionally, a match may be determined by detecting a perfect match over a portion of the sequence of a particular length or of at least a minimum length. Optionally, a match may be determined by assessing the presence of an allele, or of multiple alleles, within a particular reference nucleotide sequence, wherein the allele comprises a single nucleotide, a region of two or more nucleotides, or an insertion or deletion thereof, which may differ between different chromosomes or different haplotypes. Optionally, the allele differs between two or more reference nucleotide sequences. Optionally, the allele may comprise a non-maternal and/or paternal allele, wherein the microparticle sample is derived from a maternal blood, serum or plasma sample.

The parameter value may be a binary value, and may be determined by detecting whether at least one sequence read within the set of sequence reads comprises a sequence derived from the reference nucleotide sequence. Optionally, whether a determined sequence is derived from the reference nucleotide sequence may be decided using any of the matching criteria described above for quantitative parameter values: a perfect match only; an imperfect match allowing variant nucleotides and insertions or deletions of nucleotides; the fraction of nucleotides in one sequence that perfectly match the other; a perfect match over a portion of a particular length or of at least a minimum length; or the presence of an allele, or of multiple alleles, within a particular reference nucleotide sequence, wherein the allele comprises a single nucleotide, a region of two or more nucleotides, or an insertion or deletion thereof, which may differ between chromosomes, haplotypes or two or more reference nucleotide sequences. Optionally, the allele may comprise a non-maternal and/or paternal allele, wherein the microparticle sample is derived from a maternal blood, serum or plasma sample.
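A sketch of both parameter values for one set of linked reads against one reference sequence, using "a perfect match over at least a minimum length" as the matching criterion (one of the options listed above). The longest-common-substring search is a simple quadratic dynamic program, and `min_exact` is an illustrative threshold rather than a recommended value.

```python
def longest_common_substring_len(a, b):
    """Length of the longest perfectly matching stretch shared by a and b."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1  # extend the diagonal match
                best = max(best, cur[j])
        prev = cur
    return best


def derives_from(read, reference, min_exact):
    """Matching criterion: a perfect match of at least min_exact nucleotides."""
    return longest_common_substring_len(read, reference) >= min_exact


def quantitative_value(linked_set, reference, min_exact):
    """Number of reads in the linked set that derive from the reference."""
    return sum(derives_from(r, reference, min_exact) for r in linked_set)


def binary_value(linked_set, reference, min_exact):
    """True if at least one read in the linked set derives from the reference."""
    return any(derives_from(r, reference, min_exact) for r in linked_set)
```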

Optionally, two or more reference sequences, and/or each reference sequence in a list, may be associated with a weighting value and/or association value. Optionally, the weighting and/or association value may correspond to the likelihood or probability that a given sequence is non-maternal or paternal, or to the likelihood or probability that a given sequence is maternal. Optionally, the weighting and/or association value may correspond to the likelihood or probability that a given sequence derives from a particular tissue type (for example lung tissue, pancreatic tissue or lymphocytes). Optionally, the weighting and/or association value may correspond to the likelihood or probability that a given sequence derives from a particular type of diseased tissue (such as cancerous tissue, for example cancerous lung tissue or colorectal cancer tissue, or non-cancerous diseased tissue, such as infarcted or damaged cardiac muscle tissue, damaged cerebrovascular tissue, or placental tissue undergoing eclampsia or pre-eclampsia).

Optionally, it can be established by experience measurement and/or evaluation method appointing for any one or more reference sequences What such weighted value and/or relating value.It optionally, can be by measuring two or more histological types (for example, lesion Tissue and health tissues) in the expression (such as rna level) of two or more transcripts establish any one or more The weighted value and/or relating value of reference sequences, and then can with the weighted value of first and second organization type and/or Relating value is the same, rule of thumb establishes the absolute of described two or more transcripts in the first and second organization types respectively And/or relative expression levels.It optionally, can be by measuring two or more histological types (for example, pathological tissues and strong Health tissue) in two or more genome areas (for example, two or more genes or two or more gene promoters Region) 5-methylcytosine (or similarly 5- hydroxy-methyl cytimidine) level come establish any one or more reference Any weighted value and/or relating value of sequence, and then can with the weighted value of first and second organization type and/or Relating value is the same, rule of thumb establishes the two or more genes (or promoter) in the first and second organization types respectively Absolute and/or opposite 5-methylcytosine it is horizontal.It optionally, can be by measuring two or more histological types (examples Such as, pathological tissues and health tissues) in two or more genome areas (for example, two or more genes or two or More gene promoter regions) DNA enzymatic accessibility and/or chromatin it is open (for example, passing through ATAC-seq measurement) Any weighted value and/or relating value of any one or more reference sequences are established, and then can be with described first and The weighted value and/or relating value of two organization types are the same, rule of thumb establish institute in the 
first and second organization types respectively State the water of the absolute of two or more genes (or promoter) and/or opposite DNA enzymatic accessibility (or chromatin is open) It is flat.
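As a hedged illustration of deriving weighting values from two tissue types, the sketch below takes per-region measurements (expression, 5-methylcytosine level or accessibility) from a first and a second tissue type and uses their log2 ratio, with a pseudocount, as the weight. The log-ratio form and the pseudocount are assumptions of this sketch; the text only requires absolute and/or relative levels.

```python
import math


def log_ratio_weights(level_a, level_b, pseudocount=1.0):
    """Weight per region = log2((level in tissue A + p) / (level in tissue B + p)).

    level_a, level_b: dicts region ID -> measured level; only regions measured
    in both tissues receive a weight. Positive weights favour tissue A.
    """
    shared = level_a.keys() & level_b.keys()
    return {
        region: math.log2((level_a[region] + pseudocount) / (level_b[region] + pseudocount))
        for region in shared
    }
```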

Optionally, any such weighting and/or association value for any one or more reference sequences may be established by empirical measurement and/or evaluation methods that use one or more samples comprising one or more circulating microparticles as input samples (for example, wherein first and second sequences of genomic DNA fragments from a circulating microparticle are linked by any method described herein). Optionally, any one or more of the circulating microparticles each comprise at least first and second genomic DNA fragments. Optionally, any one or more of the samples comprising one or more circulating microparticles may be obtained from one or more patients with a particular disease, such as cancer (for example lung cancer or pancreatic cancer), cancer at a particular stage (for example stage I, stage II, stage III or stage IV), or cancer with particular clinical features (for example benign, malignant, localized, metastatic or treatment-resistant cancer). Optionally, the one or more samples comprising one or more circulating microparticles may be from patients without any such particular disease. Optionally, the one or more samples comprising one or more circulating microparticles may be from patients considered to be healthy. Optionally, any one or more of the samples comprising one or more circulating microparticles may comprise at least first and second samples from the same individual, wherein the first sample is prepared from the individual at an earlier time and the second sample is prepared from the individual at a later time, the first and second samples being separated by a time interval (for example one hour, one day, one week, one month, 3 months, 6 months, 12 months, 2 years, 3 years, 5 years or 10 years). Optionally, any such weighting and/or association value for any one or more reference sequences may be established by empirical measurement and/or evaluation methods that use at least one sample (comprising one or more circulating microparticles) from a patient with a disease and at least one sample (comprising one or more circulating microparticles) from a person without the disease; for example, the amount and/or signal corresponding to the reference sequence in the sample from the person with the disease may be compared with the amount and/or signal corresponding to the reference sequence in the sample from the person without the disease, for example with the ratio of the two measurements being used as the weighting and/or association value. Optionally, any such weighting and/or association value for any one or more reference sequences may be established by empirical measurement and/or evaluation methods that use samples (comprising one or more circulating microparticles) from a group of at least two patients with the disease and samples (comprising one or more circulating microparticles) from a group of at least two people without the disease. Optionally, any such group of patients with the disease (or group of people without the disease) may comprise at least 3, at least 5, at least 10, at least 20, at least 50, at least 100, at least 200, at least 500, at least 1000, at least 2000, at least 10,000, at least 20,000, at least 50,000, at least 100,000, at least 500,000, at least 1,000,000 or at least 10,000,000 individuals. Optionally, any patient in the group of patients with the disease (or any person in the group of people without the disease) may provide two or more samples comprising circulating microparticles, wherein each sample is obtained at a different time point (for example, time points separated by at least one day, at least one week, at least one month, at least two months, at least six months, at least one year, at least 2 years or at least 5 years).
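A sketch of the cohort comparison, assuming each sample has been reduced to a dictionary of normalized signal per reference sequence (for example reads per million). The weight is the ratio of the mean signal in the group with the disease to the mean signal in the group without it, following the ratio example in the text; the small pseudocount is a hypothetical guard against division by zero.

```python
from statistics import mean


def cohort_weights(disease_samples, healthy_samples, pseudocount=1e-6):
    """Ratio of mean normalized signal (disease / no disease) per reference sequence.

    disease_samples, healthy_samples: non-empty lists of dicts
        reference ID -> normalized signal; a missing ID counts as 0.
    """
    if not disease_samples or not healthy_samples:
        raise ValueError("both groups need at least one sample")
    refs = set().union(*disease_samples, *healthy_samples)
    weights = {}
    for ref in refs:
        d = mean(s.get(ref, 0.0) for s in disease_samples)
        h = mean(s.get(ref, 0.0) for s in healthy_samples)
        weights[ref] = (d + pseudocount) / (h + pseudocount)
    return weights
```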

Optionally, in any method in which one or more samples comprising one or more circulating microparticles are used as input samples to establish any weighting and/or association value of any one or more reference sequences by empirical measurement and/or evaluation methods, the weighting and/or association values may be correlated with 5-methylcytosine levels (for example, with 5-methylcytosine levels in particular healthy or particular diseased tissues), or optionally with 5-hydroxymethylcytosine levels (for example, with 5-hydroxymethylcytosine levels in particular healthy or particular diseased tissues), or optionally with levels of DNase accessibility and/or chromatin openness (for example, with DNase accessibility and/or chromatin openness in particular healthy or particular diseased tissues), or optionally with the frequency and/or probability with which the 5'-most and/or 3'-most nucleotides (and/or nucleotides near them, such as within the nearest 2, 3, 4 or 5 nucleotides) of genomic DNA fragments from particular tissue types and/or diseased tissue types and/or healthy tissue types are found within the reference sequence.

Optionally, the method may comprise counting the number of reference sequences from one or more reference sequence lists within a set of linked sequence reads. Optionally, this counting may be performed for all sets of linked sequence reads in a sample, or for any one or more subsets thereof. Optionally, each reference sequence may be associated with a weighting and/or association value, such that the counting method comprises a weighted counting method in which a weighted sum of the reference sequences within the set of linked sequence reads is determined. Optionally, the weighting value may correspond to the likelihood or probability that a given sequence is non-maternal or paternal; that a given sequence is maternal; that a given sequence derives from a particular tissue source (for example lung tissue, pancreatic tissue or lymphocytes); that a given sequence derives from a particular healthy tissue source (for example healthy lung tissue, healthy pancreatic tissue or healthy lymphocytes); that a given sequence derives from a particular diseased tissue source (for example diseased lung tissue, diseased pancreatic tissue or diseased lymphocytes); or that a given sequence derives from a particular cancerous tissue source (for example cancerous lung tissue, cancerous pancreatic tissue or cancerous lymphocytes).
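A sketch of the plain and weighted counts for a single set of linked reads, assuming each read in the set has already been assigned the ID of the reference sequence it maps to (or `None` if unmapped). The `weights` dictionary is a hypothetical input, for example produced by one of the empirical methods described above; reference IDs without a weight count as 1.0.

```python
def reference_counts(read_hits, reference_list, weights=None):
    """Return (count, weighted_sum) of reference-list hits in one linked set.

    read_hits: reference-sequence ID per read in the set, or None if unmapped.
    reference_list: collection of reference-sequence IDs to count.
    weights: optional dict ID -> weight; IDs without a weight count as 1.0.
    """
    refs = set(reference_list)
    hits = [h for h in read_hits if h in refs]
    weights = weights or {}
    return len(hits), sum(weights.get(h, 1.0) for h in hits)
```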

Optionally, any total and/or weighted sum of the reference sequences from a set of linked sequence reads may be compared with one or more threshold values, wherein a set of linked sequence reads comprising a number greater than the threshold is determined and/or suspected to derive from a particular tissue source. Optionally, any method of determining such sums and comparing them with one or more thresholds may be performed for all sets of linked sequence reads in a sample and/or for any one or more subsets thereof. Optionally, the method of determining any such sum may comprise determining a weighted sum as described above. Optionally, a set of linked sequence reads whose total and/or weighted sum is equal to a threshold, within one or more ranges of thresholds, less than a threshold, or within a particular set of values may be determined to derive from a particular tissue source. Optionally, any method described herein may be used to determine the sets of linked sequence reads that derive from a particular tissue source. Optionally, the sets of linked sequence reads found or suspected by any method to derive from a particular tissue source may be counted, to determine the total number of sets of linked sequence reads from that tissue source.
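A sketch of the threshold step, assuming the per-set sums have already been computed and stored by set ID. "Greater than the threshold" is used here; the other comparisons listed above (equal to, below, or within a range) would only change the condition in the filter.

```python
def sets_from_tissue(set_scores, threshold):
    """IDs of linked sets whose (weighted) sum exceeds the threshold, sorted."""
    return sorted(set_id for set_id, score in set_scores.items() if score > threshold)


def count_sets_from_tissue(set_scores, threshold):
    """Total number of linked sets assigned to the tissue source."""
    return len(sets_from_tissue(set_scores, threshold))
```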

Optionally, two or more different reference sequence lists may be compared with any one or more sets of linked sequence reads (or, for example, with all sets of linked sequence reads in a sample). Optionally, the sets of linked sequence reads in a sample may be analyzed with a first reference sequence list corresponding to a first specific tissue type, and also with a second reference sequence list corresponding to a second specific tissue type. Optionally, the sets of linked sequence reads in a sample may be analyzed with a first reference sequence list corresponding to a specific healthy tissue type, and also with a second reference sequence list corresponding to a specific diseased tissue type. Optionally, the sets of linked sequence reads in a sample may be analyzed with a first reference sequence list corresponding to a specific healthy tissue type, and also with a second reference sequence list corresponding to cancerous tissue of the same tissue type. Optionally, the sets of linked sequence reads in a sample may be analyzed with at least 3, at least 4, at least 5, at least 10, at least 20, or at least 30 reference sequence lists, wherein each reference sequence list corresponds to a different tissue type and/or a healthy tissue type and/or a diseased tissue type and/or a cancerous tissue type. Optionally, the sets of linked sequence reads in a sample may be analyzed with at least 50, at least 100, at least 500, at least 1,000, at least 10,000, at least 100,000, at least 1,000,000, at least 10,000,000, or at least 100,000,000 reference sequence lists, optionally wherein each reference sequence list corresponds to a different tissue type and/or a healthy tissue type and/or a diseased tissue type. Optionally, any method of analyzing the sets of linked sequence reads in a sample with two or more reference sequence lists may comprise comparing genomic DNA fragments from the sample that contain 5-methylcytosine with the two or more reference sequence lists. Optionally, any such method may comprise comparing genomic DNA fragments from the sample that contain 5-hydroxymethylcytosine with the two or more reference sequence lists. Optionally, any such method may comprise comparing sequences of RNA from the sample with the two or more reference sequence lists. Optionally, any such method may comprise comparing the 5'-most and/or 3'-most nucleotides of genomic DNA fragments from the sample (and/or nucleotides near the 5'-most and/or 3'-most nucleotide, such as nucleotides within 2, 3, 4, or 5 nucleotides of it) with the two or more reference sequence lists.
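
Scoring one set of linked sequence reads against several reference sequence lists, for example a healthy and a cancerous list for the same tissue, might look like the minimal sketch below. It assumes reads are already reduced to reference identifiers; the list names and identifiers are hypothetical, and the methylation-, hydroxymethylation-, RNA-, or fragment-end-aware matching mentioned above would change only how a read is matched to a list, which is not shown.

```python
from typing import Dict, Optional, Sequence, Set

def hits_per_list(read_refs: Sequence[str],
                  ref_lists: Dict[str, Set[str]]) -> Dict[str, int]:
    """For one linked read set, count the reads falling in each reference list."""
    return {name: sum(1 for r in read_refs if r in refs)
            for name, refs in ref_lists.items()}

def best_matching_list(read_refs: Sequence[str],
                       ref_lists: Dict[str, Set[str]]) -> Optional[str]:
    """Return the list with the most hits, or None if there are no hits or a tie."""
    scores = hits_per_list(read_refs, ref_lists)
    top = max(scores.values(), default=0)
    if top == 0:
        return None
    leaders = [name for name, score in scores.items() if score == top]
    return leaders[0] if len(leaders) == 1 else None  # ties stay unassigned

# Hypothetical lists for healthy vs. cancerous tissue of the same type.
lists = {"lung_healthy": {"r1", "r2"}, "lung_cancer": {"r3", "r4", "r5"}}
print(best_matching_list(["r3", "r5", "r1", "x"], lists))  # lung_cancer
```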

Sequence reads from a set of linked sequence reads may be mapped to two or more reference nucleotide sequences corresponding to the same genomic region, wherein each reference nucleotide sequence comprises a different variant allele, or a different set of variant alleles, within that genomic region, and the parameter value may be determined from the presence of one or more of the reference nucleotide sequences in the set of linked sequence reads.

The length of the target nucleic acid (e.g., genomic DNA) fragments may be determined or estimated, and the parameter may comprise the mean, median, mode, maximum, minimum, or any other single representative value of the determined or estimated lengths. Optionally, the length of the genomic DNA sequence in each sequenced fragment is determined by sequencing substantially the entire genomic DNA fragment (from at or near its 5' end to at or near its 3' end) and counting the number of nucleotides sequenced. Optionally, this is done as follows: a sufficient number of nucleotides at the 5' end of the fragmented genomic DNA sequence are sequenced to map that 5' end to a locus within a reference human genome sequence; similarly, a sufficient number of nucleotides at the 3' end of the fragmented genomic DNA sequence are sequenced to map that 3' end to a locus within the reference human genome sequence; and the total span of nucleotides is then calculated, comprising the 5' section within the reference human genome sequence, the 3' section within the reference human genome sequence, and any unsequenced human genome sequence lying between the two sequenced portions.
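
The end-mapping approach to fragment length, followed by the single representative values listed above, can be sketched as below. This assumes both ends of a fragment map to the same chromosome and that positions are 1-based and inclusive; strand handling and mapping itself are outside the sketch.

```python
import statistics
from typing import Dict, List, Tuple

def fragment_length(five_prime_pos: int, three_prime_pos: int) -> int:
    """Total span, in nucleotides, between the reference loci of a fragment's
    mapped 5' and 3' ends (1-based, inclusive, same chromosome). Any
    unsequenced bases between the two sequenced ends are included."""
    return abs(three_prime_pos - five_prime_pos) + 1

def length_parameters(ends: List[Tuple[int, int]]) -> Dict[str, float]:
    """Representative values of the fragment lengths in one linked read set."""
    lengths = [fragment_length(a, b) for a, b in ends]
    return {
        "mean": statistics.mean(lengths),
        "median": statistics.median(lengths),
        "mode": statistics.mode(lengths),
        "max": max(lengths),
        "min": min(lengths),
    }

# Three fragments of 166, 167 and 166 nucleotides.
print(length_parameters([(100, 265), (1000, 1166), (5000, 5165)]))
```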

The parameter value may be determined for at least 2, at least 10, at least 100, at least 1,000, at least 10,000, at least 100,000, at least 1,000,000, at least 10,000,000, at least 100,000,000, or at least 1,000,000,000 sets of linked sequence reads.

Parameter values may be determined for at least two sets of linked sequence reads, and the parameter values may be evaluated by determining the number of sets of linked sequence reads whose parameter value is equal to a particular parameter value, equal to one of a group of two or more parameter values, less than a particular parameter value, greater than a particular parameter value, within at least one range of parameter values, or within one of two or more ranges of parameter values. Optionally, the fraction or proportion of all evaluated sets of linked sequence reads that meet one or more of the above conditions is determined. Optionally, parameter values are determined for at least two sets of linked sequence reads, and the mean, average, mode, or median of the whole group of parameter values is determined.
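
A minimal sketch of this evaluation: count the sets whose parameter value meets a condition, express that count as a fraction of all evaluated sets, and report the median over all sets. The per-set values and the 100 to 200 nucleotide range are hypothetical examples.

```python
import statistics
from typing import Callable, Dict, Sequence

def evaluate_parameters(values: Sequence[float],
                        condition: Callable[[float], bool]) -> Dict[str, float]:
    """Count and fraction of linked read sets meeting a condition, plus the
    median parameter value over all evaluated sets."""
    n_meeting = sum(1 for v in values if condition(v))
    return {
        "count": n_meeting,
        "fraction": n_meeting / len(values),
        "median": statistics.median(values),
    }

# Hypothetical per-set parameter values (e.g. median fragment length per set).
per_set = [150, 166, 320, 140, 500]
print(evaluate_parameters(per_set, lambda v: 100 <= v <= 200))
# {'count': 3, 'fraction': 0.6, 'median': 166}
```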

Parameter values may be determined for a group of at least two sets of linked sequence reads, and the parameter values may be evaluated by comparing that group of parameter values with a second group of parameter values. Optionally, the second group of parameter values may correspond to an expected normal (Gaussian) distribution of parameter values, or to an expected non-Gaussian distribution of parameter values. Optionally, these parameter values may come from synthetic data, random data, or experimental data generated from one or more independent samples of circulating microparticles representing one or more normal or abnormal conditions. Optionally, at least 1, at least 10, at least 100, at least 1,000, at least 10,000, at least 100,000, or at least 1,000,000 further groups of parameter values may be determined and compared with the first group of parameter values. Optionally, a statistical test (e.g., a t-test, binomial test, chi-square test, or analysis of variance (ANOVA)) may be performed to compare the first and second (or further) groups of parameter values. Optionally, a false discovery rate evaluation is performed, wherein the first group of parameter values is compared with the entries of two or more other groups of parameter values, and the fraction of those groups whose parameter value, mean parameter value, median parameter value, or other quantity derived from the parameter values is higher or lower than that of the first group is determined.
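
Two of the comparisons above are sketched below using only the Python standard library: Welch's t statistic between two groups of parameter values, and the empirical fraction of comparator groups whose mean is at least that of the first group, as one simple reading of the false discovery evaluation. Converting t into a p-value needs the t distribution; a library routine such as SciPy's `scipy.stats.ttest_ind(a, b, equal_var=False)` would provide that.

```python
import math
import statistics
from typing import List, Sequence

def welch_t(a: Sequence[float], b: Sequence[float]) -> float:
    """Welch's t statistic comparing two groups of parameter values
    (each group needs at least two values)."""
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    return (statistics.mean(a) - statistics.mean(b)) / se

def fraction_of_groups_at_least(first: Sequence[float],
                                others: List[Sequence[float]]) -> float:
    """Empirical comparison: fraction of comparator groups whose mean
    parameter value is >= the mean of the first group."""
    target = statistics.mean(first)
    return sum(1 for g in others if statistics.mean(g) >= target) / len(others)

print(welch_t([1, 2, 3], [4, 5, 6]))
print(fraction_of_groups_at_least([5, 5], [[1, 1], [6, 6], [5, 5], [2, 2]]))  # 0.5
```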

At least two different parameter values may be determined for a set of linked sequence reads. Optionally, at least 3, at least 10, at least 100, at least 1,000, at least 10,000, at least 100,000, at least 1,000,000, at least 10,000,000, or at least 100,000,000 different parameter values are determined.

The present invention provides methods for determining groups of sets of linked sequence reads, comprising: (a) determining a parameter value for each of two or more sets of linked sequence reads, wherein the parameter value of each set is determined according to any method described herein; and (b) comparing the parameter values of the sets of linked sequence reads to identify a group of two or more sets of linked sequence reads.

A group of sets of linked sequence reads may be determined by identifying the sets whose parameter value is equal to a particular parameter value, equal to one of a group of two or more parameter values, less than a particular parameter value, greater than a particular parameter value, within at least one range of parameter values, or within one of two or more ranges of parameter values. Optionally, the number of sets of linked sequence reads in the group is determined, thereby determining the size of the group.
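
Grouping sets by parameter range, and taking the group size as the number of members, can be sketched as follows. Ranges are assumed to be half-open [low, high), and a set falling in no range is simply left out; the set identifiers and values are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Range = Tuple[float, float]  # half-open [low, high)

def group_sets_by_range(params: Dict[str, float],
                        ranges: Sequence[Range]) -> Dict[Range, List[str]]:
    """Assign each linked read set (by identifier) to the first parameter
    range containing its value; sets outside every range are left out."""
    groups: Dict[Range, List[str]] = {r: [] for r in ranges}
    for set_id, value in params.items():
        for low, high in ranges:
            if low <= value < high:
                groups[(low, high)].append(set_id)
                break
    return groups

groups = group_sets_by_range({"s1": 150, "s2": 320, "s3": 170, "s4": 900},
                             [(100, 200), (200, 400)])
print({r: len(ids) for r, ids in groups.items()})  # {(100, 200): 2, (200, 400): 1}
```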

The method may comprise further evaluating the group of sets of linked sequence reads by a second analysis step. Optionally, this second analysis step comprises determining and/or evaluating a second parameter value of the group of sets of linked sequence reads. Optionally, the second analysis step comprises determining the presence or absence of a specific allele within the sequences contained in the group of sets of linked sequence reads. Optionally, the second analysis step comprises determining the presence or absence of a chromosomal abnormality, such as one or more aneuploidies, microdeletions, copy number variations, losses of heterozygosity, rearrangements or translocation events, single nucleotide variants, de novo mutations, or any other genomic feature or mutation.

The method may comprise further evaluating the group of sets of linked sequence reads by a second analysis step, wherein the second analysis step comprises determining, for each set of linked sequence reads in the group, the number of sequence reads that map to one or more reference nucleotide sequences. Optionally, the reference sequence may comprise a whole genome, a whole chromosome, part of a chromosome, a gene, part of a gene, any other part of the genome, or any other synthetic or actual sequence. Optionally, the second analysis step comprises counting the total number of sequence reads in the group that map to the reference sequence, and then dividing that number by the number of sets in the group, to estimate the relative number of sequence reads within the reference sequence per set. This provides an estimate of the relative number of sequence reads within the reference sequence per microparticle in the original microparticle sample corresponding to the group of sets of linked sequence reads. Optionally, the second analysis step may further comprise comparing this estimated relative number with a threshold, wherein an estimated relative number greater than the threshold (or, alternatively, less than the threshold) may indicate the presence or absence of a particular medical or genetic condition, such as a chromosomal aneuploidy or microdeletion.
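
The per-set estimate above (reads mapping to a reference sequence, divided by the number of sets in the group) is sketched below, with a chromosome name standing in for the reference sequence. The threshold is a hypothetical, uncalibrated placeholder; this is not a validated aneuploidy caller.

```python
from typing import List, Sequence

def reads_per_set(group: List[Sequence[str]], reference: str) -> float:
    """Total reads in the group mapping to `reference` (here a chromosome
    name) divided by the number of linked read sets in the group."""
    total = sum(1 for read_set in group for ref in read_set if ref == reference)
    return total / len(group)

def exceeds_threshold(relative_number: float, threshold: float) -> bool:
    """Compare the per-set estimate with a threshold (hypothetical value,
    not a clinical cutoff)."""
    return relative_number > threshold

group = [["chr21", "chr1", "chr21"], ["chr21", "chr2"], ["chr3"]]
estimate = reads_per_set(group, "chr21")  # 3 reads / 3 sets = 1.0
print(estimate, exceeds_threshold(estimate, threshold=0.8))  # 1.0 True
```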

32. Methods for converting linked sequence read data for analysis by algorithms

The present invention provides methods for converting linked sequence data into a representation that can be analyzed more easily or more fully by analytical or statistical means. Of particular importance, the methods can be used to analyze a particular sample of circulating microparticles for the presence of structural abnormalities (e.g., translocations or large-scale copy number variations) where the specific nature, genomic location, or size of the structural abnormality is not known in advance, and where such factors may moreover not be directly relevant to a particular biological measurement.

Sequences from microparticles can be used to detect the presence of structural abnormalities, which can indicate the presence of cancer in the human body from which the sample was derived. The presence of a certain number and/or burden of structural abnormalities can itself indicate cancer (or a risk of cancer), while the genomic locations of the underlying abnormalities may be neither known prospectively nor relevant to the cancer risk assessment. Converting linked microparticle sequence data into a form that is more readily analyzed by informatic or statistical means can therefore enhance the sensitivity and specificity of the method. Of particular importance, the conversion methods may enable such linked microparticle sequence data to be analyzed with families of digital tools that typically require some transformation of the data before they can be applied effectively (e.g., deep learning and/or machine learning methods, and neural network or recurrent neural network methods).

The present invention provides methods for converting linked sequence data generated from a microparticle sample, wherein a first set of linked sequence reads is generated from target nucleic acid fragments of a first circulating microparticle, and a second set of linked sequence reads is generated from target nucleic acid fragments of a second circulating microparticle.

The first and second sets of linked sequence reads may be mapped to a reference genome sequence, wherein each sequence read is converted into a representation comprising the chromosome to which it maps and an indicator function, wherein the indicator function may represent its linkage to at least one other sequence from the same set of linked sequence reads. Optionally, the indicator function may be a unique identifier of the corresponding set of linked sequence reads.

The first and second sets of linked sequence reads may be mapped to a reference genome sequence, wherein each sequence is converted into a representation comprising its genomic coordinates (the chromosome number and the position within that chromosome) and an indicator function, wherein the indicator function may comprise or represent its linkage to at least one other sequence from the same set of linked sequence reads. Optionally, the indicator function may be a unique identifier of the corresponding set of linked sequence reads. Optionally, the genomic coordinates may be represented as approximate or windowed values, for example to the nearest 2, 10, 100, 1,000, 10,000, 100,000, 1,000,000, or 10,000,000 bases on the chromosome; for example, genomic coordinates may be represented by windows corresponding to positions within each chromosome, wherein such windows may be at least 2, at least 10, at least 100, at least 1,000, at least 10,000, at least 100,000, at least 1,000,000, or at least 10,000,000 nucleotides in length. Optionally, the genomic coordinates of a sequence (or their windowed or approximate representation) may be shifted along the chromosome by a factor (e.g., by a certain number of nucleotides upstream or downstream).
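
The windowed representation can be sketched as follows. This assumes reads have already been mapped to (chromosome, 1-based position) pairs and that the set identifier serves as the indicator function; the identifiers "mp1" and "mp2" and the 100 kb window are illustrative choices.

```python
from typing import Dict, List, Tuple

Read = Tuple[str, int]       # (chromosome, 1-based position) after mapping
Row = Tuple[str, str, int]   # (set identifier, chromosome, window index)

def to_windowed_rows(read_sets: Dict[str, List[Read]],
                     window: int, shift: int = 0) -> List[Row]:
    """Represent every read by its chromosome, the index of the fixed-size
    window containing its (optionally shifted) position, and the identifier
    of its linked read set, which serves as the indicator function."""
    rows: List[Row] = []
    for set_id, reads in read_sets.items():
        for chrom, pos in reads:
            rows.append((set_id, chrom, (pos + shift) // window))
    return rows

sets = {"mp1": [("chr1", 1_234_567), ("chr1", 1_299_999)],
        "mp2": [("chr7", 55_000_000)]}
print(to_windowed_rows(sets, window=100_000))
# [('mp1', 'chr1', 12), ('mp1', 'chr1', 12), ('mp2', 'chr7', 550)]
```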

The first and second sets of linked sequence reads may be mapped to a reference genome sequence, wherein a first sequence read and a second sequence read in a set of linked sequence reads each comprise sequence from the same chromosome, and wherein the second sequence read is converted into a representation comprising the genomic distance along that chromosome between the first and second sequence reads. Optionally, the genomic distance is represented as an approximate or windowed value, for example to the nearest 2, 10, 100, 1,000, 10,000, 100,000, 1,000,000, or 10,000,000 base pairs. Optionally, any such method may be carried out on a group of three or more sequences within the same set of linked sequence reads. Optionally, the mean or median chromosomal position of the sequences in a set of linked sequence reads is calculated, and each sequence is represented by its distance in nucleotides from that mean or median position. Optionally, where such a method is carried out on a group of three or more sequences within the same set of linked sequence reads, one of those sequences may serve as a reference sequence whose chromosomal position serves as the reference chromosomal position, and each sequence is represented by its distance in nucleotides from the reference chromosomal position.
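
The median-relative variant of this representation is sketched below. Within one set of linked sequence reads, each read is encoded as its signed distance from the median position of that set's reads on the same chromosome, rounded to a chosen resolution. The input format is an assumption; using a designated reference read instead of the median would only change how `centers` is computed.

```python
import statistics
from collections import defaultdict
from typing import Dict, List, Tuple

def distances_from_median(reads: List[Tuple[str, int]],
                          resolution: int = 1) -> List[Tuple[str, int]]:
    """Within one linked read set, represent each read by its signed distance
    (in nucleotides) from the median position of that set's reads on the same
    chromosome, rounded to the nearest `resolution` base pairs."""
    positions: Dict[str, List[int]] = defaultdict(list)
    for chrom, pos in reads:
        positions[chrom].append(pos)
    centers = {chrom: statistics.median(p) for chrom, p in positions.items()}
    # round() sends exact halves to the nearest even multiple of `resolution`
    return [(chrom, round((pos - centers[chrom]) / resolution) * resolution)
            for chrom, pos in reads]

reads = [("chr2", 1000), ("chr2", 5000), ("chr2", 9000), ("chr5", 300)]
print(distances_from_median(reads, resolution=100))
# [('chr2', -4000), ('chr2', 0), ('chr2', 4000), ('chr5', 0)]
```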

The first and second sets of linked sequence reads may be mapped to a group of two or more reference nucleotide sequences, wherein each sequence is converted into a representation comprising the reference nucleotide sequence (if any) to which it maps and an indicator function, wherein the indicator function may represent its linkage to at least one other sequence from the same set of linked sequence reads. Optionally, the indicator function may be a unique identifier of the corresponding set of linked sequence reads. Optionally, the reference nucleotide sequences may each be identified by a unique reference sequence identifier, and each sequence may be represented by the corresponding unique reference sequence identifier. Optionally, at least 3, at least 10, at least 100, at least 1,000, at least 10,000, at least 100,000, at least 1,000,000, at least 10,000,000, at least 100,000,000, at least 1,000,000,000, at least 10,000,000,000, or at least 100,000,000,000 different reference nucleotide sequences may be used. Optionally, each reference nucleotide sequence may comprise a single contiguous sequence of any length, or a group of two or more contiguous sequences of any length.

The first and second sets of linked sequence reads may be mapped to a group of two or more variant alleles or variants, wherein each sequence is converted into a representation comprising the variant allele or variant (if any) to which it maps and an indicator function, wherein the indicator function may represent its linkage to at least one other sequence from the same set of linked sequence reads. Optionally, the variant alleles or variants may each be identified by a unique variant allele or variant identifier, and each sequence may be represented by any corresponding unique variant allele or variant identifier. Optionally, two or more different groups of variant alleles or variants may be used, wherein each sequence is converted into a representation comprising the variant allele or variant (if any) from the first group to which it maps, the variant allele or variant (if any) from the second group and from any other group to which it maps, and an indicator function representing its linkage to at least one other sequence from the same set of linked sequence reads. Optionally, the variant alleles or variants within each group may each be identified by a unique variant allele or variant identifier, and optionally each group of variant alleles or variants may further be identified by a unique variant or variant allele group identifier.

The method may comprise determining the lengths of the sequence reads in the first and second sets of linked sequence reads, wherein each sequence is converted into a representation comprising its determined length and an indicator function, wherein the indicator function may represent its linkage to at least one other sequence from the same set of linked sequence reads. Optionally, each genomic DNA sequence length is compared with one or more potential length ranges, and each sequence is converted into a representation comprising a parameter indicating whether its length falls within each such range, together with an indicator function representing its linkage to at least one other sequence from the same set of linked sequence reads. Optionally, the average of any two or more lengths within a set of linked sequence reads may be determined.
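
The length-range representation can be sketched as one 0/1 flag per candidate range, paired with the set identifier as the indicator function. The half-open ranges below are hypothetical examples, not values taken from the text.

```python
from typing import Sequence, Tuple

def encode_length(set_id: str, length: int,
                  ranges: Sequence[Tuple[int, int]]) -> Tuple[str, Tuple[int, ...]]:
    """One flag per half-open length range [low, high): 1 if the read's
    genomic DNA length falls in that range, else 0. The set identifier is
    kept as the indicator function linking reads of the same set."""
    flags = tuple(int(low <= length < high) for low, high in ranges)
    return set_id, flags

bins = [(0, 150), (150, 250), (250, 10_000)]  # hypothetical ranges
print(encode_length("mp1", 166, bins))  # ('mp1', (0, 1, 0))
```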

The method may be carried out on at least 2, at least 10, at least 100, at least 1,000, at least 10,000, at least 100,000, at least 1,000,000, at least 10,000,000, at least 100,000,000, or at least 1,000,000,000 sets of linked sequence reads. Optionally, the method may be carried out on a subset of the sets of linked sequence reads from a microparticle sample. Optionally, within a particular set of linked sequence reads, only a proportion or fraction of its sequences (fewer than all of them) may be used for any of the above analyses.

In the method, linked sequence data generated from a sample of two or more microparticles may be converted as described herein, and the converted data used to train an algorithm, such as a neural network, artificial neural network, recurrent neural network, or deep neural network; a decision tree; a support vector machine; a Bayesian network; a genetic algorithm; a sparse dictionary; a machine learning or deep learning algorithm; a supervised, unsupervised, or semi-supervised machine learning algorithm; a feature learning or feature extraction algorithm; a reinforcement learning algorithm; a representation learning algorithm; or any combination, component, or composition thereof. Optionally, the algorithm may be trained on converted data generated from two or more different microparticle samples. Optionally, the algorithm may be trained to detect the presence of cancer in the human body from which the sample was provided. Optionally, the algorithm may be trained to detect the presence of structural or chromosomal abnormalities in genomic DNA from circulating microparticles.
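
As one illustration of a conversion suited to such algorithms, the sketch below turns a sample's sets of linked sequence reads into a fixed-length numeric vector. For every unordered pair of different chromosomes, it records the fraction of sets that contain reads from both chromosomes. This particular encoding is an assumption rather than something specified above. It was chosen because an interchromosomal translocation would be expected to raise co-occurrence for the chromosomes involved. The resulting vectors could be passed to any of the listed learners; the model itself is not shown.

```python
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

# Autosomes plus sex chromosomes; 24 choose 2 = 276 unordered pairs.
CHROMS: List[str] = [f"chr{i}" for i in range(1, 23)] + ["chrX", "chrY"]
PAIR_INDEX: Dict[Tuple[str, str], int] = {
    pair: i for i, pair in enumerate(combinations(CHROMS, 2))
}

def cooccurrence_features(read_sets: List[Sequence[str]]) -> List[float]:
    """Fraction of linked read sets containing reads from both chromosomes of
    each pair. Input: for each set, the chromosome each read mapped to."""
    counts = [0.0] * len(PAIR_INDEX)
    for chroms in read_sets:
        seen = set(chroms)
        # recognised chromosomes once each, in CHROMS order, so every pair
        # has the same (i < j) orientation as the PAIR_INDEX keys
        present = [c for c in CHROMS if c in seen]
        for pair in combinations(present, 2):
            counts[PAIR_INDEX[pair]] += 1
    if not read_sets:
        return counts
    return [c / len(read_sets) for c in counts]

vec = cooccurrence_features([["chr1", "chr9"], ["chr9", "chr22", "chr1"]])
print(len(vec), vec[PAIR_INDEX[("chr1", "chr9")]])  # 276 1.0
```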

In the method, linked sequence data generated from a sample of two or more microparticles may be converted as described herein, and the converted data evaluated using an algorithm, such as a neural network, artificial neural network, recurrent neural network, or deep neural network; a decision tree; a support vector machine; a Bayesian network; a genetic algorithm; a sparse dictionary; a machine learning or deep learning algorithm; a supervised, unsupervised, or semi-supervised machine learning algorithm; a feature learning or feature extraction algorithm; a reinforcement learning algorithm; a representation learning algorithm; or any combination, component, or composition thereof.

In the method, linked sequence data generated from samples of two or more microparticles may be converted as described herein, and the converted data used to train an algorithm (such as any of the algorithms above). The algorithm takes as input a first converted data set from a first biological microparticle sample and a second converted data set from a second biological microparticle sample, wherein the second sample is collected from the same individual as the first sample but at a second, later time. The second sample may be collected at a time point at least 1 day, at least 1 week, at least 1 month, at least 2 months, at least 6 months, at least 12 months, at least 24 months, at least 36 months, at least 5 years, or at least 10 years after the first sample. Optionally, the algorithm may also take as input data a third, fourth, fifth, or further sample, each separated from the previous one by one day or more. Optionally, the algorithm may be trained to detect the presence, increasing frequency, or accumulating burden of structural abnormalities, or statistically significant differences between samples from two or more time points. Optionally, the algorithm may be trained to detect the presence or burden of cancer, and/or to detect growth of a malignancy between two or more time points, and/or to stratify the risk of a malignant process. Optionally, the algorithm may be trained using linked sequence data generated from a first population of individuals and a second population of individuals, wherein each individual provides a first sample and a second sample collected at least 1 day (or any longer time) apart, wherein the first population has been diagnosed with a malignant process and the second population has not, thereby training the algorithm to detect the presence of a malignant process. Optionally, the training method may be carried out using three or more samples per individual, each separated from the previous one by at least 1 day, and/or using three or more populations of individuals with different characteristics (e.g., different age ranges, smoking status, ethnicity, levels of genetic cancer susceptibility, and/or family histories of cancer).
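
Preparing training inputs for this longitudinal setup might look like the sketch below. It pairs each individual's consecutive samples that were collected at least `min_days` apart. For each pair, it builds one input vector from the earlier features, the later features, and their difference, and labels it with the individual's diagnosis group. The `Sample` record, the concatenation scheme, and the 0/1 labels are illustrative assumptions, and model choice and training are left out.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Tuple

@dataclass
class Sample:
    individual: str
    collected: date
    features: List[float]  # converted linked-read data for one sample

def longitudinal_examples(samples: List[Sample], labels: Dict[str, int],
                          min_days: int = 1) -> List[Tuple[List[float], int]]:
    """One training example per pair of consecutive samples from the same
    individual collected at least `min_days` apart. Input vector: earlier
    features + later features + (later - earlier); label: the individual's
    group (e.g. 1 = diagnosed with a malignant process, 0 = not)."""
    by_person: Dict[str, List[Sample]] = {}
    for smp in samples:
        by_person.setdefault(smp.individual, []).append(smp)
    examples: List[Tuple[List[float], int]] = []
    for person, series in by_person.items():
        series.sort(key=lambda x: x.collected)
        for earlier, later in zip(series, series[1:]):
            if (later.collected - earlier.collected).days >= min_days:
                delta = [b - a for a, b in zip(earlier.features, later.features)]
                examples.append((earlier.features + later.features + delta,
                                 labels[person]))
    return examples

samples = [Sample("A", date(2020, 1, 1), [0.1, 0.2]),
           Sample("A", date(2020, 7, 1), [0.3, 0.2]),
           Sample("B", date(2020, 1, 1), [0.0, 0.0]),
           Sample("B", date(2020, 1, 1), [0.0, 0.1])]  # same day: skipped
print(len(longitudinal_examples(samples, {"A": 1, "B": 0})))  # 1
```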

In the method, linked sequence data generated from samples of two or more microparticles may be converted as described herein, and the converted data evaluated using an algorithm (such as any of the algorithms above). The algorithm takes as input a first converted data set from a first microparticle sample and a second converted data set from a second microparticle sample, wherein the second sample is collected from the same individual as the first sample but at a second, later time. The second sample may be collected at a time point at least 1 day, at least 1 week, at least 1 month, at least 2 months, at least 6 months, at least 12 months, at least 24 months, at least 36 months, at least 5 years, or at least 10 years after the first sample. Optionally, the algorithm may also take as input data a third, fourth, fifth, or further sample, each separated from the previous one by one day or more. Optionally, the algorithm may be used to detect the presence, increasing frequency, or accumulating burden of structural abnormalities, or statistically significant differences between samples from two or more time points. Optionally, the algorithm may be used to detect the presence or burden of cancer, and/or to detect growth of a malignancy between two or more time points, and/or to stratify the risk of a malignant process.

In any of these methods, the algorithm may be configured to detect sets of linked sequence reads originating from fetal-derived microparticles in a sample comprising a mixture of maternal-derived and fetal-derived microparticles.

33. Methods for determining genome rearrangements, translocations, structural variants, or genomic linkage

The present invention provides methods for determining the presence of a genome rearrangement or structural variant within a set of linked sequence reads of target nucleic acid (e.g., genomic DNA) fragments from a single microparticle, the method comprising: (a) determining a set of linked sequence reads according to any method described herein; (b) mapping each sequence (or at least part of it) in the set of linked sequence reads to a first reference nucleotide sequence comprising a first genomic region, and mapping each sequence (or at least part of it) in the set to a second reference nucleotide sequence comprising a second genomic region; and (c) counting the number of sequence reads from the set found to map within the first genomic region, and counting the number of sequence reads from the set found to map within the second genomic region.
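
Step (c) can be sketched as below. This assumes each read of one set has been reduced to a (chromosome, position) mapping and that regions are given as inclusive (chromosome, start, end) intervals. The coordinates are illustrative only and do not correspond to real breakpoints.

```python
from typing import List, Tuple

Read = Tuple[str, int]          # (chromosome, mapped position)
Region = Tuple[str, int, int]   # (chromosome, start, end), inclusive

def in_region(read: Read, region: Region) -> bool:
    chrom, pos = read
    r_chrom, start, end = region
    return chrom == r_chrom and start <= pos <= end

def count_in_regions(read_set: List[Read], first: Region,
                     second: Region) -> Tuple[int, int]:
    """Numbers of reads from one linked set mapping within each region."""
    n1 = sum(1 for read in read_set if in_region(read, first))
    n2 = sum(1 for read in read_set if in_region(read, second))
    return n1, n2

# Illustrative coordinates only.
region_a = ("chr9", 1_000_000, 1_200_000)
region_b = ("chr22", 2_000_000, 2_100_000)
reads = [("chr9", 1_050_000), ("chr9", 1_150_000), ("chr22", 2_050_000),
         ("chr3", 500)]
print(count_in_regions(reads, region_a, region_b))  # (2, 1)
```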

The genome rearrangement or structural variant may be any type of genome structural phenomenon, such as a genome copy number variation (including copy number gain or copy number loss), a microdeletion, any class of rearrangement (e.g., an inversion), or a translocation such as a chromosomal translocation (e.g., an intrachromosomal or interchromosomal translocation).

In the method, the counted numbers of sequence reads are then used in a further evaluation step or statistical analysis to determine whether a genomic linkage (i.e., a connection extending along the same chromosome) may be present between the first genomic region and the second genomic region. The method may be carried out on an individual set of linked sequence reads, on a group of two or more sets of linked sequence reads, or on all sets of linked sequence reads in a microparticle sample or any subgroup thereof.

Optionally, the total number of sequence reads in the set of linked sequence reads is also determined. The first and second genomic regions may be located on the same chromosome, in which case they may be immediately adjacent to each other or separated by any number of nucleotides. Alternatively, the first and second genomic regions may be located on two different chromosomes. The first and second genomic regions may each be of any length, from 1 nucleotide up to the length of a chromosome arm or of a whole chromosome.

Optionally, an evaluation is performed in which the number of sequence reads in the first genomic region is compared with a first threshold and the number of sequence reads in the second genomic region is compared with a second threshold. A first number equal to or higher than the first threshold, together with a second number equal to or higher than the second threshold, determines or indicates the presence of a genomic linkage between the first and second genomic regions and/or the presence of a rearrangement or translocation event involving the first and second genomic regions. Optionally, the evaluation may also incorporate the total number of sequence reads in the set of linked sequence reads from the microparticle. For example, the evaluation may comprise calculating the fraction of the sequence reads in the whole set of linked sequence reads that map within any given genomic region; optionally, these fractions may be compared with one or more thresholds to determine or indicate the presence of a genomic linkage.
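
The two-threshold evaluation, with an optional check on each region's fraction of the set's total reads, can be sketched as below. All thresholds are hypothetical parameters that would need to be chosen empirically.

```python
def linkage_indicated(n_first: int, n_second: int, total: int,
                      t_first: int, t_second: int,
                      min_fraction: float = 0.0) -> bool:
    """True when both regional read counts meet their thresholds and,
    optionally, each region holds at least `min_fraction` of the set's reads.
    Thresholds are hypothetical and must be chosen empirically."""
    if total <= 0:
        return False
    counts_ok = n_first >= t_first and n_second >= t_second
    fractions_ok = (n_first / total >= min_fraction
                    and n_second / total >= min_fraction)
    return counts_ok and fractions_ok

print(linkage_indicated(3, 2, total=20, t_first=2, t_second=2, min_fraction=0.1))  # True
print(linkage_indicated(3, 1, total=20, t_first=2, t_second=2))                    # False
```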

Optionally, a statistical test may be performed, in which the number of sequence reads in the first genomic region and/or the number of sequence reads in the second genomic region are evaluated by a statistical test or by an algorithm, to estimate the likelihood or probability that a genomic linkage or rearrangement event exists between the first and second regions. Optionally, this test may also incorporate the total number of sequence reads in the set of linked sequence reads from the microparticle.

Optionally, the method may be carried out on an individual set of linked sequence reads from a microparticle, or on a group of two or more sets of linked sequence reads. It may also be carried out on all sets of linked sequence reads from a particular sample, or on a subgroup of those sets. Optionally, where the method is carried out on a group of two or more sets of linked sequence reads, one or more further evaluation steps may be performed to evaluate the likelihood, probability, or statistical significance of a genomic linkage between the first and second regions, in which the numbers of sequences from the two or more sets found to map within the first region and within the second region are evaluated together.

34. Methods for phasing variants or variant alleles

The present invention provides methods for phasing alleles distributed across chromosomal regions. These analyses may be applied to any application or task in which it is biologically or medically significant whether two nucleic acid variants lie on the same chromosome or on two different chromosomes. For example, where two different variant sites are found within a single gene (as in compound heterozygosity), it matters whether the mutation at the first site lies within the same copy of the gene in the individual's genome as the mutation at the second site, or whether each lies on a different one of the individual's two copies of the gene. If both mutations are inactivating, their location on the same gene copy would leave the individual with one active, functional gene copy, whereas if the two inactivating mutations are each located on one of the two gene copies, both copies of the gene will be inactive.

The present invention provides a method for phasing two variant alleles, wherein a first variant allele is comprised within a first genomic region and a second variant allele is comprised within a second genomic region, and wherein each variant allele has at least two variants or potential variants, the method comprising the steps of: (a) determining a set of linked sequence reads according to any method described herein; and (b) determining whether a sequence comprising each potential variant of the first variant allele is present within the set of linked sequence reads, and determining whether a sequence comprising each potential variant of the second variant allele is present within the same set of linked sequence reads.

A variant allele may comprise a single nucleotide, a region of two or more nucleotides, or an insertion and/or deletion of one or more nucleotides. Optionally, a further evaluation step is performed in which the presence of a first variant of the first allele is detected and the presence of a first variant of the second allele is detected; finding the two alleles within the same set of linked sequence reads indicates, or is used to estimate the probability, that the two alleles are in phase with each other on the same chromosome and/or linked along the same chromosome, haplotype, or haplotype block.
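Step (b) and the evaluation step above can be sketched for a single set of linked reads. This is a minimal illustration under assumed conventions: each read is a hypothetical dict mapping (chrom, pos) to the base called there, and a set is only scored when it shows exactly one allele at each site. Because one microparticle may carry DNA from both homologous chromosomes, a single co-occurrence is treated as evidence for phase, not proof.

```python
def alleles_in_set(read_set, site):
    """Return the set of alleles observed at `site` in one set of linked reads.

    read_set: list of dicts mapping (chrom, pos) -> base called in that read.
    site: (chrom, pos) of the variant site.
    """
    return {read[site] for read in read_set if site in read}

def phase_observation(read_set, site1, site2):
    """Return (allele1, allele2) if each site shows exactly one allele.

    Returns None when either site is uncovered or shows conflicting alleles,
    because such a set gives no unambiguous phase evidence.
    """
    a1 = alleles_in_set(read_set, site1)
    a2 = alleles_in_set(read_set, site2)
    if len(a1) == 1 and len(a2) == 1:
        return (next(iter(a1)), next(iter(a2)))
    return None
```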

The method can be repeated for two or more pairs of variant alleles, comprising any potential variant allele, any potential variant within an allele or variant allele site, and any combination of two or more such different variant alleles.

The method can be performed on a single set of linked sequence reads from a microparticle, or on a population of two or more sets of linked sequence reads. It can also be performed on all sets of linked sequence reads from a particular sample, or on one or more particular populations of sets of linked sequence reads. Optionally, where the method is performed on a population of two or more sets of linked sequence reads, one or more further evaluation steps can be performed to assess the statistical significance, probability, or likelihood that the two alleles are in phase with each other and/or lie on the same chromosome or the same haplotype. Optionally, sequences from two or more sets of linked sequence reads comprising one or more variants of the first and/or second variant allele can be evaluated together. Optionally, where the method is performed on a population of two or more sets of linked sequence reads, the number of individual sets of linked sequence reads in which a particular variant (or a larger number of variants) of the variant alleles is found in phase can be counted; optionally, the resulting number can be compared with one or more thresholds, or evaluated with one or more statistical tests or algorithms, to assess the likelihood or probability that the variants are in phase in the sample.
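The counting step above can be sketched as follows, assuming per-set observations of the form (allele at site 1, allele at site 2), with None for uninformative sets. The minimum number of informative sets and the support fraction are illustrative thresholds, not values from the source; a statistical test could replace the fixed fraction.

```python
from collections import Counter

def phase_support(observations, ref1, alt1, ref2, alt2, min_sets=3, min_frac=0.8):
    """Aggregate per-set phase observations into a cis/trans call.

    observations: iterable of (allele1, allele2) tuples or None, one per set.
    'cis' means alt1 and alt2 lie on the same haplotype (pairs ref1-ref2 and
    alt1-alt2); 'trans' means the pairs are alt1-ref2 and ref1-alt2.
    Returns (call, cis_count, trans_count).
    """
    counts = Counter(o for o in observations if o is not None)
    cis = counts[(ref1, ref2)] + counts[(alt1, alt2)]
    trans = counts[(ref1, alt2)] + counts[(alt1, ref2)]
    total = cis + trans
    if total < min_sets:
        return "undetermined", cis, trans
    if cis / total >= min_frac:
        return "cis", cis, trans
    if trans / total >= min_frac:
        return "trans", cis, trans
    return "undetermined", cis, trans
```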

Optionally, the method can be used to phase three or more variant alleles. This can be done either by phasing all three or more variant alleles simultaneously in a single step, or sequentially in two or more consecutive steps.
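The sequential option can be illustrated by chaining pairwise calls between adjacent heterozygous sites into two haplotypes. This is a minimal sketch that assumes every adjacent pair already has a cis or trans call (for example from a counting step like the one described above); it does not handle conflicting or missing calls.

```python
def chain_phase(sites, pair_calls):
    """Build two haplotypes by chaining pairwise phase calls between adjacent sites.

    sites: list of (ref, alt) alleles for consecutive heterozygous sites.
    pair_calls: list of 'cis' or 'trans', one per adjacent pair; 'cis' means
    the alt alleles of sites i and i+1 lie on the same haplotype.
    Returns (hap_a, hap_b); by convention hap_a carries the alt allele at site 0.
    """
    if len(pair_calls) != len(sites) - 1:
        raise ValueError("need exactly one call per adjacent pair of sites")
    alt_on_a = [True]
    for call in pair_calls:
        # 'cis' keeps the alt allele on the same haplotype; 'trans' switches it
        alt_on_a.append(alt_on_a[-1] if call == "cis" else not alt_on_a[-1])
    hap_a = [alt if on_a else ref for (ref, alt), on_a in zip(sites, alt_on_a)]
    hap_b = [ref if on_a else alt for (ref, alt), on_a in zip(sites, alt_on_a)]
    return hap_a, hap_b
```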

Optionally, the method can be used to phase variant alleles across a genomic span (for example, at least 2, at least 5, at least 10, at least 25, at least 50, at least 100, at least 500, at least 1,000, at least 10,000, or at least 100,000 variant alleles). The genomic span can be at least 100,000 bases, at least 1,000,000 bases, at least 10,000,000 bases, an entire chromosome arm, or an entire chromosome. Optionally, the method can be used to phase entire sequences, including variant or invariant sequences of any kind, over genomic spans of at least 1,000 bases, at least 10,000 bases, at least 100,000 bases, at least 1,000,000 bases, at least 10,000,000 bases, or at least 100,000,000 bases, at least the length of a chromosome arm, or the length of an entire chromosome.

Variant alleles can be genetic variants of any class, including single nucleotide variants or single nucleotide polymorphisms, variants of two or more nucleotides in length, insertions or deletions of one or more nucleotides, de novo mutations, loss of heterozygosity, rearrangement or translocation events, copy number variations, or any other genomic feature or mutation.

The method may include, or be extended to include, a genetic imputation process. Optionally, a list of one or more alleles or variant alleles is determined from a set of linked sequence reads from a microparticle in order to perform the genetic imputation process; optionally, the list can be determined from a population of two or more sets of linked sequence reads, or from a particular subpopulation of sets of linked sequence reads. The genetic imputation process can be performed by comparing one or more such lists with one or more previously known haplotypes or haplotype blocks from a population of individuals, in order to phase, or estimate the phase of, the alleles or variant alleles in the list, or to determine or estimate the haplotype or haplotype block of the portion of the genome from which the sequences derive. Optionally, two or more alleles or variant alleles can be phased before the genetic imputation process is performed, for example by any of the methods described above. Optionally, a combined and/or iterative process of phasing, genetic imputation, and/or haplotype estimation can be performed, in which any such step or component can be repeated once, twice, or more times.
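The comparison of an observed allele list against known population haplotypes can be illustrated with a deliberately simplified nearest-haplotype match. Dedicated tools such as those named in the next paragraph model the target as a mosaic of panel haplotypes with hidden Markov models; this sketch instead picks the single panel haplotype with the fewest mismatches and uses it to fill unobserved sites. The data layout (dicts from site to allele) is a hypothetical convention.

```python
def impute_from_panel(observed, panel):
    """Fill missing alleles using the closest haplotype in a reference panel.

    observed: dict site -> allele for sites seen in the linked reads.
    panel: non-empty list of dicts site -> allele (known population haplotypes).
    Returns (best_index, completed), where completed takes the best-matching
    haplotype's alleles at every site not observed.
    """
    def mismatches(hap):
        # observed sites where the panel haplotype disagrees or has no allele
        return sum(hap.get(site) != allele for site, allele in observed.items())

    best = min(range(len(panel)), key=lambda i: mismatches(panel[i]))
    completed = dict(panel[best])
    completed.update(observed)  # observed alleles always take precedence
    return best, completed
```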

Any tool, method, and/or informatics approach for genetic imputation, haplotype estimation, phasing, and/or variant estimation can be used. Optionally, SHAPEIT2, MaCH, Minimac, IMPUTE2, and/or Beagle can be used.

Optionally, the genetic imputation process can be used to generate one or more reference sequences (for example, one or more lists of reference sequences). Optionally, the genetic imputation process can be used simultaneously and/or together with a haplotype estimation process. Optionally, the genetic imputation process can be used to generate one or more reference sequences (or lists of reference sequences) comprising sequences that are contained in, may be contained in, and/or are enriched in the fetal genome. Likewise, the genetic imputation process can optionally be used to generate one or more reference sequences (or lists of reference sequences) comprising sequences that are contained in, may be contained in, and/or are enriched in the maternal genome, the paternal genome, or a cancer genome.

Optionally, the genetic imputation process can use an input list of sequences and/or alleles (such as a list of single nucleotide polymorphisms), wherein the input list is derived from sequences of genomic DNA fragments from circulating microparticles. Optionally, the input list can be derived from linked sequences, from unlinked sequences, or from a subset of (linked or unlinked) sequences of genomic DNA fragments from circulating microparticles. Optionally, such a subset can comprise sequences that are contained in, may be contained in, are enriched in, and/or are suspected of being enriched in the maternal genome, the paternal genome, the fetal genome, or a cancer genome.

Any input list of sequences and/or alleles (such as a list of single nucleotide polymorphisms), any one or more reference sequences (such as one or more lists of reference sequences), and/or any subset thereof can be generated by any method described herein.

Optionally, the genetic imputation process can be used to generate, determine, or estimate the haplotype or haplotype block of a portion of a genome, for example a portion of the maternal genome, the paternal genome, the fetal genome, or a cancer genome. Optionally, such a haplotype or haplotype block can relate to a genomic region of at least 2, at least 10, at least 100, at least 1,000, at least 10,000, at least 100,000, at least 1,000,000, at least 10,000,000, or at least 100,000,000 nucleotides in length; optionally, it can relate to a chromosome arm, an entire chromosome, and/or an entire genome.

Optionally, the genetic imputation process can use entries for two or more previously known (and/or previously predicted or generated) haplotypes or haplotype blocks from a population of individuals. Optionally, these haplotypes or haplotype blocks can relate to genomic regions of at least 2, at least 10, at least 100, at least 1,000, at least 10,000, at least 100,000, at least 1,000,000, at least 10,000,000, or at least 100,000,000 nucleotides in length; optionally, they can relate to a chromosome arm, an entire chromosome, and/or an entire genome.

Optionally, the genetic imputation process can use entries for at least 2, at least 3, at least 5, at least 10, at least 50, at least 100, at least 500, at least 1,000, at least 5,000, at least 10,000, at least 50,000, at least 100,000, at least 500,000, or at least 1,000,000 previously known (and/or previously predicted or generated) haplotypes or haplotype blocks.

The method can be performed on a single set of linked sequence reads, on a population of two or more sets of linked sequence reads, or on all sets of linked sequence reads in a microparticle sample or a subpopulation thereof.

35. Methods for determining and analyzing linked sequence reads of fetal origin

The present invention provides methods for analyzing linked sequence data, wherein the data are generated from a sample from a pregnant female (the sample may therefore comprise a mixture of microparticles of maternal origin, i.e. from normal maternal somatic tissue, and microparticles of fetal and/or placental origin). The methods can be used to detect the presence of fetal chromosomal abnormalities (such as fetal trisomies or fetal microdeletions). Several such methods can be performed on the same set of fetal sequences, enabling multiplexed and sensitive detection of fetal genetic disorders.

The present invention provides a method for determining sets of linked sequence reads of fetal origin, the method comprising the steps of: (a) determining a set of linked sequence reads according to any method described herein, wherein the sample comprises microparticles from maternal blood; (b) comparing each sequence read (or at least a portion of each sequence read) of the set of linked sequence reads with a reference list of sequences present in the fetal genome; and (c) identifying a set of linked sequence reads of fetal origin by the presence, within one or more of its sequence reads, of one or more sequences from the reference list.
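Steps (b) and (c) can be sketched with exact substring matching. This assumes, purely for illustration, that the reference list is a collection of short sequences (for example k-mers spanning a paternal allele) and that each set is a list of read strings; a practical pipeline would more likely compare aligned variant calls and tolerate sequencing errors.

```python
def fetal_sets(linked_sets, reference_seqs):
    """Return indices of linked-read sets containing any reference sequence.

    linked_sets: list of sets, each a list of read strings.
    reference_seqs: iterable of short sequences assumed to be present in the
    fetal genome (hypothetical representation of the reference list).
    """
    refs = list(reference_seqs)
    hits = []
    for i, reads in enumerate(linked_sets):
        # step (c): one matching read is enough to flag the whole set
        if any(ref in read for read in reads for ref in refs):
            hits.append(i)
    return hits
```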

A set of linked sequence reads of fetal origin may comprise, consist of, or consist essentially of sequence reads of target nucleic acid fragments from the fetus. Optionally, a set of linked sequence reads of fetal origin may comprise or consist of sequence reads of target nucleic acid fragments from the fetus, and also comprise or consist of sequence reads of target nucleic acid fragments from one or more maternal tissues and/or maternal cells.

The reference list of sequences (or sequence variants) present in the fetal genome may comprise, consist of, or consist essentially of any of the following: sequences enriched in the fetal genome; sequences enriched in the fetal genome compared with the maternal genome; sequences depleted in the maternal genome compared with the fetal genome; sequences absent from the maternal genome; or paternal sequences or paternal sequence variants.

Particle may originate from the maternal blood of pregnant individuals.Optionally, particle may originate from the maternal blood of pregnant individuals, wherein The individual gestation has at least two developmental fetuses (for example, the individual gestation has twins or triplet or any Greater number of developmental fetus).Optionally, particle may originate from the maternal blood of pregnant individuals, wherein gestation has passed through body Outer fertilization generates.Optionally, any external fertilization method can further comprise following any step: the heredity sieve before being implanted into Genetic diagnosis, the embryo score before implantation and/or the Embryo selection before implantation before choosing, implantation.

The microparticles may be derived from the maternal blood of a pregnant individual, wherein the embryo that gave rise to the corresponding developing fetus has been subjected to (or generated by) one or more synthetic genetic modification procedures. Optionally, any one or more such procedures may comprise a CRISPR modification procedure or a mitochondrial replacement procedure. Optionally, any one or more such procedures may involve modification and/or correction of a disease-associated or disease-causing mutation, sequence, and/or allele; modification of a sequence contained within a single gene; or modification of a sequence contained within a non-genic (for example intergenic) region. Optionally, any one or more such procedures may involve insertion of a sequence, deletion of a sequence, and/or modification and/or inactivation of a sequence. Optionally, any one or more such procedures may involve insertion, deletion, substitution, or modification of a genomic region; optionally, such a genomic region may be at least 2, at least 3, at least 5, at least 100, at least 1,000, at least 10,000, at least 100,000, at least 1,000,000, or at least 10,000,000 nucleotides in length, or at least a chromosome arm or at least a chromosome in length.

Any synthetic genetic modification procedure may comprise a group of at least 2, at least 3, at least 5, at least 10, at least 50, at least 100, at least 1,000, or at least 10,000 different synthetic genetic modification procedures. Any such group of procedures can be performed sequentially (for example, a first synthetic genetic modification procedure followed by a second) or in parallel (for example, two or more synthetic genetic modification procedures performed simultaneously on a single sample).

The microparticles may be derived from the maternal blood of a pregnant individual, wherein the embryo that gave rise to the corresponding developing fetus was generated by one or more in vitro gametogenesis processes. Optionally, such an in vitro gametogenesis process may comprise in vitro oogenesis or in vitro spermatogenesis. Optionally, any one or more such processes may comprise in vitro synthesis of gametes from somatic tissue (such as skin and/or fibroblast tissue or cells) obtained from one or more individuals. Optionally, any one or more such processes may also comprise an in vitro fertilization process. Optionally, any one or more such processes may also comprise one or more synthetic genetic modification procedures (of one or more gametes and/or, following in vitro fertilization, one or more embryos).

The method may comprise: performing step (a) to determine at least 2, at least 10, at least 100, at least 1,000, at least 10,000, at least 100,000, at least 1,000,000, at least 10,000,000, at least 100,000,000, or at least 1,000,000,000 sets of linked sequence reads; performing step (b) for each set of linked sequence reads; and performing step (c) to identify sets of linked sequence reads of fetal origin by the presence, within one or more sequence reads of a set, of one or more sequences from the reference list.

The method may comprise identifying at least 2, at least 10, at least 100, at least 1,000, at least 10,000, at least 100,000, or at least 1,000,000 sets of linked sequence reads of fetal origin.

The method may comprise identifying sets of linked sequence reads of maternal origin and/or of non-fetal origin.

Sequence reads from each set of linked sequence reads can be compared with a reference list of sequences or sequence variants that are present in, or enriched in, the fetal genome. Optionally, these sequences or sequence variants are absent from, or depleted in, the maternal genome. A set of linked sequence reads of fetal origin can be determined or predicted by detecting, within its sequence reads, one or more sequences or sequence variants from the reference list.

Paternal sequences or sequence variants, or groups thereof, can be determined by evaluating their allele fractions across sets of linked sequence reads, or across all sets of linked sequence reads, and selecting those whose allele fraction in the sequence reads is below a particular fraction, for example less than 50%, less than 40%, less than 30%, less than 25%, less than 20%, less than 15%, less than 10%, less than 8%, less than 5%, less than 4%, less than 3%, less than 2%, less than 1%, or below any other threshold. Optionally, the paternal sequences or sequence variants are determined from a finite list of one or more sequences or sequence variants; optionally, the finite list comprises a list of single nucleotide variants and/or single nucleotide insertions or deletions that are common in a population of individuals. Any sequence or sequence variant may take the form of a single nucleotide variant, an insertion or deletion of at least 1 nucleotide, at least 2 nucleotides, or a greater number of nucleotides, or a sequence or sequence variant of any other class or size. Any one or more paternal sequences or sequence variants determined by the above methods can then be used as a reference list to evaluate sets of linked sequence reads from a microparticle sample, for example to evaluate whether a given set of linked sequence reads is of fetal origin. Optionally, any of the above methods can alternatively be used to determine sets of linked sequence reads of maternal origin.
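The allele-fraction filter can be sketched as below. The reasoning it relies on: alleles at maternal heterozygous sites are expected near 50% of pooled reads, whereas alleles carried only by the fetus appear at a lower fraction that depends on the fetal fraction. The 10% cut-off and the minimum of two supporting reads are illustrative assumptions chosen from the range listed above; the input layout is hypothetical.

```python
def candidate_paternal_variants(counts, max_fraction=0.10, min_alt_reads=2):
    """Flag variants whose pooled allele fraction is low but non-zero.

    counts: dict variant_id -> (alt_reads, total_reads), pooled across all
    linked-read sets in the sample.
    Returns variant ids, in input order, with min_alt_reads <= alt_reads and
    alt_reads / total_reads < max_fraction.
    """
    selected = []
    for variant, (alt, total) in counts.items():
        if total == 0 or alt < min_alt_reads:
            continue  # no coverage, or too few reads to rule out a sequencing error
        if alt / total < max_fraction:
            selected.append(variant)
    return selected
```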

Paternal sequences or sequence variants, or groups thereof, can be determined by genetic imputation. Optionally, a first group of paternal sequences (comprising single nucleotide variants, sequences or sequence variants of any other type, or combinations thereof) is used to estimate a haplotype or haplotype block, and a second group of paternal sequences or sequence variants is then determined from that haplotype or haplotype block, wherein sequences from both the first and the second group are comprised within the haplotype or haplotype block, but the second group is not comprised within the first group. Optionally, the first group of paternal sequences can be determined as sequences having an allele fraction below a particular threshold across all sets of linked sequence reads, as described above. One or both groups of paternal sequences or sequence variants can then be used as a reference list to evaluate sets of linked sequence reads from a microparticle sample, for example to evaluate whether a given set of linked sequence reads is of fetal origin. Optionally, any of the above methods can alternatively be used to determine or predict sets of linked sequence reads of maternal origin.
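The two-group procedure can be sketched with a simple consistency filter over a haplotype panel: keep panel haplotypes that carry every allele of the first group, then report alleles at other sites on which all of those haplotypes agree. This is an assumed, simplified stand-in for a full imputation tool, and in practice the second group would usually also be filtered against the maternal genotype so that shared alleles are not mistaken for paternal ones.

```python
def extend_paternal_list(first_group, panel):
    """Infer a second group of paternal alleles from matching panel haplotypes.

    first_group: dict site -> allele already identified as paternal.
    panel: list of dicts site -> allele (known haplotypes or haplotype blocks).
    Returns dict site -> allele for sites outside the first group on which
    every panel haplotype consistent with the first group agrees.
    """
    matches = [h for h in panel
               if all(h.get(s) == a for s, a in first_group.items())]
    if not matches:
        return {}
    second = {}
    for site, allele in matches[0].items():
        if site in first_group:
            continue
        # keep a site only if all consistent haplotypes carry the same allele
        if all(h.get(site) == allele for h in matches[1:]):
            second[site] = allele
    return second
```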

Paternal sequences or sequence variants, or groups thereof, can be determined by sequencing a sample comprising genomic DNA from the father (for example by targeted sequencing and/or whole-genome sequencing of paternal genomic DNA). Maternal sequences or sequence variants, or groups thereof, can be determined by sequencing a sample comprising genomic DNA from the mother (for example by targeted sequencing and/or whole-genome sequencing of maternal genomic DNA).
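When both parents have been genotyped, one way to build a reference list is to keep paternal alleles that the mother does not carry, since only these can distinguish fetal from maternal material. The genotype representation (sets of allele calls per site) is a hypothetical convention for this sketch, and sites not genotyped in the mother are skipped because absence there cannot be established.

```python
def informative_paternal_alleles(paternal, maternal):
    """Return paternal alleles absent from the maternal genotype.

    paternal, maternal: dict site -> set of alleles called at that site
    from sequencing each parent's genomic DNA.
    Returns dict site -> set of alleles present in the father only.
    """
    result = {}
    for site, father_alleles in paternal.items():
        if site not in maternal:
            continue  # maternal genotype unknown, so the allele may be shared
        only_father = father_alleles - maternal[site]
        if only_father:
            result[site] = only_father
    return result
```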

Sequences from each set of linked sequence reads can be compared with two or more different reference lists of sequences or sequence variants.

The method may comprise determining, for each set of linked sequence reads, the number of sequence reads that contain a sequence from the reference list.

The method may comprise counting the number of non-maternal or paternal sequences, from a list of non-maternal or paternal sequences, within a set of linked sequence reads. Optionally, this counting method can be performed for all sets of linked sequence reads in the sample. Optionally, each non-maternal or paternal sequence can be associated with a weight, such that the counting method is a weighted counting method that determines a weighted sum of the non-maternal or paternal sequences in the set of linked sequence reads. Optionally, the weight can correspond to the likelihood or probability that a given sequence is non-maternal or paternal, or to the likelihood or probability that a given sequence is maternal.

The total and/or weighted sum of non-maternal or paternal sequences from a set of linked sequence reads can be compared with one or more thresholds, and a set of linked sequence reads containing a number of non-maternal or paternal sequences greater than the threshold is determined to be of fetal origin. Optionally, determining such a sum and comparing it with one or more thresholds can be performed for all sets of linked sequence reads in the sample. Optionally, determining the sum can comprise determining a weighted sum as described above. Optionally, a set of linked sequence reads whose total and/or weighted sum is equal to a threshold, within one or more ranges of values, below a threshold, or within a particular group of values can be determined to be of fetal origin. Optionally, any of the above methods can be used to determine sets of linked sequence reads of maternal origin. Optionally, the sets of linked sequence reads found by any of the above methods to be of fetal origin, or of maternal origin, can be counted to determine the total number of sets of linked sequence reads of fetal or maternal origin, respectively.

Optionally, the total number of sets of linked sequence reads of fetal origin can be compared with, or divided by, the total number of sets of linked sequence reads of maternal origin, to estimate or determine the fraction or ratio of fetal microparticles relative to maternal microparticles and/or to all microparticles.
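The counting, weighting, thresholding, and ratio steps described above can be combined into one sketch. Assumptions, none taken from the source: each set is represented as a list of allele identifiers observed in its reads, weights default to 0 for alleles not in the list, the threshold of 1.0 is illustrative, and every set at or below the threshold is labelled maternal, which simplifies away sets that are fetal but happen to lack informative sites.

```python
def classify_by_weighted_count(linked_sets, weights, threshold=1.0):
    """Classify sets as fetal or maternal from a weighted count of paternal alleles.

    linked_sets: list of sets, each a list of allele ids observed in its reads.
    weights: dict allele_id -> weight, e.g. the probability that the allele
    is non-maternal; ids not in the dict contribute 0.
    Returns (labels, fetal_count, maternal_count, fetal_fraction), where
    fetal_fraction is fetal sets divided by all sets.
    """
    labels = []
    for observed in linked_sets:
        score = sum(weights.get(a, 0.0) for a in observed)  # weighted sum
        labels.append("fetal" if score > threshold else "maternal")
    fetal = labels.count("fetal")
    maternal = labels.count("maternal")
    fraction = fetal / len(labels) if labels else 0.0
    return labels, fetal, maternal, fraction
```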

The method may comprise determining the lengths of two or more genomic sequences from one or more sets of linked sequence reads, wherein the lengths determine whether a set of linked sequence reads corresponds to a microparticle of fetal or maternal origin. Optionally, such length determination can be performed for all sets of linked sequence reads in the sample. Optionally, the mean, median, or mode of the genomic sequence lengths from a set of linked sequence reads is determined and then compared with a threshold, and a set of linked sequence reads whose value is below, above, or equal to the threshold is determined to be of fetal origin. Optionally, the mean, median, or mode of the genomic sequence lengths from a set of linked sequence reads is compared with one or more ranges of values or one or more finite groups of values, and a set whose value falls within the range or group is determined to be of fetal origin. Optionally, any of the above methods can be used to determine sets of linked sequence reads of maternal origin. Optionally, the sets of linked sequence reads found by any of the above methods to be of fetal origin, or of maternal origin, can be counted to determine the total number of sets of linked sequence reads of fetal or maternal origin, respectively.
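The median-and-threshold variant can be sketched as follows. The 150 bp threshold and the default assumption that shorter fragments indicate fetal origin are illustrative choices, not values from the source; the text allows either direction, so the direction is a parameter.

```python
from statistics import median

def classify_by_length(set_lengths, threshold=150, shorter_is_fetal=True):
    """Label each linked-read set by the median length of its fragments.

    set_lengths: list of lists of fragment lengths in bp, one list per set.
    threshold, shorter_is_fetal: illustrative assumptions.
    Returns a list of 'fetal', 'maternal', or 'undetermined' labels.
    """
    labels = []
    for lengths in set_lengths:
        if not lengths:
            labels.append("undetermined")  # no fragments to measure
            continue
        is_short = median(lengths) < threshold
        labels.append("fetal" if is_short == shorter_is_fetal else "maternal")
    return labels
```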

Optionally, the total number of sets of linked sequence reads of fetal origin can be compared with, or divided by, the total number of sets of linked sequence reads of maternal origin, to estimate or determine the fraction or ratio of fetal microparticles relative to maternal microparticles and/or to all microparticles.

The method may comprise determining the lengths of two or more genomic sequences from one or more sets of linked sequence reads and comparing these lengths with a reference distribution of genomic lengths, wherein a statistical test is performed to compare the lengths from the set of linked sequence reads with the reference distribution, and the set of linked sequence reads is determined to be of fetal or maternal origin according to whether its lengths are statistically similar to, statistically different from, statistically greater than, and/or statistically smaller than the reference distribution. Optionally, a t-test, a Mann-Whitney test, an analysis of variance (ANOVA) test, or any other statistical test can be used. Optionally, the genomic length of a molecule in a set of linked sequence reads can be determined by mapping the first end and the second end of each linked sequence read to a reference genome sequence and then determining the total span of genomic sequence from the 5′ end of the first end to the 3′ end of the second end, thereby calculating the total length in base pairs. Optionally, the genomic length of a molecule can instead be determined directly by sequencing each linked sequence in its entirety, from the 5′ end of the first end to the 3′ end of the second end, thereby measuring the length in bases of the molecule comprising the genomic sequence.
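The mapping-based length calculation and one of the named tests can be sketched together. Assumptions: reference coordinates are 0-based and half-open, so the span from the 5′ end of the first end to the 3′ end of the second end is a simple difference; the reference distribution is supplied as a sample of lengths; and the Mann-Whitney U test uses average ranks for ties with a normal approximation and no tie correction, so p-values are approximate for small or heavily tied samples.

```python
from math import erf, sqrt

def fragment_length(first_end_start, second_end_end):
    """Span from the 5' end of the first end to the 3' end of the second end
    (0-based, half-open reference coordinates)."""
    return second_end_end - first_end_start

def mann_whitney_less(x, y):
    """One-sided Mann-Whitney U test that lengths x tend to be smaller than y.

    Returns (U for x, approximate p-value).
    """
    values = list(x) + list(y)
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average 1-based rank for a run of ties
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    n1, n2 = len(x), len(y)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mean_u) / sd_u
    p = 0.5 * (1 + erf(z / sqrt(2)))  # lower-tail standard normal probability
    return u1, p
```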

Optionally, this method of determining and statistically evaluating lengths may be applied to all associated sequence read groups in the sample. Optionally, any of the above methods may instead be used to identify associated sequence read groups of maternal origin. Optionally, the associated sequence read groups found by any of the above methods to be of fetal origin, or of maternal origin, may be counted to determine the total number of associated sequence read groups of fetal or maternal origin, respectively. Optionally, the total number of associated sequence read groups of fetal origin may be compared with, or divided by, the total number of associated sequence read groups of maternal origin, to estimate or determine the fraction or ratio of fetal particles relative to maternal particles and/or relative to all particles.

The method may include determining the genomic length of each sequence read in an associated sequence read group, and also determining the presence and/or number of non-maternal or paternal sequences among the sequence reads of the same associated sequence read group, wherein both parameters are used to determine whether the associated sequence read group is of fetal origin. Optionally, this method of determining length and sequence may be applied to all associated sequence read groups in the sample. Optionally, an algorithm is used to evaluate both parameters to determine whether an associated sequence read group is of fetal origin. Optionally, an associated sequence read group is determined to be of fetal origin when it is found to have a mean sequence length within a specific length range and is also found to contain a number of non-maternal or paternal sequences above a specific threshold. Optionally, two or more pairs of length range and sequence count may be used to determine whether an associated sequence read group is of fetal origin, and the group is determined to be of fetal origin if it falls within any one or more such pairs of parameters.
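A sketch of the two-parameter rule, assuming the maternal genotype at informative SNPs is known and that each read's allele at those SNPs has already been extracted. The input layout, the function names and the example values are all hypothetical; "non-maternal" here simply means an allele absent from the maternal genotype.

```python
def count_non_maternal(read_alleles, maternal_genotype):
    """Count reads carrying an allele absent from the mother's genotype.

    read_alleles: list of (snp_id, observed_base) from one associated read group.
    maternal_genotype: dict snp_id -> set of maternal bases (hypothetical input).
    SNPs without a maternal genotype are ignored.
    """
    return sum(
        1
        for snp, base in read_alleles
        if snp in maternal_genotype and base not in maternal_genotype[snp]
    )


def is_fetal(lengths, non_maternal_count, rules):
    """Fetal if the group matches ANY (length_range, min_count) rule pair.

    A rule matches when the mean length lies within length_range (inclusive)
    and the non-maternal count is strictly greater than min_count.
    """
    mean_length = sum(lengths) / len(lengths)
    return any(
        lo <= mean_length <= hi and non_maternal_count > min_count
        for (lo, hi), min_count in rules
    )


maternal = {"rs1": {"A"}, "rs2": {"C", "T"}}
reads = [("rs1", "G"), ("rs2", "C"), ("rs1", "A"), ("rs3", "T")]
n = count_non_maternal(reads, maternal)                 # 1 (the 'G' at rs1)
print(is_fetal([140, 150, 160], n, [((130, 160), 0)]))  # True
```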

The method may include counting the total number of sequence reads in one or more associated sequence read groups that map to a particular reference sequence, wherein the reference sequence is at least 1 nucleotide, at least 2 nucleotides, at least 10 nucleotides, at least 100 nucleotides, at least 1000 nucleotides, at least 10,000 nucleotides, at least 100,000 nucleotides, at least 1,000,000 nucleotides or at least 10,000,000 nucleotides in length, or is the length of a chromosome arm or of a whole chromosome. Optionally, the reference sequence may consist of two or more separate segments and therefore be discontinuous. Optionally, the counting method may be carried out for two or more different reference sequences, or for at least 10, at least 100, at least 1000, at least 10,000, at least 100,000, at least 1,000,000, at least 10,000,000, at least 100,000,000 or at least 1,000,000,000 reference sequences.
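Counting reads that map to a reference sequence, including a discontinuous one made of several segments, can be sketched with a sorted interval index. The sketch assumes reads have already been mapped and reduced to a (chromosome, position) pair and that the segments do not overlap; the function names and the coordinates in the usage lines are hypothetical.

```python
from bisect import bisect_right


def build_index(segments):
    """Index a (possibly discontinuous) reference sequence.

    segments: list of (chrom, start, end) half-open intervals, assumed
    non-overlapping. Returns chrom -> sorted list of (start, end).
    """
    index = {}
    for chrom, start, end in sorted(segments):
        index.setdefault(chrom, []).append((start, end))
    return index


def maps_to_reference(index, chrom, pos):
    """True if a read's mapped position falls inside any indexed segment."""
    intervals = index.get(chrom, [])
    # Index of the last interval whose start is <= pos (or -1 if none)
    i = bisect_right(intervals, (pos, float("inf"))) - 1
    return i >= 0 and intervals[i][0] <= pos < intervals[i][1]


def count_reads_per_group(groups, segments):
    """Count, for each associated read group, its reads mapping to the reference.

    groups: dict group id -> list of (chrom, pos) mapped read positions.
    """
    index = build_index(segments)
    return {
        gid: sum(1 for chrom, pos in reads if maps_to_reference(index, chrom, pos))
        for gid, reads in groups.items()
    }


reference = [("chr21", 0, 100), ("chr21", 500, 600)]  # two separate segments
groups = {"BC1": [("chr21", 10), ("chr21", 200), ("chr21", 599)], "BC2": [("chr18", 5)]}
print(count_reads_per_group(groups, reference))  # {'BC1': 2, 'BC2': 0}
```

Repeating the call with a different segment list gives the counts for each additional reference sequence.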

Optionally, the counting method may be carried out on sliding windows, wherein two or more windows tile across part of a chromosome, across a whole chromosome arm, across a whole chromosome, or across all chromosomes of the genome. Optionally, the absolute number of all sequences determined to be of fetal origin that map to a given such reference sequence may be determined. Optionally, the fraction or proportion of all sequences determined to be of fetal origin that map to a given such reference sequence may be determined. Optionally, the number of all sequences determined to be of fetal origin that map to a given such reference sequence may be determined and then divided by the total number of associated sequence read groups determined to be of fetal origin, to determine the average number of sequence reads mapping to that reference sequence per fetal-origin associated sequence read group. Optionally, any such analysis may be carried out independently on each of one or more individual fetal-origin associated sequence read groups. Optionally, any such analysis may be carried out jointly on all sequences from two or more fetal-origin associated sequence read groups. Optionally, any of the above analyses may be carried out on the sequences of one or more maternal-origin associated sequence read groups. Optionally, any such number or fraction corresponding to sequences from fetal-origin particles that map to a particular reference sequence may be compared with any such number or fraction corresponding to sequences from maternal-origin particles that map to the same reference sequence. Optionally, any of the above analyses may be carried out to determine such a number or fraction for the associated sequence read groups from fetal-origin particles, the same analysis may be carried out to determine such a number or fraction for the associated sequence read groups from maternal-origin particles, and the number for the fetal-origin sequences may be compared with the corresponding number for the maternal-origin sequences to generate a ratio, fraction or comparison value.
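The sliding-window count and the per-group average can be sketched as follows, assuming for simplicity that each group's reads are given as positions on a single chromosome and that each group has already been labelled fetal or maternal. Window size, step, the function names and the example data are placeholders.

```python
def tile_windows(chrom_length, window, step):
    """Sliding windows tiling [0, chrom_length); they overlap when step < window.

    The final windows are truncated at the chromosome end.
    """
    return [(s, min(s + window, chrom_length)) for s in range(0, chrom_length, step)]


def mean_reads_per_group(groups, labels, windows, origin="fetal"):
    """Average number of reads per group of one origin falling in each window.

    groups: dict group id -> list of read positions on one chromosome.
    labels: dict group id -> 'fetal' or 'maternal'.
    """
    selected = [positions for gid, positions in groups.items() if labels[gid] == origin]
    if not selected:
        raise ValueError(f"no {origin} groups")
    totals = [0] * len(windows)
    for positions in selected:
        for k, (start, end) in enumerate(windows):
            totals[k] += sum(1 for p in positions if start <= p < end)
    # Divide window totals by the number of groups of this origin
    return [t / len(selected) for t in totals]


windows = tile_windows(10, window=5, step=5)  # [(0, 5), (5, 10)]
groups = {"a": [1, 2, 7], "b": [6], "c": [0, 1, 2, 3]}
labels = {"a": "fetal", "b": "fetal", "c": "maternal"}
print(mean_reads_per_group(groups, labels, windows))              # [1.0, 1.0]
print(mean_reads_per_group(groups, labels, windows, "maternal"))  # [4.0, 0.0]
```

Dividing the fetal output by the maternal output window by window would give the fetal-to-maternal comparison value described above; windows with no maternal reads would need separate handling.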

In the method, at least one reference sequence (of the list of reference sequences) may comprise a repetitive sequence. Optionally, the repetitive sequence comprises a dinucleotide, trinucleotide, tetranucleotide or pentanucleotide repeat. Optionally, the reference sequence comprises a series of two or more adjacent copies of the same repeat unit, for example 2, 5, 8, 10, 15, 20, 30, 40, 50 or 100 adjacent copies.
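Reference sequences made of adjacent copies of one repeat unit could be located with a regular expression, as in the sketch below. The function is hypothetical; it matches only the given phase of the repeat unit (for example `CA`, not `AC`) and reports each run once.

```python
import re


def find_tandem_repeats(sequence, unit, min_copies):
    """Locate runs of at least `min_copies` adjacent copies of a repeat unit.

    Returns (start, end, copies) with 0-based, half-open coordinates. Matches
    are non-overlapping and greedy, so each run is reported once.
    """
    pattern = re.compile(f"(?:{re.escape(unit)}){{{min_copies},}}")
    return [
        (m.start(), m.end(), (m.end() - m.start()) // len(unit))
        for m in pattern.finditer(sequence)
    ]


seq = "GG" + "CA" * 6 + "T" + "CA" * 2
print(find_tandem_repeats(seq, "CA", 5))  # [(2, 14, 6)]
```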

The method may include a further evaluation step, wherein any such absolute number of sequence reads of each associated sequence read group or of a collection of associated sequence read groups, the average number of sequence reads per associated sequence read group or per collection of groups, or the relative or fractional number of sequence reads mapping to a reference sequence, may be compared with a threshold or with one or more ranges of values. Optionally, a number above or below the threshold, or within one or more ranges of values, indicates or determines the presence of a genetic or chromosomal disorder or abnormality. Optionally, any such analysis may indicate or determine a copy number gain of any length in nucleotides, a copy number loss of any length in nucleotides, a microdeletion of any length, a chromosomal aneuploidy, or any other structural or chromosomal disorder or abnormality. Optionally, the number of associated sequence read groups, or of collections of associated sequence read groups, that are above such a threshold, below such a threshold, or within one or more such ranges of values may be counted.

The method may include a further evaluation step, wherein, for each associated sequence read group, any such absolute number of sequence reads, the average number of sequence reads per associated sequence read group, or the relative or fractional number of sequence reads mapping to a reference sequence may be compared between two or more different reference sequences. Optionally, such a number from a first reference sequence may be compared with such a number from a second reference sequence. Optionally, two or more reference sequences of equal length may be used. Optionally, two or more reference sequences of different lengths may be used, wherein the number for each reference sequence is normalised to the length of that reference sequence before comparison. Optionally, the absolute difference between the first and second such numbers may be compared with a threshold or with one or more ranges of values, wherein a difference above the threshold, below the threshold or within one or more such ranges indicates or determines the presence of a genetic or chromosomal disorder or abnormality. Optionally, the relative difference between the first and second such numbers (for example expressed as a ratio, fraction or percentage) may be compared with a threshold or with one or more ranges of values, wherein a difference above the threshold, below the threshold or within one or more such ranges indicates or determines the presence of a genetic or chromosomal disorder or abnormality. Optionally, any such analysis may indicate or determine a copy number gain of any length in nucleotides, a copy number loss of any length in nucleotides, a microdeletion of any length, a chromosomal aneuploidy, or any other structural or chromosomal disorder or abnormality. Optionally, any of the above analyses may be carried out to determine such a number, fraction, ratio or relative difference between two or more different reference sequences for the associated sequence read groups from fetal-origin particles; the same analysis may be carried out to determine such a number, fraction, ratio or relative difference between two or more different reference sequences for the associated sequence read groups from maternal-origin particles; and the number, fraction, ratio or relative difference for the fetal-origin sequences may be compared with the corresponding number, fraction, ratio or relative difference for the maternal-origin sequences to generate a ratio, fraction or comparison value.
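The length normalisation and range test can be sketched as follows. This is a sketch under assumptions: counts are normalised to reads per megabase, the accepted range is a placeholder rather than a value from the source, and the example counts, lengths and function names are invented.

```python
def compare_reference_counts(count_a, length_a, count_b, length_b):
    """Compare read counts between two reference sequences of different lengths.

    Each count is first normalised to reads per megabase of its reference
    sequence; the absolute difference, ratio and percentage difference of the
    normalised values are then returned.
    """
    density_a = count_a / (length_a / 1e6)
    density_b = count_b / (length_b / 1e6)
    return {
        "absolute": density_a - density_b,
        "ratio": density_a / density_b,
        "percent": 100 * (density_a - density_b) / density_b,
    }


def outside_range(value, low, high):
    """True when a value falls outside the accepted [low, high] range."""
    return not (low <= value <= high)


cmp = compare_reference_counts(count_a=600, length_a=40e6, count_b=1000, length_b=100e6)
print(outside_range(cmp["ratio"], 0.9, 1.1))  # True: the ratio is 1.5
```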

The method may include determining the average number of sequence reads mapping to a reference sequence per fetal-origin associated sequence read group, wherein the average number is compared with a threshold, and a number above or below the threshold indicates or determines the presence of a fetal genetic or chromosomal disorder or abnormality. Optionally, the reference sequence comprises substantially an entire chromosome, and a number above the threshold indicates or determines the presence of a fetal chromosomal trisomy. Optionally, the reference sequence comprises substantially an entire genomic microdeletion region, and a number below the threshold indicates or determines the presence of a fetal microdeletion.
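A minimal sketch of the threshold comparison, assuming the per-group read counts on the reference sequence have already been computed for every fetal-origin group. The function names, threshold and counts are placeholders; in practice a threshold would be derived from samples of known status.

```python
def mean_reads_per_fetal_group(counts):
    """counts: reads mapping to the reference sequence, one entry per fetal-origin group."""
    if not counts:
        raise ValueError("no fetal-origin groups")
    return sum(counts) / len(counts)


def call_fetal_abnormality(counts, threshold, kind):
    """Flag a possible fetal trisomy (mean above threshold) or microdeletion (mean below)."""
    m = mean_reads_per_fetal_group(counts)
    if kind == "trisomy":
        return m > threshold
    if kind == "microdeletion":
        return m < threshold
    raise ValueError(f"unknown kind: {kind}")


chr21_counts = [3, 4, 2, 5, 4]  # hypothetical reads on chromosome 21 per fetal group
print(call_fetal_abnormality(chr21_counts, threshold=3.0, kind="trisomy"))  # True (mean 3.6)
```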

The method may include determining the average number of sequence reads mapping to a first reference sequence per fetal-origin associated sequence read group and the average number of sequence reads mapping to a second reference sequence per fetal-origin associated sequence read group, and then determining the relative difference between the first and second such numbers (for example expressed as a ratio, fraction or percentage), wherein the relative difference is compared with a threshold, and a difference above or below the threshold indicates or determines the presence of a fetal genetic or chromosomal disorder or abnormality. Optionally, the first reference sequence comprises substantially an entire chromosome, and a relative difference above the threshold indicates or determines the presence of a fetal chromosomal trisomy. Optionally, the first reference sequence comprises substantially an entire genomic microdeletion region, and a relative difference below the threshold indicates or determines the presence of a fetal microdeletion.
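The relative-difference variant can be sketched as below. The function name, counts and threshold are hypothetical; because chromosomes differ in size and mappability, the expected ratio for an unaffected fetus is generally not 1, so a threshold would in practice be calibrated against euploid samples.

```python
def relative_difference(target_counts, reference_counts, form="ratio"):
    """Relative difference between mean per-group counts on two reference sequences.

    target_counts / reference_counts: reads per fetal-origin group mapping to the
    first (e.g. a chromosome of interest) and second reference sequences.
    """
    mean_target = sum(target_counts) / len(target_counts)
    mean_reference = sum(reference_counts) / len(reference_counts)
    if form == "ratio":
        return mean_target / mean_reference
    if form == "percent":
        return 100 * (mean_target - mean_reference) / mean_reference
    raise ValueError(f"unknown form: {form}")


ratio = relative_difference([3, 4, 2, 5, 4], [2, 3, 2, 2, 3])
print(ratio > 1.3)  # trisomy flag with a placeholder threshold of 1.3
```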

The present invention provides a method of determining a fetal genotype, comprising: (a) determining associated sequence read groups of fetal origin by any method described herein; and (b) determining the fetal genotype from the fetal-origin associated sequence read groups.

The fetal genotype may be a fetal chromosomal abnormality (for example an aneuploidy).

The present invention provides methods of determining a fetal genotype, a fetal genome sequence, or a phased fetal genome sequence, or a composition or fraction thereof, wherein the sequences comprising the fetal genotype or genome sequence are determined from the sequences in associated sequence read groups from fetal-origin particles. Optionally, the genotype or genome may comprise sequences or sequence variants from both haplotypes of the fetal genome (for example the paternally inherited haplotype and the maternally inherited haplotype). Optionally, the fetal genotype or genome may also comprise one or more structural or chromosomal abnormalities, which may be paternally or maternally inherited or may have arisen de novo. Optionally, the fetal genotype or genome may also comprise one or more de novo single-nucleotide variants that were not maternally or paternally inherited.

The method may include determining the sequence of fetal genomic DNA from the sequences in associated sequence read groups from fetal-origin particles, and determining one or both haplotypes. Optionally, the genomic DNA may comprise sequences or sequence variants from both haplotypes of the fetal genome, and one or both haplotypes are estimated or phased using a haplotype phasing algorithm or a haplotype estimation algorithm. Optionally, before the haplotype phasing algorithm is applied, the list of sequences or sequence variants may be processed or filtered so that only sequences or sequence variants meeting at least a certain confidence level, at least a certain level of accuracy, or a threshold on any one or more other parameters are used in the subsequent phasing or haplotype estimation step. Optionally, before the phasing or haplotype estimation step, error-correction and/or redundant sequencing methods are used to improve the accuracy of the sequences or sequence variants. Optionally, the haplotype phasing or estimation algorithm may also use one or more haplotypes, or a set of haplotype blocks, from a population. Optionally, any of the above methods may be used to determine a haplotype corresponding to a specific chromosome or chromosomal segment and, optionally, both the maternally inherited and the paternally inherited haplotypes of that chromosome or chromosomal segment may be determined.
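The filtering step that precedes phasing can be sketched as a simple threshold filter. The field names, function name and default thresholds are hypothetical, and the phasing or haplotype estimation algorithm itself, which the source leaves unspecified, is not shown.

```python
def filter_for_phasing(variants, min_quality=30, min_depth=10):
    """Keep only variant calls that pass confidence thresholds.

    variants: list of dicts with hypothetical keys 'pos', 'alt', 'qual', 'depth'.
    The default thresholds are placeholders, not values from the source.
    """
    return [v for v in variants if v["qual"] >= min_quality and v["depth"] >= min_depth]


calls = [
    {"pos": 101, "alt": "T", "qual": 45, "depth": 22},
    {"pos": 250, "alt": "G", "qual": 12, "depth": 30},  # fails the quality threshold
    {"pos": 399, "alt": "A", "qual": 60, "depth": 4},   # fails the depth threshold
]
print([v["pos"] for v in filter_for_phasing(calls)])  # [101]
```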

As described herein, the method may include any step of counting sequence reads and/or of counting weighted, averaged, absolute, relative or normalised sequence reads. Such a step may follow a deduplication step in which, before any further analysis, counting, evaluation, processing or manipulation step, molecules that were sequenced two or more times in the sequencing reaction are collapsed into a single representation. Optionally, the deduplication step may further comprise an error-correction step in which, before any counting or further analysis step, errors and/or mismatches within duplicate molecules are detected and/or quantified and/or corrected.
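Deduplication with a simple error-correction step can be sketched as below. It assumes, hypothetically, that reads sharing the same barcode and mapping coordinates are copies of one molecule and that copies have equal length; errors are corrected by a per-position majority vote, which is one possible scheme rather than the source's.

```python
from collections import Counter, defaultdict


def collapse_duplicates(reads):
    """Collapse duplicate reads into one consensus sequence per molecule.

    reads: iterable of dicts with keys 'barcode', 'chrom', 'start', 'end', 'seq'.
    Reads sharing barcode and coordinates are treated as copies of one molecule
    (an assumption). Sequences in a family are assumed to be equal length; each
    position takes the majority base, and a tied position becomes 'N'.
    """
    families = defaultdict(list)
    for read in reads:
        key = (read["barcode"], read["chrom"], read["start"], read["end"])
        families[key].append(read["seq"])
    collapsed = {}
    for key, seqs in families.items():
        consensus = []
        for column in zip(*seqs):
            ranked = Counter(column).most_common()
            tied = len(ranked) > 1 and ranked[0][1] == ranked[1][1]
            consensus.append("N" if tied else ranked[0][0])
        collapsed[key] = "".join(consensus)
    return collapsed


reads = [
    {"barcode": "B1", "chrom": "chr1", "start": 100, "end": 104, "seq": "ACGT"},
    {"barcode": "B1", "chrom": "chr1", "start": 100, "end": 104, "seq": "ACGT"},
    {"barcode": "B1", "chrom": "chr1", "start": 100, "end": 104, "seq": "ACTT"},  # error at position 3
    {"barcode": "B2", "chrom": "chr1", "start": 100, "end": 104, "seq": "ACGA"},
]
print(collapse_duplicates(reads))
```

Downstream counting then operates on one entry per molecule rather than one per sequenced copy.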

The present invention provides methods of combined or joint evaluation of associated sequence read groups from fetal-origin and/or maternal-origin particles, wherein the method includes carrying out a first evaluation, comprising any analysis described herein, to determine a disorder, event or abnormality of a first sequence or chromosome; and carrying out a second evaluation, comprising any analysis described herein, to determine a disorder, event or abnormality of a second sequence or chromosome. Optionally, at least 3, at least 10, at least 100, at least 1000, at least 10,000 or at least 1,000,000 such evaluations or analyses are carried out for disorders, events or abnormalities of different sequences or chromosomes. Optionally, any such analysis or evaluation may be carried out jointly with sequence analysis of sequence data that are not associated.

36. Methods of diagnosis and monitoring

The present invention provides methods of diagnosis and monitoring based on any of the methods described herein.

The present invention provides a method of diagnosing a disease or disorder in a subject, the method comprising: (a) determining a parameter value of a first associated sequence read group determined from a test sample from the subject, wherein the parameter value is determined according to any method described herein; and (b) comparing the parameter value of the associated sequence read group determined from the test sample with a control parameter value.

The control parameter value may be determined from a second associated sequence read group determined from the test sample from the subject, wherein the control parameter value is determined according to any method described herein.

The control parameter value may be determined from an associated sequence read group determined from a control sample, wherein the control parameter value is determined according to any method described herein.

The disease or disorder may be cancer, a chromosomal aneuploidy or microdeletion, a genomic copy number variation (for example a copy number gain or loss), loss of heterozygosity, a rearrangement or translocation event, a single-nucleotide variant, or a de novo mutation.

The present invention provides a method of monitoring a disease or disorder in a subject, the method comprising: (a) determining a parameter value of a first associated sequence read group determined from a test sample from the subject, wherein the parameter value is determined according to any method described herein; and (b) comparing the parameter value of the associated sequence read group with a control parameter value.

The control parameter value may be determined from a second associated sequence read group determined from a control sample obtained from the same subject at an earlier time point than the test sample. The time interval between obtaining the control sample and the test sample may be at least 1 day, at least 1 week, at least 1 month or at least 1 year.

The parameter value may be determined, and/or any second analysis step described herein may be carried out, independently for the associated sequence read groups of two or more different samples from the subject that are separated by a time interval, wherein the two or more different samples come from the same subject and the time interval is at least 1 day, at least 1 week, at least 1 month, at least 1 year, at least 2 years or at least 3 years. Any such parameter values and/or results of the second analysis step may be compared between any two or more such different samples. Such a comparison step may determine an absolute or relative difference between the parameter values and/or the results of the second analysis step. Optionally, such an absolute or relative difference may be normalised to, and/or divided by, the length of the time interval between the two samples. Optionally, such an absolute or relative difference and/or the associated normalised value may be compared with one or more thresholds, wherein a value above such a threshold may indicate a disease or disorder, for example cancer, or a high risk of developing cancer.
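Normalising a between-sample difference by the time interval can be sketched as follows. The function name, parameter values, dates and threshold are placeholders chosen for illustration.

```python
from datetime import date


def change_per_day(early_value, late_value, early_date, late_date, relative=False):
    """Absolute (or relative) change between two samples divided by the days between them."""
    days = (late_date - early_date).days
    if days <= 0:
        raise ValueError("later sample must be collected after the earlier one")
    change = abs(late_value - early_value)
    if relative:
        # Express the change as a fraction of the earlier value
        change /= abs(early_value)
    return change / days


rate = change_per_day(0.01, 0.04, date(2020, 1, 1), date(2020, 1, 31))
print(rate > 0.0005)  # True with a placeholder threshold
```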

The disease or disorder may be cancer.

The present invention provides a method of diagnosing a disease or disorder in a subject, the method comprising: (a) determining associated sequence read groups according to any method described herein, wherein the sample comprises blood-derived particles; and (b) comparing each sequence read (or at least part thereof) of the associated sequence read groups with a reference list of sequences present in diseased cells, wherein the presence, in one or more sequence reads of an associated sequence read group, of one or more sequences from the reference list indicates the presence of the disease.

The disease or disorder may be cancer.

The present invention provides a method of identifying associated sequence read groups derived from diseased cells (for example tumour cells), the method comprising: (a) determining associated sequence read groups according to any method described herein, wherein the sample comprises blood-derived particles; (b) comparing each sequence read (or at least part thereof) of the associated sequence read groups with a reference list of sequences present in diseased cells (for example tumour cells); and (c) identifying associated sequence read groups derived from diseased cells (for example tumour cells) by the presence, in one or more sequence reads of an associated sequence read group, of one or more sequences from the reference list.
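Identifying groups that contain sequences from a tumour reference list can be sketched with exact substring matching. This is a simplification assumed for illustration: it ignores sequencing errors and reverse-complement reads, and the input layout, function name and example sequences are hypothetical.

```python
def tumour_derived_groups(groups, tumour_sequences, min_hits=1):
    """Return ids of associated read groups containing listed tumour sequences.

    groups: dict group id -> list of read sequences.
    tumour_sequences: reference list of sequences found in tumour cells
    (e.g. short strings spanning known somatic variants; hypothetical input).
    A group is flagged when at least `min_hits` of its reads contain one.
    """
    flagged = set()
    for gid, reads in groups.items():
        hits = sum(1 for read in reads if any(s in read for s in tumour_sequences))
        if hits >= min_hits:
            flagged.add(gid)
    return flagged


groups = {"g1": ["ACGTTTGA", "CCCC"], "g2": ["GGGG"]}
print(tumour_derived_groups(groups, ["TTTG"]))  # {'g1'}
```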

The present invention provides a method of determining a tumour genotype, comprising: (a) determining associated sequence read groups of tumour origin according to any method described herein; and (b) determining the tumour genotype from the tumour-origin associated sequence read groups.

The sample may comprise blood-derived particles from a patient diagnosed with a disease (for example cancer).

The present invention is further defined in the following numbered clauses:

1. A method of analysing a sample comprising blood-derived particles, wherein a particle contains at least two genomic DNA fragments, the method comprising:

(a) preparing the sample for sequencing, comprising associating the at least two genomic DNA fragments of the particle to produce a set of at least two associated genomic DNA fragments; and

(b) sequencing each of the associated fragments in the set to produce at least two associated sequence reads.

2. The method of clause 1, wherein at least 3, at least 4, at least 5, at least 10, at least 50, at least 100, at least 500, at least 1000, at least 5000, at least 10,000, at least 100,000 or at least 1,000,000 genomic DNA fragments of the particle are associated and then sequenced to produce at least 3, at least 4, at least 5, at least 10, at least 50, at least 100, at least 500, at least 1000, at least 5000, at least 10,000, at least 100,000 or at least 1,000,000 associated sequence reads.

3. The method of clause 1 or clause 2, wherein the particle has a diameter of 100 to 5000 nm.

4. The method of any one of clauses 1 to 3, wherein the associated genomic DNA fragments are from a single genomic DNA molecule.

5. The method of any one of clauses 1 to 4, wherein the method further comprises estimating or determining the genomic sequence length of the associated genomic DNA fragments.

6. The method of any one of clauses 1 to 5, wherein the method further comprises a step of isolating the particles from blood, plasma or serum.

7. The method of clause 6, wherein the isolating step comprises centrifugation.

8. The method of clause 6 or clause 7, wherein the isolating step comprises size-exclusion chromatography.

9. The method of any one of clauses 6 to 8, wherein the isolating step comprises filtration.

10. The method of any one of clauses 1 to 9, wherein the sample comprises first and second blood-derived particles, each particle containing at least two genomic DNA fragments, and wherein the method comprises carrying out step (a) to produce a first set of associated genomic DNA fragments from the first particle and a second set of associated genomic DNA fragments from the second particle, and carrying out step (b) to produce a first set of associated sequence reads from the first particle and a second set of associated sequence reads from the second particle.

11. The method of any one of clauses 1 to 9, wherein the sample comprises n blood-derived particles, each particle containing at least two genomic DNA fragments, and wherein the method comprises carrying out step (a) to produce n sets of associated genomic DNA fragments, one set for each of the n particles, and carrying out step (b) to produce n sets of associated sequence reads, one set for each of the n particles.

12. The method of clause 11, wherein n is at least 3, at least 5, at least 10, at least 50, at least 100, at least 1000, at least 10,000, at least 100,000, at least 1,000,000, at least 10,000,000 or at least 100,000,000 particles.

13. The method of any one of clauses 10 to 12, wherein, before step (a), the method further comprises a step of partitioning the sample into at least two different reaction volumes.

14. A method of preparing a sample for sequencing, wherein the sample comprises blood-derived particles, a particle containing at least two genomic DNA fragments, and wherein the method comprises attaching the at least two genomic DNA fragments of the particle to a barcode sequence, or to different barcode sequences of a set of barcode sequences, to produce a set of associated genomic DNA fragments.

15. The method of clause 14, wherein, before the step of attaching the at least two genomic DNA fragments of the particle to the barcode sequence or to the different barcode sequences of the set of barcode sequences, the method comprises attaching a coupling sequence to each genomic DNA fragment of the particle, wherein the coupling sequence is then attached to the barcode sequence or to the different barcode sequences of the set of barcode sequences to produce the set of associated genomic DNA fragments.

16. The method of clause 14 or clause 15, wherein the sample comprises first and second blood-derived particles, each particle containing at least two genomic DNA fragments, and wherein the method comprises attaching the at least two genomic DNA fragments of the first particle to a first barcode sequence, or to different barcode sequences of a first set of barcode sequences, to produce a first set of associated genomic DNA fragments, and attaching the at least two genomic DNA fragments of the second particle to a second barcode sequence, or to different barcode sequences of a second set of barcode sequences, to produce a second set of associated genomic DNA fragments.

17. The method of any one of clauses 1 to 13, the method comprising:

(a) preparing the sample for sequencing, comprising attaching the at least two genomic DNA fragments of the particle to a barcode sequence to produce a set of associated genomic DNA fragments; and

(b) sequencing each of the associated fragments in the set to produce at least two associated sequence reads, wherein the at least two associated sequence reads are associated by the barcode sequence.

18. The method of clause 17, wherein, before the step of attaching the at least two genomic DNA fragments of the particle to the barcode sequence, the method comprises attaching a coupling sequence to each genomic DNA fragment of the particle, wherein the coupling sequence is then attached to the barcode sequence to produce the set of associated genomic DNA fragments.

19. The method of clause 17 or clause 18, wherein the sample comprises first and second blood-derived particles, each particle containing at least two genomic DNA fragments, and wherein the method comprises carrying out step (a) to produce a first set of associated genomic DNA fragments from the first particle and a second set of associated genomic DNA fragments from the second particle, and carrying out step (b) to produce a first set of associated sequence reads from the first particle and a second set of associated sequence reads from the second particle, wherein the at least two associated sequence reads of the first particle are associated by a different barcode sequence from the at least two associated sequence reads of the second particle.

20. The method of any one of clauses 1 to 13, the method comprising:

(a) preparing the sample for sequencing, comprising attaching each of the at least two genomic DNA fragments of the particle to a different barcode sequence of a set of barcode sequences to produce a set of associated genomic DNA fragments; and

(b) sequencing each of the associated fragments in the set to produce at least two associated sequence reads, wherein the at least two associated sequence reads are associated by the set of barcode sequences.

21. The method of clause 20, wherein, before the step of attaching each of the at least two genomic DNA fragments of the particle to a different barcode sequence, the method comprises attaching a coupling sequence to each genomic DNA fragment of the particle, wherein each of the at least two genomic DNA fragments of the particle is attached, via its coupling sequence, to a different barcode sequence of the set of barcode sequences.

22. The method of clause 20 or clause 21, wherein the sample comprises first and second blood-derived particles, each particle containing at least two genomic DNA fragments, and wherein the method comprises carrying out step (a) to produce a first set of associated genomic DNA fragments from the first particle and a second set of associated genomic DNA fragments from the second particle, and carrying out step (b) to produce a first set of associated sequence reads from the first particle and a second set of associated sequence reads from the second particle, wherein the first set of associated sequence reads is associated by a different set of barcode sequences from the second set of associated sequence reads.

23. method described in any one of clause 14 to 22, wherein the method includes preparing the first He for sequencing Second sample, wherein each sample includes at least one particle for being originated from blood, wherein each particle contains at least two gene Group DNA fragmentation, and wherein sequence of barcodes respectively contains sample identifier region, and the method comprise the steps that

(i) step (a) is carried out to each sample, wherein being attached to the bar code of the genomic DNA fragment from the first sample Sequence has different sample identifier regions from the sequence of barcodes for being attached to the genomic DNA fragment from the second sample;

(ii) performing step (b) on each sample, wherein each linked sequence read comprises the sequence of a sample identifier region; and

(iii) determining, from its sample identifier region, the sample from which each linked sequence read was obtained.
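Computationally, step (iii) is a lookup of each read's sample identifier region against the identifiers assigned in step (i). The sketch below is illustrative only: it assumes that the sample identifier occupies the first six bases of each read and must match exactly, and the identifier sequences and the function names `assign_sample` and `demultiplex` are hypothetical.

```python
from collections import defaultdict
from typing import Optional

# Hypothetical sample identifier regions assigned in step (i).
SAMPLE_IDS = {"ACGTAC": "sample_1", "TGCATG": "sample_2"}
# Assumption: the identifier occupies the first 6 bases of every read.
ID_LENGTH = 6


def assign_sample(read: str) -> Optional[str]:
    """Return the sample a read came from, or None if its identifier is unknown."""
    return SAMPLE_IDS.get(read[:ID_LENGTH])


def demultiplex(reads: list[str]) -> dict[str, list[str]]:
    """Partition reads by sample, discarding reads with unrecognized identifiers."""
    by_sample: dict[str, list[str]] = defaultdict(list)
    for read in reads:
        sample = assign_sample(read)
        if sample is not None:
            by_sample[sample].append(read)
    return dict(by_sample)


reads = ["ACGTACGGATTC", "TGCATGCCTAGA", "ACGTACTTTGCA", "NNNNNNAAAA"]
print(demultiplex(reads))
# {'sample_1': ['ACGTACGGATTC', 'ACGTACTTTGCA'], 'sample_2': ['TGCATGCCTAGA']}
```

Exact matching is the simplest possible policy; in practice, sequencing errors in the identifier region would also need to be handled.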

24. The method of any one of clauses 14 to 23, wherein, before, during and/or after the step of attaching the barcode sequences and/or coupling sequences, the method comprises crosslinking the genomic DNA fragments within the microparticle.

25. The method of any one of clauses 14 to 24, wherein, before, during and/or after the step of attaching the barcode sequences and/or coupling sequences, and/or optionally after the step of crosslinking the genomic DNA fragments within the microparticle, the method comprises permeabilizing the microparticle.

26. The method of any one of clauses 14 to 25, wherein, before the attaching step, the method further comprises partitioning the sample into at least two different reaction volumes.

27. A method of preparing a sample for sequencing, wherein the sample comprises first and second blood-derived microparticles, and wherein each microparticle contains at least two target nucleic acid fragments, the method comprising the steps of:

(a) contacting the sample with a library comprising at least two multimeric barcoding reagents, wherein each multimeric barcoding reagent comprises first and second barcode regions linked together, wherein each barcode region comprises a nucleic acid sequence, and wherein the first and second barcode regions of a first multimeric barcoding reagent of the library are different from the first and second barcode regions of a second multimeric barcoding reagent of the library; and

(b) attaching barcode sequences to each of the first and second target nucleic acid fragments of the first microparticle to generate first and second barcoded target nucleic acid molecules of the first microparticle, wherein the first barcoded target nucleic acid molecule comprises the nucleic acid sequence of the first barcode region of the first multimeric barcoding reagent and the second barcoded target nucleic acid molecule comprises the nucleic acid sequence of the second barcode region of the first multimeric barcoding reagent; and attaching barcode sequences to each of the first and second target nucleic acid fragments of the second microparticle to generate first and second barcoded target nucleic acid molecules of the second microparticle, wherein the first barcoded target nucleic acid molecule comprises the nucleic acid sequence of the first barcode region of the second multimeric barcoding reagent and the second barcoded target nucleic acid molecule comprises the nucleic acid sequence of the second barcode region of the second multimeric barcoding reagent.

28. The method of clause 27, wherein the method comprises the steps of:

(a) contacting the sample with a library comprising at least two multimeric barcoding reagents, wherein each multimeric barcoding reagent comprises first and second barcoded oligonucleotides linked together, wherein the barcoded oligonucleotides each comprise a barcode region, and wherein the barcode regions of the first and second barcoded oligonucleotides of a first multimeric barcoding reagent of the library are different from the barcode regions of the first and second barcoded oligonucleotides of a second multimeric barcoding reagent of the library; and

(b) annealing or ligating the first and second barcoded oligonucleotides of the first multimeric barcoding reagent to the first and second target nucleic acid fragments of the first microparticle to generate first and second barcoded target nucleic acid molecules, and annealing or ligating the first and second barcoded oligonucleotides of the second multimeric barcoding reagent to the first and second target nucleic acid fragments of the second microparticle to generate first and second barcoded target nucleic acid molecules.

29. The method of clause 28, wherein, before the step of annealing or ligating the first and second barcoded oligonucleotides to the first and second genomic DNA fragments, the method comprises attaching a coupling sequence to each genomic DNA fragment, wherein the first and second barcoded oligonucleotides are then annealed or ligated to the coupling sequences of the first and second genomic DNA fragments.

30. The method of clause 28 or clause 29, wherein step (b) comprises:

(i) annealing the first and second barcoded oligonucleotides of the first multimeric barcoding reagent to the first and second genomic DNA fragments of the first microparticle, and annealing the first and second barcoded oligonucleotides of the second multimeric barcoding reagent to the first and second genomic DNA fragments of the second microparticle; and

(ii) extending the first and second barcoded oligonucleotides of the first multimeric barcoding reagent to generate first and second different barcoded target nucleic acid molecules, and extending the first and second barcoded oligonucleotides of the second multimeric barcoding reagent to generate first and second different barcoded target nucleic acid molecules, wherein each barcoded target nucleic acid molecule comprises at least one nucleotide synthesized using the genomic DNA fragment as a template.

31. The method of clause 28 or clause 29, wherein the method comprises the steps of:

(a) contacting the sample with a library comprising at least two multimeric barcoding reagents, wherein each multimeric barcoding reagent comprises first and second barcoded oligonucleotides linked together, wherein the barcoded oligonucleotides each comprise, in the 5' to 3' direction, a target region and a barcode region, wherein the barcode regions of the first and second barcoded oligonucleotides of a first multimeric barcoding reagent of the library are different from the barcode regions of the first and second barcoded oligonucleotides of a second multimeric barcoding reagent of the library, and wherein the sample is further contacted with first and second target primers for each multimeric barcoding reagent; and

(b) performing the following steps for each microparticle:

(i) annealing the target region of the first barcoded oligonucleotide to a first subsequence of the first target nucleic acid fragment of the microparticle, and annealing the target region of the second barcoded oligonucleotide to a first subsequence of the second target nucleic acid fragment of the microparticle;

(ii) annealing the first target primer to a second subsequence of the first target nucleic acid fragment of the microparticle, wherein the second subsequence is 3' of the first subsequence, and annealing the second target primer to a second subsequence of the second target nucleic acid fragment of the microparticle, wherein the second subsequence is 3' of the first subsequence;

(iii) extending the first target primer, using the first target nucleic acid fragment of the microparticle as a template, until it reaches the first subsequence, to generate a first extended target primer, and extending the second target primer, using the second target nucleic acid fragment of the microparticle as a template, until it reaches the first subsequence, to generate a second extended target primer; and

(iv) ligating the 3' end of the first extended target primer to the 5' end of the first barcoded oligonucleotide to generate a first barcoded target nucleic acid molecule, and ligating the 3' end of the second extended target primer to the 5' end of the second barcoded oligonucleotide to generate a second barcoded target nucleic acid molecule, wherein the first and second barcoded target nucleic acid molecules are different and each comprise at least one nucleotide synthesized using the target nucleic acid as a template.

32. The method of any one of clauses 27 to 31, wherein the multimeric barcoding reagents each comprise:

(i) first and second hybridization molecules linked together, wherein each hybridization molecule comprises a nucleic acid sequence comprising a hybridization region; and

(ii) first and second barcoded oligonucleotides, wherein the first barcoded oligonucleotide is annealed to the hybridization region of the first hybridization molecule, and the second barcoded oligonucleotide is annealed to the hybridization region of the second hybridization molecule.

33. The method of clause 32, wherein the multimeric barcoding reagents each comprise:

(i) first and second barcode molecules linked together, wherein each barcode molecule comprises a nucleic acid sequence comprising a barcode region; and

(ii) first and second barcoded oligonucleotides, wherein the first barcoded oligonucleotide comprises a barcode region annealed to the barcode region of the first barcode molecule, and the second barcoded oligonucleotide comprises a barcode region annealed to the barcode region of the second barcode molecule.

34. A method of preparing a sample for sequencing, wherein the sample comprises at least two blood-derived microparticles, wherein each microparticle contains at least two target nucleic acid fragments, and wherein the method comprises the steps of:

(a) contacting the sample with a library comprising first and second multimeric barcoding reagents, wherein each multimeric barcoding reagent comprises first and second barcode molecules linked together, wherein each barcode molecule comprises a nucleic acid sequence comprising, optionally in the 5' to 3' direction, a barcode region and an adapter region;

(b) attaching a coupling sequence to the first and second target nucleic acid fragments of the first and second microparticles;

(c) for each multimeric barcoding reagent, annealing the coupling sequence of the first fragment to the adapter region of the first barcode molecule, and annealing the coupling sequence of the second fragment to the adapter region of the second barcode molecule; and

(d) for each multimeric barcoding reagent, attaching a barcode sequence to each of the at least two target nucleic acid fragments of the microparticle to generate first and second different barcoded target nucleic acid molecules, wherein the first barcoded target nucleic acid molecule comprises the nucleic acid sequence of the barcode region of the first barcode molecule, and the second barcoded target nucleic acid molecule comprises the nucleic acid sequence of the barcode region of the second barcode molecule.

35. The method of clause 34, wherein each barcode molecule comprises a nucleic acid sequence comprising, in the 5' to 3' direction, a barcode region and an adapter region, and wherein step (d) comprises, for each multimeric barcoding reagent, extending the coupling sequence of the first fragment using the barcode region of the first barcode molecule as a template to generate a first barcoded target nucleic acid molecule, and extending the coupling sequence of the second fragment using the barcode region of the second barcode molecule as a template to generate a second barcoded target nucleic acid molecule, wherein the first barcoded target nucleic acid molecule comprises a sequence complementary to the barcode region of the first barcode molecule, and the second barcoded target nucleic acid molecule comprises a sequence complementary to the barcode region of the second barcode molecule.
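Because the coupling sequence is extended by copying the barcode region, the barcoded target molecule carries the complement of that region; written 5' to 3', this is the reverse complement. The minimal sketch below shows how a barcode observed in a read could be traced back to the barcode molecule that templated it, assuming the barcode has already been extracted from the read. The barcode sequences and the function name `templating_molecule` are hypothetical.

```python
from typing import Optional

# Complement table for DNA bases; N (unknown base) maps to itself.
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(_COMPLEMENT)[::-1]


def templating_molecule(read_barcode: str, barcode_molecules: dict[str, str]) -> Optional[str]:
    """Find which barcode molecule templated the barcode seen in a read.

    barcode_molecules maps each barcode region (written 5' to 3' on the
    barcode molecule) to a molecule name. The extended strand carries the
    reverse complement, so the read barcode is converted back before lookup.
    """
    return barcode_molecules.get(reverse_complement(read_barcode))


molecules = {"AACCGG": "molecule_1", "TTAGCA": "molecule_2"}
print(reverse_complement("AACCGG"))              # CCGGTT
print(templating_molecule("CCGGTT", molecules))  # molecule_1
```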

36. The method of clause 34, wherein each barcode molecule comprises a nucleic acid sequence comprising, in the 5' to 3' direction, an adapter region and a barcode region, and wherein step (d) comprises, for each multimeric barcoding reagent:

(i) annealing a first extension primer and extending it using the barcode region of the first barcode molecule as a template to generate a first barcoded oligonucleotide, and annealing a second extension primer and extending it using the barcode region of the second barcode molecule as a template to generate a second barcoded oligonucleotide, wherein the first barcoded oligonucleotide comprises a sequence complementary to the barcode region of the first barcode molecule, and the second barcoded oligonucleotide comprises a sequence complementary to the barcode region of the second barcode molecule; and

(ii) ligating the 3' end of the first barcoded oligonucleotide to the 5' end of the coupling sequence of the first fragment to generate a first barcoded target nucleic acid molecule, and ligating the 3' end of the second barcoded oligonucleotide to the 5' end of the coupling sequence of the second fragment to generate a second barcoded target nucleic acid molecule.

37. The method of clause 34, wherein each barcode molecule comprises a nucleic acid sequence comprising, in the 5' to 3' direction, an adapter region, a barcode region and a priming region, and wherein step (d) comprises, for each multimeric barcoding reagent:

(i) annealing a first extension primer to the priming region of the first barcode molecule and extending it using the barcode region of the first barcode molecule as a template to generate a first barcoded oligonucleotide, and annealing a second extension primer to the priming region of the second barcode molecule and extending it using the barcode region of the second barcode molecule as a template to generate a second barcoded oligonucleotide, wherein the first barcoded oligonucleotide comprises a sequence complementary to the barcode region of the first barcode molecule, and the second barcoded oligonucleotide comprises a sequence complementary to the barcode region of the second barcode molecule; and

(ii) ligating the 3' end of the first barcoded oligonucleotide to the 5' end of the coupling sequence of the first fragment to generate a first barcoded target nucleic acid molecule, and ligating the 3' end of the second barcoded oligonucleotide to the 5' end of the coupling sequence of the second fragment to generate a second barcoded target nucleic acid molecule.

38. The method of clause 34, wherein the method comprises the steps of:

(a) contacting the sample with a library comprising first and second multimeric barcoding reagents, wherein each multimeric barcoding reagent comprises first and second barcode molecules linked together, wherein each barcode molecule comprises a nucleic acid sequence comprising, in the 5' to 3' direction, a barcode region and an adapter region, and wherein the sample is further contacted with first and second adapter oligonucleotides for each multimeric barcoding reagent, wherein the first and second adapter oligonucleotides each comprise an adapter region;

(b) ligating the first and second adapter oligonucleotides of the first multimeric barcoding reagent to the first and second target nucleic acid fragments of the first microparticle, and ligating the first and second adapter oligonucleotides of the second multimeric barcoding reagent to the first and second target nucleic acid fragments of the second microparticle;

(c) for each multimeric barcoding reagent, annealing the adapter region of the first adapter oligonucleotide to the adapter region of the first barcode molecule, and annealing the adapter region of the second adapter oligonucleotide to the adapter region of the second barcode molecule; and

(d) for each multimeric barcoding reagent, extending the first adapter oligonucleotide using the barcode region of the first barcode molecule as a template to generate a first barcoded target nucleic acid molecule, and extending the second adapter oligonucleotide using the barcode region of the second barcode molecule as a template to generate a second barcoded target nucleic acid molecule, wherein the first barcoded target nucleic acid molecule comprises a sequence complementary to the barcode region of the first barcode molecule, and the second barcoded target nucleic acid molecule comprises a sequence complementary to the barcode region of the second barcode molecule.

39. The method of clause 34, wherein the method comprises the steps of:

(a) contacting the sample with a library comprising first and second multimeric barcoding reagents, wherein each multimeric barcoding reagent comprises:

(i) first and second barcode molecules linked together, wherein each barcode molecule comprises a nucleic acid sequence comprising, optionally in the 5' to 3' direction, an adapter region and a barcode region; and

(ii) first and second barcoded oligonucleotides, wherein the first barcoded oligonucleotide comprises a barcode region annealed to the barcode region of the first barcode molecule, wherein the second barcoded oligonucleotide comprises a barcode region annealed to the barcode region of the second barcode molecule, and wherein the barcode regions of the first and second barcoded oligonucleotides of a first multimeric barcoding reagent of the library are different from the barcode regions of the first and second barcoded oligonucleotides of a second multimeric barcoding reagent of the library; wherein the sample is further contacted with first and second adapter oligonucleotides for each multimeric barcoding reagent, wherein the first and second adapter oligonucleotides each comprise an adapter region;

(b) annealing or ligating the first and second adapter oligonucleotides of the first multimeric barcoding reagent to the first and second target nucleic acid fragments of the first microparticle, and annealing or ligating the first and second adapter oligonucleotides of the second multimeric barcoding reagent to the first and second target nucleic acid fragments of the second microparticle;

(c) for each multimeric barcoding reagent, annealing the adapter region of the first adapter oligonucleotide to the adapter region of the first barcode molecule, and annealing the adapter region of the second adapter oligonucleotide to the adapter region of the second barcode molecule; and

(d) for each multimeric barcoding reagent, ligating the 3' end of the first barcoded oligonucleotide to the 5' end of the first adapter oligonucleotide to generate a first barcoded target nucleic acid molecule, and ligating the 3' end of the second barcoded oligonucleotide to the 5' end of the second adapter oligonucleotide to generate a second barcoded target nucleic acid molecule.

40. The method of clause 39, wherein step (b) comprises annealing the first and second adapter oligonucleotides of the first multimeric barcoding reagent to the first and second target nucleic acid fragments of the first microparticle, and annealing the first and second adapter oligonucleotides of the second multimeric barcoding reagent to the first and second target nucleic acid fragments of the second microparticle, and wherein:

(i) for each multimeric barcoding reagent, step (d) comprises ligating the 3' end of the first barcoded oligonucleotide to the 5' end of the first adapter oligonucleotide to generate a first barcoded adapter oligonucleotide, ligating the 3' end of the second barcoded oligonucleotide to the 5' end of the second adapter oligonucleotide to generate a second barcoded adapter oligonucleotide, and extending the first and second barcoded adapter oligonucleotides to generate first and second different barcoded target nucleic acid molecules, each of which comprises at least one nucleotide synthesized using the target nucleic acid fragment as a template; or

(ii) for each multimeric barcoding reagent, before step (d), the method comprises extending the first and second adapter oligonucleotides to generate first and second different target nucleic acid molecules, each of which comprises at least one nucleotide synthesized using the target nucleic acid fragment as a template.

41. The method of any one of clauses 38 to 40, wherein, before the step of annealing or ligating the first and second adapter oligonucleotides to the first and second target nucleic acid fragments, the method comprises attaching a coupling sequence to each target nucleic acid fragment, wherein the first and second adapter oligonucleotides are then annealed or ligated to the coupling sequences of the first and second target nucleic acid fragments.

42. The method of any one of clauses 27 to 41, wherein steps (a) and (b), and optionally steps (c) and (d), are performed on the at least two microparticles in a single reaction volume.

43. The method of any one of clauses 27 to 41, wherein, before step (b), the method further comprises partitioning the sample into at least two different reaction volumes.

44. The method of any one of clauses 1 to 26, wherein the method comprises the steps of:

(a) preparing the sample for sequencing, comprising:

(i) contacting the sample with a multimeric barcoding reagent comprising first and second barcode regions linked together, wherein each barcode region comprises a nucleic acid sequence; and

(ii) attaching a barcode sequence to each of at least two genomic DNA fragments of the microparticle to generate first and second different barcoded target nucleic acid molecules, wherein the first barcoded target nucleic acid molecule comprises the nucleic acid sequence of the first barcode region and the second barcoded target nucleic acid molecule comprises the nucleic acid sequence of the second barcode region; and

(b) sequencing each barcoded target nucleic acid molecule to generate at least two linked sequence reads.

45. The method of clause 44, wherein, before the step of attaching a barcode sequence to each of the at least two genomic DNA fragments of the microparticle, the method comprises attaching a coupling sequence to each genomic DNA fragment of the microparticle, wherein the barcode sequences are then attached to the coupling sequences of the at least two genomic DNA fragments of the microparticle to generate the first and second different barcoded target nucleic acid molecules.

46. The method of clause 44 or clause 45, wherein step (a) is performed by the method of any one of clauses 27 to 43.

47. The method of any one of clauses 44 to 46, wherein the method comprises preparing first and second samples for sequencing, wherein each sample comprises at least one blood-derived microparticle, wherein the microparticle contains at least two genomic DNA fragments, and wherein the barcode sequences each comprise a sample identifier region, the method comprising the steps of:

(i) performing step (a) on each sample, wherein the barcode sequences attached to the genomic DNA fragments from the first sample have a different sample identifier region from the barcode sequences attached to the genomic DNA fragments from the second sample;

(ii) performing step (b) on each sample, wherein each sequence read comprises the sequence of a sample identifier region; and

(iii) determining, from its sample identifier region, the sample from which each sequence read was obtained.
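The identifier lookup in step (iii) can be made tolerant of a small number of sequencing errors. The sketch below shows one hypothetical policy that the clause itself does not specify: a read is assigned to a sample only if exactly one known identifier lies within a given Hamming distance of the read's first six bases. The identifier sequences, the six-base layout and the function names are assumptions.

```python
from typing import Optional

# Hypothetical sample identifier regions; assumed to be the first 6 read bases.
SAMPLE_IDS = {"ACGTAC": "sample_1", "TGCATG": "sample_2"}
ID_LENGTH = 6


def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def assign_sample(read: str, max_mismatches: int = 1) -> Optional[str]:
    """Assign a read to a sample, allowing up to `max_mismatches` errors.

    Returns None if the read is too short, or if zero or several identifiers
    lie within the allowed distance (an ambiguous assignment).
    """
    prefix = read[:ID_LENGTH]
    if len(prefix) < ID_LENGTH:
        return None
    hits = [sample for ident, sample in SAMPLE_IDS.items()
            if hamming(prefix, ident) <= max_mismatches]
    return hits[0] if len(hits) == 1 else None


print(assign_sample("ACGTAGGGATTC"))  # sample_1 (one mismatch in the identifier)
print(assign_sample("AAAAAAGGATTC"))  # None (no identifier within one mismatch)
```

If the identifiers are designed to differ from one another at three or more positions, a one-mismatch tolerance can never match two identifiers at once, so the ambiguity branch only matters for less well-separated identifier sets.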

48. The method of any one of clauses 44 to 47, wherein the method comprises analyzing a sample comprising at least two blood-derived microparticles, wherein each microparticle contains at least two genomic DNA fragments, and wherein the method comprises the steps of:

(a) preparing the sample for sequencing, comprising:

(i) contacting the sample with a library of multimeric barcoding reagents comprising a multimeric barcoding reagent for each of the two or more microparticles, wherein each multimeric barcoding reagent is as defined in any one of clauses 44 to 46; and

(ii) attaching barcode sequences to each of at least two genomic DNA fragments of each microparticle, wherein at least two barcoded target nucleic acid molecules are generated from each of the at least two microparticles, and wherein the at least two barcoded target nucleic acid molecules generated from a single microparticle each comprise the nucleic acid sequence of a barcode region from the same multimeric barcoding reagent; and

(b) sequencing each barcoded target nucleic acid molecule to generate at least two linked sequence reads for each microparticle.

49. The method of clause 48, wherein the barcode sequences are attached to the genomic DNA fragments of the microparticles in a single reaction volume.

50. The method of clause 48, wherein, before the attaching step, the method further comprises partitioning the sample into at least two different reaction volumes.

51. The method of any one of clauses 1 to 13, wherein the method comprises the steps of:

(a) preparing the sample for sequencing, comprising linking together at least two genomic DNA fragments of the microparticle to generate a single nucleic acid molecule comprising the sequences of the at least two genomic DNA fragments; and

(b) sequencing each fragment in the single nucleic acid molecule to generate at least two linked sequence reads.

52. The method of clause 51, wherein the at least two genomic DNA fragments are contiguous in the single nucleic acid molecule.

53. The method of clause 51, wherein, before the linking step, the method comprises attaching a coupling sequence to at least one genomic DNA fragment, and the at least two genomic DNA fragments are then linked together via the coupling sequence.

54. The method of any one of clauses 51 to 53, wherein the genomic DNA fragments are linked together by a ligation reaction.

55. The method of any one of clauses 51 to 54, wherein the sample comprises at least two blood-derived microparticles, wherein each microparticle contains at least two genomic DNA fragments, and wherein the method comprises performing step (a) to generate, for each microparticle, a single nucleic acid molecule comprising the sequences of at least two of its genomic DNA fragments, and performing step (b) to generate linked sequence reads for each microparticle.
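When fragments are joined through a coupling sequence as in clause 53, the linked sequence reads of step (b) can be recovered by splitting the read of the concatenated molecule at each occurrence of that sequence. The minimal sketch below assumes a hypothetical 8-base coupling sequence, error-free reads, and fragments that do not themselves contain the coupling sequence. Contiguous fragments as in clause 52 would need a different way of locating the junctions.

```python
# Hypothetical coupling sequence joining the fragments of one microparticle.
COUPLING_SEQUENCE = "GATCGATC"


def split_linked_read(read: str, coupling: str = COUPLING_SEQUENCE) -> list[str]:
    """Split a read of a concatenated molecule into its fragment sequences.

    All returned fragments come from one molecule and therefore form one
    group of linked sequence reads. Assumes exact matches to the coupling
    sequence; empty pieces (coupling sequence at a read end) are dropped.
    """
    return [piece for piece in read.split(coupling) if piece]


concatenated = "AAAA" + COUPLING_SEQUENCE + "CCCC" + COUPLING_SEQUENCE + "TTTT"
print(split_linked_read(concatenated))  # ['AAAA', 'CCCC', 'TTTT']
```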

56. The method of any one of clauses 51 to 55, wherein, before, during and/or after the step of linking together the at least two genomic DNA fragments, the method comprises crosslinking the genomic DNA fragments within the microparticle.

57. The method of any one of clauses 51 to 56, wherein, before, during and/or after the step of linking together the at least two genomic DNA fragments, and/or optionally after the step of crosslinking the genomic DNA fragments within the microparticle, the method comprises permeabilizing the microparticle.

58. The method of any one of clauses 55 to 57, wherein, before step (a), the method further comprises partitioning the sample into at least two different reaction volumes.

59. The method of any one of clauses 13, 26, 43, 50 and 58, wherein the sample comprising at least two microparticles is partitioned into at least two different reaction volumes.

60. The method of clause 59, wherein the different reaction volumes are provided by different reaction vessels.

61. The method of clause 59, wherein the different reaction volumes are provided by different aqueous droplets.

62. The method of clause 61, wherein the different aqueous droplets are different aqueous droplets in an emulsion.

63. The method of clause 61, wherein the different aqueous droplets are different aqueous droplets on a solid support.

64. The method of any one of clauses 1 to 13, wherein the method comprises the steps of:

(a) preparing the sample for sequencing, wherein at least two genomic DNA fragments of the microparticle are linked by their proximity to each other on a sequencing device, to generate a group of at least two linked genomic DNA fragments; and

(b) sequencing each linked genomic DNA fragment using the sequencing device to generate at least two linked sequence reads.

65. The method of clause 64, wherein the sample comprises at least two blood-derived microparticles, wherein each microparticle contains at least two genomic DNA fragments, and wherein the method comprises performing step (a) to generate a group of linked genomic DNA fragments for each microparticle, wherein the genomic DNA fragments of each microparticle occupy a spatially distinct location on the sequencing device, and performing step (b) to generate linked sequence reads for each microparticle.
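If each read carries the position of its cluster on the sequencing device, step (a) of clauses 64 and 65 corresponds to grouping reads by proximity. The sketch below assumes hypothetical (x, y) coordinates per read and a user-chosen radius, and uses single-linkage grouping with a union-find structure; the name `group_by_proximity` is illustrative. The pairwise loop is quadratic and is meant for illustration, not for full flow-cell datasets.

```python
import math


def group_by_proximity(coords: list[tuple[float, float]], radius: float) -> list[list[int]]:
    """Group read indices so that reads within `radius` of each other share a group.

    Grouping is transitive (single linkage): if read A is near B and B is near
    C, all three end up in one group even if A and C are far apart.
    """
    parent = list(range(len(coords)))

    def find(i: int) -> int:
        # Follow parent links to the group root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            if math.dist(coords[i], coords[j]) <= radius:
                parent[find(i)] = find(j)  # merge the two groups

    groups: dict[int, list[int]] = {}
    for i in range(len(coords)):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())


coords = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (50.0, 50.0), (51.0, 50.0)]
print(group_by_proximity(coords, radius=1.5))  # [[0, 1, 2], [3, 4]]
```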

66. The method of any one of clauses 1 to 13, wherein the method comprises the steps of:

(a) preparing the sample for sequencing, wherein at least two genomic DNA fragments of each microparticle are linked by being loaded into a separate sequencing lane, to generate a group of at least two linked genomic DNA fragments; and

(b) sequencing each linked genomic DNA fragment using the sequencing device to generate at least two linked sequence reads.

67. The method of clause 66, wherein the sample comprises at least two blood-derived microparticles, wherein each microparticle contains at least two genomic DNA fragments, and wherein the method comprises performing step (a) to generate linked genomic DNA fragments for each microparticle, wherein the at least two genomic DNA fragments of each microparticle are linked by being loaded into a separate sequencing lane, and performing step (b) on each sequencing lane to generate linked sequence reads for each microparticle.

68. A method of determining a group of linked sequence reads of genomic DNA fragments from a single microparticle, wherein the method comprises:

(a) analyzing a sample by the method of any one of clauses 1 to 26 and 44 to 67; and

(b) determining two or more linked sequence reads.

69. The method of clause 68, wherein the two or more linked sequence reads are determined by identifying sequence reads comprising the same barcode sequence.

70. The method of clause 68, wherein the two or more linked sequence reads are determined by identifying sequence reads comprising different barcode sequences from the same set of barcode sequences.

71. The method of clause 68, wherein the two or more linked sequence reads are determined by identifying sequence reads comprising barcode sequences of barcode regions from the same multimeric barcoding reagent.
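Clauses 69 to 71 differ only in the key used to decide which reads belong together. The minimal sketch below assumes that barcodes have already been extracted from each read and that the assignment of barcodes to barcode sets or reagents is known from the library design; the three-base barcodes, the reagent names and the function name `group_linked_reads` are hypothetical.

```python
from collections import defaultdict
from typing import Iterable, Optional


def group_linked_reads(
    reads: Iterable[tuple[str, str]],
    barcode_to_reagent: Optional[dict[str, str]] = None,
) -> dict[str, list[str]]:
    """Group sequence reads into sets of linked reads.

    reads: (barcode, insert_sequence) pairs, with barcodes already extracted.
    barcode_to_reagent: optional map from each barcode sequence to the barcode
    set or multimeric barcoding reagent it belongs to. Without it, reads are
    grouped by identical barcode (clause 69); with it, reads carrying different
    barcodes from the same set or reagent are grouped (clauses 70 and 71).
    Reads whose barcode is absent from the map are skipped.
    """
    groups: dict[str, list[str]] = defaultdict(list)
    for barcode, insert in reads:
        key = barcode if barcode_to_reagent is None else barcode_to_reagent.get(barcode)
        if key is not None:
            groups[key].append(insert)
    return dict(groups)


reads = [("AAA", "read1"), ("CCC", "read2"), ("GGG", "read3"), ("AAA", "read4")]
reagents = {"AAA": "reagent_1", "CCC": "reagent_1", "GGG": "reagent_2"}
print(group_linked_reads(reads))
# {'AAA': ['read1', 'read4'], 'CCC': ['read2'], 'GGG': ['read3']}
print(group_linked_reads(reads, reagents))
# {'reagent_1': ['read1', 'read2', 'read4'], 'reagent_2': ['read3']}
```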

72. A method of determining the total number of groups of linked sequence reads in a sequence dataset, comprising:

(a) analyzing a sample by the method of any one of clauses 1 to 26 and 44 to 67; and

(b) determining the number of groups of linked sequence reads.

73. The method of clause 72, wherein the number of groups of linked sequence reads is determined by counting the number of different barcode sequences comprised in the sequence reads.

74. The method of clause 72, wherein the number of groups of linked sequence reads is determined by counting the number of sets of barcode sequences whose barcode sequences are present in the sequence reads.

75. The method of clause 72, wherein the number of groups of linked sequence reads is determined by counting the number of multimeric barcoding reagents having a barcode region whose barcode sequence is present in the sequence reads.
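Counting groups (clauses 73 to 75) follows from the same choice of key: distinct barcodes, or distinct barcode sets or reagents observed among the reads. The minimal sketch below assumes extracted barcodes and a known, hypothetical barcode-to-reagent map; the function name `count_linked_groups` is illustrative.

```python
from typing import Iterable, Optional


def count_linked_groups(
    read_barcodes: Iterable[str],
    barcode_to_group: Optional[dict[str, str]] = None,
) -> int:
    """Count groups of linked sequence reads in a dataset.

    Without a map, each distinct barcode sequence counts as one group
    (clause 73). With a map from barcode to barcode set or multimeric
    barcoding reagent, the distinct sets or reagents observed are counted
    (clauses 74 and 75); barcodes missing from the map are ignored.
    """
    if barcode_to_group is None:
        return len(set(read_barcodes))
    return len({barcode_to_group[b] for b in read_barcodes if b in barcode_to_group})


barcodes = ["AAA", "CCC", "GGG", "AAA"]
reagents = {"AAA": "reagent_1", "CCC": "reagent_1", "GGG": "reagent_2"}
print(count_linked_groups(barcodes))            # 3
print(count_linked_groups(barcodes, reagents))  # 2
```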

76. A method of determining a parameter value from a group of linked sequence reads, the method comprising the steps of:

(a) determining a group of linked sequence reads by the method of any one of clauses 68 to 71;

(b) mapping at least part of each sequence read of the group of linked sequence reads to one or more reference nucleotide sequences; and

(c) by being counted to one or more reference nucleotide sequences in the sequence read group that is associated or identifying it In the presence of determining parameter value.
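Step (c) of clause 76 can be illustrated with a short sketch. It assumes the mapping in step (b) has already been done by an external aligner, so each read is represented only by a `(chromosome, position)` pair, and it stands in for a reference nucleotide sequence with a half-open genomic interval. Both representations, and the function name `parameter_value`, are assumptions for illustration.

```python
def parameter_value(mapped_reads, reference_region, presence_only=False):
    """Derive a parameter value for one set of linked sequence reads.

    mapped_reads: (chromosome, position) for each read in the set, assumed to
        come from an upstream aligner (the mapping itself is not shown here).
    reference_region: (chromosome, start, end), a half-open interval standing in
        for a reference nucleotide sequence.
    presence_only: if True, report only whether the region is represented.
    """
    chrom, start, end = reference_region
    hits = sum(1 for c, pos in mapped_reads if c == chrom and start <= pos < end)
    return hits > 0 if presence_only else hits


if __name__ == "__main__":
    linked_set = [("chr1", 100), ("chr1", 150), ("chr2", 10)]
    print(parameter_value(linked_set, ("chr1", 0, 200)))  # 2
    print(parameter_value(linked_set, ("chr3", 0, 200), presence_only=True))  # False
```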

77. A method of identifying a group of sets of linked sequence reads, comprising:

(a) determining a parameter value for each of two or more sets of linked sequence reads, wherein the parameter value for each set of linked sequence reads is determined by the method of clause 76; and

(b) comparing the parameter values of the sets of linked sequence reads with each other, or with one or more threshold values, to identify a group of two or more sets of linked sequence reads.

78. A method of determining the presence of a genomic rearrangement or structural variant in a set of linked sequence reads of genomic DNA fragments from a single microparticle, the method comprising:

(a) determining a set of linked sequence reads by the method of any one of clauses 68 to 71;

(b) mapping at least a portion of each sequence of the set of linked sequence reads to a first reference nucleotide sequence comprising a first genomic region, and mapping at least a portion of each sequence of the set of linked sequence reads to a second reference nucleotide sequence comprising a second genomic region; and

(c) counting the number of sequence reads from the set of linked sequence reads that are found to map to the first genomic region, and counting the number of sequence reads from the set of linked sequence reads that are found to map to the second genomic region.
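Step (c) of clause 78 reduces to two counts per linked set. The sketch below assumes reads are already mapped to `(chromosome, position)` pairs and adds a hypothetical decision rule, `suggests_rearrangement`, which calls a candidate rearrangement when both regions are supported by at least `min_reads` linked reads. The default of 10 echoes the "10 fragments from each side" figure used in the worked example later in this description, but it is an illustrative choice, and the coordinates below are toy values.

```python
def count_reads_in_regions(mapped_reads, first_region, second_region):
    """Return (n_first, n_second): reads of one linked set mapping to each region."""

    def inside(chrom, pos, region):
        region_chrom, start, end = region
        return chrom == region_chrom and start <= pos < end

    n_first = sum(1 for c, p in mapped_reads if inside(c, p, first_region))
    n_second = sum(1 for c, p in mapped_reads if inside(c, p, second_region))
    return n_first, n_second


def suggests_rearrangement(mapped_reads, first_region, second_region, min_reads=10):
    """Hypothetical call rule: enough linked reads in both genomic regions."""
    n_first, n_second = count_reads_in_regions(mapped_reads, first_region, second_region)
    return n_first >= min_reads and n_second >= min_reads


if __name__ == "__main__":
    # One linked set with 12 reads in a chr9 window and 11 in a chr22 window (toy coordinates).
    linked_set = [("chr9", 1_000 + 50 * i) for i in range(12)] + [
        ("chr22", 5_000 + 50 * i) for i in range(11)
    ]
    first, second = ("chr9", 0, 2_000), ("chr22", 4_000, 6_000)
    print(count_reads_in_regions(linked_set, first, second))  # (12, 11)
    print(suggests_rearrangement(linked_set, first, second))  # True
```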

79. A method of phasing two variant alleles, wherein a first variant allele is comprised in a first genomic region and a second variant allele is comprised in a second genomic region, and wherein each variant allele has at least two variants or potential variants, the method comprising:

(a) determining sets of linked sequence reads by the method of any one of clauses 68 to 71; and

(b) determining whether sequences comprising each of the potential variants of the first variant allele are present in a set of linked sequence reads, and determining whether sequences comprising each of the potential variants of the second variant allele are present in the same set of linked sequence reads.
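The co-occurrence test in step (b) of clause 79 can be sketched as a tally of allele pairs seen together in the same linked set. This assumes each linked set has already been summarized as a list of `(variant_id, allele)` observations from its reads; sets that show both alleles at one variant are skipped as ambiguous, which is an assumed policy rather than one stated in the source. `phase_two_variants` is a hypothetical name.

```python
from collections import Counter


def phase_two_variants(linked_sets, var1, var2):
    """Tally which allele pairs at var1/var2 co-occur within the same linked set.

    linked_sets: dict set_id -> list of (variant_id, allele) observations
        made on the reads of that set.
    Returns a Counter of (allele_at_var1, allele_at_var2) -> number of sets.
    """
    pair_counts = Counter()
    for observations in linked_sets.values():
        alleles1 = {allele for v, allele in observations if v == var1}
        alleles2 = {allele for v, allele in observations if v == var2}
        # Use only sets with exactly one allele observed at each variant.
        if len(alleles1) == 1 and len(alleles2) == 1:
            pair_counts[(alleles1.pop(), alleles2.pop())] += 1
    return pair_counts


if __name__ == "__main__":
    sets = {
        "s1": [("v1", "A"), ("v2", "T")],
        "s2": [("v1", "A"), ("v1", "A"), ("v2", "T")],
        "s3": [("v1", "G"), ("v2", "C")],
        "s4": [("v1", "A")],                            # v2 not covered: uninformative
        "s5": [("v1", "A"), ("v1", "G"), ("v2", "T")],  # ambiguous at v1: skipped
    }
    print(dict(phase_two_variants(sets, "v1", "v2")))  # {('A', 'T'): 2, ('G', 'C'): 1}
```

The tally above would support the haplotypes A-T and G-C.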

80. A method of determining sets of linked sequence reads of fetal origin, the method comprising:

(a) determining sets of linked sequence reads by the method of any one of clauses 68 to 71, wherein the sample comprises microparticles from maternal blood;

(b) comparing at least a portion of each sequence read of the set of linked sequence reads with a reference list of sequences present in the fetal genome; and

(c) identifying sets of linked sequence reads of fetal origin by the presence, in one or more sequence reads of the set of linked sequence reads, of one or more sequences from the reference list.
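Steps (b) and (c) of clause 80 can be sketched as a search of each linked set for any reference-list sequence. This assumes exact substring matching on raw read sequences and a reference list supplied by the user (for example, sequences known to be present in the fetal genome but not the maternal one); a real implementation would more likely compare mapped alleles. The function name is hypothetical.

```python
def sets_with_reference_sequence(linked_sets, reference_list):
    """Return identifiers of linked sets in which at least one read contains
    at least one sequence from the reference list (exact substring match).

    linked_sets: dict set_id -> list of read sequences in that set.
    reference_list: sequences assumed to be specific to the genome of interest.
    """
    hits = []
    for set_id, read_sequences in linked_sets.items():
        if any(ref in read for read in read_sequences for ref in reference_list):
            hits.append(set_id)
    return sorted(hits)


if __name__ == "__main__":
    sets = {
        "s1": ["ACGTTGCA", "GGGCCC"],
        "s2": ["TTTTAAAA"],
        "s3": ["CCATGGAC"],
    }
    print(sets_with_reference_sequence(sets, ["TTGC", "ATGG"]))  # ['s1', 's3']
```

The same check applies to clauses 87 to 89 if the reference list is instead drawn from diseased or tumor cells.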

81. A method of determining a fetal genotype, comprising:

(a) determining sets of linked sequence reads of fetal origin by the method of clause 80; and

(b) determining the fetal genotype from the sets of linked sequence reads of fetal origin.

82. A method of diagnosing a disease or condition in a subject, the method comprising:

(a) determining a parameter value of a set of linked sequence reads determined from a test sample from the subject, wherein the parameter value is determined according to the method of clause 76; and

(b) comparing the parameter value of the set of linked sequence reads determined from the test sample with a control parameter value.

83. The method of clause 82, wherein the control parameter value is determined from a second set of linked sequence reads determined from the test sample from the subject, and wherein the control parameter value is determined according to the method of clause 76.

84. The method of clause 82, wherein the control parameter value is determined from a set of linked sequence reads determined from a control sample, and wherein the control parameter value is determined according to the method of clause 76.

85. A method of monitoring a disease or condition in a subject, the method comprising:

(a) determining a first parameter value of a set of linked sequence reads determined from a test sample from the subject, wherein the parameter value is determined according to the method of clause 76; and

(b) comparing the parameter value of the set of linked sequence reads with a control parameter value.

86. The method of clause 85, wherein the control parameter value is determined from a second set of linked sequence reads determined from a control sample obtained from the same subject at an earlier time point than the test sample, optionally wherein the control parameter value is determined according to the method of clause 76.

87. A method of diagnosing a disease in a subject, the method comprising:

(a) determining sets of linked sequence reads by the method of any one of clauses 68 to 71, wherein the sample comprises microparticles derived from blood; and

(b) comparing at least a portion of each sequence read of the set of linked sequence reads with a reference list of sequences present in diseased cells, wherein the presence, in one or more sequence reads of the set of linked sequence reads, of one or more sequences from the reference list indicates the presence of the disease.

88. A method of determining sets of linked sequence reads originating from diseased cells, the method comprising:

(a) determining sets of linked sequence reads according to any one of clauses 68 to 71, wherein the sample comprises microparticles derived from blood;

(b) comparing at least a portion of each sequence read of the set of linked sequence reads with a reference list of sequences present in diseased cells; and

(c) identifying sets of linked sequence reads originating from diseased cells by the presence, in one or more sequence reads of the set of linked sequence reads, of one or more sequences from the reference list.

89. The method of clause 88, wherein the method comprises determining sets of linked sequence reads originating from tumor cells, the method comprising:

(a) determining sets of linked sequence reads according to any one of clauses 68 to 71, wherein the sample comprises microparticles derived from blood;

(b) comparing at least a portion of each sequence read of the set of linked sequence reads with a reference list of sequences present in tumor cells; and

(c) identifying sets of linked sequence reads originating from tumor cells by the presence, in one or more sequence reads of the set of linked sequence reads, of one or more sequences from the reference list.

90. A method of determining a tumor genotype, comprising:

(a) determining sets of linked sequence reads of tumor origin by the method of clause 89; and

(b) determining the tumor genotype from the sets of linked sequence reads of tumor origin.

Detailed description of the invention

The present invention, together with its further objects and advantages, may best be understood by reference to the following description taken in conjunction with the accompanying drawings, in which:

Figure 1 shows a multimeric barcoding reagent that can be used in the method shown in Figure 3 or Figure 4.

Figure 2 shows a kit comprising a multimeric barcoding reagent and adapter oligonucleotides for labeling a target nucleic acid.

Figure 3 shows a first method of preparing a nucleic acid sample for sequencing using a multimeric barcoding reagent.

Figure 4 shows a second method of preparing a nucleic acid sample for sequencing using a multimeric barcoding reagent.

Figure 5 shows a method of preparing a nucleic acid sample for sequencing using a multimeric barcoding reagent and adapter oligonucleotides.

Figure 6 shows a method of preparing a nucleic acid sample for sequencing using a multimeric barcoding reagent, adapter oligonucleotides and target oligonucleotides.

Figure 7 shows a method of assembling a multimeric barcode molecule using rolling circle amplification.

Figure 8 shows a method of synthesizing a multimeric barcoding reagent for labeling a target nucleic acid, which can be used in the methods shown in Figure 3, Figure 4 and/or Figure 5.

Figure 9 shows an alternative method of synthesizing a multimeric barcoding reagent (as shown in Figure 1) for labeling a target nucleic acid, which can be used in the methods shown in Figure 3 and/or Figure 4.

Figure 10 is a graph showing the total number of nucleotides in each barcode sequence.

Figure 11 is a graph showing the number of unique barcodes per sequenced multimeric barcode molecule, against the total number of molecules.

Figure 12 shows representative multimeric barcode molecules detected by an analysis script.

Figure 13 is a graph showing the number of unique barcodes per molecular sequence identifier against the number of molecular sequence identifiers, after barcoding a synthetic DNA template of known sequence with a multimeric barcoding reagent comprising barcoded oligonucleotides.

Figure 14 is a graph showing the number of unique barcodes per molecular sequence identifier against the number of molecular sequence identifiers, after barcoding a synthetic DNA template of known sequence with a multimeric barcoding reagent and separate adapter oligonucleotides.

Figure 15 is a table showing the results of barcoding genomic DNA loci of three human genes (BRCA1, HLA-A and DQB1) with a multimeric barcoding reagent comprising barcoded oligonucleotides.

Figure 16 is a schematic diagram of sequence reads obtained by barcoding genomic DNA loci with a multimeric barcoding reagent comprising barcoded oligonucleotides.

Figure 17 is a graph showing the number of barcodes from the same multimeric barcoding reagent that labeled sequences on the same synthetic template molecule, against the number of synthetic template molecules.

Figure 18 shows a method in which two or more sequences from a microparticle are determined and linked informatically.

Figure 19 shows a method in which sequences from a specific microparticle are linked by a shared identifier.

Figure 20 shows a method in which molecular barcodes are attached to genomic DNA fragments in partitioned microparticles, and in which the barcodes provide linkage between sequences originating from the same microparticle.

Figure 21 shows a specific method in which molecular barcodes are attached to genomic DNA fragments in microparticles by multimeric barcoding reagents, and in which the barcodes provide linkage between sequences originating from the same microparticle.

Figure 22 shows a method in which genomic DNA fragments within individual microparticles are attached to each other and the resulting molecules are sequenced, such that the sequences of two or more genomic DNA fragments from the same microparticle are determined from the same sequenced molecule, thereby establishing linkage between fragments within the same microparticle.

Figure 23 shows a method in which individual microparticles (and/or small groups of microparticles) from a microparticle sample are sequenced in two or more separate, independent sequencing reactions, and in which the sequences determined from each such sequencing reaction are therefore informatically linked and predicted to originate from the same individual microparticle (and/or small group of microparticles).

Figure 24 shows a specific method in which genomic DNA fragments within individual microparticles are attached to discrete regions of a sequencing flow cell before sequencing, and in which the proximity of the sequenced fragments on the flow cell provides linkage between sequences originating from the same microparticle.

Figure 25 shows linkage of the sequences of genomic DNA fragments within circulating microparticles, as produced by a method of attaching barcoded oligonucleotides ('variant A' version of the example protocol). The density of sequence reads across all chromosomes of the human genome is shown, with clear clustering of reads within individual chromosome segments.

Figure 26 shows linkage of the sequences of genomic DNA fragments within circulating microparticles, as produced by a method of attaching barcoded oligonucleotides ('variant B' version of the example protocol). The density of sequence reads across all chromosomes of the human genome is shown, with clear clustering of reads within individual chromosome segments.

Figure 27 shows linkage of the sequences of genomic DNA fragments within circulating microparticles, as produced by a method of attaching barcoded oligonucleotides ('variant B' version of the example protocol). The density of sequence reads is shown magnified within a specific chromosome segment, illustrating the high-density concentration of these linked reads.

Figure 28 shows linkage of the sequences of genomic DNA fragments within circulating microparticles, as produced by a method of attaching barcoded oligonucleotides ('variant C' version of the example protocol). The density of sequence reads across all chromosomes of the human genome is shown, with clear clustering of reads within individual chromosome segments, although these segments span larger chromosomal regions than in the other variant methods (because larger microparticles are precipitated in variant C than in variant A or B).

Figure 29 shows a negative control experiment in which the genomic DNA fragments were purified (and therefore not linked) before attachment to barcoded oligonucleotides. No read clustering is observed at all, confirming that circulating microparticles contain genomic DNA fragments from concentrated, contiguous genomic regions.

A detailed description of each of Figures 18 to 29 is provided below.

Figure 18 shows a method in which two or more sequences from a microparticle are determined and linked informatically. In this method, a microparticle in a blood, plasma or serum sample (or in a sample derived therefrom) comprises two or more genomic DNA fragments. The sequence of at least a portion of each of these genomic DNA fragments is determined; in addition, an informative linkage is established by one or more methods, such that the first and second sequences from the microparticle are linked.

This linkage may take any form, such as a shared identifier (for example, a shared barcode attached to the first and second genomic DNA sequences during a molecular barcoding process). Any other shared characteristic may also be used to link the two sequences, and the data comprising the sequences themselves may be held in a shared electronic storage medium or a partition thereof. Furthermore, the linkage may comprise a non-binary or relative value, such as a value representing the physical proximity of two fragments in a spatially measured sequencing reaction, or an estimated likelihood or probability that two sequences derive from genomic DNA fragments contained in the same microparticle.

Figure 19 shows a method in which sequences from a specific microparticle are linked by a shared identifier. In this method, the sequences of a number of genomic DNA fragments contained within two different microparticles (for example, two different microparticles derived from a single blood, plasma or serum sample) are determined, for example by a nucleic acid sequencing reaction. The sequences corresponding to genomic DNA fragments from the first microparticle are each assigned the same informative identifier (here, identifier '0001'), and the sequences corresponding to genomic DNA fragments from the second microparticle are each assigned a different shared informative identifier (here, identifier '0002'). The sequences, together with their respective identifier information, therefore link sequences from the same microparticle informatically, with the different groups of identifiers serving the function of informative linkage.

Figure 20 shows a method in which molecular barcodes are attached to genomic DNA fragments in partitioned microparticles, and in which the barcodes provide linkage between sequences originating from the same microparticle. In this method, microparticles from a microparticle sample are partitioned into two or more partitions, the genomic DNA fragments of the microparticles within each partition are barcoded, and sequences are then determined in such a way that the barcode identifies the partition from which each sequence was obtained, thereby linking different sequences from individual microparticles.

In the first step, the microparticles are partitioned into two or more partitions (which may comprise, for example, different physical reaction vessels or different droplets within an emulsion). Genomic DNA fragments are then released from the microparticles in each partition (i.e. the fragments are made physically accessible, so that they can subsequently be barcoded). This release step may be performed by a high-temperature incubation step and/or by incubation with a molecular solvent or chemical surfactant. Optionally (but not shown here), an amplification step may be performed at this point, such that all or part of each genomic DNA fragment is copied at least once (for example in a PCR reaction); barcode sequences may then be attached to the resulting copied products.

Barcode sequences are then attached to the genomic DNA fragments. The barcode sequences may take any form, such as a barcode region within a primer, barcoded oligonucleotides within a multimeric barcoding reagent, or multimeric barcode molecules. The barcode sequences may also be attached by any means, such as by a primer extension and/or PCR reaction, by a single-stranded or double-stranded ligation reaction, or by in vitro transposition. In any case, the method of attaching barcode sequences produces a solution of molecules within each partition, wherein each such molecule comprises a barcode sequence together with all or part of the sequence of a genomic DNA fragment from the microparticle(s) partitioned into that partition.

The barcode-containing molecules from the different partitions are then merged into a single reaction, and a sequencing reaction is performed on the resulting molecules to determine the sequences of the genomic DNA and of the barcode sequences attached to it. The associated barcode sequences are then used to identify the partition from which each sequence was obtained, thereby linking the sequences, determined in the sequencing reaction, that originate from genomic DNA fragments contained in the same microparticle or group of microparticles.

Figure 21 shows a specific method in which molecular barcodes are attached to genomic DNA fragments in microparticles by multimeric barcoding reagents, and in which the barcodes provide linkage between sequences originating from the same microparticle. In this method, microparticles from a microparticle sample are crosslinked and then permeabilized, the genomic DNA fragments contained in the microparticles are barcoded with multimeric barcoding reagents, and sequences are then determined in such a way that the barcode identifies which multimeric barcoding reagent barcoded each sequence, thereby linking different sequences from individual microparticles.

In the first step, the microparticles from the microparticle sample are crosslinked with a chemical crosslinking agent. This step serves to keep the genomic DNA fragments within each microparticle physically close to each other, so that the sample can be manipulated and processed while retaining the basic structural properties of the microparticles (i.e. while maintaining the physical proximity of genomic DNA fragments originating from the same microparticle). In the second step, the crosslinked microparticles are permeabilized (i.e. the genomic DNA fragments are made physically accessible, so that they can subsequently be barcoded in the barcoding step); this permeabilization may be performed, for example, by incubation with a chemical surfactant (such as a non-ionic detergent).

Barcode sequences are then attached to the genomic DNA fragments, such that the barcode sequences contained in one multimeric barcoding reagent (and/or multimeric barcode molecule) are attached to fragments within the same crosslinked microparticle. The barcode sequences may be attached by any means, for example by a primer extension reaction or by a single-stranded or double-stranded ligation reaction. The attachment is performed such that a library of many multimeric barcoding reagents (and/or multimeric barcode molecules) is applied, under dilute conditions, to a sample comprising many crosslinked microparticles, so that each multimeric barcoding reagent (and/or multimeric barcode molecule) typically barcodes only sequences contained within a single microparticle.

A sequencing reaction is then performed on the resulting molecules to determine the sequences of the genomic DNA and of the barcode sequences attached to it. The associated barcode sequences are then used to identify which multimeric barcoding reagent (and/or multimeric barcode molecule) barcoded each sequence, thereby linking the sequences, determined in the sequencing reaction, that originate from genomic DNA fragments contained in the same microparticle.

Figure 22 shows a method in which genomic DNA fragments within individual microparticles are attached to each other and the resulting molecules are sequenced, such that the sequences of two or more genomic DNA fragments from the same microparticle are determined from the same sequenced molecule, thereby establishing linkage between fragments within the same microparticle. In this method, the genomic DNA fragments within individual microparticles are crosslinked to each other and then blunt-ended, and the resulting blunt-ended genomic DNA fragments are ligated to each other to form continuous multi-part sequences. The resulting molecules are then sequenced, such that sequences from two or more genomic DNA fragments contained within the same sequenced molecule are determined to be linked, because they originate from the same microparticle.

In the first step, the microparticles from the microparticle sample are crosslinked with a chemical crosslinking agent. This step serves to keep the genomic DNA fragments within each microparticle physically close to each other, so that the sample can be manipulated and processed while retaining the basic structural properties of the microparticles (i.e. while maintaining the physical proximity of genomic DNA fragments originating from the same microparticle). In the second step, the crosslinked microparticles are permeabilized (i.e. the genomic DNA fragments are made physically accessible for the subsequent steps); this permeabilization may be performed, for example, by incubation with a chemical surfactant (such as a non-ionic detergent).

In the next step, the genomic DNA fragments within each microparticle are blunt-ended (i.e. any single-stranded overhangs are removed and/or the ends are filled in), so that the ends can be joined to each other in a double-stranded ligation reaction. A double-stranded ligation reaction is then performed (for example with T4 DNA ligase), in which the blunt ends of molecules contained within the same microparticle are ligated to each other to form continuous multi-part double-stranded sequences. The ligation reaction (or any other step) may be performed under dilute conditions, so as to minimize spurious ligation products between sequences contained in two or more different microparticles.

A sequencing reaction is then performed on the resulting molecules to determine the genomic DNA sequences within each multi-part molecule. The resulting molecules are then evaluated, such that sequences from two or more genomic DNA fragments contained within the same sequenced molecule are determined to be linked, because they originate from the same microparticle.

Figure 23 shows a method in which individual microparticles (and/or small groups of microparticles) from a microparticle sample are sequenced in two or more separate, independent sequencing reactions, and in which the sequences determined from each such sequencing reaction are therefore informatically linked and predicted to originate from the same individual microparticle (and/or small group of microparticles). In this method, the microparticles from a microparticle sample are divided into two or more separate subsamples. Each subsample may comprise one or more individual microparticles but will, in any case, comprise only a portion of the original microparticle sample.

The genomic DNA fragments in each subsample are then released and converted into a form that can be sequenced (for example, by attaching them to sequencing adapters such as Illumina sequencing adapters, and optionally amplifying and purifying them for sequencing). This method may or may not include a step of attaching barcode sequences; optionally, the sequenced molecules do not comprise any barcode sequence.

The genomic DNA fragments (and/or copies thereof) from each separate subsample are then sequenced in separate, independent sequencing reactions. For example, the molecules from each subsample may be sequenced on separate sequencing flow cells, in different lanes of a flow cell, or in different ports or flow cells of a nanopore sequencer.

The resulting sequenced molecules are then evaluated, such that sequences from the same independent sequencing reaction are determined to be linked, because they originate from the same microparticle (and/or from the same small group of microparticles).

Figure 24 shows a specific method in which genomic DNA fragments within individual microparticles are attached to discrete regions of a sequencing flow cell before sequencing, and in which the proximity of the sequenced fragments on the flow cell provides linkage between sequences originating from the same microparticle. In this method, microparticles from a microparticle sample are crosslinked and then permeabilized, and the genomic DNA fragments contained within individual microparticles are then attached to a sequencing flow cell, such that two or more fragments from the same individual microparticle are attached to the same region of the flow cell. The attached molecules are then sequenced, and the proximity of the resulting sequences on the flow cell provides a linkage value: sequences in close proximity on the flow cell can be predicted to originate from the same individual microparticle in the original sample.

In the first step, the microparticles from the microparticle sample are crosslinked with a chemical crosslinking agent. This step serves to keep the genomic DNA fragments within each microparticle physically close to each other, so that the sample can be manipulated and processed while retaining the basic structural properties of the microparticles (i.e. while maintaining the physical proximity of genomic DNA fragments originating from the same microparticle). In the second step, the crosslinked microparticles are permeabilized (i.e. the genomic DNA fragments are made physically accessible, so that they can subsequently be attached to the flow cell); this permeabilization may be performed, for example, by incubation with a chemical surfactant (such as a non-ionic detergent).

In the next step, the genomic DNA fragments from the microparticles are attached to the flow cell of a sequencing device, such that two or more fragments crosslinked within the same microparticle are attached to the same discrete region of the flow cell. This may be performed in a multi-part reaction involving adapter molecules; for example, adapter molecules may be attached to the genomic DNA fragments within the microparticles, and the adapter molecules may comprise a single-stranded portion complementary to single-stranded primers on the flow cell. The sequences from a crosslinked microparticle may then diffuse and anneal to different primers within the same region of the flow cell.

The resulting molecules are then sequenced, such that the proximity of the resulting sequences on the flow cell provides a linkage value, wherein sequences in close proximity on the flow cell (for example, within a given discrete region and/or within a given proximity value) can be predicted to originate from the same individual microparticle in the original sample.
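One way to turn flow-cell proximity into linkage is to cluster read positions spatially. The sketch below uses single-linkage clustering with a distance cutoff, implemented with a small union-find; the `(x, y)` coordinates, the `radius` parameter and the choice of single linkage are all assumptions for illustration, not details taken from the source, and the pairwise comparison is only practical for small inputs.

```python
import math


def cluster_by_proximity(coords, radius):
    """Single-linkage clustering of flow-cell read positions.

    coords: list of (x, y) positions of sequenced clusters on the flow cell.
    radius: reads closer than or equal to this distance are linked, directly or
        through a chain of neighbors (hypothetical proximity cutoff).
    Returns one integer label per read; equal labels mean predicted same microparticle.
    """
    parent = list(range(len(coords)))

    def find(i):
        # Follow parent pointers to the root, compressing the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            if math.dist(coords[i], coords[j]) <= radius:
                parent[find(i)] = find(j)

    # Relabel roots as 0, 1, 2, ... in order of first appearance.
    labels, root_to_label = [], {}
    for i in range(len(coords)):
        root = find(i)
        if root not in root_to_label:
            root_to_label[root] = len(root_to_label)
        labels.append(root_to_label[root])
    return labels


if __name__ == "__main__":
    positions = [(0, 0), (1, 0), (2, 0), (50, 50), (51, 50), (100, 0)]
    print(cluster_by_proximity(positions, radius=1.5))  # [0, 0, 0, 1, 1, 2]
```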

By way of example only, the advantages of the present invention can be illustrated by reference to its possible applications in non-invasive prenatal testing (NIPT) and cancer detection:

For example, in oncology, the present invention could enable a powerful new framework for screening for the early detection of cancer. Several groups are seeking to develop cfDNA assays that can detect low levels of circulating DNA from early-stage tumors before metastatic transformation (so-called "circulating tumor DNA", or ctDNA). One of the main approaches for distinguishing cancer samples from non-cancer samples is to detect "structural variants" (gene amplifications, deletions or translocations), which are an almost universal indicator of malignancy. However, detecting such large-scale genetic events within the current "molecule counting" framework requires ultra-deep sequencing of cfDNA to achieve statistically significant detection, and further requires enough ctDNA to be present in the plasma to generate sufficient complete molecular signal; this holds even with hypothetically unlimited sequencing depth.

In contrast, the present invention enables direct molecular evaluation of structural variation, with potential single-molecule sensitivity. Any structural variant comprising a "rearrangement site" (for example, the point at which one chromosome has been translocated to, and is therefore attached to, another chromosome, or the point at which a gene or other chromosome segment has been amplified or deleted within a single chromosome) may be directly detectable in this way. This is because a circulating microparticle containing the rearranged DNA may contain a group of DNA fragments flanking both sides of the rearrangement site itself; these fragments can then be informatically linked to one another, allowing the position of the rearrangement itself, and the breakpoints of the two participating genomic positions at each of its ends, to be inferred.

For generalities, how this can improve the cost-effectiveness and Absolute analysis sensitivity of universal screening for cancer, can provide The example of the single cycle particle of hypothesis, it includes the chromosome translocations from early stage cancer cell, and it includes cross over the transposition Left-half and right half part 1 megabasse in total DNA, the DNA by segment turn to accumulation cross over entire 1 megabasse section 10,000 individual segments different, 100 nucleotide is long.In order to use, current, be not only associated the method for segment The presence for detecting this translocation events needs the segment of the 100 single base-pairs to comprising accurate transposition site itself to carry out Sequencing, and its whole length is sequenced to detect true transposition site itself.Therefore, which needs following two Person: 1) format for efficiently being converted into read on sequenator by all 10,000 segments is (that is, in 10,000 segments It is most of successfully to handle and retain in entire DNA purifying and sequencing sample preparation procedure), subsequent 2) all 10,000 A segment must be sequenced by DNA sequencing method at least once with reliably the segment comprising transposition site is sequenced (that is, The sequencing of at least 1 megabasse must be carried out, or even speculates that all input molecular theory uniform samplings enter sequencing steps).Therefore, Need the sequencing for carrying out 1 megabasse to detect translocation events.

In contrast, to detect the presence of the translocation with high statistical confidence using the linked-fragment approach, only a small number of input fragments from each side of the translocation site need to be sequenced (enough to distinguish a "confident" translocation event from, for example, statistical noise or mis-mapping errors). To provide high statistical confidence, 10 fragments from each side of the translocation may be sequenced. Because these fragments only need to be mapped to a position in the genome, and do not need to be sequenced over their whole length to observe the translocation itself, only 50 base pairs of sequence from each fragment are required. In total, this gives a sequencing requirement of 1,000 base pairs (10 fragments × 2 sides × 50 base pairs) to detect the presence of the translocation, a 1,000-fold reduction relative to the 1,000,000 base pairs required by the prior-art approach.
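The arithmetic behind the 1,000-fold figure can be written out explicitly. This is a minimal sketch that simply encodes the numbers from the example above; the helper names `conventional_bp` and `linked_bp` are hypothetical.

```python
def conventional_bp(n_fragments, fragment_len):
    # Every fragment is read end to end so that the single junction
    # fragment is guaranteed to be sequenced across the translocation site.
    return n_fragments * fragment_len


def linked_bp(fragments_per_side, read_len, sides=2):
    # Only enough of each fragment to map it, for a handful of fragments
    # on each side of the translocation.
    return fragments_per_side * sides * read_len


conv = conventional_bp(10_000, 100)   # 1 megabase
linked = linked_bp(10, 50)            # 1,000 base pairs
print(conv, linked, conv // linked)   # 1000000 1000 1000
```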

Beyond this substantial benefit in relative sequencing throughput and cost, the linked-read method can also improve the absolute sensitivity achievable by such cancer screening tests. For early-stage (and therefore potentially curable) cancers, the absolute amount of tumor DNA in circulation is very low, so loss of sample DNA during sample processing and sequencing library preparation can substantially reduce the power of a test, even with theoretically unlimited sequencing depth. In the example above, current methods require the single DNA fragment containing the translocation site to be retained and successfully processed throughout sample collection, processing, and library preparation, and then to be successfully sequenced. However, each of these steps causes some fraction of the "input" molecules either to be physically lost from the processed sample (for example, during centrifugation or wash steps) or simply not to be successfully processed or modified by subsequent steps (for example, not being successfully amplified before loading on the DNA sequencer). Conversely, because the linked-read method of the invention only needs a small proportion of the actual "input" molecules to be sequenced, such sample loss may have a markedly smaller effect on the ultimate sensitivity of the assay.
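The effect of sample loss can be made concrete with a toy probability model. This sketch assumes, as a simplification not stated above, that every input molecule independently survives all processing steps with the same retention probability, and that the 10,000 fragments of the earlier example split evenly into 5,000 per side. The conventional approach then detects the translocation only if the one junction fragment survives, whereas the linked approach needs at least `min_support` surviving fragments on each side. All function names are hypothetical.

```python
from math import exp, lgamma, log, log1p


def binom_tail_at_least(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p), summed in log space so large n is safe."""
    if k <= 0:
        return 1.0
    if p <= 0.0:
        return 0.0
    if p >= 1.0:
        return 1.0 if k <= n else 0.0
    lower = 0.0
    for j in range(min(k, n + 1)):
        log_pmf = (lgamma(n + 1) - lgamma(j + 1) - lgamma(n - j + 1)
                   + j * log(p) + (n - j) * log1p(-p))
        lower += exp(log_pmf)
    return max(0.0, 1.0 - lower)


def p_detect_junction_only(retention):
    # Conventional: the single junction-spanning fragment must survive.
    return retention


def p_detect_linked(fragments_per_side, min_support, retention):
    # Linked: at least min_support fragments must survive on *each* side,
    # with both sides assumed to be lost independently.
    side = binom_tail_at_least(fragments_per_side, min_support, retention)
    return side * side


for r in (0.9, 0.5, 0.1):
    print(r, p_detect_junction_only(r), round(p_detect_linked(5000, 10, r), 6))
```

Under this model, losing half of the input molecules halves the chance of detection for the conventional approach, while the linked approach still almost always retains ten fragments per side. Real losses are unlikely to be independent, so the numbers are illustrative only.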

Beyond its applications in oncology and cancer screening, the present invention can also provide substantial new tools in the field of noninvasive prenatal testing (NIPT). The developing fetus (including its placenta) releases fragmented DNA into the maternal circulation, and part of this DNA is contained in circulating microparticles. As with screening for cancer from ctDNA, circulating fetal DNA represents only a small fraction of the total circulating DNA in a pregnant individual (most circulating DNA is normal maternal DNA). One substantial technical challenge of NIPT is distinguishing actual fetal DNA fragments from maternal DNA fragments, which can share identical nucleotide sequences because the mother is the genetic origin of half of the fetal genome. An additional technical challenge of NIPT is detecting long-range genomic sequences (or mutations) from the short fragments of fetal DNA present in the circulation.

Analysis of linked fragments from circulating microparticles of the same individual provides a powerful framework for substantially addressing both of these technical challenges of NIPT. Because (about) half of the fetal genome is identical in sequence to (about) half of the genetic mother's genome, it is difficult to tell whether a given sequenced fragment carrying maternal sequence was produced by normal maternal tissue or instead by developing fetal tissue. In contrast, for the (about) half of the fetal genome inherited paternally (that is, from the genetic father), the presence of sequence variants that occur in the paternal genome but not in the maternal genome (such as single nucleotide variants or other variants) serves as a molecular marker identifying these paternally inherited fetal fragments, since the only paternal DNA sequences in the maternal circulation are those originating from the pregnancy itself.
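Identifying paternally derived fragments as described above can be sketched as a set difference over parental genotypes. This is a minimal illustration under stated assumptions: genotypes are represented as a hypothetical mapping from locus to the set of alleles carried, only loci genotyped in both parents are considered, and each sequenced fragment is summarized by the alleles it shows at known loci. The function names are illustrative, not an established API.

```python
def paternal_specific_alleles(maternal, paternal):
    """Alleles present in the father's genotype but absent from the mother's.

    maternal, paternal: dict mapping locus -> set of alleles (hypothetical format).
    Loci not genotyped in the mother are skipped, since absence there is unknown.
    """
    markers = {}
    for locus, p_alleles in paternal.items():
        if locus not in maternal:
            continue
        specific = p_alleles - maternal[locus]
        if specific:
            markers[locus] = specific
    return markers


def is_paternally_marked(fragment_alleles, markers):
    """True if a fragment carries at least one paternal-specific allele.

    fragment_alleles: dict mapping locus -> allele observed on that fragment.
    """
    return any(allele in markers.get(locus, ())
               for locus, allele in fragment_alleles.items())
```

A fragment flagged this way can only have come from the fetus (or placenta), because the mother's own cells do not carry that allele.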

Therefore, the ability to sequence multiple fragments from a single circulating fetal microparticle that happens to contain both maternal and paternal sequences (for example, sequence from one maternally inherited fetal chromosome and sequence from a second, paternally inherited fetal chromosome) provides a way to directly identify which maternal sequences the developing fetus has inherited. Maternal sequences found co-localized in microparticles that also contain paternal sequence can be predicted to be maternal sequences inherited by the fetus; conversely, maternal sequences never found co-localized with paternal sequence can be predicted to be maternal sequences not inherited by the fetus. With this technique, the large majority of circulating DNA, which consists of normal maternal DNA, can be specifically filtered out of the processed sequence dataset, and only the sequences authenticated as true fetal sequences can be informatically separated out for further analysis.
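The gating rule in the preceding paragraph can be expressed as a pass over linked groups. This sketch assumes a hypothetical input format in which each linked group (microparticle) is a list of fragments, each summarized as `(locus, allele, paternally_marked)`; it returns the unmarked alleles predicted to be fetally inherited and those predicted not to be. "Never co-localized" here only means "not observed in this dataset", so a practical analysis would also need coverage thresholds before calling an allele not inherited.

```python
def gate_maternal_sequences(groups):
    """Split unmarked (maternal-type) alleles by co-localization with paternal markers.

    groups: dict mapping group_id -> list of (locus, allele, paternally_marked).
    Returns (fetal_inherited, not_inherited) as sets of (locus, allele).
    """
    seen_with_paternal = set()
    seen_anywhere = set()
    for fragments in groups.values():
        # A group containing any paternal-specific allele is treated as fetal.
        has_paternal = any(marked for _, _, marked in fragments)
        for locus, allele, marked in fragments:
            if marked:
                continue  # paternal markers themselves are not maternal sequence
            seen_anywhere.add((locus, allele))
            if has_paternal:
                seen_with_paternal.add((locus, allele))
    return seen_with_paternal, seen_anywhere - seen_with_paternal
```

For example, a maternal allele observed alongside a paternal marker in one microparticle is predicted to be fetally inherited, while maternal alleles seen only in unmarked groups remain in the predicted not-inherited set.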

The "fetal fraction" measured in NIPT (the fraction of all circulating DNA produced by the fetus itself) is usually below 10%, and for some clinical samples it is 1% to 5%. Because "information gating" on paternal sequence origin yields an "effective fetal fraction" of 100% (assuming minimal mis-mapping error), this linked-fragment approach has the potential to improve the signal-to-noise ratio of NIPT by one to two orders of magnitude. The present invention therefore has the potential to improve the overall analytical sensitivity and specificity of NIPT, to substantially reduce the amount of sequencing the process requires, and also to allow NIPT to be performed in the first trimester, a point at which the fetal fraction is low enough that current tests have unacceptable false-positive and false-negative rates.
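The "one to two orders of magnitude" estimate follows from a simple ratio. The sketch below assumes, as a simplification, that the relevant signal scales with the fetal share of the analyzed reads and that gating reaches the idealized 100% effective fetal fraction stated above; `gating_gain` is a hypothetical helper name.

```python
import math


def gating_gain(fetal_fraction, effective_fraction=1.0):
    """Ratio of the post-gating (effective) fetal fraction to the raw fetal fraction."""
    if not 0 < fetal_fraction <= effective_fraction:
        raise ValueError("fetal_fraction must be in (0, effective_fraction]")
    return effective_fraction / fetal_fraction


for ff in (0.10, 0.05, 0.01):
    gain = gating_gain(ff)
    print(f"fetal fraction {ff:.0%}: {gain:.0f}x "
          f"({math.log10(gain):.1f} orders of magnitude)")
```

A raw fetal fraction of 10% corresponds to a tenfold gain and a fraction of 1% to a hundredfold gain, which brackets the range stated in the text.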

Importantly, the present invention adds a new, orthogonal dimension to the sequence data from circulating DNA, in the form of informatically linked sequences, on which algorithms, computations, and/or statistical tests can be run directly to produce considerably more sensitive and specific genetic measurements. For example, instead of evaluating the total amount of sequence across the entire sample for two chromosomes in order to measure fetal chromosomal aneuploidy, one can directly assess the linked sequences (and/or their groups or subgroups), for example by examining how many sequences in each informatically linked group map to a particular chromosome or chromosomal segment. Comparisons and/or statistical tests can be performed to compare linked groups of sequences with different presumed cellular origins (for example, fetal versus maternal sequences, or presumed healthy tissue versus presumed cancerous or malignant tissue). Tests can also evaluate sequence features or numerical features that are present only at the level of linked groups of sequences (and not at the level of individual, unlinked sequences), such as the enrichment of specific sequences or groups of sequences, or specific chromosomal profiles or accumulations.
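One way to run a statistical test on linked groups, as described above, is to compute a per-group quantity and compare its distribution between groups of different presumed origin. This sketch assumes a hypothetical input in which each linked group is a list of chromosome names (one per mapped read) and each group already carries an origin label (for example, from paternal gating). It compares the fraction of reads mapping to a target chromosome using Welch's t statistic; this is one illustrative choice of test, not the statistic specified by the invention, and real data would also need corrections such as GC content and mappability.

```python
from collections import Counter
from math import sqrt
from statistics import mean, variance


def target_fraction_per_group(groups, target):
    """groups: dict group_id -> list of chromosome names, one per read in the group."""
    return {gid: Counter(chroms)[target] / len(chroms)
            for gid, chroms in groups.items() if chroms}


def welch_t(a, b):
    """Welch's t statistic for the difference in means (each sample needs n >= 2)."""
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se


def compare_origins(groups, labels, target, origin_a, origin_b):
    """Compare per-group target-chromosome fractions between two labelled origins."""
    frac = target_fraction_per_group(groups, target)
    a = [f for gid, f in frac.items() if labels.get(gid) == origin_a]
    b = [f for gid, f in frac.items() if labels.get(gid) == origin_b]
    return welch_t(a, b)
```

A large positive value from `compare_origins(groups, labels, "chr21", "fetal", "maternal")` would indicate that fetal-labelled groups carry a higher share of chr21 reads than maternal-labelled groups.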

Beyond its use in detecting fetal microparticle sequences, this method also has the potential to detect long-range genetic sequences or sequence mutations present in the fetal genome. In much the same way as described for cancer genome rearrangements, if several DNA fragments from a fetal microparticle that span and/or flank a genomic rearrangement site (such as a translocation, amplification, or deletion) are sequenced, these types of rearrangements can be detected informatically, even when the rearrangement site itself is not directly sequenced. Furthermore, beyond genomic rearrangement events, the method has the potential to detect "phasing" information across regions of an individual's genome. For example, if two single nucleotide variants are found at different points in a specific gene but are separated by a genomic distance of many kilobases, the method makes it possible to assess whether the two variants lie on the same single copy of the gene in the fetal genome, or whether each lies on a different one of the two copies of the gene present in the fetal genome (that is, whether they lie on the same haplotype). This capability has particular clinical utility for the genetic evaluation and prognosis of de novo single nucleotide mutations in the fetal genome, which underlie most major developmental disorders with a genetic etiology.
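The phasing assessment described above can be sketched as a vote across linked groups. This is a minimal illustration under loudly stated assumptions: both variants are taken to be heterozygous, each linked group is assumed to carry fragments from only one copy of the gene (so it reports a single allele state per variant), and the input format and the `min_votes` threshold are hypothetical. For a double heterozygote, groups showing alt/alt or ref/ref support the two variants lying on the same copy (cis), while groups showing alt/ref or ref/alt support different copies (trans).

```python
def phase_votes(groups, var1, var2):
    """Count linked groups supporting cis versus trans phase for two variants.

    groups: dict group_id -> dict variant_id -> 'alt' or 'ref' observed in that group.
    """
    cis = trans = 0
    for obs in groups.values():
        if var1 not in obs or var2 not in obs:
            continue  # group does not cover both sites, so it is uninformative
        pair = (obs[var1], obs[var2])
        if pair in {("alt", "alt"), ("ref", "ref")}:
            cis += 1
        elif pair in {("alt", "ref"), ("ref", "alt")}:
            trans += 1
    return cis, trans


def call_phase(cis, trans, min_votes=3):
    """Return a phase call only when enough informative groups lean one way."""
    if cis + trans < min_votes or cis == trans:
        return "undetermined"
    return "cis" if cis > trans else "trans"
```

If most informative groups support trans, the two variants are predicted to lie on different copies of the gene, which is the situation in which both copies may be affected.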

Embodiments
