Linker and method for optical detection and sequencing

Document No.: 1894605. Publication date: 2021-11-26.

Reading note: This technology, "Linker and method for optical detection and sequencing", was devised by Linda G. Lee, Gilad Almogy, and Steven Menchen on 2020-02-18. Summary: The present disclosure provides labeling reagents for labeling substrates such as nucleotides, proteins, antibodies, lipids, and cells. The labeling reagents provided herein can include a fluorescent label and a semi-rigid linker. Also provided herein are methods for nucleic acid sequencing using materials comprising such labeling reagents.

1. A fluorescently labeled reagent, comprising:

(a) a fluorescent dye; and

(b) a linker attached to the fluorescent dye and configured to couple to a substrate for fluorescently labeling the substrate,

wherein the linker comprises (i) one or more water-solubilizing groups and (ii) two or more ring systems, wherein at least two of the two or more ring systems are connected to each other by no more than two atoms, and wherein the linker comprises a non-protein amino acid comprising the ring systems of the two or more ring systems.

2. The fluorescently labeled reagent of claim 1, wherein the linker comprises a plurality of amino acids.

3. The fluorescently labeled reagent of claim 2, wherein the plurality of amino acids comprises a plurality of non-protein amino acids.

4. The fluorescently labeled reagent of any of claims 1-3, wherein the linker comprises three or more hydroxyprolines.

5. The fluorescently labeled reagent of claim 4, wherein the linker comprises ten or more hydroxyprolines.

6. The fluorescently labeled reagent of any of claims 1-5, wherein the at least two of the two or more ring systems are connected to each other through sp² carbon atoms.

7. The fluorescently labeled reagent of any of claims 1-5, wherein the at least two of the two or more ring systems are directly connected to each other without an intervening atom.

8. The fluorescently labeled reagent of any of claims 1-7, wherein at least one of the one or more water-solubilizing groups is attached to a ring system of the two or more ring systems.

9. The fluorescently labeled reagent of any of claims 1-8, wherein at least one water-solubilizing group of the one or more water-solubilizing groups is a component of a ring system of the two or more ring systems.

10. The fluorescently labeled reagent of any of claims 1-9, wherein the one or more water-solubilizing groups are selected from pyridinium, imidazolium, quaternary ammonium groups, sulfonates, phosphates, alcohols, amines, imines, nitriles, amides, thiols, carboxylic acids, polyethers, aldehydes, boronic acids, and boronic esters.

11. The fluorescently labeled reagent of any of claims 1-10, wherein the fluorescently labeled reagent further comprises a cleavable group configured to be cleaved to separate the fluorescently labeled reagent or portion thereof from the substrate.

12. The fluorescently labeled reagent of claim 11, wherein the cleavable group is configured to be cleaved to separate a first portion of the fluorescently labeled reagent comprising the fluorescent dye and a first portion of the linker from a second portion of the fluorescently labeled reagent comprising a second portion of the linker.

13. The fluorescently labeled reagent of claim 11 or 12, wherein the cleavable group is selected from an azidomethyl group, a disulfide bond, a hydrocarbyl dithiomethyl group, and a 2-nitrobenzyloxy group.

14. The fluorescently labeled reagent of any of claims 11-13, wherein the cleavable group is cleavable by application of one or more members of the group consisting of: tris(2-carboxyethyl)phosphine (TCEP), dithiothreitol (DTT), tetrahydropyranyl (THP), ultraviolet (UV) light, and combinations thereof.

15. The fluorescently labeled reagent of any of claims 1-14, wherein the linker comprises a moiety selected from the group consisting of [chemical structures not reproduced in this text].

16. The fluorescently labeled reagent of any of claims 1-15, wherein the substrate is a nucleotide, a protein, a lipid, a cell, or an antibody.

17. The fluorescently labeled reagent of claim 16, wherein the substrate is a nucleotide and the linker is attached to the nucleotide through a nucleobase of the nucleotide.

18. The fluorescently labeled reagent of any of claims 1-15, wherein the substrate is a fluorescence quencher, a fluorescence donor, or a fluorescence acceptor.

19. A composition comprising a solution comprising a fluorescently labeled nucleotide, wherein the fluorescently labeled nucleotide comprises a fluorescent dye attached to the nucleotide through a linker, wherein the linker comprises (i) one or more water-solubilizing groups and (ii) two or more ring systems, wherein at least two of the two or more ring systems are attached to each other through no more than two atoms, and wherein the linker comprises a non-protein amino acid comprising the ring systems of the two or more ring systems.

20. The composition of claim 19, wherein the linker comprises a plurality of amino acids.

21. The composition of claim 20, wherein the linker comprises a plurality of hydroxyprolines.

22. The composition of any one of claims 19-21, wherein at least two ring systems of the two or more ring systems are connected to each other through sp² carbon atoms.

23. The composition of any one of claims 19-22, wherein at least one of the one or more water-solubilizing groups is attached to a ring system of the two or more ring systems.

24. The composition of any one of claims 19-23, wherein the one or more water-solubilizing groups are selected from the group consisting of pyridinium, imidazolium, quaternary ammonium groups, sulfonates, phosphates, alcohols, amines, imines, nitriles, amides, thiols, carboxylic acids, polyethers, aldehydes, boronic acids, and boronic esters.

25. The composition of any one of claims 19-24, wherein the linker further comprises a cleavable group configured to be cleaved to separate the fluorescent dye from the nucleotide.

26. The composition of claim 25, wherein said cleavable group is selected from an azidomethyl group, a disulfide bond, a hydrocarbyl dithiomethyl group, and a 2-nitrobenzyloxy group.

27. The composition of any one of claims 19-26, wherein the solution comprises a plurality of fluorescently labeled nucleotides, wherein each fluorescently labeled nucleotide in the plurality of fluorescently labeled nucleotides comprises the same type of fluorescent dye, the same type of linker, and the same type of nucleotide.

28. The composition of claim 27, wherein each of the linkers of each of the plurality of fluorescently labeled nucleotides has the same molecular weight.

29. The composition of claim 27 or 28, wherein the solution further comprises a plurality of unlabeled nucleotides, wherein each nucleotide in the plurality of unlabeled nucleotides is of the same type as each of the nucleotides in the plurality of fluorescently labeled nucleotides.

30. The composition of claim 29, wherein the ratio of the plurality of fluorescently labeled nucleotides to the plurality of unlabeled nucleotides in the solution is at least about 1:4.

31. The composition of claim 30, wherein the ratio is at least about 1:1.

32. A method comprising providing the composition of any one of claims 19-31 to a template nucleic acid molecule coupled to a nucleic acid strand.

33. The method of claim 32, further comprising subjecting the template nucleic acid molecule and the composition to conditions sufficient to incorporate the fluorescently labeled nucleotides into the nucleic acid strand coupled to the template nucleic acid molecule.

34. The method of claim 33, further comprising detecting a signal from the fluorescently labeled nucleotide.

35. The method of any one of claims 32-34, further comprising contacting the fluorescently labeled nucleotides with a cleavage reagent configured to cleave the fluorescent dye from the nucleotides.

36. The method of claim 35, further comprising, after contacting the fluorescently labeled nucleotides with the cleavage reagent, subjecting the template nucleic acid molecule and the composition to conditions sufficient to incorporate additional fluorescently labeled nucleotides into the nucleic acid strand coupled to the template nucleic acid molecule.

37. The method of any one of claims 32-36, wherein the template nucleic acid molecule is immobilized on a support.

38. A method, comprising: providing a fluorescently labeled reagent, wherein the fluorescently labeled reagent comprises a fluorescent dye and a linker attached to the fluorescent dye, wherein the linker comprises (i) one or more water-solubilizing groups and (ii) two or more ring systems, wherein at least two of the two or more ring systems are attached to each other by no more than two atoms, and wherein the linker comprises a non-protein amino acid comprising the ring systems of the two or more ring systems.

39. The method of claim 38, further comprising contacting the fluorescently labeled reagent with a substrate to produce a fluorescently labeled substrate, wherein the linker attached to the fluorescent dye is coupled to the substrate.

40. The method of claim 39, wherein the substrate is a nucleotide, a protein, a lipid, a cell, or an antibody.

41. The method of claim 39 or 40, further comprising contacting the fluorescently labeled substrate with a cleavage reagent, wherein the cleavage reagent is configured to cleave the fluorescently labeled reagent or a portion thereof from the fluorescently labeled substrate to produce a scarred substrate.

42. The method of claim 41, wherein said cleavage reagent is configured to cleave a cleavable group of said linker, wherein said cleavable group is selected from the group consisting of an azidomethyl group, a disulfide bond, a hydrocarbyl dithiomethyl group, and a 2-nitrobenzyloxy group.

43. The method of claim 41 or 42, further comprising, prior to generating the scarred substrate, subjecting the fluorescently labeled substrate and nucleic acid molecule to conditions sufficient to incorporate the fluorescently labeled substrate into the nucleic acid molecule.

44. The method of claim 43, further comprising, prior to generating the scarred substrate, subjecting an additional substrate and the nucleic acid molecule to conditions sufficient to incorporate the additional substrate into the nucleic acid molecule at a location adjacent to the substrate.

45. The method of claim 43, further comprising, after generating the scarred substrate, subjecting an additional substrate and the nucleic acid molecule to conditions sufficient to incorporate the additional substrate into the nucleic acid molecule at a location adjacent to the scarred substrate.

46. The method of claim 44 or 45, wherein the additional substrate does not comprise a fluorescently labeled reagent.

47. The method of claim 44 or 45, wherein the additional substrate comprises a fluorescently labeled reagent.

48. The method of any one of claims 38-47, wherein the linker comprises a plurality of amino acids.

49. The method of claim 48, wherein the linker comprises a plurality of hydroxyprolines.

50. The method of any one of claims 38-49, wherein the at least two ring systems of the two or more ring systems are connected to each other through sp² carbon atoms.

51. The method of any one of claims 38-50, wherein at least one of the one or more water-solubilizing groups is attached to a ring system of the two or more ring systems.

52. The method of any one of claims 38-51, wherein the one or more water-solubilizing groups are selected from pyridinium, imidazolium, quaternary ammonium groups, sulfonates, phosphates, alcohols, amines, imines, nitriles, amides, thiols, carboxylic acids, polyethers, aldehydes, boronic acids, and boronic esters.

53. A kit, comprising: a plurality of linkers, wherein a linker in the plurality of linkers comprises (i) one or more water-solubilizing groups and (ii) two or more ring systems, wherein at least two of the two or more ring systems are connected to each other by no more than two atoms, and wherein the linker comprises a non-protein amino acid comprising a ring system of the two or more ring systems.

54. The kit of claim 53, wherein the linker is linked to a fluorescent dye.

55. The kit of claim 53 or 54, wherein the linker comprises a plurality of amino acids.

56. The kit of claim 55, wherein the linker comprises a plurality of hydroxyprolines.

57. The kit of any one of claims 53-56, wherein the at least two of the two or more ring systems are connected to each other through sp² carbon atoms.

58. The kit of any one of claims 53-57, wherein the one or more water-solubilizing groups are selected from the group consisting of pyridinium, imidazolium, quaternary ammonium groups, sulfonates, phosphates, alcohols, amines, imines, nitriles, amides, thiols, carboxylic acids, polyethers, aldehydes, boronic acids, and boronic esters.

59. The kit of any one of claims 53-58, wherein the linker further comprises a cleavable group configured to be cleaved to separate a first portion of the linker from a second portion of the linker.

60. The kit of claim 59, wherein said cleavable group is selected from the group consisting of an azidomethyl group, a disulfide bond, a hydrocarbyl dithiomethyl group, and a 2-nitrobenzyloxy group.

61. The kit of any one of claims 53-60, wherein the linker comprises a moiety selected from the group consisting of [chemical structures not reproduced in this text].

62. The kit of any one of claims 53-61, wherein the linker is coupled to a substrate.

63. The kit of claim 62, wherein the substrate comprises a nucleotide, a protein, a lipid, a cell, or an antibody.

64. The kit of any one of claims 53-63, wherein the plurality of linkers comprises a first linker associated with a first substrate and a second linker associated with a second substrate, wherein the first substrate and the second substrate are of different types.

65. The kit of claim 64, wherein the first linker and the second linker comprise the same chemical structure.

66. The kit of claim 64 or 65, wherein the first substrate and the second substrate are nucleotides comprising different types of nucleobases.

67. The kit of any one of claims 64-66, wherein the kit further comprises a third linker associated with a third substrate and a fourth linker associated with a fourth substrate, wherein the first substrate, the second substrate, the third substrate, and the fourth substrate are of different types.

68. The kit of claim 67, wherein the first, second, third, and fourth substrates are nucleotides comprising different types of nucleobases.

69. The kit of claim 67 or 68, wherein the first linker and the third linker comprise different chemical structures.

70. An oligonucleotide molecule comprising the fluorescently labeled reagent of claim 1, or a derivative thereof.

71. The oligonucleotide molecule of claim 70, further comprising one or more additional fluorescently labeled reagents.

72. The oligonucleotide molecule of claim 71, wherein the fluorescently labeled reagent and the one or more additional fluorescently labeled reagents comprise linkers having the same chemical structure.

73. The oligonucleotide molecule of claim 71 or 72, wherein the fluorescently labeled reagent and the one or more additional fluorescently labeled reagents comprise fluorescent dyes having the same chemical structure.

74. The oligonucleotide molecule of any one of claims 71-73, wherein the fluorescently labeled reagent and the one or more additional fluorescently labeled reagents are associated with the same type of substrate, wherein the substrate is a nucleotide.

75. The oligonucleotide molecule of claim 74, wherein the fluorescently labeled reagent and the one or more additional fluorescently labeled reagents are linked to a nucleobase of the nucleotide.

76. The oligonucleotide molecule of claim 74 or 75, wherein the fluorescently labeled reagent and the one or more additional fluorescently labeled reagents are linked to adjacent nucleotides of the oligonucleotide molecule.

77. The oligonucleotide molecule of claim 74 or 75, wherein the fluorescently labeled reagent and the one or more additional fluorescently labeled reagents are linked to nucleotides of the oligonucleotide molecule that are separated by one or more nucleotides that are not linked to a fluorescently labeled reagent.

78. The oligonucleotide molecule of any one of claims 71-77, wherein the linker of the fluorescently labeled reagent comprises a cleavable group configured to be cleaved to separate the fluorescent dye from a substrate associated therewith.

79. A method, comprising:

(a) contacting a nucleic acid molecule with a solution comprising a plurality of non-terminating nucleotides under conditions sufficient to incorporate a first nucleotide and a second nucleotide of the plurality of non-terminating nucleotides into a growing strand complementary to the nucleic acid molecule, wherein the first nucleotide is labeled, and wherein at least about 20% of the plurality of nucleotides are labeled nucleotides;

(b) detecting one or more signals or signal changes from the first nucleotide, wherein the one or more signals or signal changes indicate incorporation of the first nucleotide; and

(c) resolving the one or more signals or signal changes to determine the sequence of the nucleic acid molecule.

80. The method of claim 79, wherein the plurality of non-terminating nucleotides comprise nucleotides of the same canonical base type.

81. The method of claim 79, wherein the first nucleotide comprises a fluorescent dye.

82. The method of claim 81, wherein the fluorescent dye is cleavable.

83. The method of claim 82, further comprising:

(i) cleaving the fluorescent dye;

(ii) contacting the nucleic acid molecule with a second solution comprising a second plurality of non-terminating nucleotides under conditions sufficient to incorporate a third nucleotide of the second plurality of non-terminating nucleotides into the growing strand, wherein at least about 20% of the second plurality of non-terminating nucleotides are labeled nucleotides, wherein the third nucleotide is a labeled nucleotide;

(iii) detecting one or more second signals or signal changes from the third nucleotide; and

(iv) resolving the one or more second signals or signal changes to determine a second sequence of the nucleic acid molecule.

84. The method of claim 83, wherein the first nucleotide and the third nucleotide are different canonical base types.

85. The method of claim 83, wherein the third nucleotide comprises the fluorescent dye.

86. The method of claim 79, further comprising:

(i) contacting the nucleic acid molecule with a second solution comprising a second plurality of non-terminating nucleotides under conditions sufficient to incorporate a third nucleotide of the second plurality of non-terminating nucleotides into the growing strand, wherein at least about 20% of the second plurality of nucleotides are labeled nucleotides, wherein the third nucleotide is a labeled nucleotide;

(ii) detecting one or more second signals or signal changes from the third nucleotide; and

(iii) resolving the one or more second signals or signal changes to determine a second sequence of the nucleic acid molecule.

87. The method of claim 86, wherein the first nucleotide and the third nucleotide are different canonical base types.

88. The method of claim 86, wherein the third nucleotide comprises the fluorescent dye.

89. The method of claim 88, wherein the contacting in (i) is performed without cleavage of a fluorescent dye from the first nucleotide.

90. The method of claim 88, further comprising repeating (i) - (iii) at least 5 times, each time using a different non-terminating nucleotide solution comprising at least 20% labeled nucleotides without cleaving fluorescent dye from the first nucleotide.

91. The method of claim 79, wherein at least about 50%, 70%, 80%, 90%, 95%, or 99% of the plurality of non-terminating nucleotides are labeled nucleotides.

92. The method of claim 79, wherein substantially all of the plurality of non-terminating nucleotides are labeled nucleotides.

93. The method of claim 79, wherein the resolving in (c) comprises determining the number of consecutive nucleotides from the solution incorporated into the growing strand.

94. The method of claim 93, wherein the number is selected from the group consisting of 2, 3, 4, 5, 6, 7, and 8 nucleotides.

95. The method of claim 93, wherein the resolving in (c) comprises processing tolerances of the solution.

96. The method of claim 79, wherein the second nucleotide is unlabeled.

97. The method of claim 79, wherein the second nucleotide is labeled.

98. The method of claim 79, wherein the first nucleotide and the second nucleotide are the same canonical base type.

99. The method of claim 79, wherein the first nucleotide and the second nucleotide are different canonical base types.
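By way of non-limiting illustration of the method of claims 79-99, the following Python sketch simulates successive contacts ("flows") of a growing strand with solutions that each contain non-terminating nucleotides of a single canonical base type, and then resolves the detected signals into a sequence by estimating the number of consecutive nucleotides incorporated in each flow. The flow order `TGCA`, the function names, the example sequence, and the assumption that the signal scales linearly with the number of incorporated nucleotides are all hypothetical and are not taken from the disclosure.

```python
from itertools import cycle, islice


def simulate_flows(growing_strand_target: str, flow_order: str = "TGCA",
                   num_flows: int = 8) -> list[int]:
    """Return how many nucleotides are incorporated in each flow.

    growing_strand_target is the sequence the growing strand will acquire
    (the complement of the template). Because the nucleotides are
    non-terminating, one flow extends the strand through an entire run of
    the same base (a homopolymer).
    """
    position = 0
    counts = []
    for base in islice(cycle(flow_order), num_flows):
        run_length = 0
        while (position < len(growing_strand_target)
               and growing_strand_target[position] == base):
            run_length += 1
            position += 1
        counts.append(run_length)
    return counts


def resolve_sequence(signals: list[float], flow_order: str = "TGCA",
                     signal_per_base: float = 1.0) -> str:
    """Resolve per-flow signals into a sequence.

    Assumes (hypothetically) that the signal is proportional to the number
    of nucleotides incorporated, so rounding signal / signal_per_base gives
    the homopolymer length for that flow.
    """
    called = []
    for base, signal in zip(cycle(flow_order), signals):
        run_length = max(0, round(signal / signal_per_base))
        called.append(base * run_length)
    return "".join(called)


# Idealized and noisy signals for a hypothetical growing-strand sequence.
ideal_counts = simulate_flows("TTGCAAAT")
noisy_signals = [2.1, 0.9, 1.2, 2.8, 1.1, 0.1, 0.0, 0.05]
print(ideal_counts)
print(resolve_sequence(noisy_signals))
```

In practice the relation between signal and homopolymer length may be nonlinear, for example because of quenching between adjacent dyes, which is one reason the labeled fraction and the linker design matter.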

Background

Detection, quantification, and sequencing of cells and biomolecules may be important for molecular biology and medical applications (e.g., diagnostics). Genetic testing may be useful for a number of diagnostic methods. For example, disorders caused by rare genetic alterations (e.g., sequence variations) or changes in epigenetic markers (e.g., cancer and partial or complete aneuploidy) can be detected or more accurately characterized with deoxyribonucleic acid (DNA) or ribonucleic acid (RNA) sequence information.

Nucleic acid sequencing is a process that can be used to provide sequence information for a nucleic acid sample. Such sequence information may be helpful in diagnosing and/or treating a subject having a condition. For example, a subject's nucleic acid sequence can be used to identify, diagnose, and potentially develop treatments for genetic diseases. As another example, research into pathogens may lead to treatment of contagious diseases.

Nucleic acid sequencing may include the use of fluorescently labeled moieties. Such moieties may be labeled with an organic fluorescent dye. The sensitivity of the detection scheme can be improved by using dyes with high extinction coefficients and quantum yields; the product of these two properties can be referred to as the "brightness" of the dye. Dye brightness may be diminished by quenching phenomena, including quenching by biological materials, quenching by proximity to other dyes, and quenching by solvents. Other routes to loss of brightness include photobleaching, reactivity to molecular oxygen, and chemical decomposition.
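As an illustration of the brightness relation described above, the following minimal Python sketch computes brightness as the product of extinction coefficient and quantum yield. The function name and the two dye parameter sets are hypothetical values chosen for arithmetic convenience and do not describe any particular dye.

```python
def brightness(extinction_coefficient: float, quantum_yield: float) -> float:
    """Brightness = extinction coefficient (M^-1 cm^-1) x quantum yield (0 to 1)."""
    if extinction_coefficient < 0:
        raise ValueError("extinction coefficient must be non-negative")
    if not 0.0 <= quantum_yield <= 1.0:
        raise ValueError("quantum yield must lie between 0 and 1")
    return extinction_coefficient * quantum_yield


# Hypothetical dyes: a weaker absorber can still be the brighter dye
# if its quantum yield is high enough.
dye_a = brightness(extinction_coefficient=100_000, quantum_yield=0.5)
dye_b = brightness(extinction_coefficient=200_000, quantum_yield=0.2)
print(dye_a > dye_b)
```

Quenching lowers the effective quantum yield, so in this simple picture it reduces brightness in direct proportion.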

Disclosure of Invention

The present disclosure provides improved optically (e.g., fluorescently) labeled reagents and methods of nucleic acid processing, including the use of optically (e.g., fluorescently) labeled moieties. The materials and methods provided herein can include the use of organic fluorescent dyes. The materials provided herein can allow optimized molecular quenching to facilitate efficient nucleic acid processing and detection. Molecular quenching mechanisms can include photoinduced electron transfer, photoinduced hole transfer, Förster energy transfer, Dexter quenching, and the like. A general solution to many types of quenching requires physical separation of the dye from the quencher moiety, but existing solutions have advantages and disadvantages in terms of ease of use, cost, solvent dependence, and polydispersity. Accordingly, the present disclosure recognizes a need for materials and methods that address these limitations and provide materials that include improved linker moieties.

In one aspect, the present disclosure provides a fluorescently labeled reagent comprising: (a) a fluorescent dye; (b) a linker attached to the fluorescent dye and configured to couple to a substrate to fluorescently label the substrate, wherein the linker comprises (i) one or more water-solubilizing groups and (ii) two or more ring systems, wherein at least two of the two or more ring systems are attached to each other through no more than two atoms, and wherein the linker comprises a non-protein amino acid comprising the ring systems of the two or more ring systems.

In some embodiments, the fluorescently labeled reagent coupled to the substrate is configured to emit a fluorescent signal.

In some embodiments, the linker is configured to establish a functional length of at least about 0.5 nanometers (nm) between the fluorescent dye and the substrate when the linker and the substrate are associated. In some embodiments, the functional length varies based on one or more members selected from the group consisting of temperature, solvent, pH, and salt concentration of the solution comprising the fluorescently labeled reagent. In some embodiments, the functional length is about 0.5 nm to 50 nm.
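To illustrate why the functional length between dye and substrate can matter for quenching, the following Python sketch evaluates the standard Förster relation, E = 1 / (1 + (r / R0)^6), at several separations within the 0.5 nm to 50 nm range stated above. Förster energy transfer is only one of the quenching mechanisms named in this disclosure. The Förster radius of 5 nm is a hypothetical value, the function name is illustrative, and the point-dipole relation is only approximate at the shortest separations.

```python
def forster_transfer_efficiency(distance_nm: float, forster_radius_nm: float) -> float:
    """Förster transfer efficiency E = 1 / (1 + (r / R0)**6)."""
    if distance_nm <= 0 or forster_radius_nm <= 0:
        raise ValueError("distance and Förster radius must be positive")
    return 1.0 / (1.0 + (distance_nm / forster_radius_nm) ** 6)


R0_NM = 5.0  # hypothetical Förster radius for a dye/quencher pair

# Separations spanning the functional-length range given in the text.
for length_nm in (0.5, 2.0, 5.0, 10.0, 50.0):
    efficiency = forster_transfer_efficiency(length_nm, R0_NM)
    print(f"{length_nm:5.1f} nm -> transfer efficiency {efficiency:.6f}")
```

Because of the sixth-power dependence, transfer is 50% efficient at a separation equal to the Förster radius and falls off steeply beyond it, which is consistent with the stated goal of physically separating the dye from a quencher moiety.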

In some embodiments, the linker is configured to form a bond with a plurality of fluorescent dyes or substrates.

In some embodiments, the linker comprises a plurality of amino acids. In some embodiments, the plurality of amino acids comprises a plurality of non-protein amino acids. In some embodiments, the plurality of amino acids comprises a plurality of hydroxyprolines. In some embodiments, the plurality of amino acids comprises three or more hydroxyprolines. In some embodiments, the plurality of amino acids comprises ten or more hydroxyprolines.

In some embodiments, the plurality of amino acids comprises a homopolymer. In some embodiments, the homopolymer comprises a repeating unit that is an amino acid. In some embodiments, the repeating unit is hydroxyproline. In some embodiments, a homopolymer of the linker comprises three or more hydroxyprolines. In some embodiments, a homopolymer of the linker comprises ten or more hydroxyprolines.

In some embodiments, the linker comprises a copolymer. In some embodiments, the copolymer comprises two or more repeat units, wherein at least one of the two or more repeat units is an amino acid. In some embodiments, the amino acid is a non-proteinogenic amino acid.

In some embodiments, the two or more ring systems comprise an aromatic ring or an aliphatic ring. In some embodiments, the two or more ring systems comprise rings having 5 or 6 members.

In some embodiments, at least two of the two or more ring systems are connected to each other through one or two sp³ carbon atoms. In some embodiments, at least two of the two or more ring systems are connected to each other through sp² carbon atoms. In some embodiments, at least two of the two or more ring systems are directly connected to each other without an intervening carbon atom.

In some embodiments, at least two of the two or more ring systems comprise a water-solubilizing group of the one or more water-solubilizing groups. In some embodiments, at least one of the one or more water-solubilizing groups is attached to a ring system of the two or more ring systems. In some embodiments, at least one of the one or more water-solubilizing groups is part of a ring system of the two or more ring systems. In some embodiments, at least one of the one or more water-solubilizing groups is positively charged. In some embodiments, the one or more water-solubilizing groups are selected from pyridinium, imidazolium, quaternary ammonium groups, sulfonates, phosphates, alcohols, amines, imines, nitriles, amides, thiols, carboxylic acids, polyethers, aldehydes, boronic acids, and boronic esters. In some embodiments, the one or more water-solubilizing groups reduce the logP of the fluorescently labeled reagent. In some embodiments, the fluorescently labeled reagent comprises more ring systems than water-solubilizing groups.

In some embodiments, the linker is configured to form a covalent bond with the substrate. In some embodiments, the linker is configured to form a non-covalent bond with the substrate.

In some embodiments, the fluorescently labeled reagent further comprises a cleavable group configured to be cleaved to separate the fluorescently labeled reagent or portion thereof from the substrate. In some embodiments, the cleavable group is configured to be cleaved to separate a first portion of the fluorescently labeled reagent comprising the fluorescent dye and a first portion of the linker from a second portion of the fluorescently labeled reagent comprising a second portion of the linker. In some embodiments, the cleavable group is selected from an azidomethyl group, a disulfide bond, a hydrocarbyl dithiomethyl group, and a 2-nitrobenzyloxy group. In some embodiments, the cleavable group is cleavable by the application of one or more members of the group consisting of: tris(2-carboxyethyl)phosphine (TCEP), dithiothreitol (DTT), tetrahydropyranyl (THP), ultraviolet (UV) light, and combinations thereof. In some embodiments, the linker comprises a moiety selected from the group consisting of [chemical structures not reproduced in this text].

In some embodiments, the fluorescently labeled reagent is configured to emit a signal between about 625 nanometers (nm) and 740 nm. In some embodiments, the fluorescently labeled reagent is configured to emit a signal between about 500 nm and 565 nm.

In some embodiments, the substrate is a protein, lipid, cell, or antibody. In some embodiments, the substrate is a nucleotide. In some embodiments, the linker is attached to the nucleotide through the nucleobase of the nucleotide. In some embodiments, the substrate is a fluorescence quencher, fluorescence donor, or fluorescence acceptor.

In another aspect, the present disclosure provides a composition comprising a solution comprising a fluorescently labeled nucleotide, wherein the fluorescently labeled nucleotide comprises a fluorescent dye attached to the nucleotide by a linker, wherein the linker comprises (i) one or more water-solubilizing groups and (ii) two or more ring systems, wherein at least two of the two or more ring systems are attached to each other by no more than two atoms, and wherein the linker comprises a non-protein amino acid comprising the ring systems of the two or more ring systems.

In some embodiments, the fluorescently labeled nucleotide is configured to emit a fluorescent signal.

In some embodiments, the linker comprises a plurality of amino acids. In some embodiments, the plurality of amino acids comprises a plurality of non-protein amino acids. In some embodiments, the linker comprises a plurality of hydroxyprolines.

In some embodiments, at least two ring systems of the two or more ring systems are connected to each other through sp² carbon atoms. In some embodiments, at least two ring systems of the two or more ring systems are directly connected to each other without an intervening carbon atom.

In some embodiments, at least one of the one or more water-solubilizing groups is attached to a ring system of two or more ring systems. In some embodiments, the one or more water-solubilizing groups are selected from pyridinium, imidazolium, quaternary ammonium groups, sulfonates, phosphates, alcohols, amines, imines, nitriles, amides, thiols, carboxylic acids, polyethers, aldehydes, boronic acids, and boronic esters.

In some embodiments, the linker further comprises a cleavable group configured to be cleaved to separate the fluorescent dye from the nucleotide. In some embodiments, the cleavable group is selected from an azidomethyl group, a disulfide bond, a hydrocarbyl dithiomethyl group, and a 2-nitrobenzyloxy group.

In some embodiments, the solution comprises a plurality of fluorescently labeled nucleotides, wherein each fluorescently labeled nucleotide in the plurality of fluorescently labeled nucleotides comprises the same type of fluorescent dye, the same type of linker, and the same type of nucleotide. In some embodiments, each linker of each fluorescently labeled nucleotide in the plurality of fluorescently labeled nucleotides has the same molecular weight. In some embodiments, the solution further comprises a plurality of unlabeled nucleotides, wherein each nucleotide of the plurality of unlabeled nucleotides is of the same type as each nucleotide of the plurality of fluorescently labeled nucleotides. In some embodiments, the ratio of the plurality of fluorescently labeled nucleotides to the plurality of unlabeled nucleotides in the solution is at least about 1:4. In some embodiments, the ratio is at least about 1:1.
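For illustration, the labeled-to-unlabeled ratios above can be converted into the fraction of all nucleotides in the solution that carry a label: a ratio of 1:4 corresponds to a labeled fraction of 1/5 (20%), and a ratio of 1:1 to 1/2 (50%). The following Python sketch performs that conversion. The function names are hypothetical, and the second function assumes, purely for illustration, that labeled and unlabeled nucleotides are incorporated with equal probability, which the disclosure does not state.

```python
from fractions import Fraction


def labeled_fraction(labeled_parts: int, unlabeled_parts: int) -> Fraction:
    """Convert a labeled:unlabeled ratio into the labeled fraction of the total."""
    total = labeled_parts + unlabeled_parts
    if labeled_parts < 0 or unlabeled_parts < 0 or total == 0:
        raise ValueError("parts must be non-negative and not both zero")
    return Fraction(labeled_parts, total)


def expected_labeled_in_run(run_length: int, fraction: Fraction) -> Fraction:
    """Expected number of labeled nucleotides in a run of identical bases.

    Assumes (hypothetically) equal incorporation probability for labeled
    and unlabeled nucleotides.
    """
    if run_length < 0:
        raise ValueError("run length must be non-negative")
    return run_length * fraction


one_to_four = labeled_fraction(1, 4)  # ratio 1:4
one_to_one = labeled_fraction(1, 1)   # ratio 1:1
print(one_to_four, one_to_one)
print(expected_labeled_in_run(4, one_to_four))
```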

The present disclosure also provides a method comprising providing a composition described herein to a template nucleic acid molecule coupled to a nucleic acid strand.

In some embodiments, the method further comprises subjecting the template nucleic acid molecule and the composition to conditions sufficient to incorporate the fluorescently labeled nucleotide into the nucleic acid strand coupled to the template nucleic acid molecule. In some embodiments, the composition further comprises a polymerase, wherein the polymerase incorporates the fluorescently labeled nucleotides into the nucleic acid strand.

In some embodiments, the method further comprises detecting a signal from the fluorescently labeled nucleotide.

In some embodiments, the method further comprises contacting the fluorescently labeled nucleotides with a cleavage reagent configured to cleave the fluorescent dye from the nucleotides. In some embodiments, the cleavage agent is configured to cleave the linker to provide a nucleotide attached to a portion of the linker. In some embodiments, the linker moiety attached to the nucleotide comprises a thiol moiety, an aromatic moiety, or a combination thereof.

In some embodiments, the method further comprises, after contacting the fluorescently labeled nucleotides with a cleavage agent, subjecting the template nucleic acid molecule and the composition to conditions sufficient to incorporate additional fluorescently labeled nucleotides into the nucleic acid strand coupled to the template nucleic acid molecule.

In some embodiments, the template nucleic acid molecule is immobilized on a support.

In another aspect, the present disclosure provides a method comprising providing a fluorescently labeled reagent, wherein the fluorescently labeled reagent comprises a fluorescent dye and a linker attached to the fluorescent dye, wherein the linker comprises (i) one or more water-solubilizing groups and (ii) two or more ring systems, wherein at least two of the two or more ring systems are attached to each other through no more than two atoms, and wherein the linker comprises a non-proteinogenic amino acid that comprises the ring systems of the two or more ring systems.

In some embodiments, the method further comprises contacting a fluorescently labeled reagent with the substrate to produce a fluorescently labeled substrate, wherein the linker attached to the fluorescent dye is coupled to the substrate. In some embodiments, the substrate is a nucleotide. In some embodiments, the substrate is a protein, lipid, cell, or antibody. In some embodiments, the fluorescently labeled substrate is configured to emit a fluorescent signal.

In some embodiments, the method further comprises contacting the fluorescently labeled substrate with a cleavage reagent, wherein the cleavage reagent is configured to cleave the fluorescently labeled reagent or a portion thereof from the fluorescently labeled substrate to generate a scarred substrate. In some embodiments, the cleavage reagent is configured to cleave a cleavable group of the linker, wherein the cleavable group is selected from the group consisting of an azidomethyl group, a disulfide bond, a hydrocarbyl dithiomethyl group, and a 2-nitrobenzyloxy group. In some embodiments, the scarred substrate comprises a thiol moiety, an aromatic moiety, or a combination thereof.

In some embodiments, the method further comprises, prior to generating the scarred substrate, subjecting the fluorescently labeled substrate and the nucleic acid molecule to conditions sufficient to incorporate the fluorescently labeled substrate into the nucleic acid molecule. In some embodiments, the fluorescently labeled substrate is incorporated into the nucleic acid molecule using a polymerase.

In some embodiments, the method further comprises, prior to generating the scarred substrate, subjecting an additional substrate and the nucleic acid molecule to conditions sufficient to incorporate the additional substrate into the nucleic acid molecule at a location adjacent to the fluorescently labeled substrate. In some embodiments, the additional substrate does not comprise a fluorescent labeling agent. In some embodiments, the additional substrate comprises a fluorescent labeling agent.

In some embodiments, the method further comprises, after generating the scarred substrate, subjecting the additional substrate and the nucleic acid molecule to conditions sufficient to incorporate the additional substrate into the nucleic acid molecule at a location adjacent to the scarred substrate. In some embodiments, the additional substrate does not comprise a fluorescent labeling agent. In some embodiments, the additional substrate comprises a fluorescent labeling agent.

In some embodiments, the nucleic acid molecule is immobilized on a support.

In some embodiments, the linker comprises a plurality of amino acids. In some embodiments, the plurality of amino acids comprises a plurality of non-protein amino acids. In some embodiments, the linker comprises a plurality of hydroxyprolines.

In some embodiments, at least two ring systems of the two or more ring systems are connected to each other by an sp² carbon atom. In some embodiments, at least two ring systems of the two or more ring systems are directly connected to each other without an intervening carbon atom.

In some embodiments, at least one of the one or more water-solubilizing groups is attached to a ring system of two or more ring systems. In some embodiments, the one or more water-solubilizing groups are selected from pyridinium, imidazolium, quaternary ammonium groups, sulfonates, phosphates, alcohols, amines, imines, nitriles, amides, thiols, carboxylic acids, polyethers, aldehydes, boronic acids, and boronic esters.

In another aspect, the present disclosure provides a kit comprising: a plurality of linkers, wherein a linker in the plurality of linkers comprises (i) one or more water-solubilizing groups and (ii) two or more ring systems, wherein at least two of the two or more ring systems are connected to each other by no more than two sp³ carbon atoms, and wherein the linker comprises a non-proteinogenic amino acid comprising a ring system of the two or more ring systems.

In some embodiments, the linker comprises a plurality of amino acids. In some embodiments, the plurality of amino acids comprises a plurality of non-protein amino acids. In some embodiments, the linker comprises a plurality of hydroxyprolines.

In some embodiments, at least two ring systems of the two or more ring systems are connected to each other by an sp² carbon atom. In some embodiments, at least two ring systems of the two or more ring systems are directly connected to each other without an intervening carbon atom.

In some embodiments, at least one of the one or more water-solubilizing groups is attached to a ring system of two or more ring systems. In some embodiments, the one or more water-solubilizing groups are selected from pyridinium, imidazolium, quaternary ammonium groups, sulfonates, phosphates, alcohols, amines, imines, nitriles, amides, thiols, carboxylic acids, polyethers, aldehydes, boronic acids, and boronic esters.

In some embodiments, the linker further comprises a cleavable group configured to be cleaved to separate a first portion of the linker from a second portion of the linker. In some embodiments, the cleavable group is selected from an azidomethyl group, a disulfide bond, a hydrocarbyl dithiomethyl group, and a 2-nitrobenzyloxy group. In some embodiments, the cleavable group is cleavable by the application of one or more members of the group consisting of: tris(2-carboxyethyl)phosphine (TCEP), dithiothreitol (DTT), tetrahydropyranyl (THP), ultraviolet (UV) light, and combinations thereof. In some embodiments, the linker comprises a moiety selected from: [chemical structures not reproduced].

In some embodiments, the linker is attached to a fluorescent dye.

In some embodiments, the linker is associated with a substrate. In some embodiments, the substrate comprises a protein, lipid, cell, or antibody. In some embodiments, the substrate comprises a nucleotide.

In some embodiments, the plurality of linkers comprises a first linker associated with a first substrate and a second linker associated with a second substrate, wherein the first substrate and the second substrate are of different types. In some embodiments, the first linker and the second linker comprise the same chemical structure. In some embodiments, the first substrate and the second substrate are nucleotides comprising different types of nucleobases. In some embodiments, the kit further comprises a third linker associated with a third substrate and a fourth linker associated with a fourth substrate, wherein the first, second, third, and fourth substrates are of different types. In some embodiments, the first, second, third and fourth substrates are nucleotides comprising different types of nucleobases. In some embodiments, the first linker and the third linker comprise different chemical structures. In some embodiments, the first linker and the third linker comprise the same chemical group. In some embodiments, the same chemical group comprises a disulfide bond.

In another aspect, the present disclosure provides an oligonucleotide molecule or derivative thereof comprising a fluorescent labeling reagent described herein.

In some embodiments, the oligonucleotide molecule further comprises one or more additional fluorescent labeling reagents. In some embodiments, the fluorescent labeling reagent and the one or more additional fluorescent labeling reagents comprise linkers having the same chemical structure. In some embodiments, the fluorescent labeling reagent and the one or more additional fluorescent labeling reagents comprise fluorescent dyes having the same chemical structure. In some embodiments, the fluorescent labeling reagent and the one or more additional fluorescent labeling reagents are associated with the same type of substrate, wherein the substrate is a nucleotide. In some embodiments, the fluorescent labeling reagent and the one or more additional fluorescent labeling reagents are linked to the nucleobase of the nucleotide. In some embodiments, the fluorescent labeling reagent and the one or more additional fluorescent labeling reagents are linked to adjacent nucleotides of the oligonucleotide molecule. In some embodiments, the fluorescent labeling reagent and the one or more additional fluorescent labeling reagents are attached to nucleotides of the oligonucleotide molecule that are separated by one or more nucleotides that are not attached to a fluorescent labeling reagent. In some embodiments, the linker of the fluorescent labeling reagent comprises a cleavable group configured to be cleaved to separate the fluorescent dye from a substrate associated therewith. In some embodiments, the fluorescent labeling reagent is configured to emit a fluorescent signal.

In one aspect, the present disclosure provides a method comprising: (a) contacting a nucleic acid molecule with a solution comprising a plurality of nucleotides, wherein at least about 20% of the plurality of nucleotides are labeled nucleotides, under conditions sufficient to incorporate a first labeled nucleotide and a second labeled nucleotide of the plurality of nucleotides into a growing strand complementary to the nucleic acid molecule; (b) detecting one or more signals or signal changes from the first labeled nucleotide and the second labeled nucleotide, wherein the one or more signals or signal changes indicate incorporation of the first labeled nucleotide and the second labeled nucleotide; and (c) resolving the one or more signals or signal changes to determine the sequence of the nucleic acid molecule.

In some embodiments, the first labeled nucleotide and the second labeled nucleotide are the same canonical base type. In some embodiments, the first labeled nucleotide comprises a fluorescent dye. In some embodiments, the second labeled nucleotide comprises a fluorescent dye. In some embodiments, the fluorescent dye is cleavable. In some embodiments, the method further comprises (i) cleaving the fluorescent dye; (ii) contacting the nucleic acid molecule with a second solution comprising a second plurality of nucleotides, wherein at least about 20% of the second plurality of nucleotides are labeled nucleotides, under conditions sufficient to incorporate a third labeled nucleotide of the second plurality of nucleotides into the growing strand; (iii) detecting one or more second signals or signal changes from the third labeled nucleotide; and (iv) resolving the one or more second signals or signal changes to determine a second sequence of the nucleic acid molecule. In some embodiments, the first labeled nucleotide and the third labeled nucleotide are different canonical base types. In some embodiments, the third labeled nucleotide comprises a fluorescent dye.

In some embodiments, the method further comprises (i) contacting the nucleic acid molecule with a second solution comprising a second plurality of nucleotides, wherein at least about 20% of the second plurality of nucleotides are labeled nucleotides, under conditions sufficient to incorporate a third labeled nucleotide of the second plurality of nucleotides into the growing strand; (ii) detecting one or more second signals or signal changes from the third labeled nucleotide; and (iii) resolving the one or more second signals or signal changes to determine a second sequence of the nucleic acid molecule. In some embodiments, the first labeled nucleotide and the third labeled nucleotide are different canonical base types. In some embodiments, the third labeled nucleotide comprises a fluorescent dye. In some embodiments, the contacting in (i) is performed without cleaving the fluorescent dye from the first labeled nucleotide or the second labeled nucleotide. In some embodiments, the method further comprises repeating (i) - (iii) at least 5 times, each time using a different nucleotide solution comprising at least 20% labeled nucleotide, without cleaving the fluorescent dye from the first labeled nucleotide or the second labeled nucleotide.

In some embodiments, at least about 50%, 70%, 80%, 90%, 95%, or 99% of the plurality of nucleotides are labeled nucleotides. In some embodiments, substantially all of the plurality of nucleotides are labeled nucleotides. In some embodiments, the resolving in (c) comprises determining the number of consecutive nucleotides incorporated into the growing strand from the solution. In some embodiments, the number is selected from 2, 3, 4, 5, 6, 7, or 8 nucleotides. In some embodiments, resolving in (c) comprises processing solution tolerances.
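One way to picture the resolving step is as a conversion from a measured signal to a count of consecutively incorporated nucleotides. The sketch below assumes that the signal scales linearly with the number of incorporated labeled nucleotides and that a single-incorporation reference signal is known; both the function `estimate_run_length` and the `max_run` cap are hypothetical choices made for illustration, not the method of the disclosure.

```python
def estimate_run_length(signal: float, unit_signal: float, max_run: int = 8) -> int:
    """Estimate how many consecutive nucleotides were incorporated in one step.

    Assumptions (illustrative only):
      - signal is proportional to the number of incorporated labeled nucleotides
      - unit_signal is the signal expected from a single incorporation
      - max_run caps the estimate at a hypothetical upper bound
    """
    if unit_signal <= 0:
        raise ValueError("unit_signal must be positive")
    estimate = round(signal / unit_signal)
    # Clamp to the range [0, max_run].
    return max(0, min(max_run, estimate))


# Hypothetical normalized signals, not measured data.
for measured in (0.1, 1.05, 3.1):
    print(measured, estimate_run_length(measured, unit_signal=1.0))
```

A real analysis would also need to account for noise, incomplete labeling when fewer than all nucleotides are labeled, and any departure from linearity.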

In some embodiments, after (a), a third nucleotide of the plurality of nucleotides has been incorporated into the growing strand. In some embodiments, the third nucleotide is unlabeled. In some embodiments, the third nucleotide is labeled. In some embodiments, the first labeled nucleotide and the third labeled nucleotide are the same canonical base type. In some embodiments, the first labeled nucleotide and the third nucleotide are different canonical base types.

In one aspect, the present disclosure provides a method comprising: (a) contacting the nucleic acid molecule with a solution comprising a plurality of non-terminating nucleotides under conditions sufficient to incorporate a first nucleotide and a second nucleotide of the plurality of non-terminating nucleotides into a growing strand complementary to the nucleic acid molecule, wherein the first nucleotide is labeled, and wherein at least about 20% of the plurality of nucleotides are labeled nucleotides; (b) detecting one or more signals or signal changes from the first nucleotide, wherein the one or more signals or signal changes indicate incorporation of the first nucleotide; and (c) resolving the one or more signals or signal changes to determine the sequence of the nucleic acid molecule.

In some embodiments, the plurality of non-terminating nucleotides comprise nucleotides of the same canonical base type. In some embodiments, the first nucleotide comprises a fluorescent dye. In some embodiments, the fluorescent dye is cleavable. In some embodiments, the method further comprises: (i) cleaving the fluorescent dye; (ii) contacting the nucleic acid molecule with a second solution comprising a second plurality of non-terminating nucleotides under conditions sufficient to incorporate a third nucleotide of the second plurality of non-terminating nucleotides into the growing strand, wherein at least about 20% of the second plurality of non-terminating nucleotides are labeled nucleotides, and wherein the third nucleotide is a labeled nucleotide; (iii) detecting one or more second signals or signal changes from the third nucleotide; and (iv) resolving the one or more second signals or signal changes to determine a second sequence of the nucleic acid molecule. In some embodiments, the first nucleotide and the third nucleotide are different canonical base types. In some embodiments, the third nucleotide comprises a fluorescent dye.

In some embodiments, the method further comprises: (i) contacting the nucleic acid molecule with a second solution comprising a second plurality of non-terminating nucleotides, wherein at least about 20% of the second plurality of nucleotides are labeled nucleotides, wherein the third nucleotide is a labeled nucleotide, under conditions sufficient to incorporate a third nucleotide of the second plurality of non-terminating nucleotides into the growing strand; (ii) detecting one or more second signals or signal changes from the third nucleotide; and (iii) resolving the one or more second signals or signal changes to determine a second sequence of the nucleic acid molecule. In some embodiments, the first nucleotide and the third nucleotide are different canonical base types. In some embodiments, the third nucleotide comprises a fluorescent dye. In some embodiments, the contacting in (i) is performed without cleavage of the fluorescent dye from the first nucleotide. In some embodiments, the method further comprises repeating (i) - (iii) at least 5 times, each time using a different non-terminating nucleotide solution comprising at least 20% of the labeled nucleotide without cleaving the fluorescent dye from the first nucleotide.

In some embodiments, at least about 50%, 70%, 80%, 90%, 95%, or 99% of the plurality of non-terminating nucleotides are labeled nucleotides. In some embodiments, substantially all of the plurality of non-terminating nucleotides are labeled nucleotides. In some embodiments, the resolving in (c) comprises determining the number of consecutive nucleotides incorporated into the growing strand from the solution. In some embodiments, the number is selected from 2, 3, 4, 5, 6, 7, or 8 nucleotides. In some embodiments, resolving in (c) comprises processing solution tolerances.

In some embodiments, the second nucleotide is unlabeled. In some embodiments, the second nucleotide is labeled. In some embodiments, the first nucleotide and the second nucleotide are the same canonical base type. In some embodiments, the first nucleotide and the second nucleotide are different canonical base types.

In one aspect, the present disclosure provides a fluorescently labeled reagent comprising: (a) a fluorescent dye; and (b) a linker attached to the fluorescent dye and capable of associating with a substrate to fluorescently label the substrate, wherein the linker comprises (i) one or more water-solubilizing groups and (ii) two or more ring systems, wherein the two or more ring systems are connected to each other by no more than two sp³ carbon atoms. In some embodiments, the linker is configured to establish a functional length of at least about 0.5 nanometers (nm) between the fluorescent dye and the substrate when the linker and the substrate are associated.

In some embodiments, the functional length is measured in solution. In some embodiments, the fluorescently labeled reagent coupled to the substrate is capable of emitting a fluorescent signal in solution. In some embodiments, the functional length varies based on the temperature, solvent, pH, or salt concentration of the solution.

In some embodiments, the functional length is about 0.5 to 50 nm.

In some embodiments, the linker is capable of forming a bond with a plurality of fluorescent dyes and/or substrates.

In some embodiments, the linker has a defined molecular weight.

In some embodiments, the linker comprises a polymer having regular repeating units. In some embodiments, the linker is a copolymer without regular repeating units.

In some embodiments, the two or more ring systems comprise an aromatic ring or an aliphatic ring. In some embodiments, the two or more ring systems comprise rings having 5 or 6 members. In some embodiments, at least one of the two or more ring systems comprises hydroxyproline.

In some embodiments, the two or more ring systems are connected to each other by one or two sp³ carbon atoms. In some embodiments, the two or more ring systems are directly connected to each other without an intervening carbon atom.

In some embodiments, each of the two or more ring systems comprises a water-solubilizing group. In some embodiments, the fluorescent labeling reagent comprises more ring systems than water-solubilizing groups. In some embodiments, at least one of the one or more water-solubilizing groups is attached to a ring system of two or more ring systems. In some embodiments, at least one of the one or more water-solubilizing groups is part of a ring system of two or more ring systems. In some embodiments, at least one of the one or more water-solubilizing groups is positively charged. In some embodiments, the one or more water-solubilizing groups are selected from pyridinium, imidazolium, quaternary ammonium groups, sulfonates, phosphates, alcohols, amines, imines, nitriles, amides, thiols, carboxylic acids, polyethers, aldehydes, boronic acids, and boronic esters. In some embodiments, the one or more water-solubilizing groups reduce the logP of the fluorescently labeled reagent.

In some embodiments, the substrate can be associated with one or more different portions of the fluorescently labeled reagent.

In some embodiments, the linker is capable of forming a covalent bond with the substrate.

In some embodiments, the linker is capable of forming a non-covalent bond with the substrate. In some embodiments, the non-covalent bond is a biotin-streptavidin bond.

In some embodiments, the fluorescently labeled reagent coupled to the substrate is capable of emitting a fluorescent signal that is proportional to the amount of fluorescently labeled reagent associated with the substrate.

In some embodiments, the fluorescently labeled reagent further comprises a cleavable group that can be cleaved to separate the fluorescently labeled reagent or a portion thereof from the substrate. In some embodiments, cleavage of the cleavable group leaves a scar group associated with the substrate. In some embodiments, the cleavable group is an azidomethyl group capable of being cleaved by tris(2-carboxyethyl)phosphine (TCEP), dithiothreitol (DTT), or tetrahydropyranyl (THP) to leave a hydroxyl scar group. In some embodiments, the cleavable group is a disulfide bond capable of being cleaved by tris(2-carboxyethyl)phosphine (TCEP), dithiothreitol (DTT), or tetrahydropyranyl (THP) to leave a thiol scar group. In some embodiments, the cleavable group is a hydrocarbyl dithiomethyl group capable of being cleaved by tris(2-carboxyethyl)phosphine (TCEP), dithiothreitol (DTT), or tetrahydropyranyl (THP) to leave a hydroxyl scar group. In some embodiments, the cleavable group is a 2-nitrobenzyloxy group capable of being cleaved by ultraviolet (UV) light to leave a hydroxyl scar group.
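The pairings of cleavable group, cleavage stimulus, and resulting scar group recited above can be summarized as a small lookup table. The sketch below simply restates those pairings; the dictionary layout and the function name `scar_group` are hypothetical conveniences, and the abbreviations are used exactly as they appear in the text.

```python
# Cleavable groups, the stimuli recited as cleaving them, and the scar left behind.
# Data layout is illustrative; the pairings restate the preceding paragraph.
CLEAVABLE_GROUPS = {
    "azidomethyl": {"stimuli": {"TCEP", "DTT", "THP"}, "scar": "hydroxyl"},
    "disulfide": {"stimuli": {"TCEP", "DTT", "THP"}, "scar": "thiol"},
    "hydrocarbyl dithiomethyl": {"stimuli": {"TCEP", "DTT", "THP"}, "scar": "hydroxyl"},
    "2-nitrobenzyloxy": {"stimuli": {"UV"}, "scar": "hydroxyl"},
}


def scar_group(cleavable_group: str, stimulus: str):
    """Return the scar group left after cleavage, or None if the pairing is not recited."""
    entry = CLEAVABLE_GROUPS.get(cleavable_group)
    if entry is None or stimulus not in entry["stimuli"]:
        return None
    return entry["scar"]


print(scar_group("disulfide", "TCEP"))
print(scar_group("2-nitrobenzyloxy", "UV"))
print(scar_group("2-nitrobenzyloxy", "DTT"))
```

A pairing that returns `None` here only means it is not listed in the paragraph above; it is not a statement about chemical reactivity.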

In some embodiments, the fluorescent dye is Atto 633.

In some embodiments, the substrate to be labeled is a protein, lipid, cell, or antibody. In some embodiments, the substrate is a nucleotide. In some embodiments, the linker is attached to the nucleobase of the nucleotide. In some embodiments, the substrate is a fluorescence quencher, fluorescence donor, or fluorescence acceptor.

In some embodiments, the linker can be prepared by peptide synthesis chemistry.

In some embodiments, the linker comprises a plurality of amino acids. In some embodiments, the plurality of amino acids comprises a plurality of non-protein (e.g., non-natural) amino acids. In some embodiments, the linker comprises the polymerization product of two half-monomers. In some embodiments, the two half-monomers have water-solubilizing groups. In some embodiments, at least one of the two or more ring systems comprises hydroxyproline.

In another aspect, the present disclosure provides a method of sequencing a nucleic acid molecule, the method comprising: (a) contacting the nucleic acid molecule with a primer under conditions sufficient for hybridization of the primer to the nucleic acid molecule, thereby generating a sequencing template; (b) contacting the sequencing template with a polymerase and a solution comprising a plurality of fluorescently labeled nucleotides, wherein each fluorescently labeled nucleotide of the plurality of fluorescently labeled nucleotides is of the same type, and wherein a fluorescently labeled nucleotide of the plurality of fluorescently labeled nucleotides is complementary to the nucleic acid molecule at a plurality of positions adjacent to the primer hybridized to the nucleic acid molecule, thereby incorporating two or more fluorescently labeled nucleotides of the plurality of fluorescently labeled nucleotides into the sequencing template; (c) washing the solution comprising the plurality of fluorescently labeled nucleotides away from the sequencing template; and (d) measuring a fluorescent signal emitted by the sequencing template, wherein the intensity of the measured fluorescent signal is greater than a fluorescent signal that would be measured if a single fluorescently labeled nucleotide of the plurality of fluorescently labeled nucleotides had been incorporated into the sequencing template, wherein a fluorescently labeled nucleotide of the plurality of fluorescently labeled nucleotides comprises a fluorescent dye and a linker attached to the fluorescent dye and the nucleotide, wherein the linker comprises (i) one or more water-solubilizing groups and (ii) two or more ring systems, wherein the two or more ring systems are connected to each other by no more than two sp³ carbon atoms; and wherein the linker establishes a functional length between the fluorescent dye and the nucleotide of at least about 0.5 nanometers.

In some embodiments, the fluorescently labeled nucleotides include any of the optical (e.g., fluorescent) labeling agents described herein.

In some embodiments, the intensity of the measured fluorescent signal is proportional to the number of fluorescently labeled nucleotides incorporated into the sequencing template. In some embodiments, the intensity of the measured fluorescent signal is linearly proportional to the number of fluorescently labeled nucleotides incorporated into the sequencing template. In some embodiments, the intensity of the measured fluorescent signal is linearly proportional to the number of fluorescently labeled nucleotides incorporated into the sequencing template, with a slope of about 1.0 when the intensity is plotted against that number.
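The linearity statement can be checked with an ordinary least-squares fit of signal against the number of incorporated labeled nucleotides. The sketch below is a minimal illustration: the function `fit_line` is hypothetical, and the data points are made-up normalized values (signal divided by the single-incorporation signal), not results from the disclosure.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical example: number of incorporated labeled nucleotides versus
# signal normalized to the single-incorporation signal.
incorporated = [1, 2, 3, 4]
normalized_signal = [1.0, 2.0, 3.0, 4.0]
slope, intercept = fit_line(incorporated, normalized_signal)
print(slope, intercept)
```

With normalized signals, a slope near 1.0 and an intercept near 0 indicate that each additional incorporated nucleotide adds roughly one unit of signal; a slope well below 1.0 would suggest quenching between neighboring dyes.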

In some embodiments, the solution comprising a plurality of fluorescently labeled nucleotides further comprises unlabeled nucleotides. In some embodiments, at least about 20% of the nucleotides in the solution are fluorescently labeled.

In some embodiments, three or more fluorescently labeled nucleotides of the plurality of fluorescently labeled nucleotides are incorporated into the sequencing template.

In some embodiments, a first fluorescently labeled nucleotide of the plurality of fluorescently labeled nucleotides is incorporated within four positions of a second fluorescently labeled nucleotide of the plurality of fluorescently labeled nucleotides.

In some embodiments, the method further comprises, after (d), cleaving the fluorescent label of the two or more fluorescently labeled nucleotides incorporated into the sequencing template.

In a further aspect, the present disclosure provides a method of sequencing a nucleic acid molecule, the method comprising: (a) contacting the nucleic acid molecule with a primer under conditions sufficient for hybridization of the primer to the nucleic acid molecule, thereby generating a sequencing template; (b) contacting the sequencing template with a polymerase and a first solution comprising a plurality of first fluorescently labeled nucleotides, wherein each first fluorescently labeled nucleotide of the plurality of first fluorescently labeled nucleotides is of the same type, and wherein a first fluorescently labeled nucleotide of the plurality of first fluorescently labeled nucleotides is complementary to the nucleic acid molecule at a position adjacent to the primer hybridized to the nucleic acid molecule, thereby incorporating the first fluorescently labeled nucleotide of the plurality of first fluorescently labeled nucleotides into the sequencing template to produce an extended primer; (c) washing the first solution comprising the plurality of first fluorescently labeled nucleotides away from the sequencing template; (d) measuring a first fluorescent signal emitted by the sequencing template; (e) contacting the sequencing template with a polymerase and a second solution comprising a plurality of second fluorescently labeled nucleotides, wherein each second fluorescently labeled nucleotide in the plurality of second fluorescently labeled nucleotides is of the same type, and wherein a second fluorescently labeled nucleotide in the plurality of second fluorescently labeled nucleotides is complementary to the nucleic acid molecule at a position adjacent to the extended primer hybridized to the nucleic acid molecule, thereby incorporating the second fluorescently labeled nucleotide in the plurality of second fluorescently labeled nucleotides into the sequencing template to generate a further extended primer; (f) washing the second solution comprising the plurality of second fluorescently labeled nucleotides away from the sequencing template; and (g) measuring a second fluorescent signal emitted by the sequencing template, wherein the intensity of the second fluorescent signal is greater than the intensity of the first fluorescent signal, wherein a first fluorescently labeled nucleotide in the plurality of first fluorescently labeled nucleotides comprises a first fluorescent dye and a first linker attached to the first fluorescent dye and the first nucleotide, and a second fluorescently labeled nucleotide in the plurality of second fluorescently labeled nucleotides comprises a second fluorescent dye and a second linker attached to the second fluorescent dye and the second nucleotide; and wherein (I) the first linker comprises (i) one or more water-solubilizing groups and (ii) two or more ring systems, wherein the two or more ring systems are connected to each other by no more than two sp³ carbon atoms; and wherein the first linker establishes a functional length between the first fluorescent dye and the first nucleotide of at least about 0.5 nanometers; and/or (II) the second linker comprises (i) one or more water-solubilizing groups and (ii) two or more ring systems, wherein the two or more ring systems are connected to each other by no more than two sp³ carbon atoms; and wherein the second linker establishes a functional length between the second fluorescent dye and the second nucleotide of at least about 0.5 nanometers.
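Because the second measured signal is greater than the first, the measurements in this method behave like running totals when the dye from the earlier incorporation remains on the template. The sketch below assumes that dyes are not cleaved between measurements and that their signals add; the function `per_flow_increments` and the example values are hypothetical and serve only to show how each step's contribution could be separated from the running total.

```python
def per_flow_increments(cumulative_signals):
    """Convert a series of cumulative signal measurements into per-step increments.

    Assumes (for illustration) that no dye is cleaved between measurements and
    that signals from incorporated dyes are additive.
    """
    increments = []
    previous = 0.0
    for signal in cumulative_signals:
        increments.append(signal - previous)
        previous = signal
    return increments


# Hypothetical normalized measurements after four successive nucleotide solutions.
measurements = [1.0, 3.0, 3.0, 4.0]
print(per_flow_increments(measurements))
```

In this made-up series, an increment of zero would correspond to a solution whose nucleotide type was not incorporated at that step.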

In some embodiments, the first fluorescently labeled nucleotide and/or the second fluorescently labeled nucleotide comprise any optical (e.g., fluorescent) labeling agent described herein.

In some embodiments, the first linker comprises (i) one or more water-solubilizing groups and (ii) two or more ring systems, wherein the two or more ring systems are connected to each other by no more than two sp3 carbon atoms; and wherein the first linker establishes a functional length between the first fluorescent dye and the first nucleotide of at least about 0.5 nanometers.

In some embodiments, the second linker comprises (i) one or more water-solubilizing groups and (ii) two or more ring systems, wherein the two or more ring systems are connected to each other by no more than two sp3 carbon atoms; and wherein the second linker establishes a functional length between the second fluorescent dye and the second nucleotide of at least about 0.5 nanometers.

In some embodiments, the first solution comprising a plurality of first fluorescently labeled nucleotides further comprises a first unlabeled nucleotide.

In some embodiments, the second solution comprising a plurality of second fluorescently labeled nucleotides further comprises second unlabeled nucleotides.

In some embodiments, the plurality of first fluorescently labeled nucleotides is different from the plurality of second fluorescently labeled nucleotides. In some embodiments, the first fluorescent dye of a first fluorescently labeled nucleotide in the plurality of first fluorescently labeled nucleotides and the second fluorescent dye of a second fluorescently labeled nucleotide in the plurality of second fluorescently labeled nucleotides are the same, and the first nucleotide of the first fluorescently labeled nucleotide and the second nucleotide of the second fluorescently labeled nucleotide are different. In some embodiments, the first fluorescent dye and the second fluorescent dye are different, and the first nucleotide and the second nucleotide are the same. In some embodiments, the first fluorescent dye and the second fluorescent dye are different, and the first nucleotide and the second nucleotide are different.

In some embodiments, two or more first fluorescently labeled nucleotides are incorporated into the sequencing template. In some embodiments, two or more second fluorescently labeled nucleotides are incorporated into the sequencing template.

In some embodiments, the method further comprises: (h) contacting the sequencing template with a polymerase and a third solution comprising a plurality of third fluorescently labeled nucleotides, wherein each third fluorescently labeled nucleotide of the plurality of third fluorescently labeled nucleotides is of the same type, and wherein a third fluorescently labeled nucleotide of the plurality of third fluorescently labeled nucleotides is complementary to the nucleic acid molecule at a location adjacent to the further extended primer that hybridizes to the nucleic acid molecule, thereby incorporating the third fluorescently labeled nucleotide of the plurality of third fluorescently labeled nucleotides into the sequencing template; (i) washing a third solution comprising a plurality of third fluorescently labeled nucleotides away from the sequencing template; and (j) measuring a third fluorescent signal emitted by the sequencing template, wherein the intensity of the third fluorescent signal is greater than the intensity of the first fluorescent signal and the intensity of the second fluorescent signal, wherein a third fluorescently labeled nucleotide of the plurality of third fluorescently labeled nucleotides comprises a third fluorescent dye and a third linker attached to the third fluorescent dye and the third nucleotide.

In some embodiments, the third linker comprises (i) one or more water-solubilizing groups and (ii) two or more ring systems, wherein the two or more ring systems are connected to each other by no more than two sp3 carbon atoms; and wherein the third linker establishes a functional length between the third fluorescent dye and the third nucleotide of at least about 0.5 nanometers.

In some embodiments, the third fluorescent dye of the third fluorescently labeled nucleotide in the plurality of third fluorescently labeled nucleotides is different from the first fluorescent dye of the first fluorescently labeled nucleotide in the plurality of first fluorescently labeled nucleotides. In some embodiments, the third fluorescent dye of the third fluorescently labeled nucleotide in the plurality of third fluorescently labeled nucleotides is different from the second fluorescent dye of the second fluorescently labeled nucleotide in the plurality of second fluorescently labeled nucleotides. In some embodiments, the third nucleotide of the third fluorescently labeled nucleotide of the plurality of third fluorescently labeled nucleotides is different from the first nucleotide of the first fluorescently labeled nucleotide of the plurality of first fluorescently labeled nucleotides. In some embodiments, the third nucleotide of the third fluorescently labeled nucleotide of the plurality of third fluorescently labeled nucleotides is different from the second nucleotide of the second fluorescently labeled nucleotide of the plurality of second fluorescently labeled nucleotides.

In some embodiments, the method further comprises, after (d), cleaving the first fluorescent dye from the first fluorescently labeled nucleotide incorporated into the sequencing template.

In some embodiments, the method further comprises, after (g), cleaving the second fluorescent dye from the second fluorescently labeled nucleotide incorporated into the sequencing template.

In yet another aspect, the present disclosure provides a method of sequencing a nucleic acid molecule, the method comprising: (a) providing a solution comprising a plurality of fluorescently labeled nucleotides, wherein each fluorescently labeled nucleotide in the plurality of fluorescently labeled nucleotides is of the same type, and wherein a given fluorescently labeled nucleotide in the plurality of fluorescently labeled nucleotides comprises a fluorescent dye attached to the nucleotide by a semi-rigid water-soluble linker having a defined molecular weight and a length of at least about 0.5 nanometers (nm); (b) contacting the nucleic acid molecule with a primer under conditions sufficient for hybridization of the primer to the nucleic acid molecule, thereby generating a sequencing template; (c) contacting the sequencing template with a polymerase and a solution comprising a plurality of fluorescently labeled nucleotides, wherein a fluorescently labeled nucleotide of the plurality of fluorescently labeled nucleotides is complementary to the nucleic acid molecule at a position adjacent to a primer that hybridizes to the nucleic acid molecule, thereby incorporating one or more fluorescently labeled nucleotides of the plurality of fluorescently labeled nucleotides into the sequencing template; (d) washing a solution comprising a plurality of fluorescently labeled nucleotides away from the sequencing template; and (e) measuring the fluorescent signal emitted by the sequencing template.

In some embodiments, the nucleotide is guanine (G).

In some embodiments, the linker reduces quenching between the nucleotide and the fluorescent dye.

In some embodiments, a fluorescently labeled nucleotide of the one or more fluorescently labeled nucleotides is incorporated into the sequencing template more efficiently than another fluorescently labeled nucleotide that comprises the same nucleotide and fluorescent dye but lacks the linker.

In some embodiments, a fluorescently labeled nucleotide of the one or more fluorescently labeled nucleotides is incorporated into the sequencing template with greater fidelity than another fluorescently labeled nucleotide that comprises the same nucleotide and fluorescent dye but lacks the linker.

In some embodiments, the polymerase is a Family A polymerase selected from the group consisting of Taq polymerase, Klenow polymerase, and Bst polymerase.

In some embodiments, the polymerase is a Family B polymerase selected from the group consisting of Vent (exo-) polymerase and Therminator™ polymerase.

In some embodiments, the linker comprises (i) one or more water-solubilizing groups and (ii) two or more ring systems, wherein the two or more ring systems are connected to each other by no more than two sp3 carbon atoms; and wherein the linker establishes a functional length between the fluorescent dye and the nucleotide of at least about 0.5 nanometers.
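The contact, wash, and measure cycle of the methods above can be illustrated with a minimal, idealized sketch. It is not taken from the disclosure: the function names, the flow order, and the input format are hypothetical, and the sketch assumes that each solution contains a single nucleotide type, that the polymerase extends through an entire homopolymer run in one flow, that the dye is cleaved between flows, and that the measured signal is exactly proportional to the number of incorporated nucleotides. Real signals would be affected by quenching, incomplete incorporation, and the fraction of labeled nucleotides.

```python
from typing import List, Tuple

# Watson-Crick pairing used to find which nucleotide the polymerase adds next.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def simulate_flows(template: str, flow_order: str = "TGCA",
                   cycles: int = 1) -> List[Tuple[str, int]]:
    """Return (nucleotide, incorporation count) for each flow.

    `template` is the single-stranded region downstream of the primer, listed
    in the order the polymerase reads it (a hypothetical input format).
    """
    to_add = [COMPLEMENT[base] for base in template]
    position = 0
    signals = []
    for _ in range(cycles):
        for nucleotide in flow_order:
            # Contact step: one nucleotide type is offered; the polymerase
            # extends through the whole homopolymer run, if there is one.
            count = 0
            while position < len(to_add) and to_add[position] == nucleotide:
                count += 1
                position += 1
            # Wash and measure steps: the idealized signal equals the count.
            signals.append((nucleotide, count))
    return signals


def call_bases(signals: List[Tuple[str, int]]) -> str:
    """Reconstruct the synthesized strand from the per-flow signals."""
    return "".join(nucleotide * count for nucleotide, count in signals)


flows = simulate_flows("AACG", flow_order="TGCA", cycles=1)
print(flows)              # [('T', 2), ('G', 1), ('C', 1), ('A', 0)]
print(call_bases(flows))  # TTGC
```

In this toy model a flow that yields a count of 2 corresponds to a two-base homopolymer, which is why a signal that scales with homopolymer length matters for base calling.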

Another aspect of the disclosure provides a system comprising one or more computer processors and a computer memory coupled thereto. The computer memory contains machine executable code that when executed by one or more computer processors performs any of the methods described above or elsewhere herein.

Other aspects and advantages of the present disclosure will become apparent to those skilled in the art from the following detailed description, wherein only illustrative embodiments of the disclosure are shown and described. As will be realized, the disclosure is capable of other and different embodiments and its several details are capable of modifications in various obvious respects, all without departing from the disclosure. Accordingly, the drawings and description are to be regarded as illustrative in nature, and not as restrictive.

Incorporation by Reference

All publications, patents, and patent applications mentioned in this specification are herein incorporated by reference to the same extent as if each individual publication, patent, or patent application was specifically and individually indicated to be incorporated by reference. To the extent publications and patents or patent applications incorporated by reference contradict the disclosure contained in the specification, the specification is intended to supersede and/or take precedence over any such contradictory material.

Brief Description of the Drawings

The novel features believed characteristic of the invention are set forth with particularity in the appended claims. A better understanding of the features and advantages of the present invention will be obtained by reference to the following detailed description that sets forth illustrative embodiments, in which the principles of the invention are utilized, and the accompanying drawings (also referred to herein as "figures"), of which:

FIG. 1A illustrates an example of a linker of the present disclosure;

FIG. 1B illustrates an example of a linker of the present disclosure;

FIG. 1C illustrates an example of a linker of the present disclosure, wherein R is a water-solubilizing group;

FIG. 2A illustrates an example of a method for synthesizing a linker of the present disclosure having an effective length of about 2 nanometers;

FIG. 2B shows examples of reagents that may be used in the method of FIG. 2A for synthesizing linkers of the present disclosure, as well as some trifunctional reagents;

FIG. 2C illustrates an example of a method for synthesizing a linker of the present disclosure, which is a polymer having a defined molecular weight and a linking group;

FIG. 3 illustrates a computer system programmed or otherwise configured to implement the methods provided herein;

FIG. 4 shows an example of a method for constructing a labeled nucleotide comprising a propargyl-derivatized nucleotide, a linker, and a dye.

FIGS. 5A and 5B illustrate exemplary methods for preparing labeled nucleotides comprising dGTP analogs.

FIG. 6 shows an exemplary method for preparing labeled nucleotides comprising dCTP.

FIG. 7 shows components used to construct dye-labeled nucleotides for excitation at about 530 nm.

FIG. 8 shows an exemplary method for preparing labeled nucleotides comprising guanine analogs.

FIG. 9 shows a schematic of a bead-based assay for evaluating labeled nucleotides.

FIG. 10 shows the results of bead-based assays of different labeled dUTPs.

FIG. 11 shows the results of bead-based assays of different labeled dATPs.

FIG. 12 shows the results of bead-based assays of different labeled dGTPs.

FIGS. 13A-13C illustrate an exemplary method for preparing labeled nucleotides comprising guanine analogs.

FIGS. 14A and 14B illustrate an exemplary method for preparing labeled nucleotides comprising amino acid repeat units.

FIG. 15 shows a schematic of an assay for assessing quenching.

FIG. 16 shows the quenching results for the red dye linker.

FIG. 17 shows the quenching results for the green dye linker.

FIG. 18 shows an exemplary sequencing procedure.

FIG. 19 shows the tolerance of different labeled nucleotides.

FIGS. 20A and 20B show examples of constructs comprising homopolymeric regions.

FIG. 20C shows the signal detected by sequencing a template with homopolymeric regions using labeled nucleotides.

FIG. 21A shows exemplary results of sequencing analysis using a population of nucleotides comprising 20% fluorophore-labeled dNTPs.

FIG. 21B shows fluorescence signal intensity as a function of homopolymer length.

FIG. 22 shows exemplary results of sequencing analysis using a population of nucleotides comprising 100% fluorophore-labeled dNTPs.

Detailed Description

While various embodiments of the present disclosure have been shown and described herein, it will be obvious to those skilled in the art that such embodiments are provided by way of example only. Numerous variations, changes, and substitutions will now occur to those skilled in the art without departing from the invention. It should be understood that various alternatives to the embodiments of the invention described herein may be employed.

Where values are described as ranges, it is understood that such disclosure includes disclosure of all possible subranges within such ranges, as well as particular values falling within such ranges, whether or not particular values or particular subranges are explicitly stated.

The terms "about" and "approximately" shall generally mean an acceptable degree of error or variation for a given value or range of values, e.g., an error or variation within 20 percent (%), within 15%, within 10%, or within 5% of the given value or range of values.
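As an illustration only, the definition above can be read as a tolerance band around a stated value. The sketch below uses hypothetical helper names and assumes the broadest tolerance listed (20%) as the default; under that assumption, "about 0.5 nanometers" spans roughly 0.4 nm to 0.6 nm.

```python
def about_range(nominal: float, tolerance: float = 0.20) -> tuple:
    """Return the (low, high) band implied by 'about nominal'.

    `tolerance` is a fraction of the nominal value; 0.20, 0.15, 0.10, and 0.05
    correspond to the percentages named in the definition.
    """
    delta = abs(nominal) * tolerance
    return (nominal - delta, nominal + delta)


def is_about(measured: float, nominal: float, tolerance: float = 0.20) -> bool:
    """Check whether a measured value falls inside the tolerance band."""
    low, high = about_range(nominal, tolerance)
    return low <= measured <= high


# A linker length of 0.45 nm is "about 0.5 nm" at 20% tolerance but not at 5%.
print(is_about(0.45, 0.5))                  # True
print(is_about(0.45, 0.5, tolerance=0.05))  # False
```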

As used herein, the term "subject" generally refers to an individual or entity from which a biological sample (e.g., a biological sample that is being or will be processed or analyzed) can be derived. The subject can be an animal (e.g., mammalian or non-mammalian) or a plant. The subject can be a human, dog, cat, horse, pig, bird, non-human primate, simian, farm animal, companion animal, sport animal, or rodent. The subject may be a patient. The subject may have or be suspected of having a disease or disorder, such as a cancer (e.g., breast cancer, colorectal cancer, brain cancer, leukemia, lung cancer, skin cancer, liver cancer, pancreatic cancer, lymphoma, esophageal cancer, or cervical cancer) or an infectious disease. Alternatively or additionally, the subject may be known to have previously had a disease or disorder. The subject may have or be suspected of having a genetic disorder, such as achondroplasia, alpha-1 antitrypsin deficiency, antiphospholipid syndrome, autism, autosomal dominant polycystic kidney disease, Charcot-Marie-Tooth disease, cri du chat syndrome, Crohn's disease, cystic fibrosis, Dercum disease, Down syndrome, Duane syndrome, Duchenne muscular dystrophy, factor V Leiden thrombophilia, familial hypercholesterolemia, familial Mediterranean fever, fragile X syndrome, Gaucher disease, hemochromatosis, hemophilia, holoprosencephaly, Huntington's disease, Klinefelter syndrome, Marfan syndrome, myotonic dystrophy, neurofibromatosis, Noonan syndrome, osteogenesis imperfecta, Parkinson's disease, phenylketonuria, Poland anomaly, porphyria, progeria, retinitis pigmentosa, severe combined immunodeficiency, sickle cell disease, spinal muscular atrophy, Tay-Sachs disease, thalassemia, trimethylaminuria, Turner syndrome, velocardiofacial syndrome, WAGR syndrome, or Wilson disease. The subject may be receiving treatment for a disease or disorder. The subject may be symptomatic or asymptomatic for a given disease or disorder. The subject may be healthy (e.g., not suspected of having a disease or disorder). A subject may have one or more risk factors for a given disease. The subject may have a given weight, height, body mass index, or other physical characteristic. The subject may have a given ethnic or racial heritage, place of birth or residence, nationality, disease or remission status, family history, or other characteristic.

As used herein, the term "biological sample" generally refers to a sample obtained from a subject. The biological sample may be obtained directly or indirectly from the subject. The sample may be obtained from the subject by any suitable method, including but not limited to expectoration, swabbing, blood drawing, biopsy, obtaining an excretion (e.g., urine, stool, sputum, vomit, or saliva), excision, scraping, and puncturing. Samples can be obtained from a subject by, for example, intravenous or intra-arterial access to the circulatory system, collection of a secreted biological sample (e.g., stool, urine, saliva, or sputum), breathing, or surgical extraction of tissue (e.g., biopsy). Samples can be obtained by non-invasive methods, including but not limited to scraping the skin or cervix, swabbing the cheek, or collecting saliva, urine, feces, menses, tears, or semen. Alternatively, the sample may be obtained by an invasive procedure, such as biopsy, needle aspiration, or phlebotomy. The sample may include a bodily fluid such as, but not limited to, blood (e.g., whole blood, red blood cells, white blood cells, or platelets), plasma, serum, sweat, tears, saliva, sputum, urine, semen, mucus, synovial fluid, breast milk, colostrum, amniotic fluid, bile, bone marrow, interstitial or extracellular fluid, or cerebrospinal fluid. For example, a sample may be obtained by a puncture method to obtain a body fluid comprising blood and/or plasma. Such samples may comprise cells and cell-free nucleic acid material. Alternatively, the sample may be obtained from any other source, including but not limited to blood, sweat, hair follicles, cheek tissue, tears, menses, feces, or saliva. The biological sample may be a tissue sample, such as a tumor biopsy. Samples can be obtained from any tissue provided herein, including but not limited to skin, heart, lung, kidney, breast, pancreas, liver, intestine, brain, prostate, esophagus, muscle, smooth muscle, bladder, gall bladder, colon, or thyroid. Methods of obtaining a sample provided herein include biopsy methods, such as fine needle aspiration, core needle biopsy, vacuum-assisted biopsy, large core biopsy, incisional biopsy, excisional biopsy, punch biopsy, shave biopsy, or skin biopsy. The biological sample may comprise one or more cells. A biological sample can comprise one or more nucleic acid molecules, such as one or more deoxyribonucleic acid (DNA) and/or ribonucleic acid (RNA) molecules (e.g., included within a cell or not included within a cell). The nucleic acid molecule may be included within a cell. Alternatively or additionally, the nucleic acid molecule may not be included within a cell (e.g., a cell-free nucleic acid molecule). The biological sample may be a cell-free sample.

As used herein, the term "cell-free sample" generally refers to a sample that is substantially free of cells (e.g., less than 10% cells on a volume basis). The cell-free sample can be derived from any source (e.g., as described herein). For example, the cell-free sample may be derived from blood, sweat, urine, or saliva. For example, the cell-free sample may be derived from a tissue or a body fluid. Cell-free samples may be derived from a variety of tissues or body fluids. For example, a sample from a first tissue or fluid may be combined with a sample from a second tissue or fluid (e.g., at or after the time the sample is obtained). In one example, the first fluid and the second fluid may be collected from the subject (e.g., at the same or different times) and the first fluid and the second fluid may be combined to provide a sample. The cell-free sample may comprise one or more nucleic acid molecules, such as one or more DNA or RNA molecules.

A sample that is not a cell-free sample (e.g., a sample comprising one or more cells) can be processed to provide a cell-free sample. For example, a sample that includes one or more cells and one or more nucleic acid molecules (e.g., DNA and/or RNA molecules) not included within the cells (e.g., cell-free nucleic acid molecules) can be obtained from a subject. The sample can be processed (e.g., as described herein) to separate cells and other materials from the nucleic acid molecules not included within the cells, thereby providing a cell-free sample (e.g., comprising nucleic acid molecules not included within cells). The cell-free sample can then be further analyzed and processed (e.g., as provided herein). Nucleic acid molecules not included within cells (e.g., cell-free nucleic acid molecules) can be derived from cells and tissues. For example, a cell-free nucleic acid molecule can be derived from tumor tissue or degraded cells (e.g., of body tissue). The cell-free nucleic acid molecule can comprise any type of nucleic acid molecule (e.g., as described herein). The cell-free nucleic acid molecule can be double-stranded, single-stranded, or a combination thereof. Cell-free nucleic acid molecules can be released into body fluids by secretion or by cell death processes, such as necrosis or apoptosis. Cell-free nucleic acid molecules can be released into body fluids from cancer cells (e.g., circulating tumor DNA (ctDNA)). The cell-free nucleic acid molecule can also be fetal DNA (e.g., a cell-free fetal nucleic acid molecule, such as cffDNA) that circulates freely in the maternal bloodstream. Alternatively or additionally, cell-free nucleic acid molecules may be released from healthy cells into body fluids.

Biological samples can be obtained and analyzed directly from a subject without any intervening processing, such as, for example, sample purification or extraction. For example, a blood sample may be obtained directly from a subject by entering the circulatory system of the subject, removing blood from the subject (e.g., through a needle), and transferring the removed blood into a container. The container may contain a reagent (e.g., an anticoagulant) such that the blood sample may be used for further analysis. Such reagents may be used to process a sample or an analyte derived from a sample in a container or another container prior to analysis. In another example, a swab may be used to access epithelial cells on the oropharyngeal surface of a subject. After obtaining a biological sample from a subject, a swab containing the biological sample may be contacted with a fluid (e.g., a buffer) to collect the biological fluid from the swab.

Any suitable biological sample comprising one or more nucleic acid molecules may be obtained from a subject. A sample (e.g., a biological sample or a cell-free biological sample) suitable for use in accordance with the methods provided herein can be any material, including tissue, cells, degraded cells, nucleic acids, genes, gene fragments, expression products, gene expression products, and/or gene expression product fragments of an individual to be tested. The biological sample may be a solid substance (e.g., biological tissue) or may be a fluid (e.g., biological fluid). In general, a biological fluid may include any fluid associated with a living organism. Non-limiting examples of biological samples include blood (or blood components, e.g., white blood cells, red blood cells, platelets) obtained from any anatomical location of a subject (e.g., tissue, circulatory system, bone marrow), cells obtained from any anatomical location of a subject, skin, heart, lung, kidney, breath, bone marrow, stool, semen, vaginal fluid, interstitial fluid derived from tumor tissue, breast, pancreas, cerebrospinal fluid, tissue, throat swab, biopsy, placental fluid, amniotic fluid, liver, muscle, smooth muscle, bladder, gall bladder, colon, intestine, brain, cavity fluid, sputum, pus, microbiota, meconium, breast milk, prostate, esophagus, thyroid, serum, saliva, urine, gastric and digestive fluids, tears, ocular fluid, sweat, mucus, earwax, oil, glandular secretions, spinal fluid, hair, fingernails, skin cells, plasma, nasal swab or nasopharyngeal wash, cord blood, lymphatic fluid, and/or other excretions or body tissues. Methods for determining sample suitability and/or sufficiency are provided. The sample may include, but is not limited to, blood, plasma, tissue, cells, degraded cells, cell-free nucleic acid molecules, and/or biological material from or derived from cells of an individual, such as cell-free nucleic acid molecules. The sample may be a heterogeneous or homogeneous population of cells, tissue, or cell-free biological material. The biological sample may be obtained using any method that provides a sample suitable for use in the assay methods described herein.

A sample (e.g., a biological sample or a cell-free biological sample) may undergo one or more processes to prepare it for analysis, including but not limited to filtration, centrifugation, selective precipitation, permeabilization, separation, agitation, heating, purification, and/or other processes. For example, the sample may be filtered to remove contaminants or other materials. In one example, a sample containing cells can be processed to separate the cells from other materials in the sample. Such processes can be used to prepare samples containing only cell-free nucleic acid molecules. Such processes may comprise a multi-step centrifugation process. Multiple samples can be obtained for analysis as described herein, such as multiple samples from the same subject (e.g., obtained in the same or different manner, from the same or different body locations, and/or at the same or different times, such as seconds, minutes, hours, days, weeks, months, or years apart) or multiple samples from different subjects. In one example, a first sample is obtained from the subject before the subject undergoes a treatment regimen or procedure, and a second sample is obtained from the subject after the subject undergoes the treatment regimen or procedure. Alternatively or additionally, multiple samples may be obtained from the same subject at or about the same time. Different samples obtained from the same subject may be obtained in the same or different ways. For example, a first sample may be obtained by biopsy and a second sample may be obtained by blood draw. Samples obtained in different ways may be obtained by different medical professionals, using different techniques, at different times, and/or at different locations. Different samples obtained from the same subject may be obtained from different parts of the body. For example, a first sample may be obtained from a first region of the body (e.g., a first tissue) and a second sample may be obtained from a second region of the body (e.g., a second tissue).

A biological sample as used herein (e.g., a biological sample comprising one or more nucleic acid molecules) may not be purified when provided in a reaction vessel. Furthermore, for biological samples comprising one or more nucleic acid molecules, the one or more nucleic acid molecules may not be extracted when the biological sample is provided to the reaction vessel. For example, ribonucleic acid (RNA) and/or deoxyribonucleic acid (DNA) molecules of a biological sample may not be extracted from the biological sample when the biological sample is provided to the reaction vessel. In addition, when a biological sample is provided to a reaction vessel, target nucleic acids (e.g., target RNA or target DNA molecules) present in the biological sample may not be concentrated. Alternatively, the biological sample may be purified and/or the nucleic acid molecules may be isolated from other materials in the biological sample.

A biological sample as described herein can comprise a target nucleic acid. As used herein, the terms "template nucleic acid," "target nucleic acid," "nucleic acid molecule," "nucleic acid sequence," "nucleic acid fragment," "oligonucleotide," "polynucleotide," and "nucleic acid" generally refer to a polymeric form of nucleotides of any length, such as deoxyribonucleotides (dNTPs) or ribonucleotides (rNTPs) or analogs thereof, and may be used interchangeably. The nucleic acid may have any three-dimensional structure and may perform any known or unknown function. The nucleic acid molecule can have at least about 10 nucleic acid bases ("bases"), 20 bases, 30 bases, 40 bases, 50 bases, 100 bases, 200 bases, 300 bases, 400 bases, 500 bases, 1 kilobase (kb), 2 kb, 3 kb, 4 kb, 5 kb, 10 kb, 50 kb, or more. Oligonucleotides generally consist of a specific sequence of four nucleotide bases: adenine (A); cytosine (C); guanine (G); and thymine (T) (when the polynucleotide is RNA, uracil (U) replaces thymine (T)). An oligonucleotide may include one or more non-standard nucleotides, nucleotide analogs, and/or modified nucleotides. Non-limiting examples of nucleic acids include DNA, RNA, genomic DNA (e.g., gDNA, such as sheared gDNA), cell-free DNA (e.g., cfDNA), synthetic DNA/RNA, coding or non-coding regions of a gene or gene fragment, loci defined by linkage analysis, exons, introns, messenger RNA (mRNA), transfer RNA, ribosomal RNA, short interfering RNA (siRNA), short hairpin RNA (shRNA), micro-RNA (miRNA), ribozymes, complementary DNA (cDNA), recombinant nucleic acids, branched nucleic acids, plasmids, vectors, isolated DNA of any sequence, isolated RNA of any sequence, nucleic acid probes, and primers. The nucleic acid may comprise one or more modified nucleotides, such as methylated nucleotides and nucleotide analogs. If present, modifications to the nucleotide structure may be made before or after assembly of the nucleic acid. The nucleotide sequence of a nucleic acid may be interrupted by non-nucleotide components. The nucleic acid may be further modified after polymerization, for example by conjugation or binding to a reporter agent.

A target nucleic acid or sample nucleic acid as described herein can be amplified to produce an amplification product. The target nucleic acid may be a target RNA or a target DNA. When the target nucleic acid is a target RNA, the target RNA can be any type of RNA, including the types of RNA described elsewhere herein. The target RNA may be viral RNA and/or tumor RNA. The viral RNA may be pathogenic to the subject. Non-limiting examples of pathogenic viral RNAs include human immunodeficiency virus I (HIV I), human immunodeficiency virus II (HIV II), orthomyxovirus, Ebola virus, dengue virus, influenza virus (e.g., H1N1, H3N2, H7N9, or H5N1), herpes virus, hepatitis A virus, hepatitis B virus, hepatitis C virus (e.g., armored RNA-HCV virus), hepatitis D virus, hepatitis E virus, hepatitis G virus, Epstein-Barr virus, mononucleosis virus, cytomegalovirus, SARS virus, West Nile virus, poliovirus, and measles virus.

A biological sample can comprise a plurality of target nucleic acid molecules. For example, a biological sample can comprise a plurality of target nucleic acid molecules from a single subject. In another example, a biological sample can comprise a first target nucleic acid molecule from a first subject and a second target nucleic acid molecule from a second subject.

As used herein, the term "nucleotide" generally refers to a substance that includes a base (e.g., a nucleobase), a sugar moiety, and a phosphate moiety. The nucleotide may comprise a free base with an attached phosphate group. A substance comprising a base with three attached phosphate groups may be referred to as a nucleoside triphosphate. When nucleotides are added to a growing nucleic acid molecule chain, the formation of phosphodiester bonds between the proximal phosphate of the nucleotide and the growing chain may be accompanied by hydrolysis of the high energy phosphate bond and the release of the two distal phosphates as pyrophosphates. Nucleotides can be naturally occurring or non-naturally occurring (e.g., modified or engineered nucleotides).

As used herein, the term "nucleotide analog" may include, but is not limited to, nucleotides that may or may not be naturally occurring nucleotides. For example, a nucleotide analog can be derived from and/or include structural similarity to a canonical nucleotide, e.g., an adenine (A)-, thymine (T)-, cytosine (C)-, uracil (U)-, or guanine (G)-containing nucleotide. A nucleotide analog may comprise one or more differences or modifications relative to the native nucleotide. Examples of nucleotide analogs include inosine, diaminopurine, 5-fluorouracil, 5-bromouracil, 5-chlorouracil, 5-iodouracil, hypoxanthine, xanthine, deazaxanthine, deazaguanine, isocytosine, isoguanine, 4-acetylcytosine, 5-(carboxyhydroxymethyl)uracil, 5-carboxymethylaminomethyl-2-thiouracil, 5-carboxymethylaminomethyluracil, dihydrouracil, beta-D-galactosylqueosine, N6-isopentenyladenine, 1-methylguanine, 1-methylinosine, 2,2-dimethylguanine, 2-methyladenine, 2-methylguanine, 3-methylcytosine, 5-methylcytosine, N6-adenine, 7-methylguanine, 5-methylaminomethyluracil, 5-methoxyaminomethyl-2-thiouracil, beta-D-mannosylqueosine, 5'-methoxycarboxymethyluracil, 5-methoxyuracil, 2-methylthio-N6-isopentenyladenine, uracil-5-oxyacetic acid (v), wybutoxosine, pseudouracil, queosine, 2-thiocytosine, 5-methyl-2-thiouracil, 4-thiouracil, 5-methylthiouracil, uracil-5-oxyacetic acid methyl ester, 3-(3-amino-3-N-2-carboxypropyl)uracil, (acp3)w, 2,6-diaminopurine, ethynyl nucleobases, 1-propynyl nucleobases, azido nucleobases, phosphoroselenoate nucleic acids, and modified forms thereof (e.g., by oxidation, reduction, and/or addition of substituents such as alkyl, hydroxyalkyl, hydroxyl, or halogen moieties). Nucleic acid molecules (e.g., polynucleotides, double-stranded nucleic acid molecules, single-stranded nucleic acid molecules, primers, adaptors, etc.) 
can be modified at the base moiety (e.g., at one or more atoms that are generally available to form hydrogen bonds with complementary nucleotides and/or at one or more atoms that are generally not available to form hydrogen bonds with complementary nucleotides), the sugar moiety, or the phosphate backbone. In some cases, a nucleotide may include modifications in its phosphate moiety, including modifications to the triphosphate moiety. Further non-limiting examples of modifications include longer phosphate chains (e.g., phosphate chains having 4, 5, 6, 7, 8, 9, 10, or more phosphate moieties), modifications having thiol moieties (e.g., α- and β-thiotriphosphates), and modifications having selenium moieties (e.g., phosphoroselenoate nucleic acids). The nucleotide or nucleotide analog may comprise a sugar selected from ribose, deoxyribose, and modified forms thereof (e.g., by oxidation, reduction, and/or addition of substituents such as alkyl, hydroxyalkyl, hydroxyl, or halogen moieties). The nucleotide analog may also comprise a modified linker moiety (e.g., instead of a phosphate moiety). The nucleotide analogs may also contain amine-modified groups, such as aminoallyl-dUTP (aa-dUTP) and aminohexylacrylamide-dCTP (aha-dCTP), to allow covalent attachment of amine-reactive moieties, such as N-hydroxysuccinimide (NHS) esters. Substitutions of standard DNA base pairs or RNA base pairs in the oligonucleotides of the present disclosure may provide, for example, higher density in bits per cubic millimeter, higher safety (against accidental or deliberate synthesis of natural toxins), easier discrimination in photo-programmed polymerases, and/or lower secondary structure. The nucleotide analog may be capable of reacting with or binding to a detectable moiety for nucleotide detection.

As used herein, the term "homopolymer" generally refers to a polymer or portion of a polymer that comprises the same monomeric units. The homopolymer may have a homopolymer sequence. A nucleic acid homopolymer may refer to a polynucleotide or oligonucleotide comprising consecutive repeats of the same nucleotide or any nucleotide variant thereof. For example, the homopolymer may be poly (dA), poly (dT), poly (dG), poly (dC), poly (rA), poly (U), poly (rG), or poly (rC). The homopolymer may be of any length. For example, a homopolymer can have a length of at least 2, 3, 4, 5, 10, 20, 30, 40, 50, 100, 200, 300, 400, 500, or more nucleic acid bases. Homopolymers may have from 10 to 500, or from 15 to 200, or from 20 to 150 nucleic acid bases. Homopolymers may have a length of up to 500, 400, 300, 200, 100, 50, 40, 30, 20, 10, 5, 4, 3, or 2 nucleic acid bases. Molecules, such as nucleic acid molecules, can include one or more homopolymer portions and one or more non-homopolymer portions. The molecule may be formed entirely of a homopolymer, a plurality of homopolymers, or a combination of homopolymers and non-homopolymers. In nucleic acid sequencing, multiple nucleotides can be incorporated into a homopolymer region of a nucleic acid strand. Such nucleotides can be non-terminating to allow incorporation of contiguous nucleotides (e.g., during a single nucleotide flow).
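The notion of a homopolymer portion within a longer sequence can be illustrated with a short Python sketch. The function name `homopolymer_runs` and the `min_length` parameter are hypothetical names chosen for illustration and do not come from the present disclosure; the sketch simply reports each run of consecutive identical bases at or above a chosen length.

```python
from itertools import groupby


def homopolymer_runs(sequence, min_length=2):
    """Return (start_index, base, run_length) for each run of identical bases.

    Illustrative helper only; min_length is an assumed cutoff, not a value
    prescribed by the text.
    """
    runs = []
    position = 0
    # groupby yields one group per maximal stretch of the same character.
    for base, group in groupby(sequence.upper()):
        length = sum(1 for _ in group)
        if length >= min_length:
            runs.append((position, base, length))
        position += length
    return runs


if __name__ == "__main__":
    # A sequence with a poly(dT) stretch of 5 and a poly(dA) stretch of 3.
    print(homopolymer_runs("ACGTTTTTGCAAAC", min_length=3))
```

Under these assumptions, the example sequence contains two homopolymer portions (five T bases starting at index 3 and three A bases starting at index 10) separated by non-homopolymer portions.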

The terms "amplifying", "amplification" and "nucleic acid amplification" are used interchangeably and generally refer to the generation of one or more copies of a nucleic acid or template. For example, "amplification" of DNA generally refers to the generation of one or more copies of a DNA molecule. The amplicon can be a single-stranded or double-stranded nucleic acid molecule generated from an initial template nucleic acid molecule by an amplification procedure. Such amplification procedures may include one or more extension cycles or ligation procedures. The amplicon can comprise a nucleic acid strand, at least a portion of which can be substantially identical or substantially complementary to at least a portion of the starting template. When the starting template is a double-stranded nucleic acid molecule, the amplicon can comprise a strand of nucleic acid that is substantially identical to at least a portion of one strand and substantially complementary to at least a portion of either strand. The amplicon may be single-stranded or double-stranded regardless of whether the initial template is single-stranded or double-stranded. Amplification of nucleic acids may be linear, exponential, or a combination thereof. Amplification may be emulsion based or may be non-emulsion based. Non-limiting examples of nucleic acid amplification methods include reverse transcription, primer extension, Polymerase Chain Reaction (PCR), Ligase Chain Reaction (LCR), helicase-dependent amplification, asymmetric amplification, rolling circle amplification, and Multiple Displacement Amplification (MDA). 
Where PCR is used, any form of PCR may be used; non-limiting examples include real-time PCR, allele-specific PCR, assembly PCR, asymmetric PCR, digital PCR, emulsion PCR, dial-out PCR, helicase-dependent PCR, nested PCR, hot start PCR, inverse PCR, methylation-specific PCR, miniprimer PCR, multiplex PCR, overlap-extension PCR, thermal asymmetric interlaced PCR, and touchdown PCR. In addition, amplification can be performed in a reaction mixture that includes various components (e.g., primers, templates, nucleotides, polymerases, buffer components, cofactors, etc.) that participate in or facilitate amplification. In some cases, the reaction mixture includes a buffer that permits context-independent incorporation of nucleotides. Non-limiting examples include magnesium ion, manganese ion, and isocitrate buffers. Other examples of such buffers are described in Tabor, S. et al., C.C. PNAS, 1989, 86, 4076-.

The amplification may be clonal amplification. As used herein, the term "clone" generally refers to a population of nucleic acids in which a majority (e.g., greater than about 50%, 60%, 70%, 80%, 90%, 95%, or 99%) of its members have sequences that are at least about 50%, 60%, 70%, 80%, 90%, 95%, or 99% identical to one another. Members of a clonal population of nucleic acid molecules can have sequence homology to each other. Such members may have sequence homology with the template nucleic acid molecule. The members of a clonal population can be double-stranded or single-stranded. Members of a population may not be 100% identical or complementary, e.g., an "error" may occur during the synthetic process such that a minority of a given population may not have sequence homology with a majority of the population. For example, at least 50% of the members of a population can be substantially identical to each other or to a reference nucleic acid molecule (i.e., a defined sequence molecule used as a basis for sequence comparison). At least 60%, at least 70%, at least 80%, at least 90%, at least 95%, at least 99%, or more of the population members can be substantially identical to a reference nucleic acid molecule. Two molecules can be considered substantially identical (or homologous) if the percent identity between the two molecules is at least 60%, 70%, 75%, 80%, 85%, 90%, 95%, 98%, 99%, 99.9%, or greater. Two molecules can be considered substantially complementary if the percentage of complementarity between the two molecules is at least 60%, 70%, 75%, 80%, 85%, 90%, 95%, 98%, 99%, 99.9%, or greater. Low or insignificant levels of mixing of non-homologous nucleic acids may occur, and thus the clonal population may contain a small number of different nucleic acids (e.g., less than 30%, e.g., less than 10%).
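The percent-identity thresholds above can be made concrete with a minimal Python sketch. It assumes the sequences are already aligned and of equal length (the text does not prescribe an alignment method), and the names `percent_identity`, `clonal_fraction`, and `identity_threshold` are hypothetical and used only for illustration.

```python
def percent_identity(seq_a, seq_b):
    """Percent of matching positions between two pre-aligned, equal-length sequences."""
    if not seq_a or len(seq_a) != len(seq_b):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(1 for a, b in zip(seq_a, seq_b) if a == b)
    return 100.0 * matches / len(seq_a)


def clonal_fraction(population, reference, identity_threshold=90.0):
    """Fraction of population members at or above the identity threshold.

    The 90% default is one of the example thresholds listed in the text,
    chosen here arbitrarily for illustration.
    """
    if not population:
        raise ValueError("population must not be empty")
    hits = sum(
        1 for member in population
        if percent_identity(member, reference) >= identity_threshold
    )
    return hits / len(population)


if __name__ == "__main__":
    reference = "ACGTACGTAC"
    population = ["ACGTACGTAC", "ACGTACGTAT", "ACGTTCGTAT", "TTTTTTTTTT"]
    # Two of the four members are at least 90% identical to the reference.
    print(clonal_fraction(population, reference, identity_threshold=90.0))
```

In this toy population, half of the members meet the 90% identity threshold, so the population would not count as clonal under a "greater than about 50%" majority criterion applied at that threshold.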

Useful methods for clonal amplification from a single molecule include rolling circle amplification (RCA) (Lizardi et al., Nat. Genet. 19:225-232 (1998), incorporated herein by reference), bridge PCR (Adams and Kron, Method for Performing Amplification of Nucleic Acid with Two Primers Bound to a Single Solid Support, Mosaic Technologies, Inc. (Winter Hill, Mass.); Whitehead Institute for Biomedical Research, Cambridge, Mass., (1997); Adessi et al., Nucleic Acids Res. 28:E87 (2000); Pemov et al., Nucleic Acids Res. 33:E11 (2005); or U.S. Pat. No. 5,658, each incorporated herein by reference), and other approaches (Nat. Biotechnol. 18:630-634 (2000); Brenner et al., Proc. Natl. Acad. Sci. USA 97:1665-1670 (2000); Reinartz et al., Brief Funct. Genomic 1:95-104 (2002), each of which is incorporated herein by reference).

As used herein, the term "polymerizing enzyme" or "polymerase" generally refers to any enzyme capable of catalyzing a polymerization reaction. Polymerases can be used to extend a nucleic acid primer that is paired with a template strand by incorporating nucleotides or nucleotide analogs. The polymerase can synthesize a new DNA strand by extending the 3' end of an existing nucleotide strand, adding new nucleotides matched to the template strand one at a time through the creation of phosphodiester bonds. The polymerase used herein may have strand displacement activity or non-strand displacement activity. Examples of polymerases include, but are not limited to, nucleic acid polymerases. An exemplary polymerase is Φ29 DNA polymerase or a derivative thereof. The polymerase may be a polymerization enzyme. In some cases, a transcriptase or ligase (i.e., an enzyme that catalyzes the formation of a bond) is used. Examples of polymerases include DNA polymerase, RNA polymerase, thermostable polymerase, wild-type polymerase, modified polymerase, E. coli DNA polymerase I, T7 DNA polymerase, bacteriophage T4 DNA polymerase, Φ29 (phi29) DNA polymerase, Taq polymerase, Tth polymerase, Tli polymerase, Pfu polymerase, Pwo polymerase, VENT polymerase, DEEPVENT polymerase, EX-Taq polymerase, LA-Taq polymerase, Sso polymerase, Poc polymerase, Pab polymerase, Mth polymerase, ES4 polymerase, Tru polymerase, Tac polymerase, Tne polymerase, Tma polymerase, Tca polymerase, Tih polymerase, Tfi polymerase, Platinum Taq polymerase, Tbr polymerase, Tfl polymerase, Pfu-turbo polymerase, Pyrobest polymerase, KOD polymerase, Bst polymerase, Sac polymerase, Klenow fragment, polymerases with 3' to 5' exonuclease activity, and variants, modified products, and derivatives thereof. In some cases, the polymerase is a single subunit polymerase. 
The polymerase may have high processivity, i.e., the ability to continuously incorporate nucleotides into the nucleic acid template without releasing the nucleic acid template. In some cases, the polymerase is a polymerase modified to accept a dideoxynucleotide triphosphate, such as, for example, Taq polymerase having the 667Y mutation (see, e.g., Tabor et al., PNAS, 1995, 92, 6339-). In some cases, the polymerase is a polymerase with modified nucleotide binding that can be used for nucleic acid sequencing; non-limiting examples include Thermo Sequenase polymerase (GE Life Sciences), AmpliTaq FS polymerase (ThermoFisher), and Sequencing Pol polymerase (Jena Bioscience). In some cases, the polymerase is genetically engineered to discriminate against dideoxynucleotides, such as, for example, Sequenase DNA polymerase (ThermoFisher).

The polymerase can be a family A polymerase or a family B DNA polymerase. Family A polymerases include, for example, Taq, Klenow, and Bst polymerases. Family B polymerases include, for example, Vent (exo-) and Therminator polymerases. Family B polymerases are known to accept a wider variety of nucleotide substrates than family A polymerases. Family A polymerases are widely used in sequencing-by-synthesis methods, probably because of their high processivity and fidelity.

As used herein, the term "complementary sequence" generally refers to a sequence that hybridizes to another sequence. Hybridization between two single-stranded nucleic acid molecules may involve the formation of a double-stranded structure that is stable under certain conditions. Two single-stranded polynucleotides are considered to be hybridized if they are bound to each other through two or more sequentially adjacent base pairings. A significant portion of the nucleotides in one strand of the double-stranded structure can undergo Watson-Crick base pairing with the nucleosides on the other strand. Hybridization can also include pairing of nucleoside analogs, such as deoxyinosine, nucleosides with 2-aminopurine bases, and the like, which can be used to reduce the degeneracy of a probe, whether or not such pairing involves the formation of hydrogen bonds.

As used herein, the term "denaturing" generally refers to the separation of double-stranded molecules (e.g., DNA) into single-stranded molecules. Denaturation may be complete or partial. In partial denaturation, a single-stranded region can be formed in a double-stranded molecule by denaturation of two deoxyribonucleic acid (DNA) strands on both sides of the double-stranded region in the DNA.

As used herein, the term "melting temperature" or "melting point" generally refers to the temperature at which at least a portion of a nucleic acid molecule strand is separated from at least a portion of a complementary strand in a sample. The melting temperature may be the temperature at which the double-stranded nucleic acid molecule is partially or completely denatured. The melting temperature may refer to the temperature for one sequence among a plurality of sequences of a given nucleic acid molecule, or for the plurality of sequences together. Different regions of a double-stranded nucleic acid molecule can have different melting temperatures. For example, a double-stranded nucleic acid molecule can include a first region having a first melting point and a second region having a second melting point higher than the first melting point. Thus, different regions of a double-stranded nucleic acid molecule can melt (e.g., partially denature) at different temperatures. The melting point of a nucleic acid molecule or region thereof (e.g., a nucleic acid sequence) can be determined experimentally (e.g., by melting analysis or other procedures) or can be estimated based on the sequence and length of the nucleic acid molecule. For example, a software program such as MELTING may be used to estimate the melting temperature of a nucleic acid sequence (Dumousseau M, Rodriguez N, Juty N, Le Novère N, MELTING, a flexible platform to predict the melting temperatures of nucleic acids. BMC Bioinformatics. 2012 May 16;13:101. doi: 10.1186/1471-2105-13-101). Thus, the melting point as described herein may be an estimated melting point. The true melting point of a nucleic acid sequence may vary based on the sequences adjacent to the nucleic acid sequence of interest, or the lack thereof, among other factors.
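As a rough illustration of estimating a melting point from sequence composition, the following Python sketch applies the Wallace rule, Tm ≈ 2 °C × (A+T) + 4 °C × (G+C), a rule of thumb generally applied to short oligonucleotides. This is an assumption made for illustration: it is not the method implemented by the MELTING program cited above, and the function name `wallace_tm` is hypothetical.

```python
def wallace_tm(sequence):
    """Estimate melting temperature (degrees C) of a short DNA oligonucleotide.

    Uses the Wallace rule: 2 degrees per A/T base and 4 degrees per G/C base.
    This is a coarse approximation and ignores salt, concentration, and
    nearest-neighbor effects.
    """
    seq = sequence.upper()
    at_count = seq.count("A") + seq.count("T")
    gc_count = seq.count("G") + seq.count("C")
    if at_count + gc_count != len(seq):
        raise ValueError("sequence must contain only A, C, G, and T")
    return 2 * at_count + 4 * gc_count


if __name__ == "__main__":
    # Two regions of equal length: the GC-rich region has the higher estimate,
    # mirroring the first-region / second-region example in the text.
    at_rich_region = "ATATATTAAT"
    gc_rich_region = "GCGCGGCCGC"
    print(wallace_tm(at_rich_region), wallace_tm(gc_rich_region))
```

The sketch shows why different regions of one double-stranded molecule may melt at different temperatures: under this approximation, a GC-rich region of a given length yields a higher estimate than an AT-rich region of the same length.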

As used herein, the term "sequencing" generally refers to the process of generating or identifying a sequence of a biological molecule, such as a nucleic acid molecule or polypeptide. Such sequences may be nucleic acid sequences, which may include sequences of nucleic acid bases (e.g., nucleobases). Sequencing may be, for example, single molecule sequencing, sequencing by synthesis, sequencing by hybridization, or sequencing by ligation. Sequencing can be performed using template nucleic acid molecules immobilized on a support (e.g., a flow cell or one or more beads). A sequencing assay can generate one or more sequencing reads corresponding to one or more template nucleic acid molecules.

As used herein, the term "read" generally refers to a nucleic acid sequence, e.g., a sequencing read. The sequencing reads can be deduced sequences of nucleic acid bases (e.g., nucleotides) or base pairs obtained by a nucleic acid sequencing assay. Sequencing reads can be generated by a nucleic acid sequencer, such as a massively parallel array sequencer (e.g., Illumina or Pacific Biosciences of California). The sequencing reads may correspond to a portion of the subject's genome, or in some cases all. The sequencing reads can be part of a set of sequencing reads that can be combined, for example, by alignment (e.g., with a reference genome) to generate a subject's genomic sequence.

As used herein, the term "detector" generally refers to a device capable of detecting or measuring a signal, e.g., a signal indicative of the presence or absence of an incorporated nucleotide or nucleotide analog. The detector may comprise optical and/or electronic components that can detect and/or measure signals. Non-limiting examples of detection methods involving detectors include optical detection, spectroscopic detection, electrostatic detection, and electrochemical detection. Optical detection methods include, but are not limited to, fluorimetry, light absorption, ultraviolet-visible (UV-vis) light absorption, infrared light absorption, light scattering, Rayleigh scattering, Raman scattering, surface-enhanced Raman scattering, Mie scattering, fluorescence, luminescence, and phosphorescence. Spectroscopic detection methods include, but are not limited to, mass spectrometry, nuclear magnetic resonance (NMR) spectroscopy, and infrared spectroscopy. Electrostatic detection methods include, but are not limited to, gel-based techniques such as gel electrophoresis. Electrochemical detection methods include, but are not limited to, electrochemical detection of amplification products after high performance liquid chromatography separation of the amplification products.

As used herein, the term "support" generally refers to any solid or semi-solid preparation on which an agent, such as a nucleic acid molecule, can be immobilized. Nucleic acid molecules may be synthesized, attached, linked, or otherwise immobilized. The nucleic acid molecule may be immobilized on the support by any method, including but not limited to physical adsorption, formation by ionic or covalent bonds, or a combination thereof. The support may be two-dimensional (e.g., a planar 2D support) or 3-dimensional. In some cases, the support may be a component of a flow cell and/or may be included within or adapted to be received by a sequencing instrument. The support may comprise a polymer, glass or metal material. Examples of supports include membranes, planar supports, microtiter plates, beads (e.g., magnetic beads), filters, test strips, slides, coverslips, and test tubes. The support may include organic polymers such as polystyrene, polyethylene, polypropylene, polyvinyl fluoride, polyoxyethylene, and polyacrylamide (e.g., polyacrylamide gel), as well as copolymers and grafts thereof. The support may comprise latex or dextran. The support may also be inorganic, such as glass, silica, gold, Controlled Pore Glass (CPG) or reversed phase silica. The support may be configured, for example, in the form of beads, spheres, particles, granules, gels, porous matrices, or supports. In some cases, a support can be a single solid or semi-solid product (e.g., a single particle), while in other cases, a support can comprise a plurality of solid or semi-solid products (e.g., a collection of particles). The support may be planar, substantially planar or non-planar. The support may be porous or non-porous and may have swelling or non-swelling properties. The support may be shaped to include one or more apertures, recesses, or other receptacles, vessels, features, or locations. The plurality of supports may be arranged in an array at different locations. 
The support may be addressable (e.g., for robotic delivery of reagents) or scannable by detection methods, for example by laser illumination and confocal or deflective light collection. For example, the support may be in optical and/or physical communication with the detector. Alternatively, the support may be physically separated from the detector by a distance. An amplification support (e.g., a bead) can be placed in or on another support (e.g., in the wells of a second support).

As used herein, the term "label" generally refers to a moiety capable of coupling to a species such as, for example, a nucleotide analog. The label may comprise an affinity moiety. In some cases, the label may be a detectable label that emits a signal that can be detected (or reduces an emitted signal). In some cases, such a signal may indicate the incorporation of one or more nucleotides or nucleotide analogs. In some cases, the label may be coupled to a nucleotide or nucleotide analog that can be used in a primer extension reaction. In some cases, the label may be coupled to the nucleotide analog after the primer extension reaction. In some cases, the label may be specifically reactive with the nucleotide or nucleotide analog. The coupling may be covalent or non-covalent (e.g., by ionic interactions, van der Waals forces, etc.). In some cases, the coupling may be through a linker, which may be cleavable, e.g., photocleavable (e.g., cleavable under ultraviolet light), chemically cleavable (e.g., by a reducing agent such as dithiothreitol (DTT), tris(2-carboxyethyl)phosphine (TCEP), or tris(hydroxypropyl)phosphine (THP)), or enzymatically cleavable (e.g., by an esterase, lipase, peptidase, or protease). Non-limiting examples of dyes that may serve as labels include SYBR blue, DAPI, propidium iodide, Hoechst, SYBR gold, ethidium bromide, acridine, proflavine, acridine orange, acriflavine, fluorescent coumarin (fluorocoumanin), ellipticine, daunomycin, chloroquine, distamycin D, chromomycin, homidium, mithramycin, ruthenium polypyridyls, anthramycin, phenanthridines and acridines, propidium iodide, hexidium iodide, dihydroethidium, ethidium homodimer-1 and ethidium homodimer-2, ethidium monoazide and ACMA, Hoechst 33258, Hoechst 33342, Hoechst 34580, DAPI, acridine orange, 7-AAD, actinomycin D, LDS751, hydroxystilbamidine, SYTOX Blue, SYTOX Green, SYTOX Orange, POPO-1, POPO-3, YOYO-1, YOYO-3, TOTO-1, TOTO-3, JOJO-1, LOLO-1, BOBO-3, PO-PRO-1, PO-PRO-3, BO-PRO-1, 
BO-PRO-3, TO-PRO-1, TO-PRO-3, TO-PRO-5, JO-PRO-1, LO-PRO-1, YO-PRO-3, PicoGreen, OliGreen, RiboGreen, SYBR Gold, SYBR Green I, SYBR Green II, SYBR DX, SYTO labels (e.g., SYTO-40, SYTO-41, SYTO-42, SYTO-43, SYTO-44, and SYTO-45 (blue), SYTO-13, SYTO-16, SYTO-24, SYTO-21, SYTO-23, SYTO-12, SYTO-11, SYTO-20, SYTO-22, SYTO-15, SYTO-14, and SYTO-25 (green), SYTO-81, SYTO-80, SYTO-82, SYTO-83, SYTO-84, and SYTO-85 (orange), SYTO-64, SYTO-17, SYTO-59, SYTO-61, SYTO-62, SYTO-60, and SYTO-63 (red)), fluorescein isothiocyanate (FITC), tetramethylrhodamine isothiocyanate (TRITC), rhodamine, tetramethylrhodamine, R-phycoerythrin, Cy-2, Cy-3, Cy-3.5, Cy-5, Cy5.5, Cy-7, Texas Red, Phar-Red, allophycocyanin (APC), Sybr Green I, Sybr Green II, Sybr Gold, CellTracker Green, 7-AAD, ethidium homodimer I, ethidium homodimer II, ethidium homodimer III, ethidium bromide, umbelliferone, eosin, green fluorescent protein, erythrosine, coumarin, methylcoumarin, pyrene, malachite green, stilbene, lucifer yellow, cascade blue, dichlorotriazinylamine fluorescein, dansyl chloride, fluorescent lanthanide complexes (such as those containing europium and terbium), carboxytetrachlorofluorescein, 5-carboxyfluorescein and/or 6-carboxyfluorescein (FAM), VIC, 5-iodoacetamidofluorescein or 6-iodoacetamidofluorescein, 5-{[2-5-(acetylmercapto)-succinyl]amino}fluorescein and 5-{[3-5-(acetylmercapto)-succinyl]amino}fluorescein (SAMSA-fluorescein), lissamine rhodamine B sulfonyl chloride, 5-carboxyrhodamine and/or 6-carboxyrhodamine (ROX), 7-amino-methyl-coumarin, 7-amino-4-methylcoumarin-3-acetic acid (AMCA), BODIPY fluorophores, 8-methoxypyrene-1,3,6-trisulfonate trisodium salt, 3,6-disulfonate-4-amino-naphthalimide, phycobiliproteins, AlexaFluor labels (e.g., AlexaFluor 750 and AlexaFluor 790 dyes, DyLight 350, DyLight 405, DyLight 488, DyLight 550, DyLight 594, DyLight 633, 
DyLight 650, DyLight 680, DyLight 755, and DyLight 800 dyes), or other fluorophores; Black Hole Quencher dyes (Biosearch Technologies), such as BH1-0, BHQ-1, BHQ-3, and BHQ-10; QSY dye fluorescence quenchers (Molecular Probes/Invitrogen), such as QSY7, QSY9, QSY21, and QSY35; other quenchers such as Dabcyl and Dabsyl; Cy5Q and Cy7Q and dark cyanine dyes (GE Healthcare); Dy-quenchers (Dyomics), such as DYQ-660 and DYQ-661; ATTO fluorescence quenchers (ATTO-TEC GmbH), such as ATTO 540Q, 580Q, and 612Q; ATTO dyes such as Atto532 (e.g., Atto532 succinimidyl ester) and Atto633; as well as other fluorophores and/or quenchers. Additional examples are included in the structures provided herein. Dyes included in the structures provided herein are contemplated for use in combination with any of the linkers and substrates described herein. The fluorescent dye may be excited by applying energy corresponding to the visible region of the electromagnetic spectrum (e.g., about 430-770 nanometers (nm)). Excitation may be performed using any useful device, such as a laser and/or a light emitting diode. Optical elements including, but not limited to, mirrors, wave plates, filters, monochromators, gratings, beam splitters, and lenses may be used to direct light to or from the fluorescent dye. Fluorescent dyes can emit light (e.g., fluorescence) in the visible region of the electromagnetic spectrum (e.g., about 430-770 nm). The fluorescent dye may be excited at a single wavelength or over a range of wavelengths. The fluorescent dye may be excited by light in the red region (about 625-740 nm) of the visible portion of the electromagnetic spectrum (e.g., may have an excitation maximum in the red region). 
Alternatively or additionally, the fluorescent dye may be excited (e.g., have an excitation maximum in the green region of the visible portion of the electromagnetic spectrum) by light in the green region (about 500-565nm) of the visible portion of the electromagnetic spectrum. The fluorescent dye may emit a signal in the red region of the visible portion of the electromagnetic spectrum (about 625-740nm) (e.g., having an emission maximum in the red region of the visible portion of the electromagnetic spectrum). Alternatively or additionally, the fluorescent dye may emit a signal in the green region of the visible portion of the electromagnetic spectrum (about 500-565nm) (e.g., having an emission maximum in the green region of the visible portion of the electromagnetic spectrum).
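The wavelength ranges recited above (visible: about 430-770 nm; green: about 500-565 nm; red: about 625-740 nm) can be captured in a small lookup, sketched below in Python. The function name `classify_wavelength` and the example wavelengths are hypothetical and are used only to illustrate how an excitation or emission maximum might be assigned to a region; they are not measured values for any particular dye.

```python
# Approximate region boundaries in nanometers, taken from the ranges in the text.
VISIBLE_RANGE_NM = (430, 770)
REGIONS_NM = {
    "green": (500, 565),
    "red": (625, 740),
}


def classify_wavelength(wavelength_nm):
    """Assign a wavelength (nm) to the green or red region, if applicable."""
    for name, (low, high) in REGIONS_NM.items():
        if low <= wavelength_nm <= high:
            return name
    low, high = VISIBLE_RANGE_NM
    if low <= wavelength_nm <= high:
        return "visible (other)"
    return "outside visible range"


if __name__ == "__main__":
    # Hypothetical excitation maxima used purely as inputs.
    for wavelength in (532, 633, 590, 800):
        print(wavelength, classify_wavelength(wavelength))
```

Treating the boundaries as inclusive is a design choice of this sketch; the text gives the ranges only approximately ("about"), so a practical implementation might allow a tolerance around each boundary.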

The label may be a quencher molecule. As used herein, the term "quencher" generally refers to a molecule that can be an energy acceptor. A quencher may be a molecule that can reduce the emitted signal. For example, the template nucleic acid molecule can be designed to emit a detectable signal. Incorporation of a nucleotide or nucleotide analog comprising a quencher can reduce or eliminate the signal, which reduction or elimination is then detected. Luminescence from a label (e.g., a fluorescent moiety, such as a fluorescent moiety linked to a nucleotide or nucleotide analog) can also be quenched (e.g., by incorporation of other nucleotides that may or may not include a label). In some cases, labeling with a quencher can occur after incorporation of a nucleotide or nucleotide analog (e.g., after incorporation of a nucleotide or nucleotide analog comprising a fluorescent moiety), as described elsewhere herein. In some cases, the indicia may be indicia with a linker. For example, a label may have a disulfide linker attached to the label. Non-limiting examples of such labels include Cy 5-azide, Cy-2-azide, Cy-3-azide, Cy-3.5-azide, Cy5.5-azide, and Cy-7-azide. In some cases, the linker may be a cleavable linker. In some cases, the label may be of a type that is not self-quenching and does not exhibit proximity quenching. Non-limiting examples of label types that are not self-quenching and do not exhibit proximity quenching include bis-tetralin (Bimane) derivatives, such as bromo-bis-tetralin. As used herein, the term "proximity quenching" generally refers to a phenomenon in which one or more dyes that are in close proximity to each other may exhibit lower fluorescence than the fluorescence they exhibit alone. In some cases, the dye can undergo proximity quenching, where the donor dye and the acceptor dye are within 1nm to 50nm of each other. 
Examples of quenchers include, but are not limited to, Black Hole Quencher dyes (Biosearch Technologies) (e.g., BHQ-0, BHQ-1, BHQ-3, and BHQ-10), QSY dye fluorescence quenchers (Molecular Probes/Invitrogen) (e.g., QSY7, QSY9, QSY21, and QSY35), Dabcyl, Dabsyl, Cy5Q, Cy7Q, dark cyanine dyes (GE Healthcare), Dy-quenchers (Dyomics) (e.g., DYQ-660 and DYQ-661), and ATTO fluorescence quenchers (ATTO-TEC GmbH) (e.g., ATTO 540Q, ATTO 580Q, and ATTO 612Q). Fluorophore donor molecules can be used in combination with quenchers. Examples of fluorophore donor molecules that can be used in conjunction with a quencher include, but are not limited to, fluorophores such as Cy3B, Cy3, or Cy5; Dy-quenchers (Dyomics) (e.g., DYQ-660 and DYQ-661); and ATTO fluorescence quenchers (ATTO-TEC GmbH) (e.g., ATTO 540Q, 580Q, and 612Q).

As used herein, the term "labeling fraction" generally refers to the ratio of dye-labeled nucleotides or nucleotide analogs to natural/unlabeled nucleotides or nucleotide analogs of a single canonical type in a flow solution. The labeling fraction may be expressed as the concentration of labeled nucleotides or nucleotide analogs divided by the sum of the concentrations of labeled and unlabeled nucleotides or nucleotide analogs. The labeling fraction may be expressed as a percentage of labeled nucleotides contained in the solution (e.g., nucleotide stream). The labeling fraction may be at least about 0.5%, 1%, 2%, 3%, 4%, 5%, 10%, 15%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90% or higher. For example, the labeling fraction may be at least about 20%. The labeling fraction may be about 100%. The labeling fraction may also be expressed as the ratio of labeled nucleotides to unlabeled nucleotides contained in the solution. For example, the ratio of labeled nucleotides to unlabeled nucleotides can be at least about 1:5, 1:4, 1:3, 1:2, 1:1, 2:1, 3:1, 4:1, 5:1, or higher. For example, the ratio of labeled nucleotides to unlabeled nucleotides can be at least about 1:4. For example, the ratio of labeled nucleotides to unlabeled nucleotides can be at least about 1:1. For example, the ratio of labeled nucleotides to unlabeled nucleotides can be at least about 5:1.
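The two ways of expressing the labeling fraction described above (as a fraction of the total, or as a labeled:unlabeled ratio) can be interconverted. The following is a minimal sketch; the function names are illustrative assumptions and do not appear in the disclosure.

```python
def labeling_fraction(labeled_conc, unlabeled_conc):
    """Labeled concentration divided by the sum of labeled and unlabeled concentrations."""
    total = labeled_conc + unlabeled_conc
    if total <= 0:
        raise ValueError("total nucleotide concentration must be positive")
    return labeled_conc / total


def fraction_to_ratio(fraction):
    """Convert a labeling fraction (0 <= fraction < 1) to a labeled:unlabeled ratio."""
    if not 0 <= fraction < 1:
        raise ValueError("fraction must be in [0, 1); a fraction of 1 has no finite ratio")
    return fraction / (1 - fraction)


def ratio_to_fraction(ratio):
    """Convert a labeled:unlabeled ratio (>= 0) to a labeling fraction."""
    if ratio < 0:
        raise ValueError("ratio must be non-negative")
    return ratio / (1 + ratio)


# A 1:4 labeled:unlabeled ratio corresponds to a labeling fraction of 20%.
print(ratio_to_fraction(1 / 4))
```

Under this convention, a ratio of 1:1 corresponds to a labeling fraction of 50%, and a ratio of 5:1 to a labeling fraction of about 83%.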

As used herein, the term "labeled fraction" generally refers to the actual fraction of labeled nucleic acid (e.g., DNA) produced upon treatment of a primer template with a mixture of dye-labeled nucleotides or nucleotide analogs and natural nucleotides or nucleotide analogs. The labeled fraction may be approximately the same as the labeling fraction. For example, if 20% of the nucleotides in a nucleotide stream are labeled, about 20% of the nucleotides incorporated into a growing nucleic acid strand (e.g., during nucleic acid sequencing) can be labeled. Alternatively, the labeled fraction may be greater than the labeling fraction. For example, if 20% of the nucleotides in a nucleotide stream are labeled, more than 20% of the nucleotides incorporated into a growing nucleic acid strand (e.g., during nucleic acid sequencing) can be labeled. Alternatively, the labeled fraction may be less than the labeling fraction. For example, if 20% of the nucleotides in a nucleotide stream are labeled, less than 20% of the nucleotides incorporated into a growing nucleic acid strand (e.g., during nucleic acid sequencing) can be labeled.

When a solution containing less than 100% labeled nucleotides or nucleotide analogs is used in an incorporation process, such as a sequencing process (e.g., as described herein), labeled ("bright") and unlabeled ("dark") nucleotides or nucleotide analogs can be incorporated into a growing nucleic acid strand. As used herein, the term "tolerance" generally refers to the ratio of the labeled fraction (e.g., the "bright" incorporation fraction) to the labeling fraction (e.g., the "bright" fraction in solution). For example, if using a labeling fraction of 0.2 results in a labeled fraction of 0.4, the tolerance is 2. Similarly, if an incorporation process such as a sequencing process is performed with a labeling fraction of 2.5% in solution ($b_f$, the bright solution fraction) and 5% of the incorporated nucleotides are labeled ($b_i$, the bright incorporation fraction), the tolerance may be 2. For low labeling fractions (e.g., labeling fractions of 10% or less), this model may be linear. For higher labeling fractions, the tolerance may take into account competitive dark incorporation. In that case, tolerance may refer to the ratio of the bright incorporation fraction to the dark incorporation fraction ($b_i/d_i$) compared with the ratio of the bright solution fraction to the dark solution fraction ($b_f/d_f$):

$$\text{Tolerance} = \frac{b_i/d_i}{b_f/d_f}$$

where $d_i = 1 - b_i$ (for example, assuming a 100% bright fraction normalized to 1, the dark incorporation fraction and the bright incorporation fraction sum to 1).

Although $d_i$ is not easily measured, the bright incorporation fraction $b_i$ can be measured (e.g., as described herein), and the tolerance can be determined by fitting a curve of the bright solution fraction ($b_f$) versus the bright incorporation fraction ($b_i$). Substituting $d_i = 1 - b_i$ and, analogously, $d_f = 1 - b_f$ into the definition above and solving for $b_i$ gives one form such a curve can take:

$$b_i = \frac{\text{Tolerance} \cdot b_f}{1 + (\text{Tolerance} - 1)\, b_f}$$

A "positive" tolerance (>1) means that at a labeling fraction of 50%, more than 50% of the incorporated nucleotides are labeled. A "negative" tolerance (<1) means that at a labeling fraction of 50%, less than 50% of the incorporated nucleotides are labeled.
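The tolerance definitions above can be illustrated numerically. The sketch below assumes that the dark fractions are the complements of the bright fractions (d = 1 - b) in both the solution and the incorporated product, and it uses a simple least-squares fit through the origin on the bright/dark odds as one possible fitting approach. The function names and the fitting choice are assumptions for illustration, not the procedure of the disclosure.

```python
def linear_tolerance(b_f, b_i):
    """Low-labeling-fraction approximation: labeled fraction / labeling fraction."""
    return b_i / b_f


def tolerance(b_f, b_i):
    """Tolerance = (b_i / d_i) / (b_f / d_f), assuming d_i = 1 - b_i and d_f = 1 - b_f."""
    if not (0 < b_f < 1 and 0 < b_i < 1):
        raise ValueError("fractions must lie strictly between 0 and 1")
    return (b_i / (1 - b_i)) / (b_f / (1 - b_f))


def predicted_bright_incorporation(b_f, tol):
    """Bright incorporation fraction implied by a given tolerance and solution fraction."""
    return tol * b_f / (1 + (tol - 1) * b_f)


def fit_tolerance(points):
    """Fit tolerance from (b_f, b_i) pairs by least squares through the origin
    on the odds: b_i / (1 - b_i) = tolerance * b_f / (1 - b_f)."""
    odds = [(bf / (1 - bf), bi / (1 - bi)) for bf, bi in points]
    return sum(x * y for x, y in odds) / sum(x * x for x, _ in odds)


# With a tolerance of 2, a 50% labeling fraction yields more than 50% labeled product.
print(predicted_bright_incorporation(0.5, 2.0))
```

Note that the simple ratio and the odds-based definition agree closely only at low labeling fractions. For a labeling fraction of 0.2 and a labeled fraction of 0.4, the simple ratio is 2, while the odds-based value is about 2.67.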

As used herein, the term "context" generally refers to the sequence or context of adjacent nucleotides that has been observed to affect the tolerance in the incorporation reaction. The nature of the enzyme, pH and other factors may also affect tolerances. Minimizing the contextual impact greatly simplifies the basic assay.

As used herein, the term "scar" generally refers to a residue left on a previously labeled nucleotide or nucleotide analog after cleavage of an optical (e.g., fluorescent) dye and, optionally, all or part of the linker attaching the optical dye to the nucleotide or nucleotide analog. Examples of scars include, but are not limited to, hydroxyl moieties (e.g., resulting from cleavage of an azidomethyl group, a hydrocarbyl dithiomethyl linkage, or a 2-nitrobenzyloxy linkage), thiol moieties (e.g., resulting from cleavage of a disulfide linkage), and benzyl moieties. For example, the scar may contain aromatic groups such as phenyl or benzyl. The size and nature of the scar may affect subsequent incorporation.

As used herein, the term "misincorporation" generally refers to the situation that occurs when a DNA polymerase incorporates a labeled or unlabeled nucleotide that is not the correct Watson-Crick partner of the template base. In methods that lack competition among all four bases in the incorporation event, misincorporation may occur more frequently and result in strand loss, limiting the read length of the sequencing method.

As used herein, the term "mismatch extension" generally refers to the situation that occurs when a DNA polymerase incorporates a labeled or unlabeled nucleotide that is not the correct Watson-Crick partner of the template base, followed by subsequent incorporation of the correct Watson-Crick partner of the following base. Mismatch extension usually leads to lead phasing and limits the read length of the sequencing method.

With respect to quenching, dye-dye quenching between two dye moieties attached to different nucleotides (e.g., adjacent nucleotides in a growing nucleic acid strand, or nucleotides in a nucleic acid strand separated by one or more other nucleotides) can be strongly dependent on the distance between the two dye moieties. The distance between two dye moieties may depend, at least in part, on the nature of the linker that connects the two dye moieties to the respective nucleotides or nucleotide analogs, including linker composition and functional length. The characteristics of the linker, including composition and functional length, may be affected by temperature, solvent, pH, and salt concentration (e.g., in solution). Quenching may also vary based on the nature of the dye used. Quenching can also occur between the dye moiety and the nucleobase moiety (e.g., between the fluorescent dye and the nucleobase of the nucleotide associated therewith). Controlling the quenching phenomenon may be a key feature of the methods described herein.

With respect to streams, a nucleotide stream can be composed of a mixture of labeled and unlabeled nucleotides or nucleotide analogs (e.g., a single canonical type of nucleotide or nucleotide analog). For example, a solution comprising a plurality of optically (e.g., fluorescently) labeled nucleotides and a plurality of unlabeled nucleotides can be contacted with, for example, a sequencing template (as described herein). The plurality of optically labeled nucleotides and the plurality of unlabeled nucleotides can each comprise the same canonical nucleotide or nucleotide analog. The stream may comprise only labeled nucleotides or nucleotide analogs. Alternatively, the stream may comprise only unlabeled nucleotides or nucleotide analogs. The stream may comprise a mixture of different types (e.g., A and G) of nucleotides or nucleotide analogs.

A wash stream (e.g., a solution comprising a buffer) can be used to remove any nucleotides not incorporated into the nucleic acid complex (e.g., sequencing template, as described herein). A cleavage stream (e.g., a solution comprising a cleavage reagent) can be used to remove a dye moiety (e.g., a fluorescent dye moiety) from an optically (e.g., fluorescently) labeled nucleotide or nucleotide analog. In some cases, different cleaving agents may be used to remove different dyes (e.g., fluorescent dyes). In other cases, the same cleavage reagent may be used to remove a different dye (e.g., a fluorescent dye). Cleaving the dye moiety from the optically labeled nucleotide or nucleotide analog may comprise cleaving all or part of a linker connecting the nucleotide or nucleotide analog to the dye moiety.

As used herein, the term "cycle" generally refers to a process in which a nucleotide stream, a wash stream, and a cleavage stream (e.g., provided to a sequencing template, as described herein) corresponding to each canonical nucleotide (e.g., dATP, dCTP, dGTP, and dTTP or dUTP, or modified forms thereof) are used. Multiple cycles can be used to sequence and/or amplify nucleic acid molecules. The order of the nucleotide streams may vary.
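The structure of a cycle as described above can be sketched as a simple schedule: for each canonical nucleotide, a nucleotide stream is followed by a wash stream and a cleavage stream. The default nucleotide order and all names in the sketch are assumptions chosen for illustration; the disclosure states that the order may vary.

```python
# Hypothetical default order; any permutation of the canonical nucleotides may be used.
CANONICAL_ORDER = ("dATP", "dCTP", "dGTP", "dTTP")


def one_cycle(order=CANONICAL_ORDER):
    """Return the ordered steps of one cycle as (stream type, nucleotide) tuples."""
    steps = []
    for nucleotide in order:
        steps.append(("nucleotide stream", nucleotide))  # labeled/unlabeled mixture of one type
        steps.append(("wash stream", None))              # remove unincorporated nucleotides
        steps.append(("cleavage stream", None))          # remove dye moieties
    return steps


def schedule(n_cycles, order=CANONICAL_ORDER):
    """Return (cycle number, stream type, nucleotide) tuples for n_cycles cycles."""
    return [
        (cycle, kind, nucleotide)
        for cycle in range(1, n_cycles + 1)
        for kind, nucleotide in one_cycle(order)
    ]


for step in schedule(1):
    print(step)
```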

Phasing may be lead phasing or lag phasing. Lead phasing generally refers to a phenomenon in which a population of strands exhibits nucleotide incorporation in a flow preceding the expected cycle (e.g., due to contamination in the system). Lag phasing generally refers to a phenomenon in which a population of strands exhibits nucleotide incorporation in a flow after the expected cycle (e.g., due to incomplete extension in earlier cycles).

The compounds and chemical moieties described herein, including linkers, may contain one or more asymmetric centers, thus giving rise to enantiomers, diastereomers, and other stereoisomeric forms, which are defined as (R) or (S) in terms of absolute stereochemistry, and (D)- or (L)- in terms of relative stereochemistry. The D/L system relates molecules to the chiral molecule glyceraldehyde and is commonly used to describe biomolecules including amino acids. Unless otherwise indicated, the present disclosure is intended to encompass all stereoisomeric forms of the compounds disclosed herein. When the compounds described herein contain olefinic double bonds, the disclosure is intended to include both E and Z geometric isomers (e.g., cis or trans), unless otherwise indicated. Likewise, all possible isomers, as well as their racemic and optically pure forms, and all tautomeric forms are intended to be included. The term "geometric isomers" refers to the E and Z geometric isomers (e.g., cis or trans) of an olefinic double bond. The term "positional isomers" refers to structural isomers around a central ring, such as ortho, meta, and para isomers around a benzene ring. The separation of stereoisomers may be performed by chromatography or by forming diastereomers and separating by recrystallization or chromatography or any combination thereof (Jean Jacques, Andre Collet, Samuel H. Wilen, "Enantiomers, Racemates and Resolutions", John Wiley and Sons, Inc., 1981, incorporated herein by reference for the purposes of this disclosure). Stereoisomers may also be obtained by stereoselective synthesis.

The compounds and chemical moieties described herein, including linkers, may exist as tautomers. "Tautomer" refers to a molecule in which a proton may be transferred from one atom of the molecule to another atom of the same molecule. In cases where tautomerization is possible, a chemical equilibrium of the tautomers will exist. Unless otherwise specified, chemical structures described herein are intended to include the different tautomers of those structures. For example, a chemical structure depicted with an enol moiety also includes the keto tautomeric form of the enol moiety. The exact ratio of tautomers depends on several factors including physical state, temperature, solvent and pH. Some examples of tautomeric equilibrium include:

The compounds and chemical moieties described herein, including linkers and dyes, may be provided in different isotopically enriched forms. For example, a compound may be enriched in 2H, 3H, 11C, 13C, and/or 14C content. For example, a linker, substrate (e.g., a nucleotide or nucleotide analog), or dye can be deuterated at at least one position. In some examples, the linker, substrate (e.g., nucleotide or nucleotide analog), or dye may be fully deuterated. Such deuterated forms can be prepared by the procedures described in U.S. Pat. Nos. 5,846,514 and 6,334,997, the entire contents of each of which are incorporated herein by reference. As described in U.S. Pat. Nos. 5,846,514 and 6,334,997, deuteration can improve metabolic stability and/or efficacy, thereby increasing the duration of action of a drug.

Unless otherwise indicated, structures depicted and described herein are intended to include compounds that differ only in the presence of one or more isotopically enriched atoms. For example, compounds and chemical moieties having the present structures, except for the replacement of a hydrogen by deuterium or tritium or the replacement of a carbon by a 13C- or 14C-enriched carbon, are within the scope of the disclosure.

The compounds and chemical moieties of the present disclosure may contain unnatural proportions of atomic isotopes at one or more of the atoms that constitute such compounds. For example, a compound or chemical moiety such as a linker, substrate (e.g., a nucleotide or nucleotide analog), or dye, or a combination thereof, may be labeled with one or more isotopes such as deuterium (2H), tritium (3H), iodine-125 (125I), or carbon-14 (14C). Isotopic substitutions with 2H, 11C, 13C, 14C, 15C, 12N, 13N, 15N, 16N, 16O, 17O, 14F, 15F, 16F, 17F, 18F, 33S, 34S, 35S, 36S, 35Cl, 37Cl, 79Br, 81Br, and 125I are contemplated. All isotopic variations of the compounds and chemical moieties described herein, whether radioactive or not, are intended to be encompassed within the scope of the present disclosure.

Linkers for optical detection

The present disclosure provides an optically (e.g., fluorescently) labeled reagent comprising a dye (e.g., a fluorescent dye) and a linker attached to the dye and capable of associating with a substrate to be optically (e.g., fluorescently) labeled. The substrate may be any suitable molecule, analyte, cell, tissue or surface to be optically labeled. Examples include cells, including eukaryotic cells, prokaryotic cells, healthy cells, and diseased cells; a cellular receptor; an antibody; a protein; a lipid; a metabolite; a probe; a reagent; nucleotides and nucleotide analogs; and nucleic acid molecules. The association between the linker and the substrate can be any suitable association, including covalent or non-covalent, e.g., an association between a purine-containing nucleotide and a pyrimidine-containing nucleotide in a nucleic acid molecule. In some cases, such association may be a biotin-avidin interaction. In other cases, the association between the linker and the substrate may be through a propargylamino moiety. In some cases, the association between the linker and the substrate can be through an amide bond (e.g., a peptide bond).

The linker may be semi-rigid. The semi-rigid nature of the linker can be most easily achieved by using a structure comprising a series of ring systems (e.g., aliphatic and aromatic rings). As used herein, a ring (e.g., a ring structure) is a cyclic moiety comprising any number of atoms connected in a closed, substantially cyclic manner, as the term is used in the field of organic chemistry. A ring may be defined by any number of atoms. For example, the ring may include 3 to 12 atoms, such as 3 to 12 carbon atoms. In certain examples, the rings can be five-membered rings (i.e., pentagons) or six-membered rings (i.e., hexagons). The rings may be aromatic or non-aromatic. The rings may be aliphatic. The ring may contain one or more double bonds.

A ring (e.g., a ring structure) can be a component of a ring system that can include one or more ring structures (e.g., a polycyclic system). For example, the ring system may comprise a single ring. In another example, the ring system may be a bicyclic ring or a bridged system. The ring structure may be a carbocyclic ring formed by carbon atoms or components thereof. Carbocycles may be saturated, unsaturated, or aromatic rings in which each atom of the ring is carbon. Carbocycles include 3-to 10-membered monocyclic rings, 4-to 12-membered bicyclic rings (e.g., 6-to 12-membered bicyclic rings), and 5-to 12-membered bridged rings. Each ring of the bicyclic carbocycle may be selected from saturated, unsaturated and aromatic rings. For example, a bicyclic carbocycle may include an aromatic ring (e.g., phenyl) fused to a saturated or unsaturated ring (e.g., cyclohexane, cyclopentane, or cyclohexene). Bicyclic carbocycles may include any combination of saturated, unsaturated, and aromatic bicyclic rings, as the valence permits. Bicyclic carbocycles can include any combination of ring sizes, such as 4-5 fused ring systems, 5-6 fused ring systems, and 6-6 fused ring systems. A carbocycle may be, for example, cyclopropyl, cyclobutyl, cyclopentyl, cyclohexyl, cyclohexenyl, adamantyl, phenyl, indanyl or naphthyl. Saturated carbocycles do not include multiple bonds (e.g., double or triple bonds). The saturated carbocyclic ring may be, for example, cyclopropane, cyclobutane, cyclopentane, or cyclohexane. Unsaturated carbocycles include at least one multiple bond (e.g., a double or triple bond) but are not aromatic carbocycles. The unsaturated carbocyclic ring may be, for example, cyclohexadiene, cyclohexene or cyclopentene. Other examples of carbocycles include, but are not limited to, cyclopropane, cyclobutane, cyclopentane, cyclopentadiene, cyclohexane, cycloheptane, cycloheptene, naphthalene, and adamantane. 
An aromatic carbocycle (e.g., aryl moiety) may be, for example, phenyl, naphthyl or dihydronaphthyl.

In some cases, a ring may include one or more heteroatoms, such as one or more oxygen, nitrogen, silicon, phosphorus, boron, or sulfur atoms. The ring may be a heterocycle, or a component thereof, containing one or more heteroatoms. The heterocycle may be a saturated, unsaturated or aromatic ring in which at least one atom is a heteroatom. Heterocycles include 3- to 10-membered monocyclic rings, 6- to 12-membered bicyclic rings, and 6- to 12-membered bridged rings. Bicyclic heterocycles may include any combination of saturated, unsaturated, and aromatic bicyclic rings, as valence permits. For example, a heteroaromatic ring (e.g., pyridyl) can be fused to a saturated or unsaturated ring (e.g., cyclohexane, cyclopentane, morpholine, piperidine, or cyclohexene). Bicyclic heterocycles can include any combination of ring sizes, such as 4-5 fused ring systems, 5-6 fused ring systems, and 6-6 fused ring systems. Unsaturated heterocycles include at least one multiple bond (e.g., a double or triple bond) but are not aromatic heterocycles. The unsaturated heterocycle may be, for example, dihydropyrrole, dihydrofuran, oxazoline, pyrazoline, or dihydropyridine. Other examples of heterocycles include, but are not limited to, indole, benzothiophene, benzothiazole, benzoxazole, benzimidazole, oxazolopyridine, imidazopyridine, thiazolopyridine, furan, oxazole, pyrrole, pyrazole, imidazole, thiophene, thiazole, isothiazole, and isoxazole. The heteroaryl moiety may be an aromatic monocyclic ring structure, for example a 5- to 7-membered ring, comprising at least one heteroatom, for example 1 to 4 heteroatoms. Alternatively, the heteroaryl moiety may be a polycyclic ring system having two or more rings in which two or more atoms are common to two adjacent rings, wherein at least one ring is heteroaromatic. Heteroaryl groups include, for example, pyrrole, furan, thiophene, imidazole, oxazole, thiazole, pyrazole, pyridine, pyrazine, pyridazine, and pyrimidine, and the like.

The ring may be substituted or unsubstituted. Substitution refers to a substituent replacing a hydrogen atom on one or more atoms of the ring or on a substitutable heteroatom of the ring (e.g., NH or NH2). Substitution is subject to the permitted valence of the various components of the ring system and provides stable compounds (e.g., compounds that do not undergo spontaneous conversion by, for example, rearrangement, elimination, or cyclization). A substituent may replace a single hydrogen atom or multiple hydrogen atoms (e.g., on the same ring atom or on different ring atoms). The substituents on the ring may be, for example, halogen, hydroxy, oxo, thioketo, thiol, amide, amino, carboxy, nitrilo, cyano, nitro, imino, oximo, hydrazino, alkoxy, alkenyl, alkynyl, aryl, aralkyl, aralkenyl, aralkynyl, cycloalkyl, cycloalkylalkyl, alkylcycloalkyl, heterocycloalkyl, heterocyclyl, alkylheterocyclyl or any other useful substituent. The substituents may be water soluble. Examples of water-soluble substituents include, but are not limited to, pyridinium, imidazolium, quaternary ammonium groups, sulfonates, phosphates, alcohols, amines, imines, nitriles, amides, thiols, carboxylic acids, polyethers, aldehydes, boronic acids, and boronic esters.

The linker can have any number of rings, including at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, or more rings. In some cases, the rings may share an edge (e.g., be components of a bicyclic ring system). In general, the ring portion of the linker may provide a degree of physical rigidity to the linker and/or may be used to physically separate a dye (e.g., a fluorescent dye) on one end of the linker from the substrate to be labeled and/or from a second dye (e.g., a fluorescent dye) associated with the substrate and/or with the linker. A ring may be a component of an amino acid (e.g., a non-proteinogenic amino acid, as described herein).

In some cases, the linker may be "completely rigid" (e.g., substantially inflexible). For example, the ring systems of the linker may not be separated by any sp2 or sp3 carbon atoms. Generally, sp2 and sp3 carbon atoms (e.g., between ring systems) provide a degree of physical flexibility to the linker. In particular, sp3 carbon atoms can impart significant flexibility. Without limitation, flexibility may allow a polymerase to accept substrates (e.g., nucleotides or nucleotide analogs) modified with linkers and dyes (e.g., fluorescent dyes), or otherwise improve the performance of the labeling system. However, in a multi-dye system (e.g., a system comprising multiple fluorescent labeling reagents, such as a polynucleotide comprising two or more nucleotides coupled to two or more fluorescent labeling reagents), a linker that is too flexible may undermine the rigid character and allow two dyes (e.g., fluorescent dyes) to associate closely and be quenched. Thus, the ring systems of the linker can be connected to each other through a limited number of sp3 bonds, e.g., through no more than two sp3 bonds (e.g., 0, 1, or 2 sp3 bonds). For example, at least two ring systems of the linker may be connected to each other through no more than two sp3 bonds (e.g., through 0, 1, or 2 sp3 bonds). For example, at least two ring systems of the linker may be connected to each other through no more than two sp2 bonds, e.g., through no more than 1 sp2 bond. The ring systems of the linker may be connected to each other by a limited number of atoms, for example by no more than 2 atoms. For example, at least two ring systems of the linker may be connected to each other through no more than 2 atoms, such as through only 1 atom or no atoms (e.g., directly connected).
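The connectivity criteria above can be expressed as simple checks on a description of a linker. The sketch below uses a hypothetical data model in which each connection between neighboring ring systems records how many sp2 carbons, sp3 carbons, and other atoms separate the two ring systems. The class and function names are assumptions for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Connection:
    """Atoms separating two neighboring ring systems (hypothetical model)."""
    sp2_carbons: int = 0
    sp3_carbons: int = 0
    other_atoms: int = 0

    @property
    def total_atoms(self):
        return self.sp2_carbons + self.sp3_carbons + self.other_atoms


def has_closely_linked_pair(connections, max_atoms=2):
    """True if at least two ring systems are connected by no more than max_atoms atoms."""
    return any(c.total_atoms <= max_atoms for c in connections)


def is_completely_rigid(connections):
    """True if no ring systems are separated by any sp2 or sp3 carbon atoms."""
    return all(c.sp2_carbons == 0 and c.sp3_carbons == 0 for c in connections)


# Three ring systems: the first two directly connected, the last joined via one sp2 carbon.
linker = [Connection(), Connection(sp2_carbons=1)]
print(has_closely_linked_pair(linker), is_completely_rigid(linker))
```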

The series of ring systems of the linker may comprise aromatic and/or aliphatic rings. At least two ring systems of the linker may be directly connected to each other without an intermediate carbon atom. The linker may comprise at least one amino acid which may comprise a ring system. For example, the linker may comprise at least one non-protein amino acid (e.g., as described herein), such as hydroxyproline.

Many applications of optically (e.g., fluorescently) labeled reagents (e.g., nucleic acid sequencing reactions) can be performed in aqueous solutions. In some cases, linkers with too high a proportion of carbon and hydrogen atoms and/or lacking charged chemical groups may be insufficiently water soluble to be useful in aqueous solutions. Thus, the linkers described herein may have one or more water-solubilizing groups.

The linker may include a water-solubilizing group at any useful position. For example, the linker may comprise a water-soluble group at or near the point of attachment to the label (e.g., a dye, as described herein). Alternatively or additionally, the linker may comprise a water-solubilising group at or near the point of attachment to the substrate (e.g. protein or nucleotide analogue). Alternatively or additionally, the linker may comprise a water-soluble group between the point of attachment to the label (e.g., a dye, as described herein) and the substrate (e.g., a protein or nucleotide analog). One or more rings of the linker may comprise a water-solubilizing group. For example, each ring can comprise a water-solubilizing group, two or more rings can comprise a water-solubilizing group, only one ring can comprise a water-solubilizing group, or any position in between. A given ring may contain one or more water-soluble moieties. For example, the ring of the linker may comprise two water-soluble moieties. The water-solubilizing group(s) can be part of the backbone of the ring of the linker or can be appended to the ring of the linker (e.g., as a substituent). Each water-soluble portion of the linker may be different. Alternatively, one or more of the water-soluble moieties of the linker may be the same. For example, each water-soluble moiety of the linker may be the same. In some cases, the water-solubilizing group is positively charged. Examples of suitable water-solubilizing groups include, but are not limited to, pyridinium, imidazolium, quaternary ammonium groups, sulfonates, phosphates, alcohols, amines, imines, nitriles, amides, thiols, carboxylic acids, polyethers, aldehydes, and boronic acids or esters.

The water-solubilizing group can be any functional group that reduces the logP (including making it more negative) of the optical (e.g., fluorescent) labeling reagent. LogP is the logarithm of the partition coefficient of a molecule between n-octanol and water. Lipophilic molecules are more likely to partition into octanol, giving positive and larger logP values. The formula for logP can be expressed as $\log P_{\text{octanol/water}} = \log([\text{solute}]_{\text{octanol}} / [\text{solute}]_{\text{water}})$, where $[\text{solute}]_{\text{octanol}}$ is the concentration of the solute (i.e., the labeling reagent) in octanol and $[\text{solute}]_{\text{water}}$ is the concentration of the solute in water. Thus, the more a compound partitions into water compared to octanol, the more negative the logP. LogP can be determined by experimental measurement or predicted using software algorithms. The water-solubilizing group can have any suitable logP value. In some cases, the logP is less than about 2, less than about 1.5, less than about 1, less than about 0.5, less than about 0, less than about -0.5, less than about -1, less than about -1.5, less than about -2, or less. In some cases, the logP is about 2.0 to about -2.0.
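As a numerical illustration of the logP definition above, the following sketch computes logP from measured solute concentrations in the two phases, using the conventional base-10 logarithm. The function names and the example concentrations are assumptions for illustration.

```python
import math


def log_p(conc_octanol, conc_water):
    """logP = log10([solute]_octanol / [solute]_water); both concentrations in the same units."""
    if conc_octanol <= 0 or conc_water <= 0:
        raise ValueError("concentrations must be positive")
    return math.log10(conc_octanol / conc_water)


def in_range(value, low=-2.0, high=2.0):
    """Check whether a logP value falls within the range of about -2.0 to about 2.0."""
    return low <= value <= high


# A solute ten times more concentrated in water than in octanol has a negative logP.
print(log_p(1.0, 10.0))
```

A solute that partitions equally between the two phases has a logP of 0, and each tenfold preference for water lowers the logP by 1.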

The linker may comprise one or more asymmetric (e.g., chiral) centers (e.g., as described herein). All stereochemical isomers of the linker are contemplated, including racemates and enantiomerically pure linkers.

The linker and/or the substrate (e.g., protein or nucleotide analog) or dye to which it may be attached may include one or more isotopic (e.g., radioactive) labels (e.g., as described herein). All isotopic variations of the linker are contemplated.

The structural features of the linker, including the number of rings, the rigidity of the linker, etc., can be combined to establish a functional distance between the dye (e.g., fluorescent dye) and the substrate (e.g., protein or nucleotide analog) connected by the linker. In some cases, the distance corresponds to the length (and/or functional length) of the linker. In some cases, the functional length varies based on the temperature, solvent, pH, and/or salt concentration of the solution in which the length is measured or estimated. Functional length can be measured in solution, where an optical (e.g., fluorescent) signal from the substrate is measured. The functional length may be an average or ensemble value of a distribution of functional lengths (e.g., over rotational, vibrational and translational motion) and may vary based on, for example, temperature, solvent, pH and/or salt concentration. Functional length can be estimated (e.g., based on bond lengths and steric considerations, such as by using chemical drawing or modeling programs) and/or measured (e.g., using molecular imaging and/or crystallography techniques).

The linker may establish any suitable functional length between the dye (e.g., a fluorescent dye) and the substrate (e.g., a protein or nucleotide analog). In some cases, the functional length is up to about 500 nanometers (nm), about 200nm, about 100nm, about 75nm, about 50nm, about 40nm, about 30nm, about 20nm, about 10nm, about 5nm, about 2nm, about 1.0nm, about 0.5nm, about 0.3nm, about 0.2nm, or less. In some cases, the functional length is at least about 0.2 nanometers (nm), at least about 0.3nm, at least about 0.5nm, at least about 1.0nm, at least about 2nm, at least about 5nm, at least about 10nm, at least about 20nm, at least about 30nm, at least about 40nm, at least about 50nm, at least about 75nm, at least about 100nm, at least about 200nm, at least about 500nm, or greater. In some cases, the functional length is from about 0.5nm to about 50 nm.

In some cases, the linker forms a straight and/or continuous chain. In some cases, the linker is branched. The linker may be capable of forming a bond with a plurality of dyes (e.g., fluorescent dyes) and/or substrates (e.g., nucleotides and/or nucleotide analogs).

The linker may be a polymer having regular repeating units. Alternatively, the linker may be a copolymer without regular repeating units. In some cases, the linker is not the result of the polymerization process. In general, the polymerization process can produce products having a variety of degrees of polymerization and molecular weights. Rather, in some cases, the linkers described herein have a defined (i.e., known) molecular weight.

Linkers can be constructed from one or more amino acids. For example, a linker may be constructed from two or more amino acids. The amino acid may be a natural amino acid or an unnatural amino acid. The amino acids may be proteinogenic or non-proteinogenic amino acids. As used herein, "proteinogenic amino acid" generally refers to a genetically encoded amino acid that can be incorporated into a protein during translation. Proteinogenic amino acids include arginine, histidine, lysine, aspartic acid, glutamic acid, serine, threonine, asparagine, glutamine, cysteine, selenocysteine, glycine, proline, alanine, isoleucine, leucine, methionine, phenylalanine, tryptophan, tyrosine, valine, and pyrrolysine. As used herein, a "non-proteinogenic amino acid" is an amino acid that is not a proteinogenic amino acid. The non-proteinogenic amino acids can be natural amino acids or non-natural amino acids. Non-proteinogenic amino acids include amino acids not found in proteins and/or not naturally encoded or found in the genetic code of an organism. Examples of non-proteinogenic amino acids include, but are not limited to, hydroxyproline, selenomethionine, hypusine, 2-aminoisobutyric acid, γ-aminobutyric acid, ornithine, citrulline, β-alanine (3-aminopropionic acid), δ-aminolevulinic acid, 4-aminobenzoic acid, dehydroalanine, carboxyglutamic acid, pyroglutamic acid, norvaline, norleucine, alloisoleucine, tert-leucine, pipecolic acid, allothreonine, homocysteine, homoserine, α-amino-n-heptanoic acid, α,β-diaminopropionic acid, α,γ-diaminobutyric acid, β-amino-n-butyric acid, β-aminoisobutyric acid, isovaline, sarcosine, N-ethylglycine, N-propylglycine, N-isopropylglycine, N-methylalanine, N-ethylalanine, N-methyl-β-alanine, N-ethyl-β-alanine, isoserine and α-hydroxy-γ-aminobutyric acid. Other examples of non-proteinogenic amino acids include the unnatural amino acids described herein. Non-proteinogenic amino acids may comprise a ring structure.
For example, the non-proteinogenic amino acid may be trans-4-aminomethylcyclohexanecarboxylic acid or 4-hydrazinobenzoic acid. Such compounds can be FMOC-protected using fluorenylmethyloxycarbonyl chloride (FMOC-Cl) and used for solid phase peptide synthesis. The structures of these compounds are shown below:

Where the linker comprises a plurality of amino acids, e.g., a plurality of non-proteinogenic amino acids, an amine moiety adjacent to the ring moiety (e.g., an amine moiety in a hydrazine moiety) may serve as a water-solubilizing group. To synthesize water-soluble peptides, mixed linkers comprising alternating water-insoluble and water-soluble amino acids (e.g., hydroxyproline) can be prepared. Other moieties may be used to increase water solubility. For example, linking an amino acid to an oxamate moiety may provide water solubility through additional hydrogen bonding without the addition of any sp³ bonds. The structure of the oxamate precursor 2-amino-2-oxoacetic acid is shown below:

In some cases, a component (e.g., a monomeric unit) of the linker can have an amino group, a carboxyl group, and a water-solubilizing moiety. In some cases, a monomer may be deconstructed into two "half-monomers". That is, by using two different units, one comprising two amino groups and the other comprising two carboxyl groups, an amino acid moiety can be constructed, which can be a unit (e.g., a repeating unit) of a linker. One or both units may include one or more water-solubilizing groups (e.g., as described herein). For example, 2,5-diaminohydroquinone may be one half-monomer (A) and 2,5-dihydroxyterephthalic acid may be the other half-monomer (B). Such a scheme is shown below:

As indicated above, A is a diamine and B is a diacid. Thus, non-protein (e.g., non-natural) amino acids can be constructed from diamines and diacids. Further examples of such structures are shown below:

Polymers based on two half-monomers (e.g., as shown above) can be constructed by solid phase synthesis. Since the half-monomers can be homobifunctional at the linking moiety, FMOC protection is not required in some cases. For example, the dicarboxylic acid may be attached to a solid support, and an excess of diamine may then be added with an appropriate coupling agent (HBTU/HOBT/collidine). After washing away the excess reagent, an excess of dicarboxylic acid may be added together with the coupling agent. A side product formed when one molecule of a solution-phase reagent reacts with two support-bound reagents may lead to truncation of the synthesis. These by-products can be separated from the product by HPLC purification after cleavage from the support.

One advantage of the half-monomer approach may be the increased flexibility it affords in designing the resulting polymer. The diamine (A) may be replaced in a subsequent step by a different diamine (A') to modify the properties of the polymer in a repetitive or non-repetitive manner. Such a scheme may facilitate the construction of polymers such as ABA'BABA'B.
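The alternating assembly described above can be illustrated with a short sketch. This is only a notational aid, assuming that each cycle adds one diamine followed by one diacid and that the diamines are drawn from a repeating pattern; the function name `half_monomer_sequence` and its parameters are hypothetical and do not come from the disclosure.

```python
from itertools import cycle, islice


def half_monomer_sequence(diamines, diacid, n_pairs):
    """Return the unit order for n_pairs diamine/diacid couplings.

    diamines: repeating pattern of diamine labels, e.g. ["A", "A'"]
    diacid:   label of the dicarboxylic acid half-monomer, e.g. "B"
    n_pairs:  number of diamine/diacid pairs to assemble
    """
    if not diamines:
        raise ValueError("at least one diamine label is required")
    if n_pairs < 0:
        raise ValueError("n_pairs must be non-negative")
    units = []
    # Cycle through the diamine pattern, following each diamine with the diacid.
    for diamine in islice(cycle(diamines), n_pairs):
        units.extend([diamine, diacid])
    return units


# Alternating A and A' with a common diacid B over four pairs.
print("".join(half_monomer_sequence(["A", "A'"], "B", 4)))  # ABA'BABA'B
```

Passing a non-repeating list of diamine labels whose length equals `n_pairs` gives a non-repetitive polymer in the same way.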

Further examples of half-monomers for use according to the above scheme include 2,5-diaminopyridine and 2,5-dicarboxypyridine, both of which are shown below, as well as other moieties shown below:

Diamines:

Dicarboxylic acids:

As described above, amino acids (e.g., non-proteinogenic amino acids that may be non-natural amino acids) may be constructed from diamines and dicarboxylic acids. Amino acids (e.g., non-proteinogenic amino acids that may be non-natural amino acids) may also be constructed from aminothiols and thiol carboxylic acids. Examples of aminothiols and thiol carboxylic acids are shown below:

Examples of amino acids (e.g., unnatural amino acids) constructed from aminothiols and thiol carboxylic acids are shown below:

As indicated above, amino acids constructed using aminothiols and thiol carboxylic acids may include disulfide bonds. As described elsewhere herein, disulfide bonds can be cleaved using a cleavage reagent (e.g., as described herein). Thus, amino acids constructed from an aminothiol and a thiol carboxylic acid can be used as cleavable moieties of a linker. The amino acid constructed from an aminothiol and a thiol carboxylic acid can be a component of a linker (e.g., as described herein) that can couple a labeling moiety (e.g., a fluorescent dye) to a substrate (e.g., a nucleotide or nucleotide analog). The various structures allow for different hydrophobicity for incorporation and may provide different "scar" moieties upon interaction with the cleavage reagent (e.g., as described herein). Two or more amino acids, for example, two or more amino acids constructed from an aminothiol and a thiol carboxylic acid, can be included in the linker. For example, two or more amino acids may be included in a linker and be separated by no more than 2 sp³ carbon atoms, e.g., by no more than 2 sp² carbon atoms or by no more than 2 atoms. Where two or more amino acids formed from an aminothiol and a thiol carboxylic acid are linked to each other within the linker, cleavage is likely to be faster, since there will be multiple possible cleavage sites. Examples of a portion of a linker comprising such components are as follows:

As described above, the two half-monomers can be combined to provide an amino acid (e.g., a non-proteinogenic amino acid, such as a non-natural amino acid). Thus, an unnatural amino acid can include any known unnatural amino acid, as well as any unnatural amino acid that can be constructed as described herein.

Half-monomers such as those described herein can be constructed into polypeptide polymers. Examples of nucleotides constructed with two repeat units of amino acids are shown below:

In some cases, the nitrogen in the nitrogen-containing ring may be quaternized to provide a pyridinium moiety, either before or after peptide coupling, thereby increasing the water solubility of the final product. An exemplary linker sequence generated in this manner is shown below:

Water-soluble linkages that may be used with the half-monomer approach include, for example, those having symmetric functional groups such as secondary amides, dihydrazides, and ureas. Examples of such moieties are shown below:

The amino acid linker subunits can be assembled into polymers by peptide synthesis methods. For example, amino acids can be assembled into a linker using a solid support method known as solid phase peptide synthesis (SPPS) or by liquid phase synthesis. The SPPS method can use solid phase beads, where the initial step is to attach the C-terminal amino acid through its carboxylic acid moiety, leaving its free amine ready for coupling. Peptide synthesis can be initiated by flowing in the FMOC amine-protected monomer with a peptide coupling reagent (e.g., HBTU and an organic base). Excess reagent can be washed away and the next monomer introduced. After addition of one or more amino acids, the final peptide can be cleaved from the beads and purified by HPLC. Liquid phase synthesis can use the same reagents (except for the beads), but purification is performed after each step. The advantage of either step-wise polymerization process is that the resulting linker can have a defined molecular weight that can be confirmed by mass spectrometry.

The linker may comprise one or more components. For example, the linker may comprise a first component comprising a polymeric region (e.g., comprising a repeating unit) and a second component that does not comprise a polymeric region. The second component can include a cleavable component (e.g., as described herein). Examples of cleavable linkers include, but are not limited to, structures E and B as shown below:

In the structures shown above, disulfide moieties can be cleaved (e.g., as described herein) to provide a thiol scar. Upon reaction between the carboxyl moiety of the cleavable linker and an amine moiety attached to the substrate (e.g., a protein or nucleotide analog), the cleavable linker can be attached to the substrate through an amide moiety. For example, the substrate can be a nucleotide or nucleotide analog that includes a propargylamino moiety, and a fluorescently labeled reagent that includes a dye and a linker described herein can be configured to associate with the substrate through the propargylamino moiety. Examples of such substrates are shown below:

The first component of a linker comprising first and second components may comprise a repeat unit. For example, the linker may include a first component that includes one or more hydroxyproline moieties. Examples of such linker components are shown below:

The linker shown above includes 10 hydroxyproline moieties and one glycine moiety, and is referred to herein as "H" or "hyp10". An alternative to the above linker comprises 20 hydroxyproline moieties and one glycine moiety, and is referred to herein as "hyp20". As described herein, all stereoisomers of hyp10 and hyp20 and combinations thereof are contemplated. A linker component such as hyp10 may be attached to the cleavable linker by reaction between the free carboxyl moiety of the linker component and the amino moiety of the cleavable linker. The linker component, e.g., hyp10, may be attached to the dye through the free amino moiety of the linker component. Examples of optical labeling reagents comprising a first linker component comprising a repeat unit (e.g., hyp10) and a second linker component comprising a cleavable linker are provided elsewhere herein.
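Because a step-wise assembled linker has a defined composition, its expected mass can be computed in advance and compared with a mass spectrum. The sketch below estimates monoisotopic masses for hyp10 and hyp20 under stated assumptions: the linker is treated as a free linear peptide (free amine and free carboxylic acid termini, no dye, no cleavable component, no protecting groups or counter-ions), and standard monoisotopic atomic masses are used. The actual structures are shown in the figures and may differ from this idealized form; all names in the code are illustrative.

```python
# Monoisotopic atomic masses (Da) for the most abundant isotopes.
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "N": 14.0030740, "O": 15.9949146}

# Elemental composition of each residue (free amino acid minus one water).
RESIDUES = {
    "Hyp": {"C": 5, "H": 7, "N": 1, "O": 2},  # hydroxyproline residue
    "Gly": {"C": 2, "H": 3, "N": 1, "O": 1},  # glycine residue
}
WATER = {"H": 2, "O": 1}


def formula_mass(formula):
    """Sum monoisotopic masses over an element -> count mapping."""
    return sum(MONOISOTOPIC[element] * count for element, count in formula.items())


def peptide_mass(residue_counts):
    """Monoisotopic mass of a linear peptide: residues plus one water for the termini."""
    total = formula_mass(WATER)
    for name, count in residue_counts.items():
        total += count * formula_mass(RESIDUES[name])
    return total


hyp10_mass = peptide_mass({"Hyp": 10, "Gly": 1})
hyp20_mass = peptide_mass({"Hyp": 20, "Gly": 1})
print(f"hyp10: {hyp10_mass:.3f} Da")  # about 1205.509 Da under these assumptions
print(f"hyp20: {hyp20_mass:.3f} Da")  # about 2335.986 Da under these assumptions
```

A product of a conventional polymerization would instead show a distribution of masses, one peak per degree of polymerization, rather than a single expected value.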

The linker may provide a linkage between the fluorescent moiety (e.g., a dye, as described herein) and the substrate (e.g., a protein or nucleotide analog). For example, an optical (e.g., fluorescent) labeling reagent can comprise an optical dye (e.g., a fluorescent dye) attached to a linker (e.g., as described herein). Non-limiting examples of dyes (e.g., fluorescent dyes) include SYBR Green, SYBR Blue, DAPI, propidium iodide, Hoechst, SYBR Gold, ethidium bromide, acridine, proflavine, acridine orange, acridine yellow, fluorescent coumarin, ellipticine, daunorubicin, chloroquine, distamycin D, chromomycin, ethidium, mithramycin, ruthenium polypyridyls, anthramycin, phenanthridines and acridines, propidium iodide, hexidium iodide, dihydroethidium, ethidium homodimer-1 and -2, ethidium monoazide, ACMA, Hoechst 33258, Hoechst 33342, Hoechst 34580, DAPI, acridine orange, 7-AAD, actinomycin D, LDS751, hydroxystilbamidine, SYTOX Blue, SYTOX Green, SYTOX Orange, POPO-1, POPO-3, YOYO-1, YOYO-3, TOTO-1, TOTO-3, JOJO-1, LOLO-1, BOBO-1, BOBO-3, PO-PRO-1, PO-PRO-3, BO-PRO-1, BO-PRO-3, TO-PRO-1, TO-PRO-3, TO-PRO-5, JO-PRO-1, LO-PRO-1, YO-PRO-3, PicoGreen, OliGreen, RiboGreen, SYBR Gold, SYBR Green I, SYBR Green II, SYBR DX, SYTO dyes (e.g., SYTO-40, -41, -42, -43, -44, and -45 (blue); SYTO-13, -16, -24, -21, -23, -12, -11, -20, -22, -15, -14, and -25 (green); SYTO-81, -80, -82, -83, -84, and -85 (orange); SYTO-64, -17, -59, -61, -62, -60, and -63 (red)), fluorescein isothiocyanate (FITC), tetramethylrhodamine isothiocyanate (TRITC), rhodamine, tetramethylrhodamine, R-phycoerythrin, Cy-2, Cy-3, Cy-3.5, Cy-5, Cy5.5, Cy-7, Texas Red, Phar-Red, allophycocyanin (APC), Sybr Green I, Sybr Green II, Sybr Gold, CellTracker Green, 7-AAD, ethidium homodimer I, ethidium homodimer II, ethidium homodimer III, ethidium bromide, umbelliferone, eosin, green fluorescent protein, erythrosine, coumarin, methylcoumarin, pyrene, malachite green, stilbene, lucifer yellow, cascade blue, dichlorotriazinylamino fluorescein, dansyl chloride, fluorescent lanthanide complexes such as those including europium and terbium, carboxytetrachlorofluorescein, 5 and/or 6-carboxyfluorescein (FAM), VIC, 5-(or 6-)iodoacetamidofluorescein, 5-{[2(and 3)-5-(acetylmercapto)-succinyl]amino}fluorescein (SAMSA-fluorescein), lissamine rhodamine B sulfonyl chloride, 5 and/or 6 carboxyrhodamine (ROX), 7-amino-methyl-coumarin, 7-amino-4-methylcoumarin-3-acetic acid (AMCA), BODIPY fluorophores, 8-methoxypyrene-1,3,6-trisulfonic acid trisodium salt, 3,6-disulfonate-4-amino-naphthalimide, phycobiliproteins, AlexaFluor dyes (e.g., AlexaFluor 350, 405, 430, 488, 532, 546, 555, 568, 594, 610, 633, 635, 647, 660, 680, 700, 750, and 790 dyes), DyLight dyes (e.g., DyLight 350, 405, 488, 550, 594, 633, 650, 680, 755, and 800 dyes), Black Hole Quencher dyes (Biosearch Technologies) (e.g., BH1-0, BHQ-1, BHQ-3, and BHQ-10), QSY dye fluorescence quenchers (from Molecular Probes/Invitrogen) (e.g., QSY7, QSY9, QSY21, and QSY35), Dabcyl, Dabsyl, Cy5Q, Cy7Q, Dark Cyanine dyes (GE Healthcare), Dy-Quenchers (Dyomics) (e.g., DYQ-660 and DYQ-661), ATTO fluorescence quenchers (ATTO-TEC GmbH) (e.g., ATTO 540Q, 580Q, 612Q, 532, and 633), and other fluorophores and quenchers (e.g., as described herein). In some cases, the label may be of a type that does not self-quench or exhibit proximity quenching. Non-limiting examples of types of labels that do not self-quench or exhibit proximity quenching include bimane derivatives, such as monobromobimane. Additional dyes included in the structures provided herein can also be used in combination with any linker provided herein and any substrate described herein, regardless of the context of their disclosure.

An optical (e.g., fluorescent) labeling reagent comprising an optical dye (e.g., a fluorescent dye) and a linker can further comprise a cleavable group that can be cleaved to separate the optical dye from a substrate with which the optical labeling reagent is associated. All or a portion of the linker may be part of the cleavable group. In some cases, cleavage of the cleavable group may leave a scar group associated with the substrate. The cleavable group may be, for example, an azidomethyl group capable of being cleaved by tris (2-carboxyethyl) phosphine (TCEP), Dithiothreitol (DTT), or Tetrahydropyranyl (THP) to leave a hydroxyl scar group. The cleavable group may be, for example, a disulfide bond capable of being cleaved by TCEP, DTT or THP to leave a thiol scar group. The cleavable group may be, for example, a hydrocarbyl dithiomethyl group capable of being cleaved by TCEP, DTT or THP to leave a hydroxyl scar group. The cleavable group can be, for example, a 2-nitrobenzyloxy group capable of being cleaved by Ultraviolet (UV) light to leave a hydroxyl scar group. The scar can also be, for example, an aromatic group, such as a phenyl or benzyl moiety.

An optically (e.g., fluorescently) labeled reagent can be configured to associate with a substrate, such as a nucleotide or nucleotide analog (e.g., as described herein). Alternatively or in addition, the optically (e.g., fluorescently) labeled agent can be configured to associate with a substrate, such as a protein, cell, lipid, or antibody. For example, the optical labeling reagent may be configured to associate with a protein. The protein substrate may be any protein and may include any useful modification, mutation or label, including any isotopic label. For example, the protein may be an antibody, such as a monoclonal antibody. The protein associated with one or more optically (e.g., fluorescently) labeled reagents (e.g., as described herein) can be, for example, an antibody (e.g., a monoclonal antibody) for labeling cells that can be analyzed and sorted using flow cytometry.

Optical (e.g., fluorescent) labeling reagents (e.g., as described herein) can reduce quenching (e.g., coupling between dyes incorporated into nucleotides or nucleotide analogs in a growing nucleic acid strand, e.g., during nucleic acid sequencing). For example, the optical (e.g., fluorescent) signal emitted by a substrate (e.g., a nucleotide or nucleotide analog that can be incorporated into a growing nucleic acid strand) can be proportional to the amount of optical (e.g., fluorescent) label associated with the substrate (e.g., the amount of optical label adjacent or near to the substrate that is incorporated). For example, a plurality of optical labeling reagents comprising the same or different types of substrates (e.g., the same or different types of nucleotides or nucleotide analogs) can be incorporated into a growing nucleic acid strand in proximity to one another (e.g., during a nucleic acid sequencing process). In such systems, the signal emitted by the collective substrates may be approximately proportional (e.g., linearly proportional) to the amount of dye-labeled substrate incorporated. In other words, quenching may not significantly affect the emitted signal. This is observable in systems where a 100% labeling fraction is used. In the case where less than 100% of the substrate is labeled (e.g., less than 100% of the nucleotides in the nucleotide stream are labeled), the optical (e.g., fluorescent) signal emitted by the substrate (e.g., a nucleotide or nucleotide analog) incorporated into the plurality of growing nucleic acid strands (e.g., the plurality of growing nucleic acid strands coupled to the sequencing template coupled to the support, as described herein) can be proportional to the length of the homopolymer region of the growing nucleic acid strands.
Similarly, where less than 100% of the substrate is labeled (e.g., less than 100% of the nucleotides in each of the successive nucleotide streams are labeled), the optical (e.g., fluorescent) signal emitted by the substrate (e.g., a nucleotide or nucleotide analog) incorporated into the plurality of growing nucleic acid strands (e.g., the plurality of growing nucleic acid strands coupled to the sequencing template coupled to the support, as described herein) can be proportional to the length of the heteropolymer and/or homopolymer region of the growing nucleic acid strands. In some such cases, the intensity of the measured optical (e.g., fluorescent) signal can be linearly proportional to the length of the heteropolymeric and/or homopolymeric region into which the substrate has been incorporated. For example, when the optical (e.g., fluorescent) signal is plotted against the length, in substrates, of the heteropolymeric and/or homopolymeric region into which the substrate has been incorporated, the measured optical (e.g., fluorescent) signal can be linearly proportional to that length with a slope of about 1.0.
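One way to check the linearity described above is to normalize each measured signal to the signal of a single incorporation and fit a line through the origin against region length. The following is a minimal sketch under those assumptions; the function name and the example values are hypothetical and are not measurements from this disclosure.

```python
def slope_through_origin(lengths, signals):
    """Least-squares slope of signal versus length for a line forced through (0, 0).

    lengths: region lengths in incorporated bases (e.g., homopolymer lengths)
    signals: signals normalized so that a single incorporation is about 1.0
    """
    if len(lengths) != len(signals) or not lengths:
        raise ValueError("lengths and signals must be non-empty and equal in size")
    numerator = sum(x * y for x, y in zip(lengths, signals))
    denominator = sum(x * x for x in lengths)
    if denominator == 0:
        raise ValueError("at least one length must be non-zero")
    return numerator / denominator


# Hypothetical normalized signals for homopolymer lengths 1 through 5.
lengths = [1, 2, 3, 4, 5]
unquenched = [1.02, 1.97, 3.05, 3.96, 5.01]  # roughly additive signal
quenched = [1.00, 1.50, 1.80, 1.95, 2.00]    # signal saturating with length

print(f"slope without quenching: {slope_through_origin(lengths, unquenched):.2f}")
print(f"slope with quenching:    {slope_through_origin(lengths, quenched):.2f}")
```

A slope near 1.0 indicates that each additional labeled base contributes about as much signal as the first, whereas a slope well below 1.0 is consistent with quenching between neighboring dyes.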

An optical (e.g., fluorescent) labeling agent (e.g., as described herein) can reduce quenching in a protein system. When a protein is labeled, quenching may begin when the ratio of fluorophore to protein (F/P) is about 3. Using the optical labeling reagents provided herein, higher F/P ratios, and thus brighter reagents, can be obtained. This can be used to analyze the protein (e.g., using imaging) and/or to analyze cells labeled with a protein (e.g., an antibody) associated with one or more optical (e.g., fluorescent) labeling reagents.

Examples of linkers described herein can be found, for example, in FIGS. 1A-1C, 2A, 4, 5A, 5B, 6, 7, 8, 13A-13C, 14A, 14B, 16, and 17. In some cases, the R groups included in these linkers (e.g., as in fig. 1C) impart sufficient water solubility on the labeling reagent. Additional examples are included elsewhere herein, including in the examples below.

In one aspect, the present disclosure provides oligonucleotide molecules comprising a fluorescent labeling agent or a derivative thereof (e.g., as described herein). The oligonucleotide molecule may comprise one or more additional fluorescent labeling reagents of the same type (e.g., comprising linkers of the same chemical structure, comprising dyes of the same chemical structure, and/or associated with the same type of substrate (e.g., nucleotides)). The fluorescent labeling reagent of the oligonucleotide molecule and one or more additional fluorescent labeling reagents may be associated with the nucleotide. For example, a fluorescent labeling reagent may be attached to the nucleobase of a nucleotide of an oligonucleotide molecule. The fluorescent labeling reagent and one or more additional fluorescent labeling reagents may be linked to adjacent nucleotides of the oligonucleotide molecule. Alternatively or additionally, the fluorescent labeling reagent and one or more additional fluorescent labeling reagents may be linked to nucleotides of the oligonucleotide molecule that are separated by one or more nucleotides that are not linked to the fluorescent labeling reagent. The oligonucleotide molecule may be a single stranded molecule. Alternatively, the oligonucleotide molecule may be a double-stranded or partially double-stranded molecule. The double-stranded or partially double-stranded molecule may comprise a fluorescent labeling agent associated with the single-stranded or double-stranded molecule. The oligonucleotide molecule may be a deoxyribonucleic acid molecule. The oligonucleotide molecule may be a ribonucleic acid molecule. Oligonucleotide molecules can be generated and/or modified by a nucleic acid sequencing process (e.g., as described herein).

The linker of the fluorescently labeled reagent can include a cleavable group configured to be cleaved to separate the fluorescent dye of the fluorescently labeled reagent from the substrate (e.g., nucleotide) with which it is associated. For example, the linker may comprise a cleavable group that includes an azidomethyl group, a disulfide bond, a hydrocarbyl dithiomethyl group, or a 2-nitrobenzyloxy group. The cleavable group may be configured to be cleaved by application of one or more members of the group consisting of tris (2-carboxyethyl) phosphine (TCEP), Dithiothreitol (DTT), Tetrahydropyranyl (THP), Ultraviolet (UV) light, and combinations thereof. An oligonucleotide molecule comprising a fluorescently labeled reagent can be configured to emit a fluorescent signal (e.g., upon excitation at an appropriate range of energies, as described herein).

In another aspect, the present disclosure provides a kit comprising a plurality of linkers (e.g., as described herein). A linker of the plurality of linkers may comprise (i) one or more water-solubilizing groups and (ii) two or more ring systems. At least two of the two or more ring systems may be connected to each other by no more than two sp³ carbon atoms. For example, at least two of the two or more ring systems may be connected to each other by sp² carbon atoms. At least two of the two or more ring systems may be connected to each other by no more than two atoms. The linker may comprise a non-protein amino acid (e.g., as described herein) comprising a ring system of the two or more ring systems. For example, the linker may comprise hydroxyproline or an amino acid constructed from, for example, a diamine and a dicarboxylic acid or an aminothiol and a thiol carboxylic acid. The linker can be attached to a fluorescent dye (e.g., as described herein) and/or associated with a substrate. For example, the linker may be attached to a fluorescent dye and coupled to a substrate selected from the group consisting of nucleotides, proteins, lipids, cells, and antibodies. For example, the linker may be attached to a fluorescent dye and a nucleotide.

The linker can comprise a plurality of amino acids, such as a plurality of non-protein (e.g., non-natural) amino acids. For example, the linker may comprise a plurality of hydroxyprolines (e.g., hyp10 moieties). At least one of the one or more water-solubilizing groups may be appended to a ring structure of the two or more ring systems. The one or more water-solubilizing groups may be selected from pyridinium, imidazolium, quaternary ammonium groups, sulfonates, phosphates, alcohols, amines, imines, nitriles, amides, thiols, carboxylic acids, polyethers, aldehydes, boronic acids, and boronic esters. The linker may comprise a cleavable group configured to be cleaved to separate a first portion of the linker from a second portion of the linker. The cleavable group may be selected from the group consisting of an azidomethyl group, a disulfide bond, a hydrocarbyl dithiomethyl group, and a 2-nitrobenzyloxy group. The cleavable group may be cleavable by application of one or more members of the group consisting of tris (2-carboxyethyl) phosphine (TCEP), Dithiothreitol (DTT), Tetrahydropyranyl (THP), Ultraviolet (UV) light, and combinations thereof. The linker may comprise a moiety selected from a set of structures, each of which comprises a disulfide group and can therefore be considered a cleavable group.

The plurality of linkers of the kit can comprise a first linker associated with a first substrate (e.g., a first nucleotide) and a second linker associated with a second substrate (e.g., a second nucleotide). The first substrate and the second substrate may be of different types (e.g., different canonical nucleotides). The first and second substrates can be nucleotides comprising different types (e.g., A, C, G, U and T) of nucleobases. The first linker and the second linker may comprise the same chemical structure. Similarly, the first linker may be attached to the first fluorescent dye and the second linker may be attached to the second fluorescent dye. The first fluorescent dye and the second fluorescent dye may be of different types. For example, the first and second fluorescent dyes may fluoresce at different wavelengths and/or have different maximum excitation wavelengths. The first and second fluorescent dyes may fluoresce at similar wavelengths and/or have similar maximum excitation wavelengths, regardless of whether they share the same chemical structure.

The plurality of linkers of the kit can further comprise a third linker associated with a third substrate and a fourth linker associated with a fourth substrate. The first, second, third and fourth substrates may be of different types. For example, the first, second, third and fourth substrates can be nucleotides comprising different types (e.g., A, C, G and U/T) of nucleobases. The first linker and the third linker may comprise different chemical structures. The first and third linkers can comprise the same chemical group, e.g., the same cleavable group (e.g., as described herein). For example, the first linker and the third linker may each comprise a moiety comprising a disulfide bond. Similarly, the first linker and the fourth linker may comprise different chemical structures. The first and fourth linkers can comprise the same chemical group, e.g., the same cleavable group (e.g., as described herein). For example, the first linker and the fourth linker may each comprise a moiety comprising a disulfide bond.

In one example, the first linker comprises a hyp10 moiety and a first cleavable moiety, the second linker comprises a hyp10 moiety and a second cleavable moiety, the third linker comprises a third cleavable moiety but does not comprise a hyp10 moiety, and the fourth linker comprises a fourth cleavable moiety and does not comprise a hyp10 moiety. The second cleavable moiety can have a different chemical structure than the first cleavable moiety. Alternatively, the second cleavable moiety and the first cleavable moiety may have the same chemical structure. The third cleavable moiety and the fourth cleavable moiety may have the same chemical structure. Alternatively, the third cleavable moiety and the fourth cleavable moiety may have different chemical structures. In one example, the first linker and the second linker each have a first chemical structure and the third linker and the fourth linker each have a second chemical structure that is different from the first chemical structure. In another example, the first linker, the second linker, the third linker, and the fourth linker all have the same chemical structure. In another example, the first linker, the second linker, the third linker, and the fourth linker all have different chemical structures.

Methods of using optical labeling reagents

There are several different types of quenching that can be reduced, and different types of applications can be performed using the optical (e.g., fluorescent) labeling reagents described herein.

The methods described herein can be used to reduce quenching, including G-quenching. Attachment of a dye (e.g., a fluorescent dye) to a nucleotide (e.g., via a linker provided herein) can result in quenching of many dyes, particularly when the dye is attached to a guanosine nucleotide. Dye quenching can occur between a dye and its associated nucleotide, as well as between dye moieties, such as dye moieties coupled to different nucleotides (e.g., adjacent nucleotides or nucleotides separated by one or more other nucleotides). Use of the linkers provided herein can mitigate quenching, allowing for more sensitive detection of G-containing sequences. In addition, dye-labeled nucleotides in G-homopolymer regions may exhibit reduced fluorescence. Any nucleic acid sequencing method that requires the attachment of a dye to dGTP can benefit from these linkers, including single molecule detection, sequencing using 3' blocked nucleotides, and sequencing by hybridization.

The methods described herein can be used to reduce dye-dye quenching on adjacent or neighboring nucleotides (e.g., nucleotides separated by one, two, or more other nucleotides) on the same DNA strand. Methods that require dyes on adjacent or nearby nucleotides may result in proximity quenching; that is, the brightness of two adjacent dyes is less than twice that of one dye, or often even less than that of a single dye. Quenching can be mitigated using the linkers provided herein, allowing for quantitative detection of multiple dyes. For example, in sequencing methods such as mostly natural nucleotide flow sequencing, the fraction of dye-labeled nucleotides is typically less than 5% because, at higher fractions, the signal for homopolymers is not linear with homopolymer length due to quenching. The reagents described herein can allow more (e.g., greater than 5%, in some cases up to 100%) of the nucleotides to be labeled while facilitating sensitive and accurate detection of incorporated nucleotides.

The use of dye-linker-nucleotides provided herein can result in more efficient incorporation (e.g., increased tolerance) into a growing nucleic acid strand by a polymerase (e.g., as described herein) as compared to dye-nucleotides lacking a linker (e.g., during nucleic acid sequencing). The result may be that a smaller amount of dye-labeled nucleotides is used to obtain the same signal.

Use of the dye-linker-nucleotides provided herein can result in less polymerase misincorporation (e.g., as described herein) (e.g., during nucleic acid sequencing). The result may be less loss of template strand and therefore longer sequencing reads.

Use of the dye-linker-nucleotides provided herein can result in less mismatch extension (e.g., during nucleic acid sequencing), and thus reduce priming.

The methods described herein can be used to reduce dye-dye quenching in multi-dye applications. Hybridization assays may also benefit from linkers that prevent quenching. Quenching effects may lead to a nonlinear relationship between the amount of target and the signal.

The methods described herein can be used in combination with oligomers and dendrimers for signal amplification. Non-quenching linkers can allow the synthesis of very bright polymers for antibody labeling. These bright antibodies can be used for cell surface labeling in flow cytometry or for antigen detection methods such as lateral flow assays and fluorescence immunoassays.

The optically (e.g., fluorescently) labeled reagents of the present disclosure can be used as molecular rulers. The substrate may be a fluorescence quencher, a fluorescence donor or a fluorescence acceptor. In some cases, the substrate is a nucleotide. The linker may be attached to the nucleotide on the nucleobase as shown below, where the dye is Atto 633:

The structure shown above is an optical (e.g., fluorescent) labeling reagent comprising a cleavable (via a disulfide bond) moiety and a fluorescent dye attached to a dGTP analog via a pyridinium linker (dGTP-SS-py-Atto633). Other examples of optical labeling reagents are provided throughout the disclosure.

Dye-labeled nucleotides described herein can be used in methods of sequencing by synthesis using mixtures of dye-labeled and natural nucleotides in flow-based protocols. Such methods typically use a low percentage of labeled nucleotides compared to natural nucleotides. However, using a low percentage of labeled nucleotides (e.g., less than 20%) relative to the natural nucleotides in the flow mixture may have several disadvantages: (a) since only a small portion of the templates provides sequence information, this approach requires a higher template copy number; (b) variability in DNA polymerase extension between labeled and unlabeled nucleotides may lead to a context-dependent labeling fraction, thereby increasing the difficulty of distinguishing single-base incorporation from multi-base incorporation; and (c) a low labeled fraction can result in high binomial noise in the population of labeled products. Flow-based sequencing methods that primarily use natural nucleotides are further described in U.S. Patent No. 8,772,473, which is incorporated by reference in its entirety for all purposes.

The semi-rigid linkers provided herein can allow the labeling fraction of dye-labeled nucleotides relative to natural nucleotides in each stream to be sufficiently high (e.g., 20-100% labeling) to avoid or reduce the impact of the aforementioned disadvantages of such schemes. This higher labeling percentage may produce a greater optical (e.g., fluorescent) signal, thereby reducing template requirements. If 100% labeling is used, binomial noise and context variation can be substantially eliminated. A key technical hurdle overcome by the solution described herein is that dyes on adjacent or nearby labeled nucleotides must exhibit minimal quenching. The overall result of the combined advantages may be more accurate DNA sequencing.
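
The effect of the labeling fraction on binomial noise can be illustrated with a minimal Python sketch. It rests on a simplifying assumption that is not part of the disclosure: every incorporation event is labeled independently with a probability equal to the labeling fraction, so polymerase bias between labeled and unlabeled nucleotides is ignored. The function name and the copy numbers are hypothetical and chosen only for illustration.

```python
import math


def labeled_dye_statistics(copies: int, run_length: int, label_fraction: float):
    """Return (mean, standard deviation, CV) of the dye count on a template population.

    Assumption (illustrative only): each incorporation is labeled independently
    with probability `label_fraction`, so the dye count is binomial with
    copies * run_length trials.
    """
    if copies <= 0 or run_length <= 0:
        raise ValueError("copies and run_length must be positive")
    if not 0.0 < label_fraction <= 1.0:
        raise ValueError("label_fraction must be in (0, 1]")
    trials = copies * run_length
    mean = trials * label_fraction
    sd = math.sqrt(trials * label_fraction * (1.0 - label_fraction))
    # Coefficient of variation: relative spread of the labeled-product count.
    return mean, sd, sd / mean


if __name__ == "__main__":
    # Hypothetical population of 1000 template copies, single-base incorporation.
    for fraction in (0.05, 0.20, 0.50, 1.00):
        mean, sd, cv = labeled_dye_statistics(1000, 1, fraction)
        print(f"label fraction {fraction:.2f}: mean {mean:.0f}, CV {cv:.3f}")
```

Under this model the coefficient of variation falls as the labeling fraction rises and reaches zero at 100% labeling, which matches the qualitative statement above that binomial noise can be substantially eliminated when every nucleotide is labeled.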

The present disclosure provides a method for sequencing a nucleic acid molecule. The method can include contacting the nucleic acid molecule with a primer under conditions sufficient for hybridization of the primer to the nucleic acid molecule, thereby generating a sequencing template. The sequencing template can then be contacted with a polymerase (e.g., as described herein) and a solution (e.g., a nucleotide stream) comprising a plurality of optically (e.g., fluorescently) labeled nucleotides (e.g., as described herein). Each optically (e.g., fluorescently) labeled nucleotide of the plurality of optically (e.g., fluorescently) labeled nucleotides can comprise the same chemical structure (e.g., each labeled nucleotide can comprise the same type of dye, the same type of linker, and the same type of nucleotide or nucleotide analog). An optically labeled nucleotide of the plurality of optically labeled nucleotides can be complementary to a nucleic acid molecule at a plurality of positions adjacent to a primer that hybridizes to the nucleic acid molecule. Thus, one or more of the plurality of optically labeled nucleotides can be incorporated into the sequencing template. Where the nucleic acid molecule includes a homopolymeric region, multiple nucleotides (e.g., labeled and unlabeled nucleotides) may be incorporated. Incorporation of multiple nucleotides adjacent to each other can be facilitated by the use of non-terminating nucleotides. The solution comprising the plurality of optically-labeled nucleotides can then be washed away from the sequencing template (e.g., using a wash stream, as described herein). Optical (e.g., fluorescent) signals from the sequencing template can be measured. 
When two or more labeled nucleotides are incorporated into the homopolymeric region, the intensity of the measured optical (e.g., fluorescent) signal may be greater than the optical (e.g., fluorescent) signal that would be measured if a single optically (e.g., fluorescently) labeled nucleotide of the plurality of optically (e.g., fluorescently) labeled nucleotides had been incorporated into the sequencing template. Such methods may be particularly useful for sequencing homopolymers or homopolymeric portions of nucleic acids (i.e., portions having multiple identical bases in a row). An optically-labeled nucleotide in the plurality of optically-labeled nucleotides can comprise a dye (e.g., a fluorescent dye) and a linker (e.g., as described herein) attached to the dye and the nucleotide. The linker may comprise (i) one or more water-solubilizing groups and (ii) two or more ring systems, wherein at least two of the two or more ring systems are connected to each other by no more than two sp3 carbon atoms, such as by no more than two atoms. The linker may comprise a non-protein amino acid comprising a ring system of the two or more ring systems. For example, the linker may comprise hydroxyproline or an amino acid built from, for example, diamines and dicarboxylic acids or aminothiols and thiol carboxylic acids. The linker may be configured to establish a functional length between the dye and the nucleotide of at least about 0.5 nanometers.

The intensity of the measured optical (e.g., fluorescent) signal can be proportional to the number of optically (e.g., fluorescently) labeled nucleotides incorporated into the sequencing template (e.g., where a 100% labeling fraction is used). In other words, quenching may not significantly affect the emitted signal. For example, the intensity can be linearly proportional to the number of optically (e.g., fluorescently) labeled nucleotides incorporated into the sequencing template. When plotted against the number of optically (e.g., fluorescently) labeled nucleotides incorporated into the sequencing template, the intensity of the measured optical (e.g., fluorescent) signal can be linear with a slope of about 1.0. In the case where less than 100% of the substrate is labeled (e.g., less than 100% of the nucleotides in the nucleotide stream are labeled), the optical (e.g., fluorescent) signal emitted by the substrate (e.g., a nucleotide or nucleotide analog) incorporated into a plurality of growing nucleic acid strands (e.g., a plurality of growing nucleic acid strands coupled to sequencing templates coupled to a support, as described herein) can be proportional to the length of the homopolymer region of the growing nucleic acid strands. Similarly, where less than 100% of the substrate is labeled (e.g., less than 100% of the nucleotides in each of successive nucleotide streams are labeled), the optical (e.g., fluorescent) signal emitted by the substrate (e.g., a nucleotide or nucleotide analog) incorporated into a plurality of growing nucleic acid strands (e.g., a plurality of growing nucleic acid strands coupled to sequencing templates coupled to a support, as described herein) can be proportional to the length of the heteropolymer and/or homopolymer region of the growing nucleic acid strands. 
In some such cases, the intensity of the measured optical (e.g., fluorescent) signal can be linearly proportional to the length of the heteropolymeric and/or homopolymeric region into which the substrate has been incorporated. For example, when the optical (e.g., fluorescent) signal is plotted against the length, in substrates, of the heteropolymeric and/or homopolymeric region into which the substrate has been incorporated, the measured optical (e.g., fluorescent) signal can be linear with a slope of about 1.0.
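
One way to check the linearity described above is to fit a straight line to the measured signal as a function of homopolymer length after normalizing by the signal of a single incorporation. The following Python sketch uses an ordinary least-squares fit. The function names, the normalization by a `unit_signal` value, and the example intensities are assumptions made for illustration and are not taken from the disclosure.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def normalized_slope(lengths, signals, unit_signal):
    """Slope of signal versus length after dividing by the single-incorporation signal.

    A value near 1.0 indicates that each added labeled nucleotide contributes
    about one unit of signal, i.e. little quenching between neighboring dyes.
    """
    if unit_signal <= 0:
        raise ValueError("unit_signal must be positive")
    scaled = [s / unit_signal for s in signals]
    slope, _ = fit_line(lengths, scaled)
    return slope


if __name__ == "__main__":
    # Hypothetical intensities (arbitrary units) for homopolymer lengths 1 to 5.
    lengths = [1, 2, 3, 4, 5]
    signals = [102.0, 198.0, 305.0, 401.0, 497.0]
    print(normalized_slope(lengths, signals, unit_signal=100.0))
```

A slope well below 1.0 in such a fit would suggest that adjacent dyes quench one another, whereas a slope close to 1.0 is the behavior the semi-rigid linkers are intended to provide.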

A solution comprising a plurality of optically (e.g., fluorescently) labeled nucleotides can also comprise unlabeled nucleotides (e.g., the labeling fraction can be less than 100%). For example, at least about 20% of the nucleotides in solution may be optically labeled, and at least about 80% of the nucleotides in solution may not be optically labeled. In some cases, a majority of the nucleotides in solution can be optically labeled (e.g., about 50-100%).

In some cases, two or more optically (e.g., fluorescently) labeled nucleotides of the plurality of optically (e.g., fluorescently) labeled nucleotides are incorporated into the sequencing template (e.g., into the homopolymeric region). In some cases, three or more optically (e.g., fluorescently) labeled nucleotides of the plurality of optically (e.g., fluorescently) labeled nucleotides are incorporated into the sequencing template. The number of optically labeled nucleotides incorporated into the sequencing template during a given nucleotide flow may depend on the homopolymeric nature of the nucleic acid molecule. In some cases, a first optically (e.g., fluorescently) labeled nucleotide of the plurality of optically (e.g., fluorescently) labeled nucleotides is incorporated within four positions of a second optically (e.g., fluorescently) labeled nucleotide of the plurality of optically (e.g., fluorescently) labeled nucleotides.

Optically (e.g., fluorescently) labeled nucleotides can comprise a cleavable group to facilitate cleavage of an optical (e.g., fluorescent) label (e.g., as described herein). In some cases, the method can further comprise, after incorporating the one or more optically (e.g., fluorescently) labeled nucleotides and washing away residual solution, cleaving the optical (e.g., fluorescent) label of the one or more optically (e.g., fluorescently) labeled nucleotides incorporated into the sequencing template (e.g., as described herein). The cleavage stream may be followed by a further wash stream.

In some cases, the nucleotide stream and wash stream may be followed by a "chase" stream containing unlabeled nucleotides and no labeled nucleotides. The chase stream can be used to complete a sequencing reaction for a given nucleotide position or positions of a sequencing template (e.g., across a plurality of such templates immobilized to a support). The chase stream may precede the detection of the optical signal from the template. Alternatively, the chase stream may follow the detection of the optical signal from the template. The chase stream may precede the cleavage stream. Alternatively, the chase stream may follow the cleavage stream. The chase stream may be followed by a wash stream.

The methods provided herein can also be used to sequence heteropolymers and/or heteropolymer regions (i.e., portions that are not homopolymers) of nucleic acid molecules. Thus, the methods described herein can be used to sequence nucleic acid molecules having any degree of heteropolymeric or homopolymeric properties.

With respect to homopolymers, a nucleotide flow at a homopolymer region may incorporate several nucleotides in a row. Contacting a sequencing template comprising a nucleic acid molecule comprising a homopolymeric region (e.g., a nucleic acid molecule hybridized to an unextended primer) with a solution comprising a plurality of nucleotides (e.g., labeled and unlabeled nucleotides), wherein each nucleotide of the plurality of nucleotides is of the same type, may result in the incorporation of multiple nucleotides of the plurality of nucleotides into the sequencing template. In some cases, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, or at least 10 nucleotides are incorporated (i.e., in a homopolymeric region of a nucleic acid molecule). The plurality of nucleotides incorporated into the sequencing template can comprise a plurality of labeled nucleotides (e.g., optically labeled, e.g., fluorescently labeled), as described herein. In this case, one or more of the nucleotides incorporated into the homopolymer region can be labeled and can occupy positions that are adjacent or not adjacent to other labeled nucleotides incorporated into the homopolymer region. The signal intensity obtained from a nucleic acid molecule can be proportional to the number of incorporated labeled nucleotides (e.g., using a 100% labeling fraction). For example, an optical signal (e.g., a fluorescent signal) obtained from a nucleic acid molecule comprising two labeled nucleotides may have a greater intensity than an optical signal obtained from a nucleic acid molecule comprising one labeled nucleotide. Furthermore, the intensity of the signal obtained from a nucleic acid molecule may depend on the relative position of the labeled nucleotides within the nucleic acid molecule. 
For example, a nucleic acid molecule comprising two labeled nucleotides in non-adjacent positions may provide a different signal intensity than a nucleic acid molecule comprising two labeled nucleotides in adjacent positions. Quenching in such systems can be optimized by careful selection of the linker and dye (e.g., fluorescent dye). In some cases, the plot of optical signal (e.g., fluorescence) versus homopolymer length can be linear. For example, for a collection of growing nucleic acid strands that include a homopolymer region incorporating labeled nucleotides, the measured optical signal can be approximately linearly proportional to the nucleotide length of the homopolymer region.

In another aspect, the present disclosure provides a method of sequencing a nucleic acid molecule. The method can include contacting the nucleic acid molecule with a primer under conditions sufficient for hybridization of the primer to the nucleic acid molecule, thereby generating a sequencing template. It can then be contacted with a polymerase and a first solution comprising a plurality of first optically (e.g., fluorescently) labeled nucleotides (and optionally, a plurality of first unlabeled nucleotides). Each first optically (e.g., fluorescently) labeled nucleotide in the plurality of first optically (e.g., fluorescently) labeled nucleotides is of the same type. A first optically (e.g., fluorescently) labeled nucleotide of the plurality of first optically (e.g., fluorescently) labeled nucleotides can be complementary to a nucleic acid molecule to be sequenced at a position adjacent to the primer. A first optically (e.g., fluorescently) labeled nucleotide of the plurality of first optically (e.g., fluorescently) labeled nucleotides can thus be incorporated into the sequencing template to generate an extended primer. The first solution comprising the plurality of first optically (e.g., fluorescently) labeled nucleotides can then be washed away from the sequencing template (e.g., using a wash solution). A first optical (e.g., fluorescent) signal emitted by the sequencing template can then be measured (e.g., as described herein). The sequencing template can then be contacted with a polymerase and a second solution comprising a plurality of second optically (e.g., fluorescently) labeled nucleotides (and, optionally, a plurality of second unlabeled nucleotides). Each of the plurality of second optically (e.g., fluorescently) labeled nucleotides can be of the same type. 
A second optically (e.g., fluorescently) labeled nucleotide of the plurality of second optically (e.g., fluorescently) labeled nucleotides can be complementary to a nucleic acid molecule to be sequenced at a position adjacent to the extended primer. A second optically (e.g., fluorescently) labeled nucleotide of the plurality of second optically (e.g., fluorescently) labeled nucleotides can thus be incorporated into the sequencing template. A second solution comprising a plurality of second optically (e.g., fluorescently) labeled nucleotides can then be washed away from the sequencing template. A second optical (e.g., fluorescent) signal emitted by the sequencing template can then be measured. In some cases, the intensity of the second optical (e.g., fluorescent) signal may be greater than the intensity of the first optical (e.g., fluorescent) signal.

A first optically-labeled nucleotide in the plurality of first optically-labeled nucleotides can comprise a first dye (e.g., a fluorescent dye) and a first linker (e.g., as described herein) attached to the first dye and the first nucleotide. Similarly, a second optically-labeled nucleotide in the plurality of second optically-labeled nucleotides can comprise a second dye (e.g., a fluorescent dye) and a second linker (e.g., as described herein) attached to the second dye and the second nucleotide. The first linker can comprise (i) one or more water-solubilizing groups and (ii) two or more ring systems, wherein at least two of the two or more ring systems are connected to each other by no more than two sp3 carbon atoms, such as by no more than two atoms. For example, at least two of the two or more ring systems may be connected to each other through sp2 carbon atoms. The first linker may comprise a non-protein amino acid comprising a ring system of the two or more ring systems. For example, the first linker can comprise one or more hydroxyproline moieties (e.g., as described herein). The first linker can be configured to establish a functional length between the first dye and the first nucleotide of at least about 0.5 nanometers. Similarly, the second linker may comprise (i) one or more water-solubilizing groups and (ii) two or more ring systems, wherein at least two of the two or more ring systems are connected to each other by no more than two sp3 carbon atoms, such as by no more than two atoms. For example, at least two of the two or more ring systems may be connected to each other through sp2 carbon atoms. The second linker may comprise a non-protein amino acid comprising a ring system of the two or more ring systems. For example, the second linker may comprise one or more hydroxyproline moieties (e.g., as described herein). 
The second linker can be configured to establish a functional length between the second dye and the second nucleotide of at least about 0.5 nanometers. The first linker and the second linker may have the same structure. Alternatively, the first linker and the second linker may have different structures. The first linker and the second linker may comprise a shared structural motif, such as a shared cleavable component (e.g., as described herein). The first linker and/or the second linker may comprise a cleavable group configured to be cleaved with a cleavage reagent (e.g., as described herein).

The first solution comprising a plurality of first optically (e.g., fluorescently) labeled nucleotides may also comprise first unlabeled nucleotides. For example, about 20% of the nucleotides in the first solution may be unlabeled. In some cases, at least 20% of the nucleotides in the first solution can be optically labeled, e.g., at least 50% or at least 80%. An unlabeled nucleotide can comprise the same nucleotide moiety (e.g., a canonical nucleotide moiety) as an optically labeled nucleotide. Similarly, the second solution comprising a plurality of second optically-labeled nucleotides may also comprise second unlabeled nucleotides. For example, about 20% of the nucleotides in the second solution may be unlabeled. In some cases, at least 20% of the nucleotides in the second solution may be optically labeled, e.g., at least 50% or at least 80%. An unlabeled nucleotide can comprise the same nucleotide moiety (e.g., a canonical nucleotide moiety) as an optically labeled nucleotide.

The plurality of first optically (e.g., fluorescently) labeled nucleotides can be different from the plurality of second optically (e.g., fluorescently) labeled nucleotides. For example, the plurality of first optically (e.g., fluorescently) labeled nucleotides and the plurality of second optically (e.g., fluorescently) labeled nucleotides can comprise the same optical (e.g., fluorescent) label (e.g., the same dye) and different nucleotides. Alternatively, the plurality of first optically (e.g., fluorescently) labeled nucleotides and the plurality of second optically (e.g., fluorescently) labeled nucleotides can comprise different optical (e.g., fluorescent) labels (e.g., different dyes) and the same nucleotide. In some cases, the plurality of first optically (e.g., fluorescently) labeled nucleotides and the plurality of second optically (e.g., fluorescently) labeled nucleotides can comprise different optical (e.g., fluorescent) labels (e.g., different dyes) and different nucleotides. A first dye in a first plurality of optically-labeled nucleotides and a second dye in a second plurality of optically-labeled nucleotides can emit signals at about the same wavelength or range of wavelengths (e.g., whether the first dye and the second dye have the same or different chemical structures). For example, both the first dye and the second dye may emit a signal in the green region of the visible portion of the electromagnetic spectrum.

In some cases, two or more first optically (e.g., fluorescently) labeled nucleotides can be incorporated into a sequencing template (e.g., in a homopolymeric region of a nucleic acid molecule). In some cases, two or more second optically (e.g., fluorescently) labeled nucleotides can be incorporated into the sequencing template.

Additional optically (e.g., fluorescently) labeled nucleotides can also be provided and incorporated into the sequencing template (e.g., in a continuous stream of nucleotides, as described herein). For example, the method can further include contacting the sequencing template with a polymerase and a third solution comprising a plurality of third optically (e.g., fluorescently) labeled nucleotides, wherein each third optically (e.g., fluorescently) labeled nucleotide of the plurality of third optically (e.g., fluorescently) labeled nucleotides is of the same type, and wherein a third optically (e.g., fluorescently) labeled nucleotide of the plurality of third optically (e.g., fluorescently) labeled nucleotides is complementary to the nucleic acid molecule at a position adjacent to the further extended primer that hybridizes to the nucleic acid molecule, thereby incorporating a third optically (e.g., fluorescently) labeled nucleotide of the plurality of third optically (e.g., fluorescently) labeled nucleotides into the sequencing template; washing a third solution comprising a plurality of third optically (e.g., fluorescently) labeled nucleotides away from the sequencing template; and measuring a third optical (e.g., fluorescent) signal emitted by the sequencing template. In some cases, the intensity of the third optical signal may be greater than the intensity of the first optical (e.g., fluorescent) signal and the intensity of the second optical (e.g., fluorescent) signal. This process can be repeated with a fourth solution, etc. The third and fourth solutions may comprise optically (e.g., fluorescently) labeled nucleotides having nucleotides that are different from the first and second solutions such that each canonical nucleotide (A, C, G and U/T) may be provided to the sequencing template in sequence. 
The cycle in which each canonical nucleotide is provided to the sequencing template may be repeated one or more times to sequence and/or amplify the nucleic acid molecule.
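
The cyclic provision of single-nucleotide solutions can be illustrated with a small idealized simulation in Python. The sketch assumes non-terminating nucleotides, complete incorporation in every flow, and no misincorporation. The function name, the argument `new_strand` (the sequence of the strand being synthesized, read 5' to 3'), and the example sequences are hypothetical and do not appear in the disclosure.

```python
def simulate_flows(new_strand: str, flow_order: str = "ACGT", max_cycles: int = 100):
    """Return a list of (base, incorporated_count) pairs, one per nucleotide flow.

    Idealized model: in each flow, every consecutive position of `new_strand`
    that matches the flowed base is incorporated, so a homopolymer of length n
    is extended by n nucleotides within a single flow.
    """
    position = 0
    flows = []
    for _ in range(max_cycles):
        for base in flow_order:
            run = 0
            while position < len(new_strand) and new_strand[position] == base:
                run += 1
                position += 1
            flows.append((base, run))
            if position == len(new_strand):
                return flows
    raise ValueError("strand not completed: unexpected base or too few cycles")


if __name__ == "__main__":
    # The homopolymeric "AA" and "TT" stretches are each read out in one flow.
    for base, count in simulate_flows("AAGCTT", flow_order="AGCT"):
        print(base, count)
```

With a high labeling fraction and little quenching, the optical signal measured after each flow would be expected to scale with the count returned for that flow, which is what allows homopolymer lengths to be read from signal intensity.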

A third optically-labeled nucleotide in the plurality of third optically-labeled nucleotides can comprise a third dye (e.g., a fluorescent dye) and a third linker (e.g., as described herein) attached to the third dye and the third nucleotide. The third linker can comprise (i) one or more water-solubilizing groups and (ii) two or more ring systems, wherein at least two of the two or more ring systems are connected to each other by no more than two sp3 carbon atoms, such as by no more than two atoms. For example, at least two of the two or more ring systems may be connected to each other through sp2 carbon atoms. The third linker may comprise a non-protein amino acid comprising a ring system of the two or more ring systems. For example, the third linker can comprise one or more hydroxyproline moieties (e.g., as described herein). The third linker can be configured to establish a functional length between the third dye and the third nucleotide of at least about 0.5 nanometers. The third linker and the first linker may have the same or different structures. Similarly, the third linker and the second linker may have the same or different structures. The third dye may have the same structure as the first dye or a different structure. Similarly, the third dye may have the same structure as the second dye or a different structure. The third dye and the first and/or second dye may emit light at approximately the same wavelength or range of wavelengths (e.g., regardless of whether the dyes have the same or different chemical structures). Further, the third nucleotide may be of the same type as the first nucleotide or of a different type, and the third nucleotide may be of the same type as the second nucleotide or of a different type.

The method can further include cleaving the optical (e.g., fluorescent) label from its corresponding nucleotide after washing away (e.g., using a wash solution) a given solution (e.g., a nucleotide stream). For example, after washing away the first solution, the optical (e.g., fluorescent) label of the first optically (e.g., fluorescently) labeled nucleotide incorporated into the sequencing template can be cleaved (e.g., using a cleavage reagent to cleave the cleavable group of the linker of the first optically labeled nucleotide, as described herein). For example, the fluorescent dye(s) of the first optically-labeled nucleotide(s) incorporated into the sequencing template can be cleaved prior to contacting the sequencing template with the second optically-labeled nucleotide (e.g., in the second nucleotide stream, as described herein). Thus, a signal can be detected from one or more first optically labeled nucleotides prior to incorporation of one or more second optically labeled nucleotides into the sequencing template. Cleavage of the fluorescent dye(s) from the first optically-labeled nucleotide(s) incorporated into the sequencing template can provide scarred nucleotide(s) that comprise a portion of the linker of the first optically-labeled nucleotide or a derivative thereof. Similarly, after washing away the second solution (e.g., second nucleotide stream), the optical (e.g., fluorescent) label of the second optically (e.g., fluorescently) labeled nucleotide incorporated into the sequencing template can be cleaved. All or a portion of the first and second linkers may be cleaved in the respective cleavage processes.

In another aspect, provided herein is a method of sequencing a nucleic acid molecule. The method can include providing a solution comprising a plurality of optically (e.g., fluorescently) labeled nucleotides, wherein each optically (e.g., fluorescently) labeled nucleotide in the plurality of optically (e.g., fluorescently) labeled nucleotides is of the same type. A given optically (e.g., fluorescently) labeled nucleotide in the plurality of fluorescently labeled nucleotides can comprise an optical (e.g., fluorescent) dye linked to the nucleotide through a semi-rigid water-soluble linker having a defined molecular weight. The linker connecting the dye and the nucleotide can provide a functional length between the dye and the nucleotide of at least about 0.5 nanometers (nm). The nucleic acid molecule can then be contacted with the primer under conditions sufficient to hybridize the primer to the nucleic acid molecule to be sequenced to produce a sequencing template. The sequencing template can then be contacted with a polymerase and a solution containing a plurality of optically (e.g., fluorescently) labeled nucleotides, wherein an optically (e.g., fluorescently) labeled nucleotide of the plurality of optically (e.g., fluorescently) labeled nucleotides is complementary to a nucleic acid molecule to be sequenced at a position adjacent to the primer. One or more of the plurality of optically (e.g., fluorescently) labeled nucleotides can thus be incorporated into the sequencing template. A solution comprising a plurality of optically (e.g., fluorescently) labeled nucleotides can be washed away from the sequencing template (e.g., using a wash solution). The optical (e.g., fluorescent) signal emitted by the sequencing template can then be measured.

The linker may comprise (i) one or more water-solubilizing groups and (ii) two or more ring systems, wherein at least two of the two or more ring systems are connected to each other by no more than two sp3 carbon atoms, such as by no more than two atoms (e.g., as described herein). For example, at least two of the two or more ring systems may be connected to each other through sp2 carbon atoms. The linker may comprise a non-protein amino acid comprising a ring system of the two or more ring systems. For example, the linker may comprise one or more hydroxyproline moieties (e.g., as described herein). The linker can establish a functional length between the fluorescent dye and the nucleotide of at least about 0.5 nanometers (e.g., as described herein).

The measured optical (e.g., fluorescent) signal can be proportional to the number of optically (e.g., fluorescently) labeled nucleotides incorporated into the sequencing template. For example, where a 100% labeling fraction is used (e.g., all nucleotides in solution are labeled), quenching may not reduce the emitted signal. In such systems, the measured optical (e.g., fluorescent) signal can be linearly proportional to the number of optically (e.g., fluorescently) labeled nucleotides incorporated into the sequencing template. When plotted against the number of optically (e.g., fluorescently) labeled nucleotides incorporated into the sequencing template, the measured optical (e.g., fluorescent) signal can be linear with a slope of about 1.0. In the case where less than 100% of the nucleotides are labeled (e.g., less than 100% of the nucleotides in solution are labeled), the optical (e.g., fluorescent) signal emitted by the nucleotides incorporated into a plurality of growing nucleic acid strands (e.g., a plurality of growing nucleic acid strands coupled to sequencing templates coupled to a support, as described herein) can be proportional to the length of the homopolymer region of the growing nucleic acid strands. Similarly, where less than 100% of the nucleotides are labeled, the optical (e.g., fluorescent) signal emitted by the nucleotides incorporated into a plurality of growing nucleic acid strands (e.g., a plurality of growing nucleic acid strands coupled to sequencing templates coupled to a support, as described herein) can be proportional to the length of the heteropolymer and/or homopolymer regions of the growing nucleic acid strands. In some such cases, the intensity of the measured optical (e.g., fluorescent) signal can be linearly proportional to the length of the heteropolymeric and/or homopolymeric region into which the nucleotides have been incorporated. 
For example, when the optical (e.g., fluorescence) signal is plotted against the length, in nucleotides, of the heteropolymeric and/or homopolymeric region into which the nucleotides have been incorporated, the measured optical (e.g., fluorescence) signal can be linear with a slope of about 1.0.
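
When the signal scales linearly with the number of incorporated nucleotides, the length of each incorporated run can be estimated by dividing the background-corrected signal by the signal of a single incorporation and rounding to the nearest integer. The Python sketch below illustrates that arithmetic only. The function name, the `unit_signal` and `background` parameters, and the example values are assumptions and do not describe a calling procedure stated in the disclosure.

```python
import math


def call_run_lengths(signals, unit_signal, background=0.0):
    """Estimate the incorporated run length per flow from measured signals.

    Assumes a linear response with a slope of about 1.0 after normalization,
    i.e. signal = background + unit_signal * run_length (plus noise).
    """
    if unit_signal <= 0:
        raise ValueError("unit_signal must be positive")
    calls = []
    for signal in signals:
        normalized = (signal - background) / unit_signal
        # Round half up; negative values from noise are clamped to zero.
        calls.append(max(0, math.floor(normalized + 0.5)))
    return calls


if __name__ == "__main__":
    # Hypothetical per-flow intensities in arbitrary units.
    print(call_run_lengths([110.0, 305.0, 12.0], unit_signal=100.0, background=10.0))
```

Such rounding is only reliable while the spread of the signal at each run length stays well below half of the unit signal, which is one reason reduced quenching and reduced binomial noise matter for longer homopolymers.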

In some cases, a solution containing optically (e.g., fluorescently) labeled nucleotides also contains unlabeled nucleotides. Unlabeled nucleotides can comprise the same nucleotide moiety (e.g., the same canonical nucleotide). In some embodiments, about 5%, about 10%, about 20%, about 30%, about 40%, about 50%, about 60%, about 70%, about 80%, about 90%, or about 100% of the nucleotides in the solution are fluorescently labeled. In some cases, at least about 5%, at least about 10%, at least about 20%, at least about 30%, at least about 40%, at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 90% or more of the nucleotides in the solution are fluorescently labeled. In some cases, at least about 5%, at least about 10%, at least about 20%, at least about 30%, at least about 40%, at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 90% or more of the nucleotides in the solution are not fluorescently labeled.

A plurality of labeled nucleotides can be incorporated adjacent to each other along a nucleic acid molecule. In some cases, a first optically (e.g., fluorescently) labeled nucleotide is incorporated within 4 positions, within 3 positions, within 2 positions, or adjacent to a second optically (e.g., fluorescently) labeled nucleotide (e.g., a second optically labeled nucleotide of the same or different nucleotide type). In some cases, the method further comprises cleaving the optical (e.g., fluorescent) label from the nucleotide after measuring the optical (e.g., fluorescent) signal (e.g., as described herein). Cleaving an optical (e.g., fluorescent) label may leave a scar (e.g., as described herein).

Nucleic acid sequencing assays can be used to evaluate dye-labeled nucleotides. The assay may use a nucleic acid template having a known sequence, which may include one or more homopolymeric regions. The template may be immobilized on the support by an adaptor (e.g., as described herein). A primer having a sequence that is at least partially complementary to the adaptor or a portion thereof can hybridize to the adaptor or a portion thereof and provide a starting point for generating a nucleic acid strand having a sequence that is complementary to a sequence of the template by incorporating labeled and unlabeled nucleotides (e.g., as described herein). Sequencing assays can use four different nucleotide streams, each including a different canonical nucleobase, that can be repeated in a cyclic manner (e.g., cycle 1: A, G, C, U; cycle 2: A, G, C, U; etc.). Each nucleotide stream can include nucleotides (or analogs thereof) comprising a single canonical type of nucleobase, some of which can include the optical labeling reagents provided herein. The labeling fraction (e.g., the percentage of nucleotides included in the stream that are attached to the optical labeling reagent) may vary, for example, between 0.5% and 100%. The labeling fractions may differ for different nucleotide streams. The nucleotides may be non-terminated to facilitate incorporation into homopolymer regions. The template may be contacted with a nucleotide stream followed by one or more wash streams (e.g., as described herein). The template may also be contacted with a cleavage stream (e.g., as described herein) comprising a cleavage reagent configured to cleave a portion of the optical labeling reagent attached to the labeled nucleotides incorporated into the growing nucleic acid strand. A wash stream can be used to remove the cleavage reagent and prepare the template for contact with a subsequent nucleotide stream. After each nucleotide stream, emissions can be detected from labeled nucleotides incorporated into the growing nucleic acid strand.

An example sequencing procedure 1800 is provided in FIG. 18. In process 1802, a template and a primer configured for nucleotide incorporation are provided. A first sequencing cycle 1804 is then performed. The first sequencing cycle 1804 includes four flow processes 1804a, 1804b, 1804c, and 1804d, each of which includes multiple flows. Nucleotides 1, 2, 3, and 4 can each include nucleobases of a different canonical type (e.g., A, G, C, and U). A given nucleotide stream can include labeled nucleotides (e.g., nucleotides labeled with an optical labeling reagent provided herein) and unlabeled nucleotides. The labeling fraction may be different for each nucleotide stream. That is, A, B, C, and D in FIG. 18 may be the same or different and may range from 0% to 100% (e.g., as described herein). The labels and linkers used to label nucleotides 1, 2, 3, and 4 may be of the same or different types. For example, nucleotide 1 may have a linker that includes a cleavable linker and a hyp10 linker and a first green dye, and nucleotide 2 may have a linker that includes a cleavable linker but does not include a hyp10 linker and a second green dye. The second green dye may be the same as or different from the first green dye. The cleavable linkers associated with different nucleotides may be the same or different. Flow process 1804a can include a nucleotide stream (e.g., a stream including a plurality of nucleotides of the nucleotide 1 type, A% of which can be labeled). In this stream, labeled and unlabeled nucleotides can be incorporated into the growing strand (e.g., using a polymerase). The first wash stream ("wash stream 1") can be used to remove unincorporated nucleotides and related reagents. A cleavage stream comprising a cleavage reagent may be provided to remove all or a portion of the optical labeling reagent attached to the incorporated nucleotides. For example, the labeled nucleotides can include a cleavable linker moiety that can be cleaved upon contact with a cleavage reagent to provide a scarred nucleotide. The second wash stream ("wash stream 2") can be used to remove the cleavage reagent and cleaved material. Flow process 1804a may also include a "chase" process, in which a nucleotide stream that includes only unlabeled nucleotides of the nucleotide 1 type is flowed. Such a chase process may be followed by a wash stream. The chase process and its accompanying wash stream may occur after the initial nucleotide stream and wash stream 1, or after the cleavage stream and wash stream 2. The next flow process 1804b can then begin and proceed in a similar manner. After completing processes 1804b, 1804c, and 1804d, the first sequencing cycle 1804 may be complete. A second sequencing cycle 1806 may then begin. Cycle 1806 may include the same flow processes in the same or a different order. Additional cycles may be performed until all or part of the template has been sequenced. Detection of incorporated nucleotides via emission detection can be performed after the nucleotide stream and initial wash stream and before the cleavage stream of each flow process (e.g., flow process 1804a can include a detection process between wash stream 1 and the cleavage stream, etc.). Templates interrogated by such sequencing processes may be immobilized on a support (e.g., as described herein). A plurality of such templates (e.g., at least about 100, 200, 500, 1000, 10,000, 100,000, 500,000, 1,000,000, or more templates) may be interrogated simultaneously in this manner (e.g., in a clonal manner). In such systems, incorporation of nucleotides can be detected as an average over multiple templates, which can allow the use of a labeling fraction of less than 100%.
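The cyclic flow scheme described above can be sketched as a small Python simulation. This is an illustration under simplifying assumptions rather than the disclosed procedure: nucleotides are non-terminated, incorporation is complete in every flow, and the input is the sequence of the growing strand to be synthesized (not the template). The names `FLOW_PROCESS_STEPS` and `simulate_flows` are hypothetical; the step order follows the text, with detection placed between wash stream 1 and the cleavage stream.

```python
# Order of operations within one flow process (e.g., 1804a), per the description.
FLOW_PROCESS_STEPS = (
    "nucleotide stream",
    "wash stream 1",
    "detection",
    "cleavage stream",
    "wash stream 2",
)


def simulate_flows(strand_to_synthesize, flow_order=("A", "G", "C", "U")):
    """Return a list of (base, run_length) pairs, one per nucleotide flow.

    Each full pass through flow_order corresponds to one sequencing cycle.
    With non-terminated nucleotides, a single flow extends the strand through
    an entire homopolymer run of the flowed base.
    """
    unknown = set(strand_to_synthesize) - set(flow_order)
    if unknown:
        raise ValueError(f"bases not covered by flow_order: {sorted(unknown)}")

    position = 0
    flowgram = []
    while position < len(strand_to_synthesize):
        for base in flow_order:  # one sequencing cycle
            run_length = 0
            while (position < len(strand_to_synthesize)
                   and strand_to_synthesize[position] == base):
                run_length += 1
                position += 1
            flowgram.append((base, run_length))
    return flowgram


for base, run_length in simulate_flows("AAGCU"):
    print(base, run_length)
```

A flow that records a run length of zero corresponds to a nucleotide stream in which no incorporation occurs, so no signal would be expected for that flow.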

In some cases, for any of the foregoing methods, the nucleotide is guanine (G) and the linker reduces quenching between the nucleotide and an optical (e.g., fluorescent) dye.

In some cases, for any of the foregoing methods, an optically (e.g., fluorescently) labeled nucleotide comprising a linker provided herein is incorporated into a sequencing template more efficiently than another optically (e.g., fluorescently) labeled nucleotide comprising the same nucleotide and an optical (e.g., fluorescent) dye, but not comprising a linker. In some cases, for any of the foregoing methods, an optically (e.g., fluorescently) labeled nucleotide comprising a linker provided herein is incorporated into a sequencing template with higher fidelity than another optically (e.g., fluorescently) labeled nucleotide comprising the same nucleotide and an optical (e.g., fluorescent) dye, but not comprising a linker.

For any of the sequencing methods provided herein, the polymerase used may be a family A polymerase, such as Taq, Klenow, or Bst polymerase. Alternatively, for any of the sequencing methods provided herein, the polymerase can be a family B polymerase, such as Vent (exo-) or a Therminator™ polymerase.

In one aspect, the disclosure provides methods of sequencing nucleic acid molecules using optically (e.g., fluorescently) labeled nucleotides described herein. A method may comprise providing a plurality of nucleic acid molecules, which may comprise or be part of one or more colonies. The plurality of nucleic acid molecules may have sequence homology to the template sequence. The method can include contacting the plurality of nucleic acid molecules with a solution comprising a plurality of nucleotides (e.g., a solution comprising a plurality of optically-labeled nucleotides) under conditions sufficient to incorporate a subset of the plurality of nucleotides into a plurality of growing nucleic acid strands that are complementary to the plurality of nucleic acid molecules. In some cases, at least about 20% of a subset of the plurality of nucleotides are optically (e.g., fluorescently) labeled nucleotides (e.g., as described herein). The method can include detecting one or more signals or signal changes from the labeled nucleotides incorporated into the plurality of growing nucleic acid strands, wherein the one or more signals or signal changes indicate that the labeled nucleotides have been incorporated into the plurality of growing nucleic acid strands.

The optically (e.g., fluorescently) labeled nucleotide in the plurality of nucleotides can be non-terminating. In such cases, the growing strand may incorporate one or more contiguous nucleotides during the period (e.g., complementary bases of multiple nucleotides in solution are not present at multiple locations adjacent to a primer that hybridizes to a nucleic acid molecule). One or more signals or signal changes detected from optically (e.g., fluorescently) labeled nucleotides can indicate that consecutive nucleotides have been incorporated into a plurality of growing nucleic acid strands. Methods for determining multiple fluorophores from detected signals or signal changes are described elsewhere herein.

Alternatively, the optically (e.g., fluorescently) labeled nucleotide can be terminated. In this case, no more than one nucleotide can be incorporated per growing strand in each flow cycle until synthesis is terminated. One or more signals or signal changes detected from optically (e.g., fluorescently) labeled nucleotides can indicate that the nucleotides have been incorporated into a plurality of growing nucleic acid strands. The terminating group of the labeled nucleotide can be cleaved before, during, or after detection (e.g., to facilitate sequencing of the homopolymer, and/or to reduce potential background and/or quenching problems).

Alternatively or additionally, the optically (e.g., fluorescently) labeled nucleotides can include a mixture of terminated and non-terminated nucleotides. In this case, one or more contiguous nucleotides may be incorporated into the growing strand to produce an extended primer. The solution containing the plurality of terminated and non-terminated nucleotides can then be washed away from the sequencing template. An unlabeled nucleotide of a plurality of nucleotides can comprise the same type of nucleotide moiety (e.g., the same canonical nucleotide) as a labeled nucleotide of the plurality of nucleotides.

In one aspect, the present disclosure provides compositions comprising one or more fluorescently labeled nucleotides and methods of using the same. The composition can comprise a solution containing a fluorescently labeled nucleotide (e.g., as described herein). The fluorescently labeled nucleotide can comprise a fluorescent dye attached to a nucleotide or nucleotide analog (e.g., as described herein) via a linker (e.g., as described herein). The linker may comprise (i) one or more water-solubilizing groups and (ii) two or more ring systems. At least two of the two or more ring systems may be connected to each other by no more than two sp³ carbon atoms, for example without any intervening sp³ carbon atoms. For example, at least two of the two or more ring systems may be connected to each other by no more than two atoms. For example, at least two of the two or more ring systems may be connected to each other by an sp² carbon atom. The linker may comprise a non-protein amino acid comprising a ring system of the two or more ring systems. The fluorescently labeled nucleotide can be configured to emit a fluorescent signal. The fluorescently labeled nucleotide can comprise a plurality of amino acids, such as a plurality of non-protein (e.g., non-natural) amino acids. For example, the linker may comprise a plurality of hydroxyprolines. At least one of the one or more water-solubilizing groups may be appended to a ring system of the two or more ring systems. The one or more water-solubilizing groups may be selected from pyridinium, imidazolium, quaternary ammonium groups, sulfonates, phosphates, alcohols, amines, imines, nitriles, amides, thiols, carboxylic acids, polyethers, aldehydes, boronic acids, and boronic esters. The linker may comprise a cleavable group (e.g., an azidomethyl group, a disulfide bond, a hydrocarbyl dithiomethyl group, or a 2-nitrobenzyloxy group) configured to be cleaved to separate the fluorescent dye from the nucleotide.

The solution (e.g., nucleotide stream) can comprise a plurality of fluorescently labeled nucleotides, each of which can comprise the same type of fluorescent dye, the same type of linker, and the same type of nucleotide. Each linker of each fluorescently labeled nucleotide in the plurality of fluorescently labeled nucleotides can have the same molecular weight (e.g., the linkers may not comprise a polymer having a range of molecular weights). The solution can also comprise a plurality of unlabeled nucleotides, wherein each nucleotide in the plurality of unlabeled nucleotides is of the same type as each nucleotide in the plurality of fluorescently labeled nucleotides. The ratio of the plurality of fluorescently labeled nucleotides to the plurality of unlabeled nucleotides in the solution can be at least about 1:4 (e.g., the labeling fraction can be at least 20%). For example, the ratio can be at least 1:1 (e.g., the labeling fraction can be at least 50%). Alternatively, the solution may not contain any unlabeled nucleotides and the labeling fraction may be 100%.
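The conversion between a labeled:unlabeled ratio and a labeling fraction can be checked with a short Python sketch. The function name `labeling_fraction` is hypothetical; the arithmetic is simply labeled / (labeled + unlabeled), which reproduces the figures stated above (1:4 gives 20%, 1:1 gives 50%, and no unlabeled nucleotides gives 100%).

```python
from fractions import Fraction


def labeling_fraction(labeled_parts, unlabeled_parts):
    """Fraction of nucleotides that are labeled, given a labeled:unlabeled ratio."""
    if labeled_parts < 0 or unlabeled_parts < 0:
        raise ValueError("parts must be non-negative")
    total = labeled_parts + unlabeled_parts
    if total == 0:
        raise ValueError("at least one nucleotide is required")
    # Fraction keeps the result exact for integer ratios such as 1:4.
    return Fraction(labeled_parts, total)


for labeled, unlabeled in [(1, 4), (1, 1), (1, 0)]:
    fraction = labeling_fraction(labeled, unlabeled)
    print(f"{labeled}:{unlabeled} -> {float(fraction) * 100:.0f}%")
```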

A solution (e.g., a stream of nucleotides) can be provided to a template nucleic acid molecule coupled to a nucleic acid strand. The template nucleic acid molecule may be immobilized on a support (e.g., as described herein). For example, the template nucleic acid molecule may be immobilized on a support by an adaptor. For example, the template nucleic acid molecule may be immobilized on a support by a primer that hybridizes thereto. The nucleic acid strand may be at least partially complementary to a portion of the template nucleic acid molecule. The template nucleic acid molecule and the nucleic acid strand coupled thereto may be subjected to conditions sufficient to incorporate the fluorescently labeled nucleotides of the solution into the nucleic acid strand coupled to the template nucleic acid molecule. Incorporation of the fluorescently labeled nucleotide can be accomplished using a polymerase (e.g., as described herein). More than one fluorescently labeled nucleotide of the solution can be incorporated, for example, into a homopolymeric region of the template nucleic acid molecule. Alternatively or additionally, unlabeled nucleotides can be incorporated (e.g., adjacent to fluorescently labeled nucleotides), for example, into a homopolymeric region of a template nucleic acid molecule. A signal (e.g., a fluorescent signal) can be detected from a fluorescently labeled nucleotide incorporated into a nucleic acid strand. Prior to detection of the signal, a wash solution can be used to remove fluorescently labeled nucleotides that are not incorporated into the nucleic acid strand. After detecting the signal, the fluorescently labeled nucleotides incorporated into the nucleic acid strand can be contacted with a cleavage agent configured to cleave the fluorescent dye from the nucleotides. 
The cleavage reagent can be configured to cleave the linker to provide a nucleotide attached to a portion of the linker, which can comprise a thiol moiety, an aromatic moiety, or a combination thereof. A nucleic acid strand, e.g., a plurality of nucleic acid strands coupled to a plurality of template nucleic acid molecules, can be contacted with a chase stream that contains only unlabeled nucleotides of the same nucleotide type (e.g., before or after detection of a signal). The nucleic acid strand coupled to the template nucleic acid molecule may also be contacted with one or more additional wash streams. The nucleic acid strand coupled to the template nucleic acid molecule may be contacted with a further solution comprising further fluorescently labeled nucleotides, for example further fluorescently labeled nucleotides comprising different types of nucleotides. The dye of the further fluorescently labeled nucleotide may be of the same type as the dye of the fluorescently labeled nucleotide. Similarly, the linker of the further fluorescently labeled nucleotide can be of the same type as the linker of the fluorescently labeled nucleotide.

In another aspect, the present disclosure provides a method comprising providing a fluorescently labeled reagent (e.g., as described herein). The fluorescent labeling reagent may comprise a fluorescent dye and a linker attached to the fluorescent dye. The linker may comprise (i) one or more water-solubilizing groups and (ii) two or more ring systems. At least two of the two or more ring systems may be connected to each other by no more than two sp³ carbon atoms, for example by no more than two atoms. For example, at least two of the two or more ring systems may be connected to each other by an sp² carbon atom. The linker may comprise a non-protein amino acid comprising a ring system of the two or more ring systems. The fluorescent labeling reagent may be configured to emit a fluorescent signal. The fluorescent labeling reagent can comprise a plurality of amino acids, such as a plurality of non-protein (e.g., non-natural) amino acids. For example, the linker may comprise a plurality of hydroxyprolines. At least one of the one or more water-solubilizing groups may be appended to a ring system of the two or more ring systems. The one or more water-solubilizing groups may be selected from pyridinium, imidazolium, quaternary ammonium groups, sulfonates, phosphates, alcohols, amines, imines, nitriles, amides, thiols, carboxylic acids, polyethers, aldehydes, boronic acids, and boronic esters.

The substrate can be contacted with a fluorescent labeling reagent to produce a fluorescently labeled substrate, wherein the linker attached to the fluorescent dye is associated with the substrate. The substrate may be a nucleotide or nucleotide analog (e.g., as described herein). Alternatively, the substrate may be a protein, lipid, cell, or antibody. The fluorescently labeled substrate can be configured to emit a fluorescent signal (e.g., upon excitation in a suitable energy range), which can be detected (e.g., using imaging-based detection). The linker may comprise a cleavable group (e.g., an azidomethyl group, a disulfide bond, a hydrocarbyl dithiomethyl group, or a 2-nitrobenzyloxy group) configured to be cleaved to separate the fluorescent dye from the substrate. The fluorescently labeled substrate can be contacted with a cleavage reagent configured to cleave the fluorescently labeled reagent or a portion thereof from the fluorescently labeled substrate to produce a scarred substrate. The scarred substrate may comprise thiol moieties, aromatic moieties, or combinations thereof. Prior to generating the scarred substrate, the fluorescently labeled substrate and the nucleic acid molecule can be subjected to conditions sufficient to incorporate the fluorescently labeled substrate into the nucleic acid molecule. Incorporation can be accomplished using a polymerase (e.g., as described herein). More than one fluorescently labeled substrate can be incorporated, for example, into a homopolymeric region of a nucleic acid molecule. For example, additional fluorescently labeled substrate can be incorporated adjacent to the location of incorporation of the fluorescently labeled substrate. Alternatively or additionally, unlabeled substrates (e.g., nucleotides of the same type as the nucleotides of the fluorescently labeled nucleotides) can also be incorporated into the nucleic acid molecule, e.g., into adjacent positions of the nucleic acid molecule. The incorporation of additional fluorescently labeled substrate can be performed before or after the scarred substrate is generated. Similarly, incorporation of unlabeled substrate can be performed before or after the scarred substrate is generated.

A nucleic acid molecule, e.g., a nucleic acid molecule of a plurality of nucleic acid molecules, can be contacted with a chase stream that contains only unlabeled substrates of the same type (e.g., before or after detection of a signal from the nucleic acid molecule). The nucleic acid molecule may also be contacted with one or more additional wash streams. The nucleic acid molecule may be contacted with a further solution comprising a further fluorescently labeled substrate, e.g. a further fluorescently labeled substrate comprising a different type of nucleotide. The dye of the further fluorescently labeled substrate may be of the same type as the dye of the fluorescently labeled substrate. Similarly, the linker of the additional fluorescently labeled substrate can be of the same type as the linker of the fluorescently labeled substrate.

The nucleic acid molecule may be immobilized on a support (e.g., as described herein). For example, the nucleic acid molecule may be immobilized on a support by an adaptor. For example, the nucleic acid molecule may be immobilized on a support by a primer that hybridizes thereto. A nucleic acid molecule can comprise a first nucleic acid strand that is at least partially complementary to a portion of a second nucleic acid strand. The second nucleic acid strand can comprise a template nucleic acid sequence or a complement thereof.

The labeled nucleotides of the present disclosure can be used during sequencing operations involving a high fraction of labeled nucleotides. For example, the present disclosure provides a method comprising contacting a nucleic acid molecule (e.g., a template nucleic acid molecule) with a solution comprising a plurality of nucleotides under conditions sufficient to incorporate a first labeled nucleotide and a second labeled nucleotide of the plurality of nucleotides into a growing strand that is at least partially complementary to the nucleic acid molecule. The first labeled nucleotide and the second labeled nucleotide may be of the same canonical base type. The first nucleotide may comprise a fluorescent dye (e.g., as described herein) that may be associated with the first nucleotide via a linker (e.g., as described herein). The second nucleotide may comprise the same fluorescent dye (e.g., associated with the second nucleotide through a linker having the same chemical structure as the linker that associates the first nucleotide and the fluorescent dye). The fluorescent dye coupled to the nucleotide (e.g., the first and/or second nucleotide) can be cleavable (e.g., upon application of a cleavage reagent). At least about 20% of the plurality of nucleotides can be labeled nucleotides. For example, at least 20% of the plurality of nucleotides can be associated with a fluorescent labeling agent (e.g., as described herein). For example, at least about 50%, 70%, 80%, 90%, 95%, or 99% of the plurality of nucleotides can be labeled nucleotides. For example, all nucleotides of the plurality of nucleotides can be labeled nucleotides (e.g., the labeling fraction can be 100%). One or more signals or signal changes can be detected from the first labeled nucleotide and the second labeled nucleotide (e.g., as described herein). The one or more signals or signal changes may comprise a fluorescent signal or signal change. 
One or more signals or signal changes can indicate incorporation of a first labeled nucleotide and a second labeled nucleotide. One or more signals or signal changes can be resolved to determine the sequence of the nucleic acid molecule or portion thereof. Resolving one or more signals or signal changes can include determining the number of consecutive nucleotides from the solution incorporated into the growing strand. The number of consecutive nucleotides may be selected from 2, 3, 4, 5, 6, 7 or 8 nucleotides. Resolving one or more signals or signal variations may include processing solution tolerances. A third nucleotide can also be incorporated into the growing strand (e.g., before or after detection of one or more signals or signal changes). The third nucleotide may be one nucleotide of a plurality of nucleotides in solution. Alternatively, the third nucleotide may be provided in a separate solution, for example in a "chase" stream (e.g., as described herein). The third nucleotide may be unlabeled. Alternatively, the third nucleotide may be labeled. The first labeled nucleotide and the third nucleotide may be of the same canonical base type. Alternatively, the first labeled nucleotide and the third nucleotide may be different canonical base types.
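One way to picture resolving signals into a number of consecutive incorporated nucleotides is the Python sketch below. It is an assumption-laden illustration, not the resolution method of the disclosure: it presumes that per-flow signals have already been normalized so that a single incorporation gives a value near `unit_signal`, and it simply rounds to the nearest whole number. The names `call_run_length` and `call_sequence` are hypothetical.

```python
def call_run_length(signal, unit_signal=1.0):
    """Estimate how many consecutive nucleotides were incorporated in one flow.

    Assumes the signal scales linearly with run length (no quenching) and has
    been normalized so that one incorporation is approximately unit_signal.
    """
    if unit_signal <= 0:
        raise ValueError("unit_signal must be positive")
    # Round half up; negative noise is treated as zero incorporations.
    return max(0, int(signal / unit_signal + 0.5))


def call_sequence(flow_signals, unit_signal=1.0):
    """Build the growing-strand sequence from (base, signal) pairs, one per flow."""
    called = []
    for base, signal in flow_signals:
        called.append(base * call_run_length(signal, unit_signal))
    return "".join(called)


flows = [("A", 2.1), ("G", 0.05), ("C", 0.9), ("U", 3.2)]
print(call_sequence(flows))
```

Simple rounding becomes less reliable as runs lengthen and noise accumulates, so a practical implementation would likely need a calibrated or probabilistic model in place of this threshold.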

The method may further comprise cleaving the fluorescent dye coupled to the first labeled nucleotide. The fluorescent dye may be cleaved by application of a cleavage reagent configured to cleave a linker associating the first labeled nucleotide and the fluorescent dye. The nucleic acid molecule can be contacted with a second solution comprising a second plurality of nucleotides under conditions sufficient to incorporate a third labeled nucleotide of the second plurality of nucleotides into the growing strand. At least about 20% of the second plurality of nucleotides can be labeled nucleotides (e.g., as described herein). One or more second signals or signal changes can be detected from a third labeled nucleotide (e.g., as described herein). One or more of the second signals or signal changes can be resolved to determine a second sequence of the nucleic acid molecule or portion thereof. The first labeled nucleotide and the third labeled nucleotide can be different canonical base types (e.g., A, C, U/T or G). The third labeled nucleotide may comprise a fluorescent dye. The fluorescent dye may be coupled to the third labeled nucleotide via a linker (e.g., as described herein), which may have the same chemical structure or a different chemical structure than the linker that connects the fluorescent dye to the first labeled nucleotide.

Alternatively, the method can comprise contacting the nucleic acid molecule with a second solution comprising a second plurality of nucleotides under conditions sufficient to incorporate a third labeled nucleotide of the second plurality of nucleotides into the growing strand. At least about 20% of the second plurality of nucleotides can be labeled nucleotides (e.g., as described herein). One or more second signals or signal changes can be detected from a third labeled nucleotide (e.g., as described herein). One or more of the second signals or signal changes can be resolved to determine a second sequence of the nucleic acid molecule or portion thereof. The first labeled nucleotide and the third labeled nucleotide can be different canonical base types (e.g., A, C, U/T or G). The third labeled nucleotide may comprise a fluorescent dye. The fluorescent dye may be coupled to the third labeled nucleotide via a linker (e.g., as described herein), which may have the same chemical structure or a different chemical structure than the linker that connects the fluorescent dye to the first labeled nucleotide. Contacting the nucleic acid molecule with the second solution can be performed without cleaving the fluorescent dye from the first labeled nucleotide or the second labeled nucleotide. Without cleaving the fluorescent dye from the first labeled nucleotide or the second labeled nucleotide, the process may be repeated one or more times, e.g., 1, 2, 3, 4, 5 or more times, each time using a different nucleotide solution. One or more of these different nucleotide solutions may comprise at least 20% labeled nucleotides.

The present disclosure also provides a method comprising contacting a nucleic acid molecule with a solution comprising a plurality of non-terminated nucleotides under conditions sufficient to incorporate a labeled nucleotide and a second nucleotide of the plurality of non-terminated nucleotides into a growing strand that is at least partially complementary to the nucleic acid molecule or a portion thereof. The labeled nucleotide and the second nucleotide may be of the same canonical base type. Alternatively, the labeled nucleotide and the second nucleotide may be different canonical base types. The labeled nucleotide may comprise a fluorescent dye (e.g., as described herein) that may be associated with the labeled nucleotide through a linker (e.g., as described herein). The second nucleotide may be a labeled nucleotide. For example, the second nucleotide can comprise the same fluorescent dye (e.g., associated with the second nucleotide through a linker having the same chemical structure as the linker that associates the first nucleotide and the fluorescent dye). Alternatively, the second nucleotide may not be coupled to a fluorescent dye (e.g., the second nucleotide may not be labeled). The fluorescent dye coupled to the nucleotide (e.g., the first and/or second nucleotide) can be cleavable (e.g., upon application of a cleavage reagent). The plurality of non-terminating nucleotides can comprise nucleotides of the same canonical base type. At least about 20% of the plurality of nucleotides can be labeled nucleotides. For example, at least 20% of the plurality of nucleotides can be associated with a fluorescent labeling agent (e.g., as described herein). For example, at least about 50%, 70%, 80%, 90%, 95%, or 99% of the plurality of non-terminating nucleotides can be labeled nucleotides. For example, substantially all of the plurality of non-terminating nucleotides can be labeled nucleotides. 
For example, all nucleotides of the plurality of non-terminating nucleotides can be labeled nucleotides (e.g., the labeling fraction can be 100%). One or more signals or signal changes can be detected from labeled nucleotides (e.g., as described herein). The one or more signals or signal changes may comprise a fluorescent signal or signal change. One or more signals or changes in signals can indicate incorporation of a labeled nucleotide. One or more signals or signal changes can be resolved to determine the sequence of the nucleic acid molecule or portion thereof. Resolving one or more signals or signal changes can include determining the number of consecutive nucleotides from the solution incorporated into the growing strand. The number of consecutive nucleotides may be selected from 2, 3, 4, 5, 6, 7 or 8 nucleotides. Resolving one or more signals or signal variations may include processing solution tolerances. A third nucleotide can also be incorporated into the growing strand (e.g., before or after detection of one or more signals or signal changes). The third nucleotide may be one nucleotide of a plurality of non-terminating nucleotides in solution. Alternatively, the third nucleotide may be provided in a separate solution, for example in a "chase" stream (e.g., as described herein). The third nucleotide may be unlabeled. Alternatively, the third nucleotide may be labeled. The labeled nucleotide and the third nucleotide may be of the same canonical base type. Alternatively, the labeled nucleotide and the third nucleotide may be different canonical base types.

The method may further comprise cleaving the fluorescent dye coupled to the labeled nucleotide. The fluorescent dye may be cleaved by application of a cleavage reagent configured to cleave a linker associating the labeled nucleotide and the fluorescent dye. The nucleic acid molecule can be contacted with a second solution comprising a second plurality of non-terminated nucleotides under conditions sufficient to incorporate a third labeled nucleotide of the second plurality of non-terminated nucleotides into the growing strand. At least about 20% of the second plurality of non-terminating nucleotides can be labeled nucleotides (e.g., as described herein). One or more second signals or signal changes can be detected from a third labeled nucleotide (e.g., as described herein). One or more of the second signals or signal changes can be resolved to determine a second sequence of the nucleic acid molecule or portion thereof. The first labeled nucleotide and the third labeled nucleotide can be different canonical base types (e.g., A, C, U/T or G). The third labeled nucleotide may comprise a fluorescent dye. The fluorescent dye may be coupled to the third labeled nucleotide via a linker (e.g., as described herein), which may have the same chemical structure or a different chemical structure than the linker that connects the fluorescent dye to the first labeled nucleotide.

Alternatively, the method can comprise contacting the nucleic acid molecule with a second solution comprising a second plurality of non-terminating nucleotides under conditions sufficient to incorporate a third labeled nucleotide of the second plurality of non-terminating nucleotides into the growing strand. At least about 20% of the second plurality of nucleotides can be labeled nucleotides (e.g., as described herein). One or more second signals or signal changes can be detected from the third labeled nucleotide (e.g., as described herein). The one or more second signals or signal changes can be resolved to determine a second sequence of the nucleic acid molecule or portion thereof. The first labeled nucleotide and the third labeled nucleotide can be of different canonical base types (e.g., A, C, U/T or G). The third labeled nucleotide may comprise a fluorescent dye. The fluorescent dye may be coupled to the third labeled nucleotide via a linker (e.g., as described herein), which may have the same chemical structure as, or a different chemical structure than, the linker that connects the fluorescent dye to the first labeled nucleotide. Contacting the nucleic acid molecule with the second solution can be performed without cleaving the fluorescent dye from the first labeled nucleotide or the second labeled nucleotide. The process may be repeated one or more times, e.g., 1, 2, 3, 4, 5 or more times, each time using a different nucleotide solution and without cleaving the fluorescent dye from the first labeled nucleotide or the second labeled nucleotide. One or more of these different nucleotide solutions may comprise at least 20% labeled nucleotides.

Methods for synthesizing optical labeling reagents

In some cases, the linkers provided herein can be prepared using peptide synthesis chemistry.

For example, a linker comprising a pyridinium moiety can be prepared using peptide synthesis chemistry. Such methods can use four bifunctional reagents to make the linker, namely: (a) R1A, (b) BB, (c) AA, and (d) AR2. Reagent A reacts with reagent B to generate a pyridinium group; R1 and R2 are heterobifunctional attachment groups. The synthesis starts from the group R1A (or R2A). Excess BB is added to R1A to form R1A-BB. The product is precipitated and washed in a less polar solvent (e.g., ethyl acetate or tetrahydrofuran) to remove excess BB. Excess AA is then added, with heating in N-methylpyrrolidone (NMP), to produce R1A-BB-AA. The product is precipitated in a less polar solvent and washed. The synthesis continues until a linker of a particular length is formed. The group AR2 is added in the last step.

1) R1A + 10 BB → R1A-BB (excess BB washed away)

2) R1A-BB + 10 AA → R1A-BB-AA (excess AA washed away)

3) R1A-BB-AA + 10 BB → R1A-BB-AA-BB (excess BB washed away)

4) R1A-BB-AA-BB + AR2 → R1A-BB-AA-BB-AR2 (terminating reagent used)
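The stepwise scheme above can be sketched as a short bookkeeping routine. The following Python sketch only tracks the names of the intermediates; it models no chemistry. The function name `assemble_linker` and the rule that the chain must end in BB before AR2 is added are assumptions drawn from the four steps shown, not a procedure stated in the disclosure.

```python
def assemble_linker(n_extensions: int, start: str = "R1A", cap: str = "AR2") -> list[str]:
    """List the intermediates formed by alternating BB and AA additions.

    Assumption: reagent A reacts only with reagent B, so the chain must
    end in BB before the terminating reagent AR2 is added. This requires
    an odd number of extension steps (BB, AA, BB, ...).
    """
    if n_extensions < 1 or n_extensions % 2 == 0:
        raise ValueError("n_extensions must be a positive odd number")

    chain = [start]
    intermediates = []
    for step in range(n_extensions):
        # Even-numbered steps add BB, odd-numbered steps add AA.
        chain.append("BB" if step % 2 == 0 else "AA")
        intermediates.append("-".join(chain))

    # Final step: add the terminating reagent.
    chain.append(cap)
    intermediates.append("-".join(chain))
    return intermediates


if __name__ == "__main__":
    for number, product in enumerate(assemble_linker(3), start=1):
        print(f"{number}) {product}")
```

With three extension steps the routine lists the same four products as the scheme above, ending with R1A-BB-AA-BB-AR2.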

Fig. 2A illustrates an example of a method for synthesizing a linker of the present disclosure having an effective length of about 2 nanometers.

Fig. 2B shows examples of reagents that can be used in the method of fig. 2A for synthesizing the disclosed linkers, as well as some trifunctional reagents.

Fig. 2C illustrates an example of a method for synthesizing a linker of the present disclosure, which is a polymer having a defined molecular weight and a linking group.

Additional synthetic methods for preparing optical labeling reagents (e.g., as described herein) are described elsewhere and in the examples below.

Method for constructing labeled nucleotides

In one aspect, the present disclosure provides methods for constructing labeled nucleotides (e.g., optically labeled nucleotides).

The labeled nucleotides can be constructed using modular chemical building blocks. Nucleotides or nucleotide analogs can be derivatized, for example, with a propargylamino moiety to provide a handle for attachment to a linker or detectable label (e.g., a dye). One or more detectable labels, such as one or more dyes, may be attached to the nucleotide or nucleotide analog by covalent bonds. Alternatively or additionally, one or more detectable labels may be attached to a nucleotide or nucleotide analog by non-covalent bonds. The detectable label may be attached to the nucleotide or nucleotide analog by a linker (e.g., as described herein). The linker may comprise one or more moieties. For example, the linker may include a first moiety that includes a disulfide bond to facilitate cleavage of the linker and release of the detectable label (e.g., during a sequencing process). Additional linker moieties may be added using sequential peptide bonds. The linker moieties may have various lengths and charges. A linker moiety may comprise one or more different components, such as one or more different ring systems and/or repeating units (e.g., as described herein). Examples of linkers include, but are not limited to, aminoethyl-SS-propionic acid (epSS), aminoethyl-SS-benzoic acid, aminohexyl-SS-propionic acid, hyp10, and hyp20.

Examples of methods for constructing labeled nucleotides are shown in FIG. 4, FIG. 5A, and FIG. 5B. As shown in FIG. 4, the labeled nucleotides may be constructed from nucleotides, dyes, and one or more linker moieties. Together, the one or more linker moieties make up a linker as described herein. The nucleotide functionalized with a propargylamino moiety may be attached to the first linker moiety by a peptide bond. The first linker moiety may comprise a cleavable moiety, such as a disulfide moiety. The first linker moiety may also be attached to one or more additional linker moieties in a linear or branched manner. For example, the second linker moiety can comprise two or more ring systems, wherein at least two of the two or more ring systems are separated by no more than two sp3 carbon atoms, e.g., by no more than two atoms. For example, at least two of the two or more ring systems may be linked to each other through an sp2 carbon atom. The linker may comprise a non-protein amino acid comprising a ring system of the two or more ring systems. For example, the second linker moiety may comprise two or more hydroxyproline moieties. An amine handle on the linker moiety can be used to attach the linker to a dye, such as a dye that fluoresces in the red or green portion of the visible electromagnetic spectrum. The labeled nucleotide produced in FIG. 4 comprises a modified deoxyadenosine triphosphate moiety; a linker comprising a first linker moiety that includes a disulfide moiety and a second linker moiety that comprises at least two ring systems; and a dye.

Construction of labeled nucleotides can begin at the nucleotide end or at the dye end. Construction from the dye end allows the use of unprotected, unactivated amino acid moieties, while construction from the nucleotide end may require amine-protected, carboxy-activated amino acid moieties.

FIG. 5A and FIG. 5B illustrate an exemplary synthesis of a labeled nucleotide that includes a propargylamino-functionalized dGTP moiety, a first linker moiety including a disulfide group, a second linker moiety that is hyp10, and the dye moiety Atto633. Details of this synthesis are provided in Example 3 below.

The nucleotide or nucleotide analog of the labeled nucleotide may include one or more modifications, such as one or more modifications to a nucleobase. Alternatively, the nucleotide or nucleotide analog of the labeled nucleotide may include one or more modifications that are not on the nucleobase. Modifications may include, but are not limited to, covalent attachment of one or more linkers or label moieties, alkylation, amination, amidation, esterification, hydroxylation, halogenation, sulfonation and/or phosphorylation.

The nucleotide or nucleotide analog of the labeled nucleotide may include one or more modifications configured to prevent subsequent nucleotides from being added to a position adjacent to the labeled nucleotide when the labeled nucleotide is incorporated into a growing nucleic acid strand. For example, the labeled nucleotide may include a terminating or blocking group (e.g., a dimethoxytrityl, phosphoramidite, or nitrobenzyl molecule). In some cases, the terminating or blocking group may be cleavable.

Computer system

The present disclosure provides a computer system programmed to implement the methods of the present disclosure. FIG. 3 illustrates a computer system 301 programmed or otherwise configured to perform nucleic acid sequencing. The computer system 301 can determine a sequence read based at least in part on the strength of the detected optical signal. The computer system 301 can adjust various aspects of the disclosure, such as, for example, performing nucleic acid sequencing, sequence analysis, and adjusting conditions for transient binding and non-transient binding (e.g., incorporation) of nucleotides. Computer system 301 can be a user's electronic device or a computer system remotely located from the electronic device. The electronic device may be a mobile electronic device.

Computer system 301 includes a central processing unit (CPU, also referred to herein as "processor" and "computer processor") 305, which may be a single-core or multi-core processor, or a plurality of processors for parallel processing. Computer system 301 also includes memory or memory location 310 (e.g., random access memory, read only memory, flash memory), electronic storage unit 315 (e.g., hard disk), communication interface 320 (e.g., network adapter) for communicating with one or more other systems, and peripheral devices 325 such as cache, other memory, data storage, and/or an electronic display adapter. The memory 310, storage unit 315, interface 320, and peripheral devices 325 communicate with the CPU 305 through a communication bus (solid lines), such as a motherboard. The storage unit 315 may be a data storage unit (or data repository) for storing data. Computer system 301 is operatively coupled to a computer network ("network") 330 by way of the communication interface 320. The network 330 may be the Internet, an internet and/or extranet, or an intranet and/or extranet that is in communication with the Internet. In some cases, network 330 is a telecommunications and/or data network. The network 330 may include one or more computer servers, which may enable distributed computing, such as cloud computing. In some cases, network 330 may implement a peer-to-peer network with the aid of computer system 301, which may enable devices coupled to computer system 301 to function as clients or servers.

The CPU 305 may execute a series of machine-readable instructions, which may be embodied in a program or software. The instructions may be stored in a storage location such as memory 310. Instructions may be directed to the CPU 305 which may then program or otherwise configure the CPU 305 to implement the methods of the present disclosure. Examples of operations performed by the CPU 305 may include fetch, decode, execute, and write back.

The CPU 305 may be part of a circuit such as an integrated circuit. One or more other components of system 301 may be included in the circuit. In some cases, the circuit is an Application Specific Integrated Circuit (ASIC).

The storage unit 315 may store files such as drivers, libraries, and saved programs. The storage unit 315 may store user data such as user preferences and user programs. In some cases, computer system 301 can include one or more additional data storage units located external to computer system 301, such as on a remote server in communication with computer system 301 over an intranet or the internet.

Computer system 301 may communicate with one or more remote computer systems over the network 330. For example, computer system 301 may communicate with a remote computer system of a user. Examples of remote computer systems include a personal computer (e.g., a portable PC), a slate or tablet PC (e.g., iPad, Galaxy Tab), a telephone, a smartphone (e.g., iPhone, Android-enabled device), or a personal digital assistant. A user may access computer system 301 via network 330.

The methods described herein may be implemented by way of machine (e.g., computer processor) executable code that is stored on an electronic storage location of computer system 301, such as memory 310 or electronic storage unit 315. The machine executable code or machine readable code may be provided in the form of software. During use, the code may be executed by the processor 305. In some cases, code may be retrieved from storage unit 315 and stored on memory 310 for ready access by processor 305. In some cases, electronic storage unit 315 may be eliminated, and the machine-executable instructions stored on memory 310.

The code may be precompiled and configured for use by a machine having a processor adapted to execute the code, or may be compiled during runtime. The code may be provided in a programming language that can be selected to enable the code to be executed in a precompiled or as-compiled manner.

Aspects of the systems and methods provided herein, such as computer system 301, may be embodied in programming. Various aspects of the technology may be thought of as "products" or "articles of manufacture," typically in the form of machine (or processor) executable code and/or associated data that is carried on or embodied in a type of machine-readable medium. The machine-executable code may be stored on an electronic storage unit such as a memory (e.g., read only memory, random access memory, flash memory) or a hard disk. "Storage" type media may include any or all of the tangible memory of computers, processors, or the like, or associated modules thereof, such as various semiconductor memories, tape drives, disk drives, and the like, which may provide non-transitory storage at any time for the software programming. All or portions of the software may at times be communicated through the Internet or various other telecommunications networks. Such communications, for example, may enable loading of the software from one computer or processor into another, for example, from a management server or host computer into the computer platform of an application server. Thus, another type of media that can carry the software elements includes optical, electrical, and electromagnetic waves, such as those used across physical interfaces between local devices, through wired and optical landline networks, and over various air links. The physical elements that carry such waves, such as wired or wireless links, optical links, and the like, may also be considered media carrying the software. As used herein, unless restricted to non-transitory, tangible "storage" media, terms such as computer or machine "readable medium" refer to any medium that participates in providing instructions to a processor for execution.

Thus, a machine-readable medium, such as computer executable code, may take many forms, including but not limited to tangible storage media, carrier wave media, or physical transmission media. Non-volatile storage media include, for example, optical or magnetic disks, such as any storage device in any computer, etc., such as may be used to implement a database as shown in the figures. Volatile storage media includes dynamic memory, such as the main memory of such a computer platform. Tangible transmission media include coaxial cables, copper wire and fiber optics, including the wires that comprise a bus within a computer system. Carrier-wave transmission media can take the form of electrical or electromagnetic signals, or acoustic or light waves, such as those generated during Radio Frequency (RF) and Infrared (IR) data communications. Thus, common forms of computer-readable media include, for example: a floppy disk, a flexible disk, hard disk, magnetic tape, any other magnetic medium, a CD-ROM, DVD or DVD-ROM, any other optical medium, punch cards, paper tape, any other physical storage medium with patterns of holes, a RAM, a ROM, a PROM, and EPROM, a FLASH-EPROM, any other memory chip or cartridge, a carrier wave transporting data or instructions, cables or links transporting such a carrier wave, or any other medium from which a computer can read programming code and/or data. Many of these forms of computer readable media may be involved in carrying one or more sequences of one or more instructions to a processor for execution.

The computer system 301 may include or be in communication with an electronic display 335 that includes a user interface (UI) 340 for providing, for example, nucleic acid sequences and results of optical signal detection (e.g., sequence reads, intensity maps, etc.). Examples of UIs include, but are not limited to, graphical user interfaces (GUIs) and web-based user interfaces.

The methods and systems of the present disclosure may be implemented by one or more algorithms. The algorithm may be implemented in software when executed by the central processing unit 305. The algorithm may, for example, implement the methods and systems of the present disclosure, such as determining sequence reads based at least in part on the intensity of the detected optical signal.
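As a minimal sketch of such an algorithm, the Python routine below converts per-flow optical intensities into a sequence read by estimating how many consecutive nucleotides were incorporated in each flow. It assumes, purely for illustration, that the signal scales linearly with the number of incorporated labeled nucleotides and that a per-nucleotide `unit_signal` is known from calibration. Neither assumption, nor the names `call_flows`, `unit_signal`, and `max_run`, comes from the disclosure, and effects such as dye quenching or partial labeling are ignored.

```python
def call_flows(flow_order: str, intensities: list[float],
               unit_signal: float, max_run: int = 8) -> str:
    """Estimate a read from one intensity value per nucleotide flow.

    Assumptions (illustrative only): signal is linear in the number of
    incorporated nucleotides, and max_run caps the homopolymer length
    that the estimate is allowed to report.
    """
    if len(flow_order) != len(intensities):
        raise ValueError("one intensity is required per flow")
    if unit_signal <= 0:
        raise ValueError("unit_signal must be positive")

    read = []
    for base, signal in zip(flow_order, intensities):
        # Round to the nearest whole number of incorporated nucleotides.
        count = int(signal / unit_signal + 0.5)
        # Clamp to the allowed range [0, max_run].
        count = max(0, min(max_run, count))
        read.append(base * count)
    return "".join(read)


if __name__ == "__main__":
    # Hypothetical intensities for four flows in the order T, A, C, G.
    print(call_flows("TACG", [1.05, 0.1, 2.9, 0.0], unit_signal=1.0))
```

A production base caller would typically replace the linear rounding step with a calibrated or statistical model of signal versus homopolymer length.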

Examples

Example 1: general principle of synthesis

Certain of the following examples illustrate various methods of preparing the linkers and labeled substrates described herein. It is understood that one skilled in the art can prepare these compounds by similar methods or by combining other methods known to those skilled in the art. It will also be appreciated that the skilled person will be able to prepare other compounds in a similar manner as described below by using suitable starting materials and modifying the synthetic route as required. In general, starting materials and reagents can be obtained from commercial suppliers or synthesized according to sources known to those skilled in the art or prepared as described herein.

Unless otherwise indicated, reagents and solvents used in the synthetic methods described herein are obtained from commercial suppliers. Anhydrous solvents and oven-dried glassware can be used for synthetic transformations sensitive to moisture and/or oxygen. The yield may not be optimized. The reaction time may be approximate and may not be optimized. The materials and equipment used in the synthesis process may be replaced by suitable alternatives. Unless otherwise indicated, column chromatography and Thin Layer Chromatography (TLC) can be performed on reverse phase silica gel. Nuclear Magnetic Resonance (NMR) and mass spectra can be obtained to characterize the reaction products and/or monitor the progress of the reaction.

Example 2: structure of labeling reagent

Described herein are examples of semi-rigid, water-soluble linkers of defined molecular weight that can effectively achieve dye-dye or dye-quencher separation. Semi-rigid structures may be constructed from a series of aromatic or non-aromatic ring systems linked by zero or one sp3 bonds and zero or more sp or sp2 linkages. Water solubility may be achieved by including (e.g., in each subunit) at least one moiety selected from the group consisting of: hydroxyl, pyridinium, imidazolium, sulfonate, amino, thiol, carboxyl, and quaternary ammonium. The linker may be a homo- or hetero-bifunctional (or trifunctional) reagent that allows for attachment of a dye (e.g., a fluorescent dye) at one end and a biological ligand (e.g., a nucleotide) at the other end. Examples of the general formula of such linkers are shown below:

wherein p is the number of repeating units, selected from 1 to 100; each R3 is a water-soluble moiety independently selected from, for example, pyridinium and sulfonate; R1 and R2 are attachment groups such as amino and carboxyl moieties; each n is independently 1 or 2; each m is independently selected from 1 and 2; and each q is independently selected from 4 to 8. In the above structure, m represents the number of sp3 carbons connecting the ring moieties to each other. The ring moiety may be an aliphatic or aromatic ring.

A plurality of such subunits may be connected to each other. For example, the linker may be represented by the formula:

wherein p and r are each the number of repeating units, independently selected from 1 to 100; each R3 and R4 is a water-soluble moiety independently selected from, for example, pyridinium and sulfonate; R1 and R2 are attachment groups such as amino and carboxyl moieties; each n and i is independently 1 or 2; each m and k is independently selected from 1 and 2; and each q and j is independently selected from 4 to 8. In the above structure, m and k represent the number of sp3 carbons connecting the ring moieties to each other. The ring moiety may be an aliphatic or aromatic ring. In some cases, the ring moiety of the left-hand portion of the structure is aliphatic and the ring moiety of the right-hand portion of the structure is aromatic, or vice versa.

Note that the structures described above do not cover all embodiments of the present disclosure. For example, the linker need not be a polymer of "p-repeat" units. Similarly, the water-solubilizing functional group may be a constituent of the ring rather than attached to the ring.

Example 3: synthesis of dGTP-AP-SS-hyp10-Atto633

A method of constructing the labelled nucleotide dGTP-AP-SS-hyp10-Atto633 is described. Figure 5A illustrates an exemplary method for synthesizing a fluorescently labeled dGTP reagent. Figure 5B illustrates the same synthesis with the complete structure of the dye and linker. The method includes forming a covalent bond between Gly-Hyp10 and fluorophore Atto633 (process (a)), esterification to couple Atto633-Gly-Hyp10 to pentafluorophenol (process (b)), substitution with a linker molecule epSS (process (c)), esterification to form Atto633-Gly-Hyp10-epSS-PFP (process (d)), and substitution with dGTP to provide a fluorescently labeled nucleotide (process (e)). Details of the synthesis are provided below.

Preparation of Atto633-Gly-Hyp10 (FIG. 5, process (a)). A stock solution of Gly-Hyp10 (also referred to herein as "Hyp10") in bicarbonate was prepared by dissolving 25 milligrams (mg) of the 11-amino-acid peptide in 500 microliters (μL) of 0.2 molar (M) sodium bicarbonate in a 1.5 milliliter (mL) microcentrifuge tube. Atto633-NHS (7 mg) was weighed into another microcentrifuge tube and dissolved in 200 μL of dimethylformamide (DMF). A 300 μL volume of the peptide solution was added to the Atto633-NHS solution. The resulting solution was mixed and heated to 50 °C for 20 minutes (min). The extent of reaction was followed by reverse-phase thin layer chromatography (TLC). A 1 μL aliquot of the reaction solution was removed, diluted in 40 μL of water, and spotted on a reverse-phase TLC plate. A co-spot with Atto633 acid was included, and Atto633 was also run alone. The plate was eluted with a 2:1 solution of acetonitrile:0.1 M triethylammonium acetate (TEAA). Both Atto633 acid and Atto633-NHS have an Rf of zero, and Gly-Hyp10 has an Rf of 0.4. The product was purified by injecting the solution onto a C18 reverse-phase column and eluting with a gradient of 20% → 50% acetonitrile versus 0.1 M TEAA over 16 minutes at 2.5 mL/min. The desired product, Atto633-Gly-Hyp10, was the main product and eluted at 15.2 min. The fractions containing the desired material were collected in a microcentrifuge tube and dried to yield a blue solid. A major peak was observed in the ESI mass spectrum: m/z calculated for C87H115N14O24+, [M]+ 1739.8; found: 1740.6.
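The calculated m/z values quoted in these examples can be checked from the molecular formulas. The sketch below sums monoisotopic atomic masses and divides by the charge; the helper names `parse_formula` and `mz` are hypothetical. It assumes a simple formula with no parentheses or isotope labels, and it assumes that the formula supplied already describes the ion as written, so no proton is added or removed.

```python
import re

# Monoisotopic masses (u) of the most abundant isotopes.
MONOISOTOPIC = {
    "C": 12.0,
    "H": 1.00782503,
    "N": 14.00307401,
    "O": 15.99491462,
    "F": 18.99840316,
    "P": 30.97376200,
    "S": 31.97207117,
}
ELECTRON = 0.00054858  # electron mass in u


def parse_formula(formula: str) -> dict[str, int]:
    """Count atoms in a simple formula such as 'C87H115N14O24'."""
    counts: dict[str, int] = {}
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + (int(number) if number else 1)
    return counts


def mz(formula: str, charge: int) -> float:
    """Monoisotopic m/z of an ion whose formula is given as written."""
    if charge == 0:
        raise ValueError("charge must be non-zero")
    mass = sum(MONOISOTOPIC[el] * n for el, n in parse_formula(formula).items())
    # A cation has lost electrons; an anion has gained them.
    mass -= charge * ELECTRON
    return mass / abs(charge)


if __name__ == "__main__":
    # Cation reported for Atto633-Gly-Hyp10 in this example.
    print(f"{mz('C87H115N14O24', +1):.1f}")  # 1739.8
```

The result agrees with the calculated value of 1739.8 given above for the C87H115N14O24+ ion.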

Preparation of Atto633-Gly-Hyp10-PFP (FIG. 5, process (b)). Atto633-Gly-Hyp10 was suspended in 100 μL of DMF in a 1.5 mL microcentrifuge tube. Pyridine (20 μL) and pentafluorophenyl trifluoroacetate (PFP-TFA, 20 μL) were added to the tube. The reaction mixture was heated to 50 °C in a heating block for 20 minutes. The reaction was monitored by removing a 1 μL aliquot and adding it to 1 mL of dilute HCl (0.4%); when the reaction is complete, the aqueous solution is colorless. After 10 minutes, the dilute HCl solution appeared light blue. Additional PFP-TFA (30 μL) was added. After an additional 100 minutes at 50 °C, the precipitation test was repeated and gave a colorless solution. The remaining reaction mixture was precipitated in 20 μL portions: each portion was added to 1 mL of dilute HCl, the tube was spun down, and the aqueous solution was discarded. This process was repeated until all of the product had precipitated. The residue was dried thoroughly. After drying, the solid was washed twice with 1 mL of methyl tert-butyl ether (MTBE). The product was a dark blue powder. The product produced one major peak on electrospray ionization (ESI) mass spectrometry (MS): m/z calculated for [C93H115F5N14O24]2+, [M+H]2+ 1906.8/2 = 953.4; found: 953.4.

Preparation of Atto633-Gly-Hyp10-epSS (FIG. 5, process (c)). Atto633-Gly-Hyp10-PFP (1.6 μmol) was dissolved in 100 μL of DMF in a microcentrifuge tube. A solution of aminoethyl-SS-propionic acid (Broadpharm; 6 mg in 200 μL of 0.1 M bicarbonate) was mixed with the Atto633-Gly-Hyp10-PFP solution and heated to 50 °C in a heating block for 20 minutes. Atto633-Gly-Hyp10-epSS was purified from the resulting reaction mixture by reverse-phase HPLC using a gradient of 20% → 50% acetonitrile over 16 minutes. Atto633-Gly-Hyp10 eluted at 15 minutes, and Atto633-Gly-Hyp10-epSS eluted at 15.6 minutes. Fractions containing the product Atto633-Gly-Hyp10-epSS were combined and dried. The product produced one major peak on ESI-MS: m/z calculated for C92H124N15O25S2+, [M]+ 1902.8; found: 1902.6.
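For orientation, the nominal mobile-phase composition at a given elution time can be estimated from the gradient program. The sketch below assumes a strictly linear gradient and reports the programmed composition at the pump. It ignores system dwell volume and column transit time, so the composition actually reaching the analyte at that moment will differ. The function name `percent_acetonitrile` is hypothetical.

```python
def percent_acetonitrile(t_min: float, start_pct: float = 20.0,
                         end_pct: float = 50.0,
                         duration_min: float = 16.0) -> float:
    """Programmed % acetonitrile at time t_min of a linear gradient.

    Defaults follow the 20% -> 50% acetonitrile gradient over 16 minutes
    used in this example. Dwell volume and column delay are ignored
    (an assumption made to keep the sketch simple).
    """
    if not 0.0 <= t_min <= duration_min:
        raise ValueError("t_min must lie within the gradient")
    return start_pct + (end_pct - start_pct) * t_min / duration_min


if __name__ == "__main__":
    for label, t in [("Atto633-Gly-Hyp10", 15.0), ("Atto633-Gly-Hyp10-epSS", 15.6)]:
        print(f"{label}: {percent_acetonitrile(t):.1f}% acetonitrile at {t} min")
```

Under these assumptions the two species elute about one percentage point of acetonitrile apart, which is consistent with the close retention times reported.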

Preparation of Atto633-Gly-Hyp10-epSS-PFP (FIG. 5, process (d)). Atto633-Gly-Hyp10-epSS was dissolved in 100 μL of DMF in a microcentrifuge tube. Pyridine (20 μL) and PFP-TFA (20 μL) were added, and the mixture was heated to 50 °C in a heating block for 20 minutes. A test aliquot (1 μL) in dilute HCl produced a colorless solution and a blue precipitate. The reaction mixture was precipitated in 20 μL aliquots into 1 mL of dilute HCl; the tube was spun down and the aqueous solution was discarded. This process was repeated until all of the PFP ester had precipitated. The residue was dried thoroughly under vacuum and washed with MTBE.

Preparation of dGTP-AP-SS-Atto633 (FIG. 5, process (e)). A solution of aminopropyl dGTP (Trilink; 1 μmol in 100 μL of 0.2 M bicarbonate) was added to 50 μL of a DMF solution containing Atto633-Gly-Hyp10-epSS-PFP. The mixture was heated to 50 °C for 10 minutes. The product, dGTP-AP-epSS-Atto633, was purified by reverse-phase HPLC using a gradient of 20% → 50% acetonitrile over 16 minutes. The product eluted at 15.3 minutes. Preparative HPLC provided 0.65 μmol. The product produced one major peak on ESI-MS: m/z calculated for [C106H139N20O37P3S2]2–, [M-H]2– 1220.4; found: 1220.6.

Although the synthesis of dGTP-AP-SS-hyp10-Atto633 is described, the skilled practitioner will recognize that other fluorescently labeled nucleotides can be produced in a similar manner using suitable starting materials.

Example 4: synthesis of dCTP-epsS-Atto633

dCTP-SS12-Atto633 can be prepared in a manner similar to the method outlined in Example 3. Briefly, Atto633-epSS was prepared by mixing a 200 μL DMF solution containing 11 mg of Atto633-NHS with a 200 μL aqueous solution containing 0.2 M sodium bicarbonate and 24 mg of epSS; heating the resulting mixture to 50 °C for 15 minutes; purifying Atto633-epSS from the mixture by reverse-phase HPLC using a gradient of 40% → 60% acetonitrile versus 0.1 M TEAA over 16 minutes at 4.5 mL/min; and confirming the identity of the product by ESI-MS (FIG. 6, process (a)). The product eluted at 7.3 minutes and the free dye eluted at 6.4 minutes. The yield was about 80%. The product produced one major peak on ESI-MS: m/z calculated for C40H51N4O4S2+, [M]+ 715.3; found: [M]+ = 715.3.

Atto633-epSS was then converted to Atto633-epSS-PFP (FIG. 6, process (b)) by mixing a solution of Atto633-epSS in 100 μL of DMF with 20 μL of pyridine and 20 μL of PFP-TFA; heating the solution at 50 °C for 5 minutes; adding another 20-40 μL of PFP-TFA; heating again at 50 °C for 5 minutes; and precipitating the product in 1 mL of dilute HCl. The product was washed with an additional 1 mL of dilute HCl, and the supernatant was removed by pipette and evaporation to yield a blue solid.

dCTP-epSS-Atto633 was formed by the reaction of Atto633-epSS-PFP with aminopropargyl dCTP (AP-dCTP) (FIG. 6, process (c)). AP-dCTP stock solution (Trilink; 1 μmol) was added to 100 μL of DMF solution containing 0.2 M sodium bicarbonate and combined with the Atto633-epSS-PFP solution dissolved in 100 μL of DMF. The mixture was allowed to stand overnight. dCTP-epSS-Atto633 was purified from the mixture on a C18 reverse-phase column using a gradient of 20% → 100% acetonitrile versus 0.1 M TEAA over 16 minutes at 2.5 mL/min. The product eluted at 10.7 minutes. The product-containing fractions were collected and dried. The product produced one major peak on ESI-MS: m/z calculated for C52H66N8O16P3S2–, [M]– 1215.3; found: 1215.5.

Example 5: preparation of dye-labeled nucleotides

A set of dye-labeled nucleotides designed for excitation at about 530 nm was prepared. Excitation at 530 nm can be achieved using a green laser; such lasers are readily available, high-powered, and stable. Many commercially available fluorescent dyes can be excited at or near 530 nm, are inexpensive, and have a variety of properties (hydrophobic, hydrophilic, positively charged, negatively charged). The synthetic routes for such dyes may be shorter and less expensive than those for longer-wavelength dyes. In addition, certain green dyes may show significantly less self-quenching than red dyes, possibly allowing the use of higher labeling fractions (e.g., as described herein).

A possible set of reagents for sequencing applications consists of each of the four canonical nucleotides, or analogs thereof, labeled with a cleavable green dye that performs well in sequencing. An optimal set can be prepared by varying each component of the labeled nucleotide structure to obtain an array of candidate labeled nucleotides with different properties. The resulting nucleotides are evaluated (e.g., as described below), and selected labeled nucleotides are optimized for concentration and labeling fraction (the ratio of labeled to unlabeled nucleotides in the flow).

FIG. 7 shows various components used to construct candidate labeled nucleotides. Each of the four propargylamino-functionalized nucleotides (A, C, G and U) may be combined with one of two cleavable linkers, E and B; with or without a hydroxyproline linker (hyp10); and with one of three fluorescent dyes, *, # and $. With these components, there are 4 × 2 × 2 × 3 = 48 possible labeled nucleotide variations. Labeled nucleotides can be prepared according to the synthetic routes and principles described herein. An exemplary synthesis of the G*-B-H labeled nucleotide is described in Example 6.
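The size of this candidate array can be confirmed by enumerating the combinations. The sketch below is illustrative only. It assumes that the first of the three dye symbols is "*" and that names follow the pattern base, dye symbol, cleavable linker, then an optional "-H" for hyp10 (as in "G*-B-H"). This naming pattern is inferred from the single example name and is not defined in the disclosure.

```python
from itertools import product

BASES = ["A", "C", "G", "U"]      # propargylamino-functionalized nucleotides
DYES = ["*", "#", "$"]            # dye symbols; "*" is an assumed symbol
CLEAVABLE_LINKERS = ["E", "B"]    # the two cleavable linkers
WITH_HYP10 = [True, False]        # hydroxyproline linker present or absent


def candidate_names() -> list[str]:
    """Enumerate every base/dye/linker/hyp10 combination as a short name."""
    names = []
    for base, dye, linker, hyp10 in product(BASES, DYES, CLEAVABLE_LINKERS, WITH_HYP10):
        # Assumed naming pattern, e.g. "G*-B-H" or "G*-B".
        names.append(f"{base}{dye}-{linker}" + ("-H" if hyp10 else ""))
    return names


if __name__ == "__main__":
    names = candidate_names()
    print(len(names))          # 48
    print("G*-B-H" in names)   # True
```

Enumerating the set in this way also gives a convenient list of labels for tracking which candidates have been synthesized and evaluated.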

Example 6: synthesis of G.sub.B-H labeled nucleotides

The synthetic method for the preparation of G*-B-H (see Example 5) is shown in FIG. 8. Similar methods can be used to prepare the other labeled nucleotides described in Example 5 and elsewhere herein. Since the components used include amino acids, there are various ways to obtain the final product. Synthetic considerations include the tendency of the triphosphate to hydrolyze under heat or acidic conditions (forming the diphosphate and monophosphate), which precludes the use of acid-labile protecting groups, and the tendency of the disulfide to decompose in the presence of triethylamine and ammonia, which precludes the use of trifluoroacetamide or FMOC protecting groups.

Preparation of PN 40142. A solution of Atto 532 succinimidyl ester (Atto-tec, PN 40183; 5 mg = 4.6 μmol) in 100 μL of DMF was mixed with a solution of gly-hyp-hyp-hyp-hyp-hyp-hyp-hyp (custom synthesis from Genscript, PN 40035; 8.5 mg = 7 μmol) in 170 μL of 0.1 M bicarbonate in a 1.5 mL microcentrifuge tube. The reaction mixture was purified on a Phenomenex reverse-phase C18 semi-preparative column (Gemini 5 μm C18, 250 × 10 mm) using a gradient of 10% → 40% acetonitrile versus 0.1 M triethylammonium acetate over 16 minutes. The fractions containing product 40142 were combined and concentrated to dryness. The yield was determined by diluting the fractions, measuring the optical density (OD) at 633 nm, and using the extinction coefficient of the dye, 130,000 cm⁻¹M⁻¹. The yield was 50%. The structure was confirmed by mass spectrometry in negative ion mode: m/z calculated for C81H103N14O31S2–, 1831.6; found: 1831.8.
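The yield determination from optical density follows the Beer-Lambert relation, A = ε·c·l. The sketch below converts an absorbance reading into micromoles of dye conjugate and then into a percent yield. The 1 cm path length, the example absorbance, the dilution factor, and the pooled volume are hypothetical values chosen for illustration. Only the extinction coefficient (130,000 cm⁻¹M⁻¹) and the 4.6 μmol of starting dye come from the text.

```python
def amount_umol(absorbance: float, dilution_factor: float, volume_ml: float,
                epsilon: float = 130_000.0, path_cm: float = 1.0) -> float:
    """Micromoles of dye in a pooled fraction, from a diluted OD reading.

    Beer-Lambert: A = epsilon * c * l, so c = A / (epsilon * l).
    The 1 cm path length is an assumption, not stated in the source.
    """
    if epsilon <= 0 or path_cm <= 0:
        raise ValueError("epsilon and path_cm must be positive")
    # Concentration of the undiluted pool in mol/L.
    conc_molar = absorbance * dilution_factor / (epsilon * path_cm)
    # mol/L x L gives mol; multiply by 1e6 to express in micromoles.
    return conc_molar * (volume_ml * 1e-3) * 1e6


def percent_yield(product_umol: float, limiting_umol: float) -> float:
    """Yield relative to the limiting reagent, in percent."""
    return 100.0 * product_umol / limiting_umol


if __name__ == "__main__":
    # Hypothetical reading: OD 0.299 after a 1000-fold dilution of a 1.0 mL pool.
    product = amount_umol(absorbance=0.299, dilution_factor=1000, volume_ml=1.0)
    print(f"{product:.2f} umol, {percent_yield(product, 4.6):.0f}% yield")
```

With these hypothetical readings the calculation gives 2.30 μmol, which corresponds to the 50% yield reported relative to 4.6 μmol of dye succinimidyl ester.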

Preparation of PN 40143. PN 40142 (4 μmol) was suspended in 100 μL of DMF in a 1.5 mL microcentrifuge tube. Pyridine (20 μL) and pentafluorophenyl trifluoroacetate (20 μL) were added to the DMF solution, and the mixture was heated to 50 °C for 5 min. A portion (1 μL) of the reaction mixture was precipitated into 0.4% HCl; the aqueous solution remained colorless, indicating complete conversion to the active pentafluorophenyl ester. The remainder of the reaction mixture was precipitated into dilute acidic solution, and the aqueous solution was pipetted off. The residue was washed with hexane and dried to a highly colored solid (PN 40143).

Preparation of PN 40146. PN 40143 was dissolved in 100 μL of DMF and mixed with disulfide PN 40113 (5 mg, 20 μmol) in DMF. Diisopropylethylamine (5 μL) was added to the mixture. The mixture was purified over 16 minutes by reverse-phase HPLC using a gradient of 20% → 50% acetonitrile versus 0.1 M TEAA. Two dye-colored fractions were obtained, at 8.8 minutes and 9.5 minutes. The fraction at 9.5 min was identified by mass spectrometry as the desired product: m/z calculated for [C90H111N15O32S4]2-, 1020.84; found, 1021.1.

Preparation of PN 40147. PN 40146 was suspended in 100 μL of DMF in a 1.5 mL microcentrifuge tube. Pyridine (20 μL) and pentafluorophenyl trifluoroacetate (20 μL) were added to the DMF solution, and the mixture was heated to 50 °C for 5 min. A portion (1 μL) of the reaction mixture was precipitated into 0.4% HCl; the aqueous solution remained colorless, indicating complete conversion to the active pentafluorophenyl ester. The remainder of the reaction mixture was precipitated into dilute acidic solution, and the aqueous solution was pipetted off. The residue was washed with hexane and dried to a highly colored solid (PN 40147).

Preparation of PN 40150. PN 40147 was dissolved in 50 μL of DMF in a 1.5 mL microcentrifuge tube. A solution of 0.5 μmol of 7-deaza-7-propargylamino-2'-deoxyguanosine-5'-triphosphate in 50 μL of 1 M bicarbonate was prepared and added to the tube. After holding at 4 °C overnight, the product was purified by HPLC; using a 16-minute gradient of 20% → 50% acetonitrile versus 0.1 M TEAA, the fraction at 12 minutes contained the desired product: m/z calculated for C104H129N20O44P3S4 ([M-H]2-), 1291.33; found, 1292.4.
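The calculated m/z values quoted in this example can be approximated from the ion formulas. The sketch below is an illustration rather than the method used by the authors: it assumes the formulas are written for the ion itself, uses standard monoisotopic masses, and ignores the mass of the electron. The function names are hypothetical.

```python
import re

# Standard monoisotopic masses (u) of the elements appearing in the formulas.
MONOISOTOPIC = {
    "C": 12.0, "H": 1.00782503, "N": 14.0030740,
    "O": 15.9949146, "P": 30.9737620, "S": 31.9720707,
}


def monoisotopic_mass(formula):
    """Sum element masses for a simple formula such as 'C81H103N14O31S2'."""
    mass = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        mass += MONOISOTOPIC[element] * (int(count) if count else 1)
    return mass


def mz(ion_formula, charge):
    """m/z of an ion whose formula is already that of the ion (electron mass ignored)."""
    return monoisotopic_mass(ion_formula) / abs(charge)


for label, formula, charge in [
    ("PN 40142", "C81H103N14O31S2", 1),
    ("PN 40146", "C90H111N15O32S4", 2),
    ("PN 40150", "C104H129N20O44P3S4", 2),
]:
    print(label, round(mz(formula, charge), 2))
```

Under these assumptions the computed values agree with the calculated values quoted in the text to within a few hundredths of a mass unit.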

Example 7: Evaluation of dye-labeled nucleotides

The dye-labeled nucleotides of Example 5 were evaluated using a bead-based assay. Streptavidin beads were prepared with a 5'-biotinylated template strand annealed to a primer strand. The primer strand was designed such that the next cognate base incorporated by the DNA polymerase is thymidine. The DNA polymerase was bound to the bead complex. Various mixtures containing different ratios of dye-labeled nucleotide (dUTP) and natural nucleotide (TTP) were then provided to the beads. After excess reagents were washed away, the fluorescence of the beads was read on a flow cytometer using the PE channel (excitation 488 nm, emission 580 nm). A schematic of this assay is shown in FIG. 9.

The results of the bead assay for the different labeled dUTPs are shown in FIG. 10. The total nucleotide concentration was maintained at 2 μM; a labeling fraction of 10% means 0.2 μM labeled dUTP and 1.8 μM TTP. The behavior of the two nucleotides is significantly different: the "tolerance" of U#-E is about 1, which means that the dye-labeled nucleotide is incorporated no differently from the natural nucleotide at all ratios tested; that is, a labeling fraction of 50% results in 50% of beads being labeled. In contrast, U*-E has a negative tolerance, which means that at each ratio its signal falls below the line connecting zero and the signal at 100% labeling. A negative tolerance indicates that the dye label makes the nucleotide a worse substrate than the natural nucleotide. This result is consistent with the observation that negatively charged dyes such as Atto532 (the dye in U*-E) inhibit incorporation by many polymerases, while dyes such as 5-carboxyrhodamine-6G (the dye in U#-E) are zwitterionic and are considered good substrates.
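The relation between labeling fraction and the two nucleotide concentrations is simple arithmetic at a fixed total concentration. The following sketch reproduces the 10% case from the text; the function name and the other fractions listed are illustrative assumptions.

```python
def mix_concentrations(total_um, labeling_fraction):
    """Split a fixed total nucleotide concentration (in uM) into labeled and natural parts."""
    labeled = total_um * labeling_fraction
    return labeled, total_um - labeled


TOTAL_UM = 2.0  # total nucleotide concentration from the text

for fraction in (0.0, 0.10, 0.25, 0.50, 1.0):
    labeled, natural = mix_concentrations(TOTAL_UM, fraction)
    print(f"{fraction:>4.0%}: {labeled:.2f} uM labeled dUTP, {natural:.2f} uM TTP")
```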

Other labeled nucleotides were evaluated using similar assays. FIG. 11 shows the results of bead assays of labeled dATP. FIG. 12 shows the results of bead assays of labeled dGTP. For labeled dATP, very low fluorescence was observed at 100% labeling for A*-B compared to A*-B-H and A*-E-H. This indicates that the hydroxyproline linker (H) mitigates quenching of the dye by the nucleotide. Similar results were observed for labeled dGTP. This result is expected for labeled dGTP, since quenching by G through photoinduced electron transfer is well known. Quenching by disulfide linker B may also contribute to the lower fluorescence observed for labeled dATP and dGTP.

Example 8: sequencing using dye-labeled nucleotides

Nucleic acid sequencing assays can be used to assess dye-labeled nucleotides (e.g., as described herein). An exemplary procedure is shown in FIG. 18.

Sequencing can be performed using instruments equipped with light-emitting diodes (LEDs) and/or lasers. Each nucleotide evaluated may include a dye configured for excitation and emission at a similar wavelength (e.g., all red or all green emission). One or more of the different nucleotide types may be conjugated to different dyes. Sequencing performance can be evaluated based on base-calling quality, phase lag, phase lead, and homopolymer completion.

Beads carrying amplified template are prepared, immobilized on a support, and incubated with a tightly binding DNA polymerase. The beads are then subjected to multiple sequencing cycles. Each sequencing cycle may include incubation with U*/T (a fixed ratio of dye-labeled TTP and native TTP), a "chase" process (TTP only), an imaging process, and a cleavage process (10 mM tris(hydroxypropyl)phosphine (THP)) to release the dye. There may be wash processes between these processes. This sequence may be repeated for nucleotides or nucleotide analogs including A, C and G. The sequencing procedure can effectively identify homopolymeric regions of at least 2, 3, 4, 5, 6, 7, 8, or more nucleotides.

Sequencing was also assessed for full hyp linker sets, in which the dye-labeled nucleotides (one for each canonical nucleotide) included hyp10 or hyp20 linkers. This evaluation was performed to identify the set that permits the highest labeling fraction with the least quenching. Higher quenching may result in higher scarring (e.g., as described herein), which may reduce the incorporation efficiency of the polymerase. However, family B enzymes such as PolD may perform well in the presence of scars. Sequencing can be assessed at labeling fractions of 2.5% and 20% with dyes (e.g., Atto633).

Sequencing can be used to assess tolerance to various labeled nucleotides. FIG. 19 shows normalized bead data for nucleotides labeled with red-emitting dyes. The bright solution fraction (b_f) is plotted against the bright incorporated fraction (b_i). The curves conform to the following equation:

where d_f is the dark solution fraction. In FIG. 19, the calculated tolerance of G is 10.6, A is 2.8, U is 2.0, and C is 1.2. A positive tolerance indicates that at a labeling fraction of 50%, more than 50% is labeled. A reagent with a tolerance of 1 may have the least "context" in sequencing. Reagents with very negative tolerances (e.g., tolerance << 1) may have problems with uniform incorporation across the plurality of templates coupled to the support, since they must be used at such low concentrations that they may be below saturation and are consumed at non-uniform rates.
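The fitted equation itself is not reproduced in this text. One functional form consistent with the stated properties (a tolerance of 1 gives b_i = b_f, and a tolerance above 1 gives more than 50% labeled at a 50% labeling fraction) is a competitive-incorporation model, b_i = t·b_f / (d_f + t·b_f) with d_f = 1 − b_f. The sketch below uses that form purely as an assumption; it should not be read as the equation of the original disclosure.

```python
def bright_incorporated_fraction(bright_fraction, tolerance):
    """Assumed model: b_i = t * b_f / (d_f + t * b_f), with d_f = 1 - b_f and t > 0."""
    dark_fraction = 1.0 - bright_fraction
    return tolerance * bright_fraction / (dark_fraction + tolerance * bright_fraction)


# Tolerances reported for FIG. 19, evaluated at a 50% labeling fraction.
for base, tolerance in {"G": 10.6, "A": 2.8, "U": 2.0, "C": 1.2}.items():
    b_i = bright_incorporated_fraction(0.5, tolerance)
    print(f"{base}: tolerance {tolerance} -> {b_i:.0%} labeled at 50% labeling fraction")
```

Under this assumed model, a tolerance below 1 places the curve below the diagonal, which matches the description of a "negative" tolerance.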

Example 9: dye-labeled nucleotides comprising guanine or analogs thereof

Nucleotides that include guanine or analogs thereof may perform worse in base calling accuracy in sequencing applications (e.g., as described herein). This may be associated with photoinduced electron transfer from the nucleobase to the dye attached to the nucleobase, which may quench the signal emitted by the dye, thereby reducing the dynamic range of the signal. Thus, as provided herein, various dye-labeled nucleotides, including guanine or analogs thereof, are prepared and evaluated. Examples of such dye-labeled nucleotides include:

G1

G2

G3

G4

G5(Hyp10 linker, Cya dye)

G6(Hyp10 linker, Cya2 dye)

Several of the structures shown above include a Hyp10 linker comprising the sequence Gly-Hyp-Hyp-Hyp-Hyp-Hyp-Hyp-Hyp-Hyp from the N-terminus. G4, which lacks the Hyp10 linker, was highly quenched. The remaining dye-labeled nucleotides were evaluated in a sequencing assay, as described herein. Of the constructs shown, G6 provided the highest accuracy. The synthetic route for preparing G6 is shown in FIGS. 13A-13C.

Example 10: preparation of dye-labeled nucleotides

Dye-labeled nucleotides may include one or more amino acids. As noted above, diamines and diacids can be used to construct amino acids. Dye-labeled nucleotides may include two or more given amino acids as a repeat unit. Examples of dye-labeled nucleotides comprising two repeat units of amino acids are shown below:

FIGS. 14A and 14B show synthetic routes for preparing the dye-labeled nucleotides described above. The composition of each intermediate was confirmed by mass spectrometry. The dye-labeled nucleotides were evaluated in a bead assay as described in Example 7. With this linker, G* is less bright than G* with a polyhydroxyproline linker, but the linker reduces quenching relative to G* without a linker.

Example 11: evaluation of quenching

Dye-labeled nucleotides provided herein can reduce quenching between a nucleobase and the dye to which it is attached and/or between dyes in a nucleic acid molecule (e.g., a growing nucleic acid strand), for example in a homopolymeric region of the nucleic acid molecule. Quenching can be assessed in an enzyme-independent manner.

FIG. 15 shows a schematic for assessing quenching. Synthetic oligonucleotides are constructed using one or two "linker arm nucleotides". The linker arm nucleotide is a thymidine analog, and the linker arm contains a primary amine. Oligonucleotides containing linker arm nucleotides can be labeled with linkers and dyes and purified by HPLC. An advantage of bead-based assays is that accurate quantification of the reagents is not required; a large excess can be used, and bead washes are performed at each step to ensure that only stoichiometric amounts of oligonucleotide are bound to the template. Each dye linker was placed on two oligonucleotides. Beads were measured on a flow cytometer in the APC (red) channel. Percent quenching was determined by the following formula: % quenching = 100 × (1 − Fl_bis / (2 × Fl_mono)).
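The percent-quenching formula, 100 × (1 − Fl_bis / (2 × Fl_mono)), can be written as a short function. This is a minimal sketch; it assumes Fl_bis and Fl_mono are the bead fluorescence values for the oligonucleotides carrying two and one dye-linker, respectively, and the numeric readings are hypothetical.

```python
def percent_quenching(fl_bis, fl_mono):
    """% quenching = 100 * (1 - Fl_bis / (2 * Fl_mono))."""
    return 100.0 * (1.0 - fl_bis / (2.0 * fl_mono))


# Hypothetical flow cytometer readings (arbitrary fluorescence units).
print(percent_quenching(fl_bis=2000.0, fl_mono=1000.0))  # two dyes twice as bright as one: no quenching
print(percent_quenching(fl_bis=1000.0, fl_mono=1000.0))  # two dyes only as bright as one
```

A value of 0% means the doubly labeled oligonucleotide is exactly twice as bright as the singly labeled one; larger values mean the second dye adds less than its full signal.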

FIGS. 16 and 17 show the quenching results for the red dye linkers (FIG. 16) and the green dye linkers (FIG. 17). The results show that the nature of the dye affects quenching. Negative charges (compare Atto532 and Atto Rho6G) may reduce quenching, but if the dye is very large and flat (see Cy5 and Alexa647), quenching may not be reduced. The hyp10 or hyp20 linker reduces quenching. As shown in FIG. 16, hyp10 reduced quenching in the case of Atto633, whereas the cyanine dyes were quenched even with four sulfonic acid groups. As shown in FIG. 17, the sulfonic acid groups on Atto532 reduced quenching, as did the combination of Atto532 and hyp10.

Example 12: interrogation of homopolymers

Nucleic acid templates are provided having homopolymer regions comprising cytosines of varying lengths (1C, 2C, 3C, 4C, 5C). The template is contacted with a guanosine-containing nucleotide labeled with an Atto532 fluorophore (e.g., as described herein; denoted G*). The labeled nucleotides can be provided in solution in the form of a nucleotide stream (e.g., as described herein). The nucleotide stream can include 100% labeled nucleotides (e.g., only labeled nucleotides without unlabeled nucleotides) or can include both labeled and unlabeled nucleotides (e.g., as described herein). The labeled nucleotides and, if present, the unlabeled nucleotides may be non-terminated, such that multiple nucleotides may be successively incorporated at as many positions as there are cytosines present in the template. An enzyme (e.g., a polymerase such as Bst 3.0) can be used to incorporate labeled nucleotides and/or unlabeled nucleotides into an extending primer, using a nucleic acid having a poly-cytosine sequence as a template. Multiple copies of the template may be immobilized on beads or other supports (e.g., as described herein). This procedure is shown schematically in FIGS. 20A and 20B.

In some cases, the labeled nucleotides are incorporated consecutively at as many positions as there are cytosines present in the template. In other cases, fewer than all potential G's are incorporated. When unlabeled nucleotides are included in the nucleotide stream, both unlabeled and labeled nucleotides can be incorporated. For example, for a template comprising a homopolymeric region of three cytosines, the incorporated nucleotides may have the sequence G*G*G*, G*G*G, G*GG*, GG*G*, G*GG, GG*G, GGG*, or GGG, where G* represents a labeled nucleotide and G represents an unlabeled nucleotide. The sequence of incorporated nucleotides can vary based on, for example, the labeling fraction of the nucleotide stream (e.g., the ratio of labeled to unlabeled nucleotides in the stream) and the optical (e.g., fluorescent) labeling reagent used to label the nucleotides.
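The possible labeled/unlabeled incorporation patterns across a homopolymer can be enumerated directly. The sketch below writes a labeled nucleotide as G* and an unlabeled one as G, and assumes full extension through the homopolymer and independent incorporation at each position with a fixed probability of being labeled (for instance, the labeling fraction when the tolerance is 1); both are simplifying assumptions, and the function name is hypothetical.

```python
from itertools import product


def incorporation_patterns(length, labeled_probability):
    """Map each labeled/unlabeled pattern to its probability under independence."""
    patterns = {}
    for combo in product(["G*", "G"], repeat=length):
        n_labeled = combo.count("G*")
        probability = (labeled_probability ** n_labeled
                       * (1 - labeled_probability) ** (length - n_labeled))
        patterns["".join(combo)] = probability
    return patterns


# Three-cytosine homopolymer at an assumed 20% chance of a labeled incorporation.
patterns = incorporation_patterns(length=3, labeled_probability=0.2)
for pattern, probability in patterns.items():
    print(f"{pattern:<9} {probability:.3f}")
```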

The labeled polynucleotide products were separated on a Bio-Rad denaturing acrylamide gel and imaged using blue and green LEDs to detect incorporated labeled nucleotides. As shown in FIG. 20C, 1, 2, 3, 4, and 5 consecutive cytosines can be detected using this method.

Example 13: sequencing by synthesis using high fractions of labeled nucleotides

A template nucleic acid of at least 30 nucleotides in length is sequenced using the procedures and labeled nucleotides described herein. The template to be sequenced may be immobilized on a support (e.g., as described herein). The template is sequenced by a sequencing-by-synthesis reaction in which the template is sequentially contacted with a solution (e.g., a nucleotide stream) comprising PolD polymerase (New England Biolabs) and a plurality of nucleotides of a single canonical type (e.g., T, A, C or G). In each nucleotide stream, approximately 20% of the nucleotides are labeled with Atto633 as described above, providing a labeling fraction of approximately 20%. The remaining nucleotides are unlabeled. The nucleotides in the nucleotide stream are non-terminated, to allow efficient sequencing of homopolymeric regions of the template. After the template is contacted with a first nucleotide stream comprising nucleotides of the first canonical type, the template is contacted with a wash stream to remove unincorporated nucleotides. Fluorescence images are collected. The linker of the fluorescent labeling reagent associated with the incorporated labeled nucleotide is then contacted with a cleavage stream comprising a cleavage reagent configured to cleave a cleavable group of the linker, separating the fluorescent dye (e.g., Atto633) of the fluorescent labeling reagent from the incorporated nucleotide. Additional wash streams may be used to remove the cleavage stream. In some cases, a chase stream comprising unlabeled nucleotides of the first canonical type may follow the initial nucleotide stream and precede or follow the imaging procedure. This process is repeated for the second, third and fourth nucleotide types in succession, and the entire cycle is repeated.
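The flow-based procedure described here can be illustrated with an idealized simulation: because the nucleotides are non-terminated, each flow extends the primer through the entire matching homopolymer. The sketch assumes complete, error-free incorporation, a T, A, C, G flow order, and a template written in the order in which the polymerase reads it; none of these details are specified as such in the text, and the function name is hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def flow_incorporations(template, flow_order="TACG", n_cycles=3):
    """Return (nucleotide, number incorporated) for each flow, idealized chemistry."""
    position = 0
    results = []
    for _ in range(n_cycles):
        for nucleotide in flow_order:
            count = 0
            # Non-terminated nucleotides extend through the whole homopolymer.
            while (position < len(template)
                   and COMPLEMENT[template[position]] == nucleotide):
                count += 1
                position += 1
            results.append((nucleotide, count))
    return results


# Template read by the polymerase as A, then CCC, then G, then T.
for nucleotide, count in flow_incorporations("ACCCGT"):
    print(nucleotide, count)
```

In this idealized picture, the per-flow count is what the fluorescence signal is meant to report, scaled by the labeling fraction.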

FIG. 21A shows the result of applying the method to a sample template. Black circles indicate incorporation of nucleotides, and gray circles indicate no incorporation of nucleotides in a particular flow cycle. As shown, the incorporation of one or more nucleotides in a flow cycle can be determined with high accuracy. Further, as shown in FIG. 21B, the relationship between signal intensity and labeled nucleotide homopolymer length can be substantially linear across multiple templates (e.g., as described herein). For example, the signal intensity may be proportional to the length of the homopolymeric region of the template. This proportionality indicates that the quenching effect has been sufficiently overcome. In FIG. 21B, the slope for G is 0.96, for C is 0.80, for A is 0.79, and for T is 0.70. The dotted line represents the actual signal, and the solid line represents the phase-corrected signal.
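A slope such as those reported for FIG. 21B can be obtained from an ordinary least-squares fit of signal against homopolymer length. The sketch below shows the calculation on hypothetical, perfectly linear data; the fitting method used for the figure is not stated in the text, so ordinary least squares is an assumption.

```python
def least_squares_line(lengths, signals):
    """Ordinary least-squares slope and intercept of signal versus homopolymer length."""
    n = len(lengths)
    mean_x = sum(lengths) / n
    mean_y = sum(signals) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(lengths, signals))
    sxx = sum((x - mean_x) ** 2 for x in lengths)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical normalized signals for homopolymers of length 1 to 5.
lengths = [1, 2, 3, 4, 5]
signals = [0.8, 1.6, 2.4, 3.2, 4.0]
slope, intercept = least_squares_line(lengths, signals)
print(f"slope = {slope:.2f}, intercept = {intercept:.2f}")
```

A slope near 1 with an intercept near 0 would correspond to a signal that grows in full proportion to homopolymer length, i.e., little loss from quenching.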

Example 14: sequencing by Synthesis Using 100% labeled nucleotides

Template nucleic acids having a length of at least 30 nucleotides were sequenced as described in Example 13, but using a solution in which 100% of the nucleotides were labeled. In FIG. 22, black circles indicate incorporation of bases in a given flow cycle, while gray circles indicate no incorporation of bases in a given flow cycle. As can be seen from FIG. 22, the sequencing method can be used to detect base incorporation through 50 flow cycles.

Example 15: labeled proteins

Proteins are labeled with a variety of optical (e.g., fluorescent) labeling reagents (e.g., as described herein). For example, a protein may be labeled with three or more optical labeling agents. The optical labeling reagents associated with the proteins may all comprise the same type of fluorescent dye. The optical labeling reagents associated with the proteins may all comprise the same type of linker. The protein may be an antibody, such as a monoclonal antibody.

The protein is used to label cells. A cell may be a component of a sample, which may include a plurality of cells. Flow cytometry can be used to analyze and sort cells of a sample. Flow cytometry analysis can identify cells labeled with the protein associated with the plurality of optical labeling reagents. In some cases, a plurality of cells of a sample can be labeled with an optical labeling reagent (e.g., as described herein). For example, cells comprising a particular cell surface feature (e.g., an antigen) configured to associate with a protein (e.g., a protein labeled with a plurality of optical labeling reagents, such as an antibody labeled with a plurality of optical labeling reagents) can be labeled with the labeled protein and analyzed and/or sorted using flow cytometry. The analyzed and/or sorted cells may be subjected to further downstream analysis and processing including, for example, nucleic acid sequencing, staining, imaging, functional assays, immunoassays, separation/amplification, additional labeling, immunoprecipitation, and the like.

While preferred embodiments of the present invention have been shown and described herein, it will be obvious to those skilled in the art that these embodiments are provided by way of example only. The present invention is not intended to be limited to the specific examples provided in the specification. Numerous variations, changes, and substitutions will now occur to those skilled in the art without departing from the disclosure. Further, it is to be understood that all aspects of the present invention are not limited to the specific depictions, configurations or relative proportions set forth herein, which depend upon a variety of conditions and variables. It should be understood that various alternatives to the embodiments of the invention described herein may be employed in practicing the invention. It is therefore contemplated that the present invention shall also cover any such alternatives, modifications, variations or equivalents. It is intended that the following claims define the scope of the invention and that methods and structures within the scope of these claims and their equivalents be covered thereby.
