System and method for identifying sequence information from single nucleic acid molecule measurements

Document No.: 1189045. Publication date: 2020-09-22.

Abstract: System and method for identifying sequence information from single nucleic acid molecule measurements, by D. C. Schwartz, S. Nandi, and M. A. Newton, dated 2018-12-04. Systems and methods for identifying sequence information from measurements made on single nucleic acid molecules are disclosed. The systems and methods may include binding portions of the nucleic acid molecule to marker molecules, such as fluorescent molecules and/or intercalating molecules. The marker molecules provide a detectable signal comprising information about the underlying genomic information at the location on the nucleic acid molecule to which a given marker molecule is bound. A profile of detectable signal versus position along the nucleic acid is obtained for a plurality of different nucleic acid molecules. The PRIMR algorithm processes the data to provide a consensus profile from which consensus underlying genomic information can be determined.

1. A method of obtaining data relating to a nucleic acid molecule, the method comprising:

a) binding a plurality of marker molecules to at least a portion of the nucleic acid molecule, each of the plurality of marker molecules providing a detectable signal comprising potential genomic information about the nucleic acid molecule;

b) acquiring detectable signals from a plurality of locations along at least a portion of the nucleic acid molecule; and

c) generating a report or output signal comprising the detectable signal.

2. The method of claim 1, wherein at least a portion of the nucleic acid molecule is linear.

3. The method of claim 2, wherein at least a portion of the nucleic acid molecule is confined within a nanoslit.

4. The method of any one of the preceding claims, wherein the detectable signal comprises potential genomic information derived from preferential binding to one sequence over another sequence.

5. The method of any one of the preceding claims, wherein the plurality of marker molecules comprises a plurality of fluorescent molecules.

6. The method of claim 5, wherein the plurality of marker molecules comprises a plurality of {1,1'-(4,4,8,8-tetramethyl-4,8-diazaundecamethylene)bis[4-[(3-methylbenzo-1,3-oxazol-2-yl)methylene]-1,4-dihydroquinolinium] tetraiodide} (YOYO-1) molecules.

7. The method of any one of the preceding claims, wherein the plurality of marker molecules comprises a plurality of first and second fluorescent molecules.

8. The method of claim 7, wherein the first and second fluorescent molecules interact with each other to provide the detectable signal.

9. The method of claim 7, wherein the first fluorescent molecule and the second fluorescent molecule have different emission characteristics.

10. The method of claim 7, wherein the first fluorescent molecule and the second fluorescent molecule have different absorption characteristics.

11. The method of claim 7, wherein the first fluorescent molecule and the second fluorescent molecule have different binding characteristics.

12. The method of any one of the preceding claims, further comprising binding a plurality of quencher molecules to at least a portion of the nucleic acid molecule, each of the plurality of quencher molecules modulating emission from the plurality of marker molecules to provide the detectable signal.

13. The method of any one of claims 1-4, wherein the plurality of marker molecules comprises a plurality of donor molecules and a plurality of acceptor molecules.

14. The method of any one of the preceding claims, wherein the binding of step a) is by covalent bonds, ionic bonds, polar bonds, hydrogen bonds or combinations thereof.

15. The method of any one of the preceding claims, wherein the binding of step a) involves inserting each of the plurality of marker molecules between bases of the nucleic acid molecule.

16. The method of any one of the preceding claims, comprising:

a) binding a second plurality of marker molecules to at least a second portion of the nucleic acid molecule, each of the second plurality of marker molecules providing a detectable signal; and

b) obtaining a detectable signal from a second plurality of locations along at least a second portion of the nucleic acid molecule.

17. The method of any one of the preceding claims, comprising repeating steps a) and b) again using a second nucleic acid molecule instead of the nucleic acid molecule.

18. The method of claim 17, wherein the nucleic acid molecule and the second nucleic acid molecule have substantially the same sequence.

19. The method of claim 18, wherein at least a portion of the nucleic acid molecule and at least a portion of the second nucleic acid molecule at least partially overlap.

20. The method of claim 17, wherein the nucleic acid molecule and the second nucleic acid molecule have different sequences.

21. The method of any one of the preceding claims, comprising repeating steps a) and b) a further plurality of times, each of the further plurality of times using a different one of a plurality of other nucleic acid molecules, respectively, in place of the nucleic acid molecule.

22. The method of claim 21, wherein the nucleic acid molecule and the plurality of other nucleic acid molecules have substantially the same sequence.

23. The method of claim 22, wherein at least a portion of the nucleic acid molecule and at least a portion of each of different other nucleic acid molecules of the plurality of other nucleic acid molecules at least partially overlap.

24. The method of any one of the preceding claims, wherein the nucleic acid molecule is a single-stranded DNA molecule, a double-stranded DNA molecule, a single-stranded RNA molecule, or a double-stranded RNA molecule.

25. The method of any one of claims 17 to the preceding claim, wherein the second nucleic acid molecule is a single stranded DNA molecule, a double stranded DNA molecule, a single stranded RNA molecule, or a double stranded RNA molecule.

26. The method of any one of claims 21 to the preceding claim, wherein the plurality of other nucleic acid molecules are a plurality of single-stranded DNA molecules, a plurality of double-stranded DNA molecules, a plurality of single-stranded RNA molecules, or a plurality of double-stranded RNA molecules.

27. The method of any preceding claim, wherein the detectable signal is an optical signal and the acquiring of step b) comprises detecting the optical signal.

28. The method of claim 27, wherein the detectable signal is an optical fluorescence signal and the acquiring of step b) comprises detecting the optical fluorescence signal.

29. The method of claim 27 or 28, wherein the optical signal is detected by a microscope.

30. The method of claim 28, wherein the optical fluorescence signal is detected by a fluorescence microscope.

31. The method of claim 30, wherein the fluorescence microscope is a near-field microscope.

32. The method of any one of the preceding claims, wherein the detectable signal is triggered by an external stimulus.

33. The method of claim 32, wherein the external stimulus is electromagnetic radiation.

34. The method of any preceding claim, wherein the acquiring of step b) comprises extracting the detectable signal from an image.

35. The method of claim 34, wherein the method further comprises evaluating the quality of the image and excluding any images that do not meet a quality threshold.

36. The method of claim 35, wherein the quality assessment comprises: x) analyzing the detectable signal in a predetermined number of pixels around a nucleic acid molecule in the image, thereby producing an integrated intensity measurement; y) clustering the integrated intensity measurements; and z) scoring the image based on one or more factors selected from the group consisting of: the number of clusters in the image, the distance between clusters if two or more clusters exist, a cluster quality metric, and combinations thereof.

37. A method of analyzing detectable signals obtained from a plurality of nucleic acid molecules, the method comprising:

a) receiving a data set comprising a profile of detectable signal intensities versus position obtained from a plurality of marker molecules bound to substantially the same portion of the plurality of nucleic acid molecules;

b) extracting potential genomic information from the dataset; and

c) generating a report or output signal comprising the potential genomic information.

38. The method of claim 37, wherein the extracting of step b) comprises eliminating outliers from the dataset.

39. The method of claim 38, wherein eliminating outliers uses Fraiman and Muniz (FM) depth and random projection (RP) depth.

40. The method of any one of claims 37 to the preceding claim, wherein the extracting of step b) comprises normalizing the profile of the detectable signal intensity versus position.

41. The method of any one of claims 37 to the preceding claim, wherein the extracting of step b) comprises excluding a profile of detectable signal intensity versus position corresponding to nucleic acid molecules having a stretch value outside a predetermined range of acceptable stretch values.

42. The method of any one of claims 37 to the preceding claim, wherein the extracting of step b) comprises smoothing a profile of the detectable signal intensity versus position.

43. The method of claim 42, wherein the extracting of step b) comprises re-normalizing the smoothed profile of detectable signal intensity versus position.

44. The method of any one of claims 37 to the preceding claim, wherein the method further comprises generating a consensus profile of detectable signal intensity versus position.

45. The method of claim 44, wherein generating the consensus profile comprises correcting for amplitude variations between the profiles of detectable signal intensity versus position.

46. The method of claim 44 or 45, wherein generating the consensus profile comprises correcting for phase variations between the profiles of detectable signal intensity versus position.

47. The method of any of claims 44 to the preceding claim, wherein the generating the consensus profile comprises an iterative alignment process.

48. The method of any one of claims 44 to the preceding claim, wherein the generating the consensus profile comprises an iterative process comprising the steps of: (i) detecting outliers; (ii) calculating a template in a first iteration and updating the template in a subsequent iteration; (iii) aligning the profiles of detectable signal intensity versus position with the template; and (iv) calculating an average similarity between the profiles of detectable signal intensity versus position and the template, wherein the iterative process is repeated until the average similarity is maximized, the aligned profiles of step (iii) from the final iteration of the iterative process are subjected to steps (i) and (ii), and the updated template of step (ii) is the consensus profile.

49. The method of claim 48, further comprising correlating the consensus profile with a characteristic of the potential genomic information.

50. The method of any one of claims 37 to the preceding claim, wherein the plurality of marker molecules comprises a plurality of fluorescent molecules.

51. The method of claim 50, wherein said plurality of marker molecules comprises a plurality of {1,1'-(4,4,8,8-tetramethyl-4,8-diazaundecamethylene)bis[4-[(3-methylbenzo-1,3-oxazol-2-yl)methylene]-1,4-dihydroquinolinium] tetraiodide} (YOYO-1) molecules.

52. The method of any one of claims 37 to the preceding claim, wherein the plurality of nucleic acid molecules are a plurality of single-stranded DNA molecules, a plurality of double-stranded DNA molecules, a plurality of single-stranded RNA molecules, or a plurality of double-stranded RNA molecules.

53. The method of any one of claims 37 to the preceding claim, wherein the extracting of step b) comprises generating a prediction dataset using the predicted potential genomic information; and minimizing a difference between the data set and the predicted data set by altering the predicted potential genomic information, wherein the potential genomic information is the predicted potential genomic information that minimizes the difference.

54. The method of any one of claims 37 to the preceding claim, wherein the data set comprising a profile of detectable signal intensity versus position is generated by the method of any one of claims 1 to 36.

55. A method, comprising:

a) binding at least a portion of each of a plurality of nucleic acid molecules to a plurality of fluorescent molecules, the plurality of fluorescent molecules providing a detectable fluorescent signal comprising potential genomic information about a given portion of the nucleic acid molecule bound to a given fluorescent molecule, at least a portion of each of the plurality of nucleic acid molecules having overlapping regions with substantially the same characteristics;

b) obtaining a detectable fluorescent signal and a location for at least a portion of each of the plurality of nucleic acid molecules, thereby obtaining a data set comprising a profile of the relative locations of the detectable fluorescent signals;

c) identifying an outlier of the profile of the relative position of the detectable signal, thereby generating an outlier profile;

d) calculating a median profile from the profiles of relative positions of said detectable signals not identified as outlier profiles in step c);

e) calculating a weighted mean profile by estimating a similarity index between the profile of the relative positions of said detectable signals not identified as an outlier profile in step c) and said median profile of step d), and then generating a template by weighted averaging the profiles of the relative positions of said detectable signals not identified as outlier profiles in step c) by weighting according to said similarity index;

f) aligning the profile of the relative positions of the detectable signals with the template, thereby generating an aligned profile of the relative positions of the detectable signals;

g) identifying an outlier of the alignment profile of the relative position of the detectable signal, thereby generating an outlier alignment profile;

h) calculating a median alignment profile from alignment profiles of relative positions of said detectable signals not identified as outlier alignment profiles in step g);

i) calculating an updated weighted mean profile by evaluating an alignment similarity index between the alignment profile of the relative position of said detectable signal not identified as an outlier alignment profile in step g) and said median alignment profile, and then generating an alignment template by weighted averaging the alignment profiles of the relative position of said detectable signal not identified as an outlier alignment profile in step g) by weighting according to said alignment similarity index;

i) aligning said alignment profile of relative positions of said detectable signals with said alignment template, thereby generating a second alignment profile of relative positions of said detectable signals, said alignment of step i) having a lower penalty parameter than said alignment of step f);

j) calculating an average similarity between the alignment profile of the relative positions of the detectable signals and the alignment template;

k) repeating steps g), h) and j) using a second penalty parameter lower than said lower penalty parameter until the difference between the average similarities for successive iterations of said repeating is less than a threshold, thereby generating a final alignment profile of the relative positions of said detectable signals;

l) identifying outliers of a final alignment profile of the relative positions of the detectable signals from the final iteration of step k), thereby generating an outlier final alignment profile;

m) calculating a median final alignment profile from the final alignment profiles of relative positions of said detectable signals not identified as outlier final alignment profiles in step l); and

n) calculating a final weighted average profile by estimating a final alignment similarity index between the final alignment profiles of the relative positions of the detectable signals not identified as outlier final alignment profiles in step l) and the median final alignment profile, and then generating a consensus profile of the relative positions of the detectable signals by final weighted averaging of the final alignment profiles of the relative positions of the detectable signals not identified as outlier final alignment profiles in step l), weighted according to the final alignment similarity index.

56. The method of claim 55, further comprising identifying the substantially identical features using a consensus profile of the relative positions of the detectable signals.

57. The method of claim 55 or 56, wherein the plurality of nucleic acid molecules are stretched linearly.

58. The method of any one of claims 55 to the preceding claim, wherein the plurality of nucleic acid molecules are confined within a nanoslit.

59. The method of any one of claims 55 to the preceding claim, wherein the plurality of fluorescent molecules preferentially bind to a sequence over another sequence.

60. The method of any one of claims 55 to the preceding claim, wherein the plurality of fluorescent molecules comprises a plurality of {1,1'-(4,4,8,8-tetramethyl-4,8-diazaundecamethylene)bis[4-[(3-methylbenzo-1,3-oxazol-2-yl)methylene]-1,4-dihydroquinolinium] tetraiodide} (YOYO-1) molecules.

61. The method of any one of claims 55 to the preceding claim, wherein the binding of step a) is by covalent, ionic, polar, hydrogen bonding, or a combination thereof.

62. The method of any one of claims 55 to the preceding claim, wherein the binding of step a) involves inserting the marker between bases of the nucleic acid molecule.

63. The method of any one of claims 55 to the preceding claim, wherein the plurality of nucleic acid molecules are a plurality of single-stranded DNA molecules, a plurality of double-stranded DNA molecules, a plurality of single-stranded RNA molecules, or a plurality of double-stranded RNA molecules.

64. The method of any one of claims 55 to the preceding claim, wherein the obtaining of step b) is achieved by fluorescence microscopy.

65. The method of any one of claims 55 to the preceding claim, wherein the acquiring of step b) is achieved by near field microscopy.

66. The method of any one of claims 55 to the preceding claim, the acquiring of step b) comprising acquiring an image and extracting from the image a profile of the relative positions of detectable signals.

67. The method of claim 66, further comprising evaluating the quality of the images and excluding any images that do not meet a quality threshold.

68. The method of claim 67, wherein the quality assessment comprises: x) analyzing the detectable signal in a predetermined number of pixels around a nucleic acid molecule in the image, thereby producing an integrated intensity measurement; y) clustering the integrated intensity measurements; and z) scoring the image based on one or more factors selected from the group consisting of: the number of clusters in the image, the distance between clusters if two or more clusters exist, a cluster quality metric, and combinations thereof.

69. A non-transitory computer readable medium having stored thereon instructions that, when executed by a processor, cause the processor to perform the method of any of claims 37-52.

70. A system comprising a processor and the non-transitory computer readable medium of claim 69.

71. A system comprising a fluorescence microscope, a processor and a memory, the fluorescence microscope being arranged to acquire the detectable fluorescence signal of step b), and the memory having stored thereon instructions that, when executed by the processor, cause the processor to perform steps c) to n) of the method of any one of claims 55-68.

Background

Nucleic acid molecule analysis is very important to biology. There is a need for new methods for rapidly and efficiently analyzing potential genomic information in nucleic acid molecules. It would be beneficial to provide a method that can analyze a single nucleic acid molecule or group of single nucleic acid molecules and provide relevant information about potential genomic information.

Disclosure of Invention

In one aspect, the present disclosure provides a method of obtaining data relating to a nucleic acid molecule. The method comprises the following steps: a) binding a plurality of marker molecules to at least a portion of a nucleic acid molecule, the plurality of marker molecules each providing a detectable signal comprising potential genomic information about the nucleic acid molecule; b) acquiring detectable signals from a plurality of locations along at least a portion of the nucleic acid molecule; and c) generating a report or output signal comprising the detectable signal.

In another aspect, the present disclosure provides a method of analyzing detectable signals obtained from a plurality of nucleic acid molecules. The method comprises the following steps: a) receiving a data set comprising a profile of detectable signal intensities versus position obtained from a plurality of marker molecules bound to substantially the same portion of a plurality of nucleic acid molecules; b) extracting potential genomic information from the dataset; and c) generating a report or output signal comprising the potential genomic information.

In another aspect, the present disclosure provides a method comprising the steps of: a) binding at least a portion of each of a plurality of nucleic acid molecules to a plurality of fluorescent molecules, the plurality of fluorescent molecules providing a detectable fluorescent signal comprising potential genomic information about a given portion of the nucleic acid molecule bound to a given fluorescent molecule, at least a portion of each of the plurality of nucleic acid molecules having overlapping regions with substantially the same characteristics; b) obtaining a detectable fluorescent signal and a location for at least a portion of each of the plurality of nucleic acid molecules, thereby obtaining a data set comprising a profile of the relative locations of the detectable fluorescent signals; c) identifying an outlier of the profile of the relative position of the detectable signal, thereby generating an outlier profile; d) calculating a median profile from the profiles of the relative positions of the detectable signals not identified as outlier profiles in step c); e) calculating a weighted mean profile by estimating a similarity index between the profile of the relative positions of the detectable signals not identified as an outlier profile in step c) and the median profile of step d), and then generating a template by weighted averaging of the profiles of the relative positions of the detectable signals not identified as outlier profiles in step c), weighted according to the similarity index; f) aligning the profile of the relative position of the detectable signal with the template, thereby generating an aligned profile of the relative position of the detectable signal; g) identifying an outlier of the alignment profile of the relative position of the detectable signal, thereby generating an outlier alignment profile; h) calculating a median alignment profile from the alignment profiles of the relative positions of the detectable signals not identified as outlier alignment profiles in step g); i) calculating an updated weighted mean profile by evaluating an alignment similarity index between the alignment profile of the relative position of the detectable signal not identified as an outlier alignment profile in step g) and the median alignment profile, and then generating an alignment template by weighted averaging of the alignment profiles of the relative position of the detectable signal not identified as an outlier alignment profile in step g), weighted according to the alignment similarity index; i) aligning the alignment profile of the relative positions of the detectable signals with the alignment template to generate a second alignment profile of the relative positions of the detectable signals, the alignment of step i) having a lower penalty parameter than the alignment of step f); j) calculating an average similarity between the alignment profile of the relative positions of the detectable signals and the alignment template; k) repeating steps g), h) and j) using a second penalty parameter lower than the lower penalty parameter until the difference between the average similarities for successive iterations of the repetition is less than a threshold, thereby generating a final alignment profile of the relative positions of the detectable signals; l) identifying outliers of the final alignment profile of the relative positions of the detectable signals from the final iteration of step k), thereby generating an outlier final alignment profile; m) calculating a median final alignment profile from the final alignment profiles of the relative positions of the detectable signals not identified as outlier final alignment profiles in step l); and n) calculating a final weighted average profile by estimating a final alignment similarity index between the final alignment profiles of the relative positions of the detectable signals not identified as outlier final alignment profiles in step l) and the median final alignment profile, and then generating a consensus profile of the relative position of the detectable signal by final weighted averaging of the final alignment profiles of the relative positions of the detectable signal not identified as outlier final alignment profiles in step l), weighted according to the final alignment similarity index.
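The iterative structure of steps c) through n) can be illustrated with a minimal Python sketch. It is not the disclosed implementation, and it makes several simplifying assumptions: all profiles are already sampled on a common position grid, the outlier rule is a simple quantile cut on correlation with the pointwise median (standing in for whatever outlier test is actually used), the similarity index is Pearson correlation, and alignment is a circular integer shift rather than a penalized registration. The function names `similarity`, `build_template`, `align_by_shift`, and `consensus` are hypothetical.

```python
import numpy as np

def similarity(a, b):
    """Pearson correlation between two equal-length profiles (assumed similarity index)."""
    return float(np.corrcoef(a, b)[0, 1])

def build_template(profiles, outlier_quantile=0.1):
    """Drop outliers, take the median of the rest, then form a similarity-weighted mean."""
    pointwise_median = np.median(profiles, axis=0)
    sims = np.array([similarity(p, pointwise_median) for p in profiles])
    # Assumed outlier rule: discard the profiles least correlated with the median.
    kept = profiles[sims >= np.quantile(sims, outlier_quantile)]
    median_profile = np.median(kept, axis=0)
    # Weight each kept profile by its (non-negative) similarity to the median profile.
    weights = np.clip([similarity(p, median_profile) for p in kept], 0.0, None)
    return np.average(kept, axis=0, weights=weights)

def align_by_shift(profile, template, max_shift):
    """Assumed stand-in for registration: pick the best circular integer shift."""
    shifts = range(-max_shift, max_shift + 1)
    best = max(shifts, key=lambda s: similarity(np.roll(profile, s), template))
    return np.roll(profile, best)

def consensus(profiles, max_shift=5, tol=1e-4, max_iter=20):
    """Alternate template building and alignment until the mean similarity stabilizes."""
    current = np.asarray(profiles, dtype=float)
    previous = -np.inf
    for _ in range(max_iter):
        template = build_template(current)
        current = np.array([align_by_shift(p, template, max_shift) for p in current])
        mean_sim = np.mean([similarity(p, template) for p in current])
        if abs(mean_sim - previous) < tol:
            break
        previous = mean_sim
    # Final pass: outlier removal, median, and weighted mean on the aligned profiles.
    return build_template(current)

# Usage on synthetic data: noisy, slightly shifted copies of one underlying profile.
rng = np.random.default_rng(0)
x = np.linspace(0, 4 * np.pi, 200)
base = np.sin(x) + 0.5 * np.sin(3 * x)
noisy = np.array([np.roll(base, rng.integers(-3, 4)) + rng.normal(0, 0.2, x.size)
                  for _ in range(30)])
consensus_profile = consensus(noisy)
```

The sketch keeps the two nested ideas of the method separate: `build_template` covers the outlier, median, and weighted-mean steps, while `consensus` covers the repeat-until-stable loop. The decreasing penalty parameter of the described alignment has no counterpart here because a pure shift has no penalty term.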

In another aspect, the present disclosure provides a non-transitory computer readable medium having stored thereon instructions that, when executed by a processor, cause the processor to perform one of the methods described herein.

In another aspect, the present disclosure provides a system comprising a processor and a non-transitory computer readable medium as described elsewhere herein.

In another aspect, the present disclosure provides a system that includes a fluorescence microscope, a processor, and a memory.

Drawings

Fig. 1 is a flow chart showing steps of a method according to an aspect of the present disclosure.

Fig. 2 is a flow chart showing steps of a method according to an aspect of the present disclosure.

Fig. 3 is a flow chart showing steps of a method according to an aspect of the present disclosure.

Fig. 4 is a schematic diagram representing a system in accordance with an aspect of the present disclosure.

Detailed Description

Before the present invention is described in further detail, it is to be understood that this invention is not limited to particular embodiments described. It is also to be understood that the terminology used herein is for the purpose of describing particular embodiments only, and is not intended to be limiting. The scope of the invention is limited only by the claims. As used herein, the singular forms "a", "an" and "the" include plural embodiments unless the context clearly dictates otherwise.

Specific structures, devices and methods relating to modifying biomolecules are disclosed. It will be apparent to those skilled in the art that many more modifications besides those already described are possible without departing from the inventive concepts herein. In interpreting the disclosure, all terms should be interpreted in the broadest possible manner consistent with the context. The terms "comprises" and "comprising" should be interpreted as referring to elements, components, or steps in a non-exclusive manner, such that the referenced elements, components, or steps may be combined with other elements, components, or steps that are not expressly referenced. Embodiments referred to as "comprising" certain elements are also considered to "consist essentially of" and "consist of" the element(s). Where two or more ranges of specific values are recited, the disclosure contemplates all combinations of the upper and lower limits of those ranges not explicitly recited. For example, reference to a value of 1-10 or 2-9 also contemplates values between 1-9 or 2-10.

Various aspects may be described herein in terms of functional components and process steps. It should be understood that such components and steps may be realized by any number of hardware components configured to perform the specified functions.

Methods

The present disclosure provides various methods. It should be understood that the various methods are suitable for use with other methods. Similarly, it should be understood that various methods are suitable for use with the systems described elsewhere herein. When a feature of the present disclosure is described with respect to a given method, it is also expressly contemplated that the feature can be used with other methods and systems described herein, unless the context clearly dictates otherwise.

Referring to Fig. 1, the present disclosure provides a method 100 of obtaining data relating to a nucleic acid molecule. At process block 102, the method 100 includes binding at least a portion of the nucleic acid molecule to a plurality of marker molecules. The plurality of marker molecules each provide a detectable signal comprising potential genomic information about the nucleic acid molecule. At process block 104, the method 100 includes acquiring detectable signals from a plurality of locations along at least a portion of the nucleic acid molecule. At process block 106, the method 100 may include generating a report or output signal including the detectable signal.

At optional process block 108, the method 100 may include binding at least a second portion of the nucleic acid molecule to a second plurality of marker molecules. The second plurality of marker molecules each provide a detectable signal. At optional process block 110, the method 100 may include acquiring detectable signals from a second plurality of locations along at least a second portion of the nucleic acid molecule.

In some cases, the method 100 may include repeating process blocks 102 and 104 with a second nucleic acid molecule in place of the nucleic acid molecule. The nucleic acid molecule and the second nucleic acid molecule can have substantially the same sequence or can have different sequences. As used herein, "substantially the same sequence" refers to nucleic acid sequences that cannot be distinguished using the methods of the present disclosure. Nucleic acid molecules having substantially the same sequence may nevertheless differ in the following ways: (1) single nucleotide polymorphisms (SNPs) or single nucleotide variations (SNVs), which are single base pair differences in sequence; (2) small insertions and deletions (INDELs), which are short insertions or deletions of 1-100 bp; and (3) methylation, such as C-me and A-me. As used herein, "different sequences" refers to nucleic acid sequences that are distinguishable using the methods of the present disclosure.

In some cases, the method 100 may include repeating process blocks 102 and 104 a further plurality of times, each time using a different one of a plurality of other nucleic acid molecules in place of the nucleic acid molecule. The nucleic acid molecule and the plurality of other nucleic acid molecules can have substantially the same sequence. At least a portion of the nucleic acid molecule and at least a portion of each of the other nucleic acid molecules may at least partially overlap.

Referring to fig. 2, the present disclosure provides a method 200 of analyzing detectable signals obtained from a plurality of nucleic acid molecules. At process block 202, the method 200 includes collecting a data set. The data set includes a profile of detectable signal strength versus position. Detectable signal intensity is obtained from a plurality of marker molecules bound to substantially the same portion of a plurality of nucleic acid molecules. At process block 204, the method 200 includes extracting potential genomic information from the data set. At process block 206, the method 200 includes generating a report or output signal including the potential genomic information.

In any of the methods, the detectable signal may comprise potential genomic information derived from preferential binding to one sequence over another. For example, a marker molecule or fluorescent molecule that preferentially binds a GC-rich fragment relative to an AT-rich fragment can provide information about the amount of GC relative to AT in the underlying genomic information.

The plurality of marker molecules may comprise a plurality of fluorescent molecules. In the case where a fluorescent molecule is involved, the fluorescent molecule may be one capable of binding to a nucleic acid molecule, including, but not limited to, {1,1'-(4,4,8,8-tetramethyl-4,8-diazaundecamethylene)bis[4-[(3-methylbenzo-1,3-oxazol-2-yl)methylene]-1,4-dihydroquinolinium] tetraiodide} (YOYO-1), ethidium bromide, oxazole yellow (the YOYO fluorescent monomer), SYTOX orange, SYTOX green, SYBR gold, YO-Pro-1, POPO-3, DAPI, and the like.

The plurality of marker molecules may include a plurality of first fluorescent molecules and a plurality of second fluorescent molecules. The plurality of marker molecules may further comprise a plurality of third fluorescent molecules, a plurality of fourth fluorescent molecules, a plurality of fifth fluorescent molecules, and so on up to the plurality of nth fluorescent molecules. Each of these different fluorescent molecules may interact with each other to provide a detectable signal. Each of these different fluorescent molecules may have different emission characteristics, such as emission wavelength, emission waveform, and the like. Each of these different fluorescent molecules may have different absorption characteristics, such as absorption wavelength, absorption coefficient, and the like. Each of these different fluorescent molecules may have different binding characteristics.

The methods described herein can further comprise binding any nucleic acid molecule or at least a portion of any nucleic acid molecule to a plurality of quencher molecules. The quencher molecule can modulate emission from the plurality of marker molecules to provide a detectable signal.

The plurality of marker molecules may comprise a plurality of donor molecules and a plurality of acceptor molecules. The plurality of marker molecules may comprise a plurality of protein markers, including intercalating fluorescent proteins, such as those described in Lee, S., Oh, Y., Lee, J., Choe, S., Lim, S., Lee, H.S., … Schwartz, D.C. (2016). DNA binding fluorescent proteins for the direct visualization of large DNA molecules. Nucleic Acids Research, 44(1), e6. doi:10.1093/nar/gkv834, the entire contents of which are incorporated herein by reference.

Referring to fig. 3, the present disclosure provides a method 300. Method 300 is one specific implementation of a combination of methods 100 and 200. The description of method 300 should not be construed as limiting the interpretation of methods 100 and 200. Aspects of method 300 may utilize aspects of methods 100 and 200, and vice versa. At process block 302, the method 300 includes binding at least a portion of each of a plurality of nucleic acid molecules to a plurality of fluorescent molecules. It is to be understood that the exemplary fluorescent molecule is only one example of the above-mentioned marker molecules, and that other marker molecules may be considered. The plurality of fluorescent molecules provides a detectable fluorescent signal that includes potential genomic information about a given portion of the nucleic acid molecule bound to a given fluorescent molecule. At least a portion of each of the plurality of nucleic acid molecules has an overlapping region having substantially the same characteristics.

At process block 304, the method 300 includes obtaining a detectable signal and a location of at least a portion of each of the plurality of nucleic acid molecules. The acquisition of process block 304 produces a data set that includes a profile of the relative positions of the detectable fluorescent signals.

At process block 306, the method 300 includes identifying outliers of the profile of the relative location of the detectable signal, thereby generating an outlier profile. One of ordinary skill in the imaging arts will appreciate that there are a variety of methods for eliminating poor quality images. In one non-limiting example, an image quality assessment method was developed to identify high quality images for subsequent analysis. The image quality evaluation method comprises the following steps:

1. For each molecule in the image frame, Integrated Fluorescence Intensity (IFI) measurements are analyzed for up to three pixels around the molecule.

2. The IFIs are clustered using the Bayesian Information Criterion (BIC) and Gaussian Mixture Models (GMM). A high quality molecule image has a single cluster of IFIs.

3. In the case of multiple clusters, the distance between the centroids of the two farthest clusters is used as one of the factors in establishing the quality score. Other factors are cluster quality metrics such as the Dunn index and the connectivity index (see Brock, Guy, Vasyl Pihur, Susmita Datta, Somnath Datta, et al. 2011. clValid, an R package for cluster validation. Journal of Statistical Software (Brock et al., March 2008), the entire contents of which are incorporated herein by reference).

4. A training set of 300 images was manually labeled as "high" or "low" quality, and a logistic regression model was fitted using the factors described in step 3.

5. Using cross-validation, an optimal probability cutoff was obtained for detecting an image as "high quality" by minimizing type II errors.

Other data processing may be performed. For example, the profiles may be normalized. As another example, profiles may be selected to ensure that the data for the DNA molecules fall within a given range of stretch values (e.g., +/-10% of the median stretch). As a further example, the scans may be smoothed using methods known to those of ordinary skill in the art, such as the B-spline smoothing method of De Boor (De Boor, Carl. 1978. A practical guide to splines, Vol. 27, Springer-Verlag, New York, the entire contents of which are incorporated by reference). Some of the preprocessing steps are described in more detail in Example 1 below.
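By way of non-limiting illustration, the stretch-based selection and normalization described above can be sketched as follows. The function names, the use of the median intensity as the normalization constant, and the sample values are assumptions made for this sketch; the disclosure does not specify the normalization scheme.

```python
from statistics import median

def filter_by_stretch(profiles, stretches, tolerance=0.10):
    """Keep profiles whose stretch lies within +/- tolerance of the median stretch."""
    med = median(stretches)
    low, high = med * (1 - tolerance), med * (1 + tolerance)
    return [p for p, s in zip(profiles, stretches) if low <= s <= high]

def normalize(profile):
    """Divide a profile by its own median intensity (assumed normalization)."""
    m = median(profile)
    return [v / m for v in profile]

# Hypothetical intensity profiles and stretch values for three molecules.
profiles = [[2.0, 4.0, 6.0], [1.0, 2.0, 3.0], [5.0, 5.0, 5.0]]
stretches = [1.00, 1.05, 1.30]

kept = filter_by_stretch(profiles, stretches)   # third molecule is over-stretched
normalized = [normalize(p) for p in kept]
```

In this sketch the two retained profiles become identical after normalization, which illustrates why normalization precedes any comparison of profile shapes.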

At process block 308, the method 300 includes calculating a median profile from the profiles of the relative positions of the detectable signals that were not identified as outlier profiles in process block 306. The median profile may be calculated using a functional data depth (functional data depth) metric as understood by those of ordinary skill in the art, including, but not limited to, Fraiman and Muniz depths, h-mode depths, random projection depths (random projection depth), random Tukey depths, and the like.

At process block 310, method 300 includes calculating a weighted average profile, thereby generating a template including the weighted average profile. A weighted average profile is calculated by evaluating a similarity index between the profile of the relative location of the detectable signal that was not identified as an outlier profile in process block 306 and the median profile of process block 308, and then taking a weighted average of the profiles of the relative location of the detectable signal that was not identified as an outlier profile in process block 306 by weighting according to the similarity index.
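A minimal sketch of the weighted average computation of process block 310 follows. It assumes, purely for illustration, that the median profile is a pointwise median (rather than a depth-based median) and that the similarity index is a Pearson correlation clipped at zero; neither choice is stated in the disclosure, and the sample profiles are hypothetical.

```python
from math import sqrt
from statistics import mean, median

def pointwise_median(profiles):
    """Stand-in median profile: the median at each position (an assumption)."""
    return [median(column) for column in zip(*profiles)]

def pearson(a, b):
    """Pearson correlation, used here as an assumed similarity index."""
    ma, mb = mean(a), mean(b)
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    den = sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))
    return num / den if den else 0.0

def weighted_average_template(profiles):
    """Weight each profile by its similarity to the median profile."""
    med = pointwise_median(profiles)
    weights = [max(pearson(p, med), 0.0) for p in profiles]
    total = sum(weights)
    if total == 0:
        return med  # fall back to the median if nothing resembles it
    return [sum(w * p[j] for w, p in zip(weights, profiles)) / total
            for j in range(len(med))]

profiles = [[1.0, 2.0, 3.0, 2.0], [1.2, 2.1, 3.1, 1.9], [3.0, 2.0, 1.0, 2.0]]
template = weighted_average_template(profiles)
```

The third profile is anti-correlated with the median and therefore receives zero weight, so the template is a blend of the first two profiles only.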

At process block 312, the method 300 includes aligning (registering) the profile of the relative position of the detectable signal with the template to generate an aligned profile of the relative position of the detectable signal. In some cases, the alignment of process block 312 may include curve registration, as described below. Let $n$ functions (or curves) $f_1, \ldots, f_n$ be defined on a closed real interval $[0, S]$. Let $h_i(x)$ be the transformation of the abscissa $x$ for curve $i$. In the absence of amplitude noise, the observed function $f_i(x)$ is the result of warping a true curve $f_c(x)$: $f_i(x) = f_c[h_i(x)]$. The warping function is commonly referred to as "time warping" because time is a common abscissa in phase noise problems. In the context of the present disclosure, the abscissa is the DNA molecule backbone. The warping function should satisfy the following conditions:

· $h_i(0) = 0$ and $h_i(S) = S$, $i = 1, \ldots, n$.

· The timing of events remains in the same order regardless of the time scale. This means that the time warping function $h_i$ should be strictly increasing, i.e., $h_i(x_1) > h_i(x_2)$ for $x_1 > x_2$, where $x_1, x_2 \in [0, S]$.

·

Figure BDA0002617085320000071

The purpose of curve alignment is that the aligned functions $f_1(h_1^{-1}(x)), \ldots, f_n(h_n^{-1}(x))$ will have no phase noise.
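The relationship between an observed curve, its warping function, and the aligned curve can be illustrated with a small sketch. The piecewise-linear warp, the knot values, and the quadratic "true" curve below are hypothetical choices made only for illustration; the disclosure does not prescribe a particular family of warping functions.

```python
def is_valid_warp(knots_x, knots_h, S):
    """Check h(0) = 0, h(S) = S and strict monotonicity at the knots."""
    ends_ok = (knots_x[0] == 0 and knots_x[-1] == S
               and knots_h[0] == 0 and knots_h[-1] == S)
    x_increasing = all(a < b for a, b in zip(knots_x, knots_x[1:]))
    h_increasing = all(a < b for a, b in zip(knots_h, knots_h[1:]))
    return ends_ok and x_increasing and h_increasing

def interp(x, xs, ys):
    """Piecewise-linear interpolation through (xs, ys); xs must be increasing."""
    for (x0, y0), (x1, y1) in zip(zip(xs, ys), zip(xs[1:], ys[1:])):
        if x0 <= x <= x1:
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
    raise ValueError("x lies outside the interval")

S = 10.0
knots_x = [0.0, 5.0, 10.0]   # abscissa knots
knots_h = [0.0, 3.0, 10.0]   # warped positions h(knots_x)

def true_curve(t):
    return (t - 3.0) ** 2                               # hypothetical f_c

def observed(x):
    return true_curve(interp(x, knots_x, knots_h))      # f_i(x) = f_c[h_i(x)]

def h_inverse(y):
    return interp(y, knots_h, knots_x)                  # swap roles of the knots

def aligned(x):
    return observed(h_inverse(x))                       # f_i(h_i^{-1}(x))
```

Because the warp is strictly increasing, it is invertible, and composing the observed curve with the inverse warp recovers the true curve at every position in the interval.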

At process block 314, the method 300 includes identifying outliers of the alignment profile of the relative positions of the detectable signals, thereby generating an outlier alignment profile. Identifying outliers of the alignment profile may include a function data depth metric. Suitable functional data depth measures include, but are not limited to, Fraiman and Muniz depths, h-mode depths, random projection depths, random Tukey depths, and the like.

Depth and outlyingness are opposite concepts, so if there are outliers in the dataset, the depth of the corresponding curves will be very low. An exemplary process for functional outlier detection in a dataset of curves $f_1, \ldots, f_n$ is as follows:

1. Obtain the functional depths $D_n(f_1), \ldots, D_n(f_n)$ (this may be any depth defined above: FMD, MD, RPD or RTD).

2. For a given cutoff value $C$, let $f_{i_1}, \ldots, f_{i_k}$ be the $k$ curves such that $D_n(f_{i_j}) \le C$. Then $f_{i_1}, \ldots, f_{i_k}$ are assumed to be outliers and are deleted from the sample.

3. Return to step 1, using the new dataset from which the outliers found in step 2 have been deleted. Repeat until no more outliers are found.
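The three steps above can be sketched as a loop. The depth used here is a deliberately simple stand-in (inverse of one plus the mean absolute distance to the pointwise median), not one of the functional depths named in the disclosure, and the guard that stops when two or fewer curves remain is an added safeguard, not part of the described process.

```python
from statistics import median

def toy_depth(curves):
    """Stand-in depth (an assumption): closer to the pointwise median = deeper."""
    med = [median(column) for column in zip(*curves)]
    return [1.0 / (1.0 + sum(abs(v - m) for v, m in zip(c, med)) / len(c))
            for c in curves]

def remove_outliers(curves, cutoff, depth_fn=toy_depth):
    """Repeat: compute depths, delete curves with depth <= cutoff, recompute."""
    kept, removed = list(curves), []
    while len(kept) > 2:                      # safeguard added for this sketch
        depths = depth_fn(kept)               # step 1
        flagged = {i for i, d in enumerate(depths) if d <= cutoff}   # step 2
        if not flagged:
            break                             # no more outliers found
        removed.extend(kept[i] for i in sorted(flagged))
        kept = [c for i, c in enumerate(kept) if i not in flagged]   # step 3
    return kept, removed

# Three similar hypothetical curves and one curve far from the rest.
curves = [[1.0, 2.0, 3.0], [1.1, 2.1, 2.9], [0.9, 1.9, 3.1], [9.0, 9.0, 9.0]]
kept, removed = remove_outliers(curves, cutoff=0.5)
```

Depths are recomputed after every deletion because removing an outlier changes the reference set, and hence the depth, of every remaining curve.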

To ensure that the type I error in detecting outliers is below some small threshold $\alpha$, $C$ is chosen such that $P(D_n(f_i) \le C) = \alpha$, $i = 1, \ldots, n$.

However, because the distribution of the functional depth statistic is generally unknown, $C$ is estimated by the bootstrap procedure introduced by Febrero et al. (Febrero, Manuel, Pedro Galeano, and Wenceslao González-Manteiga. 2008. Outlier detection in functional data by depth measures, with application to identify abnormal NOx levels. Environmetrics 19(4): 331-345, the entire contents of which are incorporated by reference) and implemented in the R package fda.usc (Febrero-Bande, M., and M. Oviedo de la Fuente. 2012. Statistical computing in functional data analysis: the R package fda.usc. Journal of Statistical Software 51(4): 1-28, the entire contents of which are incorporated herein by reference). The smoothed bootstrap procedure based on trimming runs as follows:

1. Obtain the functional depths $D_n(f_1), \ldots, D_n(f_n)$, for any one of the functional depths.

2. Obtain $B$ standard bootstrap samples of size $n$ from the dataset of curves obtained after deleting the $\alpha\%$ least deep curves, for $i = 1, \ldots, n$ and $b = 1, \ldots, B$.

Figure BDA0002617085320000091

3. For each bootstrap set $b = 1, \ldots, B$, obtain $C^b$ as the empirical 1% percentile of the distribution of the depths.

4. Take $C$ as the median of the values $C^b$, $b = 1, \ldots, B$.

The level $\alpha$ used may be selected as the proportion of suspected outliers in the sample. In the Fscan dataset, $\alpha = 0.15$, since about 15% of the images are expected to have unusable intensity profiles based on the quality score measurements.
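The trimming-based bootstrap estimate of the cutoff C can be sketched as below. This sketch is a simplification made under stated assumptions: it omits the smoothing step of the procedure, uses a simple stand-in depth rather than a functional depth from the disclosure, and uses a nearest-rank percentile. The curves and parameter defaults other than α = 0.15 and the 1% percentile are hypothetical.

```python
import math
import random
from statistics import median

def toy_depth(curves):
    """Stand-in depth (an assumption): closer to the pointwise median = deeper."""
    med = [median(column) for column in zip(*curves)]
    return [1.0 / (1.0 + sum(abs(v - m) for v, m in zip(c, med)) / len(c))
            for c in curves]

def percentile(values, q):
    """Empirical q-quantile (0 < q <= 1) by the nearest-rank rule."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q * len(ordered)))
    return ordered[rank - 1]

def bootstrap_cutoff(curves, alpha=0.15, B=200, q=0.01,
                     depth_fn=toy_depth, seed=0):
    rng = random.Random(seed)
    n = len(curves)
    depths = depth_fn(curves)                               # step 1
    order = sorted(range(n), key=lambda i: depths[i])
    trimmed = [curves[i] for i in order[int(alpha * n):]]   # drop least deep
    cutoffs = []
    for _ in range(B):                                      # step 2 (unsmoothed)
        sample = [rng.choice(trimmed) for _ in range(n)]
        cutoffs.append(percentile(depth_fn(sample), q))     # step 3
    return median(cutoffs)                                  # step 4

# Eighteen similar hypothetical curves plus two curves far from the rest.
curves = [[1 + 0.01 * k, 2 + 0.01 * k, 3 + 0.01 * k] for k in range(18)]
curves += [[9.0, 9.0, 9.0], [10.0, 10.0, 10.0]]
cutoff = bootstrap_cutoff(curves)
full_depths = toy_depth(curves)
```

Because the two distant curves are trimmed before resampling, the estimated cutoff reflects the depth distribution of typical curves, and both distant curves fall below it in the full dataset.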

The function data depth metric may be selected by modeling the noisy curve and the outliers and selecting the metric that best identifies the outliers. In some cases, the function data depth metric may be a combination of FM depth and RP depth, as described below.

At process block 316, the method 300 includes calculating a median alignment profile from alignment profiles of relative positions of detectable signals that were not identified as outlier alignment profiles in process block 314. The calculations of process block 316 may be performed in the same or similar manner as described above with respect to the calculations of process block 308.

At process block 318, the method 300 includes calculating an updated weighted average profile, thereby generating an alignment template including the updated weighted average profile. The updated weighted average profile is calculated by evaluating an alignment similarity index between each alignment profile of the relative position of the detectable signal not identified as an outlier alignment profile at process block 314 and the median alignment profile, and then taking a weighted average of those alignment profiles, weighted according to the alignment similarity index. The calculations of process block 318 may be performed in the same or similar manner as described above with respect to the calculations of process block 310.

At process block 320, the method 300 includes aligning the alignment profile of the relative position of the detectable signal with the alignment template to generate a second alignment profile of the relative position of the detectable signal. The alignment of process block 320 may be accomplished in the same or similar manner as described above with respect to the alignment of process block 312. The alignment of process block 320 has a lower penalty parameter than the alignment of process block 312.

At process block 322, the method 300 includes calculating an average similarity between the alignment profile and the alignment template for the relative positions of the detectable signals. The calculation of process block 322 may be accomplished using the same or similar methods as described below with respect to the PRIMR algorithm.

At process block 324, the method 300 includes repeating process blocks 316, 318, 320, and 322 using a second penalty parameter that is less than the lower penalty parameter. The repetition of process block 324 continues until the difference between the average similarities for successive iterations is less than a threshold. The outcome of the repetition of process block 324 is the final alignment profile.

At process block 326, the method 300 includes identifying outliers of the final alignment profile of the relative positions of the detectable signals from the final iteration of process block 324, thereby generating an outlier final alignment profile. The identification of process block 326 may be accomplished using the same or similar methods as described above with respect to the identification of process block 314.

At process block 328, the method 300 includes calculating a median final alignment profile from the final alignment profiles of the relative positions of the detectable signals that were not identified as outlier final alignment profiles in process block 326.

At process block 330, the method 300 includes calculating a final weighted average profile, thereby generating a consensus profile of detectable signal versus position. The final weighted average profile is calculated by evaluating a final alignment similarity index between each final alignment profile of the relative positions of the detectable signals not identified as an outlier final alignment profile in process block 326 and the median final alignment profile, and then taking a weighted average of those final alignment profiles, weighted according to the final alignment similarity index. The individual profiles are sometimes referred to herein as Fscans. The consensus profile is sometimes referred to herein as a cFscan.

One example of the steps of process blocks 306 through 330 is the PRIMR algorithm. The PRIMR algorithm described herein iteratively uses the minimum second eigenvalue method (MSEV) to align the noisy Fscans. PRIMR differs from MSEV in three ways. First, PRIMR uses outlier detection, using Fraiman and Muniz (FM) depth and random projection (RP) depth, as described below. Second, PRIMR estimates the consensus (or mean) of the Fscans as follows: first estimating the L1 median and then estimating a weighted average of the Fscans. The L1 median is estimated as described by Vardi and Zhang (Vardi and Zhang (2000), "The multivariate L1-median and associated data depth," Proceedings of the National Academy of Sciences 97(4):1423-1426, which is incorporated herein by reference in its entirety), as implemented in the R package robustX (Stahel, Werner, Martin Maechler, Maintainer Martin Maechler, and MASS Suggests. 2009, which is incorporated herein by reference in its entirety), where

Figure BDA0002617085320000101

Wherein

Figure BDA0002617085320000111

And is

Figure BDA0002617085320000112

Finally, in PRIMR, we use three values of the penalty parameter λ. We started at 0.001, and reduced it to 0.0005 after the first iteration, and then to 0.0001 in all subsequent iterations. λ plays an important role in aligning the nearby features of Fscan. For higher lambda values, distant features will be aligned, and for lower lambda values, close features will be aligned. Decreasing λ in PRIMR ensures that we gradually increase the confidence of consensus estimates (consensus estimates).
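The penalty schedule just described, together with the stopping rule of process block 324, can be sketched as follows. The function names, the convergence threshold, and the average similarity values in the example are hypothetical placeholders chosen only to exercise the loop; they are not results from the disclosure.

```python
def penalty_for_iteration(iteration):
    """Penalty parameter lambda for a 1-based iteration number."""
    if iteration == 1:
        return 0.001
    if iteration == 2:
        return 0.0005
    return 0.0001          # all subsequent iterations

def has_converged(similarities, threshold):
    """Stop when successive average similarities differ by less than threshold."""
    return (len(similarities) >= 2
            and abs(similarities[-1] - similarities[-2]) < threshold)

# Placeholder average similarities, one per iteration (illustrative only).
placeholder_similarities = [0.80, 0.90, 0.93, 0.931]

used_penalties, history = [], []
for iteration, similarity in enumerate(placeholder_similarities, start=1):
    used_penalties.append(penalty_for_iteration(iteration))
    history.append(similarity)   # in practice: computed after re-alignment
    if has_converged(history, threshold=0.005):
        break
```

In a full implementation the similarity would be recomputed after each alignment pass with the current penalty, rather than read from a list.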

After convergence (iteration T), the aligned curves are run through steps 1 and 2, and the last updated template serves as the consensus Fscan (or cFscan) for the group of Fscans. The average similarity

Figure BDA0002617085320000115

is a measure of the quality of the alignment. A higher value indicates that there is less noise in the aligned Fscans.

Figure BDA0002617085320000117

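Fraiman and Muniz were the first to introduce functional data depth. Let $F_{n,x}(f_i(x))$ be the empirical cumulative distribution function of the values of the curves $f_1(x), \ldots, f_n(x)$ at any $x \in [a, b]$, given by

$$F_{n,x}(f_i(x)) = \frac{1}{n} \sum_{k=1}^{n} I\big(f_k(x) \le f_i(x)\big),$$

and let the univariate depth of the point $f_i(x)$ be

$$D_n(f_i(x)) = 1 - \left| \frac{1}{2} - F_{n,x}(f_i(x)) \right|.$$

The Fraiman and Muniz functional depth (FMD) of a curve $f_i$ relative to the set $f_1(x), \ldots, f_n(x)$ is

$$FMD_n(f_i) = \int_a^b D_n(f_i(x))\, dx.$$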

Higher FMD values indicate deeper curves; lower FMD values indicate farther from the deepest curve.
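A minimal sketch of a Fraiman and Muniz style depth on curves sampled at common grid points is given below. It assumes the usual formulation in which the univariate depth at each position is one minus the absolute distance of the empirical cumulative distribution value from one half, and it approximates the integral over the interval by an average over the grid (so the result is the integral divided by the interval length). The sample curves are hypothetical.

```python
def fraiman_muniz_depth(curves):
    """Grid-averaged Fraiman-Muniz depth for curves on a common grid."""
    n = len(curves)
    m = len(curves[0])
    depths = []
    for fi in curves:
        total = 0.0
        for j in range(m):
            # Empirical CDF of the n curve values at grid point j, at fi[j].
            ecdf = sum(1 for fk in curves if fk[j] <= fi[j]) / n
            total += 1.0 - abs(0.5 - ecdf)      # univariate depth at point j
        depths.append(total / m)                # average over the grid
    return depths

# Five hypothetical flat curves at levels 1..5, three grid points each.
curves = [[float(level)] * 3 for level in range(1, 6)]
depths = fraiman_muniz_depth(curves)
```

Central curves receive higher depth than curves at the edge of the bundle, which is the property used to flag outlying Fscans.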

The random projection depth is based on measuring the depth of the functional data and their derivatives under projections. The basic idea is to project each curve and its first derivative along a random direction, defining a point in $\mathbb{R}^2$. A data depth in $\mathbb{R}^2$ then provides an order of the projected points. Using a large number of random projections, the mean value of the depths of the projected points defines a depth for the functional data. Given curves $f_1, \ldots, f_n$ and a direction $v$ that belongs to an independent direction process, the projection of $f_i$ in the direction $v$ is $T_{i,v} = \langle v, f_i \rangle$. Similarly, $T'_{i,v} = \langle v, f'_i \rangle$ is the projection of the first derivative $f'_i(x)$ in the direction $v$. Thus, $(T_{i,v}, T'_{i,v})$ is a point in $\mathbb{R}^2$. Now, if $v_1, \ldots, v_p$ are $p$ independent random directions, the random projection depth of a curve $f_i$ is defined as:

$$RPD_n(f_i) = \frac{1}{p} \sum_{j=1}^{p} D_n\big(T_{i,v_j}, T'_{i,v_j}\big).$$

For example, $D_n$ can be the modal depth.
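A random projection depth along these lines can be sketched as follows. Several choices here are assumptions made for illustration: directions are normalized Gaussian vectors on the sampling grid, derivatives are finite differences, the inner product is a plain discrete sum, and the bivariate depth is a Gaussian-kernel modal depth with a hypothetical bandwidth `h`. The curves are hypothetical.

```python
import math
import random

def derivative(curve):
    """Finite-difference first derivative on a unit-spaced grid."""
    m = len(curve)
    out = []
    for j in range(m):
        lo, hi = max(j - 1, 0), min(j + 1, m - 1)
        out.append((curve[hi] - curve[lo]) / (hi - lo))
    return out

def inner(v, f):
    """Discrete inner product <v, f> on the grid."""
    return sum(a * b for a, b in zip(v, f))

def modal_depth_2d(points, h):
    """Kernel (modal) depth of each 2-D point: high where points are dense."""
    return [sum(math.exp(-((px - qx) ** 2 + (py - qy) ** 2) / (2 * h * h))
                for qx, qy in points)
            for px, py in points]

def random_projection_depth(curves, p=50, h=1.0, seed=0):
    rng = random.Random(seed)
    m = len(curves[0])
    derivs = [derivative(c) for c in curves]
    depths = [0.0] * len(curves)
    for _ in range(p):
        v = [rng.gauss(0.0, 1.0) for _ in range(m)]
        norm = math.sqrt(sum(a * a for a in v))
        v = [a / norm for a in v]                       # random unit direction
        points = [(inner(v, c), inner(v, d)) for c, d in zip(curves, derivs)]
        for i, d in enumerate(modal_depth_2d(points, h)):
            depths[i] += d / p                          # average over projections
    return depths

# Four similar rising curves and one falling curve at a different level.
curves = [[0.1 * j + 0.01 * k for j in range(10)] for k in range(4)]
curves.append([5.0 - 0.5 * j for j in range(10)])
depths = random_projection_depth(curves)
```

Including the derivative in each projected point lets the depth penalize curves whose shape differs, not only curves whose level differs.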

The method 300 may also include generating a predicted consensus profile. The predicted consensus profile may be generated by the SUBAGGING algorithm described below. A predicted consensus profile can be generated by altering the predicted potential genomic information. The predicted genomic information may be altered to minimize the difference between the predicted consensus profile and the consensus profile. Generating the predicted consensus profile may use random forest (RF), gradient boosting (GB), or both.

In the MM Fscan dataset discussed in Example 1 below, there are 30,560 intervals (each 50 pixels in length) that meet the PRIMR selection criteria. The cFscans of all intervals were estimated using PRIMR. For each interval, its cFscan is a smooth curve spanning 50 data points, each corresponding to an expected fluorescence intensity measurement of 206 bp of the genomic sequence. The counts of the genomic elements in this 206 bp subsequence were used as features, while the cFscan was used as the response of the predictive model. Specifically, the features are the counts of the nucleotides G, C, A, T in the 206 bp subsequence and the counts of all possible 2-mers (GG, GC, GA, ..., TT), 3-mers, 4-mers, and 5-mers. There are 16 ($4^2$) 2-mers, 64 ($4^3$) 3-mers, 256 ($4^4$) 4-mers and 1,024 ($4^5$) 5-mers, which together with the 4 single-nucleotide counts gives 1,364 counts per window. The total number of features is 6,820 (1,364 × 5). The response vector is 1,528,000 pixels (30,560 intervals × 50) in length. For pixel point j on a cFscan, the counts of k-mers in window j and the counts of k-mers in windows j+ and j++ are used as features. Each window is 206 bp. For example, feature at is the count of the 2-mer "at" in the corresponding window j, and feature at+ is the count of "at" in window j+.
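The per-window feature construction can be sketched as below. The function names and the short example sequence are hypothetical, and counting overlapping occurrences of each k-mer is an assumption of this sketch; the disclosure does not state how overlaps are handled.

```python
from itertools import product

def kmer_feature_names(max_k=5, alphabet="ACGT"):
    """All k-mers for k = 1..max_k: 4 + 16 + 64 + 256 + 1,024 names."""
    return ["".join(letters)
            for k in range(1, max_k + 1)
            for letters in product(alphabet, repeat=k)]

def kmer_counts(window, names):
    """Count overlapping occurrences of every named k-mer in one window."""
    counts = dict.fromkeys(names, 0)
    max_k = max(len(name) for name in names)
    for k in range(1, max_k + 1):
        for start in range(len(window) - k + 1):
            kmer = window[start:start + k]
            if kmer in counts:          # skips k-mers with non-ACGT symbols
                counts[kmer] += 1
    return [counts[name] for name in names]

names = kmer_feature_names()
features = kmer_counts("ACGTAC", names)     # a real window would be 206 bp
features_per_window = len(names)
total_features = features_per_window * 5    # five windows per pixel point
```

Concatenating the count vectors of the five windows associated with a pixel point yields one row of the predictor matrix.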

RF is a relatively new tree-based machine learning tool that has become increasingly popular with the proliferation of big data analytics. Since its introduction (Breiman, L. 2001. Random forests. Machine Learning 45(1): 5-32, the entire contents of which are incorporated herein by reference), RF has increasingly been used in regression and classification settings (Efron, Bradley and Trevor Hastie. 2016. Computer Age Statistical Inference, Vol. 5. Cambridge University Press, the entire contents of which are incorporated herein by reference). RF is particularly attractive in high-dimensional settings and in predictions involving features with multicollinearity. RF combines the concepts of adaptive nearest neighbors and bagging (Breiman, Leo. 1996. Bagging predictors. Machine Learning 24(2): 123).

The boosting method was originally used to improve the performance of "weak learners" in binary classification problems by resampling the training points and giving higher weight to the misclassified training points (Efron and Hastie 2016). Gradient boosting (Friedman, Jerome H. 2001. Greedy function approximation: a gradient boosting machine) iteratively adds basis functions in a greedy fashion to reduce a loss function. Boosting has also been studied under $L_2$ loss. We used random forest and stochastic gradient boosting, assuming a Gaussian distribution of errors and minimizing squared error loss, to build a predictive model between sequence composition and cFscan.

The R package "randomForest" was used to fit the RF models (Liaw, Andy and Matthew Wiener. 2002. Classification and regression by randomForest. R News 2(3): 18-22, the entire contents of which are incorporated herein by reference). The GB model was fitted using the R package "gbm" (Ridgeway, Greg, et al. 2006. gbm: Generalized boosted regression models. R package version 1(3):55, the entire contents of which are incorporated herein by reference).

In one non-limiting example, the model in the following equation is fitted to the data (X, Y):

Figure BDA0002617085320000151

where $d = 6,820$. (6)

Here X is the d-dimensional predictor variable (genomic sequence composition counts) and Y is the univariate response of length N = 1,528,000. To avoid overfitting and to fit the model in a computationally efficient manner, the Subagging algorithm (3) was implemented to fit the prediction function h using a CHTC parallel framework running HTCondor 2. Subagging is short for subsample aggregating, in which subsamples of the data are used instead of the bootstrap samples used for aggregation in bagging. Subagging is advocated by Bühlmann and Yu (2002) because it is computationally economical while still being approximately as accurate as bagging. The following describes the subagging algorithm developed for predicting pFscan.
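The general idea of subagging can be sketched as follows: fit a base learner on several subsamples drawn without replacement, then average the predictions. The base learner here is a one-dimensional least-squares line, a stand-in chosen to keep the sketch self-contained; it is not the RF or GB learner of the disclosure, and the number of subsamples, the subsample size, and the data are hypothetical.

```python
import random

def fit_line(xs, ys):
    """Stand-in base learner: ordinary least-squares line through (xs, ys)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
             if sxx else 0.0)
    intercept = my - slope * mx
    return lambda x: intercept + slope * x

def subagging(xs, ys, n_subsamples=20, subsample_size=None,
              fit=fit_line, seed=0):
    """Average base learners fitted on subsamples drawn without replacement."""
    rng = random.Random(seed)
    n = len(xs)
    size = subsample_size or n // 2
    models = []
    for _ in range(n_subsamples):
        idx = rng.sample(range(n), size)        # subsample, not bootstrap
        models.append(fit([xs[i] for i in idx], [ys[i] for i in idx]))
    return lambda x: sum(model(x) for model in models) / len(models)

# Hypothetical data: a line with alternating +/-0.5 perturbations.
xs = [float(i) for i in range(40)]
ys = [2.0 * x + 1.0 + (0.5 if i % 2 else -0.5) for i, x in enumerate(xs)]
predict = subagging(xs, ys)
```

Each subsample fit is independent of the others, which is what makes the approach straightforward to distribute across workers in a parallel framework.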

After fitting the predictive model, the relative importance of the features can be analyzed using methods known to those of ordinary skill in the art. For example, for the RF model, the total reduction in node impurity resulting from splitting on a feature (averaged over all trees) gives a notion of feature importance. Node impurity can be measured by the residual sum of squares. The more a feature reduces the node impurity, the more important it is for prediction. As another example, to estimate feature importance from the GB model, the approximate measure of relative influence in decision trees defined by Breiman et al. (Breiman, Leo, Jerome Friedman, Charles J. Stone and Richard A. Olshen. 1984. Classification and regression trees, CRC Press, incorporated herein by reference in its entirety) was used.

Figure BDA0002617085320000161

In any method, any nucleic acid molecule can be stretched linearly. In any method, at least a portion of any of the one or more nucleic acid molecules can be confined within a nanoslit (nanoslit).

The binding of process block 102 or process block 302 may be by various types of bonds including, but not limited to, covalent bonds, ionic bonds, polar bonds, hydrogen bonds, or combinations thereof. The binding of process block 102 or process block 302 may involve the intercalation of marker molecules between the bases of the nucleic acid molecule. For example, YOYO-1 intercalates between DNA bases. As understood by one of ordinary skill in the art, the binding of process block 102 or process block 302 may utilize YOYO-1 or other similar dyes.

YOYO-1 (oxazole yellow) exhibits a large degree of fluorescence enhancement upon binding to nucleic acids. Previous studies have observed a 2-fold increase in quantum yield when switching from an AT-rich region to a GC-rich region. Other studies observed that the fluorescence intensity was dependent on the base sequence. This indicates that the quantum yield and fluorescence lifetime of YOYO complexed with GC-rich DNA sequences is approximately twice that of YOYO complexed with AT-rich sequences. As a result, the probability that a dye molecule will intercalate between DNA bases and fluoresce is not uniform.

The detectable signal described herein may be an optical signal. The optical signal may be an optical fluorescence signal. The detectable signal may be initiated by an external stimulus such as electromagnetic radiation. The detectable signal may be: (1) speech patterns or other sound waves; (2) any dynamic process that changes over time; (3) a 2D image; or other signal having a characteristic related to that listed. The detectable signal may include an electrical signal, such as a change in local electrical polarizability, a magnetic field (i.e., ferromagnetic nanoparticles coupled to a dye or other binding moiety), or the like.

Receiving the detectable signal of process block 104 and/or obtaining the relative location of the detectable fluorescent signal of process block 304 may include obtaining an image, such as a fluorescence image, of the nucleic acid molecule that has been bound by the marker molecules and/or the fluorescent molecules. Examples of the receiving of process block 104 and/or the obtaining of process block 304 are described in pages 1-10 of Nandi, Subhrangshu (2007, under embargo), "Statistical Learning Methods for Fluoroscanning", doctoral dissertation, University of Wisconsin-Madison, the entire contents of which are incorporated herein by reference.

An example of the extraction of potential genomic information of process block 204 is described in pages 11-114 of Nandi, Subhrangshu (2007, under embargo), "Statistical Learning Methods for Fluoroscanning", doctoral dissertation, University of Wisconsin-Madison, which is incorporated herein by reference in its entirety. In some cases, the extraction of process block 204 may include the same or similar steps as described in process blocks 306-330.

In some cases, the extraction of process block 204 may include eliminating outliers from the dataset. The elimination of outliers in process block 204, as well as elsewhere described herein, may use Fraiman and muniz (fm) depth and Random Projection (RP) depth.

In some cases, the extraction of process block 204 may include normalizing the profile of detectable signal intensity versus position. The extraction of process block 204 may include excluding profiles of detectable signal intensity versus position corresponding to nucleic acid molecules having stretch values outside a predetermined range of acceptable stretch values. The extraction of process block 204 may include smoothing the profile of detectable signal strength versus position. The smoothed profile may be normalized again after smoothing.

The extraction of process block 204 may include generating a consensus profile of the relative locations of detectable signal strengths. The consensus profile is sometimes referred to herein as cFscan. Generating the consensus profile may include correcting for amplitude variations between the profiles of detectable signal strength versus position. Generating the consensus profile may include correcting for phase variations between the profiles of detectable signal strength versus position.

Generating the consensus profile may include an iterative alignment process. Generating the consensus profile may comprise an iterative process with the following steps: (i) detecting outliers; (ii) calculating a template in a first iteration and updating the template in subsequent iterations; (iii) aligning the profiles of the relative positions of the detectable signal intensities with the template; and (iv) calculating an average similarity between the profiles of detectable signal intensity versus position and the template. The iterative process is repeated until the average similarity is maximized. The aligned profiles of step (iii) from the final iteration of the iterative process are then subjected to steps (i) and (ii), and the resulting updated template of step (ii) is the consensus profile.

The methods described herein may include correlating the consensus profile with one or more features of the underlying genomic information. As used herein, a feature of the underlying genomic information may include any smallest detectable unit of the underlying genomic information. In some cases, the smallest detectable unit may be a 2-mer, a 3-mer, a 4-mer, or a 5-mer.
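One simple way to picture such a correlation is to count occurrences of a chosen k-mer in consecutive windows of a reference sequence and compare those counts with the consensus profile. The sketch below does this with a Pearson correlation. The windowing scheme, the assumption that one profile point corresponds to one fixed-size window of bases, and the function names are hypothetical and serve only as an example.

```python
import numpy as np

def kmer_window_counts(sequence, kmer, window):
    """Count k-mer start positions falling in consecutive windows of `window` bases."""
    k = len(kmer)
    hits = np.array([sequence[i:i + k] == kmer
                     for i in range(len(sequence) - k + 1)], dtype=float)
    n_windows = len(hits) // window
    return hits[:n_windows * window].reshape(n_windows, window).sum(axis=1)

def correlate_with_feature(consensus, sequence, kmer, window):
    """Pearson correlation between a consensus profile and windowed k-mer counts."""
    counts = kmer_window_counts(sequence, kmer, window)
    n = min(len(counts), len(consensus))
    return float(np.corrcoef(np.asarray(consensus, dtype=float)[:n], counts[:n])[0, 1])
```

Repeating the calculation for each candidate 2-mer through 5-mer would indicate which features track the measured signal most closely.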

In some cases, the extraction of process block 204 may include: generating a predicted dataset using predicted underlying genomic information; and minimizing a difference between the dataset and the predicted dataset by altering the predicted underlying genomic information, wherein the underlying genomic information is the predicted underlying genomic information that minimizes the difference.
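The predict-and-minimize approach can be illustrated with a toy example. Here the forward model that turns a sequence into a predicted profile is simply the GC fraction in a sliding window, the difference is a sum of squared errors, and the predicted sequence is altered one base at a time, keeping only changes that reduce the difference. The forward model, the loss, and the greedy search are assumptions made for this sketch and are not the model of the disclosure.

```python
import numpy as np

BASES = "ACGT"

def predict_profile(sequence, window=4):
    """Toy forward model (assumption): GC fraction in a sliding window."""
    gc = np.array([base in "GC" for base in sequence], dtype=float)
    return np.convolve(gc, np.ones(window) / window, mode="valid")

def loss(sequence, observed, window=4):
    """Sum of squared differences between predicted and observed profiles."""
    return float(np.sum((predict_profile(sequence, window) - observed) ** 2))

def fit_sequence(observed, initial, window=4, max_sweeps=50):
    """Greedy single-base substitutions that only ever lower the loss."""
    current = list(initial)
    current_loss = loss(current, observed, window)
    for _ in range(max_sweeps):
        improved = False
        for i in range(len(current)):
            best_base = current[i]
            for base in BASES:
                if base == best_base:
                    continue
                current[i] = base
                trial = loss(current, observed, window)
                if trial < current_loss - 1e-12:
                    current_loss, best_base, improved = trial, base, True
            current[i] = best_base  # keep the best base found for this position
        if not improved:
            break  # no single substitution lowers the difference further
    return "".join(current), current_loss
```

Because this toy model cannot distinguish A from T or G from C, and because a greedy search can stop at a local minimum, the fitted sequence is only one of possibly many that explain the profile. A richer forward model, such as one based on k-mer features, would narrow that ambiguity.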

System

The present disclosure also provides a system. The system may be adapted for use with the methods described herein. When a feature of the present disclosure is described with respect to a given system, it is also expressly contemplated that the feature can be used with other systems and methods described herein, unless the context clearly dictates otherwise.

Referring to FIG. 4, a system 400 may include a computer 402 having a processor 404 and/or CPU and memory 406. The system 400 may also include a spectroscopy system 408. The spectroscopy system 408 may include a fluorescence microscope 410. The computer 402 may be configured to control the spectroscopy system 408 and/or the fluorescence microscope 410.

The processor 404 and/or CPU may be configured to read and execute computer-executable instructions stored in the memory 406. The computer-executable instructions may include all or part of the methods described herein.

The memory 406 may include one or more computer-readable and/or writable media and may include, for example, a magnetic disk (e.g., hard disk), an optical disk (e.g., DVD, Blu-ray, CD), a magneto-optical disk, semiconductor memory (e.g., nonvolatile memory card, flash memory, solid state drive, SRAM, DRAM), EPROM, EEPROM, and so forth. The memory may store computer-executable instructions for all or part of the methods described herein.
