Template-directed nucleic acid-targeting compounds

Document No.: 991096; Publication date: 2020-10-20

Reading note: This technology, Template-directed nucleic acid-targeting compounds, was devised by D·H·李, W-C·谢, and R·巴哈尔 on 2018-12-21. Main content: Described herein are gene recognition reagents comprising terminal aromatic moieties that bind specifically to a template nucleic acid and link together. Also provided are methods of using the gene recognition reagents, e.g., in the treatment or diagnosis of a repeat expansion disease such as DM1.

1. A gene recognition reagent, comprising:

a nucleic acid or nucleic acid analog backbone having a first end and a second end and having from 3 to 8 ribose, deoxyribose, or nucleic acid analog backbone residues;

nucleobases, which may be the same or different, linked to a plurality of the ribose, deoxyribose, or nucleic acid analog backbone residues in a sequence complementary to a target nucleic acid;

a first aryl moiety linked to a first end of the nucleic acid or nucleic acid analogue backbone by a linker; and

a second aryl moiety, optionally identical to the first aryl moiety, attached to the second end of the nucleic acid or nucleic acid analogue backbone by a linker, wherein the aryl moiety stacks with the aryl moiety of an adjacent recognition reagent when the recognition reagents are hybridized to adjacent sequences of the target nucleic acid.

2. The gene recognition reagent according to claim 1, having the following structure:

wherein:

n is an integer from 1 to 6;

each instance of R is independently a nucleobase, resulting in a sequence of nucleobases that is optionally complementary to a target nucleic acid;

B is a ribose, deoxyribose, or nucleic acid analog backbone residue;

each L is independently a linker; and

each instance of Ar is independently a 2- to 5-ring fused polycyclic aromatic moiety.

3. The gene recognition reagent according to claim 1, wherein the nucleic acid or nucleic acid analog backbone residue is a nucleic acid analog backbone residue.

4. The gene recognition reagent according to claim 3, wherein the nucleic acid analog backbone residue comprises a conformationally preorganized residue.

5. The gene recognition reagent according to claim 4, wherein the conformationally preorganized nucleic acid analog backbone residue is a γPNA, LNA, or ethylene glycol nucleic acid backbone residue.

6. The gene recognition reagent according to claim 4, wherein the conformationally preorganized nucleic acid analog backbone residue is a γPNA backbone residue.

7. The gene recognition reagent of claim 6, wherein one or more of the γPNA backbone residues is substituted with a group comprising an ethylene glycol unit having 1 to 100 ethylene glycol residues, such as: -(OCH2-CH2)qOP1; -(OCH2-CH2)q-NHP1; -(SCH2-CH2)q-SP1; -(OCH2-CH2)r-OH; -(OCH2-CH2)r-NH2; -(OCH2-CH2)r-NHC(NH)NH2; or -(OCH2-CH2)r-S-S[CH2CH2]sNHC(NH)NH2, wherein P1 is H, (C1-C8)alkyl, (C2-C8)alkenyl, (C2-C8)alkynyl, (C3-C8)aryl, (C3-C8)cycloalkyl, (C3-C8)aryl(C1-C6)alkylene, or (C3-C8)cycloalkyl(C1-C6)alkylene; q is an integer from 0 to 50; r is an integer from 1 to 50; and s is an integer from 1 to 50, the group optionally being attached to the one or more γPNA backbone residues through a (C1-C6) divalent hydrocarbyl linker.

8. The gene recognition reagent of claim 4, wherein the conformationally preorganized nucleic acid analog backbone residue is L-γPNA.

9. The gene recognition reagent according to claim 1, having the following structure:

wherein:

each R is independently a nucleobase;

n is an integer from 1 to 6, such as 1, 2, 3, 4, 5, or 6;

each L is independently a linker;

R1 and R2 are each attached to a gamma carbon and are independently: H; a guanidine-containing group; an amino acid side chain; straight or branched chain (C1-C8)alkyl, (C2-C8)alkenyl, (C2-C8)alkynyl, (C1-C8)hydroxyalkyl, (C3-C8)aryl, (C3-C8)cycloalkyl, (C3-C8)aryl(C1-C6)alkylene, or (C3-C8)cycloalkyl(C1-C6)alkylene, optionally substituted with ethylene glycol units comprising 1 to 50 ethylene glycol moieties; -CH2-(OCH2-CH2)qOP1; -CH2-(OCH2-CH2)q-NHP1; -CH2-(SCH2-CH2)q-SP1; -CH2-(OCH2-CH2)r-OH; -CH2-(OCH2-CH2)r-NH2; -CH2-(OCH2-CH2)r-NHC(NH)NH2; or -CH2-(OCH2-CH2)r-S-S[CH2CH2]sNHC(NH)NH2, wherein P1 is H, (C1-C8)alkyl, (C2-C8)alkenyl, (C2-C8)alkynyl, (C3-C8)aryl, (C3-C8)cycloalkyl, (C3-C8)aryl(C1-C6)alkylene, or (C3-C8)cycloalkyl(C1-C6)alkylene; q is an integer from 0 to 50; r is an integer from 1 to 50; and s is an integer from 1 to 50; and

each R3 is independently a 2- to 5-ring fused polycyclic aromatic moiety,

or a pharmaceutically acceptable salt thereof.

10. The gene recognition reagent of claim 9, wherein each linker independently comprises one or more guanidine-containing groups, one or more amino acid side chains, or one or more adjacent amino acid residues.

11. The gene recognition reagent according to claim 9, wherein each instance of L comprises a peptide having a pendant group

Figure FDA0002641565610000031

wherein n ranges from 1 to 5, and both an N-terminal and a C-terminal arginine residue are attached to each of said first amino acid residues.

12. The gene recognition reagent according to claim 11, wherein n ranges from 1 to 3.

13. The gene recognition reagent according to claim 1, having the following structure:

wherein:

each R is independently a nucleobase;

n is an integer from 1 to 6, such as 1, 2, 3, 4, 5, or 6;

R1 and R2 are each attached to a gamma carbon and are independently: H; a guanidine-containing group; an amino acid side chain; straight or branched chain (C1-C8)alkyl, (C2-C8)alkenyl, (C2-C8)alkynyl, (C1-C8)hydroxyalkyl, (C3-C8)aryl, (C3-C8)cycloalkyl, (C3-C8)aryl(C1-C6)alkylene, or (C3-C8)cycloalkyl(C1-C6)alkylene, optionally substituted with ethylene glycol units comprising 1 to 50 ethylene glycol moieties; -CH2-(OCH2-CH2)qOP1; -CH2-(OCH2-CH2)q-NHP1; -CH2-(SCH2-CH2)q-SP1; -CH2-(OCH2-CH2)r-OH; -CH2-(OCH2-CH2)r-NH2; -CH2-(OCH2-CH2)r-NHC(NH)NH2; or -CH2-(OCH2-CH2)r-S-S[CH2CH2]sNHC(NH)NH2, wherein P1 is H, (C1-C8)alkyl, (C2-C8)alkenyl, (C2-C8)alkynyl, (C3-C8)aryl, (C3-C8)cycloalkyl, (C3-C8)aryl(C1-C6)alkylene, or (C3-C8)cycloalkyl(C1-C6)alkylene; q is an integer from 0 to 50; r is an integer from 1 to 50; and s is an integer from 1 to 50;

R4 or R5, and R6, R7, or R8, is -L-R3, wherein each R3 is independently a 2- to 5-ring fused polycyclic aromatic moiety and L is a linker, and each of the remainder of R4, R5, R6, R7, and R8 is independently: H; one or more adjacent amino acid residues; a guanidine-containing group; an amino acid side chain; straight or branched chain (C1-C8)alkyl, (C2-C8)alkenyl, (C2-C8)alkynyl, (C1-C8)hydroxyalkyl, (C3-C8)aryl, (C3-C8)cycloalkyl, (C3-C8)aryl(C1-C6)alkylene, or (C3-C8)cycloalkyl(C1-C6)alkylene, optionally substituted with ethylene glycol units comprising 1 to 50 ethylene glycol moieties; -CH2-(OCH2-CH2)qOP1; -CH2-(OCH2-CH2)q-NHP1; -CH2-(SCH2-CH2)q-SP1; -CH2-(OCH2-CH2)r-OH; -CH2-(OCH2-CH2)r-NH2; -CH2-(OCH2-CH2)r-NHC(NH)NH2; or -CH2-(OCH2-CH2)r-S-S[CH2CH2]sNHC(NH)NH2, wherein P1 is H, (C1-C8)alkyl, (C2-C8)alkenyl, (C2-C8)alkynyl, (C3-C8)aryl, (C3-C8)cycloalkyl, (C3-C8)aryl(C1-C6)alkylene, or (C3-C8)cycloalkyl(C1-C6)alkylene; q is an integer from 0 to 50; r is an integer from 1 to 50; and s is an integer from 1 to 50,

or a pharmaceutically acceptable salt thereof.

14. The gene recognition reagent according to claim 13, wherein R1, R2, R3, R4, R5, R6, R7, or R8 is (C1-C6)alkyl substituted with: -(OCH2-CH2)qOP1; -(OCH2-CH2)q-NHP1; -(SCH2-CH2)q-SP1; -(OCH2-CH2)r-OH; -(OCH2-CH2)r-NH2; -(OCH2-CH2)r-NHC(NH)NH2; or -(OCH2-CH2)r-S-S[CH2CH2]sNHC(NH)NH2; wherein P1 is H, (C1-C8)alkyl, (C2-C8)alkenyl, (C2-C8)alkynyl, (C3-C8)aryl, (C3-C8)cycloalkyl, (C3-C8)aryl(C1-C6)alkylene, or (C3-C8)cycloalkyl(C1-C6)alkylene; q is an integer from 0 to 50; r is an integer from 1 to 50; and s is an integer from 1 to 50.

15. The gene recognition reagent according to claim 13, wherein R4 and R7 are -L-R3.

16. The gene recognition reagent according to claim 13, wherein R5 and R8 comprise an arginine residue.

17. The gene recognition reagent according to claim 1, having the following structure:

wherein:

n is an integer from 1 to 8;

m is an integer from 1 to 5;

R2 is attached to a gamma carbon and is: a guanidine-containing group; an amino acid side chain; straight or branched chain (C1-C8)alkyl, (C2-C8)alkenyl, (C2-C8)alkynyl, (C1-C8)hydroxyalkyl, (C3-C8)aryl, (C3-C8)cycloalkyl, (C3-C8)aryl(C1-C6)alkylene, or (C3-C8)cycloalkyl(C1-C6)alkylene, optionally substituted with ethylene glycol units comprising 1 to 50 ethylene glycol moieties; -CH2-(OCH2-CH2)qOP1; -CH2-(OCH2-CH2)q-NHP1; -CH2-(SCH2-CH2)q-SP1; -CH2-(OCH2-CH2)r-OH; -CH2-(OCH2-CH2)r-NH2; -CH2-(OCH2-CH2)r-NHC(NH)NH2; or -CH2-(OCH2-CH2)r-S-S[CH2CH2]sNHC(NH)NH2, wherein P1 is H, (C1-C8)alkyl, (C2-C8)alkenyl, (C2-C8)alkynyl, (C3-C8)aryl, (C3-C8)cycloalkyl, (C3-C8)aryl(C1-C6)alkylene, or (C3-C8)cycloalkyl(C1-C6)alkylene; q is an integer from 0 to 50; r is an integer from 1 to 50; and s is an integer from 1 to 50;

R3 is an unsubstituted fused-ring polycyclic aromatic moiety, for example pentalene, indene, naphthalene, azulene, heptalene, biphenylene, as-indacene, s-indacene, acenaphthylene, fluorene, phenalene, phenanthrene, anthracene, fluoranthene, acephenanthrylene, aceanthrylene, triphenylene, pyrene,

Figure FDA0002641565610000052

R5, R7, and R8 are each independently H, a guanidine-containing group such as

Figure FDA0002641565610000053

wherein n is 1, 2, 3, 4, or 5, an amino acid side chain, or one or more adjacent amino acid residues,

or a pharmaceutically acceptable salt thereof.

18. The gene recognition reagent according to claim 17, wherein R1, R2, R4, R5, R6, R7, or R8 is (C1-C6)alkyl substituted with: -(OCH2-CH2)qOP1; -(OCH2-CH2)q-NHP1; -(SCH2-CH2)q-SP1; -(OCH2-CH2)r-OH; -(OCH2-CH2)r-NH2; -(OCH2-CH2)r-NHC(NH)NH2; or -(OCH2-CH2)r-S-S[CH2CH2]sNHC(NH)NH2; wherein P1 is H, (C1-C8)alkyl, (C2-C8)alkenyl, (C2-C8)alkynyl, (C3-C8)aryl, (C3-C8)cycloalkyl, (C3-C8)aryl(C1-C6)alkylene, or (C3-C8)cycloalkyl(C1-C6)alkylene; q is an integer from 0 to 50; r is an integer from 1 to 50; and s is an integer from 1 to 50.

19. The gene recognition reagent according to claim 17, wherein R2 is -CH2-O-CH2-O-CH2-CH2-OH; R8 is H; R5 is Arg-Dab(pyrene)-, Arg-Orn(pyrene)-, or Arg-Lys(pyrene)-; and R7 is -Dab(pyrene)-Arg, -Orn(pyrene)-Arg, or -Lys(pyrene)-Arg, optionally wherein the chiral centers of Arg, Dab, Orn, and Lys are L-Arg, L-Dab, L-Orn, and L-Lys.

20. The gene recognition reagent of claim 2, wherein the two instances of the 2-to 5-ring fused polycyclic aromatic moiety are the same.

21. The gene recognition reagent according to claim 2, wherein one or both of the 2- to 5-ring fused polycyclic aromatic moieties are unsubstituted or substituted pentalene, indene, naphthalene, azulene, heptalene, biphenylene, as-indacene, s-indacene, acenaphthylene, fluorene, phenalene, phenanthrene, anthracene, fluoranthene, acephenanthrylene, aceanthrylene, triphenylene, pyrene, naphthacene (tetracene), pleiadene, picene, or perylene, optionally substituted with one or more heteroatoms such as O, N, P, and/or S.

22. The gene recognition reagent according to claim 2, wherein one or both of the 2- to 5-ring fused polycyclic aromatic moieties comprise riboflavin (vitamin B2), mangostin, or mangiferin.

23. The gene recognition reagent according to any one of claims 9 to 27, wherein R1 and R2 are different at two or more gamma carbons.

24. The gene recognition reagent according to claim 23, wherein R1 is H at the two or more gamma carbons at which R1 and R2 are different.

25. The gene recognition reagent according to claim 23, wherein R2 is H at the two or more gamma carbons at which R1 and R2 are different.

26. The gene recognition reagent according to claim 1, which comprises a guanidine moiety.

27. The gene-recognition reagent according to claim 26, which comprises a guanidino-containing group

wherein n is 1, 2, 3, 4, or 5.

28. The gene recognition reagent according to any one of claims 9 to 27, wherein R2 is -CH2-(OCH2-CH2)r-OH, wherein r is an integer from 1 to 50, an integer from 1 to 10, or 2.

29. The gene recognition reagent according to any one of claims 9 to 27, wherein R2 is -CH2-O-CH2-CH2-O-CH2-OH, and/or R3 is pyrene.

30. The gene recognition reagent according to claim 1, wherein the linker comprises 5 to 25 atoms, or a total of 1 to 10 C, O, P, N, and S atoms.

31. The gene recognition reagent according to claim 1, wherein the nucleobase sequence is fully complementary to a nucleic acid having an expanded repeat sequence associated with a repeat expansion disease, such as FRDA, FRAXA, FRAXE, SCA1, SCA2, SCA3 (MJD), SCA6, SCA7, SCA17, DRPLA, SBMA, HD, DM1, DM2, FXTAS, SCA8, SCA10, SCA12, HDL2, or ALS.

32. The gene recognition reagent of claim 31, wherein the expanded repeat sequence has one of the following sequences: (GAA)n, (CGG)n, (CCG)n, (CAG)n, (CTG)n, (CCTG)n, (ATTCT)n, or (GGGGCC)n, wherein n is at least 3.

33. A method of binding nucleic acid comprising contacting nucleic acid having a target sequence with the gene recognition reagent of claim 1.

34. The method according to claim 33, wherein the nucleobase sequence of the gene recognition reagent is fully complementary to a nucleic acid having an expanded repeat sequence associated with a repeat expansion disease, such as FRDA, FRAXA, FRAXE, SCA1, SCA2, SCA3 (MJD), SCA6, SCA7, SCA17, DRPLA, SBMA, HD, DM1, DM2, FXTAS, SCA8, SCA10, SCA12, HDL2, or ALS.

35. The method of claim 34, wherein the expanded repeat sequence has one of the following sequences: (GAA)n, (CGG)n, (CCG)n, (CAG)n, (CTG)n, (CCTG)n, (ATTCT)n, or (GGGGCC)n, wherein n is at least 3.

36. A method of knocking down mRNA expression in a cell, comprising contacting a target sequence of the mRNA with the gene recognition reagent of claim 1 having a nucleobase sequence complementary to the target sequence.

37. The method according to claim 36, wherein the nucleobase sequence of the gene recognition reagent is fully complementary to a nucleic acid having an expanded repeat sequence associated with a repeat expansion disease, such as FRDA, FRAXA, FRAXE, SCA1, SCA2, SCA3 (MJD), SCA6, SCA7, SCA17, DRPLA, SBMA, HD, DM1, DM2, FXTAS, SCA8, SCA10, SCA12, HDL2, or ALS.

38. The method of claim 37, wherein the expanded repeat sequence has one of the following sequences: (GAA)n, (CGG)n, (CCG)n, (CAG)n, (CTG)n, (CCTG)n, (ATTCT)n, or (GGGGCC)n, wherein n is at least 3.

39. A method of identifying a target sequence of a nucleic acid in a sample, comprising: contacting a sample comprising a nucleic acid with the gene recognition reagent of claim 1, wherein the aryl moiety produces a first fluorescent emission when not attached to the target sequence when exposed to an excitation frequency of light and a second fluorescent emission different from the first fluorescent emission when attached to the target sequence when exposed to the excitation frequency of light, and determining the presence of the target sequence in the sample by exciting the fluorescent aromatic moiety and measuring the amount of the second fluorescent signal produced in the sample by the fluorescent aromatic moiety.

40. The method of claim 39, wherein the fluorescent aromatic moiety is pyrene.

41. The method according to claim 39, wherein the nucleobase sequence of the gene recognition reagent is fully complementary to a nucleic acid having an expanded repeat sequence associated with a repeat expansion disease, such as FRDA, FRAXA, FRAXE, SCA1, SCA2, SCA3 (MJD), SCA6, SCA7, SCA17, DRPLA, SBMA, HD, DM1, DM2, FXTAS, SCA8, SCA10, SCA12, HDL2, or ALS.

42. The method of claim 39, wherein the expanded repeat sequence has one of the following sequences: (GAA)n, (CGG)n, (CCG)n, (CAG)n, (CTG)n, (CCTG)n, (ATTCT)n, or (GGGGCC)n, wherein n is at least 3.

43. A composition comprising a gene recognition reagent according to claim 1 and a pharmaceutically acceptable carrier.

1. Field of the invention

Described herein are compositions and methods for binding nucleic acids using nucleic acid and nucleic acid oligomer compositions. Also provided are methods of treating repeat expansion diseases such as myotonic dystrophy type 1 (DM1) and type 2 (DM2).

2. Description of the Related Art

For most organisms, genetic information is encoded in double-stranded DNA through Watson-Crick base pairing, in which adenine (A) pairs with thymine (T) and cytosine (C) pairs with guanine (G). Which set of this genetic information is decoded by transcription and translation determines the developmental program and physiological state. The development of programmable molecules that bind sequence-specifically to any part of these genetic biopolymers, thereby enabling control of the flow of genetic information and the assessment and manipulation of the structure and function of the genome, is important for biological and biomedical research. This work is also important for medical and therapeutic applications in the treatment and detection of genetic diseases.

Compared to proteins, RNA molecules are easier to target because they consist of only four building blocks (A, C, G, U), whose interactions are defined by the established rules of Watson-Crick base pairing. RNA secondary structures are generally thermodynamically less stable than standard double-stranded DNA (or RNA), and thus energetically less demanding to bind, because in addition to canonical (perfectly matched) base pairs, many of them contain noncanonical (mismatched) pairs and comprise single-stranded loops, bulges, and knots. The presence of these local interaction domains is critical for "tertiary" interactions and for the assembly of secondary structures into compact three-dimensional shapes. Subtle changes in the interaction pattern or binding strength in these regions can therefore have profound effects on the overall three-dimensional folding pattern of the RNA. Molecules that can be used to modulate RNA interactions, and thereby interfere with RNA folding behavior, are accordingly important as molecular tools for assessing RNA function and as therapeutic and diagnostic agents.

Genetic diseases are often caused by aberrant protein function resulting from mutations in the DNA coding sequence or from dysregulation at the level of transcription or translation, leading to loss or gain of protein function. Over the last three decades, however, a substantial body of evidence has shown that more than 20 neuromuscular diseases, including myotonic dystrophy type 1 (DM1) and type 2 (DM2), are caused by unstable repeat expansions. Expansion in the coding region of a gene can result in altered protein function, while expansion in non-coding regions can cause disease without altering the protein sequence, through a toxic gain of RNA function and, in some cases, through the inadvertent production of deleterious polypeptides by repeat-associated non-ATG (RAN) translation.

The prototype of the latter type of genetic disease is DM1, a debilitating muscular atrophy that affects about one in every 8,000 adults worldwide and for which there is no effective treatment. DM1 is caused by expansion of CTG repeats in the 3'-untranslated region (3'-UTR) of the dystrophia myotonica protein kinase (DMPK) gene, from 5-35 repeats in the normal range to 80 to >2,500 repeats in the pathogenic range. The etiology of DM1 is mainly due to RNA toxicity. Once transcribed, the expanded rCUG repeats (rCUGexp) adopt an imperfect hairpin structure that sequesters muscleblind-like protein 1 (MBNL1), a key regulator of RNA splicing. Their association traps the rCUGexp-MBNL1 complex in the nucleus, preventing export of the transcript to the cytoplasm for production of DMPK protein and preventing MBNL1 from performing its normal physiological functions. Accumulating evidence suggests that therapeutic interventions can be developed for DM1, and possibly for other related neuromuscular diseases, by targeting the mutant transcript. The challenge, however, is how to design molecules that target the expanded transcript without interfering with the wild type (wt) and that are able to displace non-homologous proteins, such as MBNL1, from rCUGexp.
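The repeat ranges quoted above can be made concrete with a small sketch. This is only an illustration, not a diagnostic procedure: the function names are hypothetical, the thresholds are simply the 5-35 and 80-or-more figures stated in the text, and counts that fall outside both ranges are reported as "unclassified" because the text does not characterize them.

```python
import re

# Thresholds taken from the text above; everything else here is illustrative.
NORMAL_RANGE = (5, 35)   # CTG repeats, normal range
PATHOGENIC_MIN = 80      # lower bound of the pathogenic range


def longest_repeat_run(sequence: str, unit: str = "CTG") -> int:
    """Return the number of units in the longest uninterrupted tandem run of `unit`."""
    runs = re.findall(f"(?:{re.escape(unit)})+", sequence.upper())
    return max((len(run) // len(unit) for run in runs), default=0)


def classify_ctg_count(count: int) -> str:
    """Label a repeat count using only the ranges stated in the text."""
    if NORMAL_RANGE[0] <= count <= NORMAL_RANGE[1]:
        return "normal"
    if count >= PATHOGENIC_MIN:
        return "pathogenic"
    return "unclassified"  # the text gives no label for other counts


if __name__ == "__main__":
    example = "AA" + "CTG" * 10 + "TT"  # hypothetical sequence
    n = longest_repeat_run(example)
    print(n, classify_ctg_count(n))
```

Keeping the thresholds as named constants makes it clear that they are inputs from the text and not properties of the code.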

The pursuit of this goal has led to the development of agents targeting rCUGexp, including pentamidine, triaminotriazine, and peptidomimetics. Recently, Disney and his colleagues reported the development of modular peptoids and the identification of several small molecules with high affinity and potency. Antisense approaches using morpholinos and 2'-O-methoxyethyl gapmers have also been investigated and shown to be effective in disrupting the rCUGexp-MBNL1 complex, degrading the toxic RNA, and reversing the DM1 phenotype in animal models. More recently, antigene strategies using TALENs and CRISPR/Cas9 to modify the affected alleles have been investigated as possible therapies for DM1 and related medical conditions. Despite this promise, many of these designed molecules, particularly antisense agents, still face significant challenges associated with recognition specificity and/or selectivity and, to some extent, cellular delivery. The low to moderate affinities of most synthetic oligonucleotide molecules developed to date, combined with the lack of substantial binding cooperativity, have prevented the use of shorter probes, which would be easier to deliver into cells and would better distinguish expanded (pathogenic) RNA repeat transcripts from the wild type.

Nucleic acid interactions, such as RNA-RNA and RNA-protein interactions, play key roles in gene regulation, including replication, translation, folding, and packaging. The ability to selectively bind these perturbed regions in the secondary structure of RNA is important in manipulating its physiological functions. Thus, there is a need for improved reagents and methods capable of selectively binding nucleic acids.

Background

Disclosure of Invention

In one aspect, a gene recognition reagent is provided. The gene recognition reagent comprises: a nucleic acid or nucleic acid analog backbone having a first end and a second end and having from 3 to 8 ribose, deoxyribose, or nucleic acid analog backbone residues; the same or different nucleobases linked to a plurality of ribose, deoxyribose, or nucleic acid analog backbone residues in a sequence complementary to the target nucleic acid; a first aryl moiety attached to the first end of the nucleic acid or nucleic acid analogue backbone by a linker; and a second aryl moiety, optionally identical to the first aryl moiety, attached to the second end of the nucleic acid or nucleic acid analogue backbone by a linker, wherein the aryl moiety is stacked with the aryl moiety of an adjacent recognition reagent when the recognition reagent hybridizes to an adjacent sequence of the target nucleic acid. Also provided are compositions comprising the gene recognition agents and a pharmaceutically acceptable carrier.

In another aspect, there is provided a method of binding nucleic acid in a cell, the method comprising contacting a target sequence of the mRNA with a gene recognition reagent comprising: a nucleic acid or nucleic acid analog backbone having a first end and a second end and having from 3 to 8 ribose, deoxyribose, or nucleic acid analog backbone residues; the same or different nucleobases linked to a plurality of ribose, deoxyribose, or nucleic acid analog backbone residues in a sequence complementary to the target nucleic acid; a first aryl moiety attached to the first end of the nucleic acid or nucleic acid analogue backbone by a linker; and a second aryl moiety, optionally identical to the first aryl moiety, linked to the second end of the nucleic acid or nucleic acid analogue backbone by a linker, wherein the aryl moiety is stacked with the aryl moiety of an adjacent recognition reagent when the recognition reagent hybridizes to an adjacent sequence of the target nucleic acid and has a nucleobase sequence complementary to the target sequence.

In another aspect, there is provided a method of knocking down mRNA expression in a cell, comprising contacting a target sequence of mRNA with a gene recognition agent comprising: a nucleic acid or nucleic acid analog backbone having a first end and a second end and having from 3 to 8 ribose, deoxyribose, or nucleic acid analog backbone residues; the same or different nucleobases linked to a plurality of ribose, deoxyribose, or nucleic acid analog backbone residues in a sequence complementary to the target nucleic acid; a first aryl moiety attached to the first end of the nucleic acid or nucleic acid analogue backbone by a linker; and a second aryl moiety, optionally identical to the first aryl moiety, linked to the second end of the nucleic acid or nucleic acid analogue backbone by a linker, wherein the aryl moiety is stacked with the aryl moiety of an adjacent recognition reagent when the recognition reagent hybridizes to an adjacent sequence of the target nucleic acid and has a nucleobase sequence complementary to the target sequence.

In another aspect, a method of identifying a target sequence of a nucleic acid in a sample is provided. The method comprises the following steps: contacting a sample comprising nucleic acids with a gene recognition reagent comprising a nucleic acid or nucleic acid analog backbone having a first end and a second end and having from 3 to 8 ribose, deoxyribose, or nucleic acid analog backbone residues; the same or different nucleobases linked to a plurality of ribose, deoxyribose, or nucleic acid analog backbone residues in a sequence complementary to the target nucleic acid; a first aryl moiety attached to the first end of the nucleic acid or nucleic acid analogue backbone by a linker; and a second aryl moiety, optionally identical to the first aryl moiety, attached to the second end of the nucleic acid or nucleic acid analogue backbone by a linker, wherein the aryl moiety is stacked with the aryl moieties of adjacent recognition reagents when the recognition reagents hybridize to adjacent sequences of the target nucleic acid, wherein the aryl moiety produces a first fluorescent emission when exposed to an excitation frequency of light without being attached to the target sequence and a second fluorescent emission different from the first fluorescent emission when exposed to an excitation frequency of light with being attached to the target sequence, and the presence of the target sequence in the sample is determined by exciting a fluorescent aromatic moiety and measuring the amount of said second fluorescent signal produced by the fluorescent aromatic moiety in the sample.

Brief Description of Drawings

FIG. 1 is a schematic diagram illustrating the cooperative binding of non-limiting examples of gene recognition reagents described herein. The sequence of the gene recognition reagent is depicted in the 3 'to 5' direction to depict binding to RNA containing repeated sequences (SEQ ID NO: 1).

Exemplary structures of nucleic acid analogs are provided in FIGS. 2 (A-F).

FIG. 3 provides exemplary nucleobase structures.

FIGS. 4A-4C provide structures of exemplary bivalent nucleobases.

Fig. 5 provides examples of amino acid side chains.

FIG. 6 depicts the chemical building blocks (A), MPγPNA oligomers (B), and RNA targets (C) described in the examples.

FIG. 7 UV melting curves for P1 to P4 with T6. The concentration of T6 prepared in physiologically relevant buffer was 1 μM, and the concentration of each probe was 6 μM. Evidence of cooperative binding was clearly observed with P4.

FIG. 8 UV melting curves of P4-RNA heteroduplexes at 8 μM strand concentration in physiologically relevant buffer. Samples were prepared by mixing 8 μM P4 with the corresponding RNA target at the following concentrations in physiological buffer: T1 = 8.0 μM, T2 = 4.0 μM, T4 = 2.0 μM, T6 = 1.3 μM, and T8 = 1.0 μM. The samples were annealed at 90 °C for 5 minutes and then gradually cooled to room temperature.
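The target concentrations listed for FIG. 8 are consistent with holding the concentration of probe binding sites equal to the probe concentration. The following minimal sketch reproduces that arithmetic under the assumption, inferred from the target names and not stated explicitly here, that target Tn carries n binding sites for P4; the function name and the `SITES` mapping are hypothetical.

```python
def target_concentration_uM(probe_uM: float, binding_sites: int) -> float:
    """Target concentration that gives one binding site per probe molecule."""
    if binding_sites < 1:
        raise ValueError("binding_sites must be at least 1")
    return probe_uM / binding_sites


# Assumption: the number in each target name is its count of P4 binding sites.
SITES = {"T1": 1, "T2": 2, "T4": 4, "T6": 6, "T8": 8}

# 8 uM P4, as in the FIG. 8 description; values rounded to one decimal place.
table = {name: round(target_concentration_uM(8.0, n), 1) for name, n in SITES.items()}

if __name__ == "__main__":
    for name, conc in table.items():
        print(f"{name}: {conc} uM")
```

The same relation with a 1 μM probe gives the 1, 1/2, 1/4, 1/6, and 1/8 μM target concentrations listed for the fluorescence measurements.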

FIG. 9 UV melting transitions of RNA and of the corresponding probe-RNA heteroduplexes comprising perfect-match and mismatch sequences, as a function of the number of r(CUGCUG)n binding sites in the target. Inset: UV melting curves of RNA with P5 and P6, which contain single-base and double-base mismatches, respectively.

FIG. 10 Fluorescence spectra of P4-RNA duplexes at equimolar concentrations (P4 = 1 μM; T1 = 1 μM, T2 = 1/2 μM, T4 = 1/4 μM, T6 = 1/6 μM, T8 = 1/8 μM) after incubation at 37 °C for 1 hour and excitation at 345 nm. Inset: time course of the fluorescence signal at 480 nm at 37 °C after addition of P4 (1 μM) to T8 (1/8 μM) and of P4 (1 μM) to [T1 (1 μM) + T8 (1/8 μM)].

FIG. 11 is a photograph of the samples used for the fluorescence measurements shown in FIG. 10, irradiated with a hand-held UV lamp. The concentrations of the individual components were as follows: P4 = 1 μM; T1 = 1 μM, T2 = 1/2 μM, T4 = 1/4 μM, T6 = 1/6 μM, and T8 = 1/8 μM.

FIG. 12 Selective binding of P4 to r(CUGCUG)n RNA transcripts. Samples were prepared by mixing pre-annealed RNA with probe at 37 °C for 4 hours. The ratio of P4 to total RNA binding sites was 0 (lane 3), 1/4 (lane 4), 1/2 (lane 5), and 1/1 (lane 6); the ratio was 1/1 for the mismatched P5 (lane 7).

FIG. 13 Displacement of MBNL1 from T48 by P4. P-T48 was allowed to form a complex with GST-MBNL1-F1 before P4 was added. Samples were prepared in physiologically relevant buffer at final concentrations of 25 nM T48 and 400 nM GST-MBNL1-F1.

Detailed Description

Unless expressly stated otherwise, the use of numerical values in the various ranges specified in this application are expressed as approximations as if the minimum and maximum values within the stated ranges both begin with the word "about". In this manner, minor variations above and below the stated ranges can be used to achieve substantially the same results as values within the ranges. Further, unless otherwise indicated, the disclosure of a range is intended to include every value between the minimum and maximum values for the continuous range. As used herein, "a" and "an" mean one or more.

As used herein, the term "comprising" is open-ended and may be synonymous with "including," "containing," or "characterized by." The term "consisting essentially of" limits the scope of a claim to the specified materials or steps and to those that do not materially affect the basic and novel characteristics of the claimed invention. The term "consisting of" excludes any element, step, or ingredient not specified in the claim. As used herein, embodiments "comprising" one or more stated elements or steps also include, but are not limited to, embodiments "consisting essentially of" and "consisting of" those stated elements or steps.

Provided herein are compositions and methods for binding a target sequence in a nucleic acid, e.g., for binding to a repeat expansion associated with a disease involving expansion of a repeated nucleic acid sequence. FIG. 1 is a schematic illustrating the cooperative binding of a non-limiting example of the recognition reagents (gene recognition reagents) described herein, targeting the CUG repeats in an RNA hairpin structure as seen in DM1. In this example, cooperative binding of a module to an adjacent module is facilitated by terminal aromatic groups (such as the exemplary pyrene groups described in the examples below) that emit at 380 nm and 480 nm when unstacked and at 480 nm when stacked. Multiple recognition reagents bind to a template nucleic acid by cooperative Watson-Crick or Watson-Crick-like base pairing. In a cell, the template nucleic acid is an RNA or DNA molecule; in vitro, the template nucleic acid may be any RNA or DNA, as well as a modified nucleic acid or nucleic acid analog. Recognition reagents that are sufficiently close to one another, for example, bound to adjacent sequences on the template nucleic acid, associate through pi-pi stacking, so that the assembly essentially forms a longer oligomer or polymer. More detailed information is provided below.
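The idea of several short reagents occupying adjacent sites on one template can be sketched computationally. The example below is a simplified illustration under stated assumptions: it models base pairing only by exact Watson-Crick complementarity, ignores the hairpin structure, linkers, and aromatic end groups, and uses hypothetical function names. It writes the probe's nucleobases 3' to 5' so they align under a 5' to 3' target, in the same way that FIG. 1 is described as depicting the reagent sequence.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def complement_3to5(target_5to3: str) -> str:
    """Watson-Crick complement of an RNA site, written 3'->5' to align under the target."""
    return "".join(COMPLEMENT[base] for base in target_5to3.upper())


def adjacent_sites(rna: str, site: str = "CUGCUG") -> list:
    """Start indices of the longest chain of back-to-back copies of `site` in `rna`."""
    rna = rna.upper()
    best = []
    for start in range(len(rna) - len(site) + 1):
        run = []
        pos = start
        # Extend the chain while the next copy of the site begins exactly
        # where the previous one ended (i.e. the sites are adjacent).
        while rna.startswith(site, pos):
            run.append(pos)
            pos += len(site)
        if len(run) > len(best):
            best = run
    return best


if __name__ == "__main__":
    template = "GG" + "CUG" * 4 + "AA"      # hypothetical RNA fragment
    print(complement_3to5("CUGCUG"))        # probe bases, 3'->5'
    print(adjacent_sites(template))         # start positions of adjacent sites
```

In this toy model, each index returned by `adjacent_sites` marks where one recognition reagent could bind, and neighboring indices correspond to reagents whose end groups would sit next to each other on the template.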

As used herein, a "patient" is an animal, e.g., a mammal, including a primate (e.g., a human or a non-human primate, such as a monkey or chimpanzee), a non-primate (e.g., a cow, pig, camel, horse, goat, rabbit, sheep, hamster, guinea pig, cat, dog, rat, mouse, or whale), or a bird (e.g., a duck or goose).

As used herein, the term "treatment" or "therapy" refers to beneficial or desired results, such as improvement in one or more liver functions or disease symptoms. The term "treating" or "therapy" also includes, but is not limited to, alleviating or ameliorating one or more symptoms of a repeat expansion disease (e.g., DM1, DM2, or Huntington's disease). "Treatment" may also mean an increase in survival compared to the expected survival without treatment.

By "decrease" in the context of a disease marker or symptom is meant a clinically relevant and/or statistically significant decrease in that level. The reduction may be, for example, at least 10%, at least 20%, at least 30%, at least 40% or more, to a level within the normal range accepted for an individual without such disease, or to below the level of detection. In certain aspects, the reduction is to a level within the normal range accepted for an individual without such a condition, which may also be referred to as normalization of the level. In certain aspects, the reduction is a normalization of the level of a sign or symptom of a disease, that is, a reduction in the difference between the subject's level of the disease sign and the normal level of that sign (e.g., to the upper limit of normal when the subject's value must be lowered to reach a normal value, and to the lower limit of normal when the subject's value must be raised to reach a normal value). In certain aspects, the methods comprise clinically relevant inhibition of mRNA expression in a repeat expansion disease, e.g., DM1, DM2, or Huntington's disease, e.g., as evidenced by clinically relevant results after treatment of a subject with a recognition reagent of the invention.

As used herein, "therapeutically effective amount" is intended to include an amount of a recognition reagent as described herein that, when administered to a subject having a disease, is sufficient to effect treatment of the disease (e.g., by reducing, ameliorating, or maintaining an existing disease or one or more symptoms of the disease). The "therapeutically effective amount" may vary according to the recognition reagent(s), the mode of administration, the disease and its severity, and the medical history, age, weight, family history, genetic make-up, type of prior or concomitant therapy (if any), and other individual characteristics of the subject to be treated.

A "therapeutically effective amount" also includes an amount of an agent that produces some desired local or systemic effect at a reasonable benefit/risk ratio applicable to any treatment. The recognition reagents employed in the methods described herein can be administered in an amount sufficient to produce a reasonable benefit/risk ratio applicable to such treatment.

The term "pharmaceutically acceptable carrier" as used herein refers to a pharmaceutically acceptable material, composition or vehicle, such as a liquid or solid filler, diluent, excipient, manufacturing aid (e.g., lubricant, talc, magnesium, calcium or zinc stearate, or stearic acid) or solvent encapsulating material, involved in carrying or transporting a test compound from one organ or portion of the body to another organ or portion of the body. Each carrier must be "acceptable" in the sense of being compatible with the other ingredients of the formulation and not injurious to the subject being treated. Some examples of materials that can be used as pharmaceutically acceptable carriers include: (1) sugars, such as lactose, glucose and sucrose; (2) starches, such as corn starch and potato starch; (3) cellulose and its derivatives, such as sodium carboxymethyl cellulose, ethyl cellulose and cellulose acetate; (4) powdered tragacanth; (5) malt; (6) gelatin; (7) lubricants, such as magnesium stearate, sodium lauryl sulfate and talc; (8) excipients, such as cocoa butter and suppository waxes; (9) oils, such as peanut oil, cottonseed oil, safflower oil, sesame oil, olive oil, corn oil and soybean oil; (10) glycols, such as propylene glycol; (11) polyols, such as glycerol, sorbitol, mannitol and polyethylene glycol; (12) esters, such as ethyl oleate and ethyl laurate; (13) agar; (14) buffering agents, such as magnesium hydroxide and aluminum hydroxide; (15) alginic acid; (16) pyrogen-free water; (17) isotonic saline; (18) Ringer's solution; (19) ethanol; (20) pH-buffered solutions; (21) polyesters, polycarbonates and/or polyanhydrides; (22) bulking agents, such as polypeptides and amino acids; (23) serum components, such as serum albumin, HDL and LDL; and (24) other non-toxic compatible substances used in pharmaceutical formulations.

As used herein, the term "cell" refers to any type of cell from any animal, such as, but not limited to, rat, mouse, monkey, and human. For example, but not limited to, the cells may be progenitor cells, such as stem cells, or differentiated cells, such as endothelial cells or smooth muscle cells. In certain embodiments, cells for use in medical procedures may be obtained from a patient for autologous procedures or obtained from other donors for allogeneic procedures.

"Expression" or "gene expression" refers to the overall flow of information from a gene to produce a gene product (typically a protein, optionally post-translationally modified, or a functional/structural RNA). A gene is, without limitation, a functional genetic unit for producing a gene product, such as RNA or a protein, in a cell or other expression system, encoded on a nucleic acid and comprising: a transcriptional promoter and other cis-acting elements, such as response elements and/or enhancers; an expressed sequence that typically encodes a protein (open reading frame or ORF) or a functional/structural RNA; and a polyadenylation sequence. A gene "expressed under the transcriptional control of" a specified sequence, or "under the control of" a specified sequence, refers to expression from a gene containing the specified sequence operably linked (functionally linked, typically in cis) to the gene. The specified sequence can be all or part of the transcriptional elements (without limitation, promoters, enhancers and response elements) and may regulate and/or influence, in whole or in part, the transcription of the gene. A "gene for expressing" a stated gene product is a gene that is capable of expressing that gene product when placed in an appropriate environment, e.g., when transformed, transfected, transduced, or the like, into a cell and subjected to appropriate expression conditions. In the case of a constitutive promoter, "suitable conditions" means that the gene generally need only be introduced into the host cell. In the case of an inducible promoter, "suitable conditions" refers to the administration of an amount of the corresponding inducing agent to the expression system (e.g., a cell) effective to cause expression of the gene.

As used herein, the term "knockdown" means that the expression of one or more genes in an organism is reduced, typically significantly, relative to a functional gene, e.g., to a therapeutically effective degree. Gene knockdown also includes complete gene silencing. As used herein, "gene silencing" refers to substantially completely preventing expression of a gene. Knockdown and gene silencing may occur at the transcriptional or translational stage. Targeting RNA (e.g., mRNA) in a cell with the recognition reagent can modify gene expression by knocking down or silencing one or more genes at the translational stage.

As used herein, the term "nucleic acid" refers to both deoxyribonucleic acid (DNA) and ribonucleic acid (RNA). Nucleic acid analogs include, for example, but are not limited to: 2'-O-methyl-substituted RNA, locked nucleic acids, unlocked nucleic acids, triazole-linked DNA, peptide nucleic acids, morpholino oligomers, dideoxynucleotide oligomers, glycol nucleic acids, threose nucleic acids, and combinations thereof, including optional ribonucleotide or deoxyribonucleotide residues. Herein, in reference to nucleic acids and nucleic acid analogs, "nucleic acid" and "oligonucleotide," the latter being a short, single-stranded structure made up of nucleotides, are used interchangeably. An oligonucleotide can be referred to by the length of the strand (i.e., the number of nucleotides) using the "-mer" nomenclature. For example, an oligonucleotide having 22 nucleotides is referred to as a 22-mer.

A "nucleic acid analog" is a composition comprising a sequence of nucleobases arranged on a substrate, such as a polymer backbone, that can bind DNA and/or RNA by Watson-Crick hybridization or Watson-Crick-like hydrogen-bonded base pairing. Non-limiting examples of common nucleic acid analogs include peptide nucleic acids, such as γPNA, morpholino nucleic acids, phosphorothioates, locked nucleic acids (2'-O-4'-C-methylene bridges, including oxy, thio or amino versions thereof), unlocked nucleic acids (the C2'-C3' bond is cleaved), 2'-O-methyl-substituted RNA, threose nucleic acids, glycol nucleic acids, etc.

Conformationally pre-organized nucleic acid analogs are nucleic acid analogs having a backbone (a pre-organized backbone) that forms only a right-handed helix or only a left-handed helix, depending on the structure of the nucleic acid backbone. As shown herein, one example of a conformationally pre-organized nucleic acid analog is γPNA, which has a chiral center at the γ carbon and forms only a right-handed helix or only a left-handed helix, depending on and due to the chirality of the group at the γ carbon. Likewise, a locked nucleic acid comprises a ribose sugar with a bridge between the 2' oxygen and the 4' carbon, which "locks" the ribose sugar into a 3'-endo (north) conformation.

In the context of the present disclosure, "nucleotide" refers to a monomer comprising at least one nucleobase and a backbone element (backbone moiety), which in nucleic acids such as RNA or DNA is ribose or deoxyribose. "Nucleotides" typically also contain reactive groups that allow polymerization under specific conditions. In native DNA and RNA, these reactive groups are the 5' phosphate and 3' hydroxyl groups. For chemical synthesis of nucleic acids and analogs thereof, the base and backbone monomers may contain modified groups, such as blocked amines, as known in the art. "Nucleotide residue" refers to a single nucleotide that is incorporated into an oligonucleotide or polynucleotide. Similarly, "nucleobase residue" refers to a nucleobase incorporated into a nucleotide or a nucleic acid or analog thereof. "Gene recognition reagent" generally refers to a nucleic acid or nucleic acid analog comprising a nucleobase sequence capable of hybridizing by cooperative base pairing, e.g., Watson-Crick base pairing or Watson-Crick-like base pairing, to a complementary sequence on a nucleic acid or nucleic acid analog.

In more detail, the nucleotides for RNA, DNA or nucleic acid analogs have the structure A-B, wherein A is a backbone monomer moiety and B is a nucleobase as described herein. The backbone monomer may be any suitable nucleic acid backbone monomer, such as a ribose triphosphate or deoxyribose triphosphate, or a monomer of a nucleic acid analog, such as a peptide nucleic acid (PNA), e.g., a gamma PNA (γPNA). In one example, the backbone monomer is a ribose mono-, di-, or triphosphate or a deoxyribose mono-, di-, or triphosphate, such as a 5' monophosphate, diphosphate, or triphosphate of ribose or deoxyribose. The backbone monomer includes both the structural "residue" component, such as the ribose in RNA, and any reactive groups that are modified when the monomers are joined together, such as the 5' triphosphate and 3' hydroxyl groups of a ribonucleotide, which are modified upon polymerization into RNA, leaving a phosphodiester linkage. Likewise for PNA, the C-terminal carboxyl group and N-terminal amine reactive group of the N-(2-aminoethyl)glycine backbone monomer condense during polymerization to leave a peptide (amide) linkage. In another aspect, the reactive group is a phosphoramidite group useful in phosphoramidite oligomer synthesis, as is widely known in the art. The nucleotide also optionally contains one or more protecting groups known in the art, such as 4,4'-dimethoxytrityl (DMT), and as described herein. Many other methods of making synthetic gene recognition reagents are known and depend on the backbone structure and the specific chemistry of the base addition process. Determining which reactive groups are utilized in joining nucleotide monomers, which groups on the bases are protected, and the steps required to prepare the oligomer is within the capabilities of one of ordinary skill in the chemical arts and in the synthesis of nucleic acid analog oligomers.

Non-limiting examples of common nucleic acid analogs include peptide nucleic acids, such as γPNA, phosphorothioates (e.g., FIG. 2(A)), locked nucleic acids (2'-O-4'-C-methylene bridges, including oxy, thio, or amino versions thereof, e.g., FIG. 2(B)), unlocked nucleic acids (the C2'-C3' bond is cleaved, e.g., FIG. 2(C)), 2'-O-methyl-substituted RNA, morpholino nucleic acids (e.g., FIG. 2(D)), threose nucleic acids (e.g., FIG. 2(E)), glycol nucleic acids (e.g., FIG. 2(F), showing R and S forms), and the like. FIGS. 2(A-F) show the monomer structures of various examples of nucleic acid analogs, each as two monomer residues incorporated into a longer chain, as indicated by the wavy lines. The incorporated monomers are referred to herein as "residues," and the portion of the nucleic acid or nucleic acid analog other than the nucleobases is referred to as the "backbone" of the nucleic acid or nucleic acid analog. For example, for RNA, an exemplary nucleobase is adenine, the corresponding monomer is adenosine triphosphate, and the incorporated residue is an adenosine monophosphate residue. For RNA, the "backbone" consists of ribose subunits linked by phosphates, so the backbone monomer is ribose triphosphate before incorporation and ribose monophosphate after incorporation. Like γPNA, locked nucleic acids (FIG. 2(B)) are conformationally pre-organized.

A "moiety" is a portion of a molecule and includes a class of "residues," which are moieties that remain in a larger molecule, such as a polymer chain, after a compound or monomer is incorporated into the larger molecule, such as a nucleotide incorporated into a nucleic acid or an amino acid incorporated into a polypeptide or protein.

The term "polymer composition" refers to a composition comprising one or more polymers. As a class, "polymers" include, but are not limited to, homopolymers, heteropolymers, copolymers, block polymers, and block copolymers, and can be natural or synthetic. Homopolymers comprise one type of structural unit or monomer, while copolymers comprise more than one type of monomer. An "oligomer" is a polymer comprising a small number of monomer residues, for example 3 to 100 monomer residues. Accordingly, the term "polymer" includes oligomers. The terms "nucleic acid" and "nucleic acid analog" include nucleic acid and nucleic acid analog polymers and oligomers.

A polymer "comprises" or "is derived from" a stated monomer if that monomer is incorporated into the polymer. Thus, the incorporated monomer that the polymer comprises is not the same as the monomer prior to incorporation into the polymer, in that at least some linking groups are incorporated into the polymer backbone, or some groups are removed, during polymerization. A polymer is said to contain a particular type of bond if that bond is present in the polymer. An incorporated monomer is a "residue." Common monomers of nucleic acids or nucleic acid analogs are referred to as nucleotides.

"Non-reactive," in the context of a chemical component, e.g., a molecule, compound, composition, group, moiety, ion, etc., means that the component does not react with other chemical components to any substantial degree in its intended use. The non-reactive component is selected so as not to interfere, or to interfere only insignificantly, with the intended use of the component, moiety or group as a recognition reagent. In the context of the linker moieties described herein, they are non-reactive in that they do not interfere with the binding of the recognition reagent to the target template and do not interfere with the attachment of the recognition reagent to the target template.

As used herein, "alkyl" refers to a straight, branched or cyclic hydrocarbon group including, for example, from 1 to about 20 carbon atoms, such as, but not limited to, C1-3, C1-6, and C1-10 groups, for example, but not limited to, straight or branched chain alkyl groups such as methyl, ethyl, propyl, butyl, pentyl, hexyl, heptyl, octyl, nonyl, decyl, undecyl, dodecyl and the like. "Substituted alkyl" refers to an alkyl group substituted at 1 or more, e.g., 1, 2, 3, 4, 5, or even 6, positions, with the substituents attached at any available atom to produce a stable compound, as described herein. "Optionally substituted alkyl" refers to alkyl or substituted alkyl. "Halogen," "halide," and "halo" refer to -F, -Cl, -Br and/or -I. "Alkylene" and "substituted alkylene" refer to divalent alkyl and divalent substituted alkyl groups, respectively, including but not limited to methylene, ethylene, trimethylene, tetramethylene, pentamethylene, hexamethylene, heptamethylene, octamethylene, nonamethylene, or decamethylene. "Optionally substituted alkylene" refers to alkylene or substituted alkylene.

"Alkene" or "alkenyl" refers to a straight, branched or cyclic hydrocarbon group including, for example, from 2 to about 20 carbon atoms, such as, but not limited to, C1-3, C1-6, and C1-10 groups, having one or more, e.g., 1, 2, 3, 4 or 5, carbon-carbon double bonds. "Substituted alkene" refers to an alkene substituted at 1 or more, e.g., 1, 2, 3, 4, or 5, positions, with the substituents attached at any available atom to produce a stable compound, as described herein. "Optionally substituted alkene" refers to an alkene or substituted alkene. Likewise, "alkenylene" refers to a divalent alkene. Examples of alkenylene groups include, but are not limited to, ethenylene (-CH=CH-) and all stereoisomeric and conformationally isomeric forms thereof. "Substituted alkenylene" refers to a divalent substituted alkene. "Optionally substituted alkenylene" refers to alkenylene or substituted alkenylene.

"Alkyne" or "alkynyl" refers to a straight or branched chain unsaturated hydrocarbon having the indicated number of carbon atoms and at least one triple bond. Examples of a (C2-C8)alkynyl group include, but are not limited to, ethynyl, propynyl, 1-butynyl, 2-butynyl, 1-pentynyl, 2-pentynyl, 1-hexynyl, 2-hexynyl, 3-hexynyl, 1-heptynyl, 2-heptynyl, 3-heptynyl, 1-octynyl, 2-octynyl, 3-octynyl and 4-octynyl. An alkynyl group can be unsubstituted or optionally substituted with one or more substituents as described below. The term "alkynylene" refers to a divalent alkyne. Examples of alkynylene groups include, but are not limited to, ethynylene and propynylene. "Substituted alkynylene" refers to a divalent substituted alkyne.

The term "alkoxy" refers to an -O-alkyl group having the indicated number of carbon atoms. For example, a (C1-C6)alkoxy group includes -O-methyl (methoxy), -O-ethyl (ethoxy), -O-propyl (propoxy), -O-isopropyl (isopropoxy), -O-butyl (butoxy), -O-sec-butyl (sec-butoxy), -O-tert-butyl (tert-butoxy), -O-pentyl (pentoxy), -O-isopentyl (isopentoxy), -O-neopentyl (neopentoxy), -O-hexyl (hexyloxy), -O-isohexyl (isohexyloxy) and -O-neohexyl (neohexyloxy). "Hydroxyalkyl" refers to a (C1-C10)alkyl group in which one or more of the alkyl group's hydrogen atoms is replaced by an -OH group. Examples of hydroxyalkyl groups include, but are not limited to, -CH2OH, -CH2CH2OH, -CH2CH2CH2OH, -CH2CH2CH2CH2OH, -CH2CH2CH2CH2CH2OH, -CH2CH2CH2CH2CH2CH2OH, and branched forms thereof. The term "ether" or "oxy ether" refers to an alkyl group in which one or more of the carbon atoms of the alkyl group are replaced with an -O- group. The term ether includes -CH2-(OCH2-CH2)qOP1 compounds, wherein P1 is a protecting group, -H, or a (C1-C10)alkyl group. Exemplary ethers include polyethylene glycol, diethyl ether, methyl hexyl ether, and the like.

"PEG" refers to polyethylene glycol. "PEGylated" refers to a compound comprising a moiety that comprises two or more consecutive ethylene glycol moieties. Non-limiting examples of PEG moieties for PEGylation of compounds include one or more blocks of a chain of 1 to 50 ethylene glycol moieties, e.g., -(O-CH2-CH2)n-, -(CH2-CH2-O)n-, or -(O-CH2-CH2)n-OH, wherein n is 2 to 50.

"Heteroatom" means N, O, P and S. Compounds containing N or S atoms may optionally be oxidized to the corresponding N-oxide, sulfoxide or sulfone compounds. "Hetero-substituted" refers to an organic compound according to any embodiment herein, wherein one or more carbon atoms is substituted with N, O, P, or S.

"Aryl," alone or in combination, refers to an aromatic ring system, such as phenyl or naphthyl; "aryl" also includes aromatic ring systems optionally fused to a cycloalkyl ring. A "substituted aryl" is an aryl group that is independently substituted with one or more substituents attached at any available atom to produce a stable compound, wherein the substituents are as described herein. "Optionally substituted aryl" refers to aryl or substituted aryl. "Arylene" refers to a divalent aryl group, and "substituted arylene" refers to a divalent substituted aryl group. "Optionally substituted arylene" refers to arylene or substituted arylene. The term "polycyclic aryl" and related terms, such as "polycyclic aromatic," as used herein, refer to a group comprising at least two fused aromatic rings. "Heteroaryl" or "hetero-substituted aryl" refers to an aryl group substituted with one or more heteroatoms such as N, O, P, and/or S.

"Cycloalkyl" refers to a monocyclic, bicyclic, tricyclic, or polycyclic 3 to 14 membered ring system which is saturated, unsaturated, or aromatic. The cycloalkyl group may be attached through any atom. Cycloalkyl also contemplates fused rings, wherein the cycloalkyl is fused to an aryl or heteroaryl ring. Representative examples of cycloalkyl groups include, but are not limited to, cyclopropyl, cyclobutyl, cyclopentyl, and cyclohexyl. Cycloalkyl groups may be unsubstituted or optionally substituted with one or more substituents as described below. "Cycloalkylene" refers to a divalent cycloalkyl group. The term "optionally substituted cycloalkylene" refers to cycloalkylene substituted with 1, 2, or 3 substituents attached at any available atom to produce a stable compound, wherein the substituents are as described herein.

"Carboxyl" or "carboxylic" refers to a group having the indicated number of carbon atoms (if indicated) and terminating in a -C(O)OH group, and thus having the structure -R-C(O)OH, wherein R is a divalent organic group comprising a straight, branched, or cyclic hydrocarbon. Non-limiting examples of these include C1-8 carboxyl groups, such as glycolic acid, propionic acid, 2-methylpropionic acid, butyric acid, 2,2-dimethylpropionic acid, valeric acid and the like. "Amine" or "amino" refers to a group having the indicated number of carbon atoms (if indicated) and terminating in an -NH2 group, and thus having the structure -R-NH2, wherein R is a substituted or unsubstituted divalent organic group, e.g., including straight chain, branched chain, or cyclic hydrocarbons, and optionally containing one or more heteroatoms.

The foregoing terms also encompass any suitable combination of the foregoing, such as arylalkenyl, arylalkynyl, heteroarylalkyl, heteroarylalkenyl, heteroarylalkynyl, heterocyclylalkyl, heterocyclylalkenyl, heterocyclylalkynyl, aryl, heteroaryl, heterocyclyl, cycloalkyl, cycloalkenyl, alkylarylalkyl, alkylarylalkenyl, alkylarylalkynyl, alkenylarylalkyl, alkenylarylalkenyl, alkenylarylalkynyl, alkynylarylalkenyl, alkynylarylalkynyl, alkylheteroarylalkyl, alkylheteroarylalkenyl, alkylheteroarylalkynyl, alkenylheteroarylalkyl, alkenylheteroarylalkenyl, alkenylheteroarylalkynyl, alkynylheteroarylalkyl, alkynylheteroarylalkenyl, alkynylheteroarylalkynyl, alkylheterocyclylalkyl, alkylheterocyclylalkenyl, alkylheterocyclylalkynyl, alkenylheterocyclylalkyl, alkenylheterocyclylalkynyl, alkynylheterocyclylalkyl, alkynylheterocyclylalkynyl, alkylaryl, alkenylaryl, alkynylaryl, alkylheteroaryl, alkenylheteroaryl, and alkynylheteroaryl. For example, "arylalkylene" refers to a divalent alkylene group in which one or more hydrogen atoms in the alkylene group are replaced with an aryl group, such as a (C3-C8)aryl group. Examples of (C3-C8)aryl-(C1-C6)alkylene groups include, but are not limited to, 1-phenylbutylene, phenyl-2-butylene, 1-phenyl-2-methylpropylene, phenylmethylene, phenylpropylene, and naphthylethylene. The term "(C3-C8)cycloalkyl-(C1-C6)alkylene" refers to a divalent alkylene group in which one or more hydrogen atoms in the C1-C6 alkylene group are replaced with a (C3-C8)cycloalkyl group. Examples of (C3-C8)cycloalkyl-(C1-C6)alkylene groups include, but are not limited to, 1-cyclopropylbutylene, cyclopropyl-2-butylene, cyclopentyl-1-phenyl-2-methylpropylene, cyclobutylmethylene, and cyclohexylpropylene.

An "amino acid" has the structure H2N-C(R)-C(O)OH, wherein R is a side chain, e.g., an amino acid side chain. "Amino acid residue" means the remainder of an amino acid when incorporated into an amino acid chain, such as when incorporated into a recognition reagent disclosed herein, e.g., having the structure -NH-C(R)-C(O)-, H2N-C(R)-C(O)- (when at the N-terminus of the polypeptide), or -NH-C(R)-C(O)OH (when at the C-terminus of the polypeptide). An "amino acid side chain" is a side chain of an amino acid, including a proteinogenic or a non-proteinogenic amino acid. The amino acids have the following structure:

wherein R is an amino acid side chain. Non-limiting examples of amino acid side chains are shown in FIG. 5. Glycine (H2N-CH2-C(O)OH) has no side chain.

"Peptide nucleic acid" refers to a nucleic acid analog or DNA or RNA mimetic in which the sugar phosphodiester backbone of the DNA or RNA is replaced with N-(2-aminoethyl)glycine units. Gamma PNA (γPNA) is an oligomer or polymer of γ-modified N-(2-aminoethyl)glycine monomers having the structure:

Figure BDA0002641565620000152

wherein R1 or R2, attached to the gamma carbon, is not hydrogen, such that the gamma carbon is a chiral center. When R1 and R2 are both hydrogen (an N-(2-aminoethyl)glycine backbone) or are the same, there is no such chirality about the gamma carbon. An incorporated PNA or γPNA monomer is referred to herein as a PNA or γPNA "residue," in reference to the structure remaining after incorporation into an oligomer, wherein each residue has the same or a different R group as its base (nucleobase), e.g., adenine, guanine, cytosine, thymine and uracil bases, or other bases, e.g., the monovalent and divalent bases described herein, such that the order of nucleobases on a PNA is its "sequence," as with DNA or RNA. Nucleobase sequences in nucleic acid or nucleic acid analog oligomers or polymers (e.g., PNA or γPNA oligomers or polymers) bind to complementary sequences of adenine, guanine, cytosine, thymine and/or uracil residues in nucleic acids or nucleic acid analogs by nucleobase pairing in a Watson-Crick or Watson-Crick-like manner, essentially as in double-stranded DNA or RNA.

A "guanidine" or "guanidinium" group may be added to the recognition reagent to increase solubility and/or bioavailability. Since PNAs are produced in a manner similar to synthetic peptides, a simple method of adding a guanidino group is to add one or more terminal arginine (Arg) residues at the N-terminus and/or C-terminus of the PNA, e.g., γPNA, recognition reagent. Likewise, a pendant arginine group, or a guanidino-containing moiety, for example, wherein n is, for example, but not limited to, in the range of 1 to 5, or a salt of the foregoing, can be attached to the recognition reagent backbone described herein. A guanidino-containing group is a group that comprises a guanidine moiety and can have less than 100 atoms, less than 50 atoms, or, for example, less than 30 atoms. In one aspect, the guanidino-containing group has the structure:

Figure BDA0002641565620000163

wherein L is a linker according to any aspect described herein, e.g., a non-reactive aliphatic hydrocarbon-based linker, e.g., a methylene, ethylene, trimethylene, tetramethylene or pentamethylene linker. In some aspects, the guanidino-containing group has the structure: wherein n is 1-5; for example, the guanidino group may be that of arginine.

"Nucleobase" includes the primary nucleobases adenine, guanine, thymine, cytosine and uracil, as well as modified purine and pyrimidine bases such as, but not limited to, hypoxanthine, xanthine, 7-methylguanine, 5,6-dihydrouracil, 5-methylcytosine and 5-hydroxymethylcytosine. FIGS. 3 and 4A-4C also depict non-limiting examples of nucleobases, including monovalent nucleobases (e.g., adenine, cytosine, guanine, thymine, or uracil, which bind to one strand of a nucleic acid or nucleic acid analog), divalent nucleobases (e.g., JB1-JB16 described herein), which simultaneously bind to complementary nucleobases on two DNA strands, and "clamp" nucleobases, e.g., a "G-clamp," which bind to complementary nucleobases with increased strength. Additional purine, purine-like, pyrimidine, and pyrimidine-like nucleobases are known in the art, for example, as disclosed in U.S. Patent Nos. 8,053,212, 8,389,703, and 8,653,254. For the bivalent nucleobases JB1-JB16 shown in FIG. 4A, Table A shows the specificity of the different nucleobases. Notably, the JB1-JB4 series bind complementary bases (C-G, G-C, A-T and T-A), while JB5-JB16 bind mismatches and can therefore be used to bind two strands of matched and/or mismatched bases. Bivalent nucleobases are described in more detail in U.S. Patent Application Publication No. 2016/0083434 A1 and International Patent Publication No. WO 2018/058091, both of which are incorporated herein by reference.

Table A: Bivalent nucleobases

Figure BDA0002641565620000171

Diaminopurine, an adenine analog.

Exemplary γPNA structures, which are not end-modified with aryl groups in the manner described herein but which may be so modified as described herein, are disclosed in International Patent Publication No. WO 2012/138955, which is incorporated herein by reference.

Complementary refers to the ability of polynucleotides (nucleic acids) to hybridize to each other to form interchain base pairs. Base pairs are formed by hydrogen bonding between nucleotide units in antiparallel polynucleotide strands. Complementary polynucleotide strands may base pair (hybridize) in a Watson-Crick fashion (e.g., A to T, A to U, C to G) or in any other fashion that allows duplex formation. When RNA is used instead of DNA, uracil rather than thymine is the nucleobase complementary to adenosine. Two sequences comprising complementary sequences can hybridize if they form a duplex under specified conditions, for example in water, saline (e.g., normal saline or 0.9% w/v saline) or phosphate buffer, or under other stringency conditions, for example, but not limited to, 0.1X SSC (saline sodium citrate) to 10X SSC, where 1X SSC is 0.15 M NaCl and 0.015 M sodium citrate in water. Hybridization of complementary sequences is determined by, for example, salt concentration and temperature, with the melting temperature (Tm) decreasing with increased mismatches and increased stringency. Perfectly matched sequences are considered "fully complementary," although one sequence (e.g., a target sequence in an mRNA) may be longer than the other, as is the case with the small recognition reagents described herein and the longer target sequences to which they bind, e.g., an mRNA containing a repeat expansion.
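The antiparallel Watson-Crick pairing rule described above can be illustrated with a short sketch. This is a minimal illustration only, not part of the disclosed methods: the function names are hypothetical, only the canonical A-U and C-G pairs for RNA are modeled, and the probe is assumed to be written in the orientation analogous to 5' to 3' so that it pairs antiparallel to the target site. Hybridization conditions (salt concentration, temperature) are not modeled.

```python
# Canonical Watson-Crick partners for RNA (assumption: use "T" in place of "U" for DNA).
RNA_PAIRS = {"A": "U", "U": "A", "C": "G", "G": "C"}


def reverse_complement(seq, pairs=RNA_PAIRS):
    """Return the antiparallel complement of seq, written 5'->3'."""
    return "".join(pairs[base] for base in reversed(seq.upper()))


def count_mismatches(probe, target_site, pairs=RNA_PAIRS):
    """Count positions where probe fails to pair with target_site."""
    if len(probe) != len(target_site):
        raise ValueError("probe and target site must have equal length")
    expected = reverse_complement(target_site, pairs)
    return sum(p != e for p, e in zip(probe.upper(), expected))


def is_fully_complementary(probe, target_site, pairs=RNA_PAIRS):
    """True when every probe base pairs with the aligned target base."""
    return count_mismatches(probe, target_site, pairs) == 0


# Hypothetical example: a six-base site made of two CUG triplets.
site = "CUGCUG"
probe = reverse_complement(site)
print(probe, is_fully_complementary(probe, site))
```

Under these assumptions, a site of two CUG triplets pairs with a probe reading CAGCAG, and a single base change in the probe is reported as one mismatch.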

Toxic RNAs containing expanded trinucleotide repeats are responsible for many neuromuscular diseases, one of which is myotonic dystrophy type 1 (DM1). DM1 is triggered by a CTG-repeat expansion in the 3'-UTR of the DMPK gene, resulting in a toxic gain of RNA function through sequestration of MBNL1 and other proteins. Described herein are short probes that are capable of binding nucleic acid sequences, e.g., repetitive nucleic acid sequences, in a sequence-specific and selective manner. For example, as described in the examples, short PNA probes comprising terminal pyrene moieties and two triplet repeats in length are shown to bind rCUG repeats in a sequence-specific and selective manner. The probes can distinguish the pathogenic rCUGexp from the wild-type transcript and can disrupt the rCUGexp-MBNL1 complex. Thus, in various aspects, short nucleic acid probes, referred to as gene recognition agents, are described herein for targeting RNA repeat expansions associated with DM1 and other related neuromuscular diseases.

The methods and compositions described herein overcome three major obstacles currently faced by conventional antisense and antigene approaches. The first obstacle concerns the scale and cost of oligonucleotide synthesis. Because oligonucleotides are traditionally synthesized stepwise on a solid support, production is difficult to scale up. This translates into high cost and an unmet need for oligonucleotide therapeutics. The methods and compositions described herein overcome this challenge because the recognition agents are relatively small, 3 to 8 nucleotides in length, which places them at the molecular weight boundary between small molecules and biologics. The compounds described herein can be produced on a large scale using convergent solution-phase synthesis, which translates into lower production costs and greater accessibility of these materials for therapy.

The second obstacle involves cellular delivery, in particular how to get these nucleic acid probes across the lipid bilayer of the cell membrane and into the cytoplasm and nucleus of the target cell. Most oligonucleotides cannot cross the cell membrane because of their relatively large molecular weight; their delivery into cells requires the aid of transfection reagents or mechanical or electrical transduction. Although these methods have been used successfully to transport oligonucleotides and other macromolecules into cells, they are limited to small-scale in vitro (tissue culture) experimental settings. In vivo systemic delivery, a requirement for treating genetic diseases and most infectious diseases, remains a problem, particularly for diseases of the central nervous system. The present invention overcomes this limitation through the reduced size of the recognition reagents and their flexibility for chemical modification. Their relatively small size makes them more readily taken up by cells and more permeable to the nuclear membrane. Furthermore, for PNAs such as γPNAs, any chemical group can be incorporated into the backbone, so these recognition reagents can readily be modified with specific chemical functional groups to facilitate cellular uptake and systemic delivery.

The third obstacle involves nonspecific binding and cytotoxic effects. When introduced into a cell, a synthetic or otherwise naked oligonucleotide 10-30 nt in length will bind not only to its intended target, but also to other DNA or RNA regions having related sequences. This non-specific binding traps the probe and prevents it from diffusing freely to seek out and bind its target. The resulting decrease in the effective concentration of the probe reduces efficacy. In addition, such non-specific binding may lead to cytotoxic effects due to misregulation of gene expression and/or dysfunction of other key proteins. In fact, non-specific binding has been considered the main cause of side effects of oligonucleotide therapeutics (as well as small-molecule drugs), and no solution is currently available. The present invention overcomes this limitation by using weak interactions between short recognition agents (typically 3 to 8 nt in length) and the target. This weak "kissing" interaction allows the modules to diffuse freely within the intracellular environment to find their target. Here, the designated target differs from a random, single binding site in that it comprises repeated sequence elements, which allow the modules to assemble adjacent to one another in a cooperative manner through adjacent base stacking and to begin a "native chemical ligation" reaction, forming a series of extended oligomers of different lengths.
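The contrast drawn above between a single binding site and a repeat-containing target can be made concrete with a small sketch that counts how many copies of a short probe can sit end to end on an RNA. This is an illustrative assumption-laden model, not a method from this disclosure: the function names are hypothetical, only perfect Watson-Crick matches are counted, and binding energetics are ignored.

```python
# Illustrative sketch only: names and the perfect-match model are assumptions.
_RNA_COMPLEMENT = {"A": "U", "T": "A", "U": "A", "G": "C", "C": "G"}


def rna_target_site(probe: str) -> str:
    """Return the RNA sequence (5'->3') that a 5'->3' probe pairs with."""
    return "".join(_RNA_COMPLEMENT[base] for base in reversed(probe.upper()))


def longest_tandem_tiling(target_rna: str, probe: str) -> int:
    """Largest number of probe copies that can bind directly adjacent to
    one another (no gaps, no overlaps) on the target RNA."""
    site = rna_target_site(probe)
    k = len(site)
    target = target_rna.upper()
    runs = {}  # start index of a site -> length of the adjacent run ending there
    best = 0
    for i in range(len(target) - k + 1):
        if target[i:i + k] == site:
            runs[i] = runs.get(i - k, 0) + 1
            best = max(best, runs[i])
    return best


if __name__ == "__main__":
    probe = "CAGCAG"  # two triplet repeats in length
    print(longest_tandem_tiling("CUG" * 10, probe))   # ten CUG repeats
    print(longest_tandem_tiling("AUGGCCAAGCUGCUGAA", probe))  # one isolated site
```

In this toy model a ten-repeat rCUG stretch accommodates five adjacent copies of the six-base probe, while a transcript with a single isolated site accommodates only one, which is the situation in which adjacent stacking cannot occur.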

As shown in Table B, several genetic diseases are currently known to be associated with unstable repeat expansions of nucleotide sequences. The challenge is that the three-dimensional structure of the target (in this case DNA or RNA) is monotonous compared to that of proteins. This makes it difficult for small molecules to distinguish a particular site from other DNA or RNA sequences.

The present invention has advantages over the "small-molecule drug" approach in the treatment of cancer and in combating bacterial, viral and parasitic infections, where the target evolves rapidly owing to a high mutation rate. Oncogenic clones, as well as bacterial, viral and parasitic pathogens, have many conserved and repetitive elements in their genomes and transcriptomes that may be targeted in this way. In contrast to the "small molecule-protein recognition" approach, these tumor cells or pathogens are unlikely to evade the recognition agents described herein and become drug resistant, because a mutation would have to occur in every repetitive element of the DNA/RNA template.

The recognition agents described herein are designed to be chemically inert until they enter the cytoplasm and/or nucleus of a cell. There, a recognition agent hybridizes to a complementary sequence in a nucleic acid and, if another recognition agent is hybridized adjacent to it, the adjacent aromatic groups of the two stack, thereby linking the recognition agents. The recognition agents recognize and bind their DNA or RNA target through cooperative Watson-Crick (or Watson-Crick-like hydrogen bonding) base-pairing interactions, whereby adjacent modules form extended, concatenated oligomers in a head-to-tail fashion through non-covalent pi-pi stacking ("pi stacking") of the terminal aryl groups, as shown, for example, in FIG. 1. The rates of intramolecular and intermolecular pi stacking of the recognition reagents can be controlled by adjusting the rigidity of the recognition reagent backbone: a rigid backbone, such as a conformationally pre-organized backbone, e.g., a γPNA or LNA backbone, limits intramolecular pi stacking of the terminal aromatic groups, which could otherwise inactivate the recognition reagent. That said, even if intramolecular pi stacking occurs, the recognition reagent will "open" upon hybridization to the target sequence. In the presence of the target sequence, hybridization to and linkage on the target sequence predominate.

The recognition reagents described herein combine the features of small molecules, e.g., low molecular weight, ease of large-scale production, low production cost, cell permeability and desirable pharmacokinetics, with the sequence-specific recognition of oligonucleotides through Watson-Crick base pairing. The linkage of oligomeric recognition reagents has been illustrated and described with reference to several examples, which are intended to be illustrative in all respects and not limiting. Thus, many variations of the detailed embodiments of the invention are possible.

Examples of uses of the recognition agents described herein are in the treatment of genetic diseases with repeated expansions of small sequences, such as those listed in table B.

TABLE B genetic diseases associated with unstable repeats

Figure BDA0002641565620000201

Based on table B, the recognition agents targeting the gene products described above include the following sequences in the 5 'to 3' direction: GAA, CGG, CCG, CAG, CTG, CCTG, ATTCT, and GGGGCC or a sequence complementary thereto (e.g., TTC, CCG, CGG, CTG, CAG, CAGG, AGAAT, or GGCCCC), for targeting RNA, or to hybridize to a target sequence comprising the following repeats: GAA, CGG, CCG, CAG, CTG, CCTG, ATTCT and GGGGCC, or a sequence complementary thereto.
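The relationship between each repeat and the complementary sequence given in parentheses above can be checked with a short reverse-complement sketch. The helper name and the dictionary layout are illustrative assumptions; the sequences themselves are those listed in the preceding paragraph.

```python
# Illustrative sketch only: helper name and data layout are assumptions.
_DNA_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return "".join(_DNA_COMPLEMENT[base] for base in reversed(seq.upper()))


# Repeat units from the paragraph above, paired with the complementary
# sequences it lists in parentheses.
REPEATS_AND_COMPLEMENTS = {
    "GAA": "TTC",
    "CGG": "CCG",
    "CCG": "CGG",
    "CAG": "CTG",
    "CTG": "CAG",
    "CCTG": "CAGG",
    "ATTCT": "AGAAT",
    "GGGGCC": "GGCCCC",
}

if __name__ == "__main__":
    for repeat, listed in REPEATS_AND_COMPLEMENTS.items():
        computed = reverse_complement(repeat)
        print(f"{repeat:>7} -> {computed:<7} matches listed: {computed == listed}")
```

Each listed complement is the reverse complement of the corresponding repeat unit, read 5' to 3'.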

Other potential applications of the invention are in the treatment of cancer (telomeres), bacterial infections (resistant strains, by targeting unique repetitive and conserved elements of pathogenic strains), hepatitis C (which has no effective treatment and affects 3% of the world population, by targeting repetitive elements in the viral RNA genome), malaria (by targeting microsatellites, which have been shown to be essential in ciliates' replication and life cycle) and AIDS (a rapidly evolving target, for which newly mutated sequences can be tracked by dialing the corresponding nucleobase sequences into the recognition reagent).

Thus, provided herein are recognition agents, i.e., modified nucleic acids, that assemble on a nucleic acid template, bind to one another, and are thereby linked together on the template. The recognition agents are oligomers of nucleic acids or nucleic acid analogs, e.g., 3 to 10 bases in length, such as 3, 4, 5, 6, 7, 8, 9, or 10 bases, or 3 to 8 bases in length, and include terminal aromatic (aryl) groups, e.g., fused-ring polycyclic aromatic groups of 2-5 rings, or containing 8-20 carbon or ring atoms, at their ends (e.g., the 5' and 3' ends, with respect to a nucleic acid or nucleic acid analog). Non-limiting examples of fused-ring polycyclic aromatic groups include: pentalene, indene, naphthalene, azulene, heptalene, biphenylene, as-indacene, s-indacene, acenaphthylene, fluorene, phenalene, phenanthrene, anthracene, fluoranthene, acephenanthrylene, aceanthrylene, triphenylene, pyrene,

Figure BDA0002641565620000221

(chrysene), naphthacene/tetracene, pleiadene, picene or perylene, all of which are well known in the chemical arts. The aromatic group is attached to the backbone of the recognition reagent at any suitable point of attachment on its structure and in any suitable manner, for example via a linker, using any suitable linking chemistry. For example, as shown in the examples below, an amino acid, or an amino acid comprising two amine groups, such as diaminobutyric acid (Dab), ornithine (Orn) or lysine (Lys), and a carboxy-functionalized aromatic compound, such as pyrene-1-carboxylic acid, pyrene-2-carboxylic acid, pyrene-4-carboxylic acid, pyrene-1-acetic acid, pyrene-2-acetic acid or pyrene-4-acetic acid, may be coupled to form an amide bond in the linker.

The aromatic group may be heterocyclic or non-heterocyclic in that the aromatic group does not contain a heteroatom, or contains one or more heteroatoms, e.g., S, O or N, in any of its rings. Furthermore, the aromatic group may be substituted with one or more non-reactive groups that do not substantially interfere with the function of the recognition reagent in hybridization and attachment to the target sequence.

In addition, the aromatic group may have a function beyond its use in linking recognition reagents. In some aspects, the aromatic group is a vitamin or antioxidant, such as riboflavin (vitamin B2), mangostin or mangiferin, or another natural aromatic antioxidant found in fruits and vegetables, such as those commonly used as dietary supplements to combat oxidative stress, inflammation, cancer, aging, or other conditions.

In certain aspects, the backbone of the gene recognition agent is rigid or conformationally pre-organized, as with γPNA and locked nucleic acid backbones. Unless otherwise indicated, all nucleotide sequences are provided in the 5' to 3' direction (left to right). In the context of PNA oligomers, which can hybridize in a parallel or antiparallel orientation, unless otherwise indicated, sequences are depicted in the 5' to 3' orientation with respect to their nucleobase sequence, that is, with respect to their Watson-Crick or Watson-Crick-like binding to a complementary nucleic acid strand.

Moieties in the compounds, such as aryl moieties or nucleobases, are covalently attached to the recognition reagent backbone, and are thus said to be "attached" to the backbone. Depending on the chemistry used to prepare the compound, the attachment may be direct, or through a "linker," which is a moiety that covalently links two other moieties or groups. In one aspect, the terminal aromatic (aryl) group is attached to the recognition reagent via a linker. The linker is a non-reactive moiety that links the aromatic group to the backbone of the recognition reagent, and in some aspects comprises 1-10 carbon atoms (C1-C10), optionally substituted with heteroatoms such as N, S or O, or a non-reactive bond, for example an amide bond (peptide bond) formed by reacting an amine with a carboxyl group. Examples of C1-C10 alkylene groups are straight or branched chain alkylene (divalent) moieties, optionally comprising cyclic moieties, such as methylene, ethylene, trimethylene, tetramethylene, pentamethylene, hexamethylene, heptamethylene, octamethylene, nonamethylene or decamethylene moieties (i.e., -CH2-[CH2]n-, wherein n = 1 to 9), optionally comprising an amide bond. Linkers are non-bulky in that they do not sterically hinder or otherwise substantially interfere with binding of the recognition reagent to the target template, and do not interfere with linkage of the recognition reagents on the target template. The linker is the remainder resulting from linkage of the aromatic group to the backbone of the recognition reagent; in one non-limiting example, the linker

Figure BDA0002641565620000232

is generated by the attachment of an acetic acid-substituted aryl compound (a), such as pyrene-1-acetic acid, to a Dab (n = 1), Orn (n = 2) or Lys (n = 3) residue.

In other aspects, a linker or linking group is an organic moiety that connects two parts of a compound, e.g., covalently connects two parts of a compound, such as, without limitation in the context of the present invention, the attachment of an aromatic group to the backbone of a recognition reagent, the attachment of a nucleobase to the backbone of a nucleic acid or nucleic acid analog, and/or the attachment of a guanidine group to a recognition reagent. The linker typically comprises a direct bond or an atom such as oxygen or sulfur, a unit such as C(O), C(O)NH, SO2 or SO2NH, or a chain of atoms, such as, but not limited to, substituted or unsubstituted alkyl, substituted or unsubstituted alkenyl, substituted or unsubstituted alkynyl, arylalkyl, arylalkenyl, arylalkynyl, heteroarylalkyl, heteroarylalkenyl, heteroarylalkynyl, heterocyclylalkyl, heterocyclylalkenyl, heterocyclylalkynyl, aryl, heteroaryl, heterocyclyl, cycloalkyl, cycloalkenyl, alkylarylalkyl, alkylarylalkenyl, alkylarylalkynyl, alkenylarylalkyl, alkenylarylalkenyl, alkenylarylalkynyl, alkynylarylalkyl, alkynylarylalkenyl, alkynylarylalkynyl, alkylheteroarylalkyl, alkylheteroarylalkenyl, alkylheteroarylalkynyl, alkenylheteroarylalkyl, alkenylheteroarylalkenyl, alkenylheteroarylalkynyl, alkynylheteroarylalkyl, alkynylheteroarylalkenyl, alkynylheteroarylalkynyl, alkylheterocyclylalkyl, alkylheterocyclylalkenyl, alkylheterocyclylalkynyl, alkenylheterocyclylalkyl, alkenylheterocyclylalkenyl, alkenylheterocyclylalkynyl, alkynylheterocyclylalkyl, alkynylheterocyclylalkenyl, alkynylheterocyclylalkynyl, alkylaryl, alkenylaryl, alkynylaryl, alkylheteroaryl, alkenylheteroaryl, or alkynylheteroaryl, wherein one or more carbon atoms, e.g., methylene or methine (-CH=), are optionally interrupted or terminated by a heteroatom such as O, S or N, substituted or unsubstituted aryl, substituted or unsubstituted heteroaryl, or substituted or unsubstituted heterocycle. In one aspect, the linker comprises about 5 to 25 atoms, such as 5-20 or 5-10, e.g., 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19 or 20 atoms, or a total of 1 to 10, for example 1, 2, 3, 4, 5, 6, 7, 8, 9 or 10, C atoms and heteroatoms, for example O, P, N or S atoms.

For attachment to a PNA (e.g., γPNA), a convenient and useful linkage is formed by reacting an amine with a carboxyl group to form an amide bond, e.g., by adding an amino acid to the recognition reagent using well-known peptide synthesis chemistry, where the amino acid may be pre-modified with a chemical moiety (e.g., an aryl moiety or a guanidine group). As shown in the examples below, aryl-modified amino acids are added to attach the pyrene aryl moiety to the recognition reagent, and arginine is added to provide the guanidine group. Attachment to a non-peptide nucleic acid analog may be achieved using any suitable, well-known linking chemistry, for example by using carbodiimide chemistry, such as EDC (EDAC, 1-ethyl-3-[3-dimethylaminopropyl]carbodiimide hydrochloride), to attach an amine-modified Ar group to the recognition reagent, for example via a terminal phosphate.

The linker is of a suitable size or length for attaching the aromatic group to the backbone of the recognition reagent such that the aromatic groups are positioned for pi-pi stacking when the recognition reagents are linked on the target nucleic acid sequence, as described herein.

According to an aspect of the present invention, there is provided a recognition reagent comprising: a nucleic acid or nucleic acid analog backbone having a first end and a second end, prepared from 3 or more, such as 3 to 10, or 3, 4, 5, 6, 7, 8, 9 or 10, or 3 to 8, nucleic acid or nucleic acid analog backbone residues, which is optionally conformationally pre-organized; a sequence of nucleobases, which may be the same or different, attached in sequence to a plurality of the nucleic acid or nucleic acid analog backbone residues; a first aryl moiety attached to the first end of the nucleic acid or nucleic acid analog backbone; and a second aryl moiety, optionally identical to the first aryl moiety, attached to the second end of the nucleic acid or nucleic acid analog backbone. The aryl moieties are independently a 2- to 5-ring fused polycyclic aromatic moiety, such as a substituted or unsubstituted aryl or heteroaryl moiety having 2 to 5 fused rings, such as, but not limited to, unsubstituted or substituted pentalene, indene, naphthalene, azulene, heptalene, biphenylene, as-indacene, s-indacene, acenaphthylene, fluorene, phenalene, phenanthrene, anthracene, fluoranthene, acephenanthrylene, aceanthrylene, triphenylene, pyrene,

Figure BDA0002641565620000253

naphthacene/tetracene, pleiadene, picene or perylene, optionally substituted with one or more heteroatoms such as O, N and/or S, for example xanthene, riboflavin (vitamin B2), mangostin or mangiferin, and may be the same or different and, in some aspects, are the same; each stacks with an Ar group of an adjacent recognition reagent when the recognition reagents hybridize to adjacent sequences of the target nucleic acid.

According to an aspect of the present invention, there is provided a recognition reagent (recognition module) having the structure:

Figure BDA0002641565620000251

wherein R is independently a nucleobase, and each instance of R can be the same or different.

n is an integer from 1 to 6, such as 1,2,3,4,5 or 6;

each B is independently a ribose-5-phosphate residue, a deoxyribose-5-phosphate residue, or a nucleic acid analog backbone residue, and in one aspect is a backbone residue of a conformationally pre-organized nucleic acid analog, such as γPNA or LNA;

each L is independently a linker, e.g., a non-reactive linker or a non-reactive, non-bulky linker, and each instance of L can be the same or different; and

each instance of Ar is independently a 2- to 5-ring fused polycyclic aromatic moiety, such as a substituted or unsubstituted aryl or heteroaryl moiety having 2 to 5 fused rings, such as, but not limited to, unsubstituted or substituted pentalene, indene, naphthalene, azulene, heptalene, biphenylene, as-indacene, s-indacene, acenaphthylene, fluorene, phenalene, phenanthrene, anthracene, fluoranthene, acephenanthrylene, aceanthrylene, triphenylene, pyrene,

Figure BDA0002641565620000252

naphthacene/tetracene, pleiadene, picene or perylene, optionally substituted with one or more heteroatoms such as O, N and/or S, for example xanthene, riboflavin (vitamin B2), mangostin or mangiferin, and may be the same or different and, in some aspects, are the same. In some aspects, each Ar stacks with an Ar group of an adjacent recognition reagent when the recognition reagents hybridize to adjacent sequences of the target nucleic acid.

In one aspect, the recognition reagent comprises a PNA backbone, and thus has the following structure:

wherein R is independently a nucleobase, and each instance of R can be the same or different.

n is an integer from 1 to 6, such as 1,2,3,4,5 or 6;

each instance of L is independently a linker, and may comprise one or more amino acid residues, or substituted or unsubstituted alkyl, substituted or unsubstituted alkenyl, substituted or unsubstituted alkynyl, arylalkyl, arylalkenyl, arylalkynyl, heteroarylalkyl, heteroarylalkenyl, heteroarylalkynyl, heterocyclylalkyl, heterocyclylalkenyl, heterocyclylalkynyl, aryl, heteroaryl, heterocyclyl, cycloalkyl, cycloalkenyl, alkylarylalkyl, alkylarylalkenyl, alkylarylalkynyl, alkenylarylalkyl, alkenylarylalkenyl, alkenylarylalkynyl, alkynylarylalkyl, alkynylarylalkenyl, alkynylarylalkynyl, alkylheteroarylalkyl, alkylheteroarylalkenyl, alkylheteroarylalkynyl, alkenylheteroarylalkyl, alkenylheteroarylalkenyl, alkenylheteroarylalkynyl, alkynylheteroarylalkyl, alkynylheteroarylalkenyl, alkynylheteroarylalkynyl, alkylheterocyclylalkyl, alkylheterocyclylalkenyl, alkylheterocyclylalkynyl, alkenylheterocyclylalkyl, alkenylheterocyclylalkenyl, alkenylheterocyclylalkynyl, alkynylheterocyclylalkyl, alkynylheterocyclylalkenyl, alkynylheterocyclylalkynyl, alkylaryl, alkenylaryl, alkynylaryl, alkylheteroaryl, alkenylheteroaryl, or alkynylheteroaryl, wherein one or more carbon atoms, e.g., methylene or methine (-CH=), are optionally interrupted or terminated by a heteroatom such as O, S or N, substituted or unsubstituted aryl, substituted or unsubstituted heteroaryl, or substituted or unsubstituted heterocycle, and optionally comprises a guanidine-containing group, such as

wherein n is 1, 2, 3, 4 or 5, and/or an amino acid side chain;

each R3 is independently a 2- to 5-ring fused polycyclic aromatic moiety, such as a substituted or unsubstituted aryl or heteroaryl moiety having 2 to 5 fused rings, such as, but not limited to, unsubstituted or substituted pentalene, indene, naphthalene, azulene, heptalene, biphenylene, as-indacene, s-indacene, acenaphthylene, fluorene, phenalene, phenanthrene, anthracene, fluoranthene, acephenanthrylene, aceanthrylene, triphenylene, pyrene, chrysene, naphthacene/tetracene, pleiadene, picene or perylene, optionally substituted with one or more heteroatoms such as O, N and/or S, for example xanthene, riboflavin (vitamin B2), mangostin or mangiferin, and may be the same or different and, in some aspects, are the same. In some aspects, each stacks with an aromatic moiety of an adjacent recognition reagent when the recognition reagents hybridize to adjacent sequences of the target nucleic acid;

R1 and R2 are each independently: H; a guanidine-containing group, such as

Figure BDA0002641565620000271

wherein n is 1, 2, 3, 4, or 5; an amino acid side chain, such as:

straight or branched chain (C1-C8)alkyl, (C2-C8)alkenyl, (C2-C8)alkynyl, (C1-C8)hydroxyalkyl, (C3-C8)aryl, (C3-C8)cycloalkyl, (C3-C8)aryl(C1-C6)alkylene, (C3-C8)cycloalkyl(C1-C6)alkylene, optionally substituted with an ethylene glycol unit comprising 1 to 50 ethylene glycol moieties; -CH2-(OCH2-CH2)qOP1; -CH2-(OCH2-CH2)q-NHP1; -CH2-(SCH2-CH2)q-SP1; -CH2-(OCH2-CH2)r-OH; -CH2-(OCH2-CH2)r-NH2; -CH2-(OCH2-CH2)r-NHC(NH)NH2; or -CH2-(OCH2-CH2)r-S-S[CH2CH2]sNHC(NH)NH2, wherein P1 is H, (C1-C8)alkyl, (C2-C8)alkenyl, (C2-C8)alkynyl, (C3-C8)aryl, (C3-C8)cycloalkyl, (C3-C8)aryl(C1-C6)alkylene or (C3-C8)cycloalkyl(C1-C6)alkylene; q is an integer from 0 to 50; r is an integer from 1 to 50; and s is an integer from 1 to 50. In one aspect, R1 and R2 are different: R1 is H and R2 is not H, or R2 is H and R1 is not H. For binding to a natural nucleic acid (e.g., RNA or DNA), R1 is H and R2 is not H, thereby forming a "right-handed" L-γPNA. A "left-handed" D-γPNA, in which R2 is H and R1 is not H, does not bind natural nucleic acids. In one aspect, the linker comprises about 5 to 25 atoms, such as 5-20 or 5-10, e.g., 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19 or 20 atoms, or a total of 1 to 10, for example 1, 2, 3, 4, 5, 6, 7, 8, 9 or 10, C atoms and heteroatoms, for example O, P, N or S atoms. In one aspect, R1 or R2 is (C1-C6)alkyl substituted with: -(OCH2-CH2)qOP1; -(OCH2-CH2)q-NHP1; -(SCH2-CH2)q-SP1; -(OCH2-CH2)r-OH; -(OCH2-CH2)r-NH2; -(OCH2-CH2)r-NHC(NH)NH2; or -(OCH2-CH2)r-S-S[CH2CH2]sNHC(NH)NH2, wherein P1 is H, (C1-C8)alkyl, (C2-C8)alkenyl, (C2-C8)alkynyl, (C3-C8)aryl, (C3-C8)cycloalkyl, (C3-C8)aryl(C1-C6)alkylene or (C3-C8)cycloalkyl(C1-C6)alkylene; q is an integer from 0 to 50; r is an integer from 1 to 50; and s is an integer from 1 to 50.

In another aspect, the recognition reagent has the following structure:

Figure BDA0002641565620000281

wherein:

each instance of R is independently a nucleobase, and each instance of R may be the same or different.

Figure BDA0002641565620000282

wherein n is 1, 2, 3, 4 or 5; an amino acid side chain, for example:

Figure BDA0002641565620000283

straight or branched chain (C1-C8)alkyl, (C2-C8)alkenyl, (C2-C8)alkynyl, (C1-C8)hydroxyalkyl, (C3-C8)aryl, (C3-C8)cycloalkyl, (C3-C8)aryl(C1-C6)alkylene, (C3-C8)cycloalkyl(C1-C6)alkylene, optionally substituted with an ethylene glycol unit comprising 1 to 50 ethylene glycol moieties; -CH2-(OCH2-CH2)qOP1; -CH2-(OCH2-CH2)q-NHP1; -CH2-(SCH2-CH2)q-SP1; -CH2-(OCH2-CH2)r-OH; -CH2-(OCH2-CH2)r-NH2; -CH2-(OCH2-CH2)r-NHC(NH)NH2; or -CH2-(OCH2-CH2)r-S-S[CH2CH2]sNHC(NH)NH2, wherein P1 is H, (C1-C8)alkyl, (C2-C8)alkenyl, (C2-C8)alkynyl, (C3-C8)aryl, (C3-C8)cycloalkyl, (C3-C8)aryl(C1-C6)alkylene or (C3-C8)cycloalkyl(C1-C6)alkylene; q is an integer from 0 to 50; r is an integer from 1 to 50; and s is an integer from 1 to 50. In one aspect, R1 and R2 are different: R1 is H and R2 is not H, or R2 is H and R1 is not H.

One of R4 and R5, and one of R6, R7 and R8, is -L-R3, wherein each instance of R3 is independently a 2- to 5-ring fused polycyclic aromatic moiety, such as a substituted or unsubstituted aryl or heteroaryl moiety having 2 to 5 fused rings, such as, but not limited to, unsubstituted or substituted pentalene, indene, naphthalene, azulene, heptalene, biphenylene, as-indacene, s-indacene, acenaphthylene, fluorene, phenalene, phenanthrene, anthracene, fluoranthene, acephenanthrylene, aceanthrylene, triphenylene, pyrene, chrysene, naphthacene/tetracene, pleiadene, picene or perylene, optionally substituted with one or more heteroatoms such as O, N and/or S, for example xanthene, riboflavin (vitamin B2), mangostin or mangiferin, and may be the same or different and, in some aspects, are the same, and in some examples each stacks with an R3 group of an adjacent recognition reagent when the recognition reagents hybridize to adjacent sequences of the target nucleic acid; and wherein L is a linker, e.g., a non-reactive linker or a non-reactive, non-bulky linker, and each instance of L can be the same or different, and can comprise an amino acid residue, or substituted or unsubstituted alkyl, substituted or unsubstituted alkenyl, substituted or unsubstituted alkynyl, arylalkyl, arylalkenyl, arylalkynyl, heteroarylalkyl, heteroarylalkenyl, heteroarylalkynyl, heterocyclylalkyl, heterocyclylalkenyl, heterocyclylalkynyl, aryl, heteroaryl, heterocyclyl, cycloalkyl, cycloalkenyl, alkylarylalkyl, alkylarylalkenyl, alkylarylalkynyl, alkenylarylalkyl, alkenylarylalkenyl, alkenylarylalkynyl, alkynylarylalkyl, alkynylarylalkenyl, alkynylarylalkynyl, alkylheteroarylalkyl, alkylheteroarylalkenyl, alkylheteroarylalkynyl, alkenylheteroarylalkyl, alkenylheteroarylalkenyl, alkenylheteroarylalkynyl, alkynylheteroarylalkyl, alkynylheteroarylalkenyl, alkynylheteroarylalkynyl, alkylheterocyclylalkyl, alkylheterocyclylalkenyl, alkylheterocyclylalkynyl, alkenylheterocyclylalkyl, alkenylheterocyclylalkenyl, alkenylheterocyclylalkynyl, alkynylheterocyclylalkyl, alkynylheterocyclylalkenyl, alkynylheterocyclylalkynyl, alkylaryl, alkenylaryl, alkynylaryl, alkylheteroaryl, alkenylheteroaryl, or alkynylheteroaryl, wherein one or more carbon atoms, e.g., methylene or methine (-CH=), are optionally interrupted or terminated by a heteroatom such as O, S or N, substituted or unsubstituted aryl, substituted or unsubstituted heteroaryl, or substituted or unsubstituted heterocycle; and R4, R5, R6, R7 and R8 are each independently: H; one or more contiguous amino acid residues; a guanidine-containing group, such as

wherein n is 1, 2, 3, 4 or 5; an amino acid side chain, for example:

Figure BDA0002641565620000302

straight or branched chain (C1-C8)alkyl, (C2-C8)alkenyl, (C2-C8)alkynyl, (C1-C8)hydroxyalkyl, (C3-C8)aryl, (C3-C8)cycloalkyl, (C3-C8)aryl(C1-C6)alkylene, (C3-C8)cycloalkyl(C1-C6)alkylene, optionally substituted with an ethylene glycol unit comprising 1 to 50 ethylene glycol moieties; -CH2-(OCH2-CH2)qOP1; -CH2-(OCH2-CH2)q-NHP1; -CH2-(SCH2-CH2)q-SP1; -CH2-(OCH2-CH2)r-OH; -CH2-(OCH2-CH2)r-NH2; -CH2-(OCH2-CH2)r-NHC(NH)NH2; or -CH2-(OCH2-CH2)r-S-S[CH2CH2]sNHC(NH)NH2, wherein P1 is H, (C1-C8)alkyl, (C2-C8)alkenyl, (C2-C8)alkynyl, (C3-C8)aryl, (C3-C8)cycloalkyl, (C3-C8)aryl(C1-C6)alkylene or (C3-C8)cycloalkyl(C1-C6)alkylene; q is an integer from 0 to 50; r is an integer from 1 to 50; and s is an integer from 1 to 50.

In one aspect, R4 and R7 are -L-Ar, and in another aspect, R4 and R7 are -L-Ar and R5 and R8 are Arg. In one aspect, the linker comprises about 5 to 25 atoms, such as 5-20 or 5-10, such as 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 atoms, or a total of 1 to 10, such as 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10, C atoms and heteroatoms, such as O, P, N, or S atoms. In another aspect, R1, R2, R4, R5, R6, R7 or R8 is (C1-C6)alkyl substituted with: -(OCH2-CH2)qOP1; -(OCH2-CH2)q-NHP1; -(SCH2-CH2)q-SP1; -(OCH2-CH2)r-OH; -(OCH2-CH2)r-NH2; -(OCH2-CH2)r-NHC(NH)NH2; or -(OCH2-CH2)r-S-S[CH2CH2]sNHC(NH)NH2, wherein P1 is H, (C1-C8)alkyl, (C2-C8)alkenyl, (C2-C8)alkynyl, (C3-C8)aryl, (C3-C8)cycloalkyl, (C3-C8)aryl(C1-C6)alkylene or (C3-C8)cycloalkyl(C1-C6)alkylene; q is an integer from 0 to 50; r is an integer from 1 to 50; and s is an integer from 1 to 50.

In another aspect, the recognition reagent comprises a PNA backbone, and thus has the following structure:

wherein:

n is an integer from 1 to 8, including 1,2,3,4,5, 6, 7 or 8;

m is an integer from 1 to 5, such as 1 to 3, including 1,2,3,4, or 5;

R2 is: a guanidine-containing group, such as

wherein n is 1, 2, 3, 4 or 5; an amino acid side chain, for example:

straight or branched chain (C1-C8)alkyl, (C2-C8)alkenyl, (C2-C8)alkynyl, (C1-C8)hydroxyalkyl, (C3-C8)aryl, (C3-C8)cycloalkyl, (C3-C8)aryl(C1-C6)alkylene, (C3-C8)cycloalkyl(C1-C6)alkylene, optionally substituted with an ethylene glycol unit comprising 1 to 50 ethylene glycol moieties; -CH2-(OCH2-CH2)qOP1; -CH2-(OCH2-CH2)q-NHP1; -CH2-(SCH2-CH2)q-SP1; -CH2-(OCH2-CH2)r-OH; -CH2-(OCH2-CH2)r-NH2; -CH2-(OCH2-CH2)r-NHC(NH)NH2; or -CH2-(OCH2-CH2)r-S-S[CH2CH2]sNHC(NH)NH2, wherein P1 is H, (C1-C8)alkyl, (C2-C8)alkenyl, (C2-C8)alkynyl, (C3-C8)aryl, (C3-C8)cycloalkyl, (C3-C8)aryl(C1-C6)alkylene or (C3-C8)cycloalkyl(C1-C6)alkylene; q is an integer from 0 to 50; r is an integer from 1 to 50; and s is an integer from 1 to 50;

R3 is an unsubstituted fused-ring polycyclic aromatic moiety, such as pentalene, indene, naphthalene, azulene, heptalene, biphenylene, as-indacene, s-indacene, acenaphthylene, fluorene, phenalene, phenanthrene, anthracene, fluoranthene, acephenanthrylene, aceanthrylene, triphenylene, pyrene,

Figure BDA0002641565620000321

naphthacene/tetracene, pleiadene

wherein n = 1, 2, 3, 4, or 5, an amino acid side chain, or one or more contiguous amino acid residues, for example one or more Arg residues. In one aspect, R3 is pyrene.

In another aspect, R2 is -CH2-(OCH2-CH2)r-OH, wherein r is an integer from 1-50, such as 1-10, e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10, and in one example is 2. In another aspect, R2, R5, R7, or R8 is (C1-C6)alkyl substituted with: -(OCH2-CH2)qOP1; -(OCH2-CH2)q-NHP1; -(SCH2-CH2)q-SP1; -(OCH2-CH2)r-OH; -(OCH2-CH2)r-NH2; -(OCH2-CH2)r-NHC(NH)NH2; or -(OCH2-CH2)r-S-S[CH2CH2]sNHC(NH)NH2, wherein P1 is H, (C1-C8)alkyl, (C2-C8)alkenyl, (C2-C8)alkynyl, (C3-C8)aryl, (C3-C8)cycloalkyl, (C3-C8)aryl(C1-C6)alkylene, or (C3-C8)cycloalkyl(C1-C6)alkylene; q is an integer from 0 to 50; r is an integer from 1 to 50; and s is an integer from 1 to 50.

Although the PNA-based recognition reagents above are depicted without stereochemistry, in one example the gamma carbon (to which R1 and R2 are attached) is in the (R) orientation, wherein R2 is H and R1 is not H, or the gamma carbon is in the (S) orientation, wherein R1 is H and R2 is not H. In one example, when present, one or more or all of the chiral amino acid residues at positions R3, R4, R5, R6, R7, and/or R8 may be L-amino acids. In another example, when present, one or more or all of the chiral amino acid residues at positions R3, R4, R5, R6, R7, and/or R8 may be D-amino acids.

In any of the above structures, the sequence of the recognition reagent can target an expanded repeat sequence; exemplary nucleobase sequences therefore include (e.g., in a single recognition reagent or when concatenated) the following: TTC, CCG, CGG, CTG, CAG, CAGG, AGAAT or GGCCCC for targeting mRNA (sense). These sequences are merely exemplary, and any register of a repeat unit that targets sequences in a repeating element, such as "…CAGCAGCAG…", may be included in a single recognition reagent. For example, the sequence CTG will target complementary CAG repeats, but so will TGC and GCT and their dimers, CTGCTG, TGCTGC and GCTGCT. Thus, the following sequences can be used to target expanded repeat sequences in the mRNA shown in Table B: TTC, TTCTTC, TCT, TCTTCT, CTT, CTTCTT, CCG, CCGCCG, CGC, CGCCGC, GCC, GCCGCC, CGG, CGGCGG, GCG, GCGGCG, GGC, GGCGGC, CTG, CTGCTG, TGC, TGCTGC, GCT, GCTGCT, CAG, CAGCAG, AGC, AGCAGC, GCA, GCAGCA, CAGG, CAGGCAGG, AGGC, AGGCAGGC, GGCA, GGCAGGCA, GCAG, GCAGGCAG, AGAAT, GAATA, AATA, ATAGA, TAA, GGCCCC, GCCCCG, CCCGG, CCCGGC, CCGGCC and CGGCCC. The foregoing sequences are listed 5' to 3', antiparallel to the sense mRNA strand, e.g., in the C-terminal to N-terminal direction in γPNA.
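The relationship between a repeat unit, its alternative registers, and their dimers can be sketched in a few lines of Python. This is only an illustration of the enumeration described above; the function names `rotations` and `registers_and_dimers` are hypothetical and do not come from the disclosure.

```python
def rotations(unit: str) -> list[str]:
    """Return every cyclic rotation (register) of a repeat unit."""
    return [unit[i:] + unit[:i] for i in range(len(unit))]


def registers_and_dimers(unit: str) -> list[str]:
    """List each rotation of `unit`, each followed by its tandem dimer."""
    out = []
    for rot in rotations(unit):
        out.extend([rot, rot * 2])
    return out


if __name__ == "__main__":
    # Registers of the CTG unit and their dimers, in the order used in the text.
    print(registers_and_dimers("CTG"))
    # Registers of a four-base unit.
    print(rotations("CAGG"))
```

For a unit of length k there are k registers, so a three-base repeat yields three monomer sequences and three dimers.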

The R groups of the recognition reagents described herein are arranged in a sequence that is complementary to a nucleobase sequence in a template nucleic acid such that the composition will bind to the nucleobase sequence in the template nucleic acid. "template nucleic acid" includes any nucleic acid or nucleic acid analog. When the template is within a cell, it may be a nucleic acid, e.g., DNA or RNA, e.g., a silenced mRNA. If the recognition reagent is assembled in vitro, the template can be a nucleic acid or any analog thereof that allows for specific hybridization to the recognition reagent described herein.

Unless otherwise indicated, the recognition reagents described herein are not limited to any particular nucleobase sequence. The present disclosure relates to methods and compositions for linking together the described recognition reagents, such as those based on γPNA backbones, and is independent of the identity and sequence of the bases attached thereto. It is contemplated that any nucleobase sequence attached to the γPNA oligomer backbone will hybridize to a complementary nucleobase sequence of a target nucleic acid or nucleic acid analog in the expected, specific manner by Watson-Crick or Watson-Crick-like hydrogen bonding. The compositions and methods described herein are sequence independent and describe novel, general methods and related compositions for the template-directed assembly of longer γPNA sequences from shorter γPNA (precursor) sequences.

The nucleobases of the recognition reagents described herein are arranged in a sequence that is complementary to a target sequence of a template nucleic acid, e.g., an mRNA containing an expanded repeat sequence, such that two or more recognition reagents as described herein bind adjacent sequences on the template nucleic acid by base pairing, e.g., by Watson-Crick or Watson-Crick-like base pairing, and link together. A non-limiting example of a combination of recognition reagents that can be assembled according to the methods described herein is two recognition reagents, wherein a first recognition reagent has a nucleobase sequence complementary to a first sequence of a template nucleic acid or nucleic acid analog, and a second recognition reagent has a nucleobase sequence complementary to a second sequence on the template that is adjacent to the first sequence on the template, such that the two precursors bind a contiguous series of bases on the template, e.g., as shown in FIG. 1. Two or more different recognition reagents can be assembled in this manner, each recognition reagent binding to an adjacent short sequence in a longer contiguous sequence of the template nucleic acid, with adjacent recognition reagents linking together. In one example, as shown in FIG. 1, the recognition reagent has a single nucleobase sequence complementary to a repeat sequence on the template such that two or more identical recognition reagents bind in tandem to consecutive repeat sequences on the template. As described above, recognition reagents that will target the above gene products include TTC, CCG, CGG, CTG, CAG, CAGG, AGAAT or GGCCCC, based on Table B. In another example, two or more different recognition reagents having different sequences can be selected to concatenate on unique, non-repetitive sequences. For example, two different hexamer recognition reagents can be generated to hybridize adjacent to each other over a unique 12-nucleotide sequence present in a target sequence (e.g., mRNA).
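As a rough sketch of the two-hexamer example, the snippet below splits a target written 5' to 3' into adjacent windows and returns the reverse complement of each window. The helper names, the use of T rather than U in the probe, and the choice to write each probe antiparallel to its target segment are illustrative assumptions, not details taken from the disclosure; the second target sequence is arbitrary.

```python
# Watson-Crick partners; RNA or DNA targets are accepted, probes are written with T.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "U": "A", "T": "A"}


def reverse_complement(target: str) -> str:
    """Return the antiparallel complement of a target segment."""
    return "".join(COMPLEMENT[base] for base in reversed(target.upper()))


def adjacent_probes(target: str, probe_len: int = 6) -> list[str]:
    """Split `target` into adjacent windows and design one probe per window."""
    if len(target) % probe_len != 0:
        raise ValueError("target length must be a multiple of probe_len")
    return [
        reverse_complement(target[i:i + probe_len])
        for i in range(0, len(target), probe_len)
    ]


if __name__ == "__main__":
    # A repeat target gives two identical hexamers that bind in tandem.
    print(adjacent_probes("CUGCUGCUGCUG"))
    # A unique 12-nucleotide target gives two different hexamers.
    print(adjacent_probes("AUGGCCAAGUUC"))
```

The probes are returned in the order of the target windows they cover, so neighbouring list entries correspond to reagents that would sit next to each other on the template.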

As described above, there is provided a method of producing a concatenated nucleic acid or nucleic acid analog, for example a conformationally pre-organized nucleic acid analog such as γPNA, comprising binding a plurality of recognition reagents according to any of the aspects above to a template nucleic acid or nucleic acid analog. When the terminal aryl groups are close to each other, the recognition reagents will concatenate. The aryl groups are considered to be close to each other when they are near enough to stack, for example as shown in FIG. 1.

In another method, a recognition reagent as described herein is introduced into a cell to obtain a therapeutic effect. Various transfection reagents for in vitro or in vivo use are suitable for delivering the compositions described herein to cells, e.g.

Figure BDA0002641565620000341

or liposome formulations (commercially available from various sources; see Immordino et al., "Stealth liposomes: review of the basic science, rationale, and clinical applications, existing and potential," (2006) Int'l J. Nanomedicine 1(3):297-316). Once introduced into the cell, the recognition reagent will bind to a suitable nucleic acid template, such as native mRNA. When more than one recognition reagent hybridizes to the same template nucleic acid, the terminal aryl groups are brought into proximity with each other, e.g., adjacent to each other in a contiguous sequence, and the recognition reagents will concatenate due to pi stacking of the proximal aryl groups. Recognition reagents bound to nucleic acids that lack repeats of the recognition reagent's target sequence, or adjacent sequences complementary to more than one delivered recognition reagent, will be released from those nucleic acids because, for example, the binding strength of a 3- to 8-mer is insufficient to retain the compound on the bound nucleic acid. When more than one recognition reagent binds to a target nucleic acid sequence, such as to adjacent repeat sequences, they will form a concatenated assembly of sufficient length to bind the nucleic acid with sufficient strength to remain hybridized to the target sequence and achieve a desired effect, such as gene silencing, where the composition has the sequence of an siRNA, miRNA, mirtron, or similar composition.

In some aspects, methods of treating a patient having a disease listed in Table B, such as myotonic dystrophy type 1 (DM1), myotonic dystrophy type 2 (DM2), or Huntington's disease, are provided. The method comprises administering to the patient an amount of a recognition reagent according to any aspect described herein that is complementary to the repeat expansion target sequence in the patient's mRNA, the amount being effective to knock down expression of mRNA comprising the repeat expansion target sequence in the patient. For DM1, the target sequence is (CTG)n, so the recognition reagent has the sequence: CTG, CTGCTG, TGC, TGCTGC, GCT or GCTGCT. For DM2, the target sequence is (CCTG)n, so the recognition reagent has the sequence: CAGG, CAGGCAGG, AGGC, AGGCAGGC, GGCA, GGCAGGCA, GCAG or GCAGGCAG. For Huntington's disease, the target sequence is (CAG)n, so the recognition reagent has the sequence: CTG, CTGCTG, TGC, TGCTGC, GCT or GCTGCT. Sequences complementary to the above sequences can be used to bind to the antisense strand of DNA containing the expanded repeat sequence.

For the treatment of a patient, the recognition reagent may be administered by any effective route of administration, such as, but not limited to: parenteral administration, e.g., by intravenous, intraperitoneal, or intraorgan administration, e.g., delivery to the liver, or by intramuscular injection; by inhalation, for example in a spray or aerosol metered-dose inhaler; topical, e.g., dermal, transdermal, otic or ophthalmic delivery; transmucosal, e.g., transvaginal or buccal; or oral administration. The compositions may be administered as single doses or in multiple doses over time to maintain reduced expression of the target RNA.

In some aspects, provided herein are pharmaceutical compositions and formulations comprising a recognition reagent described herein. In one aspect, provided herein is a pharmaceutical composition comprising a recognition reagent as described herein and a pharmaceutically acceptable carrier. Pharmaceutical compositions containing the recognition reagents can be used to treat diseases, such as repeat expansion diseases, for example, as listed in Table B. Such pharmaceutical compositions are formulated based on the mode of delivery. One example is a composition formulated for systemic administration by parenteral delivery, for example by intravenous (IV) or subcutaneous delivery. The pharmaceutical composition can be administered in a dose sufficient to treat the disease, for example, by knocking down mRNA expression, such as for a repeat expansion disease. Typically, suitable doses of the recognition reagent are in the range of about 0.001 to about 200.0 milligrams per kilogram of body weight of the recipient per day, typically in the range of about 1 to 50 milligrams per kilogram of body weight per day. Repeated dosage regimens may include the periodic administration of a therapeutic amount of the recognition reagent, for example, from once every other day to once a year. In certain aspects, the recognition reagent is administered from about once a month to about once a quarter (e.g., about once every three months). After the initial treatment regimen, the frequency of administration can be reduced. One skilled in the art will recognize that certain factors may affect the dosage and timing required to effectively treat a subject, including but not limited to the severity of the disease or disorder, previous treatments, the general health and/or age of the subject, and the presence of other diseases. Moreover, treating a subject with a therapeutically effective amount of the composition can include a single treatment or a series of treatments. Estimation of effective dosages and in vivo half-lives for each of the recognition reagents contemplated herein can be performed using conventional methods or based on in vivo testing using appropriate animal models.

The pharmaceutical compositions may be administered in a variety of ways depending on whether local or systemic treatment is desired and the area to be treated. Administration can be topical (e.g., via a transdermal patch), pulmonary (e.g., by inhalation or insufflation of powders or aerosols, including by nebulizer), intratracheal, intranasal, transdermal and epidermal, oral, or parenteral. Parenteral administration includes intravenous, intraarterial, subcutaneous, intraperitoneal or intramuscular injection or infusion; sub-dermally, e.g. by implantation devices; or transcranially, e.g., by intraparenchymal, intrathecal, or intraventricular administration. The recognition agents described above may be delivered in a manner that targets a particular tissue, such as the liver (e.g., hepatocytes of the liver).

Pharmaceutical compositions and formulations for topical administration may include transdermal patches, ointments, lotions, creams, gels, drops, suppositories, sprays, liquids and powders. Conventional pharmaceutical carriers, aqueous, powdered or oily bases, thickeners and the like may be necessary or desirable. Coated condoms, gloves and the like may also be useful. Suitable topical formulations include those in which the recognition reagent is mixed with a topical delivery agent such as lipids, liposomes, fatty acids, fatty acid esters, steroids, chelating agents and surfactants. Suitable lipids and liposomes include neutral (e.g., dioleoylphosphatidyl DOPE ethanolamine, dimyristoylphosphatidylcholine DMPC, distearoylphosphatidylcholine), negative (e.g., dimyristoylphosphatidylglycerol DMPG) and cationic (e.g., dioleoyltrimethylaminopropyl DOTAP and dioleoylphosphatidylethanolamine DOTMA). The recognition reagent may be encapsulated within the liposome or may form a complex therewith, particularly with cationic liposomes. Alternatively, the recognition reagent may be complexed with a lipid, particularly a cationic lipid. Suitable fatty acids and esters include, but are not limited to, arachidonic acid, oleic acid, eicosanoic acid, lauric acid, caprylic acid, capric acid, myristic acid, palmitic acid, stearic acid, linoleic acid, linolenic acid, dicaprate, tricaprate, glycerol monooleate, glycerol dilaurate, glycerol 1-monocaprate, 1-dodecylazacycloheptan-2-one, acylcarnitines, acylcholines, or C1-20 alkyl esters (e.g., isopropyl myristate, IPM), mono- or diglycerides; or a pharmaceutically acceptable salt thereof. Topical formulations are described in detail in U.S. Patent No. 6,747,014. Those skilled in the pharmaceutical and compounding arts can prepare suitable formulations for delivery of the recognition reagent as described herein.

In another aspect, a diagnostic method is provided comprising contacting a sample comprising a nucleic acid with a gene recognition reagent as described herein and detecting concatenation of the gene recognition reagent in the presence of a nucleic acid comprising a target sequence of the gene recognition reagent. In one aspect, the nucleic acid is obtained from a cell, e.g., a cell of a patient. As described herein, the nucleic acid is contacted with the gene recognition reagent, and the fluorescent terminal aryl groups produce a different emission wavelength or emission intensity when pi-stacked, e.g., when concatenated. Many fused-ring polycyclic aromatic compounds, such as pyrene, are used in this manner for fluorescence detection. Binding of the gene recognition reagent to the target sequence within the nucleic acid sample will produce a detectable fluorescent emission, such that detection of this emission is considered a positive reaction, indicating the presence of nucleic acid comprising the target sequence. This is a simple method of detecting the presence of nucleic acids that include repeat expansion sequences, and a positive result is indicative of a repeat expansion disease in the patient from which the nucleic acid sample was obtained.

In some embodiments, the present disclosure provides a compound comprising: a) a series of peptide nucleic acid (PNA) units, wherein the series of units comprises i) a first unit; ii) a last unit; and iii) at least one intermediate unit between the first unit and the last unit, wherein each unit in the series of units comprises a backbone moiety, wherein: 1) the backbone moiety of the first unit is covalently bonded to the backbone moiety of another unit; 2) the backbone moiety of the last unit is covalently bonded to the backbone moiety of another unit; and 3) the backbone moiety of each intermediate unit is covalently bonded to the backbone moieties of two other units; b) a first aryl moiety covalently attached to the first unit; and c) a last aryl moiety covalently attached to the last unit.

In some embodiments, the present disclosure provides a compound comprising: a) a series of PNA units, wherein the series of units comprises i) a first unit; ii) a last unit; and iii) at least one intermediate unit between the first unit and the last unit, wherein each unit in the series of units comprises: A) a backbone moiety, wherein: 1) the backbone moiety of the first unit is covalently bonded to the backbone moiety of another unit; 2) the backbone moiety of the last unit is covalently bonded to the backbone moiety of another unit; and 3) the backbone moiety of each intermediate unit is covalently bonded to the backbone moieties of two other units; and B) a nucleobase covalently bound to said backbone moiety; and b) two aryl moieties, one covalently linked to the first unit and the other covalently linked to the last unit.

In some embodiments, the present disclosure provides a compound, wherein the first aryl moiety is covalently linked to the first unit through a first linker moiety and the last aryl moiety is covalently linked to the last unit through a last linker moiety. In some embodiments, the present disclosure provides a compound, wherein the first linker moiety and the last linker moiety each independently comprise a guanidino group. In some embodiments, the present disclosure provides a compound, wherein the first linker moiety and the last linker moiety each independently comprise an amino acid residue. In some embodiments, the present disclosure provides a compound, wherein the first linker moiety and the last linker moiety each independently comprise three guanidine-containing amino acid residues.

In some embodiments, the present disclosure provides a compound, wherein the first linker moiety and the last linker moiety each independently comprise three consecutive arginine residues.

In some embodiments, the present disclosure provides compounds wherein the series of units is 3 to 8 units. In some embodiments, the present disclosure provides compounds wherein the series of units is γPNA. In some embodiments, the present disclosure provides a compound, wherein the first aryl moiety and the last aryl moiety are each independently a 2 to 5 ring fused polycyclic aromatic moiety, e.g., a polyaromatic hydrocarbon such as pyrene. In some embodiments, the present disclosure provides compounds wherein the first aryl moiety and the last aryl moiety are the same. In some embodiments, the present disclosure provides compounds further comprising ethylene glycol units, such as diethylene glycol units.

In some embodiments, the present disclosure provides a compound, wherein the compound presents nucleobases in an order complementary to a target nucleic acid sequence. In some embodiments, the present disclosure provides compounds wherein the target nucleic acid sequence is associated with a repeat expansion disease. In some embodiments, the present disclosure provides compounds wherein the repeat expansion disease is myotonic dystrophy type 1 (DM1) or myotonic dystrophy type 2 (DM2). In some embodiments, the present disclosure provides compounds wherein a first aryl moiety of one compound and a last aryl moiety of another compound stack when the two compounds are hybridized to a nucleic acid.

In some embodiments, the present disclosure provides a compound of the formula: H-LArg-CAGCAG-LArg-NH2 (P1); H-LArg-LDab(Pyr)-CAGCAG-LOrn(Pyr)-LArg-NH2 (P2); H-LArg-LOrn(Pyr)-CAGCAG-LOrn(Pyr)-LArg-NH2 (P3); H-LArg-LLys(Pyr)-CAGCAG-LLys(Pyr)-LArg-NH2 (P4); H-LArg-LLys(Pyr)-CATCAG-LLys(Pyr)-LArg-NH2 (P5); or H-LArg-LLys(Pyr)-CTGCTG-LLys(Pyr)-LArg-NH2 (P6), wherein Orn is ornithine, Dab is diaminobutyric acid, and Pyr is a carboxy-functionalized aromatic compound, such as a pyrene, e.g., pyrene-1-carboxylic acid, pyrene-2-carboxylic acid, pyrene-4-carboxylic acid, pyrene-1-acetic acid, pyrene-2-acetic acid or pyrene-4-acetic acid.

In some embodiments, the present disclosure provides methods of binding nucleic acids, comprising contacting the nucleic acids with the disclosed compounds, wherein the compounds bind to the nucleic acids upon contact. In some embodiments, the present disclosure provides a method of knocking down mRNA expression in a cell, the method comprising contacting the cell with a compound of the disclosure, wherein the compound knocks down mRNA expression in the cell upon binding to a DNA sequence corresponding to the mRNA in the cell.

In some embodiments, the present disclosure provides compounds of the following structure:

Figure BDA0002641565620000391

wherein each B is independently a backbone residue; n is 1, 2, 3, 4, 5, or 6; each R is independently a nucleobase; each L is independently a linker; and each Ar is independently a 2 to 5 ring fused polycyclic aromatic moiety, or a pharmaceutically acceptable salt thereof.

In some embodiments, the present disclosure provides compounds having the structure:

Figure BDA0002641565620000392

wherein each R is independently a nucleobase; n is 1, 2, 3, 4, 5, or 6; each L is independently a linker; each R1 and R2 is independently: H; a guanidine-containing group; an amino acid side chain; straight or branched chain (C1-C8)alkyl, (C2-C8)alkenyl, (C2-C8)alkynyl, (C1-C8)hydroxyalkyl, (C3-C8)aryl, (C3-C8)cycloalkyl, (C3-C8)aryl(C1-C6)alkylene, (C3-C8)cycloalkyl(C1-C6)alkylene, optionally substituted with ethylene glycol units comprising 1 to 50 ethylene glycol moieties; -CH2-(OCH2-CH2)qOP1; -CH2-(OCH2-CH2)q-NHP1; -CH2-(SCH2-CH2)q-SP1; -CH2-(OCH2-CH2)r-OH; -CH2-(OCH2-CH2)r-NH2; -CH2-(OCH2-CH2)r-NHC(NH)NH2; or -CH2-(OCH2-CH2)r-S-S[CH2CH2]sNHC(NH)NH2, wherein P1 is H, (C1-C8)alkyl, (C2-C8)alkenyl, (C2-C8)alkynyl, (C3-C8)aryl, (C3-C8)cycloalkyl, (C3-C8)aryl(C1-C6)alkylene, or (C3-C8)cycloalkyl(C1-C6)alkylene; q is an integer from 0 to 50; r is an integer from 1 to 50; and s is an integer from 1 to 50; and each R3 is independently a 2 to 5 ring fused polycyclic aromatic moiety, or a pharmaceutically acceptable salt thereof.

In some embodiments, the present disclosure provides compounds having the structure:

Figure BDA0002641565620000401

wherein each R is independently a nucleobase; n is 1, 2, 3, 4, 5, or 6; each R1 and R2 is independently: H; a guanidine-containing group; an amino acid side chain; straight or branched chain (C1-C8)alkyl, (C2-C8)alkenyl, (C2-C8)alkynyl, (C1-C8)hydroxyalkyl, (C3-C8)aryl, (C3-C8)cycloalkyl, (C3-C8)aryl(C1-C6)alkylene, (C3-C8)cycloalkyl(C1-C6)alkylene, optionally substituted with ethylene glycol units comprising 1 to 50 ethylene glycol moieties; -CH2-(OCH2-CH2)qOP1; -CH2-(OCH2-CH2)q-NHP1; -CH2-(SCH2-CH2)q-SP1; -CH2-(OCH2-CH2)r-OH; -CH2-(OCH2-CH2)r-NH2; -CH2-(OCH2-CH2)r-NHC(NH)NH2; or -CH2-(OCH2-CH2)r-S-S[CH2CH2]sNHC(NH)NH2, wherein P1 is H, (C1-C8)alkyl, (C2-C8)alkenyl, (C2-C8)alkynyl, (C3-C8)aryl, (C3-C8)cycloalkyl, (C3-C8)aryl(C1-C6)alkylene, or (C3-C8)cycloalkyl(C1-C6)alkylene; q is an integer from 0 to 50; r is an integer from 1 to 50; and s is an integer from 1 to 50; R4 or R5, and R6, R7, or R8, are -L-R3, and R4, R5, R6, R7, and R8 are each independently H, one or more contiguous amino acid residues, a guanidine-containing group, an amino acid side chain, straight or branched chain (C1-C8)alkyl, (C2-C8)alkenyl, (C2-C8)alkynyl, (C1-C8)hydroxyalkyl, (C3-C8)aryl, (C3-C8)cycloalkyl, (C3-C8)aryl(C1-C6)alkylene, (C3-C8)cycloalkyl(C1-C6)alkylene, optionally substituted with ethylene glycol units comprising 1 to 50 ethylene glycol moieties, -CH2-(OCH2-CH2)qOP1, -CH2-(OCH2-CH2)q-NHP1, -CH2-(OCH2-CH2)q-SP1, -CH2-(SCH2-CH2)q-SP1, -CH2-(OCH2-CH2)r-OH, -CH2-(OCH2-CH2)r-NH2, -CH2-(OCH2-CH2)r-NHC(NH)NH2, or -CH2-(OCH2-CH2)r-S-S[CH2CH2]sNHC(NH)NH2, wherein P1 is selected from H, (C1-C8)alkyl, (C2-C8)alkenyl, (C2-C8)alkynyl, (C3-C8)aryl, (C3-C8)cycloalkyl, (C3-C8)aryl(C1-C6)alkylene, and (C3-C8)cycloalkyl(C1-C6)alkylene; q is an integer from 0 to 50; r is an integer from 1 to 50; and s is an integer from 1 to 50; each R3 is independently a 2 to 5 ring fused polycyclic aromatic moiety; and each L is independently a linker, or a pharmaceutically acceptable salt thereof.

In some embodiments, the present invention provides compounds having the structure:

wherein n is 1, 2, 3, 4, 5, 6, 7, or 8; m is 1, 2, 3, 4, or 5; R2 is: a guanidine-containing group; an amino acid side chain; straight or branched chain (C1-C8)alkyl, (C2-C8)alkenyl, (C2-C8)alkynyl, (C1-C8)hydroxyalkyl, (C3-C8)aryl, (C3-C8)cycloalkyl, (C3-C8)aryl(C1-C6)alkylene, (C3-C8)cycloalkyl(C1-C6)alkylene, optionally substituted with ethylene glycol units comprising 1 to 50 ethylene glycol moieties; -CH2-(OCH2-CH2)qOP1; -CH2-(OCH2-CH2)q-NHP1; -CH2-(SCH2-CH2)q-SP1; -CH2-(OCH2-CH2)r-OH; -CH2-(OCH2-CH2)r-NH2; -CH2-(OCH2-CH2)r-NHC(NH)NH2; or -CH2-(OCH2-CH2)r-S-S[CH2CH2]sNHC(NH)NH2, wherein P1 is H, (C1-C8)alkyl, (C2-C8)alkenyl, (C2-C8)alkynyl, (C3-C8)aryl, (C3-C8)cycloalkyl, (C3-C8)aryl(C1-C6)alkylene, or (C3-C8)cycloalkyl(C1-C6)alkylene; q is an integer from 0 to 50; r is an integer from 1 to 50; and s is an integer from 1 to 50; R3 is pentalene, indene, naphthalene, azulene, heptalene, biphenylene, as-indacene, s-indacene, acenaphthylene, fluorene, phenalene, phenanthrene, anthracene, fluoranthene, acephenanthrylene, aceanthrylene, triphenylene, pyrene, naphthacene/tetracene, pleiadene, picene, or perylene; and R5, R7, and R8 are each independently H, an amino acid side chain, a chain of at least one contiguous amino acid residue, or

Figure BDA0002641565620000412

wherein n is 1, 2, 3, 4, or 5, or a pharmaceutically acceptable salt thereof.

In some embodiments, the present invention provides a gene recognition reagent comprising: a nucleic acid or nucleic acid analog backbone having a first end and a second end and having from 3 to 8 ribose-5-phosphate, deoxyribose-5-phosphate, or nucleic acid analog backbone residues; nucleobases, which may be the same or different, linked to a plurality of the ribose-5-phosphate, deoxyribose-5-phosphate, or nucleic acid analog backbone residues; a first aryl moiety linked to the first end of the nucleic acid or nucleic acid analog backbone by a linker; and a second aryl moiety, optionally identical to the first aryl moiety, linked by a linker to the second end of the nucleic acid or nucleic acid analog backbone.

In some embodiments, the present invention provides a method of detection comprising a) hybridizing a first probe nucleic acid to a first repeat portion of a target nucleic acid, the first probe nucleic acid comprising a first end linked to a first emitter moiety and a second end linked to a second emitter moiety; and b) hybridizing a second probe nucleic acid to a second repeat portion of the target nucleic acid, the second probe nucleic acid comprising a first end linked to a third emitter moiety and a second end linked to a fourth emitter moiety; wherein i) the first and second repeat portions of the target nucleic acid are associated with a repeat expansion disease; ii) binding of the first and second probe nucleic acids to the target indicates that the first or second emitter moiety is proximal to the third or fourth emitter moiety; and iii) the proximity of the first or second emitter moiety to the third or fourth emitter moiety results in a change in the emission wavelength of the proximal emitter moieties.

In some embodiments, the detection method further comprises detecting the change in emission wavelength of the proximal emitter moieties. In some embodiments, hybridizing the first probe nucleic acid to the first repeat portion increases the affinity of the second probe nucleic acid for the second repeat portion. In some embodiments, the proximity of the first or second emitter moiety to the third or fourth emitter moiety causes a pi-pi stacking interaction between the first or second emitter moiety and the third or fourth emitter moiety. In some embodiments, the target nucleic acid is obtained directly from a biological sample.

Examples

Peptide nucleic acid (PNA) is a randomly folded nucleic acid mimetic comprising a pseudopeptide backbone, which can be pre-organized into a right-handed helical motif, and whose water solubility and biocompatibility can be improved, by installing (R)-diethylene glycol (miniPEG, or MP) units at the γ position of the backbone (FIG. 6). This class of molecule has ultra-high affinity and sequence specificity for DNA or RNA. These thermodynamic properties support the feasibility of developing relatively short MPγPNA probes for targeting repeat expansions such as rCUGexp.

Methods

UV melting experiments: All UV melting samples were prepared by mixing MPγPNA with the RNA target at the indicated concentrations in physiologically simulated buffer (10 mM NaPi (sodium phosphate buffer), 150 mM KCl, 2 mM MgCl2; pH 7.4), incubating at 90 °C for 5 min, and then gradually cooling to room temperature. UV melting curves were collected using an Agilent Cary UV-Vis 300 spectrometer by monitoring the UV absorption at 260 nm from 25 °C to 95 °C during heating and from 95 °C to 25 °C during cooling, both at a rate of 0.2 °C/min. The cooling and heating curves were almost identical, indicating that the hybridization process was reversible. The recorded spectra were smoothed using a 20-point adjacent-averaging algorithm. The first derivative of the melting curve was used to determine the melting temperature of the duplex.
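The smoothing and first-derivative steps can be illustrated with a minimal, standard-library Python sketch. It is not the instrument software or the authors' analysis script; the function names, the way the 20-point window is truncated at the ends of the curve, the central-difference derivative, and the synthetic sigmoidal curve are all assumptions made for illustration.

```python
import math


def moving_average(values, window=20):
    """Smooth a curve with an adjacent-point average (window truncated at the ends)."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half)  # up to `window` neighbouring points
        chunk = values[lo:hi]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


def melting_temperature(temps, absorbance, window=20):
    """Estimate Tm as the temperature where dA/dT of the smoothed curve peaks."""
    smooth = moving_average(absorbance, window)
    best_temp, best_slope = None, float("-inf")
    for i in range(1, len(temps) - 1):
        # Central-difference estimate of the first derivative.
        slope = (smooth[i + 1] - smooth[i - 1]) / (temps[i + 1] - temps[i - 1])
        if slope > best_slope:
            best_temp, best_slope = temps[i], slope
    return best_temp


if __name__ == "__main__":
    # Synthetic heating curve from 25 to 95 degrees C with a transition centred at 60.
    temps = [25 + 0.2 * i for i in range(351)]
    absorbance = [1 / (1 + math.exp(-(t - 60) / 2)) for t in temps]
    print(round(melting_temperature(temps, absorbance), 1))
```

Because an even-sized window cannot be centred exactly on a data point, the estimate can be offset by a fraction of the temperature step; real data would also need separate handling of the heating and cooling traces.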

Steady-state fluorescence measurements: All steady-state fluorescence samples were prepared as follows: the P4-RNA duplex was prepared by mixing the components at the indicated concentrations in physiologically simulated buffer, incubating at 90 °C for 5 minutes, and then gradually cooling to 37 °C. The samples were incubated at 37 °C for 1 hour prior to measurement. Steady-state fluorescence data were collected at 37 °C using a Cary Eclipse fluorescence spectrometer (ex 350 nm and em 480 nm) with a slit size.

Real-time fluorescence binding kinetics: For real-time fluorescence kinetics experiments, samples were prepared as follows: (1) P4 alone; (2) pre-annealed P4, T1, and T8; (3) P4 added to T8 at 37 °C; and (4) P4 added to T8 in the presence of T1 at equimolar binding sites. Real-time fluorescence data were collected at 37 °C using a Cary Eclipse fluorescence spectrometer (ex 350 nm and em 480 nm) with a slit size.

Competitive binding assay: T6 was purchased from IDT. T48, which is identical to r(CUG)96, was prepared as described above, except that the DNA template was (CTG)96. T6 and T48 were each annealed by heating to 90 °C for 5 min and gradually cooling to room temperature. The probe and RNA target were mixed at the indicated concentrations and incubated in physiologically simulated buffer at 37 °C for 4 hours in a silicone-coated Eppendorf tube. The samples were then loaded onto 2% agarose gels containing 1× SYBR Gold in Tris-borate buffer, separated by electrophoresis at 100 V for 15 minutes, and the bands were visualized using a UV transilluminator.

Analysis of the disruption of the T48-MBNL1 complex by the probe: T48 and GST-MBNL1-F1 were prepared as previously described. Radioisotope-labeled T48 was incubated with MBNL1 at 25 °C for 30 minutes in a buffer containing 50 mM Tris, 50 mM NaCl, 50 mM KCl, and 1 mM MgCl2 (pH 8.0). P4 was added to the T48-MBNL1 complex at the indicated concentrations. Samples were immediately loaded onto 1.5% agarose gels with 1× Tris-borate buffer and separated by electrophoresis at 100 V for 2 hours. The bands were visualized by fluorescence imaging and quantified using ImageQuant (Molecular Dynamics).

We synthesized a series of MPγPNA probes, each 6 bases in length and bearing terminal pyrenes (P2 to P6, FIG. 6), along with the P1 control, and characterized their binding properties. P2 to P4 differ in the length of the linker connecting pyrene to the probe backbone (FIG. 6). P5 and P6 contain a single-base and a double-base mismatch, respectively, and were designed to test recognition specificity. We chose a tandem triplet repeat as the probe sequence because previous studies have shown that MPγPNAs of similar length are capable of interacting transiently with RNA targets at physiological temperatures. Pyrene was used as a model compound to promote binding cooperativity because of its extended aromatic surface and the large red shift in its emission spectrum upon dimerization; Sugiyama and colleagues have successfully demonstrated this approach in the cooperative binding of polyamides to DNA. The latter photochemical property provides a convenient means of monitoring probe hybridization and pyrene-pyrene interactions. The monomers were prepared according to published protocols. Probes were synthesized on HMBA resin, purified by RP-HPLC, and verified by MALDI-TOF MS. A series of model RNA targets containing varying numbers of hexameric r(CUGCUG) repeats was selected for the binding studies (FIG. 6(C)).
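The relationship between a 6-base probe and the r(CUGCUG)-repeat targets can be illustrated with a short sketch. The helper names are hypothetical, and treating a target "Tn" as n tandem hexamer sites is an assumption, although it is consistent with T48 being described as r(CUG)96. The printed complement is simply the Watson-Crick complement of one site; the source does not state the probe sequences in this section.

```python
HEXAMER = "CUGCUG"  # one 6-nt binding site on the RNA target

def reverse_complement_rna(seq):
    """Watson-Crick reverse complement of an RNA sequence (A-U, G-C only)."""
    pairs = {"A": "U", "U": "A", "G": "C", "C": "G"}
    return "".join(pairs[base] for base in reversed(seq.upper()))

def model_target(n_sites):
    """Hypothetical model target made of n tandem r(CUGCUG) sites."""
    return HEXAMER * n_sites

def count_tandem_sites(target, site=HEXAMER):
    """Count in-register, non-overlapping sites, reading from the first base."""
    target = target.upper()
    step = len(site)
    return sum(1 for start in range(0, len(target) - step + 1, step)
               if target[start:start + step] == site)

site_complement = reverse_complement_rna(HEXAMER)      # complement of one site
sites_in_seq1 = count_tandem_sites("cugcugcugcugcugcug")  # 18-nt SEQ ID NO: 1
t48_length = len(model_target(48))                     # 48 hexamers = 96 CUG triplets
```

Counting sites in register matters here because cooperative binding relies on probes occupying adjacent sites, so that their terminal aromatic groups can stack.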

All experiments were performed at physiologically relevant ionic strength (10 mM NaPi, 150 mM KCl, and 2 mM MgCl2; pH 7.4). RNA concentrations were adjusted so that the number of r(CUGCUG) repeats in each sample was the same. Preliminary studies showed that, among the three linker lengths (FIG. 6, P2 to P4), lysine produced the highest degree of binding cooperativity (FIG. 7). Based on this finding, we selected P4 and performed UV-melting studies with the different RNA targets. Our results show that the melting transitions (Tms) of the P4-RNA series increase monotonically with the number of target sites (FIG. 8). The pyrene-pyrene interaction is evident from the inverted absorption profiles of P4-T6 and P4-T8 in the temperature range of 40-70 °C. Upon heating, the solvophobic effect becomes more pronounced, owing to the precise arrangement of the pyrene units, with Ae(0→0)/Am(0→0) of 0.6 relative to the vibronic transition of the monomer, before the probe dissociates upon further heating. Comparing the Tms of the three series (P4-RNA, P1-RNA, and RNA alone) revealed distinct patterns. The Tms of the P4-RNA series are positively and linearly correlated with the number of binding sites, Y = 66 + 2X, where X is the number of binding sites in the target (FIG. 9). For the latter two series, however, the Tms plateau at X = 3, indicating that increasing the number of binding sites in the RNA beyond 3 does not necessarily make the corresponding hairpin structure more thermodynamically stable. In contrast to the perfectly matched sequence, the mismatched P5 and P6 probes produced no significant change in the Tms of the RNAs (inset). Taken together, these results indicate that P4 binds to RNA repeat targets cooperatively and sequence-specifically. This phenomenon has also been observed with PNA-ligand conjugates.
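Two pieces of arithmetic in the paragraph above can be made concrete: the linear Tm relation for the P4-RNA series, read here as Tm = 66 + 2X in °C, and the adjustment of RNA strand concentration so that every sample contains the same number of binding sites. The function names and the 8 µM site concentration are hypothetical illustrations, and the relation should only be applied within the range of targets actually measured.

```python
def predicted_tm_p4(n_sites, intercept=66.0, slope=2.0):
    """Tm (°C, assumed unit) from the reported linear relation Y = 66 + 2X."""
    return intercept + slope * n_sites

def strand_concentration(site_concentration_um, n_sites):
    """RNA strand concentration (µM) that yields a fixed binding-site concentration."""
    if n_sites < 1:
        raise ValueError("a target needs at least one binding site")
    return site_concentration_um / n_sites

# Hypothetical example: hold the total binding-site concentration at 8 µM.
for n in (1, 2, 4, 8):
    print(f"{n} site(s): Tm = {predicted_tm_p4(n):.0f} °C, "
          f"strand = {strand_concentration(8.0, n):.2f} µM")
```

Holding the site concentration constant means that differences in Tm between targets reflect the number of adjacent sites per strand rather than the total amount of probe-binding sequence in the sample.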

To further confirm these findings, we performed fluorescence measurements under the same conditions. The samples were excited at 340 nm and the fluorescence signal was recorded from 345 to 650 nm. The formation of the pyrene-pyrene excimer is characterized by emission at 480 nm (FIG. 10). Consistent with the UV-melting data, the degree of cooperativity in the binding of P4 to RNA increases with the number of binding sites, as shown by a gradual increase in fluorescence intensity at 480 nm. The samples were clearly different under UV irradiation and could therefore be distinguished by the naked eye (FIG. 11). Kinetic measurements further showed that hybridization of P4 to T8 was almost complete within 10 minutes (FIG. 10, inset). Adding the competing T1 strand at an equimolar ratio of binding sites delayed hybridization by 2 min, after which complete fluorescence recovery and binding of P4 to T8 were observed. This result indicates that, under physiologically simulated conditions, the interaction of P4 with single-binding-site RNA is weak and transient, and that complete recovery and binding of the probe to the RNA repeat is achieved within the same time frame.

To assess the recognition selectivity of the P4 probe, we performed a competitive binding assay. Normal-length T6 and pathogenic T48 [r(CUG)96] were incubated with different concentrations of P4 at 37 °C for 16 hours. The resulting mixtures were analyzed on an agarose gel and visualized by SYBR Gold staining. Examination of FIG. 12 shows that P4 is able to discriminate between pathogenic T48 and wt-T6 (compare lane 6 and lane 3). No sign of binding was observed with the single-base-mismatched P5 probe (compare lane 7 and lane 3). These results indicate that P4 is able to discriminate between expanded T48 transcripts and wt-T6, and that probe binding occurs in a sequence-specific manner.

Next, we determined whether P4 could disrupt the rCUGexp-MBNL1 complex by performing a gel-shift assay. Under physiologically relevant conditions, 5'-32P-labeled T48 was incubated with MBNL1 to prepare the RNA-protein complex. After binding was confirmed, P4 was added, and the resulting mixture was incubated at 37 °C for 4 hours and then analyzed by native polyacrylamide gel electrophoresis and autoradiography. Formation of the T48-MBNL1 complex is confirmed by the smear pattern observed in lanes 2 to 4 of FIG. 13. The addition of P4 resulted in the formation of a shifted band, which became more pronounced with increasing probe concentration (lanes 5 to 7). We take this result as evidence that P4 is able to disrupt the rCUGexp-MBNL1 complex, resulting in the formation of a T48-P4 heteroduplex with the MBNL1 protein displaced from the RNA transcript. This ability is critical for interfering with the DM1 disease pathway.

MPγPNAs are more synthetically flexible than conventional antisense agents (typically 15-30 nucleotides in length) or shorter versions composed entirely of locked nucleic acids (LNAs). Their structure and chemical functionality can be readily modified to meet the requirements of the application at hand. Smaller probe sizes provide several distinct benefits for biological and biomedical applications, including easier chemical synthesis and scale-up, and improved recognition specificity and selectivity (and possibly pharmacokinetic properties). Pyrene has favorable chemical and photophysical properties and was chosen as a model compound for inducing intermolecular π-π interactions. In practical biological and biomedical applications, however, such aromatic pendant groups can readily be replaced by natural products that are more biologically benign or that provide health benefits, such as riboflavin (vitamin B2), mangostin, or mangiferin, all of which promote π-π interactions owing to their aromatic, fused-ring structures (lower panel). These are natural antioxidants present in fruits and vegetables and are commonly used as dietary supplements against oxidative stress, inflammation, cancer, aging, and other diseases.

This example demonstrates that relatively short nucleic acid probes (two triplet repeats in length and containing terminal aromatic moieties) can discriminate pathogenic rCUGexp from transcripts containing short CUG repeats and are capable of disrupting the rCUGexp-MBNL1 complex. In addition to the advantage of small size, the modular design and the high recognition specificity and selectivity of MPγPNA probes make them suitable for targeting RNA repeat expansions, not only rCUGexp but also a broad range of other repeat sequences, as possible treatments for DM1 and other related neuromuscular and neurodegenerative diseases.

The following numbered items provide non-limiting examples of various aspects of the present invention:

Item 1. A gene recognition reagent, comprising: a nucleic acid or nucleic acid analog backbone having a first end and a second end and having from 3 to 8 ribose, deoxyribose, or nucleic acid analog backbone residues; nucleobases, which may be the same or different, linked to a plurality of the ribose, deoxyribose, or nucleic acid analog backbone residues in a sequence complementary to a target nucleic acid; a first aryl moiety linked to the first end of the nucleic acid or nucleic acid analog backbone by a linker; and a second aryl moiety, optionally identical to the first aryl moiety, linked to the second end of the nucleic acid or nucleic acid analog backbone by a linker, wherein the aryl moiety stacks with the aryl moiety of an adjacent recognition reagent when the recognition reagents are hybridized to adjacent sequences of the target nucleic acid.

Item 2. The gene recognition reagent according to item 1, having the structure:

Figure BDA0002641565620000462

wherein

n is an integer from 1 to 6;

each instance of R is independently a nucleobase, resulting in a sequence of nucleobases that is optionally complementary to the target nucleic acid;

B is a ribose, deoxyribose, or nucleic acid analog backbone residue;

L is independently a linker; and

each instance of Ar is independently a 2- to 5-ring fused polycyclic aromatic moiety.

Item 3. The gene recognition reagent according to item 1 or item 2, wherein the nucleic acid or nucleic acid analog backbone residue is a nucleic acid analog backbone residue.

Item 4. The gene recognition reagent according to item 3, wherein the nucleic acid analog backbone residue comprises a conformationally preorganized residue.

Item 5. The gene recognition reagent according to item 4, wherein the conformationally preorganized nucleic acid analog backbone residue is a γPNA, LNA, or glycol nucleic acid backbone residue.

Item 6. The gene recognition reagent according to item 4, wherein the conformationally preorganized nucleic acid analog backbone residue is a γPNA backbone residue.

Item 7. The gene recognition reagent of item 6, wherein one or more of the γPNA backbone residues are substituted with a group comprising an ethylene glycol unit having 1 to 100 ethylene glycol residues, such as: -(OCH2-CH2)qOP1; -(OCH2-CH2)q-NHP1; -(SCH2-CH2)q-SP1; -(OCH2-CH2)r-OH; -(OCH2-CH2)r-NH2; -(OCH2-CH2)r-NHC(NH)NH2; or -(OCH2-CH2)r-S-S[CH2CH2]sNHC(NH)NH2, wherein P1 is H, (C1-C8)alkyl, (C2-C8)alkenyl, (C2-C8)alkynyl, (C3-C8)aryl, (C3-C8)cycloalkyl, (C3-C8)aryl(C1-C6)alkylene, or (C3-C8)cycloalkyl(C1-C6)alkylene; q is an integer from 0 to 50; r is an integer from 1 to 50; and s is an integer from 1 to 50, the group optionally being attached to the one or more γPNA backbone residues through a (C1-C6) divalent hydrocarbyl linker.

Item 8. The gene recognition reagent according to item 4, wherein the conformationally preorganized nucleic acid analog backbone residue is L-γPNA.

Item 9. The gene recognition reagent according to any one of items 1 to 8, having the structure:

Figure BDA0002641565620000481

wherein

R is independently a nucleobase;

n is an integer from 1 to 6, such as 1, 2, 3, 4, 5, or 6;

L is independently a linker;

R1 and R2 are each attached to a gamma carbon and are independently: H; a guanidine-containing group; an amino acid side chain; linear or branched (C1-C8)alkyl, (C2-C8)alkenyl, (C2-C8)alkynyl, (C1-C8)hydroxyalkyl, (C3-C8)aryl, (C3-C8)cycloalkyl, (C3-C8)aryl(C1-C6)alkylene, or (C3-C8)cycloalkyl(C1-C6)alkylene, optionally substituted with an ethylene glycol unit comprising 1 to 50 ethylene glycol moieties; -CH2-(OCH2-CH2)qOP1; -CH2-(OCH2-CH2)q-NHP1; -CH2-(SCH2-CH2)q-SP1; -CH2-(OCH2-CH2)r-OH; -CH2-(OCH2-CH2)r-NH2; -CH2-(OCH2-CH2)r-NHC(NH)NH2; or -CH2-(OCH2-CH2)r-S-S[CH2CH2]sNHC(NH)NH2, wherein P1 is H, (C1-C8)alkyl, (C2-C8)alkenyl, (C2-C8)alkynyl, (C3-C8)aryl, (C3-C8)cycloalkyl, (C3-C8)aryl(C1-C6)alkylene, or (C3-C8)cycloalkyl(C1-C6)alkylene; q is an integer from 0 to 50; r is an integer from 1 to 50; and s is an integer from 1 to 50; and

R3 is independently a 2- to 5-ring fused polycyclic aromatic moiety,

or a pharmaceutically acceptable salt thereof.

Item 10. The gene recognition reagent of item 9, wherein each linker independently comprises one or more guanidine-containing groups, one or more amino acid side chains, or one or more adjacent amino acid residues.

Item 11. The gene recognition reagent according to item 9, wherein each instance of L comprises a first amino acid residue having a side group

wherein n ranges from 1 to 5, and N-terminal and C-terminal arginine residues are attached to each of said first amino acid residues.

Item 12. The gene recognition reagent according to item 11, wherein n ranges from 1 to 3.

Item 13. The gene recognition reagent according to any one of items 1 to 12, having the structure:

Figure BDA0002641565620000491

wherein

R is independently a nucleobase;

n is an integer from 1 to 6, such as 1, 2, 3, 4, 5, or 6;

R1 and R2 are each attached to a gamma carbon and are independently: H; a guanidine-containing group; an amino acid side chain; linear or branched (C1-C8)alkyl, (C2-C8)alkenyl, (C2-C8)alkynyl, (C1-C8)hydroxyalkyl, (C3-C8)aryl, (C3-C8)cycloalkyl, (C3-C8)aryl(C1-C6)alkylene, or (C3-C8)cycloalkyl(C1-C6)alkylene, optionally substituted with an ethylene glycol unit comprising 1 to 50 ethylene glycol moieties; -CH2-(OCH2-CH2)qOP1; -CH2-(OCH2-CH2)q-NHP1; -CH2-(SCH2-CH2)q-SP1; -CH2-(OCH2-CH2)r-OH; -CH2-(OCH2-CH2)r-NH2; -CH2-(OCH2-CH2)r-NHC(NH)NH2; or -CH2-(OCH2-CH2)r-S-S[CH2CH2]sNHC(NH)NH2, wherein P1 is H, (C1-C8)alkyl, (C2-C8)alkenyl, (C2-C8)alkynyl, (C3-C8)aryl, (C3-C8)cycloalkyl, (C3-C8)aryl(C1-C6)alkylene, or (C3-C8)cycloalkyl(C1-C6)alkylene; q is an integer from 0 to 50; r is an integer from 1 to 50; and s is an integer from 1 to 50;

R4 or R5, and R6, R7, or R8, is -L-R3, wherein R3 is independently a 2- to 5-ring fused polycyclic aromatic moiety and L is a linker, and R4, R5, R6, R7, and R8 are otherwise each independently: H; one or more adjacent amino acid residues; a guanidine-containing group; an amino acid side chain; linear or branched (C1-C8)alkyl, (C2-C8)alkenyl, (C2-C8)alkynyl, (C1-C8)hydroxyalkyl, (C3-C8)aryl, (C3-C8)cycloalkyl, (C3-C8)aryl(C1-C6)alkylene, or (C3-C8)cycloalkyl(C1-C6)alkylene, optionally substituted with an ethylene glycol unit comprising 1 to 50 ethylene glycol moieties; -CH2-(OCH2-CH2)qOP1; -CH2-(OCH2-CH2)q-NHP1; -CH2-(SCH2-CH2)q-SP1; -CH2-(OCH2-CH2)r-OH; -CH2-(OCH2-CH2)r-NH2; -CH2-(OCH2-CH2)r-NHC(NH)NH2; or -CH2-(OCH2-CH2)r-S-S[CH2CH2]sNHC(NH)NH2, wherein P1 is H, (C1-C8)alkyl, (C2-C8)alkenyl, (C2-C8)alkynyl, (C3-C8)aryl, (C3-C8)cycloalkyl, (C3-C8)aryl(C1-C6)alkylene, or (C3-C8)cycloalkyl(C1-C6)alkylene; q is an integer from 0 to 50; r is an integer from 1 to 50; and s is an integer from 1 to 50,

or a pharmaceutically acceptable salt thereof.

Item 14. The gene recognition reagent according to item 13, wherein R1, R2, R3, R4, R5, R6, R7, or R8 is (C1-C6)alkyl substituted with: -(OCH2-CH2)qOP1; -(OCH2-CH2)q-NHP1; -(SCH2-CH2)q-SP1; -(OCH2-CH2)r-OH; -(OCH2-CH2)r-NH2; -(OCH2-CH2)r-NHC(NH)NH2; or -(OCH2-CH2)r-S-S[CH2CH2]sNHC(NH)NH2, wherein P1 is H, (C1-C8)alkyl, (C2-C8)alkenyl, (C2-C8)alkynyl, (C3-C8)aryl, (C3-C8)cycloalkyl, (C3-C8)aryl(C1-C6)alkylene, or (C3-C8)cycloalkyl(C1-C6)alkylene; q is an integer from 0 to 50; r is an integer from 1 to 50; and s is an integer from 1 to 50.

Item 15. The gene recognition reagent according to item 14, wherein R4 and R7 are -L-R3.

Item 16. The gene recognition reagent according to any one of items 13 to 15, wherein R5 and R8 comprise an arginine residue.

Item 17. The gene recognition reagent according to any one of items 1 to 16, having the structure:

n is an integer from 1 to 8;

m is an integer of 1 to 5;

R2 is attached to a gamma carbon and is: a guanidine-containing group; an amino acid side chain; linear or branched (C1-C8)alkyl, (C2-C8)alkenyl, (C2-C8)alkynyl, (C1-C8)hydroxyalkyl, (C3-C8)aryl, (C3-C8)cycloalkyl, (C3-C8)aryl(C1-C6)alkylene, or (C3-C8)cycloalkyl(C1-C6)alkylene, optionally substituted with an ethylene glycol unit comprising 1 to 50 ethylene glycol moieties; -CH2-(OCH2-CH2)qOP1; -CH2-(OCH2-CH2)q-NHP1; -CH2-(SCH2-CH2)q-SP1; -CH2-(OCH2-CH2)r-OH; -CH2-(OCH2-CH2)r-NH2; -CH2-(OCH2-CH2)r-NHC(NH)NH2; or -CH2-(OCH2-CH2)r-S-S[CH2CH2]sNHC(NH)NH2, wherein P1 is H, (C1-C8)alkyl, (C2-C8)alkenyl, (C2-C8)alkynyl, (C3-C8)aryl, (C3-C8)cycloalkyl, (C3-C8)aryl(C1-C6)alkylene, or (C3-C8)cycloalkyl(C1-C6)alkylene; q is an integer from 0 to 50; r is an integer from 1 to 50; and s is an integer from 1 to 50;

R3 is an unsubstituted fused-ring polycyclic aromatic moiety, such as pentalene, indene, naphthalene, azulene, heptalene, biphenylene, as-indacene, s-indacene, acenaphthylene, fluorene, phenalene, phenanthrene, anthracene, fluoranthene, acephenanthrylene, aceanthrylene, triphenylene, pyrene,

Figure BDA0002641565620000511

naphthacene/tetracene, pleiadene, picene, or perylene; and

R5, R7, and R8 are each independently H, a guanidine-containing group such as

wherein n is 1, 2, 3, 4, or 5, an amino acid side chain, or one or more adjacent amino acid residues,

or a pharmaceutically acceptable salt thereof.

Item 18. The gene recognition reagent according to item 17, wherein R1, R2, R4, R5, R6, R7, or R8 is (C1-C6)alkyl substituted with: -(OCH2-CH2)qOP1; -(OCH2-CH2)q-NHP1; -(SCH2-CH2)q-SP1; -(OCH2-CH2)r-OH; -(OCH2-CH2)r-NH2; -(OCH2-CH2)r-NHC(NH)NH2; or -(OCH2-CH2)r-S-S[CH2CH2]sNHC(NH)NH2, wherein P1 is H, (C1-C8)alkyl, (C2-C8)alkenyl, (C2-C8)alkynyl, (C3-C8)aryl, (C3-C8)cycloalkyl, (C3-C8)aryl(C1-C6)alkylene, or (C3-C8)cycloalkyl(C1-C6)alkylene; q is an integer from 0 to 50; r is an integer from 1 to 50; and s is an integer from 1 to 50.

Item 19. The gene recognition reagent according to item 17, wherein R2 is -CH2-O-CH2-CH2-O-CH2-CH2-OH; R8 is H; R5 is Arg-Dab(pyrene)-, Arg-Orn(pyrene)-, or Arg-Lys(pyrene)-; and R7 is -Dab(pyrene)-Arg, -Orn(pyrene)-Arg, or -Lys(pyrene)-Arg, optionally wherein Arg, Dab, Orn, and Lys have the L configuration at their chiral centers (L-Arg, L-Dab, L-Orn, and L-Lys).

Item 20. The gene recognition reagent according to one of items 2 to 19, wherein the two instances of the 2- to 5-ring fused polycyclic aromatic moiety are the same.

Item 21. The gene recognition reagent according to one of items 2 to 20, wherein one or both of the 2- to 5-ring fused polycyclic aromatic moieties are unsubstituted or substituted pentalene, indene, naphthalene, azulene, heptalene, biphenylene, as-indacene, s-indacene, acenaphthylene, fluorene, phenalene, phenanthrene, anthracene, fluoranthene, acephenanthrylene, aceanthrylene, triphenylene, pyrene,

Figure BDA0002641565620000522

naphthacene/tetracene, pleiadene, picene, or perylene, optionally substituted with one or more heteroatoms such as O, N, P, and/or S.

Item 22. The gene recognition reagent according to one of items 2 to 21, wherein one or both of the 2- to 5-ring fused polycyclic aromatic moieties comprise riboflavin (vitamin B2), mangostin, or mangiferin.

Item 23. The gene recognition reagent according to any one of items 9 to 27, wherein R1 and R2 are different in two or more gamma carbons.

Item 24. The gene recognition reagent according to item 23, wherein R1 is H in the two or more gamma carbons in which R1 and R2 are different.

Item 25. The gene recognition reagent according to item 23, wherein R2 is H in the two or more gamma carbons in which R1 and R2 are different.

Item 26. The gene recognition reagent according to any one of items 1 to 25, comprising a guanidine moiety.

Item 27. The gene recognition reagent according to any one of items 1 to 26, comprising a guanidine-containing group

wherein n is 1, 2, 3, 4, or 5.

Item 28. The gene recognition reagent according to any one of items 9 to 27, wherein R2 is -CH2-(OCH2-CH2)r-OH, wherein r is an integer from 1 to 50, an integer from 1 to 10, or 2.

Item 29. The gene recognition reagent according to any one of items 9 to 27, wherein R2 is -CH2-O-CH2-CH2-O-CH2-CH2-OH, and/or R3 is pyrene.

Item 30. The gene recognition reagent according to any one of items 1 to 29, wherein the linker comprises 5 to 25 atoms, or 1 to 10 C, O, P, N, and S atoms in total.

Item 31. The gene recognition reagent according to one of items 1 to 30, wherein the nucleobase sequence is fully complementary to a nucleic acid having an expanded repeat sequence associated with a repeat expansion disease, such as FRDA, FRAXA, FRAXE, SCA1, SCA2, SCA3 (MJD), SCA6, SCA7, SCA17, DRPLA, SBMA, HD, MD1, MD2, FXTAS, SCA8, SCA10, SCA12, HDL2, or ALS.

Item 32. The gene recognition reagent according to item 31, wherein the expanded repeat sequence has one of the following sequences: (GAA)n, (CGG)n, (CCG)n, (CAG)n, (CTG)n, (CCTG)n, (ATTCT)n, or (GGGGCC)n, wherein n is at least 3.

Item 33. A method of binding a nucleic acid, comprising contacting a nucleic acid having a target sequence with the gene recognition reagent of any one of items 1 to 32.

Item 34. The method of item 33, wherein the nucleobase sequence of the gene recognition reagent is fully complementary to a nucleic acid having an expanded repeat sequence associated with a repeat expansion disease, such as FRDA, FRAXA, FRAXE, SCA1, SCA2, SCA3 (MJD), SCA6, SCA7, SCA17, DRPLA, SBMA, HD, MD1, MD2, FXTAS, SCA8, SCA10, SCA12, HDL2, or ALS.

Item 35. The method of item 34, wherein the expanded repeat sequence has one of the following sequences: (GAA)n, (CGG)n, (CCG)n, (CAG)n, (CTG)n, (CCTG)n, (ATTCT)n, or (GGGGCC)n, wherein n is at least 3.

Item 36. A method of knocking down expression of an mRNA in a cell, comprising contacting a target sequence of the mRNA with the gene recognition reagent according to one of items 1 to 32 having a nucleobase sequence complementary to the target sequence.

Item 37. The method of item 36, wherein the nucleobase sequence of the gene recognition reagent is fully complementary to a nucleic acid having an expanded repeat sequence associated with a repeat expansion disease, such as FRDA, FRAXA, FRAXE, SCA1, SCA2, SCA3 (MJD), SCA6, SCA7, SCA17, DRPLA, SBMA, HD, MD1, MD2, FXTAS, SCA8, SCA10, SCA12, HDL2, or ALS.

Item 38. The method of item 37, wherein the expanded repeat sequence has one of the following sequences: (GAA)n, (CGG)n, (CCG)n, (CAG)n, (CTG)n, (CCTG)n, (ATTCT)n, or (GGGGCC)n, wherein n is at least 3.

Item 39. A method of identifying a target sequence of a nucleic acid in a sample, comprising: contacting a sample comprising a nucleic acid with the gene recognition reagent of item 1, wherein the aryl moiety is a fluorescent aromatic moiety that, when exposed to an excitation frequency of light, produces a first fluorescent emission when not bound to the target sequence and a second fluorescent emission, different from the first fluorescent emission, when bound to the target sequence; and determining the presence of the target sequence in the sample by exciting the fluorescent aromatic moiety and measuring the amount of the second fluorescent emission produced in the sample by the fluorescent aromatic moiety.

Item 40. The method of item 39, wherein the fluorescent aromatic moiety is pyrene.

Item 41. The method of item 39, wherein the nucleobase sequence of the gene recognition reagent is fully complementary to a nucleic acid having an expanded repeat sequence associated with a repeat expansion disease, such as FRDA, FRAXA, FRAXE, SCA1, SCA2, SCA3 (MJD), SCA6, SCA7, SCA17, DRPLA, SBMA, HD, MD1, MD2, FXTAS, SCA8, SCA10, SCA12, HDL2, or ALS.

Item 42. The method of item 39, wherein the expanded repeat sequence has one of the following sequences: (GAA)n, (CGG)n, (CCG)n, (CAG)n, (CTG)n, (CCTG)n, (ATTCT)n, or (GGGGCC)n, wherein n is at least 3.

Item 43. A composition comprising the gene recognition reagent according to any one of items 1 to 32 and a pharmaceutically acceptable carrier.

The invention has been described with reference to certain exemplary embodiments, dispersible compositions and uses thereof. Those skilled in the art will recognize, however, that various substitutions, modifications, or combinations of any of the exemplary embodiments can be made without departing from the spirit and scope of the invention. Therefore, the present invention is not limited by the description of the exemplary embodiments.

Sequence listing

<110> Carnegie Mellon University

<120> template-directed nucleic acid targeting compound

<130>6526-1807599

<150>US 62/708,789

<151>2017-12-21

<160>1

<170>PatentIn version 3.5

<210>1

<211>18

<212>RNA

<213> human

<400>1

cugcugcugc ugcugcug 18
