Method and device for screening isomiR molecular markers

Document No.: 1075104. Publication date: 2020-10-16.

Reading note: This technology, "Method and device for screening isomiR molecular markers", was created by 桑运霞, 吴伟静, 刘强, 宋青芳, and 阚海亮 on 2019-04-04. Abstract: The invention provides a method for screening isomiR molecular markers, comprising at least the following steps: acquiring, from a database, isomiR expression data of a disease sample to be detected and of a corresponding healthy sample; performing differential expression analysis on the isomiR expression data of the disease sample to be detected and the corresponding healthy sample to obtain, with the corresponding healthy sample as the control, the change rate of each isomiR in the disease sample to be detected; selecting isomiRs according to the change rate; performing target gene prediction on the selected isomiRs to obtain target genes; and performing functional analysis on the target genes and screening for target genes related to the disease to be detected, wherein the screened target genes and their corresponding isomiRs are the molecular markers of the disease to be detected. The analysis workflow of the method is clear and simple to implement, and the method can be widely applied in biological research and in clinically related applications.

1. A method of screening for isomiR molecular markers, the method comprising at least the steps of:

S1: acquiring, from a database, isomiR expression data of a disease sample to be detected and of a corresponding healthy sample;

S2: performing differential expression analysis on the isomiR expression data of the disease sample to be detected and the corresponding healthy sample to obtain, with the corresponding healthy sample as the control, the change rate of each isomiR in the disease sample to be detected;

S3: selecting isomiRs according to the change rate;

S4: performing target gene prediction on the isomiRs selected in step S3 to obtain target genes;

S5: performing functional analysis on the target genes and screening for target genes related to the disease to be detected, wherein the screened target genes and their corresponding isomiRs are the molecular markers of the disease to be detected.

2. The method of screening for isomiR molecular markers of claim 1, further comprising one or more of the following features:

a. in step S1, the database is selected from the GEO database, and/or the method for obtaining the isomiR expression data includes the following steps: acquiring raw small RNA sequencing data of the disease sample to be detected and the corresponding healthy sample, and aligning, quantifying, and annotating the data to obtain the isomiR expression data;

b. in step S2, the differential expression analysis method is a fold-change method, preferably with a 1.5-fold or 2-fold difference threshold;

c. in step S3, selecting isomiRs according to the change rate means ranking the results obtained in step S2 by change rate and selecting the top-ranked isomiRs;

d. in step S4, a target gene prediction website or prediction software is used to predict the target genes, and preferably the prediction software is miRanda;

e. in step S5, the functional analysis includes pathway analysis and/or construction of a regulatory network;

f. the disease to be detected is selected from tumor diseases.

3. The method of screening for isomiR molecular markers of claim 2, further comprising one or more of the following features:

g. in feature a, in step S1, the method for acquiring the raw small RNA sequencing data of the disease sample to be detected and the corresponding healthy sample comprises the following steps:

S1.1, obtaining isomiR sequencing data of the disease sample to be detected and the corresponding healthy sample;

S1.2, obtaining the SRA data links of the raw sequencing data;

S1.3, downloading the required raw sequencing data in batches using aspera;

S1.4, converting the data obtained in the preceding step into fastq format;

h. in feature b, the differential expression analysis method obtains the differentially expressed isomiRs after correcting the P-value by the Benjamini-Hochberg method, an FDR method, or the Bonferroni method;

i. in feature e, the pathway analysis is performed based on clusterProfiler;

j. in feature f, the tumor disease is myeloma.

4. The method for screening of isomiR molecular markers according to claim 3, wherein in feature i, the pathway analysis is performed using the KEGG database.

5. The method for screening isomiR molecular markers according to claim 1, wherein, in step S2, the method further comprises filtering the isomiR expression data of the disease sample to be detected and the corresponding healthy sample before the differential expression analysis.

6. A screening device for isomiR molecular markers, comprising at least:

an acquisition module for acquiring, from a database, isomiR expression data of a disease sample to be detected and of a corresponding healthy sample;

a differential expression analysis module for performing differential expression analysis on the isomiR expression data of the disease sample to be detected and the corresponding healthy sample to obtain, with the corresponding healthy sample as the control, the change rate of each isomiR in the disease sample to be detected;

a selection module for selecting isomiRs according to the change rate;

a target gene prediction module for performing target gene prediction on the isomiRs selected by the selection module to obtain target genes;

and a functional analysis module for performing functional analysis on the target genes and screening for target genes related to the disease to be detected, wherein the screened target genes and their corresponding isomiRs are the molecular markers of the disease to be detected.

7. The screening device for isomiR molecular markers of claim 6, further comprising one or more of the following features:

a. in the acquisition module, the database is selected from the GEO database, and/or the method for obtaining the isomiR expression data comprises the following steps: acquiring raw small RNA sequencing data of the disease sample to be detected and the corresponding healthy sample, and aligning, quantifying, and annotating the data to obtain the isomiR expression data;

b. in the differential expression analysis module, the differential expression analysis method is a fold-change method, preferably with a 1.5-fold or 2-fold difference threshold;

c. in the selection module, selecting isomiRs according to the change rate means ranking the results obtained in step S2 by change rate and selecting the top-ranked isomiRs;

d. in the target gene prediction module, a target gene prediction website or prediction software is used to predict the target genes, and preferably the prediction software is miRanda;

e. in the functional analysis module, the functional analysis comprises pathway analysis and/or construction of a regulatory network;

f. the disease to be detected is selected from tumor diseases.

8. The screening device for isomiR molecular markers of claim 7, further comprising one or more of the following features:

g. in feature a, the method used in the acquisition module for acquiring the isomiR expression data of the disease sample to be detected and the corresponding healthy sample comprises the following steps:

S1.1, obtaining isomiR sequencing data of the disease sample to be detected and the corresponding healthy sample;

S1.2, obtaining the SRA data links of the raw sequencing data;

S1.3, downloading the required raw sequencing data in batches using aspera;

S1.4, converting the data obtained in the preceding step into fastq format;

i. in feature e, the pathway analysis is performed based on clusterProfiler;

j. in feature f, the tumor disease is myeloma.

9. The screening device for isomiR molecular markers according to claim 8, wherein in feature i, the pathway analysis is performed using the KEGG database.

10. The screening device for isomiR molecular markers according to claim 6, wherein the differential expression analysis module is further used for filtering the isomiR expression data of the disease sample to be detected and the corresponding healthy sample before the differential expression analysis.

11. A computer-readable storage medium, on which a computer program is stored, which when executed by a processor, implements the method for screening of isomiR molecular markers according to any one of claims 1-5.

12. A computer processing device comprising a processor and the aforementioned computer readable storage medium, the processor executing a computer program on the computer readable storage medium to perform the steps of the method for screening of isomiR molecular markers according to any one of claims 1-5.

13. An electronic terminal, comprising: a processor, a memory, and a communicator; the memory is used for storing a computer program, the communicator is used for being in communication connection with an external device, and the processor is used for executing the computer program stored by the memory so as to enable the terminal to execute the method for screening the isomiR molecular marker according to any one of claims 1-5.

14. Use of the method for screening isomiR molecular markers of any one of claims 1-5, the device for screening isomiR molecular markers of any one of claims 6-10, the computer-readable storage medium of claim 11, the computer processing device of claim 12, or the electronic terminal of claim 13 in one or more of a biological targeted therapy system, a pathogenic mechanism system, and a pathogenic risk prediction system.

Technical Field

The invention relates to the field of genetic engineering, in particular to a method and a device for screening isomiR molecular markers.

Background

miRNAs are important regulators of life activities, and it was originally widely believed that one miRNA gene could form only one mature miRNA. However, recent studies have found that one miRNA gene can actually form multiple miRNA isoforms (isoforms of miRNAs, isomiRs) that differ in length or sequence.

In recent years, a rapidly growing body of research on isomiRs has indicated that many isomiRs are abnormally expressed in the serum or plasma of cancer patients, may participate in tumorigenesis and tumor progression as tumor suppressors or oncogenes, can circulate in body fluids in a stable form, and therefore have certain advantages in clinical diagnosis. IsomiR expression is cell-specific, tissue-specific, and disease-state-specific, and isomiRs can participate in cellular stress responses. The pathogenesis of many diseases is associated with changes in isomiRs or in their expression, so isomiRs may serve as markers for disease diagnosis or as targets for therapy. An isomiR sequence can bind to its target mRNA and thereby exert further biological effects.

The wide application of sequencing technology in recent years has facilitated the discovery of isomiRs and research into their functions, and provides a data basis for examining their sequence characteristics and verifying their expression patterns and biological functions. How to effectively mine the relevant sample data in public databases and screen for disease-related isomiR molecular markers remains a difficulty in the application of isomiRs.

Disclosure of Invention

In view of the above-mentioned drawbacks of the prior art, the present invention aims to provide a method and an apparatus for screening isomiR molecular markers.

To achieve the above and other related objects, a first aspect of the present invention provides a method for screening an isomiR molecular marker, the method comprising at least the steps of:

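S1: acquiring, from a database, isomiR expression data of a disease sample to be detected and of a corresponding healthy sample;

S2: performing differential expression analysis on the isomiR expression data of the disease sample to be detected and the corresponding healthy sample to obtain, with the corresponding healthy sample as the control, the change rate of each isomiR in the disease sample to be detected;

S3: selecting isomiRs according to the change rate;

S4: performing target gene prediction on the isomiRs selected in step S3 to obtain target genes;

S5: performing functional analysis on the target genes and screening for target genes related to the disease to be detected, wherein the screened target genes and their corresponding isomiRs are the molecular markers of the disease to be detected.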

The second aspect of the present invention provides an apparatus for screening isomiR molecular markers, comprising at least:

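an acquisition module for acquiring, from a database, isomiR expression data of a disease sample to be detected and of a corresponding healthy sample;

a differential expression analysis module for performing differential expression analysis on the isomiR expression data of the disease sample to be detected and the corresponding healthy sample to obtain, with the corresponding healthy sample as the control, the change rate of each isomiR in the disease sample to be detected;

a selection module for selecting isomiRs according to the change rate;

a target gene prediction module for performing target gene prediction on the isomiRs selected by the selection module to obtain target genes;

and a functional analysis module for performing functional analysis on the target genes and screening for target genes related to the disease to be detected, wherein the screened target genes and their corresponding isomiRs are the molecular markers of the disease to be detected.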

A third aspect of the present invention provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the aforementioned screening method for isomiR molecular markers.

In a fourth aspect, the present invention provides a computer processing device, which includes a processor and the aforementioned computer readable storage medium, wherein the processor executes a computer program on the computer readable storage medium to implement the steps of the aforementioned screening method for isomiR molecular markers.

A fifth aspect of the present invention provides an electronic terminal, comprising: a processor, a memory, and a communicator; the memory is used for storing a computer program, the communicator is used for being in communication connection with an external device, and the processor is used for executing the computer program stored by the memory so as to enable the terminal to execute the screening method of the isomiR molecular marker.

The sixth aspect of the present invention provides the use of the aforementioned screening method for isomiR molecular markers, screening apparatus for isomiR molecular markers, computer-readable storage medium, computer processing device, or electronic terminal in one or more of a biological targeted therapy system, a pathogenic mechanism system, and a pathogenic risk prediction system.

As described above, the method and apparatus for screening isomiR molecular markers according to the present invention have the following advantageous effects:

The method and device for screening isomiR molecular markers provided by the invention analyze and process isomiR expression data with bioinformatics methods based on public data resources and identify myeloma-related isomiRs. The invention identifies isomiRs and a number of risk genes related to myeloma, which is of significance for biological targeted therapy of myeloma, elucidation of its pathogenic mechanism, risk prediction, and the like. The invention addresses the problems faced by researchers who are not adept at integrating existing network resources, are unfamiliar with the most commonly used isomiR-related databases and current analysis methods, and cannot independently complete isomiR-related bioinformatics analysis. The invention employs a variety of bioinformatics tools, integrates authoritative and widely used public network resources, and establishes a complete analysis workflow that can perform systematic and comprehensive functional analysis of high-throughput isomiR data and find myeloma-related isomiR molecular markers. High-throughput data in public databases can thus be used effectively, reducing research cost and improving analysis efficiency. The analysis workflow is clear and simple to implement, and can be widely applied in biological research and in clinically related applications.

Drawings

FIG. 1 is a flow chart of a method for screening isomiR molecular markers according to an embodiment of the present invention.

FIG. 2 is a schematic diagram of a screening apparatus for isomiR molecular markers according to an embodiment of the present invention.

Fig. 3 is a schematic diagram of an electronic terminal according to an embodiment of the invention.

FIG. 4 shows a schematic diagram of a network in which myeloma isomiR is associated with target gene top 1.

FIG. 5 shows a schematic diagram of a network relating myeloma isomiR to target gene top 2.

Detailed Description

The embodiments of the present invention are described below with reference to specific embodiments, and other advantages and effects of the present invention will be easily understood by those skilled in the art from the disclosure of the present specification. The invention is capable of other and different embodiments and of being practiced or of being carried out in various ways, and its several details are capable of modification in various respects, all without departing from the spirit and scope of the present invention.

Please refer to FIG. 1 to FIG. 5. It should be noted that the drawings provided in the present embodiment only illustrate the basic idea of the present invention. The drawings show only the components related to the present invention and are not drawn according to the number, shape, and size of the components in an actual implementation; in an actual implementation, the type, quantity, and proportion of the components may be changed arbitrarily, and the layout of the components may be more complicated.

As shown in fig. 1, the method for screening isomiR molecular markers provided by the present invention is presented, and the method at least comprises the following steps:

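S1: acquiring, from a database, isomiR expression data of a disease sample to be detected and of a corresponding healthy sample;

S2: performing differential expression analysis on the isomiR expression data of the disease sample to be detected and the corresponding healthy sample to obtain, with the corresponding healthy sample as the control, the change rate of each isomiR in the disease sample to be detected;

S3: selecting isomiRs according to the change rate;

S4: performing target gene prediction on the isomiRs selected in step S3 to obtain target genes;

S5: performing functional analysis on the target genes and screening for target genes related to the disease to be detected, wherein the screened target genes and their corresponding isomiRs are the molecular markers of the disease to be detected.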

In one embodiment, in step S1, the database is selected from GEO databases.

In one embodiment, in step S1, the method for obtaining the isomiR expression data includes the following steps: acquiring raw small RNA sequencing data of the disease sample to be detected and the corresponding healthy sample, and aligning, quantifying, and annotating the data to obtain the isomiR expression data.

In one embodiment, in step S2, the differential expression analysis method is a fold-change method, preferably with a 1.5-fold or 2-fold difference threshold.

In one embodiment, selecting isomiRs according to the change rate in step S3 means ranking the results obtained in step S2 by change rate and selecting the top-ranked isomiRs.

In one embodiment, the differentially expressed isomiRs are sorted by the absolute value of the fold difference, with larger values ranked first, and a certain number of isomiRs are selected as the isomiRs to be studied.
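The fold-change thresholding and ranking described above can be sketched as follows. This is a minimal illustration rather than the implementation used in the invention: the function names, the use of group mean expression values, the log2 scale, and the pseudocount of 1 (added to avoid division by zero) are all assumptions introduced here.

```python
import math

def fold_changes(disease_mean, healthy_mean, pseudocount=1.0):
    """Return {isomiR: log2 fold change} of disease versus healthy (control).

    The pseudocount is an assumption of this sketch, not part of the source.
    """
    result = {}
    for name, d in disease_mean.items():
        h = healthy_mean.get(name, 0.0)
        result[name] = math.log2((d + pseudocount) / (h + pseudocount))
    return result

def select_top(log2fc, min_fold=2.0, top_n=10):
    """Keep isomiRs changed at least min_fold in either direction,
    ranked by absolute log2 fold change, largest first."""
    cutoff = math.log2(min_fold)
    passed = [(n, v) for n, v in log2fc.items() if abs(v) >= cutoff]
    passed.sort(key=lambda kv: abs(kv[1]), reverse=True)
    return passed[:top_n]

# Hypothetical mean expression values for three isomiRs.
disease = {"isomiR_A": 79.0, "isomiR_B": 9.0, "isomiR_C": 30.0}
healthy = {"isomiR_A": 9.0, "isomiR_B": 39.0, "isomiR_C": 30.0}
ranked = select_top(fold_changes(disease, healthy), min_fold=2.0, top_n=2)
```

With these hypothetical values, isomiR_A is up-regulated 8-fold, isomiR_B is down-regulated 4-fold, and the unchanged isomiR_C is dropped by the 2-fold threshold.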

In one embodiment, in step S4, a target gene prediction website or prediction software is used to predict the target genes, and preferably the prediction software is miRanda.

In one embodiment, in step S5, the functional analysis includes pathway analysis and/or construction of a regulatory network.

In one embodiment, the disease to be detected is selected from a neoplastic disease.

In one embodiment, the method for obtaining the isomiR expression data of the disease sample to be detected and the corresponding healthy sample in step S1 comprises the following steps:

S1.1, obtaining isomiR sequencing data of the disease sample to be detected and the corresponding healthy sample;

S1.2, obtaining the SRA data links of the raw sequencing data;

S1.3, downloading the required raw sequencing data in batches using aspera;

S1.4, converting the data obtained in the preceding step into fastq format.
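Steps S1.3 and S1.4 can be sketched as a small planner that only builds the command lines and does not run them. The sketch assumes that the aspera command-line client `ascp` and the SRA Toolkit program `fastq-dump` are installed; the source names aspera but does not name the conversion tool, so `fastq-dump` is an assumption, as are the function names, the transfer rate, and the placeholder link and key file shown in the usage note.

```python
from pathlib import Path

def build_download_cmd(sra_link, key_file, out_dir, max_rate="200m"):
    """Argument list for one ascp transfer; sra_link is a link obtained in step S1.2."""
    return ["ascp", "-i", str(key_file), "-k", "1", "-T", "-l", max_rate,
            sra_link, str(out_dir)]

def build_convert_cmd(sra_file, out_dir):
    """Argument list for converting one .sra file to gzipped fastq with fastq-dump."""
    return ["fastq-dump", "--gzip", "--outdir", str(out_dir), str(sra_file)]

def plan_batch(sra_links, key_file, out_dir):
    """Return the ordered commands for steps S1.3 and S1.4 without executing them."""
    out_dir = Path(out_dir)
    commands = []
    for link in sra_links:
        commands.append(build_download_cmd(link, key_file, out_dir))
        sra_name = link.rsplit("/", 1)[-1]          # file name at the end of the link
        commands.append(build_convert_cmd(out_dir / sra_name, out_dir))
    return commands
```

For example, `plan_batch(["user@host.example:/data/SRR0000001.sra"], "key.openssh", "raw")` yields one download command followed by one conversion command; each list could then be passed to `subprocess.run(cmd, check=True)`. The link and key file here are placeholders, not real locations.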

In one embodiment, the raw fastq data are aligned to the reference genome, annotated to identify isomiRs, and then quantified to obtain the isomiR expression values.

In one embodiment, the differential expression analysis method obtains the differentially expressed isomiRs after correcting the P-value by the Benjamini-Hochberg method, an FDR method, or the Bonferroni method.

In one embodiment, the pathway analysis is performed based on clusterProfiler.

In one embodiment, the neoplastic disease is selected from myeloma.

In one embodiment, the pathway analysis is performed using a KEGG database.

In one embodiment, step S2 further includes filtering the expression data of isomiR of the disease sample to be detected and the corresponding healthy sample before the differential expression analysis.

In one embodiment, filtering comprises removing the adapter and then removing reads in which more than 20% of the bases have a quality value below 20.
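The quality criterion of this filtering step can be sketched as below, reading the "value of less than 20" as a Phred base quality score. The sketch assumes Phred+33 encoded quality strings, which the source does not specify, and it does not show adapter removal; the function names are hypothetical.

```python
def low_quality_fraction(quality, threshold=20, offset=33):
    """Fraction of bases whose Phred score (ASCII code minus offset) is below threshold."""
    if not quality:
        return 1.0                      # treat an empty read as entirely low quality
    low = sum(1 for ch in quality if ord(ch) - offset < threshold)
    return low / len(quality)

def keep_read(quality, threshold=20, max_low_fraction=0.20):
    """True when no more than max_low_fraction of the bases fall below threshold."""
    return low_quality_fraction(quality, threshold) <= max_low_fraction

good = "IIIIIIIIII"        # Phred 40 at every position
borderline = "II##IIIIII"  # two of ten bases at Phred 2, exactly 20%
bad = "III###IIII"         # three of ten bases at Phred 2, 30%
```

Under this reading, `good` and `borderline` are kept, because neither exceeds 20% low-quality bases, and `bad` is removed.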

In one embodiment, the required research data are downloaded using the aspera software under Windows or Linux.

In one embodiment, false discovery rate correction is performed on the isomiR results on the R platform. The Benjamini-Hochberg, FDR, and Bonferroni methods can be used.
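The two named corrections can be sketched in a few lines. The source performs this step in R; the Python below is only an illustration of the arithmetic, with hypothetical function names. Note that in R's `p.adjust`, the method name "fdr" is an alias for "BH", so only two distinct procedures are shown.

```python
def bonferroni(p_values):
    """Multiply each P-value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]

def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted P-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])   # indices by ascending P
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down so that adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

p = [0.01, 0.04, 0.03, 0.005]
bh = benjamini_hochberg(p)     # approximately [0.02, 0.04, 0.04, 0.02]
bonf = bonferroni(p)           # approximately [0.04, 0.16, 0.12, 0.02]
```

Bonferroni is the more conservative of the two, which is why its adjusted values are never smaller than the Benjamini-Hochberg ones for the same input.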

In one embodiment, binding site prediction is performed on isomiR by the prediction software miRanda.

The GEO database is a gene expression database created and maintained by the National Center for Biotechnology Information (NCBI). Created in 2000, it contains high-throughput gene expression data submitted by research institutions around the world; the gene expression data associated with published papers can be found in this database, and its data volume grows year by year.

Aspera is high-speed download software recommended by NCBI for downloading large volumes of sequencing data.

The fastq format is the data format required by most sequencing analyses.
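For readers unfamiliar with the fastq format, the sketch below reads the common four-line record layout (an `@` header, the sequence, a `+` separator, and a quality string of the same length as the sequence). The function name and the example reads are hypothetical, and wrapped multi-line records are not handled.

```python
import io

def read_fastq(handle):
    """Yield (read_id, sequence, quality) from a four-line-per-record fastq stream."""
    while True:
        header = handle.readline()
        if not header:                  # end of file
            return
        sequence = handle.readline().rstrip("\n")
        handle.readline()               # '+' separator line, ignored
        quality = handle.readline().rstrip("\n")
        if not header.startswith("@") or len(sequence) != len(quality):
            raise ValueError("malformed fastq record: " + header.strip())
        yield header[1:].strip(), sequence, quality

example = io.StringIO(
    "@read1\nTGAGGTAGTAGG\n+\nIIIIIIIIIIII\n"
    "@read2\nACGT\n+\nII#I\n"
)
records = list(read_fastq(example))
```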

miRanda

miRanda is a target prediction program developed by Enright et al. in 2003 and is used here for isomiR target prediction. Its core idea is based on base complementarity, similar to the Smith-Waterman algorithm, but with a modified base-pairing rule that allows G-U wobble pairs. Because binding of an isomiR to its target site requires a high degree of matching at the 5' end, the software weights the scores of the 11 bases at the 5' end with a scale parameter. For binding energy, miRanda calculates the energy of the isomiR-target duplex using the RNAlib library of the ViennaRNA package. When multiple isomiRs target the same site, miRanda uses a greedy algorithm to choose the result with the highest score and the lowest binding energy.
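The complementarity scoring idea, with G-U pairs tolerated and the 11 bases at the 5' end weighted more heavily, can be illustrated with a deliberately simplified sketch. This is not miRanda's algorithm: it scores a single ungapped duplex of equal-length sequences, omits the dynamic-programming alignment and the binding energy calculation, and the numeric values (+5, +2, -3, scale 4.0) are illustrative assumptions.

```python
# Illustrative pair scores (assumed values): Watson-Crick pairs score highest,
# G-U wobble pairs score lower, and every other combination is a mismatch.
PAIR_SCORES = {("G", "C"): 5, ("C", "G"): 5, ("A", "U"): 5, ("U", "A"): 5,
               ("G", "U"): 2, ("U", "G"): 2}
MISMATCH = -3
SEED_LENGTH = 11    # number of 5' bases whose scores are scaled
SEED_SCALE = 4.0    # assumed scale factor

def duplex_score(small_rna, target_site):
    """Ungapped complementarity score; both sequences are written 5'->3' in RNA letters."""
    paired = target_site[::-1]          # antiparallel pairing: read the site 3'->5'
    if len(paired) != len(small_rna):
        raise ValueError("sequences must have equal length in this ungapped sketch")
    total = 0.0
    for pos, (a, b) in enumerate(zip(small_rna, paired)):
        score = PAIR_SCORES.get((a, b), MISMATCH)
        total += score * SEED_SCALE if pos < SEED_LENGTH else score
    return total

rna = "UGAGGUAGUAGGUU"                  # hypothetical 14-nt small RNA
perfect_site = "AACCUACUACCUCA"         # its exact reverse complement
wobble_site = "AACCUACUACCUUA"          # G-U pair at the second 5' position
tail_mismatch_site = "CACCUACUACCUCA"   # mismatch at the last 3' position
```

Comparing `duplex_score(rna, perfect_site)` with the other two sites shows the intended behaviour: a wobble pair inside the weighted 5' region costs more than a full mismatch in the unweighted 3' tail.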

In one embodiment of the invention, KEGG annotation and enrichment analysis of the target genes is performed using the clusterProfiler package on the R platform.
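Pathway enrichment of this kind is commonly computed as an over-representation test based on the hypergeometric distribution. The sketch below shows that calculation for a single pathway; it is written in Python for illustration only and is not the clusterProfiler code, and the function name and the example counts are hypothetical.

```python
from math import comb

def enrichment_p_value(background, in_pathway, selected, overlap):
    """P(X >= overlap) when `selected` genes are drawn without replacement from
    `background` genes, of which `in_pathway` belong to the pathway."""
    upper = min(in_pathway, selected)
    total = comb(background, selected)
    tail = sum(comb(in_pathway, k) * comb(background - in_pathway, selected - k)
               for k in range(overlap, upper + 1))
    return tail / total

# Hypothetical counts: 20 background genes, 5 in the pathway,
# 4 predicted target genes, 2 of which fall in the pathway.
p_value = enrichment_p_value(background=20, in_pathway=5, selected=4, overlap=2)
```

With these counts the P-value is 1205/4845, roughly 0.249, so this overlap would not be considered enriched. In practice the P-values for all tested pathways would then be corrected for multiple testing.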

KEGG database

KEGG was established in 1995 by the Kanehisa Laboratory at the Bioinformatics Center of Kyoto University, Japan. It is one of the most commonly used bioinformatics databases in the world, is described as a resource for understanding the high-level functions and utilities of biological systems, and is among the most widely used and authoritative databases in the field of metabolic analysis. Its contents fall roughly into three categories: systems information, genomic information, and chemical information, which are further subdivided into 16 main databases. For example, genomic information is stored in the GENES database, which includes complete and partially sequenced genome sequences; higher-order functional information is stored in the PATHWAY database, which includes diagrams of cellular biochemical processes such as metabolism, membrane transport, signal transduction, and the cell cycle, as well as conserved sub-pathways shared among homologs; another KEGG database, LIGAND, contains information about chemical substances, enzyme molecules, enzymatic reactions, and so on.

In one embodiment of the invention, after the pathway-related information of the genes is obtained, it is combined with the differentially expressed isomiRs to generate a network file containing the regulatory network information of the isomiRs and genes. The file can be opened and displayed graphically using the Cytoscape software.
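One simple file type that Cytoscape can import is the Simple Interaction Format (SIF), in which each line holds a source node, an interaction type, and a target node. The source does not say which network file format is used, so SIF, the function name, the "targets" relation label, and the example node names below are assumptions made for illustration.

```python
def to_sif(edges, relation="targets"):
    """Render (isomiR, gene) pairs as SIF lines: source<TAB>relation<TAB>target.

    Duplicate pairs are written once, and lines are sorted for reproducible output.
    """
    unique = sorted(set(edges))
    return "".join(f"{src}\t{relation}\t{dst}\n" for src, dst in unique)

# Hypothetical isomiR-gene pairs, including one duplicate.
edges = [("isomiR_A", "GENE1"), ("isomiR_A", "GENE2"),
         ("isomiR_B", "GENE1"), ("isomiR_A", "GENE1")]
sif_text = to_sif(edges)
```

Writing `sif_text` to a file with a `.sif` extension gives a network that can be imported into Cytoscape, where isomiRs and genes appear as nodes joined by "targets" edges.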

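FIG. 2 shows a screening device for isomiR molecular markers according to an embodiment of the present invention, the device comprising at least:

an acquisition module for acquiring, from a database, isomiR expression data of a disease sample to be detected and of a corresponding healthy sample;

a differential expression analysis module for performing differential expression analysis on the isomiR expression data of the disease sample to be detected and the corresponding healthy sample to obtain, with the corresponding healthy sample as the control, the change rate of each isomiR in the disease sample to be detected;

a selection module for selecting isomiRs according to the change rate;

a target gene prediction module for performing target gene prediction on the isomiRs selected by the selection module to obtain target genes;

and a functional analysis module for performing functional analysis on the target genes and screening for target genes related to the disease to be detected, wherein the screened target genes and their corresponding isomiRs are the molecular markers of the disease to be detected.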

In one embodiment, in the acquisition module, the database is selected from the GEO database.

In one embodiment, in the acquisition module, the method for obtaining the isomiR expression data includes the following steps: acquiring raw small RNA sequencing data of the disease sample to be detected and the corresponding healthy sample, and aligning, quantifying, and annotating the data to obtain the isomiR expression data.

In one embodiment, in the differential expression analysis module, the differential expression analysis method is a fold-change method, preferably with a 1.5-fold or 2-fold difference threshold.

In one embodiment, in the selection module, selecting isomiRs according to the change rate means ranking the results obtained in step S2 by change rate and selecting the top-ranked isomiRs.

In one embodiment, the target gene prediction module uses a target gene prediction website or prediction software to predict the target genes, and preferably the prediction software is miRanda.

In one embodiment, in the functional analysis module, the functional analysis comprises pathway analysis and/or construction of a regulatory network.

In one embodiment, the disease to be detected is selected from a neoplastic disease.

In one embodiment, in the acquisition module, the method for obtaining the isomiR expression data of the disease sample to be detected and the corresponding healthy sample comprises the following steps:

S1.1, obtaining isomiR sequencing data of the disease sample to be detected and the corresponding healthy sample;

S1.2, obtaining the SRA data links of the raw sequencing data;

S1.3, downloading the required raw sequencing data in batches using aspera;

S1.4, converting the data obtained in the preceding step into fastq format.

In one embodiment, the raw fastq data are aligned to the reference genome, annotated to identify isomiRs, and then quantified to obtain the isomiR expression values.

In one embodiment, the differential expression analysis method obtains the differentially expressed isomiRs after correcting the P-value by the Benjamini-Hochberg method, an FDR method, or the Bonferroni method.

In one embodiment, the pathway analysis is performed based on clusterProfiler.

In one embodiment, the neoplastic disease is selected from myeloma.

In one embodiment, the pathway analysis is performed using a KEGG database.

In one embodiment, the differential expression analysis module further comprises a step of filtering the expression data of the isomiR of the disease sample to be detected and the corresponding health sample before the differential expression analysis.

In one embodiment, the required research data are downloaded using the aspera software under Windows or Linux.

In one embodiment, false discovery rate correction is performed on the isomiR results on the R platform. The Benjamini-Hochberg, FDR, and Bonferroni methods can be used.

In one embodiment, binding site prediction is performed on isomiR by the prediction software miRanda.

In one embodiment of the invention, KEGG annotation and enrichment analysis of the target genes is performed using the clusterProfiler package on the R platform.

In one embodiment, filtering comprises removing the adapter and then removing reads in which more than 20% of the bases have a quality value below 20.

Since the principle of the apparatus in this embodiment is basically the same as that of the foregoing method embodiment, in the foregoing method and apparatus embodiment, the definitions of the same features, the calculation method, the enumeration of the embodiments, and the enumeration and description of the preferred embodiments may be used interchangeably, and are not repeated again.

It should be noted that the division of the modules of the above apparatus is only a logical division, and the actual implementation may be wholly or partially integrated into one physical entity, or may be physically separated. These modules may all be implemented in software invoked by a processing element; or may be implemented entirely in hardware; and part of the modules can be realized in the form of calling software by the processing element, and part of the modules can be realized in the form of hardware. For example, the obtaining module may be a processing element that is set up separately, or may be implemented by being integrated in a certain chip, or may be stored in a memory in the form of program code, and the certain processing element calls and executes the functions of the obtaining module. Other modules are implemented similarly. In addition, all or part of the modules can be integrated together or can be independently realized. The processing element described herein may be an integrated circuit having signal processing capabilities. In implementation, each step of the above method or each module above may be implemented by an integrated logic circuit of hardware in a processor element or an instruction in the form of software.

For example, the above modules may be one or more integrated circuits configured to implement the above methods, such as one or more Application Specific Integrated Circuits (ASICs), one or more Digital Signal Processors (DSPs), or one or more Field Programmable Gate Arrays (FPGAs). For another example, when one of the above modules is implemented in the form of program code invoked by a processing element, the processing element may be a general-purpose processor, such as a Central Processing Unit (CPU) or another processor capable of calling program code. For another example, these modules may be integrated together and implemented in the form of a system-on-a-chip (SOC).

In some embodiments of the present invention, there is also provided a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the aforementioned screening method for isomiR molecular markers.

In some embodiments of the present invention, there is also provided a computer processing device, including a processor and the aforementioned computer readable storage medium, wherein the processor executes a computer program on the computer readable storage medium to implement the steps of the aforementioned screening method for isomiR molecular markers.

In some embodiments of the present invention, there is also provided an electronic terminal, including: a processor, a memory, and a communicator; the memory is used for storing a computer program, the communicator is used for being in communication connection with an external device, and the processor is used for executing the computer program stored by the memory so as to enable the terminal to execute the screening method for realizing the isomiR molecular marker.

FIG. 3 shows a schematic diagram of the electronic terminal provided by the present invention. The electronic terminal comprises a processor 31, a memory 32, a communicator 33, a communication interface 34 and a system bus 35. The memory 32 and the communication interface 34 are connected to the processor 31 and the communicator 33 through the system bus 35 so that they can communicate with one another. The memory 32 stores the computer program, the communicator 33 and the communication interface 34 communicate with other devices, and the processor 31 and the communicator 33 run the computer program so that the electronic terminal executes the steps of the screening method for isomiR molecular markers.

The above-mentioned system bus may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The system bus may be divided into an address bus, a data bus, a control bus, and so on. For ease of illustration, only one thick line is shown in the figure, but this does not mean that there is only one bus or one type of bus. The communication interface is used for communication between the electronic terminal and other equipment (such as a client, a read-write library and a read-only library). The memory may include a Random Access Memory (RAM), and may further include a non-volatile memory, such as at least one disk memory.

The processor may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.

Those of ordinary skill in the art will understand that all or part of the steps of the above method embodiments may be performed by hardware under the control of a computer program. The computer program may be stored in a computer-readable storage medium and, when executed, performs the steps of the method embodiments described above. The computer-readable storage medium may include, but is not limited to, floppy disks, optical disks, CD-ROMs (compact disc read-only memories), magneto-optical disks, ROMs (read-only memories), RAMs (random access memories), EPROMs (erasable programmable read-only memories), EEPROMs (electrically erasable programmable read-only memories), magnetic or optical cards, flash memory, or any other type of machine-readable medium suitable for storing machine-executable instructions. The computer-readable storage medium may be a product that has not yet been connected to a computer device, or a component of a computer device that is already in use.

In particular implementations, the computer programs are routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types.

The invention also provides the application of the screening method of the isomiR molecular marker, the screening device of the isomiR molecular marker, the computer readable storage medium, the computer processing equipment or the electronic terminal in one or more of a biological targeted therapy system, a pathogenic mechanism system and a pathogenic risk prediction system.

Examples of the invention

First, the raw data are converted from SRA to FASTQ format and filtered: low-quality reads are removed and other small RNAs are filtered out, giving normalized expression values for the valid isomiRs, which are then annotated. Based on the results of the isomiR differential analysis, target genes can be predicted from the sequence characteristics of the isomiRs. On the basis of these analyses, a series of statistical and visual analyses can then be performed.
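The read-filtering step can be illustrated with a minimal sketch. The description does not state which tool or thresholds were used, so everything below is an assumption made for illustration only: Phred+33 quality encoding, a mean-quality cutoff of 20, and a small-RNA length window of 16 to 30 nt. The function names are hypothetical.

```python
from typing import Iterable, Iterator, List, Tuple

Read = Tuple[str, str, str]  # (header, sequence, quality string)


def parse_fastq(lines: Iterable[str]) -> Iterator[Read]:
    """Yield (header, sequence, quality) from four-line FASTQ records."""
    record: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        record.append(line)
        if len(record) == 4:
            header, seq, _separator, qual = record
            yield header, seq, qual
            record = []


def mean_phred(qual: str, offset: int = 33) -> float:
    """Mean Phred score of a quality string (Phred+33 encoding assumed)."""
    return sum(ord(ch) - offset for ch in qual) / len(qual)


def keep_read(seq: str, qual: str, min_mean_q: float = 20.0,
              min_len: int = 16, max_len: int = 30) -> bool:
    """Hypothetical filter: small-RNA-sized read with acceptable mean quality."""
    if not (min_len <= len(seq) <= max_len) or not qual:
        return False
    return mean_phred(qual) >= min_mean_q


def filter_reads(lines: Iterable[str]) -> List[Read]:
    """Return only the reads that pass the assumed length and quality filter."""
    return [read for read in parse_fastq(lines) if keep_read(read[1], read[2])]


high = "TGAGGTAGTAGTTTGTACAGTTAGA"  # sequence taken from Table 1
low = "AAAGGCGGGAGAAGCCCCGGC"       # sequence taken from Table 1
example = [
    "@read1", high, "+", "I" * len(high),  # 'I' encodes Phred 40
    "@read2", low, "+", "#" * len(low),    # '#' encodes Phred 2
    "@read3", "ACGT", "+", "IIII",         # too short for a small RNA
]
kept = filter_reads(example)
print([header for header, _seq, _qual in kept])
```

Only the first read survives in this toy input: the second fails the quality cutoff and the third fails the length window.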

1. The isomiR annotation file is shown in Table 1

Analysis platform: R

Analysis software: isomiRs package

Table 1:

seq	name	freq	mir	start	end	mism	add	t5	t3
AAAGGCGGGAGAAGCCCCGGC	seq_100022_x1	1	hsa-miR-4484	65	82	0	GGC	a	a
TGAGGTAGTAGTTTGTACAGTTAGA	seq_100036_x2	2	hsa-let-7g-5p	5	26	0	AGA	0	0
CATAAAGTAGAAAGCACT	seq_100064_x1002	1002	hsa-miR-142-5p	16	33	0	0	0	act
CGGCCCGGGCTGCTGCTGTTC	seq_100088_x1	1	hsa-miR-1538	39	59	0	0	0	ct
AACATTCAACGCTGTCGGTGAT	seq_100091_x51	51	hsa-miR-181a-5p	39	59	0	T	0	gt

Explanation of column names:

seq: sequence

freq/name: depending on the input, this column contains counts (tabular input file) or the name (FASTA file)

mir: miRNA name

start: start of the sequence on the precursor

end: end of the sequence on the precursor

mism: nucleotide substitution, given as position | nucleotide in the sequence | nucleotide in the precursor

add: nucleotides added at the 3' end

t5: nucleotides at the 5' end that differ from the annotated sequence in miRBase

t3: nucleotides at the 3' end that differ from the annotated sequence in miRBase
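As an illustration of how the annotation columns relate to the isomiR names used in the later tables, the sketch below reads tab-separated rows in the Table 1 layout and builds a label of the form `mir.t5:…​.t3:…​.ad:…`. That label format is inferred from the names that appear in Table 2; treating it as the naming rule, and ignoring the `mism` column, are assumptions. The function names are hypothetical and this is not the isomiRs package's own code.

```python
import csv
from collections import Counter
from typing import Dict, List

COLUMNS = ["seq", "name", "freq", "mir", "start", "end", "mism", "add", "t5", "t3"]


def read_annotation(text: str) -> List[Dict[str, str]]:
    """Parse header-less, tab-separated rows laid out like Table 1."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    return list(csv.DictReader(lines, fieldnames=COLUMNS, delimiter="\t"))


def isomir_label(row: Dict[str, str]) -> str:
    """Build an isomiR name from the miRNA name and its 5'/3'/addition changes."""
    return f"{row['mir']}.t5:{row['t5']}.t3:{row['t3']}.ad:{row['add']}"


def counts_by_isomir(rows: List[Dict[str, str]]) -> Counter:
    """Sum read counts (freq) over all sequences that share an isomiR label."""
    counts: Counter = Counter()
    for row in rows:
        counts[isomir_label(row)] += int(row["freq"])
    return counts


# Three rows copied from Table 1.
table_text = "\n".join("\t".join(fields) for fields in [
    ("AAAGGCGGGAGAAGCCCCGGC", "seq_100022_x1", "1", "hsa-miR-4484",
     "65", "82", "0", "GGC", "a", "a"),
    ("TGAGGTAGTAGTTTGTACAGTTAGA", "seq_100036_x2", "2", "hsa-let-7g-5p",
     "5", "26", "0", "AGA", "0", "0"),
    ("CATAAAGTAGAAAGCACT", "seq_100064_x1002", "1002", "hsa-miR-142-5p",
     "16", "33", "0", "0", "0", "act"),
])
counts = counts_by_isomir(read_annotation(table_text))
for label, freq in counts.items():
    print(label, freq)
```

Aggregating by label rather than by raw sequence is what would produce one expression value per isomiR for the downstream differential analysis.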

2. The differentially expressed isomiR results are shown in Table 2

Analysis platform: R

Analysis software: DESeq2 package

Table 2:

row	baseMean	log2FoldChange	lfcSE	stat	pvalue	padj
hsa-miR-4485-3p.t5:GTT.t3:taa.ad:0	86.2160	6.6277	1.0032	6.6066	3.9326E-11	1.8981E-08
hsa-miR-6503-5p.t5:0.t3:a.ad:T	19.2697	5.4067	1.1283	4.7919	1.6519E-06	1.0631E-04
hsa-miR-223-3p.t5:0.t3:0.ad:TG	33.6162	5.3714	0.9043	5.9400	2.8502E-09	6.8785E-07
hsa-miR-27a-3p.t5:0.t3:cgc.ad:GGC	18.6298	5.3470	1.1345	4.7131	2.4401E-06	1.4133E-04
hsa-miR-27b-3p.t5:0.t3:tgc.ad:GGC	18.6298	5.3470	1.1345	4.7131	2.4401E-06	1.4133E-04

Explanation of column names:

row: the isomiR name

baseMean: mean of normalized counts across all samples

log2FoldChange: log2 ratio of treatment versus control

lfcSE: standard error of the log2FoldChange

stat: for the Wald test, stat is the Wald statistic, i.e. the log2FoldChange divided by lfcSE, which is compared to a standard normal distribution to generate a two-tailed p-value

pvalue: p-value of the test statistic

padj: adjusted p-value
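The relationship between the `log2FoldChange`, `lfcSE`, `stat` and `pvalue` columns can be checked by hand. The sketch below is only an illustration of the Wald test as described in the column explanation, not DESeq2's implementation; the function name is hypothetical. It uses the identity that the two-tailed p-value of a standard normal statistic z equals erfc(|z| / √2).

```python
import math
from typing import Tuple


def wald_test(log2_fold_change: float, lfc_se: float) -> Tuple[float, float]:
    """Return (Wald statistic, two-tailed p-value under a standard normal)."""
    stat = log2_fold_change / lfc_se
    # 2 * (1 - Phi(|z|)) written with the complementary error function,
    # which stays accurate for very small tail probabilities.
    pvalue = math.erfc(abs(stat) / math.sqrt(2.0))
    return stat, pvalue


# First row of Table 2: log2FoldChange = 6.6277, lfcSE = 1.0032.
stat, pvalue = wald_test(6.6277, 1.0032)
print(f"stat = {stat:.4f}, pvalue = {pvalue:.4e}")
```

For the first row of Table 2 this reproduces the tabulated statistic (6.6066) and agrees with the tabulated p-value (3.9326E-11) to within the rounding of the inputs.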

3. Target gene prediction and screening of isomiRs

Target genes of the isomiRs are predicted using the miRanda algorithm.

Method description:

The miRanda algorithm is a method based on the free energy of binding at the site and on sequence complementarity scores. By default, the parameters used are strict seed-sequence complementary pairing, a score greater than 140, and a minimum free energy of -20 kJ/mol.
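A minimal sketch of applying the two numeric thresholds to a list of predicted hits is given below. It assumes that "a score greater than 140" means `score > 140` and that "a minimum free energy of -20" means keeping hits whose energy is -20 or lower (more stable binding); both readings are interpretations of the text. The `Hit` type and function name are hypothetical, the first two hits use values from Table 3, and the last two are invented to show rejections.

```python
from typing import List, NamedTuple


class Hit(NamedTuple):
    isomir: str
    target: str
    score: float
    energy: float


def passes_thresholds(hit: Hit, min_score: float = 140.0,
                      max_energy: float = -20.0) -> bool:
    """Keep hits that score above min_score and bind at max_energy or lower."""
    return hit.score > min_score and hit.energy <= max_energy


hits: List[Hit] = [
    Hit("hsa-miR-186-5p.t5:c.t3:t.ad:0", "CDK8", 182, -21.5),    # from Table 3
    Hit("hsa-miR-186-5p.t5:c.t3:t.ad:0", "RIPK4", 181, -22.53),  # from Table 3
    Hit("hypothetical-isomiR", "GENE_X", 150, -12.0),  # invented: binding too weak
    Hit("hypothetical-isomiR", "GENE_Y", 135, -25.0),  # invented: score too low
]
kept_hits = [hit for hit in hits if passes_thresholds(hit)]
print([hit.target for hit in kept_hits])
```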

Analysis platform: Linux

Results:

Table 3: miRanda results

Seq1	Seq2	Tot Score	Tot Energy	Max Score	Max Energy	Len1	Len2	Positions
hsa-miR-186-5p.t5:c.t3:t.ad:0	ENST00000536792.5_CDK8::chr13:26401269-26401347(+)	182	-21.5	182	-21.5	20	79	27
hsa-miR-186-5p.t5:c.t3:t.ad:0	ENST00000352483.3_RIPK4::chr21:41739369-41740837(-)	181	-22.53	181	-22.53	20	1469	662
hsa-miR-186-5p.t5:c.t3:t.ad:0	ENST00000332512.7_RIPK4::chr21:41739369-41740837(-)	181	-22.53	181	-22.53	20	1469	662
hsa-miR-186-5p.t5:c.t3:0.ad:0	ENST00000536792.5_CDK8::chr13:26401269-26401347(+)	187	-21.5	187	-21.5	21	79	26
hsa-miR-186-5p.t5:c.t3:0.ad:0	ENST00000352483.3_RIPK4::chr21:41739369-41740837(-)	181	-22.53	181	-22.53	21	1469	661

Explanation of column names:

Seq1: search sequence

Seq2: target sequence

Tot Score: total score of all sites

Tot Energy: total energy of all sites

Max Score: maximum score among the binding sites

Max Energy: maximum energy among the binding sites; a negative value is required for filtering to occur

Len1: length of Seq1

Len2: length of Seq2

Positions: binding sites
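The `Seq2` field packs a transcript ID, a gene symbol and genomic coordinates into one string. The sketch below splits it apart, assuming the layout `transcript_gene::chrom:start-end(strand)` as seen in the Table 3 rows; the pattern, type and function names are hypothetical. Any whitespace inside the field is removed before matching.

```python
import re
from typing import NamedTuple


class Target(NamedTuple):
    transcript: str
    gene: str
    chrom: str
    start: int
    end: int
    strand: str


SEQ2_PATTERN = re.compile(
    r"^(?P<transcript>[^_]+)_(?P<gene>.+?)::(?P<chrom>[^:]+):"
    r"(?P<start>\d+)-(?P<end>\d+)\((?P<strand>[+-])\)$"
)


def parse_seq2(field: str) -> Target:
    """Split a Seq2 identifier into transcript, gene and genomic interval."""
    compact = "".join(field.split())  # drop stray whitespace inside the field
    match = SEQ2_PATTERN.match(compact)
    if match is None:
        raise ValueError(f"unrecognized Seq2 field: {field!r}")
    return Target(match["transcript"], match["gene"], match["chrom"],
                  int(match["start"]), int(match["end"]), match["strand"])


target = parse_seq2("ENST00000536792.5_CDK8::chr13:26401269- 26401347(+)")
span = target.end - target.start + 1  # inclusive interval length
print(target.gene, target.chrom, span)
```

For the CDK8 row the inclusive interval length is 79, which matches the `Len2` value in Table 3; the same holds for the RIPK4 rows (1469).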

4. Functional analysis

KEGG pathway analysis of the target genes was performed using clusterProfiler. This reveals the predicted associations between isomiRs and cancer genes, as well as the risk pathways that link important genes. These associations and pathway connections are taken to indicate the origin of the disease.

Analysis software: clusterProfiler package for R
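Pathway over-representation analysis of this kind is based on a hypergeometric test: given a background of N genes of which K belong to a pathway, it asks how likely it is to see at least k pathway genes among n target genes by chance. The sketch below computes that tail probability directly. It is an illustration of the statistic only, not clusterProfiler's code, and all numbers in the example are invented.

```python
from math import comb


def enrichment_pvalue(overlap: int, n_targets: int,
                      pathway_size: int, background: int) -> float:
    """P(X >= overlap) for X ~ Hypergeometric(background, pathway_size, n_targets)."""
    upper = min(n_targets, pathway_size)
    favourable = sum(
        comb(pathway_size, i) * comb(background - pathway_size, n_targets - i)
        for i in range(overlap, upper + 1)
    )
    return favourable / comb(background, n_targets)


# Invented example: 3 of 50 predicted target genes fall in a 36-gene pathway,
# against a background of 8000 annotated genes.
p = enrichment_pvalue(overlap=3, n_targets=50, pathway_size=36, background=8000)
print(f"p = {p:.3g}")
```

A small p-value indicates that the pathway contains more target genes than expected by chance; in practice the p-values would also be adjusted for the number of pathways tested.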

Results:

Table 4: pathway enrichment analysis

isomiR	ID	Description	pvalue	geneID
hsa-miR-221-5p.t5:0.t3:C.ad:0	hsa04360	Axon guidance	0.0234396	9037
hsa-miR-378a-3p.t5:0.t3:0.ad:0	hsa04921	Oxytocin signaling pathway	0.020359	5021
hsa-miR-378a-3p.t5:0.t3:0.ad:A	hsa04080	Neuroactive ligand-receptor interaction	0.0371015	5021
hsa-miR-574-5p.t5:0.t3:gt.ad:0	hsa04144	Endocytosis	0.0326815	23362
hsa-miR-92a-3p.t5:0.t3:0.ad:AAA	hsa03030	DNA replication	0.0048219	4171

Explanation of column names:

isomiR: isomiR name

ID: pathway ID

Description: pathway description

pvalue: p-value

geneID: target gene ID

5. Construction of the isomiR potential functional regulatory network

Analysis platform: R

Graphics software: Cytoscape

A schematic of the top-1 network relating myeloma isomiRs to target genes is shown in FIG. 4. It shows that hsa-miR-92a-3p.t5:0.t3:0.ad:AAA, hsa-miR-92a-3p.t5:0.t3:0.ad:AGA and hsa-miR-92a-3p.t5:0.t3:0.ad:AGT are associated with target genes in myeloma samples, and pathway enrichment finds them to be the most relevant to the disease.

A schematic of the top-2 network relating myeloma isomiRs to target genes is shown in FIG. 5. It shows that hsa-miR-378a-3p.t5:0.t3:0.ad:0 and hsa-miR-378a-3p.t5:0.t3:0.ad:A are associated with target genes in myeloma samples, as found through pathway enrichment.
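One way to hand such isomiR, gene and pathway relationships to Cytoscape is a simple interaction format (SIF) file, in which each line reads `source<TAB>interaction<TAB>target`. The description does not say how the network was actually exported, so the sketch below is an assumption; the interaction names `targets` and `in_pathway` and the function name are hypothetical. The input rows are taken from Table 4.

```python
from typing import Iterable, List, Tuple

Row = Tuple[str, str, str]  # (isomiR name, KEGG pathway ID, target gene ID)


def to_sif(rows: Iterable[Row]) -> List[str]:
    """Turn enrichment rows into de-duplicated SIF edge lines."""
    edges: List[str] = []
    seen = set()
    for isomir, pathway, gene in rows:
        for edge in ((isomir, "targets", gene), (gene, "in_pathway", pathway)):
            if edge not in seen:  # an edge may recur across rows
                seen.add(edge)
                edges.append("\t".join(edge))
    return edges


rows = [
    ("hsa-miR-378a-3p.t5:0.t3:0.ad:0", "hsa04921", "5021"),
    ("hsa-miR-378a-3p.t5:0.t3:0.ad:A", "hsa04080", "5021"),
    ("hsa-miR-92a-3p.t5:0.t3:0.ad:AAA", "hsa03030", "4171"),
]
sif_lines = to_sif(rows)
print("\n".join(sif_lines))
```

Writing `sif_lines` to a `.sif` file gives a network in which isomiRs that share a target gene, such as the two hsa-miR-378a-3p variants here, are connected through that gene's node.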

The foregoing embodiments merely illustrate the principles and utility of the present invention and are not intended to limit it. Any person skilled in the art may modify or change the above embodiments without departing from the spirit and scope of the present invention. Accordingly, all equivalent modifications or changes made by those skilled in the art without departing from the spirit and technical ideas disclosed in the present invention shall be covered by the claims of the present invention.
