Method and device for screening isomiR molecular markers
Reading notes: this technology, "Method and device for screening isomiR molecular markers", was designed by 桑运霞, 吴伟静, 刘强, 宋青芳 and 阚海亮 on 2019-04-04. Main content: The invention provides a method for screening isomiR molecular markers, comprising at least the following steps: acquiring isomiR expression data of a disease sample to be detected and of a corresponding healthy sample by using a database; performing differential expression analysis on the isomiR expression data of the two samples to obtain the change rate of each isomiR in the disease sample, taking the corresponding healthy sample as a control; selecting isomiRs according to the change rate; performing target gene prediction on the selected isomiRs to obtain target genes; and performing functional analysis on the target genes and screening out those related to the disease to be detected, the screened target genes and their corresponding isomiRs being the molecular markers of the disease. The analysis workflow is clear and simple to implement, and can be widely applied in biological research and in clinically related applications.
1. A method of screening for isomiR molecular markers, the method comprising at least the steps of:
S1: acquiring isomiR expression data of a disease sample to be detected and of a corresponding healthy sample by using a database;
S2: performing differential expression analysis on the isomiR expression data of the disease sample to be detected and the corresponding healthy sample to obtain the change rate of each isomiR in the disease sample to be detected, taking the corresponding healthy sample as a control;
S3: selecting isomiRs according to the change rate;
S4: performing target gene prediction on the isomiRs selected in S3 to obtain target genes;
S5: performing functional analysis on the target genes and screening out the target genes related to the disease to be detected, wherein the screened target genes and their corresponding isomiRs are the molecular markers of the disease to be detected.
2. The method of screening for isomiR molecular markers of claim 1, further comprising one or more of the following features:
a. in step S1, the database is the GEO database, and/or the method for obtaining the isomiR expression data comprises the following steps: acquiring raw small RNA sequencing data of the disease sample to be detected and the corresponding healthy sample, and aligning, quantifying and annotating the data to obtain the isomiR expression data;
b. in step S2, the differential expression analysis uses a fold-change method, preferably with a fold-change threshold of 1.5 or 2;
c. in step S3, selecting isomiRs according to the change rate means that the results obtained in step S2 are ranked by change rate and the top-ranked isomiRs are selected;
d. in step S4, a target gene prediction website or prediction software is used to predict the target genes, preferably the prediction software is miRanda;
e. in step S5, the functional analysis comprises pathway analysis and/or construction of a regulatory network;
f. the disease to be detected is a tumor disease.
3. The method of screening for isomiR molecular markers of claim 2, further comprising one or more of the following features:
g. in feature a, in step S1, the method for acquiring the raw small RNA sequencing data of the disease sample to be detected and the corresponding healthy sample comprises the following steps:
S1.1, obtaining isomiR sequencing data of the disease sample to be detected and the corresponding healthy sample;
S1.2, obtaining the SRA data link of the raw sequencing data;
S1.3, downloading the required raw sequencing data in batch using aspera;
S1.4, converting the data obtained in the above step into fastq format;
h. in feature b, the differential expression analysis obtains the differentially expressed isomiRs after correcting the P-value with the Benjamini-Hochberg method, an FDR method or the Bonferroni method;
i. in feature e, the pathway analysis is performed based on clusterProfiler;
j. in feature f, the tumor disease is myeloma.
4. The method for screening of isomiR molecular markers according to claim 3, wherein in feature i, the pathway analysis is performed using the KEGG database.
5. The method for screening isomiR molecular markers according to claim 1, wherein, in step S2, the method further comprises filtering the isomiR expression data of the disease sample to be detected and the corresponding healthy sample before the differential expression analysis.
6. A screening device for isomiR molecular markers, comprising at least:
an acquisition module for acquiring isomiR expression data of a disease sample to be detected and of a corresponding healthy sample by using a database;
a differential expression analysis module for performing differential expression analysis on the isomiR expression data of the disease sample to be detected and the corresponding healthy sample to obtain the change rate of each isomiR in the disease sample to be detected, taking the corresponding healthy sample as a control;
a selection module for selecting isomiRs according to the change rate;
a target gene prediction module for performing target gene prediction on the isomiRs selected by the selection module to obtain target genes;
and a functional analysis module for performing functional analysis on the target genes and screening out the target genes related to the disease to be detected, wherein the screened target genes and their corresponding isomiRs are the molecular markers of the disease to be detected.
7. The screening device for isomiR molecular markers of claim 6, further comprising one or more of the following features:
a. in the acquisition module, the database is the GEO database, and/or the method for obtaining the isomiR expression data comprises the following steps: acquiring raw small RNA sequencing data of the disease sample to be detected and the corresponding healthy sample, and aligning, quantifying and annotating the data to obtain the isomiR expression data;
b. in the differential expression analysis module, the differential expression analysis uses a fold-change method, preferably with a fold-change threshold of 1.5 or 2;
c. in the selection module, selecting isomiRs according to the change rate means that the results obtained by the differential expression analysis module are ranked by change rate and the top-ranked isomiRs are selected;
d. in the target gene prediction module, a target gene prediction website or prediction software is used to predict the target genes, preferably the prediction software is miRanda;
e. in the functional analysis module, the functional analysis comprises pathway analysis and/or construction of a regulatory network;
f. the disease to be detected is a tumor disease.
8. The screening device for isomiR molecular markers of claim 7, further comprising one or more of the following features:
g. in feature a, the method by which the acquisition module acquires the isomiR expression data of the disease sample to be detected and the corresponding healthy sample comprises the following steps:
S1.1, obtaining isomiR sequencing data of the disease sample to be detected and the corresponding healthy sample;
S1.2, obtaining the SRA data link of the raw sequencing data;
S1.3, downloading the required raw sequencing data in batch using aspera;
S1.4, converting the data obtained in the above step into fastq format;
h. in feature b, the differential expression analysis obtains the differentially expressed isomiRs after correcting the P-value with the Benjamini-Hochberg method, an FDR method or the Bonferroni method;
i. in feature e, the pathway analysis is performed based on clusterProfiler;
j. in feature f, the tumor disease is myeloma.
9. The screening device for isomiR molecular markers according to claim 8, wherein in feature i, the pathway analysis is performed using the KEGG database.
10. The screening device for isomiR molecular markers according to claim 6, wherein the differential expression analysis module is further used for filtering the isomiR expression data of the disease sample to be detected and the corresponding healthy sample before the differential expression analysis.
11. A computer-readable storage medium, on which a computer program is stored, which when executed by a processor, implements the method for screening of isomiR molecular markers according to any one of claims 1-5.
12. A computer processing device comprising a processor and the aforementioned computer readable storage medium, the processor executing a computer program on the computer readable storage medium to perform the steps of the method for screening of isomiR molecular markers according to any one of claims 1-5.
13. An electronic terminal, comprising: a processor, a memory, and a communicator; the memory is used for storing a computer program, the communicator is used for being in communication connection with an external device, and the processor is used for executing the computer program stored by the memory so as to enable the terminal to execute the method for screening the isomiR molecular marker according to any one of claims 1-5.
14. Use of the method for screening isomiR molecular markers of any one of claims 1-5, the screening device for isomiR molecular markers of any one of claims 6-10, the computer-readable storage medium of claim 11, the computer processing device of claim 12, or the electronic terminal of claim 13 in one or more of a biological targeted therapy system, a pathogenic mechanism system, and a pathogenic risk prediction system.
Technical Field
The invention relates to the field of genetic engineering, in particular to a method and a device for screening isomiR molecular markers.
Background
miRNAs are important regulators of life activities. It was originally thought that one miRNA gene can form only one mature miRNA. However, recent studies have found that one miRNA gene can in fact give rise to multiple miRNA isoforms (isomiRs) that differ in length or sequence.
In recent years, a rapidly growing body of research on isomiRs has indicated that many isomiRs are abnormally expressed in the serum or plasma of cancer patients, may participate in the occurrence and development of tumors in a tumor-suppressing or tumor-promoting role, can circulate in body fluids in a stable form, and offer certain advantages for clinical diagnosis. The expression of isomiRs is cell-, tissue- and disease-state-specific, and isomiRs can participate in cellular stress responses. The pathogenesis of many diseases is associated with changes in isomiRs or in isomiR expression, so isomiRs may serve as markers for disease diagnosis or as therapeutic targets. An isomiR sequence can bind to its target mRNA and thereby exert further biological effects.
The wide application of sequencing technology in recent years has facilitated the discovery of isomiRs and the study of their functions, and provides a data basis for examining their sequence characteristics and verifying their expression patterns and biological functions. How to effectively mine the relevant sample data in public databases and screen for disease-related isomiR molecular markers remains a difficulty in the application of isomiRs.
Disclosure of Invention
In view of the above-mentioned drawbacks of the prior art, the present invention aims to provide a method and an apparatus for screening isomiR molecular markers.
To achieve the above and other related objects, a first aspect of the present invention provides a method for screening an isomiR molecular marker, the method comprising at least the steps of:
S1: acquiring isomiR expression data of a disease sample to be detected and of a corresponding healthy sample by using a database;
S2: performing differential expression analysis on the isomiR expression data of the disease sample to be detected and the corresponding healthy sample to obtain the change rate of each isomiR in the disease sample to be detected, taking the corresponding healthy sample as a control;
S3: selecting isomiRs according to the change rate;
S4: performing target gene prediction on the isomiRs selected in S3 to obtain target genes;
S5: performing functional analysis on the target genes and screening out the target genes related to the disease to be detected, wherein the screened target genes and their corresponding isomiRs are the molecular markers of the disease to be detected.
The second aspect of the present invention provides an apparatus for screening isomiR molecular markers, comprising at least:
an acquisition module for acquiring isomiR expression data of a disease sample to be detected and of a corresponding healthy sample by using a database;
a differential expression analysis module for performing differential expression analysis on the isomiR expression data of the disease sample to be detected and the corresponding healthy sample to obtain the change rate of each isomiR in the disease sample to be detected, taking the corresponding healthy sample as a control;
a selection module for selecting isomiRs according to the change rate;
a target gene prediction module for performing target gene prediction on the isomiRs selected by the selection module to obtain target genes;
and a functional analysis module for performing functional analysis on the target genes and screening out the target genes related to the disease to be detected, wherein the screened target genes and their corresponding isomiRs are the molecular markers of the disease to be detected.
A third aspect of the present invention provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the aforementioned screening method for isomiR molecular markers.
In a fourth aspect, the present invention provides a computer processing device, which includes a processor and the aforementioned computer readable storage medium, wherein the processor executes a computer program on the computer readable storage medium to implement the steps of the aforementioned screening method for isomiR molecular markers.
A fifth aspect of the present invention provides an electronic terminal, comprising: a processor, a memory, and a communicator; the memory is used for storing a computer program, the communicator is used for being in communication connection with an external device, and the processor is used for executing the computer program stored by the memory so as to enable the terminal to execute the screening method of the isomiR molecular marker.
The sixth aspect of the present invention provides the use of the aforementioned screening method for isomiR molecular markers, screening apparatus for isomiR molecular markers, computer-readable storage medium, computer processing device, or electronic terminal in one or more of a biological targeted therapy system, a pathogenic mechanism system, and a pathogenic risk prediction system.
As described above, the method and apparatus for screening isomiR molecular markers according to the present invention have the following advantageous effects:
The method and device for screening isomiR molecular markers provided by the invention analyze and process isomiR expression data with bioinformatics methods based on public data resources, and identify isomiRs related to myeloma. The invention finds isomiRs and a number of risk genes related to myeloma, which is of significance for biological targeted therapy of myeloma, elucidation of its pathogenic mechanism, risk prediction and the like. The invention addresses the problems that researchers are often not adept at integrating existing network resources, are unfamiliar with the most commonly used isomiR-related databases and state-of-the-art analysis methods, and cannot independently complete isomiR-related bioinformatics analysis. The invention uses a variety of bioinformatics tools, integrates authoritative and widely used public network resources, and establishes a complete analysis workflow that can perform systematic and comprehensive functional analysis of high-throughput isomiR data and find myeloma-related isomiR molecular markers. High-throughput data in public databases can thus be used effectively, reducing research cost and improving analysis efficiency. The analysis workflow is clear and simple to implement, and can be widely applied in biological research and in clinically related applications.
Drawings
FIG. 1 is a flow chart of a method for screening isomiR molecular markers according to an embodiment of the present invention.
FIG. 2 is a schematic diagram of a screening apparatus for isomiR molecular markers according to an embodiment of the present invention.
FIG. 3 is a schematic diagram of an electronic terminal according to an embodiment of the invention.
FIG. 4 shows a schematic diagram of a network in which myeloma isomiR is associated with target gene top 1.
FIG. 5 shows a schematic diagram of a network relating myeloma isomiR to target gene top 2.
Detailed Description
The embodiments of the present invention are described below with reference to specific embodiments, and other advantages and effects of the present invention will be easily understood by those skilled in the art from the disclosure of the present specification. The invention is capable of other and different embodiments and of being practiced or of being carried out in various ways, and its several details are capable of modification in various respects, all without departing from the spirit and scope of the present invention.
Please refer to FIGS. 1 to 5. It should be noted that the drawings provided in this embodiment only illustrate the basic idea of the invention schematically. The drawings show only the components related to the invention and are not drawn according to the number, shape and size of the components in actual implementation; in actual implementation the type, quantity and proportion of the components may be changed arbitrarily, and the layout of the components may be more complicated.
As shown in fig. 1, the method for screening isomiR molecular markers provided by the present invention is presented, and the method at least comprises the following steps:
S1: acquiring isomiR expression data of a disease sample to be detected and of a corresponding healthy sample by using a database;
S2: performing differential expression analysis on the isomiR expression data of the disease sample to be detected and the corresponding healthy sample to obtain the change rate of each isomiR in the disease sample to be detected, taking the corresponding healthy sample as a control;
S3: selecting isomiRs according to the change rate;
S4: performing target gene prediction on the isomiRs selected in S3 to obtain target genes;
S5: performing functional analysis on the target genes and screening out the target genes related to the disease to be detected, wherein the screened target genes and their corresponding isomiRs are the molecular markers of the disease to be detected.
In one embodiment, in step S1, the database is the GEO database.
In one embodiment, in step S1, the method for obtaining the isomiR expression data comprises the following steps: acquiring raw small RNA sequencing data of the disease sample to be detected and the corresponding healthy sample, and aligning, quantifying and annotating the data to obtain the isomiR expression data.
In one embodiment, in step S2, the differential expression analysis uses a fold-change method, preferably with a fold-change threshold of 1.5 or 2.
In one embodiment, selecting isomiRs according to the change rate in step S3 means that the results obtained in step S2 are ranked by change rate and the top-ranked isomiRs are selected.
In one embodiment, the differentially expressed isomiRs are sorted by the absolute value of the fold difference, with larger values ranked earlier, and a certain number of isomiRs are screened as the isomiRs to be studied.
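The fold-change selection and ranking described above can be sketched as follows. This is a minimal illustration, not the implementation of the invention: the function name, the use of group mean expression values, the pseudocount and the example isomiR names are all assumptions introduced here.

```python
import math

def select_top_isomirs(disease_mean, healthy_mean, fold=2.0, top_n=10, pseudocount=1.0):
    """Rank isomiRs by absolute log2 fold change (disease vs. healthy control).

    disease_mean / healthy_mean: dicts mapping an isomiR name to its mean
    expression value in each group (hypothetical input layout).
    """
    threshold = math.log2(fold)
    ranked = []
    for name, d in disease_mean.items():
        h = healthy_mean.get(name, 0.0)
        # The pseudocount (an assumption) avoids division by zero when an
        # isomiR is absent from one group.
        log2fc = math.log2((d + pseudocount) / (h + pseudocount))
        if abs(log2fc) >= threshold:
            ranked.append((name, log2fc))
    # Larger absolute change ranks earlier, for up- and down-regulation alike.
    ranked.sort(key=lambda item: abs(item[1]), reverse=True)
    return ranked[:top_n]

# Illustrative values only.
disease = {"isomiR-A": 79.0, "isomiR-B": 9.0, "isomiR-C": 11.0}
healthy = {"isomiR-A": 9.0, "isomiR-B": 39.0, "isomiR-C": 9.0}
top = select_top_isomirs(disease, healthy, fold=2.0, top_n=2)
```

Working on the log2 scale treats an 8-fold increase and an 8-fold decrease symmetrically, which is why the sort uses the absolute value.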
In one embodiment, in step S4, a target gene prediction website or prediction software is used to predict the target genes; preferably, the prediction software is miRanda.
In one embodiment, in step S5, the functional analysis comprises pathway analysis and/or construction of a regulatory network.
In one embodiment, the disease to be detected is a tumor disease.
In one embodiment, the method for obtaining the isomiR expression data of the disease sample to be detected and the corresponding healthy sample in step S1 comprises the following steps:
S1.1, obtaining isomiR sequencing data of the disease sample to be detected and the corresponding healthy sample;
S1.2, obtaining the SRA data link of the raw sequencing data;
S1.3, downloading the required raw sequencing data in batch using aspera;
S1.4, converting the data obtained in the above step into fastq format.
In one embodiment, the fastq files of the raw data are aligned to the reference genome, the aligned reads are annotated to obtain isomiRs, and the isomiRs are then quantified to obtain their expression values.
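The quantification step can be illustrated with a short sketch that counts annotated reads per isomiR and normalizes the counts. The source does not say how expression values are normalized, so reads per million (RPM) is an assumption here, as are the function names and the label format.

```python
from collections import Counter

def count_isomirs(annotated_reads):
    """Count reads per isomiR label.

    annotated_reads: iterable with one isomiR label per aligned, annotated
    read (the label format is hypothetical).
    """
    return Counter(annotated_reads)

def reads_per_million(counts):
    """Scale raw counts to reads per million mapped reads (assumed normalization)."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {name: c * 1_000_000 / total for name, c in counts.items()}

# Illustrative labels only: three reads of one isomiR, one read of another.
labels = ["isomiR-A"] * 3 + ["isomiR-B"]
counts = count_isomirs(labels)
rpm = reads_per_million(counts)
```

Normalizing by library size makes values comparable between samples sequenced to different depths, which matters before any fold-change comparison.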
In one embodiment, the differential expression analysis obtains the differentially expressed isomiRs after correcting the P-value with the Benjamini-Hochberg method, an FDR method or the Bonferroni method.
In one embodiment, the pathway analysis is performed based on clusterProfiler.
In one embodiment, the neoplastic disease is selected from myeloma.
In one embodiment, the pathway analysis is performed using a KEGG database.
In one embodiment, step S2 further includes filtering the expression data of isomiR of the disease sample to be detected and the corresponding healthy sample before the differential expression analysis.
In one embodiment, the filtering comprises removing the adapter and then removing sequences in which more than 20% of the bases have a quality value below 20.
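A minimal sketch of this filter is given below. It reads the rule as discarding a sequence when more than 20% of its bases have a quality value below 20, and it assumes Phred+33 quality encoding and exact adapter matching; the function names and the example adapter are illustrative and not taken from the source.

```python
def trim_adapter(seq, qual, adapter):
    """Cut the read at the first exact occurrence of the adapter, if any."""
    idx = seq.find(adapter)
    if idx == -1:
        return seq, qual
    return seq[:idx], qual[:idx]

def low_quality_fraction(qual, min_q=20, offset=33):
    """Fraction of bases whose Phred quality is below min_q (Phred+33 assumed)."""
    if not qual:
        return 1.0
    low = sum(1 for ch in qual if ord(ch) - offset < min_q)
    return low / len(qual)

def keep_read(seq, qual, adapter, min_q=20, max_low_fraction=0.20):
    """Return the trimmed (seq, qual), or None if the read is filtered out."""
    seq, qual = trim_adapter(seq, qual, adapter)
    if not seq:
        return None
    if low_quality_fraction(qual, min_q) > max_low_fraction:
        return None
    return seq, qual

# Illustrative read: 10 insert bases, a made-up 6-base adapter, 2 trailing bases.
read = "TGAGGTAGTA" + "AGATCG" + "TT"
good = keep_read(read, "I" * 8 + "#" * 2 + "I" * 8, "AGATCG")
bad = keep_read(read, "I" * 7 + "#" * 3 + "I" * 8, "AGATCG")
```

In practice a dedicated trimming tool would also handle partial and mismatched adapters; the exact-match search here only shows the order of operations.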
In one embodiment, the required research data are downloaded using aspera software under Windows or Linux.
In one embodiment, false discovery rate correction is performed on the isomiR results on the R platform. The Benjamini-Hochberg, FDR and Bonferroni methods can be used.
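For illustration, two of the named corrections can be written out in a few lines. The source performs this step on the R platform; the Python below is only a sketch of the arithmetic, with function names chosen here. Note that the Benjamini-Hochberg procedure is itself a false discovery rate method, whereas Bonferroni controls the family-wise error rate.

```python
def bonferroni(pvalues):
    """Bonferroni adjustment: multiply each p-value by the number of tests."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]

def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values stay
    # monotone: each is the minimum of p * m / rank over equal or larger ranks.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Illustrative p-values only.
pvals = [0.01, 0.04, 0.03, 0.005]
bh = benjamini_hochberg(pvals)
bf = bonferroni(pvals)
```

Bonferroni is the more conservative of the two, so with many isomiRs tested it will typically retain fewer differentially expressed candidates than Benjamini-Hochberg at the same cutoff.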
In one embodiment, binding site prediction is performed on isomiR by the prediction software miRanda.
The GEO database is a gene expression database created and maintained by the National Center for Biotechnology Information (NCBI). It was created in 2000 and contains high-throughput gene expression data submitted by research institutions around the world. The gene expression data associated with published papers can be found in this database, and the data volume increases year by year.
aspera is high-speed download software recommended by NCBI for downloading large volumes of sequencing data.
The fastq format is a data format required by common sequencing analyses.
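As a brief illustration of the format, each fastq record occupies four lines: a header starting with `@`, the sequence, a separator line starting with `+`, and a quality string of the same length as the sequence. The reader below is a minimal sketch for uncompressed, four-line records; the function name and the example record are assumptions.

```python
import io

def read_fastq(handle):
    """Yield (read_id, sequence, quality) from a four-line-per-record FASTQ stream."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of file
        seq = handle.readline().rstrip("\n")
        plus = handle.readline()
        qual = handle.readline().rstrip("\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError("malformed FASTQ record")
        if len(seq) != len(qual):
            raise ValueError("sequence and quality lengths differ")
        yield header[1:].strip(), seq, qual

# Illustrative single-record input.
sequence = "TGAGGTAGTAGGTTGTATAGTT"
text = "@read1\n" + sequence + "\n+\n" + "I" * len(sequence) + "\n"
records = list(read_fastq(io.StringIO(text)))
```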
miRanda software
miRanda is target prediction software developed by Enright et al. in 2003 and is used here for isomiR target prediction. Its core idea is based on base complementarity, similar to the Smith-Waterman algorithm, but with a modified base-pairing rule that allows G-U pairs. Because binding of an isomiR to its target site requires a high degree of matching at the 5' end, the software uses a scale parameter to reweight the scores of the 11 bases at the 5' end. For binding energy, miRanda calculates the energy of the isomiR-target duplex based on the RNAlib library of the ViennaRNA package. Where multiple isomiRs target the same site, miRanda uses a greedy algorithm to choose the result with the highest score and the lowest binding energy.
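The scoring idea can be illustrated with a deliberately simplified sketch. It is not miRanda's algorithm: it scores an ungapped duplex position by position instead of running a Smith-Waterman-style local alignment, it omits the binding energy step, and the numeric values (match, G-U and mismatch scores, and the scale factor) are illustrative assumptions rather than the software's defaults.

```python
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}

def pair_score(a, b, match=5.0, wobble=2.0, mismatch=-3.0):
    """Score one base pair; G-U pairs are tolerated with a reduced score."""
    if (a, b) in WATSON_CRICK:
        return match
    if (a, b) in WOBBLE:
        return wobble
    return mismatch

def duplex_score(isomir_5to3, target_3to5, scale=4.0, scaled_positions=11):
    """Ungapped complementarity score with extra weight on the 5' end.

    isomir_5to3: isomiR sequence written 5'->3'.
    target_3to5: target site written 3'->5', so index i pairs with index i.
    """
    score = 0.0
    for i, (a, b) in enumerate(zip(isomir_5to3.upper(), target_3to5.upper())):
        s = pair_score(a, b)
        if i < scaled_positions:
            s *= scale  # reweight the first 11 positions from the 5' end
        score += s
    return score

# Illustrative sequences only.
perfect = duplex_score("UGAGGUAGUAGG", "ACUCCAUCAUCC")
with_wobble = duplex_score("UGAGGUAGUAGG", "GCUCCAUCAUCC")
```

Because the 5' positions are scaled, a single G-U pair or mismatch near the 5' end lowers the score much more than the same change near the 3' end.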
In one embodiment of the invention, KEGG annotation and enrichment analysis of the target genes is performed using the clusterProfiler package on the R platform.
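Pathway enrichment of this kind is commonly assessed with a hypergeometric over-representation test. The sketch below shows that calculation in isolation; it is an assumption that this matches the statistic used in the embodiment, and the function name and the numbers are illustrative.

```python
from math import comb

def hypergeom_enrichment_p(k, n, K, N):
    """Upper-tail hypergeometric p-value P(X >= k).

    N: number of background genes
    K: background genes annotated to the pathway
    n: number of predicted target genes
    k: target genes annotated to the pathway
    """
    upper = min(n, K)
    total = comb(N, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total

# Illustrative numbers only: 10 background genes, 4 in the pathway,
# 3 target genes of which 2 fall in the pathway.
p = hypergeom_enrichment_p(k=2, n=3, K=4, N=10)
```

A small p-value means the pathway contains more target genes than expected by chance; with many pathways tested, the p-values would again need multiple-testing correction.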
KEGG database
KEGG was established in 1995 by the Kanehisa Laboratory of the Bioinformatics Center of Kyoto University, Japan. It is one of the most commonly used bioinformatics databases in the world, is described as a "resource for understanding high-level functions and utilities of biological systems", and is also among the most widely used and authoritative databases in the field of metabolic analysis. Its databases are roughly classified into three major categories, namely systems information, genomic information and chemical information, and can be further subdivided into 16 main databases. For example, genomic information is stored in the GENES database, which includes completely and partially sequenced genomes; higher-order functional information is stored in the PATHWAY database, which includes graphical representations of cellular biochemical processes such as metabolism, membrane transport, signal transduction and the cell cycle, as well as conserved sub-pathways; another KEGG database, LIGAND, contains information about chemical compounds, enzyme molecules, enzymatic reactions and the like.
In one embodiment of the invention, after the pathway-related information of the genes is obtained, it is combined with the differentially expressed isomiRs to generate a network file containing the regulatory network information of the isomiRs and genes. The file can be opened with Cytoscape software and displayed graphically.
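One simple form such a network file could take is a tab-delimited edge list with one isomiR-gene pair per row, a layout that network viewers such as Cytoscape can import as a table. The column names, the interaction label and the example identifiers below are assumptions made for illustration.

```python
import csv
import io

def write_network(edges, handle):
    """Write isomiR-gene edges as a tab-delimited edge list with a header row.

    edges: iterable of (isomiR, gene) pairs (hypothetical input layout).
    """
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["source", "interaction", "target"])
    # Sorting and de-duplicating gives a stable, reproducible file.
    for isomir, gene in sorted(set(edges)):
        writer.writerow([isomir, "targets", gene])

# Illustrative edges only.
edges = [("isomiR-B", "GENE2"), ("isomiR-A", "GENE1"), ("isomiR-A", "GENE2")]
buffer = io.StringIO()
write_network(edges, buffer)
network_text = buffer.getvalue()
```

To write to disk instead of an in-memory buffer, the same function can be called with a file opened via `open(path, "w", newline="")`.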
As shown in fig. 2, a screening apparatus for an isomiR molecular marker according to an embodiment of the present invention is shown, the apparatus at least including:
an acquisition module for acquiring isomiR expression data of a disease sample to be detected and of a corresponding healthy sample by using a database;
a differential expression analysis module for performing differential expression analysis on the isomiR expression data of the disease sample to be detected and the corresponding healthy sample to obtain the change rate of each isomiR in the disease sample to be detected, taking the corresponding healthy sample as a control;
a selection module for selecting isomiRs according to the change rate;
a target gene prediction module for performing target gene prediction on the isomiRs selected by the selection module to obtain target genes;
and a functional analysis module for performing functional analysis on the target genes and screening out the target genes related to the disease to be detected, wherein the screened target genes and their corresponding isomiRs are the molecular markers of the disease to be detected.
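The way the five modules hand data to one another can be sketched as a small pipeline in which each module is a pluggable callable. This is only one possible software arrangement under stated assumptions: the class name, the field names, the data shapes and the stub functions in the example are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Expression = Dict[str, float]

@dataclass
class ScreeningDevice:
    """Chains the five modules; each field holds one module's function."""
    acquire: Callable[[], Tuple[Expression, Expression]]
    analyze: Callable[[Expression, Expression], Dict[str, float]]
    select: Callable[[Dict[str, float]], List[str]]
    predict_targets: Callable[[List[str]], Dict[str, List[str]]]
    functional_filter: Callable[[Dict[str, List[str]]], Dict[str, List[str]]]

    def run(self) -> Dict[str, List[str]]:
        disease, healthy = self.acquire()         # acquisition module
        change = self.analyze(disease, healthy)   # differential expression module
        chosen = self.select(change)              # selection module
        targets = self.predict_targets(chosen)    # target gene prediction module
        return self.functional_filter(targets)    # functional analysis module

# Stub modules with illustrative data, only to show the flow.
device = ScreeningDevice(
    acquire=lambda: ({"isomiR-A": 8.0, "isomiR-B": 1.0},
                     {"isomiR-A": 1.0, "isomiR-B": 1.0}),
    analyze=lambda d, h: {name: d[name] / h[name] for name in d},
    select=lambda change: [n for n, v in change.items() if v >= 2.0],
    predict_targets=lambda names: {n: ["GENE1", "GENE2"] for n in names},
    functional_filter=lambda t: {n: [g for g in genes if g == "GENE1"]
                                 for n, genes in t.items()},
)
markers = device.run()
```

Keeping each module behind a plain callable mirrors the statement that the module division is logical: any module can be swapped for another implementation without changing the others.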
In one embodiment, in the acquisition module, the database is the GEO database.
In one embodiment, in the acquisition module, the method for obtaining the isomiR expression data comprises the following steps: acquiring raw small RNA sequencing data of the disease sample to be detected and the corresponding healthy sample, and aligning, quantifying and annotating the data to obtain the isomiR expression data.
In one embodiment, in the differential expression analysis module, the differential expression analysis uses a fold-change method, preferably with a fold-change threshold of 1.5 or 2.
In one embodiment, in the selection module, selecting isomiRs according to the change rate means that the results obtained by the differential expression analysis module are ranked by change rate and the top-ranked isomiRs are selected.
In one embodiment, in the target gene prediction module, a target gene prediction website or prediction software is used to predict the target genes; preferably, the prediction software is miRanda.
In one embodiment, in the functional analysis module, the functional analysis comprises pathway analysis and/or construction of a regulatory network.
In one embodiment, the disease to be detected is a tumor disease.
In one embodiment, in the acquisition module, the method for obtaining the isomiR expression data of the disease sample to be detected and the corresponding healthy sample comprises the following steps:
S1.1, obtaining isomiR sequencing data of the disease sample to be detected and the corresponding healthy sample;
S1.2, obtaining the SRA data link of the raw sequencing data;
S1.3, downloading the required raw sequencing data in batch using aspera;
S1.4, converting the data obtained in the above step into fastq format.
In one embodiment, the fastq files of the raw data are aligned to the reference genome, the aligned reads are annotated to obtain isomiRs, and the isomiRs are then quantified to obtain their expression values.
In one embodiment, the differential expression analysis obtains the differentially expressed isomiRs after correcting the P-value with the Benjamini-Hochberg method, an FDR method or the Bonferroni method.
In one embodiment, the pathway analysis is performed based on clusterProfiler.
In one embodiment, the neoplastic disease is selected from myeloma.
In one embodiment, the pathway analysis is performed using a KEGG database.
In one embodiment, the differential expression analysis module further comprises a step of filtering the expression data of the isomiR of the disease sample to be detected and the corresponding health sample before the differential expression analysis.
In one embodiment, the required research data are downloaded using aspera software under Windows or Linux.
In one embodiment, false discovery rate correction is performed on the isomiR results on the R platform. The Benjamini-Hochberg, FDR and Bonferroni methods can be used.
In one embodiment, binding site prediction is performed on isomiR by the prediction software miRanda.
aspera is high-speed download software recommended by NCBI for downloading large volumes of sequencing data.
The fastq format is a data format required by common sequencing analyses.
In one embodiment of the invention, KEGG annotation and enrichment analysis of the target genes is performed using the clusterProfiler package on the R platform.
In one embodiment of the invention, after the pathway-related information of the genes is obtained, it is combined with the differentially expressed isomiRs to generate a network file containing the regulatory network information of the isomiRs and genes. The file can be opened with Cytoscape software and displayed graphically.
In one embodiment, the filtering comprises removing the adapter and then removing sequences in which more than 20% of the bases have a quality value below 20.
Since the principle of the apparatus in this embodiment is basically the same as that of the foregoing method embodiment, in the foregoing method and apparatus embodiment, the definitions of the same features, the calculation method, the enumeration of the embodiments, and the enumeration and description of the preferred embodiments may be used interchangeably, and are not repeated again.
It should be noted that the division of the modules of the above apparatus is only a logical division, and the actual implementation may be wholly or partially integrated into one physical entity, or may be physically separated. These modules may all be implemented in software invoked by a processing element; or may be implemented entirely in hardware; and part of the modules can be realized in the form of calling software by the processing element, and part of the modules can be realized in the form of hardware. For example, the obtaining module may be a processing element that is set up separately, or may be implemented by being integrated in a certain chip, or may be stored in a memory in the form of program code, and the certain processing element calls and executes the functions of the obtaining module. Other modules are implemented similarly. In addition, all or part of the modules can be integrated together or can be independently realized. The processing element described herein may be an integrated circuit having signal processing capabilities. In implementation, each step of the above method or each module above may be implemented by an integrated logic circuit of hardware in a processor element or an instruction in the form of software.
For example, the above modules may be one or more integrated circuits configured to implement the above methods, such as one or more Application Specific Integrated Circuits (ASICs), one or more digital signal processors (DSPs), or one or more Field Programmable Gate Arrays (FPGAs). For another example, when one of the above modules is implemented in the form of program code scheduled by a processing element, the processing element may be a general-purpose processor, such as a Central Processing Unit (CPU) or another processor capable of calling program code. For another example, these modules may be integrated together and implemented in the form of a system-on-a-chip (SOC).
In some embodiments of the present invention, there is also provided a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the aforementioned screening method for isomiR molecular markers.
In some embodiments of the present invention, there is also provided a computer processing device, including a processor and the aforementioned computer readable storage medium, wherein the processor executes a computer program on the computer readable storage medium to implement the steps of the aforementioned screening method for isomiR molecular markers.
In some embodiments of the present invention, there is also provided an electronic terminal, including: a processor, a memory, and a communicator; the memory is used for storing a computer program, the communicator is used for communicating with an external device, and the processor is used for executing the computer program stored in the memory so that the terminal performs the aforementioned screening method for isomiR molecular markers.
As shown in fig. 3, a schematic diagram of an electronic terminal provided by the present invention is shown. The electronic terminal comprises a
The above-mentioned system bus may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The system bus may be divided into an address bus, a data bus, a control bus, and the like. For ease of illustration, only one thick line is shown, but this does not mean that there is only one bus or one type of bus. The communication interface is used for realizing communication between the database access device and other equipment (such as a client, a read-write library and a read-only library). The memory may include a Random Access Memory (RAM), and may further include a non-volatile memory, such as at least one disk memory.
The processor may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components.
Those of ordinary skill in the art will understand that: all or part of the steps for implementing the above method embodiments may be performed by hardware associated with a computer program. The aforementioned computer program may be stored in a computer readable storage medium. When executed, the program performs steps comprising the method embodiments described above; the computer-readable storage medium may include, but is not limited to, floppy diskettes, optical disks, CD-ROMs (compact disc-read only memories), magneto-optical disks, ROMs (read-only memories), RAMs (random access memories), EPROMs (erasable programmable read only memories), EEPROMs (electrically erasable programmable read only memories), magnetic or optical cards, flash memory, or other type of media/machine-readable medium suitable for storing machine-executable instructions. The computer readable storage medium may be a product that is not accessed by the computer device or may be a component that is used by an accessed computer device.
In particular implementations, the computer programs are routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types.
The invention also provides the application of the screening method of the isomiR molecular marker, the screening device of the isomiR molecular marker, the computer readable storage medium, the computer processing equipment or the electronic terminal in one or more of a biological targeted therapy system, a pathogenic mechanism system and a pathogenic risk prediction system.
Examples of the invention
First, the raw data are converted from SRA to FASTQ format and filtered; low-quality data are then removed and other small RNAs are filtered out to obtain valid normalized isomiR expression values, which are then annotated. Based on the results of the isomiR differential analysis, target genes can be predicted from the sequence characteristics of the isomiRs. On the basis of these analyses, a series of statistical and visual analyses can be performed.
1. The isomiR annotation file is shown in Table 1
Analysis platform: R
Analysis software: isomiRs package
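Table 1:

| seq | name | freq | mir | start | end | mism | add | t5 | t3 |
|---|---|---|---|---|---|---|---|---|---|
| AAAGGCGGGAGAAGCCCCGGC | seq_100022_x1 | 1 | hsa-miR-4484 | 65 | 82 | 0 | GGC | a | a |
| TGAGGTAGTAGTTTGTACAGTTAGA | seq_100036_x2 | 2 | hsa-let-7g-5p | 5 | 26 | 0 | 0 | 0 | |
| CATAAAGTAGAAAGCACT | seq_100064_x1002 | 1002 | hsa-miR-142-5p | 16 | 33 | 0 | 0 | 0 | act |
| CGGCCCGGGCTGCTGCTGTTC | seq_100088_x1 | 1 | hsa-miR-1538 | 39 | 59 | 0 | 0 | 0 | ct |
| AACATTCAACGCTGTCGGTGAT | seq_100091_x51 | 51 | hsa-miR-181a-5p | 39 | 59 | 0 | 0 | gt | |

Note: for seq_100036_x2 and seq_100091_x51 only three values are given for the last four columns; they are listed in their original order and the final cell is left blank.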
Explanation of column names:
seq: sequence
freq/name: depending on the input, this column contains counts (tabular input file) or name (fasta file)
mir: miRNA name
start: start of the sequence at the precursor
end: end of the sequence at the precursor
mism: nucleotide substitution position | nucleotide at sequence | nucleotide at precursor
add: nucleotides added at the 3' end
t5: nucleotides at the 5' end that differ from the annotated sequence in miRBase
t3: nucleotides at the 3' end that differ from the annotated sequence in miRBase
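The isomiR identifiers used in the later tables (for example `hsa-miR-4485-3p.t5:GTT.t3:taa.ad:0`) appear to combine the mir, t5, t3 and add columns described above. The following sketch composes and parses names of that shape. That the names are built exactly this way is an assumption inferred from the tables, and the function names are hypothetical.

```python
from typing import Dict


def compose_isomir_name(row: Dict[str, str]) -> str:
    """Build 'mir.t5:<t5>.t3:<t3>.ad:<add>' from annotation columns."""
    return f"{row['mir']}.t5:{row['t5']}.t3:{row['t3']}.ad:{row['add']}"


def parse_isomir_name(name: str) -> Dict[str, str]:
    """Split an isomiR name back into its mir, t5, t3 and add parts."""
    mir, rest = name.split(".t5:", 1)
    t5, rest = rest.split(".t3:", 1)
    t3, add = rest.split(".ad:", 1)
    return {"mir": mir, "t5": t5, "t3": t3, "add": add}


# First row of Table 1, restricted to the columns used in the name.
row = {"mir": "hsa-miR-4484", "add": "GGC", "t5": "a", "t3": "a"}
name = compose_isomir_name(row)

# A name as it appears in Table 2.
parts = parse_isomir_name("hsa-miR-4485-3p.t5:GTT.t3:taa.ad:0")
```

Parsing splits on the literal `.t5:`, `.t3:` and `.ad:` markers rather than on every dot, so miRNA names that contain hyphens or digits are left untouched.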
2. The differentially expressed isomiR results are shown in Table 2
Analysis platform: R
Analysis software: DESeq2 package
Table 2:

| row | baseMean | log2FoldChange | lfcSE | stat | pvalue | padj |
|---|---|---|---|---|---|---|
| hsa-miR-4485-3p.t5:GTT.t3:taa.ad:0 | 86.2160 | 6.6277 | 1.0032 | 6.6066 | 3.9326E-11 | 1.8981E-08 |
| hsa-miR-6503-5p.t5:0.t3:a.ad:T | 19.2697 | 5.4067 | 1.1283 | 4.7919 | 1.6519E-06 | 1.0631E-04 |
| hsa-miR-223-3p.t5:0.t3:0.ad:TG | 33.6162 | 5.3714 | 0.9043 | 5.9400 | 2.8502E-09 | 6.8785E-07 |
| hsa-miR-27a-3p.t5:0.t3:cgc.ad:GGC | 18.6298 | 5.3470 | 1.1345 | 4.7131 | 2.4401E-06 | 1.4133E-04 |
| hsa-miR-27b-3p.t5:0.t3:tgc.ad:GGC | 18.6298 | 5.3470 | 1.1345 | 4.7131 | 2.4401E-06 | 1.4133E-04 |
Explanation of column names:
row: the isomiR name
baseMean: mean of the normalized counts across samples
log2FoldChange: log2 ratio of treatment versus control
lfcSE: standard error of the log2FoldChange
stat: for the Wald test, stat is the Wald statistic, i.e. the log2FoldChange divided by lfcSE, which is compared to a standard normal distribution to generate a two-tailed p-value
pvalue: p-value of the statistic
padj: adjusted p-value
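The relationship between the log2FoldChange, lfcSE, stat and pvalue columns can be checked numerically. The sketch below recomputes the Wald statistic and its two-tailed p-value for the first row of Table 2 using only the standard library; the function names are illustrative, and this is a consistency check rather than a reimplementation of DESeq2.

```python
import math


def wald_statistic(log2_fold_change: float, lfc_se: float) -> float:
    """Wald statistic: estimated log2 fold change divided by its standard error."""
    return log2_fold_change / lfc_se


def two_tailed_p(z: float) -> float:
    """Two-tailed p-value of z under a standard normal distribution."""
    # P(|Z| >= |z|) = erfc(|z| / sqrt(2))
    return math.erfc(abs(z) / math.sqrt(2.0))


# Values from the first row of Table 2 (hsa-miR-4485-3p.t5:GTT.t3:taa.ad:0).
stat = wald_statistic(6.6277, 1.0032)
p = two_tailed_p(stat)
```

With these inputs `stat` is approximately 6.6066 and `p` is approximately 3.9e-11, in agreement with the tabulated values up to rounding.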
3. Target gene prediction and screening of isomiR
Target genes of the isomiRs are predicted using the miRanda algorithm.
The miRanda algorithm scores candidate sites by binding free energy and sequence complementarity. The default parameters used here require strict seed sequence complementary pairing, a score greater than 140, and a minimum free energy of -20 kJ/mol.
Analysis platform: Linux
The results show that:
Table 3 miRanda results:

| Seq1 | Seq2 | Tot Score | Tot Energy | Max Score | Max Energy | Len1 | Len2 | Positions |
|---|---|---|---|---|---|---|---|---|
| hsa-miR-186-5p.t5:c.t3:t.ad:0 | ENST00000536792.5_CDK8::chr13:26401269-26401347(+) | 182 | -21.5 | 182 | -21.5 | 20 | 79 | 27 |
| hsa-miR-186-5p.t5:c.t3:t.ad:0 | ENST00000352483.3_RIPK4::chr21:41739369-41740837(-) | 181 | -22.53 | 181 | -22.53 | 20 | 1469 | 662 |
| hsa-miR-186-5p.t5:c.t3:t.ad:0 | ENST00000332512.7_RIPK4::chr21:41739369-41740837(-) | 181 | -22.53 | 181 | -22.53 | 20 | 1469 | 662 |
| hsa-miR-186-5p.t5:c.t3:0.ad:0 | ENST00000536792.5_CDK8::chr13:26401269-26401347(+) | 187 | -21.5 | 187 | -21.5 | 21 | 79 | 26 |
| hsa-miR-186-5p.t5:c.t3:0.ad:0 | ENST00000352483.3_RIPK4::chr21:41739369-41740837(-) | 181 | -22.53 | 181 | -22.53 | 21 | 1469 | 661 |
Explanation of column names:
Seq1: search sequence
Seq2: target sequence
Tot Score: total score of all sites
Tot Energy: total energy of all sites
Max Score: maximum score of the binding sites
Max Energy: maximum energy of the binding sites; a negative value is required for filtering to occur
Len1: length of Seq1
Len2: length of Seq2
Positions: binding sites
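The score and energy thresholds stated for the prediction step can be applied to result rows of this shape as a simple post-filter. The following is a minimal sketch, assuming that a hit passes when its maximum score exceeds 140 and its maximum energy is at or below -20 (whether the boundary is inclusive is an assumption). The `Hit` type, the function names, and the last example hit are hypothetical; the gene symbol is taken from the text between the first underscore and `::` in Seq2, as seen in Table 3.

```python
from typing import List, NamedTuple


class Hit(NamedTuple):
    seq1: str          # isomiR name
    seq2: str          # target transcript identifier
    max_score: float
    max_energy: float


def filter_hits(hits: List[Hit], min_score: float = 140.0,
                max_energy: float = -20.0) -> List[Hit]:
    """Keep hits with score above min_score and energy at or below max_energy."""
    return [h for h in hits
            if h.max_score > min_score and h.max_energy <= max_energy]


def gene_symbol(seq2: str) -> str:
    """Extract e.g. 'CDK8' from 'ENST00000536792.5_CDK8::chr13:...'."""
    return seq2.split("::", 1)[0].split("_", 1)[1]


hits = [
    # Two rows from Table 3.
    Hit("hsa-miR-186-5p.t5:c.t3:t.ad:0",
        "ENST00000536792.5_CDK8::chr13:26401269-26401347(+)", 182, -21.5),
    Hit("hsa-miR-186-5p.t5:c.t3:t.ad:0",
        "ENST00000352483.3_RIPK4::chr21:41739369-41740837(-)", 181, -22.53),
    # Hypothetical weak hit, included only to show a rejection.
    Hit("hsa-miR-186-5p.t5:c.t3:t.ad:0",
        "ENST00000000000.1_GENEX::chr1:1-100(+)", 150, -12.0),
]
kept = filter_hits(hits)
target_genes = sorted({gene_symbol(h.seq2) for h in kept})
```

The two tabulated hits pass both thresholds, while the hypothetical hit is rejected on energy, leaving CDK8 and RIPK4 as candidate target genes.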
4. Functional analysis
KEGG pathway analysis of the target genes was performed using clusterProfiler. This reveals the predicted associations between isomiRs and cancer genes, as well as the risk pathways linking important genes. These associations and pathway connections may point to the origin of the disease.
Analysis software: clusterProfiler (R package)
The results show that:
Table 4 pathway enrichment analysis:

| isomiR | ID | Description | pvalue | geneID |
|---|---|---|---|---|
| hsa-miR-221-5p.t5:0.t3:C.ad:0 | hsa04360 | Axon guidance | 0.0234396 | 9037 |
| hsa-miR-378a-3p.t5:0.t3:0.ad:0 | hsa04921 | Oxytocin signaling pathway | 0.020359 | 5021 |
| hsa-miR-378a-3p.t5:0.t3:0.ad:A | hsa04080 | Neuroactive ligand-receptor interaction | 0.0371015 | 5021 |
| hsa-miR-574-5p.t5:0.t3:gt.ad:0 | hsa04144 | Endocytosis | 0.0326815 | 23362 |
| hsa-miR-92a-3p.t5:0.t3:0.ad:AAA | hsa03030 | DNA replication | 0.0048219 | 4171 |
Explanation of column names:
isomiR: isomiR name
ID: pathway ID
Description: pathway description
pvalue: p-value
geneID: target gene ID
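Pathway enrichment p-values of this kind are commonly obtained from an over-representation (hypergeometric) test, which is the approach tools such as clusterProfiler take. The sketch below computes that upper-tail probability with the standard library. The function name and the example numbers are illustrative assumptions, and the exact settings used for Table 4 are not given in the text.

```python
from math import comb


def hypergeom_enrichment_p(k: int, n: int, K: int, N: int) -> float:
    """P(X >= k): chance of seeing at least k pathway genes among n target genes.

    N: number of background genes
    K: number of background genes annotated to the pathway
    n: number of target genes tested
    k: number of target genes annotated to the pathway
    """
    upper = min(n, K)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / comb(N, n)


# Toy numbers chosen so the result can be checked by hand:
# 10 background genes, 3 in the pathway, 2 target genes, at least 1 in the pathway.
# P = 1 - C(7,2)/C(10,2) = 1 - 21/45 = 24/45
p_toy = hypergeom_enrichment_p(k=1, n=2, K=3, N=10)
```

A small p-value indicates that the target genes of an isomiR fall in the pathway more often than expected by chance; in practice the p-values are usually adjusted for the number of pathways tested.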
5. Construction of the isomiR potential functional regulatory network
Analysis platform: R
Graphics software: Cytoscape
A schematic diagram of the myeloma isomiR-target gene network (top1) is shown in FIG. 4, and shows that:
hsa-miR-92a-3p.t5:0.t3:0.ad:AAA, hsa-miR-92a-3p.t5:0.t3:0.ad:AGA and hsa-miR-92a-3p.t5:0.t3:0.ad:AGT are associated with target genes in myeloma samples and, according to pathway enrichment, are the most relevant to the disease.
A schematic diagram of the myeloma isomiR-target gene network (top2) is shown in FIG. 5, and shows that:
hsa-miR-378a-3p.t5:0.t3:0.ad:0 and hsa-miR-378a-3p.t5:0.t3:0.ad:A are associated with target genes in myeloma samples, as found through pathway enrichment.
The foregoing embodiments are merely illustrative of the principles and utilities of the present invention and are not intended to limit the invention. Any person skilled in the art can modify or change the above-mentioned embodiments without departing from the spirit and scope of the present invention. Accordingly, it is intended that all equivalent modifications or changes which can be made by those skilled in the art without departing from the spirit and technical spirit of the present invention be covered by the claims of the present invention.