Extrachromosomal circular DNA recognition methods, systems, devices and media

Document No.: 1536706 | Publication date: 2020-02-14

Abstract: This technology, "Extrachromosomal circular DNA recognition methods, systems, devices and media" (染色体外环状DNA识别方法、系统、设备及介质), was designed by Liu Yingjuan (刘英娟), Sun Xiaoyong (孙晓勇), Chen Shimin (陈士民), Fu Zunyuan (付尊元), Han Jinyu (韩金玉), Wei Qinggong (魏庆功), Zhang Yuanzhou (张圆周) and Zhang Tong (张童) on 2019-11-01. The present disclosure provides a method, system, device and medium for recognizing extrachromosomal circular DNA. In the training stage, a plurality of parallel combined neural network units are constructed; DNA of known type is clipped to build training sets of sequences of different lengths; the training sets are preprocessed; and the preprocessed training set for each sequence length is input into its corresponding combined neural network unit, which is trained on it. Each sequence length therefore yields its own trained unit, giving a plurality of combined neural network units, one per sequence length. In the application stage, the DNA to be recognized is obtained, clipped and preprocessed, and the preprocessed clipped DNA is input into the corresponding combined neural network unit, which outputs the recognition result.

1. An extrachromosomal circular DNA identification method, comprising:

a training stage:

constructing a plurality of parallel combined neural network units;

clipping DNA of known type to construct training sets of sequences of different lengths;

preprocessing the training sets; inputting the preprocessed training set for each sequence length into a corresponding combined neural network unit and training that unit, so that each sequence length has its own trained combined neural network unit;

finally obtaining a plurality of combined neural network units, one for each sequence length;

an application stage:

obtaining DNA to be identified; clipping the DNA to be identified;

preprocessing the clipped result;

inputting the preprocessed clipped DNA into the corresponding combined neural network unit, and outputting an identification result for the DNA to be identified, wherein the identification result is either that the DNA belongs to extrachromosomal circular DNA or that it does not.
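The application stage of claim 1 routes each clipped sample to the unit trained for its sequence length. A minimal Python sketch of that routing follows. The function name `identify`, the dictionary `units`, the 0.5 decision threshold, and the label strings are all assumptions made for illustration; the claim does not specify them, and the lambda scorers merely stand in for trained networks.

```python
def identify(accept_seq, donor_seq, units, threshold=0.5):
    """Route a clipped accept/donor pair to the unit trained for its length.

    units: dict mapping sequence length -> callable(accept, donor) -> score in [0, 1].
    The 0.5 threshold and the label strings are assumptions for illustration.
    """
    n = len(accept_seq)
    if len(donor_seq) != n:
        raise ValueError("accept and donor sequences must have the same length")
    if n not in units:
        raise KeyError(f"no combined unit was trained for length {n}")
    score = units[n](accept_seq, donor_seq)
    return "eccDNA" if score >= threshold else "not eccDNA"


# Stand-in scorers so the sketch runs; real units would be trained networks.
units = {
    4: lambda acc, don: 0.9,
    6: lambda acc, don: 0.1,
}
print(identify("ACGT", "TGCA", units))      # routed to the length-4 unit
print(identify("ACGTAC", "TGCATG", units))  # routed to the length-6 unit
```

Keeping one unit per length in a dictionary mirrors the claim's "corresponding" unit: a sample whose length has no trained unit is rejected rather than silently padded or truncated.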

2. The method of claim 1, wherein, in the training stage, each of the plurality of parallel combined neural network units comprises:

a first convolutional neural network, a second convolutional neural network and a gated recurrent unit (GRU) network;

the input of the first convolutional neural network receives the matrix corresponding to the accept sequence of the DNA to be identified, and the output of the first convolutional neural network is connected to the input of the GRU network;

the input of the second convolutional neural network receives the matrix corresponding to the donor sequence of the DNA to be identified, and the output of the second convolutional neural network is connected to the input of the GRU network;

and the output of the GRU network gives the identification result of the current combined neural network unit for the DNA to be identified.
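The data flow of one combined unit (two convolutional branches feeding a recurrent network) can be sketched as a forward pass in plain Python. This is only an illustration under stated assumptions: the weights are random and untrained, biases are omitted, the two branch outputs are merged by concatenating their feature vectors at each position, and the final hidden state is reduced to one score by a sigmoid. The claim fixes none of these details, nor the filter count, kernel width or hidden size used here.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dot(weights, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weights]


def conv1d_relu(matrix, filters):
    """Slide each 4 x k filter along a 4 x n matrix; one feature vector per position."""
    n, k = len(matrix[0]), len(filters[0][0])
    steps = []
    for pos in range(n - k + 1):
        steps.append([
            max(0.0, sum(f[r][j] * matrix[r][pos + j] for r in range(4) for j in range(k)))
            for f in filters
        ])
    return steps


def gru_step(x, h, p):
    """One gated recurrent unit update (biases omitted for brevity)."""
    z = [sigmoid(a + b) for a, b in zip(dot(p["Wz"], x), dot(p["Uz"], h))]
    r = [sigmoid(a + b) for a, b in zip(dot(p["Wr"], x), dot(p["Ur"], h))]
    gated = [ri * hi for ri, hi in zip(r, h)]
    cand = [math.tanh(a + b) for a, b in zip(dot(p["Wh"], x), dot(p["Uh"], gated))]
    return [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def init_params(n_filters=3, kernel=3, hidden=4, seed=0):
    """Random, untrained weights; all sizes are hypothetical."""
    rng = random.Random(seed)
    p = {
        "acc_filters": [rand_matrix(4, kernel, rng) for _ in range(n_filters)],
        "don_filters": [rand_matrix(4, kernel, rng) for _ in range(n_filters)],
    }
    for gate in "zrh":
        p["W" + gate] = rand_matrix(hidden, 2 * n_filters, rng)
        p["U" + gate] = rand_matrix(hidden, hidden, rng)
    p["w_out"] = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]
    return p


def combined_unit(accept_m, donor_m, p):
    """First CNN on the accept matrix, second CNN on the donor matrix, then the GRU."""
    acc = conv1d_relu(accept_m, p["acc_filters"])
    don = conv1d_relu(donor_m, p["don_filters"])
    h = [0.0] * len(p["w_out"])
    for a, d in zip(acc, don):
        h = gru_step(a + d, h, p)  # assumed merge: concatenate branch features
    return sigmoid(sum(w * v for w, v in zip(p["w_out"], h)))


# Two 4 x 6 one-hot matrices as example input.
accept_m = [[0, 1, 0, 0, 1, 0], [0, 0, 1, 0, 0, 0], [0, 0, 0, 1, 0, 0], [1, 0, 0, 0, 0, 1]]
donor_m = [[1, 0, 0, 0, 0, 0], [0, 1, 0, 0, 1, 0], [0, 0, 1, 0, 0, 1], [0, 0, 0, 1, 0, 0]]
print(combined_unit(accept_m, donor_m, init_params()))
```

In practice such a unit would be built and trained with a deep learning framework; the pure-Python version only makes the wiring between the three networks explicit.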

3. The method of claim 1, wherein, in the training stage, clipping DNA of known type to construct training sets of sequences of different lengths comprises the following steps:

S11: acquiring the start point and end point of each segment of extrachromosomal circular DNA in the DNA sequence; determining the clipping position from the set clipping length and clipping direction; clipping the DNA sequence before and after the clipping site at the determined position; and storing together the DNA sequences that share the same clipping direction and the same clipping length;

S12: packing the DNA sequences with the same clipping direction and the same clipping length into one class of data set, using 60% of each class of data set for model training and the remaining 40% for model prediction.
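Steps S11 and S12 can be sketched in Python as follows. The helper names (`clip`, `build_datasets`, `split_60_40`), the 0-based coordinates, the meaning of the "up" and "down" directions, and the seeded shuffle before the split are assumptions for illustration; the claim only fixes the grouping by direction and length and the 60/40 proportion.

```python
import random


def clip(sequence, site, length, direction):
    """Cut `length` bases next to `site` (0-based index into `sequence`).

    "up" takes the bases just before the site, "down" the bases from the site
    onward. Returns None when the window would run off the sequence.
    """
    if direction == "up":
        start, end = site - length, site
    elif direction == "down":
        start, end = site, site + length
    else:
        raise ValueError("direction must be 'up' or 'down'")
    if start < 0 or end > len(sequence):
        return None
    return sequence[start:end]


def build_datasets(sequence, sites, lengths, directions=("up", "down")):
    """Group clipped fragments by (direction, length), one data set per class."""
    datasets = {}
    for direction in directions:
        for length in lengths:
            frags = [clip(sequence, s, length, direction) for s in sites]
            datasets[(direction, length)] = [f for f in frags if f is not None]
    return datasets


def split_60_40(samples, seed=0):
    """Shuffle, then keep 60% for training and the remaining 40% for prediction."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) * 6 // 10  # integer arithmetic avoids float rounding
    return shuffled[:cut], shuffled[cut:]


datasets = build_datasets("ACGTACGTAC", sites=[4, 8], lengths=[3])
train, held_out = split_60_40(range(10))
print(datasets)
print(len(train), len(held_out))
```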

4. The method of claim 1, wherein, in the training stage, preprocessing the training set comprises the following steps:

one-hot encoding both the accept sequence and the donor sequence of the DNA to be identified;

encoding A as the four-digit binary number 0001;

encoding T as the four-digit binary number 1000;

encoding C as the four-digit binary number 0100;

encoding G as the four-digit binary number 0010;

encoding any other symbol as 0000;

and arranging the resulting four-digit binary numbers into two 4 × n matrices, one for the accept sequence and one for the donor sequence, where n is the length of both the accept sequence and the donor sequence.
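The encoding in claim 4 can be written out directly. The sketch below uses the code table from the claim (A 0001, T 1000, C 0100, G 0010, anything else 0000) and stores each base as one column of a 4 × n matrix. The function names and the upper-casing of input (so that lower-case bases are treated like upper-case ones) are assumptions, not part of the claim.

```python
# Code table from the claim; any other symbol (for example N) maps to 0000.
CODES = {
    "A": (0, 0, 0, 1),
    "T": (1, 0, 0, 0),
    "C": (0, 1, 0, 0),
    "G": (0, 0, 1, 0),
}
UNKNOWN = (0, 0, 0, 0)


def one_hot_matrix(seq):
    """Return a 4 x n matrix (list of 4 rows) whose column j encodes seq[j]."""
    columns = [CODES.get(base, UNKNOWN) for base in seq.upper()]
    return [[col[row] for col in columns] for row in range(4)]


def encode_pair(accept_seq, donor_seq):
    """Encode an accept/donor pair into two 4 x n matrices of equal width n."""
    if len(accept_seq) != len(donor_seq):
        raise ValueError("accept and donor sequences must both have length n")
    return one_hot_matrix(accept_seq), one_hot_matrix(donor_seq)


accept_matrix, donor_matrix = encode_pair("ATCGN", "GGCTA")
for row in accept_matrix:
    print(row)
```

Reading the printed rows column by column recovers the four-digit codes: the first column is 0001 for A, the last column is 0000 for N.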

5. The method of claim 1, wherein, in the application stage, the DNA to be identified is clipped in the same manner as in the training stage.

6. The method of claim 1, wherein, in the application stage, preprocessing the clipped result comprises the following steps:

one-hot encoding the sequence of the DNA to be identified;

encoding A as the four-digit binary number 0001;

encoding T as the four-digit binary number 1000;

encoding C as the four-digit binary number 0100;

encoding G as the four-digit binary number 0010;

encoding any other symbol as 0000.

7. The method of claim 1, wherein the step of constructing the training set comprises:

calculating, from the position information of known extrachromosomal circular DNA on the chromosome, the positions to be clipped for each set clipping length; clipping the DNA sequence at those positions; and dividing the results into different data sets according to the clipping length used;

the position information of the known extrachromosomal circular DNA on the chromosome includes: the unique sequence identifier chr_acc of the accept sequence, the start acc_start of the accept sequence, the end acc_end of the accept sequence, the unique sequence identifier chr_don of the donor sequence, the start don_start of the donor sequence and the end don_end of the donor sequence.
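The six position fields of claim 7 map naturally onto a small record type, from which clipping windows for several set lengths can be derived. In the Python sketch below, the field names come from the claim, while the class name `EccRecord`, the function `windows_for_lengths`, and the choice to anchor every window at the recorded start coordinate are assumptions; the claim does not state which endpoint anchors the window.

```python
from typing import NamedTuple


class EccRecord(NamedTuple):
    """Position of one known eccDNA on the chromosome (field names from the claim)."""
    chr_acc: str
    acc_start: int
    acc_end: int
    chr_don: str
    don_start: int
    don_end: int


def windows_for_lengths(rec, lengths):
    """Map each clipping length to an (accept window, donor window) pair.

    Each window is (sequence identifier, start, end). Anchoring at the recorded
    start coordinate is an illustrative assumption.
    """
    return {
        n: (
            (rec.chr_acc, rec.acc_start, rec.acc_start + n),
            (rec.chr_don, rec.don_start, rec.don_start + n),
        )
        for n in lengths
    }


# Hypothetical coordinates, for illustration only.
rec = EccRecord("chr1", 100, 150, "chr1", 900, 950)
for length, (acc_win, don_win) in windows_for_lengths(rec, [10, 20]).items():
    print(length, acc_win, don_win)
```

Keying the result by clipping length matches the claim's division into one data set per length.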

8. An extrachromosomal circular DNA identification system, comprising:

a training module comprising:

a model building unit configured to construct a plurality of parallel combined neural network units;

a training set construction unit configured to clip DNA of known type and construct training sets of sequences of different lengths;

a first preprocessing unit configured to: preprocess the training sets; input the preprocessed training set for each sequence length into a corresponding combined neural network unit and train that unit, so that each sequence length has its own trained combined neural network unit; and finally obtain a plurality of combined neural network units, one for each sequence length;

an application module comprising:

a clipping unit configured to obtain DNA to be identified and clip the DNA to be identified;

a second preprocessing unit configured to preprocess the clipped result;

an identification unit configured to input the preprocessed clipped DNA into the corresponding combined neural network unit and output an identification result for the DNA to be identified, wherein the identification result is either that the DNA belongs to extrachromosomal circular DNA or that it does not.

9. An electronic device comprising a memory and a processor and computer instructions stored on the memory and executable on the processor, the computer instructions when executed by the processor performing the steps of the method of any of claims 1 to 7.

10. A computer-readable storage medium storing computer instructions which, when executed by a processor, perform the steps of the method of any one of claims 1 to 7.

Technical Field

The present disclosure relates to the field of extrachromosomal circular DNA identification technology, and in particular, to methods, systems, devices, and media for extrachromosomal circular DNA identification.

Background

The statements in this section merely provide background information related to the present disclosure and may not constitute prior art.

Extrachromosomal circular DNA (eccDNA) exists independently of the chromosomes and is widely found in many eukaryotes. It was first observed in 1964 as a series of DNA circles and was later reported to derive from repetitive sequences homologous to genomic DNA. In 2012, Loadfield et al. demonstrated that eccDNA is flanked by 9-11 bp direct repeats, with a non-repetitive sequence in between. eccDNA has been found to be tissue-specific in animals, where it not only promotes senescence but also participates in intercellular communication. In 2017, eccDNA was reported to be detectable in normal tissues and to be much longer in tumors than in normal tissues, where it acts as a major driver. eccDNA therefore has important clinical value as a clear and important tumor marker.

In the course of implementing the present disclosure, the inventors found that the following technical problems exist in the prior art:

Existing extrachromosomal circular DNA identification relies mainly on manual identification by experienced doctors. As a result, the whole identification cycle is very long, the doctor's judgment is highly subjective and easily influenced by the external environment, and the accuracy is low.

Disclosure of Invention

To address the deficiencies of the prior art, the present disclosure provides methods, systems, devices and media for extrachromosomal circular DNA identification.

In a first aspect, the present disclosure provides an extrachromosomal circular DNA identification method;

an extrachromosomal circular DNA identification method comprising:

a training stage:

constructing a plurality of parallel combined neural network units;

clipping DNA of known type to construct training sets of sequences of different lengths;

preprocessing the training sets; inputting the preprocessed training set for each sequence length into a corresponding combined neural network unit and training that unit, so that each sequence length has its own trained combined neural network unit;

finally obtaining a plurality of combined neural network units, one for each sequence length;

an application stage:

obtaining DNA to be identified; clipping the DNA to be identified;

preprocessing the clipped result;

inputting the preprocessed clipped DNA into the corresponding combined neural network unit, and outputting an identification result for the DNA to be identified, wherein the identification result is either that the DNA belongs to extrachromosomal circular DNA or that it does not.

In a second aspect, the present disclosure also provides an extrachromosomal circular DNA identification system;

an extrachromosomal circular DNA identification system comprising:

a training module comprising:

a model building unit configured to construct a plurality of parallel combined neural network units;

a training set construction unit configured to clip DNA of known type and construct training sets of sequences of different lengths;

a first preprocessing unit configured to: preprocess the training sets; input the preprocessed training set for each sequence length into a corresponding combined neural network unit and train that unit, so that each sequence length has its own trained combined neural network unit; and finally obtain a plurality of combined neural network units, one for each sequence length;

an application module comprising:

a clipping unit configured to obtain DNA to be identified and clip the DNA to be identified;

a second preprocessing unit configured to preprocess the clipped result;

an identification unit configured to input the preprocessed clipped DNA into the corresponding combined neural network unit and output an identification result for the DNA to be identified, wherein the identification result is either that the DNA belongs to extrachromosomal circular DNA or that it does not.

In a third aspect, the present disclosure also provides an electronic device comprising a memory and a processor, and computer instructions stored on the memory and executed on the processor, wherein the computer instructions, when executed by the processor, perform the steps of the method of the first aspect.

In a fourth aspect, the present disclosure also provides a computer-readable storage medium for storing computer instructions which, when executed by a processor, perform the steps of the method of the first aspect.

Compared with the prior art, the beneficial effects of the present disclosure are:

The method identifies extrachromosomal circular DNA with a deep learning algorithm, which improves identification accuracy, removes the dependence on doctors' subjective judgment, effectively reduces doctors' workload, and improves the identification rate.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this application, illustrate embodiments of the application and, together with the description, serve to explain the application and are not intended to limit the application.

FIG. 1 is a flow chart of a data set acquisition method of the first embodiment;

FIG. 2 is a flowchart of a training and application method of the first embodiment;

FIG. 3 is a schematic diagram of a combined neural network unit according to the first embodiment.

Detailed Description

It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the disclosure. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs.

It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to limit the example embodiments of the present application. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should further be understood that the terms "comprises" and/or "comprising", when used in this specification, specify the presence of the stated features, steps, operations, devices, components, and/or combinations thereof.
