Detection method for functional prophage in bacteria and position and sequence thereof

Document No.: 1339749. Published: 2020-07-17.

Reading note: This technology, "Detection method for functional prophage in bacteria and position and sequence thereof" (一种细菌中功能性前噬菌体及其位置与序列的检测方法), was designed and created by Zhang Xianglilan (张湘莉兰), Xie Xiangcheng (谢湘成), Tong Yigang (童贻刚), Sun Qiang (孙强), Peng Shaoliang (彭绍亮), Zhai Shixiang (翟诗翔), Tong Shanwei (童善惟) and Niu Qi (牛琦) on 2020-03-31. Abstract: The invention discloses a method for detecting functional prophages in bacteria, together with their positions and sequences. The method comprises: predicting open reading frames in the genome sequencing data of the bacterium to be tested to obtain the proteins they encode; aligning these protein sequences against the sequences in a phage protein library, a protein that aligns to a phage protein being a functional protein; searching for forward repeat sequences in the gene encoding the functional protein and upstream and downstream of it, the sequence between two forward repeat sequences being the candidate sequence of a candidate prophage; and joining the head and tail of the candidate sequence. A candidate prophage for which the sequencing data contain sequencing reads spanning the head-to-tail junction of the candidate sequence is a functional phage; a candidate prophage for which the sequencing data contain no such reads is not a functional phage. The method is simple to operate and has broad application prospects.

1. A method of detecting functional prophages in a bacterium, comprising:

(1) performing high-throughput sequencing on the genome of a bacterium to be tested to obtain sequencing data, wherein the bacterium to be tested contains a bacteriophage;

(2) predicting open reading frames in the sequencing data to obtain the proteins encoded by the open reading frames, and recording these proteins as candidate proteins;

(3) aligning the sequence of each candidate protein against the sequences in a phage protein library, wherein a candidate protein that aligns to a phage protein is a functional protein and a candidate protein that does not align to any phage protein is a non-functional protein; the position of the gene encoding the functional protein in the genome of the bacterium to be tested is the position of a candidate prophage, and is recorded as the rough position;

(4) searching for forward repeat sequences at the rough position and upstream and downstream of it by a sliding-window method, wherein the forward repeat sequences are the forward repeats located at the two ends of the prophage sequence formed when a lysogenic phage integrates into the bacterial genome; the sliding-window method comprises: defining two sliding windows of length n at the rough position and upstream and downstream of it, wherein n is 50 bp and the two windows are 10,000 bp apart; comparing the sequences in the two windows to determine whether they contain forward repeat sequences; and, if they do not, sliding the two windows upstream and downstream along the sequence to determine whether forward repeat sequences exist at the rough position and upstream and downstream of it;

recording the sequence that lies between the two forward repeat sequences and contains the gene encoding the functional protein as the candidate sequence of the candidate prophage, and recording the position of the candidate sequence in the genome of the bacterium to be tested as the candidate position of the candidate prophage;

(5) joining the head and tail of the candidate sequence to obtain a circular sequence, and determining whether the candidate prophage is a functional prophage as follows: if the sequencing data contain sequencing reads spanning the head-to-tail junction of the candidate sequence, the candidate prophage is, or is a candidate for, a functional phage; if the sequencing data contain no sequencing reads spanning the head-to-tail junction of the candidate sequence, the candidate prophage is not, or is not a candidate for, a functional phage.

2. A method of detecting the location of a prophage in the genome of a bacterium, comprising:

(1) performing high-throughput sequencing on the genome of a bacterium to be tested to obtain sequencing data, wherein the bacterium to be tested contains a bacteriophage;

(2) predicting open reading frames in the sequencing data to obtain the proteins encoded by the open reading frames, and recording these proteins as candidate proteins;

(3) aligning the sequence of each candidate protein against the sequences in a phage protein library, wherein a candidate protein that aligns to a phage protein is a functional protein and a candidate protein that does not align to any phage protein is a non-functional protein; the position of the gene encoding the functional protein in the genome of the bacterium to be tested is the position of a candidate prophage, and is recorded as the rough position;

(4) searching for forward repeat sequences at the rough position and upstream and downstream of it by a sliding-window method, wherein the forward repeat sequences are the forward repeats located at the two ends of the prophage sequence formed when a lysogenic phage integrates into the bacterial genome; the sliding-window method comprises: defining two sliding windows of length n at the rough position and upstream and downstream of it, wherein n is 50 bp and the two windows are 10,000 bp apart; comparing the sequences in the two windows to determine whether they contain forward repeat sequences; and, if they do not, sliding the two windows upstream and downstream along the sequence to determine whether forward repeat sequences exist at the rough position and upstream and downstream of it;

recording the sequence that lies between the two forward repeat sequences and contains the gene encoding the functional protein as the candidate sequence of the candidate prophage, and recording the position of the candidate sequence in the genome of the bacterium to be tested as the candidate position of the candidate prophage;

(5) joining the head and tail of the candidate sequence to obtain a circular sequence, and determining the position of the candidate prophage in the genome of the bacterium to be tested as follows: if the sequencing data contain sequencing reads spanning the head-to-tail junction of the candidate sequence, the candidate position is, or is a candidate for, the position of the candidate prophage in the genome of the bacterium to be tested; if the sequencing data contain no sequencing reads spanning the head-to-tail junction of the candidate sequence, the candidate position is not, or is not a candidate for, the position of the candidate prophage in the genome of the bacterium to be tested.

3. A method of detecting a prophage sequence in a bacterial genome, comprising:

(1) performing high-throughput sequencing on the genome of a bacterium to be tested to obtain sequencing data, wherein the bacterium to be tested contains a bacteriophage;

(2) predicting open reading frames in the sequencing data to obtain the proteins encoded by the open reading frames, and recording these proteins as candidate proteins;

(3) aligning the sequence of each candidate protein against the sequences in a phage protein library, wherein a candidate protein that aligns to a phage protein is a functional protein and a candidate protein that does not align to any phage protein is a non-functional protein; the position of the gene encoding the functional protein in the genome of the bacterium to be tested is the position of a candidate prophage, and is recorded as the rough position;

(4) searching for forward repeat sequences at the rough position and upstream and downstream of it by a sliding-window method, wherein the forward repeat sequences are the forward repeats located at the two ends of the prophage sequence formed when a lysogenic phage integrates into the bacterial genome; the sliding-window method comprises: defining two sliding windows of length n at the rough position and upstream and downstream of it, wherein n is 50 bp and the two windows are 10,000 bp apart; comparing the sequences in the two windows to determine whether they contain forward repeat sequences; and, if they do not, sliding the two windows upstream and downstream along the sequence to determine whether forward repeat sequences exist at the rough position and upstream and downstream of it;

recording the sequence that lies between the two forward repeat sequences and contains the gene encoding the functional protein as the candidate sequence of the candidate prophage, and recording the position of the candidate sequence in the genome of the bacterium to be tested as the candidate position of the candidate prophage;

(5) joining the head and tail of the candidate sequence to obtain a circular sequence, and determining the sequence of the candidate prophage as follows: if the sequencing data contain sequencing reads spanning the head-to-tail junction of the candidate sequence, the candidate sequence is, or is a candidate for, the sequence of the candidate prophage; if the sequencing data contain no sequencing reads spanning the head-to-tail junction of the candidate sequence, the candidate sequence is not, or is not a candidate for, the sequence of the candidate prophage.

4. The method according to any one of claims 1 to 3, wherein in step (3), the sequence of the candidate protein is aligned against the sequences in the phage protein library using BLASTP.

5. The method according to any one of claims 1-4, wherein: the phage protein library is a database consisting of phage sequences in NCBI.

6. The method according to any one of claims 1-5, wherein: the length of the forward repeat sequence is 14-50 bp;

and/or, the distance between the two forward repeat sequences is greater than or equal to 10,000 bp;

and/or, searching for forward repeat sequences at the rough position and upstream and downstream of it means searching within 45,000 bp upstream and 45,000 bp downstream of the rough position.

7. A data processing system for detecting a functional prophage in a bacterium, for detecting the location of a prophage in the genome of a bacterium, or for detecting a prophage sequence in the genome of a bacterium according to the method of any one of claims 1 to 6.

8. Use of the method of any one of claims 1 to 6 or the system of claim 7 for constructing a functional phage database.

Technical Field

The invention relates to the field of biotechnology, and discloses a method for detecting functional prophage in bacteria and a position and a sequence thereof.

Background

Bacteriophages, viruses that infect bacteria, play an important biological role in their hosts. They fall into two categories: lytic phages and lysogenic phages. A lysogenic phage can integrate its own genome into the bacterial genome (after integration it is called a prophage) or persist in the bacterial cytoplasm as a plasmid.

Prophages, an important form in which bacteriophages exist, play an important role in the evolution of prokaryotes and are a driving force in the diversification of bacterial genomes. When a prophage integrates into a bacterium, it can alter host gene expression and disrupt the original bacterial genome, changing the bacterial phenotype through, among other things, horizontal transfer of virulence and resistance genes. For example, the major virulence genes of the enterohemorrhagic Escherichia coli O104:H4 found in Germany are encoded by prophages, and the cholera toxin gene of Vibrio cholerae is encoded by the filamentous functional prophage CTXφ.

Prophages include functional prophages and latent (cryptic) prophages. Functional prophages can still lyse the host after induction under specific conditions. Latent prophages, also known as prophage genetic elements, can no longer undergo lysis because of mutation. Only functional prophages are able to lyse and infect bacteria; a functional prophage that has been induced out of its host bacterium is a lysogenic phage. Only with a complete functional prophage sequence can the phage-bacterium interaction be understood deeply and systematically, and research on bacterial drug resistance be carried further.

On the other hand, as bacterial resistance has become a serious problem, lytic phages have been studied more and more extensively in recent years because they can kill drug-resistant bacteria. To use lytic phages safely, it is very important to check for and exclude functional prophages carefully. Lytic phages are generally isolated and purified by picking suitable plaques. If functional prophages integrated in the bacteria enter the lytic cycle, they are easily mixed into the selected lytic phage. When a lytic phage preparation containing induced functional prophages (that is, lysogenic phages) is used to kill bacteria, the lysogenic phages in it are likely to integrate into bacteria again and horizontally transfer the virulence and drug-resistance factors they carry into the bacterial genome, so that bacteria that are not killed acquire new virulence or pathogenicity, their phenotype changes, and the evolution and adaptation of the host bacteria are accelerated. A better understanding of functional prophages therefore helps in understanding bacterial pathogenicity and specific metabolic pathways, and in producing lytic phages safely.

Traditional methods identify bacteria containing functional prophages by inducing and isolating the prophages. In biological experiments, functional prophages are released from the host bacteria by ultraviolet irradiation or by adding chemicals such as mitomycin, which damage the DNA of the host bacteria; the induced functional prophages (that is, lysogenic phages) are then isolated on double-layer agar plates, amplified, cultured and used to infect different bacterial strains, and plaques are observed and clones picked, so that the phage can be characterized and its interaction with the bacteria studied.

Notably, because of the lysogenic nature of lysogenic phages (that is, integration of their own genome into the bacterial genome), even after successful induction a lysogenic phage readily recombines with and integrates into the host bacterium during infection and re-enters the lysogenic state, so that no plaques can be observed. This lysogenic nature makes the induction of lysogenic phages very difficult and inefficient.

The development of high-throughput sequencing technology has made it possible to obtain massive numbers of bacterial genome sequences in a short time, and to predict prophages in bacterial sequences by computer algorithms. However, the low similarity among prophages at the family level, and the uncertain size of the prophage genes integrated into the bacterial genome, have always limited efficient prophage prediction and make it computationally very challenging.

Early methods for identifying prophages were usually based on differences in GC content or on identifying defective genes, but predictions from such simple calculations are very unreliable. In the late 2000s, a number of improved programs and web services emerged to help predict prophages in bacterial genomes. These methods first align the input sequences against known phage and bacterial genes, perform tRNA and dinucleotide analyses, and predict binding sites using hidden Markov models. This greatly improved the accuracy of prophage prediction and prompted the development of further prophage prediction tools, including software that does not depend on known phage sequences and software aimed at metagenomic sequencing data. The PHAST series of web services is widely used for prophage prediction at present. Such web-based applications are limited in throughput, and long response times during peak usage reduce prediction efficiency; furthermore, it is not realistic for users (such as microbiology research institutes) that have generated large amounts of data to upload all of their bacterial genome data to a website for analysis.

In addition, the above tools predict a large number of prophage sequences from a single bacterial genome. Biological experiments show, however, that many of the predicted prophages cannot be induced, while the functional prophages that can actually be induced differ greatly from the predictions: predicted positions are offset, and functional prophages are not predicted accurately. None of these tools therefore predicts the precise location of functional prophages, and none automatically extracts the complete functional prophage genome sequence from the bacterial genome.

Disclosure of Invention

The invention aims to solve the technical problem of how to accurately detect functional prophages in bacteria and the positions and sequences thereof.

To solve the above technical problems, the present invention provides, in a first aspect, a method for detecting functional prophage/lysogenic phage in a bacterium, the method comprising:

(1) performing high-throughput sequencing on the genome of a bacterium to be tested to obtain sequencing data, wherein the bacterium to be tested contains a bacteriophage;

(2) predicting open reading frames in the sequencing data to obtain the proteins encoded by the open reading frames, and recording these proteins as candidate proteins;

(3) aligning the sequence of each candidate protein against the sequences in a phage protein library, wherein a candidate protein that aligns to a phage protein is a functional protein and a candidate protein that does not align to any phage protein is a non-functional protein; the position of the gene encoding the functional protein in the genome of the bacterium to be tested is the position of a candidate prophage, and is recorded as the rough position;

(4) searching for forward repeat sequences at the rough position and upstream and downstream of it by a sliding-window method, wherein the forward repeat sequences are the forward repeats located at the two ends of the prophage sequence formed when a lysogenic phage integrates into the bacterial genome; the sliding-window method comprises: defining two sliding windows of length n at the rough position and upstream and downstream of it, wherein n is 50 bp and the two windows are 10,000 bp apart; comparing the sequences in the two windows to determine whether they contain forward repeat sequences; and, if they do not, sliding the two windows upstream and downstream along the sequence to determine whether forward repeat sequences exist at the rough position and upstream and downstream of it;

recording the sequence that lies between the two forward repeat sequences and contains the gene encoding the functional protein as the candidate sequence of the candidate prophage, and recording the position of the candidate sequence in the genome of the bacterium to be tested as the candidate position of the candidate prophage;

(5) joining the head and tail of the candidate sequence to obtain a circular sequence, and determining whether the candidate prophage is a functional prophage as follows: if the sequencing data contain sequencing reads (reads) spanning the head-to-tail junction of the candidate sequence, the candidate prophage is, or is a candidate for, a functional phage; if the sequencing data contain no sequencing reads (reads) spanning the head-to-tail junction of the candidate sequence, the candidate prophage is not, or is not a candidate for, a functional phage.
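The junction test of step (5) can be illustrated with a minimal Python sketch. It is not the implementation used by the invention: the function names, the 20-base flank and the minimum of one supporting read are assumptions made for illustration, and exact substring matching stands in for whatever read alignment is actually used. Whether the candidate sequence includes a copy of the repeat is not specified here, so the sketch simply joins the two ends of whatever candidate sequence it is given.

```python
from typing import Iterable


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def junction_probe(candidate: str, flank: int = 20) -> str:
    """Join the last `flank` bases of the candidate to its first `flank` bases.

    This short sequence exists only if the candidate has circularized,
    because in the linear genome the tail and head are far apart.
    """
    if len(candidate) < 2 * flank:
        raise ValueError("candidate must be at least 2 * flank bases long")
    return candidate[-flank:] + candidate[:flank]


def count_spanning_reads(candidate: str, reads: Iterable[str], flank: int = 20) -> int:
    """Count reads containing the junction on either strand (exact match only)."""
    probe = junction_probe(candidate.upper(), flank)
    probe_rc = reverse_complement(probe)
    return sum(
        1 for read in reads
        if probe in read.upper() or probe_rc in read.upper()
    )


def is_functional_candidate(candidate: str, reads: Iterable[str],
                            flank: int = 20, min_reads: int = 1) -> bool:
    """Hypothetical decision rule: at least `min_reads` junction-spanning reads."""
    return count_spanning_reads(candidate, reads, flank) >= min_reads
```

A read that lies entirely inside the linear candidate sequence never contains the probe, so only reads derived from a circularized (excised) copy are counted.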

The invention also provides a method for detecting the position of a prophage in a bacterial genome, which comprises the following steps:

(1) performing high-throughput sequencing on the genome of a bacterium to be tested to obtain sequencing data, wherein the bacterium to be tested contains a bacteriophage;

(2) predicting open reading frames in the sequencing data to obtain the proteins encoded by the open reading frames, and recording these proteins as candidate proteins;

(3) aligning the sequence of each candidate protein against the sequences in a phage protein library, wherein a candidate protein that aligns to a phage protein is a functional protein and a candidate protein that does not align to any phage protein is a non-functional protein; the position of the gene encoding the functional protein in the genome of the bacterium to be tested is the position of a candidate prophage, and is recorded as the rough position;

(4) searching for forward repeat sequences at the rough position and upstream and downstream of it by a sliding-window method, wherein the forward repeat sequences are the forward repeats located at the two ends of the prophage sequence formed when a lysogenic phage integrates into the bacterial genome; the sliding-window method comprises: defining two sliding windows of length n at the rough position and upstream and downstream of it, wherein n is 50 bp and the two windows are 10,000 bp apart; comparing the sequences in the two windows to determine whether they contain forward repeat sequences; and, if they do not, sliding the two windows upstream and downstream along the sequence to determine whether forward repeat sequences exist at the rough position and upstream and downstream of it;

recording the sequence that lies between the two forward repeat sequences and contains the gene encoding the functional protein as the candidate sequence of the candidate prophage, and recording the position of the candidate sequence in the genome of the bacterium to be tested as the candidate position of the candidate prophage;

(5) joining the head and tail of the candidate sequence to obtain a circular sequence, and determining the position of the candidate prophage in the genome of the bacterium to be tested as follows: if the sequencing data contain sequencing reads (reads) spanning the head-to-tail junction of the candidate sequence, the candidate position is, or is a candidate for, the position of the candidate prophage in the genome of the bacterium to be tested; if the sequencing data contain no sequencing reads (reads) spanning the head-to-tail junction of the candidate sequence, the candidate position is not, or is not a candidate for, the position of the candidate prophage in the genome of the bacterium to be tested.

The invention also provides a method for detecting a prophage sequence in a bacterial genome, which comprises the following steps:

(1) performing high-throughput sequencing on the genome of a bacterium to be tested to obtain sequencing data, wherein the bacterium to be tested contains a bacteriophage;

(2) predicting open reading frames in the sequencing data to obtain the proteins encoded by the open reading frames, and recording these proteins as candidate proteins;

(3) aligning the sequence of each candidate protein against the sequences in a phage protein library, wherein a candidate protein that aligns to a phage protein is a functional protein and a candidate protein that does not align to any phage protein is a non-functional protein; the position of the gene encoding the functional protein in the genome of the bacterium to be tested is the position of a candidate prophage, and is recorded as the rough position;

(4) searching for forward repeat sequences at the rough position and upstream and downstream of it by a sliding-window method, wherein the forward repeat sequences are the forward repeats located at the two ends of the prophage sequence formed when a lysogenic phage integrates into the bacterial genome; the sliding-window method comprises: defining two sliding windows of length n at the rough position and upstream and downstream of it, wherein n is 50 bp and the two windows are 10,000 bp apart; comparing the sequences in the two windows to determine whether they contain forward repeat sequences; and, if they do not, sliding the two windows upstream and downstream along the sequence to determine whether forward repeat sequences exist at the rough position and upstream and downstream of it;

recording the sequence that lies between the two forward repeat sequences and contains the gene encoding the functional protein as the candidate sequence of the candidate prophage, and recording the position of the candidate sequence in the genome of the bacterium to be tested as the candidate position of the candidate prophage;

(5) joining the head and tail of the candidate sequence to obtain a circular sequence, and determining the sequence of the candidate prophage as follows: if the sequencing data contain sequencing reads (reads) spanning the head-to-tail junction of the candidate sequence, the candidate sequence is, or is a candidate for, the sequence of the candidate prophage; if the sequencing data contain no sequencing reads (reads) spanning the head-to-tail junction of the candidate sequence, the candidate sequence is not, or is not a candidate for, the sequence of the candidate prophage.

In the above, the sequencing depth in step (1) is sufficient to obtain the full-length sequence of the bacteria to be tested.
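As a rough aid for judging sequencing depth, mean coverage is commonly estimated as the total number of sequenced bases divided by the genome length. The sketch below shows only this arithmetic; the function name and the example figures are hypothetical, and no particular depth threshold is stated in this document.

```python
def mean_coverage(n_reads: int, read_length: int, genome_length: int) -> float:
    """Estimate mean sequencing depth as total sequenced bases / genome length."""
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    return n_reads * read_length / genome_length


# Hypothetical run: one million 150-base reads over a 5 Mb genome.
depth = mean_coverage(1_000_000, 150, 5_000_000)
```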

In step (3), the alignment of the sequence of the candidate protein against the sequences in the phage protein library can be performed using BLASTP.

The phage protein library may be a database composed of the phage sequences in NCBI.
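How step (3) turns alignment results into rough positions can be sketched in Python as follows. The sketch assumes that BLASTP was run with tabular output (`-outfmt 6`), whose default columns are qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue and bitscore. The e-value cutoff of 1e-5, the function names and the `orf_coords` mapping from protein ID to genome coordinates are assumptions for illustration; this document does not state a cutoff.

```python
import csv
import io
from typing import Dict, List, Set, Tuple


def phage_like_proteins(blast_tsv: str, max_evalue: float = 1e-5) -> Set[str]:
    """Return IDs of candidate proteins with at least one hit passing the cutoff."""
    hits: Set[str] = set()
    for row in csv.reader(io.StringIO(blast_tsv), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        query_id, evalue = row[0], float(row[10])  # column 11 is the e-value
        if evalue <= max_evalue:
            hits.add(query_id)
    return hits


def rough_positions(hits: Set[str],
                    orf_coords: Dict[str, Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Map phage-like proteins to the genome coordinates of their coding genes."""
    return sorted(orf_coords[p] for p in hits if p in orf_coords)
```

Proteins with no row in the table, or with only weak hits, are treated as non-functional proteins and contribute no rough position.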

The length of the forward repeat sequences can be 14-50 bp, and the forward repeat sequences are attL and attR.

The distance between the two forward repeat sequences may be 10,000 bp or more.

Searching for forward repeat sequences at the rough position and upstream and downstream of it can mean searching within 45,000 bp upstream and 45,000 bp downstream of the rough position.
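The repeat search, using the parameter values given above (repeats of 14-50 bp, at least 10,000 bp apart, searched within 45,000 bp on each side), can be sketched in Python as follows. This is a simplification and not the sliding-window procedure of the invention: instead of comparing pairs of 50 bp windows, it indexes every 14-base word upstream of the gene and looks up each downstream word in that index, then extends exact matches. Coordinates are 0-based with an exclusive gene end, distance is measured between repeat start positions, and only exact repeats are found; all of these choices and the function name are assumptions.

```python
from typing import Dict, List, Optional, Tuple


def find_direct_repeats(genome: str, gene_start: int, gene_end: int,
                        min_len: int = 14, max_len: int = 50,
                        min_gap: int = 10_000,
                        flank: int = 45_000) -> Optional[Tuple[int, int, int]]:
    """Return (left_start, right_start, length) of the longest exact repeat pair
    flanking the gene, or None if no pair satisfies the constraints."""
    lo = max(0, gene_start - flank)
    hi = min(len(genome), gene_end + flank)

    # Index every min_len-base word lying entirely upstream of the gene.
    upstream: Dict[str, List[int]] = {}
    for i in range(lo, gene_start - min_len + 1):
        upstream.setdefault(genome[i:i + min_len], []).append(i)

    best: Optional[Tuple[int, int, int]] = None
    # Look up every min_len-base word lying entirely downstream of the gene.
    for j in range(gene_end, hi - min_len + 1):
        for i in upstream.get(genome[j:j + min_len], ()):
            if j - i < min_gap:
                continue  # repeat copies too close together
            length = min_len
            # Extend the match base by base, up to max_len, without letting
            # the left copy run into the gene or the right copy leave the region.
            while (length < max_len
                   and i + length < gene_start
                   and j + length < hi
                   and genome[i + length] == genome[j + length]):
                length += 1
            if best is None or length > best[2]:
                best = (i, j, length)
    return best
```

Given a result `(left, right, length)`, the interval between the two repeat copies contains the gene encoding the functional protein and corresponds to the candidate sequence and candidate position described above.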

The invention also provides a data processing system capable of detecting a functional prophage in a bacterium, detecting a position of a prophage in a genome of a bacterium, or detecting a sequence of a prophage in a genome of a bacterium according to a method of detecting a functional prophage in a bacterium, a method of detecting a position of a prophage in a genome of a bacterium, or a method of detecting a sequence of a prophage in a genome of a bacterium.

In one embodiment of the invention, the system is LysoPhD/LivePhD or a vector described as LysoPhD/LivePhD.

The invention also provides a functional prophage database obtained by detecting a plurality of bacterial genomes using a method for detecting functional prophages in said bacteria, a method for detecting the location of a prophage in said bacterial genomes, or a method for detecting a prophage sequence in said bacterial genomes.

The database may include functional prophages, prophage positions in the bacterial genome and/or prophage sequences. Such sequence information constitutes a functional prophage database of the invention.

In one embodiment of the invention, the functional prophage database is a data access website for livephase.

The invention also provides the use of the method for detecting functional prophages in said bacteria, the method for detecting the position of a prophage in said bacterial genome, the method for detecting a prophage sequence in said bacterial genome, or said system, for constructing a functional phage database.

The database may include functional prophages, prophage positions in the bacterial genome and/or prophage sequences.

In the present invention, a functional prophage, also called a lysogenic phage, refers to a prophage that can still lyse the host after being induced to separate from the host genome under specific conditions (e.g., ultraviolet or mitomycin treatment).

In the present invention, the sequencing data may also be obtained directly from public databases.

Experiments show that, when 76 bacterial samples were examined with the method of the invention for detecting functional prophages in bacteria and their positions and sequences, a total of 11 lysogenic bacteria (that is, bacteria containing prophages) were detected together with the positions and sequences of the prophages they contain; these prophages were all determined to be functional prophages, and the results were verified by biological experiments and found to be accurate. This shows that the method can efficiently and accurately determine the sequence of a lysogenic phage and its integration position in the bacterial genome, and determine whether the prophages contained in a bacterium are functional prophages. In addition, the invention detects functional prophages and their positions and sequences directly from raw high-throughput sequencing data, without the 454 Newbler assembly software or the Cytoscape software (if Cytoscape fails for any reason to display circularization, detections are missed); it can process large data sets automatically in batches, is simple to operate and high in throughput, and predicts as many functional prophages as possible by automated, intelligent means. Compared with methods that rely on third-party software such as Cytoscape, the method markedly reduces the rate of missed detections and false negatives, and has broad application prospects.

Drawings

FIG. 1 is a workflow diagram of LysoPhD/LivePhD.

FIG. 2 is an example of a sliding window search for an integration site core sequence.

FIG. 3 is a schematic diagram of a prophage assay.

Detailed Description

The present invention is described in further detail below with reference to specific embodiments, which are given for the purpose of illustration only and are not intended to limit the scope of the invention. The experimental procedures in the following examples are conventional unless otherwise specified. Materials, reagents, instruments and the like used in the following examples are commercially available unless otherwise specified. All quantitative tests in the following examples were performed in triplicate, and the results were averaged.

The invention discloses a method for detecting functional prophages in bacteria and the positions and sequences thereof, which comprises the following steps:

(1) performing high-throughput sequencing on the genome of the bacteria to be detected to obtain sequencing data; the sequencing depth is sufficient to obtain the full-length sequence of the bacteria to be detected;

(2) predicting the open reading frames (ORFs) in the sequencing data, obtaining the proteins encoded by the ORFs, and marking the obtained proteins as candidate proteins;

(3) aligning the sequences of the candidate proteins obtained in step (2) against the sequences in a phage protein library (derived from NCBI) using blastp, wherein a candidate protein that aligns to a phage protein is a functional protein, and a candidate protein that does not align to any phage protein is a non-functional protein; the position of the coding gene of the functional protein in the genome of the bacteria to be detected is the position of the candidate prophage, and this position is marked as the rough position;

(4) searching for forward repeat sequences in the rough position and in the regions upstream and downstream of it (45,000 bp upstream and 45,000 bp downstream, centered on the functional protein), marking the sequence between two forward repeat sequences that contains the coding gene of the functional protein as the candidate sequence of the candidate prophage, and marking the position of the candidate sequence in the genome of the bacteria to be detected as the candidate position of the candidate prophage; the distance between the two forward repeat sequences is greater than or equal to 10,000 bp;

the forward repeat sequences are the short 14-50 bp forward (direct) repeat sequences (attL and attR) at the two ends of the prophage sequence, formed when the lysogenic phage integrates into the bacterial genome;

(5) connecting the head and the tail of the candidate sequence to obtain a circular sequence; determining whether the candidate prophage is a functional prophage according to the following method: if the sequencing data contain sequencing reads spanning the head-to-tail junction of the candidate sequence, the candidate prophage is a functional phage; if the sequencing data contain no sequencing reads spanning the head-to-tail junction of the candidate sequence, the candidate prophage is not a functional phage, i.e., the bacteria to be detected do not contain the functional phage;

the position of the candidate prophage in the genome of the bacteria to be detected is determined according to the following method: if the sequencing data contain sequencing reads spanning the head-to-tail junction of the candidate sequence, the candidate position is the position of the candidate prophage in the genome of the bacteria to be detected; if the sequencing data contain no such reads, the candidate position is not the position of the candidate prophage in the genome of the bacteria to be detected;

the sequence of the candidate prophage is determined according to the following method: if the sequencing data contain sequencing reads spanning the head-to-tail junction of the candidate sequence, the candidate sequence is the sequence of the candidate prophage; if the sequencing data contain no such reads, the candidate sequence is not the sequence of the candidate prophage.
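
Steps (4) and (5) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the LysoPhD/LivePhD implementation: the function names, the 0-based coordinates, the exact (mismatch-free) repeat matching, the start-to-start measure of the distance between repeats, and the 60 bp junction flank are all hypothetical choices made for the example. The candidate sequence is assumed to run from the start of the left repeat to the start of the right repeat, so that the circularized sequence carries one copy of the repeat.

```python
from typing import Iterable, List, Tuple


def find_att_candidates(genome: str, gene_start: int, gene_end: int,
                        flank: int = 45_000, min_len: int = 14,
                        max_len: int = 50, min_dist: int = 10_000
                        ) -> List[Tuple[int, int, int]]:
    """Return (left_start, right_start, length) for exact forward repeats
    that enclose the gene occupying [gene_start, gene_end)."""
    start = max(0, gene_start - flank)
    end = min(len(genome), gene_end + flank)
    seen = {}   # seed of min_len bases -> start positions seen so far
    hits = []
    for j in range(start, end - min_len + 1):
        seed = genome[j:j + min_len]
        for i in seen.get(seed, []):
            if j - i < min_dist:      # assumption: start-to-start distance
                continue
            if i > start and genome[i - 1] == genome[j - 1]:
                continue              # not the left end of this repeat pair
            length = min_len
            # Extend the match to the right, capped at max_len.
            while (length < max_len and j + length < end
                   and genome[i + length] == genome[j + length]):
                length += 1
            # Keep only repeat pairs that enclose the phage-like gene.
            if i + length <= gene_start and j >= gene_end:
                hits.append((i, j, length))
        seen.setdefault(seed, []).append(j)
    return hits


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string (other characters are kept)."""
    return seq.translate(str.maketrans("ACGTacgt", "TGCAtgca"))[::-1]


def junction_sequence(genome: str, left: int, right: int,
                      flank: int = 60) -> str:
    """Join the tail of the candidate sequence to its head, as happens
    when the candidate sequence is circularized."""
    candidate = genome[left:right]
    return candidate[-flank:] + candidate[:flank]


def has_junction_read(reads: Iterable[str], junction: str) -> bool:
    """True if any read contains the junction on either strand."""
    rc = revcomp(junction)
    return any(junction in read or rc in read for read in reads)
```

The flank is chosen to be longer than the longest repeat (50 bp) so that the junction string extends past the repeat into prophage sequence; with a shorter flank, a read from the intact chromosome across attR would also match. On real data, reads would normally be aligned to the junction with a read mapper that tolerates sequencing errors rather than by exact substring search.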

The method is illustrated below using specific bacteria as examples. The invention also provides software, named LysoPhD/LivePhD, that detects functional prophages in bacteria and the positions and sequences thereof according to this method.
