Method for judging the reliability of gene editing results detected by second-generation sequencing, and its application

Document No.: 1955242    Published: 2021-12-10

Reading note: This technology, "Method for judging the reliability of gene editing results detected by second-generation sequencing, and its application", was created on 2021-09-14 by 曾婉俐, 李雪梅, 向海英, 高茜, 米其利, 刘欣, 李晶, 黄海涛, 杨光宇, 蒋佳芮 and 许力. Abstract: The invention discloses a method for judging the credibility of gene editing results detected by second-generation sequencing, and its application. The method comprises: (1) for a single sample, both the valid data (Clean Reads) and the data mapped to the target gene (Mapped Reads) are greater than or equal to 2000, and the ratio of mapped data to valid data is greater than or equal to 50%; (2) if a variant sequence differing from the reference sequence is detected at a position within the 23 nt targeted by the sgRNA sequence and the ratio of the variant sequence to the reference sequence is greater than or equal to 10:1, the mutation is judged homozygous; if no variant sequence is detected, no editing is considered to have occurred; (3) the GQ value is set to greater than or equal to 30; (4) when a variant sequence is detected, the variant result is credible if (1) to (3) are all satisfied and not credible otherwise; when no variant sequence is detected, the non-variant result is credible if (1) is satisfied and not credible otherwise. The method addresses false positives and false negatives in interpreting second-generation sequencing results and improves the reliability of the detection results.

1. A method for judging the credibility of a gene editing result detected by second-generation sequencing, characterized by comprising the following steps:

(1) data volume limits: for a single sample, Clean Reads are greater than or equal to 2000; Mapped Reads (reads that map to the target gene) are greater than or equal to 2000; and the ratio of Mapped Reads to Clean Reads is greater than or equal to 50%;

(2) if a variant sequence differing from the reference sequence is detected at a position within the 23 nt targeted by the sgRNA sequence, the variant is judged a homozygous mutation when the ratio DP-Value between the variant sequence and the reference sequence is greater than or equal to 10:1, and a heterozygous mutation otherwise; if no variant sequence is detected, no editing is considered to have occurred;

(3) the genotype quality (GQ) value is set to greater than or equal to 30;

(4) for samples in which a variant sequence is detected, the variant result is judged "Right" if steps (1), (2) and (3) are all satisfied, and "Wrong" otherwise; for samples in which no variant sequence is detected, the non-variant result is judged "Right" if step (1) is satisfied, and "Wrong" otherwise.

2. The method for judging the credibility of a gene editing result detected by second-generation sequencing according to claim 1, further comprising the following step:

(5) a sample whose variant detection result is judged "Right" has a credible target gene editing result and can be prioritized for further research; a sample whose non-variant detection result is judged "Right" has a credible unedited result for the target gene and is not carried forward to the next stage of research.

3. The method for judging the credibility of a gene editing result detected by second-generation sequencing according to claim 2, further comprising the following step:

(6) a sample judged mutated or not mutated whose detection result is "Wrong" has an unreliable edited or unedited result for the target gene, and can be sequenced again to decide whether it merits further research.

4. The method for judging the credibility of a gene editing result detected by second-generation sequencing according to claim 3, wherein re-sequencing and then deciding whether to perform further research in step (6) means re-performing steps (1)-(4), steps (1)-(5), or steps (1)-(6).

5. Use of the method according to any one of claims 1 to 4 for judging the credibility of gene editing results detected by second-generation sequencing.

6. The use according to claim 5, characterized in that the method is applied to mixed-pool knockout libraries for the high-throughput creation of tobacco gene editing materials.

Technical Field

The invention relates to the technical field of plant genetic engineering, and in particular to a method for judging the credibility of gene editing results detected by second-generation sequencing, and its application.

Background

The gene editing technology CRISPR-Cas9 is an effective means of rapidly producing specific gene mutations and is now widely studied and applied in breeding and gene function research. CRISPR-Cas9 is used to edit target genes, so that changes in material traits caused by target gene variation can be used to study the functions of those genes. Editing materials obtained with CRISPR-Cas9 must be sequenced to determine the editing position, editing type and genotype of the target gene.

Mutations of a single target gene in an editing material are conventionally detected by first-generation sequencing, and the editing position, editing type and genotype of the target gene are determined by analyzing the resulting peak map (chromatogram). When editing materials are created on a large scale and the number of target genes to be tested grows, first-generation sequencing raises both the sequencing cost and the workload of peak map analysis, so methods that detect target gene mutations by second-generation sequencing of PCR products have emerged. Second-generation sequencing can sequence thousands of samples simultaneously; it determines the editing positions, editing types and genotypes of different target genes through PCR amplification, library construction, sequencing, data splitting (demultiplexing) and analysis.

Current second-generation sequencing analysis determines the editing position and editing type by statistically analyzing the differences and ratios between the detected variant sequence (alt) and the reference sequence (ref); the genotype is called by the mutation detection software from the numbers of reads supporting ref and alt. This approach does not take into account factors such as the total number of reads and the proportion of reads mapped to the target, so both the mutation and the non-mutation detection results for a target gene may be false positives or false negatives, which can affect the selection of samples for the next stage of research.
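The weakness described above can be made concrete with a hypothetical ratio-only caller. The function name `naive_genotype` and the 0.9 alternate-read fraction are illustrative assumptions, not the behaviour of any particular mutation detection software; the point is only that a call based on the ref/alt proportion alone ignores how many reads support it.

```python
def naive_genotype(ref_reads: int, alt_reads: int) -> str:
    """Hypothetical ratio-only genotype call: no minimum on total reads
    and no check on the fraction of reads mapped to the target."""
    if alt_reads == 0:
        return "no edit"
    alt_fraction = alt_reads / (ref_reads + alt_reads)
    # Assumed cut-off: a dominant alt allele is called homozygous.
    return "Hom" if alt_fraction >= 0.9 else "Het"


# A well-covered sample and a barely covered one receive the same call,
# although the second rests on only 3 reads.
print(naive_genotype(ref_reads=150, alt_reads=3000))  # Hom
print(naive_genotype(ref_reads=0, alt_reads=3))       # Hom
```

Because both calls come out identical, nothing in the output flags the second as fragile; the data-volume limits of step (1) are aimed at exactly this gap.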

Disclosure of Invention

The technical problem to be solved by the invention is to provide a method for judging the reliability of gene editing results detected by second-generation sequencing, and its application, which addresses false positives and false negatives in interpreting second-generation sequencing results, improves the reliability of those judgments, and improves the efficiency of obtaining homozygous editing materials.

The invention solves this technical problem through the following technical scheme:

A method for judging the credibility of a gene editing result detected by second-generation sequencing comprises the following steps:

(1) data volume limits: the valid data of a single sample (Clean Reads) is greater than or equal to 2000; the data mapped to the target gene (Mapped Reads) is greater than or equal to 2000; and the ratio of Mapped Reads to Clean Reads (used to judge the specificity of the PCR) is greater than or equal to 50%;

(2) if a variant sequence differing from the reference sequence is detected within the 23 nt targeted by the sgRNA sequence, it is judged a homozygous mutation (Hom) when the ratio DP-Value between the variant sequence (alt) and the reference sequence (ref) is greater than or equal to 10:1, and a heterozygous mutation (Het) otherwise; if no variant sequence (alt) is detected, no editing is considered to have occurred;

(3) data analysis showed that, when a variant sequence is detected in step (2), the second-generation sequencing result for the sample is more stable when the Genotype Quality (GQ) value is greater than or equal to 30; the GQ threshold is therefore set to greater than or equal to 30;

(4) for samples in which a variant sequence is detected, the variant result is judged "Right" if steps (1), (2) and (3) are all satisfied, and "Wrong" otherwise; for samples in which no variant sequence is detected, the non-variant result is judged "Right" if step (1) is satisfied, and "Wrong" otherwise.
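Steps (1) to (4) amount to a small decision rule, sketched below in Python. The class and function names (`SampleCall`, `data_volume_ok`, `zygosity`, `credibility`) are hypothetical. The sketch also assumes an interpretation the text leaves implicit: step (2) counts as satisfied whenever a variant is found inside the 23-nt target, since the 10:1 DP ratio only separates Hom from Het and does not by itself decide "Right" or "Wrong".

```python
from dataclasses import dataclass
from typing import Optional

# Thresholds stated in steps (1)-(3).
MIN_CLEAN_READS = 2000
MIN_MAPPED_READS = 2000
MIN_MAPPED_RATE = 0.50
HOM_DP_RATIO = 10.0  # alt:ref >= 10:1 -> homozygous
MIN_GQ = 30


@dataclass
class SampleCall:
    clean_reads: int
    mapped_reads: int
    variant_in_target: bool    # variant found within the 23-nt sgRNA target
    alt_dp: int = 0            # reads supporting the variant (alt)
    ref_dp: int = 0            # reads supporting the reference (ref)
    gq: Optional[int] = None   # genotype quality of the variant call


def data_volume_ok(s: SampleCall) -> bool:
    """Step (1): both read counts >= 2000 and Mapped/Clean >= 50%."""
    if s.clean_reads < MIN_CLEAN_READS or s.mapped_reads < MIN_MAPPED_READS:
        return False
    return s.mapped_reads / s.clean_reads >= MIN_MAPPED_RATE


def zygosity(s: SampleCall) -> Optional[str]:
    """Step (2): Hom if alt:ref >= 10:1, else Het; None when no variant.
    Assumption: ref_dp == 0 is treated as a ratio above 10:1."""
    if not s.variant_in_target:
        return None
    if s.ref_dp == 0 or s.alt_dp / s.ref_dp >= HOM_DP_RATIO:
        return "Hom"
    return "Het"


def credibility(s: SampleCall) -> str:
    """Step (4): variant calls need (1)-(3); non-variant calls need (1)."""
    if s.variant_in_target:
        ok = data_volume_ok(s) and s.gq is not None and s.gq >= MIN_GQ
    else:
        ok = data_volume_ok(s)
    return "Right" if ok else "Wrong"


# Hypothetical counts, for illustration only.
s = SampleCall(clean_reads=5000, mapped_reads=4900, variant_in_target=True,
               alt_dp=4800, ref_dp=100, gq=99)
print(zygosity(s), credibility(s))  # Hom Right
```

Keeping zygosity and credibility as separate functions mirrors the text: a sample can be a credible heterozygote, and a homozygous call can still be "Wrong" if its GQ or data volume falls short.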

Preferably, the method further comprises the following steps:

(5) a sample whose variant detection result is judged "Right" has a credible target gene editing result and can be prioritized for further research; a sample whose non-variant detection result is judged "Right" has a credible unedited result for the target gene and is not carried forward to the next stage of research.

Preferably, the method further comprises the following steps:

(6) a sample judged mutated or not mutated whose detection result is "Wrong" has an unreliable edited or unedited result for the target gene, and can be sequenced again to decide whether it merits further research.

Preferably, re-sequencing and then deciding whether to perform further research in step (6) means re-performing steps (1)-(4), steps (1)-(5), or steps (1)-(6).
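Steps (5) and (6), together with the option of re-running the judgement after re-sequencing, can be sketched as a triage function. The names `next_action` and `triage_with_rerun`, the callback `judge`, and the `max_runs` limit are hypothetical; the text does not say how many times a sample may be re-sequenced.

```python
from typing import Callable, Tuple


def next_action(variant_detected: bool, verdict: str) -> str:
    """Map a "Right"/"Wrong" verdict to the follow-up in steps (5)-(6)."""
    if verdict == "Wrong":
        return "re-sequence"   # step (6): result not credible
    if variant_detected:
        return "prioritize"    # step (5): credible edit
    return "drop"              # step (5): credible non-edit


def triage_with_rerun(judge: Callable[[int], Tuple[bool, str]],
                      max_runs: int = 2) -> str:
    """Repeat the judgement on fresh sequencing runs while it is "Wrong".
    `judge(run_index)` is a hypothetical callback returning
    (variant_detected, verdict) for that run; max_runs is an assumed cap."""
    action = "re-sequence"
    for run in range(max_runs):
        variant, verdict = judge(run)
        action = next_action(variant, verdict)
        if action != "re-sequence":
            break
    return action


# First run fails the checks, the repeat run gives a credible non-edit.
runs = [(False, "Wrong"), (False, "Right")]
print(triage_with_rerun(lambda i: runs[i]))  # drop
```

Passing the judgement in as a callback keeps the sketch neutral about whether a re-run repeats steps (1)-(4), (1)-(5) or (1)-(6).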

The invention also provides the use of the above method for judging the credibility of gene editing results detected by second-generation sequencing.

Preferably, the method is applied to mixed-pool knockout libraries for the high-throughput creation of tobacco gene editing materials.

The technical scheme of the invention has the following beneficial effects:

By setting thresholds on the number and proportion of reads in the detection results, the invention judges the credibility of both variant and non-variant second-generation detection results and improves the accuracy of both. This supports accurate interpretation of mutation detection results for gene editing materials, improves the credibility of detection results, and also improves the efficiency of obtaining homozygous editing materials.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description, serve to explain the principles of the invention.

FIG. 1 shows the homozygous G-to-GT insertion mutation at the target site in sample S1 of the present application.

FIG. 2 shows the variation pattern of sample S4 within the target region.

FIG. 3 shows that sample S7 of the present application has no variation within the target.

FIG. 4 shows that sample S8 of the present application has no variation within the target.

Detailed Description

Various exemplary embodiments of the present invention will now be described in detail with reference to the accompanying drawings. It should be noted that: the relative arrangement of the components and steps, the numerical expressions and numerical values set forth in these embodiments do not limit the scope of the present invention unless specifically stated otherwise.

A method for judging the credibility of a second-generation sequencing detection gene editing result comprises the following steps:

(1) data volume limits: the valid data of a single sample (Clean Reads) is greater than or equal to 2000; the data mapped to the target gene (Mapped Reads) is greater than or equal to 2000; and the ratio of Mapped Reads to Clean Reads is greater than or equal to 50%;

(2) if a variant sequence differing from the reference sequence is detected within the 23 nt targeted by the sgRNA sequence, it is judged a homozygous mutation (Hom) when the ratio DP-Value between the variant sequence (alt) and the reference sequence (ref) is greater than or equal to 10:1, and a heterozygous mutation (Het) otherwise; if no variant sequence (alt) is detected, no editing is considered to have occurred;

(3) the Genotype Quality (GQ) value is set to greater than or equal to 30;

(4) for samples in which a variant sequence is detected, the variant result is judged "Right" if steps (1), (2) and (3) are all satisfied, and "Wrong" otherwise; for samples in which no variant sequence is detected, the non-variant result is judged "Right" if step (1) is satisfied, and "Wrong" otherwise.

(5) A sample whose variant detection result is judged "Right" has a credible target gene editing result and can be prioritized for further research; a sample whose non-variant detection result is judged "Right" has a credible unedited result for the target gene and is not carried forward to the next stage of research.

(6) A sample judged mutated or not mutated whose detection result is "Wrong" has an unreliable edited or unedited result for the target gene, and can be sequenced again to decide whether it merits further research.
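For reference, the criteria in steps (1) to (4) above can be written compactly. The symbols are introduced here for convenience and are not used in the original text: $C$ is the Clean Reads count, $M$ the Mapped Reads count, and $DP_{\mathrm{alt}}$, $DP_{\mathrm{ref}}$ the reads supporting the variant and reference sequences; condition (2) is read as "a variant is detected within the 23-nt target".

```latex
\begin{aligned}
&\text{(1) data volume:}      && C \ge 2000,\quad M \ge 2000,\quad \frac{M}{C} \ge 0.5\\
&\text{(2) zygosity:}         && \frac{DP_{\mathrm{alt}}}{DP_{\mathrm{ref}}} \ge 10 \;\Rightarrow\; \mathrm{Hom},\quad \text{otherwise } \mathrm{Het}\\
&\text{(3) genotype quality:} && GQ \ge 30\\
&\text{(4) verdict:}          && \text{variant: Right} \iff (1)\wedge(2)\wedge(3);\qquad \text{no variant: Right} \iff (1)
\end{aligned}
```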

Examples 1 and 2 below were both analyzed as described above:

Example 1

Samples: four samples were selected and their target gene mutations were detected by second-generation sequencing (results shown in Table 1), as follows:

(1) the four selected samples were designated S1, S2, S3 and S4; for each, the valid data (Clean Reads) and the data mapped to the target gene (Mapped Reads) were both greater than or equal to 2000, and the Mapped rates (Mapped Reads as a proportion of Clean Reads) of the four samples were 98.54%, 98.27%, 98.02% and 98.66% respectively, all above 50%;

(2) within the 23 nt targeted by the sgRNA sequence, a variant sequence (GT) differing from the reference sequence (G) was detected at position 190; for S1-S3 the ratio DP-Value between the variant sequence (alt) and the reference sequence (ref) was greater than or equal to 10:1, so the variation of these samples at this site was judged homozygous (Hom); no variant sequence was detected in S4, so S4 was judged to have no variation at this position;

(3) the Genotype Quality (GQ) values of S1, S2 and S3 were 72, 99 and 99 respectively, all greater than 30;

Of the four samples, S1, S2 and S3 satisfied conditions (1) to (3) simultaneously. Samples S1, S2 and S3 were therefore considered to carry a G-to-GT mutation at position 190, all of homozygous type (Hom), and the results were judged "Right". S4 satisfied condition (1) and no mutation was detected in it, so S4 was considered unmutated within the 23 nt targeted by the sgRNA sequence, and this result was also judged "Right".

(4) S1, S2 and S3 will be prioritized for the next stage of research, and S4 will not be studied further.
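The verdicts in Example 1 can be re-derived from the figures it reports. In this sketch the hypothetical function `verdict` takes a boolean `counts_ok` in place of the raw read counts, because the example only states that Clean Reads and Mapped Reads were each at least 2000 without giving the numbers.

```python
from typing import Optional


def verdict(counts_ok: bool, mapped_rate: float,
            variant: bool, gq: Optional[int]) -> str:
    """Steps (1)-(4) for one sample; counts_ok stands in for
    'Clean Reads and Mapped Reads both >= 2000'."""
    data_ok = counts_ok and mapped_rate >= 0.50          # step (1)
    if variant:                                          # steps (2)-(4)
        return "Right" if data_ok and gq is not None and gq >= 30 else "Wrong"
    return "Right" if data_ok else "Wrong"               # non-variant branch


# Values reported in Example 1: (counts_ok, mapped rate, variant detected, GQ)
example1 = {
    "S1": (True, 0.9854, True, 72),
    "S2": (True, 0.9827, True, 99),
    "S3": (True, 0.9802, True, 99),
    "S4": (True, 0.9866, False, None),
}

results = {name: verdict(*row) for name, row in example1.items()}
print(results)
# {'S1': 'Right', 'S2': 'Right', 'S3': 'Right', 'S4': 'Right'}
```

All four samples come out "Right", matching the text: S1-S3 through the variant branch (GQ of 72, 99 and 99 all clear 30) and S4 through the non-variant branch.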

Table 1: credibility analysis table of S1-S4

To verify the high-throughput mutation detection results, S1 and S4 were checked by first-generation sequencing: as shown in FIG. 1, sample S1 carries a homozygous G-to-GT insertion mutation within the 23 nt targeted by the sgRNA sequence; as shown in FIG. 2, sample S4 has no variation within those 23 nt.

Example 2

Samples: four samples were selected and their target gene mutations were detected by second-generation sequencing (results shown in Table 2), as follows:

(1) the four selected samples were designated S5, S6, S7 and S8; the Clean Reads and Mapped Reads of samples S6, S7 and S8 were each greater than or equal to 2000, and their Mapped rates were 93.29%, 99.84% and 99.87% respectively, all above 50%;

(2) no variant sequence was detected within the 23 nt targeted by the sgRNA sequence in S5, S6, S7 or S8;

(3) samples S6, S7 and S8 satisfy condition (1), so they were considered to have no mutation within the 23-nt sequence, and these results were judged credible ("Right"); sample S5 did not satisfy condition (1), so its result of no mutation within the 23-nt sequence was judged not credible ("Wrong").

(4) Samples S6, S7 and S8 were discarded, and S5 was sequenced again to determine whether to proceed further.
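Because no variant was detected in Example 2, only step (1) decides each verdict, and steps (5) and (6) then fix the follow-up. The sketch below encodes the reported Mapped rates for S6-S8; S5's read counts and Mapped rate are not reported, so it is entered only as failing step (1). The action labels "drop" and "re-sequence" are hypothetical shorthand for the outcomes described in the text.

```python
# name: (passes step (1), reported Mapped rate or None if not reported)
example2 = {
    "S5": (False, None),
    "S6": (True, 0.9329),
    "S7": (True, 0.9984),
    "S8": (True, 0.9987),
}

outcome = {}
for name, (step1_ok, rate) in example2.items():
    if rate is not None:
        # Reported rates should be consistent with the 50% threshold.
        assert rate >= 0.50
    # Non-variant branch of step (4): only step (1) matters.
    verdict = "Right" if step1_ok else "Wrong"
    # Step (5): a credible non-edit is dropped; step (6): "Wrong" is re-sequenced.
    outcome[name] = (verdict, "drop" if verdict == "Right" else "re-sequence")

print(outcome)
```

The resulting mapping sends S6, S7 and S8 to "drop" and S5 to "re-sequence", which is the disposition given in step (4) of the example.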

Table 2: credibility analysis table of S5-S8

To verify the high-throughput mutation detection results, S7 and S8 were checked by first-generation sequencing; as shown in FIGS. 3 and 4, samples S7 and S8 were not mutated within the 23 nt targeted by the sgRNA sequence.

Although the present invention has been described with reference to the above embodiments, it should be understood that the present invention is not limited thereto, and various changes and modifications may be made by those skilled in the art without departing from the spirit and scope of the present invention.
