Method, device and storage medium for locating segregation-distortion traits

Document No.: 685298; Published: 2021-04-30

Reading note: this technology, "Method, device and storage medium for locating segregation-distortion traits" (一种定位偏分离性状的方法、装置及存储介质), was designed by 邓秀新, 王楠, 宋谢天, 周银, 胡健兵, 谢源源 and 叶俊丽 on 2021-01-11. Its main content is as follows. The invention provides a method, a device and a storage medium for locating segregation-distortion traits. The method comprises: importing the phenotype data to be located for a genetic population, the genotype variation information of the parents and progeny of the population, and genome reference information; dividing the reference information into data windows; analyzing the degree of segregation distortion of the variation information within the data windows to obtain segregation-distortion degree information to be compared; extracting from the variation information a distortion-effect-removed variant file and a distortion-effect-increased variant file for the contrasting traits, and analyzing their degree of segregation distortion to obtain first and second segregation-distortion degree information; and comparing the first and second segregation-distortion degree information with the information to be compared, and obtaining the localization segment for the segregation-distortion trait according to the comparison result. The invention obtains the localization segment quickly and accurately, and solves the problem that traits showing segregation distortion cannot be located.

1. A data processing method for locating segregation-distortion traits, characterized by comprising the following steps:

importing the phenotype data to be located for a genetic population, the genotype variation information of the parents and progeny of the genetic population, and genome reference information;

dividing the genome reference information by a data-window division method to obtain a plurality of data windows;

analyzing the degree of segregation distortion of the genotype variation information within the plurality of data windows to obtain segregation-distortion degree information to be compared;

dividing the progeny of the genetic population recorded in the phenotype data to be located into subgroups with different traits and, using the resulting subgroups as the criterion, extracting from the genotype variation information a distortion-effect-removed variant file and a distortion-effect-increased variant file for the contrasting traits;

analyzing the degree of segregation distortion of the distortion-effect-removed variant file over the plurality of data windows to obtain first segregation-distortion degree information, and analyzing the degree of segregation distortion of the distortion-effect-increased variant file over the plurality of data windows to obtain second segregation-distortion degree information; and

comparing the first and second segregation-distortion degree information with the segregation-distortion degree information to be compared, and obtaining a localization segment for the segregation-distortion trait according to the comparison result.

2. The method for locating segregation-distortion traits according to claim 1, wherein dividing the genome reference information by a data-window division method to obtain a plurality of data windows comprises:

dividing the genome reference information into windows according to a preset step value to obtain the plurality of data windows, wherein the preset step value is a length of 100 kb.

3. The method for locating segregation-distortion traits according to claim 1, further comprising, before analyzing the degree of segregation distortion of the genotype variation information within the plurality of data windows, a step of optimizing the genotype variation information, which comprises:

filtering out false-positive sites among the progeny genotypes in the genotype variation information; and

screening the filtered genotype variation information by variant type according to a preset Mendelian genetic model to obtain a Mendelian segregation ratio;

wherein analyzing the degree of segregation distortion of the genotype variation information within the plurality of data windows to obtain the segregation-distortion degree information comprises:

counting, in each data window, the frequency of segregation-distortion sites in the genotype variation information, and obtaining the number of segregation-distortion sites from that frequency;

performing a chi-square test against the Mendelian segregation ratio and, using the p value of the chi-square test as the criterion, obtaining the variation information of the segregation-distortion sites, wherein the p value is less than 0.001; and

taking the number of segregation-distortion sites and the variation information of the segregation-distortion sites as the segregation-distortion degree information.

4. The method for locating segregation-distortion traits according to claim 1, wherein dividing the progeny of the genetic population recorded in the phenotype data to be located into at least two subgroups with different traits and, using the resulting subgroups as the criterion, extracting from the genotype variation information a distortion-effect-removed variant file and a distortion-effect-increased variant file for the contrasting traits comprises:

constructing a distortion-effect-removed group and a distortion-effect-increased group from the phenotype data to be located for the genetic population; and

extracting from the genotype variation information, using the distortion-effect-removed group and the distortion-effect-increased group as the criterion, the distortion-effect-removed variant file and the distortion-effect-increased variant file for the contrasting traits.

5. The method for locating segregation-distortion traits according to claim 4, wherein constructing the distortion-effect-removed group and the distortion-effect-increased group from the phenotype data to be located for the genetic population comprises:

obtaining, from the phenotype data to be located for the genetic population, a plurality of pieces of progeny information for the A-phenotype population and a plurality of pieces of progeny information for the B-phenotype population;

selecting all of the B-phenotype progeny information together with randomly selected A-phenotype progeny information to construct the distortion-effect-removed group; and

selecting all of the A-phenotype progeny information to construct the distortion-effect-increased group.

6. The method for locating segregation-distortion traits according to claim 1, wherein comparing the first and second segregation-distortion degree information with the segregation-distortion degree information to be compared and obtaining the localization segment for the segregation-distortion trait according to the comparison result comprises:

comparing the first and second segregation-distortion degree information with the segregation-distortion degree information to be compared to obtain distortion-degree decrease information and distortion-degree increase information; and

obtaining the data windows that overlap between the distortion-degree decrease information and the distortion-degree increase information, and obtaining the localization segment for the segregation-distortion trait from the overlapping data windows.

7. An apparatus for locating segregation-distortion traits, comprising:

an import module, configured to import the phenotype data to be located for a genetic population, the genotype variation information of the parents and progeny of the genetic population, and genome reference information;

a window division module, configured to divide the genome reference information by a data-window division method to obtain a plurality of data windows;

a processing module, configured to analyze the degree of segregation distortion of the genotype variation information within the plurality of data windows to obtain segregation-distortion degree information;

to divide the progeny of the genetic population recorded in the phenotype data to be located into at least two subgroups with different traits and, using the resulting subgroups as the criterion, to extract from the genotype variation information a distortion-effect-removed variant file and a distortion-effect-increased variant file for the contrasting traits; and

to analyze the degree of segregation distortion of the distortion-effect-removed variant file over the plurality of data windows to obtain first segregation-distortion degree information to be compared, and to analyze the degree of segregation distortion of the distortion-effect-increased variant file over the plurality of data windows to obtain second segregation-distortion degree information to be compared; and

a comparison module, configured to compare the first and the second segregation-distortion degree information to be compared with the segregation-distortion degree information, respectively, to obtain a first comparison result and a second comparison result, and to intersect the first and second comparison results to obtain a localization segment for the segregation-distortion trait.

8. An apparatus for locating segregation-distortion traits, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the computer program, when executed by the processor, implements the method for locating segregation-distortion traits according to any one of claims 1 to 6.

9. A computer-readable storage medium storing a computer program, characterized in that the computer program, when executed by a processor, implements the method for locating segregation-distortion traits according to any one of claims 1 to 6.

Technical Field

The invention relates mainly to the technical field of genetic data processing, and in particular to a method, a device and a storage medium for locating segregation-distortion traits.

Background

One of the main approaches in forward genetics is to locate the genomic segments that control a trait using a hybrid population. For qualitative traits controlled by a single gene, a BC1 test-cross segregating population or an F2 selfed segregating population is usually constructed. When the segregation ratio of the dominant and recessive traits is checked with a chi-square test, the progeny of a BC1 population show a 1:1 ratio, while the progeny of an F2 population show a 1:2:1 ratio. Such simple qualitative traits are generally mapped by QTL mapping or BSA (bulked segregant analysis) mapping, and both methods work well. However, these methods may fail for traits that affect the survival rate of the progeny, because such traits alter the segregation of progeny phenotypes, so that the phenotype of the segregating population shows segregation distortion. At present there is no corresponding solution to the problem that traits showing segregation distortion cannot be located.

Disclosure of Invention

To address the shortcomings of the prior art, the invention provides a method, a device and a storage medium for locating segregation-distortion traits.

The technical solution of the invention to the above technical problem is as follows: a data processing method for locating segregation-distortion traits, comprising the following steps:

importing the phenotype data to be located for a genetic population, the genotype variation information of the parents and progeny of the genetic population, and genome reference information;

dividing the genome reference information by a data-window division method to obtain a plurality of data windows;

analyzing the degree of segregation distortion of the genotype variation information within the plurality of data windows to obtain segregation-distortion degree information to be compared;

dividing the progeny of the genetic population recorded in the phenotype data to be located into subgroups with different traits and, using the resulting subgroups as the criterion, extracting from the genotype variation information a distortion-effect-removed variant file and a distortion-effect-increased variant file for the contrasting traits;

analyzing the degree of segregation distortion of the distortion-effect-removed variant file over the plurality of data windows to obtain first segregation-distortion degree information, and analyzing the degree of segregation distortion of the distortion-effect-increased variant file over the plurality of data windows to obtain second segregation-distortion degree information; and

comparing the first and second segregation-distortion degree information with the segregation-distortion degree information to be compared, and obtaining a localization segment for the segregation-distortion trait according to the comparison result.

Another technical solution of the invention to the above technical problem is as follows: an apparatus for locating segregation-distortion traits, comprising:

an import module, configured to import the phenotype data to be located for a genetic population, the genotype variation information of the parents and progeny of the genetic population, and genome reference information;

a window division module, configured to divide the genome reference information by a data-window division method to obtain a plurality of data windows;

a processing module, configured to analyze the degree of segregation distortion of the genotype variation information within the plurality of data windows to obtain segregation-distortion degree information;

to divide the progeny of the genetic population recorded in the phenotype data to be located into at least two subgroups with different traits and, using the resulting subgroups as the criterion, to extract from the genotype variation information a distortion-effect-removed variant file and a distortion-effect-increased variant file for the contrasting traits; and

to analyze the degree of segregation distortion of the distortion-effect-removed variant file over the plurality of data windows to obtain first segregation-distortion degree information to be compared, and to analyze the degree of segregation distortion of the distortion-effect-increased variant file over the plurality of data windows to obtain second segregation-distortion degree information to be compared; and

a comparison module, configured to compare the first and the second segregation-distortion degree information to be compared with the segregation-distortion degree information, respectively, to obtain a first comparison result and a second comparison result, and to intersect the first and second comparison results to obtain a localization segment for the segregation-distortion trait.

Another technical solution of the invention to the above technical problem is as follows: a computer-readable storage medium storing a computer program which, when executed by a processor, implements the method for locating segregation-distortion traits described above.

The beneficial effects of the invention are as follows: the genome reference information is divided into a plurality of data windows, within which the degree of segregation distortion is analyzed both for the initial genotype variation information and for the genotype variation information after division by trait. This yields the segregation-distortion degree information to be compared and the first and second segregation-distortion degree information, and comparing them determines the localization segment for the segregation-distortion trait.

Drawings

FIG. 1 is a schematic flow chart of a data processing method for locating segregation-distortion traits according to an embodiment of the invention;

FIG. 2 is a functional block diagram of an apparatus for locating segregation-distortion traits according to an embodiment of the invention.

Detailed Description

The principles and features of this invention are described below in conjunction with the following drawings, which are set forth by way of illustration only and are not intended to limit the scope of the invention.

FIG. 1 is a schematic flow chart of a data processing method for locating segregation-distortion traits according to an embodiment of the invention.

Example 1: as shown in FIG. 1, a data processing method for locating segregation-distortion traits includes the following steps:

importing the phenotype data to be located for a genetic population, the genotype variation information of the parents and progeny of the genetic population, and genome reference information;

dividing the genome reference information by a data-window division method to obtain a plurality of data windows;

analyzing the degree of segregation distortion of the genotype variation information within the plurality of data windows to obtain segregation-distortion degree information to be compared;

dividing the progeny of the genetic population recorded in the phenotype data to be located into subgroups with different traits and, using the resulting subgroups as the criterion, extracting from the genotype variation information a distortion-effect-removed variant file and a distortion-effect-increased variant file for the contrasting traits;

analyzing the degree of segregation distortion of the distortion-effect-removed variant file over the plurality of data windows to obtain first segregation-distortion degree information, and analyzing the degree of segregation distortion of the distortion-effect-increased variant file over the plurality of data windows to obtain second segregation-distortion degree information; and

comparing the first and second segregation-distortion degree information with the segregation-distortion degree information to be compared, and obtaining a localization segment for the segregation-distortion trait according to the comparison result.

It should be understood that the "genotype variation information" in "the genotype variation information of the parents and progeny of the genetic population" refers to the variation information shared by the parents and the progeny of the genetic population.

In the above embodiment, the genome reference information is divided into a plurality of data windows, within which the degree of segregation distortion is analyzed both for the initial genotype variation information and for the genotype variation information after division by trait. This yields the segregation-distortion degree information to be compared and the first and second segregation-distortion degree information, and comparing them determines the localization segment for the segregation-distortion trait.

Example 2, on the basis of Example 1: the process of dividing the genome reference information into windows to obtain a plurality of data windows comprises:

dividing the genome reference information into windows according to a preset step value to obtain the plurality of data windows, wherein the preset step value is a length of 100 kb.

In the above embodiment, because the genome reference information is long, it is divided into segments of equal length, which makes it easier to index the information and to analyze the degree of segregation distortion of both the initial genotype variation information and the genotype variation information after division by trait.
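The equal-length division described above can be sketched as follows. This is a minimal illustration rather than the implementation of the invention: the function names `make_windows` and `window_index`, the half-open coordinate convention and the chromosome lengths are all assumptions made for the example; only the 100 kb step value comes from the text.

```python
from typing import Dict, List, Tuple

WINDOW_SIZE = 100_000  # 100 kb step value stated in the text


def make_windows(chrom_lengths: Dict[str, int],
                 size: int = WINDOW_SIZE) -> List[Tuple[str, int, int]]:
    """Split each chromosome into consecutive half-open windows [start, end)."""
    windows = []
    for chrom, length in chrom_lengths.items():
        for start in range(0, length, size):
            # The last window of a chromosome may be shorter than `size`.
            windows.append((chrom, start, min(start + size, length)))
    return windows


def window_index(pos: int, size: int = WINDOW_SIZE) -> int:
    """Return the 0-based index of the window containing a 1-based position."""
    return (pos - 1) // size


# Hypothetical chromosome lengths, for illustration only.
windows = make_windows({"chr1": 250_000, "chr2": 100_000})
for w in windows:
    print(w)
```

Indexing a variant by `window_index` avoids scanning the window list for every site, which is the kind of "information indexing" the paragraph above refers to.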

Example 3, on the basis of Example 1: before the degree of segregation distortion of the genotype variation information is analyzed within the plurality of data windows, the method further comprises a step of optimizing the genotype variation information, which comprises:

filtering out false-positive sites among the progeny genotypes in the genotype variation information; and

screening the filtered genotype variation information by variant type according to a preset Mendelian genetic model to obtain a Mendelian segregation ratio.

Specifically, the marker types possible in the progeny are inferred from the parental marker types; when the individuals whose genotypes do not match the theoretical progeny genotypes account for more than 5% of the total population, the site is regarded as a false positive and removed.

Specifically, variant types are screened according to the model assumed for the trait being located: for example, when a BC1 segregation model is established, sites at which only one of the two parents is heterozygous are selected, and when an F2 segregation model is established, sites at which both parents are heterozygous are selected. The Mendelian segregation ratio is determined from the segregation model, a chi-square test is performed against that ratio, and results with a p value less than 0.001 are retained.
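The false-positive filter and the choice of segregation model can be illustrated with a short sketch. It assumes biallelic sites with unphased calls written as `0/0`, `0/1` or `1/1` and no missing genotypes; the function names and the example genotype lists are hypothetical, and only the 5% cutoff and the BC1/F2 selection rule are taken from the text.

```python
from itertools import product
from typing import List, Optional, Set, Tuple

Genotype = Tuple[int, int]


def parse(gt: str) -> Genotype:
    """Turn a call such as '0/1' into a sorted allele pair (0, 1)."""
    a, b = (int(x) for x in gt.split("/"))
    return (a, b) if a <= b else (b, a)


def expected_genotypes(parent1: str, parent2: str) -> Set[Genotype]:
    """All progeny genotypes obtainable from one allele of each parent."""
    combos = product(parse(parent1), parse(parent2))
    return {tuple(sorted(c)) for c in combos}


def is_false_positive(parent1: str, parent2: str, progeny: List[str],
                      max_fraction: float = 0.05) -> bool:
    """True when more than 5% of progeny carry a genotype the parents cannot produce."""
    allowed = expected_genotypes(parent1, parent2)
    unexpected = sum(parse(g) not in allowed for g in progeny)
    return unexpected / len(progeny) > max_fraction


def segregation_model(parent1: str, parent2: str) -> Optional[str]:
    """BC1-type site: one heterozygous parent; F2-type site: both heterozygous."""
    het1 = len(set(parse(parent1))) == 2
    het2 = len(set(parse(parent2))) == 2
    if het1 and het2:
        return "F2"
    if het1 or het2:
        return "BC1"
    return None  # both parents homozygous: the site does not segregate


# Hypothetical progeny calls for a 0/1 x 0/0 cross.
kept = ["0/0"] * 48 + ["0/1"] * 48 + ["1/1"] * 4      # 4% unexpected
dropped = ["0/0"] * 47 + ["0/1"] * 47 + ["1/1"] * 6   # 6% unexpected
print(is_false_positive("0/1", "0/0", kept))
print(is_false_positive("0/1", "0/0", dropped))
print(segregation_model("0/1", "0/0"), segregation_model("0/1", "0/1"))
```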

Example 4, on the basis of Example 3: the process of analyzing the degree of segregation distortion of the genotype variation information within each data window to obtain the segregation-distortion degree information comprises:

counting, in each data window, the frequency of segregation-distortion sites in the genotype variation information, and obtaining the number of segregation-distortion sites from that frequency;

performing a chi-square test against the Mendelian segregation ratio and, using the p value of the chi-square test as the criterion, obtaining the variation information of the segregation-distortion sites, wherein the p value is less than 0.001; and

taking the number of segregation-distortion sites and the variation information of the segregation-distortion sites as the segregation-distortion degree information.

Specifically, the frequency of segregation-distortion sites in the genotype variation information is counted in each data window and plotted to obtain a genome-wide distribution map of segregation-distortion sites for the hybrid population, and the degree of segregation distortion is calculated in each window.

It should be understood that the variation information of a segregation-distortion site includes the position of the site on the chromosome. The sites are counted in the 100 kb windows into which the genome is divided, and the count is recorded as the segregation-distortion frequency of the window, which reflects how credible the distortion at that locus is. Every segregation-distortion site in a window shows distortion, and the degree of distortion can be expressed by the p value; in this embodiment log10 of p is taken to reflect the degree of segregation distortion.

In the above embodiment, the variation information of the segregation-distortion sites is obtained by performing a chi-square test against the Mendelian segregation ratio.
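A minimal sketch of the per-window analysis follows. It assumes each site is given as a chromosome, a 1-based position and the observed progeny counts per genotype class; these inputs, the function names and the example counts are hypothetical. The p value uses the closed-form chi-square survival function for one or two degrees of freedom, which covers the 1:1 and 1:2:1 models; the 100 kb window, the 0.001 cutoff and the use of log10 of p come from the text.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, Sequence, Tuple

WINDOW_SIZE = 100_000   # 100 kb windows, as in the text
P_THRESHOLD = 0.001     # p-value cutoff stated in the text


def chi_square_p(observed: Sequence[int], ratio: Sequence[float]) -> float:
    """Goodness-of-fit p value against a Mendelian ratio (2 or 3 classes only)."""
    total = sum(observed)
    expected = [total * r / sum(ratio) for r in ratio]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1
    if df == 1:
        return math.erfc(math.sqrt(stat / 2))   # survival function, 1 d.f.
    if df == 2:
        return math.exp(-stat / 2)              # survival function, 2 d.f.
    raise ValueError("this sketch handles only two or three genotype classes")


def distortion_by_window(sites: Iterable[Tuple[str, int, Sequence[int]]],
                         ratio: Sequence[float]) -> Dict[Tuple[str, int], dict]:
    """Count distorted sites per window and keep log10(p) for each of them."""
    out = defaultdict(lambda: {"count": 0, "log10_p": []})
    for chrom, pos, observed in sites:
        p = chi_square_p(observed, ratio)
        if p < P_THRESHOLD:
            entry = out[(chrom, (pos - 1) // WINDOW_SIZE)]
            entry["count"] += 1
            # Guard against p underflowing to 0.0 for extreme distortion.
            entry["log10_p"].append(math.log10(max(p, 1e-300)))
    return dict(out)


# Hypothetical sites under a BC1 (1:1) model.
sites = [
    ("chr1", 50_000, (170, 100)),
    ("chr1", 80_000, (135, 135)),
    ("chr1", 150_000, (180, 90)),
]
result = distortion_by_window(sites, ratio=(1, 1))
for window, info in sorted(result.items()):
    print(window, info)
```

Here the balanced site at position 80,000 is not counted, while the two skewed sites fall into different windows and each contributes one count.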

Example 5, on the basis of Example 1: the process of dividing the progeny of the genetic population recorded in the phenotype data to be located into at least two subgroups with different traits and, using the resulting subgroups as the criterion, extracting from the genotype variation information a distortion-effect-removed variant file and a distortion-effect-increased variant file for the contrasting traits comprises:

constructing a distortion-effect-removed group and a distortion-effect-increased group from the phenotype data to be located for the genetic population; and

extracting from the genotype variation information, using the distortion-effect-removed group and the distortion-effect-increased group as the criterion, the distortion-effect-removed variant file and the distortion-effect-increased variant file for the contrasting traits.

In the above embodiment, the phenotypic segregation-distortion information of the candidate localization segment, which corresponds to the phenotype data to be located for the genetic population, is obtained from the information on decreased and increased degrees of segregation distortion.

Example 6, on the basis of Example 5: the process of constructing the distortion-effect-removed group and the distortion-effect-increased group from the phenotype data to be located for the genetic population comprises:

obtaining, from the phenotype data to be located for the genetic population, a plurality of pieces of progeny information for the A-phenotype population and a plurality of pieces of progeny information for the B-phenotype population;

selecting all of the B-phenotype progeny information together with randomly selected A-phenotype progeny information to construct the distortion-effect-removed group; and

selecting all of the A-phenotype progeny information to construct the distortion-effect-increased group.

Specifically, let m be the number of randomly selected A-phenotype progeny and n the number of all B-phenotype progeny, where m is greater than n and the ratio m:n conforms to the Mendelian genetic model under a chi-square test.

In the above embodiment, the progeny information of the A-phenotype and B-phenotype populations is obtained from the phenotype data to be located for the genetic population and used to construct the distortion-effect-removed group and the distortion-effect-increased group, which are then processed further to obtain the localization segment for the segregation-distortion trait.
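The group construction can be sketched as below, under several stated assumptions: phenotypes are supplied as a mapping from sample ID to "A" or "B", the caller chooses m, and "conforms to the Mendelian genetic model" is read as a chi-square p value of at least 0.001 against a two-class ratio (the text does not state which threshold applies here). The names `fits_model` and `build_groups`, the seed and the sample IDs are hypothetical.

```python
import math
import random
from typing import Dict, List, Tuple


def fits_model(m: int, n: int, ratio: Tuple[float, float] = (1.0, 1.0),
               alpha: float = 0.001) -> bool:
    """True when m:n is not rejected by a 1 d.f. chi-square test at `alpha`."""
    total = m + n
    exp_m = total * ratio[0] / (ratio[0] + ratio[1])
    exp_n = total - exp_m
    stat = (m - exp_m) ** 2 / exp_m + (n - exp_n) ** 2 / exp_n
    return math.erfc(math.sqrt(stat / 2)) >= alpha


def build_groups(phenotypes: Dict[str, str], m: int,
                 ratio: Tuple[float, float] = (1.0, 1.0),
                 seed: int = 0) -> Tuple[List[str], List[str]]:
    """Return (distortion-effect-removed group, distortion-effect-increased group)."""
    a_ids = sorted(s for s, p in phenotypes.items() if p == "A")
    b_ids = sorted(s for s, p in phenotypes.items() if p == "B")
    if m > len(a_ids):
        raise ValueError("m exceeds the number of A-phenotype progeny")
    if not fits_model(m, len(b_ids), ratio):
        raise ValueError("m:n does not conform to the assumed Mendelian ratio")
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    removed = sorted(rng.sample(a_ids, m) + b_ids)  # random A subset + all B
    increased = a_ids                                # all A progeny
    return removed, increased


# Hypothetical population: 17 A-phenotype and 10 B-phenotype progeny.
phenotypes = {f"A{i:02d}": "A" for i in range(1, 18)}
phenotypes.update({f"B{i:02d}": "B" for i in range(1, 11)})
removed, increased = build_groups(phenotypes, m=11)
print(len(removed), len(increased))
```

The two returned ID lists would then be used to select the corresponding progeny from the genotype variation information, giving the two variant files described in Example 5.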

Example 7, on the basis of Example 1: the process of comparing the first and second segregation-distortion degree information with the segregation-distortion degree information to be compared and obtaining the localization segment for the segregation-distortion trait according to the comparison result comprises:

comparing the first and second segregation-distortion degree information with the segregation-distortion degree information to be compared to obtain distortion-degree decrease information and distortion-degree increase information; and

obtaining the data windows that overlap between the distortion-degree decrease information and the distortion-degree increase information, and obtaining the localization segment for the segregation-distortion trait from the overlapping data windows.

Specifically, a t-test with a 99% confidence interval is applied within each data window: windows in which the t-test indicates a significant decrease in the degree of segregation distortion are taken as significantly decreased windows, and likewise windows in which the t-test indicates a significant increase are taken as significantly increased windows.

It should be understood that the windows in which the degree of segregation distortion of multiple distortion-effect-removed groups is significantly lower than in the hybrid population are collected, as are the windows in which the degree of segregation distortion of the distortion-effect-increased group is significantly higher than in the hybrid population; the windows that appear in both sets are the candidate localization segments, which influence the segregation of the phenotype.
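The comparison and overlap step can be sketched as follows. The text does not specify which form of t-test is used, so this example assumes a Welch two-sample t statistic per window and, because the Python standard library has no t distribution, compares it with a fixed large-sample critical value of 2.576 for the 99% level; with SciPy available, `scipy.stats.ttest_ind` would give an exact p value instead. The per-site scores are assumed to be oriented so that larger means more distorted (for instance the negative of log10 p), and all names and numbers are hypothetical.

```python
import math
from statistics import mean, variance
from typing import Dict, List, Sequence, Tuple

Window = Tuple[str, int]
Scores = Dict[Window, Sequence[float]]


def welch_t(x: Sequence[float], y: Sequence[float]) -> float:
    """Welch t statistic for mean(x) - mean(y); each sample needs >= 2 values."""
    se2 = variance(x) / len(x) + variance(y) / len(y)
    diff = mean(x) - mean(y)
    if se2 == 0:
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return diff / math.sqrt(se2)


def candidate_windows(population: Scores, removed: Scores, increased: Scores,
                      critical: float = 2.576) -> List[Window]:
    """Windows that are both significantly decreased and significantly increased."""
    decreased = {w for w in population
                 if w in removed
                 and welch_t(removed[w], population[w]) < -critical}
    raised = {w for w in population
              if w in increased
              and welch_t(increased[w], population[w]) > critical}
    return sorted(decreased & raised)  # intersection = candidate segments


# Hypothetical per-site distortion scores in two windows of chr1.
population = {("chr1", 10): [5.0, 5.2, 4.8, 5.1], ("chr1", 11): [5.0, 5.2, 4.8, 5.1]}
removed = {("chr1", 10): [1.0, 1.2, 0.9, 1.1], ("chr1", 11): [5.1, 4.9, 5.0, 5.2]}
increased = {("chr1", 10): [9.0, 9.3, 8.8, 9.1], ("chr1", 11): [5.0, 5.1, 4.9, 5.2]}
print(candidate_windows(population, removed, increased))
```

Only window 10 changes in both directions relative to the whole population, so it is the only candidate returned; window 11 behaves the same in all three data sets and is discarded.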

FIG. 2 is a functional block diagram of an apparatus for locating segregation-distortion traits according to an embodiment of the invention.

Example 8: an apparatus for locating segregation-distortion traits, comprising:

an import module, configured to import the phenotype data to be located for a genetic population, the genotype variation information of the parents and progeny of the genetic population, and genome reference information;

a window division module, configured to divide the genome reference information by a data-window division method to obtain a plurality of data windows;

a processing module, configured to analyze the degree of segregation distortion of the genotype variation information within the plurality of data windows to obtain segregation-distortion degree information;

to divide the progeny of the genetic population recorded in the phenotype data to be located into at least two subgroups with different traits and, using the resulting subgroups as the criterion, to extract from the genotype variation information a distortion-effect-removed variant file and a distortion-effect-increased variant file for the contrasting traits; and

to analyze the degree of segregation distortion of the distortion-effect-removed variant file over the plurality of data windows to obtain first segregation-distortion degree information to be compared, and to analyze the degree of segregation distortion of the distortion-effect-increased variant file over the plurality of data windows to obtain second segregation-distortion degree information to be compared; and

a comparison module, configured to compare the first and the second segregation-distortion degree information to be compared with the segregation-distortion degree information, respectively, to obtain a first comparison result and a second comparison result, and to intersect the first and second comparison results to obtain a localization segment for the segregation-distortion trait.

Example 9: an apparatus for locating segregation traits, comprising a memory, a processor and a computer program stored in the memory and executable on the processor, the computer program when executed by the processor implementing a method for locating segregation traits as described above.

Example 10: a computer-readable storage medium, having stored thereon a computer program which, when being executed by a processor, carries out the method of locating a segregation trait as set forth above.

It should be understood that in the above examples 1 to 8, the data processing method is described by taking two properties as an example, and if more than two properties are present, the data processing method is the same as the data processing method of "two properties", and thus the description thereof is omitted.

The validity of the method is verified by the following specific example:

The method was applied to the citrus genome and a citrus hybrid population. The citrus self-incompatibility trait is known to be located in the interval from 1 to 1.3 Mb on citrus chromosome 1. Genotype variation files for the parents and progeny of the hybrid population were constructed from the resequencing data of the population. Because the compatible:incompatible segregation ratio is 1.7:1, the trait was presumed to deviate from the 1:1 model, and a Mendelian genetic model, namely a parental BC1 model, was established. In this citrus population the incompatible trait is known to come from the female parent and the compatible trait from the male parent, so the segregation is distorted towards the male parent. The segregation distortion of the contrasting compatible and incompatible traits was successfully detected. Because this example uses contrasting traits with a known localization interval, the verification shows that the segregation-distortion detection is effective and that traits showing segregation distortion can be located accurately.

It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.

The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.
